LNMB PhD Course
Networks and Semidefinite Programming
2014/2015
Monique Laurent
CWI, Amsterdam, and Tilburg University
These notes are based on material developed by M. Laurent and F. Vallentin
for the Mastermath course Semidefinite Optimization
February 8, 2015
CONTENTS

1 Positive semidefinite matrices
  1.1 Basic definitions
    1.1.1 Characterizations of positive semidefinite matrices
    1.1.2 The positive semidefinite cone S^n_⪰0
    1.1.3 The trace inner product
  1.2 Basic properties
    1.2.1 Schur complements
    1.2.2 Kronecker and Hadamard products
    1.2.3 Properties of the kernel
  1.3 Exercises

2 Semidefinite programs
  2.1 Semidefinite programs
    2.1.1 Recap on linear programs
    2.1.2 Semidefinite program in primal form
    2.1.3 Semidefinite program in dual form
    2.1.4 Duality
  2.2 Application to eigenvalue optimization
  2.3 Some facts about complexity
    2.3.1 More differences between LP and SDP
    2.3.2 Algorithms
    2.3.3 Gaussian elimination
  2.4 Exercises

3 Graph coloring and independent sets
  3.1 Preliminaries on graphs
    3.1.1 Stability and chromatic numbers
    3.1.2 Perfect graphs
    3.1.3 The perfect graph theorem
  3.2 Linear programming bounds
    3.2.1 Fractional stable sets and colorings
    3.2.2 Polyhedral characterization of perfect graphs
  3.3 Semidefinite programming bounds
    3.3.1 The theta number
    3.3.2 Computing maximum stable sets in perfect graphs
    3.3.3 Minimum colorings of perfect graphs
  3.4 Other formulations of the theta number
    3.4.1 Dual formulation
    3.4.2 Two more (lifted) formulations
  3.5 The theta body TH(G)
  3.6 The theta number for vertex-transitive graphs
  3.7 Bounding the Shannon capacity
  3.8 Geometric application
  3.9 Exercises

4 Approximating the MAX CUT problem
  4.1 Introduction
    4.1.1 The MAX CUT problem
    4.1.2 Linear programming relaxation
  4.2 The algorithm of Goemans and Williamson
    4.2.1 Semidefinite programming relaxation
    4.2.2 Dual semidefinite programming relaxation
    4.2.3 The Goemans-Williamson algorithm
    4.2.4 Remarks on the algorithm
  4.3 Extension to variations of MAX CUT
    4.3.1 The maximum bisection problem
    4.3.2 The maximum k-cut problem
  4.4 Extension to quadratic programming
    4.4.1 Nesterov's approximation algorithm
    4.4.2 Quadratic programs modeling MAX 2SAT
    4.4.3 Approximating MAX 2-SAT
  4.5 Further reading and remarks
  4.6 Exercises
CHAPTER 1
POSITIVE SEMIDEFINITE
MATRICES
In this chapter we collect basic facts about positive semidefinite matrices, which
we will need in the next chapter to define semidefinite programs.
We use the following notation. Throughout, ‖x‖ denotes the Euclidean norm of x ∈ R^n, defined by ‖x‖ = √(x^T x) = (∑_{i=1}^n x_i^2)^{1/2}. An orthonormal basis of R^n is a set of unit vectors {u_1, . . . , u_n} that are pairwise orthogonal: ‖u_i‖ = 1 for all i and u_i^T u_j = 0 for all i ≠ j. For instance, the standard unit vectors
e1 , . . . , en ∈ Rn form an orthonormal basis. In denotes the n × n identity matrix
and Jn denotes the all-ones matrix (we may sometimes omit the index n if the
dimension is clear from the context). We let S n denote the set of symmetric n×n
matrices and O(n) denote the set of orthogonal matrices. A matrix P ∈ Rn×n
is orthogonal if P P T = In or, equivalently, P T P = In , i.e. the rows (resp., the
columns) of P form an orthonormal basis of Rn . A diagonal matrix D ∈ S n has
entries zero at all off-diagonal positions: Dij = 0 for all i 6= j.
1.1 Basic definitions
1.1.1 Characterizations of positive semidefinite matrices
We recall the notions of eigenvalues and eigenvectors. For a matrix X ∈ Rn×n ,
a nonzero vector u ∈ Rn is an eigenvector of X if there exists a scalar λ ∈ R
such that Xu = λu; then λ is the eigenvalue of X for the eigenvector u. A fundamental property of symmetric matrices is that they admit a set of eigenvectors {u_1, . . . , u_n} forming an orthonormal basis of R^n. This is the spectral decomposition theorem, one of the most important theorems about symmetric matrices.
Theorem 1.1.1. (Spectral decomposition theorem) Any real symmetric matrix
X ∈ S^n can be decomposed as

    X = ∑_{i=1}^n λ_i u_i u_i^T,     (1.1)
where λ1 , . . . , λn ∈ R are the eigenvalues of X and where u1 , . . . , un ∈ Rn are
the corresponding eigenvectors which form an orthonormal basis of Rn . In matrix
terms, X = P DP T , where D is the diagonal matrix with the λi ’s on the diagonal
and P is the orthogonal matrix with the ui ’s as its columns.
Next we define positive semidefinite matrices and give several equivalent
characterizations.
Theorem 1.1.2. (Positive semidefinite matrices) The following assertions are
equivalent for a symmetric matrix X ∈ S n .
(1) X is positive semidefinite, written as X ⪰ 0, which is defined by the property: x^T Xx ≥ 0 for all x ∈ R^n.
(2) The smallest eigenvalue of X is nonnegative, i.e., the spectral decomposition of X is of the form X = ∑_{i=1}^n λ_i u_i u_i^T with all λ_i ≥ 0.
(3) X = LLT for some matrix L ∈ Rn×k (for some k ≥ 1), called a Cholesky
decomposition of X.
(4) There exist vectors v1 , . . . , vn ∈ Rk (for some k ≥ 1) such that Xij = viT vj
for all i, j ∈ [n]; the vectors vi ’s are called a Gram representation of X.
(5) All principal minors of X are non-negative.
Proof. (1) ⟹ (2): By assumption, u_i^T X u_i ≥ 0 for all i ∈ [n]. On the other hand, Xu_i = λ_i u_i implies u_i^T X u_i = λ_i ‖u_i‖² = λ_i, and thus λ_i ≥ 0 for all i.
(2) ⟹ (3): By assumption, X has a decomposition (1.1) where all scalars λ_i are nonnegative. Define the matrix L ∈ R^{n×n} whose i-th column is the vector √λ_i u_i. Then X = LL^T holds.
(3) ⟹ (4): Assume X = LL^T where L ∈ R^{n×k}. Let v_i ∈ R^k denote the i-th row of L. The equality X = LL^T gives directly that X_ij = v_i^T v_j for all i, j ∈ [n].
(4) ⟹ (1): Assume X_ij = v_i^T v_j for all i, j ∈ [n], where v_1, . . . , v_n ∈ R^k, and let x ∈ R^n. Then x^T Xx = ∑_{i,j=1}^n x_i x_j X_ij = ∑_{i,j=1}^n x_i x_j v_i^T v_j = ‖∑_{i=1}^n x_i v_i‖², which is nonnegative. This shows that X ⪰ 0.
The equivalence (1) ⟺ (5) can be found in any standard Linear Algebra textbook (and will not be used here).
Observe that for a diagonal matrix X, X 0 if and only if its diagonal
entries are nonnegative: Xii ≥ 0 for all i ∈ [n].
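The equivalences in Theorem 1.1.2 are easy to experiment with numerically. The following sketch assumes Python with numpy is available; the random Gram matrix and the tolerances are illustrative choices, not part of the text.

import numpy as np

# Build a PSD matrix from a Gram representation (characterization (4)):
# X_ij = v_i^T v_j for vectors v_1, ..., v_n given as the rows of V.
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 3))          # n = 4 vectors in R^3
X = V @ V.T                              # X = V V^T is PSD by (3)/(4)

# (2): all eigenvalues are nonnegative (up to numerical tolerance).
print(np.all(np.linalg.eigvalsh(X) >= -1e-9))                     # True

# (1): x^T X x >= 0 for a sample of vectors x.
xs = rng.standard_normal((1000, 4))
print(np.all(np.einsum('ij,jk,ik->i', xs, X, xs) >= -1e-9))        # True

# (3): a factor L with X = L L^T can be recovered from the spectral
# decomposition X = sum_i lambda_i u_i u_i^T of Theorem 1.1.1.
lam, U = np.linalg.eigh(X)
L = U @ np.diag(np.sqrt(np.clip(lam, 0, None)))
print(np.allclose(L @ L.T, X))                                     # True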
The above result extends to positive definite matrices. A matrix X is said to
be positive definite, which is denoted as X ≻ 0, if it satisfies any of the following
equivalent properties: (1) xT Xx > 0 for all x ∈ Rn \ {0}; (2) all eigenvalues
of X are strictly positive; (3) in a Cholesky decomposition of X, the matrix L
is nonsingular; (4) in any Gram representation of X as (viT vj )ni,j=1 , the system
of vectors {v1 , . . . , vn } has full rank n; and (5) all the principal minors of X
are positive (in fact, positivity of all the leading principal minors already implies positive definiteness; this is known as Sylvester’s criterion).
1.1.2 The positive semidefinite cone S^n_⪰0

We let S^n_⪰0 denote the set of all positive semidefinite matrices in S^n, called the positive semidefinite cone. Indeed, S^n_⪰0 is a convex cone in S^n, i.e., the following holds:

    X, X′ ⪰ 0, λ, λ′ ≥ 0 ⟹ λX + λ′X′ ⪰ 0

(check it). Moreover, S^n_⪰0 is a closed subset of S^n. (Assume we have a sequence of matrices X^(i) ⪰ 0 converging to a matrix X as i → ∞ and let x ∈ R^n. Then x^T X^(i) x ≥ 0 for all i and thus x^T Xx ≥ 0 by taking the limit.) Moreover, as a direct application of (1.1), we find that the cone S^n_⪰0 is generated by rank one matrices, i.e.,

    S^n_⪰0 = cone{xx^T : x ∈ R^n}.     (1.2)

Furthermore, the cone S^n_⪰0 is full-dimensional, and the matrices lying in its interior are precisely the positive definite matrices.
1.1.3 The trace inner product
The trace of an n × n matrix A is defined as

    Tr(A) = ∑_{i=1}^n A_ii.
Taking the trace is a linear operation:
Tr(λA) = λTr(A), Tr(A + B) = Tr(A) + Tr(B).
Moreover, the trace satisfies the following properties:
Tr(A) = Tr(AT ), Tr(AB) = Tr(BA), Tr(uuT ) = uT u = kuk2 for u ∈ Rn . (1.3)
Using the fact that Tr(uuT ) = 1 for any unit vector u, combined with (1.1), we
deduce that the trace of a symmetric matrix is equal to the sum of its eigenvalues.
Lemma 1.1.3. If X ∈ S n has eigenvalues λ1 , . . . , λn , then Tr(X) = λ1 + . . . + λn .
One can define an inner product, denoted as ⟨·, ·⟩, on R^{n×n} by setting

    ⟨A, B⟩ = Tr(A^T B) = ∑_{i,j=1}^n A_ij B_ij   for A, B ∈ R^{n×n}.     (1.4)

This defines the Frobenius norm on R^{n×n} by setting ‖A‖ = √⟨A, A⟩ = (∑_{i,j=1}^n A_ij^2)^{1/2}. In other words, this is the usual Euclidean norm, just viewing a matrix as a vector in R^{n²}. For a vector x ∈ R^n we have

    ⟨A, xx^T⟩ = x^T Ax.
The following property is useful to know:
Lemma 1.1.4. Let A, B ∈ S n and P ∈ O(n). Then, hA, Bi = hP AP T , P BP T i.
Proof. Indeed, hP AP T , P BP T i is equal to
Tr(P AP T P BP T ) = Tr(P ABP T ) = Tr(ABP T P ) = Tr(AB) = hA, Bi,
where we have used the fact that P T P = P P T = In and the commutativity rule
from (1.3).
Positive semidefinite matrices satisfy the following fundamental property:
Lemma 1.1.5. For a symmetric matrix A ∈ S^n,

    A ⪰ 0 ⟺ ⟨A, B⟩ ≥ 0 for all B ∈ S^n_⪰0.

Proof. The proof is based on the fact that S^n_⪰0 is generated by rank 1 matrices (recall (1.2)). Indeed, if A ⪰ 0 then ⟨A, xx^T⟩ = x^T Ax ≥ 0 for all x ∈ R^n, and thus ⟨A, B⟩ ≥ 0 for all B ∈ S^n_⪰0. Conversely, if ⟨A, B⟩ ≥ 0 for all B ∈ S^n_⪰0 then, for B = xx^T, we obtain that x^T Ax ≥ 0, which shows A ⪰ 0.

In other words, the cone S^n_⪰0 is self-dual, i.e., it coincides with its dual cone (by definition, the dual of the cone S^n_⪰0 is the set of all matrices Y ∈ S^n satisfying ⟨Y, X⟩ ≥ 0 for all X ∈ S^n_⪰0).
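As a small numerical companion to (1.4) and Lemmas 1.1.4 and 1.1.5 (a numpy sketch with arbitrary test matrices, not part of the original notes):

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

# <A, B> = Tr(A^T B) = sum_ij A_ij B_ij  (equation (1.4))
inner = np.trace(A.T @ B)
print(np.isclose(inner, np.sum(A * B)))                             # True

# Lemma 1.1.4: invariance under conjugation by an orthogonal matrix P.
P, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.isclose(inner, np.trace((P @ A @ P.T).T @ (P @ B @ P.T))))  # True

# Lemma 1.1.5 (one direction): if M is PSD then <M, x x^T> = x^T M x >= 0.
M = A @ A.T                                  # A A^T is always PSD
x = rng.standard_normal(n)
print(np.trace(M @ np.outer(x, x)) >= -1e-9)                         # True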
1.2 Basic properties
1.2.1 Schur complements
We recall some basic operations about positive semidefinite matrices. The proofs
of the following Lemmas 1.2.1, 1.2.2 and 1.2.3 are easy and left as an exercise.
Lemma 1.2.1. If X 0 then every principal submatrix of X is positive semidefinite.
Moreover, any matrix congruent to X 0 (i.e., of the form P XP T where P
is nonsingular) is positive semidefinite:
Lemma 1.2.2. Let P ∈ Rn×n be a nonsingular matrix. Then,
X 0 ⇐⇒ P XP T 0.
Lemma 1.2.3. Let X ∈ S^n be a matrix having the following block-diagonal form:

    X = ( A  0
          0  C ).

Then, X ⪰ 0 ⟺ A ⪰ 0 and C ⪰ 0.
We now introduce the notion of Schur complement, which can be very useful
for showing positive semidefiniteness.
Lemma 1.2.4. Let X ∈ S^n be a matrix in block form

    X = ( A    B
          B^T  C ),     (1.5)

where A ∈ S^p, C ∈ S^{n−p} and B ∈ R^{p×(n−p)}. If A is non-singular, then

    X ⪰ 0 ⟺ A ⪰ 0 and C − B^T A^{−1} B ⪰ 0.

The matrix C − B^T A^{−1} B is called the Schur complement of A in X.

Proof. One can verify that the following identity holds:

    X = P^T ( A  0
              0  C − B^T A^{−1} B ) P,   where P = ( I  A^{−1} B
                                                     0  I ).

As P is nonsingular, we deduce that X ⪰ 0 if and only if (P^{−1})^T X P^{−1} ⪰ 0 (use Lemma 1.2.2), which is thus equivalent to A ⪰ 0 and C − B^T A^{−1} B ⪰ 0 (use Lemma 1.2.3).
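A quick numerical check of Lemma 1.2.4 (a numpy sketch; block sizes and data are arbitrary test choices):

import numpy as np

def is_psd(M, tol=1e-9):
    # test a symmetric matrix via its smallest eigenvalue
    return np.linalg.eigvalsh(M).min() >= -tol

rng = np.random.default_rng(2)
p, q = 3, 2
A = rng.standard_normal((p, p)); A = A @ A.T + np.eye(p)   # positive definite block
B = rng.standard_normal((p, q))
C = rng.standard_normal((q, q)); C = C @ C.T               # PSD block

X = np.block([[A, B], [B.T, C]])
schur = C - B.T @ np.linalg.inv(A) @ B

# Lemma 1.2.4: X is PSD iff A is PSD and the Schur complement of A in X is PSD.
print(is_psd(X), is_psd(A) and is_psd(schur))   # the two answers agree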
1.2.2 Kronecker and Hadamard products
Given two matrices A = (Aij ) ∈ Rn×m and B = (Bhk ) ∈ Rp×q , their Kronecker
product is the matrix A ⊗ B ∈ Rnp×mq with entries
    (A ⊗ B)_{ih,jk} = A_ij B_hk   ∀ i ∈ [n], j ∈ [m], h ∈ [p], k ∈ [q].
The matrix A ⊗ B can be seen as the n × m block matrix whose ij-th block is
the p × q matrix Aij B for all i ∈ [n], j ∈ [m]. Alternatively, it can be seen as the
p × q block matrix whose hk-block is the n × m matrix Bhk A for h ∈ [p], k ∈ [q].
As an example, I2 ⊗ J3 takes the form:

    ( I2 I2 I2      ( 1 0 1 0 1 0
      I2 I2 I2   =    0 1 0 1 0 1
      I2 I2 I2 )      1 0 1 0 1 0
                      0 1 0 1 0 1
                      1 0 1 0 1 0
                      0 1 0 1 0 1 )

or, after permuting rows and columns, the form:

    ( J3 0       ( 1 1 1 0 0 0
      0  J3 ) =    1 1 1 0 0 0
                   1 1 1 0 0 0
                   0 0 0 1 1 1
                   0 0 0 1 1 1
                   0 0 0 1 1 1 ).
This includes in particular defining the Kronecker product u ⊗ v ∈ Rnp of
two vectors u ∈ Rn and v ∈ Rp , with entries (u ⊗ v)ih = ui vh for i ∈ [n], h ∈ [p].
Given two matrices A, B ∈ Rn×m , their Hadamard product is the matrix
A ◦ B ∈ Rn×m with entries
(A ◦ B)ij = Aij Bij ∀i ∈ [n], j ∈ [m].
Note that A ◦ B coincides with the principal submatrix of A ⊗ B indexed by the
subset of all ‘diagonal’ pairs of indices of the form (ii, jj) for i ∈ [n], j ∈ [m].
Here are some (easy to verify) facts about these products, where the matrices
and vectors have the appropriate sizes.
1. (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
2. In particular, (A ⊗ B)(u ⊗ v) = (Au) ⊗ (Bv).
3. Assume A ∈ S n and B ∈ S p have, respectively, eigenvalues α1 , . . . , αn and
β1 , . . . , βp . Then A ⊗ B ∈ S np has eigenvalues αi βh for i ∈ [n], h ∈ [p]. In
particular,
    A, B ⪰ 0 ⟹ A ⊗ B ⪰ 0 and A ◦ B ⪰ 0,
    A ⪰ 0 ⟹ A^{◦k} = ((A_ij)^k)_{i,j=1}^n ⪰ 0   ∀k ∈ N.
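These facts are easy to test numerically; the sketch below (numpy assumed, random test matrices) checks the eigenvalue rule for the Kronecker product and the positive semidefiniteness of a Hadamard product of PSD matrices.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)); A = A @ A.T    # PSD
B = rng.standard_normal((4, 4)); B = B @ B.T    # PSD

# Fact 3: the eigenvalues of A (x) B are the products alpha_i * beta_h.
eig_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))
eig_prod = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)).ravel())
print(np.allclose(eig_kron, eig_prod))           # True

# Consequence: A, C PSD of the same size implies the Hadamard product A o C is PSD.
C = rng.standard_normal((3, 3)); C = C @ C.T
print(np.linalg.eigvalsh(A * C).min() >= -1e-9)  # True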
1.2.3 Properties of the kernel
Here is a first useful property of the kernel of positive semidefinite matrices.
Lemma 1.2.5. Assume X ∈ S n is positive semidefinite and let x ∈ Rn . Then,
Xx = 0 ⇐⇒ xT Xx = 0.
Proof. The ‘only if’ part is clear. Conversely, decompose x = ∑_{i=1}^n x_i u_i in the orthonormal basis of eigenvectors of X. Then Xx = ∑_i λ_i x_i u_i and x^T Xx = ∑_i λ_i x_i². Hence, 0 = x^T Xx gives 0 = ∑_i λ_i x_i², and thus x_i = 0 for each i for which λ_i > 0. This shows that x is a linear combination of the eigenvectors u_i with eigenvalue λ_i = 0, and thus Xx = 0.
Clearly, X ⪰ 0 implies X_ii ≥ 0 for all i (because X_ii = e_i^T X e_i ≥ 0). Moreover, if X ⪰ 0 has a zero diagonal entry at position (i, i) then the whole i-th row/column is identically zero. This follows from the following property:

Lemma 1.2.6. Let X ∈ S^n be a matrix in block form

    X = ( A    B
          B^T  C ),     (1.6)

where A ∈ S^p, C ∈ S^{n−p} and B ∈ R^{p×(n−p)}. Assume y ∈ R^p belongs to the kernel of A, i.e., Ay = 0. Then the vector x = (y, 0, . . . , 0) ∈ R^n (obtained from y by adding zero coordinates at the remaining n − p positions) belongs to the kernel of X, i.e., Xx = 0.

Proof. We have x^T Xx = y^T Ay = 0 which, in view of Lemma 1.2.5, implies that Xx = 0.
We conclude with the following property: The inner product of two positive
semidefinite matrices is zero if and only if their matrix product is equal to 0.
Lemma 1.2.7. Let A, B ⪰ 0. Then,

    ⟨A, B⟩ = 0 ⟺ AB = 0.

Proof. The ‘if’ part is clear since ⟨A, B⟩ = Tr(AB). Assume now ⟨A, B⟩ = 0. Say B = ∑_{i=1}^n λ_i u_i u_i^T, where λ_i ≥ 0 and the u_i form an orthonormal basis. Then, 0 = ⟨A, B⟩ = ∑_i λ_i ⟨A, u_i u_i^T⟩. This implies that each term λ_i ⟨A, u_i u_i^T⟩ = λ_i u_i^T A u_i is equal to 0, since λ_i ≥ 0 and u_i^T A u_i ≥ 0 (as A ⪰ 0). Hence, λ_i > 0 implies u_i^T A u_i = 0 and thus A u_i = 0 (by Lemma 1.2.5). Therefore, each term λ_i A u_i is equal to 0 and thus AB = A(∑_i λ_i u_i u_i^T) = ∑_i λ_i (A u_i) u_i^T = 0.
i
1.3 Exercises
1.1 Given x_1, . . . , x_n ∈ R, consider the following matrix

    X = ( 1    x_1  x_2  · · ·  x_n
          x_1  x_1  0    · · ·  0
          x_2  0    x_2  · · ·  0
          ·                     ·
          x_n  0    0    · · ·  x_n ).

That is, X ∈ S^{n+1} is the matrix indexed by {0, 1, . . . , n}, with entries X_00 = 1, X_0i = X_i0 = X_ii = x_i for i ∈ [n], and all other entries are equal to 0.

Show: X ⪰ 0 if and only if x_i ≥ 0 for all i ∈ [n] and ∑_{i=1}^n x_i ≤ 1.
Hint: Use Schur complements.
1.2. Define the matrix Fij = (ei − ej )(ei − ej )T ∈ S n for 1 ≤ i < j ≤ n. That is,
Fij has entries 1 at positions (i, i) and (j, j), entries −1 at (i, j) and (j, i),
and entries 0 at all other positions.
(a) Show: F_ij ⪰ 0.
(b) Assume that X ∈ S^n satisfies the condition:

    X_ii ≥ ∑_{j∈[n]: j≠i} |X_ij|   for all i ∈ [n].

(Then X is said to be diagonally dominant.)
Show: X ⪰ 0.
1.3 Let X ∈ {±1}n×n be a symmetric matrix whose entries are 1 or −1.
Show: X 0 if and only if X = xxT for some x ∈ {±1}n .
CHAPTER 2
SEMIDEFINITE PROGRAMS
Semidefinite programming is the analogue of linear programming but now, instead of having variables that are vectors assumed to lie in the nonnegative orthant R^n_≥0, we have variables that are matrices assumed to lie in the cone S^n_⪰0 of positive semidefinite matrices. Thus semidefinite optimization can be seen as linear optimization over the convex cone of positive semidefinite matrices.
In this chapter we introduce semidefinite programs and give some basic
properties, in particular, about duality and complexity.
For convenience we recap some notation, mostly already introduced in the
previous chapter. S^n denotes the set of symmetric n × n matrices. For a matrix X ∈ S^n, X ⪰ 0 means that X is positive semidefinite and S^n_⪰0 is the cone of positive semidefinite matrices; X ≻ 0 means that X is positive definite.

Throughout, I_n (or simply I when the dimension is clear from the context) denotes the n × n identity matrix, e denotes the all-ones vector, i.e., e = (1, . . . , 1)^T ∈ R^n, and J_n = ee^T (or simply J) denotes the all-ones matrix. The vectors e_1, . . . , e_n are the standard unit vectors in R^n, and the matrices E_ij = (e_i e_j^T + e_j e_i^T)/2 form the standard basis of S^n. O(n) denotes the set of orthogonal matrices, where A is orthogonal if AA^T = I_n or, equivalently, A^T A = I_n.

We consider the trace inner product: ⟨A, B⟩ = Tr(A^T B) = ∑_{i,j=1}^n A_ij B_ij for two matrices A, B ∈ R^{n×n}. Here Tr(A) = ⟨I_n, A⟩ = ∑_{i=1}^n A_ii denotes the trace of A. Recall that Tr(AB) = Tr(BA); in particular, ⟨QAQ^T, QBQ^T⟩ = ⟨A, B⟩ if Q is an orthogonal matrix. A well known property of the positive semidefinite cone S^n_⪰0 is that it is self-dual: for a matrix X ∈ S^n, X ⪰ 0 if and only if ⟨X, Y⟩ ≥ 0 for all Y ∈ S^n_⪰0. For a matrix A ∈ S^n, diag(A) denotes the vector in R^n whose entries are the diagonal entries of A and, for a vector a ∈ R^n, Diag(a) ∈ S^n is the diagonal matrix with diagonal entries the entries of a.
2.1 Semidefinite programs
2.1.1 Recap on linear programs
We begin with recalling the standard form of a linear program, in primal form:

    p* = max_{x∈R^n} { c^T x : a_j^T x = b_j (j ∈ [m]), x ≥ 0 },     (2.1)

where c, a_1, . . . , a_m ∈ R^n and b = (b_j)_{j=1}^m ∈ R^m are the given data of the LP. Then the dual LP reads:

    d* = min_{y∈R^m} { ∑_{j=1}^m b_j y_j : ∑_{j=1}^m y_j a_j − c ≥ 0 }.     (2.2)
We recall the following well known facts about LP duality:
Theorem 2.1.1. The following holds for the programs (2.1) and (2.2).
1. (weak duality) If x is primal feasible and y is dual feasible then cT x ≤ bT y.
Thus, p∗ ≤ d∗ .
2. (strong duality) p∗ = d∗ unless both programs (2.1) and (2.2) are infeasible (in which case p∗ = −∞ and d∗ = +∞).
If p∗ is finite (i.e., (2.1) is feasible and bounded) or if d∗ is finite (i.e., (2.2) is
feasible and bounded), then p∗ = d∗ and both (2.1) and (2.2) have optimum
solutions.
3. (optimality condition) If (x, y) is a pair of primal/dual feasible solutions,
then they are primal/dual optimal solutions if and only if cT x = bT y or,
equivalently, the complementary slackness condition holds:
    x_i (∑_{j=1}^m y_j a_j − c)_i = 0   ∀ i ∈ [n].
2.1.2 Semidefinite program in primal form
The standard form of a semidefinite program (abbreviated as SDP) is a maximization problem of the form
    p* = sup_X { ⟨C, X⟩ : ⟨A_j, X⟩ = b_j (j ∈ [m]), X ⪰ 0 }.     (2.3)

Here C, A_1, . . . , A_m ∈ S^n are given symmetric matrices and b ∈ R^m is a given vector; they are the data of the semidefinite program (2.3). The matrix X is the
variable, which is constrained to be positive semidefinite and to lie in the affine
subspace
    W = { X ∈ S^n : ⟨A_j, X⟩ = b_j (j ∈ [m]) }

of S^n. The goal is to maximize the linear objective function ⟨C, X⟩ over the feasible region

    F = S^n_⪰0 ∩ W,

obtained by intersecting the positive semidefinite cone S^n_⪰0 with the affine subspace W.
Of course, one can also handle minimization problems, of the form
    inf_X { ⟨C, X⟩ : ⟨A_j, X⟩ = b_j (j ∈ [m]), X ⪰ 0 }

since they can be brought into the above standard maximization form using the fact that inf ⟨C, X⟩ = − sup ⟨−C, X⟩.
In the special case when the matrices A_j, C are diagonal matrices, with diagonals a_j, c ∈ R^n, the program (2.3) reduces to the linear program (2.1). Indeed, let x denote the vector consisting of the diagonal entries of the matrix X, so that x ≥ 0 if X ⪰ 0, and ⟨C, X⟩ = c^T x, ⟨A_j, X⟩ = a_j^T x. Hence semidefinite programming contains linear programming as a special instance.
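To make the reduction concrete, here is a small sketch, assuming the cvxpy modelling package and an SDP-capable solver (e.g. SCS) are installed; the data are made up for illustration. It solves a tiny LP directly and then as the SDP (2.3) with diagonal data matrices.

import cvxpy as cp
import numpy as np

# LP data: max c^T x  s.t.  a_1^T x = b_1, x >= 0
c = np.array([1.0, 2.0, 0.5])
a1 = np.array([1.0, 1.0, 1.0]); b1 = 1.0

# Direct LP formulation.
x = cp.Variable(3, nonneg=True)
lp = cp.Problem(cp.Maximize(c @ x), [a1 @ x == b1])
lp.solve()

# Same problem as an SDP in primal form (2.3) with diagonal C and A_1.
C, A1 = np.diag(c), np.diag(a1)
X = cp.Variable((3, 3), symmetric=True)
sdp = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                 [cp.trace(A1 @ X) == b1, X >> 0])
sdp.solve()

print(lp.value, sdp.value)   # both approximately 2.0 (all weight on x_2)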
A feasible solution X ∈ F is said to be strictly feasible if X is positive definite.
The program (2.3) is said to be strictly feasible if it admits at least one strictly
feasible solution.
Note that we write a supremum in (2.3) rather than a maximum. This is
because the optimum value p∗ might not be attained in (2.3). In general, p∗ ∈
R ∪ {±∞}, with p∗ = −∞ if the problem (2.3) is infeasible (i.e., F = ∅) and
p∗ = +∞ might occur in which case we say that the problem is unbounded.
We give a small example as an illustration.
Example 2.1.2. Consider the problem of minimizing/maximizing X11 over the
feasible region
    F_a = { X ∈ S^2 : X = ( X_11  a
                            a     0 ) ⪰ 0 },   where a ∈ R is a given parameter.
Note that det(X) = −a2 for any X ∈ Fa . Hence, if a 6= 0 then Fa = ∅ (the
problem is infeasible). Moreover, if a = 0 then the problem is feasible but not
strictly feasible. The minimum value of X11 over F0 is equal to 0, attained at
X = 0, while the maximum value of X11 over F0 is equal to ∞ (the problem is
unbounded).
Example 2.1.3. As another example, consider the problem
    p* = inf_{X∈S^2} { X_11 : ( X_11  1
                                1     X_22 ) ⪰ 0 }.     (2.4)
Then the infimum is p∗ = 0 which is reached at the limit when X11 = 1/X22 and
letting X22 tend to ∞. So the infimum is not attained.
2.1.3 Semidefinite program in dual form
The program (2.3) is often referred to as the primal SDP in standard form. One
can define its dual SDP, which takes the form:
    d* = inf_y { ∑_{j=1}^m b_j y_j = b^T y : ∑_{j=1}^m y_j A_j − C ⪰ 0 }.     (2.5)

Thus the dual program has variables y_j, one for each linear constraint of the primal program. The positive semidefinite constraint arising in (2.5) is also named a linear matrix inequality (LMI). The SDP (2.5) is said to be strictly feasible if it has a feasible solution y for which ∑_j y_j A_j − C ≻ 0.
Example 2.1.4. Let us work out the dual SDP of the SDP in Example 2.1.3. First
we write (2.4) in standard primal form as
    −p* = max_{X∈S^2} { ⟨−E_11, X⟩ : ⟨E_12, X⟩ = 2 }.     (2.6)

As there is one linear equation, there is one dual variable y and the dual SDP reads:

    −d* = inf_{y∈R} { 2y : yE_12 + E_11 = ( 1  y
                                            y  0 ) ⪰ 0 }.     (2.7)
Hence y = 0 is the only dual feasible solution, so the dual optimum value is d* = 0, attained at y = 0.
2.1.4 Duality
The following facts relate the primal and dual SDP’s. They are simple, but very
important.
Lemma 2.1.5. Let X be a feasible solution of (2.3) and let y be a feasible solution
of (2.5). Then the following holds.
1. (weak duality) We have: hC, Xi ≤ bT y and thus p∗ ≤ d∗ .
2. (optimality condition) Assume that p* = d* holds. Then X is an optimal solution of (2.3) and y is an optimal solution of (2.5) if and only if equality ⟨C, X⟩ = b^T y holds or, equivalently, ⟨X, ∑_{j=1}^m y_j A_j − C⟩ = 0 which, in turn, is equivalent to the following complementarity condition:

    X (∑_{j=1}^m y_j A_j − C) = 0.

Proof. Let (X, y) be a primal/dual pair of feasible solutions.
1. We have:

    ⟨X, ∑_j y_j A_j − C⟩ = ∑_j ⟨X, A_j⟩ y_j − ⟨X, C⟩ = ∑_j b_j y_j − ⟨X, C⟩ = b^T y − ⟨C, X⟩,     (2.8)

where we used the fact that ⟨A_j, X⟩ = b_j to get the second equality. As both X and ∑_j y_j A_j − C are positive semidefinite, we get ⟨X, ∑_j y_j A_j − C⟩ ≥ 0, which implies ⟨C, X⟩ ≤ b^T y and thus p* ≤ d*.
2. By assumption, we have ⟨C, X⟩ ≤ p* = d* ≤ b^T y. Hence, (X, y) form a pair of primal/dual optimal solutions if and only if ⟨C, X⟩ = b^T y or, equivalently (in view of relation (2.8)), ⟨X, ∑_j y_j A_j − C⟩ = 0. Finally, as both X and Z = ∑_j y_j A_j − C are positive semidefinite, we deduce that ⟨X, Z⟩ = 0 if and only if XZ = 0. (Recall Lemma 1.2.7.)
The quantity d∗ − p∗ is called the duality gap. While there is no duality gap
in LP, there might be a positive duality gap between the primal and dual SDP’s.
When there is no duality gap, i.e., when p∗ = d∗ , one says that strong duality
holds. Having strong duality is a very desirable situation, which happens when
at least one of the primal and dual semidefinite programs is strictly feasible. We
only quote the following result on strong duality. For its proof we refer e.g. to
the textbook [1] or to [3].
Theorem 2.1.6. (Strong duality: no duality gap) Consider the pair of primal
and dual programs (2.3) and (2.5).
1. Assume that the dual program (2.5) is bounded from below (d∗ > −∞)
and that it is strictly feasible. Then the primal program (2.3) attains its
supremum (i.e., p∗ = hC, Xi for some primal feasible X) and there is no
duality gap: p∗ = d∗ .
2. Assume that the primal program (2.3) is bounded from above (p∗ < ∞) and
that it is strictly feasible. Then the dual program (2.5) attains its infimum
(i.e., d∗ = bT y for some dual feasible y) and there is no duality gap: p∗ = d∗ .
Consider again the primal and dual SDP’s of Example 2.1.4. Then, the primal
(2.6) is strictly feasible, the dual (2.7) attains its optimum value and there is no
duality gap, while the dual is not strictly feasible and the primal does not attain
its optimum value.
We conclude with an example having a positive duality gap.
Example 2.1.7. Consider the primal semidefinite program with data matrices
    C = ( −1  0  0       A_1 = ( 1  0  0       A_2 = ( 0  0  1
           0 −1  0              0  0  0               0  1  0
           0  0  0 ),           0  0  0 ),            1  0  0 ),

and b_1 = 0, b_2 = 1. It reads

    p* = sup { −X_11 − X_22 : X_11 = 0, 2X_13 + X_22 = 1, X ⪰ 0 }

and its dual reads

    d* = inf { y_2 : y_1 A_1 + y_2 A_2 − C = ( y_1+1  0      y_2
                                               0      y_2+1  0
                                               y_2    0      0 ) ⪰ 0 }.
Then any primal feasible solution satisfies X13 = 0, X22 = 1, so that the primal
optimum value is equal to p∗ = −1, attained at the matrix X = E22 . Any dual
feasible solution satisfies y2 = 0, so that the dual optimum value is equal to d∗ = 0,
attained at y = 0. Hence there is a positive duality gap: d∗ − p∗ = 1.
Note that in this example both the primal and dual programs are not strictly
feasible.
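The claimed values can be confirmed with a few lines of numpy (no SDP solver needed); the candidate solutions below are exactly those named in the example.

import numpy as np

C  = np.diag([-1.0, -1.0, 0.0])
A1 = np.zeros((3, 3)); A1[0, 0] = 1.0
A2 = np.zeros((3, 3)); A2[0, 2] = A2[2, 0] = 1.0; A2[1, 1] = 1.0
b  = np.array([0.0, 1.0])

# Primal candidate X = E_22 (all zeros except X_22 = 1).
X = np.zeros((3, 3)); X[1, 1] = 1.0
print(np.trace(A1 @ X), np.trace(A2 @ X))      # 0.0 1.0  -> feasible
print(np.linalg.eigvalsh(X).min() >= 0)        # True (PSD)
print(np.trace(C @ X))                          # -1.0 = p*

# Dual candidate y = (0, 0).
y = np.array([0.0, 0.0])
S = y[0] * A1 + y[1] * A2 - C
print(np.linalg.eigvalsh(S).min() >= 0)        # True -> dual feasible
print(b @ y)                                    # 0.0 = d*, so d* - p* = 1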
2.2 Application to eigenvalue optimization
Given a matrix C ∈ S n , let λmin (C) (resp., λmax (C)) denote its smallest (resp.,
largest) eigenvalue. One can express them (please check it) as follows:
    λ_max(C) = max_{x∈R^n\{0}} x^T Cx / ‖x‖² = max_{x∈S^{n−1}} x^T Cx,     (2.9)

where S^{n−1} = {x ∈ R^n : ‖x‖ = 1} denotes the unit sphere in R^n, and

    λ_min(C) = min_{x∈R^n\{0}} x^T Cx / ‖x‖² = min_{x∈S^{n−1}} x^T Cx.     (2.10)
(This is known as the Rayleigh principle.) As we now see the largest and smallest eigenvalues can be computed via a semidefinite program. Namely, consider
the semidefinite program
    p* = sup { ⟨C, X⟩ : Tr(X) = ⟨I, X⟩ = 1, X ⪰ 0 }     (2.11)

and its dual program

    d* = inf_{y∈R} { y : yI − C ⪰ 0 }.     (2.12)
In view of (2.9), we have that d∗ = λmax (C). The feasible region of (2.11)
is bounded (all entries of any feasible X lie in [0, 1]) and contains a positive
definite matrix (e.g., the matrix In /n), hence the infimum is attained in (2.12).
Analogously, the program (2.12) is bounded from below (as y ≥ λ_max(C) for any feasible y) and strictly feasible (pick y large enough), hence the supremum is attained in (2.11). Moreover there is no duality gap: p* = d*. Here we have
applied Theorem 2.1.6. Thus we have shown:
Lemma 2.2.1. The largest and smallest eigenvalues of a symmetric matrix C ∈ S n
can be expressed with the following semidefinite programs:
    λ_max(C) = max { ⟨C, X⟩ : Tr(X) = 1, X ⪰ 0 } = min { y : yI_n − C ⪰ 0 },

    λ_min(C) = min { ⟨C, X⟩ : Tr(X) = 1, X ⪰ 0 } = max { y : C − yI_n ⪰ 0 }.
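A short sketch (cvxpy and numpy assumed; the test matrix is random) comparing the optimal value of (2.11) with the largest eigenvalue computed by ordinary linear algebra:

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
n = 5
C = rng.standard_normal((n, n)); C = (C + C.T) / 2

X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.trace(C @ X)),
                  [cp.trace(X) == 1, X >> 0])
prob.solve()

print(prob.value)                      # approximately lambda_max(C)
print(np.linalg.eigvalsh(C).max())     # exact largest eigenvalue, for comparison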
More generally, also the sum of the k largest eigenvalues of a symmetric
matrix can be computed via a semidefinite program. For details see [4].
Theorem 2.2.2. (Fan’s theorem) Let C ∈ S n be a symmetric matrix with eigenvalues λ1 ≥ . . . ≥ λn . Then the sum of its k largest eigenvalues is given by any of
the following two programs:
    λ_1 + · · · + λ_k = max_{X∈S^n} { ⟨C, X⟩ : Tr(X) = k, I_n ⪰ X ⪰ 0 },     (2.13)

    λ_1 + · · · + λ_k = max_{Y∈R^{n×k}} { ⟨C, YY^T⟩ : Y^T Y = I_k }.     (2.14)
2.3 Some facts about complexity
2.3.1 More differences between LP and SDP
We have already seen above several differences between linear programming
and semidefinite programming: there might be a duality gap between the primal and dual programs and the supremum/infimum might not be attained even
though they are finite. We point out some more differences regarding rationality
and bit size of optimal solutions.
In the classical bit (Turing machine) model of computation an integer number p is encoded in binary notation, so that its bit size is log p + 1 (logarithm in
base 2). Rational numbers are encoded as two integer numbers and the bit size
of a vector or a matrix is the sum of the bit sizes of its entries.
Consider a linear program
max{cT x : Ax = b, x ≥ 0}
(2.15)
where the data A, b, c is rational valued. From the point of view of computability
this is a natural assumption and it would be desirable to have an optimal solution which is also rational-valued. A fundamental result in linear programming
asserts that this is indeed the case: If program (2.15) has an optimal solution,
then it has a rational optimal solution x ∈ Qn , whose bit size is polynomially
bounded in terms of the bit sizes of A, b, c (see e.g. [10]).
On the other hand it is easy to construct instances of semidefinite programming where the data are rational valued, yet there is no rational optimal solution. For instance, the following program
    max { x : ( 1  x
                x  2 ) ⪰ 0 }     (2.16)

attains its maximum at x = √2.
Consider now the semidefinite program, with variables x1 , . . . , xn ,
    inf { x_n : ( 1  2
                  2  x_1 ) ⪰ 0,   ( 1        x_{i−1}
                                    x_{i−1}  x_i     ) ⪰ 0 for i = 2, . . . , n }.     (2.17)

Then any feasible solution satisfies x_n ≥ 2^{2^n}. Hence the bit-size of an optimal
solution is exponential in n, thus exponential in terms of the bit-size of the data.
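The growth forced by (2.17) is easy to display with exact integer arithmetic (plain Python; the value of n is arbitrary): x_1 must be at least 4 and each further constraint squares the previous bound.

# Smallest feasible values for (2.17): x_1 = 4 and x_i = x_{i-1}^2.
n = 6
x = [4]                        # [[1, 2], [2, x_1]] PSD forces x_1 >= 4
for i in range(2, n + 1):
    x.append(x[-1] ** 2)       # [[1, x_{i-1}], [x_{i-1}, x_i]] PSD forces x_i >= x_{i-1}^2

for i, value in enumerate(x, start=1):
    print(f"x_{i} >= {value}  ({value.bit_length()} bits)")
# x_n >= 2^(2^n): the bit size roughly doubles with every constraint.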
2.3.2 Algorithms
It is well known that linear programs (with rational data c, a1 , . . . , am , b) can be
solved in polynomial time. Although the simplex method invented by Dantzig
in 1948 performs very well in practice, it is still an open problem whether it
gives a polynomial time algorithm for solving general LP’s. The first polynomial-time algorithm for solving LP’s was given by Khachiyan in 1979, based on the
ellipsoid method. The value of this algorithm is however mainly theoretical as
it is very slow in practice. Later the algorithm of Karmarkar in 1984 opened the
way to polynomial time algorithms for LP based on interior-point algorithms,
which also perform well in practice.
What about algorithms for solving semidefinite programs?
First of all, one cannot hope for a polynomial time algorithm permitting to
solve any semidefinite program exactly. Indeed, even if the data of the SDP
are assumed to be rational valued, the output might be an irrational number,
thus not representable in the bit model of computation. Such an instance was
mentioned above in (2.16). Therefore, one can hope at best for an algorithm
permitting to compute in polynomial time an ε-approximate optimal solution.
However, even if we settle for this less ambitious goal of just computing ε-approximate optimal solutions, we should make some assumptions on the
semidefinite program, roughly speaking, in order to avoid having too large or
too small optimal solutions. An instance of SDP whose output is exponentially
large in the bit size of the data was mentioned above in (2.17).
On the positive side, it is well known that one can test whether a given
rational matrix is positive semidefinite in polynomial time — using Gaussian
elimination. Hence one can test in polynomial time membership in the positive semidefinite cone and, moreover, if X ∉ S^n_⪰0, then one can compute in polynomial time a hyperplane strictly separating X from S^n_⪰0 (again as a byproduct of Gaussian elimination). See Section 2.3.3 below for details.
This observation is at the base of the polynomial time algorithm for solving
approximately semidefinite programs, based on the ellipsoid method. Roughly
speaking, one can solve a semidefinite program in polynomial time up to any
given precision. More precisely, we quote the following result describing the
complexity of solving semidefinite programming with the ellipsoid method:
Consider the semidefinite program
    p* = sup{ ⟨C, X⟩ : ⟨A_j, X⟩ = b_j (j ∈ [m]), X ⪰ 0 },

where A_j, C, b_j are integer valued. Denote by F its feasibility region. Suppose that an integer R is known a priori such that either F = ∅ or there exists X ∈ F with ‖X‖ ≤ R. Let ε > 0 be given. Then, either one can find a matrix X* at distance at most ε from F and such that |⟨C, X*⟩ − p*| ≤ ε, or one can find a certificate that F does not contain a ball of radius ε. The complexity of this algorithm is polynomial in n, m, log R, log(1/ε), and the bit size of the input data.
Again, although polynomial time in theory, algorithms based on the ellipsoid
method are not practical. Instead, interior-point algorithms are used to solve
semidefinite programs in practice. We refer e.g. to [1], [2], [10], [6] for more
information about algorithms for linear and semidefinite programming.
2.3.3 Gaussian elimination
Let A = (aij ) ∈ S n be a rational matrix. Gaussian elimination permits to do the
following tasks in polynomial time:
(i) Either: find a rational matrix U ∈ Q^{n×n} and a rational diagonal matrix D ∈ Q^{n×n} with nonnegative diagonal entries such that A = U^T DU, thus showing that A ⪰ 0.

(ii) Or: find a rational vector x ∈ Q^n such that x^T Ax < 0, thus showing that A is not positive semidefinite and giving a hyperplane separating A from the cone S^n_⪰0.
Here is a sketch. We distinguish three cases.
Case 1: a_11 < 0. Then (ii) applies, since e_1^T A e_1 < 0.
Case 2: a11 = 0, but some entry a1j is not zero, say a12 6= 0. Then choose λ ∈ Q
such that 2λa12 + a22 < 0, so that xT Ax < 0 for the vector x = (λ, 1, 0, . . . , 0)
and thus (ii) applies again.
Case 3: a_11 > 0. Then we apply Gaussian elimination to the rows R_j and columns C_j of A for j = 2, . . . , n. Namely, for each j = 2, . . . , n, we replace C_j by C_j − (a_1j/a_11) C_1, and analogously we replace R_j by R_j − (a_1j/a_11) R_1, which amounts to making all entries of A equal to zero at the positions (1, j) and (j, 1) for j ≠ 1. For this, define the matrices P_j = I_n − (a_1j/a_11) e_1 e_j^T and P = P_2 · · · P_n. Then P is rational and nonsingular, and P^T AP has the block form:

    P^T AP = ( a_11  0
               0     A′ ),

where A′ ∈ S^{n−1}. Thus, A ⪰ 0 ⟺ P^T AP ⪰ 0 ⟺ A′ ⪰ 0.
Then, we proceed inductively with the matrix A′ ∈ S^{n−1}:

• Either, we find W ∈ Q^{(n−1)×(n−1)} and a diagonal matrix D′ ∈ Q^{(n−1)×(n−1)} such that A′ = W^T D′W. Then we obtain that A = U^T DU, setting

    U = ( 1  0
          0  W ) P^{−1},   D = ( a_11  0
                                 0     D′ ).

• Or, we find y ∈ Q^{n−1} such that y^T A′y < 0. Then we obtain that x^T Ax < 0, after defining z = (0, y) and x = Pz ∈ Q^n.
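The procedure above can be written as a short recursive routine in exact rational arithmetic. The sketch below uses Python's fractions module; the function name is ours and, for brevity, it only returns a separating vector in the "not PSD" case rather than the full factorization of task (i).

from fractions import Fraction

def psd_certificate(A):
    """Decide whether the rational symmetric matrix A (list of lists of
    Fractions) is PSD.  Returns ('psd', None) or ('not_psd', x) with x^T A x < 0."""
    n = len(A)
    if n == 0:
        return 'psd', None
    a11 = A[0][0]
    if a11 < 0:                        # Case 1: e_1 is a separating vector
        x = [Fraction(0)] * n
        x[0] = Fraction(1)
        return 'not_psd', x
    if a11 == 0:
        for j in range(1, n):
            if A[0][j] != 0:           # Case 2: pick lam with 2*lam*a_1j + a_jj < 0
                lam = -(A[j][j] + 1) / (2 * A[0][j])
                x = [Fraction(0)] * n
                x[0], x[j] = lam, Fraction(1)
                return 'not_psd', x
        # first row and column are zero: recurse on the remaining principal submatrix
        status, y = psd_certificate([row[1:] for row in A[1:]])
        return ('psd', None) if status == 'psd' else ('not_psd', [Fraction(0)] + y)
    # Case 3: a_11 > 0, recurse on the Schur complement A' of a_11 in A
    Aprime = [[A[i][j] - A[0][i] * A[0][j] / a11 for j in range(1, n)]
              for i in range(1, n)]
    status, y = psd_certificate(Aprime)
    if status == 'psd':
        return 'psd', None
    # map the certificate back: x = P z with z = (0, y)
    x0 = -sum(A[0][j + 1] * y[j] for j in range(n - 1)) / a11
    return 'not_psd', [x0] + y

A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 2, 1], [0, 1, 2]]]
print(psd_certificate(A)[0])     # 'psd'
B = [[Fraction(v) for v in row] for row in [[1, 2], [2, 1]]]
print(psd_certificate(B))        # a rational x with x^T B x < 0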
2.4 Exercises
2.1. Consider a symmetric matrix A ∈ S n and a scalar t ∈ R. Show:
There exists a scalar t0 ∈ R such that the matrix A + tIn is positive definite
for all t ≥ t0 . Give some explicit value for t0 .
2.2. Consider the semidefinite program:
p∗ = sup {X22 : X11 = 0, X12 = 1, X 0}.
X∈S 2
(a) Write it in standard primal form and determine the value of p∗ .
(b) Write the dual semidefinite program and determine its optimum value
d∗ .
(c) Is there a duality gap? Are the semidefinite programs strictly feasible?
2.3. (a) Consider a matrix A ∈ S^n_⪰0 and L ∈ R^{n×k} such that A = LL^T, a vector b ∈ R^n and a scalar c ∈ R. Show: For any vector x ∈ R^n,

    x^T Ax ≤ b^T x + c ⟺ ( I_k     L^T x
                            x^T L  b^T x + c ) ⪰ 0.

(b) Show: For any vector x ∈ R^n,

    ‖x‖² ≤ 1 ⟺ ( 1  x^T
                  x  I_n ) ⪰ 0.
(c) Given c ∈ Rn , consider the problem:
min {cT x : kxk2 ≤ 1}.
x∈Rn
Reformulate it as a semidefinite program and write its dual semidefinite program. Is there a duality gap?
2.4. Let G = (V = [n], E) be a graph and let d = (dij ){i,j}∈E ∈ RE
≥0 be given
nonnegative weights on the edges. Consider the following problem (P):
Find vectors v1 , . . . , vn ∈ Rk (for some integer k ≥ 1) such that
n
X
i=1
kvi k2 = 1, kvi − vj k2 = dij for all {i, j} ∈ E
and for which the sum
Pn
i,j=1
viT vj is minimum.
(a) Formulate problem (P) as an instance of semidefinite program.
(b) If in problem (P) we would like to add the additional constraint that
the vectors v1 , . . . , vn should belong to Rk for some fixed dimension
k, which condition would you add to the semidefinite program?
Hint: This could be a condition on the rank of the matrix variable.
BIBLIOGRAPHY
[1] A. Ben-Tal, A. Nemirovski. Lectures on Modern Convex Optimization,
SIAM, 2001.
[2] M. Grötschel, L. Lovász and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.
[3] M. Laurent and F. Vallentin. Semidefinite Optimization. Lecture Notes.
https://sites.google.com/site/mastermathsdp/lectures
[4] M. Overton and R.S. Womersley. On the sum of the k largest eigenvalues
of a symmetric matrix. SIAM Journal on Matrix Analysis and Applications 13(1):41–45, 1992.
[5] A. Schrijver, Theory of linear and integer programming, John Wiley &
Sons, 1986.
[6] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review
38:49–95, 1996.
http://stanford.edu/~boyd/papers/sdp.html
CHAPTER 3
GRAPH COLORING AND
INDEPENDENT SETS
In this chapter we discuss how semidefinite programming can be used to construct tractable bounds for two hard combinatorial problems: finding maximum independent sets and minimum colorings in graphs.
We introduce the graph parameter ϑ(G), known as the theta number of the
graph G. This parameter was introduced by L. Lovász in his seminal paper [8].
We present several equivalent formulations and explain how ϑ(G) can be used
to compute maximum stable sets and minimum colorings in perfect graphs in
polynomial time, whereas these problems are NP-hard for general graphs.
Here are some definitions that we use in this chapter. Let G = (V, E) be a
graph; often we let V = [n] = {1, . . . , n}. Then Ē denotes the set of pairs {i, j} of distinct nodes that are not adjacent in G. The graph Ḡ = (V, Ē) is called the complementary graph of G. G is self-complementary if G and Ḡ are isomorphic graphs. Given a subset S ⊆ V, G[S] denotes the subgraph induced by S: its node
set is S and its edges are all pairs {i, j} ∈ E with i, j ∈ S. The graph Cn is the
circuit (or cycle) of length n, with node set [n] and edges the pairs {i, i + 1}
(for i ∈ [n], indices taken modulo n). For a set S ⊆ V , its characteristic vector is
the vector χS ∈ {0, 1}V , whose i-th entry is 1 if i ∈ S and 0 otherwise. We let
e = (1, . . . , 1)T denote the all-ones vector.
3.1 Preliminaries on graphs
3.1.1 Stability and chromatic numbers
A subset S ⊆ V of nodes is said to be stable (or independent) if no two nodes
of S are adjacent in G. Then the stability number of G is the parameter α(G)
defined as the maximum cardinality of an independent set in G.
A subset C ⊆ V of nodes is called a clique if every two distinct nodes in C
are adjacent. The maximum cardinality of a clique in G is denoted ω(G), the
clique number of G. Clearly,
ω(G) = α(Ḡ).
Computing the stability number of a graph is a hard problem: Given a graph
G and an integer k, deciding whether α(G) ≥ k is an N P -complete problem.
Given an integer k ≥ 1, a k-coloring of G is an assignment of numbers (view
them as colors) from {1, · · · , k} to the nodes in such a way that two adjacent
nodes receive distinct colors. In other words, this corresponds to a partition of
V into k stable sets: V = S1 ∪ · · · ∪ Sk , where Si is the stable set consisting of
all nodes that received the i-th color. The coloring (or chromatic) number is the
smallest integer k for which G admits a k-coloring, it is denoted as χ(G).
Again it is an N P -complete problem to decide whether a graph is k-colorable.
In fact, it is N P -complete to decide whether a planar graph is 3-colorable. On
the other hand, it is known that every planar graph is 4-colorable – this is the
celebrated 4-color theorem. Moreover, observe that one can decide in polynomial time whether a graph is 2-colorable, since one can check in polynomial
time whether a graph is bipartite.
Figure 3.1: The Petersen graph has α(G) = 4, ω(G) = 2 and χ(G) = 3
Clearly, any two nodes in a clique of G must receive distinct colors. Therefore, for any graph, the following inequality holds:
ω(G) ≤ χ(G).
(3.1)
This inequality is strict, for example, when G is an odd circuit, i.e., a circuit
of odd length at least 5, or its complement. Indeed, for an odd circuit C2n+1
(n ≥ 2), ω(C2n+1 ) = 2 while χ(C2n+1 ) = 3. Moreover, for the complement
C̄_{2n+1} of C_{2n+1}, ω(C̄_{2n+1}) = n while χ(C̄_{2n+1}) = n + 1. For an illustration see the cycle of length 7 and its complement in Figure 3.2.
Figure 3.2: For C_7 and its complement C̄_7: ω(C_7) = 2, χ(C_7) = 3, ω(C̄_7) = α(C_7) = 3, χ(C̄_7) = 4
3.1.2 Perfect graphs
It is intriguing to understand for which graphs equality ω(G) = χ(G) holds.
Note that any graph G with ω(G) < χ(G) can be embedded in a larger graph Ĝ
with ω(Ĝ) = χ(Ĝ), simply by adding to G a clique of size χ(G) (disjoint from
V ). This justifies the following definition, introduced by C. Berge in the early
sixties, which makes the problem well posed.
Definition 3.1.1. A graph G is said to be perfect if equality
ω(H) = χ(H)
holds for all induced subgraphs H of G (including H = G).
Here are some classes of perfect graphs. For each of them the relation
ω(G) = χ(G) gives a combinatorial min-max relation.
1. Bipartite graphs (the relation ω(G) = χ(G) = 2 is clear).
2. Line graphs of bipartite graphs (the min-max relation claims that the maximum cardinality of a matching is equal to the minimum cardinality of a
vertex cover, which is König’s theorem).
3. Comparability graphs (the min-max relation corresponds to Dilworth’s theorem).
It follows from the definition and the above observation about odd circuits
that if G is a perfect graph then it does not contain an odd circuit of length at
least 5 or its complement as an induced subgraph. Berge already conjectured
that all perfect graphs arise in this way. Resolving this conjecture has haunted
generations of graph theorists. It was finally settled in 2002 by Chudnovsky,
Robertson, Seymour and Thomas who proved the following result, known as
the strong perfect graph theorem:
Theorem 3.1.2. (The strong perfect graph theorem)[2] A graph G is perfect if
and only if it does not contain an odd circuit of length at least 5 or its complement
as an induced subgraph.
This implies the following structural result about perfect graphs, known as
the perfect graph theorem, already proved by Lovász in 1972.
Theorem 3.1.3. (The perfect graph theorem)[7] If G is a perfect graph, then
its complement G too is a perfect graph.
We give a direct proof of Theorem 3.1.3 in the next section and we will
mention later some other, more geometric, characterizations of perfect graphs
(see, e.g., Theorem 3.2.5).
3.1.3 The perfect graph theorem
Lovász [7] proved the following result, which implies the perfect graph theorem
(Theorem 3.1.3). The proof given below follows the elegant linear-algebraic
argument of Gasparian [4].
Theorem 3.1.4. A graph G is perfect if and only if |V(G′)| ≤ α(G′)ω(G′) for each induced subgraph G′ of G.
Proof. Necessity is easy: Assume that G is perfect and let G′ be an induced
subgraph of G. Then χ(G′ ) = ω(G′ ) and thus V (G′ ) can be covered by ω(G′ )
stable sets, which implies that |V (G′ )| ≤ ω(G′ )α(G′ ).
To show sufficiency, assume for a contradiction that there exists a graph G
which satisfies the condition but is not perfect; choose such a graph with |V (G)|
minimal. Then, n ≤ α(G)ω(G), ω(G) < χ(G) and ω(G′ ) = χ(G′ ) for each
induced subgraph G′ 6= G of G. Set ω = ω(G) and α = α(G) for simplicity. Our
first claim is:
Claim 1: There exist αω + 1 stable sets S0 , . . . , Sαω such that each vertex of G
is covered by exactly α of them.
Proof of the claim: Let S0 be a stable set of size α in G. For each node v ∈ S0 ,
as G\v is perfect (by the minimality assumption on G), χ(G\v) = ω(G\v) ≤ ω.
Hence, V \ {v} can be partitioned into ω stable sets. In this way we obtain a
collection of αω stable sets which together with S0 satisfy the claim.
Our next claim is:
Claim 2: For each i = 0, 1, . . . , αω, there exists a clique Ki of size ω such that
Ki ∩ Si = ∅ and Ki ∩ Sj 6= ∅ for j 6= i.
Proof of the claim: For each i = 0, 1, . . . , αω, as G \ S_i is perfect we have that χ(G\S_i) = ω(G\S_i) ≤ ω. This implies that χ(G\S_i) = ω since, if χ(G\S_i) ≤ ω − 1,
then one could color G with ω colors, contradicting our assumption on G. Hence
there exists a clique Ki disjoint from Si and with |Ki | = ω. Moreover Ki meets
all the other αω stable sets Sj for j 6= i. This follows from the fact that each
of the ω elements of Ki belongs to α stable sets among the Sj ’s (Claim 1) and
these ωα sets are pairwise distinct.
We can now conclude the proof. Define the matrices M, N ∈ R^{n×(αω+1)} whose columns are χ^{S_0}, . . . , χ^{S_{αω}} (the incidence vectors of the stable sets S_i) and χ^{K_0}, . . . , χ^{K_{αω}} (the incidence vectors of the cliques K_i), respectively. By Claim 2, we have that M^T N = J − I (where J is the all-ones matrix and I the identity). As J − I is nonsingular, we obtain that rank(M^T N) = rank(J − I) = αω + 1. On the other hand, rank(M^T N) ≤ rank N ≤ n. Thus we obtain that n ≥ αω + 1, contradicting our assumption on G.
3.2 Linear programming bounds
3.2.1 Fractional stable sets and colorings
Let ST(G) denote the polytope in RV defined as the convex hull of the characteristic vectors of the stable sets of G:
ST(G) = conv{χS : S ⊆ V, S is a stable set in G},
called the stable set polytope of G. Hence, computing α(G) is linear optimization
over the stable set polytope:
α(G) = max{eT x : x ∈ ST(G)}.
We have now defined the stable set polytope by listing explicitly its extreme
points. Alternatively, it can also be represented by its hyperplanes representation, i.e., in the form
ST(G) = {x ∈ RV : Ax ≤ b}
for some matrix A and some vector b. As computing the stability number is
a hard problem one cannot hope to find the full linear inequality description
of the stable set polytope (i.e., the explicit A and b). However some partial
information is known: several classes of valid inequalities for the stable set
polytope are known. For instance, if C is a clique of G, then the clique inequality
    x(C) = ∑_{i∈C} x_i ≤ 1     (3.2)

is valid for ST(G): any stable set can contain at most one vertex from the clique C. The clique inequalities define the polytope

    QST(G) = { x ∈ R^V : x ≥ 0, x(C) ≤ 1 ∀C clique of G }     (3.3)

and maximizing the linear function e^T x over it gives the parameter

    α*(G) = max{ e^T x : x ∈ QST(G) },     (3.4)
known as the fractional stability number of G. Clearly, QST(G) is a relaxation of
the stable set polytope:
    ST(G) ⊆ QST(G).     (3.5)

Analogously, χ*(G) denotes the fractional coloring number of G, defined by the following linear program:

    χ*(G) = min { ∑_{S stable in G} λ_S : ∑_{S stable in G} λ_S χ^S = e, λ_S ≥ 0 ∀S stable in G }.     (3.6)

If we add the constraint that all λ_S should be integral we obtain the coloring number of G. Thus, χ*(G) ≤ χ(G). In fact the fractional stability number of G coincides with the fractional coloring number of its complement: α*(G) = χ*(Ḡ), and it is nested between α(G) and χ(Ḡ).
Lemma 3.2.1. For any graph G, we have

    α(G) ≤ α*(G) = χ*(Ḡ) ≤ χ(Ḡ),     (3.7)

where χ*(Ḡ) is the optimum value of the linear program:

    min { ∑_{C clique of G} y_C : ∑_{C clique of G} y_C χ^C = e, y_C ≥ 0 ∀C clique of G }.     (3.8)
Proof. The inequality α(G) ≤ α*(G) in (3.7) follows from the inclusion (3.5), and the inequality χ*(Ḡ) ≤ χ(Ḡ) was observed above. We now show that α*(G) = χ*(Ḡ). For this, we first observe that in the linear program (3.4) the condition x ≥ 0 can be removed without changing the optimal value; that is,

    α*(G) = max{ e^T x : x(C) ≤ 1 ∀C clique of G }     (3.9)

(check it). Now, it suffices to observe that the dual LP of the above linear program (3.9) coincides with the linear program (3.8).
For instance, for an odd circuit C_{2n+1} (n ≥ 2), α*(C_{2n+1}) = (2n+1)/2 (check it) lies strictly between α(C_{2n+1}) = n and χ(C̄_{2n+1}) = n + 1.
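This value is easy to confirm with a small linear program (a sketch assuming scipy is installed); for an odd circuit the cliques are just the vertices and edges, so the clique inequalities of QST(G) reduce to the edge inequalities.

import numpy as np
from scipy.optimize import linprog

# Fractional stability number alpha*(C_5) of the 5-cycle via the LP (3.4).
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
A_ub = np.zeros((len(edges), n))
for r, (i, j) in enumerate(edges):
    A_ub[r, i] = A_ub[r, j] = 1.0          # x_i + x_j <= 1 for every edge
b_ub = np.ones(len(edges))

# linprog minimizes, so maximize e^T x by minimizing -e^T x, with 0 <= x <= 1.
res = linprog(c=-np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n)
print(-res.fun)   # 2.5 = (2n+1)/2 for n = 2, strictly between 2 and 3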
When G is a perfect graph, equality holds throughout in relation (3.7). As
we see in the next section, there is a natural extension of this result to weighted
graphs, which permits to show the equality ST(G) = QST(G) when G is a
perfect graph. Moreover, it turns out that this geometric property characterizes
perfect graphs.
3.2.2 Polyhedral characterization of perfect graphs
For any graph G, the fractional stable set polytope is a linear relaxation of the
stable set polytope: ST(G) ⊆ QST(G). Here we show a geometric characterization of perfect graphs: G is perfect if and only if both polytopes coincide:
ST(G) = QST(G).
The following operation of duplicating a node will be useful. Let G = (V, E)
be a graph and let v ∈ V . Add to G a new node, say v ′ , which is adjacent to v
and to all neighbours of v in G. In this way we obtain a new graph H, which
we say is obtained from G by duplicating v. Repeated duplicating is called
replicating.
Lemma 3.2.2. Let H arise from G by duplicating a node. If G is perfect then H
too is perfect.
Proof. First we show that α(H) = χ(H̄) if H arises from G by duplicating node v. Indeed, by construction, α(H) = α(G), which is equal to χ(Ḡ) since G is perfect. Now, if C_1, . . . , C_t are cliques in G that cover V with (say) v ∈ C_1, then C_1 ∪ {v′}, . . . , C_t are cliques in H covering V(H). This shows that χ(H̄) = χ(Ḡ), which implies that α(H) = χ(H̄).

From this we can conclude that, if H arises from G by duplicating a node v, then α(H′) = χ(H̄′) for any induced subgraph H′ of H, using induction on the number of nodes of G. Indeed, either H′ is an induced subgraph of G (if H′ does not contain both v and v′), or H′ is obtained by duplicating v in an induced subgraph of G; in both cases we have that α(H′) = χ(H̄′).

Hence, if H arises by duplicating a node in a perfect graph G, then H̄ is perfect which, by Theorem 3.1.3, implies that H is perfect.
Given node weights w ∈ R^V_+, we define the following weighted analogues of the (fractional) stability and chromatic numbers:

    α(G, w) = max_{x∈ST(G)} w^T x,   α*(G, w) = max_{x∈QST(G)} w^T x,

    χ*(Ḡ, w) = min_y { ∑_{C clique of G} y_C : ∑_{C clique of G} y_C χ^C = w, y_C ≥ 0 ∀C clique of G },

    χ(Ḡ, w) = min_y { ∑_{C clique of G} y_C : ∑_{C clique of G} y_C χ^C = w, y_C ∈ Z, y_C ≥ 0 ∀C clique of G }.

When w is the all-ones weight function, we find again α(G), α*(G), χ*(Ḡ) and χ(Ḡ), respectively. The following analogue of (3.7) holds for arbitrary node weights:

    α(G, w) ≤ α*(G, w) = χ*(Ḡ, w) ≤ χ(Ḡ, w).     (3.10)

Lemma 3.2.3. Let G be a perfect graph and let w ∈ Z^V_≥0 be nonnegative integer node weights. Then, α(G, w) = χ(Ḡ, w).
Proof. Let H denote the graph obtained from the complement Ḡ by duplicating node i w_i times if w_i ≥ 1 and deleting node i if w_i = 0. Then, by construction, ω(H) = α(G, w), which is equal to χ(H) since H is perfect (by Lemma 3.2.2, applied to Ḡ). Say S̃_1, . . . , S̃_t are t = χ(H) stable sets in H partitioning V(H). Each stable set S̃_k corresponds to a stable set S_k in Ḡ, i.e., to a clique of G (since S̃_k contains at most one of the w_i copies of each node i of G). Now, these cliques S_1, . . . , S_t of G have the property that each node i of G belongs to exactly w_i of them, which shows that χ(Ḡ, w) ≤ t = χ(H). This implies that χ(Ḡ, w) ≤ χ(H) = α(G, w) which, together with (3.10), gives equality χ(Ḡ, w) = α(G, w).
We will also use the following geometric property of down-monotone polytopes. A polytope P ⊆ Rn≥0 is said to be down-monotone if x ∈ P and 0 ≤ y ≤ x
(coordinate-wise) implies y ∈ P .
Lemma 3.2.4. Let P, Q ⊆ Rn be polytopes such that P ⊆ Q.
(i) P = Q if and only if the following equality holds for all weights w ∈ R^n:

    max_{x∈P} w^T x = max_{x∈Q} w^T x.     (3.11)
(ii) Assume that P ⊆ Q ⊆ Rn≥0 are down-monotone. Then P = Q if and only if
(3.11) holds for all nonnegative weights w ∈ Rn≥0 .
Moreover, in (i) and (ii) it suffices to show that (3.11) holds for integer weights w.
Proof. (i) The ‘only if’ part is clear. The ‘if part’ follows using the ‘hyperplane
separation’ theorem: Assume that P ⊂ Q and that there exists z ∈ Q \ P . Then
there exists a hyperplane separating z from P , i.e., there exists a nonzero vector
w ∈ Rn and a scalar w0 ∈ R such that wT z > w0 and wT x ≤ w0 for all x ∈ P .
These two facts contradict the condition (3.11).
(ii) The ‘only if’ part is clear. For the ‘if part’, it suffices to show that the equality
(3.11) holds for all weights w if it holds for all nonnegative weights w′ . This
follows from the following claim (applied to both P and Q).
Claim: Let P ⊆ Rn≥0 be a down-monotone polytope, let w ∈ Rn and define the nonnegative vector w′ ∈ Rn≥0 by wi′ = max{wi , 0} for i ∈ [n]. Then,
maxx∈P wT x = maxx∈P (w′ )T x.
Proof of the claim: Suppose x ∈ P maximizes wT x over P ; we claim that xi = 0
at all positions i for which wi < 0. Indeed, if xi > 0 and wi < 0 then, by setting
yi = 0 and yj = xj for j 6= i, one obtains another point y ∈ P (since 0 ≤ y ≤ x
and P is down-monotone) with wT y > wT x. Therefore, wT x = (w′ )T x and thus
x maximizes w′ over P .
The last part of the lemma follows using a continuity argument (if (3.11) holds
for all integer weights w, it holds for all rational weights (by scaling) and thus
for all real weights (taking limits)).
We can now show the following geometric characterization of perfect graphs,
due to Chvátal [3].
Theorem 3.2.5. [3] A graph G is perfect if and only if ST(G) = QST(G).
Proof. First assume that G is perfect, we show that ST(G) = QST(G). As ST(G)
and QST(G) are down-monotone in RV≥0 , we can apply Lemma 3.2.4. Hence, it
suffices to show that, for any w ∈ ZV≥0 , α(G, w) = maxx∈ST(G) wT x is equal to
α∗ (G, w) = maxx∈QST(G) wT x, which follows from Lemma 3.2.3 (applied to G).
Conversely, assume that ST(G) = QST(G) and that G is not perfect. Pick a
minimal subset U ⊆ V for which the subgraph G′ of G induced by U satisfies
α(G′ ) < χ(G′ ). Setting w = χU , we have that α(G′ ) = α(G, w) which, by
assumption, is equal to maxx∈QST(G) wT x = α∗ (G, w). Consider the dual of the
linear program defining α∗ (G, w) with an optimal solution y = (yC ). Pick a
clique C of G for which yC > 0. Using complementary slackness, we deduce
that x(C) = 1 for any optimal solution x ∈ QST(G) and thus |C ∩ S| = 1 for
any maximum cardinality stable set S ⊆ U . Let G′′ denote the subgraph of G
induced by U \ C. Then, α(G′′ ) ≤ α(G′ ) − 1 < χ(G′ ) − 1 ≤ χ(G′′ ), which
contradicts the minimality assumption made on U .
When G is a perfect graph, an explicit linear inequality description is known
for its stable set polytope, given by the clique inequalities. However, it is not
clear how to use this information in order to give an efficient algorithm for
optimizing over the stable set polytope of a perfect graph. As we see later in
Section 3.5 there is yet another description of ST(G) – in terms of semidefinite
programming, using the theta body TH(G) – that will allow to give an efficient
algorithm.
3.3 Semidefinite programming bounds
3.3.1 The theta number
Definition 3.3.1. Given a graph G = (V, E), consider the following semidefinite
program
    max_{X∈S^n} { ⟨J, X⟩ : Tr(X) = 1, X_ij = 0 ∀{i, j} ∈ E, X ⪰ 0 }.     (3.12)
Its optimal value is denoted as ϑ(G), and called the theta number of G.
This parameter was introduced by Lovász [8]. He proved the following simple, but crucial result – called the Sandwich Theorem by Knuth [6] – which
shows that ϑ(G) provides a bound for both the stability number of G and the
chromatic number of the complementary graph Ḡ.

Theorem 3.3.2. (Lovász’ sandwich theorem) For any graph G, we have that

    α(G) ≤ ϑ(G) ≤ χ(Ḡ).
Proof. Given a stable set S of cardinality |S| = α(G), define the matrix
X = (1/|S|) χ^S (χ^S)^T ∈ S^n.
Then X is feasible for (3.12) with objective value ⟨J, X⟩ = |S| (check it). This
shows the inequality α(G) ≤ ϑ(G).
Now, consider a matrix X feasible for the program (3.12) and a partition of
V into k cliques: V = C1 ∪ · · · ∪ Ck. Our goal is now to show that ⟨J, X⟩ ≤ k,
which will imply ϑ(G) ≤ χ(Ḡ). For this, using the relation e = ∑_{i=1}^k χ^{Ci}, observe that
Y := ∑_{i=1}^k (kχ^{Ci} − e)(kχ^{Ci} − e)^T = k² ∑_{i=1}^k χ^{Ci}(χ^{Ci})^T − kJ.
Moreover,
⟨X, ∑_{i=1}^k χ^{Ci}(χ^{Ci})^T⟩ = Tr(X).
Indeed the matrix ∑_i χ^{Ci}(χ^{Ci})^T has all its diagonal entries equal to 1 and it
has zero off-diagonal entries outside the edge set of G, while X has zero off-diagonal entries on the edge set of G. As X, Y ⪰ 0, we obtain
0 ≤ ⟨X, Y⟩ = k² Tr(X) − k⟨J, X⟩
and thus ⟨J, X⟩ ≤ k Tr(X) = k.
An alternative argument for the inequality ϑ(G) ≤ χ(Ḡ), showing an even
more transparent link to coverings by cliques, will be given in the paragraph
after the proof of Lemma 3.4.2.
3.3.2 Computing maximum stable sets in perfect graphs
Assume that G is a graph satisfying α(G) = χ(Ḡ). Then, as a direct application of Theorem 3.3.2, α(G) = χ(Ḡ) = ϑ(G) can be computed by solving the
semidefinite program (3.12): it suffices to solve this semidefinite program with
precision ε < 1/2, as one can then find α(G) by rounding the optimal value to
the nearest integer. In particular, combining with the perfect graph theorem
(Theorem 3.1.3):
Theorem 3.3.3. If G is a perfect graph then α(G) = χ(Ḡ) = ϑ(G) and ω(G) = χ(G) = ϑ(Ḡ).
Hence one can compute the stability number and the chromatic number in
polynomial time for perfect graphs. Moreover, one can also find a maximum
stable set and a minimum coloring in polynomial time for perfect graphs. We
now indicate how to construct a maximum stable set – we deal with minimum
graph colorings in the next section.
Let G = (V, E) be a perfect graph. Order the nodes of G as v1 , · · · , vn . Then
we construct a sequence of induced subgraphs G0 , G1 , · · · , Gn of G. Hence each
Gi is perfect, also after removing a node, so that we can compute in polynomial
time the stability number of such graphs. The construction goes as follows: Set
G0 = G. For each i = 1, · · · , n do the following:
1. Compute α(Gi−1 \vi ).
2. If α(Gi−1 \vi ) = α(G), then set Gi = Gi−1 \vi .
3. Otherwise, set Gi = Gi−1 .
By construction, α(Gi ) = α(G) for all i. In particular, α(Gn ) = α(G). Moreover,
the node set of the final graph Gn is a stable set and, therefore, it is a maximum
stable set of G. Indeed, if the node set of Gn is not stable then it contains a
node vi for which α(Gn \vi ) = α(Gn ). But then, as Gn is an induced subgraph
of Gi−1 , one would have that α(Gn \vi ) ≤ α(Gi−1 \vi ) and thus α(Gi−1 \vi ) =
α(G), so that node vi would have been removed at Step 2.
Hence, the above algorithm permits to construct a maximum stable set in
a perfect graph G in polynomial time – namely by solving n + 1 semidefinite
programs for computing α(G) and α(Gi−1 \vi ) for i = 1, · · · , n.
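The node-deletion procedure just described can be sketched in code as follows, assuming G is perfect so that α(H) is obtained by rounding ϑ(H) for every induced subgraph H; the helper theta is the hypothetical solver sketched after (3.12):

def alpha(nodes, edges):
    # stability number of the subgraph induced by 'nodes' (valid when G is perfect)
    index = {v: k for k, v in enumerate(sorted(nodes))}
    sub = [(index[i], index[j]) for (i, j) in edges if i in nodes and j in nodes]
    return round(theta(len(nodes), sub)) if nodes else 0

def maximum_stable_set(n, edges):
    nodes, target = set(range(n)), alpha(set(range(n)), edges)
    for v in range(n):
        if alpha(nodes - {v}, edges) == target:   # v can be removed (Step 2)
            nodes = nodes - {v}
    return nodes   # the remaining nodes form a maximum stable set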
More generally, given integer node weights w ∈ ZV≥0 , the above algorithm
can also be used to find a stable set S of maximum weight w(S). For this,
construct the new graph G′ in the following way: Duplicate each node i ∈ V wi
times, i.e., replace node i ∈ V by a set Wi of wi nodes pairwise non-adjacent,
and make two nodes x ∈ Wi and y ∈ Wj adjacent if i and j are adjacent in
G. By Lemma 3.2.2, the graph G′ is perfect. Moreover, α(G′ ) is equal to the
maximum weight w(S) of a stable set S in G. From this it follows that, if the
weights wi are bounded by a polynomial in n, then one can compute α(G, w)
in polynomial time. (More generally, one can compute α(G, w) in polynomial
time, e.g. by optimizing the linear function wT x over the theta body TH(G),
introduced in Section 3.5 below.)
3.3.3 Minimum colorings of perfect graphs
We now describe an algorithm for computing a minimum coloring of a perfect
graph G in polynomial time. This will be reduced to several computations of
the theta number which we will use for computing the clique number of some
induced subgraphs of G.
Let G = (V, E) be a perfect graph. Call a clique of G maximum if it has
maximum cardinality ω(G). The crucial observation is that it suffices to find a
stable set S in G which meets all maximum cliques.
First of all, such a stable set S exists: in a ω(G)-coloring, any color class S
must meet all maximum cliques, since ω(G \ S) = χ(G \ S) = ω(G) − 1.
Now, if we have found such a stable set S, then one can recursively color
G\S with ω(G\S) = ω(G) − 1 colors (in polynomial time), and thus one obtains
a coloring of G with ω(G) colors.
The algorithm goes as follows: For t ≥ 1, we grow a list L of t maximum
cliques C1 , · · · , Ct . Suppose C1 , · · · , Ct have been found. Then do the following:
1. We find a stable set S meeting each of the cliques C1 , · · · , Ct (see below).
2. Compute ω(G\S).
3. If ω(G\S) < ω(G) then S meets all maximum cliques and we are done.
4. Otherwise, compute a maximum clique Ct+1 in G\S, which is thus a new
maximum clique of G, and we add it to the list L.
The first step can be done as follows: Set w = ∑_{i=1}^t χ^{Ci} ∈ Z^V_{≥0}. As G is
perfect, we know that α(G, w) = χ̄(G, w), which in turn is equal to t. (Indeed,
χ̄(G, w) ≤ t follows from the definition of w. Moreover, if y = (yC) is feasible
for the program defining χ̄(G, w) then, on the one hand, w^T e ≤ ∑_C yC |C| ≤ ω(G) ∑_C yC
and, on the other hand, w^T e = tω(G), thus implying t ≤ χ̄(G, w).)
Now we compute a stable set S having maximum possible weight w(S). Hence,
w(S) = t and thus S meets each of the cliques C1, · · · , Ct.
The above algorithm has polynomial running time, since the number of iterations is bounded by |V|. To see this, consider the affine space Lt ⊆ R^V defined
by the equations x(C1) = 1, · · · , x(Ct) = 1 corresponding to the cliques in the
current list L. Then Lt strictly contains Lt+1, since χ^S ∈ Lt \ Lt+1 for the set S
constructed in the first step, and thus the dimension decreases by at least 1 at
each iteration.
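The loop just described can be sketched as follows; omega, max_clique and max_weight_stable_set are assumed oracles (for perfect graphs they can all be realized via theta number computations, as discussed above), and the graph interface is hypothetical:

def stable_set_meeting_all_maximum_cliques(G):
    w_G = omega(G)                              # clique number of G
    cliques = [max_clique(G)]                   # the list L of maximum cliques
    while True:
        weights = {v: sum(v in C for C in cliques) for v in G.nodes}
        S = max_weight_stable_set(G, weights)   # has weight t = len(cliques)
        if omega(G.without(S)) < w_G:
            return S                            # S meets all maximum cliques
        cliques.append(max_clique(G.without(S)))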
3.4 Other formulations of the theta number
3.4.1 Dual formulation
We now give several equivalent formulations for the theta number obtained by
applying semidefinite programming duality and some further elementary manipulations.
Lemma 3.4.1. The theta number can be expressed by any of the following programs:
ϑ(G) = min_{t∈R, A∈S^n} { t : tI + A − J ⪰ 0, Aij = 0 (i = j or {i,j} ∈ Ē) },   (3.13)
ϑ(G) = min_{t∈R, B∈S^n} { t : tI − B ⪰ 0, Bij = 1 (i = j or {i,j} ∈ Ē) },   (3.14)
ϑ(G) = min_{t∈R, C∈S^n} { t : C − J ⪰ 0, Cii = t (i ∈ V), Cij = 0 ({i,j} ∈ Ē) },   (3.15)
ϑ(G) = min_{B∈S^n} { λmax(B) : Bij = 1 (i = j or {i,j} ∈ Ē) }.   (3.16)
Proof. First we build the dual of the semidefinite program (3.12), which reads:
min_{t∈R, y∈R^E} { t : tI + ∑_{{i,j}∈E} yij Eij − J ⪰ 0 }.   (3.17)
As both programs (3.12) and (3.17) are strictly feasible, there is no duality gap:
the optimal value of (3.17) is equal to ϑ(G), and the optimal values are attained
in both programs – here we have applied the duality theorem (Theorem 2.1.6).
Setting A = ∑_{{i,j}∈E} yij Eij, B = J − A and C = tI + A in (3.17), it follows
that the program (3.17) is equivalent to each of the programs (3.13), (3.14)
and (3.15). Finally the formulation (3.16) follows directly from (3.14) after
recalling that λmax(B) is the smallest scalar t for which tI − B ⪰ 0.
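For a numerical check of the eigenvalue formulation (3.16), here is a minimal sketch using cvxpy and its lambda_max atom (again, the package is an assumption of this illustration):

import cvxpy as cp
from itertools import combinations

def theta_dual(n, edges):
    # (3.16): minimize lambda_max(B) over symmetric B with B_ij = 1
    # whenever i = j or {i,j} is a non-edge.
    B = cp.Variable((n, n), symmetric=True)
    E = {frozenset(e) for e in edges}
    cons = [B[i, i] == 1 for i in range(n)]
    cons += [B[i, j] == 1 for i, j in combinations(range(n), 2)
             if frozenset((i, j)) not in E]
    prob = cp.Problem(cp.Minimize(cp.lambda_max(B)), cons)
    prob.solve()
    return prob.value   # equals theta(G), up to solver accuracy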
3.4.2 Two more (lifted) formulations
We give here two more formulations for the theta number. They rely on semidefinite programs involving symmetric matrices of order 1 + n which we will index
by the set {0} ∪ V , where 0 is an additional index that does not belong to V .
Lemma 3.4.2. The theta number ϑ(G) is equal to the optimal value of the following semidefinite program:
min_{Z∈S^{n+1}} { Z00 : Z ⪰ 0, Z0i = Zii = 1 (i ∈ V), Zij = 0 ({i,j} ∈ Ē) }.   (3.18)
Proof. We show that the two semidefinite programs in (3.13) and (3.18) are
equivalent. For this, observe that
tI + A − J ⪰ 0 ⟺ Z := [ t , e^T ; e , I + (1/t)A ] ⪰ 0,
which follows by taking the Schur complement of the upper left corner t in the
block matrix Z. Hence, if (t, A) is feasible for (3.13), then Z is feasible for
(3.18) with the same objective value: Z00 = t. The construction can be reversed:
if Z is feasible for (3.18), then one can construct (t, A) feasible for (3.13) with
t = Z00. Hence both programs are equivalent.
From the formulation (3.18), the link of the theta number to the (fractional)
chromatic number of the complementary graph is even more transparent.
Lemma 3.4.3. For any graph G, we have that ϑ(G) ≤ χ*(Ḡ).
Proof. Let y = (yC) be feasible for the linear program (3.8) defining χ*(Ḡ). For
each clique C of G define the vector zC = (1, χ^C) ∈ R^{1+n}, obtained by appending
an entry equal to 1 to the characteristic vector of C. Define the matrix
Z = ∑_{C clique of G} yC zC zC^T. One can verify that Z is feasible for the program (3.18)
with objective value Z00 = ∑_C yC (check it). This shows ϑ(G) ≤ χ*(Ḡ).
Applying duality to the semidefinite program (3.18), we obtain the following formulation for ϑ(G).¹
¹ Of course there is more than one road leading to Rome: one can also show directly the equivalence of the two programs (3.12) and (3.19).
Lemma 3.4.4. The theta number ϑ(G) is equal to the optimal value of the following semidefinite program:
max_{Y∈S^{n+1}} { ∑_{i∈V} Yii : Y ⪰ 0, Y00 = 1, Y0i = Yii (i ∈ V), Yij = 0 ({i,j} ∈ E) }.   (3.19)
Proof. First we write the program (3.18) in standard form, using the elementary
matrices Eij (with entries 1 at positions (i,j) and (j,i) and 0 elsewhere):
inf { ⟨E00, Z⟩ : ⟨Eii, Z⟩ = 1, ⟨E0i, Z⟩ = 2 (i ∈ V), ⟨Eij, Z⟩ = 0 ({i,j} ∈ Ē), Z ⪰ 0 }.
Next we write the dual of this sdp:
sup { ∑_{i∈V} (yi + 2zi) : Y = E00 − ∑_{i∈V} yi Eii − ∑_{i∈V} zi E0i − ∑_{{i,j}∈Ē} uij Eij ⪰ 0 }.
Observe now that the matrix Y ∈ S^{n+1} occurring in this program can be equivalently characterized by the conditions: Y00 = 1, Yij = 0 if {i,j} ∈ E and Y ⪰ 0.
Moreover the objective function reads: ∑_{i∈V} (yi + 2zi) = −∑_{i∈V} (Yii + 2Y0i).
Therefore the dual can be equivalently reformulated as
max { −∑_{i∈V} (Yii + 2Y0i) : Y ⪰ 0, Y00 = 1, Yij = 0 ({i,j} ∈ E) }.   (3.20)
As (3.18) is strictly feasible (check it) there is no duality gap, the optimal value
of (3.20) is attained and it is equal to ϑ(G).
Let Y be an optimal solution of (3.20). We claim that Y0i + Yii = 0 for all
i ∈ V. Indeed, assume that Y0i + Yii ≠ 0 for some i ∈ V. Then, Yii ≠ 0. Let us
multiply the i-th column and the i-th row of the matrix Y by the scalar −Y0i/Yii. In
this way we obtain a new matrix Y′ which is still feasible for (3.20), but now
with a better objective value: Indeed, by construction, Y′ii = Yii (Y0i/Yii)² = Y0i²/Yii
and Y′0i = −(Y0i/Yii) Y0i = −Y0i²/Yii. Moreover, the i-th term in the new objective
value is
−(Y′ii + 2Y′0i) = Y0i²/Yii > −(Yii + 2Y0i).
This contradicts optimality of Y and thus we have shown that Y0i = −Yii for
all i ∈ V. Therefore, we can add w.l.o.g. the condition Y0i = −Yii (i ∈ V) to
(3.20), so that its objective function can be replaced by ∑_{i∈V} Yii.
Finally, to get the program (3.19), it suffices to observe that one can change
the signs on the first row and column of Y (indexed by the index 0). In this way
we obtain a matrix Ỹ such that Ỹ0i = −Y0i for all i and Ỹij = Yij at all other
positions. Thus Ỹ now satisfies the conditions Ỹii = Ỹ0i for i ∈ V and it is an
optimal solution of (3.19).
3.5 The theta body TH(G)
It is convenient to introduce the following set of matrices Y ∈ S^{n+1}, where
columns and rows are indexed by the set {0} ∪ V:
MG = { Y ∈ S^{n+1} : Y00 = 1, Y0i = Yii (i ∈ V), Yij = 0 ({i,j} ∈ E), Y ⪰ 0 },   (3.21)
which is thus the feasible region of the semidefinite program (3.19). Now let
TH(G) denote the convex set obtained by projecting the set MG onto the subspace R^V of the diagonal entries:
TH(G) = { x ∈ R^V : ∃Y ∈ MG such that xi = Yii ∀i ∈ V },   (3.22)
called the theta body of G. It turns out that TH(G) is nested between ST(G) and
QST(G).
Lemma 3.5.1. For any graph G, we have that ST(G) ⊆ TH(G) ⊆ QST(G).
Proof. The inclusion ST(G) ⊆ TH(G) follows from the fact that the characteristic vector of any stable set S lies in TH(G). To see this, define the vector
y = (1 χS ) ∈ Rn+1 obtained by adding an entry equal to 1 to the characteristic vector of S, and define the matrix Y = yy T ∈ S n+1 . Then Y ∈ MG and
χS = (Yii )i∈V , which shows that χS ∈ TH(G).
We now show the inclusion TH(G) ⊆ QST(G). For this take x ∈ TH(G) and
let Y ∈ MG such that x = (Yii )i∈V . Then x ≥ 0 (as the diagonal entries of
a psd matrix are nonnegative). Moreover, for any clique C of G, we have that
x(C) ≤ 1 (cf. Exercise 1.1).
In view of Lemma 3.4.4, maximizing the all-ones objective function over
TH(G) gives the theta number:
ϑ(G) = max_{x∈R^V} { e^T x : x ∈ TH(G) }.
As maximizing eT x over QST(G) gives the LP bound α∗ (G), Lemma 3.5.1 implies directly that the SDP bound ϑ(G) dominates the LP bound α∗ (G):
Corollary 3.5.2. For any graph G, we have that α(G) ≤ ϑ(G) ≤ α∗ (G).
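In the same spirit, optimizing an arbitrary linear objective w^T x over TH(G) amounts to solving the lifted program (3.19) with the all-ones objective replaced by w. A minimal sketch using cvxpy (the package is an assumption of this illustration):

import cvxpy as cp

def optimize_over_theta_body(n, edges, w):
    # max sum_i w_i Y_ii over the set M_G of (3.21); the optimal diagonal (Y_ii)
    # is a maximizer of w^T x over TH(G).
    Y = cp.Variable((n + 1, n + 1), symmetric=True)
    cons = [Y >> 0, Y[0, 0] == 1]
    cons += [Y[0, i + 1] == Y[i + 1, i + 1] for i in range(n)]
    cons += [Y[i + 1, j + 1] == 0 for (i, j) in edges]
    prob = cp.Problem(cp.Maximize(sum(w[i] * Y[i + 1, i + 1] for i in range(n))), cons)
    prob.solve()
    return prob.value

For a perfect graph this value equals α(G, w), which is how the weighted stable set problem mentioned at the end of Section 3.3.2 can be solved.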
Combining the inclusion from Lemma 3.5.1 with Theorem 3.2.5, we deduce
that TH(G) = ST(G) = QST(G) for perfect graphs. It turns out that these
equalities characterize perfect graphs.
Theorem 3.5.3. (see [2]) For any graph G the following assertions are equivalent.
1. G is perfect.
2. TH(G) = ST(G)
3. TH(G) = QST(G).
4. TH(G) is a polytope.
We also mention the following beautiful relationship between the theta bodies of a graph G and of its complementary graph Ḡ:
Theorem 3.5.4. For any graph G,
TH(Ḡ) = { x ∈ R^V_{≥0} : x^T z ≤ 1 ∀z ∈ TH(G) }.
In other words, we know an explicit linear inequality description of TH(Ḡ);
moreover, the normal vectors to the supporting hyperplanes of TH(Ḡ) are precisely the elements of TH(G). One inclusion is easy:
Lemma 3.5.5. If x ∈ TH(G) and z ∈ TH(Ḡ) then x^T z ≤ 1.
Proof. Let Y ∈ MG and Z ∈ M_{Ḡ} such that x = (Yii) and z = (Zii). Let Z′
be obtained from Z by changing signs in its first row and column (indexed by
0). Then ⟨Y, Z′⟩ ≥ 0 as Y, Z′ ⪰ 0. Moreover, ⟨Y, Z′⟩ = 1 − x^T z (check it), thus
giving x^T z ≤ 1.
3.6 The theta number for vertex-transitive graphs
First we mention an inequality relating the theta numbers of a graph and its
complement. (You will show it in Exercise 3.1.)
Proposition 3.6.1. For any graph G = (V, E), we have that ϑ(G)ϑ(Ḡ) ≥ |V|.
We now show that the equality ϑ(G)ϑ(Ḡ) = |V| holds for certain symmetric
graphs, namely for vertex-transitive graphs. In order to show this, one exploits
in a crucial manner the symmetry of G, which makes it possible to show that the semidefinite program defining the theta number has an optimal solution with a special
(symmetric) structure. We need to introduce some definitions.
Let G = (V, E) be a graph. A permutation σ of the node set V is called an
automorphism of G if it preserves edges, i.e., {i, j} ∈ E implies {σ(i), σ(j)} ∈ E.
Then the set Aut(G) of automorphisms of G is a group. The graph G is said to
be vertex-transitive if for any two nodes i, j ∈ V there exists an automorphism
σ ∈ Aut(G) mapping i to j: σ(i) = j.
The group of permutations of V acts on symmetric matrices X indexed by
V . Namely, if σ is a permutation of V and Pσ is the corresponding permutation
matrix (with Pσ (i, j) = 1 if j = σ(i) and 0 otherwise), then one can build the
new symmetric matrix
σ(X) := Pσ XPσT = (Xσ(i),σ(j) )i,j∈V .
If σ is an automorphism of G, then it preserves the feasible region of the
semidefinite program (3.12) defining the theta number ϑ(G). This is an easy,
but very useful fact, which is based on the convexity of the feasible region.
Lemma 3.6.2. If X is feasible for the program (3.12) and σ is an automorphism
of G, then σ(X) is again feasible for (3.12), moreover with the same objective
value as X.
Proof. Directly from the fact that hJ, σ(X)i = hJ, Xi, Tr(σ(X)) = Tr(X) and
σ(X)ij = Xσ(i)σ(j) = 0 if {i, j} ∈ E (since σ is an automorphism of G).
Lemma 3.6.3. The program (3.12) has an optimal solution X ∗ which is invariant
under action of the automorphism group of G, i.e., satisfies σ(X ∗ ) = X ∗ for all
σ ∈ Aut(G).
Proof. Let X be an optimal solution of (3.12). By Lemma 3.6.2, σ(X) is again
an optimal solution for each σ ∈ Aut(G). Define the matrix
X* = (1/|Aut(G)|) ∑_{σ∈Aut(G)} σ(X),
obtained by averaging over all matrices σ(X) for σ ∈ Aut(G). As the set of
optimal solutions of (3.12) is convex, X* is still an optimal solution of (3.12).
Moreover, by construction, X* is invariant under the action of Aut(G).
Corollary 3.6.4. If G is a vertex-transitive graph then the program (3.12) has an
optimal solution X* satisfying X*ii = 1/n for all i ∈ V and X* e = (ϑ(G)/n) e.
Proof. By Lemma 3.6.3, there is an optimal solution X* which is invariant
under the action of Aut(G). As G is vertex-transitive, all diagonal entries of X*
are equal: Indeed, let i, j ∈ V and σ ∈ Aut(G) such that σ(i) = j. Then,
X*jj = X*σ(i)σ(i) = X*ii. As Tr(X*) = 1 we must have X*ii = 1/n for all i. Analogously, the invariance of X* implies that ∑_{k∈V} X*ik = ∑_{k∈V} X*jk for all i, j,
i.e., X* e = λe for some scalar λ. Combining with the condition ⟨J, X*⟩ = ϑ(G)
we obtain that λ = ϑ(G)/n.
Proposition 3.6.5. If G is a vertex-transitive graph, then ϑ(G)ϑ(Ḡ) = |V|.
Proof. By Corollary 3.6.4, there is an optimal solution X* of the program (3.12)
defining ϑ(G) which satisfies X*ii = 1/n for i ∈ V and X* e = (ϑ(G)/n) e. Then
(n²/ϑ(G)) X* − J ⪰ 0 (check it). Hence, t = n/ϑ(G) and C = (n²/ϑ(G)) X* define a feasible
solution of the program (3.15) defining ϑ(Ḡ), which implies ϑ(Ḡ) ≤ n/ϑ(G).
Combining with Proposition 3.6.1 we get the equality ϑ(G)ϑ(Ḡ) = |V|.
For instance, the cycle Cn is vertex-transitive, so that
ϑ(Cn)ϑ(C̄n) = n.   (3.23)
In particular, as C̄5 is isomorphic to C5, we deduce that
ϑ(C5) = √5.   (3.24)
For n even, Cn is bipartite (and thus perfect), so that ϑ(Cn) = α(Cn) = n/2
and ϑ(C̄n) = ω(Cn) = 2. For n odd, one can compute ϑ(Cn) using the above
symmetry reduction:
Proposition 3.6.6. For any odd n ≥ 3,
ϑ(Cn) = n cos(π/n)/(1 + cos(π/n))   and   ϑ(C̄n) = (1 + cos(π/n))/cos(π/n).
Proof. As ϑ(Cn)ϑ(C̄n) = n, it suffices to compute ϑ(Cn). We use the formulation
(3.16). As Cn is vertex-transitive, there is an optimal solution B whose entries
are all equal to 1, except Bij = 1 + x for some scalar x whenever |i − j| = 1
(modulo n). In other words, B = J + x A_{Cn}, where A_{Cn} is the adjacency
matrix of the cycle Cn. Thus ϑ(Cn) is equal to the minimum value of λmax(B)
over all possible x. The eigenvalues of A_{Cn} are known: they are ω^k + ω^{−k}
(for k = 0, 1, · · · , n − 1), where ω = e^{2iπ/n} is an n-th root of unity. Hence the
eigenvalues of B are
n + 2x   and   x(ω^k + ω^{−k}) for k = 1, · · · , n − 1.   (3.25)
We minimize the maximum of the values in (3.25) by choosing x such that
n + 2x = −2x cos(π/n)
(check it). This gives ϑ(Cn) = λmax(B) = −2x cos(π/n) = n cos(π/n)/(1 + cos(π/n)).
3.7 Bounding the Shannon capacity
The theta number was introduced by Lovász [8] in connection with the problem
of computing the Shannon capacity of a graph, a problem in coding theory
considered by Shannon. We need some definitions.
Definition 3.7.1. (Strong product) Let G = (V, E) and H = (W, F ) be two
graphs. Their strong product is the graph denoted as G · H with node set V × W
and with edges the pairs of distinct nodes {(i, r), (j, s)} ∈ V × W with (i = j or
{i, j} ∈ E) and (r = s or {r, s} ∈ F ).
If S ⊆ V is stable in G and T ⊆ W is stable in H then S × T is stable in
G · H. Hence, α(G · H) ≥ α(G)α(H). Let Gk denote the strong product of k
copies of G, we have that
α(Gk ) ≥ (α(G))k .
Based on this, one can verify that
Θ(G) := sup_{k≥1} α(G^k)^{1/k} = lim_{k→∞} α(G^k)^{1/k}.   (3.26)
The parameter Θ(G) was introduced by Shannon in 1956; it is called the Shannon capacity of the graph G. The motivation is as follows. Suppose V is a finite
alphabet, where some pairs of letters could be confused when they are transmitted over some transmission channel. These pairs of confusable letters can
be seen as the edge set E of a graph G = (V, E). Then the stability number of
G is the largest number of one-letter messages that can be sent without danger of confusion. Words of length k correspond to k-tuples in V k . Two words
(i1 , · · · , ik ) and (j1 , · · · , jk ) can be confused if at every position h ∈ [k] the two
letters ih and jh are equal or can be confused, which corresponds to having an
edge in the strong product Gk . Hence the largest number of words of length k
that can be sent without danger of confusion is equal to the stability number of
Gk and the Shannon capacity of G represents the rate of correct transmission of
the graph.
For instance, for the 5-cycle C5, α(C5) = 2, but α((C5)²) ≥ 5. Indeed,
if 1, 2, 3, 4, 5 are the nodes of C5 (in this cyclic order), then the five 2-letter
words (1,1), (2,3), (3,5), (4,2), (5,4) form a stable set in (C5)². This implies that
Θ(C5) ≥ √5.
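One can verify this directly; here is a small sketch in Python checking that these five words are pairwise non-adjacent in the strong product (C5)²:

from itertools import combinations

edges_c5 = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)}

def confusable(a, b):   # equal or adjacent in C5
    return a == b or (a, b) in edges_c5 or (b, a) in edges_c5

words = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]
print(all(not (confusable(u[0], v[0]) and confusable(u[1], v[1]))
          for u, v in combinations(words, 2)))   # True: a stable set of size 5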
Determining the exact Shannon capacity of a graph is a very difficult problem in general, even for small graphs. For instance, the exact value of the
Shannon capacity of C5 was not known until Lovász [8] showed how to use the
theta number in order to upper bound the Shannon capacity: Lovász showed
that Θ(G) ≤ ϑ(G) and ϑ(C5) = √5, which implies that Θ(C5) = √5. On the other
hand, although the exact value of the theta number of C2n+1 is known (cf.
Proposition 3.6.6), the exact value of the Shannon capacity of C2n+1 is not
known, already for C7.
Theorem 3.7.2. For any graph G, we have that Θ(G) ≤ ϑ(G).
The proof is based on the multiplicative property of the theta number from
Lemma 3.7.3 – which you will prove in Exercise 3.2 – combined with the fact
that the theta number upper bounds the stability number: For any integer k,
α(G^k) ≤ ϑ(G^k) = (ϑ(G))^k implies α(G^k)^{1/k} ≤ ϑ(G) and thus Θ(G) ≤ ϑ(G).
Lemma 3.7.3. The theta number of the strong product of two graphs G and H
satisfies ϑ(G · H) = ϑ(G)ϑ(H).
As an application one can compute the Shannon capacity of any graph G
which is vertex-transitive and self-complementary (like, e.g., C5).
Theorem 3.7.4. If G = (V, E) is a vertex-transitive graph, then Θ(G · Ḡ) = |V|.
If, moreover, G is self-complementary, then Θ(G) = √|V|.
Proof. We have Θ(G · Ḡ) ≥ α(G · Ḡ) ≥ |V|, since the set of diagonal pairs
{(i, i) : i ∈ V} is stable in G · Ḡ. The reverse inequality follows from Lemma
3.7.3 combined with Proposition 3.6.5: Θ(G · Ḡ) ≤ ϑ(G · Ḡ) = ϑ(G)ϑ(Ḡ) = |V|.
If G is isomorphic to Ḡ then Θ(G · Ḡ) = Θ(G²) = (Θ(G))² (check the rightmost
equality). This gives Θ(G) = √|V|.
3.8 Geometric application
Definition 3.8.1. A set of vectors u1, . . . , un ∈ R^d is called a vector labeling of
G if they are unit vectors: ∥ui∥ = 1 for all i ∈ V, and satisfy the orthogonality
conditions:
ui^T uj = 0 ∀{i,j} ∈ E.
In other words, they form an orthonormal representation of Ḡ (with the notation
of Exercise 3.3).
We are interested in the following graph parameter ξ(G), defined as the
smallest integer d for which G admits a vector labeling in Rd .
Theorem 3.8.2. For any graph G, we have:
ϑ(Ḡ) ≤ ξ(G) ≤ χ(G).
Proof. We show first the inequality ξ(G) ≤ χ(G). If d = χ(G), consider a
partition of V into d stable sets S1, . . . , Sd. We now assign to all nodes v ∈ Si
the standard unit vector ei ∈ R^d. In this way we have defined a vector labeling
of G in the space R^d, thus showing ξ(G) ≤ d.
We now show the inequality ϑ(Ḡ) ≤ ξ(G). For this, set d = ξ(G) and
consider a vector labeling u1, . . . , un ∈ R^d of G. Define the matrices U0 = Id and
Ui = ui ui^T (i ∈ [n]). We now consider the matrix Z ∈ S^{n+1} whose entries are
defined as Zij = ⟨Ui, Uj⟩ for i, j ∈ {0, 1, . . . , n}. By construction, Z is positive
semidefinite and satisfies: Z00 = d, Z0i = Tr(Ui) = ∥ui∥² = 1, Zii = ⟨Ui, Ui⟩ = ∥ui∥⁴ = 1,
and Zij = ⟨Ui, Uj⟩ = (ui^T uj)² = 0 for all edges {i,j} ∈ E. Hence,
Z is feasible for the program (3.18) in Lemma 3.4.2 (for the graph Ḡ) and thus
we can conclude that ϑ(Ḡ) ≤ Z00 = d.
Hence ξ(G) gives a bound for the chromatic number which is at least as
good as the theta number. For bipartite graphs it is easy to check that equality
holds throughout; more precisely, the following holds (Exercise 3.5).
Lemma 3.8.3. For any graph G, we have the equivalences:
ξ(G) ≤ 2 ⇐⇒ χ(G) ≤ 2 ⇐⇒ G is bipartite.
However, for larger values, the parameter ξ(G) is hard to compute. In fact,
checking whether ξ(G) ≤ 3 is already an NP-hard problem. This follows from a
construction of Peeters [9] which we now sketch.
Consider the prism graph H, with node set V (H) = {x, a, b, c, d, y} and with
edge set
E(H) = {{a, x}, {a, b}, {b, x}, {c, d}, {c, y}, {d, y}, {x, d}, {b, c}, {a, y}}.
We will use the following (easy to check) combinatorial property of H: H is
3-colorable and it has essentially two distinct 3-colorings, depending whether
the two nodes x, y receive the same color or not. We will also use the following
geometric property of its 3-dimensional vector labelings.
Lemma 3.8.4. Consider a vector labeling ux , ua , ub , uc , ud , uy in R3 of the prism
graph H. Then, the two vectors ux and uy are either orthogonal or parallel.
Proof. As the set {x, a, b} is a clique of H the vectors ux , ua , ub are pairwise
orthogonal and thus form an orthonormal basis of R3 . Hence the remaining
vectors uc , ud , uy have a unique linear decomposition in this basis. Moreover,
they are pairwise orthogonal since the set {c, d, y} is also a clique of H. It is
now easy to check that the coefficients of the linear decompositions of uc, ud, uy
in the basis {ux, ua, ub} can follow only the following two patterns, where ∗
indicates a nonzero entry:
      ux  ua  ub            ux  ua  ub
 uc    ∗   0   0       uc    0   ∗   0
 ud    0   ∗   0  and  ud    0   0   ∗
 uy    0   0   ∗       uy    ∗   0   0
Therefore, in the first case the vectors ux and uy are orthogonal and in the
second case they are parallel.
Let G = (V = [n], E) be a graph. We now build a new graph G′ containing
G as follows: For each pair of distinct nodes i, j ∈ V, we add a copy Hij
of H whose nodes are x = i, y = j and four additional nodes called
aij, bij, cij, dij. Thus G′ contains G as an induced subgraph; for each of the
(n choose 2) pairs {i, j} it has 4 new nodes and 9 new edges. Clearly, χ(G) ≤ χ(G′). The following result
shows that
χ(G) ≤ 3 ⟺ ξ(G′) ≤ 3.
As the problem of testing whether χ(G) ≤ 3 is NP-complete, we can conclude
that the problem of deciding whether ξ(G) ≤ 3 is also NP-hard.
Proposition 3.8.5. Let G and G′ be as indicated above. Then the following holds:
χ(G) ≤ 3 ⇐⇒ χ(G′ ) ≤ 3 ⇐⇒ ξ(G′ ) ≤ 3.
Proof. We show: χ(G′ ) ≤ 3 =⇒ ξ(G′ ) ≤ 3 =⇒ χ(G) ≤ 3 =⇒ χ(G′ ) ≤ 3.
The implication χ(G′) ≤ 3 ⟹ ξ(G′) ≤ 3 is clear, since ξ(G′) ≤ χ(G′).
The implication χ(G) ≤ 3 ⟹ χ(G′) ≤ 3 is easy. For this, consider a 3-coloring of G. Then, for any i ≠ j ∈ V, one can extend the coloring of the
nodes i, j to a 3-coloring of Hij, and thus χ(G′) ≤ 3.
We now show the last implication: ξ(G′ ) ≤ 3 =⇒ χ(G) ≤ 3. Assume that
ξ(G′ ) ≤ 3 and consider a vector labeling ux ∈ R3 (for x ∈ V (G′ )) of G′ . Then,
the vectors labeling each subgraph Hij satisfy the conclusion of Lemma 3.8.4.
Therefore, for any i 6= j ∈ V , the vectors ui and uj are either orthogonal or
parallel. Moreover, if {i, j} is an edge of G, then ui and uj are orthogonal. If
we now look at the lines spanned by the vectors ui (for i ∈ V ), it follows that
there are (at most) three distinct lines among them. (Indeed, if there were
four distinct lines among them, then they would be spanned by four pairwise
orthogonal vectors, which is a contradiction since we are in the space R³.) This
induces a partition of the vertices of G into three sets, depending on which of
the three lines the vector ui is parallel to. Moreover, each of these sets is a stable
set of G (since adjacent vertices correspond to orthogonal vectors). Hence we
have found a 3-coloring of G.
Consider the semidefinite program:
X ⪰ 0, Xii = 1 (i ∈ [n]), Xij = 0 ({i,j} ∈ E).
Summarizing, what the above shows is that checking whether this semidefinite
program admits a feasible solution of rank at most 3 is an NP-hard problem.
More generally, for any fixed integer r ≥ 1, checking existence of a feasible
solution of rank at most r to a semidefinite program is NP-hard.
For the case r = 1 this is easy to see: Consider e.g. the formulation (3.19) of
the theta number ϑ(G) from Lemma 3.4.4. Then, a matrix Y which is feasible
for (3.19) and has rank 1 corresponds exactly to a stable set S ⊆ V , namely
Y = yy T , where y ∈ {0, 1}n+1 is the extended characteristic vector of S (with
1 at its zeroth entry). Therefore, if one could decide whether there is a feasible
solution of rank 1 to the program (3.19) augmented with the condition
∑_{i=1}^n Yii ≥ k, then one could decide whether α(G) ≥ k, which is however an
NP-complete problem.
There are some known results that permit to ‘predict’ the rank of a feasible solution to a semidefinite program. In particular the following result
holds, which bounds the rank in terms of the number of linear equations in
the semidefinite program.
Theorem 3.8.6. (see [1]) Consider the semidefinite program:
⟨Aj, X⟩ = bj (j ∈ [m]), X ⪰ 0.   (3.27)
If it has a feasible solution then it has one whose rank r satisfies
(r+1 choose 2) = r(r+1)/2 ≤ m.
In particular, if m ≤ 2, then there is a feasible solution of rank 1.
See Exercise 3.4 for a weaker result: there exists a feasible solution of rank
at most m + 1.
3.9 Exercises
3.1 Show: ϑ(G)ϑ(Ḡ) ≥ n for any graph G on n nodes.
Show: ϑ(C5) ≥ √5.
Hint: Let C be a feasible solution for the program (3.15) defining ϑ(G),
and let C′ be a feasible solution of the analogous program defining ϑ(Ḡ).
Use the fact that ⟨C − J, C′ − J⟩ ≥ 0, ⟨C − J, J⟩ ≥ 0, ⟨C′ − J, J⟩ ≥ 0 (why
is this true?), and compute ⟨C, C′⟩.
3.2 The goal is to show the result of Lemma 3.7.3 about the theta number of
the strong product of two graphs G = (V, E) and H = (W, F ):
ϑ(G · H) = ϑ(G)ϑ(H).
(a) Show that ϑ(G · H) ≥ ϑ(G)ϑ(H).
(b) Show that ϑ(G · H) ≤ ϑ(G)ϑ(H).
Hint: Use the primal formulation (3.12) for (a), and the dual formulation
(Lemma 3.4.1) for (b), and think of using Kronecker products of matrices
in order to build feasible solutions.
3.3 Let G = (V = [n], E) be a graph. A set of vectors u1, . . . , un ∈ R^n is said
to be an orthonormal representation of G if they satisfy:
∥ui∥ = 1 ∀i ∈ V,   ui^T uj = 0 ∀{i,j} ∈ Ē.
Consider the graph parameter
ϑ1(G) = min_{c, ui} max_{i∈V} 1/(c^T ui)²,
where the minimum is taken over all unit vectors c and all orthonormal
representations u1, · · · , un of G.
(a) Show: ϑ(G) ≤ ϑ1(G).
Hint: Consider unit vectors c, u1, . . . , un forming an orthonormal representation of G. Set t = max_i 1/(c^T ui)² and define the symmetric matrix A with entries Aij = (ui^T uj)/((c^T ui)(c^T uj)) for {i,j} ∈ E and Aij = 0
otherwise. It might also be useful to consider the matrix M, defined
as the Gram matrix of the vectors c − ui/(c^T ui) for i ∈ V. Show that (t, A)
is feasible for the formulation (3.13) for ϑ(G).
(b) Show: ϑ1(G) ≤ ϑ(G).
Hint: Consider an optimum solution (t = ϑ(G), B) of the program
(3.14) defining ϑ(G). Say tI − B is the Gram matrix of the vectors x1, . . . , xn. Show that there exists a unit vector c orthogonal
to x1, . . . , xn. Show that the vectors ui = (c + xi)/√t (i ∈ V) form an
orthonormal representation of G.
(c) Show: ϑ(C5) ≤ √5.
Hint: Use the formulation via ϑ1(G) and the following construction
(known as Lovász’ umbrella construction).
Set c = (0, 0, 1) ∈ R³ and, for k = 1, 2, 3, 4, 5, define the vector
uk = (s cos(2kπ/5), s sin(2kπ/5), t) ∈ R³, where s and t are scalars
to be determined. Show that one can choose s and t in such a way
that u1, . . . , u5 form an orthonormal representation of C5.
Recall: cos(2π/5) = (√5 − 1)/4.
3.4 Consider the semidefinite program:
X ⪰ 0, ⟨Aj, X⟩ = bj (j ∈ [m]).   (3.28)
Assume that X is a feasible solution of (3.28) and write X = ∑_{i=1}^s vi vi^T
for some vectors v1, . . . , vs ∈ R^n, where s = rank X. Define the set
P = { y ∈ R^s : y ≥ 0, ⟨Aj, ∑_{i=1}^s yi vi vi^T⟩ = bj (j ∈ [m]) }.
(a) Show: If ∑_{j=1}^m Aj ≻ 0, then P is a nonempty polytope.
(b) Show: If y is a vertex of P, then the matrix Xy = ∑_{i=1}^s yi vi vi^T is a
feasible solution of (3.28) with rank at most m.
(c) Show: If ∑_{j=1}^m Aj ≻ 0, then there exists a feasible solution of (3.28)
with rank at most m.
(d) Show: There exists a feasible solution of (3.28) with rank at most
m + 1.
3.5 Show Lemma 3.8.3.
BIBLIOGRAPHY
[1] A. Barvinok. A Course in Convexity. AMS, 2002.
[2] M. Chudnovsky, N. Robertson, P. Seymour, and R. Thomas. The strong
perfect graph theorem. Annals of Mathematics 164(1):51–229, 2006.
[3] V. Chvátal. On certain polytopes associated with graphs. Journal of Combinatorial Theory, Series B 18:138–154, 1975.
[4] G.S. Gasparian. Minimal imperfect graphs: a simple approach. Combinatorica, 16:209–212, 1996.
[5] M. Grötschel, L. Lovász, A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988. http://www.zib.de/groetschel/
pubnew/paper/groetschellovaszschrijver1988.pdf
[6] D.E. Knuth. The Sandwich Theorem. The Electronic Journal of Combinatorics 1, A1, 1994. http://www.combinatorics.org/ojs/index.php/eljc/
article/view/v1i1a1
[7] L. Lovász. A characterization of perfect graphs. Journal of Combinatorial
Theory, Series B 13:95–98, 1972.
[8] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on
Information Theory IT-25:1–7, 1979.
[9] R. Peeters. Orthogonal representations over finite fields and the chromatic number of graphs. Combinatorica 16(3):417–431, 1996.
CHAPTER 4
APPROXIMATING THE MAX CUT PROBLEM
4.1 Introduction
4.1.1 The MAX CUT problem
The maximum cut problem (MAX CUT) is the following problem in combinatorial optimization. Let G = (V, E) be a graph and let w = (wij) ∈ R^E_+ be
nonnegative weights assigned to the edges. Given a subset S ⊆ V, the cut
δG(S) consists of the edges {i, j} ∈ E having exactly one endnode in S, i.e.,
with |{i,j} ∩ S| = 1. In other words, δG(S) consists of the edges that are
cut by the partition (S, S̄ = V \ S) of V. The cut δG(S) is called trivial if
S = ∅ or V (in which case it is empty). Then the weight of the cut δG(S)
is w(δG(S)) = ∑_{{i,j}∈δG(S)} wij and the MAX CUT problem asks for a cut of
maximum weight, i.e., compute
mc(G, w) = max_{S⊆V} w(δG(S)).
When w = e = (1, . . . , 1)^T is the all-ones weight function, MAX CUT asks for a
cut of maximum cardinality and we use the simpler notation mc(G) = mc(G, e).
It is sometimes convenient to extend the weight function w ∈ RE to all pairs
of nodes of V , by setting wij = 0 if {i, j} is not an edge of G. Given disjoint
subsets S, T ⊆ V, it is also convenient to use the following notation:
w(S, T) = ∑_{i∈S, j∈T} wij.
Thus, w(S, S̄) = w(δG(S)) for all S ⊆ V.
To state its complexity, we formulate MAX CUT as a decision problem:
MAX CUT: Given a graph G = (V, E), edge weights w ∈ Z^E_+ and an integer
k ∈ N, decide whether there exists a cut of weight at least k.
It is well known that MAX CUT is an NP-complete problem. In fact, MAX CUT
is one of Karp’s 21 NP-complete problems. So unless the complexity classes P
and NP coincide there is no efficient polynomial-time algorithm which solves
MAX CUT exactly. We give here a reduction of MAX CUT from the PARTITION
problem, defined below, which is one of the first six basic NP-complete problems
in Garey and Johnson [5]:
PARTITION: Given natural numbers a1, . . . , an ∈ N, decide whether there exists a subset S ⊆ [n] such that ∑_{i∈S} ai = ∑_{i∉S} ai.
Theorem 4.1.1. The MAX CUT problem is NP-complete.
Proof. It is clear that MAX CUT belongs to the class NP. We now show a reduction from
PARTITION. Let a1, . . . , an ∈ N be given. Construct the weights wij = ai aj for
the edges of the complete graph Kn. Set σ = ∑_{i=1}^n ai and k = σ²/4.
For any subset S ⊆ [n], set a(S) = ∑_{i∈S} ai. Then, we have
w(S, S̄) = ∑_{i∈S, j∉S} wij = ∑_{i∈S, j∉S} ai aj = (∑_{i∈S} ai)(∑_{j∉S} aj) = a(S)(σ − a(S)) ≤ σ²/4,
with equality if and only if a(S) = σ/2 or, equivalently, a(S) = a(S̄). From
this it follows that there is a cut of weight at least k if and only if the sequence
a1, . . . , an can be partitioned. This concludes the proof.
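A tiny numeric illustration of this reduction (the instance is made up for the example):

a = [1, 2, 3, 4]                 # PARTITION instance with sigma = 10
sigma = sum(a)
S = [0, 3]                       # the subset {a_1, a_4}, of total 5 = sigma/2
aS = sum(a[i] for i in S)
cut = sum(a[i] * a[j] for i in S for j in range(len(a)) if j not in S)
print(cut, aS * (sigma - aS), sigma ** 2 // 4)   # 25 25 25: the cut reaches k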
This hardness result for MAX CUT is in sharp contrast to the situation of the
MIN CUT problem, which asks for a nontrivial cut of minimum weight, i.e., to
compute
min_{S⊆V : S≠∅,V} w(S, S̄).
(For MIN CUT the weights of the edges are usually called capacities and they are also
assumed to be nonnegative.) It is well known that the MIN CUT problem can be
solved in polynomial time (together with its dual MAX FLOW problem), using
the Ford-Fulkerson algorithm. Specifically, the Ford-Fulkerson algorithm permits one to find in polynomial time a minimum cut (S, S̄) separating a given source
s and a given sink t, i.e., with s ∈ S and t ∈ S̄. Thus a minimum weight nontrivial cut can be obtained by applying this algorithm |V| times, fixing any s ∈ V
and letting t vary over all nodes of V \ {s}. Details can be found in Chapter 4
of the Lecture Notes [10].
Even stronger, Håstad in 2001 showed that it is NP-hard to approximate
MAX CUT within a factor of 16/17 ≈ 0.941.
On the positive side, one can compute a 0.878-approximation of MAX CUT
in polynomial time, using semidefinite programming. This algorithm, due to
Goemans and Williamson [6], is one of the most influential approximation algorithms based on semidefinite programming. We will explain this
result in detail in Section 4.2.
Figure 4.1: Minimum and maximum weight cuts
Before doing that we recall some results for MAX CUT based on using linear
programming.
4.1.2 Linear programming relaxation
In order to define linear programming bounds for MAX CUT, one needs to find
some linear inequalities that are satisfied by all cuts of G, i.e., some valid inequalities for the cut polytope of G. Large classes of such inequalities are known
(cf. e.g. [3] for an overview and references).
We now present some simple but important valid inequalities for the cut
polytope of the complete graph Kn , which is denoted as CUTn , and defined as
the convex hull of the incidence vectors of the cuts of Kn:
CUT_n = conv{ χ^{δ_{Kn}(S)} : S ⊆ [n] }.
For instance, for n = 2, CUT_2 = [0, 1] and, for n = 3, CUT_3 is a simplex in R³
(indexed by the edges of K3 ordered as {1,2}, {1,3}, {2,3}) whose vertices are the
incidence vectors of the four cuts of K3: (0,0,0), (1,1,0), (1,0,1),
and (0,1,1) (for S = ∅, {1}, {2} and {3}, respectively).
As a first easy observation it is important to realize that in order to compute
the maximum cut mc(G, w) in a weighted graph G on n nodes, one can as well
deal with the complete graph Kn . Indeed, any cut δG (S) of G can be obtained
from the corresponding cut δKn (S) of Kn , simply by ignoring the pairs that are
not edges of G, in other words, by projecting onto the edge set of G. Hence one
can reformulate any maximum cut problem as a linear optimization problem
over the cut polytope of Kn:
mc(G, w) = max_{x∈CUT_n} ∑_{{i,j}∈E} wij xij;
the graph G is taken into account by the objective function of this LP.
The following triangle inequalities are valid for the cut polytope CUT_n:
xij − xik − xjk ≤ 0,   xij + xik + xjk ≤ 2,   (4.1)
for all distinct i, j, k ∈ [n]. This is easy to see: just verify that these inequalities
hold when x is equal to the incidence vector of a cut. The triangle inequalities
(4.1) imply the following bounds (check it):
0 ≤ xij ≤ 1   (4.2)
on the variables.
on the variables. Let METn denote the polytope in RE(Kn ) defined by the triangle inequalities (4.1). Thus, METn is a linear relaxation of CUTn , tighter than
the trivial relaxation by the unit hypercube:
CUTn ⊆ METn ⊆ [0, 1]E(Kn ) .
Moreover, one can verify that, if we add integrality conditions to the triangle inequalities, then we obtain an integer programming reformulation of MAX CUT.
More precisely, we have
CUT_n = conv(MET_n ∩ {0, 1}^{E(Kn)}).
It is known that equality CUT_n = MET_n holds for n ≤ 4, but the inclusion is
strict for n ≥ 5. Indeed, the inequality
∑_{1≤i<j≤5} xij ≤ 6   (4.3)
is valid for CUT_5 (as any cut of K5 has cardinality 0, 4 or 6), but it is not valid
for MET_5. For instance, the vector (2/3, . . . , 2/3) ∈ R^{10} belongs to MET_5 but it
violates the inequality (4.3) (since 10 · 2/3 = 20/3 > 6).
We can define the following linear programming bound:
lp(G, w) = max { ∑_{{i,j}∈E(G)} wij xij : x ∈ MET_n }   (4.4)
for the maximum cut:
mc(G, w) ≤ lp(G, w).
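For small graphs the bound (4.4) can be computed directly; a minimal sketch using cvxpy (an assumption of this illustration) that enumerates the triangle inequalities of MET_n:

import cvxpy as cp
from itertools import combinations

def lp_bound(n, weighted_edges):
    # maximize sum w_ij x_ij over the metric polytope MET_n, cf. (4.4)
    x = {(i, j): cp.Variable() for i, j in combinations(range(n), 2)}
    cons = []
    for i, j, k in combinations(range(n), 3):
        e = [x[(i, j)], x[(i, k)], x[(j, k)]]
        for a in range(3):                       # the three inequalities of type (4.1)
            cons.append(e[a] - e[(a + 1) % 3] - e[(a + 2) % 3] <= 0)
        cons.append(sum(e) <= 2)                 # the perimeter inequality of (4.1)
    obj = cp.Maximize(sum(w * x[(min(i, j), max(i, j))] for i, j, w in weighted_edges))
    prob = cp.Problem(obj, cons)
    prob.solve()
    return prob.value

# For K5 with unit weights this returns 20/3 ~ 6.67, while mc(K5) = 6, cf. (4.3).
print(lp_bound(5, [(i, j, 1) for i, j in combinations(range(5), 2)]))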
For the all-ones weight function w = e we also denote lp(G) = lp(G, e). The
graphs for which this bound is tight have been characterized by Barahona [1].
Theorem 4.1.2. Let G be a graph. Then, mc(G, w) = lp(G, w) for all weight
functions w ∈ RE if and only if the graph G has no K5 minor.
In particular, if G is a planar graph, then mc(G, w) = lp(G, w) so that the
maximum cut can be computed in polynomial time using linear programming.
A natural question is how good the LP bound is for general graphs. Here are
some easy bounds.
Lemma 4.1.3. Let G be a graph with nonnegative weights w. The following holds:
w(E)/2 ≤ mc(G, w) ≤ lp(G, w) ≤ w(E).
Proof. The inequality lp(G, w) ≤ w(E) follows from the inclusion MET_n ⊆
[0, 1]^{E(Kn)} and the fact that w ≥ 0. We now show that w(E)/2 ≤ mc(G, w).
For this, pick S ⊆ V for which (S, S̄) is a cut of maximum weight: w(S, S̄) =
mc(G, w). Thus if we move one node i ∈ S to S̄, or if we move one node j ∈ S̄
to S, then we obtain another cut whose weight is at most w(S, S̄). This gives:
w(S \ {i}, S̄ ∪ {i}) − w(S, S̄) = w(S \ {i}, {i}) − w({i}, S̄) ≤ 0,
w(S ∪ {j}, S̄ \ {j}) − w(S, S̄) = w({j}, S̄ \ {j}) − w(S, {j}) ≤ 0.
Summing the first relation over i ∈ S and using the fact that 2w(E(S)) =
∑_{i∈S} w(S \ {i}, {i}), where E(S) is the set of edges contained in S, and the fact
that ∑_{i∈S} w({i}, S̄) = w(S, S̄), we obtain:
2w(E(S)) ≤ w(S, S̄).
Analogously, summing the second relation over j ∈ S̄, we obtain:
2w(E(S̄)) ≤ w(S, S̄).
Summing these two relations yields: w(E(S)) + w(E(S̄)) ≤ w(S, S̄). Now
adding w(S, S̄) to both sides implies: w(E) ≤ 2w(S, S̄) = 2 mc(G, w), which
concludes the proof.
The above proof shows in fact that w(S, S) ≥ w(E)/2 for any cut (S, S)
that is locally optimal, in the sense that switching one node from one side to
the other side does not improve the weight of the cut. As an application of
Lemma 4.1.3, we obtain that
1/2 ≤ w(E)/(2 lp(G, w)) ≤ mc(G, w)/lp(G, w) ≤ 1   for all nonnegative weights w ≥ 0.
It turns out that there are graphs for which the ratio mc(G, w)/lp(G, w) can be
arbitrarily close to 1/2 [9]. Hence, for these graphs, the ratio w(E)/lp(G, w)
is arbitrarily close to 1, which means that the metric polytope does not provide
a better approximation of the cut polytope than its trivial relaxation by the
hypercube [0, 1]E .
We now provide another argument for the lower bound mc(G, w) ≥ w(E)/2.
This argument is probabilistic and based on the following simple randomized algorithm: Construct a random partition (S, S̄) of V by assigning, independently,
with probability 1/2, each node i ∈ V to either side of the partition. Then the
probability that an edge {i, j} is cut by the partition is equal to
P({i,j} is cut) = P(i ∈ S, j ∈ S̄) + P(i ∈ S̄, j ∈ S) = 1/2 · 1/2 + 1/2 · 1/2 = 1/2.
Hence, the expected weight of the cut produced by this random partition is
equal to
E(w(S, S̄)) = ∑_{{i,j}∈E} wij P({i,j} is cut) = ∑_{{i,j}∈E} wij/2 = w(E)/2.
Here we have used the linearity of the expectation. As the expected weight
of a cut is at most the maximum weight of a cut, we obtain the inequality:
mc(G, w) ≥ E(w(S, S)) = w(E)/2. Moreover this simple randomized algorithm
yields a random cut whose expected weight is at least half the optimum weight:
E(w(S, S)) ≥ 0.5 · mc(G, w).
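This argument translates into the following trivial randomized procedure (a sketch; it simply keeps the best of a number of random assignments):

import random

def random_cut(n, weighted_edges, trials=1000):
    # each node joins S independently with probability 1/2, as described above
    best = 0
    for _ in range(trials):
        side = [random.random() < 0.5 for _ in range(n)]
        best = max(best, sum(w for i, j, w in weighted_edges if side[i] != side[j]))
    return best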
In the next section, we will see another probabilistic argument, due to Goemans and Williamson, which makes it possible to construct a much better random cut.
Namely we will get a random cut whose expected weight satisfies:
E(w(S, S̄)) ≥ 0.878 · mc(G, w),
thus improving the above factor 0.5. The crucial tool will be to use a semidefinite relaxation for MAX CUT combined with a simple, but ingenious randomized
“hyperplane rounding” technique.
4.2 The algorithm of Goemans and Williamson
In this section we discuss the basic semidefinite programming relaxation for
MAX CUT and its dual reformulation and then we present the approximation
algorithm of Goemans and Williamson.
4.2.1 Semidefinite programming relaxation
Here we will introduce the basic semidefinite programming relaxation for MAX
CUT. For this we first reformulate MAX CUT as a (non-convex) quadratic optimization problem having quadratic equality constraints. With every vertex
i ∈ V , we associate a binary variable xi ∈ {−1, +1} which indicates whether i
lies in S or in S, say, i ∈ S if xi = −1 and i ∈ S if xi = +1. We model the binary
constraint xi ∈ {−1, +1} as a quadratic equality constraint
x2i = 1 for i ∈ V.
For two vertices i, j ∈ V we have
1 − xi xj ∈ {0, 2}.
This value equals 0 if i and j lie on the same side of the cut (S, S̄) and it equals
2 if i and j lie on different sides of the cut. Hence, one can
express the weight of the cut (S, S̄) by
w(S, S̄) = ∑_{{i,j}∈E} wij (1 − xi xj)/2.
Now, the MAX CUT problem can be equivalently formulated as
mc(G, w) = max { (1/2) ∑_{{i,j}∈E} wij (1 − xi xj) : xi² = 1 ∀i ∈ V }.   (4.5)
Next, we introduce a matrix variable X = (xij) ∈ S^n, whose entries xij
model the pairwise products xi xj. Then, as the matrix (xi xj)_{i,j=1}^n = xx^T is
positive semidefinite, we can require that X be positive
semidefinite. Moreover, the constraints xi² = 1 give the constraints Xii = 1 for
all i ∈ [n]. Therefore we can formulate the following semidefinite programming
relaxation:
sdp(G, w) = max { (1/2) ∑_{{i,j}∈E} wij (1 − Xij) : X ⪰ 0, Xii = 1 ∀i ∈ [n] }.   (4.6)
By construction, we have:
mc(G, w) ≤ sdp(G, w).   (4.7)
Again we set sdp(G) = sdp(G, e) for the all-ones weight function.
The feasible region of the above semidefinite program is the convex (non-polyhedral) set
E_n = { X ∈ S^n : X ⪰ 0, Xii = 1 ∀i ∈ [n] },
called the elliptope (and its members are known as correlation matrices). One
can visualize the elliptope E_3. Indeed, for a 3 × 3 symmetric matrix X with an
all-ones diagonal, we have:
X = [ 1 x y ; x 1 z ; y z 1 ] ⪰ 0 ⟺ 1 + 2xyz − x² − y² − z² ≥ 0 and x, y, z ∈ [−1, 1],
which expresses the fact that the determinant of X is nonnegative as well as the
three 2 × 2 principal subdeterminants. Figure 4.2 visualizes the
set of triples (x, y, z) for which X ∈ E_3. Notice that the elliptope E_3 looks like
an “inflated” tetrahedron, while the underlying tetrahedron corresponds to the
linear relaxation MET_3.
Figure 4.2: Views on the convex set E3 behind the semidefinite relaxation.
We have: CUTn ⊆ En , with strict inclusion for any n ≥ 3. One can for
instance verify that mc(Kn ) < sdp(Kn ) for any odd n ≥ 3.
4.2.2 Dual semidefinite programming relaxation
Given a graph G with edge weights w, its Laplacian matrix Lw is the symmetric
n × n matrix with entries:
(Lw)ii = ∑_{j:{i,j}∈E} wij (i ∈ [n]),   (Lw)ij = −wij ({i,j} ∈ E),   (Lw)ij = 0 (i ≠ j, {i,j} ∉ E).
The following can be checked (Exercise 4.2).
Lemma 4.2.1. The following properties hold for the Laplacian matrix Lw:
(i) For any vector x ∈ {±1}^n, (1/4) x^T Lw x = (1/2) ∑_{{i,j}∈E} wij (1 − xi xj).
(ii) For any nonnegative edge weights w ≥ 0, Lw ⪰ 0.
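A small numeric check of identity (i), as a sketch with numpy (an assumption of this illustration):

import numpy as np

rng = np.random.default_rng(0)
n, edges = 4, [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
w = {e: rng.random() for e in edges}
L = np.zeros((n, n))
for (i, j), wij in w.items():                  # assemble the Laplacian L_w
    L[i, i] += wij; L[j, j] += wij
    L[i, j] -= wij; L[j, i] -= wij
x = rng.choice([-1.0, 1.0], size=n)            # a +-1 vector, i.e., a cut
print(x @ L @ x / 4)                           # (1/4) x^T L_w x
print(sum(wij * (1 - x[i] * x[j]) / 2 for (i, j), wij in w.items()))  # same value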
This permits us to reformulate the quadratic formulation (4.5) of MAX CUT as
mc(G, w) = max { (1/4) x^T Lw x : xi² = 1 ∀i ∈ V }   (4.8)
and its semidefinite relaxation (4.6) as
sdp(G, w) = max { (1/4) ⟨Lw, X⟩ : X ⪰ 0, Xii = 1 ∀i ∈ V }.   (4.9)
This is a semidefinite program in standard primal form and its dual semidefinite
program reads:
sdp(G, w) = min { ∑_{i=1}^n yi : Diag(y) − (1/4) Lw ⪰ 0 }.   (4.10)
Here, for a vector y ∈ Rn , Diag(y) denotes the diagonal matrix with diagonal
entries y1 , . . . , yn . Indeed, as the program (4.6) is strictly feasible (e.g. the identity matrix is strictly feasible), the minimization program in (4.10) attains its
minimum and there is no duality gap: its minimum value is equal to sdp(G, w).
We now give a reformulation for the program (4.10), in terms of the maximum eigenvalue of the Laplacian matrix up to a suitable diagonal shift. This
upper bound for MAX CUT was first introduced and investigated by Delorme
and Poljak [2].
Theorem 4.2.2. Let G = (V = [n], E) be a graph and let w ∈ R^E_+ be nonnegative
edge weights. The following holds:
sdp(G, w) = min_{u∈R^n} { (n/4) λmax(Lw + Diag(u)) : ∑_{i=1}^n ui = 0 }.   (4.11)
Proof. Let ϕ(G, w) denote the optimum value of the program in (4.11). We
need to show that sdp(G, w) = ϕ(G, w). Observe first that, using the fact that
λmax(Lw + Diag(u)) = min_{t∈R} { t : tI − (Lw + Diag(u)) ⪰ 0 },
we can reformulate ϕ(G, w) as
ϕ(G, w) = min_{t∈R, u∈R^n} { (n/4) t : tI − (Lw + Diag(u)) ⪰ 0, ∑_{i=1}^n ui = 0 }.   (4.12)
Consider first a feasible solution y to the program (4.10). Set t = (4/n) e^T y and
u = te − 4y (where e denotes the all-ones vector). Then, e^T u = 0 and we
have tI − (Lw + Diag(u)) = 4 Diag(y) − Lw ⪰ 0. Hence, (t, u) is feasible for
the program defining ϕ(G, w) and thus ϕ(G, w) ≤ nt/4 = e^T y. This shows that
ϕ(G, w) ≤ sdp(G, w).
Conversely, consider a feasible solution (t, u) for the program defining ϕ(G, w).
Set y = (1/4)(te − u). Then, Diag(y) − Lw/4 = (1/4)(tI − (Lw + Diag(u))) ⪰ 0. This shows
that sdp(G, w) ≤ e^T y = nt/4 and thus sdp(G, w) ≤ ϕ(G, w).
What the above result says is that the semidefinite bound amounts to finding
an optimal ‘correcting’ vector u to add to the diagonal of the Laplacian matrix
Lw. Interestingly, Delorme and Poljak [2] show the uniqueness of this optimal correcting vector.
Lemma 4.2.3. The program (4.11) has a unique optimal solution u.
Proof. Let X be an optimal solution to the primal program (4.9), let u be an
optimal solution of the program (4.11) and set t = λmax(Lw + Diag(u)). We
claim that
X(tI − (Lw + Diag(u))) = 0.
Indeed, by the optimality condition we have that ⟨Lw/4, X⟩ = nt/4. Combining
this with the fact that ⟨I, X⟩ = n and ⟨X, Diag(u)⟩ = u^T e = 0, we obtain that
⟨X, tI − (Lw + Diag(u))⟩ = tn − ⟨X, Lw⟩ = 0. As both X and tI − (Lw + Diag(u))
are positive semidefinite, we can conclude that their product is zero, that is,
X(tI − (Lw + Diag(u))) = 0.
Now, if u′ is another optimal solution of (4.11), then X(tI − (Lw + Diag(u′))) =
0 also holds. Therefore, combining with the above identity, we obtain that
X Diag(u − u′) = 0, which easily implies that u = u′ (since the diagonal entries
of X are equal to 1).
As an immediate corollary of Theorem 4.2.2 we obtain that
mc(G, w) ≤ sdp(G, w) ≤ (n/4) λmax(Lw)
(by selecting u = 0 in the program (4.11)). Moreover, if G is vertex-transitive
and w = e is the all-ones weight function, then the zero vector u = 0
is the optimum solution.
Lemma 4.2.4. If G is vertex-transitive then sdp(G) = (n/4) λmax(L).
(You will show this in Exercise 4.5.) As an application, one can compute the
exact value of sdp(G) when G is an odd cycle (see Exercise 4.5). In particular,
one has:
mc(C5)/sdp(C5) = 32/(25 + 5√5) =: α* ≈ 0.884.
Delorme and Poljak [2] show that the ratio mc(G)/sdp(G) is at least α* ≈ 0.884
for several classes of graphs. As we will see in the next section, for general
graphs, the worst-case ratio mc(G)/sdp(G) is at least 0.878... Interestingly,
Poljak [8] could show that α* is the worst-case value for the ratio of the LP and
SDP bounds:
lp(G, w)/sdp(G, w) ≥ α* ≈ 0.884.
4.2.3 The Goemans-Williamson algorithm
Goemans and Williamson [6] show the following result for the semidefinite
programming bound sdp(G, w).
Theorem 4.2.5. Given a graph G with nonnegative edge weights w, the following
inequalities hold:
sdp(G, w) ≥ mc(G, w) ≥ 0.878 · sdp(G, w).
The proof is algorithmic and it gives an approximation algorithm which
approximates the MAX CUT problem within a ratio of 0.878. The Goemans-Williamson algorithm has five steps:
1. Solve the semidefinite program (4.6); let X be an optimal solution, so
that sdp(G, w) = ∑_{{i,j}∈E} wij (1 − Xij)/2.
2. Perform a Cholesky decomposition of X to find unit vectors vi ∈ R^{|V|} for
i ∈ V, so that X = (vi^T vj)_{i,j∈V}.
3. Choose a random unit vector r ∈ R^{|V|}, according to the rotationally invariant probability distribution on the unit sphere.
4. Define a cut (S, S̄) by setting xi = sign(vi^T r) for all i ∈ V. That is, i ∈ S if
and only if sign(vi^T r) ≤ 0.
5. Check whether ∑_{{i,j}∈E} wij (1 − xi xj)/2 ≥ 0.878 · sdp(G, w). If not, go to
step 3.
The steps 3 and 4 in the algorithm are called a randomized rounding procedure because a solution of a semidefinite program is “rounded” (or better:
projected) to a solution of the original combinatorial problem with the help of
randomness.
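Here is a compact sketch of steps 1–5, using cvxpy for the semidefinite program and numpy for the rounding (both packages are assumptions of this illustration; an eigendecomposition is used in place of the Cholesky factorization to tolerate tiny negative eigenvalues returned by the solver):

import cvxpy as cp
import numpy as np

def goemans_williamson(n, weighted_edges, rng=np.random.default_rng()):
    # Step 1: solve the relaxation (4.6).
    X = cp.Variable((n, n), symmetric=True)
    obj = cp.Maximize(sum(w * (1 - X[i, j]) / 2 for i, j, w in weighted_edges))
    prob = cp.Problem(obj, [X >> 0] + [X[i, i] == 1 for i in range(n)])
    prob.solve()
    sdp_val = prob.value
    # Step 2: factor X as a Gram matrix; rows v_i satisfy v_i^T v_j ~ X_ij.
    evals, evecs = np.linalg.eigh(X.value)
    V = evecs * np.sqrt(np.clip(evals, 0, None))
    # Steps 3-5: random hyperplane rounding, repeated until the guarantee holds.
    while True:
        r = rng.standard_normal(n)              # its direction is uniform on the sphere
        x = np.where(V @ r >= 0, 1.0, -1.0)
        cut = sum(w * (1 - x[i] * x[j]) / 2 for i, j, w in weighted_edges)
        if cut >= 0.878 * sdp_val:
            return cut, x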
Note also that because the expectation of the constructed solution is at least
0.878 · sdp(G, w), the algorithm eventually terminates: it will pass step 5 without getting stuck in an endless loop. One can show that with high probability we do not have to wait long until the condition in step 5 is fulfilled.
The following lemma (also known as Grothendieck’s identity, since it came
up in work of Grothendieck in the 50’s, however in the different context of
functional analysis) is the key to the proof of Theorem 4.2.5. For t ∈ [−1, 1],
recall that arccos t is the unique angle ϑ ∈ [0, π] such that cos ϑ = t.
Lemma 4.2.6. Let u, v ∈ R^d (for some d ≥ 1) be unit vectors and let r ∈ R^d
be a random unit vector chosen according to the rotationally invariant probability
distribution on the unit sphere. The following holds.
(i) The probability that sign(u^T r) ≠ sign(v^T r) is equal to
P(sign(u^T r) ≠ sign(v^T r)) = arccos(u^T v)/π.   (4.13)
(ii) The expectation of the random variable sign(u^T r) sign(v^T r) ∈ {−1, +1} is
equal to
E[sign(u^T r) sign(v^T r)] = (2/π) arcsin(u^T v).   (4.14)
Proof. (i) If u = ±v then arccos(u^T v) ∈ {0, π} and the identity (4.13) clearly holds.
We now assume that u, v span a vector space W of dimension 2. Let s
be the orthogonal projection of r onto W, so that r^T u = s^T u and r^T v = s^T v.
Then the vectors u, v lie on the unit circle in W and s/∥s∥ is uniformly
distributed on the unit circle. Hence, the probability that sign(u^T r) ≠ sign(v^T r)
depends only on the angle between u and v and
P[sign(u^T r) ≠ sign(v^T r)] = 2 · (1/(2π)) arccos(u^T v) = arccos(u^T v)/π.
(To see this, it helps to draw a figure.)
(ii) By definition, the expectation E[sign(u^T r) sign(v^T r)] can be computed as
(+1) · P[sign(u^T r) = sign(v^T r)] + (−1) · P[sign(u^T r) ≠ sign(v^T r)]
= 1 − 2 · P[sign(u^T r) ≠ sign(v^T r)] = 1 − 2 arccos(u^T v)/π,
where we have used (i) for the last equality. Now use the trigonometric identity
arcsin t + arccos t = π/2
to conclude the proof of (ii).
Using elementary univariate calculus one can show the following fact.
Lemma 4.2.7. For all t ∈ [−1, 1), the following inequality holds:
(2/π) · arccos(t)/(1 − t) ≥ 0.878.   (4.15)
One can also “see” this on the following plots of the function in (4.15), where
t varies in [−1, 1) in the first plot and in [−0.73, −0.62] in the second plot.
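The constant can also be located numerically; a quick sketch with numpy (an assumption of this illustration):

import numpy as np

t = np.linspace(-1, 0.999, 200000)
ratio = (2 / np.pi) * np.arccos(t) / (1 - t)
print(ratio.min(), t[ratio.argmin()])   # ~0.8786, attained near t ~ -0.69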
Proof. (of Theorem 4.2.5) Let X be the optimal solution of the semidefinite
program (4.6) and let v1, . . . , vn be unit vectors such that X = (vi^T vj)_{i,j=1}^n, as
in Steps 1, 2 of the GW algorithm. Let (S, S̄) be the random partition of V,
as in Steps 3, 4 of the algorithm. We now use Lemma 4.2.6(i) to compute the
expected value of the cut (S, S̄):
E(w(S, S̄)) = ∑_{{i,j}∈E} wij P({i,j} is cut) = ∑_{{i,j}∈E} wij P(xi ≠ xj)
= ∑_{{i,j}∈E} wij P(sign(vi^T r) ≠ sign(vj^T r)) = ∑_{{i,j}∈E} wij arccos(vi^T vj)/π
= ∑_{{i,j}∈E} wij · ((1 − vi^T vj)/2) · (2/π) · arccos(vi^T vj)/(1 − vi^T vj).
By Lemma 4.2.7, each term (2/π) arccos(vi^T vj)/(1 − vi^T vj) can be lower bounded by the constant
0.878. Since all weights are nonnegative, each term wij (1 − vi^T vj) is nonnegative.
Therefore, we can lower bound E(w(S, S̄)) in the following way:
E(w(S, S̄)) ≥ 0.878 · ∑_{{i,j}∈E} wij (1 − vi^T vj)/2.
Now we recognize that the objective value sdp(G, w) of the semidefinite program is appearing in the right hand side and we obtain:
E(w(S, S̄)) ≥ 0.878 · ∑_{{i,j}∈E} wij (1 − vi^T vj)/2 = 0.878 · sdp(G, w).
Finally, it is clear that the maximum weight of a cut is at least the expected
value of the random cut (S, S̄):
mc(G, w) ≥ E(w(S, S̄)).
Putting things together we can conclude that
mc(G, w) ≥ E(w(S, S̄)) ≥ 0.878 · sdp(G, w).
This concludes the proof, since the other inequality mc(G, w) ≤ sdp(G, w) holds
by (4.7).
4.2.4 Remarks on the algorithm
It remains to give a procedure which samples a random vector from the unit
sphere. This can be done if one can sample random numbers from the standard
normal (Gaussian) distribution (with mean zero and variance one), which has
probability density
f(x) = (1/√(2π)) e^{−x²/2}.
Many software packages include a procedure which produces random numbers
from the standard normal distribution.
If we sample n real numbers x1, . . . , xn independently at random
from the standard normal distribution, then the vector
r = (x1, . . . , xn)^T / √(x1² + · · · + xn²) ∈ S^{n−1}
is distributed according to the rotationally invariant probability measure on the
unit sphere.
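In code this is a one-liner; a sketch with numpy:

import numpy as np

def random_unit_vector(n, rng=np.random.default_rng()):
    x = rng.standard_normal(n)      # n independent standard normal samples
    return x / np.linalg.norm(x)    # normalizing gives a uniform point on S^{n-1}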
Finally we mention that one can modify the Goemans-Williamson algorithm
so that it becomes an algorithm which runs deterministically (without the use
of randomness) in polynomial time and which gives the same approximation
ratio. This was done by Mahajan and Ramesh in 1995.
4.3 Extension to variations of MAX CUT
We now consider two variations of MAX CUT: the maximum bisection and maximum k-cut problems, where the analysis of Goemans and Williamson can be
extended to yield good approximations.
4.3.1 The maximum bisection problem
The maximum weight bisection problem is the variant of the maximum cut problem where one wants to find a cut $(S,\bar S)$ with $|S| = n/2$ (called a bisection) having maximum possible weight. This is also an NP-hard problem.
As before let us encode a partition $(S,\bar S)$ of $V$ by a vector $x \in \{\pm1\}^n$. Then we have a bisection precisely when $\sum_{i\in V} x_i = 0$. Hence the maximum bisection problem can be formulated as the following quadratic program:
$$\mathrm{mc}_b(G,w) := \max\Big\{ \frac{1}{2}\sum_{\{i,j\}\in E} w_{ij}(1-x_ix_j) \ :\ x_i^2 = 1\ (i\in V),\ \sum_{i\in V} x_i = 0 \Big\},$$
which is obtained by adding the constraint $\sum_{i\in V} x_i = 0$ to the program (4.5). As before we consider a matrix $X \in S^n$ modelling the outer product $xx^T$. As the constraint $\sum_{i\in V} x_i = 0$ can be equivalently written as $(\sum_{i\in V} x_i)^2 = 0$, and thus as $\sum_{i,j\in V} x_i x_j = 0$, this gives the constraint $\sum_{i,j\in V} X_{ij} = 0$ for the matrix $X$.
Therefore, we get the following semidefinite program:
$$\mathrm{sdp}_b(G,w) := \max\Big\{ \frac{1}{2}\sum_{\{i,j\}\in E} w_{ij}(1-X_{ij}) \ :\ X \succeq 0,\ X_{ii} = 1\ (i\in V),\ \sum_{i,j\in V} X_{ij} = 0 \Big\}, \qquad (4.16)$$
which gives a relaxation for the maximum bisection problem:
$$\mathrm{mc}_b(G,w) \ \le\ \mathrm{sdp}_b(G,w).$$
Frieze and Jerrum [4] use and extend the idea of Goemans-Williamson to derive
a 0.651-approximation algorithm for the maximum bisection problem. For this,
one first performs the steps 1-4 of the Goemans-Williamson algorithm. That is,
1. Compute an optimal solution X of the semidefinite program (4.16) and
vectors v1 , . . . , vn ∈ Rn such that Xij = viT vj for i, j ∈ V .
2. Choose a random unit vector $r \in \mathbb{R}^n$ (according to the rotationally invariant probability distribution on the unit sphere) and define the cut $(S,\bar S)$ by $S = \{i \in V : r^Tv_i > 0\}$.
In the next step, one has to modify the cut $(S,\bar S)$ in order to get a bisection.
3. Say $s := |S| \ge n/2$ (else exchange the roles of $S$ and $\bar S$) and write $S = \{i_1,\dots,i_s\}$, where
$$w(i_1,\bar S) \ \ge\ w(i_2,\bar S) \ \ge\ \dots \ \ge\ w(i_s,\bar S).$$
Then set $\tilde S = \{i_1,\dots,i_{n/2}\}$, so that $(\tilde S, V\setminus\tilde S)$ is a bisection.
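This repair step can be implemented directly; the following Python sketch (with our own function and variable names, assuming $n$ even and a dense weight matrix) sorts the vertices of $S$ by their contribution to the cut and keeps the $n/2$ largest.

```python
import numpy as np

def repair_to_bisection(S, W):
    # W: symmetric (n x n) array of edge weights w_ij (0 where there is no edge)
    # S: list of vertices with |S| >= n/2, e.g. returned by the GW rounding
    n = W.shape[0]
    comp = [j for j in range(n) if j not in S]                 # the other side, S-bar
    # contribution w(i, S-bar) of each i in S to the cut
    contrib = {i: sum(W[i, j] for j in comp) for i in S}
    S_sorted = sorted(S, key=lambda i: contrib[i], reverse=True)
    return S_sorted[: n // 2]                                  # the side S-tilde of the bisection
```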
First we relate the weights of the two cuts δG (S) and δG (S̃).
Lemma 4.3.1. We have: $w(\delta_G(\tilde S)) \ \ge\ \frac{n}{2|S|}\, w(\delta_G(S))$.
Proof. Set $T = S \setminus \tilde S = \{i_{n/2+1},\dots,i_s\}$. By construction, $w(j,\bar S) \le w(i,\bar S)$ for all $i \in \tilde S$ and $j \in T$. After summing over $i \in \tilde S$ and $j \in T$, we obtain that $|\tilde S|\, w(T,\bar S) \le |T|\, w(\tilde S,\bar S)$. Therefore,
$$w(S,\bar S) = w(\tilde S,\bar S) + w(T,\bar S) \ \le\ w(\tilde S,\bar S)\Big(1+\frac{|T|}{|\tilde S|}\Big) \ =\ w(\tilde S,\bar S)\,\frac{|S|}{|\tilde S|} \ \le\ w(\delta_G(\tilde S))\,\frac{|S|}{|\tilde S|},$$
as $w(\delta_G(\tilde S)) = w(\tilde S, T) + w(\tilde S,\bar S) \ge w(\tilde S,\bar S)$. Since $|\tilde S| = n/2$, this concludes the proof.
We now consider the following random variable $Z$:
$$Z = \frac{w(\delta_G(S))}{\mathrm{sdp}_b(G,w)} + \frac{|S|\,(n-|S|)}{n^2/4}.$$
The first term is the weight of the random cut $(S,\bar S)$ in $(G,w)$, scaled by the optimum value of the semidefinite program; its expected value is at least $\alpha_{GW} = 0.878\ldots$ by the analysis of Goemans-Williamson. For the second term, observe that the numerator is the cardinality of the random cut $(S,\bar S)$ in the unweighted complete graph $K_n$, while the denominator is $n^2/4 = \sum_{\{i,j\}\in E(K_n)} \frac{1-v_i^Tv_j}{2}$, using the fact that $\sum_{i=1}^n v_i = 0$. Hence the expected value of the second term too is at least $\alpha_{GW}$ by the analysis of Goemans-Williamson. This shows:
Lemma 4.3.2. E(Z) ≥ 2αGW , where αGW = 0.878...
The strategy used by Frieze and Jerrum [4] to get a “good” bisection is now to repeat the above Steps 1–3 until one obtains a cut $(S,\bar S)$ satisfying (almost) the condition $Z \ge 2\alpha_{GW}$. (Roughly speaking, this will happen with high probability after sufficiently many rounds.) Indeed, as we now show, if the random cut $(S,\bar S)$ satisfies the inequality $Z \ge 2\alpha_{GW}$, then the corresponding bisection $(\tilde S, V\setminus\tilde S)$ provides a good (0.651) approximation of the maximum bisection. Note that $2(\sqrt{2\alpha_{GW}}-1) \approx 0.651$.
Lemma 4.3.3. If $Z \ge 2\alpha_{GW}$ then $w(\delta_G(\tilde S)) \ \ge\ 2(\sqrt{2\alpha_{GW}}-1)\,\mathrm{sdp}_b(G,w)$.
Proof. Using Lemma 4.3.1 and the definition of $Z$, we get:
$$w(\delta_G(\tilde S)) \ \ge\ \frac{n}{2|S|}\, w(\delta_G(S)) \ =\ \frac{n}{2|S|}\, Z\,\mathrm{sdp}_b(G,w) - \frac{2(n-|S|)}{n}\,\mathrm{sdp}_b(G,w) \ \ge\ \Big(\frac{n}{|S|}\,\alpha_{GW} - \frac{2(n-|S|)}{n}\Big)\,\mathrm{sdp}_b(G,w).$$
It suffices now to check that $\frac{n}{|S|}\,\alpha_{GW} - \frac{2(n-|S|)}{n} \ \ge\ 2(\sqrt{2\alpha_{GW}}-1)$ or, equivalently, that $\alpha_{GW} - 2\sqrt{2\alpha_{GW}}\,\frac{|S|}{n} + \frac{2|S|^2}{n^2} \ \ge\ 0$. The latter holds since the left hand side is equal to $\big(\sqrt{\alpha_{GW}} - \sqrt{2}\,\frac{|S|}{n}\big)^2$.
4.3.2 The maximum k-cut problem
Given a graph $G = (V,E)$ and nonnegative edge weights $w \in \mathbb{R}^E_+$, the maximum $k$-cut problem asks for a partition $\mathcal P = (S_1,\dots,S_k)$ of $V$ into $k$ classes so that the total weight $w(\mathcal P)$ of the edges cut by the partition is maximized. Here, an edge is cut by the partition $\mathcal P$ if its end nodes belong to distinct classes, and thus we define
$$w(\mathcal P) = \sum_{1\le h< h'\le k}\ \ \sum_{\{i,j\}\in E,\, i\in S_h,\, j\in S_{h'}} w_{ij}.$$
For k = 2, we find again MAX CUT.
As in the case of MAX CUT, there is a simple probabilistic algorithm that constructs a random partition of expected weight at least $w(E)(k-1)/k$. Namely, construct a random partition of $V$ into $k$ classes by assigning each node $i \in V$ independently to one of the $k$ classes, each chosen with probability $1/k$. Then the probability that two given nodes $i, j$ fall into the same class is equal to $1/k$, and thus the expected weight of the random partition is $w(E)(1-\frac{1}{k})$.
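This baseline takes only a few lines of code; the sketch below (with arbitrary toy parameters) assigns each node a uniformly random label in $\{0,\dots,k-1\}$.

```python
import numpy as np

def random_k_partition(n, k, rng):
    # assign each node independently and uniformly to one of k classes;
    # every edge is then cut with probability 1 - 1/k
    labels = rng.integers(k, size=n)
    return [np.flatnonzero(labels == h).tolist() for h in range(k)]

rng = np.random.default_rng(4)
print(random_k_partition(8, 3, rng))
```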
Frieze and Jerrum [4] present an approximation algorithm with performance guarantee $\alpha_k$ satisfying:
(i) $\alpha_k > 1 - \frac{1}{k}$ and $\lim_{k\to\infty} \dfrac{\alpha_k - (1-\frac{1}{k})}{2k^{-2}\ln k} = 1$.
(ii) $\alpha_2 = \alpha_{GW} > 0.878\ldots$, $\alpha_3 \ge 0.832$, $\alpha_4 \ge 0.850$, $\alpha_5 \ge 0.874$, $\alpha_{10} \ge 0.926$, $\alpha_{100} \ge 0.99$.
The starting point is to model a partition $\mathcal P = (S_1,\dots,S_k)$ of $V$ by variables $x_1,\dots,x_n$, each taking one of $k$ possible values. For $k=2$ these two possible values are $\pm1$; for general $k \ge 2$ they are $k$ unit vectors $a_1,\dots,a_k \in \mathbb{R}^{k-1}$ satisfying
$$a_i^T a_j = -\frac{1}{k-1} \quad \text{for all } i \ne j \in [k].$$
(Such vectors exist since the $k\times k$ matrix $kI - J$ is positive semidefinite with rank $k-1$.)
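Concretely, such vectors can be obtained by factorizing the Gram matrix $\frac{1}{k-1}(kI-J)$; the following sketch (our own construction via an eigendecomposition) returns them as the rows of a $k\times(k-1)$ matrix.

```python
import numpy as np

def simplex_vectors(k):
    # Gram matrix with 1 on the diagonal and -1/(k-1) off the diagonal
    G = (k * np.eye(k) - np.ones((k, k))) / (k - 1)
    # keep the k-1 nonzero eigenvalues to get a rank-(k-1) factorization G = A A^T
    w, U = np.linalg.eigh(G)
    idx = w > 1e-9
    A = U[:, idx] * np.sqrt(w[idx])      # rows are a_1, ..., a_k in R^{k-1}
    return A

A = simplex_vectors(4)
print(np.round(A @ A.T, 6))              # ~ 1 on the diagonal, -1/3 off the diagonal
```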
Now we can formulate the maximum $k$-cut problem as the problem:
$$\mathrm{mc}_k(G,w) = \max\Big\{ \frac{k-1}{k}\sum_{\{i,j\}\in E} w_{ij}(1-x_i^Tx_j) \ :\ x_1,\dots,x_n \in \{a_1,\dots,a_k\} \Big\}$$
and a natural semidefinite relaxation is:
$$\mathrm{sdp}_k(G,w) = \max\Big\{ \frac{k-1}{k}\sum_{\{i,j\}\in E} w_{ij}(1-X_{ij}) \ :\ X \succeq 0,\ X_{ii}=1\ (i\in V),\ X_{ij} \ge -\frac{1}{k-1}\ (i\ne j\in V) \Big\}. \qquad (4.17)$$
Frieze and Jerrum propose the following randomized algorithm:
1. Compute an optimal solution $X$ of the semidefinite program (4.17) and vectors $v_1,\dots,v_n \in \mathbb{R}^n$ such that $X = (v_i^Tv_j)$.
2. Choose $k$ independent random vectors $r_1,\dots,r_k \in \mathbb{R}^n$ (according to the rotationally invariant probability distribution on the unit sphere).
3. For $h \in [k]$, let $S_h$ consist of the nodes $i \in V$ for which $r_h^Tv_i = \max_{h'\in[k]} r_{h'}^Tv_i$. This defines a partition $\mathcal P = (S_1,\dots,S_k)$ of $V$ (after breaking ties arbitrarily; they occur with probability 0).
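Steps 2 and 3 of this rounding can be sketched as follows (our own code and names; the optimal solution of (4.17) and its Gram factorization are assumed to be computed elsewhere).

```python
import numpy as np

def kcut_round(V, k, rng):
    # V: (n x d) array whose rows are unit vectors v_1, ..., v_n with Gram matrix X
    # returns a partition of {0, ..., n-1} into k classes
    n, d = V.shape
    R = rng.standard_normal((k, d))
    R /= np.linalg.norm(R, axis=1, keepdims=True)   # k random unit vectors r_1, ..., r_k
    scores = V @ R.T                                # scores[i, h] = r_h^T v_i
    labels = np.argmax(scores, axis=1)              # put i in the class maximizing r_h^T v_i
    return [np.flatnonzero(labels == h).tolist() for h in range(k)]

rng = np.random.default_rng(2)
V = rng.standard_normal((6, 5))
V /= np.linalg.norm(V, axis=1, keepdims=True)
print(kcut_round(V, 3, rng))
```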
The analysis is along the same lines as the analysis for the case k = 2,
although the technical details are more involved. We give a sketch only.
The following observation is crucial: given two unit vectors $u, v$, the probability that both maxima $\max_{h'\in[k]} r_{h'}^Tu$ and $\max_{h'\in[k]} r_{h'}^Tv$ are attained at the same vector among $r_1,\dots,r_k$ depends only on the angle between the two vectors $u, v$ and thus on $u^Tv$; we denote this probability by $k\cdot I(u^Tv)$.
Then the expected weight of the random partition $\mathcal P$ returned by the above algorithm is equal to
$$E(w(\mathcal P)) = \sum_{\{i,j\}\in E} w_{ij}\,\big(1 - k\,I(v_i^Tv_j)\big) = \sum_{\{i,j\}\in E} w_{ij}\,\frac{k-1}{k}\,(1-v_i^Tv_j)\cdot\frac{k}{k-1}\,\frac{1-k\,I(v_i^Tv_j)}{1-v_i^Tv_j} \ \ge\ \alpha_k\cdot\mathrm{sdp}_k(G,w),$$
after setting
$$\alpha_k := \min_{-\frac{1}{k-1}\le t < 1}\ \frac{k}{k-1}\,\frac{1-k\,I(t)}{1-t}.$$
The evaluation of the constants αk requires delicate technical details.
4.4 Extension to quadratic programming
In this section we consider several extensions to general quadratic problems, for which (somewhat weaker) approximation guarantees can still be shown.
4.4.1 Nesterov’s approximation algorithm
Let us now consider the general quadratic problem:
$$\mathrm{qp}(A) = \max\Big\{\sum_{i,j=1}^n A_{ij}x_ix_j \ :\ x_i^2 = 1\ \forall i\in[n]\Big\}, \qquad (4.18)$$
where $A \in S^n$ is a symmetric matrix. Analogously we can define the semidefinite programming relaxation:
$$\mathrm{sdp}(A) = \max\{\langle A, X\rangle \ :\ X \succeq 0,\ X_{ii} = 1\ \forall i\in[n]\}. \qquad (4.19)$$
The following inequality holds:
qp(A) ≤ sdp(A)
for any symmetric matrix A.
In the case when A = Lw /4 is the (scaled) Laplacian matrix of the weighted
graph (G, w), then we find the MAX CUT problem: qp(A) = mc(G, w), and
its semidefinite relaxation: sdp(A) = sdp(G, w). Assuming all edge weights
are nonnegative, we have just seen that 0.878..sdp(A) ≤ qp(A). The matrix
A = Lw /4 has two specific properties: (a) it is positive semidefinite and (b) its
off-diagonal entries are nonpositive.
We next consider the case when $A$ is assumed to be positive semidefinite only. Then Nesterov [7] shows that $\mathrm{sdp}(A)$ yields a $\frac{2}{\pi}$-approximation for $\mathrm{qp}(A)$ (Theorem 4.4.2 below). His proof is based on the same rounding technique as Goemans-Williamson, but the analysis is different. It relies on the following property of the function $\arcsin t$: there exist positive scalars $a_k > 0$ ($k \ge 1$) such that
$$\arcsin t = t + \sum_{k\ge 1} a_k\, t^{2k+1} \quad \text{for all } t \in [-1, 1]. \qquad (4.20)$$
Based on this one can show the following result.
Lemma 4.4.1. Given a matrix $X = (X_{ij}) \in S^n$, define the new matrix $\tilde X = (\arcsin X_{ij} - X_{ij})_{i,j=1}^n$, whose entries are the images of the entries of $X$ under the map $t \mapsto \arcsin t - t$. Then $X \succeq 0$ implies $\tilde X \succeq 0$.
Proof. The proof uses the following fact: if $X = (X_{ij})_{i,j=1}^n$ is positive semidefinite then, for any integer $k \ge 1$, the matrix $(X_{ij}^k)_{i,j=1}^n$ (whose entries are the $k$-th powers of the entries of $X$) is positive semidefinite as well (recall Section 1.2.2 of Chapter 1). Using this fact, the form of the series decomposition (4.20), and taking limits, we obtain the result of the lemma.
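Lemma 4.4.1 can be checked numerically on random instances; the sketch below (our own experiment) builds a random correlation matrix $X \succeq 0$ and verifies that the entrywise image $\tilde X = \arcsin(X) - X$ has no (significantly) negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# random correlation matrix: X = V V^T with unit-norm rows of V
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)
X = np.clip(V @ V.T, -1.0, 1.0)            # clip guards against rounding just above 1

X_tilde = np.arcsin(X) - X                 # entrywise map t -> arcsin(t) - t
eigs = np.linalg.eigvalsh(X_tilde)
print("smallest eigenvalue of X~ :", eigs.min())   # nonnegative up to numerical error
```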
Theorem 4.4.2. Assume $A$ is a positive semidefinite matrix. Then
$$\mathrm{sdp}(A) \ \ge\ \mathrm{qp}(A) \ \ge\ \frac{2}{\pi}\,\mathrm{sdp}(A).$$
Proof. Let $X$ be an optimal solution of the semidefinite program (4.19) and let $v_1,\dots,v_n$ be unit vectors such that $X = (v_i^Tv_j)_{i,j=1}^n$ (as in Steps 1, 2 of the GW algorithm). Pick a random unit vector $r$ and set $x_i = \mathrm{sign}(v_i^Tr)$ for $i \in [n]$ (as in Steps 3, 4 of the GW algorithm). We now use Lemma 4.2.6(ii) to compute the expected value of $\sum_{i,j=1}^n A_{ij}x_ix_j$:
$$E\Big(\sum_{i,j=1}^n A_{ij}x_ix_j\Big) = \sum_{i,j=1}^n A_{ij}\,E(x_ix_j) = \frac{2}{\pi}\sum_{i,j=1}^n A_{ij}\arcsin(v_i^Tv_j) = \frac{2}{\pi}\sum_{i,j=1}^n A_{ij}\arcsin X_{ij}$$
$$= \frac{2}{\pi}\Big(\sum_{i,j=1}^n A_{ij}X_{ij} + \sum_{i,j=1}^n A_{ij}(\arcsin X_{ij} - X_{ij})\Big).$$
By Lemma 4.4.1, the second term is equal to $\langle A, \tilde X\rangle \ge 0$, since $\tilde X \succeq 0$ (and $A \succeq 0$). Moreover, we recognize in the first term the objective value of the semidefinite program (4.19). Combining these facts, we obtain:
$$E\Big(\sum_{i,j=1}^n A_{ij}x_ix_j\Big) \ \ge\ \frac{2}{\pi}\,\mathrm{sdp}(A).$$
On the other hand, it is clear that
$$\mathrm{qp}(A) \ \ge\ E\Big(\sum_{i,j=1}^n A_{ij}x_ix_j\Big).$$
This concludes the proof.
4.4.2 Quadratic programs modeling MAX 2SAT
Here we consider another class of quadratic programs, of the form:
$$\mathrm{qp}(a,b) = \max\Big\{ \sum_{ij\in E_1} a_{ij}(1-x_ix_j) + \sum_{ij\in E_2} b_{ij}(1+x_ix_j) \ :\ x \in \{\pm1\}^n \Big\}, \qquad (4.21)$$
where $a_{ij}, b_{ij} \ge 0$ for all $ij$. Write the semidefinite relaxation:
$$\mathrm{sdp}(a,b) = \max\Big\{ \sum_{ij\in E_1} a_{ij}(1-X_{ij}) + \sum_{ij\in E_2} b_{ij}(1+X_{ij}) \ :\ X \succeq 0,\ X_{ii}=1\ \forall i\in[n] \Big\}. \qquad (4.22)$$
Goemans and Williamson [6] show that the same approximation result holds as
for MAX CUT:
Theorem 4.4.3. Assume that a, b ≥ 0. Then,
sdp(a, b) ≥ qp(a, b) ≥ 0.878 · sdp(a, b).
In the proof we will use the following variation of Lemma 4.2.7.
Lemma 4.4.4. For any $z \in (-1, 1]$, the following inequality holds:
$$\frac{2}{\pi}\,\frac{\pi - \arccos z}{1+z} \ \ge\ 0.878.$$
Proof. Set $t = -z \in [-1, 1)$. Using the identity $\arccos(-t) = \pi - \arccos t$ and applying (4.15), we get: $\frac{2}{\pi}\frac{\pi-\arccos z}{1+z} = \frac{2}{\pi}\frac{\arccos t}{1-t} \ge 0.878$.
Proof. (of Theorem 4.4.3) We apply the GW algorithm: let $X = (v_i^Tv_j)$ be an optimal solution of (4.22). Pick a random unit vector $r$ and set $x_i = \mathrm{sign}(v_i^Tr)$ for $i \in [n]$. Using the fact that $E(x_ix_j) = 1 - 2\cdot P(x_i \ne x_j) = 1 - \frac{2\arccos(v_i^Tv_j)}{\pi}$, we can compute the expected value of the quadratic objective of (4.21) evaluated at $x$:
$$E\Big(\sum_{ij\in E_1} a_{ij}(1-x_ix_j) + \sum_{ij\in E_2} b_{ij}(1+x_ix_j)\Big) = 2\sum_{ij\in E_1} a_{ij}\,\frac{\arccos(v_i^Tv_j)}{\pi} + 2\sum_{ij\in E_2} b_{ij}\Big(1-\frac{\arccos(v_i^Tv_j)}{\pi}\Big)$$
$$= \sum_{ij\in E_1} \underbrace{a_{ij}(1-v_i^Tv_j)}_{\ge 0}\ \underbrace{\frac{2}{\pi}\,\frac{\arccos(v_i^Tv_j)}{1-v_i^Tv_j}}_{\ge 0.878} + \sum_{ij\in E_2} \underbrace{b_{ij}(1+v_i^Tv_j)}_{\ge 0}\ \underbrace{\frac{2}{\pi}\,\frac{\pi-\arccos(v_i^Tv_j)}{1+v_i^Tv_j}}_{\ge 0.878} \ \ge\ 0.878\cdot \mathrm{sdp}(a,b).$$
Here we have used Lemmas 4.2.7 and 4.4.4. From this we can conclude that
qp(a, b) ≥ 0.878 · sdp(a, b).
In the next section we indicate how to use the quadratic program (4.21) in order to formulate MAX 2SAT.
4.4.3 Approximating MAX 2-SAT
An instance of MAX SAT is given by a collection of Boolean clauses C1 , . . . , Cm ,
where each clause Cj is a disjunction of literals, drawn from a set of variables
$\{z_1,\dots,z_n\}$. A literal is a variable $z_i$ or its negation $\bar z_i$. Moreover there is
a weight wj attached to each clause Cj . The MAX SAT problem asks for an
assignment of truth values to the variables z1 , . . . , zn that maximizes the total
weight of the clauses that are satisfied. MAX 2SAT consists of the instances
of MAX SAT where each clause has at most two literals. It is an NP-complete
problem [5] and analogously to MAX CUT it is also hard to approximate.
Goemans and Williamson show that their randomized algorithm for MAX
CUT also applies to MAX 2SAT and yields again a 0.878-approximation algorithm. Prior to their result, the best approximation was 3/4, due to Yannakakis
(1994).
To show this it suffices to model MAX 2SAT as a quadratic program of the
form (4.21). We now indicate how to do this. We introduce a variable xi ∈ {±1}
for each variable zi of the SAT instance. We also introduce an additional variable
x0 ∈ {±1} which is used as follows: zi is true if xi = x0 and false otherwise.
Given a clause C, define its value v(C) to be 1 if the clause C is true and 0
otherwise. Thus,
$$v(z_i) = \frac{1+x_0x_i}{2}, \qquad v(\bar z_i) = 1 - v(z_i) = \frac{1-x_0x_i}{2}.$$
Based on this one can now express $v(C)$ for a clause with two literals:
$$v(z_i \vee z_j) = 1 - v(\bar z_i \wedge \bar z_j) = 1 - v(\bar z_i)\,v(\bar z_j) = 1 - \frac{1-x_0x_i}{2}\cdot\frac{1-x_0x_j}{2} = \frac{1+x_0x_i}{4} + \frac{1+x_0x_j}{4} + \frac{1-x_ix_j}{4}.$$
Analogously, one can express $v(z_i \vee \bar z_j)$ and $v(\bar z_i \vee \bar z_j)$, by replacing $x_i$ by $-x_i$ when $z_i$ is negated. In all cases we see that $v(C)$ is a linear combination of terms of the form $1 + x_ix_j$ and $1 - x_ix_j$ with nonnegative coefficients.
Now MAX 2SAT can be modelled as
$$\max\Big\{\sum_{j=1}^m w_j\, v(C_j) \ :\ x_0^2 = x_1^2 = \dots = x_n^2 = 1\Big\}.$$
This quadratic program is of the form (4.21). Hence Theorem 4.4.3 applies.
Therefore, the approximation algorithm of Goemans and Williamson gives a
0.878 approximation for MAX 2SAT.
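To make the reduction concrete, the following sketch (our own encoding; index 0 stands for the auxiliary variable $x_0$, and a literal is a pair of a variable index and a sign) expands weighted 2-clauses into the coefficient maps $a$ and $b$ of the quadratic program (4.21).

```python
from collections import defaultdict

def add_clause(a, b, lit_i, lit_j, weight=1.0):
    # lit = (variable index >= 1, sign): sign = +1 for z_i, -1 for its negation.
    # Adds weight * v(lit_i OR lit_j) to the coefficient maps:
    #   a[(p, q)] multiplies (1 - x_p x_q),  b[(p, q)] multiplies (1 + x_p x_q).
    (i, si), (j, sj) = lit_i, lit_j
    for var, s in ((i, si), (j, sj)):
        if s > 0:
            b[(0, var)] += weight / 4      # term (1 + x_0 x_var)/4
        else:
            a[(0, var)] += weight / 4      # term (1 - x_0 x_var)/4
    key = (min(i, j), max(i, j))
    if si * sj > 0:
        a[key] += weight / 4               # term (1 - x_i x_j)/4
    else:
        b[key] += weight / 4               # term (1 + x_i x_j)/4

a, b = defaultdict(float), defaultdict(float)
add_clause(a, b, (1, +1), (2, +1))         # clause z_1 OR z_2
add_clause(a, b, (1, +1), (2, -1))         # clause z_1 OR (not z_2)
print(dict(a))
print(dict(b))
```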
4.5 Further reading and remarks
We start with an anecdote: Knuth writes about the finding of the approximation ratio 0.878 in his article “Mathematical Vanity Plates”. For their work [6], Goemans and Williamson won in 2000 the Fulkerson prize (sponsored jointly by the Mathematical Programming Society and the AMS), which recognizes outstanding papers in the area of discrete mathematics.
How good is the MAX CUT algorithm? Are there graphs where the value of the semidefinite relaxation and the value of the maximum cut are a factor of 0.878 apart, or is this value 0.878, which maybe looks strange at first sight, only an artefact of our analysis? It turns out that the value is optimal. In 2002 Feige and Schechtman gave an infinite family of graphs for which the ratio mc/sdp converges to exactly $0.878\ldots$. The proof uses a lot of nice mathematics (continuous graphs, Voronoi regions, the isoperimetric inequality) and it is explained in detail in Chapter 8 of the book Approximation Algorithms and Semidefinite Programming of Gärtner and Matoušek.
In 2007, Khot, Kindler, Mossel, O’Donnell showed that the algorithm of Goemans and Williamson is optimal in the following sense: If the unique games
conjecture is true, then there is no polynomial time approximation algorithm
achieving a better approximation ratio than 0.878 unless P = NP. Currently, the
validity and the implications of the unique games conjecture are under heavy
investigation. The book of Gärtner and Matoušek also contains an introduction
to the unique games conjecture.
4.6 Exercises
4.1 The goal of this exercise is to show that the maximum weight stable set
problem can be formulated as an instance of the maximum cut problem.
Let $G = (V, E)$ be a graph with node weights $c \in \mathbb{R}^V_+$. Define the new graph $G' = (V', E')$ with node set $V' = V \cup \{0\}$, with edge set $E' = E \cup \{\{0,i\} : i \in V\}$, and with edge weights $w \in \mathbb{R}^{E'}_+$ defined by
$$w_{0i} = c_i - \deg_G(i)\,M \ \text{ for } i \in V, \qquad \text{and} \qquad w_{ij} = M \ \text{ for } \{i,j\} \in E.$$
Here, degG (i) denotes the degree of node i in G, and M is a constant to
be determined.
(a) Let S ⊆ V . Show: w(S, V ′ \ S) = c(S) − 2M |E(S)|.
(b) Show: If M is sufficiently large, then S ⊆ V is a stable set of maximum weight in (G, c) if and only if (S, V ′ \ S) is a cut of maximum
weight in (G′ , w).
Give an explicit value of M for which the above holds.
4.2 Let G = (V = [n], E) be a graph and let En denote the set of edges of
the complete graph Kn (thus E ⊆ En ). Let πE denote the projection
from the space REn to its subspace RE , mapping x = (xij )1≤i<j≤n to
y = (xij ){i,j}∈E .
The metric polytope MET(G) is the polytope in RE , defined by the inequalities:
0 ≤ xij ≤ 1
for all edges {i, j} of G, and
x(F ) − x(C \ F ) ≤ |F | − 1
for all cycles C in G and all subsets F ⊆ C with odd cardinality |F |.
Recall that the metric polytope METn is the polytope in REn defined by
the triangle inequalities:
xij + xik + xjk ≤ 2, xij ≤ xik + xjk , xik ≤ xij + xjk , xjk ≤ xij + xik
for all 1 ≤ i < j < k ≤ n.
Show: MET(G) = πE (METn ).
4.3 Let $G = (V = [n], E)$ be a graph with edge weights $w \in \mathbb{R}^E$. Define the Laplacian matrix $L_w \in S^n$ by: $L_{ii} = \sum_{j\in V:\{i,j\}\in E} w_{ij}$ for $i \in V$, $L_{ij} = -w_{ij}$ if $\{i,j\} \in E$, and $L_{ij} = 0$ otherwise.
(a) Show: $x^T L_w x = 2\cdot\sum_{\{i,j\}\in E} w_{ij}(1-x_ix_j)$ for any vector $x \in \{\pm1\}^n$.
(b) Show: If $w \ge 0$ then $L_w \succeq 0$.
(c) Give an example of weights $w$ for which $L_w$ is not positive semidefinite.
4.4 Let $G = (V, E)$ be a graph with nonnegative edge weights $w \in \mathbb{R}^E_+$. Let
sdp(G, w) be the optimum value of the semidefinite relaxation for MAX
CUT (as defined in (4.6)).
(a) Show: sdp(G, w) ≤ w(E).
(b) Show: sdp(G, w) = mc(G, w) if G is bipartite.
(c) Assume that G is the complete graph Kn and all edge weights are
equal to 1. Show: sdp(Kn ) = n2 /4. What is the value of mc(Kn )?
4.5 Let $G = (V, E)$ be a graph and let $\mathrm{sdp}(G)$ be the optimum value of the semidefinite relaxation for unweighted MAX CUT, where all edge weights are equal to 1. Let $L$ denote the Laplacian matrix of $G$ and let $\lambda_{\max}(L)$ denote its largest eigenvalue.
(a) Show: If $G$ is vertex-transitive then $\mathrm{sdp}(G) = \frac{n}{4}\,\lambda_{\max}(L)$.
(b) Show: For the odd cycle $G = C_{2n+1}$ on $2n+1$ vertices,
$$\mathrm{sdp}(G) = \frac{2n+1}{2}\Big(1 + \cos\frac{\pi}{2n+1}\Big).$$
(c) How does the ratio $\frac{\mathrm{mc}(C_5)}{\mathrm{sdp}(C_5)}$ compare to the Goemans-Williamson ratio 0.878..?
4.6 Let $G = (V = [n], E)$ be a graph and let $w \in \mathbb{R}^E_+$ be nonnegative edge weights.
(a) Show the following reformulation for the MAX CUT problem:
$$\mathrm{mc}(G,w) = \max\Big\{\sum_{\{i,j\}\in E} w_{ij}\,\frac{\arccos(v_i^Tv_j)}{\pi} \ :\ v_1,\dots,v_n \text{ unit vectors in } \mathbb{R}^n\Big\}.$$
Hint: Use the analysis of the Goemans-Williamson algorithm.
(b) Let $v_1,\dots,v_5$ be unit vectors. Show:
$$\sum_{1\le i<j\le 3} \arccos(v_i^Tv_j) + \arccos(v_4^Tv_5) - \sum_{i=1}^3\sum_{j=4}^5 \arccos(v_i^Tv_j) \ \le\ 0.$$
(c) Let $v_1,\dots,v_7$ be unit vectors. Show:
$$\sum_{1\le i<j\le 7} \arccos(v_i^Tv_j) \ \le\ 12\pi.$$
4.7 For a matrix $A \in \mathbb{R}^{m\times n}$ we define the following quantities:
$$f(A) = \max_{I\subseteq[m],\, J\subseteq[n]} \Big|\sum_{i\in I}\sum_{j\in J} A_{ij}\Big|,$$
called the cut norm of $A$, and
$$g(A) = \max\Big\{\sum_{i\in[m]}\sum_{j\in[n]} A_{ij}x_iy_j \ :\ x_1,\dots,x_m,\,y_1,\dots,y_n \in \{\pm1\}\Big\}.$$
(a) Show: $f(A) \le g(A) \le 4f(A)$.
(b) Assume that all row sums and all column sums of $A$ are equal to 0. Show: $g(A) = 4f(A)$.
(c) Formulate a semidefinite programming relaxation for $g(A)$.
(d) Show:
$$g(A) = \max\Big\{\sum_{i\in[m]}\sum_{j\in[n]} A_{ij}x_iy_j \ :\ x_1,\dots,x_m,\,y_1,\dots,y_n \in [-1,1]\Big\}.$$
(e) Assume that $A$ is a symmetric positive semidefinite $n\times n$ matrix. Show:
$$g(A) = \max\Big\{\sum_{i=1}^n\sum_{j=1}^n A_{ij}x_ix_j \ :\ x_1,\dots,x_n \in \{\pm1\}\Big\}.$$
4.8 Let $G = (V, E)$ be a graph with nonnegative edge weights $w \in \mathbb{R}^E_+$. The
goal is to show that the maximum cut problem in (G, w) can be formulated
as an instance of computing the cut norm f (A) of some matrix A (as
defined in Exercise 4.7).
For this consider the following 2|E|×|V | matrix A. Its columns are indexed
by V and, for each edge e = {u, v} ∈ E, there are two rows in A, indexed
by the two oriented pairs e1 = (u, v) and e2 = (v, u), with entries:
Ae1 ,u = we , Ae1 ,v = −we , Ae2 ,u = −we , Ae2 ,v = we .
Show: mc(G, w) = f (A).
BIBLIOGRAPHY
[1] F. Barahona. The max-cut problem in graphs not contractible to K5 .
Operations Research Letters 2:107–111, 1983.
[2] C. Delorme and S. Poljak. Laplacian eigenvalues and the maximum cut
problem. Mathematical Programming 62:557–574, 1993.
[3] M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer, 1997.
[4] A. Frieze and M. Jerrum. Improved approximation algorithms for MAX
k-CUT and MAX BISECTION. Algorithmica 18:67–81, 1997.
[5] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
[6] M.X. Goemans and D.P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM 42:1115–1145, 1995.
[7] Y. Nesterov. Quality of semidefinite relaxation for nonconvex quadratic
optimization. CORE Discussion Paper, Number 9719, 1997.
[8] S. Poljak. Polyhedral and eigenvalue approximations of the max-cut
problem. In Sets, Graphs and Numbers, Vol 60 of Colloquia Mathematica Societatis János Bolyai, Budapest, pp. 569–581, 1991.
[9] S. Poljak and Z. Tuza. The expected relative error of the polyhedral approximation of the max-cut problem. Operations Research Letters 16:191–198, 1994.
[10] A. Schrijver. A Course in Combinatorial Optimization. Lecture Notes.
Available at http://homepages.cwi.nl/~lex/files/dict.pdf