Linear Programming
Lecturer: Michel X. Goemans
Consider a linear program in standard form:

min{c^T x : Ax = b, x ≥ 0},

where

x = (x_1, . . . , x_n)^T ∈ R^{n×1},  b = (b_1, . . . , b_m)^T ∈ R^{m×1},  c = (c_1, . . . , c_n)^T ∈ R^{n×1},

and A = (a_ij) ∈ R^{m×n} is the constraint matrix.
2 Basic Terminology
Definition 1 If x satisfies Ax = b, x ≥ 0, then x is feasible.
Definition 2 A linear program (LP) is feasible if there exists a feasible solution; otherwise it is said to be infeasible.
Definition 3 An optimal solution x* is a feasible solution s.t. c^T x* = min{c^T x : Ax = b, x ≥ 0}.
Definition 4 An LP is unbounded (from below) if ∀λ ∈ R, ∃ a feasible x* s.t. c^T x* ≤ λ.
3 Equivalent Forms
A linear program can take on several forms. We might be maximizing instead of minimizing. We might have a combination of equality and inequality constraints. Some variables may be restricted to be non-positive instead of non-negative, or be unrestricted in sign. Two forms are said to be equivalent if they have the same set of optimal solutions or are both infeasible or both unbounded.
1. A maximization problem can be expressed as a minimization problem.
max c^T x ⇔ min −c^T x
3. By adding a slack variable, an inequality can be represented as a combination of equality and non-negativity constraints:
a_i^T x ≤ b_i ⇔ a_i^T x + s_i = b_i, s_i ≥ 0.
4 Example
Consider the following linear program:
min x_2 subject to
x_1 ≥ 2
3x_1 − x_2 ≥ 0
x_1 + x_2 ≥ 6
−x_1 + 2x_2 ≥ 0
The optimal solution is (4, 2) of cost 2 (see Figure 1). If we were maximizing x2
instead of minimizing under the same feasible region, the resulting linear program
would be unbounded since x2 can increase arbitrarily. From this picture, the reader
should be convinced that, for any objective function for which the linear program is
bounded, there exists an optimal solution which is a “corner” of the feasible region.
We shall formalize this notion in the next section.
[Figure 1: The feasible region, bounded by the constraint lines x_1 = 2, 3x_1 − x_2 = 0, x_1 + x_2 = 6 and −x_1 + 2x_2 = 0; the points (2,4) and (2,1) are marked.]
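To make the example concrete, here is a small sketch (ours, assuming SciPy is available; the variable names are not from the notes) that solves this linear program numerically and recovers the corner (4, 2) of cost 2:

# Sketch: solve the example LP with SciPy.
# minimize x2 subject to x1 >= 2, 3x1 - x2 >= 0, x1 + x2 >= 6, -x1 + 2x2 >= 0.
from scipy.optimize import linprog

c = [0, 1]                                      # objective: minimize x2
# linprog uses A_ub @ x <= b_ub, so the ">=" constraints are negated.
A_ub = [[-1, 0], [-3, 1], [-1, -1], [1, -2]]
b_ub = [-2, 0, -6, 0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None), (None, None)])
print(res.x, res.fun)                           # expected: approximately [4. 2.] and 2.0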
5 The Geometry of LP
Let P = {x : Ax = b, x ≥ 0} ⊆ Rn .
A point x ∈ P is a vertex of P if there is no y ≠ 0 such that both x + y ∈ P and x − y ∈ P.
Theorem 1 Assume min{c^T x : x ∈ P} is finite. Then for every x ∈ P there exists a vertex x′ of P such that c^T x′ ≤ c^T x.
Proof:
If x is a vertex, then take x′ = x.
If x is not a vertex, then, by definition, ∃y ≠ 0 s.t. x + y, x − y ∈ P. Since A(x + y) = b and A(x − y) = b, we have Ay = 0.
WLOG, assume c^T y ≤ 0 (take either y or −y). If c^T y = 0, choose y such that ∃j s.t. y_j < 0; since y ≠ 0 and c^T y = c^T(−y) = 0, this must be true for either y or −y.
Consider x + λy, λ > 0. Then c^T(x + λy) = c^T x + λ c^T y ≤ c^T x, since c^T y is assumed non-positive.
Case 1 ∃j s.t. y_j < 0. As λ increases, component j of x + λy decreases, so there is a largest λ for which x + λy ≥ 0. At this λ the point x + λy is feasible, costs no more than x, and has at least one more zero component than x; repeating the argument, we eventually reach a vertex.
Case 2 y_j ≥ 0 ∀j.
By assumption, c^T y < 0 and x + λy is feasible for all λ ≥ 0, since A(x + λy) = Ax + λAy = Ax = b, and x + λy ≥ x ≥ 0. But c^T(x + λy) = c^T x + λ c^T y → −∞ as λ → ∞, implying the LP is unbounded, a contradiction. ✷
Corollary 2 If min{c^T x : x ∈ P} is finite, then there exists an optimal solution which is a vertex of P.
Proof:
Suppose not. Take an optimal solution. By Theorem 1 there exists a vertex costing no more, and this vertex must be optimal as well. ✷
Corollary 3 If P = {x : Ax = b, x ≥ 0} ≠ ∅, then P has a vertex.
Theorem 4 Let P = {x : Ax = b, x ≥ 0} and let x ∈ P. Let A_x denote the submatrix of A consisting of the columns A_j with x_j > 0. Then x is a vertex of P (i) if and only if A_x has linearly independent columns (ii).
Show ¬(ii) ⇒ ¬(i):
Suppose A_x has linearly dependent columns. Then ∃y ≠ 0 s.t. A_x y = 0. Extend y to R^n by adding zero components. Then ∃y ∈ R^n s.t. Ay = 0, y ≠ 0, and y_j = 0 wherever x_j = 0.
Consider y′ = λy for small λ > 0. We claim that x + y′, x − y′ ∈ P, by an argument analogous to that in Case 1 of the proof of Theorem 1, above. Hence, x is not a vertex. ✷
6 Bases
Let x be a vertex of P = {x : Ax = b, x ≥ 0}. Suppose first that |{j : x_j > 0}| = m (where A is m × n). In this case we denote B = {j : x_j > 0}. Also let A_B = A_x; we use this notation not only for A and B, but also for x and for other sets of indices. Then A_B is a square matrix whose columns are linearly independent (by Theorem 4), so it is non-singular. Therefore we can express x as x_j = 0 if j ∉ B, and since A_B x_B = b, it follows that x_B = A_B^{-1} b. The variables corresponding to B will be called basic. The others will be referred to as nonbasic. The set of indices corresponding to nonbasic variables is denoted by N = {1, . . . , n} − B. Thus, we can write the above as x_B = A_B^{-1} b and x_N = 0.
Without loss of generality we will assume that A has full row rank, rank(A) = m.
Otherwise either there is a redundant constraint in the system Ax = b (and we can
remove it), or the system has no solution at all.
If |{j : x_j > 0}| < m, we can augment A_x with additional linearly independent columns until it is an m × m submatrix of A of full rank, which we will denote A_B. In other words, although there may be less than m positive components in x, it is convenient to always have a basis B such that |B| = m and A_B is non-singular. This enables us to always express x as we did before: x_N = 0, x_B = A_B^{-1} b.
To summarize, a vertex x thus has an associated basis B with |B| = m such that:
1. x_N = 0 for N = {1, . . . , n} − B,
2. A_B is non-singular,
3. x_B = A_B^{-1} b ≥ 0.
In this case we say that x is a basic feasible solution. Note that a vertex can have several basic feasible solutions corresponding to it (obtained by augmenting {j : x_j > 0} in different ways). A basis might not lead to any basic feasible solution since A_B^{-1} b is not necessarily nonnegative.
Example:
x1 + x2 + x3 = 5
2x1 − x2 + 2x3 = 1
x1 , x2 , x3 ≥ 0
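As an illustration, the following sketch (ours, assuming NumPy; it is not part of the original notes) enumerates the 2-element bases of this system and reports which ones give a basic feasible solution:

# Sketch: enumerate bases of the example system
#   x1 + x2 + x3 = 5,  2x1 - x2 + 2x3 = 1,  x >= 0.
import itertools
import numpy as np

A = np.array([[1., 1., 1.], [2., -1., 2.]])
b = np.array([5., 1.])

for B in itertools.combinations(range(3), 2):
    AB = A[:, B]
    if abs(np.linalg.det(AB)) < 1e-12:
        print(B, "singular: not a basis")
        continue
    xB = np.linalg.solve(AB, b)         # x_B = A_B^{-1} b
    x = np.zeros(3)
    x[list(B)] = xB
    print(B, x, "basic feasible" if np.all(xB >= 0) else "basic but infeasible")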
Remark. A crude upper bound on the number of vertices of P is (n choose m). This number is exponential (it is upper bounded by n^m). We can come up with a tighter approximation of (n − m/2 choose m/2), though this is still exponential. The reason why the number is much smaller is that most basic solutions to the system Ax = b (which we counted) are not feasible, that is, they do not satisfy x ≥ 0.
7 The Simplex Method
Starting from a basic feasible solution with basis B, split x into its basic and nonbasic parts and rewrite the linear program as
min c_B^T x_B + c_N^T x_N
s.t. A_B x_B + A_N x_N = b
     x_B, x_N ≥ 0.
Here B is the basis corresponding to the bfs we are starting from. Note that, for any solution x, x_B = A_B^{-1} b − A_B^{-1} A_N x_N, and that its total cost c^T x can be specified as follows:
c^T x = c_B^T x_B + c_N^T x_N
      = c_B^T (A_B^{-1} b − A_B^{-1} A_N x_N) + c_N^T x_N
      = c_B^T A_B^{-1} b + (c_N^T − c_B^T A_B^{-1} A_N) x_N.
We denote the reduced cost of the non-basic variables by c̃_N, c̃_N^T = c_N^T − c_B^T A_B^{-1} A_N, i.e. the quantity which is the coefficient of x_N above. If there is a j ∈ N such that c̃_j < 0, then by increasing x_j (up from zero) we will decrease the cost (the value of the objective function). Of course x_B depends on x_N, and we can increase x_j only as long as all the components of x_B remain nonnegative.
So in a step of the Simplex method, we find a j ∈ N such that c̃_j < 0, and increase x_j as much as possible while keeping x_B ≥ 0. It is not possible to increase x_j any further once one of the components of x_B reaches zero. What has happened is that a non-basic variable is now positive and we include it in the basis, while one variable which was basic is now zero, so we remove it from the basis.
If, on the other hand, there is no j ∈ N such that c̃_j < 0, then we stop, and the current basic feasible solution is an optimal solution. This follows from the new expression for c^T x since x_N is nonnegative.
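To make the pivoting step concrete, here is a rough sketch (ours, assuming NumPy, a full-row-rank A and a given starting basis; it ignores degeneracy and cycling, which the remarks below address):

import numpy as np

def simplex_step(A, b, c, B):
    """One pivot of the (revised) Simplex method.
    B is a list of m basic indices; returns the updated basis,
    or None if the current basis is already optimal."""
    m, n = A.shape
    N = [j for j in range(n) if j not in B]
    AB_inv = np.linalg.inv(A[:, B])
    xB = AB_inv @ b                             # current basic solution x_B = A_B^{-1} b
    y = c[B] @ AB_inv                           # simplex multipliers c_B^T A_B^{-1}
    c_red = c[N] - y @ A[:, N]                  # reduced costs of nonbasic variables
    if np.all(c_red >= 0):
        return None                             # optimal: no negative reduced cost
    j = N[int(np.argmin(c_red))]                # entering variable
    d = AB_inv @ A[:, j]                        # how x_B changes as x_j increases
    if np.all(d <= 0):
        raise ValueError("LP is unbounded")
    ratios = np.where(d > 0, xB / np.where(d > 0, d, 1), np.inf)
    leave = int(np.argmin(ratios))              # ratio test: first basic variable to hit 0
    B = list(B)
    B[leave] = j
    return B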
Remarks:
1. Note that some of the basic variables may be zero to begin with, and in this
case it is possible that we cannot increase xj at all. In this case we can replace
say j by k in the basis, but without moving from the vertex corresponding to
the basis. In the next step we might replace k by j, and be stuck in a loop.
Thus, we need to specify a “pivoting rule” to determine which index should
enter the basis, and which index should be removed from the basis.
2. While many pivoting rules (including those that are used in practice) can lead to infinite loops, there is a pivoting rule which will not, known as the minimal index rule: choose the minimal j and k possible [Bland, 1977]. There are other methods of “breaking ties” which eliminate infinite loops.
3. There is no known pivoting rule for which the number of pivots in the worst
case is better than exponential.
4. The question of the complexity of the Simplex algorithm and the last remark lead to the question of the length of the shortest path between two vertices of a convex polyhedron, where the path is along edges and the length of the path is measured in terms of the number of vertices visited.
Hirsch Conjecture: For m hyperplanes in d dimensions the length of the
shortest path between any two vertices of the arrangement is at most m − d.
This is a very open question — there is not even a polynomial bound proven
on this length.
On the other hand, one should note that even if the Hirsch Conjecture is true,
it doesn’t say much about the Simplex Algorithm, because Simplex generates
paths which are monotone with respect to the objective function, whereas the
shortest path need not be monotone.
Recently, Kalai (and others) has considered a randomized pivoting rule. The
idea is to randomly permute the index columns of A and to apply the Simplex
method, always choosing the smallest j possible. In this way, it is possible to
show a subexponential bound on the expected number of pivots. This leads to a subexponential bound for the diameter of any convex polytope defined by m hyperplanes in d-dimensional space.
The question of the existence of a polynomial pivoting scheme is still open
though. We will see later a completely different algorithm which is polynomial,
although not strongly polynomial (the existence of a strongly polynomial algo-
rithm for linear programming is also open). That algorithm will not move from
one vertex of the feasible domain to another like the Simplex, but will confine
its interest to points in the interior of the feasible domain.
8 When is a Linear Program Feasible?
We now turn to the question of recognizing whether a system of equations has a solution. Consider the following system, and multiply the three equations by −4, 1 and 1, respectively:
−4 × (x_1 + x_2 + x_3 = 6)
 1 × (2x_1 + 3x_2 + x_3 = 8)
 1 × (2x_1 + x_2 + 3x_3 = 0)
The linear combination results in the equation
0x_1 + 0x_2 + 0x_3 = −16,
which means of course that the system of equations has no feasible solution.
In fact, an elementary theorem of linear algebra says that if a system has no
solution, there is always a vector y such as in our example (y = (−4, 1, 1)) which
proves that the system has no solution.
This is not quite enough for our purposes, because a system can be feasible,
but still have no non-negative solution x ≥ 0. Fortunately, the following lemma establishes the corresponding result for our system Ax = b, x ≥ 0.
Theorem 6 (Farkas’ Lemma) Exactly one of the following holds for the system Ax = b, x ≥ 0:
1. There is x such that Ax = b, x ≥ 0.
2. There is y such that A^T y ≥ 0 and b^T y < 0.
Proof:
We will first show that the two conditions cannot hold simultaneously, and then that at least one of them must hold.
Suppose we had both x and y as in the statement of the theorem. Then
Ax = b ⟹ y^T Ax = y^T b ⟹ x^T A^T y = y^T b,
but this is a contradiction, because y^T b < 0 while, since x ≥ 0 and A^T y ≥ 0, we have x^T A^T y ≥ 0.
The other direction is less trivial, and usually shown using properties of the Sim-
plex algorithm, mainly duality. We will use another tool, and later use Farkas’ Lemma
to prove properties about duality in linear programming. The tool we shall use is the
Projection theorem, which we state without proof:
Theorem 7 (Projection theorem) Let K be a non-empty closed convex set in R^m and let b be any point in R^m. Then there exists a unique point p ∈ K minimizing ‖b − z‖ over z ∈ K (the projection of b onto K), and it satisfies (b − p)^T (z − p) ≤ 0 for all z ∈ K.
We are now ready to prove the other direction of Farkas’ Lemma. Assume that
there is no x such that Ax = b, x ≥ 0; we will show that there is y such that AT y ≥ 0
but y T b < 0.
Let K = {Ax : x ≥ 0} ⊆ Rm (A is an m × n matrix). K is a cone in Rm and it is
convex, non-empty and closed. According to our assumption, Ax = b, x ≥ 0 has no
solution, so b does not belong to K. Let p be the projection of b onto K.
[Figure: the point b, its projection p onto the cone K, and a point z ∈ K.]
A useful corollary of Farkas’ Lemma, which can be derived from it, is the following: for a system A′x′ ≤ b′, exactly one of
1. ∃x′ : A′x′ ≤ b′,
2. ∃y′ ≥ 0 : (A′)^T y′ = 0 and (b′)^T y′ < 0
holds. The intuition behind the precise form of 2. lies in the proof that both cannot happen: the contradiction 0 = 0^T x′ = (y′^T A′)x′ = y′^T(A′x′) ≤ y′^T b′ < 0 is obtained if y′ ≥ 0, (A′)^T y′ = 0 and (b′)^T y′ < 0.
9 Duality
Duality is the most important concept in linear programming. Duality allows one to provide a proof of optimality. This is not only important algorithmically; it also leads to beautiful combinatorial statements. Consider the linear program
min c^T x
s.t. Ax = b
     x ≥ 0.
Suppose we wanted to obtain the best possible lower bound on the optimal value. By multiplying each equation a_i^T x = b_i by some number y_i and summing up the resulting equations, we obtain y^T Ax = b^T y. If we impose that the coefficient of x_j in the resulting equation is less than or equal to c_j, then b^T y must be a lower bound on the optimal value, since x_j is constrained to be nonnegative. To get the best possible lower bound, we want to solve the following problem:
max bT y
s.t. AT y ≤ c
This is another linear program. We call it the dual of the original one, which is called the primal. As we just argued, solving this dual LP will give us a lower bound on the optimum value of the primal problem. Weak duality says precisely this: if we denote the optimum value of the primal by z, z = min c^T x, and the optimum value of the dual by w, then w ≤ z. We will use Farkas’ lemma to prove strong duality, which says that these quantities are in fact equal. We will also see that, in general, the dual of the dual is the primal.
Example:
z = min x_1 + 2x_2 + 4x_3
    x_1 + x_2 + 2x_3 = 5
    2x_1 + x_2 + 3x_3 = 8
    x_1, x_2, x_3 ≥ 0
The first equality gives a lower bound of 5 on the optimum value z, since x_1 + 2x_2 + 4x_3 ≥ x_1 + x_2 + 2x_3 = 5 because of nonnegativity of the x_i. We can get an even better lower bound by taking 3 times the first equality minus the second one. This gives x_1 + 2x_2 + 3x_3 = 7 ≤ x_1 + 2x_2 + 4x_3, implying a lower bound of 7 on z. For x = (3, 2, 0)^T, the objective function is precisely 7, implying optimality. The mechanism of generating lower bounds is formalized by the dual linear program:
w = max 5y_1 + 8y_2
    y_1 + 2y_2 ≤ 1
    y_1 + y_2 ≤ 2
    2y_1 + 3y_2 ≤ 4
Here y_1 represents the multiplier for the first constraint and y_2 the multiplier for the second constraint. This LP’s objective function also achieves a maximum value of 7, at y = (3, −1)^T.
We now formalize the notion of duality. Let P and D be the following pair of dual
linear programs:
(P) z = min{c^T x : Ax = b, x ≥ 0}
(D) w = max{b^T y : A^T y ≤ c}.
(P ) is called the primal linear program and (D) the dual linear program.
In the proof below, we show that the dual of the dual is the primal. In other
words, if one formulates (D) as a linear program in standard form (i.e. in the same
form as (P )), its dual D(D) can be seen to be equivalent to the original primal (P ).
In any statement, we may thus replace the roles of primal and dual without affecting
the statement.
Proof:
The dual problem D is equivalent to min{−b^T y : A^T y + Is = c, s ≥ 0}. Changing forms, we get min{−b^T y^+ + b^T y^− : A^T y^+ − A^T y^− + Is = c, y^+, y^−, s ≥ 0}. Taking the dual of this we obtain: max{−c^T x : A(−x) ≤ −b, −A(−x) ≤ b, I(−x) ≤ 0}. But this is the same as min{c^T x : Ax = b, x ≥ 0} and we are done. ✷
We have the following results relating w and z.
Lemma (Weak Duality) If x is feasible in (P) and y is feasible in (D), then c^T x ≥ b^T y; in particular, w ≤ z.
Proof:
Suppose x is primal feasible and y is dual feasible. Then c^T x ≥ y^T Ax = y^T b, thus z = min{c^T x : Ax = b, x ≥ 0} ≥ max{b^T y : A^T y ≤ c} = w. ✷
From the preceding lemma we conclude that the following cases are not possible (these are dual statements):
1. P is feasible and unbounded, and D is feasible;
2. D is feasible and unbounded, and P is feasible.
For the proof of strong duality we will use the corollary of Farkas’ Lemma stated above: exactly one of
1. ∃x′ : A′x′ ≤ b′,
2. ∃y′ ≥ 0 : (A′)^T y′ = 0 and (b′)^T y′ < 0
holds. Strong duality asserts that z = w whenever P or D is feasible.
Proof:
We only need to show that z ≤ w. Assume without loss of generality (by duality) that P is feasible. If P is unbounded, then, by Weak Duality, we have z = w = −∞. Suppose P is bounded, and let x* be an optimal solution, i.e. Ax* = b, x* ≥ 0 and c^T x* = z. We claim that ∃y s.t. A^T y ≤ c and b^T y ≥ z; if so, we are done.
Suppose no such y exists. Then, by the preceding corollary, applied with A′ = [A^T; −b^T], b′ = [c; −z], x′ = y and y′ = [x; λ], there exist x ≥ 0 and λ ≥ 0 such that
Ax = λb and c^T x < λz.
9.1 Rules for Taking Dual Problems
If P is a minimization problem then D is a maximization problem. If P is a maxi-
mization problem then D is a minimization problem. In general, using the rules for
transforming a linear program into standard form, we have that the dual of (P ):
z = min c_1^T x_1 + c_2^T x_2 + c_3^T x_3
s.t.
A11 x1 + A12 x2 + A13 x3 = b1
A21 x1 + A22 x2 + A23 x3 ≥ b2
A31 x1 + A32 x2 + A33 x3 ≤ b3
x1 ≥ 0 , x2 ≤ 0 , x3 UIS
(where UIS means “unrestricted in sign” to emphasize that no constraint is on the
variable) is (D)
w = max b_1^T y_1 + b_2^T y_2 + b_3^T y_3
s.t.
A_{11}^T y_1 + A_{21}^T y_2 + A_{31}^T y_3 ≤ c_1
A_{12}^T y_1 + A_{22}^T y_2 + A_{32}^T y_3 ≥ c_2
A_{13}^T y_1 + A_{23}^T y_2 + A_{33}^T y_3 = c_3
y1 UIS , y2 ≥ 0 , y3 ≤ 0
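As a small illustration of these rules (the numbers are our own, not from the notes), consider the primal min x_1 + 2x_2 subject to x_1 + x_2 ≥ 3, x_1 − x_2 = 1, x_1 ≥ 0, x_2 UIS. The "≥" constraint gets a multiplier y_1 ≥ 0 and the equality constraint an unrestricted multiplier y_2; the sign-constrained variable x_1 gives a "≤" constraint in the dual and the unrestricted variable x_2 gives an equality. The dual is therefore max 3y_1 + y_2 subject to y_1 + y_2 ≤ 1, y_1 − y_2 = 2, y_1 ≥ 0, y_2 UIS.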
10 Complementary Slackness
Let P and D be
(P) z = min{c^T x : Ax = b, x ≥ 0}
(D) w = max{b^T y : A^T y ≤ c},
and let x be feasible in P and y be feasible in D. Then, by weak duality, we know that c^T x ≥ b^T y. We call the difference c^T x − b^T y the duality gap. The duality gap is zero iff x is optimal in P and y is optimal in D. That is, the duality gap can serve as a good measure of how close a feasible x and y are to the optimal solutions for P and D. The duality gap will be used in the description of the interior point method to monitor the progress towards optimality.
It is convenient to write the dual of a linear program as
(D) w = max{b^T y : A^T y + s = c for some s ≥ 0}.
Then we can write the duality gap as follows:
c^T x − b^T y = c^T x − x^T A^T y = c^T x − x^T(c − s) = x^T s,
since A^T y + s = c.
The following theorem allows one to check optimality of a primal and/or a dual solution.
Theorem (Complementary Slackness) Let x* be feasible in (P) and (y*, s*) be feasible in (D). Then the following are equivalent:
1. x* is an optimal solution to (P) and (y*, s*) is an optimal solution to (D);
2. (s*)^T x* = 0;
3. x*_j s*_j = 0, ∀ j = 1, . . . , n;
4. if s*_j > 0 then x*_j = 0.
Proof:
Suppose (1) holds; then, by strong duality, c^T x* = b^T y*. Since c = A^T y* + s* and Ax* = b, we get (y*)^T Ax* + (s*)^T x* = (x*)^T A^T y*, and thus (s*)^T x* = 0 (i.e. (2) holds). It follows, since x*_j, s*_j ≥ 0, that x*_j s*_j = 0, ∀ j = 1, . . . , n (i.e. (3) holds). Hence, if s*_j > 0 then x*_j = 0, ∀ j = 1, . . . , n (i.e. (4) holds). The converse also holds, and thus the proof is complete. ✷
In the example of section 9, the complementary slackness equations corresponding
to the primal solution x = (3, 2, 0)T would be:
y1 + 2y2 = 1
y1 + y2 = 2
Note that this implies that y1 = 3 and y2 = −1. Since this solution satisfies the
other constraint of the dual, y is dual feasible, proving that x is an optimum solution
to the primal (and therefore y is an optimum solution to the dual).
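The check can also be carried out numerically; the short sketch below (ours, assuming NumPy) verifies primal feasibility, dual feasibility and complementary slackness for x = (3, 2, 0) and y = (3, −1):

import numpy as np

A = np.array([[1., 1., 2.], [2., 1., 3.]])
b = np.array([5., 8.])
c = np.array([1., 2., 4.])
x = np.array([3., 2., 0.])
y = np.array([3., -1.])

s = c - A.T @ y                                 # dual slacks
print(np.allclose(A @ x, b), np.all(x >= 0))    # primal feasibility
print(np.all(s >= -1e-9))                       # dual feasibility
print(np.isclose(x @ s, 0))                     # complementary slackness
print(c @ x, b @ y)                             # both equal 7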
11.1 Size of the Input
Let us consider a linear program of the form
min c^T x
s.t. Ax = b
     x ≥ 0
where the entries of A, b and c are integers. For an integer n, define
size(n) = 1 + ⌈log_2(|n| + 1)⌉,
where the first 1 stands for the fact that we need one bit to store the sign of n, and size(n) represents the number of bits needed to encode n in binary. Analogously, we define the size of a p × 1 vector v and of a p × l matrix M as follows:
size(v) = Σ_{i=1}^{p} size(v_i)
size(M) = Σ_{i=1}^{p} Σ_{j=1}^{l} size(m_{ij})
The complexity of the algorithms below will be measured in terms of a parameter L that depends on the following quantities:
detmax = max_{A′} |det(A′)|, the maximum being taken over all square submatrices A′ of A,
bmax = max_i |b_i|,
cmax = max_j |c_j|.
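For concreteness, here is a small sketch (ours, not part of the notes) of the size function just defined:

import math

def size_int(n: int) -> int:
    """size(n) = 1 + ceil(log2(|n| + 1)): one bit for the sign plus the bits of |n|."""
    return 1 + math.ceil(math.log2(abs(n) + 1))

def size_vec(v) -> int:
    return sum(size_int(x) for x in v)

def size_mat(M) -> int:
    return sum(size_int(x) for row in M for x in row)

# Example: the vector b = (5, 8) from the duality example.
print(size_vec([5, 8]))   # size(5) = 1 + 3 = 4, size(8) = 1 + 4 = 5, total 9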
Proposition 13 L < size(LP), ∀A, b, c.
Lemma 14 1. If a ∈ Z then 1 + |a| ≤ 2^{size(a)−1}. 2. If v ∈ Z^n then 1 + ‖v‖ ≤ 2^{size(v)−n}. 3. If A ∈ Z^{n×n} then 1 + |det(A)| ≤ 2^{size(A)−n²}.
Proof:
1. By definition.
2. 1 + ‖v‖ ≤ 1 + ‖v‖_1 = 1 + Σ_{i=1}^{n} |v_i| ≤ Π_{i=1}^{n} (1 + |v_i|) ≤ Π_{i=1}^{n} 2^{size(v_i)−1} = 2^{size(v)−n}, where we have used 1.
3. Writing a_i for the rows of A and using Hadamard's inequality |det(A)| ≤ Π_i ‖a_i‖, we obtain, by 2,
1 + |det(A)| ≤ 1 + Π_{i=1}^{n} ‖a_i‖ ≤ Π_{i=1}^{n} (1 + ‖a_i‖) ≤ Π_{i=1}^{n} 2^{size(a_i)−n} = 2^{size(A)−n²}.
✷
We now prove Proposition 13.
Proof:
If B is a square submatrix of A then, by definition, size(B) ≤ size(A). Moreover, by Lemma 14, 1 + |det(B)| ≤ 2^{size(B)−1}. Hence,
1 + detmax ≤ 2^{size(A)−1}.    (1)
Let v ∈ Z^p. Then size(v) ≥ size(max_j |v_j|) + p − 1 = ⌈log(1 + max_j |v_j|)⌉ + p. Hence,
size(b) + size(c) ≥ ⌈log(1 + max_j |c_j|)⌉ + ⌈log(1 + max_i |b_i|)⌉ + m + n.    (2)
Remark 1 detmax · bmax · cmax · 2^{m+n} < 2^L, since for any integer n, 2^{size(n)} > |n|.
In what follows we will work with L as the size of the input to our algorithm.
11.2 Size of the Output
In order to even hope to solve a linear program in polynomial time, we better make
sure that the solution is representable in size polynomial in L. We know already that
if the LP is feasible, there is at least one vertex which is an optimal solution. Thus,
when finding an optimal solution to the LP, it makes sense to restrict our attention
to vertices only. The following theorem makes sure that vertices have a compact
representation.
Theorem 15 Let x be a vertex of P = {x : Ax = b, x ≥ 0}. Then there exist integers p_1, . . . , p_n and q such that x_j = p_j/q for all j, with
0 ≤ p_j < 2^L and 1 ≤ q < 2^L.
Proof:
Since x is a basic feasible solution, ∃ a basis B such that x_B = A_B^{-1} b and x_N = 0. Thus, we can set p_j = 0, ∀ j ∈ N, and focus our attention on the x_j's such that j ∈ B. We know by linear algebra that
x_B = A_B^{-1} b = (1/det(A_B)) cof(A_B) b,
where cof(A_B) is the cofactor matrix of A_B. Every entry of cof(A_B) is a determinant of some submatrix of A. Let q = |det(A_B)|; then q is an integer since A_B has integer components, q ≥ 1 since A_B is invertible, and q ≤ detmax < 2^L. Finally, note that p_B = q x_B = |cof(A_B) b|, thus p_i ≤ Σ_{j=1}^{m} |cof(A_B)_{ij}| |b_j| ≤ m · detmax · bmax < 2^L.
✷
Definition 8 (LP)
Input: Integral A, b, c, and a rational number λ.
Question: Is min{c^T x : Ax = b, x ≥ 0} ≤ λ?
Theorem 16 LP ∈ NP ∩ co-NP
Proof:
First, we prove that LP ∈ NP.
If the linear program is feasible and bounded, the “certificate” for verification of
instances for which min{cT x : Ax = b, x ≥ 0} ≤ λ is a vertex x′ of {Ax = b, x ≥ 0}
s.t. cT x′ ≤ λ. This vertex x′ always exists since by assumption the minimum is finite.
Given x′ , it is easy to check in polynomial time whether Ax′ = b and x′ ≥ 0. We also
need to show that the size of such a certificate is polynomially bounded by the size
of the input. This was shown in section 11.2.
If the linear program is feasible and unbounded, then, by strong duality, the dual is infeasible. Using Farkas' lemma on the dual, we obtain the existence of x̃: Ax̃ = 0, x̃ ≥ 0 and c^T x̃ = −1 < 0. Our certificate in this case consists of both a vertex of {Ax = b, x ≥ 0} (to show feasibility) and a vertex of {Ax = 0, x ≥ 0, c^T x = −1} (to show unboundedness given feasibility). By choosing a vertex x′ of {Ax = 0, x ≥ 0, c^T x = −1}, we ensure that x′ has polynomial size (again, see Section 11.2).
This proves that LP ∈ NP. (Notice that when the linear program is infeasible,
the answer to LP is “no”, but we are not responsible to offer such an answer in order
to show LP ∈ NP).
Secondly, we show that LP ∈ co-NP, i.e. that the complementary problem (call it coLP) is in NP, where coLP is defined as:
Input: A, b, c, and a rational number λ,
Question: Is min{cT x : Ax = b, x ≥ 0} > λ?
If {x : Ax = b, x ≥ 0} is nonempty, we can use strong duality to show that LP is
indeed equivalent to:
Input: A, b, c, and a rational number λ,
Question: Is max{bT y : AT y ≤ c} > λ?
which is also in NP, for the same reason as LP is.
If the primal is infeasible, by Farkas' lemma we know there exists y s.t. A^T y ≥ 0 and b^T y = −1 < 0. This completes the proof of the theorem. ✷
In 1984, Karmarkar presented another polynomial-time algorithm for linear programming. His algorithm avoids the combinatorial complexity (inherent in the simplex algorithm) of the vertices, edges and faces of the polyhedron by staying well inside the polyhedron (see Figure 13). His algorithm led to many other algorithms for linear programming based on similar ideas. These algorithms are known as interior point methods.
It still remains an open question whether there exists a strongly polynomial algo-
rithm for linear programming, i.e. an algorithm whose running time depends on m
and n and not on the size of any of the entries of A, b or c.
In the rest of these notes, we discuss an interior-point method for linear program-
ming and show its polynomiality.
High-level description of an interior-point algorithm:
1. If x (current solution) is close to the boundary, then map the polyhedron onto
another one s.t. x is well in the interior of the new polyhedron (see Figure 7).
Theorem 17 Let x_1, x_2 be vertices of {x : Ax = b, x ≥ 0}. If c^T x_1 ≠ c^T x_2, then |c^T x_1 − c^T x_2| > 2^{−2L}.
Proof:
By Theorem 15, ∃ q_1, q_2 such that 1 ≤ q_1, q_2 < 2^L and q_1 x_1, q_2 x_2 ∈ N^n. Furthermore,
|c^T x_1 − c^T x_2| = |q_1 c^T x_1 / q_1 − q_2 c^T x_2 / q_2|
                    = |q_1 q_2 (c^T x_1 − c^T x_2)| / (q_1 q_2)
                    ≥ 1/(q_1 q_2)    since q_1 q_2 (c^T x_1 − c^T x_2) is a nonzero integer
                    > 1/(2^L · 2^L) = 2^{−2L}    since q_1, q_2 < 2^L.
✷
Corollary 18 Let Z be the optimum value of min{c^T x : Ax = b, x ≥ 0}, let x be a feasible solution with c^T x ≤ Z + 2^{−2L}, and let x′ be a vertex with c^T x′ ≤ c^T x. Then x′ is an optimal solution.
Proof:
Suppose x′ is not optimal. Then ∃x*, an optimal vertex, such that c^T x* = Z. Since x′ is not optimal, c^T x′ ≠ c^T x*, and by Theorem 17
c^T x′ − c^T x* > 2^{−2L}
⇒ c^T x′ > c^T x* + 2^{−2L} = Z + 2^{−2L} ≥ c^T x ≥ c^T x′,
where the last two inequalities follow from the definitions of x and x′, a contradiction. ✷
What this corollary tells us is that we do not need to be very precise when choosing an optimal vertex. More precisely, we only need to compute the objective function with error less than 2^{−2L}. If we find a vertex that is within that margin of error, then it will be optimal.
Figure 7: A centering mapping. If x is close to the boundary, we map the polyhedron
P onto another one P ′ , s.t. the image x′ of x is closer to the center of P ′ .
For a strictly feasible primal-dual pair x and (y, s), the duality gap is
c^T x − b^T y = x^T s > 0.
The goal of the algorithm is to make this duality gap
very small but stay away from the boundaries. Two tools will be used to achieve this
goal in polynomial time.
Tool 1: Scaling (see Figure 7)
Scaling is a crucial ingredient in interior point methods. The two types of scaling
commonly used are projective scaling (the one used by Karmarkar) and affine scaling
(the one we are going to use).
Suppose the current iterate is x > 0 and s > 0, where x = (x1 , x2 , . . . , xn )T , then
the affine scaling maps x to x′ as follows.
x = (x_1, x_2, . . . , x_n)^T ⟶ x′ = (x_1/x_1, x_2/x_2, . . . , x_n/x_n)^T.
Notice this transformation maps x to e = (1, . . . , 1)^T.
We can express the scaling transformation in matrix form as x′ = X^{−1} x, or x = X x′, where
X = diag(x_1, x_2, . . . , x_n)
is the diagonal matrix with the components of x on its diagonal.
Using matrix notation we can rewrite the linear program (P) in terms of the transformed variables as:
minimize Z = c^T X x′
subject to A X x′ = b,
           x′ ≥ 0.
If we define c̄ = Xc (note that X^T = X) and Ā = AX, we get a linear program in the original form:
minimize Z = c̄^T x′
subject to Ā x′ = b,
           x′ ≥ 0.
We can also write the dual problem (D) as:
maximize W = b^T y
subject to (AX)^T y + Xs = c̄,
           Xs ≥ 0
or, equivalently,
maximize W = b^T y
subject to Ā^T y + s′ = c̄,
           s′ ≥ 0,
where s′ = Xs.
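A literal transcription of this scaling step might look as follows (a sketch of ours, assuming NumPy; the function and variable names are not from the notes):

import numpy as np

def affine_scale(A, c, x, s):
    """Rescale the problem so that the current primal iterate maps to e = (1,...,1)."""
    X = np.diag(x)             # X = diag(x_1, ..., x_n), x > 0
    A_bar = A @ X              # A-bar = A X
    c_bar = X @ c              # c-bar = X c
    s_prime = X @ s            # s' = X s, the dual slacks in the scaled space
    x_prime = np.ones_like(x)  # x' = X^{-1} x = e
    return A_bar, c_bar, x_prime, s_prime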
The algorithm is guided by a primal-dual potential function of the form
G(x, s) = q ln(x^T s) − Σ_{j=1}^{n} ln(x_j s_j),
for a parameter q to be chosen below.
Lemma 19 For any x, s > 0,
Σ_{j=1}^{n} ln(x_j s_j) ≤ n ln(x^T s) − n ln n.
Proof:
Given any n positive numbers t_1, . . . , t_n, we know that their geometric mean does not exceed their arithmetic mean, i.e.
(Π_{j=1}^{n} t_j)^{1/n} ≤ (1/n) Σ_{j=1}^{n} t_j.
(In fact the last inequality can be derived directly from the concavity of the logarithmic function.) The lemma follows if we set t_j = x_j s_j. ✷
Since our objective is that G → −∞ as x^T s → 0 (our primary goal is to get close to optimality), according to Lemma 19 we should choose some q > n (notice that ln x^T s → −∞ as x^T s → 0). In particular, if we choose q = n + 1, the algorithm will terminate after O(nL) iterations. In fact we are going to set q = n + √n, which gives us the smallest number — O(√n L) — of iterations by this method.
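Written out directly (a sketch of ours, assuming NumPy), the potential function with this choice of q is:

import numpy as np

def potential(x, s):
    """Primal-dual potential G(x, s) = q ln(x^T s) - sum_j ln(x_j s_j), with q = n + sqrt(n)."""
    n = len(x)
    q = n + np.sqrt(n)
    return q * np.log(x @ s) - np.sum(np.log(x * s))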
Question 2: When can we stop?
Suppose that x^T s ≤ 2^{−2L}. Then c^T x − Z ≤ c^T x − b^T y = x^T s ≤ 2^{−2L}, where Z is the optimum value of the primal problem. From Corollary 18, the following claim follows immediately.
Claim 20 If x^T s ≤ 2^{−2L}, then any vertex x* of the feasible region with c^T x* ≤ c^T x is an optimal solution to the primal.
In order to find x∗ from x, two methods can be used. One is based on purely
algebraic techniques (but is a bit cumbersome to describe), while the other (the
cleanest one in literature) is based upon basis reduction for lattices. We shall not
elaborate on this topic, although we’ll get back to this issue when discussing basis
reduction in lattices.
Lemma 21 Let x, s be feasible primal-dual vectors such that G(x, s) ≤ −k√n L for some constant k. Then
x^T s < e^{−kL}.
Proof:
By the definition of G(x, s) and the previous lemma we have:
−k√n L ≥ G(x, s) = (n + √n) ln x^T s − Σ_{j=1}^{n} ln(x_j s_j) ≥ √n ln x^T s + n ln n.
Rearranging, we obtain
ln x^T s ≤ −kL − √n ln n < −kL.
Therefore x^T s < e^{−kL}. ✷
The previous lemma and claim tell us that we can stop whenever G(x, s) ≤ −2√n L. In practice, the algorithm can terminate even earlier, so it is a good idea to check from time to time whether we can already extract the optimal solution.
Please notice that, according to Equation (3), the affine transformation does not change the value of the potential function. Hence we can work either in the original space or in the transformed space when we talk about the potential function.
The iterative step is as follows. Affine scaling maps (x_i, s_i) to (e, s′). In this transformed space, the point is far away from the boundaries. Either a dual or a primal step occurs, giving (x̃, s̃) and reducing the potential function. The point is then mapped back to the original space, resulting in (x_{i+1}, s_{i+1}).
[Figure: the gradient g, its projection d onto the null space {x : Ax = 0} of A, and the difference g − d, which lies in the space spanned by the rows a_1, a_2, . . . of A.]
Next, we are going to describe precisely how the primal or dual step is made such that
G(x_{i+1}, s_{i+1}) − G(x_i, s_i) ≤ −7/120 < 0
holds for either a primal or dual step, yielding an O(√n L) total number of iterations.
In order to find the new point (x̃, s̃) given the current iterate (e, s′) (remember we are working in the transformed space), we compute the gradient of the potential function. This is the direction along which the value of the potential function changes at the highest rate. Let g denote the gradient. Recalling that (e, s′) is the image of the current iterate, we obtain
g = ∇_x G(x, s)|_(e,s′) = (q/(x^T s)) s − (1/x_1, . . . , 1/x_n)^T |_(e,s′)
  = (q/(e^T s′)) s′ − e.    (4)
We would like to maximize the decrease in G, so we would like to move in the direction of −g. However, we must ensure the new point is still feasible (i.e. Ax̃ = b). Let d be the projection of g onto the null space {x : Ax = 0} of A. Thus, we will move in the direction of −d.
Claim 22 d = (I − A^T (A A^T)^{−1} A) g.
Proof:
Since g − d is orthogonal to the null space of A, it must be a combination of the row vectors of A. Hence we have
Ad = 0 and ∃w s.t. A^T w = g − d.
This implies
A^T w = g − d and (A A^T) w = A g    (the normal equations),
so w = (A A^T)^{−1} A g and d = g − A^T w = (I − A^T (A A^T)^{−1} A) g. ✷
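As a sketch (ours, assuming NumPy and a full-row-rank constraint matrix), the gradient of Equation (4) and its projection onto the null space can be computed as:

import numpy as np

def projected_gradient(A, s_prime):
    """Gradient of the potential at (e, s') and its projection onto {x : Ax = 0}."""
    n = A.shape[1]
    q = n + np.sqrt(n)
    e = np.ones(n)
    g = q * s_prime / (e @ s_prime) - e     # Equation (4)
    w = np.linalg.solve(A @ A.T, A @ g)     # normal equations (A A^T) w = A g
    d = g - A.T @ w                         # d = (I - A^T (A A^T)^{-1} A) g
    return g, d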
If ‖d‖ ≥ 0.4, we make a primal step: we set
x̃ = e − (1/(4‖d‖)) d,  s̃ = s′.
Claim 23 x̃ > 0.
Proof:
x̃_j = 1 − (1/4)(d_j/‖d‖) ≥ 3/4 > 0. ✷
This claim ensures that the new iterate is still an interior point. For a similar reason, we will see that s̃ > 0 when we make a dual step.
Proposition 24 When a primal step is made, G(x̃, s̃) − G(e, s′) ≤ −7/120.
If ‖d‖ < 0.4, we make a dual step. Again, we calculate the gradient
h = ∇_s G(x, s)|_(e,s′) = (q/(e^T s′)) e − (1/s′_1, . . . , 1/s′_n)^T.    (5)
Notice that h_j = g_j / s′_j; thus h and g can be seen to be approximately in the same direction.
Suppose the current dual feasible solution is y′, s′, i.e.
A^T y′ + s′ = c.
In the dual space we move perpendicular to the null space, in the direction of −(g − d). Thus, we set
s̃ = s′ − µ(g − d).
For any µ there exists y such that A^T y + s̃ = c: since g − d = A^T w, we can take y = y′ + µw. We choose µ = (e^T s′)/q and get A^T(y′ + µw) + s̃ = c. Therefore,
s̃ = s′ − ((e^T s′)/q)(g − d)
  = s′ − ((e^T s′)/q)(q s′/(e^T s′) − e − d)
  = ((e^T s′)/q)(d + e),
and x̃ = x′ = e. One can show that s̃ > 0 as we did in Claim 23, so such a move is legal.
Proposition 25 When a dual step is made, G(x̃, s̃) − G(e, s′) ≤ −1/6.
15 Analysis of the Potential Function
In this section, we prove the two propositions of the previous section, which concludes
the analysis of Ye’s algorithm.
Proof of Proposition 24:
G(x̃, s̃) − G(e, s′) = G(e − (1/(4‖d‖)) d, s′) − G(e, s′)
  = q ln(e^T s′ − d^T s′/(4‖d‖)) − Σ_{j=1}^{n} ln(1 − d_j/(4‖d‖)) − Σ_{j=1}^{n} ln s′_j
    − q ln(e^T s′) + Σ_{j=1}^{n} ln 1 + Σ_{j=1}^{n} ln s′_j
  = q ln(1 − d^T s′/(4‖d‖ e^T s′)) − Σ_{j=1}^{n} ln(1 − d_j/(4‖d‖)).
Using the inequality
−x − x²/(2(1 − a)) ≤ ln(1 − x) ≤ −x,    (6)
which holds for |x| ≤ a < 1, we get (with a = 1/4):
G(x̃, s̃) − G(e, s′) ≤ −q d^T s′/(4‖d‖ e^T s′) + Σ_{j=1}^{n} d_j/(4‖d‖) + Σ_{j=1}^{n} d_j²/(16‖d‖² · 2(3/4))
  = −q d^T s′/(4‖d‖ e^T s′) + e^T d/(4‖d‖) + 1/24
  = (1/(4‖d‖)) (e − (q/(e^T s′)) s′)^T d + 1/24
  = (1/(4‖d‖)) (−g)^T d + 1/24
  = −‖d‖²/(4‖d‖) + 1/24
  = −‖d‖/4 + 1/24
  ≤ −1/10 + 1/24
  = −7/120.
Note that g^T d = ‖d‖², since d is the projection of g. (This is where we use the fact that d is the projected gradient!) ✷
Before proving Proposition 25, we need the following lemma.
Lemma 26
Σ_{j=1}^{n} ln(s̃_j) − n ln(e^T s̃ / n) ≥ −2/15.
Proof:
Using the equality s̃ = (∆/q)(e + d), where ∆ = e^T s′, and Equation (6), which holds for |x| ≤ a < 1 (here with a = 2/5, since ‖d‖ < 0.4), we see that
Σ_{j=1}^{n} ln(s̃_j) − n ln(e^T s̃/n) = Σ_{j=1}^{n} ln((∆/q)(1 + d_j)) − n ln((∆/q)(1 + e^T d/n))
  ≥ Σ_{j=1}^{n} (d_j − d_j²/(2(3/5))) − n (e^T d/n)
  ≥ −‖d‖²/(6/5)
  ≥ −2/15. ✷
Proof of Proposition 25:
Using Lemma 26 and the inequality
Σ_{j=1}^{n} ln(s_j) ≤ n ln(e^T s / n),
which follows from the concavity of the logarithm function, we have
G(e, s̃) − G(e, s′) = q ln(e^T s̃ / e^T s′) − Σ_{j=1}^{n} ln(s̃_j) + Σ_{j=1}^{n} ln(s′_j)
  ≤ q ln(e^T s̃ / e^T s′) + 2/15 − n ln(e^T s̃ / n) + n ln(e^T s′ / n)
  = 2/15 + √n ln(e^T s̃ / e^T s′).
On the other hand,
e^T s̃ = (∆/q)(n + e^T d),
and recalling that ∆ = e^T s′,
e^T s̃ / e^T s′ = (n + e^T d)/q ≤ (n + 0.4√n)/(n + √n),
since, by the Cauchy-Schwartz inequality, |e^T d| ≤ ‖e‖ ‖d‖ = √n ‖d‖ < 0.4√n. Combining the above inequalities yields
G(e, s̃) − G(e, s′) ≤ 2/15 + √n ln(1 − 0.6√n/(n + √n))
  ≤ 2/15 − 0.6 n/(n + √n)
  ≤ 2/15 − 3/10
  = −1/6,
since n + √n ≤ 2n. ✷
This completes the analysis of Ye’s algorithm.
16 Bit Complexity
Throughout the presentation of the algorithm, we assumed that all operations can
be performed exactly. This is a fairly unrealistic assumption. For example, notice
that ‖d‖ might be irrational since it involves a square root. However, none of the thresholds we set were crucial. We could for example test whether ‖d‖ ≥ 0.4 or ‖d‖ ≤ 0.399. To test this, we need to compute only a few bits of ‖d‖. Also, if we perform a primal step (i.e. ‖d‖ ≥ 0.4) and compute the first few bits of ‖d‖ so that the resulting approximation ‖d‖_ap satisfies (4/5)‖d‖ ≤ ‖d‖_ap ≤ ‖d‖, then if we go through the analysis of the primal step performed in Proposition 24, we obtain that the reduction in the potential function is at least 19/352 instead of the previous 7/120. Hence, by rounding ‖d‖ we can still maintain a constant decrease in the potential function.
Another potential problem arises when using Gaussian elimination to compute the projected gradient. We mentioned that Gaussian elimination requires O(n³) arithmetic operations, but we also need to show that, during the computation, the numbers involved have polynomial size. For that purpose, consider the use of Gaussian elimination to solve a system Ax = b where
A = A^(1) =
[ a^(1)_11  a^(1)_12  · · ·  a^(1)_1n ]
[ a^(1)_21  a^(1)_22  · · ·  a^(1)_2n ]
[    ···       ···    · · ·     ···   ]
[ a^(1)_m1  a^(1)_m2  · · ·  a^(1)_mn ].
Assume that a^(1)_11 ≠ 0 (otherwise, we can permute rows or columns). In the first iteration, we subtract a^(1)_i1 / a^(1)_11 times the first row from row i, for i = 2, . . . , m, resulting in the following matrix:
A^(2) =
[ a^(2)_11  a^(2)_12  · · ·  a^(2)_1n ]
[    0      a^(2)_22  · · ·  a^(2)_2n ]
[   ···        ···    · · ·     ···   ]
[    0      a^(2)_m2  · · ·  a^(2)_mn ].
In general, A^(i+1) is obtained by subtracting a^(i)_ji / a^(i)_ii times row i from row j of A^(i), for j = i + 1, . . . , m.
(i)
Theorem 27 For all i ≤ j, k, ajk can be written in the form det(B)/ det(C) where
B and C are some submatrices of A.
Proof:
Let B_i denote the i × i submatrix of A^(i) consisting of the first i entries of the first i rows. Let B^(i)_jk denote the i × i submatrix of A^(i) consisting of the first i − 1 rows and row j, and the first i − 1 columns and column k. Since B_i and B^(i)_jk are upper triangular matrices, their determinants are the products of the entries along the main diagonal and, as a result, we have:
a^(i)_ii = det(B_i) / det(B_{i−1})
and
a^(i)_jk = det(B^(i)_jk) / det(B_{i−1}).
Moreover, remember that row operations do not affect determinants; hence, the determinants of B^(i)_jk and B_{i−1} are also determinants of submatrices of the original matrix A. ✷
Using the fact that the size of the determinant of any submatrix of A is at most the size of the matrix A, we obtain that all numbers occurring during Gaussian elimination require only O(L) bits.
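To illustrate this, here is a sketch (ours, not from the notes) of Gaussian elimination carried out in exact rational arithmetic; by Theorem 27 every intermediate entry is a ratio of determinants of submatrices of the input, so its numerator and denominator stay polynomially bounded:

from fractions import Fraction

def gaussian_elimination(A):
    """Forward elimination over the rationals (no pivoting; assumes nonzero pivots)."""
    A = [[Fraction(x) for x in row] for row in A]
    m, n = len(A), len(A[0])
    for i in range(min(m, n)):
        for j in range(i + 1, m):
            factor = A[j][i] / A[i][i]
            for k in range(i, n):
                A[j][k] -= factor * A[i][k]
    return A

print(gaussian_elimination([[2, 1, 1], [4, 3, 3], [8, 7, 9]]))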
Finally, we need to round the current iterates x, y and s to O(L) bits. Otherwise, these vectors would require a constantly increasing number of bits as we iterate. By rounding up x and s, we ensure that these vectors are still strictly positive. It is fairly easy to check that this rounding does not change the potential function by a significant amount, and so the analysis of the algorithm is still valid. Notice that now the primal and dual constraints might be slightly violated, but this can be taken care of in the rounding step.
It remains to describe how to obtain initial strictly feasible primal and dual solutions. Recall the pair of dual linear programs
(P)  min c^T x s.t. Ax = b, x ≥ 0,
(D)  max b^T y s.t. A^T y + s = c, s ≥ 0,
and q = n + √n.
Consider the pair of dual linear programs:
(P′)  min c^T x + k_c x_{n+1}
      s.t. Ax + (b − 2^{2L} Ae) x_{n+1} = b
           (2^{4L} e − c)^T x + 2^{4L} x_{n+2} = k_b
           x ≥ 0, x_{n+1} ≥ 0, x_{n+2} ≥ 0
and
(D′)  max b^T y + k_b y_{m+1}
      s.t. A^T y + (2^{4L} e − c) y_{m+1} + s = c
           (b − 2^{2L} Ae)^T y + s_{n+1} = k_c
           2^{4L} y_{m+1} + s_{n+2} = 0
           s, s_{n+1}, s_{n+2} ≥ 0,
where k_b = 2^{6L}(n + 1) − 2^{2L} c^T e is chosen in such a way that x′ = (x, x_{n+1}, x_{n+2}) = (2^{2L} e, 1, 2^{2L}) is a (strict) feasible solution to (P′), and k_c = 2^{6L}. Notice that (y′, s′) = (y, y_{m+1}, s, s_{n+1}, s_{n+2}) = (0, −1, 2^{4L} e, k_c, 2^{4L}) is a feasible solution to (D′) with s′ > 0. x′ and (y′, s′) serve as our initial feasible solutions.
We have to show:
1. G(x′; s′) = O(√n′ L), where n′ = n + 2,
2. the pairs (P)−(D) and (P′)−(D′) are equivalent,
3. the input size L′ for (P′) as defined in the lecture notes does not increase too much.
The proofs of these statements are simple but heavily use the definition of L and the fact that vertices have all components bounded by 2^L.
We first show 1. Notice first that x′_j s′_j = 2^{6L} for all j, implying that
G(x′, s′) = (n′ + √n′) ln(x′^T s′) − Σ_{j=1}^{n′} ln(x′_j s′_j) = (n′ + √n′) ln(n′ 2^{6L}) − n′ ln(2^{6L}) = O(√n′ L).
Proposition 28 Let x* and (y*, s*) be optimal solutions to (P) and (D), respectively. Let x′ = (x*, 0, (k_b − (2^{4L} e − c)^T x*)/2^{4L}) and let (y′, s′) = (y*, 0, s*, k_c − (b − 2^{2L} Ae)^T y*, 0). Then
1. x′ is a feasible solution to (P′) with x′_{n+2} > 0,
2. (y′, s′) is a feasible solution to (D′) with s′_{n+1} > 0,
3. x′ and (y′, s′) satisfy complementary slackness, i.e. they constitute a pair of optimal solutions for (P′)−(D′).
Proof:
To show that x′ is a feasible solution to (P′) with x′_{n+2} > 0, we only need to show that k_b − (2^{4L} e − c)^T x* > 0 (the reader can easily verify that x′ satisfies all the equalities defining the feasible region of (P′)). This follows from the fact that
(2^{4L} e − c)^T x* ≤ n(2^{4L} + 2^L) 2^L = n(2^{5L} + 2^{2L}) < n 2^{6L}
and
k_b = 2^{6L}(n + 1) − 2^{2L} c^T e ≥ 2^{6L}(n + 1) − 2^{2L} n max_j |c_j| ≥ n 2^{6L} + 2^{6L} − n 2^{3L} > n 2^{6L},
where we have used the definition of L and the fact that vertices have all their entries bounded by 2^L.
To show that (y′, s′) is a feasible solution to (D′) with s′_{n+1} > 0, we only need to show that k_c − (b − 2^{2L} Ae)^T y* > 0. This is true since
(b − 2^{2L} Ae)^T y* ≤ b^T y* − 2^{2L} e^T A^T y* ≤ m max_i |b_i| 2^L + 2^{2L} n m max_{i,j} |a_{ij}| 2^L ≤ 2^{2L} + 2^{4L} < 2^{6L} = k_c.
x′ and (y′, s′) satisfy complementary slackness since
• (x*)^T s* = 0, by optimality of x* and (y*, s*) for (P) and (D),
• x′_{n+1} s′_{n+1} = 0, and
• x′_{n+2} s′_{n+2} = 0. ✷
This proposition shows that, from an optimal solution to (P ) − (D), we can easily
construct an optimal solution to (P ′ ) − (D ′ ) of the same cost. Since this solution
has s′n+1 > 0, any optimal solution x̂ to (P ′) must have x̂n+1 = 0. Moreover, since
x′n+2 > 0, any optimal solution (ŷ, ŝ) to (D ′ ) must satisfy ŝn+2 = 0 and, as a result,
ŷm+1 = 0. Hence, from any optimal solution to (P ′ ) − (D ′ ), we can easily deduce an
optimal solution to (P ) − (D). This shows the equivalence between (P ) − (D) and
(P ′ ) − (D ′ ).
By some tedious but straightforward calculations, it is possible to show that L′
(corresponding to (P ′)−(D ′ )) is at most 24L. In other words, (P )−(D) and (P ′ )−(D ′ )
have equivalent sizes.
References
[1] V. Chvatal. Linear Programming. W.H. Freeman and Company, 1983.
[8] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons,
1986.
[9] Y. Ye. An O(n³L) potential reduction algorithm for linear programming. Mathematical Programming, 50:239–258, 1991.