Notes
Notes
Notes
Required reading is from Chvatal. Optional reading is from these reference books:
A: AMPL
K: Karloff
M: Murtys LP book
MC: Murtys LP/CO book
S: Schrijver
V: Vanderbei
Vz: Approximation Algorithms by Vijay Vazirani, Springer 2001.
WV: Winston & Venkataramanan
Handout Reading
#
from C
Handout Title
1
2
3
4
3-9
213-223 (M Ch.1)
Linear Programming
Standard Form
LP Objective Functions
Complexity of LP & Related Problems
13-23
17-19
27-33
23-25
250-255, 260-261
33-37, 258-260
37-38
39-42
42-43
15
16
17
18
19
20
21
54-57
57-59
57-59, 261-262
60-62
62-65
65-68
137-143
22
Ch.15
More Applications
Duality in Game Theory
CSCI 5654
H. Gabow
Fall 2007
#i, p. 1
97-100
100-105
Ch.6
105-111
27
28
195-200, 207-211
201-207
More Applications
The Cutting Stock Problem
Branch-and-Bound Algorithms
37
38
118-119
119-129, 132-133
130-132, 133-134
242-243
143-146
Ex.16.10
152-157
158-162
162-166
(WV 9.8)
262-269
39
40
291-295
296-303
41
443-452, (K Ch.4)
42
43
44
(Vz, Ch.26)
CSCI 5654
More Applications
Semidefinite Programming: Approximating MAX CUT
H. Gabow
Fall 2007
#i, p. 2
9.A. Overview
45
46
(M Ch.1)
9.B. Fundamentals
47
48
47-49, 255-258
49
37-38
9.C. Duality
50
51
52
53
261-262, (S 7.5)
54
9.D. Implementation
55
79-84
56
100-105
57
105-115
58
Review of Matrices
Revised Simplex Example
Eta Factorization of the Basis
Revised Simplex Summary
9.E. Extensions
59
Ch.16
60
149-152
9.F. Networks
61
(S Ch.16, 19, 22)
62
303-317
63
320-322
Integral Polyhdera
Initialization, Cycling, Implementation
Related Network Problems
9.G. Polynomial LP
64
65
(K Ch.5)
66
67
68
69
70
71
72
CSCI 5654
H. Gabow
Fall 2007
#i, p. 3
M W 4:00 5:15
Instructor
Hal Gabow
Office Phone 492-6862
Office Hours
Grader
TBA
Prerequisites
Requirements
H. Gabow
Fall 2007
#0, p. 1
ask me. You will get a 0 for copying even part of an assignment from any source; a second violation
will result in failing the course.
Anyone using any of my solution sheets to do homework will receive an automatic F in the course.
(One student wound up in jail for doing this.)
All students are required to know and obey the University of Colorado Student Honor Code, posted
at http://www.colorado.edu/academics/honorcode. The class website has this link posted under Campus Rules, which also has links to policies on classroom behavior, religious observances
and student disability services. Each assignment must include a statement that the honor code was
obeyed see directions on the class website. I used to lower grades because of honesty issues, but
the Honor Code is working and I havent had to do this in a while.
Late homeworks: Homeworks are due in class on the due date. Late submissions will not be
accepted. Exceptions will be made only if arrangements are made with me 1 week in advance.
When this is impossible (e.g., medical reasons) documentation will be required (e.g., physicians
note).
Ignorance of these rules is not an excuse.
Website & Email
The class website is http://www.cs.colorado.edu/e hal/CS5654/home.html. It is the main form
of communication, aside from class. Assignments, current HW point totals and other course materials will be posted there.
You can email me questions on homework assignments.
the class website. Ill send email to the class indicating
clarifications of the homework assignments, missing office
class will be limited to pointers to the website. I check my
Inappropriate email: Email is great for clarifying homework assignments. I try to answer all email
questions. But sometimes students misuse email and ask me to basically do their work for them.
Dont do this.
Text Linear Programming
by Vasek Chvatal, W.H. Freeman and Co., New York, 1984
References On reserve in Lester Math-Physics Library, or available from me.
Background in Linear Algebra:
Linear Algebra and its Applications
by G. Strang, Harcourt College Publishers, 1988 (3rd Edition)
Similar to Chv
atal:
Linear Programming: Foundations and Extensions
by Robert J. Vanderbei, Kluwer Academic Publishers, 2001 (2nd Edition)
(optional 2nd Text)
Linear Programming
by K.G. Murty, Wiley & Sons, 1983 (revised)
CSCI 5654
H. Gabow
Fall 2007
#0, p. 2
H. Gabow
Fall 2007
#0, p. 3
Unit 5, Extensions, gives other implementation techniques such as upper-bounding and sensitivity
analysis, as well as some general theory about linear inequalities, and the Dual Simplex Algorithm,
which recent work indicates is more powerful than previously thought. We introduce the cutting
plane technique for Integer Linear Programming.
Unit 6 is an introduction to Network Algorithms. These special LPs, defined on graphs, arise often
in real-life applications. We study the Network Simplex Algorithm, which takes advantage of the
graph structure to gain even more efficiency.
Unit 7, Polynomial-Time Linear Programming, surveys the ellipsoid method. This is a powerful tool
in theoretical algorithm design, and it opens the door to nonlinear programming. The Supplemental
Material covers Karmarkars algorithm, an alternative to Simplex that is often even more efficient.
Unit 8 is a brief introduction to nonlinear methods: quadratic programming (& the Markowitz
model for assessing risk versus return on financial investments) and its generalization to semidefinite programming. We illustrate the latter with the groundbreaking Goemans-Williamson SDP
algorithm for approximating the maximum cut problem in graphs.
The Supplemental Material in Unit 9 will be covered as time permits.
Linear Algebra
Chvatals treatment centers on the intuitive notion of dictionary. Units 13 use very little linear
algebra, although its still helpful for your intuition. Units 46 switch to the language of linear
algebra, but we really only use very big ideas. Units 78 use more of what you learned at the
start of an introductory linear algebra course. The course is essentially self-contained as far as
background from linear algebra is concerned.
Tips on using these notes
Class lectures will work through the notes. This will save you writing.
Add comments at appropriate places in the notes. Use backs of pages for notes on more extensive
discussions (or if you like to write big!). If I switch to free sheets of paper for extended discussions
not in the notes, you may also want to use free sheets of paper.
A multi-colored pen can be useful (to separate different comments; for complicated pictures; to color
code remarks, e.g. red = important, etc.) Such a pen is a crucial tool for most mathematicians
and theoretical computer scientists!
If you want to review the comments I wrote in class, I can reproduce them for you.
The notes complement the textbook. The material in Chvatal corresponding to a handout is given
in the upper left corner of the handouts page 1, & in Handout #i. The notes follow Chvatal, but
provide more formal discussions of many concepts as well as alternate explanations & additional
material. The notes are succinct and require additional explanation, which we do in class. You
should understand both Chvatal and my notes.
Extra credit for finding a mistake in these notes.
CSCI 5654
H. Gabow
Fall 2007
#0, p. 4
Chvatal, 3 9
Linear Programming
Unit 1: Overview
8
12
3
0
CSCI 5654
H. Gabow
Fall 2007
#1, p. 1
Pn
j=1 cj xj
Pn
subject to
j=1
aij xj
xj
bi
0
(i = 1, . . . , m)
(j = 1, . . . , n)
constraints
nonnegativity constraints
cost coefficient cj , decision variable xj , right-hand side coefficient bi
a feasible solution satisfies all the constraints
an optimum solution is feasible and achieves the optimum objective value
Activity Space Representation
n dimensions, xj = level of activity j
y
4x+2y=8
3x+4y=12
x
6x+4y=14.4
CSCI 5654
H. Gabow
Fall 2007
#1, p. 2
Real-world LPs
blending problem find a least cost blend satisfying various requirements
e.g., gasoline blends (Chvatal, p.10, ex.1.7); diet problem (Chvatal, pp.35); animal feed;
steel composition
resource allocation problem allocate limited resources to various activities to maximize profit
e.g., forestry (Chvatal, pp.1716); Chvatal, p.10, ex.1.6; beer production
multiperiod scheduling problems schedule activities in various time periods to maximize profit
e.g., Chvatal, p.11, ex.1.8; cash flow; inventory management; electric power generation
And Solving Them
LP-solvers include CPLEX (the industrial standard), MINOS, MOSEK, LINDO
most of these have free (lower-powered) versions as downloads
modelling languages facilitate specification of large LPs
e.g., heres an AMPL program for a resource-allocation problem (from AMPL text):
the model is in a .mod file:
set PROD;
# products
set STAGE; # stages
param rate {PROD,STAGE} > 0; # tons per hour in each stage
param avail {STAGE} >= 0;
# hours available/week in each stage
param profit {PROD};
# profit per ton
param commit {PROD} >= 0;
# lower limit on tons sold in week
param market {PROD} >= 0;
# upper limit on tons sold in week
var Make {p in PROD} >= commit[p], <= market[p]; # tons produced
maximize total profit: sum {p in PROD} profit[p] * Make[p];
subject to Time {s in STAGE}:
sum {p in PROD} (1/rate[p,s]) * Make[p] <= avail[s];
# In each stage: total of hours used by all products may not exceed hours available
CSCI 5654
H. Gabow
Fall 2007
#1, p. 3
LP versus ILP
an Integer Linear Program (ILP) is the same as an LP
but the variables xj are required to be integers
much harder!
Two Examples
Knapsack Problem
maximize z = 3x1 + 5x2 + 6x3
subject to x1 + 2x2 + 3x3 5
0 xi 1, xi integral
i = 1, 2, 3
0 yi i,
i = 1, 2, 3
() is a special case of a polymatroid a broad class of LPs with 0/1 coefficients where the greedy
method works correctly.
CSCI 5654
H. Gabow
Fall 2007
#1, p. 4
|{i,j}S|=1 xij
xij
=2
2
{0, 1}
k = 1, . . . , n
(enter & leave each city)
S {1, . . . , n} (subtour elimination constraints)
1i<jn
we form the (Held-Karp) LP relaxation of this ILP by dropping the integrality constraints
i.e., replace the last line by xij 0, 1 i < j n
let zILP and zLP denote the optimum objective values of the 2 problems
assume distances satisfy the triangle inequality
cij + cjk cik for all i, j, k
then we know that zILP (3/2)zLP , i.e., the integrality gap is 3/2
experimentally the Held-Karp lower bound is typically above .99zILP
the 4/3 Conjecture states (unsurprisingly) that the integrality gap is always 4/3
CSCI 5654
H. Gabow
Fall 2007
#1, p. 5
Exercise. Well show the integrality gap can be 4/3 , for any > 0. Heres the graph:
...
...
...
...
...
k+1
...
A tour with length 4k + 2.
...
1/2
1/2
...
1/2
1/2
1/2
1/2
...
A fractional solution of length 3k + 3
3k + 3 = 3(k 1) + 2(1 + 1/2 + 1/2 + 1/2(2)) .
(a) Explain why the above fractional solution is feasible. Hint: Concentrate on the 3 paths of solid
edges. (b) Explain why any fractional solution has length 3k + 2. (c) Explain why any tour has
length 4k + 2. Hint: Concentrate on the length 1 edges traversed by the tour. They break up
into subpaths beginning and ending at 1 of the 2 extreme points of the graph. (d) Conclude that
for this example, limk zILP /zLP = 4/3.
CSCI 5654
H. Gabow
Fall 2007
#1, p. 6
Standard Form
Unit 1: Overview
a linear program is a problem that can be put into standard (maximization) form
maximize z =
subject to
Pn
j=1 cj xj
Pn
j=1
aij xj
xj
bi
0
(i = 1, . . . , m)
(j = 1, . . . , n)
bi
0
(i = 1, . . . , m)
(j = 1, . . . , n)
Pn
j=1 cj xj
Pn
j=1
aij xj
xj
CSCI 5654
H. Gabow
Fall 2007
#2, p. 1
Free variables
a free variable has no sign restriction
we can model a free variable x by replacing it by 2 nonnegative variables p & n with x = p n
replace all occurrences of x by p n
the 2 problems are equivalent:
a solution to the original problem gives a solution to the new problem
with the same objective value
conversely, a solution to the new problem gives a solution to the original problem
with the same objective value
A more economical transformation
the above transformation models k free variables xi , i = 1, . . . , k by 2k new variables
we can model these k free variables by k + 1 new variables:
f i = pi N
Remark
LINDO variables are automatically nonnegative, unless declared FREE
Equality constraints
an equality constraint
Pn
a x b
Pnj=1 j j
j=1 aj xj b
Pn
j=1 aj xj
Pn
j=1
we can
Pnuse only k + 1 new constraints, by simply adding together all the constraints:
j=1 aij xj bi , i = 1, . . . , k
Pk Pn
Pk
i=1
j=1 aij xj
i=1 bi
x + 2y + 5z 30
CSCI 5654
H. Gabow
Fall 2007
#2, p. 2
Exercise. (Karmarkar Standard Form) The Linear Inequalities (LI) problem is to find a solution
to a given system of linear inequalities or declare the system infeasible. Consider the system
Pn
bi
Pn
= bi
0
j=1 aij xj
(i = 1, . . . , m)
xj
(i = 1, . . . , m)
(j = 1, . . . , n)
(ii) Show that we can assume a normalizing constraint in (i), i.e., any system in form (i) is
equivalent to a system in this form:
Pn
a x
j=1
Pn ij j
j=1 xj
xj
= bi
=1
0
(i = 1, . . . , m)
(j = 1, . . . , n)
=0
=1
0
(i = 1, . . . , m)
(j = 1, . . . , n)
(iv) Starting with the system of (iii) construct the following LP, which uses another variable s:
minimize z = s
subject to
Pn
j=1
Pn
aij xj ( j=1 aij )s = 0
Pn
j=1 xj + s = 1
xj , s 0
(i = 1, . . . , m)
(j = 1, . . . , n)
Show (iii) has a solution if and only if (iv) has optimum cost 0. Furthermore (iv) always has
nonnegative cost, and setting all variables equal to 1/(n + 1) gives a feasible solution.
(iv) is standard form for Karmarkars algorithm. That is, Karmarkars algorithm has input an LP
of the form
minimize z =
subject to
Pn
j=1 cj xj
Pn
a x
j=1
Pn ij j
j=1 xj
xj
=0
=1
0
(i = 1, . . . , m)
(j = 1, . . . , n)
where any feasible solution has z 0 and xj = 1/n, j = 1, . . . , n is feasible. The algorithm
determines whether or not the optimum cost is 0. (The exercise of Handout#18 shows any LP can
be placed into Karmarkar Standard Form.)
CSCI 5654
H. Gabow
Fall 2007
#2, p. 3
LP Objective Functions
Unit 1: Overview
CSCI 5654
H. Gabow
Fall 2007
#3, p. 1
Special Cases
these useful objective functions all have a concavity restriction
dont try to remember them, just know the general method!
Diminishing Returns (maximizing piecewise linear concave down objective functions)
(concave down means slope is decreasing)
Example 3.
maximizing z =
( 2x
x+1
(x + 5)/2
x1
1x3
3x5
is equivalent to
maximizing min{2x, x + 1, (x + 5)/2}
z
z=(x+5)/2
z=(x+5)/2
z=x+1
z=x+1
z=2x
z=2x
x
z=
( 2x
x+1
(x + 5)/2
x1
1x3
3x5
min{2x, x + 1, (x + 5)/2}
P
Example 4. maximizing z = nj=1 cj (xj ), where each cj is a piecewise linear concave down function
the same transformation as Example 3 works
Remark.
Handout #45 gives an alternate solution,
that adds more variables but uses simpler constraints
similarly we can minimize a sum of piecewise linear concave up functions
CSCI 5654
H. Gabow
Fall 2007
#3, p. 2
Absolute Values
Example 5. the objective can contain terms with absolute values, e.g., |x|, 3|y|, 2|x y 6|
but the coefficients must be positive in a minimization problem (e.g., +|x|)
& negative in a maximization problem (e.g., |x|)
y
y=|x|
CSCI 5654
H. Gabow
Fall 2007
#3, p. 3
aj xj & target b
CSCI 5654
H. Gabow
Fall 2007
#3, p. 4
Unit 1: Overview
m
1.5K
n
4K
nonzeroes
40K
10K..100K
even 1M
20K..500K
even 2M
100K..2M
even 6M
the big problems still have only 10..30 nonzeroes per constraint
the bigger problems may take days to solve
Notation: L = (number of bits in the input) (see Handout#69)
Perspective: to understand the bounds, note that Gaussian elimination is time O(n3 L)
i.e., O(n3 ) operations, each on O(L) bits
Simplex Method
G.B. Dantzig, 1951: Maximization of a linear function of variables subject to linear inequalities
visits extreme points, always increasing the profit
can do 2n pivot steps, each time O(mn)
but in practice, simplex is often the method of choice
this is backed up by some theoretic results
in a certain model where problems are chosen randomly, average number of pivots is bounded
by min{n/2, (m + 1)/2, (m + n + 1)/8}, in agreement with practice
simplex is polynomial-time if we use smoothed analysis compute average time of a randomly perturbed variant of the given LP
the next 2 algorithms show LP is in P
Ellipsoid Method
L.G. Khachiyan, 1979: A polynomial algorithm for linear programming
finds sequence of ellipsoids of decreasing volume, each containing a feasible solution
O(mn3 L) arithmetic operations on numbers of O(L) bits
impractical, even for 15 variables
theoretic tool for developing polynomial time algorithms (Gr
otschel, Lovasz, Schrijver, 1981)
extends to convex programming, semidefinite programming
Interior Point Algorithm
N. Karmarkar, 1984: A new polynomial-time algorithm for linear programming
(Combinatorica 84)
navigates through the interior, eventually jumping to optimum extreme point
O((m1.5 n2 + m2 n)L) arithmetic operations on numbers of O(L) bits
in practice, competitive with simplex for large problems
refinements: O((mn2 + m1.5 n)L) operations on numbers of O(L) bits (Vaidya, 1987, and others)
CSCI 5654
H. Gabow
Fall 2007
#4, p. 1
CSCI 5654
H. Gabow
Fall 2007
#4, p. 2
Chvatal, 13 23
Unit 2: Fundamentals
4x1+2x2=8
3x1+4x2=12
x1
6x1+4x2=14.4
6x1 + 4x2
CSCI 5654
H. Gabow
Fall 2007
#5, p. 1
12
5
z=
72
5
2
1
5 x3 + 5 x4
3
x 25 x4
10 3
6
5 x3
25 x4
CSCI 5654
H. Gabow
Fall 2007
#5, p. 2
Geometric View
the algorithm visits corners of the feasible region, always moving along an edge
x2
4x1+2x2=8
3x1+4x2=12
x1
6x1+4x2=14.4
CSCI 5654
H. Gabow
Fall 2007
#5, p. 3
Chvatal, 17 19
Unit 2: Fundamentals
Dictionaries
start with an LP () in standard form,
maximize z =
()
subject to
Pn
j=1 cj xj
Pn
j=1
aij xj
xj
bi
0
(i = 1, . . . , m)
(j = 1, . . . , n)
(xn+i 0)
the equations defining the slacks & z give the initial dictionary for ():
Pn
xn+i = bi j=1 aij xj
(i = 1, . . . , m)
Pn
z=
j=1 cj xj
A dictionary for LP () is determined by a set of m basic variables B.
The remaining n variables are the nonbasic variables N .
B, N {1, . . . , n + m}
(i) The dictionary has
Pthe form
xi = bi jN aij xj
P
z =z+
jN cj xj
(i B)
Remarks.
1. a dictionary is a system of equations
showing how the nonbasic variables determine the values of the basic variables and the objective
nonnegativity is not a constraint of the dictionary
2. notice the sign conventions of the dictionary
3. B, the set of basic variables, is a basis
4. well satisfy condition (ii) by deriving our dictionaries from the initial dictionary
using equality-preserving transformations
a feasible dictionary has each bi 0
/ B)
it gives a basic feasible solution, xi = bi (i B), xi = 0 (i
Examples
1. the initial dictionary of a resource allocation problem is a feasible dictionary
2. the blending problem of Handout #1
CSCI 5654
H. Gabow
Fall 2007
#6, p. 1
minimize z = 6x+4y
subject to
4x+2y 8
3x+4y 12
x, y 0
has initial dictionary
s1 = 8 + 4x + 2y
s2 = 12 + 3x + 4y
z=
6x + 4y
infeasible!
Lemma [Nonbasic variables are free.] Let D be an arbitrary dictionary, with basic (nonbasic)
variables B (N ).
(i) Any linear relation always satisfied by the nonbasic variables of D has all its coefficients equal
to 0, i.e.,
P
jN
jBN
j xj + =
jBN
j xj +
always satisfied by the solutions of D has the same coefficients on both sides if all basic coefficients
are the same, i.e.,
j = j for every j B = = , j = j for every j N .
Proof of (i).
setting xj = 0, j N = = 0
setting xj = 0, j N i, xi = 1 = i = 0
CSCI 5654
H. Gabow
Fall 2007
#6, p. 2
0110
110010
1010
1010
10
yx>=1
y<=x
yx<=1
x+y= constant
max x+y
Infeasible LP
Unbounded LP
CSCI 5654
H. Gabow
Fall 2007
#6, p. 3
Chvatal, 27 33
Unit 2: Fundamentals
in the Entering Variable Step usually > 1 variable has positive cost
the pivot rule specifies the choice of entering variable
e.g., the largest coefficient rule chooses the entering variable with maximum cs
the computation in the Leaving Variable Step is called the minimum ratio test
Efficiency (Dantzig, LP & Extensions, p.160)
in practice the algorithm does between m & 2m pivot steps, usually < 3m/2
for example see the real-life forestry LP in Chvatal, Ch.11
simplex finds the optimum for 17 constraints in 7 pivots
Deficiencies of the Basic Algorithm
we need to add 2 ingredients to get a complete algorithm:
in general, how do we find an initial feasible dictionary?
how do we guarantee the algorithm halts?
CSCI 5654
H. Gabow
Fall 2007
#7, p. 1
Chvatal, 27 33
Unit 2: Fundamentals
our goal is to show the basic simplex algorithm always halts with the correct answer
assuming we repair the 2 deficiencies of the algorithm
we achieve the goal by proving 6 properties of the algorithm
1. Each dictionary constructed by the algorithm is valid.
Proof.
each Pivot Step replaces a system of equations by an equivalent system
(t
CSCI 5654
bj ajs t
0
j=s
jB
j
/ Bs
H. Gabow
Fall 2007
#8, p. 1
xi = bi jN aij xj = bi jN aij xj
nonbasic variables are free = the equations are the same, i.e., bi = bi , aij = aij
similarly, z = z , cj = cj
Cycling
in a cycle, the objective z stays constant (Property 3 shows this is necessary for cycling)
so each pivot has br = 0 (Property 3)
thus the entering variable stays at 0, and the solution (x1 , . . . , xn ) does not change
Chvatal pp. 3132 gives an example of cycling (see Handout #48)
Degeneracy
a basis is degenerate if one or more basic variables = 0
degeneracy is necessary for cycling
but simplex can construct a degenerate basis without cycling:
br neednt be 0
even if br = 0 we neednt be in a cycle
although such a pivot does not change the objective value (see Property 3)
(i) degeneracy is theoretically unlikely in a random LP, but seems to always occur in practice!
(ii) if there is a tie for leaving variable, the new basis is degenerate (see proof of Property 2)
Exercise. Prove the converse: A pivot step gives a nondegenerate dictionary if it starts with a
nondegenerate dictionary and has no tie for leaving variable.
Handout #11 adds a rule so we never cycle
in fact, each pivot increases z
this guarantees the algorithm eventually halts with the correct answer
Handout #13 shows how to proceed when the initial dictionary is infeasible
CSCI 5654
H. Gabow
Fall 2007
#8, p. 2
Chvatal, 23 25
Unit 2: Fundamentals
6x1 + 4x2
x1 x2 x3 x4 z b
x3 4 2 1
8
x4 3 4
1
12
z 6 4
1 0
CSCI 5654
H. Gabow
Fall 2007
#9, p. 1
x4
2.5 .75 1
6 2.4
z
1 1.5
1 12
Optimum Tableau
x1 x2 x3 x4 z b
x1 1
.4 .2
.8
x2
1 .3 .4
2.4
z
1.2 .4 1 14.4
Example 2.
LP: maximize z = xy
subject to
x+y 2
ax +y 4
x, y 0
Initial Tableau
s1 s2 x
s1 1 0 1
s2 0 1 a
z
1
y z b ratio
1
2
1
4 4/a
1 1 0
s2 x
y
z
b
1+
2 + 4
1
1+ 1
4
CSCI 5654
H. Gabow
Fall 2007
#9, p. 2
Introduction to Geometry of LP
Unit 2: Fundamentals
Pn
j=1 aj xj
x
this unbounded convex polyhedron has
3 vertices, 4 edges (facets), 9 faces total
CSCI 5654
H. Gabow
Fall 2007
#10, p. 1
Special Faces
vertex 0-dimensional face of P
i.e., a point of P that is the unique intersection of n hyperplanes of P
edge 1-dimensional face of P
i.e., a line segment that is the intersection of P and n 1 hyperplanes of P
can be a ray or a line
facet (n 1)-dimensional face of P
CSCI 5654
H. Gabow
Fall 2007
#10, p. 2
p(m)
theres a path of length p(m, n)
CSCI 5654
H. Gabow
Fall 2007
#10, p. 3
Unit 2: Fundamentals
well give 2 rules, each ensures the simplex algorithm does not cycle
both rules are easy to implement
but many simplex codes ignore the possibility of cycling, since it doesnt occur in practice
avoiding cycling is important theoretically, e.g.,
needed to prove the Fundamental Theorem of LP
Intuition for Lexicographic Method
degeneracy is an accident, i.e., > n hyperplanes intersect in a common point
a random LP is totally nondegenerate, i.e., it has no degenerate dictionary
our approach is to perturb the problem, so only n hyperplanes intersect in a common point
= there are no degenerate bfss = the simplex algorithm doesnt cycle
3 planes meet
at a vertex
The Perturbed LP
given an LP in standard form,
maximize z =
subject to
Pn
j=1 cj xj
Pn
j=1
aij xj
bi
(i = 1, . . . , m)
xj
(j = 1, . . . , n)
CSCI 5654
H. Gabow
Fall 2007
#11, p. 1
Remarks
1. the definition 0 = 1 comes in handy below
2. its tempting to use a simpler strategy, replacing bi by bi +
i.e., use the same perturbation in each inequality
Chvatal p.34 shows this is incorrect, simplex can still cycle
the constraints must be perturbed by linearly independent quantities
well see this is crucial in our proof of correctness
3. the appropriate values of i are unknown for i > 0, and difficult to find
we finesse this problem by treating the i as variables with the above property ()!
imagine executing the simplex method on the perturbed problem
well get expressions like 2 + 1 52 as b terms get combined
call such expressions linear combinations (of the i s)
& simpler expressions that are just numbers,
Pmlike 2 call these scalars
so a linear combination has the form i=0 i i where each i is a scalar
in any dictionary,
the coefficients are aij & cj , i = 1, . . . , m, j = 1, . . . , n
the absolute terms are bi , i = 1, . . . , m & z
CSCI 5654
H. Gabow
Fall 2007
#11, p. 2
CSCI 5654
H. Gabow
Fall 2007
#11, p. 3
Remarks.
1. our original intuition is correct:
there are numeric values of i that give a perturbed problem
the simplex algorithm does exactly the same pivots as the lexicographic method
just take i = i with > 0 small enough
this is doable since there are a finite number of pivots
2. many books prove the key Lemma 2 using linear algebra
(simple properties of inverse matrices)
Chvatal finesses linear algebra with dictionaries
3. perturbation is a general technique in combinatorial computing
e.g., any graph has a unique minimum spanning tree if we perturb the weights
4. smoothed analysis (Handout#4) computes the time to solve an LP L
by averaging over perturbed versions of L
where we randomly perturb the aij s and the bi s
CSCI 5654
H. Gabow
Fall 2007
#11, p. 4
Chvatal, 37 38
Unit 2: Fundamentals
CSCI 5654
H. Gabow
Fall 2007
#12, p. 1
Theorem. The simplex algorithm with the smallest-subscript rule never cycles.
Proof.
consider a sequence of pivot steps forming a cycle,
i.e., it begins and ends with the same dictionary
we derive a contradiction as follows
let F be the set of all subscripts of variables that enter (and leave) the basis during the cycle
let t F
let D be a dictionary in the cycle that gives a pivot where xt leaves the basis
similarly D is a dictionary giving a pivot where xt enters the basis
(note that xt can enter and leave the basis many times in a cycle)
dictionary D: basis B
coefficients aij , bi , cj
next pivot: xs enters, xt leaves
dictionary D : coefficients cj
next pivot: xt enters
Claim: cs = cs
iB ci ais
Proof of Claim:
the pivot for D corresponds to solutions xs = u, xi = bi ais u, i B, remaining xj = 0
these solutions satisfy D (although they may not be feasible)
the cost of such a solution varies linearly with u:
dictionary D shows it varies as cs u
P
dictionary D shows it varies as (cs iB ci ais )u
these two functions must be the same! this gives the claim
well derive a contradiction by showing the l.h.s. of the Claim is positive but the r.h.s. is nonpositive
CSCI 5654
H. Gabow
Fall 2007
#12, p. 2
CSCI 5654
H. Gabow
Fall 2007
#12, p. 3
Chvatal, 39 42
Unit 2: Fundamentals
(i = 1, . . . , m)
xj 0
(j = 1, . . . , n)
(i = 1, . . . , m)
xj 0
(j = 0, . . . , n)
Motivation:
the minimum = 0 the given LP is feasible
but if the given LP is feasible, will we get a feasible dictionary for it?
x0 is sometimes called an artificial variable
before describing Phase 1, heres an example:
the given LP has constraints
x1 x2 1, 2x1 + x2 2,
7x1 x2 6
x1 , x2 0
(the first constraint 7x1 7x2 7 is inconsistent with the last 7x1 7x2 7x1 x2 6 )
put the constraints in standard form:
x1 + x2 1, 2x1 x2 2,
7x1 x2 6
x1 , x2 0
(infeasible)
x3 = 1 + x1 x2 + x0
x3 = 1 x1 2x2 + x4
x4 = 2 + 2x1 + x2 + x0
x0 = 2 2x1 x2 + x4
x5 = 6 7x1 + x2 + x0
w =
CSCI 5654
x5 = 8 9x1
x0
+ x4
w = 2 + 2x1 + x2 x4
H. Gabow
Fall 2007
#13, p. 1
1
2
3
2
1
x
2 1
3
2 x1
1
x
2 3
1
2 x3
x5 = 8 9x1
+
+
1
x
2 4
1
2 x4
x2 = . . .
x0 = . . .
x1 =
+ x4
w = 32 + 23 x1 12 x3 21 x4
8
9
+ 91 x4 19 x5
w = 16 21 x3 31 x4 16 x5
optimum dictionary
Pn
j=1
aij xj + x0
(i = 1, . . . , m)
x0
D0 is infeasible
2. feasible dictionary
to get a feasible dictionary pivot with x0 entering, xn+k leaving (well choose k momentarily)
x0 = bk +
Pn
akj xj + xn+k
Pn
xn+i = bi bk j=1 (aij akj )xj + xn+k
Pn
w = bk j=1 akj xj xn+k
j=1
(i = 1, . . . , m, i 6= k)
CSCI 5654
H. Gabow
Fall 2007
#13, p. 2
cj xj +
|{z}
jB
cj xj
j B
/
substitute
now execute Phase 2: run the simplex algorithm, starting with dictionary D
Exercise 1 (contd). Prove that in Case 1, the last row of D is always w = x0 .
Case 2. Phase 1 terminates with x0 basic.
In this case the given LP is infeasible
Proof.
it suffices to show that the final (optimum) value of x0 is > 0
equivalently, no pivot step changes x0 to 0:
suppose a pivot step changes x0 from positive to 0
x0 was basic at the start of the pivot, and could have left the basis
(it had the minimum ratio)
in this case Phase 1 makes x0 leave the basis
Remark.
the big-M method solves 1 LP
Pninstead of 2
it uses objective function z = j=1 cj xj M x0
where M is a symbolic value that is larger than any number encountered
CSCI 5654
H. Gabow
Fall 2007
#13, p. 3
A Surprising Bonus
if an LP is infeasible wed like our algorithm to output succinct evidence of infeasibility
in our example infeasible LP
the objective of the final dictionary shows how the given constraints imply a contradiction:
using the given constraints in standard form,
add 21 (1st constraint) + 31 (2nd constraint) +
1
6
1
2 (x1
+ x2 1) + 13 (2x1 x2 2) + 16 (7x1 x2 6)
simplifies to 0 61 , a contradiction!
relevant definition: a linear combination of inequalities is
the sum of multiples of each of the inequalities
the original inequalities must be of the same type (, , <, >)
the multiples must be nonnegative
we combine the l.h.s.s & the r.h.s.s
Phase 1 Infeasibility Proof
in general, suppose Phase 1 halts with optimum objective value w < 0
consider the last (optimal) dictionary
(ci 0)
suppose slack si has coefficient ci , i = 1, . . . , m
multiply the ith constraint by ci and add all m constraints
this will give a contradiction, (nonnegative #) w < 0
we show this always works in Handout#32,p.2
LINDO Phase 1 (method sketched in Chvatal, p.129)
Phase 1 does not use any artificial variables. Each dictionary uses a different objective function:
The Phase 1 objective for dictionary D is
w=
xh
where the sum is over all (basic) variables xh having negative values in D.
Tableau:
Row 1 gives the coefficients, in the current dictionary, of the given objective function. The last row
(labelled ART) gives the coefficients of the current Phase 1 cost function. This row is constructed
by adding together all rows that have negative bi s (but keeping the entries in the basic columns
equal to 0).
CSCI 5654
H. Gabow
Fall 2007
#13, p. 4
Simplex Iteration.
In the following, aij , bi and cj refer to entries in the current LINDO tableau (not dictionary!);
further, cj are the Phase 1 cost coefficients, i.e., the entries in the ART row. The value of the Phase
1 objective (bottom right tableau entry) is negative.
Entering Variable Step.
If every cj is 0 stop, the problem is infeasible.
Otherwise choose a (nonbasic) s with cs < 0.
Leaving Variable Step.
Choose a basic r that minimizes this set of ratios:
{
bi
: ais > 0 and bi 0, or ais < 0 and bi < 0}.
ais
Pivot Step.
Construct the tableau for the new basis (xs enters, xr leaves) except for the ART row. If every bi is
nonnegative the current bfs is feasible for the given LP. Proceed to Phase 2.
Otherwise construct the ART row by adding the rows of all negative bi s and zeroing the basic
columns.
Exercises.
1. Justify the conclusion of infeasibility in the Entering Variable Step. Hint. Show
P a feasible
solution implies an equation (nonnegative number) = (negative number), using 0 xh < 0.
2. Explain why the set of ratios in the Leaving Variable Step is nonempty. If it were empty wed
be in trouble!
3. Explain why any variable that is negative in the current dictionary started out negative and
remained so in every dictionary.
4. Explain why Phase 1 eventually halts, assuming it doesnt cycle. Hint. Show a pivot always
increases the current objective function (even when we switch objectives!).
5. Explain why the following is probably a better Leaving Variable Step:
Let P OS = {i : ais > 0 and bi 0}.
Let N EG = {i : ais < 0 and bi < 0}.
If P OS 6= then r is the minimizer of { abisi : i P OS}.
Otherwise r is the minimizer of { abisi : i N EG}.
CSCI 5654
H. Gabow
Fall 2007
#13, p. 5
Chvatal, 42 43
Unit 2: Fundamentals
(i = 1, . . . , m)
(j = 1, . . . , n)
x1
CSCI 5654
H. Gabow
Fall 2007
#14, p. 1
Example 1 contd.
adopting the objective function x1 + x2 ,
& transforming to standard form by the substitutions xj = pj n, gives the LP
maximize z = p1 + p2 2n
subject to p1 + p2 2n
p1 , p2 , n
5
0
p2
p2-2n <= 5
p1
p1-2n <= 5
n
The 2 optimum bfss are circled.
part (i) of the Fundamental Theorem holds for any LP
Question. Can you think of other sorts of linear problems, not quite in standard form and not
satisfying the theorem?
an unbounded LP has an edge thats a line of unboundedness
heres a stronger version of this fact:
Extended Fundamental Theorem (see Chvatal, 242243)
If the LP is unbounded, it has a basic feasible direction with positive cost.
to explain, start with the definition:
CSCI 5654
H. Gabow
Fall 2007
#14, p. 2
ajs
0
jB
j
/ Bs
z = 1x+s
basic feasible direction s = t, y = t, x = 0
y
bfd
CSCI 5654
H. Gabow
Fall 2007
#14, p. 3
Claim. if an LP
Pn is unbounded the simplex algorithm finds a basic feasible direction wj
with j=1 cj wj > 0 (these cj s are original cost coefficients)
(n above can be the total number of decision variables)
the Claim implies the Extended Fundamental Theorem
Proof of Claim.
let cj denote the cost cofficients in the final dictionary
& z the cost value
let s be the entering variable for the unbounded pivot (cs > 0)
in what follows, all sums are over all variables, including slacks
X
cj xj = z +
cj (xj + wj ) = z +
cj (xj + wj )
X
j
CSCI 5654
c j xj
subtract to get
cj wj =
cj wj =
H. Gabow
jB
0 wj +
j Bs
/
cj 0 + cs = cs > 0
Fall 2007
#14, p. 4
Chvatal, 54 57
Unit 3: Duality
(i = 1, . . . , m)
xj 0
(j = 1, . . . , n)
Pm
minimize z = i=1 bi yi
P
subject to m
i=1 aij yi cj
(j = 1, . . . , n)
yi 0
(i = 1, . . . , m)
Caution. this definition only works if the primal is in standard maximization form
Example. find the dual of Chvatal, problem 5.2, p. 69
notice how n & m get interchanged!
Dual Problem
Primal Problem
maximize
subject to
x1 2x2
3x1 + x2 1
x1 x2 1
2x1 + 7x2 6
9x1 4x2 6
5x1 + 2x2 3
7x1 3x2 6
x1 , x2 0
CSCI 5654
H. Gabow
Fall 2007
#15, p. 1
bi y i
n
m
n X
n
m X
X
X
X
cj xj
aij yi )xj
(
aij xj )yi =
(
i=1
i=1 j=1
j=1 i=1
algebra
CSCI 5654
H. Gabow
j=1
Fall 2007
#15, p. 2
Remarks
1. its obvious that for a tight upper bound, only tight constraints get combined
i.e., the multiplier for a loose constraint is 0 see Handout#19
its not obvious how to combine tight constraints to get the good upper bound
the dual problem does this
2. How good an upper bound does the dual place on the primal objective?
its perfect! see Strong Duality
its remarkable that the problem of upper bounding an LP is another LP
3. plot all primal and dual objective values on the x-axis:
maximum primal objective
primal objective values
duality gap
Illustration of weak duality.
Strong Duality will show the duality gap is actually 0
4. starting with a primal-dual pair of LPs, add arbitrary constraints to each
Weak Duality still holds (above proof is still valid)
even though the 2 LPs need no longer form a primal-dual pair
e.g., we constrain all primal & dual variables to be integers
this gives a dual pair of integer linear programs
Weak Duality holds for any dual pair of ILPs
the duality gap is usually nonzero for dual ILPs
CSCI 5654
H. Gabow
Fall 2007
#15, p. 3
Chvatal, 57 59
Unit 3: Duality
j=1
n
X
j=1
|
{z
}
Ds equation for z
cj xj
m
X
i=1
cn+i (bi
n
X
aij xj xn+i )
j=1
|
{z
}
Pm
Ds equation for z, minus i=1 cn+i 0
both sides of the equation are identical, since the slacks have the same coefficient on both sides
(Handout#6, Nonbasic variables are free)
Remark. the lemmas equation would be simpler if we wrote +cn+i rather than cn+i
but the minus sign comes in handy in the next handout
CSCI 5654
H. Gabow
Fall 2007
#16, p. 1
|
{z
}
Ds equation for xk
m
X
ak,n+i (bi
i=1
n
X
aij xj xn+i )
j=1
{z
}
Ds equations for the slacks
CSCI 5654
H. Gabow
Fall 2007
#16, p. 2
Unit 3: Duality
s1 = 1 + 3x1 x2
s 2 = 1 x1 + x2
s3 = 6 + 2x1 7x2
s4 = 6 9x1 + 4x2
s5 = 3 + 5x1 2x2
s6 = 6 7x1 + 3x2
z = x1 2x2
starting primal dictionary isnt feasible, but this doesnt matter
Optimum Primal Dictionary
(= final Phase 1 dictionary)
primal
decisions
decisions
the cost equation of optimum primal dictionary indicates an optimum dual solution
y5 = 0.2, yj = 0 for j = 1, 2, 3, 4, 6
& the corresponding slack in the dual constraints
t2 = 2.4, t1 = 0
CSCI 5654
H. Gabow
Fall 2007
#17, p. 1
CSCI 5654
H. Gabow
Fall 2007
#17, p. 2
Remarks.
1. theres no sign flip in LINDO tableaux
2. Strong Duality is the source of many minimax theorems in mathematics
e.g., the Max Flow Min Cut Theorem (Chvatal, p.370)
see Handout#63
3. another visualization of primal-dual correspondence:
primal
primal
decisions slacks
n + m variables
dual
slacks
dual
decisions
CSCI 5654
H. Gabow
Fall 2007
#17, p. 3
Chvatal, 60 62
Unit 3: Duality
Why Dual?
(j = 1, . . . , n)
(i = 1, . . . , m)
(j = 1, . . . , n)
CSCI 5654
H. Gabow
Fall 2007
#18, p. 1
5x1 + 6x2
29x2 5
29x1
6
x1 , x2 0
this LP is infeasible
the LP is self-dual, i.e., it is its own dual
so the dual is infeasible
Exercise. Using matrix notation of Unit 4, show the LP
max cx s.t. Ax b, x 0
is self-dual if A is skew symmetric and c = bT .
in case (i), plotting all primal and dual objective values on the x-axis gives
primal objective values
CSCI 5654
H. Gabow
Fall 2007
#18, p. 2
Exercise. As in the exercise of Handout#2 the Linear Inequalities (LI) problem is to find a solution
to a given system of linear inequalities or declare the system infeasible. We will show that LI is
equivalent to LP, i.e., an algorithm for one problem can be used to solve the other.
(i) Show an LP algorithm can solve an LI problem.
(ii) Show an LI algorithm can solve an LP problem. To do this start with a standard form LP,
maximize z =
Pn
subject to
Pn
j=1 cj xj
j=1
aij xj
xj
bi
0
(i = 1, . . . , m)
(j = 1, . . . , n)
aij xj
Pm
aij yi
j=1
i=1
Pn
xj
j=1 cj xj
yi
b
i=1 i yi
Pm
bi
0
cj
0
0
(i = 1, . . . , m)
(j = 1, . . . , n)
(j = 1, . . . , n)
(i = 1, . . . , m)
(ii) together with the exercise of Handout#2 shows any LP can be placed into the standard form
required by Karmarkars algorithm.
CSCI 5654
H. Gabow
Fall 2007
#18, p. 3
Chvatal, 62 65
Unit 3: Duality
Complementary Slackness
Remark CS expresses a fact thats obvious from the multiplier interpretation of duality
the dual solution only uses tight primal constraints
Proof.
Weak Duality holds for xj and yi
lets repeat the proof:
Pm
i=1 bi yi
Pm
Pn
i=1 (
j=1
aij xj )yi =
Pn
Pm
j=1 (
i=1
aij yi )xj
Pn
j=1 cj xj
CSCI 5654
H. Gabow
Fall 2007
(Strong Duality)
#19, p. 1
Remarks.
Pm
1. a common error is to assume xj = 0 = i=1 aij yi 6= cj , or vice versa
2. the simplex algorithm maintains primal feasibility and complementary slackness (previous page)
& halts when dual feasibility is achieved
3. Complementary Slackness is the basis of primal-dual algorithms (Ch.23)
they solve LPs by explicitly working on both the primal & dual
e.g., the Hungarian algorithm for the assignment problem; minimum cost flow problems
& primal-dual approximation algorithms for NP-hard problems
Exercise. Show the set of all optimum solutions of an LP is a face.
Testing optimality
complementary slackness gives a test for optimality, of any LP solution
given a standard form LP L
Pn
maximize z = j=1 cj xj
P
subject to nj=1 aij xj bi
(i = 1, . . . , m)
xj 0
(j = 1, . . . , n)
m
X
aij yi = cj
(1)
i=1
n
X
aij xj < bi = yi = 0
(2)
j=1
m
X
aij yi cj
(j = 1, . . . , n)
(3)
(i = 1, . . . , m)
(4)
i=1
yi 0
CSCI 5654
H. Gabow
Fall 2007
#19, p. 2
Application
to check a given feasible solution xj is optimal
use (2) to deduce the yi s that vanish
use (1) to find the remaining yi s (assuming a unique solution)
then check (3) (4)
Examples:
1. Chvatal pp. 6465
2. check x1 = .6, x2 = 0 is optimum to the primal of Handout#15, p.1 (y5 = .2, yi = 0 for i 6= 5)
the above uniqueness assumption is reasonable
for a nondegenerate bfs xj , (1)(2) form a system of m equations in m unknowns
more precisely if k decision variables are nonzero and m k slacks are nonzero
(1) becomes a system of k equations in k unknowns
satisfying the uniqueness condition:
Theorem. xj , j = 1, . . . , n a nondegenerate bfs = system (1)(2) has a unique solution.
Proof.
let D be a dictionary for xj
(1)(2) are the equations satisfied by the m multipliers for the cost equation of D
so we need only show D is unique
since yi appears in the cost equation, distinct multipliers give distinct cost equations
uniqueness follows since
xj corresponds to a unique basis (nondegeneracy) & any basis has a unique dictionary
Corollary. An LP with an optimum nondegnerate dictionary has a unique optimum dual solution.
although the primal can still have many optima
primal: max 2x1 + 4x2 s.t. x1 + 2x2 1, x1 , x2 0
optimum nondegenerate dictionary: x1 = 1 s 2x2
dual: min y s.t. y 2, 2y 4, y 0
CSCI 5654
H. Gabow
z = 2 2s
Fall 2007
#19, p. 3
Exercise. If the complementary slackness conditions almost hold, were close to optimality.
This principle is the basis for many ILP approximation algorithms. This exercise proves the principle, as follows.
Consider a standard form LP
Pn
maximize z = j=1 cj xj
Pn
subject to j=1 aij xj bi
(i = 1, . . . , m)
xj 0
(j = 1, . . . , n)
Pn
j=1 cj xj .
CSCI 5654
H. Gabow
Fall 2007
#19, p. 4
Chvatal, 65 68
Unit 3: Duality
x2
x1 + x2 = 100
10x1 + 50x2 = 4000
x1
40x1 + 70x2 = 6250
x1 = # acres to fell and regenerate; x2 = # acres to fell and plant pine
standard form LP
s1 = 100 x1 x2
z=
net profit
40x1 + 70x2
z gives the net profit from 2 the forestry activities, executed at levels x1 and x2
optimum dictionary D
x1 = 25 1.25s1 + .025s2
x2 = 75 + .25s1 .025s2
z = 6250 32.5s1 .75s2
suppose (as in Chvatal) there are t more units of resource #2, cash
(t is positive or negative)
in 2nd constraint, 4000 4000 + t
How does this change dictionary D ?
since the optimum dual solution is y1 = 32.5, y2 = .75,
Lemma 1 of Handout#16 shows D s objective equation is
(original equation for z) + 32.5 (1st constraint) +.75 (2nd constraint)
= the objective equation becomes z = 6250 + .75t 32.5s1 .75s2
CSCI 5654
H. Gabow
Fall 2007
#20, p. 1
CSCI 5654
H. Gabow
Fall 2007
#20, p. 2
(j = 1, . . . , n)
bi yi , so z/bi = yi
basic values bk + m
u
t
,
cost
z
+
i=1 ki i
i=1 yi ti
this solution is optimum as long as its feasible, i.e., bk +
Pm
i=1
uki ti 0
let b = min{bk : 1 k m}
(b > 0 by nondegeneracy)
U = max{|uki | : 1 k, i m} (U > 0, else no constraints)
= b/(2mU )
( > 0)
taking |ti | for all i makes l.h.s. b mU = b/2 > 0
CSCI 5654
H. Gabow
Fall 2007
#20, p. 3
j=1 aij xj
< bi = yi = 0
i=1 aij yi
> cj = xj = 0
if the resources consumed by activity j are worth more than its (net) profit,
we wont produce j
c
x
z
+
i=1 yi ti
j=1 j j
Proof.
we repeat the Weak Duality argument:
Pn
j=1 cj xj
Pn
Pm
j=1 (
i=1
aij yi )xj =
CSCI 5654
Pm
Pn
i=1 (
j=1
aij xj )yi
Pm
i=1 (bi
+ ti )yi = z +
Pm
i=1 ti yi
H. Gabow
Fall 2007
#20, p. 4
Unit 3: Duality
Primal
maximize x1 + 2x2 + 3x3 + 4x4
subject to
3x1 + x2 + x3 x4 1
x1 x2 x3 + 2x4 = 1
2x1 + 7x2 + x3 4x4 = 6
minimize
subject to
y1 + y2 + 6y3 + 6y4
3y1 + y2 2y3 + 9y4 1
y1 y2 + 7y3 4y4 2
y1 y2 + y3 y4 = 3
y1 + 2y2 4y3 + 6y4 = 4
y1 , y4 0
note that to form a dual, we must still start with a consistent primal
e.g., a maximization problem with no constraints
Proof of (i) (ii).
consider a problem P in standard form plus additional equations & free variables
we transform P to standard form & take the dual D
we show D is equivalent to
the problem produced by using rules (i) (ii) on P
Pn
(i) consider an = constraint in P, j=1 aij xj = bi
it gets transformed to standard form constraints,
Pn
aij xj bi
Pj=1
n
j=1 aij xj bi
these constraints give 2 nonnegative variables in D, pi & ni
the jth constraint of D has the terms aij pi aij ni & the objective bi pi bi ni
equivalently, aij (pi ni ) & bi (pi ni )
substituting yi = pi ni gives a free dual variable yi with terms aij yi & bi yi
(ii) is similar
in fact just read the above proof backwards
CSCI 5654
H. Gabow
Fall 2007
#21, p. 1
Exercises.
1. Repeat the proof using our slicker transformations, i.e., just 1 negative variable/1 constraint.
2. Show the dual of an LP in standard maximization form remains the same when we think of all
variables as free and xj 0 as a linear constraint.
3. Show dropping a constraint that doesnt change the feasible region doesnt change the dual.
Remarks
1. Equivalent LPs have equivalent duals, as the exercises show.
P
P
2. Often we take the primal problem to be, max
cj xj s.t.
aij xj bi
(optimizing over
polyhedron)
Pa generalP
with dual min
yi bi s.t.
yi aij = cj , yi 0
obviously Weak Duality & Strong Duality still hold for a primal-dual pair
Chvatal proves all this sticking to the interpretation of the dual LP
as an upper bound on the primal
Complementary Slackness still holds
theres no complementary slackness condition for an equality constraint or a free variable!
(since its automatic)
Examples. we give 2 examples of LPs with no Complementary Slackness conditions:
1. heres a primal-dual pair where every feasible primal or dual solution is optimum:
maximize x1 + 2x2
subject to x1 + 2x2 = 1
minimize y1
subject to y1 = 1
2y1 = 2
CSCI 5654
minimize y1
subject to y1 = 1
y1 = 0
H. Gabow
Fall 2007
#21, p. 2
Chvatal, Ch. 15
Unit 3: Duality
-1
0
(a)
-1
-1
1
(b)
CSCI 5654
H. Gabow
Fall 2007
()
#22, p. 1
Remark.
for 0-sum games, stability is the same as a Nash point:
in any game, a Nash equilibrium point is a set of strategies for the players
where no player can improve by unilaterally changing strategies
Example 2. Matching Pennies is unstable:
1
-1
-1
-1
-2
-2
any game with no saddle point is unstable theres no Nash equilibrium point, so some player will always switch
ROW prefers and will switch to it
COLUMN prefers and will switch to it
.. no saddle point = 1 or both players always switch
but suppose we allow (more realistic) stochastic strategies
each player plays randomly, choosing moves according to fixed probabilities
Example.
in Matching Pennies, each player chooses heads or tails with probability 1/2
this is a Nash equilibrium point:
each player has expected winnings = 0
she cannot improve this by (unilaterally) switching strategies
any other strategy still has expected winnings = 0
the game is stable and () holds
but what if ROW plays row 1 with probability 3/4?
COLUMN will switch to playing column 2 always,
increasing expected winnings from 0 to 1/2 = (3/4)(1) + (1/4)(1)
then they start looping as in Fig.2
well show that in general, stochastic strategies recover ()
CSCI 5654
H. Gabow
Fall 2007
#22, p. 2
j = 1, . . . , n
i = 1, . . . , m
z unrestricted
COLUMN plays column j with probability yj , j = 1, . . . , n
this strategy gives worst-case expected losses w iff
the expected losses are w for each row
to minimize w, COLUMN computes yj as the solution to the LP
minimize w
Pn
subject to w j=1 aij yj 0
Pn
j=1 yj = 1
i = 1, . . . , m
yj 0
j = 1, . . . , n
w unrestricted
these 2 LPs are duals
both are feasible (set one xi = 1, all others 0)
= both have the same optimum objective value
CSCI 5654
H. Gabow
Fall 2007
#22, p. 3
5/11
6/11
7/11
-2
4/11
-3
-4
Fig.4 Optimum stochastic strategies are given along the matrix borders.
The loop for deterministic play is also shown.
the value of this game is 2/11:
ROWs expected winnings equal
(7/11) [(5/11)(2) + (6/11)(2)] +(4/11) [(5/11)(4) + (6/11)(3)] = 2/11
|
{z
}
|
{z
}
2/11
2/11
1/2-q
1/2-q
-1
-1
1/2-p
1
-1
1/2-p
-1
CSCI 5654
H. Gabow
Fall 2007
#22, p. 4
Chvatal, 97 100
Unit 4: Implementation
subject to
2
1 1 x2
0
x1
x2
0
Standard Form LP:
maximize cx
subject to Ax b
x0
minimize yb
subject to yA c
y0
Conventions
A: m n coefficient matrix
b: column vector of r.h.s. coefficients (length m)
c: row vector of costs (length n)
x: column vector of primal variables (length n)
y: row vector of dual variables (length m)
Initial Dictionary
let xS be the column vector of slacks
xS = b Ax
z = cx
more generally:
extend x, c, A to take slack variables into account
now x & c are length n + m vectors; A = [A0 I ] is m (n + m)
the standard form LP becomes Standard Equality Form:
maximize cx
subject to Ax = b
x0
we assume this form has been constructed from standard form, i.e., slacks exist
alternatively we assume A has rank m, i.e., it contains a basis (see Handout#31)
x1
1
1 0 1 0 x2
e.g., our example LP E has constraints
=
2
1 1 0 1
x3
x4
a basis is denoted by B, an ordered list of indices of basic variables
(each index is between 1 & n + m)
e.g., in E the basis of slacks is 3, 4
when B denotes a basis, N denotes the indices of nonbasic variables
i.e., the complementary subset of {1, . . . , n + m}, in any order
CSCI 5654
H. Gabow
Fall 2007
#23, p. 1
Remark.
the expression cB B1 in the cost equation will be denoted y (the vector of dual values)
the cost row corresponds to Lemma 1 of Handout#16
looking at the original dictionary, y is the vector of multipliers
Example.
0
1 0
1
,B =
1
1 1
1
in E, B = (1, 2) gives B =
1
dictionary:
x1
x2
1 0
=
1 1
1
1 0 1 0 x3
1
1 0 x3
2
1 1 0 1 x4
1
1 1 x4
1
+
z= 3 1
0
1
cB B
= 3
0 3 1
1 0
1 1
x3
x4
= 4 2x3 x4
1 0
= 2 1
1 1
in scalar form,
x1 = 1 x3
x2 = 1 + x3 x4
z = 4 2x3 x4
CSCI 5654
H. Gabow
Fall 2007
#23, p. 2
u0
u1
u3
u2
CSCI 5654
H. Gabow
Fall 2007
#23, p. 3
Unit 4: Implementation
H. Gabow
Fall 2007
#24, p. 1
Pivot Step
In basis heading B replace r by s (this redefines B)
xB xB td
In xB , replace entry for r (now 0) by t
CSCI 5654
H. Gabow
Fall 2007
#24, p. 2
Chvatal, Ch. 6
Unit 4: Implementation
Gaussian Elimination
solves a linear system
Pn
j=1 aij xj
= bi , i = 1, . . . , n in 2 steps:
Accuracy
avoid bad round-off error by initially scaling the system to get coefficients of similar magnitude
also, choose each pivot element aij to have largest magnitude (from among all elements aj )
this is called partial pivotting
complete pivotting makes the selection from among all elements a
so the order that variables are eliminated can change
Time
assume 1 arithmetic operation takes time O(1)
i.e., assume the numbers in a computation do not grow too big
(although in pathological cases the numbers can grow exponentially (Edmonds, 67))
time = O(n3 )
back substitution is O(n2 )
theoretically, solving a linear system uses time O(M (n)) = O(n2.38 )
by divide-and-conquer methods of matrix multiplication & inversion
for sparse systems the time is much less, if we preserve sparsity
CSCI 5654
H. Gabow
Fall 2007
#25, p. 1
Preserving Sparsity
if we pivot on aij we eliminate row i & column j from the remaining system
but we can change other coefficients from 0 to nonzero
the number of such changes equals the fill-in
we can reduce fill-in by choice of pivot element, for example:
let pi (qj ) be the number of nonzero entries in row i (column j) in the (remaining) system
pivotting on aij gives fill-in (pi 1)(qj 1)
Markowitzs rule: always pivot on aij , a nonzero element
that minimizes (pi 1)(qj 1)
this usually keeps the fill-in small
Remark. minimizing the total fill-in is NP-hard
Matrices for Pivotting
permutation matrix an identity matrix with rows permuted
let P be an n n permutation matrix
for any n p matrix A, PA is A with rows permuted the same as P
e.g., to interchange rows r and s, take P an identity with rows r and s interchanged
a permutation matrix can equivalently be described as I with columns permuted
for any p n matrix A, AP is A with columns permuted as in P
upper triangular matrix all entries strictly below the diagonal are 0
similarly lower triangular matrix
eta matrix (also called pivot matrix)
identity matrix with 1 column (the eta column) changed arbitrarily
as long as its diagonal entry is nonzero (so its nonsingular)
Remark. the elementary matrices are the permutation matrices and the etas
in the system Ax = b pivotting on akk results in the system LAx = Lb
for L a lower triangular eta matrix whose kth column entries ik are
0 (i < k), 1/akk (i = k), aik /akk (i > k)
a pivot in the simplex algorithm is similar (L is eta but not triangular)
(a Gauss-Jordan pivot)
CSCI 5654
H. Gabow
Fall 2007
#25, p. 2
(1)
CSCI 5654
H. Gabow
Fall 2007
#25, p. 3
Unit 4: Implementation
Time
O(dimension of b or c)
this improves to time O(#of nonzeros in the eta column) if
(i) we can overwrite b (c) & change it to x (y)
both b & c are stored as arrays
(ii) E is stored in a sparse data structure, e.g., a list of nonzero entries in the eta column
ekk is stored first, to avoid 2 passes
General Systems
basic principle: order multiplications to work with vectors rather than matrices
equivalently, work from the outside inwards
let E1 , . . . , Ek be eta matrices
To Solve E1 . . . Ek x = b
write the system as E1 (. . . (Ek x)) = b & work left-to-right:
solve E1 b1 = b for unknown b1
then solve E2 b2 = b1 for unknown b2
etc., finally solving Ek bk = bk1 for unknown bk = x
CSCI 5654
H. Gabow
Fall 2007
#26, p. 1
To Solve yE1 . . . Ek = c
write as ((yE1 ) . . .)Ek = c & work right-to-left:
solve ck Ek = c for ck
then solve ck1 Ek1 = ck for ck1
etc., finally solving c1 E1 = c2 for c1 = y
Time
using sparse data structures time = O(total # nonzeros in all k eta columns)
+O(n) if we cannot destroy b (c)
An Extension
let U be upper triangular with diagonal entries 1
To Solve UE1 . . . Ek x = b
Method #1:
solve Ub1 = b for b1 (by back substitution)
then solve E1 . . . Ek x = b1
Method #2 (more uniform):
for j = 1, . . . , n, let Uj be the eta matrix whose jth column is U.j
Fact: U = Un Un1 . . . U1 (note the order)
Verify thisby doing the
pivots.
1 a b
1 0 b
1 a 0
Example: 0 1 c = 0 1 c 0 1 0
0 0 1
0 0 1
0 0 1
so use the algorithm for a product of eta matrices
for both methods, the time is essentially O(total # nonzero coefficients)
similarly for yUE1 . . . Ek = c
CSCI 5654
H. Gabow
Fall 2007
#26, p. 2
Unit 4: Implementation
the revised simplex algorithm can handle LPs with huge numbers of variables!
this technique was popularized by the work of Gilmore & Gomory on the cutting stock problem
(the decomposition principle, Chvatal Ch.26, uses the same idea on LPs that overflow memory)
1-Dimensional Cutting Stock Problem
arises in production of paper, foil, sheet metal, etc.
Cutting Stock Problem
raw material comes in raw rolls of width r
must produce bi final rolls of width wi , i = 1, . . . , m
each wi r
Problem: minimize the number of raws used
(this problem is NP-complete just 3 finals per raw is the 3 partition problem)
LP Formulation of Cutting Stock Problem
a cutting pattern cuts 1 raw into ai finals of width wi (i = 1, . . . , m)
plus perhaps some waste
thus a cutting pattern is any solution to
Pm
i=1 wi ai r, ai a nonnegative integer
form matrix A having column Aj = (the jth cutting pattern)
let variable xj = (# of raws that use pattern j)
cutting stock LP:
Remarks
1. the definition of cutting pattern allows us to assume = (rather than ) in the LP constraints
also it makes all cj = 1; this will be crucial for column generation
2. the LP ignores the constraint that xj is integral
in practice, rounding the LP optimum to an integral solution gives a high-quality answer
(since rounding increases the cost by < m)
in some applications, fractional xj s are OK see Chvatal
3. the LP is hard to solve because the # of variables xj is huge
practical values are m 40, r 500, 50 wi 200; gives 107 patterns!
4. a good initial solution to the LP is given by the greedy algorithm, first fit decreasing:
use the largest final that fits, and proceed recursively
CSCI 5654
H. Gabow
Fall 2007
#27, p. 1
(i = 1, . . . , m)
CSCI 5654
H. Gabow
Fall 2007
#27, p. 2
CSCI 5654
H. Gabow
Fall 2007
#27, p. 3
Branch-and-Bound Algorithms
Unit 4: Implementation
ijS
xij
{0, 1}
|S| 1
i = 1, . . . , n
i = 1, . . . , n
i, j = 1, . . . , n
S {1, . . . , n}
(no subtours)
dropping the last line of the ILP gives another ILP, the assignment problem
(see Handout#45,p.2)
the assignment problem can be solved efficiently (time O(n3 ))
its optimum solution is a lower bound on the TSP solution
(since its a relaxation of TSP)
CSCI 5654
H. Gabow
Fall 2007
#28, p. 1
maximize z =
Pn
subject to
ui + vj cij
i=1 ui
j=1
vj
i, j = 1, . . . , n
(b) Let z be the cost of an optimum assignment. Let ui , vj be optimum duals. Show that any
feasible assignment with xij = 1 costs z + cij ui vj .
(c) The b-&-b algorithm branches on xij where xij = 1 and ij maximizes
0 = min{cik ui vk : k 6= j} + min{ckj uk vj : k 6= i} + z .
Show this choice allows us to discard the xij = 0 problem if 0 M . Hint. Use what you learned
in (b).
CSCI 5654
H. Gabow
Fall 2007
#28, p. 2
Example Execution
j
1
2
3
4
5
6
vj
7
20
21
12
23
7
2
27
13
16
46
5
5
3
43
16
25
27
5
5
4
16
1
35
48
9
1
5
30
30
5
18
5
5
6
26
25
9
18
5
x63 = 1, x36 = 0
= 54 (same assgt.)
63
ui
15
0
0
11
5
0
x63 = 0
tour asst.
M =
(0 = 63)
x63 = 1, x36 = x35 = 0
0 = 8 + 2 + 54 = 64
discard
x63 = x35 = 1
x36 = x53 = x56 = 0
= 65
discard
(i = 1, . . . , m)
Pm
k+1 ci xi
CSCI 5654
(ck+1 /ak+1 )
Pm
k+1 ai xi
H. Gabow
Fall 2007
#28, p. 3
Remarks
1. (x1 , . . . , xk ) is increasing in xk
since increasing xk by 1 changes by ck (ck+1 /ak+1 )ak 0
2. if all ci are integral, z
Example. A knapsack can hold b = 8 pounds. The 2 densest items have these parameters:
i
ci /ai
ai
1
2
3
2
1
1.1
CSCI 5654
H. Gabow
Fall 2007
#28, p. 4
Unit 5: Extensions
tj
xj
uj
replace xj by 2 slacks sj , tj 0
eliminate xj by substituting xj = j + sj
add constraint sj + tj = uj j
Case 2: xj has 1 finite bound.
replace xj by sj = (the slack in xj s bound)
sj 0
eliminate xj by substituting xj = j + sj or xj = uj sj
Case 3: xj free, i.e., no finite bounds.
nj
xj
pj
aj
xj
replace xj by pj , nj 0
eliminate xj by substituting xj = pj nj
more generally xj = aj + pj nj for some constant aj
CSCI 5654
H. Gabow
Fall 2007
#29, p. 1
(valid if j < uj )
Proof 2. the constraint sj + tj = (constant) gives row 0 . . . 0 1 1 0 . . . 0 in A
so B contains 1 or both columns sj , tj
CSCI 5654
H. Gabow
Fall 2007
#29, p. 2
Upper Bounding
Unit 5: Extensions
z 6= cB B1 b
so our algorithm must maintain the current value of x, denoted x (xB & xN )
The Simplex Algorithm with Upper Bounding
let B be a basis with corresponding basic solution x
Pivot Step
changes the value of some nonbasic xs
from xs to xs + , where is positive or negative
basic variables change from xB to xB d
objective value changes from z to z + (cs yAs )
as in the revised simplex, d = B1 As , y = cB B1
CSCI 5654
H. Gabow
Fall 2007
#30, p. 1
cj
uj
j
xj
CSCI 5654
H. Gabow
Fall 2007
#30, p. 2
+ x3 = 1
x1 + 3x2 + 4x3 = 13
x1 , x2 0
equivalently x1 = 1 x3 , x2 = x3 4
so take x3 = 4 and initial basis (1, 2), x1 = 3, x2 = 0
Phase 1 not needed, go directly to Phase 2
the method of generalized upper bounding (Chvatal, Ch. 25)
adapts
P the simplex algorithm for constraints of the form
jS xj = b
each xj appearing in 1 such constraint
CSCI 5654
H. Gabow
Fall 2007
#30, p. 3
Unit 5: Extensions
2v1
x i , vi 0
to proceed to Phase 2 we drop v1
we could safely drop v2 and its constraint v2 = 0
this would give us a basis
intuitively this corresponds to eliminating the redundant constraint
well show how to eliminate redundancy in general, & get a basis
consider a general form LP L:
maximize cx subject to Ax = b, x u
L has a basis some m columns B have AB nonsingular
A has full row rank
the row rank of an m n matrix A is the maximum # of linearly independent rows
similarly for column rank
(row rank of A) = (column rank of A) = the rank of A
CSCI 5654
H. Gabow
Fall 2007
#31, p. 1
CSCI 5654
H. Gabow
Fall 2007
#31, p. 2
1 0
0 1
0 0
0 0
0
0
1
0
0
...
...
...
...
...
2. L has basis B = B R
Proof.
in B, consider the rows for constraints of L
any entry in a column of R is 0
= these rows are linearly independent when the columns of R are dropped
= these rows form a basis of L
Proof.
for (i), run the general simplex algorithm (2-Phase)
for (ii), initialize Phase 1 with a normal bfs
it halts with a normal bfs
eliminate all artificial variables using the above procedure
CSCI 5654
H. Gabow
Fall 2007
#31, p. 3
Example
L: maximize x1 + x2 subject to 2 x1 4, 2 x2 4
this LP has empty A, which has full row rank!
x2
optimum bfs
x1
CSCI 5654
H. Gabow
Fall 2007
#31, p. 4
Unit 5: Extensions
Certificates of Infeasiblity
let I be a system of inequalities Ax b
Theorem. I is infeasible
the contradiction 0 1 can be obtained as a linear combination of constraints.
Proof.
consider this primal-dual pair:
primal
dual
maximize 0x
minimize yb
subject to Ax b
subject to yA = 0
y0
let n = (# variables in I)
Corollary. I is infeasible
some subsystem of n + 1 constraints is infeasible.
Proof.
first assume A has full column rank
this implies [A | b] has full column rank
if not, b is a linear combination of columns of A, contradicting infeasiblity
I infeasible = yA = 0, yb = 1, y 0 is feasible
= it has a bfs (in the general sense) y
by the Generalized Fundamental Theorem of LP, and full column rank
there are n + 1 constraints, so n + 1 basic variables
any nonbasic variable is 0
so y has n + 1 positive variables
now consider a general A
drop columns of A to form A of full column rank, & apply above argument
the multipliers y satisfy y A = 0
this implies y A = 0 since each dropped column is linearly dependent on A
CSCI 5654
H. Gabow
Fall 2007
#32, p. 1
Remarks
1. the corollary cant be strengthened to n infeasible constraints
e.g., in this system in variables x1 , x2 , any 2 constraints are feasible:
x2
-x1+x2>=1
x1+x2>=7
x2<=2
x1
x0
each term on the l.h.s. is nonnegative but the r.h.s. is negative, contradiction!
CSCI 5654
H. Gabow
Fall 2007
#32, p. 2
Theorems of Alternatives
Unit 5: Extensions
1
0
1
the 2nd constraint to the other two gives 0 = 2
Farkas Lemma. (1902) For any A and b, exactly 1 of these 2 systems is feasible:
(I)
Ax = b, x 0
(II)
yA 0, yb < 0
Interpretations:
(i) system (I) is infeasible iff it implies the contradiction (nonnegative #) = (negative #)
(ii) system (II) is infeasible iff it implies the contradiction (negative #) 0
Example: the system
x1 x2 = 1
2x1 x2 = 0
is inconsistent, since
1 (first constraint) + (2nd constraint)
gives x1 = 1, i.e., (nonnegative #) = (negative #)
Proof.
consider this primal-dual pair:
Primal
Dual
maximize 0x
minimize yb
subject to Ax = b
x0
subject to yA 0
(I) feasible 0 is the optimum objective value for both primal & dual
(II) infeasible
for = note that y = 0 gives dual objective 0
CSCI 5654
H. Gabow
Fall 2007
#33, p. 1
Remarks
1. Farkas Lemma useful in linear, nonlinear and integer programming
Integer Version:
For A and b integral, exactly 1 of these 2 systems is feasible:
Ax = b, x Zn
yA Zn , yb
/ Z, y Rm
Example
consider this system of equations in integral quantities xi :
2x1 + 6x2 + x3 = 8
4x1 + 7x2 + 7x3 = 4
tripling the 1st equation & adding the 2nd gives the contradiction
10x1 + 25x2 + 10x3 = 28
the corresponding vector for Farkas Lemma is y1 = 3/5, y2 = 1/5
2. Farkass Lemma is a special case of the Separating Hyperplane Theorem:
S a closed convex set in Rm , b Rm S =
some hyperplane separates b from S, i.e., yb > a, ys a for all s S
for Farkas, S is the cone generated by the columns of A
(I) says b is in the cone, (II) says b can be separated from it
x2
Ax
yx=0
x1
CSCI 5654
H. Gabow
Fall 2007
#33, p. 2
Dual
maximize
minimize y0
subject to Ax + 1 0
subject to yA = 0
y1 = 1
y0
Remarks.
1. Heres a generalization of Gordans Lemma to nonlinear programming:
Theorem (Fan et al, 1957).
Let C be a convex set in Rn , and let f : C Rm be a convex function. Then exactly 1 of these 2
systems is feasible:
(I)
f (x) < 0
(II)
yf (x) 0 for all x C, y 0, y 6= 0
Exercise. Prove Fans Theorem includes Gordans as a special case. Begin by taking f (x) = Ax.
The challenge is to prove yA = 0, as required in Gordan, not yA 0, which looks like what comes
out of Fan.
2. Chvatal gives other theorems of alternatives
CSCI 5654
H. Gabow
Fall 2007
#33, p. 3
Unit 5: Extensions
6
45
15
0
+ 45 s1 41 s2
49 s1 + 41 s2
+ 43 s1 + 41 s2
165
4
45 s1 43 s2
z=
32 s2 + 35 s3
+ s2 +3 s3
31 s2 + 34 s3
z = 40
31 s2 35 s3
x1 =
x2 =
s1 =
Dual
maximize z = cx
minimize yb
subject to Ax = b
x0
subject to yA c
CSCI 5654
H. Gabow
Fall 2007
#34, p. 1
jN
arj xj
jN [cj
(1)
CSCI 5654
H. Gabow
Fall 2007
5 3
/
4 4
= 5/3,
s2 :
3 1
/
4 4
= 3.
#34, p. 2
Remarks
1. the Entering and Leaving Variable Steps are reversed from standard simplex
2. a pivot kills 1 negative variable, but it can create many other negative variables
e.g., in Chvatal pp. 155156 the first pivot kills 1 negative variable but creates another
in fact the total infeasibility (total of all negative variables) increases in magnitude
3. dual simplex allows us to avoid Phase I for blending problems
the initial dictionary is dual feasible
in general a variant of the big-M method can be used to initialize the dual simplex
4. the CPLEX dual simplex algorithm is particularly efficient
because of a convenient pivot rule
CSCI 5654
H. Gabow
Fall 2007
#34, p. 3
cj
uj
j
xj
CSCI 5654
H. Gabow
Fall 2007
#34, p. 4
Sensitivity Analysis
Unit 5: Extensions
postoptimality analysis studies the optimum solution its robustness and patterns of change
the ease of doing this is an important practical attraction of LP and the simplex algorithm
in contrast postoptimality analysis is very hard for IP
assume weve solved a general LP,
maximize cx subject to Ax = b, x u
obtaining optimum basis B and corresponding optimum bfs x
sensitivity analysis tells how to solve slightly-changed versions of the problem
e.g., c changes to e
c
e will always denotes modified parameters
Cost Changes, e
c
CSCI 5654
H. Gabow
0
(cj )y1 b
Case 1
Case 2
Fall 2007
#35, p. 1
e
Right-hand Side Changes, b
B remains a basis
it has a corresponding bfs
xN = xN
e B1 AN xN
xB = B1 b
()
this bfs is dual-feasible
so we start the revised dual simplex algorithm using the above bfs
the eta file and current cost coefficients are available from the primal simplex
r.h.s. ranging is similar, using (), & u
changing 1 bi gives 2 inequalities per basic variable
e have new bounds ,
e u
e
more generally suppose in addition to b
B remains a basis
define a new bfs x as follows:
for j nonbasic, xj = if xj is free in new problem then xj
else if xj = j then ej
ej
else if xj = uj then u
else / xj was free and is now bound / a finite bound ej or u
ej
for this to make sense we assume no finite bound becomes infinite
also define xB by ()
x is dual feasible
hence restart revised dual simplex from x
Question. Why does the method fail if a finite bound becomes infinite?
Adding New Constraints
add a slack variable vi in each new constraint
vi has bounds 0 vi < for an inequality & 0 vi 0 for an equation
extend B & x to the new system:
add each vi to the basis
compute its value from its equation
we have a dual-feasible solution (cost(vi ) = 0) so now use the dual simplex algorithm
refactor the basis since the eta file changes
DS Example (Handout#34) contd.
we solve the LP
maximize z = 8x1 +5x2
subject to
x1 + x2
6
9x1 +5x2
45
x1 , x2 0
using standard simplex
CSCI 5654
H. Gabow
Fall 2007
#35, p. 2
15
4
9
4
165
4
+ 54 s1 41 s2
49 s1 + 41 s2
54 s1 43 s2
assume B doesnt change (handle such changes by new variables, see Chvatal pp.1612)
solve the new problem in 2 simplex runs, as follows
1. run primal simplex algorithm
initial bfs x:
ej , or to xj if free
set nonbasic xj to a finite bound ej or u
define basic xj by ()
eu
e:
to make x primal feasible, redefine violated bounds ,
if j is basic and xj > u
ej , new upper bound = xj
if j is basic and xj < ej , new lower bound = xj
CSCI 5654
H. Gabow
Fall 2007
#35, p. 3
Parametric LPs
Unit 5: Extensions
p
piecewise linear concave up
Parametric Costs
L(p) has the form
maximize (c + pc )x subject to Ax = b, x u
p
I
Example.
maximize (p + 1)x1 + x2 + (p 1)x3 subject to x1 0, 0 x2 1, 0 x3
z
x1
CSCI 5654
H. Gabow
x2
p
x3
Fall 2007
#36, p. 1
Proof.
wlog A has full row rank
a basic feasible direction w has (c + pc )w > 0 in some interval
(, r), (, ), R or
thus L(p) is unbounded in 2 maximal intervals of the above form
& is bounded in a closed interval I (I = [, r], (, r], [, ), R or )
for the 2nd part note that L(p) has a finite number of normal bfss
optimum basis:
CSCI 5654
x2
H. Gabow
x1
Fall 2007
#36, p. 2
infeasible
1
infeasible
p
1
2. maximize x1 subject to x1 + x2 = p, x1 , x2 0
z
infeasible
3. maximize x1 subject to x1 + x2 = 2p 2, 2 x1 p, x2 0
z
infeasible
2 2
CSCI 5654
H. Gabow
Fall 2007
#36, p. 3
WV 9.8
Unit 5: Extensions
the cutting plane method for ILP starts with the LP relaxation,
and repeatedly adds a new constraint
the new constraint eliminates some nonintegral points from the relaxations feasible region
eventually the LP optimum is the ILP optimum
DS Example (Handout#34) contd.
ILP:
maximize z = 8x1 +5x2
subject to
x1 + x2
6
9x1 +5x2
45
x1 , x2 Z +
6
45
15
0
6
3x1 + 2x2 = 15
5
4
9x1 + 5x2 = 45
x2 3
y
2
1
x1 + x2 = 6
y
1
3
x1
CSCI 5654
H. Gabow
Fall 2007
#37, p. 1
165
4
54 s1 43 s2
has x1 nonintegral
5
1
x1 s equation is x1 = 15
4 ( 4 )s1 4 s2
keeping only fractional parts the r.h.s. is 34 43 s1 14 s2
so the cutting plane is 43 43 s1 41 s2 0
equivalently 3 3s1 + s2 , or in terms of original variables, 12x1 + 8x2 60, 3x1 + 2x2 15
Summary of the Algorithm
solve the LP relaxation of the given IP
while the solution is fractional do
add a cut () to the LP
resolve the new LP
/ use the dual simplex algorithm, since were just adding a new constraint /
Example. DS Example in Handouts#3435 show how the ILP of p.1 is solved
Gomory proved this algorithm solves the ILP in a finite number of steps
Refinements:
choosing an fi close to half is recommended in practice
can discard cuts that become inactive
in practice the method can be slow more sophisticated cutting strategies are used
CSCI 5654
H. Gabow
Fall 2007
#37, p. 2
Remarks
1. if the given ILP has constraints Ax b rather than equalities,
we require A & b both integral, so all slack variables are integral
if this doesnt hold, can scale up A & b
2. the fractional cutting plane method can be extended to mixed integer programs (MIP)
3. cutting planes can be used within a branch-and-bound algorithm
to strengthen the bound on the objective function
CSCI 5654
H. Gabow
Fall 2007
#37, p. 3
Applications to Geometry
Unit 5: Extensions
Proof.
a set of red & green points can be strictly separated
some hyperplane ax = b has ar b & ag b + 1 for each red point r & each green point g
(by rescaling)
thus our given set can be separated
this system of inequalities is feasible for unknowns a & b:
ar b
for each given red point r
ag b + 1 for each given green point g
since there are n + 1 unknowns, the Corollary gives the Theorem
CSCI 5654
H. Gabow
Fall 2007
#38, p. 1
P1
P2
CSCI 5654
H. Gabow
Fall 2007
#38, p. 2
Remarks
1. the assumption Pi nonempty in the above argument ensures h 6= 0
(since h = 0 doesnt separate any points)
this argument is a little slicker than Chvatal
2. for both theorems of this handout, Chvatal separates sets A & B using 2 disjoint half-spaces
i.e., points x A have hx c, points x B have hx c +
for finite sets of points, the 2 ways to separate are equivalent
but not for infinite sets e.g., we can strictly separate the sets hx > c & hx < c
but not with disjoint half-spaces
separation by disjoint half-spaces implies strict separation
so the above Separation Theorem would be stronger if we separated by disjoint half-spaces
thats what we did in the proof!, so the stronger version is true
(why not do this in the first place? simplicity)
CSCI 5654
H. Gabow
Fall 2007
#38, p. 3
Unit 6: Networks
this problem is to route specified quantities of homogeneous goods, minimizing the routing cost
more precisely:
let G be a directed graph on n vertices and m edges
the undirected version of G ignores all edge directions
for simplicity assume its connected
4
3
6
Fig.1. Graph G.
sink
each vertex i has a demand bi , & we call i a source
i bi
bi > 0
bi < 0
bi = 0
=0
each edge ij has a cost cij , the cost of shipping 1 unit of goods from i to j
we want to satisfy all demands exactly, and minimize the cost
4
-1
4
2
0
2
-2
0
4
1
1
(b)
(a)
Fig.2. (a) Graph G with vertex demands & edge costs. (b) Optimum transshipment,
cost 10. Edge labels give # units shipped on the edge; 0 labels are omitted.
1 unit is shipped along path 5,3,6 vertex 3 functions as a transshipment node.
Special Cases of the Transshipment Problem.
assignment problem & its generalization, transportation problem
shortest path problem
Exercise. Model the single-source shortest path problem as a transhipment problem.
CSCI 5654
H. Gabow
Fall 2007
#39, p. 1
minimize cx
subject to Ax = b
x0
maximize yb
subject to yj yi + cij , for each arc ij
CSCI 5654
H. Gabow
Fall 2007
#39, p. 2
Unit 6: Networks
4
1
0
6
CSCI 5654
H. Gabow
Fall 2007
#40, p. 1
e
Claim 2. The columns of a forest are linearly independent, in A & A
Proof.
e
it suffices to prove this for A
suppose a linear combination of the edges sums to 0
an edge incident to a leaf 6= r has coefficient 0
repeat this argument to eventually show all coefficients are 0
1
3
4
1
2
6
Fig.4. A (suboptimal) basis. Pivotting edge (5, 1) into the basis gives Fig.3.
CSCI 5654
H. Gabow
Fall 2007
#40, p. 2
Prices
each vertex maintains a dual variable (its price) defined by yr = 0 and yB = c (Exercise 1(c))
4
-4
2
2
-2
-4
4
-1
-4
-5
1
-4
2
-1
-3
(a)
(b)
Fig.5. Prices for the bases of Fig.4 and 3 respectively. r is the top left vertex
(as in Fig.1, Handout#39).
Each basic edge ij satisfies yi + cij = yj . (b) gives
P
optimum prices:
yi bi = 10.
Note: yr doesnt exist in the reduced system
but the constraints for edges incident to r are equivalent to
usual constraints yi + cij yj with yr = 0
Network Simplex Algorithm
this algorithm implements the (basic) simplex algorithm for the transshipment problem
each iteration starts with a basis B (spanning tree T ) and bfs x
Entering Variable Step
Solve yB = cB by traversing T top-down (Exercise 1(c)).
Choose any (nonbasic) edge ij cij < yj yi . / underbidding /
If none exists, stop, B is an optimum basis.
Leaving Variable Step
Execute a pivot step by traversing edge ijs cycle C and finding t.
If t = , stop, the problem is unbounded. / impossible if c 0 /
Otherwise choose a backwards edge uv that defines t.
Pivot Step
Update x: change values along C by t.
In T replace edge uv by ij.
CSCI 5654
H. Gabow
Fall 2007
#40, p. 3
Unit 7: Polynomial LP
CSCI 5654
H. Gabow
Fall 2007
#41, p. 1
x2
x1
x21
x22
9 0
Fig.1. Ellipse
+
= 1; equivalently center c = 0, B =
.
0 4
9
4
High-level Algorithm
construct a sequence of ellipsoids Er , r = 0, . . . , s
each containing P
with volume shrinking by a factor 21/2(n+1)
stop when either
(i) the center of Es is in P , or
(ii) volume(Es ) < 2(n+2)L (= P = )
Initialization and Efficiency
we use a stronger version of the basic volume fact:
P 6= = volume(P {x : |xj | < 2L , j = 1, . . . , n}) 2(n+2)L
thus E0 can be a sphere of radius n2L , center 0
# iterations = O(n2 L)
more precisely suppose we do N = 16n(n + 1)L iterations without finding a feasible point
our choice of E0 restricts all coordinates to be n2L
i.e., E0 is contained in a box with each side 2n2L 22L
= volume(E0 ) 22nL
after N iterations the volume has shrunk by a factor 2N/2(n+1) = 28nL
.. after N iterations the ellipse has volume 22nL8nL 26nL < 2(n+2)L
= P =
CSCI 5654
H. Gabow
Fall 2007
#41, p. 2
x2
x1
x1 + x2 = 0
84/13
32/13
contains intersection of E of Fig.1 & half-space x1 + x2 0
32/13
496/117
Ellipsoid Algorithm
Initialization
Set N = 1 + 16n(n + 1)L.
Set p = 0 and B = n2 22L I.
T
/ The ellipsoid is always {x : (x p) B1 (x p) 1}.
The initial ellipse is a sphere centered at 0 of radius n2L . /
Main Loop
Repeat the Shrink Step N times (unless it returns).
If it never returns, return infeasible.
Shrink Step
If Ap < b then return p as a feasible point.
Choose a violated constraint, i.e., an i with Ai p bi .
Let a = ATi .
Ba
1
.
Let p = p
n + 1 aT Ba
2 (Ba)(Ba)T
n2
B
.
Let B = 2
n 1
n + 1 aT Ba
CSCI 5654
H. Gabow
Fall 2007
#41, p. 3
Remarks
1. the ellipsoids of the algorithm must be approximated,
since their defining equations involve square roots
this leads to a polynomial time algorithm
2. but the ellipsoid algorithm doesnt take advantage of sparsity
3. it can be used to get polynomial algorithms for LPs with exponential #s of constraints!
e.g., the Held-Karp TSP relaxation (Handout#1)
note the derivation of N is essentially independent of m
to execute ellipsoid on an LP, we only need an efficient algorithm for
the separation problem:
given x, decide if x P
if x
/ P , find a violated constraint
Convex Programming
let C be a convex set in Rn
i.e., x, y C = x + (1 )y C for any [0, 1]
Problem: min cx s.t. x C
Fundamental Properties for Optimization
1. for our problem a local optimum is a global optimum
Proof.
let x be a local optimum & take any y C
take (0, 1] small enough so c (1 )x + y cx
thus (1 )cx + cy cx, cy cx
2. x
/ C = a separating hyperplane, i.e.,
by > a for all y C and bx < a
proved in Handout#38
because of these properties the ellipsoid algorithm solves our convex optimization problem,
assuming we can recognize points in C & solve the separation problem
CSCI 5654
H. Gabow
Fall 2007
#41, p. 4
b
an n n symmetric positive semidefinite matrix
x 1
Example. min x s.t.
is PSD
1 1
this problem is equivalent to min x s.t. xv 2 2vw + w2 0 for all v, w
x = 1 is the optimum
(taking v = w = 1 shows x 1, & clearly x = 1 is feasible)
in general, the feasible region is convex
a convex combination of PSD matrices is PSD
the separation problem is solved by finding Xs eigenvalues
X not PSD = it has a negative eigenvalue
let v be the corresponding eigenvector
vT Xv = vT v < 0
so vT Xv 0 is a separating hyperplane
i.e., it separates the current solution from the feasible region
& can be used to construct the next ellipse
Conclusion: For any > 0, any semidefinite program can be solved by the ellipsoid algorithm
to within an additive error of , in time polynomial in the input size and log (1/).
Examples:
1. (G) (Handout#50) is computed in polynomial time using semidefinite programming
2. .878 approximation algorithms for MAX CUT & MAX 2 SAT
(Goemans & Williamson, STOC 94); see Handout#44
CSCI 5654
H. Gabow
Fall 2007
#41, p. 5
Remarks.
1. SDP includes convex QP as a special case (Exercise below)
2. SDP also has many applications in control theory
3. work on SDP started in the 80s
interior point methods (descendants of Karmarkar) run in polynomial time
& are efficient in practice, especially on bigger problems
Exercise. We show SDP includes QP (Handout#42) and more generally, convex quadratically
constrainted quadratic programming (QCQP).
(i) For x Rn , show the inequality
(Ax + b)T (Ax + b) cx + d
is equivalent to
I
Ax + b
is PSD
(Ax + b)T cx + d
.
Hint. Just use the definition of PSD. Recall ax2 + bx + c is always nonnegative iff b2 4ac and
a + c 0.
(ii) Show QP is a special case of SDP.
(iii) Show QCQP is a special case of SDP. QCQP is minimizing a quadratic form (as in QP) subject
to quadratic constraints
(Ax + b)T (Ax + b) cx + d.
CSCI 5654
H. Gabow
Fall 2007
#41, p. 6
MC 16.4.4, V 23.1
CSCI 5654
H. Gabow
Fall 2007
#42, p. 1
Example 5, Markowitzs Investment Model. (H. Markowitz won the 1990 Nobel Prize in Economics
for a model whose basics are what follows.)
We have historical performance data on n activities we can invest in. We want to invest in a
mixture of these activities that intuitively has maximum return & minimum risk. Markowitz
models this by maximizing the objective function
(expectation of the return) (variance of the return)
where is some multiplier.
Define
xi = the fraction of our investment that well put into activity i
ri = the (historical) average return on investment i
vi = the (historical) variance of investment i
cij = the (historical) covariance of investments i & j
If Ii is the random variable equal to the return of investment i, our investment returns
elementary probability the variance of this sum is
P
x2i vi + 2
xi Ii . By
i<j cij xi xj .
So forming r, the vector of expected returns, & the covariance matrix C = (cij ), Markowitzs QP
is
minimize xT Cx rx
subject to 1T x = 1
x0
Note that vi = cii . Also the covariance matrix C is PSD, since the variance of a random variable
is nonnegative.
Markowitzs model develops the family of solutions of the QP as varies
in some sense, these are the only investment strategies one should consider
the LINDO manual gives a similar example:
achieve a given minimal return (rx r0 ) while minimizing the variance
CSCI 5654
H. Gabow
Fall 2007
#42, p. 2
MC, 16.4.4
x*
x
x+y=1
2 2
x + y = 1/4
H. Gabow
Fall 2007
#43, p. 1
1 0
the cost vector of #1 is c = xT Q = (1/2, 1/2)
= (1/2, 1/2)
0 1
the linear cost function c x = x/2 + y/2 approximates q close to x
switching to cost c x, x is still optimum
although other optima exist: the edge E = {(x, 1 x, 0) : 0 x 1}
Example 2:
modify the QP to
min q = z 2 + x + y
s.t. x + y 1
x, y, z 0
the set of optima is edge E
0 0
this is consistent with the Remark, since Q = 0 0
0 0
0
0 is PSD but not PD
1
b
A
0
y
v
T
u
x
=0
v
y
x, u, y, v 0
Proof.
the dual LP is
max yT b s.t. yT A c + xT Q, y 0
introducing primal (dual) slacks v, (uT ),
the complementary slackness conditions for optimality are
u, v 0, uT x = vT y = 0
the LCP expresses these conditions
the above LCP is the Karush-Kuhn-Tucker necessary conditions (KKT conditions) for optimality
taking Q = 0 makes Q an LP
& the KKT conditions become the complementary slackness characterization of LP optimality
in fact
+
+
+
CSCI 5654
H. Gabow
Fall 2007
#43, p. 2
2A 1
u
1
0
v
1
0
A = (1, 1)T
B
1
x
0 y1 =
h
y2
0
ux, v1 y1 , v2 y2 = 0
x, u, y, v 0
the linear constraints in scalar form:
u 2Ax + y1 y2 = B
v1 x
=
v2 + x
=h
CS allows 3 possibilities, v1 = 0, v2 = 0, or y1 = y2 = 0
v1 = 0: = x =
v2 = 0: = x = h
y1 = y2 = 0:
when u = 0: = 2Ax = B (i.e., first derivative = 0), x = B/2A
when u > 0: = x = 0, so = 0, and x = as in first case
so KKT asks for the same computations as Calcs set-the-derivative-to-0 method
CSCI 5654
H. Gabow
Fall 2007
#43, p. 3
CSCI 5654
H. Gabow
Fall 2007
#43, p. 4
CSCI 5654
H. Gabow
Fall 2007
#44, p. 1
CSCI 5654
H. Gabow
Fall 2007
#44, p. 2
w/
=
min
(w/2)(1 cos )
0 1 cos
CSCI 5654
H. Gabow
Fall 2007
#44, p. 3
Pn
Pn
subject to
Pn
xij
i=1
i=1
Pn
k=1 xjk
=0
xij ij
xij uij
j = 1, . . . , n
i, j = 1, . . . , n
i, j = 1, . . . , n
flow conservation
Some Variations
Networks with Losses & Gains
a unit of material starting at i gets multiplied by mij while traversing link ij
P
P
so replace conservation by ni=1 mij xij nk=1 xjk = 0
example from currency conversion:
a site is a currency, e.g., dollars, pounds
mij = the number of units of currency j purchased by 1 unit of currency i
sample problem: convert $10000 into the most rubles possible
more examples: investments at points in time ($1 now $1.08 in a year), conversion of raw
materials into energy (coal electricity), transporting materials (evaporation, seepage)
Multicommodity Flow
1 network transports flows of various types (shipping corn, wheat, etc.)
use variables xkij , for k ranging over the commodities
if were routing people, Internet packets or telephone messages, we get an ILP (xkij integral)
in the next 2 examples take ij = 0 (for convenience)
Concave Down Cost Functions (works for any LP)
for convenience assume were maximizing z = profit, not minimizing cost
the profit of transporting material on link ij is a piecewise linear concave down function:
CSCI 5654
H. Gabow
Fall 2007
#45, p. 1
c(xij )
m3
m2
m1
xij
b1
b2
uij
decreasing returns to scale
replace xij by 3 variables rij , sij , tij
each appears in the flow conservation constraints where xij does
bounds on variables: 0 rij b1 , 0 sij b2 b1 , 0 tij uij b2
objective function contains terms m1 rij + m2 sij + m3 tij
concavity of c(xij ) ensures this is a correct model
Fixed Charge Flow
link ij incurs a startup cost sij if material flows across it
ILP model:
introduce decision variable yij = 0 or 1
new upper bound constraint: xij uij yij
objective function: add term sij yij
Assignment Problem
there are n workers & n jobs
assigning worker i to job j costs cij dollars
find an assignment of workers to jobs with minimum total cost
let xij be an indicator variable for the condition, worker i is assigned to job j
we get this LP:
minimize z =
subject to
Pn
i=1
Pn
Pn
j=1
Pn
i=1
xij
=1
i = 1, . . . , n
xij
xij
=1
0
j = 1, . . . , n
i, j = 1, . . . , n
CSCI 5654
H. Gabow
Fall 2007
#45, p. 2
Set Covering
constructing fire station j costs cj dollars, j = 1, . . . , n
station j could service some known subset of the buildings
construct a subset of the n stations so each building can be serviced
and the cost is minimum
let aij = 1 if station j can service building i, else 0
let xj be an indicator variable for construcing station j
minimize z =
Pn
subject to
Pn
j=1 cj xj
j=1
aij xj 1
i = 1, . . . , m
xj 0, integral
j = 1, . . . , n
j cj xj
i,j
sij yij
yij = 1
yij xj
yij , xj {0, 1}
subject to
i a client
i a client, j a facility
i a client, j a facility
CSCI 5654
H. Gabow
Fall 2007
#45, p. 3
1
7
1
3
5
(a)
3
(b)
i,j,p,q
p=1
Pn
i=1
CSCI 5654
H. Gabow
xip = 1
i = 1, . . . , n
xip = 1
xip {0, 1}
p = 1, . . . , n
i, p = 1, . . . , n
Fall 2007
#45, p. 4
Remarks.
1. we could convert this to an ILP
introduce variables yijpq {0, 1}, & force them to equal xip xjq by the constraint
yijpq xip + xjq 1
but this adds many new variables & constraints
2. a quadratic program has the form
maximize z =
j,k cjk xj xk
j cj xj
Pn
subject to
j=1 aij xj
bi
i = 1, . . . , m
subject to
Pn
j cj xj
j=1
aij xj bi
xj {0, 1}
i = 1, . . . , m
j = 1, . . . , n
by solving the QP
maximize z =
subject to
CSCI 5654
j (xj
1/2)2
Pn
j=1 aij xj
xj
bi
1
xj 0
H. Gabow
i = 1, . . . , m
j = 1, . . . , n
j = 1, . . . , n
Fall 2007
#45, p. 5
Murty, Ch.1
Multiobjective Functions
Example
we want to maximize profit 500p + 620f from producing pentium (p) & 486 (f ) computer systems
but maintaining a high-tech image dictates maximize p
whatever the strategy it should be a vector maximum, (pareto optimum) i.e.,
if we produce (p, f ) units, no other feasible schedule (p , f ) should have p p & f f
we give several approaches to multiobjective problems
a common aspect is that in practice, we iterate the process
resolving the LP with different parameters until a satisfactory solution is obtained
sensitivity analysis techniques (Handout #35) allow us to efficiently resolve an LP
if we modify it in a way that the solution changes by a small amount
now write our objective functions as fk =
j ckj xj
+ dk , k = 1, . . . , r
H. Gabow
Fall 2007
#46, p. 1
Goal Programming
adapt a goal value gk for each objective function fk
and use appropriate penalities for excess & shortages of each goal
e.g., pe = ps = 1 keeps us close to the goal
pe = 0, se = 1 says exceeding the goal is OK but each unit of shortfall incurs a unit penalty
iterate this process, varying the parameters, until a satisfactory solution is achieved
Goal Setting with Marginal Values
choose a primary objective function f0 and the other objectives fk , k = 1, . . . , r
f0 is most naturally the monetary price of the solution
adapt goals gk , k = 1, . . . , r
solve the LP with objective z = f0 and added constraints fk = gk , k = 1, . . . , r
duality theory (Handout #20) reveals the price pk of each goal gk :
changing gk by a small changes the cost by pk
use these prices to compute better goals that are achieved at an acceptable cost
resolve the LP to verify the predicted change
iterate until the solution is satisfactory
CSCI 5654
H. Gabow
Fall 2007
#46, p. 2
Fact 2. A nondegenerate pivot moves from one vertex, along an edge of P , to an adjacent vertex.
Proof.
a pivot step travels along a line segment whose equation, in parameterized form,
is given in Handout #8, Property 5:
CSCI 5654
H. Gabow
Fall 2007
#47, p. 1
xj =
(t
bj ajs t
0
j=s
jB
j
/ Bs
CSCI 5654
H. Gabow
Fall 2007
#47, p. 2
Simplex Cycles
the simplest example of cycling in the simplex algorithm
has the variables swapped in and out in a fixed cyclic order
+x3 -x1
+x1 -x2
+x2 -x3
D1
D2
D3
+x2 -x6
+x1 -x5
D1
+x3 -x1
D2
D3
+x4 -x2
+x5 -x3
D4
D5
D6
+x1 - s1
00
+x2 - s2
01
+s1 - x1
11
10
CSCI 5654
H. Gabow
Fall 2007
#48, p. 1
+s3 - x3
+x1 - s1
000
+x2 - s2
001
+s1 - x1
011
+x3 - s3
010
+x1 - s1
110
+s2 - x2
111
+s1 - x1
101
100
CSCI 5654
H. Gabow
Fall 2007
#48, p. 2
Chvatal, 37 38
the smallest-subscript rule can do an exponential number of pivots before finding the optimum
in fact it can stall for an exponential number of pivots!
to understand stalling well redo the proof of Handout #12:
consider a sequence S of degenerate pivots using the smallest-subscript rule
so the bfs never changes in S
say a pivot step involves the entering & leaving variables, but no others
a variable xi is fickle if its involved in > 1 pivot of S
if there are no fickle variables, |S| n/2
suppose S has fickle variables; let t be the largest subscript of a fickle variable
Corollary. S has a nonfickle variable
which is involved in a pivot between the first two pivots involving xt .
Proof. (essentially same argument Handout #12)
let F be the set of subscripts of fickle variables
let D & D be the dictionaries of the first two pivot steps involving xt , with pivots as follows:
xs xt
xt
we can assume cs 0:
suppose cs > 0
= xs is nonbasic in D
s > t (since xs doesnt enter in D s pivot)
= xs isnt fickle
so Ds pivot proves the Corollary!
(note: D precedes D in this case)
ci ais 0 for i B F :
Case i = t:
ats > 0: since xt is leaving in Ds pivot
ct > 0: since xt is entering in D s pivot
CSCI 5654
H. Gabow
Fall 2007
#49, p. 1
Case i B (F t):
ais 0: bi = 0 (since xi = 0 throughout S)
but xi isnt the leaving variable in Ds pivot
ci 0: otherwise xi is nonbasic in D & D s pivot makes xi entering (i < t)
since the r.h.s. of () is positive, some i B F has ci 6= 0
hence xi is in B but not B , i.e., a pivot between D & D involves xi
S
S
some variable drops
out of problem
i = 1, . . . , n
j = 1, . . . , 2n
CSCI 5654
H. Gabow
Fall 2007
#49, p. 2
d
e
maximize xa + xb + xc + xd + xe
subject to xa + xb 1
xa + xc 1
xb + xc 1
xb + xd 1
xc + xe 1
xd + xe 1
xa , xb , xc , xd , xe {0, 1}
maximum independent set ILP
.5
.5
.5
.5
.5
maximum fractional
independent set
maximize xa + xb + xc + xd + xe
subject to xa + xb + xc 1
xb + xd 1
xc + xe 1
xd + xe 1
xa , xb , xc , xd , xe {0, 1}
subject to yabc 1
yabc + ybd 1
yabc + yce 1
ybd + yde 1
yce + yde 1
yabc , ybd , yce , yde {0, 1}
CSCI 5654
H. Gabow
minimum
edge cover
Fall 2007
.5
.5
.5
.5
.5
minimum fractional
edge cover
#50, p. 1
Pn
maximize z = j=1 xj
subject to xj + xk 1
xj {0, 1}
(j, k) an edge of G
j = 1, . . . , n
LP
relaxation:
Pn
maximize z = j=1 xj
subject to xj + xk 1
xj 0
(j, k) an edge of G
j = 1, . . . , n
P
y
minimize z = m
P i=1 i
subject to {yi : vertex j is on edge i} 1
yi 0
j = 1, . . . , n
i = 1, . . . , m
integral
dual:
P
y
minimize z = m
P i=1 i
subject to {yi : vertex j is on edge i} 1
yi {0, 1}
j = 1, . . . , n
i = 1, . . . , m
CSCI 5654
H. Gabow
Fall 2007
#50, p. 2
C a maximal clique of G
j = 1, . . . , n
LP optimum
duality gap
CSCI 5654
H. Gabow
Fall 2007
#50, p. 3
Generalization to Hypergraphs
consider a family F of subsets of V
a packing is a set of elements of V , 1 in each set of F
a covering is a family of sets of F collectively containing all elements of V
maximum packing
Pn ILP:
P
maximize j=1 xj subject to {xj : j S} 1, S F; xj {0, 1}, j = 1, . . . , n
minimum covering
P ILP:
P
minimize SF yS subject to {yS : j S} 1, j = 1, . . . , n; yS {0, 1}, S F
as above, the LP relaxations of these 2 ILPs are duals, so any packing is any covering
this packing/covering duality is the source of a number of beautiful combinatoric theorems
where the duality gap is 0
in these cases the ILPs are solvable in polynomial time!
e.g., finding a maximum flow; packing arborescences in a directed graph
CSCI 5654
H. Gabow
Fall 2007
#50, p. 4
Polyhedral Combinatorics
(0,0,1)
(.5,.5,.5)
(1,0,0)
(0,1,0)
(0,1,0)
(1,0,0)
clique constraints
edge constraints
CSCI 5654
H. Gabow
Fall 2007
#51, p. 1
Polyhedral Combinatorics
to analyze the independent sets of a graph G,
we can analyze the polyhedron P whose vertices are those independent sets
this depends on having a nice description of P
which we have if G is perfect
in general polyhedral combinatorics analyzes a family of sets
by analyzing the polyhedron whose vertices are (the characteristic vectors of) those sets
CSCI 5654
H. Gabow
Fall 2007
#51, p. 2
Disjoint Paths Problem: Given a graph, vertices s, t & integer k, are there k openly disjoint
st-paths?
s1
s2
t2
t1
CSCI 5654
H. Gabow
Fall 2007
#52, p. 1
Example
Primal
maximize z = x1 + 4x2
subject to
x1 + x2
x1
1
2
Dual
minimize w = y1 + 2y2
subject to
y1 + y2
y1
y1 , y2
=1
=4
0
optimum primal: x1 = 2, x2 = 3, z = 14
optimum dual: y1 = 4, y2 = 5, w = 14
(1,4)
x2
x2
(1,1)
(2,3)
(1,0)
x1+4x2=14
x1+4x2=14
(6,2)
x1+x2=1
x1=2
x1+x2=1
x1=2
x1
x1
CSCI 5654
H. Gabow
Fall 2007
#53, p. 1
Pn
j=1
aij xj
minimize w =
bi
subject to
Dual
Pm
i=1 bi yi
Pm
i=1 aij yi
yi
= cj
0
vector
vector
vector
vector
(x1 , . . . , xn )
of cost coefficients
(ai1 , . . . , ain )
of dual values (y1 , . . . , ym )
CSCI 5654
H. Gabow
Fall 2007
#53, p. 2
1
2
s1 + s2
3
3
1
2
s1 s2
3
3
CSCI 5654
H. Gabow
Fall 2007
#54, p. 1
Testing Optimality
we verify (xj ) is optimal:
(2): inequality in 3rd upper bound = y3 = 0
(1) :
y0 + y1 = 9
2y0 + y2 = 12
3y0 = 15
= y0 = 5, y2 = 2, y1 = 4
(3): holds by definition
(4): holds
= (xj ) is optimum
Duals are Prices
How valuable is more knapsack capacity?
suppose we increase the size of the knapsack by
we can add /3 more pounds of item 3
increasing the value by 5
so the marginal price of knapsack capacity is 5 (= y0 )
How valuable is more of item 3?
obviously 0 (= y3 )
How valuable is more of item 1?
suppose more pounds of item 1 are available
we can add more pounds of item 1 to the knapsack
assuming we remove /3 pounds of item 3
this increases the knapsack value by 9 5 = 4
so the marginal price of item 1 is 4 (= y1 )
General Knapsack LPs
Primal:
maximize
subject to
n
X
vj xj
j=1
n
X
wj xj C
j=1
Dual:
minimize Cy0 +
xj 1
j = 1, . . . , n
xj 0
j = 1, . . . , n
n
X
yj
j=1
subject to
wj y0 + yj vj
yj 0
CSCI 5654
j = 1, . . . , n
j = 0, . . . , n
H. Gabow
Fall 2007
#54, p. 2
Optimal Solutions
assume the items are indexed by decreasing value per pound, i.e.,
v1 /w1 v2 /w2 . . .
optimum solution: using the greedy algorithm, for some s we get
x1 = . . . = xs1 = 1, xs+1 = . . . = xn = 0
to verify its optimality & compute optimum duals:
(2): ys+1 = . . . = yn = 0 (intuitively clear: theyre worthless!)
we can assume xs > 0
now consider 2 cases:
Case 1: xs < 1
(2): ys = 0
(1): equation for xs gives
y0 = vs /ws
equations for xj , j < s give
yj = vj wj (vs /ws )
(4): holds by our indexing
(3): first s equations hold by definition
remaining inequalities say y0 vj /wj , true by indexing
Case 2: xs = 1
(1): a system of s + 1 unknowns yj , j = 0, . . . , s & s equations
solution is not unique
but for prices, we know item s is worthless
so we can set ys = 0 and solve as in Case 1
Duals are Prices
our formulas for the duals confirm their interpretation as prices
Pn
the dual objective Cy0 + j=1 yj
computes the value of the knapsack and the items on hand
(i.e., the value of our scarse resources)
the dual constraint wj y0 + yj vj
says the monetary (primal) value of item j is no more than its value computed by price
CSCI 5654
H. Gabow
Fall 2007
#54, p. 3
Chvatal, 79 84
Review of Matrices
1 0
I is the identity matrix, e.g., 0 1
0 0
the dimension of I is unspecified
0
0
1
and determined by context
0
similarly 0 is the column vector of 0s, e.g. 0
0
with dimension determined by context
Matrix Operations
scalar multiple: for A = [aij ] an m n matrix, t a real number
tA is an m n matrix, [taij ]
matrix sum: for m n matrices A = [aij ], B = [bij ]
A + B is an m n matrix, [aij + bij ]
matrix product: for m n matrix A = [aij ], nP p matrix B = [bjk ]
AB is an m p matrix with ikth entry nj=1 aij bjk
time to multiply two n n matrices:
O(n3 ) using the definition
O(n2.38 ) using theoretically efficient but impractical methods
in practice much faster than either bound, for sparse matrices
only store the nonzero elements and their position
only do work on nonzero elements
matrix multiplication is associative, but not necessarily commutative
AI = IA = A for every n n matrix A
(see also Handout# 65 for more background on matrices)
CSCI 5654
H. Gabow
Fall 2007
#55, p. 1
Matrix Relations
for A, B matrices of the same shape,
A B means aij bij for all entries
A < B means aij < bij for all entries
Linear Independence & Nonsingularity
P
a linear combination of vectors xi is the sum i ti xi , for some real numbers ti
if some ti is nonzero the combination is nontrivial
a set of vectors xi is linearly dependent if some nontrivial linear combination of xi equals 0
let A be an n n matrix
A is singular the columns of A are linearly dependent
some nonzero vector x satisfies Ax = 0
for every column vector b, Ax = b
has no solution or an infinite number of solutions
A is nonsingular
0
1
2
Proof.
0 = 1 :
1 solution:
n column vectors in Rn that are linearly independent span Rn ,
i.e., any vector is a linear combination of them
1 solution: Ax = Ax = b = A(x x ) = 0 = x = x
1 = 2 :
construct A1 column by column so AA1 = I
to show A1 A = I:
A(A1 A) = A, so deduce column by column that A1 A = I
2 = 0 :
Ax = 0 = x = A1 Ax = A1 0 = 0
CSCI 5654
H. Gabow
Fall 2007
#55, p. 2
x1 enters, x3 leaves
x2 enters, x4 leaves
x3 = 1 x1
x4 = 2 x1 x2
x1 = 1 x3
x4 = 1 x2 + x3
x1 = 1 x3
x2 = 1 + x3 x4
z=
z = 3 + x2 3x3
z = 4 2x3 x4
Optimum Dictionary
3x1 + x2
xB
0 0
1
=
2
1st Iteration
1 0
since we start with the basis of slacks, B =
, the identity matrix
0 1
thus all linear algebra is trivial
this is usually true in general for the first iteration
Entering Variable Step
yB = yI = y = cB = 0 0
1
1
0
td =
=
2
1
1
CSCI 5654
H. Gabow
Fall 2007
#56, p. 1
xB =
1
1
B = (1, 4)
2nd Iteration
Entering Variable Step
1 0
y
= 3 0 ,y= 3 0
1 1
0
trying x2 , 1 > 3 0
= 0 so it enters
1
Leaving Variable Step
1 0
0
0
d=
,d=
1 1
1
1
1
0
t
0
1
1
=
1
1
0
1
xB =
1
B = (1, 2)
3rd Iteration
Entering Variable Step
1 0
y
= 3 1 ,y= 2
1 1
CSCI 5654
H. Gabow
Fall 2007
#56, p. 2
Approach
a revised simplex iteration solves 2 systems, yB = cB , Bd = As
then replaces rth column of B by As
& solves similar systems for this new B
maintaining B in a factored form makes the systems easy to solve & maintain
also maintains sparsity, numerically stable
Bd = As = the next B matrix is BE, where E is an eta matrix with rth column = d
thus Bk = B E+1 E+2 . . . Ek1 Ek
()
()
CSCI 5654
H. Gabow
Fall 2007
#57, p. 1
CSCI 5654
H. Gabow
Fall 2007
#57, p. 2
Principles in Chv
atals Time Estimate
1. the eta columns of Ei have density between .25 & .5
in the steady state, i.e., after the slacks have left the basis
density is much lower before this
2. solving Bk d = As is twice as fast as yBk = cB
Argument:
cB is usually dense, but not As
we compute the solution di+1 to Ei+1 di+1 = di
di is expected to have the density given in #1
(since it could have been an eta column)
= if Ei+1 has eta column p, dip = 0 with probability .5
but when dip = 0, di+1 = di , i.e., no work done for Ei+1 !
3. in standard simplex, storing the entire dictionary can create core problems
also the sparse data structure for standard simplex is messy (e.g., Knuth, Vol. I)
dictionary must be accessed by row (pivot row) & column (entering column)
Product Form of the Inverse a commonly-used implementation of revised simplex
maintains B1
k = Ek Ek1 . . . E1
where Ei = eta matrix that specifies the ith pivot
Enhanced Triangular Factorization of the Basis (Chvatal, Ch. 24)
achieves even greater sparsity, halving the number of nonzeros
CSCI 5654
H. Gabow
Fall 2007
#57, p. 3
Data Structures
given data:
col 1
A
packed form
by column
c
dense form
b
dense form
col 2
col 3
col 4
basis:
xB
basis values
B
basis heading
eta file:
P1
Pm
L1
Lm
U1
Um
Ek
E+1
Lm Pm . . . L1 P1 B = UE+1 . . . Ek
(B is the current basis)
Entering Variable Step
Solve yB = cB :
1. solve zUE+1 . . . Ek = cB
dense copy
of cB
2. y = zLm Pm . . . L1 P1
dense
z
CSCI 5654
H. Gabow
Fall 2007
#58, p. 1
dense copy
of As
2. solve UE+1 . . . Ek d = a
dense
a
CSCI 5654
H. Gabow
Fall 2007
#58, p. 2
Remark.
to achieve a sparser triangular factorization,
we may permute the rows and columns of B to make it almost lower triangular form,
with a few spikes (Chvatal, p.9192)
CSCI 5654
H. Gabow
Fall 2007
#58, p. 3
Chvatal, Ch. 16
j=1 aij xj
bi , i = 1, . . . , m
unfortunately Step 3 is hard, & repeated applications of Step 2 can generate huge systems
but heres an example where Fourier-Motzkin works well:
consider the system
xi xj bij
write x1 s inequalities as
i, j = 1, . . . , n, i 6= j
x1 xj + b,
x1 xk + b
CSCI 5654
H. Gabow
Fall 2007
#59, p. 1
1 0 1 0
introducing slacks v1 , v2 , v3 gives coefficient matrix 0 1 0 1
0
1 0 0
0
0
1
x1
CSCI 5654
H. Gabow
Fall 2007
#59, p. 2
the vector
Pk
i
i=1 ti x
is a
CSCI 5654
H. Gabow
Fall 2007
#59, p. 3
xu
A x = b
ri , s j 0
since M & N are huge, we dont work with M explicitly instead solve M by column generation:
each Entering Variable Step solves the auxiliary problem A:
maximize c yA subject to A x = b , x u
where y is the vector of dual values for M, with its last component dropped
the solution to A will be either
a normal bfs (i.e., a vi ) which can enter Ms basis, or
a basic feasible direction of an unbounded solution (a wj ) which can enter Ms basis, or
a declaration of optimality
the decomposition algorithm works well when we can choose A , A so
A has few constraints, and either
A can be solved fast, e.g., a network problem, or
A breaks up into smaller independent LPs, so we can solve small auxiliary problems
CSCI 5654
H. Gabow
Fall 2007
#59, p. 4
Dual Problem D
maximize z = cx
minimize yb
subject to Ax b
x0
subject to yA c
y0
iB
dual dictionary
P
yj = cj + iB aij yi ,
P
w = z iB bi yi
jN
Proof of Theorem 1:
show that after each pivot
the 2 simplex algorithms have dictionaries corresponding as in (ii)
argument is straightforward
e.g., dual simplexs minimum ratio test is
minimize cs /ars , ars < 0
standard simplexs minimum ratio test on the corresponding dual dictionary is
minimize cs / ars , ars > 0
Proof of Theorem 2:
index the primal constraints and variables as follows:
C = the set of primal constraints (|C| = m)
D = the set of primal decision variables (i.e., the given variables; |D| = n)
Proof of (i):
primal constraints after introducing slacks:
xC
=b
I A
xD
define P = I A
xC consists of m slack variables indexed by C
xD consists of n decision variables indexed by D
dual constraints after introducing slacks:
CSCI 5654
H. Gabow
Fall 2007
#60, p. 1
A
=c
yD
I
yC
A
define Q =
I
yC : m decision variables
yD : n slack variables
in (i), B is a set of m indices of C D, N is the complementary set of n indices
for simplicity let B consist of
the first k indices of D and last m k indices of C
denote intersections by dropping the sign
e.g., BC denotes all indices in both B & C
we write P with its rows and columns labelled by the corresponding indices:
NC BC BD ND
Ik
0
B Y
P=
0 Imk X Z
NC
BC
B
Y
Z
X
Q=
Ik
0
0
Ink
NC
BC
BD
ND
CSCI 5654
H. Gabow
Fall 2007
#60, p. 2
Proof of (ii):
the primal dictionary for basis B is
1
xB = P1
B b PB PN xN
1
z = cB P1
B b + (cN cB PB PN )xN
Imk
0
B1 Y
Ink
1. now we check the terms aij in the 2 dictionaries (defined in (ii)) correspond:
in the primal dictionary
these terms
are P1
B PN
1
XB
Imk Ik Y
which equal
B1
0
0 Z
in the dual dictionary
theseterms are QB Q1
N
X Z B1 B1 Y
which equal
Ik 0
0
Ink
the 2 products are negatives of each other, as desired
2. now we check the objective values are negatives of each other:
the primal objective value is cB P1
B b
bN C
cB = 0mk cBD , b =
bBC
1
objective value = cBD B
0mk b = cBD B1 bN C
1
the dual objective value is cQ
N bN :
bN C
c = cBD cN D , bN =
0nk
1
B bN C
= cBD B1 bN C , negative of primal
objective value = c
0nk
CSCI 5654
H. Gabow
Fall 2007
#60, p. 3
Integral Polyhdera
Total Unimodularity
this property, of A alone, makes a polyhedron integral
a matrix A is totally unimodular if every square submatrix has determinant 0, 1
Examples of Totally Unimodular Matrices
1. the node-arc incidence matrix of a digraph
Proof Sketch:
let B be a square submatrix of the incidence matrix
induct on the size of B
Case 1: every column of B contains both a +1 & a 1
the rows of B sum to 0, so det(B) = 0
Case 2: some column of B contains only 1 nonzero
expand det(B) by this column and use inductive hypothesis
CSCI 5654
H. Gabow
Fall 2007
#61, p. 1
3. interval matrices 0,1 matrices where each column has all its 1s consecutive
Proof Idea:
proceed as above if row 1 contains 1 positive entry
otherwise, subtract the shorter column from the longer to reduce the total number of 1s
4. an amazing theorem of Seymour characterizes the totally unimodular matrices as being built up
from network matrices and 2 exceptional 5 5 matrices, using 9 types of operations
Theorem 1. A totally unimodular =
for every integral vector b, the polyhedron Ax b is integral.
Theorem 2. Let A be integral. A totally unimodular
for every integral b, the polyhedron Ax b, x 0 is integral
for every integral a, b, c, d, the polyhedron a Ax b, c x d is integral
can also allow components of these vectors to be
Proof Idea (this is the basic idea of total unimodularity):
suppose A is totally unimodular & b is integral
well show the polyhedron Ax b, x 0 is integral
for any basis B, the basic variables have values B1 b
B has determinant 1, by total unimodularity
note that some columns of B can be slacks
so B1 is an integral matrix
CSCI 5654
H. Gabow
Fall 2007
#61, p. 2
CSCI 5654
H. Gabow
Fall 2007
#61, p. 3
Initialization
in general we need a Phase 1 procedure for initialization
since a transshipment problem can be infeasible (e.g., no source-sink path)
the following Phase 1 procedure sometimes even speeds up Phase 2
(by breaking the network into smaller pieces)
a simple solution vector x is where 1 node w transships all goods:
all sources i send their goods to w, along edge iw if i 6= w
all sinks receive all their demand from w, along edge wi if i 6= w
this is feasible (even if w is a source or sink) if all the above edges exist in G
(since satisfying the constraints for all vertices except w implies satisfying ws constraint too)
in general, add every missing edge iw or wi as an
P artificial edge
then run a Phase 1 problem with objective t = {xwi , xiw : wi (iw) artificial}
there are 3 possibilities when Phase 1 halts with optimum objective t :
1. t > 0: the given transshipment problem is infeasible
2. t = 0 & no artificial edge is in the basis: proceed to Phase 2
3. t = 0 & an artificial edge is in the basis
we now show that in Case 3, the given problem decomposes into smaller subproblems
graph terminology:
V denotes the set of all vertices
for S V , edge ij enters S if i
/ S, j S
ij leaves S if i S, j
/S
Lemma 1. Let S be a set of vertices where
(a) no edge of G enters S;
P
(b) the total net demand in S ( iS bi ) equals 0.
Then any feasible solution (of the given transshipment problem)
has xe = 0 for every edge e leaving S.
Remark. (a) + the Lemmas conclusion show we can solve this network
by finding an optimum solution on S and an optimum solution on V S.
Proof.
(a) shows the demand in S must be satisfied by sources in S
(b) shows this exhausts all the supply in S, i.e., no goods can be shipped out
CSCI 5654
H. Gabow
Fall 2007
#62, p. 1
y i +1
artificial edge
yi
basic
nonbasic
Fig.1. y stays the same on an edge of G in T . It doesnt increase
on an edge of G T . Artificial edges increase by 1.
yu +1
yu
u
V-S
Fig.1 implies
vS
no edge of G enters S
no edge of G leaving S is in T
thus no goods enter or leave S
this implies the total net demand in S equals 0
now Lemma 1 applies, so the network decomposes into 2 smaller networks
Cycling
the network simplex never cycles, in practice
but Chvatal (p.303) gives a simple example of cycling
its easy to avoid cycling, as follows
suppose we always use the top-down procedure for computing y (fixing yr = 0)
its easy to see a pivot updates y as follows:
Fact. Suppose we pivot ij into the basis. Let cij = cij + yi yj . Let d be the deeper end of ij in
the new basis. Then y changes only on descendants of d. In fact
CSCI 5654
H. Gabow
Fall 2007
#62, p. 2
old 0
leave
old 0
candidate
new 0
candidate
new 0
candidate
old 0
non0
flip
enter
degenerate
(b)
(a)
no old 0
nondegenerate
(c)
CSCI 5654
H. Gabow
Fall 2007
#62, p. 3
Transportation Problem
special case of transshipment:
no transshipment nodes
every edge goes from a source to a sink
the setting for Hitchcocks early version of network simplex
assignment problem is a special case
we can extend the transshipment problem in 2 ways:
Bounded Sources & Sinks
generalize from demands = b to demands , b:
to model such demands add a dummy node d
route excess goods to d and satisfy shortfalls of goods from d:
d
<=b
>=b
d has demand
i bi ,
=b
=b
Bounded Edges
edges usually have a capacity, i.e., capacity uij means xij uij
an edge e with capacity u can be modelled by placing 2 nodes on e:
<=u
(a)
-u
(b)
u-x
(c)
CSCI 5654
H. Gabow
Fall 2007
#63, p. 1
3
s
1
1
t
5
CSCI 5654
H. Gabow
Fall 2007
#63, p. 2
2 1
Example. A =
1
1
(1) 2x21 2x1 x2 + x22 = x21 + (x1 x2 )2 > 0 for (x1 , x2 ) 6= (0, 0)
3 5
2
=
= the eigenvalues are (3 5)/2
(2) A
1 5
1 5
1 1
1 0
(3) A =
0
1 1 1
Proof.
(1)= (2):
suppose Ax = x
then xT Ax = xT x = kxk2 > 0 = > 0
(2)= (3):
A symmetric = it has n orthonormal eigenvectors, say xi , i = 1, . . . , n
form n n matrix Q, with ith column xi
form diagonal matrix with ith diagonal entry i , eigenvalue of xi .
then AQ = Q, A = QQT
since Q1 = QT
since each eigenvalue is positive, we can write = DD
this gives A = QD(QD)T
(3)= (1):
xT Ax = xT BT Bx = (Bx)T Bx = kBxk2 > 0
CSCI 5654
H. Gabow
Fall 2007
#64, p. 1
CSCI 5654
H. Gabow
Fall 2007
#64, p. 2
Karloff, Ch.5
Linear Algebra
the transpose of an m n matrix A is the n m matrix AT where ATij = Aji
(AB)T = BT AT
the L2 -norm kxk of x Rn is its length according to Pythagoras,
a unit vector has length 1
pPn
2
i=1 xi
P
the scalar product of 2 vectors x, y Rn is xT y = ni=1 xi yi
it equals kxkkyk cos(the angle between x & y)
Cauchy-Schwartz inequality: xT y kxkkyk
if y is a unit vector, the scalar product is the length of the projection of x onto y
x & y are orthogonal if their scalar product is 0
2 subspaces are orthogonal if every vector in one is orthogonal to every vector in the other
an m n matrix A has 2 associated subspaces of Rn :
the row space is the subspace spanned by the rows of A
the nullspace is the set of vectors x with Ax = 0
the row space & nullspace are orthogonal (by definition)
in fact theyre orthogonal complements:
any vector x Rn can be written uniquely as r + n
where r (n) is in the row space (nullspace)
and is called the projection of x onto the row space (nullspace)
Lemma 1. If the rows of A are linearly independent, the projection of any vector x onto the row
space of A is AT (AAT )1 Ax.
Proof.
AAT is nonsingular:
AAT y = 0 = kAT yk2 = (AT y)T AT y = yT AAT y = 0 = AT y = 0 = y = 0
the vector of the lemma is in the row space of A
its difference with x is in the null space of A:
A(x AT (AAT )1 Ax) = Ax Ax = 0
an affine space is the set v + V for some linear subspace V , i.e., all vectors v + x, x V
a ball B(v, r) in Rn is the set of all vectors within distance r of v
Lemma 2. Let F be the affine space v + V . The minimum of an arbitrary linear cost function cx
over B(v, r) F is achieved at v ru, where u is a unit vector along the projection of c onto V .
Proof.
take any vector x B(v, r) F
let cP be the projection of c onto V
CSCI 5654
H. Gabow
Fall 2007
#65, p. 1
Calculus
logarithms:
for all real x > 1, ln (1 + x) x
Lemma 3. Let x Rn be a vector with x > 0 and
Y
n
2
.
< 1. Then
ln
xj
1
j=1
Pn
j=1 xj
Exercise 1. Prove Lemma 3. Start by using the general fact that the geometric mean is at most
the arithmetic mean:
For any n 1 nonnegative numbers xj , j = 1, . . . , n,
Y
X
1/n
n
n
xj
xj /n
j=1
j=1
(This inequality is tight when all xj s are equal. It can be easily derived from Jensens Inequality
below.)
Qn
Upperbound j=1 1/xj using the above relation. Then write y = 1 x & substitute, getting terms
1/(1 yj ). Check that
terms can be expanded into a geometric series. Simplify
Pn |yj | <P1,n so those
2
using the values of j=1 yj , j=1 yj . (Use the latter to estimate all high order terms). At the end
take logs, & simplify using the above inequality for ln (1 + x).
P
Lemma 4. Let H be the hyperplane ni=1 xi = 1. Let be the subset of H where all coordinates
xi are nonnegative. Let g = (1/n, . . . , 1/n).
p
(i) Any point in is at distance at most Rp= (n 1)/n from g.
(ii) Any point of H within distance r = 1/ n(n 1) of g is in .
note that (1, 0, . . . , 0) n and is at distance R from g, since
(1 1/n)2 + (n 1)/n2 = (n 1)n/n2 = (n 1)/n = R2 .
hence (i) shows that the smallest circle circumscribed about n with center g has radius R
note that (1/(n 1), . . . , 1/(n 1), 0) n and is at distance r from g, since
(n 1)(1/(n 1) 1/n)2 + 1/n2 = 1/n2 (n 1) + 1/n2 = 1/n(n 1) = r 2
hence (ii) shows that the largest circle inscribed in n with center g has radius r
Exercise 2. Prove Lemma 4. First observe that the function (x 1/n)2 is concave up. (i) follows
from this fact. (ii) follows similarly the handy principle is known as Jensens Inequality:
If f (x) is concave up,
CSCI 5654
Pn
j=1 f (xj )
H. Gabow
Pn
nf ( j=1 xj /n).
Fall 2007
#65, p. 2
Handout#66 (Karloff, Ch.5)
[figure: the scheme (∗): the current point p is mapped by T_p, a linear function is minimized over a sphere about g giving s, and T_p^{-1}(s) is the next point]
this handout and the next two explain the basic ideas in (∗)
then we present the algorithm
Optimizing Over Spheres
it's easy to optimize a linear function over a spherical feasible region
by the method of steepest descent
[figure: minimizing 3x + 4y over the disk of radius 2 centered at the origin; the minimum is at (-6/5, -8/5), where the line 3x + 4y = -10 touches the disk]
[figure: the feasible region P with inscribed & circumscribed balls centered at x, and the minimizers s* & x*]
to make this precise suppose we're currently at point x in the feasible region P
let S (S*) be balls contained in (containing) P with center x
let the radius of S* be ν times that of S, ν ≥ 1
let x* (s, s*) have minimum cost in P (S, S*) respectively
Lemma 1. cs - cx* ≤ (1 - 1/ν)(cx - cx*).
Proof. s* = x + ν(s - x), hence
cx* ≥ cs* = cx + νc(s - x)
(ν - 1)cx + cx* ≥ νcs
(ν - 1)(cx - cx*) ≥ ν(cs - cx*)
dividing by ν gives the desired inequality
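e.g., Δ_n contains B(g, r) and is contained in B(g, R) with R/r = n - 1 (Lemma 4 of Handout#65); so if the feasible region were Δ_n itself and x = g, one spherical step would shrink the gap cx - cx* by the factor 1 - 1/(n - 1).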
Handout#67 (Karloff, Ch.5)
Simplices
[figure: the simplex Δ_3 in coordinates x_1, x_2, x_3, with center g and its inscribed & circumscribed circles]
sin(30°) = 1/2 = r/R, so ν = R/r = 2 for Δ_3
Handout#68 (Karloff, Ch.5)
Projective Transformations
[figure: the scheme (∗): p is mapped by T_p to g, a linear function is minimized over a sphere about g giving s, and T_p^{-1}(s) is the next point]
for p ∈ Δ_n with p > 0, the projective transformation T_p maps x ∈ Δ_n to y = T_p(x) with
y_j = (x_j/p_j) / Σ_{k=1}^n (x_k/p_k)
e.g., for n = 2 & p = (1/3, 2/3), T_p(x, y) = (3x, (3/2)y) / (3x + (3/2)y);
on Δ_2 (where y = 1 - x) this equals (2x/(x + 1), (1 - x)/(x + 1))
Properties of T_p
any vector y = T_p(x) belongs to Δ_n
since x ≥ 0 implies y ≥ 0
and the defining formula shows 1^T y = 1
T_p(p) = g since all its coordinates are equal
T_p restricted to Δ_n has an inverse since we can recover x:
x_j = y_j p_j / Σ_{k=1}^n y_k p_k
(the definition of T_p implies x_j is proportional to y_j p_j,
where the constant of proportionality is chosen to make 1^T x = 1)
Vector notation:
define D to be the n × n diagonal matrix with p along the diagonal
i.e., D_jj = p_j
the above formulas become
x = Dy / (1^T Dy)   and   y = D^{-1}x / (1^T D^{-1}x)
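a small numpy sketch of T_p & its inverse (illustration only; the p & x below are made up):

    import numpy as np

    def T(p, x):
        # projective transformation T_p: y_j = (x_j/p_j) / sum_k (x_k/p_k)
        w = x / p
        return w / w.sum()

    def T_inv(p, y):
        # inverse on the simplex: x_j = y_j p_j / sum_k y_k p_k
        w = y * p
        return w / w.sum()

    p = np.array([0.2, 0.3, 0.5])            # a point of the simplex, p > 0 (made up)
    x = np.array([0.1, 0.6, 0.3])
    g = np.full(3, 1/3)

    assert np.allclose(T(p, p), g)           # T_p(p) = g
    assert np.isclose(T(p, x).sum(), 1)      # T_p(x) stays on the simplex
    assert np.allclose(T_inv(p, T(p, x)), x) # round trip recovers x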
our plan (∗) is to find s by minimizing a linear cost function over a ball inscribed in P' = T_p(P)
good news: P' is well-rounded
bad news: the cost function in the transformed space is nonlinear,
cx = cDy/(1^T Dy) = c'y/(1^T Dy)
solution: exercising some care, we can ignore the denominator
and minimize c'y, the pseudocost, in transformed space!
Caution. because of the nonlinearity,
the sequence of points p generated by Karmarkar's algorithm need not have cost cp decreasing:
a point p' may have larger cost than the previous point p
progress is measured by a potential function instead of the cost:
f(x) = ln ((cx)^n / Π_{j=1}^n x_j),  and in transformed space  f_p(y) = ln ((c'y)^n / Π_{j=1}^n y_j)
Lemma 1. Let x ∈ Δ_n, x > 0, and y = T_p(x). Then c'y = cx/(1^T D^{-1}x), so cx & c'y have the
same sign; furthermore f_p(y) = f(x) + ln Π_{j=1}^n p_j.
Proof.
the pseudocost is c'y = cD D^{-1}x/(1^T D^{-1}x) = cx/(1^T D^{-1}x)
the denominator of f_p(y) is Π_{j=1}^n y_j = Π_{j=1}^n (x_j/p_j) / (1^T D^{-1}x)^n
so (c'y)^n / Π_j y_j = (cx)^n / Π_{j=1}^n (x_j/p_j) = [(cx)^n / Π_j x_j] Π_j p_j
in the scheme (∗) we will choose s so f_p(s) ≤ f_p(g) - δ, for some positive constant δ
the Lemma shows we get f(p') ≤ f(p) - δ
thus each step of the algorithm decreases f(p) by δ
and we push the potential to -∞ as planned
Handout#69 (Karloff, Ch.5)
Karmarkar's Algorithm
L = Σ { log |r| : r a nonzero entry in A or c }   (see Handout#...)
Karmarkar's Algorithm
Initialization
Set p = (1/n, . . . , 1/n) = g. If cp = 0 then return p.
Let δ > 0 be a constant determined below (Handout#70, Lemma 5). Set N = 2nL/δ.
Main Loop
Repeat the Advance Step N times (unless it returns).
Then go to the Rounding Step.
Advance Step
Advance from p to the next point p', using an implementation of (∗).
If cp' = 0 then return p'.
Set p = p'.
Rounding Step
Move from p to a vertex v of no greater cost. (Use the exercise of Handout#23.)
If cv = 0 then return v, else report that z > 0.
a valid implementation of (∗) has these properties:
assume z = 0, p ∈ P, p > 0 & cp > 0
then p' ∈ P, p' > 0, and either cp' = 0 or f(p') ≤ f(p) - δ
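for concreteness, a numpy sketch of one advance step; this is only an illustration, assuming Karmarkar standard form (min cx s.t. Ax = 0, 1^T x = 1, x ≥ 0) and the step length α/n used in Handout#70:

    import numpy as np

    def advance(A, c, p, alpha=0.25):
        # one step of the scheme (*): transform by T_p, step from g against c_P, map back
        n = len(p)
        D = np.diag(p)
        B = np.vstack([A @ D, np.ones(n)])           # constraints of the transformed problem
        c1 = c @ D                                    # pseudocost c' = cD
        # c_P = projection of c' onto the nullspace of B
        c_P = c1 - B.T @ np.linalg.lstsq(B.T, c1, rcond=None)[0]
        # c_P != 0 when z = 0 & cp > 0 (Handout#70, Lemma 3)
        g = np.full(n, 1.0 / n)
        s = g - (alpha / n) * c_P / np.linalg.norm(c_P)   # stays well inside the simplex
        x = D @ s
        return x / x.sum()                            # p' = T_p^{-1}(s)

repeating this N times and then rounding to a vertex gives the Main Loop above.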
Lemma 1. A valid implementation of (∗) ensures the algorithm is correct.
Proof. we can assume the Main Loop repeats N times
we start at potential value f(g) = ln ((c1/n)^n / (1/n)^n) = n ln Σ_{i=1}^n c_i ≤ nL
the last inequality follows since if C is the largest cost coefficient,
ln Σ_{i=1}^n c_i ≤ ln (nC) = ln n + ln C ≤ L
each repetition decreases the potential by δ
so the Main Loop ends with a point p of potential ≤ nL - Nδ ≤ -nL
thus ln (cp)^n < f(p) ≤ -nL, so cp < e^{-L} < 2^{-L}
so the Rounding Step finds a vertex v of cost cv < 2^{-L}
cv is a rational number with denominator < 2^L (by the exercise of Handout#25)
so cv > 0 ⇒ cv > 1/2^L
thus cv = 0
the Advance Step works in the transformed space with the constraint matrix
    B = ( AD  )
        ( 1^T )
c_P denotes the projection of the pseudocost c' = cD onto the nullspace of B (cf. Handout#70, Lemma 3).
Handout#70 (Karloff, Ch.5)
Lemma 3. z = 0 ⇒ c_P ≠ 0.
Proof.
suppose c_P = 0
then the formula for c_P shows c' is in the rowspace of B
∴ c' is orthogonal to every vector in the nullspace of B
take any q ∈ P
thus T_p(q) ∈ P', and T_p(q) - T_p(p) is in the nullspace of B
so c'(T_p(q) - T_p(p)) = 0
recalling from Handout#68, Lemma 1 how cost c transforms to c',
cp > 0 ⇒ c'T_p(p) > 0
thus c'T_p(q) > 0, and so cq > 0
equivalently, z > 0
Lemma 5. Choosing α as an arbitrary real value in (0, 1/2) and δ = α - α²/(1 - α) gives a valid
implementation of (∗).
Proof.
the first 2 requirements for validity are clear:
p' ∈ P since s ∈ P' (we're using Lemma 4 of Handout#65 & Lemma 1 of Handout#68!)
p' > 0 (since α < 1, each coordinate s_j is positive)
for the 3rd requirement, assume z = 0, cp > 0
we must show f(p') ≤ f(p) - δ
from Handout#68, p.2 this means f_p(s) ≤ f_p(g) - δ
by definition f_p(y) = ln ((c'y)^n / Π_j y_j)
this gives f_p(s) - f_p(g) = ln (c's/c'g)^n - ln Π_{j=1}^n (n s_j)
the 1st term is ≤ -α:
ln (c's/c'g)^n < ln (1 - α/n)^n = n ln (1 - α/n) ≤ -α
(the first inequality is Lemma 4; the last uses ln (1 + x) ≤ x)
the 2nd term is ≤ α²/(1 - α):
apply Lemma 3 of Handout#65 to the vector x = ns:
ns > 0
Σ_{j=1}^n n s_j = n (since Σ_j s_j = 1)
‖1 - ns‖ = n‖(1/n)1 - s‖ = nα/n = α < 1
conclude ln Π_{j=1}^n (n s_j) ≥ -α²/(1 - α), i.e., -ln Π_j (n s_j) ≤ α²/(1 - α)
combining the 2 estimates, f_p(s) - f_p(g) ≤ -α + α²/(1 - α)
the lemma chooses δ as the negative of the r.h.s., giving the desired inequality
furthermore choosing α < 1/2 makes δ > 0
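e.g., α = 1/4 gives δ = 1/4 - (1/16)/(3/4) = 1/4 - 1/12 = 1/6, so the Main Loop of Handout#69 runs N = 2nL/δ = 12nL times.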
Theorem. Karmarkar's algorithm solves an LP in polynomial time, assuming all arithmetic operations are carried out exactly.
Proof.
the Main Loop repeats O(nL) times
each repetition performs all matrix calculations in O(n³) arithmetic operations
including taking a square root to calculate ‖c_P‖ in (∗)
so we execute O(n⁴L) arithmetic operations
(Vaidya (STOC '90) reduces this to O(n³L))
Handout#71 (Karloff, Ch.5)
Exercise 1:
Lemma 3. Let x ∈ R^n be a vector with x > 0, Σ_{j=1}^n x_j = n, and ‖1 - x‖ = β < 1. Then
ln Π_{j=1}^n (1/x_j) ≤ β²/(1 - β).
Proof.
since the geometric mean is at most the arithmetic mean,
Π_{j=1}^n 1/x_j ≤ [Σ_{j=1}^n (1/x_j)/n]^n
let y = 1 - x
so the r.h.s. becomes [Σ_{j=1}^n (1/(1 - y_j))/n]^n
we will upperbound the sum in this expression, using these properties of y_j:
Σ_j y_j = 0
‖y‖ = β
for each j, |y_j| < 1 (since |y_j| ≤ ‖y‖ = β < 1)
so the sum is
Σ_{j=1}^n 1/(1 - y_j) = Σ_{j=1}^n (1 + y_j + y_j² + y_j³ + y_j⁴ + . . .)
= Σ_{j=1}^n (1 + y_j) + Σ_{j=1}^n (y_j² + y_j³ + y_j⁴ + . . .)
= n + 0 + Σ_{j=1}^n y_j² (1 + y_j + y_j² + . . .)
≤ n + Σ_{j=1}^n y_j² (1 + β + β² + . . .)
= n + Σ_{j=1}^n y_j²/(1 - β)
= n + ‖y‖²/(1 - β)
= n + β²/(1 - β)
we have shown Π_{j=1}^n 1/x_j ≤ [1 + β²/(n(1 - β))]^n
taking logs,
ln Π_{j=1}^n 1/x_j ≤ n ln [1 + β²/(n(1 - β))] ≤ n β²/(n(1 - β)) = β²/(1 - β)
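a quick numpy check of Lemma 3 on random vectors (illustration only):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    for _ in range(1000):
        x = 1 + 0.3 * rng.uniform(-1, 1, n)      # perturb the all-ones vector
        x *= n / x.sum()                          # enforce sum(x) = n
        beta = np.linalg.norm(1 - x)
        if x.min() > 0 and beta < 1:
            assert np.log(np.prod(1 / x)) <= beta**2 / (1 - beta) + 1e-12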
Exercise 2:
Lemma 4. Let H be the hyperplane Σ_{i=1}^n x_i = 1. Let Δ_n be the subset of H where all coordinates
x_i are nonnegative. Let g = (1/n, . . . , 1/n).
(i) Any point in Δ_n is at distance at most R = √((n - 1)/n) from g.
(ii) Any point of H within distance r = 1/√(n(n - 1)) of g is in Δ_n.
Proof.
(i) take any point x ∈ Δ_n
we show its distance from g is ≤ R by starting at x,
moving away from g, and eventually reaching a corner point like (1, 0, . . . , 0),
which we've seen is at distance R
wlog let x1 be the maximum coordinate xj
choose any positive xj , j > 1
increase x1 by xj and decrease xj to 0
we stay on H,
this increases the distance from g, since (x - 1/n)² is concave up
repeat this until the corner point (1, 0, . . . , 0) is reached
(ii) it suffices to show any point x ∈ H with (say) x_n < 0 is at distance > r from g
a point of H with x_n < 0 has Σ_{j=1}^{n-1} x_j > 1
to minimize Σ_{j=1}^{n-1} (x_j - 1/n)², set all coordinates equal (by Jensen)
so the minimum sum is > (n - 1)(1/(n - 1) - 1/n)²
this implies the distance to g satisfies
Σ_{j=1}^n (x_j - 1/n)² > (n - 1)(1/(n - 1) - 1/n)² + 1/n² = r²
Handout#72
source: Primal-dual Interior-point Methods by S.J. Wright, SIAM, Philadelphia PA, 1997.
the breakthrough papers:
L.G. Khachiyan, A polynomial algorithm in linear programming, Soviet Math. Doklady, 1979
N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica, 1984
after Karmarkar other interior-point methods were discovered,
making both theoretical and practical improvements
Primal-dual Interior-point Methods
consider an LP
minimize cx
subject to Ax = b
x ≥ 0
& its dual,
maximize yb
subject to yA + s = c
s ≥ 0
we find optimum primal and dual solutions by solving the system (∗) of Handout#19, p.2:
(∗)   Ax = b
      yA + s = c
      x_j s_j = 0,   j = 1, . . . , n
      x, s ≥ 0
Approach
apply Newton's method to the 3 equations of (∗)
modified so that positivity (x, s > 0) always holds
this keeps us in the interior and avoids negative values!
the complementary slackness measure is μ = Σ_{j=1}^n x_j s_j
(potential-reduction variants also track the logarithmic quantity Σ_{j=1}^n ln (x_j s_j))
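a minimal numpy sketch of one such damped Newton step, with the complementarity equations relaxed to x_j s_j = σμ/n (a textbook-style illustration, not Wright's or CPLEX's implementation; σ is a centering parameter and the dense solve is a simplification):

    import numpy as np

    def newton_step(A, b, c, x, y, s, sigma=0.1):
        # one interior-point step on Ax=b, A^T y + s = c, x_j s_j = sigma*mu/n,
        # damped so that x, s stay strictly positive (y stored as a column vector)
        m, n = A.shape
        mu = x @ s                                   # complementary slackness measure
        r1 = A @ x - b
        r2 = A.T @ y + s - c
        r3 = x * s - sigma * mu / n
        # Jacobian of the system in the variables (x, y, s)
        J = np.block([
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.zeros((n, n)), A.T,              np.eye(n)       ],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)      ],
        ])
        d = np.linalg.solve(J, -np.concatenate([r1, r2, r3]))
        dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
        # step length keeping x, s > 0 (stay in the interior)
        t = 1.0
        for v, dv in ((x, dx), (s, ds)):
            neg = dv < 0
            if neg.any():
                t = min(t, 0.9 * np.min(-v[neg] / dv[neg]))
        return x + t*dx, y + t*dy, s + t*ds

iterating while shrinking σ drives μ toward 0, following the central path described next.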
H. Gabow
Fall 2007
#72, p. 1
Path-following methods
the central path of the feasible region is a path (x_t, y_t, s_t) (t is a parameter > 0)
satisfying (∗) with the 3rd constraint replaced by
x_j s_j = t,   j = 1, . . . , n
clearly this implies positivity
as t approaches 0 we approach the desired solution
we can take bigger Newton steps along the central path
predictor-corrector methods alternate between 2 types of steps:
(a) a predictor step: a pure Newton step, reducing μ
(b) a corrector step: moves back closer to the central path
Mehrotra's predictor-corrector algorithm is the basis of most current interior point codes,
e.g., CPLEX
Infeasible-interior-point methods can start without a feasible interior point
extensions of these methods solve semidefinite programming (Handouts#41, 44)
& (convex) quadratic programming,
minimize c^T x + x^T Qx
subject to Ax = b
x ≥ 0
where Q is symmetric positive semidefinite (Handouts#42, 43)
Handout#73 (MC, 16.1-2, 16.5)
for an LP min cx s.t. Ax ≥ b, x ≥ 0 & its dual max yb s.t. yA ≤ c, y ≥ 0, the optimality
conditions (feasibility & complementary slackness) form a linear complementarity problem (LCP)
in the primal slacks s = Ax - b and dual slacks t = c^T - A^T y:
    ( s )   (  0    A ) ( y )   ( -b  )
    ( t ) = ( -A^T  0 ) ( x ) + (  c^T )
    ( s )^T ( y )
    ( t )   ( x ) = 0
    s, t, x, y ≥ 0
CSCI 5654
H. Gabow
Fall 2007
#73, p. 1
                       COLUMN chickens out    COLUMN doesn't
ROW chickens out             0, 0                 -1, 1
ROW doesn't                  1, -1                -2, -2
(in each cell the first payoff is ROW's, the second COLUMN's)
Example. Chicken has 2 pure Nash points: ROW always chickens out & COLUMN never does, and vice
versa. Also, both players choosing randomly with probability 1/2 is a (mixed) Nash point
(against the (1/2, 1/2) mix each pure strategy has expected payoff -2(1/2) + 1(1/2) = -1/2 = -1(1/2) + 0(1/2), so each player is indifferent)
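a quick numpy check of the mixed Nash point (illustration only; A & B are ROW's & COLUMN's payoff matrices from the table above):

    import numpy as np

    A = np.array([[ 0, -1],     # ROW's payoffs (rows: ROW chickens out / doesn't)
                  [ 1, -2]])
    B = np.array([[ 0,  1],     # COLUMN's payoffs
                  [-1, -2]])
    x = y = np.array([0.5, 0.5])

    # against y, every pure ROW strategy earns at most x's payoff (and similarly for COLUMN)
    assert np.all(A @ y <= x @ A @ y + 1e-12)
    assert np.all(B.T @ x <= x @ B @ y + 1e-12)
    print("expected payoffs:", x @ A @ y, x @ B @ y)   # both -1/2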
Fact. Any bimatrix game has a (stochastic) Nash point.
Theorem. The Nash equilibria correspond to the solutions of an LCP.
Proof.
1. can assume all entries of A & B are > 0
Proof. for any A', distributivity shows x^T (A + A') y = x^T A y + x^T A' y
any stochastic x, y have x^T 1 y = 1
for 1 an m × n matrix of 1s
so we can increase A by a large multiple of 1,
without changing the Nash equilibria, to make every coefficient positive
2. a Nash equilibrium gives (after rescaling x & y) a solution of the LCP
    Ay + u = 1
    B^T x + v = 1
    ( u )^T ( x )
    ( v )   ( y ) = 0,   i.e., u^T x = v^T y = 0
    x, u, y, v ≥ 0
(here 1 denotes a vector of all 1s)
3. conversely let x, y be any solution to the LCP
make them stochastic vectors:
x' = x/(1^T x),   y' = y/(1^T y)
these vectors form a Nash equilibrium point, by an argument similar to #2
e.g., we know Ay ≤ 1
also x^T A y = 1^T x, by the complementarity u^T x = 0
thus x'^T A y = 1, Ay ≤ (x'^T A y) 1, Ay' ≤ (x'^T A y') 1
the last inequality implies ROW's Nash condition
Handout#74 (V 23.2)
Examples.
for simplicity we'll use 1-dimensional primals
1. Primal: min x²/2 s.t. x ≥ 1
Q = (1), PD
the optimum solution is x = 1, objective = 1/2
Dual: max -z²/2 + y s.t. y - z ≤ 0, y ≥ 0
optimum solution y = 1, z = 1, objective = -1/2 + 1 = 1/2
Proof this dual solution is optimum:
z ≥ y ≥ 0 ⇒ z² ≥ y², z²/2 ≥ y²/2
∴ objective ≤ -y²/2 + y = y(1 - y/2)
this quadratic achieves its maximum midway between the 2 roots, i.e., y = 1
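a quick numeric sanity check of this weak-duality argument (illustration only): sampled feasible dual points never beat the primal optimum 1/2.

    import numpy as np

    rng = np.random.default_rng(2)
    best = -np.inf
    for _ in range(10000):
        y = rng.uniform(0, 5)
        z = y + rng.uniform(0, 5)        # feasibility: z >= y >= 0
        best = max(best, -z**2/2 + y)
    print(best)                           # approaches but never exceeds 1/2
    assert best <= 0.5 + 1e-12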
Handout#75 (V, p.404)
Losing PSD
1. If Q is not PSD, a point satisfying KKT need not be optimum. In fact every vertex can be a
local min! No good QP algorithms are known for this general case.
2. this isn't surprising: QP is NP-hard
integer QP is undecidable!
we give an example of a QP with Q PD that has a unique optimum,
but flipping Q's sign makes every vertex a local min!
Q PD:
min Σ_{j=1}^n x_j(1 + x_j) + Σ_{j=1}^n ε_j x_j   s.t. 0 ≤ x_j ≤ 1, j = 1, . . . , n
assume the ε's are small, specifically for each j, |ε_j| < 1
going to standard form, A = -I, b = (-1, . . . , -1)^T, Q = 2I
the dual KKT constraint A^T y - Qx ≤ c^T is
-y_j - 2x_j ≤ 1 + ε_j,   j = 1, . . . , n
the KKT CS constraints are
x_j > 0 ⇒ -y_j - 2x_j = 1 + ε_j, i.e., 2x_j = -1 - ε_j - y_j
y_j > 0 ⇒ x_j = 1
the first CS constraint implies we must have x_j = 0 for all j
(since our assumption on ε_j implies -1 - ε_j - y_j < -y_j ≤ 0)
taking x = y = 0 satisfies all KKT conditions, so the origin is the unique optimum
(as expected)
Q ND:
flipping Q's sign is disastrous: we get an exponential number of solutions to KKT!
flip the sign of the quadratic term in the objective function, x_j(1 + x_j) → x_j(1 - x_j)
now Q = -2I but the other matrices & vectors are unchanged
the dual KKT constraint gets a sign flipped,
-y_j + 2x_j ≤ 1 + ε_j,   j = 1, . . . , n
& the KKT CS constraints change only in that sign:
x_j > 0 ⇒ -y_j + 2x_j = 1 + ε_j, i.e., 2x_j = 1 + ε_j + y_j
y_j > 0 ⇒ x_j = 1
the new KKT system has 3^n solutions:
there are 3 possibilities for each j,
x_j = y_j = 0
x_j = 1, y_j = 1 - ε_j
x_j = (1 + ε_j)/2, y_j = 0
it's easy to check all 3 satisfy all KKT conditions
note the 3rd alternative is the only possibility allowed by CS when 0 < x_j < 1
the feasible region has 2^n vertices, each is a KKT solution,
and it can be shown that each vertex is a local minimum
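a small numpy check of the 3^n count, for a hypothetical n = 2 & ε = (0.1, -0.2) (illustration only; is_kkt encodes the conditions derived above):

    import numpy as np
    from itertools import product

    eps = np.array([0.1, -0.2])        # made-up small epsilons, |eps_j| < 1
    n = len(eps)

    def is_kkt(x, y):
        # KKT conditions for  min sum x_j(1 - x_j) + sum eps_j x_j  s.t. 0 <= x <= 1
        slack = (1 + eps) - (-y + 2 * x)               # dual constraint -y_j + 2x_j <= 1 + eps_j
        return (x.min() >= 0 and x.max() <= 1 and y.min() >= 0 and slack.min() >= -1e-12
                and np.allclose(x * slack, 0)          # CS: x_j > 0  =>  dual constraint tight
                and np.allclose(y * (1 - x), 0))       # CS: y_j > 0  =>  x_j = 1

    # the 3 alternatives (x_j, y_j) for each coordinate j
    alts = [[(0.0, 0.0), (1.0, 1 - e), ((1 + e) / 2, 0.0)] for e in eps]
    solutions = [combo for combo in product(*alts) if is_kkt(*map(np.array, zip(*combo)))]
    print(len(solutions))               # 3**n = 9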