Notes

Download as pdf or txt
Download as pdf or txt
You are on page 1of 220

CSCI 5654 Handout List

Required reading is from Chvatal. Optional reading is from these reference books:
A: AMPL
K: Karloff
M: Murtys LP book
MC: Murtys LP/CO book
S: Schrijver
V: Vanderbei
Vz: Approximation Algorithms by Vijay Vazirani, Springer 2001.
WV: Winston & Venkataramanan
Handout Reading
#
from C

Handout Title

Course Fact Sheet


Unit 1: Overview

1
2
3
4

3-9
213-223 (M Ch.1)

Linear Programming
Standard Form
LP Objective Functions
Complexity of LP & Related Problems

Unit 2: Basic Simplex Algorithm and


Fundamental Theorem of LP
5
6
7
8
9
10
11
12
13
14

13-23
17-19
27-33

23-25
250-255, 260-261
33-37, 258-260
37-38
39-42
42-43

The Simplex Algorithm: Example


Dictionaries & LP Solutions
The Simplex Algorithm: Conceptual Version
Correctness of the Simplex Algorithm
The Simplex Algorithm with Tableaus
Introduction to Geometry of LP
Avoiding Cycling: Lexicographic Method
Pivot Rules & Avoiding Cycling
The Two-Phase Method
The Fundamental Theorem of LP
Unit 3: Duality

15
16
17
18
19
20
21

54-57
57-59
57-59, 261-262
60-62
62-65
65-68
137-143

The Dual Problem & Weak Duality


Dictionaries are Linear Combinations
Strong Duality Theorem
Why Dual?
Complementary Slackness
Dual Variables are Prices in Economics
Allowing Equations & Free Variables

22

Ch.15

More Applications
Duality in Game Theory

CSCI 5654

H. Gabow

Fall 2007

#i, p. 1

Unit 4: Efficient Implementation of Simplex Algorithm


23
24
25
26

97-100
100-105
Ch.6
105-111

Matrix Representations of LPs


Revised Simplex Algorithm: High Level
Review of Gaussian Elimination
Solving Eta Systems

27
28

195-200, 207-211
201-207

More Applications
The Cutting Stock Problem
Branch-and-Bound Algorithms

Unit 5: Extensions of Theory and Algorithms


29
30
31
32
33
34
35
36

37
38

118-119
119-129, 132-133
130-132, 133-134
242-243
143-146
Ex.16.10
152-157
158-162
162-166

(WV 9.8)
262-269

General Form LPs


Upper Bounding
Generalized Fundamental Theorem
Inconsistent Systems of Linear Inequalities
Theorems of Alternatives
Dual Simplex Algorithm
Sensitivity Analysis
Parametric LPs
More Applications
Cutting Planes for ILP
Applications to Geometry
Unit 6: Network Algorithms

39
40

291-295
296-303

The Transshipment Problem


Network Simplex Algorithm
Unit 7: Polynomial-Time LP

41

443-452, (K Ch.4)

Overview of the Ellipsoid Algorithm


Unit 8: Beyond Linearity

42
43

(MC 16.4.4V 23.1) Quadratic Programming Examples


(MC, 16.4.4)
Solving Quadratic Programs

44

(Vz, Ch.26)

CSCI 5654

More Applications
Semidefinite Programming: Approximating MAX CUT

H. Gabow

Fall 2007

#i, p. 2

Unit 9: Supplemental Material

9.A. Overview
45
46
(M Ch.1)

More LP & ILP Examples


Multiobjective Functions

9.B. Fundamentals
47
48
47-49, 255-258
49
37-38

Geometric View of the Simplex Algorithm: Proofs


Inefficient Pivot Sequences
Stalling in Blands Rule

9.C. Duality
50
51
52
53
261-262, (S 7.5)
54

Combinatoric Example of Weak Duality


Polyhedral Combinatorics
Duality & NP-Completeness
Geometric Interpretation of Duality
Illustrating Duality by Knapsack LPs

9.D. Implementation
55
79-84
56
100-105
57
105-115
58

Review of Matrices
Revised Simplex Example
Eta Factorization of the Basis
Revised Simplex Summary

9.E. Extensions
59
Ch.16
60
149-152

Solving Systems of Linear Inequalities


Primal-Dual Simplex Correspondence

9.F. Networks
61
(S Ch.16, 19, 22)
62
303-317
63
320-322

Integral Polyhdera
Initialization, Cycling, Implementation
Related Network Problems

9.G. Polynomial LP
64
65
(K Ch.5)
66

67

68

69

70

71

72

Positive Definite Matrices


Background Facts for Karmarkars Algorithm
Optimizing Over Round Regions
Simplices
Projective Transformations
Karmarkars Algorithm
Analysis of Karmarkars Algorithm
Exercises for Karmarkars Algorithm
Primal-Dual Interior Point Methods

9.H. Beyond Linearity


73
(MC, 16.1-2,16.5)
74
(V, 23.2)
75
(V, p.404)

Linear Complementarity Problems


Quadratic Programming Duality
Losing PSD

CSCI 5654

H. Gabow

Fall 2007

#i, p. 3

Course Fact Sheet for CSCI 5654: Linear Programming


Time

M W 4:00 5:15

MUEN E118 or ECCS 1B12 (CAETE)

Instructor

Hal Gabow
Office Phone 492-6862

Office Hours

M 2:003:50, 5:15 6:00; W 3:003:50; or by appointment


These times may change slightly see my home page for the authoritative list.
After class is always good for questions.

Grader

TBA

Prerequisites

Undergraduate courses in linear algebra & data structures


e.g., MATH 3130 & CSCI 2270, or their equivalents

Requirements

Weeks 19: Written problem sets, sometimes using LP solvers


Takehome exam, worth 2 HWs
Weeks 1014: multipart reading project
Week 1516: Takehome final, worth 33.5 HWs.
Due Fri. Dec. 14 (day before official final)
Final grades are assigned on a curve. In Fall 05 the grades were:
Campus: 11 As 2 Bs 2 Cs 1 F
CATECS: 1 A
1B

Office ECOT 624


Mailbox ECOT 717
Email [email protected]

Pretaped class Mon. Oct. 22 (FOCS Conference).


The pretaping is Wed. Oct. 17, 5:30-6:45, same room.
Additional cancellations/pretaping sessions possible.
Homework
Most assignments will be 1 week long, due on a Wednesday. That Wednesday the current assignment
is turned in, a solution sheet is distributed in class & the next assignment is posted on the class
website. The due date for CAETE students who do not attend live class is 1 week later (see the
email message to CAETE students). A few assignments may be 1 21 2 weeks & the schedule will
be modified accordingly.
Writeups must be legible. Computer-typeset writeups are great but are not required. If youre
handwriting an assignment use lined paper and leave lots of space (for legibility, and comments by
the grader). Illegible homework may be returned with no credit.
Grading: Each assignment is worth 100 points. Some assignments will have an extra credit problem
worth 20 points. The extra credit is more difficult than the main assignment and is meant for
students who want to go deeper into the subject. Extra credit points are recorded separately as
HECs. Final grades are based on the main assignments and are only marginally influenced by
HECs or other ECs.
In each class you can get 1 VEC (verbal extra credit) for verbal participation.
Collaboration Policy: Homework should be your own. There will be no need for collaborative
problem solving in this course. Note this policy is different from CSCI 5454. If theres any doubt
CSCI 5654

H. Gabow

Fall 2007

#0, p. 1

ask me. You will get a 0 for copying even part of an assignment from any source; a second violation
will result in failing the course.
Anyone using any of my solution sheets to do homework will receive an automatic F in the course.
(One student wound up in jail for doing this.)
All students are required to know and obey the University of Colorado Student Honor Code, posted
at http://www.colorado.edu/academics/honorcode. The class website has this link posted under Campus Rules, which also has links to policies on classroom behavior, religious observances
and student disability services. Each assignment must include a statement that the honor code was
obeyed see directions on the class website. I used to lower grades because of honesty issues, but
the Honor Code is working and I havent had to do this in a while.
Late homeworks: Homeworks are due in class on the due date. Late submissions will not be
accepted. Exceptions will be made only if arrangements are made with me 1 week in advance.
When this is impossible (e.g., medical reasons) documentation will be required (e.g., physicians
note).
Ignorance of these rules is not an excuse.
Website & Email
The class website is http://www.cs.colorado.edu/e hal/CS5654/home.html. It is the main form
of communication, aside from class. Assignments, current HW point totals and other course materials will be posted there.
You can email me questions on homework assignments.
the class website. Ill send email to the class indicating
clarifications of the homework assignments, missing office
class will be limited to pointers to the website. I check my

Ill post any needed clarifications on


new information on the website, e.g.,
hours, etc. Probably my email to the
email until 9:30 PM most nights.

Inappropriate email: Email is great for clarifying homework assignments. I try to answer all email
questions. But sometimes students misuse email and ask me to basically do their work for them.
Dont do this.
Text Linear Programming
by Vasek Chvatal, W.H. Freeman and Co., New York, 1984
References On reserve in Lester Math-Physics Library, or available from me.
Background in Linear Algebra:
Linear Algebra and its Applications
by G. Strang, Harcourt College Publishers, 1988 (3rd Edition)
Similar to Chv
atal:
Linear Programming: Foundations and Extensions
by Robert J. Vanderbei, Kluwer Academic Publishers, 2001 (2nd Edition)
(optional 2nd Text)
Linear Programming
by K.G. Murty, Wiley & Sons, 1983 (revised)
CSCI 5654

H. Gabow

Fall 2007

#0, p. 2

Linear and Combinatorial Programming


by K.G. Murty, Wiley & Sons, 1976
Introduction to Mathematical Programming, 4th Edition
by W.L. Winston, M. Venkataramanan, Thomson, 2003
Linear Programming
by H. Karloff, Birkhauser, 1991
More Theoretical:
Theory of Linear and Integer Programming
by A. Schrijver, John Wiley, 1986
Integer and Combinatorial Programming
by G.L. Nemhauser and L.A. Wolsey, John Wiley, 1988
Geometric Algorithms and Combinatorial Optimization
by M. Gr
otschel, L. Lovasz and A. Schrijver, Springer-Verlag, 1988
Using LP:
AMPL: A Modeling Language for Mathematical Programming
by R.Fourer, D.M.Gay, and B.W.Kernighan, Boyd & Fraser, 1993
Course Goals
Linear Programming is one of the most successful models for optimization, in terms of both realworld computing and theoretical applications to CS, mathematics, economics & operations research.
This course is an introduction to the major techniques for LP and the related theory, as well as
touching on some interesting applications.
Course Content
A course outline is given by the Handout List (Handout #i). We follow Chvatals excellent development, until the last two units on extra material.
Unit 1 is an Overview. It defines the LP problem & illustrates LP models.
Unit 2, Fundamentals, covers the simplex method at a high level. Simplex is used in most codes for
solving real-world LPs. Our conceptual treatment culminates in proving the Fundamental Theorem
of LP, which summarizes what Simplex does.
Unit 3 gives the basics of one of the most important themes in operations research and theoretical computer science, Duality. This theory is the basis of other LP methods as well as many
combinatorial algorithms. We touch on applications to economics and game theory.
Unit 4 returns to Simplex to present its Efficient Implementation. These details are crucial for
realizing the speed of the algorithm. We cover the technique of delayed column generation and its
application to industrial problems such as cutting stock, as well as the branch-and-bound method
for integer linear programming.
CSCI 5654

H. Gabow

Fall 2007

#0, p. 3

Unit 5, Extensions, gives other implementation techniques such as upper-bounding and sensitivity
analysis, as well as some general theory about linear inequalities, and the Dual Simplex Algorithm,
which recent work indicates is more powerful than previously thought. We introduce the cutting
plane technique for Integer Linear Programming.
Unit 6 is an introduction to Network Algorithms. These special LPs, defined on graphs, arise often
in real-life applications. We study the Network Simplex Algorithm, which takes advantage of the
graph structure to gain even more efficiency.
Unit 7, Polynomial-Time Linear Programming, surveys the ellipsoid method. This is a powerful tool
in theoretical algorithm design, and it opens the door to nonlinear programming. The Supplemental
Material covers Karmarkars algorithm, an alternative to Simplex that is often even more efficient.
Unit 8 is a brief introduction to nonlinear methods: quadratic programming (& the Markowitz
model for assessing risk versus return on financial investments) and its generalization to semidefinite programming. We illustrate the latter with the groundbreaking Goemans-Williamson SDP
algorithm for approximating the maximum cut problem in graphs.
The Supplemental Material in Unit 9 will be covered as time permits.
Linear Algebra
Chvatals treatment centers on the intuitive notion of dictionary. Units 13 use very little linear
algebra, although its still helpful for your intuition. Units 46 switch to the language of linear
algebra, but we really only use very big ideas. Units 78 use more of what you learned at the
start of an introductory linear algebra course. The course is essentially self-contained as far as
background from linear algebra is concerned.
Tips on using these notes
Class lectures will work through the notes. This will save you writing.
Add comments at appropriate places in the notes. Use backs of pages for notes on more extensive
discussions (or if you like to write big!). If I switch to free sheets of paper for extended discussions
not in the notes, you may also want to use free sheets of paper.
A multi-colored pen can be useful (to separate different comments; for complicated pictures; to color
code remarks, e.g. red = important, etc.) Such a pen is a crucial tool for most mathematicians
and theoretical computer scientists!
If you want to review the comments I wrote in class, I can reproduce them for you.
The notes complement the textbook. The material in Chvatal corresponding to a handout is given
in the upper left corner of the handouts page 1, & in Handout #i. The notes follow Chvatal, but
provide more formal discussions of many concepts as well as alternate explanations & additional
material. The notes are succinct and require additional explanation, which we do in class. You
should understand both Chvatal and my notes.
Extra credit for finding a mistake in these notes.

CSCI 5654

H. Gabow

Fall 2007

#0, p. 4

Chvatal, 3 9

Linear Programming

Unit 1: Overview

What you didnt learn in high school algebra


High School Exercise. Solve this system of inequalities (Hint: Its inconsistent!):
4x+2y 8
3x+4y 12
3x+2y 7
systems of inequalities are much harder to solve than systems of equations!
linear programming gives the theory and methods to solve these systems
in fact LP is mathematically equivalent to solving these systems
A real-world LP
Power Flakes new product will be a mixture of corn & oats.
1 serving must supply at least 8 grams of protein, 12 grams of carbohydrates, & 3 grams of fat.
1 ounce of corn supplies 4,3, and 2 grams of these nutrients, respectively.
1 ounce of oats supplies 2,4, and 1 grams, respectively.
Corn can be bought for 6 cents an ounce, oats at 4 cents an ounce.
What blend of cereal is best?
Best means that it minimizes the cost ignore the taste!
LP Formulation
x = # ounces of corn per serving; y = # ounces of oats per serving
minimize z = 6x+4y
subject to
4x+2y
3x+4y
2x+ y
x, y

8
12
3
0

assumptions for our model:


linearity proportionality, additivity
(versus diminishing returns to scale, economies of scale, protein complementarity)
continuity versus integrality

CSCI 5654

H. Gabow

Fall 2007

#1, p. 1

Linear Programming optimize a linear function subject to linear constraints


Standard Form (changes from text to text)
objective function
maximize z =

Pn

j=1 cj xj

Pn

subject to

j=1

aij xj
xj

bi
0

(i = 1, . . . , m)
(j = 1, . . . , n)

constraints

nonnegativity constraints
cost coefficient cj , decision variable xj , right-hand side coefficient bi
a feasible solution satisfies all the constraints
an optimum solution is feasible and achieves the optimum objective value
Activity Space Representation
n dimensions, xj = level of activity j
y

4x+2y=8

3x+4y=12

x
6x+4y=14.4

Requirement Space Representation (m dimensions see Handout#33, p.2)

CSCI 5654

H. Gabow

Fall 2007

#1, p. 2

Real-world LPs
blending problem find a least cost blend satisfying various requirements
e.g., gasoline blends (Chvatal, p.10, ex.1.7); diet problem (Chvatal, pp.35); animal feed;
steel composition
resource allocation problem allocate limited resources to various activities to maximize profit
e.g., forestry (Chvatal, pp.1716); Chvatal, p.10, ex.1.6; beer production
multiperiod scheduling problems schedule activities in various time periods to maximize profit
e.g., Chvatal, p.11, ex.1.8; cash flow; inventory management; electric power generation
And Solving Them
LP-solvers include CPLEX (the industrial standard), MINOS, MOSEK, LINDO
most of these have free (lower-powered) versions as downloads
modelling languages facilitate specification of large LPs
e.g., heres an AMPL program for a resource-allocation problem (from AMPL text):
the model is in a .mod file:
set PROD;
# products
set STAGE; # stages
param rate {PROD,STAGE} > 0; # tons per hour in each stage
param avail {STAGE} >= 0;
# hours available/week in each stage
param profit {PROD};
# profit per ton
param commit {PROD} >= 0;
# lower limit on tons sold in week
param market {PROD} >= 0;
# upper limit on tons sold in week
var Make {p in PROD} >= commit[p], <= market[p]; # tons produced
maximize total profit: sum {p in PROD} profit[p] * Make[p];
subject to Time {s in STAGE}:
sum {p in PROD} (1/rate[p,s]) * Make[p] <= avail[s];
# In each stage: total of hours used by all products may not exceed hours available

& the data is in a .dat file:


set PROD := bands coils plate;
set STAGE := reheat roll;
param rate: reheat roll :=
bands
200
200
coils
200
140
plate
200
160 ;
param:
profit commit market :=
bands
25
1000
6000
coils
30
500
4000
plate
29
750
3500 ;
param avail := reheat 35
roll
40 ;

CSCI 5654

H. Gabow

Fall 2007

#1, p. 3

LP versus ILP
an Integer Linear Program (ILP) is the same as an LP
but the variables xj are required to be integers
much harder!
Two Examples
Knapsack Problem
maximize z = 3x1 + 5x2 + 6x3
subject to x1 + 2x2 + 3x3 5

0 xi 1, xi integral

i = 1, 2, 3

this is a knapsack problem:


items 1,2 & 3 weigh 1,2 & 3 pounds respectively
and are worth $3,$5 & $6 respectively
put the most valuable collection of items into a knapsack that can hold 5 pounds
the corresponding LP (i.e., drop integrality constraint) is easy to solve by the greedy method
add items to the knapsack in order of decreasing value per pound
if the last item overflows the knapsack, add just enough to fill it
we get x1 = x2 = 1, x3 = 2/3, z = 12
the best integral solution is x1 = 0, x2 = x3 = 1, z = 11
but the greedy method wont solve arbitrary LPs!
Exercise. The example LP (i.e., continuous knapsack problem) can be put into a form where the
greedy algorithm is even more obvious, by rescaling the variables. (a) Let yi be the number of
pounds of item i in the knapsack. Show the example LP is equivalent to the LP () below. (b)
Explain why its obvious that the greedy method works on ().
()

maximize z = 3y1 + 52 y2 + 2y3


subject to y1 + y2 + y3 5

0 yi i,

i = 1, 2, 3

() is a special case of a polymatroid a broad class of LPs with 0/1 coefficients where the greedy
method works correctly.

CSCI 5654

H. Gabow

Fall 2007

#1, p. 4

Traveling Salesman Problem (TSP)


TSP is the problem of finding the shortest cyclic route through n cities,
visiting each city exactly once
starting & ending at the same city
index the cities from 1 to n
let cij , i, j = 1, . . . , n denote the direct distance between cities i & j
well consider the symmetric TSP, where cij = cji
for 1 i < j n,
let xij = 1 if cities i & j are adjacent in the tour, otherwise xij = 0
symmetric TSP is this ILP:
P
minimize z = i<j cij xij
P
subject to
xij
Pk{i,j}

|{i,j}S|=1 xij

xij

=2
2

{0, 1}

k = 1, . . . , n
(enter & leave each city)
S {1, . . . , n} (subtour elimination constraints)
1i<jn

we form the (Held-Karp) LP relaxation of this ILP by dropping the integrality constraints
i.e., replace the last line by xij 0, 1 i < j n
let zILP and zLP denote the optimum objective values of the 2 problems
assume distances satisfy the triangle inequality
cij + cjk cik for all i, j, k
then we know that zILP (3/2)zLP , i.e., the integrality gap is 3/2
experimentally the Held-Karp lower bound is typically above .99zILP
the 4/3 Conjecture states (unsurprisingly) that the integrality gap is always 4/3

CSCI 5654

H. Gabow

Fall 2007

#1, p. 5

Exercise. Well show the integrality gap can be 4/3 , for any > 0. Heres the graph:

...
...
...

3k + 2 vertices; k vertices are in the top horizontal path. For any 2


vertices i, j, cij is the number of edges in a shortest path from i to j.

...
...
k+1

...
A tour with length 4k + 2.

...
1/2

1/2

...

1/2

1/2
1/2

1/2

...
A fractional solution of length 3k + 3

3k + 3 = 3(k 1) + 2(1 + 1/2 + 1/2 + 1/2(2)) .

(a) Explain why the above fractional solution is feasible. Hint: Concentrate on the 3 paths of solid
edges. (b) Explain why any fractional solution has length 3k + 2. (c) Explain why any tour has
length 4k + 2. Hint: Concentrate on the length 1 edges traversed by the tour. They break up
into subpaths beginning and ending at 1 of the 2 extreme points of the graph. (d) Conclude that
for this example, limk zILP /zLP = 4/3.

CSCI 5654

H. Gabow

Fall 2007

#1, p. 6

Standard Form

Unit 1: Overview

a linear program is a problem that can be put into standard (maximization) form
maximize z =
subject to

Pn

j=1 cj xj

Pn

j=1

aij xj
xj

bi
0

(i = 1, . . . , m)
(j = 1, . . . , n)

bi
0

(i = 1, . . . , m)
(j = 1, . . . , n)

Standard minimization form


minimize z =
subject to

Pn

j=1 cj xj

Pn

j=1

aij xj
xj

to convert standard minimization form to standard (maximization) form,


the new objective is z, the negative of the original
Pn
the new linear constraints are the negatives of the original, j=1 (aij )xj bi
nonnegativity remains the same
Remark. well see many more standard forms!
resource allocation problems often translate immediately into standard maximization form,
with aij , bi , cj 0:
n activities
xj = the level of activity j
m scarse resources
bi = the amount of resource i that is available
we seek the level of each activity that maximizes total profit
blending problems often translate immediately into standard minimization form,
with aij , bi , cj 0:
n raw materials
xj = the amount of raw material j
m components of the blend
bi = requirement on the ith component
we seek the amount of each raw material that minimizes total cost

CSCI 5654

H. Gabow

Fall 2007

#2, p. 1

Free variables
a free variable has no sign restriction
we can model a free variable x by replacing it by 2 nonnegative variables p & n with x = p n
replace all occurrences of x by p n
the 2 problems are equivalent:
a solution to the original problem gives a solution to the new problem
with the same objective value
conversely, a solution to the new problem gives a solution to the original problem
with the same objective value
A more economical transformation
the above transformation models k free variables xi , i = 1, . . . , k by 2k new variables
we can model these k free variables by k + 1 new variables:
f i = pi N
Remark
LINDO variables are automatically nonnegative, unless declared FREE
Equality constraints
an equality constraint
Pn
a x b
Pnj=1 j j
j=1 aj xj b

Pn

j=1 aj xj

= b can be modelled by 2 inequality constraints:

this models k equality constraints

Pn

j=1

aij xj = bi , i = 1, . . . , k by 2k new constraints

we can
Pnuse only k + 1 new constraints, by simply adding together all the constraints:
j=1 aij xj bi , i = 1, . . . , k
Pk Pn
Pk
i=1
j=1 aij xj
i=1 bi

(this works since if < held in one of the first k inequalities,


weld have < in their sum, contradicting the last inequality)
Example. The 2 constraints
x + y = 10 y + 5z = 20
are equivalent to
x + y 10 y + 5z 20

x + 2y + 5z 30

Exercise. This optimization problem


minimize z = x
subject to
x>1
illustrates why strict inequalities are not allowed. Explain.

CSCI 5654

H. Gabow

Fall 2007

#2, p. 2

Exercise. (Karmarkar Standard Form) The Linear Inequalities (LI) problem is to find a solution
to a given system of linear inequalities or declare the system infeasible. Consider the system
Pn

bi

Pn

= bi
0

j=1 aij xj

(i = 1, . . . , m)

(i) Show the system is equivalent to a system in this form:


j=1 aij xj

xj

(i = 1, . . . , m)
(j = 1, . . . , n)

(ii) Show that we can assume a normalizing constraint in (i), i.e., any system in form (i) is
equivalent to a system in this form:
Pn
a x
j=1
Pn ij j
j=1 xj
xj

= bi
=1
0

(i = 1, . . . , m)
(j = 1, . . . , n)

Hint. Let M be an upper bound to each xj . (The exercise of P


Handout#25 gives a formula for M ,
assuming all aij & bi are integers. Thus we can assume in (i), nj=1 xj nM . Add this constraint
and rescale.
(iii) Show a system in form (ii) is equivalent to a system in this form:
Pn
a x
j=1
Pn ij j
j=1 xj
xj

=0
=1
0

(i = 1, . . . , m)
(j = 1, . . . , n)

(iv) Starting with the system of (iii) construct the following LP, which uses another variable s:
minimize z = s
subject to

Pn

j=1

Pn
aij xj ( j=1 aij )s = 0
Pn
j=1 xj + s = 1
xj , s 0

(i = 1, . . . , m)
(j = 1, . . . , n)

Show (iii) has a solution if and only if (iv) has optimum cost 0. Furthermore (iv) always has
nonnegative cost, and setting all variables equal to 1/(n + 1) gives a feasible solution.
(iv) is standard form for Karmarkars algorithm. That is, Karmarkars algorithm has input an LP
of the form
minimize z =
subject to

Pn

j=1 cj xj

Pn
a x
j=1
Pn ij j
j=1 xj
xj

=0
=1
0

(i = 1, . . . , m)
(j = 1, . . . , n)

where any feasible solution has z 0 and xj = 1/n, j = 1, . . . , n is feasible. The algorithm
determines whether or not the optimum cost is 0. (The exercise of Handout#18 shows any LP can
be placed into Karmarkar Standard Form.)

CSCI 5654

H. Gabow

Fall 2007

#2, p. 3

Chvatal 213223; Murty Ch.1

LP Objective Functions

Unit 1: Overview

we can handle many seemingly nonlinear objective functions


by adding a new variable
Minimax Objectives
Example 1. minimize the maximum of 3 variables x, y, z (subject to various other constraints)
LP: add a new variable u & add new constraints:
minimize u
new objective
ux
new constraints
uy
uz
this is a correct model:
for any fixed values of x, y, & z the LP sets u to max{x, y, z}, in order to minimize the objective
this trick doesnt work to minimize the min of 3 variables see Exercise below
Example 2. 3 new technologies can manufacture our product with different costs
3x + y + 2z + 100, 4x + y + 2z + 200, 3x + 3y + z + 60
but were not sure which technology well be using
to be conservative we minimize the maximum cost of the 3 technologies
LP: add a new variable u & with new constraints:
minimize u
new objective
u 3x + y + 2z + 100
new constraints
u 4x + y + 2z + 200
u 3x + 3y + z + 60
Example 2 . In makespan scheduling we want to minimize the completion time of the last job.
in general we can solve LPs
P with a minimax objective function, i.e.,
minimize z = max{ j ckj xj + dk : k = 1, . . . , r}
we add variable
u & minimize z = u with new constraints
P
u j ckj xj + dk , k = 1, . . . , r
Note: u is a free variable
similarly we can model maximin
objective functions
P
maximize z = min{ j ckj xj + dk : k = 1, . . . , r}
add variable
P u & maximize z = u with new constraints
u j ckj xj + dk , k = 1, . . . , r

we can minimize sums of max terms, e.g.,


minimize max{2x + 3y + 1, x + y + 10} + max{x + 7y, 3x + 3y + 3}
but not mixed sums, e.g.,
minimize max{2x + 3y + 1, x + y + 10} + min{x + 7y, 3x + 3y + 3}

CSCI 5654

H. Gabow

Fall 2007

#3, p. 1

Special Cases
these useful objective functions all have a concavity restriction
dont try to remember them, just know the general method!
Diminishing Returns (maximizing piecewise linear concave down objective functions)
(concave down means slope is decreasing)
Example 3.
maximizing z =

( 2x

x+1
(x + 5)/2

x1
1x3
3x5

is equivalent to
maximizing min{2x, x + 1, (x + 5)/2}
z

z=(x+5)/2

z=(x+5)/2

z=x+1

z=x+1

z=2x

z=2x
x

z=

( 2x

x+1
(x + 5)/2

x1
1x3
3x5

min{2x, x + 1, (x + 5)/2}

P
Example 4. maximizing z = nj=1 cj (xj ), where each cj is a piecewise linear concave down function
the same transformation as Example 3 works
Remark.
Handout #45 gives an alternate solution,
that adds more variables but uses simpler constraints
similarly we can minimize a sum of piecewise linear concave up functions

CSCI 5654

H. Gabow

Fall 2007

#3, p. 2

Absolute Values
Example 5. the objective can contain terms with absolute values, e.g., |x|, 3|y|, 2|x y 6|
but the coefficients must be positive in a minimization problem (e.g., +|x|)
& negative in a maximization problem (e.g., |x|)
y

y=|x|

y = |x| = max{x, x} is concave up.


e.g., minimizing x + 3|y| + 2|x y 6|
is equivalent to
minimizing x + 3 max{y, y} + 2 max{x y 6, (x y 6)}
Example 6: Data Fitting. (Chvatal, pp.213223)
Pn
we observe data that is known to satisfy a linear relation j=1 aij xj = bi , i = 1, . . . , m
we want to find the values xj , j = 1, . . . , n that best approximate the observations
in some cases its
use an L1 -approximation
to P

Pmbest
n

minimize i=1 bi j=1 aij xj (perhaps subject to xj 0)
and sometimes its best
to

-approximation
Puse an L
minimize maxi bi nj=1 aij xj

(when a least-squares, i.e., L2 -approximation, is most appropriate,


we may be able to use calculus, or more generally we use QP see Handout#42)

CSCI 5654

H. Gabow

Fall 2007

#3, p. 3

Excesses and Shortfalls


Example 7.
in a resource allocation problem, variable x is the number of barrels of high-octane gas produced
the demand for high-octane gas is 10000 barrels
producing more incurs a holding cost of $25 per barrel
producing less incurs a purchase cost of $50 per barrel
LP: add new variables e (excess) and s (shortfall)
add terms 25e 50s to the objective function (maximizing profit)
add constraints e x 10000, e 0
& s 10000 x, s 0
Remark. were modelling excess = max{x 10000, 0}, shortfall = max{10000 x, 0}
in general we can model an excess or shortage of a linear function
with penalties pe for excess, ps for shortage
when were maximizing or minimizing
if pe , ps 0

aj xj & target b

more generally we can allow pe + ps 0


this allows a reward for excess or shortage (but not both)
to do this add terms
P pe e + ps s to the objective (minimizing cost)
and constraints j aj xj b = e s, e 0, s 0

Exercise. Suppose, similar to Example 1, we want to minimize the minimum of 3 variables x, y, z


(subject to various other constraints). Show how to do this by solving 3 LPs, each involving 2
extra constraints.

CSCI 5654

H. Gabow

Fall 2007

#3, p. 4

Complexity of LP & Related Problems

Unit 1: Overview

What is a large LP?


era
Dantzigs US
economy model
2000

m
1.5K

n
4K

nonzeroes
40K

10K..100K
even 1M

20K..500K
even 2M

100K..2M
even 6M

the big problems still have only 10..30 nonzeroes per constraint
the bigger problems may take days to solve
Notation: L = (number of bits in the input) (see Handout#69)
Perspective: to understand the bounds, note that Gaussian elimination is time O(n3 L)
i.e., O(n3 ) operations, each on O(L) bits
Simplex Method
G.B. Dantzig, 1951: Maximization of a linear function of variables subject to linear inequalities
visits extreme points, always increasing the profit
can do 2n pivot steps, each time O(mn)
but in practice, simplex is often the method of choice
this is backed up by some theoretic results
in a certain model where problems are chosen randomly, average number of pivots is bounded
by min{n/2, (m + 1)/2, (m + n + 1)/8}, in agreement with practice
simplex is polynomial-time if we use smoothed analysis compute average time of a randomly perturbed variant of the given LP
the next 2 algorithms show LP is in P
Ellipsoid Method
L.G. Khachiyan, 1979: A polynomial algorithm for linear programming
finds sequence of ellipsoids of decreasing volume, each containing a feasible solution
O(mn3 L) arithmetic operations on numbers of O(L) bits
impractical, even for 15 variables
theoretic tool for developing polynomial time algorithms (Gr
otschel, Lovasz, Schrijver, 1981)
extends to convex programming, semidefinite programming
Interior Point Algorithm
N. Karmarkar, 1984: A new polynomial-time algorithm for linear programming
(Combinatorica 84)
navigates through the interior, eventually jumping to optimum extreme point
O((m1.5 n2 + m2 n)L) arithmetic operations on numbers of O(L) bits
in practice, competitive with simplex for large problems
refinements: O((mn2 + m1.5 n)L) operations on numbers of O(L) bits (Vaidya, 1987, and others)

CSCI 5654

H. Gabow

Fall 2007

#4, p. 1

Strong Polynomial Algorithms (for special LPs)


2 variables per inequality: time O(mn3 log n) (Megiddo, 1983)
each aij {0, 1} (combinatorial LPs): p(n, m) i..e., strong polynomial time (Tardos, 1985)
time O(m) for n = O(1) (Megiddo, 1984)
Randomized Algorithms
relatively simple randomized algorithms achieve
average running times that are subexponential
O( n log n)
e.g., the only superpolynomial term is 2
(Motwani & Raghavan, Randomized Algorithms) surveys these
Kelner & Spielman (STOC 2006) show a certain randomized version of simplex
runs in polynomial time
Integer Linear Programming
NP-complete
polynomial algorithm for n = O(1), the basis reduction method (Lenstra, 1983)
in practice ILPs are solved using a subroutine for LP, and generating additional linear constraints
What is a large ILP?
m = 100..2K or even 5K; n = 500..2K or even 5K
free LINDO can handle m = 300, n = 150 for LP, and n = 50 for ILP

CSCI 5654

H. Gabow

Fall 2007

#4, p. 2

Chvatal, 13 23

The Simplex Algorithm: Example

Unit 2: Fundamentals

we solve this LP (a resource allocation problem similar to Handout #1):


maximize z = 6x1 +4x2
subject to
4x1 +2x2
8
3x1 +4x2
12
x1 , x2 0
maximum z = 14.4 achieved by (.8, 2.4)
x2

4x1+2x2=8

3x1+4x2=12

x1
6x1+4x2=14.4

to start, change inequalities to equations by introducing slack variables x3 , x4


nonnegative, x3 , x4 0
this gives the initial dictionary it defines the slack variables in terms of the original variables:
x3 = 8 4x1 2x2
x4 = 12 3x1 4x2
z=

6x1 + 4x2

this dictionary is associated with the solution x1 = x2 = 0, x3 = 8, x4 = 12, z = 0


the l.h.s. variables x3 , x4 are the basic variables
the r.h.s. variables x1 , x2 are the nonbasic variables
in general a dictionary has the same form as above
it is a set of equations defining the basic variables in terms of the nonbasic variables
the solution associated with the dictionary has all nonbasic variables set to zero
the dictionary is feasible if this makes all basic variables nonnegative
in which case the solution is a basic feasible solution (bfs)

CSCI 5654

H. Gabow

Fall 2007

#5, p. 1

increasing x1 will increase z


we maintain a solution by decreasing x3 and x4
x3 0 = 8 4x1 2x2 0, x1 2
x4 0 = x1 4
so set x1 = 2
this gives z = 12, and also makes x3 = 0
this procedure is a pivot step
we do more pivots to get even better solutions:
the first pivot:
x1 & x3 change roles, i.e., x1 becomes basic, x3 nonbasic:
1. solve for x1 in x3 s equation
2. substitute for x1 in remaining equations
2nd Dictionary
x1 = 2 12 x2 14 x3
x4 = 6 25 x2 + 34 x3
z = 12 + x2 23 x3
this dictionary has bfs x1 = 2, x4 = 6
the objective value z = 12
the 2nd pivot:
increasing nonbasic variable x2 will increase z
= make x2 the entering variable
x1 = 2 12 x2 0 = x2 4
x4 = 6 52 x2 0 = x2 12/5
= make x4 the leaving variable
pivot (on entry 52 ) to get new dictionary
3rd Dictionary
x1 = 54
x2 =

12
5

z=

72
5

2
1
5 x3 + 5 x4
3
x 25 x4
10 3
6
5 x3

25 x4

this dictionary gives solution x1 = 4/5, x2 = 12/5


an optimum solution
Proof. x3 , x4 0 = z 72/5

CSCI 5654

H. Gabow

Fall 2007

#5, p. 2

Geometric View
the algorithm visits corners of the feasible region, always moving along an edge
x2

4x1+2x2=8

3x1+4x2=12

x1
6x1+4x2=14.4

CSCI 5654

H. Gabow

Fall 2007

#5, p. 3

Chvatal, 17 19

Dictionaries & LP Solutions

Unit 2: Fundamentals

Dictionaries
start with an LP () in standard form,
maximize z =
()

subject to

Pn

j=1 cj xj

Pn

j=1

aij xj
xj

bi
0

introduce m slack variables xn+1 , . . . , xn+m

(i = 1, . . . , m)
(j = 1, . . . , n)
(xn+i 0)

the equations defining the slacks & z give the initial dictionary for ():
Pn
xn+i = bi j=1 aij xj
(i = 1, . . . , m)
Pn
z=
j=1 cj xj
A dictionary for LP () is determined by a set of m basic variables B.
The remaining n variables are the nonbasic variables N .
B, N {1, . . . , n + m}
(i) The dictionary has
Pthe form
xi = bi jN aij xj
P
z =z+
jN cj xj

(i B)

(ii) xj , j = 1, . . . , m + n, z is a solution of the dictionary


it is a solution of the initial dictionary

Remarks.
1. a dictionary is a system of equations
showing how the nonbasic variables determine the values of the basic variables and the objective
nonnegativity is not a constraint of the dictionary
2. notice the sign conventions of the dictionary
3. B, the set of basic variables, is a basis
4. well satisfy condition (ii) by deriving our dictionaries from the initial dictionary
using equality-preserving transformations
a feasible dictionary has each bi 0
/ B)
it gives a basic feasible solution, xi = bi (i B), xi = 0 (i
Examples
1. the initial dictionary of a resource allocation problem is a feasible dictionary
2. the blending problem of Handout #1

CSCI 5654

H. Gabow

Fall 2007

#6, p. 1

minimize z = 6x+4y
subject to
4x+2y 8
3x+4y 12
x, y 0
has initial dictionary
s1 = 8 + 4x + 2y
s2 = 12 + 3x + 4y
z=

6x + 4y

infeasible!
Lemma [Nonbasic variables are free.] Let D be an arbitrary dictionary, with basic (nonbasic)
variables B (N ).
(i) Any linear relation always satisfied by the nonbasic variables of D has all its coefficients equal
to 0, i.e.,
P

jN

j xj = for all solutions of D = = 0, j = 0 for all j N .

(ii) Any linear relation


P

jBN

j xj + =

jBN

j xj +

always satisfied by the solutions of D has the same coefficients on both sides if all basic coefficients
are the same, i.e.,
j = j for every j B = = , j = j for every j N .
Proof of (i).
setting xj = 0, j N = = 0
setting xj = 0, j N i, xi = 1 = i = 0

CSCI 5654

H. Gabow

Fall 2007

#6, p. 2

The 3 Possibilities for an LP


any LP either
(i) has an optimum solution
it actually has an optimum bfs (basic feasible solution)
(ii) is infeasible (no values xj satisfy all the constraints)
(iii) is unbounded (the objective can be made arbitrarily large)

0110
110010
1010
1010
10

yx>=1

y<=x
yx<=1

x+y= constant

max x+y

Infeasible LP

Unbounded LP

well show (i) (iii) are the only possibilities for an LP


part of the Fundamental Theorem of Linear Programming
What Constitutes a Solution to an LP?
for (i): an optimum solution an optimum bfs is even better!
for (ii): a small number ( n + 1) of inconsistent constraints
for (iii): a line of unboundedness on the boundary is best!
in real-world modelling situations, (ii) & (iii) usually indicate errors in the model
the extra information indicates how to fix the model

CSCI 5654

H. Gabow

Fall 2007

#6, p. 3

Chvatal, 27 33

The Basic Simplex Algorithm

Unit 2: Fundamentals

the basic simplex algorithm is good for conceptualization and hand-calculation


although it ignores 2 issues
Initialization
Construct a feasible dictionary
often, as in a resource allocation problem, initial dictionary is feasible
i.e., all bi 0
Main Loop
Repeat the following 3 steps
until the Entering Variable Step or Leaving Variable Step stops
aij , bi , cj all refer to the current dictionary
Entering Variable Step
If every cj 0, stop, the current basis is optimum
Otherwise choose any (nonbasic) s with cs > 0
Leaving Variable Step
If every ais 0, stop, the problem is unbounded
Otherwise choose a (basic) r with ars > 0 that minimizes bi /ais
Pivot Step
Construct a dictionary for the new basis as follows:
(i) Solve for xs in the equation for xr
X
xs = (br /ars )
(arj /ars )xj
N = N {s} {r} is the new set of nonbasic variables
jN

note arr = 1 by definition

(ii) Substitute this equation in the rest of the dictionary,


so all right-hand sides are in terms of N

in the Entering Variable Step usually > 1 variable has positive cost
the pivot rule specifies the choice of entering variable
e.g., the largest coefficient rule chooses the entering variable with maximum cs
the computation in the Leaving Variable Step is called the minimum ratio test
Efficiency (Dantzig, LP & Extensions, p.160)
in practice the algorithm does between m & 2m pivot steps, usually < 3m/2
for example see the real-life forestry LP in Chvatal, Ch.11
simplex finds the optimum for 17 constraints in 7 pivots
Deficiencies of the Basic Algorithm
we need to add 2 ingredients to get a complete algorithm:
in general, how do we find an initial feasible dictionary?
how do we guarantee the algorithm halts?

CSCI 5654

H. Gabow

Fall 2007

#7, p. 1

Chvatal, 27 33

Unit 2: Fundamentals

Correctness of Basic Simplex

our goal is to show the basic simplex algorithm always halts with the correct answer
assuming we repair the 2 deficiencies of the algorithm
we achieve the goal by proving 6 properties of the algorithm
1. Each dictionary constructed by the algorithm is valid.
Proof.
each Pivot Step replaces a system of equations by an equivalent system

2. Each dictionary constructed by the algorithm is feasible.


Proof.
after pivotting, any basic i 6= s has xi = bi ais (br /ars )
| {z }
new xs , 0
ais 0 = xi bi 0
ais > 0 = bi /ais br /ars (minimum ratio test) = bi ais br /ars

3. The objective value never decreases:


It increases in a pivot with br > 0 and stays the same when br = 0.
this property shows how the algorithm makes progress toward an optimum solution
Proof.
P
in the dictionary before the pivot, z = z + jN cj xj
the objective value is z before the pivot
let it be z after the pivot
the new bfs has xs = br /ars
thus z = z + cs (br /ars )
since cs > 0, z z
if br > 0 then z > z

4. Every cj 0 = current basis is optimum.


local optimum is global optimum
Proof.
P
consider the objective in the current dictionary, z = z + jN cj xj
current objective value = z
any feasible solution has all variables nonnegative = its objective value is z

5. In the Leaving Variable Step, every ais 0 = the LP is unbounded.


Proof.
set xj =

(t

CSCI 5654

bj ajs t
0

j=s
jB
j
/ Bs
H. Gabow

Fall 2007

#8, p. 1

this is a feasible solution for every t 0


its objective value z = z + cs t can be made arbitrarily large

the simplex algorithm can output this line of unboundedness


Properties 4 5 show the algorithm is correct if it stops (i.e., it is partially correct)
6. If the algorithm doesnt stop, it cycles, i.e.,
it repeats a fixed sequence of pivots ad infinitum.
Proof.
Claim: there are a finite number of distinct dictionaries
the claim implies Property 6, assuming the pivot rule is deterministic
Proof of Claim: 
there are n+m
bases B
m
each basis B has a unique dictionary
to show this suppose we have 2 dictionaries for the same basis
let xi be a basic
P its equation in both dictionaries,
P variable and consider

xi = bi jN aij xj = bi jN aij xj
nonbasic variables are free = the equations are the same, i.e., bi = bi , aij = aij

similarly, z = z , cj = cj
Cycling
in a cycle, the objective z stays constant (Property 3 shows this is necessary for cycling)
so each pivot has br = 0 (Property 3)
thus the entering variable stays at 0, and the solution (x1 , . . . , xn ) does not change
Chvatal pp. 3132 gives an example of cycling (see Handout #48)
Degeneracy
a basis is degenerate if one or more basic variables = 0
degeneracy is necessary for cycling
but simplex can construct a degenerate basis without cycling:
br neednt be 0
even if br = 0 we neednt be in a cycle
although such a pivot does not change the objective value (see Property 3)
(i) degeneracy is theoretically unlikely in a random LP, but seems to always occur in practice!
(ii) if there is a tie for leaving variable, the new basis is degenerate (see proof of Property 2)
Exercise. Prove the converse: A pivot step gives a nondegenerate dictionary if it starts with a
nondegenerate dictionary and has no tie for leaving variable.
Handout #11 adds a rule so we never cycle
in fact, each pivot increases z
this guarantees the algorithm eventually halts with the correct answer
Handout #13 shows how to proceed when the initial dictionary is infeasible

CSCI 5654

H. Gabow

Fall 2007

#8, p. 2

Chvatal, 23 25

The Simplex Algorithm with Tableaus

Unit 2: Fundamentals

tableaus are an abbreviated representation of dictionaries,


suitable for solving LPs by hand, and used in most LP texts
a tableau is a labelled matrix that represents a dictionary,
e.g., heres the initial dictionary of Handout#5 & the corresponding tableau:
x3 = 8 4x1 2x2
x4 = 12 3x1 4x2
z=

6x1 + 4x2

x1 x2 x3 x4 z b
x3 4 2 1
8
x4 3 4
1
12
z 6 4
1 0

To get the tableau representing a given dictionary


label the columns with the variable names, followed by z & b
label each row with the corresponding basic variable (from the dictionary),
the last row with z
in each dictionary equation move all
P
P variables to l.h.s.
so the equations become xi + aij xj = bi , z cj xj = z
copy all numeric coefficients (with sign) in the dictionary
into the tableaus corresponding matrix entry
Remarks.
1. a coefficient aij (cj ) in the dictionary becomes aij (cj ) in the tableau
2. Chvatal uses the opposite sign convention in the z row
LINDO uses the same sign convention
To execute the simplex algorithm with tableaus
add a new rightmost column to the tableau, for the ratios in the minimum ratio test
star the pivot element ars (its row has the minimum ratio)
Pivot Step:
get new pivot row by dividing by the pivot element
relabel the pivot row to xs (the entering variable)
decrease each row i (excepting the pivot row but including the objective row)
by ais times the (new) pivot row

CSCI 5654

H. Gabow

Fall 2007

#9, p. 1

Solution of the Example LP by Tableaus


Initial Tableau
x1 x2 x3 x4 z b ratio
x3 4 2 1
8
2
x4 3 4
1
12
4
z 6 4
1 0
1st Pivot
x1 x2
x3 x4 z b ratio
x1 1 .5
.25
2
4

x4
2.5 .75 1
6 2.4
z
1 1.5
1 12
Optimum Tableau
x1 x2 x3 x4 z b
x1 1
.4 .2
.8
x2
1 .3 .4
2.4
z
1.2 .4 1 14.4
Example 2.
LP: maximize z = xy
subject to
x+y 2
ax +y 4
x, y 0
Initial Tableau
s1 s2 x
s1 1 0 1
s2 0 1 a
z
1

y z b ratio
1
2
1
4 4/a
1 1 0

this ratio test assumes a > 0


if a 0 the initial tableau has an unbounded pivot
corresponding to the line y = 0 (parametrically x = t, y = 0, s1 = 2 + t, s2 = 4 at)
1st Pivot (Optimum)
let = 1/a
s1
s1 1
x 0
z

s2 x
y
z
b

1+
2 + 4
1

1+ 1
4

CSCI 5654

H. Gabow

Fall 2007

#9, p. 2

Chvatal, 250 255, 260 261

Introduction to Geometry of LP

Unit 2: Fundamentals

Rn n-dimensional space, i.e., the set of all vectors or points (x1 , . . . , xn )


let aj , j = 1, . . . , n and b be real numbers
Pnwith some aj 6= 0
hyperplane all points of Rn satisfying j=1 aj xj = b
e.g.: a line ax + by = c in the plane R2 , a plane ax + by + cz = d in 3-space, etc.
a hyperplane is an (n 1)-dimensional space
(closed) half-space all points of Rn satisfying

Pn

j=1 aj xj

convex polyhedron an intersection of a finite number of half-spaces


convex polytope a convex polyhedron that is bounded
let P be a convex polyhedron
the hyperplanes of P the hyperplanes corresponding to the half-spaces of P
this may include extraneous hyperplanes that dont change the intersection
P contains various polyhedra
face P intersected with some of the hyperplanes of P , or or P

x
this unbounded convex polyhedron has
3 vertices, 4 edges (facets), 9 faces total

CSCI 5654

H. Gabow

Fall 2007

#10, p. 1

Special Faces
vertex 0-dimensional face of P
i.e., a point of P that is the unique intersection of n hyperplanes of P
edge 1-dimensional face of P
i.e., a line segment that is the intersection of P and n 1 hyperplanes of P
can be a ray or a line
facet (n 1)-dimensional face of P

A famous 3D-convex polyhedron


2 vertices of P are adjacent if they are joined by an edge of P
a line in Rn has a parameterized form, xj = mj t + bj , j = 1, . . . , n
Geometry of LPs
the feasible region of an LP is a convex polyhedron P
the set of all optima of an LP form a face
e.g., a vertex, if the optimum is unique
, if the LP is infeasible or unbounded
an unbounded LP has a line of unboundedness, which can always be chosen as an edge
P , if the objective is constant on P

CSCI 5654

H. Gabow

Fall 2007

#10, p. 2

Geometric View of the Simplex Algorithm


(see Handout#47 for proofs of these facts)
consider the problem in activity space (no slacks)
a bfs is a vertex of P
plus n hyperplanes of P that define it
a degenerate bfs is the intersection of > n hyperplanes of P
may (or may not) correspond to > n facets intersecting at a point
(see also Chvatal, pp. 259260)
corresponds to > 1 dictionary
(nondegeneracy corresponds to general position in geometry)
a nondegenerate pivot moves from one vertex, along an edge, to an adjacent vertex
a degenerate pivot stays at the same vertex
the path traversed by the simplex algorithm, from initial vertex to final (optimum) vertex,
is the simplex path
note the objective function always increases as we move along the simplex path
Hirsch Conjecture. (1957, still open)
any 2 vertices of a convex polyhedron are joined by a simplex path of length m
actually all the interesting relaxations of Hirsch are also open:

p(m)
theres a path of length p(m, n)

p(m, n, L) (L = total # bits in the integers aij , bi )


here p denotes any polynomial function, and we assume standard form
our geometric intuition can be misleading, e.g.,
a polyhedron is neighborly if every 2 vertices are adjacent
in any dimension 4 there are neighborly polyhedra with arbitrarily many vertices!

CSCI 5654

H. Gabow

Fall 2007

#10, p. 3

Chvatal, 337, 25860

Avoiding Cycling: Lexicographic Method

Unit 2: Fundamentals

well give 2 rules, each ensures the simplex algorithm does not cycle
both rules are easy to implement
but many simplex codes ignore the possibility of cycling, since it doesnt occur in practice
avoiding cycling is important theoretically, e.g.,
needed to prove the Fundamental Theorem of LP
Intuition for Lexicographic Method
degeneracy is an accident, i.e., > n hyperplanes intersect in a common point
a random LP is totally nondegenerate, i.e., it has no degenerate dictionary
our approach is to perturb the problem, so only n hyperplanes intersect in a common point
= there are no degenerate bfss = the simplex algorithm doesnt cycle

3 planes meet
at a vertex

the 3 planes & a 4th


meet at a vertex

moving the 4th plane forward


gives 2 vertices

The Perturbed LP
given an LP in standard form,
maximize z =
subject to

Pn

j=1 cj xj

Pn

j=1

aij xj

bi

(i = 1, . . . , m)

xj

(j = 1, . . . , n)

replace each right-hand side bi by bi + i , where


0 < m m1 . . . 1 1 = 0
()

CSCI 5654

H. Gabow

Fall 2007

#11, p. 1

Remarks
1. the definition 0 = 1 comes in handy below
2. its tempting to use a simpler strategy, replacing bi by bi +
i.e., use the same perturbation in each inequality
Chvatal p.34 shows this is incorrect, simplex can still cycle
the constraints must be perturbed by linearly independent quantities
well see this is crucial in our proof of correctness
3. the appropriate values of i are unknown for i > 0, and difficult to find
we finesse this problem by treating the i as variables with the above property ()!
imagine executing the simplex method on the perturbed problem
well get expressions like 2 + 1 52 as b terms get combined
call such expressions linear combinations (of the i s)
& simpler expressions that are just numbers,
Pmlike 2 call these scalars
so a linear combination has the form i=0 i i where each i is a scalar

in any dictionary,
the coefficients are aij & cj , i = 1, . . . , m, j = 1, . . . , n
the absolute terms are bi , i = 1, . . . , m & z

Lemma 1. Suppose we do a pivot step on a dictionary where


every coefficient is a scalar, &
every absolute term is a linear combination of the i s.
The resulting dictionary has the same form.
Proof. (intuitively the pivots are determined by the as)
an equation in a dictionary has the form xi = i aP
is xs . . .
rewriting the pivot equation gives xs = (r /ars ) j6=s (arj /ars )xj
substituting for xs in each equation other than the pivot equation
preserves every coefficient as a scalar
& every absolute term as a linear combination

How Do We Perform the Minimum Ratio Test in the Perturbed Problem?


we want to choose the row i minimizing i /ais , a linear combination
so we need to compare linear combinations
() tells us we should compare linear combinations using
lexicographic order, i.e., dictionary order, <
e.g., 50 + 22 + 94 < 50 + 32 P
5
Pm
m
in general for linear combinations = i=0 i i & = i=0 i i ,
< for some index j, 0 j m, j < j and i = i for i < j
thus 0 is most significant, 1 is next most significant, etc.
this lexical comparison corresponds to an ordinary numeric comparison:
take i = i with > 0 small enough
the above comparison becomes 5 +22 + 94 < 5 + 32 5 , i.e., 94 + 5 < 2
it suffice to have 104 < 2 , < 1/ 10

CSCI 5654

H. Gabow

Fall 2007

#11, p. 2

The Lexicographic Method


start with the perturbed problem
execute the simplex algorithm
choose any pivot rule you wish(!)
but use lexical order in the minimum ratio test
heres the key fact:
Lemma 2. The perturbed LP is totally nondegenerate, i.e.,
Pm
P
in any dictionary equation xk = i=0 i i jN aj xj ,
the first sum is not lexically 0, i.e., some i 6= 0 (in fact i > 0).
Remark. of course the most significant nonzero i will be positive
Proof.
recall the definition ofPeach slack variable in the initial dictionary:
n
xn+i = bi + i j=1 aij xj
substitute these equations in the above equation for xk
this gives an equation involving xj , j = 1, . . . , n & i , i = 1, . . . , m
that holds for any values of these variables
so each variable has the same coefficient on both sides of the equation
Case 1. xk is a slack variable in the initial dictionary.
say k = n + i, so the l.h.s. has the term i
to get i on the r.h.s. we need i = 1
Case 2. xk is a decision variable in the initial dictionary.
to get xk on the r.h.s., some nonbasic slack variable xn+i has an+i 6= 0
to cancel the term an+i i we must have i = an+i 6= 0

every pivot in the lexicographic method increases z, lexicographically


by Lemma 2 & Handout#8, Property 3
so the lexicographic method eventually halts
with a dictionary giving an optimum or unbounded solution
this dictionary corresponds to a dictionary for the given LP
take all i = 0
& gives an optimum or unbounded solution to the original LP!

CSCI 5654

H. Gabow

Fall 2007

#11, p. 3

Remarks.
1. our original intuition is correct:
there are numeric values of i that give a perturbed problem
the simplex algorithm does exactly the same pivots as the lexicographic method
just take i = i with > 0 small enough
this is doable since there are a finite number of pivots
2. many books prove the key Lemma 2 using linear algebra
(simple properties of inverse matrices)
Chvatal finesses linear algebra with dictionaries
3. perturbation is a general technique in combinatorial computing
e.g., any graph has a unique minimum spanning tree if we perturb the weights
4. smoothed analysis (Handout#4) computes the time to solve an LP L
by averaging over perturbed versions of L
where we randomly perturb the aij s and the bi s

CSCI 5654

H. Gabow

Fall 2007

#11, p. 4

Chvatal, 37 38

Pivot Rules & Avoiding Cycling

Unit 2: Fundamentals

the choice of entering variable is limited to the eligible variables


i.e., those with cost coefficient ci > 0
a pivot rule specifies the entering variable
Common Pivot Rules
Largest Coefficient Rule (nonbasic gradient, Dantzigs rule)
choose the variable with maximum ci ; stop if its negative
this rule depends on the scaling of the variables
e.g., formulating the problem in terms of x1 = x1 /10
makes x1 10 times more attractive as entering variable
Largest Increase Rule (best neighbor)
choose the variable whose pivot step increases z the most
(a pivot with xs entering & xr leaving increases z by cs br /ars )
in practice this rule decreases the number of iterations but increases the total time
Least Recently Considered Rule
examine the variables in cyclic order, x1 , x2 , . . . , xn , x1 , . . .
at each pivot step, start from the last entering variable
the first eligible variable encountered is chosen as the next entering variable
used in many commercial codes
Devex, Steepest Edge Rule (all variable gradient)
choose variable to make the vector from old bfs to new
as parallel as possible to the cost vector
recent experiments indicate this old rule is actually very efficient
in the dual LP
Open Problem
Is there a pivot rule that makes the simplex algorithm run in polynomial time?
Blands Rule
nice theoretic properties
slow in practice, although related to the least recently considered rule
Smallest-subscript Rule (Bland, 1977)
if more than one entering variable or leaving variable can be chosen,
always choose the candidate variable with the smallest subscript

CSCI 5654

H. Gabow

Fall 2007

#12, p. 1

Theorem. The simplex algorithm with the smallest-subscript rule never cycles.
Proof.
consider a sequence of pivot steps forming a cycle,
i.e., it begins and ends with the same dictionary
we derive a contradiction as follows
let F be the set of all subscripts of variables that enter (and leave) the basis during the cycle
let t F
let D be a dictionary in the cycle that gives a pivot where xt leaves the basis
similarly D is a dictionary giving a pivot where xt enters the basis
(note that xt can enter and leave the basis many times in a cycle)
dictionary D: basis B
coefficients aij , bi , cj
next pivot: xs enters, xt leaves
dictionary D : coefficients cj
next pivot: xt enters
Claim: cs = cs

iB ci ais

Proof of Claim:
the pivot for D corresponds to solutions xs = u, xi = bi ais u, i B, remaining xj = 0
these solutions satisfy D (although they may not be feasible)
the cost of such a solution varies linearly with u:
dictionary D shows it varies as cs u
P
dictionary D shows it varies as (cs iB ci ais )u

these two functions must be the same! this gives the claim

well derive a contradiction by showing the l.h.s. of the Claim is positive but the r.h.s. is nonpositive

CSCI 5654

H. Gabow

Fall 2007

#12, p. 2

cs > 0: since xs is entering in Ds pivot


to make the r.h.s. nonpositive, choose t as the largest subscript in F
cs 0: otherwise xs is nonbasic in D & D s pivot makes xs entering (s < t)
ci ais 0 for i B:
Case i = t:
ats > 0: since xt is leaving in Ds pivot
ct > 0: since xt is entering in D s pivot
Case i B F :
ci = 0: since xi never leaves the basis
Case i B (F t):
ais 0: since bi = 0 (any variable of F stays at 0 throughout the cycle see Handout #8)
but xi isnt the leaving variable in Ds pivot
(Wow!)
ci 0: otherwise xi is nonbasic in D & D s pivot makes xi entering (i < t)
Remarks
1. Blands discovery resulted from using matroids to study the sign properties of dictionaries
2. stalling when a large number of consecutive pivots stay at the same vertex
Blands rule can stall see Handout#49
3. interesting results have been proved on randomized pivot rules
e.g., Kalai (STOC 92) shows this pivot rule gives subexponential average running time:
choose a random facet F that passes through the current vertex
recursively move to an optimum point on F

CSCI 5654

H. Gabow

Fall 2007

#12, p. 3

Chvatal, 39 42

The Two-Phase Method

Unit 2: Fundamentals

given a standard form LP


P
maximize z = nj=1 cj xj
Pn
subject to j=1 aij xj bi

(i = 1, . . . , m)

xj 0

(j = 1, . . . , n)

if all bi are nonnegative the initial dictionary is feasible,


so the basic simplex algorithm solves the problem
if some bi < 0 we solve the problem as follows:
The Two-Phase Method
Phase 1: find a feasible dictionary (or detect infeasibility)
Phase 2: solve the given LP with the simplex algorithm, starting with the feasible dictionary
weve already described Phase 2; well use simplex to do Phase 1 too!
The Phase 1 LP
minimize x0
Pn
subject to j=1 aij xj x0 bi

(i = 1, . . . , m)

xj 0

(j = 0, . . . , n)

Motivation:
the minimum = 0 the given LP is feasible
but if the given LP is feasible, will we get a feasible dictionary for it?
x0 is sometimes called an artificial variable
before describing Phase 1, heres an example:
the given LP has constraints
x1 x2 1, 2x1 + x2 2,

7x1 x2 6

x1 , x2 0

(the first constraint 7x1 7x2 7 is inconsistent with the last 7x1 7x2 7x1 x2 6 )
put the constraints in standard form:
x1 + x2 1, 2x1 x2 2,

7x1 x2 6

x1 , x2 0

Phase 1 starting dictionary:

1st pivot: x0 enters, x4 leaves

(infeasible)

(achieving Phase 1 feasibility)

x3 = 1 + x1 x2 + x0

x3 = 1 x1 2x2 + x4

x4 = 2 + 2x1 + x2 + x0

x0 = 2 2x1 x2 + x4

x5 = 6 7x1 + x2 + x0
w =

CSCI 5654

x5 = 8 9x1

x0

+ x4

w = 2 + 2x1 + x2 x4

H. Gabow

Fall 2007

#13, p. 1

2nd pivot: x2 enters, x3 leaves


x2 =
x0 =

1
2
3
2

1
x
2 1
3
2 x1

1
x
2 3
1
2 x3

x5 = 8 9x1

+
+

last pivot: x1 enters, x5 leaves

1
x
2 4
1
2 x4

x2 = . . .
x0 = . . .
x1 =

+ x4

w = 32 + 23 x1 12 x3 21 x4

8
9

+ 91 x4 19 x5

w = 16 21 x3 31 x4 16 x5
optimum dictionary

the optimum w is negative = the given problem is infeasible


Exercise 1. Prove that throughout Phase 1, the equation for w and x0 are negatives of each other.
General Procedure for Phase 1
1. starting dictionary D0
in the Phase 1 LP, minimizing x0 amounts to maximizing x0
introduce slack variables xj , j = n + 1, . . . , n + m to get dictionary D0 for Phase 1:
xn+i = bi
w=

Pn

j=1

aij xj + x0

(i = 1, . . . , m)

x0

D0 is infeasible
2. feasible dictionary
to get a feasible dictionary pivot with x0 entering, xn+k leaving (well choose k momentarily)
x0 = bk +

Pn

akj xj + xn+k
Pn
xn+i = bi bk j=1 (aij akj )xj + xn+k
Pn
w = bk j=1 akj xj xn+k
j=1

(i = 1, . . . , m, i 6= k)

to make this a feasible dictionary choose bk = min{bi : 1 i m}


this makes each basic variable nonnegative, since bi bk
3. execute the simplex algorithm, starting with this feasible dictionary
choose x0 to leave the basis as soon as it becomes a candidate
then stop (x0 = 0 = optimal Phase 1 solution)
obviously the simplex algorithm halts with an optimum solution
the Phase 1 LP is bounded (w 0) so it has an optimum
4. Phase 1 ends when the simplex algorithm halts:

CSCI 5654

H. Gabow

Fall 2007

#13, p. 2

Case 1. Phase 1 terminates when x0 leaves the basis


let D be the optimum Phase 1 dictionary, with basis B
(since x0 = 0, D gives a feasible solution to given LP)
transform D to a feasible dictionary D for the given LP, with basis B:
1. drop all terms involving x0 from D
2. replace the objective function w with an equation for z:
eliminate the basic variables from the given equation for z
z=

cj xj +
|{z}
jB

cj xj

j B
/

substitute

D is a valid dictionary for the given LP:


Proof.
D has the same solutions as D0
hence D & D0 have same solutions with x0 = 0
i.e., ignoring objective functions,
D has the same solutions as the initial dictionary of the given LP

now execute Phase 2: run the simplex algorithm, starting with dictionary D
Exercise 1 (contd). Prove that in Case 1, the last row of D is always w = x0 .
Case 2. Phase 1 terminates with x0 basic.
In this case the given LP is infeasible
Proof.
it suffices to show that the final (optimum) value of x0 is > 0
equivalently, no pivot step changes x0 to 0:
suppose a pivot step changes x0 from positive to 0
x0 was basic at the start of the pivot, and could have left the basis
(it had the minimum ratio)
in this case Phase 1 makes x0 leave the basis

Remark.
the big-M method solves 1 LP
Pninstead of 2
it uses objective function z = j=1 cj xj M x0
where M is a symbolic value that is larger than any number encountered

CSCI 5654

H. Gabow

Fall 2007

#13, p. 3

A Surprising Bonus
if an LP is infeasible wed like our algorithm to output succinct evidence of infeasibility
in our example infeasible LP
the objective of the final dictionary shows how the given constraints imply a contradiction:
using the given constraints in standard form,
add 21 (1st constraint) + 31 (2nd constraint) +

1
6

(3rd constraint), i.e.,

1
2 (x1

+ x2 1) + 13 (2x1 x2 2) + 16 (7x1 x2 6)
simplifies to 0 61 , a contradiction!
relevant definition: a linear combination of inequalities is
the sum of multiples of each of the inequalities
the original inequalities must be of the same type (, , <, >)
the multiples must be nonnegative
we combine the l.h.s.s & the r.h.s.s
Phase 1 Infeasibility Proof
in general, suppose Phase 1 halts with optimum objective value w < 0
consider the last (optimal) dictionary
(ci 0)
suppose slack si has coefficient ci , i = 1, . . . , m
multiply the ith constraint by ci and add all m constraints
this will give a contradiction, (nonnegative #) w < 0
we show this always works in Handout#32,p.2
LINDO Phase 1 (method sketched in Chvatal, p.129)
Phase 1 does not use any artificial variables. Each dictionary uses a different objective function:
The Phase 1 objective for dictionary D is
w=

xh

where the sum is over all (basic) variables xh having negative values in D.
Tableau:
Row 1 gives the coefficients, in the current dictionary, of the given objective function. The last row
(labelled ART) gives the coefficients of the current Phase 1 cost function. This row is constructed
by adding together all rows that have negative bi s (but keeping the entries in the basic columns
equal to 0).

CSCI 5654

H. Gabow

Fall 2007

#13, p. 4

Simplex Iteration.
In the following, aij , bi and cj refer to entries in the current LINDO tableau (not dictionary!);
further, cj are the Phase 1 cost coefficients, i.e., the entries in the ART row. The value of the Phase
1 objective (bottom right tableau entry) is negative.
Entering Variable Step.
If every cj is 0 stop, the problem is infeasible.
Otherwise choose a (nonbasic) s with cs < 0.
Leaving Variable Step.
Choose a basic r that minimizes this set of ratios:
{

bi
: ais > 0 and bi 0, or ais < 0 and bi < 0}.
ais

Pivot Step.
Construct the tableau for the new basis (xs enters, xr leaves) except for the ART row. If every bi is
nonnegative the current bfs is feasible for the given LP. Proceed to Phase 2.
Otherwise construct the ART row by adding the rows of all negative bi s and zeroing the basic
columns.
Exercises.
1. Justify the conclusion of infeasibility in the Entering Variable Step. Hint. Show
P a feasible
solution implies an equation (nonnegative number) = (negative number), using 0 xh < 0.

2. Explain why the set of ratios in the Leaving Variable Step is nonempty. If it were empty wed
be in trouble!
3. Explain why any variable that is negative in the current dictionary started out negative and
remained so in every dictionary.
4. Explain why Phase 1 eventually halts, assuming it doesnt cycle. Hint. Show a pivot always
increases the current objective function (even when we switch objectives!).
5. Explain why the following is probably a better Leaving Variable Step:
Let P OS = {i : ais > 0 and bi 0}.
Let N EG = {i : ais < 0 and bi < 0}.
If P OS 6= then r is the minimizer of { abisi : i P OS}.
Otherwise r is the minimizer of { abisi : i N EG}.

CSCI 5654

H. Gabow

Fall 2007

#13, p. 5

Chvatal, 42 43

Unit 2: Fundamentals

The Fundamental Theorem of LP

recall the standard form LP


Pn
maximize z = j=1 cj xj
Pn
subject to j=1 aij xj bi
xj 0

(i = 1, . . . , m)
(j = 1, . . . , n)

Phase 1 proves the following fact:


A feasible LP in standard form has a basic feasible solution.
geometrically this says the polyhedron of the LP has a corner point
Example 1. this fact neednt hold if the LP is not in standard form, e.g., the 1 constraint LP
x1 + x2 5
(no nonnegativity constraints) is feasible but has no corner point:
x2

x1

weve now completely proved our main result:

Fundamental Theorem of LP. Consider any LP in standard form.


(i) Either the LP has an optimum solution
or the objective is unbounded or the constraints are infeasible.
(ii) If the LP is feasible then there is a basic feasible solution.
(iii) If the LP has an optimum solution then there is a basic optimum solution.

CSCI 5654

H. Gabow

Fall 2007

#14, p. 1

Example 1 contd.
adopting the objective function x1 + x2 ,
& transforming to standard form by the substitutions xj = pj n, gives the LP
maximize z = p1 + p2 2n
subject to p1 + p2 2n
p1 , p2 , n

5
0

this LP satisfies the Fundamental Theorem, having 2 optimum bfss/corner points:

p2
p2-2n <= 5

p1

p1-2n <= 5

n
The 2 optimum bfss are circled.
part (i) of the Fundamental Theorem holds for any LP
Question. Can you think of other sorts of linear problems, not quite in standard form and not
satisfying the theorem?
an unbounded LP has an edge thats a line of unboundedness
heres a stronger version of this fact:
Extended Fundamental Theorem (see Chvatal, 242243)
If the LP is unbounded, it has a basic feasible direction with positive cost.
to explain, start with the definition:

CSCI 5654

H. Gabow

Fall 2007

#14, p. 2

consider an arbitrary dictionary


let B be the set of basic variables
let s be a nonbasic variable, with coefficients ais , i B in the dictionary
if ais 0 for each i B then the following values wj , j = 1, . . . , n
form a basic feasible direction:
(1
j=s
wj =

ajs
0

jB
j
/ Bs

(n above denotes the total number of variables, including slacks)


Property of bfds:
if (x1 , . . . , xn ) is any feasible solution to an LP,
(w1 , . . . , wn ) is any basic feasible direction, & t 0,
then increasing each xj by twj gives another feasible solution to the LP
(prove by examining the dictionary)
Example. take the LP maximize z =
y
subject to
x+y 1
x, y 0
introducing slack variable s gives feasible dictionary
y = 1x+s

z = 1x+s
basic feasible direction s = t, y = t, x = 0
y

bfd

CSCI 5654

H. Gabow

Fall 2007

#14, p. 3

Claim. if an LP
Pn is unbounded the simplex algorithm finds a basic feasible direction wj
with j=1 cj wj > 0 (these cj s are original cost coefficients)
(n above can be the total number of decision variables)
the Claim implies the Extended Fundamental Theorem
Proof of Claim.
let cj denote the cost cofficients in the final dictionary
& z the cost value
let s be the entering variable for the unbounded pivot (cs > 0)
in what follows, all sums are over all variables, including slacks
X

cj xj = z +

cj (xj + wj ) = z +

for any feasible xj

cj (xj + wj )

since xj + wj is also feasible

X
j

CSCI 5654

c j xj

subtract to get

cj wj =

cj wj =

H. Gabow

jB

0 wj +

j Bs
/

cj 0 + cs = cs > 0

Fall 2007

#14, p. 4

Chvatal, 54 57

The Dual Problem & Weak Duality

Unit 3: Duality

The Dual Problem


consider a standard form LP, sometimes called the primal problem
Pn
maximize z = j=1 cj xj
Pn
subject to j=1 aij xj bi

(i = 1, . . . , m)

xj 0

(j = 1, . . . , n)

Pm
minimize z = i=1 bi yi
P
subject to m
i=1 aij yi cj

(j = 1, . . . , n)

yi 0

(i = 1, . . . , m)

its dual problem is this LP

Caution. this definition only works if the primal is in standard maximization form
Example. find the dual of Chvatal, problem 5.2, p. 69
notice how n & m get interchanged!
Dual Problem

Primal Problem
maximize
subject to

x1 2x2
3x1 + x2 1

minimize y1 + y2 + 6y3 + 6y4 3y5 + 6y6


subject to 3y1 + y2 2y3 + 9y4 5y5 + 7y6 1
y1 y2 + 7y3 4y4 + 2y5 3y6 2
y1 , y2 , y3 , y4 , y5 , y6 0

x1 x2 1
2x1 + 7x2 6
9x1 4x2 6
5x1 + 2x2 3
7x1 3x2 6
x1 , x2 0

Exercise. Put the LP


max x s.t. x 2
into standard form to verify that its dual is
min 2y s.t. y 1, y 0.
Professor Dull says Rather than convert to standard form by flipping x 2, Ill take the dual first
and then flip the inequality. So the dual is
min 2y s.t. y 1, y 0.
Show Dull is wrong by comparing the optimum dual objective values.

CSCI 5654

H. Gabow

Fall 2007

#15, p. 1

Multiplier Interpretation of the Dual, & Weak Duality


the dual LP solves the problem,
Find the best upper bound on the primal objective implied by the primal constraints.
Example contd. Prove that the Primal Problem has optimum solution x1 = 3/5, x2 = 0, z = 3/5.
z (1/3)(3x1 + x2 ) = z (1/3)(1) = 1/3 not good enough
z (1/5)(5x1 + 2x2 ) = z (1/5)(3) = 3/5 yes!
in general we want a linear combination of primal constraints
(l.h.s.) (the primal objective)
(r.h.s.) is as small as possible
this corresponds exactly to the definition of the dual problem:
the multipliers are the yi (= dual nonnegativity constraints)
dual constraints say coefficient-by-coefficient,
(the linear combination) (the primal objective)
dual objective asks for the best (smallest) upper bound possible
so by definition, (any dual objective value) (any primal objective value)
more formally:
Weak Duality Theorem.
Pm
Pn
xj , j = 1, . . . , n primal feasible & yi , i = 1, . . . , m dual feasible = j=1 cj xj i=1 bi yi
Proof.
m
X

bi y i

n
m
n X
n
m X
X
X
X
cj xj
aij yi )xj
(
aij xj )yi =
(

i=1

i=1 j=1

j=1 i=1

primal constraints &


dual nonnegativity

algebra

CSCI 5654

H. Gabow

j=1

dual constraints &


primal nonnegativity

Fall 2007

#15, p. 2

Remarks
1. its obvious that for a tight upper bound, only tight constraints get combined
i.e., the multiplier for a loose constraint is 0 see Handout#19
its not obvious how to combine tight constraints to get the good upper bound
the dual problem does this
2. How good an upper bound does the dual place on the primal objective?
its perfect! see Strong Duality
its remarkable that the problem of upper bounding an LP is another LP
3. plot all primal and dual objective values on the x-axis:
maximum primal objective
primal objective values

minimum dual objective


dual objective values

duality gap
Illustration of weak duality.
Strong Duality will show the duality gap is actually 0
4. starting with a primal-dual pair of LPs, add arbitrary constraints to each
Weak Duality still holds (above proof is still valid)
even though the 2 LPs need no longer form a primal-dual pair
e.g., we constrain all primal & dual variables to be integers
this gives a dual pair of integer linear programs
Weak Duality holds for any dual pair of ILPs
the duality gap is usually nonzero for dual ILPs

CSCI 5654

H. Gabow

Fall 2007

#15, p. 3

Chvatal, 57 59

Dictionaries are Linear Combinations

Unit 3: Duality

last handout viewed the dual variables as multipliers of primal inequalities


next handout views them as multipliers of dictionary equations in the simplex algorithm
to prepare for this we show the equations of any dictionary
are linear combinations of equations of the initial dictionary
for a given LP, consider the initial dictionary D and any other dictionary D
D has coefficients aij , bi , cj , basic slack variables & nonbasic decision variables
D has coefficients aij , bi , cj
Remark. the same logic applies if D is an arbitrary dictionary
we start by analyzing the objective function:
Pn
Ds cost equation is obtained from Ds cost equation, z = j=1 cj xj ,
by adding in, for all i,
Pn
cn+i times Ds equation for xn+i , xn+i = bi j=1 aij xj
in other words:
Pn+m
Lemma 1. Ds cost equation, z = z + j=1 cj xj , is precisely the equation
P
P
Pn
z = nj=1 cj xj m
i=1 cn+i (bi
j=1 aij xj xn+i ).
Example. check that the optimum primal dictionary of next handout, p.1 has cost equation
z = (x1 2x2 ) + 0.2(3 + 5x1 2x2 s5 )
Proof.
start with two expressions for z:
n+m
X
z+
cj xj

j=1

n
X
j=1

|
{z
}
Ds equation for z

cj xj

m
X
i=1

cn+i (bi

n
X

aij xj xn+i )

j=1

|
{z
}
Pm
Ds equation for z, minus i=1 cn+i 0

both sides of the equation are identical, since the slacks have the same coefficient on both sides
(Handout#6, Nonbasic variables are free)

Remark. the lemmas equation would be simpler if we wrote +cn+i rather than cn+i
but the minus sign comes in handy in the next handout

CSCI 5654

H. Gabow

Fall 2007

#16, p. 1

we analyze the equations for the basic variables similarly:


(this is only needed in Handout#20)
P
Ds equation for basic variable xk is xk = bk jN akj xj
define akj to be 1 if j = k & 0 for any other basic variable
now xk s equation
is a rearrangement of
Pn+m
0 = bk j=1 akj xj
this equation is obtained by adding together, for all i,P
n
ak,n+i times Ds equation for xn+i , xn+i = bi j=1 aij xj
in other words:
P
Lemma 2. Ds equation for basic variable xk , xk = bk jN akj xj , is a rearrangement of the
equation
Pm
Pn
0 = i=1 ak,n+i (bi j=1 aij xj xn+i ).
(Rearrangement simply means move xk to the l.h.s.)
Examples. Check that in the optimum primal dictionary of next handout, p.1,
the equation for x1 is a rearrangement of
0 = 0.2(3 + 5x1 2x2 s5 )
& in the optimum dual dictionary,
the equation for t2 is a rearrangement of
0 = 0.4(1 3y1 + . . . + 7y6 t1 ) + (2 + y1 + . . . 3y6 t2 )
Proof.
start with two expressions for 0:
n+m
X
bk
akj xj
j=1

|
{z
}
Ds equation for xk

m
X

ak,n+i (bi

i=1

n
X

aij xj xn+i )

j=1

{z
}
Ds equations for the slacks

again the slacks have the same coefficients on both sides


so both sides are identical

Moral: its easy to obtain the equations of a dictionary


as linear combinations of equations of the initial dictionary:
the slack coefficients are the multipliers

CSCI 5654

H. Gabow

Fall 2007

#16, p. 2

Chvatal, 57 59, 261 262

Strong Duality Theorem

Unit 3: Duality

Second View of the Dual Variables


the dual LP gives the constraints on the multipliers used by the simplex algorithm
to construct its optimum dictionary
more precisely the simplex algorithm actually solves the dual problem:
in the cost equation of the optimum dictionary
the coefficients cn+i , i = 1, . . . , m form an optimum dual solution
a coefficient cj , j = 1, . . . , n equals the slack in the jth dual inequality, for this solution
before proving this, heres an example, the primal & dual of Handout#15:
for convenience we denote the slack variables as si in the primal, tj in the dual
Starting Primal Dictionary

Starting Dual Dictionary

s1 = 1 + 3x1 x2
s 2 = 1 x1 + x2

t1 = 1 3y1 + y2 2y3 + 9y4 5y5 + 7y6


t2 = 2 + y1 y2 + 7y3 4y4 + 2y5 3y6

s3 = 6 + 2x1 7x2
s4 = 6 9x1 + 4x2

z = y1 y2 6y3 6y4 + 3y5 6y6

s5 = 3 + 5x1 2x2
s6 = 6 7x1 + 3x2
z = x1 2x2
starting primal dictionary isnt feasible, but this doesnt matter
Optimum Primal Dictionary
(= final Phase 1 dictionary)

Optimum Dual Dictionary


(obtained in 1 pivot)

x1 = 0.6 + 0.4x2 + 0.2s5

y5 = 0.2 0.6y1 + 0.2y2 0.4y3 + 1.8y4 + 1.4y6 0.2t1

s2 = 0.4 + 0.60x2 0.2s5


s3 = 7.2 6.2x2 + 0.4s5
s4 = 0.6 + 0.4x2 1.8s5

t2 = 2.4 0.2y1 0.6y2 + 6.2y3 0.4y4 0.2y6 0.4t1

s1 = 0.8 + 0.2x2 + 0.6s5


s6 = 1.8 + 0.2x2 1.4s5

z = 0.6 0.8y1 0.4y2 7.2y3 0.6y4 1.8y6 0.6t1


primal
slacks

primal
decisions

z = 0.6 2.4x2 0.2s5


dual
dual
slacks

decisions

the cost equation of optimum primal dictionary indicates an optimum dual solution
y5 = 0.2, yj = 0 for j = 1, 2, 3, 4, 6
& the corresponding slack in the dual constraints
t2 = 2.4, t1 = 0
CSCI 5654

H. Gabow

Fall 2007

#17, p. 1

check out the dual dictionary!


the proof of the next theorem shows the second view is correct in general
our theorem says the primal & dual problems have the same optimum values, as suspected
e.g., the common optimum is 0.6 in our example

Strong Duality Theorem.


If the primal LP has an optimum solution xj , j = 1, . . . , n then
the dual LP has an optimum solution yi , i = 1, . . . , m with the same objective value,
Pn
Pm
j=1 cj xj =
i=1 bi yi .
Proof.
consider the optimum primal dictionary found by the simplex method
let bars refer to the optimum dictionary, e.g., cj
no bars refer to the given dictionary, e.g., cj
set yi = cn+i , i = 1, . . . , m
these yi are the multipliers
found P
by the simplex
Pn
Pnalgorithm, i.e.,
m
the final cost row is j=1 cj xj + i=1 yi (bi j=1 aij xj xn+i )
Pm
note for j n, cj = cj i=1 aij yi = tj , where tj is the slack in the jth dual constraint
these yi are dual feasible because the final cost coefficients are nonpositive:
yi 0 since cn+i 0, for i = 1, . . . , m
Pm
i=1 aij yi cj since the slack tj = cj 0, for j = 1, . . . , n
Pm
final primal objective value is z = i=1 yi bi = (the dual objective value)
Weak Duality implies this is the minimum possible dual objective value,
& yi is optimum

CSCI 5654

H. Gabow

Fall 2007

#17, p. 2

Remarks.
1. theres no sign flip in LINDO tableaux
2. Strong Duality is the source of many minimax theorems in mathematics
e.g., the Max Flow Min Cut Theorem (Chvatal, p.370)
see Handout#63
3. another visualization of primal-dual correspondence:
primal
primal
decisions slacks
n + m variables
dual
slacks

dual
decisions

Variable correspondence in duality


4. many other mathematical programs have duals and strong duality (e.g., see Handout#43)

CSCI 5654

H. Gabow

Fall 2007

#17, p. 3

Chvatal, 60 62

Unit 3: Duality

Why Dual?

the dual & primal problems are symmetric, in particular:


Theorem. The dual of the dual LP is (equivalent to) the primal LP.
equivalent to means they have the same feasible solutions
& the essentially the same objective values
Proof.
the dual in standardPform is
maximize m
i=1 bi yi
Pm
subject to i=1 aij yi cj
yi 0

(j = 1, . . . , n)
(i = 1, . . . , m)

its dual is equivalent


P to the primal problem:
minimize nj=1 cj uj
Pn
subject to j=1 aij uj bi
(i = 1, . . . , m)
uj 0

(j = 1, . . . , n)

for instance in the example of Handout#17


we can read the optimum primal solution from the optimum dual dictionary:
x1 = 0.6, x2 = 0; & also the primal slack values si
Remark
to find the dual of an LP with both & constraints,
place it into 1 of our 2 standard forms maximization or minimization
whichever is most convenient
Example 1. Consider the LP
maximize z = x s.t. x 1
At first glance its plausible that the dual is
minimize z = y s.t. y 1, y 0
To get the correct dual put the primal into standard minimization form,
minimize z = x s.t. x 1, x 0
and get the correct dual
maximize z = y s.t. y 1, y 0
Alternatively convert to standard maximization form
maximize z = x s.t. x 1, x 0
and get dual
minimize z = y s.t. y 1, y 0.

CSCI 5654

H. Gabow

Fall 2007

#18, p. 1

The 3 Possibilities for an LP and its Dual


if an LP has an optimum, so does its dual (Strong Duality)
if an LP is unbounded, its dual is infeasible (Weak Duality)
e.g., in Example 1 the primal is unbounded and the dual is infeasible
these observations & the previous theorem show there are 3 possibilities for an LP and its dual:
(i) both problems have an optimum & optimum objective values are =
(ii) 1 problem is unbounded, the other is infeasible
(iii) both problems are infeasible
Example of (iii):
maximize
subject to

5x1 + 6x2
29x2 5
29x1
6
x1 , x2 0

this LP is infeasible
the LP is self-dual, i.e., it is its own dual
so the dual is infeasible
Exercise. Using matrix notation of Unit 4, show the LP
max cx s.t. Ax b, x 0
is self-dual if A is skew symmetric and c = bT .
in case (i), plotting all primal and dual objective values on the x-axis gives
primal objective values

dual objective values

optimum primal &


dual objective
in case (ii) the optimum line moves to + or

CSCI 5654

H. Gabow

Fall 2007

#18, p. 2

Exercise. As in the exercise of Handout#2 the Linear Inequalities (LI) problem is to find a solution
to a given system of linear inequalities or declare the system infeasible. We will show that LI is
equivalent to LP, i.e., an algorithm for one problem can be used to solve the other.
(i) Show an LP algorithm can solve an LI problem.
(ii) Show an LI algorithm can solve an LP problem. To do this start with a standard form LP,
maximize z =

Pn

subject to

Pn

j=1 cj xj

j=1

aij xj
xj

bi
0

(i = 1, . . . , m)
(j = 1, . . . , n)

and consider the LI problem,


Pn

aij xj

Pm

aij yi

j=1
i=1

Pn

xj

j=1 cj xj

yi
b
i=1 i yi

Pm

bi
0
cj
0
0

(i = 1, . . . , m)
(j = 1, . . . , n)
(j = 1, . . . , n)
(i = 1, . . . , m)

(ii) together with the exercise of Handout#2 shows any LP can be placed into the standard form
required by Karmarkars algorithm.

CSCI 5654

H. Gabow

Fall 2007

#18, p. 3

Chvatal, 62 65

Unit 3: Duality

Complementary Slackness

rephrase the termination condition of the simplex algorithm in terms of duality:


for any j = 1, . . . , n, xj > 0 = xj basic = cj = 0 = tj = 0, i.e,
xj > 0 = the jth dual constraint holds with equality
for any i = 1, . . . , m, yi > 0 = cn+i < 0 = xn+i nonbasic, i.e.,
yi > 0 = the ith primal constraint holds with equality
heres a more general version of this fact:
as usual assume a standard form primal LP

Complementary Slackness Theorem.


Let xj , j = 1, . . . , n be primal feasible, yi , i = 1, . . . , m dual feasible.
xj is primal optimal & yi is dual optimal
Pm
for j = 1, . . . , n, either xj = 0 or i=1 aij yi = cj
and
Pn
for i = 1, . . . , m, either yi = 0 or j=1 aij xj = bi .
heres an equivalent formulation:

Complementary Slackness Theorem.


Let xj , j = 1, . . . , n be primal feasible, yi , i = 1, . . . , m dual feasible.
Let si , i = 1, . . . , m be the slack in the ith primal inequality,
& tj , j = 1, . . . , n the slack in the jth dual inequality.
xj is primal optimal & yi is dual optimal
for j = 1, . . . , n, xj tj = 0 & for i = 1, . . . , m, yi si = 0.

Remark CS expresses a fact thats obvious from the multiplier interpretation of duality
the dual solution only uses tight primal constraints
Proof.
Weak Duality holds for xj and yi
lets repeat the proof:
Pm

i=1 bi yi

Pm

Pn

i=1 (

j=1

aij xj )yi =

Pn

Pm

j=1 (

i=1

aij yi )xj

Pn

j=1 cj xj

xj and yi are both optimal in this proof, the two s are =


the first is an = for each i = 1, . . . , m, si or yi is 0
the 2nd is an = for each j = 1, . . . , n, tj or xj is 0

CSCI 5654

H. Gabow

Fall 2007

(Strong Duality)

#19, p. 1

Remarks.
Pm
1. a common error is to assume xj = 0 = i=1 aij yi 6= cj , or vice versa
2. the simplex algorithm maintains primal feasibility and complementary slackness (previous page)
& halts when dual feasibility is achieved
3. Complementary Slackness is the basis of primal-dual algorithms (Ch.23)
they solve LPs by explicitly working on both the primal & dual
e.g., the Hungarian algorithm for the assignment problem; minimum cost flow problems
& primal-dual approximation algorithms for NP-hard problems
Exercise. Show the set of all optimum solutions of an LP is a face.
Testing optimality
complementary slackness gives a test for optimality, of any LP solution
given a standard form LP L
Pn
maximize z = j=1 cj xj
P
subject to nj=1 aij xj bi

(i = 1, . . . , m)

xj 0

(j = 1, . . . , n)

let xj , j = 1, . . . , n be a feasible solution to L


our result says xj is an optimal solution it has optimal simplex multipliers:
Theorem. xj is optimal yi , i = 1, . . . , m
xj > 0 =

m
X

aij yi = cj

(1)

i=1
n
X

aij xj < bi = yi = 0

(2)

j=1
m
X

aij yi cj

(j = 1, . . . , n)

(3)

(i = 1, . . . , m)

(4)

i=1

yi 0

remembering the optimum


Pm cost equation of a dictionary (Handout#17),
cj = tj = cj i=1 aij yi for j n, cn+i = yi for i m
(3)(4) say any variable has nonpositive cost
(1)(2) say basic variables have cost 0
Proof.
= : Strong Duality shows the optimum yi exists
Complementary Slackness gives (1) (2)
= : Complementary Slackness guarantees xj is optimal

CSCI 5654

H. Gabow

Fall 2007

#19, p. 2

Application
to check a given feasible solution xj is optimal
use (2) to deduce the yi s that vanish
use (1) to find the remaining yi s (assuming a unique solution)
then check (3) (4)
Examples:
1. Chvatal pp. 6465
2. check x1 = .6, x2 = 0 is optimum to the primal of Handout#15, p.1 (y5 = .2, yi = 0 for i 6= 5)
the above uniqueness assumption is reasonable
for a nondegenerate bfs xj , (1)(2) form a system of m equations in m unknowns
more precisely if k decision variables are nonzero and m k slacks are nonzero
(1) becomes a system of k equations in k unknowns
satisfying the uniqueness condition:
Theorem. xj , j = 1, . . . , n a nondegenerate bfs = system (1)(2) has a unique solution.
Proof.
let D be a dictionary for xj
(1)(2) are the equations satisfied by the m multipliers for the cost equation of D
so we need only show D is unique
since yi appears in the cost equation, distinct multipliers give distinct cost equations
uniqueness follows since
xj corresponds to a unique basis (nondegeneracy) & any basis has a unique dictionary

Corollary. An LP with an optimum nondegnerate dictionary has a unique optimum dual solution.
although the primal can still have many optima
primal: max 2x1 + 4x2 s.t. x1 + 2x2 1, x1 , x2 0
optimum nondegenerate dictionary: x1 = 1 s 2x2
dual: min y s.t. y 2, 2y 4, y 0

CSCI 5654

H. Gabow

z = 2 2s

Fall 2007

#19, p. 3

Exercise. If the complementary slackness conditions almost hold, were close to optimality.
This principle is the basis for many ILP approximation algorithms. This exercise proves the principle, as follows.
Consider a standard form LP
Pn
maximize z = j=1 cj xj
Pn
subject to j=1 aij xj bi

(i = 1, . . . , m)

xj 0

(j = 1, . . . , n)

with optimum objective value z . Let xj , j = 1, . . . , n be primal feasible & yi , i = 1, . . . , m dual


feasible, such that these weakened versions of Complementary Slackness hold, for two constants
, 1:
Pm
for j = 1, . . . , n, xj > 0 implies i=1 aij yi cj ;
Pn
for i = 1, . . . , m, yi > 0 implies j=1 aij xj bi /.
Show that the values xj , j = 1, . . . , n solve the given LP to within a factor of opimality, i.e.,
z

Pn

j=1 cj xj .

Hint. Mimic the proof of Weak Duality.

CSCI 5654

H. Gabow

Fall 2007

#19, p. 4

Chvatal, 65 68

Dual Variables are Prices in Economics

Unit 3: Duality

dual variables are prices of resources: yi is the marginal value of resource i


i.e., yi is the per unit value of resource i,
assuming just a small change in the amount of resource i
Chv
atals Forestry Example (pp. 6768)

x2

x1 + x2 = 100
10x1 + 50x2 = 4000
x1
40x1 + 70x2 = 6250
x1 = # acres to fell and regenerate; x2 = # acres to fell and plant pine
standard form LP
s1 = 100 x1 x2

acreage constraint, 100 acres available

s2 = 4000 10x1 50x2

cash constraint, $4000 on hand

z=

net profit

40x1 + 70x2

z gives the net profit from 2 the forestry activities, executed at levels x1 and x2
optimum dictionary D
x1 = 25 1.25s1 + .025s2
x2 = 75 + .25s1 .025s2
z = 6250 32.5s1 .75s2
suppose (as in Chvatal) there are t more units of resource #2, cash
(t is positive or negative)
in 2nd constraint, 4000 4000 + t
How does this change dictionary D ?
since the optimum dual solution is y1 = 32.5, y2 = .75,
Lemma 1 of Handout#16 shows D s objective equation is
(original equation for z) + 32.5 (1st constraint) +.75 (2nd constraint)
= the objective equation becomes z = 6250 + .75t 32.5s1 .75s2

CSCI 5654

H. Gabow

Fall 2007

#20, p. 1

D is an optimum dictionary as long as its feasible


get the constraints of D using Lemma 2 of Handout#16
constraint for x1 is (a rearrangement of) 1.25 (1st constraint) .025 (2nd constraint)
= x1 = 25 .025t 1.25s1 + .025s2
constraint for x2 is (a rearrangement of) .25 (1st constraint) +.025 (2nd constraint)
= x2 = 75 + .025t + .25s1 .025s2
so D is optimum precisely when 25 .025t 75, i.e., 3000 t 1000
in this range, t units of resource #2 increases net profit by .75t
i.e., the marginal value of 1 unit of resource #2 is .75, i.e., y2
thus its profitable to purchase extra units of resource 2
at a price of (current price) +0.75
i.e., borrow $1 if we pay back $1.75
invest $1 (of our $4000) in another activity if it returns $1.75
Remark
packages differ in their sign conventions for dual prices
LINDO dual price = amount objective improves when r.h.s. increases by 1
AMPL dual price (.rc, dual variable) = amount objective increases when increase by 1

CSCI 5654

H. Gabow

Fall 2007

#20, p. 2

Marginal Value Theorem


the dual variables are marginal values, shadow prices
specifically we prove yi is the marginal value of resource i:
suppose a standard form LP L has a nondegenerate optimum bfs
let D (with starred coefficients, e.g., z ) be the optimum dictionary
let yi , i = 1, . . . , m be the optimum dual solution
(unique by Handout #19)
for values ti , i = 1, P
. . . , m, define the perturbed LP L(t):
n
maximize j=1 cj xj
P
(i = 1, . . . , m)
subject to nj=1 aij xj bi + ti
xj 0

(j = 1, . . . , n)

Theorem. > 0 for any ti , |ti | , i = 1, . . . , m,


Pm
L(t) has an optimum & its optimum value is z + i=1 yi ti .
Mnemonic. z =

bi yi , so z/bi = yi

this theorem is stronger than the above example


the resource amounts can vary independently
Proof.
recall Lemmas 12, Handout#16:
in the optimum, nondegenerate dictionary D for L,
each equation is a linear combination of equations of the initial dictionary
cost equation has associated multipliers yi
kth constraint equation has associated multipliers uki
using these multipliersP
on L(t) gives dictionary
Pm with

basic values bk + m
u
t
,
cost
z
+
i=1 ki i
i=1 yi ti
this solution is optimum as long as its feasible, i.e., bk +

Pm

i=1

uki ti 0

let b = min{bk : 1 k m}
(b > 0 by nondegeneracy)
U = max{|uki | : 1 k, i m} (U > 0, else no constraints)
= b/(2mU )
( > 0)
taking |ti | for all i makes l.h.s. b mU = b/2 > 0

CSCI 5654

H. Gabow

Fall 2007

#20, p. 3

More Applications to Economics


Economic Interpretation of Complementary Slackness
Pn

j=1 aij xj

< bi = yi = 0

not all of resource i is used = its price is 0


i.e., more of i doesnt increase profit
Pm

i=1 aij yi

> cj = xj = 0
if the resources consumed by activity j are worth more than its (net) profit,
we wont produce j

Nonincreasing Returns to Scale


we show the value of each resource is nonincreasing, by Weak Duality:
let L have optimum value z & (any) dual optimal solution yi , i = 1, . . . , m
(yi is unique of L is nondegenerate, but we dont assume that)
Theorem.
For any ti ,P
i = 1, . . . , m and any fs xj , j = 1, . . . , n to L(t),
Pn
m

c
x

z
+
i=1 yi ti
j=1 j j
Proof.
we repeat the Weak Duality argument:
Pn

j=1 cj xj

Pn

Pm

j=1 (

i=1

aij yi )xj =

(last step uses Strong Duality)

CSCI 5654

Pm

Pn

i=1 (

j=1

aij xj )yi

Pm

i=1 (bi

+ ti )yi = z +

Pm

i=1 ti yi

H. Gabow

Fall 2007

#20, p. 4

Chvatal, 137 143

Allowing Equations & Free Variables

Unit 3: Duality

in standard maximization form


each constraint gives a nonnegative dual variable
each nonnegative variable gives a constraint
we can extend this correspondence to allow equations and free variables
in standard maximization form:
(i) each = constraint gives a free dual variable
(ii) each free variable gives an = constraint
(in all other respects we form the dual as usual)
(i) & (ii) hold in standard minimization form too
Example.
Dual

Primal
maximize x1 + 2x2 + 3x3 + 4x4
subject to

3x1 + x2 + x3 x4 1
x1 x2 x3 + 2x4 = 1
2x1 + 7x2 + x3 4x4 = 6

minimize
subject to

y1 + y2 + 6y3 + 6y4
3y1 + y2 2y3 + 9y4 1
y1 y2 + 7y3 4y4 2
y1 y2 + y3 y4 = 3
y1 + 2y2 4y3 + 6y4 = 4
y1 , y4 0

9x1 4x2 x3 + 6x4 6


x1 , x2 0

note that to form a dual, we must still start with a consistent primal
e.g., a maximization problem with no constraints
Proof of (i) (ii).
consider a problem P in standard form plus additional equations & free variables
we transform P to standard form & take the dual D
we show D is equivalent to
the problem produced by using rules (i) (ii) on P
Pn
(i) consider an = constraint in P, j=1 aij xj = bi
it gets transformed to standard form constraints,
Pn
aij xj bi
Pj=1
n
j=1 aij xj bi
these constraints give 2 nonnegative variables in D, pi & ni
the jth constraint of D has the terms aij pi aij ni & the objective bi pi bi ni
equivalently, aij (pi ni ) & bi (pi ni )
substituting yi = pi ni gives a free dual variable yi with terms aij yi & bi yi
(ii) is similar
in fact just read the above proof backwards

CSCI 5654

H. Gabow

Fall 2007

#21, p. 1

Exercises.
1. Repeat the proof using our slicker transformations, i.e., just 1 negative variable/1 constraint.
2. Show the dual of an LP in standard maximization form remains the same when we think of all
variables as free and xj 0 as a linear constraint.
3. Show dropping a constraint that doesnt change the feasible region doesnt change the dual.
Remarks
1. Equivalent LPs have equivalent duals, as the exercises show.
P
P
2. Often we take the primal problem to be, max
cj xj s.t.
aij xj bi
(optimizing over
polyhedron)
Pa generalP
with dual min
yi bi s.t.
yi aij = cj , yi 0
obviously Weak Duality & Strong Duality still hold for a primal-dual pair
Chvatal proves all this sticking to the interpretation of the dual LP
as an upper bound on the primal
Complementary Slackness still holds
theres no complementary slackness condition for an equality constraint or a free variable!
(since its automatic)
Examples. we give 2 examples of LPs with no Complementary Slackness conditions:
1. heres a primal-dual pair where every feasible primal or dual solution is optimum:
maximize x1 + 2x2
subject to x1 + 2x2 = 1

minimize y1
subject to y1 = 1
2y1 = 2

2. in this primal-dual pair, the dual problem is infeasible


maximize x1
subject to x1 + x2 = 1

CSCI 5654

minimize y1
subject to y1 = 1
y1 = 0

H. Gabow

Fall 2007

#21, p. 2

Chvatal, Ch. 15

Unit 3: Duality

Duality in Game Theory

Saddle Points in Matrices


-2

-1

0
(a)

-1

-1

1
(b)

Fig.1. Matrices with row minima underlined by leftward arrow


& column maxima marked with upward arrow.
Fact. In any matrix, (the minimum entry of any row) (the maximum entry of any column).
an entry is a saddle point if its the minimum value in its row and the maximum value in its column
a matrix with no duplicate entries has 1 saddle point (by the Fact)
Example. Fig.1(a) has a saddle point but (b) does not
0-Sum Games & Nash Equilibria
a finite 2-person 0-sum game is specified by an m n payoff matrix aij
when the ROW player chooses i and the COLUMN player chooses j,
ROW wins aij , COLUMN wins aij
Example. Fig.1(b) is for the game Matching Pennies
ROW & COLUMN each choose heads or tails
ROW wins $1 when the choices match & loses $1 when they mismatch
for Fig.1(a),
ROW maximizes her worst-case earnings by choosing a maximin strategy,
i.e., she choose the row whose minimum entry is maximum, row 2
COLUMN maximizes her worst-case earnings by choosing a minimax strategy,
i.e., she choose the column whose maximum entry is minimum, column 1
this game is stable in repeated plays,
neither player is tempted to change strategy
this is because entry 1 is a saddle point
in general,
if a payoff matrix has all entries distinct & contains a saddle point,
both players choose it, the game is stable &
(ROWs worst-case winnings) = (COLUMNs worst-case losses)
& in repeated plays both players earn/lose this worst-case amount

CSCI 5654

H. Gabow

Fall 2007

()

#22, p. 1

Remark.
for 0-sum games, stability is the same as a Nash point:
in any game, a Nash equilibrium point is a set of strategies for the players
where no player can improve by unilaterally changing strategies
Example 2. Matching Pennies is unstable:
1

-1

-1

Fig.2 ROW plays row 1 and COLUMN plays column 1.


Then COLUMN switches to 2. Then ROW switches to 2.
Then COLUMN switches to 1. Then ROW switches to 1.
The players are in a loop!
Example 3. this game is stable, in spite of the embedded cycle from Matching Pennies:
-1

-1

-2

-2

any game with no saddle point is unstable theres no Nash equilibrium point, so some player will always switch
ROW prefers and will switch to it
COLUMN prefers and will switch to it
.. no saddle point = 1 or both players always switch
but suppose we allow (more realistic) stochastic strategies
each player plays randomly, choosing moves according to fixed probabilities
Example.
in Matching Pennies, each player chooses heads or tails with probability 1/2
this is a Nash equilibrium point:
each player has expected winnings = 0
she cannot improve this by (unilaterally) switching strategies
any other strategy still has expected winnings = 0
the game is stable and () holds
but what if ROW plays row 1 with probability 3/4?
COLUMN will switch to playing column 2 always,
increasing expected winnings from 0 to 1/2 = (3/4)(1) + (1/4)(1)
then they start looping as in Fig.2
well show that in general, stochastic strategies recover ()

CSCI 5654

H. Gabow

Fall 2007

#22, p. 2

The Minimax Theorem.


For any payoff matrix, there are stochastic strategies for ROW & COLUMN
(ROWs worst-case expected winnings) = (COLUMNs worst-case expected losses).
Proof.
ROW plays row i with probability xi , i = 1, . . . , m
this strategy gives worst-case expected winnings z iff
ROWs expected winnings are z for each column
to maximize z, ROW computes xi as the solution to the LP
maximize z P
m
subject to z i=1 aij xi 0
Pm
i=1 xi = 1
xi 0

j = 1, . . . , n
i = 1, . . . , m

z unrestricted
COLUMN plays column j with probability yj , j = 1, . . . , n
this strategy gives worst-case expected losses w iff
the expected losses are w for each row
to minimize w, COLUMN computes yj as the solution to the LP
minimize w
Pn
subject to w j=1 aij yj 0
Pn
j=1 yj = 1

i = 1, . . . , m

yj 0

j = 1, . . . , n

w unrestricted
these 2 LPs are duals
both are feasible (set one xi = 1, all others 0)
= both have the same optimum objective value

the common optimum is the value of the game


obviously if both players use their optimum strategy,
they both earn/lose this worst-case amount
Exercise. Using complementary slackness, check that ROW & COLUMN have the equal expected
winnings.

CSCI 5654

H. Gabow

Fall 2007

#22, p. 3

Example for Minimax Theorem.

5/11

6/11

7/11

-2

4/11

-3

-4

Fig.4 Optimum stochastic strategies are given along the matrix borders.
The loop for deterministic play is also shown.
the value of this game is 2/11:
ROWs expected winnings equal
(7/11) [(5/11)(2) + (6/11)(2)] +(4/11) [(5/11)(4) + (6/11)(3)] = 2/11
|
{z
}
|
{z
}
2/11

2/11

1/2-q

1/2-q

-1

-1

1/2-p
1

-1

1/2-p
-1

Fig.5 Game with 2 copies of Matching Pennies.


General form of optimum strategies is shown, for 0 p, q 1/2.
Remarks
1. as shown in Handout#3,
LP can model a minimax objective function (as shown in COLUMNs LP)
or a maximin (ROWs LP)
2. in more abstract terms weve shown the following:
a matrix with a saddle point satisfies maxi minj {aij } = minj maxi {aij }
the Minimax Theorem says any matrix has
a stochastic row vector x & a stochastic column vector y with
miny x Ay = maxx xAy

CSCI 5654

H. Gabow

Fall 2007

#22, p. 4

Chvatal, 97 100

Matrix Representations of LPs

Unit 4: Implementation

the matrix representation of LPs uses standard row/column conventions


e.g., heres an example LP well call E:
 

 x1
maximize z = 3 1
x2
   

1
1 0 x1

subject to
2
1 1 x2
   
0
x1

x2
0
Standard Form LP:

Standard Minimization Form: (dual)

maximize cx
subject to Ax b
x0

minimize yb
subject to yA c
y0

Conventions
A: m n coefficient matrix
b: column vector of r.h.s. coefficients (length m)
c: row vector of costs (length n)
x: column vector of primal variables (length n)
y: row vector of dual variables (length m)
Initial Dictionary
let xS be the column vector of slacks
xS = b Ax
z = cx
more generally:
extend x, c, A to take slack variables into account
now x & c are length n + m vectors; A = [A0 I ] is m (n + m)
the standard form LP becomes Standard Equality Form:
maximize cx
subject to Ax = b
x0
we assume this form has been constructed from standard form, i.e., slacks exist
alternatively we assume A has rank m, i.e., it contains a basis (see Handout#31)

 
 x1

1
1 0 1 0 x2
e.g., our example LP E has constraints
=
2
1 1 0 1
x3
x4
a basis is denoted by B, an ordered list of indices of basic variables
(each index is between 1 & n + m)
e.g., in E the basis of slacks is 3, 4
when B denotes a basis, N denotes the indices of nonbasic variables
i.e., the complementary subset of {1, . . . , n + m}, in any order

CSCI 5654

H. Gabow

Fall 2007

#23, p. 1

Theorem. B is a basis AB is a nonsingular matrix.


Proof.
= :
the dictionary for B is equivalent to the given system Ax = b
it shows the bfs for B is the unique solution when we set xN = 0,
i.e., the unique solution to AB xB = b
this implies AB is nonsingular (see Handout #55)
= :
let B denote AB (standard notation)
the given constraints are BxB + AN xN = b
solve for xB by multiplying by B1
the ith variable of B is the l.h.s. of ith dictionary equation
express objective z = cB xB + cN xN in terms of xN by substituting for xB
thus the dictionary for a basis B is
xB = B1 b B1 AN xN
z = cB B1 b + (cN cB B1 AN )xN

Remark.
the expression cB B1 in the cost equation will be denoted y (the vector of dual values)
the cost row corresponds to Lemma 1 of Handout#16
looking at the original dictionary, y is the vector of multipliers
Example.





0
1 0
1
,B =
1
1 1

1
in E, B = (1, 2) gives B =
1
dictionary:


x1
x2

1 0
=
1 1

  

    
 
1
1 0 1 0 x3
1
1 0 x3

2
1 1 0 1 x4
1
1 1 x4

  

 1
+
z= 3 1
0
1


cB B

= 3

0 3 1

1 0
1 1

 

x3
x4

= 4 2x3 x4




1 0
= 2 1
1 1

in scalar form,
x1 = 1 x3
x2 = 1 + x3 x4
z = 4 2x3 x4

CSCI 5654

H. Gabow

Fall 2007

#23, p. 2

Exercise. (Rounding Algorithm) Consider an LP


minimize cx
subject to Ax b
where A is an m n matrix of rank n. (This means A has n linearly independent columns. Any
LP in standard form satisfies this hypothesis.) Let u be feasible. We will prove the feasible region
has a vertex with cost cu, unless it has a line of unboundedness.
Let I {1, . . . , m} be a maximal set of linearly independent constraints that are tight at u. If
|I| = n then u is a vertex and were done. So suppose |I| < n.
We will find either a line of unboundedness or a new u, of no greater cost, that has a larger set I.
Repeating this procedure n times gives the desired line of unboundedness or the desired vertex
u.

u0

u1

u3

u2

Path u0 , . . . , u3 taken by rounding algorithm.


Objective = height.
Choose a nonzero vector w such that Ai w = 0 for every constraint i I.
(i) Explain why such a w exists.
Assume cw 0 (if not, replace w by its negative).
(ii) Explain why every constraint i with Ai w 0 is satisfied by u+tw for every t 0. Furthermore
the constraint is tight (for every t 0) if Ai w = 0.
Let J be the set of constraints where Ai w > 0.
(iii) Suppose J = and cw < 0. Explain why u + tw, t 0 is a line of unboundedness.
(iv) Suppose J 6= . Give a formula for , the largest nonnegative value of t where u + tw is
feasible. Explain why u + w has a larger set I, and cost no greater than u.
(v) The remaining case is J = and cw = 0. Explain why choosing the vector w gets us into the
previous case.
This proof is actually an algorithm that converts a given feasible point u into a vertex of no greater
cost (or a line of unboundedness). The algorithm is used in Karmarkars algorithm.
(vi) Explain why any polyhedron Ax b where A has rank n has a vertex.

CSCI 5654

H. Gabow

Fall 2007

#23, p. 3

Chvatal, 100 105

Revised Simplex Algorithm: High Level

Unit 4: Implementation

weve described the standard simplex algorithm


it works with completely specified dictionaries/tableaus
the revised simplex algorithm implements the standard simplex more efficiently
using linear algebra techniques
to understand the approach recall the dictionary for a basis B:
xB = B1 b B1 AN xN
z = cB B1 b + (cN cB B1 AN )xN
1. most of B1 AN xN isnt needed
we only use the column of the entering variable
so the revised simplex algorithm doesnt compute it!
2. this could be done by computing/maintaining B1
but inverting a matrix is slow, inaccurate, & most importantly we may lose sparsity
e.g., Chvatal p.96, ex. 6.12
instead well use routines that solve linear systems Ax = b, yA = c
Data Structures for Revised Simplex Algorithm
A, b, c all refer to the given LP, which is in Standard Equality Form
the basis heading B is an ordered list of the m basic variables
B denotes AB , i.e., the m basic columns of A (ordered according to B)
xB is the vector of current basic values, B1 b (ordered according to B)
to find the entering variable we need the current dictionarys cost equation
well compute the vector y = cB B1
notice its appearance twice in the cost equation
at termination y is the optimum dual vector (the simplex multipliers)
to find the leaving variable we need the entering variables coefficients in the current dictionary
this is the vector d = B1 A.s
Revised Simplex Algorithm, High-level
Entering Variable Step
Solve yB = cB
Choose any (nonbasic) s cs > yAs
If none exists, stop, B is an optimum basis
Leaving Variable Step
Solve Bd = As
Let t be the largest value xB td 0
If t = , stop, the problem is unbounded
Otherwise choose a (basic) r whose component of xB td is zero
CSCI 5654

H. Gabow

Fall 2007

#24, p. 1

Pivot Step
In basis heading B replace r by s (this redefines B)
xB xB td
In xB , replace entry for r (now 0) by t

Correctness & Efficiency


Note: in Standard Equality Form, n m (i.e., # variables # equations)
Entering Variable Step
a nonbasic variable xj has current cost coefficient cj yAj
to save computation its convenient to take the entering variable as
the first nonbasic variable with positive cost
time for this step: O(m3 ) to solve the system of equations
plus O(m) per nonbasic variable considered, O(mn) in the worst case
Leaving Variable Step
xs = t, xB = xB td, & all other variables 0 satisfies the dictionary equations
since xB does
and increasing xs to t decreases the r.h.s. by dt
so xs = t is chosen to preserve nonnegativity
if xB td 0 for all t 0, it gives a line of unboundedness, i.e.,
as t increases without bound, so does z
(since xs has positive cost coefficient)
time: O(m3 )
Pivot Step
time: O(m)
our worst-case time estimates show revised simplex takes time O(m3 + mn) per iteration
not an improvement: standard simplex takes time O(mn) per iteration (Pivot Step)
but well implement revised simplex to solve each system of equations
taking advantage of the previous solution
well take advantage of sparsity of the given LP
real-life problems are sparse
this is the key to the efficiency of the revised simplex algorithm

CSCI 5654

H. Gabow

Fall 2007

#24, p. 2

Chvatal, Ch. 6

Review of Gaussian Elimination

Unit 4: Implementation

Gaussian Elimination
solves a linear system

Pn

j=1 aij xj

= bi , i = 1, . . . , n in 2 steps:

1. rewrite thePequations so each variable has a substitution equation of the form


n
xi = bi + j=i+1 aij xj , as follows:
for each variable xj , j = 1, . . . , n
choose a remaining equation with aij 6= 0
rewrite this equation as a substitution equation for xj
(i.e., divide by aij & isolate xj )
remove this equation from the system
eliminate xj from the equations remaining in the system, by substituting
(an iteration of this procedure is called a pivot step for aij )
2. back substitute:
compute successively the values of xn , xn1 , . . . , x1 ,
finding each from its substitution equation

Accuracy
avoid bad round-off error by initially scaling the system to get coefficients of similar magnitude
also, choose each pivot element aij to have largest magnitude (from among all elements aj )
this is called partial pivotting
complete pivotting makes the selection from among all elements a
so the order that variables are eliminated can change
Time
assume 1 arithmetic operation takes time O(1)
i.e., assume the numbers in a computation do not grow too big
(although in pathological cases the numbers can grow exponentially (Edmonds, 67))
time = O(n3 )
back substitution is O(n2 )
theoretically, solving a linear system uses time O(M (n)) = O(n2.38 )
by divide-and-conquer methods of matrix multiplication & inversion
for sparse systems the time is much less, if we preserve sparsity

CSCI 5654

H. Gabow

Fall 2007

#25, p. 1

Preserving Sparsity
if we pivot on aij we eliminate row i & column j from the remaining system
but we can change other coefficients from 0 to nonzero
the number of such changes equals the fill-in
we can reduce fill-in by choice of pivot element, for example:
let pi (qj ) be the number of nonzero entries in row i (column j) in the (remaining) system
pivotting on aij gives fill-in (pi 1)(qj 1)
Markowitzs rule: always pivot on aij , a nonzero element
that minimizes (pi 1)(qj 1)
this usually keeps the fill-in small
Remark. minimizing the total fill-in is NP-hard
Matrices for Pivotting
permutation matrix an identity matrix with rows permuted
let P be an n n permutation matrix
for any n p matrix A, PA is A with rows permuted the same as P
e.g., to interchange rows r and s, take P an identity with rows r and s interchanged
a permutation matrix can equivalently be described as I with columns permuted
for any p n matrix A, AP is A with columns permuted as in P
upper triangular matrix all entries strictly below the diagonal are 0
similarly lower triangular matrix
eta matrix (also called pivot matrix)
identity matrix with 1 column (the eta column) changed arbitrarily
as long as its diagonal entry is nonzero (so its nonsingular)
Remark. the elementary matrices are the permutation matrices and the etas
in the system Ax = b pivotting on akk results in the system LAx = Lb
for L a lower triangular eta matrix whose kth column entries ik are
0 (i < k), 1/akk (i = k), aik /akk (i > k)
a pivot in the simplex algorithm is similar (L is eta but not triangular)
(a Gauss-Jordan pivot)

CSCI 5654

H. Gabow

Fall 2007

#25, p. 2

Gaussian Elimination Using Matrices


the substitution equations form a system Ux = b
U is upper triangular, diagonal entries = 1

(1)

if we start with a system Ax = b and do repeated pivots, the kth pivot


(i) interchanges the current kth equation with the equation containing the pivot element,
i.e., it premultiplies the system by some permutation matrix Pk
(ii) pivots, i.e., premultiplies by a lower triangular eta matrix Lk , where the eta column is k
so the final system (1) has
U = Ln Pn Ln1 Pn1 . . . L1 P1 A, an upper triangular matrix with diagonal entries 1
b = Ln Pn Ln1 Pn1 . . . L1 P1 b
matrices U, Li , Pi form a triangular factorization for A
if A is sparse, can usually achieve sparse U and Li s
the # of nonzeroes slightly more than doubles (Chvatal, p.92)
Remark. an LUP decomposition of a matrix A writes it as A = LUP.
Exercise. In an integral LP all coefficients in A, b & c integers. Its size in bits is measured by this
parameter:
P
L = (m + 1)n + n log n + { log |r| : r a nonzero entry in A, b or c}
The following fact is important for polynomial time LP algorithms:
Lemma. Any bfs of an integral LP has all coordinates rational numbers whose numerator &
denominator have magnitude < 2L .
Prove the Lemma. To do this recall Cramers Rule for solving Ax = b, and apply it to our formula
for a dictionary (Handout#23).

CSCI 5654

H. Gabow

Fall 2007

#25, p. 3

Chvatal, 105 111

Solving Eta Systems

Unit 4: Implementation

in the linear systems solved by the revised simplex algorithm


yB = cB , Bd = As
B can be viewed as a product of eta matrices
thus we must solve eta systems of the form
yE1 . . . Ek = c, E1 . . . Ek x = b
Why B is a Product of Eta Matrices (Handout #57 gives a slightly different explanation)
let eta matrix Ei specify the pivot for ith simplex iteration
if the kth basis is B, then Ek . . . E1 B = I
1
thus B = (Ek . . . E1 )1 = E1
1 . . . Ek
B is a product of eta matrices, since
the inverse of a nonsingular eta matrix is an eta matrix
(because the inverse of a pivot is a pivot)
Simple Eta Systems
let E be an eta matrix, with eta column k
To Solve Ex = b
1. xk = bk /ekk
2. for j 6= k, xj = bj ejk xk
To Solve yE = c
1. for j 6= k, yP
j = cj
2. yk = (ck j6=k yj ejk )/ekk

Time
O(dimension of b or c)
this improves to time O(#of nonzeros in the eta column) if
(i) we can overwrite b (c) & change it to x (y)
both b & c are stored as arrays
(ii) E is stored in a sparse data structure, e.g., a list of nonzero entries in the eta column
ekk is stored first, to avoid 2 passes
General Systems
basic principle: order multiplications to work with vectors rather than matrices
equivalently, work from the outside inwards
let E1 , . . . , Ek be eta matrices
To Solve E1 . . . Ek x = b
write the system as E1 (. . . (Ek x)) = b & work left-to-right:
solve E1 b1 = b for unknown b1
then solve E2 b2 = b1 for unknown b2
etc., finally solving Ek bk = bk1 for unknown bk = x

CSCI 5654

H. Gabow

Fall 2007

#26, p. 1

To Solve yE1 . . . Ek = c
write as ((yE1 ) . . .)Ek = c & work right-to-left:
solve ck Ek = c for ck
then solve ck1 Ek1 = ck for ck1
etc., finally solving c1 E1 = c2 for c1 = y
Time
using sparse data structures time = O(total # nonzeros in all k eta columns)
+O(n) if we cannot destroy b (c)
An Extension
let U be upper triangular with diagonal entries 1
To Solve UE1 . . . Ek x = b
Method #1:
solve Ub1 = b for b1 (by back substitution)
then solve E1 . . . Ek x = b1
Method #2 (more uniform):
for j = 1, . . . , n, let Uj be the eta matrix whose jth column is U.j
Fact: U = Un Un1 . . . U1 (note the order)
Verify thisby doing the
pivots.

1 a b
1 0 b
1 a 0
Example: 0 1 c = 0 1 c 0 1 0
0 0 1
0 0 1
0 0 1
so use the algorithm for a product of eta matrices
for both methods, the time is essentially O(total # nonzero coefficients)
similarly for yUE1 . . . Ek = c

CSCI 5654

H. Gabow

Fall 2007

#26, p. 2

Chvatal, 195 200, 207 211

The Cutting Stock Problem

Unit 4: Implementation

the revised simplex algorithm can handle LPs with huge numbers of variables!
this technique was popularized by the work of Gilmore & Gomory on the cutting stock problem
(the decomposition principle, Chvatal Ch.26, uses the same idea on LPs that overflow memory)
1-Dimensional Cutting Stock Problem
arises in production of paper, foil, sheet metal, etc.
Cutting Stock Problem
raw material comes in raw rolls of width r
must produce bi final rolls of width wi , i = 1, . . . , m
each wi r
Problem: minimize the number of raws used
(this problem is NP-complete just 3 finals per raw is the 3 partition problem)
LP Formulation of Cutting Stock Problem
a cutting pattern cuts 1 raw into ai finals of width wi (i = 1, . . . , m)
plus perhaps some waste
thus a cutting pattern is any solution to
Pm
i=1 wi ai r, ai a nonnegative integer
form matrix A having column Aj = (the jth cutting pattern)
let variable xj = (# of raws that use pattern j)
cutting stock LP:

minimize [1, . . . , 1]x


subject to Ax = b
x0

Remarks
1. the definition of cutting pattern allows us to assume = (rather than ) in the LP constraints
also it makes all cj = 1; this will be crucial for column generation
2. the LP ignores the constraint that xj is integral
in practice, rounding the LP optimum to an integral solution gives a high-quality answer
(since rounding increases the cost by < m)
in some applications, fractional xj s are OK see Chvatal
3. the LP is hard to solve because the # of variables xj is huge
practical values are m 40, r 500, 50 wi 200; gives 107 patterns!
4. a good initial solution to the LP is given by the greedy algorithm, first fit decreasing:
use the largest final that fits, and proceed recursively

CSCI 5654

H. Gabow

Fall 2007

#27, p. 1

Delayed Column Generation


this technique adapts the revised simplex algorithm so it can handle
potentially unlimited numbers of variables, if they have a nice structure
to understand the method recall that
the Entering Variable Step seeks a (nonbasic) variable xj with positive current cost cj ,
where cj = cj yAj
we implement the Entering Variable Step using a subroutine for the following auxiliary problem:
check if current solution x & and dual variables y are optimal, i.e., yA c
if not, find a (nonbasic) column Aj , cj with yAj < cj
usually we do both parts by maximizing cj yAj
solving the auxiliary problem allows us to complete the simplex iteration:
use the variable returned xj as the entering variable
use Aj in the Leaving Variable Step
use cj in the next Entering Variable Step (for cB )
thus we solve the LP without explicitly generating A!
The Knapsack Problem
the knapsack problem is to pack a 1-dimensional knapsack with objects of given types,
maximizing the value packed:
P
maximize z = m
i=1 ci xi
Pm
subject to i=1 ai xi b
xi 0, integral

(i = 1, . . . , m)

all ci & ai are positive (not necessarily integral)


the knapsack problem
(i) is the simplest of ILPs, & a common subproblem in IP
(ii) is NP-complete
(iii) can be solved efficiently in practice using branch-and-bound
(iv) solved in pseudo-polynomial time (O(mb2 )) by dynamic programming, for integral ai s

CSCI 5654

H. Gabow

Fall 2007

#27, p. 2

The Cutting Stock Auxiliary Problem is a Knapsack Problem


for an unknownP
cutting pattern ai
m
(satisfying i=1 wi ai r, ai nonnegative integer)
Pm
we want to maximize 1 i=1 yi ai
a pattern costs 1 when we formulate
Pm the cutting stock problem as a maximization problem
equivalently we want to maximize z = i=1 (yi )ai
can assume ai = 0 if yi 0
this gives a knapsack problem
let z be the maximum value of the knapsack problem
z 1 = current cutting stock solution is optimum
z > 1 gives an entering column As
note that our column generation technique amounts to using the largest coefficient rule
experiments indicate this rule gives fewer iterations too!
A Clarification
Chvatal (pp.199200) executes the simplex algorithm for a minimization problem
i.e., each entering variable is chosen to have negative cost
so his duals are the negatives of those in this handout
Exercises.
1. Prove the above statement, i.e., executing the simplex to minimize z gives current costs & duals
that are the negatives of those if we execute the simplex to maximize z.
2. Explain why the entire approach fails in a hypothetical situation where different cutting patterns
can have different costs.

CSCI 5654

H. Gabow

Fall 2007

#27, p. 3

Chvatal, 201 207

Branch-and-Bound Algorithms

Unit 4: Implementation

branch-and-bound is a method of solving problems by partial enumeration


i.e., we skip over solutions that cant possibly be optimal
invented and used successfully for the Travelling Salesman Problem
in general, a branch-and-bound search for a maximum maintains
M , the value of the maximum solution seen so far
& a partition of the feasible region into sets Si
each Si has an associated upper bound i for solutions in Si
repeatedly choose i with largest i
if i M stop, M is the maximum value
otherwise search for an optimum solution in Si & either
find an optimum & update M ,
or split Si into smaller regions Sj , each with a lower upper bound j
Example
consider the asymmetric TSP
the same problem as Handout#1, but we dont assume cij = cji
asymmetric TSP is this ILP:
P
minimize z = i,j cij xij
P
xji = 1
subject to
Pj6=i
=1
j6=i xij
xij
P

ijS

xij

{0, 1}
|S| 1

i = 1, . . . , n
i = 1, . . . , n

(enter each city once)


(leave each city once)

i, j = 1, . . . , n
S {1, . . . , n}

(no subtours)

Exercise. Show that the subtour elimination constraints are equivalent to


P
S {1, . . . , n}
iS,j
/ S xij 1,
for this ILP, as well as for its LP relaxation.

dropping the last line of the ILP gives another ILP, the assignment problem
(see Handout#45,p.2)
the assignment problem can be solved efficiently (time O(n3 ))
its optimum solution is a lower bound on the TSP solution
(since its a relaxation of TSP)

CSCI 5654

H. Gabow

Fall 2007

#28, p. 1

Branch-and-Bound Procedure for TSP


in the following code each partition set Si is represented by an assignment problem P
Si = { all possible assignments for P }
M / M will be the smallest tour cost seen so far /
repeat the following until all problems are examined:
choose an unexamined problem P
P is always an assignment problem
initially the only unexamined problem is the assignment version of the given TSP
let be the optimum cost for assignment problem P
if < M then
if the optimum assignment is a tour, M
otherwise choose an unfixed variable xij & create 2 new (unexamined) assignment problems:
the 1st problem is P, fixing xij = 0
the 2nd problem is P, fixing xij = 1, xji = 0
/ possibly other xi s can be zeroed /
else / M / discard problem P
Remarks.
1. to fix xij = 1, delete row i & column j from the cost matrix
to fix xij = 0, set cij =
in both cases we have a new assignment problem
2. the assignment algorithm can use the previous optimum as a starting assignment
time O(n2 ) rather than O(n3 )
3. choosing the branching variable xij (see Exercise for details):
xij is chosen as the variable having xij = 1 such that
setting xij = 0 gives the greatest increase in the dual objective
sometimes this allows the xij = 0 problem to be pruned before it is solved
Exercise. (a) Show the assignment problem (Handout#45) has dual problem
Pn

maximize z =

Pn

subject to

ui + vj cij

i=1 ui

j=1

vj
i, j = 1, . . . , n

(b) Let z be the cost of an optimum assignment. Let ui , vj be optimum duals. Show that any
feasible assignment with xij = 1 costs z + cij ui vj .
(c) The b-&-b algorithm branches on xij where xij = 1 and ij maximizes
0 = min{cik ui vk : k 6= j} + min{ckj uk vj : k 6= i} + z .
Show this choice allows us to discard the xij = 0 problem if 0 M . Hint. Use what you learned
in (b).

CSCI 5654

H. Gabow

Fall 2007

#28, p. 2

Example Execution
j

1
2
3
4
5
6
vj

7
20
21
12
23
7

2
27

13
16
46
5
5

3
43
16

25
27
5
5

4
16
1
35

48
9
1

5
30
30
5
18

5
5

6
26
25
9
18
5

= 54, M = since have cycle (1, 4, 2, 1)

x63 = 1, x36 = 0
= 54 (same assgt.)

63

ui
15
0
0
11
5
0

x63 = 0
tour asst.
M =
(0 = 63)
x63 = 1, x36 = x35 = 0
0 = 8 + 2 + 54 = 64
discard

Cost matrix cij for given TSP


& optimum assignment (boldface; cost 54)
& optimum duals ui , vj (sum 54)

x63 = x35 = 1
x36 = x53 = x56 = 0
= 65
discard

The optimum tour cost is 63.

the b-&-b algorithm has enjoyed success because


the optimum assignment cost is often close to the optimum TSP cost
e.g., in 400 randomly generated asymmetric TSP instances with 50 n 250,
the optimum assignment cost averaged > 99% of the optimum TSP cost
with the bound getting better as n increased (The Travelling Salesman Problem, 1985)
we might expect 1 in n assignments to be a tour
since there are (n 1)! tours,
& the number of assignments (i.e., the number of derangements) is n!/e
we dont expect good performance in symmetric problems,
since a cheap edge ij will typically match both ways, ij and ji
Branch-and-Bound Algorithm for Knapsack
recall the KnapsackPProblem from last handout:
maximize z = m
i=1 ci xi
Pm
subject to i=1 ai xi b
xi 0, integral

(i = 1, . . . , m)

like any other b-&-b algorithm, we need


a simple but accurate method to upperbound z
order the items by per unit value, i.e., assume
c1 /a1 c2 /a2 . . . cm /am
wed fill the knapsack completely with item 1, if there were no integrality constraints
can upperbound z for solutions having the first k variables x1 , . . . , xk fixed:
P
P
z k1 ci xi + (ck+1 /ak+1 )(b k1 ai xi ) = = (x1 , . . . , xk )
since

Pm

k+1 ci xi

CSCI 5654

(ck+1 /ak+1 )

Pm

k+1 ai xi

H. Gabow

Fall 2007

#28, p. 3

Remarks
1. (x1 , . . . , xk ) is increasing in xk
since increasing xk by 1 changes by ck (ck+1 /ak+1 )ak 0
2. if all ci are integral, z
Example. A knapsack can hold b = 8 pounds. The 2 densest items have these parameters:
i
ci /ai
ai

1
2
3

2
1
1.1

including 2 items of type 1 gives profit 2 3 2 = 12


including 1 item of type 1 gives profit 1 3 2 + 1 (8 3) = 11
regardless of parameters for items 3, 4, . . .
11 < 12 = reject this possiblity
including no type 1 items is even worse, profit 1 8 = 8
for knapsack a good heuristic is to choose the values xi by being greedy:
initial solution:
x1 b/a1
(the greatest possible number of item 1)
x2 (b a1 x1 )/a2 (the greatest possible number of item 2, given x1 )
etc.
Knapsack Algorithm
maintain M as largest objective value seen so far
examine every possible solution, skipping over solutions with M
set M = and execute search(1)
procedure search(k); / sets xk , given x1 , . . . , xk1 /
set xk greedily (as above), updating ;
if k = m then M max{M, z}
else repeat {
search(k + 1);
decrease xk by 1, updating ;
until xk < 0 or M ; / by Remark 1 / }
Remark. Chvatals algorithm prunes more aggressively, each time is computed

CSCI 5654

H. Gabow

Fall 2007

#28, p. 4

Chvatal, 118 119

General Form LPs

Unit 5: Extensions

in practice variables usually have both lower and upper bounds


the simple nature of these constraints motivates our definition of general form
assume, wlog, each xj has lower bound j R {} & upper bound uj R {+}
forming column vectors & u we get this General Form LP:
maximize cx
subject to Ax = b
xu
Notation
m = (# of equations); n = (# of variables)
a free variable has bounds and +
Converting General Form to Standard Equality Form
replace each variable xj by 1 or 2 nonnegative variables:
Case 1: xj has 2 finite bounds.
sj

tj

xj

uj

replace xj by 2 slacks sj , tj 0
eliminate xj by substituting xj = j + sj
add constraint sj + tj = uj j
Case 2: xj has 1 finite bound.
replace xj by sj = (the slack in xj s bound)
sj 0
eliminate xj by substituting xj = j + sj or xj = uj sj
Case 3: xj free, i.e., no finite bounds.
nj

xj

pj

aj

xj

replace xj by pj , nj 0
eliminate xj by substituting xj = pj nj
more generally xj = aj + pj nj for some constant aj

CSCI 5654

H. Gabow

Fall 2007

#29, p. 1

unfortunately the transformation increases m, n


the transformed LP has special structure:
consider an xj bounded above and below (this increases m)
1. any basis contains at least one of sj , tj
Proof 1. sj or tj is nonzero

(valid if j < uj )
Proof 2. the constraint sj + tj = (constant) gives row 0 . . . 0 1 1 0 . . . 0 in A
so B contains 1 or both columns sj , tj

2. only sj basic = xj = uj ; only tj basic = xj = j


so a basis still has, in some sense, only m variables:
xj adds a constraint, but also puts a meaningless variable into the basis
this motivates Dantzigs method of upper bounding discussed in Handout#30
it handles lower & upper bounds without increasing problem size m, n
most LP codes use this method

CSCI 5654

H. Gabow

Fall 2007

#29, p. 2

Chvatal, 119 129, 132 133

Upper Bounding

Unit 5: Extensions

starting with a General Form LP,


maximize cx
subject to Ax = b
xu
we rework our definitions and the simplex algorithm with lower & upper bounds in mind
basis list of m columns B, with AB nonsingular
basic solution vector x
(i) Ax = b
(ii) basis B j nonbasic & nonfree = xj {j , uj }
Remarks
1. 1 basis can be associated with > 1 basic solution (see Chvatal, p.120)
a degenerate basic solution has a basic variable equal to its lower or upper bound
a nondegenerate basic solution has a unique basis if there are no free variables
2. in a normal basic solution, any nonbasic free variable equals zero
running the simplex algorithm on the LP transformed as in previous handout
only produces normal bfss
so non-normal bfss are unnecessary
but non-normal bfss may be useful for initialization (see below)
feasible solution satisfies all constraints
the simplex algorithm works with a basis B and the usual dictionary relations,
xB = B1 b B1 AN xN , z = cB B1 b + (cN cB B1 AN )xN
in general xB 6= B1 b,

z 6= cB B1 b

so our algorithm must maintain the current value of x, denoted x (xB & xN )
The Simplex Algorithm with Upper Bounding
let B be a basis with corresponding basic solution x
Pivot Step
changes the value of some nonbasic xs
from xs to xs + , where is positive or negative
basic variables change from xB to xB d
objective value changes from z to z + (cs yAs )
as in the revised simplex, d = B1 As , y = cB B1

CSCI 5654

H. Gabow

Fall 2007

#30, p. 1

Entering Variable Step


as in revised simplex, find y = cB B1 & choose entering xs
2 possibilities for xs will increase z:
(i) cs > yAs and xs < us increase xs (from its current value s , if nonfree)
(ii) cs < yAs and xs > s decrease xs (from its current value us , if nonfree)
this is the new case
Fact. no variable satisfies (i) or (ii) = B is optimal
our optimality check, i.e. no variable satisfies (i) or (ii), amounts to saying
every variable is in-kilter, i.e., it is on the following kilter diagram:

cj
uj
j

xj

A missing bound eliminates a vertical line


from the kilter diagram, e.g., the diagram is
the x-axis for a free variable.
Proof.
consider the current bfs xB and an arbitrary fs x, with objective values z , z respectively
from the dictionary, z z = (cN cB B1 AN )(xN xN )
hypothesis implies each term of the inner product is 0 = z is maximum

Leaving Variable Step


constraints on the pivot:
(i) B xB d uB
(ii) s xs + us
(half these inequalities are irrelevant)
Case 1: an inequality of (i) is binding.
the corresponding variable xr leaves the basis
the new xr equals r or ur
Case 2: an inequality of (ii) is binding.
basis B stays the same but the bfs changes
xs changes from one bound (s or us ) to the other
code so ties for the binding variable are broken in favor of this case since it involves less work
Case 3: no binding inequality
LP is unbounded as usual

CSCI 5654

H. Gabow

Fall 2007

#30, p. 2

Other Issues in the Simplex Algorithm


The Role of Free Variables xj
1. if xj ever becomes basic, it remains so (see Case 1)
2. if xj starts out nonbasic, it never changes value until it enters the basis (see Pivot Step)
so non-normal free variables can only help in initialization
a bfs x is degenerate if some basic xj equals j or uj
degenerate bfss may cause the algorithm to cycle; avoid by smallest-subscript rule
to understand this think of sj and tj has having consecutive subscripts
Initialization
if an initial feasible basis is not obvious, use two-phase method
Phase 1: introduce a full artificial basis artificial variable vi , i = 1, . . . , m
set xj arbitrarily if free, else to j or uj
multiply ith constraint by 1 if Ai x bi
Phase 1 LP:
minimize 1v
subject to Ax + Iv = b
xu
0v

(1 = row vector of m 1s)

a basis with v = 0 gives a bfs for the original LP:


drop all nonbasic artificial variables
for each basic artificial variable vi add constraints 0 vi 0
a non-normal bfs can help in initialization:
e.g., consider the LP

maximize x1 such that


x1

+ x3 = 1

x1 + 3x2 + 4x3 = 13
x1 , x2 0
equivalently x1 = 1 x3 , x2 = x3 4
so take x3 = 4 and initial basis (1, 2), x1 = 3, x2 = 0
Phase 1 not needed, go directly to Phase 2
the method of generalized upper bounding (Chvatal, Ch. 25)
adapts
P the simplex algorithm for constraints of the form
jS xj = b
each xj appearing in 1 such constraint

CSCI 5654

H. Gabow

Fall 2007

#30, p. 3

Chvatal, 130-2, 133-4, 242-3

Generalized Fundamental Theorem

Unit 5: Extensions

BFSs in General LPs


a standard form LP, converted to a dictionary, automatically has a basis
formed by the slacks not necessarily feasible
this general LP
maximize x1
subject to x1 + x2 = 1
2x1 + 2x2 = 2
x1 , x2 0
has optimum
x1 = 1, x2 = 0 but no basis!
 solution

1 1
since
is singular
2 2
Phase 1, using artificial variables v1 , v2 , terminates successfully with LP
minimize 3v1
subject to x1 = 1 x2 v1
v2 =

2v1

x i , vi 0
to proceed to Phase 2 we drop v1
we could safely drop v2 and its constraint v2 = 0
this would give us a basis
intuitively this corresponds to eliminating the redundant constraint
well show how to eliminate redundancy in general, & get a basis
consider a general form LP L:
maximize cx subject to Ax = b, x u
L has a basis some m columns B have AB nonsingular
A has full row rank
the row rank of an m n matrix A is the maximum # of linearly independent rows
similarly for column rank
(row rank of A) = (column rank of A) = the rank of A

CSCI 5654

H. Gabow

Fall 2007

#31, p. 1

to prove (row rank) = (column rank) it suffices to show


A has full row rank = it has m linearly independent columns
Proof.
let B be a maximal set of linearly independent columns
so any other column A.j is dependent on B, i.e., Bxj = A.j
the rows of B are linearly independent:
rB = 0 = rA.j = rBxj = 0. so rA = 0. this makes r = 0
thus B has m columns

Eliminating Artificial Variables & Redundant Constraints


A Simple Test for Nonsingularity
define
A: a nonsingular n n matrix
a: a length n column vector
A : A with column n replaced by a
r: the last row of A1 , i.e.,
a length n row vector rA = (0, . . . , 0, 1)
Lemma. A is nonsingular ra 6= 0.
Proof.
we prove the 2 contrapositives (by similar arguments)
ra = 0 = rA = 0 = A singular
A singular =
=
=
=

for some row vector s 6= 0, sA = 0


sA = (0, . . . , 0, x), x 6= 0 (since A is nonsingular)
s = xr
ra = 0

this gives an efficient procedure to get a bfs for an LP


1. solve the Phase 1 LP
get a feasible basis B, involving artificial variables vi
2. eliminate artificials from B as follows:
for each basic artificial vi do
let vi be basic in the kth column
solve rB = Ik.
/r is the vector of the lemma /
replace vi in B by a (nonbasic) variable xj rA.j 6= 0, if such a j exists
the procedure halts with a basis (possibly still containing artificials), by the lemma
have same bfs (i.e., each new xj equals its original value)

CSCI 5654

H. Gabow

Fall 2007

#31, p. 2

Procedure to Eliminate Redundant Constraints


let R = {k : vk remains in B after the procedure}
R stands for redundant
form LP L by dropping the constraints for R
1. L is equivalent to L, i.e., they have the same feasible points
Proof.
take any k R
for simplicity assume vk is basic in row k

1 0
0 1

e.g., for m = 5, |R| = 3, B is 0 0

0 0
0 0

0
0
1
0
0

...
...

...

...
...

let r be the row vector used in the procedure for vk


rk = 1; rj = 0 for j R k
rA = 0 = the kth row is a linear combination of the rows of L

2. L has basis B = B R
Proof.
in B, consider the rows for constraints of L
any entry in a column of R is 0
= these rows are linearly independent when the columns of R are dropped
= these rows form a basis of L

Generalized Fundamental Theorem of LP. Consider any general LP L.


(i) Either L has an optimum solution
or the objective is unbounded or the constraints are infeasible.
Suppose A has full row rank.
(ii) L feasible = there is a normal bfs.
(iii) L has an optimum = the optimum is achieved by a normal bfs.

Proof.
for (i), run the general simplex algorithm (2-Phase)
for (ii), initialize Phase 1 with a normal bfs
it halts with a normal bfs
eliminate all artificial variables using the above procedure

CSCI 5654

H. Gabow

Fall 2007

#31, p. 3

Example
L: maximize x1 + x2 subject to 2 x1 4, 2 x2 4
this LP has empty A, which has full row rank!
x2

optimum bfs

x1

Extended Fundamental Theorem (Chvatal p.242):


If A has full row rank & L is unbounded, there is a basic feasible direction with positive cost.
w is a feasible direction if Aw = 0, wj < 0 only when j = & wj > 0 only when uj =
w is a basic feasible direction if A has a basis
exactly 1 nonbasic variable wj is nonzero, and wj = 1
the general simplex algorithm proves the extended theorem:
x d is feasible

CSCI 5654

H. Gabow

Fall 2007

#31, p. 4

Chvatal, 143 146

Inconsistent Systems of Linear Inequalities

Unit 5: Extensions

Certificates of Infeasiblity
let I be a system of inequalities Ax b
Theorem. I is infeasible
the contradiction 0 1 can be obtained as a linear combination of constraints.
Proof.
consider this primal-dual pair:
primal

dual

maximize 0x

minimize yb

subject to Ax b

subject to yA = 0
y0

(the primal is I with a constant objective function 0)


I infeasible = dual unbounded (since dual is feasible, e.g., y = 0)
= some feasible y has yb = 1
i.e., a linear combination of constraints of I gives 0 1

let n = (# variables in I)
Corollary. I is infeasible
some subsystem of n + 1 constraints is infeasible.
Proof.
first assume A has full column rank
this implies [A | b] has full column rank
if not, b is a linear combination of columns of A, contradicting infeasiblity
I infeasible = yA = 0, yb = 1, y 0 is feasible
= it has a bfs (in the general sense) y
by the Generalized Fundamental Theorem of LP, and full column rank
there are n + 1 constraints, so n + 1 basic variables
any nonbasic variable is 0
so y has n + 1 positive variables
now consider a general A
drop columns of A to form A of full column rank, & apply above argument
the multipliers y satisfy y A = 0
this implies y A = 0 since each dropped column is linearly dependent on A

CSCI 5654

H. Gabow

Fall 2007

#32, p. 1

Remarks
1. the corollary cant be strengthened to n infeasible constraints
e.g., in this system in variables x1 , x2 , any 2 constraints are feasible:
x2

-x1+x2>=1

x1+x2>=7

x2<=2

x1

2. both results extend to allow equality constraints in I


the proof is unchanged (just messier notation)
3. Chvatal proves both results differently
Inconsistency in the Simplex Algorithm
well now show the Phase 1 Infeasibility Proof (Handout #13,p.4) is correct, i.e.,
executing the Phase 1 simplex algorithm on an inconsistent system I,
the final Phase 1 dictionary gives the multipliers of the corollary:
recall the starting dictionary for Phase 1,
with artificial variable x0 & slacks xj , j = n + 1, . . . , n + m:
Pn
xn+i = bi j=1 aij xj + x0
(i = 1, . . . , m)
w=

x0

the final Phase 1 dictionary has objective


P
w = w + j cj xj

where each cj 0 & (assuming inconsistency) w < 0


this equation is a linear combination of equations of the starting dictionary
(Lemma 1 of Handout#16)

i.e., setting yi = cn+i , the equation is


Pm
Pn
w = x0 + i=1 yi (bi j=1 aij xj + x0 xn+i )

thus multiplying the ith original inequality by yi and adding gives


Pm
Pn
Pm
i=1 yi (
j=1 aij xj )
i=1 yi bi
Pn
i.e., j=1 cj xj w

each term on the l.h.s. is nonnegative but the r.h.s. is negative, contradiction!

CSCI 5654

H. Gabow

Fall 2007

#32, p. 2

Chvatal, p.248, ex.16.10

Theorems of Alternatives

Unit 5: Extensions

duality gives many other characterizations for feasibility of systems of inequalities


theyre called theorems of alternatives
they assert that exactly 1 of 2 systems has a solution
e.g., heres one you already know:
Farkass Lemma for Gaussian Elimination.
For any A and b, exactly 1 of these 2 systems is feasible:
(I)
Ax = b
(II)
yA = 0, yb 6= 0
Example.
x1 x2 =
2x1 + x2 =
3x1 x2 =
adding twice

1
0
1
the 2nd constraint to the other two gives 0 = 2

Farkas Lemma. (1902) For any A and b, exactly 1 of these 2 systems is feasible:
(I)
Ax = b, x 0
(II)
yA 0, yb < 0
Interpretations:
(i) system (I) is infeasible iff it implies the contradiction (nonnegative #) = (negative #)
(ii) system (II) is infeasible iff it implies the contradiction (negative #) 0
Example: the system
x1 x2 = 1
2x1 x2 = 0
is inconsistent, since
1 (first constraint) + (2nd constraint)
gives x1 = 1, i.e., (nonnegative #) = (negative #)
Proof.
consider this primal-dual pair:
Primal

Dual

maximize 0x

minimize yb

subject to Ax = b
x0

subject to yA 0

(I) feasible 0 is the optimum objective value for both primal & dual
(II) infeasible
for = note that y = 0 gives dual objective 0

CSCI 5654

H. Gabow

Fall 2007

#33, p. 1

Remarks
1. Farkas Lemma useful in linear, nonlinear and integer programming
Integer Version:
For A and b integral, exactly 1 of these 2 systems is feasible:
Ax = b, x Zn
yA Zn , yb
/ Z, y Rm
Example
consider this system of equations in integral quantities xi :
2x1 + 6x2 + x3 = 8
4x1 + 7x2 + 7x3 = 4
tripling the 1st equation & adding the 2nd gives the contradiction
10x1 + 25x2 + 10x3 = 28
the corresponding vector for Farkas Lemma is y1 = 3/5, y2 = 1/5
2. Farkass Lemma is a special case of the Separating Hyperplane Theorem:
S a closed convex set in Rm , b Rm S =
some hyperplane separates b from S, i.e., yb > a, ys a for all s S
for Farkas, S is the cone generated by the columns of A
(I) says b is in the cone, (II) says b can be separated from it
x2

Ax

yx=0

x1

Farkass Lemma in requirement space

CSCI 5654

H. Gabow

Fall 2007

#33, p. 2

our last theorem of alternatives deals with strict inequalities


Lemma (Gordan). For any A, exactly 1 of these 2 systems is feasible:
(I)
Ax < 0
(II)
yA = 0, y 0, y 6= 0
Proof.
consider this primal-dual pair:
Primal

Dual

maximize

minimize y0

subject to Ax + 1 0

subject to yA = 0
y1 = 1
y0

1 denotes a column vector of 1s


(I) feasible primal unbounded
dual infeasible (since primal is feasible, all variables 0)
(II) infeasible (by scaling y)

Remarks.
1. Heres a generalization of Gordans Lemma to nonlinear programming:
Theorem (Fan et al, 1957).
Let C be a convex set in Rn , and let f : C Rm be a convex function. Then exactly 1 of these 2
systems is feasible:
(I)
f (x) < 0
(II)
yf (x) 0 for all x C, y 0, y 6= 0
Exercise. Prove Fans Theorem includes Gordans as a special case. Begin by taking f (x) = Ax.
The challenge is to prove yA = 0, as required in Gordan, not yA 0, which looks like what comes
out of Fan.
2. Chvatal gives other theorems of alternatives

CSCI 5654

H. Gabow

Fall 2007

#33, p. 3

Chvatal, 152 157

Dual Simplex Algorithm

Unit 5: Extensions

we can solve an LP by running the simplex algorithm on the dual


the dual simplex algorithm amounts to that but is executed on primal dictionaries
DS Example. well show that for the LP
maximize z = 8x1 +5x2
subject to
x1 + x2
9x1 +5x2
3x1 +2x2
x1 , x2

6
45
15
0

and initial dictionary


x1 = 15
4
x2 = 49
s3 = 43

+ 45 s1 41 s2
49 s1 + 41 s2
+ 43 s1 + 41 s2

165
4

45 s1 43 s2

z=

1 dual simplex pivot gives the optimal dictionary


5
0
1

32 s2 + 35 s3
+ s2 +3 s3
31 s2 + 34 s3

z = 40

31 s2 35 s3

x1 =
x2 =
s1 =

Dual Feasible Dictionaries


consider an LP for the standard simplex, and its dual:
Primal

Dual

maximize z = cx

minimize yb

subject to Ax = b
x0

subject to yA c

recall the cost equation in a dictionary: z = yb + cN xN


where y = cB B1 , cN = cN yAN
simplex halts when cN 0
this is equivalent to y being dual feasible
show by considering the dual constraints for B & N separately
so call a dictionary dual feasible when cN 0
e.g., the 2 dictionaries of DS Example

CSCI 5654

H. Gabow

Fall 2007

#34, p. 1

A dictionary that is primal and dual feasible is optimal


Proof #1: this dictionary makes simplex algorithm halt with optimum solution
Proof #2: x and y satisfy strong duality
Idea of Dual Simplex Algorithm & Comparison with Standard Simplex
Simplex Algorithm
maintains a primal feasible solution
each iteration increases the objective
halts when dual feasibility is achieved

Dual Simplex Algorithm


maintains a dual feasible solution
each iteration decreases the objective
halts when primal feasibility is achieved

why does dual simplex decrease the objective?


to improve the current dictionary we must increase some nonbasic variables
this decreases z (or doesnt change it) by the above cost equation
Sketch of a Dual Simplex Pivot
Example. the optimum dictionary of DS Example results from
a dual simplex pivot on the initial dictionary, with s1 entering and s3 leaving
consider a dictionary with coefficients aij , bi , cj
the dictionary is dual feasible (all cj 0) but primal infeasible (some bi are negative)
we want to pivot to a better dual feasible dictionary
i.e., a negative basic variable increases
because a nonbasic variable increases (from 0)
starting pivot row: xr = br

jN

arj xj

in keeping with our goal we choose a row with br negative


want ars < 0 so increasing xs increases xr
new pivot row: xs = (br /ars )

jN (arj /ars )xj

here N = N {s} {r}, arr = 1


note the new value of xs is positive, the quotient of 2 negative numbers

new cost row: z = (original z) + (cs br /ars ) +


here cr = 0
the cost decreases when cs < 0

jN [cj

(cs arj /ars )]xj

(1)

to maintain dual feasibility, want cj cs arj /ars for all nonbasic j


true if arj 0
so choose s to satisfy cj /arj cs /ars for all nonbasic j with arj < 0
Example.
in the initial dictionary of DS Example, the ratios for s = 3 are s1 :
min ratio test = s1 enters

CSCI 5654

H. Gabow

Fall 2007

5 3
/
4 4

= 5/3,

s2 :

3 1
/
4 4

= 3.

#34, p. 2

Standard Dual Simplex Algorithm


let aij , bi , cj refer to the current dictionary, which is dual feasible
Leaving Variable Step
If every bi 0, stop, the current basis is optimum
Choose any (basic) r with br < 0
Entering Variable Step
If every arj 0, stop, the problem is infeasible
Choose a (nonbasic) s with ars < 0 that minimizes cs /ars
Pivot Step
Construct dictionary for the new basis as usual

Correctness of the Algorithm


Entering Variable Step:
if every arj 0, starting equation for xr is unsatisfiable
nonnegative # = negative #
termination of the algorithm:
pivots with cs < 0 decrease z
pivots with cs = 0 dont change z
finite # bases = such pivots eventually cause algorithm to halt
unless it cycles through pivots with cs = 0
a pivot is degenerate if cs = 0
a degenerate pivot changes x, but not the cost row (y)
cycling doesnt occur in practice
it can be prevented as in the standard simplex algorithm
alternatively, see Handout#60

Remarks
1. the Entering and Leaving Variable Steps are reversed from standard simplex
2. a pivot kills 1 negative variable, but it can create many other negative variables
e.g., in Chvatal pp. 155156 the first pivot kills 1 negative variable but creates another
in fact the total infeasibility (total of all negative variables) increases in magnitude
3. dual simplex allows us to avoid Phase I for blending problems
the initial dictionary is dual feasible
in general a variant of the big-M method can be used to initialize the dual simplex
4. the CPLEX dual simplex algorithm is particularly efficient
because of a convenient pivot rule

CSCI 5654

H. Gabow

Fall 2007

#34, p. 3

Revised Dual Simplex Algorithm


as in primal revised simplex maintain
the basis heading & eta factorization of the basis, xB = b (current basic values)
in addition maintain the current nonbasic cost coefficients cN
y isnt needed, but the Entering Variable Step must compute every vAN
Leaving Variable Step: same as standard
Entering Variable Step:
we need the dictionary equation for xr ,
xr = Ir B1 b Ir B1 AN xN
first solve vB = Ir
then compute the desired coefficients arj , j N as vAN
Pivot Step:
solve Bd = As
to update the basic values xB ,
xs xr /ars
the rest of the basis is updated by decreasing xB by xs d
use d to update the eta file
update cN as indicated in (1)

General Dual Simplex


a basic solution x is dual feasible if the cost cj of each nonbasic variable xj satisfies
cj > 0 = xj = uj
cj < 0 = xj = j
i.e., each nonbasic variable is in-kilter as in Handout#30:

cj
uj
j

xj

the algorithm maintains the non-basic variables in-kilter


halting when all basic variables are in-kilter, i.e., within their bounds
& otherwise pivotting to bring the leaving basic variable into kilter
details similar to revised dual simplex

CSCI 5654

H. Gabow

Fall 2007

#34, p. 4

Chvatal, 158 162

Sensitivity Analysis

Unit 5: Extensions

postoptimality analysis studies the optimum solution its robustness and patterns of change
the ease of doing this is an important practical attraction of LP and the simplex algorithm
in contrast postoptimality analysis is very hard for IP
assume weve solved a general LP,
maximize cx subject to Ax = b, x u
obtaining optimum basis B and corresponding optimum bfs x
sensitivity analysis tells how to solve slightly-changed versions of the problem
e.g., c changes to e
c
e will always denotes modified parameters

Cost Changes, e
c

B is a basis for the new problem with x a bfs


so simply restart the revised simplex algorithm, using e
c but no other changes
(standard simplex: must recalculate c but thats easy)
Cost Ranging
assuming only 1 cost cj changes
find the values of cj for which B and x are optimal
well show the answer is a closed interval the interval of optimality of B & x
recall z = yb + (cN yAN )xN where y = cB B1
the solution is optimum as long as each variable is in-kilter
Case 1. j nonbasic.
xj = j : cj yAj gives the interval of optimality
xj = uj : cj yAj
xj free: cj = yAj (trivial interval)
Case 2. j basic.
compute the new multipliers for B:
e=e
e depends linearly on cj
y
cB B1 = y1 cj + y2 , i.e., y
e , each nonbasic variable gives a lower or upper bound (or both) for cj , as in Case 1
using y
note the interval of optimality is closed
the optimum objective z changes by

CSCI 5654

H. Gabow

0
(cj )y1 b

Case 1
Case 2

Fall 2007

#35, p. 1

e
Right-hand Side Changes, b

B remains a basis
it has a corresponding bfs
xN = xN
e B1 AN xN
xB = B1 b
()
this bfs is dual-feasible
so we start the revised dual simplex algorithm using the above bfs
the eta file and current cost coefficients are available from the primal simplex
r.h.s. ranging is similar, using (), & u
changing 1 bi gives 2 inequalities per basic variable
e have new bounds ,
e u
e
more generally suppose in addition to b

B remains a basis
define a new bfs x as follows:
for j nonbasic, xj = if xj is free in new problem then xj
else if xj = j then ej
ej
else if xj = uj then u
else / xj was free and is now bound / a finite bound ej or u
ej
for this to make sense we assume no finite bound becomes infinite
also define xB by ()

x is dual feasible
hence restart revised dual simplex from x
Question. Why does the method fail if a finite bound becomes infinite?
Adding New Constraints
add a slack variable vi in each new constraint
vi has bounds 0 vi < for an inequality & 0 vi 0 for an equation
extend B & x to the new system:
add each vi to the basis
compute its value from its equation
we have a dual-feasible solution (cost(vi ) = 0) so now use the dual simplex algorithm
refactor the basis since the eta file changes
DS Example (Handout#34) contd.
we solve the LP
maximize z = 8x1 +5x2
subject to
x1 + x2
6
9x1 +5x2
45
x1 , x2 0
using standard simplex

CSCI 5654

H. Gabow

Fall 2007

#35, p. 2

the optimum dictionary is


x1 =
x2 =
z=

15
4
9
4
165
4

+ 54 s1 41 s2
49 s1 + 41 s2
54 s1 43 s2

adding a new constraint 3x1 + 2x2 15 gives the LP of DS Example


solve it by adding the corresponding constraint 3x1 + 2x2 + s3 = 15 to the above dictionary
giving the initial (dual feasible) dictionary of DS Example
which is solved in 1 dual simplex pivot
adding a constraint is the basic operation in cutting plane methods for integer programming
(Handout#37)
Arbitrary Changes
consider a new system still close to the original,
e = b,
e e x u
e
maximize e
cx subject to Ax

assume B doesnt change (handle such changes by new variables, see Chvatal pp.1612)
solve the new problem in 2 simplex runs, as follows
1. run primal simplex algorithm
initial bfs x:
ej , or to xj if free
set nonbasic xj to a finite bound ej or u
define basic xj by ()
eu
e:
to make x primal feasible, redefine violated bounds ,
if j is basic and xj > u
ej , new upper bound = xj
if j is basic and xj < ej , new lower bound = xj

solve this LP using primal simplex, starting from B, x

2. run dual simplex algorithm


eu
e
change the modified bounds to their proper values ,

in this change no finite bound becomes infinite


hence it can be handled as in bound changes, using dual simplex algorithm

we expect a small # of iterations in both runs

CSCI 5654

H. Gabow

Fall 2007

#35, p. 3

Chvatal, 162 166

Parametric LPs

Unit 5: Extensions

an affine function of p has the form ap + b


given an LP L(p) with coefficients (A, b, c) affine functions of a parameter p
we wish to analyze the LP as a function of p
p may be time, interest rate, etc.
Basic Fact. Consider affine functions fi (p), i = 1, . . . , k.
max{fi (p) : i = 1, . . . , k} is a piecewise-linear concave up function of p.
min {fi (p) : i = 1, . . . , k} is a piecewise-linear concave down function of p.

p
piecewise linear concave up

piecewise linear concave down

Parametric Costs
L(p) has the form

maximize (c + pc )x subject to Ax = b, x u

Theorem. For some closed interval I, L(p) is bounded p I;


in I the optimum objective value z is a piecewise-linear concave up function of p.
z

p
I
Example.
maximize (p + 1)x1 + x2 + (p 1)x3 subject to x1 0, 0 x2 1, 0 x3
z

x1
CSCI 5654

H. Gabow

x2

p
x3
Fall 2007

#36, p. 1

Proof.
wlog A has full row rank
a basic feasible direction w has (c + pc )w > 0 in some interval
(, r), (, ), R or
thus L(p) is unbounded in 2 maximal intervals of the above form
& is bounded in a closed interval I (I = [, r], (, r], [, ), R or )
for the 2nd part note that L(p) has a finite number of normal bfss

each one x has objective value (c + pc )x, an affine function of p


a basis B and bfs x has an interval of optimality, consisting of all values p where
y = (cB + pcB )B1 is dual feasible
dual feasiblity corresponds to a system of inequalities in p, one per nonbasic variable
hence the interval of optimality is closed
Algorithm to Find I and z (walk the curve)
solve L(p) for some arbitrary p, using the simplex algorithm
let B be the optimum basis, with interval of optimality [, r]
if L(p) is unbounded the algorithm is similar
at r, B is dual feasible & 1 of the corresponding inequalities holds with equality
increasing r slightly, these tight inequalities determine the entering variable in a simplex pivot
do this simplex pivot to find a basis B with interval of optimality [r, r ]
1 pivot often suffices but more may be required
continue in the same way to find the optimal bases to the right of r
stop when a basis has interval [r , ), perhaps with unbounded objective
similarly find the optimal bases to the left of
Example.
maximize px1 subject to x1 + x2 = 1, x1 , x2 0
z

optimum basis:

CSCI 5654

x2

H. Gabow

x1

Fall 2007

#36, p. 2

Parametric R.H. Sides


L(p) has form maximize cx subject to Ax = b + pb , + p x u + pu
assume
()
some L(p) has an optimum solution
() implies no L(p) is unbounded
since some dual L (p0 ) has an optimum = every L (p) is feasible
Theorem. Assuming () there is a closed interval I (L(p) is feasible p I);
in I the optimum objective value exists & is a piecewise-linear concave down function of p.
Proof. by duality

if () fails, any L(p) is infeasible or unbounded


Examples.
1. maximize x1 subject to x2 = p, 1 x2 1
z

infeasible
1

infeasible
p
1

2. maximize x1 subject to x1 + x2 = p, x1 , x2 0
z

infeasible

3. maximize x1 subject to x1 + x2 = 2p 2, 2 x1 p, x2 0
z

infeasible

2 2

CSCI 5654

H. Gabow

Fall 2007

#36, p. 3

WV 9.8

Cutting Planes for ILP

Unit 5: Extensions

the cutting plane method for ILP starts with the LP relaxation,
and repeatedly adds a new constraint
the new constraint eliminates some nonintegral points from the relaxations feasible region
eventually the LP optimum is the ILP optimum
DS Example (Handout#34) contd.
ILP:
maximize z = 8x1 +5x2
subject to
x1 + x2
6
9x1 +5x2
45
x1 , x2 Z +

LP with cutting plane:


maximize z = 8x1 +5x2
subject to
x1 + x2
9x1 +5x2
3x1 +2x2
x1 , x2

6
45
15
0

6
3x1 + 2x2 = 15
5

4
9x1 + 5x2 = 45
x2 3
y
2

1
x1 + x2 = 6
y
1

3
x1

The cutting plane 3x1 + 2x2 = 15 moves the LP optimum


from y = (15/4, 9/4) to the ILP optimum y = (5, 10).
(Fig. from WV).
Method of Fractional Cutting Planes (Gomory, 58)
consider an ILP: maximize cx subject to Ax = b, x integral
we allow all coefficients A, b, c to be real-valued,
although theyre usually integral in the given problem

CSCI 5654

H. Gabow

Fall 2007

#37, p. 1

suppose the LP relaxation has a fractional optimum x


in the optimum dictionary consider the equation for
P a basic variable xi whose value bi is fractional:
()
xi = bi jN aij xj
let fi denote the fractional part of bi , i.e., fi = bi bi
similarly let fij denote the fractional part of aij

in the optimum ILP solution, the r.h.s. of () is an integer


it remains
P integral even if we discard integral terms

. . fi jN fij xj is an integer, say a


a fi < 1 = a 0
thus any integral solution satisfies
P
()
fi jN fij xj 0
with integral slack
the current LP optimum doesnt satisfy () (since all nonbasic xj are 0)
so adding () to the constraints, with an integral slack variable, gives an equivalent ILP
with a new LP optimum
DS Example (Handout#34) contd.
the optimum dictionary of (Handout#35)
x1 = 15
+ 54 s1 41 s2
4
x2 = 94 49 s1 + 41 s2
z=

165
4

54 s1 43 s2

has x1 nonintegral
5
1
x1 s equation is x1 = 15
4 ( 4 )s1 4 s2
keeping only fractional parts the r.h.s. is 34 43 s1 14 s2
so the cutting plane is 43 43 s1 41 s2 0
equivalently 3 3s1 + s2 , or in terms of original variables, 12x1 + 8x2 60, 3x1 + 2x2 15
Summary of the Algorithm
solve the LP relaxation of the given IP
while the solution is fractional do
add a cut () to the LP
resolve the new LP
/ use the dual simplex algorithm, since were just adding a new constraint /
Example. DS Example in Handouts#3435 show how the ILP of p.1 is solved
Gomory proved this algorithm solves the ILP in a finite number of steps
Refinements:
choosing an fi close to half is recommended in practice
can discard cuts that become inactive
in practice the method can be slow more sophisticated cutting strategies are used
CSCI 5654

H. Gabow

Fall 2007

#37, p. 2

Remarks
1. if the given ILP has constraints Ax b rather than equalities,
we require A & b both integral, so all slack variables are integral
if this doesnt hold, can scale up A & b
2. the fractional cutting plane method can be extended to mixed integer programs (MIP)
3. cutting planes can be used within a branch-and-bound algorithm
to strengthen the bound on the objective function

CSCI 5654

H. Gabow

Fall 2007

#37, p. 3

Chvatal, 262 269

Applications to Geometry

Unit 5: Extensions

we give some geometric consequences of the characterization of Handout #32


for inconsistent sytems of inequalities:
Corollary. Ax b is infeasible some subsystem of n + 1 inequalities is infeasible.
say a hyperplane ax = b strictly separates sets R, G Rn if
each r R has ar > b & each g G has ag < b
Theorem. Consider a finite set of points of Rn , each one colored red or green.
Some hyperplane strictly separates the red & green points
this holds for every subset of n + 2 points.
Example.
red & green points in the plane, can be separated by a line unless
there are 4 points in 1 of these 2 configurations:

Proof.
a set of red & green points can be strictly separated
some hyperplane ax = b has ar b & ag b + 1 for each red point r & each green point g
(by rescaling)
thus our given set can be separated
this system of inequalities is feasible for unknowns a & b:
ar b
for each given red point r
ag b + 1 for each given green point g
since there are n + 1 unknowns, the Corollary gives the Theorem

a subset of Rn is convex if any 2 of its points can see each other


x, y C = the line segment between x & y is contained in C
a similar use of the Corollary, plus some facts on convex sets,
implies this famous result (Chvatal p.266):
Hellys Theorem. Consider a finite collection of n + 1 convex sets in Rn .
They have a common point if every n + 1 sets do.
Hellys Theorem cant be improved to n sets, e.g., take 3 lines the plane:

CSCI 5654

H. Gabow

Fall 2007

#38, p. 1

we can also separate two polyhedra, e.g.,

Separation Theorem for Polyhedra.


Two disjoint convex polyhedra in Rn can be strictly separated by a hyperplane.
Proof.
let the 2 polyhedra be Pi , i = 1, 2
corresponding to systems Ai x bi , i = 1, 2
assume both Pi are nonempty else the theorem is trivial
disjointness = no point satisfies both systems
= there are vectors yi 0 satisfying
y1 A1 + y2 A2 = 0, y1 b1 + y2 b2 < 0
set y1 A1 = h, so y2 A2 = h
h is a row vector
for x P1 , hx = y1 A1 x y1 b1
for x P2 , hx = y2 A2 x y2 b2
since y1 b1 < y2 b2 , taking c as a value in between these 2 gives
hx = c a hyperplane strictly separating P1 & P2

P1

P2

P1 is a closed line segment, hence a convex polyhderon.


P2 is a half-open line segment its missing endpoint is in P1 .
P1 & P2 cannot be separated.

CSCI 5654

H. Gabow

Fall 2007

#38, p. 2

Remarks
1. the assumption Pi nonempty in the above argument ensures h 6= 0
(since h = 0 doesnt separate any points)
this argument is a little slicker than Chvatal
2. for both theorems of this handout, Chvatal separates sets A & B using 2 disjoint half-spaces
i.e., points x A have hx c, points x B have hx c +
for finite sets of points, the 2 ways to separate are equivalent
but not for infinite sets e.g., we can strictly separate the sets hx > c & hx < c
but not with disjoint half-spaces
separation by disjoint half-spaces implies strict separation
so the above Separation Theorem would be stronger if we separated by disjoint half-spaces
thats what we did in the proof!, so the stronger version is true
(why not do this in the first place? simplicity)

CSCI 5654

H. Gabow

Fall 2007

#38, p. 3

Chvatal, 291 295

Unit 6: Networks

The Transshipment Problem

this problem is to route specified quantities of homogeneous goods, minimizing the routing cost
more precisely:
let G be a directed graph on n vertices and m edges
the undirected version of G ignores all edge directions
for simplicity assume its connected

4
3
6

Fig.1. Graph G.

sink
each vertex i has a demand bi , & we call i a source

intermediate (transshipment) node


for simplicity assume

i bi

bi > 0
bi < 0
bi = 0

=0

each edge ij has a cost cij , the cost of shipping 1 unit of goods from i to j
we want to satisfy all demands exactly, and minimize the cost
4

-1
4

2
0

2
-2

0
4

1
1

(b)
(a)
Fig.2. (a) Graph G with vertex demands & edge costs. (b) Optimum transshipment,
cost 10. Edge labels give # units shipped on the edge; 0 labels are omitted.
1 unit is shipped along path 5,3,6 vertex 3 functions as a transshipment node.
Special Cases of the Transshipment Problem.
assignment problem & its generalization, transportation problem
shortest path problem
Exercise. Model the single-source shortest path problem as a transhipment problem.

CSCI 5654

H. Gabow

Fall 2007

#39, p. 1

we state the problem as an LP:


the (node-arc) incidence matrix of G: n m matrix A
the column for edge ij has ith entry 1, jth entry +1 & all others 0
Example. the column for edge (3, 1) is [1 0 1 0 0 0]T
edge ij has a variable xij giving the number of units shipped from i to j
Transshipment Problem:

minimize cx
subject to Ax = b
x0

its obvious that any feasible routing satisfies this LP


the following exercise proves that any x feasible to the LP corresponds to a feasible routing
Exercise. (a) Assume all costs are nonnegative (as one expects). Prove that the following algorithm
translates any feasible LP solution x into a valid routing for the transshipment problem.
Let P be a path from a source to a sink, containing only edges with positive xij . Let
= min{xij : ij P }. Ship units along P . Then reduce b and x to reflect the
shipment, and repeat the process. Stop when there are no sources.
(b) Modify the algorithm so it works even when there are negative costs. How do negative costs
change the nature of the problem?
the Dual Transhipment Problem:

maximize yb
subject to yj yi + cij , for each arc ij

the dual variables have a nice economic interpretation as prices:


the dual constraints say
(price at node i) + (shipment cost to j) (price at node j)
well solve the transhipment problem using the minimizing simplex algorithm (Handout#27, p.3)
Exercise. Show that if (yi ) is dual optimal, so is (yi + a) for any constant a.

CSCI 5654

H. Gabow

Fall 2007

#39, p. 2

Chvatal, 296 303

Network Simplex Algorithm

Unit 6: Networks

Linear Algebra & Graph Theory


the constraints Ax = b of the transshipment problem do not have full row rank:
the rows of A sum to 0 since every edge leaves & enters a vertex
e by discarding the row for vertex r (choose r arbitrarily)
e &b
form A
e
e =b
the reduced system is the transhipment problem with constraints Ax
a solution to the reduced system is a solution to the original problem
the discarded equation holds automatically since the entries of b sum to 0
now well show the reduced system has full row rank
the phrases spanning tree of G & cycle of G refer to the undirected version of G
when we traverse a cycle, an edge is called forward if its traversed in the correct direction,
else backward
Example. in cycle 1, 6, 3, edge (6, 3) is backward, the others are forward
for edge ij in the cycle, sign(ij) is +1 (1) if ij is forwards (backwards)
Lemma. A basis of the reduced system is a spanning tree of G.
Example. the solution of Handout#39, Fig.2(b) is not a spanning tree, but we can enlarge it to a
spanning tree with edges shipping 0 units:
1

4
1

0
6

Fig.3. Edges forming an optimum (degenerate) basis.


Proof.
e
Claim 1. The edges of a cycle have linearly dependent columns, in both A & A
Proof.
it suffices to prove it for A
traverse the cycle, adding the edges column times its sign
the sum is 0

Example. for the cycle 1, 6, 3 & A we get [1 0 0 0 0 1]T [0 0 1 0 0 1]T + [1 0 1 0 0 0]T = 0

CSCI 5654

H. Gabow

Fall 2007

#40, p. 1

e
Claim 2. The columns of a forest are linearly independent, in A & A
Proof.
e
it suffices to prove this for A
suppose a linear combination of the edges sums to 0
an edge incident to a leaf 6= r has coefficient 0
repeat this argument to eventually show all coefficients are 0

the lemma follows


e
since a basis of the reduced system consists of n 1 linearly independent columns of A

e whose rows &


Exercise 1. (a) Show a basis (spanning tree) has a corresponding matrix B in A
columns can be ordered so the matrix is upper triangular, with 1s on the diagonal.
In (b)(c), root the spanning tree at r. (b) The equation Bx = b gives the values of the basic
variables. Show how to compute x by traversing T bottom-up. (c) The equation yB = c gives the
values of the duals. Show how to compute y by traversing T top-down.
Execute your algorithms on Fig.5(a).
the algorithms of (b) & (c) use only addition and subtraction, no or /
the Fundamental Theorem shows some basis B is optimum. so we get (see also Handout#61):
Integrality Theorem. If b is integral, the transshipment problem has an optimum integral solution x (regardless of c).
Pivotting
how do we pivot an edge into the current basis?
let T be the current basis; to add edge ij to the basis:
T + ij contains a cycle C
traverse C, starting out by going from i to j
suppose we increase each xuv , uv C, by sign(uv) t (t 0)
xij increases, as desired
e doesnt change, since at each vertex u, 2 edges change and the changes balance
the quantity Ax
so x remains feasible, as long as it stays nonnegative
take t = min{xuv : uv a backwards edge}
this is the largest t possible; some backwards edge drops to 0 and is the leaving variable
1

1
3

4
1

2
6

Fig.4. A (suboptimal) basis. Pivotting edge (5, 1) into the basis gives Fig.3.

CSCI 5654

H. Gabow

Fall 2007

#40, p. 2

Prices
each vertex maintains a dual variable (its price) defined by yr = 0 and yB = c (Exercise 1(c))
4

-4

2
2

-2

-4

4
-1

-4

-5
1

-4
2

-1

-3

(a)

(b)

Fig.5. Prices for the bases of Fig.4 and 3 respectively. r is the top left vertex
(as in Fig.1, Handout#39).
Each basic edge ij satisfies yi + cij = yj . (b) gives
P
optimum prices:
yi bi = 10.
Note: yr doesnt exist in the reduced system
but the constraints for edges incident to r are equivalent to
usual constraints yi + cij yj with yr = 0
Network Simplex Algorithm
this algorithm implements the (basic) simplex algorithm for the transshipment problem
each iteration starts with a basis B (spanning tree T ) and bfs x
Entering Variable Step
Solve yB = cB by traversing T top-down (Exercise 1(c)).
Choose any (nonbasic) edge ij cij < yj yi . / underbidding /
If none exists, stop, B is an optimum basis.
Leaving Variable Step
Execute a pivot step by traversing edge ijs cycle C and finding t.
If t = , stop, the problem is unbounded. / impossible if c 0 /
Otherwise choose a backwards edge uv that defines t.
Pivot Step
Update x: change values along C by t.
In T replace edge uv by ij.

Example. in Fig.5(a) edge (5, 1) can enter, since 3 < 0 (4)


the pivot step (Fig.4) gives Fig.5(b), optimum.
this code involves additions and subtractions, no or / (as expected!)
like simplex,
the network simplex algorithm is very fast in practice
although no polynomial time bound is known (for any pivotting rule)
the primal-dual method (Chvatal Ch.23) leads to polynomial algorithms

CSCI 5654

H. Gabow

Fall 2007

#40, p. 3

Chvatal 443452; K Ch.4

Overview of the Ellipsoid Algorithm

Unit 7: Polynomial LP

for polynomial-time algorithms, we assume the given data A, b, c is integral


the ellipsoid algorithm solves the problem of Linear Strict Inequalities (LSI):
Given: open polyhedron P defined by Ax < b
as usual this means m strict inequalities in n unknowns x
Task: find some x P or declare P =
note P Rn ; our entire discussion takes place in Rn
recall LI (Linear Inequalities) is equivalent to LP (Exercise of Handout#18)
we can solve an LI problem using an algorithm for LSI (using integrality of A, b)
define L = size of input, i.e., (total # bits in A, b) (see Handout#69)
Why Strict Inequalities?
using integrality we can prove
P 6= = volume(P ) 2(n+2)L
Proof Sketch:
a simplex in Rn is the convex hull of n + 1 vertices
any polytope P decomposes into a finite number of simplices
each simplex has vertices that are cornerpoints of P
P 6= = interior(P ) 6=
= P contains a simplex of positive volume, with integral cornerpoints
the volume bound follows from the integrality of A & b and the Exercise of Handout#25
Ellipsoids (see Fig.1)
an ellipsoid is the image of the unit sphere under an affine transformation, i.e.,
an ellipsoid is {c + Qx : kxk 1} for an n n nonsingular matrix Q
equivalently an ellipsoid is {x : (x c)T B1 (x c) 1}, for a positive definite matrix B
Proof. (matrix background in Handouts#65,64)
the ellipsoid is the set of points y with kQ1 (y c)k 1
i.e., (y c)T (Q1 )T Q1 (y c) 1
so B = QQT , B is positive definite

CSCI 5654

H. Gabow

Fall 2007

#41, p. 1

x2

x1



x21
x22
9 0
Fig.1. Ellipse
+
= 1; equivalently center c = 0, B =
.
0 4
9
4
High-level Algorithm
construct a sequence of ellipsoids Er , r = 0, . . . , s
each containing P
with volume shrinking by a factor 21/2(n+1)
stop when either
(i) the center of Es is in P , or
(ii) volume(Es ) < 2(n+2)L (= P = )
Initialization and Efficiency
we use a stronger version of the basic volume fact:
P 6= = volume(P {x : |xj | < 2L , j = 1, . . . , n}) 2(n+2)L
thus E0 can be a sphere of radius n2L , center 0
# iterations = O(n2 L)
more precisely suppose we do N = 16n(n + 1)L iterations without finding a feasible point
our choice of E0 restricts all coordinates to be n2L
i.e., E0 is contained in a box with each side 2n2L 22L
= volume(E0 ) 22nL
after N iterations the volume has shrunk by a factor 2N/2(n+1) = 28nL
.. after N iterations the ellipse has volume 22nL8nL 26nL < 2(n+2)L
= P =

CSCI 5654

H. Gabow

Fall 2007

#41, p. 2

Implementing the High-level Algorithm


if P E, & center c of E is not in P ,
there is a violated hyperplane, i.e., Ai c bi
for the corresponding half-space H (i.e., Ai x < bi )
P E H
the algorithm replaces E by a smaller ellipsoid that contains E H, given by the following theorem
Ellipsoid Shrinking Theorem.
For an ellipsoid E, let H be a half-space containing the center.
an ellipsoid E containing E H with volume(E ) 21/2(n+1) volume(E).

x2

x1

x1 + x2 = 0

Fig.2. E , with center c = (3/ 13, 4/3 13)T , B =

84/13
32/13
contains intersection of E of Fig.1 & half-space x1 + x2 0

32/13
496/117

Ellipsoid Algorithm
Initialization
Set N = 1 + 16n(n + 1)L.
Set p = 0 and B = n2 22L I.
T
/ The ellipsoid is always {x : (x p) B1 (x p) 1}.
The initial ellipse is a sphere centered at 0 of radius n2L . /
Main Loop
Repeat the Shrink Step N times (unless it returns).
If it never returns, return infeasible.
Shrink Step
If Ap < b then return p as a feasible point.
Choose a violated constraint, i.e., an i with Ai p bi .
Let a = ATi .
Ba
1

.
Let p = p
n + 1 aT Ba


2 (Ba)(Ba)T
n2
B
.
Let B = 2
n 1
n + 1 aT Ba

CSCI 5654

H. Gabow

Fall 2007

#41, p. 3

Remarks
1. the ellipsoids of the algorithm must be approximated,
since their defining equations involve square roots
this leads to a polynomial time algorithm
2. but the ellipsoid algorithm doesnt take advantage of sparsity
3. it can be used to get polynomial algorithms for LPs with exponential #s of constraints!
e.g., the Held-Karp TSP relaxation (Handout#1)
note the derivation of N is essentially independent of m
to execute ellipsoid on an LP, we only need an efficient algorithm for
the separation problem:
given x, decide if x P
if x
/ P , find a violated constraint
Convex Programming
let C be a convex set in Rn
i.e., x, y C = x + (1 )y C for any [0, 1]
Problem: min cx s.t. x C
Fundamental Properties for Optimization
1. for our problem a local optimum is a global optimum
Proof.
let x be a local optimum & take any y C

take (0, 1] small enough so c (1 )x + y cx
thus (1 )cx + cy cx, cy cx

2. x
/ C = a separating hyperplane, i.e.,
by > a for all y C and bx < a
proved in Handout#38
because of these properties the ellipsoid algorithm solves our convex optimization problem,
assuming we can recognize points in C & solve the separation problem

CSCI 5654

H. Gabow

Fall 2007

#41, p. 4

a prime example is semidefinite programming:


in the following X denotes a square matrix of variables
b denotes the column vector of Xs entries
and X

the semidefinite programming problem is this generalization of LP:


b
maximize z = cX
b
subject to
AX
X


b
an n n symmetric positive semidefinite matrix


x 1
Example. min x s.t.
is PSD
1 1
this problem is equivalent to min x s.t. xv 2 2vw + w2 0 for all v, w
x = 1 is the optimum
(taking v = w = 1 shows x 1, & clearly x = 1 is feasible)
in general, the feasible region is convex
a convex combination of PSD matrices is PSD
the separation problem is solved by finding Xs eigenvalues
X not PSD = it has a negative eigenvalue
let v be the corresponding eigenvector
vT Xv = vT v < 0
so vT Xv 0 is a separating hyperplane
i.e., it separates the current solution from the feasible region
& can be used to construct the next ellipse

Conclusion: For any > 0, any semidefinite program can be solved by the ellipsoid algorithm
to within an additive error of , in time polynomial in the input size and log (1/).
Examples:
1. (G) (Handout#50) is computed in polynomial time using semidefinite programming
2. .878 approximation algorithms for MAX CUT & MAX 2 SAT
(Goemans & Williamson, STOC 94); see Handout#44

CSCI 5654

H. Gabow

Fall 2007

#41, p. 5

Remarks.
1. SDP includes convex QP as a special case (Exercise below)
2. SDP also has many applications in control theory
3. work on SDP started in the 80s
interior point methods (descendants of Karmarkar) run in polynomial time
& are efficient in practice, especially on bigger problems
Exercise. We show SDP includes QP (Handout#42) and more generally, convex quadratically
constrainted quadratic programming (QCQP).
(i) For x Rn , show the inequality
(Ax + b)T (Ax + b) cx + d
is equivalent to


I
Ax + b
is PSD
(Ax + b)T cx + d
.
Hint. Just use the definition of PSD. Recall ax2 + bx + c is always nonnegative iff b2 4ac and
a + c 0.
(ii) Show QP is a special case of SDP.
(iii) Show QCQP is a special case of SDP. QCQP is minimizing a quadratic form (as in QP) subject
to quadratic constraints
(Ax + b)T (Ax + b) cx + d.

CSCI 5654

H. Gabow

Fall 2007

#41, p. 6

MC 16.4.4, V 23.1

Quadratic Programming Examples

Unit 8: Beyond Linearity

a Quadratic Program (QP) Q has the form


minimize 21 xT Qx + cx
subject to Ax b
x0
Q is an n n symmetric matrix (wlog)
we have a convex quadratic program if Q is positive semi-definite, i.e., xT Qx 0 for every x
justification: the objective function is convex, i.e., concave up, Q is PSD
Exercises.
1. Prove the above, i.e., denoting the objective function as c(x), we have
for all x, y, & with 0 1, c(x + (1 )y) c(x) + (1 )c(y) Q is PSD.
(Hint. Just the definition of PSD is needed.)
2. Show the objective of any convex QP can be written as
minimize (Dx)T (Dx) + cx
for D an n n matrix. And conversely, any such objective gives a convex QP. (Hint. Recall
Handout#64.) So again the restriction to PSD Qs is natural.
Example 1. Let P be a convex polyhedron & let p be a point not in P . Find the point in P closest
to p.
we want to minimize (x p)T (x p) = xT x 2xT p + pT p
so we have a QP with Q = I, c = pT
Q is PD
Example 2. Let P & P be 2 convex polyhedra that do not intersect. Find the points of P & P
that are closest together.
we want to minimize (x y)T (x y)
this gives a QP with c = 0, & Q PSD
Example 3, Data Fitting. (recall Handout#3)
Find the best least-squares data-fit, where we know a priori some linear relations among the parameters.
Example 4. In data mining, we construct a support vector machine by finding a hyperplane that
gives the best separation between 2 data sets (the positive and negative examples).

CSCI 5654

H. Gabow

Fall 2007

#42, p. 1

Example 5, Markowitzs Investment Model. (H. Markowitz won the 1990 Nobel Prize in Economics
for a model whose basics are what follows.)
We have historical performance data on n activities we can invest in. We want to invest in a
mixture of these activities that intuitively has maximum return & minimum risk. Markowitz
models this by maximizing the objective function
(expectation of the return) (variance of the return)
where is some multiplier.
Define
xi = the fraction of our investment that well put into activity i
ri = the (historical) average return on investment i
vi = the (historical) variance of investment i
cij = the (historical) covariance of investments i & j
If Ii is the random variable equal to the return of investment i, our investment returns
elementary probability the variance of this sum is
P

x2i vi + 2

xi Ii . By

i<j cij xi xj .

So forming r, the vector of expected returns, & the covariance matrix C = (cij ), Markowitzs QP
is
minimize xT Cx rx
subject to 1T x = 1
x0
Note that vi = cii . Also the covariance matrix C is PSD, since the variance of a random variable
is nonnegative.
Markowitzs model develops the family of solutions of the QP as varies
in some sense, these are the only investment strategies one should consider
the LINDO manual gives a similar example:
achieve a given minimal return (rx r0 ) while minimizing the variance

CSCI 5654

H. Gabow

Fall 2007

#42, p. 2

MC, 16.4.4

Unit 8: Beyond Linearity

Solving Quadratic Programs

we can reduce many QPs to LP


intuition: in the small, the QP objective is linear
1. An optimum solution x to Q is also optimum to the LP
minimize (c + xT Q)x
subject to Ax b
x0
Proof. take any feasible point and write it as x +
since x is optimum to Q, its objective in Q is at most c(x + ) + (x + )T Q(x + )/2
thus
(c + xT Q) + T Q/2 0
since the feasible region is convex, x + is feasible, for any 0 1
so the previous inequality holds if we replace by
we get an inequality of the form a + b2 0, equivalently a + b 0
this implies a 0
so (c + xT Q) 0 as desired

Remark. the proof shows if Q is PD, Q has 1 optimum point


since a 0, b > 0 = a + b > 0
Example 1.
consider the QP for distance from the origin,
min q = x2 + y 2
s.t. x + y 1
x, y 0
y

x*

x
x+y=1
2 2
x + y = 1/4

Fig.1 x = (1/2, 1/2) is the unique QP optimum.


x is not a corner point of the feasible region.
CSCI 5654

H. Gabow

Fall 2007

#43, p. 1



1 0
the cost vector of #1 is c = xT Q = (1/2, 1/2)
= (1/2, 1/2)
0 1
the linear cost function c x = x/2 + y/2 approximates q close to x
switching to cost c x, x is still optimum
although other optima exist: the edge E = {(x, 1 x, 0) : 0 x 1}
Example 2:
modify the QP to
min q = z 2 + x + y
s.t. x + y 1
x, y, z 0
the set of optima is edge E

0 0
this is consistent with the Remark, since Q = 0 0
0 0

0
0 is PSD but not PD
1

2. Complementary slackness gives us conditions equivalent to optimality of the above LP:


Lemma. x an optimum solution to Q = there are column vectors u, v, y (of length n, m, m
respectively) satisfying this LCP:
   T 
  
x
c
u
Q AT
=

b
A
0
y
v
 T  
u
x
=0
v
y
x, u, y, v 0
Proof.
the dual LP is
max yT b s.t. yT A c + xT Q, y 0
introducing primal (dual) slacks v, (uT ),
the complementary slackness conditions for optimality are
u, v 0, uT x = vT y = 0
the LCP expresses these conditions

the above LCP is the Karush-Kuhn-Tucker necessary conditions (KKT conditions) for optimality
taking Q = 0 makes Q an LP
& the KKT conditions become the complementary slackness characterization of LP optimality
in fact
+
+
+

we can think of the KKT conditions as nonnegativity


feasibility of the dual QP (condition on cT , see Handout#74)
primal feasibility (condition on b)
complementary slackness

CSCI 5654

H. Gabow

Fall 2007

#43, p. 2

Theorem. Let Q be PSD. Then


x is an optimum solution to Q x satisfies the KKT conditions.
Proof.
= : suppose x satisifies the KKT conditions
take any feasible point x +
the same algebra as above shows its objective in Q exceeds that of x by
(c + xT Q) + T Q/2
the first term is 0, since x is optimum to the LP of #1
this follows from the KKT conditions, which are complementary slackness for the LP
the second term is nonnegative, by PSD

Example: KKT does Calc I


well apply KKT in 1 dimension to optimize a quadratic function over an interval
problem: min Ax2 + Bx s.t. 0 x h
Remarks.
1. were minimizing a general quadratic Ax2 + Bx + C the C disappears since its irrelevant
2. for convenience we assume the intervals left end is nonnegative
3. really only the sign of A is important
our QP has Q = (2A), c = (B), b = (, h)T ,
so KKT is

 
2A 1
u
1
0
v
1
0

A = (1, 1)T


B
1
x
0 y1 =
h
y2
0

ux, v1 y1 , v2 y2 = 0

x, u, y, v 0
the linear constraints in scalar form:
u 2Ax + y1 y2 = B
v1 x
=
v2 + x
=h
CS allows 3 possibilities, v1 = 0, v2 = 0, or y1 = y2 = 0
v1 = 0: = x =
v2 = 0: = x = h
y1 = y2 = 0:
when u = 0: = 2Ax = B (i.e., first derivative = 0), x = B/2A
when u > 0: = x = 0, so = 0, and x = as in first case
so KKT asks for the same computations as Calcs set-the-derivative-to-0 method
CSCI 5654

H. Gabow

Fall 2007

#43, p. 3

Lagrangian Multiplier Interpretation


Lagrangian relaxation tries to eliminate constraints
by bringing them into the objective function with a multiplicative penalty for violation
the Lagrangian for Q is
L(x, y, u) = cx + 12 xT Qx yT (Ax b) uT x
the Lagrangian optimality conditions are:
feasibility: Ax b, x 0
nonnegativity of multipliers: y, u 0
no gain from feasibility: (Ax)i > bi = yi = 0; xj > 0 = uj = 0
T
T
1st order optimality condition: L
x = 0, i.e., c + Qx A y u = 0
Remarks.
1. the constraints preceding 1st order optimality ensure that
L(x, y, u) upper-bounds the objective function of Q
2. the Lagrangian optimality conditions are exactly the KKT conditions
3. LINDO specifies a QP as an LP, in this form:
min x1 + . . . + xn + y1 + . . . + ym
subject to
1st order optimality constraints
Ax b
end
QCP n + 2
in the objective function, only the order of the variables is relevant
it specifies the order of the 1st order optimality conditions, & the order of the dual variables
the us are omitted: the 1st order optimality conditions are written as inequalities
the QCP statement gives the row number of the first primal constraint Ax b

CSCI 5654

H. Gabow

Fall 2007

#43, p. 4

Vz,Ch26 Semidefinite Programming:Approximating MAXCUT Unit 8: Beyond Linearity


many approximation algorithms for NP-hard problems are designed as follows:
formulate the problem as an ILP
relax the ILP to an LP by discarding the integrality constraints
solve the LP
use a rounding procedure to perturb the LP solution to a good integral solution
in the last 10 years a more powerful approach has been developed,
using semidefinite programming (SDP) instead of LP
here we model the problem by a general quadratic program
(achieve integrality using quadratic constraints)
relax by changing the variables to vectors
solve the relaxation as an SDP, and round
we illustrate by sketching Goemans & Williamsons approximation algorithm for MAX CUT
in the MAX CUT problem were given an undirected graph G
we want a set of vertices S with the greatest number of edges joining S and V S
more generally, each edge ij has a given nonnegative weight wij
& we want to maximize the total weight of all edges joining S and V S
this problem can arise in circuit layout
General Quadratic Program
each vertex i has a variable ui {1, 1}
the 2 possibilities
for ui correspond to the 2 sides of the cut

1 i and j are on the same side of the cut
so ui uj =
1 i and j are on opposite sides of the cut
its easy to see MAX CUT amounts to this quadratic program:
P
maximize (1/2) i<j wij (1 ui uj )
subject to u2i = 1 for each vertex i
next we replace the n variables ui by n n-dimensional vectors ui
quadratic terms ui uj become inner products ui uj
we get this vector program:
P
maximize (1/2) i<j wij (1 ui uj )
subject to ui ui = 1
for each vertex i
ui Rn for each vertex i
a cut gives a feasible solution using vectors (1, 0, . . . , 0)
so this program is a relaxation of MAX CUT
for any n n-dimensional vectors ui , i = 1, . . . n
form the n n matrix B whose columns are the ui
then X = BT B is PSD (Handout#64) with xij = ui uj
furthermore, any symmetric PSD X arises in this way

CSCI 5654

H. Gabow

Fall 2007

#44, p. 1

thus our vector program is equivalent to the following program:


SDP
P
maximize (1/2) i<j wij (1 xij )
subject to
xii = 1
for each vertex i
xij = xji
for each i 6= j
(xij ) is P SD
this is a semidefinite program (Handout#41)
so we can solve it (to within arbitrarily small additive error) in polynomial time
the vectors ui can be computed from (xij ) (to within any desired accuracy)
in polynomial time (Handout#64)
so we can assume we have the vectors ui ; now we need to round them to values 1
in the rest of the discussion let ui & uj be 2 arbitrary vectors
abbreviate them to u & u , and abbreviate wij to w
also let be the angle between u & u
recall the definition of scalar product (Handout#65)
then our 2 vector contribute (w/2)(1 cos ) to the objective function
the bigger the angle , the bigger the contribution
so we should round vectors with big angles to opposite sides of the cut
Rounding Algorithm. Let H be a random hyperplane through the origin in Rn . Round all vectors
on the same side of H to the same side of the cut. (Vectors on H are rounded arbitrarily.)
H

Random hyperplane H separating 2 unit vectors at angle .


generating H in polynomial time is easy, we omit the details

CSCI 5654

H. Gabow

Fall 2007

#44, p. 2

the only remaining question is, how good an approximation do we get?


let OPT denote the maximum weight of a cut
let z be the optimum value of the SDP (so z OPT)
let EC be the expected weight of the algorithms cut
the (expected worst-case) approximation ratio is the smallest possible value of = EC/OPT
so EC/z
linearity of expectations shows EC is the sum, over all pairs i, j,
of the expected contribution of edge ij to the cuts weight
so we analyze the expected contribution of our 2 typical vectors u, u
the probability that u & u round to opposite sides of the cut is exactly / (see the figure)
.. the contribution of this pair to EC is w/
then min

w/
=
min
(w/2)(1 cos )
0 1 cos

calculus shows the last expression is > .878


to simplify the calculation, verify the identity 2/ > .878(1 cos )
Conclusion. The SDP algorithm has approximation ratio > .878.

CSCI 5654

H. Gabow

Fall 2007

#44, p. 3

Unit 9.A: Overview

More LP & ILP Examples


Minimum Cost Network Flow

a network has sites interconnected by links


material circulates through the network
transporting material across the link from site i to site j costs cij dollars per unit of material
the link from i to j must carry ij units of material and uij units
find a minimum cost routing of the material
letting xij be the amount of material shipped on link ij gives this LP:
minimize z =

Pn

Pn

subject to

Pn

xij

i=1

i=1

j=1 cij xij

Pn

k=1 xjk

=0
xij ij
xij uij

j = 1, . . . , n
i, j = 1, . . . , n
i, j = 1, . . . , n

flow conservation

Some Variations
Networks with Losses & Gains
a unit of material starting at i gets multiplied by mij while traversing link ij
P
P
so replace conservation by ni=1 mij xij nk=1 xjk = 0
example from currency conversion:
a site is a currency, e.g., dollars, pounds
mij = the number of units of currency j purchased by 1 unit of currency i
sample problem: convert $10000 into the most rubles possible
more examples: investments at points in time ($1 now $1.08 in a year), conversion of raw
materials into energy (coal electricity), transporting materials (evaporation, seepage)
Multicommodity Flow
1 network transports flows of various types (shipping corn, wheat, etc.)
use variables xkij , for k ranging over the commodities
if were routing people, Internet packets or telephone messages, we get an ILP (xkij integral)
in the next 2 examples take ij = 0 (for convenience)
Concave Down Cost Functions (works for any LP)
for convenience assume were maximizing z = profit, not minimizing cost
the profit of transporting material on link ij is a piecewise linear concave down function:
CSCI 5654

H. Gabow

Fall 2007

#45, p. 1

c(xij )

m3

m2

m1

xij

b1
b2
uij
decreasing returns to scale
replace xij by 3 variables rij , sij , tij
each appears in the flow conservation constraints where xij does
bounds on variables: 0 rij b1 , 0 sij b2 b1 , 0 tij uij b2
objective function contains terms m1 rij + m2 sij + m3 tij
concavity of c(xij ) ensures this is a correct model
Fixed Charge Flow
link ij incurs a startup cost sij if material flows across it
ILP model:
introduce decision variable yij = 0 or 1
new upper bound constraint: xij uij yij
objective function: add term sij yij
Assignment Problem
there are n workers & n jobs
assigning worker i to job j costs cij dollars
find an assignment of workers to jobs with minimum total cost
let xij be an indicator variable for the condition, worker i is assigned to job j
we get this LP:
minimize z =
subject to

Pn

i=1

Pn

j=1 cij xij

Pn

j=1

Pn

i=1

xij

=1

i = 1, . . . , n

xij
xij

=1
0

j = 1, . . . , n
i, j = 1, . . . , n

xij should be constrained to be integral


but the optimum always occurs for an integal xij
so we solve the ILP as an LP!

CSCI 5654

H. Gabow

Fall 2007

#45, p. 2

Set Covering
constructing fire station j costs cj dollars, j = 1, . . . , n
station j could service some known subset of the buildings
construct a subset of the n stations so each building can be serviced
and the cost is minimum
let aij = 1 if station j can service building i, else 0
let xj be an indicator variable for construcing station j
minimize z =

Pn

subject to

Pn

j=1 cj xj

j=1

aij xj 1

i = 1, . . . , m

xj 0, integral

j = 1, . . . , n

this ILP is a set covering problem


choose sets from a given family, so each element is covered, minimizing total cost
similarly we have
set packing problem choose disjoint sets, maximizing total cost
set partitioning problem choose sets so every element is in exactly one set
Facility Location
elaborates on the above fire station location problem
there are m clients and n potential locations for facilities
we want to open a set of facilities to service all clients, minimizing total cost
e.g., post offices for mail delivery, web proxy servers
constructing facility j costs cj dollars, j = 1, . . . , n
facility j services client i at cost of sij dollars
ILP model:
let xj be an indicator variable for opening facility j
let yij be an indicator variable for facility j servicing client i
minimize z =

j cj xj

i,j

sij yij

yij = 1
yij xj
yij , xj {0, 1}

subject to

i a client
i a client, j a facility
i a client, j a facility

this illustrates how ILP models if then constraints

CSCI 5654

H. Gabow

Fall 2007

#45, p. 3

Quadratic Assignment Problem


there are n plants and n locations
we want to assign each plant to a distinct location
each plant p ships spq units to every other plant q
the per unit shipping cost from location i to location j is cij
find an assignment of plants to locations with minimum total cost
1
10
2

1
7

1
3

5
(a)

3
(b)

Quadratic assignment problem.


(a) Amounts shipped between 3 plants.
(b) Shipping costs for 3 locations.
Optimum assignment = identity, cost 10 1 + 7 2 + 5 3 = 39.
let xip be an indicator variable for assigning plant p to location i
set dijpq = cij spq
minimize z =
subject to

i,j,p,q

dijpq xip xjq


Pn

p=1

Pn

i=1

CSCI 5654

H. Gabow

xip = 1

i = 1, . . . , n

xip = 1
xip {0, 1}

p = 1, . . . , n
i, p = 1, . . . , n

Fall 2007

#45, p. 4

Remarks.
1. we could convert this to an ILP
introduce variables yijpq {0, 1}, & force them to equal xip xjq by the constraint
yijpq xip + xjq 1
but this adds many new variables & constraints
2. a quadratic program has the form
maximize z =

j,k cjk xj xk

j cj xj

Pn

subject to

j=1 aij xj

bi

i = 1, . . . , m

3. we can find a feasible solution to an ILP


maximize z =

subject to

Pn

j cj xj

j=1

aij xj bi
xj {0, 1}

i = 1, . . . , m
j = 1, . . . , n

by solving the QP
maximize z =
subject to

CSCI 5654

j (xj

1/2)2

Pn

j=1 aij xj
xj

bi
1
xj 0

H. Gabow

i = 1, . . . , m
j = 1, . . . , n
j = 1, . . . , n

Fall 2007

#45, p. 5

Murty, Ch.1

Multiobjective Functions

Unit 9.A: Overview

Example
we want to maximize profit 500p + 620f from producing pentium (p) & 486 (f ) computer systems
but maintaining a high-tech image dictates maximize p
whatever the strategy it should be a vector maximum, (pareto optimum) i.e.,
if we produce (p, f ) units, no other feasible schedule (p , f ) should have p p & f f
we give several approaches to multiobjective problems
a common aspect is that in practice, we iterate the process
resolving the LP with different parameters until a satisfactory solution is obtained
sensitivity analysis techniques (Handout #35) allow us to efficiently resolve an LP
if we modify it in a way that the solution changes by a small amount
now write our objective functions as fk =

j ckj xj

+ dk , k = 1, . . . , r

Prioritization (lexicographic optimum)


index the objective functions in order of decreasing importance f1 , . . . , fk
solve the LP using objective maximize z = f1
let z1 be the maximum found
if the optimum solution vector (x1 , . . . , xn ) is unique, stop
otherwise
P
add the constraint j c1j xj + d1 = z1
repeat the process for f2
keep on repeating for f3 , . . . , fr
until the optimum is unique or all objectives have been handled
Remarks
1. the optimum is unique if every nonbasic cost coefficient is negative
the possibility that a nonbasic cost coefficient is 0 prevents this from being iff
2. sensitivity analysis allow us to add a new constraint easily
Worst-case Approach
optimize the minimax value of the objectives
(as in Handout #3)
Weighted Average Objective
P
solve the LP with objective k wk fk
where the weights wk are nonnegative values summing to 1
if the solution is unreasonable, adjust the weights and resolve
starting the simplex from the previous optimum will probably be very efficient
CSCI 5654

H. Gabow

Fall 2007

#46, p. 1

Goal Programming
adapt a goal value gk for each objective function fk
and use appropriate penalities for excess & shortages of each goal
e.g., pe = ps = 1 keeps us close to the goal
pe = 0, se = 1 says exceeding the goal is OK but each unit of shortfall incurs a unit penalty
iterate this process, varying the parameters, until a satisfactory solution is achieved
Goal Setting with Marginal Values
choose a primary objective function f0 and the other objectives fk , k = 1, . . . , r
f0 is most naturally the monetary price of the solution
adapt goals gk , k = 1, . . . , r
solve the LP with objective z = f0 and added constraints fk = gk , k = 1, . . . , r
duality theory (Handout #20) reveals the price pk of each goal gk :
changing gk by a small changes the cost by pk
use these prices to compute better goals that are achieved at an acceptable cost
resolve the LP to verify the predicted change
iterate until the solution is satisfactory

CSCI 5654

H. Gabow

Fall 2007

#46, p. 2

Geometric View of the Simplex Algorithm: Proofs

Unit 9.B: Fundamentals

this handout proves the assertions of Handout#10,p.3


consider a standard form LP L, with P the corresponding convex polyhedron
P Rn , activity space, i.e., no slacks
we associate each (decision or slack) variable of L with a unique constraint of L:
the
 constraint for xj is
nonnegativity
if xj is a decision variable
the corresponding linear inequality if xj is a slack variable
(minor point: no variable is associated with the nonnegativity constraint of a slack variable)
a variable = 0 its constraint holds with equality
Fact 1. A bfs is a vertex x of P plus n hyperplanes of P that uniquely define x.
() The constraints of the nonbasic variables are the n hyperplanes that define x.
Proof.
consider a bfs x, and its corresponding dictionary D (there may be more than 1)
when the nonbasic variables are set to 0, x is the unique solution of D
hence x is the unique point on the hyperplanes of the nonbasic variables
(since D is equivalent to the initial dictionary, which in turn models L)
so weve shown a bfs gives a vertex, satisfying ()
conversely, well show that any vertex of P corresponds to a dictionary, satisfying ()
take n hyperplanes of P that have x as their unique intersection
let N be the variables that correspond to these n hyperplanes
let B be the remaining m variables
set the variables of N to 0
this gives a system of n equations with a unique solution, x
lets reexpress this fact using matrices:
write LP L in the equality form Ax = b of Handout#23,p.1
then AB x = b has a unique solution
this shows the matrix AB is nonsingular (Handout#55,p.2)
thus the Theorem of Handout#23,p.2 shows B is a basis
the nonbasic variables N are described by ()

Fact 2. A nondegenerate pivot moves from one vertex, along an edge of P , to an adjacent vertex.
Proof.
a pivot step travels along a line segment whose equation, in parameterized form,
is given in Handout #8, Property 5:
CSCI 5654

H. Gabow

Fall 2007

#47, p. 1

xj =

(t

bj ajs t
0

j=s
jB
j
/ Bs

let t range from to in this equation to get a line


in traversing , the n 1 variables other than B s remain at 0
thus lies in each of the n 1 corresponding hyperplanes
in fact the dictionary shows is exactly equal to the intersection of these n 1 hyperplanes
so the portion of traversed in the pivot step is an edge of P

the last fact is a prerequisite for Hirschs Conjecture:


Fact 3. Any 2 vertices of a convex polyhedron are joined by a simplex path.
Proof.
let v & w be any 2 vertices of P
Pn
Claim. there is a cost function j=1 cj xj that is tangent to P at w
Pn
Pn
i.e., the hyperplane j=1 cj xj = j=1 cj wj passes thru w, but thru no other point of P

the Claim implies Fact 3:


execute the simplex algorithm on the LP L with the Claims objective function
choose the initial dictionary to correspond to vertex v
simplex executes a sequence of pivots that end at w (since w is the only optimum point)
this gives the desired simplex path
Proof of Claim.
P
write every constraint of L in the form nj=1 aij xj bi
so a nonnegativity constraint xj 0 becomes xj 0
let w be the unique intersection ofPn hyperplanes of P ,
n
corresponding to constraints j=1 aij xj bi , i = 1, . . . , n
(some of these may be nonnegativity)
Pn
take cj = i=1 aij
Pn
Pn
every point x P w has j=1 cj xj < j=1 cj wj
since w satisfies the n constraints with =
& every other point of P has < in at least 1 of these constraints

(see Handout#53 for a deeper look at the Claim)


Corollary. Any face of P is the set of optimum solutions for some objective function.
Proof. as above, use the hyperplanes of the face to construct the objective function

the converse of this statement is proved in the first exercise of Handout#19

CSCI 5654

H. Gabow

Fall 2007

#47, p. 2

Chvatal, 479, 2558

Unit 9.B: Fundamentals

Inefficient Pivot Sequences

Simplex Cycles
the simplest example of cycling in the simplex algorithm
has the variables swapped in and out in a fixed cyclic order
+x3 -x1
+x1 -x2

+x2 -x3

D1

D2

D3

Chvatals example of cycling (pp.3132) is almost as simple:


+x6 -x4

+x2 -x6

+x1 -x5
D1

+x3 -x1

D2

D3

+x4 -x2

+x5 -x3

D4

D5

D6

but cycles in the simplex algorithm can be exponentially long!


e.g., a cycle can mimic a Gray code
a Gray code is a sequential listing of the 2n possible bitstrings of n bits, such that
each bitstring (including the 1st) differs from the previous by flipping 1 bit
Examples:
(i) 00, 01, 11, 10
(ii) Gn is a specific Gray code on n bits, from 0 . . . 0 to 10 . . . 0:
recursive recipe for Gn :
start with 0Gn1 (00 . . . 0 . . . 01 . . . 0)
flip the leftmost bit to 1 ( 110 . . . 0)
do 1Gn1 in reverse (110 . . . 0 100 . . . 0)
e.g., example (i) is G2 , and G3 is
000, 001, 011, 010, 110, 111, 101, 100
Gk gives a simplex cycle of 2k pivots involving only 2k variables, e.g.,
+s2 - x2

+x1 - s1
00

+x2 - s2
01

+s1 - x1
11

10

simplex cycle for k = 2

CSCI 5654

H. Gabow

Fall 2007

#48, p. 1

+s3 - x3

+x1 - s1
000

+x2 - s2
001

+s1 - x1
011

+x3 - s3
010

+x1 - s1
110

+s2 - x2
111

+s1 - x1
101

100

simplex cycle for k = 3


geometrically the Gray code Gn is a Hamiltonian tour
of the vertices of the n-dimensional unit cube
Klee-Minty Examples
on these LPs the simplex algorithm takes 2n 1 pivots to find the optimum
the feasible region is a (perturbed) n-dimensional unit cube
so the standard form LP has n variables and n constraints
the pivot sequence is the above sequence derived from Gn
00 . . . 0 . . . 01 . . . 0 110 . . . 0 100 . . . 0
initial
optimum
bfs
bfs
you can check this using Problem 4.3 (Chvatal, p.53). it says
after 2n1 1 pivots xn1 is the only basic decision variable
this corresponds to 010 . . . 0
after 2n 1 pivots xn is the only basic decision variable
this corresponds to 100 . . . 0
Klee-Minty examples have been devised for most known pivot rules

CSCI 5654

H. Gabow

Fall 2007

#48, p. 2

Chvatal, 37 38

Stalling in Blands Rule

Unit 9.B: Fundamentals

the smallest-subscript rule can do an exponential number of pivots before finding the optimum
in fact it can stall for an exponential number of pivots!
to understand stalling well redo the proof of Handout #12:
consider a sequence S of degenerate pivots using the smallest-subscript rule
so the bfs never changes in S
say a pivot step involves the entering & leaving variables, but no others
a variable xi is fickle if its involved in > 1 pivot of S
if there are no fickle variables, |S| n/2
suppose S has fickle variables; let t be the largest subscript of a fickle variable
Corollary. S has a nonfickle variable
which is involved in a pivot between the first two pivots involving xt .
Proof. (essentially same argument Handout #12)
let F be the set of subscripts of fickle variables
let D & D be the dictionaries of the first two pivot steps involving xt , with pivots as follows:
xs xt

xt

D may precede D in S or vice versa


P
()
as in Handout #12, cs = cs iB ci ais

cs > 0: since xs is entering in Ds pivot

we can assume cs 0:
suppose cs > 0
= xs is nonbasic in D
s > t (since xs doesnt enter in D s pivot)
= xs isnt fickle
so Ds pivot proves the Corollary!
(note: D precedes D in this case)
ci ais 0 for i B F :
Case i = t:
ats > 0: since xt is leaving in Ds pivot
ct > 0: since xt is entering in D s pivot
CSCI 5654

H. Gabow

Fall 2007

#49, p. 1

Case i B (F t):
ais 0: bi = 0 (since xi = 0 throughout S)
but xi isnt the leaving variable in Ds pivot
ci 0: otherwise xi is nonbasic in D & D s pivot makes xi entering (i < t)
since the r.h.s. of () is positive, some i B F has ci 6= 0
hence xi is in B but not B , i.e., a pivot between D & D involves xi

we can apply the Corollary repeatedly to reveal the structure of S:


1st 2 pivots involving xt

S
S
some variable drops
out of problem

reapply the Corollary


to this sequence S
involving 1 less variable

starting with n variables, n can drop out


so eventually there are no fickle variables
i.e., the smallest subscript rule never cycles
but S can have exponential length
the recursive nature of this picture is a guide to constructing a bad example
Stalling Example
Chvatal (1978) gave the following Klee-Minty example:
let be a real number 0 < < 1/2
consider the LPP
n
maximize j=1 nj xj
Pi1
subject to 2( j=1 ij xj ) + xi + xn+i = 1,
xj 0,

i = 1, . . . , n
j = 1, . . . , 2n

start with the bfs (0, . . . , 0, 1, . . . , 1) of n 0s & n 1s


& use Blands rule
it takes fn pivots to reach the optimum
where fn is defined by
f1 = 1, f2 = 3, fn = fn1 + fn2 1
& fn (the nth Fibonacci number) 1.6n2
a minor variant of this LP does exactly the same pivots at the origin
i.e., it stalls for an exponential number of pivots
& then does 1 pivot to the optimum

CSCI 5654

H. Gabow

Fall 2007

#49, p. 2

Combinatoric Example of Weak Duality

Unit 9.C: Duality

Example Graph & ILPs


a

d

e

graph & maximum independent set

maximize xa + xb + xc + xd + xe
subject to xa + xb 1
xa + xc 1
xb + xc 1
xb + xd 1
xc + xe 1
xd + xe 1
xa , xb , xc , xd , xe {0, 1}
maximum independent set ILP

.5

.5

.5

.5
.5

maximum fractional
independent set

minimize yab + yac + ybc + ybd + yce + yde


subject to yab + yac 1
yab + ybc + ybd 1
yac + ybc + yce 1
ybd + yde 1
yce + yde 1
yab , yac , ybc , ybd , yce , yde {0, 1}
minimum edge cover ILP

maximize xa + xb + xc + xd + xe
subject to xa + xb + xc 1
xb + xd 1
xc + xe 1
xd + xe 1
xa , xb , xc , xd , xe {0, 1}

minimize yabc + ybd + yce + yde


subject to yabc 1

yabc + ybd 1

yabc + yce 1
ybd + yde 1
yce + yde 1
yabc , ybd , yce , yde {0, 1}

minimum clique cover ILP


minimum
clique cover

maximum independent set ILP


(clique constraints)

CSCI 5654

H. Gabow

minimum
edge cover

Fall 2007


.5

.5

.5

.5

.5

minimum fractional
edge cover

#50, p. 1

Independent Sets & Duality


consider an undirected graph
an independent set is a set of vertices, no two of which are adjacent
a maximum independent set contains the greatest number of vertices possible
finding a maximum independent set is an NP-complete problem
we can formulate the problem as an ILP:
each vertex j has a 0-1 variable xj
xj = 1 if vertex j is in the independent set, 0 if it is not
maximum
independent
set ILP:

Pn
maximize z = j=1 xj
subject to xj + xk 1
xj {0, 1}

(j, k) an edge of G
j = 1, . . . , n

LP
relaxation:

Pn
maximize z = j=1 xj
subject to xj + xk 1
xj 0

(j, k) an edge of G
j = 1, . . . , n

(the LP relaxation neednt constrain xj 1)


number the edges of G from 1 to m
LP
dual:

P
y
minimize z = m
P i=1 i
subject to {yi : vertex j is on edge i} 1
yi 0

j = 1, . . . , n
i = 1, . . . , m

integral
dual:

P
y
minimize z = m
P i=1 i
subject to {yi : vertex j is on edge i} 1
yi {0, 1}

j = 1, . . . , n
i = 1, . . . , m

(constraining yi integral is the same as making it 0-1)


an edge cover of a graph is a set of edges spanning all the vertices
a minimum edge cover contains the fewest number of edges possible
the integral dual is the problem of finding a minimum edge cover
yi = 1 if edge i is in the cover, else 0
Weak Duality implies
(size of a maximum independent set) (size of a minimum edge cover)
indeed this is obvious each vertex of an independent set requires its own edge to cover it
we can find a minimum edge cover in polynomial time
thus getting a bound on the size of a maximum independent set

CSCI 5654

H. Gabow

Fall 2007

#50, p. 2

A Better Upper Bound


a clique of a graph is a complete subgraph, i.e., a set of vertices joined by every possible edge
an independent set contains 1 vertex in each clique
this gives an ILP with more constraints:
P
clique
maximize z = nj=1 xj
P
constraint
subject to {xj : vertex j is in C} 1
MIS ILP:
xj {0, 1}

C a maximal clique of G
j = 1, . . . , n

this can be a large problem


the number of maximal cliques can be exponential in n !
the payoff is the LP solution will be closer to the ILP
LP relaxation: last line becomes xj 0
P
dual
minimize z = {yC : C a maximal clique of G}
P
LP:
subject to {yC : vertex j is in C} 1
j = 1, . . . , n
yC 0
C a maximal clique of G
integral dual LP: yC {0, 1}
a clique cover is a collection of cliques that spans every vertex
the integral dual LP is the problem of finding a minimum clique cover
Weak Duality:
(size of a maximum independent set) (size of a minimum clique cover)
the rest of this handout assumes we use the clique constraint ILP for maximal independent sets
a graph is perfect if the relaxed LP is an integral polyhedron
equivalently, G, and all its induced subgraphs,
achieve equality in ILP Weak Duality (Chvatal, 1975)
there are many families of perfect graphs:
bipartite graphs, interval graphs, comparability graphs, triangulated (chordal) graphs, . . .
a maximum independent set of a perfect graph can be found in polynomial time
(Gr
otschel, Lovasz, Schrijver, 1981)
the Lovasz number (G) lies in the duality gap of the two ILPs
for a perfect graph (G) is the size of a maximum independent set
(G) is computable in polynomial time (GLS)
(G)

maximum independent set size

LP optimum

minimum edge cover size

duality gap

CSCI 5654

H. Gabow

Fall 2007

#50, p. 3

Generalization to Hypergraphs
consider a family F of subsets of V
a packing is a set of elements of V , 1 in each set of F
a covering is a family of sets of F collectively containing all elements of V
maximum packing
Pn ILP:
P
maximize j=1 xj subject to {xj : j S} 1, S F; xj {0, 1}, j = 1, . . . , n
minimum covering
P ILP:
P
minimize SF yS subject to {yS : j S} 1, j = 1, . . . , n; yS {0, 1}, S F
as above, the LP relaxations of these 2 ILPs are duals, so any packing is any covering
this packing/covering duality is the source of a number of beautiful combinatoric theorems
where the duality gap is 0
in these cases the ILPs are solvable in polynomial time!
e.g., finding a maximum flow; packing arborescences in a directed graph

CSCI 5654

H. Gabow

Fall 2007

#50, p. 4

Polyhedral Combinatorics

Unit 9.C: Duality

Perfect Graph Example


K3 is the triangle:

here are the MIS polyhedra for K3 :


(0,0,1)

(0,0,1)

(.5,.5,.5)

(1,0,0)

(0,1,0)

(0,1,0)

(1,0,0)

clique constraints

edge constraints

the clique constraints give an integral polyhedron


so K3 is perfect
observe that the vertices of this polyhedron correspond to the independent sets of K3
(more precisely, the vertices are the characteristic vectors of the independent sets)
this holds in general:

CSCI 5654

H. Gabow

Fall 2007

#51, p. 1

Theorem. Take any graph, & its MIS polyhedron P


defined by edge constraints or clique constraints.
P is an integral polyhedron
its vertices are precisely the independent sets of G.
Proof.
= : trivial (the characteristic vector of an independent set is 0-1)
= : the argument consists of 2 assertions:
(i) every independent set is a vertex of P
(ii) every vertex of P is an independent set
for simplicity well prove the assertions for the edge constraints
the same argument works for the clique constraints
Proof of (i)
let I be an independent set, with corresponding vector (xi )
xi = 1 if i I else 0
choose n constraints (that (xi ) satisfies with equality) as follows:
for i
/ I choose nonnegativity, xi 0
for i I choose the constraint for an edge containing i
no 2 is choose the same edge constraint, since I is independent
(xi ) satisfies these n constraints with equality, & no other point of P does:
each chosen edge constraint has 1 end constrained to 0
so the other end must equal 1
Proof of (ii)
a vertex (xi ) is a 0-1 vector, by nonnegativity and the edge constraints
if xi = xj = 1 then (i, j) is not an edge (since (xi ) is feasible)
thus (xi ) corresponds to an independent set

Polyhedral Combinatorics
to analyze the independent sets of a graph G,
we can analyze the polyhedron P whose vertices are those independent sets
this depends on having a nice description of P
which we have if G is perfect
in general polyhedral combinatorics analyzes a family of sets
by analyzing the polyhedron whose vertices are (the characteristic vectors of) those sets

CSCI 5654

H. Gabow

Fall 2007

#51, p. 2

Unit 9.C: Duality

Duality & NP-Completeness

Disjoint Paths Problem: Given a graph, vertices s, t & integer k, are there k openly disjoint
st-paths?
s1

s2

t2

t1

Example graph G0 has 2 openly disjoint s1 t1 -paths


& 3 openly disjoint s2 t2 -paths
Disjoint Paths Problem is in P (i.e., it has a polynomial-time algorithm) because of this min-max
theorem:
Mengers Theorem. For any 2 nonadjacent vertices s, t, the greatest number of openly disjoint
st-paths equals the fewest number of vertices that separate s and t.
Hamiltonian Cycle Problem: Does a given graph have a tour passing through each vertex exactly
once?
e.g., G0 has a Hamiltonian cycle

The Peterson graph has no Hamiltonian cycle.


the Hamiltonian Cycle Problem is in N P
because a Hamiltonian cycle is a succinct certificate for a yes answer
the Hamiltonian Cycle Problem N P-complete
& does not seem to have a succinct certificate for a no answer
the Disjoint Paths Problem is in P
both yes & no answers have succinct certificates:
yes answer: the k paths form a succinct certificate
no answer: the < k separating vertices form a succinct certificate

CSCI 5654

H. Gabow

Fall 2007

#52, p. 1

S 7.5; Chvatal 261262

Unit 9.C: Duality

Geometric Interpretation of Duality

Example
Primal
maximize z = x1 + 4x2
subject to
x1 + x2
x1

1
2

Dual
minimize w = y1 + 2y2
subject to
y1 + y2
y1
y1 , y2

=1
=4
0

optimum primal: x1 = 2, x2 = 3, z = 14
optimum dual: y1 = 4, y2 = 5, w = 14
(1,4)

x2

x2

(1,1)

(2,3)

(1,0)
x1+4x2=14

x1+4x2=14
(6,2)
x1+x2=1

x1=2

x1+x2=1

x1=2

x1

x1

recall that the vector (a, b) is normal to the line ax + by = c


and points in the direction of increasing ax + by
e.g., see Handout#65
the objective is tangent to the feasible region at corner point (2, 3)
= its normal lies between the normals of the 2 constraint lines defining (2, 3)
all 3 vectors point away from the feasible region
.. the vector of cost coefficients (i.e., (1, 4)) is a nonnegative linear combination of
the constraint vectors defining (2, 3) (i.e., (1, 1) & (1, 0)):
(1, 4) = 4(1, 1) + 5(1, 0)
the linear combination is specified by the optimum dual values y1 = 4, y2 = 5!

CSCI 5654

H. Gabow

Fall 2007

#53, p. 1

The General Law


consider this pair of LPs:
Primal
Pn
maximize z =
j=1 cj xj
subject to

Pn

j=1

aij xj

minimize w =
bi

subject to

Dual
Pm

i=1 bi yi

Pm

i=1 aij yi

yi

= cj
0

Remark. the primal is the general form of a polyhedron


Notation:
let P be the feasible region of the primal
we use these vectors:
(xj ) denotes the
(cj ) denotes the
(ai ) denotes the
(yi ) denotes the

vector
vector
vector
vector

(x1 , . . . , xn )
of cost coefficients
(ai1 , . . . , ain )
of dual values (y1 , . . . , ym )

suppose the primal optimum is achieved at corner point (xj )


with the objective tangent to P
(xj ) is the intersection of n hyperplanes of P
let them be for the first n constraints, with normal vectors (ai ), i = 1, . . . , n
(cj ) is a nonnegative
linear combination of the n normal vectors (ai ), i = 1, . . . , n
Pm
i.e., cj = i=1 aij yi where yi = 0 for i > n
.. (yi ) is dual feasible
its obvious that (xj ) & (yi ) satisfy Complementary Slackness, so (yi ) is optimal
Conclusion: Suppose an LP has a unique optimum point. The cost vector is a nonnegative linear
combination of the constraint vectors that define the optimum corner point. The optimum duals
are the mulitipliers in that linear combination.
Summary:
dual feasibility says (cj ) is a nonnegative linear combination of hyperplane normals
complementary slackness says only hyperplanes defining x are used

CSCI 5654

H. Gabow

Fall 2007

#53, p. 2

Illustrating Duality by Knapsack LPs

Unit 9.C: Duality

Primal Problem, a (continuous) knapsack LP:


maximize 9x1 + 12x2 + 15x3
subject to x1 + 2x2 + 3x3 5
xj 1 j = 1, 2, 3
xj 0 j = 1, 2, 3
Optimum Solution: x1 = x2 = 1, x3 = 2/3, objective z = 31
Optimum Dictionary
2 1
s0 +
3 3
x1 = 1 s 1
x2 = 1 s 2
1 1
s3 = + s0
3 3
x3 =

1
2
s1 + s2
3
3

1
2
s1 s2
3
3

z = 31 5s0 4s1 2s2


Dual LP:
minimize 5y0 + y1 + y2 + y3
subject to y0 + y1 9
2y0 + y2 12
3y0 + y3 15
yi 0 i = 0, . . . , 3
Optimum Dual Solution: y0 = 5, y1 = 4, y2 = 2, y3 = 0, objective z = 31
Multiplier Interpretion of Duals
adding 5 [x1 + 2x2 + 3x3 5] + 4 [x1 1] + 2 [x2 1] shows 9x1 + 12x2 + 15x3 31
i.e., proof of optimality
obviously we dont use x3 1 in the proof
Complementary Slackness
every xj positive = every dual constraint holds with equality
first 3 yi s positive = the knapsack constraint & 1st 2 upper bounds hold with equality

CSCI 5654

H. Gabow

Fall 2007

#54, p. 1

Testing Optimality
we verify (xj ) is optimal:
(2): inequality in 3rd upper bound = y3 = 0
(1) :
y0 + y1 = 9
2y0 + y2 = 12
3y0 = 15
= y0 = 5, y2 = 2, y1 = 4
(3): holds by definition
(4): holds
= (xj ) is optimum
Duals are Prices
How valuable is more knapsack capacity?
suppose we increase the size of the knapsack by
we can add /3 more pounds of item 3
increasing the value by 5
so the marginal price of knapsack capacity is 5 (= y0 )
How valuable is more of item 3?
obviously 0 (= y3 )
How valuable is more of item 1?
suppose more pounds of item 1 are available
we can add more pounds of item 1 to the knapsack
assuming we remove /3 pounds of item 3
this increases the knapsack value by 9 5 = 4
so the marginal price of item 1 is 4 (= y1 )
General Knapsack LPs
Primal:
maximize
subject to

n
X

vj xj

j=1
n
X

wj xj C

j=1

Dual:
minimize Cy0 +

xj 1

j = 1, . . . , n

xj 0

j = 1, . . . , n

n
X

yj

j=1

subject to

wj y0 + yj vj
yj 0

CSCI 5654

j = 1, . . . , n
j = 0, . . . , n

H. Gabow

Fall 2007

#54, p. 2

Optimal Solutions
assume the items are indexed by decreasing value per pound, i.e.,
v1 /w1 v2 /w2 . . .
optimum solution: using the greedy algorithm, for some s we get
x1 = . . . = xs1 = 1, xs+1 = . . . = xn = 0
to verify its optimality & compute optimum duals:
(2): ys+1 = . . . = yn = 0 (intuitively clear: theyre worthless!)
we can assume xs > 0
now consider 2 cases:
Case 1: xs < 1
(2): ys = 0
(1): equation for xs gives
y0 = vs /ws
equations for xj , j < s give
yj = vj wj (vs /ws )
(4): holds by our indexing
(3): first s equations hold by definition
remaining inequalities say y0 vj /wj , true by indexing
Case 2: xs = 1
(1): a system of s + 1 unknowns yj , j = 0, . . . , s & s equations
solution is not unique
but for prices, we know item s is worthless
so we can set ys = 0 and solve as in Case 1
Duals are Prices
our formulas for the duals confirm their interpretation as prices
Pn
the dual objective Cy0 + j=1 yj
computes the value of the knapsack and the items on hand
(i.e., the value of our scarse resources)
the dual constraint wj y0 + yj vj
says the monetary (primal) value of item j is no more than its value computed by price

CSCI 5654

H. Gabow

Fall 2007

#54, p. 3

Chvatal, 79 84

Unit 9.D: Implementation

Review of Matrices

a vector is a column vector or a row vector, i.e., an n 1 or 1 n matrix


so matrix definitions apply to vectors too
Notation
let x be a row vector & S be an ordered list of indices (of columns)
xS is the row vector
to S, ordered as in S
 of columnsof x corresponding

e.g., for x = 5 3 8 1 , x(2,1,4) = 3 5 1
define xS similarly if x is a column vector
define AS similarly if A is a matrix
where we extract the columns corresponding to S if S is a list of column indices
& similarly for rows

1 0
I is the identity matrix, e.g., 0 1
0 0
the dimension of I is unspecified

0
0
1
and determined by context


0
similarly 0 is the column vector of 0s, e.g. 0
0
with dimension determined by context
Matrix Operations
scalar multiple: for A = [aij ] an m n matrix, t a real number
tA is an m n matrix, [taij ]
matrix sum: for m n matrices A = [aij ], B = [bij ]
A + B is an m n matrix, [aij + bij ]
matrix product: for m n matrix A = [aij ], nP p matrix B = [bjk ]
AB is an m p matrix with ikth entry nj=1 aij bjk
time to multiply two n n matrices:
O(n3 ) using the definition
O(n2.38 ) using theoretically efficient but impractical methods
in practice much faster than either bound, for sparse matrices
only store the nonzero elements and their position
only do work on nonzero elements
matrix multiplication is associative, but not necessarily commutative
AI = IA = A for every n n matrix A
(see also Handout# 65 for more background on matrices)

CSCI 5654

H. Gabow

Fall 2007

#55, p. 1

Matrix Relations
for A, B matrices of the same shape,
A B means aij bij for all entries
A < B means aij < bij for all entries
Linear Independence & Nonsingularity
P
a linear combination of vectors xi is the sum i ti xi , for some real numbers ti
if some ti is nonzero the combination is nontrivial
a set of vectors xi is linearly dependent if some nontrivial linear combination of xi equals 0
let A be an n n matrix
A is singular the columns of A are linearly dependent
some nonzero vector x satisfies Ax = 0
for every column vector b, Ax = b
has no solution or an infinite number of solutions
A is nonsingular

0
1
2

the columns of A are linearly independent


for every column vector b, Ax = b has exactly one solution
A has an inverse, i.e., an n n matrix A1
AA1 = A1 A = I

Proof.
0 = 1 :
1 solution:
n column vectors in Rn that are linearly independent span Rn ,
i.e., any vector is a linear combination of them

1 solution: Ax = Ax = b = A(x x ) = 0 = x = x
1 = 2 :
construct A1 column by column so AA1 = I
to show A1 A = I:
A(A1 A) = A, so deduce column by column that A1 A = I
2 = 0 :
Ax = 0 = x = A1 Ax = A1 0 = 0

CSCI 5654

H. Gabow

Fall 2007

#55, p. 2

Chvatal, 100 105

Unit 9.D: Implementation

Revised Simplex Example

consider LP E of Handout #23


its solved by the standard simplex in 2 pivots:
Initial Dictionary

x1 enters, x3 leaves

x2 enters, x4 leaves

x3 = 1 x1
x4 = 2 x1 x2

x1 = 1 x3
x4 = 1 x2 + x3

x1 = 1 x3
x2 = 1 + x3 x4

z=

z = 3 + x2 3x3

z = 4 2x3 x4
Optimum Dictionary

3x1 + x2

Revised Simplex Algorithm for E


in matrix form of E,

x1

 


1 0 1 0
1
x2
A=
, x = , b =
,c= 3 1
1 1 0 1
2
x3
x4
initially B = (3, 4),

xB

0 0

 
1
=
2

1st Iteration


1 0
since we start with the basis of slacks, B =
, the identity matrix
0 1
thus all linear algebra is trivial
this is usually true in general for the first iteration
Entering Variable Step


yB = yI = y = cB = 0 0

in computing costs, yA.s = 0, so costs are the given costs, as expected


choose x1 as the entering variable, c1 = 3 > 0
Leaving Variable Step
 
1
Bd = d = A.s =
1
 
 
1
1
xB td =
t
0
2
1
take t = 1, x3 (1st basic variable) leaves
Pivot Step
xB

     
1
1
0
td =

=
2
1
1

CSCI 5654

H. Gabow

Fall 2007

#56, p. 1

xB =

 
1
1

B = (1, 4)
2nd Iteration
Entering Variable Step







1 0
y
= 3 0 ,y= 3 0
1 1
 

 0
trying x2 , 1 > 3 0
= 0 so it enters
1
Leaving Variable Step



 
 
1 0
0
0
d=
,d=
1 1
1
1
 
 
1
0
t
0
1
1

t = 1, x4 (2nd basic variable) leaves


Pivot Step
     
1
0
1

=
1
1
0
 
1
xB =
1
B = (1, 2)
3rd Iteration
Entering Variable Step






1 0
y
= 3 1 ,y= 2
1 1

all nonbasic costs are nonpositive:





 
 1 0


= 2 1
0 0 2 1
0 1
= optimum solution

CSCI 5654

H. Gabow

Fall 2007

#56, p. 2

Chvatal, 105 115

Eta Factorization of the Basis

Unit 9.D: Implementation

Approach
a revised simplex iteration solves 2 systems, yB = cB , Bd = As
then replaces rth column of B by As
& solves similar systems for this new B
maintaining B in a factored form makes the systems easy to solve & maintain
also maintains sparsity, numerically stable
Bd = As = the next B matrix is BE, where E is an eta matrix with rth column = d
thus Bk = B E+1 E+2 . . . Ek1 Ek

()

where Bi = the basis after i iterations


Ei = the eta matrix used in the ith iteration to get Bi
0k
() is the eta factorization of the basis
Case 1. The Early Pivots
in () take = 0, B0 = I (assuming the initial basis is from slacks)
the systems of iteration (k + 1), yBk = cB , Bk d = As ,
become yE1 . . . Ek = cB , E1 . . . Ek d = As
solve them as eta systems
this method slows down as k increases
eventually it pays to reduce the size of () by refactoring the basis:
suppose weve just finished iteration
extract the current base B from A, using the basis heading
compute a triangular factorization for B ,
U = Lm Pm . . . L1 P1 B

()

added benefit of refactoring: curtails round-off errors

CSCI 5654

H. Gabow

Fall 2007

#57, p. 1

Case 2. Later Pivots


let Bk be the current basis
let B be the last basis with a triangular factorization
To Solve Bk d = As
using () this system becomes B E+1 . . . Ek d = As
using () this becomes UE+1 . . . Ek d = Lm Pm . . . L1 P1 As
to solve,
1. set a = Lm Pm . . . L1 P1 As
multiply right-to-left, so always work with a column vector
2. solve UE+1 . . . Ek d = a
treating U as a product of etas, U = Um . . . U1
this procedure accesses the eta file
P1 , L1 , . . . , Pm , Lm , Um , . . . U1 , E+1 , . . . , Ek
in forward (left-to-right) order
the pivot adds the next eta matrix Ek+1 (with eta column d) to the end of the file
To Solve yBk = cB
using () this system becomes yB E+1 . . . Ek = cB
to use () write y = zLm Pm . . . L1 P1 , so zUE+1 . . . Ek = cB
to solve,
1. solve zUE+1 . . . Ek = cB (treating U as a product of etas)
2. y = zLm Pm . . . L1 P1
multiply left-to-right
this accesses the eta file in reverse order
so this method has good locality of reference
obviously the early pivots are a special case of this scheme
other implementation issues: in-core vs. out-of-core; pricing strategies; zero tolerances
Efficiency of Revised Simplex
Chvatal estimates optimum refactoring frequency = every 16 iterations
gives (average # arithmetic operations per iteration) = 32m + 10n
versus mn/4 for standard simplex (even assuming sparsity)
i.e., revised simplex is better when (m 40)n 128m, e.g.,
n 2m (expected in practice) & m 104
m 100 is a small LP
todays large LPs have thousands or even millions of variables

CSCI 5654

H. Gabow

Fall 2007

#57, p. 2

Principles in Chv
atals Time Estimate
1. the eta columns of Ei have density between .25 & .5
in the steady state, i.e., after the slacks have left the basis
density is much lower before this
2. solving Bk d = As is twice as fast as yBk = cB
Argument:
cB is usually dense, but not As
we compute the solution di+1 to Ei+1 di+1 = di
di is expected to have the density given in #1
(since it could have been an eta column)
= if Ei+1 has eta column p, dip = 0 with probability .5
but when dip = 0, di+1 = di , i.e., no work done for Ei+1 !
3. in standard simplex, storing the entire dictionary can create core problems
also the sparse data structure for standard simplex is messy (e.g., Knuth, Vol. I)
dictionary must be accessed by row (pivot row) & column (entering column)
Product Form of the Inverse a commonly-used implementation of revised simplex
maintains B1
k = Ek Ek1 . . . E1
where Ei = eta matrix that specifies the ith pivot
Enhanced Triangular Factorization of the Basis (Chvatal, Ch. 24)
achieves even greater sparsity, halving the number of nonzeros

CSCI 5654

H. Gabow

Fall 2007

#57, p. 3

Chvatal, 105 115

Unit 9.D: Implementation

Revised Simplex Summary

Data Structures
given data:
col 1
A
packed form
by column

c
dense form

b
dense form

col 2
col 3
col 4

basis:
xB
basis values

B
basis heading

eta file:
P1

Pm

L1

Lm

U1

Um

Ek

E+1

Lm Pm . . . L1 P1 B = UE+1 . . . Ek
(B is the current basis)
Entering Variable Step
Solve yB = cB :
1. solve zUE+1 . . . Ek = cB

dense copy
of cB

2. y = zLm Pm . . . L1 P1

dense
z

Choose any (nonbasic) s cs > yAs


compute yAs using dense y, packed As
If none exists, stop, B is an optimum basis
Remark. Steps 12 read the eta file backwards, so LP practitioners call them BTRAN (backward
transformation)

CSCI 5654

H. Gabow

Fall 2007

#58, p. 1

Leaving Variable Step


Solve Bd = As :
1. a = Lm Pm . . . L1 P1 As

dense copy
of As

2. solve UE+1 . . . Ek d = a

dense
a

Add packed copy of Ek+1 to eta file


Let t be the largest value xB td 0
dense vectors
If t = , stop, the problem is unbounded
Otherwise choose a (basic) r whose component of xB td is zero
Remark. Steps 12 read the eta file forwards & are called FTRAN (forward transformation)
Pivot Step
In basis heading B replace r by s
xB xB td
In xB , replace entry for r (now 0) by t
Refactoring Step (done every 20 iterations)
Use B to extract B = AB from packed matrix A
Convert B to linked list format
Execute Gaussian elimination on B
for ith pivot, record Pi , Li in new eta file
& record ith row of U in the packed vectors U
At end, add the U vectors to the eta file

CSCI 5654

H. Gabow

Fall 2007

#58, p. 2

Remark.
to achieve a sparser triangular factorization,
we may permute the rows and columns of B to make it almost lower triangular form,
with a few spikes (Chvatal, p.9192)

The spikes can create fill-in.


to adjust for permuting columns, do the same permutation on B & xB
to adjust for permuting rows by P, make P the first matrix of the eta file
Exercise. Verify this works.

CSCI 5654

H. Gabow

Fall 2007

#58, p. 3

Chvatal, Ch. 16

Unit 9.E: Extensions

Solving Systems of Linear Inequalities

we want to solve a system of linear inequalities I,


Pn

j=1 aij xj

bi , i = 1, . . . , m

this problem, LI, is equivalent to LP (Exercise of Handout#18)


Fourier (1827) & later Motzkin (1936) proposed a simple method to solve inequality systems:
elimination & back substitution
usually inefficient, but has some applications
Recursive Algorithm to Find a Solution to I
1. rewrite each inequality involving x1 in the form x1 u or x1 ,
Pn
where each u, is an affine function of x2 , . . . , xn , j=2 cj xj + d
2. form I from I by replacing the inequalities involving x1 by inequalities u
ranges over all lower bounds on x1 , u ranges over all upper bounds on x1
I is a system on x2 , . . . , xn
3. delete any redundant inequalities from I
4. recursively solve I
5. if I is infeasible, so is I
if I is feasible, choose x1 so (the largest ) x1 (the smallest u)

unfortunately Step 3 is hard, & repeated applications of Step 2 can generate huge systems
but heres an example where Fourier-Motzkin works well:
consider the system

xi xj bij

write x1 s inequalities as

i, j = 1, . . . , n, i 6= j

x1 xj + b,

x1 xk + b

eliminating x1 creates inequalities xk xj b


so the system on x2 , . . . , xn has the original form
& eliminating redundancies (simple!) ensures all systems generated have n2 inequalities
thus we solve the given system in time O(n3 )
Remark
the problem of finding shortest paths in a graph has this form
O(n3 ) is the best bound for this problem!

CSCI 5654

H. Gabow

Fall 2007

#59, p. 1

Finite Basis Theorem


says I has essentially a finite # of distinct solutions
first we extend the notion of bfs:
x is a normal bfs for Ax b if for v = b Ax,
 
x
is a normal bfs of Ax + v = b, v 0
v
define a basic feasible direction of Ax b in the same way, i.e., introduce slacks
Example.
x1 1, 2 x2 3

1 0 1 0
introducing slacks v1 , v2 , v3 gives coefficient matrix 0 1 0 1
0
1 0 0

0
0
1

bfss x1 = 1, x2 = 2, v3 = 1 & x1 = 1, x2 = 3, v2 = 1 are corner points


bfd x1 = 1, v1 = 1 (basis v1 , v2 , v3 ) is direction vector for unboundedness
x2

x1

CSCI 5654

H. Gabow

Fall 2007

#59, p. 2

the vector

Pk

i
i=1 ti x

is a

nonnegative combination of xi , i = 1, . . . , k if each ti 0


convex combination of xi , i = 1, . . . , k if each ti 0 & the ti s sum to 1
in our example, any feasible point is
a convex combination of the 2 corners
plus a nonnegative combination of the direction vector for unboundedness
this is true in general:
Finite Basis Theorem. The solutions to Ax b are precisely the vectors that are
convex combinations of vi , i = 1, . . . , M plus nonnegative combinations of wj , j = 1, . . . , N
for some finite sets of vectors vi , wj .
Proof Idea.
the vi are the normal bfs
the wj are the bfds
the argument is based on Farkas Lemma

CSCI 5654

H. Gabow

Fall 2007

#59, p. 3

Decomposition Algorithm (Chvatal, Ch.26)


applicable to structured LPs
start with a general form LP L:
maximize cx subject to Ax = b,

xu

break the equality constraints into 2 sets, A x = b ,

A x = b

apply the Finite Basis Theorem to the system S


A x = b , x u
to get that
any fs to SPhas the form
PM
N
x = i=1 ri vi + j=1 sj wj
PM
for ri , sj nonnegative, i=1 ri = 1, and vi , wj as above
rewrite L using the equation for x to get the master problem
PM M:
maximize cr r + cs s subject to Ar r + As s = b ,
i=1 ri = 1,

for vectors cr , cs & matrices Ar , As derived from c & A respectively

ri , s j 0

since M & N are huge, we dont work with M explicitly instead solve M by column generation:
each Entering Variable Step solves the auxiliary problem A:
maximize c yA subject to A x = b , x u
where y is the vector of dual values for M, with its last component dropped
the solution to A will be either
a normal bfs (i.e., a vi ) which can enter Ms basis, or
a basic feasible direction of an unbounded solution (a wj ) which can enter Ms basis, or
a declaration of optimality
the decomposition algorithm works well when we can choose A , A so
A has few constraints, and either
A can be solved fast, e.g., a network problem, or
A breaks up into smaller independent LPs, so we can solve small auxiliary problems

CSCI 5654

H. Gabow

Fall 2007

#59, p. 4

Chvatal, 149 152

PrimalDual Simplex Correspondence

Unit 9.E: Extensions

consider a standard form LP & its dual:


Primal Problem P

Dual Problem D

maximize z = cx

minimize yb

subject to Ax b
x0

subject to yA c
y0

Theorem 1. The standard dual simplex algorithm for P amounts to


executing the standard simplex algorithm on the dual problem D.
Theorem 2. For a standard form LP P, there is a 1-1 correspondence between
primal dictionaries (for P) & dual dictionaries (for D) such that
(i) B is a primal basis N is a dual basis
(ii) any row in Bs dictionary is the negative of a column in N s dictionary:
primal dictionary
P
xi = bi jN aij xj ,
P
z = z + jN cj xj

iB

dual dictionary
P
yj = cj + iB aij yi ,
P
w = z iB bi yi

jN

Proof of Theorem 1:
show that after each pivot
the 2 simplex algorithms have dictionaries corresponding as in (ii)
argument is straightforward
e.g., dual simplexs minimum ratio test is
minimize cs /ars , ars < 0
standard simplexs minimum ratio test on the corresponding dual dictionary is
minimize cs / ars , ars > 0

Proof of Theorem 2:
index the primal constraints and variables as follows:
C = the set of primal constraints (|C| = m)
D = the set of primal decision variables (i.e., the given variables; |D| = n)
Proof of (i):
primal constraints after introducing slacks:



 xC
=b
I A
xD


define P = I A
xC consists of m slack variables indexed by C
xD consists of n decision variables indexed by D
dual constraints after introducing slacks:
CSCI 5654

H. Gabow

Fall 2007

#60, p. 1



 A
=c
yD
I

yC


A
define Q =
I
yC : m decision variables
yD : n slack variables
in (i), B is a set of m indices of C D, N is the complementary set of n indices
for simplicity let B consist of
the first k indices of D and last m k indices of C
denote intersections by dropping the sign
e.g., BC denotes all indices in both B & C
we write P with its rows and columns labelled by the corresponding indices:
NC BC BD ND

Ik
0
B Y
P=
0 Imk X Z


NC
BC

B is a primal basis the columns of BC & BD are linearly independent


B is nonsingular
we write Q with its rows and columns labelled by the corresponding indices:
BD
ND

B
Y
Z
X
Q=

Ik
0
0
Ink

NC
BC
BD
ND

N is a dual basis the rows of NC & ND are linearly independent


B is nonsingular

CSCI 5654

H. Gabow

Fall 2007

#60, p. 2

Proof of (ii):
the primal dictionary for basis B is
1
xB = P1
B b PB PN xN
1
z = cB P1
B b + (cN cB PB PN )xN

using our expression for P we have





0
B
XB1
1
PB =
, PB =
Imk X
B1

Imk
0

remembering the dual constraints are yQ = c


we derive the dual dictionary for basis N (with nonbasic variables B):
let QN denote the matrix Q keeping only the rows of N , & similarly for QB
1
yN = cQ1
N yB QB QN
1
z = cQ1
N bN + yB (QB QN bN bB )

using our expression for Q we have



 1

B
B
Y
1
, QN =
QN =
0
0 Ink

B1 Y
Ink

1. now we check the terms aij in the 2 dictionaries (defined in (ii)) correspond:
in the primal dictionary
these terms
are P1
B PN


1
XB
Imk Ik Y
which equal
B1
0
0 Z
in the dual dictionary
theseterms are QB Q1
N

X Z B1 B1 Y
which equal
Ik 0
0
Ink
the 2 products are negatives of each other, as desired
2. now we check the objective values are negatives of each other:
the primal objective value is cB P1
B b


bN C
cB = 0mk cBD , b =
bBC 

1
objective value = cBD B
0mk b = cBD B1 bN C
1
the dual objective value is cQ
 N bN :


bN C
c = cBD cN D , bN =
0nk

 1
B bN C
= cBD B1 bN C , negative of primal
objective value = c
0nk

3. similar calculations show


primal cost coefficients are negatives of dual r.h.s. coefficients
dual cost coefficients are negatives of primal r.h.s. coefficients

CSCI 5654

H. Gabow

Fall 2007

#60, p. 3

Schrijver, Ch. 16, 19, 22

Unit 9.F: Networks

Integral Polyhdera

a polyhedron Ax b is rational if every entry in A & b is rational


a rational polyhedron P is integral it is the convex hull of its integral points
every (minimal) face of P has an integral vector
every LP max cx st x P with an optimum point has an integral optimum point
so for integral polyhedra, ILP reduces to LP
Example Application of Integral Polyhedra
a graph is regular if every vertex has the same degree
a matching is perfect if every vertex is on a matched edge
Theorem. Any regular bipartite graph has a perfect matching.
Proof.
take a bipartite graph where every vertex has degree d
let A be the node-arc incidence matrix
consider the polyhedron Ax = 1, x 0 well see in Theorem 2 that its integral
the polyhedron has a feasible point: set xij = 1/d for every edge ij
so theres an integral feasible point, i.e., a perfect matching

Total Unimodularity
this property, of A alone, makes a polyhedron integral
a matrix A is totally unimodular if every square submatrix has determinant 0, 1
Examples of Totally Unimodular Matrices
1. the node-arc incidence matrix of a digraph
Proof Sketch:
let B be a square submatrix of the incidence matrix
induct on the size of B
Case 1: every column of B contains both a +1 & a 1
the rows of B sum to 0, so det(B) = 0
Case 2: some column of B contains only 1 nonzero
expand det(B) by this column and use inductive hypothesis

2. the node-arc incidence matrix of a bipartite graph


similar proof

CSCI 5654

H. Gabow

Fall 2007

#61, p. 1

3. interval matrices 0,1 matrices where each column has all its 1s consecutive
Proof Idea:
proceed as above if row 1 contains 1 positive entry
otherwise, subtract the shorter column from the longer to reduce the total number of 1s
4. an amazing theorem of Seymour characterizes the totally unimodular matrices as being built up
from network matrices and 2 exceptional 5 5 matrices, using 9 types of operations
Theorem 1. A totally unimodular =
for every integral vector b, the polyhedron Ax b is integral.
Theorem 2. Let A be integral. A totally unimodular
for every integral b, the polyhedron Ax b, x 0 is integral
for every integral a, b, c, d, the polyhedron a Ax b, c x d is integral
can also allow components of these vectors to be
Proof Idea (this is the basic idea of total unimodularity):
suppose A is totally unimodular & b is integral
well show the polyhedron Ax b, x 0 is integral
for any basis B, the basic variables have values B1 b
B has determinant 1, by total unimodularity
note that some columns of B can be slacks
so B1 is an integral matrix

Theorem 2 gives the Transhipment Integrality Theorem


(even with lower and upper bounds on flow)
Theorem 3. Let A be integral. A totally unimodular
for every integral b & c where the primal-dual LPs
max cx st Ax b, x 0
min yb st yA c, y 0
both have an optimum, both LPs have an integral optimum point.
Theorem 3 contains many combinatorial facts, e.g.:
let A be the incidence matrix of a bipartite graph
let b, c be vectors of all 1s
K
onig-Egerv
ary Theorem. In a bipartite graph,
the maximum cardinality of a matching equals the minimum cardinality of a vertex cover.

CSCI 5654

H. Gabow

Fall 2007

#61, p. 2

Total Dual Integrality


this property makes a polyhedron integral, but involves A, b, and every c
Illustrative Example: Fulkersons Arborescence Theorem
take a digraph with nonnegative integral edge lengths (e) & a distinguished vertex r
an r-arborescence is a directed spanning tree rooted at vertex r
all edges are directed away from r
an r-cut is a set of vertices not containing r
an r-cut packing is a collection of r-cuts, with repetitions allowed,
such that each edge e enters (e) cuts
its size is the number of sets
Theorem. For any digraph, & r,
the minimum total length of an r-arborescence equals the maximum size of an r-cut packing.
let C be the r-cut-edge incidence matrix
consider the primal-dual pair,
max y1 st yC , y 0
min x st Cx 1, x 0
its not hard to see that if both LPs have integral optima, Fulkersons Theorem holds
its easy to see that C is not totally unimodular
3 edges
ra, rb, rc
with r-cuts {a, b}, {a, c}, {b, c} give this submatrix with determinant 2:
1 1 0
1 0 1
0 1 1
well get Fulkersons Theorem using the TDI property
take a rational polyhedron Ax b
consider the primal-dual pair of LPs,
max cx st Ax b
min yb st yA = c, y 0
Ax b is totally dual integral (TDI) if for every integral c where the dual has an optimum,
the dual has an integral optimum point
Theorem 4. Ax b TDI with b integral = Ax b is integral.
returning to Fulkerson:
it can be proved that the system Cx 1, x 0 is TDI, so it is an integral polyhedron
the definition of TDI shows the dual has an integral optimum
so both LPs have integral optima, i.e., Fulkersons Arborescence Theorem is true

CSCI 5654

H. Gabow

Fall 2007

#61, p. 3

Chvatal, 303 317

Initialization, Cycling, Implementation

Unit 9.F: Networks

Initialization
in general we need a Phase 1 procedure for initialization
since a transshipment problem can be infeasible (e.g., no source-sink path)
the following Phase 1 procedure sometimes even speeds up Phase 2
(by breaking the network into smaller pieces)
a simple solution vector x is where 1 node w transships all goods:
all sources i send their goods to w, along edge iw if i 6= w
all sinks receive all their demand from w, along edge wi if i 6= w
this is feasible (even if w is a source or sink) if all the above edges exist in G
(since satisfying the constraints for all vertices except w implies satisfying ws constraint too)
in general, add every missing edge iw or wi as an
P artificial edge
then run a Phase 1 problem with objective t = {xwi , xiw : wi (iw) artificial}
there are 3 possibilities when Phase 1 halts with optimum objective t :
1. t > 0: the given transshipment problem is infeasible
2. t = 0 & no artificial edge is in the basis: proceed to Phase 2
3. t = 0 & an artificial edge is in the basis
we now show that in Case 3, the given problem decomposes into smaller subproblems
graph terminology:
V denotes the set of all vertices
for S V , edge ij enters S if i
/ S, j S
ij leaves S if i S, j
/S
Lemma 1. Let S be a set of vertices where
(a) no edge of G enters S;
P
(b) the total net demand in S ( iS bi ) equals 0.
Then any feasible solution (of the given transshipment problem)
has xe = 0 for every edge e leaving S.
Remark. (a) + the Lemmas conclusion show we can solve this network
by finding an optimum solution on S and an optimum solution on V S.
Proof.
(a) shows the demand in S must be satisfied by sources in S
(b) shows this exhausts all the supply in S, i.e., no goods can be shipped out

CSCI 5654

H. Gabow

Fall 2007

#62, p. 1

let T be the optimum basis from Phase 1


as we traverse an edge from tail to head, y (from Phase 1) increases by 1
more precisely every edge is oriented like this:
y
real edge

y i +1

artificial edge

yi

basic
nonbasic
Fig.1. y stays the same on an edge of G in T . It doesnt increase
on an edge of G T . Artificial edges increase by 1.

take any artificial edge uv T


let S = {i : yi > yu }
y

yu +1

yu

u
V-S

Fig.1 implies
vS
no edge of G enters S
no edge of G leaving S is in T
thus no goods enter or leave S
this implies the total net demand in S equals 0
now Lemma 1 applies, so the network decomposes into 2 smaller networks
Cycling
the network simplex never cycles, in practice
but Chvatal (p.303) gives a simple example of cycling
its easy to avoid cycling, as follows
suppose we always use the top-down procedure for computing y (fixing yr = 0)
its easy to see a pivot updates y as follows:
Fact. Suppose we pivot ij into the basis. Let cij = cij + yi yj . Let d be the deeper end of ij in
the new basis. Then y changes only on descendants of d. In fact

d = j (d = i) = every descendant w of d has yw increase by cij (cij ).

CSCI 5654

H. Gabow

Fall 2007

#62, p. 2

consider a sequence of consecutive degenerate pivots


each entering edge ij is chosen so cij < 0
assume the deeper
P vertex of ij is always j
the Fact shows nk=1 yk always decreases
P
this implies we cant return to the starting tree T (since T determines nk=1 yk uniquely)
so it suffices to give a rule that keeps every edge e T with xe = 0 directed away from the root
Cunninghams Rule does it, as follows:
suppose 2 or more edge can be chosen to leave the basis
let ij be the entering edge
let a be the nearest common ancestor of i and j in T
traverse the cycle of ij in the direction of ij, starting at a
choose the first edge found that can leave the basis

old 0

leave

old 0
candidate

new 0
candidate

new 0
candidate

old 0
non0

flip

enter

degenerate

(b)

(a)

no old 0

nondegenerate

(c)

Fig.2. Understanding Cunninghams Rule. 0 edge means xe = 0.


(a) Edges between the leaving and entering edges flip orientation in a pivot.
(b) Degenerate pivot: the first old 0 edge following j leaves.
(c) Nondegenerate pivot: the first new 0 edge following a leaves.
Implementing Network Simplex Efficiently (Chvatal, 311317)
tree data structures can be used to speed up the processing of T
using the Fact, y can be updated by visiting only the descendants of d
a preorder list of T is used to find the descendants of d
Fig.2(a) shows we can update the preorder list in a pivot by working only along the path that flips

CSCI 5654

H. Gabow

Fall 2007

#62, p. 3

Chvatal, 320 322

Unit 9.F: Networks

Related Network Problems

Transportation Problem
special case of transshipment:
no transshipment nodes
every edge goes from a source to a sink
the setting for Hitchcocks early version of network simplex
assignment problem is a special case
we can extend the transshipment problem in 2 ways:
Bounded Sources & Sinks
generalize from demands = b to demands , b:
to model such demands add a dummy node d
route excess goods to d and satisfy shortfalls of goods from d:
d

<=b

>=b

d has demand

i bi ,

=b

=b

where the sum is over all real vertices

Bounded Edges
edges usually have a capacity, i.e., capacity uij means xij uij
an edge e with capacity u can be modelled by placing 2 nodes on e:
<=u

(a)

-u

(b)

u-x

(c)

The capacitated edge of (a) is modelled by (b).


(c) shows how x units flow through the edge. We must have x u.
when all edges have capacities (possibly infinite), we have the minimum cost flow problem
Exercise. Sometimes edges have lower bounds, i.e., lower bound ij means xij ij . Show how
a lower bound can be modelled by decreasing the demand at j & increasing it at i.
Chvatal Ch.21 treats these upper-bounded transhipment problems directly, without enlarging
the network.

CSCI 5654

H. Gabow

Fall 2007

#63, p. 1

Max Flow Problem (Chvatal Ch.22)


the given graph has 1 source s, with unbounded supply, & 1 sink t, with unbounded demand
each edge has a capacity
the goal is to ship as many units as possible from s to t
i.e., each edge from s costs 1, all other edges cost 0
Strong Duality has this interpretation:
a cut is a set S of vertices that
P contains s but not t
the capacity of a cut is iS,j / S uij
obviously the value of any flow is at most the capacity of any cut. Strong Duality says
Max-Flow Min-Cut Theorem. The maximum value of a flow equals the minimum capacity of
a cut.
for proof see Chvatal p.371

3
s

1
1

t
5

Flow network with capacities.


The max flow & min cut are both 3.

CSCI 5654

H. Gabow

Fall 2007

#63, p. 2

Unit 9.G: Polynomial LP

Positive Definite Matrices


positive definite matrices behave very much like positive numbers
let A be an n n symmetric matrix

A is positive definite (1) every vector x 6= 0 has xT Ax > 0


(2) every eigenvalue of A is positive
(3) A can be written BT B for some nonsingular matrix B
((3) is like saying every positive number has a nonzero square root)


2 1
Example. A =
1
1

is PD & satisfies (1)(3):

(1) 2x21 2x1 x2 + x22 = x21 + (x1 x2 )2 > 0 for (x1 , x2 ) 6= (0, 0)
 



3 5
2
=
= the eigenvalues are (3 5)/2
(2) A
1 5
1 5



1 1
1 0
(3) A =
0
1 1 1
Proof.
(1)= (2):
suppose Ax = x
then xT Ax = xT x = kxk2 > 0 = > 0
(2)= (3):
A symmetric = it has n orthonormal eigenvectors, say xi , i = 1, . . . , n
form n n matrix Q, with ith column xi
form diagonal matrix with ith diagonal entry i , eigenvalue of xi .
then AQ = Q, A = QQT
since Q1 = QT
since each eigenvalue is positive, we can write = DD
this gives A = QD(QD)T
(3)= (1):
xT Ax = xT BT Bx = (Bx)T Bx = kBxk2 > 0

note the factorization A = QQT can be computed in polynomial time

CSCI 5654

H. Gabow

Fall 2007

#64, p. 1

2 More Properties of Positive Definite Matrices


1. A positive definite = the curve xT Ax = 1 defines an ellipsoid, i.e., an n-dimensional ellipse
Proof.
using A = QQT & substituting y = QT x, the curve is
xT QQT x = yT y
P = 12
the latter equation is
i yi = 1, an ellipsoid since all eigenvalues are positive
since x = Qy, we rotate the ellipsoid, each axis going into an eigenvector of A

2. let f : Rn R, & at some point x,


If f = (f /xi ) vanishes & the Hessian matrix H = ( 2 f /xi xj ) is positive definite
then x is a local minimum
Proof idea.
follows from the Taylor series for f ,
f (x) = f (0) + (f )T x + 21 xT Hx+ (higher order terms)
H positive definite = f (x) > f (0) for small x

A is positive semidefinite every vector x has xT Ax 0


every eigenvalue of A is nonnegative
A can be written BT B for some n n matrix B
In keeping with the above intuition we sometimes write X PSD as X  0

CSCI 5654

H. Gabow

Fall 2007

#64, p. 2

Karloff, Ch.5

Background Facts for Karmarkars Algorithm

Unit 9.G: Polynomial LP

Linear Algebra
the transpose of an m n matrix A is the n m matrix AT where ATij = Aji
(AB)T = BT AT
the L2 -norm kxk of x Rn is its length according to Pythagoras,
a unit vector has length 1

pPn

2
i=1 xi

P
the scalar product of 2 vectors x, y Rn is xT y = ni=1 xi yi
it equals kxkkyk cos(the angle between x & y)
Cauchy-Schwartz inequality: xT y kxkkyk
if y is a unit vector, the scalar product is the length of the projection of x onto y
x & y are orthogonal if their scalar product is 0
2 subspaces are orthogonal if every vector in one is orthogonal to every vector in the other
an m n matrix A has 2 associated subspaces of Rn :
the row space is the subspace spanned by the rows of A
the nullspace is the set of vectors x with Ax = 0
the row space & nullspace are orthogonal (by definition)
in fact theyre orthogonal complements:
any vector x Rn can be written uniquely as r + n
where r (n) is in the row space (nullspace)
and is called the projection of x onto the row space (nullspace)
Lemma 1. If the rows of A are linearly independent, the projection of any vector x onto the row
space of A is AT (AAT )1 Ax.
Proof.
AAT is nonsingular:
AAT y = 0 = kAT yk2 = (AT y)T AT y = yT AAT y = 0 = AT y = 0 = y = 0
the vector of the lemma is in the row space of A
its difference with x is in the null space of A:
A(x AT (AAT )1 Ax) = Ax Ax = 0

an affine space is the set v + V for some linear subspace V , i.e., all vectors v + x, x V
a ball B(v, r) in Rn is the set of all vectors within distance r of v
Lemma 2. Let F be the affine space v + V . The minimum of an arbitrary linear cost function cx
over B(v, r) F is achieved at v ru, where u is a unit vector along the projection of c onto V .
Proof.
take any vector x B(v, r) F
let cP be the projection of c onto V
CSCI 5654

H. Gabow

Fall 2007

#65, p. 1

c(v ru) cx = c((v ru) x) = cP ((v ru) x) (since c cP is orthogonal to V )


to estimate the r.h.s.,
Cauchy-Schwartz shows cP (v x) kcP kkv xk rkcP k
cP (ru) = rkcP k
so the r.h.s. is 0, as desired

Calculus
logarithms:
for all real x > 1, ln (1 + x) x
Lemma 3. Let x Rn be a vector with x > 0 and
Y

n
2
.
< 1. Then
ln
xj

1
j=1

Pn

j=1 xj

= n. Set = k1 xk & assume

Exercise 1. Prove Lemma 3. Start by using the general fact that the geometric mean is at most
the arithmetic mean:
For any n 1 nonnegative numbers xj , j = 1, . . . , n,
Y
X
1/n

n
n
xj
xj /n

j=1

j=1

(This inequality is tight when all xj s are equal. It can be easily derived from Jensens Inequality
below.)
Qn
Upperbound j=1 1/xj using the above relation. Then write y = 1 x & substitute, getting terms
1/(1 yj ). Check that
terms can be expanded into a geometric series. Simplify
Pn |yj | <P1,n so those
2
using the values of j=1 yj , j=1 yj . (Use the latter to estimate all high order terms). At the end
take logs, & simplify using the above inequality for ln (1 + x).
P
Lemma 4. Let H be the hyperplane ni=1 xi = 1. Let be the subset of H where all coordinates
xi are nonnegative. Let g = (1/n, . . . , 1/n).
p
(i) Any point in is at distance at most Rp= (n 1)/n from g.
(ii) Any point of H within distance r = 1/ n(n 1) of g is in .
note that (1, 0, . . . , 0) n and is at distance R from g, since
(1 1/n)2 + (n 1)/n2 = (n 1)n/n2 = (n 1)/n = R2 .
hence (i) shows that the smallest circle circumscribed about n with center g has radius R
note that (1/(n 1), . . . , 1/(n 1), 0) n and is at distance r from g, since
(n 1)(1/(n 1) 1/n)2 + 1/n2 = 1/n2 (n 1) + 1/n2 = 1/n(n 1) = r 2
hence (ii) shows that the largest circle inscribed in n with center g has radius r
Exercise 2. Prove Lemma 4. First observe that the function (x 1/n)2 is concave up. (i) follows
from this fact. (ii) follows similarly the handy principle is known as Jensens Inequality:
If f (x) is concave up,

CSCI 5654

Pn

j=1 f (xj )

H. Gabow

Pn
nf ( j=1 xj /n).
Fall 2007

#65, p. 2

Karloff, Ch.5

Unit 9.G: Polynomial LP

Optimizing Over Round Regions

Karmarkars algorithm advances from a point p to the next point p


by a scheme that looks like this:
Tp

g
spherical
minimization

()
p

s
Tp1

this handout and the next two explain the basic ideas in ()
then we present the algorithm
Optimizing Over Spheres
its easy to optimize a linear function over a spherical feasible region
by method of steepest descent
y

3x+4y=-10

(-6/5,-8/5)

To minimize 3x + 4y over the disc with center (0, 0) and radius 2


start at the center & move 2 units along (3, 4) to (6/5, 8/5)
in general to minimize cx over a ball of radius r
start at the center & move r units along the vector c
this works because cx = 0 is the hyperplane of all vectors x orthogonal to c
so at the point we reach, the hyperplane cx = (constant) is tangent to the ball

CSCI 5654

H. Gabow

Fall 2007

#66, p. 1

Optimizing Over Round Regions


if the feasible region is round like a ball, the above strategy should get us close to a minimum

P
s*

x*

to make this precise suppose were currently at point x in the feasible region P
let S (S ) be balls contained in (containing) P with center x
let the radius of S be times that of S, 1
let x (s , s ) have minimum cost in P (S, S ) respectively
Lemma 1. cs cx (1 1/)(cx cx ).
Proof. s = x + (s x). hence
cx cs = cx + c(s x)

( 1)cx + cx cs
( 1)(cx cx ) (cs cx )
dividing by gives the desired inequality

CSCI 5654

H. Gabow

Fall 2007

#66, p. 2

Lemma 1 generalizes to allow S to be any closed subset of P :


for x a point in S, define S as the scaled up version of S,
S = {x + (y x) : y S}
assuming S P S , the same proof works
if is small we get a big improvement by going from x to s
but is big if were near the boundary of P
wed be in trouble if we were near the boundary but far from the optimum vertex
s*
P
x

S
x*

Moving to s makes little progress.


well keep relatively small by transforming the problem

CSCI 5654

H. Gabow

Fall 2007

#66, p. 3

Karloff, Ch.5

Unit 9.G: Polynomial LP

Simplices

we work with simplices because they have small


let 1 be the vector of n 1s, (1, . . . , 1)
the standard simplex n consists of all x Rn with 1T x = 1, x 0
its center (of gravity) is g = 1/n
x1

x2

x3

Lemma 4 of Handout#65 shows that using center g of n ,


the circumscribed sphere S has radius = n 1 times the radius of the inscribed sphere S
(well actually use a slightly different )

g
2

sin(30 ) =

CSCI 5654

H. Gabow

1
2

= = 2 for 3

Fall 2007

#67, p. 1

Karmarkar Standard Form


we always work with simplices, and Karmarkar Standard Form is defined in terms of them
consider the LP
minimize z = cx
subject to
Ax = 0
1T x = 1
x 0
A is an m n matrix, all vectors are in Rn
further assume the coefficients in A & c are integers
assume that g is feasible and z 0
the problem is to find a point with objective value 0 or show that none exists
also assume that the m + 1 equations are linearly independent
i.e., eliminate redundant rows of A
the exercise of Handout#18 shows that any LP can be converted to this standard form

CSCI 5654

H. Gabow

Fall 2007

#67, p. 2

Karloff, Ch.5

Unit 9.G: Polynomial LP

Projective Transformations

to transform our problem, mapping p to the center g of the simplex n ,


Tp

g
spherical
minimization

()
p

s
Tp1

we use the projective transformation y = Tp (x) where


xj /pj
y j = Pn
k=1 xk /pk
here we assume p > 0, x 0 & x 6= 0 , so Tp is well-defined
Example.
y

2
g

For 2 , p = (1/3, 2/3), Tp maps (x, y) to

(3x, 23 y)
.
3x + 23 y

Since y = 1 x, the point of 2 at x goes to 2

2
.
x+1

Properties of Tp
any vector y = Tp (x) belongs to n
since x 0 implies y 0
and the defining formula shows 1T y = 1
Tp (p) = g since all its coordinates are equal
Tp restricted to n has an inverse since we can recover x:
y j pj
xj = Pn
k=1 yk pk
(the definition of Tp implies xj = yj pj
where the constant is chosen to make 1T x = 1)

CSCI 5654

H. Gabow

Fall 2007

#68, p. 1

Vector notation:
define D to be the n n diagonal matrix with p along the diagonal
i.e., Djj = pj
the above formulas become
x=

Dy
T
1 Dy

and

y=

D1 x
1T D1 x

Exercise. Check Tp is a bijection between n and n , i.e., its onto.


let P be the feasible region of the given LP, i.e., Ax = 0, 1T x = 1, x 0
let P be the set of vectors satisfying ADy = 0, 1T y = 1, y 0
let c = cD
Lemma 1. Tp is a bijection between P and P , with Tp (p) = g.
Furthermore cx = 0 c Tp (x) = 0.
Proof. it remains only to observe that for y = Tp (x),
Ax = 0 ADy = 0, and cx = 0 cDy = 0.

our plan () is to find s as minimizing a linear cost function over a ball inscribed in P
good news: P is well-rounded
bad news: the cost function in the transformed space is nonlinear,
cx = cDy/(1T Dy) = c y/(1T Dy)
solution: exercising some care, we can ignore the denominator
and minimize c y, the pseudocost, in transformed space!
Caution. because of the nonlinearity,
the sequence of points p generated by Karmarkars algorithm need not have cost cp decreasing
a point p may have larger cost than the previous point p

CSCI 5654

H. Gabow

Fall 2007

#68, p. 2

Logarithmic Potential Function


we analyze the decrease in cost cp using a potential function:
for a vector x, let x = x1 x2 . . . xn
the potential at x is

(cx)n
f (x) = ln
x
assume cx > 0 & x > 0, as will be the case in the algorithm, so f (x) is well-defined
Remark. f is sometimes called a logarithmic barrier function it keeps us away from the boundary
since x < 1 in the simplex n , f (x) > ln (cx)n
so pushing f (x) to pushes cx to 0
define a corresponding potential in the transformed space, fp (y) = ln (c y)n /y

Lemma 2. If y = Tp (x) then fp (y) = f (x) + ln (p).


Proof. fp (y) is the natural log of (c y)n /y
the numerator of this fraction is

the denominator is

so the fraction equals

D1 x
cD T 1
1 D x

n

cx
T
1 D1 x

n

ni=1 (xi /pi )


(1T D1 x)n

(cx)n
(cx)n
=
p
ni=1 (xi /pi )
x

taking its natural log gives the lemma

in the scheme () we will choose s so fp (s) fp (g) , for some positive constant
the Lemma shows we get f (p ) f (p)
thus each step of the algorithm decreases f (p) by
and we push the potential to as planned

CSCI 5654

H. Gabow

Fall 2007

#68, p. 3

Karloff, Ch.5

Unit 9.G: Polynomial LP

Karmarkars Algorithm

recall the parameter L = mn + n log n +


#25)

P
{ log |r| : r a nonzero entry in A or c} (see Handout

Karmarkars Algorithm
Initialization
Set p = 1/n. If cp = 0 then return p.
Let > 0 be a constant determined below (Handout#70, Lemma 4). Set N = 2nL/.
Main Loop
Repeat the Advance Step N times (unless it returns).
Then go to the Rounding Step.
Advance Step
Advance from p to the next point p , using an implementation of ().
If cp = 0 then return p .
Set p = p .
Rounding Step
Move from p to a vertex v of no greater cost. (Use the exercise of Handout#23.)
If cv = 0 then return v, else return z > 0.
a valid implementation of () has these properties:
assume z = 0, p P , p > 0 & cp > 0
then p P , p > 0, and either cp = 0 or f (p ) f (p)
Lemma 1. A valid implementation of () ensures the algorithm is correct.
Proof. we can assume the Main Loop repeats N times


Pn
we start at potential value f (g) = ln (c1/n)n /(1/n)n = n ln
i=1 ci nL
the last inequality
 follows since if C is the largest cost coefficient,
Pn
ln
c
i=1 i ln (nC) ln n + ln C L
each repetition decreases the potential by
so the Main Loop ends with a point p of potential nL N nL
thus ln (cp)n < f (p) < nL, cp < eL < 2L
so the Rounding Step finds a vertex of cost < 2L
is a rational number with denominator < 2L (by the exercise of Handout#25)
so > 0 = > 1/2L
thus = 0

CSCI 5654

H. Gabow

Fall 2007

#69, p. 1

Idea for Implementing ()


as in Handout#68, we need to go from g to a point s
where fp (s) fp (g)
since fp (s) is the log of (c s)n /s
we could define s to minimize c s over the inscribed ball S
but to prevent the denominator from decreasing too much we use a slightly smaller ball:
p
S has radius r = 1/ n(n 1) > 1/n (Handout #65, Lemma 4)
minimize over the ball of radius /n, for some value 1
actually Lemma 4 of Handout#70 shows that must be < 1/2
Implementation of ()
Let B be the (m + 1) n matrix

AD
1T

Let c be the pseudocost vector cD


Project c onto the nullspace of B to get cP :
cP = c BT (BBT )1 Bc
If cP = 0 then return z > 0. Otherwise move /n units in the direction cP :
s = g (/n)cP /kcP k
Return to the original space:
p = Ds/(1T Ds)
Exercise. What is right, and what is wrong, with Professor Dulls objection to our implementation:
I doubt this implementation will work. The plan was to minimize over an inscribed ball.
The implementation minimizes over a ball S in transformed space. But in real space its
minimizing over Tp1 (S), which is not a ball.
Remark.
the projection step can be implemented more carefully:
(i) as a rank 1 modification from the previous projection step
this achieves O(n2.5 ) arithmetic operations, rather than O(n3 )
(ii) to take advantage of sparsity of A (& B)

CSCI 5654

H. Gabow

Fall 2007

#69, p. 2

Karloff, Ch.5

Analysis of Karmarkars Algorithm

Unit 9.G: Polynomial LP

we prove the implementation of () is valid in 5 lemmas


Lemma 1. The formula for projecting c onto the nullspace of B is correct.
Proof.
the rows of B are linearly independent
since standard form assumes A and 1T are linearly independent
so the lemma follows from Lemma 1 of Handout#65

Lemma 2. s minimizes the cost function c x over B(g, /n) P .


Proof.
the lemma follows from Lemma 2 of Handout#65
if we show that B(g, /n) P is the intersection of a ball and an affine space
let F be the affine space of points satisfying ADx = 0, 1T x = 1
Claim: B(g, /n) P = B(g, /n) F
comparing the definitions of F & P (Handout#68), it suffices to show
any coordinate of a point in B(g, /n) is nonnegative
this follows since any coordinate is gj /n = 1/n /n 0

Lemma 3. z = 0 = cP 6= 0.
Proof.
suppose cP = 0
then the formula for cP shows c is in the rowspace of B
.. c is orthogonal to every vector in the nullspace of B
take any q P
thus Tp (q) P , and Tp (q) Tp (p) is in the nullspace of B
so c (Tp (q) Tp (p)) = 0
recalling from Handout#68, Lemma 1 how cost c transforms to c ,
cp > 0 = c Tp (p) > 0
thus c Tp (q) > 0, and so cq > 0
equivalently, z > 0

Lemma 4. z = 0 = c s/(c g) < 1 /n.


Proof. we apply Lemma 1 of Handout#66:
the inscribed set S is B(g, /n) F
Lemma 2 above shows we optimize over this set
the circumscribed set S is B(g, R) F
clearly this set contains the feasible region P
CSCI 5654

H. Gabow

Fall 2007

#70, p. 1

S is S scaled up by the factor = R/(/n) < n/


p
since R = (n 1)/n < 1 (Handout #65, Lemma 4)

now assuming z = 0 the lemma gives c s (1 1/)c g (1 /n)c g

Lemma 5. Choosing as an arbitrary real value in (0, 1/2) and = 2 /(1 ) gives a valid
implementation of ().
Proof.
the first 2 requirements for validity are clear:
p P since s P (were using Lemma 4 of Handout#65 & Lemma 1 of Handout#68!)
p > 0 (since < 1, each coordinate sj is positive)
for the 3rd requirement, assume z = 0, cp > 0
we must show f (p ) f (p)
from Handout#68,p.2 this means fp (s) fp (g)
by definition fp (y) = ln ((c y)n /y)
 c s n

ln (ns)
this gives fp (s) fp (g) = ln
cg
the 1st term is :
 c s n
ln
< ln (1 /n)n = n ln (1 /n)
cg

Lemma 4
the 2nd term is 2 /(1 ):
apply the Lemma 3 of Handout#65 to vector x = ns
sP> 0
n
j=1 sj = 1
k1 nsk = nk1/n sk = n/n = < 1

2
conclude ln (ns)
1
combining the 2 estimates, fp (s) fp (g) + 2 /(1 )
the lemma chooses as the negative of the r.h.s., giving the desired inequality
furthermore choosing < 1/2 makes > 0

Theorem. Karmarkars algorithm solves an LP in polynomial time, assuming all arithmetic operations are carried out exactly.
Proof.
the Main Loop repeats O(nL) times
each repetition performs all matrix calculations in O(n3 ) arithmetic operations
including taking a square root to calculate kcP k in ()
so we execute O(n4 L) arithmetic operations
(Vaidya (STOC 90) reduces this to O(n3 L))

it can be proved that maintaining O(L) bits of precision is sufficient


thus completing the proof of a polynomial time bound

CSCI 5654

H. Gabow

Fall 2007

#70, p. 2

Karloff, Ch.5

Unit 9.G: Polynomial LP

Exercises for Karmarkars Algorithm

we solve the exercises of Handout#65 for Karmarkars algorithm


Exercise 1:
Lemma 3. Let x Rn be a vector with x > 0 and
Y

n
2
.
< 1. Then
ln
xj
1
j=1

Pn

j=1 xj

= n. Set = k1 xk & assume

Proof.
sinceQthe geometricPmean is at most the arithmetic mean,
n
n
n
j=1 1/xj [
j=1 (1/xj )/n]
let y = 1 x
Pn
so the r.h.s. becomes [ j=1 (1/(1 yj ))/n]n
we will
P upperbound the sum in this expression, using these properties of yj :
j yj = 0
kyk =
for each j, |yj | < 1 (since |yj | kyk = < 1)
son the sum is
n
X
X
(1/(1 yj )
(1 + yj + yj2 + yj3 + yj4 + . . .)
j=1

j=1

n
n
X
X
(yj2 + yj3 + yj4 + . . .)
(1 + yj ) +
=
j=1

j=1

=n+0+

n
X

yj2 (1 + yj + yj2 + . . .)

j=1

n+
=n+

n
X

j=1
n
X

yj2 (1 + + 2 + . . .)
yj2 /(1 )

j=1

= n + kyk2 /(1 )
= n + 2 /(1 )
we have shown

Qn

j=1

1/xj [1 + 2 /n(1 )]n

taking logs,
Qn
ln j=1 1/xj n ln [1 + 2 /n(1 )] n2 /n(1 ) = 2 /(1 )

CSCI 5654

H. Gabow

Fall 2007

#71, p. 1

Exercise 2:
Pn
Lemma 4. Let H be the hyperplane i=1 xi = 1. Let be the subset of H where all coordinates
xi are nonnegative. Let g = (1/n, . . . , 1/n).
p
(i) Any point in is at distance at most Rp= (n 1)/n from g.
(ii) Any point of H within distance r = 1/ n(n 1) of g is in .
Proof.
(i) take any point in x n
we show its distance from g is R by starting at x,
moving away from g, and eventually reaching a corner point like (1, 0, . . . , 0),
which weve seen is at distance R
wlog let x1 be the maximum coordinate xj
choose any positive xj , j > 1
increase x1 by xj and decrease xj to 0
we stay on H,
this increases the distance from g, since (x 1/n)2 is concave up
repeat this until the corner point (1, 0, . . . , 0) is reached
(ii) it suffices to show any point x H with xn < 0 is at distance > r from g
Pn1
a point of H with xn < 0 has j=1 xj > 1
Pn1
to minimize j=1 (xj 1/n)2 , set all coordinates equal (by Jensen)
so the minimum sum is > (n 1)(1/(n 1) 1/n)2
this implies
the distance to g is
Pn
2
2
2
2
(x
j=1 j 1/n) > (n 1)(1/(n 1) 1/n) + 1/n = r

CSCI 5654

H. Gabow

Fall 2007

#71, p. 2

Unit 9.G: Polynomial LP

Primal-Dual Interior Point Methods

source: Primal-dual Interior-point Methods by S.J. Wright, SIAM, Philadelphia PA, 1997.
the break-through papers:
L.G. Khachiyan, A polynomial algorithm in linear programming, Soviet Math. Dolkady, 1979
N. Karmarkar, A new polynomial-time algorithm for linear programming, Combinatorica, 1984
after Karmarkar other interior-point methods were discovered,
making both theoretic and practical improvements
Primal-dual Interior-point Methods
consider an LP
minimize cx
subject to Ax = b
x0
& its dual,
maximize
yb
subject to yA + s = c
s0
we find optimum primal and dual solutions by solving the system () of Handout#19, p.2:
()

Ax
yA + s
xj sj
x, s

=
=
=

b
c
0
0

j = 1, . . . , n

Approach
apply Newtons method to the 3 equations of ()
modified so that positivity (x, s > 0) always holds
this keeps us in the interior and avoids negative values!
the complementary slackness measure =

Pn

j=1 xj sj

there are 2 approaches for the modification


Potential-reduction methods
each step reduces a logarithmic potential function
the potential function has 2 properties:
(a) it approaches if xj sj 0 for some j but 6 0
(b) it approaches if & only if (x, y, s) approaches an optimum point
a very good potential function: ln
where is a parameter > n

Pn

j=1

ln (xj sj )

note the similarity to Karmarkars potential function!


CSCI 5654

H. Gabow

Fall 2007

#72, p. 1

Path-following methods
the central path of the feasible region is a path (xt , yt , st ) (t is a parameter > 0)
satisfying () with the 3rd constraint replaced by
xj sj = t, j = 1, . . . , n
clearly this implies positivity
as t approaches 0 we approach the desired solution
we can take bigger Newton steps along the central path
predictor-corrector methods alternate between 2 types of steps:
(a) a predictor step: a pure Newton step, reducing
(b) a corrector step: moves back closer to the central path
Mehrotras predictor-corrector algorithm is the basis of most current interior point codes
e.g., CPLEX
Infeasible-interior-point methods can start without a feasible interior point
extensions of these methods solve semidefinite programming (Handouts#41, 44)
& (convex) quadratic programming,
minimize cT x + xT Qx
subject to
Ax = b
x0
where Q is symmetric positive semidefinite (Handouts#42, 43)

CSCI 5654

H. Gabow

Fall 2007

#72, p. 2

MC, 16.1-2,16.5

Linear Complementarity Problems

Unit 9.H: Beyond LP

LCP: we are given an n n matrix A & a length n column vector b


we wish to find length n column vectors x, y satisfying
y Ax = b
yT x = 0
x, y 0
equivalently for each i = 1, . . . , n, discard 1 of yi , xi
then find a nonnegative solution to the reduced linear system
by way of motivation we show LCP generalizes LP.
Proof. consider a primal-dual pair
max cx s.t. Ax b, x 0
min yb s.t. yA c, y 0
introduce primal slacks s and dual slacks t
.. x & y are optimum s, t 0 & ys = tx = 0
we can rewrite the optimality condition as this LCP:


 T  

0
A
y
b

=
AT
0
x
cT
 T
 T
 y
=0
s
t
x
s, t, x, y 0

s
tT

Exercise. Explain why LCP is a special case of QP (Handout#42)


algorithms such as complementary pivot (simplex) algorithm and interior point solve LCP

CSCI 5654

H. Gabow

Fall 2007

#73, p. 1

Application to Game Theory: Bimatrix Games


also called nonzero sum 2-person games
we have two m n payoff matrices A, B
if ROW player chooses i & COLUMN chooses j, ROW loses aij & COLUMN loses bij
ROW plays according to a stochastic column vector x (length m)
COLUMN plays according to a stochastic column vector y (length n)
so ROW has expected loss xT Ay, COLUMN has expected loss xT By
Example. In the game of Chicken, whoever chickens out first loses
neither player

chickens out

2, 2

1, 1

1, 1

0, 0

both players
chicken out

in a bimatrix game x , y form a Nash equilibrium point (recall Handout#22) if


xT Ay xT Ay for all stochastic vectors x
T

By x

By for all stochastic vectors y

(ROW cant improve)


(COLUMN cant improve)

Example. Chicken has 2 pure Nash points: ROW always chickens out & COLUMN never does, and vice
versa Also, both players choose randomly with probability 1/2.
(2(1/2) 1(1/2) = 1/2, 1(1/2) + 0(1/2) = 1/2)
Fact. Any bimatrix game has a (stochastic) Nash point.
Theorem. The Nash equilibria correspond to the solutions of an LCP.
Proof.
1. can assume all entries of A & B are > 0
Proof. for any A , distributivity shows xT (A + A )y = xT Ay + xT A y
any stochastic x, y have xT 1y = 1
for 1 an m n matrix of 1s
so we can increase A by a large multiple of 1,
without changing the Nash equilibria, to make every coefficient positive

CSCI 5654

H. Gabow

Fall 2007

#73, p. 2

2. let x , y be a Nash equilibrium


rewrite the Nash conditions: (well do it for ROWs condition & A; COLUMN & B is symmetric)
write = xT Ay
(a) taking xi = 1 & all other xj = 0 shows each component (Ay )i of Ay must be
P
P
(b) since xT Ay = i xi (Ay )i i xi =
we must have for all i, either xi = 0 or (Ay )i =
its easy to see that conversely, (a) & (b) guarantee the Nash condition for ROW
assumption #1 with x , y stochastic implies > 0
so we can define vectors
x
y
x = T
y = T
,
x By
x Ay
(a) becomes Ay 1 (1 is a column vector of 1s)
letting u be the vector of slacks in this inequality,
(b) becomes uT x = 0
so weve shown x, y, u & v (the slacks for x) satisfy this LCP:
  
 
u
0
A
x

= 1
T
v
B
0
y
 T  
u
x
=0
v
y
x, u, y, v 0
3. conversely let x, y be any solution to the LCP
make them stochastic vectors:
x
y
x = T , y = T
1 x
1 y
these vectors form a Nash equilibrium point, by an argument similar to #2
e.g., we know Ay 1
also xT Ay = 1T x by complementarity uT x = 0
thus xT Ay = 1, Ay (xT Ay)1, Ay (xT Ay )1
the last inequality implies ROWs Nash condition

CSCI 5654

H. Gabow

Fall 2007

#73, p. 3

V 23.2

Quadratic Programming Duality

Unit 9.H: Beyond LP

the dual is defined using the constraints of the KKT conditions:


Primal
minimize 21 xT Qx + cx
subject to Ax b
x0
there are 2 row vectors of dual variables
y, an m-vector of duals corresponding to the m linear constraints (i.e., the LP duals)
z, an n-vector of free variables, corresponding to the objective functions Q
Dual
maximize 12 zQzT + yb
subject to yA + zQ c
y0
Theorem. (Weak Duality) If Q is PSD, any feasible dual (max) solution lower bounds any feasible
primal (min) solution.
Proof.
PSD gives (z + xT )Q(zT + x) 0 , i.e.,
zQzT + 2zQx + xT Qx 0
as in LP Weak Duality we have
yb yAx
& rewriting the PSD inequality gives
21 zQzT zQx + 12 xT Qx
adding these 2 inequalities, the dual objective is on the left
the first 2 terms on the right are upper bounded as in LP Weak Duality:
yAx + zQx cx
giving the primal objective on the right, i.e.,
(primal objective) (dual objective)

Exercise. Prove Strong Duality.

CSCI 5654

H. Gabow

Fall 2007

#74, p. 1

Examples.
for simplicity well use 1-dimensional primals
1. Primal: min x2 /2 s.t. x 1
Q = (1), PD
the optimum solution is x = 1, objective = 1/2
Dual: max z 2 /2 + y s.t. y + z 0, y 0
optimum solution y = 1, z = 1, objective = 1/2 + 1 = 1/2
Proof this dual solution is optimum:
z y 0 = z 2 y 2 , z 2 /2 y 2 /2
.. objective y 2 /2 + y = y(y/2 + 1)
this quadratic achieves its maximum midway between the 2 roots, i.e., y = 1

2. we show Weak Duality fails in an example where Q is not PSD:


use Example 1 except Q = (1)
Primal: min x2 /2 s.t. x 1
the problem is unbounded
Dual: max z 2 /2 + y s.t. y z 0, y 0
taking y = z shows the problem is unbounded
so the primal does not upper bound the dual

CSCI 5654

H. Gabow

Fall 2007

#74, p. 2

V, p.404

Unit 9.H: Beyond LP

Losing PSD

1. If Q is not PSD a point satisfying KKT need not be optimum. In fact every vertex can be a
local min! No good QP algorithms are known for this general case.
2. this isnt surprising: QP is NP-hard
integer QP is undecidable!
we give an example where Q with Q PD has a unique optimum
but flipping Qs sign makes every vertex a local min!
Q PD: P
Pn
n
Q: min j=1 xj (1 + xj ) + j=1 j xj s.t. 0 xj 1, j = 1, . . . , n
assume the s are small, specifically for each j, |j | < 1
going to standard form, A = I, b = (1, . . . , 1)T , Q = 2I
the dual KKT constraint AT y Qx cT is
yj 2xj 1 + j , j = 1, . . . , n
the KKT CS constraints are
xj > 0 = yj 2xj = 1 + j , i.e., 2xj = 1 j yj
yj > 0 = xj = 1
the first CS constraint implies we must have xj = 0 for all j
(since our assumption on j implies 1 j yj < yj 0)
taking x = y = 0 satisfies all KKT conditions, so the origin is the unique optimum
(as expected)
Q ND:
flipping Qs sign is disastrous we get an exponential number of solutions to KKT!
Q: flip the sign of the quadratic term in the objective function, xj (1 + xj ) xj (1 xj )
now Q = 2I but the other matrices & vectors are unchanged
the dual KKT constraint gets a sign flipped,
yj + 2xj 1 + j , j = 1, . . . , n
& the KKT CS constraints change only in that sign:
xj > 0 = yj + 2xj = 1 + j , i.e., 2xj = 1 + j + yj
yj > 0 = xj = 1
the new KKT system has 3n solutions:
there are 3 possibilities for each j,
xj = y j = 0
xj = 1, yj = 1 j
xj = (1 + j )/2, yj = 0
its easy to check all 3 satisfy all KKT conditions
note the 3rd alternative is the only possibility allowed by CS when 0 < xj < 1
the feasible region has 2n vertices, each is a KKT solution,
and it can be shown that each vertex is a local minimum
CSCI 5654

H. Gabow

Fall 2007

#75, p. 1

You might also like