Introduction to Optimisation

Department of Combinatorics and Optimization
University of Waterloo
Contents

1 Introduction
1.1 An example
1.1.1 The formulation
1.1.2 Correctness
1.2 Linear programs
1.2.1 Multiperiod models
1.3 Integer programs
1.3.1 Assignment problem
1.3.2 Knapsack problem
1.4 Optimization problems on graphs
1.4.1 Shortest path problem
1.4.2 Minimum cost perfect matching
1.5 Integer programs, continued
1.5.1 Minimum cost perfect matching
1.5.2 Shortest path problem
1.6 Non linear programs
1.6.1 Pricing a Tech Gadget
1.6.2 Finding a closest point feasible in an LP
1.6.3 Finding a “central” feasible solution of an LP
1.7 Overview of the course
1.8 Further reading and notes
Chapter 1

Introduction

[Figure 1: the process of solving an optimization problem: 1. formulate the problem; 2. run an algorithm to obtain optimal choices for the variables, i.e. a solution.]
1.1 An example
To clarify these ideas let us consider a simple example. Suppose WaterTech manufactures four
products, requiring time on two machines and two types (skilled and unskilled) of labour. The
amount of machine time and labour (in hours) needed to produce one unit of each product and the
sale price (in dollars per unit) of each product are given in the following table:

Product   Machine 1   Machine 2   Skilled Labour   Unskilled Labour   Unit Sale Price
1 11 4 8 7 300
2 7 6 5 8 260
3 6 5 5 7 220
4 5 4 6 4 180
Each month, 700 hours are available on machine 1 and 500 hours on machine 2. Each month,
WaterTech can purchase up to 600 hours of skilled labour at $8 per hour and up to 650 hours of
unskilled labour at $6 per hour. The company wants to determine how much of each product
it should produce each month and how much labour to purchase in order to maximize its profit
(i.e., revenue from sales minus labour costs).
1.1.1 The formulation

Suppose that WaterTech produces xi units of product i for i = 1, 2, 3, 4, and purchases ys hours of skilled labour and yu hours of unskilled labour. The profit (revenue from sales minus labour costs), which WaterTech wishes to maximize, is then

300x1 + 260x2 + 220x3 + 180x4 − 8ys − 6yu.

The number of hours used on machine 1 is 11x1 + 7x2 + 6x3 + 5x4, and this must not exceed the available 700 hours of time on that machine. Thus,

11x1 + 7x2 + 6x3 + 5x4 ≤ 700.    (1.1)

Similarly, for machine 2 we require

4x1 + 6x2 + 5x3 + 4x4 ≤ 500.    (1.2)

Analogously, once we decide on how much of each product should be produced, we know
how much skilled and unskilled labour is needed. Naturally, we need to make sure that enough
hours of each type of labour are purchased. The following two constraints enforce this:

8x1 + 5x2 + 5x3 + 6x4 ≤ ys    (1.3)
7x1 + 8x2 + 7x3 + 4x4 ≤ yu    (1.4)

Finally, we need to add constraints that force each of the variables to take on only non-negative
values, as well as constraints that limit the number of hours of skilled and unskilled labour pur-
chased. Combining the objective function with (1.1)-(1.4) gives the following formulation:

max 300x1 + 260x2 + 220x3 + 180x4 − 8ys − 6yu
subject to
11x1 + 7x2 + 6x3 + 5x4 ≤ 700
4x1 + 6x2 + 5x3 + 4x4 ≤ 500
8x1 + 5x2 + 5x3 + 6x4 ≤ ys                (1.5)
7x1 + 8x2 + 7x3 + 4x4 ≤ yu
ys ≤ 600, yu ≤ 650
x1, x2, x3, x4, ys, yu ≥ 0
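To illustrate how such a formulation can be handed to an off-the-shelf solver, here is a minimal sketch in Python, assuming the SciPy library is available; the variable ordering, the constraint matrices, and the choice of solver are our own illustration, not part of the original text.

```python
# A sketch of solving formulation (1.5) numerically, assuming SciPy is installed.
# Variable order: x1, x2, x3, x4, ys, yu.
from scipy.optimize import linprog

# linprog minimizes, so negate the profit coefficients.
c = [-300, -260, -220, -180, 8, 6]

A_ub = [
    [11, 7, 6, 5, 0, 0],    # machine 1 hours <= 700
    [4, 6, 5, 4, 0, 0],     # machine 2 hours <= 500
    [8, 5, 5, 6, -1, 0],    # skilled labour needed <= ys
    [7, 8, 7, 4, 0, -1],    # unskilled labour needed <= yu
]
b_ub = [700, 500, 0, 0]

# Non-negativity, plus the purchase limits on labour.
bounds = [(0, None)] * 4 + [(0, 600), (0, 650)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x)      # roughly (16.67, 50, 0, 33.33, 583.33, 650)
print(-res.fun)   # roughly 15433.33, matching the solution reported in (1.7)
```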
1.1.2 Correctness
Is the formulation given by (1.5) correct, i.e. does this formulation capture exactly the Wa-
terTech problem? We will argue that it does and outline a procedure to verify whether a given
formulation is correct. Let us introduce a bit of terminology. By the word description of the
optimization problem we mean a description of the optimization problem in plain English.
This is the description that the CEO of WaterTech would give you. By a formulation we mean
the mathematical formulation, as in (1.5). A solution to the formulation is an assignment of
values to each of the variables of the formulation. A solution is feasible if it has the prop-
erty that all the constraints are satisfied. An optimal solution to the formulation is a feasible
solution that maximizes the objective function (or minimizes it if the optimization problem
is a minimization problem). Similarly, we define a solution to the word description of the
optimization problem to be a choice for the unknowns, and a feasible solution to be such a
choice that satisfies all the constraints.
When attempting to write a formulation for an optimization problem you need to make
sure that there is a mapping between feasible solutions of the word description and feasible
solutions of the formulation, and vice versa. For instance, a feasible solution for WaterTech is to
produce 10 units of product 1, 50 units of product 2, 0 units of product 3, and 20 units of product 4, and
to buy 600 hours of both skilled and unskilled labour. This corresponds to the following feasible
solution of the formulation:

x1 = 10, x2 = 50, x3 = 0, x4 = 20, ys = 600, yu = 600.    (1.6)
Conversely, given the feasible solution (1.6) we can construct a feasible solution for the word
description of the WaterTech problem. Note that this works for every feasible solution as,
(1) every constraint in the word description is present in the formulation, and
(2) every constraint in the formulation arises from a constraint in the word description.
When constructing a formulation you need to make sure that both conditions (1) and (2) are
satisfied. For if (1) does not hold, feasible solutions to the formulation may violate constraints
of the word description, and if (2) does not hold, then the formulation is more restrictive
than the word description. A common mistake is to violate (1) by forgetting some constraint
when writing down the formulation. For instance, we may forget to write down the condition
for WaterTech that ys is non-negative. In that case, when solving the formulation using an
algorithm, we may end up with a negative value for ys, i.e. we buy a negative amount of
skilled labour, or equivalently we sell skilled labour, which is something that is not allowed in our
word description. Thus far, we have only discussed feasible solutions. Clearly, we also need
to verify that the objective functions in the word description and in the formulation are the same.
This is clearly the case for the WaterTech formulation, and is usually straightforward to verify.
We used an algorithm to find an optimal solution to (1.5) and obtained
x1 = 16 + 2/3, x2 = 50, x4 = 33 + 1/3, ys = 583 + 1/3, yu = 650,    (1.7)

achieving a total profit of $15,433 + 1/3. Thus the optimal strategy for WaterTech is to manufacture 16 + 2/3 units of product 1, 50 units of product 2, 0 units of product 3, 33 + 1/3 units of product 4, and to buy 583 + 1/3 hours of skilled labour and 650 hours of unskilled labour.
Since constructing the formulation is only our first step (see Figure 1) and we need to use
an algorithm to find an optimal solution to the formulation, we will strive to get, among all
possible formulations, one that is as simple as possible. In the remainder of the chapter we will
introduce three types of formulations: linear programs, integer programs, and non-linear programs.
There are very efficient techniques to solve linear programs, and we will see some of these in Chapters 2 and 7. Integer programs and non-linear programs can be very hard to solve, however.
Thus when possible we will try to formulate our problem as a linear program. Unfortunately,
this is not always possible, and sometimes the only possible formulation is an integer program
or a non-linear program.
(b) f (x) = 2x1 − x3 + x4 − 6, is an affine function, but not a linear function, and
(c) f (x) = 3x1 + x2 − 6x3 x4 , is not an affine function (because of the product x3 x4 ).
A linear constraint is any constraint that, after moving all variables to the left hand side and all constants to the right hand side, is of one of the following forms:

f(x) ≤ β,    f(x) ≥ β,    f(x) = β,

where f(x) is a linear function and β is a real number. A linear program is the problem of
maximizing or minimizing an affine function subject to a finite number of linear constraints.
We will abbreviate the term linear program by LP throughout this book.
Example 2.
(b) The following is not an LP, as x2 + x3 < 3 is a strict inequality and hence not a linear constraint:

max x1
subject to
x2 + x3 < 3
x1 − x4 ≥ 1
(c) The following is not an LP, as the objective function is not affine:

min 1/x1
subject to
x1 ≥ 3
(d) The following is not an LP, as there are an infinite number of constraints,
max x1
subject to
x1 + kx2 ≤ 5 (for all constants k where 2 ≤ k ≤ 3)
Observe also that (1.5) is an example of an LP. In that example, the constraint 8x1 + 5x2 + 5x3 + 6x4 ≤ ys can be rewritten as 8x1 + 5x2 + 5x3 + 6x4 − ys ≤ 0.
1.2.1 Multiperiod models

In the WaterTech example, the decisions to be made amounted to a production plan. Oftentimes, the decision-making process has a temporal component; time
is split into so-called periods, and we have to make certain decisions at the beginning or end of
each of them. Each of these decisions will naturally determine the final outcome at the end of
all periods. In this section, we introduce this area by discussing a sample problem. KWOil is
a local supplier of heating oil. The company has been around for many years, and knows its
home turf. In particular, KWOil has developed a dependable model to forecast future demand
for oil. For each of the following four months, the company expects the following amounts of
demand for heating oil.
Month            1     2     3     4
Demand (litres)  5000  8000  9000  6000
At the beginning of each of the 4 months, KWOil may purchase heating oil from a regional
supplier at the current market rate. The following table shows the projected price per litre at
the beginning of each of these months.
Month            1     2     3     4
Price ($/litre)  0.75  0.72  0.92  0.90
KWOil has a small storage tank on its facility. The tank can hold up to 4000 litres of oil, and
currently (at the beginning of month 1) contains 2000 litres. The company wants to know how
much oil it should purchase at the beginning of each of the 4 months such that it satisfies the
projected customer demand at the minimum possible total cost. Note that oil delivered at
the beginning of a month can go directly to the customers; it does not need to be put into
storage first. Only oil that is left over at the end of the month goes into storage. We
wish to find an LP formulation for this problem. Thus we need to determine the variables, the
objective function, and the constraints.
Variables. KWOil needs to decide how much oil to purchase at the beginning of each of
the four months. We therefore introduce variables pi for i ∈ {1, 2, 3, 4}, denoting the number
of litres of oil purchased at the beginning of month i. We also introduce
variables ti for each i ∈ {1, 2, 3, 4} to denote the number of litres of heating oil in the company’s
tank at the beginning of month i. Thus while every unknown of the word description always
needs to be represented as a variable in the formulation, it is sometimes useful, or necessary,
to introduce additional variables to keep track of various parameters.
Objective function. Given the variables defined above, it is straightforward to write down
the cost incurred by KWOil, which is to be minimized. The objective function of KWOil’s problem is

0.75p1 + 0.72p2 + 0.92p3 + 0.90p4.    (1.8)
Constraints. What constraints does KWOil face, or put differently, when do {pi }i and
{ti }i yield a feasible oil purchasing pattern? Well, in each month i, the company needs to have
enough heating oil available to satisfy customer demand. The amount of available oil at the
beginning of month 1, for example, is comprised of two parts: the p1 litres of oil purchased
in month 1, and the t1 litres contained in the tank. The sum of these two quantities needs to
cover the demand in month 1, and the excess is stored in the local tank. Hence, we obtain the
following constraint:
p1 + t1 = 5000 + t2 . (1.9)
Similarly, for months 2 and 3 we obtain

p2 + t2 = 8000 + t3 ,    (1.10)
p3 + t3 = 9000 + t4 . (1.11)
Finally, in order to satisfy the demand in month 4 we need to satisfy the constraint
p4 + t4 ≥ 6000. (1.12)
Notice that each of the variables ti for i ∈ {2, 3, 4} appears in two of the constraints (1.9)-
(1.12). The constraints are therefore linked by the t-variables. Such linkage is a typical feature
in multiperiod models.
We now obtain the entire LP for the KWOil problem by combining (1.8)-(1.12), and
by adding upper bounds and initialization constraints for the tank contents, as well as non-
negativity constraints:

min 0.75p1 + 0.72p2 + 0.92p3 + 0.90p4
subject to
p1 + t1 = 5000 + t2
p2 + t2 = 8000 + t3
p3 + t3 = 9000 + t4
p4 + t4 ≥ 6000
t1 = 2000
ti ≤ 4000    (i ∈ {1, 2, 3, 4})
pi, ti ≥ 0    (i ∈ {1, 2, 3, 4})

Solving this LP with an algorithm gives the purchasing plan p1 = 3000, p2 = 12000, p3 = 5000, p4 = 6000 (with t2 = 0, t3 = 4000, t4 = 0), corresponding to a total purchasing cost of $20,890. Not surprisingly, this solution suggests
taking advantage of the low oil price in month 2, while no oil is stored during month 3,
when prices are higher.
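As a quick sanity check of the model, the following Python snippet (our own illustration, not part of the original text) plugs the purchasing plan reported above into the constraints described in this section and recomputes its cost.

```python
# Verify the KWOil purchasing plan against the constraints described above.
demand = [5000, 8000, 9000, 6000]
price = [0.75, 0.72, 0.92, 0.90]
p = [3000, 12000, 5000, 6000]   # litres purchased at the start of each month
tank = 2000                     # t1: initial tank contents

for month in range(4):
    available = p[month] + tank
    assert available >= demand[month], f"demand not met in month {month + 1}"
    tank = available - demand[month]          # leftover oil goes into storage
    assert tank <= 4000, f"tank overflows after month {month + 1}"

cost = sum(pi * ci for pi, ci in zip(p, price))
print(cost)   # 20890.0
```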
Exercises
1. Consider the following table indicating the nutritional value of different food types.
Foods            Price ($)    Calories     Fat (g)      Protein (g)  Carbohydrate (g)
                 per serving  per serving  per serving  per serving  per serving
Raw Carrots      0.14         23           0.1          0.6          6
Baked Potatoes   0.12         171          0.2          3.7          30
Wheat Bread      0.2          65           0            2.2          13
Cheddar Cheese   0.75         112          9.3          7            0
Peanut Butter    0.15         188          16           7.7          2
You need to decide how many servings of each food to buy each day so that you min-
imize the total cost of buying your food while satisfying the following daily nutritional
requirements:
Write an LP that will decide how many servings of each of the aforementioned foods we
need to meet all the nutritional requirements, while minimizing the total cost of the food
(you may buy fractional numbers of servings).
2. MUCOW (Milk Undertakings, C and O, Waterloo) owns a herd of Holstein cows and a
herd of Jersey cows. For each herd, the total number of litres produced each day, and
milk-fat content (as a percentage) are as follows
The fat is split off and blended again to create various products. For each product, the
volume, required milk-fat percentage, and profit are as follows. In particular, the milk fat
percentage must exactly equal what is specified.
(a) Formulate as an LP the problem of deciding how many items of each type to produce,
so that the total profit is maximized. Don’t worry about fractional numbers of items.
Write your formulation in matrix notation.
(b) MUCOW is told of a regulation change: ‘skim milk’ can now contain anything up to
0.1% milk-fat, but no more. How does the formulation of the problem change? Note
the resulting formulation should also be an LP.
3. The director of the CO-Tech startup needs to decide what salaries to offer to its employees
for the year 2012. In order to keep the employees satisfied she needs to satisfy the following
constraints:
(a) Write an LP that will determine salaries for the employees of CO-Tech that satisfy each
of these constraints while minimizing the total salary expenses.

(b) Write an LP that will determine salaries for the employees of CO-Tech that satisfy each
of these constraints while minimizing the salary of the highest paid employee.
(c) What is the relation between the solution for (a) and (b)?
4. You wish to build a house and you have divided the process into a number of tasks, namely,
You estimate the following duration for each of the tasks (in weeks):
task B F E P D L
duration 3 2 3 4 1 2
Some of the tasks can only be started when some other tasks are completed. For instance,
we can only build the frame once the foundation has been completed, i.e. F can start only
after B is completed. We summarize all the precedence constraints,
The goal is to schedule the starting time of the tasks such that the entire project is completed
as soon as possible.
As an example, here is a feasible schedule with a completion time of 10 weeks.
tasks B F E P D L
starting time 0 3 6 5 9 6
end time 3 5 9 9 10 8
Formulate this problem as an LP. Explain your formulation. Note that there is no limit on
the number of tasks that can be done in parallel.
HINT: Introduce variables to indicate the times at which the tasks start.
5. The CRUD chemical plant produces, as part of its production process, a noxious compound
called chemical X. Chemical X is highly toxic and needs to be disposed of properly. Fortunately, CRUD is linked by a pipe system to the FRESHAIR recycling plant, which can safely
reprocess chemical X. On any given day the CRUD plant will produce the following amounts
of chemical X (the plant operates between 9AM and 3PM only):

Because of environmental regulations, at no point in time is the CRUD plant allowed to keep
more than 1000 litres of chemical X on site, and no chemical X is allowed to be kept overnight. At the top
of every hour an arbitrary amount of chemical X can be sent to the FRESHAIR recycling
plant. The cost of recycling chemical X is different for every hour,
You need to decide how much chemical to send from the CRUD plant to the FRESHAIR
recycling plant at the top of each hour, so that you minimize the total recycling cost but
meet all the environmental constraints. Formulate this problem as an LP.
6. We are given an m by n matrix A and a vector b in Rm , for which the system Ax = b has no
solution. Here is an example.
2x1 − x2 = −1
x1 + x2 = 1
x1 + 3x2 = 4
−2x1 + 4x2 = 3
We are interested in finding a vector x ∈ Rn that “comes close” to solving the system.
Namely we want to find an x ∈ Rn , whose deviation is minimum, where the deviation of x
is defined to be

∑( |bi − ∑(aij xj : j = 1, . . . , n)| : i = 1, . . . , m ).
(For the example system above, the vector x = (1, 1)T has deviation 2 + 1 + 0 + 1 = 4.)
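The deviation is easy to compute mechanically; the short Python snippet below (an illustration we add, not part of the exercise) reproduces the value 4 for x = (1, 1)T.

```python
# Deviation of a candidate x for the example system above.
A = [[2, -1], [1, 1], [1, 3], [-2, 4]]
b = [-1, 1, 4, 3]
x = [1, 1]

deviation = sum(abs(bi - sum(aij * xj for aij, xj in zip(row, x)))
                for row, bi in zip(A, b))
print(deviation)   # 2 + 1 + 0 + 1 = 4
```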
(b) The problem of part (a) is not an LP. (Why?) Show that it can be formulated as an LP.
(According to this definition, in the example above x = (1, 1)T would have deviation
max(2, 1, 0, 1) = 2.) With this new definition, write the problem of finding a vector of
minimum deviation as an optimization problem, and show that this problem can also
be formulated as an LP.
7. Consider the following set up: we have factories 1 through m and stores 1 through n.
Each factory i produces ui units of a commodity and each store j requires ℓj units of that
commodity. Note, each factory produces the same commodity, and each store requires
the same commodity. The goal is to transfer the commodity from the factories to the store.
All the commodities going from the factories to the store are first sent to one of two central
storage hubs A and B. The cost of transporting one unit of commodity from factory i to hub
A (resp. B) is given by ai (resp. bi ). The cost of transporting one unit of commodity from
hub A (resp. B) to store j is given by a′j (resp. b′j). In the figure we illustrate the case of
3 factories and 4 stores. The problem is to decide how much to send from each factory to
each hub and how much to send from each hub to each store so that: each store receives the
amount of commodity it requires, no factory sends out more commodity than it produces,
and such that the total transportation cost is minimized. Formulate this problem as an LP
(we may assume that the number of units of commodity sent may be fractional).
8. We are given a matrix A ∈ Rm×n and a matrix B ∈ R p×n so that the rows of A denote
observations for healthy human tissue and the rows of B denote observations for unhealthy
human tissue. We would like to find two parallel hyperplanes which separate the rows of A
and the rows of B as much as possible. That is, we want to find a ∈ Rn and α, β ∈ R such that the parallel
hyperplanes {x ∈ Rn : aT x = α} and {x ∈ Rn : aT x = β} separate the rows of A from the rows of
B and the distance between the hyperplanes is maximized. The next figure illustrates the
situation for n = 2; circles correspond to rows of A, and x’s to rows of B.
[Figure: for n = 2, the parallel lines aT x = α and aT x = β separate the circles (rows of A) from the x’s (rows of B) in the (x1, x2)-plane.]
1.3 Integer programs

An integer program is obtained by adding to a linear program the condition that some (or all) of the variables must take integer values. If all of the variables are required to take integer values, the integer program is called a pure integer program; otherwise
it is called a mixed integer program. We will abbreviate the term integer program by IP
throughout this book.
Example 3. The following is a mixed IP, where variables x1 and x3 are required to take integer
values,
max x1 + x2 + 2x4
subject to
x1 + x2 ≤ 1
−x2 − x3 ≥ −1
x1 + x3 = 1
x1 , x2 , x3 ≥ 0 and x1 , x3 integer
In Section 1.1 we introduced the WaterTech production problem. We gave an LP formulation (1.5) and a solution to that formulation in (1.7). This solution told us to manufacture
16 + 2/3 units of product 1. Depending on the nature of product 1, it may not make sense to
produce a fractional number of units of this product. Thus we may want to add the condition
that each of x1, x2, x3, x4 is integer. The resulting program would be an IP. In this example
we could try to ignore the integer condition and round down the solution, hoping to get a
reasonably good approximation to the optimal solution.
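To see what rounding down actually gives here, the following short computation (our own illustration, based on the data in the table of Section 1.1) rounds the production plan of (1.7) down to integers, buys exactly the labour that plan needs, and compares the resulting profit with the LP optimum.

```python
# Round the LP solution (1.7) down to an integer production plan and evaluate it.
x = [16, 50, 0, 33]                              # floor of (16+2/3, 50, 0, 33+1/3)
skilled = 8*x[0] + 5*x[1] + 5*x[2] + 6*x[3]      # 576 hours needed (<= 600 available)
unskilled = 7*x[0] + 8*x[1] + 7*x[2] + 4*x[3]    # 644 hours needed (<= 650 available)

# The machine constraints are still satisfied:
assert 11*x[0] + 7*x[1] + 6*x[2] + 5*x[3] <= 700
assert 4*x[0] + 6*x[1] + 5*x[2] + 4*x[3] <= 500

profit = 300*x[0] + 260*x[1] + 220*x[2] + 180*x[3] - 8*skilled - 6*unskilled
print(profit)   # 15268, compared with roughly 15433.33 for the LP optimum
```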
1.3.1 Assignment problem

WaterTech has a set J of four jobs that need to be completed and a set I of four employees to do them. The entry ci,j of the following table gives the number of hours employee i ∈ I needs to complete job j ∈ J:

                 Jobs
Employees    1    2    3    4
    1        3    5    1    7
    2        8    2    2    4
    3        2    1    6    8
    4        8    3    3    2
For instance, the table says that c3,4 = 8, i.e. employee 3 would take 8 hours to finish job 4.
WaterTech wants to assign jobs to employees with the conditions that (1) every employee is assigned exactly one job, and (2) every job is assigned to exactly one employee.
Naturally, we want to find such an assignment that minimizes the total expected amount of
time needed to process all jobs in J. A feasible solution would be to assign job k to employee k,
for k = 1, 2, 3, 4. The total amount of time required for this assignment is 3 + 2 + 6 + 2 = 13
hours. This is not an optimal solution, however. We wish to find an IP formulation for this
problem. Thus we need to determine the variables, the objective function, and the constraints.
Variables. In this case we need to decide for each employee i ∈ I and each job j ∈ J
whether employee i is assigned job j. We will introduce for every such pair i, j a variable xi, j
that we restrict to take values 0 or 1 where xi, j = 1 represents the case where employee i is
assigned job j, and xi, j = 0 means that employee i is not assigned job j. Thus we have |I||J|
variables. In the example given by the table we end up with 16 variables.
Constraints. We need to encode condition (1) as a mathematical constraint. Let i ∈ I be
an employee, then ∑ j∈J xi, j is the number of jobs employee i is assigned to (we do the sum
over all jobs). We want this quantity to be one, thus the following should hold,
∑(xi,j : j ∈ J) = 1.    (1.13)
In the example given by the table this says that xi,1 + xi,2 + xi,3 + xi,4 = 1 for every employee i ∈
{1, 2, 3, 4}. We need to encode condition (2) as a mathematical constraint. Let j ∈ J be a
job; then ∑i∈I xi,j is the number of employees job j is assigned to (we do the sum over
all employees). We want this quantity to be one, thus the following should hold,
∑(xi,j : i ∈ I) = 1.    (1.14)
Objective function. The objective function should calculate the total amount of time
spent to complete the jobs. For every employee i ∈ I and job j ∈ J, if employee i is assigned
job j, then we should contribute ci, j to the objective function, otherwise we should contribute
0. Thus, we should contribute ci,j xi,j . Therefore, the objective function is given by

∑(ci,j xi,j : i ∈ I, j ∈ J).    (1.15)
Thus the IP formulation is given by objective function (1.15) and constraints (1.13), (1.14), as
well as the condition that each variable xi,j can only take values 0 or 1:

min ∑(ci,j xi,j : i ∈ I, j ∈ J)
subject to
(1.16)
∑ j∈J xi, j = 1 (i ∈ I)
∑i∈I xi, j = 1 ( j ∈ J)
xi, j ∈ {0, 1} (i ∈ I, j ∈ J)
Note, formally speaking (1.16) is not an IP, because of the constraints xi, j ∈ {0, 1}. However,
we can clearly replace these constraints by the constraints xi, j ≥ 0, xi, j ≤ 1 and xi, j integer. If
we do this for all i ∈ I and j ∈ J then the resulting optimization formulation is an IP. Hence,
we will abuse notation slightly and call the formulation (1.16) an IP.
Solving this IP for the special case given in the table yields

x1,3 = x2,2 = x3,1 = x4,4 = 1,

and all other values xi,j = 0. Thus an optimal solution is to assign job 3 to employee 1, job 2
to employee 2, job 1 to employee 3, and job 4 to employee 4, for a total of 1 + 2 + 2 + 2 = 7 hours.
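Since |I| = |J| = 4, the feasible solutions of (1.16) correspond to the 4! = 24 ways of pairing employees with jobs, so this instance can be checked by brute force. The following snippet is our own sketch of such a check; it recovers the assignment of total time 7 described above.

```python
# Brute-force check of the assignment instance: enumerate all bijections I -> J.
from itertools import permutations

c = [[3, 5, 1, 7],   # c[i][j] = hours employee i+1 needs for job j+1
     [8, 2, 2, 4],
     [2, 1, 6, 8],
     [8, 3, 3, 2]]

best = min(permutations(range(4)),
           key=lambda jobs: sum(c[i][jobs[i]] for i in range(4)))
print([j + 1 for j in best])                  # jobs given to employees 1, 2, 3, 4
print(sum(c[i][best[i]] for i in range(4)))   # total time: 7
```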
Note, in this example we represent a binary choice (whether to assign job j to employee i)
by a variable taking values 0 or 1. We call such a variable a decision variable. When using a
decision variable y, we can express y ∈ {0, 1} by 0 ≤ y ≤ 1 and y integer, but the condition that
y is integer cannot easily be omitted, as we may otherwise get any value between 0 and 1 (say
1/2, for instance), and this value gives no information as to what our binary choice should be.
1.3.2 Knapsack problem

Suppose that we need to load a freight container with crates of six possible types. For each type, the following table gives the weight (in kilograms) and the value of a single crate:

Type     1   2   3   4   5   6
weight  30  20  30  90  30  70
value   60  70  40  70  20  90
In addition you have the following constraints,
(1) you cannot send more than 10 crates of the same type in the container,
(2) you can only send crates of type 3 if you send at least one crate of type 4,
(3) you must either (i) send at least 4 crates in total of types 1 and 2, or (ii) send at least 4 crates in total of types 5 and 6.
Finally, the total weight allowed on the freight container is 10,000 kilograms. Your goal is
to decide how many crates of each type to place in the freight container so that the value of
the crates in the container is maximized. We wish to find an IP formulation for this problem.
Thus we need to determine the variables, the objective function, and the constraints.
Variables. We will have variables xi for i = 1, . . . , 6 to indicate how many crates of type
i we place in the container. We will also require an additional decision variable y to handle
condition (3).
Constraints. The total weight (in kilograms) of the crates selected is given by 30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6, and this may not exceed the weight limit, i.e.

30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6 ≤ 10000.    (1.17)

Condition (1) says that for every type i = 1, . . . , 6,

xi ≤ 10    (1.18)

and condition (2) can be expressed as

x3 ≤ 10x4 .    (1.19)
For if no crates of type 4 are sent, then x4 = 0, which implies by (1.19) that x3 = 0 as well, i.e.
no crates of type 3 are sent. On the other hand, if at least one crate of type 4 is sent, then
x4 ≥ 1 and (1.19) says that x3 ≤ 10, which is the maximum number of crates of type 3 we can
send anyway.
It remains to express condition (3). The decision variable y will play the following role:
if y = 1 then we want (i) to be true, and if y = 0 then we want (ii) to be true. This can be
achieved by adding the following two constraints,

x1 + x2 ≥ 4y
(1.20)
x5 + x6 ≥ 4(1 − y).

Let us verify that (1.20) behaves as claimed. If y = 1, then the constraints become x1 + x2 ≥ 4
and x5 + x6 ≥ 0, which implies that (i) holds. If y = 0, then the constraints become x1 + x2 ≥ 0
and x5 + x6 ≥ 4, which implies that (ii) holds.
Objective function. The total value of the crates selected for the container is given by

60x1 + 70x2 + 40x3 + 70x4 + 20x5 + 90x6 .    (1.21)
Thus the IP formulation is given by objective function (1.21), and constraints (1.17), (1.18),
(1.19), (1.20) as well as the condition that each variable xi is integer, and y ∈ {0, 1}. We
obtain,
max 60x1 + 70x2 + 40x3 + 70x4 + 20x5 + 90x6
subject to
30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6 ≤ 10000
xi ≤ 10 (i = 1, . . . , 6)
x3 ≤ 10x4
x1 + x2 ≥ 4y
x5 + x6 ≥ 4(1 − y)
xi ≥ 0 (i = 1, . . . , 6)
xi integer (i = 1, . . . , 6)
y ∈ {0, 1}
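The role of the constraints (1.19) and (1.20) becomes transparent when the whole model is written as a feasibility check. The Python function below is an illustrative sketch we add (not part of the original text); it tests whether a candidate (x, y) satisfies the IP above.

```python
# Feasibility check for the crate-packing IP above.
WEIGHTS = [30, 20, 30, 90, 30, 70]

def is_feasible(x, y):
    """x = (x1, ..., x6) numbers of crates of each type, y = 0 or 1."""
    if y not in (0, 1) or any(xi < 0 or xi != int(xi) for xi in x):
        return False
    if sum(w * xi for w, xi in zip(WEIGHTS, x)) > 10000:   # (1.17): weight limit
        return False
    if any(xi > 10 for xi in x):                           # (1.18): at most 10 per type
        return False
    if x[2] > 10 * x[3]:                                   # (1.19): type 3 needs type 4
        return False
    if x[0] + x[1] < 4 * y:                                # (1.20), first constraint
        return False
    if x[4] + x[5] < 4 * (1 - y):                          # (1.20), second constraint
        return False
    return True

print(is_feasible((5, 0, 3, 1, 0, 0), y=1))   # True: (i) holds since x1 + x2 = 5 >= 4
print(is_feasible((0, 0, 3, 0, 0, 5), y=0))   # False: x3 > 10*x4 violates (1.19)
```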
Exercises
1. You are about to trek across the desert with a vehicle having 3.6 cubic metres (3.6m3 ) of
cargo space for goods. There are various types of items available for putting in this space,
each with a different volume and a different net value for your trip, shown as follows.
item type i 1 2 3 4 5 6 7
volume vi (m3 ) 0.55 0.6 0.7 0.75 0.85 0.9 0.95
net value ni 250 300 500 700 750 900 950
(a) You need to decide which items to take, not exceeding the volume constraint. You can
take at most one of any item. No item can be split into fractions. The total net value
must be maximized. Formulate this problem as an LP or IP. (You may use the notation
vi and ni for volume and net value of item i.)
(b) Your two friends have decided to come as well, each with an identical vehicle. There
are exactly two items of each type. The question is, can you fit all 14 items in the
vehicles without exceeding the volume constraints? (No cutting items into pieces is
permitted! Each item taken must be placed entirely in one of the vehicles.) Formulate
an LP or IP that has a feasible solution if and only if the items can be packed as
desired. Describe how to determine from a feasible solution how to pack the items
into the vehicles. Note that net value is ignored for part (b).
2. Consider a public swimming pool. In the following table we give a list of seven potential
lifeguards. For each lifeguard we have the time he/she wants to start and finish work and
how much he/she wishes to be paid for the work. The problem is to find a selection of
lifeguards so that there is at least one (but possibly more than one) lifeguard present at
each time between 1pm and 9pm. An example of a possible selection would be Joy, Tim,
and Beth. This selection has a total cost of 30 + 21 + 20.
3. You have gone to an exotic destination during the summer vacation and decided to do your
part to stimulate the economy by going on a shopping spree. Unfortunately, the day before
your return you realize that you can only take 20 kilograms of luggage on your flight which
is less than the total weight of the items that you purchased. The next table gives the value
and the weight in kilograms of each item,
A B C D E F
value 60$ 70$ 40$ 70$ 16$ 100$
weight 6 7 4 9 3 8
The problem is to decide which subset of the items to put in your luggage so that you max-
imize the total value of the items selected without exceeding the weight requirement (i.e.
that the total weight of the items selected is no more than 20 kilograms). For instance you
could select items A,C and D for a total value of 170$ and a total weight of 19 kilograms.
Note, for (b) and (c) the resulting formulation should remain an IP.
4. The Waterloo hotel wants to rent rooms 1, 2 and 3 for New Year’s night. Abby is willing
to pay $60 for room 1, $50 for room 2 but is not interested in room 3. Bob is willing to pay
$40 for room 1, $70 for room 2, and $80 for room 3. Clarence is not interested in room 1,
but is willing to pay $55 for room 2 and $75 for room 3. Donald is willing to pay $65 for
room 1, $90 for room 2, but is not interested in room 3. The information is summarized in
the following table.
Room number Abby’s offer Bob’s offer Clarence’s offer Donald’s offer
1 $60 $40 not interested 65$
2 $50 $70 $55 $90
3 not interested $80 $75 not interested
The hotel wants to fill up rooms 1, 2, 3 with some of the potential clients (Abby, Bob, Clarence
and Donald) in a way that maximizes the total revenue. Each room is to be assigned
to exactly one potential client, and each potential client is to be assigned at most one
room. As an example, room 1 could be assigned to Bob, room 2 to Abby, and room 3 to
Clarence (while Donald would not get to stay in the hotel). This would yield a revenue of
$40+$50+$75=$165.
(a) Formulate this problem as an IP. Your solution should be easy to modify if we change
the values in the table.
(b) Abby and Bob have a history of loud and rude behavior when celebrating together. In
the interest of keeping the New Year’s eve party orderly the hotel management decides
that it does not wish to rent rooms to both Abby and Bob. Add a constraint to the IP in
(a) that will enforce this condition (the resulting formulation should still be an IP).
5. You wish to find out how to pack crates on a transport plane in an optimal way. The crates
are of five possible types, namely A, B,C, D, E. For each crate type the next table gives its
weight (in kg), its volume (in cubic meters), and its value (in dollars):
Type A B C D E
Weight 500 1500 2100 600 400
Volume 25 15 13 20 16
Value 50’000 60’000 90’000 40’000 30’000
The transport plane is divided into three segments: Front, Middle, and Back. Each segment
has a limited volume (in cubic meters), and a limit on the weight of the cargo in that
segment (in kg):
Finally, to keep the plane balanced we need to satisfy the following constraints:
Suppose that there are 12 crates of type A, 8 crates of type B, 22 crates of type C, 15
crates of type D and 11 crates of type E that are waiting to be transported. Your goal is to
maximize the total value of the crates on the plane. You need to decide how many crates
of each type go into each segment of the plane. Formulate your problem as an IP.
Suppose that we want to add to the LP the condition that (1.22) is satisfied. Show how
to satisfy this requirement so that the resulting formulation is an LP.
H INT: rewrite (1.22) as a pair of linear inequalities.
(b) Consider the following inequalities,
Suppose that we want to add to an IP the condition that at LEAST ONE of constraints
(1.23) or (1.24) is satisfied. Show how to satisfy this requirement so that the resulting
formulation is an IP.
H INT: add a decision variable indicating whether (1.23) or (1.24) must be satisfied.
Note the left hand side of either (1.23) or (1.24) is always non-negative.
(c) Suppose that for each i = 1, . . . , k we have a vector ai ≥ 0 with 4 entries and a number βi (both
ai and βi are constants). Let r be any number between 1 and k. Consider the following
set of inequalities,
(ai )T x ≥ βi (i = 1, . . . , k) (1.25)
We want to add to an IP the condition that at LEAST r of the constraints are satisfied.
Show how to satisfy this requirement so that the resulting formulation is an IP.
H INT: add a decision variable for each constraint in (1.25).
(d) Consider the following set of values,
Suppose that we want to add to an IP the condition that the variable x takes only one of
the values in S . Show how to satisfy this requirement so that the resulting formulation
is an IP.
HINT: add a decision variable for each number in the set S.
7. The company C & O operates an oil pipeline pumping oil from Alberta to various states in
the Northwestern USA. The figure below shows the direction of flow, four input lines and
the three output lines. Note for instance that State A can only get its oil from either Input
1 or from the Yukon Input line.
Figure 1.1: The structure of the oil pipeline, inputs and outputs.
Each input line has a capacity (barrels/day) and a cost per barrel:
Each state has a daily demand (barrels/day) that must be met exactly:
State A B C
Demand 3500 3000 4000
The input from Yukon is not owned by the company, and activating that line has a fixed
cost of $11,000 per day.
Write an IP that plans the activities of the company C & O for a day (how many barrels of
oil to pump from each input line) by minimizing the total daily cost of the company while
meeting all the demand.
8. A company won a government bid to meet the yearly demands d1 , d2 , . . . , dn in the areas
j ∈ {1, 2, . . . , n}. Now the company has to decide where to build its factories and how
much of each factory’s output will be shipped to which of these n areas.
There are m potential locations for building the factories. If the company decides to build
at location i ∈ {1, 2, . . . , m} then the fixed cost of building the factory (yearly amortized
version) is fi and the yearly capacity of the factory will be si . The cost of transporting one
unit of the product from location i to area j is given as ci j .
Construct an IP whose solution indicates where to build the factories, how many units of
product to ship from each factory to each demand area so that the demand is met and the
total yearly cost of the company is minimized.
9. Your boss asks you to purchase bi units of product i for each i in a set P of products. (These
products are all divisible, that is, they can be obtained in fractional amounts.) Of course,
your boss wants you to spend as little money as possible. You call up all the stores in a set
S of stores, and store j gives you a per-unit price ci j for product i for all i, j.
(a) You decide to just order all bi units of product i from the store that gives the cheapest
per-unit price, for each i. Show that this is optimal.
(b) Actually, there is another constraint. Your boss forgot to tell you that he does not
want you to buy from too many different stores—he wants you to keep the number of
stores from which you buy to at most (integer) k. Formulate this problem as an IP.
(c) It turns out that the stores have special deals. If the total value of your order from
store j is at least t j dollars, it will give you d j cash back. (All stores j offer such a
deal, with perhaps different values of t j and d j .) Formulate this problem as an IP.
10. KW mining has an open-pit mine with 12 blocks of ore as shown in this diagram. The
mineability condition says that no block can be mined without mining all blocks which lie
at least partially above it. For example, block 7 cannot be mined unless blocks 2 and 3 are
mined, and block 12 requires blocks 8 and 9, and hence 3, 4 and 5 too.
(a) Formulate as an integer program the problem of deciding which blocks should be
mined, in order to maximize total value of blocks mined, and satisfy the mineability
condition, if at most 7 blocks can be mined.
(b) What extra constraint would you add if, in addition to all constraints needed in (a), it
is required that the total volume mined must not exceed 10,000 m3 ?
(c) Mines often have a minimum short-term return requirement. For this mine, the board
of the company requires that the total value of blocks mined in the first two years must
total at least 1,000,000. Each block takes 1 year to mine, at most two blocks can be
mined at once, and a block cannot be mined in a given year unless all ones lying at
least partially above it were mined by the year before. Besides this new condition, the
mineability condition still applies, and at most 7 blocks can be mined. Formulate as
an integer program the problem of deciding which blocks should be mined, subject to
these constraints, in order to maximize total value of blocks mined.
11. Suppose that you are given an N × N chessboard. We wish to place N queens on the board
such that no two queens share any row, column, or diagonal. The next figure shows a
solution for N = 8. Formulate this problem as an integer feasibility problem (i.e. an IP
in which we only need to find a feasible solution).
12. Consider the following partially completed Sudoku grid, viewed as a 9 × 9 matrix A that is divided into nine 3 × 3 submatrices A1 , . . . , A9 .

[Figure: a partially filled 9 × 9 Sudoku grid.]

Formulate as an integer feasibility problem the task of filling in the remaining entries of A with digits from 1 to 9 so that each digit appears exactly once in
• each row of A,
• each column of A,
• each 3-submatrix A1 , . . . , A9
13. In Conway’s Game of Life, we are given a chess board of size n × n for some positive
integer n. Each cell (i, j) of this board has up to 8 neighboring cells Ni, j . A cell (i, j) can
be either alive or dead and a configuration of the game consists of a set of cells that are
alive: L = {(i1 , j1 ), . . . , (ik , jk )}. We use a set of simple rules to compute the successor
configuration succ(L ) to L :
(a) if there is at most one living cell in the neighborhood of (i, j) then (i, j) will be dead
in the next iteration
(b) if there are exactly two living cells in Ni, j then we do not change the status of (i, j)
(c) if (i, j) has exactly three living neighbors in Ni,j then its status in the next configuration
will be alive,
(d) if there are at least four living cells in the neighborhood of (i, j) then (i, j) will be
dead in the next iteration.
1.4 Optimization problems on graphs

A graph G = (V, E) consists of a finite set of vertices V and a set of edges E, where each edge is an unordered pair of distinct vertices.

[Figure: an example of a graph on five vertices.]
Let G = (V, E) be a graph. Suppose uv ∈ E. Then u and v are adjacent vertices; u, v are
the endpoints of edge uv; and edge uv is incident to vertices u and v. In this section we present
several optimization problems that can be expressed as optimization problems in graphs.
1.4.1 Shortest path problem

[Figure 1.2: a street map of WaterTown; the intersections s, a, b, c, d, f, g, t lie on Columbia Ave, Albert St, Fisher Rd, King St, Westmount Rd, Erb St and Victoria St.]
Suppose that you are visiting WaterTown, whose street map is shown in Figure 1.2. You are staying with a friend who lives at the intersection of Fisher Road and
Columbia Avenue (location s), and you wish to visit the Tannery District which is located
at the intersection of King and Victoria Streets (location t). A shortest route from s to t is
indicated by the thick lines in Figure 1.2. The problem of finding such a shortest route is
known as the shortest path problem.
Let us rephrase the problem in the language of graph theory. We represent in Figure 1.3 the
street network by a graph, where vertices correspond to intersections and edges to roads. For
instance, the graph G = (V, E) corresponding to the WaterTown map is given by the following
figure, where the number ce ≥ 0 next to each edge e corresponds to the length (in metres) of the corresponding street segment.

[Figure 1.3: the graph G = (V, E) of the WaterTown street network, with the length ce shown next to each edge e.]

An st-path P is a sequence of edges

v1 v2 , v2 v3 , . . . , vk−1 vk
such that v1 = s, vk = t, and vi ≠ vj for all i ≠ j. The length c(P) of an st-path is defined as
the sum of the lengths of the edges of P, i.e. as ∑(ce : e ∈ P).
Example 5. In Figure 1.3 the thick black edges in the graph form an st-path
of total length,
We can now define formally the shortest path problem. We are given a graph G = (V, E)
with non-negative weights ce for every edge e ∈ E, and distinct vertices s,t. We wish to find
among all possible st-paths, one that is of minimum length. Thus the optimization problem
we wish to solve is,
min c(P)
subject to (1.26)
P is an st-path.
In the following section we will see how to formulate this optimization problem as an IP.
1.4.2 Minimum cost perfect matching

Recall the assignment problem of Section 1.3.1, and suppose now that not every employee is qualified for every job. A dash in the following table indicates that the corresponding employee cannot do the corresponding job:

             Jobs
Employees   1′   2′   3′   4′
    1        -    5    -    7
    2        8    -    2    4
    3        -    1    -    8
    4        8    3    3    -
For instance employee 3 is not qualified to do job 1. We can represent this data by a graph
G = (V, E) where V = I ∪ J, and i j ∈ E where i ∈ I, j ∈ J if employee i is qualified to do job
j. Moreover, edge i j is assigned weight ci, j . We represent the weighted graph corresponding
to the table in Figure 1.4.
[Figure 1.4: the bipartite graph corresponding to the table, with employee vertices 1, 2, 3, 4, job vertices 1′, 2′, 3′, 4′, and the weight ci,j on each edge.]
A graph G = (V, E) is bipartite if we can partition the vertices into two sets, say I and J,
such that every edge has one endpoint in I and one endpoint in J. Given a graph, a subset
of edges M is called a matching if no vertex is incident to more than one edge of M. A
matching is called perfect if every vertex is incident to exactly one edge in the matching. The
weight c(M) of a matching M is defined as the sum of the weights of the edges of M, i.e. as
∑(ce : e ∈ M).
Example 6. The graph in Figure 1.4 is bipartite, as every edge has one endpoint in {1, 2, 3, 4}
and one endpoint in {1′, 2′, 3′, 4′}. The set of edges {12′, 23′, 34′} is a matching. However,
it is not a perfect matching, as no edge of it is incident to 1′ for instance. The set of edges
M = {14′, 21′, 32′, 43′} is a perfect matching. The matching M has weight

c(M) = c1,4 + c2,1 + c3,2 + c4,3 = 7 + 8 + 1 + 3 = 19.
Going back to our optimization problem, since we wish to assign every employee to exactly one job, and every job to exactly one employee, we are looking for a perfect matching
in the graph given in Figure 1.4. As we are trying to minimize the total completion time of all the
jobs, we wish to find, among all possible perfect matchings, one that is of minimum weight.
Thus, given a bipartite graph G = (V, E) with non-negative edge weights ce for all e ∈ E, the
optimization problem we wish to solve is,
min c(M)
subject to (1.27)
M is a perfect matching.
In the following section we will see how to formulate this optimization problem as an IP.
Exercises
1. Let G = (V, E) be a graph with distinct vertices s, t. An even st-path is an st-path that has
an even number of edges. Show that we can formulate the problem of finding an even
st-path with as few edges as possible as a minimum cost perfect matching problem.
HINT: Make a copy of the graph G and remove vertices s and t. Call the resulting graph
G′. Construct a new graph H starting with the union of the graphs G and G′ and by joining
every vertex v ≠ s, t in G with its copy v′ in G′.
2. This exercise concerns an optimal storage of books (for a library) by their size to mini-
mize the storage cost for a given collection of books. Suppose that we know the heights
and thicknesses of all the books in a collection (assuming that all widths fit on the same
shelving, we consider only a two-dimensional problem and ignore book widths). Sup-
pose that we have arranged the book heights in ascending order of their n known heights
H1 , H2 , . . . , Hn ; that is, H1 < H2 < . . . < Hn . Since we know the thicknesses of the books,
we can compute the required length of shelving for each height class. Let Li denote the
length of shelving for books of height Hi . If we order shelves of height Hi for length L,
we incur cost equal to F +Ci L; F is a fixed ordering cost (and is independent of the length
or the height of shelf ordered) and Ci is the cost of the shelf per unit length of height Hi ,
C1 ≤ C2 ≤ . . . ≤ Cn .
Notice that in order to save the fixed cost of ordering, we might not order shelves of every
possible height, because we can use a shelf of height Hi to store books of smaller heights.
For an example, if we have 5 books of height H1 , . . . , H5 and length L1 , . . . , L5 respectively,
one solution is to order a shelf of height H3 , length (L1 + L2 + L3 ) to hold books 1,2 and
3, plus a shelf of height H5 , length (L4 + L5 ) to hold books 4 and 5. Such a strategy would
incur a cost of
[F +C3 (L1 + L2 + L3 )] + [F +C5 (L4 + L5 )].
We want to determine the length of shelving for each height class that would minimize the
total cost of the shelving.
Formulate this problem as a shortest paths problem. Give a clear description of the graph,
and the cost of each edge; also indicate the start vertex (s) and the end vertex (t). Draw the
graph for n = 4.
1.5 Integer programs, continued

1.5.1 Minimum cost perfect matching

Given a vertex v of a graph G = (V, E), we denote by δ(v) the set of edges of G that are incident to v. Consider, for example, the graph in Figure 1.5.

[Figure 1.5: a graph on the vertices a, b, d, g in which every pair of vertices is joined by an edge, with a weight on each edge.]
Then, δ (a) = {ad, ag, ab} and δ (b) = {db, ab, gb}.
We are now ready to find an IP formulation for the minimum cost perfect matching prob-
lem. We are given a graph G = (V, E), distinct vertices s and t and edge weights ce for every
edge e. We need to determine the variables, the objective function, and the constraints.
Variables. We introduce a decision variable xe for every edge e ∈ E, where xe = 1 if edge e is selected to be in the matching M and xe = 0 otherwise.

Constraints. For every vertex v, the number of selected edges that are incident to v is given by ∑(xe : e ∈ δ(v)). Since in a perfect matching every vertex is incident to exactly one edge of the matching, we require that for every vertex v ∈ V,

∑(xe : e ∈ δ(v)) = 1.    (1.28)
Objective function. We wish to minimize the total weight of the edges in the matching
M we selected. Thus the objective function should return the weight of M. If e is an edge of
M then we will have xe = 1 and we should contribute ce to the objective function; otherwise
xe = 0 and we should contribute 0 to the objective function. This can be modelled by the term
ce xe . Thus the objective function should be,

∑(ce xe : e ∈ E).    (1.29)
The formulation for the minimum weight perfect matching problem is obtained by combining (1.28)
and (1.29), together with the condition that every xe be a decision variable. Thus we obtain,

min ∑(ce xe : e ∈ E)
subject to
(1.30)
∑(xe : e ∈ δ(v)) = 1    (v ∈ V)
xe ≥ 0    (e ∈ E)
xe integer    (e ∈ E)
Note that we did not need to add the constraints xe ≤ 1 explicitly. Consider an arbitrary edge f and let v be one of its endpoints. Then f ∈ δ(v), and thus
xf ≤ ∑(xe : e ∈ δ(v)) = 1. Thus 0 ≤ xf ≤ 1. As xf is integer, xf ∈ {0, 1}, as required.
As an example let us write the formulation (1.30) for the graph in Figure 1.5. Let x denote
the vector (xab , xbg , xdg , xad , xag , xbd )T and let 1 denote the vector of all ones (of appropriate
dimension). Each of the first 4 constraints corresponds to one of the 4 vertices of G. For instance the second
constraint, corresponding to vertex b, states that xab + xbg + xbd = 1. It says that we should
select exactly one of the edges incident to b.
Let us verify the correctness of the formulation (1.30). We have a graph G = (V, E) and
edge weights ce for every edge e. Let M be a perfect matching. We need to verify (see
Section 1.1.2) that M corresponds to a feasible solution x̄ of (1.30). To do this we set x̄e = 1
if e is an edge of M, and x̄e = 0 otherwise. Conversely, suppose that x̄ is a feasible solution
of (1.30); then M := {e ∈ E : x̄e = 1} is a perfect matching. Finally, observe that the objective
function of the formulation computes the weight of the corresponding perfect matching.
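For the small graph of Figure 1.5 one can check this correspondence directly: the 0-1 vectors satisfying the constraints of (1.30) are exactly the perfect matchings. The sketch below is our own illustration, with the six edges of the figure hard-coded.

```python
# Enumerate the 0-1 solutions of the constraints of (1.30) for the graph of Figure 1.5.
from itertools import product

vertices = ["a", "b", "d", "g"]
edges = ["ab", "ad", "ag", "bd", "bg", "dg"]   # every pair of the four vertices

def delta(v):
    """Edges incident to vertex v."""
    return [e for e in edges if v in e]

for x in product([0, 1], repeat=len(edges)):
    value = dict(zip(edges, x))
    if all(sum(value[e] for e in delta(v)) == 1 for v in vertices):
        print(sorted(e for e in edges if value[e] == 1))
# Exactly three sets are printed: the perfect matchings {ab, dg}, {ad, bg}, {ag, bd}.
```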
1.5.2 Shortest path problem

Let G = (V, E) be a graph and let U ⊆ V be a subset of its vertices. We define

δ(U) = {uv ∈ E : u ∈ U, v ∉ U}.

Consider, for example, the graph in Figure 1.6.
[Figure 1.6: a graph with vertices s, a, b, t and edges sa, sb, ab, at, bt, with a length on each edge.]
Then δ({s, a}) is the set of all edges with exactly one endpoint in {s, a}, i.e. it is
the set {sb, ab, at}. Suppose a graph G has distinct vertices s and t; then an st-cut is
a set of edges of the form δ(U) where s ∈ U and t ∉ U. The set of all st-cuts in the graph in
Figure 1.6 is given by,
δ ({s}) = {sa, sb}
δ ({s, a}) = {at, ab, sb}
(1.31)
δ ({s, b}) = {sa, ab, bt}
δ ({s, a, b}) = {at, bt}.
These st-cuts are obtained by considering all possible sets of vertices U where s ∈ U and
t ∉ U. Note, the term st-cut arises from the fact that if we remove all the edges in an st-cut,
the resulting graph breaks into (at least) two parts that are not joined by any edge, where
one part contains s and the other contains t.
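The list (1.31) can be reproduced mechanically by looping over all vertex sets U with s ∈ U and t ∉ U. The following sketch is our own illustration for the graph of Figure 1.6.

```python
# Enumerate all st-cuts of the graph in Figure 1.6, reproducing (1.31).
from itertools import combinations

edges = ["sa", "sb", "ab", "at", "bt"]
others = ["a", "b"]   # vertices other than s and t

def delta(U):
    """Edges with exactly one endpoint in U."""
    return [e for e in edges if (e[0] in U) != (e[1] in U)]

for k in range(len(others) + 1):
    for extra in combinations(others, k):
        U = {"s", *extra}
        print(sorted(U), delta(U))
# The four edge sets printed are exactly the st-cuts listed in (1.31).
```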
Consider a graph G = (V, E) with distinct vertices s and t, and let P be an st-path of G. Let
δ(U) be an arbitrary st-cut of G = (V, E). Follow the path P starting from s, denote by
u the last vertex of P in U, and denote by u′ the vertex that follows u in P. Note that u exists
since s ∈ U and t ∉ U. Then by definition uu′ is an edge that is in δ(U). Since δ(U) was an
arbitrary st-cut, we have shown the following remark.

Remark 1.1. Let P be an st-path and let δ(U) be an st-cut. Then δ(U) contains at least one edge of P.
Remark 1.2. Let S be a set of edges that contains at least one edge from every st-cut. Then
there exists an st-path P that is contained in the edges of S.
Proof. Let U denote the set of vertices that contains s as well as all vertices u for which there
exists a path from s to u using only edges of S. If t ∈ U, then S contains the edges of an st-path and we are done. Otherwise, δ(U) is an st-cut, and by the choice of U no edge of S has exactly one endpoint in U, i.e. δ(U) contains no edge of S, contradicting the hypothesis.
We are now ready to find an IP formulation for the shortest path problem. We are given a
graph G = (V, E), distinct vertices s and t and edge lengths ce ≥ 0 for every edge e. We need
to determine the variables, the objective function, and the constraints.
Variables. We will introduce a decision variable xe for every edge e, where xe = 1 rep-
resents the case where edge e is selected to be part of our st-path, and xe = 0, means that the
edge e is not selected.
Constraints. Remark 1.2 says that in order to construct an st-path, it suffices to select one
edge from every st-cut. We will use this idea for our formulation. Let δ(U) be an arbitrary
st-cut of G. The number of edges we selected from the st-cut δ(U) is given by ∑(xe : e ∈ δ(U)),
and since we wish to select at least one edge from δ(U), the constraint should be,

∑(xe : e ∈ δ(U)) ≥ 1.    (1.32)
Objective function. We wish to minimize the total length of the edges in the st-path P we
selected. Thus the objective function should return the length of P. If e is an edge of P then
we will have xe = 1 and we should contribute ce to the objective function; otherwise xe = 0
and we should contribute 0 to the objective function. This can be modelled by the term ce xe .
Thus the objective function should be,

∑(ce xe : e ∈ E).    (1.33)
The formulation for the shortest path problem is obtained by combining (1.32) and (1.33):

min ∑(ce xe : e ∈ E)
subject to
(1.34)
∑(xe : e ∈ δ(U)) ≥ 1    (U ⊆ V, s ∈ U, t ∉ U)
xe ≥ 0 (e ∈ E)
xe integer (e ∈ E)
Observe in this formulation that the condition that xe ∈ {0, 1} has been replaced by the condition that xe ≥ 0 and xe is integer, i.e. we removed the condition that xe ≤ 1. This is correct,
for if x̄ is a feasible solution to (1.34) where x̄e ≥ 2, then the vector x′ obtained from x̄ by setting x′e = 1
is also a feasible solution to (1.34); moreover, the objective value for x′ is smaller than or equal to
the objective value for x̄ (as ce ≥ 0).
As an example let us write the formulation (1.34) for the graph in Figure 1.6. Let x
denote the vector (xsa , xsb , xab , xat , xbt )T and let 1 denote the vector of all ones (of appropriate
dimension). The IP in that case is,
Each of the first 4 constraints corresponds to one of the 4 distinct st-cuts δ(U) given by (1.31).
For instance the second constraint, corresponding to δ ({s, a}) = {sb, ab, at}, states that xsb +
xab + xat ≥ 1. It says that we should select at least one edge from the st-cut δ ({s, a}).
Let us verify the correctness of the formulation (1.34). We have a graph G = (V, E) with
distinct vertices s and t, and we will assume that ce > 0 for every e ∈ E, i.e. every edge has positive
length (the formulation does not work if ce < 0 for some edge e, and there are technical difficulties when ce = 0). Let P be an st-path. We need to verify (see Section 1.1.2) that P corresponds to a
feasible solution x̄ of (1.34). To do this we set x̄e = 1 if e is an edge of P, and x̄e = 0 otherwise.
By Remark 1.1 every st-cut δ (U) contains an edge of P, hence, ∑(x̄e : e ∈ δ (U)) ≥ 1. It
follows that x̄ is feasible for the formulation. It is not true however, that every feasible solution
to the formulation corresponds to an st-path. This is seemingly in contradiction with the
strategy outlined in Section 1.1.2. However, it suffices to show that every optimal solution
x̄ to the formulation corresponds to an st-path. Let S = {e ∈ E : x̄e = 1}. We know from
Remark 1.2 that the edges of S contain an st-path P. If S itself is the st-path P then we are done.
Otherwise, define x′ by setting x′e = 1 if e is an edge of P, and x′e = 0 otherwise. Then x′ is
also a feasible solution to the formulation, but it has an objective value that is strictly smaller
than that of x̄, a contradiction. It follows that S is an st-path, as required.
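For the graph of Figure 1.6, the discussion above can be verified by brute force: among all 0-1 vectors satisfying the cut constraints, the inclusion-minimal ones are exactly the st-paths, and with positive edge lengths only those can be optimal. The sketch below is ours; it reuses the cuts listed in (1.31).

```python
# Brute-force check of the cut constraints of (1.34) on the graph of Figure 1.6.
from itertools import product

edges = ["sa", "sb", "ab", "at", "bt"]
cuts = [{"sa", "sb"}, {"at", "ab", "sb"}, {"sa", "ab", "bt"}, {"at", "bt"}]  # (1.31)

feasible = []
for x in product([0, 1], repeat=len(edges)):
    chosen = {e for e, xe in zip(edges, x) if xe == 1}
    if all(chosen & cut for cut in cuts):       # at least one edge from every st-cut
        feasible.append(chosen)

# Keep only the inclusion-minimal feasible sets; with positive edge lengths these
# are the only candidates for an optimal solution.
minimal = [S for S in feasible if not any(T < S for T in feasible)]
for S in minimal:
    print(sorted(S))
# The four minimal sets are exactly the st-paths of the graph:
# {sa, at}, {sb, bt}, {sa, ab, bt} and {sb, ab, at}.
```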
Exercises
1. Let G = (V, E) be a graph with non-negative edge weights ce for every edge e. The maximum weight matching problem is the problem of finding a matching of maximum weight.
An edge cover is a set of edges S with the property that every vertex is the endpoint
of some edge of S. The minimum weight edge-cover problem is the problem of finding an
edge cover of minimum weight.
2. A vertex cover of a graph G = (V, E) is a set of vertices W ⊆ V such that every edge of G has at least one endpoint in W. Consider the following graph G.

[Figure: a graph G on several vertices.]
(a) Formulate the following problem as an IP: find a vertex cover in the graph G above of
smallest possible size.
(b) Now do the same for an arbitrary graph G. That is, you are given a graph G = (V, E)
formulate the problem of finding a minimum size vertex cover, as an IP.
3. Consider a graph G = (V, E). A triangle T is a set of three distinct vertices i, j, k where
i j, jk and ik are all edges. A packing of triangles is a set S of triangles with the property
that no two triangles in S have a common vertex. The Maximum Triangle Packing Prob-
lem (MTP) is the problem of finding a packing of triangles S with as many triangles as
possible. As an example consider the following graph,
[Figure: a graph on the vertices 1 through 12.]
Then triangles T1 = {2, 3, 7}, T2 = {4, 8, 11}, form a packing of triangles. Formulate the
(MTP) problem for this graph G as an IP.
H INT: Define one variable for every triangle and write one constraint for every vertex.
4. Let G = (V, E) be a graph, let s,t be vertices of G, and suppose each edge e has a non-
negative length ce . Remark 1.1 states that an st-cut δ (U) intersects every st-path.
(a) Suppose S is a set of edges that contains at least one edge from every st-path. Show
that there exists an st-cut δ (U) that is contained in the edges of S.
(b) The weight of an st-cut δ(U) is defined as c(δ(U)) := ∑(ce : e ∈ δ(U)). Find an IP
formulation for the problem of finding an st-cut of minimum weight, where we have a
decision variable for every edge and a constraint for every st-path.
H INT: This is analogous to the formulation given in (1.34).
1.6 Non linear programs

A non-linear program is an optimization problem of the form

min z = f(x)
s.t.
g1(x) ≤ 0,
g2(x) ≤ 0,    (1.35)
. . .
gm(x) ≤ 0,

where f, g1, . . . , gm are (possibly non-linear) functions of the variables x ∈ Rn.
We will abbreviate the term non linear program by NLP throughout this book. The reader
might be troubled by the fact that NLPs are always minimization problems. What if we wish to
maximize the objective function f(x)? Then we could replace z = f(x) in (1.35) by z = −f(x),
as making the function −f(x) as small as possible is equivalent to making f(x) as large as
possible. Note that when each of the functions f(x) and g1(x), . . . , gm(x) in (1.35) is an affine function, we obtain an LP.
1.6.1 Pricing a Tech Gadget

Dibson sells one of its tech gadgets in three regions and must choose a single unit price for all of them. The following table gives the demand the company observed in each region at three different unit prices:

Price ($)   Region 1   Region 2   Region 3
50          130        90         210
60          125        80         190
70          80         20         140
The company wants to model the demand in region i as a function di (p) of the unit’s price.
Dibson believes that the demand within a region can be modelled by a quadratic function, and
it uses the data from the above table to obtain the following quadratic demand functions:
d1(p) = −0.2p² + 21.5p − 445,  d2(p) = −0.25p² + 26.5p − 610,  d3(p) = −0.15p² + 14.5p − 140.    (1.36)

[Figure: plots of the three demand functions d1(p), d2(p), d3(p).]

The previous figure shows the demand functions for the three regions. Dibson wants to
restrict the price to be between $50 and $70. What price should Dibson choose in order to
maximize its revenue? We will formulate this problem as an NLP. In this case there is a single
variable, namely, the price p of the unit and the constraints state that 50 ≤ p ≤ 70.
Objective function. For each region i, the demand in region i is given by di(p). Since
the price is p, the revenue from region i is given by p di(p). Thus the total revenue from all
3 regions is given by

p ∑(di(p) : i = 1, 2, 3) = p(−0.2p² + 21.5p − 445 − 0.25p² + 26.5p − 610 − 0.15p² + 14.5p − 140)
                        = −0.6p³ + 62.5p² − 1195p,
where the first equality follows by (1.36). Thus we obtain the following NLP:

max −0.6p³ + 62.5p² − 1195p
subject to
p ≥ 50
p ≤ 70
Using an algorithm to solve NLPs, we find that a price of p = 57.9976 maximizes Dibson’s
revenue. The revenue at this price is $23872.80024.
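The optimal price can also be recovered numerically from the revenue function derived above; here is a small sketch assuming the SciPy library is available (the solver choice is ours, not part of the original text).

```python
# Maximize Dibson's revenue -0.6*p**3 + 62.5*p**2 - 1195*p over 50 <= p <= 70.
from scipy.optimize import minimize_scalar

def revenue(p):
    return -0.6 * p**3 + 62.5 * p**2 - 1195 * p

res = minimize_scalar(lambda p: -revenue(p), bounds=(50, 70), method="bounded")
print(res.x)             # approximately 57.9976
print(revenue(res.x))    # approximately 23872.8
```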
1.6.2 Finding a closest point feasible in an LP

Suppose that we are given a point x̄ ∈ Rn and the constraints Ax ≤ b of an LP, and that we wish to find a feasible solution that is as close as possible to x̄, i.e. we wish to solve

min ‖x − x̄‖
subject to
Ax ≤ b.

Recall that ‖x‖ := √(x1² + x2² + · · · + xn²). The above problem is equivalent to

min ‖x − x̄‖²
subject to
Ax ≤ b.
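As an illustration of this problem, the following sketch (our own; the data A, b and x̄ are made up for the example) computes a closest feasible point numerically, assuming SciPy and NumPy are available.

```python
# Find the point of {x : Ax <= b} closest to xbar, for a small made-up instance.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])      # the unit box 0 <= x <= 1
xbar = np.array([2.0, 0.5])             # an infeasible point

res = minimize(lambda x: np.sum((x - xbar) ** 2),          # minimize ||x - xbar||^2
               x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}])
print(res.x)   # roughly (1.0, 0.5): the projection of xbar onto the box
```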
1.6.3 Finding a “central” feasible solution of an LP

Suppose instead that we are worried about violating constraints and, as a result, we want a solution that is
as far away as possible from getting close to violating any of the constraints. Then one option
is to solve the optimization problem

max ∑( ln(bi − ∑(Aij xj : j = 1, . . . , n)) : i = 1, . . . , m )
subject to
Ax ≤ b.
Another option is to solve the optimization problem
\[
\min\ \sum_{i=1}^{m} \frac{1}{\,b_i - \sum_{j=1}^{n} A_{ij}x_j\,} \qquad \text{subject to} \quad Ax \le b.
\]
Indeed, we can pick any suitable function which is finite valued and monotone decreasing
for positive arguments, and which tends to infinity as its argument goes to zero from above.
Note that the functions used above as building blocks, f_1 : R → R, f_1(x) := 1/x and
f_2 : R → R, f_2(x) := −ln(x), both have these desired properties.
Further note that the objective functions of both of these problems are nonlinear. There-
fore, they are both NLP problems but not LP problems.
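To illustrate the idea of a "central" feasible solution, here is a small Python sketch that maximizes the sum of the logarithms of the slacks b_i − a_iᵀx over a tiny two-variable region. The constraint data is made up for illustration, and a crude grid search is used only to keep the example self-contained; any NLP solver could be used instead.

```python
import numpy as np

# A hypothetical two-variable region: x1 + x2 <= 3, x2 <= 2, x1 <= 2, x >= 0.
A = np.array([[1., 1], [0, 1], [1, 0], [-1, 0], [0, -1]])
b = np.array([3., 2, 2, 0, 0])

def barrier(x):
    """Sum of ln(b_i - a_i^T x); -inf outside the interior of the region."""
    slack = b - A @ x
    return np.sum(np.log(slack)) if np.all(slack > 0) else -np.inf

# Coarse grid search; the maximizer sits well inside the region,
# away from every constraint.
grid = np.linspace(0.0, 2.0, 201)
best = max((barrier(np.array([u, v])), u, v) for u in grid for v in grid)
print(best[1:])
```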
Exercises
1. UW Smokestacks has a chimney which needs cleaning regularly. Soot builds up on the
inside of the chimney at a rate of 1cm per month. The cost of cleaning when the soot has
thickness x cm is 13 + x2 hundred dollars. Cleaning can occur at the end of any month.
The chimney is clean at present (i.e. at the start of month 1). According to the contract
with the cleaners, it must be cleaned precisely 9 times in the next 48 months, one of which
times must be at the end of month 48. Also, at least three cleanings must occur in the first
15 months.
We need to determine in which months the cleaning should occur so as to minimize the total
cost. Formulate a nonlinear mathematical program to solve this problem, such that all con-
straints are either linear or integrality constraints. (Hint: Consider a variable giving the
time of the ith cleaning.)
2. A number of remote communities want to build a central warehouse. The location of each
of the communities is described by an (x, y)-coordinate, as given in the next table:

                 x    y
  Dog River      3    9
  Springfield   10    6
  Empire Falls   0   12
  Worthing       4    9

If the warehouse is located at the coordinate (7, 6) then the distance from the warehouse to
Dog River will be
\[
\sqrt{(7-3)^2 + (6-9)^2} = 5.
\]
Goods from the warehouse will be delivered to the communities by plane. The price of
a single delivery to a community is proportional to the distance between the warehouse
and the community. Each community will need a number of deliveries from the ware-
house which depends on the size of the community. Dog River will require 20 deliveries,
Springfield 14, Empire Falls 8, and Worthing 24. The goal is to locate the warehouse, i.e.
find (x, y)-coordinates, so that the total cost of all deliveries is minimized. Formulate this
problem as an NLP.
3. A 0,1 program is an integer program where every variable is a binary decision variable, i.e. can
only take the values 0 or 1. Show that every 0,1 program can be expressed as an NLP.
Hint: for every variable x consider the equality constraint x(x − 1) = 0.
of model considered. The simplest of these models is linear programming and an algorithm
to solve linear programs is the Simplex algorithm that we describe in Chapter 2.
In Chapter 3 we will develop efficient algorithms to solve two classical optimization prob-
lems, namely the problem of finding a shortest path between a fixed pair of vertices in an
undirected graph and the problem of finding a minimum cost perfect matching in a bi-
partite graph. We show in the process that the natural framework for understanding these
problems is the theory of linear programming duality. This theory will be further developed
in Chapter 4. Further applications of this theory are given in Chapter 5. General techniques to
solve integer programs are studied in Chapter 6. Finally, in Chapter 7 we explain how some
of the concepts introduced for linear programs extend to non-linear programs.
description of the problem and the data. In some applications (actually in almost all applica-
tions), some of the data will be uncertain. There are more advanced tools to deal with such
situations (see, for instance, the literature on Robust Optimization starting with the book [2]).
Many of the subjects that we introduced in this chapter have a whole course dedicated
to them. For example, the portfolio optimization example is an instance of the Markowitz model.
It was proposed by Harry Markowitz in the 1950s. For his work in the area, Markowitz
received the Nobel Prize in Economics in 1990. For applications of optimization in financial
mathematics, see the books [3, 7]. For further information on scheduling, see [19].
Chapter 2

Solving Linear Programs

2.1 Possible outcomes

When we attempt to solve a linear program (P), there are three possible outcomes:
1. it is infeasible,
2. it has an optimal solution,
3. it is unbounded.
Clearly, these outcomes are mutually exclusive (i.e. no two can occur at the same
time). We will show that in fact exactly one of these outcomes must occur (this is a form of
the Fundamental Theorem of Linear Programming which will be established at the end of this
chapter). We will now illustrate each of these outcomes with a different example.
Suppose further that a friend (or maybe your boss) asked you to solve (P). After spending
a substantial amount of time on the problem, you are starting to become more and more
convinced that (P) is in fact infeasible. But how can you be certain? How will you convince a
third party (your friend or boss) that (P) does indeed have no solution? You certainly can not
hope to claim to have tried all possible sets of values for x1 , x2 , x3 , x4 as there are an infinite
number of possibilities.
We now present a concise way of proving that the system (2.1)-(2.4) has no solution.
The first step is to create a new equation by combining equations (2.1),(2.2) and (2.3). We
pick some values y1 , y2 , y3 . Then we multiply equation (2.1) by y1 , equation (2.2) by y2 and
equation (2.3) by y3 and add each of the resulting equations together. If we choose the values
y1 = 1, y2 = −2 and y3 = 1 we obtain the equation
\[
x_1 + 4x_2 + 2x_3 = -1. \tag{2.5}
\]
Let us proceed by contradiction and suppose that there is in fact a solution x̄1 , x̄2 , x̄3 , x̄4 to (2.1)-
(2.4). Then clearly x̄1 , x̄2 , x̄3 , x̄4 must satisfy (2.5) as it satisfies each of (2.1), (2.2) and (2.3).
As x̄1 , x̄2 , x̄3 , x̄4 are all non-negative, it follows that x̄1 + 4x̄2 + 2x̄3 ≥ 0. But this contradicts
constraint (2.5). Hence, our hypothesis that there was in fact a solution to (2.1)-(2.4) must be
false.
The vector y = (1, −2, 1)T is the kind of proof that should satisfy a third party; given y,
anyone can now easily and quickly check that there is no solution to (P). Of course, this proof
will only work for an appropriate choice of y and we have not told you how to find such a
vector at this juncture. We will derive an algorithm that either finds a feasible solution, or
finds a vector y that proves that no feasible solution exists.
We wish to generalize this argument, but before this can be achieved, we need to become
comfortable with matrix notation. Equations (2.1)-(2.3) can be expressed as:
Ax = b
where
\[
A = \begin{pmatrix} 4 & 10 & -6 & -2 \\ -2 & 2 & -4 & 1 \\ -7 & -2 & 0 & 4 \end{pmatrix}, \qquad
b = \begin{pmatrix} 6 \\ 5 \\ 3 \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{2.6}
\]
Then Ax is a vector with three components. Components 1,2 and 3 of the vector Ax cor-
respond to respectively the left hand side of equations (2.1),(2.2) and (2.3). For a vector
y = (y1 , y2 , y3 )T , yT (Ax) is the scalar product of y and Ax and it consists of multiplying the
left hand side of equation (2.1) by y1 , the left hand side of equation (2.2) by y2 , the left hand
side of equation (2.3) by y3 and adding each of resulting expressions. Also, yT b is the scalar
product of y and b and it consists of multiplying the right hand side of equation (2.1) by y1
the right hand side of equation (2.2) by y2 and the right hand side of equation (2.3) by y3 and
adding each of the resulting values. Thus
yT (Ax) = yT b
which is equation (2.5). We then observed that all the coefficients in the left hand side of (2.5)
are non-negative, i.e. that (1, 4, 2, 0) ≥ 0T or equivalently that yT A ≥ 0T . We also observed
that the right hand side of (2.5) is negative or equivalently that yT b < 0. (Note, 0 denotes the
number zero and 0 the column vector whose entries are all zero.) These two facts implied that
there is no solution to (2.1)-(2.4), i.e. that Ax = b, x ≥ 0 has no solution.
Let us generalize this argument to an arbitrary matrix A and vector b. We will assume that
the matrices and vectors have appropriate dimensions so that the matrix relations make sense.
This remark holds for all subsequent statements.
Proposition 2.1. Let A be a matrix and let b be a vector. The system
\[
Ax = b, \qquad x \ge 0
\]
has no solution if there exists a vector y such that:
1. yᵀA ≥ 0ᵀ, and
2. yᵀb < 0.
Note if A has m rows then the vector y must have m components. Then the matrix equation
yT Ax = yT b is obtained by multiplying for every i ∈ {1, . . . , m} row i of A by yi and adding all
the resulting equations together.
Proof of Proposition 2.1. Let us proceed by contradiction and suppose that there exists a so-
lution x̄ to Ax = b, x ≥ 0 and that we can find y such that yT A ≥ 0T and yT b < 0. Since Ax̄ = b
is satisfied, we must also satisfy yT Ax̄ = yT b. Since yT A ≥ 0T and x̄ ≥ 0, it follows that
yT Ax̄ ≥ 0. Then 0 ≤ yT Ax̄ = yT b < 0, a contradiction.
We call a vector y which satisfies conditions (1) and (2) of Proposition 2.1 a certificate of
infeasibility. To convince a third party that a particular system Ax = b, x ≥ 0 has no solution,
it suffices to exhibit a certificate of infeasibility. Note, while we have argued that such a
certificate will be sufficient to prove infeasibility, it is not clear at all that for every infeasible
system there exists a certificate of infeasibility. The fact that this is indeed so, is a deep result
which is known as Farkas’ Lemma, see Theorem 4.8.
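A certificate of infeasibility is easy to check mechanically. The sketch below (in Python, purely as an illustration) verifies the two conditions of Proposition 2.1 for the system (2.6) and y = (1, −2, 1)ᵀ.

```python
import numpy as np

A = np.array([[4., 10, -6, -2],
              [-2., 2, -4, 1],
              [-7., -2, 0, 4]])
b = np.array([6., 5, 3])
y = np.array([1., -2, 1])

print(y @ A)   # (1, 4, 2, 0): y^T A >= 0^T componentwise
print(y @ b)   # -1:           y^T b < 0
# Hence, by Proposition 2.1, the system Ax = b, x >= 0 has no solution.
```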
max{z(x) = cT x : Ax = b, x ≥ 0}
where
\[
A = \begin{pmatrix} 1 & 1 & -3 & 1 & 2 \\ 0 & 1 & -2 & 2 & -2 \\ -2 & -1 & 4 & 1 & 0 \end{pmatrix}, \quad
b = \begin{pmatrix} 7 \\ -2 \\ -3 \end{pmatrix}, \quad
c = \begin{pmatrix} -1 \\ 0 \\ 3 \\ 7 \\ -1 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}. \tag{2.7}
\]
This linear program is unbounded. Just like in the previous case of infeasibility, we are looking
for a concise proof that establishes this fact. Specifically, we will define a family of feasible
solutions x(t) for all real numbers t ≥ 0 and show that as t tends to infinity, so does the value
of x(t). This will show that (2.7) is indeed unbounded.
We define for every t ≥ 0,
x(t) = x̄ + td
where
x̄ = (2, 0, 0, 1, 2)T and d = (1, 2, 1, 0, 0)T .
For instance when t = 0 then x(t) = x̄ = (2, 0, 0, 1, 2)ᵀ and when t = 2, x(t) = (4, 4, 2, 1, 2)ᵀ. We
claim that for every t ≥ 0, x(t) is feasible for (2.7). Let us first check that the equations Ax = b
hold for every x(t). You can verify that Ax̄ = b and that Ad = 0. Then we have
\[
A x(t) = A(\bar{x} + td) = A\bar{x} + t\,Ad = b + t\,\mathbf{0} = b,
\]
as required. We also need to verify that x(t) ≥ 0 for every t ≥ 0. Note that x̄, d ≥ 0 hence,
x(t) = x̄ + td ≥ td ≥ 0, as required. Let us investigate what happens to the objective function
for x(t) as t increases. Observe that cᵀd = 2 > 0; then
\[
c^T x(t) = c^T\bar{x} + t\,c^T d = c^T\bar{x} + 2t,
\]
and as t tends to infinity so does cᵀx(t). Hence, we have proved that the linear program is in
fact unbounded.
Given x̄ and d, anyone can now easily verify that the linear program is unbounded. We
have not told you how to find such a pair of vectors x̄ and d. We will want an algorithm that
detects if a linear program is unbounded and when it is, provides us with the vectors x̄ and d.
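The certificate of unboundedness can be checked just as mechanically. The following sketch verifies, for the data of (2.7), that x̄ is feasible, that Ad = 0 with d ≥ 0, and that cᵀd > 0, which is all that the argument above requires.

```python
import numpy as np

A = np.array([[1., 1, -3, 1, 2],
              [0., 1, -2, 2, -2],
              [-2., -1, 4, 1, 0]])
b = np.array([7., -2, -3])
c = np.array([-1., 0, 3, 7, -1])

xbar = np.array([2., 0, 0, 1, 2])   # feasible solution
d = np.array([1., 2, 1, 0, 0])      # direction of unboundedness

print(A @ xbar - b)   # (0, 0, 0): xbar satisfies Ax = b (and xbar >= 0)
print(A @ d)          # (0, 0, 0): x(t) = xbar + t d stays feasible for all t >= 0
print(c @ d)          # 2 > 0:     the objective value grows without bound
```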
We leave the proof of the following proposition to the reader, as the argument is essentially
the one which we outlined in the above example.
Proposition 2.2. Suppose there exists a feasible solution x̄ and a vector d such that,
1. Ad = 0,
2. d ≥ 0,
3. cᵀd > 0.
Then the linear program max{cᵀx : Ax = b, x ≥ 0} is unbounded.
Suppose you are being told that (2, 0, 0, 4, 0)T is an optimal solution to (2.8). Clearly you will
not simply believe such a claim but rightfully request a proof. It is easy enough to verify that
(2, 0, 0, 4, 0)T is feasible and that the corresponding value is 3. However, once again, trying
all feasible solutions, and comparing their values to that of (2, 0, 0, 4, 0)T is impossible.
In order to construct a concise proof of optimality, we will prove that z(x̄) ≤ 3 for every
feasible solution x̄, i.e. no feasible solution has value that exceeds 3. Since (2, 0, 0, 4, 0)T has
value equal to 3 it will imply that it is optimal. We seemingly traded one problem for another
however. How to show that, z(x̄) ≤ 3 for every feasible solution x̄? It suffices in this particular
case to analyze the objective function as,
where the inequality follows from the fact that the scalar product of two vectors, one where
all entries are non-positive and one where all entries are non-negative, is always non-positive.
Hence, (2, 0, 0, 4, 0)T is indeed optimal.
Consider now the following linear program,
\[
\max\ z(x) = (-1, 3, -5, 2, 1)\,x - 3 \quad \text{s.t.} \quad
\begin{pmatrix} 1 & -2 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & 3 \end{pmatrix} x = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad x \ge 0. \tag{2.9}
\]
Observe that the only difference between (2.8) and (2.9) is the objective function. In particular,
(2, 0, 0, 4, 0)T is a feasible solution to (2.9). We claim that it is in fact an optimal solution. To
do this we construct a new constraint by multiplying the first constraint of the linear program
by −1, the second by 2 and by adding the two constraints together. Then we obtain,
\[
(-1,2)\begin{pmatrix} 1 & -2 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & 3 \end{pmatrix} x = (-1,2)\begin{pmatrix} 2 \\ 4 \end{pmatrix},
\quad \text{i.e.} \quad (-1, 4, -3, 2, 4)\,x = 6.
\]
This equation holds for every feasible solution of (2.8). Thus adding this equation to the ob-
jective function z(x) = (−1, 3, −5, 2, 1)x−3 will not change the value of the objective function
for any of the feasible solutions. The resulting objective function is
z(x) = (−1, 3, −5, 2, 1)x − 3 − (−1, 4, −3, 2, 4)x + 6 = (0, −1, −2, 0, −3)x + 3.
However, this is the same objective function as in the linear program (2.8). It means that
the linear programs (2.8) and (2.9) are essentially the same linear program, written differently. In
particular, this proves that (2, 0, 0, 4, 0)T is an optimal solution to (2.9).
we will always be able to show that an optimal solution x̄ of value z(x̄) is indeed optimal by
proving that z(x̄) is an upper bound for (P). We will see later in this chapter that it is always
possible to prove such an upper bound by combining the objective function with a linear
combination of the constraints.
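The certificate of optimality used above can be verified by a one-line computation: with y = (−1, 2)ᵀ, the reduced objective c − Aᵀy must be non-positive. A small sketch for the data of (2.9):

```python
import numpy as np

A = np.array([[1., -2, 1, 0, 2],
              [0., 1, -1, 1, 3]])
b = np.array([2., 4])
c = np.array([-1., 3, -5, 2, 1])   # the objective of (2.9) is c^T x - 3
y = np.array([-1., 2])

print(c - A.T @ y)   # (0, -1, -2, 0, -3): all entries <= 0
print(-3 + y @ b)    # 3: the rewritten constant term
# Every feasible x >= 0 therefore has value at most 3, and (2, 0, 0, 4, 0)
# attains 3, so it is optimal.
```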
Exercises
1. (a) Prove that the following LP problem is infeasible.
max 3x1 + 4x2 + 6x3
s.t. 3x1 + 5x2 − 6x3 = 4
x1 + 3x2 − 4x3 = 2
−x1 + x2 − x3 = −1
x1 , x2 , x3 ≥ 0
(b) Prove that the LP problem max{cᵀx : Ax = b, x ≥ 0} is unbounded, where
\[
A = \begin{pmatrix} 4 & 2 & 1 & -6 & -1 \\ -1 & 1 & -4 & 1 & 3 \\ 3 & -6 & 5 & 3 & -5 \end{pmatrix}, \quad
b = \begin{pmatrix} 11 \\ -2 \\ -8 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ -2 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
\]
Hint: Consider the vectors x̂ = (1, 3, 1, 0, 0)ᵀ and d = (1, 1, 1, 1, 1)ᵀ.
(d) For each of the problems in parts (b) and (c), give a feasible solution having objective
value exactly 5000.
2. Let A be an m × n matrix and let b be a vector with m entries. Prove or disprove each of
the following statements (in both cases y is a vector with m entries),
max{cT x + z̄ : Ax = b, x ≥ 0}.
where z̄ denotes some constant. In other words, an LP is in SEF, if it satisfies the following
conditions:
1. it is a maximization problem,
2. other than the non-negativity constraints x ≥ 0, all constraints are equality constraints, and
3. every variable is required to be non-negative.
Here is an example,
We will develop an algorithm that given an LP (P) in SEF will either: prove that (P) is infea-
sible by exhibiting a certificate of infeasibility, or prove that (P) is unbounded by exhibiting
a certificate of unboundedness, or find an optimal solution and show that it is indeed optimal
by exhibiting a certificate of optimality.
However, not every linear program is in SEF. Given an LP (P) which is not in SEF, we
wish to ”convert” (P) into an LP (P’) in SEF and apply the algorithm to (P’) instead. Of course
we want the answer for (P’) to give us some meaningful answer for (P). More precisely what
we wish is for (P) and (P’) to satisfy the following relationships:
1. (P) is infeasible if and only if (P') is infeasible,
2. (P) is unbounded if and only if (P') is unbounded, and
3. given any optimal solution of (P') we can construct an optimal solution of (P).
Linear programs that satisfy the relationships (1),(2) and (3) are said to be equivalent.
We now illustrate on an example how to convert an arbitrary linear program into an equiv-
alent linear program in SEF. Note, we will leave it to the reader to verify that at each step we
do indeed get an equivalent linear program:
\[
\min\ (-1, 2, -4)(x_1, x_2, x_3)^T \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 5 & 3 \\ 2 & -1 & 2 \\ 1 & 2 & -1 \end{pmatrix} x
\ \begin{matrix} \ge \\ \le \\ = \end{matrix}\ \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2 \ge 0. \tag{2.11}
\]
The notation used here means that ≥, ≤ and = refer to the 1st, 2nd, and 3rd constraints
respectively. We want to convert (2.11) into an LP in SEF. We will proceed step by step.
First note that (2.11) is a minimization problem. We can replace min(−1, 2, −4)(x1 , x2 , x3 )T
by max(1, −2, 4)(x1 , x2 , x3 )T , or more generally min cT x by max −cT x. The resulting linear
program is as follows,
We do not have the condition that x3 ≥ 0 as part of the formulation in (2.12). We call such
a variable free. We might be tempted to simply add the constraint x3 ≥ 0 to the formulation.
However, by doing so we may change the optimal solution as it is possible for instance that all
optimal solutions to (2.12) satisfy x3 < 0. The idea here is to express x3 as the difference of
two non-negative variables, say x3 = x3+ − x3− where x3+ , x3− ≥ 0. Let us rewrite the objective
Let us rewrite the left hand side of the constraints with these new variables,
\[
\begin{pmatrix} 1 & 5 & 3 \\ 2 & -1 & 2 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + x_3\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + (x_3^+ - x_3^-)\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
\]
\[
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + x_3^+\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} + x_3^-\begin{pmatrix} -3 \\ -2 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 & 5 & 3 & -3 \\ 2 & -1 & 2 & -2 \\ 1 & 2 & -1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \end{pmatrix}.
\]
In general, for an LP with variables x = (x1, . . . , xn)ᵀ, if we have a variable xi where xi ≥ 0 is
not part of the formulation (i.e. a free variable), we introduce variables xi⁺, xi⁻ ≥ 0 and define
x′ = (x1, . . . , xi−1, xi⁺, xi⁻, xi+1, . . . , xn)ᵀ. Replace the objective function cᵀx by c′ᵀx′ where
c′ = (c1, . . . , ci−1, ci, −ci, ci+1, . . . , cn)ᵀ. If the left hand side of the constraints is of the form
Ax, where A is a matrix with columns A1, . . . , An, replace Ax by A′x′ where A′ is the matrix
whose columns are A1, . . . , Ai−1, Ai, −Ai, Ai+1, . . . , An.
The new linear program is as follows,
Let us replace the constraint 2x1 − x2 + 2x3+ − 2x3− ≤ 4 in (2.13) by an equality constraint. We
introduce a new variable x4 where x4 ≥ 0 and we rewrite the constraint as 2x1 − x2 + 2x3+ −
2x3− + x4 = 4. The variable x4 is called a slack variable. More generally, given a constraint of
the form ∑ni=1 ai xi ≤ β we can replace it by ∑ni=1 ai xi + xn+1 = β , where xn+1 ≥ 0.
The resulting linear program is as follows,
max (1, −2, 4, −4, 0)(x1, x2, x3⁺, x3⁻, x4)ᵀ
subject to
\[
\begin{pmatrix} 1 & 5 & 3 & -3 & 0 \\ 2 & -1 & 2 & -2 & 1 \\ 1 & 2 & -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \end{pmatrix}
\ \begin{matrix} \ge \\ = \\ = \end{matrix}\ 
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2, x_3^+, x_3^-, x_4 \ge 0. \tag{2.14}
\]
Let us replace the constraint x1 + 5x2 + 3x3+ − 3x3− ≥ 5 in (2.14) by an equality constraint.
We introduce a new variable x5 , where x5 ≥ 0 and we rewrite the constraint as x1 + 5x2 +
3x3+ − 3x3− − x5 = 5. The variable x5 is also called a slack variable. More generally, given a
constraint of the form ∑ni=1 ai xi ≥ β we can replace it by ∑ni=1 ai xi − xn+1 = β where xn+1 ≥ 0.
The resulting linear program is as follows,
max (1, −2, 4, −4, 0, 0)(x1, x2, x3⁺, x3⁻, x4, x5)ᵀ
subject to
\[
\begin{pmatrix} 1 & 5 & 3 & -3 & 0 & -1 \\ 2 & -1 & 2 & -2 & 1 & 0 \\ 1 & 2 & -1 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2, x_3^+, x_3^-, x_4, x_5 \ge 0.
\]
Note that after relabeling the variables x1 , x2 , x3+ , x3− , x4 , x5 by x1 , x2 , x3 , x4 , x5 , x6 , we obtain the
linear program (2.10). We leave to the reader to verify that the aforementioned transformations
are sufficient to convert any linear program into an LP in SEF.
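The conversion steps can be automated. The sketch below applies them to the data of (2.11) as reconstructed above; the exact column ordering of the resulting SEF problem differs slightly from the relabelling used in the text, but the LP is equivalent. The data layout (lists of senses, indices of free variables) is just one possible convention, not anything prescribed by the text.

```python
import numpy as np

# Data of (2.11): min (-1, 2, -4) x, row senses >=, <=, =, with x1, x2 >= 0, x3 free.
A = np.array([[1., 5, 3], [2., -1, 2], [1., 2, -1]])
b = np.array([5., 4, 2])
c = np.array([-1., 2, -4])
senses = ['>=', '<=', '=']
free = [2]                         # x3 (0-indexed) has no sign constraint

c = -c                             # Step 1: min c^T x  ->  max (-c)^T x

for j in free:                     # Step 2: split x_j = x_j^+ - x_j^-
    A = np.hstack([A, -A[:, [j]]])
    c = np.append(c, -c[j])

for i, s in enumerate(senses):     # Step 3: slack (<=) or surplus (>=) variables
    if s != '=':
        col = np.zeros((A.shape[0], 1))
        col[i, 0] = 1.0 if s == '<=' else -1.0
        A = np.hstack([A, col])
        c = np.append(c, 0.0)

print(c)   # (1, -2, 4, -4, 0, 0): objective of the equivalent LP in SEF
print(A)   # all constraints are now equalities, all variables >= 0
print(b)
```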
Exercises
1. Convert the following linear programs (LP) into SEF form,
(a)
min (2, −1, 4, 2, 4)(x1 , x2 , x3 , x4 , x5 )T
subject to
\[
\begin{pmatrix} 1 & 2 & 4 & 7 & 3 \\ 2 & 8 & 9 & 0 & 0 \\ 1 & 1 & 0 & 2 & 6 \\ -3 & 4 & 3 & 1 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\ \begin{matrix} \le \\ = \\ \ge \\ \ge \end{matrix}\ 
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix},
\qquad x_1 \ge 0,\ x_2 \ge 0,\ x_4 \ge 0.
\]
min cT x + d T y
subject to
Ax ≥ b
Bx + Dy = f
x≥0
\[
\max\ z(x) = 2x_1 + 3x_2 \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}, \quad x \ge 0, \tag{2.15}
\]
where x = (x1, x2, x3, x4, x5)ᵀ. Because (2.15) has a special form it is easy to verify that x̄ =
(0, 0, 6, 10, 4)T is a feasible solution with value z(x̄) = 0. Let us try to find a feasible solution
x with larger value. Since the objective function is z(x) = 2x1 + 3x2 by increasing the value of
x̄1 or x̄2 we will increase the value of the objective function. Let us try to increase the value of
x̄1 while keeping x̄2 equal to zero. In other words, we look for a new feasible solution x where
x1 = t for some t ≥ 0 and x2 = 0. The matrix equation will tell us which values we need for
x3 , x4 , x5 . We have,
\[
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 & 1 \end{pmatrix} x
= x_1\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + x_2\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + x_3\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + x_5\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}.
\]
Thus,
\[
\begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix} - t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. \tag{2.16}
\]
The larger we pick t ≥ 0, the more we will increase the objective function. How large can we
choose t to be? We simply need to make sure that x3, x4, x5 ≥ 0, i.e. that
\[
t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \le \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}.
\]
Thus, t ≤ 6 and 2t ≤ 10, i.e. t ≤ 10/2 = 5. Note that −t ≤ 4 does not impose any bound on how
large t can be. Picking the largest possible t yields 5. We can summarize this computation as
\[
t = \min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\} = 5.
\]
This yields the new feasible solution x = (5, 0, 1, 0, 9)ᵀ, which has value z(x) = 10, an improvement over the value 0 of
x̄. By repeating this type of computations and rewriting the linear program at every step, this
will lead to an algorithm for solving linear programs which is known as the Simplex algorithm.
Exercises
1. In this exercise you are asked to repeat the argument in Section 2.3 with different examples.
2.4.1 Bases
Consider an m × n matrix A where the rows of A are linearly independent. We will denote
column j of A by A j . Let J be a subset of the column indices (i.e. J ⊆ {1, . . . , n}), we define
AJ to be the matrix formed by columns A j for all j ∈ J (where the columns appear in the order
given by their corresponding indices). We say that a set of column indices B forms a basis if
the matrix AB is a square non-singular matrix. Equivalently, a basis corresponds to a maximal
subset of linearly independent columns. Consider for instance,
\[
A = \begin{pmatrix} 2 & 1 & 2 & -1 & 0 & 0 \\ 1 & 0 & -1 & 2 & 1 & 0 \\ 3 & 0 & 3 & 1 & 0 & 1 \end{pmatrix}.
\]
Then B = {2, 5, 6} is a basis as the matrix AB is the identity matrix. Note that B = {1, 2, 3}
and {1, 5, 6} are also bases, while B = {1, 3} is not a basis (as AB is not square in this case)
and neither is B = {1, 3, 5} (as AB is singular in this case). We will denote by N the set of
column indices not in B. Thus, B and N will always denote a partition of the column indices
of A.
Suppose that in addition to the matrix A we have a vector b with m components, and
consider the system of equations Ax = b. For instance say,
\[
\begin{pmatrix} 2 & 1 & 2 & -1 & 0 & 0 \\ 1 & 0 & -1 & 2 & 1 & 0 \\ 3 & 0 & 3 & 1 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. \tag{2.17}
\]
Variables x j are said to be basic when j ∈ B and non-basic otherwise. The vector which is
formed by the basic variables is denoted by xB and the vector which is formed by the non-basic
variables is xN . We assume that the components in xB appear in the same order as AB and that
the components in xN appear in the same order as AN . For instance, for (2.17) B = {1, 5, 6} is
a basis, then N = {2, 3, 4} and xB = (x1 , x5 , x6 )T , xN = (x2 , x3 , x4 )T .
The following easy observation will be used repeatedly,
\[
Ax = \sum_{j=1}^{n} x_j A_j = \sum_{j \in B} x_j A_j + \sum_{j \in N} x_j A_j = A_B x_B + A_N x_N.
\]
A vector x̄ is a basic solution of Ax = b for a basis B if:
1. Ax̄ = b, and
2. x̄N = 0.
For (2.17) and basis B = {1, 5, 6}, the unique basic solution x̄ is given by x̄2 = x̄3 = x̄4 = 0 and
\[
\begin{pmatrix} \bar{x}_1 \\ \bar{x}_5 \\ \bar{x}_6 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}.
\]
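Computing a basic solution is a single linear solve. A short sketch for the system (2.17) and the basis B = {1, 5, 6} (written 0-indexed in the code):

```python
import numpy as np

A = np.array([[2., 1, 2, -1, 0, 0],
              [1., 0, -1, 2, 1, 0],
              [3., 0, 3, 1, 0, 1]])
b = np.array([2., 1, 1])

B = [0, 4, 5]                        # basis {1, 5, 6} in 0-indexed form
x = np.zeros(A.shape[1])
x[B] = np.linalg.solve(A[:, B], b)   # x_B = A_B^{-1} b; non-basic entries stay 0
print(x)                             # (1, 0, 0, 0, 0, -2): basic, but not feasible
```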
If (1) occurs, then the linear program is infeasible and we can stop. If (2) occurs, then we can
eliminate a redundant constraint. We repeat the procedure until all rows of A are linearly inde-
pendent. Hence, throughout this chapter we will always assume (without stating it explicitly)
that the rows of the matrix defining the left hand side of the equality constraints in an LP in
SEF are linearly independent.
We say that the LP max{cᵀx + z̄ : Ax = b, x ≥ 0} is in canonical form for a basis B if:
(C1) AB is an identity matrix, and
(C2) cB = 0.
We will show that given any basis B, we can rewrite the linear program (P) so that it is in
canonical form. For instance, B = {1, 2, 4}, is a basis of the linear program (2.15) that can be
rewritten as the following equivalent linear program,
\[
\begin{array}{rl}
\max & z(x) = 17 + \big(0,\, 0,\, -\tfrac{5}{2},\, 0,\, -\tfrac{1}{2}\big)\,x \\[2pt]
\text{s.t.} & \begin{pmatrix} 1 & 0 & 1/2 & 0 & -1/2 \\ 0 & 1 & 1/2 & 0 & 1/2 \\ 0 & 0 & -3/2 & 1 & 1/2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \qquad (2.18)\\[4pt]
 & x \ge 0.
\end{array}
\]
max{cᵀx + z̄ : Ax = b, x ≥ 0},
and let B be a basis of A. Let us first consider condition (C1). Multiplying the system Ax = b
on the left by A_B^{-1} gives
\[
A_B^{-1} A x = A_B^{-1} b. \tag{2.19}
\]
Observe that
\[
A_B^{-1} A x = A_B^{-1}(A_B x_B + A_N x_N) = A_B^{-1}A_B x_B + A_B^{-1}A_N x_N = x_B + A_B^{-1} A_N x_N.
\]
In particular, the columns corresponding to B in the left hand side of (2.19) form an identity
matrix as required. Moreover, we claim that the set of solutions to Ax = b is equal to the set
of solutions to (2.19). Clearly, every solution to Ax = b is also a solution to A_B^{-1}Ax = A_B^{-1}b, as
these equations are linear combinations of the equations Ax = b. Moreover, every solution to
A_B^{-1}Ax = A_B^{-1}b is also a solution to A_B A_B^{-1}Ax = A_B A_B^{-1}b, but this equation is simply Ax = b,
proving the claim.
Let us consider condition (C2). Let B be a basis of A. Suppose A has m rows, then for any
vector y = (y1, . . . , ym)ᵀ the equation
\[
y^T A x = y^T b
\]
holds for every solution of Ax = b. Since this equation holds for every feasible solution, we can add
it to the objective function z(x) = cᵀx + z̄. The resulting objective function is
\[
z(x) = c^T x + \bar{z} + y^T b - y^T A x = (c^T - y^T A)\,x + \bar{z} + y^T b.
\]
Let c̄ᵀ := cᵀ − yᵀA. For (C2) to be satisfied we need c̄B = 0 and need to choose y accordingly.
Namely, we want that
\[
\bar{c}_B^{\,T} = c_B^T - y^T A_B = \mathbf{0}^T,
\]
or equivalently that
\[
y^T A_B = c_B^T, \qquad \text{i.e.} \qquad A_B^T y = c_B.
\]
Note that the inverse and the transpose operations commute, hence (A_B^{-1})^T = (A_B^T)^{-1}.
Therefore, we will write A_B^{-T} for (A_B^{-1})^T. Hence, the previous relation can be rewritten as
\[
y = A_B^{-T} c_B. \tag{2.21}
\]
Proposition 2.4. Suppose the linear program
\[
\max\{\, z(x) = c^T x + \bar{z} \ :\ Ax = b,\ x \ge 0 \,\}
\]
and a basis B of A are given. Then the following linear program is an equivalent linear
program in canonical form for the basis B:
\[
\max\{\, z(x) = (c^T - y^T A)\,x + \bar{z} + y^T b \ :\ A_B^{-1}Ax = A_B^{-1}b,\ x \ge 0 \,\},
\]
where y = A_B^{-T} c_B.
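The formulas of Proposition 2.4 translate directly into a few lines of linear algebra. The sketch below recovers the canonical form (2.18) of the LP (2.15) for the basis B = {1, 2, 4}.

```python
import numpy as np

# Data of (2.15).
A = np.array([[1., 1, 1, 0, 0],
              [2., 1, 0, 1, 0],
              [-1., 1, 0, 0, 1]])
b = np.array([6., 10, 4])
c = np.array([2., 3, 0, 0, 0])
zbar = 0.0

B = [0, 1, 3]                        # basis {1, 2, 4}, 0-indexed
AB = A[:, B]
y = np.linalg.solve(AB.T, c[B])      # y = A_B^{-T} c_B
print(c - A.T @ y)                   # (0, 0, -5/2, 0, -1/2): new objective coefficients
print(zbar + y @ b)                  # 17: new constant term
print(np.linalg.solve(AB, A))        # A_B^{-1} A: the constraint matrix of (2.18)
print(np.linalg.solve(AB, b))        # (1, 5, 3): the right hand side of (2.18)
```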
Exercises
1. Consider the system Ax = b where the rows of A are linearly independent. Let x̄ be a
solution to Ax = b. Let J be the column indices j of A for which x̄j ≠ 0.
(a) Show that if x̄ is a basic solution then the columns of AJ are linearly independent.
(b) Show that if the columns of AJ are linearly independent then x̄ is a basic solution.
Note that (a) and (b) gives you a way of checking whether x̄ is a basic solution, namely
you simply need to verify whether the columns of AJ are linearly independent.
here x is a vector of the appropriate length. Find an equivalent LP in canonical form for
In each case, state whether the basis (i.e. the corresponding basic solution) is feasible.
max cT x
subject to
Ax = b
x j ≥ 0 for all j = 1, . . . , n − 1
Convert (P) into a linear program (P’) in SEF by replacing the free variable xn by two
variables xn⁺ and xn⁻. Show that no basic solution x̄ of (P') satisfies both x̄n⁺ > 0 and
x̄n⁻ > 0.
Assume that the rows of A are linearly independent. Let x(1) and x(2) be two distinct
feasible solutions of (P) and define,
\[
\bar{x} := \tfrac{1}{2} x^{(1)} + \tfrac{1}{2} x^{(2)}.
\]
(b) Show that if x(1) and x(2) are optimal solutions to (P) then so is x̄.
(d) Deduce from (b) and (c) that if (P) has two optimal solutions then (P) has an optimal
solution that is NOT a basic feasible solution.
(a) Find the basic solution x̄ of (P) for the basis B = {1, 2, 3}.
(b) Show that x̄ is an optimal solution of (P).
(c) Show that x̄ is the unique optimal solution of (P).
Hint: Show that for every feasible solution x′ of value 17, {j : x′j ≠ 0} ⊆ B. Then
use Exercise 1.
max cT x + z̄
subject to
Ax = b
x≥0
where A is an m × n matrix. Let x̄ be a feasible solution to (P) and let J = { j : x̄ j > 0}.
Call a vector d a good direction for x̄ if it satisfies the following properties:
(a) Show that if the columns of AJ are linearly dependent then there exists a good
direction for x̄.
Hint: Use the definition of linear dependence to get a vector d. Then possibly
replace d by −d.
(b) Show that if x̄ is not basic then there exists a good direction for x̄.
Hint: Use (a) and Exercise 4(b).
(c) Show that if x̄ has a good direction then there exists a feasible solution x′ of (P) such
that the set J′ := {j : x′j > 0} has fewer elements than J.
Hint: Let x′ = x̄ + εd for a suitable value ε > 0.
(d) Show that if (P) has a feasible solution then it has a feasible solution that is basic.
H INT: Use (b) and (c) repeatedly.
max{z(x) = 10 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} 1 & 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 1 & -1/2 & 0 \\ 0 & 3/2 & 0 & 1/2 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}. \tag{2.22}
\]
Let us try to find a feasible solution x with value larger than x̄. Recall, B and N partition the
column indices of A, i.e. N = {2, 4}. Since the linear program is in canonical form, cB = 0.
Therefore, to increase the objective function value, we must select k ∈ N such that ck > 0 and
increase the component x̄k of x̄. In this case, our choice for k is k = 2. Therefore, we set x2 = t
for some t ≥ 0. For all j ∈ N where j 6= k we keep component x j of x equal to zero. It means
in this case that x4 = 0. The matrix equation will tell us what values we need to choose for
xB = (x1 , x3 , x5 )T . Following the same argument as in Section 2.3 we obtain that,
\[
x_B = \begin{pmatrix} x_1 \\ x_3 \\ x_5 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix} - t\begin{pmatrix} 1/2 \\ 1/2 \\ 3/2 \end{pmatrix}. \tag{2.23}
\]
max{z(x) = 14 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & -3 & 2 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 0 \\ -4 \\ 1 \\ 0 \end{pmatrix}. \tag{2.26}
\]
Here we have N = {3, 4}. Let us first choose which element k enters the basis. We want k ∈ N
and ck > 0. The only choice is k = 4. We compute t by taking the smallest ratio between entry
bi and entry Aik (where k = 4) for all i where Aik > 0, namely,
\[
t = \min\left\{ \frac{4}{1},\ -,\ \frac{6}{2} \right\} = 3,
\]
where "−" indicates that the corresponding entry Aik is not positive. The minimum was
attained for the ratio 6/2, i.e. the 3rd row. It corresponds to the third component of xB. As the
3rd element of B is 5 we will have x5 = 0 for the new solution. Hence, 5 will be leaving the
basis and the new basis will be {1, 2, 4}.
Let us proceed with the next iteration. Using the formulae in Proposition 2.4 we can
rewrite the linear program so that it is in canonical form for the basis B = {1, 2, 4}. We get,
\[
\begin{array}{rl}
\max & z(x) = 17 + \big(0,\, 0,\, -\tfrac{5}{2},\, 0,\, -\tfrac{1}{2}\big)\,x \\[2pt]
\text{s.t.} & \begin{pmatrix} 1 & 0 & 1/2 & 0 & -1/2 \\ 0 & 1 & 1/2 & 0 & 1/2 \\ 0 & 0 & -3/2 & 1 & 1/2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \qquad (2.27)\\[4pt]
 & x \ge 0.
\end{array}
\]
The basic solution is x̄ := (1, 5, 0, 3, 0)T and z(x̄) = 17. We have N = {3, 5}. We want k ∈ N
and ck > 0. However, c3 = − 52 and c5 = − 12 so there is no such choice. We claim that this
occurs because the current basic solution is optimal.
Let x′ be any feasible solution. Then
\[
z(x') = 17 + \underbrace{\big(0,\,0,\,-\tfrac{5}{2},\,0,\,-\tfrac{1}{2}\big)}_{\le\, \mathbf{0}^T}\ \underbrace{x'}_{\ge\, \mathbf{0}} \ \le\ 17.
\]
Thus 17 is an upper bound for the value of any feasible solution. It follows that x̄ is an optimal
solution.
max{z(x) = cT x : Ax = b, x ≥ 0}
where
\[
A = \begin{pmatrix} -2 & 4 & 1 & 0 & 1 \\ -3 & 7 & 0 & 1 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad
c = \begin{pmatrix} -1 \\ 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]
It is in canonical form for the basis B = {3, 4}. Then N = {1, 2, 5}. Let us choose which
element k enters the basis. We want k ∈ N and ck > 0. We have choices k = 2 and k = 5. Let
us select 5. We compute t by taking the smallest ratio between entry bi and entry Aik for all i
where Aik > 0. Namely,
\[
t = \min\left\{ \frac{1}{1},\ \frac{3}{1} \right\} = 1.
\]
The minimum is attained for the ratio 1/1, which corresponds to the first row. The first basic
variable is 3. Thus 3 is leaving the basis. Hence, the new basis is B = {4, 5}.
Using the formulae in Proposition 2.4 we can rewrite the linear program so that it is in
canonical form for the basis B = {4, 5}. We get,
max{z(x) = 1 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} -1 & 3 & -1 & 1 & 0 \\ -2 & 4 & 1 & 0 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}.
\]
Here N = {1, 2, 3}. Let us choose which element k enters the basis. We want k ∈ N and ck > 0.
The only possible choice is k = 1. We compute t by taking the smallest ratio between entry bi
and entry Aik for all i where Aik > 0. However, as Ak ≤ 0, this is not well defined. We claim
that this occurs because the linear program is unbounded.
The new feasible solution x(t) is defined by setting x1 (t) = t for some t ≥ 0 and x2 = x3 = 0.
The matrix equation Ax = b tells us which values to choose for xB (t) namely (see argument in
Section 2.3),
\[
x_B(t) = \begin{pmatrix} x_4(t) \\ x_5(t) \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} - t\begin{pmatrix} -1 \\ -2 \end{pmatrix}.
\]
Thus, we have
\[
x(t) = \begin{pmatrix} t \\ 0 \\ 0 \\ 2 + t \\ 1 + 2t \end{pmatrix}
= \underbrace{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{:=\,\bar{x}}
+\ t\ \underbrace{\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 2 \end{pmatrix}}_{:=\,d}.
\]
Proof. Suppose cN ≤ 0. Note that z(x̄) = z̄ + cᵀN x̄N = z̄ + cᵀN 0 = z̄. For any feasible solution x′
we have x′ ≥ 0. As cN ≤ 0, it implies that cᵀx′ = cᵀN x′N ≤ 0. It follows that z(x′) ≤ z̄, i.e. z̄ is
an upper bound for (P). As z(x̄) = z̄, the result follows.
Now suppose that for some k ∈ N we have ck > 0. We define x′N, which depends on some
parameter t ≥ 0, as follows:
\[
x'_j = \begin{cases} t & \text{if } j = k, \\ 0 & \text{if } j \in N \setminus \{k\}. \end{cases}
\]
Proof. Suppose Ak ≤ 0. Then for all t ≥ 0 we have x′B = b − tAk ≥ 0. Hence, x′ is feasible.
Moreover,
\[
z(x') = \bar{z} + c_N^T x'_N = \bar{z} + \sum_{j \in N} c_j x'_j = \bar{z} + c_k x'_k = \bar{z} + c_k t.
\]
Since ck > 0, z(x′) tends to infinity as t does, hence the linear program is unbounded.
Thus, we may assume that Aik > 0 for some row index i. We need to choose t so that
x′B ≥ 0. It follows from (2.28) that
\[
tA_k \le b,
\]
or equivalently, for every row index i for which Aik > 0 we must have
\[
t \le \frac{b_i}{A_{ik}}.
\]
Hence, the largest value t for which x′ remains non-negative is given by
\[
t = \min\left\{ \frac{b_i}{A_{ik}} \ :\ A_{ik} > 0 \right\}.
\]
Note that since Ak ≤ 0 does not hold, this is well defined. Let r denote the index i where the
minimum is attained in the previous equation. Then (2.28) implies that the rth entry of x′B will
be zero. Let ℓ denote the rth basic variable of B. Note that since we order the components
of xB in the same order as B, the rth component of x′B is the basic variable x′ℓ. It follows that
x′ℓ = 0. Choose
\[
B' = B \cup \{k\} \setminus \{\ell\}, \qquad N' = N \cup \{\ell\} \setminus \{k\}.
\]
We let the reader verify that x′N′ = 0. It can be readily checked that B′ is a basis. Then x′ must
be a basic solution for the basis B′.
Let us summarize the simplex procedure for
Note, we have argued that if the algorithm terminates, then it provides us with a correct
solution. However, the algorithm as it is described need not stop. Suppose that at every step,
the quantity t > 0. Then at every step, the objective function will increase. Moreover, it
is clear that at no iteration will the objective function decrease or stay the same. Hence, in
that case we never visit the same basis twice. As there are clearly only a finite number of
bases, this would guarantee that the algorithm terminates. However, it is possible that at every
iteration the quantity t equals 0. Then at the start of the next iteration we get a new basis, but
the same basic solution. After a number of iterations, it is possible that we revisit the same
basis. If we repeat this forever, the algorithm will not terminate.
There are a number of easy refinements to the version of the Simplex algorithm we de-
scribed that will guarantee termination. The easiest to state is as follows: Throughout the
Simplex iterations with t = 0, in Step 2, among all j ∈ N with c j > 0, choose k := min{ j ∈
N : cj > 0}; also, in Step 3, define t as before and choose the smallest r ∈ B with Ark > 0 and
br/Ark = t. This rule is known as the smallest subscript rule or Bland's rule.
Theorem 2.7. The Simplex procedure with the smallest subscript rule terminates.
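To make the procedure and Bland's rule concrete, here is a compact Python sketch of the Simplex method in the form described above. It recomputes a dense solve with A_B at every iteration, which is chosen for clarity rather than efficiency, and it is a sketch rather than a numerically robust implementation.

```python
import numpy as np

def simplex(A, b, c, B):
    """Simplex method with the smallest subscript (Bland's) rule for
    max{c^T x : Ax = b, x >= 0}, started from a feasible basis B (0-indexed)."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    B = list(B)
    while True:
        AB = A[:, B]
        xB = np.linalg.solve(AB, b)              # current basic solution
        y = np.linalg.solve(AB.T, c[B])          # y = A_B^{-T} c_B
        cbar = c - A.T @ y                       # reduced costs (zero on B)
        N = [j for j in range(n) if j not in B]
        entering = [j for j in N if cbar[j] > 1e-9]
        if not entering:                         # certificate of optimality
            x = np.zeros(n); x[B] = xB
            return 'optimal', x
        k = min(entering)                        # Bland's rule: smallest subscript
        dB = np.linalg.solve(AB, A[:, k])        # change in x_B as x_k increases
        if np.all(dB <= 1e-9):                   # certificate of unboundedness
            x = np.zeros(n); x[B] = xB
            d = np.zeros(n); d[k] = 1.0; d[B] = -dB
            return 'unbounded', x, d
        # smallest ratio; ties broken by the smallest basic variable index
        t, leave, r = min((xB[i] / dB[i], B[i], i) for i in range(m) if dB[i] > 1e-9)
        B[r] = k

# Example: the LP (2.15) starting from the basis {3, 4, 5}.
A = [[1, 1, 1, 0, 0], [2, 1, 0, 1, 0], [-1, 1, 0, 0, 1]]
b, c = [6, 10, 4], [2, 3, 0, 0, 0]
status, x = simplex(A, b, c, B=[2, 3, 4])
print(status, x, np.array(c) @ x)    # optimal, (1, 5, 0, 3, 0), value 17
```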
Exercises
1. Consider the LP problem max{cT x : Ax = b, x ≥ 0} where
\[
A = \begin{pmatrix} 1 & 2 & -2 & 0 \\ 0 & 1 & 3 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 3 \\ 1 \\ 0 \end{pmatrix}.
\]
(a) Beginning with the basis B = {1, 4}, solve the problem with the simplex method. At
each step choose the entering variable by the smallest-subscript rule.
(b) Give a certificate of optimality or unboundedness for the problem, and verify it.
Notice that the problem is in canonical form with respect to the basis B = {4, 5, 6} and
that B determines a feasible basic solution x. Beginning from this canonical form, solve
the problem with the simplex method. At each step choose the entering variable by the
smallest-subscript rule. At termination, give a certificate of optimality or unboundedness.
4. Suppose at some step of the simplex procedure we have a feasible basis B and a linear
program
max{cT x : Ax = b, x ≥ 0}
which is in canonical form for the basis B. Following the simplex procedure we choose an
entering variable k and a leaving variable ℓ. At the next step of the simplex procedure we
will consider the basis B′ := B ∪ {k} \ {ℓ}. Explain why the new set B′ is in fact a basis.
Hint: Matrix AB is the identity matrix. Show that the columns of AB′ are linearly in-
dependent. Whether a set J ⊆ {1, . . . , n} is a basis or not, does not change when we do
row operations on A, or equivalently, it does not change when we left-multiply A by an
invertible matrix.
5. The princess's wedding ring can be made from four types of gold 1, 2, 3, 4 with the follow-
ing amounts of milligrams of impurity per gram, and with values as follows,
  Type           1   2   3   4
  mg of lead     1   2   2   1
  mg of cobalt   0   1   1   2
  value          1   2   3   2
Set up an LP which will determine the most valuable ring that can be made containing
at most 6mg of lead and at most 10mg of cobalt. Put the LP into SEF and then solve it
using the simplex algorithm. (For ease of marking, keep the variables in the natural order,
and use the smallest index rule for choosing the entering variable when doing a simplex
iteration.)
min{cT x : Ax = b, x ≥ 0} (Q)
(a) Indicate what changes you need to make to the algorithm to solve (Q) instead of (P)
(max was replaced by min). Do NOT convert (Q) into SEF, your algorithm should be
able to deal with (Q) directly.
(b) Explain why the solution is optimal for (Q) when your algorithm claims that it is.
(c) Explain why (Q) is unbounded when your algorithm claims that it is.
7. Let A be an m × n matrix (with independent rows), let c and ` be vectors with n entries and
let b be a vector with m entries. Consider the following linear program,
max{cᵀx : Ax = b, x ≥ ℓ}   (P)
Thus if ℓ = 0 then (P) is in SEF. One way to solve (P) would be for you to convert it into
SEF (using the procedure in class) and use the Simplex. Instead I want you to come up
with your own algorithm to solve (P) directly by modifying the simplex algorithm. Here
are some guidelines,
• When ℓ = 0 then your algorithm should be the same as the Simplex algorithm,
• Given a basis B, redefine x̄ to be basic if Ax̄ = b and for every j ∉ B, x̄j = ℓj (when
ℓ = 0 this corresponds to the standard definition). Define B to be feasible if the
corresponding basic solution x̄ is a feasible solution of (P).
• Your algorithm should go from feasible basis to feasible basis, attempting at each
step to increase the value of the corresponding basic solution. You need to indicate
how the entering and leaving variables are defined. You also need to indicate how to
detect when (P) is unbounded and when your solution is optimal.
Suppose you have in addition a vector u with n entries that give upper values for each of
the variables, i.e. you have the following linear program,
One way to solve (P’) would be for you to convert it into SEF and use the simplex. Instead
I want you to come up with your own algorithm to solve (P’) directly by modifying the
simplex algorithm.
Hint: Given a basis B redefine x̄ to be basic if Ax̄ = b and for every j ∉ B either xj = ℓj or
xj = uj. Suppose that you rewrite the objective function so that it is of the form
z = c̄T x + z̄,
where z̄ is some constant and c̄ j = 0 for every j ∈ B. Then show that a feasible basic
solution x̄ (using our new definition) is optimal if the following holds for every j 6∈ B,
• if c̄j < 0 then x̄j = ℓj,
• if c̄j > 0 then x̄j = uj.
8. Suppose that an LP in SEF is changed by defining x10 by x1 = 10x10 and substituting for x1 .
The new problem is equivalent to the old problem. If we now solve the new problem with
the simplex method (and apply the same rule for the choice of the leaving variable), will
the same choice of entering variables occur as when the old problem was solved? Discuss
this question for the four entering-variable rules mentioned below:
(a) The largest coefficient rule: When Dantzig first proposed the simplex algorithm he
suggested choosing among all j ∈ N with c j > 0, a k ∈ N such that ck is maximum.
This rule is sometimes called Dantzig’s rule or the largest coefficient rule. Based on
the formula for the objective value for the new basic solution, it chooses the variable
giving the largest increase in the objective function per unit increase in the value of
the entering variable.
(b) The smallest subscript rule.
The largest improvement rule. This is the rule that chooses the entering variable to be
the one that leads to the largest increase in the objective value. To choose the variable,
therefore, we have to compute for each j ∈ N for which c j > 0, the amount t j by which
we will be able to increase the value of x j , and find the maximum of c j t j over all such
j and choose the corresponding index k,
(c) The steepest edge rule. This rule is geometrically motivated. In moving from the
current basic feasible solution to a new one, we move a certain distance in Rn . This
rule chooses the variable which gives the largest increase in the objective value per
unit distance moved. Suppose that k ∈ N is chosen and the new value of xk will be t.
Then the change in the value of the basic variable xi is −Aikt. Since there is no change
in the value of the other nonbasic variables, the distance moved is
\[
\sqrt{t^2 + \sum_{i \in B} (A_{ik}t)^2} \;=\; t\sqrt{1 + \sum_{i \in B} A_{ik}^2}.
\]
Since the change in the objective value is t ck, we choose k ∈ N with ck > 0 so that
\[
\frac{c_k}{\sqrt{1 + \sum_{i \in B} A_{ik}^2}}
\]
is maximized.
9. Show that the simplex method can cycle in twelve iterations beginning from the following
tableau. (Hint: x3 enters on the first iteration, and x4 enters on the second. Now do you
see the pattern?)
10. Consider the Simplex Method applied to an LP problem in SEF with data (A, b, c) (A ∈
Rm×n , rank(A) = m).
B1, B2, . . . , Bt, Bt+1 = B1. Define
\[
J_0 := \{\, j : j \in B_{i_1} \text{ and } j \notin B_{i_2} \text{ for some } i_1, i_2 \in \{1, 2, \ldots, t\} \,\}.
\]
Let ℓ := max{j : j ∈ J0}. Note that there exist h1, h2 ∈ {1, 2, . . . , t} such that when the
current basis in the Simplex Method is Bh1, xℓ leaves the basis, and when the current
basis is Bh2, xℓ enters the basis. What is the value of xi for every i ∈ J0 in all the basic
feasible solutions visited throughout the cycle? Justify your answer.
(c) When xℓ leaves the basis, the Simplex Method constructs a d vector (a direction to
move along). Which d j are zero? Which d j are positive? Which d j are negative?
Which d j are non-negative? Justify your answer.
(d) Now, consider the Simplex Tableau when xℓ enters the basis. Which c̄j are zero?
Which c̄ j are positive? Which c̄ j are non-positive? Justify your answer.
(e) Now, find a contradiction by considering cT d and c̄T d for these two iterations. and
conclude that starting with a basic feasible solution, the Simplex Method with the
smallest subscript rules terminates after finitely many iterations.
2.6 Finding feasible solutions

2.6.1 Reducibility
In Section 1.2.1 we saw an inventory problem (the KWoil problem) with three time periods.
We showed that this problem can be formulated as an LP. Moreover, it is easy to see that if
we generalize this inventory problem to one with an arbitrary number of time periods, it can
still be formulated as an LP. Thus, assuming that we have an efficient algorithm to solve linear
programs, we can find an efficient algorithm to solve any of these inventory problems. We
say that this class of inventory problem is reducible to linear programming. More generally,
consider two classes of optimization problems, say A and B. If given an efficient algorithm to
solve all instances of problem B, we can solve every instance of problem A efficiently, then
we say A is reducible to B.
Consider the following linear program in SEF,
max{cT x : Ax = b, x ≥ 0} (P)
Problem A.
Either,
1. prove that (P) has no feasible solution, or
2. prove that (P) is unbounded, or
3. find an optimal solution to (P).
Problem B.
Either,
1. prove that (P) has no feasible solution, or
2. find a feasible solution to (P).
Problem C.
Given a feasible solution of (P), either:
1. prove that (P) is unbounded, or
2. find an optimal solution to (P).
The problem we wish to solve is problem A. Note, that the simplex procedure essentially
solves problem C (with the difference that we need a feasible basic solution rather than an
arbitrary feasible solution, this will be addressed in Section 2.6.2). We will show the following
result:
Proposition 2.8. Problem B is reducible to problem C.
Suppose you have an algorithm to solve C and wish to solve A. You proceed as follows:
Proposition 2.8 implies that you can solve problem B. If we get that (P) has no feasible so-
lution we can stop as we solved A. Otherwise, we obtain a feasible solution x̄ of (P). Using
this solution x̄ we can use our algorithm to solve C to either deduce that (P) is unbounded or
to find an optimal solution to (P). Hence, we have solved A as well. Hence, Proposition 2.8
implies that problem A is reducible to problem C.
Consider for example the following linear program (P) with variables x1, x2, x3, x4,
max (1, 2, −1, 3)x
subject to
\[
\begin{pmatrix} 1 & 5 & 2 & 1 \\ -2 & -9 & 0 & 3 \end{pmatrix} x = \begin{pmatrix} 7 \\ -13 \end{pmatrix}
\]
x≥0
Using an algorithm for problem C we wish to find a feasible solution for (P) if it exists or
show that (P) has no feasible solution. Let us first rewrite the constraints of (P) by multiplying
every equation by −1, where the right hand side is negative, we obtain,
\[
\begin{pmatrix} 1 & 5 & 2 & 1 \\ 2 & 9 & 0 & -3 \end{pmatrix} x = \begin{pmatrix} 7 \\ 13 \end{pmatrix},
\]
x = (x1 , x2 , x3 , x4 )T ≥ 0.
We now define the following auxiliary linear program,
min x5 + x6
subject to
\[
\begin{pmatrix} 1 & 5 & 2 & 1 & 1 & 0 \\ 2 & 9 & 0 & -3 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 7 \\ 13 \end{pmatrix} \qquad \text{(Q)}
\]
x = (x1 , x2 , x3 , x4 , x5 , x6 )T ≥ 0.
Variables x5 , x6 are the auxiliary variables. Observe that B = {5, 6} is a basis, and that the cor-
responding basic solution x̄ = (0, 0, 0, 0, 7, 13)T is feasible, since all entries are non-negative.
Note, that this is the case since we made sure that the right hand side of the constraints of
(Q) are all non-negative. We can use the algorithm for problem C to solve (Q). Note that 0
is a lower bound for (Q), hence (Q) is not unbounded. It follows that the algorithm for C
will find an optimal solution. In this case the optimal solution is x′ = (2, 1, 0, 0, 0, 0)ᵀ. Since
x′5 = x′6 = 0 it follows that (2, 1, 0, 0)ᵀ is a feasible solution to (P). Hence, we have solved
problem B.
Consider a second example and suppose that (P) is the following linear program,
max (6, 1, −1)x
subject to
\[
\begin{pmatrix} 5 & 1 & 1 \\ -1 & 1 & 2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \end{pmatrix}
\]
x = (x1 , x2 , x3 )T ≥ 0.
min x4 + x5
subject to
\[
\begin{pmatrix} 5 & 1 & 1 & 1 & 0 \\ -1 & 1 & 2 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \end{pmatrix} \qquad \text{(Q)}
\]
x = (x1 , x2 , x3 , x4 , x5 )T ≥ 0.
An optimal solution to (Q) is x′ = (0, 0, 1, 0, 3)ᵀ which has value 3. We claim that (P) has no
feasible solution in this case. Suppose for a contradiction that there was a feasible solution
x̄1 , x̄2 , x̄3 to (P). Then x̄ = (x̄1 , x̄2 , x̄3 , 0, 0)T would be a feasible solution to (Q) of value 0,
contradicting the fact that x0 is optimal.
Let us summarize these observations. We consider
max{cT x : Ax = b, x ≥ 0} (P)
where A has m rows and n columns. We may assume that b ≥ 0 as we can multiply any
equation by −1 without changing the problem. We construct the auxiliary problem
\[
w = \min\Big\{ \sum_{i=1}^{m} x_{n+i} \ :\ (A \mid I)\,(x_1, \ldots, x_{n+m})^T = b,\ x \ge 0 \Big\}. \qquad \text{(Q)}
\]
We leave the proof of the next remark as an easy exercise (follow the argument outlined in the
aforementioned examples):
Remark 2.10. The optimal value w of (Q) is zero if and only if (P) has a feasible solution.
Moreover, if x′ is an optimal solution of (Q) of value zero, then (x′1, . . . , x′n)ᵀ is a feasible
solution of (P).
We can now prove Proposition 2.8. Construct from (P) the auxiliary problem (Q). Find an
optimal solution x0 for (Q) using the algorithm for Problem C. Then (P) has a feasible solution
if and only if w = 0 as indicated in the previous Remark.
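Building the auxiliary problem is mechanical, and the resulting LP can then be handed to the Simplex sketch from the previous section, starting from the basis of auxiliary columns. The helper below is an illustration of that construction (the function name and return convention are my own, not the text's).

```python
import numpy as np

def auxiliary_problem(A, b):
    """Phase-one auxiliary LP for max{c^T x : Ax = b, x >= 0}.

    Rows with a negative right hand side are first multiplied by -1, so the
    identity columns of the auxiliary variables give a feasible basic solution.
    Returns (A_aux, b_aux, c_aux), with c_aux chosen so that maximizing
    c_aux^T x is the same as minimizing the sum of the auxiliary variables."""
    A = np.asarray(A, float).copy()
    b = np.asarray(b, float).copy()
    neg = b < 0
    A[neg] *= -1
    b[neg] *= -1
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])
    c_aux = np.concatenate([np.zeros(n), -np.ones(m)])
    return A_aux, b, c_aux

# First example from the text: the auxiliary optimum is 0, so (P) is feasible.
A_aux, b_aux, c_aux = auxiliary_problem([[1, 5, 2, 1], [-2, -9, 0, 3]], [7, -13])
print(A_aux, b_aux, c_aux)   # starting feasible basis: the two auxiliary columns
```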
Note that the objective function is equivalent to min x4 + x5 . B = {4, 5} is a feasible basis,
however (Q) is not in canonical form for B. We could use the formulae in Proposition 2.4 to
rewrite (Q) in canonical form, but a simpler approach is to add each of the two equations of
(Q) to the objective function. The resulting linear program is,
max (2, 1, 0, 0, 0)x − 4
subject to
\[
\begin{pmatrix} 1 & 2 & -1 & 1 & 0 \\ 1 & -1 & 1 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \qquad x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\]
Solving this linear program using the Simplex algorithm starting from B = {4, 5} we obtain
the optimal basis B = {1, 3}. The canonical form for B is,
max (0, 0, 0, −1, −1)x
subject to
\[
\begin{pmatrix} 1 & 1/2 & 0 & 1/2 & 1/2 \\ 0 & -3/2 & 1 & -1/2 & 1/2 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0. \qquad \text{(Q')}
\]
The basic solution corresponding to B is x̄ = (2, 0, 1, 0, 0)T which has value 0. It follows
from Remark 2.10 that (2, 0, 1)T is a feasible solution for (P). Moreover, (2, 0, 1)T is the basic
solution for basis B of (P), hence B is a feasible basis of (P). Note, it is always true that the
feasible solution we construct after solving (Q) using the Simplex procedure will be a basic
solution of (P). It need not be the case that B be a basis of (P), however.
Phase II.
We can use the formulae in Proposition 2.4 to rewrite (P) in canonical form for the basis
B = {1, 3}. Note that to obtain the constraints we can use the constraints of (Q’) (omitting the
auxiliary variables). We obtain,
max (0, 1, 0)x + 6
subject to
\[
\begin{pmatrix} 1 & 1/2 & 0 \\ 0 & -3/2 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad x = (x_1, x_2, x_3)^T \ge 0.
\]
Solving this linear program using the Simplex algorithm starting from B = {1, 3} we obtain
the optimal basis B = {2, 3}. The canonical form for B is,
max (−2, 0, 0)x + 10
subject to
\[
\begin{pmatrix} 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 4 \\ 7 \end{pmatrix}, \qquad x \ge 0.
\]
Then the basic solution (0, 4, 7)T is an optimal solution for (P).
2.6.3 Consequences
Suppose that we are given an arbitrary linear program (P) in SEF. Let us run the Two Phase
method for (P) using the smallest subscript rule. Theorem 2.7 implies that the Two Phase
method will terminate. The following now is a consequence of our previous discussion,
1. if (P) has a feasible solution, then (P) has a basic feasible solution;
2. if (P) has an optimal solution, then (P) has a basic feasible solution that is optimal.
Since we can convert any LP problem into SEF while preserving the main property of the LP,
the above theorem yields the following result for LP problems in any form.
Exactly one of the following holds:
1. (P) is infeasible,
2. (P) is unbounded,
3. (P) has an optimal solution.
Exercises
1. Consider an LP of the form max{cT x : Ax = b, x ≥ 0}. Use the Two Phase Simplex Method
using the smallest index rule, to solve the LP for each of the following cases,
(a)
\[
A = \begin{pmatrix} 1 & -2 & 1 \\ -1 & 3 & -2 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ -3 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}.
\]
(b)
\[
A = \begin{pmatrix} 2 & 4 & 2 \\ -1 & -3 & -2 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad
c = \begin{pmatrix} 27 \\ 2 \\ -6 \end{pmatrix}.
\]
(c)
\[
A = \begin{pmatrix} 1 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}.
\]
(d)
\[
A = \begin{pmatrix} -1 & 1 & 0 & 0 & -1 \\ 1 & 0 & -1 & 1 & 2 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \\ 3 \end{pmatrix}.
\]
(e)
\[
A = \begin{pmatrix} 2 & -1 & 4 & -2 & 1 \\ -1 & 0 & -3 & 1 & -1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} -3 \\ -1 \\ 1 \\ 4 \\ 7 \end{pmatrix}.
\]
2.7 Pivoting
The Simplex algorithm requires us to reformulate the problem in canonical form for every ba-
sis. We can describe the computation required between two consecutive iteration in a compact
way. We explain this next. Let T be an m × n matrix and consider (i, j), where i ∈ {1, . . . , m}
and j ∈ {1, . . . , n}, such that Ti,j ≠ 0. We say that the matrix T′ is obtained from T by pivoting on
element (i, j) if T′ is defined as follows: for every row index k,
\[
\mathrm{row}_k(T') = \begin{cases} \tfrac{1}{T_{i,j}}\,\mathrm{row}_i(T) & \text{if } k = i, \\[4pt] \mathrm{row}_k(T) - \tfrac{T_{k,j}}{T_{i,j}}\,\mathrm{row}_i(T) & \text{if } k \ne i. \end{cases}
\]
Observe, that the effect of pivoting on element (i, j) is to transform column j of the matrix T
into the vector ei (the vector where all the entries are zero except for entry i which is equal to
1).
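Pivoting is a two-line row operation. The following sketch implements it exactly as defined above and, as a check, reproduces the step from T¹ to T² that appears later in this section.

```python
import numpy as np

def pivot(T, i, j):
    """Return the tableau obtained from T by pivoting on element (i, j):
    row i is scaled so entry (i, j) becomes 1, and a multiple of row i is
    subtracted from every other row so that column j becomes the unit vector e_i."""
    T = np.asarray(T, dtype=float).copy()
    T[i] = T[i] / T[i, j]
    for k in range(T.shape[0]):
        if k != i:
            T[k] = T[k] - T[k, j] * T[i]
    return T

# Pivoting T^1 (see below) on element (2, 1) yields T^2.
T1 = [[1, -2, -3, 0, 0, 0, 0],
      [0, 1, 1, 1, 0, 0, 6],
      [0, 2, 1, 0, 1, 0, 10],
      [0, -1, 1, 0, 0, 1, 4]]
print(pivot(T1, 2, 1))
```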
The students should verify that T′ = Y T, where
\[
Y = \begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 0 \\ 0 & 1 & 1 \end{pmatrix}.
\]
Consider now the system of equations
\[
\begin{pmatrix} 2 & 2 & -1 & 0 \\ 0 & 3 & 2 & 3 \\ 3 & -1 & -2 & 1 \end{pmatrix} x = \begin{pmatrix} 3 \\ -5 \\ 5 \end{pmatrix}. \tag{2.29}
\]
We can represent this system by the matrix T . Namely, T is obtained from the coefficients
of the left hand side by adding an extra column corresponding to the right hand side. Given
T′ we may construct a system of equations where we do the aforementioned operations in
reverse, namely, all but the last column of T′ correspond to the left hand side and
the last column corresponds to the right hand side. Then we get
\[
\begin{pmatrix} 2 & 7/2 & 0 & 3/2 \\ 0 & 3/2 & 1 & 3/2 \\ 3 & 2 & 0 & 4 \end{pmatrix} x = \begin{pmatrix} 1/2 \\ -5/2 \\ 0 \end{pmatrix}. \tag{2.30}
\]
Since T′ = Y T it follows that equation (2.30) is obtained from equation (2.29) by left multi-
plying by the matrix Y . Observe that the matrix Y is non-singular, hence the set of solutions
for (2.30) is the same as for (2.29). Hence, we used pivoting to derive an equivalent system of
equations.
We can proceed in a similar way in general. Given a system Ax = b we construct a matrix
T by adding to A an extra column corresponding to b, i.e. T = (A | b). Let T′ = (A′ | b′) be
obtained from T by pivoting. Then T′ = Y T for some non-singular matrix Y. It follows that
A′ = Y A and b′ = Y b. So, in particular, Ax = b and A′x = b′ have the same set of solutions.
We say that T is the tableau representing the system Ax = b.
To show how this discussion relates to the Simplex algorithm, let us revisit example (2.15),
Note, the first constraint of (2.31) states that z − 2x1 − 3x2 = 0, i.e. that z = 2x1 + 3x2, which
is the objective function. Let T 1 be the tableau representing the system (2.31), namely,
\[
T^1 = \begin{pmatrix}
1 & -2 & -3 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 6 \\
0 & 2 & 1 & 0 & 1 & 0 & 10 \\
0 & -1 & 1 & 0 & 0 & 1 & 4
\end{pmatrix}.
\]
Note, for readability the vertical bars separate the z columns and the right hand side. The
horizontal separates the objective function from the equality constraints. Let us index the
column z by element 0 and the constraint corresponding to the objective function by 0 as well.
Thus the first row and first column of T 1 are row and column zero. Observe, that the linear
program is in canonical form for B = {3, 4, 5} as the columns of T 1 indexed by {0, 3, 4, 5},
form an identity matrix.
In general given the linear program,
max{z = z̄ + cT x : Ax = b, x ≥ 0} (P)
Let us try to solve the linear program working with the tableau T 1 only. We select as
an entering variable k ∈ N such that ck > 0. In terms of T¹ this means that we are selecting a
column k ∈ {1, . . . , 5} where T¹₀,k < 0. We can select column 1 or 2; say we select column
k = 1. Note that b corresponds to column 6 (rows 1 to 3) of T¹. We select the row index
i ∈ {1, 2, 3} minimizing the ratio T¹ᵢ,₆ / T¹ᵢ,k over the rows where T¹ᵢ,k > 0, i.e. we consider
\[
\min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\},
\]
where the minimum is attained for row i = 2. Let us now pivot on the element (i, k) = (2, 1).
We obtain the following tableau,
\[
T^2 = \begin{pmatrix}
1 & 0 & -2 & 0 & 1 & 0 & 10 \\
0 & 0 & 1/2 & 1 & -1/2 & 0 & 1 \\
0 & 1 & 1/2 & 0 & 1/2 & 0 & 5 \\
0 & 0 & 3/2 & 0 & 1/2 & 1 & 9
\end{pmatrix}.
\]
Since we pivoted on (2, 1), column 1 of T 2 will have a 1 in row 2 and all other elements
will be zero. Since row 2 has zeros in columns 0, 3, 5 these columns will be unchanged in
T 2 . It follows that columns 0, 1, 3, 5 will form a permutation matrix in T 2 . We could permute
rows 1,2 of T 2 so that columns indexed by 0, 1, 3, 5 form an identity matrix, and therefore that
T 2 represents an LP in canonical form for the basis B = {1, 3, 5}. However, reordering the
rows will prove unnecessary in this procedure. The Simplex procedure consists of selecting a
column j, selecting a row i and pivoting on (i, j).
We state the remaining sequence of tableaus obtained to solve (2.15).
\[
T^3 = \begin{pmatrix}
1 & 0 & 0 & 4 & -1 & 0 & 14 \\
0 & 0 & 1 & 2 & -1 & 0 & 2 \\
0 & 1 & 0 & -1 & 1 & 0 & 4 \\
0 & 0 & 0 & -3 & 2 & 1 & 6
\end{pmatrix},
\]
and finally,
\[
T^4 = \begin{pmatrix}
1 & 0 & 0 & 5/2 & 0 & 1/2 & 17 \\
0 & 0 & 1 & 1/2 & 0 & 1/2 & 5 \\
0 & 1 & 0 & 1/2 & 0 & 1/2 & 1 \\
0 & 0 & 0 & -3/2 & 1 & 1/2 & 3
\end{pmatrix}.
\]
The corresponding objective function is now z = 17 − (5/2)x3 − (1/2)x5. Hence the basic solution
x = (1, 5, 0, 3, 0)T is optimal.
2.8 Geometry
In this section, we introduce a number of geometric concepts and will interpret much of the
material defined in the previous section through the lens of geometry. Questions that we will
address include: what can we say about the shape of the set of solutions to a linear program,
how are basic feasible solutions distinguished from the set of all feasible solutions, what does
that say about the simplex algorithm.
max (c1 , c2 )x
s.t.
\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} x \le
\begin{pmatrix} 3 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix}
\qquad
\begin{matrix} (1) \\ (2) \\ (3) \\ (4) \\ (5) \end{matrix}
\tag{2.32}
\]
We represented the set of all feasible solutions to (2.32) in the following figure. The set of
all points (x1, x2)ᵀ satisfying constraint (2) with equality corresponds to line (2). The set of
all points satisfying constraint (2) corresponds to all points on or below line (2). A similar
argument holds for constraints (1), (3), (4) and (5). Hence, the set of all feasible solutions of
(2.32) is the shaded region (called the feasible region). Looking at examples in R2 as above
can be somewhat misleading however. In order to get the right geometric intuition we need
to introduce a number of definitions. Given a vector x = (x1, . . . , xn), the Euclidean norm of x is defined as
\[
\|x\| := \sqrt{x_1^2 + \cdots + x_n^2}.
\]
‖x‖ is the length of the vector x and measures the distance of x from the origin 0.
Remark 2.13. Let a, b ∈ Rn . Then, aT b = kakkbk cos(θ ), where θ is the angle between a and
b. Therefore, for every pair of nonzero vectors a, b, we have
\[
\cos(\theta) = \frac{a^T b}{\|a\|\,\|b\|}.
\]
[Figure 2.1: Feasible region of (2.32), bounded by the lines corresponding to constraints (1)–(5).]
Let a be a non-zero vector with n entries and let β be a real number. Then:
1. H := {x ∈ Rⁿ : aᵀx = β} is a hyperplane, and
2. F := {x ∈ Rⁿ : aᵀx ≤ β} is a halfspace.
Hence, H is the set of points satisfying the constraint aᵀx ≤ β with equality and F is the set of points
satisfying that constraint. Suppose that x̄ ∈ H and let x be any other point in H. Then aᵀx̄ =
aT x = β . Equivalently, aT (x − x̄) = 0, i.e. a and x − x̄ are orthogonal. This implies (1) in the
following remark; we leave (2) as an exercise.
1. H is the set of points x for which a and x − x̄ are orthogonal.
2. F is the set of points x for which a and x − x̄ form an angle of at least 90°.
We illustrate the previous remark in the following figure. The line is the hyperplane H and
the shaded region is the halfspace F.
[Figure: the hyperplane H (a line), the halfspace F (shaded), and the vector a at a point x̄ ∈ H.]
dim{x : Ax = 0} + rank(A) = n.
2.8.2 Convexity
Let x⁽¹⁾, x⁽²⁾ be two points in Rⁿ. We define the line through x⁽¹⁾ and x⁽²⁾ to be the set of points
\[
\{\, \lambda x^{(1)} + (1-\lambda)x^{(2)} \ :\ \lambda \in \mathbb{R} \,\},
\]
and we define the line segment with ends x⁽¹⁾ and x⁽²⁾ to be the set of points
\[
\{\, \lambda x^{(1)} + (1-\lambda)x^{(2)} \ :\ 0 \le \lambda \le 1 \,\}.
\]
Observe that the aforementioned definitions correspond to the commonly used notions of lines
and line segments. A subset C of Rn is said to be convex if for every pair of points x(1) and
x(2) in C the line segment with ends x(1) , x(2) is included in C.
2.8. GEOMETRY 105
The shaded regions in figure (i) contained in R2 is convex, as is the shaded region (iii)
contained in R3 . The shaded regions corresponding to figures (ii) and (iv) are not convex. We
prove this for either case by exhibiting two points x(1) , x(2) inside the shaded region for which
the line segment with ends x(1) , x(2) is not completely included in the shaded region.
x (1) x (2)
x (2)
x (1)
(i) (ii) (iii) (iv)
Hence, x̄ ∈ H as required.
Remark 2.16. For every j ∈ J let C j denote a convex set. Then the intersection
\
C := {C j : j ∈ J}
Proof. Let x(1) and x(2) be two points that are in C. Then for every j ∈ J, x(1) , x(2) ∈ C j and
since C j is convex the line segment between x(1) and x(2) is in C j . It follows that the line
segment between x(1) and x(2) is in C. Hence, C is convex.
Note, that the unit ball is an example of a convex set that is not a polyhedron. It is the
intersection of an infinite number of halfspaces (hence, convex) but cannot be expressed as
the intersection of a finite number of halfspaces.
106 CHAPTER 2. SOLVING LINEAR PROGRAMS
x = λ x(1) + (1 − λ )x(2)
In figures (i) and (ii) the shaded regions represent convex sets
included in R2 . In (i) we indicate each of the six extreme points
by small dark circles. Note that for (ii) every point in the bound-
ary of the shaded figure is an extreme point. This shows in par- (i)
ticular that a convex set can have an infinite number of extreme
points. In figure (ii) we illustrate why the point x is not extreme
x (1) x
by exhibiting a line segment with ends x(1) , x(2) which are con- x (2)
tained in the shaded figure and that properly contains x.
(ii)
Next, we present a theorem that will characterize the extreme
points in a polyhedron. We first need to introduce some notation
and definitions. Let Ax ≤ b be a system of inequalities and let x̄ denote a solution to Ax ≤ b.
We say that a constraint of aT x ≤ β of Ax ≤ b is tight for x̄ if aT x̄ = β . Such constraints are
also called active in part of the literature. We denote the set of all inequalities among Ax ≤ b
that are tight for x̄ by A= x = b= .
We will illustrate this theorem on the polyhedron P that is a feasible region of the linear
program (2.32), see Figure (2.1). Suppose x̄ = (1, 2)T . We can see in that figure that x̄ is an
extreme point. Let us verify that this is what the previous theorem also indicates. Constraints
(1) and (2) are tight, hence,
= 1 1
A = .
0 1
2.8. GEOMETRY 107
It follows that rank(A= ) = 2 = n, hence x̄ is indeed an extreme point. Suppose x̄ = (0, 1)T . We
can see in that figure that x̄ is not an extreme point. Let us verify that this is what the previous
theorem also indicates. Constraint (4) is the only tight constraint, hence,
A= = −1 0 .
Proof of Theorem 2.19. Assume that rank(A= ) = n. We will show that x̄ is an extreme point.
Suppose for a contradiction this is not the case. Then there exist (see Remark 2.18) x(1) , x(2) ∈
P, where x(1) 6= x(2) and λ where 0 < λ < 1 for which x̄ = λ x(1) + (1 − λ )x(2) . Thus,
b= = A= x̄ = A= λ x(1) + (1 − λ )x(2) = |{z}
λ A = (1)
x(2)} ≤ λ b= + (1 − λ )b= = b= .
x } + (1 − λ ) |A={z
| {z | {z }
>0 ≤b= >0 ≤b=
Hence, we have equality throughout which implies that A= x(1) = A= x(2) = b= . As rank(A= ) = n
there is a unique solution to A= x = b= . Therefore, x̄ = x(1) = x(2) , a contradiction.
Assume that rank(A= ) < n. We will show that x̄ is not an extreme point. Since rank(A= ) < n
the columns of A are linearly dependent, and hence there is a non-zero vector d such that
A= d = 0. Pick ε > 0 small and define
Hence, x̄ = 21 x(1) + 12 x(2) and x(1) , x(2) are distinct. It follows that x̄ is in the line segment
between x(1) and x(2) . It remains to show that x(1) , x(2) ∈ P for ε > 0 small enough. Observe
first that
A= x(1) = A= (x̄ + εd) = |{z} A= d = b= .
A= x̄ +ε |{z}
=b= =0
Similarly, A =
x(2) = b . Let
=
aT x ≤ β be any of the inequalities of Ax ≤ b that is not in A= x ≤ b= .
It follows that for ε > 0 small enough,
Note that x̄ = (0, 0, 2, 1)T is a basic feasible solution. We claim that x̄ is an extreme point of
P. To be able to apply Theorem 2.19 we need to rewrite P as the set of solutions to Ax ≤ b for
some matrix A and vector b. This can be done by choosing,
1 3 1 0 2
2 2 0 1 1
−1 −3 −1 0 −2
−2 −2 0 −1 −1
A= −1 0
and b=
0 .
0 0
0 −1 0 0 0
0 0 −1 0 0
0 0 0 −1 0.
Let A= x ≤ b= be the set of tight constraints for x̄, then
1 3 1 0
2 2 0 1
−1 −3 −1 0
A =
=
−2 −2 0 −1
−1 0 0 0
0 −1 0 0
and it can be readily checked that the first two and last two rows of A= form a set of 4 linearly
independent rows. Hence rank(A= ) ≥ 4 = n. This implies (as x̄ is feasible) by Theorem 2.19
that x̄ is an extreme point of P. Using the idea outlined in the previous example we leave
it as an exercise to prove the following theorem which relates basic feasible solutions (for
problems in standard equality form) to extreme points,
Theorem 2.20. Let A be a matrix where the rows are linearly independent and let b be a
vector. Let P = {x : Ax = b, x ≥ 0} and let x̄ ∈ P. Then x̄ is an extreme point of P if and only
if x̄ is a basic solution of Ax = b.
x2
10
(1)
z=17 8
(2)
6
x*
(3) -4 -2 0 2 4 6 8 x1
In Figure 2.2 we indicate the feasible region of this linear program as well as a feasible
solution x∗ = (1, 5)T . The line z = 17 indicates the set of all vectors for which the objective
function evaluates to 17. The points above this line have objective value greater than 17 and
the points below this line have value less than 17. Since there are no points in the feasible
region above the line z = 17, it follows that x∗ is an optimal solution. As the line z = 17
intersects the feasible region in only one point, namely x∗ , this also implies that x∗ is the
unique optimal solution.
Let us use the simplex algorithm to find this optimal solution x∗ . We first need to refor-
mulate this problem in standard equality form. This can be achieved by introducing slack
variables x3 , x4 , x5 for constraints respectively (1),(2) and (3) of (P). We obtain,
i.e. the components x̃3 , x̃4 , x̃5 are defined as the value of the slack of the constraints (1),(2) and
(3) respectively of (P). Thus, x is feasible for (P) if and only if x̃ is feasible for (P̃). Suppose
x is a feasible solution of (P) which is not an extreme point of the feasible region. Then x
is properly contained in the line segment with ends x(1) , x(2) where x(1) , x(2) are feasible for
(P), i.e. x(1) 6= x(2) and there exists λ such that 0 < λ < 1 and x = λ x(1) + (1 − λ )x(2) . It
can be readily checked that x̃ = λ x̃1 + (1 − λ )x̃2 . Hence, x̃ is properly contained in the line
segment with ends x̃1 , x̃2 . In particular, x̃ is not an extreme point of the feasible region of (P̃).
Conversely, if x̃ is not an extreme point for (P̃) then x is not an extreme point for (P). Hence,
Remark 2.21. x is an extreme point for the feasible region of (P) if and only if x̃ is an extreme
point for the feasible region of (P̃).
Starting in Section 2.3 we solved the linear program (P̃). The following table summarizes
the sequence of bases and basic solutions we obtained,
ITERATION BASIS x̃T xT
1 {3, 4, 5} (0, 0, 10, 6, 4) (0, 0)
2 {1, 4, 5} (5, 0, 0, 1, 9) (5, 0)
3 {1, 2, 5} (4, 2, 0, 0, 6) (4, 2)
4 {1, 2, 3} (1, 5, 3, 0, 0) (1, 5)
At each step, x̃ is a basic solution. It follows from Theorem 2.20 that x̃ must be an extreme
point of the feasible region of (P̃). Hence, by Remark 2.21, x must be an extreme point of the
feasible region of (P). We illustrate this in Figure 2.3. Each of x(1) , x(2) , x(3) , x(4) is an extreme
point and the simplex moves from one extreme point to another “adjacent” extreme point. In
this example, at each iteration, we move to a different basic feasible solution. The simplex
goes from one feasible basis to another feasible basis at each iteration. It is possible however,
that the corresponding basic solutions for two successive bases are the same. Thus the simplex
algorithm can keep the same feasible basic solution for a number of iterations. With a suitable
rule for the choice of entering and leaving variables the simplex will eventually move to a
different basic solution, see Section 2.5.3.
2.8. GEOMETRY 111
x2
10
6
x(4)
x(3)
2
x(1) x(2)
-4 -2 0 2 4 6 8 x1
Exercises
1. (a) Show that the set of all optimal solutions to a linear program is a convex set.
(b) Deduce that a linear program has either:
• no optimal solution,
• exactly one optimal solution, or
• an infinite number of optimal solutions.
Hint: Use the Cauchy-Schwartz inequality, namely, for every pair of vectors x, y we have,
xT y ≤ kxk kyk.
Show that the set of all n × n Positive Semidefinite matrices forms a convex set.
Let C be a subset of ℜn .
E XTRA CREDIT.
We say that x is a convex combination of points x1 , . . . , xk if x = ∑kj=1 λ j x j for some real
numbers λ1 , . . . , λk where ∑kj=1 λ j = 1 and λ j ≥ 0 for all j = 1, . . . , k.
2.8. GEOMETRY 113
(d) Show that if for all x1 , . . . , xk ∈ C, all convex combinations of x1 , . . . , xk are in C, then
C is convex.
(e) Show that, if C is convex, then for all x1 , . . . , xk ∈ C, all convex combinations of
x1 , . . . , xk are in C.
7. For each of the following sets, either prove that it is not a polyhedron, or give a matrix A
and a vector b such that the set is the solution set to Ax ≤ b,
(a) Show that if x̄ is not an extreme point of C then C \ {x̄} is not convex.
(b) Show that if x̄ is an extreme point of C then C \ {x̄} is convex.
(a) Convert this LP into SEF in the standard way, and call this LP2. Solve LP2 by applying
the simplex algorithm. Make sure you start with the basis {3, 4}. In each iteration, for
the choice of the entering variable amongst all eligible variables, always choose the
one with the smallest index.
(b) Give a diagram showing the set of feasible solutions of the LP, and show the or-
der that the Simplex algorithm visits its extreme points. (For each extreme point
(x1 , x2 , x3 , x4 )T of LP2 visited by the Simplex algorithm, you should indicate the ex-
treme point (x1 , x2 )T of the original LP.
In this chapter we revisit the shortest path and minimum-cost matching problems. Both were
first introduced in Chapter 1, where we discussed practical example applications. We further
showed that these problems can be expressed as IPs. The focus in this chapter will be on
solving instances of the shortest-path and matching problems. Our starting point will be to use
the IP formulation we introduced in Chapter 1.5 We will show that studying the two problems
through the lens of linear programming duality will allow us to design efficient algorithms.
We develop this theory further in Chapter 4.
Recall the shortest path problem from Chapter 1.4.1. We are given a graph G = (V, E), non-
negative lengths ce for all edges e ∈ E, and two distinct vertices s,t ∈ V . The length c(P) of
a path P is the sum of the length of its edges, i.e. ∑(ce : e ∈ P). We wish to find among all
possible st-paths one that is of minimum length.
Example 7. In the following figure we show an instance of this problem. Each of the edges
in the graph is labeled by its length. The thick black edges in the graph form an st-path
P = sa, ac, cb, bt of total length 3 + 1 + 2 + 1 = 7. This st-path is of minimum length, hence
is a solution to our problem.
115
116 CHAPTER 3. DUALITY THROUGH EXAMPLES
a 4 b
3 1
1 2
s t
4 2
c 2 d
d
h
s t
j i g
Let us show that the path P = s j, ji, ig, gt is a shortest st-path of G. It has length 4. To show
that it is a shortest path, it suffices to show that every st-path has length at least 4. To do this
we exhibit the following collection of st-cuts (see the next figure),
where,
U1 = {s}, U2 = {s, a, j}, U3 = {s, a, j, b, h, i}, and U4 = V \ {t}.
Observe, that no two of these st-cuts share an edge. Let Q be an arbitrary st-path. We know
from Remark 1.1 that every st-path and st-cut have a common edge. Hence, Q must contains
an edge ei of δ (Ui ), for i = 1, 2, 3, 4. Since these st-cuts are disjoint, e1 , e2 , e3 , e4 are distinct
edges. In particular, Q contains at least 4 edges. As Q was an arbitrary st-path, every st-path
contains at least 4 edges. It follows that P = s j, ji, ig, gt is a shortest st-path of G. The
a b c
d
h
s t
δ(U1 ) δ(U4 )
j i g
δ(U2 ) δ(U3 )
Feasibility condition. For every edge e ∈ E, the total width of all st-cuts that (3.1)
contain e does not exceed the length of e.
Example 7, continued. We assign the following widths for the graph G in Example 7.
ȳU1 = 3 ȳU2 = 1
ȳU3 = 2 ȳU4 = 1
where
U1 = {s}, U2 = {s, a}, U3 = {s, a, c}, and U4 = {s, a, c, b, d}.
1 while it might be more natural to use the notation ȳδ (U) to denote the width of the st-cut δ (U) the notation
ȳU is more compact while being unambiguous.
118 CHAPTER 3. DUALITY THROUGH EXAMPLES
The other st-cuts are assigned a width of zero. The non-zero widths are represented in the
following figure. We claim that the widths are feasible. Let us check the condition for edge
y U2 = 1
a 4 b
3 1
1 2
s t
4 2
y U1 =3 2
c d y U4 = 1
y U3 = 2
ab for instance. The two st-cuts that have non-zero width and that contain ab are δ (U2 ) and
δ (U3 ) which have respectively width ȳU1 = 1 and ȳU2 = 2. Thus the total width of all st-cuts
that contain the edge ab is 1 + 2 = 3 which does not exceed the length cab = 4 of ab. We leave
it as an exercise to verify the feasibility condition for every other edge of the graph.
Suppose now we have a graph G = (V, E) (with s,t ∈ V and ce ≥ 0 for all e ∈ E) that has
feasible widths. Let Q denote an arbitrary st-path. By the feasibility condition, for every edge
e of Q, the total width of all st-cuts using e is at most ce . It follows that the total width of all
st-cuts using some edge of Q is at most ∑(ce : e ∈ Q), i.e. the length of Q. We know however
from Remark 1.1 that every st-cut of G contains some edge of Q. Hence, the length of Q is at
least equal to the total width of all st-cuts. We summarize this result,
Proposition 3.1 (Optimality Conditions for Shortest Paths). If the widths are feasible, then
the total width of all st-cuts is a lower bound on the length of any st-path. In particular, if an
st-path has length equal to the total width of all st-cuts, then it is a shortest st-path.
Example 7, continued. We can now prove that the path P = sa, ac, cb, bt is a shortest st-path.
We found a set of feasible widths, with total width,
Exercises
1. Let G = (V, E) be a graph and let s,t be distinct vertices of G.
(a) Show that if we have a collection of k edge disjoint st-paths then every st-cut contains
at least k edges.
(b) For the following graph find an st-cut with as few edges as possible. Use (a) to justify
your answer.
a f
e
c
s
h
d t
b
g
Suppose now each edge e has a non-negative thickness ce . The thickness of an st-cut is
the sum of the thickness of all the edge in the cut. Suppose we assign to each st-path P a
non-negative width yP and assume that the following condition holds for every edge e:
The total width of all the st-paths using e does not exceed the thickness of edge e.
(c) Show then that the total width of all the st-paths is a lower bound on the thickness of
any st-cut.
(d) For the following graph find an st-cut with minimum thickness. Edge labels indicate
thickness. Use (c) to justify your answer.
a f
2 6
4 e
2
c
2 4
s 4
7
3 h
3 3
d t
5
5
5 4 6
b 4
g
120 CHAPTER 3. DUALITY THROUGH EXAMPLES
where
2 1 20
2
A = 1 1 b = 18 c= .
3
−1 1 8
It is easy to verify that the vectors (8, 16)T and (5, 13)T are all feasible for the linear pro-
gram (3.2). Their objective values are 64 and 49, respectively, and hence the first feasible
solution is clearly not optimal. We will show however that (5, 13)T is an optimal solution by
proving that z(x̄) ≥ 49 for every feasible solution x̄.
How can we find (and prove) such a lower bound? Recall from Chapter 2, that every
feasible solution of (3.2) must satisfy the inequality
2 1 20
(y1 , y2 , y3 ) 1 1 x ≥ (y1 , y2 , y3 ) 18 . (3.3)
−1 1 8
for any non-negative vector y = (y1 , y2 , y3 )T . If we choose values, ȳ1 = 0, ȳ2 = 2 and ȳ3 = 1
we obtain the inequality,
Adding this inequality to the objective function z(x) = (2, 3)x yields,
Let x̄ be any feasible solution. As x̄ ≥ 0 and (1, 0) ≥ 0T we have (1, 0)x̄ ≥ 0. Hence, z(x̄) ≥ 44.
Thus, we have proved that no feasible solution has value smaller than 44. Note, this is not
quite sufficient to prove that (5, 13)T is optimal. It shows however, that the optimum value for
3.1. THE SHORTEST PATH PROBLEM 121
(3.2) is between 44 and 49. It is at most 49, as we have a feasible solution with that value, and
it cannot be smaller than 44 by the previous argument.
Let us search for y1 , y2 , y3 ≥ 0 in a systematic way. We rewrite (3.3) as
20 2 1
0 ≥ (y1 , y2 , y3 ) 18 − (y1 , y2 , y3 ) 1 1 x
8 −1 1
Then for any feasible solution x̄, inequality (3.4) and the fact that x ≥ 0 implies that
20
z(x̄) ≥ (y1 , y2 , y3 ) 18 .
8
For a minimization problem the larger the lower bound the better. Thus, the best possible
upper bound for (3.2) we can achieve using the above argument is given by the optimal value
to the following linear program,
20
max (y1 , y2 , y3 ) 18
8
subject to
2 1
(2, 3) − (y1 , y2 , y3 ) 1 1 ≥ 0T
−1 1
y1 , y2 , y3 ≥ 0
122 CHAPTER 3. DUALITY THROUGH EXAMPLES
min{cT x : Ax ≥ b, x ≥ 0} (3.6)
yT Ax ≥ yT b.
This last inequality is obtained from Ax ≥ b by multiplying the 1st inequality by y1 ≥ 0 the
2nd by y2 ≥ 0 the 3rd by y3 ≥ 0 etc, and by adding all of the resulting inequalities together.
This inequality can be rewritten as
0 ≥ yT b − yT Ax
which holds for every feasible solution x̄ of (3.6). Thus, adding this inequality to the objective
function z(x) = cT x yields,
The best lower bound we can get in this way is the optimal value of
Theorem 3.2 (Weak Duality - Special form). Consider the following pair of linear programs,
min{cT x : Ax ≥ b, x ≥ 0} (P)
max{bT y : AT y ≤ c, y ≥ 0} (D)
Let x̄ be a feasible solution for (P) and ȳ be a feasible solution for (D). Then cT x̄ ≥ bT ȳ.
Moreover, if equality holds then x̄ is an optimal solution for (P).
Proof of Theorem 3.2. Let x̄ be a feasible solution of (P) and let ȳ be a feasible solution (D).
Then
bT ȳ = ȳT b ≤ yT (Ax̄) = (yT A)x̄ = (AT ȳ)T x̄ ≤ cT x̄.
The first inequality follows from the fact that ȳ ≥ 0 and that Ax̄ ≥ b. The second inequality
follows from the fact that AT ȳ ≤ c and that x̄ ≥ 0. Finally, as bT ȳ is a lower bound on (P), if
cT x̄ = bT ȳ it follows that x̄ is optimal for (P).
Example 9. In the following figure we show a simple instance of the shortest path problem.
Each of the edges in the graph is labeled by its length.
124 CHAPTER 3. DUALITY THROUGH EXAMPLES
a
3 2
s 1 t
4 2
Let x := (xsa , xsb , xab , xat , xbt )T . Then (3.9) specializes to the following in this case,
Each of the first 4 constraints correspond to one of the 4 distinct st-cuts δ (U). For instance
the second constraint, corresponding to δ ({s, a}) = {sb, ab, at}, states that xsb + xab + xat ≥ 1.
For an integer program (IP) we call the linear program, obtained by removing the condition
that some variables have to take integer values, the linear programming relaxation of (IP).
Consider an instance of the shortest path problem G = (V, E) with s,t ∈ V and ce ≥ 0 for all
e ∈ E. Suppose P is some arbitrary st-path of G. Then we can construct a feasible solution x̄
to the integer program (3.9) as follows:
1 if e is an edge of P
x̄e = (3.11)
0 otherwise.
Moreover, the length c(P) of the st-path P is equal to the value of x̄ for (3.9). It follows in
particular that c(P) is greater of equal to the optimal value of (3.9). Let (P) denote the linear
programming relaxation of (3.9). Since the constraints of (P) are a subset of the constraints
of (3.9) and since (3.9) is a minimization problem, the optimal value of (3.9) is greater or
3.1. THE SHORTEST PATH PROBLEM 125
equal to the optimal value of (P). Let (D) be the dual of (P). We know from the Weak duality
theorem 3.2 that the optimal value of (P) is greater or equal to the value of any feasible solution
ȳ of (D). Hence, we have proved the following result,
Remark 3.3. If (D) is the dual of the linear programming relaxation of (3.9) then the value
of any feasible solution of (D) is a lower bound on the length of any st-path.
Example 9, continued. Let us compute the dual of the linear programming relaxation of (3.10),
as defined in Theorem 3.2. Observe that by taking the dual we interchange the role of the vari-
ables and the constraints. We now have variables yU for each st-cut δ (U) and one constraint
T
for each edge of G. Hence, let us define y := y{s} , y{s,a} , y{s,b} , y{s,a,b} . The dual is given by
max 1T y
subject to
{s} {s, a} {s, b} {s, a, b}
sa 1 0 1 0
3
sb 1 1 0 0 (3.12)
4
ab 0 1 1 0 y ≤ 1 ,
2
at 0 1 0 1
2
bt 0 0 1 1
y≥0
where 1 denotes a column vector of 1s. Note, a feasible solution to (3.12) assigns some non-
negative width yU to every st-cut δ (U). The constraint for edge sb states that y{s} + y{s,a} ≤
4. Observe that δ ({s}) = {sa, sb} and δ ({s, a}) = {at, ab, sb} are the two st-cuts of G that
contain edge sb. Finally, 4 is the length of the edge sb. Hence the constraint y{s} + y{s,a} ≤ 4
says that the total width of all st-cuts of G that contain edge sb does not exceed the length of
sb. The corresponding condition holds for every other edge. Hence, if y is feasible for (3.12)
the widths y satisfy the Feasibility Conditions (3.1). The objective function 1T y calculates the
total width of all st-cuts. Hence, it follows from Remark 3.3 that if the width are feasible, then
the total width of all st-cuts is a lower bound on the length of any st-path. This was precisely
the statement of Proposition 3.1.
Consider now a general instance of the shortest path problem. We are given a graph
G = (V, E), vertices s,t ∈ V , and lengths ce ≥ 0 for all e ∈ E. We can rewrite the linear
126 CHAPTER 3. DUALITY THROUGH EXAMPLES
where c is the vector of edge lengths, and the matrix A is defined as follows:
Remark 3.4.
Let us try to understand this dual. There is a variable yU for every st-cut δ (U) and a constraint
for every edge e ∈ E. Consider the constraint for edge e ∈ E. The right-hand side of this
constraint is the length ce of the edge, and the left-hand side corresponds to column e of A.
Remark 3.4(2) implies that the left-hand side of this constraint is the sum of the variables yU
over all st-cuts δ (U) that contain e. We can therefore rewrite (3.14) as follows:
max ∑ yU : δ (U) is an st-cut
subject to
(3.15)
y
∑ U : δ (U) is an st-cut containing e ≤ ce (e ∈ E)
yU ≥ 0 (δ (U) is an st-cut)
A feasible solution ȳ to (3.15) assigns non-negative width ȳU to every st-cut δ (U). The
constraint for each edge e states that the total width of all st-cuts of G that contain edge e does
not exceed the length of e. Hence, if ȳ is feasible for (3.15) the widths ȳ satisfy the Feasibility
3.1. THE SHORTEST PATH PROBLEM 127
Conditions (3.1). The objective function 1T y calculates the total width of all st-cuts. Hence,
as in the previous example, it follows from Remark 3.3 that if the width are feasible, then the
total width of all st-cuts is a lower bound on the length of any st-path. Hence, we now have
an alternate proof for Proposition 3.1.
While the derivation of Proposition 3.1 using duality may, at first glance, seem more tech-
nical than the ad-hoc argument we had in Section 3.1.1, notice that our derivation was com-
pletely mechanical: We formulated the shortest path problem as an integer program, wrote
the dual of its linear programming relaxation, and used weak duality to obtain bounds on the
possible values of our original optimization problem. After generalizing the notion of duals
in Chapter 4, we will be able to apply the aforementioned strategy to arbitrary optimization
problems that can be formulated as integer programs. A word of caution, however: The qual-
ity of the bounds obtained through this procedure depend on the problem, and on the linear
programming relaxation used. In the shortest path example discussed here, the bound pro-
duced was sufficient to prove optimality of a certain primal solution. This may not always be
possible as we will see in Chapter 6.
Exercises
1. A vertex cover of a graph G = (V, E) is a set S of vertices of G such that each edge of G is
incident with at least one vertex of S . The following IP finds a vertex cover of minimum
cardinality (see Problem 2 in Chapter 1.5.2),
min ∑ xe : e ∈ E
subject to
xi + x j ≥ 1 (for all i j ∈ E)
x≥0
(b) Show that the largest size (number of edges) of a matching in G is a lower bound on
the size of a minimum vertex cover of G.
H INT: Use part (a) and Theorem 3.2.
128 CHAPTER 3. DUALITY THROUGH EXAMPLES
(c) Give an example where all matchings have size strictly less than the optimal solutions
of (P) and of (D) and where these are strictly less than the size of the minimum vertex
cover.
2. The following graph has vertices of two types, a set H = {1, 2, 3, 4, 5, 6, 7} of hubs, indi-
cated by filled circles, and a set C = {a, b, c, d, e, f } of connectors, indicated by squares.
a 2 d
5
b 3
1 e 7
6
c 4 f
A subset S of the hubs is dominant if each connector in C is has an edge to least one hub in
S. For instance, S = {1, 3, 7} is dominant because a, b and c have edges to 1, d and f have
edges to 7, and e has an edge to 3.
3.1.4 An algorithm
We will present an algorithm to solve the shortest path problem based on the optimality condi-
tions given in Proposition 3.1. Before we do so, however, we require a number of definitions.
3.1. THE SHORTEST PATH PROBLEM 129
such that v1 = s, vk = t and vi 6= v j for all i 6= j. In other words a directed st-path is an st-path
in the graph obtained by ignoring the orientation, with the additional property that for any two
consecutive arcs, the head of the first arc, is the tail of the second arc. The following figure
gives an example of a graph with both edges and arcs. The thick arcs form a directed st-path.
a b
c d
t
g s
Consider now a general instance of the shortest path problem. We are given a graph
G = (V, E), vertices s,t ∈ V , and lengths ce ≥ 0 for all e ∈ E. Suppose that ȳ is a feasible
solution to (3.15). In other words every st-cut δ (U) has a non-negative width ȳU ≥ 0 and the
Feasibility Condition (3.1) holds, namely, for every edge e the total width of all st-cuts using
e is smaller or equal to the length of e. We define the slack of edge e for ȳ to be the quantity,
slackȳ (e) := ce − ∑ ȳU : δ (U) is an st-cut containing e .
In other words, slackȳ (e) is the length of e minus the total width of all st-cuts using e.
We are now ready to describe our algorithm. 2
Example 10. Consider the shortest path problem described in Figure 3.1(i). We start with
ȳ = 0 and U = {s}. We compute the slack of the edges in δ ({s}),
Algorithm 2 Shortest-path.
Input: Graph G = (V, E), costs ce ≥ 0 for all e ∈ E, s,t ∈ V where s 6= t.
Output: A shortest st-path P
1: ȳW := 0 for all st-cuts δ (W ). Set U := {s}
2: while t 6∈ U do
3: Let ab be an edge in δ (U) of smallest slack for ȳ where a ∈ U, b ∈
/U
4: ȳU := slackȳ (ab)
5: U := U ∪ {b}
→
−
6: change edge ab into an arc ab
7: end while
8: return A directed st-path P.
and therefore sb is the edge in δ ({s}) of smallest slack for ȳ. We let y{s} = 2, we set U :=
→
−
{s, b} and change edge sb into an arc sb. See figure (ii).
Since t ∈
/ U = {s, b} we compute the slack of the edges in δ ({s, b}),
and therefore bc is the edge in δ ({s, b}) with smallest slack for ȳ. We let ȳ{s,b} = 1, we set
→
−
U = {s, b, c} and change edge bc into an arc bc. See figure (iii).
Since t ∈
/ U = {s, b, c} we compute the slack of the edges in δ ({s, b, c}),
and therefore ca is the edge in δ ({s, b, c}) with smallest slack for ȳ. We let ȳ{s,b,c} = 1, we set
U = {s, b, c, a} and change edge ca into an arc → −
ca. See figure (iv).
3.1. THE SHORTEST PATH PROBLEM 131
yU = 2 yU = 1 yU = 1
s 6 a s 6 a s 6 a
1 1 1
4 c 4 c 4 c
2 4 2 4 2 4
1 2 1 2 1 2
5 t 5 t 5
b b b t
(i) (ii) (iii)
s 6 a s 6 a
1 1
4 c 4 c
yU = 1 2 4 2 4
1 2 1 2
5 5 t
b t b
(iv) (v)
Since t ∈
/ U = {s, b, c, a} we compute the slack of the edges in δ ({s, b, c, a}),
slackȳ (at) = 4
slackȳ (ct) = 2 − y{s,b,c} = 2 − 1 = 1
slackȳ (bt) = 5 − y{s,b} − y{s,b,c} = 5 − 1 − 1 = 3,
and therefore ct is the edge in δ ({s, b, c, a}) with smallest slack for ȳ. We let ȳ{s,b,c,a} = 1, we
→
−
set U = {s, b, c, a,t} and change edge ct into an arc ct . See figure (v). Note, now that t ∈ U
→
− →− → −
and there exists a directed st-path P, namely P = sb, bc, ct . It can be readily checked that ȳ
are feasible width. Moreover, the total width of all st-cut is equal to 2 + 1 + 1 + 1 = 5 and the
length c(P) of P is equal to 2 + 1 + 2 = 5. It follows from Proposition 3.1 that P is a shortest
st-path.
We will prove that this simple algorithm is guaranteed to always find a shortest st-path in
the next section. Suppose after running the algorithm and finding a shortest path P we define
variables x̄ as in (3.11). Then x̄ is a feasible solution to the linear program (3.9) and ȳ is a
132 CHAPTER 3. DUALITY THROUGH EXAMPLES
feasible solution to the dual linear program (3.15). Moreover, we will see that the algorithm
will guarantee that the total width of all st-cuts is equal to the length of the shortest st-path.
In other words that the value of x̄ in (3.9) is equal to the value ȳ in (3.15). It follows from
Weak Duality (Theorem 3.2), that x̄ is an optimal solution to (3.9). Hence, correctness of the
algorithm will imply the following result,
Theorem 3.5. If ce ≥ 0 for all e ∈ E then the linear program (3.9) has an integral optimal
solution; i.e., it has an optimal solution all of whose variables have integer values.
Note that our shortest path algorithm preserves at each step a feasible solution to the dual
linear program (3.15). This is an example of a primal-dual algorithm. We will see axnother
example of such an algorithm in Chapter 5.1.
Exercises
1. For each of the following two graphs find the shortest path between s and t using the
algorithm described in the course. The number next to edge edge indicates the length of
the edge. Make sure to describe for each step of the algorithm which edge becomes an
arc, which vertex is added to the set U and what st-cut becomes active. At the end of the
procedure give the shortest st-path and certify that it is indeed a shortest st-path.
2
2 4
2
3 4
s
6 3 4 3 1
3 3 1 6
s 2 1 t
5
4 7 3
7
2 1 t
Proposition 3.6. Let ȳ be feasible widths and let P be an st-path. Then P is a shortest st-path
if both of the following conditions hold:
Proof. Suppose that P is an st-path that satisfies both (P1) and (P2) for feasible widths ȳ. For
every edge e of P, denote by Ce the set of all active st-cuts that contain edge e. To prove that P
is a shortest path it suffices (because of Proposition 3.1) to verify that the following relations
hold,
Total width of st-cuts = ∑ yU : δ (U) is an active st-cut
(a)
h i
= ∑ ∑ yU : δ (U) ∈ Ce
e∈P
(b)
= ∑ ce = length of P.
e∈P
Finally, note that (a) holds because (P2) and that (b) holds because (P1).
Note that the algorithm is guaranteed to terminate after at most |V | iterations since at every
step one vertex is added to the set U. Hence, to show that the algorithm is correct it will suffice
to verify the following result,
Claim. If (I1)-(I4) hold in S TEP 8 then a directed st-path P exists and it is a shortest st-path.
Proof of claim: Since the loop completed, t ∈ U and by (I4) there exists a directed st-path P.
We will show that P is a shortest st-path. (I1) states that ȳ is feasible. (I2) implies that all
arcs of P are equality arcs. Because of Proposi-
tion 3.6 it suffices to verify that active cuts for ȳ con-
tain exactly one edge of P. Suppose for a contradic- s
tion there is an active cut δ (U) that contains at least
two edges of P (it contains at least one edge because f
U
of Remark 1.1). Denote by f the second arc of P,
starting from s, that in δ (U). Then (see figure on the
right), the tail of f is not in U but the head of f is in
t
U, contradicting (I3). 3
We leave it as an exercise to verify that (I1)-(I5) hold at the end of S TEP 1. Assume that
(I1)-(I5) hold at the beginning of the LOOP we will show that (I1)-(I5) hold at the end of the
LOOP . Together with the Claim this will complete the proof. In the following argument U and
ȳ represent the quantities at the start of the LOOP and U 0 and y0 the end of the LOOP. The only
difference between ȳ and y0 is that ȳU = 0 and that yU
0 = slack (ab). Hence, to verify (I1) it
ȳ
suffices to consider f ∈ δ (U). Then
0
slacky0 ( f ) = c f − ∑ yU : δ (U) is an st-cut containing f
0
= c f − ∑ ȳU : δ (U) is an st-cut containing f − yU (?)
By the choice of ab in STEP 3, slackȳ ( f ) ≥ slackȳ (ab). Hence, (?) implies that slacky0 ( f ) ≥ 0.
Thus the feasibility condition (3.1) holds for y0 , i.e. (I1) holds at the end of the LOOP. By (?)
we have slacky0 (ab) = 0. Since ab is the only new arc between the start and the end of the
LOOP , (I2) holds at the end of the LOOP . The only cut that is active in y0 but not ȳ is δ (U).
By (I5) all arcs different from ab have both ends in U. Arc ab has tail in U and head outside
U. It follows that (I3) holds at the end of the LOOP . By (I4) there exists an sa-directed path
Q. Moreover, by (I5) all arcs of Q have both tail and head in U. Hence, b is distinct from all
vertices in Q, and adding arc ab at the end of Q gives a directed sb-path. It follows that (I4)
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 135
holds at the end of the LOOP. Finally, since the only new arc is ab and a, b ∈ U 0 , (I5) holds at
the end of the LOOP.
Exercises
1. Let G = (V, E) be a graph with vertices s and t and suppose there exists at least one st-path
in G. The cardinality case of the shortest path problem is the problem of finding a shortest
path with as few edges as possible.
a) Simplify the shortest path algorithm in Chapter 3.1.4 to deal with the cardinality case.
You should strive to make the resulting algorithm as simple as possible.
b) Simplify Proposition 3.6 for the cardinality case. Try to make the resulting statement
as simple as possible.
c) Simplify Proposition 3.7 for the cardinality case. Try to make the proof of correctness
as simple as possible.
2. Let G = (V, E) be a graph with distinct vertices s and t and non negative edge weights c.
a
3 6
1
g b
1
h 2
7 3
4 0
d c
1
Example 11 continued. Suppose that for every edge incident to vertex b we decrease the cost
of all edges incident to vertex b by the value 3, see figure 3.2(i). Since every perfect matching
has exactly one edge that is incident to vertex b this will decrease the cost of every perfect
matching by exactly 3. In particular, if a matching M is a minimum cost perfect matching
with these new costs, then it must be a minimum cost perfect matching with the original costs.
In figure 3.2(ii) we pick values for every vertex in the graph and repeat the same argument
for each vertex. For instance if as we choose value 3 for vertex b and value 2 for vertex a the
new cost of edge ab is given by 6 − 3 − 2 = 1. We indicate in (ii) the new cost for every edge.
Observe now, that for the graph given in (ii), every edge has non-negative cost. This implies
in particular, that every perfect matching has non-negative cost. However, as the matching
M = {ag, hb, cd} has cost 0, M must be a minimum cost perfect matching with these new
costs. But then M must be a minimum cost perfect matching with the original costs.
Let us try to extend these ideas to an arbitrary graph G = (V, E) with edge costs ce for all
e ∈ E. Let us assign to every vertex u, a number ȳu that we call the potential of the vertex u.
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 137
2
a a
6−
3 3 0 1
1 0
g b g b
1
h 3 3 1 h 0
3
2− 1
−1
3−3
7 3 2
4 0 2 3
d c d c
1 3 0 −2
(i) (ii)
Let M be a perfect matching. Since by definition M has exactly one edge that is incident to
every vertex u, the difference between the cost of M with costs c and costs c̄ is given by the
sum of all the potentials ȳu over every vertex u, i.e. by ∑ ȳu u ∈ V . Since this quantity is a
constant (for a fixed ȳ) it follows in particular that if M is a minimum cost perfect matching
with the reduced costs c̄, then it must be a minimum cost perfect matching with the original
costs c. If we selected the potentials ȳ so that every reduced costs has non-negative costs then
every perfect matching has non-negative costs. If in addition all edges in the perfect matching
M have zero reduced costs, then M must be a minimum cost perfect matching with respect to
the reduced costs c̄. But then, it means that M is a minimum cost perfect matching with the
original costs c.
We formalize this result by introducing a few definitions. Let we say that an edge uv is an
equality edge with respect to some potentials ȳ if its reduced cost c̄uv = cuv − ȳu − ȳv = 0. We
say that potentials ȳ are feasible if they satisfy the following,
Feasibility condition. For every edge e ∈ E, c̄uv = cuv − ȳu − ȳv ≥ 0. (3.16)
We have thus proved the following,
Proposition 3.8 (Optimality Condition for perfect matchings). If the potentials ȳ are feasible
and all edges of M are equality edges with respect to ȳ then M is a minimum cost perfect
matching.
138 CHAPTER 3. DUALITY THROUGH EXAMPLES
The arguments used to prove Proposition 3.8 are fairly elementary. We will see however,
that this result is surprisingly powerful. Indeed we will be able to design an efficient algorithm,
for the class of bipartite graphs, based on this optimality condition. Deriving good optimal
conditions is a key step in designing an algorithm.
where
4
−2 1 −1 0 −2 3
A = 9 2 1 1
b= 7 c=
−2 .
5 1 0 3 7
3
It is easy to verify that the vectors (1/2, 0, 1, 3/2)T and (0, 1, 3, 2)T are both feasible for the
linear program (3.17). Their objective values are 9/2 and 3, respectively, and hence the first
feasible solution is clearly not optimal. We will show however that (0, 1, 3, 2)T is an optimal
solution by proving that z(x̄) ≥ 3 for every feasible solution x̄.
How can we find (and prove) such a lower bound? As before, recall from Chapter 2 that,
for any vector y = (y1 , y2 , y3 )T ∈ R3 , the equation
−2 1 −1 0 −2
(y1 , y2 , y3 ) 9 2 1 1 x = (y1 , y2 , y3 ) 7
(3.18)
5 1 0 3 7
holds for every feasible solution of (3.17). In particular, if we choose values, ȳ1 = 1, ȳ2 = −1
and ȳ3 = 1 we obtain the equality,
Adding this equality to the objective function z(x) = (4, 3, −2, 3)x yields,
and add it to the objective function z(x) = (4, 3, −2, 3)x, to obtain,
−2 −2 1 −1 0
z(x) = (y1 , y2 , y3 ) 7 + (4, 3, −2, 3) − (y1 , y2 , y3 ) 9 2 1 1 x. (3.19)
7 5 1 0 3
Then for any feasible solution x̄, inequality (3.19) and the fact that x̄ ≥ 0 implies that
−2
z(x̄) ≥ (y1 , y2 , y3 ) 7 .
7
For a minimization problem the larger the lower bound the better. Thus, the best possible
upper bound for (3.17) we can achieve using the above argument is given by the optimal
value to the following linear program,
−2
max (y1 , y2 , y3 ) 7
7
subject to
−2 1 −1 0
(4, 3, −2, 3) − (y1 , y2 , y3 ) 9 2 1 1 ≥ 0T
5 1 0 3
y1 , y2 , y3 ≥ 0
140 CHAPTER 3. DUALITY THROUGH EXAMPLES
and this solution has objective value 3. Since solution (0, 1, 3, 2)T has value 3, it is optimal.
Let us generalize the previous argument and consider the following linear program,
min{cT x : Ax = b, x ≥ 0} (3.21)
yT Ax = yT b.
This last equality is obtained from Ax = b by multiplying the 1st equality by y1 the 2nd by y2
the 3rd by y3 etc, and by adding all of the resulting equalities together. This equality can be
rewritten as
0 = yT b − yT Ax
which holds for every feasible solution x̄ of (3.6). Thus, adding this equality to the objective
function z(x) = cT x yields,
The best lower bound we can get in this way is the optimal value of
where we note that the variables y are free (i.e., the variables are unrestricted). Let us summa-
rize our result and provide a more direct proof.
Theorem 3.9 (Weak Duality - Special form). Consider the following pair of linear programs,
min{cT x : Ax = b, x ≥ 0} (P)
max{bT y : AT y ≤ c}. (D)
Let x̄ be a feasible solution for (P) and ȳ be a feasible solution for (D). Then cT x̄ ≥ bT ȳ.
Moreover, if equality holds then x̄ is an optimal solution for (P).
Proof of Theorem 3.9. Let x̄ be a feasible solution of (P) and let ȳ be a feasible solution (D).
Then
bT ȳ = ȳT b = ȳT (Ax̄) = (ȳT A)x̄ = (AT ȳ)T x̄ ≤ cT x̄.
The inequality follows from the fact that AT ȳ ≤ c and that x̄ ≥ 0. Finally, as bT ȳ is a lower
bound on (P), if cT x̄ = bT ȳ it follows that x̄ is optimal for (P).
Example 13. In the following figure we show a simple instance of the perfect matching prob-
lem. Each of the edges in the graph is labeled by its cost.
142 CHAPTER 3. DUALITY THROUGH EXAMPLES
4
a
3 6
d 2 b
1 2
g
Let x := (xab , xbc , xdg , xad , xag , xbd )T . Then (3.24) specializes to the following in this case,
Each of the first 4 constraints correspond to one of the 4 vertex of G. For instance the second
constraint, corresponding to vertex b, states that xab + xbg = 1, i.e. that we should select
exactly one of the edges incident to b.
Recall that for an integer program (IP) we call the linear program, obtained by removing
the condition that some variables have to take integer values, the linear programming relax-
ation of (IP). Consider an instance of the minimum cost perfect matching problem G = (V, E)
with ce for all e ∈ E. Suppose M is some arbitrary perfect matching of G. Then we can
construct a feasible solution x̄ to the integer program (3.24) as follows:
1 if e is an edge of M
x̄e = (3.26)
0 otherwise.
Moreover, the cost c(M) of the perfect matching M is equal to the value of x̄ for (3.24). It
follows in particular that c(M) is greater or equal to the optimal value of (3.24). Let (P) denote
the linear programming relaxation of (3.24). Since the constraints of (P) are a subset of the
constraints of (3.24) and since (3.24) is a minimization problem, the optimal value of (3.24)
is greater or equal to the optimal value of (P). Let (D) be the dual of (P). We know from the
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 143
Weak duality theorem 3.9 that the optimal value of (P) is greater or equal to the value of any
feasible solution ȳ of (D). Hence, we have proved the following result,
Remark 3.10. If (D) is the dual of the linear programming relaxation of (3.24) then the value
of any feasible solution of (D) is a lower bound on the cost of any perfect matching.
Example 9, continued. Let us compute the dual of the linear programming relaxation of (3.25),
as defined in Theorem 3.9. Observe that by taking the dual we interchange the role of the
variables and the constraints. We now have variables yv for every vertex v and one constraint
for each edge of G. Hence, let us define y := ya , yb , yd , yg ). The dual is given by,
max 1T y
subject to
a b d g
ab 1 1 0 0
bg 0 1 0
1
6 (3.27)
2
dg0 0 1
1 1
y ≤
3 .
ad
1 0 1
0
2
ag
1 0 0
1 4
bd 0 1 1 0
A feasible solution to (3.27) assigns some potential yv to every vertex v. The constraint for
edge ab states that ya + yb ≤ 6 = cab or equivalently that the reduced cost cab − ya − yb of
edge ab is non-negative. Similarly, for every other edge the corresponding constraint states
the reduced cost is non-negative. Hence, if ȳ is feasible for (3.27) then the potentials ȳ satisfy
the Feasibility Conditions (3.16). For instance, ȳa = 2, ȳb = 2, ȳd = 1 and ȳg = 0 are feasible
potentials. Observe that,
i.e. ad and bc are equality edges with respect to ȳ. It follows that,
Hence, the value of ȳ in (3.27) is equal to the cost of the perfect matching M = {ad, bc}. It
follows by Remark 3.10 that M is a minimum cost perfect matching. Hence, in this example
144 CHAPTER 3. DUALITY THROUGH EXAMPLES
we see that if the potentials ȳ are feasible and every edge of M is an equality edge, then M is
a perfect matching as predicted by Proposition 3.8.
Consider now a general instance of the perfect matching problem. We are given a graph
G = (V, E) and costs ce for all e ∈ E. We can rewrite the linear programming relaxation
of (3.24) as,
min{cT x : Ax = 1, x ≥ 0}, (3.28)
where c is the vector of edge costs, and the matrix A is defined as follows:
Remark 3.11.
Let us try and understand this dual. There is a variable yv for every vertex v and a constraint
for every edge e ∈ E. Consider the constraint for edge e ∈ E. The right-hand side is the cost
ce of the edge, and the left-hand side corresponds to column e = uv of A. Remark 3.11(2)
implies that the left-hand side of this constraint is yu + yv . We can therefore rewrite (3.29) as
follows:
max ∑ yv : v ∈ V
subject to (3.30)
yu + yv ≤ cuv (uv ∈ E)
A feasible solution ȳ to (3.30) assigns some potential yv to every vertex v. The constraint for
each edge uv states that yu + yv ≤ cuv or equivalently that the reduced cost cuv − yu − yv of edge
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 145
uv is non-negative. Hence, if ȳ is feasible for (3.30) then the potentials satisfy the Feasibility
Conditions (3.16). Let M be a perfect matching and suppose that every edge uv ∈ M is an
equality edge, i.e. ȳu + ȳv = cuv . Then,
where the second equality follows from the fact that M is a perfect matching i.e. that every
vertex of G is the end of exactly one edge of M. Hence, the value of ȳ in (3.30) is equal to
the cost of the perfect matching M. It follows by Remark 3.10 that M is a minimum cost
perfect matching. Hence, we see that if the potentials ȳ are feasible and every edge of M is
an equality edges, then M is a perfect matching. Hence, we now have an alternate proof for
Proposition 3.8.
While the derivation of Proposition 3.8 using duality may, at first glance, seem more tech-
nical than the ad-hoc argument we had in Section 3.2.1, notice that our derivation was com-
pletely mechanical: We formulated the minimum cost perfect matching problem as an integer
program, wrote the dual of its linear programming relaxation, and used weak duality to obtain
bounds on the possible values of our original optimization problem. After generalizing the
notion of duals in Chapter 4, we will be able to apply the aforementioned strategy for arbi-
trary optimization problems that can be formulated as integer programs. Once again a word
of caution at this point: while the dual bound obtained through Proposition 3.8 was sufficient
to prove the optimality of a certain perfect matching, this may not always work. The success
of the mechanical procedure outlined above heavily depends on the quality of the underlying
linear program as well as the problem.
3.2.4 An algorithm
In this section we give an algorithm to find a minimum cost perfect matching in a bipartite
graph only. While there exist efficient algorithms for solving the minimum cost perfect match-
ing problem for general graphs, these algorithms are more involved and beyond the scope of
this book. One of the key reasons is that for general graphs we cannot always certify that a
minimum cost perfect matching is indeed a minimum cost perfect matching using Proposi-
tion 3.8.
We first need to characterize which bipartite graphs have a perfect matching. Recall that
a graph G = (V, E) is bipartite with bipartition U,W , if V is the disjoint union of U and
W and every edge has one end in U and one end in W . A necessary condition for a graph
146 CHAPTER 3. DUALITY THROUGH EXAMPLES
Theorem 3.12 (Hall’s Theorem). Let G = (V, E) be a bipartite graph with bipartition U,W
where |U| = |W |. Then, there exists a perfect matching M if and only if G has no deficient
set S ⊆ U. Moreover, there exists an efficient (polynomial-time) algorithm that given G will
either find a perfect matching M or find a deficient set S ⊆ U.
We postpone the proof of the previous result and the description the associated algorithm until
Section 3.2.6.
We are now ready to start the description of the matching algorithm. We consider a bi-
partite graph G = (V, E) with bipartition U,W where |U| = |W |. In addition we are given a
set of edge costs ce for every edge e ∈ E. At each step we have a feasible solution ȳ to the
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 147
linear program (3.30). To get an initial feasible dual solution we let α denote the value of the
minimum cost edge and set ȳv := 21 α, for all v ∈ V .
Given a dual feasible solution ȳ we construct a graph H as follows: H has the same set
of vertices as G, and the edges of H consists of all edges of G that are equality edges with
respect to ȳ. We know from Theorem 3.12 that we can either find,
a) a perfect matching M in H, or
b) a deficient set S ⊆ U of H.
In case (a) since all edges of M are equality edges, Proposition 3.8 implies that M is a min-
imum cost perfect matching of G, in which case we can stop the algorithm. Thus we may
assume that (b) occurs. We will use the set S to define a new feasible solution y0 to (3.30) with
larger objective value than ȳ as follows, for every vertex v,
ȳ + ε for v ∈ S
v
y0v := ȳv − ε for v ∈ NH (S)
ȳ
v otherwise.
We wish to choose ε ≥ 0 as large as possible such that y0 is feasible for (3.30), i.e. for every
edge uv we need to satisfy cuv − y0u − y0v ≥ 0. As y is feasible for (3.30) cuv − ȳu − ȳv ≥ 0.
Consider an edge uv, we may assume u ∈ U and v ∈ W . There are four possible cases for such
an edge (see figure):
1
Case 1. u 6∈ S, v 6∈ NH (S).
N (S)
Case 2. u 6∈ S, v ∈ NH (S). 2
U S W
Case 3. u ∈ S, v ∈ NH (S). 3
Case 4. u ∈ S, v 6∈ NH (S).
4
Let us investigate each case,
Thus, we only need to worry about Case 4. Note, we may assume that there is such an edge,
for otherwise S is a deficient set for G and we may stop as G has no perfect matching. We
want, 0 ≤ cuv − y0u − y0v = cuv − (ȳu + ε) − ȳv , i.e. ε ≤ cuv − ȳu − ȳv . Note that since uv is not
an edge of H, it is not an equality edge. Hence, cuv − ȳu − ȳv > 0. Thus, we can choose ε > 0
as follows:
ε = min{cuv − ȳu − ȳv : uv ∈ E, u ∈ S, v ∈
/ NH (S)}. (3.31)
Note, that 1T y0 − 1T ȳ = ε(|S| − |NH (S)|) > 0. Hence, the new dual solution y0 has a higher
(better) value than y. Algorithm 3 has a formal description of our algorithm.
Example 14. Consider the minimum cost perfect matching problem described in Figure 3.3(i)
(left) where the edges are labeled by the costs. Since the minimum cost is 2 we initially set
potentials ȳa = ȳb = ȳc = ȳd = ȳe = ȳ f = 1. In figure (i) (right) we indicate the graph H and
the deficient set S. The edges of G with one endpoint in S and the other endpoint not in NH (S)
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 149
G and H and
potential deficient set/matching
S NH (S)
2 +2 a
1 a d 1 d −2
4 2
(i) 5
1 b e 1 +2 b e
6 2
1 c f 1 c f
3
S NH (S)
3 a 2 d −1 +1 a d −1
4 2
(ii) 5 +1 −1
3 b e 1 b e
6 2
1 c f 1 +1 c f
3
2
4 a d −2 a d
4 2
(iii) 5 e 0 e
4 b b
6 2
2 c f 1 c f
3
The new potentials are given in figure (ii) (left). In figure (ii) (right) we indicate the graph H
150 CHAPTER 3. DUALITY THROUGH EXAMPLES
and the deficient set S. The edges of G with one endpoint in S and the other endpoint not in
NH (S) are b f , c f , hence,
The new potentials are given in figure (iii) (left). In figure (iii) (right) we indicate the graph H.
This graph has no deficient set, but it has a perfect matching M = {ae, bd, c f }. This matching
is a minimum cost perfect matching of G.
We will prove, in the next section, that this algorithm terminates and hence is guaranteed
to always find a minimum cost perfect matching. Suppose that after running the algorithm
and finding a minimum cost matching M we define variables x̄ as in (3.26). Then x̄ is a
feasible solution to the linear program (3.24) and ȳ is a feasible solution to the dual linear
program (3.30). Moreover, as all edges of M are equality edges the total sum of the potentials
will be equal to the cost of the matching M. In other words the value of x̄ in (3.24) is equal
to the value ȳ in (3.30). It follows from Weak Duality (Theorem 3.9), that x̄ is an optimal
solution to (3.24). Hence, correctness of the algorithm will imply the following result,
Theorem 3.13. If G is bipartite graph and there exists a perfect matching, then the linear
program (3.24) has an integral optimal solution; i.e., there is an optimal solution all of whose
entries are integers.
Note that our matching algorithm preserves at each step a feasible solution to the dual linear
program (3.30). This is an example of a primal-dual algorithm. We will see another example
in Chapter 5.1.
is an integer at all times during the execution of Algorithm 3. This is clearly true at the
beginning since ȳu is set to half the cost of the minimum-cost edge incident to u – and this is
an integer by our evenness assumption.
Now it is easy to check that integrality is maintained. In step 7 of the algorithm, we choose
a deficient set S ⊆ U. We then determine ε according to (3.31), add it to ȳv for all v ∈ S, and
subtract it from ȳv for v ∈ NH (S). Inductively, we know that c̄e is an integer for all e ∈ E, and
hence ε is an integer as well. Thus, ȳv changes by an integer amount in the current iteration
of the algorithm, and so do the reduced costs.
Proof. We have already seen that ȳ is a feasible solution of (3.30) at all times, and hence we
have
ȳu + ȳv ≤ cuv , (3.32)
for all uv ∈ E. For each u ∈ V , let uū ∈ δ ({u}) be a minimum-cost edge incident to u.
Applying (3.32) we see that
∑ ȳu ≤ ∑ cuū =: C. (3.33)
u u
Since ε is a positive integer in every iteration, and since |S| ≥ |NG (S)| + 1 for deficient set
S, it follows that ∑v ȳv must increase by at least 1 in every iteration of the algorithm. By
the initialization step 1 of Algorithm 3 the latter sum has an initial value C/2, and hence the
overall number of iterations must be bounded by C/2 ≤ ∑e ce .
algorithm. Efficient implementations of this algorithm are however known, and we provide
some details in the next section.
Exercises
1. C&O transportation company has three trucks with different capabilities (affectionately
called Rooter, Scooter and Tooter) and three moving jobs in different cities (Kitchener,
London and Missisauga). The following table shows the profit gained (in $1,000s) if a
given truck is used for the job in a given city. Your task is to find an allocation of the trucks
to the three different cities to maximize profit.
K L M
R 12 9 9
S 12 4 3
T 12 3 4
2. Consider a bipartite graph with vertices U = {a, b, c, d} and W = {g, h, i, j} and all edges
between U and W . You also have a weight for each edge of G as indicated in the following
table,
g h i j
a 2 7 1 2
b 3 4 3 2
c 6 5 5 5
d 2 6 2 3
For instance edge ag has weight 2. Find a maximum weight perfect matching using the
matching algorithm.
3. Let G = (V, E) be a graph with edge weights c. Consider the following three problems,
(a) Show that if you have an algorithm to solve (P1), then you can use it to solve (P2).
(b) Show that if you have an algorithm to solve (P2), then you can use it to solve (P1).
(c) Show that if you have an algorithm to solve (P3), then you can use it to solve (P4).
(d) Show that if you have an algorithm to solve (P4), then you can use it to solve (P3).
4. Let G = (V, E) be a bipartite graph. Consider the following linear program (P),
max 0
subject to
∑ xe : e ∈ δ (i) = 1 (i ∈ V )
xe ≥ 0 (e ∈ E)
5. Let G = (V, E) be a graph. A vertex cover of G is a subset S of V with the property that for
every edge i j we have that either i ∈ S of j ∈ S (or possibly both).
(a) Show that for every matching M and every vertex cover S we have |S| ≥ |M|. In
particular,
H INT: How many vertices you need to cover just the edges of M?
(b) Show that there are graphs for which the inequality (?) is strict.
Suppose G = (V, E) is a bipartite graph with partition U,W (for every edge i j of G, i ∈ U
and j ∈ W and U,W is a partition of V ). Suppose also that |U| = |W |. Let H be the
154 CHAPTER 3. DUALITY THROUGH EXAMPLES
complete bipartite graph with partition U and W (the edges of H are all pairs i j where
i ∈ U and j ∈ W ). Assign weights to the edges of H as follows: for every edge i j,
1 if i j is an edge of G
wi j =
0 otherwise
a) Show that there exists a feasible 0, 1 vector y (i.e. all entries of y are 0 or 1).
b) Show that if y is integer at the beginning of an iteration of the matching algorithm then
y will be integer at the beginning of the next iteration.
c) Show that there exists a feasible integer vector y and a matching M of G such that
∑ yi = |M|.
i∈V
H INT: Look at what you have when the matching algorithm terminates.
d) Show that there exists a feasible 0, 1 vector y and a matching M of G such that
∑ yi ≤ |M|.
i∈V
v1 v2 v3 v4 v5 v6 v7 v8
Figure 3.4: An M-alternating path where edges of M are shown as thick lines.
The figure above shows an M-alternating path where edges of matching M appear in bold.
We will say that a matching M covers a vertex v if M has an edge that is incident to v. If M
does not cover v then it exposes v, and v is M-exposed. An alternating path P is called M-
augmenting if it starts and ends in an M-exposed vertex. Hence, the path in the above figure
is an augmenting path for matching M.
The reader may already suspect that augmenting paths are crucial for the design of efficient
matching algorithms. Figure (i) suggests why this may be. The figure shows a bipartite graph G,
a matching M, and an M-augmenting path
P = v1 v2 , v2 v3 , v3 v4 .
Now observe that removing v2 v3 from M, and adding the edges v1 v2 and v3 v4 , yields another
matching M′ that is larger than M. Let us formalize this swap operation as follows. For two
sets A and B, we let A∆B denote the symmetric difference of A and B. A∆B is the set of those
elements in A ∪ B that are not in both A and B; i.e.,
A∆B = (A \ B) ∪ (B \ A).
Note that the matching M′ in the above example is the symmetric difference of M and P; i.e.
M′ = M∆P. In the following, we say that M is a maximum matching if there is no matching
M′ with more edges.
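As a quick illustration of this swap (not part of the original notes), the following Python sketch computes M′ = M∆P for the example above; edges are stored as frozensets so that uv and vu compare equal.

def edge(u, v):
    # An undirected edge, stored so that uv and vu compare equal.
    return frozenset((u, v))

M = {edge("v2", "v3")}                                        # current matching
P = {edge("v1", "v2"), edge("v2", "v3"), edge("v3", "v4")}    # an M-augmenting path

M_new = M ^ P    # symmetric difference: drop v2v3, add v1v2 and v3v4
print(sorted(tuple(sorted(e)) for e in M_new))
# [('v1', 'v2'), ('v3', 'v4')]  -- a matching with one more edge than M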
Theorem 3.15. M is a maximum matching in graph G = (V, E) if and only if G does not have
an M-augmenting path.
One direction of the proof of this theorem is quite easy: clearly if M is a matching, and
there is an M-augmenting path then M can not be a maximum matching. For the other direc-
tion it is useful to observe that M∆M′ has an M-augmenting path if M′ is a larger matching
than M. We leave the details as an exercise.
In light of Theorem 3.15 a plausible strategy for finding a perfect matching is as follows:
start with the empty matching M = ∅. As long as there exists an M-augmenting path P, replace
M by M∆P. Theorem 3.15 implies that this method finds a maximum matching, and this must
clearly be perfect if the underlying graph has such a matching. But how do we efficiently find
augmenting paths?
We need a few more definitions before we can describe an algorithm for this task. A path
P = v1 v2 , v2 v3 , . . . , vk−1 vk
that starts and ends in the same vertex (i.e., v1 = vk ) is called a cycle. A graph that has no
cycle is called acyclic. We say that a graph T = (V, E) is connected if T has a uv-path for
any pair u, v ∈ V . T is a tree if it is connected, and acyclic. Figures (i) and (ii) are example
graphs that are not trees; figure (i) has a cycle (shown in bold edges), and figure (ii) is not
connected as there is, for example, no path connecting u and v. Finally, figure (iii) shows a
tree. Given a tree T = (V, E), we call a vertex u ∈ V a leaf if it is incident to exactly one edge
(see figure (iii)), and an internal vertex otherwise. Trees have many useful properties. We
state two crucial ones here, and leave their proofs to the reader.
Proposition 3.16. (a) Let T = (V, E) be a tree and u, v ∈ V. Then T contains a unique uv-path;
we will use Tuv to denote this path from now on.
(b) Let x, y ∈ V be such that xy ∉ E. Then the graph obtained from T by adding the edge xy
contains a unique cycle. This cycle is called the fundamental cycle defined by xy.
In order to efficiently find augmenting paths, we will need a special kind of tree. Let M be
a matching in some underlying graph G = (V, E), and suppose that r is an M-exposed vertex.
A tree T in G is an M-alternating tree with root r if all vertices in V (T ) \ {r} are M-covered,
and if for any vertex u 6= r, the unique ru-path Tru in T is an M-alternating path. The left
figure below shows such a tree. In the following, we let A(T ) and B(T ) be the sets of vertices
u in tree T for which Tru has an odd and even, respectively, number of edges (so that, in particular, r ∈ B(T )).
Figure 3.5: Figure (i) shows an alternating tree for a matching M whose edges are depicted in
bold. Shaded vertices belong to A(T ), and white ones to B(T ). Figure (ii) shows a non-tree
vertex v that is connected to a vertex in B(T ), and the associated augmenting path P. Figure
(iii) shows the enlarged matching M∆P.
Our algorithm works in iterations to either find a perfect matching M, or a proof that none
exists. At any time during its execution, the algorithm will maintain a matching M, and an
M-alternating tree T . The algorithm has the following important properties:
[P1] The size of the matching M never decreases during the execution of the algorithm.
[P2] If M does not increase in an iteration then |V (T )| increases, or the algorithm terminates
with a deficient set.
Without describing any further details, it is clear that the above two properties imply that the
algorithm terminates in at most |V|² iterations. To see this, let us group those iterations during
which the matching M has size i into the ith phase. Clearly, by property [P1], the iterations of
Phase i are consecutive. Moreover, every iteration within a phase grows the alternating tree
by [P2], and therefore, a phase can have at most |V | iterations. Finally, there are clearly no
more than |V | phases, and this completes the argument.
We describe phase i ≥ 0 of the algorithm. At the beginning of the phase, let M be a
matching with i edges. Let r be an M-exposed vertex, and let T^0 = ({r}, ∅) be the trivial
tree with a single vertex and no edges. At the beginning of iteration j ≥ 0, we are given
an M-alternating tree T^j with root r. We then check whether there is an edge uv ∈ E with
u ∈ B(T^j) and v ∉ V(T^j). We later show that, if no such edge exists, then G has a deficient
set, and hence no perfect matching. Let us for now assume that there is such an edge uv.
If v is M-exposed, then the path P obtained by attaching the edge uv to T^j_ru is an M-
augmenting path, and hence M∆P is a matching of size i + 1; this ends Phase i.
On the other hand, suppose that v is M-covered. Thus, there exists a vertex v′ such that
vv′ ∈ M. We will ensure that, like v, v′ is not part of the vertex set of tree T^j; the reader is
encouraged to verify this. We now add vertices v and v′, as well as the edges uv and vv′, to our tree,
and define T^{j+1} = (V^{j+1}, E^{j+1}), where V^{j+1} = V^j ∪ {v, v′} and E^{j+1} = E^j ∪ {uv, vv′}. While
the matching M remains unchanged in this step, the size of the alternating tree increases.
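The following Python sketch (ours, not the formal Algorithm 4 of the notes) implements one phase of this procedure for a bipartite graph: it grows an M-alternating tree from an M-exposed root r and either reports an augmenting path or the set B(T) of a deficient configuration. The function name augment_phase and the data layout are illustrative assumptions.

def augment_phase(adj, match, r):
    """Grow an M-alternating tree rooted at the M-exposed vertex r.

    adj:   dict vertex -> iterable of neighbours (bipartite graph)
    match: dict vertex -> matched partner (missing/None if exposed)
    Returns ('augment', path) with an M-augmenting path ending at r,
    or ('deficient', B) where B = B(T) when the tree can no longer grow.
    """
    parent = {r: None}          # tree edges, pointing towards the root
    B = {r}                     # vertices whose tree path to r has even length
    A = set()                   # vertices whose tree path to r has odd length
    stack = [r]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in A or v in B:
                continue                      # v is already in the tree
            if match.get(v) is None:          # v is M-exposed: augmenting path found
                path, w = [v], u
                while w is not None:
                    path.append(w)
                    w = parent[w]
                return 'augment', path        # path from v back to the root r
            vp = match[v]                     # v is covered by the edge vv'
            A.add(v); B.add(vp)               # add v to A(T) and v' to B(T)
            parent[v], parent[vp] = u, v
            stack.append(vp)
    return 'deficient', B                     # B(T) has at most |B(T)|-1 neighbours

Repeatedly calling this routine and replacing M by M∆P whenever a path P is returned realises the |V|²-iteration strategy discussed above.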
Figure 3.6: Figure (i) shows a bipartite graph, and matching M in bold. Figure (ii) shows an
alternating tree rooted at M-exposed vertex w3 . Figure (iii) shows a larger matching obtained
by augmenting M, and figure (iv) gives a certificate that no perfect matching exists.
The example instance in Figure 3.6.(i) shows a bipartite graph and a matching M. Let us
apply the above procedure starting from the given matching, in phase 3. We let the initial
tree be T^0 = ({w3}, ∅), and note that w3 ∈ B(T^0). Vertex w3 has three neighbours u2, u3, and
u4, all of which are M-covered. Following the algorithm, we arrive at a tree T^3 = (V^3, E^3)
(depicted in Figure 3.6.(ii)). The path
P = w3 u4 , u4 w4 , w4 u5
is an M-augmenting path, and the matching M′ = M∆P has one more edge than M. This starts
phase 4 of the algorithm. Figure 3.6.(iii) shows the new matching M′, and the M′-exposed vertex
w1. Using the same procedure as before, starting from tree T^0 = ({w1}, ∅), we find the M′-
alternating tree T′ = (V′, E′) displayed in Figure 3.6.(iv). Note that none of the vertices
v ∈ B(T′) has a neighbour outside the tree. Thus we are stuck! In fact, we are now able
to show that the given graph has no perfect matching. We state the following useful fact
whose proof we leave as an exercise.
Fact 3.17. A graph G = (V, E) is bipartite if and only if G does not contain cycles with an
odd number of edges.
Proof. We claim that the set B(T′) has at most |B(T′)| − 1 neighbours in G. Theorem 3.12
then implies that G has no perfect matching.
Consider a vertex v ∈ B(T′), and let u be a neighbour of v. We claim that u ∈ A(T′).
Indeed, if not, then one shows that the unique cycle in (V′, E′ ∪ {uv}) has odd length; we leave
this as an exercise. By Fact 3.17 this contradicts the fact that G is bipartite.
Let r be the root of tree T′. We will finish the argument by finding a designated neighbour
for each vertex in B(T′) \ {r}. Consider a vertex v ∈ B(T′) \ {r}, and let vv′ ∈ M. We then let
v′ ∈ A(T′) be v's designated neighbour. Clearly, if v′ is v's designated neighbour then it can
not be the designated neighbour of another vertex in B(T′). The above argument shows that
B(T′) has at least one more vertex than A(T′). It follows from Theorem 3.12 that G has no
perfect matching, as desired.
Consider the set of shaded vertices in Figure 3.6.(iv). Its neighbours in G are the white
vertices, and there are fewer white vertices than shaded ones. Hence, the graph in the example
has no perfect matching by Theorem 3.12.
The discussion in this section, and Proposition 3.18 provide an algorithmic proof of The-
orem 3.12. We state this method formally below in Algorithm 4.
Algorithm 4 could now simply be used as a black box in step 7 of Algorithm 3. We will
now see, however, that a more careful implementation yields a significant improvement over
the iteration bound provided by Proposition 3.14. The modified Algorithm 3 will maintain a
matching M in the graph H of equality edges, and an M-alternating tree T at all times.
Initially, M will be empty, and T will consist of a single (exposed) vertex, and no edges.
The algorithm stops as soon as M is a perfect matching, or when it finds a deficient set in G.
In an iteration we invoke Algorithm 4 starting from matching M, and M-alternating tree
T ; i.e., we start the execution of Algorithm 4 with step 4, skipping the two initialization steps
for M and T . Algorithm 4 either increases the size of matching M, or returns with a new
larger alternating tree T′ for the same old matching. We present the details of this method in
Algorithm 5.
Similar to Algorithm 4 we will show that the algorithm maintains the following invariants:
[P'1] The size of the matching M never decreases during the execution of the algorithm.
[P'2] If M does not increase in an iteration then |V (T )| increases, or the algorithm terminates
with a deficient set.
These invariants clearly provide an upper bound on the total number of iterations. Using the
same notation as before, we say that an iteration of Algorithm 3 belongs to phase i if matching
M has i edges at its start. Invariant [P’1] implies that the iterations belonging to phase i are
consecutive. Invariant [P’2] implies that a phase can have at most |V | iterations, and [P’1]
implies that there are at most |V |/2 phases. Hence we obtain the following proposition.
It remains to show that invariants [P’1] and [P’2] hold throughout. Consider an iteration
of Algorithm 5, and let M and T be the current matching and corresponding M-alternating
tree. If Algorithm 4 returns a perfect matching then we are obviously done, and have nothing
to show. Let us therefore assume that M is not a perfect matching, and T is the corresponding
M-alternating tree.
Observation 3.20. The dual update in step 10 does not change the reduced costs of edges
e ∈ E(T ), or those of matching edges wz ∈ M \ E(T ). The edge uv minimizing the expression
on the right-hand side of step 9 had positive reduced cost before, and zero reduced cost after
the update.
In other words, if matching M does not change in the process of invoking Algorithm 4
then all of its edges remain equality edges. Moreover, the dual-update step creates a
new equality edge uv ∈ δ (B(T )). The important thing to notice is what happens the next time
Algorithm 4 is invoked. If v is an M-exposed vertex, then Algorithm 4 finds a larger matching.
If it doesn’t find a larger matching then v must be M-covered, and the edge uv can be used to
increase the size of T . Altogether, we have argued that [P’1] and [P’2] hold throughout the
algorithm.
literally hundreds of NP-hard optimization problems. A good introduction to this topic, which
is beyond the scope of these notes, can be found in the books of Vazirani [23] and of Williamson
and Shmoys [24].
Chapter 4
Duality theory
In Chapter 3, we introduced the notion of a dual of a linear program, and demonstrated how
this can be used to design algorithms to solve optimization problems, such as the shortest path
problem and the minimum cost perfect matching problem in bipartite graphs. In this chapter,
we shall generalize these results, and develop a general theoretical framework.
In Chapter 3 we defined the dual of the linear program
min{cT x : Ax ≥ b, x ≥ 0} (4.1)
to be
max{bT y : AT y ≤ c, y ≥ 0}, (4.2)
and the dual of
min{cT x : Ax = b, x ≥ 0} (4.3)
to be
max{bT y : AT y ≤ c}. (4.4)
Our goal in this section is to generalize these definitions, as well as the corresponding weak
duality theorems (Theorem 3.2 and Theorem 3.9).
Consider the following table describing a pair of linear programs. The notation Ax ? b
(resp. AT y ? c) indicates that the constraints of (Pmax ) and (Pmin ) are arbitrary linear constraints,
        (Pmax )                                   (Pmin )

        max cT x                                  min bT y
        subject to                                subject to
        Ax ? b                                    AT y ? c
        x ? 0                                     y ? 0

        constraint of (Pmax )       corresponding variable of (Pmin )
          ≤ constraint                ≥ 0 variable
          = constraint                free variable
          ≥ constraint                ≤ 0 variable

        variable of (Pmax )         corresponding constraint of (Pmin )
          ≥ 0 variable                ≥ constraint
          free variable               = constraint
          ≤ 0 variable                ≤ constraint

Table 4.1: Primal-dual pairs
i.e. the relation between the LHS and RHS of the constraint is either ≤, =, or ≥. The notation
x ? 0 and y ? 0 indicates that the variables of (Pmax ) and (Pmin ) are either non-negative, free,
or non-positive. We pair constraint i of (Pmax ), i.e. rowi (A)x ? bi , with the variable yi of (Pmin )
and we pair constraint j of (Pmin ), i.e. col j (A)y ? c j , with the variable x j of (Pmax ). The
table indicates the relation between a constraint of (Pmax ) and the corresponding variable of
(Pmin ). For instance if rowi (A)x ≤ bi in (Pmax ) then we have yi ≥ 0 in (Pmin ). The table also
indicates the relation between a constraint of (Pmin ) and the corresponding variable of (Pmax ).
For instance if col j (A)y ≤ c j in (Pmin ) then we have x j ≤ 0 in (Pmax ).
Consider a pair of linear programs (Pmax ) and (Pmin ) satisfying the relations given in ta-
ble 4.1. We then define (Pmin ) to be the dual of (Pmax ) and we define (Pmax ) to be the dual
of (Pmin ). We call the pair (Pmax ) and (Pmin ) a primal-dual pair. The reader should verify
that these definitions imply in particular that the dual of the dual of a linear program (P) is the
original linear program (P).
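The correspondence of Table 4.1 is mechanical, and it can help to see it written out as a lookup. The Python sketch below is our illustration only; the two dictionaries simply transcribe the table and return the "shape" of the dual (Pmin ) from the shape of a maximization problem (Pmax ).

# Transcription of Table 4.1: building the shape of the dual (Pmin)
# from the shape of a maximization problem (Pmax).

CONSTRAINT_TO_VARIABLE = {   # constraint of (Pmax) -> variable of (Pmin)
    '<=': '>= 0',
    '=':  'free',
    '>=': '<= 0',
}
VARIABLE_TO_CONSTRAINT = {   # variable of (Pmax) -> constraint of (Pmin)
    '>= 0': '>=',
    'free': '=',
    '<= 0': '<=',
}

def dual_shape(constraint_senses, variable_signs):
    """Given the senses of Ax ? b and the signs of x ? 0 in (Pmax),
    return the signs of y ? 0 and the senses of A^T y ? c in (Pmin)."""
    y_signs = [CONSTRAINT_TO_VARIABLE[s] for s in constraint_senses]
    dual_senses = [VARIABLE_TO_CONSTRAINT[s] for s in variable_signs]
    return y_signs, dual_senses

# Example: max{c^T x : Ax <= b, x >= 0} has dual min{b^T y : A^T y >= c, y >= 0}.
print(dual_shape(['<=', '<=', '<='], ['>= 0', '>= 0']))
# (['>= 0', '>= 0', '>= 0'], ['>=', '>='])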
Example 15. Let us apply these definitions to try to find the dual of the linear program (4.1).
This linear program can be rewritten as,
Since,
which is equal to (4.2) if we define y as being equal to x̂. Similarly, (4.4) is the dual of (4.3).
Hence, our new definition of dual is consistent with the definitions given in sections 3.1
and 3.2.
Example 16. Consider the following linear program,
Suppose we wish to find the dual of (P). Since (P) is a maximization problem, it plays the role
of (Pmax ) in table 4.1. It follows that the dual (D) of (P) is of the form (Pmin ) in table 4.1,
Since,
We proved in Theorem 3.2 that if x̄ is a feasible solution to (4.1) and ȳ is a feasible solution
to (4.2) then cT x̄ ≥ bT ȳ. Similarly, we proved in Theorem 3.9 that if x̄ is a feasible solution
to (4.3) and ȳ is a feasible solution to (4.4) then cT x̄ ≥ bT ȳ. The following is a common
generalization to both of these results.
Theorem 4.1 (Weak Duality Theorem). Let (P) and (D) be a primal-dual pair where (P) is a
maximization problem and (D) a minimization problem. Let x̄ and ȳ be feasible solutions
for (P) and (D), respectively. Then:
(1) cT x̄ ≤ bT ȳ;
(2) if cT x̄ = bT ȳ, then x̄ is an optimal solution to (P) and ȳ is an optimal solution to (D).
Observe that (2) in the previous theorem follows immediately from (1). This is because, as
ȳ is feasible, (1) implies that for every feasible solution x of (P) we have cT x ≤ bT ȳ, i.e. bT ȳ is an
upper bound for (P). But then x̄ is an optimal solution to (P), since its value is equal to this upper
bound. The argument to prove that ȳ is an optimal solution to (D) is similar.
Example 16, continued. Consider the primal-dual pair given by (P) and (D). Let
It can be readily checked that x̄ is feasible for (P) and ȳ is feasible for (D). Moreover, (2, 2, 13)ȳ =
(12, 26, 20)x̄ = −18. It follows from Theorem 4.1(2) that x̄ is an optimal solution to (P) and
that ȳ is an optimal solution to (D).
Proof of Theorem 4.1. Let (P) be an arbitrary linear program where the goal is to maximize
Let x̄ be a feasible solution to (4.5) and let ȳ be a feasible solution to (4.6). Then for s̄ := b−Ax̄,
the vectors x̄, s̄ are a solution to (4.7). Moreover, for w̄ := c − AT ȳ, the vectors ȳ, w̄ are a
solution to (4.8). Since Ax̄ + s̄ = b we have,
ȳT b = ȳT (Ax̄ + s̄) = (ȳT A)x̄ + ȳT s̄ = (c − w̄)T x̄ + ȳT s̄ = cT x̄ − w̄T x̄ + ȳT s̄,
where the third equality follows from the fact that AT ȳ + w̄ = c, or equivalently that ȳT A = (c − w̄)T .
Hence, to prove that bT ȳ ≥ cT x̄ it suffices to show that w̄T x̄ ≤ 0 and ȳT s̄ ≥ 0. For the former,
w̄T x̄ = ∑_{j∈C1} w̄j x̄j + ∑_{j∈C2} w̄j x̄j + ∑_{j∈C3} w̄j x̄j ≤ 0,
since in the first sum w̄j ≤ 0 and x̄j ≥ 0, in the second sum w̄j ≥ 0 and x̄j ≤ 0, and in the third
sum w̄j = 0. A similar argument shows that ȳT s̄ ≥ 0.
We close this section by noting the following consequences of the Weak Duality Theorem
and the Fundamental Theorem of linear programming:
1. if (D) is feasible, then (P) is not unbounded;
2. if (P) is feasible, then (D) is not unbounded;
3. if (P) and (D) are both feasible, then (P) and (D) both have optimal solutions.
Proof. We may assume that (P) is a maximization problem with objective function cT x and
that (D) is a minimization problem with objective function bT y. (1) Suppose that (D) has
a feasible solution ȳ. Then by the Weak duality (Theorem 4.1), bT ȳ is an upper bound for
(P). In particular, (P) is not unbounded. (2) Suppose that (P) has a feasible solution x̄. Then
by the Weak duality (Theorem 4.1), cT x̄ is a lower bound for (D). In particular, (D) is not
unbounded. (3) By (1) we know that (P) is not unbounded and by (2) we know that (D) is
not unbounded. Since by the Fundamental Theorem of linear programming (see Theorem
2.11) (P) is either unbounded, infeasible, or has an optimal solution, (P) must have an optimal
solution. By a similar argument (D) has an optimal solution.
It is often easy when given a primal-dual pair to verify that both linear programs are feasible.
In that case statement (3) in the previous result implies immediately that both linear programs
have an optimal solution.
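As a small numerical illustration of weak duality (with made-up data, not taken from the notes), the snippet below checks a candidate primal-dual pair for max{cT x : Ax ≤ b, x ≥ 0} and min{bT y : AT y ≥ c, y ≥ 0} and confirms that cT x̄ ≤ bT ȳ.

import numpy as np

# Toy primal-dual pair for max{c^T x : Ax <= b, x >= 0} and
# min{b^T y : A^T y >= c, y >= 0}.  Data is illustrative only.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

x_bar = np.array([1.0, 1.0])     # feasible for the primal: A x_bar = (3, 4) <= b
y_bar = np.array([0.5, 1.0])     # feasible for the dual:  A^T y_bar = (3.5, 2) >= c

assert np.all(A @ x_bar <= b + 1e-9) and np.all(x_bar >= 0)
assert np.all(A.T @ y_bar >= c - 1e-9) and np.all(y_bar >= 0)

print(c @ x_bar, b @ y_bar)      # 5.0 8.0 -- weak duality: 5.0 <= 8.0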
Exercises
(a)
max (1, 3, 1)x
subject to
(1, 2, 7)x ≤ −3
(0, 1, −1)x = 9
(9, 0, 0)x ≤ 5
(1, −1, 1)x = 0
x1 , x2 ≥ 0, x3 free
(b)
min (5, 6)x
subject to
(3, −1)x ≥ 8
(2, 4)x = −2
(3, 2)x ≤ 2
(−1, 2)x = −3
x1 ≥ 0, x2 free
(a)
min cT x + d T u
subject to
Ax + Du = b
x≥0
where A and D are matrices, c, d and b are vectors, and x and u are column vectors of
variables.
(b)
max cT x
subject to
Ax ≥ b
Dx − Iu ≤ d
x ≤ 0, u ≥ 0
where A and D are matrices, I is the identity matrix, c, b and d are vectors, and x and u
are column vectors of variables.
Note, assume that all vectors and matrices are of suitable sizes so that the constraints and
objective function are well-defined.
4. Give an example of an infeasible LP problem whose dual is also infeasible. You need to
prove algebraically that your example and its dual are infeasible.
[Figure: a directed graph on vertices 1, . . . , 6 with arcs labelled a, b, c, e, f, g, h, k, l.]
(f) Find an alternate proof to the statement you proved in (a) by using (e) and Weak
Duality.
(g) Show that if (P) has a feasible solution of value greater than 0, then G⃗ has a directed
cycle.
(h) Using (g), prove that if G⃗ has no directed cycle, then G⃗ must have a topological ordering.
6. Consider a graph G. A dyad of G is a pair of edges that share exactly one vertex. A packing
of dyads is a set S of dyads with the property that no two dyads in S share a common
vertex. In the following example the dyads are represented by bold edges.
(a) Show that the following IP finds a packing of dyads of maximum cardinality.
(b) Denote by (P) the linear program obtained from (IP) by relaxing the conditions that x
be integer. Find the dual (D) of (P).
(c) Suppose that G is bipartite and let A, B be a partition of V such that all edges are going
between A and B. Find a solution for (D) of value min{|A|, |B|}.
(d) Using (c) show that every packing of dyads has cardinality at most min{|A|, |B|}.
4.2 Strong duality
Theorem 4.3 (Strong Duality). Let (P) and (D) be a primal-dual pair. If there exists an
optimal solution x̄ of (P) then there exists an optimal solution ȳ of (D). Moreover, the value of
x̄ in (P) equals the value of ȳ in (D).
We will only prove this theorem for the special case when (P) is in SEF. It is routine,
however, to derive the result for an arbitrary linear program by first converting it
to an equivalent linear program in SEF.
Proof of Theorem 4.3 for SEF. If (P) is in SEF, the primal-dual pair (P) and (D) is of the form,
max{cT x : Ax = b, x ≥ 0} (P)
min{bT y : AT y ≥ c}. (D)
We know from Theorem 2.7 that the 2-phase Simplex procedure terminates. Since (P) has
an optimal solution, the simplex will terminate with an optimal basis B. Let us rewrite (P) in
canonical form for B (see Proposition 2.4),
where
ȳ = AB^{−T} cB and c̄T = cT − ȳT A.
(P') has the property that for any feasible solution the value in (P) and (P') is the same. (P')
also has the property that c̄B = 0. Hence,
z = cT x̄ = ȳT b + c̄T x̄ = bT ȳ + c̄N^T x̄N + c̄B^T x̄B = bT ȳ,
where the last two terms vanish since x̄N = 0 and c̄B = 0.
Since the simplex procedure stopped we must have c̄ ≤ 0 i.e.
cT − ȳT A ≤ 0 or equivalently AT ȳ ≥ c.
It follows that ȳ is feasible for (D). Since cT x̄ = bT ȳ we know from Weak Duality (Theo-
rem 4.1) that x̄ is an optimal solution to (P) and ȳ is an optimal solution to (D).
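The proof suggests a concrete recipe: once the Simplex method stops with an optimal basis B, the vector ȳ = AB^{−T} cB is an optimal dual solution. The numpy sketch below (ours, with a small made-up instance in SEF) carries out this computation and checks that the reduced costs are non-positive and that the two objective values agree.

import numpy as np

# max{c^T x : Ax = b, x >= 0} in SEF; suppose the Simplex stopped with the
# optimal basis B = {0, 1} (0-based column indices).  Small made-up example.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])
B = [0, 1]

A_B = A[:, B]
y_bar = np.linalg.solve(A_B.T, c[B])        # y = A_B^{-T} c_B
c_red = c - A.T @ y_bar                     # reduced costs c^T - y^T A

x_bar = np.zeros(4)
x_bar[B] = np.linalg.solve(A_B, b)          # basic feasible solution for B

print(y_bar, c_red)          # [1. 1.] [ 0.  0. -1. -1.]  (c_red <= 0: optimal)
print(c @ x_bar, b @ y_bar)  # 5.0 5.0 -- primal and dual objective values coincide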
In the Simplex we consider (P) of the form max{cT x : Ax = b, x ≥ 0}. At each iteration we
have a feasible basis B. The vector x̄ is the basic feasible solution for B. Hence, property
(A) holds. We define ȳ as AB^{−T} cB , which implies (see the proof of Theorem 4.3) that (C)
holds. The algorithm terminates when (B) is satisfied. Thus the simplex keeps two out of
the three properties (A),(B),(C) satisfied at each step, and terminates when the third property
holds. There are other algorithms that follow the same scheme. The dual Simplex procedure
maintains properties (B),(C) at each iteration and terminates when (A) holds. Interior point
methods maintain properties (A),(B) at each iteration and terminate when (C) holds.
Exercises
1. Let (P) denote the LP problem max{cT x : Ax ≤ b, x ≥ 0}, and let (D) be the dual of (P).
The goal of this exercise is to prove the strong Duality Theorem for (P) and (D). (In the
Course Notes we proved strong duality only for problems in SEF.)
(c) Suppose that (P) has an optimal solution x. Construct an optimal solution x0 of (P’).
(d) Using Theorem 4.3 for problems in SEF, deduce that there exists an optimal solution
y0 to (D’) where the value of x0 for (P’) is equal to the value of y0 for (D’).
2. Consider the following linear program (P), max{cT x : Ax ≤ b, x ≥ 0}. Let (D) denote the
dual of (P), and suppose that (P) has an optimal solution x̄ and (D) has an optimal solution ȳ.
(a) Let (P2) denote the LP resulting by multiplying the first inequality of (P) by 2, and let
(D2) denote the dual of (P2). Find optimal solutions of (P2) and (D2) in terms of x̄
and ȳ. Justify your answers.
(b) Let (P3) denote the LP obtained from (P) by replacing the constraint Ax ≤ b by MAx ≤
Mb where M is a square non-singular matrix of suitable dimension. Let (D3) denote
the dual of (P3). Find optimal solutions of (P3) and (D3) in terms of x̄ and ȳ. Justify
your answers.
Assume that (P) has a feasible solution. The goal of the exercise is to use duality to show
that (P) is unbounded if and only if there exists r such that
(e) Deduce from f) that there exists a feasible solution r̄ to (P’) that satisfies (?).
4.3 A geometric characterization of optimality
max{cT x : Ax ≤ b} (4.10)
min{bT y : AT y = c, y ≥ 0}. (4.11)
Let us revisit the proof of Weak Duality (Theorem 4.1) for this particular case. After
adding slack variables s, we can rewrite (4.10) as,
Let x̄ be a feasible solution to (4.10) and let ȳ be a feasible solution to (4.11). Then for
s̄ = b − Ax̄, the vectors x̄, s̄ are a solution to (4.12). Since Ax̄ + s̄ = b we have,
where the last equality follows from the fact that ȳ is feasible for (4.11). We know from the
Strong Duality Theorem 4.3 that x̄ and ȳ are both optimal for (4.10) and (4.11) respectively,
if and only if cT x̄ = bT ȳ, or equivalently ȳT s̄ = 0. Let m denote the number of constraints
of (4.10); then
ȳT s̄ = ∑_{i=1}^{m} s̄i ȳi .    (?)
As s̄i ≥ 0 and ȳi ≥ 0, every term in the sum (?) is non-negative. It follows ȳT s̄ = 0 if and only
if for every i ∈ {1, . . . , m}, s̄i = 0 or ȳi = 0. Note, that s̄i = 0 means that constraint i of (4.10)
holds with equality for x̄; we will sometimes also say that this constraint is tight for x̄. The
following result summarizes these observations,
Theorem 4.5 (Complementary Slackness - special case). Let x̄ be a feasible solution to (4.10)
and let ȳ be a feasible solution to (4.11). Then x̄ is an optimal solution to (4.10) and ȳ is an
optimal solution to (4.11) if and only if for every row index i of A, constraint i of (4.10) is
tight for x̄ or the corresponding dual variable ȳi = 0.
max (5, 3, 5)x
subject to
(1, 2, −1)x ≤ 2
(3, 1, 2)x ≤ 4                       (P)
(−1, 1, 1)x ≤ −1
and
min (2, 4, −1)y
subject to
(1, 3, −1)y = 5
(2, 1, 1)y = 3                       (D)
(−1, 2, 1)y = 5
y ≥ 0
It can be readily checked that x̄ = (1, −1, 1)T is a feasible solution to (P) and that ȳ = (0, 2, 1)T
is a feasible solution to (D). Theorem 4.5 states that to show that x̄ is an optimal solution to
(P) and that ȳ is an optimal solution to (D) it suffices to check the following conditions:
• ȳ1 = 0 ✓ OR (1, 2, −1)x̄ = 2;
• ȳ2 = 0 OR (3, 1, 2)x̄ = 4 ✓;
• ȳ3 = 0 OR (−1, 1, 1)x̄ = −1 ✓.
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, all
OR conditions are satisfied. It follows that x̄ is optimal for (P) and that ȳ is optimal for (D).
Consider now the feasible solution x′ = (2, −2, 0)T of (P). The conditions of Theorem 4.5 do
not hold for the pair x′ and ȳ, since (−1, 1, 1)x′ < −1 and ȳ3 = 1 > 0. Observe that in this case
Theorem 4.5 only allows us to deduce that either x′ is not optimal for (P), or ȳ is not optimal
for (D). The theorem does not give any indication as to which outcome occurs.
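Checks like the one above are easy to automate. The Python sketch below (ours) encodes (P) and (D) from this example and tests the conditions of Theorem 4.5 for x̄ = (1, −1, 1)T and ȳ = (0, 2, 1)T, and then for x′ = (2, −2, 0)T.

import numpy as np

# (P): max (5,3,5)x s.t. Ax <= b;  (D): min b^T y s.t. A^T y = c, y >= 0.
A = np.array([[ 1.0, 2.0, -1.0],
              [ 3.0, 1.0,  2.0],
              [-1.0, 1.0,  1.0]])
b = np.array([2.0, 4.0, -1.0])

def cs_holds(x, y, tol=1e-9):
    """Theorem 4.5: for every row i, constraint i is tight for x or y_i = 0."""
    slack = b - A @ x
    return all(slack[i] <= tol or y[i] <= tol for i in range(len(b)))

x_bar = np.array([1.0, -1.0, 1.0])
y_bar = np.array([0.0,  2.0, 1.0])
print(cs_holds(x_bar, y_bar))       # True  -> both solutions are optimal

x_prime = np.array([2.0, -2.0, 0.0])
print(cs_holds(x_prime, y_bar))     # False -> no conclusion from Theorem 4.5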
Theorem 4.5 generalizes to arbitrary linear programs. Consider a primal-dual pair (Pmax )
and (Pmin ) as given in table 4.1. Let x̄ be a feasible solution to (Pmax ) and let ȳ be a feasible
solution to (Pmin ). We say that x̄, ȳ satisfy the Complementary Slackness (CS) conditions if:
• for every variable x̄j of (Pmax ), either x̄j = 0 or the corresponding constraint j of (Pmin ) holds with equality for ȳ, and
• for every variable ȳi of (Pmin ), either ȳi = 0 or the corresponding constraint i of (Pmax ) holds with equality for x̄.
Note that if a variable x j is free then the corresponding constraint j of (Pmin ) is an equality
constraint, hence the condition is trivially satisfied. Similarly, if yi is free then the corre-
sponding constraint i of (Pmax ) is an equality constraint. Thus we only need to state the CS
conditions for non-free variables of (Pmax ) and (Pmin ).
We leave as an exercise the following generalization of Theorem 4.5,
Theorem 4.6 (Complementary Slackness Theorem). Let (P) and (D) be an arbitrary primal-
dual pair. Let x̄ be a feasible solution to (P) and let ȳ be a feasible solution to (D). Then x̄ is
an optimal solution to (P) and ȳ is an optimal solution to (D) if and only if the Complementary
Slackness conditions hold.
It can be readily checked that x̄ = (5, −3, 0)T is a feasible solution to (P) and that ȳ =
(0, 4, −2)T is a feasible solution to (D). Theorem 4.6 states that to show that x̄ is an opti-
mal solution to (P) and that ȳ is an optimal solution to (D) it suffices to check the following
conditions:
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, all
OR conditions are satisfied. It follows that x̄ is optimal for (P) and that ȳ is optimal for (D).
Observe that in the third condition, both OR alternatives are satisfied. This does not contradict
the theorem: the CS conditions require that at least one of the alternatives holds, but they do
not preclude both alternatives from holding simultaneously.
Exercises
1. For each of (a) and (b) do the following:
(a)
max (−2, −1, 0)x
subject to
(1, 3, 2)x ≥ 5                       (P)
(−1, 4, 2)x ≤ 7
x1 ≤ 0, x2 ≥ 0
and
x̄ = (−1, 0, 3)T ȳ = (−1, 1)T .
(b)
min (−5, −7, −5)x
subject to
(1, 2, 3)x ≤ 3
(4, 5, 6)x ≤ 9                       (P)
(7, 8, 9)x ≥ 2
x1 , x2 , x3 ≥ 0
and
x̄ = (1, 1, 0)T ȳ = (−1, −1, 0)T .
2. Recall that (3.9) is the IP formulation of the shortest path problem. Let (P) denote the
linear programming relaxation of (3.9). The dual of (P) is given by (3.15).
(a) Write the complementary slackness conditions for (P) and (3.15).
(b) Using the CS theorem (theorem 4.6) prove Proposition 3.6.
3. Recall that (3.24) is the IP formulation of the minimum cost matching problem. Let (P)
denote the linear programming relaxation of (3.24). The dual of (P) is given by (3.30).
(a) Write the complementary slackness conditions for (P) and (3.30).
(b) Using the CS theorem (theorem 4.6) prove Proposition 3.8.
4. (a) Consider the LP problem max{cT x : Ax = b}. Prove that it has an optimal solution
if and only if cT is in the row space of A.
HINT: Consider the CS conditions.
(b) Consider the LP problem, max{cT x : 1T x = 250}. Under what conditions on c does
this problem have an optimal solution? Justify your answer.
max cT x
subject to (4.13)
Ax ≤ b
Let x̄ and ȳ be feasible solutions to (P) and (D) respectively. We say that x̄ and ȳ almost
satisfy the CS conditions if for every i = 1, . . . , m at least one of the following holds:
(CS1) ȳi = 0,
(CS2) rowi (A)x̄ = bi ,
(CS3) ȳi ≤ 1 and bi − rowi (A)x̄ ≤ 1/1000.
Suppose that A has 100 rows and that x̄ and ȳ almost satisfy the CS conditions.
(b) Show that the value of an optimal solution of (P) cannot exceed the value of x̄ by more
than 1/10.
(c) Show that the value of an optimal solution of (D) cannot be inferior to the value of ȳ
by more than 1/10.
6. A factory makes n products from m resources. The jth product earns a profit of c j dollars
per unit and consumes ai j units of resource i per unit. We have bi units of resource i avail-
able. The production problem is to decide how many units to produce of each product
such that total profit is maximized while not exceeding the available resources. The following
linear program (P) solves the production problem (note that x j is the number of units of
product j produced):
max ∑_{j=1}^{n} c_j x_j
subject to
∑_{j=1}^{n} a_{ij} x_j ≤ b_i    (i = 1, . . . , m)
x_j ≥ 0    (j = 1, . . . , n)
4.3.2 Geometry
Before we can characterize optimal solutions to linear programs geometrically, we will need
a number of preliminary definitions.
Let a(1) , . . . , a(k) be a set of vectors in Rn . We define the cone generated by a(1) , . . . , a(k)
to be the set
C = { ∑_{i=1}^{k} λi a(i) : λi ≥ 0 for all i = 1, . . . , k },
i.e., C is the set of all points that can be obtained by multiplying each of a(1) , . . . , a(k) by a
non-negative number and adding all of the resulting vectors together. We denote the set C by
cone{a(1) , . . . , a(k) }. Note, we define the cone generated by an empty set of vectors to be the
set consisting of the zero vector only, i.e. the set {0}.
In the following figure, we represent each of the vectors a(1) , a(2) , a(3) by an arrow from the
origin; the infinite region containing a(2) , bounded by the half-lines from the origin determined
by a(1) and a(3) , is cone{a(1) , a(2) , a(3) }.
[Figure: the vectors a(1) , a(2) , a(3) drawn in the plane, together with the cone they generate.]
We define the cone of tight constraints for x̄ to be the cone C generated by the rows of A
corresponding to the tight constraints, i.e. C = cone{rowi (A)T : i ∈ J(x̄)}.
max (3/2, 1/2)x
s.t.
(1, 0)x ≤ 2    (1)
(1, 1)x ≤ 3    (2)          (4.14)
(0, 1)x ≤ 2    (3)
Note, x̄ = (2, 1)T is a feasible solution. The tight constraints for x̄ are constraints (1) and (2).
Hence, the cone of tight constraints for x̄ is,
C := cone{ (1, 0)T , (1, 1)T }.
Theorem 4.7. Let x̄ be a feasible solution to max{cT x : Ax ≤ b}. Then, x̄ is optimal if and
only if c is in the cone of tight constraints for x̄.
Example 20, continued. The objective function of (4.14) is z = (3/2, 1/2)x, and (3/2, 1/2)T ∈ C since
(3/2, 1/2)T = ȳ1 (1, 0)T + ȳ2 (1, 1)T   for ȳ1 = 1 ≥ 0 and ȳ2 = 1/2 ≥ 0.    (4.15)
It follows from Theorem 4.7 that x̄ = (2, 1)T is an optimal solution of (4.14). We illustrate
this result in the following figure. Note, in that figure the cone of tight constraints is drawn
with its origin shifted to the point x̄.
We will use the linear program (4.14) with x̄ = (2, 1)T to illustrate one direction of the
proof of Theorem 4.7, namely that if the objective function is in the cone of the tight con-
straints for x̄, then x̄ is an optimal solution. Observe that because of Theorem 4.5 it will
suffice to,
[Figure: the feasible region of (4.14), the point x̄ = (2, 1)T, and the cone of tight constraints drawn with its origin shifted to x̄.]
Step 1. Let ȳ1 := 1, ȳ2 := 1/2, and ȳ3 := 0. Then (4.15) implies that
(3/2, 1/2)T = ȳ1 (1, 0)T + ȳ2 (1, 1)T + ȳ3 (0, 1)T = AT ȳ,   where ȳ = (ȳ1 , ȳ2 , ȳ3 )T .
Hence, ȳ = (ȳ1 , ȳ2 , ȳ3 )T ≥ 0 is a feasible solution to (4.16). In this argument we used the fact
that c = (3/2, 1/2)T is in the cone of the tight constraints to obtain values for the dual variables
of the tight constraints, and we assigned the value zero to all dual variables corresponding to
non-tight constraints.
• y1 = 0 OR x1 = 2 ✓;
• y2 = 0 OR x1 + x2 = 3 ✓;
• y3 = 0 ✓ OR x2 = 2.
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, the CS
conditions are satisfied. Note, because of the way the dual variables were defined in Step 1, if
the constraints are not tight, then the corresponding dual variable is zero. Hence, the fact that
the CS conditions hold is no surprise.
We completed Step 1 and Step 2, hence x̄ = (2, 1)T is optimal.
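Testing whether c lies in the cone of tight constraints amounts to solving a nonnegative system, which can be sketched with a nonnegative least-squares solve: a zero residual means c is a nonnegative combination of the tight rows. The snippet below (our illustration, using scipy.optimize.nnls) does this for (4.14) at x̄ = (2, 1)T.

import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([2.0, 3.0, 2.0])
c = np.array([1.5, 0.5])
x_bar = np.array([2.0, 1.0])

tight = np.isclose(A @ x_bar, b)      # constraints (1) and (2) are tight at x_bar
rows = A[tight]                       # generators of the cone of tight constraints

# Find lambda >= 0 minimizing ||rows^T lambda - c||; a zero residual means c is
# in the cone of tight constraints, so x_bar is optimal by Theorem 4.7.
lam, residual = nnls(rows.T, c)
print(lam, residual)                  # approximately [1.  0.5] and 0.0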
max{cT x : Ax ≤ b} (P)
min{bT y : AT y = c, y ≥ 0} (D)
We need to show that x̄ is an optimal solution to (P). Because of Theorem 4.5 it will suffice to,
Step 1. Recall that J(x̄) denotes the row indices of the tight constraints of Ax ≤ b for x̄. Since c is
in the cone of the tight constraints for x̄, i.e. c ∈ cone{rowi (A)T : i ∈ J(x̄)}, there exist ȳi ≥ 0 for
all i ∈ J(x̄) such that
c = ∑ ( ȳi rowi (A)T : i ∈ J(x̄) ).
Let us assign the value zero to all dual variables corresponding to non-tight constraints, i.e. we
set ȳi = 0 for all row indices i ∉ J(x̄). Then
c = ∑ ( ȳi rowi (A)T : i = 1, . . . , m ) = AT (ȳ1 , . . . , ȳm )T .
Step 2. Because of the way the dual variables were defined in Step 1, for every row index i either
i ∈ J(x̄) and the constraint rowi (A)x ≤ bi is tight for x̄, or the corresponding dual variable ȳi = 0.
Hence, the CS conditions in (?) hold.
We completed Step 1 and Step 2, hence x̄ is optimal for (P).
We need to show that c is in the cone of the tight constraints for x̄. Theorem 4.5 implies that
there exists a feasible solution ȳ of (D) such that the CS conditions (?) hold for x̄, ȳ. Define
J := {i : ȳi > 0}. Then
c = AT ȳ = ∑ ( ȳi rowi (A)T : i = 1, . . . , m ) = ∑ ( ȳi rowi (A)T : i ∈ J ).
Hence, c ∈ cone{rowi (A)T : i ∈ J}. Finally, observe that if i ∈ J, then ȳi > 0 which implies by
the CS conditions (?) that constraint i of Ax ≤ b is tight for x̄. Hence, J ⊆ J(x̄), and c is in the
cone of the tight constraints for x̄.
Exercises
1. Consider the following linear program (P):
max 7x1 + 3x2
s.t.
x1 − x2 ≤ 1 (1)
x1 + x2 ≤ 5 (2)
2x1 − 3x2 ≤ 1 (3)
(a) Let I be the index set of the tight constraints for x̄ and let
Prove that (7, 3)T ∈ C and deduce that x̄ is an optimal solution to (P).
(b) Find the dual (D) of (P).
(c) Write the complementary slackness conditions for (P) and (D).
(d) Using the fact that (7, 3)T ∈ C, construct a feasible solution ȳ to (D) such that x̄ and ȳ
satisfy the complementary slackness conditions. Deduce that x̄ is an optimal solution.
Theorem 4.8 (Farkas' Lemma). Let A be an m × n matrix and let b be a vector with m entries.
Then exactly one of the following statements is true:
(1) the system Ax = b, x ≥ 0 has a solution;
(2) there exists a vector y such that AT y ≥ 0 and bT y < 0.
Proof.
Let us proceed by contradiction and suppose that there exists a solution x̄ to Ax = b, x ≥ 0 and
that we can find ȳ such that ȳT A ≥ 0T and ȳT b < 0. Since Ax̄ = b is satisfied, we must also
satisfy ȳT Ax̄ = ȳT b. Since ȳT A ≥ 0T and x̄ ≥ 0, it follows that ȳT Ax̄ ≥ 0. Then 0 ≤ ȳT Ax̄ =
ȳT b < 0, a contradiction.
Suppose now that the system Ax = b, x ≥ 0 has no solution, and consider the primal-dual pair
max{0T x : Ax = b, x ≥ 0} (P)
min{bT y : AT y ≥ 0}. (D)
By hypothesis, (P) is infeasible. Observe that (D) is feasible since AT 0 ≥ 0. It follows from the
possible outcomes for primal-dual pairs (see table 4.2) that (D) is unbounded. In particular,
there exists a feasible solution ȳ of (D) with negative value, i.e. AT ȳ ≥ 0 and bT ȳ < 0 as
required.
Recall, that a vector ȳ satisfying the conditions of Theorem 4.8(2) is called a certificate of
infeasibility. Next we show that if a system Ax = b, x ≥ 0 has no solution then by solving the
auxiliary problem with the Simplex algorithm we can obtain a certificate of infeasibility.
Suppose that Ax = b, x ≥ 0 has no solution and let us assume that b ≥ 0 (we leave it as an
exercise to modify the argument for an arbitrary vector b). Assume that A is an m × n matrix.
Consider the auxiliary problem
max{ dT (x1 , . . . , xn+m )T : [A I](x1 , . . . , xn+m )T = b, x ≥ 0 },    (Q)
where d is the vector with n + m entries such that d j = 0 for j = 1, . . . , n and d j = −1 for
j = n + 1, . . . , n + m. Let B be the optimal basis for (Q) obtained by the Simplex algorithm.
The objective function of the linear program (Q') obtained by rewriting (Q) in canonical form for
the basis B is
max ( dT − ȳT [A I] ) (x1 , . . . , xn+m )T + ȳT b =: d̄T (x1 , . . . , xn+m )T + ȳT b
for some vector ȳ (see Proposition 2.4). The value of the objective function in (Q’) for the
basic solution of B is equal to bT ȳ. Since Ax = b, x ≥ 0 has no feasible solution, that value is
negative (see Remark 2.10; note that we consider a maximization problem here rather than a
minimization problem as in Section 2.6). Hence,
bT ȳ < 0. (4.17)
Equations (4.17) and (4.18) now imply that ȳ is the required certificate of infeasibility.
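A certificate of infeasibility can also be found with an off-the-shelf LP solver: since certificates can be rescaled, a ȳ with AT ȳ ≥ 0 and bT ȳ < 0 exists if and only if the system AT y ≥ 0, bT y ≤ −1 is feasible. The sketch below (ours, using scipy.optimize.linprog on made-up data) tries the primal system first and then searches for such a ȳ.

import numpy as np
from scipy.optimize import linprog

# An infeasible system Ax = b, x >= 0 (illustrative data): the two equations
# force x1 + x2 to equal both 1 and 3.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 3.0])

primal = linprog(c=np.zeros(2), A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
print(primal.status)        # 2 -> the primal system is infeasible

# Farkas certificate: find y with A^T y >= 0 and b^T y <= -1.
# (linprog works with <= constraints, so A^T y >= 0 is written as -A^T y <= 0.)
m = A.shape[0]
cert = linprog(c=np.zeros(m),
               A_ub=np.vstack([-A.T, b.reshape(1, -1)]),
               b_ub=np.concatenate([np.zeros(A.shape[1]), [-1.0]]),
               bounds=[(None, None)] * m)
y = cert.x
print(y, A.T @ y, b @ y)    # e.g. y = (0.5, -0.5): A^T y = 0 >= 0 and b^T y = -1 < 0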
Exercises
1. (a) Prove that exactly one of the following statements holds:
• there exists x such that Ax ≤ b, x ≥ 0;
• there exists y ≥ 0 such that AT y ≥ 0 and bT y < 0.
2. Let a1 , . . . , an , b be vectors with m entries and let A be the matrix with columns a1 , . . . , an .
Consider the following theorem: exactly one of the following statements holds:
(S1) b ∈ cone{a1 , . . . , an },
(S2) there exists a halfspace H = {x : d T x ≤ 0} that contains all of a1 , . . . , an but does
not contain b.
(a) Explain what the theorem says when all vectors a1 , . . . , an , b have two entries.
(b) Show that (S1) holds if and only if there exists x such that Ax = b and x ≥ 0.
(c) Show that (S2) holds if and only if there exists y such that AT y ≥ 0 and bT y < 0.
(d) Prove the theorem above using parts (b) and (c) and Farkas’ lemma.
3. Consider the LP problems in standard inequality form, with data (A, b, c) such that A =
−AT (skew-symmetric), b = −c. Prove that the dual of such an LP problem is equivalent
to the original LP problem (such problems are called self-dual). Apply the duality theory
we developed to such LP problems to prove as many interesting properties as you can.
Chapter 5
Applications of duality*
In Chapter 3, we have seen two examples for how the framework of duality theory leads
to algorithmic insights. Both in the case of the shortest-path problem, and in that of the
minimum-cost perfect matching problem we presented efficient algorithms that are primal-
dual. Such algorithms compute feasible solutions for the respective primal problems but also,
at the same time, construct feasible solutions for the respective duals. The dual solution is
in the end used as a certificate of optimality, but crucially provides algorithmic guidance
in finding the primal solution. In this chapter, we discuss a few more select examples that
demonstrate the diverse use of duality.
of U. Each set S j has a non-negative cost cS j associated with it, and the goal is to find a
collection C ⊆ S of sets of the smallest cost such that each element ei is contained in at least
one of the sets in C .
The problem has a natural integer programming formulation. We introduce an indicator
variable xS for every set S ∈ S that takes on value 1 if set S is chosen in our solution. The IP
has a constraint for every element e ∈ U that forces a feasible solution to pick at least one set
containing the element.
min ∑ ( cS xS : S ∈ S )
subject to
∑ ( xS : S ∈ S , e ∈ S ) ≥ 1    (e ∈ U)          (5.1)
x ≥ 0, x integer
Following the proven strategy from Chapter 3, we could now simply look at the linear pro-
gramming relaxation of the above integer program, and derive its dual. We could then design
an algorithm that finds a primal set-cover solution as well as a feasible dual solution whose
value equals that of the cover. This strategy worked for the shortest-path, and minimum-cost
perfect matching problems, but it is not likely to work here! The reason is that the set-cover
problem, unlike shortest-path and matching problems, is NP-hard. Here, we simply note that
few people believe that NP-hard optimization problems possess efficient exact algorithms.
The reason is that the class contains literally thousands of equivalent hard problems that are
not believed to admit efficient algorithms. Developing such an algorithm for any one of these
problems immediately implies the existence of an algorithm for all of the problems in this
class. The set-cover problem lies in this class, and thus is unlikely to be efficiently solvable.
So, now that we know that we should not expect an efficient exact algorithm for the set-
cover problem, what can be done? We settle for the next best thing: we approximate the
problem. Suppose we are given a class I of instances of some optimization (say minimiza-
tion) problem. For a given instance I ∈ I , let optI be the instance’s unknown optimal value.
We are looking for an efficient algorithm A that, given any instance I ∈ I , computes a so-
lution A (I) whose value is at most α(I) · optI , for some function α of the instance data.
Such an algorithm is called an α-approximation algorithm, and α is sometimes called its
approximation or performance guarantee.
In the case of the set-cover problem we will present two algorithms. Both algorithms employ
linear programming duality in order to compare the computed solution to the value of
the linear programming relaxation of (5.1). The first algorithm follows a primal-dual strat-
egy, similar to what we have seen in Chapter 3. The second algorithm is a so-called greedy
algorithm, and it produces a solution whose cost can once again be bounded within a certain
factor of a feasible dual solution.
cuv − yu − yv = 0,
for all uv ∈ M. We will follow a similar strategy for the set-cover problem. The comple-
mentary slackness conditions for set-cover are as follows:
[CS1] xS > 0 =⇒ ∑ ( ye : e ∈ S ) = cS , and
[CS2] ye > 0 =⇒ ∑ ( xS : e ∈ S ) = 1.
Since the set-cover problem is NP-hard, we can not expect to find an integral solution x to
(5.1), and a feasible solution y to (5.2) such that [CS1] and [CS2] hold. We will enforce
condition [CS1] and relax condition [CS2].
Our algorithm works as follows: initially, let y = 0 be the trivial feasible solution to (5.2),
and let C = ∅ be an (initially infeasible) cover. The algorithm adds sets to C as long as there is an
element e ∈ U that is not covered by any of the sets in C . Whenever such an element e exists,
we increase the corresponding dual variable ye . In fact, we increase its value by the largest
possible amount ε that maintains y’s feasibility for (5.2); ε is the minimum slack of the dual
constraints corresponding to sets containing e:
ε = min{ cS − ∑ ( ye′ : e′ ∈ S ) : S ∈ S , e ∈ S }.
Let S be a set that attains the minimum on the right-hand side of the above expression. S
is now added to C , leading to e being covered. Furthermore, note that condition [CS1] is
satisfied by set S: in the primal solution corresponding to C , the variable for S has value 1,
and its dual constraint is tight for the updated vector y by our choice of ε. See Algorithm 6
for a formal description of the algorithm.
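Algorithm 6 is not reproduced here; the Python sketch below (our rendering of the primal-dual idea just described, with illustrative data and names) raises ye for an uncovered element by the minimum slack, adds the set whose dual constraint becomes tight, and repeats.

def primal_dual_set_cover(universe, sets, cost):
    """Primal-dual sketch: sets is a dict name -> frozenset of elements,
    cost a dict name -> non-negative cost.  Returns (cover, y)."""
    y = {e: 0.0 for e in universe}
    cover, covered = [], set()
    while covered != set(universe):
        e = next(iter(set(universe) - covered))      # some uncovered element
        # Largest increase of y_e keeping every dual constraint c_S >= sum(y).
        candidates = [S for S in sets if e in sets[S]]
        slacks = {S: cost[S] - sum(y[f] for f in sets[S]) for S in candidates}
        S_min = min(slacks, key=slacks.get)
        y[e] += slacks[S_min]      # the dual constraint of S_min is now tight ([CS1])
        cover.append(S_min)
        covered |= sets[S_min]
    return cover, y

# Tiny example (illustrative data only).
sets = {'S1': frozenset({1, 2}), 'S2': frozenset({2, 3}), 'S3': frozenset({3, 4})}
cost = {'S1': 1.0, 'S2': 3.0, 'S3': 1.0}
print(primal_dual_set_cover({1, 2, 3, 4}, sets, cost))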
We define the frequency fe of element e ∈ U to be the number of sets in S that cover the
element. The frequency f of the given set-cover instance is the maximum over all element
frequencies; i.e., f = max{ fe : e ∈ U }.
Proof. The algorithm clearly returns a feasible set-cover C if there is one. Notice that the
cost of the final cover is
∑_{S∈C} cS = ∑_{S∈C} ∑_{e∈S} ye ,    (5.3)
where the equality follows directly from condition [CS1]. Observe that a variable ye may
appear multiple times in the double-sum on the right-hand side; it appears for every set in C
that covers e. The right-hand side of (5.3) can therefore be rewritten as
∑_{e∈U} ye |{S ∈ C : e ∈ S}| ≤ f · 1T y,
where the inequality follows from the fact that every element appears in at most f sets.
In other words, the algorithm returns a vertex-cover of cost at most twice the optimal
vertex-cover, for any given instance of the problem.
than f ′ · optI for any instance I? This appears to be unlikely (more on this later), but we can
sometimes improve the guarantee obtained in the previous section.
Suppose you are faced with a set-cover instance with universe U , and sets S , all of which
have unit cost. Your goal is to find the smallest number of sets that jointly cover all elements.
What would your natural strategy be? Well, you would make locally optimal decisions: you
would first pick a set that covers most elements, in the next step you would pick a set that
covers most uncovered elements, and so on. It is easy to extend this strategy to the setting
where sets have non-unit costs. Then we would want to balance between two objectives: cover
as many previously uncovered elements as possible, and do this at the smallest possible cost. One way
of balancing the two competing objectives would be to always pick a set S ∈ S with smallest
cost divided by the number of newly covered elements. The resulting Algorithm 7 is known
as the greedy algorithm for set-cover.
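Algorithm 7 is likewise easy to sketch in code. The Python function below (ours) implements the greedy rule just described and is run on the instance of Figure 5.1 with m = 4, reproducing the Hm-versus-(1 + ε) gap discussed next.

def greedy_set_cover(universe, sets, cost):
    """Greedy sketch: repeatedly pick the set with the smallest ratio
    cost[S] / (number of still-uncovered elements of S)."""
    covered, cover = set(), []
    while covered != set(universe):
        def ratio(S):
            new = len(sets[S] - covered)
            return cost[S] / new if new else float('inf')
        S_best = min(sets, key=ratio)
        cover.append(S_best)
        covered |= sets[S_best]
    return cover

# The instance of Figure 5.1 with m = 4: set S_i covers e_i at cost 1/i,
# while the big set S covers everything at cost 1 + eps.
m, eps = 4, 0.01
sets = {f'S{i}': frozenset({i}) for i in range(1, m + 1)}
sets['S'] = frozenset(range(1, m + 1))
cost = {f'S{i}': 1.0 / i for i in range(1, m + 1)}
cost['S'] = 1.0 + eps
print(greedy_set_cover(set(range(1, m + 1)), sets, cost))
# ['S4', 'S3', 'S2', 'S1'] -- total cost H_4 ~ 2.08 versus an optimum of 1.01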
Figure 5.1: An instance of the set-cover problem. Black dots correspond to elements, and
squares to sets. Shaded set Si has cost 1/i, and the large set S has cost 1 + ε for a fixed small
ε > 0.
Let us demonstrate Algorithm 7 on the set-cover instance given in Figure 5.1. The instance
has m elements, and m + 1 sets. The shaded set Si in the figure has a cost of 1/i, for all i, and
set S has cost 1 + ε, for some tiny ε > 0. In the first step, each of the shaded sets Si covers
one element and has cost 1/i, while the large set S has cost 1 + ε. Thus, our algorithm first
chooses set Sm , as its ratio of cost to newly covered elements is smaller than that of S. Similarly,
in the second step, the algorithm chooses set Sm−1 , and so on. In the end, our algorithm will
have picked up all the shaded sets, while the optimal solution clearly is to pick just the set S.
The solution computed by the greedy algorithm has cost
Hm := ∑_{i=1}^{m} 1/i ,
compared to an optimal cost of just 1 + ε. Thus, we have shown that the greedy algorithm
can not be better than Hm -approximate. We will now show that Algorithm 7 always returns a
cover of cost at most Hm times that of an optimal solution.
Proof. We will prove this theorem by once again using linear programming duality. Suppose
Algorithm 7 picks the sets
S1 , . . . , S p ,
in this order, and let C be the cover consisting of these p sets. For all 1 ≤ i ≤ p, let us also
define
Ui := Si \ (S1 ∪ · · · ∪ Si−1 )
as the set of newly covered elements resulting from adding set Si . For 1 ≤ i ≤ p, and for all
e ∈ Ui , define
ye = cSi / |Ui | .    (5.4)
That is, we distribute the cost of set Si evenly over the elements in Ui , and thus
∑_{e∈U} ye = ∑_{i=1}^{p} cSi .    (5.5)
We will now show that y/Hm is feasible for (5.2). This together with (5.5) immediately shows
that the algorithm returns a solution of cost no more than Hm times the optimal value of (5.2).
The dual (5.2) has a constraint for every set S ∈ S that limits the y-value of elements in S
to the cost cS of the set. We will show that each such constraint is satisfied by y/Hm . Let
e1 , . . . , el
be the elements of some set S ∈ S , in the order in which they were covered by the greedy
algorithm. Hence, ei was covered at the same time or before element ei0 for any 1 ≤ i ≤ i0 ≤ l.
We claim that the value yei picked according to (5.4) satisfies
yei ≤ cS / (l − i + 1),
for all 1 ≤ i ≤ l. Suppose ei was covered in iteration r, where Algorithm 7 chose a set Sr in
step 3. Note that set S contained l − i + 1 uncovered elements when Sr was picked. The fact
that Sr was picked regardless implies
cSr / |Sr \ (S1 ∪ · · · ∪ Sr−1 )| ≤ cS / (l − i + 1).
The left-hand side of the inequality is exactly the value of yei , and this proves the claim. Using
this, the left-hand side of the dual constraint for set S can be bounded as follows:
∑ ( yei : 1 ≤ i ≤ l ) ≤ ∑ ( cS /(l − i + 1) : 1 ≤ i ≤ l ) ≤ Hm cS ,
where the final inequality follows from the fact that S contains at most m = |U | elements.
5.1.3 Discussion
The two algorithms presented in this chapter are incomparable in terms of their approxima-
tion guarantee. Algorithm 6 is obviously superior when each element lies in few sets, and
Algorithm 7 shines when the maximum frequency is high. Examples for both situations are
easy to come by, and we have in fact seen a low-frequency one in the vertex cover problem.
At present, no algorithm with a better approximation guarantee for the vertex cover problem
is known, and it is known that no α-approximation with α < 10√5 − 21 ≈ 1.36 can exist
unless NP=P [10]. If one assumes the less tried and tested Unique Games Conjecture, the
2-approximation algorithm given here is best possible unless NP=P [16].
The set-cover problem in general is harder to approximate than the vertex cover problem.
In this section we have presented two algorithms that compute covers C whose cost is within
a certain factor of the optimal value of the LP relaxation of (5.1). How good an approximation
algorithm can we obtain with such a strategy? Well, it appears that there are limits.
Let (IP) be an integer programming formulation for some given minimization problem,
and let (P) be its LP relaxation. Let I be a set of instances for this problem, and for some
I ∈ I , let optIP_I and optP_I denote the optimal values of (IP) and (P), respectively, for the given
instance I. We then define the integrality gap of (P) as
α(P) := max_{I∈I} optIP_I / optP_I .
Hence α(P) is the worst-case multiplicative gap between the fractional and integral optimal
values for (P). Notice that we know from Weak Duality Theorem 4.1 that the value of any
feasible dual solution for instance I ∈ I can not be higher than optP_I . Furthermore, the value
of an integral solution must be at least optIP_I . Thus, using LP (P), we can not hope to obtain
an α-approximation algorithm with α < α(P).
For set-cover it is easy to find example instances where (1 − ε) ln m · optP_I ≤ optIP_I for any
fixed ε > 0; see [23] for an example. This shows that the approximation guarantee obtained
by Algorithm 7 is best possible (up to constant factors). In fact, one can even show that there is
a constant c ∈ (0, 1) such that no O((1 − c) ln m)-approximation algorithm exists unless NP=P
(see [11, 20, 1], and [24]).
that b̃ is the same as b except in the ℓth component. We denote the new primal-dual pair of
problems by (P̃) and (D̃) respectively. Let us use the Complementary Slackness conditions
on (P̃) and (D̃) for the dual solution y∗ which is still feasible in (D̃). For y∗ to be optimal in
(D̃), the necessary and sufficient conditions are that there exists x̄ ∈ Rn satisfying:
(a) rowi (A)x̄ = b̃i for all i such that y∗i > 0;
(b) x̄ j = 0 for all j such that row j (AT )y∗ > c j ; and,
(c) Ax̄ ≤ b̃ and x̄ ≥ 0, i.e. x̄ is a feasible solution of (P̃).
Now, let us derive some sufficient conditions for y∗ to be optimal in (D̃). So, we set x j := 0
for every j such that row j (AT )y∗ > c j and write the linear system of equations
rowi (A)x = b̃i for all i such that y∗i > 0
in terms of the remaining x j . So, y∗ being optimal forces x j = 0 for a certain set of indices j, and
rowi (A)x = b̃i for a certain set of indices i. Assume that for the complements of these indices,
we have x∗j > 0 and rowi (A)x∗ < bi . Then, it can be proved that for every ε > 0 small enough,
the above system of linear equations has a solution x̄ satisfying the Complementary Slackness
conditions on (P̃) and (D̃) for the dual solution y∗ . Hence, we deduce that for every ε > 0
small enough, the optimal objective value of (D̃) (and hence of (P̃)) is bT y∗ + εy∗ℓ . The only
difference between (P) and (P̃) is that bℓ is increased by ε. Therefore, y∗ℓ is the maximum
unit "price" for resource ℓ that we should be willing to pay at our current (optimal) operating
conditions. We have the following result.
Theorem 5.4. Let x∗ be an optimal basic solution of max {cT x : Ax ≤ b, x ≥ 0} and let y∗
be an optimal solution of its dual. Further assume that for all i such that y∗i = 0, we have
rowi (A)x∗ < bi and for every j such that row j (AT )y∗ = c j , we have x∗j > 0. Then, for every
ε > 0 sufficiently small, the optimal value of the problem obtained by replacing bℓ by bℓ + ε
is εy∗ℓ more than that of (P).
Note that if we consider decreasing the amount of resource ℓ available by ε then the
optimal objective value goes down by εy∗ℓ . Moreover, the above assumption "for all i such
that y∗i = 0, we have rowi (A)x∗ < bi and for every j such that row j (AT )y∗ = c j , we have
x∗j > 0” (together with the assumption that x∗ is an optimal solution that is also basic) implies
that y∗ is the unique optimal solution of (D).
To recap, the dual variables correspond to some kind of “internal prices” called shadow
prices for the resources. These shadow prices depend on the current optimal production plan.
It is worth mentioning that the approach we used above can be used to derive correspond-
ing results for more complicated perturbations of b (i.e., we may perturb many elements of
b simultaneously). Moreover, by switching the roles of (P) and (D), and applying the same
mathematical reasoning, we can obtain analogous results for perturbations of c.
We may also want to consider larger perturbations. What if we wanted to analyze what
would happen to our profits as the availability of one or many of our resources were being
continuously increased? We could again use the techniques of linear optimization and duality
theory to prove that the rate of increase in our profits will be monotone decreasing per unit
increase in the availability of resources. E.g., if our profit goes up by $10 when bℓ is increased
from 500 to 501, our profit will go up by no more than $10 when bℓ is increased by one
provided bℓ ≥ 500. This allows us to prove a version of the economic principle of the law of
diminishing marginal returns in our setting.
In addition to such strong connections to mathematical economics, linear programming
also has close historical ties to economics. During the early 1900’s, in the area of mathe-
matical economics, Leontief [1905–1999] and others were working on various problems that
had connections to linear optimization. One of the most notable models is the input-output
systems. Suppose there are n major sectors of a national economy (e.g., construction, labor,
energy, timber and wood, paper, banking, iron and steel, food, real estate, . . .). Let ai j denote
the amount of inputs from sector i required to produce one unit of the product of sector j (ev-
erything is in same units, dollars). Let A ∈ Rn×n denote the matrix of these coefficients. (A is
called the input-output matrix.) Now, given b ∈ Rn , where bi represents the outside demands
for the output of sector i, we solve the linear system of equations Ax + b = x. If (A − I) is
nonsingular, we get a unique solution x determining the output of each sector. Indeed, to have
a viable system, we need every x j to be nonnegative. Otherwise, the economy requires some
imports and/or some kind of outside intervention to function properly. Leontief proposed this
model in the 1930s. Then in 1973 he won the Nobel Prize in Economics for it.
Kantorovich [1912–1986] won the Nobel Prize in Economics in 1975. Koopmans [1910–
1985], who had worked on the Transportation Problem (a generalization of the maximum-weight bi-
partite matching problem) in the 1940s as well as on input-output analysis (similar to Leon-
tief’s interests) for production systems, shared the Nobel Prize in Economics in 1975 with
Kantorovich. According to Dantzig, the Transportation Problem was formulated (and a solu-
tion method proposed) by Hitchcock in 1941. Later, the problem was also referred to as the
Hitchcock-Koopmans Transportation Problem.
The Transportation Problem together with the earlier work on electrical networks (by
mathematicians, engineers and physicists) laid some of the ground work for what was to
come in Network Flows and Network Design. Next, we will see another beautiful application
of duality theory in the area of Network Flows and Network/Graph Connectivity.
where x_a denotes the number of bits on arc a ∈ E⃗, and δ(q) and δ(q̄) denote the sets of out-arcs and in-arcs at q, respectively. A vector x that satisfies (5.6) for all q ∈ V \ {s,t} as well as 0 ≤ x ≤ c is called an s,t-flow, and we let ∑( x_a : a ∈ δ(s) ) be its value. Our integer program attempts to find an s,t-flow of maximum value:

max ∑( x_a : a ∈ δ(s) )
subject to
∑( x_a : a ∈ δ(q) ) − ∑( x_a : a ∈ δ(q̄) ) = 0     for all q ∈ V \ {s,t}          (5.7)
0 ≤ x ≤ c,  x integer
Before we continue, let us first simplify our life, and assume that vertex s has no in-arcs, and that t has no out-arcs. This assumption is easily seen to be benign: add new nodes s′ and t′, and infinite-capacity arcs s′s and tt′, to G⃗. Note that the old graph has an st-flow of value γ if and only if the new graph has an s′t′-flow of the same value.
The dual of the LP relaxation of (5.7) has a variable l_a corresponding to the capacity constraint x_a ≤ c_a for every arc a ∈ E⃗, and a variable α_q for the flow balance constraint (5.6) of each vertex q ∈ V \ {s,t}. The dual has a constraint for each arc a ∈ E⃗. Because of the simplifying assumption above, we need to distinguish between three types of arcs only: those that are not incident to s or t, those that leave s, and those that enter t.

min ∑( c_e l_e : e ∈ E⃗ )
subject to
α_u − α_v + l_uv ≥ 0      uv ∈ E⃗, u, v ∉ {s,t}          (5.8)
−α_v + l_sv ≥ 1           sv ∈ δ(s)
α_u + l_ut ≥ 0            ut ∈ δ(t̄)
l ≥ 0
Since x = 0, and l = 1, α = 0 are feasible solutions for the LP relaxation of (5.7) and for
its dual (5.8), it follows from the Fundamental Theorem of Linear Programming (Theorem
2.11) that both have optimal solutions, and that their values must be the same. We will first
show that, given that the capacities ca are integral for all a ∈ A, these optimal solutions will
be integral.
Observe that we can rewrite the LP relaxation of (5.7) as
max cT x
subject to
(5.9)
Ax = b
x ≥ 0.
where matrix A has a column for each arc in A, one row for each flow balance constraint,
and one more for each capacity constraint. We will soon see that this matrix A is in fact
totally unimodular (TU). In the following we say that a matrix B is a submatrix of A if it is
obtained by deleting some of A’s rows and columns. A matrix A is then TU if all of its square
submatrices B have determinant in {0, +1, −1}.
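The definition can be checked directly on small matrices by brute force. The sketch below is our own illustration (impractical beyond small sizes, since there are exponentially many submatrices):

import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    """Brute-force check: every square submatrix has determinant 0, +1 or -1."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(d), abs(d - 1), abs(d + 1)) > tol:
                    return False
    return True

# A small example matrix with entries in {0, +1, -1}:
print(is_totally_unimodular([[1, 0], [-1, 1], [0, -1]]))   # True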
Total unimodularity is a very useful property, as we will now see. Suppose x∗ is an optimal basic solution for (5.9) for some basis B. As usual we will use A_B for the square submatrix of A defined by the basic columns. Then we know that

x∗_B = A_B⁻¹ b.

By Cramer's rule, each entry of x∗_B is a ratio of two determinants: the numerator is the determinant of an integer matrix (A_B with one column replaced by b), and the denominator is det(A_B). Since A_B is a nonsingular square submatrix of the TU matrix A, the denominator is either 1 or −1, so each entry of x∗_B is an integer. Hence, x∗ = (x∗_B, x∗_N) is integer. We obtain the following theorem as an immediate consequence.
Theorem 5.5. Let (P) be a linear program in standard equality form whose matrix A is totally unimodular and whose right-hand side vector b is integral. Then every basic solution x of (P) is integral as well.
Corollary 5.7. The LP relaxation of (5.7) has an integral optimal solution whenever c is integral. The dual (5.8) always has integral optimal solutions. In particular, there is an optimal solution (α∗, l∗) such that l∗_e ∈ {0, 1} for every e ∈ E⃗, and the set

C := { e ∈ E⃗ : l∗_e = 1 }          (5.11)

forms an st-cut.
Proof. We already argued that (5.7) and (5.8) are both feasible, and hence both must have
optimal basic solutions. Proposition 5.6 (together with Theorem 5.5) immediately shows that
primal basic solutions must be integral. The same result now follows for the dual as the
transpose of a matrix A is TU if and only if A is.
Let (α∗, l∗) be an optimal basic solution to (5.8). Consider an st-path s, v1, v2, . . . , vp, t, and add up the dual constraints corresponding to its arcs:

−α∗_v1 + l∗_sv1 ≥ 1
α∗_v1 − α∗_v2 + l∗_v1v2 ≥ 0
α∗_v2 − α∗_v3 + l∗_v2v3 ≥ 0
. . .
α∗_v(p−1) − α∗_vp + l∗_v(p−1)vp ≥ 0
α∗_vp + l∗_vpt ≥ 0
──────────────────────────────
l∗_sv1 + l∗_v1v2 + · · · + l∗_vpt ≥ 1
This shows that all st-paths P must contain at least one arc e with l∗_e ≥ 1. Let C be an inclusion-wise minimal st-cut contained in

{ e ∈ E⃗ : l∗_e ≥ 1 };

i.e., choose C such that none of its proper subsets is an st-cut. Let S be the set of vertices that are reachable from s in the graph obtained from G⃗ by removing the arcs in C, and define a candidate solution for (5.8) as follows: let l_e = 1 if e ∈ C, and l_e = 0 otherwise; α_v = −1 if v ∈ S, and α_v = 0 otherwise. We now verify that (α, l) satisfies the constraints of (5.8).
Consider the dual constraint for an arc uv ∈ E⃗ with u, v ∈ V \ {s,t}. If u, v are both in S, or both in V \ S, then α_u − α_v = 0 and the constraint is satisfied. If u ∈ V \ S and v ∈ S, then α_u − α_v = 1, and the constraint is once again satisfied. Finally, if u ∈ S and v ∈ V \ S, then uv must belong to C (otherwise v would be reachable from s), and hence α_u − α_v + l_uv = −1 − 0 + 1 = 0.
It is equally routine to check that the dual constraints for arcs incident to s or t are satisfied. The objective value of (α, l) is ∑( c_e : e ∈ C ), which is no more than that of (α∗, l∗) because l∗_e ≥ 1 = l_e for every e ∈ C and l∗, c ≥ 0. Hence (α, l) is also optimal for (5.8).
Corollary 5.8 (Max-Flow Min-Cut). The maximum value of an st-flow is equal to the minimum capacity of an st-cut.
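As an illustration of the theorem, one can solve the LP relaxation of (5.7) on a small digraph and compare it with the capacity of an st-cut. The sketch below is our own example (vertices s, a, b, t and five arcs with hypothetical capacities) using scipy.optimize.linprog:

from scipy.optimize import linprog

# Arc order: sa, sb, ab, at, bt, with capacities:
cap = [3, 2, 1, 2, 3]

# Maximize flow out of s  ->  minimize its negation.
obj = [-1, -1, 0, 0, 0]

# Flow conservation at the intermediate vertices a and b: out - in = 0.
A_eq = [[-1,  0, 1, 1, 0],   # vertex a: (x_ab + x_at) - x_sa = 0
        [ 0, -1,-1, 0, 1]]   # vertex b:  x_bt - (x_sb + x_ab) = 0
b_eq = [0, 0]

res = linprog(obj, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, u) for u in cap], method="highs")
print("max flow value:", -res.fun)   # 5.0
# The st-cut delta({s}) = {sa, sb} has capacity 3 + 2 = 5, matching the flow value.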
Exercises
1. Consider the LP problem max{cT x : Ax ≤ b, x ≥ 0}. Suppose it has an optimal solution x∗ .
(a) Define b(θ ) := b + θ e1 and let (Pθ ) be the LP problem obtained from the above by
replacing b by b(θ ). Prove that for all θ ≥ 0, (Pθ ) has a basic feasible solution that is
optimal.
(b) Let z∗ (θ ) denote the optimal value of (Pθ ). Prove that z∗ : R+ → R is a piecewise linear
function of θ .
(c) Prove that the set

{ (θ, z)ᵀ ∈ R² : θ ≥ 0, z ≤ z∗(θ) }

is a convex set.
2. We discussed the Shortest Paths problem as well as the Maximum Flow problem. In this
exercise, we will try to formulate the Shortest Paths problem using a network flow idea.
Given s and t, consider formulating the problem of finding a shortest path from s to t as the
problem of finding one unit flow from s to t of minimum cost, where the cost of sending
one unit of flow along an edge is the length of that edge. Note that for every node except s
and t in the graph, we must have flow conservation (if there is flow coming into the node,
it must leave the node). For s, one unit of flow must leave it; and one unit of flow must
arrive at t. Using the variables

x_ij := 1 if there is a unit flow on the arc (i, j), and x_ij := 0 otherwise,
and the above reasoning, formulate the shortest path problem as an integer programming
problem. Then consider the LP relaxation of your integer programming problem, write
down the dual and interpret the strong duality theorem for this pair of primal-dual LP
problems.
Chapter 6

Solving integer programs
In Chapter 2, we have seen how to solve linear programs using the Simplex algorithm, an algorithm that is still widely used in practice. In Chapter 3 we discussed efficient algorithms for the special class of integer programs describing the shortest path problem and minimum cost perfect matchings in bipartite graphs. In both of these examples it suffices to solve the linear programming relaxation of the problem.
Algorithms for general integer programs follow two broad strategies. The first attempts to reduce integer programs to linear programs; this is known as the cutting plane approach and is described in Section 6.2. The other strategy is a divide-and-conquer approach known as branch and bound, which is discussed in Section 6.3. In practice both strategies are combined under the heading of branch and cut; this remains the preferred approach in general-purpose commercial codes.
In this chapter, in the interest of simplicity, we restrict our attention to pure integer programs, where all the variables are required to be integer. The theory developed here extends to mixed integer programs, where only some of the variables are required to be integer, but that material is beyond the scope of this book.
Then the convex hull of S is given by the triangle with vertices x(1), x(2), x(3), as indicated in the next figure. Note that this triangle is convex and contains S. Moreover, any convex set that contains S must contain all the points in this triangle.
[Figure: the points of S and their convex hull conv({x(1), x(2), x(3)}), a triangle in the (x1, x2)-plane.]
The polyhedron P is represented in the next figure. Each of the constraints (1), (2), (3) corresponds to a halfspace as indicated in the figure. Let us define S to be the set of all integer points in P. Note that in this case S is an infinite set of points. Then the convex hull of S is described by another polyhedron

Q = { (x1, x2)ᵀ :  −x1 ≤ −2 (a),   x1 − x2 ≤ 2 (b),   −x2 ≤ −1 (c) }.
The polyhedron Q is represented in the same figure. Each of the constraints (a), (b), (c) corresponds to a halfspace as indicated in the figure.
[Figure: the polyhedra P and Q, with the halfspaces (1), (2), (3) defining P and (a), (b), (c) defining Q.]
In Example 22 we saw that the convex hull of the set of all integer points in a polyhedron is a polyhedron. This is no accident; indeed, we have the following deep result relating feasible solutions of integer programs to feasible solutions of linear programs.

Theorem 6.1. Let A be a matrix and b a vector, all of whose entries are rational, and let S be the set of integer points x satisfying Ax ≤ b. Then conv(S) is a polyhedron; i.e., conv(S) = {x : A′x ≤ b′} for some rational matrix A′ and rational vector b′.

We omit the proof of this result in this book. The condition that all entries of A and b be rational numbers cannot be omitted from the hypothesis.
max cᵀx
subject to
Ax ≤ b                    (6.1)
x integer
Let us assume that all entries in the matrix A and the vector b are rational. This is a natural
assumption since numbers stored on a computer can only be recorded with finite precision and
hence are rational. Let S denote the set of all integer points satisfying the constraints Ax ≤ b,
i.e. S is the set of feasible solutions to the integer program (6.1). It follows from Theorem 6.1
that the convex hull of S is a polyhedron, i.e. that conv(S) = {x : A0 x ≤ b0 } for some matrix
A0 and vector b0 where all entries of A0 and b0 are rational.
Let us define the following linear program,
max cT x
subject to (6.2)
A0 x ≤ b0
Theorem 6.2. The following hold for the integer program (6.1) and the linear program (6.2),
(d) every optimal solution to (6.2) that is an extreme point is an optimal solution to (6.1).
Example 23. We illustrate the previous theorem. Suppose that (6.1) is of the form,
We describe the feasible region of (IP) in the next figure. Each of the constraints (1), (2), (3), (4) corresponds to a halfspace as indicated in the figure. Thus the set of all feasible solutions to (IP) is given by

S = { (1,1)ᵀ, (1,2)ᵀ, (1,3)ᵀ, (2,1)ᵀ, (2,2)ᵀ, (3,1)ᵀ }.
[Figure: the feasible region of (IP) bounded by the halfspaces (1)–(4), the integer points of S, and the polyhedron described by the constraints (a), (b), (c).]
Theorem 6.2 tells us that, in a theoretical sense, Integer Programming reduces to Linear Programming: it suffices to compute a basic optimal solution to (6.2). The catch is that the system A′x ≤ b′ in the linear program (6.2) is in general much larger (exponentially larger) than the system Ax ≤ b in the integer program (6.1), hence it cannot be described completely in practice. What we do instead is try to approximate the description of the polyhedron {x : A′x ≤ b′} using a limited number of constraints. One way of proceeding is described in the next section.
Exercises
1. Consider the IP,
max x1 + 10x2
subject to
x1 − 20x2 ≤ 0
x1 + 20x2 ≤ 20
x ≥ 0, x integer
(a) What is the geometrical distance, in R2 , between the optimal solution of the IP and of
its LP relaxation? Explain.
(b) Give an example of an IP for which the geometrical distance between its optimal solu-
tion, and that of its LP relaxation, is at least 100.
(c) Give an example of an IP which has no feasible integer solutions, but its LP relaxation
has a feasible set in R2 of area at least 100.
(a) Graph the feasible region of the LP relaxation of IP, indicate the feasible solutions of
the IP on the graph. Then, find the optimal solutions of the LP relaxation and the IP.
(b) Focus on the optimal solution of the LP relaxation. What are the closest (in the sense
of Euclidean distance) feasible solutions of the IP? Are any of these closest IP feasible
solutions optimal for the IP? Now, replace the objective function with max x1 + 8.5x2 .
Answer the same questions again. What related conclusions can you draw from these
examples?
(c) Create a family of IP problems with two nonnegative integer variables parameterized
by an integer k ≥ 10 (in the above example k = 10) generalizing the above example
and its highlighted features. Then answer the parts (a) and (b) for all values of k.
As we do not know how to deal with the integrality conditions, we shall simply ignore them.
Thus, we solve the linear program relaxation (LP1) obtained by removing integrality condi-
tions from (IP). We obtain the optimal solution x(1) = (8/3, 4/3)T . Unfortunately, x(1) is not
integral, thus it is not a feasible solution to (IP).
We wish to find a valid inequality (?) which satisfies the following properties:
(I) (?) is valid for (IP), i.e. every feasible solution of (IP) satisfies (?), and
(II) x(1) does not satisfy (?).
An inequality that satisfies both (I) and (II) is called a cutting plane for x(1). We will discuss
in the next section how to find such a cutting plane, but for the time being, let us ignore that
problem and suppose that we are simply given such an inequality,
x1 + 3x2 ≤ 6. (?)
We add (?) to the system (LP1) to obtain a new linear program (LP2). Because (?) satisfies (I),
every feasible solution to (IP) is a feasible solution to (LP2). Moreover, (II) implies that x(1)
is not feasible for (LP2), so in particular the optimal solution to (LP2) will be distinct from
x(1) . In our case (LP2) has an optimal solution x(2) = (3, 1)T . Note, that x(2) is integral. Since
it maximizes the objective function over all feasible solutions of (LP2) it also maximizes the
objective function over all feasible solutions of (IP), hence x(2) is optimal for (IP).
We describe the feasible region of (LP1) in figure (i). Each of the constraints (1), (2) corresponds to a halfspace as indicated in the figure. We also indicate the optimal solution x(1) = (8/3, 4/3)ᵀ. Since x(1) was not integral we added the cutting plane (?): x1 + 3x2 ≤ 6 and obtained (LP2). We describe the feasible region of (LP2) in figure (ii). Constraint (?) corresponds to a halfspace as indicated in the figure. We also indicate the optimal solution x(2) = (3, 1)ᵀ of (LP2).

[Figure: (i) the feasible region of (LP1) with constraints (1), (2) and the optimal solution x(1); (ii) the feasible region of (LP2) after adding the cutting plane (?), with the optimal solution x(2).]
Let us formalize the procedure. Consider the following pair of optimization problems,
max{cT x : x ∈ S1 }, (P1)
max{cT x : x ∈ S2 }. (P2)
If S2 ⊇ S1 then we say that (P2) is a relaxation of (P1). For instance, (P1) may be an integer
program and (P2) its linear programming relaxation.
(b) if x̄ is optimal for (P2) and x̄ is feasible for (P1), then x̄ is optimal for (P1).
We can now describe our algorithm. Suppose we wish to solve the integer program,
max cT x
subject to
(IP)
Ax ≤ b
x ≥ 0, x integer.
Correctness of the algorithm follows from Remark 6.3. We did not discuss the possibility that (LP) might be unbounded. In that case we can only conclude that one of the following holds: either (IP) is unbounded or (IP) is infeasible; we do not know which case occurs. In practice we are often in situations where we know that (IP) is not unbounded.
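The loop just described can be summarized in a short sketch. This is our own skeleton, not the book's pseudocode; find_cut is a placeholder for a cut-generation routine such as the one derived next, and scipy.optimize.linprog is used for the LP relaxations.

import numpy as np
from scipy.optimize import linprog

def cutting_plane(c, A, b, find_cut, max_rounds=50, tol=1e-6):
    """Skeleton of the cutting-plane method for max c^T x, Ax <= b, x >= 0, x integer.
    find_cut(x) must return (a, beta) with a^T x <= beta valid for the IP but
    violated by x, or None if it cannot find such a cut."""
    A, b = np.array(A, float), np.array(b, float)
    for _ in range(max_rounds):
        res = linprog(-np.array(c, float), A_ub=A, b_ub=b, method="highs")
        if not res.success:
            return None                      # relaxation infeasible (or unbounded)
        x = res.x
        if np.all(np.abs(x - np.round(x)) <= tol):
            return x                         # integral: optimal for the IP
        cut = find_cut(x)
        if cut is None:
            return None                      # give up
        a, beta = cut
        A = np.vstack([A, a])                # add the cutting plane and re-solve
        b = np.append(b, beta)
    return None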
We have yet to show how to find cutting planes. Let us revisit Example 24 from the previous section. After introducing a slack variable x3 for constraint (1) and x4 for constraint (2), we obtain a problem (IP’) in standard equality form. Solving the LP relaxation (LP1’) of (IP’) with the simplex algorithm and rewriting it in canonical form for the optimal basis, the corresponding basic solution is (8/3, 4/3, 0, 0)ᵀ. It implies that (8/3, 4/3)ᵀ is the optimal solution for (LP1) in Example 24. We will use this canonical form to find a cutting plane. Consider any constraint whose right-hand side is fractional. In this case we have a choice, and we select the first constraint, namely,
x1 − (1/3) x3 + (4/3) x4 = 8/3.

Then, trivially, the following constraint holds for every feasible solution to (LP1’):

x1 − (1/3) x3 + (4/3) x4 ≤ 8/3.
Observe that every variable is non-negative; thus, if we replace any of the coefficients of the variables in the previous inequality by a smaller value, the resulting inequality is still valid for (LP1’). In particular,

x1 + ⌊−1/3⌋ x3 + ⌊4/3⌋ x4 = x1 − x3 + x4 ≤ 8/3

is valid for (LP1’), where ⌊α⌋ denotes the largest integer smaller than or equal to α. Let x̄ be any feasible solution to (IP’). Then x̄ is a feasible solution to (LP1’) and x̄1 − x̄3 + x̄4 ≤ 8/3. Since x̄ is integer, the left-hand side is an integer, and therefore

x̄1 − x̄3 + x̄4 ≤ ⌊8/3⌋ = 2.          (6.3)
Moreover, (8/3, 4/3, 0, 0)ᵀ does not satisfy (6.3). Hence, (6.3) is a cutting plane. Recall that (IP’) implies that x3 = 8 − x1 − 4x2 and x4 = 4 − x1 − x2. Substituting this in (6.3) yields the constraint x1 + 3x2 ≤ 6, which is the cutting plane (?) given in Example 24.
To proceed further we add a slack variable x5 to the constraint (6.3) and modify the linear programming relaxation (LP1’) of (IP’) by adding the resulting constraint; call the new linear program (LP2’).
We solve (LP2’) using the 2-phase simplex algorithm (we need to use phase I, since we do not
have a feasible basis), and rewrite (LP2’) in canonical form for the optimal basis B = {1, 2, 3}
to obtain,
max 11 + (0, 0, 0, −1/2, −3/2) x
subject to
[ 1  0  0   3/2  −1/2 ]       [ 3 ]
[ 0  1  0  −1/2   1/2 ]  x =  [ 1 ]
[ 0  0  1   1/2  −3/2 ]       [ 1 ]
x ≥ 0.
The basic optimal solution is (3, 1, 1, 0, 0)T . Since it is integer, it follows (see Remark 6.3)
that it is an optimal solution to (IP’). Note that it corresponds to the optimal solution (3, 1)T
of (IP) in example 24.
Let us generalize this argument. Consider an integer program,
max cT x
subject to
(IP)
Ax = b
x ≥ 0, integer.
We solve the linear programming relaxation (LP) of (IP) using the simplex procedure. If (LP) has no solution, then neither does (IP). Suppose we obtain an optimal basis B of (LP). We rewrite (LP) in canonical form for B to obtain a linear program of the form

max z̄ + c̄_Nᵀ x_N
subject to
x_B + Ā_N x_N = b̄
x ≥ 0.
The corresponding basic solution x̄ is given by x̄B = b̄ and x̄N = 0. If b̄ is integer, then so is x̄,
and x̄ is an optimal solution to (IP), see Remark 6.3. Thus, we may assume that b̄i is fractional
for some index i. Constraint i is of the form

x_i + ∑_{j∈N} Ā_ij x_j = b̄_i.
As this constraint is valid for (LP) with equality, it is also valid with ≤. Observe that every variable is non-negative; hence the following inequality is valid for (LP):

x_i + ∑_{j∈N} ⌊Ā_ij⌋ x_j ≤ b̄_i.
For x̄ feasible for (IP), the left-hand side of the previous inequality is an integer. Hence, the following constraint is valid for (IP):

x_i + ∑_{j∈N} ⌊Ā_ij⌋ x_j ≤ ⌊b̄_i⌋.          (?)
Remark 6.4. Constraint (?) is a cutting plane for the basic solution x̄.
Proof. Since x̄_j = 0 for all j ∈ N, evaluating the left-hand side of (?) at x̄ gives x̄_i = b̄_i. As b̄_i is fractional, b̄_i > ⌊b̄_i⌋. Hence the left-hand side is larger than the right-hand side at x̄, so x̄ violates (?).
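The derivation above translates directly into a small routine. The sketch below is our own illustration; applied to the row x1 − (1/3)x3 + (4/3)x4 = 8/3 from the example it reproduces the cut (6.3).

import math

def gomory_cut(row_coeffs, rhs, basic_index, nonbasic_indices):
    """Given the canonical-form row  x_i + sum_{j in N} Abar_ij x_j = bbar_i
    with bbar_i fractional, return the cut
        x_i + sum_{j in N} floor(Abar_ij) x_j <= floor(bbar_i)
    as (coefficient dict keyed by variable index, right-hand side)."""
    cut = {basic_index: 1.0}
    for j, a in zip(nonbasic_indices, row_coeffs):
        cut[j] = math.floor(a)
    return cut, math.floor(rhs)

# Row x1 - (1/3) x3 + (4/3) x4 = 8/3 (nonbasic variables x3, x4):
print(gomory_cut([-1.0 / 3.0, 4.0 / 3.0], 8.0 / 3.0, 1, [3, 4]))
# -> ({1: 1.0, 3: -1, 4: 1}, 2), i.e.  x1 - x3 + x4 <= 2, which is (6.3).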
Exercises
1. Suppose that we solve a linear programming relaxation (P) of an integer program (where all variables are required to be integer), and that we rewrite (P) in canonical form for the optimal basis B = {1, 2, 3} of (P). Which constraints of the canonical form lead to cutting planes? For each such constraint generate a cutting plane.
Let (D) be the dual of (P). Suppose you found an optimal solution x̄ of (P) and an optimal
solution ȳ of (D). You are being told however, that you forgot to include one important
constraint,
∑_{j=1}^{n} a_j x_j ≤ β.          (?)
(a) Construct a new linear program (P’) by adding a slack variable to (?) and by adding
the resulting constraint to (P).
(b) Show that if x̄ satisfies (?) then x̄ is an optimal solution to (P’).
(c) Compute the dual (D) of (P) and the dual (D’) of (P’).
(d) Show how to compute a feasible solution y0 of (D’) from your solution ȳ of (D). Note,
this works whether or not x̄ satisfies (?).
Denote by (P2) the linear program in Standard Equality Form obtained from (P1) by adding
constraint (?) and by adding a slack variable x4 .
(c) Solve (P2) with the Simplex, starting with the basis B = {3, 4}.
Denote by x2 the optimal basic solution.
4. Let
x_k + ∑_{j∈N} ā_j x_j = b̄_k          (k ∉ N)

be one of the constraints of the LP in canonical form for an optimal basis B (where N is the index set of the nonbasic variables), and suppose that b̄_k is not an integer. Define

N∗ := { j ∈ N : ā_j is not an integer }.

Prove that

∑_{j∈N∗} x_j ≥ 1

is a valid inequality for the underlying IP and that it cuts off the current optimal solution of the LP relaxation. Compare this cut to the cut

∑_{j∈N} x_j ≥ 1.

(Also prove that this is a valid inequality for the underlying IP and that it cuts off the current optimal solution of the LP relaxation.) Which one do you think is a better cut? Explain why and prove your claims.
5. Let G = (V, E) be a graph with edge weights we for all e ∈ E. Recall that (3.24) is the
IP formulation of the minimum weight matching problem. Let S ⊆ V be a subset of the
vertices of G where |S| is odd.
(a) Show (by combining constraints and using rounding) that the following inequality is valid for the (IP), i.e. that all feasible solutions to the (IP) satisfy this constraint:

∑( x_ij : ij ∈ E, i ∈ S, j ∈ S ) ≤ ⌊ |S| / 2 ⌋.

In other words, the sum of all variables corresponding to edges with both ends in S is at most the number of vertices in S divided by two and rounded down.
6.3 Branch & bound

In this section, we introduce a solution method for integer programs called branch & bound. It is a “divide and conquer” approach for solving linear integer programming problems. We illustrate some of the main ideas using a production example.
Every year, the University of Waterloo readies itself for the annual homecoming festivities. The university expects hundreds of alumni and their families to visit its campus for the
occasion, and campus stores in particular expect to be busier than usual. Two items promise
to be in especially high demand during these times: the university’s infamous Pink Tie, as
well as the (almost) equally popular Pink Bow-Tie. With only a few weeks to go until homecoming, it is high time to replenish stocks of these two items. The university manufactures its ties from its own special cloth, of which it currently has 45 ft² in stock. Producing 10 pink bow ties needs 9 ft² of cloth, and producing 10 pink ties requires 11 ft² of the raw material. In
addition to cloth, the manufacturing process also requires labour which is particularly short
during the next few weeks: only a total of 6 work days are available. Manufacturing of the
ties is done in batches of 10. Producing 10 pink bow ties takes 1.5 days, and producing 10
pink ties takes a day. The university expects a profit of $6 for a pink bow tie, and a profit of
$5 for a pink tie. How much of each product should it produce?
Just like in Chapter 1, we can formulate this problem as an integer program. We introduce
variables x and y for 10’s of bow-ties and ties to produce. The following integer program has
constraints restricting the available labour and raw material, and its objective function is the
total profit for the given production vector.
max 60 x + 50 y (6.4)
subject to 1.5 x + y ≤ 6
9 x + 11 y ≤ 45
x, y ≥ 0
x, y integer. (6.5)
How can we solve this integer program? Let us use what we know: linear programming!
We drop the integrality constraints (6.5) and obtain the linear programming relaxation (6.4 −
LP) of (6.4). Remark 6.3(2) states that if we solve the LP relaxation of an integer program,
and obtain a solution, all of whose variables are integers, then it is also an optimal solution of
the original IP! Unfortunately, solving (6.4 − LP) gives the solution x = 2.8, y = 1.8 of value
258 which has fractional components. Remark 6.3 implies, however, that no integral feasible
solution for (6.4) can have value greater than 258, and hence we have found an upper bound
on the maximum value of any feasible solution of the original IP.
We will now use the fractional solution to partition the feasible region. In the following,
we let Subproblem 1 denote the original LP relaxation of the IP. We observe that every feasible
solution for (6.4) must have either x ≤ 2 or x ≥ 3, and that the current fractional solution does
not satisfy either one of these constraints. Thus, we now branch on variable x and create two
additional subproblems:
Neither of the two subproblems contains the point (2.8, 1.8), and the optimal solution of the original LP relaxation therefore cannot recur when solving either of them. We now choose one of the two subproblems to process next. Arbitrarily, we
pick subproblem 2. Solving the problem yields the optimal solution x = 3, and y = 1.5 with
value 255. This solution is still not integral (as y is fractional), but 255 gives us a new, tighter
upper bound on the maximum value of any integral feasible solution for this subproblem.
The subproblem structure explored by the branch & bound algorithm is depicted in Figure
??. Each of the subproblems generated so far is shown as a box which are referred to as
branch & bound nodes. The two nodes for subproblems 2 and 3 are connected to their parent
node, and the corresponding edges are labeled with the corresponding constraints that were
added to subproblem 1. The entire figure is commonly known as the branch & bound tree
generated by the algorithm.
The optimal solution for Subproblem 2 is still not integral as the value of y is fractional.
We branch on y, and generate two more subproblems:
Running the Simplex algorithm on Subproblem 5, we quickly find that the problem is infeasible. Hence, this subproblem has no fractional feasible solution, and thus no integral one either. Solving Subproblem 4, we find the solution x = 10/3 and y = 1, of value 250. We generate two more subproblems by branching on x.
and thus, will not find a solution that is better within this subproblem. We therefore might as
well stop branching here!
Formally, whenever the optimal value of the current subproblem is at most the value of the
best integral solution that we have already found, then we may stop branching at the current
node. We say: we prune the branch of the branch & bound tree at the current node.
We are done as no unexplored branches of the branch & bound tree remain. The final
tree is shown in Figure ??. The optimal solution to the original IP (6.4) is therefore x = 4,
y = 0 and achieves a value of 240. Therefore, in order to optimize profit, the university is best
advised to invest all resources into the production of pink bow-ties!
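The whole procedure fits in a short recursive sketch. The code below is our own minimal illustration (depth-first, branching on the first fractional variable, LPs solved with scipy.optimize.linprog); on the data of (6.4) it recovers x = 4, y = 0 of value 240.

import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, tol=1e-6):
    """Tiny LP-based branch & bound for  max c^T x, Ax <= b, x >= 0, x integer."""
    best = [-math.inf, None]

    def solve_node(lower, upper):
        res = linprog(-np.array(c, float), A_ub=A, b_ub=b,
                      bounds=list(zip(lower, upper)), method="highs")
        if not res.success:
            return                             # infeasible node: prune
        value, x = -res.fun, res.x
        if value <= best[0] + tol:
            return                             # bound no better than incumbent: prune
        frac = [j for j in range(len(x)) if abs(x[j] - round(x[j])) > tol]
        if not frac:
            best[0], best[1] = value, np.round(x)   # integral: new incumbent
            return
        j = frac[0]                            # branch on a fractional variable
        lo, up = list(lower), list(upper)
        up[j] = math.floor(x[j])
        solve_node(lower, up)                  # subproblem with x_j <= floor
        lo[j] = math.ceil(x[j])
        solve_node(lo, upper)                  # subproblem with x_j >= ceil

    n = len(c)
    solve_node([0.0] * n, [None] * n)
    return best[0], best[1]

# The pink-tie example (6.4):
print(branch_and_bound([60, 50], [[1.5, 1], [9, 11]], [6, 45]))
# -> (240.0, array([4., 0.]))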
We conclude this section with a brief discussion. The branch & bound algorithm dis-
cussed here can be viewed as a smart enumeration algorithm: it uses linear programming in a
smart way, partitioning the space of feasible solutions into mutually exclusive and exhaustive
regions. For example, in the pink tie example discussed above, it is instructive to see that any
feasible integral solution occurs in one of the subproblems corresponding to the leaves of the
tree in Figure ??. The algorithm is smart as it uses the optimal LP value of a subproblem to
possibly prune it. Sometimes, such pruning can save exploring a large number of potential
solutions hidden in a subproblem.
The algorithm as described is quite flexible in many ways, and in our description, we
made many arbitrary choices. For example, if we solve a subproblem, and the solution has
more than one fractional variable, how do we decide which variable to branch on? And once
we branched on a variable, which of the generated problems do we solve next? The depth-
first search style exploration chosen in our example, where newly generated subproblems
are explored first, is popular as it leads to integral solutions quickly. However, many other
strategies have been analyzed.
Why do we not simply present the strategy that works best? Well, such a strategy likely
does not exist. Solving integer programs in general is N P-hard, and very likely, no efficient
algorithm exists. In practice, however, branch & bound is nearly always superior to simple
enumeration of feasible solutions, and is used in some form or the other in most commercial
codes.
We conclude with one last comment regarding implementation. In this section, we reduced
the problem of finding an optimal production plan for a tie production problem to that of
solving a series of 9 linear programming problems. The reader may notice that these problems
are extremely similar! I.e., branching on a variable merely adds a single constraint to an LP
for which we know an optimal basic solution. Can we use such an optimal basic solution for
a parent problem to compute solutions for the generated subproblems faster? The answer is
yes! The so called dual Simplex algorithm applies in situations where a single constraint is
added to a linear program, and where the old optimal solution is rendered infeasible by this
new constraint. In many cases, the algorithm reoptimizes quickly from the given infeasible
starting point.
Exercises
1. Suppose we use a branch and bound scheme to solve an integer program that is a maxi-
mization problem. The following figure describes the partial enumeration tree. For each
subproblem we indicate the value of the objective function as well as whether the solution
is an integer solution or not. Indicate in the figure which subproblems we still need to
solve (if any). In particular, can we already deduce the value of the optimal solution to the
integer program?
[Figure: partial branch & bound enumeration tree. The root has z = 60 (fractional) and is branched on x4 (x4 ≤ 4 / x4 ≥ 5), giving nodes with z = 52 (fractional) and z = 57 (fractional); further branchings on x6 (≤ 6 / ≥ 7), x3 (≤ 9 / ≥ 10), x8 (≤ 2 / ≥ 3) and x5 (≤ 0 / ≥ 1) appear deeper in the tree, with leaf values z = 45 (fractional), z = 42 (integer), z = 46 (fractional) and z = 46 (integer).]
You do not need to use the simplex algorithm to find a solution to each of the relaxations. We illustrate a simple technique on the relaxation of the original problem. Consider the ratios

18/12 > 10/10 > 6/8 > 4/6.

To maximize the objective function, we set x1 as large as possible, then x2 as large as possible, and so on. In this case this yields x1 = 1, x2 = 6/10 and x3 = x4 = 0. Proceed in a similar way to solve each of the other relaxations.
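The greedy rule in this hint can be written as a small routine. The sketch below is our own; the capacity value 18 is an assumption chosen only so that the numbers above are reproduced (the exercise's actual data is not shown here).

def greedy_fractional(values, weights, capacity):
    """Greedy solution of a knapsack-type LP relaxation
    max v^T x, w^T x <= capacity, 0 <= x <= 1: fill variables in decreasing
    order of v_j / w_j, taking a fractional amount of the first item that
    no longer fits."""
    order = sorted(range(len(values)), key=lambda j: values[j] / weights[j],
                   reverse=True)
    x, room = [0.0] * len(values), capacity
    for j in order:
        take = min(1.0, room / weights[j])
        x[j], room = take, room - take * weights[j]
        if room <= 0:
            break
    return x

# With ratios 18/12 > 10/10 > 6/8 > 4/6 and a hypothetical capacity of 18:
print(greedy_fractional([18, 10, 6, 4], [12, 10, 8, 6], 18))   # [1.0, 0.6, 0.0, 0.0]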
where n is an odd positive integer. Prove that the branch and bound algorithm (without using cuts) will have to examine at least 2^⌊n/2⌋ subproblems before it can solve the main problem, in the worst case.
Nonlinear optimization
In this chapter, we will show that solving a general NLP is likely to be difficult, even when the problem is small. One reason is that the feasible region (the set of all feasible solutions) of an NLP is not always convex. We therefore turn our attention to the special case where the feasible region is convex. We discuss optimality conditions and give a brief overview of a primal-dual polynomial-time algorithm for linear programming based on ideas from nonlinear optimization.
Key concepts covered in this chapter include: convex functions, epigraphs, subgradients,
Lagrangians, Karush-Kuhn-Tucker Theorem and primal-dual interior-point algorithms.
Example 25. Suppose n = 2 and m = 4 in (1.35) and that for x = (x1, x2)ᵀ ∈ R² we have

f(x) := x2, and
g1(x) := −x1² − x2 + 2,   g2(x) := x2 − 3/2,   g3(x) := x1 − 3/2,   g4(x) := −x1 − 2.
The feasible region is a subset of R². It corresponds to the union of the two shaded regions in Figure 7.1. For instance, g1(x) = −x1² − x2 + 2 ≤ 0, or equivalently, x2 ≥ 2 − x1². Thus the solution to the first constraint of the NLP is the set of all points above the curve indicated by g1 in the figure. As we are trying to find, among all points x = (x1, x2)ᵀ in the feasible region, the one that minimizes f(x) = x2, the unique optimal solution will be the point a = (−2, −2)ᵀ.
Observe that the feasible region is not convex, indeed it is not even “connected” (i.e. there
does not exists a continuous curve contained in the feasible region joining any two points of
the feasible region).
[Figure 7.1: the non-convex feasible region determined by g1, g2, g3, g4, with the locally optimal point b and the globally optimal point a = (−2, −2)ᵀ.]
Example 26. Suppose n = 2 and m = 3 in (1.35) and that for x = (x1 , x2 )T ∈ R2 we have,
[Figure: the feasible region of the NLP in Example 26, determined by g1, g2, g3, together with the point a = (1, 1)ᵀ.]
7.2.1 NP-hardness
Nonlinear optimization programs can be very hard in general. For instance, suppose that for
every j ∈ {1, . . . , n} we have the constraints,
x_j² − x_j ≤ 0,
−x_j² + x_j ≤ 0.

Then the constraints define the same feasible region as the quadratic equations x_j² = x_j for every j ∈ {1, 2, . . . , n}. Therefore, the feasible region of these constraints is exactly the 0,1 vectors in Rⁿ. Now, if we also add the constraints Ax ≤ b, we deduce that every 0,1 integer
programming problem can be formulated as an NLP. In other words, 0,1 integer program-
ming is reducible to nonlinear programming. Moreover, this reduction is clearly polynomial.
Therefore, solving an arbitrary instance of an NLP is at least as hard as solving an arbitrary
instance of a 0,1 integer programming problem. In particular, as 0,1 feasibility is NP-hard, so
is nonlinear programming feasibility.
Theorem 7.1. There do not exist integers x, y, z ≥ 1 and an integer n ≥ 3 such that xⁿ + yⁿ = zⁿ.
Fermat wrote this conjecture in the margin of a book and claimed to have a proof of the result, but that it was too large to fit in the margin. The conjecture became known as Fermat's Last Theorem. The first accepted proof of this result was published in 1995, some 358 years
after the original problem was proposed. We will show that a solution to a very simple looking
NLP with only four variables has the following key property: the optimal objective value of
zero is attained, if and only if Fermat’s Last Theorem is true. Hence, solving this particular
NLP is at least as hard as proving Fermat’s Last Theorem! In this NLP, see (1.35), we have 4
variables (n = 4) and 4 constraints (m = 4). For x = (x1 , x2 , x3 , x4 )T ∈ R4 ,
f(x) := ( x1^{x4} + x2^{x4} − x3^{x4} )² + (sin πx1)² + (sin πx2)² + (sin πx3)² + (sin πx4)²,
and
First observe that the feasible region of this NLP is given by,
S := {x ∈ R4 : x1 ≥ 1, x2 ≥ 1, x3 ≥ 1, x4 ≥ 3}.
Note that f(x) is a sum of squares. Therefore, f(x) ≥ 0 for every x ∈ R⁴, and it is equal to zero if and only if every term in the sum is zero, i.e.,

x1^{x4} + x2^{x4} = x3^{x4}   and   sin πx1 = sin πx2 = sin πx3 = sin πx4 = 0.

The latter string of equations is equivalent to x_j being integer for every j ∈ {1, 2, 3, 4}. Moreover, the feasibility conditions require x1 ≥ 1, x2 ≥ 1, x3 ≥ 1, x4 ≥ 3. Therefore, f(x̄) = 0 for some x̄ ∈ S if and only if x̄_j is a positive integer for every j, with x̄4 ≥ 3, and x̄1^{x̄4} + x̄2^{x̄4} = x̄3^{x̄4}.
That is, if and only if Fermat’s Last Theorem is false. Surprisingly, it is not difficult to prove
(and it is well-known) that the infimum of f over S is zero. Thus, the difficulty here lies
entirely in knowing whether the infimum can be attained.
We just argued that some nonlinear optimization problems can be very hard even if the number of variables is very small (e.g., at most 10) or even if the nonlinearity is bounded (e.g., at most quadratic functions). However, carefully isolating the special structure present in some classes of nonlinear programming problems and exploiting it allows us to solve many large-scale non-linear programs in practice.
7.3 Convexity
Consider an NLP of the form given in (1.35) and denote by S its feasible region. We say that x̄ ∈ S is locally optimal if for some positive d ∈ R we have f(x̄) ≤ f(x) for every x ∈ S with ‖x − x̄‖ ≤ d, i.e. no feasible solution of the NLP within distance d of x̄ has better value than x̄. We sometimes call an optimal solution to the NLP a globally optimal solution.
It is easy to verify that if S is convex and f is a convex function, then locally optimal solutions are globally optimal. However, when S is not convex we can have locally optimal solutions that are not globally optimal. This is illustrated in Example 25: there, b is locally optimal, yet a ≠ b is the only globally optimal solution.
A natural scheme for solving an optimization problem is as follows: find a feasible solu-
tion, and then repeatedly, either (i) show that the current feasible solution is globally optimal,
using some optimality criteria, or (ii) try to find a better feasible solution (here better might
mean one with better value for instance, though this may not always be possible). The simplex
algorithm for linear programming follows this scheme. Both steps (i) and (ii) may become
difficult when the feasible region is not convex. We will therefore turn our attention to the
case where the feasible region is convex. In this section, we establish sufficient conditions for
the feasible region of an NLP to be convex (see Remark 7.4).
In other words, f is convex if, for any two points x(1), x(2), the unique linear function on the line segment between x(1) and x(2) that takes the same values as f at x(1) and x(2) dominates f on that segment.

[Figure: a convex function (left) and a non-convex function (right), compared with the linear function agreeing with them at x(1) and x(2).]
Example 27. Consider the function f : R → R where f(x) = x². Let a, b ∈ R be arbitrary, and consider an arbitrary λ ∈ [0, 1]. To prove that f is convex we need to verify that

[λa + (1 − λ)b]² ≤ λa² + (1 − λ)b²,

which holds since λa² + (1 − λ)b² − [λa + (1 − λ)b]² = λ(1 − λ)(a − b)² ≥ 0.
Proposition 7.2. Let f : Rn → R. Then, f is convex if and only if epi( f ) is a convex set.
Observe that in the previous proposition the epigraph lives in an (n + 1)-dimensional space. The function in Figure 7.5 is convex as its epigraph is convex. However, the function in Figure 7.6 is not convex as its epigraph is not convex.
[Figures 7.5 and 7.6: a convex function whose epigraph epi(f) = {(x, y) : y ≥ f(x)} is convex, and a non-convex function whose epigraph is not convex.]
µ1 µ2
Proof of Proposition 7.2. Suppose f : Rn → R is convex. Let , ∈ epi( f ) and
u v
λ ∈ [0, 1]. We have
This implies (by the definition of epi( f )), f (λ u + (1 − λ )v) ≤ λ f (u) + (1 − λ ) f (v). There-
fore, f is convex.
Remark 7.3. Let g : Rⁿ → R be a convex function and let β ∈ R. Then the level set

{x ∈ Rⁿ : g(x) ≤ β}

is a convex set.
In Figure 7.7 we represent a convex function with a convex level set. In Figure 7.8 we rep-
resent a non-convex function with a non-convex level set. We leave it as an exercise to show
however, that it is possible to have a non-convex function where every level set is convex.
[Figure 7.7: a convex function f with a convex level set {x ∈ Rⁿ : f(x) ≤ α}.  Figure 7.8: a non-convex function f with a non-convex level set.]
Proof of Remark 7.3. Let g : Rⁿ → R be a convex function and let β ∈ R. We need to show that S := {x ∈ Rⁿ : g(x) ≤ β} is convex. Let x(1), x(2) ∈ S and let λ ∈ [0, 1]. Then

g( λ x(1) + (1 − λ) x(2) ) ≤ λ g(x(1)) + (1 − λ) g(x(2)) ≤ λ β + (1 − λ) β = β,

where the first inequality follows from the fact that g is a convex function, and the second from λ, 1 − λ ≥ 0 and g(x(1)), g(x(2)) ≤ β. It follows that λ x(1) + (1 − λ) x(2) ∈ S. Hence, S is convex as required.
Consider the NLP defined in (1.35). We say that it is a convex NLP if g1, . . . , gm and f are all convex functions. It follows in that case from Remark 7.3 that for every i ∈ {1, . . . , m} the level set {x ∈ Rⁿ : gi(x) ≤ 0} is a convex set. Since the intersection of convex sets is a convex set (see Remark 2.16), we deduce that the feasible region

{x ∈ Rⁿ : gi(x) ≤ 0 for all i ∈ {1, . . . , m}}          (7.1)

is a convex set.
When g1 , . . . , gm are all affine functions, then the feasible region (7.1) is a polyhedron. More-
over, in that case, the functions are clearly convex. Hence, the previous result implies in
particular that every polyhedron is a convex set, which was the statement of Proposition 2.17.
Exercises
1. (a) Let g1 , g2 , g3 : R → R be defined by
Plot these functions on R2 . Identify on your plot, the function ĝ : R → R defined by,
(a) Prove that the set of feasible solutions of (P) is a convex set.
(b) Let x̂ be a locally optimal feasible solution of (P). Prove that x̂ is globally optimal.
7.4.1 Subgradients
Let f : Rⁿ → R be a convex function and let x̄ ∈ Rⁿ. We say that s ∈ Rⁿ is a subgradient of f at x̄ if for every x ∈ Rⁿ the following inequality holds:

f(x) ≥ f(x̄) + sᵀ(x − x̄).
Denote by h(x) the function f (x̄) + sT (x − x̄). Observe that h(x) is an affine function (x̄ is a
constant). Moreover, we have that h(x̄) = f (x̄) and h(x) ≤ f (x) for every x ∈ Rn . Hence, the
function h(x) is an affine function that provides a lower bound on f (x) and approximates f (x)
around x̄. See Figure 7.9.
[Figure 7.9: a convex function f and the affine function h(x) = f(x̄) + sᵀ(x − x̄), which touches f at x̄ and lies below it everywhere.]
Example 28. Consider g : R² → R where for every x = (x1, x2)ᵀ we have g(x) = x2² − x1. It can be readily checked (see Example 27) that g is convex. We claim that s := (−1, 2)ᵀ is a subgradient of g at x̄ = (1, 1)ᵀ. We have h(x) := g(x̄) + sᵀ(x − x̄) = 0 + (−1, 2)(x − (1, 1)ᵀ) = −x1 + 2x2 − 1. We need to verify that h(x) ≤ g(x) for every x ∈ R², i.e. that

−x1 + 2x2 − 1 ≤ x2² − x1,

or equivalently that x2² − 2x2 + 1 = (x2 − 1)² ≥ 0, which clearly holds as the square of any number is non-negative.
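A quick numerical spot-check of the subgradient inequality from Example 28 (our own illustration; random sampling does not, of course, constitute a proof):

import numpy as np

# Check that s = (-1, 2)^T is a subgradient of g(x) = x2^2 - x1 at xbar = (1, 1)^T,
# i.e. that g(x) >= g(xbar) + s^T (x - xbar) on sampled points of R^2.
g = lambda x: x[1] ** 2 - x[0]
xbar, s = np.array([1.0, 1.0]), np.array([-1.0, 2.0])

rng = np.random.default_rng(0)
xs = rng.uniform(-10, 10, size=(10000, 2))
gap = np.array([g(x) - (g(xbar) + s @ (x - xbar)) for x in xs])
print(gap.min() >= -1e-12)   # True: the affine minorant never exceeds g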
This last remark is illustrated in Figure 7.11. We consider the function g : R2 → R, where
g(x) = x22 − x1 and the point x̄ = (1, 1)T . We saw in Example 28 that the vector s = (−1, 2)T
is a subgradient for g at x̄. Then F = {x ∈ R2 : −x1 + 2x2 − 1 ≤ 0}. We see in the figure that
F is a supporting halfspace of C at x̄ as predicted.
We deduce the following useful tool from the previous remark,
(Footnote: F is clearly a halfspace, since we can rewrite it as {x : sᵀx ≤ sᵀx̄ − g(x̄)} and sᵀx̄ − g(x̄) is a constant.)
[Figure: a convex set C with a unique supporting halfspace at a smooth boundary point x(1), and many supporting halfspaces at a non-smooth boundary point x(2).]
Corollary 7.6. Consider an NLP of the form given in (1.35). Let x̄ be a feasible solution and
suppose that constraint gi (x) ≤ 0 is tight for some i ∈ {1, . . . , m}. Suppose gi is a convex func-
tion that has a subgradient s at x̄. Then the NLP obtained by replacing constraint gi (x) ≤ 0
by the linear constraint sT x ≤ sT x̄ − gi (x̄) is a relaxation of the original NLP.
In this section we consider convex NLPs of the form (1.35) that satisfy the additional condition
that all of the functions f and g1 , . . . , gm are differentiable. In that setting we can characterize
(see Theorem 7.9) when a feasible solution x̄ is an optimal solution (assuming the existence
of a Slater point).
[Figure 7.11: the set C, the point x̄ = (1, 1)ᵀ, the subgradient s, and the supporting halfspace F from Example 28.]
We claim that it is sufficient to consider NLPs where the objective function is linear, i.e. of
the form,
min z = cᵀx
s.t.
g1(x) ≤ 0,
g2(x) ≤ 0,                (7.2)
⋮
gm(x) ≤ 0.
This is because problem (1.35) is reducible to problem (7.2). To prove this fact, we can proceed as follows: given an NLP of the form (1.35), introduce a new variable x_{n+1} and add the constraint f(x) − x_{n+1} ≤ 0, obtaining the problem

max z = −x_{n+1}
s.t.
f(x) − x_{n+1} ≤ 0,
g1(x) ≤ 0,
g2(x) ≤ 0,
⋮
gm(x) ≤ 0.
This NLP is of the form (7.2), and minimizing xn+1 is equivalent to minimizing f (x).
Let x̄ be a feasible solution to the NLP (7.2) and assume that it is a convex NLP. Let
us derive sufficient conditions for x̄ to be an optimal solution to (7.2). Define, J(x̄) :=
{i : gi (x̄) = 0}. That is, J(x̄) is the index set of all constraints that are tight at x̄. Suppose
for every i ∈ J(x̄) we have a subgradient si of the function gi at the point x̄. Then we construct
a linear programming relaxation of the NLP as follows: first we omit every constraint that is
not tight at x̄, and for every i ∈ J(x̄), we replace (see Corollary 7.6) the constraint gi (x) ≤ 0
by the linear constraint sTi x ≤ sTi x̄ − g(x̄). Since the objective function is given by “min cT x”
we can rewrite it as “max −cT x”. The resulting linear program is thus,
max z = −cᵀx
s.t.                                         (7.3)
sᵢᵀx ≤ sᵢᵀx̄ − gi(x̄)   for all i ∈ J(x̄)
Theorem 4.7 says that x̄ is an optimal solution to (7.3) when −c is in the cone of the tight constraints, i.e. if −c ∈ cone{sᵢ : i ∈ J(x̄)}. Since (7.3) is a relaxation of (7.2) and x̄ is feasible for (7.2), it follows that x̄ is then also an optimal solution of (7.2). Hence, we have proved the following:
Proposition 7.7. Consider the NLP (7.2) and assume that g1, . . . , gm are convex functions. Let x̄ be a feasible solution, and suppose that for every i ∈ J(x̄) we have a subgradient sᵢ of gᵢ at x̄. If −c ∈ cone{sᵢ : i ∈ J(x̄)}, then x̄ is an optimal solution.
Thus we have sufficient conditions for optimality. Theorem 7.9 that we will state in the next
section, essentially says that when the subgradients are unique, then these conditions are also
necessary. We illustrate the previous proposition on an example.
Example 26 continued. Let us use the previous proposition to show that the point x̄ = (1, 1)T
is an optimal solution to the NLP given in Example 26. In this case J(x̄) = {1, 2}, and the
feasible region for the linear programming relaxation (7.3) corresponds to the shaded region
[Figure 7.12: the constraints g1, g2, g3, the point x̄ = (1, 1)ᵀ, and the feasible region of the linear programming relaxation (7.3).]
in Figure 7.12. In Example 28 we showed that the subgradient of g1 at x̄ is (−1, 2)ᵀ. Similarly, we can show that the subgradient of g2 at x̄ is (2, −1)ᵀ. In this example −c = (1, 1)ᵀ, thus Proposition 7.7 asks us to verify that

−c = (1, 1)ᵀ ∈ cone{ (−1, 2)ᵀ, (2, −1)ᵀ },

which indeed holds, since (1, 1)ᵀ = 1 · (−1, 2)ᵀ + 1 · (2, −1)ᵀ; hence x̄ is an optimal solution.
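The multipliers can also be found mechanically by solving a small linear system and checking nonnegativity (our own sketch):

import numpy as np

# Columns of S are the subgradients s1 = (-1, 2)^T and s2 = (2, -1)^T.
S = np.array([[-1.0, 2.0],
              [ 2.0, -1.0]])
minus_c = np.array([1.0, 1.0])

lam = np.linalg.solve(S, minus_c)
print(lam, np.all(lam >= 0))   # [1. 1.] True  ->  -c lies in the cone, so xbar is optimal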
If there exists a vector s ∈ Rⁿ such that

lim_{h→0}  [ f(x̄ + h) − f(x̄) − sᵀh ] / ‖h‖₂  =  0,

we say that f is differentiable at x̄ and call the vector s the gradient of f at x̄. We denote s by ∇f(x̄). We will use the following without proof:
Proposition 7.8. Let f : Rn → R be a convex function and let x̄ ∈ Rn . If the gradient ∇ f (x̄)
exists then it is a subgradient of f at x̄.
Note that in the above definition of the gradient, h varies over all vectors in Rⁿ. Under some slightly more favorable conditions, we can obtain the gradient ∇f(x̄) via partial derivatives of f at x̄. For example, suppose that for every j ∈ {1, 2, . . . , n}, the partial derivative ∂f/∂x_j exists and is continuous at every x ∈ Rⁿ. Then

∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn )ᵀ.
We illustrate the previous theorem in Figure 7.13. In that example, the tight constraints for x̄
are g1 (x) ≤ 0 and g2 (x) ≤ 0. We indicate the cone formed by ∇g1 (x̄), ∇g2 (x̄) (translated to
have x̄ as its origin). In this example, −∇ f (x̄) is in that cone, hence the feasible solution x̄ is
in fact optimal.
Suppose that in Theorem 7.9 the function f (x) is the linear function cT x, i.e. the NLP is of
the form (7.2). Then ∇ f (x̄) = c and the sufficiency of Theorem 7.9 follows immediately from
Proposition 7.7, i.e. we have shown that if the condition (?) holds then x̄ is indeed an optimal
[Figure 7.13: the feasible region with tight constraints g1(x) = 0 and g2(x) = 0 at x̄; the gradients ∇g1(x̄) and ∇g2(x̄) span the cone cone{∇g_j(x̄) : j ∈ J(x̄)} (translated to x̄), and −∇f(x̄) lies in that cone, so x̄ is optimal.]
solution. The essence of the theorem is to prove the reverse direction, i.e. that x̄ can only be optimal when (?) holds. This is where the assumption that a Slater point exists comes into play. Observe that when f and g1, . . . , gm are all affine functions, Theorem 7.9 becomes the optimality theorem for linear programs (Theorem 4.7).
Theorem 4.7 was a restatement of the Complementary Slackness Theorem 4.6. Similarly, we leave it as an exercise to check that condition (?) can be restated as follows: there exists ȳ ∈ Rᵐ₊ such that the following conditions hold:

∇f(x̄) + ∑_{i=1}^{m} ȳi ∇gi(x̄) = 0          (dual feasibility, together with ȳ ∈ Rᵐ₊)
ȳi gi(x̄) = 0  for all i ∈ {1, 2, . . . , m}     (complementary slackness)
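These conditions are easy to verify numerically for a candidate point. The sketch below checks them for a hypothetical convex NLP (min x1² + x2² subject to 2 − x1 − x2 ≤ 0) at x̄ = (1, 1)ᵀ with multiplier ȳ = 2; the problem data is our own example, not one from the text.

import numpy as np

# Hypothetical convex NLP:  min x1^2 + x2^2  s.t.  g(x) = 2 - x1 - x2 <= 0.
xbar, ybar = np.array([1.0, 1.0]), 2.0

grad_f = 2 * xbar                      # gradient of f(x) = x1^2 + x2^2
grad_g = np.array([-1.0, -1.0])        # gradient of g(x) = 2 - x1 - x2
g_val  = 2 - xbar.sum()

print(np.allclose(grad_f + ybar * grad_g, 0))     # dual feasibility / stationarity
print(ybar >= 0 and np.isclose(ybar * g_val, 0))  # ybar >= 0 and complementary slackness
print(g_val <= 1e-12)                             # primal feasibility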
Exercises
1. Consider the following NLP:
and the vector x̄ := (1, 1)T . Write down the optimality conditions for x̄ for this NLP as
described in the Karush-Kuhn-Tucker Theorem. Using these conditions and the theorem,
prove that x̄ is optimal. N OTE: you may use the fact that the functions defining the objective
function and the constraints are convex and differentiable, without proving it.
(NLP)  min x3
       s.t.
       x1 + x2 ≤ 0,
       x1² − 4 ≤ 0,
       x1² − 2x1 + x2² − x3 + 1 ≤ 0,

and the vector x̄ := (1/2, −1/2, 1/2)ᵀ. Write down the optimality conditions for x̄ for this NLP as described in the Karush-Kuhn-Tucker Theorem. Using these conditions and the theorem, prove that x̄ is optimal.
Since y ≥ 0, for every feasible solution x we have ∑_{i=1}^{m} yi gi(x) ≤ 0. Therefore, the minimum value of [ f(x) + ∑_{i=1}^{m} yi gi(x) ] over all x ∈ Rⁿ gives a lower bound on the optimal objective value of the original NLP. Notice that if x̄ is a feasible solution of NLP, then f(x̄) is an upper bound on the optimal objective value of the NLP. On the other hand, if we pick some nonnegative
Note that the Lagrangian encodes all the information about the problem NLP:
• Setting y to unit vectors and using the above, we can obtain all the constraint functions.
That is, for every i ∈ {1, 2, . . . , m},
Perturbing y along a unit vector, we have the same result for every y:
Let z∗ denote the optimal objective value of NLP. Since the minimum value of the Lagrangian at every nonnegative y gives a lower bound on z∗, we also have
The next theorem summarizes the fundamental duality result based on the Lagrangian.
[Figure 7.14: a saddle-point function L(x, y); among the points (x, ȳ), the point (x̄, ȳ) minimizes L, and among the points (x̄, y), it maximizes L.]
We illustrate the Saddle-point condition of the Lagrangian in Figure 7.14. Among all
points (x, ȳ), the point (x̄, ȳ) minimizes L(x, y). Among all points (x̄, y), the point (x̄, ȳ) maxi-
mizes L(x, y).
Now, let us see a proof that if x̄ ∈ Rn satisfies the conditions of the above theorem for some
ȳ then it is an optimal solution of the NLP in the convex case. First, we prove that x̄ is feasible.
We saw that the Lagrangian encodes all the information on the constraints of NLP and we can
extract this information using unit vectors. So, let us fix i ∈ {1, 2, . . . , m} and apply the first
inequality in the saddle-point condition using y := ȳ + ei :
which is equivalent to
f(x̄) + ∑_{i=1}^{m} ȳi gi(x̄) ≤ f(x) + ∑_{i=1}^{m} ȳi gi(x),   ∀x ∈ Rⁿ.
Consider all feasible x (that is, gi(x) ≤ 0 for every i). Since ȳi ≥ 0 for every i, we have ∑_{i=1}^{m} ȳi gi(x) ≤ 0. Hence, our conclusion becomes

f(x̄) + ∑_{i=1}^{m} ȳi gi(x̄) ≤ f(x),   ∀x feasible in NLP.
Using the fact that complementary slackness holds at (x̄, ȳ), we observe that ∑_{i=1}^{m} ȳi gi(x̄) is zero, and we conclude

f(x̄) ≤ f(x),   ∀x feasible in NLP.
Indeed, this combined with the feasibility of x̄ means that x̄ is optimal in NLP.
The last two theorems can be strengthened via replacing the assumption on the existence
of a Slater point by weaker, but more technical assumptions.
Moreover, the last two theorems can be generalized to NLP problems that are not convex.
In the general case, the first theorem only characterizes local optimality.
Note that we have explicitly included the slack variables s. For now, we will also assume that
there exist interior solutions for both problems, i.e. there exist x̄ and (ȳ, s̄) such that Ax̄ = b,
x̄ j > 0 for all j and AT ȳ − s̄ = c, s̄ j > 0 for all j. So, an interior solution is a feasible solution
which satisfies all the inequality constraints strictly. Note that x̄ and s̄ are Slater points for (P)
and (D) respectively (in suitable reformulations of (P) and (D) in the form of NLP).
By the Weak Duality Theorem, we have
bT ȳ − cT x̄ ≥ 0.
If x̄ and s̄ had been optimal solutions, we would have had x̄T s̄ = 0. Recall the necessary and
sufficient conditions for optimality:
Ax = b, x ≥ 0, (primal feasibility),
AT y − s = c, s ≥ 0, (dual feasibility),
x j s j = 0 for all j (complementary slackness).
Now, we would like to find a new solution x, y, s so that x is feasible for (P), (y, s) is feasi-
ble for (D) and the duality gap bT y − cT x = xT s is smaller than the current duality gap: x̄T s̄.
Recall that the simplex algorithms keep primal feasibility and complementary slackness and
strive for dual feasibility. There are also so-called dual-simplex algorithms which keep dual
feasibility and complementary slackness and strive for primal feasibility. Actually, we have
seen an algorithm which behaved in this way (keeping dual feasibility and complementary
slackness and striving for primal feasibility), namely, the Hungarian Method for finding min-
imum cost perfect matchings in bipartite graphs. Here, we are keeping primal feasibility and
dual feasibility and striving for complementary slackness. So, we would like to find directions
dx in the primal and dy, ds in the dual such that
1. x̄ + dx is feasible in (P),
2. (ȳ + dy, s̄ + ds) is feasible in (D), and
3. x̄ + dx and s̄ + ds satisfy the complementary slackness conditions.
The first item above implies Adx = 0, the second item implies AT dy − ds = 0, and the third
one implies (x̄ j + dx j )(s̄ j + ds j ) = 0. The first two systems of equations are linear in dx, dy,
and ds; however, the third one is
s̄ j dx j + x̄ j ds j + dx j ds j = −x̄ j s̄ j
which is quadratic (we have the term dx j ds j ). Of course, if we could solve these three groups
of equations at once and maintain the nonnegativity of (x̄ + dx), (s̄ + ds), we would have
optimal solutions of (P) and (D). Since we do not know how to solve these equations (without
solving the linear programming problem), we will ignore the quadratic term and try to solve
the big linear system with the last group of linear equations s̄ j dx j + x̄ j ds j = −x̄ j s̄ j instead.
So, we solve the following system of linear equations. Note that the last group of equations
is trying to force a solution dx, ds such that (x̄ + dx)T (s̄ + ds) is close to zero.
Adx = 0, (7.4)
AT dy − ds = 0, (7.5)
s̄ j dx j + x̄ j ds j = −x̄ j s̄ j , for all j ∈ {1, 2, . . . , n}. (7.6)
This is a system of linear equations and we know how to solve it efficiently (e.g. by Gaussian
Elimination). Under our assumptions (rank(A) = m and x̄ > 0, s̄ > 0), this linear system
always has a unique solution.
One very important condition we neglected so far is the nonnegativity constraint on x and
s: Most likely, x̄ + dx is not feasible in (P) or s̄ + ds is not feasible in (D), possibly both.
However, the above development convinced us that the dx, dy, ds are good directions and we
should thus move in these directions, but not “all the way.” So we consider the solution

(x̄ + α dx, ȳ + α dy, s̄ + α ds)

for some α > 0. We should pick α such that x̄ + α dx > 0 and s̄ + α ds > 0. We can choose, for instance,

α = 0.999 · min{ x̄_j/(−dx_j) : dx_j < 0;  s̄_j/(−ds_j) : ds_j < 0 }.          (7.7)
Now, we show that the duality gap decreases from iteration to iteration. Here we are able to
show that the decrease is proportional to (1 − α). That is, when the step size is large, we get
a large decrease in the duality gap. Let us calculate the new duality gap:

new gap = ∑_{j=1}^{n} (x̄_j + α dx_j)(s̄_j + α ds_j)
        = ∑_{j=1}^{n} x̄_j s̄_j + α ∑_{j=1}^{n} (s̄_j dx_j + x̄_j ds_j) + α² ∑_{j=1}^{n} dx_j ds_j.

By (7.6), the middle sum equals −∑_{j=1}^{n} x̄_j s̄_j; and by (7.4) and (7.5), ∑_{j=1}^{n} dx_j ds_j = dxᵀds = dxᵀ(Aᵀdy) = (A dx)ᵀdy = 0. Hence the new duality gap is (1 − α) ∑_{j=1}^{n} x̄_j s̄_j, i.e. (1 − α) times the current one.
Just as in the previous sections, instead of explicitly forming the inverse of (A X S⁻¹ Aᵀ), we solve the linear system of equations (A X S⁻¹ Aᵀ) dy = −b for the unknown dy.
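One iteration of the scheme described above can be sketched in a few lines. The code below is our own illustration, with hypothetical data and no safeguards: it assembles and solves the linear system (7.4)-(7.6) directly, applies the step-size rule (7.7), and prints the reduced duality gap.

import numpy as np

def affine_scaling_step(A, x, y, s, frac=0.999):
    """One primal-dual affine-scaling step: solve (7.4)-(7.6) for (dx, dy, ds)
    and move a fraction of the way to the boundary, as in (7.7)."""
    m, n = A.shape
    # Unknowns ordered (dx, dy, ds); rows: A dx = 0, A^T dy - ds = 0, S dx + X ds = -XSe.
    M = np.zeros((2 * n + m, 2 * n + m))
    rhs = np.zeros(2 * n + m)
    M[:m, :n] = A
    M[m:m + n, n:n + m] = A.T
    M[m:m + n, n + m:] = -np.eye(n)
    M[m + n:, :n] = np.diag(s)
    M[m + n:, n + m:] = np.diag(x)
    rhs[m + n:] = -x * s
    d = np.linalg.solve(M, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    ratios = [x[j] / -dx[j] for j in range(n) if dx[j] < 0]
    ratios += [s[j] / -ds[j] for j in range(n) if ds[j] < 0]
    alpha = frac * min(ratios)
    return x + alpha * dx, y + alpha * dy, s + alpha * ds

# Hypothetical problem: max x1  s.t.  x1 + x2 = 2, x >= 0  (c = (1, 0), b = (2)),
# with interior points xbar = (1, 1), ybar = 2, sbar = (1, 2); duality gap x^T s = 3.
A = np.array([[1.0, 1.0]])
x, y, s = np.array([1.0, 1.0]), np.array([2.0]), np.array([1.0, 2.0])
x, y, s = affine_scaling_step(A, x, y, s)
print(x @ s)   # new duality gap, equal to (1 - alpha) * 3, roughly 0.75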
Remark 7.11. (i) The idea that led us to the system of equations (7.4), (7.5), (7.6) is very
closely related to Newton’s method for solving a system of nonlinear equations.
(ii) There are many different ways of choosing the step size α; some have practical and/or theoretical advantages. In practice, we may even want to choose different step sizes in the primal and the dual problems, e.g. αP = 0.9 min{ x̄_j/(−dx_j) : dx_j < 0 } and αD = 0.9 min{ s̄_j/(−ds_j) : ds_j < 0 }, and then update (x̄, ȳ, s̄) to (x̄ + αP dx, ȳ + αD dy, s̄ + αD ds).
(iii) The algorithm can be modified so that it allows infeasible starting points. The only requirement is that x̄ > 0 and s̄ > 0. In this combined two-phase version of the algorithm, the right-hand-side vectors of (7.4) and (7.5) are replaced by (b − Ax̄) and (c − Aᵀȳ + s̄) and are updated in each iteration.
where ρ > 0 is to be chosen later. In the algorithm, we will choose the step size by the following rule:

choose the step size αk ∈ (0, 1) such that ψ( x̄ + αk dx, s̄ + αk ds; ρ ) = 1 + ρ ln( 1/(nε) ).
Then, we can prove an upper bound on the number of iterations of the algorithm. We assume that we are given ε ∈ (0, 1). We define

ρ := 2 ln(1/ε).
Theorem 7.12. Suppose we are given (x⁰, y⁰, s⁰) as in the statement of the primal-dual affine scaling algorithm. Further assume that

ψ(x⁰, s⁰; ρ) ≤ 1 + ρ ln( 1/(nε) ).

Then the primal-dual affine scaling algorithm with the above step-size strategy based on ψ terminates in at most O( n ln²(1/ε) ) iterations with x^k and (y^k, s^k) feasible in (P) and (D) respectively, such that their duality gap is less than ε.
This theorem guarantees finding feasible solutions with very good objective values. It
turns out, for LP, if we can get the duality gap below a certain threshold (for data (A, b, c)
with rational entries, let t denote the number of bits required to store them in binary; then
choosing ε := exp(−2t) suffices), then using these very accurate approximate solutions, we
can quickly compute exact optimal solutions. Using these techniques and ideas, we can prove
that LP problems with rational data can be solved in polynomial time. We work through some
of the details in the exercises at the end of this chapter.
Exercises
1. Suppose A ∈ Rm×n with rank(A) = m, x̄ > 0 and s̄ > 0 are given. Prove that the system
(7.4)-(7.6) has a unique solution.
3. Suppose that during an application of the primal-dual interior-point method (to a primal-dual
pair of LPs, with the primal (P) in standard equality form), when we solve the system
$$A\, dx = 0, \qquad A^T dy - ds = 0, \qquad \bar{s}_j\, dx_j + \bar{x}_j\, ds_j = -\bar{x}_j \bar{s}_j \ \text{ for all } j$$
(for a given current iterate x̄ > 0, s̄ > 0 such that $A\bar{x} = b$ and $A^T\bar{y} - \bar{s} = c$ for some ȳ), we find
that dx = 0. Prove that in this case there exists ŷ such that
$$A^T \hat{y} = c$$
Player P’s strategies are denoted by a vector $x \in \mathbb{R}^n$ and Player D’s strategies by a nonnegative
vector $y \in \mathbb{R}^m$. The players choose their strategies (without knowing each other’s choices) and
reveal them simultaneously. Based on the vectors x and y that they reveal, Player P pays Player D
$\left[f(x) + \sum_{i=1}^{m} y_i g_i(x)\right]$ dollars (if this quantity is negative, Player D pays its absolute value
to Player P). Player P’s problem is
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} \{L(x, y)\} \;=\; \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} \left[ f(x) + \sum_{i=1}^{m} y_i g_i(x) \right].$$
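To see why Player P effectively faces the original constrained problem (a standard observation, spelled out here for convenience rather than quoted from the text): for a fixed x, the inner maximization gives
$$\max_{y \in \mathbb{R}^m_+}\Big[f(x) + \sum_{i=1}^{m} y_i g_i(x)\Big]
= \begin{cases} f(x) & \text{if } g_i(x) \le 0 \text{ for all } i,\\ +\infty & \text{otherwise,}\end{cases}$$
where $+\infty$ indicates that Player D can make the payment arbitrarily large whenever some constraint is violated. Hence Player P’s min–max problem has the same optimal value as $\min\{f(x) : g_i(x) \le 0,\ i = 1,\dots,m\}$.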
There are many ways of utilizing the Lagrangian in the theory and practice of optimization.
For example, we can use the Lagrangian in designing primal-dual algorithms (even for
combinatorial optimization problems). Consider $\bar{x} \in \mathbb{R}^n$, a current iterate of an algorithm,
and suppose x̄ violates a subset of the constraints $g_i(x) \le 0$, for i ∈ J. We can choose ȳᵢ > 0 for all
i ∈ J (essentially penalizing the violations) and minimize L(x, ȳ); this gives us a new iterate x̄.
We then choose a new ȳ depending on the current violations and repeat. In many cases, choosing
a good ȳ is itself done by solving an easier optimization problem. Note that we have outlined a
primal-dual scheme with alternating moves, based on relaxations in the primal and dual spaces.
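A minimal sketch of such an alternating scheme, under stated assumptions: the helpers g and minimize_lagrangian are hypothetical placeholders supplied by the user, and the multiplier update (a projected step proportional to the violation, in the spirit of subgradient/dual-ascent methods) is one common choice among many, not a prescription from the text.

import numpy as np

def primal_dual_lagrangian(g, minimize_lagrangian, m, steps=50, tau=1.0):
    # g(x): returns the vector (g_1(x), ..., g_m(x)).
    # minimize_lagrangian(y): returns an (approximate) minimizer of L(x, y) over x.
    y = np.zeros(m)
    for _ in range(steps):
        x = minimize_lagrangian(y)             # primal move: minimize L(., y) for fixed y
        y = np.maximum(0.0, y + tau * g(x))    # dual move: increase y_i on violated constraints
    return x, y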
Index
neighbours, 146
non linear program (NLP), 48
optimal value, 55
path, 37
perfect matching, 38
pivot rules, 114
polyhedron, polyhedra, 104
portfolio optimization, 54
potential function, 258
pruning, 230
reducible, 91
robust optimization, 54