Introduction to Optimisation

Department of Combinatorics and Optimization
University of Waterloo
Contents

1 Introduction
1.1 An example
1.1.1 The formulation
1.1.2 Correctness
1.2 Linear programs
1.2.1 Multiperiod models
1.3 Integer programs
1.3.1 Assignment problem
1.3.2 Knapsack problem
1.4 Optimization problems on graphs
1.4.1 Shortest path problem
1.4.2 Minimum cost perfect matching
1.5 Integer programs, continued
1.5.1 Minimum cost perfect matching
1.5.2 Shortest path problem
1.6 Non linear programs
1.6.1 Pricing a Tech Gadget
1.6.2 Finding a closest point feasible in an LP
1.6.3 Finding a “central” feasible solution of an LP
1.7 Overview of the course
1.8 Further reading and notes
Chapter 1

Introduction

[Figure 1: the process of solving an optimization problem: 1. formulate the problem; 2. run an algorithm to obtain optimal choices for the variables, i.e. a solution.]
1.1 An example
To clarify these ideas let us consider a simple example. Suppose WaterTech manufactures four
products, requiring time on two machines and two types (skilled and unskilled) of labour. The
amount of machine time and labour (in hours) needed to produce one unit of each product and the
sale price (in dollars per unit) of each product are given in the following table:

Product   Machine 1   Machine 2   Skilled Labour   Unskilled Labour   Unit Sale Price
1 11 4 8 7 300
2 7 6 5 8 260
3 6 5 5 7 220
4 5 4 6 4 180
Each month, 700 hours are available on machine 1 and 500 hours on machine 2. Each month,
WaterTech can purchase up to 600 hours of skilled labour at $8 per hour and up to 650 hours of
unskilled labour at $6 per hour. The company wants to determine how much of each product
it should produce each month and how much labour to purchase in order to maximize its profit
(i.e., revenue from sales minus labour costs).
1.1.1 The formulation

Suppose that WaterTech produces xi units of product i for i = 1, 2, 3, 4, and purchases ys hours of skilled labour and yu hours of unskilled labour. The profit (revenue from sales minus labour costs), which WaterTech wishes to maximize, is then

300x1 + 260x2 + 220x3 + 180x4 − 8ys − 6yu.

The number of hours used on machine 1 is 11x1 + 7x2 + 6x3 + 5x4, and this must not exceed the available 700 hours of time on that machine. Thus,

11x1 + 7x2 + 6x3 + 5x4 ≤ 700.    (1.1)

Similarly, for machine 2 we require

4x1 + 6x2 + 5x3 + 4x4 ≤ 500.    (1.2)

Analogously, once we decide on how much of each product should be produced, we know
how much skilled and unskilled labour is needed. Naturally, we need to make sure that enough
hours of each type of labour are purchased. The following two constraints enforce this:

8x1 + 5x2 + 5x3 + 6x4 ≤ ys    (1.3)
7x1 + 8x2 + 7x3 + 4x4 ≤ yu    (1.4)

Finally, we need to add constraints that force each of the variables to take on only non-negative
values, as well as constraints that limit the number of hours of skilled and unskilled labour pur-
chased. Combining the objective function with (1.1)-(1.4) gives the following formulation:

max 300x1 + 260x2 + 220x3 + 180x4 − 8ys − 6yu
subject to
11x1 + 7x2 + 6x3 + 5x4 ≤ 700
4x1 + 6x2 + 5x3 + 4x4 ≤ 500
8x1 + 5x2 + 5x3 + 6x4 ≤ ys                (1.5)
7x1 + 8x2 + 7x3 + 4x4 ≤ yu
ys ≤ 600, yu ≤ 650
x1, x2, x3, x4, ys, yu ≥ 0
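To illustrate how such a formulation can be handed to an off-the-shelf solver, here is a minimal sketch in Python, assuming the SciPy library is available; the variable ordering, the constraint matrices, and the choice of solver are our own illustration, not part of the original text.

```python
# A sketch of solving formulation (1.5) numerically, assuming SciPy is installed.
# Variable order: x1, x2, x3, x4, ys, yu.
from scipy.optimize import linprog

# linprog minimizes, so negate the profit coefficients.
c = [-300, -260, -220, -180, 8, 6]

A_ub = [
    [11, 7, 6, 5, 0, 0],    # machine 1 hours <= 700
    [4, 6, 5, 4, 0, 0],     # machine 2 hours <= 500
    [8, 5, 5, 6, -1, 0],    # skilled labour needed <= ys
    [7, 8, 7, 4, 0, -1],    # unskilled labour needed <= yu
]
b_ub = [700, 500, 0, 0]

# Non-negativity, plus the purchase limits on labour.
bounds = [(0, None)] * 4 + [(0, 600), (0, 650)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print(res.x)      # roughly (16.67, 50, 0, 33.33, 583.33, 650)
print(-res.fun)   # roughly 15433.33, matching the solution reported in (1.7)
```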
1.1.2 Correctness
Is the formulation given by (1.5) correct, i.e. does this formulation capture exactly the Wa-
terTech problem? We will argue that it does and outline a procedure to verify whether a given
formulation is correct. Let us introduce a bit of terminology. By the word description of the
optimization problem we mean a description of the optimization problem in plain English.
This is the description that the CEO of WaterTech would give you. By a formulation we mean
the mathematical formulation, as in (1.5). A solution to the formulation is an assignment of
values to each of the variables of the formulation. A solution is feasible if it has the prop-
erty that all the constraints are satisfied. An optimal solution to the formulation is a feasible
solution that maximizes the objective function (or minimizes it if the optimization problem
is a minimization problem). Similarly, we define a solution to the word description of the
optimization problem to be a choice for the unknowns, and a feasible solution to be such a
choice that satisfies all the constraints.
When attempting to write a formulation for an optimization problem you need to make
sure that there is a mapping between feasible solutions of the word description and feasible
solutions of the formulation, and vice versa. For instance, a feasible solution for WaterTech is to
produce 10 units of product 1, 50 units of product 2, 0 units of product 3, and 20 units of product 4, and
to buy 600 hours of both skilled and unskilled labour. This corresponds to the following feasible
solution of the formulation:

x1 = 10, x2 = 50, x3 = 0, x4 = 20, ys = 600, yu = 600.    (1.6)
Conversely, given the feasible solution (1.6) we can construct a feasible solution for the word
description of the WaterTech problem. Note that this works for every feasible solution as,
(1) every constraint in the word description is present in the formulation, and
(2) every constraint in the formulation arises from a constraint in the word description.
When constructing a formulation you need to make sure that both conditions (1) and (2) are
satisfied. For if (1) does not hold, feasible solutions to the formulation may violate constraints
of the word description, and if (2) does not hold, then the formulation is more restrictive
than the word description. A common mistake is to violate (1) by forgetting some constraint
when writing down the formulation. For instance, we may forget to write down the condition
for WaterTech that ys is non-negative. In that case, when solving the formulation using an
algorithm, we may end up with a negative value for ys, i.e. we buy a negative amount of
skilled labour, or equivalently we sell skilled labour, which is something that is not allowed in our
word description. Thus far, we have only discussed feasible solutions. Clearly, we also need
to verify that the objective functions in the word description and in the formulation are the same.
This is clearly the case for the WaterTech formulation, and is usually straightforward to verify.
We used an algorithm to find an optimal solution to (1.5) and obtained
x1 = 16 + 2/3, x2 = 50, x4 = 33 + 1/3, ys = 583 + 1/3, yu = 650,    (1.7)

achieving a total profit of $15,433 + 1/3. Thus the optimal strategy for WaterTech is to manufacture 16 + 2/3 units of product 1, 50 units of product 2, 0 units of product 3, 33 + 1/3 units of product 4, and to buy 583 + 1/3 hours of skilled labour and 650 hours of unskilled labour.
Since constructing the formulation is only our first step (see Figure 1) and we need to use
an algorithm to find an optimal solution to the formulation, we will strive to get, among all
possible formulations, one that is as simple as possible. In the remainder of the chapter we will
introduce three types of formulations: linear programs, integer programs, and non-linear programs.
There are very efficient techniques to solve linear programs, and we will see some of these in Chapters 2 and 7. Integer programs and non-linear programs can be very hard to solve, however.
Thus when possible we will try to formulate our problem as a linear program. Unfortunately,
this is not always possible, and sometimes the only possible formulation is an integer program
or a non-linear program.
(b) f (x) = 2x1 − x3 + x4 − 6, is an affine function, but not a linear function, and
(c) f (x) = 3x1 + x2 − 6x3 x4 , is not an affine function (because of the product x3 x4 ).
A linear constraint is any constraint that, after moving all variables to the left hand side and all constants to the right hand side, is of one of the following forms:

f(x) ≤ β,    f(x) ≥ β,    f(x) = β,

where f(x) is a linear function and β is a real number. A linear program is the problem of
maximizing or minimizing an affine function subject to a finite number of linear constraints.
We will abbreviate the term linear program by LP throughout this book.
Example 2.
(b) The following is not an LP, as x2 + x3 < 3 is a strict inequality and hence not a linear constraint:

max x1
subject to
x2 + x3 < 3
x1 − x4 ≥ 1
(c) The following is not an LP, as the objective function is not affine:

min 1/x1
subject to
x1 ≥ 3
(d) The following is not an LP, as there are an infinite number of constraints,
max x1
subject to
x1 + kx2 ≤ 5 (for all constants k where 2 ≤ k ≤ 3)
Observe also that (1.5) is an example of an LP. In that example, the constraint 8x1 + 5x2 + 5x3 + 6x4 ≤ ys can be rewritten as 8x1 + 5x2 + 5x3 + 6x4 − ys ≤ 0.
1.2.1 Multiperiod models

In the WaterTech example, the decisions to be made amounted to a production plan. Oftentimes, the decision-making process has a temporal component; time
is split into so-called periods, and we have to make certain decisions at the beginning or end of
each of them. Each of these decisions will naturally determine the final outcome at the end of
all periods. In this section, we introduce this area by discussing a sample problem. KWOil is
a local supplier of heating oil. The company has been around for many years, and knows its
home turf. In particular, KWOil has developed a dependable model to forecast future demand
for oil. For each of the following four months, the company expects the following amounts of
demand for heating oil.
Month            1     2     3     4
Demand (litres)  5000  8000  9000  6000
At the beginning of each of the 4 months, KWOil may purchase heating oil from a regional
supplier at the current market rate. The following table shows the projected price per litre at
the beginning of each of these months.
Month            1     2     3     4
Price ($/litre)  0.75  0.72  0.92  0.90
KWOil has a small storage tank on its facility. The tank can hold up to 4000 litres of oil, and
currently (at the beginning of month 1) contains 2000 litres. The company wants to know how
much oil it should purchase at the beginning of each of the 4 months such that it satisfies the
projected customer demand at the minimum possible total cost. Note that oil delivered at
the beginning of a month can go directly to the customers; it does not need to be put into
storage first. Only oil that is left over at the end of the month goes into storage. We
wish to find an LP formulation for this problem. Thus we need to determine the variables, the
objective function, and the constraints.
Variables. KWOil needs to decide how much oil to purchase at the beginning of each of
the four months. We therefore introduce variables pi for i ∈ {1, 2, 3, 4}, denoting the number
of litres of oil purchased at the beginning of month i. We also introduce
variables ti for each i ∈ {1, 2, 3, 4} to denote the number of litres of heating oil in the company’s
tank at the beginning of month i. Thus while every unknown of the word description always
needs to be represented as a variable in the formulation, it is sometimes useful, or necessary,
to introduce additional variables to keep track of various parameters.
Objective function. Given the variables defined above, it is straightforward to write down
the cost incurred by KWOil, which is to be minimized. The objective function of KWOil’s problem is

0.75p1 + 0.72p2 + 0.92p3 + 0.90p4.    (1.8)
Constraints. What constraints does KWOil face, or put differently, when do {pi }i and
{ti }i yield a feasible oil purchasing pattern? Well, in each month i, the company needs to have
enough heating oil available to satisfy customer demand. The amount of available oil at the
beginning of month 1, for example, is comprised of two parts: the p1 litres of oil purchased
in month 1, and the t1 litres contained in the tank. The sum of these two quantities needs to
cover the demand in month 1, and the excess is stored in the local tank. Hence, we obtain the
following constraint:
p1 + t1 = 5000 + t2 . (1.9)
Similarly, for months 2 and 3 we obtain

p2 + t2 = 8000 + t3 ,    (1.10)
p3 + t3 = 9000 + t4 . (1.11)
Finally, in order to satisfy the demand in month 4 we need to satisfy the constraint
p4 + t4 ≥ 6000. (1.12)
Notice that each of the variables ti for i ∈ {2, 3, 4} appears in two of the constraints (1.9)-
(1.12). The constraints are therefore linked by the t-variables. Such linkage is a typical feature
in multiperiod models.
We now obtain the entire LP for the KWOil problem by combining (1.8)-(1.12), and
by adding upper bounds and initialization constraints for the tank contents, as well as non-
negativity constraints:

min 0.75p1 + 0.72p2 + 0.92p3 + 0.90p4
subject to
p1 + t1 = 5000 + t2
p2 + t2 = 8000 + t3
p3 + t3 = 9000 + t4
p4 + t4 ≥ 6000
t1 = 2000
ti ≤ 4000    (i ∈ {1, 2, 3, 4})
pi, ti ≥ 0    (i ∈ {1, 2, 3, 4})

Solving this LP with an algorithm gives the purchasing plan p1 = 3000, p2 = 12000, p3 = 5000, p4 = 6000 (with t2 = 0, t3 = 4000, t4 = 0), corresponding to a total purchasing cost of $20,890. Not surprisingly, this solution suggests
taking advantage of the low oil price in month 2, while no oil is stored during month 3,
when prices are higher.
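As a quick sanity check of the model, the following Python snippet (our own illustration, not part of the original text) plugs the purchasing plan reported above into the constraints described in this section and recomputes its cost.

```python
# Verify the KWOil purchasing plan against the constraints described above.
demand = [5000, 8000, 9000, 6000]
price = [0.75, 0.72, 0.92, 0.90]
p = [3000, 12000, 5000, 6000]   # litres purchased at the start of each month
tank = 2000                     # t1: initial tank contents

for month in range(4):
    available = p[month] + tank
    assert available >= demand[month], f"demand not met in month {month + 1}"
    tank = available - demand[month]          # leftover oil goes into storage
    assert tank <= 4000, f"tank overflows after month {month + 1}"

cost = sum(pi * ci for pi, ci in zip(p, price))
print(cost)   # 20890.0
```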
Exercises
1. Consider the following table indicating the nutritional value of different food types.
Foods            Price ($)    Calories     Fat (g)      Protein (g)  Carbohydrate (g)
                 per serving  per serving  per serving  per serving  per serving
Raw Carrots      0.14         23           0.1          0.6          6
Baked Potatoes   0.12         171          0.2          3.7          30
Wheat Bread      0.2          65           0            2.2          13
Cheddar Cheese   0.75         112          9.3          7            0
Peanut Butter    0.15         188          16           7.7          2
You need to decide how many servings of each food to buy each day so that you min-
imize the total cost of buying your food while satisfying the following daily nutritional
requirements:
Write an LP that will decide how many servings of each of the aforementioned foods we
need to meet all the nutritional requirements, while minimizing the total cost of the food
(you may buy fractional numbers of servings).
2. MUCOW (Milk Undertakings, C and O, Waterloo) owns a herd of Holstein cows and a
herd of Jersey cows. For each herd, the total number of litres produced each day, and
milk-fat content (as a percentage) are as follows
The fat is split off and blended again to create various products. For each product, the
volume, required milk-fat percentage, and profit are as follows. In particular, the milk fat
percentage must exactly equal what is specified.
(a) Formulate as an LP the problem of deciding how many items of each type to produce,
so that the total profit is maximized. Don’t worry about fractional numbers of items.
Write your formulation in matrix notation.
(b) MUCOW is told of a regulation change: ‘skim milk’ can now contain anything up to
0.1% milk-fat, but no more. How does the formulation of the problem change? Note
the resulting formulation should also be an LP.
3. The director of the CO-Tech startup needs to decide what salaries to offer to its employees
for the year 2012. In order to keep the employees satisfied she needs to satisfy the following
constraints:
(a) Write an LP that will determine salaries for the employees of CO-Tech that satisfy each
of these constraints while minimizing the total salary expenses.

(b) Write an LP that will determine salaries for the employees of CO-Tech that satisfy each
of these constraints while minimizing the salary of the highest paid employee.
(c) What is the relation between the solution for (a) and (b)?
4. You wish to build a house and you have divided the process into a number of tasks, namely,
You estimate the following duration for each of the tasks (in weeks):
task B F E P D L
duration 3 2 3 4 1 2
Some of the tasks can only be started when some other tasks are completed. For instance,
we can only build the frame once the foundation has been completed, i.e. F can start only
after B is completed. We summarize all the precedence constraints,
The goal is to schedule the starting time of the tasks such that the entire project is completed
as soon as possible.
As an example, here is a feasible schedule with a completion time of 10 weeks.
tasks B F E P D L
starting time 0 3 6 5 9 6
end time 3 5 9 9 10 8
Formulate this problem as an LP. Explain your formulation. Note that there is no limit on
the number of tasks that can be done in parallel.
HINT: Introduce variables to indicate the times at which the tasks start.
5. The CRUD chemical plant produces, as part of its production process, a noxious compound
called chemical X. Chemical X is highly toxic and needs to be disposed of properly. Fortunately, CRUD is linked by a pipe system to the FRESHAIR recycling plant, which can safely
reprocess chemical X. On any given day the CRUD plant will produce the following amounts
of chemical X (the plant operates between 9AM and 3PM only):

Because of environmental regulations, at no point in time is the CRUD plant allowed to keep
more than 1000 litres of chemical X on site, and no chemical X is allowed to be kept overnight. At the top
of every hour an arbitrary amount of chemical X can be sent to the FRESHAIR recycling
plant. The cost of recycling chemical X is different for every hour,
You need to decide how much chemical to send from the CRUD plant to the FRESHAIR
recycling plant at the top of each hour, so that you minimize the total recycling cost but
meet all the environmental constraints. Formulate this problem as an LP.
6. We are given an m by n matrix A and a vector b in Rm , for which the system Ax = b has no
solution. Here is an example.
2x1 − x2 = −1
x1 + x2 = 1
x1 + 3x2 = 4
−2x1 + 4x2 = 3
We are interested in finding a vector x ∈ Rn that “comes close” to solving the system.
Namely we want to find an x ∈ Rn , whose deviation is minimum, where the deviation of x
is defined to be

∑( |bi − ∑(aij xj : j = 1, . . . , n)| : i = 1, . . . , m ).
(For the example system above, the vector x = (1, 1)T has deviation 2 + 1 + 0 + 1 = 4.)
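The deviation is easy to compute mechanically; the short Python snippet below (an illustration we add, not part of the exercise) reproduces the value 4 for x = (1, 1)T.

```python
# Deviation of a candidate x for the example system above.
A = [[2, -1], [1, 1], [1, 3], [-2, 4]]
b = [-1, 1, 4, 3]
x = [1, 1]

deviation = sum(abs(bi - sum(aij * xj for aij, xj in zip(row, x)))
                for row, bi in zip(A, b))
print(deviation)   # 2 + 1 + 0 + 1 = 4
```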
(b) The problem of part (a) is not an LP. (Why?) Show that it can be formulated as an LP.
(According to this definition, in the example above x = (1, 1)T would have deviation
max(2, 1, 0, 1) = 2.) With this new definition, write the problem of finding a vector of
minimum deviation as an optimization problem, and show that this problem can also
be formulated as an LP.
7. Consider the following set up: we have factories 1 through m and stores 1 through n.
Each factory i produces ui units of a commodity and each store j requires ℓj units of that
commodity. Note, each factory produces the same commodity, and each store requires
the same commodity. The goal is to transfer the commodity from the factories to the store.
All the commodities going from the factories to the store are first sent to one of two central
storage hubs A and B. The cost of transporting one unit of commodity from factory i to hub
A (resp. B) is given by ai (resp. bi ). The cost of transporting one unit of commodity from
hub A (resp. B) to store j is given by a′j (resp. b′j). In the figure we illustrate the case of
3 factories and 4 stores. The problem is to decide how much to send from each factory to
each hub and how much to send from each hub to each store so that: each store receives the
amount of commodity it requires, no factory sends out more commodity than it produces,
and such that the total transportation cost is minimized. Formulate this problem as an LP
(we may assume that the number of units of commodity sent may be fractional).
8. We are given a matrix A ∈ Rm×n and a matrix B ∈ R p×n so that the rows of A denote
observations for healthy human tissue and the rows of B denote observations for unhealthy
human tissue. We would like to find two parallel hyperplanes which separate the rows of A
and the rows of B as much as possible. That is, we want to find a ∈ Rn and α, β ∈ R such that the parallel
hyperplanes {x ∈ Rn : aT x = α} and {x ∈ Rn : aT x = β} separate the rows of A from the rows of
B and the distance between the hyperplanes is maximized. The next figure illustrates the
situation for n = 2; circles correspond to rows of A, and x’s to rows of B.
[Figure: for n = 2, the parallel lines aT x = α and aT x = β separate the circles (rows of A) from the x’s (rows of B) in the (x1, x2)-plane.]
1.3 Integer programs

An integer program is obtained by adding to a linear program the condition that some (or all) of the variables must take integer values. If all of the variables are required to take integer values, the integer program is called a pure integer program; otherwise
it is called a mixed integer program. We will abbreviate the term integer program by IP
throughout this book.
Example 3. The following is a mixed IP, where variables x1 and x3 are required to take integer
values,
max x1 + x2 + 2x4
subject to
x1 + x2 ≤ 1
−x2 − x3 ≥ −1
x1 + x3 = 1
x1 , x2 , x3 ≥ 0 and x1 , x3 integer
In Section 1.1 we introduced the WaterTech production problem. We gave an LP formulation (1.5) and a solution to that formulation in (1.7). This solution told us to manufacture
16 + 2/3 units of product 1. Depending on the nature of product 1, it may not make sense to
produce a fractional number of units of this product. Thus we may want to add the condition
that each of x1, x2, x3, x4 is integer. The resulting program would be an IP. In this example
we could try to ignore the integer condition and round down the solution, hoping to get a
reasonably good approximation to the optimal solution.
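To see what rounding down actually gives here, the following short computation (our own illustration, based on the data in the table of Section 1.1) rounds the production plan of (1.7) down to integers, buys exactly the labour that plan needs, and compares the resulting profit with the LP optimum.

```python
# Round the LP solution (1.7) down to an integer production plan and evaluate it.
x = [16, 50, 0, 33]                              # floor of (16+2/3, 50, 0, 33+1/3)
skilled = 8*x[0] + 5*x[1] + 5*x[2] + 6*x[3]      # 576 hours needed (<= 600 available)
unskilled = 7*x[0] + 8*x[1] + 7*x[2] + 4*x[3]    # 644 hours needed (<= 650 available)

# The machine constraints are still satisfied:
assert 11*x[0] + 7*x[1] + 6*x[2] + 5*x[3] <= 700
assert 4*x[0] + 6*x[1] + 5*x[2] + 4*x[3] <= 500

profit = 300*x[0] + 260*x[1] + 220*x[2] + 180*x[3] - 8*skilled - 6*unskilled
print(profit)   # 15268, compared with roughly 15433.33 for the LP optimum
```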
1.3.1 Assignment problem

WaterTech has a set J of four jobs that need to be completed and a set I of four employees to do them. The entry ci,j of the following table gives the number of hours employee i ∈ I needs to complete job j ∈ J:

                 Jobs
Employees    1    2    3    4
    1        3    5    1    7
    2        8    2    2    4
    3        2    1    6    8
    4        8    3    3    2
For instance, the table says that c3,4 = 8, i.e. employee 3 would take 8 hours to finish job 4.
WaterTech wants to assign jobs to employees with the conditions that (1) every employee is assigned exactly one job, and (2) every job is assigned to exactly one employee.
Naturally, we want to find such an assignment that minimizes the total expected amount of
time needed to process all jobs in J. A feasible solution would be to assign job k to employee k,
for k = 1, 2, 3, 4. The total amount of time required for this assignment is 3 + 2 + 6 + 2 = 13
hours. This is not an optimal solution, however. We wish to find an IP formulation for this
problem. Thus we need to determine the variables, the objective function, and the constraints.
Variables. In this case we need to decide for each employee i ∈ I and each job j ∈ J
whether employee i is assigned job j. We will introduce for every such pair i, j a variable xi, j
that we restrict to take values 0 or 1 where xi, j = 1 represents the case where employee i is
assigned job j, and xi, j = 0 means that employee i is not assigned job j. Thus we have |I||J|
variables. In the example given by the table we end up with 16 variables.
Constraints. We need to encode condition (1) as a mathematical constraint. Let i ∈ I be
an employee, then ∑ j∈J xi, j is the number of jobs employee i is assigned to (we do the sum
over all jobs). We want this quantity to be one, thus the following should hold,
∑(xi,j : j ∈ J) = 1.    (1.13)
In the example given by the table this says that xi,1 + xi,2 + xi,3 + xi,4 = 1 for every employee i ∈
{1, 2, 3, 4}. We need to encode condition (2) as a mathematical constraint. Let j ∈ J be a
job; then ∑i∈I xi,j is the number of employees job j is assigned to (we do the sum over
all employees). We want this quantity to be one, thus the following should hold,
∑(xi,j : i ∈ I) = 1.    (1.14)
Objective function. The objective function should calculate the total amount of time
spent to complete the jobs. For every employee i ∈ I and job j ∈ J, if employee i is assigned
job j, then we should contribute ci, j to the objective function, otherwise we should contribute
0. Thus, we should contribute ci,j xi,j . Therefore, the objective function is given by

∑(ci,j xi,j : i ∈ I, j ∈ J).    (1.15)
Thus the IP formulation is given by objective function (1.15) and constraints (1.13), (1.14), as
well as the condition that each variable xi,j can only take values 0 or 1:

min ∑(ci,j xi,j : i ∈ I, j ∈ J)
subject to
(1.16)
∑ j∈J xi, j = 1 (i ∈ I)
∑i∈I xi, j = 1 ( j ∈ J)
xi, j ∈ {0, 1} (i ∈ I, j ∈ J)
Note, formally speaking (1.16) is not an IP, because of the constraints xi, j ∈ {0, 1}. However,
we can clearly replace these constraints by the constraints xi, j ≥ 0, xi, j ≤ 1 and xi, j integer. If
we do this for all i ∈ I and j ∈ J then the resulting optimization formulation is an IP. Hence,
we will abuse notation slightly and call the formulation (1.16) an IP.
Solving this IP for the special case given in the table yields

x1,3 = x2,2 = x3,1 = x4,4 = 1,

and all other values xi,j = 0. Thus an optimal solution is to assign job 3 to employee 1, job 2
to employee 2, job 1 to employee 3, and job 4 to employee 4, for a total of 1 + 2 + 2 + 2 = 7 hours.
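Since |I| = |J| = 4, the feasible solutions of (1.16) correspond to the 4! = 24 ways of pairing employees with jobs, so this instance can be checked by brute force. The following snippet is our own sketch of such a check; it recovers the assignment of total time 7 described above.

```python
# Brute-force check of the assignment instance: enumerate all bijections I -> J.
from itertools import permutations

c = [[3, 5, 1, 7],   # c[i][j] = hours employee i+1 needs for job j+1
     [8, 2, 2, 4],
     [2, 1, 6, 8],
     [8, 3, 3, 2]]

best = min(permutations(range(4)),
           key=lambda jobs: sum(c[i][jobs[i]] for i in range(4)))
print([j + 1 for j in best])                  # jobs given to employees 1, 2, 3, 4
print(sum(c[i][best[i]] for i in range(4)))   # total time: 7
```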
Note, in this example we represent a binary choice (whether to assign job j to employee i)
by a variable taking values 0 or 1. We call such a variable a decision variable. When using a
decision variable y, we can express y ∈ {0, 1} by 0 ≤ y ≤ 1 and y integer, but the condition that
y is integer cannot easily be omitted, as we may otherwise get any value between 0 and 1 (say
1/2, for instance), and this value gives no information as to what our binary choice should be.
1.3.2 Knapsack problem

Suppose that we need to load a freight container with crates of six possible types. For each type, the following table gives the weight (in kilograms) and the value of a single crate:

Type     1   2   3   4   5   6
weight  30  20  30  90  30  70
value   60  70  40  70  20  90
In addition you have the following constraints,
(1) you cannot send more than 10 crates of the same type in the container,
(2) you can only send crates of type 3 if you send at least one crate of type 4,
(3) you must either (i) send at least 4 crates in total of types 1 and 2, or (ii) send at least 4 crates in total of types 5 and 6.
Finally, the total weight allowed on the freight container is 10,000 kilograms. Your goal is
to decide how many crates of each type to place in the freight container so that the value of
the crates in the container is maximized. We wish to find an IP formulation for this problem.
Thus we need to determine the variables, the objective function, and the constraints.
Variables. We will have variables xi for i = 1, . . . , 6 to indicate how many crates of type
i we place in the container. We will also require an additional decision variable y to handle
condition (3).
Constraints. The total weight (in kilograms) of the crates selected is given by 30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6, and this may not exceed the weight limit, i.e.

30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6 ≤ 10000.    (1.17)

Condition (1) says that for every type i = 1, . . . , 6,

xi ≤ 10    (1.18)

and condition (2) can be expressed as

x3 ≤ 10x4 .    (1.19)
For if no crates of type 4 are sent, then x4 = 0, which implies by (1.19) that x3 = 0 as well, i.e.
no crates of type 3 are sent. On the other hand, if at least one crate of type 4 is sent, then
x4 ≥ 1 and (1.19) says that x3 ≤ 10, which is the maximum number of crates of type 3 we can
send anyway.
It remains to express condition (3). The decision variable y will play the following role:
if y = 1 then we want (i) to be true, and if y = 0 then we want (ii) to be true. This can be
achieved by adding the following two constraints,

x1 + x2 ≥ 4y
(1.20)
x5 + x6 ≥ 4(1 − y).

Let us verify that (1.20) behaves as claimed. If y = 1, then the constraints become x1 + x2 ≥ 4
and x5 + x6 ≥ 0, which implies that (i) holds. If y = 0, then the constraints become x1 + x2 ≥ 0
and x5 + x6 ≥ 4, which implies that (ii) holds.
Objective function. The total value of the crates selected for the container is given by

60x1 + 70x2 + 40x3 + 70x4 + 20x5 + 90x6 .    (1.21)
Thus the IP formulation is given by objective function (1.21), and constraints (1.17), (1.18),
(1.19), (1.20) as well as the condition that each variable xi is integer, and y ∈ {0, 1}. We
obtain,
max 60x1 + 70x2 + 40x3 + 70x4 + 20x5 + 90x6
subject to
30x1 + 20x2 + 30x3 + 90x4 + 30x5 + 70x6 ≤ 10000
xi ≤ 10 (i = 1, . . . , 6)
x3 ≤ 10x4
x1 + x2 ≥ 4y
x5 + x6 ≥ 4(1 − y)
xi ≥ 0 (i = 1, . . . , 6)
xi integer (i = 1, . . . , 6)
y ∈ {0, 1}
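The role of the constraints (1.19) and (1.20) becomes transparent when the whole model is written as a feasibility check. The Python function below is an illustrative sketch we add (not part of the original text); it tests whether a candidate (x, y) satisfies the IP above.

```python
# Feasibility check for the crate-packing IP above.
WEIGHTS = [30, 20, 30, 90, 30, 70]

def is_feasible(x, y):
    """x = (x1, ..., x6) numbers of crates of each type, y = 0 or 1."""
    if y not in (0, 1) or any(xi < 0 or xi != int(xi) for xi in x):
        return False
    if sum(w * xi for w, xi in zip(WEIGHTS, x)) > 10000:   # (1.17): weight limit
        return False
    if any(xi > 10 for xi in x):                           # (1.18): at most 10 per type
        return False
    if x[2] > 10 * x[3]:                                   # (1.19): type 3 needs type 4
        return False
    if x[0] + x[1] < 4 * y:                                # (1.20), first constraint
        return False
    if x[4] + x[5] < 4 * (1 - y):                          # (1.20), second constraint
        return False
    return True

print(is_feasible((5, 0, 3, 1, 0, 0), y=1))   # True: (i) holds since x1 + x2 = 5 >= 4
print(is_feasible((0, 0, 3, 0, 0, 5), y=0))   # False: x3 > 10*x4 violates (1.19)
```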
Exercises
1. You are about to trek across the desert with a vehicle having 3.6 cubic metres (3.6m3 ) of
cargo space for goods. There are various types of items available for putting in this space,
each with a different volume and a different net value for your trip, shown as follows.
item type i 1 2 3 4 5 6 7
volume vi (m3 ) 0.55 0.6 0.7 0.75 0.85 0.9 0.95
net value ni 250 300 500 700 750 900 950
(a) You need to decide which items to take, not exceeding the volume constraint. You can
take at most one of any item. No item can be split into fractions. The total net value
must be maximized. Formulate this problem as an LP or IP. (You may use the notation
vi and ni for volume and net value of item i.)
(b) Your two friends have decided to come as well, each with an identical vehicle. There
are exactly two items of each type. The question is, can you fit all 14 items in the
vehicles without exceeding the volume constraints? (No cutting items into pieces is
permitted! Each item taken must be placed entirely in one of the vehicles.) Formulate
an LP or IP that has a feasible solution if and only if the items can be packed as
desired. Describe how to determine from a feasible solution how to pack the items
into the vehicles. Note that net value is ignored for part (b).
2. Consider a public swimming pool. In the following table we give a list of seven potential
lifeguards. For each lifeguard we have the time he/she wants to start and finish work and
how much he/she wishes to be paid for the work. The problem is to find a selection of
lifeguards so that there is at least one (but possibly more than one) lifeguard present at
each time between 1pm and 9pm. An example of a possible selection would be Joy, Tim,
and Beth. This selection has a total cost of 30 + 21 + 20.
3. You have gone to an exotic destination during the summer vacation and decided to do your
part to stimulate the economy by going on a shopping spree. Unfortunately, the day before
your return you realize that you can only take 20 kilograms of luggage on your flight which
is less than the total weight of the items that you purchased. The next table gives the value
and the weight in kilograms of each item,
A B C D E F
value 60$ 70$ 40$ 70$ 16$ 100$
weight 6 7 4 9 3 8
The problem is to decide which subset of the items to put in your luggage so that you max-
imize the total value of the items selected without exceeding the weight requirement (i.e.
that the total weight of the items selected is no more than 20 kilograms). For instance you
could select items A,C and D for a total value of 170$ and a total weight of 19 kilograms.
Note, for (b) and (c) the resulting formulation should remain an IP.
4. The Waterloo hotel wants to rent rooms 1, 2 and 3 for New Year’s night. Abby is willing
to pay $60 for room 1, $50 for room 2 but is not interested in room 3. Bob is willing to pay
$40 for room 1, $70 for room 2, and $80 for room 3. Clarence is not interested in room 1,
but is willing to pay $55 for room 2 and $75 for room 3. Donald is willing to pay $65 for
room 1, $90 for room 2, but is not interested in room 3. The information is summarized in
the following table.
Room number Abby’s offer Bob’s offer Clarence’s offer Donald’s offer
1 $60 $40 not interested 65$
2 $50 $70 $55 $90
3 not interested $80 $75 not interested
The hotel wants to fill up rooms 1, 2, 3 with some of the potential clients (Abby, Bob, Clarence
and Donald) in a way that maximizes the total revenue. Each room is to be assigned
to exactly one potential client, and each potential client is to be assigned at most one
room. As an example, room 1 could be assigned to Bob, room 2 to Abby, and room 3 to
Clarence (while Donald would not get to stay in the hotel). This would yield a revenue of
$40+$50+$75=$165.
(a) Formulate this problem as an IP. Your solution should be easy to modify if we change
the values in the table.
(b) Abby and Bob have a history of loud and rude behavior when celebrating together. In
the interest of keeping the New Year’s eve party orderly the hotel management decides
that it does not wish to rent rooms to both Abby and Bob. Add a constraint to the IP in
(a) that will enforce this condition (the resulting formulation should still be an IP).
5. You wish to find out how to pack crates on a transport plane in an optimal way. The crates
are of five possible types, namely A, B,C, D, E. For each crate type the next table gives its
weight (in kg), its volume (in cubic meters), and its value (in dollars):
Type A B C D E
Weight 500 1500 2100 600 400
Volume 25 15 13 20 16
Value 50’000 60’000 90’000 40’000 30’000
The transport plane is divided into three segments: Front, Middle, and Back. Each segment
has a limited volume (in cubic meters), and a limit on the weight of the cargo in that
segment (in kg):
Finally, to keep the plane balanced we need to satisfy the following constraints:
Suppose that there are 12 crates of type A, 8 crates of type B, 22 crates of type C, 15
crates of type D and 11 crates of type E that are waiting to be transported. Your goal is to
maximize the total value of the crates on the plane. You need to decide how many crates
of each type go into each segment of the plane. Formulate your problem as an IP.
Suppose that we want to add to the LP the condition that (1.22) is satisfied. Show how
to satisfy this requirement so that the resulting formulation is an LP.
H INT: rewrite (1.22) as a pair of linear inequalities.
(b) Consider the following inequalities,
Suppose that we want to add to an IP the condition that at LEAST ONE of constraints
(1.23) or (1.24) is satisfied. Show how to satisfy this requirement so that the resulting
formulation is an IP.
H INT: add a decision variable indicating whether (1.23) or (1.24) must be satisfied.
Note the left hand side of either (1.23) or (1.24) is always non-negative.
(c) Suppose that for each i = 1, . . . , k we have a vector ai ≥ 0 with 4 entries and a number βi (both
ai and βi are constants). Let r be any number between 1 and k. Consider the following
set of inequalities,
(ai )T x ≥ βi (i = 1, . . . , k) (1.25)
We want to add to an IP the condition that at LEAST r of the constraints are satisfied.
Show how to satisfy this requirement so that the resulting formulation is an IP.
H INT: add a decision variable for each constraint in (1.25).
(d) Consider the following set of values,
Suppose that we want to add to an IP the condition that the variable x takes only one of
the values in S . Show how to satisfy this requirement so that the resulting formulation
is an IP.
HINT: add a decision variable for each number in the set S.
7. The company C & O operates an oil pipeline pumping oil from Alberta to various states in
the Northwestern USA. The figure below shows the direction of flow, four input lines and
the three output lines. Note for instance that State A can only get its oil from either Input
1 or from the Yukon Input line.
Figure 1.1: The structure of the oil pipeline, inputs and outputs.
Each input line has a capacity (barrels/day) and a cost per barrel:
Each state has a daily demand (barrels/day) that must be met exactly:
State A B C
Demand 3500 3000 4000
The input from Yukon is not owned by the company, and activating that line has a fixed
cost of $11,000 per day.
Write an IP that plans the activities of the company C & O for a day (how many barrels of
oil to pump from each input line) by minimizing the total daily cost of the company while
meeting all the demand.
8. A company won a government bid to meet the yearly demands d1 , d2 , . . . , dn in the areas
j ∈ {1, 2, . . . , n}. Now the company has to decide where to build its factories and how
much of each factory’s output will be shipped to which of these n areas.
There are m potential locations for building the factories. If the company decides to build
at location i ∈ {1, 2, . . . , m} then the fixed cost of building the factory (yearly amortized
version) is fi and the yearly capacity of the factory will be si . The cost of transporting one
unit of the product from location i to area j is given as ci j .
Construct an IP whose solution indicates where to build the factories, how many units of
product to ship from each factory to each demand area so that the demand is met and the
total yearly cost of the company is minimized.
9. Your boss asks you to purchase bi units of product i for each i in a set P of products. (These
products are all divisible, that is, they can be obtained in fractional amounts.) Of course,
your boss wants you to spend as little money as possible. You call up all the stores in a set
S of stores, and store j gives you a per-unit price ci j for product i for all i, j.
(a) You decide to just order all bi units of product i from the store that gives the cheapest
per-unit price, for each i. Show that this is optimal.
(b) Actually, there is another constraint. Your boss forgot to tell you that he does not
want you to buy from too many different stores—he wants you to keep the number of
stores from which you buy to at most (integer) k. Formulate this problem as an IP.
(c) It turns out that the stores have special deals. If the total value of your order from
store j is at least t j dollars, it will give you d j cash back. (All stores j offer such a
deal, with perhaps different values of t j and d j .) Formulate this problem as an IP.
10. KW mining has an open-pit mine with 12 blocks of ore as shown in this diagram. The
mineability condition says that no block can be mined without mining all blocks which lie
at least partially above it. For example, block 7 cannot be mined unless blocks 2 and 3 are
mined, and block 12 requires blocks 8 and 9, and hence 3, 4 and 5 too.
(a) Formulate as an integer program the problem of deciding which blocks should be
mined, in order to maximize total value of blocks mined, and satisfy the mineability
condition, if at most 7 blocks can be mined.
(b) What extra constraint would you add if, in addition to all constraints needed in (a), it
is required that the total volume mined must not exceed 10,000 m3 ?
(c) Mines often have a minimum short-term return requirement. For this mine, the board
of the company requires that the total value of blocks mined in the first two years must
total at least 1,000,000. Each block takes 1 year to mine, at most two blocks can be
mined at once, and a block cannot be mined in a given year unless all ones lying at
least partially above it were mined by the year before. Besides this new condition, the
mineability condition still applies, and at most 7 blocks can be mined. Formulate as
an integer program the problem of deciding which blocks should be mined, subject to
these constraints, in order to maximize total value of blocks mined.
11. Suppose that you are given an N × N chessboard. We wish to place N queens on the board
such that no two queens share any row, column, or diagonal. The next figure shows a
solution for N = 8. Formulate this problem as an integer feasibility problem (i.e. an IP
in which we only need to find a feasible solution).
12. Consider the following partially completed Sudoku grid, viewed as a 9 × 9 matrix A that is divided into nine 3 × 3 submatrices A1 , . . . , A9 .

[Figure: a partially filled 9 × 9 Sudoku grid.]

Formulate as an integer feasibility problem the task of filling in the remaining entries of A with digits from 1 to 9 so that each digit appears exactly once in
• each row of A,
• each column of A,
• each 3-submatrix A1 , . . . , A9
13. In Conway’s Game of Life, we are given a chess board of size n × n for some positive
integer n. Each cell (i, j) of this board has up to 8 neighboring cells Ni, j . A cell (i, j) can
be either alive or dead and a configuration of the game consists of a set of cells that are
alive: L = {(i1 , j1 ), . . . , (ik , jk )}. We use a set of simple rules to compute the successor
configuration succ(L ) to L :
(a) if there is at most one living cell in the neighborhood of (i, j) then (i, j) will be dead
in the next iteration
(b) if there are exactly two living cells in Ni, j then we do not change the status of (i, j)
(c) if (i, j) has exactly three living neighbors in Ni,j then its status in the next configuration
will be alive,
(d) if there are at least four living cells in the neighborhood of (i, j) then (i, j) will be
dead in the next iteration.
1.4 Optimization problems on graphs

A graph G = (V, E) consists of a finite set of vertices V and a set of edges E, where each edge is an unordered pair of distinct vertices.

[Figure: an example of a graph on five vertices.]
Let G = (V, E) be a graph. Suppose uv ∈ E. Then u and v are adjacent vertices; u, v are
the endpoints of edge uv; and edge uv is incident to vertices u and v. In this section we present
several optimization problems that can be expressed as optimization problems in graphs.
1.4.1 Shortest path problem

[Figure 1.2: a street map of WaterTown; the intersections s, a, b, c, d, f, g, t lie on Columbia Ave, Albert St, Fisher Rd, King St, Westmount Rd, Erb St and Victoria St.]
Suppose that you are visiting WaterTown, whose street map is shown in Figure 1.2. You are staying with a friend who lives at the intersection of Fisher Road and
Columbia Avenue (location s), and you wish to visit the Tannery District which is located
at the intersection of King and Victoria Streets (location t). A shortest route from s to t is
indicated by the thick lines in Figure 1.2. The problem of finding such a shortest route is
known as the shortest path problem.
Let us rephrase the problem in the language of graph theory. We represent in Figure 1.3 the
street network by a graph, where vertices correspond to intersections and edges to roads. For
instance, the graph G = (V, E) corresponding to the WaterTown map is given by the following
figure, where the number ce ≥ 0 next to each edge e corresponds to the length (in metres) of the corresponding street segment.

[Figure 1.3: the graph G = (V, E) of the WaterTown street network, with the length ce shown next to each edge e.]

An st-path P is a sequence of edges

v1 v2 , v2 v3 , . . . , vk−1 vk
such that v1 = s, vk = t, and vi ≠ vj for all i ≠ j. The length c(P) of an st-path is defined as
the sum of the lengths of the edges of P, i.e. as ∑(ce : e ∈ P).
Example 5. In Figure 1.3 the thick black edges in the graph form an st-path
of total length,
We can now define formally the shortest path problem. We are given a graph G = (V, E)
with non-negative weights ce for every edge e ∈ E, and distinct vertices s,t. We wish to find
among all possible st-paths, one that is of minimum length. Thus the optimization problem
we wish to solve is,
min c(P)
subject to (1.26)
P is an st-path.
In the following section we will see how to formulate this optimization problem as an IP.
1.4.2 Minimum cost perfect matching

Recall the assignment problem of Section 1.3.1, and suppose now that not every employee is qualified for every job. A dash in the following table indicates that the corresponding employee cannot do the corresponding job:

             Jobs
Employees   1′   2′   3′   4′
    1        -    5    -    7
    2        8    -    2    4
    3        -    1    -    8
    4        8    3    3    -
For instance employee 3 is not qualified to do job 1. We can represent this data by a graph
G = (V, E) where V = I ∪ J, and i j ∈ E where i ∈ I, j ∈ J if employee i is qualified to do job
j. Moreover, edge i j is assigned weight ci, j . We represent the weighted graph corresponding
to the table in Figure 1.4.
[Figure 1.4: the bipartite graph corresponding to the table, with employee vertices 1, 2, 3, 4, job vertices 1′, 2′, 3′, 4′, and the weight ci,j on each edge.]
A graph G = (V, E) is bipartite if we can partition the vertices into two sets, say I and J,
such that every edge has one endpoint in I and one endpoint in J. Given a graph, a subset
of edges M is called a matching if no vertex is incident to more than one edge of M. A
matching is called perfect if every vertex is incident to exactly one edge in the matching. The
weight c(M) of a matching M is defined as the sum of the weights of the edges of M, i.e. as
∑(ce : e ∈ M).
Example 6. The graph in Figure 1.4 is bipartite, as every edge has one endpoint in {1, 2, 3, 4}
and one endpoint in {1′, 2′, 3′, 4′}. The set of edges {12′, 23′, 34′} is a matching. However,
it is not a perfect matching, as no edge of it is incident to 1′ for instance. The set of edges
M = {14′, 21′, 32′, 43′} is a perfect matching. The matching M has weight

c(M) = c1,4 + c2,1 + c3,2 + c4,3 = 7 + 8 + 1 + 3 = 19.
Going back to our optimization problem, since we wish to assign every employee to exactly one job, and every job to exactly one employee, we are looking for a perfect matching
in the graph given in Figure 1.4. As we are trying to minimize the total completion time of all the
jobs, we wish to find, among all possible perfect matchings, one that is of minimum weight.
Thus, given a bipartite graph G = (V, E) with non-negative edge weights ce for all e ∈ E, the
optimization problem we wish to solve is,
min c(M)
subject to (1.27)
M is a perfect matching.
In the following section we will see how to formulate this optimization problem as an IP.
Exercises
1. Let G = (V, E) be a graph with distinct vertices s, t. An even st-path is an st-path that has
an even number of edges. Show that we can formulate the problem of finding an even
st-path with as few edges as possible as a minimum cost perfect matching problem.
HINT: Make a copy of the graph G and remove vertices s and t. Call the resulting graph
G′. Construct a new graph H starting with the union of the graphs G and G′ and by joining
every vertex v ≠ s, t in G with its copy v′ in G′.
2. This exercise concerns an optimal storage of books (for a library) by their size to mini-
mize the storage cost for a given collection of books. Suppose that we know the heights
and thicknesses of all the books in a collection (assuming that all widths fit on the same
shelving, we consider only a two-dimensional problem and ignore book widths). Sup-
pose that we have arranged the book heights in ascending order of their n known heights
H1 , H2 , . . . , Hn ; that is, H1 < H2 < . . . < Hn . Since we know the thicknesses of the books,
we can compute the required length of shelving for each height class. Let Li denote the
length of shelving for books of height Hi . If we order shelves of height Hi for length L,
we incur cost equal to F +Ci L; F is a fixed ordering cost (and is independent of the length
or the height of shelf ordered) and Ci is the cost of the shelf per unit length of height Hi ,
C1 ≤ C2 ≤ . . . ≤ Cn .
Notice that in order to save the fixed cost of ordering, we might not order shelves of every
possible height, because we can use a shelf of height Hi to store books of smaller heights.
For an example, if we have 5 books of height H1 , . . . , H5 and length L1 , . . . , L5 respectively,
one solution is to order a shelf of height H3 , length (L1 + L2 + L3 ) to hold books 1,2 and
3, plus a shelf of height H5 , length (L4 + L5 ) to hold books 4 and 5. Such a strategy would
incur a cost of
[F +C3 (L1 + L2 + L3 )] + [F +C5 (L4 + L5 )].
We want to determine the length of shelving for each height class that would minimize the
total cost of the shelving.
Formulate this problem as a shortest paths problem. Give a clear description of the graph,
and the cost of each edge; also indicate the start vertex (s) and the end vertex (t). Draw the
graph for n = 4.
1.5 Integer programs, continued

1.5.1 Minimum cost perfect matching

Given a vertex v of a graph G = (V, E), we denote by δ(v) the set of edges of G that are incident to v. Consider, for example, the graph in Figure 1.5.

[Figure 1.5: a graph on the vertices a, b, d, g in which every pair of vertices is joined by an edge, with a weight on each edge.]
Then, δ (a) = {ad, ag, ab} and δ (b) = {db, ab, gb}.
We are now ready to find an IP formulation for the minimum cost perfect matching prob-
lem. We are given a graph G = (V, E), distinct vertices s and t and edge weights ce for every
edge e. We need to determine the variables, the objective function, and the constraints.
Variables. We introduce a decision variable xe for every edge e ∈ E, where xe = 1 if edge e is selected to be in the matching M and xe = 0 otherwise.

Constraints. For every vertex v, the number of selected edges that are incident to v is given by ∑(xe : e ∈ δ(v)). Since in a perfect matching every vertex is incident to exactly one edge of the matching, we require that for every vertex v ∈ V,

∑(xe : e ∈ δ(v)) = 1.    (1.28)
Objective function. We wish to minimize the total weight of the edges in the matching
M we selected. Thus the objective function should return the weight of M. If e is an edge of
M then we will have xe = 1 and we should contribute ce to the objective function; otherwise
xe = 0 and we should contribute 0 to the objective function. This can be modelled by the term
ce xe . Thus the objective function should be,

∑(ce xe : e ∈ E).    (1.29)
The formulation for the minimum weight perfect matching problem is obtained by combining (1.28)
and (1.29), together with the condition that every xe be a decision variable. Thus we obtain,

min ∑(ce xe : e ∈ E)
subject to
(1.30)
∑(xe : e ∈ δ(v)) = 1    (v ∈ V)
xe ≥ 0    (e ∈ E)
xe integer    (e ∈ E)
Note that we did not need to add the constraints xe ≤ 1 explicitly. Consider an arbitrary edge f and let v be one of its endpoints. Then f ∈ δ(v), and thus
xf ≤ ∑(xe : e ∈ δ(v)) = 1. Thus 0 ≤ xf ≤ 1. As xf is integer, xf ∈ {0, 1}, as required.
As an example let us write the formulation (1.30) for the graph in Figure 1.5. Let x denote
the vector (xab , xbg , xdg , xad , xag , xbd )T and let 1 denote the vector of all ones (of appropriate
dimension). Each of the first 4 constraints corresponds to one of the 4 vertices of G. For instance the second
constraint, corresponding to vertex b, states that xab + xbg + xbd = 1. It says that we should
select exactly one of the edges incident to b.
Let us verify the correctness of the formulation (1.30). We have a graph G = (V, E) and
edge weights ce for every edge e. Let M be a perfect matching. We need to verify (see
Section 1.1.2) that M corresponds to a feasible solution x̄ of (1.30). To do this we set x̄e = 1
if e is an edge of M, and x̄e = 0 otherwise. Conversely, suppose that x̄ is a feasible solution
of (1.30); then M := {e ∈ E : x̄e = 1} is a perfect matching. Finally, observe that the objective
function of the formulation computes the weight of the corresponding perfect matching.
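For the small graph of Figure 1.5 one can check this correspondence directly: the 0-1 vectors satisfying the constraints of (1.30) are exactly the perfect matchings. The sketch below is our own illustration, with the six edges of the figure hard-coded.

```python
# Enumerate the 0-1 solutions of the constraints of (1.30) for the graph of Figure 1.5.
from itertools import product

vertices = ["a", "b", "d", "g"]
edges = ["ab", "ad", "ag", "bd", "bg", "dg"]   # every pair of the four vertices

def delta(v):
    """Edges incident to vertex v."""
    return [e for e in edges if v in e]

for x in product([0, 1], repeat=len(edges)):
    value = dict(zip(edges, x))
    if all(sum(value[e] for e in delta(v)) == 1 for v in vertices):
        print(sorted(e for e in edges if value[e] == 1))
# Exactly three sets are printed: the perfect matchings {ab, dg}, {ad, bg}, {ag, bd}.
```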
1.5.2 Shortest path problem

Let G = (V, E) be a graph and let U ⊆ V be a subset of its vertices. We define

δ(U) = {uv ∈ E : u ∈ U, v ∉ U}.

Consider, for example, the graph in Figure 1.6.
[Figure 1.6: a graph with vertices s, a, b, t and edges sa, sb, ab, at, bt, with a length on each edge.]
Then δ({s, a}) is the set of all edges with exactly one endpoint in {s, a}, i.e. it is
the set {sb, ab, at}. Suppose a graph G has distinct vertices s and t; then an st-cut is
a set of edges of the form δ(U) where s ∈ U and t ∉ U. The set of all st-cuts in the graph in
Figure 1.6 is given by,
δ ({s}) = {sa, sb}
δ ({s, a}) = {at, ab, sb}
(1.31)
δ ({s, b}) = {sa, ab, bt}
δ ({s, a, b}) = {at, bt}.
These st-cuts are obtained by considering all possible sets of vertices U where s ∈ U and
t ∉ U. Note, the term st-cut arises from the fact that if we remove all the edges in an st-cut,
the resulting graph breaks into (at least) two parts that are not joined by any edge, where
one part contains s and the other contains t.
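The list (1.31) can be reproduced mechanically by looping over all vertex sets U with s ∈ U and t ∉ U. The following sketch is our own illustration for the graph of Figure 1.6.

```python
# Enumerate all st-cuts of the graph in Figure 1.6, reproducing (1.31).
from itertools import combinations

edges = ["sa", "sb", "ab", "at", "bt"]
others = ["a", "b"]   # vertices other than s and t

def delta(U):
    """Edges with exactly one endpoint in U."""
    return [e for e in edges if (e[0] in U) != (e[1] in U)]

for k in range(len(others) + 1):
    for extra in combinations(others, k):
        U = {"s", *extra}
        print(sorted(U), delta(U))
# The four edge sets printed are exactly the st-cuts listed in (1.31).
```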
Consider a graph G = (V, E) with distinct vertices s and t, and let P be an st-path of G. Let
δ(U) be an arbitrary st-cut of G = (V, E). Follow the path P starting from s, denote by
u the last vertex of P in U, and denote by u′ the vertex that follows u in P. Note that u exists
since s ∈ U and t ∉ U. Then by definition uu′ is an edge that is in δ(U). Since δ(U) was an
arbitrary st-cut, we have shown the following remark.

Remark 1.1. Let P be an st-path and let δ(U) be an st-cut. Then δ(U) contains at least one edge of P.
Remark 1.2. Let S be a set of edges that contains at least one edge from every st-cut. Then
there exists an st-path P that is contained in the edges of S.
Proof. Let U denote the set of vertices that contains s as well as all vertices u for which there
exists a path from s to u using only edges of S. If t ∈ U, then S contains the edges of an st-path and we are done. Otherwise, δ(U) is an st-cut, and by the choice of U no edge of S has exactly one endpoint in U, i.e. δ(U) contains no edge of S, contradicting the hypothesis.
We are now ready to find an IP formulation for the shortest path problem. We are given a
graph G = (V, E), distinct vertices s and t and edge lengths ce ≥ 0 for every edge e. We need
to determine the variables, the objective function, and the constraints.
Variables. We will introduce a decision variable xe for every edge e, where xe = 1 rep-
resents the case where edge e is selected to be part of our st-path, and xe = 0, means that the
edge e is not selected.
Constraints. Remark 1.2 says that in order to construct an st-path, it suffices to select one
edge from every st-cut. We will use this idea for our formulation. Let δ(U) be an arbitrary
st-cut of G. The number of edges we selected from the st-cut δ(U) is given by ∑(xe : e ∈ δ(U)),
and since we wish to select at least one edge from δ(U), the constraint should be,

∑(xe : e ∈ δ(U)) ≥ 1.    (1.32)
Objective function. We wish to minimize the total length of the edges in the st-path P we
selected. Thus the objective function should return the length of P. If e is an edge of P then
we will have xe = 1 and we should contribute ce to the objective function; otherwise xe = 0
and we should contribute 0 to the objective function. This can be modelled by the term ce xe .
Thus the objective function should be,

∑(ce xe : e ∈ E).    (1.33)
The formulation for the shortest path problem is obtained by combining (1.32) and (1.33):

min ∑(ce xe : e ∈ E)
subject to
(1.34)
∑(xe : e ∈ δ(U)) ≥ 1    (U ⊆ V, s ∈ U, t ∉ U)
xe ≥ 0 (e ∈ E)
xe integer (e ∈ E)
Observe in this formulation that the condition that xe ∈ {0, 1} has been replaced by the condition that xe ≥ 0 and xe is integer, i.e. we removed the condition that xe ≤ 1. This is correct,
for if x̄ is a feasible solution to (1.34) where x̄e ≥ 2, then the vector x′ obtained from x̄ by setting x′e = 1
is also a feasible solution to (1.34); moreover, the objective value for x′ is smaller than or equal to
the objective value for x̄ (as ce ≥ 0).
As an example let us write the formulation (1.34) for the graph in Figure 1.6. Let x
denote the vector (xsa , xsb , xab , xat , xbt )T and let 1 denote the vector of all ones (of appropriate
dimension). The IP in that case is,
Each of the first 4 constraints corresponds to one of the 4 distinct st-cuts δ(U) given by (1.31).
For instance the second constraint, corresponding to δ ({s, a}) = {sb, ab, at}, states that xsb +
xab + xat ≥ 1. It says that we should select at least one edge from the st-cut δ ({s, a}).
Let us verify the correctness of the formulation (1.34). We have a graph G = (V, E) with
distinct vertices s and t, and we will assume that ce > 0 for every e ∈ E, i.e. every edge has positive
length (the formulation does not work if ce < 0 for some edge e, and there are technical difficulties when ce = 0). Let P be an st-path. We need to verify (see Section 1.1.2) that P corresponds to a
feasible solution x̄ of (1.34). To do this we set x̄e = 1 if e is an edge of P, and x̄e = 0 otherwise.
By Remark 1.1 every st-cut δ (U) contains an edge of P, hence, ∑(x̄e : e ∈ δ (U)) ≥ 1. It
follows that x̄ is feasible for the formulation. It is not true however, that every feasible solution
to the formulation corresponds to an st-path. This is seemingly in contradiction with the
strategy outlined in Section 1.1.2. However, it suffices to show that every optimal solution
x̄ to the formulation corresponds to an st-path. Let S = {e ∈ E : x̄e = 1}. We know from
Remark 1.2 that the edges of S contain an st-path P. If S itself is the st-path P then we are done.
Otherwise, define x′ by setting x′e = 1 if e is an edge of P, and x′e = 0 otherwise. Then x′ is
also a feasible solution to the formulation, but it has an objective value that is strictly smaller
than that of x̄, a contradiction. It follows that S is an st-path, as required.
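For the graph of Figure 1.6, the discussion above can be verified by brute force: among all 0-1 vectors satisfying the cut constraints, the inclusion-minimal ones are exactly the st-paths, and with positive edge lengths only those can be optimal. The sketch below is ours; it reuses the cuts listed in (1.31).

```python
# Brute-force check of the cut constraints of (1.34) on the graph of Figure 1.6.
from itertools import product

edges = ["sa", "sb", "ab", "at", "bt"]
cuts = [{"sa", "sb"}, {"at", "ab", "sb"}, {"sa", "ab", "bt"}, {"at", "bt"}]  # (1.31)

feasible = []
for x in product([0, 1], repeat=len(edges)):
    chosen = {e for e, xe in zip(edges, x) if xe == 1}
    if all(chosen & cut for cut in cuts):       # at least one edge from every st-cut
        feasible.append(chosen)

# Keep only the inclusion-minimal feasible sets; with positive edge lengths these
# are the only candidates for an optimal solution.
minimal = [S for S in feasible if not any(T < S for T in feasible)]
for S in minimal:
    print(sorted(S))
# The four minimal sets are exactly the st-paths of the graph:
# {sa, at}, {sb, bt}, {sa, ab, bt} and {sb, ab, at}.
```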
Exercises
1. Let G = (V, E) be a graph with non-negative edge weights ce for every edge e. The maximum weight matching problem is the problem of finding a matching of maximum weight.
An edge cover is a set of edges S with the property that every vertex is the endpoint
of some edge of S. The minimum weight edge-cover problem is the problem of finding an
edge cover of minimum weight.
2. A vertex cover of a graph G = (V, E) is a set of vertices W ⊆ V such that every edge of G has at least one endpoint in W. Consider the following graph G.

[Figure: a graph G on several vertices.]
(a) Formulate the following problem as an IP: find a vertex cover in the graph G above of
smallest possible size.
(b) Now do the same for an arbitrary graph G. That is, you are given a graph G = (V, E)
formulate the problem of finding a minimum size vertex cover, as an IP.
3. Consider a graph G = (V, E). A triangle T is a set of three distinct vertices i, j, k where
i j, jk and ik are all edges. A packing of triangles is a set S of triangles with the property
that no two triangles in S have a common vertex. The Maximum Triangle Packing Prob-
lem (MTP) is the problem of finding a packing of triangles S with as many triangles as
possible. As an example consider the following graph,
[Figure: a graph on the vertices 1 through 12.]
Then triangles T1 = {2, 3, 7}, T2 = {4, 8, 11}, form a packing of triangles. Formulate the
(MTP) problem for this graph G as an IP.
H INT: Define one variable for every triangle and write one constraint for every vertex.
4. Let G = (V, E) be a graph, let s,t be vertices of G, and suppose each edge e has a non-
negative length ce . Remark 1.1 states that an st-cut δ (U) intersects every st-path.
(a) Suppose S is a set of edges that contains at least one edge from every st-path. Show
that there exists an st-cut δ (U) that is contained in the edges of S.
(b) The weight of an st-cut δ(U) is defined as c(δ(U)) := ∑(ce : e ∈ δ(U)). Find an IP
formulation for the problem of finding an st-cut of minimum weight, where we have a
decision variable for every edge and a constraint for every st-path.
H INT: This is analogous to the formulation given in (1.34).
1.6 Non linear programs

A non-linear program is an optimization problem of the form

min z = f(x)
s.t.
g1(x) ≤ 0,
g2(x) ≤ 0,    (1.35)
. . .
gm(x) ≤ 0,

where f, g1, . . . , gm are (possibly non-linear) functions of the variables x ∈ Rn.
We will abbreviate the term non linear program by NLP throughout this book. The reader
might be troubled by the fact that NLPs are always minimization problems. What if we wish to
maximize the objective function f(x)? Then we could replace z = f(x) in (1.35) by z = −f(x),
as making the function −f(x) as small as possible is equivalent to making f(x) as large as
possible. Note that when each of the functions f(x) and g1(x), . . . , gm(x) in (1.35) is an affine function, we obtain an LP.
1.6.1 Pricing a Tech Gadget

Dibson sells one of its tech gadgets in three regions and must choose a single unit price for all of them. The following table gives the demand the company observed in each region at three different unit prices:

Price ($)   Region 1   Region 2   Region 3
50          130        90         210
60          125        80         190
70          80         20         140
The company wants to model the demand in region i as a function di (p) of the unit’s price.
Dibson believes that the demand within a region can be modelled by a quadratic function, and
it uses the data from the above table to obtain the following quadratic demand functions:
d1(p) = −0.2p² + 21.5p − 445,  d2(p) = −0.25p² + 26.5p − 610,  d3(p) = −0.15p² + 14.5p − 140.    (1.36)

[Figure: plots of the three demand functions d1(p), d2(p), d3(p).]

The previous figure shows the demand functions for the three regions. Dibson wants to
restrict the price to be between $50 and $70. What price should Dibson choose in order to
maximize its revenue? We will formulate this problem as an NLP. In this case there is a single
variable, namely, the price p of the unit and the constraints state that 50 ≤ p ≤ 70.
Objective function. For each region i, the demand in region i is given by di(p). Since
the price is p, the revenue from region i is given by p di(p). Thus the total revenue from all
3 regions is given by

p ∑(di(p) : i = 1, 2, 3) = p(−0.2p² + 21.5p − 445 − 0.25p² + 26.5p − 610 − 0.15p² + 14.5p − 140)
                        = −0.6p³ + 62.5p² − 1195p,
where the first equality follows by (1.36). Thus we obtain the following NLP:

max −0.6p³ + 62.5p² − 1195p
subject to
p ≥ 50
p ≤ 70
Using an algorithm to solve NLPs, we find that a price of p = 57.9976 maximizes Dibson’s
revenue. The revenue at this price is $23872.80024.
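The optimal price can also be recovered numerically from the revenue function derived above; here is a small sketch assuming the SciPy library is available (the solver choice is ours, not part of the original text).

```python
# Maximize Dibson's revenue -0.6*p**3 + 62.5*p**2 - 1195*p over 50 <= p <= 70.
from scipy.optimize import minimize_scalar

def revenue(p):
    return -0.6 * p**3 + 62.5 * p**2 - 1195 * p

res = minimize_scalar(lambda p: -revenue(p), bounds=(50, 70), method="bounded")
print(res.x)             # approximately 57.9976
print(revenue(res.x))    # approximately 23872.8
```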
1.6.2 Finding a closest point feasible in an LP

Suppose that we are given a point x̄ ∈ Rn and the constraints Ax ≤ b of an LP, and that we wish to find a feasible solution that is as close as possible to x̄, i.e. we wish to solve

min ‖x − x̄‖
subject to
Ax ≤ b.

Recall that ‖x‖ := √(x1² + x2² + · · · + xn²). The above problem is equivalent to

min ‖x − x̄‖²
subject to
Ax ≤ b.
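As an illustration of this problem, the following sketch (our own; the data A, b and x̄ are made up for the example) computes a closest feasible point numerically, assuming SciPy and NumPy are available.

```python
# Find the point of {x : Ax <= b} closest to xbar, for a small made-up instance.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([1.0, 1.0, 0.0, 0.0])      # the unit box 0 <= x <= 1
xbar = np.array([2.0, 0.5])             # an infeasible point

res = minimize(lambda x: np.sum((x - xbar) ** 2),          # minimize ||x - xbar||^2
               x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda x: b - A @ x}])
print(res.x)   # roughly (1.0, 0.5): the projection of xbar onto the box
```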
1.6.3 Finding a “central” feasible solution of an LP

Suppose instead that we are worried about violating constraints and, as a result, we want a solution that is
as far away as possible from getting close to violating any of the constraints. Then one option
is to solve the optimization problem

max ∑( ln(bi − ∑(Aij xj : j = 1, . . . , n)) : i = 1, . . . , m )
subject to
Ax ≤ b.
Another option is to solve the optimization problem
\[
\min\ \sum_{i=1}^{m} \frac{1}{\,b_i - \sum_{j=1}^{n} A_{ij}x_j\,} \qquad \text{subject to} \quad Ax \le b.
\]
Indeed, we can pick any suitable function which is finite valued and monotone decreasing
for positive arguments, and which tends to infinity as its argument goes to zero from above.
Note that the functions used above as building blocks, f_1 : R → R, f_1(x) := 1/x and
f_2 : R → R, f_2(x) := −ln(x), both have these desired properties.
Further note that the objective functions of both of these problems are nonlinear. There-
fore, they are both NLP problems but not LP problems.
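To illustrate the idea of a "central" feasible solution, here is a small Python sketch that maximizes the sum of the logarithms of the slacks b_i − a_iᵀx over a tiny two-variable region. The constraint data is made up for illustration, and a crude grid search is used only to keep the example self-contained; any NLP solver could be used instead.

```python
import numpy as np

# A hypothetical two-variable region: x1 + x2 <= 3, x2 <= 2, x1 <= 2, x >= 0.
A = np.array([[1., 1], [0, 1], [1, 0], [-1, 0], [0, -1]])
b = np.array([3., 2, 2, 0, 0])

def barrier(x):
    """Sum of ln(b_i - a_i^T x); -inf outside the interior of the region."""
    slack = b - A @ x
    return np.sum(np.log(slack)) if np.all(slack > 0) else -np.inf

# Coarse grid search; the maximizer sits well inside the region,
# away from every constraint.
grid = np.linspace(0.0, 2.0, 201)
best = max((barrier(np.array([u, v])), u, v) for u in grid for v in grid)
print(best[1:])
```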
Exercises
1. UW Smokestacks has a chimney which needs cleaning regularly. Soot builds up on the
inside of the chimney at a rate of 1cm per month. The cost of cleaning when the soot has
thickness x cm is 13 + x2 hundred dollars. Cleaning can occur at the end of any month.
The chimney is clean at present (i.e. at the start of month 1). According to the contract
with the cleaners, it must be cleaned precisely 9 times in the next 48 months, one of which
times must be at the end of month 48. Also, at least three cleanings must occur in the first
15 months.
We need to determine in which months the cleaning should occur so as to minimize the total
cost. Formulate a nonlinear mathematical program to solve this problem, such that all con-
straints are either linear or integrality constraints. (Hint: Consider a variable giving the
time of the ith cleaning.)
2. A number of remote communities want to build a central warehouse. The location of each
of the communities is described by an (x, y)-coordinate, as given in the next table:

                 x    y
  Dog River      3    9
  Springfield   10    6
  Empire Falls   0   12
  Worthing       4    9

If the warehouse is located at the coordinate (7, 6) then the distance from the warehouse to
Dog River will be
\[
\sqrt{(7-3)^2 + (6-9)^2} = 5.
\]
Goods from the warehouse will be delivered to the communities by plane. The price of
a single delivery to a community is proportional to the distance between the warehouse
and the community. Each community will need a number of deliveries from the ware-
house which depends on the size of the community. Dog River will require 20 deliveries,
Springfield 14, Empire Falls 8, and Worthing 24. The goal is to locate the warehouse, i.e.
find (x, y)-coordinates, so that the total cost of all deliveries is minimized. Formulate this
problem as an NLP.
3. A 0,1 program is an integer program where every variable is a binary decision variable, i.e. can
only take the values 0 or 1. Show that every 0,1 program can be expressed as an NLP.
Hint: for every variable x consider the equality constraint x(x − 1) = 0.
of model considered. The simplest of these models is linear programming and an algorithm
to solve linear programs is the Simplex algorithm that we describe in Chapter 2.
In Chapter 3 we will develop efficient algorithms to solve two classical optimization prob-
lems, namely the problem of finding a shortest path between a fixed pair of vertices in an
undirected graph and the problem of finding a minimum cost perfect matching in a bi-
partite graph. We show in the process that the natural framework for understanding these
problems is the theory of linear programming duality. This theory will be further developed
in Chapter 4. Further applications of this theory are given in Chapter 5. General techniques to
solve integer programs are studied in Chapter 6. Finally, in Chapter 7 we explain how some
of the concepts introduced for linear programs extend to non-linear programs.
description of the problem and the data. In some applications (actually in almost all applica-
tions), some of the data will be uncertain. There are more advanced tools to deal with such
situations (see, for instance, the literature on Robust Optimization starting with the book [2]).
Many of the subjects that we introduced in this chapter have a whole course dedicated
to them. For example, the portfolio optimization example is an instance of the Markowitz model.
It was proposed by Harry Markowitz in the 1950s. For his work in the area, Markowitz
received the Nobel Prize in Economics in 1990. For applications of optimization in financial
mathematics, see the books [3, 7]. For further information on scheduling, see [19].
Chapter 2

Solving Linear Programs

2.1 Possible outcomes

When we attempt to solve a linear program (P), there are three possible outcomes:
1. it is infeasible,
2. it has an optimal solution,
3. it is unbounded.
Clearly, these outcomes are mutually exclusive (i.e. no two can occur at the same
time). We will show that in fact exactly one of these outcomes must occur (this is a form of
the Fundamental Theorem of Linear Programming which will be established at the end of this
chapter). We will now illustrate each of these outcomes with a different example.
Suppose further that a friend (or maybe your boss) asked you to solve (P). After spending
a substantial amount of time on the problem, you are starting to become more and more
convinced that (P) is in fact infeasible. But how can you be certain? How will you convince a
third party (your friend or boss) that (P) does indeed have no solution? You certainly can not
hope to claim to have tried all possible sets of values for x1 , x2 , x3 , x4 as there are an infinite
number of possibilities.
We now present a concise way of proving that the system (2.1)-(2.4) has no solution.
The first step is to create a new equation by combining equations (2.1),(2.2) and (2.3). We
pick some values y1 , y2 , y3 . Then we multiply equation (2.1) by y1 , equation (2.2) by y2 and
equation (2.3) by y3 and add each of the resulting equations together. If we choose the values
y1 = 1, y2 = −2 and y3 = 1 we obtain the equation
\[
x_1 + 4x_2 + 2x_3 = -1. \tag{2.5}
\]
Let us proceed by contradiction and suppose that there is in fact a solution x̄1 , x̄2 , x̄3 , x̄4 to (2.1)-
(2.4). Then clearly x̄1 , x̄2 , x̄3 , x̄4 must satisfy (2.5) as it satisfies each of (2.1), (2.2) and (2.3).
As x̄1 , x̄2 , x̄3 , x̄4 are all non-negative, it follows that x̄1 + 4x̄2 + 2x̄3 ≥ 0. But this contradicts
constraint (2.5). Hence, our hypothesis that there was in fact a solution to (2.1)-(2.4) must be
false.
The vector y = (1, −2, 1)T is the kind of proof that should satisfy a third party; given y,
anyone can now easily and quickly check that there is no solution to (P). Of course, this proof
will only work for an appropriate choice of y and we have not told you how to find such a
vector at this juncture. We will derive an algorithm that either finds a feasible solution, or
finds a vector y that proves that no feasible solution exists.
We wish to generalize this argument, but before this can be achieved, we need to become
comfortable with matrix notation. Equations (2.1)-(2.3) can be expressed as:
Ax = b
where
\[
A = \begin{pmatrix} 4 & 10 & -6 & -2 \\ -2 & 2 & -4 & 1 \\ -7 & -2 & 0 & 4 \end{pmatrix}, \qquad
b = \begin{pmatrix} 6 \\ 5 \\ 3 \end{pmatrix}, \qquad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}. \tag{2.6}
\]
Then Ax is a vector with three components. Components 1,2 and 3 of the vector Ax cor-
respond to respectively the left hand side of equations (2.1),(2.2) and (2.3). For a vector
y = (y1 , y2 , y3 )T , yT (Ax) is the scalar product of y and Ax and it consists of multiplying the
left hand side of equation (2.1) by y1 , the left hand side of equation (2.2) by y2 , the left hand
side of equation (2.3) by y3 and adding each of resulting expressions. Also, yT b is the scalar
product of y and b and it consists of multiplying the right hand side of equation (2.1) by y1
the right hand side of equation (2.2) by y2 and the right hand side of equation (2.3) by y3 and
adding each of the resulting values. Thus
yT (Ax) = yT b
which is equation (2.5). We then observed that all the coefficients in the left hand side of (2.5)
are non-negative, i.e. that (1, 4, 2, 0) ≥ 0T or equivalently that yT A ≥ 0T . We also observed
that the right hand side of (2.5) is negative or equivalently that yT b < 0. (Note, 0 denotes the
number zero and 0 the column vector whose entries are all zero.) These two facts implied that
there is no solution to (2.1)-(2.4), i.e. that Ax = b, x ≥ 0 has no solution.
Let us generalize this argument to an arbitrary matrix A and vector b. We will assume that
the matrices and vectors have appropriate dimensions so that the matrix relations make sense.
This remark holds for all subsequent statements.
Proposition 2.1. Let A be a matrix and let b be a vector. The system
\[
Ax = b, \qquad x \ge 0
\]
has no solution if there exists a vector y such that:
1. yᵀA ≥ 0ᵀ, and
2. yᵀb < 0.
Note if A has m rows then the vector y must have m components. Then the matrix equation
yT Ax = yT b is obtained by multiplying for every i ∈ {1, . . . , m} row i of A by yi and adding all
the resulting equations together.
Proof of Proposition 2.1. Let us proceed by contradiction and suppose that there exists a so-
lution x̄ to Ax = b, x ≥ 0 and that we can find y such that yT A ≥ 0T and yT b < 0. Since Ax̄ = b
is satisfied, we must also satisfy yT Ax̄ = yT b. Since yT A ≥ 0T and x̄ ≥ 0, it follows that
yT Ax̄ ≥ 0. Then 0 ≤ yT Ax̄ = yT b < 0, a contradiction.
We call a vector y which satisfies conditions (1) and (2) of Proposition 2.1 a certificate of
infeasibility. To convince a third party that a particular system Ax = b, x ≥ 0 has no solution,
it suffices to exhibit a certificate of infeasibility. Note, while we have argued that such a
certificate will be sufficient to prove infeasibility, it is not clear at all that for every infeasible
system there exists a certificate of infeasibility. The fact that this is indeed so, is a deep result
which is known as Farkas’ Lemma, see Theorem 4.8.
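A certificate of infeasibility is easy to check mechanically. The sketch below (in Python, purely as an illustration) verifies the two conditions of Proposition 2.1 for the system (2.6) and y = (1, −2, 1)ᵀ.

```python
import numpy as np

A = np.array([[4., 10, -6, -2],
              [-2., 2, -4, 1],
              [-7., -2, 0, 4]])
b = np.array([6., 5, 3])
y = np.array([1., -2, 1])

print(y @ A)   # (1, 4, 2, 0): y^T A >= 0^T componentwise
print(y @ b)   # -1:           y^T b < 0
# Hence, by Proposition 2.1, the system Ax = b, x >= 0 has no solution.
```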
max{z(x) = cT x : Ax = b, x ≥ 0}
where
\[
A = \begin{pmatrix} 1 & 1 & -3 & 1 & 2 \\ 0 & 1 & -2 & 2 & -2 \\ -2 & -1 & 4 & 1 & 0 \end{pmatrix}, \quad
b = \begin{pmatrix} 7 \\ -2 \\ -3 \end{pmatrix}, \quad
c = \begin{pmatrix} -1 \\ 0 \\ 3 \\ 7 \\ -1 \end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}. \tag{2.7}
\]
This linear program is unbounded. Just like in the previous case of infeasibility, we are looking
for a concise proof that establishes this fact. Specifically, we will define a family of feasible
solutions x(t) for all real numbers t ≥ 0 and show that as t tends to infinity, so does the value
of x(t). This will show that (2.7) is indeed unbounded.
We define for every t ≥ 0,
x(t) = x̄ + td
where
x̄ = (2, 0, 0, 1, 2)T and d = (1, 2, 1, 0, 0)T .
For instance when t = 0 then x(t) = x̄ = (2, 0, 0, 1, 2)ᵀ and when t = 2, x(t) = (4, 4, 2, 1, 2)ᵀ. We
claim that for every t ≥ 0, x(t) is feasible for (2.7). Let us first check that the equations Ax = b
hold for every x(t). You can verify that Ax̄ = b and that Ad = 0. Then we have
\[
A x(t) = A(\bar{x} + td) = A\bar{x} + t\,Ad = b + t\,\mathbf{0} = b,
\]
as required. We also need to verify that x(t) ≥ 0 for every t ≥ 0. Note that x̄, d ≥ 0 hence,
x(t) = x̄ + td ≥ td ≥ 0, as required. Let us investigate what happens to the objective function
for x(t) as t increases. Observe that cᵀd = 2 > 0; then
\[
c^T x(t) = c^T\bar{x} + t\,c^T d = c^T\bar{x} + 2t,
\]
and as t tends to infinity so does cᵀx(t). Hence, we have proved that the linear program is in
fact unbounded.
Given x̄ and d, anyone can now easily verify that the linear program is unbounded. We
have not told you how to find such a pair of vectors x̄ and d. We will want an algorithm that
detects if a linear program is unbounded and when it is, provides us with the vectors x̄ and d.
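The certificate of unboundedness can be checked just as mechanically. The following sketch verifies, for the data of (2.7), that x̄ is feasible, that Ad = 0 with d ≥ 0, and that cᵀd > 0, which is all that the argument above requires.

```python
import numpy as np

A = np.array([[1., 1, -3, 1, 2],
              [0., 1, -2, 2, -2],
              [-2., -1, 4, 1, 0]])
b = np.array([7., -2, -3])
c = np.array([-1., 0, 3, 7, -1])

xbar = np.array([2., 0, 0, 1, 2])   # feasible solution
d = np.array([1., 2, 1, 0, 0])      # direction of unboundedness

print(A @ xbar - b)   # (0, 0, 0): xbar satisfies Ax = b (and xbar >= 0)
print(A @ d)          # (0, 0, 0): x(t) = xbar + t d stays feasible for all t >= 0
print(c @ d)          # 2 > 0:     the objective value grows without bound
```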
We leave the proof of the following proposition to the reader, as the argument is essentially
the one which we outlined in the above example.
Proposition 2.2. Suppose there exists a feasible solution x̄ and a vector d such that,
1. Ad = 0,
2. d ≥ 0,
3. cᵀd > 0.
Then the linear program max{cᵀx : Ax = b, x ≥ 0} is unbounded.
Suppose you are being told that (2, 0, 0, 4, 0)T is an optimal solution to (2.8). Clearly you will
not simply believe such a claim but rightfully request a proof. It is easy enough to verify that
(2, 0, 0, 4, 0)T is feasible and that the corresponding value is 3. However, once again, trying
all feasible solutions, and comparing their values to that of (2, 0, 0, 4, 0)T is impossible.
In order to construct a concise proof of optimality, we will prove that z(x̄) ≤ 3 for every
feasible solution x̄, i.e. no feasible solution has value that exceeds 3. Since (2, 0, 0, 4, 0)T has
value equal to 3 it will imply that it is optimal. We seemingly traded one problem for another
however. How to show that, z(x̄) ≤ 3 for every feasible solution x̄? It suffices in this particular
case to analyze the objective function as,
where the inequality follows from the fact that the scalar product of two vectors, one where
all entries are non-positive and one where all entries are non-negative, is always non-positive.
Hence, (2, 0, 0, 4, 0)T is indeed optimal.
Consider now the following linear program,
\[
\max\ z(x) = (-1, 3, -5, 2, 1)\,x - 3 \quad \text{s.t.} \quad
\begin{pmatrix} 1 & -2 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & 3 \end{pmatrix} x = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \quad x \ge 0. \tag{2.9}
\]
Observe that the only difference between (2.8) and (2.9) is the objective function. In particular,
(2, 0, 0, 4, 0)T is a feasible solution to (2.9). We claim that it is in fact an optimal solution. To
do this we construct a new constraint by multiplying the first constraint of the linear program
by −1, the second by 2 and by adding the two constraints together. Then we obtain,
\[
(-1,2)\begin{pmatrix} 1 & -2 & 1 & 0 & 2 \\ 0 & 1 & -1 & 1 & 3 \end{pmatrix} x = (-1,2)\begin{pmatrix} 2 \\ 4 \end{pmatrix},
\quad \text{i.e.} \quad (-1, 4, -3, 2, 4)\,x = 6.
\]
This equation holds for every feasible solution of (2.8). Thus adding this equation to the ob-
jective function z(x) = (−1, 3, −5, 2, 1)x−3 will not change the value of the objective function
for any of the feasible solutions. The resulting objective function is
z(x) = (−1, 3, −5, 2, 1)x − 3 − (−1, 4, −3, 2, 4)x + 6 = (0, −1, −2, 0, −3)x + 3.
However, this is the same objective function as in the linear program (2.8). It means that
the linear programs (2.8) and (2.9) are essentially the same linear program, written differently. In
particular, this proves that (2, 0, 0, 4, 0)T is an optimal solution to (2.9).
we will always be able to show that an optimal solution x̄ of value z(x̄) is indeed optimal by
proving that z(x̄) is an upper bound for (P). We will see later in this chapter that it is always
possible to prove such an upper bound by combining the objective function with a linear
combination of the constraints.
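The certificate of optimality used above can be verified by a one-line computation: with y = (−1, 2)ᵀ, the reduced objective c − Aᵀy must be non-positive. A small sketch for the data of (2.9):

```python
import numpy as np

A = np.array([[1., -2, 1, 0, 2],
              [0., 1, -1, 1, 3]])
b = np.array([2., 4])
c = np.array([-1., 3, -5, 2, 1])   # the objective of (2.9) is c^T x - 3
y = np.array([-1., 2])

print(c - A.T @ y)   # (0, -1, -2, 0, -3): all entries <= 0
print(-3 + y @ b)    # 3: the rewritten constant term
# Every feasible x >= 0 therefore has value at most 3, and (2, 0, 0, 4, 0)
# attains 3, so it is optimal.
```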
Exercises
1. (a) Prove that the following LP problem is infeasible.
max 3x1 + 4x2 + 6x3
s.t. 3x1 + 5x2 − 6x3 = 4
x1 + 3x2 − 4x3 = 2
−x1 + x2 − x3 = −1
x1 , x2 , x3 ≥ 0
(b) Prove that the LP problem max{cᵀx : Ax = b, x ≥ 0} is unbounded, where
\[
A = \begin{pmatrix} 4 & 2 & 1 & -6 & -1 \\ -1 & 1 & -4 & 1 & 3 \\ 3 & -6 & 5 & 3 & -5 \end{pmatrix}, \quad
b = \begin{pmatrix} 11 \\ -2 \\ -8 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ -2 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
\]
Hint: Consider the vectors x̂ = (1, 3, 1, 0, 0)ᵀ and d = (1, 1, 1, 1, 1)ᵀ.
(d) For each of the problems in parts (b) and (c), give a feasible solution having objective
value exactly 5000.
2. Let A be an m × n matrix and let b be a vector with m entries. Prove or disprove each of
the following statements (in both cases y is a vector with m entries),
max{cT x + z̄ : Ax = b, x ≥ 0}.
where z̄ denotes some constant. In other words, an LP is in SEF, if it satisfies the following
conditions:
1. it is a maximization problem,
2. other than the non-negativity constraints x ≥ 0, all constraints are equality constraints, and
3. every variable is required to be non-negative.
Here is an example,
We will develop an algorithm that given an LP (P) in SEF will either: prove that (P) is infea-
sible by exhibiting a certificate of infeasibility, or prove that (P) is unbounded by exhibiting
a certificate of unboundedness, or find an optimal solution and show that it is indeed optimal
by exhibiting a certificate of optimality.
However, not every linear program is in SEF. Given an LP (P) which is not in SEF, we
wish to ”convert” (P) into an LP (P’) in SEF and apply the algorithm to (P’) instead. Of course
we want the answer for (P’) to give us some meaningful answer for (P). More precisely what
we wish is for (P) and (P’) to satisfy the following relationships:
1. (P) is infeasible if and only if (P') is infeasible,
2. (P) is unbounded if and only if (P') is unbounded, and
3. given any optimal solution of (P') we can construct an optimal solution of (P).
Linear programs that satisfy the relationships (1),(2) and (3) are said to be equivalent.
We now illustrate on an example how to convert an arbitrary linear program into an equiv-
alent linear program in SEF. Note, we will leave it to the reader to verify that at each step we
do indeed get an equivalent linear program:
\[
\min\ (-1, 2, -4)(x_1, x_2, x_3)^T \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 5 & 3 \\ 2 & -1 & 2 \\ 1 & 2 & -1 \end{pmatrix} x
\ \begin{matrix} \ge \\ \le \\ = \end{matrix}\ \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2 \ge 0. \tag{2.11}
\]
The notation used here means that ≥, ≤ and = refer to the 1st, 2nd, and 3rd constraints
respectively. We want to convert (2.11) into an LP in SEF. We will proceed step by step.
First note that (2.11) is a minimization problem. We can replace min(−1, 2, −4)(x1 , x2 , x3 )T
by max(1, −2, 4)(x1 , x2 , x3 )T , or more generally min cT x by max −cT x. The resulting linear
program is as follows,
We do not have the condition that x3 ≥ 0 as part of the formulation in (2.12). We call such
a variable free. We might be tempted to simply add the constraint x3 ≥ 0 to the formulation.
However, by doing so we may change the optimal solution as it is possible for instance that all
optimal solutions to (2.12) satisfy x3 < 0. The idea here is to express x3 as the difference of
two non-negative variables, say x3 = x3+ − x3− where x3+ , x3− ≥ 0. Let us rewrite the objective
Let us rewrite the left hand side of the constraints with these new variables,
\[
\begin{pmatrix} 1 & 5 & 3 \\ 2 & -1 & 2 \\ 1 & 2 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + x_3\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + (x_3^+ - x_3^-)\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix}
\]
\[
= x_1\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} + x_2\begin{pmatrix} 5 \\ -1 \\ 2 \end{pmatrix} + x_3^+\begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} + x_3^-\begin{pmatrix} -3 \\ -2 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 & 5 & 3 & -3 \\ 2 & -1 & 2 & -2 \\ 1 & 2 & -1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \end{pmatrix}.
\]
In general, for an LP with variables x = (x1, . . . , xn)ᵀ, if we have a variable xi where xi ≥ 0 is
not part of the formulation (i.e. a free variable), we introduce variables xi⁺, xi⁻ ≥ 0 and define
x′ = (x1, . . . , xi−1, xi⁺, xi⁻, xi+1, . . . , xn)ᵀ. Replace the objective function cᵀx by c′ᵀx′ where
c′ = (c1, . . . , ci−1, ci, −ci, ci+1, . . . , cn)ᵀ. If the left hand side of the constraints is of the form
Ax, where A is a matrix with columns A1, . . . , An, replace Ax by A′x′ where A′ is the matrix
whose columns are A1, . . . , Ai−1, Ai, −Ai, Ai+1, . . . , An.
The new linear program is as follows,
Let us replace the constraint 2x1 − x2 + 2x3+ − 2x3− ≤ 4 in (2.13) by an equality constraint. We
introduce a new variable x4 where x4 ≥ 0 and we rewrite the constraint as 2x1 − x2 + 2x3+ −
2x3− + x4 = 4. The variable x4 is called a slack variable. More generally, given a constraint of
the form ∑ni=1 ai xi ≤ β we can replace it by ∑ni=1 ai xi + xn+1 = β , where xn+1 ≥ 0.
The resulting linear program is as follows,
max (1, −2, 4, −4, 0)(x1, x2, x3⁺, x3⁻, x4)ᵀ
subject to
\[
\begin{pmatrix} 1 & 5 & 3 & -3 & 0 \\ 2 & -1 & 2 & -2 & 1 \\ 1 & 2 & -1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \end{pmatrix}
\ \begin{matrix} \ge \\ = \\ = \end{matrix}\ 
\begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2, x_3^+, x_3^-, x_4 \ge 0. \tag{2.14}
\]
Let us replace the constraint x1 + 5x2 + 3x3+ − 3x3− ≥ 5 in (2.14) by an equality constraint.
We introduce a new variable x5 , where x5 ≥ 0 and we rewrite the constraint as x1 + 5x2 +
3x3+ − 3x3− − x5 = 5. The variable x5 is also called a slack variable. More generally, given a
constraint of the form ∑ni=1 ai xi ≥ β we can replace it by ∑ni=1 ai xi − xn+1 = β where xn+1 ≥ 0.
The resulting linear program is as follows,
max (1, −2, 4, −4, 0, 0)(x1, x2, x3⁺, x3⁻, x4, x5)ᵀ
subject to
\[
\begin{pmatrix} 1 & 5 & 3 & -3 & 0 & -1 \\ 2 & -1 & 2 & -2 & 1 & 0 \\ 1 & 2 & -1 & 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3^+ \\ x_3^- \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} 5 \\ 4 \\ 2 \end{pmatrix},
\qquad x_1, x_2, x_3^+, x_3^-, x_4, x_5 \ge 0.
\]
Note that after relabeling the variables x1 , x2 , x3+ , x3− , x4 , x5 by x1 , x2 , x3 , x4 , x5 , x6 , we obtain the
linear program (2.10). We leave to the reader to verify that the aforementioned transformations
are sufficient to convert any linear program into an LP in SEF.
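The conversion steps can be automated. The sketch below applies them to the data of (2.11) as reconstructed above; the exact column ordering of the resulting SEF problem differs slightly from the relabelling used in the text, but the LP is equivalent. The data layout (lists of senses, indices of free variables) is just one possible convention, not anything prescribed by the text.

```python
import numpy as np

# Data of (2.11): min (-1, 2, -4) x, row senses >=, <=, =, with x1, x2 >= 0, x3 free.
A = np.array([[1., 5, 3], [2., -1, 2], [1., 2, -1]])
b = np.array([5., 4, 2])
c = np.array([-1., 2, -4])
senses = ['>=', '<=', '=']
free = [2]                         # x3 (0-indexed) has no sign constraint

c = -c                             # Step 1: min c^T x  ->  max (-c)^T x

for j in free:                     # Step 2: split x_j = x_j^+ - x_j^-
    A = np.hstack([A, -A[:, [j]]])
    c = np.append(c, -c[j])

for i, s in enumerate(senses):     # Step 3: slack (<=) or surplus (>=) variables
    if s != '=':
        col = np.zeros((A.shape[0], 1))
        col[i, 0] = 1.0 if s == '<=' else -1.0
        A = np.hstack([A, col])
        c = np.append(c, 0.0)

print(c)   # (1, -2, 4, -4, 0, 0): objective of the equivalent LP in SEF
print(A)   # all constraints are now equalities, all variables >= 0
print(b)
```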
Exercises
1. Convert the following linear programs (LP) into SEF form,
(a)
min (2, −1, 4, 2, 4)(x1 , x2 , x3 , x4 , x5 )T
subject to
\[
\begin{pmatrix} 1 & 2 & 4 & 7 & 3 \\ 2 & 8 & 9 & 0 & 0 \\ 1 & 1 & 0 & 2 & 6 \\ -3 & 4 & 3 & 1 & -1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
\ \begin{matrix} \le \\ = \\ \ge \\ \ge \end{matrix}\ 
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix},
\qquad x_1 \ge 0,\ x_2 \ge 0,\ x_4 \ge 0.
\]
min cT x + d T y
subject to
Ax ≥ b
Bx + Dy = f
x≥0
\[
\max\ z(x) = 2x_1 + 3x_2 \quad \text{s.t.} \quad
\begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}, \quad x \ge 0, \tag{2.15}
\]
where x = (x1, x2, x3, x4, x5)ᵀ. Because (2.15) has a special form it is easy to verify that x̄ =
(0, 0, 6, 10, 4)T is a feasible solution with value z(x̄) = 0. Let us try to find a feasible solution
x with larger value. Since the objective function is z(x) = 2x1 + 3x2 by increasing the value of
x̄1 or x̄2 we will increase the value of the objective function. Let us try to increase the value of
x̄1 while keeping x̄2 equal to zero. In other words, we look for a new feasible solution x where
x1 = t for some t ≥ 0 and x2 = 0. The matrix equation will tell us which values we need for
x3 , x4 , x5 . We have,
\[
\begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 \\ -1 & 1 & 0 & 0 & 1 \end{pmatrix} x
= x_1\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + x_2\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + x_3\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + x_5\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
= t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix}.
\]
Thus,
\[
\begin{pmatrix} x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix} - t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}. \tag{2.16}
\]
The larger we pick t ≥ 0, the more we will increase the objective function. How large can we
choose t to be? We simply need to make sure that x3, x4, x5 ≥ 0, i.e. that
\[
t\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \le \begin{pmatrix} 6 \\ 10 \\ 4 \end{pmatrix}.
\]
Thus, t ≤ 6 and 2t ≤ 10, i.e. t ≤ 10/2 = 5. Note that −t ≤ 4 does not impose any bound on how
large t can be. Picking the largest possible t yields 5. We can summarize this computation as
\[
t = \min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\} = 5.
\]
This yields the new feasible solution x = (5, 0, 1, 0, 9)ᵀ, which has value z(x) = 10, an improvement over the value 0 of
x̄. By repeating this type of computations and rewriting the linear program at every step, this
will lead to an algorithm for solving linear programs which is known as the Simplex algorithm.
Exercises
1. In this exercise you are asked to repeat the argument in Section 2.3 with different examples.
2.4.1 Bases
Consider an m × n matrix A where the rows of A are linearly independent. We will denote
column j of A by A j . Let J be a subset of the column indices (i.e. J ⊆ {1, . . . , n}), we define
AJ to be the matrix formed by columns A j for all j ∈ J (where the columns appear in the order
given by their corresponding indices). We say that a set of column indices B forms a basis if
the matrix AB is a square non-singular matrix. Equivalently, a basis corresponds to a maximal
subset of linearly independent columns. Consider for instance,
\[
A = \begin{pmatrix} 2 & 1 & 2 & -1 & 0 & 0 \\ 1 & 0 & -1 & 2 & 1 & 0 \\ 3 & 0 & 3 & 1 & 0 & 1 \end{pmatrix}.
\]
Then B = {2, 5, 6} is a basis as the matrix AB is the identity matrix. Note that B = {1, 2, 3}
and {1, 5, 6} are also bases, while B = {1, 3} is not a basis (as AB is not square in this case)
and neither is B = {1, 3, 5} (as AB is singular in this case). We will denote by N the set of
column indices not in B. Thus, B and N will always denote a partition of the column indices
of A.
Suppose that in addition to the matrix A we have a vector b with m components, and
consider the system of equations Ax = b. For instance say,
\[
\begin{pmatrix} 2 & 1 & 2 & -1 & 0 & 0 \\ 1 & 0 & -1 & 2 & 1 & 0 \\ 3 & 0 & 3 & 1 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}. \tag{2.17}
\]
Variables x j are said to be basic when j ∈ B and non-basic otherwise. The vector which is
formed by the basic variables is denoted by xB and the vector which is formed by the non-basic
variables is xN . We assume that the components in xB appear in the same order as AB and that
the components in xN appear in the same order as AN . For instance, for (2.17) B = {1, 5, 6} is
a basis, then N = {2, 3, 4} and xB = (x1 , x5 , x6 )T , xN = (x2 , x3 , x4 )T .
The following easy observation will be used repeatedly,
\[
Ax = \sum_{j=1}^{n} x_j A_j = \sum_{j \in B} x_j A_j + \sum_{j \in N} x_j A_j = A_B x_B + A_N x_N.
\]
A vector x̄ is a basic solution of Ax = b for a basis B if:
1. Ax̄ = b, and
2. x̄N = 0.
For (2.17) and basis B = {1, 5, 6}, the unique basic solution x̄ is given by x̄2 = x̄3 = x̄4 = 0 and
\[
\begin{pmatrix} \bar{x}_1 \\ \bar{x}_5 \\ \bar{x}_6 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 \\ 1 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}.
\]
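Computing a basic solution is a single linear solve. A short sketch for the system (2.17) and the basis B = {1, 5, 6} (written 0-indexed in the code):

```python
import numpy as np

A = np.array([[2., 1, 2, -1, 0, 0],
              [1., 0, -1, 2, 1, 0],
              [3., 0, 3, 1, 0, 1]])
b = np.array([2., 1, 1])

B = [0, 4, 5]                        # basis {1, 5, 6} in 0-indexed form
x = np.zeros(A.shape[1])
x[B] = np.linalg.solve(A[:, B], b)   # x_B = A_B^{-1} b; non-basic entries stay 0
print(x)                             # (1, 0, 0, 0, 0, -2): basic, but not feasible
```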
If (1) occurs, then the linear program is infeasible and we can stop. If (2) occurs, then we can
eliminate a redundant constraint. We repeat the procedure until all rows of A are linearly inde-
pendent. Hence, throughout this chapter we will always assume (without stating it explicitly)
that the rows of the matrix defining the left hand side of the equality constraints in an LP in
SEF are linearly independent.
We say that the LP max{cᵀx + z̄ : Ax = b, x ≥ 0} is in canonical form for a basis B if:
(C1) AB is an identity matrix, and
(C2) cB = 0.
We will show that given any basis B, we can rewrite the linear program (P) so that it is in
canonical form. For instance, B = {1, 2, 4}, is a basis of the linear program (2.15) that can be
rewritten as the following equivalent linear program,
\[
\begin{array}{rl}
\max & z(x) = 17 + \big(0,\, 0,\, -\tfrac{5}{2},\, 0,\, -\tfrac{1}{2}\big)\,x \\[2pt]
\text{s.t.} & \begin{pmatrix} 1 & 0 & 1/2 & 0 & -1/2 \\ 0 & 1 & 1/2 & 0 & 1/2 \\ 0 & 0 & -3/2 & 1 & 1/2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \qquad (2.18)\\[4pt]
 & x \ge 0.
\end{array}
\]
max{cᵀx + z̄ : Ax = b, x ≥ 0},
and let B be a basis of A. Let us first consider condition (C1). Multiplying the system Ax = b
on the left by A_B^{-1} gives
\[
A_B^{-1} A x = A_B^{-1} b. \tag{2.19}
\]
Observe that
\[
A_B^{-1} A x = A_B^{-1}(A_B x_B + A_N x_N) = A_B^{-1}A_B x_B + A_B^{-1}A_N x_N = x_B + A_B^{-1} A_N x_N.
\]
In particular, the columns corresponding to B in the left hand side of (2.19) form an identity
matrix as required. Moreover, we claim that the set of solutions to Ax = b is equal to the set
of solutions to (2.19). Clearly, every solution to Ax = b is also a solution to A_B^{-1}Ax = A_B^{-1}b, as
these equations are linear combinations of the equations Ax = b. Moreover, every solution to
A_B^{-1}Ax = A_B^{-1}b is also a solution to A_B A_B^{-1}Ax = A_B A_B^{-1}b, but this equation is simply Ax = b,
proving the claim.
Let us consider condition (C2). Let B be a basis of A. Suppose A has m rows, then for any
vector y = (y1, . . . , ym)ᵀ the equation
\[
y^T A x = y^T b
\]
holds for every solution of Ax = b. Since this equation holds for every feasible solution, we can add
it to the objective function z(x) = cᵀx + z̄. The resulting objective function is
\[
z(x) = c^T x + \bar{z} + y^T b - y^T A x = (c^T - y^T A)\,x + \bar{z} + y^T b.
\]
Let c̄ᵀ := cᵀ − yᵀA. For (C2) to be satisfied we need c̄B = 0 and need to choose y accordingly.
Namely, we want that
\[
\bar{c}_B^{\,T} = c_B^T - y^T A_B = \mathbf{0}^T,
\]
or equivalently that
\[
y^T A_B = c_B^T, \qquad \text{i.e.} \qquad A_B^T y = c_B.
\]
Note that the inverse and the transpose operations commute, hence (A_B^{-1})^T = (A_B^T)^{-1}.
Therefore, we will write A_B^{-T} for (A_B^{-1})^T. Hence, the previous relation can be rewritten as
\[
y = A_B^{-T} c_B. \tag{2.21}
\]
Proposition 2.4. Suppose the linear program
\[
\max\{\, z(x) = c^T x + \bar{z} \ :\ Ax = b,\ x \ge 0 \,\}
\]
and a basis B of A are given. Then the following linear program is an equivalent linear
program in canonical form for the basis B:
\[
\max\{\, z(x) = (c^T - y^T A)\,x + \bar{z} + y^T b \ :\ A_B^{-1}Ax = A_B^{-1}b,\ x \ge 0 \,\},
\]
where y = A_B^{-T} c_B.
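The formulas of Proposition 2.4 translate directly into a few lines of linear algebra. The sketch below recovers the canonical form (2.18) of the LP (2.15) for the basis B = {1, 2, 4}.

```python
import numpy as np

# Data of (2.15).
A = np.array([[1., 1, 1, 0, 0],
              [2., 1, 0, 1, 0],
              [-1., 1, 0, 0, 1]])
b = np.array([6., 10, 4])
c = np.array([2., 3, 0, 0, 0])
zbar = 0.0

B = [0, 1, 3]                        # basis {1, 2, 4}, 0-indexed
AB = A[:, B]
y = np.linalg.solve(AB.T, c[B])      # y = A_B^{-T} c_B
print(c - A.T @ y)                   # (0, 0, -5/2, 0, -1/2): new objective coefficients
print(zbar + y @ b)                  # 17: new constant term
print(np.linalg.solve(AB, A))        # A_B^{-1} A: the constraint matrix of (2.18)
print(np.linalg.solve(AB, b))        # (1, 5, 3): the right hand side of (2.18)
```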
Exercises
1. Consider the system Ax = b where the rows of A are linearly independent. Let x̄ be a
solution to Ax = b. Let J be the column indices j of A for which x̄j ≠ 0.
(a) Show that if x̄ is a basic solution then the columns of AJ are linearly independent.
(b) Show that if the columns of AJ are linearly independent then x̄ is a basic solution.
Note that (a) and (b) gives you a way of checking whether x̄ is a basic solution, namely
you simply need to verify whether the columns of AJ are linearly independent.
here x is a vector of the appropriate length. Find an equivalent LP in canonical form for
In each case, state whether the basis (i.e. the corresponding basic solution) is feasible.
max cT x
subject to
Ax = b
x j ≥ 0 for all j = 1, . . . , n − 1
Convert (P) into a linear program (P’) in SEF by replacing the free variable xn by two
variables xn⁺ and xn⁻. Show that no basic solution x̄ of (P') satisfies both x̄n⁺ > 0 and
x̄n⁻ > 0.
Assume that the rows of A are linearly independent. Let x(1) and x(2) be two distinct
feasible solutions of (P) and define,
\[
\bar{x} := \tfrac{1}{2} x^{(1)} + \tfrac{1}{2} x^{(2)}.
\]
(b) Show that if x(1) and x(2) are optimal solutions to (P) then so is x̄.
(d) Deduce from (b) and (c) that if (P) has two optimal solutions then (P) has an optimal
solution that is NOT a basic feasible solution.
(a) Find the basic solution x̄ of (P) for the basis B = {1, 2, 3}.
(b) Show that x̄ is an optimal solution of (P).
(c) Show that x̄ is the unique optimal solution of (P).
Hint: Show that for every feasible solution x′ of value 17, {j : x′j ≠ 0} ⊆ B. Then
use Exercise 1.
max cT x + z̄
subject to
Ax = b
x≥0
where A is an m × n matrix. Let x̄ be a feasible solution to (P) and let J = { j : x̄ j > 0}.
Call a vector d a good direction for x̄ if it satisfies the following properties:
(a) Show that if the columns of AJ are linearly dependent then there exists a good
direction for x̄.
Hint: Use the definition of linear dependence to get a vector d. Then possibly
replace d by −d.
(b) Show that if x̄ is not basic then there exists a good direction for x̄.
Hint: Use (a) and Exercise 4(b).
(c) Show that if x̄ has a good direction then there exists a feasible solution x′ of (P) such
that the set J′ := {j : x′j > 0} has fewer elements than J.
Hint: Let x′ = x̄ + εd for a suitable value ε > 0.
(d) Show that if (P) has a feasible solution then it has a feasible solution that is basic.
H INT: Use (b) and (c) repeatedly.
max{z(x) = 10 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} 1 & 1/2 & 0 & 1/2 & 0 \\ 0 & 1/2 & 1 & -1/2 & 0 \\ 0 & 3/2 & 0 & 1/2 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}. \tag{2.22}
\]
Let us try to find a feasible solution x with value larger than x̄. Recall, B and N partition the
column indices of A, i.e. N = {2, 4}. Since the linear program is in canonical form, cB = 0.
Therefore, to increase the objective function value, we must select k ∈ N such that ck > 0 and
increase the component x̄k of x̄. In this case, our choice for k is k = 2. Therefore, we set x2 = t
for some t ≥ 0. For all j ∈ N where j 6= k we keep component x j of x equal to zero. It means
in this case that x4 = 0. The matrix equation will tell us what values we need to choose for
xB = (x1 , x3 , x5 )T . Following the same argument as in Section 2.3 we obtain that,
\[
x_B = \begin{pmatrix} x_1 \\ x_3 \\ x_5 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \\ 9 \end{pmatrix} - t\begin{pmatrix} 1/2 \\ 1/2 \\ 3/2 \end{pmatrix}. \tag{2.23}
\]
max{z(x) = 14 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} 1 & 0 & -1 & 1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & -3 & 2 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 4 \\ 2 \\ 6 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 0 \\ -4 \\ 1 \\ 0 \end{pmatrix}. \tag{2.26}
\]
Here we have N = {3, 4}. Let us first choose which element k enters the basis. We want k ∈ N
and ck > 0. The only choice is k = 4. We compute t by taking the smallest ratio between entry
bi and entry Aik (where k = 4) for all i where Aik > 0, namely,
\[
t = \min\left\{ \frac{4}{1},\ -,\ \frac{6}{2} \right\} = 3,
\]
where "−" indicates that the corresponding entry Aik is not positive. The minimum was
attained for the ratio 6/2, i.e. the 3rd row. It corresponds to the third component of xB. As the
3rd element of B is 5 we will have x5 = 0 for the new solution. Hence, 5 will be leaving the
basis and the new basis will be {1, 2, 4}.
Let us proceed with the next iteration. Using the formulae in Proposition 2.4 we can
rewrite the linear program so that it is in canonical form for the basis B = {1, 2, 4}. We get,
\[
\begin{array}{rl}
\max & z(x) = 17 + \big(0,\, 0,\, -\tfrac{5}{2},\, 0,\, -\tfrac{1}{2}\big)\,x \\[2pt]
\text{s.t.} & \begin{pmatrix} 1 & 0 & 1/2 & 0 & -1/2 \\ 0 & 1 & 1/2 & 0 & 1/2 \\ 0 & 0 & -3/2 & 1 & 1/2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \\ 3 \end{pmatrix} \qquad (2.27)\\[4pt]
 & x \ge 0.
\end{array}
\]
The basic solution is x̄ := (1, 5, 0, 3, 0)T and z(x̄) = 17. We have N = {3, 5}. We want k ∈ N
and ck > 0. However, c3 = − 52 and c5 = − 12 so there is no such choice. We claim that this
occurs because the current basic solution is optimal.
Let x′ be any feasible solution. Then
\[
z(x') = 17 + \underbrace{\big(0,\,0,\,-\tfrac{5}{2},\,0,\,-\tfrac{1}{2}\big)}_{\le\, \mathbf{0}^T}\ \underbrace{x'}_{\ge\, \mathbf{0}} \ \le\ 17.
\]
Thus 17 is an upper bound for the value of any feasible solution. It follows that x̄ is an optimal
solution.
max{z(x) = cT x : Ax = b, x ≥ 0}
where
\[
A = \begin{pmatrix} -2 & 4 & 1 & 0 & 1 \\ -3 & 7 & 0 & 1 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \quad
c = \begin{pmatrix} -1 \\ 3 \\ 0 \\ 0 \\ 1 \end{pmatrix}.
\]
It is in canonical form for the basis B = {3, 4}. Then N = {1, 2, 5}. Let us choose which
element k enters the basis. We want k ∈ N and ck > 0. We have choices k = 2 and k = 5. Let
us select 5. We compute t by taking the smallest ratio between entry bi and entry Aik for all i
where Aik > 0. Namely,
\[
t = \min\left\{ \frac{1}{1},\ \frac{3}{1} \right\} = 1.
\]
The minimum is attained for the ratio 1/1, which corresponds to the first row. The first basic
variable is 3. Thus 3 is leaving the basis. Hence, the new basis is B = {4, 5}.
Using the formulae in Proposition 2.4 we can rewrite the linear program so that it is in
canonical form for the basis B = {4, 5}. We get,
max{z(x) = 1 + cT x : Ax = b, x ≥ 0},
where
\[
A = \begin{pmatrix} -1 & 3 & -1 & 1 & 0 \\ -2 & 4 & 1 & 0 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ -1 \\ -1 \\ 0 \\ 0 \end{pmatrix}.
\]
Here N = {1, 2, 3}. Let us choose which element k enters the basis. We want k ∈ N and ck > 0.
The only possible choice is k = 1. We compute t by taking the smallest ratio between entry bi
and entry Aik for all i where Aik > 0. However, as Ak ≤ 0, this is not well defined. We claim
that this occurs because the linear program is unbounded.
The new feasible solution x(t) is defined by setting x1 (t) = t for some t ≥ 0 and x2 = x3 = 0.
The matrix equation Ax = b tells us which values to choose for xB (t) namely (see argument in
Section 2.3),
\[
x_B(t) = \begin{pmatrix} x_4(t) \\ x_5(t) \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} - t\begin{pmatrix} -1 \\ -2 \end{pmatrix}.
\]
Thus, we have
\[
x(t) = \begin{pmatrix} t \\ 0 \\ 0 \\ 2 + t \\ 1 + 2t \end{pmatrix}
= \underbrace{\begin{pmatrix} 0 \\ 0 \\ 0 \\ 2 \\ 1 \end{pmatrix}}_{:=\,\bar{x}}
+\ t\ \underbrace{\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 2 \end{pmatrix}}_{:=\,d}.
\]
Proof. Suppose cN ≤ 0. Note that z(x̄) = z̄ + cᵀN x̄N = z̄ + cᵀN 0 = z̄. For any feasible solution x′
we have x′ ≥ 0. As cN ≤ 0, it implies that cᵀx′ = cᵀN x′N ≤ 0. It follows that z(x′) ≤ z̄, i.e. z̄ is
an upper bound for (P). As z(x̄) = z̄, the result follows.
Now suppose that for some k ∈ N we have ck > 0. We define x′N, which depends on some
parameter t ≥ 0, as follows:
\[
x'_j = \begin{cases} t & \text{if } j = k, \\ 0 & \text{if } j \in N \setminus \{k\}. \end{cases}
\]
Proof. Suppose Ak ≤ 0. Then for all t ≥ 0 we have x′B = b − tAk ≥ 0. Hence, x′ is feasible.
Moreover,
\[
z(x') = \bar{z} + c_N^T x'_N = \bar{z} + \sum_{j \in N} c_j x'_j = \bar{z} + c_k x'_k = \bar{z} + c_k t.
\]
Since ck > 0, z(x′) tends to infinity as t does, hence the linear program is unbounded.
Thus, we may assume that Aik > 0 for some row index i. We need to choose t so that
x′B ≥ 0. It follows from (2.28) that
\[
tA_k \le b,
\]
or equivalently, for every row index i for which Aik > 0 we must have
\[
t \le \frac{b_i}{A_{ik}}.
\]
Hence, the largest value t for which x′ remains non-negative is given by
\[
t = \min\left\{ \frac{b_i}{A_{ik}} \ :\ A_{ik} > 0 \right\}.
\]
Note that since Ak ≤ 0 does not hold, this is well defined. Let r denote the index i where the
minimum is attained in the previous equation. Then (2.28) implies that the rth entry of x′B will
be zero. Let ℓ denote the rth basic variable of B. Note that since we order the components
of xB in the same order as B, the rth component of x′B is the basic variable x′ℓ. It follows that
x′ℓ = 0. Choose
\[
B' = B \cup \{k\} \setminus \{\ell\}, \qquad N' = N \cup \{\ell\} \setminus \{k\}.
\]
We let the reader verify that x′N′ = 0. It can be readily checked that B′ is a basis. Then x′ must
be a basic solution for the basis B′.
Let us summarize the simplex procedure for
Note, we have argued that if the algorithm terminates, then it provides us with a correct
solution. However, the algorithm as it is described need not stop. Suppose that at every step,
the quantity t > 0. Then at every step, the objective function will increase. Moreover, it
is clear that at no iteration will the objective function decrease or stay the same. Hence, in
that case we never visit the same basis twice. As there are clearly only a finite number of
bases, this would guarantee that the algorithm terminates. However, it is possible that at every
iteration the quantity t equals 0. Then at the start of the next iteration we get a new basis, but
the same basic solution. After a number of iterations, it is possible that we revisit the same
basis. If we repeat this forever, the algorithm will not terminate.
There are a number of easy refinements to the version of the Simplex algorithm we de-
scribed that will guarantee termination. The easiest to state is as follows: Throughout the
Simplex iterations with t = 0, in Step 2, among all j ∈ N with c j > 0, choose k := min{ j ∈
N : cj > 0}; also, in Step 3, define t as before and choose the smallest r ∈ B with Ark > 0 and
br/Ark = t. This rule is known as the smallest subscript rule or Bland's rule.
Theorem 2.7. The Simplex procedure with the smallest subscript rule terminates.
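To make the procedure and Bland's rule concrete, here is a compact Python sketch of the Simplex method in the form described above. It recomputes a dense solve with A_B at every iteration, which is chosen for clarity rather than efficiency, and it is a sketch rather than a numerically robust implementation.

```python
import numpy as np

def simplex(A, b, c, B):
    """Simplex method with the smallest subscript (Bland's) rule for
    max{c^T x : Ax = b, x >= 0}, started from a feasible basis B (0-indexed)."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    B = list(B)
    while True:
        AB = A[:, B]
        xB = np.linalg.solve(AB, b)              # current basic solution
        y = np.linalg.solve(AB.T, c[B])          # y = A_B^{-T} c_B
        cbar = c - A.T @ y                       # reduced costs (zero on B)
        N = [j for j in range(n) if j not in B]
        entering = [j for j in N if cbar[j] > 1e-9]
        if not entering:                         # certificate of optimality
            x = np.zeros(n); x[B] = xB
            return 'optimal', x
        k = min(entering)                        # Bland's rule: smallest subscript
        dB = np.linalg.solve(AB, A[:, k])        # change in x_B as x_k increases
        if np.all(dB <= 1e-9):                   # certificate of unboundedness
            x = np.zeros(n); x[B] = xB
            d = np.zeros(n); d[k] = 1.0; d[B] = -dB
            return 'unbounded', x, d
        # smallest ratio; ties broken by the smallest basic variable index
        t, leave, r = min((xB[i] / dB[i], B[i], i) for i in range(m) if dB[i] > 1e-9)
        B[r] = k

# Example: the LP (2.15) starting from the basis {3, 4, 5}.
A = [[1, 1, 1, 0, 0], [2, 1, 0, 1, 0], [-1, 1, 0, 0, 1]]
b, c = [6, 10, 4], [2, 3, 0, 0, 0]
status, x = simplex(A, b, c, B=[2, 3, 4])
print(status, x, np.array(c) @ x)    # optimal, (1, 5, 0, 3, 0), value 17
```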
Exercises
1. Consider the LP problem max{cT x : Ax = b, x ≥ 0} where
\[
A = \begin{pmatrix} 1 & 2 & -2 & 0 \\ 0 & 1 & 3 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 3 \\ 1 \\ 0 \end{pmatrix}.
\]
(a) Beginning with the basis B = {1, 4}, solve the problem with the simplex method. At
each step choose the entering variable by the smallest-subscript rule.
(b) Give a certificate of optimality or unboundedness for the problem, and verify it.
Notice that the problem is in canonical form with respect to the basis B = {4, 5, 6} and
that B determines a feasible basic solution x. Beginning from this canonical form, solve
the problem with the simplex method. At each step choose the entering variable by the
smallest-subscript rule. At termination, give a certificate of optimality or unboundedness.
4. Suppose at some step of the simplex procedure we have a feasible basis B and a linear
program
max{cT x : Ax = b, x ≥ 0}
which is in canonical form for the basis B. Following the simplex procedure we choose an
entering variable k and a leaving variable ℓ. At the next step of the simplex procedure we
will consider the basis B′ := B ∪ {k} \ {ℓ}. Explain why the new set B′ is in fact a basis.
Hint: Matrix AB is the identity matrix. Show that the columns of AB′ are linearly in-
dependent. Whether a set J ⊆ {1, . . . , n} is a basis or not, does not change when we do
row operations on A, or equivalently, it does not change when we left-multiply A by an
invertible matrix.
5. The princess's wedding ring can be made from four types of gold 1, 2, 3, 4 with the follow-
ing amounts of milligrams of impurity per gram, and with values as follows,
  Type           1   2   3   4
  mg of lead     1   2   2   1
  mg of cobalt   0   1   1   2
  value          1   2   3   2
Set up an LP which will determine the most valuable ring that can be made containing
at most 6mg of lead and at most 10mg of cobalt. Put the LP into SEF and then solve it
using the simplex algorithm. (For ease of marking, keep the variables in the natural order,
and use the smallest index rule for choosing the entering variable when doing a simplex
iteration.)
min{cT x : Ax = b, x ≥ 0} (Q)
(a) Indicate what changes you need to make to the algorithm to solve (Q) instead of (P)
(max was replaced by min). Do NOT convert (Q) into SEF, your algorithm should be
able to deal with (Q) directly.
(b) Explain why the solution is optimal for (Q) when your algorithm claims that it is.
(c) Explain why (Q) is unbounded when your algorithm claims that it is.
7. Let A be an m × n matrix (with independent rows), let c and ` be vectors with n entries and
let b be a vector with m entries. Consider the following linear program,
max{cᵀx : Ax = b, x ≥ ℓ}   (P)
Thus if ℓ = 0 then (P) is in SEF. One way to solve (P) would be for you to convert it into
SEF (using the procedure in class) and use the Simplex. Instead I want you to come up
with your own algorithm to solve (P) directly by modifying the simplex algorithm. Here
are some guidelines,
• When ℓ = 0 then your algorithm should be the same as the Simplex algorithm,
• Given a basis B, redefine x̄ to be basic if Ax̄ = b and for every j ∉ B, x̄j = ℓj (when
ℓ = 0 this corresponds to the standard definition). Define B to be feasible if the
corresponding basic solution x̄ is a feasible solution of (P).
• Your algorithm should go from feasible basis to feasible basis, attempting at each
step to increase the value of the corresponding basic solution. You need to indicate
how the entering and leaving variables are defined. You also need to indicate how to
detect when (P) is unbounded and when your solution is optimal.
Suppose you have in addition a vector u with n entries that give upper values for each of
the variables, i.e. you have the following linear program,
One way to solve (P’) would be for you to convert it into SEF and use the simplex. Instead
I want you to come up with your own algorithm to solve (P’) directly by modifying the
simplex algorithm.
Hint: Given a basis B redefine x̄ to be basic if Ax̄ = b and for every j ∉ B either xj = ℓj or
xj = uj. Suppose that you rewrite the objective function so that it is of the form
z = c̄T x + z̄,
where z̄ is some constant and c̄ j = 0 for every j ∈ B. Then show that a feasible basic
solution x̄ (using our new definition) is optimal if the following holds for every j 6∈ B,
• if c̄j < 0 then x̄j = ℓj,
• if c̄j > 0 then x̄j = uj.
8. Suppose that an LP in SEF is changed by defining x10 by x1 = 10x10 and substituting for x1 .
The new problem is equivalent to the old problem. If we now solve the new problem with
the simplex method (and apply the same rule for the choice of the leaving variable), will
the same choice of entering variables occur as when the old problem was solved? Discuss
this question for the four entering-variable rules mentioned below:
(a) The largest coefficient rule: When Dantzig first proposed the simplex algorithm he
suggested choosing among all j ∈ N with c j > 0, a k ∈ N such that ck is maximum.
This rule is sometimes called Dantzig’s rule or the largest coefficient rule. Based on
the formula for the objective value for the new basic solution, it chooses the variable
giving the largest increase in the objective function per unit increase in the value of
the entering variable.
(b) The smallest subscript rule.
The largest improvement rule. This is the rule that chooses the entering variable to be
the one that leads to the largest increase in the objective value. To choose the variable,
therefore, we have to compute for each j ∈ N for which c j > 0, the amount t j by which
we will be able to increase the value of x j , and find the maximum of c j t j over all such
j and choose the corresponding index k,
(c) The steepest edge rule. This rule is geometrically motivated. In moving from the
current basic feasible solution to a new one, we move a certain distance in Rn . This
rule chooses the variable which gives the largest increase in the objective value per
unit distance moved. Suppose that k ∈ N is chosen and the new value of xk will be t.
Then the change in the value of the basic variable xi is −Aikt. Since there is no change
in the value of the other nonbasic variables, the distance moved is
\[
\sqrt{t^2 + \sum_{i \in B} (A_{ik}t)^2} \;=\; t\sqrt{1 + \sum_{i \in B} A_{ik}^2}.
\]
Since the change in the objective value is t ck, we choose k ∈ N with ck > 0 so that
\[
\frac{c_k}{\sqrt{1 + \sum_{i \in B} A_{ik}^2}}
\]
is maximized.
9. Show that the simplex method can cycle in twelve iterations beginning from the following
tableau. (Hint: x3 enters on the first iteration, and x4 enters on the second. Now do you
see the pattern?)
10. Consider the Simplex Method applied to an LP problem in SEF with data (A, b, c) (A ∈
Rm×n , rank(A) = m).
B1, B2, . . . , Bt, Bt+1 = B1. Define
\[
J_0 := \{\, j : j \in B_{i_1} \text{ and } j \notin B_{i_2} \text{ for some } i_1, i_2 \in \{1, 2, \ldots, t\} \,\}.
\]
Let ℓ := max{j : j ∈ J0}. Note that there exist h1, h2 ∈ {1, 2, . . . , t} such that when the
current basis in the Simplex Method is Bh1, xℓ leaves the basis, and when the current
basis is Bh2, xℓ enters the basis. What is the value of xi for every i ∈ J0 in all the basic
feasible solutions visited throughout the cycle? Justify your answer.
(c) When xℓ leaves the basis, the Simplex Method constructs a d vector (a direction to
move along). Which d j are zero? Which d j are positive? Which d j are negative?
Which d j are non-negative? Justify your answer.
(d) Now, consider the Simplex Tableau when xℓ enters the basis. Which c̄j are zero?
Which c̄ j are positive? Which c̄ j are non-positive? Justify your answer.
(e) Now, find a contradiction by considering cT d and c̄T d for these two iterations. and
conclude that starting with a basic feasible solution, the Simplex Method with the
smallest subscript rules terminates after finitely many iterations.
2.6 Finding feasible solutions

2.6.1 Reducibility
In Section 1.2.1 we saw an inventory problem (the KWoil problem) with three time periods.
We showed that this problem can be formulated as an LP. Moreover, it is easy to see that if
we generalize this inventory problem to one with an arbitrary number of time periods, it can
still be formulated as an LP. Thus, assuming that we have an efficient algorithm to solve linear
programs, we can find an efficient algorithm to solve any of these inventory problems. We
say that this class of inventory problem is reducible to linear programming. More generally,
consider two classes of optimization problems, say A and B. If given an efficient algorithm to
solve all instances of problem B, we can solve every instance of problem A efficiently, then
we say A is reducible to B.
Consider the following linear program in SEF,
max{cT x : Ax = b, x ≥ 0} (P)
Problem A.
Either,
1. prove that (P) has no feasible solution, or
2. prove that (P) is unbounded, or
3. find an optimal solution to (P).
Problem B.
Either,
1. prove that (P) has no feasible solution, or
2. find a feasible solution to (P).
Problem C.
Given a feasible solution of (P), either:
1. prove that (P) is unbounded, or
2. find an optimal solution to (P).
The problem we wish to solve is problem A. Note, that the simplex procedure essentially
solves problem C (with the difference that we need a feasible basic solution rather than an
arbitrary feasible solution, this will be addressed in Section 2.6.2). We will show the following
result:
Proposition 2.8. Problem B is reducible to problem C.
Suppose you have an algorithm to solve C and wish to solve A. You proceed as follows:
Proposition 2.8 implies that you can solve problem B. If we get that (P) has no feasible so-
lution we can stop as we solved A. Otherwise, we obtain a feasible solution x̄ of (P). Using
this solution x̄ we can use our algorithm to solve C to either deduce that (P) is unbounded or
to find an optimal solution to (P). Hence, we have solved A as well. Hence, Proposition 2.8
implies that problem A is reducible to problem C.
Consider for example the following linear program (P) with variables x1, x2, x3, x4,
max (1, 2, −1, 3)x
subject to
\[
\begin{pmatrix} 1 & 5 & 2 & 1 \\ -2 & -9 & 0 & 3 \end{pmatrix} x = \begin{pmatrix} 7 \\ -13 \end{pmatrix}
\]
x≥0
Using an algorithm for problem C we wish to find a feasible solution for (P) if it exists or
show that (P) has no feasible solution. Let us first rewrite the constraints of (P) by multiplying
every equation by −1, where the right hand side is negative, we obtain,
\[
\begin{pmatrix} 1 & 5 & 2 & 1 \\ 2 & 9 & 0 & -3 \end{pmatrix} x = \begin{pmatrix} 7 \\ 13 \end{pmatrix},
\]
x = (x1 , x2 , x3 , x4 )T ≥ 0.
We now define the following auxiliary linear program,
min x5 + x6
subject to
\[
\begin{pmatrix} 1 & 5 & 2 & 1 & 1 & 0 \\ 2 & 9 & 0 & -3 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 7 \\ 13 \end{pmatrix} \qquad \text{(Q)}
\]
x = (x1 , x2 , x3 , x4 , x5 , x6 )T ≥ 0.
Variables x5 , x6 are the auxiliary variables. Observe that B = {5, 6} is a basis, and that the cor-
responding basic solution x̄ = (0, 0, 0, 0, 7, 13)T is feasible, since all entries are non-negative.
Note, that this is the case since we made sure that the right hand side of the constraints of
(Q) are all non-negative. We can use the algorithm for problem C to solve (Q). Note that 0
is a lower bound for (Q), hence (Q) is not unbounded. It follows that the algorithm for C
will find an optimal solution. In this case the optimal solution is x′ = (2, 1, 0, 0, 0, 0)ᵀ. Since
x′5 = x′6 = 0 it follows that (2, 1, 0, 0)ᵀ is a feasible solution to (P). Hence, we have solved
problem B.
Consider a second example and suppose that (P) is the following linear program,
max (6, 1, −1)x
subject to
\[
\begin{pmatrix} 5 & 1 & 1 \\ -1 & 1 & 2 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \end{pmatrix}
\]
x = (x1 , x2 , x3 )T ≥ 0.
min x4 + x5
subject to
\[
\begin{pmatrix} 5 & 1 & 1 & 1 & 0 \\ -1 & 1 & 2 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 5 \end{pmatrix} \qquad \text{(Q)}
\]
x = (x1 , x2 , x3 , x4 , x5 )T ≥ 0.
An optimal solution to (Q) is x′ = (0, 0, 1, 0, 3)ᵀ which has value 3. We claim that (P) has no
feasible solution in this case. Suppose for a contradiction that there was a feasible solution
x̄1 , x̄2 , x̄3 to (P). Then x̄ = (x̄1 , x̄2 , x̄3 , 0, 0)T would be a feasible solution to (Q) of value 0,
contradicting the fact that x0 is optimal.
Let us summarize these observations. We consider
max{cT x : Ax = b, x ≥ 0} (P)
where A has m rows and n columns. We may assume that b ≥ 0 as we can multiply any
equation by −1 without changing the problem. We construct the auxiliary problem
\[
w = \min\Big\{ \sum_{i=1}^{m} x_{n+i} \ :\ (A \mid I)\,(x_1, \ldots, x_{n+m})^T = b,\ x \ge 0 \Big\}. \qquad \text{(Q)}
\]
We leave the proof of the next remark as an easy exercise (follow the argument outlined in the
aforementioned examples):
Remark 2.10. The optimal value w of (Q) is zero if and only if (P) has a feasible solution.
Moreover, if x′ is an optimal solution of (Q) of value zero, then (x′1, . . . , x′n)ᵀ is a feasible
solution of (P).
We can now prove Proposition 2.8. Construct from (P) the auxiliary problem (Q). Find an
optimal solution x0 for (Q) using the algorithm for Problem C. Then (P) has a feasible solution
if and only if w = 0 as indicated in the previous Remark.
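Building the auxiliary problem is mechanical, and the resulting LP can then be handed to the Simplex sketch from the previous section, starting from the basis of auxiliary columns. The helper below is an illustration of that construction (the function name and return convention are my own, not the text's).

```python
import numpy as np

def auxiliary_problem(A, b):
    """Phase-one auxiliary LP for max{c^T x : Ax = b, x >= 0}.

    Rows with a negative right hand side are first multiplied by -1, so the
    identity columns of the auxiliary variables give a feasible basic solution.
    Returns (A_aux, b_aux, c_aux), with c_aux chosen so that maximizing
    c_aux^T x is the same as minimizing the sum of the auxiliary variables."""
    A = np.asarray(A, float).copy()
    b = np.asarray(b, float).copy()
    neg = b < 0
    A[neg] *= -1
    b[neg] *= -1
    m, n = A.shape
    A_aux = np.hstack([A, np.eye(m)])
    c_aux = np.concatenate([np.zeros(n), -np.ones(m)])
    return A_aux, b, c_aux

# First example from the text: the auxiliary optimum is 0, so (P) is feasible.
A_aux, b_aux, c_aux = auxiliary_problem([[1, 5, 2, 1], [-2, -9, 0, 3]], [7, -13])
print(A_aux, b_aux, c_aux)   # starting feasible basis: the two auxiliary columns
```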
Note that the objective function is equivalent to min x4 + x5 . B = {4, 5} is a feasible basis,
however (Q) is not in canonical form for B. We could use the formulae in Proposition 2.4 to
rewrite (Q) in canonical form, but a simpler approach is to add each of the two equations of
(Q) to the objective function. The resulting linear program is,
max (2, 1, 0, 0, 0)x − 4
subject to
\[
\begin{pmatrix} 1 & 2 & -1 & 1 & 0 \\ 1 & -1 & 1 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 1 \\ 3 \end{pmatrix}, \qquad x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0.
\]
Solving this linear program using the Simplex algorithm starting from B = {4, 5} we obtain
the optimal basis B = {1, 3}. The canonical form for B is,
max (0, 0, 0, −1, −1)x
subject to
\[
\begin{pmatrix} 1 & 1/2 & 0 & 1/2 & 1/2 \\ 0 & -3/2 & 1 & -1/2 & 1/2 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad x = (x_1, x_2, x_3, x_4, x_5)^T \ge 0. \qquad \text{(Q')}
\]
The basic solution corresponding to B is x̄ = (2, 0, 1, 0, 0)T which has value 0. It follows
from Remark 2.10 that (2, 0, 1)T is a feasible solution for (P). Moreover, (2, 0, 1)T is the basic
solution for basis B of (P), hence B is a feasible basis of (P). Note, it is always true that the
feasible solution we construct after solving (Q) using the Simplex procedure will be a basic
solution of (P). It need not be the case that B be a basis of (P), however.
Phase II.
We can use the formulae in Proposition 2.4 to rewrite (P) in canonical form for the basis
B = {1, 3}. Note that to obtain the constraints we can use the constraints of (Q’) (omitting the
auxiliary variables). We obtain,
max (0, 1, 0)x + 6
subject to
\[
\begin{pmatrix} 1 & 1/2 & 0 \\ 0 & -3/2 & 1 \end{pmatrix} x = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \qquad x = (x_1, x_2, x_3)^T \ge 0.
\]
Solving this linear program using the Simplex algorithm starting from B = {1, 3} we obtain
the optimal basis B = {2, 3}. The canonical form for B is,
max (−2, 0, 0)x + 10
subject to
\[
\begin{pmatrix} 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix} x = \begin{pmatrix} 4 \\ 7 \end{pmatrix}, \qquad x \ge 0.
\]
Then the basic solution (0, 4, 7)T is an optimal solution for (P).
2.6.3 Consequences
Suppose that we are given an arbitrary linear program (P) in SEF. Let us run the Two Phase
method for (P) using the smallest subscript rule. Theorem 2.7 implies that the Two Phase
method will terminate. The following now is a consequence of our previous discussion,
1. if (P) has a feasible solution, then (P) has a basic feasible solution;
2. if (P) has an optimal solution, then (P) has a basic feasible solution that is optimal.
Since we can convert any LP problem into SEF while preserving the main property of the LP,
the above theorem yields the following result for LP problems in any form.
Exactly one of the following holds:
1. (P) is infeasible,
2. (P) is unbounded,
3. (P) has an optimal solution.
Exercises
1. Consider an LP of the form max{cT x : Ax = b, x ≥ 0}. Use the Two Phase Simplex Method
using the smallest index rule, to solve the LP for each of the following cases,
(a)
\[
A = \begin{pmatrix} 1 & -2 & 1 \\ -1 & 3 & -2 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ -3 \end{pmatrix}, \quad
c = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}.
\]
(b)
\[
A = \begin{pmatrix} 2 & 4 & 2 \\ -1 & -3 & -2 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad
c = \begin{pmatrix} 27 \\ 2 \\ -6 \end{pmatrix}.
\]
(c)
\[
A = \begin{pmatrix} 1 & 1 & 1 & -1 & 0 \\ 0 & 1 & 2 & 0 & 1 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad
c = \begin{pmatrix} 0 \\ 1 \\ 0 \\ -1 \\ 0 \end{pmatrix}.
\]
(d)
\[
A = \begin{pmatrix} -1 & 1 & 0 & 0 & -1 \\ 1 & 0 & -1 & 1 & 2 \end{pmatrix}, \quad
b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} 2 \\ 0 \\ 2 \\ 0 \\ 3 \end{pmatrix}.
\]
(e)
\[
A = \begin{pmatrix} 2 & -1 & 4 & -2 & 1 \\ -1 & 0 & -3 & 1 & -1 \end{pmatrix}, \quad
b = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad
c = \begin{pmatrix} -3 \\ -1 \\ 1 \\ 4 \\ 7 \end{pmatrix}.
\]
2.7 Pivoting
The Simplex algorithm requires us to reformulate the problem in canonical form for every ba-
sis. We can describe the computation required between two consecutive iteration in a compact
way. We explain this next. Let T be an m × n matrix and consider (i, j), where i ∈ {1, . . . , m}
and j ∈ {1, . . . , n}, such that Ti,j ≠ 0. We say that the matrix T′ is obtained from T by pivoting on
element (i, j) if T′ is defined as follows: for every row index k,
\[
\mathrm{row}_k(T') = \begin{cases} \tfrac{1}{T_{i,j}}\,\mathrm{row}_i(T) & \text{if } k = i, \\[4pt] \mathrm{row}_k(T) - \tfrac{T_{k,j}}{T_{i,j}}\,\mathrm{row}_i(T) & \text{if } k \ne i. \end{cases}
\]
Observe, that the effect of pivoting on element (i, j) is to transform column j of the matrix T
into the vector ei (the vector where all the entries are zero except for entry i which is equal to
1).
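Pivoting is a two-line row operation. The following sketch implements it exactly as defined above and, as a check, reproduces the step from T¹ to T² that appears later in this section.

```python
import numpy as np

def pivot(T, i, j):
    """Return the tableau obtained from T by pivoting on element (i, j):
    row i is scaled so entry (i, j) becomes 1, and a multiple of row i is
    subtracted from every other row so that column j becomes the unit vector e_i."""
    T = np.asarray(T, dtype=float).copy()
    T[i] = T[i] / T[i, j]
    for k in range(T.shape[0]):
        if k != i:
            T[k] = T[k] - T[k, j] * T[i]
    return T

# Pivoting T^1 (see below) on element (2, 1) yields T^2.
T1 = [[1, -2, -3, 0, 0, 0, 0],
      [0, 1, 1, 1, 0, 0, 6],
      [0, 2, 1, 0, 1, 0, 10],
      [0, -1, 1, 0, 0, 1, 4]]
print(pivot(T1, 2, 1))
```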
The students should verify that T′ = Y T, where
\[
Y = \begin{pmatrix} 1 & 1/2 & 0 \\ 0 & 1/2 & 0 \\ 0 & 1 & 1 \end{pmatrix}.
\]
Consider now the system of equations
\[
\begin{pmatrix} 2 & 2 & -1 & 0 \\ 0 & 3 & 2 & 3 \\ 3 & -1 & -2 & 1 \end{pmatrix} x = \begin{pmatrix} 3 \\ -5 \\ 5 \end{pmatrix}. \tag{2.29}
\]
We can represent this system by the matrix T . Namely, T is obtained from the coefficients
of the left hand side by adding an extra column corresponding to the right hand side. Given
T′ we may construct a system of equations where we do the aforementioned operations in
reverse, namely, all but the last column of T′ correspond to the left hand side and
the last column corresponds to the right hand side. Then we get
\[
\begin{pmatrix} 2 & 7/2 & 0 & 3/2 \\ 0 & 3/2 & 1 & 3/2 \\ 3 & 2 & 0 & 4 \end{pmatrix} x = \begin{pmatrix} 1/2 \\ -5/2 \\ 0 \end{pmatrix}. \tag{2.30}
\]
Since T′ = Y T it follows that equation (2.30) is obtained from equation (2.29) by left multi-
plying by the matrix Y . Observe that the matrix Y is non-singular, hence the set of solutions
for (2.30) is the same as for (2.29). Hence, we used pivoting to derive an equivalent system of
equations.
We can proceed in a similar way in general. Given a system Ax = b we construct a matrix
T by adding to A an extra column corresponding to b, i.e. T = (A | b). Let T′ = (A′ | b′) be
obtained from T by pivoting. Then T′ = Y T for some non-singular matrix Y. It follows that
A′ = Y A and b′ = Y b. So, in particular, Ax = b and A′x = b′ have the same set of solutions.
We say that T is the tableau representing the system Ax = b.
To show how this discussion relates to the Simplex algorithm, let us revisit example (2.15),
Note, the first constraint of (2.31) states that z − 2x1 − 3x2 = 0, i.e. that z = 2x1 + 3x2, which
is the objective function. Let T 1 be the tableau representing the system (2.31), namely,
\[
T^1 = \begin{pmatrix}
1 & -2 & -3 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 6 \\
0 & 2 & 1 & 0 & 1 & 0 & 10 \\
0 & -1 & 1 & 0 & 0 & 1 & 4
\end{pmatrix}.
\]
Note, for readability the vertical bars separate the z columns and the right hand side. The
horizontal separates the objective function from the equality constraints. Let us index the
column z by element 0 and the constraint corresponding to the objective function by 0 as well.
Thus the first row and first column of T 1 are row and column zero. Observe, that the linear
program is in canonical form for B = {3, 4, 5} as the columns of T 1 indexed by {0, 3, 4, 5},
form an identity matrix.
In general given the linear program,
max{z = z̄ + cT x : Ax = b, x ≥ 0} (P)
Let us try to solve the linear program working with the tableau T 1 only. We select as
an entering variable k ∈ N such that ck > 0. In terms of T¹ this means that we are selecting a
column k ∈ {1, . . . , 5} where T¹₀,k < 0. We can select column 1 or 2; say we select column
k = 1. Note that b corresponds to column 6 (rows 1 to 3) of T¹. We select the row index
i ∈ {1, 2, 3} minimizing the ratio T¹ᵢ,₆ / T¹ᵢ,k over the rows where T¹ᵢ,k > 0, i.e. we consider
\[
\min\left\{ \frac{6}{1},\ \frac{10}{2},\ - \right\},
\]
where the minimum is attained for row i = 2. Let us now pivot on the element (i, k) = (2, 1).
We obtain the following tableau,
\[
T^2 = \begin{pmatrix}
1 & 0 & -2 & 0 & 1 & 0 & 10 \\
0 & 0 & 1/2 & 1 & -1/2 & 0 & 1 \\
0 & 1 & 1/2 & 0 & 1/2 & 0 & 5 \\
0 & 0 & 3/2 & 0 & 1/2 & 1 & 9
\end{pmatrix}.
\]
Since we pivoted on (2, 1), column 1 of T 2 will have a 1 in row 2 and all other elements
will be zero. Since row 2 has zeros in columns 0, 3, 5 these columns will be unchanged in
T 2 . It follows that columns 0, 1, 3, 5 will form a permutation matrix in T 2 . We could permute
rows 1,2 of T 2 so that columns indexed by 0, 1, 3, 5 form an identity matrix, and therefore that
T 2 represents an LP in canonical form for the basis B = {1, 3, 5}. However, reordering the
rows will prove unnecessary in this procedure. The Simplex procedure consists of selecting a
column j, selecting a row i and pivoting on (i, j).
We state the remaining sequence of tableaus obtained to solve (2.15).
\[
T^3 = \begin{pmatrix}
1 & 0 & 0 & 4 & -1 & 0 & 14 \\
0 & 0 & 1 & 2 & -1 & 0 & 2 \\
0 & 1 & 0 & -1 & 1 & 0 & 4 \\
0 & 0 & 0 & -3 & 2 & 1 & 6
\end{pmatrix},
\]
and finally,
\[
T^4 = \begin{pmatrix}
1 & 0 & 0 & 5/2 & 0 & 1/2 & 17 \\
0 & 0 & 1 & 1/2 & 0 & 1/2 & 5 \\
0 & 1 & 0 & 1/2 & 0 & 1/2 & 1 \\
0 & 0 & 0 & -3/2 & 1 & 1/2 & 3
\end{pmatrix}.
\]
The corresponding objective function is now z = 17 − (5/2)x3 − (1/2)x5. Hence the basic solution
x = (1, 5, 0, 3, 0)T is optimal.
2.8 Geometry
In this section, we introduce a number of geometric concepts and will interpret much of the
material defined in the previous section through the lens of geometry. Questions that we will
address include: what can we say about the shape of the set of solutions to a linear program,
how are basic feasible solutions distinguished from the set of all feasible solutions, what does
that say about the simplex algorithm.
max (c1 , c2 )x
s.t.
\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} x \le
\begin{pmatrix} 3 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix}
\qquad
\begin{matrix} (1) \\ (2) \\ (3) \\ (4) \\ (5) \end{matrix}
\tag{2.32}
\]
We represented the set of all feasible solutions to (2.32) in the following figure. The set of
all points (x1, x2)ᵀ satisfying constraint (2) with equality corresponds to line (2). The set of
all points satisfying constraint (2) corresponds to all points on or below line (2). A similar
argument holds for constraints (1), (3), (4) and (5). Hence, the set of all feasible solutions of
(2.32) is the shaded region (called the feasible region). Looking at examples in R2 as above
can be somewhat misleading however. In order to get the right geometric intuition we need
to introduce a number of definitions. Given a vector x = (x1, . . . , xn), the Euclidean norm of x is defined as
\[
\|x\| := \sqrt{x_1^2 + \cdots + x_n^2}.
\]
‖x‖ is the length of the vector x and measures the distance of x from the origin 0.
Remark 2.13. Let a, b ∈ Rn . Then, aT b = kakkbk cos(θ ), where θ is the angle between a and
b. Therefore, for every pair of nonzero vectors a, b, we have
\[
\cos(\theta) = \frac{a^T b}{\|a\|\,\|b\|}.
\]
[Figure 2.1: Feasible region of (2.32), bounded by the lines corresponding to constraints (1)–(5).]
Let a be a non-zero vector with n entries and let β be a real number. Then:
1. H := {x ∈ Rⁿ : aᵀx = β} is a hyperplane, and
2. F := {x ∈ Rⁿ : aᵀx ≤ β} is a halfspace.
Hence, H is the set of points satisfying the constraint aᵀx ≤ β with equality and F is the set of points
satisfying that constraint. Suppose that x̄ ∈ H and let x be any other point in H. Then aᵀx̄ =
aT x = β . Equivalently, aT (x − x̄) = 0, i.e. a and x − x̄ are orthogonal. This implies (1) in the
following remark; we leave (2) as an exercise.
1. H is the set of points x for which a and x − x̄ are orthogonal.
2. F is the set of points x for which a and x − x̄ form an angle of at least 90°.
We illustrate the previous remark in the following figure. The line is the hyperplane H and
the shaded region is the halfspace F.
[Figure: the hyperplane H (a line), the halfspace F (shaded), and the vector a at a point x̄ ∈ H.]
dim{x : Ax = 0} + rank(A) = n.
2.8.2 Convexity
Let x⁽¹⁾, x⁽²⁾ be two points in Rⁿ. We define the line through x⁽¹⁾ and x⁽²⁾ to be the set of points
\[
\{\, \lambda x^{(1)} + (1-\lambda)x^{(2)} \ :\ \lambda \in \mathbb{R} \,\},
\]
and we define the line segment with ends x⁽¹⁾ and x⁽²⁾ to be the set of points
\[
\{\, \lambda x^{(1)} + (1-\lambda)x^{(2)} \ :\ 0 \le \lambda \le 1 \,\}.
\]
Observe that the aforementioned definitions correspond to the commonly used notions of lines
and line segments. A subset C of Rn is said to be convex if for every pair of points x(1) and
x(2) in C the line segment with ends x(1) , x(2) is included in C.
2.8. GEOMETRY 105
The shaded regions in figure (i) contained in R2 is convex, as is the shaded region (iii)
contained in R3 . The shaded regions corresponding to figures (ii) and (iv) are not convex. We
prove this for either case by exhibiting two points x(1) , x(2) inside the shaded region for which
the line segment with ends x(1) , x(2) is not completely included in the shaded region.
x (1) x (2)
x (2)
x (1)
(i) (ii) (iii) (iv)
Hence, x̄ ∈ H as required.
Remark 2.16. For every j ∈ J let C j denote a convex set. Then the intersection
\
C := {C j : j ∈ J}
Proof. Let x(1) and x(2) be two points that are in C. Then for every j ∈ J, x(1) , x(2) ∈ C j and
since C j is convex the line segment between x(1) and x(2) is in C j . It follows that the line
segment between x(1) and x(2) is in C. Hence, C is convex.
Note, that the unit ball is an example of a convex set that is not a polyhedron. It is the
intersection of an infinite number of halfspaces (hence, convex) but cannot be expressed as
the intersection of a finite number of halfspaces.
106 CHAPTER 2. SOLVING LINEAR PROGRAMS
x = λ x(1) + (1 − λ )x(2)
In figures (i) and (ii) the shaded regions represent convex sets
included in R2 . In (i) we indicate each of the six extreme points
by small dark circles. Note that for (ii) every point in the bound-
ary of the shaded figure is an extreme point. This shows in par- (i)
ticular that a convex set can have an infinite number of extreme
points. In figure (ii) we illustrate why the point x is not extreme
x (1) x
by exhibiting a line segment with ends x(1) , x(2) which are con- x (2)
tained in the shaded figure and that properly contains x.
(ii)
Next, we present a theorem that will characterize the extreme
points in a polyhedron. We first need to introduce some notation
and definitions. Let Ax ≤ b be a system of inequalities and let x̄ denote a solution to Ax ≤ b.
We say that a constraint of aT x ≤ β of Ax ≤ b is tight for x̄ if aT x̄ = β . Such constraints are
also called active in part of the literature. We denote the set of all inequalities among Ax ≤ b
that are tight for x̄ by A= x = b= .
We will illustrate this theorem on the polyhedron P that is a feasible region of the linear
program (2.32), see Figure (2.1). Suppose x̄ = (1, 2)T . We can see in that figure that x̄ is an
extreme point. Let us verify that this is what the previous theorem also indicates. Constraints
(1) and (2) are tight, hence,
= 1 1
A = .
0 1
2.8. GEOMETRY 107
It follows that rank(A= ) = 2 = n, hence x̄ is indeed an extreme point. Suppose x̄ = (0, 1)T . We
can see in that figure that x̄ is not an extreme point. Let us verify that this is what the previous
theorem also indicates. Constraint (4) is the only tight constraint, hence,
A= = −1 0 .
Proof of Theorem 2.19. Assume that rank(A= ) = n. We will show that x̄ is an extreme point.
Suppose for a contradiction this is not the case. Then there exist (see Remark 2.18) x(1) , x(2) ∈
P, where x(1) 6= x(2) and λ where 0 < λ < 1 for which x̄ = λ x(1) + (1 − λ )x(2) . Thus,
b= = A= x̄ = A= λ x(1) + (1 − λ )x(2) = |{z}
λ A = (1)
x(2)} ≤ λ b= + (1 − λ )b= = b= .
x } + (1 − λ ) |A={z
| {z | {z }
>0 ≤b= >0 ≤b=
Hence, we have equality throughout which implies that A= x(1) = A= x(2) = b= . As rank(A= ) = n
there is a unique solution to A= x = b= . Therefore, x̄ = x(1) = x(2) , a contradiction.
Assume that rank(A= ) < n. We will show that x̄ is not an extreme point. Since rank(A= ) < n
the columns of A are linearly dependent, and hence there is a non-zero vector d such that
A= d = 0. Pick ε > 0 small and define
Hence, x̄ = 21 x(1) + 12 x(2) and x(1) , x(2) are distinct. It follows that x̄ is in the line segment
between x(1) and x(2) . It remains to show that x(1) , x(2) ∈ P for ε > 0 small enough. Observe
first that
A= x(1) = A= (x̄ + εd) = |{z} A= d = b= .
A= x̄ +ε |{z}
=b= =0
Similarly, A =
x(2) = b . Let
=
aT x ≤ β be any of the inequalities of Ax ≤ b that is not in A= x ≤ b= .
It follows that for ε > 0 small enough,
Note that x̄ = (0, 0, 2, 1)T is a basic feasible solution. We claim that x̄ is an extreme point of
P. To be able to apply Theorem 2.19 we need to rewrite P as the set of solutions to Ax ≤ b for
some matrix A and vector b. This can be done by choosing,
1 3 1 0 2
2 2 0 1 1
−1 −3 −1 0 −2
−2 −2 0 −1 −1
A= −1 0
and b=
0 .
0 0
0 −1 0 0 0
0 0 −1 0 0
0 0 0 −1 0.
Let A= x ≤ b= be the set of tight constraints for x̄, then
1 3 1 0
2 2 0 1
−1 −3 −1 0
A =
=
−2 −2 0 −1
−1 0 0 0
0 −1 0 0
and it can be readily checked that the first two and last two rows of A= form a set of 4 linearly
independent rows. Hence rank(A= ) ≥ 4 = n. This implies (as x̄ is feasible) by Theorem 2.19
that x̄ is an extreme point of P. Using the idea outlined in the previous example we leave
it as an exercise to prove the following theorem which relates basic feasible solutions (for
problems in standard equality form) to extreme points,
Theorem 2.20. Let A be a matrix where the rows are linearly independent and let b be a
vector. Let P = {x : Ax = b, x ≥ 0} and let x̄ ∈ P. Then x̄ is an extreme point of P if and only
if x̄ is a basic solution of Ax = b.
x2
10
(1)
z=17 8
(2)
6
x*
(3) -4 -2 0 2 4 6 8 x1
In Figure 2.2 we indicate the feasible region of this linear program as well as a feasible
solution x∗ = (1, 5)T . The line z = 17 indicates the set of all vectors for which the objective
function evaluates to 17. The points above this line have objective value greater than 17 and
the points below this line have value less than 17. Since there are no points in the feasible
region above the line z = 17, it follows that x∗ is an optimal solution. As the line z = 17
intersects the feasible region in only one point, namely x∗ , this also implies that x∗ is the
unique optimal solution.
Let us use the simplex algorithm to find this optimal solution x∗ . We first need to refor-
mulate this problem in standard equality form. This can be achieved by introducing slack
variables x3 , x4 , x5 for constraints respectively (1),(2) and (3) of (P). We obtain,
i.e. the components x̃3 , x̃4 , x̃5 are defined as the value of the slack of the constraints (1),(2) and
(3) respectively of (P). Thus, x is feasible for (P) if and only if x̃ is feasible for (P̃). Suppose
x is a feasible solution of (P) which is not an extreme point of the feasible region. Then x
is properly contained in the line segment with ends x(1) , x(2) where x(1) , x(2) are feasible for
(P), i.e. x(1) 6= x(2) and there exists λ such that 0 < λ < 1 and x = λ x(1) + (1 − λ )x(2) . It
can be readily checked that x̃ = λ x̃1 + (1 − λ )x̃2 . Hence, x̃ is properly contained in the line
segment with ends x̃1 , x̃2 . In particular, x̃ is not an extreme point of the feasible region of (P̃).
Conversely, if x̃ is not an extreme point for (P̃) then x is not an extreme point for (P). Hence,
Remark 2.21. x is an extreme point for the feasible region of (P) if and only if x̃ is an extreme
point for the feasible region of (P̃).
Starting in Section 2.3 we solved the linear program (P̃). The following table summarizes
the sequence of bases and basic solutions we obtained,
ITERATION BASIS x̃T xT
1 {3, 4, 5} (0, 0, 10, 6, 4) (0, 0)
2 {1, 4, 5} (5, 0, 0, 1, 9) (5, 0)
3 {1, 2, 5} (4, 2, 0, 0, 6) (4, 2)
4 {1, 2, 3} (1, 5, 3, 0, 0) (1, 5)
At each step, x̃ is a basic solution. It follows from Theorem 2.20 that x̃ must be an extreme
point of the feasible region of (P̃). Hence, by Remark 2.21, x must be an extreme point of the
feasible region of (P). We illustrate this in Figure 2.3. Each of x(1) , x(2) , x(3) , x(4) is an extreme
point and the simplex moves from one extreme point to another “adjacent” extreme point. In
this example, at each iteration, we move to a different basic feasible solution. The simplex
goes from one feasible basis to another feasible basis at each iteration. It is possible however,
that the corresponding basic solutions for two successive bases are the same. Thus the simplex
algorithm can keep the same feasible basic solution for a number of iterations. With a suitable
rule for the choice of entering and leaving variables the simplex will eventually move to a
different basic solution, see Section 2.5.3.
2.8. GEOMETRY 111
x2
10
6
x(4)
x(3)
2
x(1) x(2)
-4 -2 0 2 4 6 8 x1
Exercises
1. (a) Show that the set of all optimal solutions to a linear program is a convex set.
(b) Deduce that a linear program has either:
• no optimal solution,
• exactly one optimal solution, or
• an infinite number of optimal solutions.
Hint: Use the Cauchy-Schwartz inequality, namely, for every pair of vectors x, y we have,
xT y ≤ kxk kyk.
Show that the set of all n × n Positive Semidefinite matrices forms a convex set.
Let C be a subset of ℜn .
E XTRA CREDIT.
We say that x is a convex combination of points x1 , . . . , xk if x = ∑kj=1 λ j x j for some real
numbers λ1 , . . . , λk where ∑kj=1 λ j = 1 and λ j ≥ 0 for all j = 1, . . . , k.
2.8. GEOMETRY 113
(d) Show that if for all x1 , . . . , xk ∈ C, all convex combinations of x1 , . . . , xk are in C, then
C is convex.
(e) Show that, if C is convex, then for all x1 , . . . , xk ∈ C, all convex combinations of
x1 , . . . , xk are in C.
7. For each of the following sets, either prove that it is not a polyhedron, or give a matrix A
and a vector b such that the set is the solution set to Ax ≤ b,
(a) Show that if x̄ is not an extreme point of C then C \ {x̄} is not convex.
(b) Show that if x̄ is an extreme point of C then C \ {x̄} is convex.
(a) Convert this LP into SEF in the standard way, and call this LP2. Solve LP2 by applying
the simplex algorithm. Make sure you start with the basis {3, 4}. In each iteration, for
the choice of the entering variable amongst all eligible variables, always choose the
one with the smallest index.
(b) Give a diagram showing the set of feasible solutions of the LP, and show the or-
der that the Simplex algorithm visits its extreme points. (For each extreme point
(x1 , x2 , x3 , x4 )T of LP2 visited by the Simplex algorithm, you should indicate the ex-
treme point (x1 , x2 )T of the original LP.
In this chapter we revisit the shortest path and minimum-cost matching problems. Both were
first introduced in Chapter 1, where we discussed practical example applications. We further
showed that these problems can be expressed as IPs. The focus in this chapter will be on
solving instances of the shortest-path and matching problems. Our starting point will be to use
the IP formulation we introduced in Chapter 1.5 We will show that studying the two problems
through the lens of linear programming duality will allow us to design efficient algorithms.
We develop this theory further in Chapter 4.
Recall the shortest path problem from Chapter 1.4.1. We are given a graph G = (V, E), non-
negative lengths ce for all edges e ∈ E, and two distinct vertices s,t ∈ V . The length c(P) of
a path P is the sum of the length of its edges, i.e. ∑(ce : e ∈ P). We wish to find among all
possible st-paths one that is of minimum length.
Example 7. In the following figure we show an instance of this problem. Each of the edges
in the graph is labeled by its length. The thick black edges in the graph form an st-path
P = sa, ac, cb, bt of total length 3 + 1 + 2 + 1 = 7. This st-path is of minimum length, hence
is a solution to our problem.
115
116 CHAPTER 3. DUALITY THROUGH EXAMPLES
a 4 b
3 1
1 2
s t
4 2
c 2 d
d
h
s t
j i g
Let us show that the path P = s j, ji, ig, gt is a shortest st-path of G. It has length 4. To show
that it is a shortest path, it suffices to show that every st-path has length at least 4. To do this
we exhibit the following collection of st-cuts (see the next figure),
where,
U1 = {s}, U2 = {s, a, j}, U3 = {s, a, j, b, h, i}, and U4 = V \ {t}.
Observe, that no two of these st-cuts share an edge. Let Q be an arbitrary st-path. We know
from Remark 1.1 that every st-path and st-cut have a common edge. Hence, Q must contains
an edge ei of δ (Ui ), for i = 1, 2, 3, 4. Since these st-cuts are disjoint, e1 , e2 , e3 , e4 are distinct
edges. In particular, Q contains at least 4 edges. As Q was an arbitrary st-path, every st-path
contains at least 4 edges. It follows that P = s j, ji, ig, gt is a shortest st-path of G. The
a b c
d
h
s t
δ(U1 ) δ(U4 )
j i g
δ(U2 ) δ(U3 )
Feasibility condition. For every edge e ∈ E, the total width of all st-cuts that (3.1)
contain e does not exceed the length of e.
Example 7, continued. We assign the following widths for the graph G in Example 7.
ȳU1 = 3 ȳU2 = 1
ȳU3 = 2 ȳU4 = 1
where
U1 = {s}, U2 = {s, a}, U3 = {s, a, c}, and U4 = {s, a, c, b, d}.
1 while it might be more natural to use the notation ȳδ (U) to denote the width of the st-cut δ (U) the notation
ȳU is more compact while being unambiguous.
118 CHAPTER 3. DUALITY THROUGH EXAMPLES
The other st-cuts are assigned a width of zero. The non-zero widths are represented in the
following figure. We claim that the widths are feasible. Let us check the condition for edge
y U2 = 1
a 4 b
3 1
1 2
s t
4 2
y U1 =3 2
c d y U4 = 1
y U3 = 2
ab for instance. The two st-cuts that have non-zero width and that contain ab are δ (U2 ) and
δ (U3 ) which have respectively width ȳU1 = 1 and ȳU2 = 2. Thus the total width of all st-cuts
that contain the edge ab is 1 + 2 = 3 which does not exceed the length cab = 4 of ab. We leave
it as an exercise to verify the feasibility condition for every other edge of the graph.
Suppose now we have a graph G = (V, E) (with s,t ∈ V and ce ≥ 0 for all e ∈ E) that has
feasible widths. Let Q denote an arbitrary st-path. By the feasibility condition, for every edge
e of Q, the total width of all st-cuts using e is at most ce . It follows that the total width of all
st-cuts using some edge of Q is at most ∑(ce : e ∈ Q), i.e. the length of Q. We know however
from Remark 1.1 that every st-cut of G contains some edge of Q. Hence, the length of Q is at
least equal to the total width of all st-cuts. We summarize this result,
Proposition 3.1 (Optimality Conditions for Shortest Paths). If the widths are feasible, then
the total width of all st-cuts is a lower bound on the length of any st-path. In particular, if an
st-path has length equal to the total width of all st-cuts, then it is a shortest st-path.
Example 7, continued. We can now prove that the path P = sa, ac, cb, bt is a shortest st-path.
We found a set of feasible widths, with total width,
Exercises
1. Let G = (V, E) be a graph and let s,t be distinct vertices of G.
(a) Show that if we have a collection of k edge disjoint st-paths then every st-cut contains
at least k edges.
(b) For the following graph find an st-cut with as few edges as possible. Use (a) to justify
your answer.
a f
e
c
s
h
d t
b
g
Suppose now each edge e has a non-negative thickness ce . The thickness of an st-cut is
the sum of the thickness of all the edge in the cut. Suppose we assign to each st-path P a
non-negative width yP and assume that the following condition holds for every edge e:
The total width of all the st-paths using e does not exceed the thickness of edge e.
(c) Show then that the total width of all the st-paths is a lower bound on the thickness of
any st-cut.
(d) For the following graph find an st-cut with minimum thickness. Edge labels indicate
thickness. Use (c) to justify your answer.
a f
2 6
4 e
2
c
2 4
s 4
7
3 h
3 3
d t
5
5
5 4 6
b 4
g
120 CHAPTER 3. DUALITY THROUGH EXAMPLES
where
2 1 20
2
A = 1 1 b = 18 c= .
3
−1 1 8
It is easy to verify that the vectors (8, 16)T and (5, 13)T are all feasible for the linear pro-
gram (3.2). Their objective values are 64 and 49, respectively, and hence the first feasible
solution is clearly not optimal. We will show however that (5, 13)T is an optimal solution by
proving that z(x̄) ≥ 49 for every feasible solution x̄.
How can we find (and prove) such a lower bound? Recall from Chapter 2, that every
feasible solution of (3.2) must satisfy the inequality
2 1 20
(y1 , y2 , y3 ) 1 1 x ≥ (y1 , y2 , y3 ) 18 . (3.3)
−1 1 8
for any non-negative vector y = (y1 , y2 , y3 )T . If we choose values, ȳ1 = 0, ȳ2 = 2 and ȳ3 = 1
we obtain the inequality,
Adding this inequality to the objective function z(x) = (2, 3)x yields,
Let x̄ be any feasible solution. As x̄ ≥ 0 and (1, 0) ≥ 0T we have (1, 0)x̄ ≥ 0. Hence, z(x̄) ≥ 44.
Thus, we have proved that no feasible solution has value smaller than 44. Note, this is not
quite sufficient to prove that (5, 13)T is optimal. It shows however, that the optimum value for
3.1. THE SHORTEST PATH PROBLEM 121
(3.2) is between 44 and 49. It is at most 49, as we have a feasible solution with that value, and
it cannot be smaller than 44 by the previous argument.
Let us search for y1 , y2 , y3 ≥ 0 in a systematic way. We rewrite (3.3) as
20 2 1
0 ≥ (y1 , y2 , y3 ) 18 − (y1 , y2 , y3 ) 1 1 x
8 −1 1
Then for any feasible solution x̄, inequality (3.4) and the fact that x ≥ 0 implies that
20
z(x̄) ≥ (y1 , y2 , y3 ) 18 .
8
For a minimization problem the larger the lower bound the better. Thus, the best possible
upper bound for (3.2) we can achieve using the above argument is given by the optimal value
to the following linear program,
20
max (y1 , y2 , y3 ) 18
8
subject to
2 1
(2, 3) − (y1 , y2 , y3 ) 1 1 ≥ 0T
−1 1
y1 , y2 , y3 ≥ 0
122 CHAPTER 3. DUALITY THROUGH EXAMPLES
min{cT x : Ax ≥ b, x ≥ 0} (3.6)
yT Ax ≥ yT b.
This last inequality is obtained from Ax ≥ b by multiplying the 1st inequality by y1 ≥ 0 the
2nd by y2 ≥ 0 the 3rd by y3 ≥ 0 etc, and by adding all of the resulting inequalities together.
This inequality can be rewritten as
0 ≥ yT b − yT Ax
which holds for every feasible solution x̄ of (3.6). Thus, adding this inequality to the objective
function z(x) = cT x yields,
The best lower bound we can get in this way is the optimal value of
Theorem 3.2 (Weak Duality - Special form). Consider the following pair of linear programs,
min{cT x : Ax ≥ b, x ≥ 0} (P)
max{bT y : AT y ≤ c, y ≥ 0} (D)
Let x̄ be a feasible solution for (P) and ȳ be a feasible solution for (D). Then cT x̄ ≥ bT ȳ.
Moreover, if equality holds then x̄ is an optimal solution for (P).
Proof of Theorem 3.2. Let x̄ be a feasible solution of (P) and let ȳ be a feasible solution (D).
Then
bT ȳ = ȳT b ≤ yT (Ax̄) = (yT A)x̄ = (AT ȳ)T x̄ ≤ cT x̄.
The first inequality follows from the fact that ȳ ≥ 0 and that Ax̄ ≥ b. The second inequality
follows from the fact that AT ȳ ≤ c and that x̄ ≥ 0. Finally, as bT ȳ is a lower bound on (P), if
cT x̄ = bT ȳ it follows that x̄ is optimal for (P).
Example 9. In the following figure we show a simple instance of the shortest path problem.
Each of the edges in the graph is labeled by its length.
124 CHAPTER 3. DUALITY THROUGH EXAMPLES
a
3 2
s 1 t
4 2
Let x := (xsa , xsb , xab , xat , xbt )T . Then (3.9) specializes to the following in this case,
Each of the first 4 constraints correspond to one of the 4 distinct st-cuts δ (U). For instance
the second constraint, corresponding to δ ({s, a}) = {sb, ab, at}, states that xsb + xab + xat ≥ 1.
For an integer program (IP) we call the linear program, obtained by removing the condition
that some variables have to take integer values, the linear programming relaxation of (IP).
Consider an instance of the shortest path problem G = (V, E) with s,t ∈ V and ce ≥ 0 for all
e ∈ E. Suppose P is some arbitrary st-path of G. Then we can construct a feasible solution x̄
to the integer program (3.9) as follows:
1 if e is an edge of P
x̄e = (3.11)
0 otherwise.
Moreover, the length c(P) of the st-path P is equal to the value of x̄ for (3.9). It follows in
particular that c(P) is greater of equal to the optimal value of (3.9). Let (P) denote the linear
programming relaxation of (3.9). Since the constraints of (P) are a subset of the constraints
of (3.9) and since (3.9) is a minimization problem, the optimal value of (3.9) is greater or
3.1. THE SHORTEST PATH PROBLEM 125
equal to the optimal value of (P). Let (D) be the dual of (P). We know from the Weak duality
theorem 3.2 that the optimal value of (P) is greater or equal to the value of any feasible solution
ȳ of (D). Hence, we have proved the following result,
Remark 3.3. If (D) is the dual of the linear programming relaxation of (3.9) then the value
of any feasible solution of (D) is a lower bound on the length of any st-path.
Example 9, continued. Let us compute the dual of the linear programming relaxation of (3.10),
as defined in Theorem 3.2. Observe that by taking the dual we interchange the role of the vari-
ables and the constraints. We now have variables yU for each st-cut δ (U) and one constraint
T
for each edge of G. Hence, let us define y := y{s} , y{s,a} , y{s,b} , y{s,a,b} . The dual is given by
max 1T y
subject to
{s} {s, a} {s, b} {s, a, b}
sa 1 0 1 0
3
sb 1 1 0 0 (3.12)
4
ab 0 1 1 0 y ≤ 1 ,
2
at 0 1 0 1
2
bt 0 0 1 1
y≥0
where 1 denotes a column vector of 1s. Note, a feasible solution to (3.12) assigns some non-
negative width yU to every st-cut δ (U). The constraint for edge sb states that y{s} + y{s,a} ≤
4. Observe that δ ({s}) = {sa, sb} and δ ({s, a}) = {at, ab, sb} are the two st-cuts of G that
contain edge sb. Finally, 4 is the length of the edge sb. Hence the constraint y{s} + y{s,a} ≤ 4
says that the total width of all st-cuts of G that contain edge sb does not exceed the length of
sb. The corresponding condition holds for every other edge. Hence, if y is feasible for (3.12)
the widths y satisfy the Feasibility Conditions (3.1). The objective function 1T y calculates the
total width of all st-cuts. Hence, it follows from Remark 3.3 that if the width are feasible, then
the total width of all st-cuts is a lower bound on the length of any st-path. This was precisely
the statement of Proposition 3.1.
Consider now a general instance of the shortest path problem. We are given a graph
G = (V, E), vertices s,t ∈ V , and lengths ce ≥ 0 for all e ∈ E. We can rewrite the linear
126 CHAPTER 3. DUALITY THROUGH EXAMPLES
where c is the vector of edge lengths, and the matrix A is defined as follows:
Remark 3.4.
Let us try to understand this dual. There is a variable yU for every st-cut δ (U) and a constraint
for every edge e ∈ E. Consider the constraint for edge e ∈ E. The right-hand side of this
constraint is the length ce of the edge, and the left-hand side corresponds to column e of A.
Remark 3.4(2) implies that the left-hand side of this constraint is the sum of the variables yU
over all st-cuts δ (U) that contain e. We can therefore rewrite (3.14) as follows:
max ∑ yU : δ (U) is an st-cut
subject to
(3.15)
y
∑ U : δ (U) is an st-cut containing e ≤ ce (e ∈ E)
yU ≥ 0 (δ (U) is an st-cut)
A feasible solution ȳ to (3.15) assigns non-negative width ȳU to every st-cut δ (U). The
constraint for each edge e states that the total width of all st-cuts of G that contain edge e does
not exceed the length of e. Hence, if ȳ is feasible for (3.15) the widths ȳ satisfy the Feasibility
3.1. THE SHORTEST PATH PROBLEM 127
Conditions (3.1). The objective function 1T y calculates the total width of all st-cuts. Hence,
as in the previous example, it follows from Remark 3.3 that if the width are feasible, then the
total width of all st-cuts is a lower bound on the length of any st-path. Hence, we now have
an alternate proof for Proposition 3.1.
While the derivation of Proposition 3.1 using duality may, at first glance, seem more tech-
nical than the ad-hoc argument we had in Section 3.1.1, notice that our derivation was com-
pletely mechanical: We formulated the shortest path problem as an integer program, wrote
the dual of its linear programming relaxation, and used weak duality to obtain bounds on the
possible values of our original optimization problem. After generalizing the notion of duals
in Chapter 4, we will be able to apply the aforementioned strategy to arbitrary optimization
problems that can be formulated as integer programs. A word of caution, however: The qual-
ity of the bounds obtained through this procedure depend on the problem, and on the linear
programming relaxation used. In the shortest path example discussed here, the bound pro-
duced was sufficient to prove optimality of a certain primal solution. This may not always be
possible as we will see in Chapter 6.
Exercises
1. A vertex cover of a graph G = (V, E) is a set S of vertices of G such that each edge of G is
incident with at least one vertex of S . The following IP finds a vertex cover of minimum
cardinality (see Problem 2 in Chapter 1.5.2),
min ∑ xe : e ∈ E
subject to
xi + x j ≥ 1 (for all i j ∈ E)
x≥0
(b) Show that the largest size (number of edges) of a matching in G is a lower bound on
the size of a minimum vertex cover of G.
H INT: Use part (a) and Theorem 3.2.
128 CHAPTER 3. DUALITY THROUGH EXAMPLES
(c) Give an example where all matchings have size strictly less than the optimal solutions
of (P) and of (D) and where these are strictly less than the size of the minimum vertex
cover.
2. The following graph has vertices of two types, a set H = {1, 2, 3, 4, 5, 6, 7} of hubs, indi-
cated by filled circles, and a set C = {a, b, c, d, e, f } of connectors, indicated by squares.
a 2 d
5
b 3
1 e 7
6
c 4 f
A subset S of the hubs is dominant if each connector in C is has an edge to least one hub in
S. For instance, S = {1, 3, 7} is dominant because a, b and c have edges to 1, d and f have
edges to 7, and e has an edge to 3.
3.1.4 An algorithm
We will present an algorithm to solve the shortest path problem based on the optimality condi-
tions given in Proposition 3.1. Before we do so, however, we require a number of definitions.
3.1. THE SHORTEST PATH PROBLEM 129
such that v1 = s, vk = t and vi 6= v j for all i 6= j. In other words a directed st-path is an st-path
in the graph obtained by ignoring the orientation, with the additional property that for any two
consecutive arcs, the head of the first arc, is the tail of the second arc. The following figure
gives an example of a graph with both edges and arcs. The thick arcs form a directed st-path.
a b
c d
t
g s
Consider now a general instance of the shortest path problem. We are given a graph
G = (V, E), vertices s,t ∈ V , and lengths ce ≥ 0 for all e ∈ E. Suppose that ȳ is a feasible
solution to (3.15). In other words every st-cut δ (U) has a non-negative width ȳU ≥ 0 and the
Feasibility Condition (3.1) holds, namely, for every edge e the total width of all st-cuts using
e is smaller or equal to the length of e. We define the slack of edge e for ȳ to be the quantity,
slackȳ (e) := ce − ∑ ȳU : δ (U) is an st-cut containing e .
In other words, slackȳ (e) is the length of e minus the total width of all st-cuts using e.
We are now ready to describe our algorithm. 2
Example 10. Consider the shortest path problem described in Figure 3.1(i). We start with
ȳ = 0 and U = {s}. We compute the slack of the edges in δ ({s}),
Algorithm 2 Shortest-path.
Input: Graph G = (V, E), costs ce ≥ 0 for all e ∈ E, s,t ∈ V where s 6= t.
Output: A shortest st-path P
1: ȳW := 0 for all st-cuts δ (W ). Set U := {s}
2: while t 6∈ U do
3: Let ab be an edge in δ (U) of smallest slack for ȳ where a ∈ U, b ∈
/U
4: ȳU := slackȳ (ab)
5: U := U ∪ {b}
→
−
6: change edge ab into an arc ab
7: end while
8: return A directed st-path P.
and therefore sb is the edge in δ ({s}) of smallest slack for ȳ. We let y{s} = 2, we set U :=
→
−
{s, b} and change edge sb into an arc sb. See figure (ii).
Since t ∈
/ U = {s, b} we compute the slack of the edges in δ ({s, b}),
and therefore bc is the edge in δ ({s, b}) with smallest slack for ȳ. We let ȳ{s,b} = 1, we set
→
−
U = {s, b, c} and change edge bc into an arc bc. See figure (iii).
Since t ∈
/ U = {s, b, c} we compute the slack of the edges in δ ({s, b, c}),
and therefore ca is the edge in δ ({s, b, c}) with smallest slack for ȳ. We let ȳ{s,b,c} = 1, we set
U = {s, b, c, a} and change edge ca into an arc → −
ca. See figure (iv).
3.1. THE SHORTEST PATH PROBLEM 131
yU = 2 yU = 1 yU = 1
s 6 a s 6 a s 6 a
1 1 1
4 c 4 c 4 c
2 4 2 4 2 4
1 2 1 2 1 2
5 t 5 t 5
b b b t
(i) (ii) (iii)
s 6 a s 6 a
1 1
4 c 4 c
yU = 1 2 4 2 4
1 2 1 2
5 5 t
b t b
(iv) (v)
Since t ∈
/ U = {s, b, c, a} we compute the slack of the edges in δ ({s, b, c, a}),
slackȳ (at) = 4
slackȳ (ct) = 2 − y{s,b,c} = 2 − 1 = 1
slackȳ (bt) = 5 − y{s,b} − y{s,b,c} = 5 − 1 − 1 = 3,
and therefore ct is the edge in δ ({s, b, c, a}) with smallest slack for ȳ. We let ȳ{s,b,c,a} = 1, we
→
−
set U = {s, b, c, a,t} and change edge ct into an arc ct . See figure (v). Note, now that t ∈ U
→
− →− → −
and there exists a directed st-path P, namely P = sb, bc, ct . It can be readily checked that ȳ
are feasible width. Moreover, the total width of all st-cut is equal to 2 + 1 + 1 + 1 = 5 and the
length c(P) of P is equal to 2 + 1 + 2 = 5. It follows from Proposition 3.1 that P is a shortest
st-path.
We will prove that this simple algorithm is guaranteed to always find a shortest st-path in
the next section. Suppose after running the algorithm and finding a shortest path P we define
variables x̄ as in (3.11). Then x̄ is a feasible solution to the linear program (3.9) and ȳ is a
132 CHAPTER 3. DUALITY THROUGH EXAMPLES
feasible solution to the dual linear program (3.15). Moreover, we will see that the algorithm
will guarantee that the total width of all st-cuts is equal to the length of the shortest st-path.
In other words that the value of x̄ in (3.9) is equal to the value ȳ in (3.15). It follows from
Weak Duality (Theorem 3.2), that x̄ is an optimal solution to (3.9). Hence, correctness of the
algorithm will imply the following result,
Theorem 3.5. If ce ≥ 0 for all e ∈ E then the linear program (3.9) has an integral optimal
solution; i.e., it has an optimal solution all of whose variables have integer values.
Note that our shortest path algorithm preserves at each step a feasible solution to the dual
linear program (3.15). This is an example of a primal-dual algorithm. We will see axnother
example of such an algorithm in Chapter 5.1.
Exercises
1. For each of the following two graphs find the shortest path between s and t using the
algorithm described in the course. The number next to edge edge indicates the length of
the edge. Make sure to describe for each step of the algorithm which edge becomes an
arc, which vertex is added to the set U and what st-cut becomes active. At the end of the
procedure give the shortest st-path and certify that it is indeed a shortest st-path.
2
2 4
2
3 4
s
6 3 4 3 1
3 3 1 6
s 2 1 t
5
4 7 3
7
2 1 t
Proposition 3.6. Let ȳ be feasible widths and let P be an st-path. Then P is a shortest st-path
if both of the following conditions hold:
Proof. Suppose that P is an st-path that satisfies both (P1) and (P2) for feasible widths ȳ. For
every edge e of P, denote by Ce the set of all active st-cuts that contain edge e. To prove that P
is a shortest path it suffices (because of Proposition 3.1) to verify that the following relations
hold,
Total width of st-cuts = ∑ yU : δ (U) is an active st-cut
(a)
h i
= ∑ ∑ yU : δ (U) ∈ Ce
e∈P
(b)
= ∑ ce = length of P.
e∈P
Finally, note that (a) holds because (P2) and that (b) holds because (P1).
Note that the algorithm is guaranteed to terminate after at most |V | iterations since at every
step one vertex is added to the set U. Hence, to show that the algorithm is correct it will suffice
to verify the following result,
Claim. If (I1)-(I4) hold in S TEP 8 then a directed st-path P exists and it is a shortest st-path.
Proof of claim: Since the loop completed, t ∈ U and by (I4) there exists a directed st-path P.
We will show that P is a shortest st-path. (I1) states that ȳ is feasible. (I2) implies that all
arcs of P are equality arcs. Because of Proposi-
tion 3.6 it suffices to verify that active cuts for ȳ con-
tain exactly one edge of P. Suppose for a contradic- s
tion there is an active cut δ (U) that contains at least
two edges of P (it contains at least one edge because f
U
of Remark 1.1). Denote by f the second arc of P,
starting from s, that in δ (U). Then (see figure on the
right), the tail of f is not in U but the head of f is in
t
U, contradicting (I3). 3
We leave it as an exercise to verify that (I1)-(I5) hold at the end of S TEP 1. Assume that
(I1)-(I5) hold at the beginning of the LOOP we will show that (I1)-(I5) hold at the end of the
LOOP . Together with the Claim this will complete the proof. In the following argument U and
ȳ represent the quantities at the start of the LOOP and U 0 and y0 the end of the LOOP. The only
difference between ȳ and y0 is that ȳU = 0 and that yU
0 = slack (ab). Hence, to verify (I1) it
ȳ
suffices to consider f ∈ δ (U). Then
0
slacky0 ( f ) = c f − ∑ yU : δ (U) is an st-cut containing f
0
= c f − ∑ ȳU : δ (U) is an st-cut containing f − yU (?)
By the choice of ab in STEP 3, slackȳ ( f ) ≥ slackȳ (ab). Hence, (?) implies that slacky0 ( f ) ≥ 0.
Thus the feasibility condition (3.1) holds for y0 , i.e. (I1) holds at the end of the LOOP. By (?)
we have slacky0 (ab) = 0. Since ab is the only new arc between the start and the end of the
LOOP , (I2) holds at the end of the LOOP . The only cut that is active in y0 but not ȳ is δ (U).
By (I5) all arcs different from ab have both ends in U. Arc ab has tail in U and head outside
U. It follows that (I3) holds at the end of the LOOP . By (I4) there exists an sa-directed path
Q. Moreover, by (I5) all arcs of Q have both tail and head in U. Hence, b is distinct from all
vertices in Q, and adding arc ab at the end of Q gives a directed sb-path. It follows that (I4)
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 135
holds at the end of the LOOP. Finally, since the only new arc is ab and a, b ∈ U 0 , (I5) holds at
the end of the LOOP.
Exercises
1. Let G = (V, E) be a graph with vertices s and t and suppose there exists at least one st-path
in G. The cardinality case of the shortest path problem is the problem of finding a shortest
path with as few edges as possible.
a) Simplify the shortest path algorithm in Chapter 3.1.4 to deal with the cardinality case.
You should strive to make the resulting algorithm as simple as possible.
b) Simplify Proposition 3.6 for the cardinality case. Try to make the resulting statement
as simple as possible.
c) Simplify Proposition 3.7 for the cardinality case. Try to make the proof of correctness
as simple as possible.
2. Let G = (V, E) be a graph with distinct vertices s and t and non negative edge weights c.
a
3 6
1
g b
1
h 2
7 3
4 0
d c
1
Example 11 continued. Suppose that for every edge incident to vertex b we decrease the cost
of all edges incident to vertex b by the value 3, see figure 3.2(i). Since every perfect matching
has exactly one edge that is incident to vertex b this will decrease the cost of every perfect
matching by exactly 3. In particular, if a matching M is a minimum cost perfect matching
with these new costs, then it must be a minimum cost perfect matching with the original costs.
In figure 3.2(ii) we pick values for every vertex in the graph and repeat the same argument
for each vertex. For instance if as we choose value 3 for vertex b and value 2 for vertex a the
new cost of edge ab is given by 6 − 3 − 2 = 1. We indicate in (ii) the new cost for every edge.
Observe now, that for the graph given in (ii), every edge has non-negative cost. This implies
in particular, that every perfect matching has non-negative cost. However, as the matching
M = {ag, hb, cd} has cost 0, M must be a minimum cost perfect matching with these new
costs. But then M must be a minimum cost perfect matching with the original costs.
Let us try to extend these ideas to an arbitrary graph G = (V, E) with edge costs ce for all
e ∈ E. Let us assign to every vertex u, a number ȳu that we call the potential of the vertex u.
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 137
2
a a
6−
3 3 0 1
1 0
g b g b
1
h 3 3 1 h 0
3
2− 1
−1
3−3
7 3 2
4 0 2 3
d c d c
1 3 0 −2
(i) (ii)
Let M be a perfect matching. Since by definition M has exactly one edge that is incident to
every vertex u, the difference between the cost of M with costs c and costs c̄ is given by the
sum of all the potentials ȳu over every vertex u, i.e. by ∑ ȳu u ∈ V . Since this quantity is a
constant (for a fixed ȳ) it follows in particular that if M is a minimum cost perfect matching
with the reduced costs c̄, then it must be a minimum cost perfect matching with the original
costs c. If we selected the potentials ȳ so that every reduced costs has non-negative costs then
every perfect matching has non-negative costs. If in addition all edges in the perfect matching
M have zero reduced costs, then M must be a minimum cost perfect matching with respect to
the reduced costs c̄. But then, it means that M is a minimum cost perfect matching with the
original costs c.
We formalize this result by introducing a few definitions. Let we say that an edge uv is an
equality edge with respect to some potentials ȳ if its reduced cost c̄uv = cuv − ȳu − ȳv = 0. We
say that potentials ȳ are feasible if they satisfy the following,
Feasibility condition. For every edge e ∈ E, c̄uv = cuv − ȳu − ȳv ≥ 0. (3.16)
We have thus proved the following,
Proposition 3.8 (Optimality Condition for perfect matchings). If the potentials ȳ are feasible
and all edges of M are equality edges with respect to ȳ then M is a minimum cost perfect
matching.
138 CHAPTER 3. DUALITY THROUGH EXAMPLES
The arguments used to prove Proposition 3.8 are fairly elementary. We will see however,
that this result is surprisingly powerful. Indeed we will be able to design an efficient algorithm,
for the class of bipartite graphs, based on this optimality condition. Deriving good optimal
conditions is a key step in designing an algorithm.
where
4
−2 1 −1 0 −2 3
A = 9 2 1 1
b= 7 c=
−2 .
5 1 0 3 7
3
It is easy to verify that the vectors (1/2, 0, 1, 3/2)T and (0, 1, 3, 2)T are both feasible for the
linear program (3.17). Their objective values are 9/2 and 3, respectively, and hence the first
feasible solution is clearly not optimal. We will show however that (0, 1, 3, 2)T is an optimal
solution by proving that z(x̄) ≥ 3 for every feasible solution x̄.
How can we find (and prove) such a lower bound? As before, recall from Chapter 2 that,
for any vector y = (y1 , y2 , y3 )T ∈ R3 , the equation
−2 1 −1 0 −2
(y1 , y2 , y3 ) 9 2 1 1 x = (y1 , y2 , y3 ) 7
(3.18)
5 1 0 3 7
holds for every feasible solution of (3.17). In particular, if we choose values, ȳ1 = 1, ȳ2 = −1
and ȳ3 = 1 we obtain the equality,
Adding this equality to the objective function z(x) = (4, 3, −2, 3)x yields,
and add it to the objective function z(x) = (4, 3, −2, 3)x, to obtain,
−2 −2 1 −1 0
z(x) = (y1 , y2 , y3 ) 7 + (4, 3, −2, 3) − (y1 , y2 , y3 ) 9 2 1 1 x. (3.19)
7 5 1 0 3
Then for any feasible solution x̄, inequality (3.19) and the fact that x̄ ≥ 0 implies that
−2
z(x̄) ≥ (y1 , y2 , y3 ) 7 .
7
For a minimization problem the larger the lower bound the better. Thus, the best possible
upper bound for (3.17) we can achieve using the above argument is given by the optimal
value to the following linear program,
−2
max (y1 , y2 , y3 ) 7
7
subject to
−2 1 −1 0
(4, 3, −2, 3) − (y1 , y2 , y3 ) 9 2 1 1 ≥ 0T
5 1 0 3
y1 , y2 , y3 ≥ 0
140 CHAPTER 3. DUALITY THROUGH EXAMPLES
and this solution has objective value 3. Since solution (0, 1, 3, 2)T has value 3, it is optimal.
Let us generalize the previous argument and consider the following linear program,
min{cT x : Ax = b, x ≥ 0} (3.21)
yT Ax = yT b.
This last equality is obtained from Ax = b by multiplying the 1st equality by y1 the 2nd by y2
the 3rd by y3 etc, and by adding all of the resulting equalities together. This equality can be
rewritten as
0 = yT b − yT Ax
which holds for every feasible solution x̄ of (3.6). Thus, adding this equality to the objective
function z(x) = cT x yields,
The best lower bound we can get in this way is the optimal value of
where we note that the variables y are free (i.e., the variables are unrestricted). Let us summa-
rize our result and provide a more direct proof.
Theorem 3.9 (Weak Duality - Special form). Consider the following pair of linear programs,
min{cT x : Ax = b, x ≥ 0} (P)
max{bT y : AT y ≤ c}. (D)
Let x̄ be a feasible solution for (P) and ȳ be a feasible solution for (D). Then cT x̄ ≥ bT ȳ.
Moreover, if equality holds then x̄ is an optimal solution for (P).
Proof of Theorem 3.9. Let x̄ be a feasible solution of (P) and let ȳ be a feasible solution (D).
Then
bT ȳ = ȳT b = ȳT (Ax̄) = (ȳT A)x̄ = (AT ȳ)T x̄ ≤ cT x̄.
The inequality follows from the fact that AT ȳ ≤ c and that x̄ ≥ 0. Finally, as bT ȳ is a lower
bound on (P), if cT x̄ = bT ȳ it follows that x̄ is optimal for (P).
Example 13. In the following figure we show a simple instance of the perfect matching prob-
lem. Each of the edges in the graph is labeled by its cost.
142 CHAPTER 3. DUALITY THROUGH EXAMPLES
4
a
3 6
d 2 b
1 2
g
Let x := (xab , xbc , xdg , xad , xag , xbd )T . Then (3.24) specializes to the following in this case,
Each of the first 4 constraints correspond to one of the 4 vertex of G. For instance the second
constraint, corresponding to vertex b, states that xab + xbg = 1, i.e. that we should select
exactly one of the edges incident to b.
Recall that for an integer program (IP) we call the linear program, obtained by removing
the condition that some variables have to take integer values, the linear programming relax-
ation of (IP). Consider an instance of the minimum cost perfect matching problem G = (V, E)
with ce for all e ∈ E. Suppose M is some arbitrary perfect matching of G. Then we can
construct a feasible solution x̄ to the integer program (3.24) as follows:
1 if e is an edge of M
x̄e = (3.26)
0 otherwise.
Moreover, the cost c(M) of the perfect matching M is equal to the value of x̄ for (3.24). It
follows in particular that c(M) is greater or equal to the optimal value of (3.24). Let (P) denote
the linear programming relaxation of (3.24). Since the constraints of (P) are a subset of the
constraints of (3.24) and since (3.24) is a minimization problem, the optimal value of (3.24)
is greater or equal to the optimal value of (P). Let (D) be the dual of (P). We know from the
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 143
Weak duality theorem 3.9 that the optimal value of (P) is greater or equal to the value of any
feasible solution ȳ of (D). Hence, we have proved the following result,
Remark 3.10. If (D) is the dual of the linear programming relaxation of (3.24) then the value
of any feasible solution of (D) is a lower bound on the cost of any perfect matching.
Example 9, continued. Let us compute the dual of the linear programming relaxation of (3.25),
as defined in Theorem 3.9. Observe that by taking the dual we interchange the role of the
variables and the constraints. We now have variables yv for every vertex v and one constraint
for each edge of G. Hence, let us define y := ya , yb , yd , yg ). The dual is given by,
max 1T y
subject to
a b d g
ab 1 1 0 0
bg 0 1 0
1
6 (3.27)
2
dg0 0 1
1 1
y ≤
3 .
ad
1 0 1
0
2
ag
1 0 0
1 4
bd 0 1 1 0
A feasible solution to (3.27) assigns some potential yv to every vertex v. The constraint for
edge ab states that ya + yb ≤ 6 = cab or equivalently that the reduced cost cab − ya − yb of
edge ab is non-negative. Similarly, for every other edge the corresponding constraint states
the reduced cost is non-negative. Hence, if ȳ is feasible for (3.27) then the potentials ȳ satisfy
the Feasibility Conditions (3.16). For instance, ȳa = 2, ȳb = 2, ȳd = 1 and ȳg = 0 are feasible
potentials. Observe that,
i.e. ad and bc are equality edges with respect to ȳ. It follows that,
Hence, the value of ȳ in (3.27) is equal to the cost of the perfect matching M = {ad, bc}. It
follows by Remark 3.10 that M is a minimum cost perfect matching. Hence, in this example
144 CHAPTER 3. DUALITY THROUGH EXAMPLES
we see that if the potentials ȳ are feasible and every edge of M is an equality edge, then M is
a perfect matching as predicted by Proposition 3.8.
Consider now a general instance of the perfect matching problem. We are given a graph
G = (V, E) and costs ce for all e ∈ E. We can rewrite the linear programming relaxation
of (3.24) as,
min{cT x : Ax = 1, x ≥ 0}, (3.28)
where c is the vector of edge costs, and the matrix A is defined as follows:
Remark 3.11.
Let us try and understand this dual. There is a variable yv for every vertex v and a constraint
for every edge e ∈ E. Consider the constraint for edge e ∈ E. The right-hand side is the cost
ce of the edge, and the left-hand side corresponds to column e = uv of A. Remark 3.11(2)
implies that the left-hand side of this constraint is yu + yv . We can therefore rewrite (3.29) as
follows:
max ∑ yv : v ∈ V
subject to (3.30)
yu + yv ≤ cuv (uv ∈ E)
A feasible solution ȳ to (3.30) assigns some potential yv to every vertex v. The constraint for
each edge uv states that yu + yv ≤ cuv or equivalently that the reduced cost cuv − yu − yv of edge
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 145
uv is non-negative. Hence, if ȳ is feasible for (3.30) then the potentials satisfy the Feasibility
Conditions (3.16). Let M be a perfect matching and suppose that every edge uv ∈ M is an
equality edge, i.e. ȳu + ȳv = cuv . Then,
where the second equality follows from the fact that M is a perfect matching i.e. that every
vertex of G is the end of exactly one edge of M. Hence, the value of ȳ in (3.30) is equal to
the cost of the perfect matching M. It follows by Remark 3.10 that M is a minimum cost
perfect matching. Hence, we see that if the potentials ȳ are feasible and every edge of M is
an equality edges, then M is a perfect matching. Hence, we now have an alternate proof for
Proposition 3.8.
While the derivation of Proposition 3.8 using duality may, at first glance, seem more tech-
nical than the ad-hoc argument we had in Section 3.2.1, notice that our derivation was com-
pletely mechanical: We formulated the minimum cost perfect matching problem as an integer
program, wrote the dual of its linear programming relaxation, and used weak duality to obtain
bounds on the possible values of our original optimization problem. After generalizing the
notion of duals in Chapter 4, we will be able to apply the aforementioned strategy for arbi-
trary optimization problems that can be formulated as integer programs. Once again a word
of caution at this point: while the dual bound obtained through Proposition 3.8 was sufficient
to prove the optimality of a certain perfect matching, this may not always work. The success
of the mechanical procedure outlined above heavily depends on the quality of the underlying
linear program as well as the problem.
3.2.4 An algorithm
In this section we give an algorithm to find a minimum cost perfect matching in a bipartite
graph only. While there exist efficient algorithms for solving the minimum cost perfect match-
ing problem for general graphs, these algorithms are more involved and beyond the scope of
this book. One of the key reasons is that for general graphs we cannot always certify that a
minimum cost perfect matching is indeed a minimum cost perfect matching using Proposi-
tion 3.8.
We first need to characterize which bipartite graphs have a perfect matching. Recall that
a graph G = (V, E) is bipartite with bipartition U,W , if V is the disjoint union of U and
W and every edge has one end in U and one end in W . A necessary condition for a graph
146 CHAPTER 3. DUALITY THROUGH EXAMPLES
Theorem 3.12 (Hall’s Theorem). Let G = (V, E) be a bipartite graph with bipartition U,W
where |U| = |W |. Then, there exists a perfect matching M if and only if G has no deficient
set S ⊆ U. Moreover, there exists an efficient (polynomial-time) algorithm that given G will
either find a perfect matching M or find a deficient set S ⊆ U.
We postpone the proof of the previous result and the description the associated algorithm until
Section 3.2.6.
We are now ready to start the description of the matching algorithm. We consider a bi-
partite graph G = (V, E) with bipartition U,W where |U| = |W |. In addition we are given a
set of edge costs ce for every edge e ∈ E. At each step we have a feasible solution ȳ to the
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 147
linear program (3.30). To get an initial feasible dual solution we let α denote the value of the
minimum cost edge and set ȳv := 21 α, for all v ∈ V .
Given a dual feasible solution ȳ we construct a graph H as follows: H has the same set
of vertices as G, and the edges of H consists of all edges of G that are equality edges with
respect to ȳ. We know from Theorem 3.12 that we can either find,
a) a perfect matching M in H, or
b) a deficient set S ⊆ U of H.
In case (a) since all edges of M are equality edges, Proposition 3.8 implies that M is a min-
imum cost perfect matching of G, in which case we can stop the algorithm. Thus we may
assume that (b) occurs. We will use the set S to define a new feasible solution y0 to (3.30) with
larger objective value than ȳ as follows, for every vertex v,
ȳ + ε for v ∈ S
v
y0v := ȳv − ε for v ∈ NH (S)
ȳ
v otherwise.
We wish to choose ε ≥ 0 as large as possible such that y0 is feasible for (3.30), i.e. for every
edge uv we need to satisfy cuv − y0u − y0v ≥ 0. As y is feasible for (3.30) cuv − ȳu − ȳv ≥ 0.
Consider an edge uv, we may assume u ∈ U and v ∈ W . There are four possible cases for such
an edge (see figure):
1
Case 1. u 6∈ S, v 6∈ NH (S).
N (S)
Case 2. u 6∈ S, v ∈ NH (S). 2
U S W
Case 3. u ∈ S, v ∈ NH (S). 3
Case 4. u ∈ S, v 6∈ NH (S).
4
Let us investigate each case,
Thus, we only need to worry about Case 4. Note, we may assume that there is such an edge,
for otherwise S is a deficient set for G and we may stop as G has no perfect matching. We
want, 0 ≤ cuv − y0u − y0v = cuv − (ȳu + ε) − ȳv , i.e. ε ≤ cuv − ȳu − ȳv . Note that since uv is not
an edge of H, it is not an equality edge. Hence, cuv − ȳu − ȳv > 0. Thus, we can choose ε > 0
as follows:
ε = min{cuv − ȳu − ȳv : uv ∈ E, u ∈ S, v ∈
/ NH (S)}. (3.31)
Note, that 1T y0 − 1T ȳ = ε(|S| − |NH (S)|) > 0. Hence, the new dual solution y0 has a higher
(better) value than y. Algorithm 3 has a formal description of our algorithm.
Example 14. Consider the minimum cost perfect matching problem described in Figure 3.3(i)
(left) where the edges are labeled by the costs. Since the minimum cost is 2 we initially set
potentials ȳa = ȳb = ȳc = ȳd = ȳe = ȳ f = 1. In figure (i) (right) we indicate the graph H and
the deficient set S. The edges of G with one endpoint in S and the other endpoint not in NH (S)
3.2. MINIMUM COST PERFECT MATCHING IN BIPARTITE GRAPHS 149
G and H and
potential deficient set/matching
S NH (S)
2 +2 a
1 a d 1 d −2
4 2
(i) 5
1 b e 1 +2 b e
6 2
1 c f 1 c f
3
S NH (S)
3 a 2 d −1 +1 a d −1
4 2
(ii) 5 +1 −1
3 b e 1 b e
6 2
1 c f 1 +1 c f
3
2
4 a d −2 a d
4 2
(iii) 5 e 0 e
4 b b
6 2
2 c f 1 c f
3
The new potentials are given in figure (ii) (left). In figure (ii) (right) we indicate the graph H
150 CHAPTER 3. DUALITY THROUGH EXAMPLES
and the deficient set S. The edges of G with one endpoint in S and the other endpoint not in
NH (S) are b f , c f , hence,
The new potentials are given in figure (iii) (left). In figure (iii) (right) we indicate the graph H.
This graph has no deficient set, but it has a perfect matching M = {ae, bd, c f }. This matching
is a minimum cost perfect matching of G.
We will prove, in the next section, that this algorithm terminates and hence is guaranteed
to always find a minimum cost perfect matching. Suppose that after running the algorithm
and finding a minimum cost matching M we define variables x̄ as in (3.26). Then x̄ is a
feasible solution to the linear program (3.24) and ȳ is a feasible solution to the dual linear
program (3.30). Moreover, as all edges of M are equality edges the total sum of the potentials
will be equal to the cost of the matching M. In other words the value of x̄ in (3.24) is equal
to the value ȳ in (3.30). It follows from Weak Duality (Theorem 3.9), that x̄ is an optimal
solution to (3.24). Hence, correctness of the algorithm will imply the following result,
Theorem 3.13. If G is bipartite graph and there exists a perfect matching, then the linear
program (3.24) has an integral optimal solution; i.e., there is an optimal solution all of whose
entries are integers.
Note that our matching algorithm preserves at each step a feasible solution to the dual linear
program (3.30). This is an example of a primal-dual algorithm. We will see another example
in Chapter 5.1.
is an integer at all times during the execution of Algorithm 3. This is clearly true at the
beginning since ȳu is set to half the cost of the minimum-cost edge incident to u – and this is
an integer by our evenness assumption.
Now it is easy to check that integrality is maintained. In step 7 of the algorithm, we choose
a deficient set S ⊆ U. We then determine ε according to (3.31), add it to ȳv for all v ∈ S, and
subtract it from ȳv for v ∈ NH (S). Inductively, we know that c̄e is an integer for all e ∈ E, and
hence ε is an integer as well. Thus, ȳv changes by an integer amount in the current iteration
of the algorithm, and so do the reduced costs.
Proof. We have already seen that ȳ is a feasible solution of (3.30) at all times, and hence we
have
ȳu + ȳv ≤ cuv , (3.32)
for all uv ∈ E. For each u ∈ V , let uū ∈ δ ({u}) be a minimum-cost edge incident to u.
Applying (3.32) we see that
∑ ȳu ≤ ∑ cuū =: C. (3.33)
u u
Since ε is a positive integer in every iteration, and since |S| ≥ |NG (S)| + 1 for deficient set
S, it follows that ∑v ȳv must increase by at least 1 in every iteration of the algorithm. By
the initialization step 1 of Algorithm 3 the latter sum has an initial value C/2, and hence the
overall number of iterations must be bounded by C/2 ≤ ∑e ce .
algorithm. Efficient implementations of this algorithm are however known, and we provide
some details in the next section.
Exercises
1. C&O transportation company has three trucks with different capabilities (affectionately
called Rooter, Scooter and Tooter) and three moving jobs in different cities (Kitchener,
London and Missisauga). The following table shows the profit gained (in $1,000s) if a
given truck is used for the job in a given city. Your task is to find an allocation of the trucks
to the three different cities to maximize profit.
K L M
R 12 9 9
S 12 4 3
T 12 3 4
2. Consider a bipartite graph with vertices U = {a, b, c, d} and W = {g, h, i, j} and all edges
between U and W . You also have a weight for each edge of G as indicated in the following
table,
g h i j
a 2 7 1 2
b 3 4 3 2
c 6 5 5 5
d 2 6 2 3
For instance edge ag has weight 2. Find a maximum weight perfect matching using the
matching algorithm.
3. Let G = (V, E) be a graph with edge weights c. Consider the following three problems,
(a) Show that if you have an algorithm to solve (P1), then you can use it to solve (P2).
(b) Show that if you have an algorithm to solve (P2), then you can use it to solve (P1).
(c) Show that if you have an algorithm to solve (P3), then you can use it to solve (P4).
(d) Show that if you have an algorithm to solve (P4), then you can use it to solve (P3).
4. Let G = (V, E) be a bipartite graph. Consider the following linear program (P),
max 0
subject to
∑ xe : e ∈ δ (i) = 1 (i ∈ V )
xe ≥ 0 (e ∈ E)
5. Let G = (V, E) be a graph. A vertex cover of G is a subset S of V with the property that for
every edge i j we have that either i ∈ S of j ∈ S (or possibly both).
(a) Show that for every matching M and every vertex cover S we have |S| ≥ |M|. In
particular,
H INT: How many vertices you need to cover just the edges of M?
(b) Show that there are graphs for which the inequality (?) is strict.
Suppose G = (V, E) is a bipartite graph with partition U,W (for every edge i j of G, i ∈ U
and j ∈ W and U,W is a partition of V ). Suppose also that |U| = |W |. Let H be the
154 CHAPTER 3. DUALITY THROUGH EXAMPLES
complete bipartite graph with partition U and W (the edges of H are all pairs i j where
i ∈ U and j ∈ W ). Assign weights to the edges of H as follows: for every edge i j,
1 if i j is an edge of G
wi j =
0 otherwise
a) Show that there exists a feasible 0, 1 vector y (i.e. all entries of y are 0 or 1).
b) Show that if y is integer at the beginning of an iteration of the matching algorithm then
y will be integer at the beginning of the next iteration.
c) Show that there exists a feasible integer vector y and a matching M of G such that
∑ yi = |M|.
i∈V
H INT: Look at what you have when the matching algorithm terminates.
d) Show that there exists a feasible 0, 1 vector y and a matching M of G such that
∑ yi ≤ |M|.
i∈V
v1 v2 v3 v4 v5 v6 v7 v8
Figure 3.4: An M-alternating path where edges of M are shown as thick lines.
The figure above shows an M-alternating path where edges of matching M appear in bold.
We will say that a matching M covers a vertex v if M has an edge that is incident to v. If M
does not cover v then it exposes v, and v is M-exposed. An alternating path P is called M-
augmenting if it starts and ends in an M-exposed vertex. Hence, the path in the above figure
is an augmenting path for matching M.
The reader may already suspect that augmenting paths are crucial for the design of efficient
matching algorithms. Figure (i) suggests why this may be. The figure shows a bipartite graph G,
a matching M, and an M-augmenting path
P = v1 v2 , v2 v3 , v3 v4 .
Now observe that removing v2 v3 from M, and adding the edges v1 v2 and v3 v4 , yields another
matching M′ that is larger than M. Let us formalize this swap operation as follows. For two
sets A and B, we let A∆B denote the symmetric difference of A and B. A∆B is the set of those
elements in A ∪ B that are not in both A and B; i.e.,
A∆B = (A \ B) ∪ (B \ A).
Note that the matching M′ in the above example is the symmetric difference of M and P; i.e.
M′ = M∆P. In the following, we say that M is a maximum matching if there is no matching
M′ with more edges.
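As a quick illustration of this swap (not part of the original notes), the following Python sketch computes M′ = M∆P for the example above; edges are stored as frozensets so that uv and vu compare equal.

def edge(u, v):
    # An undirected edge, stored so that uv and vu compare equal.
    return frozenset((u, v))

M = {edge("v2", "v3")}                                        # current matching
P = {edge("v1", "v2"), edge("v2", "v3"), edge("v3", "v4")}    # an M-augmenting path

M_new = M ^ P    # symmetric difference: drop v2v3, add v1v2 and v3v4
print(sorted(tuple(sorted(e)) for e in M_new))
# [('v1', 'v2'), ('v3', 'v4')]  -- a matching with one more edge than M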
Theorem 3.15. M is a maximum matching in graph G = (V, E) if and only if G does not have
an M-augmenting path.
One direction of the proof of this theorem is quite easy: clearly if M is a matching, and
there is an M-augmenting path then M can not be a maximum matching. For the other direc-
tion it is useful to observe that M∆M′ has an M-augmenting path if M′ is a larger matching
than M. We leave the details as an exercise.
In light of Theorem 3.15 a plausible strategy for finding a perfect matching is as follows:
start with the empty matching M = ∅. As long as there exists an M-augmenting path P, replace
M by M∆P. Theorem 3.15 implies that this method finds a maximum matching, and this must
clearly be perfect if the underlying graph has such a matching. But how do we efficiently find
augmenting paths?
We need a few more definitions before we can describe an algorithm for this task. A path
P = v1 v2 , v2 v3 , . . . , vk−1 vk
that starts and ends in the same vertex (i.e., v1 = vk ) is called a cycle. A graph that has no
cycle is called acyclic. We say that a graph T = (V, E) is connected if T has a uv-path for
any pair u, v ∈ V . T is a tree if it is connected, and acyclic. Figures (i) and (ii) are example
graphs that are not trees; figure (i) has a cycle (shown in bold edges), and figure (ii) is not
connected as there is, for example, no path connecting u and v. Finally, figure (iii) shows a
tree. Given a tree T = (V, E), we call a vertex u ∈ V a leaf if it is incident to exactly one edge
(see figure (iii)), and an internal vertex otherwise. Trees have many useful properties. We
state two crucial ones here, and leave their proofs to the reader.
Proposition 3.16. (a) Let T = (V, E) be a tree and u, v ∈ V. Then T contains a unique uv-path;
we will use Tuv to denote this path from now on.
(b) Let x, y ∈ V be such that xy ∉ E. Then the graph obtained from T by adding the edge xy
contains a unique cycle. This cycle is called the fundamental cycle defined by xy.
In order to efficiently find augmenting paths, we will need a special kind of tree. Let M be
a matching in some underlying graph G = (V, E), and suppose that r is an M-exposed vertex.
A tree T in G is an M-alternating tree with root r if all vertices in V (T ) \ {r} are M-covered,
and if for any vertex u 6= r, the unique ru-path Tru in T is an M-alternating path. The left
figure below shows such a tree. In the following, we let A(T ) and B(T ) be the sets of vertices
u in tree T for which Tru has an odd and even, respectively, number of edges (so that, in particular, r ∈ B(T )).
Figure 3.5: Figure (i) shows an alternating tree for a matching M whose edges are depicted in
bold. Shaded vertices belong to A(T ), and white ones to B(T ). Figure (ii) shows a non-tree
vertex v that is connected to a vertex in B(T ), and the associated augmenting path P. Figure
(iii) shows the enlarged matching M∆P.
Our algorithm works in iterations to either find a perfect matching M, or a proof that none
exists. At any time during its execution, the algorithm will maintain a matching M, and an
M-alternating tree T . The algorithm has the following important properties:
[P1] The size of the matching M never decreases during the execution of the algorithm.
[P2] If M does not increase in an iteration then |V (T )| increases, or the algorithm terminates
with a deficient set.
Without describing any further details, it is clear that the above two properties imply that the
algorithm terminates in at most |V|² iterations. To see this, let us group those iterations during
which the matching M has size i into the ith phase. Clearly, by property [P1], the iterations of
Phase i are consecutive. Moreover, every iteration within a phase grows the alternating tree
by [P2], and therefore, a phase can have at most |V | iterations. Finally, there are clearly no
more than |V | phases, and this completes the argument.
We describe phase i ≥ 0 of the algorithm. At the beginning of the phase, let M be a
matching with i edges. Let r be an M-exposed vertex, and let T^0 = ({r}, ∅) be the trivial
tree with a single vertex and no edges. At the beginning of iteration j ≥ 0, we are given
an M-alternating tree T^j with root r. We then check whether there is an edge uv ∈ E with
u ∈ B(T^j) and v ∉ V(T^j). We later show that, if no such edge exists, then G has a deficient
set, and hence no perfect matching. Let us for now assume that there is such an edge uv.
If v is M-exposed, then the path P obtained by attaching the edge uv to T^j_ru is an M-
augmenting path, and hence M∆P is a matching of size i + 1; this ends Phase i.
On the other hand, suppose that v is M-covered. Thus, there exists a vertex v′ such that
vv′ ∈ M. We will ensure that, like v, v′ is not part of the vertex set of tree T^j; the reader is
encouraged to verify this. We now add vertices v and v′, as well as the edges uv and vv′, to our tree,
and define T^{j+1} = (V^{j+1}, E^{j+1}), where V^{j+1} = V^j ∪ {v, v′} and E^{j+1} = E^j ∪ {uv, vv′}. While
the matching M remains unchanged in this step, the size of the alternating tree increases.
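The following Python sketch (ours, not the formal Algorithm 4 of the notes) implements one phase of this procedure for a bipartite graph: it grows an M-alternating tree from an M-exposed root r and either reports an augmenting path or the set B(T) of a deficient configuration. The function name augment_phase and the data layout are illustrative assumptions.

def augment_phase(adj, match, r):
    """Grow an M-alternating tree rooted at the M-exposed vertex r.

    adj:   dict vertex -> iterable of neighbours (bipartite graph)
    match: dict vertex -> matched partner (missing/None if exposed)
    Returns ('augment', path) with an M-augmenting path ending at r,
    or ('deficient', B) where B = B(T) when the tree can no longer grow.
    """
    parent = {r: None}          # tree edges, pointing towards the root
    B = {r}                     # vertices whose tree path to r has even length
    A = set()                   # vertices whose tree path to r has odd length
    stack = [r]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in A or v in B:
                continue                      # v is already in the tree
            if match.get(v) is None:          # v is M-exposed: augmenting path found
                path, w = [v], u
                while w is not None:
                    path.append(w)
                    w = parent[w]
                return 'augment', path        # path from v back to the root r
            vp = match[v]                     # v is covered by the edge vv'
            A.add(v); B.add(vp)               # add v to A(T) and v' to B(T)
            parent[v], parent[vp] = u, v
            stack.append(vp)
    return 'deficient', B                     # B(T) has at most |B(T)|-1 neighbours

Repeatedly calling this routine and replacing M by M∆P whenever a path P is returned realises the |V|²-iteration strategy discussed above.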
Figure 3.6: Figure (i) shows a bipartite graph, and matching M in bold. Figure (ii) shows an
alternating tree rooted at M-exposed vertex w3 . Figure (iii) shows a larger matching obtained
by augmenting M, and figure (iv) gives a certificate that no perfect matching exists.
The example instance in Figure 3.6.(i) shows a bipartite graph and a matching M. Let us
apply the above procedure starting from the given matching, in phase 3. We let the initial
tree be T^0 = ({w3}, ∅), and note that w3 ∈ B(T^0). Vertex w3 has three neighbours u2, u3, and
u4, all of which are M-covered. Following the algorithm, we arrive at a tree T^3 = (V^3, E^3)
(depicted in Figure 3.6.(ii)). The path
P = w3 u4 , u4 w4 , w4 u5
is an M-augmenting path, and the matching M′ = M∆P has one more edge than M. This starts
phase 4 of the algorithm. Figure 3.6.(iii) shows the new matching M′, and the M′-exposed vertex
w1. Using the same procedure as before, starting from tree T^0 = ({w1}, ∅), we find the M′-
alternating tree T′ = (V′, E′) displayed in Figure 3.6.(iv). Note that none of the vertices
v ∈ B(T′) has a neighbour outside the tree. Thus we are stuck! In fact, we are now able
to show that the given graph has no perfect matching. We state the following useful fact
whose proof we leave as an exercise.
Fact 3.17. A graph G = (V, E) is bipartite if and only if G does not contain cycles with an
odd number of edges.
Proof. We claim that the set B(T′) has at most |B(T′)| − 1 neighbours in G. Theorem 3.12
then implies that G has no perfect matching.
Consider a vertex v ∈ B(T′), and let u be a neighbour of v. We claim that u ∈ A(T′).
Indeed, if not, then one shows that the unique cycle in (V′, E′ ∪ {uv}) has odd length; we leave
this as an exercise. By Fact 3.17 this contradicts the fact that G is bipartite.
Let r be the root of tree T′. We will finish the argument by finding a designated neighbour
for each vertex in B(T′) \ {r}. Consider a vertex v ∈ B(T′) \ {r}, and let vv′ ∈ M. We then let
v′ ∈ A(T′) be v's designated neighbour. Clearly, if v′ is v's designated neighbour then it can
not be the designated neighbour of another vertex in B(T′). The above argument shows that
B(T′) has at least one more vertex than A(T′). It follows from Theorem 3.12 that G has no
perfect matching, as desired.
Consider the set of shaded vertices in Figure 3.6.(iv). Its neighbours in G are the white
vertices, and there are fewer white vertices than shaded ones. Hence, the graph in the example
has no perfect matching by Theorem 3.12.
The discussion in this section, and Proposition 3.18 provide an algorithmic proof of The-
orem 3.12. We state this method formally below in Algorithm 4.
Algorithm 4 could now simply be used as a black box in step 7 of Algorithm 3. We will
now see, however, that a more careful implementation yields a significant improvement over
the iteration bound provided by Proposition 3.14. The modified Algorithm 3 will maintain a
matching M in the graph H of equality edges, and an M-alternating tree T at all times.
Initially, M will be empty, and T will consist of a single (exposed) vertex, and no edges.
The algorithm stops as soon as M is a perfect matching, or when it finds a deficient set in G.
In an iteration we invoke Algorithm 4 starting from matching M, and M-alternating tree
T ; i.e., we start the execution of Algorithm 4 with step 4, skipping the two initialization steps
for M and T . Algorithm 4 either increases the size of matching M, or returns with a new
larger alternating tree T′ for the same old matching. We present the details of this method in
Algorithm 5.
Similar to Algorithm 4 we will show that the algorithm maintains the following invariants:
[P'1] The size of the matching M never decreases during the execution of the algorithm.
[P'2] If M does not increase in an iteration then |V (T )| increases, or the algorithm terminates
with a deficient set.
These invariants clearly provide an upper bound on the total number of iterations. Using the
same notation as before, we say that an iteration of Algorithm 3 belongs to phase i if matching
M has i edges at its start. Invariant [P’1] implies that the iterations belonging to phase i are
consecutive. Invariant [P’2] implies that a phase can have at most |V | iterations, and [P’1]
implies that there are at most |V |/2 phases. Hence we obtain the following proposition.
It remains to show that invariants [P’1] and [P’2] hold throughout. Consider an iteration
of Algorithm 5, and let M and T be the current matching and corresponding M-alternating
tree. If Algorithm 4 returns a perfect matching then we are obviously done, and have nothing
to show. Let us therefore assume that M is not a perfect matching, and T is the corresponding
M-alternating tree.
Observation 3.20. The dual update in step 10 does not change the reduced costs of edges
e ∈ E(T ), or those of matching edges wz ∈ M \ E(T ). The edge uv minimizing the expression
on the right-hand side of step 9 had positive reduced cost before, and zero reduced cost after
the update.
In other words, if matching M does not change in the process of invoking Algorithm 4
then all of its edges remain equality edges. Moreover, the dual-update step creates a
new equality edge uv ∈ δ (B(T )). The important thing to notice is what happens the next time
Algorithm 4 is invoked. If v is an M-exposed vertex, then Algorithm 4 finds a larger matching.
If it doesn’t find a larger matching then v must be M-covered, and the edge uv can be used to
increase the size of T . Altogether, we have argued that [P’1] and [P’2] hold throughout the
algorithm.
literally hundreds of NP-hard optimization problems. A good introduction to this topic, which
is beyond the scope of these notes, can be found in the books of Vazirani [23] and of Williamson
and Shmoys [24].
Chapter 4
Duality theory
In Chapter 3, we introduced the notion of a dual of a linear program, and demonstrated how
this can be used to design algorithms to solve optimization problems, such as the shortest path
problem and the minimum cost perfect matching problem in bipartite graphs. In this chapter,
we shall generalize these results, and develop a general theoretical framework.
In Chapter 3 we defined the dual of the linear program
min{cT x : Ax ≥ b, x ≥ 0} (4.1)
to be
max{bT y : AT y ≤ c, y ≥ 0}, (4.2)
and the dual of
min{cT x : Ax = b, x ≥ 0} (4.3)
to be
max{bT y : AT y ≤ c}. (4.4)
Our goal in this section is to generalize these definitions, as well as the corresponding weak
duality theorems (Theorem 3.2 and Theorem 3.9).
Consider the following table describing a pair of linear programs. The notation Ax ? b
(resp. AT y ? c) indicates that the constraints of (Pmax ) and (Pmin ) are arbitrary linear constraints,
        (Pmax )                                   (Pmin )

        max cT x                                  min bT y
        subject to                                subject to
        Ax ? b                                    AT y ? c
        x ? 0                                     y ? 0

        constraint of (Pmax )       corresponding variable of (Pmin )
          ≤ constraint                ≥ 0 variable
          = constraint                free variable
          ≥ constraint                ≤ 0 variable

        variable of (Pmax )         corresponding constraint of (Pmin )
          ≥ 0 variable                ≥ constraint
          free variable               = constraint
          ≤ 0 variable                ≤ constraint

Table 4.1: Primal-dual pairs
i.e. the relation between the LHS and RHS of the constraint is either ≤, =, or ≥. The notation
x ? 0 and y ? 0 indicates that the variables of (Pmax ) and (Pmin ) are either non-negative, free,
or non-positive. We pair constraint i of (Pmax ), i.e. rowi (A)x ? bi , with the variable yi of (Pmin )
and we pair constraint j of (Pmin ), i.e. col j (A)y ? c j , with the variable x j of (Pmax ). The
table indicates the relation between a constraint of (Pmax ) and the corresponding variable of
(Pmin ). For instance if rowi (A)x ≤ bi in (Pmax ) then we have yi ≥ 0 in (Pmin ). The table also
indicates the relation between a constraint of (Pmin ) and the corresponding variable of (Pmax ).
For instance if col j (A)y ≤ c j in (Pmin ) then we have x j ≤ 0 in (Pmax ).
Consider a pair of linear programs (Pmax ) and (Pmin ) satisfying the relations given in ta-
ble 4.1. We then define (Pmin ) to be the dual of (Pmax ) and we define (Pmax ) to be the dual
of (Pmin ). We call the pair (Pmax ) and (Pmin ) a primal-dual pair. The reader should verify
that these definitions imply in particular that the dual of the dual of a linear program (P) is the
original linear program (P).
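The correspondence of Table 4.1 is mechanical, and it can help to see it written out as a lookup. The Python sketch below is our illustration only; the two dictionaries simply transcribe the table and return the "shape" of the dual (Pmin ) from the shape of a maximization problem (Pmax ).

# Transcription of Table 4.1: building the shape of the dual (Pmin)
# from the shape of a maximization problem (Pmax).

CONSTRAINT_TO_VARIABLE = {   # constraint of (Pmax) -> variable of (Pmin)
    '<=': '>= 0',
    '=':  'free',
    '>=': '<= 0',
}
VARIABLE_TO_CONSTRAINT = {   # variable of (Pmax) -> constraint of (Pmin)
    '>= 0': '>=',
    'free': '=',
    '<= 0': '<=',
}

def dual_shape(constraint_senses, variable_signs):
    """Given the senses of Ax ? b and the signs of x ? 0 in (Pmax),
    return the signs of y ? 0 and the senses of A^T y ? c in (Pmin)."""
    y_signs = [CONSTRAINT_TO_VARIABLE[s] for s in constraint_senses]
    dual_senses = [VARIABLE_TO_CONSTRAINT[s] for s in variable_signs]
    return y_signs, dual_senses

# Example: max{c^T x : Ax <= b, x >= 0} has dual min{b^T y : A^T y >= c, y >= 0}.
print(dual_shape(['<=', '<=', '<='], ['>= 0', '>= 0']))
# (['>= 0', '>= 0', '>= 0'], ['>=', '>='])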
Example 15. Let us apply these definitions to try to find the dual of the linear program (4.1).
This linear program can be rewritten as,
Since,
which is equal to (4.2) if we define y as being equal to x̂. Similarly, (4.4) is the dual of (4.3).
Hence, our new definition of dual is consistent with the definitions given in sections 3.1
and 3.2.
Example 16. Consider the following linear program,
Suppose we wish to find the dual of (P). Since (P) is a maximization problem, it plays the role
of (Pmax ) in table 4.1. It follows that the dual (D) of (P) is of the form (Pmin ) in table 4.1,
Since,
We proved in Theorem 3.2 that if x̄ is a feasible solution to (4.1) and ȳ is a feasible solution
to (4.2) then cT x̄ ≥ bT ȳ. Similarly, we proved in Theorem 3.9 that if x̄ is a feasible solution
to (4.3) and ȳ is a feasible solution to (4.4) then cT x̄ ≥ bT ȳ. The following is a common
generalization to both of these results.
Theorem 4.1 (Weak Duality Theorem). Let (P) and (D) be a primal-dual pair where (P) is a
maximization problem and (D) a minimization problem. Let x̄ and ȳ be feasible solutions
for (P) and (D), respectively. Then:
(1) cT x̄ ≤ bT ȳ;
(2) if cT x̄ = bT ȳ, then x̄ is an optimal solution to (P) and ȳ is an optimal solution to (D).
Observe that (2) in the previous theorem follows immediately from (1). This is because, as
ȳ is feasible, (1) implies that for every feasible solution x of (P) we have cT x ≤ bT ȳ, i.e. bT ȳ is an
upper bound for (P). But then x̄ is an optimal solution to (P), since its value is equal to this upper
bound. The argument to prove that ȳ is an optimal solution to (D) is similar.
Example 16, continued. Consider the primal-dual pair given by (P) and (D). Let
It can be readily checked that x̄ is feasible for (P) and ȳ is feasible for (D). Moreover, (2, 2, 13)ȳ =
(12, 26, 20)x̄ = −18. It follows from Theorem 4.1(2) that x̄ is an optimal solution to (P) and
that ȳ is an optimal solution to (D).
Proof of Theorem 4.1. Let (P) be an arbitrary linear program where the goal is to maximize
Let x̄ be a feasible solution to (4.5) and let ȳ be a feasible solution to (4.6). Then for s̄ := b−Ax̄,
the vectors x̄, s̄ are a solution to (4.7). Moreover, for w̄ := c − AT ȳ, the vectors ȳ, w̄ are a
solution to (4.8). Since Ax̄ + s̄ = b we have,
ȳT b = ȳT (Ax̄ + s̄) = (ȳT A)x̄ + ȳT s̄ = (c − w̄)T x̄ + ȳT s̄ = cT x̄ − w̄T x̄ + ȳT s̄,
where the third equality follows from the fact that AT ȳ + w̄ = c, or equivalently that ȳT A = (c − w̄)T .
Hence, to prove that bT ȳ ≥ cT x̄ it suffices to show that w̄T x̄ ≤ 0 and ȳT s̄ ≥ 0. For the former,
w̄T x̄ = ∑_{j∈C1} w̄j x̄j + ∑_{j∈C2} w̄j x̄j + ∑_{j∈C3} w̄j x̄j ≤ 0,
since in the first sum w̄j ≤ 0 and x̄j ≥ 0, in the second sum w̄j ≥ 0 and x̄j ≤ 0, and in the third
sum w̄j = 0. A similar argument shows that ȳT s̄ ≥ 0.
We close this section by noting the following consequences of the Weak Duality Theorem
and the Fundamental Theorem of linear programming:
1. if (D) is feasible, then (P) is not unbounded;
2. if (P) is feasible, then (D) is not unbounded;
3. if (P) and (D) are both feasible, then (P) and (D) both have optimal solutions.
Proof. We may assume that (P) is a maximization problem with objective function cT x and
that (D) is a minimization problem with objective function bT y. (1) Suppose that (D) has
a feasible solution ȳ. Then by the Weak duality (Theorem 4.1), bT ȳ is an upper bound for
(P). In particular, (P) is not unbounded. (2) Suppose that (P) has a feasible solution x̄. Then
by the Weak duality (Theorem 4.1), cT x̄ is a lower bound for (D). In particular, (D) is not
unbounded. (3) By (1) we know that (P) is not unbounded and by (2) we know that (D) is
not unbounded. Since by the Fundamental Theorem of linear programming (see Theorem
2.11) (P) is either unbounded, infeasible, or has an optimal solution, (P) must have an optimal
solution. By a similar argument (D) has an optimal solution.
It is often easy when given a primal-dual pair to verify that both linear programs are feasible.
In that case statement (3) in the previous result implies immediately that both linear programs
have an optimal solution.
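As a small numerical illustration of weak duality (with made-up data, not taken from the notes), the snippet below checks a candidate primal-dual pair for max{cT x : Ax ≤ b, x ≥ 0} and min{bT y : AT y ≥ c, y ≥ 0} and confirms that cT x̄ ≤ bT ȳ.

import numpy as np

# Toy primal-dual pair for max{c^T x : Ax <= b, x >= 0} and
# min{b^T y : A^T y >= c, y >= 0}.  Data is illustrative only.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

x_bar = np.array([1.0, 1.0])     # feasible for the primal: A x_bar = (3, 4) <= b
y_bar = np.array([0.5, 1.0])     # feasible for the dual:  A^T y_bar = (3.5, 2) >= c

assert np.all(A @ x_bar <= b + 1e-9) and np.all(x_bar >= 0)
assert np.all(A.T @ y_bar >= c - 1e-9) and np.all(y_bar >= 0)

print(c @ x_bar, b @ y_bar)      # 5.0 8.0 -- weak duality: 5.0 <= 8.0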
Exercises
(a)
max (1, 3, 1)x
subject to
(1, 2, 7)x ≤ −3
(0, 1, −1)x = 9
(9, 0, 0)x ≤ 5
(1, −1, 1)x = 0
x1 , x2 ≥ 0, x3 free
(b)
min (5, 6)x
subject to
(3, −1)x ≥ 8
(2, 4)x = −2
(3, 2)x ≤ 2
(−1, 2)x = −3
x1 ≥ 0, x2 free
(a)
min cT x + d T u
subject to
Ax + Du = b
x≥0
where A and D are matrices, c, d and b are vectors, and x and u are column vectors of
variables.
(b)
max cT x
subject to
Ax ≥ b
Dx − Iu ≤ d
x ≤ 0, u ≥ 0
where A and D are matrices, I is the identity matrix, c, b and d are vectors, and x and u
are column vectors of variables.
Note, assume that all vectors and matrices are of suitable sizes so that the constraints and
objective function are well-defined.
4. Give an example of an infeasible LP problem whose dual is also infeasible. You need to
prove algebraically that your example and its dual are infeasible.
[Figure: a directed graph on vertices 1, . . . , 6 with arcs labelled a, b, c, e, f, g, h, k, l.]
(f) Find an alternate proof to the statement you proved in (a) by using (e) and Weak
Duality.
(g) Show that if (P) has a feasible solution of value greater than 0, then G⃗ has a directed
cycle.
(h) Using (g), prove that if G⃗ has no directed cycle, then G⃗ must have a topological ordering.
6. Consider a graph G. A dyad of G is a pair of edges that share exactly one vertex. A packing
of dyads is a set S of dyads with the property that no two dyads in S share a common
vertex. In the following example the dyads are represented by bold edges.
(a) Show that the following IP finds a packing of dyads of maximum cardinality.
(b) Denote by (P) the linear program obtained from (IP) by relaxing the conditions that x
be integer. Find the dual (D) of (P).
(c) Suppose that G is bipartite and let A, B be a partition of V such that all edges are going
between A and B. Find a solution for (D) of value min{|A|, |B|}.
(d) Using (c) show that every packing of dyads has cardinality at most min{|A|, |B|}.
4.2 Strong duality
Theorem 4.3 (Strong Duality). Let (P) and (D) be a primal-dual pair. If there exists an
optimal solution x̄ of (P) then there exists an optimal solution ȳ of (D). Moreover, the value of
x̄ in (P) equals the value of ȳ in (D).
We will only prove this theorem for the special case when (P) is in SEF. It is routine,
however, to derive the result for an arbitrary linear program by first converting it
to an equivalent linear program in SEF.
Proof of Theorem 4.3 for SEF. If (P) is in SEF, the primal-dual pair (P) and (D) is of the form,
max{cT x : Ax = b, x ≥ 0} (P)
min{bT y : AT y ≥ c}. (D)
We know from Theorem 2.7 that the 2-phase Simplex procedure terminates. Since (P) has
an optimal solution, the simplex will terminate with an optimal basis B. Let us rewrite (P) in
canonical form for B (see Proposition 2.4),
where
ȳ = AB^{−T} cB and c̄T = cT − ȳT A.
(P') has the property that for any feasible solution the value in (P) and (P') is the same. (P')
also has the property that c̄B = 0. Hence,
z = cT x̄ = ȳT b + c̄T x̄ = bT ȳ + c̄N^T x̄N + c̄B^T x̄B = bT ȳ,
where the last two terms vanish since x̄N = 0 and c̄B = 0.
Since the simplex procedure stopped we must have c̄ ≤ 0 i.e.
cT − ȳT A ≤ 0 or equivalently AT ȳ ≥ c.
It follows that ȳ is feasible for (D). Since cT x̄ = bT ȳ we know from Weak Duality (Theo-
rem 4.1) that x̄ is an optimal solution to (P) and ȳ is an optimal solution to (D).
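The proof suggests a concrete recipe: once the Simplex method stops with an optimal basis B, the vector ȳ = AB^{−T} cB is an optimal dual solution. The numpy sketch below (ours, with a small made-up instance in SEF) carries out this computation and checks that the reduced costs are non-positive and that the two objective values agree.

import numpy as np

# max{c^T x : Ax = b, x >= 0} in SEF; suppose the Simplex stopped with the
# optimal basis B = {0, 1} (0-based column indices).  Small made-up example.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 3.0])
c = np.array([1.0, 1.0, 0.0, 0.0])
B = [0, 1]

A_B = A[:, B]
y_bar = np.linalg.solve(A_B.T, c[B])        # y = A_B^{-T} c_B
c_red = c - A.T @ y_bar                     # reduced costs c^T - y^T A

x_bar = np.zeros(4)
x_bar[B] = np.linalg.solve(A_B, b)          # basic feasible solution for B

print(y_bar, c_red)          # [1. 1.] [ 0.  0. -1. -1.]  (c_red <= 0: optimal)
print(c @ x_bar, b @ y_bar)  # 5.0 5.0 -- primal and dual objective values coincide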
In the Simplex we consider (P) of the form max{cT x : Ax = b, x ≥ 0}. At each iteration we
have a feasible basis B. The vector x̄ is the basic feasible solution for B. Hence, property
(A) holds. We define ȳ as AB^{−T} cB , which implies (see the proof of Theorem 4.3) that (C)
holds. The algorithm terminates when (B) is satisfied. Thus the simplex keeps two out of
the three properties (A),(B),(C) satisfied at each step, and terminates when the third property
holds. There are other algorithms that follow the same scheme. The dual Simplex procedure
maintains properties (B),(C) at each iteration and terminates when (A) holds. Interior point
methods maintain properties (A),(B) at each iteration and terminate when (C) holds.
Exercises
1. Let (P) denote the LP problem max{cT x : Ax ≤ b, x ≥ 0}, and let (D) be the dual of (P).
The goal of this exercise is to prove the strong Duality Theorem for (P) and (D). (In the
Course Notes we proved strong duality only for problems in SEF.)
(c) Suppose that (P) has an optimal solution x. Construct an optimal solution x0 of (P’).
(d) Using Theorem 4.3 for problems in SEF, deduce that there exists an optimal solution
y0 to (D’) where the value of x0 for (P’) is equal to the value of y0 for (D’).
2. Consider the following linear program (P), max{cT x : Ax ≤ b, x ≥ 0}. Let (D) denote the
dual of (P), and suppose that (P) has an optimal solution x̄ and (D) has an optimal solution ȳ.
(a) Let (P2) denote the LP resulting by multiplying the first inequality of (P) by 2, and let
(D2) denote the dual of (P2). Find optimal solutions of (P2) and (D2) in terms of x̄
and ȳ. Justify your answers.
(b) Let (P3) denote the LP obtained from (P) by replacing the constraint Ax ≤ b by MAx ≤
Mb where M is a square non-singular matrix of suitable dimension. Let (D3) denote
the dual of (P3). Find optimal solutions of (P3) and (D3) in terms of x̄ and ȳ. Justify
your answers.
Assume that (P) has a feasible solution. The goal of the exercise is to use duality to show
that (P) is unbounded if and only if there exists r such that
(e) Deduce from f) that there exists a feasible solution r̄ to (P’) that satisfies (?).
4.3 A geometric characterization of optimality
max{cT x : Ax ≤ b} (4.10)
min{bT y : AT y = c, y ≥ 0}. (4.11)
Let us revisit the proof of Weak Duality (Theorem 4.1) for this particular case. After
adding slack variables s, we can rewrite (4.10) as,
Let x̄ be a feasible solution to (4.10) and let ȳ be a feasible solution to (4.11). Then for
s̄ = b − Ax̄, the vectors x̄, s̄ are a solution to (4.12). Since Ax̄ + s̄ = b we have,
where the last equality follows from the fact that ȳ is feasible for (4.11). We know from the
Strong Duality Theorem 4.3 that x̄ and ȳ are both optimal for (4.10) and (4.11) respectively,
if and only if cT x̄ = bT ȳ, or equivalently ȳT s̄ = 0. Let m denote the number of constraints
of (4.10); then
ȳT s̄ = ∑_{i=1}^{m} s̄i ȳi .    (?)
As s̄i ≥ 0 and ȳi ≥ 0, every term in the sum (?) is non-negative. It follows ȳT s̄ = 0 if and only
if for every i ∈ {1, . . . , m}, s̄i = 0 or ȳi = 0. Note, that s̄i = 0 means that constraint i of (4.10)
holds with equality for x̄; we will sometimes also say that this constraint is tight for x̄. The
following result summarizes these observations,
Theorem 4.5 (Complementary Slackness - special case). Let x̄ be a feasible solution to (4.10)
and let ȳ be a feasible solution to (4.11). Then x̄ is an optimal solution to (4.10) and ȳ is an
optimal solution to (4.11) if and only if for every row index i of A, constraint i of (4.10) is
tight for x̄ or the corresponding dual variable ȳi = 0.
max (5, 3, 5)x
subject to
(1, 2, −1)x ≤ 2
(3, 1, 2)x ≤ 4                       (P)
(−1, 1, 1)x ≤ −1
and
min (2, 4, −1)y
subject to
(1, 3, −1)y = 5
(2, 1, 1)y = 3                       (D)
(−1, 2, 1)y = 5
y ≥ 0
It can be readily checked that x̄ = (1, −1, 1)T is a feasible solution to (P) and that ȳ = (0, 2, 1)T
is a feasible solution to (D). Theorem 4.5 states that to show that x̄ is an optimal solution to
(P) and that ȳ is an optimal solution to (D) it suffices to check the following conditions:
• ȳ1 = 0 ✓ OR (1, 2, −1)x̄ = 2;
• ȳ2 = 0 OR (3, 1, 2)x̄ = 4 ✓;
• ȳ3 = 0 OR (−1, 1, 1)x̄ = −1 ✓.
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, all
OR conditions are satisfied. It follows that x̄ is optimal for (P) and that ȳ is optimal for (D).
Consider now the feasible solution x′ = (2, −2, 0)T of (P). The conditions of Theorem 4.5 do
not hold for the pair x′ and ȳ, since (−1, 1, 1)x′ < −1 and ȳ3 = 1 > 0. Observe that in this case
Theorem 4.5 only allows us to deduce that either x′ is not optimal for (P), or ȳ is not optimal
for (D). The theorem does not give any indication as to which outcome occurs.
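Checks like the one above are easy to automate. The Python sketch below (ours) encodes (P) and (D) from this example and tests the conditions of Theorem 4.5 for x̄ = (1, −1, 1)T and ȳ = (0, 2, 1)T, and then for x′ = (2, −2, 0)T.

import numpy as np

# (P): max (5,3,5)x s.t. Ax <= b;  (D): min b^T y s.t. A^T y = c, y >= 0.
A = np.array([[ 1.0, 2.0, -1.0],
              [ 3.0, 1.0,  2.0],
              [-1.0, 1.0,  1.0]])
b = np.array([2.0, 4.0, -1.0])

def cs_holds(x, y, tol=1e-9):
    """Theorem 4.5: for every row i, constraint i is tight for x or y_i = 0."""
    slack = b - A @ x
    return all(slack[i] <= tol or y[i] <= tol for i in range(len(b)))

x_bar = np.array([1.0, -1.0, 1.0])
y_bar = np.array([0.0,  2.0, 1.0])
print(cs_holds(x_bar, y_bar))       # True  -> both solutions are optimal

x_prime = np.array([2.0, -2.0, 0.0])
print(cs_holds(x_prime, y_bar))     # False -> no conclusion from Theorem 4.5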
Theorem 4.5 generalizes to arbitrary linear programs. Consider a primal-dual pair (Pmax )
and (Pmin ) as given in table 4.1. Let x̄ be a feasible solution to (Pmax ) and let ȳ be a feasible
solution to (Pmin ). We say that x̄, ȳ satisfy the Complementary Slackness (CS) conditions if:
• for every variable x̄j of (Pmax ), either x̄j = 0 or the corresponding constraint j of (Pmin ) holds with equality for ȳ, and
• for every variable ȳi of (Pmin ), either ȳi = 0 or the corresponding constraint i of (Pmax ) holds with equality for x̄.
Note that if a variable x j is free then the corresponding constraint j of (Pmin ) is an equality
constraint, hence the condition is trivially satisfied. Similarly, if yi is free then the corre-
sponding constraint i of (Pmax ) is an equality constraint. Thus we only need to state the CS
conditions for non-free variables of (Pmax ) and (Pmin ).
We leave as an exercise the following generalization of Theorem 4.5,
Theorem 4.6 (Complementary Slackness Theorem). Let (P) and (D) be an arbitrary primal-
dual pair. Let x̄ be a feasible solution to (P) and let ȳ be a feasible solution to (D). Then x̄ is
an optimal solution to (P) and ȳ is an optimal solution to (D) if and only if the Complementary
Slackness conditions hold.
It can be readily checked that x̄ = (5, −3, 0)T is a feasible solution to (P) and that ȳ =
(0, 4, −2)T is a feasible solution to (D). Theorem 4.6 states that to show that x̄ is an opti-
mal solution to (P) and that ȳ is an optimal solution to (D) it suffices to check the following
conditions:
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, all
OR conditions are satisfied. It follows that x̄ is optimal for (P) and that ȳ is optimal for (D).
Observe that in the third condition, both OR alternatives are satisfied. This does not contradict
the theorem: the CS conditions require that at least one of the alternatives holds, but they do
not preclude both alternatives from holding simultaneously.
Exercises
1. For each of (a) and (b) do the following:
(a)
max (−2, −1, 0)x
subject to
(1, 3, 2)x ≥ 5                       (P)
(−1, 4, 2)x ≤ 7
x1 ≤ 0, x2 ≥ 0
and
x̄ = (−1, 0, 3)T ȳ = (−1, 1)T .
(b)
min (−5, −7, −5)x
subject to
(1, 2, 3)x ≤ 3
(4, 5, 6)x ≤ 9                       (P)
(7, 8, 9)x ≥ 2
x1 , x2 , x3 ≥ 0
and
x̄ = (1, 1, 0)T ȳ = (−1, −1, 0)T .
2. Recall that (3.9) is the IP formulation of the shortest path problem. Let (P) denote the
linear programming relaxation of (3.9). The dual of (P) is given by (3.15).
(a) Write the complementary slackness conditions for (P) and (3.15).
(b) Using the CS theorem (theorem 4.6) prove Proposition 3.6.
3. Recall that (3.24) is the IP formulation of the minimum cost matching problem. Let (P)
denote the linear programming relaxation of (3.24). The dual of (P) is given by (3.30).
(a) Write the complementary slackness conditions for (P) and (3.30).
(b) Using the CS theorem (theorem 4.6) prove Proposition 3.8.
4. (a) Consider the LP problem max{cT x : Ax = b}. Prove that it has an optimal solution
if and only if cT is in the row space of A.
HINT: Consider the CS conditions.
(b) Consider the LP problem, max{cT x : 1T x = 250}. Under what conditions on c does
this problem have an optimal solution? Justify your answer.
max cT x
subject to (4.13)
Ax ≤ b
Let x̄ and ȳ be feasible solutions to (P) and (D) respectively. We say that x̄ and ȳ almost
satisfy the CS conditions if for every i = 1, . . . , m at least one of the following holds:
(CS1) ȳi = 0,
(CS2) rowi (A)x̄ = bi ,
(CS3) ȳi ≤ 1 and bi − rowi (A)x̄ ≤ 1/1000.
Suppose that A has 100 rows and that x̄ and ȳ almost satisfy the CS conditions.
(b) Show that the value of an optimal solution of (P) cannot exceed the value of x̄ by more
than 1/10.
(c) Show that the value of an optimal solution of (D) cannot be inferior to the value of ȳ
by more than 1/10.
6. A factory makes n products from m resources. The jth product earns a profit of c j dollars
per unit and consumes ai j units of resource i per unit. We have bi units of resource i avail-
able. The production problem is to decide how many units to produce of each product
such that total profit is maximized while not exceeding the available resources. The following
linear program (P) solves the production problem (note that x j is the number of units of
product j produced):
max ∑_{j=1}^{n} c_j x_j
subject to
∑_{j=1}^{n} a_{ij} x_j ≤ b_i    (i = 1, . . . , m)
x_j ≥ 0    (j = 1, . . . , n)
4.3.2 Geometry
Before we can characterize optimal solutions to linear programs geometrically, we will need
a number of preliminary definitions.
Let a(1) , . . . , a(k) be a set of vectors in Rn . We define the cone generated by a(1) , . . . , a(k)
to be the set
C = { ∑_{i=1}^{k} λi a(i) : λi ≥ 0 for all i = 1, . . . , k },
i.e., C is the set of all points that can be obtained by multiplying each of a(1) , . . . , a(k) by a
non-negative number and adding all of the resulting vectors together. We denote the set C by
cone{a(1) , . . . , a(k) }. Note, we define the cone generated by an empty set of vectors to be the
set consisting of the zero vector only, i.e. the set {0}.
In the following figure, we represent each of the vectors a(1) , a(2) , a(3) by an arrow from the
origin; the infinite region containing a(2) , bounded by the half-lines from the origin determined
by a(1) and a(3) , is cone{a(1) , a(2) , a(3) }.
[Figure: the vectors a(1) , a(2) , a(3) drawn in the plane, together with the cone they generate.]
We define the cone of tight constraints for x̄ to be the cone C generated by the rows of A
corresponding to the tight constraints, i.e. C = cone{rowi (A)T : i ∈ J(x̄)}.
max (3/2, 1/2)x
s.t.
(1, 0)x ≤ 2    (1)
(1, 1)x ≤ 3    (2)          (4.14)
(0, 1)x ≤ 2    (3)
Note, x̄ = (2, 1)T is a feasible solution. The tight constraints for x̄ are constraints (1) and (2).
Hence, the cone of tight constraints for x̄ is,
C := cone{ (1, 0)T , (1, 1)T }.
Theorem 4.7. Let x̄ be a feasible solution to max{cT x : Ax ≤ b}. Then, x̄ is optimal if and
only if c is in the cone of tight constraints for x̄.
Example 20, continued. The objective function of (4.14) is z = (3/2, 1/2)x, and (3/2, 1/2)T ∈ C since
(3/2, 1/2)T = ȳ1 (1, 0)T + ȳ2 (1, 1)T   for ȳ1 = 1 ≥ 0 and ȳ2 = 1/2 ≥ 0.    (4.15)
It follows from Theorem 4.7 that x̄ = (2, 1)T is an optimal solution of (4.14). We illustrate
this result in the following figure. Note, in that figure the cone of tight constraints is drawn
with its origin shifted to the point x̄.
We will use the linear program (4.14) with x̄ = (2, 1)T to illustrate one direction of the
proof of Theorem 4.7, namely that if the objective function is in the cone of the tight con-
straints for x̄, then x̄ is an optimal solution. Observe that because of Theorem 4.5 it will
suffice to,
[Figure: the feasible region of (4.14), the point x̄ = (2, 1)T, and the cone of tight constraints drawn with its origin shifted to x̄.]
Step 1. Let ȳ1 := 1, ȳ2 := 1/2, and ȳ3 := 0. Then (4.15) implies that
(3/2, 1/2)T = ȳ1 (1, 0)T + ȳ2 (1, 1)T + ȳ3 (0, 1)T = AT ȳ,   where ȳ = (ȳ1 , ȳ2 , ȳ3 )T .
Hence, ȳ = (ȳ1 , ȳ2 , ȳ3 )T ≥ 0 is a feasible solution to (4.16). In this argument we used the fact
that c = (3/2, 1/2)T is in the cone of the tight constraints to obtain values for the dual variables
of the tight constraints, and we assigned the value zero to all dual variables corresponding to
non-tight constraints.
• y1 = 0 OR x1 = 2 ✓;
• y2 = 0 OR x1 + x2 = 3 ✓;
• y3 = 0 ✓ OR x2 = 2.
We indicate by a ✓ the part of the OR condition that is satisfied. In our particular case, the CS
conditions are satisfied. Note, because of the way the dual variables were defined in Step 1, if
the constraints are not tight, then the corresponding dual variable is zero. Hence, the fact that
the CS conditions hold is no surprise.
We completed Step 1 and Step 2, hence x̄ = (2, 1)T is optimal.
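Testing whether c lies in the cone of tight constraints amounts to solving a nonnegative system, which can be sketched with a nonnegative least-squares solve: a zero residual means c is a nonnegative combination of the tight rows. The snippet below (our illustration, using scipy.optimize.nnls) does this for (4.14) at x̄ = (2, 1)T.

import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
b = np.array([2.0, 3.0, 2.0])
c = np.array([1.5, 0.5])
x_bar = np.array([2.0, 1.0])

tight = np.isclose(A @ x_bar, b)      # constraints (1) and (2) are tight at x_bar
rows = A[tight]                       # generators of the cone of tight constraints

# Find lambda >= 0 minimizing ||rows^T lambda - c||; a zero residual means c is
# in the cone of tight constraints, so x_bar is optimal by Theorem 4.7.
lam, residual = nnls(rows.T, c)
print(lam, residual)                  # approximately [1.  0.5] and 0.0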
max{cT x : Ax ≤ b} (P)
min{bT y : AT y = c, y ≥ 0} (D)
We need to show that x̄ is an optimal solution to (P). Because of Theorem 4.5 it will suffice to,
Step 1. Recall that J(x̄) denotes the row indices of the tight constraints of Ax ≤ b for x̄. Since c is
in the cone of the tight constraints for x̄, i.e. c ∈ cone{rowi (A)T : i ∈ J(x̄)}, there exist ȳi ≥ 0 for
all i ∈ J(x̄) such that
c = ∑ ( ȳi rowi (A)T : i ∈ J(x̄) ).
Let us assign the value zero to all dual variables corresponding to non-tight constraints, i.e. we
set ȳi = 0 for all row indices i ∉ J(x̄). Then
c = ∑ ( ȳi rowi (A)T : i = 1, . . . , m ) = AT (ȳ1 , . . . , ȳm )T .
Step 2. Because of the way the dual variables were defined in Step 1, for every row index i either
i ∈ J(x̄) and the constraint rowi (A)x ≤ bi is tight for x̄, or the corresponding dual variable ȳi = 0.
Hence, the CS conditions in (?) hold.
We completed Step 1 and Step 2, hence x̄ is optimal for (P).
We need to show that c is in the cone of the tight constraints for x̄. Theorem 4.5 implies that
there exists a feasible solution ȳ of (D) such that the CS conditions (?) hold for x̄, ȳ. Define
J := {i : ȳi > 0}. Then
c = AT ȳ = ∑ ( ȳi rowi (A)T : i = 1, . . . , m ) = ∑ ( ȳi rowi (A)T : i ∈ J ).
Hence, c ∈ cone{rowi (A)T : i ∈ J}. Finally, observe that if i ∈ J, then ȳi > 0 which implies by
the CS conditions (?) that constraint i of Ax ≤ b is tight for x̄. Hence, J ⊆ J(x̄), and c is in the
cone of the tight constraints for x̄.
Exercises
1. Consider the following linear program (P):
max 7x1 + 3x2
s.t.
x1 − x2 ≤ 1 (1)
x1 + x2 ≤ 5 (2)
2x1 − 3x2 ≤ 1 (3)
(a) Let I be the index set of the tight constraints for x̄ and let
Prove that (7, 3)T ∈ C and deduce that x̄ is an optimal solution to (P).
(b) Find the dual (D) of (P).
(c) Write the complementary slackness conditions for (P) and (D).
(d) Using the fact that (7, 3)T ∈ C, construct a feasible solution ȳ to (D) such that x̄ and ȳ
satisfy the complementary slackness conditions. Deduce that x̄ is an optimal solution.
Theorem 4.8 (Farkas' Lemma). Let A be an m × n matrix and let b be a vector with m entries.
Then exactly one of the following statements is true:
(1) the system Ax = b, x ≥ 0 has a solution;
(2) there exists a vector y such that AT y ≥ 0 and bT y < 0.
Proof.
Let us proceed by contradiction and suppose that there exists a solution x̄ to Ax = b, x ≥ 0 and
that we can find ȳ such that ȳT A ≥ 0T and ȳT b < 0. Since Ax̄ = b is satisfied, we must also
satisfy ȳT Ax̄ = ȳT b. Since ȳT A ≥ 0T and x̄ ≥ 0, it follows that ȳT Ax̄ ≥ 0. Then 0 ≤ ȳT Ax̄ =
ȳT b < 0, a contradiction.
Suppose now that the system Ax = b, x ≥ 0 has no solution, and consider the primal-dual pair
max{0T x : Ax = b, x ≥ 0} (P)
min{bT y : AT y ≥ 0}. (D)
By hypothesis, (P) is infeasible. Observe that (D) is feasible since AT 0 ≥ 0. It follows from the
possible outcomes for primal-dual pairs (see table 4.2) that (D) is unbounded. In particular,
there exists a feasible solution ȳ of (D) with negative value, i.e. AT ȳ ≥ 0 and bT ȳ < 0 as
required.
Recall, that a vector ȳ satisfying the conditions of Theorem 4.8(2) is called a certificate of
infeasibility. Next we show that if a system Ax = b, x ≥ 0 has no solution then by solving the
auxiliary problem with the Simplex algorithm we can obtain a certificate of infeasibility.
Suppose that Ax = b, x ≥ 0 has no solution and let us assume that b ≥ 0 (we leave it as an
exercise to modify the argument for an arbitrary vector b). Assume that A is an m × n matrix.
Consider the auxiliary problem
max{ dT (x1 , . . . , xn+m )T : [A I](x1 , . . . , xn+m )T = b, x ≥ 0 },    (Q)
where d is the vector with n + m entries such that d j = 0 for j = 1, . . . , n and d j = −1 for
j = n + 1, . . . , n + m. Let B be the optimal basis for (Q) obtained by the Simplex algorithm.
The objective function of the linear program (Q') obtained by rewriting (Q) in canonical form for
the basis B is
max ( dT − ȳT [A I] ) (x1 , . . . , xn+m )T + ȳT b =: d̄T (x1 , . . . , xn+m )T + ȳT b
for some vector ȳ (see Proposition 2.4). The value of the objective function in (Q’) for the
basic solution of B is equal to bT ȳ. Since Ax = b, x ≥ 0 has no feasible solution, that value is
negative (see Remark 2.10; note that we consider a maximization problem here rather than a
minimization problem as in Section 2.6). Hence,
bT ȳ < 0. (4.17)
Equations (4.17) and (4.18) now imply that ȳ is the required certificate of infeasibility.
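A certificate of infeasibility can also be found with an off-the-shelf LP solver: since certificates can be rescaled, a ȳ with AT ȳ ≥ 0 and bT ȳ < 0 exists if and only if the system AT y ≥ 0, bT y ≤ −1 is feasible. The sketch below (ours, using scipy.optimize.linprog on made-up data) tries the primal system first and then searches for such a ȳ.

import numpy as np
from scipy.optimize import linprog

# An infeasible system Ax = b, x >= 0 (illustrative data): the two equations
# force x1 + x2 to equal both 1 and 3.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 3.0])

primal = linprog(c=np.zeros(2), A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
print(primal.status)        # 2 -> the primal system is infeasible

# Farkas certificate: find y with A^T y >= 0 and b^T y <= -1.
# (linprog works with <= constraints, so A^T y >= 0 is written as -A^T y <= 0.)
m = A.shape[0]
cert = linprog(c=np.zeros(m),
               A_ub=np.vstack([-A.T, b.reshape(1, -1)]),
               b_ub=np.concatenate([np.zeros(A.shape[1]), [-1.0]]),
               bounds=[(None, None)] * m)
y = cert.x
print(y, A.T @ y, b @ y)    # e.g. y = (0.5, -0.5): A^T y = 0 >= 0 and b^T y = -1 < 0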
Exercises
1. (a) Prove that exactly one of the following statements holds:
• there exists x such that Ax ≤ b, x ≥ 0;
• there exists y ≥ 0 such that AT y ≥ 0 and bT y < 0.
2. Let a1 , . . . , an , b be vectors with m entries and let A be the matrix with columns a1 , . . . , an .
Consider the following theorem: exactly one of the following statements holds:
(S1) b ∈ cone{a1 , . . . , an },
(S2) there exists a halfspace H = {x : d T x ≤ 0} that contains all of a1 , . . . , an but does
not contain b.
(a) Explain what the theorem says when all vectors a1 , . . . , an , b have two entries.
(b) Show that (S1) holds if and only if there exists x such that Ax = b and x ≥ 0.
(c) Show that (S2) holds if and only if there exists y such that AT y ≥ 0 and bT y < 0.
(d) Prove the theorem above using parts (b) and (c) and Farkas’ lemma.
3. Consider the LP problems in standard inequality form, with data (A, b, c) such that A =
−AT (skew-symmetric), b = −c. Prove that the dual of such an LP problem is equivalent
to the original LP problem (such problems are called self-dual). Apply the duality theory
we developed to such LP problems to prove as many interesting properties as you can.
Chapter 5
Applications of duality*
In Chapter 3, we have seen two examples for how the framework of duality theory leads
to algorithmic insights. Both in the case of the shortest-path problem, and in that of the
minimum-cost perfect matching problem we presented efficient algorithms that are primal-
dual. Such algorithms compute feasible solutions for the respective primal problems but also,
at the same time, construct feasible solutions for the respective duals. The dual solution is
in the end used as a certificate of optimality, but crucially provides algorithmic guidance
in finding the primal solution. In this chapter, we discuss a few more select examples that
demonstrate the diverse use of duality.
of U. Each set S j has a non-negative cost cS j associated with it, and the goal is to find a
collection C ⊆ S of sets of the smallest cost such that each element ei is contained in at least
one of the sets in C .
The problem has a natural integer programming formulation. We introduce an indicator
variable xS for every set S ∈ S that takes on value 1 if set S is chosen in our solution. The IP
has a constraint for every element e ∈ U that forces a feasible solution to pick at least one set
containing the element.
min ∑ ( cS xS : S ∈ S )
subject to
∑ ( xS : S ∈ S , e ∈ S ) ≥ 1    (e ∈ U)          (5.1)
x ≥ 0, x integer
Following the proven strategy from Chapter 3, we could now simply look at the linear pro-
gramming relaxation of the above integer program, and derive its dual. We could then design
an algorithm that finds a primal set-cover solution as well as a feasible dual solution whose
value equals that of the cover. This strategy worked for the shortest-path, and minimum-cost
perfect matching problems, but it is not likely to work here! The reason is that the set-cover
problem, unlike shortest-path and matching problems, is NP-hard. Here, we simply note that
few people believe that NP-hard optimization problems possess efficient exact algorithms.
The reason is that the class contains literally thousands of equivalent hard problems that are
not believed to admit efficient algorithms. Developing such an algorithm for any one of these
problems immediately implies the existence of an algorithm for all of the problems in this
class. The set-cover problem lies in this class, and thus is unlikely to be efficiently solvable.
So, now that we know that we should not expect an efficient exact algorithm for the set-
cover problem, what can be done? We settle for the next best thing: we approximate the
problem. Suppose we are given a class I of instances of some optimization (say minimiza-
tion) problem. For a given instance I ∈ I , let optI be the instance’s unknown optimal value.
We are looking for an efficient algorithm A that, given any instance I ∈ I , computes a so-
lution A (I) whose value is at most α(I) · optI , for some function α of the instance data.
Such an algorithm is called an α-approximation algorithm, and α is sometimes called its
approximation or performance guarantee.
In the case of the set-cover problem we will present two algorithms. Both algorithms employ
linear programming duality in order to compare the computed solution to the value of
the linear programming relaxation of (5.1). The first algorithm follows a primal-dual strat-
egy, similar to what we have seen in Chapter 3. The second algorithm is a so-called greedy
algorithm, and it produces a solution whose cost can once again be bounded within a certain
factor of a feasible dual solution.
cuv − yu − yv = 0,
for all uv ∈ M. We will follow a similar strategy for the set-cover problem. The comple-
mentary slackness conditions for set-cover are as follows:
[CS1] xS > 0 =⇒ ∑ ( ye : e ∈ S ) = cS , and
[CS2] ye > 0 =⇒ ∑ ( xS : e ∈ S ) = 1.
Since the set-cover problem is NP-hard, we can not expect to find an integral solution x to
(5.1), and a feasible solution y to (5.2) such that [CS1] and [CS2] hold. We will enforce
condition [CS1] and relax condition [CS2].
Our algorithm works as follows: initially, let y = 0 be the trivial feasible solution to (5.2),
and let C = ∅ be an (initially infeasible) cover. The algorithm adds sets to C as long as there is an
element e ∈ U that is not covered by any of the sets in C . Whenever such an element e exists,
we increase the corresponding dual variable ye . In fact, we increase its value by the largest
possible amount ε that maintains y’s feasibility for (5.2); ε is the minimum slack of the dual
constraints corresponding to sets containing e:
ε = min{ cS − ∑ ( ye′ : e′ ∈ S ) : S ∈ S , e ∈ S }.
Let S be a set that attains the minimum on the right-hand side of the above expression. S
is now added to C , leading to e being covered. Furthermore, note that condition [CS1] is
satisfied by set S: in the primal solution corresponding to C , the variable for S has value 1,
and its dual constraint is tight for the updated vector y by our choice of ε. See Algorithm 6
for a formal description of the algorithm.
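Algorithm 6 is not reproduced here; the Python sketch below (our rendering of the primal-dual idea just described, with illustrative data and names) raises ye for an uncovered element by the minimum slack, adds the set whose dual constraint becomes tight, and repeats.

def primal_dual_set_cover(universe, sets, cost):
    """Primal-dual sketch: sets is a dict name -> frozenset of elements,
    cost a dict name -> non-negative cost.  Returns (cover, y)."""
    y = {e: 0.0 for e in universe}
    cover, covered = [], set()
    while covered != set(universe):
        e = next(iter(set(universe) - covered))      # some uncovered element
        # Largest increase of y_e keeping every dual constraint c_S >= sum(y).
        candidates = [S for S in sets if e in sets[S]]
        slacks = {S: cost[S] - sum(y[f] for f in sets[S]) for S in candidates}
        S_min = min(slacks, key=slacks.get)
        y[e] += slacks[S_min]      # the dual constraint of S_min is now tight ([CS1])
        cover.append(S_min)
        covered |= sets[S_min]
    return cover, y

# Tiny example (illustrative data only).
sets = {'S1': frozenset({1, 2}), 'S2': frozenset({2, 3}), 'S3': frozenset({3, 4})}
cost = {'S1': 1.0, 'S2': 3.0, 'S3': 1.0}
print(primal_dual_set_cover({1, 2, 3, 4}, sets, cost))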
We define the frequency fe of element e ∈ U to be the number of sets in S that cover the
element. The frequency f of the given set-cover instance is the maximum over all element
frequencies; i.e., f = max{ fe : e ∈ U }.
Proof. The algorithm clearly returns a feasible set-cover C if there is one. Notice that the
cost of the final cover is
∑_{S∈C} cS = ∑_{S∈C} ∑_{e∈S} ye ,    (5.3)
where the equality follows directly from condition [CS1]. Observe that a variable ye may
appear multiple times in the double-sum on the right-hand side; it appears for every set in C
that covers e. The right-hand side of (5.3) can therefore be rewritten as
∑_{e∈U} ye |{S ∈ C : e ∈ S}| ≤ f · 1T y,
where the inequality follows from the fact that every element appears in at most f sets.
In other words, the algorithm returns a vertex-cover of cost at most twice the optimal
vertex-cover, for any given instance of the problem.
than f ′ · optI for any instance I? This appears to be unlikely (more on this later), but we can
sometimes improve the guarantee obtained in the previous section.
Suppose you are faced with a set-cover instance with universe U , and sets S , all of which
have unit cost. Your goal is to find the smallest number of sets that jointly cover all elements.
What would your natural strategy be? Well, you would make locally optimal decisions: you
would first pick a set that covers most elements, in the next step you would pick a set that
covers most uncovered elements, and so on. It is easy to extend this strategy to the setting
where sets have non-unit costs. Then we would want to balance between two objectives: cover
as many previously uncovered elements as possible, and do this at the smallest possible cost. One way
of balancing the two competing objectives would be to always pick a set S ∈ S with smallest
cost divided by the number of newly covered elements. The resulting Algorithm 7 is known
as the greedy algorithm for set-cover.
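Algorithm 7 is likewise easy to sketch in code. The Python function below (ours) implements the greedy rule just described and is run on the instance of Figure 5.1 with m = 4, reproducing the Hm-versus-(1 + ε) gap discussed next.

def greedy_set_cover(universe, sets, cost):
    """Greedy sketch: repeatedly pick the set with the smallest ratio
    cost[S] / (number of still-uncovered elements of S)."""
    covered, cover = set(), []
    while covered != set(universe):
        def ratio(S):
            new = len(sets[S] - covered)
            return cost[S] / new if new else float('inf')
        S_best = min(sets, key=ratio)
        cover.append(S_best)
        covered |= sets[S_best]
    return cover

# The instance of Figure 5.1 with m = 4: set S_i covers e_i at cost 1/i,
# while the big set S covers everything at cost 1 + eps.
m, eps = 4, 0.01
sets = {f'S{i}': frozenset({i}) for i in range(1, m + 1)}
sets['S'] = frozenset(range(1, m + 1))
cost = {f'S{i}': 1.0 / i for i in range(1, m + 1)}
cost['S'] = 1.0 + eps
print(greedy_set_cover(set(range(1, m + 1)), sets, cost))
# ['S4', 'S3', 'S2', 'S1'] -- total cost H_4 ~ 2.08 versus an optimum of 1.01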
Figure 5.1: An instance of the set-cover problem. Black dots correspond to elements, and
squares to sets. Shaded set Si has cost 1/i, and the large set S has cost 1 + ε for a fixed small
ε > 0.
Let us demonstrate Algorithm 7 on the set-cover instance given in Figure 5.1. The instance
has m elements, and m + 1 sets. The shaded set Si in the figure has a cost of 1/i, for all i, and
set S has cost 1 + ε, for some tiny ε > 0. In the first step, each of the shaded sets Si covers
one element and has cost 1/i, while the large set S has cost 1 + ε. Thus, our algorithm first
chooses set Sm , as its ratio of cost to newly covered elements is smaller than that of S. Similarly,
in the second step, the algorithm chooses set Sm−1 , and so on. In the end, our algorithm will
have picked up all the shaded sets, while the optimal solution clearly is to pick just the set S.
The solution computed by the greedy algorithm has cost
Hm := ∑_{i=1}^{m} 1/i ,
compared to an optimal cost of just 1 + ε. Thus, we have shown that the greedy algorithm
can not be better than Hm -approximate. We will now show that Algorithm 7 always returns a
cover of cost at most Hm times that of an optimal solution.
Proof. We will prove this theorem by once again using linear programming duality. Suppose
Algorithm 7 picks the sets
S1 , . . . , S p ,
in this order, and let C be the cover consisting of these p sets. For all 1 ≤ i ≤ p, let us also
define
Ui := Si \ (S1 ∪ · · · ∪ Si−1 )
as the set of newly covered elements resulting from adding set Si . For 1 ≤ i ≤ p, and for all
e ∈ Ui , define
ye = cSi / |Ui | .    (5.4)
That is, we distribute the cost of set Si evenly over the elements in Ui , and thus
∑_{e∈U} ye = ∑_{i=1}^{p} cSi .    (5.5)
We will now show that y/Hm is feasible for (5.2). This together with (5.5) immediately shows
that the algorithm returns a solution of cost no more than Hm times the optimal value of (5.2).
The dual (5.2) has a constraint for every set S ∈ S that limits the y-value of elements in S
to the cost cS of the set. We will show that each such constraint is satisfied by y/Hm . Let
e1 , . . . , el
be the elements of some set S ∈ S , in the order in which they were covered by the greedy
algorithm. Hence, ei was covered at the same time or before element ei0 for any 1 ≤ i ≤ i0 ≤ l.
We claim that the value yei picked according to (5.4) satisfies
yei ≤ cS / (l − i + 1),
for all 1 ≤ i ≤ l. Suppose ei was covered in iteration r, where Algorithm 7 chose a set Sr in
step 3. Note that set S contained l − i + 1 uncovered elements when Sr was picked. The fact
that Sr was picked regardless implies
cSr / |Sr \ (S1 ∪ · · · ∪ Sr−1 )| ≤ cS / (l − i + 1).
The left-hand side of the inequality is exactly the value of yei , and this proves the claim. Using
this, the left-hand side of the dual constraint for set S can be bounded as follows:
∑ ( yei : 1 ≤ i ≤ l ) ≤ ∑ ( cS /(l − i + 1) : 1 ≤ i ≤ l ) ≤ Hm cS ,
where the final inequality follows from the fact that S contains at most m = |U | elements.
5.1.3 Discussion
The two algorithms presented in this chapter are incomparable in terms of their approxima-
tion guarantee. Algorithm 6 is obviously superior when each element lies in few sets, and
Algorithm 7 shines when the maximum frequency is high. Examples for both situations are
easy to come by, and we have in fact seen a low-frequency one in the vertex cover problem.
At present, no algorithm with a better approximation guarantee for the vertex cover problem
is known, and it is known that no α-approximation with α < 10√5 − 21 ≈ 1.36 can exist
unless NP=P [10]. If one assumes the less tried and tested Unique Games Conjecture, the
2-approximation algorithm given here is best possible unless NP=P [16].
The set-cover problem in general is harder to approximate than the vertex cover problem.
In this section we have presented two algorithms that compute covers C whose cost is within
a certain factor of the optimal value of the LP relaxation of (5.1). How good an approximation
algorithm can we obtain with such a strategy? Well, it appears that there are limits.
Let (IP) be an integer programming formulation for some given minimization problem,
and let (P) be its LP relaxation. Let I be a set of instances for this problem, and for some
I ∈ I , let optIP_I and optP_I denote the optimal values of (IP) and (P), respectively, for the given
instance I. We then define the integrality gap of (P) as
α(P) := max_{I∈I} optIP_I / optP_I .
Hence α(P) is the worst-case multiplicative gap between the fractional and integral optimal
values for (P). Notice that we know from Weak Duality Theorem 4.1 that the value of any
feasible dual solution for instance I ∈ I can not be higher than optP_I . Furthermore, the value
of an integral solution must be at least optIP_I . Thus, using LP (P), we can not hope to obtain
an α-approximation algorithm with α < α(P).
For set-cover it is easy to find example instances where (1 − ε) ln m · optP_I ≤ optIP_I for any
fixed ε > 0; see [23] for an example. This shows that the approximation guarantee obtained
by Algorithm 7 is best possible (up to constant factors). In fact, one can even show that there is
a constant c ∈ (0, 1) such that no O((1 − c) ln m)-approximation algorithm exists unless NP=P
(see [11, 20, 1], and [24]).
that b̃ is the same as b except in the ℓth component. We denote the new primal-dual pair of
problems by (P̃) and (D̃) respectively. Let us use the Complementary Slackness conditions
on (P̃) and (D̃) for the dual solution y∗ which is still feasible in (D̃). For y∗ to be optimal in
(D̃), the necessary and sufficient conditions are that there exists x̄ ∈ Rn satisfying:
(a) rowi (A)x̄ = b̃i for all i such that y∗i > 0;
(b) x̄ j = 0 for all j such that row j (AT )y∗ > c j ; and,
(c) Ax̄ ≤ b̃ and x̄ ≥ 0, i.e. x̄ is a feasible solution of (P̃).
Now, let us derive some sufficient conditions for y∗ to be optimal in (D̃). So, we set x j := 0
for every j such that row j (AT )y∗ > c j and write the linear system of equations
rowi (A)x = b̃i for all i such that y∗i > 0
in terms of the remaining x j . So, y∗ being optimal forces x j = 0 for a certain set of indices j, and
rowi (A)x = b̃i for a certain set of indices i. Assume that for the complements of these indices,
we have x∗j > 0 and rowi (A)x∗ < bi . Then, it can be proved that for every ε > 0 small enough,
the above system of linear equations has a solution x̄ satisfying the Complementary Slackness
conditions on (P̃) and (D̃) for the dual solution y∗ . Hence, we deduce that for every ε > 0
small enough, the optimal objective value of (D̃) (and hence of (P̃)) is bT y∗ + εy∗ℓ . The only
difference between (P) and (P̃) is that bℓ is increased by ε. Therefore, y∗ℓ is the maximum
unit "price" for resource ℓ that we should be willing to pay at our current (optimal) operating
conditions. We have the following result.
Theorem 5.4. Let x∗ be an optimal basic solution of max {cT x : Ax ≤ b, x ≥ 0} and let y∗
be an optimal solution of its dual. Further assume that for all i such that y∗i = 0, we have
rowi (A)x∗ < bi and for every j such that row j (AT )y∗ = c j , we have x∗j > 0. Then, for every
ε > 0 sufficiently small, the optimal value of the problem obtained by replacing bℓ by bℓ + ε
is εy∗ℓ more than that of (P).
Note that if we consider decreasing the amount of resource ℓ available by ε then the
optimal objective value goes down by εy∗ℓ . Moreover, the above assumption "for all i such
that y∗i = 0, we have rowi (A)x∗ < bi and for every j such that row j (AT )y∗ = c j , we have
x∗j > 0” (together with the assumption that x∗ is an optimal solution that is also basic) implies
that y∗ is the unique optimal solution of (D).
To recap, the dual variables correspond to some kind of “internal prices” called shadow
prices for the resources. These shadow prices depend on the current optimal production plan.
It is worth mentioning that the approach we used above can be used to derive correspond-
ing results for more complicated perturbations of b (i.e., we may perturb many elements of
b simultaneously). Moreover, by switching the roles of (P) and (D), and applying the same
mathematical reasoning, we can obtain analogous results for perturbations of c.
We may also want to consider larger perturbations. What if we wanted to analyze what
would happen to our profits as the availability of one or many of our resources were being
continuously increased? We could again use the techniques of linear optimization and duality
theory to prove that the rate of increase in our profits will be monotone decreasing per unit
increase in the availability of resources. E.g., if our profit goes up by $10 when bℓ is increased
from 500 to 501, our profit will go up by no more than $10 when bℓ is increased by one
provided bℓ ≥ 500. This allows us to prove a version of the economic principle of the law of
diminishing marginal returns in our setting.
In addition to such strong connections to mathematical economics, linear programming
also has close historical ties to economics. During the early 1900’s, in the area of mathe-
matical economics, Leontief [1905–1999] and others were working on various problems that
had connections to linear optimization. One of the most notable models is the input-output
systems. Suppose there are n major sectors of a national economy (e.g., construction, labor,
energy, timber and wood, paper, banking, iron and steel, food, real estate, . . .). Let ai j denote
the amount of inputs from sector i required to produce one unit of the product of sector j (ev-
erything is in same units, dollars). Let A ∈ Rn×n denote the matrix of these coefficients. (A is
called the input-output matrix.) Now, given b ∈ Rn , where bi represents the outside demands
for the output of sector i, we solve the linear system of equations Ax + b = x. If (A − I) is
nonsingular, we get a unique solution x determining the output of each sector. Indeed, to have
a viable system, we need every x j to be nonnegative. Otherwise, the economy requires some
imports and/or some kind of outside intervention to function properly. Leontief proposed this
model in the 1930s. Then in 1973 he won the Nobel Prize in Economics for it.
Kantorovich [1912–1986] won the Nobel Prize in Economics in 1975. Koopmans [1910–
1985], who had worked on the Transportation Problem (a generalization of the maximum-weight bi-
partite matching problem) in the 1940s as well as on input-output analysis (similar to Leon-
tief’s interests) for production systems, shared the Nobel Prize in Economics in 1975 with
Kantorovich. According to Dantzig, the Transportation Problem was formulated (and a solu-
tion method proposed) by Hitchcock in 1941. Later, the problem was also referred to as the
Hitchcock-Koopmans Transportation Problem.
The Transportation Problem together with the earlier work on electrical networks (by
mathematicians, engineers and physicists) laid some of the ground work for what was to
come in Network Flows and Network Design. Next, we will see another beautiful application
of duality theory in the area of Network Flows and Network/Graph Connectivity.
where x_a denotes the number of bits on arc a ∈ E⃗, and δ(q) and δ(q̄) denote the sets of out-arcs and in-arcs at q, respectively. A vector x that satisfies (5.6) for all q ∈ V \ {s,t} as well as 0 ≤ x ≤ c is called an s,t-flow, and we let ∑( x_a : a ∈ δ(s) ) be its value. Our integer program attempts to find an s,t-flow of maximum value:

max ∑( x_a : a ∈ δ(s) )
subject to
∑( x_a : a ∈ δ(q) ) − ∑( x_a : a ∈ δ(q̄) ) = 0     for all q ∈ V \ {s,t}          (5.7)
0 ≤ x ≤ c,  x integer
Before we continue, let us first simplify our life, and assume that vertex s has no in-arcs, and that t has no out-arcs. This assumption is easily seen to be benign: add new nodes s′ and t′, and infinite-capacity arcs s′s and tt′, to G⃗. Note that the old graph has an st-flow of value γ if and only if the new graph has an s′t′-flow of the same value.
The dual of the LP relaxation of (5.7) has a variable l_a corresponding to the capacity constraint x_a ≤ c_a for every arc a ∈ E⃗, and a variable α_q for the flow balance constraint (5.6) of each vertex q ∈ V \ {s,t}. The dual has a constraint for each arc a ∈ E⃗. Because of the simplifying assumption above, we need to distinguish between three types of arcs only: those that are not incident to s or t, those that leave s, and those that enter t.

min ∑( c_e l_e : e ∈ E⃗ )
subject to
α_u − α_v + l_uv ≥ 0      uv ∈ E⃗, u, v ∉ {s,t}          (5.8)
−α_v + l_sv ≥ 1           sv ∈ δ(s)
α_u + l_ut ≥ 0            ut ∈ δ(t̄)
l ≥ 0
Since x = 0, and l = 1, α = 0 are feasible solutions for the LP relaxation of (5.7) and for
its dual (5.8), it follows from the Fundamental Theorem of Linear Programming (Theorem
2.11) that both have optimal solutions, and that their values must be the same. We will first
show that, given that the capacities ca are integral for all a ∈ A, these optimal solutions will
be integral.
Observe that we can rewrite the LP relaxation of (5.7) as
max cT x
subject to
(5.9)
Ax = b
x ≥ 0.
where matrix A has a column for each arc in A, one row for each flow balance constraint,
and one more for each capacity constraint. We will soon see that this matrix A is in fact
totally unimodular (TU). In the following we say that a matrix B is a submatrix of A if it is
obtained by deleting some of A’s rows and columns. A matrix A is then TU if all of its square
submatrices B have determinant in {0, +1, −1}.
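The definition can be checked directly on small matrices by brute force. The sketch below is our own illustration (impractical beyond small sizes, since there are exponentially many submatrices):

import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    """Brute-force check: every square submatrix has determinant 0, +1 or -1."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(d), abs(d - 1), abs(d + 1)) > tol:
                    return False
    return True

# A small example matrix with entries in {0, +1, -1}:
print(is_totally_unimodular([[1, 0], [-1, 1], [0, -1]]))   # True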
Total unimodularity is a very useful property, as we will now see. Suppose x∗ is an optimal basic solution for (5.9) for some basis B. As usual we will use A_B for the square submatrix of A defined by the basic columns. Then we know that

x∗_B = A_B⁻¹ b.

By Cramer's rule, each entry of x∗_B is a ratio of two determinants: the numerator is the determinant of an integer matrix (A_B with one column replaced by b), and the denominator is det(A_B). Since A_B is a nonsingular square submatrix of the TU matrix A, the denominator is either 1 or −1, so each entry of x∗_B is an integer. Hence, x∗ = (x∗_B, x∗_N) is integer. We obtain the following theorem as an immediate consequence.
Theorem 5.5. Let (P) be a linear program in standard equality form whose matrix A is totally unimodular and whose right-hand side vector b is integral. Then every basic solution x of (P) is integral as well.
Corollary 5.7. The LP relaxation of (5.7) has an integral optimal solution whenever c is integral. The dual (5.8) always has integral optimal solutions. In particular, there is an optimal solution (α∗, l∗) such that l∗_e ∈ {0, 1} for every e ∈ E⃗, and the set

C := { e ∈ E⃗ : l∗_e = 1 }          (5.11)

forms an st-cut.
Proof. We already argued that (5.7) and (5.8) are both feasible, and hence both must have
optimal basic solutions. Proposition 5.6 (together with Theorem 5.5) immediately shows that
primal basic solutions must be integral. The same result now follows for the dual as the
transpose of a matrix A is TU if and only if A is.
Let (α∗, l∗) be an optimal basic solution to (5.8). Consider an st-path s, v1, v2, . . . , vp, t, and add up the dual constraints corresponding to its arcs:

−α∗_v1 + l∗_sv1 ≥ 1
α∗_v1 − α∗_v2 + l∗_v1v2 ≥ 0
α∗_v2 − α∗_v3 + l∗_v2v3 ≥ 0
. . .
α∗_v(p−1) − α∗_vp + l∗_v(p−1)vp ≥ 0
α∗_vp + l∗_vpt ≥ 0
──────────────────────────────
l∗_sv1 + l∗_v1v2 + · · · + l∗_vpt ≥ 1
This shows that all st-paths P must contain at least one arc e with l∗_e ≥ 1. Let C be an inclusion-wise minimal st-cut contained in

{ e ∈ E⃗ : l∗_e ≥ 1 };

i.e., choose C such that none of its proper subsets is an st-cut. Let S be the set of vertices that are reachable from s in the graph obtained from G⃗ by removing the arcs in C, and define a candidate solution for (5.8) as follows: let l_e = 1 if e ∈ C, and l_e = 0 otherwise; α_v = −1 if v ∈ S, and α_v = 0 otherwise. We now verify that (α, l) satisfies the constraints of (5.8).
Consider the dual constraint for an arc uv ∈ E⃗ with u, v ∈ V \ {s,t}. If u, v are both in S, or both in V \ S, then α_u − α_v = 0 and the constraint is satisfied. If u ∈ V \ S and v ∈ S, then α_u − α_v = 1, and the constraint is once again satisfied. Finally, if u ∈ S and v ∈ V \ S, then uv must belong to C (otherwise v would be reachable from s), and hence α_u − α_v + l_uv = −1 − 0 + 1 = 0.
It is equally routine to check that the dual constraints for arcs incident to s or t are satisfied. The objective value of (α, l) is ∑( c_e : e ∈ C ), which is no more than that of (α∗, l∗) because l∗_e ≥ 1 = l_e for every e ∈ C and l∗, c ≥ 0. Hence (α, l) is also optimal for (5.8).
Corollary 5.8 (Max-Flow Min-Cut). The maximum value of an st-flow is equal to the minimum capacity of an st-cut.
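As an illustration of the theorem, one can solve the LP relaxation of (5.7) on a small digraph and compare it with the capacity of an st-cut. The sketch below is our own example (vertices s, a, b, t and five arcs with hypothetical capacities) using scipy.optimize.linprog:

from scipy.optimize import linprog

# Arc order: sa, sb, ab, at, bt, with capacities:
cap = [3, 2, 1, 2, 3]

# Maximize flow out of s  ->  minimize its negation.
obj = [-1, -1, 0, 0, 0]

# Flow conservation at the intermediate vertices a and b: out - in = 0.
A_eq = [[-1,  0, 1, 1, 0],   # vertex a: (x_ab + x_at) - x_sa = 0
        [ 0, -1,-1, 0, 1]]   # vertex b:  x_bt - (x_sb + x_ab) = 0
b_eq = [0, 0]

res = linprog(obj, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, u) for u in cap], method="highs")
print("max flow value:", -res.fun)   # 5.0
# The st-cut delta({s}) = {sa, sb} has capacity 3 + 2 = 5, matching the flow value.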
Exercises
1. Consider the LP problem max{cT x : Ax ≤ b, x ≥ 0}. Suppose it has an optimal solution x∗ .
(a) Define b(θ ) := b + θ e1 and let (Pθ ) be the LP problem obtained from the above by
replacing b by b(θ ). Prove that for all θ ≥ 0, (Pθ ) has a basic feasible solution that is
optimal.
(b) Let z∗ (θ ) denote the optimal value of (Pθ ). Prove that z∗ : R+ → R is a piecewise linear
function of θ .
(c) Prove that the set

{ (θ, z)ᵀ ∈ R² : θ ≥ 0, z ≤ z∗(θ) }

is a convex set.
2. We discussed the Shortest Paths problem as well as the Maximum Flow problem. In this
exercise, we will try to formulate the Shortest Paths problem using a network flow idea.
Given s and t, consider formulating the problem of finding a shortest path from s to t as the
problem of finding one unit flow from s to t of minimum cost, where the cost of sending
one unit of flow along an edge is the length of that edge. Note that for every node except s
and t in the graph, we must have flow conservation (if there is flow coming into the node,
it must leave the node). For s, one unit of flow must leave it; and one unit of flow must
arrive at t. Using the variables

x_ij := 1 if there is a unit flow on the arc (i, j), and x_ij := 0 otherwise,
and the above reasoning, formulate the shortest path problem as an integer programming
problem. Then consider the LP relaxation of your integer programming problem, write
down the dual and interpret the strong duality theorem for this pair of primal-dual LP
problems.
Chapter 6

Solving integer programs
In Chapter 2, we have seen how to solve linear programs using the Simplex algorithm, an algorithm that is still widely used in practice. In Chapter 3 we discussed efficient algorithms for the special class of integer programs describing the shortest path problem and minimum cost perfect matchings in bipartite graphs. In both of these examples it suffices to solve the linear programming relaxation of the problem.
Algorithms for general integer programs follow two broad strategies. The first attempts to reduce integer programs to linear programs; this is known as the cutting plane approach and is described in Section 6.2. The other strategy is a divide-and-conquer approach known as branch and bound, which is discussed in Section 6.3. In practice both strategies are combined under the heading of branch and cut; this remains the preferred approach in general-purpose commercial codes.
In this chapter, in the interest of simplicity, we restrict our attention to pure integer programs, where all the variables are required to be integer. The theory developed here extends to mixed integer programs, where only some of the variables are required to be integer, but that material is beyond the scope of this book.
Then the convex hull of S is given by the triangle with vertices x(1), x(2), x(3), as indicated in the next figure. Note that this triangle is convex and contains S. Moreover, any convex set that contains S must contain all the points in this triangle.
[Figure: the points of S and their convex hull conv({x(1), x(2), x(3)}), a triangle in the (x1, x2)-plane.]
The polyhedron P is represented in the next figure. Each of the constraints (1), (2), (3) corresponds to a halfspace as indicated in the figure. Let us define S to be the set of all integer points in P. Note that in this case S is an infinite set of points. Then the convex hull of S is described by another polyhedron

Q = { (x1, x2)ᵀ :  −x1 ≤ −2 (a),   x1 − x2 ≤ 2 (b),   −x2 ≤ −1 (c) }.
The polyhedron Q is represented in the same figure. Each of the constraints (a), (b), (c) corresponds to a halfspace as indicated in the figure.
[Figure: the polyhedra P and Q, with the halfspaces (1), (2), (3) defining P and (a), (b), (c) defining Q.]
In Example 22 we saw that the convex hull of the set of all integer points in a polyhedron is a polyhedron. This is no accident; indeed, we have the following deep result relating feasible solutions of integer programs to feasible solutions of linear programs.

Theorem 6.1. Let A be a matrix and b a vector, all of whose entries are rational, and let S be the set of integer points x satisfying Ax ≤ b. Then conv(S) is a polyhedron; i.e., conv(S) = {x : A′x ≤ b′} for some rational matrix A′ and rational vector b′.

We omit the proof of this result in this book. The condition that all entries of A and b be rational numbers cannot be omitted from the hypothesis.
max cᵀx
subject to
Ax ≤ b                    (6.1)
x integer
Let us assume that all entries in the matrix A and the vector b are rational. This is a natural
assumption since numbers stored on a computer can only be recorded with finite precision and
hence are rational. Let S denote the set of all integer points satisfying the constraints Ax ≤ b,
i.e. S is the set of feasible solutions to the integer program (6.1). It follows from Theorem 6.1
that the convex hull of S is a polyhedron, i.e. that conv(S) = {x : A0 x ≤ b0 } for some matrix
A0 and vector b0 where all entries of A0 and b0 are rational.
Let us define the following linear program,
max cT x
subject to (6.2)
A0 x ≤ b0
Theorem 6.2. The following hold for the integer program (6.1) and the linear program (6.2),
(d) every optimal solution to (6.2) that is an extreme point is an optimal solution to (6.1).
Example 23. We illustrate the previous theorem. Suppose that (6.1) is of the form,
We describe the feasible region of (IP) in the next figure. Each of the constraints (1), (2), (3), (4) corresponds to a halfspace as indicated in the figure. Thus the set of all feasible solutions to (IP) is given by

S = { (1,1)ᵀ, (1,2)ᵀ, (1,3)ᵀ, (2,1)ᵀ, (2,2)ᵀ, (3,1)ᵀ }.
[Figure: the feasible region of (IP) bounded by the halfspaces (1)–(4), the integer points of S, and the polyhedron described by the constraints (a), (b), (c).]
Theorem 6.2 tells us that, in a theoretical sense, Integer Programming reduces to Linear Programming: it suffices to compute a basic optimal solution to (6.2). The catch is that the system A′x ≤ b′ in the linear program (6.2) is in general much larger (exponentially larger) than the system Ax ≤ b in the integer program (6.1), hence it cannot be described completely in practice. What we do instead is try to approximate the description of the polyhedron {x : A′x ≤ b′} using a limited number of constraints. One way of proceeding is described in the next section.
Exercises
1. Consider the IP,
max x1 + 10x2
subject to
x1 − 20x2 ≤ 0
x1 + 20x2 ≤ 20
x ≥ 0, x integer
(a) What is the geometrical distance, in R2 , between the optimal solution of the IP and of
its LP relaxation? Explain.
(b) Give an example of an IP for which the geometrical distance between its optimal solu-
tion, and that of its LP relaxation, is at least 100.
(c) Give an example of an IP which has no feasible integer solutions, but its LP relaxation
has a feasible set in R2 of area at least 100.
(a) Graph the feasible region of the LP relaxation of IP, indicate the feasible solutions of
the IP on the graph. Then, find the optimal solutions of the LP relaxation and the IP.
(b) Focus on the optimal solution of the LP relaxation. What are the closest (in the sense
of Euclidean distance) feasible solutions of the IP? Are any of these closest IP feasible
solutions optimal for the IP? Now, replace the objective function with max x1 + 8.5x2 .
Answer the same questions again. What related conclusions can you draw from these
examples?
(c) Create a family of IP problems with two nonnegative integer variables parameterized
by an integer k ≥ 10 (in the above example k = 10) generalizing the above example
and its highlighted features. Then answer the parts (a) and (b) for all values of k.
As we do not know how to deal with the integrality conditions, we shall simply ignore them.
Thus, we solve the linear program relaxation (LP1) obtained by removing integrality condi-
tions from (IP). We obtain the optimal solution x(1) = (8/3, 4/3)T . Unfortunately, x(1) is not
integral, thus it is not a feasible solution to (IP).
We wish to find a valid inequality (?) which satisfies the following properties:
(I) (?) is valid for (IP), i.e. every feasible solution of (IP) satisfies (?), and
(II) x(1) does not satisfy (?).
An inequality that satisfies both (I) and (II) is called a cutting plane for x(1). We will discuss
in the next section how to find such a cutting plane, but for the time being, let us ignore that
problem and suppose that we are simply given such an inequality,
x1 + 3x2 ≤ 6. (?)
We add (?) to the system (LP1) to obtain a new linear program (LP2). Because (?) satisfies (I),
every feasible solution to (IP) is a feasible solution to (LP2). Moreover, (II) implies that x(1)
is not feasible for (LP2), so in particular the optimal solution to (LP2) will be distinct from
x(1) . In our case (LP2) has an optimal solution x(2) = (3, 1)T . Note, that x(2) is integral. Since
it maximizes the objective function over all feasible solutions of (LP2) it also maximizes the
objective function over all feasible solutions of (IP), hence x(2) is optimal for (IP).
We describe the feasible region of (LP1) in figure (i). Each of the constraints (1), (2) corresponds to a halfspace as indicated in the figure. We also indicate the optimal solution x(1) = (8/3, 4/3)ᵀ. Since x(1) was not integral we added the cutting plane (?): x1 + 3x2 ≤ 6 and obtained (LP2). We describe the feasible region of (LP2) in figure (ii). Constraint (?) corresponds to a halfspace as indicated in the figure. We also indicate the optimal solution x(2) = (3, 1)ᵀ of (LP2).

[Figure: (i) the feasible region of (LP1) with constraints (1), (2) and the optimal solution x(1); (ii) the feasible region of (LP2) after adding the cutting plane (?), with the optimal solution x(2).]
Let us formalize the procedure. Consider the following pair of optimization problems,
max{cT x : x ∈ S1 }, (P1)
max{cT x : x ∈ S2 }. (P2)
If S2 ⊇ S1 then we say that (P2) is a relaxation of (P1). For instance, (P1) may be an integer
program and (P2) its linear programming relaxation.
(b) if x̄ is optimal for (P2) and x̄ is feasible for (P1), then x̄ is optimal for (P1).
We can now describe our algorithm. Suppose we wish to solve the integer program,
max cT x
subject to
(IP)
Ax ≤ b
x ≥ 0, x integer.
Correctness of the algorithm follows from Remark 6.3. We did not discuss the possibility that (LP) might be unbounded. In that case we can only conclude that one of the following holds: either (IP) is unbounded or (IP) is infeasible; we do not know which case occurs. In practice we are often in situations where we know that (IP) is not unbounded.
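The loop just described can be summarized in a short sketch. This is our own skeleton, not the book's pseudocode; find_cut is a placeholder for a cut-generation routine such as the one derived next, and scipy.optimize.linprog is used for the LP relaxations.

import numpy as np
from scipy.optimize import linprog

def cutting_plane(c, A, b, find_cut, max_rounds=50, tol=1e-6):
    """Skeleton of the cutting-plane method for max c^T x, Ax <= b, x >= 0, x integer.
    find_cut(x) must return (a, beta) with a^T x <= beta valid for the IP but
    violated by x, or None if it cannot find such a cut."""
    A, b = np.array(A, float), np.array(b, float)
    for _ in range(max_rounds):
        res = linprog(-np.array(c, float), A_ub=A, b_ub=b, method="highs")
        if not res.success:
            return None                      # relaxation infeasible (or unbounded)
        x = res.x
        if np.all(np.abs(x - np.round(x)) <= tol):
            return x                         # integral: optimal for the IP
        cut = find_cut(x)
        if cut is None:
            return None                      # give up
        a, beta = cut
        A = np.vstack([A, a])                # add the cutting plane and re-solve
        b = np.append(b, beta)
    return None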
We have yet to show how to find cutting planes. Let us revisit Example 24 from the previous section. After introducing a slack variable x3 for constraint (1) and x4 for constraint (2), we obtain a problem (IP’) in standard equality form. Solving the LP relaxation (LP1’) of (IP’) with the simplex algorithm and rewriting it in canonical form for the optimal basis, the corresponding basic solution is (8/3, 4/3, 0, 0)ᵀ. It implies that (8/3, 4/3)ᵀ is the optimal solution for (LP1) in Example 24. We will use this canonical form to find a cutting plane. Consider any constraint whose right-hand side is fractional. In this case we have a choice, and we select the first constraint, namely,
x1 − (1/3) x3 + (4/3) x4 = 8/3.

Then, trivially, the following constraint holds for every feasible solution to (LP1’):

x1 − (1/3) x3 + (4/3) x4 ≤ 8/3.
Observe that every variable is non-negative; thus, if we replace any of the coefficients of the variables in the previous inequality by a smaller value, the resulting inequality is still valid for (LP1’). In particular,

x1 + ⌊−1/3⌋ x3 + ⌊4/3⌋ x4 = x1 − x3 + x4 ≤ 8/3

is valid for (LP1’), where ⌊α⌋ denotes the largest integer smaller than or equal to α. Let x̄ be any feasible solution to (IP’). Then x̄ is a feasible solution to (LP1’) and x̄1 − x̄3 + x̄4 ≤ 8/3. Since x̄ is integer, the left-hand side is an integer, and therefore

x̄1 − x̄3 + x̄4 ≤ ⌊8/3⌋ = 2.          (6.3)
Moreover, (8/3, 4/3, 0, 0)ᵀ does not satisfy (6.3). Hence, (6.3) is a cutting plane. Recall that (IP’) implies that x3 = 8 − x1 − 4x2 and x4 = 4 − x1 − x2. Substituting this in (6.3) yields the constraint x1 + 3x2 ≤ 6, which is the cutting plane (?) given in Example 24.
To proceed further we add a slack variable x5 to the constraint (6.3) and modify the linear programming relaxation (LP1’) of (IP’) by adding the resulting constraint; call the new linear program (LP2’).
We solve (LP2’) using the 2-phase simplex algorithm (we need to use phase I, since we do not
have a feasible basis), and rewrite (LP2’) in canonical form for the optimal basis B = {1, 2, 3}
to obtain,
max 11 + (0, 0, 0, −1/2, −3/2) x
subject to
[ 1  0  0   3/2  −1/2 ]       [ 3 ]
[ 0  1  0  −1/2   1/2 ]  x =  [ 1 ]
[ 0  0  1   1/2  −3/2 ]       [ 1 ]
x ≥ 0.
The basic optimal solution is (3, 1, 1, 0, 0)T . Since it is integer, it follows (see Remark 6.3)
that it is an optimal solution to (IP’). Note that it corresponds to the optimal solution (3, 1)T
of (IP) in example 24.
Let us generalize this argument. Consider an integer program,
max cT x
subject to
(IP)
Ax = b
x ≥ 0, integer.
We solve the linear programming relaxation (LP) of (IP) using the simplex procedure. If (LP) has no solution, then neither does (IP). Suppose we obtain an optimal basis B of (LP). We rewrite (LP) in canonical form for B to obtain a linear program of the form

max z̄ + c̄_Nᵀ x_N
subject to
x_B + Ā_N x_N = b̄
x ≥ 0.
The corresponding basic solution x̄ is given by x̄B = b̄ and x̄N = 0. If b̄ is integer, then so is x̄,
and x̄ is an optimal solution to (IP), see Remark 6.3. Thus, we may assume that b̄i is fractional
for some index i. Constraint i is of the form

x_i + ∑_{j∈N} Ā_ij x_j = b̄_i.
As this constraint is valid for (LP) with equality, it is also valid with ≤. Observe that every variable is non-negative; hence the following inequality is valid for (LP):

x_i + ∑_{j∈N} ⌊Ā_ij⌋ x_j ≤ b̄_i.
For x̄ feasible for (IP), the left-hand side of the previous inequality is an integer. Hence, the following constraint is valid for (IP):

x_i + ∑_{j∈N} ⌊Ā_ij⌋ x_j ≤ ⌊b̄_i⌋.          (?)
Remark 6.4. Constraint (?) is a cutting plane for the basic solution x̄.
Proof. Since x̄_j = 0 for all j ∈ N, evaluating the left-hand side of (?) at x̄ gives x̄_i = b̄_i. As b̄_i is fractional, b̄_i > ⌊b̄_i⌋. Hence the left-hand side is larger than the right-hand side at x̄, so x̄ violates (?).
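The derivation above translates directly into a small routine. The sketch below is our own illustration; applied to the row x1 − (1/3)x3 + (4/3)x4 = 8/3 from the example it reproduces the cut (6.3).

import math

def gomory_cut(row_coeffs, rhs, basic_index, nonbasic_indices):
    """Given the canonical-form row  x_i + sum_{j in N} Abar_ij x_j = bbar_i
    with bbar_i fractional, return the cut
        x_i + sum_{j in N} floor(Abar_ij) x_j <= floor(bbar_i)
    as (coefficient dict keyed by variable index, right-hand side)."""
    cut = {basic_index: 1.0}
    for j, a in zip(nonbasic_indices, row_coeffs):
        cut[j] = math.floor(a)
    return cut, math.floor(rhs)

# Row x1 - (1/3) x3 + (4/3) x4 = 8/3 (nonbasic variables x3, x4):
print(gomory_cut([-1.0 / 3.0, 4.0 / 3.0], 8.0 / 3.0, 1, [3, 4]))
# -> ({1: 1.0, 3: -1, 4: 1}, 2), i.e.  x1 - x3 + x4 <= 2, which is (6.3).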
Exercises
1. Suppose that we solve a linear programming relaxation (P) of an integer program (where all variables are required to be integer), and that we rewrite (P) in canonical form for the optimal basis B = {1, 2, 3} of (P). Which constraints of the canonical form lead to cutting planes? For each such constraint generate a cutting plane.
Let (D) be the dual of (P). Suppose you found an optimal solution x̄ of (P) and an optimal
solution ȳ of (D). You are being told however, that you forgot to include one important
constraint,
∑_{j=1}^{n} a_j x_j ≤ β.          (?)
(a) Construct a new linear program (P’) by adding a slack variable to (?) and by adding
the resulting constraint to (P).
(b) Show that if x̄ satisfies (?) then x̄ is an optimal solution to (P’).
(c) Compute the dual (D) of (P) and the dual (D’) of (P’).
(d) Show how to compute a feasible solution y0 of (D’) from your solution ȳ of (D). Note,
this works whether or not x̄ satisfies (?).
Denote by (P2) the linear program in Standard Equality Form obtained from (P1) by adding
constraint (?) and by adding a slack variable x4 .
(c) Solve (P2) with the Simplex, starting with the basis B = {3, 4}.
Denote by x2 the optimal basic solution.
4. Let
x_k + ∑_{j∈N} ā_j x_j = b̄_k          (k ∉ N)

be one of the constraints of the LP in canonical form for an optimal basis B (where N is the index set of the nonbasic variables), and suppose that b̄_k is not an integer. Define

N∗ := { j ∈ N : ā_j is not an integer }.

Prove that

∑_{j∈N∗} x_j ≥ 1

is a valid inequality for the underlying IP and that it cuts off the current optimal solution of the LP relaxation. Compare this cut to the cut

∑_{j∈N} x_j ≥ 1.

(Also prove that this is a valid inequality for the underlying IP and that it cuts off the current optimal solution of the LP relaxation.) Which one do you think is a better cut? Explain why and prove your claims.
5. Let G = (V, E) be a graph with edge weights we for all e ∈ E. Recall that (3.24) is the
IP formulation of the minimum weight matching problem. Let S ⊆ V be a subset of the
vertices of G where |S| is odd.
(a) Show (by combining constraints and using rounding) that the following inequality is valid for the (IP), i.e. that all feasible solutions to the (IP) satisfy this constraint:

∑( x_ij : ij ∈ E, i ∈ S, j ∈ S ) ≤ ⌊ |S| / 2 ⌋.

In other words, the sum of all variables corresponding to edges with both ends in S is at most the number of vertices in S divided by two and rounded down.
6.3 Branch & bound

In this section, we introduce a solution method for integer programs called branch & bound. It is a “divide and conquer” approach for solving linear integer programming problems. We illustrate some of the main ideas using a production example.
Every year, the University of Waterloo readies itself for the annual homecoming festivities. The university expects hundreds of alumni and their families to visit its campus for the
occasion, and campus stores in particular expect to be busier than usual. Two items promise
to be in especially high demand during these times: the university’s infamous Pink Tie, as
well as the (almost) equally popular Pink Bow-Tie. With only a few weeks to go until homecoming, it is high time to replenish stocks of these two items. The university manufactures its ties from its own special cloth, of which it currently has 45 ft² in stock. Producing 10 pink bow ties needs 9 ft² of cloth, and producing 10 pink ties requires 11 ft² of the raw material. In
addition to cloth, the manufacturing process also requires labour which is particularly short
during the next few weeks: only a total of 6 work days are available. Manufacturing of the
ties is done in batches of 10. Producing 10 pink bow ties takes 1.5 days, and producing 10
pink ties takes a day. The university expects a profit of $6 for a pink bow tie, and a profit of
$5 for a pink tie. How much of each product should it produce?
Just like in Chapter 1, we can formulate this problem as an integer program. We introduce
variables x and y for 10’s of bow-ties and ties to produce. The following integer program has
constraints restricting the available labour and raw material, and its objective function is the
total profit for the given production vector.
max 60 x + 50 y (6.4)
subject to 1.5 x + y ≤ 6
9 x + 11 y ≤ 45
x, y ≥ 0
x, y integer. (6.5)
How can we solve this integer program? Let us use what we know: linear programming!
We drop the integrality constraints (6.5) and obtain the linear programming relaxation (6.4 −
LP) of (6.4). Remark 6.3(2) states that if we solve the LP relaxation of an integer program,
and obtain a solution, all of whose variables are integers, then it is also an optimal solution of
the original IP! Unfortunately, solving (6.4 − LP) gives the solution x = 2.8, y = 1.8 of value
258 which has fractional components. Remark 6.3 implies, however, that no integral feasible
solution for (6.4) can have value greater than 258, and hence we have found an upper bound
on the maximum value of any feasible solution of the original IP.
We will now use the fractional solution to partition the feasible region. In the following,
we let Subproblem 1 denote the original LP relaxation of the IP. We observe that every feasible
solution for (6.4) must have either x ≤ 2 or x ≥ 3, and that the current fractional solution does
not satisfy either one of these constraints. Thus, we now branch on variable x and create two
additional subproblems:
Neither of the two subproblems contains the point (2.8, 1.8), and the optimal solution of the original LP relaxation therefore cannot recur when solving either of them. We now choose one of the two subproblems to process next. Arbitrarily, we
pick subproblem 2. Solving the problem yields the optimal solution x = 3, and y = 1.5 with
value 255. This solution is still not integral (as y is fractional), but 255 gives us a new, tighter
upper bound on the maximum value of any integral feasible solution for this subproblem.
The subproblem structure explored by the branch & bound algorithm is depicted in Figure
??. Each of the subproblems generated so far is shown as a box which are referred to as
branch & bound nodes. The two nodes for subproblems 2 and 3 are connected to their parent
node, and the corresponding edges are labeled with the corresponding constraints that were
added to subproblem 1. The entire figure is commonly known as the branch & bound tree
generated by the algorithm.
The optimal solution for Subproblem 2 is still not integral as the value of y is fractional.
We branch on y, and generate two more subproblems:
Running the Simplex algorithm on Subproblem 5, we quickly find that the problem is infeasible. Hence, this subproblem has no fractional feasible solution, and thus no integral one either. Solving Subproblem 4, we find the solution x = 10/3 and y = 1, of value 250. We generate two more subproblems by branching on x.
and thus, will not find a solution that is better within this subproblem. We therefore might as
well stop branching here!
Formally, whenever the optimal value of the current subproblem is at most the value of the
best integral solution that we have already found, then we may stop branching at the current
node. We say: we prune the branch of the branch & bound tree at the current node.
We are done as no unexplored branches of the branch & bound tree remain. The final
tree is shown in Figure ??. The optimal solution to the original IP (6.4) is therefore x = 4,
y = 0 and achieves a value of 240. Therefore, in order to optimize profit, the university is best
advised to invest all resources into the production of pink bow-ties!
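The whole procedure fits in a short recursive sketch. The code below is our own minimal illustration (depth-first, branching on the first fractional variable, LPs solved with scipy.optimize.linprog); on the data of (6.4) it recovers x = 4, y = 0 of value 240.

import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b, tol=1e-6):
    """Tiny LP-based branch & bound for  max c^T x, Ax <= b, x >= 0, x integer."""
    best = [-math.inf, None]

    def solve_node(lower, upper):
        res = linprog(-np.array(c, float), A_ub=A, b_ub=b,
                      bounds=list(zip(lower, upper)), method="highs")
        if not res.success:
            return                             # infeasible node: prune
        value, x = -res.fun, res.x
        if value <= best[0] + tol:
            return                             # bound no better than incumbent: prune
        frac = [j for j in range(len(x)) if abs(x[j] - round(x[j])) > tol]
        if not frac:
            best[0], best[1] = value, np.round(x)   # integral: new incumbent
            return
        j = frac[0]                            # branch on a fractional variable
        lo, up = list(lower), list(upper)
        up[j] = math.floor(x[j])
        solve_node(lower, up)                  # subproblem with x_j <= floor
        lo[j] = math.ceil(x[j])
        solve_node(lo, upper)                  # subproblem with x_j >= ceil

    n = len(c)
    solve_node([0.0] * n, [None] * n)
    return best[0], best[1]

# The pink-tie example (6.4):
print(branch_and_bound([60, 50], [[1.5, 1], [9, 11]], [6, 45]))
# -> (240.0, array([4., 0.]))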
We conclude this section with a brief discussion. The branch & bound algorithm dis-
cussed here can be viewed as a smart enumeration algorithm: it uses linear programming in a
smart way, partitioning the space of feasible solutions into mutually exclusive and exhaustive
regions. For example, in the pink tie example discussed above, it is instructive to see that any
feasible integral solution occurs in one of the subproblems corresponding to the leaves of the
tree in Figure ??. The algorithm is smart as it uses the optimal LP value of a subproblem to
possibly prune it. Sometimes, such pruning can save exploring a large number of potential
solutions hidden in a subproblem.
The algorithm as described is quite flexible in many ways, and in our description, we
made many arbitrary choices. For example, if we solve a subproblem, and the solution has
more than one fractional variable, how do we decide which variable to branch on? And once
we branched on a variable, which of the generated problems do we solve next? The depth-
first search style exploration chosen in our example, where newly generated subproblems
are explored first, is popular as it leads to integral solutions quickly. However, many other
strategies have been analyzed.
Why do we not simply present the strategy that works best? Well, such a strategy likely
does not exist. Solving integer programs in general is N P-hard, and very likely, no efficient
algorithm exists. In practice, however, branch & bound is nearly always superior to simple
enumeration of feasible solutions, and is used in some form or the other in most commercial
codes.
We conclude with one last comment regarding implementation. In this section, we reduced
the problem of finding an optimal production plan for a tie production problem to that of
solving a series of 9 linear programming problems. The reader may notice that these problems
are extremely similar! I.e., branching on a variable merely adds a single constraint to an LP
for which we know an optimal basic solution. Can we use such an optimal basic solution for
a parent problem to compute solutions for the generated subproblems faster? The answer is
yes! The so called dual Simplex algorithm applies in situations where a single constraint is
added to a linear program, and where the old optimal solution is rendered infeasible by this
new constraint. In many cases, the algorithm reoptimizes quickly from the given infeasible
starting point.
Exercises
1. Suppose we use a branch and bound scheme to solve an integer program that is a maxi-
mization problem. The following figure describes the partial enumeration tree. For each
subproblem we indicate the value of the objective function as well as whether the solution
is an integer solution or not. Indicate in the figure which subproblems we still need to
solve (if any). In particular, can we already deduce the value of the optimal solution to the
integer program?
[Figure: partial branch & bound enumeration tree. The root has z = 60 (fractional) and is branched on x4 (x4 ≤ 4 / x4 ≥ 5), giving nodes with z = 52 (fractional) and z = 57 (fractional); further branchings on x6 (≤ 6 / ≥ 7), x3 (≤ 9 / ≥ 10), x8 (≤ 2 / ≥ 3) and x5 (≤ 0 / ≥ 1) appear deeper in the tree, with leaf values z = 45 (fractional), z = 42 (integer), z = 46 (fractional) and z = 46 (integer).]
You do not need to use the simplex algorithm to find a solution to each of the relaxations. We illustrate a simple technique on the relaxation of the original problem. Consider the ratios

18/12 > 10/10 > 6/8 > 4/6.

To maximize the objective function, we set x1 as large as possible, then x2 as large as possible, and so on. In this case this yields x1 = 1, x2 = 6/10 and x3 = x4 = 0. Proceed in a similar way to solve each of the other relaxations.
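The greedy rule in this hint can be written as a small routine. The sketch below is our own; the capacity value 18 is an assumption chosen only so that the numbers above are reproduced (the exercise's actual data is not shown here).

def greedy_fractional(values, weights, capacity):
    """Greedy solution of a knapsack-type LP relaxation
    max v^T x, w^T x <= capacity, 0 <= x <= 1: fill variables in decreasing
    order of v_j / w_j, taking a fractional amount of the first item that
    no longer fits."""
    order = sorted(range(len(values)), key=lambda j: values[j] / weights[j],
                   reverse=True)
    x, room = [0.0] * len(values), capacity
    for j in order:
        take = min(1.0, room / weights[j])
        x[j], room = take, room - take * weights[j]
        if room <= 0:
            break
    return x

# With ratios 18/12 > 10/10 > 6/8 > 4/6 and a hypothetical capacity of 18:
print(greedy_fractional([18, 10, 6, 4], [12, 10, 8, 6], 18))   # [1.0, 0.6, 0.0, 0.0]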
where n is an odd positive integer. Prove that the branch and bound algorithm (without using cuts) will have to examine at least 2^⌊n/2⌋ subproblems before it can solve the main problem, in the worst case.
Nonlinear optimization
In this chapter, we will show that solving a general NLP is likely to be difficult, even when the problem is small. One reason is that the feasible region (the set of all feasible solutions) of an NLP is not always convex. We therefore turn our attention to the special case where the feasible region is convex. We discuss optimality conditions and give a brief overview of a primal-dual polynomial-time algorithm for linear programming based on ideas from nonlinear optimization.
Key concepts covered in this chapter include: convex functions, epigraphs, subgradients,
Lagrangians, Karush-Kuhn-Tucker Theorem and primal-dual interior-point algorithms.
Example 25. Suppose n = 2 and m = 4 in (1.35) and that for x = (x1, x2)ᵀ ∈ R² we have

f(x) := x2, and
g1(x) := −x1² − x2 + 2,   g2(x) := x2 − 3/2,   g3(x) := x1 − 3/2,   g4(x) := −x1 − 2.
The feasible region is a subset of R². It corresponds to the union of the two shaded regions in Figure 7.1. For instance, g1(x) = −x1² − x2 + 2 ≤ 0, or equivalently, x2 ≥ 2 − x1². Thus the solution to the first constraint of the NLP is the set of all points above the curve indicated by g1 in the figure. As we are trying to find, among all points x = (x1, x2)ᵀ in the feasible region, the one that minimizes f(x) = x2, the unique optimal solution will be the point a = (−2, −2)ᵀ.
Observe that the feasible region is not convex, indeed it is not even “connected” (i.e. there
does not exists a continuous curve contained in the feasible region joining any two points of
the feasible region).
[Figure 7.1: the non-convex feasible region determined by g1, g2, g3, g4, with the locally optimal point b and the globally optimal point a = (−2, −2)ᵀ.]
Example 26. Suppose n = 2 and m = 3 in (1.35) and that for x = (x1 , x2 )T ∈ R2 we have,
[Figure: the feasible region of the NLP in Example 26, determined by g1, g2, g3, together with the point a = (1, 1)ᵀ.]
7.2.1 NP-hardness
Nonlinear optimization programs can be very hard in general. For instance, suppose that for
every j ∈ {1, . . . , n} we have the constraints,
x_j² − x_j ≤ 0,
−x_j² + x_j ≤ 0.

Then the constraints define the same feasible region as the quadratic equations x_j² = x_j for every j ∈ {1, 2, . . . , n}. Therefore, the feasible region of these constraints is exactly the 0,1 vectors in Rⁿ. Now, if we also add the constraints Ax ≤ b, we deduce that every 0,1 integer
programming problem can be formulated as an NLP. In other words, 0,1 integer program-
ming is reducible to nonlinear programming. Moreover, this reduction is clearly polynomial.
Therefore, solving an arbitrary instance of an NLP is at least as hard as solving an arbitrary
instance of a 0,1 integer programming problem. In particular, as 0,1 feasibility is NP-hard, so
is nonlinear programming feasibility.
Theorem 7.1. There do not exist integers x, y, z ≥ 1 and an integer n ≥ 3 such that xⁿ + yⁿ = zⁿ.
Fermat wrote this conjecture in the margin of a book and claimed to have a proof of the result, but that it was too large to fit in the margin. The conjecture became known as Fermat's Last Theorem. The first accepted proof of this result was published in 1995, some 358 years
after the original problem was proposed. We will show that a solution to a very simple looking
NLP with only four variables has the following key property: the optimal objective value of
zero is attained, if and only if Fermat’s Last Theorem is true. Hence, solving this particular
NLP is at least as hard as proving Fermat’s Last Theorem! In this NLP, see (1.35), we have 4
variables (n = 4) and 4 constraints (m = 4). For x = (x1 , x2 , x3 , x4 )T ∈ R4 ,
f(x) := ( x1^{x4} + x2^{x4} − x3^{x4} )² + (sin πx1)² + (sin πx2)² + (sin πx3)² + (sin πx4)²,
and
First observe that the feasible region of this NLP is given by,
S := {x ∈ R4 : x1 ≥ 1, x2 ≥ 1, x3 ≥ 1, x4 ≥ 3}.
Note that f(x) is a sum of squares. Therefore, f(x) ≥ 0 for every x ∈ R⁴, and it is equal to zero if and only if every term in the sum is zero, i.e.,

x1^{x4} + x2^{x4} = x3^{x4}   and   sin πx1 = sin πx2 = sin πx3 = sin πx4 = 0.

The latter string of equations is equivalent to x_j being integer for every j ∈ {1, 2, 3, 4}. Moreover, the feasibility conditions require x1 ≥ 1, x2 ≥ 1, x3 ≥ 1, x4 ≥ 3. Therefore, f(x̄) = 0 for some x̄ ∈ S if and only if x̄_j is a positive integer for every j, with x̄4 ≥ 3, and x̄1^{x̄4} + x̄2^{x̄4} = x̄3^{x̄4}.
That is, if and only if Fermat’s Last Theorem is false. Surprisingly, it is not difficult to prove
(and it is well-known) that the infimum of f over S is zero. Thus, the difficulty here lies
entirely in knowing whether the infimum can be attained.
We just argued that some nonlinear optimization problems can be very hard even if the number of variables is very small (e.g., at most 10) or even if the nonlinearity is bounded (e.g., at most quadratic functions). However, carefully isolating the special structure present in some classes of nonlinear programming problems and exploiting it allows us to solve many large-scale non-linear programs in practice.
7.3 Convexity
Consider an NLP of the form given in (1.35) and denote by S its feasible region. We say that x̄ ∈ S is locally optimal if for some positive d ∈ R we have f(x̄) ≤ f(x) for every x ∈ S with ‖x − x̄‖ ≤ d, i.e. no feasible solution of the NLP within distance d of x̄ has better value than x̄. We sometimes call an optimal solution to the NLP a globally optimal solution.
It is easy to verify that if S is convex and f is a convex function, then locally optimal solutions are globally optimal. However, when S is not convex we can have locally optimal solutions that are not globally optimal. This is illustrated in Example 25: there, b is locally optimal, yet a ≠ b is the only globally optimal solution.
A natural scheme for solving an optimization problem is as follows: find a feasible solu-
tion, and then repeatedly, either (i) show that the current feasible solution is globally optimal,
using some optimality criteria, or (ii) try to find a better feasible solution (here better might
mean one with better value for instance, though this may not always be possible). The simplex
algorithm for linear programming follows this scheme. Both steps (i) and (ii) may become
difficult when the feasible region is not convex. We will therefore turn our attention to the
case where the feasible region is convex. In this section, we establish sufficient conditions for
the feasible region of an NLP to be convex (see Remark 7.4).
In other words, f is convex if, for any two points x(1), x(2), the unique linear function on the line segment between x(1) and x(2) that takes the same values as f at x(1) and x(2) dominates f on that segment.

[Figure: a convex function (left) and a non-convex function (right), compared with the linear function agreeing with them at x(1) and x(2).]
Example 27. Consider the function f : R → R where f(x) = x². Let a, b ∈ R be arbitrary, and consider an arbitrary λ ∈ [0, 1]. To prove that f is convex we need to verify that

[λa + (1 − λ)b]² ≤ λa² + (1 − λ)b²,

which holds since λa² + (1 − λ)b² − [λa + (1 − λ)b]² = λ(1 − λ)(a − b)² ≥ 0.
Proposition 7.2. Let f : Rn → R. Then, f is convex if and only if epi( f ) is a convex set.
Observe that in the previous proposition the epigraph lives in an (n + 1)-dimensional space. The function in Figure 7.5 is convex as its epigraph is convex. However, the function in Figure 7.6 is not convex as its epigraph is not convex.
[Figures 7.5 and 7.6: a convex function whose epigraph epi(f) = {(x, y) : y ≥ f(x)} is convex, and a non-convex function whose epigraph is not convex.]
µ1 µ2
Proof of Proposition 7.2. Suppose f : Rn → R is convex. Let , ∈ epi( f ) and
u v
λ ∈ [0, 1]. We have
This implies (by the definition of epi( f )), f (λ u + (1 − λ )v) ≤ λ f (u) + (1 − λ ) f (v). There-
fore, f is convex.
Remark 7.3. Let g : Rⁿ → R be a convex function and let β ∈ R. Then the level set

{x ∈ Rⁿ : g(x) ≤ β}

is a convex set.
In Figure 7.7 we represent a convex function with a convex level set. In Figure 7.8 we rep-
resent a non-convex function with a non-convex level set. We leave it as an exercise to show
however, that it is possible to have a non-convex function where every level set is convex.
[Figure 7.7: a convex function f with a convex level set {x ∈ Rⁿ : f(x) ≤ α}.  Figure 7.8: a non-convex function f with a non-convex level set.]
Proof of Remark 7.3. Let g : Rⁿ → R be a convex function and let β ∈ R. We need to show that S := {x ∈ Rⁿ : g(x) ≤ β} is convex. Let x(1), x(2) ∈ S and let λ ∈ [0, 1]. Then

g( λ x(1) + (1 − λ) x(2) ) ≤ λ g(x(1)) + (1 − λ) g(x(2)) ≤ λ β + (1 − λ) β = β,

where the first inequality follows from the fact that g is a convex function, and the second from λ, 1 − λ ≥ 0 and g(x(1)), g(x(2)) ≤ β. It follows that λ x(1) + (1 − λ) x(2) ∈ S. Hence, S is convex as required.
Consider the NLP defined in (1.35). We say that it is a convex NLP if g1, . . . , gm and f are all convex functions. It follows in that case from Remark 7.3 that for every i ∈ {1, . . . , m} the level set {x ∈ Rⁿ : gi(x) ≤ 0} is a convex set. Since the intersection of convex sets is a convex set (see Remark 2.16), we deduce that the feasible region

{x ∈ Rⁿ : gi(x) ≤ 0 for all i ∈ {1, . . . , m}}          (7.1)

is a convex set.
When g1 , . . . , gm are all affine functions, then the feasible region (7.1) is a polyhedron. More-
over, in that case, the functions are clearly convex. Hence, the previous result implies in
particular that every polyhedron is a convex set, which was the statement of Proposition 2.17.
Exercises
1. (a) Let g1 , g2 , g3 : R → R be defined by
Plot these functions on R2 . Identify on your plot, the function ĝ : R → R defined by,
(a) Prove that the set of feasible solutions of (P) is a convex set.
(b) Let x̂ be a locally optimal feasible solution of (P). Prove that x̂ is globally optimal.
7.4.1 Subgradients
Let f : Rⁿ → R be a convex function and let x̄ ∈ Rⁿ. We say that s ∈ Rⁿ is a subgradient of f at x̄ if for every x ∈ Rⁿ the following inequality holds:

f(x) ≥ f(x̄) + sᵀ(x − x̄).
Denote by h(x) the function f (x̄) + sT (x − x̄). Observe that h(x) is an affine function (x̄ is a
constant). Moreover, we have that h(x̄) = f (x̄) and h(x) ≤ f (x) for every x ∈ Rn . Hence, the
function h(x) is an affine function that provides a lower bound on f (x) and approximates f (x)
around x̄. See Figure 7.9.
[Figure 7.9: a convex function f and the affine function h(x) = f(x̄) + sᵀ(x − x̄), which touches f at x̄ and lies below it everywhere.]
Example 28. Consider g : R² → R where for every x = (x1, x2)ᵀ we have g(x) = x2² − x1. It can be readily checked (see Example 27) that g is convex. We claim that s := (−1, 2)ᵀ is a subgradient of g at x̄ = (1, 1)ᵀ. We have h(x) := g(x̄) + sᵀ(x − x̄) = 0 + (−1, 2)(x − (1, 1)ᵀ) = −x1 + 2x2 − 1. We need to verify that h(x) ≤ g(x) for every x ∈ R², i.e. that

−x1 + 2x2 − 1 ≤ x2² − x1,

or equivalently that x2² − 2x2 + 1 = (x2 − 1)² ≥ 0, which clearly holds as the square of any number is non-negative.
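A quick numerical spot-check of the subgradient inequality from Example 28 (our own illustration; random sampling does not, of course, constitute a proof):

import numpy as np

# Check that s = (-1, 2)^T is a subgradient of g(x) = x2^2 - x1 at xbar = (1, 1)^T,
# i.e. that g(x) >= g(xbar) + s^T (x - xbar) on sampled points of R^2.
g = lambda x: x[1] ** 2 - x[0]
xbar, s = np.array([1.0, 1.0]), np.array([-1.0, 2.0])

rng = np.random.default_rng(0)
xs = rng.uniform(-10, 10, size=(10000, 2))
gap = np.array([g(x) - (g(xbar) + s @ (x - xbar)) for x in xs])
print(gap.min() >= -1e-12)   # True: the affine minorant never exceeds g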
This last remark is illustrated in Figure 7.11. We consider the function g : R2 → R, where
g(x) = x22 − x1 and the point x̄ = (1, 1)T . We saw in Example 28 that the vector s = (−1, 2)T
is a subgradient for g at x̄. Then F = {x ∈ R2 : −x1 + 2x2 − 1 ≤ 0}. We see in the figure that
F is a supporting halfspace of C at x̄ as predicted.
We deduce the following useful tool from the previous remark,
(Footnote: F is clearly a halfspace, since we can rewrite it as {x : sᵀx ≤ sᵀx̄ − g(x̄)} and sᵀx̄ − g(x̄) is a constant.)
[Figure: a convex set C with a unique supporting halfspace at a smooth boundary point x(1), and many supporting halfspaces at a non-smooth boundary point x(2).]
Corollary 7.6. Consider an NLP of the form given in (1.35). Let x̄ be a feasible solution and
suppose that constraint gi (x) ≤ 0 is tight for some i ∈ {1, . . . , m}. Suppose gi is a convex func-
tion that has a subgradient s at x̄. Then the NLP obtained by replacing constraint gi (x) ≤ 0
by the linear constraint sT x ≤ sT x̄ − gi (x̄) is a relaxation of the original NLP.
In this section we consider convex NLPs of the form (1.35) that satisfy the additional condition
that all of the functions f and g1 , . . . , gm are differentiable. In that setting we can characterize
(see Theorem 7.9) when a feasible solution x̄ is an optimal solution (assuming the existence
of a Slater point).
[Figure 7.11: the set C, the point x̄ = (1, 1)ᵀ, the subgradient s, and the supporting halfspace F from Example 28.]
We claim that it is sufficient to consider NLPs where the objective function is linear, i.e. of
the form,
min z = cᵀx
s.t.
g1(x) ≤ 0,
g2(x) ≤ 0,                (7.2)
⋮
gm(x) ≤ 0.
This is because problem (1.35) is reducible to problem (7.2). To prove this fact, we can proceed as follows: given an NLP of the form (1.35), introduce a new variable x_{n+1} and add the constraint f(x) − x_{n+1} ≤ 0, obtaining the problem

max z = −x_{n+1}
s.t.
f(x) − x_{n+1} ≤ 0,
g1(x) ≤ 0,
g2(x) ≤ 0,
⋮
gm(x) ≤ 0.
This NLP is of the form (7.2), and minimizing xn+1 is equivalent to minimizing f (x).
Let x̄ be a feasible solution to the NLP (7.2) and assume that it is a convex NLP. Let
us derive sufficient conditions for x̄ to be an optimal solution to (7.2). Define, J(x̄) :=
{i : gi (x̄) = 0}. That is, J(x̄) is the index set of all constraints that are tight at x̄. Suppose
for every i ∈ J(x̄) we have a subgradient si of the function gi at the point x̄. Then we construct
a linear programming relaxation of the NLP as follows: first we omit every constraint that is
not tight at x̄, and for every i ∈ J(x̄), we replace (see Corollary 7.6) the constraint gi (x) ≤ 0
by the linear constraint sTi x ≤ sTi x̄ − g(x̄). Since the objective function is given by “min cT x”
we can rewrite it as “max −cT x”. The resulting linear program is thus,
max z = −cᵀx
s.t.                                         (7.3)
sᵢᵀx ≤ sᵢᵀx̄ − gi(x̄)   for all i ∈ J(x̄)
Theorem 4.7 says that x̄ is an optimal solution to (7.3) when −c is in the cone of the tight constraints, i.e. if −c ∈ cone{sᵢ : i ∈ J(x̄)}. Since (7.3) is a relaxation of (7.2) and x̄ is feasible for (7.2), it follows that x̄ is then also an optimal solution of (7.2). Hence, we have proved the following:
Proposition 7.7. Consider the NLP (7.2) and assume that g1, . . . , gm are convex functions. Let x̄ be a feasible solution, and suppose that for every i ∈ J(x̄) we have a subgradient sᵢ of gᵢ at x̄. If −c ∈ cone{sᵢ : i ∈ J(x̄)}, then x̄ is an optimal solution.
Thus we have sufficient conditions for optimality. Theorem 7.9 that we will state in the next
section, essentially says that when the subgradients are unique, then these conditions are also
necessary. We illustrate the previous proposition on an example.
Example 26 continued. Let us use the previous proposition to show that the point x̄ = (1, 1)T
is an optimal solution to the NLP given in Example 26. In this case J(x̄) = {1, 2}, and the
feasible region for the linear programming relaxation (7.3) corresponds to the shaded region
[Figure 7.12: the constraints g1, g2, g3, the point x̄ = (1, 1)ᵀ, and the feasible region of the linear programming relaxation (7.3).]
in Figure 7.12. In Example 28 we showed that the subgradient of g1 at x̄ is (−1, 2)ᵀ. Similarly, we can show that the subgradient of g2 at x̄ is (2, −1)ᵀ. In this example −c = (1, 1)ᵀ, thus Proposition 7.7 asks us to verify that

−c = (1, 1)ᵀ ∈ cone{ (−1, 2)ᵀ, (2, −1)ᵀ },

which indeed holds, since (1, 1)ᵀ = 1 · (−1, 2)ᵀ + 1 · (2, −1)ᵀ; hence x̄ is an optimal solution.
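The multipliers can also be found mechanically by solving a small linear system and checking nonnegativity (our own sketch):

import numpy as np

# Columns of S are the subgradients s1 = (-1, 2)^T and s2 = (2, -1)^T.
S = np.array([[-1.0, 2.0],
              [ 2.0, -1.0]])
minus_c = np.array([1.0, 1.0])

lam = np.linalg.solve(S, minus_c)
print(lam, np.all(lam >= 0))   # [1. 1.] True  ->  -c lies in the cone, so xbar is optimal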
If there exists a vector s ∈ Rⁿ such that

lim_{h→0}  [ f(x̄ + h) − f(x̄) − sᵀh ] / ‖h‖₂  =  0,

we say that f is differentiable at x̄ and call the vector s the gradient of f at x̄. We denote s by ∇f(x̄). We will use the following without proof:
Proposition 7.8. Let f : Rn → R be a convex function and let x̄ ∈ Rn . If the gradient ∇ f (x̄)
exists then it is a subgradient of f at x̄.
Note that in the above definition of the gradient, h varies over all vectors in Rⁿ. Under some slightly more favorable conditions, we can obtain the gradient ∇f(x̄) via partial derivatives of f at x̄. For example, suppose that for every j ∈ {1, 2, . . . , n}, the partial derivative ∂f/∂x_j exists and is continuous at every x ∈ Rⁿ. Then

∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn )ᵀ.
We illustrate the previous theorem in Figure 7.13. In that example, the tight constraints for x̄
are g1 (x) ≤ 0 and g2 (x) ≤ 0. We indicate the cone formed by ∇g1 (x̄), ∇g2 (x̄) (translated to
have x̄ as its origin). In this example, −∇ f (x̄) is in that cone, hence the feasible solution x̄ is
in fact optimal.
Suppose that in Theorem 7.9 the function f (x) is the linear function cT x, i.e. the NLP is of
the form (7.2). Then ∇ f (x̄) = c and the sufficiency of Theorem 7.9 follows immediately from
Proposition 7.7, i.e. we have shown that if the condition (?) holds then x̄ is indeed an optimal
[Figure 7.13: the feasible region with tight constraints g1(x) = 0 and g2(x) = 0 at x̄; the gradients ∇g1(x̄) and ∇g2(x̄) span the cone cone{∇g_j(x̄) : j ∈ J(x̄)} (translated to x̄), and −∇f(x̄) lies in that cone, so x̄ is optimal.]
solution. The essence of the theorem is to prove the reverse direction, i.e. that x̄ can only be optimal when (?) holds. This is where the assumption that a Slater point exists comes into play. Observe that when f and g1, . . . , gm are all affine functions, Theorem 7.9 becomes the optimality theorem for linear programs (Theorem 4.7).
Theorem 4.7 was a restatement of the Complementary Slackness Theorem 4.6. Similarly, we leave it as an exercise to check that condition (?) can be restated as follows: there exists ȳ ∈ Rᵐ₊ such that the following conditions hold:

∇f(x̄) + ∑_{i=1}^{m} ȳi ∇gi(x̄) = 0          (dual feasibility, together with ȳ ∈ Rᵐ₊)
ȳi gi(x̄) = 0  for all i ∈ {1, 2, . . . , m}     (complementary slackness)
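These conditions are easy to verify numerically for a candidate point. The sketch below checks them for a hypothetical convex NLP (min x1² + x2² subject to 2 − x1 − x2 ≤ 0) at x̄ = (1, 1)ᵀ with multiplier ȳ = 2; the problem data is our own example, not one from the text.

import numpy as np

# Hypothetical convex NLP:  min x1^2 + x2^2  s.t.  g(x) = 2 - x1 - x2 <= 0.
xbar, ybar = np.array([1.0, 1.0]), 2.0

grad_f = 2 * xbar                      # gradient of f(x) = x1^2 + x2^2
grad_g = np.array([-1.0, -1.0])        # gradient of g(x) = 2 - x1 - x2
g_val  = 2 - xbar.sum()

print(np.allclose(grad_f + ybar * grad_g, 0))     # dual feasibility / stationarity
print(ybar >= 0 and np.isclose(ybar * g_val, 0))  # ybar >= 0 and complementary slackness
print(g_val <= 1e-12)                             # primal feasibility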
Exercises
1. Consider the following NLP:
and the vector x̄ := (1, 1)T . Write down the optimality conditions for x̄ for this NLP as
described in the Karush-Kuhn-Tucker Theorem. Using these conditions and the theorem,
prove that x̄ is optimal. N OTE: you may use the fact that the functions defining the objective
function and the constraints are convex and differentiable, without proving it.
(NLP)  min x3
       s.t.
       x1 + x2 ≤ 0,
       x1² − 4 ≤ 0,
       x1² − 2x1 + x2² − x3 + 1 ≤ 0,

and the vector x̄ := (1/2, −1/2, 1/2)ᵀ. Write down the optimality conditions for x̄ for this NLP as described in the Karush-Kuhn-Tucker Theorem. Using these conditions and the theorem, prove that x̄ is optimal.
Since y ≥ 0, for every feasible solution x we have ∑_{i=1}^{m} yi gi(x) ≤ 0. Therefore, the minimum value of [ f(x) + ∑_{i=1}^{m} yi gi(x) ] over all x ∈ Rⁿ gives a lower bound on the optimal objective value of the original NLP. Notice that if x̄ is a feasible solution of NLP, then f(x̄) is an upper bound on the optimal objective value of the NLP. On the other hand, if we pick some nonnegative
Note that the Lagrangian encodes all the information about the problem NLP:
• Setting y to unit vectors and using the above, we can obtain all the constraint functions.
That is, for every i ∈ {1, 2, . . . , m},
Perturbing y along a unit vector, we have the same result for every y:
Let z∗ denote the optimal objective value of NLP. Since the minimum value of the Lagrangian at every nonnegative y gives a lower bound on z∗, we also have
The next theorem summarizes the fundamental duality result based on the Lagrangian.
[Figure 7.14: a saddle-point function L(x, y); among the points (x, ȳ), the point (x̄, ȳ) minimizes L, and among the points (x̄, y), it maximizes L.]
We illustrate the Saddle-point condition of the Lagrangian in Figure 7.14. Among all
points (x, ȳ), the point (x̄, ȳ) minimizes L(x, y). Among all points (x̄, y), the point (x̄, ȳ) maxi-
mizes L(x, y).
Now, let us see a proof that if x̄ ∈ Rn satisfies the conditions of the above theorem for some
ȳ then it is an optimal solution of the NLP in the convex case. First, we prove that x̄ is feasible.
We saw that the Lagrangian encodes all the information on the constraints of NLP and we can
extract this information using unit vectors. So, let us fix i ∈ {1, 2, . . . , m} and apply the first
inequality in the saddle-point condition using y := ȳ + ei :
which is equivalent to
f(x̄) + ∑_{i=1}^{m} ȳi gi(x̄) ≤ f(x) + ∑_{i=1}^{m} ȳi gi(x),   ∀x ∈ Rⁿ.
Consider all feasible x (that is, gi(x) ≤ 0 for every i). Since ȳi ≥ 0 for every i, we have ∑_{i=1}^{m} ȳi gi(x) ≤ 0. Hence, our conclusion becomes

f(x̄) + ∑_{i=1}^{m} ȳi gi(x̄) ≤ f(x),   ∀x feasible in NLP.
Using the fact that complementary slackness holds at (x̄, ȳ), we observe that ∑_{i=1}^{m} ȳi gi(x̄) is zero, and we conclude

f(x̄) ≤ f(x),   ∀x feasible in NLP.
Indeed, this combined with the feasibility of x̄ means that x̄ is optimal in NLP.
The last two theorems can be strengthened via replacing the assumption on the existence
of a Slater point by weaker, but more technical assumptions.
Moreover, the last two theorems can be generalized to NLP problems that are not convex.
In the general case, the first theorem only characterizes local optimality.
Note that we have explicitly included the slack variables s. For now, we will also assume that
there exist interior solutions for both problems, i.e. there exist x̄ and (ȳ, s̄) such that Ax̄ = b,
x̄ j > 0 for all j and AT ȳ − s̄ = c, s̄ j > 0 for all j. So, an interior solution is a feasible solution
which satisfies all the inequality constraints strictly. Note that x̄ and s̄ are Slater points for (P)
and (D) respectively (in suitable reformulations of (P) and (D) in the form of NLP).
By the Weak Duality Theorem, we have
bT ȳ − cT x̄ ≥ 0.
If x̄ and s̄ had been optimal solutions, we would have had x̄T s̄ = 0. Recall the necessary and
sufficient conditions for optimality:
Ax = b, x ≥ 0, (primal feasibility),
AT y − s = c, s ≥ 0, (dual feasibility),
x j s j = 0 for all j (complementary slackness).
Now, we would like to find a new solution x, y, s so that x is feasible for (P), (y, s) is feasi-
ble for (D) and the duality gap bT y − cT x = xT s is smaller than the current duality gap: x̄T s̄.
Recall that the simplex algorithms keep primal feasibility and complementary slackness and
strive for dual feasibility. There are also so-called dual-simplex algorithms which keep dual
feasibility and complementary slackness and strive for primal feasibility. Actually, we have
seen an algorithm which behaved in this way (keeping dual feasibility and complementary
slackness and striving for primal feasibility), namely, the Hungarian Method for finding min-
imum cost perfect matchings in bipartite graphs. Here, we are keeping primal feasibility and
dual feasibility and striving for complementary slackness. So, we would like to find directions
dx in the primal and dy, ds in the dual such that
1. x̄ + dx is feasible in (P),
2. (ȳ + dy, s̄ + ds) is feasible in (D), and
3. x̄ + dx and s̄ + ds satisfy the complementary slackness conditions.
The first item above implies Adx = 0, the second item implies AT dy − ds = 0, and the third
one implies (x̄ j + dx j )(s̄ j + ds j ) = 0. The first two systems of equations are linear in dx, dy,
and ds; however, the third one is
s̄ j dx j + x̄ j ds j + dx j ds j = −x̄ j s̄ j
which is quadratic (we have the term dx j ds j ). Of course, if we could solve these three groups
of equations at once and maintain the nonnegativity of (x̄ + dx), (s̄ + ds), we would have
optimal solutions of (P) and (D). Since we do not know how to solve these equations (without
solving the linear programming problem), we will ignore the quadratic term and try to solve
the big linear system with the last group of linear equations s̄ j dx j + x̄ j ds j = −x̄ j s̄ j instead.
So, we solve the following system of linear equations. Note that the last group of equations
is trying to force a solution dx, ds such that (x̄ + dx)T (s̄ + ds) is close to zero.
Adx = 0, (7.4)
AT dy − ds = 0, (7.5)
s̄ j dx j + x̄ j ds j = −x̄ j s̄ j , for all j ∈ {1, 2, . . . , n}. (7.6)
This is a system of linear equations and we know how to solve it efficiently (e.g. by Gaussian
Elimination). Under our assumptions (rank(A) = m and x̄ > 0, s̄ > 0), this linear system
always has a unique solution.
One very important condition we neglected so far is the nonnegativity constraint on x and
s: Most likely, x̄ + dx is not feasible in (P) or s̄ + ds is not feasible in (D), possibly both.
However, the above development convinced us that the dx, dy, ds are good directions and we
should thus move in these directions, but not “all the way.” So we consider the solution

(x̄ + α dx, ȳ + α dy, s̄ + α ds)

for some α > 0. We should pick α such that x̄ + α dx > 0 and s̄ + α ds > 0. We can choose, for instance,

α = 0.999 · min{ x̄_j/(−dx_j) : dx_j < 0;  s̄_j/(−ds_j) : ds_j < 0 }.          (7.7)
Now, we show that the duality gap decreases from iteration to iteration. Here we are able to
show that the decrease is proportional to (1 − α). That is, when the step size is large, we get
a large decrease in the duality gap. Let us calculate the new duality gap:

new gap = ∑_{j=1}^{n} (x̄_j + α dx_j)(s̄_j + α ds_j)
        = ∑_{j=1}^{n} x̄_j s̄_j + α ∑_{j=1}^{n} (s̄_j dx_j + x̄_j ds_j) + α² ∑_{j=1}^{n} dx_j ds_j.

By (7.6), the middle sum equals −∑_{j=1}^{n} x̄_j s̄_j; and by (7.4) and (7.5), ∑_{j=1}^{n} dx_j ds_j = dxᵀds = dxᵀ(Aᵀdy) = (A dx)ᵀdy = 0. Hence the new duality gap is (1 − α) ∑_{j=1}^{n} x̄_j s̄_j, i.e. (1 − α) times the current one.
Just as in the previous sections, instead of explicitly forming the inverse of (A X S⁻¹ Aᵀ), we solve the linear system of equations (A X S⁻¹ Aᵀ) dy = −b for the unknown dy.
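One iteration of the scheme described above can be sketched in a few lines. The code below is our own illustration, with hypothetical data and no safeguards: it assembles and solves the linear system (7.4)-(7.6) directly, applies the step-size rule (7.7), and prints the reduced duality gap.

import numpy as np

def affine_scaling_step(A, x, y, s, frac=0.999):
    """One primal-dual affine-scaling step: solve (7.4)-(7.6) for (dx, dy, ds)
    and move a fraction of the way to the boundary, as in (7.7)."""
    m, n = A.shape
    # Unknowns ordered (dx, dy, ds); rows: A dx = 0, A^T dy - ds = 0, S dx + X ds = -XSe.
    M = np.zeros((2 * n + m, 2 * n + m))
    rhs = np.zeros(2 * n + m)
    M[:m, :n] = A
    M[m:m + n, n:n + m] = A.T
    M[m:m + n, n + m:] = -np.eye(n)
    M[m + n:, :n] = np.diag(s)
    M[m + n:, n + m:] = np.diag(x)
    rhs[m + n:] = -x * s
    d = np.linalg.solve(M, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    ratios = [x[j] / -dx[j] for j in range(n) if dx[j] < 0]
    ratios += [s[j] / -ds[j] for j in range(n) if ds[j] < 0]
    alpha = frac * min(ratios)
    return x + alpha * dx, y + alpha * dy, s + alpha * ds

# Hypothetical problem: max x1  s.t.  x1 + x2 = 2, x >= 0  (c = (1, 0), b = (2)),
# with interior points xbar = (1, 1), ybar = 2, sbar = (1, 2); duality gap x^T s = 3.
A = np.array([[1.0, 1.0]])
x, y, s = np.array([1.0, 1.0]), np.array([2.0]), np.array([1.0, 2.0])
x, y, s = affine_scaling_step(A, x, y, s)
print(x @ s)   # new duality gap, equal to (1 - alpha) * 3, roughly 0.75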
Remark 7.11. (i) The idea that led us to the system of equations (7.4), (7.5), (7.6) is very
closely related to Newton’s method for solving a system of nonlinear equations.
(ii) There are many different ways of choosing the step size α; some have practical and/or theoretical advantages. In practice, we may even want to choose different step sizes in the primal and the dual problems, e.g. αP = 0.9 min{ x̄_j/(−dx_j) : dx_j < 0 } and αD = 0.9 min{ s̄_j/(−ds_j) : ds_j < 0 }, and then update (x̄, ȳ, s̄) to (x̄ + αP dx, ȳ + αD dy, s̄ + αD ds).
(iii) The algorithm can be modified so that it allows infeasible starting points. The only requirement is that x̄ > 0 and s̄ > 0. In this combined two-phase version of the algorithm, the right-hand-side vectors of (7.4) and (7.5) are replaced by (b − Ax̄) and (c − Aᵀȳ + s̄) and are updated in each iteration.
where ρ > 0 is to be chosen later. In the algorithm, we will choose the step size by the following rule:

choose the step size αk ∈ (0, 1) such that ψ( x̄ + αk dx, s̄ + αk ds; ρ ) = 1 + ρ ln( 1/(nε) ).
Then, we can prove an upper bound on the number of iterations of the algorithm. We assume that we are given ε ∈ (0, 1). We define

ρ := 2 ln(1/ε).
Theorem 7.12. Suppose we are given (x⁰, y⁰, s⁰) as in the statement of the primal-dual affine scaling algorithm. Further assume that

ψ(x⁰, s⁰; ρ) ≤ 1 + ρ ln( 1/(nε) ).

Then the primal-dual affine scaling algorithm with the above step-size strategy based on ψ terminates in at most O( n ln²(1/ε) ) iterations with x^k and (y^k, s^k) feasible in (P) and (D) respectively, such that their duality gap is less than ε.
This theorem guarantees finding feasible solutions with very good objective values. It
turns out, for LP, if we can get the duality gap below a certain threshold (for data (A, b, c)
with rational entries, let t denote the number of bits required to store them in binary; then
choosing ε := exp(−2t) suffices), then using these very accurate approximate solutions, we
can quickly compute exact optimal solutions. Using these techniques and ideas, we can prove
that LP problems with rational data can be solved in polynomial time. We work through some
of the details in the exercises at the end of this chapter.
Exercises
1. Suppose A ∈ Rm×n with rank(A) = m, x̄ > 0 and s̄ > 0 are given. Prove that the system
(7.4)-(7.6) has a unique solution.
3. Suppose that during an application of the primal-dual interior-point method (to a primal-dual
pair of LPs, with the primal (P) in standard equality form), when we solve the system
$$A\, dx = 0, \qquad A^T dy - ds = 0, \qquad \bar{s}_j\, dx_j + \bar{x}_j\, ds_j = -\bar{x}_j \bar{s}_j \ \text{ for all } j$$
(for a given current iterate x̄ > 0, s̄ > 0 such that $A\bar{x} = b$ and $A^T\bar{y} - \bar{s} = c$ for some ȳ), we find
that dx = 0. Prove that in this case there exists ŷ such that
$$A^T \hat{y} = c$$
Player P’s strategies are denoted by a vector $x \in \mathbb{R}^n$ and Player D’s strategies by a nonnegative
vector $y \in \mathbb{R}^m$. The players choose their strategies (without knowing each other’s choices) and
reveal them simultaneously. Based on the vectors x and y that they reveal, Player P pays Player D
$\left[f(x) + \sum_{i=1}^{m} y_i g_i(x)\right]$ dollars (if this quantity is negative, Player D pays its absolute value
to Player P). Player P’s problem is
$$\min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} \{L(x, y)\} \;=\; \min_{x \in \mathbb{R}^n} \max_{y \in \mathbb{R}^m_+} \left[ f(x) + \sum_{i=1}^{m} y_i g_i(x) \right].$$
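To see why Player P effectively faces the original constrained problem (a standard observation, spelled out here for convenience rather than quoted from the text): for a fixed x, the inner maximization gives
$$\max_{y \in \mathbb{R}^m_+}\Big[f(x) + \sum_{i=1}^{m} y_i g_i(x)\Big]
= \begin{cases} f(x) & \text{if } g_i(x) \le 0 \text{ for all } i,\\ +\infty & \text{otherwise,}\end{cases}$$
where $+\infty$ indicates that Player D can make the payment arbitrarily large whenever some constraint is violated. Hence Player P’s min–max problem has the same optimal value as $\min\{f(x) : g_i(x) \le 0,\ i = 1,\dots,m\}$.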
There are many ways of utilizing the Lagrangian in the theory and practice of optimization.
For example, we can use the Lagrangian in designing primal-dual algorithms (even for
combinatorial optimization problems). Consider $\bar{x} \in \mathbb{R}^n$, a current iterate of an algorithm,
and suppose x̄ violates a subset of the constraints $g_i(x) \le 0$, for i ∈ J. We can choose ȳᵢ > 0 for all
i ∈ J (essentially penalizing the violations) and minimize L(x, ȳ); this gives us a new iterate x̄.
We then choose a new ȳ depending on the current violations and repeat. In many cases, choosing
a good ȳ is itself done by solving an easier optimization problem. Note that we have outlined a
primal-dual scheme with alternating moves, based on relaxations in the primal and dual spaces.
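A minimal sketch of such an alternating scheme, under stated assumptions: the helpers g and minimize_lagrangian are hypothetical placeholders supplied by the user, and the multiplier update (a projected step proportional to the violation, in the spirit of subgradient/dual-ascent methods) is one common choice among many, not a prescription from the text.

import numpy as np

def primal_dual_lagrangian(g, minimize_lagrangian, m, steps=50, tau=1.0):
    # g(x): returns the vector (g_1(x), ..., g_m(x)).
    # minimize_lagrangian(y): returns an (approximate) minimizer of L(x, y) over x.
    y = np.zeros(m)
    for _ in range(steps):
        x = minimize_lagrangian(y)             # primal move: minimize L(., y) for fixed y
        y = np.maximum(0.0, y + tau * g(x))    # dual move: increase y_i on violated constraints
    return x, y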
Index
neighbours, 146
non linear program (NLP), 48
optimal value, 55
path, 37
perfect matching, 38
pivot rules, 114
polyhedron, polyhedra, 104
portfolio optimization, 54
potential function, 258
pruning, 230
reducible, 91
robust optimization, 54