Linear Programming and Generalizations
International Series in Operations Research & Management Science
Volume 149
Series Editor
Frederick S. Hillier
Stanford University, CA, USA
Eric V. Denardo
Yale University
P.O. Box 208267
New Haven CT 06520-8267
USA
[email protected]
ISSN 0884-8289
ISBN 978-1-4419-6490-8        e-ISBN 978-1-4419-6491-5
DOI 10.1007/978-1-4419-6491-5
Springer New York Dordrecht Heidelberg London
Preface

Over the past half century, dozens of excellent books have appeared on
this subject. Why another? This book fuses five components:
has become vastly easier to learn and to use. Spreadsheets help students
become facile with the subject, and they help them use it to shape their
professional identities.
The book is designed for use in courses that focus on the applications of
constrained optimization, in courses that emphasize the theory, and in
courses that link the subject to economics. A “user’s guide” is provided; it
takes the form of a brief preview of each of the six Parts that comprise this
book.
Acknowledgement
This book’s style and content have been shaped by decades of interaction
with Yale students. Their insights, reactions and critiques have led me
toward a problem-based approach to teaching and writing. With enthusiasm,
I acknowledge their contribution. This book also benefits from interactions
with my colleagues on the faculty. I am deeply indebted to Uriel G. Rothblum,
Kurt Anstreicher, Ludo Van der Heyden, Harvey M. Wagner, Arthur J.
Swersey, Herbert E. Scarf and Donald J. Brown, whose influences are
evident here.
Contents
Part I – Prelude
Part IV – LP Theory
Chapter 13. The Dual Simplex Pivot and Its Uses .......................... 413
Chapter 2 contains the facets of Excel that are used in this book. Also
discussed in Chapter 2 is the software that accompanies this text. All of the
information in it is helpful, and some of it is vital.
1. Preview
The goals of this chapter are to introduce you to linear programming and
its generalizations and to preview what’s coming. The chapter itself is
organized into six main sections:

• In the first of these sections, the terminology that describes linear
programs is introduced and a simple linear program is solved graphically.
• The fourth section introduces four themes that pervade this book.
• The fifth section introduces the computer codes that are used in this
text.
• The sixth section consists of a brief account of the origins of the field.
2. An Example
Program 1.1. Maximize {2x + 2y}, subject to the constraints

    1x + 2y ≤ 4,
    3x + 2y ≤ 6,
    x ≥ 0,
    y ≥ 0.
The decision variables in a linear program are the quantities whose values
are to be determined. Program 1.1 has two decision variables, which are x
and y. Program 1.1 has four constraints, each of which is a linear inequality.
A big deal?
A linear program seems rather simple. Can something this simple be im-
portant? Yes! Listed below are three reasons why this is so.
• The ideas that underlie the simplex method generalize readily to
situations that are far from linear and to settings that entail several
decision makers, rather than one.
Feasible solutions
Like any field, linear programming has its own specialized terminology
(jargon). Most of these terms are easy to remember because they are
suggested by normal English usage. A feasible solution to a linear program is
a set of values of its decision variables that satisfies each of its constraints.
Program 1.1 has many feasible solutions, one of which is x = 1 and y = 0. The
feasible region of a linear program is its set of feasible solutions. Program 1.1
has only two decision variables, so its feasible region can be represented on
the plane. Figure 1.1 does so.
[Figure 1.1. The feasible region for Program 1.1: the intersection of the
half-planes bounded by the lines 1x + 2y = 4, 3x + 2y = 6, x = 0 and y = 0.]
Figure 1.1 is easy to construct because the pairs (x, y) that satisfy a
particular linear constraint form a “half-plane” whose boundary is the line
on which this constraint holds as an equation. For example:

• Two points determine a line, and the line 1x + 2y = 4 includes the points
(pairs) (0, 2) and (4, 0).

• In Figure 1.1, a thick arrow points from the line 1x + 2y = 4 into the
half-plane that satisfies the inequality 1x + 2y ≤ 4.
The feasible region for Program 1.1 is the intersection of four half-planes,
one per constraint. In Figure 1.1, the feasible region is the area into which the
thick arrows point, and it is shaded.
Optimal solutions

An optimal solution to Program 1.1 is x = 1 and y = 1.5, and its optimal
value is z* = 2x + 2y = (2)(1) + (2)(1.5) = 5. To convince yourself that this is
the optimal solution to Program 1.1, consider Figure 1.2. It augments Figure
1.1 by including two “iso-profit” lines, each of which is dashed. One of these
lines contains the points (x, y) whose objective value equals 4; the other
contains the pairs (x, y) whose objective value equals 5. It is clear, visually,
that the unique optimal solution to Program 1.1 has x = 1 and y = 1.5.
[Figure 1.2. Feasible region for Program 1.1, with the two iso-profit lines
2x + 2y = 4 and 2x + 2y = 5; the latter touches the feasible region only at
the point (1, 1.5).]
A linear program can have only one optimal value, but it can have more
than one optimal solution. If the objective of Program 1.1 were to maximize
(x + 2y), its optimal value would be 4, and every point on the line segment
connecting (0, 2) and (1, 1.5) would be optimal.
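Readers who want to confirm the graphical solution numerically can hand
Program 1.1 to any LP code. Here is a minimal sketch in Python, using
scipy.optimize.linprog; it is an illustration only, not the software that
accompanies this text.

    # Program 1.1: maximize 2x + 2y subject to 1x + 2y <= 4, 3x + 2y <= 6,
    # x >= 0 and y >= 0.  linprog minimizes, so the objective is negated.
    from scipy.optimize import linprog

    result = linprog(c=[-2, -2],
                     A_ub=[[1, 2], [3, 2]],
                     b_ub=[4, 6],
                     bounds=[(0, None), (0, None)])
    print(result.x, -result.fun)     # expect x = 1, y = 1.5 and z* = 5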
A taxonomy
Each linear program that is feasible and bounded has at least one
optimal solution.
• It may be unbounded.

Program 1.2. Minimize {4u + 6v}, subject to the constraints

    1u + 3v ≥ 2,
    2u + 2v ≥ 2,
    u ≥ 0,
    v ≥ 0.
Figure 1.3 plots the feasible region for Program 1.2. This feasible region
is clearly unbounded. Program 1.2 is bounded, nonetheless; every feasible
solution has an objective value that exceeds 0.
[Figure 1.3. The unbounded feasible region for Program 1.2, bounded
below by the lines 1u + 3v = 2 and 2u + 2v = 2, which cross at the point
(1/2, 1/2).]
You might suspect that unbounded feasible regions do not arise in
practice, but that is not quite accurate. In a later chapter, we’ll see that every
linear program is paired with another, which is called its “dual.” We will see
that if a linear program is feasible and bounded, then so is its dual, in which
case both linear programs have the same optimal value, and at least one of
them has an unbounded feasible region. Programs 1.1 and 1.2 are each
other’s duals, by the way. One of their feasible regions is unbounded, as
must be the case.
3. Generalizations
This problem does not have an optimal solution. The “infimum” of its
objective equals 6, and setting y slightly above 2 comes “close” to 6, but an
objective value of 6 is not achievable. Ruling out strict inequalities eliminates
this difficulty.
On the other hand, the simplex method can – and will – be used to find
solutions to linear systems that include one or more strict inequalities. To
illustrate, suppose a feasible solution to Program 1.1 is sought for which the
variables x and y are positive. To construct one, use the linear program that
maximizes θ subject to the constraints of Program 1.1 and

    θ ≤ x,
    θ ≤ y.
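A sketch of this construction, again with scipy.optimize.linprog (not the
book’s software): if the optimal value of θ is positive, the optimal (x, y) is a
feasible solution to Program 1.1 in which both variables are positive.

    # Maximize theta subject to the constraints of Program 1.1 and
    # theta <= x, theta <= y.  Variables are ordered (x, y, theta).
    from scipy.optimize import linprog

    result = linprog(c=[0, 0, -1],          # maximize theta
                     A_ub=[[1, 2, 0],       # 1x + 2y <= 4
                           [3, 2, 0],       # 3x + 2y <= 6
                           [-1, 0, 1],      # theta <= x
                           [0, -1, 1]],     # theta <= y
                     b_ub=[4, 6, 0, 0],
                     bounds=[(0, None)] * 3)
    x, y, theta = result.x
    print(x, y, theta)                      # theta > 0: x and y are positive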
Integer-valued variables

A linear program lets us impose constraints that require the decision
variable x to lie between 0 and 1, inclusive. On the other hand, linear
programs do not allow us to impose a constraint that restricts a decision
variable x to the values 0 and 1. This would seem to be a major restriction.
Lots of entities (people, airplanes, and so forth) are integer-valued.
Competition
Non-linear functions
Linear programs require that the objective and constraints have a
particular form, that they be linear. A nonlinear program is an optimization
problem whose objective and/or constraints are described by functions that
fail to be linear. The ideas used to solve linear programs generalize to handle
a variety of nonlinear programs. How that occurs is probed in Chapter 20.
4. Linearization
A “maximin” objective

Program 1.3. Maximize min {(2x + 2y), (1x − 3y)}, subject to the constraints

    1x + 2y ≤ 4,
    3x + 2y ≤ 6,
    x ≥ 0,
    y ≥ 0.
The object of Program 1.3 is to maximize the smaller of two linear
expressions. This is not a linear program because its objective is not a linear
expression. To convert Program 1.3 into an equivalent linear program, we
maximize the quantity t subject to constraints that keep t from exceeding the
linear expressions (2x + 2y) and (1x − 3y). In other words, we replace
Program 1.3 by the linear program that maximizes t subject to the constraints
of Program 1.3 and t ≤ 2x + 2y, t ≤ 1x − 3y.
A “minimax” objective

Suppose, instead, that the object is to minimize the larger of the expressions
(2x + 2y) and (1x − 3y), subject to the constraints of Program 1.3. The same
trick works, as is suggested by:

    t ≥ 2x + 2y,
    t ≥ 1x − 3y.
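Either conversion is easy to hand to an LP code. The sketch below
(scipy.optimize.linprog again, not the book’s software) implements the
maximin version – the linear program equivalent to Program 1.3.

    # Maximize t subject to t <= 2x + 2y, t <= 1x - 3y and the
    # constraints of Program 1.1.  Variables are ordered (x, y, t).
    from scipy.optimize import linprog

    result = linprog(c=[0, 0, -1],          # maximize t
                     A_ub=[[1, 2, 0],       # 1x + 2y <= 4
                           [3, 2, 0],       # 3x + 2y <= 6
                           [-2, -2, 1],     # t <= 2x + 2y
                           [-1, 3, 1]],     # t <= 1x - 3y
                     b_ub=[4, 6, 0, 0],
                     bounds=[(0, None), (0, None), (None, None)])
    x, y, t = result.x
    print(x, y, t)              # t = min(2x + 2y, 1x - 3y) at the optimum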
A linear program seems to require that the objective vary linearly with
the level of a decision variable. In Program 1.1, the objective is to maximize
the linear expression, 2x + 2y. Let us replace the addend 2y in this objective
by the (nonlinear) function p(y) that is exhibited in Figure 1.4. This function
illustrates the case of decreasing marginal benefit, in which the (profit)
function p(y) has a slope that decreases as the quantity y increases.
[Figure 1.4. The profit function p(y); its slope equals 2 for y between 0
and 0.75, and it equals 0.25 for y above 0.75.]
The function p(y) can be modeled by introducing two new decision
variables, y1 and y2, imposing the constraints

    y1 ≥ 0, y1 ≤ 0.75, y2 ≥ 0, y = y1 + y2,

and replacing the addend 2y in the objective by (2y1 + 0.25y2). This results in:

Program 1.5. Maximize {2x + 2y1 + 0.25y2}, subject to the constraints

    1x + 2y ≤ 4,
    3x + 2y ≤ 6,
    y = y1 + y2,
    y1 ≤ 0.75,
    x ≥ 0, y ≥ 0, y1 ≥ 0, y2 ≥ 0.
To verify that Program 1.5 accounts correctly for the profit function p(y)
in Figure 1.4, we consider two cases. First, if the total quantity y does not
exceed 0.75, it is optimal to set y1 = y and y2 = 0. Second, if the total quantity
y does exceed 0.75, it is optimal to set y1 = 0.75 and y2 = y − 0.75.
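Program 1.5 is itself easy to hand to an LP code. In the sketch below (scipy
again, not the book’s software), the optimizer is free to split y between y1
and y2, and – as the two cases above predict – it fills y1 first.

    # Program 1.5: maximize 2x + 2*y1 + 0.25*y2 subject to the constraints
    # above.  Variables are ordered (x, y, y1, y2).
    from scipy.optimize import linprog

    result = linprog(c=[-2, 0, -2, -0.25],      # negated objective
                     A_ub=[[1, 2, 0, 0],        # 1x + 2y <= 4
                           [3, 2, 0, 0]],       # 3x + 2y <= 6
                     b_ub=[4, 6],
                     A_eq=[[0, 1, -1, -1]],     # y - y1 - y2 = 0
                     b_eq=[0],
                     bounds=[(0, None), (0, None), (0, 0.75), (0, None)])
    x, y, y1, y2 = result.x
    print(x, y, y1, y2, -result.fun)  # expect y1 = 0.75 and y2 = 0 here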
An unintended option

Program 1.5 is a bit more subtle than it might seem. Its constraints allow
an unintended option, which is to set y2 > 0 while y1 < 0.75. This option is
ruled out by optimization, however. In this case and in general:
The point is that a linear program will not engage in a more costly way to
do something if a less expensive method of doing the same thing is available.
Net profit is the negative of net cost: A net profit of $6.29 is identical to a
net cost of −$6.29, for instance. Maximizing net profit is precisely equivalent
to minimizing net cost. Because of this, the same trick that handles the case
of decreasing marginal profit also handles the case of increasing marginal
cost. One or more unintended options are introduced, but they are ruled out
by optimization. Again, the more costly way of doing something is avoided.
[Figure 1.5. The profit function q(y), whose slope increases at y = 1.]
Let us turn our attention to the variant of Program 1.1 whose object is
to maximize the nonlinear expression, {2x + q(y)}. Proceeding as before
would lead to Program 1.6, whose constraints are:

    1x + 2y ≤ 4,
    3x + 2y ≤ 6,
    y = y1 + y2,
    y1 ≤ 1,
    x ≥ 0, y ≥ 0, y1 ≥ 0, y2 ≥ 0.
Program 1.6 introduces an unintended option, which is to set y2
positive while y1 is below 1, and this option is selected by optimization.
Indeed, in Program 1.6, it cannot be optimal to set y1 positive. Given the
option, the linear program chooses the more profitable way to do something.
In this case and in general:
(1)    x − 1 ≤ t,    −t ≤ x − 1,    y − 2 ≤ u,    −u ≤ y − 2.
• If x exceeds 1, the first two constraints in (1) are satisfied by any value
of t that has t ≥ x − 1, and the fact that a is positive guarantees that the
objective is minimized by setting t = x − 1.

• If 1 exceeds x, the first two constraints in (1) are satisfied by any value
of t that has t ≥ 1 − x, and the fact that a is positive guarantees that the
objective is minimized by setting t = 1 − x.
A similar observation applies to y. Programs 1.7 and 1.7′ have the same
optimal value, and the optimal solution to Program 1.7′ specifies values of x
and y that are optimal for Program 1.7.
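Program 1.7 itself is not reproduced in this excerpt; from the surrounding
discussion, it evidently minimizes {a|x − 1| + b|y − 2|} subject to the
constraints of Program 1.1, and Program 1.7′ replaces the absolute values by
t and u as in (1). The sketch below takes a = b = 1; these weights are
illustrative, not from the text.

    # Program 1.7' with illustrative weights a = b = 1: minimize t + u
    # subject to the constraints of Program 1.1 and system (1).
    # Variables are ordered (x, y, t, u).
    from scipy.optimize import linprog

    result = linprog(c=[0, 0, 1, 1],        # minimize a*t + b*u
                     A_ub=[[1, 2, 0, 0],    # 1x + 2y <= 4
                           [3, 2, 0, 0],    # 3x + 2y <= 6
                           [1, 0, -1, 0],   # x - 1 <= t
                           [-1, 0, -1, 0],  # -t <= x - 1
                           [0, 1, 0, -1],   # y - 2 <= u
                           [0, -1, 0, -1]], # -u <= y - 2
                     b_ub=[4, 6, 1, -1, 2, -2],
                     bounds=[(0, None)] * 4)
    x, y, t, u = result.x
    print(x, y, t, u)        # at the optimum, t = |x - 1| and u = |y - 2|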
Program 1.8. Minimize {a(x − 1)² + b(y − 2)²}, subject to the constraints
in Program 1.1.
and convert the model to a linear program, exactly as was done for
Program 1.7.
On the other hand, the variance squares the difference between the
outcome and its expectation, and it weighs upside and downside differences
equally. Substituting a “mean absolute deviation” for the variance produces
a linear program that may make better sense. Also, removing two of the
constraints in (1) minimizes the expected downside variability, which might
make still better sense.
Constraints on ratios

Suppose we wish to impose the ratio constraint x/y ≤ 0.8, which is not
linear. But the other constraints in Program 1.1 guarantee y ≥ 0, and
multiplying an inequality by a nonnegative number preserves its sense. In
particular, multiplying the ratio constraint that is displayed above by the
nonnegative number y produces the linear constraint

    x ≤ 0.8y.
This conversion must be qualified, slightly, because ratios are not defined
when their denominators equal zero. If the constraint x/y ≤ 0.8 is intended
to mean that x cannot be positive when y = 0, it is equivalent to x ≤ 0.8y. In
general:
Optimizing a ratio*
The next three subsections concern a linear program whose objective
function is a ratio of linear expressions. These subsections are starred. They
cover a specialized topic that can be skipped or deferred with no loss of
continuity. Readers who are facile with matrix notation may wish to read
them now, however.
Program 1.9, below, maximizes the ratio of two linear expressions,
subject to linear constraints. Its data form the m × n matrix A, the m × 1
vector b, the 1 × n vector c and the 1 × n vector d. Its decision variables are
the entries in the n × 1 vector x.

Program 1.9. z* = Maximize cx/dx, subject to the constraints

(2)    Ax = b,    x ≥ 0.
Hypothesis A:

1. Every vector x that satisfies Ax = b and x ≥ 0 has dx > 0.
Interpretation of Hypothesis A*
In applications, it is often evident that every vector x that satisfies (2)
assigns a value to dx that is bounded away from 0 and from +∞.
A change of variables*
(3)    t = 1/(dx)    and    x̂ = xt.
Programs 1.9 and 1.10 have the same data, namely, the matrix A and the
vectors b, c and d. They have different decision variables. Feasible solutions
to these two optimization problems are related to each other by:
(5)    cx/dx = cx̂.

To verify (5), note that dx̂ = (dx)t = 1, so that

    cx/dx = (cx/dx̂) × t = (cx/1) × t = cx̂,

which completes the proof. ■
Proposition 1.1 shows how every feasible solution to Program 1.9
corresponds to a feasible solution to Program 1.10 that has the same
objective value. Thus, rather than solving Program 1.9 (which is nonlinear),
we can solve Program 1.10 (which is linear) and use (3) to construct an
optimal solution to Program 1.9.
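Program 1.10 itself is not reproduced in this excerpt. The change of variables
(3) is the classical Charnes-Cooper transformation, under which Program
1.10 presumably maximizes cx̂ subject to Ax̂ = bt, dx̂ = 1, x̂ ≥ 0 and t ≥ 0.
The sketch below applies that form to made-up data; A, b, c and d here are
illustrative, not from the text.

    # The change of variables (3) -- the Charnes-Cooper transformation --
    # applied to made-up data.  Program 1.10's variables are (xhat, t).
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 1.0, 1.0]])      # one constraint: x1 + x2 + x3 = 2
    b = np.array([2.0])
    c = np.array([3.0, 1.0, 0.0])        # numerator of the ratio
    d = np.array([1.0, 2.0, 1.0])        # denominator; positive on (2)

    A_eq = np.vstack([np.hstack([A, -b.reshape(-1, 1)]),  # A xhat - b t = 0
                      np.hstack([d, [0.0]])])             # d xhat = 1
    b_eq = np.array([0.0, 1.0])
    result = linprog(c=np.hstack([-c, [0.0]]),            # maximize c xhat
                     A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
    xhat, t = result.x[:3], result.x[3]
    print(xhat / t, -result.fun)   # optimal x for Program 1.9; ratio cx/dx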
5. Themes
Discussed in this section are several themes that are developed in later
chapters. These themes are:
• The broad array of situations that can be modeled and solved as linear
programs and their generalizations.
Pivoting
At the heart of nearly every software package that solves linear programs
lies the simplex method. The simplex method was devised by George B.
Dantzig in 1947. An enormous number of person-years have been invested in
attempts to improve on the simplex method. Algorithms that compete with it
in specialized situations have been devised, but nothing beats it for general-
purpose use, especially when integer-valued solutions are sought. Dantzig’s
simplex method remains the best general-purpose solver six decades after he
proposed it.
At the core of the simplex method lies the pivot, which plays a central
role in Gauss-Jordan elimination. In Chapter 3, we will see how Gauss-Jordan
elimination pivots in search of a solution to a system of linear equations. In
Chapter 4, we will see that the simplex method keeps on pivoting, in search of
an optimal solution to a linear program. In Chapter 15, we’ll see how a slightly
different pivot rule (called complementary pivoting) finds the solution to a
non-zero sum matrix game. And in Chapter 16, we’ll see how complementary
pivoting can be used to tackle fixed-point problems.
Impact on mathematics
The analysis of linear programs and their generalizations has had a
profound impact on mathematics. Three facets of this impact are noted here.
The simplex method actually solves a pair of linear programs – the one
under attack and its dual. That it does so is an important – and largely
unanticipated – facet of linear algebra whose implications are probed in
Chapter 12. Duality is an important addition to the mathematician’s tool kit;
it facilitates the proof of many theorems, as is evident in nearly every issue
of the journals Mathematics of Operations Research and Mathematical
Programming.
Economic reasoning
Within economics, these three concepts are usually described in the
context of an optimal allocation of resources. In Chapter 12, however, it will
be seen that these three concepts apply to each step of the simplex method,
which uses them to pivot from one “basis” to another as it seeks an optimal
solution.
It was mentioned earlier that every linear program is paired with
another; in particular, Programs 1.1 and 1.2 are each other’s duals. This
duality provides economic insight at several different levels. Three
illustrations of its impact are listed below.
well-being while requiring the market for each good to “clear.” The dual
linear program sets prices that maximize the producers’ profits. Their
optimal solutions satisfy the consumer’s budget constraint, thereby
constructing a general equilibrium.
Areas of application
Several chapters of this book are devoted to the situations that can be
modeled as linear programs and their generalizations.
The applications in the above list are of a linear program, without regard
to its dual. Models of competition can be analyzed by a linear program and
its dual. These include the aforementioned model of an economy in general
equilibrium.
6. Software
One choice
• Solver, which comes with Excel. The original version of Solver was
written by Frontline Systems. Solver is now maintained by Microsoft.
These packages are introduced in Chapter 2, and their uses are elaborated
upon in subsequent chapters. These packages (and many others) have user
interfaces that are amazingly user-friendly.
Large problems
Solver and Premium Solver for Education can handle all of the linear
and nonlinear optimization problems that appear in this text. These codes
fail on problems that are “really big” or “really messy” – those with a great
many variables, with a great many constraints, with a large number of
integer-valued variables, or with nonlinear functions that are not
differentiable. For big problems, you will need to switch to one of the many
commercially available packages, and you may need to consult an expert.
Leonid V. Kantorovich

the best way to plan for production.² This monograph included a linear
program, and it recognized the importance of duality, but it seemed to omit
a systematic method of solution. In 1942, Kantorovich published a paper
that included a complete description of a network flow problem, including
duality, again without a systematic solution method.³
For the next twenty years, Kantorovich’s work went unnoticed in the
West. Nor was it applauded within the U.S.S.R., where planning was
centralized and break-even prices were anathema. It was eventually
recognized that Kantorovich was the first to explore linear programming
and that he probed it deeply. Leonid V. Kantorovich richly deserved his
share of the 1975 Nobel Prize in Economics, awarded for work on the
optimal allocation of resources.
George B. Dantzig
After war’s end, Dantzig returned to Berkeley for a few months to
complete his Ph.D. degree. By the summer of 1946, Dantzig was back in
Washington as the lead mathematician in a group whose assignment was to
mechanize the planning problems faced by the Air Force. By the spring of
1947, Dantzig had observed that a variety of Air Force planning problems
could be posed as linear programs. By the summer of 1947 he had developed
the simplex method. These and a string of subsequent accomplishments
have cemented his stature as the preeminent figure in linear programming.
Tjalling C. Koopmans
² Kantorovich, L. V., The mathematical method of production planning and
organization, Leningrad University Press, Leningrad, 1939. Translated in
Management Science, V. 6, pp. 366-422, 1960.

³ Kantorovich, L. V., “On the translocation of masses,” Dokl. Akad. Nauk
SSSR, V. 37, pp. 227-229, 1942.
daughter. During the war, while serving as a statistician for the British
Merchant Shipping Mission in Washington, D.C., he built a model of
optimal routing of ships, with the attendant shadow costs. Koopmans shared
the 1975 Nobel Prize in economics with Kantorovich for his contributions
to the optimal allocation of resources.
An historic conference
A conference on activity analysis was held from June 20-24, 1949, at the
Cowles Commission (later the Cowles Foundation), then located at the
University of Chicago. This conference was organized by Tjalling Koopmans,
who had become very excited about the potential for linear programming
during a visit by Dantzig in the spring of 1947. The volume that emerged
from this conference was the first published compendium of results related
to linear programming.⁴ The participants in this conference included six
future Nobel Laureates (Kenneth Arrow, Robert Dorfman, Tjalling
Koopmans, Paul Samuelson, Herbert Simon and Robert Solow) and five
future winners of the von Neumann Theory Prize in Operations Research
(George Dantzig, David Gale, Harold Kuhn, Herbert Simon and Albert
Tucker).
With amazing foresight, the Air Force organized Project SCOOP
(scientific computation of optimal programs) and funded the development
and acquisition of digital computers that could implement the simplex
method. These computers included:
Industrial applications
8. Review
Terminology
Utility
It is hoped that you now have a feel for the value of studying linear
programming and its generalizations. Within this chapter, it has been
observed that:
• The methods have broad applicability – they adapt to handle strict
inequalities, integer-valued variables, nonlinearities, and competition.

• Pivots are potent – with them, we can tackle systems of linear
equations, linear programs, and fixed-point problems.
Its breadth, insight and usefulness may make linear programming the
most important development in applicable mathematics to have occurred
during the last 100 years.
9. Homework and Discussion Problems

1. (subway cars) The Transit Authority must repair 100 subway cars per
month, and it must refurbish 50 subway cars per month. Both tasks can
be done by the Transit Authority, and both can be contracted to private
shops, but at a higher cost. Private contracting increases the cost by $2000
per car repaired and $2500 per car refurbished.

(a) Formulate the problem of minimizing the monthly expense for private
contracting as a linear program. Solve it graphically.
in the case of a fire. On a normal school day, there are 450 people in the
building. It has three exterior doors. With a bit of experimentation, she
learned that about 1.5 minutes elapse between the sounding of a fire alarm
and the emergence of people from door A, after which people can emerge
at the rate of 60 per minute. The comparable data for doors B and C are
delay of 1.25 minutes and 1.0 minutes, and rates of 40 per minute and 50
per minute, respectively.
(b) Can you “eyeball” the optimal solution to this linear program? Hint:
After the first 1.5 minutes, are people filing out at the rate of 150 per
minute?
4. (deadheading) SW airline uses a single type of aircraft. Its service has been
disrupted by a major winter storm. A total of 20 aircraft, each with its
crew, must be deadheaded (flown without passengers) in order to resume
its normal schedule. To the right of the table that appears below are the
excess supplies at each of three airports. (These total 20). To the bottom
are the excess demands at five other airports. (They also total 20). Within
the table are the deadheading costs. For instance, the airline has 9 aircraft
too many at airport A, it has 5 aircraft too few at airport V, and the cost
of deadheading each aircraft from airport A to airport V is 25 thousand
dollars. The airline wishes to resume its normal schedule with the least
possible expense on deadheading.
(b) Subtract the smallest cost in each column from every cost in that
column. Did this alter the relative desirability of different plans?
(c) With respect to the costs obtained from part (b), “eyeball” a shipping
plan whose cost is close to zero. How far from optimum can it be?
Have you established a lower bound on the cost of resuming SW
airline’s normal schedule?
          V    W    X    Y    Z   supply
A        25   10   20   25   20        9
B         5   10   80   20   40        4
C        10   40   75   10   10        7
demand    5    2    4    6    3
 1. Preview ............................................................ 33
 2. The Basics ......................................................... 34
 3. Expository Conventions ............................................. 38
 4. The Sumproduct Function ............................................ 40
 5. Array Functions and Matrices ....................................... 44
 6. A Circular Reference ............................................... 46
 7. Linear Equations ................................................... 47
 8. Introducing Solver ................................................. 50
 9. Introducing Premium Solver ......................................... 56
10. What Solver and Premium Solver Can Do .............................. 60
11. An Important Add-In ................................................ 62
12. Maxims for Spreadsheet Computation ................................. 64
13. Review ............................................................. 65
14. Homework and Discussion Problems ................................... 65
1. Preview
Excel has evolved, and it continues to evolve. The same is true of Solver.
Several versions of Excel and Solver are currently in use. A goal of this
chapter is to provide you with the information that is needed to make
effective use of the software with which your computer is equipped.
If your computer is a PC, you could be using Excel 2003, 2007 or 2010.
Excel 2003 remains popular. Excel 2007 and Excel 2010 have different file
structures. To ease access, each topic is introduced in the context of Excel
2003 and is adapted to more recent versions of Excel in later subsections.
Needless to say, perhaps, some subsections are more relevant to you than
others.
But if your computer is equipped with Excel 2008 (for Macs only), its
software has a serious limitation. Excel 2008 does not support Visual Basic.
This makes it less than ideal for scientific and business uses. You will not be
able to use your computer to take the grunt-work out of the calculations in
Chapters 3 and 4, for instance. Upgrade to Excel 2011 as soon as possible.
It does support Visual Basic. Alternatively, use a different version of Excel,
either on your computer or on some other.
2. The Basics

This section contains basic information about Excel. If you are familiar
with Excel, scan it or skip it.
Cells
Table 2.1. A spreadsheet
You select a cell by putting the cursor in that cell and then clicking it.
When you select a cell, it is outlined in heavy lines, and a fill handle appears
in the lower right-hand corner of the outline. In Table 2.1, cell C9 has been
selected. Note the fill handle – it will prove to be very handy.
Entering numbers
Entering functions
In Excel, functions (and only functions) begin with the “=” sign. To enter a
function into a cell, select that cell, depress the “=” key, then type the function,
and then depress the Enter key. The function you enter in a cell will not appear
there. Instead, the cell will display the value that the function has been assigned.
In Table 2.1, cell A3 displays the value 24, but it is clear (from column C)
that cell A3 contains the function =2^3*3 (that is, 2³ × 3), rather than the
number 24. Similarly, cell A5 displays the number 1.414…, which is the
value of the function =SQRT(2), evaluated to ten significant digits.
Entering text
To enter text into a cell, select that cell, then type the text, and then
depress either the Enter key or any one of the arrow keys. To make cell A6
look as it does, select cell A6 and type mean. Then hit the Enter key. If the
text you wish to place in a cell could be misinterpreted, begin with an
apostrophe, which will not appear. To make cell A7 appear as it does in
Table 2.1, select cell A7, type ‘= mean, and hit the Enter key. The leading
apostrophe tells Excel that what follows is text, not a function.
Formatting a cell
In Table 2.1, cell A8 displays the fraction 1/3. Making that happen looks
easy. But suppose you select cell A8, type 1/3 and then press the Enter key.
What will appear in cell A8 is “3-Jan.” Excel has decided that you wish to put
a date in cell A8. And Excel will interpret everything that you subsequently
enter into cell A8 as a date. Yuck!
With Excel 2003 and earlier, the way out of this mess is to click on the
Format menu, then click on Cells, then click on the Number tab, and then
select either General format or a Type of Fraction.
With Excel 2007, the Format menu disappeared. To get to the Format
Cells box, double-click on the Home tab. In the menu that appears, click on
the Format icon, and then select Format Cells from the list that appears. From
here on, proceed as in the prior subsection.
With Excel 2010, the Format Cells box has moved again. To get at it, click
on the Home tab. A horizontal “ribbon” will appear. One block on that ribbon
is labeled Number. The lower-right hand corner of the Number block has a
tiny icon. Click on it. The Format Cells dialog box will appear.
Entering Fractions
How can you get the fraction 1/3 to appear in cell A8 of Table 2.1? Here is
one way. First, enter the function =1/3 in that cell. At this point, 0.333333333
will appear there. Next, with cell A8 still selected, bring the Format Cells box
into view. Click on its Number tab, select Fraction and the Type labeled Up
to one digit. This will round the number 0.333333333 off to the nearest one-
digit fraction and report it in cell A8.
If you select a cell, its content appears in the formula bar, which is the
blank rectangle just above the spreadsheet’s column headings. If you select
cell A5 of Table 2.1, the formula =SQRT(2) will appear in the formula bar,
for instance. What good is the formula bar? It is a nice place to edit your
functions. If you want to change the number in cell A5 to √3, select cell A5,
move the cursor onto the formula bar, and change the 2 to a 3.
Arrays
In Excel jargon, a relative reference to a column or row omits the “$” sign,
and an absolute (or fixed) reference to a column or row includes the “$” sign.
With Excel 2003 and earlier, select the cell or array you want to
reproduce. Then move the cursor to the Copy icon (it is just to the right of
the scissors), and then click it. This puts a copy of the cell or array you
selected on the Clipboard. Next, select the cell (or array) in which you want
the information to appear, and click on the Paste icon. What was on the
clipboard will appear where you put it except for any cell addresses in
functions that you copied onto the Clipboard. They will change as follows:
• The relative addresses will shift by the number of rows and/or columns
that separate the place where you got it and the place where you put it.

This may seem abstruse, but its uses will soon be evident.
With Excel 2007, the Copy and Paste icons have been moved. To make
them appear, double-click on the Home tab. The Copy icon will appear just
below the scissors. The Paste icon appears just to the left of the Copy icon,
and it has the word “Paste” written below it. With Excel 2010, the Copy and
Paste icons are back in view – on the Home tab, at the extreme left.
3. Expository Conventions

An effort has been made to present material about Excel in a way that is
easy to grasp. As concerns keystroke sequences, from this point on:
This text displays each Excel keystroke sequence in boldface type,
omitting both:
The spreadsheets that appear in this text display the values that have been
assigned to functions, rather than the functions themselves. The convention
that is highlighted below can help you to identify the functions.
In Table 2.1, for instance, cells A3, A4 and A5 are outlined in dotted lines,
and column C specifies the functions whose values they contain. Finally:
The Springer website contains two items that are intended for use with
this book. They can be downloaded from
http://extras.springer.com/2011/978-1-4419-6490-8.
One of the items at the Springer website is a folder that is labeled, “Excel
spreadsheets – one per chapter.” You are encouraged to download that folder
now, open its spreadsheet for Chapter 2, note that this spreadsheet contains
sheets labeled Table 2.1, Table 2.2, …, and experiment with these sheets as
you proceed.
4. The Sumproduct Function

Problem 2.A. For the random variable X that is described in Table 2.2,
compute the mean, the variance, the standard deviation, and the mean
absolute deviation.
The sumproduct function will make short work of Problem 2.A. Before
discussing how, we interject a brief discussion of discrete probability models.
If you are facile with discrete probability, it is safe to skip to the subsection
entitled “Risk and Return.”
The probability model in Table 2.2 has four outcomes, and the sum of
their probabilities does equal 1.0. Outcome b will occur with probability 0.55,
and the random variable X will take the value 3.2 if outcome b occurs.
The random variable X in Table 2.2 takes values between –6 and +22. The
mean (a.k.a. expectation) of a random variable represents the “center” of its
probability distribution. The mean of a random variable X is denoted as μ or
E(X), and it is found by multiplying the probability of each outcome by the
value that the random variable takes when that outcome occurs and taking
the sum. For the data in Table 2.2, we have
The mean of a random variable has the same unit of measure as does the
random variable itself. If X is measured in dollars, so is its mean. The mean is
a weighted average; each value that X can take is weighed (multiplied) by its
probability.
There are several measures of the spread of a random variable, that is,
of the difference (X − μ) between the random variable X and its mean. The
most famous of these measures of spread is known as the variance. The
variance of a random variable X is denoted as σ² or Var(X) and is the
expectation of the square of (X − μ). For the data in Table 2.2, we have
The unit of measure of the variance is the square of the unit of measure
of the random variable. If X is measured in dollars, Var(X) is measured in
(dollars) × (dollars), which is a bit weird.
The standard deviation of a random variable has the same unit of mea-
sure as does the random variable itself.
Taking the square (in the variance) and then the square root (in the
standard deviation) seems a bit contrived, and it emphasizes values that are
far from the mean. For many purposes, the mean absolute deviation may be
a more natural measure of the spread in a distribution.
Risk and return

Interpret the random variable X as the profit that will be earned from a
portfolio of investments. A tenet of financial economics is that in order to
obtain a higher return one must accept a higher risk. In this context, E(X) is
taken as the measure of return, and StDev(X) as the measure of risk. It can
make sense to substitute MAD(X) as the measure of risk. Also, as suggested
in Chapter 1, a portfolio X that minimizes MAD(X) subject to the
requirement that E(X) be at least as large as a given threshold can be found
by solving a linear program.
The arguments in the sumproduct function must be arrays that have the
same number of rows and columns. Let us suppose we have two arrays of
the same size. The sumproduct function multiplies each element in one of
these arrays by the corresponding element in the other and takes the sum.
The same is true for three arrays of the same size. That makes it easy to
compute the mean, the variance and the standard deviation, as is illustrated
in Table 2.3.
Note that:
• The function in cell C13 multiplies each entry in the array C5:C8 by
the corresponding entry in the array D5:D8 and takes the sum, thereby
computing μ = E(X).
The arrays in a sumproduct function must have the same number of rows
and the same number of columns. In particular, a sumproduct function will
not multiply each element in a row by the corresponding element in a column
of the same length.
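Off the spreadsheet, the same computations take only a few lines. The sketch
below is a Python analogue of the SUMPRODUCT recipe; the probabilities
and values are made up for illustration and are not those of Table 2.2.

    # Python analogue of the spreadsheet's SUMPRODUCT recipe.  The
    # probabilities and values below are illustrative only.
    probs = [0.2, 0.3, 0.4, 0.1]
    values = [-5.0, 2.0, 8.0, 20.0]

    mean = sum(p * v for p, v in zip(probs, values))               # E(X)
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))  # Var(X)
    st_dev = var ** 0.5                                            # StDev(X)
    mad = sum(p * abs(v - mean) for p, v in zip(probs, values))    # MAD(X)
    print(mean, var, st_dev, mad)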
Dragging

• Move the cursor to the lower right-hand corner of cell E5. The fill
handle (a small rectangle in the lower right-hand corner of cell E5) will
change to a Greek cross (“+” sign).

• While this Greek cross appears, depress the mouse, slide it down to cell
E8 and then release it. The functions =D6 − C$13 through =D8 − C$13
will appear in cells E6 through E8. Nice!
Dragging downward increments the relative row numbers, but not the
fixed row numbers. Similarly, dragging to the right increases the relative
column numbers, but leaves the fixed column numbers unchanged. Dragging
is an especially handy way to repeat a pattern and to execute a recursion.
5. Array Functions and Matrices

• Select the array (block) of cells whose values this array function will
determine.

• Type the name of the array function, but do not hit the Enter key.
Instead, hit Ctrl+Shift+Enter (in other words, depress the Ctrl and Shift
keys and, while they are depressed, hit the Enter key).
Matrix multiplication
    A = |  0  1   2 |,     B = | 3  2 |,     C = | 4  2 |
        | −1  1  −1 |          | 2  0 |          | 1  3 |
                               | 1  1 |
The matrix product AB is the matrix whose (i, j)th element is found by
multiplying each element in the ith row of A by the corresponding element
in the jth column of B and taking the sum.
A spreadsheet
• Hit Ctrl+Shift+Enter
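Off the spreadsheet, the same arithmetic can be checked in a few lines of
Python (numpy assumed available); the matrices are A, B and C as displayed
above.

    # A check, off the spreadsheet, of what =MMULT and =MINVERSE compute.
    import numpy as np

    A = np.array([[0, 1, 2], [-1, 1, -1]])
    B = np.array([[3, 2], [2, 0], [1, 1]])
    C = np.array([[4, 2], [1, 3]])

    print(A @ B)             # the 2x2 product AB
    print(B @ A)             # the 3x3 product BA
    print(np.linalg.inv(C))  # C has nonzero determinant, so this succeeds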
Quirks

Excel computes array functions with ease, but it has its quirks. One of
them has been mentioned – you need to remember to end each array
function by hitting Ctrl+Shift+Enter rather than by hitting Enter alone.
The third quirk occurs when you decide to alter an array function or to
eliminate an array. To do so, you must begin by selecting all of the cells in
which its output appears. Should you inadvertently attempt to change a
portion of the output, Excel will proclaim, “You cannot change part of an
Array.” If you then move the cursor – or do most anything – Excel will
repeat its proclamation. A loop! To get out of this loop, hit the Esc key.
6. A Circular Reference

Problem 2.B. Find values of x and y that satisfy the equations

    x = 6 − 0.5y,
    y = 2 + 0.5x.

This is easy. Substituting (2 + 0.5x) for y in the first equation gives x = 4
and hence y = 4.
Let us see what happens when we set this problem up in a naïve way for
solution on a spreadsheet. In Table 2.5, formulas for x and y have been
placed in cells B4 and B5. The formula in each of these cells refers to the
value in the other. A loop has been created. Excel insists on being able to
evaluate the functions on a spreadsheet in some sequence. When Excel is
presented with Table 2.5, it issues a circular reference warning.
This seems ominous. Excel cannot solve a system of equations. But it can,
with a bit of help.
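One way to see that the loop itself is benign is to iterate: re-evaluate both
formulas repeatedly from an arbitrary starting point. The sketch below does
exactly that in Python; it is an illustration of why the loop is harmless, not
the method this text develops.

    # Fixed-point iteration on the circular pair x = 6 - 0.5y, y = 2 + 0.5x.
    # Each pass re-evaluates both formulas from the previous pass's values.
    x, y = 0.0, 0.0                      # arbitrary starting values
    for _ in range(60):
        x, y = 6 - 0.5 * y, 2 + 0.5 * x  # simultaneous update
    print(x, y)                          # converges to x = 4, y = 4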
To see how to get around the circular reference problem, we turn our
attention to an example that is slightly more complicated than Problem 2.B.
This example is

Problem 2.C. Find values of the variables A, B and C that satisfy the
equations

    2A + 3B + 4C = 10,
    2A − 2B − C = 6,
    A + B + C = 1.
You probably recall how to solve Problem 2.C, and you probably recall
that it requires some grunt-work. We will soon see how to do it on a
spreadsheet, without the grunt-work.
An ambiguity
Problem 2.C exhibits an ambiguity. The letters A, B and C are the names
of the variables, and Problem 2.C asks us to find values of the variables A, B
and C that satisfy the three equations. You and I have no trouble with this
ambiguity. Computers do. On a spreadsheet, the name of the variable A will
be placed in one cell, and its value will be placed in another cell.
Table 2.6 presents the data for Problem 2.C. Cells B2, C2 and D2 contain
the labels of the three decision variables, which are A, B and C. Cells B6, C6
and D6 have been set aside to record the values of the variables A, B and C.
The data in the three constraints appear in rows 3, 4 and 5, respectively.
Note that:
• Trial values of the decision variables have been inserted in cells B6, C6
and D6.
• The “=” signs in cells F3, F4 and F5 are memory aides; they remind
us that we want to arrange for the numbers to their left to equal the
numbers to their right, but they have nothing to do with the
computation.
• The “$” signs in cell E5 suggest – correctly – that this function has been
dragged upward onto cells E4 and E3. For instance, cell E3 contains the
value assigned to the function

    =SUMPRODUCT(B3:D3, B$6:D$6)

and the number 9 appears in cell E3 because Excel assigns this function
the value 9 = 2 × 1 + 3 × 1 + 4 × 1.
The pattern in Table 2.6 works for any number of linear equations in
any number of variables. This pattern is dubbed the “standard format” for
linear systems, and it will be used throughout this book. A linear system is
expressed in standard format if the columns of its array identify the variables
and the rows identify the equations, like so:
• One row is reserved for the values of the variables (row 6, above).
– An “=” sign that serves (only) as a memory aid (as in cell F3).
What is missing?
Our goal is to place numbers in cells B6:D6 for which the values of the
functions in cells E3:E5 equal the numbers in cells G3:G5, respectively. Excel
cannot do that, by itself. We will see how to do it with Solver and then with
Premium Solver for Education.
8. Introducing Solver

Let us begin with a bit of history. Solver was written by Frontline
Systems for inclusion in an early version of Excel. Shortly thereafter,
Microsoft took over the maintenance of Solver, and Frontline Systems
introduced Premium Solver. Over the intervening years, Frontline Systems
has improved its Premium Solver repeatedly. Recently, Microsoft and
Frontline Systems worked together in the design of Excel 2010 (for PCs) and
Excel 2011 (for Macs). As a consequence:
• If your computer is equipped with Excel 2003 or Excel 2007, Solver is
perfectly adequate, but Premium Solver has added features and fewer
bugs.

• If your computer is equipped with Excel 2010 (for PCs) or with Excel
2011 (for Macs), a great many of the features that Frontline Systems
introduced in Premium Solver have been incorporated in Solver itself,
and many bugs have been eliminated.

• If your computer is equipped with Excel 2008 for Macs, it does not
support Visual Basic. Solver is written in Visual Basic. The =pivot(cell,
array) function, which is used extensively in this book, is also written in
Visual Basic. You will not be able to use Solver or the “pivot” function
until you upgrade to Excel 2011 (for Macs). Until then, use some other
version of Excel as a stopgap.
Preview
This section begins with a discussion of the version of Solver with which
Excel 2000, 2003 and 2007 are equipped. The discussion is then adapted to
Excel 2010 and 2011. Premium Solver is introduced in the next section.
Finding Solver
When you purchased Excel (with the exception of Excel 2008 for Macs),
you got Solver. But Solver is an “Add-In,” which means that it may not be
ready to use. To see whether Solver is up and running, open a spreadsheet.
Chapter 2: Eric V. Denardo 51
With Excel 2003 or earlier, click on the Tools menu. If Solver appears
there, you are all set; Solver is installed and activated. If Solver does not
appear on the Tools menu, it may have been installed but not activated, and
it may not have been installed. Proceed as follows:

• Click again on the Tools menu, and then click on Add-Ins. If Solver is
listed as an Add-In but is not checked off, check it off. This activates
Solver. The next time you click on the Tools menu, Solver will appear
and will be ready to use.
• If Solver does not appear on the list of Add-Ins, you will need to find
the disc on which Excel came, drag Solver into your Library, and then
activate it.
If your computer is equipped with Excel 2007, Solver is not on the Tools
menu. To access Solver, click on the Data tab and then go to the Analysis box.
You will see a button labeled Solver if it is installed and active. If the Solver
button is missing:
• Click on the Office Button that is located at the top left of the
spreadsheet.
• In the bottom right of the window that appears, select the Excel
Options button.

• Next, click on the Add-Ins button on the left and look for Solver
Add-In in the list that appears.

• If it is in the inactive section of this list, then select Manage: Excel
Add-Ins, then click Go…, and then select the box next to Solver Add-in
and click OK.

• If Solver Add-in is not listed in the Add-Ins available box, click Browse
to locate the add-in. If you get prompted that the Solver Add-in is not
currently installed on your computer, click Yes to install it.
To find Solver with Excel 2010, click on the Data tab. If Solver appears
(probably at the extreme right), you are all set. If Solver does not appear, you
will need to activate it, and you may need to install it. To do so, open an Excel
spreadsheet and then follow this protocol:
• Click on the File menu, which is located near the top left of the
spreadsheet.
• Click on the Options tab (it is near the bottom of the list) that appeared
when you clicked on the File menu.
• A dialog box named Excel Options will pop up. On the side-bar to its
left, click on Add-Ins. Two lists of Add-Ins will appear – “Active
Application Add-Ins” and “Inactive Application Add-Ins.”

– If Solver is on the “Inactive” list, find the window labeled “Manage:
Excel Add-Ins,” click on it, and then click on the “Go” button to its
right. A small menu entitled Add-Ins will appear. Solver will be on
it, but it will not be checked off. Check it off, and then click on OK.

– If Solver is not on the “Inactive” list, click on Browse, and use it to
locate Solver. If you get a prompt that the Solver Add-In is not
currently installed on your computer, click “Yes” to install it. After
installing it, you will need to activate it; see above.
To make your Solver dialog box look like that in Figure 2.1, proceed as
follows:
• With Excel 2003, on the Tools menu, click on Solver. With Excel 2007,
go to the Analysis box of the Data tab, and click on Solver.
• Move the cursor to the By Changing Cells window, then select cells
B6:D6, and then click.
– Click on the Cell Reference window, then select cells E3:E5 and click.

– Click on the Constraint window. Then select cells G3:G5 and click.
This will cause the Add Constraint dialog box to look like:

– Click on OK. This will close the Add Constraint dialog box and
return you to the Solver dialog box, which will now look exactly like
Figure 2.1.
• In the Solver dialog box, do not click on the Solve button. Instead, click
on the Options button and, on the Solver Options menu that appears
(see below) click on the Assume Linear Model window. Then click on
the OK button. And then click on Solve.
In a flash, your spreadsheet will look like that in Table 2.7. Solver has
succeeded; the values it has placed in cells B6:D6 enforce the constraints
E3:E5 = G3:G5. Evidently, setting A = 0.2, B = −6.4 and C = 7.2 solves
Problem 2.C.
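As a check on Solver’s answer, Problem 2.C can also be solved directly; the
sketch below uses Python’s numpy rather than the spreadsheet.

    # Problem 2.C solved directly, as a check on Solver's answer.
    import numpy as np

    lhs = np.array([[2.0, 3.0, 4.0],
                    [2.0, -2.0, -1.0],
                    [1.0, 1.0, 1.0]])
    rhs = np.array([10.0, 6.0, 1.0])
    print(np.linalg.solve(lhs, rhs))     # expect [ 0.2  -6.4   7.2]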
Presented as Figure 2.2 is a Solver dialog box for Excel 2010. It differs from
the dialog box for earlier versions of Excel in the ways that are listed below:
• The method of solution is selected on the main dialog box rather than
on the Options page.
Fill this dialog box out as you would for Excel 2007, but remember to
select the option you want in the “nonnegative variables” box.
9. Introducing Premium Solver

Frontline Systems has made available for educational use a bundle of
software called the Risk Solver Platform. This software bundle includes
Premium Solver, which is an enhanced version of Solver. This software
bundle also includes the capability to formulate and run simulations and the
capability to draw and roll back decision trees. Sketched here are the
capabilities of Premium Solver. This sketch is couched in the context of
Excel 2010. If you are using a different version of Excel, you may need to
adapt it somewhat.
Note to instructors
If you adopt this book for a course, you can arrange for the participants
in your course (including yourself, of course) to have free access to the edu-
cational version of the Risk Solver Platform. To do so, call Frontline Systems
at 755 831-0300 (country code 01) and press 0 or email them at academics@
solver.com.
Note to students
If you are enrolled in a course that uses this book, you can download
the Risk Solver Platform by clicking on the website http://solver.com/student/
and following instructions. You will need to specify the “Textbook Code,”
which is DLPEPAE, and the “Course code,” which your instructor can
provide.
Using Premium Solver as an Excel Add-In is discussed in this subsection.
Using it as part of the Risk Solver Platform is discussed a bit later.
• In the window to the left of the Options button, click on Standard LP/
Quadratic.
• If the button that makes the variables nonnegative is checked off, click
on it to remove the check mark. Then click on Solve.
In a flash, your spreadsheet will look like that in Table 2.7. It will report
values of 0.2, –6.4 and 7.2 in cells B7, C7, and D7.
But when Premium Solver is operated from the Risk Solver Platform, it is
modeless, which means that you can move back and forth between Premium
Solver and your spreadsheet without closing anything down. The modeless
version can be very advantageous.
To see how to use Premium Solver from the Risk Solver Platform, begin
by reproducing Table 2.6 on a spreadsheet. Then click on the File button. A
Risk Solver Platform button will appear at the far right. Click on it. A menu
will appear. Just below the File button will be a button labeled Model. If that
button is not colored, click on it. A dialog box will appear at the right; in it,
click on the icon labeled Optimization. A dialog box identical to Figure 2.4
will appear, except that neither the variables nor the constraints will be
identified.
Making this dialog box look exactly like Figure 2.4 is not difficult. The
green Plus sign (Greek cross) just below the word “Model” is used to add
information. The red “X” to its right is used to delete information. Proceed
as follows:
• Select cells B6:D6, then click on Normal Variables, and then click on
Plus.
• Click on Normal Constraints and then click on Plus. Use the dialog box
that appears to impose the constraints E3:E5 = G3:G5.
It remains to specify the solution method you will use and to execute the
computation. To accomplish this:
• Click on Engine, which is to the right of the Model button, and select
Standard LP/Quadratic Engine.
• Click on Output, which is to the right of the Engine button. Then click
on the green triangle that points to the right.
In an instant, your spreadsheet will look exactly like Table 2.7. It will
exhibit the solution A = 0.2, B = −6.4 and C = 7.2.
10. What Solver and Premium Solver Can Do

Premium Solver and the versions of Solver that are in Excel 2010 and
Excel 2011 include all three packages. Earlier editions of Excel include the
first two of these packages. A subsection is devoted to each.
The LP software

When solving linear programs and integer programs, use the LP
software. It is quickest, and it is guaranteed to work. If you use it with earlier
versions of Solver, remember to shift to the Options sheet and check off
Assume Linear Model. To use it with Premium Solver as an Add-In, check
off Standard LP/Quadratic in a window on the main dialog box. The
advantages of this package are listed below:
• Its software checks that the system you claim to be linear actually is
linear – and this is a debugging aid. (Excel 2010 is equipped with a
version of Solver that can tell you what, if anything, violates the linearity
assumptions.)
• When you use the LP software, you can place any values you want in
the changing cells before you click on the Solve button. The values you
have placed in these cells will be ignored.
• On the other hand, when you use the GRG software, the values you
place in the changing cells are important. The software starts with the
values you place in the changing cells and attempts to improve on them.

The closer you start, the more likely the GRG software is to obtain a
solution. It is emphasized:

When using the GRG software, try to “start close” by putting reasonable
numbers in the changing cells.
Premium Solver’s GRG code includes (on its options menu) a “multi-start”
feature that is designed to find solutions to problems that are not convex. If
you are having trouble with the GRG code, give it a try.
A quirk
The GRG Solver may attempt to evaluate a function outside the range for
which it is defined. It can attempt to evaluate the function =LN(cell) with a
negative number in that cell, for instance. Excel’s =ISERROR(cell) function
can help you to work around this. To see how, please refer to the discussion
on page 643 of Chapter 20.
Numerical differentiation
It is also the case that the GRG Solver differentiates numerically; it ap-
proximates the derivative of a function by evaluating that function at a variety
of points. It is safe to use any function that is differentiable and whose deriva-
tive is continuous. Here are two examples of functions that should be avoided:
If you use a function that is not differentiable, you may get lucky. And you
may not. It is emphasized:
Needless to say, perhaps, it is a very good idea to avoid functions that are
not continuous when you use the GRG Solver.
The Evolutionary Solver

This software package is markedly slower, but it does solve problems
that elude the simplex method and the generalized reduced gradient method.
Use it when the GRG solver does not work.
The Risk Solver Platform includes other optimization packages. The
Gurobi package solves linear, quadratic, and mixed-integer programs very
effectively. Its name is an amalgam of the last names of the founders of
Gurobi Optimization, who are Robert Bixby, Zonghao Gu, and Edward
Rothberg. The SOCP engine quickly solves a generalization of linear
programs whose constraints are cones.
random variable having μ as its mean and σ as its standard deviation exceeds
the number q. That function sees action in Chapter 7.
Begin by clicking on the Springer website for this book, which is
specified on page 39. On that website, click on the icon labeled OP_TOOLS,
copy it, and paste it into a convenient folder on your computer, such as My
Documents. Alternatively, drag it onto your Desktop.
What remains is to insert this Add-In in your Library and to activate it.
How to do so depends on which version of Excel you are using.
With Excel 2003, the Start button provides a convenient way to find and
open your Library folder (or any other). To accomplish this:
• Click on the Start button. A menu will pop up. On that menu, click on
Search. Then click on For Files and Folders. A window will appear. In
it, type Library. Then click on Search Now.
• After a few seconds, the large window to the right will display an icon
for a folder named Library. Click on that icon. A path to the folder that
contains your Library will appear toward the top of the screen. Click on
that path.
• You will have opened the folder that contains your library. An icon for
your Library is in that folder. Click on the icon for your Library. This
opens your Library.
With your library folder opened, drag OP_TOOLS into it. Finally, acti-
vate OP_TOOLS, as described earlier.
With Excel 2007 and Excel 2010, clicking on the Start button is not the
best way to locate your Library. Instead, open Excel. If you are using Excel
2007, click on the Microsoft Office button. If you are using Excel 2010, click
on File.
Next, with Excel 2007 or 2010, click on Options. Then click on the Add-
Ins tab. In the Manage drop-down, choose Add-Ins and then click Go. Use
Browse to locate OP_TOOLS and then click on OK. Verify that OP_TOOLS
is on the Active Add-Ins list, and then click on OK at the bottom of the
window.
The fact that Excel gives instant feedback can help you to “debug as you
go.”
13. Review
All of the information in this chapter will be needed, sooner or later. You
need not master all of it now. You can refer back to this chapter as needed.
Before tackling Chapters 3 and 4, you should be facile with the use of spread-
sheets to solve systems of linear equations via the “standard format.” You
should also prepare to use the software on the Springer website for this book.
A final word about Excel: When you change any cell on a spreadsheet, Ex-
cel automatically re-computes the value of each function on that sheet. This
happens fast – so fast that you may not notice that it has occurred.
3. (the famous birthday problem) Suppose that each child born in 2007 (not a leap year) was equally likely to be born on any day, independent of the others. A group of n such children has been assembled. None of these children are related to each other. Denote as Q(n) the probability that at least two of these children share a birthday. Find the smallest value of n for which Q(n) > 0.5. Hint: The probability P(n) that these n children were born on n different days can be found (on a spreadsheet) from the recursion P(n) = P(n − 1)(366 − n)/365 with P(1) = 1, and Q(n) = 1 − P(n). A "drag" will show how quickly P(n) decreases as n increases.
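For readers who prefer code to a spreadsheet, here is a minimal Python sketch of that recursion (the function name is ours):

    # P(n): probability that n children were born on n different days.
    # Recursion: P(1) = 1 and P(n) = P(n-1) * (366 - n) / 365.
    def smallest_group_with_shared_birthday(threshold=0.5):
        p, n = 1.0, 1
        while 1.0 - p <= threshold:      # Q(n) = 1 - P(n)
            n += 1
            p *= (366 - n) / 365.0
        return n

    print(smallest_group_with_shared_birthday())   # prints 23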
4. For the matrices A and B in Table 2.4, compute the matrix product BA.
What happens when you ask Excel to compute (BA)–1? Can you guess
why?
5. Use the GRG Solver to find values of A, B and C that satisfy the system
3A + 2B + 1C + 5 ln(A) = 6
2A + 3B + 2C + 4 ln(B) = 5
1A + 2B + 3C + 3 ln(C) = 4
6. Recreate Table 2.4. Replace the “0” in matrix A with a blank. What hap-
pens?
7. The spreadsheet that appears below computes 1 + 2^n and 2^n for various values of n, takes the difference, and gets 1 for n ≤ 49 and gets 0 for n ≥ 50. Why? Hint: Modern versions of Excel work with 64-bit words.
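The same experiment is easy to replicate outside Excel. The Python sketch below (ours) uses IEEE double-precision arithmetic, whose 53-bit significand moves the switch point to n = 53 rather than 50; Excel's arithmetic and display rules differ slightly:

    # 1 + 2**n is exactly representable as a double only while n <= 52;
    # for larger n, 1.0 + 2.0**n rounds to 2.0**n and the difference is 0.
    for n in (49, 50, 52, 53, 54):
        print(n, (1.0 + 2.0**n) - 2.0**n)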
Chapter 3: Mathematical Preliminaries
1. Preview
2. Gaussian Operations
3. A Pivot
4. A Basic Variable
5. Trite and Inconsistent Equations
6. A Basic System
7. Identical Columns
8. A Basis and its Basic Solution
9. Pivoting on a Spreadsheet
10. Exchange Operations
11. Vectors and Convex Sets
12. Vector Spaces
13. Matrix Notation
14. The Row and Column Spaces
15. Efficient Computation*
16. Review
17. Homework and Discussion Problems
1. Preview
One section of this chapter is starred. That section touches lightly on ef-
ficient numerical computation, an advanced topic on which this book does
not dwell.
3. A Pivot
A pivot on the nonzero coefficient of a variable x in an equation (j) consists of this sequence of Gaussian operations:
• First, replace equation (j) by itself divided by the coefficient of x in equation (j).
• Then, for each k other than j, replace equation (k) by itself minus equation (j) times the coefficient of x in equation (k).
This definition may seem awkward, but applying it to system (1) will
make everything clear. This will be done twice – first by hand, then on a
spreadsheet.
The first pivot occurs on the coefficient of x1 in the first equation. It transforms system (1) into system (2), below. This pivot con-
sists of Gaussian operations, so it preserves the set of solutions to system (1).
In other words, each set of values of the variables x1, x2, x3, and x4 that satis-
fies system (1) also satisfies system (2), and conversely.
This pivot has eliminated the variable x1 from equations (2.2), (2.3) and
(2.4) because its coefficients in these equations equal zero.
The next pivot will occur on a nonzero coefficient in equation (2.2). The
variables x3 and x4 have nonzero coefficients in this equation. We could pivot
on either. Let’s pivot on the coefficient of x3 in equation (2.2). This pivot con-
sists of the following sequence of Gaussian operations:
These Gaussian operations transform system (2) into system (3), below.
They create no solutions and destroy none.
This pivot made x3 basic for equation (3.2). It kept x1 basic for equation
(3.1). That is no accident. Why? The coefficient of x1 in equation (2.2) had
been set equal to zero, so replacing another equation by itself less some constant times equation (2.2) cannot change its coefficient of x1. The property
that this illustrates holds in general. It is emphasized:
• The variable x becomes basic for the equation that has the coefficient on which the pivot occurred.
• Any variable that had been basic for another equation remains basic for that equation.
For the remainder of this section, it is assumed that p = −4/3. In this case,
equations (3.1) and (3.2) have basic variables, and equation (3.3) is trite.
Gauss-Jordan elimination continues to pivot, aiming for a basic variable for
each non-trite equation. Equation (3.4) lacks a basic variable. The variables x2 and x4 have nonzero coefficients in equation (3.4); pivoting on the coefficient of x2 in equation (3.4) produces system (4), below.
In system (4), each non-trite equation has been given a basic variable. A
solution to system (4) is evident. Equate each basic variable to the right-hand-
side value of the equation for which it is basic, and equate any other variables
to zero. That is, set:
x1 = 1, x3 = –2/3, x2 = 1/3, x4 = 0.
These values of the variables satisfy system (4), hence must satisfy sys-
tem (1).
Each choice of a value for x4 determines values of the basic variables, namely
(5.1) x1 = 1 − (5/3)x4 ,
(5.2) x3 = −2/3 + 2x4 ,
(5.3) x2 = 1/3 − (2/3)x4 .
Each set of values obtained in this way satisfies system (4) and, consequently, satisfies system (1). By the way, the question posed earlier can now be answered: If p ≠ −4/3, system (1) has no solution, and if p = −4/3, system (1) has infinitely many solutions, one for
each value of x4 .
The dictionary
System (5) has been written in a format that is dubbed the dictionary
because:
• Each equation has a basic variable, and that basic variable is the sole
item on the left-hand side of the equation for which it is basic.
• The nonbasic variables appear only on the right-hand sides of the equa-
tions.
Basic solution
The basic solution to a basic system equates each nonbasic variable to zero and equates each basic variable to the right-hand-side value of the equation for which it is basic. For system (4), the basic solution is
x1 = 1, x3 = −2/3, x2 = 1/3, x4 = 0.
Each equation has n + 1 data elements, including its right-hand side val-
ue. The first Gaussian operation in a pivot divides an equation by the coef-
ficient of one of its variables. This requires n divisions (not n + 1) because it
is not necessary to divide a number by itself. Each of the remaining Gaussian
operations in a pivot replaces an equation by itself less a particular constant d
times another equation. This requires n multiplications (not n+1) because it
is not necessary to compute dâ•›−â•›dâ•›=â•›0. We’ve seen that each Gaussian operation
in a pivot requires n multiplications or divisions. Evidently:
• Each pivot entails m Gaussian operations, one per equation, for a total of mn multiplications and divisions per pivot.
In brief:
A pivot on a system of m equations in n unknowns entails roughly mn multiplications and divisions.
A minor complication has been glossed over: a basic system can have
more than one basic solution. To indicate how this can occur, consider system
(6), below. It differs from system (1) in that p equals −4/3 and in that it has a
fifth decision variable, x5 , whose coefficient in each equation equals that of
x2 .
The variables x2 and x5 are basic for equation (7.4). When x2 became
basic, x5 also became basic. Do you see why?
System (7) has two basic solutions. One basic solution corresponds to selecting x2 as the basic variable for equation (7.4), and it sets
(8) x1 = 1, x3 = −2/3, x2 = 1/3, x4 = 0, x5 = 0.
The other basic solution corresponds to selecting x5 as the basic variable for equation (7.4), and it sets
(9) x1 = 1, x3 = −2/3, x5 = 1/3, x2 = 0, x4 = 0.
If an equation has more than one basic variable, two or more variables
in the original system had identical columns of coefficients, and all of
them became basic for that equation.
The fact that identical columns stay identical is handy – in later chapters,
it will help us to understand the simplex method.
Consider any basic system. A set of variables is called a basis if this set
consists of one basic variable for each equation that has a basic variable. Sys-
tem (4) has one basis, which is the set {x1 , x3 , x2 } of variables. System (7) has
two bases. One of these bases is the set {x1 , x3 , x2 } of variables. The other
basis is {x1 , x3 , x5 }.
Again, consider any basic system. Each basis for it has a unique basic
solution, namely, the solution to the equation system in which each nonbasic
variable is equated to zero and each basic variable is equated to the right-
hand-side value of the equation for which it is basic. System (7) is basic. It has
two bases and two basic solutions; equation (8) gives the basic solution for the
basis {x1 , x3 , x2 }, and (9) gives the basic solution for the basis {x1 , x3 , x5 }.
The terms “basic variable,” “basis,” and “basic solution” suggest that a ba-
sis for a vector space lurks nearby. That vector space is identified later in this
chapter.
Pivoting by hand gets old fast. Excel can do the job flawlessly and pain-
lessly. This section tells how.
A detached-coefficient tableau
The spreadsheet in Table 3.1 will be used to solve system (1) for the case
in which p = −4/3. Rows 1 through 5 of Table 3.1 are a detached-coefficient tableau for system (1).
Table 3.1. Detached-coefficient tableau for system (1) and the first pivot.
Excel functions could be used to create rows 7-10. For instance, row 7
could be obtained by inserting in cell B7 the function =B2/$B2 and dragging
it across the row. Similarly, row 8 could be obtained by inserting in cell B8
the function =B3 − $B3*B$7 and dragging it across the row. But there is an
easier way.
An Add-In
• Select the array B7:F10. (This causes the result of the pivot to appear in cells B7 through F10.)
• Type =pivot(B2, B2:F5) and then hit Ctrl+Shift+Enter.
Table 3.2 reports the result of executing two more pivots with the same
array function.
• Select the block B12:F15 of cells, type =pivot(D8, B7:F10) and then hit
Ctrl+Shift+Enter
• Select the block B17:F20 of cells, type =pivot(C15, B12:F15) and then
hit Ctrl+Shift+Enter
Rows 17-20 report the result of these pivots. The data in rows 17-20 are
identical to those in system (4) with p = −4/3.
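For readers working outside Excel, the Python sketch below mimics the =pivot array function; the routine is ours, and the right-hand-side values shown for system (1) are ours as well, chosen to be consistent with the basic solution reported for system (4):

    def pivot(tableau, r, c):
        """Gauss-Jordan pivot on entry (r, c): divide row r by tableau[r][c],
        then subtract multiples of row r from the other rows so that
        column c becomes a unit vector."""
        t = [row[:] for row in tableau]          # work on a copy
        p = t[r][c]
        t[r] = [a / p for a in t[r]]
        for i in range(len(t)):
            if i != r:
                f = t[i][c]
                t[i] = [a - f * b for a, b in zip(t[i], t[r])]
        return t

    # System (1) with p = -4/3, as a detached-coefficient tableau [A | b]:
    t = [[ 2, 4, -1,  8,  4    ],
         [ 1, 2,  1,  1,  1    ],
         [ 0, 0,  2, -4, -4/3  ],
         [-1, 1, -1,  1,  0    ]]
    t = pivot(t, 0, 0)   # make x1 basic for the 1st equation
    t = pivot(t, 1, 2)   # make x3 basic for the 2nd equation
    t = pivot(t, 3, 1)   # make x2 basic for the 4th equation
    # The result reproduces system (4): x1 = 1, x3 = -2/3, x2 = 1/3, x4 = 0.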
Like the others, these exchange operations can be undone. To recover the
original equation system after doing an exchange operation, simply repeat it.
Exchanging rows 19 and 20 shifts the trite equation to the bottom. Then, ex-
changing columns C and D puts the basic variables on the diagonal.
In linear algebra, the two Gaussian operations that were introduced ear-
lier and the first of the above two exchange operations are known as elemen-
tary row operations. Most texts on linear algebra begin with a discussion
of elementary row operations and their properties. That’s because Gaussian
operations are fundamental to linear algebra.
Vectors
Similarly, the symbol ℝn denotes the set of all n-vectors, namely, the set that consists of each vector x = (x1, x2, . . . , xn) as x1 through xn vary, independently, over the set of all real numbers. This set ℝn of all n-vectors is known as n-dimensional space or, more succinctly, as n-space. The n-vector x = (0, 0, …, 0) is called the origin of ℝn.
Figure 3.1. The vectors x = (5, 1) and y = (−2, 3) and their sum x + y = (3, 4).
Vector addition
(10) x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn ).
The gray lines in Figure 3.1 indicate that, graphically, to take the sum of
the vectors (5, 1) and (−2, 3), we can shift the “tail” of either vector to the head
of the other, while preserving the “length” and “direction” of the vector that is
being shifted.
Scalar multiplication
What happens when the vector x in Figure 3.1 is multiplied by the scalar c = 0.75? Each entry in x is multiplied by 0.75. This reduces the length of the vector x without changing the direction in which it points.
What happens when the vector x is multiplied by the scalar c = −1? Each entry in x is multiplied by −1. This reverses the direction in which x points, but does not change its length.
Similarly, the difference x − y of the vectors x and y is given by
x − y = x + (−1)y = (x1 − y1, x2 − y2, . . . , xn − yn).
Figure 3.2. The vectors x = (5, 1) and y = (−2, 3) and their difference x − y = (7, −2).
A convex combination of the vectors x and y is any vector of the form
cx + (1 − c)y,
where c is a number that satisfies 0 ≤ c ≤ 1. The interval between x and y is the set of all convex combinations of x and y.
Figure 3.3. The thick gray line segment is the interval between x = (5, 1) and y = (−2, 3), marked at the points for which c = 0, 1/4, 1/2, 3/4 and 1.
Each convex combination of the vectors x and y that are depicted in Figure 3.3 can be written as
cx + (1 − c)y = y + c(x − y),
where c is a number that lies between 0 and 1, inclusive. Evidently, the interval between x and y consists of each vector y + c(x − y) obtained by adding y and the vector c(x − y) as c varies from 0 to 1. Figure 3.3 depicts y + c(x − y) for the values c = 0, 1/4, 1/2, 3/4 and 1.
By the way, if x and y are distinct n-vectors, the line that includes x and y is the set L that is given by
L = {y + c(x − y) : c is a real number}.
This line includes x (take c = 1) and y (take c = 0), it contains the interval
between x and y, and it extends without limit in both directions.
Convex sets
A set C of n-vectors is called convex if C contains the interval between every pair of vectors in C. Convex sets will play a key role in linear programs and in their generalizations. A vector x that is a member of a convex set C is said to be an extreme point of C if x is not a convex combination of two other vectors in C. Read-
ing from left to right, the four convex sets in Figure 3.4 have infinitely many
extreme points, three extreme points, no extreme points, and two extreme
points. Do you see why?
Linear constraints
Let us recall from Chapter 1 that each constraint in a linear program re-
quires a linear expression to bear one of three relationships to a number, these
three being “=”, “≤”, and “≥.” In other words, with a0 through an as fixed
numbers and x1 through xn as decision variables, each constraint takes one
of these forms:
a1x1 + a2x2 + · · · + anxn = a0
a1x1 + a2x2 + · · · + anxn ≤ a0
a1x1 + a2x2 + · · · + anxn ≥ a0
It’s easy to check that the set of n-vectors xâ•›=â•›(x1, x2, …, xn) that satisfy a
particular linear constraint is convex. As noted above, the intersection of con-
vex sets is convex. Hence, the set of vectors xâ•›=â•› (x1 , x2 , · · · , xn ) that satisfy
all of the constraints of a linear program is convex. It is emphasized:
The set of vectors that satisfy all of the constraints of a linear program
is convex.
Convex sets play a crucial role in linear programs and in nonlinear pro-
grams.
A set V of n-vectors is called a vector space if:
• V is not empty.
• V contains the vector cx for each vector x in V and each real number c.
• V contains the vector x + y for each pair x and y of vectors in V.
Each vector space V must contain the origin; that is so because V must
contain at least one vector x and because it must also contain the scalar 0
times x, which is the origin. Each vector space is a convex set. Not every con-
vex set is a vector space, however.
Geometric insight
It’s clear, visually, that the subsets V of 2 (the plane) that are vector
spaces come in these three varieties:
Linear combinations
A linear combination of the vectors v1, v2, . . . , vK is any vector of the form
(15) c1v1 + c2v2 + · · · + cKvK ,
where c1 through cK are numbers (scalars). These vectors are said to be linearly independent if the only way to write
(16) 0 = c1v1 + c2v2 + · · · + cKvK
is with c1 = c2 = · · · = cK = 0. The trivial way to write 0 as a linear combination of these vectors is to multiply each of them by the scalar 0 and then add them up.
A basis
A basis for a vector space V is a set of linearly independent vectors in V whose linear combinations include every vector in V.
Trouble?
A basis has just been defined as a set of vectors. Earlier, in our discussion
of Gauss-Jordan elimination, a basis had been defined as a set of decision
variables. That looks to be incongruous, but a correspondence will soon be
established.
In the prior section, the entries in the n-vector xâ•›=â•›(x1, x2, …, xn) could
have been arranged in a row or in a column. When doing matrix arithmetic,
it is necessary to distinguish between rows and columns.
Matrices
An m × n matrix A is an array of numbers having m rows and n columns; the number in row i and column j of A is denoted Aij. The jth column of A is denoted Aj, and the ith row of A is denoted Ai:
Aj = [A1j, A2j, . . . , Amj]T ,    Ai = [Ai1 Ai2 · · · Ain].
Matrix multiplication
Let E be an m × p matrix, and let F be a p × n matrix. Their product EF is the m × n matrix whose entry in row i and column j is given by
(18) (EF)ij = Ei1F1j + Ei2F2j + · · · + EipFpj = EiFj .
Thus, the ijth element of the matrix (EF) equals the product EiFj of the ith row of E and the jth column of F.
Similarly, the ith row (EF)i of EF and jth column (EF)j of EF are given by
(19) (EF)i = Ei F,
It is emphasized:
The ith row of the matrix product (EF) equals EiF, and the jth column of this matrix product equals EFj.
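Equations (19) and (20) are easy to spot-check numerically. A small Python sketch (our example matrices) using the numpy library:

    import numpy as np
    E = np.array([[1, 2], [3, 4]])            # 2 x 2
    F = np.array([[5, 6, 7], [8, 9, 10]])     # 2 x 3
    EF = E @ F
    print(np.array_equal(EF[1, :], E[1, :] @ F))   # (EF)_i = E_i F  -> True
    print(np.array_equal(EF[:, 2], E @ F[:, 2]))   # (EF)_j = E F_j  -> True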
Vectors
In this context, a vector is a matrix that has only one row or only one
column. Whenever possible, lower-case letters are used to represent vectors.
Displayed below are an n × 1 vector x and an m × 1 vector b:
x = [x1, x2, . . . , xn]T ,    b = [b1, b2, . . . , bm]T .
• The data in the equation Axâ•›=â•›b are the m × n matrix A and the m × 1
vector b.
• The decision variables (unknowns) in this equation are arrayed into the
n × 1 vector x.
In brief, the integer m is the number of rows in the matrix A, and the
integer n is the number of columns. Put another way, the matrix equation
Axâ•›=â•›b is a system of m equations in n unknowns.
Note in expression (21) that the number (scalar) x1 multiplies each entry
in A1 (the 1st column of A), that the scalar x2 multiplies each entry in A2 , and
so forth. In other words,
(22) Ax = A1 x1 + A2 x2 + · · · + An xn .
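Equation (22) is easy to confirm numerically. In the Python sketch below (our example data), the matrix-vector product and the linear combination of the columns of A agree:

    import numpy as np
    A = np.array([[2, 4, -1],
                  [1, 2,  1]])                 # a 2 x 3 matrix
    x = np.array([1.0, -2.0, 3.0])
    as_product = A @ x
    as_combination = A[:, 0]*x[0] + A[:, 1]*x[1] + A[:, 2]*x[2]
    print(np.allclose(as_product, as_combination))   # True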
You may recall that the “column space” of a matrix A is the set of all linear
combinations of the columns of A; we will get to that shortly.
Similarly, with y as a 1 × m vector, premultiplying (22) by y gives
(23) yAx = (yA1)x1 + (yA2)x2 + · · · + (yAn)xn ,
and the product yA itself satisfies
(24) yA = y1 A1 + y2 A2 + · · · + ym Am
In brief:
The matrix product Ax is a linear combination of the columns of A, and the matrix product yA is a linear combination of the rows of A.
An ambiguity
The set
(25) Vc = {Ax : x ∈ ℝn×1}
is called the column space of the matrix A. Equation (25) reads, "Vc equals
the set that contains Ax for every n × 1 vector x.” It is clear from equation
(22) that Vc is the set of all linear combinations of the columns of the matrix
A, moreover, that Vc is a vector space.
(26) Vr = {yA : y ∈ ℝ1×m},
is called the row space of A. Evidently, Vr is the set of all linear combinations
of the rows of the matrix A, and it too is a vector space.
Gauss-Jordan elimination can be used to construct a basis for the column space of the 4 × 4 matrix A that is given by

    (27)    A = [  2   4  −1   8 ]
                [  1   2   1   1 ]
                [  0   0   2  −4 ]
                [ −1   1  −1   1 ] .
Let us see how. With A given by (27) and with x as a 4 × 1 vector, equa-
tion (22) shows the matrix product A x is this linear combination of the col-
umns of A.
    (28)    Ax = [  2 ]      [  4 ]      [ −1 ]      [  8 ]
                 [  1 ] x1 + [  2 ] x2 + [  1 ] x3 + [  1 ] x4 .
                 [  0 ]      [  0 ]      [  2 ]      [ −4 ]
                 [ −1 ]      [  1 ]      [ −1 ]      [  1 ]
Please observe that (28) is identical to the left-hand side of system (1).
A homogeneous equation
With A given by (27), the homogeneous equation Ax = 0 can be written as

    (29)    [  2 ]      [  4 ]      [ −1 ]      [  8 ]      [ 0 ]
            [  1 ] x1 + [  2 ] x2 + [  1 ] x3 + [  1 ] x4 = [ 0 ] ,
            [  0 ]      [  0 ]      [  2 ]      [ −4 ]      [ 0 ]
            [ −1 ]      [  1 ]      [ −1 ]      [  1 ]      [ 0 ]
No new work is needed to identify the solutions to (29). To see why, re-
place the right-hand side values of system (1) by 0’s and repeat the Gaussian
operations that transformed system (1) into system (4), getting:
    (30)    [ 1 ]      [ 0 ]      [ 0 ]      [ 5/3 ]      [ 0 ]
            [ 0 ] x1 + [ 0 ] x2 + [ 1 ] x3 + [ −2  ] x4 = [ 0 ] .
            [ 0 ]      [ 0 ]      [ 0 ]      [  0  ]      [ 0 ]
            [ 0 ]      [ 1 ]      [ 0 ]      [ 2/3 ]      [ 0 ]
Applying the same sequence of Gaussian operations to the equation

    [  2 ]      [  4 ]      [ −1 ]      [  8 ]
    [  1 ] x1 + [  2 ] x2 + [  1 ] x3 = [  1 ]
    [  0 ]      [  0 ]      [  2 ]      [ −4 ]
    [ −1 ]      [  1 ]      [ −1 ]      [  1 ]

transforms it into

    [ 1 ]      [ 0 ]      [ 0 ]      [ 5/3 ]
    [ 0 ] x1 + [ 0 ] x2 + [ 1 ] x3 = [ −2  ] ,
    [ 0 ]      [ 0 ]      [ 0 ]      [  0  ]
    [ 0 ]      [ 1 ]      [ 0 ]      [ 2/3 ]

whose solution x1 = 5/3, x2 = 2/3 and x3 = −2 expresses A4 as a linear combination of the columns A1, A2 and A3.
Denote by C the set of indices of the columns on which pivots have occurred, and denote by {Aj : j ∈ C} the corresponding set of columns. This set {Aj : j ∈ C} of columns is a basis for the column space of A.
The analog of (30) indicates that the set {Aj : j ∈ C} of columns must
be linearly independent and that each of the remaining columns must be a
linear combination of these columns. Thus, the set {Aj : j ∈ C} of columns
spans the column space of A, which completes the proof.
Reconciliation
Earlier in this chapter, Gauss-Jordan elimination identified the set {x1, x3, x2} of variables as a basis for system (1). In the current section, the same Gauss-Jordan procedure has been used to identify the set {A1, A2, A3} of columns as a basis for the column space of A. These are two different ways of making the same statement. It is emphasized:
The statement that a set of variables is a basis for the equation system
Ax = b means that their columns of coefficients are a basis for the col-
umn space of A and that b lies in the column space of A.
When the variables in the equation system Axâ•›=â•›b are labeled x1 through
xn , a basis can also be described as a subset β of the first n integers. A subset β
of the first n integers is now said to be a basis if {Aj : j ∈ β} is a basis for the
column space of A . In brief, the same basis for the column space of the 4 × 4
matrix A in equation (27) is identified in these three ways:
• As the set {x1, x3, x2} of variables.
• As the set {A1, A2, A3} of columns of A.
• As the subset {1, 2, 3} of the first 4 integers.
A basis for the row space of a matrix A could be found by applying Gauss-
Jordan elimination to the equation AT x = 0, where AT denotes the trans-
pose of A. A second application of Gauss-Jordan elimination is not necessary,
however.
Three key results about vector spaces are stated and illustrated in this
subsection. These three results are highlighted below:
Three results:
•â•‡Every basis for a vector space contains the same number of ele-
ments, and that number is called the rank of the vector space.
•â•‡The row space and the column spaces of a matrix A have the same
rank.
All three of these results are important. Their proofs are postponed, how-
ever, to Chapter 10, which sets the stage for a deeper understanding of linear
programming.
To illustrate: the set {A1, A2, A3} of columns is a basis for the column space of the matrix A in (27), so the rank of that column space equals 3.
The rank of a vector space is also called its dimension; these terms are synonyms. "Dimension" jibes better with our intuition. In 3-space, every plane through the origin has 2 as its dimension (or rank), for instance.
15. Efficient Computation*
Pivots make the simplex method easy to understand, but they are rela-
tively inefficient. Gaussian elimination substitutes “lower pivots” for pivots. It
solves an equation system with roughly half the work. Or less.
Lower pivots
A lower pivot on the nonzero coefficient of a variable x in an equation (j) of a set S of equations consists of these steps:
• Remove equation (j) from S, setting it aside for later use in back-substitution.
• For each equation (k) that remains in S, replace equation (k) by itself less the multiple of equation (j) that equates the coefficient of x in equation (k) to zero.
A familiar example
Initially, before any lower pivots have occurred, the set S consists of equa-
tions (31.1) through (31.4).
The variable x1 does not appear in equations (32.2), (32.3) and (32.4).
These three equations are identical to equations (2.2), (2.3) and (2.4), as they must be.
Equation (31.1) has been set aside, temporarily. After equations (32.2)
through (32.4) have been solved for values of the variables x2 , x3 and x4 ,
equation (31.1) will be solved for the value of x1 that is prescribed by these
values of x2 , x3 and x4 .
The next lower pivot occurs on the coefficient of x3 in equation (32.2). It replaces (32.3) and (32.4) by equations (33.3) and (33.4).
The variable x3 has been eliminated from equations (33.3) and (33.4).
These two equations are identical to equations (3.3) and (3.4), exactly as was the case for the first lower pivot.
Equation (32.2), on which this pivot occurred, is set aside. After solving
equations (33.3) and (33.4) for values of the variables x2 and x4 , equation
(32.2) will be solved for the variable x3 on which the lower pivot has occurred.
The next lower pivot is slated to occur on equation (33.3). Again, there
are two cases to consider. If p is unequal to −4/3, equation (33.3) is inconsis-
tent, so no solution can exist to the original equation system. Alternatively, if
pâ•›=â•›−4/3, equation (33.3) is trite, and it has nothing to pivot upon.
Only equation (34.4) remains in S. The next step calls for a lower pivot on
equation (34.4). The variables x2 and x4 have nonzero coefficients in equa-
tion (34.4), so a lower pivot could occur on either of them. As before, we pivot
on the coefficient of x2 in this equation. But no equations remain in S after
equation (34.4) is removed. Hence, this lower pivot entails no arithmetic. As
concerns lower pivots, we are finished.
Back-substitution
Setting the nonbasic variable x4 equal to zero, the equations that were set aside become
2x1 + 4x2 − 1x3 = 4,
1.5x3 = −1,
3x2 = 1.
The first lower pivot eliminated x1 from the bottom two equations. The
second lower pivot eliminated x3 from the bottom equation. Thus, these
equations can be solved for the variables on which their lower pivots have
occurred by working from the bottom up. This process is aptly called back-
substitution. For our example, back-substitution first solves the bottom
equation for x2 , then solves the middle equation for x3 , and then solves the
top equation for x1 . This computation gives x2 = 1/3 and x3 = −2/3 and
x1 = 1, exactly as before.
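A compact Python sketch of lower pivots followed by back-substitution appears below; the routine is ours, and the right-hand-side values shown for system (31) are ours, chosen to equal those of system (1) with p = −4/3:

    def lower_pivot(rows, j, c):
        """Eliminate variable c from every row below row j; row j itself
        is left unscaled and is set aside for back-substitution."""
        for k in range(j + 1, len(rows)):
            f = rows[k][c] / rows[j][c]
            rows[k] = [a - f * b for a, b in zip(rows[k], rows[j])]

    rows = [[ 2, 4, -1,  8,  4    ],    # equation (31.1)
            [ 1, 2,  1,  1,  1    ],    # equation (31.2)
            [ 0, 0,  2, -4, -4/3  ],    # equation (31.3)
            [-1, 1, -1,  1,  0    ]]    # equation (31.4)
    lower_pivot(rows, 0, 0)   # eliminate x1 from the rows below
    lower_pivot(rows, 1, 2)   # eliminate x3 from the rows below
    # Row 2 is now trite; row 3 reads 3*x2 + 2*x4 = 1.

    x4 = 0.0                  # the nonbasic variable is equated to zero
    x2 = (rows[3][4] - rows[3][3]*x4) / rows[3][1]
    x3 = (rows[1][4] - rows[1][3]*x4 - rows[1][1]*x2) / rows[1][2]
    x1 = (rows[0][4] - rows[0][1]*x2 - rows[0][2]*x3 - rows[0][3]*x4) / rows[0][0]
    print(x1, x3, x2)         # 1.0, -2/3, 1/3, exactly as before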
An adroit sequence of pivots can reduce the rate at which fill-in occurs. A simple method for retarding fill-in counts the number of nonzero elements that might be created by each pivot and selects a pivot element that minimizes this number. This method works with full pivots, and it works a bit better with lower pivots, for which it is now described. Specifically:
• Keep track of the set R of rows on which lower pivots have not yet occurred and the set C of columns for which variables have not yet been made basic.
• While R is nonempty:
Among the pairs (j, k) with j ∈ R and k ∈ C for which the coefficient of xk in row j of the current tableau is nonzero, pick a pair that minimizes (rj − 1)(ck − 1), where rj is the number of nonzero coefficients in row j and ck is the number of nonzero coefficients in column k; a sketch of this selection rule appears below.
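Here is a Python sketch of that selection rule (the code is ours; the rule is the one just described):

    def choose_lower_pivot(rows, R, C):
        """Pick (j, k) with j in R, k in C and rows[j][k] != 0 that minimizes
        (r_j - 1)*(c_k - 1), where r_j and c_k count the nonzero entries in
        row j and in column k of the rows that remain in R."""
        r = {j: sum(1 for k in C if rows[j][k] != 0) for j in R}
        c = {k: sum(1 for j in R if rows[j][k] != 0) for k in C}
        eligible = [(j, k) for j in R for k in C if rows[j][k] != 0]
        return min(eligible, key=lambda jk: (r[jk[0]] - 1) * (c[jk[1]] - 1))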
To see what can go awry, consider a matrix (array) whose nonzero entries
are between 1 and 100, except for a few that are approximately 10^−6. Pivoting on one of these tiny entries multiplies everything in its row by 10^6 and shifts
the information in some of the other rows about 6 digits to the right. Doing
that once may be OK. Doing it two or three times can bury the information in the other rows. And that's without worrying about the round-off error in the pivot element. In brief:
Avoid pivoting on an element that is tiny in comparison with the other nonzero entries in its row and column.
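The hazard is easy to reproduce. The Python sketch below (ours) rounds every computed coefficient to three significant digits and solves the same 2 × 2 system twice, once pivoting on a tiny coefficient and once not:

    from math import floor, log10

    def sig3(x):
        """Round x to three significant digits."""
        return 0.0 if x == 0 else round(x, 2 - floor(log10(abs(x))))

    # The system: 0.00001*A + B = 10 and A - B = 0. Its true solution has
    # A = B = 10/1.00001, which is 10.0 to three significant digits.
    def solve(eq1, eq2, col):
        other = 1 - col
        p = [sig3(a / eq1[col]) for a in eq1]              # scale pivot row
        q = [sig3(a - eq2[col] * b) for a, b in zip(eq2, p)]
        x_other = sig3(q[2] / q[other])
        return sig3(p[2] - p[other] * x_other), x_other    # (x_col, x_other)

    eq1, eq2 = [0.00001, 1.0, 10.0], [1.0, -1.0, 0.0]
    print(solve(eq1, eq2, 0))   # pivot on the tiny coefficient: (0.0, 10.0)
    print(solve(eq1, eq2, 1))   # pivot on the coefficient of B: (10.0, 10.0)

The first pivot wipes out the information about A entirely; the second gets both variables right to three digits.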
16. Review

17. Homework and Discussion Problems

1. Use Gaussian operations to find a solution to the equation system:
1A − 1B + 2C = 10
−2A + 4B − 2C = 0
0.5A − 1B − 1C = 6
2. This problem concerns the equation system
2A + 3B − 1C = 12
−2A + 2B − 9C = 3
4A + 5B = 21
(a) Use Gauss-Jordan elimination to find a solution to this equation sys-
tem.
(b) Plot those solutions to this equation system in which each variable
is nonnegative. Complete this sentence: The solutions that have been
plotted form a ________________.
(c) What would have happened if one of the right-hand-side values had
been different from what it is? Why?
(b) Can you continue in a way that produces a basic solution? If so, do so.
6. The matrix A given by (27) consists of the coefficients of the decision vari-
ables in system (1). For this matrix A:
(c) Which subsets of {A1 , A2 , A3 , A4 } are a basis for the row space of A?
Why?
8. (a basis) This problem concerns the four vectors that are listed below.
Solve parts (a), (b) and (c) without doing any numerical computation.
    [  2 ]   [  4 ]   [ −1 ]   [  8 ]
    [  1 ]   [  2 ]   [  1 ]   [  1 ]
    [  0 ] , [  0 ] , [  2 ] , [ −4 ] .
    [ −1 ]   [  1 ]   [ −1 ]   [  1 ]
(a) Show that the left-most three of these vectors are linearly indepen-
dent.
(b) Show that the left-most three of these vectors span the other one.
(c) Show that the left-most three of these vectors are a basis for the vector
space that consists of all linear combinations of these four vectors.
(c) If a pivot makes x5 basic for some equation, then x12 ____________.
(b) Every (homogeneous) system Ax = 0 has at least one non-trivial solution, that is, one solution that has x ≠ 0.
(b) Prove or disprove: There exists a nonzero vector x such that Ax = 0.
12. True or false? Each subset V of ℝn that is a vector space has a basis. Hint: take care.
13. This problem concerns the matrix equation Ax = b. Describe the conditions on A and b under which this equation has:
(a) No solutions.
15. Prove that a set V of n-vectors that includes the origin is a vector space if
and only if V contains the vector [(1 − α)u + αv] for every pair u and v
of elements of V and for every real number α.
(b) For the case n = 2, describe three types of affine space, and guess the "dimension" of each.
17. Designate as X the set consisting of each vector x that satisfies the matrix equation Ax = b. Suppose X is not empty. Is X a vector space? Is X an affine
space? Support your answers.
18. Verify that equations (19) and (20) are correct. Hint: Equation (18) might
help.
19. (Small pivot elements) You are to solve the following system twice, each time by Gauss-Jordan elimination. Throughout each computation, you are to approximate each coefficient by three significant digits; this would round
the number 0.01236 to 0.0124, for instance.
0.001A + 1B = 10
1A − 1B = 0
(a) For the first execution, begin with a pivot on the coefficient of A in
the topmost equation.
(b) For the second execution, begin with a pivot on the coefficient of B in
the topmost equation.
Remark: The final two problems (below) refer to the starred section on
efficient computation.
20. (Work for lower pivots and back-substitution) Imagine that a system of m
equations in n unknowns is solved by lower pivots and back-substitution
and that no trite or inconsistent equations have been encountered.
(b) For each j < m, show that the j-th lower pivot requires (m + 1â•›−â•›j) (n)
multiplications and divisions.
(c) How many multiplications and divisions are needed to execute Gauss-
Jordan elimination with lower pivots and back-substitution? Hint: sum-
ming part (b) gives (2 + 3 + · · · + m)(n) = (n)(m)(m + 1)/2 − n.
Equation x1 x2 x3 x4 x5 RHS
(1) * * * * * *
(2) * * *
(3) * * *
(4) * * * *
(5) * * *
Part II – The Basics
This section introduces you to the simplex method and prepares you to
make intelligent use of the computer codes that implement it.
In this chapter, you will learn how to formulate linear programs for solu-
tion by Solver and by Premium Solver for Education. You will also learn how
to interpret the output that these software packages provide. A linear pro-
gram is seen to be the ideal environment in which to relate three important
economic concepts – shadow price, “relative” opportunity cost, and marginal
benefit. This chapter includes a “Perturbation Theorem” that can help you
to grapple with the fact that a linear program is a model, an approximation.
This chapter plays a “mop up” role. If care is not taken, the simplex meth-
od can pivot forever. In Chapter 6, you will see how to keep that from oc-
curring. The simplex method, as presented in Chapter 4, is initiated with a
feasible solution. In Chapter 6, you will see how to adapt the simplex method
to determine whether a linear program has a feasible solution and, if so, to
find one.
Chapter 4: The Simplex Method, Part 1
1. Preview
The simplex method is the principal tool for computing solutions to lin-
ear programs. Computer codes that execute the simplex method are widely
available, and they run on nearly every computer. You can solve linear pro-
grams without knowing how the simplex method works. Why should you
learn it? Three reasons are listed below:
• Understanding the simplex method helps you make good use of the
output that computer codes provide.
• The “feasible pivot” that lies at the heart of the simplex method is cen-
tral to constrained optimization, much as Gauss-Jordan elimination is
fundamental to linear algebra. In later chapters, feasible pivots will be
adapted to solve optimization problems that are far from linear.
The simplex method also has a surprise to offer. It actually solves a pair
of optimization problems, the one under attack and its “dual.” That fact may
seem esoteric, but it will be used in Chapter 14 to formulate competitive situ-
ations for solution by linear programming and its generalizations.
Problem A. Maximize {2x + 3y}, subject to the constraints
x ≤ 6,
x + y ≤ 7,
2y ≤ 9,
−x + 3y ≤ 9,
x ≥ 0,
y ≥ 0.
Feasible solutions
Figure 4.1. The feasible region of Problem A.
In Problem A and in general, the feasible region is the set of values of the
decision variables that satisfy all of the constraints of the linear program. In
Figure 4.1, the feasible region is shaded. Let us recall from Chapter 3 that the
feasible region of a linear program is a convex set because it contains the in-
terval (line segment) between each pair of points in it. A constraint in a linear
program is said to be redundant if its removal does not change the feasible
region. Figure 4.1 makes it clear that the constraint 2y ≤ 9 is redundant.
Iso-profit lines
Figure 4.1 omits any information about the objective function. Each fea-
sible solution assigns a value to the objective in the natural way; for instance,
feasible solution (5, 1) has objective value 2x + 3y = (2)(5) + (3)(1) = 13. An iso-profit line is a set of points that share the same objective value; for Problem A, each iso-profit line is a line on which 2x + 3y is constant, and these lines are parallel to each other. Notice in Figure 4.2 that the point (3, 4) has a profit of
18 and that no other feasible solution has a profit as large as 18. Thus, x = 3 and y = 4 is the unique optimal solution to Problem A, and 18 is its optimal value.
Figure 4.2. Feasible region for Problem A, with iso-profit lines and objective vector (2, 3).
The objective vector of Problem A is (2, 3); it is drawn by moving 2 units toward the right of the page and 3 units toward the top. In Figure 4.2, the objective vector is shown touching the iso-profit line 2x + 3y = 18.
The objective vector can have its tail “rooted” anywhere in the plane. In
Figure 4.2 and in general, the objective vector is perpendicular to the iso-
profit lines. It’s the direction in which the objective vector points that matters.
In a maximization problem, we seek a feasible solution that lies farthest in
the direction of the objective vector. Similarly, in a minimization problem, we
seek a feasible solution that lies farthest in the direction that is opposite to the
objective vector. It is emphasized:
In a maximization problem, the optimal solutions are the feasible solutions that lie farthest in the direction of the objective vector; in a minimization problem, they lie farthest in the opposite direction.
Extreme points
The feasible region in Figure 4.2 has five extreme points, one at each "corner" of the shaded region, namely (0, 0), (6, 0), (6, 1), (3, 4) and (0, 3).
Edges
An edge of the feasible region is a line segment on its boundary that connects two extreme points. Rotating the objective vector can shift the unique optimal solution to a different extreme point. Suppose, for instance, that the objective vector is
(3, 3). In this case, the objective vector has rotated clockwise, and extreme
points (3, 4) and (6, 1) are both optimal, as is each point in the edge con-
necting them. If the objective vector is (4, 3), the objective vector has rotated
farther clockwise, and the unique optimal solution is the extreme point (6, 1).
Two extreme points are said to be adjacent if the interval between them
is an edge. In Figure 4.2, extreme points (0, 0) and (0, 3) are adjacent. Extreme
points (0, 0) and (3, 4) are not adjacent.
Simplex pivots
When the simplex method is applied to Problem A, the first pivot will
occur from extreme point (0, 0) to extreme point (0, 3), and the second pivot
will occur to extreme point (3, 4), which will be identified as optimal.
A linear program is said to be feasible and bounded if it has at least one fea-
sible solution and if its objective cannot be improved without limit. Problem
A is feasible and bounded. It would not be bounded if the constraints x + y ≤ 7 and x ≤ 6 were removed. It is easy to convince oneself, visually, of the following:
A linear program can be feasible and bounded even if its feasible region is
unbounded. An example is: Minimize {x}, subject to x ≥ 0.
A canonical form?
The simplex method will be used to solve every linear program that has
been cast in Form 1. Can every linear program be cast in Form 1? Yes. To verify that this is so, observe that a minimization problem becomes a maximization problem when its objective is multiplied by −1, and that an inequality constraint becomes an equation when a nonnegative slack or surplus variable is inserted into it.
A canonical form for linear programs is any format into which every linear program can be cast. Form 1 is a canonical form. Since Form 1 is canonical,
describing the simplex method for Form 1 shows how to solve every linear
program. It goes without saying, perhaps, that it would be foolish to describe
the simplex method for linear programs that have not been cast in a canoni-
cal form.
Recasting Problem A
The first step in recasting Problem A is to append the equation
2x + 3y = z,
which equates z to the value of the objective function. Problem A has four
“ ≤ ” constraints, other than those on its decision variables. Each of these in-
equality constraints is converted into an equation by inserting a slack vari-
able on its left-hand side. This re-writes Problem A as
(1.0) 2x + 3y − z = 0,
(1.1) 1x + s1 = 6,
(1.2) 1x + 1y + s2 = 7,
(1.3) 2y + s3 = 9,
(1.4) −1x + 3y + s4 = 9,
x ≥ 0, y ≥ 0, si ≥ 0 for i = 1, 2, 3, 4.
To see where the “slack variables” get their name, consider the constraint
x + y ≤ 7. In the constraint x + y + s2 = 7, the variable s2 is positive if x + y < 7 and s2 is zero if x + y = 7. Evidently, s2 "takes up the slack" in the constraint x + y ≤ 7.
The variable –z
In Form 1, the variable z plays a special role because it measures the ob-
jective. We elect to think of –z as a decision variable. In Problem A’, the vari-
able –z is basic for equation (1.0) because –z has a coefficient of +1 in equa-
tion (1.0) and has coefficients of 0 in all other equations. During the entire
course of the simplex method, no pivot will ever occur on any coefficient in
the equation for which –z is basic. Consequently, –z will stay basic for this
equation.
Reduced cost
The equation for which –z is basic plays a guiding role in the simplex
method, and its coefficients have been given names. The coefficient of each
variable in this equation is known as that variable’s reduced cost. In equation
(1.0), the reduced cost of x equals 2, the reduced cost of y equals 3, and the
reduced cost of each slack variable equals 0. The term “reduced cost” is firmly
established in the literature, and we will use it. But it will soon be clear that
“marginal profit” would have been more descriptive.
Problem A’ has seven decision variables. It might seem that the feasible
region for Problem A’ can only be “visualized” in seven-dimensional space.
Figure 4.3 shows that a 2-dimensional picture will do. In Figure 4.3, each line
in Figure 4.1 has been labeled with the variable in Problem A’ that equals zero
on it. For instance, the line on which the inequality x + y ≤ 7 holds as an equation is relabeled s2 = 0 because s2 is the slack variable for the constraint x + y + s2 = 7.
Figure 4.3. The feasible region of Problem A, with each line labeled by the variable of Problem A' that equals zero on it.
Figure 4.3 also enables us to identify the extreme points with basic solu-
tions to system (1). Note that each extreme point in Figure 4.3 lies at the inter-
section of two lines. For instance, the extreme point (0, 3) is the intersection
of the lines x = 0 and s4 = 0. The extreme point (0, 3) will soon be associated with the basis that excludes the variables x and s4.
System (1) has five equations and seven variables. The variables –z and
s1 through s4 form a basis for system (1). This basis consists of five variables,
one per equation. A fundamental result in linear algebra (see Proposition 10.2
on page 334 for a proof) is that every basis for a system of linear equations
has the same number of variables. Thus, each basis for system (1) contains
exactly five variables, one per equation. In other words, each basis excludes
two of the seven decision variables. Each basis for system (1) has a basic solu-
tion, and that basic solution equates its two nonbasic variables to zero. This
identifies each extreme point in Figure 4.3 with a basis. Extreme point (0, 3)
corresponds to the basis that excludes x and s4 because (0, 3) is the intersec-
tion of the lines x = 0 and s4 = 0. Similarly, extreme point (3, 4) corresponds to the basis that excludes s2 and s4 because (3, 4) is the intersection of the lines s2 = 0 and s4 = 0.
Problem A’ will now be used to introduce the simplex method, and Fig-
ure 4.3 will be used to track its progress. System (1) is basic because each of its
equations has a basic variable. The basis for system (1) consists of –z and the
slack variables. This basis excludes x and y. Its basic solution equates to zero
its nonbasic variables (which are x and y) and is
x = 0, y = 0, −z = 0, s1 = 6, s2 = 7, s3 = 9, s4 = 9.
A feasible basis
A basis for Form 1 is now said to be feasible if its basic solution is feasible,
that is, if the values of the basic variables are nonnegative, with the possible
exception of –z. Evidently, the basis {–z, s1, s2, s3, s4} is feasible.
Phases I and II
For Problem A, a feasible basis sprang immediately into view. That is not
typical. Casting a linear program in Form 1 does not automatically produce
a basis, let alone a feasible basis. Normally, a feasible basis must be wrung
out of the linear program by a procedure that is known as Phase I of the
simplex method. Using Problem A to introduce the simplex method begins
with “Phase II” of the simplex method. Phase I has been deferred to Chapter
6 because it turns out to be a minor adaptation of Phase II.
Phase II of the simplex method begins with a feasible basis and with –z
basic for one of its equations. Phase II executes a series of pivots. None of
these pivots occurs on any coefficient in the equation for which –z is basic.
Each of these pivots:
• keeps –z basic;
• keeps the basic solution feasible.
Phase II stops pivoting when it discerns that the basic solution's objective
value cannot be improved. How this occurs will soon be explained.
A simplex tableau
The dictionary
• Shift the non-basic variables x and y to the right-hand sides of the con-
straints.
• Multiply equation (1.0) by –1, so that z (and not –z) appears on its left-
hand side.
Writing system (1) in the format of a dictionary produces system (2), below.
(2.0) z = 0 + 2x + 3y
(2.1) s1 = 6 − 1x + 0y
(2.2) s2 = 7 − 1x − 1y,
(2.3) s3 = 9 − 0x − 2y
(2.4) s4 = 9 + 1x − 3y
The term "dictionary" is due to Vašek Chvátal, who used it in his lovely book, Linear Programming, published in 1983 by W. H. Freeman and Co., New York. In that book, Chvátal attributes the term to J. E. Strum's Introduction to Linear Programming, published in 1972 by Holden-Day, San Francisco.
In system (2), the variable z (rather than –z) is basic for the topmost equa-
tion, and the slack variables are basic for the remaining equations. The basic
solution to system (2) equates each non-basic variable to zero and, conse-
quently, equates each basic variable to the number on the right-hand-side
value of the equation for which it is basic.
The reduced cost of each nonbasic variable equals the change in z that results if the basic solution is perturbed by setting that nonbasic variable equal to 1 and adjusting the values of the basic variables accordingly.
A pivot
Our goal is to pivot in a way that improves the basic solution’s objec-
tive value. Each pivot on a simplex tableau causes one variable that had been
nonbasic to become basic and causes one basic variable to become nonbasic.
Equation (2.0) shows that the objective function improves if the basic solu-
tion is perturbed by setting x positive or by setting y positive. We could pivot
in a way that makes x basic or in a way that makes y basic.
Let us perturb the basic solution by keeping x = 0 and setting y positive. The equations of system (2) become
(3.0) z = 0 + 3y,
(3.1) s1 = 6,
(3.2) s2 = 7 − 1y,
(3.3) s3 = 9 − 2y,
(3.4) s4 = 9 − 3y.
Evidently, the largest value of y that keeps the perturbed solution feasible
is y = 3. If y exceeds 3, the perturbed solution has s4 < 0.
Graphical interpretation
Figure 4.3 is now used to interpret the ratios in system (3). The initial ba-
sis excludes x and y, and so the initial basic solution lies at the intersection of
the lines x = 0 and y = 0, which is the point (0, 0). The perturbation in system (3) keeps x = 0 and allows y to become positive, thereby moving upward on the line x = 0. Each "ratio" in system (3) is a value of y for which (0, y) inter-
sects a constraint. No ratio is computed for constraint (3.1) because the lines
(0, y) and s1 = 0 do not intersect. The smallest ratio is the largest value of y
for which the perturbed solution stays feasible.
Feasible pivots
Let us generalize what has just occurred. In a basic feasible tableau, consider a pivot on a coefficient of an entering variable:
• Excluding the equation for which –z is basic, each equation whose coefficient of the entering variable is positive has a ratio that equals this equation's right-hand-side value divided by its coefficient of the entering variable.
• A feasible pivot occurs on the coefficient of the entering variable in an equation whose ratio is smallest.
System (1) is now used to illustrate feasible pivots. In this system, let y
be the entering variable. No ratio is computed for equation (1.0) because
–z stays basic for that equation. No ratio is computed for equation (1.1)
because the coefficient of y in this equation is not positive. Ratios are com-
puted for equations (1.2), (1.3) and (1.4), and these ratios equal 7, 4.5 and
3, respectively. The pivot occurs on the coefficient of y in equation (1.4) be-
cause that equation’s ratio is smallest. Note that this pivot results in a basic
tableau for which y becomes basic and the variable s4 that had been basic
for equation (1.4) becomes nonbasic. Equation (3.4) with s4 = 0 shows that
y = 3, hence that this pivot keeps the basic solution feasible. In this case and in general:
A pivot on the coefficient of the entering variable in an equation whose ratio is smallest keeps the basic solution feasible.
With x (and not y) as the entering variable in system (1), ratios would be computed from equations (1.1) and (1.2), these ratios would equal 6/1 = 6 and 7/1 = 7, respectively, and a feasible pivot would occur on the coefficient of x in equation (1.1). This pivot causes s1 to leave the basis, resulting in a basic tableau whose basic solution has x = 6 and remains feasible. By the way, the coefficient of x in equation (1.4) equals –1, which is negative, and a pivot on this coefficient would produce a basic solution having x = 9/(–1) = –9, which
would not be feasible.
A simplex pivot
A simplex pivot is a feasible pivot in which the entering variable has a positive reduced cost. Is the simplex pivot unambiguous? No, it is not. More than one nonbasic
variable can have marginal profit that is positive. Also, two or more rows can
tie for the smallest ratio.
Rule #1
Rule #1 selects as the entering variable a nonbasic variable whose reduced cost is most positive, and it selects as the pivot row an equation whose ratio is smallest.
To execute this pivot, select the block B12:I16, type the function
=pivot(C7, B3:I7) and then hit Ctrl+Shift+Enter to remind Excel that this
is an array function (because it sets values in an array of cells, rather than in
a single cell).
The pivot in Table 4.1 causes y to enter the basis and s4 to depart. The
basic solution that results from this pivot remains feasible because it equates
each basic variable other than –z to a nonnegative value.
This pivot improves z by 9, which equals the product of the reduced cost
(marginal profit) of y and the ratio for its pivot row. This reflects a property
that holds in general and is highlighted below:
In each feasible pivot, the change in the basic solution’s objective value
equals the product of the reduced cost of the entering variable and the
ratio for its pivot row.
(4) (change in the basic solution's objective value) = (reduced cost of the entering variable) × (ratio for its pivot row).
In Problem A, each pivot will improve the basic solution’s objective value.
That does not always occur, however. The RHS value of the pivot row can
equal 0. If it does equal 0, equation (4) shows that no change occurs in the
basic solution’s objective value. That situation is known as “degeneracy,” and
it is discussed in the next section.
Let us resume the simplex method. For the tableau in rows 12-16 of Table
4.1, x is the only nonbasic variable whose marginal profit is positive; its re-
duced cost equals 3. So x will be the entering variable for the next simplex
pivot. The spreadsheet in Table 4.2 identifies that 3 is the smallest ratio and
displays the tableau that results from a pivot on the coefficient of x in this
row. Equation (4) shows that this pivot will improve the basic solution’s objec-
tive value by 9 = 3 × 3. This pivot causes x to become basic and causes s2
(which had been basic for the pivot row) to become nonbasic. Rows 21-25 of
Table 4.2 exhibit the result of this pivot.
The basic solution to the tableau in rows 21-25 of Table 4.2 has x = 3, y = 4 and z = 18. The nonbasic variables in this tableau are s2 and s4. In Fig-
ure 4.3, this basic solution lies at the intersection of the lines s2 = 0 and
s4 = 0. Visually, it is optimal.
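The entire computation is small enough to script. The Python sketch below (ours) applies Phase II to system (1), entering any nonbasic variable whose reduced cost is positive; its choices of entering variable may differ from the text's, but it stops at the same optimal value:

    def pivot(t, r, c):
        t = [row[:] for row in t]
        t[r] = [a / t[r][c] for a in t[r]]
        for i in range(len(t)):
            if i != r:
                f = t[i][c]
                t[i] = [a - f * b for a, b in zip(t[i], t[r])]
        return t

    def entering(t):
        """A column whose reduced cost (top-row coefficient) is positive."""
        return next((j for j, a in enumerate(t[0][:-1]) if a > 1e-9), None)

    def leaving(t, c):
        """The row with the smallest ratio RHS/coefficient, coefficient > 0."""
        rows = [i for i in range(1, len(t)) if t[i][c] > 1e-9]
        return min(rows, key=lambda i: t[i][-1] / t[i][c])

    # Columns x, y, s1..s4; the -z column is suppressed because -z stays
    # basic for the top row.  The last column holds the RHS values.
    t = [[ 2, 3, 0, 0, 0, 0, 0],    # (1.0)
         [ 1, 0, 1, 0, 0, 0, 6],    # (1.1)
         [ 1, 1, 0, 1, 0, 0, 7],    # (1.2)
         [ 0, 2, 0, 0, 1, 0, 9],    # (1.3)
         [-1, 3, 0, 0, 0, 1, 9]]    # (1.4)
    c = entering(t)
    while c is not None:
        t = pivot(t, leaving(t, c), c)
        c = entering(t)
    print(-t[0][-1])    # prints 18.0, the optimal value of Problem A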
An optimality condition
In system (5), the variables s2 and s4 are nonbasic. The basic solution
to system (5) is the unique solution to system (5) in which the nonbasic vari-
ables s2 and s4 are equated to zero. This basic solution has z = 18. Since the coefficients of s2 and s4 in equation (5.0) are negative, any solution that sets either s2 or s4 to a positive value has z < 18. In brief, the basic solution to system (5) is the unique optimal solution to Problem A'.
Test for optimality. The basic solution to a basic feasible system for Form 1 is optimal if the reduced costs of the nonbasic variables are nonpositive.
Recap
• The reduced cost of each nonbasic variable equals the change that oc-
curs in the objective value if the basic solution is perturbed by setting
that nonbasic variable equal to 1.
• If an equation has a ratio, this ratio equals the value of the entering
variable for which the perturbed solution reduces the equation’s basic
variable to zero.
• The smallest of these ratios equals the largest value of the entering vari-
able that keeps the perturbed solution feasible.
5. Degeneracy
In a feasible pivot, the RHS value of the pivot row must be nonnegative. A feasible pivot is said to be nondegenerate if the right-hand-side value of the pivot row is positive, and it is said to be degenerate if that value equals 0.
Nondegenerate pivots
Equation (4) holds for every pivot that occurs on a basic tableau. If a pivot
is nondegenerate:
• The coefficient of the entering variable in the pivot row must be posi-
tive, so the ratio for the pivot row must be positive.
• Hence, equation (4) shows that each nondegenerate simplex pivot im-
proves the basic solution’s objective value.
It is emphasized:
Each nondegenerate simplex pivot improves the basic solution's objective value.
Degenerate pivots
Let us now interpret equation (4) for the case of a feasible pivot that is
degenerate. In this case:
• This pivot (like any other) multiplies the pivot row by a constant, and
it replaces the other rows by themselves less constants times the pivot
row. Since the pivot is degenerate, the RHS value of the pivot row equals
0, so the pivot changes no RHS values.
• The variables that had been basic for rows other than the pivot row
remain basic for those rows; their values in the basic solution remain as
they were because the RHS values do not change.
• The variable that departs from the basis had equaled zero, and the vari-
able that enters the basis will equal zero.
In brief:
A degenerate pivot changes the basis, but it does not change the basic solution.
Cycling
On the other hand, each degenerate pivot changes the basis without
changing the basic solution. The simplex method is said to cycle if a sequence
of simplex pivots leads to a basis visited previously. If a cycle occurs, it must
consist exclusively of degenerate pivots.
Problem B. Maximize {0x + 3y}, subject to the constraints
−x + y ≤ 2,
x ≥ 0, y ≥ 0.
Please sketch the feasible region of Problem B. Note that its constraints
are satisfied by each pair (x, y) having y ≥ 2 and x = y − 2; moreover, that each such pair has objective value of 0x + 3y = 3y, which becomes arbitrarily large
as y increases. To see what happens when the simplex method is applied to
Problem B, we first place it in Form 1, as
(6.0) 0x + 3y − z = 0,
(6.1) −x + y + s1 = 2,
x ≥ 0, y ≥ 0, s1 ≥ 0.
Table 4.3 shows what happens when the simplex method is applied to
Problem B’. The first simplex pivot occurs on the coefficient of y in equa-
tion (6.1), producing a basic feasible tableau whose basis excludes x and s1. Its basic solution is
x = s1 = 0, y = 2, −z = −6.
(7.0) z = 6 + 3x − 3s1 ,
(7.1) y = 2 + 1x − 1s1 .
Shadow prices are present not just for the final basis, but at every step
along the way. They guide the simplex method. In Chapter 11, we will see
how they do that.
It will be demonstrated in Proposition 10.2 (on page 334) that every basis
for the column space of a matrix has the same number of columns. Thus,
every basic tableau for a linear program has the same number (possibly zero)
of trite rows. A linear program is said to satisfy the Full Rank proviso if any
basic tableau for its Form 1 representation has a basic variable for each row.
Proposition 10.2 implies that the Full Rank proviso is satisfied if and only if
every basic tableau has one basic variable for each row.
System (1) has a basic variable for each row, so Problem A satisfies the
Full Rank proviso. If a linear program satisfies the Full Rank proviso, its equa-
tions must be consistent, and no basic tableau has a trite row.
A definition
For linear programs that satisfy the Full Rank proviso, each basis pre-
scribes a set of shadow prices, one per constraint. Their definition is high-
lighted below.
The shadow price that a basis assigns to a constraint is the amount by which the basic solution's objective value changes per unit increase in that constraint's right-hand-side value.
Evidently, each shadow price is a rate of change of the objective value with
respect to the constraint’s right-hand-side (RHS) value. (In math-speak, each
shadow price is a partial derivative.)
For the final basis, whose basic solution is in rows 22-25 of Table 4.2, the
shadow price for the 2nd constraint will now be computed. That constraint’s
RHS value in the original linear program equals 7. Let us ask ourselves:
What would happen to this basic solution if the RHS value of the 2nd con-
straint were changed from 7 to 7 + δ? Table 4.4, below, will help us to answer
this question. Table 4.4 differs from the initial tableau (rows 2-7 of Table 4.1)
in that the dashed line records the locations of the “=” signs and in that the
variable δ appears on the right-hand-side of each equation with a coefficient
of 1 in the 2nd constraint and with coefficients of 0 in the other constraints. Ef-
fectively, the RHS value of the 2nd constraint has been changed from 7 to 7 + δ.
Table 4.4. The initial tableau, with the variable δ inserted on the right-hand side of the 2nd constraint.
Table 4.5. The current tableau after the same two pivots.

x    y    s1    s2      s3    s4      –z    RHS    δ
0    0    0     –9/4    0     –1/4    1     –18    –9/4
0    0    1     –3/4    0     1/4     0     3      –3/4
1    0    0     3/4     0     –1/4    0     3      3/4
0    0    0     –1/2    1     –1/2    0     1      –1/2
0    1    0     1/4     0     1/4     0     4      1/4
Casting the basic solution to Table 4.5 in the format of a dictionary pro-
duces system (8), below. Equation (8.0) shows that the rate of change of the
objective value with respect to the RHS value of the 2nd constraint equals 9/4.
Thus, the shadow price of the 2nd constraint equals 9/4 or 2.25.
(8.0) z = 18 + (9/4)δ
(8.1) s1 = 3 − (3/4)δ
(8.2) x = 3 + (3/4)δ
(8.3) s3 = 1 − (1/2)δ
(8.4) y = 4 + (1/4)δ
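The shadow price can be corroborated by re-solving Problem A with a perturbed RHS value. The sketch below (ours) uses scipy's linprog routine, which minimizes, so the objective is negated:

    from scipy.optimize import linprog

    def optimal_value(rhs2):
        res = linprog(c=[-2, -3],                       # maximize 2x + 3y
                      A_ub=[[1, 0], [1, 1], [0, 2], [-1, 3]],
                      b_ub=[6, rhs2, 9, 9],
                      bounds=[(0, None), (0, None)],
                      method="highs")
        return -res.fun

    delta = 0.01
    print((optimal_value(7 + delta) - optimal_value(7)) / delta)   # about 2.25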
System (8) prescribes the values of the basic variables in terms of the
change δ in the right-hand side of the 2nd constraint of Problem A. The range
of a shadow price is the interval in its RHS value for which the basic solution
remains feasible. It’s clear from equations (8.1) through (8.4) that the basic
variables stay nonnegative for the values of δ that satisfy the inequalities
s1 = 3 − (3/4)δ ≥ 0,
x = 3 + (3/4)δ ≥ 0,
s3 = 1 − (1/2)δ ≥ 0,
y = 4 + (1/4)δ ≥ 0.
These four inequalities are satisfied if and only if
−4 ≤ δ ≤ 2.
The largest value of δ for which the perturbed basic solution remains
feasible is called the allowable increase. The negative of the smallest value of
δ for which the perturbed basic solution remains feasible is called the allow-
able decrease. In this case, the allowable increase equals 2 and the allowable
decrease equals 4.
A break-even price
Evidently, if the RHS value of the 2nd constraint can be increased at a per-
unit cost p below 2.25 (which equals 9/4), it is profitable to increase it by as
many as 2 units, perhaps more. Similarly, if the RHS value of the 2nd constraint
can be decreased at a per-unit revenue p above 2.25, it is profitable to decrease
it by as many as 4 units, perhaps more.
Economic insight
It’s often the case that the RHS values of a linear program represent levels
of resources that can be adjusted upward or downward. When this occurs, the
shadow prices give the break-even value of small changes in resource levels –
they suggest where it is profitable to invest, and where it is profitable to divest.
The term, shadow price, reflects the fact that these break-even prices are
endogenous (determined within the model), rather than by external market
forces.
In any basic tableau, the shadow price of each “≤” constraint equals (−1)
times the reduced cost of that constraint’s slack variable.
In Table 4.5, for instance, the shadow prices for the four constraints are 0,
9/4, 0 and 1/4, respectively.
Except for a factor of (–1), the same property holds for each “≥” con-
straint.
In any basic tableau, the shadow price of each “≥” constraint equals the
reduced cost of that constraint’s surplus variable.
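As a small illustration of the two boxed rules, the fragment below (a sketch, not part of the text's software) reads the shadow prices of Problem A's four "≤" constraints off the top row of Table 4.5, whose entries in the columns of s1 through s4 are 0, –9/4, 0 and –1/4.

from fractions import Fraction as F

# top-row (reduced-cost) entries of the slack variables s1-s4 in Table 4.5
reduced_cost_of_slack = [F(0), F(-9, 4), F(0), F(-1, 4)]

# for a "<=" constraint, the shadow price is (-1) times its slack's reduced cost
shadow_prices = [-rc for rc in reduced_cost_of_slack]
print(shadow_prices)   # 0, 9/4, 0 and 1/4, as reported in the text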
Graphical illustration
The optimal solution to Problem A lies where its two binding constraints
intersect. When the RHS value of the 2nd constraint is changed from 7 to
7 + δ, that intersection solves the system

(9) −x + 3y = 9 and x + y = 7 + δ.

Solving system (9) gives x = 3 + (3/4)δ and y = 4 + (1/4)δ, in agreement
with equations (8.2) and (8.4).
This reconfirms that the shadow price of the 2nd constraint equals 9/4.
[Figure: a graphical illustration of the shadow price. The feasible region of Problem A is drawn in the (x, y)-plane, together with the objective vector and the constraint lines, among them x + y ≤ 7 and −x + 3y ≤ 9; perturbing the 2nd constraint to x + y ≤ 7 + δ slides the optimal vertex along the line −x + 3y = 9.]
For Problem A, consider the effect of adding δ2 units to the RHS of the
2nd constraint and adding δ4 units to the RHS of the 4th constraint. Let us
ask ourselves: What effect would this have on the basic solution for the basis
in Table 4.5? Inserting δ4 on the RHS of Table 4.4 with a coefficient of +1
in the 4th constraint and repeating the above argument (the variables s4 and
δ4 have identical columns of coefficients) indicates that the basic solution
becomes
(12)    s1 = 3 − (3/4)δ2 + (1/4)δ4 ≥ 0,
        x  = 3 + (3/4)δ2 − (1/4)δ4 ≥ 0,
        s3 = 1 − (1/2)δ2 − (1/2)δ4 ≥ 0,
        y  = 4 + (1/4)δ2 + (1/4)δ4 ≥ 0.
In Chapter 3, it was noted that the set of ordered pairs (δ2, δ4) that sat-
isfy a particular linear inequality, such as any one of the above, is a convex set. It was also
observed that the intersection of convex sets is convex. In particular, the set
S of pairs (δ2, δ4) for which the basic solution remains feasible (nonnegative)
is convex. In brief: the set of RHS perturbations for which a given basis remains feasible is convex.
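To make this convexity concrete, here is a short Python check (illustrative only) of whether a pair (δ2, δ4) keeps the basic solution feasible; its four inequalities are those of system (12).

def stays_feasible(d2, d4):
    return (3 - 0.75 * d2 + 0.25 * d4 >= 0 and   # s1
            3 + 0.75 * d2 - 0.25 * d4 >= 0 and   # x
            1 - 0.50 * d2 - 0.50 * d4 >= 0 and   # s3
            4 + 0.25 * d2 + 0.25 * d4 >= 0)      # y

print(stays_feasible(0, 0))   # True:  no perturbation
print(stays_feasible(2, 0))   # True:  delta2 = 2 is the allowable increase
print(stays_feasible(3, 0))   # False: beyond the allowable increase
print(stays_feasible(1, 1))   # True:  S contains the segments between its members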
Note also that perturbing the RHS values of the original tableau af-
fects only the RHS values of the current tableau. It has no effect on the
coefficients of the decision variables in any of the equations. In particular,
these perturbations have no effect on the reduced costs (top-row coef-
ficients). If the reduced costs satisfy the optimality conditions before the
perturbation occurs, they continue to satisfy them after the perturbation occurs.
It is emphasized:
In Chapter 5, we will see that the shadow prices are central to a key idea in
economics, namely, the “opportunity cost” of doing something new. In Chap-
ter 12, the shadow prices will emerge as the decision variables in a “dual”
linear program.
Every computer code that implements the simplex method finds and re-
ports a basic solution that is optimal. Most of these codes also report a shadow
price for each constraint, along with an allowable increase and an allowable
decrease for each RHS value.
If the Full Rank proviso is violated, not all of the constraints can have
shadow prices. These computer codes report them anyhow! What these codes
are actually reporting are values of the basis’s “multipliers” (short for Lagrange
multipliers). In Chapter 11, it will be shown that these “multipliers” coincide
with the shadow prices when they exist and, even if the shadow prices do not
exist, the multipliers account correctly for the marginal benefit of perturbing
the RHS values in any way that keeps the linear program feasible.
8. Review
Listed below are the most important of the properties of the simplex
method.
• The simplex method pivots from one basic feasible tableau to another.
• In each basic feasible tableau, the reduced cost of each nonbasic vari-
able equals the amount by which the basic solution’s objective value
changes if that nonbasic variable is set equal to 1 and if the values of
the basic variables are adjusted to preserve a solution to the equation
system.
• If the RHS value of the pivot row is positive, the pivot is nondegenerate.
Each nondegenerate simplex pivot improves the basic solution’s objec-
tive value.
• If the RHS value of the pivot row is zero, the pivot is degenerate. Each
degenerate pivot changes the basis, but causes no change in the basic
solution or in its objective value.
• A linear program satisfies the Full Rank proviso if every basis has as many
basic variables as there are constraints in the linear program's Form 1
representation.
• If the Full Rank proviso is satisfied, each basic feasible tableau has these
properties:
– The shadow price of each constraint equals the rate of change of the
basic solution's objective value with respect to the constraint's RHS
value.
– If only a single RHS value is changed, the shadow price applies to
increases as large as the allowable increase and to decreases as large
as the allowable decrease.
• Rule #1 can cause the simplex method to cycle, and the ambiguity in
Rule #1 can be resolved in a way that precludes cycling, thereby guar-
anteeing finite termination.
• If the Full Rank proviso is violated, each basis still has “multipliers” that
correctly account for the marginal value of any perturbation of the RHS
values that keeps the linear program feasible.
Not a word has appeared in this chapter about the speed of the simplex
method. For an algorithm to be useful, it must be fast. The simplex method
is blazingly fast on nearly every practical problem. But examples have been
discovered on which it is horrendously slow. Why that is so has remained a
bit of a mystery for over a half century. Chapter 6 touches lightly on the speed
of the simplex method.
2. Rule #1 picks the most positive entering variable for a simplex pivot on a
maximization problem. State a simplex pivot rule that makes the largest
possible improvement in the basic solution’s objective value. Use Problem
A to illustrate this rule.
(a) The coefficient of y in equation (1.1) equals zero. How is this fact re-
flected in Figure 4.3? Does a similar interpretation apply to the coef-
ficient of x in equation (1.3)?
(b) With x as the entering variable, no ratio was computed for equa-
tion (1.4). If this ratio had been computed, it would have equaled
9/(–1) = –9. Use Figure 4.3 to interpret this number.
(c) True or false: Problem A’ has a feasible basis whose nonbasic variables
are x and s4 .
6. (graphical interpretation) True or false: for Problem A’, every set that in-
cludes –z and all but two of the variables x, y and s1 through s4 is a basis.
–x + y ≤ 1,
 x + y ≤ 4,
 x – y ≤ 2,
 x ≥ 0, y ≥ 0.
(a) Solve this linear program by executing simplex pivots on a spread-
sheet.
(b) Solve this linear program graphically, and use your graph to trace the
progress of the simplex method.
 A – B ≤ 1,
–A + B ≤ 1.
(a) Plot this linear program’s feasible region. Does it have any extreme
points?
(b) Does this linear program have an optimal solution? If so, name one.
(c) Apply the simplex method to this linear program. What happens?
 x      ≤ 4,
–x + y  ≤ 2,
2x + 3y ≤ 12,
 x ≥ 0, y ≥ 0.
(b) Execute a feasible pivot that finds a second optimal solution to this
linear program.
(c) Solve this linear program graphically, and use your graph to trace the
progress of the simplex method.
(d) How many optimal solutions does this linear program have? What are
they?
10. For the linear program that appears below, construct a basic feasible sys-
tem, state its basis, and state its basic solution.
x + y – z = 16,
    y + z ≤ 12,
   2y – z ≥ –10,
x ≥ 0, y ≥ 0, z ≥ 0.
(a) True or false: Problem A’ has a feasible basis whose nonbasic variables
are x and s2 .
(b) True or false: Problem A’ has a feasible basis whose nonbasic variables
are x and s3 .
11. (an unbounded linear program) Draw the feasible region for Problem B’. Ap-
ply the simplex method to Problem B’, selecting y (and not x) as the entering
variable for the first pivot. What happens? Interpret your result graphically.
(c) List the bases and basic solutions that were encountered. Did a degen-
erate pivot occur?
13. (degeneracy in 3-space) This problem concerns the linear program: Maxi-
mize {x + 1.5y + z} subject to the constraints x + y ≤ 1, y + z ≤ 1, x ≥ 0,
y ≥ 0, z ≥ 0.
(a) Use the simplex pivots with Rule #1 to solve this linear program on a
spreadsheet. Did a degenerate pivot occur?
(b) Plot this linear program’s feasible region. Explain why a degenerate
pivot must occur.
(c) True or false: If a degenerate pivot occurs, the linear program must
have a redundant constraint.
(a) Use the simplex pivots with Rule #1 to solve this linear program on a
spreadsheet. Did a degenerate pivot occur?
(b) True or false: The simplex method stops when it encounters an opti-
mal solution.
16. Consider a basic feasible tableau that is nondegenerate, so that its basic
solution equates all variables to positive values, with the possible excep-
tion of –z. Complete the following sentence and justify it: A feasible pivot
on this tableau will result in a degenerate tableau if and only if a tie occurs
for_______.
17. True or false: For a linear program in Form 1, feasible pivots are the only
pivots that keep the basic solution feasible.
18. (redundant constraints) Suppose that you need to learn whether or not the
ith constraint in a linear program is redundant.
(a) Suppose the ith constraint is a “≤” inequality. How could you find out
whether or not this constraint is redundant? Hint: use a linear pro-
gram.
(b) Suppose the ith constraint is an equation. How could you find out whether or not it is redundant? Hint: use part (a), twice.
20. (bases and shadow prices) This problem refers to Table 4.1.
(a) For the basic solution in rows 2-7, find the shadow price for each con-
straint.
(b) For the basic solution in rows 11-16, find the shadow price for each
constraint.
21. Adapt Table 4.4 and Table 4.5 to compute the shadow price, the allowable
increase, and the allowable decrease for the optimal basis of the RHS val-
ue of the constraint –x + 3y ≤ 9. Which previously nonbinding constraint
becomes binding at the allowable increase? At the allowable decrease?
22. On the plane, plot the set S that consists of all pairs (δ2 , δ4 ) for which
the basic solution to system (12) remains feasible. For each point on the
boundary of the set S that you have plotted, indicate which constraint(s)
become binding.
23. Suppose every RHS value in Problem A is multiplied by the same positive
constant, for instance, by 10.5. What happens to the optimal basis? To the
optimal basic solution? To the optimal value? To the optimal tableau? Why?
25. True or false: When the simplex method is executed, a variable can:
(a) Leave the basis at a pivot and enter at the next pivot. Hint: If it entered,
to which extreme point would it lead?
(b) Enter at a pivot and leave at the next pivot. Hint: Maximize {2y + x},
subject to the constraints 3y + x ≤ 3, x ≥ 0, y ≥ 0.
(a) The basic solution to this tableau is the unique optimal solution.
(b) The basic solution to this tableau is optimal, but is not the unique
optimal solution.
(b) Prove that deleting B and the equation for which it is basic can have
no effect either on the feasibility of this linear program or on its opti-
mal value.
Chapter 5: Analyzing Linear Programs
1. Preview
This chapter also addresses the fact that a linear program – like any math-
ematical model – is but an approximation to the situation that is under study.
The information that accompanies the optimal solution to a linear program
can help you to determine whether or not the approximation is a reasonable
one.
Three sections of this chapter are starred because they can be read inde-
pendently of each other. One of these starred sections provides a glimpse of
duality.
¹ This example has a long history. An early precursor appears in the article by Robert
Contribution
When one is allocating this week’s production capacity, the variable cost
normally includes the material and energy that will be consumed during pro-
duction, and the fixed cost includes depreciation of existing structures, prop-
erty taxes, and other expenses that are unaffected by decisions about what to
produce this week.
The contribution of an action equals the revenue that it creates less its
variable cost. This usage abbreviates the accounting phrase, “contribution to-
ward the recovery of fixed costs.” Table 5.1 reports $840 as the contribution
of each Standard model vehicle. This means that $840 equals the sales price
of a Standard model vehicle less the variable cost of manufacturing it. When
profit is used in this book, what is meant is contribution.
Maximizing contribution
The manager of the ATV plant seeks the mix of activities that maximizes
the rate at which contribution is earned, measured in dollars per week. At a
first glance, the Luxury model vehicle seems to be the most profitable. It has
the largest contribution. Each type of vehicle consumes a total of 4 hours of
capacity in the Engine and Body shops, where congestion is likely to occur.
But we will see that no Luxury model vehicles should be manufactured, and
we will come to understand why that is so.
Let us formulate the ATV problem for solution via linear programming.
Its decision variables are the rates at which to produce the three types of ve-
hicles; these rates are given the names S, F and L.
Evidently, mnemonics (memory aids) are being used; the labels S, F and
L abbreviate the production rates for Standard, Fancy and Luxury model ve-
hicles.
Inequality constraints
The ATV problem places eight constraints on the values taken by the
decision variables. Three of these constraints reflect the fact that the produc-
tion quantities cannot be negative. These three are S ≥ 0, F ≥ 0, and L ≥ 0.
The remaining five constraints keep the capacity of each shop from being
over-utilized. The top line of Table 5.1 shows that producing at rates S, F, and
L vehicles per week consumes the capacity of the Engine shop at the rate of
3S + 2F + 1L hours per week, so the constraint
3S + 2F + 1L ≤ 120
keeps the number of hours consumed in the Engine shop from exceeding its
weekly capacity. The expression, {840S + 1120F + 1200L}, measures the rate at
which profit is earned. The complete linear program is:

Program 1:  Maximize {840S + 1120F + 1200L}, subject to the constraints

Engine:              3S + 2F + 1L ≤ 120,
Body:                1S + 2F + 3L ≤ 80,
Standard Finishing:  2S           ≤ 96,
Fancy Finishing:          3F      ≤ 102,
Luxury Finishing:              2L ≤ 40,
                     S ≥ 0, F ≥ 0, L ≥ 0.
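The text solves Program 1 with Solver and with Premium Solver, as described in the sections that follow. As a cross-check outside Excel, here is a minimal sketch that hands Program 1 to SciPy's linprog (this assumes a reasonably recent SciPy with the "highs" method available; it is not part of the book's software). Since linprog minimizes, the contributions are negated.

from scipy.optimize import linprog

c = [-840, -1120, -1200]          # negated contributions of S, F, L
A_ub = [[3, 2, 1],                # Engine
        [1, 2, 3],                # Body
        [2, 0, 0],                # Standard Finishing
        [0, 3, 0],                # Fancy Finishing
        [0, 0, 2]]                # Luxury Finishing
b_ub = [120, 80, 96, 102, 40]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3, method="highs")
print(res.x)       # [20. 30.  0.]  ->  S = 20, F = 30, L = 0
print(-res.fun)    # 50400.0        ->  a contribution of $50,400 per week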
Integer-valued variables?
A spreadsheet
– Cells B9, C9 and D9 are reserved for the values of these decision
variables, each of which has been set equal to 1, temporarily.
– Rows 5 through 8 contain comparable data for the other four shops.
• Column F contains "<=" signs. These are memory aids; they remind us
that the quantities to their left must not exceed the RHS values to their
right.
It remains for Solver to select values in cells B9, C9 and D9 that maximize
the value in cell E3 while enforcing the constraints of Program 1.
A standard format
Table 5.2 presents the data for the ATV problem in a standard format,
which consists of:
• One row for the labels of the decision variables (row 2 in this case).
• One row for the values of the decision variables (row 9 in this case).
• One row for the contribution of each decision variable (row 3).
• One column for the coefficients of each decision variable, one column
for the sumproduct functions that measure the consumption of each
resource, one column for the RHS values, and one (optional) column
that records the sense of each constraint.
This is the first of two sections that describe slightly different ways to
compute the optimal solution to the ATV problem. This section is focused
on Solver, which comes with Excel. The next section is focused on Premium
Solver for Education, which is on the disc that accompanies this book.
Figure 5.1 displays a Solver dialog box, which has been filled out. This
dialog box identifies E3 as the cell whose value we wish to maximize, it speci-
fies cells B9:D9 as the changing cells, and it imposes constraints that keep
the quantities in cells B9:D9 nonnegative and keep the quantities in cells E4
through E8 from exceeding the quantities in cells G4 through G8, respec-
tively.
Chapter 2 tells how to fill out this dialog box. As was indicated in Chapter
2, the Solver dialog box for Excel 2010 differs slightly from the above, but is
filled out in a similar way.
Binding constraints
In Table 5.3, the shaded cells identify these binding constraints. It will
soon be argued that it is optimal to keep these constraints binding even if the
model’s data are somewhat inaccurate.
A sensitivity report
The Solver Results dialog box (see Table 5.3) has a window containing
the word “sensitivity.” Clicking on it creates a sheet containing a Sensitivity
Report that is reproduced as Table 5.4.
• The shadow price for a constraint equals the rate of change of the basic
solution’s objective value with respect to the RHS value of that con-
straint. The basic solution remains feasible (and hence optimal) for in-
creases in its RHS value up to the Allowable Increase and for decreases
up to the Allowable Decrease.
• The reduced cost of each basic variable equals zero, and the reduced
cost of each nonbasic variable equals the amount by which the optimal
value changes if that variable is set equal to 1 and the values of the basic
variables are adjusted accordingly.
In particular:
• The capacity of the Engine shop has a break-even value of 140 $/hour,
and this value applies to increases of up to 56 hours/week and to decreases of up
to 16 hours/week. Hence, an increase of up to 56 hours per week of Engine
shop capacity is profitable if it can be obtained at a price below 140 dol-
lars per hour. And a decrease in Engine shop capacity of up to 16 hours
per week is profitable if it can be put to an alternative use that is worth more
than 140 dollars per hour.
You forgot?
Sooner or later, nearly everyone who uses Solver will forget to check off
Assume Linear Model before solving a linear program. If you forget, the “en-
gine” that solves your linear program will not be the simplex method, but a
more general algorithm. It computes the correct shadow prices but calls them
Lagrange multipliers, it computes the correct reduced costs, but calls them
reduced gradients, and it omits the Allowable Increases and Allowable De-
creases because it presumes the problem is nonlinear.
Premium Solver for Education has added features and fewer bugs than
the earlier versions of Solver. If you have a choice, use the Premium version.
Chapter 2 tells how to install and activate it. After it is activated, Premium
Solver will appear on the Add-Ins tab of the Excel File menu.
To illustrate the use of the Premium Solver dialog box, we arrange for
it to solve the ATV problem. The first step is to replicate the spreadsheet in
Table 5.3. Next, click on the Add-Ins tab on the File menu. An icon labeled
Premium Solver will appear at the left, just below the File tab. Click on it. A
Solver Parameters dialog box will appear. To make it look like that in Fig-
ure 5.2, follow the procedure that is described in Chapter 2. After you suc-
ceed, click on Solve. In a flash, a solution will appear, along with the usual box
that accords you the opportunity to obtain a sensitivity report.
Solver and Premium Solver report reduced costs and shadow prices as
they are defined in Chapter 4. Some computer packages use different conven-
tions as to the signs (but not the magnitudes) of the reduced costs and the
shadow prices. If you are using a different software package to solve linear
programs, you will need to figure out what sign conventions it employs. An
easy way to do that is described below.
A maximization problem
To see what sign conventions a particular computer package uses for max-
imization problems, you can ask it to solve a simple linear program, such as
3x + 3y ≤ 6,
x ≥ 0,  y ≥ 0.
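For instance, the sketch below hands such a program to SciPy's linprog and prints the dual value it reports. The objective 2x + 4y is an assumption of this sketch — the display above gives only the constraints — but any small LP whose optimal solution and shadow price are known by hand will serve. Recent versions of SciPy expose the duals of a "highs" solve as res.ineqlin.marginals; comparing the reported sign with the hand-computed shadow price reveals the package's convention.

from scipy.optimize import linprog

# Maximize 2x + 4y subject to 3x + 3y <= 6, x >= 0, y >= 0 (linprog minimizes,
# so the objective is negated).  By hand, the shadow price of the constraint
# is 4/3: one extra unit of RHS buys 1/3 of a unit of y, worth 4/3.
res = linprog([-2, -4], A_ub=[[3, 3]], b_ub=[6], bounds=[(0, None)] * 2,
              method="highs")
print(res.x, -res.fun)        # the optimal solution and value
print(res.ineqlin.marginals)  # compare its sign with +4/3 to learn the convention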
Clearly, the optimal solution to Example 1 is x = 0 and y = 2, and its opti-
mal value equals 8. It is equally clear that:
3x + 3y ≥ 6,
x ≥ 0,  y ≥ 0.
The optimal solution to Example 2 is x = 2 and y = 0, and its optimal value
equals 4. Evidently:
Uncertain data
This model’s data are uncertain because they cannot be measured pre-
cisely and because they can fluctuate in unpredictable ways. In the ATV
model, it is presumed that, in a "representative" week, 120 machine hours
are available in the Engine shop; 120 equals the number of machine hours
during which the Engine shop is open for business less allowances for routine
maintenance, machine breakdowns, shortages of vital parts, absences of key
workers, power failures, and other unforeseen events. The actual number of
machine hours available in a particular week could be larger or smaller than
120, depending on how things turned out that week.
Aggregation
Linearization
Each capacity constraint presumes linear interactions among the three types of vehicles that are
produced there. The actual interactions are more complicated and are some-
what nonlinear. For example, this constraint accounts crudely for the set-up
times that are needed to change from the production of one model to another.
The ATV example is aggregated and simplified. It has to be. Imagine how
intractable this model would become if it incorporated all of the complexi-
ties and details just mentioned. Yet, there is merit in starting with a simple
and aggregated model. It will be relatively easy to build and debug. And if it
is artfully built, its simplicity will cause the main insights to stand out starkly.
Robustness
Illustration
• Make Standard model vehicles at the rate of 20 per week, make Fancy
model vehicles at the rate of 30 per week and make no Luxury model
vehicles.
• Keep the Engine and Body shops busy making Standard and Fancy
model vehicles, and make no Luxury model vehicles.
If the model’s data were exact (and that never occurs), both descriptions
of the optimal solution would be correct. If the model’s data are close, the lat-
ter is correct because it keeps the binding constraints binding.
Sketch of a proof
• Each basis for the LP’s equation system contains one variable per equa-
tion. (In other words, the Full Rank proviso is satisfied.)
• The linear program has only one optimal solution, and this optimal so-
lution is nondegenerate. (It sets each basic variable that is constrained
to be nonnegative to a positive value.)
Coupling these observations with the fact that the inverse of a matrix is a
continuous function of its data would prove the theorem.
It will soon be seen that three concepts are very closely linked.
Context
These three concepts are described in the context of a linear program that
has been cast in Form 1 – with equality constraints and nonnegative decision
variables. In such a linear program, each decision variable is now said to rep-
resent an activity. Each constraint (other than the nonnegativity constraints
on the variables) measures the consumption of a resource and requires its
consumption to equal its availability (RHS value). Each basis is now said to
engage in those activities (decision variables) that the basis includes. The val-
ue assigned to each decision variable in a basic solution is now said to be the
level of the activity that it represents.
Shadow prices and marginal benefit are familiar from Chapter 4, but rela-
tive opportunity cost was not discussed there. Like the other two terms, relative
opportunity cost is defined in the context of a particular basis. The relative op-
portunity cost of each activity equals the reduction in contribution that occurs
when the levels of the activities in which the basis is engaged are altered so as to free up
(make available) the resources needed to set the level of that activity equal to 1.
Shadow prices, relative opportunity costs and marginal benefit are de-
fined for every basis, not merely for the optimal basis. To illustrate these con-
cepts – and the relationship between them – we focus on the basis (and basic
solution) that is optimal for the ATV problem. Table 5.4 reports its shadow
prices. For convenient reference, these shadow prices are recorded in Ta-
ble 5.5, with a label and unit of measure of each.
Let us recall that these shadow prices are break-even prices. For instance,
a change of δ in the RHS of the Engine shop capacity constraint causes a
change of 140 δ in the basic solution’s objective value. These break-even pric-
es apply to simultaneous changes in several RHS values.
Table 5.5 presents the shadow prices for the optimal basis. With refer-
ence to that basis, the relative opportunity cost of making one Luxury model
vehicle is now computed. Making one Luxury model vehicle requires 1 hour
in the Engine shop, 3 hours in the Body shop, and 2 hours in the Luxury Fin-
ishing shop. The shadow prices apply to simultaneous changes in several RHS
values, and the prices in Table 5.5 show that:

(1)  the relative opportunity cost of one Luxury model vehicle
     = (1)(140) + (3)(420) + (2)(0) = $1,400.
The marginal profit of any activity equals its contribution less the relative
opportunity cost of freeing up the resources needed to accomplish that activ-
ity. In particular,

(2)  the marginal profit of one Luxury model vehicle = 1200 − 1400 = −$200.
Equation (2) tells us nothing new because Table 5.4 reports –200 as the
reduced cost of L, namely, as the change in profit if the basic solution is per-
turbed by setting Lâ•›=â•›1 and adjusting the values of the basic variables accord-
ingly.
Equation (2) tells us nothing new, but equation (1) does. It indicates why
the Luxury model vehicle is unprofitable. Making one Luxury model vehicle
requires 3 hours in the Body shop, which has a break-even price of $420/hour,
and (3) × (420) = $1,260. This (alone) exceeds the contribution of the Luxu-
ry model vehicle. In this example and in general:
To learn why the optimal solution to a linear program is what it is, use
the shadow prices to parse the relative opportunity cost of each activity
in which it does not engage.
The Luxury model vehicle would become profitable if its relative oppor-
tunity cost could be reduced below $1,200, and equation (1) shows that this
would occur if the time it required in the Body shop could be reduced below
2 11/21 hours.
Making one Nifty reduces profit by approximately $80. Nifties are not
profitable. Their relative opportunity cost shows that they would become
slightly profitable if their manufacturing time in the Body shop could be re-
duced by 0.2 hours.
Still in the context of the optimal plan for the ATV facility, let’s compute
the relative opportunity cost of making one Standard model vehicle. To do
so, we must free up the resources that it requires, which are 3 hours in the
Engine shop (at a cost of $140 per hour), 1 hour in the Body shop (at a cost
of $420 per hour) and 2 hours in the Standard finishing shop (at a cost of $0
per hour), so that the relative opportunity cost of one Standard model vehicle
equals (3)(140) + (1)(420) + (2)(0) = $840, which equals its contribution.
Please pause to verify that the relative opportunity cost of one Fancy
model vehicle equals $1,120. This illustrates a point that holds in general and
is highlighted below:
Consider any basis for a linear program. The contribution of each basic
variable equals its relative opportunity cost.
The above has been justified on economic grounds. It also follows from
the fact that the reduced cost of each basic variable equals zero.
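The parse just described is easy to mechanize. The sketch below (illustrative only) prices the bundle of shop hours that each vehicle consumes at the shadow prices of Table 5.5 and subtracts that cost from the vehicle's contribution; the hours and contributions are those of Table 5.2.

shadow = {"E": 140, "B": 420, "SF": 0, "FF": 0, "LF": 0}   # Table 5.5

usage = {                                  # hours per shop, and contribution
    "S": ({"E": 3, "B": 1, "SF": 2}, 840),
    "F": ({"E": 2, "B": 2, "FF": 3}, 1120),
    "L": ({"E": 1, "B": 3, "LF": 2}, 1200),
}

for name, (hours, contribution) in usage.items():
    cost = sum(shadow[shop] * h for shop, h in hours.items())
    print(name, cost, contribution - cost)
# S  840    0   (basic: contribution equals relative opportunity cost)
# F 1120    0   (basic: contribution equals relative opportunity cost)
# L 1400 -200   (nonbasic: the reduced cost reported in Table 5.4)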
That’s five linear equations in five unknowns. The lower three equations
set SFâ•›=â•›FFâ•›=â•›LFâ•›=â•›0. This reduces the upper two equations to 3Eâ•›+â•›1Bâ•›=â•›840 and
1Eâ•›+â•›3Bâ•›=â•›1200; their solution is Eâ•›=â•›165 and Bâ•›=â•›345.
By the way, the shadow price that this basis assigns to a particular con-
straint could have been computed by adding δ to that constraint's RHS value,
re-solving for the basic solution, and finding the change in its objective
value. For the ATV example, this would have required the solution of six lin-
ear equations (not 5). And it would have given us only one of the shadow
prices.
No shadow price?
The ATV problem satisfies the Full Rank proviso because each constraint
in its Form-1 representation has a slack variable. This guarantees that the
equation system has a solution and that it continues to have a solution if a
RHS value is perturbed, hence that each basis assigns a shadow price to each
constraint.
Multipliers
An illustration
The 2nd constraint in Program 2 is twice the 1st constraint. If the RHS
value of either of these constraints was perturbed, the linear program would
become infeasible. Neither constraint can have a shadow price. Solver and
Premium Solver report “shadow prices” anyhow.
Multipliers, by hand
What these software packages are actually reporting are values of the “multi-
pliers.” As noted above, the values of the multipliers y1 , y2 and y3 are such that
the relative opportunity cost of each basic variable equals its contribution. For
the basis that includes x1 and x3 and excludes x2 , the multipliers must satisfy:
x1 is basic ⇒ 1y1 + 2y2 + 3y3 = 4,
x3 is basic ⇒ 3y1 + 6y2 + 1y3 = 5.
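Because the 2nd constraint is twice the 1st, the multipliers y1 and y2 enter these equations only through the combination u = y1 + 2y2 — which is why they are not uniquely determined. A short sketch of the elimination (illustrative only):

from fractions import Fraction as F

# With u = y1 + 2*y2, the two equations read  u + 3*y3 = 4  and  3*u + y3 = 5.
det = 1 * 1 - 3 * 3               # determinant of the 2-by-2 system: -8
u   = F(4 * 1 - 3 * 5, det)       # Cramer's rule: u  = 11/8
y3  = F(1 * 5 - 4 * 3, det)       # Cramer's rule: y3 = 7/8
print(u, y3)   # 11/8 7/8; any y1, y2 with y1 + 2*y2 = 11/8 complete a solution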
Premium Solver has been used to solve Program 2. The sensitivity report
in Table 5.6 records its optimal solution and multipliers, which are
This is the same basis and the same set of multipliers that are computed
above.
Sneak preview
The opportunity cost of doing something is the benefit one can obtain
from the best alternative use of the resources needed to do that thing.
Paul Samuelson
Suppose that Robinson Crusoe plans to devote the afternoon to picking raspberries. His best alternative use of the time
and effort he would spend picking raspberries is in picking strawberries. Cru-
soe anticipates that spending the afternoon picking strawberries or raspber-
ries will be equally pleasurable. The opportunity cost of picking raspberries
is the value that Crusoe places on the strawberries he might have picked that
afternoon.
A difficulty
A puzzle
Which is it?”
Fewer than 22% of the roughly 200 professional economists who respond-
ed to this question got the correct answer. That is pretty dismal performance.
Paul J. Ferraro and Laura J. Taylor, "Do Economists Recognize an Opportunity
Cost When They See One? A Dismal Performance from the Dismal Science," Contri-
butions to Economic Analysis & Policy, Vol. 4, Issue 1, Article 7, 2005.
If every one of them had chosen amongst the four choices at random (a pure
guess), a statistic this low would occur with a probability that is below 0.2.
Extra work
– Then, for each activity that is excluded from the program, use these
shadow prices to compute the relative opportunity cost of the activity.
– For each activity, find the best alternative use of the bundle of re-
sources it requires. (This requires solution of one optimization prob-
lem per activity.)
Recap
The relative opportunity cost focuses the decision maker on the marginal
benefit of an action – this being the benefit (contribution) obtained from the
action less the cost of freeing up the resources needed to make it possible. The
Virtually all of the ideas that have appeared so far in this book are due
to George B. Dantzig. Not all of the terminology is due to him. In his classic
text³, Dantzig used the term relative cost instead of "reduced cost." Relative
cost reflects the fact that it is relative to the current basis. In place of “shadow
price,” he used the terms price and multiplier, the latter as an abbreviation
of Lagrange multiplier. Dantzig understood that the shadow prices exist if
and only if the Full Rank proviso is satisfied. He fully understood the role of
multipliers in marginal analysis and in the “revised” simplex method. Con-
sider this:
• Dantzig’s simplex method has remained the principal tool for finding
an optimal allocation of resources ever since he devised it.
A perplexing decision
³ George B. Dantzig, Linear programming and extensions, R-366-PR, The RAND Cor-
poration, August 1963, and Princeton University Press, Princeton, NJ, 1963.
…without peer as concerns the optimal allocation of resources. Since that time, no
one has approached his stature in this field.
This is the first of the three starred sections, which are independent of each
other. Here, the ATV problem is used to glimpse a
topic that is known as duality. Let's begin by recalling the labels of the shadow
prices – E for Engine shop, B for Body shop, SF for Standard Finishing shop,
and so forth. Table 5.5 reports the values that the optimal solution assigns to
the shadow prices, namely,
E = 140, B = 420, SF = 0, FF = 0, LF = 0.
These shadow prices are break-even prices, and their unit of measure is
$/hour.
The labels E, B, SF, FF and LF will be used to describe the decision vari-
ables in a second linear program. Think of yourself as an outsider who wishes
to rent the ATV facility for one week. Imagine that you face the situation in:
Problem B (renting the ATV facility). The ATV company has agreed to rent
its ATV facility to you for one week according to the following terms:
• You must offer them a price for each unit of capacity of each shop.
• You must set each price high enough that they have no economic mo-
tive to withhold any capacity of any shop from you.
What prices should you set, and how much must you spend to rent the
facility for one week?
The ATV company can earn $50,400 by operating this facility for one
week. You must set your prices high enough that they have no motive to with-
hold any capacity from you. Intuitively, it seems clear that you will need to pay
at least $50,400 to rent their entire capacity. But must you spend more than
$50,400? And what prices should you offer? To answer these questions, we
will build a linear program.
Decision variables
The decision variables in this linear program are the prices that you will
offer. By agreement, you must offer five prices, one per shop. These prices are
labeled:
E  = the price ($/hour) you offer for each unit of Engine shop capacity.
B  = the price ($/hour) you offer for each unit of Body shop capacity.
SF = the price ($/hour) you offer for each unit of Standard Finishing shop capacity.
FF = the price ($/hour) you offer for each unit of Fancy Finishing shop capacity.
LF = the price ($/hour) you offer for each unit of Luxury Finishing shop capacity.
Let us compute the cost you will incur for renting the entire capacity of
the ATV facility. The Engine shop has a capacity of 120 hours, and you must
pay E dollars for each hour you rent. The cost to you of renting the entire
capacity of the Engine shop is 120E. The cost of renting the entire capacity of
the Body shop is 80B. And so forth. The total cost that you will pay
to rent every unit of every shop's capacity is given by

120E + 80B + 96SF + 102FF + 40LF.
You wish to minimize this expression, which is your rental bill, subject to
constraints that keep the ATV company from withholding any capacity from
you.
The ATV company need not make full use of the capacity of any of its
shops. That fact constrains the prices that you can offer. For instance, the
capacity constraint on the Engine shop is the inequality,
3S + 2F + 1L ≤ 120.
Can you offer a price E that is negative? No. If you did, the ATV company
would not rent you any of the capacity of its Engine shop. Instead, it would
leave those resources idle. You must offer a price E that is nonnegative. The
decision variable E must satisfy the constraint E ≥ 0.
Each shop’s capacity constraint is a “≤” inequality. For this reason, each
of the prices that you offer must be nonnegative. In other words, the decision
variables must satisfy the constraints,
E ≥ 0, B ≥ 0, SF ≥ 0, FF ≥ 0, LF ≥ 0.
Producing vehicles
The ATV facility can be used to manufacture vehicles. Your prices must
be high enough that manufacturing each type of vehicle becomes unprofit-
able. The price of the bundle of resources needed to produce each vehicle
must be at least as large as its contribution.
Let us begin with the Standard model vehicle. Column B of Table 5.2
shows that the company would earn $840 for each Standard model vehicle
that it made, that making this vehicle would require 3 hours in the Engine
shop, 1 hour in the Body shop, and 2 hours in the Standard Finishing shop.
Thus, it becomes unprofitable to make any Standard model vehicles if the
prices you offer satisfy
S:   3E + 1B + 2SF ≥ 840.
Similarly, the data in column C of Table 5.2 show that the Fancy model
vehicle becomes unprofitable if the prices you offer satisfy

F:   2E + 2B + 3FF ≥ 1120.
In the same way, the data in column D of Table 5.2 shows the Luxury
model vehicle becomes unprofitable if the prices you offer satisfy
L:   1E + 3B + 2LF ≥ 1200.
The constraints and objective of a linear program that rents the ATV
manufacturing facility have now been presented. Assembling them produces
Program 3 calculates the prices that minimize the cost of renting the fa-
cility for one week, subject to constraints that make it unprofitable for the
ATV company to withhold any capacity from you, the renter. From Solver, we
could learn that the optimal solution to this linear program is
E = 140, B = 420, SF = 0, FF = 0, LF = 0,
that its optimal value is 50,400 $/wk, and the shadow prices of its three con-
straints are
S = 20, F = 30, L = 0.
Program 1 and Program 3 have the same optimal value, and the shadow
prices of each form an optimal solution to the other!
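Readers without Solver can witness this duality by handing Program 3 to any LP code. Here is a sketch with SciPy's linprog (again an illustrative aid, not the book's software); the "≥" constraints are negated into "≤" form because linprog expects upper bounds.

from scipy.optimize import linprog

c = [120, 80, 96, 102, 40]        # rental bill: 120E + 80B + 96SF + 102FF + 40LF
A_ub = [[-3, -1, -2,  0,  0],     # S:  3E + 1B + 2SF >= 840
        [-2, -2,  0, -3,  0],     # F:  2E + 2B + 3FF >= 1120
        [-1, -3,  0,  0, -2]]     # L:  1E + 3B + 2LF >= 1200
b_ub = [-840, -1120, -1200]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 5, method="highs")
print(res.x)     # [140. 420.   0.   0.   0.]  ->  Program 1's shadow prices
print(res.fun)   # 50400.0  ->  the same optimal value as Program 1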
Duality
• The other linear program is feasible and bounded, and both linear
programs have the same optimal value.
A surprise?
Did duality surprise you? If so, you are in very good company. It sur-
prised George B. Dantzig too. In retrospect, it’s eminently reasonable. To see
why, we note that the optimal solution to Program 1 and its shadow prices
have these properties:
• These shadow prices are such that no vehicle’s contribution exceeds its
relative opportunity cost, so they satisfy the “≥” constraints of Program 3.
From Table 5.4, we see that the shadow price on Engine shop capacity is
140 $/hr, with an Allowable Increase of 56 and an Allowable Decrease of 16.
This price applies for capacity levels in the range between 104 hours (because
104 = 120 – 16) and 176 hours (because 176 = 120 + 56).
We can use Solver to find optimal solutions to Program 1 for Engine shop
capacities that are just below 104 and just above 176, find the range and shad-
ow price in each case, and repeat. Figure 5.3 plots the result. This figure shows
that the slope decreases with quantity. This is no accident: a linear program
that is being maximized exhibits decreasing marginal return to each of its RHS values.
[Figure 5.3: the optimal contribution as a function of the Engine shop capacity. The plot is piecewise linear and concave; its slope falls in steps from 1200 to 560, 240, 165, 140 and finally 0 as the capacity grows, and it equals 140 throughout the range from 104 to 176 hours.]
The current shadow prices are the most favorable; larger increases will
be less profitable, and larger decreases will be more costly.
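The sweep that produced Figure 5.3 is easy to replicate with an LP code. The sketch below (illustrative; it re-uses the SciPy formulation of Program 1 given earlier) varies the Engine shop capacity and prints the optimal value. Successive increments per added hour shrink as the capacity grows, and they equal 140 throughout the range from 104 to 176 hours.

from scipy.optimize import linprog

c = [-840, -1120, -1200]
A_ub = [[3, 2, 1], [1, 2, 3], [2, 0, 0], [0, 3, 0], [0, 0, 2]]

def optimal_value(engine_capacity):
    res = linprog(c, A_ub=A_ub, b_ub=[engine_capacity, 80, 96, 102, 40],
                  bounds=[(0, None)] * 3, method="highs")
    return -res.fun

for cap in range(0, 201, 20):
    print(cap, round(optimal_value(cap)))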
The example in Chapter 4 had only two decision variables, so its geom-
etry could be visualized on the plane. The ATV problem has three decision
variables, which are S, F and L, so a geometric view of it requires three-
dimensional (or solid) geometry. Solid geometry has been familiar since birth,
even if it is omitted from typical high-school geometry courses.
Cartesian coordinates
[Figure 5.4: the feasible region of the ATV problem, drawn in (S, F, L)-coordinates; its labeled vertices include (0, 0, 0), (0, 34, 0), (12, 34, 0), (0, 34, 4) and (20, 30, 0).]
The feasible region for the ATV problem is a polyhedron that has 10 verti-
ces (extreme points), 15 edges, and 7 faces. One constraint is binding on each
face, two on each edge, and three at each vertex. Two vertices are adjacent if
an edge connects them.
When the simplex method is applied to the ATV example, each pivot
shifts it from a vertex to an adjacent vertex whose objective value is improved.
It stops pivoting after reaching vertex (20, 30, 0).
Higher dimensions
Plane geometry deals with only two variables, solid geometry with three.
Linear programs can easily have dozens of decision variables, or hundreds.
Luckily, results that hold for plane geometry and for solid geometry tend to
remain valid when there are many variables. That is why geometry is relevant
to linear programs.
12. Review
This chapter has shown how to formulate a linear program for solution
by Solver and by Premium Solver for Education. A focal point of this chapter
has been the interpretation of the information that accompanies the optimal solu-
tion. We have seen:
• How the shadow prices determine the relative opportunity costs of the
basic and nonbasic variables.
• How the relative opportunity costs help us to understand why the opti-
mal solution is what it is.
Material in later chapters has been glimpsed. In Chapter 11, shadow pric-
es, relative opportunity cost, and marginal profit will be used to guide the
"revised" simplex method as it pivots. Chapter 12 is focused on duality and
its uses. In Chapter 14, duality will be used to construct a simple (stylized)
model of production and consumption in an economy in general equilibrium.
Nonlinear programs will be studied in Chapter 20, where it is seen that the
natural generalization of decreasing marginal return produces nonlinear pro-
grams that are relatively easy to solve.
1. For the ATV problem, Table 5.4 (on page 161) reports an Allowable De-
crease on the objective coefficient of L of ∞. That is no accident. Why?
2. For the ATV problem, Table 5.4 (on page 161) reports a shadow price of
140 for the Engine shop constraint and an Allowable Decrease of 16 in
this constraint's right-hand-side value. Thus, renting out 4 hours of Engine
shop capacity for one week decreases contribution by $560 because 560 = 4 × 140.
Without re-running the linear program, show how the optimal solution
changes when the Engine shop capacity is decreased by 4. Hint: The Per-
turbation Theorem reduces this problem to solving two equations in two
unknowns.
3. Eliminate from the ATV problem the Luxury model vehicles. The linear
program that results has only two decision variables, so its feasible region
is a portion of the plane.
(d) Show, graphically, that its optimal solution sets S = 20 and F = 30, so
that its optimal value equals 50,400.
(e) Set aside the capacity needed to make one Luxury model vehicle. Re-
solve the linear program graphically. Show that making one Luxury
model vehicle decreases contribution by 200.
4. Suppose, in the ATV problem, that the contribution of the Standard mod-
el vehicle is $900 apiece for the first 12 made per week and $640 apiece for
production above that level.
(a) Revise Program 1 to account for this diseconomy of scale, and solve it.
(b) You can figure out what the optimal solution would be without doing
Part (a). How? Support your answer.
5. Consider a company that can use overtime labor at an hourly wage rate
that is 50% in excess of regular time labor cost. Does this represent an
economy of scale? A diseconomy of scale? Will a profit-maximizing linear
program have an unintended option? If so, what will it be? Will this unin-
tended option be selected by optimization, or will it be ruled out?
(a) The optimal solution would be multiplied by _______, and the opti-
mal value would be multiplied by _______.
Chapter 5: Eric V. Denardo 189
(c) There was nothing special about the factor 0.75 because ________.
8. The sensitivity report seems to omit the shadow prices of the nonnegativ-
ity constraints. True or false:
10. This problem refers to the survey by Paul J. Ferraro and Laura J. Taylor
that is cited in Section 8 of this chapter.
(a) With the economists’ definition of opportunity cost, which of the four
answers is correct?
(c) Suppose you had planned to attend the Dylan concert when someone
offers you a free ticket to the Clapton concert. What can you say of
the relative opportunity cost of seeing the Clapton concert?
11. This problem refers to the Robinson Crusoe example that is discussed in
Section 8. Suppose Crusoe has a third alternative. In addition to spending
the afternoon picking strawberries or raspberries, he could spend it loll-
ing on the beach.
(a) Suppose he would rather loll on the beach than pick strawberries.
Carefully describe:
(b) Suppose he planned to pick raspberries when the sun came out, at
which time it occurred to him that he might enjoy an afternoon on
the beach. Describe the relative opportunity cost of an afternoon on
the beach.
12. Write down linear programs that have each of these properties:
(e) Its feasible region is unbounded, and it has multiple optima, none of
which occur at an extreme point.
14. In Table 5.6, the Allowable Increase and Allowable Decrease for rows 5
and 6 are zero because Program 2 becomes infeasible if the RHS value of
either of the first two constraints is perturbed. These constraints do not
have shadow prices. Table 5.6 reports that row 7 does have a shadow price,
and it reports an Allowable Increase and Allowable Decrease for its RHS
value. Why does this row have a shadow price? What accounts for the
range on this shadow price?
15. Consider a constraint in a linear program that has no shadow price. Does
this constraint have a multiplier? What can be said about the Allowable
Increase and the Allowable Decrease of that constraint’s RHS value?
16. With the Engine shop capacity fixed at 120 hours per week, use Solver to
compute the optimal value of Program 1 for all values of the Body shop
capacity. Plot the analog of Figure 5.3. Do you observe decreasing mar-
ginal return?
17. With Figure 5.4 in view, have Solver or Premium Solver use the simplex
method to solve Program 1, but use the Options tab to have it show the
results of each iteration. Record the sequence of basic solutions that it fol-
lowed. Did it pivot from extreme point to extreme point? Did each pivot
occur along an edge?
18. (A farmer) A 1,200 acre farm includes a well that has a capacity of 2,000
acre-feet of water per year. (One acre-foot is one acre covered to a depth
of one foot.) This farm can be used to raise wheat, alfalfa, and beef. Wheat
can be sold at $550 a ton and beef at $1,300 a ton. Alfalfa can be bought or
sold at the market price of $220 per ton. Each ton of wheat that the farmer
produces requires one acre of land, $50 of labor, and 1.5 acre-feet of water.
Each ton of alfalfa that she produces requires 1/3 acre of land, $40 of labor
and 0.6 acre-feet of water. Each ton of beef she produces requires 0.8 acres
of land, $50 of labor, 2 acre-feet of water, and 2.5 tons of alfalfa. She can
neither buy nor sell water. She wishes to operate her farm in a way that
maximizes its annual profit. Below are the data in a spreadsheet formula-
tion, the solution that Solver has found, and a Sensitivity Report.
(a) Write down the linear program. Define each variable. Give each vari-
able’s unit of measure. Explain the objective function and each con-
straint. Explain why the constraint AS ≥ 0 is absent. What is the unit
of measure of the objective? What is the unit of measure of each con-
straint?
(b) State the optimal solution in a way that can be executed when the data
are inexact.
(d) What would have to happen to the price of wheat in order for her to
change her production mix?
(e) What would have to happen to the price of alfalfa for her to change
her production mix?
Note: Parts (f) through (i) refer to the original problem and are inde-
pendent of each other.
(f) The government has offered to let her deposit some acreage in the
“land bank.” She would be paid to produce nothing on those acres. Is
she interested? Why?
(g) The farmer is considering soybeans as a new crop. The market price
for soybeans is $800 per ton. Each ton of soybeans requires 2 acres of
land, 1.8 acre feet of water and $60 of labor. Without re-running the
linear program, determine whether it would be profitable for her to raise soybeans.
(h) A neighbor has a 400 acre farm with a well whose capacity is 500
acre-feet per year. The neighbor wants to retire to the city and to rent
his entire farm for $120,000 per year. Should she rent it? If so, what
should she do with it?
(b) Solve this linear program on a spreadsheet. Describe its optimal solu-
tion in a way that can be implemented when its data are inexact.
(c) What is the value to the company of the EPA’s relaxing the constraint
on particulate emission by one ounce per week? What is the value to
the company of the EPA’s relaxing the constraint on Chemical emis-
sions by one ounce per week?
(d) (an emissions trade-off) By how much should the company be willing
to reduce its weekly emission of chemicals if the EPA would allow it
to emit one additional ounce of particulates each week?
(e) (an emissions tax) The EPA is considering the control of emissions
through taxation. Suppose that the government imposes weekly tax
rates of P dollars per ounce of particulate emissions and C dollars per
ounce of chemical emission. Find tax rates, P and C, that keep the
company’s pollutants at or below the current levels and minimize the
company’s tax bill. Hint: With the constraints on emissions deleted,
the feasible region becomes a triangle, so the tax rates must be suffi-
ciently large to make the extreme point(s) that cause excess pollution
to become undesirable.
(f) By how much does the taxation scheme in part (e) reduce profit?
Chapter 6: The Simplex Method, Part 2
1. Preview
• That the simplex method can “cycle” (fail to terminate finitely) and that
it can be kept from doing so.
• That the simplex method readily accommodates variables that are not
required to be nonnegative.
Also discussed here is the speed of the simplex method. Decades after
its discovery and in spite of the best efforts of scores of brilliant researchers,
the simplex method remains the method of choice for solving large linear
programs.
2. Phase I
(1.3)   –2p – 3q ≤ 0,
(1.4)   p ≥ 0,  q ≥ 0,  r ≥ 0.
Step 1 of Phase I
The 1st step of Phase I is to cast the linear program in Form 1, preserving
its sense of optimization. Executing this step on Problem A rewrites it as
Step 2
Rows 2-6 of Table 6.1 mirror system (2). Rows 4 and 5 lack basic vari-
ables. A choice exists as to the elements on which to pivot. Table 6.1 exhibits
the result of pivoting on the coefficient of s1 in row 4 and then on the coef-
ficient of p in row 11. These pivots produce the basic tableau in rows 14-18.
This tableau's basic solution sets p = –10 and s3 = –20. If this solution were
feasible, Phase I would be complete. It is not feasible, so Phase I continues.
Step 3
The 3rd step is to insert on the left-hand side of the equation system an
artificial variable α with a coefficient of –1 in each equation whose RHS
value has the wrong sign and with a coefficient of 0 in each of the remaining
equations. Displayed in rows 20-24 of Table 6.2 is the result of executing Step
3 on Program 1.
In system (3), setting q = 0, r = 0 and α ≥ 20 equates the variables s1, p and
s3 to nonnegative values. Moreover, a pivot on the coefficient of α in the equa-
tion for which s3 is basic will remove s3 from the basis and will produce a basic
solution in which α = 20. This motivates the next step.
Step 4
Step 4 is to select the equation whose RHS value is most negative and to pivot
upon the coefficient of α in that equation. When applied to the tableau in
rows 20-24 of Table 6.2, this pivot occurs on the coefficient of α in row 24.
This pivot produces the basic tableau in rows 26-30. That tableau's basic solu-
tion sets s1 = 4, p = 10 and α = 20, exactly as predicted from system (3).
Step 5
What remains is to drive α down toward zero, while keeping the basic
variables (other than –z) nonnegative. This will be accomplished by a slight
adaptation of the simplex method. To see how to pivot, we write the equations
represented by rows 26-30 in dictionary format, as:
The goal is to decrease the value of α, which is the basic variable for equa-
tion (4.3). The nonbasic variables q and r have negative coefficients in equa-
tion (4.3), so setting either of them positive decreases α.
In a Phase I simplex pivot for Form 1, the entering variable and pivot
element are found as follows:
• The entering variable can be any nonbasic variable that has a posi-
tive coefficient in the row for which α is basic.
• The pivot row is selected by the usual ratios, which keep the basic
solution feasible and keep –z basic.
• But if the row for which α is basic ties for the smallest ratio, pivot
on that row.
To reduce the ambiguity in this pivot rule, let’s select as the entering vari-
able a nonbasic variable that has the most positive coefficient in the row for
which α is basic. Table 6.3 displays the Phase I simplex pivots that result.
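The arithmetic that each of these steps performs is ordinary Gauss-Jordan pivoting. Here is a bare-bones sketch (illustrative, not the text's spreadsheet add-in) of a pivot and of the Phase I entering-variable choice just described.

def pivot(T, r, c):
    """Pivot the tableau T (a list of rows) on the element T[r][c]."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]              # scale the pivot row
    for i, row in enumerate(T):
        if i != r and row[c] != 0:            # clear column c in the other rows
            f = row[c]
            T[i] = [v - f * w for v, w in zip(row, T[r])]

def phase1_entering(T, alpha_row, nonbasic_cols):
    """Nonbasic column with the most positive coefficient in alpha's row."""
    best = max(nonbasic_cols, key=lambda j: T[alpha_row][j])
    return best if T[alpha_row][best] > 0 else None   # None: no entering variable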
Rows 26-30 of Table 6.3 indicate that for the first of these pivots, q is the
entering variable (its coefficient in row 30 is most positive), and row 29 has
the smallest ratio, so q enters the basis and p departs. Rows 35-38 result from
that pivot.
Rows 34-38 of Table 6.3 indicate that for the second pivot, r is the enter-
ing variable (its coefficient in row 38 is most positive), and α is the departing
variable because the row for which α is basic ties for the smallest ratio.
Rows 40-44 display the basic tableau that results from that pivot. The
variable α has become nonbasic. The numbers in cells H42-H44 are non-
negative, so this tableau's basic solution equates the basic variables q, r and
s1 to nonnegative values. Deleting α and its column of coefficients
produces a basic feasible tableau for Program 1.
Step 6 of Phase I
The 6th and final step is to delete α and its column of coefficients. This
step produces a basic feasible tableau with which to begin Phase II. Applying
this step to rows 40-44 of Table 6.3 casts Program 1 as the linear program:
No entering variable?
One possibility has not yet been accounted for. Phase I pivots can result
in a basic tableau whose basic solution sets αâ•›>â•›0 but in which no nonbasic
variable has a positive coefficient in the row for which α is basic. If this oc-
curs, no entering variable for a Phase I simplex pivot can be selected. What
then?
The basic solution to this equation system has α = 4.615, and the variables
p, r and s3 are constrained to be nonnegative, so equation (6) demonstrates
that no feasible solution can have α < 4.615. The artificial variable α cannot be
reduced below 4.615, so the linear program is infeasible.
Recap – infeasible LP
Recap – feasible LP
Now consider a linear program that is feasible. Neither of the above con-
ditions can occur. If Gauss-Jordan elimination (Step 2) produces a feasible
basis, Phase II is initiated immediately. If not, an artificial variable α is in-
serted, and a pivot produces a basic solution that is feasible, except that it
equates α to a positive value.
Commentary
Let’s suppose that certain variables are likely to be part of an optimal ba-
sis. Phase I can be organized for a fast start by pivoting into the initial basis
as many as possible of these variables.
3. Cycling
Does the simplex method terminate after finitely many pivots? The an-
swer is a qualified “yes.” If no care is taken in the choice of the entering vari-
able and the pivot row, the simplex method can keep on pivoting forever.
If care is taken, the simplex method is guaranteed to be finite. This section
describes the difficulty that can arise and shows how to avoid it.
The difficulty
• Each nondegenerate simplex pivot changes the basis, changes the basic
solution, and improves its objective value.
• Each degenerate pivot changes the basis, but does not change any RHS
values, hence causes no change in the basic solution or in the basic so-
lution’s objective value.
previously. Also, since there are finitely many bases, only finitely many
nondegenerate simplex pivots can occur prior to termination.
Whether or not the simplex method cycles depends on how the ambigui-
ty in its pivot rule is resolved. The entering variable can be any variable whose
reduced cost is positive in a maximization problem, negative in a minimiza-
tion problem. The pivot row can be any row whose ratio is smallest.
Rule A
• The pivot row has the lowest ratio. Ties, if any, are broken by picking
the row whose basic variable is listed farthest to the left.
A cycle
Rule A can cycle. In fact, when Rule A is applied to the linear program
in Table 6.4, it does cycle. After six degenerate pivots, the tableau in rows 3-6
reappears.
An anti-cycling rule
Abraham Charnes was the first to publish a rule that precludes cycling.
The key to his paper, published in 1952¹, was to pivot as though the RHS
values were perturbed in a way that breaks ties. Starting with a basic feasible
tableau (either in Phase I or in Phase II), imagine that the RHS value of the
1st non-trite constraint is increased by a very small positive number ε, that
the RHS value of the 2nd non-trite constraint is increased by ε 2 , and so forth.
Standard results in linear algebra make it possible to demonstrate that, for
all sufficiently small positive values of ε, there can be no tie for the smallest
ratio. Consequently, each basic feasible solution to the perturbed problem
equates each basic variable (with the possible exception of –z) to a positive
value. In the perturbed problem, each simplex pivot is nondegenerate. This
guarantees that the simplex method cannot cycle. Termination must occur
after finitely many pivots.
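For intuition, suppose that the entering variable has the coefficients 2 and 3 in the 1st and 2nd non-trivial rows, whose RHS values are 4 and 6. (This small illustration is hypothetical; it is not drawn from Table 6.4.) The unperturbed ratios are 4/2 = 6/3 = 2, a tie. The perturbed ratios are (4 + ε)/2 = 2 + ε/2 and (6 + ε²)/3 = 2 + ε²/3, and for every sufficiently small positive ε the latter is strictly smaller, so the tie vanishes.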
The perturbation argument that Charnes pioneered has had a great many
uses in optimization theory. From a computational viewpoint, perturbation is
unwieldy, however. Integrating it into a well-designed computer code for the
simplex method requires extra computation that slows down the algorithm.
A simple cycle-breaker
Rule B
• The entering variable is the variable, among those whose reduced costs have the favorable sign, that is listed farthest to the left.
• The pivot row has the smallest ratio. Ties are broken by picking the row whose basic variable is listed farthest to the left.
Rule B is often called Bland's rule², in his honor. Proving that Bland's rule precludes cycles is a bit involved. By contrast, incorporating it in an efficient computer code is easy, and it adds only slightly to the computational burden.
Bland’s rule can be invoked after encountering a large number of consecutive
degenerate pivots.
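For readers who care to experiment away from the spreadsheet, the sketch below shows one way Rule B's two choices might be coded. It is a minimal illustration, not the book's software; the tableau layout (reduced costs in row 0, RHS values in the last column) and every name in it are assumptions.

```python
import numpy as np

def blands_pivot(tableau, basis):
    """Return (pivot column, pivot row) under Rule B, or None if no entering
    variable exists. Row 0 holds the reduced costs (maximization); the last
    column holds the RHS values; basis[i] is the index of the variable that
    is basic for row i."""
    reduced = tableau[0, :-1]
    improving = np.where(reduced > 1e-9)[0]
    if improving.size == 0:
        return None                      # optimality condition
    col = improving[0]                   # leftmost improving variable enters
    best = None                          # (ratio, row)
    for i in range(1, tableau.shape[0]):
        a = tableau[i, col]
        if a > 1e-9:
            ratio = tableau[i, -1] / a
            # smallest ratio wins; ties go to the leftmost basic variable
            if (best is None or ratio < best[0] - 1e-9 or
                    (abs(ratio - best[0]) <= 1e-9 and basis[i] < basis[best[1]])):
                best = (ratio, i)
    return None if best is None else (col, best[1])
```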
Initially, it was not clear whether the simplex method could cycle if no special care was taken to break ties for the entering variable and the pivot row. George Dantzig asked Alan Hoffman to figure this out. In 1951, Hoffman found an example in which Rule A cycles. The data in Hoffman's example entail elementary trigonometric functions (sin φ, cos φ, and so forth). In his memoirs³, Hoffman reports on this discovery.
² Robert G. Bland, "New finite pivot rules for the simplex method," Mathematics of Operations Research, V. 2, pp. 103-107, 1977.
³ Page 171 of Selected Papers of Alan Hoffman with Commentary, edited by Charles Micchelli, World Scientific, River Edge, NJ.
Charnes was the first to publish an anti-cycling rule, but he may not have been the first to devise one. In his 1963 text, George Dantzig⁴ wrote,
“Long before Hoffman discovered his example, simple devices were pro-
posed to avoid degeneracy. The main problem was to devise a way to avoid
degeneracy with as little extra work as possible. The first proposal along
these lines was presented by the author in the fall of 1950 …. Later, A. Or-
den, P. Wolfe and the author published (in 1954) a proof of this method
based on the concept of lexicographic ordering ….”
Perturbation and lexicographic ordering are two sides of the same coin; they lead to the same computational procedure, and that procedure is a bit unwieldy.
Form 2
In Problem B, the decision variables d, e and f are free; they can take any
values.
Getting started
The tableau in rows 12-20 of Table 6.5 results from pivoting on the coef-
ficients of d, e and f in rows 4, 5 and 6, respectively. This tableau is basic. Its
basic solution is feasible because d, e and f are allowed to take any values.
Once a free variable becomes basic, the RHS value of the equation for
which it is basic can have any value, positive, negative or zero. To keep a
free variable basic, compute no ratio for the row for which it is basic. To
keep d, e and f basic for the equations represented by rows 14, 15 and 16 of
Table 6.5, we’ve placed “none” in cells N14, N15 and N16. In this example
and in general:
After a free variable becomes basic, compute no ratio for the equation
for which it is basic. This keeps the free variable basic, allowing it to
have any sign in the basic solution that results from each simplex pivot.
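As a concrete rendering of this rule, here is a hypothetical sketch of the ratio test with free basic variables skipped; the tableau conventions match the earlier sketch and are assumptions, not the book's code.

```python
def pivot_row(tableau, basis, col, free_vars):
    """Ratio test that computes no ratio for a row whose basic variable is
    free, so free variables stay basic. Returns None for an unbounded LP."""
    best_row, best_ratio = None, None
    for i in range(1, tableau.shape[0]):
        if basis[i] in free_vars:
            continue                     # the "none" entries of Table 6.5
        a = tableau[i, col]
        if a > 1e-9:
            ratio = tableau[i, -1] / a
            if best_ratio is None or ratio < best_ratio:
                best_row, best_ratio = i, ratio
    return best_row
```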
Rows 13-20 of Table 6.5 are a basic feasible tableau with which to initiate
Phase II, and its first pivot occurs on the coefficient of c in row 18. Pivoting con-
tinues until the optimality condition or the unboundedness condition occurs.
Problem B fails to illustrate one situation that can arise: in a basic feasible tableau for a linear program that has been written in Form 2, a free variable can be nonbasic. If its reduced cost is nonzero, such a variable can be made basic by a simplex pivot; if not, it can be left nonbasic.
• In either case, compute no ratio for any row whose basic variable is free.
Needless work
5. Speed
Typical behavior
The best codes of the simplex method quickly solve practical linear programs having m and n in the thousands or tens of thousands. No one really understands why the simplex method is as fast as it is. On carefully-constructed examples (one of which appears as Problem 5), the simplex method is exceedingly slow. Any attempt to argue that the simplex method is fast "on average" must randomize in a way that makes bad examples occur with minuscule probability.
moreover, that the quality of the fit is quite good. Expression (7) is strikingly close to (m + n)/2.
Atypical behavior
The simplex method does not solve all problems quickly. In their 1972 paper, Klee and Minty⁷ showed how to construct examples having m equations and 2m decision variables for which Rule A requires 2^m − 1 pivots. (Problem 5 presents their example for the case m = 3.) Even at the (blazing) speed of one million pivots per second, solving a Klee-Minty example with m = 100 by Rule A would take on the order of 10^16 years, vastly longer than the universe has existed.
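The arithmetic behind that estimate is easy to check; the few lines below are mine, not the book's.

```python
m = 100
pivots = 2**m - 1                       # pivots Rule A needs on Klee-Minty
seconds = pivots / 1_000_000            # at one million pivots per second
years = seconds / (365.25 * 24 * 3600)
print(f"{years:.1e} years")             # roughly 4e16 years
```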
A conundrum
⁵ Vanderbei, Robert J., Linear Programming: Foundations and Extensions, Kluwer Academic Publishers, Boston, Mass., 1997.
⁶ Gay, D., "Electronic mail distribution of linear programming test problems," Mathematical Programming Society COAL Newsletter, V. 13, pp. 10-12, 1985.
⁷ V. Klee and G. J. Minty, "How good is the simplex algorithm?" In O. Shisha, editor, Inequalities III, pp. 159-175, Academic Press, New York, NY, 1972.
A partial resolution of this conundrum is provided by the "smoothed analysis" of Daniel Spielman and Shang-Hua Teng, which has won both the Gödel and the Fulkerson prizes⁸.
Interior-point methods
⁸ Spielman, D. and S.-H. Teng, "Smoothed analysis of algorithms: Why the simplex method usually takes polynomial time," Journal of the ACM, V. 51, pp. 385-463, 2004.
⁹ L. G. Khachian, "A polynomial algorithm in linear programming," Soviet Mathematics Doklady, V. 20, pp. 191-194, 1979.
¹⁰ N. Karmarkar, "A new polynomial-time algorithm for linear programming," Proceedings of the 16th Annual Symposium on Theory of Computing, ACM, New York, pp. 302-311, 1984.
AT&T weighs in
A business unit
¹¹ Fiacco, A. V. and G. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968; reprinted as Classics in Applied Mathematics, Volume 4, SIAM, Philadelphia, Pa., 1990.
¹² Dikin, I. I., "Iterative solution of problems of linear and quadratic programming," Soviet Math. Doklady, V. 8, pp. 674-675, 1967.
Seminal work
What’s best?
For extremely large linear programs, the best of the interior-point methods might run a bit faster than the simplex method. The simplex method enjoys
an important advantage, nonetheless. In Chapter 13, we will see how to solve
an integer program by solving a sequence of linear programs. The simplex
¹³ Renegar, J., "A polynomial-time algorithm, based on Newton's method, for linear programming," Mathematical Programming, V. 40, pp. 59-93, 1988.
¹⁴ Ye, Yinyu, Michael J. Todd and Shinji Mizuno, "An O(√n L)-iteration homogeneous and self-dual linear programming algorithm," Mathematics of Operations Research, V. 19, pp. 53-67, 1994.
method is far better suited to this purpose because it finds an optimal solu-
tion that is an extreme point; interior-point methods find an extreme point
only if the optimum solution is unique.
6. Review
The simplex method can cycle, and cycles can be avoided. Bland’s method
for avoiding cycles is especially easy to implement. Even so, the perturbation
method of Charnes (equivalently, the lexicographic method of Dantzig) has
proved to be a useful analytic tool in a number of settings.
Decision variables that are not constrained in sign are easy to accommo-
date within the simplex method. Once a free variable is made basic, it is kept
basic by computing no ratio for the equation for which it is basic.
Modern interior-point methods may run a bit faster than the simplex
method on enormous problems, but the simplex method remains the method
of choice, especially when integer-valued solutions are sought.
1. (Phase I) In Step 2 of Phase I, would any harm be done by giving the arti-
ficial variable α a coefficient of –1 in every equation other than the one for
which –z is basic?
2. (Phase I) For the tableau in rows 35-39 of Table 6.3, rows 37 and 38 tie for the smallest ratio. Execute a pivot on the coefficient of r in row 37. Does this result in a basis that includes α and whose basic solution sets α = 0? If so, indicate how to remove α from the basis and construct a basic feasible tableau with which to initiate Phase II.
4. (Phases I and II) Consider this linear program: Maximize {2x + 6y}, subject to the constraints

2x − 5y ≤ −3,
4x − 2y + 2z ≤ −2,
1x + 2y ≤ 4,
x ≥ 0, y ≥ 0, z ≥ 0.
(a) For this example, execute the simplex method with Rule A. (You will
need seven pivots.)
(b) For each extreme point encountered in part (a), record the triplet (x, y, z).
(c) Plot the triplets you recorded in part (b). Identify the region of which they are the extreme points. Does it resemble a deformation of the unit cube? Could you have gotten from the initial extreme point to the final extreme point with a single simplex pivot?
(d) What do you suppose the comparable example is for the case m = 2? Have you solved it?
(e) Write down, but do not solve, the comparable example for m = 4.
8. In Rule B, ties are broken by picking the variable that is farthest to the left.
Would it work equally well to pick the variable that is farthest to the right?
9. The idea that motivates Charnes's perturbation scheme is to resolve the ambiguity in the variable that will leave the basis by perturbing the RHS values by minuscule amounts, but in a nonlinear way. The tableau that appears below reproduces rows 2-6 of Table 6.4, with the dashed line representing the "=" signs and with the quantity ε^j added to the jth constraint, for j = 1, 2, 3.
(a) Execute Charnes’s pivot rule (for maximization) on this tableau, se-
lecting the nonbasic variable whose reduced cost is most positive as
the entering variable.
(b) Identify the first pivot at which Charnes’s rule selects a different pivot
element than does Rule A.
(c) Complete and justify the sentence: If a tie were to occur for the smallest ratio when Charnes's pivot rule is used, two rows would need to have coefficients of ε¹, ε², and ε³ that are _________, and that cannot occur because elementary row operations keep ______ rows ______.
10. Cycling can occur in Phase I. Cycling in Phase I can be precluded by Rule B or by Charnes's perturbation scheme. At which of the six steps of Phase I would Charnes perturb the RHS values? Which RHS values would he perturb?
11. Consider a linear program that is written in Form 1 and is feasible and
bounded. By citing (but not re-proving) results in this chapter, demon-
strate that this linear program has a basic feasible solution that is optimal.
12. (free variables) This problem concerns the maximization problem that is
described by rows 12-20 of Table 6.5, in which d, e and f are free.
(a) On a spreadsheet, execute the simplex method with Rule A, but compute no ratios for the rows whose basic variables are free.
(b) Did any of the free variables switch sign? If so, what would have oc-
curred if this problem had been forced into Form 1 prior to using
Rule A? Remark: Part (b) requires no computation.
13. (free variables) The tactic by which free variables are handled in Section 4
of this chapter is to make them basic and keep them basic. Here’s an alter-
native:
(i) After making a free variable basic, set aside this variable, and set aside
the equation for which it just became basic. (This reduces by one the
number of rows and the number of columns.)
(ii) At the end, determine the values taken by the free variables from the
values found for the other variables.
Does this work? If it does work, why does it work? And how would you determine the "values taken by the free variables"?
14. (extreme points and free variables) A feasible solution to a linear program
is an extreme point of the feasible region if that feasible solution is not
a convex combination of two other feasible solutions. Consider a linear
program that is written in Form 2. Suppose this linear program is feasible
and bounded. Is it possible that no extreme point is an optimal solution?
Hint: can a feasible region have no extreme points?
Part III – Selected Applications
This chapter is built upon 10 examples. When taken together, these ex-
amples suggest the range of uses of linear programs and their generalizations.
These examples include linear programs, integer programs, and nonlinear
programs. They illustrate the role of optimization in operations management
and in economic analysis. Uncertainty plays a key role in several of them.
Also discussed in this chapter are the ways in which Solver and Premium
Solver can be used to solve problems that are not linear.
Described in this chapter are "network flow" models and the uses to which they can be put. If the "fixed" flows in such a model are integer-valued, the simplex method is shown to find an integer-valued optimal solution.
Chapter 7: A Survey of Optimization Problems
1. Preview
Three sections of this chapter are starred. The starred sections delve into
probabilistic modeling. These starred sections present all of the “elementary”
probability that they employ, but readers who are new to that subject may find
those sections to be challenging.
A tailored spreadsheet
A linear program
The functions in cells E18 and F18 compute the shipping and pro-
duction costs. Solver has been asked to minimize the quantity in cell G18,
which is the total cost. Its changing cells are the shipping quantities in cells
D12:G14. Its constraints are H12:H14 ≤ H5:H7 (production quantities cannot
exceed production capacities), D15:H15 = D9:H9 (demands must be met)
and D12:G14 ≥ 0 (shipping quantities must be nonnegative). Table 7.1 reports
the optimal solution to this linear program.
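The same linear program can be stated algebraically and handed to an LP solver. The sketch below uses SciPy; the supplies and demands come from the text, but the unit shipping costs are placeholders, since Table 7.1's cost entries are not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

supply = [250, 400, 350]            # capacities of fields U, V, W
demand = [200, 300, 250, 150]       # requirements of refineries 1-4
cost = np.array([[4, 5, 6, 8],      # hypothetical $/unit shipped
                 [5, 4, 7, 6],
                 [6, 7, 4, 5]], dtype=float)

m, n = cost.shape
# Variable x[i, j] = flow from field i to refinery j, flattened row-major.
A_ub = np.zeros((m, m * n))         # each field's shipments <= its capacity
for i in range(m):
    A_ub[i, i * n:(i + 1) * n] = 1
A_eq = np.zeros((n, m * n))         # each refinery's receipts == its demand
for j in range(n):
    A_eq[j, j::n] = 1

res = linprog(cost.ravel(), A_ub=A_ub, b_ub=supply,
              A_eq=A_eq, b_eq=demand, bounds=(0, None))
print(res.x.reshape(m, n), res.fun)
```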
A coincidence?
Figure 7.1 has 7 nodes, one for each field and one for each refinery. The node for field U accounts for the production in that field and for its shipment to the four refineries. The node for refinery 1 accounts for the demand at that refinery and the ways in which this demand can be satisfied. The flow into node U cannot exceed 250, which is the capacity of field U, and the flow out of node 1 must equal 200, which is the demand at refinery 1.
[Figure 7.1: A network for the oil-shipment problem, with supply nodes U, V and W (fixed inflows of at most 250, 400 and 350) and demand nodes 1, 2, 3 and 4 (fixed outflows of exactly 200, 300, 250 and 150).]
Problem 7.B. (Olde England). In an early era, developing nations shifted their economies from agriculture toward manufacturing. Olde England had three principal technologies, which were the production of food, yarn and clothes. It traded the inputs and outputs of these technologies with other countries. In particular, it exported the excess (if any) of yarn production over internal demand.
The Premier asked you to determine the production mix that would
maximize the net value of exports for the coming year. Your first step was
to accumulate the “net output” data that appear in cells B4:D10 of Table 7.2.
Column B records the net output for food production; evidently, producing
each unit of food requires that Olde England import £0.50 worth of goods
(e.g., fertilizer), consume 0.2 units of food (e.g., fodder to feed to animals),
consume 0.5 units of labor, and use 0.9 units of land. Column C records the
net outputs for yarn production; producing each unit of yarn requires that
Olde England import £1.25 worth of goods, consume 1 unit of labor, and use
1.5 units of land. Column D records the net outputs for clothes production;
producing each unit of clothes requires the nation to import £5.00 worth of
goods, consume 1 unit of yarn, and consume 4 units of labor.
Cells J5:J7 record the levels of internal consumption of food, yarn and clothes, respectively; in the coming year, Olde England will consume 11.5 million units of food, 0.6 million units of yarn and 1.2 million units of clothes.
Cells J9:J12 record the nation’s capacities, which are 65 million units of labor
and 27 million units of land, as well as the capability to produce yarn at the
rate of 10.2 million units per year and clothes at the rate of 11 million units
per year.
Row 4 records the world market prices of £3 per unit for food, £10 per
unit for yarn and £16 per unit for clothes. The amounts that Olde England
imports or exports will have negligible effect on these prices.
Decision variables
This activity analysis has two types of decision variables. The symbols FP,
YP and CP stand for the quantity of food, yarn and clothes to produce in the
coming year. The symbols FE, YE and CE stand for the net exports of food,
yarn and clothes during the coming year. The unit of measure of each of these
quantities is millions of units per year. The production quantities FP, YP and CP must be nonnegative, of course. The net export quantities FE, YE and CE can have any sign; setting FE = −1.5 accounts for importing 1.5 million units of food next year, for instance.
A linear program
The linear program whose data appear in Table 7.2 maximizes the net
value of exports. Column H contains the usual sumproduct functions. Cell
H4 measures the contribution (value of net exports). Rows 5-7 account for
the uses of food, yarn and clothes. Rows 9-10 account for the uses of land and labor. Row 11 accounts for the loom capacity, and row 12 accounts for the clothes-making capacity.
Evidently, the net trade balance is maximized by making full use of the
land, full use of the capacity to weave yarn, and full use of the capacity to pro-
duce clothes. Clothes are exported. The nation produces most, but not all, of
the food and yarn it requires.
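For readers who want the algebraic statement, here is a sketch of this activity analysis in SciPy. The coefficients come from the data just described; treating each balance row as an equation (net output of each good equals internal consumption plus net exports) is my reading of Table 7.2, not a reproduction of it.

```python
from scipy.optimize import linprog

# Variable order: [FP, YP, CP, FE, YE, CE] (millions of units per year).
# Maximize 3 FE + 10 YE + 16 CE - (0.5 FP + 1.25 YP + 5 CP), i.e., export
# revenue minus the cost of imported goods; linprog minimizes the negative.
c = [0.50, 1.25, 5.00, -3.0, -10.0, -16.0]
A_eq = [[0.8, 0.0,  0.0, -1,  0,  0],   # food:    0.8 FP - FE = 11.5
        [0.0, 1.0, -1.0,  0, -1,  0],   # yarn:    YP - CP - YE = 0.6
        [0.0, 0.0,  1.0,  0,  0, -1]]   # clothes: CP - CE = 1.2
b_eq = [11.5, 0.6, 1.2]
A_ub = [[0.5, 1.0, 4.0, 0, 0, 0],       # labor <= 65
        [0.9, 1.5, 0.0, 0, 0, 0]]       # land  <= 27
b_ub = [65, 27]
bounds = [(0, None), (0, 10.2), (0, 11),          # YP, CP capacities
          (None, None), (None, None), (None, None)]  # net exports are free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x, -res.fun)   # land and both capacities bind, as the text reports
```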
Activity analyses like this one make it easy to respond to a variety of “what
if ” questions. Here are a few: What would occur if Olde England decided that
it ought to be self-sufficient as concerns food? Would it pay to increase the
capacity to produce yarn? What would occur if the market price of clothes
decreased by 20%?
The phrase "activity analysis" was first used by Tjalling Koopmans; its initial appearance is in the title¹ of the proceedings of a famous conference that he organized shortly after George Dantzig developed the simplex method.
Well before that time (indeed, well before any digital computers existed) Was-
sily Leontief (1905-1999) built large input-output models of the American
economy and used them to answer “what if ” questions. Leontief received the
Nobel Prize in 1973 “for the development of the input-output method and for
its application to important economic problems.” As Leontief had observed,
an activity analysis is the natural way in which to describe the production
side of a model of an economy that is in general equilibrium. One such model
appears in Chapter 14.
The functions in row 8 of this table compute the mean (expected) rate of
return on each asset; these are 5%, 3%, and 4%, respectively.
A portfolio
The function in cell G11 computes the expectation E(R) of the return on
this portfolio. The functions in cells H4 through H6 compute the difference
Râ•›−â•›E(R) between the return R and its expectation if states a, b and c occur.
The function in cell H11 computes Var(R) because the variance equals the
expectation of the squared difference between the outcome and the mean.
Efficiency
Problem 7.C, part (a). For the data in Table 7.3, find the minimum-variance portfolio whose expected rate of return is at least 3%.
It is not difficult to show (we omit this) that Var(R) is a convex quadratic function of the fractions invested in the various assets. For that reason, minimizing Var(R) subject to a constraint that keeps E(R) from falling below a prescribed bound is a garden-variety (easily solved) nonlinear program.
The GRG nonlinear code has been used to minimize the variance in the
return (cell H9) with the fractions invested in the three assets (cells D9:F9) as
the changing cells, subject to constraints that keep the fractions nonnegative,
keep their total equal to 1, and keep the mean return (cell G9) at least as large
as the number in cell C9. This portfolio invests roughly 47% in asset 2 and
roughly 53% in asset 3. It achieves a mean return rate of 3.53%. The standard
deviation in its rate of return is roughly 0.005. Evidently, if an investor seeks
a higher mean rate of return than 3.53%, she or he must accept more risk (a
higher variance, equivalently, a higher standard deviation).
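A sketch of the same minimization, away from the spreadsheet, appears below. The per-state returns are hypothetical (Table 7.3 is not reproduced here); they are chosen only so that the mean returns are 5%, 3% and 4%, and the three states are assumed equally likely.

```python
import numpy as np
from scipy.optimize import minimize

R = np.array([[0.11, 0.01, 0.07],    # returns of assets 1-3 in state a
              [0.05, 0.03, 0.04],    # state b
              [-0.01, 0.05, 0.01]])  # state c (hypothetical data)
p = np.full(3, 1 / 3)                # state probabilities
mu = p @ R                           # mean return of each asset
dev = R - mu
Sigma = dev.T @ (p[:, None] * dev)   # covariance matrix of the returns

def variance(f):                     # Var(R) for portfolio fractions f
    return f @ Sigma @ f

cons = ({"type": "eq", "fun": lambda f: f.sum() - 1.0},
        {"type": "ineq", "fun": lambda f: f @ mu - 0.03})
res = minimize(variance, np.full(3, 1 / 3), bounds=[(0, 1)] * 3,
               constraints=cons)
print(res.x, res.x @ mu, variance(res.x))
```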
The set of all pairs [E(R), Var(R)] for efficient portfolios is called the
efficient frontier. If a rational decision maker accepts E(R) as the measure of
desirability and Var(R) as the measure of risk, that person chooses a portfolio
on the efficient frontier. If a portfolio is not on the efficient frontier, some
other portfolio is preferable.
Problem 7.C, part (b). For the data in Table 7.3, find the portfolios that are on the efficient frontier.
No asset returns more than 5%, so placing a value greater than 0.05 in
cell C9 guarantees infeasibility. To find a family of portfolios that are on the
efficient frontier, one can repeat the calculation whose result is exhibited in
Table 7.5 with the number in cell C9 equal to a variety of values between 0.03
and 0.05. There is a technical difficulty, however.
Suppose we solve the NLP with 0.034 in cell C9, then change that number to 0.038, and then solve again. The new solution replaces the old one.
This difficulty has been anticipated. Row 9 contains all of the information we
might want to keep from a particular run. Before making the 2nd run, “Copy”
row 9 onto the Clipboard and then use the Paste Special command to put only
its “Values” in row 14. After changing the entry in cell C9 and re-optimizing,
use the Paste Special command to put the new “Values” in row 15. And so
forth. Reported in Table 7.6 is the result of a calculation done with values of
C9 between 0.03 and 0.05 in increments of 0.004.
Piecewise linearity
These portfolios exhibit piecewise linearity. As the mean rate of return in-
creases from 3.53% to 4.34%, the portfolio varies linearly. When the mean rate
of return reaches 4.34%, the fraction invested in asset 3 decreases to 0. As the
rate of return increases from 4.34% to 5%, the portfolio again varies linearly,
with f3 = 0 in this interval. Evidently, as the mean return rate increases, the
optimal portfolio “pivots” from one extreme point to another. This is the sort
of behavior that one expects in the optimal solution to a linear program. One is
led to wonder whether this nonlinear program is mimicking a linear program.
Suppose you wish to solve the portfolio optimization with cell C9 (the
lower bound on the mean return) equal to the 11 equally-spaced values 0.03,
0.032, 0.034, …, 0.05. To do so, follow this protocol:
• Next, return to the Model tab on the dialog box to the right of Figure 7.2. Then click on the row containing the variables, cells D9:F9 in this case. Make sure that the Monitor Value of these cells is set equal to True. (If it is set equal to False, switch it.)
• Finally, either click on the green triangle to the right of the dialog box that is displayed to the right of Figure 7.2, or click on Optimize in the drop-down menu to the left of Figure 7.2. Either action causes Premium Solver to solve the 11 optimization problems that you have specified.
You can then scroll through the solutions to these optimization problems
by clicking on the window in a ribbon that currently reads “Opt. 11.” You can
also create a chart by clicking Charts on the drop-down menu.
The ribbon
Measures of risk
These two measures of risk share a defect; Var(R) and MAD(R) place large penalties on outcomes that are far better than the mean. It might make still better sense to minimize the expectation of the amount by which the mean exceeds the outcome, i.e., to accept E[(μ − R)+] as the measure of risk.
The ideas and results in this section were developed by Harry Markow-
itz while he was a Ph. D. student at the University of Chicago. He published a
landmark paper in 1952, and he shared in the 1990 Nobel Prize in Economics,
which was awarded for “pioneering work in the theory of financial economics.”
Problem 7.D. This problem appends to the ATV problem in Chapter 5 the
possibility of leasing tools that improve efficiency and thereby lower manu-
facturing costs. Tools α and β facilitate more efficient manufacture of Fancy
and Luxury model vehicles, respectively. Leasing tool α costs $1,800 per week,
and this tool reduces the cost of manufacturing each Fancy model vehicle by
$120. Similarly, leasing tool β costs $3,000 per week, and that tool reduces the
cost of producing each Luxury model vehicle by $300. The goal remains un-
changed; it is to operate the ATV plant in a way that maximizes contribution.
What production rates accomplish this?
Binary variables
An integer program
Throughout this text – and throughout much of the literature – the term integer program is used to describe an optimization problem that would be a linear program if the requirement that some or all of its decision variables be integer-valued were deleted. An integer program can have no quadratic terms, for instance.
for instance. It might be more precise to describe this type of optimization
problem as an “integer linear program,” but that usage never took root. Two
different methods for solving integer programs are discussed in Chapter 14.
Both of these methods solve a sequence – often surprisingly short – of linear
programs.
Break-even values
Binary variables will be used to model the leasing of these tools. Equating
the binary variable a to 1 corresponds to leasing tool α. Equating the binary
variable b to 1 corresponds to leasing tool β. Our goal is to formulate Problem
7.D as an optimization problem that differs from a linear program only in that
the variables a and b are binary.
The binary variable b accounts in a similar way for leasing the tool that
reduces the cost of producing Luxury model vehicles.
A spreadsheet
After you formulate your integer program, but before you click on the
Solver button:
No shadow prices?
First, consider the case in which all of the decision variables must be
integer-valued. In this case, shadow prices cannot exist because perturbing
a RHS value by a small amount causes the optimization problem to become
infeasible.
Next, consider the case in which only some of the decision variables must
be integer-valued. In this case, perturbing a RHS value may preserve feasibil-
ity, but it may cause an abrupt change in the objective value. When that oc-
curs, the shadow price cannot exist.
Problem 7.D illustrates this phenomenon. It is not hard to show that the feasible solution S = 35, F = 0 and L = 15 is a local maximum. Perturbing this solution by setting L = 1 decreases the objective value by $50, for instance. If the GRG code encounters this feasible solution, it will stop; it has found a local maximum that is not a global maximum.
The data in the “traveling salesperson problem” are the number of cities
that the salesperson is to visit and the travel times from city to city. A tour oc-
curs if the salesperson starts at one of these cities and visits each of the other
cities exactly once prior to returning to the city at which he or she began. The
length of the tour is the sum of the times it takes to travel from each city to
the next. The traveling salesperson problem is that of finding a tour whose length is smallest. It may sound a bit contrived, but it arises in a variety of contexts, including the machine-scheduling problem that follows.
Problem 7.E (scheduling jobs). Five different jobs must be done on a single machine. The time needed to perform each job is independent of the job that preceded it, but the time needed to reset the machine to perform each job does vary with the job that preceded it. Rows 3 to 9 of Table 7.8 specify the times needed to reset the machine to accomplish each of the five jobs. "Job 0" marks the start, and "job 6" marks the finish. Each reset time is given in minutes. This table shows, for instance, that doing job 1 first entails a 3-minute setup and that doing job 4 immediately after job 1 entails a 17-minute reset time. Reset times of 100 minutes represent job sequences that are not allowed. The goal is to perform all five jobs in the shortest possible time, equivalently, to minimize the sum of the times needed to set up the machine to perform the five jobs.
This subsection describes a solution method that uses the Standard Evo-
lutionary Solver, which exists only in Premium Solver. If you do not have ac-
cess to Premium Solver, please skip to the next subsection.
The traveling salesperson problem has been widely studied, and several different methods of solution have been found to work well even when the number n of cities is fairly large. One of these methods is based on the "assignment problem." A network flow model is called an assignment problem if it has 2m nodes and m² directed arcs with these properties:
• The network has m “supply” nodes, with a fixed flow of 1 into each sup-
ply node.
• The network has m “demand” nodes, with a fixed flow of 1 out of each
demand node.
• It has a directed arc pointing from each supply node to each demand
node. The flows on these arcs are nonnegative.
• Each cell in the array D12:I17 contains the shipping quantity from the
“supply node” in its row to the “demand node” in its column.
Solver had been asked to find the least-cost assignment. This assignment ships one unit out of each supply node and one unit into each demand node. The solution to this assignment problem is not reported in Table 7.9. With x(i, j) as the flow from source node i to demand node j, the least-cost assignment sets x(0, 2) = x(2, 1) = x(1, 4) = x(4, 6) = 1 and x(3, 5) = x(5, 3) = 1, with all other flows equal to 0.
Subtours
This optimal solution identifies the job sequences 0-2-1-4-6 and 3-5-3.
Neither of these is a tour. These job sequences correspond to subtours because
neither of them includes all of the jobs (cities in case of a traveling salesperson
problem).
An optimal solution
We could have imposed the constraint that eliminates the other subtour. That constraint is x(0, 2) + x(2, 1) + x(1, 4) + x(4, 6) ≤ 3.
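The assignment relaxation itself is easy to solve by machine. The sketch below uses SciPy's linear_sum_assignment on a hypothetical reset-time matrix (only the 3-minute setup for job 1, the 17-minute reset from job 1 to job 4, and the 100-minute forbidden entries are taken from the text); it then traces the cycle through job 0 to expose any subtours. Imposing a subtour-elimination constraint and re-solving, as in the text, requires an integer-programming solver rather than this routine.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 100.0          # reset times of 100 minutes mark forbidden sequences
# cost[i][j] = minutes to reset for job j when it immediately follows job i;
# job 0 is the start and job 6 the finish. Entries are hypothetical except
# cost[0][1] = 3 and cost[1][4] = 17, which the text fixes.
cost = np.array([
    [BIG,   3,   2,   4,   5,   6, BIG],
    [BIG, BIG,   7,   9,  17,   8,   4],
    [BIG,   6, BIG,   8,   5,   9,   5],
    [BIG,   9,   7, BIG,   6,   2,   6],
    [BIG,   5,   6,   4, BIG,   7,   3],
    [BIG,   8,   4,   3,   5, BIG,   4],
    [0.0, BIG, BIG, BIG, BIG, BIG, BIG],  # finish "returns" to start free
])

rows, cols = linear_sum_assignment(cost)
succ = dict(zip(rows, cols))             # successor of each job
tour, j = [0], succ[0]
while j != 0:                            # trace the cycle through job 0
    tour.append(j)
    j = succ[j]
print(tour, "subtour jobs:", set(range(7)) - set(tour))
```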
This section discusses a subject with which every college student is famil-
iar. This section is starred because readers who have not studied elementary
probability may find it to be challenging.
Problem 7.F. You are the Dean of Admissions at a liberal arts college that
has a strong academic tradition and has several vibrant sports programs. You
seek a freshman class of 510 persons. An agreement has been reached with the
head coach of each of several sports. These agreements allow each coach to
admit a limited number of academically-qualified applicants who that coach
seeks to recruit for his or her team. The coaches have selected a total of 280
such persons. From past data, you estimate that each of these 280 people will
join the entering class with probability of 0.75, independent of the others.
Your college has no dearth of qualified applicants. From past experience, you
estimate that each qualified person you accept who has not been selected (and
courted) by a coach will join the entering class with probability of 0.6. Your
provost is willing to risk one chance in 20 of having an entering class that
is larger than the target of 510. How many offers should you make to non-
athletes? What is the expectation of the number of students who will join the
freshman class?
In particular, the number A of athletes who will join the class has the binomial distribution with parameters n = 280 and p = 0.75. Thus, the mean and variance of A are given by

E(A) = np = (280)(0.75) = 210,    Var(A) = np(1 − p) = (280)(0.75)(0.25) = 52.5.
A normal approximation
A spreadsheet
The spreadsheet in Table 7.10 evaluates the yield from the pool of athletes
and non-athletes. Cell C4 contains the number of offers to make to non-ath-
letes. This number could have been required to be integer-valued, but doing
so would make little difference. The functions in cells F3 and G3 compute
the mean and variance of the yield from the athletes. The functions in cells
F4 and G4 compute the mean and variance of the yield from the others. The
functions in cells C8, C9 and C10 compute the mean, variance and standard
deviation of the class size C. The function in cell C12 computes the probabil-
ity that C does not exceed the target of 510.
Solver has been asked to find the number in cell C4 such that C12 = C13. Evidently, you should offer admission to approximately 465 non-athletes.
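The spreadsheet's calculation is easy to replicate. In the sketch below, the class size C is treated as normal with mean 210 + 0.6n and variance 52.5 + (0.6)(0.4)n, and the number of offers n is chosen so that P(C ≤ 510) = 0.95; the search bracket is an assumption.

```python
from math import sqrt
from scipy.optimize import brentq
from scipy.stats import norm

def p_meets_target(n):
    mean = 280 * 0.75 + 0.6 * n               # expected class size
    var = 280 * 0.75 * 0.25 + 0.6 * 0.4 * n   # variance of class size
    return norm.cdf((510 - mean) / sqrt(var))

offers = brentq(lambda n: p_meets_target(n) - 0.95, 300, 600)
print(round(offers))                          # approximately 465
```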
How’s that?
Fine tuning
For example, the probability that the class size does not exceed 510 is well
approximated by the probability that the normally distributed random vari-
able C does not exceed 510.5. A slightly more precise answer to the problem
you face as the Director of Admissions can be found by making these changes
to the spreadsheet in Table 7.10:
If you make these changes, you will find that they result in a near-imper-
ceptible change in the number of non-athletes to whom admission is to be
offered.
The function in cell C14 requires explanation. The positive part (x)+ of the number x is defined by (x)+ = max{0, x}. Interpret (x)+ as the larger of x and 0. When D denotes a random variable and q is a number, (D − q)+ is the random variable whose value equals the amount, if any, by which D exceeds q.
This section is starred because readers who have not had a course in
“elementary” probability may find it to be challenging. In many of the United
States, electric utilities are allowed to produce the power required by their
customers, and they are allowed to purchase power from other utilities. Prob-
lem 7.G, below, concerns a utility that is in such a state.¹
Problem 7.G (a power plant). You are the chief engineer for a utility company.
Your utility must satisfy the entire demand for electricity in the district it serves.
The rate D at which electricity is demanded by customers in your district is un-
certain (random), and it varies with the time of day and with the season. It is
convenient to measure this demand rate, D, in units of electricity per year, rather
than units per second or per hour. The load curve specifies for each value of t
the fraction F(t) of the year during which D does not exceed t. This load curve
is known. The distribution of D is approximately normal with a mean of 1250
thousand units per year and a standard deviation of 200 thousand units per year.
Your utility has no way to store electricity. It can produce electricity efficiently
with “base load” plant or less efficiently with “peak load” plant. It can also pur-
chase electricity from neighboring utilities that have spare capacity. The “trans-
fer” price at which this occurs has been set – tentatively – at 6.20 dollars per unit
of electricity. Of this transfer price, only the fuel cost is paid to the utility provid-
ing the power; the rest accrues to the state. The transfer price is intended to be
¹ Connecticut is not such a state, and its utility rates in 2009 are exceeded only by Hawaii's.
high enough to motivate each utility to satisfy at least 98% of its annual power
requirement from its own production. The relevant costs are recorded in Ta-
ble 7.11. Annualized capital costs are incurred whether or not the plant is being
used to generate electricity. Fuel costs are incurred only for fuel that is consumed.
Your goal is to design the plant that minimizes the expected annualized
cost of supplying power to your customers. What is that cost? How much of
each type of plant should your utility possess? Will your utility produce at
least 98% of the power that its customers consume?
The plant
Base load plant is cheaper to operate (see Table 7.11), so you will not use
any peak-load plant unless your base-load plant is operating at capacity. For
the same reason, you will not purchase any electricity from other utilities un-
less your base-load and peak-load capacities are fully utilized. This leads to
the introduction of two decision variables:

• q1 = the capacity of the base-load plant.
• q2 = the combined capacity of the base-load and the peak-load plant.

The variables q1 and q2 are measured in units of electricity per year. From
Table 7.11, we see that base-load and peak-load plant have annualized capital
costs of 2.00 dollars per unit of capacity and 1.30 dollars per unit of capacity,
respectively. The annualized cost C of the plant is given by
C = 2.00 q1 + 1.30 (q2 − q1).
The electricity
The random variable (D − q2)+ equals the annualized rate at which electricity is purchased from other utilities, this being the excess of D over the total capacity of the base-load and peak-load plant. This electricity costs $6.20 per unit, so its expected annual cost equals

(3) 6.20 E[(D − q2)+].
The expectation G of the cost of the electricity itself equals the sum of expressions (1), (2) and (3). Since D has the normal distribution, each of these expressions can be found from the normal loss function.
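The normal loss function is simple to evaluate. A sketch follows: if D is Normal(μ, σ) and z = (q − μ)/σ, then E[(D − q)+] = σ[φ(z) − z(1 − Φ(z))]. The capacity q2 used below is illustrative only.

```python
from scipy.stats import norm

def normal_loss(q, mu, sigma):
    """E[(D - q)+] when D is normal with mean mu and std. deviation sigma."""
    z = (q - mu) / sigma
    return sigma * (norm.pdf(z) - z * (1 - norm.cdf(z)))

mu, sigma = 1250.0, 200.0          # thousands of units per year
q2 = 1500.0                        # an illustrative total plant capacity
expected_purchase_cost = 6.20 * normal_loss(q2, mu, sigma)
print(expected_purchase_cost)      # expected annual cost of purchased power
```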
A spreadsheet
The solution that is displayed in Table 7.12 was found with the Standard
Evolutionary Solver, and it was found quickly. If you explore this solution, you
will see that the design problem exhibits a “flat bottom.” Eliminating peak-
load plant capacity increases the annualized cost by less than 1%, for instance.
It is left for you, the reader, to explore these questions: Is the transfer price
large enough to motivate the utility to produce at least 98% of the power its cus-
tomers require? If not, what is the smallest price that would motivate it to do so?
Many retail stores face the problem of providing appropriate levels of in-
ventory in the face of uncertain demand. These stores face a classic tradeoff:
Large levels of inventory require a large cash investment. Low levels of inven-
tory risk stock-outs and their attendant costs.
A simple “base stock” model illustrates this trade-off. Let us suppose that
an item is restocked each evening after the store closes. Let us suppose that the
demands the store experiences for this item on different days are uncertain, but
are independent and identically distributed. The decision variable in this model
is the order-up-to quantity q, which equals the amount of inventory that is to be made available when the store opens each morning. This model is illustrated by
Problem 7.H (a base stock problem). You must set the stock levels of 100 dif-
ferent items. The demand for each item on each day has the Poisson distribu-
tion. The demands on different days are independent of each other. From his-
torical data, you have accurate estimates of the mean demand for each item. If
a customer’s demand cannot be satisfied, he or she buys the item from some
other store. Management has decreed that you should run out of each item
infrequently, not more than 2% of the time, but that you should not carry
excessive inventory. What is your stocking policy?
Let the random variable D denote the demand for a particular item on a particular day. Your order-up-to quantity for this item is the smallest integer q such that P(D ≤ q) is at least 0.98. Row 3 of the spreadsheet in Table 7.13 displays the optimal order quantity for items whose expected demand E(D) equals 10, 20, 40, 80, 160 and 320.
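Row 3 of Table 7.13 can be recomputed in a few lines; poisson.ppf returns exactly the smallest integer q with P(D ≤ q) ≥ 0.98.

```python
from scipy.stats import poisson

for mean in (10, 20, 40, 80, 160, 320):
    q = int(poisson.ppf(0.98, mean))     # order-up-to quantity
    print(mean, q, q - mean)             # mean demand, stock level, safety stock
```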
Safety stock
that will occur until you are able to restock. The excess of the order-up-to
quantity q over the mean demand E(D) is known as the safety stock. Row 5
of the spreadsheet in Table 7.13 specifies the safety stock for various levels of
expected demand. Row 6 shows that the safety stock grows less rapidly than
the expected demand. Row 7 shows that the safety stock is roughly propor-
tional to the square root of the expected demand.
An economy of scale
For the base stock model, the safety stock is not proportional to the mean demand. If the mean demand doubles, the safety stock grows by the factor of approximately √2, not by a factor of 2. This economy of scale is common
to nearly every inventory model. The safety stock needed to provide a given
level of service grows as the square root of the mean demand.
Problem 7.I (cash management). Mr. T does not use credit or debit cards; he spends cash at a constant rate, with a total of $2,000 each month. He obtains the cash he needs by withdrawing it from an account that
pays “simple interest” at the rate of 5% per year. His paycheck is automatically
deposited in the same account, and he is careful never to let the balance in
that account go negative. Each withdrawal requires him to spend 45 minutes
traveling to the bank and waiting in line, and he values his free time at $20/
hour. How frequently should he visit the bank, and how much should he with-
draw at each visit?
Opportunity cost
It is optimal for Mr. T to arrive at the bank with no cash in his pocket.
When Mr. T does visit the bank, he withdraws some number q of dollars. Be-
cause he spends cash at a uniform rate, the average amount of cash that he has
in his possession is q/2, and the opportunity cost of not having that amount of
cash in an account that pays 5% per year equals (0.05)(q/2).
Annualized cost
As q increases, the number of visits to the bank decreases, but the oppor-
tunity cost of the cash that is not earning interest increases. A trade-off exists.
Inventory control
(5) C(q) = AK/q + qH/2.
The EOQ
(6) (d/dq) C(q) = −AK/q² + H/2.
When q is small, the derivative is negative. As q increases, the derivative increases. As q becomes very large, the derivative approaches the positive number H/2. The optimal order quantity q* is the unique value of q for which the derivative equals 0. Equating to 0 the RHS of equation (6) produces
(7) q* = √(2AK/H).
The number q* given by (7) has been known for nearly a century as the
economic order quantity (or EOQ for short).
Bank withdrawals
For Mr. T, the annual rate of cash expenditure is A = 24,000 dollars per year, each trip to the bank costs K = (0.75)(20) = 15 dollars of his time, and each dollar held as cash forgoes H = 0.05 dollars of interest per year. Substituting these values into (7) gives q* ≅ 3,795 dollars, while the annualized cost at q* is

(8) C(q*) = √(2AKH) ≅ 189.74 dollars,

and the number A/q* of visits to the bank over the course of the year equals 6.32. Evidently, Mr. T is not troubled about having a large amount of cash in his pocket.
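A few lines of arithmetic confirm these figures.

```python
from math import sqrt

A, K, H = 24_000, 15, 0.05          # $/year spent, $/visit, interest per $ per year
q_star = sqrt(2 * A * K / H)        # equation (7): the EOQ
C_star = sqrt(2 * A * K * H)        # equation (8): annualized cost at the EOQ
print(q_star, A / q_star, C_star)   # about 3795 dollars, 6.32 visits, $189.74
```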
An economy of scale
If the annual demand doubles, equations (7) and (8) show that the economic order quantity q* and the annualized cost C(q*) increase by the factor of √2, rather than by the factor of 2. This is the same sort of economy of scale that was exhibited by the base stock model.
A flat bottom
Algebraic manipulation of the expressions for C(q) and for C(q*) produces the equation

(9) C(q)/C(q*) = (1/2) (q*/q + q/q*).
Setting q equal to √2 q* (equivalently, to q*/√2) in equation (9) gives a ratio that equals (3/4)√2, and (3/4)√2 ≅ 1.06, which exceeds the minimum by only 6%. It is emphasized:
Flat bottom: In the EOQ model, the annualized cost C(q) exceeds C(q*)
by not more than 6% as q varies by a factor of 2, between 0.707 q* and
1.414 q*.
This “flat bottom” can be good news. An EOQ model can result from
simplifying a situation that is somewhat more complex. Its flat bottom con-
notes that the simplification may have little impact on annualized cost.
A replenishment interval
μ = E[D(k)],    σ² = Var[D(k)].
In this model, the symbol A denotes the expectation of the total demand
that occurs during a 365 day year. Because demand has stationary indepen-
dent increments,
A = µ × (365/k).
Backorders
The demand D(k) that occurs during the replenishment interval can ex-
ceed the supply at the start of that period. When it does, a stock-out occurs. In
the model that is under development, it is assumed that demands that cannot
be met from inventory are backordered, that is, filled when the merchandise
becomes available.
Costs
• Each order that is placed incurs a fixed ordering cost K that is indepen-
dent of the size of the order.
For the model that has just been specified, it is reasonable – and it can
be shown to be optimal – to employ an ordering policy that is determined by
numbers r and q, where

• The number r is the reorder point.
• The number q is the reorder quantity. Each moment at which the inventory position is reduced to r, an order is placed for q units.

In this context, the quantity r − E[D(k)] = r − μ is the safety stock.
The expected level of inventory is approximately r − μ + q/2, so the expected annualized holding cost is

(10) (r − μ + q/2) H.
The number of orders placed per year is uncertain, but the average num-
ber of orders placed per year equals the ratio A/q of the expected annual de-
mand A to the size q of the order. Each order incurs cost K, and the expected
annualized ordering cost is given by the (familiar) expression
(11) KA/q.
The number of units backordered at the moment before the order is filled equals the excess [D(k) − r]+ of the demand during the k-day period over the stock level r at the moment the order is placed. Each unit that is backordered
incurs a cost that equals b, and the expected number of orders placed per year
equals A/q. Hence, the expectation of the annualized cost of backorders is
given by

(12) b (A/q) E{[D(k) − r]+}.
The optimization problem is to select values of q and r that minimize the sum
of the expressions (10), (11) and (12).
Problem 7.J (more cash management). Rachael is away at college. She and her
mom have established a joint account whose sole use is to pay for Rachael’s
miscellaneous expenses. Rachael charges these expenses on a debit card. The
bank debits withdrawals from this account immediately, and the bank credits
deposits to this account 16 days after they are made. This account pays no in-
terest, and it charges a penalty of $3 per dollar of overdraft. Rachael’s miscel-
laneous expenses have stationary independent increments, and the amount of
miscellaneous expense that she incurs during each 16-day period is approxi-
mately normal with a mean of $160 and a standard deviation of $32. Rachael
and her mom practice inventory control. When the balance in Rachael’s ac-
count is reduced to r dollars, she phones home to request that a deposit of q
dollars be credited to this account. Her mother attends to this immediately.
The transfer takes her mom 30 minutes, and she values her time at $30 per
hour. Rachael’s mom transfers this money from a conservative investment
account that returns 5% simple interest per year. What values of r and q do
Rachael and her mom choose?
A spreadsheet
Table 7.14 reports the optimal solution that was found by Solver's GRG code. It had been asked to minimize the number in cell D9 (total annualized expense) with changing cells H3 and I3. As mentioned earlier, the GRG code works best when it is initialized with reasonable values of the decision variables in the changing cells. This run of Premium Solver was initialized with the EOQ (roughly 1500) in cell H3 and with 160 (the mean demand during the replenishment interval) in cell I3. Solver reports an optimal order quantity q* = 1501 and a reorder point r* = 239, which provides a safety stock of 79 = 239 − 160.
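The spreadsheet's answer can be cross-checked by directly minimizing the sum of expressions (10), (11) and (12), using the normal loss function for the expected backorders. The sketch below is mine, not the book's; it lands close to the q* and r* that Table 7.14 reports.

```python
from scipy.optimize import minimize
from scipy.stats import norm

mu, sigma = 160.0, 32.0            # demand during a 16-day replenishment
A = mu * 365 / 16                  # expected annual demand, equals 3650
K, H, b = 15.0, 0.05, 3.0          # $/order, $/dollar-year held, $/dollar backordered

def loss(r):                       # E[(D(16) - r)+] for normal demand
    z = (r - mu) / sigma
    return sigma * (norm.pdf(z) - z * (1 - norm.cdf(z)))

def annual_cost(x):                # expressions (10) + (11) + (12)
    q, r = x
    return (r - mu + q / 2) * H + K * A / q + b * (A / q) * loss(r)

res = minimize(annual_cost, x0=[1480.0, 160.0])
print(res.x)                       # near the q* = 1501, r* = 239 of Table 7.14
```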
13. Review
The GRG Solver works best if you can initialize it with values of the deci-
sion variables that are reasonably close to the optimum. Tips on getting good
results with it can be found in Section 11 of Chapter 20. Look there if you are
having trouble.
It might have seemed, at first glance, that the simplex method and its
generalizations apply solely to optimization problems that are deterministic.
That is not so. Uncertainty plays a central role in several of the examples in
this chapter, and that is true of other chapters as well.
1. (Olde England) Write down the linear program whose spreadsheet formu-
lation is presented as Table 7.2.
3. (efficient portfolios) Redo part (a) of Problem 7.C with MAD as the mea-
sure of risk. Compare your results with those in Table 7.5.
4. (efficient portfolios) Redo part (b) of Problem 7.C with MAD as the mea-
sure of risk. Compare your results with those in Table 7.6.
(a) How many people should you place on the wait list?
(b) With what probability does your admissions policy produce a fresh-
man class that contains precisely 510 persons?
(c) What is the expected number of vacant positions in next year's freshman class?
10. (college admissions) In Problem 7.F, rework the spreadsheet to account for the suggestions in the subsection entitled "Fine Tuning." How many offers of admission will be made?
11. (a power plant) In Problem 7.G, by how much does expected annual cost increase if peak-load plant is eliminated? Hint: You might re-optimize with C7 as the only decision variable and with the function = C7 in cell C8.
12. (a power plant) In Problem 7.G, is the transfer price of 6.20 $/unit large enough to motivate the utility to satisfy at least 98% of the power its customers demand with its own production capacity? If not, how much larger does the transfer price need to be? Hint: You might wish to optimize with a variety of values in cell E4.
13. (a power plant) In Problem 7.G, suppose that base-load plant emits 1 unit
of carbon per unit of electricity produced and that peak-load plant emits
2 units of carbon per unit of electricity produced. How large a tax on car-
bon is needed to motivate the utility to produce no electricity with peak-
load plant? What impact would this tax have on the utility’s expected an-
nual cost?
(a) What is the safety stock? With what probability does Rachael incur an overdraft prior to replenishment of the account?
(b) As the reorder quantity q is varied, does the annualized cost continue to display a "flat bottom" akin to that of the EOQ model?
(c) Suppose that both the mean and the standard deviation of D(16) were
doubled, from 160 and 32 to 320 and 64. Does the optimal solution
display an economy of scale akin to that of the EOQ model?
15. (Rachael, yet again). This problem has the same data as in Problem 7.J.
Rachael’s mom has found it inconvenient to supply cash at uncertain
times. She would prefer to supply uncertain amounts of cash at pre-deter-
mined times. Rachael and her mom have revised the structure of the cash
management policy. Every t days, Rachael requests the amount needed to
raise her current bank balance to x dollars.
(c) What can you say about the balance in her account at the moment
before the deposit is credited to it?
(e) What is the probability distribution of the amount of cash that Ra-
chael’s mom transfers to her account?
(f) What would happen to the expected annualized cost of this account if Rachael's mom made a deposit every six months?
16. In a winter month, an oil refinery has contracted to supply 550,000 bar-
rels of gasoline, 700,000 barrels of heating oil and 240,000 barrels of jet
fuel. It can purchase light crude at a cost of $60 per barrel and heavy crude
at a cost of $45 per barrel. Each barrel of light crude it refines produces
0.35 barrels of gasoline, 0.35 barrels of heating oil and 0.15 barrels of jet
fuel. Each barrel of heavy crude it refines produces 0.25 barrels of gaso-
line, 0.4 barrels of heating oil and 0.15 barrels of jet fuel.
(a) Formulate and solve a linear program that satisfies the refinery’s con-
tracts at least cost.
(b) Does this refinery meet its demands for gasoline, heating oil and jet fuel exactly? If not, why not?
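One way to set up part (a) is sketched below in SciPy, with quantities in thousands of barrels; the formulation (meet or exceed each contract at least cost) is the obvious one, and the interpretation asked for in part (b) is left to the reader.

```python
from scipy.optimize import linprog

# Variables: thousands of barrels of light and heavy crude to refine.
c = [60, 45]                     # dollars per barrel of each crude
A_ub = [[-0.35, -0.25],          # gasoline:    0.35 L + 0.25 H >= 550
        [-0.35, -0.40],          # heating oil: 0.35 L + 0.40 H >= 700
        [-0.15, -0.15]]          # jet fuel:    0.15 L + 0.15 H >= 240
b_ub = [-550, -700, -240]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x, res.fun)
```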
17. (a staffing problem) Police officers work for 8 consecutive hours. They
are paid a bonus of 25% above their normal pay for work between 10 pm
and 6 am. The demand for police officers varies with the time of day, as
indicated below:
period           minimum
2 am to 6 am     12
6 am to 10 am    20
10 am to 2 pm    18
2 pm to 6 pm     24
6 pm to 10 pm    29
10 pm to 2 am    18
(b) It is not necessary in part (a) to require that the decision variables be
integer-valued. Explain why. Hint: it is relevant that 6 is an even num-
ber.
18. (a traveling salesperson). The spreadsheet that appears below specifies the
driving times in minutes between seven state capitals. Suppose that you
are currently at one of these capitals and that you wish to drive to the
other six and return where you started, spending as little time on the road
as possible.
(a) Formulate and solve an assignment problem akin to the one in the
chapter. Its optimal solution will include some number k of subtours.
19. (departure gates) As the schedule setter for an airline, you must schedule
exactly one early-morning departure from Pittsburgh to each of four cit-
ies. Due to competition, the contribution earned by each flight depends
on its departure time, as indicated below. For instance, the most profitable
departure time for O’Hare is at 7:30 am. Your airline has permission to
schedule these four departures at any time between 7 am and 8 am, but
you have only two departure gates, and you cannot schedule more than
two departures in any half-hour interval.
(c) Another airline wishes to rent one departure gate for the 7:00 am
time. What is the smallest rent that would be profitable for you to
charge?
across row 8 as far as cell L8. Require that the decision variables in
cells C6:L7 be binary, and require that the numbers in cells C8:L8 be
nonnegative. Why does this work?
(b) Denote as T(i) the total population of district i, and denote as D(i) the
number of registered Democrats in district i. Use Solver or Premium
Solver to compute T(i) and D(i) for each district, to enforce the con-
straints 520 ≤ T(i) ≤ 630 for each i, to enforce the constraints
(c) Is the optimization problem that you devised an integer linear pro-
gram, or is it an integer nonlinear program?
type of perfume                 A      B      C
expected demand                 50     45     30
standard deviation of demand    15     12     10
contribution                    $30    $43    $50
loss of good will               $20    $30    $40
22. (A perfume counter, continued) The chi-chi department store in the pre-
ceding problem is open from Monday through Saturday each week. The
demand that occurs for a particular type of perfume is not dependent on
the day of the week, and demands on different days are independent of
each other. Your supplier has been resupplying each Thursday evening,
after the store closes. For an extra fee of $350, your supplier has offered to
resupply a second time each week, after the close of business on Monday.
How many bottles of each type of perfume should you stock with resup-
ply on a twice-a-week basis? Is it worthwhile to do so?
Chapter 8: Path Length Problems
and Dynamic Programming
1. Preview
This is the first of a pair of chapters that deal with optimization problems
on “directed networks.” This chapter is focused on path-length problems, the
next on network-flow problems. Path-length problems are ubiquitous. A variety of path-length problems will be posed in this chapter. They will be solved by linear programming and, where appropriate, by other methods.
2. Terminology
The network optimization problems in this chapter and the next employ
terminology that is introduced in this section. Most of these definitions are
easy to remember because they are suggested by normal English usage.
[Figure 8.1: A directed network whose node set N and arc set A are listed below.]
N = {1, 2, 3, 4, 5},
A = {(1, 2), (1, 3), (2, 5), (3, 4), (3, 5), (4, 2), (5, 4)}.
Paths
Directed arc (i, j) is said to have node i as its tail and node j as its head. A
path is a sequence of n directed arcs with nâ•›≥â•›1 and with the property that the
head of each arc other than the nth is the tail of the next. A path is said to be
from the tail of its initial arc to the head of its final arc. In Figure 8.1, the arc
(2, 5) is a path from node 2 to node 5, and the arc sequence {(2, 5), (5, 4)} is a
path from node 2 to node 4.
To interpret a path, imagine that when you are at node i, you can walk
across any arc whose tail is node i. Walking across arc (i, j) places you at node
j, at which point you can walk across any arc whose tail is node j. In this con-
text, any sequence of arcs that can be walked across is a path.
Cycles
A path from a node to itself is called a cycle. In Figure 8.1, the path {(2, 5),
(5, 4), (4, 2)} from node 2 to itself is a cycle, for instance. A path from node j
to itself is said to be a simple cycle if node j is visited exactly twice and if no
other node is visited more than once.
Trees
Arc lengths
Let us consider a directed network in which each arc (i, j) has a datum
c(i, j) that is dubbed the length of arc (i, j). Figure 8.2 depicts a network that
has 8 nodes and 16 directed arcs. The length of each arc is adjacent to it. Four
of these so-called "lengths" are negative. In particular, c(5, 7) = −5.4.
Figure 8.2. A directed network with 8 nodes and 16 directed arcs; the length of each arc appears adjacent to it.
Path lengths
The length of each path is normally taken to be the sum of the lengths of its arcs. In Figure 8.2, for instance, the path {(1, 2), (2, 6)} has length 2.3 = 2.5 + (−0.2). Also, the path {(6, 8), (8, 6), (6, 8)} has length 2.3 = 1.3 − 0.3 + 1.3.
Path-length problems
A directed network can have many paths from node i to node j. The
shortest path problem is that of finding a path from a given node i to a given
node j whose length is smallest. The longest path problem is that of finding a
path from a given node i to a given node j whose length is longest. Path-length
problems are important in themselves, and they arise as components of other
optimization problems. Solution methods for path-length problems are in-
troduced in the context of
Problem 8.A. For the directed network depicted in Figure 8.2, find the shortest path from node 1 to node 8.
Having solved Problem 8.A by trial and error, we will now use it to in-
troduce a potent group of ideas that are known, collectively, as “dynamic pro-
gramming.” These ideas will lead us to a variety of ways of solving Problem
8.A, and they have a myriad of other uses.
States
• A sense of time. This sense of time may be an artifice. For our shortest-path problem, think of each transition from node to node as taking an amount of time that is indeterminate and immaterial, but positive.
For our shortest-path problem, the only piece of information that needs to be included in the state is the node i that we are currently at. How we got to
node i doesn’t matter; we seek the shortest path from node i to node 8.
Embedding
(A choice as to embedding has just been made; it would work equally well
to find, for each node j, the length F(j) of the shortest path from node 1 to
node j.)
Linking
The optimization problem with which we began has now been replaced
with a family of optimization problems, one per state. Members of this family
are closely related in a way that will make them easy to solve. For the shortest-
path problem at hand, each arc (i, j) in Figure 8.2 establishes the relationship

(1) f(i) ≤ c(i, j) + f(j),

because c(i, j) + f(j) is the length of some path from node i to node 8, and f(i) is the length of the shortest such path. Moreover, with f(8) = 0,

(2) f(i) = min {c(i, j) + f(j)} for i = 1, 2, …, 7,

where the minimum is taken over the arcs (i, j) whose tail is node i.
Equation (2) holds because the shortest path from node i to node 8 has as its
first arc (i, j) for some node j and its remaining arcs form the shortest path
from node j to node 8. Expression (2) links the optimization problems for the
various starting states.
Imagine, for the moment, that correct numerical values have been as-
signed to f(2) through f(8). The value of f(1) that satisfies (2) is the largest
value of f(1) that satisfies the inequalities in (1) for the arcs that have node 1
as their tail.
To compute f(1), our original goal, it would suffice to maximize f(1) subject to the constraints in system (1). This would give the correct f-value for each node on the shortest path from node 1 to node 8, but it might give incorrect f-values for the others. A linear program that gives the correct f-value for every node is

Program 8.1. Maximize {f(1) + f(2) + ⋯ + f(7)}, subject to the constraints
f(i) ≤ c(i, j) + f(j) for each arc (i, j),
f(8) = 0.
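Program 8.1 is easy to hand to an off-the-shelf LP solver as well. The sketch below is a minimal Python illustration using scipy.optimize.linprog; because the arc lengths of Figure 8.2 are not reproduced here, it uses a small hypothetical network with destination node 4, but the formulation is exactly that of Program 8.1.

from scipy.optimize import linprog

# Hypothetical arc lengths; the destination is node 4, so f(4) = 0.
arcs = {(1, 2): 2.0, (1, 3): 4.0, (2, 3): 1.0, (2, 4): 7.0, (3, 4): 3.0}
dest, nodes = 4, [1, 2, 3]

# One inequality per arc: f(i) - f(j) <= c(i, j), as in system (1).
A_ub, b_ub = [], []
for (i, j), c in arcs.items():
    row = [0.0] * len(nodes)
    if i != dest:
        row[nodes.index(i)] = 1.0
    if j != dest:
        row[nodes.index(j)] = -1.0
    A_ub.append(row)
    b_ub.append(c)

# linprog minimizes, so maximize the sum of the f-values by negating it.
res = linprog([-1.0] * len(nodes), A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * len(nodes), method="highs")
print(dict(zip(nodes, res.x)))   # {1: 6.0, 2: 4.0, 3: 3.0} for these lengths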
Recorded in Table 8.1 is the optimal solution that Solver has found. The
seven arcs whose constraints are binding have been shaded. These arcs form
a tree of shortest paths to node 8. This tree is displayed in Figure 8.3, as is the
length f(i) of the shortest path from each node i to node 8.
Figure 8.3. The tree of shortest paths to node 8; next to each node i appears the length f(i) of the shortest path from node i to node 8.
Version #1
Version #2
We have made use of the 2nd version! Please pause to convince yourself
that equation (2) does so.
Version #3
The 3rd version of the principle of optimality rests on the notion of a cycle
of events – observe a state, select a decision, wait for transition to occur to a
new state, observe that state, and repeat. This version is the
Principle of optimality (3rd version). An optimal policy has the property
that whatever the initial state is and no matter what decision is selected ini-
tially, the remaining decisions in the optimal policy are optimal for the state
that results from the first transition.
Problem 8.A illustrates the 3rd version as well. This version states, for instance, that if one begins at node 3 and chooses any arc whose tail is node 3, the remaining arcs in the optimal policy are optimal for the node to which transition occurs. The 3rd version is a verbal counterpart of the optimality equation.
Recap
A linear program has been used to find an optimal policy. This illustrates
a link between linear and dynamic programming. Do there exist dynamic
programming problems whose optimal policies cannot be found by solving
linear programs? Yes, there do, but they are rare.
In fact, if this network did have a cycle whose length were negative, the
shortest-path problem would be ill-defined: There would be no shortest
path from node 1 to node 8 because a path from node 1 to node 8 could
repeat this (negative) cycle any number of times en route. By the way, if
the network in Figure 8.2 did have a negative cycle, Program 8.1 would be
infeasible.
What about the longest path from node 1 to node 8? That problem is
ill-defined because a path from node 1 to 8 can repeat the cycle (5, 7, 6, 5) an
arbitrarily large number of times.
You might wonder, as have many others, whether it might be easy to find
the longest path from one node to another that contains no cycle. It isn’t. That
is equivalent to the “traveling salesman problem,” which is to say that it is
NP-complete. (No polynomial algorithm is known to solve it, and if you did
find an algorithm that solves it for all data sets, you would have proved that
P = NP.) This is one case – amongst many – in which one of a pair of closely related problems is easy to solve, and the other is not.

1. Richard Bellman's books include the classic, Dynamic Programming, Princeton University Press, 1957, reprinted by Dover Publications, 2003.
2. Richard Bellman, Eye of the Hurricane: An Autobiography, World Scientific Publishing Co., Singapore, 1984.
Problem 8.B. For the network in Figure 8.4, find the tree of shortest paths from node 1 to all others.
Figure 8.4. The network for Problem 8.B, after the first execution of the Reaching step: node 1 is shaded and has label 0, the labels of nodes 2, 3 and 4 have been reduced to 2.5, 0.8 and 0.9, and all other labels remain +∞.
Reaching
All arc lengths in Figure 8.4 are nonnegative. Figure 8.4 hints at the al-
gorithm that is about to be introduced. This algorithm is initialized with
v(1) = 0 and with v(j) = +∞ for each j ≠ 1. Initially, each node is unshaded.
The general step is to select an unshaded node i whose label is smallest (node
1 initially) and execute the
Reaching step: Shade node i. Then, for each arc (i, j) whose tail is node i, update v(j) by setting

(3) v(j) ← min {v(j), v(i) + c(i, j)}.
Figure 8.4 describes the result of the first application of the Reaching
step. Node 1 has been shaded, and the labels of nodes 2, 3 and 4 have been
reduced to 2.5, 0.8 and 0.9, respectively. Evidently, there is a path from node
1 to node 3 whose length equals 0.8. The fact that arc lengths are nonnegative
guarantees that all other paths from node 1 to node 3 have lengths of 0.9 or
more. As a consequence, node 3 has v(3) = f(3) = 0.8. The second iteration
of the reaching step will shade node 3 and will execute (3) for the arcs (3, 2),
(3, 4) and (3, 6). This will not change v(2) or v(4), but it will reduce v(6) from
+∞ to 3.2.
The update in (3) “reaches” out from node i to update the labels for some
unshaded nodes. After any number of executions of the Reaching step:
• If node j is shaded, its label v(j) equals the length of the shortest path
from node 1 to node j.
• If node j is not shaded, its label v(j) equals the length of the shortest
path from node 1 to node j whose final arc (i, j) has i shaded.
The fact that arc lengths are nonnegative suffices for an easy inductive proof
of the properties that are highlighted above.
As soon as a label v(j) becomes finite, it equals the length of some path
from node 1 to node j. To build a shortest-path tree, augment the Reaching
step to record at node j the arc (i, j) that reduced v(j) most recently.
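The method just sketched is short in code as well. Below is a minimal Python rendering; the arc lengths are hypothetical stand-ins (the labels in the figure are only partly legible here), chosen so that the first two iterations match the discussion above.

import math

def reaching(n, arcs, source=1):
    # Dijkstra's method: repeatedly shade the unshaded node whose label
    # is smallest and execute the Reaching step (3) for its outgoing arcs.
    v = {j: math.inf for j in range(1, n + 1)}
    v[source] = 0.0
    pred, shaded = {}, set()
    while len(shaded) < n:
        i = min((j for j in v if j not in shaded), key=v.get)
        if v[i] == math.inf:          # the remaining nodes are unreachable
            break
        shaded.add(i)
        for (a, j), c in arcs.items():
            if a == i and v[i] + c < v[j]:
                v[j] = v[i] + c       # the update in (3)
                pred[j] = (i, j)      # the arc that reduced v(j) most recently
    return v, pred                    # pred encodes the shortest-path tree

# Hypothetical arc lengths in the spirit of Figure 8.4:
arcs = {(1, 2): 2.5, (1, 3): 0.8, (1, 4): 0.9, (3, 2): 1.7, (3, 6): 2.4,
        (2, 5): 3.1, (4, 6): 9.6, (6, 8): 1.3, (5, 7): 5.4}
labels, tree = reaching(8, arcs)
print(labels[3], labels[6])           # 0.8 and 3.2, as in the discussion above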
E. W. Dijkstra
The algorithm that has just been sketched bears the name of its inventor.
It is known as Dijkstra’s method, after the justly-famous Dutch computer
scientist, E. W. Dijkstra (1930-2002). Dijkstra is best known, perhaps, for his
recommendation that the GOTO statement be abolished from all higher-level
programming languages, i.e., from everything except machine code.
If all arc lengths are positive, it is not necessary to pick the unshaded node
whose label is smallest. Note in Figure 8.4 that:
• Each arc whose head and tail are unshaded has length of 1.3 or more.
• No unshaded node whose label is within 1.3 of the smallest can have its
label reduced. In particular, since v(4) = 0.9 ≤ 0.8 + 1.3, it must be that v(4) = f(4).
Denote as m the length of the shortest arc whose head and tail are unshaded. (In Figure 8.4, m equals 1.3.) As just noted, each unshaded node j whose label v(j) is within m of the smallest has v(j) = f(j). The unshaded nodes can be placed in a system of buckets, each of width m, where the pth bucket contains each unshaded node j having a label v(j) that satisfies pm ≤ v(j) < (p + 1)m. The reaching step in (3) can be executed for each node i in the lowest-numbered nonempty bucket. After a bucket is emptied, it can be re-used, and a system of 1 + M/m buckets suffices, where M is the length of the longest arc whose head is unshaded.
Recap
Dijkstra’s method works when the arc lengths are nonnegative. It works
whether or not the network is cyclic. For large sparse networks, the time-
consuming part of Dijkstra’s method is determination of the unshaded node
whose label is smallest. When the reaching step is executed for node i, the
shortest path to node i is known; this can be used to prune the network of arcs
that will not be needed to compute shortest paths to the rest of the nodes. If
arc lengths are positive, reaching can be speeded up by the use of buckets. The
uses of reaching with pruning and buckets are explored in a series of papers
written with Bennett L. Fox3.
Reaching works if the arc lengths are nonnegative. If the network is acy-
clic, an even simpler method is available. That method is introduced in the
context of
Problem 8.C. Find the tree of shortest paths to node 9 for the network that is depicted in Figure 8.5.
Figure 8.5. An acyclic directed network with 9 nodes and 15 directed arcs; the length of each arc appears adjacent to it.
The network in Figure 8.5 has 9 nodes and 15 directed arcs. Each arc (i, j)
in this network has i < j, which guarantees that the network is acyclic. Each
arc (i, j) in this network also has a length c(i, j). Some arc lengths are negative, e.g., c(5, 7) = −4.0.
The states for Problem 8.C are the integers 1 through 8, and f(i) denotes
the length of the shortest path from node i to node 9. With f(9) = 0, the optimality equation takes the form

(4) f(i) = min {c(i, j) + f(j)},
where it is understood that the minimum is to be taken over all j such that
(i, j) is an arc.
Since the head of each arc has a higher number than the tail, this optimal-
ity equation is easy to solve by a method that is known as backwards optimi-
zation. This method solves (4) backwards, that is, in decreasing i. Backwards
optimization is easily executed by hand. Doing so gives f(8) = 10.2, then
Spreadsheet computation
Cell C14 contains the integer 7, so this function takes the sum of the number
in cell D14 and the number that is in the cell that is 7 rows below cell H3 and
0 columns to the right of cell H3, thereby computing c(5, 7) + f(7).
Dragging the function in cell E14 up and down the column computes
c(i, j) + f(j) for each arc (i, j). The min functions in column H equate f(i) to the RHS of equation (4). The arcs that attain these minima have been shaded. The shaded arcs form a tree of shortest paths to node 9.
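Backwards optimization is just as brief in code as on a spreadsheet. The sketch below is a minimal Python rendering of equation (4) on a hypothetical acyclic instance (the arc lengths of Figure 8.5 are not reproduced here); since each arc (i, j) has i < j, solving in decreasing i works.

def backwards_optimization(n, arcs):
    # Solve f(i) = min {c(i, j) + f(j)} in decreasing i, with f(n) = 0.
    f, best = {n: 0.0}, {}
    for i in range(n - 1, 0, -1):
        choices = [(c + f[j], j) for (a, j), c in arcs.items()
                   if a == i and j in f]
        if choices:
            f[i], best[i] = min(choices)   # best[i] is the tree arc's head
    return f, best

# A hypothetical acyclic network with destination node 5:
arcs = {(1, 2): 2.0, (1, 3): 5.0, (2, 4): 4.0, (2, 5): 10.0,
        (3, 4): 1.0, (4, 5): 3.0}
f, tree = backwards_optimization(5, arcs)
print(f)      # {5: 0.0, 4: 3.0, 3: 4.0, 2: 7.0, 1: 9.0}
print(tree)   # {4: 5, 3: 4, 2: 4, 1: 2} -- a tree of shortest paths to node 5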
task               A    B    C    D      E
completion time    9    6    4    5      8
predecessors      --   --    B   A, C    B
A simpler approach is to identify the tasks with the nodes and the precedence relations with the arcs. In Figure 8.6, node S represents the start of the
project and node F represents its completion. Each task is represented as an
ellipse (node) with its length inside it, and each precedence relationship is
represented as a directed arc. For instance, arcs (A, D) and (C, D) exist be-
cause task D cannot begin until tasks A and C have been completed.
Figure 8.6. The project network: each task appears as a node with its duration inside it (S 0, A 9, B 6, C 4, D 5, E 8, F 0), and each precedence relationship appears as a directed arc.
In Figure 8.6, the lengths are associated with the nodes, rather than with
the arcs. A path is now a sequence of tasks each of which is a predecessor of
the next. Task sequences (S, A, D) and (B, E) and (C) are paths. The length of
a path equals the sum of the lengths of the tasks that it contains. The longest
path that includes the start and finish tasks is (S, B, C, D, F), and its length is 15 = 6 + 4 + 5. The project takes 15 weeks to complete.
Any path whose length is longest is said to be a critical path. Each task in
a critical path is called critical. The critical tasks cannot be delayed without
increasing the project completion time. For the data in Figure 8.6, tasks B, C
and D are critical.
Problem 8.D illustrates the critical path method, whose components are:
• Find the shortest time needed to complete the project, and identify the
tasks that cannot be delayed without increasing the completion time.
Figure 8.6 has only seven nodes and is easy to solve by “eyeball.” If a proj-
ect had a large number of tasks, a systematic solution procedure would be
called for. The network representation of a project management problem
must be acyclic, so backwards optimization can be adapted to find the short-
est completion time and the critical tasks. The arc lengths are positive, so
reaching can also be adapted to this purpose.
Let us consider, briefly, how one might compute the earliest time at which each task can be completed. For each task x, designate

t(x) = the earliest time at which task x can be completed.
These earliest completion times satisfy an optimality equation, and t(x) can
be computed from this equation as soon as the task completion times for its
predecessors have been determined.
The data in Figure 8.6 are used to illustrate this recursion. With t(S) = 0, this recursion gives t(A) = t(S) + 9 = 9, then t(B) = t(S) + 6 = 6, then t(C) = t(B) + 4 = 10, then

t(D) = max {t(A) + 5, t(C) + 5} = max {9 + 5, 10 + 5} = 15,
and so forth. The method that has just been sketched is just like backwards
optimization, except that it begins at the start of the network. This method is
sometimes called forwards optimization.
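This recursion is easy to check in code. The sketch below (a minimal Python illustration) applies forwards optimization to the data of Problem 8.D, taking the task times and predecessors from the table above and from Figure 8.6.

# Task durations and predecessors from Figure 8.6.
times = {"S": 0, "A": 9, "B": 6, "C": 4, "D": 5, "E": 8, "F": 0}
preds = {"S": [], "A": ["S"], "B": ["S"], "C": ["B"],
         "D": ["A", "C"], "E": ["B"], "F": ["D", "E"]}

t = {}
for task in ["S", "A", "B", "C", "D", "E", "F"]:   # a topological order
    t[task] = times[task] + max((t[p] for p in preds[task]), default=0)

print(t["D"])   # 15, matching the computation in the text
print(t["F"])   # 15: the project takes 15 weeks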
Reaching can also be used to compute the earliest completion times, and
they can even be found by solving a linear program. Using a linear program to
solve a problem as simple as this seems a bit like overkill, however.
On the other hand, linear programming has a role to play when the prob-
lem is made a little more complicated. Crashing accounts for the situation
in which some or all of the task times can be shortened at added expense. To
illustrate, we alter Problem 8.D by supposing that:
The aggregate number w of weeks by which task times are reduced is given by
Each of the eight arcs in Figure 8.6 gives rise to an inequality on a latest com-
pletion time. Two of these inequalities are
Critique
A second weakness of CPM lies in its assumption that the tasks and
their duration times are fixed and known. One is actually working with es-
timates. As time unfolds, unforeseen events occur. When they do, one can
revise the network, re-estimate the task times, re-compute its critical path,
and re-determine which tasks need close monitoring. It is sometimes prac-
tical to model the uncertainty in the task duration times. A technique that
is known as PERT (short for Program Evaluation and Review Technique)
is a blend of simulation and critical-path computation. PERT is sketched
below.
Step 1: For each task, estimate three elements of data: an optimistic duration, a most likely duration, and a pessimistic duration. Model the duration time of this task by a random variable X whose distribution is triangular with the above parameters.
Step 2: Simulate the project a large number of times. For each simulation,
record the project completion time, which tasks are critical, and the dif-
ference between each task’s earliest and latest start times.
PERT allows one to discern which of the tasks are most likely to be critical.
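A minimal simulation along these lines appears below. The three-point estimates are hypothetical (the text leaves them unspecified); each run draws triangular task durations, executes a forward pass, and records the completion time and whether the path through task D is the longest.

import random

# Hypothetical (optimistic, most likely, pessimistic) estimates for
# the tasks of Problem 8.D.
est = {"A": (7, 9, 13), "B": (4, 6, 9), "C": (3, 4, 6),
       "D": (4, 5, 8), "E": (6, 8, 12)}
preds = {"A": [], "B": [], "C": ["B"], "D": ["A", "C"], "E": ["B"]}

runs, d_critical, total = 10_000, 0, 0.0
for _ in range(runs):
    dur = {x: random.triangular(lo, hi, m) for x, (lo, m, hi) in est.items()}
    t = {}
    for x in ["A", "B", "C", "D", "E"]:            # a topological order
        t[x] = dur[x] + max((t[p] for p in preds[x]), default=0.0)
    total += max(t["D"], t["E"])                   # the project completion time
    d_critical += t["D"] >= t["E"]                 # the path through D is longest

print("mean completion time:", total / runs)
print("fraction of runs in which D is critical:", d_critical / runs)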
In the 1950s, when PERT was developed, computer simulation was ex-
ceedingly arduous. That has changed. In the current era, simulations are easy
to execute on a spreadsheet. Today, PERT and CPM are routinely used in the
management of large-scale development projects.
9. Review
This chapter has exposed you to the terminology that is used to describe
directed networks and to a few representative path-length problems. The
shortest-path problem is well-defined when the network has no cycle whose
length is negative. It can be solved by linear programming. If all arc lengths
are nonnegative, it can also be solved by reaching. If the network is acyclic,
it can also be solved by backwards optimization. All three of these methods
produce a tree of shortest paths.
Let us close with mention of the fact that dynamic programming is espe-
cially well-suited to the analysis of Markov decision models; these models
describe situations in which decisions must be taken in the face of uncer-
tainty. These models are important. They are not explored in this book, but
the ideas that have just been reviewed may help to provide access to them.
1. The network in Figure 8.2 has a tree of shortest paths from node 1 to all
others.
(a) Write down the optimality equation whose solution gives the lengths
of the paths in this tree.
(b) Find this shortest-path tree by any method (including trial and error)
and draw the analogue of Figure 8.3.
(c) Check that the path lengths in your tree satisfy the optimality equation
you wrote in part (a).
2. Use the reaching method to compute the tree of shortest paths to node 7
of the network in Figure 8.4.
(a) If this network is acyclic, show that its nodes can be relabeled so that each arc (i, j) has i < j. Hint: Add an arc at the end of a path repeatedly, and see what happens.
(b) If this network is cyclic, show that its nodes cannot be relabeled so that each arc (i, j) has i < j. Hint: if each arc (i, j) has i < j, can there be a cycle?
4. For the network in Figure 8.4, there does not exist a tree of paths from
node 2 to all others. Adapt reaching to find the shortest path from node 2
to each node that can be reached from node 2.
5. For the network in Figure 8.5, there exists a tree of shortest paths from
node 1 to all others.
(a) Write down the optimality equation that is satisfied by the lengths of
the paths in this tree.
(b) Adapt backwards optimization to find the lengths of the paths in this
tree.
6. Reaching was proposed for a network whose arc lengths are nonnega-
tive. Does Reaching work on acyclic networks whose arc lengths can
have any signs? In particular, for the network in Figure 8.5, does reach-
ing find the tree of shortest paths from node 1 to the others. Support
your answer.
minimizes the steepest grade he must climb. Each arc represents a road
segment, and the number adjacent each arc is the steepest grade he will
encounter when traveling that road segment in the indicated direction.
(b) Write an optimality equation for the set of problems you identified in
part (a).
(c) Solve that optimality equation. What route do you recommend? What
is its maximum grade?
(d) Which versions of the principle of optimality are valid for this prob-
lem?
(c) With crashing, what is the optimal project completion time? Which
tasks are critical?
(d) Why is it not economical to shorten the completion time below the
value you reported in part (c)?
10. (Dawn Grinder) Dawn Grinder has 10 hours to prepare for her exam in
linear algebra. The exam covers four topics, which are labeled A through
D. Dawn wants to maximize her score on this test. She estimates that de-
voting j hours to topic x will improve her score by b(j, x) points.
(b) Write the optimality equation for the formulation you proposed in
part (a).
(c) Dawn has estimated that the benefit of each hour spent on subjects A or B and on subjects C or D is as indicated below. How shall she allocate her
time? How many points can she gain? How many points will she lose
if she takes an hour off? Remark: For these data, the solution should
be pretty clear.
hours    0   1   2   3   4
A or B   0   2   2   3   4
C or D   0   1   1   2   3
11. (No Wonder bakers) No Wonder Bakers currently has 120 bakers in its em-
ploy. Corporate policy allows bakers to be hired at the start of each month,
but never allows firing. Training each new baker takes 1 month and requires
a trained baker to spend half of that month supervising the trainee, rather
than making bread for the company. Eight percent of the bakers and eight
percent of the trainees quit at the end of each month. The demand D(j) for
trained bakers in each of the next seven months, is, respectively, 100, 105,
130, 110, 140, 120, and 100. In particular, D(1)â•›=â•›100. Demand must be met
by production in the current month. Trained bakers and trainees receive the
same monthly wage rate. The company wishes to satisfy demand at mini-
mum payroll cost. Denote as x(j) a number of the number of trained bakers
that is large enough to satisfy the demand during months j through 7, and
denote as t(j) the number of trainees that are hired at the start of month j.
12. (bus stops) A city street consists of 80 blocks of equal length. Over the
course of the day, exactly S(j) people start uptown bus trips at block j, and exactly E(j) people end uptown bus trips at block j. Necessarily,

S(1) + S(2) + ⋯ + S(80) = E(1) + E(2) + ⋯ + E(80).
(a) Suppose that bus stops are located at blocks p and q > p, but not in
between. Interpret
W(p, q) = Σ_{j=p}^{q} [S(j) + E(j)] · min {j − p, q − j}.
(b) Suppose that the first stop is located at block p and that the last stop is
located at block q. Interpret
B(p) = Σ_{j=1}^{p} S(j) (p − j),      T(q) = Σ_{j=q}^{80} E(j) (j − q).
(c) Suppose bus stops are located at blocks p1 < p2 < · · · < p12 . Express
the total number of blocks walked by bus users in terms of the func-
tions specified in parts (a) and (b). Justify your answer.
(d) Can you relate the bus-stop location problem to a dynamic program
each of whose states is a pair (k, q) in which k bus stops are located
in blocks 1 through q, with the highest-numbered stop at block q?
Justify your answer. Hint: look at
f(1, q) = B(q),
f(k, q) = min_{p < q} {f(k − 1, p) + W(p, q)},
F(80) = min_{q ≤ 80} {f(12, q) + T(q)}.
13. (bus stops, continued) For the data in the preceding problem, the director
of public services wishes to optimize with a different objective. She now
wishes to locate bus stops (not necessarily 12 in number) so as to mini-
mize the total travel time of the population of bus users. People walk at
the rate of one block per minute. Buses travel at the rate of 5 blocks per
minute, but they also take 1.5 minutes to decelerate to a stop, allow pas-
sengers to get off and on, and reaccelerate.
(a) Suppose that stops are located at blocks p and q > p, but not in be-
tween. Give a formula for the number K(p, q) of people who are on
the bus while it travels from the stop at block p to the stop at block q.
(b) With W(p, q) as given in part (a) of the preceding problem, interpret
(c) Can you relate this bus-stop location problem to a dynamic program
each of whose states is a singleton q? Hints: Is it the case that
f(p) + B(p, q) ≥ f(q)?
If so, how can you account for the people who walk uptown to the first stop and uptown from the final stop?
Chapter 9: Flows in Networks
1. Preview
In this chapter, you will see how the simplex method simplifies when it is
applied to a class of optimization problems that are known as “network flow
models.” You will also see that if a network flow model has “integer-valued
data,” the simplex method finds an optimal solution that is integer-valued.
2. Terminology
Figure 9.1 depicts a directed network that has 5 nodes and 7 directed
arcs. As was the case in the previous chapter, each node is represented as a
circle with an identifying label inside, and each directed arc is represented
as a line segment that connects two nodes, with an arrow pointing from one
node to the other.
As in the preceding chapter, directed arc (i, j) is said to have node i as its
tail and has node j as its head. Again, a path is a sequence of n directed arcs
with n ≥ 1 and with the property that the head of each arc other than the nth is
the tail of the next. A path is said to be from the tail of its initial arc to the head
of its final arc. A path from a node j to itself is called a cycle. Again, directed
network is sometimes abbreviated to network and directed arc is sometimes
abbreviated to arc.
Chains
Arcs (i, j)F and (i, j)R are said to be oriented. Arc (i, j)F has node i as its tail and node j as its head. Arc (i, j)R has node j as its tail and node i as its head. In this context, a chain is a sequence of n distinct oriented arcs, with n ≥ 1, whose orientations are such that the head of each arc but the nth is the tail of the next arc. A chain is said to be from the tail of its initial arc to the head of its final arc. In Figure 9.1, (3, 5)R and {(5, 4)F, (3, 4)R} are chains from node 5 to node 3.
Loops
A chain from a node to itself is called a loop. In Figure 9.1, the chain
{(5, 4)F, (3, 4)R, (3, 5)F} is a loop, namely, a chain from node 5 to itself. A loop
from node i to itself is said to be a simple loop if node i is visited exactly twice
and if no other node is visited more than once.
Spanning trees
The network in Figure 9.1 has several spanning trees, one of which is the
set T of directed arcs in Figure 9.2. These arcs do contain exactly one chain
from every node to every other node. Their chain from node 4 to node 1 is
{(4, 2)F, (1, 2)R}, for instance.
Figure 9.2. A spanning tree T for the network in Figure 9.1.
(b) T contains a chain from each node i in the network to each node j ≠ i.
Evidently, a spanning tree must contain one fewer arc than the number of
nodes in the network. Spanning trees play an important role in network flow.
Later in this chapter, the spanning trees will be shown to be the bases for the
“transportation problem,” for instance.
• The flow on each arc occurs from its tail to its head.
• Each arc can have a positive lower bound on the amount of its flow.
• Each arc can have a finite upper bound on the amount of its flow.
• Each node can have a fixed flow, and a node’s fixed flow may be into or
out of that node.
• Flow is conserved at each node; the sum of the flows into each node
equals the sum of the flows out of that node.
The flows into nodes U, V and W are decision variables. On what arcs do
these flows occur? Each arc must have a head and a tail. Implicitly, the network
in Figure 9.3 has an unseen node. Let us label it node α. The flows into node U,
V and W occur on directed arcs (α, U), (α, V), and (α, W). Also, the fixed flows
out of nodes 1 through 4 occur on arcs (1, α) through (4, α).
Figure 9.3. A network with fixed flows of 200, 300, 250 and 150 out of nodes 1 through 4, and with flows into nodes U, V and W that are bounded above by 250, 400 and 350, respectively.
Flow is conserved at the unseen node, and this occurs automatically. The
flow conservation constraints for the seven nodes in Figure 9.3 guarantee that
the total flow into node α equals 900 and that the total flow out of node α
equals 900.
The model
• A finite set N whose members are called nodes and, possibly, one "unseen" node α that is not included in N.

The network flow model is a linear program that is cast in the format of

Program 9.1. Minimize Σ_{(i, j) ∈ A} cij xij, subject to the constraints

(1) Lij ≤ xij ≤ Uij for each arc (i, j) ∈ A,

(2) Σ_j xji − Σ_j xij = Di for each node i ∈ N.
Program 9.1 has three elements of data per arc; each arc (i, j) has a unit cost
cij, a lower bound Lij and an upper bound Uij. Each lower bound is nonnegative.
Each arc’s upper bound must be at least as large as its lower bound, and each
upper bound can be as large as +∞. Program 9.1 also has one element of data
per node in N; the number Di for node i is called node i’s net outward flow.
The number Di can be positive, negative, or zero. If Di is positive, it is a fixed
flow out of node i. If Di is negative, then −Di is a fixed flow into node i.
Program 9.1 minimizes the cost of the flow subject to constraints that
keep each arc’s flow between its lower and upper bound and require the total
flow into each node in N to equal the total flow out of that node. The network
may include one node α that is not in N. The flow-conservation constraint
for node α can be (and is) omitted from (2) because this constraint is implied
by the others.
An example
- 1.3
2 4 8
2
9.
4.
1
12 1
- 2.1
0
3.
-6
.0
8.0
3 5 4
Problem 9.A. For the directed network in Figure 9.4, each arc has 0 as its lower bound and has +∞ as its upper bound. Find a least-cost flow.
The decision variables in a network flow problem are the flows on the
arcs. For the network in Figure 9.4, xij denotes the flow on arc (i, j). Program
9.2, below, minimizes the cost of the flow, subject to constraints that keep
the flows nonnegative and that conserve flow at each node. Each node’s flow
conservation constraint is written in the format, flow in equals flow out.
Program 9.2. Min {9.2 x12 − 6 x13 + 4.1 x25 + 3 x34 + 8 x35 − 1.3 x42 − 2.1 x54}, s. to
A spreadsheet
Writing equations (3.1) through (3.6) with the decision variables on the
left-hand side and the data on the right-hand side produces rows 5 through
9 of the spreadsheet in Table 9.1. In this spreadsheet, the labels of the arcs
appear in row 2, the flows appear in row 3, and the unit costs are in row 4.
Table 9.1. A spreadsheet for Problem 9.A.
Solver has minimized the quantity in cell J4 of Table 9.1, with C3:I3 as
its changing cells and subject to the constraints J5:J9 = L5:L9 and C3:I3 ≥ 0.
By doing so, Solver has minimized the cost of the flow that satisfies the flow-
conservation constraints and keeps the flows nonnegative. Table 9.1 reports
an optimal solution that sets
(4) x13 = 12,  x25 = 4,  x34 = 12,  x42 = 4,
and that equates the remaining flows to zero. That the values of these flows
are integers is no accident, as will soon be evident.
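For readers who prefer an algebraic solver to a spreadsheet, the sketch below casts this problem as a linear program for scipy.optimize.linprog. The arc costs are those in Program 9.2; the fixed flows (12 units into node 1, 8 out of node 4, and 4 out of node 5) are inferred from Figure 9.4 and from the optimal solution in (4), so treat them as assumptions.

from scipy.optimize import linprog

arcs = [(1, 2), (1, 3), (2, 5), (3, 4), (3, 5), (4, 2), (5, 4)]
cost = [9.2, -6.0, 4.1, 3.0, 8.0, -1.3, -2.1]
# Net arc flow out of each node (inferred fixed flows, as noted above):
b = {1: 12.0, 2: 0.0, 3: 0.0, 4: -8.0, 5: -4.0}

# Flow conservation: (arc flow out of i) - (arc flow into i) = b[i].
A_eq = [[(1 if tail == i else 0) - (1 if head == i else 0)
         for (tail, head) in arcs] for i in b]
res = linprog(cost, A_eq=A_eq, b_eq=list(b.values()),
              bounds=[(0, None)] * len(arcs), method="highs")
print(dict(zip(arcs, res.x)))   # x13 = 12, x25 = 4, x34 = 12, x42 = 4
print(res.fun)                  # the minimum cost, -24.8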
Problem 9.A does have integer-valued data; its fixed flows are integers, its lower bounds are 0, and its upper bounds are infinite. A property of network flow problems with integer-valued data appears below as
Claim #1: Consider any arc (i, j) ∈ B. The decision variable xij is basic. If Lij is positive, then yij is basic. If Uij is finite, then zij is basic.

Proof of Claim: Since xij is not zero, it must be basic. Suppose Lij is positive. Since xij is not integer-valued and since Lij is integer-valued, the decision variable yij = xij − Lij is not zero, hence is basic. Similarly, if Uij is a finite integer, zij cannot equal zero, hence must be basic. This proves the claim.
When a node repeats, a simple loop T has been found. Perturb x, y, and
z as follows. Add a positive number K to the flow on each forward arc in this
simple loop and subtract K from each reverse arc in this simple loop. If an arc
in this loop has a positive lower bound Lij, decrease or increase by K the value
of the basic variable yij so as to preserve a solution to Lij = xij − yij. Do the same
for the arcs in this loop that have finite upper bounds. The perturbed solution
satisfies (2) and the Form-1 representation of (1). Claim #1 shows that only
the values of basic variables have been perturbed. This contradicts the fact
that the basic solution is unique. Hence, B must be empty. ■
The gist
Figure 9.5. Perturbing the flows on a simple loop: K is added to the flow on each forward arc and subtracted from the flow on each reverse arc.
An important result
Large network flow problems that have integer-valued data tend to have
many alternative optima. For those problems, the simplex method enjoys an
advantage over interior-point methods. Each simplex pivot produces a basic
solution, which is guaranteed to be integer-valued. In particular, the opti-
mal solution with which the simplex method terminates is integer-valued. By
contrast, interior-point methods converge to the “center” of the set of optimal
solutions, which is not integer-valued when there are multiple optima.
• There are m supply nodes, which are numbered 1 through m, and the
positive datum Si is the number of units available for shipment out of
supply node i.
• Shipment can occur from each supply node to each demand node. For
each pair (i, j), the cost of shipping each unit from supply node i to
demand node j equals cij.
The decision variables in the transportation problem are the quantities xij to ship from each supply node i to each demand node j. The transportation problem is the linear program,
Program 9.3. Minimize Σ_{i=1}^{m} Σ_{j=1}^{n} cij xij, subject to the constraints

(5.i) Σ_{j=1}^{n} xij ≤ Si for i = 1, 2, …, m,

(6.j) Σ_{i=1}^{m} xij = Dj for j = 1, 2, …, n,

(7) xij ≥ 0 for each i and j.
Equation (5.i) requires that not more than Si units are shipped out of supply
node i. Equation (6.j) requires that exactly Dj units are shipped into demand
node j. The constraints in (7) keep the shipping quantities nonnegative.
By summing (5.i) over i, we see that the sum of the shipping quantities cannot exceed the sum of the supplies. By summing (6.j) over j, we see that the sum of the shipping quantities must equal the sum of the demands. As a consequence, Program 9.3 cannot be feasible unless its data satisfy

(8) Σ_{i=1}^{m} Si ≥ Σ_{j=1}^{n} Dj.
Testing for (8) is easy. For the remainder of this discussion, it is assumed
that (8) holds. In fact, a seemingly-stronger assumption is invoked. It is that
expression (8) holds as an equation. Thus, for the remainder of this section, it
is assumed that the aggregate supply equals the aggregate demand. This entails
no loss of generality; it can be obtained by including a dummy demand node, say node n, whose demand equals the excess of the aggregate supply over the aggregate demand and with shipping cost cin = 0 for each supply node i.
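This conversion is a one-step data manipulation. The sketch below (a minimal Python illustration; the function name is mine) appends the dummy demand node. In Figure 9.6, demand node 5 plays exactly this role: it absorbs the excess supply of 1,000 units at zero cost.

def add_dummy_demand(S, D, c):
    # Append a dummy demand node whose demand is the excess supply and
    # whose shipping cost from every supply node is zero.
    excess = sum(S) - sum(D)
    if excess > 0:
        D = D + [excess]
        c = [row + [0.0] for row in c]
    return D, c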
When (8) holds as an equation, every solution to (5) and (6) satisfies each
inequality as an equation. Assuming that aggregate supply equals aggregate
demand lets us switch our attention from Program 9.3 to
Program 9.3E. Minimize Σ_{i=1}^{m} Σ_{j=1}^{n} cij xij, subject to the constraints

(9) Σ_{j=1}^{n} xij = Si for i = 1, 2, …, m,

(10) Σ_{i=1}^{m} xij = Dj for j = 1, 2, …, n,

and xij ≥ 0 for each i and j.
The remainder of this section is focused on Program 9.3E (the “E” being
short for equality-constrained), and it is assumed that (8) holds as an equation.
An example
Figure 9.6 presents the data for a transportation problem that has m = 3
(three source nodes) and n = 5 (five demand nodes). The supplies S1 through
S3 are at the right of the rows. The demands D1 through D5 are at the bottom
of the columns. The number in the upper left-hand corner of each cell is the
shipping cost from the supply in its row to the demand in its column. By
reading across the first row, we see that c11 = 4, c12 = 7, c13 = 3, and so forth.
The total of the supplies equals 10,000, and the sum of the demands equals
10,000, so aggregate supply does equal the aggregate demand. Demand node
5 has D5 = 1,000, and each shipping cost in its column equals 0. Evidently,
demand node 5 is a dummy node that “absorbs” at zero cost the excess supply
of 1,000 units.
Problem 9.B. For the transportation problem whose data are presented in
Figure 9.6, find a least-cost flow.
The simplex method will soon be executed directly on diagrams like that
in Figure 9.6. There is no need for simplex tableaus.
Figure 9.6. Data for a transportation problem.

         4     7     3     5     0      2500 = S1
        10     9     3     6     0      4000 = S2
         3     6     4     4     0      3500 = S3
      2000  3000  2500  1500  1000
      = D1  = D2  = D3  = D4  = D5
Initializing Phase II
• Their sum across each row must equal that row's supply.

• Their sum down each column must equal that column's demand.
• Record as xij the smaller of the unsatisfied supply in row i and the unsatisfied demand in column j.
 – If this exhausts the supply in row i but not the demand in column j, increase i by 1 and repeat.
 – If this exhausts the demand in column j but not the supply in row i, increase j by 1 and repeat.
 – If this exhausts both the supply in row i and the demand in column j, either increase i by 1 and repeat or increase j by 1 and repeat.
Figure 9.7 displays the result of applying this rule to the example in Figure
9.6. The first step sets x11 = 2,000. This reduces S1 to 500, and it reduces D1
to 0, so j is increased to 2. The second step sets x12 = 500, which exhausts the
supply at node 1, so i is increased to 2. And so forth.
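The rule is mechanical enough to code directly. The sketch below is a minimal Python rendering; applied to the data of Figure 9.6, it reproduces the shipping plan just described.

def northwest_corner(S, D):
    # Start at the northwest cell; ship as much as possible; move down
    # when a supply is exhausted and right when a demand is exhausted.
    # Assumes aggregate supply equals aggregate demand.
    S, D = S[:], D[:]                  # work on copies
    x, i, j = {}, 0, 0
    while i < len(S) and j < len(D):
        q = min(S[i], D[j])
        x[(i + 1, j + 1)] = q          # 1-based labels, as in the text
        S[i] -= q
        D[j] -= q
        if S[i] == 0:
            i += 1                     # this row's supply is exhausted
        else:
            j += 1                     # this column's demand is exhausted
    return x

plan = northwest_corner([2500, 4000, 3500], [2000, 3000, 2500, 1500, 1000])
print(plan)   # x11=2000, x12=500, x22=2500, x23=1500, x33=1000, x34=1500, x35=1000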
Figure 9.7. The Northwest Corner rule applied to the example in Figure 9.6: x11 = 2000, x12 = 500, x22 = 2500, x23 = 1500, x33 = 1000, x34 = 1500, x35 = 1000.
The entries in Figure 9.7 do form a feasible solution: These entries are
nonnegative. Their sum across each row equals that row’s supply. And the
sum down each column equals that column’s demand.
A spanning tree
The directed network in Figure 9.8 records this feasible solution. Its arcs
correspond to the cells in Figure 9.7 to which flows have been assigned, and
the value of each flow appears beside its arc. The fixed flows into supply nodes
1-3 and out of demand nodes 1-5 are recorded next to stubby arrows into and
out of their nodes.
The seven arcs in Figure 9.8 form a spanning tree; these arcs contain no
loop, and they contain a chain from every node to every other. It will soon be
shown that the flows in Figure 9.8 are a basis, moreover, that the bases for the
transportation problem correspond to the spanning trees. That is the content of
Figure 9.8. The spanning tree and basic solution constructed by the Northwest Corner rule.
Claim #2: A spanning tree exists, and the rank of (9)-(10) equals m + n − 1. If T is a spanning tree, then the set S of decision variables that correspond to the arcs in T is a basis.
Proof: That a spanning tree T exists is obvious. Proposition 9.1 shows that
|T| = n + m – 1. The set S of decision variables that correspond to the arcs in
Multipliers
Proof*: The variable xij has coefficients of 0 in all but one constraint in (9) and in all but one constraint in (10). It has coefficients of +1 in the constraints whose multipliers are ui and vj, which justifies (11). Also, if xij is basic, its top-row coefficient (reduced cost) c̄ij equals zero, which justifies (12). ■
The multipliers for Program 9.3E are not unique. To see why, consider
any solution to (11). If we add to each multiplier ui a constant K and subtract
from each multiplier vj the same constant K, we obtain another solution to
(11). As a consequence, we can begin by picking one multiplier and equating
it to any value we wish, and then use (11) to compute the values of the other
multipliers.
Figure 9.9. The multipliers for the initial basis: u1 = 0, u2 = 2, u3 = 3, and v1 = 4, v2 = 7, v3 = 1, v4 = 1, v5 = −3.
Having set u1 = 0, the fact that x11 is basic lets us compute v1 from (11)
because
c11 = u1 + v1, so 4 = 0 + v1,
which gives v1 = 4. Similar arguments show that v2 = 7, that u2 = 2, and so
forth.
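This bookkeeping is easy to automate. The sketch below (a minimal Python illustration) fixes u1 = 0, sweeps the basic cells of Figure 9.8 until all m + n multipliers are determined, and then prices out the nonbasic cells.

# Shipping costs from Figure 9.6 and the basic cells from Figure 9.7.
c = {(1, 1): 4, (1, 2): 7, (1, 3): 3, (1, 4): 5, (1, 5): 0,
     (2, 1): 10, (2, 2): 9, (2, 3): 3, (2, 4): 6, (2, 5): 0,
     (3, 1): 3, (3, 2): 6, (3, 3): 4, (3, 4): 4, (3, 5): 0}
basis = {(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (3, 5)}

u, v = {1: 0}, {}                      # pick u1 = 0 and solve cij = ui + vj
while len(u) + len(v) < 3 + 5:         # m + n multipliers in all
    for (i, j) in basis:
        if i in u and j not in v:
            v[j] = c[(i, j)] - u[i]
        elif j in v and i not in u:
            u[i] = c[(i, j)] - v[j]

reduced = {(i, j): c[(i, j)] - u[i] - v[j]
           for (i, j) in c if (i, j) not in basis}
print(u, v)             # u2 = 2, u3 = 3; v = 4, 7, 1, 1, -3, as in Figure 9.9
print(reduced[(3, 1)])  # -4: selecting x31 as the entering variable pays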
An entering variable
The reduced costs are given in terms of the multipliers by c̄ij = cij − ui − vj. The reduced cost of each basic variable equals 0. Figure 9.10 records the reduced cost c̄ij of each nonbasic variable xij in that variable's cell.
Figure 9.10. The multipliers, the reduced cost of each nonbasic variable, and the loop created by the entering variable x31; its cell contains the reduced cost −4, and the cells on the loop are marked with "+" and "−" signs.
A loop
In Figure 9.10, the variable x31 has been selected as the entering variable,
as is indicated by the “+” sign in its cell. Setting x31 = K requires the values of
the basic variables to be perturbed like so:
• Finally, to preserve the flow into demand node 3, we must decrease x33
by K.
In Figure 9.10, the cells whose flows increase are recorded with a “+” sign
and the cells whose flows decrease are recorded with a “–” sign. The effect of
this perturbation is to ship K units around the loop
{(3, 1)F, (1, 1)R, (1, 2)F, (2, 2)R, (2, 3)F, (3, 3)R},
and the shipping costs for the arcs in this loop indicate that shipping K units
around this loop changes cost by
K(3 − 4 + 7 − 9 + 3 − 4) = −4K,
A leaving variable
The largest value of K for which the perturbed solution stays feasible is
1,000, the smallest of the shipping quantities in the cells that are marked with
“–” signs. Thus, the simplex pivot calls for the variable x31 to enter the basis
and x33 to leave the basis. Figure 9.11 records the basic feasible solution that
results from this pivot.
Figure 9.11. The basic feasible solution that results from the first pivot.
Recorded in Figure 9.12 are the multipliers for the second basis. Also
recorded in that figure is the reduced cost of x25, which equals −3. Selecting
x25 as the entering variable for the next pivot creates the loop for which arcs
have “+” and “−” signs. Note that a tie occurs for the departing variable.
Setting the value K of the entering variable equal to 1,000 causes the values of
x11 and x35 to equal 0.
Figure 9.12. The multipliers for the second basis (u1 = 0, u2 = 2, u3 = −1 and v1 = 4, v2 = 7, v3 = 1, v4 = 5, v5 = 1), the reduced cost −3 of x25, and the loop created by selecting x25 as the entering variable.
Degeneracy
Aircraft scheduling
4. The largest amount K that can be shipped around this loop while
preserving feasibility identifies a variable to remove from the basis.
The simplex method for network flow does not require simplex tableaus.
Nor does it require diagrams akin to Figure 9.10. Efficient implementations
use two or three “pointers” per node, instead. Given a basic solution and its
multipliers, these pointers enable quick identification of: (i) the entering
variable; (ii) the loop created by the entering variable; (iii) the leaving variable;
and (iv) the change in flows and multipliers due to the pivot. Updating the
pointers after a pivot occurs is not difficult, but it lies outside the scope of this
book.
Speed
1. Norman Zadeh, "A bad network problem for the simplex method and other minimum cost flow algorithms," Mathematical Programming, Vol. 5, pp. 255–266, 1973.
2. H. W. Kuhn, "The Hungarian method for the assignment problem," Naval Research Logistics Quarterly, Vol. 2, pp. 83–97, 1955.
One might hope to subtract constants from the rows and columns of the
cost matrix so that:
This might seem to be something new. But it isn’t new. Equations (11)
and (12) indicate that this is exactly what the multipliers for the optimal basis
accomplish.
Table 9.2. Data for a transportation problem equivalent to that in Figure 9.6.
A partial shipping plan ships only on arcs whose costs are zero. A myopic
rule for establishing a partial shipping plan is as follows: Repeatedly, identify
an arc whose cost equals zero and ship as much as possible on that arc. For the
data in Table 9.2, one implementation of this myopic rule sets
This partial shipping is recorded in the cells of Table 9.3 that have “chicken
wire” background. The other cells in the boxed-in array contain shipping
costs. Recorded to the right of this array is the residual supply, which is 3,000
units at node 2. Recorded below this array are the residual demands, which
are 1,500 units at node 5 and 1,500 units at node 7.
Four of the nodes in Table 9.3 are labeled “R.” These are the nodes in
the “reachable network” whose arcs appear as solid lines in Figure 9.13. The
reachable network contains those arcs on which the residual supply can be
shipped at zero cost. This network includes arcs (2, 6) and (2, 8) because the
residual supply at node 2 can be sent on arcs (2, 6) and (2, 8) at zero cost.
This network also includes arc (1, 6) because flow that reaches node 6 can be
forwarded to node 1 by decreasing x16 from its current value of 2,500.
Figure 9.13. The reachable network: its arcs appear as solid lines, and the three cheapest arcs (i, j) with i ∈ R and j ∉ R appear as dashed lines.
These nodes are shaded in Fig. 9.13 and they are labeled “R” in Table 9.3.
Note that each arc (i, j) with i ∈ R and j ∉ R has a shipping cost that is positive; if its cost were 0, node j would have been included in R. Figure 9.13 displays as dashed lines the three arcs (i, j) with i ∈ R and j ∉ R for which costs are smallest. These three arcs are (1, 4), (1, 5), and (1, 7).
Figure 9.13 suggests how to revise the shipping costs in a way that keeps
all shipping costs nonnegative, keeps the cost of the partial shipment equal
to zero, and allows the set R of reachable nodes to be enlarged. Denote as Δ
the smallest of the shipping costs on arcs (↜i, j) with i ∈ R and j ∉ R. For the
shipping costs in Table 9.3,
In this example and in general, the revised costs have the properties that
are listed below.
(a) If xij is positive, its cost cij remains equal to zero. (That occurs because R contains either both i and j or neither.)
(b) No cost becomes negative. (That occurs because each arc (i, j) for which cost cij will be reduced has cij ≥ Δ.)
(c) The cost of each arc in the reachable network remains equal to zero. (Each such arc has its head and its tail in R, so its cost is not revised.)
(d) Each arc (i, j) that attains the minimum in (16) has cost cij decreased to zero. Each such arc has i in R.
Point (a) guarantees that the cost of the partial shipping plan remains equal
to zero. Point (b) guarantees that the shipping costs remain nonnegative. Point
(c) guarantees that the arcs that had been in the reachable network remain in
that network. Point (d) allows at least one arc to be added to the reachable
network. In brief, the revised costs stay nonnegative, the cost of the partial
shipping plan stays equal to zero, and set of reachable nodes can be enlarged.
Incremental shipment
This revision reduces to zero the shipping costs on arcs (1, 4), (1, 5) and (1, 7). As Figure 9.13 attests, it becomes possible to ship some of the residual supply from node 2 to residual demand nodes 5 and 7. Shipping 1,500 units from node 2 to node 5 on the chain {(2, 6)F, (1, 6)R, (1, 5)F} satisfies the residual demand at node 5, and it reduces x26 from 2,500 to 1,000. Shipping an additional 1,000 units from node 2 to node 7 on the chain {(2, 6)F, (1, 6)R, (1, 7)F} reduces the residual demand at node 7 from 1,500 to 500 units, and it reduces x26 from 1,000 to 0.
Table 9.4 reports the current shipment costs and the partial shipment
plan that results from these shipments. A residual supply of 500 units remains
at node 2, and a residual demand of 500 units remains at node 7.
moreover, that the next revision of the shipping costs will reduce c27 to 0,
thereby allowing the remaining 500 units to be shipped directly from node 2
to node 7. Evidently, an optimal shipping plan sets
Speed
7. Review
With some modification, these three properties hold for all network flow
problems.
If you read the starred section on the Hungarian method, you learned
of an algorithm that competes with the simplex method for network flow
problems and that has polynomial worst-case behavior when it is applied to
the assignment problem.
1. Beginning with the tableau in Figure 9.12, continue pivoting until you find an optimal solution to the transportation problem whose data appear in Figure 9.6. Did you encounter any degenerate pivots?
3. (faster start for the transportation problem) The Northwest Corner rule initializes the simplex method for transportation problems, but it ignores the shipping costs. This problem illustrates one of the ways to obtain an initial spanning tree that is feasible and that accounts for the shipping costs.
4. A swimming coach has timed her five best swimmers in each of the four strokes that are part of a relay event. No swimmer is allowed to do more than one stroke. Recorded below is the amount (possibly zero) by which each swimmer's time exceeds the minimum in each stroke.

(b) Use Solver to find an optimal solution. Interpret the values that Solver assigns to the shadow prices.
Note: The next three problems relate to the transportation problem whose
data are presented in the diagram:
5. For the 4 × 6 transportation problem whose data appear above:

(a) Use the NW corner rule to find a spanning tree with which to initialize the simplex method.
(b) Then use tableaus like the one in Figure 9.9 of this chapter to execute two pivots of the simplex method.

6. For the 4 × 6 transportation problem whose data appear above:

(a) Find a partial shipping plan that exhausts the top three supplies and ships only on arcs that cost 0. Which demands are fully satisfied?
(b) Draw the analogue of Table 9.2 for this partial shipment plan.
(c) Indicate how to alter the shipping costs so as to allow the network you drew in part (b) to be enlarged.
(d) Repeat steps (b) and (c) until you have found an incremental network that includes a demand node that has residual demand. Alter the partial shipping plan to ship as much as possible to that node, while keeping the shipping cost equal to zero.
7. For the 4 × 6 transportation problem whose data appear above:

(b) Use the Options Button to help you record each basic solution that Solver encounters as it solves this linear program. Turn in a list of the basic solutions that Solver found. Were any of these solutions degenerate?
Chapter 10 contains the information about vector spaces that relates di-
rectly to linear programs. You may find that much of this information is famil-
iar, and you may find that linear programming strengthens your grasp of it.
As was noted in earlier chapters, shadow prices may not exist. The mul-
tipliers always exist. They may not be unique. Even when the multipliers are
ambiguous, they are shown to account properly for the relative opportunity
cost of each decision variable. The multipliers are also shown to guide the
simplex method as it pivots.
In this chapter, the simplex method with multipliers is used to prove the
“Duality Theorem” of linear programming. This theorem shows how each
linear program is paired with another. Several uses of the Duality Theorem
are presented in this chapter, and other uses appear in later chapters.
This chapter introduces you to the “dual” simplex pivot, and it presents
several algorithms that employ simplex pivots and dual simplex pivots. One
of these algorithms is a one-phase “homotopy” that pivots from an arbitrary
basis to an optimal basis. Another algorithm solves integer programs.
Chapter 10: Vector Spaces and Linear Programs
1. Preview
It is recalled that the column space of the matrix A is the set of all linear
combinations of the columns of A. Similarly, the row space of the matrix A
is the set of all linear combinations of the rows of A. Included in this chapter
are:
• A demonstration that different bases for the same vector space must
contain the same number of elements.
Program 10.1. Maximize z, subject to the constraints

cx − z = 0,
Ax = b,
x ≥ 0.
The data in Program 10.1 are the m × n matrix A, the m × 1 vector b and
the 1 × n vector c. Its decision variables are z and x1 through xn. The decision
variables x1 through xn are required to be nonnegative, and they are arrayed
into the n × 1 vector x. Evidently, Program 10.1 has m equality constraints,
excluding the equation that defines z as the objective value, and it has n deci-
sion variables, other than z.
Using subscripts to identify columns allows the jth column of the matrix product AB to be expressed as

(2) (AB)j = A Bj = A1 B1j + A2 B2j + ⋯ + An Bnj.

The second equation in (2) is familiar; to see why, substitute x for the column vector Bj and observe that this equation expresses Ax as a linear combination of the columns of A.
Similarly, using superscripts to identify rows allows the ith row of the matrix product AB to be written as

(3) (AB)^i = A^i B = Ai1 B^1 + Ai2 B^2 + ⋯ + Ain B^n.
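A two-line numerical check of these identities appears below; the matrices are arbitrary stand-ins.

import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
AB = A @ B

print(np.allclose(AB[:, 1], A @ B[:, 1]))   # equation (2): a column of AB
print(np.allclose(AB[0, :], A[0, :] @ B))   # equation (3): a row of AB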
A fundamental result in linear algebra is that different bases for the same
vector space must contain the same number of vectors. In earlier chapters,
this result was cited without proof. A proof appears here. This proof rests
squarely on Gauss-Jordan elimination. It begins with
Proposition 10.1. Each set of r × 1 vectors that is linearly independent contains r or fewer vectors.
Proof. Consider a set S of s vectors, each r × 1, that is linearly independent. It suffices to show that s ≤ r. Assign these s vectors the labels A1 through As and let A be the r × s matrix whose jth column is Aj for j = 1, 2, …, s. Apply Gauss-Jordan elimination to the equation Ax = 0. This equation is satisfied by setting x = 0, so Proposition 3.1 shows that Gauss-Jordan elimination must identify a set C of columns of A that is a basis for the column space of A. The number |C| of columns in this basis equals the number of rows on which pivots have occurred, and that cannot exceed the number r of rows of A. Hence, |C| ≤ r.
Aiming for a contradiction, suppose r < s. Since |C| ≤ r, at least one column of A must be excluded from the basis C for the column space, and that column is a linear combination of the set C of columns, which contradicts the hypothesis that the vectors in S are linearly independent, thereby completing a proof. ■
(a) The vector space V has a basis, and that basis contains not more than n vectors.
(b) Every basis for V contains the same number of vectors.
Proof. This set V must contain at least one vector v other than 0. Beginning with S = {v}, augment S repeatedly, as follows: If V contains a vector w that is not a linear combination of the vectors in S, replace S by S ∪ {w}. Repeat. Proposition 10.1 guarantees that this procedure stops before S contains n + 1 vectors. It stops with S equal to a set of linearly independent vectors that span V, that is, with a basis for V. This proves part (a).
For part (b), consider any two bases for V. Label the vectors in one of these bases A1 through Ar, label the vectors in the other basis B1 through Bs, and label these bases so that r ≤ s. Each vector in the second basis is a linear combination of the vectors in the first. Since Bj is a linear combination of the vectors A1 through Ar, there exist scalars C1j through Crj such that

(4.j) Bj = A1 C1j + A2 C2j + ⋯ + Ar Crj.

With A as the n × r matrix whose ith column equals Ai for i = 1, …, r and with C as the r × s matrix whose ijth entry is Cij for i = 1, …, r and j = 1, …, s, equations (4.j) and (2) give

(5) A Cj = Bj for j = 1, 2, …, s.

With B as the n × s matrix whose jth column equals Bj for j = 1, …, s, the equations in system (5) form
(6) B = AC.
The bases have been labeled so that r ≤ s. Aiming for a contradiction, suppose that r < s. In this case, the r × s matrix C has more columns than rows, so Proposition 10.1 shows that its columns are linearly dependent, hence that there exists an s × 1 vector x ≠ 0 such that Cx = 0. Postmultiply (6) by x to obtain Bx = ACx = A(Cx) = A0 = 0, which shows that the columns of B are linearly dependent. This cannot occur because the columns of B are a basis, so it cannot be that r < s. Thus, r = s, and a proof is complete. ■
Proposition 10.2 shows that each basis for a given vector space V has the
same number of vectors. The number of elements in each basis for a vector
space is called the dimension or rank of that vector space. In particular, the
number of vectors in each basis for the column space of a matrix A is known
as the column rank of A. Similarly, the number of vectors in each basis for the
row space of a matrix A is known as the row rank of A.
This is the first of two sections in which matrices are used to describe
pivots. In the current section, an example is used to illustrate properties that
hold in general.
A familiar example
The data of this example are the matrix A and the vector b that are given by

(7)   A =  [  2   4  −1   8 ]        b =  [   4  ]
           [  1   2   1   1 ]             [   1  ]
           [  0   0   2  −4 ]             [ −4/3 ]
           [ −1   1  −1   1 ]

                                          [   0  ]

The initial tableau for this example is the array

(8)   [A, b] =  [  2   4  −1   8    4  ]
                [  1   2   1   1    1  ]
                [  0   0   2  −4  −4/3 ]
                [ −1   1  −1   1    0  ]
Equation (8) omits the column headings: Reading from left to right, these
column headings are x1 through x4 and RHS. The entries in (8) are identical to
the entries in cells B2:F5 of Table 3.1.
(9)   [Ā, b̄] =  [ 1  0  0  5/3    1  ]
                [ 0  0  1  −2   −2/3 ]
                [ 0  0  0   0     0  ]
                [ 0  1  0  2/3   1/3 ]
The tableau in equation (9) is basic: The variables x1, x2 and x3 are basic for rows 1, 4 and 2, respectively, and row 3 is trite. Columns A1, A2 and A3 are a basis for the column space of A.
A pivot matrix
In Chapter 3, the first pivot made x1 basic for the 1st row of the tableau. This pivot changed the entries in the 1st column of (8) from [2 1 0 −1]T to [1 0 0 0]T. It did so by altering the rows of the tableau in (8) in these ways:

• Row (1) was replaced by itself times (1/2).

• Row (2) was replaced by itself plus row (1) times (−1/2).

• Row (3) was replaced by itself plus row (1) times (0).

• Row (4) was replaced by itself plus row (1) times (1/2).
It will be seen that the effect of this pivot is to premultiply [A, b] by the
4 × 4 pivot matrix P(1) that is specified by

(10)   P(1) =  [  1/2   0   0   0 ]
               [ −1/2   1   0   0 ]
               [   0    0   1   0 ]
               [  1/2   0   0   1 ]
To see that the matrix product P(1)A has the desired effect, note that

       P(1) A1 =  [  1/2   0   0   0 ] [  2 ]     [ 1 ]
                  [ −1/2   1   0   0 ] [  1 ]  =  [ 0 ]
                  [   0    0   1   0 ] [  0 ]     [ 0 ]
                  [  1/2   0   0   1 ] [ −1 ]     [ 0 ]

Premultiplying the entire tableau by P(1) executes the pivot:

(11)   P(1) [A, b] =  [ 1   2   −1/2    4     2  ]
                      [ 0   0    3/2   −3    −1  ]
                      [ 0   0     2    −4   −4/3 ]
                      [ 0   3   −3/2    5     2  ]
A variable has become basic for the 1st row of the tableau, for which reason only the 1st column of P(1) differs from the corresponding column of the identity matrix.
The second pivot makes x3 basic for row (2) of the tableau in (11). This pivot changes the entries in the 3rd column of (11) from [−1/2 3/2 2 −3/2]T to [0 1 0 0]T. It is executed by premultiplying the tableau in (11) by the matrix P(2) that is given by

(12)   P(2) =  [ 1   1/3   0   0 ]
               [ 0   2/3   0   0 ]
               [ 0  −4/3   1   0 ]
               [ 0    1    0   1 ]
Only the 2nd column of P(2) differs from the identity matrix. Equations
(10) and (12) illustrate a property that holds in general and is highlighted
below:
The pivot matrix for a pivot on a coefficient in the kth row differs from
the identity matrix only in its kth column.
Spreadsheet computation
These pivot matrices can be created on a spreadsheet, and the matrix multiplications can be done with Excel; Table 10.1 indicates how.
This example will next be used to illustrate two properties that are shared
by every sequence of pivots.
An observation
In equation (13), the 3rd column of Q(3) equals I3, the 3rd column of the identity matrix. That is no accident. No variable has been made basic for the 3rd row of the tableau, for which reason the 3rd columns of the pivot matrices P(1), P(2) and P(3) equal I3, so repeated use of equation (2) shows that the 3rd column of Q(3) equals I3. In general:
If none of the first p pivots occur on an element in row k, the kth column
of Q(p) equals Ik.
A second observation
Equation (9) shows that row (3) of the tableau [Ā, b̄] = Q(3)[A, b] is trite; it consists entirely of 0's. Equation (3) shows that Ā3 = Q(3)3 A, where Ā3 denotes the 3rd row of Ā and Q(3)3 denotes the 3rd row of Q(3). Hence, from (13), we see that

(15)   [0, 0, 0, 0] = Ā3 = Q(3)3 A = (2/3)A1 + (−4/3)A2 + 1A3.
If a sequence of pivots causes the kth row Āk to equal 0, then no pivot has occurred on any coefficient in row k, and Ak is a linear combination of the rows of A on which pivots have occurred.
After any number p (including 0) of pivots have occurred, the initial tableau is transformed into a current tableau, which is denoted [Ā, b̄].
A pivot matrix
Consider a pivot on the coefficient Āij that lies in the ith row and jth column of [Ā, b̄]. This coefficient must be nonzero. It will be seen that this pivot is executed by premultiplying the array [Ā, b̄] by the m × m matrix P(p + 1) that differs from the identity matrix only in its ith column and is given by
(19)   P(p + 1) =  [ 1   ···   −Ā1j/Āij   ···   0 ]
                   [ ·    ·        ·            · ]
                   [ 0   ···     1/Āij    ···   0 ]
                   [ ·             ·       ·    · ]
                   [ 0   ···   −Āmj/Āij   ···   1 ]

in which the fractions occupy the ith column and the entry 1/Āij lies in row i.
If the matrix product P(p + 1)[Ā, b̄] is to make xj basic for row i, its jth column must have a 1 in row i and 0's elsewhere. Substitution confirms that it does.
A sequence of pivots
Let us consider the effect of beginning with the tableau [A, b] and executing any finite sequence of pivots. If p pivots have occurred so far, the initial tableau [A, b] has been transformed into the current tableau

(20)   [Ā, b̄] = Q(p)[A, b],

where

(21)   Q(p) = P(p) P(p − 1) · · · P(1)

and where P(j) is the pivot matrix for the jth pivot.
The example in the prior section suggests that if none of the first p pivots occurred on a coefficient in row k, then the kth column of Q(p) equals the kth column of the identity matrix. That example also suggests that if the kth row of the matrix Ā that results from the first p pivots consists entirely of 0's, then the kth row of A is a linear combination of the rows of A on which pivots have occurred. These suggestions are shown to be accurate by parts (a) and (b) of
Proposition 10.3. Equations (20) and (21) describe the tableau [Ā, b̄] that results from any finite number p of pivots on an initial tableau [A, b]. Denote as R the set of rows on which these p pivots have occurred. Then:

(a) For each row k that is not in R, the kth column of Q(p) equals the kth column Ik of the identity matrix.

(b) For each row k such that Āk = 0, the row Ak is a linear combination of the rows Ai of A for which i ∈ R.
Since k ∉ R, part (a) of this proposition gives Q(p)kk = 1. Applying part (a) to each row i ≠ k that is not in R gives Q(p)ki = 0. Thus, from (22),

(23)   0 = Ak + Σi∈R Q(p)ki Ai .
Since |R| = |C| , we have shown that every matrix A has row rank that
does not exceed its column rank. The row rank of the transpose of A equals
the column rank of A, so it must be that every matrix A has row rank equal to
its column rank, which proves part (c). Moreover, since {Ai : i ∈ R} spans
the row space of A, it cannot be that {Ai : i ∈ R} is linearly dependent, as
this would imply that the row rank of A is less than |R|. Hence, {Ai : i ∈ R}
is a basis for the row space of A, which proves part (b). ■
Proposition 10.4 demonstrates that the row rank of each matrix equals its column rank. This justifies a definition: the rank of a matrix is the number of vectors in any basis for its column space or in any basis for its row space.
The Full Rank proviso was employed in Chapter 4. In that chapter, a lin-
ear program was said to satisfy the Full Rank proviso if any basic tableau for
its Form 1 representation has a basic variable for each row. Program 10.1 is
written in the format of Form 1.
Proposition 10.5. Program 10.1 satisfies the Full Rank proviso if and only if the rows of A are linearly independent.
Proof. The constraints of Program 10.1 are the equations cx − z = 0 and Ax = b and the nonnegativity requirements x ≥ 0. The equations form the linear system
        [ c  1 ] [  x ]     [ 0 ]
        [ A  0 ] [ −z ]  =  [ b ] .

The coefficient matrix of this linear system is

        F =  [ c  1 ]
             [ A  0 ] .
Suppose the Full Rank proviso is satisfied. A basic solution exists that has m + 1 basic variables, so the column space of F has a basis of m + 1 columns. Proposition 10.2 shows that every basis for this column space contains m + 1 vectors, hence that the rank of F equals m + 1 and that the rank of A equals m.

Suppose the rank of A equals m. The row rank of F must equal m + 1, so every basic tableau must have a basic variable for each row. ■
Many readers will recall that the m × m matrices B and C are each oth-
ers’ inverses if BCâ•›=â•›I. Some readers will recall that the preceding statement is
a theorem – an implication of a more primitive definition of the “inverse” of
a matrix.
The more primitive definition is this: the m × m matrix B is called invertible if there exist m × m matrices C and D that satisfy

(24)   CB = BD = I.
If B is invertible, then (24) and the fact that matrix multiplication is associative give

(25)   C = CI = C(BD) = (CB)D = ID = D,

so C = D, and (24) becomes

(26)   CB = BC = I.
Equation (25) also shows that at most one matrix C can satisfy (26).
If B is invertible, the unique matrix C that satisfies (26) is called the in-
verse of B. The inverse of B, if it exists, is denoted B−1. Not every square ma-
trix is invertible; if a row of B consists entirely of 0’s, the matrix B cannot be
invertible, for instance.
Proposition 10.6.
Proof. Part (a) has been proved. For part (b), the fact that matrix multiplication is associative is used.
Pivot matrices
Each pivot matrix is invertible. The pivot matrix P of equation (19) acts on a tableau whose jth column has the entries Ā1j through Āmj, and its inverse is the matrix R that differs from the identity matrix only in its ith column, which holds those entries:

(28)   R =  [ 1   ···   Ā1j   ···   0 ]
            [ ·    ·     ·          · ]
            [ 0   ···   Āij   ···   0 ]
            [ ·           ·    ·    · ]
            [ 0   ···   Āmj   ···   1 ]
Permutation matrices
Proof. It will be demonstrated that (a) ⇒ (b) ⇒ (c) ⇒ (d) ⇒ (e) ⇒ (a).
Proposition 10.8 (theorem of the alternative for linear systems). For each m × n matrix A and each m × 1 vector b, exactly one of the following alternatives holds:

(a) There exists an n × 1 vector x such that Ax = b.

(b) There exists a 1 × m vector y such that yA = 0 and yb ≠ 0.
Proof. The proof will show that if (a) holds, (b) cannot, and that if (a) does not hold, (b) must.
(a) implies not (b). By hypothesis, there exists an n × 1 vector x such that Ax = b. Aiming for a contradiction, suppose that (b) also holds, so there exists a 1 × m vector y such that yA = 0 and yb ≠ 0. Premultiply Ax = b by y to obtain yAx = yb ≠ 0. Postmultiply yA = 0 by x to obtain yAx = 0x = 0. This establishes the contradiction 0 = yAx ≠ 0. Thus, if (a) holds, (b) cannot.
Not (a) implies (b). By hypothesis, there exists no n × 1 vector x such that Ax = b. Application of Gauss-Jordan elimination to the array [A, b] must result in an inconsistent row. Proposition 10.3 shows that the resulting tableau is Q[A, b] = [QA, Qb] for some matrix Q. This tableau has an inconsistent row, say, the ith row. From (3), we see that QiA = 0 and that Qib ≠ 0, so (b) holds with y = Qi. ■
Who cares?
A point of logic
It’s easy to stumble when trying to prove that two or more statements are
equivalent. Proposition 10.8 will be used to illustrate the pitfall, along with
a foolproof way to avoid it. Proposition 10.8 asserts that the two statements listed below are equivalent:

• Statement (a) holds.

• Statement (b) does not hold.

This raises a point of logic. Listed below are four implications, each of which can be part of a demonstration that the above two conditions are equivalent. Here and throughout, “⇒” means “implies” and “⇐” means “is implied by”:

• (a) ⇒ not (b).

• (a) ⇐ not (b).

• not (a) ⇒ (b).

• not (a) ⇐ (b).
Suppose r > m + 1. In this case, the set consisting of the vectors (v2 − v1), (v3 − v1), …, (vr − v1) consists of at least m + 1 vectors in R^m. Proposition 10.1 shows that these vectors are linearly dependent, so there exist numbers d2 through dr, not all of which equal zero, such that

        0 = Σ_{i=2}^r di (vi − v1).

Define d1 by d1 = −Σ_{i=2}^r di, and note that

        0 = Σ_{i=1}^r di vi ,        0 = Σ_{i=1}^r di .
Not all of d1 through dr equal zero, and they sum to 0, so at least one of them is positive. Define R by

(29)   R = min { ci/di : di > 0 } ,

and set

        ei = ci − R di   for i = 1, …, r.
11. Review
The pivot matrices that are introduced in this chapter will play a key role
in Chapter 11. A generalization of the theorem of the alternative in this chap-
ter will play a key role in Chapter 12.
        A =  [  2   4  −1 ]        b =  [ 8 ]
             [  1   2   1 ]             [ 1 ]
             [ −1   1  −1 ]             [ 1 ]

Parts (a)-(c) of this problem ask you to adapt the Excel computation in Table 10.1 to the above data.
(a) For these data, use pivot matrices to execute Gauss-Jordan elimina-
tion, pivoting at each opportunity on the left-most nonzero coefficient
in the lowest-numbered row for which a basic variable has not yet
been found.
3. Table 10.1 specifies pivot matrices P(1), P(2) and P(3) that transform (8)
into (9). Use equation (28) to write down the inverse of P(1), of P(2) and
of P(3).
        B =  [ 1  1  0  0 ]
             [ 0  0  1  1 ]
             [ 1  0  0  4 ]
             [ 0  0  0  2 ]
(b) Without doing any further calculation, determine whether or not the
columns of B are linearly independent.
(d) On the same spreadsheet, compute JT and JTQ. Remark: Excel has an
array function that computes the transpose of a matrix.
        J =  [ 0 0 1 0 0 ]        Ĵ =  [ 0 0 1 0 0 ]
             [ 0 0 0 0 1 ]             [ 1 0 0 0 0 ]
             [ 0 0 0 1 0 ]             [ 0 0 0 1 0 ]
             [ 1 0 0 0 0 ]             [ 0 1 0 0 0 ]
             [ 0 1 0 0 0 ]             [ 0 0 0 0 1 ]
(b) Draw a directed network that includes nodes 1 through 5 and directed arc (i, j) if and only if Jij = 1. Use this network to determine the smallest positive integer n for which Jn = I. (Here, J3 = J J J, for instance.)

(d) What is the smallest positive integer n such that every 5 × 5 permutation matrix J has Jn = I? Why?
9. Let a, b, c and d be any numbers such that (ab − cd) ≠ 0. Show that the 2 × 2 matrices given below are each others' inverses.

        [ a  c ]          1       [  b  −c ]
        [ d  b ] ,    (ab − cd)   [ −d   a ] .
11. Is it possible to locate four ×’s and one * on a sheet of paper in such a way
that the * is a convex combination of the four ×’s but is not a convex com-
bination of fewer than four of the ×’s? If not, why not?
Chapter 11: Multipliers and the Simplex Method
1. Preview
fill in the details later. Proofs of the propositions in this chapter are starred
because of their lengths.
Program 11.1. Maximize z, subject to the constraints

(1.0)   c1 x1 + c2 x2 + · · · + cn xn − z = 0,

(1.i)   Ai1 x1 + Ai2 x2 + · · · + Ain xn = bi   for i = 1, 2, …, m,

(1.m+1)   x1 ≥ 0, x2 ≥ 0, …, xn ≥ 0.
• The integer m is the number of equations, other than the one that de-
fines z as the objective value.
The data for Program 11.1 array themselves into the initial tableau (ma-
trix) that is depicted below.
(2)    row 0   [ c1   c2   ···  cn   1   0  ]
       row 1   [ A11  A12  ···  A1n  0   b1 ]
       row 2   [ A21  A22  ···  A2n  0   b2 ]
         ·        ·    ·          ·   ·    ·
       row m   [ Am1  Am2  ···  Amn  0   bm ]
As this notation suggests, the top-most row of the initial tableau is called
row 0, and the others are called row 1 through row m.
The numbers in the 1st column of the initial tableau are the coefficients
of x1 in equations (1.0) through (1.m), the numbers in the 2nd column are the
coefficients of x2 in these equations, and so forth, through the nth column.
The numbers in the next-to-last column multiply –z, and the final column
consists of the RHS values.
Admissible pivots
When applied to the initial tableau, the simplex method executes a se-
quence of pivots. Each of these pivots occurs on a nonzero coefficient of a
decision variable in some row other than row 0. Any pivot that occurs on
a non-zero coefficient of some variable in some row iâ•›≥â•›1 is now said to be
an admissible pivot. Each simplex pivot is admissible, but admissible pivots
need not be simplex pivots.
A tableau that can result from any finite sequence of admissible pivots is now called a current tableau. Each current tableau is depicted as

(3)    row 0   [ c̄1   c̄2   ···  c̄n   1   b̄0 ]
       row 1   [ Ā11  Ā12  ···  Ā1n  0   b̄1 ]
         ·        ·    ·          ·   ·    ·
       row m   [ Ām1  Ām2  ···  Āmn  0   b̄m ]
Bars atop the entries in (3) record the fact that they can differ from the
numbers in the corresponding positions of (2). The next-to-last column of
tableau (3) equals that of (2) because no admissible pivot occurs on a coef-
ficient in row 0. The entry in the upper-right-hand corner of tableau (3) is
denoted b̄0, which need not equal 0 because each admissible pivot replaces
row 0 by itself less a constant times some other row.
Matrix notation
The data in the initial and current tableaus group themselves naturally
into the matrices and vectors in:
(4)    [ c  1  b0 ]        [ c̄  1  b̄0 ]
       [ A  0  b  ]   and  [ Ā  0  b̄  ] ,

in which b and b̄ are the m × 1 vectors

        b =  [ b1 ]        b̄ =  [ b̄1 ]
             [ b2 ]              [ b̄2 ]
             [  · ]              [  · ]
             [ bm ]              [ b̄m ] ,
and the RHS value b0 of the initial tableau equals 0. Finally, the “0’s” in equa-
tion (4) are mâ•›×â•›1 vectors of 0’s.
A relationship
Proposition 11.1. Consider any current tableau.

(a) There exist at least one 1 × m vector y and at least one m × m matrix Q such that

(5)   [c̄, b̄0] = [c, 0] − y [A, b],

(6)   [Ā, b̄] = Q [A, b].
(b) If the rank of A equals m, equations (5) and (6) have unique solutions
y and Q. If the rank of A is less than m, equations (5) and (6) have
multiple solutions.
• Row 0 of the current tableau equals row 0 of the initial tableau less a
linear combination of rows 1 through m of the initial tableau.
The (p + 1)st pivot occurs on a nonzero coefficient Āij in row i ≥ 1 of the current tableau. Row i is multiplied by the constant 1/Āij, so it remains a linear combination of rows 1 through m of the initial tableau. Each of the
other rows is replaced by itself less some constant times row i of the current
tableau. Thus, each row is replaced by itself less a linear combination of rows
1 through m of the initial tableau. As a consequence, a revised matrix Q and
a revised vector y satisfy the highlighted properties after p + 1 pivots have occurred. This completes an inductive proof of part (a).
For part (b), we first consider the case in which the rank of [A, b] is less
than m. The rows of [A, b] are linearly dependent, so there exists a nonzero
1 × m vector w such that wA = 0 and wb = 0. Replacing y by (y + w) preserves a solution to (5). Similarly, with W as the m × m matrix each of whose rows equals w, replacing Q by (Q + W) preserves a solution to (6). Hence, the solutions to (5) and (6) cannot be unique.
Let us consider the case in which the rank of A equals m. The rows of A are
linearly independent. Solutions y and ỹ to (5) satisfy (y − ỹ)A = 0, and, since
the rows of A are linearly independent, this guarantees y = ỹ. Similarly, solu-
tions Q and Q̃ to (6) satisfy (Q − Q̃)A = 0. For each i, we have (Q − Q̃)i A = 0.
The fact that the rows of A are linearly independent (again) guarantees Qi = Q̃i
for each i, so that Q = Q̃. This completes a proof of part (b).
It was demonstrated in Proposition 10.5 that the Full Rank proviso holds
if and only if the rank of A equals m. Thus, y and Q are unique if the Full Rank
proviso is satisfied, and they are not unique if it is violated.
Multipliers
Each vector y that satisfies (5) is said to be a set of multipliers for the cur-
rent tableau, and yi is called the multiplier for the ith constraint. To see where
the multipliers get their name, we rewrite equation (5) as

(7)   [c̄, b̄0] = [c, 0] − Σ_{i=1}^m yi [Ai, bi],

where Ai denotes the ith row of A. Equation (7) contains the same information as does equation (5); it states that c̄ = c − yA and that b̄0 = −yb. Equation (7) can be read as:
The top row of the current tableau equals the top row of the initial tab-
leau less the sum over each iâ•›≥â•›1 of the constant (multiplier) yi times the
ith row of the initial tableau.
An existential result
Proposition 11.1 guarantees that there exist at least one 1 × m vector y and at least one m × m matrix Q that satisfy equations (5) and (6). Equations (5) and (6) can be written succinctly as

(8)    [ c̄  1  b̄0 ]     [ 1  −y ] [ c  1  0 ]
       [ Ā  0  b̄  ]  =  [ 0   Q ] [ A  0  b ] .
Please pause to check that equation (8) contains exactly the same infor-
mation as do equations (5) and (6), for instance, that (8) gives
c̄ = 1c − yA = c − yA.
Denote as Q̃ the (m + 1) × (m + 1) matrix

(9)   Q̃ =  [ 1  −y ]
            [ 0   Q ] .

Part (b) of Proposition 11.1 shows that Q̃ is unique if and only if the rank of A equals m.
Consider next a pivot that occurs on the nonzero coefficient Āij of the current tableau and makes xj basic for row i. In terms of the (m + 1) × (m + 1) arrays in equation (8), this pivot premultiplies the current tableau by the matrix P̃ that is given by

(10)   P̃ =  [ 1   0   ···   −c̄j/Āij    ···   0 ]
             [ 0   1   ···   −Ā1j/Āij   ···   0 ]
             [ ·   ·    ·        ·            · ]
             [ 0   0   ···     1/Āij    ···   0 ]
             [ ·   ·         ·           ·    · ]
             [ 0   0   ···   −Āmj/Āij   ···   1 ]

in which the fractions occupy the column that corresponds to row i of the tableau.
Note, from equations (8) and (9), that the effect of this pivot is to replace
Q̃ by the matrix product P̃ Q̃. This matrix product may look messy. It will
turn out to have a simple interpretation, however. What remains of P̃ after
removal of its top row and left-most column is the (familiar) mâ•›×â•›m matrix P
that is given by
(11)   P =  [ 1   ···   −Ā1j/Āij   ···   0 ]
            [ ·    ·        ·            · ]
            [ 0   ···     1/Āij    ···   0 ]
            [ ·             ·       ·    · ]
            [ 0   ···   −Āmj/Āij   ···   1 ] .

With Ii as the ith row of the m × m identity matrix, the (m + 1) × (m + 1) matrix P̃ partitions itself as

(12)   P̃ =  [ 1   (−c̄j/Āij) Ii ]
             [ 0        P       ] ,
With Q̃ given by (9) and P̃ given by (12), the matrix product P̃Q̃ partitions itself as

(13)   P̃Q̃ =  [ 1   (−c̄j/Āij) Ii ] [ 1   −y ]     [ 1   −y − (c̄j/Āij) Qi ]
              [ 0        P       ] [ 0    Q ]  =  [ 0          PQ         ] .
This discussion is summarized by the updates

(14)   y ← y + (c̄j/Āij) Qi ,

(15)   Q ← PQ.
Thus, starting with Q = I and y = 0 and updating Q and y with each pivot results in a matrix Q and a vector y that satisfy (5) and (6) for the current basis. This occurs even if the Full Rank proviso is violated.
Let us recall from Chapter 5 that relative opportunity cost is relative to the
current plan. In a linear program, each basis (set of basic variables) is a plan,
and the relative opportunity cost of doing something equals the decrease in
profit (increase in cost) that occurs if the resources needed to do that thing
are freed up and the values of the basic variables are adjusted accordingly.
Proposition 11.3 (below) shows that the multipliers can be interpreted as
break-even prices even when they are not unique.
Proposition 11.3. Consider any basic tableau for Program 11.1, and let y be any vector of multipliers for it, that is, any solution to (5).

(a) The basic solution for this tableau has yb as its objective value.

(b) Suppose the vector d lies in the column space of A. If b is replaced by (b + d), the tableau remains basic, and its basic solution's objective value changes by yd.

(c) Suppose Program 11.1 is a maximization problem. For each j, the relative opportunity cost of the resources needed to set xj = 1 equals yAj.

(d) Row i has a unique multiplier yi if and only if the column space of A contains the m × 1 vector Ii (which has 1 in its ith position and has 0's in all other positions).
Proof*. For part (a), we note that the basic solution to this tableau equates −z to b̄0, so (5) gives −z = −yb and z = yb.

For part (b), consider a vector y and matrix Q for which (5) and (6) hold with the RHS vector b. Let us replace b by (b + d) and then repeat the pivot sequence that led to the current tableau. This has no effect on y or Q. It has no effect on Ā = QA or on c̄ = c − yA. Each variable xj that was basic remains basic. The vector b̄ = Qb is replaced by Q(b + d), and the number b̄0 = −yb is replaced by −y(b + d). By hypothesis, b and d are in the column space of A, so any equation that was trite remains trite. The tableau remains basic, and (5)-(6) continue to hold, which proves part (b).
For part (c), note that removing the resources needed to set xj = 1 replaces b by (b − Aj), so part (b) shows that the basic solution's objective changes by y × (−Aj) = −yAj. In a maximization problem, profit decreases by yAj. This proves part (c).
For part (d), we first consider the case in which the column space Vc of A contains the vector Ii. The sum of two vectors in Vc is in Vc, so the vector (b + Ii) is in Vc, and part (b) shows that changing the RHS vector from b to (b + Ii) changes the basic solution's objective by yIi = yi. This demonstrates that yi is unique.
Now consider the case in which the column space Vc does not contain Ii. Since b is in Vc, there does exist an n × 1 vector x such that b = Ax. Since Ii is not in Vc, no solution exists to Az = Ii. Proposition 10.8 shows that there does exist a row vector v such that vA = 0 and vIi ≠ 0. Premultiply b = Ax by v to obtain vb = vAx = (vA)x = 0. We have seen that vA = 0, that vb = 0 and that 0 ≠ vIi = vi. Each vector y that satisfies (5) is a set of multipliers for the current tableau. With y as such a vector, note that (y + v) also satisfies (5), hence is a set of multipliers. These multipliers satisfy (y + v)Ii = yi + vi ≠ yi because vi ≠ 0. Hence, the multiplier for the ith constraint cannot be unique. This completes a proof. ■
Proposition 11.3 shows that the vector y of multipliers plays the role of break-even prices in these ways:
• The equation z = yb shows that the multipliers are break-even prices for the entire bundle b of resources.
• For a maximization problem, the equation c̄j = cj − yAj shows that yAj
is the decrease in profit that occurs if the resources needed to set xj = 1
are set aside.
Shadow prices
Consider any basic solution to Ax = b. The multiplier yi for the ith constraint is unique if and only if yi is the shadow price for the ith constraint.
Solver and Premium Solver report shadow prices whether or not they
exist. What these codes are actually reporting is a set of multipliers for the
final basis. Calling these multipliers “shadow prices” follows a long-standing
tradition.
To illustrate what occurs when the Full Rank proviso is violated, we present to Premium Solver the linear program:

        Maximize {4x1 + 2x2 + 4x3}, subject to the constraints

                3x1 + 2x2 + 1x3 = 4,
                6x1 + 4x2 + 2x3 = 8,
                1x1 + 2x2 + 3x3 = 4,
                x1 ≥ 0,  x2 ≥ 0,  x3 ≥ 0.
This linear program has three equality constraints. Its 2nd constraint is a
linear multiple of its 1st constraint. Perturbing the RHS of either of these two
constraints renders the linear program infeasible. Neither can have a shadow
price. When Premium Solver is presented with this linear program, it reports
an optimal solution that sets
x1 = 1, x2 = 0, x3 = 1,
and it reports an optimal value z* = 8. Its sensitivity analysis reports that re-
duced costs of x1, x2 and x3 equal 0, –2, and 0, respectively. Its sensitivity analy-
sis also reports the shadow prices and ranges that appear in
The “shadow prices” for rows 1 and 2 are actually multipliers. To double-check that the vector y = [0  0.5  1] of multipliers does satisfy equation (5), we substitute and obtain

        c̄ = [4  2  4] − (0.5)[6  4  2] − (1.0)[1  2  3] = [0  −2  0],

        z* = (0)(4) + (0.5)(8) + (1.0)(4) = 8,
both of which are correct.
Table 11.1 reports that the RHS value of row 1 has 0 as its Allowable In-
crease and its Allowable Decrease because perturbing the RHS value of row
1 renders the linear program infeasible. The same is true for row 2. Premium Solver (correctly) reports that the basic solution remains feasible when the RHS value of row 3 is increased by as much as 8 and when it is decreased by as much as 2 2/3. Row 3 does have a shadow (break-even) price, and it does apply to changes in the RHS value of the 3rd constraint that lie between −2 2/3 and +8.
At the time this book is being written, the Sensitivity Report issued by Solver differs from that in Table 11.1. Solver reports correct values of the
multipliers. For this example (and others that violate the Full Rank proviso),
Solver reports incorrect ranges of those RHS values that cannot be perturbed
without rendering the linear program infeasible.
Final words
The fact that multipliers are break-even prices suggests that they ought to
play a key role in the simplex method. Yet the multipliers are all but invisible
in the tableau-based simplex method that was presented in Chapter 4. The
next section of this chapter shows that the multipliers are crucial to a version
of the simplex method that is better suited to solving large linear programs.
The tableau-based simplex method is a great way to learn how the sim-
plex method works, and it is a fine way in which to solve linear programs that
have only a modest number of equations and decision variables. For really
large linear programs, there is a better way.
Described in this section is the version of the simplex method that is im-
plemented in several commercial codes. This method was originally dubbed
the revised simplex method1, but it has long been known as the simplex
method with multipliers. As its name suggests, this method uses the mul-
tipliers to guide the simplex method as it pivots. The simplex method with
multipliers has two main advantages:
1 George B. Dantzig and William Orchard-Hays, “Notes on linear programming: Part V – alternative algorithm for the revised simplex method using product form for the inverse,” RM 1268, The RAND Corporation, Santa Monica, CA, November 19, 1953.
A nonterminal iteration
3. Find a row i whose ratio b̄i/Āij is closest to zero, among those rows having Āij > 0.

4. Replace y by itself plus the multiple c̄j/Āij of the ith row Qi of Q. Then, with the pivot matrix P given by (11), replace Q by PQ. Return to Step 1.
Eventually, after enough pivots have occurred, the round-off error will
have accumulated to the point at which it can no longer be dealt with. When
that occurs, it becomes necessary to begin again – to “pivot in” the current
basis, and then restart the simplex method. This too is easier to accomplish
using the simplex method with multipliers.
Column generation
Commonly-used notation
This subsection deals with the case in which Program 11.1 satisfies the Full Rank proviso: Thus, the rank of A equals m, the equation Ax = b is consistent, and each basic tableau has one basic variable per row.
Let us consider any basic tableau that might be encountered by the sim-
plex method. The variable –z is basic for row 0, and rows 1 through m have
basic variables. These basic variables are used to identify a function β, an
m × m matrix B and a 1 × m vector cB by the following procedure: For i = 1, 2, …, m:

• The decision variable xβ(i) is basic for row i of this tableau.

• The ith column of the m × m matrix B is the column Aβ(i).

• The 1 × m vector cB has cβ(i) as its ith entry.
In the literature, the matrix B that is prescribed by these rules is called a ba-
sis matrix. The ith column of B is the column Aβ(i) of coefficients of the variable
xβ(i) that is basic for row i. The matrix B is square (because the Full Rank proviso
is satisfied), and B is invertible (because its columns are linearly independent).
An example
To illustrate this notation, we reconsider the linear program that was used
in Chapter 4 to introduce the simplex method. Table 11.2 reproduces its ini-
tial and final tableaus. Its decision variables (previously x, y and s1 through s4)
are now labeled x1 through x6, however. This example satisfies the Full Rank
proviso because the variables x3 through x6 are basic for rows 1 through 4 of
the initial tableau.
The tableau in rows 10-14 of Table 11.2 is basic. The variables that are
basic for rows 1 through 4 of this tableau are x3, x1, x5 and x2, respectively. For
each i, β(i) identifies the variable that is basic for row i of this tableau, and
        cB = [ c3  c1  c5  c2 ] = [ 0  2  0  3 ],

(17)   B = [ A3  A1  A5  A2 ] =  [ 1   1   0   0 ]
                                 [ 0   1   0   1 ]
                                 [ 0   0   1   2 ]
                                 [ 0  −1   0   3 ] .
The inverse of B is

(18)   B−1 =  [ 1  −3/4   0   1/4 ]
              [ 0   3/4   0  −1/4 ]
              [ 0  −1/2   1  −1/2 ]
              [ 0   1/4   0   1/4 ] .
This matrix B−1 appears in Table 11.2. To see why, recall that x3 through x6
are the slack variables for the original tableau.
If the Full Rank proviso is satisfied, equations (5) and (6) are satisfied
by a unique vector y and a unique matrix Q. Their relationship to each basic
tableau’s basis matrix B and to its vector cB are the subject of
Proposition 11.4. Suppose that Program 11.1 satisfies the Full Rank proviso, and consider any current tableau that is basic. The matrix Q and vector y that satisfy (5) and (6) are

(19)   y = cB B−1,

(20)   Q = B−1,

and the current tableau's basic solution has objective value z that is given by

(21)   z = cB B−1 b.
Proof. The variable xβ(i) is basic for row i, so the reduced cost c̄β(i) of this variable equals 0. Thus, (5) gives cβ(i) = yAβ(i). Since Aβ(i) = Bi, we have demonstrated that cβ(i) = yBi. This equation holds for each i, so cB = yB. Postmultiply this equation by B−1 to obtain cBB−1 = y. This verifies (19). Equations (20) and (21) are immediate from (6), (5) and the fact that the basic solution equates −z to the RHS value b̄0 of row 0. This completes a proof.
When the Full Rank proviso is satisfied, the matrix Q and the vector y satisfy Q = B−1 and y = cB B−1.
Suppose, however, that the Full Rank proviso is not satisfied. The rank
of A is less than m, so each basis for the column space of A consists of fewer
than m columns. The “basis matrix” has fewer columns than rows. It is not
square, and it cannot have an inverse. Results that are stated in terms of B−1
cannot be valid. These results become correct, however, when B−1 is replaced
by any matrix Q that satisfies (6) and when cBB−1 is replaced by any vector y
that satisfies (5). It is highlighted:
When the Full Rank proviso is violated, results that are stated in terms of
B−1 and cBB−1 become correct when Q replaces B−1 and y replaces cBB−1.
In brief, the more standard development coalesces with ours when B−1 is
replaced by the product Q of the pivot matrices that led to the current tableau
and when cBB−1 is replaced by the vector y of multipliers.
7. Review
Proposition 11.1 relates the current tableau to the initial tableau via a row
vector y and a square matrix Q that satisfy equations (5) and (6). This vector
y and matrix Q are unique if and only if the Full Rank proviso is satisfied.
Proposition 11.2 shows how to compute solutions y to (5) and Q to (6) recur-
sively, by accounting for each pivot. Proposition 11.3 shows how the vector y
of multipliers plays the role of break-even prices, even when y is not unique.
Proposition 11.4 relates the development in this chapter to the more typical
one, in which the Full Rank proviso is satisfied. In addition, the multipliers
have been shown to be key to the “revised” simplex method, which is better-
suited to solving large linear programs.
In concert, the results in this chapter show that the multipliers play a
crucial role in linear programming. In Chapter 12, it will be seen that the
multipliers play yet another role – they are the decision variables in a second
linear program, which is known as the “dual.”
1. Suppose that the equation Ax = b is consistent and that its ith row includes a slack variable (that converted an inequality into an equation).

(a) Show that the system Ax = b remains consistent when the RHS value of the ith constraint is perturbed.
5. Consider a linear program that is written in Form 1 and for which the sum of the rows of A equals the 1 × n vector 0 = (0 0 … 0). Does this linear program have shadow prices? Support your answer.
6. Cells D11:G14 of Table 11.2 contain the inverse of the matrix B that is given by (17). This is not an accident. Why? Hint: cells D4:G7 of the same table give the entries in the matrix I = BB−1, and the product Q of the pivot matrices that produce the tableau in Table 11.2 equals B−1.
(a) Does this linear program satisfy the Full Rank proviso?
(b) In each basic tableau, where can the shadow prices be found?
(c) In each basic tableau, where can the inverse of the basis matrix be
found?
8. This problem concerns the linear program that appears below. Which of its constraints have shadow prices and which do not? Support your answer.

        x1 ≥ 0,  x2 ≥ 0,  x3 ≥ 0.
9. Consider a basic tableau for a linear program that is written in the format: Minimize cx, subject to the constraints Ax ≥ b and x ≥ 0. Where can the shadow prices for this tableau be found? Support your answer.
Chapter 12: Duality
1. Preview
In Chapter 11, each current tableau was seen to have at least one vector y of multipliers that determine its vector c̄ of reduced costs and its objective value z via c̄ = c − yA and z = yb. It was also noted that these multipliers, if unique, are the shadow prices. A method was presented for computing a vector y of multipliers, whether or not they are unique.
Program 12.1. Maximize {cx}, subject to the constraints

        Ax = b,
        x ≥ 0.
Program 12.1D (below) has the same data as does Program 12.1. Its deci-
sion variables form the 1 × m (row) vector y.
Program 12.1D. Minimize {yb}, subject to the constraints

        yA ≥ c,
        y is free.
Program 12.1D is called the dual of Program 12.1. Since Program 12.1 is a
canonical form, this defines the dual of every linear program.
An unwieldy definition?
This chapter’s analysis of Program 12.1 and its dual begins with an easy-
to-prove result that is known as “weak duality.”
Proposition 12.1 (weak duality). Suppose that x is feasible for Program 12.1 and that y is feasible for Program 12.1D. Denote as z_* and z^* the optimal values of Program 12.1 and Program 12.1D, respectively.

(a) Then

(1)   yb ≥ z^* ≥ z_* ≥ cx.

(b) Also, each inequality in (1) holds as an equation if and only if x and y satisfy

(2)   ( Σ_{i=1}^m yi Aij − cj ) xj = 0   for j = 1, 2, …, n.
Proof. By hypothesis, x and y satisfy

        Ax = b,   x ≥ 0,   and   yA ≥ c.

Set t = yA − c, so that t ≥ 0. Premultiplying Ax = b by y and substituting yA = c + t gives

(3)   yb = cx + tx
         = cx + (yA − c)x
         = cx + Σ_{j=1}^n (yAj − cj) xj .

Each addend in the sum on the right-hand side of (3) is nonnegative, so yb ≥ cx holds for every pair of feasible solutions, which proves part (a).
For part (b), we first suppose that feasible solutions x and y satisfy (2). In this case, (3) gives yb = cx, so every inequality in (1) must hold as an equation.

Finally, suppose that feasible solutions x and y satisfy yb = cx. In this case, (3) gives 0 = Σ_{j=1}^n (yAj − cj) xj. Feasibility of x and y assures us that each term on the RHS of this equation is nonnegative, and the fact that a sum of nonnegative terms equals 0 guarantees (2). ■
Weak duality?
The name, weak duality, stems from the fact that the optimal value z_* of a maximization problem cannot exceed the optimal value z^* of the minimization problem that is its dual.
Proposition 12.2. The following statements are equivalent:

(a) The optimal value of Program 12.1 equals the optimal value of Program 12.1D, and both are finite.

(b) Program 12.1 and Program 12.1D are both feasible.

(c) Program 12.1 is feasible, and its optimal value z_* is finite.

(d) Application of the simplex method with multipliers and with an anti-cycling rule to Program 12.1 terminates with a basis whose basic solution x is optimal for Program 12.1 and with a vector y of multipliers that is optimal for Program 12.1D.
Proof. (a) ⇒ (b): Linear programs whose optimal values are finite must be feasible.
(b) ⇒ (c): Immediate from Proposition 12.1.

(c) ⇒ (d): With an anti-cycling rule, the simplex method with multipliers cannot cycle, so it must terminate. Since Program 12.1 is feasible and its optimal value is finite, the method terminates with an optimal basis whose basic solution x and multipliers y satisfy

        c̄ = c − yA ≤ 0,   z_* = cx = yb.

Thus yA ≥ c, so y is feasible for Program 12.1D, and (1) shows that y is optimal for it.
(d) ⇒ (a): By hypothesis, x is feasible for Program 12.1, and y is feasible for Program 12.1D. Also, since x is optimal for Program 12.1, part (a) of Proposition 11.3 shows that cx = yb. That z^* = z_* is immediate from (1). This completes a proof. ■
The term, strong duality, describes conditions under which the inequality z^* ≥ z_* holds as an equation. The portion of Proposition 12.2 that does not mention the simplex method is called the “Duality Theorem.”
The proof provided here of Proposition 12.2 rests on the simplex method.
Once you know the role played by multipliers, the method of proof is straight-
forward – examine the conditions that cause the simplex method to terminate.
To illustrate the last of these points, we observe that the coefficients of the variable xj in Program 12.1 are cj and the column vector Aj; these coefficients are the data in the constraint yAj ≥ cj of Program 12.1D.
The first step in a recipe for taking the dual of a linear program is to as-
sign to each non-sign constraint in that linear program a “complementary”
decision variable in its dual. The senses of these complementary variables and
constraints are determined by Table 12.1, below.
To illustrate Table 12.1, consider Program 12.1 and its dual. Program 12.1
has equality constraints; from row 2 of the cross-over table, we see that its
dual has variables that are free (unconstrained as to sign). Program 12.1 has
nonnegative decision variables, and row 4 of the cross-over table shows that
its dual has constraints that are “≥” inequalities.
A memory aid
• Row 1, when read from left to right, states that the complementary vari-
able (shadow price) for a “≤” constraint in a maximization problem is
nonnegative. That must be so because increasing the constraint’s RHS
value can increase the optimal value, but cannot decrease it.
• Row 4, when read from right to left, states that the complementary
variable (shadow price) for a “≥” constraint in a minimization problem
is nonnegative. That must be so for the same reason.
The recipe
The recipe that appears below constructs the dual of every linear pro-
gram. This recipe is wordy, but an example will make everything clear.
An example
x:   −4a + 5b − 6c ≥ 7,
y:    8a + 9b + 10c = −11,
Step 1 of the recipe states that the dual of Program 12.2 is a maximization
problem. The non-sign constraints in Program 12.2 have been assigned the
complementary variables x, y and z (any labels other than a, b and c would
do). Step 2 determines the senses of x, y and z from rows 4, 5 and 6 of the
cross-over table. Evidently,
x ≥ 0, y is free, z ≤ 0.
a:   −4x + 8y + 12z ≤ 1,
b:    5x + 9y + 13z = −2,
c:   −6x + 10y − 14z ≥ 3,
      x ≥ 0,  y is free,  z ≤ 0.
Note that the RHS values of Program 12.2 become the objective coef-
ficients of its dual, and the objective coefficients of Program 12.2 become
the RHS values of its dual. Note also that the column of coefficients of each
variable in Program 12.2 become the row of coefficients of its complementary
constraint.
You are urged to take the dual of Program 12.2D and see that it is Program
12.2.
The recipe for taking the dual treats the constraints on the signs of the
decision variables differently from the so-called “non-sign” constraints. This
seems arbitrary. What happens if we don’t?
x:   −4a + 5b − 6c ≥ 7,
y:    8a + 9b + 10c = −11,
The dual’s objective now includes the addend 0 s1 , which equals zero.
Because s1 is complementary to a “≥” constraint, row 4 of the cross-over table shows that s1 ≥ 0. Because the variable a is now free, row 2 shows that its complementary constraint is
a:   −4x + 8y + 12z + 1s1 = 1.

Evidently, treating a ≥ 0 as a non-sign constraint inserts a slack variable in the constraint that is complementary to a. This has no material effect on Program 12.2D.
No proof?
No proof has been provided that the recipe works. To supply a proof, we
would need to show that using the recipe has the same effect as forcing a lin-
ear program into the format of Program 12.1 and then taking its dual. Such a
proof would be cumbersome, it would provide no insight, and it is omitted.
Weak and strong duality have been established in the context of Program 12.1 and its dual. These results apply to any pair of linear programs, however. That is so because:
A set of values of the decision variables for a linear program and for its
dual is said to satisfy complementary slackness if the following conditions
are satisfied:
• If a variable for either of these linear programs is not zero, its complementary constraint holds as an equation.

• If a constraint for either of these linear programs holds as a strict inequality, its complementary variable equals zero.
Consider any basic tableau for Program 12.1. Its basic solution x and
each vector y of its multipliers satisfy complementary slackness.
Feasible solutions to Program 12.1 and its dual are optimal if and only if
they satisfy complementary slackness.
Pivot strategies
The simplex method pivots from basic tableau to basic tableau. Each basic tableau has a basic solution x and at least one vector y of multipliers. Listed below are conditions that are necessary and sufficient for x and y to be optimal for a linear program and its dual:

(i) x is feasible for the linear program.

(ii) y is feasible for its dual.

(iii) x and y satisfy complementary slackness.
Each basic tableau has a basic solution and multipliers that satisfy (iii).
The simplex method pivots to preserve (i) and (iii). It aims to improve the ba-
sic solution’s objective value with each pivot. It stops as soon as it encounters
a tableau whose multipliers satisfy (ii).
Proposition 12.3 (a theorem of the alternative). For each m × n matrix A and each m × 1 vector b, exactly one of the following alternatives holds:

(a) There exists an n × 1 vector x that satisfies

        Ax = b,   x ≥ 0.

(b) There exists a 1 × m vector y that satisfies

        yA ≤ 0,   yb > 0.
Proof. (a) implies not (b): Suppose that (a) holds, so that a solution x exists to Ax = b and x ≥ 0. Aiming for a contradiction, suppose that (b) also holds, i.e., there exists a solution y to yA ≤ 0 and yb > 0. Premultiply Ax = b by y and obtain yAx = yb > 0. Postmultiply yA ≤ 0 by the nonnegative vector x and obtain yAx ≤ 0. The contradiction 0 < yAx ≤ 0 shows that (b) cannot hold if (a) does.
Not (a) implies (b): Suppose that (a) does not hold. Let us consider the linear program and its dual that are specified by:

        LP:    Maximize {0x}, subject to  Ax = b,  x ≥ 0.
        Dual:  Minimize {yb}, subject to  yA ≥ 0.

Since (a) does not hold, LP is infeasible. The 1 × m vector y = 0 is feasible for Dual. If Dual's optimal value equaled 0, the Duality Theorem would imply that the optimal value of LP equals zero. This cannot occur, by hypothesis, so Dual must be unbounded; a feasible solution y to Dual exists that has yb < 0, and −y is a solution to (b). This completes a proof. ■
• Observe that the dual has a feasible solution whose objective value is
negative (positive) if and only if the constraint system has no solution.
An illustration
(5) Ax ≤ b, x ≥ 0.
The handiest linear program is to maximize {0x}, subject to (5). The dual
linear program minimizes {yb} subject to the constraints
(6) yA ≥ 0, y ≥ 0.
The dual is feasible because setting y = 0 satisfies (6). The Duality Theorem guarantees that no solution exists to (5) if and only if a solution exists to (6) that has yb < 0. This proves

Proposition 12.4 (Farkas's lemma). For each m × n matrix A and each m × 1 vector b, exactly one of the following alternatives holds:

(a) There exists x that satisfies Ax ≤ b and x ≥ 0.

(b) There exists y that satisfies yA ≥ 0, y ≥ 0 and yb < 0.
Farkas’s lemma?
Perhaps Farkas’s lemma once was a lemma, that is, a step toward an im-
portant result. This “lemma” is now recognized as one of the most fundamen-
392 Linear Programming and Generalizations
This is the first of three self-contained sections. Each of these three sec-
tions uses a linear program and its dual to analyze an issue. These sections are
starred because they can be read independently of each other and because the
information in them is not used in later chapters.
It will be seen that exactly one of the following alternatives holds for unit B:

(a) There exist values of the outputs and costs of the inputs such that unit B has a benefit-to-cost ratio that is at least as large as any of the other units.

(b) There exists a nonnegative linear combination of the other units that produces more of each output and consumes less of each input than does unit B.
A schedule of prices

The decision variables in this model are the prices (values) that are placed on the inputs and on the outputs. Let us designate pi as the price placed on the ith output and qj as the price placed on the jth input.

Benefit-to-cost ratios

Given a schedule of prices, the benefit-to-cost ratio ρB of unit B is determined by the data in its row of Table 12.2; it equals the value that the prices assign to unit B's outputs, divided by the cost that they assign to its inputs.
A potentially-efficient unit
(7) ρB ≥ ρA and ρB ≥ ρC .
The inequalities in (7) entail the comparison of ratios. This seems to present a difficulty; the requirement ρB ≥ ρA cannot be represented as a linear inequality. Note, however, that multiplying the price pi of each output by the same constant θ multiplies each ratio by θ and preserves (7). Thus, if a solution exists to (7), then a solution exists in which ρB is at least 1 and in which ρA and ρC are at most 1. In other words, a schedule of prices satisfies (7) if and only if a (possibly different) schedule of prices satisfies

(8)   ρB ≥ 1,   ρA ≤ 1,   ρC ≤ 1.
A linear program
• Its 1st constraint keeps the benefit of the outputs of unit A from exceed-
ing the cost of its inputs, thereby enforcing ρA ≤ 1.
• Its 2nd constraint keeps the cost of the inputs to unit B from exceeding
the benefit of its outputs, thereby enforcing ρB ≥ 1.
• Its 3rd constraint keeps the benefit of the outputs of unit C from exceed-
ing the cost of its inputs, thereby enforcing ρC ≤ 1.
Duality will be used in the analysis of Program 12.3. For that reason, each
of its non-sign constraints has been assigned a complementary dual variable;
yA is complementary to the 1st constraint, for instance.
Solver says
Solver reports that Program 12.3 has 0 as its optimal value. No schedule of prices exists for which unit B has the highest benefit-to-cost ratio. Solver reports an optimal solution having pi = 0 and qj = 0 for each i and j, and it reports shadow prices yA, yB, yC and v whose values are

(9)   yA = 2.875,   yB = 3.9375,   yC = 3.0833,   v = 0.
Proposition 12.2 shows that these shadow prices (multipliers) are an optimal solution to the dual of Program 12.3. From the dual, we will learn that unit B is enveloped by the combination of units A and C whose weights wA and wC are given by

(10)   wA = yA/yB = 2.875/3.9375 = 0.730,

(11)   wC = yC/yB = 3.0833/3.9375 = 0.783.
Program 12.3 is feasible and bounded (its optimal value equals 0). Propo-
sition 12.2 shows that Program 12.3D is also feasible and bounded, moreover,
that the values of the multipliers given by (9) are an optimal solution to Pro-
gram 12.3D. This optimal solution has v = 0 and it has yB > 0.
Let us interpret the optimal solution to Program 12.3D. Dividing its second constraint by yB and noting that v = 0 shows that

        3.5 (yA/yB) + 6 (yC/yB) > 7.
This inequality shows that the weighted combination of units A and C produces more of output 2 than does unit B. The pattern is evident; unit B consumes more of each input and produces less of each output than does the weighted combination of units A and C with weights wA and wC given by (10) and (11). Unit B is enveloped.
• There do not exist prices on the inputs and outputs for which unit k has
a benefit-to-cost ratio that is at least as large as that of any other unit.
• The analogue of Program 12.3 has 0 as its optimal value, and its optimal
solution has a shadow price yi for each unit i that is nonnegative, with
yk > 0.
• The dual of that linear program is feasible and bounded, and it has 0 as
its optimal value.
• With wi = yi/yk, the dual shows that unit k consumes more of each input and produces less of each output than does the nonnegative linear combination of the other units in which, for each i, the inputs and outputs of unit i are multiplied by wi.
It suffices for this result that each of the inputs be positive and that each of the outputs be nonnegative. (If an input equaled 0, a ratio could have 0 as its denominator.)
A link exists between arbitrage and duality. This link holds in general,
but it is established here in the context of a family of one-period investment
opportunities. Dealing with a one-period model lets us focus on the inves-
tor’s asset position at the end of the period, and this simplifies the discus-
sion.
A risk-free asset
Risky assets
Each risky asset has a fixed market price at the start of the period. Each
risky asset’s price at the end of the period depends on the “state” that occurs
then. These states are mutually exclusive and collectively exhaustive – one of
them will occur.
An example
Table 12.3 describes a model having one risk-free asset, three risky assets
(that are labeled 1, 2 and 3), and four states (that are labeled a, b, c and d).
The net return from investing in one unit of an asset is the amount of
money (possibly negative) that remains from borrowing the price of that asset
at the start of the period, purchasing that asset at that time, selling it at the end
of the period, and repaying the loan and the accrued interest. For instance,
if state b occurs, the net return for investing in one share of stock #1 equals
$109 − $100 × (1 + 0.03) = $6. The formula in cell E14 shows how to com-
pute the net return for investing in one unit of each asset.
A portfolio
An arbitrage opportunity
A linear program
If the constraint

        wa + wb + wc + wd ≤ 1

were omitted from this linear program, it would have 0 or +∞ as its optimal value. With that constraint included, the optimal solution to the linear program exhibits a portfolio that creates an arbitrage opportunity, if one exists.
θ:   wa + wb + wc + wd ≤ 1,
      wa ≥ 0,  wb ≥ 0,  wc ≥ 0,  wd ≥ 0.
No arbitrage
Solver reports that Program 12.4 has 0 as its optimal value, so this set of
investment opportunities presents no arbitrage opportunity. Solver reports
these values of the shadow prices:
This probability distribution has a property that might surprise you.
Given this probability distribution over the states, let us compute the
expectation of the net return for each unit invested in asset 1. Row 12 of
Table 12.3 contains the net return of asset 1 for each state. From that row and
the probability distribution given above, we learn that expected net return for
asset 1 is given by
        (1/79) × (9 × 4 + 28 × 6 − 31 × 8 + 11 × 4) = 0.
It is no accident that each risky asset has 0 as its expected net return. A
theorem of the alternative is at work. This theorem demonstrates that exactly
one of the following alternatives holds:
• There is a probability distribution over the states such that the prob-
ability of each state is positive and such that the expected net return of
each risky asset equals 0.
This result will be seen to follow directly from duality. It will be estab-
lished in a general setting, rather than for the data in Table 12.3.
A general model
Aij equals the net return at the end of the period per unit invested in asset i at the start of the period if state j occurs at the end of the period.
The Aij’s are known to the investor. For the data in Table 12.3, the Aij’s
form the 3 × 4 array in cells B12:E14.
occurs at that time. The constraints of Program 12.5 (below) require the net profit wj to be nonnegative for each state j, and the objective seeks a portfolio that has at least one wj positive. With the constraint on the sum of the wj's deleted, the optimal value of Program 12.5 would be 0 or +∞. Including that constraint causes Program 12.5 to exhibit a portfolio that presents an arbitrage opportunity, if one exists.
Program 12.5. Maximize Σ_{j=1}^n wj, subject to the constraints

qj:   wj = Σ_{i=1}^m xi Aij   for j = 1, …, n,

θ:    Σ_{j=1}^n wj ≤ 1,

      wj ≥ 0   for j = 1, …, n.
Program 12.5 is feasible because setting the xi’s and the wj’s equal to zero
satisfies its constraints. If no arbitrage opportunity exists, the optimal value
of Program 12.5 equals zero. To see what this implies, we investigate the dual
of Program 12.5, which appears below as:
Program 12.5D. Minimize θ, subject to the constraints

xi:   Σ_{j=1}^n Aij qj = 0   for i = 1, …, m,

wj:   qj + θ ≥ 1   for j = 1, …, n,

      θ ≥ 0.
If the optimal value of Program 12.5 is zero, Strong Duality shows that Program 12.5D is feasible and that its optimal value equals 0, hence that there exists a set of qj's that satisfy qj ≥ 1 for each j and that satisfy

(12)   Σ_{j=1}^n Aij qj = 0   for i = 1, …, m.

With K = Σ_{j=1}^n qj, set

(13)   pj = qj/K   for j = 1, …, n.

Since each qj is at least 1 and K is positive, these numbers satisfy

(14)   pj > 0   for j = 1, …, n,

(15)   Σ_{j=1}^n pj = 1.
Dividing each equation in (12) by K produces

(16)   Σ_{j=1}^n Aij pj = 0   for i = 1, …, m.
Proposition 12.5. For each i such that 1 ≤ i ≤ m and for each j such that 1 ≤ j ≤ n, let Aij equal the net return at the end of the period if state j is observed then and if one unit was invested in asset i at the start of the period. The following are equivalent:

(a) No portfolio presents an arbitrage opportunity.

(b) The optimal value of Program 12.5 equals 0.

(c) The optimal value of Program 12.5D equals 0.

(d) There exist numbers p1 through pn that satisfy (14), (15) and (16).
Proof. That (a) ⇒ (b) ⇒ (c) ⇒ (d) has been established. That (b) ⇒ (a) is immediate. Showing that (d) ⇒ (b) will complete a proof.
(d) ⇒ (b): Suppose that there exist numbers p1 through pn that satisfy (14), (15) and (16). Write (16) in matrix notation as 0 = Ap. Program 12.5 is feasible because equating all of its decision variables to zero satisfies its constraints. Consider any feasible solution to Program 12.5. Write its equality constraints in matrix form as w = xA. Postmultiply this equation by p to obtain wp = xAp = x0 = 0. Since w is feasible, wj ≥ 0 for each j. Since pj > 0 for each j, the only way to obtain Σ_{j=1}^n wj pj = 0 is to have wj = 0 for each j, so the optimal value of Program 12.5 equals 0, which is (b). ■
The proof technique in this section is similar to that in the prior section.
It is to set up a linear program whose objective value can be made positive
if and only if a condition holds and examine its dual. This technique will be
used again in the next section.
Feasible solutions x to Program 12.1 and (y, t) to Program 12.1D, where t = yA − c, are optimal if and only if they satisfy the complementary slackness conditions,

(18)   tj xj = 0   for j = 1, 2, …, n.
The goal of this section is to show that if Program 12.1 is feasible and bounded, there exist optimal solutions to it and its dual that satisfy (18). The first – and main – step in the proof is to show that there exist optimal solutions that satisfy xj + tj > 0 for a particular j.
Proposition 12.6. Suppose that Program 12.1 is feasible and bounded, and fix any j with 1 ≤ j ≤ n. Then there exist optimal solutions x to Program 12.1 and (y, t) to Program 12.1D that have xj + tj > 0.

Proof. The Duality Theorem guarantees that Programs 12.1 and 12.1D have the same optimal value and that feasible solutions x and y to Program 12.1 and Program 12.1D, respectively, are optimal if cx ≥ yb.
The feasible solutions to Program 12.6 (below) are the optimal solutions
to Programs 12.1 and 12.1D. Program 12.6 seeks optimal solutions to Pro-
grams 12.1 and 12.1D that have xj + tj > 0.
y:   −Aw + λb = 0,

t:    w − αIj ≥ 0,

      α ≥ 0,  λ ≥ 0.
In the above, I^j and Ij denote the row and column vectors having 1 in their jth entry and 0's elsewhere.
Case #2: In this case, λ > 0. Set y = v(1/λ) and set x = w(1/λ). From the constraints that are displayed above,

        yb = cx,

        yA − I^j/λ ≥ c,

        Ax = b,

        x ≥ Ij/λ.
Of these four constraints, the last requires x ≥ 0 with xj > 0, and the second requires yA − t = c with t ≥ 0 and tj > 0. Since yb = cx and since Ax = b, optimal solutions to Programs 12.1 and 12.1D have been constructed. But these solutions have xj > 0 and tj > 0, so they violate complementary slackness, hence cannot be optimal. Case #2 cannot occur either. The proof is complete. ■
Proposition 12.7 (strong complementary slackness). Suppose that Program 12.1 is feasible and bounded. Then there exist optimal solutions x to Program 12.1 and (y, t) to Program 12.1D that satisfy

        xj + tj > 0   for j = 1, 2, …, n.
Proof. Proposition 12.6 holds for each particular value of j. The average of n optimal solutions to a linear program is optimal. Denote as x̂ and as (ŷ, t̂) the average of the optimal solutions found from Proposition 12.6 for the values j = 1, 2, …, n. Note that x̂ is optimal for Program 12.1, that (ŷ, t̂) is optimal for Program 12.1D and that x̂j + t̂j > 0 for each j. This proves the theorem. ■
11. Review
It is now clear that the simplex method attacks a linear program and its
dual. A variety of issues can be addressed by studying a linear program and its
dual. Three of them are studied in the starred sections of this chapter. Others
appear in later chapters. In Chapter 14, for instance, duality will be used to
develop a (simplified) model of an economy in general equilibrium.
1. (taking the dual) Use the recipe to take the dual of Program 12.2D. Indicate
where and how you used the cross-over table.
(b) Take the dual of the program you constructed in part (a).
(d) True or false: You have demonstrated Program 12.1 is the dual of Pro-
gram 12.1D.
(b) A food chemist has found a way to create each of the nutrients directly, rather than from foods. She wishes to maximize the revenue she receives from selling nutrients to him. Formulate her problem as a linear program.
(c) Is there a relationship between the linear programs you have created
and, if so, what is it?
(d) When eating food, might it be optimal for him to consume more than
bi units of a particular nutrient? If so, what price must she set on that
nutrient? Why?
        A =  [  1  −1 ]
             [ −1   1 ] .
        [ A   0  ] [ x  ]     [  b  ]          [ x  ]     [ 0 ]
        [ 0  −AT ] [ yT ]  ≤  [ −cT ] ,        [ yT ]  ≥  [ 0 ] .
(a) Assume the original linear program is feasible and bounded. Is the
“related” linear program feasible and bounded? What is its optimal
value?
(c) A linear program is said to be self-dual if it and its dual are identical.
(d) True or false: Each linear program that is feasible and bounded can be
written as a linear program whose optimal value equals zero.
(a) Use only the data of Program 12.1 (specifically, A, b and c) to write a
system of linear constraints whose solution includes an optimal solu-
tion to Program 12.1.
(b) What can you say about the constraints you formed in part (a) if Pro-
gram 12.1 is not feasible and bounded?
(b) Suppose that Program A is unbounded. Can there exist a row vector y that satisfies yA ≥ c and y ≥ 0? If there cannot, must there exist a column vector x̂ that satisfies the constraints in part (a)? Support your answers.
(a) Is it true that the vectors x and y that satisfy the constraints in part
(a) have cx ≤ yb? Support your answer.
    ŷ:   Ax ≤ b,
    x̂:  –yA ≤ –c,
    θ:   yb ≤ cx,
         x ≥ 0,   y ≥ 0.
(c) Can a solution to (*) exist that has θ = 0? Support your answer. (Hint:
Refer to part (a) of the preceding problem.)

(d) Can a solution to (*) exist that has θ = 1? What about θ > 0? Support
your answers.
11. (bounded feasible regions) This problem concerns Program A: Max {cx},
subject to the constraints Ax ≤ b and x ≥ 0.
(a) Suppose that Program A is feasible and that its feasible region is
bounded. Show that some row vector ŷ satisfies ŷ ≥ 0 and ŷA ≥ e,
where e is the 1 × n vector each of whose entries equals 1. Show that
the dual of Program A has an unbounded feasible region.
(b) Suppose that the dual of Program A is feasible and that its feasible
region is bounded. Show that some column vector x̂ satisfies x̂ ≥ 0 and
Ax̂ ≤ −e, where e is now the m × 1 vector of 1's. Conclude that the
feasible region of Program A is unbounded.
    A = [  1  −1 ]
        [ −1   1 ] ,
see whether you can find vectors b and c such that both feasible regions
are unbounded.
13. (data envelopment) This problem concerns the data envelopment model
whose data are in Table 12.2.
(b) For the data in Table 12.2, unit B is enveloped. Can you determine
whether or not unit A is enveloped without solving a linear program?
If so, how?
15. (the no-arbitrage tenet) This problem concerns a variant of the no-arbitrage
model. Let us assume that investors cannot go "short;" equivalently, that
each portfolio x must have xj ≥ 0 for j = 1, 2, …, n. The no-arbitrage tenet
remains unchanged, but the definition of a portfolio is more restrictive.
16. (strong complementary slackness) State and prove the variant of Proposi-
tion 12.7 for the variant of Program 12.1 in which Ax = b is replaced by
Ax ≤ b. Hint: Might it suffice to apply Proposition 12.6 and Proposition
12.7 to the linear program in which A is replaced by [A, I]?
17. (a matrix game) Suppose that you and I know the entries in the m × n
matrix A. You pick a row. Simultaneously, I pick a column. If you pick
row i and I pick column j, I pay you the amount Aij . Suppose you choose
a randomized strategy p (a probability distribution over the rows) that
maximizes your smallest expected payoff over all columns I might choose.
(a) Interpret the constraints and the objective of the linear program:
    Max v, subject to

    q_j:   v ≤ Σ_{i=1}^{m} p_i A_ij ,   for j = 1, 2, …, n,

    w:     Σ_{i=1}^{m} p_i = 1,

           p_i ≥ 0   for i = 1, 2, …, m.
(c) Write down and interpret the dual of this linear program. Is it feasible
and bounded?
(d) What does complementary slackness say about the optimal solutions
of the two linear programs?
Chapter 13: The Dual Simplex Pivot and Its Uses
1. Preview
The dual simplex method is presented in this chapter, as are three of its
uses. To introduce the first of these uses, we recall that the simplex method
consists of two phases. The “parametric self-dual method” is a pivot scheme
that has only one phase. It uses simplex pivots and dual simplex pivots to
move from a basic tableau to an optimal tableau. How it works is discussed
here.
    x:   1a + 1b – 1d ≥ 2,
    y:   1b + 2c + 3d ≥ 3,
         a ≥ 0,   b ≥ 0,   c ≥ 0,   d ≥ 0.
Does Problem 13.A look familiar? Its dual is the linear program that was
used in Chapter 4 to introduce the simplex method. The initial steps of the
dual simplex method are also familiar. They are to:
Executing the first two of these steps casts Problem 13.A in the format of Program 13.1:
    (1.0)   6a + 7b + 9c + 9d – z = 0,
    (1.1)   1a + 1b – 1d – t1 = 2,
    (1.2)   1b + 2c + 3d – t2 = 3,
            a ≥ 0,   b ≥ 0,   c ≥ 0,   d ≥ 0,   t1 ≥ 0,   t2 ≥ 0.
In Program 13.1, the variables t1 and t2 are not basic because their coef-
ficients in equations (1.1) and (1.2) equal −1. Multiplying these equations by
−1 produces a basic tableau and places Program 13.1 in the equivalent form,
    (2.0)   6a + 7b + 9c + 9d – z = 0,
    (2.1)   – 1a – 1b + 1d + t1 = – 2,
    (2.2)   – 1b – 2c – 3d + t2 = – 3,
            a ≥ 0,   b ≥ 0,   c ≥ 0,   d ≥ 0,   t1 ≥ 0,   t2 ≥ 0.
The variables t1, t2 and −z are a basis for system (2). This basis satisfies the
optimality conditions for a minimization problem because the reduced costs
of the nonbasic variables are nonnegative. The basic solution is not feasible
because it sets

    t1 = −2,   t2 = −3,   z = 0.
Phase II
• Each pivot preserves the optimality condition and aims to worsen the
basic solution’s objective value.
• In each pivot, the departing variable (pivot row) is chosen first, before
the entering variable (pivot column) is selected.
• The row in which the pivot element occurs has a RHS value that is
negative.
• A ratio is computed for each column whose entry in the pivot row is
negative, and this ratio equals the column’s reduced cost (top-row coef-
ficient) divided by its coefficient in the pivot row.
• The column in which the pivot element occurs has a ratio that is closest
to zero.
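For readers who prefer code to a spreadsheet, these pivot rules can be sketched in a few lines. The layout is an assumption made for illustration: the tableau is a dense array whose row 0 holds the reduced costs, whose last column holds the RHS values, and whose remaining rows hold the constraint equations of a minimization problem (so the reduced costs are nonnegative and every ratio is nonpositive).

    import numpy as np

    def dual_simplex_pivot(T):
        """Execute one dual simplex pivot on the tableau T, in place.

        Row 0 holds the reduced costs, the last column the RHS values.
        Returns False when every RHS is nonnegative, i.e., the basic
        solution has become feasible and the tableau is optimal.
        """
        rhs = T[1:, -1]
        if np.all(rhs >= 0):
            return False
        r = 1 + np.argmin(rhs)                # pivot row: most negative RHS
        cols = np.where(T[r, :-1] < 0)[0]     # columns with a negative entry in the pivot row
        if cols.size == 0:
            raise ValueError("no eligible pivot column: the LP is infeasible")
        ratios = T[0, cols] / T[r, cols]      # reduced cost / pivot-row coefficient (<= 0)
        c = cols[np.argmax(ratios)]           # pivot column: ratio closest to zero
        T[r, :] /= T[r, c]                    # scale the pivot row
        for i in range(T.shape[0]):           # eliminate the pivot column elsewhere
            if i != r:
                T[i, :] -= T[i, c] * T[r, :]
        return True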
The dual simplex method will be executed twice, first by hand and then
on a spreadsheet.
When the dual simplex method is applied to system (2), the first pivot
could occur on a coefficient in equation (2.1) or (2.2) because both of their RHS
values are negative. We mimic Rule A of Chapter 4 and select the equation
whose RHS value is most negative, equation (2.2) in this case. Ratios are com-
puted for the variables b, c and d because their coefficients in equation (2.2)
are negative. These ratios equal 7/(−1), 9/(−2) and 9/(−3), respectively. The
first dual simplex pivot occurs on the coefficient of d in row (2.2) because
its ratio is closest to zero. Arrows record the selection of the pivot row and
column.
    (2.0)    6a + 7b + 9c + 9d – z =  0,
    (2.1)   –1a – 1b      + 1d + t1 = –2,
    (2.2)        –1b – 2c – 3d + t2 = –3,   ←
    ratios:       –7  –4.5  –3
                               ↑
    z = 9,   t1 = −3,   d = 1.
The first pivot has increased the basic solution’s objective value from 0 to 9;
this worsened the objective value because Program 13.1 is a minimization
problem.
Pivoting on a spreadsheet
In Table 13.1, rows 8-10 reproduce the information in system (2). Cell I10
is shaded because its RHS value is the most negative. The functions in row
11 compute the ratios. Cell E11 is shaded because its ratio is closest to zero.
Evidently, the coefficient in cell E10 is the pivot element. Table 13.1 omits the
array function =pivot(E10, B8:I10) that executes the pivot and creates the ar-
ray in the block B15:I17 of cells. This block corresponds to system (3). The
next pivot occurs on the coefficient in cell C16, and it creates the tableau in
cells B22:I24. That tableau is optimal.
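For readers without the spreadsheet add-in, the same two pivots can be reproduced with the dual_simplex_pivot sketch above (numpy imported as before); the array below encodes system (2) with the −z column omitted, since row 0 carries the reduced costs:

    T = np.array([
        [ 6.,  7.,  9.,  9., 0., 0.,  0.],   # (2.0)  columns: a, b, c, d, t1, t2, RHS
        [-1., -1.,  0.,  1., 1., 0., -2.],   # (2.1)
        [ 0., -1., -2., -3., 0., 1., -3.],   # (2.2)
    ])
    while dual_simplex_pivot(T):
        print("objective value:", -T[0, -1])   # prints 9.0, then 18.0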
A coincidence?
This application of the dual simplex method required two pivots, and it
encountered tableaus whose basic solutions have objective values of 0, 9 and
18. When the simplex method was introduced in Chapter 4, two pivots were
needed, and the same sequence of objective values was encountered. There
are other similarities (for instance, in the ratios), and they are not a coinci-
dence.
Problem 13.A is the dual of the linear program that was used in Chapter
4 to introduce the simplex method, and it’s a fact that:
The dual simplex method is aptly named. It amounts to applying the simplex
method to the dual of the linear program.
A disappointment?
This equivalence suggests that the dual simplex method is nothing new,
that it is not useful. That is incorrect! Three uses of the dual simplex method
are presented in this chapter, and each of these uses is important.
The simplex method has two phases, as does the dual simplex method.1
By contrast, the "parametric self-dual" method has one phase. It uses simplex
pivots and dual simplex pivots to move from a basic tableau to an optimal
tableau. It will be introduced in the context of

1. C. Lemke, "The dual method of solving linear programming problems," Naval Research Logistics Quarterly, Vol. 1, pp. 36–47, 1954.
    (5.3)   –2p – 3q ≤ 0,
Getting started
The first steps of the parametric self-dual method are familiar. They are
to:
In Program 13.2 (below), the surplus variable s1 and the slack variable s3
have been used to convert constraints (5.1) and (5.3) to equations, and z has
been defined as the objective value in the usual way.
    (6.0)   4p + 1q + 2r – z = 0,
The basic solution to system (7) is not feasible because the RHS values
of equations (7.2) and (7.3) are negative. This basic solution also violates the
optimality condition for a maximization problem because q and r have posi-
tive reduced costs.
System (7) should look familiar. It is. Problem 13.B is identical to the
linear program that was used in Chapter 6 to introduce Phase I of the simplex
method, and (7) is the basic tableau with which Phase I was initiated.
A homotopy
Correcting non-optimality
• Add α to each RHS value that is negative, but not to the RHS of the
equation for which −z is basic.
The basic solution to system (8) is optimal for all values of α that satisfy
α ≥ 20. When α is slightly below 20, the RHS value of equation (8.3) becomes
negative, and a dual simplex pivot is needed to restore optimality. This pivot
occurs on a negative coefficient in equation (8.3), and that coefficient is de-
termined by the ratios in:
In Table 13.2, the variable q has the ratio that is closer to zero, so the 1st
pivot occurs on the coefficient of q in equation (8.3). Executing this pivot
casts Program 13.2α in the equivalent form,
We seek the range on α for which the basic solution to system (9) is opti-
mal, namely, the range on α for which the reduced costs of r and of s3 satisfy
Dependence on α
System (9) indicates which coefficients can depend on α, and how they
depend on α. After any number of pivots, only the reduced costs and the RHS
values can depend on α, and the dependence is linear, except for the RHS
value of the equation for which −z is basic, for which the dependence can be
quadratic.
A spreadsheet
Table 13.3. The optimal tableau for α ≥ 20 and the 1st pivot element.
To interpret Table 13.3:
• Compare rows 21 and 22 with the reduced costs in equation (8.0). Note
that row 21 contains the coefficients of α and that row 22 contains the
coefficients that are independent of α.
• Compare columns H and I with the RHS values in system (8). Note that
column I contains the coefficients of α and that column H contains the
coefficients that are independent of α.
• The functions in cells B27:F27 compute the net reduced costs for this
value of α.
• The functions in cells K23:K25 compute the net RHS values of equations
(8.1)-(8.3) for the same value of α.
• The ratios are computed in row 28. Column C has the ratio that is clos-
est to zero, and cell C28 is shaded to record this fact.
• The 1st pivot will occur on the coefficient in cell C25, which lies at the
intersection of the pivot row and the pivot column. This coefficient is
shaded to record its selection.
This pivot updates the entries in the entire tableau, including the row and
column that depend on α. The array function =pivot(C25, B21:I25) executes
the first pivot and produces the array in the block B32:I36 of cells in Ta-
ble 13.4, below. That table contains the same information as does system (9).
In particular, rows 32 and 33 describe the equation
which is equation (9.0). The format of Table 13.4 is similar to that of Ta-
ble 13.3. To reduce clutter, the functions that compute the net reduced costs
and net RHS values have not been recorded.
Table 13.4. The optimal tableau for 15 ≤ α ≤ 20 and the 2nd pivot element.
Table 13.4 indicates that as α decreases to 15, the reduced cost of the non-
basic variable s3 increases to zero. The 2nd pivot will be a simplex pivot with s3
as the entering variable. The coefficient on which this pivot occurs is in cell
F34. The array function =pivot(F34, B32:I36) executes the 2nd pivot.
Table 13.5 contains the tableau that results from the 2nd pivot. Its format
is identical to that of Table 13.4. Evidently, as α decreases to 13 1/3, the re-
duced cost of r increases to 0. The 3rd pivot will be a simplex pivot, with r as
the entering variable. This pivot will occur on the coefficient of r in cell D47
because only it is positive.
Table 13.5. The optimal tableau for 13 1/3 ≤ α ≤ 15 and the 3rd pivot element.
The basic solution to rows 54-58 of Table 13.5 is optimal when α = 0. This
basic solution equates the nonbasic variables q and s1 to 0, and it equates the
basic variables to the values
Recap
The basic solution to system (8) is optimal for all α ≥ 20. The 1st pivot produced a basis that is optimal for all α between 15 and
20. The 2nd pivot produced a basis that is optimal for all α between 13 1/3
and 15. The final pivot produced a basis that is optimal for all α between 0
and 13 1/3.
Speed
4. Branch-and-Bound
Let us recall from Chapter 1 that an integer program differs from a linear
program in that one or more of the decision variables must be integer-valued.
An example of an integer program appears below as Problem 13.C:
1a + 1b ≤ 5,
1a – 1b ≤ 0,
2a + 5c ≤ 7,
    a ≥ 0,   b ≥ 0,   c ≥ 0,
a, b and c are integer-valued.
Problem 13.C is easy to solve by trial and error. Two candidate solutions are

    a = 2, b = 3, c = 0,   objective = 12,
    a = 1, b = 4, c = 1,   objective = 12.5.
This example will be used to introduce two different methods for solving
integer programs. Both of these methods solve a sequence of linear programs.
Each linear program after the first differs from a linear program that had
been solved previously by the inclusion of one extra inequality constraint. For
that reason, each linear program other than the first is well-suited to solution
by the dual simplex method.
The LP relaxation
Three simplex pivots solve this relaxation and produce the tableau in Ta-
ble 13.7.
Branching
• Replace that linear program by two others, each with one added con-
straint. One of these new linear programs requires the decision variable
to be not greater than the next lower integer. The other requires the
decision variable to be not smaller than the next larger integer.
Bounding
The bound is the best (largest in the case of maximization) of the objec-
tive values of the feasible solutions to the integer program that have been
found so far. The initial bound is −∞ in the case of a maximization problem
and is +∞ in the case of a minimization problem. If the bound is finite, the
incumbent is a feasible solution to the integer program whose objective value
equals the bound.
Pruning
• The linear program is feasible, but its optimal value fails to improve on
the prior bound.
No linear program that is pruned can have a feasible solution that satis-
fies the integrality constraints and has an objective value that improves the
incumbent’s.
A branch-and-bound tree
The optimal solution to the linear program at Node 2 sets c = 0.6, and
Nodes 4 and 5 have the added constraints c ≤ 0 and c ≥ 1, respectively.
Let’s suppose that the linear program at Node 4 is solved next. Its optimal
solution is integer-valued and its optimal value equals 12. This optimal solu-
tion becomes the incumbent, and 12 becomes the bound. Any other node
whose optimal value is 12 or less can be pruned because adding constraints to
its linear program cannot improve (increase) its optimal value.
Executing branch-and-bound
One could solve the linear program at each node of the branch-and-
bound tree from scratch. An attractive alternative is to start with the optimal
solution to the node that is one level up and use the dual simplex method
to account for the new (inequality) constraint. How that is accomplished is
discussed next.
Rows 27-30 of Table 13.8 contain the optimal tableau for the LP relax-
ation. Row 31 models the new constraint, a ≤ 2. It does so by introducing
a slack variable s4 that converts this constraint to the equation a + s4 = 2.
The slack variable s4 is basic for row 31, but a is no longer basic for row
29. Pivoting on the coefficient of a in row 29 restores the basis, preserves
the optimality condition, and produces the tableau in rows 34-38. This
tableau's basic solution sets s4 = −0.5. A dual simplex pivot is called for. It
occurs on the coefficient of s2 in row 38. This pivot produces an optimal
tableau. (In general, more than one dual simplex pivot may be needed to
produce an optimal tableau.) Evidently, the optimal solution to the LP at
Node 2 sets
The decision variables s5, s1 and s2 are required to be nonnegative, so the left-
hand side of the above constraint cannot be negative. The RHS value of this
constraint is negative, so it can have no solution. This demonstrates that the
LP at Node 3 is infeasible. That node can be pruned.
Node 4
Node 4 depicts the LP at Node 2 with the added constraint c ≤ 0. Mimick-
ing the procedure used at node 2 solves this LP. (Its solution can be found in
the spreadsheet that accompanies this chapter.) Its optimal solution is
Node 5
Node 5 depicts the LP at Node 2 with the added constraint c ≥ 1. Mimick-
ing the procedure used at node 3 solves this linear program and produces the
optimal solution
• Making the slack or surplus variable for the new constraint basic for
that constraint.
• Making the decision variable that is being bounded basic for the equa-
tion for which it had been basic. (This does not affect the reduced costs,
so the optimality condition is preserved.)
Typically, only a few dual simplex pivots are needed to solve a particular
linear program in the tree. But the number of nodes in the branch-and-bound
tree could be enormous. More about this later.
A pure integer program differs from a linear program in that every deci-
sion variable is required to be integer-valued. A mixed integer program dif-
fers from a linear program in that one or more – but not all – of its decision
variables are required to be integer-valued.
Program 13.3 will now be solved by the “cutting plane” method. This
method can be made to work for the case of a mixed integer program. To sim-
plify the discussion, it is presented for the case of a pure integer program, for
2. A. H. Land and A. G. Doig, "An automatic method of solving discrete programming problems," Econometrica, Vol. 28, pp. 497–520, 1960.
Table 13.7 gives the optimal tableau for the LP relaxation of Program
13.3. This tableau is reproduced in dictionary format as:
The basic solution to system (11) is not feasible for Program 13.3 because it
fails to equate a, b and c to integer values.
A cutting plane
Each iteration of the cutting plane method selects any variable whose
value in the optimal solution violates an integrality constraint and uses the
equation for which it is basic to create one new constraint. Let us select the
variable a, as was done in the previous section. The basic solution to system
(11) sets a = 2.5, which violates the integrality constraint on a. We write 2.5 as
2 + 0.5 and write equation (11.2) as
The term "[…]" in (11.2) cannot exceed 0.5, so it must be that a ≤ 2. This leads
us to the linear program having the added constraint
(11.4) s4 + a = 2,
The basic solution to system (12) equates the variables a, b and s2 to inte-
ger values. Only c is equated to a fraction, so equation (12.3) must be used to
produce a cutting plane. The addend 0.4 s4 on the right-hand side of (12.3)
seems to present a slight difficulty, but substituting 0.4 = 1.0 − 0.6 lets (12.3)
be written as
The term "[…]" in the above cannot be larger than 0.6, so each integer-valued
solution to (12.3) satisfies c ≤ s4. This generates the cutting plane
(12.5) s6 + c − s4 = 0,
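The fractional-part construction behind both cuts can be written as a tiny routine. This sketch derives the standard Gomory fractional cut from a dictionary row; the coefficients in the call are hypothetical, since the full dictionary rows of system (11) are not reproduced here.

    from math import floor

    def gomory_cut(coeffs, rhs):
        """Derive a Gomory fractional cut from the tableau row

            x_B + sum(coeffs[j] * x_j) = rhs,

        where x_B is basic and the x_j are the (currently zero) nonbasic
        variables.  When rhs is fractional, every integer solution satisfies

            sum(frac(coeffs[j]) * x_j) >= frac(rhs),

        a constraint that the current basic solution violates.
        """
        frac = lambda v: v - floor(v)
        return [frac(a) for a in coeffs], frac(rhs)

    # Illustrative call with made-up nonbasic coefficients and the RHS 2.5
    # that appears in equation (11.2):
    cut_coeffs, cut_rhs = gomory_cut([0.5, -0.25], 2.5)   # ([0.5, 0.75], 0.5)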
A spreadsheet
Rows 20-25 of Table 13.10 describe a basic tableau that satisfies the opti-
mality condition for a maximization problem (the reduced costs are nonposi-
tive) and that equates the basic variables to nonnegative integer values. An
optimal solution to Problem 13.C has been found, and it is
Strong cuts
The first cut used equation (11.4) to impose the constraint a ≤ 2 and the
second cut used equation (12.5) to impose the constraint c ≤ s4. These were (as
it will turn out) the "strong" cuts of their type. A precise mathematical definition
of a strong cut requires notation that is slightly involved, but the example
will make everything clear. In this example, a is a basic variable, and b and c
are nonbasic. Substituting 3.8 = 3 + 0.8, 1.8 = 2 − 0.2 and −3.4 = −3 − 0.4
into (13) rewrites it as
(15) a ≤ 3 + 2b − 3c.
What’s best?
The cutting plane method is surely elegant, but it tends to run slowly on
practical problems. The branch-and-bound method is supremely inelegant,
but it runs surprisingly quickly on practical problems. Really big integer pro-
grams – such as routing fleets of aircraft – are being solved. They are solved by
artfully designed hybrid algorithms that begin with cutting planes3,4 and switch
to branch-and-bound.

3. Gomory, R. E., "Outline of an algorithm for integer solutions to linear programs," Bull. Amer. Math. Soc., Vol. 64, pp. 275–278, 1958.

4. Gomory, R. E., "An algorithm for integer solutions to linear programs," Princeton-IBM Research Technical Report Number 1, Nov. 17, 1958. Reprinted as pp. 269–302 of Recent Advances in Mathematical Programming, R. L. Graves and P. Wolfe (eds.), McGraw-Hill, NY, 1963.
6. Review
This chapter completes our account of the simplex method. It and its
twin, the dual simplex method, are so fast that they can be used as subrou-
tines in algorithms that solve optimization problems that are not linear. Their
speed and the fact that they report extreme points as optimal solutions ac-
count for the fact that rather large integer programs can be solved – and are
solved – fairly quickly.
1. Execute the dual simplex method on Program 13.1, but arrange for the
first pivot to occur on a coefficient in equation (2.1), rather than in equa-
tion (2.2).
2. Write the dual of Problem 13.A, labeling the variables that are comple-
mentary to its constraints as x and y and its slack variables as s1 through s4.
(a) Is the linear program you just constructed identical to a linear pro-
gram in Chapter 4?
(b) What pairs of variables (one from Problem 13.A and the other from
the dual that you have just constructed) are complementary?
(c) Fill in the blanks: At each iteration of the application of the dual
simplex method to Problem 13.A, the reduced cost of each decision
variable equals the value of the ___ in the comparable iteration of
____________.
4. (cycling and Bland’s rule) Suppose that Phase II of the dual simplex method
is being used to solve a linear program whose objective is maximization
and that a basic tableau has been found that has nonpositive reduced costs.
Fill in the blanks:
(a) The analogue of Rule A for the simplex method picks the pivot row as
follows: The pivot row has the most ____ RHS value; ties, if any, are
broken by picking the row whose basic variable is listed farthest to the
____.
(b) The analogue of Rule A for the simplex method picks the pivot col-
umn as one whose ratio is ____ to zero; ties, if any, are broken by pick-
ing the column that is _____.
(c) The rule chosen in parts (a) and (b) will cycle when it is used to solve
the dual of the linear program that appears in this book as _____.
(d) Cycling in the dual simplex method is precluded by using the ana-
logue of Bland’s rule, which resolves the ambiguity in the pivot ele-
ment as follows _____________.
5. In branch-and-bound, can it occur that all nodes get pruned? If so, under
what circumstances does it occur?
7. Solve Problem 13.C (on page 427) by branch-and-bound, but branch first
on the variable c, rather than a. Construct the analogue of Figure 13.1. (An
optimal tableau for the LP relaxation appears as Table 13.7 on the spread-
sheet for this chapter.)
8. Solve Problem 13.C (on page 427) by the cutting plane method, but begin
with a cutting plane for equation (11.3) rather than (11.2). (An optimal
tableau for the LP relaxation appears as Table 13.7 on the spreadsheet for
this chapter.)
Part V – Game Theory
Linear programs deal with a single decision maker who acts in his or her
best interest. Game theory deals with multiple decision makers, each of whom
acts in his or her own best interest. At first glance, these subjects seem to have
nothing in common. But there are two strong connections – one through the
Duality Theorem, the other through the simplex method.
The bi-matrix game is not a zero-sum game. The Duality Theorem pro-
vides no insight into it. But the simplex method does. Feasible pivots are used
to construct an equilibrium.
1. Preview
Prior chapters of this book have been focused on the search by a single
decision maker (individual or firm) for a strategy whose net benefit is largest,
equivalently, whose net cost is smallest.
Game theory is the study of models in which two or more decision mak-
ers must select strategies and in which each decision maker's well-being can
be affected by his or her strategy and by the strategies of the other partici-
pants. Being mindful of the interests of the other participants lies at the heart
of game theory.
It is emphasized:
In game theory, each of several players selects his or her strategy. As men-
tioned above, the benefit each player receives can depend on that player’s
strategy and on the strategies of the other players. Listed below are three dif-
ferent performance criteria:
1. This usage of "stable" is not uniformly agreed upon. Some writers have used "strong
equilibrium” instead, but that usage never caught on. Other writers describe a set of
outcomes (rather than strategies) as stable if no group of participants can all get out-
comes they prefer by changing their strategies simultaneously.
2. In the literature on economics, what we are calling an equilibrium is sometimes
referred to as a Nash equilibrium; this distinguishes it from a general equilibrium, this
being a Nash equilibrium in which the “market clears.”
This chapter presents a model for which each of these solution concepts
is germane. The “sealed-bid auction” in Section 3 has a dominant strategy.
The “marriage game” in Section 4 has stable strategies. The matrix game in
Section 5 has an equilibrium, as does the simplified model of an economy in
Section 6.
• The artwork is sold to the most recent (and highest) bidder at the price
that he or she had bid.
A Vickery auction
• The bidders are precluded from sharing any information about their
bids.
• The bids are opened, and the item is purchased by the person whose
bid is highest. The price that person pays is the second highest bid.
Such an auction has long been known as a Vickery auction. For it and
related work, Vickery (also spelled Vickrey) was awarded a share of the 1996
Nobel Prize in Economics. Like many good ideas, this one has deep roots.
Stamps had been auctioned in this way since the 1890s. To illustrate this type
of auction, we consider:
Problem 14.A. A Vickery auction will be used to sell a home. You are willing
to pay as much as $432,000 for this home. Others will bid on it. You have no
idea what they will bid. What do you bid?
A dominant strategy
Should you bid less than $432 thousand? Suppose you do. For purposes
of discussion, suppose you bid $420 thousand. If you win the auction, you pay
exactly the same price that you would have paid if you had bid $432 thousand.
But suppose the high bid was $425 thousand. By bidding low, you lost. Had
you bid $432 thousand, you would have gotten the home for $7 thousand less
than you would have been willing to pay for it. Evidently, you should not bid
less than $432 thousand.
Should you bid more than $432 thousand? Suppose you do. If you win the
auction, you pay the second highest price. If that price is below $432 thou-
sand, you pay the same amount you would if you had bid $432 thousand. If
that price is above $432 thousand, you pay more than the value you placed
on the house.
Evidently, you should bid exactly $432 thousand. This strategy is domi-
nant because it is best for you, independent of what the other players do. In a
Vickery auction, it is optimal for each bidder to bid the value that he or she
places on the item. The winning bidder earns a profit equal to the difference
between that person’s bid and the second highest bid.
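The dominance argument can also be checked numerically. The sketch below simulates many auctions against randomly drawn rival bids (the distribution is arbitrary and immaterial to the conclusion) and confirms that bidding your true value of $432 thousand never earns less than shading the bid down or padding it up.

    import random

    def profit(my_bid, my_value, rival_bids):
        """Profit to one bidder in a second-price (Vickery) auction."""
        top_rival = max(rival_bids)
        if my_bid > top_rival:            # win, and pay the second-highest bid
            return my_value - top_rival
        return 0.0                        # lose, and pay nothing

    value = 432_000
    for _ in range(10_000):
        rivals = [random.uniform(300_000, 500_000) for _ in range(3)]
        truthful = profit(value, value, rivals)
        assert truthful >= profit(420_000, value, rivals)   # shading never helps
        assert truthful >= profit(445_000, value, rivals)   # padding never helps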
The next illustration of game theory is a market that has two classes
of participants, say, class A and class B. Each member of class A seeks an as-
sociation with a member of class B, and conversely. It is assumed that:
• Each member of class A has a strict ranking over some or all of the
members of class B.
• Each member of class B has a strict ranking over some or all of the
members of class A.
If member α of either class omits member β of the other class from his/
her ranking, it means that member α prefers being unassociated to being
paired with member β. To illustrate this type of two-sided market, consider:
Graduates and firms: Each college graduate has a strict ranking over the var-
ious positions that he or she might fill, and each firm with an open position
has a strict ranking over the college graduates who might fill it.
Medical school graduates and hospitals: Each medical school graduate has a
strict preference over the internships that she or he might wish to fill. Each
hospital with one or more open internships has a strict preference over the
graduates who might fill each position.
Note that if a firm or a hospital has more than one open position, it is as-
sumed to rank candidates for its open positions independently.
A matching
Unstable matchings
• It can match an internship to a graduate that the hospital did not rank.
• It can match a graduate to an internship that the graduate did not rank.
In the first example, the hospital prefers to leave the internship vacant. In
the second, the graduate prefers no internship. In the third, the graduate and
hospital prefer to abandon their current assignments and pair up with each
other.
• Can there be more than one stable matching? If so, how do they com-
pare?
In 1962, David Gale and Lloyd Shapley coauthored a lovely paper4 that
posed this matching problem and answered the questions that are listed above.

4. D. Gale and L. Shapley, "College admissions and the stability of marriage," American Mathematical Monthly, Vol. 69, pp. 9–15, 1962.
Problem 14.B (the dance competition). For Saturday night's ballroom dance
competition, each of four women can be paired with any of five men. The four
women are labeled A through D, and the five men are labeled v through z.
Each woman has a strict preference over the men, and each man has a strict
preference over the women. The preferences are listed in Table 14.1. This table
indicates that woman A’s first choice is man w, her second choice is man x,
and so forth. This table also indicates that man z’s 1st choice is woman A, his
2nd choice is woman B, his 3rd choice is woman D, and he would rather stay
home than be partnered with woman C.
Let us consider whether a matching that includes the pairs (A, v) and (B,
x) can be stable. Woman B and man v are matched with their 1st choice part-
ners. They can do no better. What about woman A and man x? Woman A
prefers man x to her assigned partner. Man x prefers woman A to his assigned
partner. Given the option, they will break their dates and go to the dance
competition with each other. This matching is not stable.
DAP/M
The procedure that is described below has the acronym DAP/M, which
is short for “deferred acceptance procedure with men proposing.” The bid-
ding process is as follows:
1. In the first round, each man proposes to the woman he ranks as best.
2. Each woman who has multiple offers rejects all but the one she ranks
best. (No woman has yet accepted any offer.)
3. In the next round, each man who was rejected in Step 2 and has not ex-
hausted his preference list proposes to the woman he ranks just below
the one who just rejected him. Return to Step 2.
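A compact sketch of this bidding process appears below. The data structures are illustrative: preferences are given as dicts that map each person to a list of acceptable partners, best first, with anyone omitted from a list deemed unacceptable.

    def dap_m(men_pref, women_pref):
        """Deferred acceptance with men proposing (a sketch of DAP/M).

        Returns a dict pairing each matched woman with a man; men absent
        from its values end up unmatched.
        """
        next_choice = {m: 0 for m in men_pref}   # index of each man's next proposal
        engaged = {}                             # woman -> man whose offer she holds
        free = list(men_pref)
        while free:
            m = free.pop()
            if next_choice[m] >= len(men_pref[m]):
                continue                         # m exhausted his list; he stays home
            w = men_pref[m][next_choice[m]]
            next_choice[m] += 1
            ranking = women_pref.get(w, [])
            if m not in ranking:
                free.append(m)                   # w finds m unacceptable
            elif w not in engaged:
                engaged[w] = m                   # w tentatively holds the offer
            elif ranking.index(m) < ranking.index(engaged[w]):
                free.append(engaged[w])          # w trades up; her old suitor re-enters
                engaged[w] = m
            else:
                free.append(m)                   # w rejects the new offer
        return engaged

Feeding it the preferences of Table 14.1 should reproduce the matching that Table 14.2 arrives at in four rounds.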
Table 14.2 shows what happens when DAP/M is applied to the data in
Table 14.1. In the first round, woman A receives three offers and woman D
receives two. Woman A rejects offers from men y and z because she prefers
man v. Woman D rejects man x because she prefers man w. In the second
round, men x, y and z propose to the women they rank as second. Proposals
continue for four rounds. In Round 4, only man z proposes, and at the end of
this round he has been rejected by each woman he wishes to dance with. At
that point, DAP/M establishes the matching:
Proof of stability
Evidently, woman α and man ω do not prefer each other to their current
assignments. The matching created by DAP/M is stable.
DAP/W
DAP/W describes the same deferred acceptance procedure, but with the
women doing the proposing. For the data in Table 14.1, the women rank dif-
ferent men first, but woman C ranks man z highest, and he would rather
stay home, so she proposes to man y in the 2nd round. At this point, no man
has more than one offer, so the bidding process terminates. It produces the
matching
with man z unmatched. This too is a stable matching, and for the same reason.
Table 14.3. The rank that each player places on her/his partner
under DAP/M and DAP/W.

    process   A  B  C  D  v  w  x  y  z
    DAP/M     3  3  5  2  1  2  2  2  4
    DAP/W     1  1  2  1  4  4  3  4  4
The general matching problem that we’ve illustrated is known as the mar-
riage problem and is as follows:
• Each member of set M has a strict preference ranking over some or all
of the members of W.
Proposition 14.1. For the marriage problem, DAP/M and DAP/W pro-
duce stable matchings. Indeed:
(a) The stable matching produced by DAP/M is best for each member
of M and is worst for each member of W.
(b) The stable matching produced by DAP/W is best for each member of
W and is worst for each member of M.
Proof. The proof that DAP/M (and hence DAP/W) produce stable match-
ings is identical to the proof given for Problem 14.B. An inductive proof of
parts (a) and (b) can be found on page 32 of a book by Roth and Sotomayor.5 ■

5. A. E. Roth and M. Sotomayor, Two-sided matching: A study in game-theoretic modeling and analysis, Cambridge University Press, Cambridge, England, 1990.
Presented in this section is a game that had been analyzed decades before
the advent of linear programming by the great 20th century mathematician, John
von Neumann. To introduce this game, we consider:
Problem 14.C. You and I know the data in the matrix A whose entries are
given in equation (1). You choose a row of this matrix. Simultaneously, I
choose a column. I pay you the amount at the intersection. Each of us prefers
more money to less.
    (1)   A = [ 3  5  −2 ]
              [ 6  7   4 ] .
Problem 14.C is a zero-sum game because you win what I lose. For the
payoff matrix A that is given by (1), it is easy to see how to play this game.
You prefer row 2 to row 1 because each entry in row 2 is larger than the cor-
responding entry in row 1. Playing row 2 is dominant for you; it is better for
you than row 1 no matter what I do.
• I play column 3.
A minor complication
Let’s reconsider the same game with a different payoff matrix A, namely,
with
    (2)   A = [ 9  1  3 ]
              [ 6  5  4 ]
              [ 0  8  2 ] .
With payoff matrix (2), you no longer have a dominant row, and I no
longer have a dominant column.
To see what you should do, think about the least you can win if you pick
each row. If you pick row 1, the least you can win is 1. If you pick row 2, the
least you can win is 4. If you pick row 3, the least you can win is 0. Evidently,
picking row 2 maximizes the least you can win.
The payoff matrices given by equations (1) and (2) did not make this
game famous. Let us now turn our attention to the 3 × 4 payoff matrix A
given as
    (3)   A = [ 5  2  6  4 ]
              [ 2  3  1  2 ]
              [ 1  4  7  6 ] .
The “row mins” equal 2, 1 and 1. The “column maxes” equal 5, 4, 7 and 6.
The largest of the row mins (namely 2) is less than the smallest of the column
maxes (namely 4). This is enough to guarantee that there can be no equilib-
rium in which you play a particular row and I play a particular column.
Randomized strategies
Matrix notation
It proves handy to represent your strategy as a 1 × 3 (row) vector p and
mine as a 4 × 1 (column) vector q, so that
Let us recall that the jth column of the matrix A is denoted Aj and that its
ith row is denoted Ai. For the 3 × 4 payoff matrix A given by expression (3),
pAj equals your expected payoff if you choose strategy p and I play
column j,
Aiq equals my expected payout if I choose strategy q and you play row i.
Similarly, if I choose strategy q and you play row 3, your expected payoff
(and my expected payout) equals
Particular strategies
With strategy p∗ , you pick row 1 with probability of 1/2, you pick row
3 with probability of 1/2, and you avoid row 2. Similarly, with strategy q ∗ , I
pick column 1 with probability 1/3, I pick column 2 with probability 2/3, and
I avoid the other two columns.
The entries in the matrix product p∗A equal my expected payout if you
choose strategy p∗ and I choose the corresponding column of A.
    (4)   p∗A = [1/2  0  1/2] [ 5  2  6  4 ]
                              [ 2  3  1  2 ]  =  [ 3   3   6.5   5 ].
                              [ 1  4  7  6 ]
Evidently, if you play strategy p∗ , the least I can expect to lose is 3. More-
over, my expected payout equals 3 if I randomize in any way over columns 1
and 2 and play columns 3 and 4 with probability 0. Strategy q ∗ randomizes in
this way, in which sense it is a best response to strategy p∗ .
Similarly, the entries in the matrix product Aq ∗ equal your expected pay-
off if I play strategy q ∗ and you choose the corresponding row of A.
    (5)   Aq∗ = [ 5  2  6  4 ] [ 1/3 ]     [ 3    ]
                [ 2  3  1  2 ] [ 2/3 ]  =  [ 2.67 ].
                [ 1  4  7  6 ] [  0  ]     [ 3    ]
                               [  0  ]
Evidently, if I play strategy q ∗ , the most you can expect to win is 3, and
you win 3 if you randomize in any way over rows 1 and 3 and avoid row 2.
Strategy p∗ randomizes in this way, so it is a best response to strategy q ∗ .
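The arithmetic in expressions (4) and (5) is easy to verify directly; a minimal check with numpy:

    import numpy as np

    A = np.array([[5, 2, 6, 4],
                  [2, 3, 1, 2],
                  [1, 4, 7, 6]])
    p = np.array([0.5, 0.0, 0.5])        # the strategy p*
    q = np.array([1/3, 2/3, 0.0, 0.0])   # the strategy q*

    print(p @ A)       # [3.  3.  6.5 5. ]             -- expression (4)
    print(A @ q)       # [3.  2.6667  3.]  (approx.)   -- expression (5)
    print(p @ A @ q)   # 3.0, the value of the game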
An equilibrium
Expressions (6) and (7) show that the pair p∗ and q ∗ of strategies is an
equilibrium; each of these strategies is a best response to the other. These
expressions also show that 3 is the value of the game.
A “maximin” problem
If the row player chooses strategy p and the column player chooses col-
umn j, the row player’s expected payoff equals pAj , and it seems natural from
expression (4) that the row player aims to maximize the smallest such payoff.
In other words, the row player seeks a randomized strategy p∗ that solves:
    p1 + p2 + … + pm = 1,
    pi ≥ 0 for i = 1, …, m.

    p1 + p2 + … + pm = 1,
    pi ≥ 0 for i = 1, …, m.
in A. To see that Program 14.2 is bounded, note that each feasible solution
equates v to a value that does not exceed the largest entry in A.
A minimax problem
    q1 + q2 + … + qn = 1,
    qj ≥ 0 for j = 1, …, n.
We have seen that Program 14.2 and Program 14.4 are feasible and
bounded. Hence, both linear programs have optimal solutions.
• Each optimal solution to Program 14.2 prescribes the value v∗ of its
objective and a vector p∗ of probabilities.
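Program 14.2 is small enough to hand to any LP solver. The sketch below uses scipy (an assumption; the book's own computations use Solver) with the payoff matrix (3). The variables are (p1, …, pm, v); since linprog minimizes, the objective is −v.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[5, 2, 6, 4],
                  [2, 3, 1, 2],
                  [1, 4, 7, 6]], dtype=float)
    m, n = A.shape

    c    = np.r_[np.zeros(m), -1.0]          # minimize -v
    A_ub = np.c_[-A.T, np.ones(n)]           # v - (pA)_j <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)   # p_1 + ... + p_m = 1
    bounds = [(0, None)] * m + [(None, None)]      # p >= 0, v free

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    print(res.x[:m], -res.fun)               # p* = [0.5 0. 0.5] and v* = 3.0
    # With scipy's HiGHS solvers, the duals on the column constraints
    # recover q* (sign conventions vary, hence the absolute value):
    print(np.abs(res.ineqlin.marginals))     # [0.3333 0.6667 0. 0.]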
Proposition 14.2. Optimal solutions to Programs 14.2 and 14.4 exist and
have these properties: v∗ = w∗ and the pair p∗ and q∗ form an equilibrium.
Proof*. It is left to you (see Problem 11) to verify that Program 14.2 and
Program 14.4 are each other's duals. These linear programs are feasible and
bounded; the Duality Theorem (Proposition 12.2) demonstrates that they
have the same optimal value. So v∗ = w∗. From constraints (8) and (9), we
see that the optimal solutions to Programs 14.2 and 14.4 satisfy
Multiply inequality (10) by the nonnegative number qj∗ and then sum
over j. Since the qj∗'s sum to 1, this gives v∗ ≤ p∗Aq∗. Similarly, multiply (11)
by pi∗ and then sum over i to obtain v∗ ≥ p∗Aq∗. Thus,
    (12)   v∗ = p∗Aq∗.
Next, consider any strategy q for the column player. Multiply (10) by qj
and then sum over j to obtain
    (13)   v∗ ≤ p∗Aq.
Finally, consider any strategy p for the row player. Multiply (11) by pi and
then sum over i to obtain
    (14)   v∗ ≥ pAq∗.
It is easy to see that Programs 14.2 and 14.4 satisfy the Full Rank proviso,
hence that each basic solution to either assigns a shadow price to each con-
straint. The shadow prices for the optimal basis are an optimal solution to
the dual (Proposition 12.2), so it is only necessary to solve one of these linear
programs. The shadow prices that Solver reports for either linear program are
an optimal solution to the other linear program.
v* = maxp{minj(pAj)},
v* = minq{maxi(Aiq)}.
It’s easy to see that
Thus, the analysis of Program 14.2 and its dual has proved:
An historic conversation
• Dantzig was surprised to learn that the simplex method solves two lin-
ear programs, the one under attack and its dual.
• von Neumann was surprised to learn that the simplex method is the
natural weapon with which to prove the minimax theorem and to com-
pute solutions to matrix games.
Aggregation
Agents
• The prices of the goods are endogenous, which is to say that they are
set within the model.
• The consumers own all of the goods. Each consumer begins with an
endowment of each good.
• At the market, each consumer sells the goods that he or she owns but
does not wish to consume and buys the goods that he or she wishes to
consume, but does not own.
• Each consumer faces a budget constraint, namely, that the market value
of the goods that the consumer buys cannot exceed the market value of
the goods that the consumer sells.
• Each consumer trades at the market in order to maximize the value (to
that consumer) of the bundle of goods that he or she consumes.
The market for a good is said to clear if the quantity of the good that
is offered for sale is at least as large as the quantity that is demanded. (The
model that is under development allows for free disposal of unwanted goods.)
Whether or not the market clears can depend on the price; if the price of a
good is too low, its demand may exceed its supply. These prices are endog-
enous, which is to say that they are determined within the model.
A general equilibrium
• Each consumer maximizes his or her welfare, given the actions of the
other participants.
• Each producer maximizes profit, given the actions of the other partici-
pants.
• Each good is traded at the market and at a price that is set within the
model.
A simplification
The data
The data that describe this one-period model are listed below:
• For each good g and each technology t, the quantity Agt equals the net
output of good g per unit level of technology t.
Mnemonics are in use: The letter “g” identifies a good, the letter “t” iden-
tifies a technology, the number eg is called the consumer’s endowment of
good g, and the number ug is called the consumer’s utility of each unit of
good g. Goods exist in nonnegative quantities, so these endowments (the
eg’s) are nonnegative numbers. The consumer owns all of the assets. If, for
instance, good 7 is steel, then e7 equals the number of units of steel that
the consumer possesses at the start of the period. In this (linear) model,
the per-unit utility can vary with the good, but not with the quantity that is
consumed.
“Net output” can have any sign. If Agt is positive, good g is an output of
technology t. If Agt is negative, good g is an input to technology t. The m × n
array A of net outputs describes a linear activity analysis model of the sort
that had been discussed in Chapter 7.
Problem 14.D (general equilibrium). For this model, is there a general equi-
librium? If so, how can it be found?
For the model that is under development, a linear program and its dual
will be used to demonstrate that a general equilibrium exists and to construct
one.
The decision variables in these linear programs are of three types – the
level at which each technology is operated, the market price for each good,
and the amount of each good that the consumer consumes. For t = 1, …, n and
for g = 1, …, m, the model includes the decision variables:
    xt = the level at which the producers operate technology t during the period,
    zg = the amount of good g that the consumer consumes during the period,

    xt ≥ 0,   t = 1, …, n,
    zg ≥ 0,   g = 1, …, m.
Net production
The net production of good g during the period is given by the quantity
    Σ_{t=1}^{n} Agt xt .
Net profit
A producer who operates technology t must buy its inputs at the market
and sell its outputs at the market. Thus
    (16)   Σ_{g=1}^{m} pg Agt = the net profit per unit level of technology t.
This sum equals the revenue received from the outputs of the technology
less the price paid for its inputs. Capital is an input, so this sum is positive if
the producer earns an excess profit, that is, a profit that is above the market
rate of return on capital.
A producers’ equilibrium
    xt ≥ 0                            for t = 1, …, n,
    Σ_{g=1}^{m} pg Agt ≤ 0            for t = 1, …, n,
    xt · Σ_{g=1}^{m} pg Agt = 0       for t = 1, …, n.
But they were unnoticed for decades after Walras’s seminal work (1884) on
general equilibrium.
Market clearing
The market clearing constraint for good g states that the amount zg of
good g that is consumed by the consumer cannot exceed the sum of the con-
sumer’s endowment eg and the net production of good g. Expression (16)
specifies the net production of good g, so market clearing requires
    zg ≤ eg + Σ_{t=1}^{n} Agt xt ,   for g = 1, 2, …, m.
Free disposal accounts for the fact that these constraints are inequalities,
not equations. If a good g were noxious (think of slag), its market clearing
constraint would be an equation, rather than an inequality.
A consumer’s equilibrium
The consumer faces a budget constraint, namely, that the market value
of the bundle of goods that the consumer consumes cannot exceed the market
value of the consumer’s endowment. At a given set of prices, a consumer’s
equilibrium is any trading and consumption plan that maximizes the utility
of the bundle of goods that the consumer consumes, subject to the consum-
er’s budget constraint. Our model has only one consumer, and the satisfaction
that the consumer receives from each good is linear in the amount of that
good that is consumed. For our model, a consumer’s equilibrium is an opti-
mal solution to the linear program,
Consumer's LP. Maximize_z Σ_{g=1}^{m} ug zg , subject to the constraints

    Σ_{g=1}^{m} pg zg ≤ Σ_{g=1}^{m} pg eg ,

    zg ≥ 0,   for g = 1, …, m.
In this LP, the subscript “z” to the right of “Maximize” is a signal that
the zg’s are its decision variables. The prices (the pg’s) are fixed because the
consumer has no direct effect on the prices. The objective of this linear pro-
gram measures the consumer’s level of satisfaction (utility) with the bundle
of goods that he or she consumes. Its constraint keeps the market value of
the bundle of goods that the consumer consumes from exceeding the market
value of the consumer’s endowment.
A linear program
We are now poised to answer the questions posed in Problem 14.D. For
the case of a single (canonical) consumer, a general equilibrium will be con-
structed from the optimal solutions to Program 14.5 (below) and its dual.
Program 14.5. u∗ = Maximize_{z,x} Σ_{g=1}^{m} ug zg , subject to the constraints

    pg :   zg − Σ_{t=1}^{n} Agt xt ≤ eg ,   for g = 1, …, m,

           xt ≥ 0,   for t = 1, …, n,

           zg ≥ 0,   for g = 1, …, m.
A curious feature of Program 14.5 is that the producers are altruistic; they
set their production levels so as to maximize the consumer’s level of satisfac-
tion, with no regard for their own welfare.
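A tiny instance makes the construction concrete. The data below are hypothetical (two goods, two technologies, chosen so that no nonnegative activity vector yields a free lunch), and scipy stands in for Solver, an assumption; the duals on the market clearing rows are the prices that the following paragraphs interpret.

    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[ 1.0, -2.0],    # net output of good 1 per unit of each technology
                  [-1.0,  1.0]])   # net output of good 2
    e = np.array([4.0, 2.0])       # the consumer's endowment
    u = np.array([3.0, 5.0])       # the consumer's per-unit utilities

    m, n = A.shape
    # Variables (z_1..z_m, x_1..x_n): maximize u.z  s.t.  z - A x <= e, z >= 0, x >= 0.
    c    = np.r_[-u, np.zeros(n)]            # linprog minimizes
    A_ub = np.c_[np.eye(m), -A]
    res  = linprog(c, A_ub=A_ub, b_ub=e,
                   bounds=[(0, None)] * (m + n), method="highs")

    z, x = res.x[:m], res.x[m:]
    prices = np.abs(res.ineqlin.marginals)   # shadow prices on market clearing
    print("consumption:", z, "activity levels:", x)   # z = [4. 2.], x = [0. 0.]
    print("market prices:", prices)                   # p = [3. 5.], as in Program 14.6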
Is Program 14.5 feasible? Yes. The endowments (the eg’s) are nonnegative
numbers, so it is feasible to equate each decision variable to zero. Program 14.5
enforces the market clearing constraints. It omits these facets of a general
equilibrium:
The notation hints that the optimal values of the dual variables will serve
as market prices, and that a general equilibrium will be constructed from optimal
solutions to Program 14.5 and its dual.
In Program 14.5, the market clearing constraint on good g has been as-
signed the complementary dual variable, pg . Each decision variable in Pro-
gram 14.5 gives rise to a complementary constraint in its dual. This dual ap-
pears below as
Program 14.6. u∗ = Minimize_p Σ_{g=1}^{m} eg pg , subject to the constraints

    xt :   Σ_{g=1}^{m} pg (−Agt) ≥ 0,   for t = 1, …, n,

    zg :   pg ≥ ug ,   for g = 1, …, m,

           pg ≥ 0,   for g = 1, …, m.
(a) Program 14.5 and Program 14.6 have optimal solutions and have
the same optimal value.
Remark: The proof of Proposition 14.4 rests on the Duality Theorem for
linear programming and is starred for that reason.
    (19)   ug zg = pg zg ,   for g = 1, …, m.
    u∗ = Σ_{g=1}^{m} pg eg = Σ_{g=1}^{m} ug zg = Σ_{g=1}^{m} pg zg .

This equation demonstrates that the market value Σ_{g=1}^{m} pg eg of the con-
sumer's endowment equals the market value Σ_{g=1}^{m} pg zg of the bundle of
goods that the consumer consumes. In other words, this is a consumer's equi-
librium, and the proof is complete. ■
Recap
The key to this analysis has been the assumption of a single (canonical)
consumer. With one consumer, the content of this theorem remains valid for
the case of decreasing marginal returns on production and consumption. For
this more general case, the Lagrange multipliers that Solver reports are the
market prices.
Program 14.5 and its dual can have alternative optima, but they have a
unique optimal value, u*. Thus, the consumer's optimal consumption bundle
need not be unique, but the benefit obtained by the consumer is unique.
Problem 14.E (the prisoner's dilemma). You and I have been arrested and
have been placed in separate cells. The district attorney calls us into her office
and tells us that she is confident that we committed a major crime, but she
only has enough evidence to get us convicted of a minor crime. If neither of
us squeals on the other, each of us will do 1 year in jail on the minor crime. If
only one of us squeals, the squealer will not go to jail and the other will serve
7 years for the major crime. If both of us squeal, each will go to jail for 5 years
for the major crime. She tells us that we must make our decisions indepen-
dently, and then sends us back to our respective cells. She visits each of our
cells and asks each of us to squeal. Each of us prefers less time in the slammer
to more. How shall we respond?
The players in this game are being treated symmetrically. If both players
clam, both serve 1 year. If one clams and the other squeals, the squealer serves
no time and the person who was squealed on serves 7 years. If both squeal,
both serve 5 years. Table 14.4 displays the cost to each player under each pair
of strategies.
Table 14.4. The cost of each strategy pair, my cost at the left,
yours at the right.
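With the years-in-jail figures from the story above, the dominance is a two-line check (a minimal sketch; 0 stands for "clam" and 1 for "squeal"):

    # my_cost[i][j] = years I serve when I play i and you play j (0 = clam, 1 = squeal)
    my_cost = [[1, 7],
               [0, 5]]

    # Squealing strictly dominates clamming: fewer years for me whatever you do.
    for your_action in (0, 1):
        assert my_cost[1][your_action] < my_cost[0][your_action]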
8. Review
1. (a Vickery auction) In Problem 14.A, suppose you (the seller) are allowed
to place a sealed bid on the property that is for sale. Let V denote the value
you place on the property. What do you bid? Do you have a dominant
strategy? Under what circumstance will you earn a profit, and how large
will it be?
2. (the marriage game) Suppose all rankings are as in Table 14.1, except that
man z’s ranking is A B D C. What matching would DAP/M produce? What
matching would DAP/W produce? Would the same man stay home in both
of these rankings?
3. (the marriage game) How could you determine whether or not a particular
instance of the marriage game has a unique stable matching?
4. In Problem 14.B (the marriage game), the men do the proposing and
the true preferences are as given in Table 14.1. Can the women misrep-
resent their preferences in such a way that DAP/M yields the stable match-
ing that they would attain under DAP/W? If so, how?
5. (lunch) Each of six students brings a sandwich to school. These six students
meet for lunch in the school cafeteria. Each student has a strict preference
over the sandwiches. The students are labeled A through F. The sandwich-
es they bring are labeled a through f, respectively. Their preferences are as
indicated below. For instance, student A’s 1st choice is sandwich c (which
was brought by student C), her 2nd choice is sandwich e, her 3rd choice is
sandwich f, her 4th choice is the sandwich she brought, and so forth. Each
of these students will eat a sandwich at lunch, but not necessarily the one
that he or she brought.
student preference
A c e f a b d
B b a c e f d
C e f c a d b
D c a b e d f
E d c b f e a
F b d e f a c
7. Some matrix games can be solved by “eyeball.” The matrix game whose
payoff matrix A is given by (3) is one of them. Let’s see how:
(a) I (the column player) look at A and observe that playing rows 1 and
3 with probability of 0.5 is at least as good for you as is playing row 2.
What does this tell me about your best strategy?
(b) If you choose the randomization in part (a), my expected payout for
my four pure strategies form the vector [3 3 6.5 5]. What does
this tell me about the columns I should avoid?
(c) Has this game been boiled down to the 2 × 2 payoff matrix

        [ 5  2 ]
        [ 1  4 ]  ?
8. True or false: In a zero-sum matrix game, each player has a dominant
strategy. Support your answer.
9. Consider a two-person zero-sum matrix game whose m × n payoff ma-
trix A has the property that

    maxi(minj Aij) = minj(maxi Aij).
10. Use a linear program to find an equilibrium for the zero-sum matrix
game with payoff matrix A that is given by
        [ 1  2   3   4 ]
    A = [ 6  3   0  −3 ] .
        [ 2  8  −5  −1 ]
11. Using the cross-over table, take the dual of Program 14.2. Interpret the
linear program that you obtained. Are both linear programs feasible and
bounded? What does complementary slackness say about their optimal
solutions?
13. Find an equilibrium for the bimatrix game in which your cost matrix A
and my cost matrix B are given below. Hints: Can I pick a strategy such
that what you lose is independent of the row you choose? Can you do
something similar?
    A = [ 1  5 ] ,    B = [ 4  1 ] .
        [ 3  0 ]          [ 3  6 ]
14. Dick and Harry are the world’s best sprinters. They will place 1st and 2nd
in each race that they both run, whether or not they take a performance-
enhancing drug. Dick and Harry are equally likely to win a race in which
neither of them takes this drug or in which both take it. Either of them is certain
to win if only he takes it. They are in it for the money. Each race pays K
dollars to the winner, nothing to anyone else.
There is a test for this drug. The test is perfect (no false positives or
false negatives), but it is expensive to administer. If an athlete is tested at
the conclusion of the race and is found to have taken the drug, he is dis-
qualified from that race and is fined 12 K dollars.
(a) Without drug testing, are there dominant strategies? If so, what are they?
(b) Now, suppose that with probability p Dick and Harry are both tested
for this drug at the conclusion of the race. Are there values of p that
are large enough so that neither cheats? If so, how large must p be?
(c) Redo part (b) for the case in which Dick and Harry have product
endorsement contracts that are worth 10 K and that each of their con-
tracts has a clause stating that payment will cease if he tests positive
for performance-enhancing drugs.
15. You are a contestant in the TV quiz show, Final Jeopardy. Its final round
is about to commence. Each of three contestants (yourself included) has
accumulated a certain amount of “funny money” by answering questions
correctly in prior rounds. The rules for the final round are these:
• The program’s host announces the category of a question that he will pose.
• Knowing the category but not the question, each contestant wagers part
or all of his/her funny money by writing that amount on a slate that is
not visible to anyone else.
• Next, the question is posed, and each contestant writes his/her answer
on the same slate.
• Then, the slates are made visible. Each contestant who had the correct
(incorrect) answer has his/her funny money increased (decreased) by
the amount of that person’s wager.
• The contestant whose final amount of funny money is largest wins that
amount in dollars. The others win nothing.
Having heard the final category, you are confident that you will be able to answer the question correctly with probability q that exceeds 0.5. Your goal is to maximize the expectation of the amount that you will win.
(a) Denote as y your wealth position in funny money at the end of the
final round, and denote as f(y) the probability that you will win given
y. Can f(y) decrease as y increases?
(b) Denote as x your wealth position in funny money at the start of the final round and, given a wager of w (necessarily, w ≤ x), denote as e(x, w) the expectation of your winnings. Argue that
(c) For the final round, do you have a dominant strategy? If so, what is it,
and why is it dominant?
16. On the web, look up the definition of a “cooperative game” and of its
“core.” What are they?
Chapter 15: A Bi-Matrix Game
1. Preview
The zero-sum matrix game of von Neumann was studied in Chapter 14.
The current chapter is focused on a non-zero sum generalization. This gener-
alization, which is known as a bi-matrix game, is described below:
• The data are a pair A and B of m × n cost matrices. You (the row player) choose a row; simultaneously, I (the column player) choose a column.
• If you choose row i and I choose column j, you lose Aij and I lose Bij.
• You wish to minimize the expectation of your loss, and I wish to minimize the expectation of my loss.
Pivot strategies
In Chapter 14, we saw that the zero-sum matrix game has an equilibrium in randomized strategies and, moreover, that this equilibrium can be found by using the simplex method to solve a linear program. Here, we will see that
the bi-matrix game also has an equilibrium in randomized strategies. This
equilibrium will not be constructed by the simplex method, however. In its
place, an equilibrium will be found by application of a related procedure that
is known as the “complementary pivot method.”
Payoff matrices
The bi-matrix game has been introduced in the context of a pair A and B
of cost matrices. The entire discussion is adapted in Section 6 to the case in
which A and B are payoff matrices, rather than cost matrices.
Significance
The bi-matrix game is important in its own right, and the complementary
pivot method has several other uses, which include the solution of convex
quadratic programs. The most amazing feature of the complementary pivot
method may be that it leads directly to a method for approximating a “Brou-
wer fixed point,” as will be seen in Chapter 16.
2. Illustrations
Let us begin with the particularly simple instance of the bi-matrix game
whose cost matrices are
    A = | 1  5 | ,    B = | 4  1 |
        | 3  0 |          | 3  2 |
For these matrices, you do not have a dominant row, but I have a dominant column. Each entry in the 1st column of B is larger than the corresponding entry in the 2nd column of B. I will pick column 2 because it costs me less than column 1, independent of the row that you choose. Knowing that I will pick column 2, you choose row 2 because A22 = 0 < A12 = 5. For these matrices, the bi-matrix game has an equilibrium in nonrandomized strategies, namely:
• You choose row 2.
• I choose column 2.

Not every bi-matrix game has an equilibrium in nonrandomized strategies. Consider the pair of cost matrices

(1)    A = | 2  5  7 | ,    B = | 7  5  2 |
           | 5  7  3 |          | 2  1  6 |
Randomized strategies for this game take the form

    p = [p1  p2],        qT = [q1  q2  q3].

Here, pi is the probability that you play row i, so p1 and p2 are nonnegative numbers whose sum equals 1. Similarly, qj is the probability that I play column j, so q1, q2 and q3 are nonnegative numbers whose sum equals 1.
Solution by eye
Let us suppose that you randomize over the rows so that I am indifferent between columns 2 and 3. In other words, you choose p1 and p2 = (1 − p1) so that 5p1 + 1p2 = 2p1 + 6p2, which gives

(2)    p = [5/8  3/8].

Equation (2) specifies a randomized strategy for you, the row player. Each entry in the matrix product p B equals my expected loss if I play the corresponding column. For the randomized strategy p given above,

(3)    p B = [41/8  7/2  7/2].

Now let us suppose that I randomize in a way that avoids column 1 and makes you indifferent between rows 1 and 2. In other words, I pick q2 and q3 = (1 − q2) so that 5q2 + 7q3 = 7q2 + 3q3, which gives

(4)    qT = [0  2/3  1/3].

For this randomized strategy q,

(5)    A q = [17/3  17/3]T.

Hence, if I use the randomized strategy q given by (4), your expected loss equals 17/3, independent of which row you choose.

Equation (3) and the fact that q1 = 0 show that q is a best response to the strategy p given by (2). Equation (5) shows that p is a best response to q. Evidently, an equilibrium in randomized strategies has been constructed.
Empathy
The prior analysis of the cost matrices in (1) illustrates a principle that can help in the construction of an equilibrium.
• You (the row player) figure out which columns I should avoid and se-
lect your randomized strategy p so that I am indifferent between those
columns that I should not avoid.
• I (the column player) ascertain which rows you should avoid and select
my randomized strategy q so that you are indifferent between the rows
you should not avoid.
With larger and more complicated cost matrices, it can be difficult for the
players to “eyeball” strategies p and q that have these properties.
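Before turning to the general case, note that the "eyeball" computations above are easy to check numerically. The following minimal sketch (Python with numpy – my tooling, not the book's) evaluates p B and A q for the cost matrices in (1):

    import numpy as np

    A = np.array([[2., 5., 7.], [5., 7., 3.]])   # cost matrices from (1)
    B = np.array([[7., 5., 2.], [2., 1., 6.]])

    p = np.array([5/8, 3/8])       # equation (2): makes me indifferent
    q = np.array([0., 2/3, 1/3])   # equation (4): makes you indifferent

    print(p @ B)   # [5.125 3.5 3.5]; my cheapest responses are columns 2 and 3
    print(A @ q)   # [5.6667 5.6667]; your expected loss is 17/3 in either row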
3. An Equilibrium
In this section, our attention turns from the cost matrices in equation (1) to the general situation. Presented in this section is a set of equations that
prescribe an equilibrium in randomized strategies.
• The probability that you choose row i and I choose column j equals the
product piqj because our choices are made independently.
• If you choose row i and I choose column j, then you lose Aij and I lose
Bij.
Strategy p is a best response to strategy q if

(6)    p A q ≤ p̂ A q for each randomized strategy p̂ for the row player.

Expression (6) states that if I choose strategy q, you cannot reduce your expected loss below p A q, no matter what strategy p̂ you choose. Similarly, strategy q is a best response to p if

(7)    p B q ≤ p B q̂ for each randomized strategy q̂ for the column player.
A convenient simplification
For the 2 × 3 example whose cost matrices are specified by (1), let us ask
ourselves what happens if the number 10 is subtracted from every entry in B.
Numerically, this replaces the data in (1) with
(8)    A = | 2  5  7 | ,    B = | −3  −5  −8 |
           | 5  7  3 |          | −8  −9  −4 |
Linear constraints
Imposing (9) entails no loss of generality, and it lets us deduce the re-
quirement that probabilities sum to 1. That requirement is absent from the
system of equations and nonnegativity conditions in
(10)    1 + si = Σ_{j=1}^n Aij yj        for i = 1, . . . , m,

(11)    −1 + tj = Σ_{i=1}^m xi Bij       for j = 1, . . . , n,

(12)    xi ≥ 0, si ≥ 0                   for i = 1, . . . , m,

(13)    yj ≥ 0, tj ≥ 0                   for j = 1, . . . , n.
Proposition 15.1. Suppose that each entry in the cost matrix A is positive and that each entry in the cost matrix B is negative. Consider any solution to (10)-(13). Then the numbers ρ = Σ_{i=1}^m xi and σ = Σ_{j=1}^n yj are positive, and the numbers given by

(14)    pi = xi/ρ        for i = 1, . . . , m,

(15)    qj = yj/σ        for j = 1, . . . , n,

satisfy

(16)    1/σ + si/σ = Σ_{j=1}^n Aij qj        for i = 1, . . . , m,

(17)    −1/ρ + tj/ρ = Σ_{i=1}^m pi Bij       for j = 1, . . . , n.

Proof. Equations (14) and (15) define, respectively, a randomized strategy p for the row player and a randomized strategy q for the column player. Dividing (10) by σ produces (16), and dividing (11) by ρ produces (17). This completes a proof. ■
Complementary variables

The variables xi and si form a complementary pair, as do the variables yj and tj. A solution to (10)-(13) is said to be complementary if

(18)    xi si = 0        for i = 1, …, m,

(19)    yj tj = 0        for j = 1, …, n.
Proposition 15.2. Suppose that each entry in the cost matrix A is positive and that each entry in the cost matrix B is negative. Consider any complementary solution to (10)-(13). Then the pair p and q of randomized strategies specified by Proposition 15.1 form an equilibrium.

Proof. Multiply (16) by pi and then sum over i. The pi's sum to 1, and (18) gives si pi = 0 for each i, so

(20)    1/σ = p A q.
Similarly, multiply (17) by qj and then sum over j. The qj’s sum to 1, and
(19) gives tj qj = 0 for each j, so
(21) −1/ρ = p B q.
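Propositions 15.1 and 15.2 can be checked numerically on the example of Section 2 (subtracting 10 from every entry of B does not change best responses). The sketch below (Python with numpy, my tooling) reverses (14)-(15) to recover x and y from the equilibrium strategies, builds s and t from (10)-(11), and confirms the complementarity conditions (18)-(19):

    import numpy as np

    A = np.array([[2., 5., 7.], [5., 7., 3.]])         # positive costs, as in (8)
    B = np.array([[-3., -5., -8.], [-8., -9., -4.]])   # negative costs, as in (8)
    p = np.array([5/8, 3/8])                           # strategies found by eye
    q = np.array([0., 2/3, 1/3])

    sigma = 1.0 / (p @ A @ q)    # (20):  1/sigma = p A q
    rho = -1.0 / (p @ B @ q)     # (21): -1/rho  = p B q
    x, y = rho * p, sigma * q    # invert (14) and (15)
    s = A @ y - 1.0              # (10): s_i = sum_j A_ij y_j - 1
    t = 1.0 + x @ B              # (11): t_j = 1 + sum_i x_i B_ij
    print(np.allclose(x * s, 0.0), np.allclose(y * t, 0.0))   # True True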
Complementary bases

Let us rewrite (10) and (11) as the equation system

(22)    si − Σ_{j=1}^n Aij yj = −1        for i = 1, . . . , m,

(23)    tj − Σ_{i=1}^m xi Bij = 1         for j = 1, . . . , n.

A basis for (22)-(23) is called complementary if:
• For each i, the basis includes exactly one member of the pair {xi, si}.
• For each j, the basis includes exactly one member of the pair {yj, tj}.
• Its basic solution equates each decision variable to a value that is nonnegative.
The set {s1, …, sm, t1, …, tn} of decision variables is a basis for (22)-(23),
and this set does include exactly one member of each complementary pair,
but its basic solution is not feasible because it sets siâ•›=â•›−1 for each i.
(24)    si − Σ_{j=1}^n Aij yj − i α = −1        for i = 1, . . . , m.
Now, when the nonbasic variable α is set equal to 1, the basic solution has s1 = 0, and it equates s2 through sm to positive values.
A sequence of pivots will occur on the system consisting of (23) and (24).
The initial pivot will occur on the coefficient of α in the equation for which
s1 is basic. The variable α enters the basis, and s1 departs. The resulting basis
consists of the set {α, s2 , . . ., sm , t1 , . . ., tn } of decision variables. Its basic so-
lution sets
    α = 1,
    si = i − 1        for i = 2, 3, …, m,
    tj = 1            for j = 1, 2, …, n.
A pivot sequence
• The initial pivot occurs on the coefficient of α in the equation for which
s1 is basic.
• The entering variable in each pivot after the first is the complement of
the variable that departed on the preceding pivot, and the row on which
that pivot occurs is determined by the usual ratios, which keep the ba-
sic solution nonnegative.
Since s1 leaves on the 1st pivot, its complement, x1, will enter on the 2nd
pivot. The usual ratios determine the coefficient of x1 to pivot upon. The vari-
able that departs in the 2nd pivot is the basic variable for the row on which that
pivot occurs. Its complement enters on the 3rd pivot. And so forth – until a
pivot occurs on a coefficient in the row for which α is basic.
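This pivot sequence is concrete enough to sketch in code. The following is a minimal Python/numpy rendering of the rule just described – my own construction, not the book's spreadsheet implementation – and it assumes nondegeneracy, breaking any ratio-test ties arbitrarily:

    import numpy as np

    def complementary_pivot(A, B, max_pivots=200):
        """Complementary pivot method for the system (23)-(24).

        A must have all entries positive and B all entries negative.
        Columns are ordered x_1..x_m, y_1..y_n, s_1..s_m, t_1..t_n, alpha.
        """
        m, n = A.shape
        xs = list(range(0, m)); ys = list(range(m, m + n))
        ss = list(range(m + n, 2 * m + n)); ts = list(range(2 * m + n, 2 * m + 2 * n))
        al = 2 * m + 2 * n
        T = np.zeros((m + n, al + 2))          # augmented tableau [M | RHS]
        for i in range(m):                     # (24): s_i - A_i y - i*alpha = -1
            T[i, ss[i]] = 1.0
            T[i, ys] = -A[i, :]
            T[i, al] = -(i + 1)
            T[i, -1] = -1.0
        for j in range(n):                     # (23): t_j - x B_j = 1
            T[m + j, ts[j]] = 1.0
            T[m + j, xs] = -B[:, j]
            T[m + j, -1] = 1.0
        comp = {}                              # the complementary pairs
        for i in range(m):
            comp[xs[i]], comp[ss[i]] = ss[i], xs[i]
        for j in range(n):
            comp[ys[j]], comp[ts[j]] = ts[j], ys[j]
        basis = ss + ts

        def pivot(row, col):                   # make 'col' basic in 'row'
            T[row] /= T[row, col]
            for r in range(m + n):
                if r != row:
                    T[r] -= T[r, col] * T[row]
            departed, basis[row] = basis[row], col
            return departed

        departed = pivot(0, al)                # alpha enters; s_1 departs
        for _ in range(max_pivots):
            col = comp[departed]               # complement of the departer enters
            ratios = [T[r, -1] / T[r, col] if T[r, col] > 1e-12 else np.inf
                      for r in range(m + n)]
            if not np.isfinite(min(ratios)):
                raise RuntimeError("ray encountered; Proposition 15.4 precludes this")
            departed = pivot(int(np.argmin(ratios)), col)
            if departed == al:                 # alpha departs: complementary basis
                break
        x = np.array([T[basis.index(v), -1] if v in basis else 0.0 for v in xs])
        y = np.array([T[basis.index(v), -1] if v in basis else 0.0 for v in ys])
        return x / x.sum(), y / y.sum()        # normalize as in (14)-(15)

For the matrices A and B in (8), this sketch should reproduce the equilibrium constructed by eye in Section 2, namely p = (5/8, 3/8) and q = (0, 2/3, 1/3); its early pivots mirror the illustration below, with α entering and s1 departing, then x1 entering and t3 departing.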
An Illustration
• Rows 3-4 of this tableau mirror equation (24). Rows 5-7 mirror equa-
tion (23).
• In each pivot, the entering variable’s label is lightly shaded, the pivot
element is shaded more darkly, and the departing variable’s label is sur-
rounded by dashed lines.
• The 1st pivot occurs on the coefficient of α in the row for which s1 is
basic.
• Since s1 departs on the 1st pivot, its complement x1 enters on the 2nd
pivot.
• Column N displays the usual ratios. The 2nd pivot occurs on the coefficient of x1 in row 14 because its ratio is smallest. The variable t3 is basic for the equation modeled by that row. Hence, the complement y3 of t3 will enter on the 3rd pivot.
Table 15.1. The first two pivots for the matrices A and B in (8).
The 5th pivot produces a complementary basis; its basic solution can be read from the spreadsheet that accompanies this chapter.
Failure?
What can go wrong? Like any pivot scheme, the complementary pivot
method could cycle (revisit a basis). And it could call for the introduction of a
variable that can be made arbitrarily large without reducing the values of any
of the basic variables. In the next section, a routine perturbation argument
will preclude cycling, and each entering variable will be shown to have a coef-
ficient on which to pivot.
The structure of the argument used here is strikingly similar to the argu-
ment that will be used in Chapter 16 to construct a Brouwer fixed-point. In
both chapters, the pivot scheme can be described as taking a “walk” through
the “rooms” of a mansion.
Rooms
Let us call each almost complementary basis a green room, and let us call
each complementary basis a blue room. The mansion is the set of all green
and blue rooms. Thus, each room in the mansion identifies a complementary
basis or an almost complementary basis. Doors between certain rooms will
soon be created.
Degeneracy
Perturbation
• For i = 1, …, m, subtract ε^i from the RHS value of the ith equation in (24).
• For j = 1, …, n, add ε^(m+j) to the RHS value of the jth equation in (23).
Reversibility of pivots

Consider an equation system

    C w = b,

whose data are the entries in the r × s matrix C and in the r × 1 vector b and whose decision variables form the s × 1 vector w.
Doors
Each green room (almost complementary basis) has two doors, each
door corresponding to selecting as the entering variable one member of the
pair that is not represented in this basis. Each blue room (complementary
basis) has a single door, which corresponds to selecting α as the entering
variable.
Each label identifies the entering variable for the pivot that corresponds
to walking through that door. Each door between two rooms is labeled on
both sides. The label on the other side of the door you are looking at is that
of the variable that will depart as you walk through the door. Because piv-
ots are reversible, after walking through a door into a new room, you could
turn around and walk through the same door back to the room at which you
started.
Let us observe that the mansion does have a door to the outside. To exhibit
such a door, we consider the green room (almost complementary basis) that
results from the initial pivot. That basis is the set {α, s2 , . . ., sm , t1 , . . ., tn } of
decision variables. This pivot removed s1 from the basis. Let us take s1 as the
entering variable. Setting s1 positive alters the values of the basic variables
like so:
    α = 1 + s1,
    si = (i − 1) + i s1        for i = 2, 3, …, m,
    tj = 1                     for j = 1, 2, …, n.

No basic variable decreases as s1 grows, so s1 can be made arbitrarily large. This door leads out of the mansion.
We have seen that the mansion has a door between a green room (almost
complementary basis) and the outside. Suppose – and this will be demon-
strated – that:

• There is only one door between the mansion and the outside.
It will soon be clear that if there is only one door to the outside, the comple-
mentary pivot method must terminate with a pivot to a blue room (comple-
mentary basis).
Complementary paths
Figure 15.1 displays the types of paths we can encounter if there is only
one door to the outside. The mansion (set of rooms) is enclosed in dashed
lines, and the door to the outside is represented by a line segment between a
green room and the outside.
[Figure 15.1. Complementary paths: chains of rooms, with G marking a green room and B marking a blue room.]
Figure 15.1 illustrates the three types of complementary path that can oc-
cur if there is only one door to the outside:
• Type 1: A loop of green rooms. Such a loop touches neither a blue room nor the outside, and the walk described below never visits it.
• Type 2: Suppose we enter the mansion and then follow the complementary path. We cannot revisit a room – the first room we revisit would need to have at least three doors, and none do. We cannot leave the mansion – there is only one door to the outside, and we would need to revisit the room we entered before we left. We must end at a blue room (complementary basis).
• Type 3: Suppose we begin at a blue room other than the one at the end
of the Type-2 path. We cannot join the Type-2 path because it has no
unused doors. We cannot leave the mansion because there is only one
door to the outside. We must end at another blue room.
The complementary pivot method follows the Type-2 path. This path
must lead to a complementary basis – hence to an equilibrium in randomized
strategies – if we can show that there is only one door to the outside.
Something extra
No pivot row
Two propositions will be used to show that the bi-matrix game, as speci-
fied by equations (23) and (24), has only one door to the outside. The first of
these propositions is not particular to the bi-matrix game – it relates an enter-
ing variable that can be made arbitrarily large to a “homogeneous” equation
    C w = b and w ≥ 0,
whose data are the entries in the r × s matrix C and in the r × 1 vector b and whose decision variables form the s × 1 vector w. As is usual, Cj denotes the jth
column of the matrix C. This proposition studies the case in which an enter-
ing variable xk can be made arbitrarily large without causing any of the basic
variables to become negative.
Proposition 15.3. Let β index the columns of a basis for this system, and suppose that, for every θ ≥ 0, the equation

(25)    Σ_{j∈β} Cj w(θ)j + θ Ck = b

has a solution w(θ) ≥ 0. Then there exists a non-zero nonnegative solution to

(26)    Σ_{j∈β} Cj vj + Ck vk = 0.
Proof. Since the columns Cj for j ∈ β form a basis, there exist (unique) numbers zj for j ∈ β such that

    Ck = Σ_{j∈β} Cj (−zj).

Subtracting (25) with the value θ = 0 from (25) with a positive value θ, and then substituting this representation of Ck, produces

    Σ_{j∈β} Cj [w(θ)j − w(0)j − θ zj] = 0.

The columns Cj for j ∈ β are linearly independent, so w(θ)j = w(0)j + θ zj for each j ∈ β. These values satisfy w(θ)j ≥ 0 for every θ ≥ 0, so each zj must be nonnegative. Setting vj = zj for each j ∈ β and vk = 1 produces a non-zero nonnegative solution to (26). ■
Proposition 15.3 concerns the feasible region of any linear program that
has been cast in Form 1. Its feasible region consists of the vectors w that sat-
isfy C w = b and w ≥ 0.
The crux of the argument that the complementary pivot method cannot
cycle appears below as
Proposition 15.4 (one door). Suppose that each entry in the cost matrix
A is positive and that each entry in the cost matrix B is negative. Consider an
almost complementary basis for system (23) and (24) and suppose that per-
turbing its basic solution by equating one of the missing pair of variables to a
positive value causes no basic variable to decrease in value. Then the basis is
{α, s2 , . . . , sm , t1 , . . . , tn } and the entering variable is s1.
Proof*. The proof of this proposition earns a star because it is a bit in-
tricate.
Since the entering variable can be made arbitrarily large, Proposition 15.3 shows that there exists a solution to the homogeneous version of these equations, namely, a nonnegative solution to
(31)    ŝi − Σ_{j=1}^n Aij ŷj − i α̂ = 0        for i = 1, . . . , m,

(32)    t̂j − Σ_{i=1}^m x̂i Bij = 0              for j = 1, . . . , n.
Further, in this homogeneous solution, only the entering variable and the basic variables can be positive. Since the basis is almost complementary and since the complement of the entering variable is not basic, these solutions satisfy

(33)    x̂i ŝi = 0 for i = 1, …, m,        ŷj t̂j = 0 for j = 1, …, n.

Because each entry in B is negative, a positive x̂i would force every t̂j to be negative, so all of the variables in (32) equal zero; hence not all of the variables in (31) can equal zero. Each coefficient Aij is positive, so (31) guarantees that

    ŝi > 0 for each i.

Thus, each ŝi is either basic or is the entering variable. In either case, no xi can be basic, so the basic solution has

    xi = 0 for each i.

Equation (23) then equates

    tj = 1 for each j,

so each tj is basic, and consequently each yj is nonbasic, with

    yj = 0 for each j.

Every basis for (23) and (24) contains (m + n) variables. By hypothesis, the current basis is almost complementary, so it includes α. It includes t1 through tn, and it includes (m − 1) of the variables s1 through sm. It must exclude exactly one of the variables s1 through sm.
The clincher

6. Payoff Matrices

Suppose now that the game is specified by a pair A and B of m × n payoff matrices:

• If you choose row i and I choose column j, you earn Aij and I earn Bij.

Expected net cost is the negative of expected net profit. A perfectly satisfactory way to treat the problem with payoff matrices is to replace A by −A and B by −B and to solve the resulting bi-matrix game with cost matrices.
Until this point, our discussion of bi-matrix games has been focused on
competitive behavior. This section concerns a model that includes coopera-
tion. Let us consider this situation:
• If you choose row i and I choose column j, you receive Aij dollars and I
receive Bij dollars.
• You and I can engage in a contract that governs how the total Aijâ•›+â•›Bij of
the amounts we receive is to be allocated between us.
Consider, for instance, the payoff matrices

(35)    A = |  4  1 | ,    B = | 1  2 |
            | −3  0 |          | 3  2 |
The matrix A + B measures the total of the rewards we can attain, and the matrix A − B represents the difference between our rewards. For the data in (35),

(36)    A + B = | 5  3 | ,    A − B = |  3  −1 |
                | 0  2 |              | −6  −2 |
A threat
To get me to play column 1, you will need to compensate me. I can threat-
en to play column 2. In this case – and in general – a reasonable measure of the
threat is the value of the game A – B. For the data in (36), that value equals –1.
(It has an equilibrium in pure strategies; you play row 1, and I play column 2).
A natural arrangement sets

    α = the largest entry in A + B,        β = val(A − B),

where "val(A − B)" is short for the value (to the row player) of the game whose payoff matrix is A − B. A reasonable division of the spoils is that the row player receives (α + β)/2 and the column player receives (α − β)/2. For the data in (35) and (36),

    α = $5.00,        β = −$1.00.

The total of $5.00 is divided like so: you receive $2.00, and I receive $3.00. Interestingly, although the cooperative solution awards you (the row player) $4.00 and me (the column player) $1.00, I have enough bargaining power to come out ahead, by this account.
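The arithmetic of this section is easy to mechanize. A minimal check in Python with numpy (my tooling, not the book's):

    import numpy as np

    A = np.array([[4, 1], [-3, 0]])              # payoff matrices from (35)
    B = np.array([[1, 2], [3, 2]])

    alpha = (A + B).max()                        # the best attainable total: 5
    D = A - B
    beta = D.min(axis=1).max()                   # maximin of A - B
    assert beta == D.max(axis=0).min()           # saddle point, so val(A - B) = -1
    print((alpha + beta) / 2, (alpha - beta) / 2)   # 2.0 for you, 3.0 for me

The assert holds here because A − B has an equilibrium in pure strategies; for matrices without a saddle point, val(A − B) would be computed by the linear program of Chapter 14.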
8. Review
Except for a brief foray into cooperative behavior, this chapter has fo-
cused on competition. The complementary pivot method has been used to
find an equilibrium, that is, a pair of strategies in which each player responds
optimally to the actions of the other.
2. By eye, find all equilibria for the bi-matrix game whose cost matrices A
and B are given below. Remark: Barring “degeneracy,” the number of equi-
libria is odd.
    A = B = | 0  6 |
            | 6  0 |
3. By eye, find all equilibria for the bi-matrix game whose cost matrices A
and B are given below. Remark: Barring “degeneracy,” the number of equi-
libria is odd.
    A = B = | 0  6  0 |
            | 0  0  6 |
            | 6  0  0 |
4. Construct 2 × 2 cost matrices A and B for which the bi-matrix game has three equilibria with these properties: amongst these equilibria, one has the best (unique smallest) expected cost for the row player and the worst (unique largest) expected cost for the column player, and another has the worst (unique largest) expected cost for the row player and the best (unique smallest) expected cost for the column player.
5. Find all equilibria for the bi-matrix game whose cost matrices A and B are
given below.
    A = | 1  3 | ,    B = | 0  0 |
        | 4  0 |          | 0  0 |
        | 2  7  5  6 |         | 9  4   5  6 |
    A = | 7  7  2  6 | ,   B = | 5  3  11  3 |
        | 7  0  4  4 |         | 8  4   4  8 |
        | 1  7  5  2 |         | 2  0   7  6 |
7. When comparing vectors x and y that have the same size, the inequality x < y means x ≤ y and x ≠ y. (Thus, x < y if x ≤ y and if at least one of these inequalities is strict.) In a bi-matrix game, column k of the cost matrix B is said to be weakly dominated if there exists an n × 1 vector q whose entries are nonnegative, whose entries sum to 1, and that satisfies Bq < Bk and qk = 0.
(a) Suppose column k is weakly dominated. Argue that at least one pair p and q of strategies that is an equilibrium has qk = 0.
(b) A constant-sum matrix game can have multiple equilibria, but they all
have the same expected ____________________. Complete the sen-
tence and indicate why it is true.
9. This problem concerns the bi-matrix game whose payoff matrices A and B
are given by
    A = | 1  3 | ,    B = | −2  −1 |
        | 5  2 |          |  4   0 |
(a) Find the equilibrium and its expected payoff to each player. Hint: Each
player can arrange for the other’s expected payoff to be independent of
his strategy.
(b) What is the largest total amount α that the two players could receive
from this game?
(c) Find the value β of the matrix game whose payoff matrix is A – B. Can
the row player guarantee a minimum payoff that exceeds that of the
column player by β?
(d) Describe a procedure that provides the row player with a payoff of
(αâ•›+â•›β)/2 and the column player with a payoff of (α – β)/2.
Chapter 16: Fixed Points and Equilibria
1. Preview
This chapter introduces fixed-point computation and its use in the calculation of economic equilibria. The development begins with a statement of Brouwer's fixed-point theorem. Next, the problem of finding an equilibrium is formulated for solution as a Brouwer fixed-point problem. The complementary pivot method is then adapted to approximate Brouwer fixed points, first on the "unit simplex," then on any simplex, and finally on any closed convex subset of Rn. The chapter is closed with a discussion of the related literature. This chapter is meaty, but geometric diagrams will help us to grapple with its key ideas.
A few examples
A continuous function that maps a subset C of Rn into itself need not
have a fixed point within that set. The following three examples suggest what
can keep this from happening.
In Example 16.1, the set C is not closed. In Example 16.2, the set C is
closed but not bounded. In Example 16.3, the set C is closed and bounded,
but it has a “hole.”
A sufficient condition
Two players
As in Chapter 15, the data for the game we shall play are the entries in the m × n matrices A and B. You (the row player) select a row. Simultaneously,
I (the column player) select a column. If you pick row i and I pick column j,
you receive the amount Aij , and I receive the amount Bij . You and I know
the entries in the matrices A and B. You prefer a larger expected payoff to a
smaller one, and so do I.
An equilibrium
A pair (p, q) of randomized strategies is an equilibrium if every randomized strategy p̄ for the row player and every randomized strategy q̄ for the column player satisfy

    p̄ A q ≤ p A q        and        p B q̄ ≤ p B q.
Let us suppose that the inequality

(1)    Ai q > p A q

holds when i = 3. This inequality states that if I play strategy q, you can do better by playing row 3 than you can by playing strategy p. This connotes that you ought to increase the probability p3 that you play row 3. More generally, you benefit by increasing pi for each row i for which (1) holds. To do so, you will need to reduce the probability that you play rows for which (1) does not hold.

This observation leads to the functions

(2)    αi(p, q) = max{0, Ai q − p A q}        for i = 1, 2, . . . , m.

Note that αi(p, q) is positive if and only if (1) holds. Consider the numbers p̂1 through p̂m that are defined by

(3)    p̂i = (pi + αi(p, q)) / (1 + Σ_{k=1}^m αk(p, q)),        for i = 1, 2, . . . , m.
The same argument works for me, the column player. Let us suppose that
the inequality
(4) p Bj > p B q.
holds for some j. This says that if you play strategy p, I can do better by play-
ing column j than by playing strategy q. This suggests that I should increase
the probability qj that I play column j. This observation leads to the func-
tions,
qj + βj (p, q)
(6) q̂j = for j = 1, 2, . . . , n.
1 + nk=1 βk (p, q)
A fixed point
Equations (3) and (6) adjust the strategies of the row and column players through the function f that is given by

    f(p, q) = (p̂, q̂),

where p̂i and q̂j are specified by the right-hand sides of (3) and (6), respectively. A pair (p, q) of probability distributions is said to be a stable distribution if

    f(p, q) = (p, q).
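The map f is simple to program. The sketch below (Python with numpy, an assumption of mine) applies the right-hand sides of (3) and (6) once and checks the stability condition at a pure equilibrium of a small coordination game; the matrices are hypothetical, chosen only for illustration:

    import numpy as np

    def nash_map(p, q, A, B):
        """One application of f: (p, q) -> (p-hat, q-hat) via (3) and (6)."""
        alpha = np.maximum(0.0, A @ q - p @ A @ q)   # row-wise gains, as in (2)
        beta = np.maximum(0.0, p @ B - p @ B @ q)    # column-wise gains, as in (5)
        return (p + alpha) / (1.0 + alpha.sum()), (q + beta) / (1.0 + beta.sum())

    A = np.array([[2.0, 0.0], [0.0, 1.0]])   # payoffs to the row player
    B = np.array([[1.0, 0.0], [0.0, 2.0]])   # payoffs to the column player

    # The pure pair (row 1, column 1) is an equilibrium, hence stable:
    print(nash_map(np.array([1.0, 0.0]), np.array([1.0, 0.0]), A, B))
    # The uniform pair is not an equilibrium; f moves it:
    print(nash_map(np.array([0.5, 0.5]), np.array([0.5, 0.5]), A, B))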
Proposition 16.2, below, shows that at least one stable distribution exists,
moreover, that each stable distribution is an equilibrium.
Proposition 16.2.
(a) There exists at least one pair (p, q) of stable probability distributions.
(b) Each stable pair (p, q) of probability distributions is an equilibrium.
Proof. The set C of pairs (p, q) of probability distributions over the rows
and columns is closed, bounded and convex. The function f that maps the
pair (p, q) into (p̂, q̂) via (3) and (6) is a continuous map of C into itself.
Brouwer’s theorem (Proposition 16.1) shows that this function has at least
one fixed point, hence that there exists at least one pair (p, q) of probability
distributions for which (3) and (6) hold with p̂ = p and q̂ = q. This proves
part (a).
For part (b), let us first consider any pair (p, q) of strategies. Set v = pAq. It will be argued by contradiction that there exists at least one row i for which pi > 0 and v ≥ Ai q. Suppose not. Multiply v < Ai q by pi and sum over i to obtain the contradiction, v < pAq = v.
Now suppose that the pair (p, q) is stable, and select a row i for which pi > 0 and αi(p, q) = 0. For this row, (3) reduces to

    pi = (pi + 0) / (1 + Σ_{k=1}^m αk(p, q)),

so the sum Σ_{k=1}^m αk(p, q) equals 0, each αk(p, q) equals 0, and hence

    Ak q ≤ p A q        for k = 1, 2, · · · , m.
Let p̂ be any randomized strategy for the row player. Premultiply the inequality that is displayed above by p̂k and sum over k to obtain p̂ A q ≤ p A q,
which shows that p is a best response to q. A similar argument demonstrates
that βj (p, q) = 0 for each j and, consequently, that q is a best response to p.
This completes a proof. ■
A prize-winning result
In 1950, John Nash1 published the content of Proposition 16.2, with the same proof, in a brief, elegant and famous paper. In the economics literature,
an equilibrium for an n-player game is often referred to as a Nash equilib-
rium. This distinguishes it from a general equilibrium, in which the market
clearing conditions are also satisfied. John Nash shared in the 1994 Nobel
Prize in Economics, which was awarded for “pioneering analysis of equilibria
in the theory of noncooperative games.”
Nash’s paper was exceedingly influential, but a case can be made that the
pioneering work in this area had been done by John von Neumann and Oskar
Morgenstern. In 1928, von Neumann2 had introduced the matrix game and
had made the same use of Brouwer’s fixed point theorem in his demonstra-
tion that a pair of randomized strategies has the “minimax” property that is
described in Chapter 14. Von Neumann and Morgenstern had shown in their
1944 book3 that a zero-sum game has an equilibrium in randomized strategies.
1. Nash, John F. (1950), "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, v. 36, pp. 48-49.
2. von Neumann, John (1928), "Zur Theorie der Gesellschaftsspiele," Math. Annalen, v. 100, pp. 295-320.
3. von Neumann, John, and Oskar Morgenstern (1944, reprinted in 2007), Theory of Games and Economic Behavior, Princeton University Press, Princeton, NJ.

A vector space
It's easy to see that each vector space V must contain the n-vector 0 (in the above, take u = v and α = −1). An equivalent way in which to describe a
vector space is presented in
Proposition 16.3. A subset W of Rn is a vector space if and only if:
• W contains the vector 0, and
• W contains the vector [(1 − α)u + αv] for every pair u and v of vectors
in W and for every real number α.
An affine space
Proposition 16.3 motivates a modest generalization. A subset X of Rn is now called an affine space if:
• X contains the vector [(1 − α)u + αv] for every pair u and v of ele-
ments of X and for every real number α.
Affine spaces appear elsewhere in this book, though they are not labeled as such. The hyperplanes in Chapter 17 are affine spaces. The relative neighborhoods in Chapter 19 are defined in the context of an affine space.
Affine combinations
The sum of several vectors in a vector space is a vector that lies in that
space. More generally, a vector space is closed under linear combinations. An
affine space may not be closed under linear combinations. In Example 16.4,
the sum of the vectors (1, 0, 0) and (0, 0, 1) is not in X, for instance.
Affine spaces are closed under a related operation. Let the subset X of Rn be an affine space that contains the set {v1, v2, . . . , vk} of vectors. For each set {α1, α2, . . . , αk} of numbers such that

    α1 + α2 + · · · + αk = 1,

the vector

    α1 v1 + α2 v2 + · · · + αk vk

is called an affine combination of v1 through vk.
Proposition 16.4 states that affine spaces are closed under affine combi-
nations. This proposition can be proved by induction on k. The details are left
to you, the reader.
Linear independence
A nonempty set of vectors in Rn is linearly independent if the only way to obtain the vector 0 as a linear combination of these vectors is to multiply each of them by the scalar 0 and take the sum. A characterization of linearly independent sets that contain at least two vectors is presented in

Proposition 16.5. A set of two or more vectors in Rn is linearly independent if and only if none of these vectors is a linear combination of the others.

Proof of Proposition 16.5 is also left to you, the reader. Let us recall (from Chapter 10) that every set {v1, v2, . . . , vk} of vectors in Rn that is linearly independent must have k ≤ n.
Affine independence
A nonempty set of vectors in Rn is now said to be affinely independent if none of these vectors is an affine combination of the others.
Example 16.5. The vectors (1, 0) and (0, 1) and (1, 1) in R2 are affinely independent because the line that includes any two of these vectors excludes the third.
5. A Simplex
Let {v1, v2, . . . , vk} be an affinely independent set of vectors in Rn, and consider the set S of all convex combinations of these vectors, namely,

(8)    S = {v = Σ_{j=1}^k αj vj : α ≥ 0 and Σ_{j=1}^k αj = 1}.

The set S is called a simplex.
Extreme points

Each vector in the simplex S can be written in exactly one way as a convex combination of v1 through vk. In particular, no vector in {v1, v2, . . . , vk} is a convex combination of the others. In other words, each vector in {v1, v2, . . . , vk} is an extreme point of S.
With the simplex S defined by (8), the vector vj is called the jth vertex of
S, and the subset of S that has αj = 0 is called the jth facet of S. The jth vertex of
S is sometimes called vertex j, and the jth facet of S is sometimes called facet j.
Simplexes in 3-space
Example 16.6. Each subset S of R3 that is a simplex takes one of these forms:
• A point.
• A line segment.
• A triangle.
• A tetrahedron.
Each simplex in 3-space is a convex set that has not more than 4 extreme points. (Its extreme points must be affinely independent, and no set of 5 or more vectors in R3 can be affinely independent.) In particular, a pyramid is not a simplex because it has 5 extreme points, which cannot be affinely independent.
A tetrahedron has seven faces that are neither facets nor vertices. Can you
identify them?
The simplex method pivots from extreme point to extreme point. Must
the portion of the feasible region that lies close to an extreme point resemble
a simplex? No. Consider a feasible region C that has the shape of a large pyramid. The portion of C whose altitude is not more than one millimeter below the apex of the pyramid has 5 extreme points, and they cannot be affinely independent.
The apex of a pyramid is the intersection of four planes, rather than three,
for which reason its basic solution is degenerate. The pyramid connotes (cor-
rectly) that the neighborhood of a degenerate extreme point need not resem-
ble a simplex.
This definition is a bit technical, but an example will clear the air.
Subdividing a triangle
Let us consider a simplex U3 that is easily drawn on the plane. This simplex consists of each 3-vector whose entries are nonnegative numbers that sum to 1. Specifically,

    U3 = {x ∈ R3 : x ≥ 0 and x1 + x2 + x3 = 1}.

This simplex has three extreme points, which are (1, 0, 0) and (0, 1, 0) and (0, 0, 1). Figure 16.1 presents two different subdivisions of U3.
[Figure 16.1. Two subdivisions of U3.]
Repeated subdivision
Let us turn, briefly, from 3-space to n-space. For j = 1, 2, …, n, the symbol ej denotes the n-vector that has 1 in its jth position and has 0's in all other positions. The unit simplex Un in Rn is the set of all convex combinations of the n-vectors e1 through en, and it can be described succinctly as

    Un = {x ∈ Rn : x ≥ 0 and x1 + x2 + · · · + xn = 1}.
The case n = 3
Let us return to the case n = 3, in which case the unit simplex is the set of all convex combinations of the vectors e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1). Figure 16.2 displays the vertexes and facets of U3, along with
information that will help us to subdivide it.
[Figure 16.2. The unit simplex U3, with vertexes e1, e2 and e3 and facets 1, 2 and 3; line segments identify the points in U3 that have x1 = 1/5, x2 = 1/5 and x3 = 1/5.]
A subdivision
[Figure 16.3. A subdivision of U3 into 25 small simplexes by the line segments on which x1, x2 or x3 equals a multiple of 1/5.]
Each of the 25 small simplexes in Figure 16.4 has three vertexes, but many
of these vertexes are shared. The 25 small simplexes have a total of 21 vertexes
because 21 = 1 + 2 + · · · + 6.
Each of the 25 small simplexes in Figure 16.4 has three facets, but many of
these facets are shared. These small simplexes have a total of 45 facets. Exactly
15 of these facets lie on the boundary of U3 , and each of the other facets is
shared by exactly two of the small simplexes.
Labeling vertexes
Paths
The small simplexes in the corners of Figure 16.4 are missing one label apiece. We could focus on any one of the corners. Let's choose the small simplex in the lower right-hand corner. Its vertices omit the label 1. To identify a path to a completely-labeled simplex in Ω, we:
[Figure 16.4. A labeling of the 21 vertexes of the subdivision; each vertex bears the label 1, 2 or 3.]
• Call each of the 25 small simplexes a room. The mansion is the set of all rooms.
• Color a room green if its vertexes have labels 2 and 3, but not 1.
• Color a room blue if its vertexes bear all three labels.
• Create a door in each facet of each room whose two endpoints bear the labels 2 and 3.
• Note that there is only one door to the outside of the mansion.
• Begin outside the mansion, enter it through its only door to the outside and leave each green room by the door through which you did not enter. This must lead you to a blue room.
• If there is a second blue room, leave it by its only door and then leave
each green room by the door through which you did not enter. This
must lead to a third blue room. And so forth.
The same argument works if we start in any corner. The room at the top
of Figure 16.4 has vertexes that are labeled 1 and 3, but not 2. To start there,
paint each room green if its vertices have the labels 1 and 3, but not 2. Create
a door in each facet of each room that bears the labels 1 and 3. Then enter the
mansion as before and leave each green room by the door through which you
did not enter. This must lead to a blue room. For the labeling in Figure 16.4,
this walk leads to a different blue room (completely labeled simplex).
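The counting behind these walks can also be verified by brute force. The sketch below (Python, my own construction) enumerates the 25 small simplexes of the subdivision in Figure 16.4 – vertexes are lattice points (i, j, k) with i + j + k = 5 – applies a proper labeling, and counts the completely labeled rooms; Sperner's lemma, discussed later in this chapter, guarantees the count is odd:

    from itertools import product

    N = 5   # the grid of Figure 16.4: 25 small simplexes, 21 vertexes

    def label(v):
        # A proper labeling (labels 0, 1, 2 rather than 1, 2, 3): assign the
        # first index whose coordinate is positive. On facet j (x_j = 0) the
        # label j never appears, so the Border Condition is satisfied.
        return next(i for i in range(3) if v[i] > 0)

    def small_triangles(n):
        for i, j, k in product(range(n), repeat=3):
            if i + j + k == n - 1:   # "upward" triangles
                yield ((i + 1, j, k), (i, j + 1, k), (i, j, k + 1))
            if i + j + k == n - 2:   # "downward" triangles
                yield ((i + 1, j + 1, k), (i + 1, j, k + 1), (i, j + 1, k + 1))

    complete = [t for t in small_triangles(N)
                if {label(v) for v in t} == {0, 1, 2}]
    print(len(complete))   # an odd number, so at least one blue room exists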
A complication
This general situation is more subtle than the example in Figure 16.4 suggests. A grid has been used to subdivide the unit simplex in R3 into smaller simplexes. What about the unit simplex in R4? A grid can be used to partition the unit simplex in R4 into smaller simplexes, but the partitioning is not unique. The difficulties are compounded in higher dimensions. To partition the simplex in Rn, we would need to delve into algebraic topology. That can be avoided, and we shall avoid it.
Figure 16.5 portrays a tetrahedron, one of whose edges is drawn as a thick line segment. Figure 16.5 also sets the stage for subdividing this tetrahedron by bisection. The mid-points of its edges are labeled a through f.

[Figures 16.5 and 16.6. A tetrahedron whose vertexes are labeled 1 through 4 and whose edge mid-points are labeled a through f.]
Primitive sets
An illustration
The definition of a primitive set is wordy, but a picture will make everything clear. Figure 16.7 represents the unit simplex U3 as a triangle. In this figure, six distinguished points are represented as black dots. There may be other distinguished points, but the others appear in the unshaded parts of U3, and their dots are omitted from Figure 16.7. Each of the shaded triangles in Figure 16.7 has these properties:
• Exactly one distinguished point lies inside each facet of each shaded
triangle that does not lie in a facet of the unit simplex.
[Figure 16.7. The unit simplex U3 with six distinguished points (black dots) and three shaded primitive sets; the distinguished points lie on the lines x1 = 0.2, x1 = 0.8, x2 = 0.06, x2 = 0.24, x3 = 0.4 and x3 = 0.7.]
A proper labeling
• If a facet of a primitive set lies within the kth facet of the unit simplex
Un , that facet receives the label k.
A pivot scheme
A primitive set is said to be completely labeled if its facets bear the labels
1 through n, equivalently, if no two of its facets have the same label. A familiar
argument will demonstrate that each proper labeling has a completely labeled
primitive set. This argument will identify a path from a corner of Un to a
completely labeled primitive set. Each pivot will shift from one primitive set
to another. The facets of each primitive set that is encountered prior to termi-
nation will bear all but one of the labels 1 through n. The same label will be
missing from each primitive set that is encountered prior to termination. The
pivot scheme will terminate when it encounters a completely-labeled primi-
tive set, i.e., a primitive set whose facets bear all n labels.
Initialization
This pivot scheme is now illustrated for the case in which nâ•›=â•›3. Each
primitive set T that it encounters is specified by (10) with specific values of
a1 , a2 , and a3 . The pivot scheme will be initialized at the shaded triangle in the
lower right-hand corner of Figure 16.7. To accomplish this:
• Begin with a2 = 0 and a3 = 0.
• Find the distinguished point x having the largest value of x1, and equate a1 to its value of x1.
• Call the facet on which x1 = a1 the entering facet.
For the data in Figure 16.7, this initialization step sets a1 = 0.8, and it
produces the shaded primitive set T in the lower right-hand corner of the fig-
ure. If the entering facet had 1 as its label, we would have encountered a primi-
tive set that has all three labels, and the algorithm would terminate. To see
how the pivot scheme proceeds, we suppose that the entering facet does not
have 1 as its label. For specificity, we suppose its label equals 2, rather than 3.
A pivot
Each pivot removes one facet of the current primitive set and will prepare for another facet to enter, and another pivot to occur. How the first pivot occurs will be illustrated in the context of Figure 16.8.
[Figure 16.8. The first pivot; the facet containing the point (0.8, 0.12, 0.08) shifts its orientation from x1 = 0.8 to x2 = 0.12.]
The orientation of the facet containing the point xâ•›=â•›(0.8, 0.12, 0.08) has
just shifted from the facet having x1â•›=â•›0.8 to the facet having x2â•›=â•›0.12.
Two facets of the primitive set that results from the first pivot have now
been identified. These facets lie in the intervals x2 = a2 and x3 = a3 with
a2 = 0.12 and a3 = 0. A facet on which x1 equals some constant has yet to be
specified. This is accomplished in a familiar way:
• Among those points x having x2 > a2 and x3 > a3, find the distinguished point having the largest value of x1, and equate a1 to its value of x1.
• Denote as T the resulting primitive set, and label the facet of T on which x1 = a1 as the new entering facet.
This pivot results in the shaded triangle in the upper left portion of Fig-
ure 16.8. The entering facet bears the label 3. The other facet that bears the
label 3 has x3â•›=â•›0, and it will leave on the 2nd pivot.
A later pivot
• The leaving facet is in the interval on which x2 = 0.24, and its departure
causes a2 to increase from 0.24 to the smallest value c for which the in-
terval x2 = c includes a distinguished point in a facet of T. For the data
in Figure 16.9, a2 increases to 0.36.
• The facet that includes this distinguished point has shifted its orienta-
tion from the interval in which x3 = 0.4 to the interval in which x2 =
0.36.
• Since the facet on which x3 = 0.4 has departed, we search among the
distinguished points x having x1 > a1 (currently, a1 = 0.2) and x2 > a2
(currently, a2 = 0.36) for the one having the largest value of x3 . For the
data in Figure 16.9, this distinguished point has x3 = 0.25. The entering
facet lies in the interval on which x3 = 0.25. This facet bears the label 3,
so the other facet having the label 3 will leave on the next pivot.
[Figure 16.9. A later pivot: the facet on which x2 = 0.24 departs, a2 increases to 0.36, and the entering facet lies on x3 = 0.25.]
General discussion
Our attention turns to the unit simplex Un in Rn. Pick any set of dis-
tinguished points in Un that satisfies the Nondegeneracy Assumption. Label
each distinguished point with an integer between 1 and n, inclusive, and con-
sider the proper labeling of the facets of the primitive sets that is determined
by these labels.
The label of the facet that includes x1 duplicates the label of some other
facet in the initial primitive set. That facet will leave on the first pivot. The
facet that will enter on each pivot is found by the rule that has just been il-
lustrated. If the label of the entering facet equals 1, pivoting stops. If not, this primitive set has no facet whose label equals 1, the label of the entering facet duplicates one other label, and the facet bearing that duplicated label departs on the next pivot.
Termination
Let us ask ourselves the rhetorical question, “What can happen when this
pivot scheme is executed?”
Proposition 16.6. Each proper labeling includes at least one completely labeled primitive set.

Proof. Let us call each primitive set a room. Let us color a room (primitive set) blue if its facets contain all n labels. Let us color a room green if its facets contain the labels 2 through n but not 1.
Begin with the room whose facets are contained in facets 2 through n of Un. If that room is blue, there is nothing to prove. Suppose it is green. It is the only green room that intersects the boundary of Un. Call two colored rooms
adjacent if it is possible to shift from one to the other with a single pivot. The
green room at which pivoting begins is the only green room that is adjacent
to one other room. Every green room other than it is adjacent to exactly two
other rooms. Pivoting cannot revisit a room because the first room revisited
would need to be adjacent to three others, and none are. There are finitely
many rooms, so pivoting must stop. It must stop by encountering a blue room,
namely, a room whose facets bear the labels 1 through n. That room (primi-
tive set) is completely labeled. ■
Let f be a continuous map of the unit simplex Un into itself. Each vector x in Un has x ≥ 0 and has x1 + x2 + · · · + xn = 1. The fact that f(x) is in Un guarantees f(x) ≥ 0 and f(x)1 + f(x)2 + · · · + f(x)n = 1. Of necessity, the inequality

(11)    f(x)j ≥ xj

is satisfied for at least one value of j. Let us assign to each distinguished point wk the label L(wk) whose value is an integer j for which (11) holds.
Each facet of each primitive set is now assigned a monotone label by this
rule:
These labels are "monotone" because each facet of each primitive set is labeled with an integer j such that the inequality f(x)j ≥ xj is satisfied by at least one vector x in that facet. These labels are proper because they satisfy the Border Condition.
Proposition 16.7. Let f be a continuous function that maps the unit simplex Un in Rn into itself. Then there exists at least one vector x in Un such that f(x) = x.
Remark:╇ The proof of this proposition uses material from real analysis
and is starred.
At the heart of this proof of Proposition 16.7 lies the scheme in the prior section for finding a completely-labeled primitive set. This method offers the means to approximate a fixed point of f.
An issue
A simplex
Proposition 16.7 is stated in terms of a function that maps the unit sim-
plex Un into itself. What about a function that maps some other simplex into
itself?
Proposition 16.8. Let f be a continuous function that maps a simplex S into itself. Then there exists at least one vector x in S such that f(x) = x.

Proof. This simplex S has some number n of vertexes. Label these ver-
texes v1 through vn . Because S is a simplex, each vector in S can be written
in a unique way as a convex combination of the vectors v1 through vn . Write
each x ∈ S as the convex combination
x = x1 v1 + x2 v2 + · · · + xn vn .
With S expressed in this way, the prior discussion of primitive sets applies, as does the proof of Proposition 16.7. ■
Sperner’s lemma
Primitive sets
In 1965, Herbert Scarf introduced primitive sets and indicated how one
could start in a corner and follow a path to a completely-labeled primitive set.
In his 1973 monograph, written in collaboration with Terje Hansen, Scarf4 acknowledged his debt to Lemke and Howson. It seems amazing, even now, that a method devised to find a complementary solution to a linear system would adapt so naturally to the distinctly nonlinear problem of approximating a Brouwer fixed point.
4. Scarf, H. E., with T. Hansen (1973), The Computation of Economic Equilibria, Yale University Press, New Haven, CT.
5. Eaves, B. C. (1972), "Homotopies for computation of fixed points," Mathematical Programming, v. 3, pp. 1-22.
6. Merrill, O. H. (1972), Applications and extensions of an algorithm that computes fixed points of certain upper semi-continuous point to set mappings, Ph.D. thesis, University of Michigan, Ann Arbor, MI.

Impact
10. Review
In brief, prior to the work of Lemke and Scarf, the connection between
economic equilibria and fixed points had been theoretical. Equilibria could
be shown to exist, but no method for computing them existed. Economists
needed to rely on arguments of the sort that Brouwer had spurned. That is no
longer the case. Equilibria can now be computed and studied.
(a) Show that the vectors (v2 − v1), (v3 − v1), . . . , (vk − v1) are linearly independent.

7. van der Laan, G., and A. J. J. Talman (1982), "On the computation of fixed points in …
5. Let the subset C of 3 be a pyramid. How many vertices does it have? De-
scribe its vertices. Which of these vertices is a linear combination of the
others? Which of its vertices is not a convex combination of the others?
11. In Figure 16.4, create a different system of doors – one in each facet of each small simplex that omits only the label 2. Does the path-from-the-outside
argument continue to work? Does it lead to a different blue room?
12. In the context of Figure 16.4, suppose you start outside the mansion and
follow a path to a blue room. Devise a scheme that might lead from that
blue room to a different blue room.
(b) Bisect each edge. Identify any one of the vertices of the original tet-
rahedron. Use solid lines to connect the points that bisect each edge
that touches this vertex. Repeat for the other three vertices.
(c) Describe the object you constructed in solid lines. How many vertices
does it have? How many edges?
(d) Pick a pair of its vertices that are not connected by an edge. Connect
them. Did you just execute a subdivision of a tetrahedron? If so, how
many smaller subdivisions did you obtain?
14. In the context of Figure 16.9, describe the next pivot. (You may wish to
postulate the location of one or more points.)
15. (A 3-player analogue of the bi-matrix game) Suppose player 1 has m op-
tions, that player 2 has n options, and that player 3 has p options. Suppose
that if players 1, 2 and 3 choose options i, j and k, they lose Aijk , Bijk and
Cijk , respectively. Each player knows the data, each player selects a ran-
domized strategy, and each player aims to minimize the expectation of his
loss.
(c) Does the proof of Proposition 16.2 adapt to this game? If so, how?
Part VI – Nonlinear Optimization
This chapter begins with concepts that are fundamental to analysis and
to constrained optimization – the dot product of two vectors, the norm of a
vector, the angle between two vectors, neighborhoods, open sets, closed sets,
convex sets, and continuous functions. Two of the key results in this chapter
are the “extreme value theorem” and the “supporting hyperplane theorem.”
In this chapter, convex functions are defined, and ways in which to rec-
ognize a convex function are described. A key result in this chapter is that a
convex function has a supporting hyperplane at each point on the interior of
its domain.
1. Preview
This chapter is focused on the properties of convex sets that are particu-
larly relevant to nonlinear programming. Presented here are:
• Basic information about the dot product, the norm of a vector, the
angle between two vectors, neighborhoods, open and closed sets, and
limit points.
2. Preliminaries
The chapter begins with topics that may be familiar to you. These include
the dot product of two vectors, the norm of a vector, the angle between two
vectors, open and closed sets, neighborhoods, and continuous functions.
The dot product x · y of two n-vectors x and y is the number x · y = x1 y1 + x2 y2 + · · · + xn yn. There is nothing new about the dot product: when A is an m × n matrix and x is an n × 1 vector, the ith element in the matrix product A x equals the dot product of the ith row of A and x.
The norm
For each vector x in n , the norm of x is denoted ||x|| and is defined by
√
(2) ||x|| = x · x = (x1 )2 + (x2 )2 + · · · + (xn )2 .
The norm of x can be interpreted as the length of the line segment be-
tween the vectors 0 and x. This definition harks back to the time of Euclid.
When we speak of the angle between the n-vectors x and y, what is meant is
the angle θ between their respective line segments, as is illustrated in Figure 17.1.
[Figure 17.1. The vectors x and y, the angle θ between them, and the perpendicular sides tx and y − tx.]
Equation (4), below, shows that cos θ is determined by the dot product
of x and y and their norms. As Figure 17.1 suggests, this result is established
by selecting the value of t for which the vectors tx and yâ•›–â•›tx are perpen-
dicular.
Proposition 17.1. Let x and y be non-zero n-vectors. For the scalar t given by

(3)    t = (x · y) / (x · x),

the vectors tx and y − tx are the sides of a right triangle, and the angle θ between the vectors x and y has

(4)    cos θ = (x · y) / (||x|| ||y||).
Proof. Equation (3) can be rewritten as

(5)    x · y = t x · x.

The identity y = tx + (y − tx) makes it clear that the three vectors tx and (y − tx) and y are the sides of a triangle. The Pythagorean theorem will verify that this is a right triangle whose hypotenuse is y. To employ it, we take the sum of the squares of the lengths of the vectors tx and (y − tx) and use (5) repeatedly in

    (tx) · (tx) + (y − tx) · (y − tx) = t²(x · x) + (y − tx) · y
                                     = t(x · y) + y · y − t(x · y) = y · y,

which is the square of the length of y. Thus tx and y − tx are the sides of a right triangle whose hypotenuse is y, and the angle θ has cos θ = t ||x|| / ||y|| = (x · y)/(||x|| ||y||), establishing (4). ■
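Equations (3) and (4) translate directly into code. A minimal check in Python with numpy (my tooling, not the book's):

    import numpy as np

    x = np.array([1.0, 1.0, 0.0])
    y = np.array([0.0, 1.0, 1.0])
    t = (x @ y) / (x @ x)                        # equation (3)
    print((t * x) @ (y - t * x))                 # 0: the sides are perpendicular
    cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))   # equation (4)
    print(np.degrees(np.arccos(cos_theta)))      # 60 degrees for these vectors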
The sign of the dot product x · y motivates much of the algebra in this
chapter.
Neighborhoods

For each n-vector x and each positive number ε, denote

    Bε(x) = {y ∈ Rn : ||y − x|| < ε}.

For each positive number ε, the set Bε(x) is called a neighborhood of x. When n equals 3, the set Bε(x) is a "ball" that consists of those vectors y whose distance from x is below ε.
Open sets
Closed sets
Continuous functions
Bounded sets
The symbol ej is reserved for the n-vector whose jth entry equals 1 and whose other entries equal 0, so that

    (ej)k = 1 if k = j,        (ej)k = 0 if k ≠ j.

Throughout this chapter, the symbol n is reserved for the number of entries in each n-vector.
The results in this section make use of the material that has just been discussed. The first of these results is Proposition 17.2, which guarantees that each bounded sequence (v1, v2, . . .) of n-vectors has a convergent subsequence.
Remark: This result has several proofs, none of which is truly brief. The theme of the proof offered here is to construct a nested sequence (T1, T2, . . .) of subsets of Rn, each of which is a closed "cube," each of which contains infinitely many members of the sequence (v1, v2, . . .), and each of which has half the "width" of its predecessor.
Proof. The sequence (v1, v2, . . .) is bounded, so it lies within some closed cube T1. Express T1 as the union of 2^n cubes, each having half of its width; at least one of them contains infinitely many members of the sequence, and we label that sub-cube T2. Express T2 as the union of 2^n cubes each having half of its width, note that one of them must contain infinitely many members of (v1, v2, . . .), label that cube T3, and repeat.
Each sub-cube is closed, and the intersection of any number of closed sets is closed. Being nested, the closed sets T1, T2, … have a nonempty intersection. Because the width of Ti approaches 0, there exists exactly one n-vector v such that {v} = ∩_{i=1}^∞ Ti.
Before proving the theorem, we pause to indicate what can go wrong when its
hypothesis is not satisfied.
Example 17.1 (why S must be closed). The function f(x) = x on the open set S = {x : 0 < x < 1} attains neither its minimum nor its maximum.
Proof. Proposition 17.2 and the continuity of f guarantee that the quantity z∗ = inf{f(x) : x ∈ S} is finite. Proposition 17.2 also guarantees that there exists a convergent sequence {v1, v2, . . .} of n-vectors in S for which f(vm) → z∗. Since S is closed, there exists an element v of S such that {v1, v2, . . .} converges to v. Because f is continuous, f(v) = z∗. ■
The extreme value theorem will soon be used to generalize the theorem
of the alternative that was presented in Chapter 12. This generalization con-
cerns convex cones and their “duals.”
Convex cones
A subset C of Rn is called a convex cone if C is nonempty and if:
• C contains the vector (x + y) for each pair x and y of vectors in C.
• C contains the vector λx for each vector x in C and each nonnegative number λ.
Examples
The subsets of R2 that are closed convex cones take one of six forms, five of which are illustrated in Figure 17.2. These five are: (1) the origin, (2) a half-line through the origin, (3) a wedge-shaped region that includes the origin, (4) a line through the origin, and (5) a half-space that includes the origin. Not represented in Figure 17.2 is R2 itself, which is a closed convex cone.
Figure 17.2. Five subsets of the plane that are closed convex cones.
Polyhedral cones
Let A be an m × n matrix, and consider the set C of m × 1 vectors given by

    C = {Ax : x is an n × 1 vector with x ≥ 0}.

This set C is called a polyhedral cone; it is the set of nonnegative linear combinations of the columns of A.
Non-polyhedral cones
The set

    C = {x ∈ R3 : (x1)² + (x2)² ≤ (x3)², x3 ≥ 0}

is a closed convex cone, but C is not the set of nonnegative linear combinations of finitely many vectors. (The subset of C in which x3 ≤ 6 has the shape of an ice cream cone.)
The polar cone of C is the set C∗ that is given by

(10)    C∗ = {y ∈ Rn : [c ∈ C] ⇒ [y · c ≤ 0]}.
A duality theorem?
Each vector in C makes an obtuse or right angle with each vector in C∗. This connotes that if we begin with a closed convex cone C, take its polar cone C∗, and then take its polar cone (C∗)∗, we get C. That is correct! That this is so will soon be established. It is a
corollary of the result that comes next.
A generalization
A geometric perspective
Figure 17.4. Vectors b ∉ C and y ∈ C∗ have b · y > 0.
One can interpret

(11)    f(c) = ||b − c||²

as the "squared distance between b and c." The function f(c) defined by (11) is continuous. Aiming to use the extreme value theorem, pick any element c̄ of C and define the set T by

    T = {c ∈ C : f(c) ≤ f(c̄)}.
(14) y = b − ĉ.
In inequality (16), cancel the terms y · y, then divide by α, and then let α decrease to zero to obtain 0 ≤ −2c · y, equivalently, 0 ≥ c · y = y · c. This inequality holds for every c ∈ C, which shows that y ∈ C∗, completing a proof. ■
The proof of Proposition 17.4 is lengthy, but it has only two themes. One
theme is to use the extreme value theorem to identify a vector in C that is
closest to b. The other is a “calculus trick,” namely, to let α approach 0 in a way
that gets rid of the quadratic term.
Farkas
Proposition 17.4 may remind you of a result from Chapter 12. That result
appears here as
Proposition 17.5 (Farkas). Consider any m × n matrix A and any m × 1 vector b. Exactly one of the following alternatives occurs:
(a) There exists an n × 1 vector x such that Ax = b and x ≥ 0.
(b) There exists a 1 × m vector y such that yA ≤ 0 and y b > 0.

Proof. The set C = {Ax : x ≥ 0} is a closed convex cone whose polar cone is

    C∗ = {y ∈ R1×m : yA ≤ 0}.

Thus, Proposition 17.5 is immediate from Proposition 17.4. ■
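Proposition 17.5 is constructive enough to demonstrate numerically. The sketch below (Python with scipy, my own tooling) decides which alternative holds by solving two small linear programs; the cap y · b ≤ 1 merely keeps the second program bounded:

    import numpy as np
    from scipy.optimize import linprog

    def farkas(A, b):
        """Return ('x', x) with Ax = b, x >= 0, or ('y', y) with yA <= 0, yb > 0."""
        m, n = A.shape
        res = linprog(np.zeros(n), A_eq=A, b_eq=b,
                      bounds=[(0, None)] * n, method="highs")
        if res.status == 0:                      # alternative (a) holds
            return "x", res.x
        res = linprog(-b,                        # maximize y.b subject to ...
                      A_ub=np.vstack([A.T, b.reshape(1, -1)]),
                      b_ub=np.concatenate([np.zeros(n), [1.0]]),
                      bounds=[(None, None)] * m, method="highs")
        return "y", res.x                        # alternative (b) holds

    A = np.array([[1.0, 1.0], [1.0, 1.0]])
    print(farkas(A, np.array([1.0, 2.0])))   # infeasible: a certificate y
    print(farkas(A, np.array([2.0, 2.0])))   # feasible: a solution x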
A pattern of inference
    simplex method  ⇒  LP duality  ⇔  Farkas
                                        ⇑
    extreme value theorem  ⇒  Farkas for polar cones
Evidently, starting with the extreme value theorem leads to deeper results
than does starting with the simplex method.
    c · y ≤ 0        ∀ y ∈ C∗.
    H = {x ∈ Rn : a · x = β},
    H+ = {x ∈ Rn : a · x > β},
    H− = {x ∈ Rn : a · x < β}.
It is clear that these three sets are disjoint, that each of them is convex, and that their union equals Rn. In addition, the set H is closed, and the sets H+ and H−
are open. The set H is called a hyperplane, and the sets H+ and H− are called
open halfspaces.
Illustration
[Figure omitted: a hyperplane H in the plane, drawn through the point (2, 3).]
Separation
[Figure 17.7. A hyperplane H that separates the point b from the closed convex set S; ŝ is the point of S closest to b.]
(20) a = ŝ − b.
Consider any s ∈ S. Being convex, the set S contains [(1 − α)ŝ + αs] = [ŝ + α(s − ŝ)] for each α between 0 and 1. Applying (19) with s replaced by [ŝ + α(s − ŝ)] and then letting α decrease to zero yields

(21)    a · ŝ ≤ a · s        ∀ s ∈ S.

Take β = (a · b + a · ŝ)/2, and observe from (21) and (22) that (18) holds. ■
Remark: You, the reader, are encouraged to draw the analog of Figure 17.7 that describes the supporting hyperplane.
8. Review
This chapter is far from encyclopedic. One important topic that this chap-
ter omits is the “implicit function theorem.” It would be required for a more
ambitious foray into nonlinear optimization than is found in Chapter 20.
1. Find the angle between the 3-vectors (1, 2, 3) and (2, –5, 1).
S* = {y ∈ ℝ^n : y · s ≤ 0 ∀ s ∈ S}
6. Let S and T be convex sets of n-vectors. Is the set U = {(s, t) : s ∈ S, t ∈ T} convex? Support your answer.
9. (Farkas implies LP Duality). The data in the problem are the (familiar) m × n matrix A, the m × 1 vector b and the 1 × n vector c. Suppose that there do not exist an n × 1 vector x and a 1 × m vector y that satisfy
u: Ax ≤ b,
v: −yA ≤ −c,
θ: −cx + yb ≤ 0,
x ≥ 0, y ≥ 0.
(a) Show that there must exist a 1 × m vector u and an n × 1 vector v that satisfy
(b) Show that there cannot exist an n × 1 vector x and a 1 × m vector y such that
(c) Use Farkas and weak duality to prove this theorem of the alternative:
Either a linear program and its dual have the same optimal value or at
least one of them is infeasible. Hint: This is immediate from part (b).
Chapter 18: Differentiation
1. Preview
Differentiation abounds with traps for the unwary. Many things that
seem to be true turn out to be false. This chapter is sprinkled with examples
that identify the pitfalls.
This chapter draws upon Chapter 17. Before tackling this chapter, you
should be familiar with the norm, the dot product, the angle between two
vectors, neighborhoods, and open sets.
(1) y = lim_{ε→0} (f(x + ε) − f(x))/ε,

(2) f(x + ε) − f(x) = [(f(x + ε) − f(x))/ε] · ε → f′(x) · 0 = 0.
A discontinuous derivative
Let us examine how f′(x) behaves as x approaches 0. Recall that the function sin(x) of x oscillates with a period of 2π, specifically, that every real number x has sin(x + 2π) = sin(x). Recall also that sin(x) takes values between +1 and −1.
f′(x) = { 0 for x = 0;  2x sin(1/x) − cos(1/x) for x ≠ 0 },
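A few evaluations in Python (a sketch, with the derivative typed in directly) display the trouble: the term 2x sin(1/x) dies out as x approaches 0, but the term −cos(1/x) keeps swinging between −1 and +1, so f′ has no limit at 0.

    import numpy as np

    def fprime(x):
        # the derivative displayed above, valid for x != 0
        return 2 * x * np.sin(1 / x) - np.cos(1 / x)

    for x in [1e-1, 1e-2, 1e-3, 1e-4]:
        print(x, fprime(x))    # values keep oscillating through [-1, 1]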
Rolle’s theorem
Proposition 18.1 (Rolle's theorem). Let the function f(x) of the variable x be continuous on the interval a ≤ x ≤ b, with f(a) = f(b) = 0, and suppose that f(x) is differentiable on the interval a < x < b. Then there exists at least one number y that satisfies a < y < b and f′(y) = 0.
Proof.╇ Let’s first suppose that this function f has f(w)â•›<â•›0 for at least one value
of w that satisfies aâ•›<â•›wâ•›<â•›b. The set Sâ•›=â•›{x: a ≤ x ≤ b} is closed and bounded,
and f is continuous on S, so the Extreme Value theorem (Proposition 17.2)
guarantees that there exists an element y of S that minimizes f over this inter-
val, i.e.,
Moreover, since f(w) < 0 and f(a) = f(b) = 0, it must be that y lies strictly between a and b. Taking ε positive but close to zero gives

(f(y + ε) − f(y))/ε ≥ 0,
and taking ε negative but close to zero gives

(f(y + ε) − f(y))/ε ≤ 0.

Letting ε approach zero in these two quotients shows that f′(y) ≥ 0 and f′(y) ≤ 0, hence that f′(y) = 0.
If f(w) > 0 for some number w between a and b, applying the prior argument to the function −f establishes the desired result. Finally, if f(w) = 0 for every w between a and b, the function f has f′(y) = 0 for each y satisfying a < y < b. ■
Proposition 18.1 is known as Rolle’s theorem. It and its proof are exqui-
sitely simple. To appreciate them, you need only recall Example 18.1.
Proposition 18.2 (the Mean Value theorem). Let the function f(x) of x be continuous for a ≤ x ≤ b, and let f(x) be differentiable on the interval a < x < b. Then there exists at least one number y that satisfies a < y < b and

(5) f′(y) = (f(b) − f(a))/(b − a).
Proof. Define the function g on the interval a ≤ x ≤ b by

g(x) = f(x) − f(a) − [(x − a)/(b − a)] · [f(b) − f(a)].
Since g(a) = 0 and g(b) = 0, Rolle's theorem applies to g, and g′(y) = 0 gives f′(y) = [f(b) − f(a)]/[b − a], as desired. ■
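The Mean Value theorem is also easy to exercise numerically. The Python sketch below uses an arbitrarily chosen function, f(x) = e^x on [0, 1], and locates a number y at which f′(y) matches the slope of the chord; here y = log(e − 1).

    import numpy as np
    from scipy.optimize import brentq

    f = np.exp              # this f and its derivative happen to coincide
    fprime = np.exp
    a, b = 0.0, 1.0
    slope = (f(b) - f(a)) / (b - a)

    # fprime(t) - slope changes sign on (a, b), so a root-finder locates y.
    y = brentq(lambda t: fprime(t) - slope, a, b)
    print(y, np.log(np.e - 1.0))    # both ~ 0.5413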
(6) lim_{ε→0} (f(x + ε) − [f(x) + y · ε])/|ε| = 0.
An interpretation
g(x + ε) = f(x) + y · ε.

Expression (6) states that for small values of ε, the difference between f(x + ε) and g(x + ε) is small even when divided by |ε|. It is emphasized:
(7) lim_{||d||→0} (f(x + d) − [f(x) + y · d])/||d|| = 0.
In (7), the role of ε is played by the vector d, and the value that the function assigns to (x + d) is compared with the hyperplane whose slopes form the vector y. Evidently, to be differentiable is to be well-approximated by a hyperplane.
The limit
The limit in (7) must hold no matter how the norm of the vector d ap-
proaches zero. It is emphasized:
It is easy to check that (6) and (7) coincide when n = 1. Later in this chapter, we will interpret yi as the “partial derivative” of f(x) with respect to the variable xi.
To deepen our understanding of the gradient, let us specialize (7) to the case in which the
vector d approaches 0 in a particular “direction.” To do so, we replace d by εd
where d is a fixed n-vector and ε is a number that approaches 0.
Suppose that f is differentiable at x, and set y = ∇f(x). Replacing d by εd in (7) yields

(f(x + εd) − f(x) − εy · d)/||εd|| → 0 as ε → 0,

(f(x + εd) − f(x) − εy · d)/(ε||d||) → 0 as ε → 0,

(1/||d||) · [(f(x + εd) − f(x))/ε − y · d] → 0 as ε → 0,
It remains to demonstrate that only ∇f(x) satisfies (8). Pick any i between 1 and n. Let ei be the n-vector with 1 in its ith position and 0's elsewhere. Take d = ei, and note from (8) that yi must equal ∇f(x)i. ■
A non-zero gradient
If the vector ∇f (x) is not zero, it determines both rate of change of f and
the direction of increase of the function f.
(9) lim_{ε→0} (f(x + εd) − f(x))/ε ≤ ||∇f(x)||,
Proof. Since ∇f(x) and d are nonzero n-vectors, the angle θ between them was shown in Chapter 17 to satisfy

(10) cos(θ) = (∇f(x) · d)/(||∇f(x)|| ||d||).
By hypothesis, ||d|| = 1. Proposition 18.3 shows that the limit on the left-hand side of (9) equals ∇f(x) · d. Substituting gives

(11) lim_{ε→0} (f(x + εd) − f(x))/ε = ||∇f(x)|| cos(θ).
Since cos (θ ) ≤ 1, inequality (9) has been verified. The cosine of θ equals 1
if and only if the angle between ∇f (x) and d equals 0, and that occurs if and
only if d = ∇f (x)/||∇f (x)||, which completes a proof. ■
If the gradient of a function is not zero, it points uphill (in the direction
of increase) of that function.
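This is easy to witness numerically. In the Python sketch below (the function is a made-up example), difference quotients approximate the limit in (9) for random unit directions; none beats ||∇f(x)||, and the direction ∇f(x)/||∇f(x)|| attains it.

    import numpy as np

    def f(x):
        return x[0]**2 + 3 * x[1]**2      # hypothetical smooth function

    def dir_deriv(x, d, eps=1e-6):
        # one-sided difference quotient approximating the limit in (9)
        return (f(x + eps * d) - f(x)) / eps

    x = np.array([1.0, 1.0])
    grad = np.array([2.0, 6.0])           # the gradient of f at x
    bound = np.linalg.norm(grad)          # ~ 6.3246

    rng = np.random.default_rng(1)
    for _ in range(4):
        d = rng.normal(size=2)
        d /= np.linalg.norm(d)            # random unit direction
        assert dir_deriv(x, d) <= bound + 1e-3
    print(dir_deriv(x, grad / bound))     # ~ bound: the steepest ascent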
For a particular function f, the limit on the RHS of (8) may or may not exist. When the limit in

lim_{ε→0} (f(x + εd) − f(x))/ε
exists and is finite, we call this limit the bidirectional derivative in the direc-
tion d. When the limit in
lim_{ε↓0} (f(x + εd) − f(x))/ε
exists and is finite, we call this limit the unidirectional derivative, in the
direction d. This terminology is not universally agreed upon. Let it be noted
that:
(12) f(u, v) = 2uv³/(u² + v⁶),
for all other pairs (u, v). Let us consider the behavior of this function in a neighborhood of (0, 0). For each number v ≠ 0, we have f(v³, v) = 1, so f is not continuous at (0, 0) and, for that reason, cannot be differentiable at (0, 0). On the other hand, an easy calculation verifies that this function has a bidirectional derivative in each direction d at (0, 0), and these bidirectional derivatives equal 0. In other words, (8) holds with y = 0.
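A short computation in Python corroborates both halves of this example: difference quotients along every fixed direction shrink to 0, yet along the curve u = v³ the function is identically 1.

    import numpy as np

    def f(u, v):
        return 0.0 if u == v == 0 else 2 * u * v**3 / (u**2 + v**6)

    # bidirectional difference quotients at (0, 0) tend to 0 ...
    for d in [(1, 0), (0, 1), (1, 1), (1, -2)]:
        quotients = [f(eps * d[0], eps * d[1]) / eps
                     for eps in (1e-2, 1e-4, 1e-6)]
        print(d, quotients)               # each sequence shrinks to 0

    # ... but f equals 1 on the curve u = v**3, arbitrarily near (0, 0)
    print([f(v**3, v) for v in (0.1, 0.01, 0.001)])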
As was the case in Chapter 17, ei denotes the n-vector that has 1 in its ith position and has 0's in all other positions. The real-valued function f of n variables is now said to have yi as its ith partial derivative at the point x in ℝ^n if f is defined in a neighborhood of x and if there exists a finite number yi such that

(13) yi = lim_{ε→0} (f(x + εei) − f(x))/ε.
It is emphasized:
The notation that is used to describe partial derivatives varies with the
context. If f is thought of as a function of the n-vector x and if the ith partial
∂f/∂xi and (∂f/∂xi)(x),

∂f/∂v and (∂f/∂v)(u, v, w).
Consider a function that has partial derivatives. Must this function have
bidirectional derivatives? No.
Example 18.4. Let the function f of two variables have f(0, 0) = 0 and

f(x1, x2) = 2x1x2/(x1² + x2²)
Proposition 18.5. Let f map an open subset S of ℝ^n into ℝ. The following are equivalent:
The identity below parses f(x + d) − f(x) into the sum of two terms, with only the 1st element of x varying in the 1st term and only the 2nd element of x varying in the 2nd term. The partial derivatives exist within Br(x), and the Mean Value theorem (Proposition 18.2) shows that there exist numbers α1 and α2 that lie strictly between 0 and 1 for which

(14) f(x + d) − f(x) = d1 (∂f/∂x1)(x1 + α1d1, x2) + d2 (∂f/∂x2)(x1 + d1, x2 + α2d2).
Let ε be any positive number. The continuity of the partial derivatives on S guarantees that there exists a positive number δ that is a function of ε such that, for i = 1, 2,
(15) |(∂f/∂xi)(z) − (∂f/∂xi)(x)| < ε/2 ∀ z such that ||z − x|| < δ.
Divide the above by ||d|| and then let ε → 0 to see that f is differentiable at x.
This shows that f is differentiable on S. That the derivative is continuous on S
is immediate from (16) and (17). ■
The key to Proposition 18.5 is to vary the coordinates one at a time and
use the mean value theorem once per coordinate. Rolle to the rescue!
8. Review
3. (polar coordinates, continued) Consider the function g(r, θ) = r sin(3θ). Does this function have partial derivatives at (0, 0)? Does it have bidirectional derivatives at (0, 0)? Is it differentiable at (0, 0)? Support your answer.
5. With the function f that is defined by (3), let g(x) = (1/2) f(x) +
(1/4)f(x – 1) + (1/8) f(x – 1/2) + (1/16) f(x – 1/3) + (1/32) f(x – 2/3).
c) Suppose (this is true) that the rational numbers can be placed in one-
to-one correspondence with the positive integers. Does there exist a
differentiable function h(x) whose derivative is discontinuous at every
rational number? Support your answer.
6. Consider the function f of two variables that has f(0, 0) = 0 and, for all other 2-vectors, has
f(u, v) = 2uv²/(u² + v⁴).
Is this function continuous at 0? Does it have bidirectional derivatives at
0? Is it differentiable at 0? Support your answer.
7. For what directions d does the function f given in Example 18.4 have bidirectional derivatives at x = (0, 0)? Support your answer.
Chapter 19: Convex Functions

1. Preview
This chapter is sprinkled with examples of the pathologies that the analy-
sis of convex functions must skirt.
2. Introduction
Convex functions are closely related to convex sets. Let us recall from Chapter 17 that a subset S of ℝ^n is convex if S contains the line segment between every pair of n-vectors in S, that is, if
(1) αx + (1 − α)y ∈ S
for each pair x and y of vectors in S and for every number α between 0 and 1.
A real-valued function f that is defined on a convex subset S of ℝ^n is said to be convex on S if the inequality

(2) f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)

holds for each pair x and y of vectors in S and for each number α that satisfies 0 ≤ α ≤ 1.
Geometric Insight
Figure 19.1. A convex function f of one variable and two of its chords.
Not displayed in Figure 19.1 is the chord connecting the pairs [a, f(a)]
and [c, f(c)]. This chord lies above the value that f assigns to each number x
that lies strictly between a and c. Figure 19.1 suggests – correctly – that:
Inequality (2) need not hold strictly; a linear function is convex, for in-
stance.
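Inequality (2) also lends itself to a quick numerical audit. The Python sketch below samples random pairs of points and random weights α; finding no violation is evidence of convexity (not a proof), while a single violation is a certificate that a function is not convex.

    import numpy as np

    def looks_convex(f, lo, hi, trials=20000, seed=2):
        rng = np.random.default_rng(seed)
        for _ in range(trials):
            x, y = rng.uniform(lo, hi, size=2)
            a = rng.uniform()
            # test inequality (2) at a random chord point
            if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-12:
                return False          # a chord lies below the function
        return True

    print(looks_convex(lambda x: x * x, -5.0, 5.0))   # True
    print(looks_convex(np.sin, 0.0, 6.0))             # False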
In Figure 19.1, the chord to the right has a higher (less negative) slope
than the chord to the left. If a and b are close to each other, the slope of the
chord that connects them approximates the derivative (if it exists) of f at, say
(a + b)/2. This suggests – correctly, as we shall see – that:
Of the three properties that are highlighted above, the first is obvious,
and the other two are verified in Proposition 19.5.
A mnemonic
Concave functions
A real-valued function f that is defined on a convex subset S of n is
said to be concave on S if the function −f is convex on S, equivalently, if the
“≤” in (2) is replaced by “≥”. Each property of a convex function becomes a
property of a concave function when the requisite inequality is reversed. For
instance, a concave function lies on or above its chords, Also, a differentiable
function f of one variable is concave if its slope (derivative) fâ•›'(x) is nonin-
creasing.
Economic interpretation
Suppose f(x) measures the cost of acquiring x units of a good. If this cost function is convex, the marginal cost f(x + 1) − f(x) of acquiring one more unit can only go up as the quantity increases. Similarly, suppose g(x) measures the profit obtained by producing x units of a good. If this profit function is concave, the marginal profit g(x + 1) − g(x) of producing one more unit can
only go down as the quantity increases. Convex and concave functions are
Terminology
Within this book, a function f that assigns a real number to each vector x
in a convex set S of n-vectors is said to be convex on S if f satisfies (2). As you
explore the literature, you will find that some writers use different nomen-
clature: They extend to ℝ^n the domain of a function f that is convex on S by setting f(z) = +∞ for each n-vector z that is not in S. In this book, functions whose values can be infinite are avoided.
b = [(c − b)/(c − a)] a + [(b − a)/(c − a)] c,

and the convexity of f on S gives

f(b) ≤ [(c − b)/(c − a)] f(a) + [(b − a)/(c − a)] f(c).
Now suppose part (b) is satisfied. Each step of the above argument is re-
versible, so f is convex on S. ■
[Figure 19.2: a convex function f evaluated at three collinear points x0, x1 and x2, with the values f(x0), f(x1) and f(x2) marked.]
In Figure 19.2, the n-vectors x0, x1 and x2 lie in the set S on which the
function f is convex; the vector x1 is reached by starting at x0 and moving
some positive number v of units in some direction d, and x2 is reached by
moving farther in the same direction, d.
(4) x1 = x0 + vd, x2 = x0 + (v + w) d.
Then

(5) [f(x1) − f(x0)]/v ≤ [f(x2) − f(x0)]/(v + w) ≤ [f(x2) − f(x1)]/w.
Remarks: The inequalities in (5) mirror the relations between the slopes in Figure 19.2. The proof of Proposition 19.2 is similar to the proof of Proposition 19.1.
Proof. From (4),

d = (x1 − x0)/v = (x2 − x0)/(v + w),

so that

x1 = [v/(v + w)] x2 + [w/(v + w)] x0.
The numbers v/(v + w) and w/(v + w) are nonnegative, and they sum to 1, so the convexity of f on S gives

f(x1) ≤ [v/(v + w)] f(x2) + [w/(v + w)] f(x0).
Multiply the above inequality by the positive number (v + w) and rearrange the resulting inequality as w[f(x1) − f(x0)] ≤ v[f(x2) − f(x1)], then divide by the product vw of the positive numbers v and w to obtain

(6) [f(x1) − f(x0)]/v ≤ [f(x2) − f(x1)]/w,

which is the first of the desired inequalities. Multiplying (6) by w and then adding f(x1) − f(x0) to both sides gives

[(w + v)/v] · [f(x1) − f(x0)] ≤ f(x2) − f(x0),
which is the second of the desired inequalities. To obtain the third, multi-
ply (6) by v, then add f(x2)â•›−â•›f(x1) to both sides and proceed as above. ■
The definition of convexity requires that the value that a convex function
f assigns to a convex combination of x and y cannot exceed the same convex
combination of f(x) and f(y). A similar bound holds for the convex combina-
tion of three or more points.
f[α1x1 + α2x2 + ··· + αrxr] ≤ (1 − αr) f[(α1/(1 − αr)) x1 + ··· + (αr−1/(1 − αr)) xr−1] + αr f(xr),
is convex on S = {x : 0 < x ≤ 1}. The epigraph of this function is the convex subset T of ℝ² that is depicted in Figure 19.3.
[Figure 19.3: the epigraph T of this function.]
Figure 19.1 exhibits a convex function of one variable and two of its chords. It's clear visually that as a chord shifts rightward, its slope can only increase.
f(b) − f(a) = ∫_a^b f′(z) dz ≤ f′(b)(b − a),

f(c) − f(b) = ∫_b^c f′(z) dz ≥ f′(b)(c − b).
Divide the first inequality by (b − a), divide the second by (c − b) and then subtract to eliminate f′(b), obtaining
Listed below are properties of convex functions that follow directly from
the definition.
(a) For each number β, each n-vector a and each element ŝ of S, the func-
tion
f(x) = β + a · (x − ŝ) ∀ x ∈ S
is convex on S.
(b) For each nonnegative number c and each function f that is convex on S, the function h(x) = c f(x) ∀ x ∈ S is convex on S.

(c) For each pair f and g of functions that are convex on S, the functions

h(x) = f(x) + g(x) ∀ x ∈ S and h(x) = max{f(x), g(x)} ∀ x ∈ S

are convex on S.
Part (a) of this proposition states that linear functions are convex. Part
(b) states that convexity is preserved by multiplying a convex function by a
nonnegative number. Part (c) states that the sum of two convex functions is
convex and that the maximum of two convex functions is convex.
Example 19.2. From Proposition 19.5 and Proposition 19.6, we see that:
Quadratic functions
The Hessian
(14) H(x)ij = (∂²f/∂xi∂xj)(x)
Proof. Exactly the same as for Proposition 19.7, but with (13) replaced by g′′(α) = 2H(x). ■
Neighborhoods
An empty interior
A convex set can contain many elements none of which are in its interior.
Witness:
The next few propositions describe properties that hold in the interior of
a convex set. The interior may be empty, as it is in Example 19.3. When the in-
terior is empty, these propositions are vacuous. Or so it seems. In Section 12,
we will see how to apply these results to each point in the “relative interior”
of a convex set. For Example 19.3, each vector in S is in the relative interior
of S, incidentally.
8. Continuity
The boundary
Example 19.1 exhibits a convex function that jumps upward at the bound-
ary of the region on which it is convex. It may seem that a convex function
can jump upward but not downward on the boundary. But consider
Example 19.4. Let S = {(u, v) ∈ ℝ² : u > 0} ∪ {(0, 0)} and let the function f be defined by

f(u, v) = { v²/u if u > 0;  0 if u = v = 0 }.
This set S is convex, and (0, 0) is the only point on its boundary. It is not hard to show (Problem 4 suggests how) that this function f is convex on S. Note that for any u > 0 and any k > 0, this function has f(u, √(ku)) = k, independent of u. This function jumps downward at (0, 0).
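The downward jump is easy to witness in Python: approaching (0, 0) along the parabola v = √(ku) holds the function at the value k, yet f(0, 0) = 0.

    import numpy as np

    def f(u, v):
        return v * v / u if u > 0 else 0.0    # f(0, 0) = 0

    for k in (1.0, 4.0, 9.0):
        vals = [f(u, np.sqrt(k * u)) for u in (1e-2, 1e-4, 1e-6)]
        print(k, vals)    # stays at k while (u, v) approaches (0, 0)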
The interior
[Figure 19.4: a point x in the interior of the simplex A with vertices a0, a1 and a2; the sequence xm approaches x, and ym and zm are points where the line through x and xm meets the boundary of A.]
The set A and the constant K are fixed throughout the remainder of the
proof. Each vector y in A is a convex combination of a0 through an, so Jensen’s
inequality (Proposition 19.3) guarantees
(16) f (y) ≤ K ∀ y ∈ A.
For the second step of the proof, consider any m for which xm ≠ x. This
step places lower bounds on f(x) and on f(xm). With c as any number, consid-
er the n-vector x + c(xm − x). For values of c that are close enough to zero,
this vector is in A. For values of c that are sufficiently far from 0, this vector is
not in A. (The dashed line segment in Figure 19.4 corresponds to the values
of c for which this vector is in A.) Define λm and μm by
(19) xm = (1/λm) ym + [(λm − 1)/λm] x

(20) f(xm) ≤ (1/λm) K + [(λm − 1)/λm] f(x).

(21) x = [1/(μm + 1)] zm + [μm/(μm + 1)] xm

(22) f(x) ≤ [1/(μm + 1)] K + [μm/(μm + 1)] f(xm).
Inequalities (20) and (22) are the desired lower bounds on f(x) and f(xm).
The third major step of the proof is to let m → ∞. Since xm → x and since ym and zm are on the boundary of A, equations (19) and (21) give λm → ∞ and μm → ∞, so (20) and (22) give lim sup_{m→∞} f(xm) ≤ f(x) ≤ lim inf_{m→∞} f(xm). These inequalities show that f(xm) → f(x), which completes a proof. ■
No derivative
Example 19.6. Let S = {x ∈ ℝ : 0 < x < 1}. The rational numbers (fractions) in S can be placed in one-to-one correspondence with the positive integers. In such a correspondence, let r(i) be the rational number that corresponds to the integer i, and consider the function f defined by

f(x) = Σ_{i=1}^∞ (1/2)^i · max{0, x − r(i)}.
It is not difficult to show that f is increasing and convex on S, but that f fails to
have a derivative at each rational number in S. It can also be shown that f has
a derivative at each irrational number in S.
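A truncated version of this series can be tabulated in Python. The enumeration r(1), r(2), ... below is an arbitrary choice of mine, and the series is cut off after finitely many terms; the left and right difference quotients at r(1) then differ by the weight (1/2)¹ attached to max{0, x − r(1)}, exposing the kink.

    from math import gcd

    # an arbitrary enumeration of a few rationals in (0, 1)
    rationals = [p / q for q in range(2, 10) for p in range(1, q)
                 if gcd(p, q) == 1]

    def f(x):
        # partial sum of the series, with weights (1/2)**i
        return sum(0.5 ** i * max(0.0, x - r)
                   for i, r in enumerate(rationals, start=1))

    r1, h = rationals[0], 1e-8            # r(1) = 1/2 in this enumeration
    left = (f(r1) - f(r1 - h)) / h
    right = (f(r1 + h) - f(r1)) / h
    print(left, right, right - left)      # difference ~ (1/2)**1 = 0.5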
You may have observed that the functions in Examples 19.5 and 19.6 have
“left” and “right” derivatives at each point in the interior of their domains.
• The same limit in (23) be obtained for every sequence of positive num-
bers that decreases to zero.
• This limit be a number, rather than +∞ or −∞ .
The function f(x) in Example 19.5 is not differentiable at 0, but its unidi-
rectional derivatives at 0 are easily seen to be
f⁺(0, d) = { d for d ≥ 0;  0 for d ≤ 0 }.
Bidirectional derivatives
The boundary
f(u) = 1 − √(1 − u²)

is plotted in Figure 19.5. For x = −1 and d = +1, the set S contains x + εd for all positive ε that are below 2. But f⁺(−1, +1) does not exist because the ratio on the RHS of (23) approaches −∞ as ε decreases to 0.
[Figure 19.5: the graph of f(u) = 1 − √(1 − u²) for −1 ≤ u ≤ 1.]
The interior
The “support” of a convex function has a similar definition. Let the function f be convex on a convex subset S of ℝ^n. This function is said to have a support at the n-vector x in S if there exists an n-vector d such that

(24) f(y) ≥ f(x) + d · (y − x) ∀ y ∈ S.
The expression on the RHS of (24) is linear in y, and (24) requires f(y)
to be at least as large as the value that this linear expression assigns to y. The
main result of this section is that a convex function has a support at each
point x in the interior of its domain.
Illustration
[Figure: the function f(y) of y, together with a support of f at the point x.]
The boundary
Differentiable functions
(25) lim_{ε→0} (f(x + εd) − f(x))/ε = ∇f(x) · d ∀ d ∈ ℝ^n.
Consider
Proof. Since x and y are in the convex set S, the convex function f satisfies

f((1 − ε)x + εy) ≤ (1 − ε)f(x) + εf(y)

for all ε having 0 < ε < 1. Divide the above inequality by the positive number ε and then rearrange it as

(f(x + ε(y − x)) − f(x))/ε ≤ f(y) − f(x).

Let ε approach 0, and note from (25) that the LHS of the above inequality approaches ∇f(x) · (y − x). This completes a proof. ■
In brief, the function f given in Example 19.8 does not lie on or above the plane that matches its value at x = (u, u) and whose slopes equal the unidirectional derivatives f⁺[x, e1] and f⁺[x, e2], both of which equal zero.
An existential result
The fact that T contains each pair (x, y) with y > f(x) guarantees that β cannot be negative. Aiming for a contradiction, suppose β = 0. In this case, the inequality in (28) reduces to 0 ≤ α · (x̂ − x), and (ii) guarantees that the vector α cannot equal 0. For each number δ that is sufficiently close to zero, the set T contains each pair (x̂, y) having x̂ − x = δα and y = f(x̂). Premultiply x̂ − x = δα by α to obtain 0 ≤ α · (x̂ − x) = δα · α. Since α is not zero, α · α is positive, and the preceding inequality cannot hold for any negative value of δ, so the desired contradiction is established. Thus, (28) holds with β > 0. Divide (28) by β, define the n-vector d by d = −α/β, and note from (28) that
Since f is convex on S, Proposition 19.2 shows that (29) remains true for
all x̂ ∈S. This proves part (a).
For part (b), suppose the ith partial derivative of f exists at x. Denote as ei the n-vector having 1 in its ith position and 0's elsewhere. In (29), set x̂ = x + δei to obtain f(x + δei) − f(x) ≥ diδ for every number δ having |δ| ≤ ε. For δ > 0, divide the preceding inequality by δ and then let δ approach zero to obtain (∂f/∂xi)(x) ≥ di. For δ < 0, divide the same inequality by δ and let δ approach zero to obtain (∂f/∂xi)(x) ≤ di. Hence, (∂f/∂xi)(x) = di, which completes a proof. ■
Part (a) is existential; it shows that a convex function has at least one sup-
port at each point x in the interior of its domain, but it does not show how
to construct a support. Part (b) shows that a convex function that has partial derivatives at x has exactly one support at x and, moreover, that this support has d equal to the vector of partial derivatives, evaluated at x.
f(y) ≥ f(x) + z · (y − x) ∀ y ∈ S,
f(x + dm) − f(x) − z · dm ≥ 0.

(30) lim sup_{m→∞} (f(x + dm) − f(x) − z · dm)/||dm|| ≤ 0.
Jensen’s inequality will be used to verify (30). With dim as the ith entry in
dm, we designate
n |dim |
|dm | = i=1 |dim | and αim = |dm |
for i = 1, 2, . . . , n.
Note that the sum over i of αim equals 1. As usual, ei denotes the n-vector
having 1 in its ith position and 0’s in all other positions.
where “o(ε)” is short for any function a(ε) such that a(ε)/ε → 0 as ε → 0. Consider the identity

x + dm = Σ_{i=1}^n αi^m (x + |dm| ei),

to which Jensen's inequality applies, giving

f(x + dm) ≤ Σ_{i=1}^n αi^m [zi |dm| + o(|dm|) + f(x)].
For any nonzero vector d, the inequality |d|/||d|| ≤ √n holds because replacing any two non-equal entries of d by their average has no effect on |d| but reduces ||d||. Thus, dividing the inequality that is displayed above by ||dm|| yields

(32) (f(x + dm) − f(x) − z · dm)/||dm|| ≤ o(|dm|)/||dm|| ≤ √n · o(|dm|)/|dm|.
Inequality (32) has been verified for dm > 0. To verify it for any nonzero vector dm, replace ei by −ei throughout the preceding paragraph for those entries having di^m < 0. To verify (30), let m → ∞ in (32). ■
A subspace
(33) L(S) = { β(x − y) : β ∈ ℝ, x ∈ S, y ∈ S }.
Thus, L(S) is obtained by taking the difference (x−y) of each pair of vec-
tors in S and multiplying that difference by every real number β. An immedi-
ate consequence of the fact that S is convex is that:
• The subset L(S) of ℝ^n is a vector space.
• The set L(S) equals ℝ^n if and only if S has a non-empty interior.
Figure 19.7 illustrates L(S) for the convex set S = {(u, 1 − u) : 0 < u < 1}
of all 2-vectors whose entries are positive numbers that sum to 1. The interior
of S is empty. We will soon see that each vector in S is in its “relative interior.”
[Figure 19.7: the set S = {(u, 1 − u) : 0 < u < 1} and the subspace L(S) = {(β, −β) : β ∈ ℝ}.]
S + T = {(x + y) : x ∈ S, y ∈ T}.
Evidently, B^S_ε(x) is a proper subset of B_ε(x) if L(S) is a proper subset of ℝ^n.
For part (a), we consider any pairs {x1 , z1 } through {xk , zk } of ele-
ments of S such that 0.5(x1 + z1 ) through 0.5(xk + zk ) span L(S). The av-
erage of these k vectors is easily seen to be in the relative interior of S, which
proves part (a).
For part (b), consider any vector x in the relative interior of S. For ε sufficiently close to 0, the set B^S_ε(x) is in S, so nonzero numbers β1 through βk exist such that xi = x + βi vi ∈ B^S_ε(x) for i = 1, 2, ..., k. Consider any y in S and any number α such that 0 ≤ α < 1. Set z = x + α(y − x), set zi = xi + α(y − xi) for i = 1, 2, ..., k, and set λ = (1 − α)ε. For each i we have zi ∈ S because S is convex, and we have zi − z = (1 − α)(xi − x), so ||zi − z|| = (1 − α)||xi − x|| ≤ λ. This guarantees zi ∈ B^S_λ(z) for each i, hence that z is in the relative interior of S. ■
Thus, if a convex set S contains more than one vector, its relative interior
is nonempty. And, if x is in the relative interior of S and y is in S, then each
vector in the open line segment between x and y is also in the relative interior
of S.
These results hold for convex sets. If a subset S of n is not convex, L(S)
need not be a vector space.
13. Review
3. Suppose that the functions f(x) and g(x) are convex and twice-differentiable on ℝ, and that f is nondecreasing. Show that the function f[g(x)] is convex on ℝ. (Hint: Differentiate f[g(x)] twice.)
(a) Show that this function is convex on the interval between (0, 0) and any vector (u, v) in S.

(b) Show that this function is convex on the interval between any two non-zero vectors in S. (Hint: compute its Hessian.)
5. (Unidirectional derivatives):

(a) For Example 19.5, compute the sum f⁺(0, 1) + f⁺(0, −1) at the point 0 at which f is not differentiable.

(b) For Example 19.6, compute the sum f⁺[r(i), 1] + f⁺[r(i), −1] at the ith fraction r(i).
6. Show that the function f(x) = e^x log(x) is convex on S = {x : x ≥ 1}.
7. Let g(x) = −log(x) and h(x) = x², and let S = {x : x > 0}. Support your answers to each of the following:

(a) Is g convex on S?

(b) Is h convex on ℝ?
8. Suppose the functions f and g are convex on ℝ, and suppose that these functions are twice differentiable. Under what circumstance is the function h(x) = f[g(x)] convex on ℝ? Hint: It might help to review the preceding problem.
(b) For each set {x1, ..., xn} of positive numbers and each set {α1, ..., αn} of nonnegative numbers that sum to 1, use part (a) to show that

x1^{α1} ··· xn^{αn} ≤ α1x1 + ··· + αnxn,

thereby verifying that the geometric mean does not exceed the arithmetic mean.
for each set {x1, ..., xn} of positive numbers and each set {α1, ..., αn} of nonnegative numbers that sum to 1. Hint: part (c) might help.
Σ_{i=1}^n wi xi ≤ (Σ_{i=1}^n wi)^β (Σ_{i=1}^n wi xi^{1/α})^α
(f) With constant α having 0 < α < 1 and with β = 1 − α, prove Hölder's inequality, which is that

Σ_{i=1}^n yi zi ≤ (Σ_{i=1}^n yi^{1/β})^β (Σ_{i=1}^n zi^{1/α})^α
(b) For the symmetric 3 × 3 matrix Q whose entries are in cells B2:D4 of the spreadsheet that appears below, elementary row operations have produced a matrix L with 1's on the diagonal, with 0's above the diagonal, and with LQ given by cells L2:N4. Is this matrix L invertible? If so, what is its inverse? What sequence of elementary row operations transformed Q into LQ? Is the matrix LQLᵀ symmetric? Is LQLᵀ diagonal? If so, what entries are on its diagonal?
(c) With L as the 3 × 3 matrix in part (b) and with x as any 3 × 1 vector, set y = (Lᵀ)⁻¹x = (L⁻¹)ᵀx and observe that
(e) For the symmetric matrix in cells B2:D4, find the range on the value of Q21 (its current value equals −3) for which the matrix Q is positive semi-definite.
Q =
[  1   −2    3   −4 ]
[ −2    6    4   −4 ]
[  3    4   62  −51 ]
[ −4   −4  −51  239 ]
12. Can a matrix Q be positive semi-definite if Qii < 0 for some i? If not, why not?
13. Take S ⊆ ℝ² as the intersection of the surface of the unit circle and the positive orthant. Sketch the set L(S) that is defined by (33). Is it a vector space?
14. (trivial supports) Consider the convex subset S of ℝ³ that consists of each vector x that has x1² + x2² ≤ 4 and x3 = 1.
Chapter 20: Nonlinear Programs

1. Preview
• The KKT conditions are necessary and sufficient for a feasible solution
to a nonlinear program to be a global optimum if the objective and
constraints of the nonlinear program satisfy a “constraint qualification”
that is presented in Section 5.
• The KKT conditions are shown to be necessary (but not sufficient) for
a feasible solution to be a local optimum if the objective and constraints
satisfy a different constraint qualification that is presented in Section 9.
• Several algorithms have been devised that do a good job of finding lo-
cal or global optima to nonlinear programs. The generalized reduced
gradient method (abbreviated GRG) is one of them. The GRG method
is built upon the simplex method. It is implemented in Solver and in
Premium Solver. These implementations work well if the functions are
differentiable and if the derivatives are continuous.
The chapter begins with the presentation of the optimality conditions for
a linear program in a format that becomes the KKT conditions when they are
restated in the context of a nonlinear program. As noted above, the objec-
tive and constraints of a nonlinear program must be restricted if its optimal
solution is to satisfy the KKT conditions. Any such restriction has long been
known (somewhat inaccurately) as a constraint qualification. Examples are
presented of the difficulties that constraint qualifications must rule out.
A limitation of this hypothesis is then brought into view, and a less re-
strictive constraint qualification is introduced. If a nonlinear program satis-
fies that condition, the KKT conditions are seen to be necessary for a feasible solution to be a local optimum.
• The fact that a convex function lies on or above its supports (Proposi-
tion 19.11).
• The fact that a convex function is continuous on the interior of its do-
main (Proposition 19.9).
Program 20.1. Maximize cx, subject to the constraints

Ax ≤ b,
x ∈ ℝ^{n×1}.
The data in Program 20.1 are the 1 × n vector c, the m × n matrix A, and
the m × 1 vector b. The decision variables form the n × 1 vector x. The con-
straint x ≥ 0 is omitted from Program 20.1. Any nonnegativity constraints
on the decision variables are represented in Program 20.1 by rows of the con-
straint matrix.
Proposition 20.1.╇ Let x* be feasible for Program 20.1. The following are
equivalent.
(2) λi ≥ 0 for i = 1, ..., m,
Remark: This result and its proof are familiar. Expressions (1) and (2) are the constraints of the dual of Program 20.1, and (3) is complementary slackness.
Proof.╇ In Chapter 12, we saw that the dual of Program 20.1 is the linear
program:
(a) ⇒ (b): Suppose x* is optimal for Program 20.1. The Duality Theorem shows that there exists a row vector λ that satisfies (1) and (2) (which are the constraints of the dual linear program) and has cx* = λb. It remains to verify (3). By hypothesis, x* satisfies Ax ≤ b, so that Ax* + s = b where the m × 1 vector s satisfies s ≥ 0. Premultiply the preceding equation by λ and use λA = c to obtain cx* + λs = λb. Since cx* = λb, we have 0 = λs = λ1s1 + ··· + λmsm. Each addend in this sum is nonnegative, so each addend must equal zero. Hence, if si is positive, it must be that λi equals 0. This verifies (3).
(b) ⇒ (a): Suppose x is feasible for Program 20.1 and that λ satisfies (1)-(3), hence that λ is feasible for the dual of Program 20.1. Multiply the constraint Aix ≤ bi by the nonnegative number λi and use (3) to get λiAix = λibi. Sum over i to obtain λAx = λb. Equation (1) is λA = c, so
we have λAx = cx = λb. The Duality Theorem shows that x is optimal for
Program 20.1. ■
In prior chapters, the variable that was complementary to the ith con-
straint of a linear program was called the multiplier for that constraint and
was denoted yi . The symbol λi suggests (correctly) that the variable that is
complementary to the ith constraint of a nonlinear program will be called the
Lagrange multiplier for that constraint.
Program 20.1 is a special case of the nonlinear program that appears below as
In Program 20.2, f(x) and g1 (x) through gm (x) are real-valued functions
of the decision variables x1 through xn . To place Program 20.1 in the format
of Program 20.2, set
Terminology
A canonical form
Gradients are now used to express the optimality conditions for Program
20.1 in terms of the functions f(x) and g1(x) through gm (x). These functions
are linear in x. Their gradients (vectors of partial derivatives) are
(7) ∇f(x) = Σ_{i=1}^m λi ∇gi(x).
Thus, with f and g1 through gm specified by (4) and (5), Proposition 20.1
shows that a feasible solution x to Program 20.1 is optimal if and only if there
exist numbers λ1 through λm that satisfy (7), (8) and (9), where
(8) λi ≥ 0 for i = 1, ..., m,

(9) λi gi(x) = 0 for i = 1, ..., m.
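Conditions (7)-(9) can be checked numerically for a small instance of Program 20.1. The Python sketch below is illustrative only: the data are invented, scipy's linprog (which minimizes, so the objective is negated) supplies x*, and recent versions of scipy expose the dual values through res.ineqlin.marginals.

    import numpy as np
    from scipy.optimize import linprog

    c = np.array([3.0, 2.0])
    A = np.array([[1.0, 1.0],
                  [2.0, 1.0],
                  [-1.0, 0.0],
                  [0.0, -1.0]])    # rows 3 and 4 encode x >= 0, as in the text
    b = np.array([4.0, 6.0, 0.0, 0.0])

    res = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2,
                  method="highs")
    x = res.x
    lam = -np.asarray(res.ineqlin.marginals)   # sign flipped: we maximize

    print("x* =", x)                                 # (2, 2)
    print("(7):", np.allclose(lam @ A, c))           # lam A equals grad f
    print("(8):", np.all(lam >= -1e-9))              # lam >= 0
    print("(9):", np.allclose(lam * (A @ x - b), 0)) # complementary slackness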
Nomenclature
A qualification
One might hope that the analogue of Proposition 20.1 holds for nonlin-
ear programs – that a feasible solution to Program 20.2 is optimal if and only
if it and a vector of Lagrange multipliers satisfy the KKT conditions. That
need not be true. It is true if the objective and constraints of Program 20.2
satisfy a “constraint qualification” that will be introduced shortly.
An illustration
[Figure: a feasible region S bounded by the constraints g1(x) ≤ 0, g2(x) ≤ 0 and g3(x) ≤ 0, with the gradients ∇g1(y), ∇g2(y) and ∇f(y) and the cones C(y) and D(y) drawn at a point y on the boundary of S.]
Example 20.1 falls into the format of Program 20.2 when we set n = m = 1 and

f(x) = x, g1(x) = (x − 1)³.
No interior
The optimal solution to a nonlinear program can fail to satisfy the KKT
conditions if its feasible region S has no interior, as is illustrated by
The first of these constraints keeps the pair (x1 , x2 ) from lying outside
the circle of radius 1 that is centered at (1, 0). The second constraint keeps
the pair (x1 , x2 ) from lying outside the circle of radius 1 that is centered at
(3, 0). The only feasible solution is x* = (2, 0). Example 20.2 falls in the format
of Program 20.2 when we take n = 2, m = 2 and define f, g1 and g2 by
[Figure: the circles of radius 1 centered at (1, 0) and at (3, 0); the disks that g1(x) ≤ 0 and g2(x) ≤ 0 describe meet only at x* = (2, 0), and ∇f(x*) points toward the top of the page.]
Note visually that ∇g1 (x∗ ) points to the right, that ∇g2 (x∗ ) points to the
left and that ∇f (x∗ ) points toward the top of the page. This makes it impos-
sible to express ∇f (x∗ ) as a linear combination of ∇g1 (x∗ ) and ∇g2 (x∗ ).
Algebraically, we have x* = (2, 0), and
for which reason no multipliers can satisfy (7). This difficulty can crop up
when the feasible region has an empty interior.
A cusp
If x1 > 1, the RHS of the 1st constraint is negative, and the 2nd constraint is violated. Hence, the unique optimal solution is x* = (1, 0). To place Example 20.3 in the format of Program 20.2, we take n = 2, m = 2, and
[Figure: the feasible region for Example 20.3, which has a cusp at x* = (1, 0); ∇g1(x*) and ∇g2(x*) point in opposite directions, and ∇f(x*) points along the x1-axis.]
Affine functions
N = {1, 2, . . . , m}\L.
Interpret N as the set consisting of those i for which the ith constraint is
“genuinely nonlinear.”
An hypothesis
Hypothesis #1.

Part (a): The functions −f and g1 through gm are convex and differentiable on an open convex set T that contains S.

Part (b): There exists a feasible solution x̄ to Program 20.2 that satisfies gi(x̄) < 0 for each i ∈ N.
Part (b) requires that Program 20.2 has a feasible solution that satisfies
each “genuinely nonlinear” constraint as a strict inequality. Let us see why
Hypothesis #1 rules out Examples 20.1, 20.2 and 20.3:
• Examples 20.1 and 20.3 violate Part (a) because the function g1 is not
convex.
• Example 20.2 violates Part (b) because it has N = {1, 2}, and no feasible
solution x̄ satisfies both nonlinear constraints strictly.
Morton Slater
Appeal
• For an instance of Program 20.2 that satisfies Hypothesis #1, the analog of Proposition 20.1 holds: A feasible solution is a global optimum if and only if it and a set of Lagrange multipliers satisfy the KKT conditions.
Sufficiency
Proof.╇ Proposition 20.2 shows that the set S of feasible solutions to Pro-
gram 20.2 is convex. The hypothesis of Proposition 20.3 is that x* is feasible
for Program 20.2 and that x* and an m-vector λ satisfy (7)-(9).
Consider any feasible solution x for Program 20.2. For each i between 1
and m, we have 0 ≤ gi (x) because x is feasible. By hypothesis, gi is convex
on S and is differentiable at x*. A convex differentiable function lies on or
above its supports (Proposition 19.11), which justifies the second inequality
in
(12) 0 ≥ ∇f (x∗ ) · (x − x∗ ) .
The function f is concave and is differentiable at x*, so Proposition 19.11
also guarantees
Showing that the KKT conditions are sufficient for a feasible solution x*
to be a global optimum has been fairly straightforward. The main tool in the
proof of Proposition 20.3 is the fact that a convex function lies on or above
its supports.
Necessity
It has now been shown that (x∗ + εd) is feasible for all sufficiently small
positive values of ε. By hypothesis, ∇f (x∗ ) · d > 0, and the fact that f is dif-
ferentiable at x* couples with Proposition 18.3 to give
f (x∗ + εd) − f (x∗ )
lim ε↓0 = ∇f (x∗ ) · d > 0.
ε
This inequality shows that f(x* + εd) > f(x*) for all sufficiently small positive numbers ε. This contradicts the optimality of x*, which completes a proof. ■
Consider the case in which a solution to (15)-(17) has µ0 > 0. Recall that
E is the set of constraints that are binding at x*, hence that dividing (16) by
µ0 shows that the gradient of the objective is a nonnegative linear combina-
tion of the gradients of the binding constraints, equivalently, that (7)-(9) hold.
The above and (20) produce the contradiction 0 > 0, which completes a
proof. ■
Recap
The proofs of Propositions 20.2 through 20.5 rely principally on the sup-
porting hyperplane theorem for a convex function and the Duality Theorem
of linear programming. In concert, these propositions prove
(b) There exists an m-vector λ such that x* and λ satisfy the KKT condi-
tions.
Thus, for nonlinear programs that satisfy Hypothesis #1, the KKT condi-
tions are necessary and sufficient for a feasible solution to be optimal. For
nonlinear programs that satisfy Hypothesis #1, Proposition 20.6 is the exact
analogue of Proposition 20.1.
The KKT conditions are succinct because (7) is written in terms of gradi-
ents. It is actually a system of n equations, one per decision variable. The data
in each equation are the partial derivatives of the objective and constraints
with respect to its decision variable.
A recipe
– Its RHS equals the partial derivative of the objective with respect to that decision variable.

– Each addend on its LHS equals the product of (i) the partial derivative of a constraint with respect to that decision variable and (ii) the Lagrange multiplier that is complementary to that constraint.
An example
Complementary constraints
Example 20.4 has two decision variables. The decision variable x is non-
negative, so row 1 shows that its complementary constraint is a “≤” inequal-
ity. The decision variable y is free, so row 2 shows that its complementary
constraint is an equation. The coefficients of the constraint that is comple-
mentary to x are found by differentiating the objective and constraints with
respect to x, and that constraint is
x: 4λ1 + 0.5x^{−0.5}λ2 ≤ e^x.
The KKT conditions that are obtained from this recipe are equivalent
to those that would be obtained by forcing Example 20.4 into the format of
Program 20.2 and then using (7)-(9). Proving that this is so would be cum-
bersome, elementary, and uninsightful. A proof is omitted.
8. Minimization
The major results in this chapter are presented in the context of a maxi-
mization problem. For convenient reference, these results are restated in the
context of a minimization problem. Let us consider
gi(x) ≥ 0 for i = 1, 2, ..., m,
x ∈ ℝ^{n×1}.
Hypothesis #1MIN.
Part (a): The functions f and – g1 through – gm are convex and differen-
tiable on a convex open set T that contains S.
Part (b): There exists a feasible solution x̄ to Program 20.2MIN that
satisfies
gi (x̄) > 0 for each i ∈ N.
It is easy to check that Hypothesis #1MIN becomes Hypothesis #1 when
this minimization problem is converted into an equivalent maximization
problem.
λi ≥ 0 for i = 1, . . . , m,
λi gi (x) = 0 for i = 1, . . . , m.
Evidently, the KKT conditions for Program 20.2MIN are identical to those
for Program 20.2.
On the other hand, if the function g3 is not affine, replacing the con-
straint g3 (x) = 0 by the same pair of inequalities destroys Part (a) of Hy-
pothesis #1 because it cannot be the case that the functions g3 and −g3 are
both convex. Hypothesis #1 accommodates equality constraints only if they
are affine.
Program 20.4 differs from Program 20.2 only in that m – r of its con-
straints are equations. From row 2 of the cross-over table, we see that the
multipliers for those equations are free (unconstrained in sign). The KKT
conditions for Program 20.4 are
(21) ∇f(x) = Σ_{i=1}^m λi ∇gi(x),
Hypothesis #2.
Part (a): The functions f and g1 through gm are differentiable on an
open set T that contains S.
Part (b): The gradients of the constraints that are binding at each local
optimum x* are linearly independent.
• Example 20.1 violates Part (b) because its optimal solution x* has g1(x*) = 0 and ∇g1(x*) = 0.

• Examples 20.2 and 20.3 violate Part (b) because both examples have optimal solutions x* that have g1(x*) = g2(x*) = 0 and ∇g1(x*) = −∇g2(x*).
Necessity
Sufficiency?
The objective of Example 20.5 is linear. Its feasible solutions are the
points (x1 , x2 ) that lie on the circle of radius 1 that is centered at (0, 0). This
example’s gradients are
∇f(x) = (√3, 1) and ∇g1(x) = (2x1, 2x2).
A feasible solution for Example 20.5 satisfies the KKT conditions if there exist numbers x1, x2 and λ for which x1² + x2² = 1 and ∇f(x) = λ∇g1(x). An easy computation verifies that these equations have two solutions, which are displayed below.
λ = 1, x1 = √3/2, x2 = 1/2

λ = −1, x1 = −√3/2, x2 = −1/2
One of these solutions is the point on the unit circle that maximizes f(x).
The other is the point on the unit circle that minimizes f(x). Evidently, under
Hypothesis #2, the KKT conditions are insufficient; they do not guarantee a
local maximum.
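The two KKT points are painless to tabulate. The short Python sketch below solves ∇f(x) = λ∇g1(x) for λ = ±1, confirms that each solution lies on the circle, and evaluates the objective at each; the KKT conditions alone cannot tell the maximum (f = 2) from the minimum (f = −2).

    import numpy as np

    grad_f = np.array([np.sqrt(3.0), 1.0])

    for lam in (1.0, -1.0):
        x = grad_f / (2.0 * lam)      # from (sqrt(3), 1) = lam (2x1, 2x2)
        print(lam, x, np.isclose(x @ x, 1.0), float(grad_f @ x))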
The KKT conditions have a brilliant history. In the summer of 1950, they
and a constraint qualification were first presented to the research commu-
nity in a paper by Kuhn and Tucker.1 That paper was instantly famous, and
the conditions in it became known as the Kuhn-Tucker conditions. The con-
straint qualification that Kuhn and Tucker employed differs from Hypoth-
esis #2. Their main result was akin to Proposition 20.7. It showed that their
constraint qualification guarantees that each local optimum satisfies the KKT
conditions. More than two decades elapsed before the research community
became aware that William Karush had obtained exactly the same result in his
unpublished 1939 master’s thesis2. The Kuhn-Tucker conditions have hence
(and aptly) been called the Karush-Kuhn-Tucker (or KKT) conditions.
Tucker
nearly two decades – a particularly brilliant era, one in which he nurtured the
careers of dozens of now-famous contributors to the mathematical underpin-
nings of operations research, game theory, and related areas.
Kuhn
Karush
John
By 1948, Fritz John4 had obtained a weakened form of the KKT condi-
tions in which ∇f(x∗ ) is replaced by λ0 ∇f(x∗ ) , where λ0 must be nonnega-
tive, but can equal 0. John’s paper omits the constraint qualification that is
shared by the work of Karush and of Kuhn and Tucker.
3. Takayama, A., Mathematical Economics, Drysdale Press, Hinsdale, Illinois, 1974.

4. John, F., “Extremum problems with inequalities as subsidiary conditions,” Studies and Essays Presented to Richard Courant on his 60th Birthday, Interscience, New York, pp. 187-204, 1948.
Slater
A personal reminiscence
Readers who wish to learn more about the origins of nonlinear program-
ming and its relationship to the work of Lagrange and Euler are referred to a
personal reminiscence by a pioneer, Harold W. Kuhn6.
Discussed in this section are a few tips that can help you to obtain good
results with the GRG method. These tips are presented in the context of a
nonlinear program, but some of them apply to nonlinear systems as well.
5. Slater, M., “Lagrange multipliers revisited: a contribution to nonlinear programming,” Cowles Commission Discussion Paper, Mathematics 403, November 1950.

6. Kuhn, H. W., “Nonlinear programming: a historical note,” A History of Mathematical Programming.
The GRG method seeks a local optimum, which may or may not satisfy
the KKT conditions.
Solver and Premium Solver are equipped with versions of the GRG meth-
od that differentiate “numerically.” This means that it approximates each par-
tial derivative by evaluating the function at closely spaced values. As might be
expected, this works best when the functions are differentiable and when the
derivatives are continuous.
experiment – initialize the GRG method several times, with different values
in the changing cells. It is emphasized:
Try to initialize the GRG method with reasonable values in the chang-
ing cells. If necessary, experiment.
If you use functions that are continuous but not differentiable, you may
get lucky. You can even get lucky if you use a discontinuous function. Using
a discontinuous function is not recommended! Use a binary variable instead.
Solver and Premium Solver are equipped to tackle nonlinear systems some of
whose variables are explicitly required to be integer-valued.
A quirk
The GRG code has a quirk. It may attempt to evaluate a function for a
value that lies outside of the range on which the function is defined. It can at-
tempt to compute log(x) for a value of x that is negative, for instance. Includ-
ing the constraint x ≥ 0 does not keep this from occurring. Its occurrence can
bring Excel to a halt. Two ways around this quirk are presented below.
This will not occur if you start “close enough.” Place a positive lower
bound K on the value of those variables whose logarithms are being com-
puted, and solve the problem repeatedly, gradually reducing K to 0. Initialize
each iteration with the optimal solution for a somewhat higher value of K.
This tactic can avoid logarithms of negative numbers.
A slicker way is to use Excel’s “ISERROR” function. Suppose that the ob-
jective of a nonlinear program is to maximize the expression
(24) Σ_{j=1}^n cj ln(xj),
The GRG method has a great deal in common with the simplex method.
A sketch of the GRG method is presented in this starred section. This sketch
is focused on its use to solve a nonlinear program. It seeks a local optimum.
It parses the problem of finding a local optimum into a sequence of “line
searches,” each of which optimizes the objective over a half-line or an interval.
Line search
Having solved the line search, the GRG method corrects the vector
x + θ d to account for any curvature in the set of solutions to the constraints
that were binding at x. It then iterates by finding a new improving direction,
executing a new line search, and so forth. How it accomplishes these steps will
be explored in a series of examples.
No binding constraints
How the “naïve start” gets its name will be exposed in the context of
For Example 20.6, let’s initiate the naïve start with the feasible solution
x = (1, 1), for which ∇f (x) = (1, −1). For its first line search, this algorithm
takes d = (1, −1), so
[Figure: the zigzag path that the naïve start produces for Example 20.6, moving right along the x1-axis while oscillating about it.]
Attenuation
There are several ways in which to attenuate the zigzags. Solver and Pre-
mium Solver use one of them. Table 20.1 reports the result of applying Solver
to Example 20.6. Its first line search proceeds exactly as does the naïve start.
Subsequent iterations correct for the zigzag. The constraint x1 ≤ 100 becomes
binding at the 7th iteration, and the optimal solution, (x1 , x2 ) = (100, 0), is
reached at the 8th iteration.
Zigzagging begins whenever a line search fails to change the set of bind-
ing constraints. The Generalized Reduced Gradient method picks its improv-
ing direction d so as to attenuate the zigzags.
The GRG method builds upon the simplex method. To indicate how, we
begin with an optimization problem whose constraints are linear and whose
objective is not, namely
The decision variables in Program 20.5 form the n × 1 vector x. Its data
are the m × n matrix A, the m × 1 vector b, and the function f(x).
A(x + θ d) = b.
An illustration
A = [ 1  1  1  1 ]   and   b = [  3 ]
    [ 6  4  2  1 ]             [ 12 ]
(25)  [ ∇f(x) ]   [ 20  −10  −40    0  ]
      [   A   ] = [  1    1    1    1  ]
                  [  6    4    2    1  ]
Pivoting
The feasible solution x = [1 1 1 0]ᵀ equates x1, x2 and x3 to positive values, but the matrix A has only two rows, so there is a choice as to the columns that are to become basic. Let's pivot on the coefficient of x1 in the 1st row of A and on the coefficient of x3 in the 2nd row of A. These two pivots transform the tableau on the RHS of (25) into
(26)  [ 0   0     0    55   ]
      [ 1   0.5   0   −0.25 ]
      [ 0   0.5   1    1.25 ]
Search direction
The entries in the top row of (26) play the role of reduced costs. Evidently, perturbing x = [1 1 1 0]ᵀ by setting the variable x4 equal to θ changes the objective by approximately 55θ when the values of the variables x1 and x3 whose columns have become basic are adjusted to preserve a solution to the equation system. The changes Δx1 and Δx3 that must occur in the values of these variables are found by placing the homogeneous system whose LHS is given by (26) (and whose RHS consists of 0's) in dictionary format:

Δx1 = 0.25θ,
Δx3 = −1.25θ.
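The same arithmetic can be reproduced with matrix algebra instead of explicit pivots, as the Python sketch below does for the basis β = {1, 3}; it recovers the reduced costs of (26) and the adjustments Δx1 and Δx3.

    import numpy as np

    A = np.array([[1.0, 1.0, 1.0, 1.0],
                  [6.0, 4.0, 2.0, 1.0]])
    grad = np.array([20.0, -10.0, -40.0, 0.0])   # grad f at x = (1, 1, 1, 0)
    basis = [0, 2]                               # columns of x1 and x3

    B_inv = np.linalg.inv(A[:, basis])
    reduced = grad - grad[basis] @ (B_inv @ A)
    print(reduced)              # [0, 0, 0, 55]: only x4 can improve f

    # setting x4 = theta moves the basic variables by -B_inv A_4 theta
    print(-B_inv @ A[:, 3])     # [0.25, -1.25], matching the dictionary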
This line search finds the value of θ that maximizes f(x + θd) while keeping x + θd ≥ 0. The optimal value of θ equals 0.8, at which point x3 decreases to 0. This line search results in the feasible solution

(27) x = [1.2  1  0  0.8]ᵀ.
The variable x3 that had been basic now equals 0. The variable x4 that had
been nonbasic now equals 0.8. Replacing the top row of (26) by (27) and then
pivoting so as to keep x1 basic for the 1st constraint and to make x4 basic for
the 2nd constraint produces the tableau
(28)  [ 0  −20.4  15.2   0 ]
      [ 1    0.6   0.2   0 ]
      [ 0    0.4   0.8   1 ]
The current feasible solution has x2 = 1, which is positive. The reduced costs (top-row coefficients) in (28) show that the next line search will reduce the nonbasic variable x2 (its reduced cost is negative) and increase the non-
basic variable x3 (its reduced cost is positive). The direction d in which this
search occurs will adjust the values of the basic variable so as to preserve a
solution to the homogeneous equation A d = 0. This direction d will satisfy
d2 = −20.4,
d3 = 15.2,
The ideas that were just introduced are now adapted to Program 20.5
itself. Each iteration begins with a vector x that satisfies A x = b and x ≥ 0.
Barring degeneracy, x has at least m positive entries (one per row of A), and x
may have more than m positive entries. The direction d in which the next line
search will occur is selected as follows:
1. Given this vector x, pivot to create a basic variable for each row but the
topmost of the tableau
(29)  [ ∇f(x) ]
      [   A   ],
but do not pivot on any entry in any column j for which xj = 0. Denote as β the set of columns on which pivots occur. (If x has more than m positive elements, there is a choice as to β.) The tableau that results from these pivots is denoted
(30)  [ c̄(x) ]
      [  Ā   ].
• If xk > 0 and k ∉ β, then dk = c̄(x)k.

• If k ∈ β and column k is basic for row i, then

(31) dk = − Σ_{j∉β} Āij dj.
Déjà vu
The ensuing line search will find the value of θ that maximizes f (x + θd)
while keeping x + θ d ≥ 0. The usual ratios determine the largest number ρ
for which x + ρd ≥ 0. If θ is less than ρ, a zigzag has commenced, and it will
need to be attenuated.
Nonlinear constraints
To discuss the GRG method in its full generality, we turn our attention to
a nonlinear program that has been cast in the format of
With nonlinear constraints, the vector x + θd that results from the line
search is very likely to violate A(x + θ d) = b. When that occurs, a correc-
tion is needed. Methods that implement such corrections lie well beyond the
scope of this discussion. The “G” in GRG owes its existence, in the main, to
the way in which corrections are made.
Our sketch of the GRG method has been far from complete. Not a word
has been said about how it finds a feasible solution x to a nonlinear program,
for instance.
With supports substituted for gradients, the KKT conditions for Program
20.2 become
(34) a⁰ = Σ_{i=1}^m λi aⁱ,

(35) λi ≥ 0 for i = 1, 2, ..., m,
Sufficiency
A demonstration that the KKT conditions are sufficient follows exactly the same pattern as it did under Hypothesis #1.
Proof.╇ Proposition 20.2 holds as written because its proof does not use
differentiability. Proposition 20.3 holds when (10) is replaced by (33), when
(7) is replaced by (34), and when (11) is replaced by (32). ■
Necessity?
[Figure: a feasible region S whose boundary passes through the point (0, 0), with the vectors a1 and ∇f(0, 0) drawn there.]
Necessity
14. Review
5. For Program 20.2, suppose that the functions f and g1 through gm are
differentiable. Let x* be feasible, and suppose every feasible solution other
than x* has (x − x∗ ) · ∇f (x∗ ) < 0. Show that x* is a local maximum.
4xy + 3xz + 2yz ≤ 72,
x ≥ 0, y ≥ 0, z ≥ 0.
Then write the KKT conditions for the same optimization problem, and
solve them analytically. Do you get the same solution?
7. The data in the optimization problem that appears below are the positive numbers a1 through an and the positive numbers b1 through bn. What is its optimal solution? Why?
Minimize Σ_{j=1}^n aj (xj)², subject to

Σ_{j=1}^n bj xj = 100, xj ≥ 0 for j = 1, 2, ..., n.
(a) Show that x* is a global minimum. Big hint: Do parts (b)-(f) first.

(b) Is Ad = 0?
10. A slight variant of the linear program that was used in Chapter 4 to introduce the simplex method is as follows: Maximize (2x + 3y), subject to the six constraints

x − 6 ≤ 0, (x + y − 7)³ ≤ 0, 2y − 9 ≤ 0,
−x + 3y − 9 ≤ 0, −x ≤ 0, −y ≤ 0.
Exhibit its feasible region and solve it graphically. Does its optimal solu-
tion satisfy the KKT conditions? If not, why not?
11. Use the GRG method to find an optimal solution to Example 20.3 (on
page 626). Did it work? If so, does the solution that it finds satisfy the
KKT conditions?
12. Prove the following: Part (a) of Hypothesis #1 guarantees that a local
maximum for Problem 2 is a global maximum.
13. Suppose that x* is a local maximum for Program 20.2 and Hypothesis #1
is satisfied, except that the functions – f and g1 through gm are not dif-
ferentiable. Show that x* is a global maximum.
(b) Use Solver or Premium Solver to find an optimal solution to it. Ob-
tain a sensitivity report.
15. The data in the nonlinear program that appears below are the m × n ma-
trix A, the m × 1 vector b, the 1 × n vector c and the symmetric n × n
matrix Q. Write down the KKT conditions for this nonlinear program.
z* = min { cx + (1/2) xᵀQx },    subject to Ax = b, x ≥ 0.
(a) Pivot to make x1 basic for the 1st row of A and to make x2 basic for
the 2nd row of A, so that β = {1, 2} rather than {1, 3}.
(b) With reference to this (new) basis, find the reduced gradient d.
(c) Execute a line search in this direction d. Specify the feasible solution
that results from this line search.
(d) True or false: In an iteration of the GRG method, the set β of columns that is made basic has no effect on the feasible solution that results from the line search.
17. On pages 649-651, the GRG method "pivoted" from the feasible solution x = [1 1 1 0]ᵀ to the feasible solution x = [1.2 1 0 0.8]ᵀ. Describe and execute the next iteration.
18. The data in NLP #1 and NLP #2 (below) are the m × n matrix A, the m × 1 vector b and the n × 1 vector c. Set S = {x ∈ ℝ^{n×1} : Ax = b, x ≥ 0}.
Assume that the numbers c1 through cn are positive, that S is bounded,
and that S contains at least one vector x each of whose entries is positive.
Ax = b,    x ≥ 0.

(a) Show that every feasible solution to NLP #1 has y b ≥ Σ_{j=1}^{n} c_j.
(b) Show that there exists a positive number ε such that the optimal solution to NLP #2 is guaranteed to satisfy x_j > ε for j = 1, 2, …, n. Does the variant of NLP #2 that includes these positive lower bounds satisfy Hypothesis #1? If so, write down its KKT conditions.
(c) Use part (b) to show that NLP #1 has an optimal solution and that each of its optimal solutions:

– has y b = Σ_{j=1}^{n} c_j,
19. (critical path with workforce allocation). The tasks in a project correspond
to the arcs in a directed acyclic network. This network has exactly one
node α at which no arcs terminate and exactly one node ω from which no
arcs emanate. Nodes α and ω represent the start and end of the project.
Each arc (i, j) represents a task and has a positive datum cij , which equals
the number of weeks needed to complete this task if the entire workforce
is devoted to it. If a fraction xij of the workforce is assigned to task (i, j),
its completion time equals cij/xij. Work on each task (i, j) can begin as
soon as work on every task (k, i) has been completed. The problem is
to allocate the workforce to tasks so as to minimize the time needed to
complete the project.
(b) Show that the minimum project completion time equals Σ_{(i,j)} c_{ij}
weeks, show that all tasks are critical (delaying the start of any task
would increase the project completion time), and show how to find
the unique optimal allocation x of the workforce to tasks.
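The model in Problem 19 is easy to probe numerically. The sketch below (ours, in plain Python, with hypothetical dictionary-based containers) computes the completion time that a given allocation x produces, by the usual longest-path recursion over a topological order of the nodes.

```python
def completion_time(nodes, arcs, c, x):
    """Earliest completion of the project under workforce allocation x.

    nodes : the node set in topological order, alpha first, omega last
    arcs  : list of (i, j) pairs, one per task
    c, x  : dicts keyed by arc; task (i, j) takes c[i, j] / x[i, j] weeks
    """
    earliest = {v: 0.0 for v in nodes}       # earliest start time at each node
    for j in nodes:
        for (i, jj) in arcs:
            if jj == j:
                earliest[j] = max(earliest[j],
                                  earliest[i] + c[i, jj] / x[i, jj])
    return earliest[nodes[-1]]               # earliest finish at omega
```

Part (b) asks, in effect, for the allocation x that minimizes this quantity.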
Note: Problems 18 and 19 draw upon the paper, "A nonlinear allocation problem," by E. V. Denardo, A. J. Hoffman, T. MacKenzie and W. R. Pulleyblank, IBM J. Res. Dev., vol. 36, pp. 301-306, 1994.
Index
E
Eaves, C., 538
economy, 463
  agents, 463
  consumers and producers, 463
  consumers’ equilibrium, 468
  endowments, 463
  general equilibrium, 464, 470
  goods and technologies, 463
  market clearing, 466, 468
  producers’ equilibrium, 467
edge, 117, 186
elementary row operations, 82
ellipsoid method, 212
English auction, 447
EOQ model, 253-256
  economy of scale, 255
  flat bottom, 256
  opportunity cost, 253
  the EOQ, 254
EOQ model with uncertain demand, 256-260
  backorders, 257
  cycle stock, 258
  reorder point, 258
  reorder quantity, 258
  safety stock, 258
  with constant resupply intervals, 263, 264
epigraph, 589-590
evolutionary Solver, 60-62, 241, 251
Excel, 33-65
  circular reference in, 46, 47
  for PCs, 34
  for Macs, 34
  formula bar, 37
Excel Add-Ins, 50
  Solver, 50-56
  Premium Solver, 50, 56-59
  OP_TOOLS, 02, 37
Excel array, 37
Excel array functions, 44-46
  matrix multiplication, 45, 46
  pivot, 62, 63
Excel cell, 35
  absolute address of, 37, 38
  entering functions in, 36
  entering numbers in, 35
  fill handle of, 35
  relative address of, 37, 38
  selecting an, 35
Excel commands
  copy and paste, 38
  drag, 43, 44
  format cells, 36, 37
Excel functions, 36
  ABS, 62
  error, 61
  ISERROR, 646
  LN, 61
  MIN, 62
  MMULT, 339
  NL, 62, 248
  OFFSET, 241, 284
  SUMPRODUCT, 42, 43, 48, 49
Excel Solver Add-In, 50-56, 62-64
Excel 2008 (for Macs only), 34, 50
exchange operations, 81
extreme points, 117, 516
extreme value theorem, 551, 558

F
Farkas, G., 390, 557, 558
Farkas’s lemma, 390-392
feasible basis, 123
feasible pivot, 127, 133
feasible region, 115
  bounded, 118
  edge of, 117
  extreme point of, 117
feasible solution, 114
Feinberg, E. A., 369
Ferraro, P., 176, 189
Fiacco, T., 213
Final Jeopardy, 477
financial economics, 397-404
  arbitrage opportunity, 399
  no-arbitrage tenet, 397
V
Vanderbei, R., 211
Van der Heyden, L., vii
variable cost, 155
vectors, 83-87
  addition of, 83
  convex combination of, 85,

Y
Yale University, vii
Ye, Y., 214

Z
Zadeh, N., 318