Leclin 112
Debasis Mishra
Planning Unit, Indian Statistical Institute, 7 Shahid Jit Singh Marg, New Delhi 110016, India. E-mail: [email protected]
April 6, 2011
Introduction
Optimization of a function f over a set S involves finding the maximum (minimum) value of
f (objective function) in the set S (feasible set). Properties of f and S define various types
of optimization. Primarily, optimization can be classified into three categories.
1. Linear Programming: If f is a linear function (e.g., f(x1, x2) = x1 + 2x2) and the set
S is defined by a finite collection of linear inequalities and equalities, then it is called
a linear program. As an example, consider the following linear program.

   Z = max_{x1,x2} x1 + 2x2
   s.t.
       x1 + x2 ≤ 6
       x1 ≥ 2
       x2 ≥ 0
2. Integer Programming: An integer program is a linear program with the further restriction that the solution be integral. In the previous example, if we impose that x1 and
x2 can only take integer values, then it becomes an integer program.
3. Nonlinear Programming: If f is a non-linear function and the feasible set S is
defined by a finite collection of non-linear equations, then it is called a non-linear
program. There are further classifications (and extensions) of non-linear programs
depending on the specific nature of the problem. Typically, f is assumed to be continuous and differentiable. An example is the following non-linear program, where f is some given non-linear function.

   max_{x1,x2} f(x1, x2)
   s.t.
       x1 + x2 = 1.
In general, an optimization problem written in mathematical form is referred to as a
mathematical program. In this course, we will learn about linear and integer programming problems and their solution methods.
To understand a little more about linear and integer programs, consider the above example. The feasible set/region can be drawn on a plane. Figure 1 shows the feasible region
of the linear program (dashed region); the integer points inside that region form the
feasible region of the integer program. Notice that the optimal solution of this linear
program has to lie on the boundary of the feasible region. Moreover, an extreme point is an
optimal solution (x1 = 2, x2 = 4). This is no accident, as we will show. If we impose the
integer constraints on x1 and x2, then the feasible region is a finite set of points. Again,
the optimal solution is x1 = 2, x2 = 4 (this is obvious, since this integral solution is
an optimal solution of the linear program).
[Figure 1: Feasible set and objective function of linear and integer programs. The figure shows the region bounded by the lines x1 = 2 and x1 + x2 = 6, the integer points inside it, and a contour x1 + 2x2 = 0 of the objective function.]
3 Linear Programming

3.1 An Example
Problem 1 There are 100 units of water to be distributed among three villages. The water
requirements of the villages are 30, 50, and 40 units respectively. The per-unit water shortage costs of the
three villages are 4, 3, and 5 respectively. The water supplied to any two villages together should not exceed 70 units.
Find a water distribution that minimizes the total water shortage cost.
Modeling: Let xi (i ∈ {1, 2, 3}) denote the amount of water supplied to village i. Since
the total amount of water is 100, we immediately have

    x1 + x2 + x3 = 100.

Further, the water supply to no two villages should exceed 70. This gives us

    x1 + x2 ≤ 70
    x2 + x3 ≤ 70
    x1 + x3 ≤ 70.
The water requirement of every village puts an upper bound on its supply. So, we
can put x1 ≤ 30, x2 ≤ 50, x3 ≤ 40. Of course, the water supplies should all be non-negative, i.e., x1, x2, x3 ≥ 0. Finally, the total water shortage cost of the three villages is
4(30 − x1) + 3(50 − x2) + 5(40 − x3) = 470 − 4x1 − 3x2 − 5x3. Minimizing the
total water shortage cost is therefore equivalent to maximizing 4x1 + 3x2 + 5x3. So, the
problem can be formulated as:
    Z = max_{x1,x2,x3} 4x1 + 3x2 + 5x3
    s.t.                                                  (P1)
        x1 + x2 + x3 = 100
        xi + xj ≤ 70        for all i, j ∈ {1, 2, 3}, i ≠ j
        x1 ≤ 30
        x2 ≤ 50
        x3 ≤ 40
        xi ≥ 0              for all i ∈ {1, 2, 3}
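Problems like (P1) can be handed directly to an off-the-shelf LP solver. The following is a minimal sketch assuming SciPy is available (scipy.optimize.linprog minimizes, so the objective is negated); the expected output is my own computation, not part of the original notes.

# A quick numerical check of (P1) using scipy.optimize.linprog.
# linprog minimizes, so we negate the objective to maximize 4x1 + 3x2 + 5x3.
from scipy.optimize import linprog

c = [-4, -3, -5]
A_ub = [[1, 1, 0],   # x1 + x2 <= 70
        [0, 1, 1],   # x2 + x3 <= 70
        [1, 0, 1]]   # x1 + x3 <= 70
b_ub = [70, 70, 70]
A_eq = [[1, 1, 1]]   # x1 + x2 + x3 = 100
b_eq = [100]
bounds = [(0, 30), (0, 50), (0, 40)]   # 0 <= xi <= requirement of village i

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(res.x, -res.fun)   # expected: x = (30, 30, 40) with Z = 410

The maximized value 410 corresponds to a total shortage cost of 470 − 410 = 60.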
3.2 Standard Form
A linear program in standard form is written as

    Z = max Σ_{j=1}^n cj xj
    s.t.                                                  (LP)
        Σ_{j=1}^n aij xj ≤ bi,      i ∈ {1, . . . , m}
        xj ≥ 0,                     j ∈ {1, . . . , n}.

Every LP problem falls into exactly one of three categories.

1. Optimal: This is the class of LP problems for which an optimal solution exists, i.e., a
feasible solution that attains the maximum of the objective function over the feasible set.
2. Infeasible: This is the class of LP problems for which no feasible solution exists. An
example is the following (the objective function is immaterial here):

    s.t.                                                  (INF-LP)
        x1 + x2 ≤ 3
        3x1 + 3x2 ≥ 11
        x1, x2 ≥ 0

Any feasible solution would have to satisfy 3x1 + 3x2 ≤ 9 < 11, so no feasible solution exists.
3. Unbounded: This is the class of LP problems for which feasible solutions exist, but
for every number M, there exists a feasible solution that gives the objective function
a value more than M. So, none of the feasible solutions is optimal. An example is the
following:

    Z = max_{x1,x2} x1 + x2
    s.t.                                                  (UNBD-LP)
        −2x1 + x2 ≤ 1
        x1 − 2x2 ≤ 2
        x1, x2 ≥ 0
[Figure 2: Unbounded LP. The figure shows the unbounded feasible region of (UNBD-LP), bounded by the lines −2x1 + x2 = 1 and x1 − 2x2 = 2, with the objective improving along an unbounded direction.]
4 History

The second world war brought many new things to the world. This included the use
and rapid growth of the field of linear programming. In 1947, George B. Dantzig, regarded
by many as the founder of the discipline, designed the simplex method to solve linear
programming problems for the U.S. Air Force. After that, the field showed rapid growth
in terms of research. Production management and economics were the primary areas that
applied linear programming to a variety of problems. The tremendous application potential of
this field led to an increase in theoretical research, and thus a new branch of applied
mathematics was born.

In 1947, T.C. Koopmans pointed out several applications of linear programming in economic theory. Even today, a significant portion of economic theory is governed by the
fundamentals of linear programming (as we will discuss).

The fundamentals of linear programming have their roots as early as the 1820s, when Fourier investigated techniques to solve systems of linear inequalities. L.V. Kantorovich pointed out the
significance of linear programming and its applications in 1939. Unfortunately, his work was
not noticed for a long time (since he carried out most of his research in the U.S.S.R.). In 1975,
Kantorovich and Koopmans were awarded the Nobel prize in economics for their contributions to the theory of optimum allocation of resources. Another event in the 1970s attracted
a lot of media attention. Ever since the invention of the simplex method, mathematicians had been
working towards a theoretically satisfactory algorithm to solve linear programs. In 1979, L.G.
Khachian published the description of such an algorithm, though its performance has been
extremely unsatisfactory in practice. On the other hand, the simplex method, whose theoretical
(worst-case) performance is not good, does a very good job in practice.
5 Simplex Preview

One of the first discovered, and immensely effective, linear programming (LP) algorithms is
the simplex method. The objective of this section is to give examples that illustrate the
method.
5.1 First Example
    max 5x1 + 4x2 + 3x3
    s.t.                                                  (EX-1)
        2x1 + 3x2 + x3 ≤ 5                                (1)
        4x1 + x2 + 2x3 ≤ 11                               (2)
        3x1 + 4x2 + 2x3 ≤ 8                               (3)
        x1, x2, x3 ≥ 0.
The first step in the method consists of introducing slack variables for every constraint.
For example, in Equation (1), the slack between 5 and 2x1 + 3x2 + x3 is assigned a slack variable
x4. Doing this for every constraint, we require

    x4 = 5 − 2x1 − 3x2 − x3
    x5 = 11 − 4x1 − x2 − 2x3
    x6 = 8 − 3x1 − 4x2 − 2x3
    x1, x2, x3, x4, x5, x6 ≥ 0.
The new variables x4, x5, x6 are called slack variables, and the old variables x1, x2, x3
are called decision variables. Hence our new LP is to

    max z
    s.t. x1, x2, x3, x4, x5, x6 ≥ 0,                      (4)

where z = 5x1 + 4x2 + 3x3 and x4, x5, and x6 are determined by the equations above. This new
LP is equivalent (same set of feasible solutions in terms of decision variables) to (EX-1),
given the equations determining the slack variables. The simplex method is an iterative
procedure in which, having found a feasible solution x1, . . . , x6 of (4), we look for another
feasible solution x̄1, . . . , x̄6 of (4) such that

    5x̄1 + 4x̄2 + 3x̄3 > 5x1 + 4x2 + 3x3.
If an optimal solution exists, we can repeat this for a finite number of iterations until there is
no improvement in the objective function value, at which point we stop. The first step is
to find a feasible solution, which is easy in our example: x1 = x2 = x3 = 0, which gives
x4 = 5, x5 = 11, x6 = 8. This gives z = 0.
We now need to look for a solution that gives a higher value to z. For this, we look to
increase the value of any one of the variables x1, x2, x3. We choose x1. Keeping x2 and x3 at
zero, we notice that we can increase x1 up to min(5/2, 11/4, 8/3) = 5/2 while maintaining x4, x5, x6 ≥ 0. As a
result of this, the new solution is

    x1 = 5/2, x2 = 0, x3 = 0, x4 = 0, x5 = 1, x6 = 1/2, z = 25/2.
Notice that by increasing the value of x1, a variable whose value was positive (x4) got a
value of zero. Now, we have to create a system of equations similar to the previous iteration. For
that, we write the value of z and the variables having non-zero values (x1, x5, x6) in terms of
the variables having zero values (x2, x3, x4).
    x1 = 5/2 − (3/2)x2 − (1/2)x3 − (1/2)x4
    x5 = 1 + 5x2 + 2x4
    x6 = 1/2 + (1/2)x2 − (1/2)x3 + (3/2)x4
    z = 25/2 − (7/2)x2 + (1/2)x3 − (5/2)x4.
Of x2, x3, x4, the value of z decreases by increasing the values of x2 and x4. So, the only
candidate for increasing value in this iteration is x3. The amount by which we can increase
x3 can again be obtained from the feasibility conditions x1, x5, x6 ≥ 0, which are equivalent
to (given x2 = x4 = 0) 5/2 − (1/2)x3 ≥ 0 and 1/2 − (1/2)x3 ≥ 0. This gives that the maximum possible
value of x3 in this iteration is min(5, 1) = 1. By setting x3 = 1, we get a new solution
as

    x1 = 2, x2 = 0, x3 = 1, x4 = 0, x5 = 1, x6 = 0, z = 13.        (5)
Two things should be noticed here: (a) this solution is also a solution of the previous
system of equations and (b) the earlier solution is also a solution of this system of equations.
This is precisely because we are just rewriting the system of equations using a different set of
decision and slack variables in every iteration, and that is the central theme of the simplex
method.
So, the new variable that takes zero value is x6. We now write the system of equations
in terms of x2, x4, x6.

    x3 = 1 + x2 + 3x4 − 2x6
    x1 = 2 − 2x2 − 2x4 + x6
    x5 = 1 + 5x2 + 2x4
    z = 13 − 3x2 − x4 − x6.
Now, the value of z will decrease by increasing the value of any of the variables x2, x4, x6.
So, we have reached a dead-end. In fact, we have reached an optimal solution. This is clear
from the fact that any feasible solution requires x2, x4, x6 ≥ 0, and assigning any positive value
to these variables will decrease the value of z. Hence, z = 13 is the optimal
value. The corresponding values of x1, x3, x5 are 2, 1, 1 respectively.
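The procedure above is mechanical enough to code directly. The following is a minimal sketch (not the author's code) of the dictionary-based simplex method, using exact rational arithmetic. It assumes b ≥ 0, so that the initial dictionary is feasible (initialization for infeasible origins is discussed later), and it breaks all ties by smallest index.

from fractions import Fraction as F

def simplex(A, b, c):
    """Maximize c.x subject to Ax <= b, x >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # rows[i] = [const, coef of x_0, ..., coef of x_{n+m-1}] expressing the
    # basic variable x_i in terms of the non-basic variables.
    rows = {n + i: [F(b[i])] + [F(-A[i][j]) for j in range(n)] + [F(0)] * m
            for i in range(m)}
    z = [F(0)] + [F(v) for v in c] + [F(0)] * m        # the z-row
    while True:
        # Entering variable: smallest-index non-basic x_s with positive z-coef.
        s = next((j for j in range(n + m)
                  if j not in rows and z[j + 1] > 0), None)
        if s is None:                                   # optimal dictionary
            x = [rows[i][0] if i in rows else F(0) for i in range(n)]
            return z[0], x
        # Leaving variable: basic x_t posing the most stringent bound on x_s.
        bounds = [(-r[0] / r[s + 1], i) for i, r in rows.items() if r[s + 1] < 0]
        if not bounds:
            raise ValueError("LP is unbounded")
        _, t = min(bounds)
        # Pivot: solve the x_t row for x_s, then substitute it everywhere.
        row = rows.pop(t)
        a = row[s + 1]
        new = [-v / a for v in row]
        new[s + 1], new[t + 1] = F(0), F(1) / a
        rows[s] = new
        for r in list(rows.values()) + [z]:
            if r is not new:
                rs, r[s + 1] = r[s + 1], F(0)
                for j in range(n + m + 1):
                    r[j] += rs * new[j]

# Running it on (EX-1) reproduces the computation above:
print(simplex(A=[[2, 3, 1], [4, 1, 2], [3, 4, 2]], b=[5, 11, 8], c=[5, 4, 3]))
# expected: (Fraction(13, 1), [Fraction(2, 1), Fraction(0, 1), Fraction(1, 1)])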
5.2 Dictionaries
Consider again the linear program in standard form:

    Z = max Σ_{j=1}^n cj xj
    s.t.                                                  (LP)
        Σ_{j=1}^n aij xj ≤ bi,      i ∈ {1, . . . , m}
        xj ≥ 0,                     j ∈ {1, . . . , n}.
The first step in the simplex method is to introduce slack variables xn+1, . . . , xn+m ≥ 0
corresponding to the m constraints, and to denote the objective function by z. So,

    xn+i = bi − Σ_{j=1}^n aij xj,     i ∈ {1, . . . , m}        (6)
    z = Σ_{j=1}^n cj xj                                          (7)
    xj ≥ 0,     j ∈ {1, . . . , n, n + 1, . . . , n + m}.        (8)

As in the example, each iteration starts from a feasible solution x1, . . . , xn+m and looks for a
feasible solution x̄1, . . . , x̄n+m with Σ_{j=1}^n cj x̄j > Σ_{j=1}^n cj xj.
As we have seen in the example, a feasible solution is represented by a system of linear
equations expressing the dependent variables. Such a system of equations corresponding to a
feasible solution is called a dictionary. A dictionary has the following features:

1. Every solution of the system of equations of the dictionary must be a solution of the system
of equations (6), (7), and (8), and vice versa.

2. The equations of every dictionary must express m of the variables x1, . . . , xm+n and
the objective function z (the dependent variables) in terms of the remaining n variables
(the independent variables).
As an example, consider the dictionary

    x3 = 5 − x2 + x1
    x4 = 3 − x2 − 2x1
    z = x1 + 3x2
    x1, x2, x3, x4 ≥ 0.

In this dictionary, we can set x1 = x2 = 0 to get a feasible solution. Rewriting the first
equation in terms of x2, we get the following dictionary.

    x2 = 5 + x1 − x3
    x4 = −2 − 3x1 + x3
    z = 15 + 4x1 − 3x3
    x1, x2, x3, x4 ≥ 0.

Unlike in the first dictionary, we cannot set the independent variables to zero to get
a feasible solution: putting x1 = x3 = 0 gives us x2 = 5 but x4 = −2 < 0. This is an
undesirable feature. To get over this feature, we need the following notion.
In a dictionary, the dependent variables are kept on the left-hand side (LHS), and
they are expressed in terms of the independent variables on the right-hand side (RHS). An
additional desirable feature of a dictionary is:

    setting the RHS variables to zero and evaluating the LHS variables, we arrive at a
    feasible solution.
A dictionary with this additional property is called a feasible dictionary. Hence, every
feasible dictionary describes a feasible solution. The second dictionary above is not feasible,
but the first one is.

A feasible solution that can be described by a feasible dictionary is called a
basic solution. The characteristic feature of the simplex method is that it works with
basic solutions only.
5.3 Second Example
Consider the following linear program:

    max 5x1 + 5x2 + 3x3
    s.t.
        x1 + 3x2 + x3 ≤ 3
        −x1 + 3x3 ≤ 2
        2x1 − x2 + 2x3 ≤ 4
        2x1 + 3x2 − x3 ≤ 2
        x1, x2, x3 ≥ 0.
In this case, the initial feasible dictionary has all the slack variables as dependent variables, and it looks as follows:

    x4 = 3 − x1 − 3x2 − x3
    x5 = 2 + x1 − 3x3
    x6 = 4 − 2x1 + x2 − 2x3
    x7 = 2 − 2x1 − 3x2 + x3
    z = 5x1 + 5x2 + 3x3.
In the first iteration, x1 enters the basis (its coefficient in the z-row is positive), and x7 leaves (it poses the most stringent bound on x1). Pivoting yields the following dictionary:

    x1 = 1 − (3/2)x2 + (1/2)x3 − (1/2)x7
    x4 = 2 − (3/2)x2 − (3/2)x3 + (1/2)x7
    x5 = 3 − (3/2)x2 − (5/2)x3 − (1/2)x7
    x6 = 2 + 4x2 − 3x3 + x7
    z = 5 − (5/2)x2 + (11/2)x3 − (5/2)x7.
Some comments about terminology are in order:
1. Dependent variables, which appear on the LHS of any dictionary, are called basic
variables. Independent variables are called non-basic variables. In the previous
dictionary, x1 , x4 , x5 , x6 are basic variables, and x2 , x3 , x7 are non-basic variables.
2. The sets of basic and non-basic variables change from iteration to iteration.

3. The choice of the entering basic variable is motivated by the fact that we want to increase
the value of z: we choose a variable that does so, and increase its value as much as
possible.

4. The choice of the leaving basic variable is motivated by the need to maintain feasibility. This
is done by identifying the basic variable that poses the most stringent bound on the
entering basic variable.
5. The formula for the entering basic variable appears in the pivot row, and the process
of constructing a new dictionary is called pivoting. In the previous dictionary, x3 is
the next entering basic variable, and x6 is the leaving basic variable. So, the formula
for x3 appears in

    x3 = 2/3 + (4/3)x2 − (1/3)x6 + (1/3)x7,

and substituting it into the remaining equations gives the new dictionary:
    x3 = 2/3 + (4/3)x2 + (1/3)x7 − (1/3)x6
    x1 = 4/3 − (5/6)x2 − (1/3)x7 − (1/6)x6
    x4 = 1 − (7/2)x2 + (1/2)x6
    x5 = 4/3 − (29/6)x2 − (4/3)x7 + (5/6)x6
    z = 26/3 + (29/6)x2 − (2/3)x7 − (11/6)x6.
Now, the entering basic variable is x2 , and the leaving basic variable is x5 . Pivoting yields
the following dictionary:
    x2 = 8/29 − (8/29)x7 + (5/29)x6 − (6/29)x5
    x3 = 30/29 − (1/29)x7 − (3/29)x6 − (8/29)x5
    x1 = 32/29 − (3/29)x7 − (9/29)x6 + (5/29)x5
    x4 = 1/29 + (28/29)x7 − (3/29)x6 + (21/29)x5
    z = 10 − 2x7 − x6 − x5.
At this point, no more pivoting is possible (every non-basic variable has a negative coefficient in the z-row), and we arrive at the optimal solution described
by the last dictionary:

    x1 = 32/29, x2 = 8/29, x3 = 30/29,

with optimal value z = 10.
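As a usage check, feeding this problem to the simplex sketch from Section 5.1 (reusing the simplex function defined there) reproduces the result:

opt, x = simplex(A=[[1, 3, 1], [-1, 0, 3], [2, -1, 2], [2, 3, -1]],
                 b=[3, 2, 4, 2], c=[5, 5, 3])
print(opt)   # expected: 10
print(x)     # expected: [32/29, 8/29, 30/29]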
6 Pitfalls of the Simplex Method

The simplex method, as described so far, may run into three kinds of trouble.

1. Initialization: We may not be able to start. How do we find an initial feasible
dictionary?

2. Iteration: We may get stuck at some iteration. Can we always choose a new entering
and leaving variable?

3. Termination: We may not be able to finish. Can the simplex method construct an
endless sequence of dictionaries without ever reaching an optimal solution?
We look at each of these three pitfalls. Before proceeding, let us review the general form
of a dictionary. There is a set of basic variables B with #B = m, and the linear program is
written in the following form in this dictionary:

    xi = b̄i − Σ_{j ∉ B} āij xj,     i ∈ B
    z = v + Σ_{j ∉ B} c̄j xj.
6.1 Iteration

Consider the following dictionary:

    x2 = 5 + 2x3 − x4 − 3x1
    x5 = 7 − 3x4 − 4x1
    z = 5 + x3 − x4 − x1.
The entering variable is x3. However, neither of the two basic variables x2 and x5 puts
an upper bound on x3. Hence, we can increase x3 as much as we want without violating
feasibility. Set x3 = t for any positive number t, and we get the solution x1 = 0, x2 =
5 + 2t, x3 = t, x4 = 0, x5 = 7, and z = 5 + t. Since t can be made arbitrarily large, so can
z, and we conclude that the problem is unbounded. The same conclusion can be reached
in general: if there is no candidate for leaving the basis, then we can make the value of the
entering variable, and hence the value of the objective function, as large as we wish. In that
case, the problem is unbounded.
6.1.3 Degeneracy

The presence of more than one candidate for leaving the basis has interesting consequences.
For example, consider the dictionary

    x4 = 1 − 2x3
    x5 = 3 − 2x1 + 4x2 − 6x3
    x6 = 2 + x1 − 3x2 − 4x3
    z = 2x1 − x2 + 8x3.
Having chosen x3 as the entering variable, we see that x4, x5, and x6 are all candidates
for the leaving variable. Choosing x4 and pivoting, we get the new dictionary

    x3 = 0.5 − 0.5x4
    x5 = −2x1 + 4x2 + 3x4
    x6 = x1 − 3x2 + 2x4
    z = 4 + 2x1 − x2 − 4x4.
This dictionary is different from the others in one important aspect: along with the non-basic
variables, two of the basic variables, x5 and x6, have a value of zero. Basic solutions with one
or more basic variables at zero are called degenerate.

Although harmless in itself, degeneracy has annoying side effects. In the next iteration, we have
x1 as the entering variable, and x5 as the leaving variable. But the value of x1 can be
increased by a maximum of zero. Hence, the objective function value does not change.
Pivoting changes the dictionary to

    x1 = 2x2 + 1.5x4 − 0.5x5
    x3 = 0.5 − 0.5x4
    x6 = −x2 + 3.5x4 − 0.5x5
    z = 4 + 3x2 − x4 − x5,
but the solution remains the same. Simplex iterations that do not change the basic
solution are called degenerate. One can verify that the next iteration is also degenerate,
but the one after that is not; in fact, it produces the optimal solution.

Degeneracy is no mere accident. Many practical problems face degeneracy, and when it happens,
the simplex method goes through a few (often quite a few) degenerate iterations before coming
up with a non-degenerate solution. But there are occasions when this may not happen.
6.2 Cycling

Sometimes, a sequence of dictionaries can appear again and again. This phenomenon is called
cycling. To understand cycling, let us look at a series of dictionaries, starting from the initial dictionary

    x5 = −0.5x1 + 5.5x2 + 2.5x3 − 9x4
    x6 = −0.5x1 + 1.5x2 + 0.5x3 − x4
    x7 = 1 − x1
    z = 10x1 − 57x2 − 9x3 − 24x4.
Now, the sequence of dictionaries constructed in the first six iterations goes as follows.
After the first iteration (x1 enters, x5 leaves):

    x1 = 11x2 + 5x3 − 18x4 − 2x5
    x6 = −4x2 − 2x3 + 8x4 + x5
    x7 = 1 − 11x2 − 5x3 + 18x4 + 2x5
    z = 53x2 + 41x3 − 204x4 − 20x5.

All six iterations are degenerate, and after the sixth iteration the initial dictionary reappears; from there, the same sequence of dictionaries repeats forever, and the method never terminates.
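One way to see the role of pivoting rules here: the simplex sketch from Section 5.1 happens to use the smallest-subscript rule for both choices, so on this very example it terminates instead of cycling (the termination argument below explains why). A usage check, reusing that simplex function; Fraction handles the decimal coefficients exactly since they are all halves:

opt, x = simplex(A=[[0.5, -5.5, -2.5, 9],
                    [0.5, -1.5, -0.5, 1],
                    [1, 0, 0, 0]],
                 b=[0, 0, 1],
                 c=[10, -57, -9, -24])
print(opt, x)   # expected: 1, [1, 0, 1, 0]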
As we now show, cycling can be avoided altogether by breaking ties carefully. Suppose the entering and leaving variables are always selected by the smallest-subscript rule: among all candidates, pick the one with the smallest index. We claim that the simplex method then always terminates. Assume for contradiction that it cycles through a sequence of (necessarily degenerate) dictionaries. Call a variable fickle if it is non-basic in some dictionary of the cycle and basic in another, and let t be the largest index of a fickle variable. Somewhere in the cycle there is a dictionary D, with basis B, in which xt leaves and some fickle xs (with s < t) enters; elsewhere in the cycle there is a dictionary D* in which xt enters. Write D as

    xi = b̄i − Σ_{j ∉ B} āij xj,     i ∈ B
    z = v + Σ_{j ∉ B} c̄j xj,

and write D*, with basis B*, as

    xi = b*i − Σ_{j ∉ B*} a*ij xj,     i ∈ B*
    z = v* + Σ_{j ∉ B*} c*j xj.

Since all iterations leading from D to D* are degenerate, the objective function z must have
the same value v in both D and D*. Thus, the z-row of D* may be recorded as
    z = v + Σ_{j=1}^{m+n} c*j xj,
where c*j = 0 whenever xj is basic in D*. Since this equation has been obtained from D
by algebraic manipulations, it must be satisfied by every solution of D. In particular, it must be
satisfied by

    xs = y,  xj = 0 (j ∉ B, j ≠ s),  xi = b̄i − āis y (i ∈ B),  z = v + c̄s y

for every choice of y.
Thus, we have

    v + c̄s y = v + c*s y + Σ_{i ∈ B} c*i (b̄i − āis y)

for every choice of y. This can hold for every y only if the coefficient of y on the two sides agrees, so we have

    c̄s − c*s + Σ_{i ∈ B} c*i āis = 0.
But xs is entering in D, implying c̄s > 0. Since xs is not entering in D* and yet s < t, we
have c*s ≤ 0. Hence for some r ∈ B, c*r ārs < 0. Note two points now:

1. Since r ∈ B, the variable xr is basic in D; since c*r ≠ 0, the same variable is non-basic
in D*. Hence, xr is fickle, and we have r ≤ t.

2. r ≠ t: since xt is leaving in D, we have āts > 0, and so c*t āts > 0 (since c*t > 0, with xt
entering in D*).

This shows that r < t and yet xr is not entering in D*. Thus, c*r ≤ 0, and hence ārs > 0. Since
all iterations from D to D* are degenerate, the two dictionaries describe the same solution.
Since xr is non-basic in D*, its value is zero in both D and D*, meaning b̄r = 0. Hence, xr
was a candidate for leaving the basis of D, yet we picked xt even though r < t. This
contradicts the smallest-subscript rule.
6.3 Initialization
The only remaining point that needs to be explained is getting hold of an initial feasible
dictionary for a problem

    max Σ_{j=1}^n cj xj
    s.t.
        Σ_{j=1}^n aij xj ≤ bi,      i ∈ {1, . . . , m}
        xj ≥ 0,                     j ∈ {1, . . . , n}
with an infeasible origin. The problem with an infeasible origin is that we may not know
whether a feasible solution exists at all, and even if one does, we may not know a feasible dictionary
for it. One way of getting around these two problems is the so-called auxiliary
problem:

    min x0
    s.t.
        Σ_{j=1}^n aij xj − x0 ≤ bi,     i ∈ {1, . . . , m}
        xj ≥ 0,                         j ∈ {0, 1, . . . , n}.
A feasible solution of the auxiliary problem is readily available: set xj = 0 for j ≠ 0 and
make the value of x0 sufficiently large.
Theorem 3 The original LP problem has a feasible solution if and only if the optimal value
of the associated auxiliary problem is zero.

Proof : If the original problem has a feasible solution, then the auxiliary problem has the
same feasible solution with x0 = 0. Since x0 ≥ 0 in every feasible solution, this is clearly optimal. Conversely, if the auxiliary
problem has a feasible (optimal) solution with x0 = 0, then the original problem has the
same feasible solution.
Hence, our objective is to solve the auxiliary problem. Consider the following example.

    max x1 − x2 + x3
    s.t.
        2x1 − x2 + 2x3 ≤ 4
        2x1 − 3x2 + x3 ≤ −5
        −x1 + x2 − 2x3 ≤ −1
        x1, x2, x3 ≥ 0.
To avoid unnecessary confusion, we write the auxiliary problem in its maximization form
(max w = −x0), and construct the dictionary as

    x4 = 4 − 2x1 + x2 − 2x3 + x0
    x5 = −5 − 2x1 + 3x2 − x3 + x0
    x6 = −1 + x1 − x2 + 2x3 + x0
    w = −x0,

which is an infeasible dictionary. But it can be made feasible by choosing x0 as the entering
variable and pivoting on the row with the most negative constant term, i.e., the x5 row in this case. The new
(feasible) dictionary is:

    x0 = 5 + 2x1 − 3x2 + x3 + x5
    x4 = 9 − 2x2 − x3 + x5
    x6 = 4 + 3x1 − 4x2 + 3x3 + x5
    w = −5 − 2x1 + 3x2 − x3 − x5.

In general, the auxiliary problem gives the dictionary

    xn+i = bi − Σ_{j=1}^n aij xj + x0,     i ∈ {1, . . . , m}
    w = −x0,

which is infeasible whenever some bi < 0. However, this can be transformed into a feasible dictionary. This is done
by a single pivot in which x0 enters and the most infeasible xn+i leaves. More precisely,
the leaving variable is that xn+k whose bk has the largest negative value among all negative
bi's. After pivoting, x0 assumes the positive value −bk, and each basic xn+i assumes the
non-negative value bi − bk. Now, we can continue with our simplex method. In our example,
two more iterations yield the following dictionary:
    x3 = 1.6 − 0.2x1 + 0.2x5 + 0.6x6 − 0.8x0
    x2 = 2.2 + 0.6x1 + 0.4x5 + 0.2x6 − 0.6x0
    x4 = 3 − x1 − x6 + 2x0
    w = −x0.
This dictionary describes an optimal solution of the auxiliary problem with x0 = 0. Further, it
points to a feasible dictionary of the original problem: drop the x0 terms and express z in terms of the non-basic variables.

    x3 = 1.6 − 0.2x1 + 0.2x5 + 0.6x6
    x2 = 2.2 + 0.6x1 + 0.4x5 + 0.2x6
    x4 = 3 − x1 − x6
    z = −0.6 + 0.2x1 − 0.2x5 + 0.4x6.
So, we learned how to construct the auxiliary problem and its first feasible dictionary.
In the process of solving the auxiliary problem, whenever x0 is a candidate for the
leaving variable, we pick it. Immediately after such a pivot, x0 is a non-basic variable, and
w = 0. Clearly, this is an optimal solution of the auxiliary problem. However, we may also reach an optimal dictionary of the
auxiliary problem with x0 basic. If the value of w is non-zero there, then we simply conclude
that the original problem is infeasible. The remaining case, x0 basic and the optimal value of w equal to zero,
is not possible. Since the dictionary preceding the final dictionary was
not optimal, the value of w = −x0 must have changed from some negative value to zero
in the final iteration. To put it differently, the value of x0 must have changed from some
positive level to zero in this iteration. This means x0 was also a candidate for leaving the
basis, and we should have picked it according to our policy. This is a contradiction.
Hence, we either construct an optimal solution of the auxiliary problem where x0 is a
non-basic variable, and we proceed to the original problem by constructing a new feasible
dictionary, or we conclude that the original problem is infeasible.
This strategy of solving an LP is known as the two-phase simplex method. In the
first phase, we set up and solve the auxiliary problem; if we find an optimal solution of the
auxiliary problem with x0 non-basic, then we proceed to the second phase: solving the original problem
starting from the feasible dictionary just constructed.
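As a numerical cross-check of the example above, here is a sketch assuming SciPy (linprog performs its own initialization internally, so no auxiliary problem needs to be set up by hand):

from scipy.optimize import linprog

res = linprog(c=[-1, 1, -1],                       # max x1 - x2 + x3
              A_ub=[[2, -1, 2], [2, -3, 1], [-1, 1, -2]],
              b_ub=[4, -5, -1])
print(res.x, -res.fun)   # expected: x = (0, 2.8, 3.4) with z = 0.6

Continuing phase 2 by hand from the feasible dictionary above (x6 enters, x4 leaves) indeed gives z = 0.6 at (x1, x2, x3) = (0, 2.8, 3.4).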
Theorem 4 (Fundamental Theorem of Linear Programming) Every LP problem in
the standard form has the following three properties:
1. If it has no optimal solution, then it is either unbounded or infeasible.
2. If it has a feasible solution, then it has a basic feasible solution.
3. If it has an optimal solution, then it has a basic optimal solution.
Proof : The first phase of the two-phase simplex method either discovers that the problem
is infeasible or else delivers a basic feasible solution. The second phase either discovers
that the problem is unbounded or delivers a basic optimal solution.
Note that if the problem is not in standard form then the theorem does not hold, e.g.,
max x s.t. x < 0 has no optimal solution even though it is neither infeasible nor unbounded.
6.4 An Example

We give an example to illustrate how the simplex method works. Consider the following
linear program.
    max_{x1,x2} x1 + 2x2
    s.t.
        x1 + x2 ≤ 4
        x2 ≤ 2
        x1, x2 ≥ 0.

The feasible region for this LP is shown in Figure 3. Clearly, no first phase is required
here since the origin is a feasible solution. Hence, the first dictionary looks as follows (x3
and x4 are the slack variables).

    x3 = 4 − x1 − x2
    x4 = 2 − x2
    z = x1 + 2x2.
[Figure 3: Feasible region of the LP, with extreme points (0, 0), (4, 0), (2, 2), and (0, 2), constraint lines x1 + x2 = 4 and x2 = 2, and the objective direction x1 + 2x2.]
7 Polyhedra and Polytopes

Polyhedra are a special class of closed convex sets. We have already shown (in the assignments)
that a closed convex set is an intersection of (possibly infinitely many) half-spaces. If
it is the intersection of a finite number of half-spaces, then it is called a polyhedron.

Definition 1 A set P ⊆ Rn is called a polyhedron if there exist an m × n matrix A and
a vector b ∈ Rm such that P = {x ∈ Rn : Ax ≤ b}.

A polytope is the convex hull of a finite set of points.

Definition 2 A set P ⊆ Rn is called a polytope if there exist a finite number of vectors
x1, . . . , xt ∈ Rn such that P = H(x1, . . . , xt), where H(·) denotes the convex hull.
8 Extreme Points

Let P = {x ∈ Rn : Ax ≤ b} be a polyhedron. A point z ∈ P is an extreme point of P if it cannot be written as a convex combination of two other points of P. For z ∈ P, let Az denote the submatrix of A consisting of those rows ai that are tight at z, i.e., ai z = bi, and let r(Az) denote its rank.

Theorem 5 A point z ∈ P is an extreme point of P if and only if r(Az) = n.
Proof : Suppose z is an extreme point of P. Assume for contradiction r(Az) < n. This
means there exists a vector x ∈ Rn with x ≠ 0 such that Az x = 0. By definition, for all rows
ai not in Az we have ai z < bi. This means there exists an ε > 0 such that for every ai not in
Az we have

    ai (z + εx) ≤ bi  and  ai (z − εx) ≤ bi.

To see why this is true, consider such a row ai. Suppose ai x ≤ 0. Then ai z + ε ai x ≤ ai z < bi. Also, since ε can be chosen arbitrarily small and ai z < bi holds strictly, ai z − ε ai x ≤ bi.
Analogously, if ai x ≥ 0, we will have ai z + ε ai x ≤ bi and ai z − ε ai x < bi.

Since Az x = 0, the rows in Az satisfy these inequalities with equality, and we get A(z + εx) ≤ b and A(z − εx) ≤ b. Hence, z + εx and
z − εx belong to P. Since z is a convex combination of these two points, z cannot be an
extreme point. This is a contradiction.
Suppose r(Az) = n. Assume for contradiction that z is not an extreme point of P. Then
there exist x, y ∈ P, both different from z, with x ≠ y and 0 < λ < 1 such that z = λx + (1 − λ)y. Then for
every row ai in Az we can write ai x ≤ bi = ai z = ai (λx + (1 − λ)y). Rearranging, we get
ai (x − y) ≤ 0. Similarly, ai y ≤ bi = ai z = ai (λx + (1 − λ)y) gives us ai (x − y) ≥ 0.
Hence, ai (x − y) = 0. This implies that Az (x − y) = 0. But x ≠ y then implies that r(Az) < n,
which is a contradiction.
Remark: This theorem implies that a polyhedron has only a finite number of extreme
points. This follows from the fact that A has only a finite number of submatrices Az.

Remark: Also, if the number of linearly independent rows (constraints) of A is less than
n, then the rank of every submatrix of A is less than n. In that case, the polyhedron has
no extreme points. In two dimensions, for instance, if the constraints are all parallel lines, then the rank
of any submatrix is one, and clearly we cannot have any extreme point. Hence, if z is an
extreme point of P, then we must have at least as many constraints as variables.

Remark: Suppose z and z′ are two distinct extreme points of a polyhedron. Then Az and
Az′ are distinct. Else, we would have Az (z − z′) = 0, and z − z′ ≠ 0 would imply that r(Az) < n.
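A small numerical illustration of Theorem 5, sketched with numpy on the LP of Figure 3 written as Ax ≤ b (non-negativity folded into the constraints); the test points are my own choices:

import numpy as np

A = np.array([[1, 1], [0, 1], [-1, 0], [0, -1]], dtype=float)
b = np.array([4, 2, 0, 0], dtype=float)

for z in [(2, 2), (0, 2), (3, 1)]:
    zv = np.array(z, dtype=float)
    tight = np.isclose(A @ zv, b)          # rows of A_z: constraints tight at z
    rank = np.linalg.matrix_rank(A[tight]) if tight.any() else 0
    print(z, rank == 2)                    # extreme iff r(A_z) = n = 2
# expected: (2, 2) True, (0, 2) True, (3, 1) False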
The result can be used to prove the following theorem, which we state without proof.

Theorem 6 Let P be a bounded polyhedron with extreme points x1, . . . , xt. Then P =
H(x1, . . . , xt), i.e., every bounded polyhedron is a polytope. Moreover, every polytope is a
bounded polyhedron.

Theorem 7 A feasible solution described by a feasible dictionary (i.e., a basic feasible solution) of a linear program max cx subject to x ∈ {x ∈ Rn : Ax ≤ b} is an extreme point of
{x ∈ Rn : Ax ≤ b}.
Proof : Let z be a feasible solution of max cx s.t. Ax ≤ b (x ≥ 0 is folded into the
constraints) described by a feasible dictionary. An implication of this is that exactly n non-basic
variables have value zero in the dictionary.

Suppose, for contradiction, r(Az) < n. Then, we know (from the earlier proof) that there exist ε > 0 and
x ≠ 0 with Az x = 0 such that z + εx and z − εx both belong to the feasible set. The n constraints stating that the non-basic variables are non-negative are tight at z, so Az x = 0 forces every non-basic coordinate of x to be zero. But the dictionary equations determine the basic variables uniquely from the non-basic ones, so z + εx = z, i.e., x = 0, a contradiction. Hence r(Az) = n, and by Theorem 5, z is an extreme point.
9 Matrix Description of Dictionaries

The objective of this section is to write a dictionary in matrix form. Consider the linear program with objective 19x1 + 13x2 + 12x3 + 17x4 and the constraints whose slack-variable form is given in (9) below, together with the dictionary

    x1 = 54 − 0.5x2 − 0.5x4 − 0.5x5 + 0.5x6
    x3 = 63 − 0.5x2 − 0.5x4 + 0.5x5 − 1.5x6              (EX)
    x7 = 15 + 0.5x2 − 0.5x4 + 0.5x5 + 2.5x6
    z = 1782 − 2.5x2 + 1.5x4 − 3.5x5 − 8.5x6.
The slack variables are x5, x6, x7, and they convert the inequalities to equations:

    3x1 + 2x2 + x3 + 2x4 + x5 = 225
    x1 + x2 + x3 + x4 + x6 = 117                          (9)
    4x1 + 3x2 + 3x3 + 4x4 + x7 = 420.
The given dictionary is obtained by solving for x1, x3, and x7 from these equations. In matrix
terms, the solution may be described very compactly. First, we record the system as Ax = b,
where

    A = [ 3 2 1 2 1 0 0 ]
        [ 1 1 1 1 0 1 0 ]
        [ 4 3 3 4 0 0 1 ]

    b = (225, 117, 420)ᵀ,   x = (x1, x2, x3, x4, x5, x6, x7)ᵀ.

Next, we collect the basic variables in xB = (x1, x3, x7)ᵀ and the non-basic variables in
xN = (x2, x4, x5, x6)ᵀ, and split the columns of A accordingly:

    AB = [ 3 1 0 ]        AN = [ 2 2 1 0 ]
         [ 1 1 0 ]             [ 1 1 0 1 ]
         [ 4 3 1 ],            [ 3 4 0 0 ].

Then Ax = b reads AB xB + AN xN = b, and solving for xB gives

    xB = AB^{-1} b − AB^{-1} AN xN.
This is a compact record of the equations in the given dictionary. To write the objective
function in matrix terms, we write it as cx with c = [19, 13, 12, 17, 0, 0, 0], or more precisely
as cB xB + cN xN, where cB = [19, 12, 0] and cN = [13, 17, 0, 0]. Substituting for xB we get

    z = cB (AB^{-1} b − AB^{-1} AN xN) + cN xN = cB AB^{-1} b + (cN − cB AB^{-1} AN) xN.
The only thing that we have not proved so far is that AB is non-singular. This is equivalent
to proving that the system of equations AB xB = b has a unique solution. We already know
that there is a solution: let x* be the solution corresponding to the dictionary with B as the
set of basic variables; then Ax* = b, or AB x*B + AN x*N = b, or AB x*B = b (since x*N is zero).
Suppose there is another solution x̃B. Create x̃ from it by setting x̃N to zero. Since AB x̃B = b
and x̃N = 0, we get AB x̃B + AN x̃N = A x̃ = b. So x̃ satisfies the original system of
equations, and hence must satisfy every dictionary generated in the simplex method. But a
dictionary determines the basic variables uniquely once the non-basic variables are fixed, so
x̃N = 0 implies x̃ = x*, i.e., x̃B = x*B.
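The matrix identities are easy to check numerically. A sketch with numpy, using the basis B = {x1, x3, x7} of the dictionary (EX):

import numpy as np

A = np.array([[3, 2, 1, 2, 1, 0, 0],
              [1, 1, 1, 1, 0, 1, 0],
              [4, 3, 3, 4, 0, 0, 1]], dtype=float)
b = np.array([225, 117, 420], dtype=float)
c = np.array([19, 13, 12, 17, 0, 0, 0], dtype=float)

B, N = [0, 2, 6], [1, 3, 4, 5]      # 0-based columns of (x1,x3,x7) and (x2,x4,x5,x6)
AB, AN = A[:, B], A[:, N]
ABinv = np.linalg.inv(AB)

print(ABinv @ b)                    # constants of the dictionary: [54. 63. 15.]
print(-ABinv @ AN)                  # coefficients of x_N in the x_B rows
print(c[B] @ ABinv @ b)             # objective constant: 1782.0
print(c[N] - c[B] @ ABinv @ AN)     # z-row coefficients: [-2.5  1.5 -3.5 -8.5]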
10 Duality
Duality is probably the most used concept of linear programming in both theory and practice.
The central motivation to look for a dual is the following: How do we find bounds on
the objective function of a linear program without solving it completely? To
understand further, let us look at the following example.
    max 4x1 + x2 + 5x3 + 3x4
    s.t.
        x1 − x2 − x3 + 3x4 ≤ 1
        5x1 + x2 + 3x3 + 8x4 ≤ 55
        −x1 + 2x2 + 3x3 − 5x4 ≤ 3
        x1, x2, x3, x4 ≥ 0.
Rather than solving this LP, let us try to find bounds on its optimal value z*. For example, (0, 0, 0, 0) is a feasible solution; hence z* ≥ 0. Another feasible solution
is (0, 0, 1, 0), which gives z* ≥ 5. Another feasible solution is (3, 0, 2, 0), which gives z* ≥ 22.
But there was no systematic way in which we were looking for these estimates; it was purely
guesswork. Duality provides a systematic way of getting such estimates.

Let us multiply the second constraint by 5/3, which gives us

    (25/3)x1 + (5/3)x2 + 5x3 + (40/3)x4 ≤ 275/3.
But notice that, since x1, x2, x3, x4 ≥ 0,

    4x1 + x2 + 5x3 + 3x4 ≤ (25/3)x1 + (5/3)x2 + 5x3 + (40/3)x4 ≤ 275/3,

so z* ≤ 275/3.
More generally, we may multiply constraint i by a multiplier yi and add the three constraints up, which gives

    (y1 + 5y2 − y3)x1 + (−y1 + y2 + 2y3)x2 + (−y1 + 3y2 + 3y3)x3 + (3y1 + 8y2 − 5y3)x4 ≤ y1 + 55y2 + 3y3.    (10)

Of course, each of these multipliers must be non-negative. Next, we want to use the LHS
of Equation (10) as an upper bound on 4x1 + x2 + 5x3 + 3x4. This can be justified only
if, in (10), the coefficient of each xi is at least as big as the corresponding coefficient in the
objective function:

    y1 + 5y2 − y3 ≥ 4
    −y1 + y2 + 2y3 ≥ 1
    −y1 + 3y2 + 3y3 ≥ 5
    3y1 + 8y2 − 5y3 ≥ 3.
If the multipliers are non-negative (note here that if the constraints were equalities, then we
would not need the multipliers to be non-negative; they could be free) and if they satisfy these
inequalities, then we get an upper bound on the objective function, i.e., for every feasible
solution (x1, x2, x3, x4) of the original problem and every feasible solution (y1, y2, y3) of the
previous set of inequalities, we have

    4x1 + x2 + 5x3 + 3x4 ≤ y1 + 55y2 + 3y3.

In particular, the optimal value z* of the original LP satisfies

    z* ≤ y1 + 55y2 + 3y3.
Of course, we want this bound to be as close to the optimal value as possible. This can be done by
minimizing y1 + 55y2 + 3y3. So, we are led to another LP problem that gives us an upper
bound on the original problem:

    min y1 + 55y2 + 3y3
    s.t.
        y1 + 5y2 − y3 ≥ 4
        −y1 + y2 + 2y3 ≥ 1
        −y1 + 3y2 + 3y3 ≥ 5
        3y1 + 8y2 − 5y3 ≥ 3
        y1, y2, y3 ≥ 0.
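Solving both problems numerically confirms that the bound is tight here. A sketch assuming SciPy, with both problems converted to linprog's minimization / ≤ form:

from scipy.optimize import linprog

primal = linprog(c=[-4, -1, -5, -3],        # max 4x1 + x2 + 5x3 + 3x4
                 A_ub=[[1, -1, -1, 3],
                       [5, 1, 3, 8],
                       [-1, 2, 3, -5]],
                 b_ub=[1, 55, 3])
dual = linprog(c=[1, 55, 3],                # min y1 + 55y2 + 3y3
               A_ub=[[-1, -5, 1],           # the >= constraints, negated
                     [1, -1, -2],
                     [1, -3, -3],
                     [-3, -8, 5]],
               b_ub=[-4, -1, -5, -3])
print(-primal.fun, dual.fun)                # expected: 29.0 29.0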
10.1 The Dual Problem

From our discussion of the example, it is clear how to write the dual of an original problem.
In general, the dual problem of
    max Σ_{j=1}^n cj xj
    s.t.                                                  (P)
        Σ_{j=1}^n aij xj ≤ bi,      i ∈ {1, . . . , m}
        xj ≥ 0,                     j ∈ {1, . . . , n}

is

    min Σ_{i=1}^m bi yi
    s.t.                                                  (D)
        Σ_{i=1}^m aij yi ≥ cj,      j ∈ {1, . . . , n}
        yi ≥ 0,                     i ∈ {1, . . . , m}.
Lemma 1 (Weak Duality) If (x1, . . . , xn) is a feasible solution of (P) and (y1, . . . , ym) is a feasible solution of (D), then

    Σ_{j=1}^n cj xj ≤ Σ_{i=1}^m bi yi.
Proof :

    Σ_{j=1}^n cj xj ≤ Σ_{j=1}^n (Σ_{i=1}^m aij yi) xj = Σ_{i=1}^m (Σ_{j=1}^n aij xj) yi ≤ Σ_{i=1}^m bi yi,

where the first inequality uses dual feasibility and x ≥ 0, and the second uses primal feasibility and y ≥ 0.
Lemma 1 is useful since, if we find feasible solutions of (P) and (D) at which their objective
functions are equal, then we can conclude that they are optimal solutions. Indeed, Lemma
1 implies that if (x*1, . . . , x*n) is a feasible solution of (P) and (y*1, . . . , y*m) is a feasible
solution of (D) such that Σ_{j=1}^n cj x*j = Σ_{i=1}^m bi y*i, then for every feasible solution (x1, . . . , xn)
of (P) and for every feasible solution (y1, . . . , ym) of (D), we can write

    Σ_{j=1}^n cj xj ≤ Σ_{i=1}^m bi y*i = Σ_{j=1}^n cj x*j   and   Σ_{i=1}^m bi yi ≥ Σ_{j=1}^n cj x*j = Σ_{i=1}^m bi y*i.

11 The Duality Theorem

The remarkable fact is that such a pair of feasible solutions with equal objective values exists whenever (P) has an optimal solution.
The explicit version of the theorem is due to Gale, but it is supposed to have originated from
conversations between Dantzig and von Neumann in the fall of 1947.
Theorem 8 (The Duality Theorem / Strong Duality) Let (x*1, . . . , x*n) be a feasible solution of (P) and (y*1, . . . , y*m) be a feasible solution of (D). Then (x*1, . . . , x*n) is an optimal
solution of (P) and (y*1, . . . , y*m) is an optimal solution of (D) if and only if

    Σ_{j=1}^n cj x*j = Σ_{i=1}^m bi y*i.                    (SD)
Before presenting the proof, let us illustrate the crucial point of the theorem: the optimal
solution of (D) can be read off the z-row of the final dictionary for (P). For our example,
the final dictionary of (P) is

    x2 = 14 − 2x1 − 4x3 − 5x5 − 3x7
    x4 = 5 − x1 − x3 − 2x5 − x7
    x6 = 1 + 5x1 + 9x3 + 21x5 + 11x7
    z = 29 − x1 − 2x3 − 11x5 − 6x7.

Note that the slack variables x5, x6, x7 can be matched with the dual variables y1, y2, y3 in
a natural way. In the z-row of the dictionary, the coefficients of these slack variables are
(−11, 0, −6). As it turns out, the optimal dual solution is obtained by reversing the signs of
these coefficients, i.e., (11, 0, 6). The proof of the duality theorem works on this logic.
Proof : Suppose Equation (SD) holds. Assume for contradiction that (x*1, . . . , x*n) is not an
optimal solution of (P). Then there is a feasible solution (x1, . . . , xn) with Σ_{j=1}^n cj xj > Σ_{j=1}^n cj x*j = Σ_{i=1}^m bi y*i, contradicting Lemma 1. The argument for (y*1, . . . , y*m) is analogous.

Conversely, suppose (x*1, . . . , x*n) is an optimal solution of (P); we construct a feasible solution of (D) satisfying (SD). Introduce the slack variables

    xn+i = bi − Σ_{j=1}^n aij xj,     i ∈ {1, . . . , m}.

Since an optimal solution exists, the simplex method finds it, and the z-row of the final
dictionary reads

    z = z* + Σ_{k=1}^{m+n} c̄k xk,

where c̄k ≤ 0 for every k (otherwise the dictionary would not be final), c̄k = 0 whenever xk is basic, and

    z* = Σ_{j=1}^n cj x*j

is the optimal value. We claim that

    y*i = −c̄n+i,     i ∈ {1, . . . , m}

is the required dual solution. Substituting the definitions of the slack variables into the z-row, we get

    Σ_{j=1}^n cj xj = z* + Σ_{j=1}^n c̄j xj − Σ_{i=1}^m y*i (bi − Σ_{j=1}^n aij xj)
                    = (z* − Σ_{i=1}^m bi y*i) + Σ_{j=1}^n (c̄j + Σ_{i=1}^m aij y*i) xj.

This equation has been obtained from algebraic manipulations of the definitions of the slack variables
and the objective function, and must hold for all values of x1, . . . , xn. Hence, we have

    z* = Σ_{i=1}^m bi y*i   and   cj = c̄j + Σ_{i=1}^m aij y*i,     j ∈ {1, . . . , n}.

Since c̄j ≤ 0 and c̄n+i ≤ 0 for all j ∈ {1, . . . , n} and i ∈ {1, . . . , m}, we conclude

    Σ_{i=1}^m aij y*i ≥ cj,     j ∈ {1, . . . , n}
    y*i ≥ 0,                    i ∈ {1, . . . , m},

i.e., (y*1, . . . , y*m) is a feasible solution of (D), and it satisfies Σ_{j=1}^n cj x*j = z* = Σ_{i=1}^m bi y*i.
11.1 Relations Between the Primal and the Dual
First, notice that the dual of the dual problem is the original primal problem, i.e., the dual of (D)
is (P). A nice corollary to this observation is that a linear program has an optimal
solution if and only if its dual has an optimal solution.
By Lemma 1, if the primal problem is unbounded, then the dual problem is infeasible.
To see this, assume for contradiction that the dual is feasible when the primal is unbounded.
This means, the feasible dual solution provides an upper bound on the optimal value of the
primal problem. This is a contradiction since the primal problem is unbounded. By the
same argument, if the dual is unbounded, then the primal is infeasible.
This also shows that if the primal and dual are both feasible, then they both have optimal
solutions, i.e., none of them is unbounded. However, both the primal and the dual can be
infeasible. For example,
    max 2x1 − x2
    s.t.
        x1 − x2 ≤ 1
        −x1 + x2 ≤ −2
        x1, x2 ≥ 0

and its dual are both infeasible. We summarize these observations in Table 11.1.
Duality has important practical applications. In certain cases, it may be better to solve
the dual problem than the primal problem, and then read the primal solution off the last
row of the final dictionary. For example, for a primal problem with 100 constraints and two
variables, the dual has only two constraints, and is typically much easier to solve.
Table 11.1: Possible combinations of primal and dual outcomes.

                         Dual optimal    Dual infeasible    Dual unbounded
    Primal optimal         possible       impossible         impossible
    Primal infeasible     impossible       possible           possible
    Primal unbounded      impossible       possible          impossible
11.3 Complementary Slackness
The question we ask in this section is: given a feasible solution of the primal problem (P) and
a feasible solution of the dual problem (D), are there conditions under which these solutions
are optimal? The following theorem answers this question.
Theorem 10 (Complementary Slackness) Let (x*1, . . . , x*n) and (y*1, . . . , y*m) be feasible solutions of (P) and (D) respectively. Then (x*1, . . . , x*n) is an optimal solution of (P) and
(y*1, . . . , y*m) is an optimal solution of (D) if and only if

    [Σ_{i=1}^m aij y*i − cj] x*j = 0     for all j ∈ {1, . . . , n}        (CS-1)
    [bi − Σ_{j=1}^n aij x*j] y*i = 0     for all i ∈ {1, . . . , m}.       (CS-2)
Proof : Since x* and y* are feasible solutions of (P) and (D), we have

    [Σ_{i=1}^m aij y*i − cj] x*j ≥ 0     for all j ∈ {1, . . . , n}
    [bi − Σ_{j=1}^n aij x*j] y*i ≥ 0     for all i ∈ {1, . . . , m}.
Now suppose that x* and y* are optimal, so that Σ_{j=1}^n cj x*j = Σ_{i=1}^m bi y*i by Theorem 8. Assume for contradiction that one of the inequalities
in the first set of constraints is not tight. In that case, adding up all the constraints in the
first set will give us

    0 < Σ_{j=1}^n (Σ_{i=1}^m aij y*i) x*j − Σ_{j=1}^n cj x*j = Σ_{i=1}^m (Σ_{j=1}^n aij x*j) y*i − Σ_{j=1}^n cj x*j ≤ Σ_{i=1}^m bi y*i − Σ_{j=1}^n cj x*j = 0,

which is a contradiction; here the middle inequality uses the second set of constraints. The same argument applies if one of the inequalities in the second set is not tight. Hence, (CS-1) and (CS-2) hold at optimality.

Conversely, if (CS-1) and (CS-2) hold, then adding them up gives

    Σ_{j=1}^n cj x*j = Σ_{j=1}^n (Σ_{i=1}^m aij y*i) x*j = Σ_{i=1}^m (Σ_{j=1}^n aij x*j) y*i = Σ_{i=1}^m bi y*i,

and optimality of both solutions follows from Lemma 1.
Theorem 10 gives us a certificate for proving optimality. The idea is clear from our earlier
interpretation of the optimal dual variable values as the negatives of the coefficients of the slack variables in
the objective function row of the final simplex dictionary. If a dual variable has a positive
optimal value, then the coefficient of the corresponding slack variable in the z-row is negative. This implies
that the slack variable is non-basic in the final simplex dictionary. Hence, its value is zero in
the primal optimal solution, i.e., the corresponding constraint is binding at the
optimal solution. Similarly, if some constraint is non-binding, then the corresponding slack
variable has a positive value in the optimal solution. This implies that the slack variable is
basic in the final simplex dictionary, which further implies that its coefficient in the
objective function row is zero. Hence, the corresponding dual variable has value zero.
Consider the following example.

    max x1 + x2
    s.t.
        x1 ≤ 1
        2x1 + 3x2 ≤ 6
        x1, x2 ≥ 0.

Consider an optimal solution (x*1, x*2) of this LP, and assume that x*1 < 1. Now, let (y*1, y*2) be
a dual optimal solution. It provides the bound x*1 + x*2 ≤ (y*1 + 2y*2)x*1 + (3y*2)x*2 ≤ y*1 + 6y*2,
and if y*1 > 0 the overall inequality is strict (since x*1 < 1 makes the constraint x1 ≤ 1 slack). By strong duality, a strict
inequality is not possible, so we must have y*1 = 0. This is
exactly the idea behind complementary slackness.
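The conditions (CS-1) and (CS-2) are also easy to verify mechanically. A sketch with numpy and SciPy, on the example LP of Section 10 (the optima are the ones read off earlier):

import numpy as np
from scipy.optimize import linprog

A = np.array([[1, -1, -1, 3], [5, 1, 3, 8], [-1, 2, 3, -5]], dtype=float)
b = np.array([1, 55, 3], dtype=float)
c = np.array([4, 1, 5, 3], dtype=float)

x = linprog(-c, A_ub=A, b_ub=b).x        # primal optimum: (0, 14, 0, 5)
y = linprog(b, A_ub=-A.T, b_ub=-c).x     # dual optimum: (11, 0, 6)

print(np.allclose((A.T @ y - c) * x, 0))  # (CS-1) holds: True
print(np.allclose((b - A @ x) * y, 0))    # (CS-2) holds: True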
11.4 Economic Interpretation

In optimization, the dual variables are often called Lagrange multipliers. In economics, the
dual variables are interpreted as prices of resources, where the resources correspond to constraints. Consider
the following example.
Suppose there are n products that a firm can manufacture, each requiring the use
of m resources. To manufacture one unit of product j, the firm needs aij units of resource i (naturally,
it makes sense to assume aij ≥ 0 here, but one need not). The amount of resource i available
is bi. The market price of product j is cj (again, both bi and cj can be assumed to be non-negative in this story). The firm needs to decide how much of each product to manufacture
in order to maximize its revenue subject to the resource constraints. The problem can be formulated as
    max Σ_{j=1}^n cj xj
    s.t.                                                  (PE)
        Σ_{j=1}^n aij xj ≤ bi,      i ∈ {1, . . . , m}
        xj ≥ 0,                     j ∈ {1, . . . , n}.
Now, suppose an investor wants to buy this firm's resources. He proposes a price for
every resource; in particular, he proposes a price of yi for resource i. Moreover, the investor
promises that he will set his prices high enough that the firm gets at least as much by
selling the resources as it would by turning the resources into products and then selling them
at the price vector c. Hence, the following constraints must hold:

    Σ_{i=1}^m aij yi ≥ cj,     j ∈ {1, . . . , n}.

Another way to think of these constraints is that if the constraint for product j did not
hold, then the firm would not sell the resources required to produce product j: by producing and selling product j
in the market, it would get a per-unit revenue of cj, which would be higher than Σ_{i=1}^m aij yi, the per-unit revenue from selling the resources.
Of course, all prices offered by the investor must be non-negative. The investor will now
try to minimize the total price he needs to pay to buy the resources. Hence, he should
minimize Σ_{i=1}^m bi yi.
In particular, the investor will solve the following linear programming problem (DE).

    min Σ_{i=1}^m bi yi
    s.t.                                                  (DE)
        Σ_{i=1}^m aij yi ≥ cj,      j ∈ {1, . . . , n}
        yi ≥ 0,                     i ∈ {1, . . . , m}.
The strong duality theorem says that the optimal value of the investor's cost equals the optimal
value of the firm's revenue. The dual variables are thus prices for the resources of the primal
problem.
Now, let us interpret the complementary slackness conditions here. Suppose the firm
has an optimal production plan in place. If this plan does not use resource i
completely, i.e., the constraint corresponding to i is not binding, then the price of resource i in the dual optimal solution is zero. Conversely, if the price offered for resource i is strictly positive, then
the optimal production plan must be using all of resource i. Intuitively, the demand for input i
is high, leading to a positive price.