$$T_{\mathrm{avg}}(n) = \sum_{|I|=n} p(I)\,T(I),$$
where the sum is over all inputs I of size n and p(I) denotes the probability of input I.
We will almost always work with worst-case running time. This is because for many of the problems
we will work with, average-case running time is just too difficult to compute, and it is difficult to specify
a natural probability distribution on inputs that is really meaningful for all applications. It turns out
that for most of the algorithms we will consider, there will be only a constant factor difference between
worst-case and average-case times.
Running Time of the Brute Force Algorithm: Let us agree that the input size is n, and for the running
time we will count the number of times that any element of P is accessed. Clearly we go through the
outer loop n times, and for each time through this loop, we go through the inner loop n times as well.
The condition in the if-statement makes four accesses to P. (Under C semantics, not all four need be
evaluated, but let's ignore this since it will just complicate matters.) The output statement makes two
accesses (to P[i].x and P[i].y) for each point that is output. In the worst case every point is maximal
(can you see how to generate such an example?) so these two accesses are made for each time through
the outer loop.
Thus we might express the worst-case running time as a pair of nested summations, one for the i-loop
and the other for the j-loop:
$$T(n) = \sum_{i=1}^{n}\left(2 + \sum_{j=1}^{n} 4\right).$$
These are not very hard summations to solve. The inner sum $\sum_{j=1}^{n} 4$ is just 4n, and so
$$T(n) = \sum_{i=1}^{n}(4n + 2) = (4n + 2)n = 4n^2 + 2n.$$
As mentioned before, we will not care about the small constant factors. Also, we are most interested in
what happens as n gets large. Why? Because when n is small, almost any algorithm is fast enough.
It is only for large values of n that running time becomes an important issue. When n is large, the n²
term will be much larger than the n term, and so it will dominate the running time. We will sum this
analysis up by simply saying that the worst-case running time of the brute force algorithm is Θ(n²).
This is called the asymptotic growth rate of the function. Later we will discuss more formally what
this notation means.
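As a concrete check of this count, here is a small C program (a sketch added for illustration, not part of the original notes) that runs the brute-force maxima algorithm on a worst-case input in which every point is maximal (take x increasing and y decreasing, which also answers the question above), counts every access to a coordinate of P, and compares the total against 4n² + 2n.

Counting Accesses (sketch)
#include <stdio.h>

typedef struct { int x, y; } Point;

static long accesses = 0;                 /* number of accesses to coordinates of P */

static int getx(Point *P, int i) { accesses++; return P[i].x; }
static int gety(Point *P, int i) { accesses++; return P[i].y; }

int main(void) {
    enum { N = 100 };
    Point P[N];
    long checksum = 0;
    for (int i = 0; i < N; i++) { P[i].x = i; P[i].y = N - i; }  /* all points maximal */

    for (int i = 0; i < N; i++) {                /* brute-force maxima */
        int maximal = 1;
        for (int j = 0; j < N; j++) {
            /* charge all four accesses, as in the analysis (ignore short-circuiting) */
            int dominated = (getx(P, i) <= getx(P, j)) &
                            (gety(P, i) <= gety(P, j)) & (i != j);
            if (dominated) maximal = 0;
        }
        if (maximal)                             /* "output" the point: two more accesses */
            checksum += getx(P, i) + gety(P, i);
    }
    printf("checksum = %ld, accesses = %ld, 4n^2 + 2n = %ld\n",
           checksum, accesses, 4L * N * N + 2L * N);
    return 0;
}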
Summations: (This is covered in Chapter 3 of CLR.) We saw that this analysis involved computing a summation.
Summations should be familiar from CMSC 150, but let's review a bit here. Given a finite
sequence of values a₁, a₂, ..., aₙ, their sum a₁ + a₂ + ··· + aₙ can be expressed in summation notation
as
$$\sum_{i=1}^{n} a_i.$$
If n = 0, then the value of the sum is the additive identity, 0. There are a number of simple algebraic
facts about sums. These are easy to verify by simply writing out the summation and applying simple
high school algebra. If c is a constant (does not depend on the summation index i) then
$$\sum_{i=1}^{n} c\,a_i = c\sum_{i=1}^{n} a_i \qquad\text{and}\qquad \sum_{i=1}^{n}(a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i.$$
There are some particularly important summations, which you should probably commit to memory (or
at least remember their asymptotic growth rates). If you want some practice with induction, the first
two are easy to prove by induction.
Arithmetic Series: For n ≥ 0,
$$\sum_{i=1}^{n} i = 1 + 2 + \cdots + n = \frac{n(n+1)}{2} = \Theta(n^2).$$
Geometric Series: Let x ≠ 1 be any constant (independent of i); then for n ≥ 0,
$$\sum_{i=0}^{n} x^i = 1 + x + x^2 + \cdots + x^n = \frac{x^{n+1} - 1}{x - 1}.$$
If 0 < x < 1 then this is Θ(1), and if x > 1, then this is Θ(xⁿ).
Harmonic Series: This arises often in probabilistic analyses of algorithms. For n ≥ 0,
$$H_n = \sum_{i=1}^{n} \frac{1}{i} = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n} \approx \ln n = \Theta(\ln n).$$
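The closed forms above are easy to check numerically. The following short C program (an illustration added here, not from the notes) verifies the arithmetic series formula and compares the harmonic series against ln n.

Checking the Series (sketch)
#include <stdio.h>
#include <math.h>

int main(void) {
    long n = 1000;
    long arith = 0;
    double harmonic = 0.0;
    for (long i = 1; i <= n; i++) {
        arith += i;                         /* arithmetic series */
        harmonic += 1.0 / (double)i;        /* harmonic series   */
    }
    printf("sum i = %ld, n(n+1)/2 = %ld\n", arith, n * (n + 1) / 2);
    printf("H_n   = %.6f, ln n    = %.6f\n", harmonic, log((double)n));
    return 0;
}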
Lecture 3: Summations and Analyzing Programs with Loops
(Tuesday, Feb 3, 1998)
Read: Chapt. 3 in CLR.
Recap: Last time we presented an algorithm for the 2-dimensional maxima problem. Recall that the algo-
rithm consisted of two nested loops. It looked something like this:
Brute Force Maxima
Maxima(int n, Point P[1..n]) {
for i = 1 to n {
...
for j = 1 to n {
...
...
}
}
}
We were interested in measuring the worst-case running time of this algorithm as a function of the
input size, n. The stuff in the . . . has been omitted because it is unimportant for the analysis.
Last time we counted the number of times that the algorithm accessed a coordinate of any point. (This
was only one of many things that we could have chosen to count.) We showed that as a function of n
in the worst case this quantity was
T(n) = 4n² + 2n.
We were most interested in the growth rate for large values of n (since almost all algorithms run fast
for small values of n), so we were most interested in the 4n² term, which determines how the function
grows asymptotically for large n. Also, we do not care about constant factors (because we wanted
simplicity and machine independence, and figured that the constant factors were better measured by
implementing the algorithm). So we can ignore the factor 4 and simply say that the algorithm's
worst-case running time grows asymptotically as n², which we wrote as Θ(n²).
In this and the next lecture we will consider the questions of (1) how is it that one goes about analyzing
the running time of an algorithm as a function such as T(n) above, and (2) how does one arrive at a
simple asymptotic expression for that running time.
A Harder Example: Let's consider another example. Again, we will ignore stuff that takes constant time
(expressed as . . . in the code below).
A Not-So-Simple Example:
for i = 1 to n { // assume that n is input size
...
for j = 1 to 2*i {
...
k = j;
while (k >= 0) {
...
k = k - 1;
}
}
}
How do we analyze the running time of an algorithm that has many complex nested loops? The
answer is that we write out the loops as summations, and then try to solve the summations. Let I(),
M(), T() be the running times for (one full execution of) the inner loop, middle loop, and the entire
program. To convert the loops into summations, we work from the inside out. Let's consider one pass
through the innermost loop. The number of passes through the loop depends on j. It is executed for
k = j, j−1, j−2, ..., 0, and the time spent inside the loop is a constant, so the total time is just j + 1.
We could attempt to arrive at this more formally by expressing this as a summation:
$$I(j) = \sum_{k=0}^{j} 1 = j + 1.$$
Why the 1? Because the stuff inside this loop takes constant time to execute. Why did we count
up from 0 to j (and not down as the loop does?) The reason is that the mathematical notation for
summations always goes from low index to high, and since addition is commutative it does not matter
in which order we do the addition.
Now let us consider one pass through the middle loop. Its running time is determined by i. Using
the summation we derived above for the innermost loop, and the fact that this loop is executed for j
running from 1 to 2i, it follows that the execution time is
$$M(i) = \sum_{j=1}^{2i} I(j) = \sum_{j=1}^{2i} (j + 1).$$
Last time we gave the formula for the arithmetic series:
$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}.$$
Our sum is not quite of the right form, but we can split it into two sums:
$$M(i) = \sum_{j=1}^{2i} j + \sum_{j=1}^{2i} 1.$$
The latter sum is clearly just 2i. The former is an arithmetic series, and so we find that we can plug in 2i for n
(and j for i) in the formula above to yield the value:
$$M(i) = \frac{2i(2i+1)}{2} + 2i = \frac{4i^2 + 2i + 4i}{2} = 2i^2 + 3i.$$
Now, for the outermost sum and the running time of the entire algorithm we have
$$T(n) = \sum_{i=1}^{n}\left(2i^2 + 3i\right).$$
Splitting this up (by the linearity of addition) we have
$$T(n) = 2\sum_{i=1}^{n} i^2 + 3\sum_{i=1}^{n} i.$$
The latter sum is another arithmetic series, which we can solve by the formula above as n(n + 1)/2.
The former summation, $\sum_{i=1}^{n} i^2$, is not one that we have seen before. Later, we'll show the following.
Quadratic Series: For n ≥ 0,
$$\sum_{i=1}^{n} i^2 = 1 + 4 + 9 + \cdots + n^2 = \frac{2n^3 + 3n^2 + n}{6}.$$
Assuming this fact for now, we conclude that the total running time is:
$$T(n) = 2\,\frac{2n^3 + 3n^2 + n}{6} + 3\,\frac{n(n+1)}{2},$$
which after some algebraic manipulations gives
$$T(n) = \frac{4n^3 + 15n^2 + 11n}{6}.$$
As before, we ignore all but the fastest growing term 4n³/6, and ignore constant factors, so the total
running time is Θ(n³).
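As a sanity check on this analysis, the following C program (not part of the notes) runs the nested loops above with a counter in the innermost loop and compares the count against (4n³ + 15n² + 11n)/6.

Checking the Loop Count (sketch)
#include <stdio.h>

int main(void) {
    for (long n = 1; n <= 50; n += 7) {
        long count = 0;
        for (long i = 1; i <= n; i++)
            for (long j = 1; j <= 2 * i; j++)
                for (long k = j; k >= 0; k--)
                    count++;                    /* one unit of innermost work */
        long formula = (4 * n * n * n + 15 * n * n + 11 * n) / 6;
        printf("n = %2ld: count = %7ld, formula = %7ld\n", n, count, formula);
    }
    return 0;
}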
Solving Summations: In the example above, we saw an unfamiliar summation, $\sum_{i=1}^{n} i^2$, which we claimed
could be solved in closed form as:
$$\sum_{i=1}^{n} i^2 = \frac{2n^3 + 3n^2 + n}{6}.$$
Solving a summation in closed form means that you can write an exact formula for the summation
without any embedded summations or asymptotic terms. In general, when you are presented with an
unfamiliar summation, how do you approach solving it, or, if not solving it in closed form, at least
getting an asymptotic approximation? Here are a few ideas.
Use crude bounds: One of the simplest approaches, which usually works for arriving at asymptotic
bounds, is to replace every term in the summation with a simple upper bound. For example,
in $\sum_{i=1}^{n} i^2$ we could replace every term of the summation by the largest term. This would give
$$\sum_{i=1}^{n} i^2 \le \sum_{i=1}^{n} n^2 = n^3.$$
Notice that this is asymptotically equal to the formula, since both are Θ(n³).
This technique works pretty well with relatively slow growing functions (e.g., anything growing
more slowly than a polynomial, that is, i^c for some constant c). It does not give good bounds
with faster growing functions, such as an exponential function like 2^i.
Approximate using integrals: Integration and summation are closely related. (Integration is in some
sense a continuous form of summation.) Here is a handy formula. Let f(x) be any monotonically
increasing function (the function increases as x increases).
$$\int_{0}^{n} f(x)\,dx \;\le\; \sum_{i=1}^{n} f(i) \;\le\; \int_{1}^{n+1} f(x)\,dx.$$
Figure 2: Approximating sums by integrals.
Most running times are increasing functions of input size, so this formula is useful in analyzing
algorithm running times.
Using this formula, we can approximate the above quadratic sum. In this case, f(x) = x².
$$\sum_{i=1}^{n} i^2 \le \int_{1}^{n+1} x^2\,dx = \left.\frac{x^3}{3}\right|_{x=1}^{n+1} = \frac{(n+1)^3 - 1}{3} = \frac{n^3 + 3n^2 + 3n}{3}.$$
Note that the constant factor on the leading term (n³/3) matches that of the exact formula.
You might say, why is it easier to work with integrals than summations? The main reason is
that most people have more experience in calculus than in discrete math, and there are many
mathematics handbooks with lots of solved integrals.
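The two integral bounds are easy to check numerically. Here is a short C program (added for illustration, not from the notes) comparing Σᵢ i² against the lower bound n³/3 and the upper bound ((n+1)³ − 1)/3 derived above.

Checking the Integral Bounds (sketch)
#include <stdio.h>

int main(void) {
    long n = 100;
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i * i;                               /* the quadratic sum */
    double lower = (double)n * n * n / 3.0;
    double upper = ((double)(n + 1) * (n + 1) * (n + 1) - 1.0) / 3.0;
    printf("lower = %.1f, sum = %ld, upper = %.1f\n", lower, sum, upper);
    return 0;
}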
Use constructive induction: This is a fairly good method to apply whenever you can guess the general
form of the summation, but perhaps you are not sure of the various constant factors. In this case,
the integration formula suggests a solution of the form:
$$\sum_{i=1}^{n} i^2 = an^3 + bn^2 + cn + d,$$
but we do not know what a, b, c, and d are. However, we believe that they are constants (i.e., they
are independent of n).
Let's try to prove this formula by induction on n, and as the proof proceeds, we should gather
information about what the values of a, b, c, and d are.
Since this is the first induction proof we have done, let us recall how induction works. Basically
induction proofs are just the mathematical equivalents of loops in programming. Let n be the
integer variable on which we are performing the induction. The theorem or formula to be proved,
called the induction hypothesis, is a function of n, denoted IH(n). There is some smallest value
n₀ for which IH(n₀) is supposed to hold. We prove IH(n₀), and then we work up to successively
larger values of n; each time we may make use of the induction hypothesis, as long as we apply it
to strictly smaller values of n.
Prove IH(n0);
for n = n0+1 to infinity do
Prove IH(n), assuming that IH(n') holds for all n' < n;
This is sometimes called strong induction, because we assume that the hypothesis holds for all
n' < n. Usually we only need to assume the induction hypothesis for the next smaller value of n,
namely n − 1.
Basis Case: (n = 0) Recall that an empty summation is equal to the additive identity, 0. In this
case we want to prove that 0 = a·0³ + b·0² + c·0 + d. For this to be true, we must have
d = 0.
Induction Step: Let us assume that n > 0, and that the formula holds for all values n' < n, and
from this we will show that the formula holds for the value n itself.
The structure of proving summations by induction is almost always the same. First, write the
summation for i running up to n, then strip off the last term, apply the induction hypothesis
on the summation running up to n − 1, and then combine everything algebraically. Here we
go.
$$\begin{aligned}
\sum_{i=1}^{n} i^2 &= \left(\sum_{i=1}^{n-1} i^2\right) + n^2\\
&= a(n-1)^3 + b(n-1)^2 + c(n-1) + d + n^2\\
&= (an^3 - 3an^2 + 3an - a) + (bn^2 - 2bn + b) + (cn - c) + d + n^2\\
&= an^3 + (-3a + b + 1)n^2 + (3a - 2b + c)n + (-a + b - c + d).
\end{aligned}$$
To complete the proof, we want this to be equal to an³ + bn² + cn + d. Since this should be true
for all n, this means that each power of n must match identically. This gives us the following
constraints
a = a,   b = −3a + b + 1,   c = 3a − 2b + c,   d = −a + b − c + d.
We already know that d = 0 from the basis case. From the second constraint above we can
cancel b from both sides, implying that a = 1/3. Combining this with the third constraint
we have b = 3a/2 = 1/2. Finally, from the last constraint we have c = b − a = 1/6.
This gives the final formula
$$\sum_{i=1}^{n} i^2 = \frac{n^3}{3} + \frac{n^2}{2} + \frac{n}{6} = \frac{2n^3 + 3n^2 + n}{6}.$$
As desired, all of the values a through d are constants, independent of n. If we had chosen
the wrong general form, then either we would find that some of these constants depended
on n, or we might get a set of constraints that could not be satisfied.
Notice that constructive induction gave us the exact formula for the summation. The only
tricky part is that we had to guess the general structure of the solution.
In summary, there is no one way to solve a summation. However, there are many tricks that can be
applied to either find asymptotic approximations or to get the exact solution. The ultimate goal is to
come up with a closed-form solution. This is not always easy or even possible, but for our purposes
asymptotic bounds will usually be good enough.
Lecture 4: 2-d Maxima Revisited and Asymptotics
(Thursday, Feb 5, 1998)
Read: Chapts. 2 and 3 in CLR.
2-dimensional Maxima Revisited: Recall the max-dominance problem from the previous lectures. A point
p is said to be dominated by point q if p.x ≤ q.x and p.y ≤ q.y. Given a set of n points, P =
{p₁, p₂, ..., pₙ} in 2-space, a point is said to be maximal if it is not dominated by any other point
in P. The problem is to output all the maximal points of P.
So far we have introduced a simple brute-force algorithm that ran in Θ(n²) time, which operated by
comparing all pairs of points. The question we consider today is whether there is an approach that is
significantly better.
The problem with the brute-force algorithm is that it uses no intelligence in pruning out decisions. For
example, once we know that a point pᵢ is dominated by another point pⱼ, then we do not need to use
pᵢ for eliminating other points. Any point that pᵢ dominates will also be dominated by pⱼ. (This follows
from the fact that the domination relation is transitive, which can easily be verified.) This observation,
by itself, does not lead to a significantly faster algorithm though. For example, if all the points are
maximal, which can certainly happen, then this optimization saves us nothing.
Plane-sweep Algorithm: The question is whether we can make a significant improvement in the running
time. Here is an idea for how we might do it. We will sweep a vertical line across the plane from left
to right. As we sweep this line, we will build a structure holding the maximal points lying to the left of
the sweep line. When the sweep line reaches the rightmost point of P, then we will have constructed
the complete set of maxima. This approach of solving geometric problems by sweeping a line across
the plane is called plane sweep.
Although we would like to think of this as a continuous process, we need some way to perform the
plane sweep in discrete steps. To do this, we will begin by sorting the points in increasing order of
their x-coordinates. For simplicity, let us assume that no two points have the same x-coordinate. (This
limiting assumption is actually easy to overcome, but it is good to work with the simpler version, and
save the messy details for the actual implementation.) Then we will advance the sweep-line from point
to point in n discrete steps. As we encounter each new point, we will update the current list of maximal
points.
First off, how do we sort the points? We will leave this problem for later in the semester. But the
bottom line is that there exist any number of good sorting algorithms whose running time to sort n
values is Θ(n log n). We will just assume that they exist for now.
So the only remaining problem is, how do we store the existing maximal points, and how do we update
them when a new point is processed? We claim that as each new point is added, it must be maximal for
the current set. (Why? Because its x-coordinate is larger than all the x-coordinates of all the existing
points, and so it cannot be dominated by any of the existing points.) However, this new point may
dominate some of the existing maximal points, and so we may need to delete them from the list of
maxima. (Notice that once a point is deleted as being nonmaximal, it will never need to be added back
again.) Consider the figure below.
Let pᵢ denote the current point being considered. Notice that since pᵢ has a greater x-coordinate
than all the existing points, it dominates an existing point if and only if its y-coordinate is also larger
Figure 3: Plane sweep algorithm for 2-d maxima.
(or equal). Thus, among the existing maximal points, we want to find those having smaller (or equal)
y-coordinate, and eliminate them.
At this point, we need to make an important observation about how maximal points are ordered with
respect to the x- and y-coordinates. As we read maximal points from left to right (in order of increasing
x-coordinates) the y-coordinates appear in decreasing order. Why is this so? Suppose to the contrary
that we had two maximal points p and q, with p.x ≥ q.x but p.y ≥ q.y. Then it would follow that q is
dominated by p, and hence it is not maximal, a contradiction.
This is nice, because it implies that if we store the existing maximal points in a list, the points that
pᵢ dominates (if any) will all appear at the end of this list. So we have to scan this list to find the
breakpoint between the maximal and dominated points. The question is how do we do this?
I claim that we can simply scan the list linearly. But we must do the scan in the proper direction for
the algorithm to be efficient. Which direction should we scan the list of current maxima? From right
to left, until finding the first point that is not dominated, or from left to right, until finding the first
point that is dominated? Stop here and think about it for a moment. If you can answer this question
correctly, then it says something about your intuition for designing efficient algorithms. Let us assume
that we are trying to optimize worst-case performance.
The correct answer is to scan the list from right to left. Here is why. If you only encounter one point
in the scan, then the scan will always be very efficient. The danger is that you may scan many points
before finding the proper breakpoint. If we scan the list from right to left, then every point that we
encounter whose y-coordinate is less than pᵢ's will be dominated, and hence it will be eliminated from
the computation forever. We will never have to scan this point again. On the other hand, if we scan
from left to right, then in the worst case (consider when all the points are maximal) we may rescan the
same points over and over again. This will lead to a Θ(n²) algorithm.
Now we can give the pseudocode for the nal plane sweep algorithm. Since we add maximal points
onto the end of the list, and delete them from the end of the list, we can use a stack to store the maximal
points, where the top of the stack contains the point with the highest x-coordinate. Let S denote this
stack. The top element of the stack is denoted S.top. Popping the stack means removing the top
element.
Plane Sweep Maxima
Maxima2(int n, Point P[1..n]) {
Sort P in ascending order by x-coordinate;
S = empty; // initialize stack of maxima
for i = 1 to n do { // add points in order of x-coordinate
while (S is not empty and S.top.y <= P[i].y)
Pop(S); // remove points that P[i] dominates
Push(S, P[i]); // add P[i] to stack of maxima
}
output the contents of S;
}
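For concreteness, here is a compact, runnable C sketch of the same plane-sweep algorithm (an illustration, not the notes' code). It uses qsort for the sorting step and an array as the stack of maxima; ties in x are broken by y, which is one simple way to handle the equal-coordinate case the notes set aside. The sample points are taken from Figure 3.

Plane Sweep Maxima (sketch)
#include <stdio.h>
#include <stdlib.h>

typedef struct { int x, y; } Point;

static int cmp_point(const void *a, const void *b) {
    const Point *p = a, *q = b;
    if (p->x != q->x) return (p->x > q->x) - (p->x < q->x);  /* ascending x */
    return (p->y > q->y) - (p->y < q->y);                    /* tie in x: smaller y first */
}

/* Writes the maximal points of P[0..n-1] into maxima[] (in increasing x order)
   and returns how many there are. */
static int plane_sweep_maxima(Point P[], int n, Point maxima[]) {
    qsort(P, (size_t)n, sizeof(Point), cmp_point);
    int top = 0;                           /* stack size; maxima[top-1] is S.top */
    for (int i = 0; i < n; i++) {
        while (top > 0 && maxima[top - 1].y <= P[i].y)
            top--;                         /* pop points that P[i] dominates */
        maxima[top++] = P[i];              /* P[i] is maximal for the current set */
    }
    return top;
}

int main(void) {
    Point P[] = {{2,5},{13,3},{9,10},{4,11},{3,13},{10,5},
                 {7,7},{15,7},{14,10},{12,12},{5,1},{4,4}};
    int n = (int)(sizeof P / sizeof P[0]);
    Point maxima[12];
    int m = plane_sweep_maxima(P, n, maxima);
    for (int i = 0; i < m; i++)
        printf("(%d,%d) ", maxima[i].x, maxima[i].y);
    printf("\n");                          /* prints (3,13) (12,12) (14,10) (15,7) */
    return 0;
}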
Why is this algorithm correct? The correctness follows from the discussion up to now. The most
important element was that since the current maxima appear on the stack in decreasing order of x-
coordinates (as we look down from the top of the stack), they occur in increasing order of y-coordinates.
Thus, as soon as we find the topmost undominated element in the stack, it follows that everyone else
below it on the stack is also undominated.
Analysis: This is an interesting program to analyze, primarily because the techniques that we discussed in
the last lecture do not apply readily here. I claim that after the sorting (which we mentioned takes
Θ(n log n) time), the rest of the algorithm only takes Θ(n) time. In particular, we have two nested
loops. The outer loop is clearly executed n times. The inner while-loop could be iterated up to n − 1
times in the worst case (in particular, when the last point added dominates all the others). So, it seems
that we might have n(n − 1) iterations, for a total of Θ(n²).
However, this is a good example of how not to be fooled by analyses that are too simple minded.
Although it is true that the inner while-loop could be executed as many as n − 1 times in any one pass
through the outer loop, over the entire course of the algorithm we claim that it cannot be executed
more than n times. Why is this? First observe that the total number of elements that have ever been
pushed onto the stack is at most n, since we execute exactly one Push for each time through the outer
for-loop. Also observe that every time we go through the inner while-loop, we must pop an element off
the stack. It is impossible to pop more elements off the stack than are ever pushed on. Therefore, the
inner while-loop cannot be executed more than n times over the entire course of the algorithm. (Make
sure that you believe the argument before going on.)
Therefore, since the total number of iterations of the inner while-loop is n, and since the total number
of iterations in the outer for-loop is n, the total running time of the algorithm is Θ(n).
Is this really better? How much of an improvement is this plane-sweep algorithm over the brute-force al-
gorithm? Probably the most accurate way to find out would be to code the two up, and compare their
running times. But just to get a feeling, let's look at the ratio of the running times. (We have ignored
constant factors, but we'll see that they cannot play a very big role.)
We have argued that the brute-force algorithm runs in Θ(n²) time, and the improved plane-sweep
algorithm runs in Θ(n log n) time. What is the base of the logarithm? It turns out that it will not matter
for the asymptotics (we'll show this later), so for concreteness, let's assume logarithm base 2, which
we'll denote as lg n. The ratio of the running times is:
$$\frac{n^2}{n\lg n} = \frac{n}{\lg n}.$$
For relatively small values of n (e.g. less than 100), both algorithms are probably running fast enough
that the difference will be practically negligible. On larger inputs, say, n = 1, 000, the ratio of n to
lg n is about 1000/10 = 100, so there is a 100-to-1 ratio in running times. Of course, we have not
considered the constant factors. But since neither algorithm makes use of very complex constructs, it
is hard to imagine that the constant factors will differ by more than, say, a factor of 10. For even larger
inputs, say, n = 1, 000, 000, we are looking at a ratio of roughly 1, 000, 000/20 = 50, 000. This is
quite a significant difference, irrespective of the constant factors.
For example, suppose that there was a constant factor difference of 10 to 1, in favor of the brute-force
algorithm. The plane-sweep algorithm would still be 5,000 times faster. If the plane-sweep algorithm
took, say 10 seconds to execute, then the brute-force algorithm would take 14 hours.
From this we get an idea about the importance of asymptotic analysis. It tells us which algorithm is
better for large values of n. As we mentioned before, if n is not very large, then almost any algorithm
will be fast. But efficient algorithm design is most important for large inputs, and the general rule of
computing is that input sizes continue to grow until people can no longer tolerate the running times.
Thus, by designing algorithms efficiently, you make it possible for the user to run large inputs in a
reasonable amount of time.
Asymptotic Notation: We continue to use the notation Θ() but have never defined it. Let's remedy this now.
Definition: Given any function g(n), we define Θ(g(n)) to be a set of functions that are asymptotically
equivalent to g(n), or put formally:
Θ(g(n)) = { f(n) | there exist positive constants c₁, c₂, and n₀ such that
0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀ }.
Your response at this point might be, "I'm sorry that I asked." It seems that the reasonably simple
concept of "throw away all but the fastest growing term, and ignore constant factors" should have a
simpler and more intuitive definition than this. Unfortunately, it does not (although later we will see
that there is a somewhat easier, and nearly equivalent, definition).
First off, we can see that we have been misusing the notation. We have been saying things like T(n) =
Θ(n²). This cannot be true. The left side is a function, and the right side is a set of functions. This should
properly be written as T(n) ∈ Θ(n²). However, this abuse of notation is so common in the field of
algorithm design that no one notices it.
Going back to an earlier lecture, recall that we argued that the brute-force algorithm for 2-d maxima
had a running time of T(n) = 4n² + 2n, which we claimed was Θ(n²). Let's verify that this is so. In
this case g(n) = n². We want to show that f(n) = 4n² + 2n is a member of this set, which means that
we must argue that there exist constants c₁, c₂, and n₀ such that
0 ≤ c₁n² ≤ (4n² + 2n) ≤ c₂n²   for all n ≥ n₀.
There are really three inequalities here. The constraint 0 ≤ c₁n² is no problem, since we will always
be dealing with positive n and positive constants. The next is:
c₁n² ≤ 4n² + 2n.
If we set c₁ = 4, then we have 0 ≤ 4n² ≤ 4n² + 2n, which is clearly true as long as n ≥ 0. The other
inequality is
4n² + 2n ≤ c₂n².
If we select c₂ = 6, and assume that n ≥ 1, then we have n² ≥ n, implying that
4n² + 2n ≤ 4n² + 2n² = 6n² = c₂n².
We have two constraints on n, namely n ≥ 0 and n ≥ 1. So let us make n₀ = 1, which implies that as
long as n ≥ n₀, we will satisfy both of these constraints.
Thus, we have given a formal proof that 4n² + 2n ∈ Θ(n²), as desired. Next time we'll try to give
some of the intuition behind this definition.
Lecture 5: Asymptotics
(Tuesday, Feb 10, 1998)
Read: Chapt. 3 in CLR. The Limit Rule is not really covered in the text. Read Chapt. 4 for next time.
Asymptotics: We have introduced the notion of Θ() notation, and last time we gave a formal definition.
Today, we will explore this and other asymptotic notations in greater depth, and hopefully give a better
understanding of what they mean.
Θ-Notation: Recall the following definition from last time.
Definition: Given any function g(n), we define Θ(g(n)) to be a set of functions:
Θ(g(n)) = { f(n) | there exist strictly positive constants c₁, c₂, and n₀ such that
0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀ }.
Let's dissect this definition. Intuitively, what we want to say with f(n) ∈ Θ(g(n)) is that f(n) and
g(n) are asymptotically equivalent. This means that they have essentially the same growth rates for
large n. For example, functions like 4n² and (8n² + 2n − 3) are asymptotically equivalent, since as n
becomes large the dominant (fastest growing) term is some constant times n².
To see what the definition demands, consider f(n) = 8n² + 2n − 3 and g(n) = n². Two things must be
established.
Lower bound: f(n) grows asymptotically at least as fast as n²: This is established by the portion of
the definition that reads "there exist positive constants c₁ and n₀, such that f(n) ≥ c₁n² for all
n ≥ n₀." Observe that
f(n) = 8n² + 2n − 3 ≥ 8n² − 3 ≥ 7n²,
where the last inequality holds whenever n² ≥ 3, that is, whenever n ≥ √3. So, if we set c₁ = 7 and select n₀ ≥ √3, then we
now have f(n) ≥ c₁n², for all n ≥ n₀, which is what we need.
Upper bound: f(n) grows asymptotically no faster than n²: This is established by the portion of
the definition that reads "there exist positive constants c₂ and n₀, such that f(n) ≤ c₂n² for all
n ≥ n₀." Consider the following reasoning (which is almost correct):
f(n) = 8n² + 2n − 3 ≤ 8n² + 2n ≤ 8n² + 2n² = 10n².
This means that if we let c₂ = 10, then we are done. We have implicitly made the assumption
that 2n ≤ 2n². This is not true for all n, but it is true for all n ≥ 1. So, let us select n₀ ≥ 1, and
now we have f(n) ≤ c₂n² for all n ≥ n₀, which is what we need.
From the lower bound, we have n₀ ≥ √3, and from the upper bound we have n₀ ≥ 1. Taking the larger of
the two, we may let c₁ = 7, c₂ = 10, and n₀ = √3; then 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀,
where f(n) = 8n² + 2n − 3 and g(n) = n², which is exactly what the definition requires. Thus
f(n) ∈ Θ(n²).
Guess-and-Verify (by Induction): Recall the MergeSort recurrence, T(n) = 1 if n = 1 and T(n) =
2T(n/2) + n otherwise (assuming, as usual, that n is a power of 2). Suppose we guess that its solution
is T(n) = (n lg n) + n. The basis case checks out, since 1·lg 1 + 1 = 1 = T(1). For the induction,
assume that the induction hypothesis, T(n') = (n' lg n') + n', holds whenever
n' < n. We want to prove the formula holds for n itself. To do this, we need to express T(n)
in terms of smaller values. To do this, we apply the definition:
T(n) = 2T(n/2) +n.
Now, n/2 < n, so we can apply the induction hypothesis here, yielding T(n/2) = (n/2) lg(n/2)+
(n/2). Plugging this in gives
$$\begin{aligned}
T(n) &= 2((n/2)\lg(n/2) + (n/2)) + n\\
&= (n\lg(n/2) + n) + n\\
&= n(\lg n - \lg 2) + 2n\\
&= (n\lg n - n) + 2n\\
&= n\lg n + n,
\end{aligned}$$
which is exactly what we want to prove.
The Iteration Method: The above method of guessing a solution and verifying through induction works
fine as long as your recurrence is simple enough that you can come up with a good guess. But if the
recurrence is at all messy, there may not be a simple formula. The following method is quite powerful.
When it works, it allows you to convert a recurrence into a summation. By and large, summations are
easier to solve than recurrences (and if nothing else, you can usually approximate them by integrals).
The method is called iteration. Let's start expanding out the definition until we see a pattern developing.
We first write out the definition T(n) = 2T(n/2) + n. This has a recursive formula inside T(n/2)
which we can expand, by filling in the definition but this time with the argument n/2 rather than n.
Plugging in we get T(n) = 2(2T(n/4) + n/2) + n. We then simplify and repeat. Here is what we get
when we repeat this.
$$\begin{aligned}
T(n) &= 2T(n/2) + n\\
&= 2(2T(n/4) + n/2) + n = 4T(n/4) + n + n\\
&= 4(2T(n/8) + n/4) + n + n = 8T(n/8) + n + n + n\\
&= 8(2T(n/16) + n/8) + n + n + n = 16T(n/16) + n + n + n + n\\
&\;\;\vdots
\end{aligned}$$
At this point we can see a general pattern emerging.
$$T(n) = 2^k\,T(n/2^k) + \underbrace{(n + n + \cdots + n)}_{k\ \text{times}} = 2^k\,T(n/2^k) + kn.$$
Now, we have generated a lot of equations, but we still haven't gotten anywhere, because we need to
get rid of the T() from the right-hand side. Here's how we can do that. We know that T(1) = 1. Thus,
let us select k to be a value which forces the term n/2^k = 1. This means that n = 2^k, implying that
k = lg n. If we substitute this value of k into the equation we get
$$T(n) = 2^{\lg n}\,T\!\left(n/2^{\lg n}\right) + (\lg n)\,n = 2^{\lg n}\,T(1) + n\lg n = 2^{\lg n} + n\lg n = n + n\lg n.$$
In simplifying this, we have made use of the formula from the first homework, a^{log_b n} = n^{log_b a}, where
a = b = 2. Thus we have arrived at the same conclusion, but this time no guesswork was involved.
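A quick numeric check of this conclusion: the short C program below (not from the notes) evaluates the recurrence T(1) = 1, T(n) = 2T(n/2) + n directly for powers of 2 and compares it with n lg n + n.

Checking the Recurrence (sketch)
#include <stdio.h>

static long T(long n) {
    if (n == 1) return 1;                  /* basis case */
    return 2 * T(n / 2) + n;               /* recurrence */
}

int main(void) {
    for (long n = 1, lg = 0; n <= (1L << 20); n <<= 1, lg++)
        printf("n = %7ld: T(n) = %9ld, n lg n + n = %9ld\n",
               n, T(n), n * lg + n);
    return 0;
}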
The Iteration Method (a Messier Example): That one may have been a bit too easy to see the general form.
Let's try a messier recurrence:
$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ 3T(n/4) + n & \text{otherwise.}\end{cases}$$
To avoid problems with floors and ceilings, we'll make the simplifying assumption here that n is a
power of 4. As before, the idea is to repeatedly apply the definition, until a pattern emerges.
$$\begin{aligned}
T(n) &= 3T(n/4) + n\\
&= 3(3T(n/16) + n/4) + n = 9T(n/16) + 3(n/4) + n\\
&= 9(3T(n/64) + n/16) + 3(n/4) + n = 27T(n/64) + 9(n/16) + 3(n/4) + n\\
&\;\;\vdots\\
&= 3^k\,T\!\left(\frac{n}{4^k}\right) + 3^{k-1}(n/4^{k-1}) + \cdots + 9(n/16) + 3(n/4) + n\\
&= 3^k\,T\!\left(\frac{n}{4^k}\right) + \sum_{i=0}^{k-1}\frac{3^i}{4^i}\,n.
\end{aligned}$$
As before, we have the recursive term T(n/4^k) still floating around. To get rid of it we recall that
we know the value of T(1), and so we set n/4^k = 1, implying that 4^k = n, that is, k = log₄ n. So,
plugging this value in for k we get:
$$T(n) = 3^{\log_4 n}\,T(1) + \sum_{i=0}^{(\log_4 n)-1}\frac{3^i}{4^i}\,n = n^{\log_4 3} + \sum_{i=0}^{(\log_4 n)-1}\frac{3^i}{4^i}\,n.$$
Again, in the last step we used the formula a^{log_b n} = n^{log_b a} where a = 3 and b = 4, and the fact that
T(1) = 1. (Why did we write it this way? This emphasizes that the function is of the form n^c for some
constant c.) By the way, log₄ 3 = 0.7925... ≈ 0.79, so n^{log₄ 3} ≈ n^{0.79}.
We have this messy summation to solve though. First observe that the value n remains constant
throughout the sum, and so we can pull it out front. Also note that we can write 3^i/4^i as (3/4)^i:
$$T(n) = n^{\log_4 3} + n\sum_{i=0}^{(\log_4 n)-1}\left(\frac{3}{4}\right)^i.$$
Note that this is a geometric series. We may apply the formula for the geometric series, which we gave in
an earlier lecture. For x ≠ 1:
$$\sum_{i=0}^{m} x^i = \frac{x^{m+1} - 1}{x - 1}.$$
In this case x = 3/4 and m = log₄ n − 1. We get
$$T(n) = n^{\log_4 3} + n\,\frac{(3/4)^{\log_4 n} - 1}{(3/4) - 1}.$$
Applying our favorite log identity once more to the expression in the numerator (with a = 3/4 and
b = 4) we get
$$(3/4)^{\log_4 n} = n^{\log_4 (3/4)} = n^{(\log_4 3 - \log_4 4)} = n^{(\log_4 3 - 1)} = \frac{n^{\log_4 3}}{n}.$$
If we plug this back in, we have
$$\begin{aligned}
T(n) &= n^{\log_4 3} + n\,\frac{\dfrac{n^{\log_4 3}}{n} - 1}{(3/4) - 1}
= n^{\log_4 3} + \frac{n^{\log_4 3} - n}{-1/4}\\
&= n^{\log_4 3} - 4\left(n^{\log_4 3} - n\right)
= n^{\log_4 3} + 4\left(n - n^{\log_4 3}\right)\\
&= 4n - 3n^{\log_4 3}.
\end{aligned}$$
So the final result (at last!) is
$$T(n) = 4n - 3n^{\log_4 3} \approx 4n - 3n^{0.79} \in \Theta(n).$$
It is interesting to note the unusual exponent of log₄ 3 ≈ 0.79. We have seen that two nested loops typically
lead to Θ(n²) time, and three nested loops typically lead to Θ(n³) time, so it seems remarkable
that we could generate a strange exponent like 0.79 as part of a running time. However, as we shall
see, this is often the case in divide-and-conquer recurrences.
Lecture 8: More on Recurrences
(Thursday, Feb 19, 1998)
Read: Chapt. 4 on recurrences, skip Section 4.4.
Recap: Last time we discussed recurrences, that is, functions that are defined recursively. We discussed
their importance in analyzing divide-and-conquer algorithms. We also discussed two methods for solving
recurrences, namely guess-and-verify (by induction), and iteration. These are both very powerful
methods, but they are quite mechanical, and it is difficult to get a quick and intuitive sense of what
is going on in the recurrence. Today we will discuss two more techniques for solving recurrences. The
first provides a way of visualizing recurrences and the second, called the Master Theorem, is a method
of solving many recurrences that arise in divide-and-conquer applications.
Visualizing Recurrences Using the Recursion Tree: Iteration is a very powerful technique for solving re-
currences. But, it is easy to get lost in all the symbolic manipulations and lose sight of what is going
on. Here is a nice way to visualize what is going on in iteration. We can describe any recurrence in
terms of a tree, where each expansion of the recurrence takes us one level deeper in the tree.
Recall that the recurrence for MergeSort (which we simplified by assuming that n is a power of 2, and
hence could drop the floors and ceilings) is
$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ 2T(n/2) + n & \text{otherwise.}\end{cases}$$
Suppose that we draw the recursion tree for MergeSort, but each time we merge two lists, we label that
node of the tree with the time it takes to perform the associated (nonrecursive) merge. Recall that to
merge two lists of size m/2 to a list of size m takes Θ(m) time, which we will just write as m. Below
is an illustration of the resulting recursion tree.
Figure 5: Using the recursion tree to visualize a recurrence.
Observe that the total work at the topmost level of the recursion is Θ(n) (or just n for short). At the
second level we have two merges, each taking n/2 time, for a total of 2(n/2) = n. At the third level we
have 4 merges, each taking n/4 time, for a total of 4(n/4) = n. This continues until the bottommost
level of the tree. Since the tree has exactly lg n + 1 levels (0, 1, 2, ..., lg n), and each level contributes a
total of n time, the total running time is n(lg n + 1) = n lg n + n. This is exactly what we got by the
iteration method.
This can be used for a number of simple recurrences. For example, let's try it on the following recurrence.
The tree is illustrated below.
$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ 3T(n/2) + n^2 & \text{otherwise.}\end{cases}$$
Again, we label each node with the amount of work at that level. In this case the work for T(m) is m².
For the top level (or 0th level) the work is n². At level 1 we have three nodes whose work is (n/2)²
each, for a total of 3(n/2)². This can be written as n²(3/4). At level 2 the work is 9(n/4)², which
can be written as n²(9/16). In general it is easy to extrapolate to see that at level i, we have 3^i
nodes, each involving (n/2^i)² work, for a total of 3^i(n/2^i)² = n²(3/4)^i.
This leads to the following summation. Note that we have not determined where the tree bottoms out,
so we have left off the upper bound on the sum.
$$T(n) = n^2 \sum_{i=0}^{?}\left(\frac{3}{4}\right)^i.$$
Figure 6: Another recursion tree example.
If all we wanted was an asymptotic expression, then we are essentially done at this point. Why? The
summation is a geometric series, and the base (3/4) is less than 1. This means that this series converges
to some nonzero constant (even if we ran the sum out to ∞). Thus the running time is Θ(n²).
To get a more exact result, observe that the recursion bottoms out when we get down to single items,
and since the sizes of the inputs are cut by half at each level, it is not hard to see that the final level is
level lg n. (It is easy to be off by 1 here, but this sort of small error will not affect the asymptotic
result. In this case we happen to be right.) So, we can plug in lg n for the "?" in the above summation.
$$T(n) = n^2 \sum_{i=0}^{\lg n}\left(\frac{3}{4}\right)^i.$$
If we wanted to get a more exact answer, we could plug the summation into the formula for the geometric
series and simplify. This would lead to an expression like
$$T(n) = n^2\,\frac{(3/4)^{\lg n + 1} - 1}{(3/4) - 1}.$$
This will take some work to simplify, but at this point it is all just tedious algebra to get the formula
into simple form. (This sort of algebra is typical of algorithm analysis, so be sure that you follow
each step.)
$$\begin{aligned}
T(n) &= n^2\,\frac{(3/4)^{\lg n+1} - 1}{(3/4) - 1} = -4n^2\left((3/4)^{\lg n+1} - 1\right)\\
&= 4n^2\left(1 - (3/4)^{\lg n+1}\right) = 4n^2\left(1 - (3/4)(3/4)^{\lg n}\right)\\
&= 4n^2\left(1 - (3/4)n^{\lg(3/4)}\right) = 4n^2\left(1 - (3/4)n^{\lg 3 - \lg 4}\right)\\
&= 4n^2\left(1 - (3/4)n^{\lg 3 - 2}\right) = 4n^2\left(1 - (3/4)(n^{\lg 3}/n^2)\right)\\
&= 4n^2 - 3n^{\lg 3}.
\end{aligned}$$
Note that lg 3 ≈ 1.58, so the whole expression is Θ(n²).
In conclusion, the technique of drawing the recursion tree is a somewhat more visual way of analyzing
summations, but it is really equivalent to the method of iteration.
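The same kind of check works here: the following C snippet (an added illustration, not from the notes) evaluates T(1) = 1, T(n) = 3T(n/2) + n² directly for powers of 2 and compares it with the closed form 4n² − 3n^{lg 3}.

Checking the Closed Form (sketch)
#include <stdio.h>
#include <math.h>

static double T(long n) {
    if (n == 1) return 1.0;                 /* basis case */
    return 3.0 * T(n / 2) + (double)n * n;  /* recurrence */
}

int main(void) {
    for (long n = 1; n <= (1L << 16); n <<= 1) {
        double closed = 4.0 * n * n - 3.0 * pow((double)n, log2(3.0));
        printf("n = %6ld: T(n) = %14.1f, closed form = %14.1f\n", n, T(n), closed);
    }
    return 0;
}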
(Simplified) Master Theorem: If you analyze many divide-and-conquer algorithms, you will see that the
same general type of recurrence keeps popping up. In general you are breaking a problem into a
subproblems, where each subproblem is roughly a factor of 1/b of the original problem size, and
the time it takes to do the splitting and combining on an input of size n is Θ(n^k).
MergeSort, a = 2, b = 2, and k = 1.
Rather than doing every such recurrence from scratch, can we just come up with a general solution?
The answer is that you can if all you need is an asymptotic expression. This result is called the Master
Theorem, because it can be used to master so many different types of recurrence. Our text gives a
fairly complicated version of this theorem. We will give a simpler version, which is general enough for
most typical situations. In cases where this doesn't apply, try the one from the book. If the one from
the book doesn't apply, then you will probably need iteration, or some other technique.
Theorem: (Simplified Master Theorem) Let a ≥ 1 and b > 1 be constants and let T(n) be the recurrence
$$T(n) = aT(n/b) + n^k,$$
defined for n ≥ 0. (As usual, let us assume that n is a power of b. The basis case, T(1), can be any
constant value.) Then
Case 1: if a > b^k then T(n) ∈ Θ(n^{log_b a}).
Case 2: if a = b^k then T(n) ∈ Θ(n^k log n).
Case 3: if a < b^k then T(n) ∈ Θ(n^k).
Using this version of the Master Theorem we can see that in the MergeSort recurrence a = 2, b = 2,
and k = 1. Thus, a = b^k (2 = 2¹) and so Case 2 applies. From this we have T(n) ∈ Θ(n log n).
In the recurrence above, T(n) = 3T(n/2) + n², we have a = 3, b = 2 and k = 2. We have a < b^k
(3 < 2²) in this case, and so Case 3 applies. From this we have T(n) ∈ Θ(n²).
Finally, consider the recurrence T(n) = 4T(n/3) + n, in which we have a = 4, b = 3 and k = 1. In
this case we have a > b^k (4 > 3¹), and so Case 1 applies. From this we have T(n) ∈ Θ(n^{log₃ 4}) ≈
Θ(n^{1.26}). This may seem to be a rather strange running time (a non-integer exponent), but this is not
uncommon for many divide-and-conquer solutions.
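The three cases are mechanical enough to put into code. Here is a small helper (a hypothetical illustration, not from the notes) that classifies a recurrence of the form T(n) = aT(n/b) + n^k according to the simplified Master Theorem; the three calls in main correspond to the examples just discussed.

Applying the Master Theorem (sketch)
#include <stdio.h>
#include <math.h>

static void master(double a, double b, double k) {
    double bk = pow(b, k);                 /* exact for the small integer cases used here */
    if (a > bk)
        printf("a=%g, b=%g, k=%g: Case 1, Theta(n^%.2f)\n", a, b, k, log(a) / log(b));
    else if (a == bk)
        printf("a=%g, b=%g, k=%g: Case 2, Theta(n^%g log n)\n", a, b, k, k);
    else
        printf("a=%g, b=%g, k=%g: Case 3, Theta(n^%g)\n", a, b, k, k);
}

int main(void) {
    master(2, 2, 1);   /* MergeSort: Case 2, Theta(n log n)         */
    master(3, 2, 2);   /* T(n) = 3T(n/2) + n^2: Case 3, Theta(n^2)  */
    master(4, 3, 1);   /* T(n) = 4T(n/3) + n: Case 1, Theta(n^1.26) */
    return 0;
}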
There are many recurrences that cannot be put into this form. For example, if the splitting and combining
steps involve sorting, we might have seen a recurrence of the form
$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ 2T(n/2) + n\log n & \text{otherwise.}\end{cases}$$
This solves to T(n) = Θ(n log² n), but neither this form of the Master Theorem nor the one in CLR
will tell you this. However, iteration works just fine here.
Recursion Trees Revisited: The recursion trees offer some intuition about why it is that there are three cases
in the Master Theorem. Generally speaking the question is where is most of the work done: at the top
of the tree (the root level), at the bottom of the tree (the leaf level), or spread equally throughout the
entire tree.
For example, in the MergeSort recurrence (which corresponds to Case 2 in the Master Theorem) every
level of the recursion tree provides the same total work, namely n. For this reason the total work is
equal to this value times the height of the tree, namely Θ(log n), for a total of Θ(n log n).
Next consider the earlier recurrence T(n) = 3T(n/2) + n² (which corresponds to Case 3 in the Master
Theorem). In this instance most of the work was concentrated at the root of the tree. Each level of the
tree provided a smaller fraction of work. By the nature of the geometric series, it did not matter how
many levels the tree had at all. Even with an infinite number of levels, the geometric series that results
will converge to a constant value. This is an important observation to make. A common way to design
the most efficient divide-and-conquer algorithms is to try to arrange the recursion so that most of the
work is done at the root, and at each successive level of the tree the work at this level reduces (by some
constant factor). As long as this is the case, Case 3 will apply.
Finally, in the recurrence T(n) = 4T(n/3) + n (which corresponds to Case 1), most of the work is
done at the leaf level of the recursion tree. This can be seen if you perform iteration on this recurrence;
the resulting summation is
$$n\sum_{i=0}^{\log_3 n}\left(\frac{4}{3}\right)^i.$$
(You might try this to see if you get the same result.) Since 4/3 > 1, as we go deeper into the levels
of the tree, that is deeper into the summation, the terms are growing successively larger. The largest
contribution will be from the leaf level.
Lecture 9: Medians and Selection
(Tuesday, Feb 24, 1998)
Read: Today's material is covered in Sections 10.2 and 10.3. You are not responsible for the randomized
analysis of Section 10.2. Our presentation of the partitioning algorithm and analysis are somewhat different
from the ones in the book.
Selection: In the last couple of lectures we have discussed recurrences and the divide-and-conquer method
of solving problems. Today we will give a rather surprising (and very tricky) algorithm which shows
the power of these techniques.
The problem that we will consider is very easy to state, but surprisingly difficult to solve optimally.
Suppose that you are given a set of n numbers. Define the rank of an element to be one plus the
number of elements that are smaller than this element. Since duplicate elements make our life more
complex (by creating multiple elements of the same rank), we will make the simplifying assumption
that all the elements are distinct for now. It will be easy to get around this assumption later. Thus, the
rank of an element is its final position if the set is sorted. The minimum is of rank 1 and the maximum
is of rank n.
Of particular interest in statistics is the median. If n is odd then the median is defined to be the element
of rank (n + 1)/2. When n is even there are two natural choices, namely the elements of ranks n/2
and (n/2) + 1. In statistics it is common to return the average of these two elements. We will define
the median to be either of these elements.
Medians are useful as measures of the central tendency of a set, especially when the distribution of values
is highly skewed. For example, the median income in a community is likely to be a more meaningful
measure of the central tendency than the average is, since if Bill Gates lives in your community then
his gigantic income may significantly bias the average, whereas it cannot have a significant influence
on the median. They are also useful, since in divide-and-conquer applications, it is often desirable to
partition a set about its median value, into two sets of roughly equal size. Today we will focus on the
following generalization, called the selection problem.
Selection: Given a set A of n distinct numbers and an integer k, 1 ≤ k ≤ n, output the element of A
of rank k.
The selection problem can easily be solved in Θ(n log n) time, simply by sorting the numbers of A,
and then returning A[k]. The question is whether it is possible to do better. In particular, is it possible
to solve this problem in Θ(n) time? We will see that the answer is yes, and the solution is far from
obvious.
The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important special
case of divide-and-conquer, which I call the sieve technique. We think of divide-and-conquer as break-
ing the problem into a small number of smaller subproblems, which are then solved recursively. The
sieve technique is a special case, where the number of subproblems is just 1.
The sieve technique works in phases as follows. It applies to problems where we are interested in
finding a single item from a larger set of n items. We do not know which item is of interest; however,
after doing some amount of analysis of the data, taking say Θ(n^k) time, for some constant k, we find
that we do not know what the desired item is, but we can identify a large enough number of elements
that cannot be the desired value, and can be eliminated from further consideration. In particular, "large
enough" means that the number of items is at least some fixed constant fraction of n (e.g. n/2, n/3,
0.0001n). Then we solve the problem recursively on whatever items remain. Each of the resulting
recursive solutions then does the same thing, eliminating a constant fraction of the remaining set.
Applying the Sieve to Selection: To see more concretely how the sieve technique works, let us apply it to
the selection problem. Recall that we are given an array A[1..n] and an integer k, and want to find the
k-th smallest element of A. Since the algorithm will be applied inductively, we will assume that we
are given a subarray A[p..r] as we did in MergeSort, and we want to find the kth smallest item (where
k ≤ r − p + 1). The initial call will be to the entire array A[1..n].
There are two principal algorithms for solving the selection problem, but they differ only in one step,
which involves judiciously choosing an item from the array, called the pivot element, which we will
denote by x. Later we will see how to choose x, but for now just think of it as a random element of A.
We then partition A into three parts. A[q] contains the element x, subarray A[p..q − 1] will contain all
the elements that are less than x, and A[q + 1..r] will contain all the elements that are greater than x.
(Recall that we assumed that all the elements are distinct.) Within each subarray, the items may appear
in any order. This is illustrated below.
Figure 7: Selection Algorithm.
It is easy to see that the rank of the pivot x is q − p + 1 in A[p..r]. Let x_rnk = q − p + 1. If k = x_rnk,
then the pivot is the kth smallest, and we may just return it. If k < x_rnk, then we know that we need
to recursively search in A[p..q − 1], and if k > x_rnk then we need to recursively search A[q + 1..r].
In this latter case we have eliminated x_rnk smaller elements, so we want to find the element of rank k − x_rnk.
Here is the complete pseudocode.
Selection
Select(array A, int p, int r, int k) { // return kth smallest of A[p..r]
if (p == r) return A[p] // only 1 item left, return it
else {
x = Choose_Pivot(A, p, r) // choose the pivot element
q = Partition(A, p, r, x) // partition <A[p..q-1], x, A[q+1..r]>
x_rnk = q - p + 1 // rank of the pivot
if (k == x_rnk) return x // the pivot is the kth smallest
else if (k < x_rnk)
return Select(A, p, q-1, k) // select from left subarray
else
return Select(A, q+1, r, k-x_rnk)// select from right subarray
}
}
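Here is a runnable C sketch of this algorithm (an illustration, not the notes' code). Since Choose_Pivot and Partition are only described later, this sketch uses a random pivot, as the text suggests thinking of it for now, together with a simple partition routine; the helper names select_kth and partition are my own.

Selection (sketch)
#include <stdio.h>
#include <stdlib.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Partition A[p..r] (0-indexed, inclusive) around the value at index piv.
   Returns the pivot's final index q, with A[p..q-1] < A[q] < A[q+1..r]. */
static int partition(int A[], int p, int r, int piv) {
    swap(&A[piv], &A[r]);                  /* move pivot out of the way */
    int x = A[r], q = p;
    for (int j = p; j < r; j++)
        if (A[j] < x) { swap(&A[j], &A[q]); q++; }
    swap(&A[q], &A[r]);                    /* put pivot in its final place */
    return q;
}

/* Return the kth smallest (k = 1 is the minimum) element of A[p..r]. */
static int select_kth(int A[], int p, int r, int k) {
    if (p == r) return A[p];               /* only 1 item left, return it */
    int q = partition(A, p, r, p + rand() % (r - p + 1));
    int x_rnk = q - p + 1;                 /* rank of the pivot */
    if (k == x_rnk) return A[q];           /* the pivot is the kth smallest */
    if (k < x_rnk) return select_kth(A, p, q - 1, k);
    return select_kth(A, q + 1, r, k - x_rnk);
}

int main(void) {
    int A[] = {5, 9, 2, 6, 4, 1, 3, 7};
    int n = (int)(sizeof A / sizeof A[0]);
    for (int k = 1; k <= n; k++)
        printf("rank %d: %d\n", k, select_kth(A, 0, n - 1, k));
    return 0;
}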
Notice that this algorithm satisfies the basic form of a sieve algorithm. It analyzes the data (by choosing
the pivot element and partitioning) and it eliminates some part of the data set, and recurses on the rest.
When k = x_rnk then we get lucky and eliminate everything. Otherwise we either eliminate the pivot
and the right subarray or the pivot and the left subarray.
We will discuss the details of choosing the pivot and partitioning later, but assume for now that they
both take Θ(n) time. The question that remains is how many elements did we succeed in eliminating?
If x is the largest or smallest element in the array, then we may only succeed in eliminating one element
with each phase. In fact, if x is one of the smallest elements of A or one of the largest, then we get
into trouble, because we may only eliminate it and the few smaller or larger elements of A. Ideally x
should have a rank that is neither too large nor too small.
Let us suppose for now (optimistically) that we are able to design the procedure Choose_Pivot in
such a way that it eliminates exactly half the array with each phase, meaning that we recurse on the
remaining n/2 elements. This would lead to the following recurrence.
$$T(n) = \begin{cases} 1 & \text{if } n = 1,\\ T(n/2) + n & \text{otherwise.}\end{cases}$$
We can solve this either by expansion (iteration) or the Master Theorem. If we expand this recurrence
level by level we see that we get the summation
$$T(n) = n + \frac{n}{2} + \frac{n}{4} + \cdots \le \sum_{i=0}^{\infty}\frac{n}{2^i} = n\sum_{i=0}^{\infty}\frac{1}{2^i}.$$
Recall the formula for the infinite geometric series. For any c such that |c| < 1, $\sum_{i=0}^{\infty} c^i = 1/(1 - c)$.
Using this we have
T(n) ≤ 2n ∈ O(n).
(This only proves the upper bound on the running time, but it is easy to see that it takes at least Ω(n)
time, so the total running time is Θ(n).)
This is a bit counterintuitive. Normally you would think that in order to design a Θ(n) time algorithm
you could only make a single, or perhaps a constant number of passes over the data set. In this algorithm
we make many passes (it could be as many as lg n). However, because we eliminate a constant fraction
of elements with each phase, we get this convergent geometric series in the analysis, which shows that
the total running time is indeed linear in n. This lesson is well worth remembering. It is often possible
to achieve running times in ways that you would not expect.
Note that the assumption of eliminating half was not critical. If we eliminated even one per cent, then
the recurrence would have been T(n) = T(99n/100)+n, and we would have gotten a geometric series
involving 99/100, which is still less than 1, implying a convergent series. Eliminating any constant
fraction would have been good enough.
Choosing the Pivot: There are two issues that we have left unresolved. The first is how to choose the pivot
element, and the second is how to partition the array. Both need to be solved in Θ(n) time. The second
problem is a rather easy programming exercise. Later, when we discuss QuickSort, we will discuss
partitioning in detail.
For the rest of the lecture, let's concentrate on how to choose the pivot. Recall that before we said that
we might think of the pivot as a random element of A. Actually this is not such a bad idea. Let's see
why.
The key is that we want the procedure to eliminate at least some constant fraction of the array after
each partitioning step. Let's consider the top of the recurrence, when we are given A[1..n]. Suppose
that the pivot x turns out to be of rank q in the array. The partitioning algorithm will split the array into
A[1..q − 1] < x, A[q] = x and A[q + 1..n] > x. If k = q, then we are done. Otherwise, we need
to search one of the two subarrays. They are of sizes q − 1 and n − q, respectively. The subarray that
contains the kth smallest element will generally depend on what k is, so in the worst case, k will be
chosen so that we have to recurse on the larger of the two subarrays. Thus if q > n/2, then we may
have to recurse on the left subarray of size q − 1, and if q < n/2, then we may have to recurse on the
right subarray of size n − q. In either case, we are in trouble if q is very small, or if q is very large.
If we could select q so that it is roughly of middle rank, then we will be in good shape. For example,
if n/4 ≤ q ≤ 3n/4, then the larger subarray will never be larger than 3n/4. Earlier we said that we
might think of the pivot as a random element of the array A. Actually this works pretty well in practice.
The reason is that roughly half of the elements lie between ranks n/4 and 3n/4, so picking a random
element as the pivot will succeed about half the time to eliminate at least n/4. Of course, we might be
continuously unlucky, but a careful analysis will show that the expected running time is still Θ(n). We
will return to this later.
Instead, we will describe a rather complicated method for computing a pivot element that achieves the
desired properties. Recall that we are given an array A[1..n], and we want to compute an element x
whose rank is (roughly) between n/4 and 3n/4. We will have to describe this algorithm at a very high
level, since the details are rather involved. Here is the description for Select_Pivot:
Groups of 5: Partition A into groups of 5 elements, e.g. A[1..5], A[6..10], A[11..15], etc. There will
be exactly m = ⌈n/5⌉ such groups (the last one might have fewer than 5 elements). This can
easily be done in Θ(n) time.
Group medians: Compute the median of each group of 5. There will be m group medians. We do not
need an intelligent algorithm to do this, since each group has only a constant number of elements.
For example, we could just BubbleSort each group and take the middle element. Each will take
Θ(1) time, and repeating this ⌈n/5⌉ times will give a total running time of Θ(n). Copy the group
medians to a new array B.
Median of medians: Compute the median of the group medians. For this, we will have to call the
selection algorithm recursively on B, e.g. Select(B, 1, m, k), where m = ⌈n/5⌉, and
k = ⌊(m + 1)/2⌋. Let x be this median of medians. Return x as the desired pivot.
The algorithm is illustrated in the figure below. To establish the correctness of this procedure, we need
to argue that x satisfies the desired rank properties.
Lemma: The element x is of rank at least n/4 and at most 3n/4 in A.
Proof: We will show that x is of rank at least n/4. The other part of the proof is essentially sym-
metrical. To do this, we need to show that there are at least n/4 elements that are less than or
equal to x. This is a bit complicated, due to the floor and ceiling arithmetic, so to simplify things
we will assume that n is evenly divisible by 5. Consider the groups shown in the tabular form
above. Observe that at least half of the group medians are less than or equal to x. (Because x is
34
Lecture Notes CMSC 251
Figure 8: Choosing the Pivot. 30 is the final pivot. (The figure shows the groups of 5, the group medians, and the median of medians; the sorting of group medians is not really performed.)
And for each group median, there are three elements that are less than or equal to
this median within its group (because it is the median of its group). Therefore, there are at least
3(n/5)/2 = 3n/10 ≥ n/4 elements that are less than or equal to x in the entire array.
Analysis: The last order of business is to analyze the running time of the overall algorithm. We achieved
the main goal, namely that of eliminating a constant fraction (at least 1/4) of the remaining list at each
stage of the algorithm. The recursive call in Select() will be made to a list no larger than 3n/4.
However, in order to achieve this, within Select Pivot() we needed to make a recursive call to
Select() on an array B consisting of ⌈n/5⌉ elements. Everything else took only Θ(n) time. As
usual, we will ignore floors and ceilings, and write the Θ(n) as n for concreteness. The running time
is
T(n) ≤ 1 if n = 1, and T(n) ≤ T(n/5) + T(3n/4) + n otherwise.
This is a very strange recurrence because it involves a mixture of different fractions (n/5 and 3n/4).
This mixture will make it impossible to use the Master Theorem, and difficult to apply iteration. How-
ever, this is a good place to apply constructive induction. We know we want an algorithm that runs in
Θ(n) time.
Theorem: There is a constant c, such that T(n) ≤ cn.
Proof: (by strong induction on n)
Basis: (n = 1) In this case we have T(n) = 1, and so T(n) ≤ cn as long as c ≥ 1.
Step: We assume that T(n′) ≤ cn′ for all n′ < n.
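A sketch of how the step can be completed, assuming the recurrence above and this induction hypothesis: applying the hypothesis to the two recursive terms gives
T(n) ≤ T(n/5) + T(3n/4) + n ≤ c(n/5) + c(3n/4) + n = (19c/20 + 1)n,
and (19c/20 + 1)n ≤ cn provided c ≥ 20. Since the basis only required c ≥ 1, choosing, say, c = 20 satisfies both constraints, giving T(n) ≤ cn and hence T(n) ∈ Θ(n).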
Σ_i i, the quadratic series Σ_i i², the geometric series Σ_i xⁱ, and the
harmonic series Σ_i 1/i. Practice with simplifying summations. For example, be sure that you
can take something like Σ_i 3ⁱ (n/2ⁱ)² and simplify it to a geometric series n² Σ_i (3/4)ⁱ.
Also be sure you can apply the integration rule to summations. (Chapt. 3 of CLR.)
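For the record, that simplification is just algebra (a worked version, under the same conventions as above):
3ⁱ (n/2ⁱ)² = 3ⁱ · n²/4ⁱ = n² (3/4)ⁱ,  so  Σ_i 3ⁱ (n/2ⁱ)² = n² Σ_i (3/4)ⁱ,
which is n² times a geometric series with ratio 3/4, and hence Θ(n²) when summed over i ≥ 0.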
Asymptotics: Know the formal definitions for Θ, O, and Ω, as well as how to use the limit rule.
Know what the other forms, o and ω, mean informally. There are a number of good sample
problems in the book. I'll be happy to check any of your answers. Also be able to rank functions
in asymptotic order. For example, which is larger: lg √n or √(lg n)? (It is the former, can you see
why?) Remember the following rule and know how to use it.
lim_{n→∞} n^b / a^n = 0   and   lim_{n→∞} (lg^b n) / n^c = 0,
for any constants b, c > 0 and a > 1.
(Chapt. 2 of CLR.)
Recurrences: Know how to analyze the running time of a recursive program by expressing it as a
recurrence. Review the basic techniques for solving recurrences: guess and verify by induction
(I'll provide any guesses that you need on the exam), constructive induction, iteration, and the
(simplied) Master Theorem. (You are NOT responsible for the more complex version given in
the text.) (Chapt 4, Skip 4.4.)
Divide-and-conquer: Understand how to design algorithms by divide-and-conquer. Understand the
divide-and-conquer algorithm for MergeSort, and be able to work an example by hand. Also
understand how the sieve technique works, and how it was used in the selection problem. (Chapt
10 on Medians; skip the randomized analysis. The material on the 2-d maxima and long integer
multiplication is not discussed in CLR.)
Lecture 11: First Midterm Exam
(Tuesday, March 3, 1998)
First midterm exam today. No lecture.
Lecture 12: Heaps and HeapSort
(Thursday, Mar 5, 1998)
Read: Chapt 7 in CLR.
Sorting: For the next series of lectures we will focus on sorting algorithms. The reasons for studying sorting
algorithms in detail are twofold. First, sorting is a very important algorithmic problem. Procedures
for sorting are parts of many large software systems, either explicitly or implicitly. Thus the design of
efficient sorting algorithms is important for the overall efficiency of these systems. The other reason is
more pedagogical. There are many sorting algorithms, some slow and some fast. Some possess certain
desirable properties, and others do not. Finally, sorting is one of the few problems where there are provable
lower bounds on how fast you can sort. Thus, sorting forms an interesting case study in algorithm
theory.
In the sorting problem we are given an array A[1..n] of n numbers, and are asked to reorder these
elements into increasing order. More generally, A is an array of records, and we choose one of these
records as the key value on which the elements will be sorted. The key value need not be a number. It
can be any object from a totally ordered domain. Totally ordered means that for any two elements of
the domain, x and y, either x < y, x = y, or x > y.
There are some domains that can be partially ordered, but not totally ordered. For example, sets can
be partially ordered under the subset relation ⊆, but this is not a total order; it is not true that for any
two sets either x ⊆ y, x = y, or x ⊇ y. There is an algorithm called topological sorting which can be
applied to sort partially ordered sets. We may discuss this later.
Slow Sorting Algorithms: There are a number of well-known slow sorting algorithms. These include the
following:
Bubblesort: Scan the array. Whenever two consecutive items are found that are out of order, swap
them. Repeat until all consecutive items are in order.
Insertion sort: Assume that A[1..i 1] have already been sorted. Insert A[i] into its proper position
in this subarray, by shifting all larger elements to the right by one to make space for the new item.
Selection sort: Assume that A[1..i 1] contain the i 1 smallest elements in sorted order. Find the
smallest element in A[i..n], and then swap it with A[i].
These algorithms are all easy to implement, but they run in Θ(n²) time in the worst case. We have
already seen that MergeSort sorts an array of numbers in Θ(n log n) time. We will study two others,
HeapSort and QuickSort.
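As a concrete illustration of one of these slow algorithms, here is a short C sketch of insertion sort (my own code, 0-indexed here rather than the 1-indexed arrays used in the notes); bubble sort and selection sort are equally short.

Insertion sort sketch (C)
    #include <stdio.h>

    /* Insertion sort: assume A[0..i-1] is sorted, and insert A[i] into its
       proper position by shifting larger elements one slot to the right. */
    void insertion_sort(int A[], int n) {
        for (int i = 1; i < n; i++) {
            int key = A[i];
            int j = i - 1;
            while (j >= 0 && A[j] > key) {   /* shift larger elements right */
                A[j + 1] = A[j];
                j--;
            }
            A[j + 1] = key;                  /* drop the new item into place */
        }
    }

    int main(void) {
        int A[] = {5, 3, 8, 6, 4, 3, 1, 7};
        int n = sizeof(A) / sizeof(A[0]);
        insertion_sort(A, n);
        for (int i = 0; i < n; i++) printf("%d ", A[i]);
        printf("\n");
        return 0;
    }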
Priority Queues: The heapsort algorithm is based on a very nice data structure, called a heap. A heap is
a concrete implementation of an abstract data type called a priority queue. A priority queue stores
elements, each of which is associated with a numeric key value, called its priority. A simple priority
queue supports the following basic operations:
Create: Create an empty queue.
Insert: Insert an element into a queue.
ExtractMax: Return the element with maximum key value from the queue. (Actually it is more
common to extract the minimum. It is easy to modify the implementation (by reversing < and >)
to do this.)
Empty: Test whether the queue is empty.
Adjust Priority: Change the priority of an item in the queue.
It is common to support a number of additional operations as well, such as building a priority queue
from an initial set of elements, returning the largest element without deleting it, and changing the
priority of an element that is already in the queue (either decreasing or increasing it).
Heaps: A heap is a data structure that supports the main priority queue operations (insert and delete max)
in Θ(log n) time. For now we will describe the heap in terms of a binary tree implementation, but we
will see later that heaps can be stored in arrays.
By a binary tree we mean a data structure which is either empty or else it consists of three things: a
root node, a left subtree and a right subtree. The left subtree and right subtrees are each binary trees.
They are called the left child and right child of the root node. If both the left and right children of a
node are empty, then this node is called a leaf node. A nonleaf node is called an internal node.
Figure 10: Binary trees (a binary tree and a complete binary tree, with the root, an internal node, and a leaf labeled, and node depths marked).
The depth of a node in a binary tree is its distance from the root. The root is at depth 0, its children
at depth 1, its grandchildren at depth 2, and so on. The height of a binary tree is its maximum depth.
A binary tree is said to be complete if all internal nodes have two (nonempty) children, and all leaves
have the same depth. An important fact about complete binary trees is that a complete binary tree of
height h has
n = 1 + 2 + ... + 2^h = Σ_{i=0}^{h} 2^i = 2^{h+1} − 1
nodes altogether. If we solve for h in terms of n, we see that the height of a complete binary tree
with n nodes is h = lg(n+1) − 1 ≈ lg n ∈ Θ(log n).
A heap is represented as a left-complete binary tree. This means that all the levels of the tree are full
except the bottommost level, which is filled from left to right. An example is shown below. The keys of
a heap are stored in something called heap order. This means that for each node u, other than the root,
key(Parent(u)) ≥ key(u). This implies that as you follow any path from a leaf to the root the keys
appear in (nonstrict) increasing order. Notice that this implies that the root is necessarily the largest
element.
Figure 11: Heap (heap ordering on a left-complete binary tree).
Next time we will show how the priority queue operations are implemented for a heap.
Lecture 13: HeapSort
(Tuesday, Mar 10, 1998)
Read: Chapt 7 in CLR.
Heaps: Recall that a heap is a data structure that supports the main priority queue operations (insert and
extract max) in Θ(log n) time each. It consists of a left-complete binary tree, meaning that all levels of
the tree except possibly the bottommost are full, and the bottommost level is filled from left to right.
As a consequence, it follows that the depth of the tree is Θ(log n) where n is the number of elements
stored in the tree. The keys of the heap are stored in the tree in what is called heap order. This means
that for each (nonroot) node its parent's key is at least as large as its key. From this it follows that the
largest key in the heap appears at the root.
Array Storage: Last time we mentioned that one of the clever aspects of heaps is that they can be stored in
arrays, without the need for using pointers (as would normally be needed for storing binary trees). The
reason for this is the left-complete nature of the tree.
This is done by storing the heap in an array A[1..n]. Generally we will not be using all of the array,
since only a portion of the keys may be part of the current heap. For this reason, we maintain a variable
m ≤ n which keeps track of the current number of elements that are actually stored actively in the
heap. Thus the heap will consist of the elements stored in A[1..m].
We store the heap in the array by simply unraveling it level by level. Because the binary tree is left-
complete, we know exactly how many elements each level will supply. The root level supplies 1 node,
the next level 2, then 4, then 8, and so on. Only the bottommost level may supply fewer than the
appropriate power of 2, but then we can use the value of m to determine where the last element is. This
is illustrated below.
We should emphasize that this only works because the tree is left-complete. This cannot be used for
general trees.
Figure 12: Storing a heap in an array (the same heap shown both as a binary tree and as an array, with n = 13 and m = 10).
We claim that accessing elements of the heap involves only simple arithmetic operations on the array indices.
In particular it is easy to see the following.
Left(i): return 2i.
Right(i): return 2i + 1.
Parent(i): return ⌊i/2⌋.
IsLeaf(i): return Left(i) > m. (That is, its left child is not in the tree.)
IsRoot(i): return i == 1.
For example, the heap ordering property can be stated as: for all i, 1 ≤ i ≤ m, if (not IsRoot(i)) then
A[Parent(i)] ≥ A[i].
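These index calculations translate directly into code. The following is a small C sketch of my own (the heap is stored in A[1..m] with slot 0 unused, matching the convention above); the helper names mirror the list just given.

Heap index arithmetic sketch (C)
    #include <stdio.h>

    static int Left(int i)   { return 2 * i; }
    static int Right(int i)  { return 2 * i + 1; }
    static int Parent(int i) { return i / 2; }           /* integer division = floor(i/2) */
    static int IsLeaf(int i, int m) { return Left(i) > m; }
    static int IsRoot(int i) { return i == 1; }

    /* Check the heap-order property: every nonroot key is at most its parent's key. */
    static int is_heap(const int A[], int m) {
        for (int i = 2; i <= m; i++)
            if (A[Parent(i)] < A[i]) return 0;
        return 1;
    }

    int main(void) {
        int A[] = {0, 16, 14, 10, 8, 7, 9, 4, 2, 1, 3};   /* a small example heap, A[1..10] */
        printf("heap order holds: %d\n", is_heap(A, 10));
        printf("IsLeaf(6) = %d, Right(3) = %d, IsRoot(1) = %d\n",
               IsLeaf(6, 10), Right(3), IsRoot(1));
        return 0;
    }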
So is a heap a binary tree or an array? The answer is that from a conceptual standpoint, it is a binary
tree. However, it is implemented (typically) as an array for space efficiency.
Maintaining the Heap Property: There is one principal operation for maintaining the heap property. It is
called Heapify. (In other books it is sometimes called sifting down.) The idea is that we are given an
element of the heap which we suspect may not be in valid heap order, but we assume that all of the
other elements in the subtree rooted at this element are in heap order. In particular this root element may
be too small. To fix this we sift it down the tree by swapping it with one of its children. Which child?
We should take the larger of the two children to satisfy the heap ordering property. This continues
recursively until the element is either larger than both its children or until it falls all the way to the
leaf level. Here is the pseudocode. It is given the heap in the array A, the index i of the suspected
element, and m, the current active size of the heap. The element A[max] is set to the maximum of A[i]
and its two children. If max ≠ i then we swap A[i] and A[max] and then recurse on A[max].
Heapify
Heapify(array A, int i, int m) { // sift down A[i] in A[1..m]
l = Left(i) // left child
r = Right(i) // right child
max = i
if (l <= m and A[l] > A[max]) max = l // left child exists and larger
if (r <= m and A[r] > A[max]) max = r // right child exists and larger
if (max != i) { // if either child larger
swap A[i] with A[max] // swap with larger child
Heapify(A, max, m) // and recurse
}
}
See Figure 7.2 on page 143 of CLR for an example of how Heapify works (in the case where m = 10).
We show the execution on a tree, rather than on the array representation, since this is the most natural
way to conceptualize the heap. You might try simulating this same algorithm on the array, to see how
it works at a finer level of detail.
Note that the recursive implementation of Heapify is not the most efficient. We have done so because
many algorithms on trees are most naturally implemented using recursion, so it is nice to practice this
here. It is possible to write the procedure iteratively. This is left as an exercise.
The HeapSort algorithm will consist of two major parts. First building a heap, and then extracting the
maximum elements from the heap, one by one. We will see how to use Heapify to help us do both of
these.
How long does Heapify take to run? Observe that we perform a constant amount of work at each level
of the tree until we make a call to Heapify at the next lower level of the tree. Thus we do O(1) work
for each level of the tree which we visit. Since there are Θ(log n) levels altogether in the tree, the total
time for Heapify is O(log n). (It is not Θ(log n) since, for example, if we call Heapify on a leaf, then
it will terminate in Θ(1) time.)
Building a Heap: We can use Heapify to build a heap as follows. First we start with a heap in which the
elements are not in heap order. They are just in the same order that they were given to us in the array
A. We build the heap by starting at the leaf level and then invoke Heapify on each node. (Note: We
cannot start at the top of the tree. Why not? Because the precondition which Heapify assumes is that
the entire tree rooted at node i is already in heap order, except for i.) Actually, we can be a bit more
efficient. Since we know that each leaf is already in heap order, we may as well skip the leaves and
start with the first nonleaf node. This will be in position ⌊n/2⌋. (Can you see why?)
Here is the code. Since we will work with the entire array, the parameter m for Heapify, which indicates
the current heap size, will be equal to n, the size of array A, in all the calls.
BuildHeap
BuildHeap(int n, array A[1..n]) { // build heap from A[1..n]
for i = n/2 downto 1 {
Heapify(A, i, n)
}
}
An example of BuildHeap is shown in Figure 7.3 on page 146 of CLR. Since each call to Heapify
takes O(log n) time, and we make roughly n/2 calls to it, the total running time is O((n/2) log n) =
O(n log n). Next time we will show that this actually runs faster, and in fact it runs in Θ(n) time.
HeapSort: We can now give the HeapSort algorithm. The idea is that we need to repeatedly extract the
maximum item from the heap. As we mentioned earlier, this element is at the root of the heap. But
once we remove it we are left with a hole in the tree. To fix this we will replace it with the last leaf in
the tree (the one at position A[m]). But now the heap order will very likely be destroyed. So we will
just apply Heapify to the root to fix everything back up.
HeapSort
HeapSort(int n, array A[1..n]) { // sort A[1..n]
BuildHeap(n, A) // build the heap
m = n // initially heap contains all
while (m >= 2) {
swap A[1] with A[m] // extract the m-th largest
m = m-1 // unlink A[m] from heap
Heapify(A, 1, m) // fix things up
}
}
An example of HeapSort is shown in Figure 7.4 on page 148 of CLR. We make n − 1 calls to Heapify,
each of which takes O(log n) time. So the total running time is O((n − 1) log n) = O(n log n).
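For reference, here is a direct C transcription of the three routines above (a sketch of mine; the array is 1-indexed by leaving slot 0 unused, and the names follow the pseudocode).

HeapSort sketch (C)
    #include <stdio.h>

    static void swap(int A[], int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    static void Heapify(int A[], int i, int m) {   /* sift down A[i] in A[1..m] */
        int l = 2 * i, r = 2 * i + 1, max = i;
        if (l <= m && A[l] > A[max]) max = l;      /* left child exists and larger */
        if (r <= m && A[r] > A[max]) max = r;      /* right child exists and larger */
        if (max != i) {
            swap(A, i, max);                       /* swap with larger child */
            Heapify(A, max, m);                    /* and recurse */
        }
    }

    static void BuildHeap(int n, int A[]) {        /* build heap from A[1..n] */
        for (int i = n / 2; i >= 1; i--)
            Heapify(A, i, n);
    }

    static void HeapSort(int n, int A[]) {         /* sort A[1..n] in place */
        BuildHeap(n, A);
        for (int m = n; m >= 2; m--) {
            swap(A, 1, m);                         /* move current max to A[m] */
            Heapify(A, 1, m - 1);                  /* restore heap order on A[1..m-1] */
        }
    }

    int main(void) {
        int A[] = {0, 5, 3, 8, 6, 4, 3, 7, 1};     /* A[1..8] */
        int n = 8;
        HeapSort(n, A);
        for (int i = 1; i <= n; i++) printf("%d ", A[i]);
        printf("\n");
        return 0;
    }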
Lecture 14: HeapSort Analysis and Partitioning
(Thursday, Mar 12, 1998)
Read: Chapt 7 and 8 in CLR. The algorithm we present for partitioning is different from the text's.
HeapSort Analysis: Last time we presented HeapSort. Recall that the algorithm operated by first building a
heap in a bottom-up manner, and then repeatedly extracting the maximum element from the heap and
moving it to the end of the array. One clever aspect of the data structure is that it resides inside the
array to be sorted.
We argued that the basic heap operation of Heapify runs in O(log n) time, because the heap has
O(log n) levels, and the element being sifted moves down one level of the tree after a constant amount
of work.
Based on this we can see (1) that it takes O(n log n) time to build a heap, because we need
to apply Heapify roughly n/2 times (to each of the internal nodes), and (2) that it takes O(nlog n)
time to extract each of the maximum elements, since we need to extract roughly n elements and each
extraction involves a constant amount of work and one Heapify. Therefore the total running time of
HeapSort is O(nlog n).
Is this tight? That is, is the running time Θ(n log n)? The answer is yes. In fact, later we will see that it
is not possible to sort faster than Ω(n log n) time, assuming that you use comparisons, which HeapSort
does. However, it turns out that the first part of the analysis is not tight. In particular, the BuildHeap
procedure that we presented actually runs in Θ(n) time. Although in the wider context of the HeapSort
algorithm this is not significant (because the running time is dominated by the Θ(n log n) extraction
phase).
Nonetheless there are situations where you might not need to sort all of the elements. For example, it
is common to extract some unknown number of the smallest elements until some criterion (depending
on the particular application) is met. For this reason it is nice to be able to build the heap quickly since
you may not need to extract all the elements.
BuildHeap Analysis: Let us consider the running time of BuildHeap more carefully. As usual, it will make
our lives simpler to make some assumptions about n. In this case the most convenient assumption is
that n is of the form n = 2^{h+1} − 1, where h is the height of the tree. The reason is that a left-complete
tree with this number of nodes is a complete tree, that is, its bottommost level is full. This assumption
will save us from worrying about floors and ceilings.
With this assumption, level 0 of the tree has 1 node, level 1 has 2 nodes, and so on, up to level h, which has
2^h nodes. All the leaves reside on level h.
Recall that when Heapify is called, the running time depends on how far an element might sift down
before the process terminates. In the worst case the element might sift down all the way to the leaf
level. Let us count the work done level by level.
At the bottommost level there are 2^h nodes, but we do not call Heapify on any of these, so the work is
0. At the next-to-bottommost level there are 2^{h−1} nodes, and each might sift down 1 level. At the 3rd
level from the bottom there are 2^{h−2} nodes, and each might sift down 2 levels.
Figure 13: Analysis of BuildHeap (total work per level, from the bottom up: 0·8 + 1·4 + 2·2 + 3·1).
In general, at level j from the bottom there are 2^{h−j} nodes, and each might sift down j levels. So, if we count from bottom
to top, level by level, we see that the total time is proportional to
T(n) = Σ_{j=0}^{h} j · 2^{h−j} = Σ_{j=0}^{h} j · 2^h / 2^j.
If we factor out the 2^h term, we have
T(n) = 2^h Σ_{j=0}^{h} j/2^j.
This is a sum that we have never seen before. We could try to approximate it by an integral, which
would involve integration by parts, but it turns out that there is a very cute solution to this particular
sum. We'll digress for a moment to work it out. First, write down the infinite general geometric series,
for any constant |x| < 1:
Σ_{j=0}^{∞} x^j = 1/(1 − x).
Then take the derivative of both sides with respect to x, and multiply by x, giving:
Σ_{j=0}^{∞} j x^{j−1} = 1/(1 − x)²    and    Σ_{j=0}^{∞} j x^j = x/(1 − x)²,
and if we plug in x = 1/2, then voila! we have the desired formula:
Σ_{j=0}^{∞} j/2^j = (1/2)/(1 − (1/2))² = (1/2)/(1/4) = 2.
In our case we have a bounded sum, but since the terms are nonnegative and the infinite series converges, we can use
the infinite series as an easy upper bound.
Using this we have
T(n) = 2^h Σ_{j=0}^{h} j/2^j ≤ 2^h Σ_{j=0}^{∞} j/2^j ≤ 2^h · 2 = 2^{h+1}.
Now recall that n = 2^{h+1} − 1, so we have T(n) ≤ n + 1 ∈ O(n). Clearly the algorithm takes at least
Ω(n) time (since it must access every element of the array at least once), so the total running time for
BuildHeap is Θ(n).
It is worthwhile pausing here a moment. This is the second time we have seen a relatively complex
structured algorithm, with doubly nested loops, come out with a running time of Θ(n). (The other
example was the median algorithm, based on the sieve technique. Actually if you think deeply about
this, there is a sense in which a parallel version of BuildHeap can be viewed as operating like a sieve,
but maybe this is getting too philosophical.) Perhaps a more intuitive way to describe what is happening
here is to observe an important fact about binary trees. This is that the vast majority of nodes are at the
lowest level of the tree. For example, in a complete binary tree of height h there are n ≈ 2^{h+1}
nodes in total, and the number of nodes in the bottom 3 levels alone is
2^h + 2^{h−1} + 2^{h−2} = n/2 + n/4 + n/8 = 7n/8 = 0.875n.
That is, almost 90% of the nodes of a complete binary tree reside in the 3 lowest levels. Thus the lesson
to be learned is that when designing algorithms that operate on trees, it is important to be most efficient
on the bottommost levels of the tree (as BuildHeap is) since that is where most of the weight of the tree
resides.
Partitioning: Our next sorting algorithm is QuickSort. QuickSort is interesting in a number of respects.
First off, (as we will present it) it is a randomized algorithm, which means that it makes use of a ran-
dom number generator. We will show that in the worst case its running time is O(n²), but its expected
case running time is O(n log n). Moreover, this expected case running time occurs with high proba-
bility, in that the probability that the algorithm takes significantly more than O(n log n) time is a rapidly
decreasing function of n. In addition, QuickSort has a better locality-of-reference behavior than either
MergeSort or HeapSort, and thus it tends to run fastest of all three algorithms. This is how it got its
name. QuickSort (and its variants) are considered the methods of choice for most standard library
sorting algorithms.
Next time we will discuss QuickSort. Today we will discuss one aspect of QuickSort, namely the
partitioning algorithm. This is the same partitioning algorithm which we discussed when we talked
about the selection (median) problem. We are given an array A[p..r], and a pivot element x chosen
from the array. Recall that the partitioning algorithm is supposed to partition A into three subarrays:
A[p..q−1] whose elements are all less than or equal to x, A[q] = x, and A[q+1..r] whose elements
are greater than or equal to x. We will assume that x is the first element of the subarray, that is,
x = A[p]. If a different rule is used for selecting x, this is easily handled by swapping this element
with A[p] before calling this procedure.
We will present a different algorithm from the one given in the text (in Section 8.1). This algorithm is
a little easier to verify for correctness, and a little easier to analyze. (But I suspect that the one in the
text is probably a bit more efficient for actual implementation.)
This algorithm works by maintaining the following invariant condition. The subarray is broken into
four segments. The boundaries between these items are indicated by the indices p, q, s, and r.
(1) A[p] = x is the pivot value,
(2) A[p + 1..q] contains items that are less than x,
(3) A[q + 1..s 1] contains items that are greater than or equal to x, and
(4) A[s..r] contains items whose values are currently unknown.
This is illustrated below.
The algorithm begins by setting q = p and s = p + 1. With each step through the algorithm we test the
value of A[s] against x. If A[s] ≥ x, then we can simply increment s. Otherwise we increment q, swap
A[s] with A[q], and then increment s. Notice that in either case, the invariant is still maintained. In the
first case this is obvious. In the second case, A[q] now holds a value that is less than x, and A[s−1]
now holds a value that is greater than or equal to x. The algorithm ends when s > r, meaning that
all of the elements have been processed.
Figure 14: Partitioning intermediate structure (the initial, intermediate, and final configurations of A[p..r]).
To finish things off we swap A[p] (the pivot) with A[q], and
return the value of q. Here is the complete code:
Partition
Partition(int p, int r, array A) { // 3-way partition of A[p..r]
x = A[p] // pivot item in A[p]
q = p
for s = p+1 to r do {
if (A[s] < x) {
q = q+1
swap A[q] with A[s]
}
}
swap A[p] with A[q] // put the pivot into final position
return q // return location of pivot
}
An example is shown below.
Lecture 15: QuickSort
(Tuesday, Mar 17, 1998)
Revised: March 18. Fixed a bug in the analysis.
Read: Chapt 8 in CLR. My presentation and analysis are somewhat different from the text's.
QuickSort and Randomized Algorithms: Early in the semester we discussed the fact that we usually study
the worst-case running times of algorithms, but sometimes average-case is a more meaningful measure.
Today we will study QuickSort. It is a worst-case Θ(n²) algorithm, whose expected-case running time
is Θ(n log n).
We will present QuickSort as a randomized algorithm, that is, an algorithm which makes random
choices. There are two common types of randomized algorithms:
Monte Carlo algorithms: These algorithms may produce the wrong result, but the probability of this
occurring can be made arbitrarily small by the user. Usually the lower you make this probability,
the longer the algorithm takes to run.
Figure 15: Partitioning example (a trace of the Partition code, ending with the final swap of the pivot into place).
Las Vegas algorithms: These algorithms always produce the correct result, but the running time is a
random variable. In these cases the expected running time, averaged over all possible random
choices, is the measure of the algorithm's running time.
The most well known Monte Carlo algorithm is one for determining whether a number is prime. This
is an important problem in cryptography. The QuickSort algorithm that we will discuss today is an ex-
ample of a Las Vegas algorithm. Note that QuickSort does not need to be implemented as a randomized
algorithm, but as we shall see, this is generally considered the safest implementation.
QuickSort Overview: QuickSort is also based on the divide-and-conquer design paradigm. Unlike Merge-
Sort where most of the work is done after the recursive call returns, in QuickSort the work is done
before the recursive call is made. Here is an overview of QuickSort. Note the similarity with the selec-
tion algorithm, which we discussed earlier. Let A[p..r] be the (sub)array to be sorted. The initial call
is to A[1..n].
Basis: If the list contains 0 or 1 elements, then return.
Select pivot: Select a random element x from the array, called the pivot.
Partition: Partition the array into three subarrays: those elements A[1..q−1] ≤ x, A[q] = x, and
A[q+1..n] ≥ x.
Recurse: Recursively sort A[1..q 1] and A[q + 1..n].
The pseudocode for QuickSort is given below. The initial call is QuickSort(1, n, A). The Partition
routine was discussed last time. Recall that Partition assumes that the pivot is stored in the first element
of A. Since we want a random pivot, we pick a random index i from p to r, and then swap A[i] with
A[p].
QuickSort
QuickSort(int p, int r, array A) { // Sort A[p..r]
if (r <= p) return // 0 or 1 items, return
i = a random index from [p..r] // pick a random element
swap A[i] with A[p] // swap pivot into A[p]
q = Partition(p, r, A) // partition A about pivot
QuickSort(p, q-1, A) // sort A[p..q-1]
QuickSort(q+1, r, A) // sort A[q+1..r]
}
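Here is a C sketch of the same routine, together with the Partition code from last time, so that it can actually be compiled and run. The array is 1-indexed with slot 0 unused, and rand() simply stands in for whatever random number generator you prefer.

QuickSort sketch (C)
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void swap(int A[], int i, int j) { int t = A[i]; A[i] = A[j]; A[j] = t; }

    static int Partition(int p, int r, int A[]) {  /* 3-way partition of A[p..r] */
        int x = A[p];                              /* pivot item in A[p] */
        int q = p;
        for (int s = p + 1; s <= r; s++) {
            if (A[s] < x) {
                q = q + 1;
                swap(A, q, s);
            }
        }
        swap(A, p, q);                             /* put pivot into final position */
        return q;                                  /* return location of pivot */
    }

    static void QuickSort(int p, int r, int A[]) { /* sort A[p..r] */
        if (r <= p) return;                        /* 0 or 1 items, return */
        int i = p + rand() % (r - p + 1);          /* pick a random pivot index */
        swap(A, i, p);                             /* swap pivot into A[p] */
        int q = Partition(p, r, A);                /* partition A about pivot */
        QuickSort(p, q - 1, A);
        QuickSort(q + 1, r, A);
    }

    int main(void) {
        srand((unsigned) time(NULL));
        int A[] = {0, 5, 3, 8, 6, 4, 3, 7, 1};     /* A[1..8] */
        QuickSort(1, 8, A);
        for (int i = 1; i <= 8; i++) printf("%d ", A[i]);
        printf("\n");
        return 0;
    }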
QuickSort Analysis: The correctness of QuickSort should be pretty obvious. However its analysis is not
so obvious. It turns out that the running time of QuickSort depends heavily on how good a job we
do in selecting the pivot. In particular, if the rank of the pivot (recall that this means its position in
the final sorted list) is very large or very small, then the partition will be unbalanced. We will see that
unbalanced partitions (like unbalanced binary trees) are bad, and result in poor running times. However,
if the rank of the pivot is anywhere near the middle portion of the array, then the split will be reasonably
well balanced, and the overall running time will be good. Since the pivot is chosen at random by our
algorithm, we may do well most of the time and poorly occasionally. We will see that the expected
running time is O(nlog n).
Worst-case Analysis: Let's begin by considering the worst-case performance, because it is easier than the
average case. Since this is a recursive program, it is natural to use a recurrence to describe its running
time. But unlike MergeSort, where we had control over the sizes of the recursive calls, here we do
not. It depends on how the pivot is chosen. Suppose that we are sorting an array of size n, A[1..n],
and further suppose that the pivot that we select is of rank q, for some q in the range 1 to n. It takes
Θ(n) time to do the partitioning and other overhead, and we make two recursive calls. The first is to
the subarray A[1..q−1] which has q−1 elements, and the other is to the subarray A[q+1..n] which
has n − (q+1) + 1 = n − q elements. So if we ignore the Θ's (as usual) we get the recurrence:
T(n) = T(q−1) + T(n−q) + n.
This depends on the value of q. To get the worst case, we maximize over all possible values of q. As a
basis we have that T(0) = T(1) = Θ(1). Putting this together we have
T(n) = 1 if n ≤ 1, and T(n) = max_{1≤q≤n} (T(q−1) + T(n−q) + n) otherwise.
Recurrences that have max's and min's embedded in them are very messy to solve. The key is deter-
mining which value of q gives the maximum. (A rule of thumb of algorithm analysis is that the worst
case tends to happen either at the extremes or in the middle. So I would plug in the values q = 1, q = n,
and q = n/2 and work each out.) In this case, the worst case happens at either of the extremes (but see
the book for a more careful analysis based on an analysis of the second derivative). If we expand the
recurrence in the case q = 1 we get:
T(n) ≤ T(0) + T(n−1) + n = 1 + T(n−1) + n
     = T(n−1) + (n+1)
     = T(n−2) + n + (n+1)
     = T(n−3) + (n−1) + n + (n+1)
     = T(n−4) + (n−2) + (n−1) + n + (n+1)
     = ...
     = T(n−k) + Σ_{i=−1}^{k−2} (n − i).
For the basis, T(1) = 1, we set k = n − 1 and get
T(n) ≤ T(1) + Σ_{i=−1}^{n−3} (n − i)
     = 1 + (3 + 4 + 5 + ... + (n−1) + n + (n+1))
     ≤ Σ_{i=1}^{n+1} i = (n+1)(n+2)/2 ∈ O(n²).
In fact, a more careful analysis reveals that it is Θ(n²) in this case.
Average-case Analysis: Next we show that in the average case QuickSort runs in (nlog n) time. When
we talked about average-case analysis at the beginning of the semester, we said that it depends on some
assumption about the distribution of inputs. However, in this case, the analysis does not depend on
the input distribution at all; it only depends on the random choices that the algorithm makes. This is
good, because it means that the analysis of the algorithm's performance is the same for all inputs. In
this case the average is computed over all possible random choices that the algorithm might make for
the choice of the pivot index in the second step of the QuickSort procedure above.
To analyze the average running time, we let T(n) denote the average running time of QuickSort on a list
of size n. It will simplify the analysis to assume that all of the elements are distinct. The algorithm has
n random choices for the pivot element, and each choice has an equal probability of 1/n of occurring.
So we can modify the above recurrence to compute an average rather than a max, giving:
T(n) = 1 if n ≤ 1, and T(n) = (1/n) Σ_{q=1}^{n} (T(q−1) + T(n−q) + n) otherwise.
This is not a standard recurrence, so we cannot apply the Master Theorem. Expansion is possible, but
rather tricky. Instead, we will attempt a constructive induction to solve it. We know that we want a
Θ(n log n) running time, so let's try T(n) ≤ a n lg n + b. (Properly we should write ⌈lg n⌉ because,
unlike MergeSort, we cannot assume that the recursive calls will be made on array sizes that are powers
of 2, but we'll be sloppy because things will be messy enough anyway.)
Theorem: There exists a constant c such that T(n) ≤ c n ln n, for all n ≥ 2. (Notice that we have
replaced lg n with ln n. This has been done to make the proof easier, as we shall see.)
Proof: The proof is by constructive induction on n. For the basis case n = 2 we have
T(2) = (1/2) Σ_{q=1}^{2} (T(q−1) + T(2−q) + 2)
     = (1/2) ((T(0) + T(1) + 2) + (T(1) + T(0) + 2)) = 8/2 = 4.
We want this to be at most c·2 ln 2, implying that c ≥ 4/(2 ln 2) ≈ 2.885.
For the induction step, we assume that n ≥ 3, and the induction hypothesis is that for any n′ < n,
we have T(n′) ≤ c n′ ln n′. Thus we have
T(n) = (1/n) Σ_{q=1}^{n} (T(q−1) + T(n−q) + n)
     = (1/n) Σ_{q=1}^{n} (T(q−1) + T(n−q)) + n.
Observe that if we split the sum into two sums, they both add the same values T(0) + T(1) +
... + T(n−1), just that one counts up and the other counts down. Thus we can replace this with
2 Σ_{q=0}^{n−1} T(q). Because they don't follow the formula, we'll extract T(0) and T(1) and treat them
specially. If we make this substitution and apply the induction hypothesis to the remaining sum
(which we can, because q < n), we have
T(n) = (2/n) (Σ_{q=0}^{n−1} T(q)) + n = (2/n) (T(0) + T(1) + Σ_{q=2}^{n−1} T(q)) + n
     ≤ (2/n) (1 + 1 + Σ_{q=2}^{n−1} c q ln q) + n
     = (2c/n) (Σ_{q=2}^{n−1} q ln q) + n + 4/n.
We have never seen this sum before. Later we will show that
S(n) = Σ_{q=2}^{n−1} q ln q ≤ (n²/2) ln n − n²/4.
Assuming this for now, we have
T(n) ≤ (2c/n) ((n²/2) ln n − n²/4) + n + 4/n
     = c n ln n − cn/2 + n + 4/n
     = c n ln n + n (1 − c/2) + 4/n.
To finish the proof, we want all of this to be at most c n ln n. If we cancel the common c n ln n term, we
see that this will be true if we select c such that
n (1 − c/2) + 4/n ≤ 0.
After some simple manipulations we see that this is equivalent to:
0 ≥ n − cn/2 + 4/n,   that is,   cn/2 ≥ n + 4/n,   that is,   c ≥ 2 + 8/n².
Since n ≥ 3, we only need to select c so that c ≥ 2 + 8/9, and so selecting c = 3 will work. From
the basis case we have c ≥ 2.885, so we may choose c = 3 to satisfy both constraints.
The Leftover Sum: The only missing element to the proof is dealing with the sum
S(n) = Σ_{q=2}^{n−1} q ln q.
To bound this, recall the integration formula for bounding summations (which we paraphrase
here). For any monotonically increasing function f(x),
Σ_{i=a}^{b−1} f(i) ≤ ∫_a^b f(x) dx.
The function f(x) = x ln x is monotonically increasing, and so we have
S(n) ≤ ∫_2^n x ln x dx.
If you are a calculus macho man, then you can integrate this by parts, and if you are a calculus
wimp (like me) then you can look it up in a book of integrals:
∫_2^n x ln x dx = [ (x²/2) ln x − x²/4 ]_{x=2}^{n}
               = ((n²/2) ln n − n²/4) − (2 ln 2 − 1)
               ≤ (n²/2) ln n − n²/4.
This completes the summation bound, and hence the entire proof.
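As a quick sanity check (not part of the proof), one can evaluate the average-case recurrence directly and compare it against the bound c n ln n with c = 3. A small C program such as the following, entirely my own construction, does the job.

Recurrence check (C)
    #include <stdio.h>
    #include <math.h>

    /* Compute T(n) = (1/n) * sum_{q=1..n} (T(q-1) + T(n-q)) + n directly,
       and compare it with the bound 3 n ln n established above. */
    int main(void) {
        enum { MAX = 1000 };
        static double T[MAX + 1];
        T[0] = 1.0; T[1] = 1.0;
        for (int n = 2; n <= MAX; n++) {
            double s = 0.0;
            for (int q = 1; q <= n; q++)
                s += T[q - 1] + T[n - q];
            T[n] = s / n + n;
        }
        for (int n = 100; n <= MAX; n += 300)
            printf("n = %4d   T(n) = %10.1f   3 n ln n = %10.1f\n",
                   n, T[n], 3.0 * n * log(n));
        return 0;
    }

Compiling with the math library (e.g. cc check.c -lm) and running it shows T(n) staying comfortably below 3 n ln n, as the theorem predicts.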
Summary: So even though the worst-case running time of QuickSort is Θ(n²), the average-case running
time is Θ(n log n). Although we did not show it, it turns out that this doesn't just happen much of
the time. For large values of n, the running time is Θ(n log n) with high probability. In order to get
Θ(n²) time the algorithm must make poor choices for the pivot at virtually every step. Poor choices are
rare, and so continuously making poor choices is very rare. You might ask, could we make QuickSort
deterministically Θ(n log n) by calling the selection algorithm to use the median as the pivot? The answer
is that this would work, but the resulting algorithm would be so slow in practice that no one would ever
use it.
QuickSort (like MergeSort) is not formally an in-place sorting algorithm, because it does make use of a
recursion stack. In MergeSort and in the expected case for QuickSort, the size of the stack is O(log n),
so this is not really a problem.
QuickSort is the most popular algorithm for implementation because its actual performance (on typical
modern architectures) is so good. The reason for this stems from the fact that (unlike HeapSort, which
can make large jumps around in the array) the main work in QuickSort (in partitioning) spends most of
its time accessing elements that are close to one another. The reason it tends to outperform MergeSort
(which also has good locality of reference) is that most comparisons are made against the pivot element,
which can be stored in a register. In MergeSort we are always comparing two array elements against
each other. The most efficient versions of QuickSort use recursion for large subarrays, but once
the size of the subarray falls below some minimum size (e.g. 20) they switch to a simple iterative
algorithm, such as selection sort.
Lecture 16: Lower Bounds for Sorting
(Thursday, Mar 19, 1998)
Read: Chapt. 9 of CLR.
Review of Sorting: So far we have seen a number of algorithms for sorting a list of numbers in ascending
order. Recall that an in-place sorting algorithm is one that uses no additional array storage (however,
we allow QuickSort to be called in-place even though it needs a stack of size O(log n) for keeping
track of the recursion). A sorting algorithm is stable if duplicate elements remain in the same relative
position after sorting.
Slow Algorithms: Include BubbleSort, InsertionSort, and SelectionSort. These are all simple Θ(n²)
in-place sorting algorithms. BubbleSort and InsertionSort can be implemented as stable algo-
rithms, but SelectionSort cannot (without significant modifications).
Mergesort: Mergesort is a stable (nlog n) sorting algorithm. The downside is that MergeSort is
the only algorithm of the three that requires additional array storage, implying that it is not an
in-place algorithm.
Quicksort: Widely regarded as the fastest of the fast algorithms. This algorithm is O(n log n) in the
expected case, and O(n²) in the worst case. The probability that the algorithm takes asymptoti-
cally longer (assuming that the pivot is chosen randomly) is extremely small for large n. It is an
(almost) in-place sorting algorithm but is not stable.
Heapsort: Heapsort is based on a nice data structure, called a heap, which is a fast priority queue.
Elements can be inserted into a heap in O(log n) time, and the largest item can be extracted in
O(log n) time. (It is also easy to set up a heap for extracting the smallest item.) If you only want
to extract the k largest values, a heap can allow you to do this in O(n + k log n) time. It is an
in-place algorithm, but it is not stable.
Lower Bounds for Comparison-Based Sorting: Can we sort faster than O(n log n) time? We will give an
argument that if the sorting algorithm is based solely on making comparisons between the keys in the
array, then it is impossible to sort more efficiently than Ω(n log n) time. Such an algorithm is called a
comparison-based sorting algorithm, and includes all of the algorithms given above.
Virtually all known general purpose sorting algorithms are based on making comparisons, so this is
not a very restrictive assumption. This does not preclude the possibility of a sorting algorithm whose
actions are determined by other types of operations, for example, consulting the individual bits of
numbers, performing arithmetic operations, indexing into an array based on arithmetic operations on
keys.
We will show that any comparison-based sorting algorithm for an input sequence ⟨a_1, a_2, ..., a_n⟩ must
make at least Ω(n log n) comparisons in the worst case. This is still a difficult task if you think about it.
It is easy to show that a problem can be solved fast (just give an algorithm). But to show that a problem
cannot be solved fast you need to reason in some way about all the possible algorithms that might ever
be written. In fact, it seems surprising that you could even hope to prove such a thing. The catch here
is that we are limited to using comparison-based algorithms, and there is a clean mathematical way of
characterizing all such algorithms.
Decision Tree Argument: In order to prove lower bounds, we need an abstract way of modeling any pos-
sible comparison-based sorting algorithm. We model such algorithms in terms of an abstract model
called a decision tree.
In a comparison-based sorting algorithm only comparisons between the keys are used to determine
the action of the algorithm. Let ⟨a_1, a_2, ..., a_n⟩ be the input sequence. Given two elements, a_i and
a_j, their relative order can only be determined by the results of comparisons like a_i < a_j, a_i ≤ a_j,
a_i = a_j, a_i ≥ a_j, and a_i > a_j.
A decision tree is a mathematical representation of a sorting algorithm (for a fixed value of n). Each
node of the decision tree represents a comparison made in the algorithm (e.g., a_4 : a_7), and the two
branches represent the possible results, for example, the left subtree consists of the remaining compar-
isons made under the assumption that a_4 ≤ a_7 and the right subtree for a_4 > a_7. (Alternatively, one
might be labeled with a_4 < a_7 and the other with a_4 ≥ a_7.)
Observe that once we know the value of n, then the action of the sorting algorithm is completely
determined by the results of its comparisons. This action may involve moving elements around in the
array, copying them to other locations in memory, performing various arithmetic operations on non-key
data. But the bottom line is that at the end of the algorithm, the keys will be permuted in the final array
in some way. Each leaf in the decision tree is labeled with the final permutation that the algorithm
generates after making all of its comparisons.
To make this more concrete, let us assume that n = 3, and let's build a decision tree for SelectionSort.
Recall that the algorithm consists of two phases. The first finds the smallest element of the entire list,
and swaps it with the first element. The second finds the smaller of the remaining two items, and swaps
it with the second element. Here is the decision tree (in outline form). The first comparison is between
a_1 and a_2. The possible results are:
a_1 ≤ a_2: Then a_1 is the current minimum. Next we compare a_1 with a_3, whose results might be either:
    a_1 ≤ a_3: Then we know that a_1 is the minimum overall, and the elements remain in their original
        positions. Then we pass to phase 2 and compare a_2 with a_3. The possible results are:
        a_2 ≤ a_3: Final output is ⟨a_1, a_2, a_3⟩.
        a_2 > a_3: These two are swapped and the final output is ⟨a_1, a_3, a_2⟩.
    a_1 > a_3: Then we know that a_3 is the overall minimum, and it is swapped with a_1. Then we
        pass to phase 2 and compare a_2 with a_1 (which is now in the third position of the array),
        yielding either:
        a_2 ≤ a_1: Final output is ⟨a_3, a_2, a_1⟩.
        a_2 > a_1: These two are swapped and the final output is ⟨a_3, a_1, a_2⟩.
a_1 > a_2: Then a_2 is the current minimum. Next we compare a_2 with a_3, whose results might be either:
    a_2 ≤ a_3: Then we know that a_2 is the minimum overall. We swap a_2 with a_1, and then pass to
        phase 2 and compare the remaining items a_1 and a_3. The possible results are:
        a_1 ≤ a_3: Final output is ⟨a_2, a_1, a_3⟩.
        a_1 > a_3: These two are swapped and the final output is ⟨a_2, a_3, a_1⟩.
    a_2 > a_3: Then we know that a_3 is the overall minimum, and it is swapped with a_1. We pass to
        phase 2 and compare a_2 with a_1 (which is now in the third position of the array), yielding
        either:
        a_2 ≤ a_1: Final output is ⟨a_3, a_2, a_1⟩.
        a_2 > a_1: These two are swapped and the final output is ⟨a_3, a_1, a_2⟩.
The final decision tree is shown below. Note that there are some nodes that are unreachable. For exam-
ple, in order to reach the fourth leaf from the left it must be that a_1 ≤ a_2 and a_1 > a_2, which cannot
both be true. Can you explain this? (The answer is that virtually all sorting algorithms, especially
inefficient ones like selection sort, may make comparisons that are redundant, in the sense that their
outcome has already been determined by earlier comparisons.) As you can see, converting a complex
sorting algorithm like HeapSort into a decision tree for a large value of n will be very tedious and
complex, but I hope you are convinced by this exercise that it can be done in a simple mechanical way.
Figure 16: Decision Tree for SelectionSort on 3 keys.
Using Decision Trees for Analyzing Sorting: Consider any sorting algorithm. Let T(n) be the maximum
number of comparisons that this algorithm makes on any input of size n. Notice that the running time
of the algorithm must be at least as large as T(n), since we are not counting data movement or other
computations at all. The algorithm defines a decision tree. Observe that the height of the decision
tree is exactly equal to T(n), because any path from the root to a leaf corresponds to a sequence of
comparisons made by the algorithm.
As we have seen earlier, any binary tree of height T(n) has at most 2^{T(n)} leaves. This means that this
sorting algorithm can distinguish between at most 2^{T(n)} different final actions. Let's call this quantity
A(n), for the number of different final actions the algorithm can take. Each action can be thought of as
a specific way of permuting the original input to get the sorted output.
How many possible actions must any sorting algorithm distinguish between? If the input consists of n
distinct numbers, then those numbers could be presented in any of n! different permutations. For each
different permutation, the algorithm must unscramble the numbers in an essentially different way,
that is, it must take a different action, implying that A(n) ≥ n!. (Again, A(n) is usually not exactly
equal to n! because most algorithms contain some redundant unreachable leaves.)
Since A(n) ≤ 2^{T(n)} we have 2^{T(n)} ≥ n!, implying that
T(n) ≥ lg(n!).
We can use Stirling's approximation for n! (see page 35 in CLR),
n! ≈ √(2πn) (n/e)^n,
yielding
T(n) ≥ lg(√(2πn) (n/e)^n) = lg √(2πn) + n lg(n/e) = n lg n − n lg e + Θ(log n) ∈ Ω(n log n).
Thus any comparison-based sorting algorithm must make Ω(n log n) comparisons in the worst case.
Σ_{v∈V} in-deg(v) = Σ_{v∈V} out-deg(v) = |E|.
(|E| means the cardinality of the set E, i.e. the number of edges.)
In an undirected graph each edge contributes once to the degree of two different vertices, and thus we
have
Observation: For an undirected graph G = (V, E), Σ_{v∈V} deg(v) = 2|E|.
Lecture 21: More on Graphs
(Tuesday, April 14, 1998)
Read: Sections 5.4, 5.5.
Graphs: Last time we introduced the notion of a graph (undirected) and a digraph (directed). We defined
vertices, edges, and the notion of degrees of vertices. Today we continue this discussion. Recall that
graphs and digraphs both consist of two objects, a set of vertices and a set of edges. For graphs the edges
are undirected, and for digraphs they are directed.
Paths and Cycles: Let's concentrate on directed graphs for the moment. A path in a directed graph is a
sequence of vertices ⟨v_0, v_1, ..., v_k⟩ such that (v_{i−1}, v_i) is an edge for i = 1, 2, ..., k. The length of
the path is the number of edges, k. We say that w is reachable from u if there is a path from u to w.
Note that every vertex is reachable from itself by a path that uses zero edges. A path is simple if all
vertices (except possibly the first and last) are distinct.
A cycle in a digraph is a path containing at least one edge and for which v_0 = v_k. A cycle is simple if,
in addition, v_1, ..., v_k are distinct. (Note: A self-loop counts as a simple cycle of length 1.)
In undirected graphs we define path and cycle exactly the same way, but for a simple cycle we add the
requirement that the cycle visit at least three distinct vertices. This is to rule out the degenerate cycle
⟨u, w, u⟩, which simply jumps back and forth along a single edge.
There are two interesting classes of cycles. A Hamiltonian cycle is a cycle that visits every vertex in a
graph exactly once. An Eulerian cycle is a cycle (not necessarily simple) that visits every edge of a
graph exactly once. (By the way, this is pronounced "Oiler-ian" and not "Yooler-ian".) There are also
path versions in which you need not return to the starting vertex.
One of the early problems which motivated interest in graph theory was the Königsberg Bridge Prob-
lem. This city sits on the Pregel River and is joined by 7 bridges. The question is whether it is possible
to cross all 7 bridges without visiting any bridge twice. Leonhard Euler showed that it is not possible, by
showing that this question could be posed as a problem of whether the multi-graph shown below has
an Eulerian path, and then proving necessary and sufficient conditions for a graph to have such a path.
Figure 19: Bridges at Königsberg Problem.
Euler proved that for a graph to have an Eulerian path, all but at most two of the vertices must have
even degree. In this graph, all 4 of the vertices have odd degree.
Connectivity and acyclic graphs: A graph is said to be acyclic if it contains no simple cycles. A graph is
connected if every vertex can reach every other vertex. An acyclic connected graph is called a free tree
or simply tree for short. The term free is intended to emphasize the fact that the tree has no root, in
contrast to a rooted tree, as is usually seen in data structures.
Observe that a free tree is a minimally connected graph in the sense that the removal of any edge causes the
resulting graph to be disconnected. Furthermore, there is a unique path between any two vertices in a
free tree. The addition of any edge to a free tree results in the creation of a single cycle.
The reachability relation is an equivalence relation on vertices, that is, it is reflexive (a vertex is
reachable from itself), symmetric (if u is reachable from v, then v is reachable from u), and transitive
(if u is reachable from v and v is reachable from w, then u is reachable from w). This implies that
the reachability relation partitions the vertices of the graph into equivalence classes. These are called
connected components.
A connected graph has a single connected component. An acyclic graph (which is not necessarily
connected) consists of many free trees, and is called (what else?) a forest.
A digraph is strongly connected if for any two vertices u and v, u can reach v and v can reach u. (There
is another type of connectivity in digraphs called weak connectivity, but we will not consider it.) As
with connected components in graphs, strong connectivity defines an equivalence partition on the
vertices. These are called the strongly connected components of the digraph.
A directed graph that is acyclic is called a DAG, for directed acyclic graph. Note that it is different
from a directed tree.
Isomorphism: Two graphs G = (V, E) and G′ = (V′, E′) are isomorphic if there is a bijection f : V → V′
such that (u, v) ∈ E if and only if (f(u), f(v)) ∈ E′.
Isomorphic graphs are essentially equal except that their vertices have been given different names.
Determining whether graphs are isomorphic is not as easy as it might seem at first. For example,
consider the graphs in the figure. Clearly (a) and (b) seem to appear more similar to each other than to
(c), but in fact looks are deceiving. Observe that in all three cases all the vertices have degree 3, so that
is not much of a help. Observe there are simple cycles of length 4 in (a), but the smallest simple cycles
in (b) and (c) are of length 5. This implies that (a) cannot be isomorphic to either (b) or (c).
Figure 20: Graph isomorphism (three graphs (a), (b), and (c), each on 10 vertices of degree 3).
It turns out that (b) and (c) are isomorphic. One possible isomorphism mapping is given below. The notation
(u → v) means that vertex u from graph (b) is mapped to vertex v in graph (c). Check that each edge
from (b) is mapped to an edge of (c).
(1 → 1), (2 → 2), (3 → 3), (4 → 7), (5 → 8), (6 → 5), (7 → 10), (8 → 4), (9 → 6), (10 → 9).
Subgraphs and special graphs: A graph G′ = (V′, E′) is a subgraph of G = (V, E) if V′ ⊆ V and
E′ ⊆ E. Given a subset V′ ⊆ V, the subgraph induced by V′ is the graph G′ = (V′, E′) where
E′ = {(u, v) ∈ E | u, v ∈ V′}.
In other words, take all the edges of G that join pairs of vertices in V′.
An undirected graph that has the maximum possible number of edges is called a complete graph.
Complete graphs are often denoted with the letter K. For example, K_5 is the complete graph on 5
vertices. Given a graph G, a subset of vertices V′ ⊆ V
n + Σ_{u∈V} (deg(u) + 1) = n + Σ_{u∈V} deg(u) + n = 2n + 2e ∈ Θ(n + e).
Figure 27: BFS tree (vertices s, t, u, v, w, x labeled with their BFS distances from the source).
For a directed graph the analysis is essentially the same.
Lecture 23: All-Pairs Shortest Paths
(Tuesday, April 21, 1998)
Read: Chapt 26 (up to Section 26.2) in CLR.
All-Pairs Shortest Paths: Last time we showed how to compute shortest paths starting at a designated
source vertex, and assuming that there are no weights on the edges. Today we talk about a consid-
erable generalization of this problem. First, we compute shortest paths not from a single vertex, but
from every vertex in the graph. Second, we allow edges in the graph to have numeric weights.
Let G = (V, E) be a directed graph with edge weights. If (u, v) ∈ E is an edge of G, then the weight of
this edge is denoted W(u, v). Intuitively, this weight denotes the distance of the road from u to v, or
more generally the cost of traveling from u to v. For now, let us think of the weights as being positive
values, but we will see that the algorithm that we are about to present can handle negative weights as
well, in special cases. Intuitively a negative weight means that you get paid for traveling from u to v.
Given a path π = ⟨u_0, u_1, ..., u_k⟩, the cost of this path is the sum of the edge weights:
cost(π) = W(u_0, u_1) + W(u_1, u_2) + ··· + W(u_{k−1}, u_k) = Σ_{i=1}^{k} W(u_{i−1}, u_i).
(We will avoid using the term length, since it can be confused with the number of edges on the path.)
The distance between two vertices is the cost of the minimum cost path between them.
We consider the problem of determining the cost of the shortest path between all pairs of vertices
in a weighted directed graph. We will present two algorithms for this problem. The first is a rather
naive Θ(n⁴) algorithm, and the second is a Θ(n³) algorithm. The latter is called the Floyd-Warshall
algorithm. Both algorithms are based on a completely different algorithm design technique, called
dynamic programming.
For these algorithms, we will assume that the digraph is represented as an adjacency matrix, rather than
the more common adjacency list. Recall that adjacency lists are generally more efficient for sparse
graphs (and large graphs tend to be sparse). However, storing all the distance information between
each pair of vertices will quickly yield a dense digraph (since typically almost every vertex can reach
almost every other vertex). Therefore, since the output will be dense, there is no real harm in using the
adjacency matrix.
Because both algorithms are matrix-based, we will employ common matrix notation, using i, j and k
to denote vertices rather than u, v, and w as we usually do. Let G = (V, E, w) denote the input digraph
and its edge weight function. The edge weights may be positive, zero, or negative, but we assume that
there are no cycles whose total weight is negative. It is easy to see why this causes problems. If the
shortest path ever enters such a cycle, it would never exit. Why? Because by going round the cycle
over and over, the cost will become smaller and smaller. Thus, the shortest path would have a weight
of −∞, and would consist of an infinite number of edges. Disallowing negative weight cycles will rule
out the possibility of this absurd situation.
Input Format: The input is an n × n matrix W of edge weights, which are based on the edge weights in the
digraph. We let w_ij denote the entry in row i and column j of W.
w_ij = 0          if i = j,
w_ij = W(i, j)    if i ≠ j and (i, j) ∈ E,
w_ij = +∞         if i ≠ j and (i, j) ∉ E.
Setting w_ij = ∞ if there is no edge intuitively means that there is no direct link between these two nodes, and hence the direct cost is infinite. The reason for setting w_ii = 0 is that intuitively the cost of getting from any vertex to itself should be 0, since we have no distance to travel. Note that in digraphs it is possible to have self-loop edges, and so W(i, i) may generally be nonzero. Notice that it cannot be negative (otherwise we would have a negative cost cycle consisting of this single edge). If it is positive, then it never does us any good to follow this edge (since it increases our cost and doesn't take us anywhere new).
The output will be an n × n distance matrix D = d_ij where d_ij = δ(i, j), the shortest path cost from vertex i to j. Recovering the shortest paths will also be an issue. To help us do this, we will also compute an auxiliary matrix pred[i, j]. The value of pred[i, j] will be a vertex that is somewhere along the shortest path from i to j. If the shortest path travels directly from i to j without passing through any other vertices, then pred[i, j] will be set to null. We will see later that using these values it will be possible to reconstruct any shortest path in Θ(n) time.
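As an aside, here is a minimal sketch (not from the notes) of how such a pred matrix could drive that reconstruction; the procedure and its output format are hypothetical, but the idea is simply to recurse on the stored intermediate vertex.

Path Reconstruction Sketch
Path(i, j) {
    if (pred[i,j] == null)                // the shortest path is the single edge (i,j)
        output edge (i,j)
    else {
        Path(i, pred[i,j])                // report the path from i to the intermediate vertex
        Path(pred[i,j], j)                // ...then the path from the intermediate vertex to j
    }
}

Since each call either outputs one edge or splits the path at an intermediate vertex, the total work is proportional to the number of edges on the path, which is at most n − 1.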
Dynamic Programming for Shortest Paths: The algorithm is based on a technique called dynamic programming. Dynamic programming problems are typically optimization problems (find the smallest or largest solution, subject to various constraints). The technique is related to divide-and-conquer, in the sense that it breaks problems down into smaller problems that it solves recursively. However, because of the somewhat different nature of dynamic programming problems, standard divide-and-conquer solutions are not usually efficient. The basic elements that characterize a dynamic programming algorithm are:
Substructure: Decompose the problem into smaller subproblems.
Optimal substructure: Each of the subproblems should be solved optimally.
Bottom-up computation: Combine solutions on small subproblems to solve larger subproblems, and eventually to arrive at a solution to the complete problem.
The question is how to decompose the shortest path problem into subproblems in a meaningful way. There is one very natural way to do this. What is remarkable is that this does not lead to the best solution. First we will introduce the natural decomposition, and later present the Floyd-Warshall algorithm, which makes use of a different, but more efficient, dynamic programming formulation.
Path Length Formulation: We will concentrate just on computing the cost of the shortest path, not the path itself. Let us first sketch the natural way to break the problem into subproblems. We want to find some parameter which constrains the estimates to the shortest path costs. At first the estimates will be crude. As this parameter grows, the shortest path cost estimates should converge to their correct values. A natural way to do this is to restrict the number of edges that are allowed to be in the shortest path.
For 0 ≤ m ≤ n − 1, define d^(m)_ij to be the cost of the shortest path from vertex i to vertex j that contains at most m edges. Let D^(m) denote the matrix whose entries are these values. The idea is to
compute D^(0), then D^(1), and so on, up to D^(n−1). Since we know that no shortest path can use more than n − 1 edges (for otherwise it would have to repeat a vertex), we know that D^(n−1) is the final distance matrix. This is illustrated in the figure (a) below.
Figure 28: Dynamic Programming Formulation. (In (a), d^(1)_{1,3} = ∞ (no path), d^(2)_{1,3} = 9 (using 1, 2, 3), and d^(3)_{1,3} = 4 (using 1, 4, 2, 3); (b) illustrates forming d^(m)_ij from d^(m−1)_ik and w_kj.)
The question is, how do we compute these distance matrices? As a basis, we could start with paths containing 0 edges, D^(0) (as our text does). However, observe that D^(1) = W, since the edges of the digraph are just paths of length 1. It is just as easy to start with D^(1), since we are given W as input. So as our basis case we have

    d^(1)_ij = w_ij.
Now, to make the induction go, we claim that it is possible to compute D^(m) from D^(m−1), for m ≥ 2. Consider how to compute the quantity d^(m)_ij. This is the length of the shortest path from i to j using at most m edges. There are two cases:
Case 1: If the shortest path uses strictly fewer than m edges, then its cost is just d^(m−1)_ij.
Case 2: If the shortest path uses exactly m edges, then the path uses m − 1 edges to go from i to some vertex k, and then follows a single edge (k, j) of weight w_kj to get to j. The path from i to k should be shortest (by the principle of optimality) so the length of the resulting path is d^(m−1)_ik + w_kj. But we do not know what k is. So we minimize over all possible choices.
This is illustrated in the figure (b) above.
This suggests the following rule:

    d^(m)_ij = min( d^(m−1)_ij , min_{1≤k≤n} ( d^(m−1)_ik + w_kj ) ).

Notice that the two terms of the main min correspond to the two cases. In the second case, we consider all vertices k, and consider the length of the shortest path from i to k, using m − 1 edges, and then the single edge length cost from k to j.
We can simplify this formula a bit by observing that since w_jj = 0, we have d^(m−1)_ij = d^(m−1)_ij + w_jj. This term occurs in the second case (when k = j). Thus, the first term is redundant. This gives

    d^(m)_ij = min_{1≤k≤n} ( d^(m−1)_ik + w_kj ).
The next question is how shall we implement this rule. One way would be to write a recursive procedure to do it. Here is a possible implementation. To compute the shortest path from i to j, the initial call would be Dist(n − 1, i, j). The array of weights W is assumed to be accessible globally.
Recursive Shortest Paths
Dist(int m, int i, int j) {
if (m == 1) return W[i,j] // single edge case
best = INF
for k = 1 to n do
best = min(best, Dist(m-1, i, k) + W[k,j]) // apply the update rule
return best
}
Unfortunately this will be very slow. Let T(m, n) be the running time of this algorithm on a graph with n vertices, where the first argument is m. The algorithm makes n calls to itself with the first argument of m − 1. When m = 1, the recursion bottoms out, and we have T(1, n) = 1. Otherwise, we make n recursive calls to T(m − 1, n). This gives the recurrence:

    T(m, n) = 1                    if m = 1,
    T(m, n) = nT(m − 1, n) + 1     otherwise.

The total running time is T(n − 1, n). It is straightforward to solve this by expansion. The result will be O(n^n), a huge value. It is not hard to see why. If you unravel the recursion, you will see that this algorithm is just blindly trying all possible paths from i to j. There are exponentially many such paths.
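To sketch the expansion (one quick way to see the bound, not worked out in the notes): repeatedly substituting the recurrence into itself gives

    T(m, n) = nT(m − 1, n) + 1 = n^2 T(m − 2, n) + n + 1 = · · · = n^{m−1} T(1, n) + (n^{m−2} + · · · + n + 1) = Θ(n^{m−1}),

so T(n − 1, n) = Θ(n^{n−2}), which is indeed bounded by O(n^n).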
So how do we make this faster? The answer is to use table-lookup. This is the key to dynamic programming. Observe that there are only O(n^3) different possible numbers d^(m)_ij that we have to compute. Once we compute one of these values, we will store it in a table. Then if we want this value again, rather than recompute it, we will simply look its value up in the table.
The figure below gives an implementation of this idea. The main procedure ShortestPath(n, W) is given the number of vertices n and the matrix of edge weights W. The matrix D^(m) is stored as D[m], for 1 ≤ m ≤ n − 1. For each m, D[m] is a 2-dimensional matrix, implying that D is a 3-dimensional matrix. We initialize D^(1) by copying W. Then each call to ExtendShortestPath() computes D^(m) from D^(m−1), from the above formula.
Dynamic Program Shortest Paths
ShortestPath(int n, int W[1..n, 1..n]) {
array D[1..n-1][1..n, 1..n]
copy W to D[1] // initialize D[1]
for m = 2 to n-1 do
D[m] = ExtendShortestPath(n, D[m-1], W) // compute D[m] from D[m-1]
return D[n-1]
}
ExtendShortestPath(int n, int d[1..n, 1..n], int W[1..n, 1..n]) {
matrix dd[1..n, 1..n] = d[1..n, 1..n] // copy d to temp matrix
for i = 1 to n do // start from i
for j = 1 to n do // ...to j
for k = 1 to n do // ...passing through k
dd[i,j] = min(dd[i,j], d[i,k] + W[k,j])
return dd // return matrix of distances
}
The procedure ExtendShortestPath() consists of 3 nested loops, and so its running time is Θ(n^3). It is called n − 2 times by the main procedure, and so the total running time is Θ(n^4). Next time we will see that we can improve on this. This is illustrated in the figure below.
Figure 29: Shortest Path Example (the edge weight matrix W = D^(1) and the successive matrices D^(2) and D^(3) = D for a 4-vertex digraph; ? denotes infinity).
Lecture 24: Floyd-Warshall Algorithm
(Thursday, April 23, 1998)
Read: Chapt 26 (up to Section 26.2) in CLR.
Floyd-Warshall Algorithm: We continue discussion of computing shortest paths between all pairs of vertices in a directed graph. The Floyd-Warshall algorithm dates back to the early 60s. Warshall was interested in the weaker question of reachability: determine for each pair of vertices u and v, whether u can reach v. Floyd realized that the same technique could be used to compute shortest paths with only minor variations.
The Floyd-Warshall algorithm improves upon this algorithm, running in Θ(n^3) time. The genius of the Floyd-Warshall algorithm is in finding a different formulation for the shortest path subproblem than the path length formulation introduced earlier. At first the formulation may seem most unnatural, but it leads to a faster algorithm. As before, we will compute a set of matrices whose entries are d^(k)_ij. We will change the meaning of each of these entries.
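As a preview, here is a sketch of the usual textbook presentation of the Floyd-Warshall formulation, written in the same style as the earlier pseudocode. In it, d^(k)_ij denotes the cost of the shortest path from i to j whose intermediate vertices are all drawn from {1, 2, . . . , k}, and the recurrence is d^(k)_ij = min( d^(k−1)_ij , d^(k−1)_ik + d^(k−1)_kj ). The sketch collapses the matrices D^(0), D^(1), . . . , D^(n) into a single matrix that is updated in place.

Floyd-Warshall Sketch
Floyd-Warshall(int n, int W[1..n, 1..n]) {
    array d[1..n, 1..n]
    copy W to d                               // d = D^(0): no intermediate vertices allowed
    for k = 1 to n do                         // now allow vertex k as an intermediate vertex
        for i = 1 to n do
            for j = 1 to n do
                d[i,j] = min(d[i,j], d[i,k] + d[k,j])
    return d                                  // d[i,j] = shortest path cost from i to j
}

Three nested loops over n vertices give the claimed Θ(n^3) running time.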
For a path p = ⟨v_1, v_2, . . . , v_ℓ⟩, . . .

If A is a p × q matrix and B is a q × r matrix, then their product C = AB is a p × r matrix whose entries are

    C[i, j] = ∑_{k=1}^{q} A[i, k] B[k, j].

Observe that there are pr total entries in C and each takes O(q) time to compute, thus the total time (e.g. number of multiplications) to multiply these two matrices is p · q · r.
Figure 33: Matrix Multiplication (multiplying a p × q matrix A by a q × r matrix B yields a p × r matrix C in time pqr).
Note that although any legal parenthesization will lead to a valid result, not all involve the same number
of operations. Consider the case of 3 matrices: let A_1 be 5 × 4, A_2 be 4 × 6, and A_3 be 6 × 2. Then

    mult[((A_1 A_2) A_3)] = (5 · 4 · 6) + (5 · 6 · 2) = 180,
    mult[(A_1 (A_2 A_3))] = (4 · 6 · 2) + (5 · 4 · 2) = 88.
Even for this small example, considerable savings can be achieved by reordering the evaluation se-
quence. The Chain Matrix Multiplication problem is: Given a sequence of matrices A_1, A_2, . . . , A_n and dimensions p_0, p_1, . . . , p_n where A_i is of dimension p_{i−1} × p_i, determine the multiplication sequence that minimizes the number of operations.
Important Note: This algorithm does not perform the multiplications, it just figures out the best order in which to perform the multiplications.
Naive Algorithm: We could write a procedure which tries all possible parenthesizations. Unfortunately, the number of ways of parenthesizing an expression is very large. If you have just one item, then there is only one way to parenthesize. If you have n items, then there are n − 1 places where you could break the list with the outermost pair of parentheses, namely just after the 1st item, just after the 2nd item, etc., and just after the (n − 1)st item. When we split just after the kth item, we create two sublists to be parenthesized, one with k items, and the other with n − k items. Then we could consider all the ways of parenthesizing these. Since these are independent choices, if there are L ways to parenthesize the left sublist and R ways to parenthesize the right sublist, then the total is L · R. This suggests the
following recurrence for P(n), the number of different ways of parenthesizing n items:

    P(n) = 1                               if n = 1,
    P(n) = ∑_{k=1}^{n−1} P(k) P(n − k)     if n ≥ 2.
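For example, working out the first few values from this recurrence (a quick check, not in the notes): P(2) = P(1)P(1) = 1, P(3) = P(1)P(2) + P(2)P(1) = 2, P(4) = P(1)P(3) + P(2)P(2) + P(3)P(1) = 2 + 1 + 2 = 5, and P(5) = 5 + 2 + 2 + 5 = 14.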
This is related to a famous function in combinatorics called the Catalan numbers (which in turn is related to the number of different binary trees on n nodes). In particular P(n) = C(n − 1) and

    C(n) = (1 / (n + 1)) · (2n choose n).
Applying Stirling's formula, we find that C(n) ∈ Θ(4^n / n^{3/2}). Since 4^n is exponential and n^{3/2} is just
polynomial, the exponential will dominate, and this grows very fast. Thus, this will not be practical
except for very small n.
Dynamic Programming Solution: This problem, like other dynamic programming problems, involves determining a structure (in this case, a parenthesization). We want to break the problem into subproblems, whose solutions can be combined to solve the global problem.
For convenience we can write A_{i..j} to be the product of matrices i through j. It is easy to see that A_{i..j} is a p_{i−1} × p_j matrix. In parenthesizing the expression, we can consider the highest level of parenthesization. At this level we are simply multiplying two matrices together. That is, for any k, 1 ≤ k ≤ n − 1,

    A_{1..n} = A_{1..k} A_{k+1..n}.
Thus the problem of determining the optimal sequence of multiplications is broken up into 2 questions: how do we decide where to split the chain (what is k?) and how do we parenthesize the subchains A_{1..k} and A_{k+1..n}? The subchain problems can be solved by recursively applying the same scheme. The former problem can be solved by just considering all possible values of k. Notice that this problem satisfies the principle of optimality, because if we want to find the optimal sequence for multiplying A_{1..n} we must use the optimal sequences for A_{1..k} and A_{k+1..n}. In other words, the subproblems must be solved optimally for the global problem to be solved optimally.
We will store the solutions to the subproblems in a table, and build the table in a bottom-up manner. For 1 ≤ i ≤ j ≤ n, let m[i, j] denote the minimum number of multiplications needed to compute A_{i..j}. The optimum cost can be described by the following recursive definition. As a basis observe that if i = j then the sequence contains only one matrix, and so the cost is 0. (There is nothing to multiply.) Thus, m[i, i] = 0. If i < j, then we are asking about the product A_{i..j}. This can be split by considering each k, i ≤ k < j, as A_{i..k} times A_{k+1..j}.
The optimum time to compute A_{i..k} is m[i, k], and the optimum time to compute A_{k+1..j} is m[k + 1, j]. We may assume that these values have been computed previously and stored in our array. Since A_{i..k} is a p_{i−1} × p_k matrix, and A_{k+1..j} is a p_k × p_j matrix, the time to multiply them is p_{i−1} p_k p_j. This suggests the following recursive rule for computing m[i, j].

    m[i, i] = 0
    m[i, j] = min_{i≤k<j} ( m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j )    for i < j.
It is not hard to convert this rule into a procedure, which is given below. The only tricky part is arranging the order in which to compute the values. In the process of computing m[i, j] we will need to access values m[i, k] and m[k + 1, j] for k lying between i and j. This suggests that we should organize our computation according to the number of matrices in the subchain. Let L = j − i + 1 denote the length of the subchain being multiplied. The subchains of length 1 (m[i, i]) are trivial. Then we build up by computing the subchains of lengths 2, 3, . . . , n. The final answer is m[1, n]. We need to be a little careful in setting up the loops. If a subchain of length L starts at position i, then j = i + L − 1. Since we want j ≤ n, this means that i + L − 1 ≤ n, or in other words, i ≤ n − L + 1. So our loop for i runs from 1 to n − L + 1 (to keep j in bounds).
Chain Matrix Multiplication
Matrix-Chain(array p[0..n], int n) { // A_i has dimension p[i-1] x p[i]
array m[1..n,1..n] // m[i,j] = min cost of computing A_i..j
array s[1..n-1,2..n]
for i = 1 to n do m[i,i] = 0 // initialize
for L = 2 to n do { // L = length of subchain
for i = 1 to n-L+1 do {
j = i + L - 1
m[i,j] = INFINITY
for k = i to j-1 do {
q = m[i, k] + m[k+1, j] + p[i-1]*p[k]*p[j]
if (q < m[i, j]) { m[i,j] = q; s[i,j] = k }
}
}
}
return m[1,n] and s
}
The array s[i, j] will be explained later. It is used to extract the actual sequence. The running time of the procedure is Θ(n^3). We'll leave this as an exercise in solving sums, but the key is that there are three nested loops, and each can iterate at most n times.
Extracting the final Sequence: To extract the actual sequence is a fairly easy extension. The basic idea is to leave a split marker indicating what the best split is, that is, what value of k led to the minimum value of m[i, j]. We can maintain a parallel array s[i, j] in which we will store the value of k providing the optimal split. For example, suppose that s[i, j] = k. This tells us that the best way to multiply the subchain A_{i..j} is to first multiply the subchain A_{i..k} and then multiply the subchain A_{k+1..j}, and finally multiply these together. Intuitively, s[i, j] tells us what multiplication to perform last. Note that we only need to store s[i, j] when we have at least two matrices, that is, if j > i.
The actual multiplication algorithm uses the s[i, j] value to determine how to split the current sequence.
Assume that the matrices are stored in an array of matrices A[1..n], and that s[i, j] is global to this
recursive procedure. The procedure returns a matrix.
Extracting Optimum Sequence
Mult(i, j) {
if (i < j) { // at least two matrices remain: split at the stored k
k = s[i,j]
X = Mult(i, k) // X = A[i]...A[k]
Y = Mult(k+1, j) // Y = A[k+1]...A[j]
return X*Y; // multiply matrices X and Y
}
else
return A[i];
}
Figure 34: Chain Matrix Multiplication (the tables m[i, j] and s[i, j] for dimensions p = ⟨5, 4, 6, 2, 7⟩, and the resulting final multiplication order).
In the figure above we show an example. This algorithm is tricky, so it would be a good idea to trace through this example (and the one given in the text). The initial set of dimensions is ⟨5, 4, 6, 2, 7⟩, meaning that we are multiplying A_1 (5 × 4) times A_2 (4 × 6) times A_3 (6 × 2) times A_4 (2 × 7). The optimal sequence is ((A_1 (A_2 A_3)) A_4).
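As a check on the recurrence (a worked computation using the dimensions above): m[1, 2] = 5·4·6 = 120, m[2, 3] = 4·6·2 = 48, m[3, 4] = 6·2·7 = 84; then m[1, 3] = min(0 + 48 + 5·4·2, 120 + 0 + 5·6·2) = 88 and m[2, 4] = min(0 + 84 + 4·6·7, 48 + 0 + 4·2·7) = 104; and finally m[1, 4] = min(0 + 104 + 5·4·7, 120 + 84 + 5·6·7, 88 + 0 + 5·2·7) = 158, achieved at k = 3, which is exactly the split ((A_1 (A_2 A_3)) A_4).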
Lecture 27: NP-Completeness: General Introduction
(Tuesday, May 5, 1998)
Read: Chapt 36, up through section 36.4.
Easy and Hard Problems: At this point of the semester hopefully you have learned a few things about what it means for an algorithm to be efficient, and how to design algorithms and determine their efficiency asymptotically. All of this is fine if it helps you discover an acceptably efficient algorithm to solve your problem. The question that often arises in practice is that you have tried every trick in the book, and still your best algorithm is not fast enough. Although your algorithm can solve small problems reasonably efficiently (e.g. n ≤ 20), for the really large applications that you want to solve (e.g. n ≥ 100) your algorithm does not terminate quickly enough. When you analyze its running time, you realize that it is running in exponential time, perhaps n^n, or 2^n, or 2^(2^n), or n!, or worse.
Towards the end of the 60s and in the early 70s there were great strides made in finding efficient solutions to many combinatorial problems. But at the same time there was also a growing list of problems for which there seemed to be no known efficient algorithmic solutions. The best solutions known for these problems required exponential time. People began to wonder whether there was some unknown paradigm that would lead to a solution to these problems, or perhaps some proof that these problems are inherently hard to solve and no algorithmic solutions exist that run in less than exponential time.
Around this time a remarkable discovery was made. It turns out that many of these hard problems
were interrelated in the sense that if you could solve any one of them in polynomial time, then you
could solve all of them in polynomial time. In the next couple of lectures we will discuss some of these
problems and introduce the notion of P, NP, and NP-completeness.
Polynomial Time: We need some way to separate the class of efficiently solvable problems from inefficiently solvable problems. We will do this by considering problems that can be solved in polynomial time.
We have measured the running time of algorithms using worst-case complexity, as a function of n, the
size of the input. We have dened input size variously for different problems, but the bottom line is the
number of bits (or bytes) that it takes to represent the input using any reasonably efficient encoding. (By a reasonably efficient encoding, we assume that there is not some significantly shorter way of providing the same information. For example, you could write the number 8 in unary notation as 11111111_1 rather than in binary as 1000_2, but that would be unacceptably inefficient.)
We have also assumed that operations on numbers can be performed in constant time. From now on,
we should be more careful and assume that arithmetic operations require at least as much time as there
are bits of precision in the numbers being stored.
Up until now all the algorithms we have seen have had the property that their worst-case running times are bounded above by some polynomial in the input size, n. A polynomial time algorithm is any algorithm that runs in time O(n^k) where k is some constant that is independent of n. A problem is said to be solvable in polynomial time if there is a polynomial time algorithm that solves it.
Some functions do not look like polynomials but are polynomially bounded nonetheless. For example, a running time of O(n log n) does not look like a polynomial, but it is bounded above by the polynomial O(n^2), so it is considered to be in polynomial time.
On the other hand, some functions that do look like polynomials are not. For example, a running time of O(n^k) is not considered to be in polynomial time if k is an input parameter that could vary as a function of n. The important constraint is that the exponent in a polynomial function must be a constant that is independent of n.
Decision Problems: Many of the problems that we have discussed involve optimization of one form or another: find the shortest path, find the minimum cost spanning tree, find the knapsack packing of greatest value. For rather technical reasons, most NP-complete problems that we will discuss will be phrased as decision problems. A problem is called a decision problem if its output is a simple yes or no (or you may think of this as True/False, 0/1, accept/reject).
We will phrase many optimization problems in terms of decision problems. For example, rather than asking what is the minimum number of colors needed to color a graph, we would phrase this as a decision problem: Given a graph G and an integer k, is it possible to color G with k colors? Of course, if you could answer this decision problem, then you could determine the minimum number of colors by trying all possible values of k (or if you were more clever, you would do a binary search on k), as in the sketch below.
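Here is a minimal sketch of that idea in the same pseudocode style (the decision subroutine Colorable(G, k) is hypothetical, standing in for whatever procedure answers the decision problem above); it simply scans k upward until the answer is yes.

Minimum Colors from a Decision Subroutine
Min-Colors(graph G) {
    for k = 1 to n do                     // n = number of vertices; every graph is n-colorable
        if (Colorable(G, k)) return k     // the smallest k for which the answer is yes
}

A binary search over k would reduce the number of calls from O(n) to O(log n).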
One historical artifact of NP-completeness is that problems are stated in terms of language-recognition
problems. This is because the theory of NP-completeness grew out of automata and formal language
theory. We will not be taking this approach, but you should be aware that if you look in the book, it
will often describe NP-complete problems as languages.
Definition: Define P to be the set of all decision problems that can be solved in polynomial time.
NP and Polynomial Time Verification: Before talking about the class of NP-complete problems, it is important to introduce the notion of a verification algorithm. Many language recognition problems may be very hard to solve, but they have the property that it is easy to verify whether a given answer is correct.
Consider the following problem, called the undirected Hamiltonian cycle problem (UHC): Given an undirected graph G, does G have a cycle that visits every vertex exactly once?
An interesting aspect of this problem is that if the graph did contain a Hamiltonian cycle, then it would be easy for someone to convince you that it did. They would simply say "the cycle is ⟨v_3, v_7, v_1, . . . , v_13⟩". We could then inspect the graph, and check that this is indeed a legal cycle and that it visits all the vertices of the graph exactly once. Thus, even though we know of no efficient
Figure 35: Undirected Hamiltonian cycle (a nonhamiltonian graph and a Hamiltonian one).
way to solve the Hamiltonian cycle problem, there is a very efficient way to verify that a given graph is Hamiltonian. The given cycle is called a certificate. This is some piece of information which allows us to verify that a given string is in a language. If it is possible to verify the accuracy of a certificate for a problem in polynomial time, we say that the problem is polynomial time verifiable.
Note that not all languages have the property that they are easy to verify. For example, consider the problem of determining whether a graph has exactly one Hamiltonian cycle. It would be easy for someone to convince you that it has at least one, but it is not clear what someone (no matter how smart) would say to you to convince you that there is not another one.
Definition: Define NP to be the set of all decision problems that can be verified by a polynomial time algorithm.
Beware that polynomial time verification and polynomial time solvable are two very different concepts.
The Hamiltonian cycle problem is NP-complete, and so it is widely believed that there is no polynomial
time solution to the problem.
Why is the set called NP rather than VP? The original term NP stood for "nondeterministic polynomial time". This referred to a program running on a nondeterministic computer that can make guesses. Basically, such a computer could nondeterministically guess the value of a certificate, and then verify that the string is in the language in polynomial time. We have avoided introducing nondeterminism here. It would be covered in a course on complexity theory or formal language theory.
Like P, NP is a set of languages based on some complexity measure (the complexity of verification). Observe that P ⊆ NP. In other words, if we can solve a problem in polynomial time, then we can certainly verify that an answer is correct in polynomial time. (More formally, we do not even need to see a certificate to solve the problem; we can solve it in polynomial time anyway.)
However it is not known whether P = NP. It seems unreasonable to think that this should be so. In other words, just being able to verify that you have a correct solution does not help you in finding the actual solution very much. Most experts believe that P ≠ NP, but no one has a proof of this.
NP-Completeness: We will not give a formal definition of NP-completeness. (This is covered in the text, and higher level courses such as 451.) For now, think of the set of NP-complete problems as the hardest problems to solve in the entire class NP. There may be even harder problems to solve that are not in the class NP. These are called NP-hard problems.
One question is how we can make the notion of hardness mathematically formal. This is where the concept of a reduction comes in. We will describe this next.
Reductions: Before discussing reductions, let us rst consider the following example. Suppose that there
are two problems, A and B. You know (or you strongly believe at least) that it is impossible to solve
Figure 36: Relationship between P, NP, and NP-complete (easy to harder: P, NP, NP-complete, NP-hard).
problem A in polynomial time. You want to prove that B cannot be solved in polynomial time. How
would you do this?
We want to show that

    (A ∉ P) ⇒ (B ∉ P).

To do this, we could prove the contrapositive,

    (B ∈ P) ⇒ (A ∈ P).
In other words, to show that B is not solvable in polynomial time, we will suppose that there is an
algorithm that solves B in polynomial time, and then derive a contradiction by showing that A can be
solved in polynomial time.
How do we do this? Suppose that we have a subroutine that can solve any instance of problem B in
polynomial time. Then all we need to do is to show that we can use this subroutine to solve problem A
in polynomial time. Thus we have reduced problem A to problem B.
It is important to note here that this supposed subroutine is really a fantasy. We know (or strongly believe) that A cannot be solved in polynomial time, thus we are essentially proving that the subroutine cannot exist, implying that B cannot be solved in polynomial time.
Let us consider an example to make this clearer. It is a fact that the problem of determining whether an
undirected graph has a Hamiltonian cycle (UHC) is an NP-complete problem. Thus, there is no known
polynomial time algorithm, and in fact experts widely believe that no such polynomial time algorithm
exists.
Suppose your boss tells you that he wants you to find a polynomial time solution to a different problem, namely the problem of finding a Hamiltonian cycle in a directed graph (DHC). You think about this for a few minutes, and you convince yourself that this is not a reasonable request. After all, would allowing directions on the edges make this problem any easier? Suppose you and your boss both agree that the UHC problem (for undirected graphs) is NP-complete, and so it would be unreasonable for him to expect you to solve this problem. But he tells you that the directed version is easier. After all, by adding directions to the edges you eliminate the ambiguity of which direction to travel along each edge. Shouldn't that make the problem easier? The problem is, how do you convince your boss that he is making an unreasonable request (assuming your boss is willing to listen to logic)?
You explain to your boss: "Suppose I could find an efficient (i.e., polynomial time) solution to the DHC problem; then I'll show you that it would then be possible to solve UHC in polynomial time." In particular, you will use the efficient algorithm for DHC (which you still haven't written) as a subroutine to solve UHC. Since you both agree that UHC is not efficiently solvable, this means that this efficient subroutine for DHC must not exist. Therefore your boss agrees that he has given you an unreasonable task.
Here is how you might do this. Given an undirected graph G, create a directed graph G' by just replacing each undirected edge {u, v} with two directed edges, (u, v) and (v, u). Now, every simple path in G is a simple path in G', and vice versa; in particular, G has a Hamiltonian cycle if and only if G' does. Now, if you could develop an efficient solution to the DHC problem, you could use this algorithm and this little transformation to solve the UHC problem. Here is your algorithm for solving the undirected Hamiltonian cycle problem. You take the undirected graph G, convert it to an equivalent directed graph G' (by edge-doubling), and then call your (supposed) algorithm for directed Hamiltonian cycles. Whatever answer this algorithm gives, you return it as the answer for the undirected Hamiltonian cycle.
UHC to DHC Reduction
bool Undir_Ham_Cycle(graph G) {
create digraph G' with the same number of vertices as G
for each edge {u,v} in G {
add edges (u,v) and (v,u) to G'
}
return Dir_Ham_Cycle(G')
}
You would now have a polynomial time algorithm for UHC. Since you and your boss both agree that
this is not going to happen soon, he agrees to let you off.
Figure 37: Directed Hamiltonian cycle reduction (a nonhamiltonian and a Hamiltonian example).
Notice that neither problem UHC nor DHC has been solved. You have just shown how to convert a solution to DHC into a solution for UHC. This is called a reduction and is central to NP-completeness.
Lecture 28: NP-Completeness and Reductions
(Thursday, May 7, 1998)
Read: Chapt 36, through Section 36.4.
Summary: Last time we introduced a number of concepts, on the way to defining NP-completeness. In particular, the following concepts are important.
Decision Problems: are problems for which the answer is either yes or no. The classes P and NP are defined as classes of decision problems.
P: is the class of all decision problems that can be solved in polynomial time (that is, O(n^k) for some constant k).
NP: is defined to be the class of all decision problems that can be verified in polynomial time. This means that if the answer to the problem is yes then it is possible to give some piece of information that would allow someone to verify that this is the correct answer in polynomial time. (If the answer is no then no such evidence need be given.)
Reductions: Last time we introduced the notion of a reduction. Given two problems A and B, we say that A is polynomially reducible to B if, given a polynomial time subroutine for B, we can use it to solve A in polynomial time. (Note: This definition differs somewhat from the definition in the text, but it is good enough for our purposes.) When this is so we will express this as

    A ≤_P B.
The operative word in the denition is if. We will usually apply the concept of reductions to problems
for which we strongly believe that there is no polynomial time solution.
Some important facts about reductions are:
Lemma: If A ≤_P B and B ∈ P then A ∈ P.
Lemma: If A ≤_P B and A ∉ P then B ∉ P.
Lemma: (Transitivity) If A ≤_P B and B ≤_P C then A ≤_P C.
The first lemma is obvious from the definition. To see the second lemma, observe that B cannot be in P, since otherwise A would be in P by the first lemma, giving a contradiction. The third lemma takes a bit of thought. It says that if you can use a subroutine for B to solve A in polynomial time, and you can use a subroutine for C to solve B in polynomial time, then you can use the subroutine for C to solve A in polynomial time. (This is done by replacing each call to B with its appropriate subroutine calls to C.)
NP-completeness: Last time we gave the informal definition that the NP-complete problems are the hardest problems in NP. Here is a more formal definition in terms of reducibility.
Definition: A decision problem B ∈ NP is NP-complete if

    A ≤_P B for all A ∈ NP.

In other words, if you could solve B in polynomial time, then every other problem A in NP would be solvable in polynomial time.
We can use transitivity to simplify this.
Lemma: B is NP-complete if
(1) B ∈ NP and
(2) A ≤_P B for some NP-complete problem A.
Thus, if you can solve B in polynomial time, then you could solve A in polynomial time. Since A is NP-complete, you could solve every problem in NP in polynomial time.
Example: 3-Coloring and Clique Cover: Let us consider an example to make this clearer. Consider the
following two graph problems.
3-coloring (3COL): Given a graph G, can each of its vertices be labeled with one of 3 different colors, such that no two adjacent vertices have the same label?
Clique Cover (CC): Given a graph G and an integer k, can the vertices of G be partitioned into k subsets, V_1, V_2, . . . , V_k, such that ∪_i V_i = V, and each V_i is a clique of G?
Figure 38: 3-coloring and Clique Cover (a clique cover with k = 3, a 3-colorable graph, and a graph that is not 3-colorable).
Recall that a clique is a subset of vertices such that every pair of vertices in the subset is adjacent to each other.
3COL is a known NP-complete problem. Your boss tells you that he wants you to solve the CC problem. You suspect that the CC problem is also NP-complete. How do you prove this to your boss?
There are two things to be shown, first that the CC problem is in NP. We won't worry about this, but it is pretty easy. (In particular, to convince someone that a graph has a clique cover of size k, just specify what the k subsets are. Then it is an easy matter to verify that they form a clique cover.)
The second item is to show that a known NP-complete problem (we will choose 3COL) is polynomially
reducible to CC. To do this, you assume that you have access to a subroutine CliqueCover(G,
k). Given a graph G and an integer k, this subroutine returns true if G has a clique cover of size k
and false otherwise, and furthermore, this subroutine runs in polynomial time. How can we use this
alleged subroutine to solve the well-known hard 3COL problem? We want to write a polynomial time
subroutine for 3COL, and this subroutine is allowed to call the subroutine CliqueCover(G,k) for
any graph G and any integer k.
Let's see in what respect the two problems are similar. Both problems involve dividing the vertices up into groups. In the clique cover problem, for two vertices to be in the same group they must be adjacent to each other. In the 3-coloring problem, for two vertices to be in the same color group, they must not be adjacent. In some sense, the problems are almost the same, but the requirement adjacent/non-adjacent is exactly reversed.
Recall that if G is a graph, then G̅ is the complement graph, that is, a graph with the same vertex set, but in which edges and nonedges have been swapped. The main observation is that a graph G is 3-colorable if and only if its complement G̅ has a clique cover of size k = 3. We'll leave the proof as an exercise.
Using this fact, we can reduce the 3-coloring problem to the clique cover problem as follows. Remember that this means that, if we had a polynomial time procedure for the clique cover problem, then we could use it as a subroutine to solve the 3-coloring problem. Given the graph we want to compute the 3-coloring for, we take its complement and then invoke the clique cover subroutine, setting k = 3.
3COL to CC Reduction
bool 3Colorable(graph G) {
let GC = complement(G) // GC is the complement graph of G
return CliqueCover(GC, 3)
}
There are a few important things to observe here. First, we never needed to implement the CliqueCover() procedure. Remember, these are all "what if" games. If we could solve CC in polynomial time, then we could solve 3COL. But since we know that 3COL is hard to solve, this means that CC is also hard to solve.
Figure 39: Clique covers in the complement (for k = 3: a 3-colorable G whose complement G̅ has a clique cover, and a non-3-colorable G whose complement does not).
A second thing to observe is the direction of the reduction. In normal reductions, you reduce the problem that you do not know how to solve to one that you do know how to solve. But in NP-completeness proofs, we do not know how to solve either problem (and indeed we are trying to show that an efficient solution does not exist). Thus the direction of the reduction is naturally backwards. You reduce the known problem to the problem you want to show is NP-complete. Remember this! It is quite counterintuitive.
Remember: Always reduce the known NP-complete problem to the problem you want to prove is NP-complete.
The final thing to observe is that the reduction really didn't attempt to solve the problem at all. It just tried to make one problem look more like the other problem. A reductionist might go so far as to say that there really is only one NP-complete problem. It has just been dressed up to look differently.
Example: Hamiltonian Path and Hamiltonian Cycle: Let's consider another example. We have seen the Hamiltonian Cycle (HC) problem (Given a graph, does it have a cycle that visits every vertex exactly once?). Another variant is the Hamiltonian Path (HP) problem (Given a graph, does it have a simple path that visits every vertex exactly once?).
Suppose that we know that the HC problem is NP-complete, and we want to show that the HP problem is NP-complete. How would we do this? First, remember what we have to show, that a known NP-complete problem is reducible to our problem. That is, HC ≤_P HP. In other words, suppose that we had a subroutine that could solve HP in polynomial time. How could we use it to solve HC in polynomial time?
Here is a first attempt (that doesn't work). First, if a graph has a Hamiltonian cycle, then it certainly must have a Hamiltonian path (by simply deleting any edge on the cycle). So if we just invoke the HamPath subroutine on the graph and it returns "no" then we can safely answer "no" for HamCycle. However, if it answers "yes" then what can we say? Notice that there are graphs that have a Hamiltonian path but no Hamiltonian cycle (as shown in the figure below). Thus this will not do the job.
Here is our second attempt (but this will also have a bug). The problem is that cycles and paths are
different things. We can convert a cycle to a path by deleting any edge on the cycle. Suppose that the
graph G has a Hamiltonian cycle. Then this cycle starts at some first vertex u, then visits all the other vertices until coming to some final vertex v, and then comes back to u. There must be an edge {u, v} in the graph. Let's delete this edge so that the Hamiltonian cycle is now a Hamiltonian path, and then invoke the HP subroutine on the resulting graph. How do we know which edge to delete? We don't, so we could try them all. Then if the HP algorithm says "yes" for any deleted edge we would say "yes" as well.
However, there is a problem here as well. It was our intention that the Hamiltonian path start at u and end at v. But when we call the HP subroutine, we have no way to enforce this condition. If HP says "yes", we do not know that the HP started with u and ended with v. We cannot look inside the subroutine or modify the subroutine. (Remember, it doesn't really exist.) We can only call it and check its answer.
Figure 40: Hamiltonian cycle to Hamiltonian path attempts (first attempt, second attempt, and the correct reduction).
So is there a way to force the HP subroutine to start the path at u and end it at v? The answer is yes,
but we will need to modify the graph to make this happen. In addition to deleting the edge from u to
v, we will add an extra vertex x attached only to u and an extra vertex y attached only to v. Because
these vertices have degree one, if a Hamiltonian path exists, it must start at x and end at y.
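Here is a sketch of this reduction in the same style as before (Ham_Path is the assumed polynomial time subroutine for HP, and the graph-editing steps are written informally); the walkthrough in the next paragraph spells out the same idea.

HC to HP Reduction Sketch
bool Ham_Cycle_via_HP(graph G) {
    for each edge {u,v} in G {                    // guess the last edge of the cycle
        let G' = G with edge {u,v} deleted
        add new vertex x and edge {x,u} to G'     // degree-1 vertices force any
        add new vertex y and edge {y,v} to G'     // ...Hamiltonian path to run from x to y
        if (Ham_Path(G')) return true             // the path must be x, u, ..., v, y
    }
    return false
}

If G has a Hamiltonian cycle, the call succeeds for the edge {u, v} that closes the cycle; conversely, any Hamiltonian path in G' must run from x to y, and putting the edge {u, v} back closes it into a Hamiltonian cycle of G.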
This last reduction is the one that works. Here is how it works. Given a graph G for which we want to determine whether it has a Hamiltonian cycle, we go through all the edges one by one. For each edge {u, v} (hoping that it will be the last edge on a Hamiltonian cycle) we create a new graph by deleting this edge and adding vertex x onto u and vertex y onto v. Let the resulting graph be called G'. Then we invoke our Hamiltonian Path subroutine to see whether G'