
Algorithms

Freely using the textbook by Cormen, Leiserson, Rivest, Stein


Péter Gács
Computer Science Department
Boston University
Fall 2010
The class structure
See the course homepage.
In the notes, section numbers and titles generally refer to the book:
CLRS: Introduction to Algorithms, third edition.
Algorithms
Computational problem example: sorting.
Input, output, instance.
Algorithm example: insertion sort.
Algorithm 2.1: INSERTION-SORT(A)
1 for j ← 2 to A.size do
2   key ← A[j]
    // Insert A[j] into the sorted sequence A[1..j-1]
3   i ← j - 1
4   while i > 0 and A[i] > key do
5     A[i+1] ← A[i]
6     i--
7   A[i+1] ← key
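As a concrete illustration of the same pseudocode, here is a minimal Python sketch (0-indexed; the function name and the test list are my own choices, not part of the notes):

def insertion_sort(a):
    """Sort list a in place, mirroring Algorithm 2.1 (0-indexed)."""
    for j in range(1, len(a)):
        key = a[j]
        # Insert a[j] into the already sorted prefix a[0..j-1].
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]   # shift larger elements one slot to the right
            i -= 1
        a[i + 1] = key
    return a

if __name__ == "__main__":
    print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]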
Pseudocode conventions
Assignment
Indentation
Objects Arrays are also objects.
Meaning of A[3. . 10].
Parameters Ordinary parameters are passed by value.
Objects are always treated like a pointer to the body of data, which itself is not copied.
Analyzing algorithms
By size of a problem, we will mean the size of its input (measured
in bits).
Cost as function of input size. In a problem of fixed size, the cost
will be influenced by too many accidental factors: this makes it
hard to draw lessons applicable to future cases.
Our model for computation cost: Random Access Machine. Until
further refinement, one array access is assumed to take a constant
amount of time (independent of input size or array size).
Resources: running time, memory, communication (in case of
several processors or actors).
Estimating running time
When a program line involves only a simple instruction we
assume it has constant cost. The main question is the number
of times it will be executed. It is better to think in terms of
larger program parts than in terms of program lines.
If a part is a loop then the time is the overhead plus the sum of
the execution times for each repetition.
If a part is a conditional then the time depends on which branch
is taken.
And so on.
In the example, assume:
  c_1 is the cost of comparison.
  c_2 is the cost of copying.
Ignore everything else.
The cost of iteration j is t_j: it depends on how many times the
condition A[i] > key of the while loop is satisfied.
Leading term: Σ_j t_j = Θ(n²).
Data structures
In the insertion sort, every time A[i] > key is found, two
assignments are made. So we perform 2 comparisons (cost c_1) and
2 assignments (cost c_2). But the bound c_2 papers over some
differences. The assignment
  A[i+1] ← A[i]
could be costly (who knows how much data is in A[i]). Even
without changing the algorithm, choosing the way of storing the
data can influence this cost significantly. Improvements:
  A[i] should not contain the actual data if it is large, only the
  address of the place where it can be found (a link).
  Instead of an array, use a linked list. Then insertion does not
  involve pushing back everything above.
(Figure: the array entries A[1], A[2], ..., A[n] hold links to the actual data records.)
The array contains links to the actual data, so the copying during
sorting has a fixed low cost.
When moving an element to a different place in a linked list, most
intermediate elements are not touched.
Worst case, average case
Worst case We estimated the largest cost of an algorithm for a
given input size. This is what we will normally do.
Average case What does this mean?
The notion of a random, or typical, input is problematic.
But we can introduce random choices into our algorithm, by
a process called randomization. We will see examples
where this can give an average performance significantly
better than the worst-case performance, on all inputs.
Strong and weak domination
For the insertion sort, in first approximation, we just want to know
that its worst-case cost grows as n², where n is the file size. In
your introductory courses CS112 and CS131 you have already seen
the basic notions needed to talk about this "grows as". Here, I
collect some of this information: more is found here than what I have
time for initially in class.
Rough comparison of functions
f(n) ≺ g(n) means lim_{n→∞} f(n)/g(n) = 0: in words, g(n) grows
faster than f(n). Other notation:
  f(n) = o(g(n))  ⟺  f(n) ≺ g(n)  ⟺  g(n) = ω(f(n)).
Example: n - 4 ≻ 116√n + 80. We may also write
  116√n + 80 = o(n).
Generally, when we write f(n) = o(g(n)) then g(n) has a simpler
form than f(n) (this is the point of the notation).
f(n) ≼ g(n) means sup_n f(n)/g(n) < ∞, that is f(n) ≤ c·g(n) for
some constant c. Other (the common) notation:
  f(n) = O(g(n))  ⟺  f(n) ≼ g(n)  ⟺  g(n) = Ω(f(n)).
(The notation ≼ is mine, you will not find it in your books.)
This is a preorder. If f ≼ g and g ≼ f then we write f ≍ g,
f = Θ(g), and say that f and g have the same rate of growth.
Example: n² - 5n and 100n(n+2) have the same rate of growth.
We can also write
  100n(n+2) = O(n²),   100n(n+2) = Θ(n²).
On the other hand, n + √n = O(n²) but not Θ(n²).
Important special cases
O(1) denotes any function that is bounded by a constant, for
example (1 + 1/n)^n = O(1).
o(1) denotes any function that converges to 0 as n → ∞. For
example, another way of writing Stirling's formula is
  n! = (n/e)^n √(2πn) (1 + o(1)).
Not all pairs of functions are comparable
Here are two functions that are not comparable. Let f(n) = n², and
for k = 0, 1, 2, ... we define g(n) recursively as follows. Let n_k be
the sequence defined by n_0 = 1, n_{k+1} = 2^{n_k}. So, n_1 = 2, n_2 = 4,
n_3 = 16, and so on. For k = 1, 2, ... let
  g(n) = n_{k+1}   if n_k < n ≤ n_{k+1}.
So,
  g(n_k + 1) = n_{k+1} = 2^{n_k} = g(n_k + 1) = g(n_k + 2) = ... = g(n_{k+1}).
This gives g(n) = 2^{n-1} for n = n_k + 1 and g(n) = n for n = n_{k+1}.
Function g(n) is sometimes much bigger than f(n) = n² and
sometimes much smaller: these functions are incomparable for
≺ and ≼.
Some function classes
Important classes of increasing functions of n:
Linear functions: (bounded by) c·n for arbitrary constant c.
Polynomial functions: (bounded by) n^c for some constant c > 0,
for n ≥ 2.
Exponential functions: those (bounded by) c^n for some
constant c > 1.
Logarithmic functions: (bounded by) c·log n for arbitrary
constant c. Note: If a function is logarithmic with log_2 then it is
also logarithmic with log_b for any b, since
  log_b x = (log_2 x)/(log_2 b) = (log_2 x)(log_b 2).
These are all equivalence classes under ≍.
Some simplification rules
Addition: take the maximum, that is if f = O(g) then
f + g = O(g). Do this always to simplify expressions.
Warning: do it only if the number of terms is constant! This is
wrong: n + n + ... (n times) ... + n ≠ O(n).
f(n)^{g(n)} is generally worth rewriting as 2^{g(n) log f(n)}. For
example, n^{log n} = 2^{(log n)(log n)} = 2^{log² n}.
But sometimes we make the reverse transformation:
  3^{log n} = 2^{(log n)(log 3)} = (2^{log n})^{log 3} = n^{log 3}.
The last form is the most meaningful, showing that this is a
polynomial function.
Examples
n/log log n + log² n ≍ n/log log n.
Indeed, log log n ≺ log n ≺ n^{1/2}, hence
  n/log log n ≻ n^{1/2} ≻ log² n.
Order the following functions by growth rate:
  n² - 3 log log n ≍ n²,
  log n / n,
  log log n,
  n log² n,
  3 + 1/n ≍ 1,
  √(5n)/2^n,
  (1.2)^{n-1} + √n + log n ≍ (1.2)^n.
Solution:
  √(5n)/2^n ≺ log n / n ≺ 1 ≺ log log n
  ≺ n/log log n ≺ n log² n ≺ n² ≺ (1.2)^n.
Sums of series
You must know the following three sums:
Arithmetic series 1 + 2 + 3 + ... + n = n(n+1)/2.
Geometric series 1 + q + q² + ... + q^{n-1} = (1 - q^n)/(1 - q).
Infinite geometric series If |q| < 1 then 1 + q + q² + ... = 1/(1 - q).
Simplification of sums
For rates of growth, the following is more important:
Geometric series grows as fast as its largest element:
  6 + 18 + ... + 2·3^n ≍ 3^n.
Even more true of series growing faster, say,
  1! + 2! + ... + n! ≍ n!.
Sum of n^c (for example arithmetic series) For rate of growth,
replace each term with the maximal one:
  2² + 5² + 8² + ... + (2 + 3n)² ≼ (n+1)(2 + 3n)² ≍ n³.
Even more true of a series growing slower:
  log n! = log 2 + log 3 + ... + log n ≼ n log n.
Let us derive formally, say, 1² + 2² + ... + n² ≍ n³. The upper bound
is easy.
Lower bound, with k = ⌊n/2⌋:
  1² + ... + n² ≥ k² + (k+1)² + ... + n² ≥ (n/2 - 1)(n/2)² ≍ n³.
Infinite series
We will prove the following, via rough estimates:
  1/3 + 2/3² + 3/3³ + 4/3⁴ + ... < ∞.
Since any exponentially growing function grows faster than the
linear function, we know n ≼ 3^{n/2}. Therefore
  n·3^{-n} ≼ 3^{n/2}·3^{-n} = 3^{-n/2}, and the whole sum is
  ≼ 1 + q + q² + ... = 1/(1 - q)   where q = 3^{-1/2}.
Another example:
  1 + 1/2 + 1/3 + ... + 1/n = Θ(log n).
Indeed, for n = 2^{k-1}, upper bound:
  1 + 1/2 + 1/3 + 1/4 + ... ≤ 1 + 1/2 + 1/2 + 1/4 + 1/4 + 1/4 + 1/4 + 1/8 + ...
  = 1 + 1 + ... + 1 (k times).
Lower bound:
  1 + 1/2 + 1/3 + 1/4 + ... ≥ 1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + 1/16 + ...
  = 1/2 + 1/2 + ... + 1/2 (k times).
Divide and conquer
An efficient algorithm can frequently be obtained using the following
idea:
1. Divide into subproblems of equal size.
2. Solve subproblems.
3. Combine results.
In order to handle subproblems, a more general procedure is often
needed.
Merge sort
1. Subproblems: sorting A[1..n/2] and A[n/2+1..n].
2. Sort these.
3. Merge the two sorted arrays of size n/2.
The more general procedures now are the ones that sort and merge
arbitrary parts of an array.
Algorithm 3.1: MERGE(A, p, q, r)
Merges A[p..q] and A[q+1..r].
1 n_1 ← q - p + 1; n_2 ← r - q
2 create arrays L[1..n_1+1] and R[1..n_2+1]
3 for i ← 1 to n_1 do L[i] ← A[p + i - 1]
4 for j ← 1 to n_2 do R[j] ← A[q + j]
5 L[n_1+1] ← ∞; R[n_2+1] ← ∞
6 i ← 1, j ← 1
7 for k ← p to r do
8   if L[i] ≤ R[j] then A[k] ← L[i]; i++
9   else A[k] ← R[j]; j++
Why the ∞ business? These sentinel values allow us to avoid an extra
part for the case that L or R is exhausted. This is also why we
used new arrays for the input, rather than the output.
Algorithm 3.2: MERGE-SORT(A, p, r)
Sorts A[p..r].
1 if p < r then
2   q ← ⌊(p + r)/2⌋
3   MERGE-SORT(A, p, q); MERGE-SORT(A, q+1, r)
4   MERGE(A, p, q, r)
Analysis, for the worst-case running time T(n):
  T(n) ≤ c_1                 if n = 1,
  T(n) ≤ 2T(n/2) + c_2·n     otherwise.
Draw a recursion tree based on this inequality.
Recursion tree
(Figure: the recursion tree; level 0 has one problem of size n, level 1 has two of size n/2, level 2 has four of size n/4, and so on; next to each level the total work on that level is indicated.)
Assume n = 2^k:
  c_2·n·(1 + 2·(1/2) + 4·(1/4) + ... + 2^k·2^{-k}) + c_1·n
  = c_2·n·k + c_1·n = n(c_2 log n + c_1) = O(n log n).
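A minimal Python sketch of the same scheme, using sentinel values exactly as in MERGE above (the function names and the test list are mine, not from the notes):

INF = float("inf")

def merge(a, p, q, r):
    """Merge the sorted slices a[p..q] and a[q+1..r] (inclusive), using sentinels."""
    left = a[p:q + 1] + [INF]
    right = a[q + 1:r + 1] + [INF]
    i = j = 0
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]; i += 1
        else:
            a[k] = right[j]; j += 1

def merge_sort(a, p=0, r=None):
    """Recursive merge sort on a[p..r]."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = (p + r) // 2
        merge_sort(a, p, q)
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)
    return a

if __name__ == "__main__":
    print(merge_sort([38, 27, 43, 3, 9, 82, 10]))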
Nonrecursive version
Perform first the jobs at the bottom level, then those on the next
level, and so on.
(Figure: in pass k, pairs of sorted pieces of length 2^{k-1} are merged into pieces of length 2^k; pass k+1 merges those again.)
Sorting without random access
What if we have to sort such a large list A that it does not fit
into random access memory?
If we are at a position p we can inspect or change A[p], but in
one step, we can only move to position p+1 or p-1.
Note that the Algorithm MERGE(A, p, q, r) passed through arrays
A, L, R in one direction only, not jumping around. But both the
recursive and nonrecursive MERGE-SORT(A) use random access.
We will modify the nonrecursive MERGE-SORT(A) to work
without random access.
Runs
For position p in array A, let p′ ≥ p be as large as possible with
  A[p] ≤ A[p+1] ≤ ... ≤ A[p′].
The sub-array A[p..p′] will be called a run.
Our algorithm will consist of a number of passes. Each pass goes
through array A and merges consecutive runs: namely it
  performs MERGE(A, p, q, r) where p = 1, q = p′, r = (q+1)′,
  does the same starting with p ← r + 1,
  and so on.
Algorithm 3.3: MERGE-RUNS(A)
Merge consecutive runs in A, return the new number of runs.
1 n ← A.size; r ← 0; k ← 1
2 while r < n do
3   p ← r + 1; q ← p
4   while q < n and A[q+1] ≥ A[q] do q++   // q ← p′
5   if q = n then r ← n else
6     r ← q + 1; k++
7     while r < n and A[r+1] ≥ A[r] do r++
8     MERGE(A, p, q, r)
9 return k
Algorithm 3.4: DISK-SORT(A)
Sort without random access.
1 repeat k ← MERGE-RUNS(A) until k = 1
Each MERGE-RUNS(A) roughly halves the number of runs.
If you view the arrays as being on tapes, the algorithm still uses
too many rewind operations to return from the end of the tape
to the beginning. In an exercise, I will ask you to modify
DISK-SORT(A) (and MERGE-RUNS(A)) so as to eliminate almost
all rewinds.
Lower bound on sorting
Lower bounds are difficult, in general: it is hard to prove that every
algorithm needs at least time T(n) in the worst case of our
problem, on inputs of length n.
We do not have such a result even for sorting.
Frequently, we show the lower bound only for algorithms
belonging to a specific class.
In case of sorting, there is a natural class: algorithms that get
information about the array elements only in one way:
comparing them to each other.
They do not look at details of an element: no arithmetical
operations on elements, or looking at individual bits.
We will only count the number of comparisons used.
Decision tree
From the work of any such algorithm, we can derive a decision
tree.
(Figure: the decision tree for sorting three elements a, b, c; the internal nodes are comparisons such as a < b?, a < c?, b < c? with Y/N branches, and the leaves are the six possible orders a < b < c, a < c < b, c < a < b, b < a < c, b < c < a, c < b < a.)
Each execution of the algorithm gives a downward path in the tree.
Worst-case time is lower-bounded by longest downward path
(height) in this binary tree.
Theorem
A tree of height h has at most 2^h leaves.
Indeed, when we increase the height by 1, each leaf can give rise to
at most two children, so the number of leaves can at most double.
Corollary
If a tree has L leaves, then its height is at least log L.
Application to sorting
The number of leaves is at least n!. Hence the height is at least
log n!: in fact, it is at least ⌈log n!⌉.
  log n! = log 2 + log 3 + ... + log n ≤ n log n.
We want a lower bound. Let us show that for n large enough, this is
≥ 0.8·n log n.
  log 2 + log 3 + ... + log n
  ≥ log(n/10) + log(n/10 + 1) + ... + log n
  ≥ 0.9n·log(n/10) = 0.9n(log n - log 10) ≥ 0.8·n log n.
Of course, we could have written 0.99 in place of 0.8 in this result.
Theorem
Sorting n items needs at least
  log n! = n log n·(1 + o(1))
comparisons in the worst case.
Average case
Average case? We need a lower bound on the average pathlength
in a binary tree with a given number L of leaves. This seems hard
to compute, but we only need a lower bound! Call a binary tree
optimal, if its average pathlength is as small as possible for its
number of leaves.
Claim
In an optimal binary tree of height h, every path has
length h-1 or h: hence, the average path length is ≥ ⌊log L⌋.
Indeed, if there is any shorter path to some leaf l, take a leaf l′ with
a longest path, and move it under l. This decreases the average
pathlength.
Corollary
The average number of comparisons on a random list
with n elements is ≥ ⌊log n!⌋.
Information
Here is an intuitive retelling of the above argument.
If all we know about an object x is that it belongs to some known
set E of size m, we say that we have an uncertainty of size log m (m
bits). Each comparison gives us at most 1 bit of information
(decrease of uncertainty) about the permutation in question. Since
there are n! permutations, our uncertainty is log n!, so we need
that many bits of information to remove it.
Finding the maximum
Here is another decision-tree lower bound, that is not
information-theoretic.
Problem Given a list of n numbers, find the largest element.
Solution (You know it: one pass.)
Other solution Tournament.
Cost n - 1 comparisons in both cases.
Question Is this optimal?
The information-theoretical lower bound is only log n.
Theorem
We need at least n - 1 comparisons in order to find
the maximum.
Indeed, every non-maximum element must participate in a
comparison that shows it is smaller than the one it is compared to.
Listing permutations
Here is another algorithm that is guaranteed to take a long time:
list all permutations of an array.
Why is it taking long? For a trivial reason: the output is long (n!·n
items).
How to implement? It is not so easy to organize going through all
possibilities. We will use recursion: similar techniques will help
with other problems involving exhaustive search.
In what order to list? Lexicographic order: if a_1...a_k...a_n and
b_1...b_k...b_n are permutations, a_i = b_i for i < k and a_k < b_k,
then a_1...a_k...a_n comes first.
Algorithm 4.1: INSERT(A, i, j)
1 x ← A[i]
2 if i < j then for k ← i+1 to j do A[k-1] ← A[k]
3 else if j < i then for k ← i-1 downto j do A[k+1] ← A[k]
4 A[j] ← x
Algorithm 4.2: LIST-PERMS-LEX(A, p)
List, in lexicographic order, all permutations of A that do not
change A[1..p-1]. Leave A in the original order.
1 if p = n then print A else
2   LIST-PERMS-LEX(A, p+1)
3   for i = p+1 to n do
4     INSERT(A, i, p); LIST-PERMS-LEX(A, p+1); INSERT(A, p, i)
The solution below uses only swaps, but lists in some
non-lexicographic order.
Algorithm 4.3: SWAP(A, i, j)
1 x ← A[i]; A[i] ← A[j]; A[j] ← x
Algorithm 4.4: LIST-PERMS-SWAP(A, p)
List all permutations of A that do not change A[1..p-1].
Leave A in original order.
1 if p = n then print A else
2   LIST-PERMS-SWAP(A, p+1)
3   for i = p+1 to n do
4     SWAP(A, i, p); LIST-PERMS-SWAP(A, p+1); SWAP(A, p, i)
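Here is a minimal Python sketch of the swap-based version (0-indexed; naming is mine). It prints all n! permutations and restores the list afterwards:

def list_perms_swap(a, p=0):
    """Print all permutations of a that keep a[0..p-1] fixed (swap-based,
    non-lexicographic order); a is restored to its original order."""
    n = len(a)
    if p >= n - 1:
        print(a)
        return
    list_perms_swap(a, p + 1)
    for i in range(p + 1, n):
        a[i], a[p] = a[p], a[i]        # bring a[i] to position p
        list_perms_swap(a, p + 1)
        a[p], a[i] = a[i], a[p]        # swap back

if __name__ == "__main__":
    list_perms_swap([1, 2, 3])         # prints 6 permutations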
Priority queues
If we sort an array, then it becomes very easy to find the largest
element, the second largest, and so on. Sometimes we need a
data structure that just does this.
For this, we define the abstract data type, or object class, called
priority queue.
The priority queue object maintains a set S of elements, each
associated with a value (say, an integer) called a key.
Operations INSERT(S, x), MAXIMUM(S), EXTRACT-MAX(S).
Later INCREASE-KEY(x, k).
Applications Job scheduling in a multiuser operating system.
Event-driven simulation.
Later in this course: helping various algorithms to use data.
Implementations
Unordered array All operations but INSERT take Θ(n) steps.
Sorted array All but MAXIMUM take Θ(n) steps.
Heap (has nothing to do with the heaps of memory management)
MAXIMUM takes O(1) steps, the others O(log n).
Fibonacci heap (later) All operations cost O(log n), and all except
EXTRACT-MAX have amortized cost O(1). (See definition of amortized cost later.)
Heap
1. (Binary) tree with keys increasing towards the root:
   A[PARENT(i)] ≥ A[i].
2. A special array implementation of this tree:
   PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, RIGHT(i) = 2i+1.
Example (the levels of the tree, in array form):
  A[1]=15
  A[2]=12  A[3]=9
  A[4]=8  A[5]=2  A[6]=1  A[7]=7
  A[8]=6  A[9]=7  A[10]=1
Building and maintaining a heap
HEAPIFY(A, i) will be called by several of our methods. Let
  argmax_A(i, j, k) = i if A[i] ≥ A[j], A[k]; else j if A[j] ≥ A[k]; else k.
Algorithm 5.1: HEAPIFY(A, i)
Assuming the trees under LEFT(i), RIGHT(i) are heaps, make
the tree under i also a heap.
Implementation: float down A[i].
1 l ← LEFT(i); if l > A.heap-size then l ← i
2 r ← RIGHT(i); if r > A.heap-size then r ← i
3 largest ← argmax_A(i, l, r)
4 if largest ≠ i then
5   SWAP(A, i, largest)
6   HEAPIFY(A, largest)
Implementing EXTRACT-MAX(A):
  Put the last element into the root: A[1] ← A[A.heap-size]; A.heap-size--.
  HEAPIFY(A, 1).
Implementing INSERT(A, k):
  A.heap-size++; A[A.heap-size] ← k
  Float up A[A.heap-size].
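A minimal Python sketch of these three heap operations, using 0-indexing (so the children of i are 2i+1 and 2i+2); names are my own:

def heapify(a, i, heap_size):
    """Float a[i] down so the subtree rooted at i is a max-heap."""
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def extract_max(a):
    """Remove and return the maximum; a must be a nonempty max-heap."""
    top = a[0]
    a[0] = a[-1]
    a.pop()                       # shrink the heap
    heapify(a, 0, len(a))
    return top

def insert(a, key):
    """Append key and float it up to its place."""
    a.append(key)
    i = len(a) - 1
    while i > 0 and a[(i - 1) // 2] < a[i]:
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2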
Build a heap cheaper
The above implementation would be sufficient for sorting since we
can build a heap by repeated insertion, and then extract the
maximum repeatedly. But the following is better.
Algorithm 5.2: BUILD-HEAP(A)
Build a heap, assuming the elements are in the array A.
1 for i ← A.heap-size downto 1 do HEAPIFY(A, i)
The naive estimate for the cost is O(n log n), if n = A.heap-size. But
more analysis shows:
Theorem
The cost of algorithm BUILD-HEAP(A) on a heap of size
n is O(n).
The largest depth is D = ⌊log n⌋, starting with the root at depth 0.
For the ≤ 2^d elements i at depth d, the cost of HEAPIFY(A, i) is
≤ D - d. Total cost, converting the depth d to the height h = D - d:
  Σ_{d=0}^{D} 2^d (D - d) = Σ_{h=0}^{D} h·2^{D-h} = 2^D Σ_{h=0}^{D} h·2^{-h} ≤ n Σ_{h≥0} h·2^{-h}.
Though there is an exact formula for the last sum, a rough estimate
is more instructive for us. Since h ≼ 2^{h/2}, write, with some
constant c:
  Σ_{h≥0} h·2^{-h} ≤ c Σ_{h≥0} 2^{-h/2} < ∞,
as a geometric series with quotient < 1.
Abstract data types (objects)
Some pieces of data, together with some operations that can be
performed on them.
Externally, only the operations are given, and only these should
be accessible to other parts of the program.
The internal representation is private. It consists typically of a
number of records, where each record has a key and possibly
satellite data. We will ignore the satellite data unless they are
needed for the data manipulation itself.
Increasing a key
This operation (giving more priority to some element) is important
in some applications where the priority queue helps some
algorithm (say Dijkstra's shortest path algorithm).
Let the changed item float up:
Algorithm 5.3: INCREASE-KEY(A, i, key)
Increase A[i] to key > A[i].
1 A[i] ← key
2 while i > 1 and A[PARENT(i)] < A[i] do
3   SWAP(A, i, PARENT(i))
4   i ← PARENT(i)
The INSERT(A, key) operation can also be seen as a special case of
INCREASE-KEY(A, i, key).
Quicksort
Another way to apply divide and conquer to sorting. Both here
and in mergesort, we sort the two parts separately and combine the
results.
With Mergesort, we sort first two predetermined halves, and
combine the results later.
With Quicksort, we determine what elements fall into the two
halves (approximately) of the sorted result, using Partition.
Calling this recursively on both halves completes the sort.
Algorithm 6.1: PARTITION(A, p, r)
Partition A[p..r] into elements ≤ x = A[r] placed in p, ..., i,
into i+1 containing x, and the rest containing > x.
Return i+1.
1 x ← A[r]
2 i ← p - 1
3 for j ← p to r-1 do
4   if A[j] ≤ x then
5     i++; SWAP(A, i, j)
6 SWAP(A, i+1, r); return i+1
Example A = [2, 8, 7, 1, 3, 5, 6, 4].
Analysis Invariant: at the beginning of the loop,
  A[p..i] ≤ x,   A[i+1..j-1] > x.
Algorithm 6.2: QUICKSORT(A, p, r)
1 if p < r then
2   q ← PARTITION(A, p, r)
3   QUICKSORT(A, p, q-1); QUICKSORT(A, q+1, r)
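A minimal Python sketch of the same partition-and-recurse scheme (0-indexed, in-place; names are mine, not from the notes):

def partition(a, p, r):
    """Partition a[p..r] around the pivot x = a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a

if __name__ == "__main__":
    print(quicksort([2, 8, 7, 1, 3, 5, 6, 4]))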
Performance analysis
Worst case
Let T(n) be the worst-case number of comparisons. If
q = PARTITION(A, 1, n) then
  T(n) = T(q-1) + T(n-q) + n - 1.
This is not immediately helpful, since we do not know q. But q = n
is possible (say, the array was already sorted), giving
T(n) = T(n-1) + n - 1. Resolving the recursion shows that all
n(n-1)/2 comparisons are performed!
Balanced partitioning
Hypothetical partitioning algorithm that always gives
0.1n < q < 0.9n.
Analysis with a recursion tree gives ≼ n log n / log(10/9).
Approximate balanced partitioning by partitioning around a
random array element.
Random number generator RANDOM(a, b) returns a random,
uniformly distributed integer between a and b (b - a + 1
possibilities).
In practice, we only have a pseudo-random number generator.
This is a separate topic; for the moment let us hope it gives
numbers that behave just like true random ones.
Randomized Quicksort
Algorithm 6.3: RANDOMIZED-PARTITION(A, p, r)
1 SWAP(A, r, RANDOM(p, r))
2 return PARTITION(A, p, r)
Algorithm 6.4: RANDOMIZED-QUICKSORT(A, p, r)
1 if p < r then
2   q ← RANDOMIZED-PARTITION(A, p, r)
3   RANDOMIZED-QUICKSORT(A, p, q-1)
4   RANDOMIZED-QUICKSORT(A, q+1, r)
Expected value
In order to analyze randomized quicksort, we learn some
probability concepts. I assume that you have learned about random
variables already in an introductory course (Probability in
Computing, or its equivalent).
For a random variable X with possible values a_1, ..., a_n, its
expected value EX is defined as
  EX = a_1·P[X = a_1] + ... + a_n·P[X = a_n].
Example: if Z is a random variable whose values are the possible
outcomes of a toss of a 6-sided die, then
  EZ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
Example: If Y is the random variable that is 1 if Z ≥ 5, and 0
otherwise, then
  EY = 1·P[Z ≥ 5] + 0·P[Z < 5] = P[Z ≥ 5].
Sum theorem
Theorem
For random variables X, Y (on the same sample space):
  E(X + Y) = EX + EY.
Example
For the number X of spots on top after a toss of a die,
let A be the event 2 | X, and B the event X > 1. Dad gives me a
dime if A occurs and Mom gives one if B occurs. What is my
expected win?
Let I_A be the random variable that is 1 if A occurs and 0 otherwise.
  E(I_A + I_B) = E I_A + E I_B = P(A) + P(B) = 1/2 + 5/6 dimes.
Average-case analysis of Quicksort
Let the sorted order be z_1 < z_2 < ... < z_n. If i < j then let
  Z_ij = {z_i, z_{i+1}, ..., z_j}.
Let C_ij = 1 if z_i and z_j will be compared sometime during the sort,
and 0 otherwise.
The only comparisons happen during a partition, with the pivot
element. Let π_ij be the first (random) pivot element entering Z_ij. A
little thinking shows:
Lemma
We have C_ij = 1 if and only if π_ij ∈ {z_i, z_j}. Also, for
every x ∈ Z_ij, we have
  P[π_ij = x] = 1/(j - i + 1).
It follows that P[C_ij = 1] = E C_ij = 2/(j - i + 1). The expected number
of comparisons is, with k = j - i + 1:
  Σ_{1≤i<j≤n} E C_ij = Σ_{1≤i<j≤n} 2/(j - i + 1) = 2 Σ_{k=2}^{n} Σ_{i=1}^{n-k+1} 1/k
  = 2 Σ_{k=2}^{n} (n - k + 1)/k
  = 2( (n-1)/2 + (n-2)/3 + ... + 1/n )
  = 2(n+1)( 1 + 1/2 + 1/3 + ... + 1/n ) - 4n.
From analysis we know 1 + 1/2 + 1/3 + ... + 1/n = ln n + O(1). Hence
the average complexity is O(n log n).
Warning: We want to know also the probability of the cost
exceeding, say, 10 n log n. Analysis of the variance of the random
cost gives this, too.
Median and order statistics
The median of a list is the middle element (or the first of the
two middle ones, if there are two) in the sorted order. We will try to find it faster
than by full sorting.
More generally, we may want to find the ith element in the
sorted order. (This generalizes minimum, maximum and
median.) This is called selection.
We will see that all selection tasks can be performed in linear
time.
Selection in expected linear time
Algorithm 7.1: RANDOMIZED-SELECT(A, p, r, i)
Return the ith element in the sorted order of A[p..r].
1 if p = r then return A[p]
2 q ← RANDOMIZED-PARTITION(A, p, r)
3 k ← q - p + 1
4 if i = k then return A[q]
5 else if i < k then return RANDOMIZED-SELECT(A, p, q-1, i)
6 else return RANDOMIZED-SELECT(A, q+1, r, i-k)
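A minimal Python sketch of the same idea, written iteratively and working on a copy so the input is not modified (names are mine):

import random

def randomized_select(a, i):
    """Return the i-th smallest element of a (i = 1 gives the minimum)."""
    a = list(a)
    p, r = 0, len(a) - 1
    while True:
        if p == r:
            return a[p]
        # randomized partition around a random pivot in a[p..r]
        s = random.randint(p, r)
        a[s], a[r] = a[r], a[s]
        x, q = a[r], p - 1
        for j in range(p, r):
            if a[j] <= x:
                q += 1
                a[q], a[j] = a[j], a[q]
        q += 1
        a[q], a[r] = a[r], a[q]
        k = q - p + 1              # rank of the pivot within a[p..r]
        if i == k:
            return a[q]
        elif i < k:
            r = q - 1
        else:
            i -= k
            p = q + 1

if __name__ == "__main__":
    print(randomized_select([3, 1, 4, 1, 5, 9, 2, 6], 4))   # 4th smallest = 3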
Analysis
Worst-case running time is Θ(n²), even for finding the
minimum!
We will find an upper bound T(n) on the expected running
time of RANDOMIZED-SELECT(A, p, r, i), independently of i, when
r - p + 1 = n.
Assume that an is the cost of the partition and the decision.
Denoting by ξ_0 the running time of the whole process, and ξ_1
the running time of the recursive call, we have
  ξ_0 ≤ an + ξ_1,
  E ξ_0 ≤ an + E ξ_1.
Let Q = RANDOMIZED-PARTITION(A, p, r). Let us bound E ξ_1 under
the condition Q - p + 1 = k:
  If i = k, then ξ_1 = 0.
  Else if i < k then E ξ_1 ≤ T(k-1),
  Else E ξ_1 ≤ T(n-k).
  E(ξ_1 | Q - p + 1 = k) ≤ T(max(k-1, n-k)).
Since P[Q - p + 1 = k] = 1/n, we can now sum, using Bayes's
Theorem:
  E ξ_1 ≤ Σ_{k=1}^{n} (1/n) T(max(k-1, n-k)) ≤ (2/n) Σ_{k=n/2}^{n-1} T(k),
where we assumed n even, for simplicity.
We got the recursion T(n) ≤ an + (2/n) Σ_{k=n/2}^{n-1} T(k). We will prove by
induction T(n) ≤ cn for some c > 0. Assume it is true for < n, we
prove it for n (choosing c appropriately).
  (2/n) Σ_{k=n/2}^{n-1} ck = (2c/n)·(n/2)·((n/2 + n - 1)/2) ≤ (3/4)cn,
where we used the sum formula of an arithmetic series. Now
an + (3/4)cn ≤ cn if we make a + 0.75c ≤ c, that is c = 4a. For the
induction we also need the initial condition T(1) ≤ c·1, but this is
true with c = 4a.
Number-theoretic algorithms
Size of input Just as before, we count the number of input bits. If
the input is a positive integer x of 1000 bits, the input size is
1000 ≈ log x, and not x. So, counting from x down to 1 is
an exponential algorithm, as a function of the input size.
Cost of operations Most of the time, for cost, we still mean the
execution time of the computer, which has a fixed word size.
So, multiplying two very large numbers x, y is not one
operation, since the representation of the numbers x, y must
be spread over many words. We say that we measure the
number of bit operations.
Sometimes, though, we will work with a model in which we
just measure the number of arithmetic operations. This is
similar to the sorting model in which we only counted
comparisons.
Elementary number theory
Divisibility and divisors. Notation x | y.
Prime and composite numbers.
Theorem
If a, b are integers, b > 0, then there are unique
numbers q, r with
  a = qb + r, 0 ≤ r < b.
Of course, q = ⌊a/b⌋. We write r = a mod b.
Cost of computing the remainder and quotient: O(len(q)·len(b)).
Indeed, the long division algorithm has len(q) iterations, with
numbers of length len(b).
If u mod b = v mod b then we write
  u ≡ v (mod b)
and say u is congruent to v modulo b. For example, numbers
congruent to 1 modulo 2 are the odd numbers.
Common divisors
It is not easy to find the divisors of a number: most cryptographic
algorithms are based on this difficulty of factorization.
But, as we will see, common divisors of two (or more) numbers are
easy to find.
Theorem
For numbers a, b there are integers x, y such that
  d = xa + yb > 0
is a common divisor of a and b.
It is easy to see that every common divisor of a, b divides d, so it is
the greatest common divisor, denoted by gcd(a, b).
Relatively prime integers a, b: those without a common divisor
greater than 1, that is gcd(a, b) = 1.
Fundamental theorem
Lemma (Fundamental)
If p is prime, then p | ab if and only if
p | a or p | b.
Proof. If p does not divide a, then gcd(a, p) = 1 = xa + yp, so
b = xab + ypb. Since p | ab, this shows p | b.
This implies
Theorem (Fundamental theorem of arithmetic)
Every positive
integer n has a unique prime decomposition n = p_1^{e_1} ··· p_k^{e_k}.
The existence of some decomposition can be seen easily: just keep
dividing as long as you can, and collect the indivisible parts (the
primes). The lemma is needed to prove uniqueness.
Euclid's algorithm
Based on the observation that if a = qb + r then the common
divisors of a, b are the same as the common divisors of b, r.
Algorithm 8.1: EUCLID(a, b)
Computes gcd(a, b)
1 if b = 0 then return a
2 else return EUCLID(b, a mod b)
Example: gcd(30, 21).
  30 = 1·21 + 9
  21 = 2·9 + 3
  9 = 3·3
  gcd = 3
Running time
Assume b < a:
  a = bq + r ≥ b + r > 2r,
  a/2 > r,
  ab/2 > br.
In each iteration, the product decreases by a factor of 2.
⇒ at most log(ab) iterations, using division.
Extended Euclidean algorithm
Proof of the theorem that some common divisor of a, b (namely the
greatest one) can be written as xa + yb.
It is true if b = 0. Assume it is true for numbers for which the
Euclidean algorithm takes < n steps, and prove by induction.
a = bq + r. By induction,
  d = x′b + y′r = x′b + y′(a - qb) = y′a + (x′ - qy′)b.
Algorithm 8.2: EXTENDED-EUCLID(a, b)
Returns (d, x, y) where d = gcd(a, b) = xa + yb.
1 if b = 0 then return (a, 1, 0)
2 else
3   (d′, x′, y′) ← EXTENDED-EUCLID(b, a mod b)
4   return (d′, y′, x′ - ⌊a/b⌋y′)
Nonrecursive version
The following version is better for analysis and hand calculation:
Algorithm 8.3: EXTENDED-EUCLID(a, b)
nonrecursive version.
1 r ← a; r′ ← b
2 x ← 1; y ← 0
3 x′ ← 0; y′ ← 1
4 while r′ ≠ 0 do
5   q ← ⌊r/r′⌋
6   (r, x, y, r′, x′, y′) ← (r′, x′, y′, r - qr′, x - qx′, y - qy′)
7 return (r, x, y)
Proof of validity: show by induction that always r = xa + yb and
r′ = x′a + y′b.
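A minimal Python sketch of the nonrecursive version, with the gcd(30, 21) example from above as a check (function name is mine):

def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = x*a + y*b (Algorithm 8.3)."""
    r, rp = a, b              # rp, xp, yp play the role of r', x', y'
    x, y = 1, 0
    xp, yp = 0, 1
    while rp != 0:
        q = r // rp
        # parallel assignment, maintaining r = x*a + y*b and rp = xp*a + yp*b
        r, x, y, rp, xp, yp = rp, xp, yp, r - q * rp, x - q * xp, y - q * yp
    return r, x, y

if __name__ == "__main__":
    d, x, y = extended_euclid(30, 21)
    print(d, x, y, 30 * x + 21 * y)   # 3 -2 3 3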
Diophantine equations
Solving the equation
  ax + by = c
in integers. Let d = gcd(a, b). There is a solution only if d | c. By the
theorem on common divisors above, there are x′, y′ with ax′ + by′ = d. Hence
we get a solution
  x = x′(c/d),  y = y′(c/d).
Multiplicative inverse
If gcd(a, m) = 1 then there is an a′ with
  a·a′ mod m = 1.
Indeed, we just need to solve the equation ax + my = 1. The
number a′ is called the multiplicative inverse of a modulo m.
Application: If p is prime then every 0 < a < p has a multiplicative
inverse.
Modular arithmetic
Congruences behave somewhat like equations:
Theorem
If a_1 ≡ b_1 (mod m) and a_2 ≡ b_2 (mod m) then
  a_1 ± a_2 ≡ b_1 ± b_2 (mod m),
  a_1·a_2 ≡ b_1·b_2 (mod m).
Verify this in the lab. Division is also possible in the cases when
there is a multiplicative inverse:
Theorem
If gcd(a, m) = 1 then there is an a′ with a′a ≡ 1
(mod m).
Efficient exponentiation
In modular arithmetic, we can exponentiate in polynomial time:
Theorem
Given numbers a, n, m, there is an algorithm that
computes a^n mod m in time polynomial in the size of the input (which
is about log a + log n + log m).
This is not true for ordinary exponentiation: 2^n has Ω(n) digits, so
it cannot even be written down in time polynomial in log n.
The trick that helps in the modular case is repeated squaring:
Compute a² mod m, a⁴ mod m, a⁸ mod m, ..., using
  a^{2k} mod m = (a^k mod m)² mod m.
Then multiply the needed powers. For example if
n = 10 = 2³ + 2¹, then a¹⁰ = a^{2³}·a^{2¹}.
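A minimal Python sketch of repeated squaring, walking through the binary digits of n (names are mine; Python's built-in pow(a, n, m) does the same thing):

def mod_exp(a, n, m):
    """Compute a**n mod m by repeated squaring; O(log n) multiplications."""
    result = 1
    a %= m
    while n > 0:
        if n & 1:                  # current binary digit of n is 1
            result = (result * a) % m
        a = (a * a) % m            # square: a, a^2, a^4, a^8, ...
        n >>= 1
    return result

if __name__ == "__main__":
    print(mod_exp(7, 10, 13), pow(7, 10, 13))   # both print 4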
Dynamic sets
These are abstract data types, just like heaps. Various subtypes
exist, depending on which operations we want to support:
SEARCH(S, k)
INSERT(S, k)
DELETE(S, k)
A data type supporting these operations is called a dictionary.
Other interesting operations:
MINIMUM(S), MAXIMUM(S)
SUCCESSOR(S, x) (given a pointer to x), PREDECESSOR(S, x)
UNION(S_1, S_2)
Elementary data structures
You have studied these in the Data Structures course.
Stack operations:
STACK-EMPTY(S)
PUSH(S, x)
POP(S)
Implement by array or linked list.
Queue operations:
ENQUEUE(Q, x)
DEQUEUE(Q)
Implement by circular array or linked list (with link to end).
Linked lists
Singly linked, doubly linked, circular.
LIST-SEARCH(L, k)
LIST-INSERT(L, x) (x is a record)
LIST-DELETE(L, x)
In the implementation, a sentinel element nil[L] is placed between the
head and the tail to simplify boundary tests in the code.
Dynamic storage allocation
This underlies a lot of data structure implementation.
You must either explicitly implement freeing unused objects of
various sizes and returning the memory space used by them to free
lists of objects of various sizes, or use some kind of garbage
collection algorithm.
The present course just assumes that you have seen (or will see)
some implementations in a Data Structures course (or elsewhere).
Representing trees
Binary tree data type, at least the following functions:
PARENT[x], LEFT[x], RIGHT[x].
We can represent all three functions explicitly by pointers.
Rooted trees with unbounded branching: One possible
representation via a binary tree is the left-child, right-sibling
representation.
Other representations: heaps, only parent pointers, and so on.
Hash tables
Problem A very large universe U of possible items, each with a
different key. The number n of actual items that may
eventually occur is much smaller.
Solution idea Hash function h(k), hash table T[0..m-1]. Key k
hashes to hash value (bucket) h(k).
New problem Collisions.
Resolution Chaining, open addressing, and so on.
Uniform hashing assumption Assumes that items arrive randomly.
Search then takes Θ(1 + n/m) steps on average, since the average
list length is n/m.
Hash functions
What do we need? The hash function should spread the (hopefully
randomly incoming) elements of the universe as uniformly as
possible over the table, to minimize the chance of collisions.
Keys into natural numbers It is easier to work with numbers than
with words, so we translate words into numbers.
For example, as radix 128 integers, possibly adding up these
integers for different segments of the word.
Randomization: universal hashing
To guarantee the uniform hashing assumption, instead of assuming
that items arrive randomly, we choose a random hash
function h(·, r), where r is a parameter chosen randomly from
some set H.
Definition
The family h(·, ·) is universal if for all x ≠ y ∈ U we
have
  P[h(x, r) = h(y, r)] ≤ 1/m.
If the values h(x, r) and h(y, r) are pairwise independent, then the
probability is exactly 1/m (the converse is not always true). Thus,
from the point of view of collisions, universality is at least as good
as pairwise independence.
An example universal hash function
We assume that our table size m is a prime number. (Not too
restrictive: it is known that for every m there is a prime between m
and 2m.) Let d > 0 be an integer dimension. We break up our key
x into a sequence
  x = ⟨x_1, x_2, ..., x_d⟩,   x_i < m.
Fix the random coefficients r_i < m, i = 1, ..., d+1; therefore
|H| = m^{d+1}.
  h(x, r) = (r_1 x_1 + ... + r_d x_d + r_{d+1}) mod m.
Let us show that all values h(x, r) are pairwise independent,
therefore our random hash function is universal. We will prove that
h(x, r) takes each of its m possible values with the same
probability, independently of how we fix h(y, r).
Every value A = h(x, r) appears exactly m^d times. Indeed, no
matter how we choose r_1, ..., r_d, one can choose r_{d+1} uniquely
to make h(x, r) = A.
Now take some y ≠ x, for example x_1 ≠ y_1. Fix h(x, r) = A,
h(y, r) = B. Choose r_2, ..., r_d arbitrarily, and consider
  Δ = x_1 - y_1,   W = r_2(x_2 - y_2) + ... + r_d(x_d - y_d),
  A - B ≡ r_1 Δ + W (mod m).
The last equation has exactly one solution r_1 = Δ^{-1}(A - B - W),
where Δ^{-1} is the multiplicative inverse of Δ modulo the prime
m. Given r_1, ..., r_d there is a unique r_{d+1} with h(y, r) = B. So
every pair of values A, B appears exactly m^{d-1} times. This
proves the independence.
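A minimal Python sketch of this family of hash functions, for keys given as d-tuples of integers below m (names and the example values are mine; m must be prime for the independence argument above to apply):

import random

def make_universal_hash(m, d):
    """Return a random member h of the family h(x, r), for d-component keys."""
    r = [random.randrange(m) for _ in range(d + 1)]
    def h(x):
        # h(x, r) = (r_1 x_1 + ... + r_d x_d + r_{d+1}) mod m
        return (sum(ri * xi for ri, xi in zip(r, x)) + r[d]) % m
    return h

if __name__ == "__main__":
    m = 101                         # a prime table size
    h = make_universal_hash(m, d=3)
    print(h((12, 34, 56)), h((12, 34, 57)))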
Optional: perfect hashing
If we know the set S of n keys in advance, it is also possible to hash
without any collisions, into a table of size 6n, using just 2 probes.
(Method of Fredman, Komlós and Szemerédi.)
Observations:
1. If m > n² then there is a vector c = (c_1, ..., c_{d+1}) such that the
hash function h(·, c) has no collisions. Indeed, for each pair
x ≠ y, the probability of a collision is ≤ 1/m. There are
n(n-1)/2 such pairs, so the probability of having a collision
on even one pair is at most n(n-1)/(2m) < 1/2. Therefore
there is a vector c such that h(·, c) has no collisions for S.
2. Now assume that m ≥ 3n. For each vector r and position z, let
C_z(r) be the number of keys colliding on position z. We will
show later that there is a vector b with Σ_z C_z(b)² < 3n.
Using the above observations, here is a 2-step hashing scheme.
Using vector b, create a table U_z of size C_z(b)² at each position
z of our table T of size 3n where there are collisions. For each
table U_z(b), use a hash function h(·, c(z)) to hash x to a unique
position in U_z. This is possible due to observation 1 above.
Now, first use h(x, b) to hash key x to a position z in table T. If
there were collisions then T at position z will hold the key c(z)
of the second hash function h(x, c(z)) that hashes x to a unique
position in table U_z.
Total size: 3n for table T, and < 3n for the union of all
nonempty tables U_z.
Square sizes of chains
It remains to show that for some value b of r, we have
  Σ_z C_z(b)² < 3n.
Fix position z. For item i, let X_i = 1 if h(i, r) = z and 0 otherwise.
Then C_z = Σ_{i=1}^{n} X_i. The variables X_i are pairwise independent,
with P[X_i = 1] = 1/m. Because of the independence,
E(X_i X_j) = E X_i · E X_j for i ≠ j, allowing us to write
  C_z² = Σ_i X_i² + Σ_{i≠j} X_i X_j,
  E C_z² = Σ_i E X_i² + Σ_{i≠j} E X_i · E X_j = n/m + n(n-1)/m²,
  Σ_z E C_z² = n(1 + (n-1)/m) < n(1 + 1/3).
Since this is the average over all vectors r there is obviously some
vector b with Σ_z C_z(b)² < 3n.
Open addressing
Hashing must sometimes be very fast, for example in caches. In
this case, even the pointer operations of the chaining collision
resolution are considered slow.
Open addressing puts the collided elements into the same table,
at a new position computed very fast. Looking for the new
position is called probing. We need a whole probe sequence
h(k, 0), h(k, 1), ....
In open addressing schemes, deletion is problematic, though
there are several marking methods.
It is not easy to analyze open addressing, since the uniform
hashing assumption is too strong.
Linear probing h(k, i) = (h′(k) + i) mod m. There is a problem of
primary clustering: long runs of collided elements build up.
Quadratic probing tries to correct this: h(k, i) = (h′(k) + c_1 i + c_2 i²) mod m.
There is, however, a problem of secondary clustering, since
probing sequences are independent of k.
Double hashing solves these issues:
  h(k, i) = (h_1(k) + i·h_2(k)) mod m,
where h_2(k) is relatively prime to m.
Example: h_1(k) = k mod m with m a prime,
h_2(k) = 1 + (k mod (m-1)).
Search trees
Ordered left-to-right, items placed into the tree nodes (not only
in leaves).
Inorder tree walk.
Search, min, max, successor, predecessor, insert, delete. All in
O(h) steps.
Searches
Iterative or recursive TREE-SEARCH(x, k), where x is (a pointer
to) the root.
Minimum and maximum are easy.
TREE-SUCCESSOR(x)
if the right subtree of x is nonempty then return its minimum.
else go up: return the first node at which you had to step
up to the right (if it exists).
Binary search tree
Insertion and deletion
Insertion always into a leaf.
Deletion TREE-DELETE(T, z) is more complex. We use the procedure
TRANSPLANT(T, u, v),
which replaces the subtree starting at u in T with the tree
rooted in v. Warning:
Before, the tree of v need not be part of T.
After, the pointer u still points to the record u (along with
any subtree hanging from it).
Algorithm 10.1: TREE-DELETE(T, z)
Delete node z from tree T.
1 if z.left = nil then TRANSPLANT(T, z, z.right)
2 else if z.right = nil then TRANSPLANT(T, z, z.left)
3 else
4   y ← TREE-MINIMUM(z.right)   // y has no left child.
5   if y.parent ≠ z then
      // Splice it out from between parent and right child.
6     TRANSPLANT(T, y, y.right)   // Does not lose original y.
7     y.right ← z.right; y.right.parent ← y
8   TRANSPLANT(T, z, y)
9   y.left ← z.left; y.left.parent ← y
Algorithm 10.2: BS-TREE-SPLIT(T, k)
Split T around key k into two trees.
1 (L, R) ← new trees
2 if T is empty then L ← empty; R ← empty
3 else if k < T.root.key then
4   R ← T
5   (L, R′) ← BS-TREE-SPLIT(T.left, k)
6   R.left ← R′
7 else
8   L ← T
9   (L′, R) ← BS-TREE-SPLIT(T.right, k)
10  L.right ← L′
11 return (L, R)
Randomized binary search trees
It can be shown (see book) that if n keys are inserted into a
binary search tree in random order, then the expected height of
the tree is only O(log n).
You cannot expect that keys will be inserted in random order.
But it is possible (Martínez-Roura) to randomize the insertion
and deletion operations in a way that will still guarantee the
same result.
Here is the idea of randomized insertion into a binary tree of
size n.
  With probability 1/(n+1), insert the key as root, and split the tree
  around it.
  With probability 1 - 1/(n+1), randomized-insert recursively into the
  left or right subtree, as needed.
There are corresponding deletion and join operations.
Algorithm 10.3: RAND-BS-TREE-INSERT(T, k)
Randomized insert into a binary search tree.
1 n ← T.size
2 r ← RANDOM(0, n)
3 if r = n then   // Insert at the root.
4   (L, R) ← BS-TREE-SPLIT(T, k)
5   T ← new tree; T.key ← k
6   (T.left, T.right) ← (L, R)
7 else if k < T.root.key then RAND-BS-TREE-INSERT(T.left, k)
8 else RAND-BS-TREE-INSERT(T.right, k)
Balanced trees
The cost of most common operations on search trees (search,
insert, delete) is O(h), where h is the height.
We can claim h = O(log n) for size n if the tree is balanced.
But keeping a tree completely balanced (say a binary tree as
complete as possible) is as costly as maintaining a sorted array.
Idea: keep the tree only nearly balanced! The idea has several
implementations: AVL trees, B-trees, red-black trees. We will
only see B-trees.
B-trees
Search trees with the following properties, with a parameter t ≥ 2.
  The records are kept in the tree nodes, sorted, between the
  edges going to the subtrees.
  Every path from root to leaf has the same length.
  Every internal node has at least t-1 and at most 2t-1
  records (t to 2t children).
Total number of nodes n ≥ 1 + t + ... + t^h = (t^{h+1} - 1)/(t - 1), showing
h = O(log_t n).
Maintaining a B-tree
Insertion Insert into the appropriate node. Split, if necessary. If
this requires a split in the parent, too, split the parent.
There is an implementation with only one, downward, pass. It
makes sure that in each subtree into which we insert, the root is not
full: it has fewer than 2t-1 keys. To achieve this, we split
proactively.
Deletion More complex, but also doable with one, downward,
pass. Make sure that when we delete from a subtree, its root
has more than t-1 (the minimum) keys. To achieve this,
merge proactively.
Insertion
Algorithm 10.4: B-TREE-SPLIT-CHILD(x, i)
Split the ith child c_i of the non-full node x: before the split, the child
has 2t-1 keys. No surprises.
1 z ← new node; y ← x.c_i
2 z.n ← t - 1; z.leaf ← y.leaf   // If y is a leaf, so be z.
3 for j = 1 to t-1 do z.key_j ← y.key_{j+t}
4 if not y.leaf then for j = 1 to t do z.c_j ← y.c_{j+t}
5 y.n ← t - 1   // Now z contains the second half of y.
6 for j = x.n + 1 downto i + 1 do x.c_{j+1} ← x.c_j
7 x.c_{i+1} ← z   // Now z is a child of x.
8 for j = x.n downto i do x.key_{j+1} ← x.key_j
9 x.key_i ← y.key_t   // The middle key of y is inserted into x.
10 x.n++
Algorithm 10.5: B-TREE-INSERT-NONFULL(x, k)
Insert into a subtree at a non-full node.
1 i ← x.n
2 if x.leaf then   // Just insert the key into the leaf.
3   while i ≥ 1 and k < x.key_i do x.key_{i+1} ← x.key_i; i--
4   x.key_{i+1} ← k; x.n++
5 else
6   while i ≥ 1 and k < x.key_i do i--
7   i++   // Found the child to insert into.
8   if x.c_i.n = 2t-1 then
9     B-TREE-SPLIT-CHILD(x, i)
10    if k > x.key_i then i++
11  B-TREE-INSERT-NONFULL(x.c_i, k)
Algorithm 10.6: B-TREE-INSERT(T, k)
1 r ← T.root
2 if r.n < 2t-1 then B-TREE-INSERT-NONFULL(r, k)
3 else   // Add a new root above r before splitting it.
4   s ← new node; s.leaf ← false; s.n ← 0
5   s.c_1 ← r; T.root ← s
6   B-TREE-SPLIT-CHILD(s, 1)
7   B-TREE-INSERT-NONFULL(s, k)
Deletion
The deletion algorithm is more complex. See the book for an
outline, and try to implement it in pseudocode as an exercise.
Augmenting data structures
Dynamic order statistics
Cache in each node the size of the subtree.
Retrieving an element with a given rank (selection).
Determining the rank of a given element.
Maintaining subtree sizes.
Dynamic programming
When there is a recursive algorithm based on subproblems but the
total number of subproblems is not too great, it is possible to cache
(memoize) all solutions in a table.
Fibonacci numbers.
Binomial coefficients.
(In both cases, the algorithm is by far not as good as computing the
known formulas.)
Longest common subsequence
The diff program in Unix: what does it mean to say that we find
the places where two files differ (including insertions and
deletions)? Or, what does it mean to keep the common parts?
Let it mean the longest subsequence present in both:
  X = a b c b d a b
  Y = b d c a b a b a
  longest common subsequence: b c b a
Running through all subsequences would take exponential
time. There is a faster solution, recognizing that we only want
to find some longest subsequence.
Let c[i, j] be the length of the longest common subsequence of
the prefixes X[1..i] and Y[1..j]. Recursion:
  c[i, j] = 0                              if i = 0 or j = 0,
  c[i, j] = c[i-1, j-1] + 1                if i, j > 0 and x_i = y_j,
  c[i, j] = max(c[i, j-1], c[i-1, j])      otherwise.
We are not computing a function C(X, Y, i, j) by naive
recursion, but collect the values c[i, j] as they are computed, in
array c: C(X, Y, i, j, c) checks whether c[i, j] is defined. If yes, it
just returns c[i, j]; else it uses the above recursion, and assigns
c[i, j] before returning the value.
Non-recursive implementation
Compute the table c[i, j] bottom-up.
Also, store the value b[i, j] = "←", "↑" or "↖" depending on
whether the optimum is c[i, j-1], c[i-1, j] or
c[i-1, j-1] + 1. (We also use the value "↑←" in case
c[i, j] = c[i-1, j] = c[i, j-1], though this is not important.)
Find the longest common subsequence walking backwards on
the arrows.
     j  1  2  3  4  5  6
  i     b  d  c  a  b  a
  1  a  0  0  0  1  1  1
  2  b  1  1  1  1  2  2
  3  c  1  1  2  2  2  2
  4  b  1  1  2  2  3  3
  5  d  1  2  2  2  3  3
  6  a  1  2  2  3  3  4
  7  b  1  2  2  3  4  4
Algorithm 11.1: LCS-LENGTH(X, Y)
1 m ← X.length; n ← Y.length
2 b[1..m, 1..n], c[0..m, 0..n] ← new tables
3 for i = 1 to m do c[i, 0] ← 0
4 for j = 1 to n do c[0, j] ← 0
5 for i = 1 to m do
6   for j = 1 to n do
7     if x_i = y_j then
8       c[i, j] ← c[i-1, j-1] + 1
9       b[i, j] ← "↖"
10    else
11      c[i, j] ← max(c[i, j-1], c[i-1, j]); b[i, j] ← ""
12      if c[i-1, j] = c[i, j] then b[i, j] ← "↑"
13      if c[i, j-1] = c[i, j] then b[i, j] ← b[i, j] + "←"
14 return c, b
Algorithm 11.2: LCS-PRINT(b, X, i, j)
Print the longest common subsequence of X[1..i] and
Y[1..j] using the table b computed above.
1 if i = 0 or j = 0 then return
2 if b[i, j] contains "↖" then LCS-PRINT(b, X, i-1, j-1); print x_i
3 else if b[i, j] contains "↑" then LCS-PRINT(b, X, i-1, j)
4 else LCS-PRINT(b, X, i, j-1)
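A minimal Python sketch of the same bottom-up computation; instead of storing arrows it walks back through the c table directly (names are mine):

def lcs(x, y):
    """Return one longest common subsequence of strings x and y."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    # Walk backwards through the table instead of storing arrows.
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

if __name__ == "__main__":
    print(lcs("abcbdab", "bdcababa"))   # prints a length-4 LCS, e.g. "bcba"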
The knapsack problem
Given: a volume bound b, volumes a_1, ..., a_n > 0, and integer values
w_1, ..., w_n > 0.
  maximize   w_1 x_1 + ... + w_n x_n
  subject to a_1 x_1 + ... + a_n x_n ≤ b,
             x_i ∈ {0, 1}, i = 1, ..., n.
In other words, find a subset i_1 < ... < i_k of the set of items 1, ..., n
(by choosing which x_i = 1) such that
  the sum of their volumes a_{i_1} + ... + a_{i_k} is at most the volume b
  of our knapsack,
  the sum of their values w_{i_1} + ... + w_{i_k} is maximal.
Special cases
Subset sum problem Find i_1, ..., i_k with a_{i_1} + ... + a_{i_k} = b.
Obtained by setting w_i = a_i. Now if there is a solution with
value b, we are done.
Partition problem Given numbers a_1, ..., a_n, find i_1, ..., i_k such
that a_{i_1} + ... + a_{i_k} is as close as possible to (a_1 + ... + a_n)/2.
Solve by dynamic programming. For 1 ≤ k ≤ n,
  m_k(p) = min{ a_1 x_1 + ... + a_k x_k : w_1 x_1 + ... + w_k x_k = p }.
If the set is empty the minimum is ∞. Let w = w_1 + ... + w_n. The
array m_{k+1}(0), ..., m_{k+1}(w) can be computed from
m_k(0), ..., m_k(w):
  If w_{k+1} > p then m_{k+1}(p) = m_k(p) (item k+1 cannot be used).
  Otherwise, decide whether to use item k+1:
    m_{k+1}(p) = min{ m_k(p), a_{k+1} + m_k(p - w_{k+1}) }.
The optimum is max{ p : m_n(p) ≤ b }.
Complexity: O(nw) steps (counting additions as single steps).
Why is this not a polynomial algorithm?
What if we want the exact equation a_{i_1} + ... + a_{i_k} = b? Assume
that the a_i, b are also integers.
For 1 ≤ k ≤ n,
  A_k(p) = { a_1 x_1 + ... + a_k x_k : w_1 x_1 + ... + w_k x_k = p } ∩ {1, ..., b}.
The array A_{k+1}(0), ..., A_{k+1}(w) can be computed from
A_k(0), ..., A_k(w):
  If w_{k+1} > p then A_{k+1}(p) = A_k(p).
  Otherwise,
    A_{k+1}(p) = ( A_k(p) ∪ (a_{k+1} + A_k(p - w_{k+1})) ) ∩ {1, ..., b},
where x + {u_1, ..., u_q} = {x + u_1, ..., x + u_q}.
Now the optimum is max{ p : b ∈ A_n(p) }.
Complexity: O(nwb).
Example: Let {a_1, ..., a_5} = {2, 3, 5, 7, 11}, w_i = 1, b = 13.
  k \ p  1                  2                           3           4
  1      {2}                {}                          {}          {}
  2      {2, 3}             {5}                         {}          {}
  3      {2, 3, 5}          {5, 7, 8}                   {10}        {}
  4      {2, 3, 5, 7}       {5, 7, 8, 9, 10, 12}        {10, 12}    {}
  5      {2, 3, 5, 7, 11}   {5, 7, 8, 9, 10, 12, 13}    {10, 12}    {}
max{ p : 13 ∈ A_5(p) } = 2.
Application: money changer problem. Produce the sum b using the
smallest number of coins of denominations a_1, ..., a_n (at most one
of each).
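A minimal Python sketch of this set-valued dynamic program with w_i = 1 (so p counts the number of items used); it is a slight variant in that A[0] contains 0 as the empty sum, and the names are mine:

def subset_sum_counts(a, b):
    """For each count p, compute the set A[p] of sums <= b reachable by
    choosing exactly p of the numbers in a (all weights w_i = 1)."""
    n = len(a)
    A = [set() for _ in range(n + 1)]
    A[0].add(0)                       # empty selection
    for x in a:
        # go downward in p so each item is used at most once
        for p in range(n - 1, -1, -1):
            for s in A[p]:
                if s + x <= b:
                    A[p + 1].add(s + x)
    return A

if __name__ == "__main__":
    A = subset_sum_counts([2, 3, 5, 7, 11], 13)
    print(min(p for p in range(len(A)) if 13 in A[p]))   # 2, since 13 = 2 + 11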
Greedy algorithms
Activity selection
There is no general recipe. In general, greedy algorithms assume
that
  There is an objective function f(x_1, ..., x_n) to optimize that
  depends on some choices x_1, ..., x_n. (Say, we need to maximize
  f(·).)
  There is a way to estimate, roughly, the contribution of each
  choice x_i to the final value, but without taking into account
  how our choice will constrain the later choices.
The algorithm is greedy if it still makes the choice with the best
contribution.
The greedy algorithm is frequently not the best, but sometimes it is.
Example: activity selection. Given: activities
  [s_i, f_i)
with starting time s_i and finishing time f_i. Goal: to perform the
largest number of activities.
Greedy algorithm: sort by f_i. Repeatedly choose the activity with
the smallest f_i compatible with the ones already chosen. Rationale:
the smallest f_i restricts our later choices least.
Example
(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
(8, 11), (8, 12), (2, 13), (12, 14).
Chosen: (1, 4), (5, 7), (8, 11), (12, 14).
Is this correct? Yes, by design, since we always choose an activity
compatible with the previous ones.
Is this best? By induction, it is sufficient to see that in the first step,
the greedy choice is best possible. And it is, since if an activity
is left available after some other choice, it is also left available
after the greedy choice.
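A minimal Python sketch of the greedy rule on the example above (names are mine):

def activity_selection(activities):
    """Greedy activity selection: sort by finishing time, repeatedly take the
    earliest-finishing activity compatible with the ones already chosen."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(activities, key=lambda sf: sf[1]):
        if s >= last_finish:          # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen

if __name__ == "__main__":
    acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
            (8, 11), (8, 12), (2, 13), (12, 14)]
    print(activity_selection(acts))   # [(1, 4), (5, 7), (8, 11), (12, 14)]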
Knapsack
The knapsack problem, considered above under dynamic
programming, also has a natural greedy algorithm. It is easy to
see that this is not optimal. (Exercise.)
But there is a version of this problem, called the fractional
knapsack problem, in which we are allowed to take any fraction
of an item. The greedy algorithm is optimal in this case.
(Exercise.)
Exhaustive search
Maximum independent set
There are cases of algorithmic problems when nothing much better
is known than trying out all possibilities.
Example: Maximum independent set in a graph. Let G = (V, E) be
an undirected graph on the set of nodes V, with set of edges E.
  A set S of vertices is independent if no two vertices u, v ∈ S are
  connected by an edge: {u, v} ∉ E.
  A maximal independent set is one that cannot be extended to a
  larger independent set.
  A maximum independent set is an independent set of maximal
  size.
(Figure: a small graph in which the black point forms by itself a maximal independent set, while the white points form a maximum independent set.)
It is easy to find a maximal independent set, but it is very hard to
find a maximum independent set. The latter is what we are interested in.
Let N_G(x) denote the set of neighbors of point x in graph G,
including x itself.
Let us fix a vertex x. There are two kinds of independent vertex set
S: those that contain x, and those that do not.
  If x ∉ S then S is an independent subset of the graph G \ {x}
  (of course, the edges adjacent to x are also removed from G).
  If x ∈ S then S does not intersect N_G(x), so S \ {x} is an
  independent subset of G \ N_G(x).
This gives rise to a simple recursive search algorithm.
Algorithm 13.1: MAX-INDEP(G)
Returns a maximum independent set of the graph G = (V, E).
We write V = V(G).
1 if G has no edges then return V(G)
2 else
3   x ← a vertex from G
4   S_1 ← MAX-INDEP(G \ {x})
5   S_2 ← {x} ∪ MAX-INDEP(G \ N_G(x))
6   return the larger of S_1, S_2
We can improve on the simple search by noticing that some
searches cannot lead to improvement. For example in the path graph
  1 - 2 - 3 - 4
if we found the set {1, 3} then since the possibility that 1 ∈ S, 3 ∉ S
cannot give us an independent set of size > 2 anymore, we need
not explore it. There is also no need to explore any possibilities
after 2 ∈ S. We formalize this as follows:
  MAX-INDEP-LB(G, b)
is an algorithm that returns an independent subset of G if it can
return one of size > b, otherwise it returns ∅.
Algorithm 13.2: MAX-INDEP-LB(G, b)
Returns a maximum independent set of the graph G if there is
one of size > b, otherwise returns ∅.
1 if |V(G)| ≤ b then return ∅
2 else if G has no edges then return V(G)
3 else
4   x ← a vertex from V
5   S_1 ← MAX-INDEP-LB(G \ {x}, b)
6   b′ ← max(b, |S_1|)
7   S_2 ← {x} ∪ MAX-INDEP-LB(G \ N_G(x), b′ - 1)
8   S ← the larger of S_1, S_2
9   if |S| > b then return S else return ∅
We will call the algorithm as MAX-INDEP-LB(G, 0).
This kind of algorithm is called a branch-and-bound algorithm:
maintaining a bound sometimes helps pruning the branches.
Let us trace this algorithm on the path graph
  1 - 2 - 3 - 4
(here 1̄ means that vertex 1 is excluded, 1 that it is included):
(1̄) : (G \ {1}, 0)
  (1̄, 2̄) : (G \ {1, 2}, 0)
    (1̄, 2̄, 3̄) : (G \ {1, 2, 3}, 0) returns {4}, b′ = 1
    (1̄, 2̄, 3) : (G \ {1, 2, 3, 4}, 1 - 1 = 0) returns ∅,
  so (1̄, 2̄) : (G \ {1, 2}, 0) returns {4}, b′ = 1.
  (1̄, 2) : (G \ {1, 2, 3}, 1 - 1 = 0) returns {4},
so (1̄) : (G \ {1}, 0) returns {2, 4}, b′ = 2.
(1) : (G \ {1, 2}, 2 - 1 = 1)
  (1, 2̄, 3̄) : (G \ {1, 2, 3}, 1) returns ∅ by bound, b′ = 2.
  (1, 2̄, 3) : (G \ {1, 2, 3, 4}, 1 - 1 = 0) returns ∅,
so (1) : (G \ {1, 2}, 2 - 1 = 1) returns ∅ by bound,
giving the final return {2, 4}.
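A minimal Python sketch of the branch-and-bound scheme of Algorithm 13.2, representing a graph as an adjacency dictionary (names and representation are my own choices):

def max_indep_lb(adj, vertices, b):
    """Return an independent subset of the graph induced on `vertices` if one
    of size > b exists, otherwise the empty set.
    adj: dict mapping each vertex to its set of neighbors."""
    if len(vertices) <= b:
        return set()
    # pick a vertex that still has a neighbor inside `vertices`, if any
    x = next((v for v in vertices if adj[v] & vertices), None)
    if x is None:                      # no edges left: everything is independent
        return set(vertices)
    s1 = max_indep_lb(adj, vertices - {x}, b)
    b2 = max(b, len(s1))
    s2 = {x} | max_indep_lb(adj, vertices - adj[x] - {x}, b2 - 1)
    s = s1 if len(s1) >= len(s2) else s2
    return s if len(s) > b else set()

if __name__ == "__main__":
    # the path 1 - 2 - 3 - 4
    adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(max_indep_lb(adj, set(adj), 0))   # {2, 4}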
Graph representation
Graphs G(V, E): set V of vertices or nodes, or points, and set E of
edges or lines.
Directed or undirected. Loop edges. Parallel edges.
Representations:
Adjacency list good for sparse graphs, works for all kinds of graph.
Adjacency matrix (vertex-vertex), good for dense graphs. If there
are parallel edges, they can be represented by multiplicities.
Incidence matrix (vertex-edge).
Some other graph notions: path (no repeated edges or nodes),
walk (repetitions allowed), cycle.
We write u → v if there is a directed path from u to v.
Breadth-first search
Directed graph G = (V, E). Call some vertex s the source. Finds
shortest paths (by number of edges) from the source s to every
vertex reachable from s.
Rough description of breadth-first search:
  Distance x.d from s gets computed.
  Black nodes have been visited.
  Gray nodes: the frontier.
  White nodes: unvisited.
Algorithm 14.1: Generic graph search
1 paint s grey
2 while there are grey nodes do
3   take a grey node u, paint it black
4   for each white neighbor v of u do
5     v.d ← u.d + 1
6     make v a grey child of u
To turn this into breadth-first search, make the set of grey nodes a
queue Q. Now Q is represented in two ways, as a queue, and as a
subset of V (the grey nodes). The two representations must remain
synchronized.
Analysis: O(|V| + |E|) steps, since every link is handled at most
once.
Algorithm 14.2: BFS(G, s)
Input: graph G = (V, Adj).
Returns: the tree parents u.π, and the distances u.d.
Queue Q makes this search breadth-first.
1 for u ∈ V \ {s} do
2   u.color ← white; u.d ← ∞; u.π ← nil
3 s.color ← gray; s.d ← 0; Q ← {s}
4 while Q ≠ ∅ do
5   u ← Q.head; DEQUEUE(Q)
6   for v ∈ u.Adj do
7     if v.color = white then
8       v.color ← gray; v.d ← u.d + 1; v.π ← u
9       ENQUEUE(Q, v)
10  u.color ← black
In the breadth-first tree defined by π, each path from a node to the root
is a shortest path.
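A minimal Python sketch of BFS on an adjacency-list dictionary; the colors are replaced by membership in the dist dictionary (names and the example graph are mine):

from collections import deque

def bfs(adj, s):
    """Breadth-first search from s; adj maps each vertex to a list of neighbors.
    Returns (dist, parent) dictionaries for every vertex reachable from s."""
    dist, parent = {s: 0}, {s: None}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:          # a "white" vertex
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

if __name__ == "__main__":
    adj = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
    print(bfs(adj, "s"))               # c is at distance 2 from s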
Shortest path problems
Composite words
Sometimes, it takes some thinking to see that a problem is a
shortest path problem.
Example: how to break up some composite words?
  Personaleinkommensteuerschätzungskommissionsmitglieds-
  reisekostenrechnungsergänzungsrevisionsfund (Mark Twain)
With a German dictionary, break it into relatively few components.
Graph points all division points of the word, including start and
end.
Edges if the word between the points is in the dictionary (maybe
without an "s" at the end).
Path between the start and end corresponds to a legal breakup.
(Note that this graph is acyclic.)
The word breakup problem is an example where the graph is
given only implicitly: to find out whether there is an edge
between two points, you must make a dictionary lookup.
We may want to minimize those lookups, but minimizing two
different objective functions simultaneously (the number of
division points and the number of lookups) is generally not
possible.
(For minimizing lookups, depth-first search seems better.)
Car racing
start finish
In each step, the speed vector can change only by 1 in each
direction. We have to start and arrive with speed 1, vertical
direction. There is a graph in which this is a shortest path problem.
Vertices: (point, speed vector) pairs (p, v).
Edges: between (p_1, v_1) and (p_2, v_2) if p_2 − p_1 = v_1 and |v_2 − v_1|_∞ ≤ 1.
Here |(x, y)|_∞ = max(|x|, |y|) is the so-called maximum norm.
Depth-first search
Similar to breadth-first search, but in our generic graph search, put grey nodes on a stack, rather than a queue.
In applications, it provides a useful structure for exploring the whole graph, not just the nodes reachable from a given source. So it is restarted as long as there are any unvisited nodes.
Predecessor v.π as before. Now the predecessor subgraph may be a depth-first forest, not just a depth-first tree, when there are several sources.
Global variable time used to compute timestamps: discovery time v.d, finishing time v.f (when all descendants have been explored).
Recursive algorithm DFS-VISIT(G, u). Cost O(V + E).
Algorithm 15.1: DFS-VISIT(G, u)
Assumes that unvisited nodes of G are white.
1 time++; u.d ← time; u.color ← gray
2 for each v ∈ u.Adj do
3 if v.color = white then v.π ← u; DFS-VISIT(G, v)
4 u.color ← black; time++; u.f ← time
Algorithm 15.2: DFS(G)
1 for each vertex u ∈ G.V do
2 u.color ← white; u.π ← nil
3 for each vertex u ∈ G.V do
4 if u.color = white then DFS-VISIT(G, u)
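A recursive Python sketch of the same search with timestamps (an adjacency-list dictionary is assumed; very deep graphs would need an iterative version because of Python's recursion limit):

def dfs(adj):
    """Depth-first search of the whole graph; returns discovery/finishing times and parents."""
    color = {u: 'white' for u in adj}
    d, f, parent = {}, {}, {u: None for u in adj}
    time = 0

    def visit(u):
        nonlocal time
        time += 1; d[u] = time; color[u] = 'gray'
        for v in adj[u]:
            if color[v] == 'white':
                parent[v] = u
                visit(v)
        color[u] = 'black'
        time += 1; f[u] = time

    for u in adj:
        if color[u] == 'white':
            visit(u)
    return d, f, parent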
Properties of depth-first search
Theorem (Parenthesis)
v.d, v.f behave as parentheses: the intervals [u.d, u.f] and [v.d, v.f] are either disjoint or nested.
Theorem (White path)
Vertex v is a descendant of vertex u iff
at time u.d, it can be reached from u on a white path.
Classication of edges:
Tree edges
Back edges
Forward edges (nonexistent in an undirected graph)
Cross edges (nonexistent in an undirected graph)
Topological sort
Assume that in a directed graph G = (V, E) there are no cycles.
Then it is called a directed acyclic graph (dag).
Example: course scheduling with prerequisites.
Write u ⪯ v if there is a path (possibly of length 0) from u to v, and u ≺ v when u ⪯ v and u ≠ v.
The relation ⪯ is transitive: u ⪯ v and v ⪯ w implies u ⪯ w.
The relation ⪯ is also antisymmetric: u ⪯ v and v ⪯ u implies u = v. Indeed, a walk u → v → u would contain a cycle.
A relation ⪯ that is reflexive (as this one is, since paths of length 0 are allowed), transitive and antisymmetric is called a partial order. So a directed acyclic graph always gives rise to a partial order.
A partial order ⪯ is a total order if for all x, y we have x ⪯ y or y ⪯ x.
Examples of partial order:
a | b where a, b are positive integers.
A ⊆ B where A, B are sets.
a ≤ b where a, b are real numbers. This is also a total order.
Every partial order gives rise to an acyclic graph: introduce edges u → v for every u, v with u ≺ v.
Call an edge (u, v) of a directed acyclic graph necessary if it is
the only directed path from u to v.
How to find all necessary edges?
Theorem
Let E′ be the set of all necessary edges of a directed acyclic graph G = (V, E). Then E′ defines the same relation ⪯ on V as E.
Can we turn every partial order into a total order by adding some more x ≺ y relations?
This is equivalent to the following: Given an acyclic graph G = (V, E), is there a listing v_1, . . . , v_n of its vertices in such a way that if (v_i, v_j) ∈ E then i < j?
Yes, and DFS(G) finds the listing very efficiently. Sorting by v.f (backwards) is the desired order.
Theorem
The graph G is acyclic iff DFS(G) yields no back edges.
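A Python sketch of topological sort by decreasing finishing time; it also raises an error when it meets a back edge (a gray neighbor), in line with the theorem above. The adjacency-list representation is an assumption:

def topological_sort(adj):
    """List the vertices of a dag so that every edge goes from earlier to later.
    Vertices are appended when they finish; reversing gives decreasing finishing time."""
    color = {u: 'white' for u in adj}
    order = []

    def visit(u):
        color[u] = 'gray'
        for v in adj[u]:
            if color[v] == 'white':
                visit(v)
            elif color[v] == 'gray':
                raise ValueError("back edge found: the graph has a cycle")
        color[u] = 'black'
        order.append(u)

    for u in adj:
        if color[u] == 'white':
            visit(u)
    return list(reversed(order))

# topological_sort({'a': ['b'], 'b': ['c'], 'c': []})  -> ['a', 'b', 'c']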
Strongly connected components
If the graph G is not acyclic, then write u ≡ v if u ⪯ v and v ⪯ u.
u ≡ v is an equivalence relation (reflexive, symmetric, transitive). It breaks up V into equivalence classes, called strongly connected components.
Taking these components as new vertices, we get an acyclic graph.
Let G^T be the graph in which each edge of G is reversed.
Algorithm 15.3: Find strongly connected components
1 DFS(G) to compute finishing times u.f
2 DFS(G^T), taking points in order of decreasing u.f
3 return each tree of the last step as a separate strongly connected component.
To show correctness, look at the last element u chosen to perform DFS(G, u). This is the first one to do DFS(G^T, u).
Claim: The elements visited in DFS(G^T, u) are in the strong component of u.
Indeed, if u → v in G^T then v → u in G. Then DFS(G) has not visited v before choosing u, otherwise it would have found u in the search that visited v. Then u → v in G, since DFS(G, u) left nothing unvisited.
Now delete the strong component of u, and repeat the reasoning. . .
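A compact Python sketch of this two-pass procedure (dictionary adjacency lists assumed; the recursive helper is reused for both passes):

def strongly_connected_components(adj):
    """Two DFS passes: order vertices by finishing time in G,
    then collect the trees of a DFS of the transposed graph G^T."""
    order, color = [], {u: 'white' for u in adj}

    def visit(u, graph, out):
        color[u] = 'gray'
        for v in graph[u]:
            if color[v] == 'white':
                visit(v, graph, out)
        out.append(u)                      # appended at finishing time

    for u in adj:
        if color[u] == 'white':
            visit(u, adj, order)

    radj = {u: [] for u in adj}            # the reversed graph G^T
    for u in adj:
        for v in adj[u]:
            radj[v].append(u)

    color = {u: 'white' for u in adj}
    components = []
    for u in reversed(order):              # decreasing finishing time
        if color[u] == 'white':
            comp = []
            visit(u, radj, comp)
            components.append(comp)
    return components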
Shortest paths with edge weights (lengths)
Weight of edge e = (u, v): w(e) = w(u, v).
Weight of path: the sum of the weights of its edges.
Shortest path: lightest path.
Distance δ(u, v) is the length of a lightest path from u to v.
Variants of the problem:
Single-pair (from source s to destination t).
Single-source s: to all reachable points. Returns a tree of lightest paths, represented by the parent function v ↦ v.π.
All-pairs.
Negative weights? These are also interesting, but rst we assume
that all weights are nonnegative.
Relaxation
Dijkstra's algorithm for single-source shortest paths is a refinement of breadth-first search.
During the algorithm, we maintain the candidate distance v.d for each vertex v. By the end, v.d will equal the distance δ(s, v) from s to v.
The updating of v.d using new information is called relaxation.
Algorithm 16.1: RELAX(u, v, w)
Update the candidate distance from s to v using the edge
length w(u, v). If the distance through u is shorter, make u
the parent of v.
1 if v.d > u.d + w(u, v) then
2 v.d ← u.d + w(u, v)
3 v.π ← u
For Dijkstra's algorithm, the points v are on a min-heap Q keyed by the field v.key = v.d. It supports the operation DECREASE-KEY(Q, v, k), which decreases the key value of v to k (floating v up the heap Q as needed):
Algorithm 16.2: RELAX(u, v, w, Q)
1 if v.d > u.d + w(u, v) then
2 DECREASE-KEY(Q, v, u.d + w(u, v))
3 v.π ← u
Algorithm 16.3: DIJKSTRA(G, w, s)
On termination, u.d = δ(s, u), the distance, and u.π represents a tree.
1 INITIALIZE-SINGLE-SOURCE(G, s)
2 Q ← V, a min-heap
3 while Q ≠ ∅ do
4 u ← EXTRACT-MIN(Q)
5 for each v ∈ u.Adj do RELAX(u, v, w, Q)
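A Python sketch of the same algorithm with the standard heapq module; instead of DECREASE-KEY it pushes fresh heap entries and skips stale ones, an implementation shortcut not taken from the notes:

import heapq

def dijkstra(adj, s):
    """Single-source shortest paths with nonnegative edge weights.
    adj[u] is a list of (v, w(u, v)) pairs."""
    dist = {u: float('inf') for u in adj}
    parent = {u: None for u in adj}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:              # stale entry, skip
            continue
        for v, w in adj[u]:
            if dist[v] > d + w:      # relaxation
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, parent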
Correctness: Order the nodes by distance δ(s, u) in a sequence s = u_1, . . . , u_n. We prove by induction on k that at the end, u_k.d = δ(s, u_k). True for k = 1. Let u_j be the previous element on some shortest path to u_k. By inductive assumption, the final u_j.d = δ(s, u_j). The final u_k.d ≥ δ(s, u_j) is computed after the final u_j.d. Relaxation with u_j.d gives u_k.d = u_j.d + w(u_j, u_k).
Cost of Dijkstra's algorithm
Let n = |V|, m = |E|. Potentially a priority queue update for each edge: O(m log n). This depends much on how dense G is.
All pairs, negative edges
Assume that there are no negative edge weights: To compute distances for all pairs, repeat Dijkstra's algorithm from each vertex as source. Complexity O(mn log n), not bad. But it can be improved if G is dense, say m = Θ(n²).
If there are negative edges, new question: how about a negative cycle C? Then the minimum length of walks is −∞ (that is, it does not exist) between any pair of nodes u, v such that u → C and C → v. (A walk can go around a cycle any number of times.)
More modest goal: find out whether there is a negative cycle; if there is not, find all distances (shortest paths are then easy).
The Floyd-Warshall algorithm
Dynamic programming. Recursion idea: Call a k-path one that uses only 1, . . . , k for intermediate vertices. Let d_ij^(k) = distance over k-paths.
A k-path from i to j that is not a (k − 1)-path consists of a (k − 1)-path from i to k and a (k − 1)-path from k to j.
Algorithm 16.4: FLOYD-WARSHALL(W)
If there is any i in a negative cycle then d_ii^(n) < 0.
Otherwise d_ij^(n) = length of a lightest path i → j.
1 for i, j = 1 to n do d_ij^(0) ← w(i, j) // Adjacency matrix
2 for k = 1 to n do
3 for i = 1 to n do
4 for j = 1 to n do d_ij^(k) ← min(d_ij^(k−1), d_ik^(k−1) + d_kj^(k−1))
Cost: O(n³).
Applications
Transitive closure of a directed graph.
Arbitrage. Exchange rate x_ij: this is how much it costs to buy a unit of currency j in currency i. Let w(i, j) = log x_ij. We can become infinitely rich if and only if there is a negative cycle in this graph.
Longest path in an acyclic graph
The Floyd-Warshall algorithm with negative weights would do it, but the following is more efficient.
1 for each vertex u in topologically sorted order (backwards by v.f) do
2 for each vertex v in u.Adj do RELAX(u, v, w)
(Of course, RELAX(u, v, w) takes the maximum here, not the minimum.)
Application: finding the longest path is the basis of all project planning algorithms (PERT method). You are given a number of tasks (for example courses with prerequisites). Figure out the least time needed to finish it all. This assumes unlimited resources, for example the ability to take any number of courses in a semester.
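A Python sketch of longest paths from a source in a dag, relaxing with max along a topological order; the weighted adjacency-list format is an assumption, and the topological order is recomputed inline for self-containment:

def dag_longest_paths(adj, s):
    """Longest path lengths from s in a dag. adj[u] is a list of (v, w) pairs."""
    order, seen = [], set()

    def visit(u):
        seen.add(u)
        for v, _ in adj[u]:
            if v not in seen:
                visit(v)
        order.append(u)                   # finishing order

    for u in adj:
        if u not in seen:
            visit(u)
    order.reverse()                       # topological order

    dist = {u: float('-inf') for u in adj}
    dist[s] = 0
    for u in order:
        if dist[u] == float('-inf'):
            continue                      # not reachable from s
        for v, w in adj[u]:
            if dist[v] < dist[u] + w:     # RELAX with max instead of min
                dist[v] = dist[u] + w
    return dist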
Minimum spanning trees
With respect to connectivity, another important algorithmic
problem is to find the smallest number of edges that still leaves an
undirected graph connected. More generally, edges have weights,
and we want the lightest tree. (Or heaviest, since negative weights
are also allowed.)
Generic algorithm: add a safe edge each time: an edge that does
not form a cycle with earlier selected edges.
A cut of a graph G is any partition V = S ∪ T, S ∩ T = ∅. It respects edge set A if no edge of A crosses the cut. A light edge of a cut: a lightest edge across it.
Theorem
If A is a subset of some lightest spanning tree, (S, T) a cut respecting A, then after adding any light edge of the cut to A, the resulting A′ still belongs to some lightest spanning tree.
Prim's algorithm
Keep adding a light edge adjacent to the already constructed tree.
Algorithm 17.1: MST-PRIM(G, w, r)
Lightest spanning tree of (G, w), root r, returned in v.π.
1 for each vertex u do u.key ← ∞; u.π ← nil
2 r.key ← 0; Q ← G.V, a min-heap keyed by v.key
3 while Q ≠ ∅ do
4 u ← EXTRACT-MIN(Q)
5 for v ∈ u.Adj do
6 if v ∈ Q and w(u, v) < v.key then v.π ← u
7 DECREASE-KEY(Q, v, w(u, v))
The only difference to Dijkstra's algorithm is in how the key gets
decreased: it is set to the candidate lightest edge length from v to
the tree, not the candidate lightest path length from v to the root.
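A Python sketch with heapq, again pushing fresh entries instead of DECREASE-KEY (an implementation shortcut, not the heap operation in the pseudocode):

import heapq

def prim(adj, r):
    """Lightest spanning tree of a connected undirected graph, rooted at r.
    adj[u] is a list of (v, w) pairs; returns the parent map."""
    key = {u: float('inf') for u in adj}
    parent = {u: None for u in adj}
    in_tree = set()
    key[r] = 0
    heap = [(0, r)]
    while heap:
        k, u = heapq.heappop(heap)
        if u in in_tree:
            continue
        in_tree.add(u)
        for v, w in adj[u]:
            if v not in in_tree and w < key[v]:
                key[v] = w               # candidate lightest edge to the tree
                parent[v] = u
                heapq.heappush(heap, (w, v))
    return parent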
Matchings
Example (Workers and jobs)
Suppose that we have n workers
and n jobs. Each worker is capable of performing some of the jobs.
Is it possible to assign each worker to a different job, so that
workers get jobs they can perform?
It depends. If each worker is familiar only with the same one job
(say, digging), then no.
Example
At a dance party, with 300 students, every boy knows
50 girls and every girl knows 50 boys. Can they all dance
simultaneously so that only pairs who know each other dance with
each other?
Bipartite graph: left set A (of girls), right set B (of boys).
Matching, perfect matching.
Theorem
If every node of a bipartite graph has the same degree
d ≥ 1 then it contains a perfect matching.
Examples showing the (local) necessity of the conditions:
Bipartiteness is necessary, even if all degrees are the same.
Bipartiteness and positive degrees are insufficient.
Example
6 tribes partition an island into hunting territories of
100 square miles each. 6 species of tortoise, with disjoint habitats
of 100 square miles each.
Can each tribe pick a tortoise living on its territory, with different
tribes choosing different totems?
For S ⊆ A let N(S) ⊆ B be the set of all neighbors of the nodes of S.
Special property: For every S ⊆ A we have |N(S)| ≥ |S|.
Indeed, the combined hunting area of any k tribes intersects with at least k tortoise habitats.
The above property happens to be the criterion also in the general
case:
Theorem (The Marriage Theorem)
A bipartite graph has a perfect matching if and only if |A| = |B| and for every S ⊆ A we have |N(S)| ≥ |S|.
The condition is necessary.
Proposition
The condition implies the same condition for all S ⊆ B.
Prove this as an exercise.
(Figure: a set S ⊆ A and its neighborhood N(S) ⊆ B.)
Flow networks
Directed graph. Source s, sink t. Every vertex is on some path
from s to t.
Flow: function f(u, v) on all edges (u, v) showing the amount of material going from u to v. We are only interested in the net flow f̃(u, v) = f(u, v) − f(v, u): then f̃(v, u) = −f̃(u, v). So we simply require f(v, u) = −f(u, v).
The total flow entering a non-end node equals the total flow leaving it: for all u ∈ V \ {s, t}, Σ_v f(u, v) = 0.
Each edge (u, v) imposes a capacity c(u, v) ≥ 0 on the flow: f(u, v) ≤ c(u, v). (We may have c(u, v) ≠ c(v, u).)
(Figure: a flow network on nodes s, v_1, v_2, v_3, v_4, t with flow/capacity labels 9/16, 12/12, 12/20, 3/13, 14, 4, 3/4, 9, 7.)
The notation f/c means flow f along an edge with capacity c.
Our goal is to maximize the value of the flow f, that is
|f| = Σ_v f(s, v) = Σ_v f(v, t).
Application to matching
(Figure: the bipartite graph with a source s added on the left and a sink t on the right.)
n points on left, n on right. Edges directed to right, with unit capacity.
Perfect matching ⇒ flow of value n.
Flow of value n ⇒ perfect matching? Not always, but fortunately (as will be seen), there is always an integer maximum flow.
Ford-Fulkerson theory
Idea for increasing the flow: do it along some path.
(Figure: the example network with the current flow 9/16, 12/12, 12/20, 3/13, 14, 4, 3/4, 9, 7.)
Increment it along the path s-v_2-v_4-t by 4. Then increment along the path s-v_2-v_4-v_3-t by 6. Now we are stuck: no more direct way to augment.
(Figure: after the first augmentation; labels 9/16, 12/12, 12/20, 7/13, 4/14, 4/4, 3/4, 9, 7.)
(Figure: after the second augmentation; labels 9/16, 12/12, 18/20, 13/13, 10/14, 4/4, 3/4, 9, 6/7.)
New idea: Increase (by 1) on (s, v_1), decrease on (v_1, v_2), increase on (v_2, v_4), (v_4, v_3), (v_3, t).
Residual network, augmenting path
Generalization: Given a flow f, the residual capacity is c_f(u, v) = c(u, v) − f(u, v). Makes sense even with negative f(u, v). The residual network may have edges (with positive capacity) that were not in the original network. An augmenting path is an s-t path in the residual network (with some flow along it). (How does it change the original flow?)
(Figures: the residual network of the current flow, and the flow 9/16, 12/12, 18/20, 13/13, 10/14, 4/4, 3/4, 9, 6/7 it comes from.)
Residual capacity of the path: the minimum of the residual
capacities of its edges (in the example, 1).
We obtained:
(Figure: the improved flow 10/16, 12/12, 19/20, 13/13, 11/14, 4/4, 2/4, 9, 7/7.)
This cannot be improved: look at the cut (S, T) with T = {v_3, t}.
Method of augmenting paths
Keep increasing the ow along augmenting paths.
Questions
1 If it terminates, did we reach maximum flow?
2 Can we make it terminate?
3 How many augmentations may be needed? Is this a polynomial time algorithm?
None of these questions is trivial. The technique of cuts takes us closer to the solution.
Cuts
Cut (S, T) is a partition of V with s ∈ S, t ∈ T.
Net flow f(S, T) = Σ_{u∈S, v∈T} f(u, v).
Capacity c(S, T) = Σ_{u∈S, v∈T} c(u, v). Obviously, f(S, T) ≤ c(S, T).
(Figure: the example network with a cut (S, T) drawn; flow labels 9/16, 12/12, 12/20, 3/13, 14, 4, 3/4, 9, 7.)
In this example, c(S, T) = 26, f (S, T) = 12.
Lemma
f(S, T) = |f|, the value of the flow.
Corollary
The value of any flow is bounded by the capacity of
any cut.
Theorem (Max-flow, min-cut)
The following properties of a flow f are equivalent.
1 |f| = c(S, T) for some cut (S, T).
2 f is a maximum flow.
3 There are no augmenting paths to f.
The equivalence of the first two statements says that the size of the maximum flow is equal to the size of the minimum cut.
Proof: 1 ⇒ 2 and 2 ⇒ 3 are obvious. The crucial step is 3 ⇒ 1.
Given f with no augmenting paths, we construct (S, T): let S be the nodes reachable from s in the residual network G_f.
Proof of the marriage theorem
Using Max-Flow Min-Cut. Assume there is no perfect matching in the bipartite graph G = (A ∪ B, E), with |A| = |B| = n. We find a bottleneck H ⊆ A with |N(H)| < |H|.
Flow network over A ∪ B ∪ {s, t} as before. Since there is no perfect matching, the maximum flow has size < n. So there is a cut (S, T), s ∈ S, t ∈ T, with c(S, T) < n.
(Figure: the cut in the flow network; A is split into H = S ∩ A and T ∩ A, and the neighborhood H′ = N(H) into H′ ∩ S and H′ ∩ T.)
Let H = S ∩ A, H′ = N(H).
n > c(S, T)
= c({s}, T) + c(S ∩ A, T ∩ B) + c(S, {t})
≥ (n − |H|) + |H′ ∩ T| + |H′ ∩ S|
= n − |H| + |H′|,
so |H| > |H′|.
Polynomial-time algorithm
Does the Ford-Fulkerson algorithm terminate? Not necessarily
(if capacities are not integers), unless we choose the
augmenting paths carefully.
Integer capacities: always terminates, but may take
exponentially long.
Network derived from the bipartite matching problem: each
capacity is 1, so we terminate in polynomial time.
Edmonds-Karp: use breadth-first search for the augmenting paths. Why should this terminate?
Lemma
In the Edmonds-Karp algorithm, the shortest-path distance δ_f(s, v) increases monotonically with each augmentation.
Proof: Let δ_f(s, u) be the distance of u from s in G_f, and let f′ be the augmented flow. Assume, by contradiction, δ_{f′}(s, v) < δ_f(s, v) for some v; let v be the one among these with smallest δ_{f′}(s, v).
Let u → v be a shortest path edge in G_{f′}, and d := δ_f(s, u) (= δ_{f′}(s, u)); then δ_{f′}(s, v) = d + 1.
Edge (u, v) is new in G_{f′}; so (v, u) was a shortest path edge in G_f, giving δ_f(s, v) = d − 1. But δ_{f′}(s, v) = d + 1 contradicts δ_{f′}(s, v) < δ_f(s, v).
An edge is said to be critical when it has just been filled to capacity.
Lemma
Between every two times that an edge (u, v) is critical, δ_f(s, u) increases by at least 2.
Proof: When it is critical, δ_f(s, v) = δ_f(s, u) + 1. Then it disappears until some flow f′. When it reappears, then (v, u) is critical, so
δ_{f′}(s, u) = δ_{f′}(s, v) + 1 ≥ δ_f(s, v) + 1 = δ_f(s, u) + 2.
Corollary
We have a polynomial algorithm.
Proof: Just bound the number of possible augmentations, noticing
that each augmentation makes some edge critical.
Let n = |V|, m = |E|. Each edge becomes critical at most n/2 times. Therefore there are at most mn/2 augmentations. Each augmentation may take O(m) steps: total bound is O(m²n).
There are better algorithms: Goldberg's push-relabel algorithm, also given in your book, achieves O(n³).
Approximations
The setting
In some cases we cannot hope to find a complete solution to some algorithmic problem in reasonable time: we have to be satisfied with an approximate solution. This makes sense in case of optimization problems.
Suppose we have to maximize a positive function. For an objective function f(x, y) with x, y ∈ {0, 1}^n, for input x the optimum is
M(x) = max_y f(x, y)
where y runs over the possible values, which we will call witnesses.
For 0 < λ, an algorithm A(x) is a λ-approximation if f(x, A(x)) > M(x)/λ.
For minimization problems, with minimum m(x), we require f(x, A(x)) < λ m(x).
Greedy algorithms
Try local improvements as long as you can.
Example (Maximum cut)
Graph G = (V, E), cut S ⊆ V, S̄ = V \ S. Find a cut S that maximizes the number of edges in the cut:
|{ {u, v} ∈ E : u ∈ S, v ∈ S̄ }|.
Greedy algorithm:
Repeat: find a point on one side of the cut whose moving
to the other side increases the cutsize.
Theorem
If you cannot improve anymore with this algorithm
then you are within a factor 2 of the optimum.
Proof. The unimprovable cut contains at least half of all edges.
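A Python sketch of this local-improvement rule in the unweighted case (adjacency sets assumed); a vertex is moved whenever it has more neighbors on its own side than across the cut:

def greedy_max_cut(adj):
    """Local improvement for maximum cut. adj[u] is the set of neighbors of u."""
    S = set()                                  # the other side is V \ S
    improved = True
    while improved:
        improved = False
        for u in adj:
            same = sum(1 for v in adj[u] if (v in S) == (u in S))
            across = len(adj[u]) - same
            if same > across:                  # moving u gains same - across edges
                if u in S:
                    S.remove(u)
                else:
                    S.add(u)
                improved = True
    return S

Each move strictly increases the cut size, which is bounded by |E|, so the loop terminates.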
Randomized algorithms
Generalize maximum cut for the case where edges e have weights w_e, that is, maximize Σ_{u∈S, v∈S̄} w_{uv}.
Question The greedy algorithm brings us within factor 2 of the optimum also in the weighted case. But does it take a polynomial number of steps?
New idea: decide each "v ∈ S?" question by tossing a coin. The expected weight of the cut is (1/2) Σ_e w_e, since each edge is in the cut with probability 1/2.
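A small randomized sketch of the coin-tossing cut; the weighted adjacency-dictionary format and the comparable vertex labels (used to count each undirected edge once) are assumptions:

import random

def random_cut(adj_w):
    """adj_w[u] is a dict v -> w(u, v) for an undirected weighted graph.
    Each vertex joins S by a fair coin toss; each edge is cut with probability 1/2."""
    S = {u for u in adj_w if random.random() < 0.5}
    weight = sum(w for u in adj_w for v, w in adj_w[u].items()
                 if u < v and ((u in S) != (v in S)))
    return S, weight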
Less greed is sometimes better
What does the greedy algorithm for vertex cover say?
The following, less greedy algorithm has a better performance guarantee.
Approx_Vertex_Cover(G):
1 C ← ∅
2 E′ ← E[G]
3 while E′ ≠ ∅ do
4 let (u, v) be an arbitrary edge in E′
5 C ← C ∪ {u, v}
6 remove from E′ every edge incident on either u or v
7 return C
Theorem
Approx_Vertex_Cover has a ratio bound of 2.
Proof. The points of C are endpoints of a matching of |C|/2 edges. Any optimum vertex cover must contain at least one endpoint of each of these edges, hence at least |C|/2 points.
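The same procedure in Python; "an arbitrary edge" is simply taken to be the first remaining edge, an implementation choice:

def approx_vertex_cover(edges):
    """2-approximation for vertex cover. edges is a list of (u, v) pairs."""
    cover = set()
    remaining = list(edges)
    while remaining:
        u, v = remaining[0]                  # an arbitrary edge
        cover.update((u, v))
        remaining = [(a, b) for (a, b) in remaining
                     if a not in (u, v) and b not in (u, v)]
    return cover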
How bad is greedy vertex cover?
Greedy algorithm: Repeat the following, as long as you can:
Find a vertex with the largest possible degree. Delete it from
the graph, along with all the edges incident to it.
We will show a graph where this algorithm does not even have a
constant approximation ratio.
A = {a_1, . . . , a_n}.
B_2 = {b_{2,1}, . . . , b_{2,⌊n/2⌋}}. Here, b_{2,1} is connected to a_1, a_2, further b_{2,2} to a_3, a_4, and so on.
B_3 = {b_{3,1}, . . . , b_{3,⌊n/3⌋}}. Here b_{3,1} is connected to a_1, a_2, a_3, further b_{3,2} to a_4, a_5, a_6, and so on.
and so on.
B_n consists of a single point connected to all elements of A.
The n elements of A will cover all edges. Greedy algorithm:
Each element of A has degree n − 1, so it chooses B_n first.
Then B_{n−1}, then B_{n−2}, and so on, down to B_2.
The total number of points chosen is
⌊n/2⌋ + ⌊n/3⌋ + · · · + ⌊n/n⌋ ≥ n(1/2 + · · · + 1/n) − n ≈ n ln n.
The set-covering problem
Can it be worse than this? No, even for a more general problem:
set cover.
Given (X, ℱ): a set X and a family ℱ of subsets of X, find a min-size subset of ℱ covering X.
Example: Smallest committee with people covering all skills.
Generalization: Set S has weight w(S) > 0. We want a
minimum-weight set cover.
The algorithm Greedy_Set_Cover(X, ℱ):
1 U ← X
2 C ← ∅
3 while U ≠ ∅ do
4 select an S ∈ ℱ that maximizes |S ∩ U|/w(S)
5 U ← U \ S
6 C ← C ∪ {S}
7 return C
Analysis
Let H(n) = 1 + 1/2 + · · · + 1/n (≈ ln n).
Theorem
Greedy_Set_Cover has a ratio bound max_S H(|S|).
See the proof in a graduate course (or in the book).
Application to vertex cover.
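A Python sketch of the weighted greedy rule (weights default to 1, which gives the unweighted version); the list-of-sets input format is an assumption:

def greedy_set_cover(X, family, weight=None):
    """Greedy weighted set cover. family is a list of sets covering X;
    weight[i] > 0 is the weight of family[i] (default 1).
    At each step pick the set maximizing |S ∩ U| / w(S)."""
    if weight is None:
        weight = [1] * len(family)
    U = set(X)
    chosen = []
    while U:
        i = max(range(len(family)), key=lambda i: len(family[i] & U) / weight[i])
        if not family[i] & U:
            raise ValueError("the family does not cover X")
        chosen.append(i)
        U -= family[i]
    return chosen

# greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {3, 4}, {4, 5}]) -> [0, 2]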
Approximation scheme
Knapsack problem
Recall the knapsack problem:
Given: integers b ≥ a_1, . . . , a_n, and integer weights w_1, . . . , w_n.
maximize Σ_j w_j x_j
subject to Σ_j a_j x_j ≤ b,
x_i ∈ {0, 1}, i = 1, . . . , n.
Idea for approximation: break each w_i into a smaller number of big chunks, and use dynamic programming. Let r > 0, w′_i = ⌊w_i/r⌋.
maximize Σ_j w′_j x_j
subject to Σ_j a_j x_j ≤ b,
x_i ∈ {0, 1}, i = 1, . . . , n.
When r is large then our numbers are small, so the dynamic
programming solution is not so expensive. But we still get a good
approximation to the optimum. (See the detailed estimate in a
graduate course.)
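One way to realize the dynamic programming step in Python, over the scaled weights w′_i = ⌊w_i/r⌋ (a sketch of the idea only; choosing r to obtain a (1 + ε)-guarantee is the detailed estimate the notes defer):

def knapsack_approx(a, w, b, r):
    """0-1 knapsack with sizes a, weights w, capacity b, scaling factor r.
    State: best[W] = smallest total size achieving scaled weight exactly W."""
    ws = [wi // r for wi in w]              # w'_i = floor(w_i / r)
    best = {0: 0}
    choice = {0: []}
    for i, (ai, wi) in enumerate(zip(a, ws)):
        new_best, new_choice = dict(best), dict(choice)
        for W, size in best.items():        # either skip item i or add it
            if size + ai <= b:
                W2 = W + wi
                if W2 not in new_best or new_best[W2] > size + ai:
                    new_best[W2] = size + ai
                    new_choice[W2] = choice[W] + [i]
        best, choice = new_best, new_choice
    return choice[max(best)]                # items achieving the largest scaled weight

# knapsack_approx(a=[3, 4, 5], w=[30, 50, 60], b=8, r=10) -> [0, 2]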
NP problems
Examples
shortest vs. longest simple paths
compositeness
subset sum
perfect matching
graph isomorphism
3-coloring
Ultrasound test of sex of fetus.
Decision problems vs. optimization problems vs. search problems.
Example
Given a graph G.
Decision Given k, does G have an independent subset of size k?
Optimization What is the size of the largest independent set?
Search Given k, give an independent set of size k (if there is one).
Optimization+search Give a maximum size independent set.
Abstract problems Instance. Solution.
Encoding We can pretend that all our problems are about strings.
Polynomial-time verification
Example
Hamiltonian cycles.
An NP problem is defined with the help of a polynomial-time function V(x, w) with yes/no values that verifies, for a given input x and witness (certificate) w, whether w is indeed a witness for x.
Decision problem: Is there a witness (say, a Hamilton cycle)?
Search problem: Find a witness!
The same decision problem may belong to very different
verification functions (search problems):
Example
Input: positive integer x that is not of the form z^k for any k > 1.
Decision: Is x composite?
Witness kind 1: Positive y ≠ x, 1, dividing x.
Witness kind 2: Different u, v, w with u² ≡ v² ≡ w² (mod x).
The following theorem will probably be proved for you in cryptography or an algebraic algorithms course:
Theorem
If x is not of the form z^k for k > 1 then it is composite if and only if there is a b such that the equation t² ≡ b (mod x) has more than two solutions in t.
Reduction, completeness
Reduction of problem A_1 to problem A_2 (given by verification functions V_1, V_2) with a reduction (translation) function τ:
∃w V_1(x, w) ⟺ ∃u V_2(τ(x), u).
We will say A_1 ≤_P A_2 if there is a polynomially computable τ.
Examples
Reducing matching to maximum flow.
Vertex cover to maximum independent set: C is a vertex cover
iff V \ C is an independent set.
NP-hardness.
NP-completeness.
Satisfiability of Boolean formulas
Boolean formulas.
F(x, y, z) = (x ∧ y ∧ ¬z) ∨ (x ∧ ¬y ∧ z) ∨ (¬x ∧ y ∧ z)
is 1 if and only if exactly two of the variables x, y, z are 1. So it is satisfiable; say, the assignment x = 1, y = 1, z = 0 satisfies it.
G(x, y, z, t) = (x → y) ∧ (y → z) ∧ (z → t) ∧ x ∧ ¬t
is not satisfiable. Indeed, x → y means the same as ¬x ∨ y, that is x ≤ y. So G requires x ≤ y ≤ z ≤ t, but also x = 1, t = 0.
Example expression of some combinatorial problem via Boolean
formulas: 2-coloring.
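A brute-force Python check of satisfiability for small formulas; the encodings of F and G below follow the descriptions above (the exact connectives of F are a reconstruction consistent with the stated "exactly two variables are 1" property):

from itertools import product

def satisfiable(formula, n_vars):
    """Brute-force check over all 2^n assignments of 0/1 values."""
    return any(formula(*bits) for bits in product([0, 1], repeat=n_vars))

F = lambda x, y, z: (x and y and not z) or (x and not y and z) or (not x and y and z)
G = lambda x, y, z, t: (not x or y) and (not y or z) and (not z or t) and x and not t
# satisfiable(F, 3) -> True (e.g. x = 1, y = 1, z = 0);  satisfiable(G, 4) -> False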
Theorem
Satisfiability is NP-complete.
Theorem
INDEPENDENT SET is NP-complete.
Reducing SAT to it.
Most NP-complete problems are proved such by a chain of
reductions from INDEPENDENT SET, or directly from SAT.
Example
Independent set ≤_P vertex cover ≤_P set cover.
The End