cs330 10 Notes PDF
< 1 + q + q^2 + ⋯ = 1/(1 − q),
where q = 3^(−1/2).
Another example:
1 + 1/2 + 1/3 + ⋯ + 1/n = Θ(log n).
Indeed, for n = 2^(k−1), upper bound:
1 + 1/2 + 1/2 + 1/4 + 1/4 + 1/4 + 1/4 + 1/8 + ⋯
= 1 + 1 + ⋯ + 1 (k times).
Lower bound:
1/2 + 1/4 + 1/4 + 1/8 + 1/8 + 1/8 + 1/8 + 1/16 + ⋯
= 1/2 + 1/2 + ⋯ + 1/2 (k times).
Divide and conquer
An efficient algorithm can frequently be obtained using the following idea:
1. Divide into subproblems of equal size.
2. Solve the subproblems.
3. Combine the results.
In order to handle the subproblems, a more general procedure is often needed.
Merge sort
1. Subproblems: sorting A[1..n/2] and A[n/2+1..n].
2. Sort these.
3. Merge the two sorted arrays of size n/2.
The more general procedures now are the ones that sort and merge arbitrary parts of an array.
Algorithm 3.1: MERGE(A, p, q, r)
Merges A[p..q] and A[q+1..r].
1 n1 ← q − p + 1; n2 ← r − q
2 create arrays L[1..n1+1] and R[1..n2+1]
3 for i ← 1 to n1 do L[i] ← A[p + i − 1]
4 for j ← 1 to n2 do R[j] ← A[q + j]
5 L[n1+1] ← ∞; R[n2+1] ← ∞
6 i ← 1, j ← 1
7 for k ← p to r do
8   if L[i] ≤ R[j] then A[k] ← L[i]; i++
9   else A[k] ← R[j]; j++
Why the ∞ business? These sentinel values allow us to avoid an extra case for when L or R is exhausted. This is also why we used new arrays for the input, rather than for the output.
Algorithm 3.2: MERGE-SORT(A, p, r)
Sorts A[p..r].
1 if p < r then
2   q ← ⌊(p + r)/2⌋
3   MERGE-SORT(A, p, q); MERGE-SORT(A, q+1, r)
4   MERGE(A, p, q, r)
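Here is a minimal Python sketch of the two procedures above, using a large sentinel value in place of ∞; index 0 of the list is left unused to mimic the 1-based pseudocode.

import math

def merge(A, p, q, r):
    """Merge sorted A[p..q] and A[q+1..r] (inclusive), using sentinels."""
    L = A[p:q+1] + [math.inf]      # copy of A[p..q] plus sentinel
    R = A[q+1:r+1] + [math.inf]    # copy of A[q+1..r] plus sentinel
    i = j = 0
    for k in range(p, r + 1):
        if L[i] <= R[j]:
            A[k] = L[i]; i += 1
        else:
            A[k] = R[j]; j += 1

def merge_sort(A, p, r):
    if p < r:
        q = (p + r) // 2
        merge_sort(A, p, q)
        merge_sort(A, q + 1, r)
        merge(A, p, q, r)

A = [None, 5, 2, 4, 7, 1, 3, 2, 6]   # slot 0 unused
merge_sort(A, 1, len(A) - 1)
print(A[1:])                          # [1, 2, 2, 3, 4, 5, 6, 7]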
Analysis, for the worst-case running time T(n):
T(n) ≤ c1                if n = 1,
T(n) ≤ 2·T(n/2) + c2·n   otherwise.
Draw a recursion tree based on this inequality.
Recursion tree
n
n/2   n/2
n/4   n/4   n/4   n/4
...
Work on the top level, total work on level 1, total work on level 2, ...
Assume n = 2^k:
c2·n·(1 + 2·(1/2) + 4·(1/4) + ⋯ + 2^k·2^(−k)) + c1·n
= c2·n·k + c1·n = n·(c2·log n + c1) = O(n log n)
Nonrecursive version
Perform first the jobs at the bottom level, then those on the next level, and so on. In pass k, pairs of runs of length 2^(k−1) are merged into runs of length 2^k; pass k+1 then merges pairs of those.
(Figure: two consecutive passes of merges over runs of length 2^(k−1) and 2^k.)
Sorting without random access
What if we have to sort such a large list A that it does not fit into random access memory?
If we are at a position p we can inspect or change A[p], but in one step we can only move to position p+1 or p−1.
Note that the algorithm MERGE(A, p, q, r) passed through the arrays A, L, R in one direction only, not jumping around. But both the recursive and the nonrecursive MERGE-SORT(A) use random access.
We will modify the nonrecursive MERGE-SORT(A) to work without random access.
Runs
For position p in array A, let p′ ≥ p be as large as possible with
A[p] ≤ A[p+1] ≤ ⋯ ≤ A[p′].
The sub-array A[p..p′] will be called a run.
Our algorithm will consist of a number of passes. Each pass goes through array A and merges consecutive runs: namely it
performs MERGE(A, p, q, r) where p = 1, q = p′, r = (q+1)′,
does the same starting with p ← r + 1,
and so on.
Algorithm 3.3: MERGE-RUNS(A)
Merge consecutive runs in A, return the new number of runs.
1 n ← A.size; r ← 0; k ← 1
2 while r < n do
3   p ← r + 1; q ← p
4   while q < n and A[q+1] ≥ A[q] do q++   // q ← p′
5   if q = n then r ← n else
6     r ← q + 1; k++
7     while r < n and A[r+1] ≥ A[r] do r++
8     MERGE(A, p, q, r)
9 return k
Algorithm 3.4: DISK-SORT(A)
Sort without random access.
1 repeat k ← MERGE-RUNS(A) until k = 1
Each MERGE-RUNS(A) roughly halves the number of runs.
If you view the arrays as being on tapes, the algorithm still uses too many rewind operations to return from the end of the tape to the beginning. In an exercise, I will ask you to modify DISK-SORT(A) (and MERGE-RUNS(A)) so as to eliminate almost all rewinds.
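A minimal Python sketch of MERGE-RUNS and DISK-SORT (a "natural" bottom-up mergesort); the names and the small in-memory merge buffer are my own choices for illustration, while a true tape version would stream the runs.

def merge_runs(A):
    """One pass: merge consecutive non-decreasing runs; return the new number of runs."""
    n, i, k = len(A), 0, 0
    while i < n:
        j = i                                   # find the end of the first run
        while j + 1 < n and A[j+1] >= A[j]:
            j += 1
        k += 1                                  # one output run per outer iteration
        if j + 1 == n:
            i = n
        else:
            r = j + 1                           # find the end of the second run
            while r + 1 < n and A[r+1] >= A[r]:
                r += 1
            left, right = A[i:j+1], A[j+1:r+1]  # merge the two runs in place
            a = b = 0
            for t in range(i, r + 1):
                if b == len(right) or (a < len(left) and left[a] <= right[b]):
                    A[t] = left[a]; a += 1
                else:
                    A[t] = right[b]; b += 1
            i = r + 1
    return k

def disk_sort(A):
    while merge_runs(A) > 1:
        pass

A = [3, 1, 4, 1, 5, 9, 2, 6]
disk_sort(A)
print(A)   # [1, 1, 2, 3, 4, 5, 6, 9]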
Lower bound on sorting
Lower bounds are difficult, in general: it is hard to prove that every algorithm needs at least time T(n) in the worst case of our problem, on inputs of length n.
We do not have such a result even for sorting.
Frequently, we show the lower bound only for algorithms belonging to a specific class.
In the case of sorting, there is a natural class: algorithms that get information about the array elements only in one way: comparing them to each other.
They do not look at details of an element: no arithmetical operations on elements, or looking at individual bits.
We will only count the number of comparisons used.
Decision tree
From the work of any such algorithm, we can derive a decision tree.
(Figure: decision tree for sorting three elements; internal nodes are comparisons a < b, a < c, b < c with Yes/No branches, and the leaves are the six orderings a < b < c, a < c < b, c < a < b, b < a < c, b < c < a, c < b < a.)
Each execution of the algorithm gives a downward path in the tree.
Worst-case time is lower-bounded by the longest downward path (height) of this binary tree.
Theorem
A tree of height h has at most 2^h leaves.
Indeed, when we increase the height by 1, each leaf can give rise to at most two children, so the number of leaves can at most double.
Corollary
If a tree has L leaves, then its height is at least log L.
Application to sorting
The number of leaves is at least n!. Hence the height is at least log n!; in fact, it is ⌈log n!⌉.
log n! = log 2 + log 3 + ⋯ + log n ≤ n log n.
We want a lower bound. Let us show that for n large enough, this is ≥ 0.8·n log n:
log 2 + log 3 + ⋯ + log n ≥ log(n/10) + log(n/10 + 1) + ⋯ + log n
≥ 0.9n·log(n/10) = 0.9n·(log n − log 10) ≥ 0.8·n log n.
Of course, we could have written 0.99 in place of 0.8 in this result.
Theorem
Sorting n items needs at least
log n! = n log n·(1 + o(1))
comparisons in the worst case.
Average case
Average case? We need a lower bound on the average path length in a binary tree with a given number L of leaves. This seems hard to compute, but we only need a lower bound! Call a binary tree optimal if its average path length is as small as possible for its number of leaves.
Claim
In an optimal binary tree with height h, every path has length h−1 or h; hence, the average path length is ≥ ⌊log L⌋.
Indeed, if there is any shorter path to some leaf ℓ, take a leaf ℓ′ with a longest path, and move it under ℓ. This decreases the average path length.
Corollary
The average number of comparisons on a random list with n elements is ≥ ⌊log n!⌋.
Information
Here is an intuitive retelling of the above argument.
If all we know about an object x is that it belongs to some known set E of size m, we say that we have an uncertainty of log m bits. Each comparison gives us at most 1 bit of information (decrease of uncertainty) about the permutation in question. Since there are n! permutations, our uncertainty is log n! bits, so we need that many bits of information to remove it.
Finding the maximum
Here is another decision-tree lower bound, one that is not information-theoretic.
Problem Given a list of n numbers, find the largest element.
Solution (You know it: one pass.)
Other solution Tournament.
Cost n − 1 comparisons in both cases.
Question Is this optimal?
The information-theoretic lower bound is only log n.
Theorem
We need at least n − 1 comparisons in order to find the maximum.
Indeed, every non-maximum element must participate in a comparison that shows it is smaller than the one it is compared to.
Listing permutations
Here is another algorithm that is guaranteed to take a long time: list all permutations of an array.
Why is it taking long? For a trivial reason: the output is long (n!·n items).
How to implement? It is not so easy to organize going through all possibilities. We will use recursion: similar techniques will help with other problems involving exhaustive search.
In what order to list? Lexicographic order: if a_1 … a_k … a_n and b_1 … b_k … b_n are permutations, a_i = b_i for i < k and a_k < b_k, then a_1 … a_k … a_n comes first.
Algorithm 4.1: INSERT(A, i, j)
Move the element at position i to position j, shifting the elements in between.
1 x ← A[i]
2 if i < j then for k ← i+1 to j do A[k−1] ← A[k]
3 else if j < i then for k ← i−1 downto j do A[k+1] ← A[k]
4 A[j] ← x
Algorithm 4.2: LIST-PERMS-LEX(A, p)
List, in lexicographic order, all permutations of A that do not change A[1..p−1]. Leave A in the original order.
1 if p = n then print A else
2   LIST-PERMS-LEX(A, p+1)
3   for i = p+1 to n do
4     INSERT(A, i, p); LIST-PERMS-LEX(A, p+1); INSERT(A, p, i)
The solution below uses only swaps, but lists in some non-lexicographic order.
Algorithm 4.3: SWAP(A, i, j)
1 x ← A[i]; A[i] ← A[j]; A[j] ← x
Algorithm 4.4: LIST-PERMS-SWAP(A, p)
List all permutations of A that do not change A[1..p−1]. Leave A in the original order.
1 if p = n then print A else
2   LIST-PERMS-SWAP(A, p+1)
3   for i = p+1 to n do
4     SWAP(A, i, p); LIST-PERMS-SWAP(A, p+1); SWAP(A, p, i)
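A small Python sketch of the two listing procedures, with 0-based indices; the helper name insert_move is my own stand-in for INSERT above.

def insert_move(A, i, j):
    """Move A[i] to position j, shifting the elements in between (0-based)."""
    x = A.pop(i)
    A.insert(j, x)

def list_perms_lex(A, p=0):
    """Print all permutations of A that keep A[0..p-1] fixed, in lexicographic order."""
    n = len(A)
    if p >= n - 1:
        print(A)
        return
    list_perms_lex(A, p + 1)
    for i in range(p + 1, n):
        insert_move(A, i, p)          # bring A[i] to the front of the suffix
        list_perms_lex(A, p + 1)
        insert_move(A, p, i)          # restore the original order

def list_perms_swap(A, p=0):
    """Same permutations, generated with swaps, in a non-lexicographic order."""
    n = len(A)
    if p >= n - 1:
        print(A)
        return
    list_perms_swap(A, p + 1)
    for i in range(p + 1, n):
        A[i], A[p] = A[p], A[i]
        list_perms_swap(A, p + 1)
        A[i], A[p] = A[p], A[i]

list_perms_lex([1, 2, 3])   # 123, 132, 213, 231, 312, 321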
Priority queues
If we sort an array, then it becomes very easy to find the largest element, the second largest, and so on. Sometimes we need a data structure that just does this.
For this, we define the abstract data type, or object class, called priority queue.
The priority queue object maintains a set S of elements, each associated with a value (say, an integer) called a key.
Operations INSERT(S, x), MAXIMUM(S), EXTRACT-MAX(S).
Later INCREASE-KEY(x, k).
Applications Job scheduling in a multiuser operating system. Event-driven simulation.
Later in this course: helping various algorithms to use data.
Implementations
Unordered array All operations but INSERT take Θ(n) steps.
Sorted array All but MAXIMUM take Θ(n) steps.
Heap (has nothing to do with the heaps of memory management) MAXIMUM takes 1 step, the others O(log n).
Fibonacci heap (later) All operations cost O(log n), and all but EXTRACT-MAX have amortized cost O(1). (See the definition of amortized cost later.)
Heap
1. A (binary) tree with keys increasing towards the root: A[PARENT(i)] ≥ A[i].
2. A special array implementation of this tree:
PARENT(i) = ⌊i/2⌋, LEFT(i) = 2i, RIGHT(i) = 2i+1.
A[1]=15
A[2]=12 A[3]=9
A[4]=8 A[5]=2 A[6]=1 A[7]=7
A[8]=6 A[9]=7 A[10]=1
Building and maintaining a heap
HEAPIFY(A, i) will be called by several of our methods. Let
argmax_A(i, j, k) = i if A[i] ≥ A[j], A[k]; else j if A[j] ≥ A[k]; else k.
Algorithm 5.1: HEAPIFY(A, i)
Assuming the trees under LEFT(i), RIGHT(i) are heaps, make the tree under i also a heap.
Implementation: float down A[i].
1 l ← LEFT(i); if l > A.heap-size then l ← i
2 r ← RIGHT(i); if r > A.heap-size then r ← i
3 largest ← argmax_A(i, l, r)
4 if largest ≠ i then
5   SWAP(A, i, largest)
6   HEAPIFY(A, largest)
Implementing EXTRACT-MAX(A):
Save A[1] (the maximum) to return it; put the last element into the root: A[1] ← A[A.heap-size]; A.heap-size−−.
HEAPIFY(A, 1).
Implementing INSERT(A, k):
A.heap-size++; A[A.heap-size] ← k
Float up A[A.heap-size].
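A compact Python sketch of a max-heap stored in a 0-based list, following HEAPIFY, EXTRACT-MAX and INSERT above.

def heapify(A, i, size):
    """Float A[i] down until the subtree rooted at i is a max-heap (0-based indices)."""
    while True:
        l, r = 2*i + 1, 2*i + 2
        largest = i
        if l < size and A[l] > A[largest]:
            largest = l
        if r < size and A[r] > A[largest]:
            largest = r
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def extract_max(A):
    """Remove and return the maximum; the list shrinks by one."""
    top = A[0]
    A[0] = A[-1]
    A.pop()
    heapify(A, 0, len(A))
    return top

def insert(A, key):
    """Append the key and float it up."""
    A.append(key)
    i = len(A) - 1
    while i > 0 and A[(i - 1) // 2] < A[i]:
        A[i], A[(i - 1) // 2] = A[(i - 1) // 2], A[i]
        i = (i - 1) // 2

H = []
for x in [8, 15, 2, 12, 7]:
    insert(H, x)
print([extract_max(H) for _ in range(len(H))])   # [15, 12, 8, 7, 2]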
Build a heap cheaper
The above implementation would be sufficient for sorting, since we can build a heap by repeated insertion and then extract the maximum repeatedly. But the following is better.
Algorithm 5.2: BUILD-HEAP(A)
Build a heap, assuming the elements are in the array A.
1 for i ← A.heap-size downto 1 do HEAPIFY(A, i)
The naive estimate for the cost is O(n log n), if n = A.heap-size. But more analysis shows:
Theorem
The cost of algorithm BUILD-HEAP(A) on a heap of size n is O(n).
The largest depth is D = ⌊log n⌋, starting with the root at depth 0. For the 2^d elements i at depth d, the cost of HEAPIFY(A, i) is at most h = D − d. Total cost, converting the depth d to the height h = D − d:
∑_{d=0}^{D} 2^d·(D − d) = ∑_{h=0}^{D} h·2^(D−h) = 2^D·∑_{h=0}^{D} h·2^(−h) ≤ n·∑_{h≥0} h·2^(−h).
Though there is an exact formula for the last sum, a rough estimate is more instructive for us. Since h < 2^(h/2), we can write, with some constant c:
∑_{h≥0} h·2^(−h) ≤ c·∑_{h≥0} 2^(−h/2) < ∞,
as a geometric series with quotient < 1.
Abstract data types (objects)
Some pieces of data, together with some operations that can be performed on them.
Externally, only the operations are given, and only these should be accessible to other parts of the program.
The internal representation is private. It consists typically of a number of records, where each record has a key and satellite data.
We will ignore the satellite data unless they are needed for the data manipulation itself.
Increasing a key
This operation (giving more priority to some element) is important in some applications where the priority queue helps some algorithm (say, Dijkstra's shortest path algorithm).
Let the changed item float up:
Algorithm 5.3: INCREASE-KEY(A, i, key)
Increase A[i] to key > A[i].
1 A[i] ← key
2 while i > 1 and A[PARENT(i)] < A[i] do
3   SWAP(A, i, PARENT(i))
4   i ← PARENT(i)
The INSERT(A, key) operation can also be seen as a special case of INCREASE-KEY(A, i, key).
Quicksort
Another way to apply divide and conquer to sorting. Both here and in mergesort, we sort the two parts separately and combine the results.
With Mergesort, we sort first two predetermined halves, and combine the results later.
With Quicksort, we determine first what elements fall into the two halves (approximately) of the sorted result, using PARTITION. Calling this recursively on both halves completes the sort.
Algorithm 6.1: PARTITION(A, p, r)
Partition A[p..r] into elements ≤ x = A[r] placed in positions p, …, i, position i+1 containing x, and the rest containing elements > x. Return i+1.
1 x ← A[r]
2 i ← p − 1
3 for j ← p to r − 1 do
4   if A[j] ≤ x then
5     i++; SWAP(A, i, j)
6 SWAP(A, i+1, r); return i+1
Example A = [2, 8, 7, 1, 3, 5, 6, 4].
Analysis Invariant: at the beginning of each loop iteration, positions p, …, i hold elements ≤ x, and positions i+1, …, j−1 hold elements > x.
Algorithm 6.2: QUICKSORT(A, p, r)
1 if p < r then
2   q ← PARTITION(A, p, r)
3   QUICKSORT(A, p, q−1); QUICKSORT(A, q+1, r)
Performance analysis
Worst case
Let T(n) be the worst-case number of comparisons. If q = PARTITION(A, 1, n) then
T(n) = T(q−1) + T(n−q) + n − 1.
This is not immediately helpful, since we do not know q. But q = n is possible (say, the array was already sorted), giving
T(n) = T(n−1) + n − 1. Resolving the recursion shows that then all n(n−1)/2 comparisons are performed!
Balanced partitioning
Hypothetical partitioning algorithm that always gives 0.1n < q < 0.9n.
Analysis with a recursion tree gives n log n / log(10/9).
Approximate balanced partitioning by partitioning around a random array element.
Random number generator RANDOM(a, b) returns a random, uniformly distributed integer between a and b (b − a + 1 possibilities).
In practice, we only have a pseudo-random number generator. This is a separate topic; for the moment let us hope it gives numbers that behave just like true random ones.
Randomized Quicksort
Algorithm 6.3: RANDOMIZED-PARTITION(A, p, r)
1 SWAP(A, r, RANDOM(p, r))
2 return PARTITION(A, p, r)
Algorithm 6.4: RANDOMIZED-QUICKSORT(A, p, r)
1 if p < r then
2   q ← RANDOMIZED-PARTITION(A, p, r)
3   RANDOMIZED-QUICKSORT(A, p, q−1)
4   RANDOMIZED-QUICKSORT(A, q+1, r)
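A minimal in-place Python version of PARTITION and RANDOMIZED-QUICKSORT on a 0-based list.

import random

def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; return the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i+1], A[r] = A[r], A[i+1]
    return i + 1

def randomized_quicksort(A, p, r):
    if p < r:
        k = random.randint(p, r)       # move a uniformly random element into the pivot slot
        A[k], A[r] = A[r], A[k]
        q = partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

A = [2, 8, 7, 1, 3, 5, 6, 4]
randomized_quicksort(A, 0, len(A) - 1)
print(A)   # [1, 2, 3, 4, 5, 6, 7, 8]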
Expected value
In order to analyze randomized quicksort, we learn some probability concepts. I assume that you learned about random variables already in an introductory course (Probability in Computing, or its equivalent).
For a random variable X with possible values a_1, …, a_n, its expected value EX is defined as
EX = a_1·P[X = a_1] + ⋯ + a_n·P[X = a_n].
Example: if Z is a random variable whose values are the possible outcomes of a toss of a 6-sided die, then
EZ = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.
Example: if Y is the random variable that is 1 if Z ≥ 5, and 0 otherwise, then
EY = 1·P[Z ≥ 5] + 0·P[Z < 5] = P[Z ≥ 5].
Sum theorem
Theorem
For random variables X, Y (on the same sample space):
E(X + Y) = EX + EY.
Example
For the number X of spots on top after a toss of a die, let A be the event 2 | X, and B the event X > 1. Dad gives me a dime if A occurs and Mom gives one if B occurs. What is my expected win?
Let I_A be the random variable that is 1 if A occurs and 0 otherwise. Then
E(I_A + I_B) = EI_A + EI_B = P(A) + P(B) = 1/2 + 5/6 dimes.
Average-case analysis of Quicksort
Let the sorted order be z_1 < z_2 < ⋯ < z_n. If i < j then let
Z_ij = {z_i, z_{i+1}, …, z_j}.
Let C_ij = 1 if z_i and z_j will be compared sometime during the sort, and 0 otherwise.
The only comparisons happen during a partition, with the pivot element. Let π_ij be the first (random) pivot element entering Z_ij. A little thinking shows:
Lemma
We have C_ij = 1 if and only if π_ij ∈ {z_i, z_j}. Also, for every x ∈ Z_ij, we have
P[π_ij = x] = 1/(j − i + 1).
It follows that P[C_ij = 1] = E C_ij = 2/(j − i + 1). The expected number of comparisons is, with k = j − i + 1:
∑_{1≤i<j≤n} E C_ij = ∑_{1≤i<j≤n} 2/(j − i + 1)
= 2·∑_{k=2}^{n} ∑_{i=1}^{n−k+1} 1/k = 2·∑_{k=2}^{n} (n − k + 1)/k
= 2·((n−1)/2 + (n−2)/3 + ⋯ + 1/n)
= 2(n+1)·(1 + 1/2 + 1/3 + ⋯ + 1/n) − 4n.
From analysis we know 1 + 1/2 + 1/3 + ⋯ + 1/n = ln n + O(1). Hence the average complexity is O(n log n).
Warning: we also want to know the probability of the cost exceeding, say, 10·n log n. Analysis of the variance of the random cost gives this, too.
Median and order statistics
The median of a list is the middle element (or the first of the two middle ones, if there are two) in the sorted order. We will try to find it faster than by full sorting.
More generally, we may want to find the ith element in the sorted order. (This generalizes minimum, maximum and median.) This is called selection.
We will see that all selection tasks can be performed in linear time.
Selection in expected linear time
Algorithm 7.1: RANDOMIZED-SELECT(A, p, r, i)
Return the ith element in the sorted order of A[p..r].
1 if p = r then return A[p]
2 q ← RANDOMIZED-PARTITION(A, p, r)
3 k ← q − p + 1
4 if i = k then return A[q]
5 else if i < k then return RANDOMIZED-SELECT(A, p, q−1, i)
6 else return RANDOMIZED-SELECT(A, q+1, r, i−k)
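A short self-contained Python sketch of the same procedure (0-based list, i counts from 1); the inlined randomized partition is the one from the quicksort example.

import random

def randomized_select(A, p, r, i):
    """Return the i-th smallest element of A[p..r] (i counts from 1)."""
    if p == r:
        return A[p]
    t = random.randint(p, r)           # randomized partition around a random pivot
    A[t], A[r] = A[r], A[t]
    x, s = A[r], p - 1
    for j in range(p, r):
        if A[j] <= x:
            s += 1
            A[s], A[j] = A[j], A[s]
    A[s+1], A[r] = A[r], A[s+1]
    q = s + 1
    k = q - p + 1                      # rank of the pivot within A[p..r]
    if i == k:
        return A[q]
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)

A = [9, 1, 7, 3, 8, 5, 2]
print(randomized_select(A, 0, len(A) - 1, 4))   # 5, the 4th smallest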
Analysis
The worst-case running time is O(n²), even for finding the minimum!
We will find an upper bound T(n) on the expected running time of RANDOMIZED-SELECT(A, p, r, i), independently of i, when r − p + 1 = n.
Assume that an is the cost of the partition and the decision. Denoting by T_0 the running time of the whole process, and by T_1 the running time of the recursive call, we have
T_0 ≤ an + T_1,
E T_0 ≤ an + E T_1.
Let Q = RANDOMIZED-PARTITION(A, p, r). Let us bound T_1 under the condition Q − p + 1 = k:
If i = k, then T_1 = 0.
Else if i < k, then T_1 ≤ T(k−1);
else T_1 ≤ T(n−k).
So E(T_1 | Q − p + 1 = k) ≤ T(max(k−1, n−k)).
Since P[Q − p + 1 = k] = 1/n, we can now sum, using Bayes's theorem:
E T_1 ≤ ∑_{k=1}^{n} (1/n)·T(max(k−1, n−k)) ≤ (2/n)·∑_{k=n/2}^{n−1} T(k),
where we assumed n even, for simplicity.
We got the recursion T(n) ≤ an + (2/n)·∑_{k=n/2}^{n−1} T(k). We will prove by induction that T(n) ≤ cn for some c > 0. Assume it is true for all values < n; we prove it for n (choosing c appropriately).
(2/n)·∑_{k=n/2}^{n−1} ck = (2c/n)·(n/2)·(n/2 + n − 1)/2 ≤ (3/4)·cn,
where we used the sum formula of an arithmetic series. Now an + (3/4)·cn ≤ cn if we make a + 0.75c ≤ c, that is c = 4a. For the induction we also need the initial condition T(1) ≤ c·1, but this is true with c = 4a.
Number-theoretic algorithms
Size of input Just as before, we count the number of input bits. If the input is a positive integer x of 1000 bits, the input size is 1000 ≈ log x, and not x. So, counting down from x to 1 is an exponential algorithm, as a function of the input size.
Cost of operations Most of the time, for cost, we still mean the execution time of the computer, which has a fixed word size. So, multiplying two very large numbers x, y is not one operation, since the representation of the numbers x, y must be spread over many words. We say that we measure the number of bit operations.
Sometimes, though, we will work with a model in which we just measure the number of arithmetic operations. This is similar to the sorting model in which we only counted comparisons.
Elementary number theory
Divisibility and divisors. Notation x | y.
Prime and composite numbers.
Theorem
If a, b are integers, b > 0, then there are unique numbers q, r with
a = qb + r, 0 ≤ r < b.
Of course, q = ⌊a/b⌋. We write r = a mod b.
Cost of computing the remainder and quotient: O(len(q)·len(b)). Indeed, the long division algorithm has len(q) iterations, with numbers of length len(b).
If u mod b = v mod b then we write
u ≡ v (mod b)
and say u is congruent to v modulo b. For example, the numbers congruent to 1 modulo 2 are the odd numbers.
Common divisors
It is not easy to find the divisors of a number: most cryptographic algorithms are based on this difficulty of factorization.
But, as we will see, common divisors of two (or more) numbers are easy to find.
Theorem
For numbers a, b there are integers x, y such that
d = xa + yb > 0
is a common divisor of a and b.
It is easy to see that every common divisor of a, b divides d, so it is the greatest common divisor, denoted by gcd(a, b).
Relatively prime integers a, b: those without a nontrivial common divisor, that is gcd(a, b) = 1.
Fundamental theorem
Lemma (Fundamental)
If p is prime, then p | ab if and only if p | a or p | b.
Proof. If p does not divide a, then gcd(a, p) = 1 = xa + yp, so b = xab + ypb. Since p | ab, this shows p | b.
This implies:
Theorem (Fundamental theorem of arithmetic)
Every positive integer n has a unique prime decomposition n = p_1^{e_1} ⋯ p_k^{e_k}.
The existence of some decomposition can be seen easily: just keep dividing as long as you can, and collect the indivisible parts (the primes). The lemma is needed to prove uniqueness.
Euclid's algorithm
Based on the observation that if a = qb + r then the common divisors of a, b are the same as the common divisors of b, r.
Algorithm 8.1: EUCLID(a, b)
Computes gcd(a, b).
1 if b = 0 then return a
2 else return EUCLID(b, a mod b)
Example: gcd(30, 21).
30 = 1·21 + 9
21 = 2·9 + 3
9 = 3·3
gcd = 3
Running time
Assume b < a:
a = bq + r ≥ b + r > 2r,
a/2 > r,
ab/2 > br.
In each iteration, the product of the two arguments decreases by a factor of 2, so there are at most log(ab) iterations, each using one division.
Extended Euclidean algorithm
Proof of the theorem that some common divisor of a, b (namely the greatest one) can be written as xa + yb.
It is true if b = 0. Assume it is true for pairs on which the Euclidean algorithm takes < n steps, and prove by induction.
a = bq + r.
By induction,
d = x′b + y′r = x′b + y′(a − qb) = y′a + (x′ − qy′)b.
Algorithm 8.2: EXTENDED-EUCLID(a, b)
Returns (d, x, y) where d = gcd(a, b) = xa + yb.
1 if b = 0 then return (a, 1, 0)
2 else
3   (d′, x′, y′) ← EXTENDED-EUCLID(b, a mod b)
4   return (d′, y′, x′ − ⌊a/b⌋·y′)
Nonrecursive version
The following version is better for analysis and hand calculation:
Algorithm 8.3: EXTENDED-EUCLID(a, b)
Nonrecursive version.
1 r ← a; r′ ← b
2 x ← 1; y ← 0
3 x′ ← 0; y′ ← 1
4 while r′ ≠ 0 do
5   q ← ⌊r/r′⌋
6   (r, x, y, r′, x′, y′) ← (r′, x′, y′, r − qr′, x − qx′, y − qy′)
7 return (r, x, y)
Proof of validity: show by induction that always r = xa + yb (and likewise r′ = x′a + y′b).
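A direct Python transcription of the nonrecursive version; the assertion checks the invariant from the validity proof on every iteration.

def extended_euclid(a, b):
    """Return (d, x, y) with d = gcd(a, b) = x*a + y*b."""
    r, r2 = a, b
    x, y = 1, 0
    x2, y2 = 0, 1
    while r2 != 0:
        q = r // r2
        r, x, y, r2, x2, y2 = r2, x2, y2, r - q*r2, x - q*x2, y - q*y2
        assert r == x*a + y*b and r2 == x2*a + y2*b   # the loop invariant
    return r, x, y

print(extended_euclid(30, 21))    # (3, -2, 3): 3 = -2*30 + 3*21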
Diophantine equations
Solving the equation
ax + by = c
in integers. Let d = gcd(a, b). There is a solution only if d | c. By the extended Euclidean algorithm, there are x′, y′ with ax′ + by′ = d. Hence we get a solution
x = x′·(c/d), y = y′·(c/d).
Multiplicative inverse
If gcd(a, m) = 1 then there is an a′ with
a·a′ mod m = 1.
Indeed, we just need to solve the equation ax + my = 1. The number a′ is called the multiplicative inverse of a modulo m.
Application: if p is prime then every 0 < a < p has a multiplicative inverse modulo p.
Modular arithmetic
Congruences behave somewhat like equations:
Theorem
If a_1 ≡ b_1 (mod m) and a_2 ≡ b_2 (mod m) then
a_1 ± a_2 ≡ b_1 ± b_2 (mod m),
a_1·a_2 ≡ b_1·b_2 (mod m).
Verify this in the lab. Division is also possible in the cases when there is a multiplicative inverse:
Theorem
If gcd(a, m) = 1 then there is an a′ with a′·a ≡ 1 (mod m).
Efficient exponentiation
In modular arithmetic, we can exponentiate in polynomial time:
Theorem
Given numbers a, n, m, there is an algorithm that computes a^n mod m in time polynomial in the size of the input (which is log a + log n + log m).
This is not true for ordinary exponentiation: 2^n has about n digits, so it cannot even be written down in time polynomial in log n.
The trick that helps in the modular case is repeated squaring:
Compute a^2 mod m, a^4 mod m, a^8 mod m, …, using
a^(2k) mod m = (a^k mod m)^2 mod m.
Then multiply the needed powers. For example, if n = 10 = 2^3 + 2^1, then a^10 = a^(2^3)·a^(2^1).
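A small Python sketch of repeated squaring (Python's built-in pow(a, n, m) computes the same thing).

def mod_exp(a, n, m):
    """Compute a**n mod m with repeated squaring: O(log n) multiplications mod m."""
    result = 1
    base = a % m
    while n > 0:
        if n & 1:                      # this bit of n is set: multiply this power in
            result = (result * base) % m
        base = (base * base) % m       # a^(2k) mod m = (a^k mod m)^2 mod m
        n >>= 1
    return result

print(mod_exp(7, 10, 13), pow(7, 10, 13))   # both print 4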
Dynamic sets
These are abstract data types, just like heaps. Various subtypes exist, depending on which operations we want to support:
SEARCH(S, k)
INSERT(S, k)
DELETE(S, k)
A data type supporting these operations is called a dictionary.
Other interesting operations:
MINIMUM(S), MAXIMUM(S)
SUCCESSOR(S, x) (given a pointer to x), PREDECESSOR(S, x)
UNION(S_1, S_2)
Elementary data structures
You have studied these in the Data Structures course.
Stack operations:
STACK-EMPTY(S)
PUSH(S, x)
POP(S)
Implement by array or linked list.
Queue operations:
ENQUEUE(Q, x)
DEQUEUE(Q)
Implement by circular array or linked list (with a link to the end).
Linked lists
Singly linked, doubly linked, circular.
LIST-SEARCH(L, k)
LIST-INSERT(L, x) (x is a record)
LIST-DELETE(L, x)
In the implementation, a sentinel element nil[L] between the head and the tail simplifies boundary tests in the code.
Dynamic storage allocation
This underlies a lot of data structure implementations.
You must either explicitly implement freeing unused objects of various sizes and returning the memory space used by them to free lists of objects of various sizes, or use some kind of garbage collection algorithm.
The present course just assumes that you have seen (or will see) some implementations in a Data Structures course (or elsewhere).
Representing trees
Binary tree data type, at least the following functions:
PARENT[x], LEFT[x], RIGHT[x].
We can represent all three functions explicitly by pointers.
Rooted trees with unbounded branching: One possible
representation via a binary tree is the left-child, right-sibling
representation.
Other representations : Heaps, only parent pointers, and so on.
Hash tables
Problem A very large universe U of possible items, each with a different key. The number n of actual items that may eventually occur is much smaller.
Solution idea Hash function h(k), hash table T[0..m−1]. Key k hashes to hash value (bucket) h(k).
New problem Collisions.
Resolution Chaining, open addressing, and so on.
Uniform hashing assumption Assumes that items arrive randomly. Search then takes Θ(1 + n/m) steps on average, since the average list length is n/m.
Hash functions
What do we need? The hash function should spread the (hopefully
randomly incoming) elements of the universe as uniformly as
possible over the table, to minimize the chance of collisions.
Keys into natural numbers It is easier to work with numbers than
with words, so we translate words into numbers.
For example, as radix 128 integers, possibly adding up these
integers for different segments of the word.
Randomization: universal hashing
To guarantee the uniform hashing assumption, instead of assuming that items arrive randomly, we choose a random hash function h(·, r), where r is a parameter chosen randomly from some set H.
Definition
The family h(·, ·) is universal if for all x ≠ y ∈ U we have
P[h(x, r) = h(y, r)] ≤ 1/m.
If the values h(x, r) and h(y, r) are pairwise independent (and uniform), then this probability is exactly 1/m (the converse is not always true). Thus, from the point of view of collisions, universality is at least as good as pairwise independence.
An example universal hash function
We assume that our table size m is a prime number. (Not too restrictive: it is known that for every m there is a prime between m and 2m.) Let d > 0 be an integer dimension. We break up our key x into a sequence
x = ⟨x_1, x_2, …, x_d⟩, x_i < m.
Fix the random coefficients r_i < m, i = 1, …, d+1; therefore |H| = m^(d+1). Let
h(x, r) = (r_1·x_1 + ⋯ + r_d·x_d + r_{d+1}) mod m.
Let us show that all values h(x, r) are pairwise independent; therefore our random hash function is universal. We will prove that h(x, r) takes each of its m possible values with the same probability, independently of how we fix h(y, r).
Every value A = h(x, r) appears exactly m^d times. Indeed, no matter how we choose r_1, …, r_d, one can choose r_{d+1} uniquely to make h(x, r) = A.
Now take some y ≠ x, say x_1 ≠ y_1. Fix h(x, r) = A, h(y, r) = B. Choose r_2, …, r_d arbitrarily, and consider
Δ = x_1 − y_1, W = r_2·(x_2 − y_2) + ⋯ + r_d·(x_d − y_d),
A − B ≡ Δ·r_1 + W (mod m).
The last equation has exactly one solution r_1 = Δ^(−1)·(A − B − W), where Δ^(−1) is the multiplicative inverse of Δ modulo the prime m. Given r_1, …, r_d there is a unique r_{d+1} with h(y, r) = B. So every pair of values A, B appears exactly m^(d−1) times. This proves the independence.
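A small Python sketch of this hash family; the prime table size, the function names, and the radix-128 key splitting are my own illustrative choices.

import random

M = 2_147_483_647                 # a prime table size (2^31 - 1, a Mersenne prime)

def random_coeffs(d):
    """Draw r_1, ..., r_{d+1} uniformly from {0, ..., M-1}."""
    return [random.randrange(M) for _ in range(d + 1)]

def h(x, r):
    """h(x, r) = (r_1 x_1 + ... + r_d x_d + r_{d+1}) mod M for a key x = (x_1, ..., x_d)."""
    total = r[-1]
    for xi, ri in zip(x, r[:-1]):
        total += ri * xi
    return total % M

def key_to_digits(s, d=4):
    """Split a string key into d numbers < M, treating segments as radix-128 integers."""
    digits = [0] * d
    for i, ch in enumerate(s):
        digits[i % d] = (digits[i % d] * 128 + ord(ch)) % M
    return tuple(digits)

r = random_coeffs(4)
print(h(key_to_digits("alice"), r), h(key_to_digits("bob"), r))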
Optional: perfect hashing
If we know the set S of n keys in advance, it is also possible to hash without any collisions, into tables of total size 6n, using just 2 probes. (Method of Fredman, Komlós and Szemerédi.)
Observations:
1. If m > n² then there is a vector c = (c_1, …, c_{d+1}) such that the hash function h(·, c) has no collisions on S. Indeed, for each pair x ≠ y, the probability of a collision is ≤ 1/m. There are n(n−1)/2 such pairs, so the probability of having a collision on even one pair is at most n(n−1)/(2m) < 1/2. Therefore there is a vector c such that h(·, c) has no collisions on S.
2. Now assume that m ≥ 3n. For each vector r and position z, let C_z(r) be the number of keys hashing to position z. We will show later that there is a vector b with ∑_z C_z(b)² < 3n.
Using the above observations, here is a 2-step hashing scheme.
Using vector b, create a table U_z of size C_z(b)² at each position z of our table T of size 3n where there are collisions. For each table U_z, use a hash function h(·, c(z)) to hash x to a unique position in U_z. This is possible by observation 1 above.
Now, first use h(x, b) to hash key x to a position z in table T. If there were collisions, then T at position z will hold the key c(z) of the second hash function h(x, c(z)), which hashes x to a unique position in table U_z.
Total size: 3n for table T, and < 3n for the union of all nonempty tables U_z.
Square sizes of chains
It remains to show that for some value b of r, we have ∑_z C_z(b)² < 3n.
Fix position z. For item i, let X_i = 1 if h(i, r) = z and 0 otherwise. Then C_z = ∑_{i=1}^{n} X_i. The variables X_i are pairwise independent, with P[X_i = 1] = 1/m. Because of the independence, E[X_i·X_j] = E X_i · E X_j for i ≠ j, allowing us to write
C_z² = ∑_i X_i² + ∑_{i≠j} X_i·X_j,
E C_z² = ∑_i E X_i² + ∑_{i≠j} E X_i · E X_j = n/m + n(n−1)/m²,
∑_z E C_z² = n·(1 + (n−1)/m) < n·(1 + 1/3).
Since this is the average over all vectors r, there is obviously some vector b with ∑_z C_z(b)² < 3n.
Open addressing
Hashing must sometimes be very fast, for example in caches. In this case, even the pointer operations of the chaining collision resolution are considered slow.
Open addressing puts the collided elements into the same table, at a new position computed very fast. Looking for the new position is called probing. We need a whole probe sequence h(k, 0), h(k, 1), ….
In open addressing schemes, deletion is problematic, though there are several marking methods.
It is not easy to analyze open addressing, since the uniform hashing assumption is too strong.
Linear probing h(k, i) = (h′(k) + i) mod m. There is a problem of primary clustering: long runs of collided elements build up.
Quadratic probing tries to correct this: h(k, i) = (h′(k) + c_1·i + c_2·i²) mod m. There is, however, a problem of secondary clustering, since keys with the same initial position h′(k) follow identical probe sequences.
Double hashing solves these issues:
h(k, i) = (h_1(k) + i·h_2(k)) mod m,
where h_2(k) is relatively prime to m.
Example: h_1(k) = k mod m with m a prime, h_2(k) = 1 + (k mod (m−1)).
Search trees
Ordered left-to-right, items placed into the tree nodes (not only in the leaves).
Inorder tree walk.
Search, min, max, successor, predecessor, insert, delete. All in O(h) steps.
Searches
Iterative or recursive TREE-SEARCH(x, k), where x is (a pointer to) the root.
Minimum and maximum are easy.
TREE-SUCCESSOR(x)
If the right subtree of x is nonempty, then take its minimum.
Else go up: return the first node at which you had to step up to the right (if it exists).
Binary search tree
Insertion and deletion
Insertion always into a leaf.
Deletion TREE-DELETE(T, z) is more complex. We use the procedure
TRANSPLANT(T, u, v),
which replaces the subtree starting at u in T with the tree rooted at v. Warning:
Before, the tree of v need not be part of T.
After, the pointer u still points to the record u (along with any subtree hanging from it).
Algorithm 10.1: TREE-DELETE(T, z)
Delete node z from tree T.
1 if z.left = nil then TRANSPLANT(T, z, z.right)
2 else if z.right = nil then TRANSPLANT(T, z, z.left)
3 else
4   y ← TREE-MINIMUM(z.right) // y has no left child.
5   if y.parent ≠ z then
      // Splice it out from between its parent and its right child.
6     TRANSPLANT(T, y, y.right) // Does not lose the original y.
7     y.right ← z.right; y.right.parent ← y
8   TRANSPLANT(T, z, y)
9   y.left ← z.left; y.left.parent ← y
Algorithm 10.2: BS-TREE-SPLIT(T, k)
Split T around key k into two trees.
1 (L, R) ← new trees
2 if T is empty then L ← empty; R ← empty
3 else if k < T.root.key then
4   R ← T
5   (L, R′) ← BS-TREE-SPLIT(T.left, k)
6   R.left ← R′
7 else
8   L ← T
9   (L′, R) ← BS-TREE-SPLIT(T.right, k)
10  L.right ← L′
11 return (L, R)
Randomized binary search trees
It can be shown (see the book) that if n keys are inserted into a binary search tree in random order, then the expected height of the tree is only O(log n).
You cannot expect that keys will be inserted in random order. But it is possible (Martínez-Roura) to randomize the insertion and deletion operations in a way that will still guarantee the same result.
Here is the idea of randomized insertion into a binary tree of size n.
With probability 1/(n+1), insert the key as the root, and split the tree around it.
With probability 1 − 1/(n+1), randomized-insert recursively into the left or right subtree, as needed.
There are corresponding deletion and join operations.
Algorithm 10.3: RAND-BS-TREE-INSERT(T, k)
Randomized insert into a binary search tree.
1 n ← T.size
2 r ← RANDOM(0, n)
3 if r = n then // Insert at the root.
4   (L, R) ← BS-TREE-SPLIT(T, k)
5   T ← new tree; T.key ← k
6   (T.left, T.right) ← (L, R)
7 else if k < T.root.key then RAND-BS-TREE-INSERT(T.left, k)
8 else RAND-BS-TREE-INSERT(T.right, k)
Balanced trees
The cost of the most common operations on search trees (search, insert, delete) is O(h), where h is the height.
We can claim h = O(log n) for size n if the tree is balanced.
But keeping a tree completely balanced (say, a binary tree as complete as possible) is as costly as maintaining a sorted array.
Idea: keep the tree only nearly balanced! The idea has several implementations: AVL trees, B-trees, red-black trees. We will only see B-trees.
B-trees
Search trees with the following properties, with a parameter t ≥ 2.
The records are kept in the tree nodes, sorted, between the edges going to the subtrees.
Every path from the root to a leaf has the same length.
Every internal node has at least t−1 and at most 2t−1 records (t to 2t children).
Total number of nodes n ≥ 1 + t + ⋯ + t^h = (t^(h+1) − 1)/(t − 1), showing h = O(log_t n).
Maintaining a B-tree
Insertion Insert into the appropriate node. Split if necessary. If this requires a split in the parent, too, split the parent.
There is an implementation with only one, downward, pass. It makes sure that in each subtree into which we insert, the root is not full: it has fewer than 2t−1 keys. To achieve this, we split proactively.
Deletion More complex, but also doable with one, downward, pass. Make sure that when we delete from a subtree, its root has more than t−1 (the minimum number of) keys. To achieve this, merge proactively.
Insertion
Algorithm 10.4: B-TREE-SPLIT-CHILD(x, i)
Split the ith child c_i of the non-full node x; before the split, this child has 2t−1 keys. No surprises.
1 z ← new node; y ← x.c_i
2 z.n ← t−1; z.leaf ← y.leaf // If y is a leaf, so is z.
3 for j = 1 to t−1 do z.key_j ← y.key_{j+t}
4 if not y.leaf then for j = 1 to t do z.c_j ← y.c_{j+t}
5 y.n ← t−1 // Now z contains the second half of y.
6 for j = x.n+1 downto i+1 do x.c_{j+1} ← x.c_j
7 x.c_{i+1} ← z // Now z is a child of x.
8 for j = x.n downto i do x.key_{j+1} ← x.key_j
9 x.key_i ← y.key_t // The middle key of y is inserted into x.
10 x.n++
Algorithm 10.5: B-TREE-INSERT-NONFULL(x, k)
Insert into a subtree at a non-full node.
1 i ← x.n
2 if x.leaf then // Just insert the key into the leaf.
3   while i ≥ 1 and k < x.key_i do x.key_{i+1} ← x.key_i; i−−
4   x.key_{i+1} ← k; x.n++
5 else
6   while i ≥ 1 and k < x.key_i do i−−
7   i++ // Found the child to insert into.
8   if x.c_i.n = 2t−1 then
9     B-TREE-SPLIT-CHILD(x, i)
10    if k > x.key_i then i++
11  B-TREE-INSERT-NONFULL(x.c_i, k)
Algorithm 10.6: B-TREE-INSERT(T, k)
1 r ← T.root
2 if r.n < 2t−1 then B-TREE-INSERT-NONFULL(r, k)
3 else // Add a new root above r before splitting it.
4   s ← new node; s.leaf ← false; s.n ← 0
5   s.c_1 ← r; T.root ← s
6   B-TREE-SPLIT-CHILD(s, 1)
7   B-TREE-INSERT-NONFULL(s, k)
Deletion
The deletion algorithm is more complex. See the book for an
outline, and try to implement it in pseudocode as an exercise.
Augmenting data structures
Dynamic order statistics
Cache in each node the size of the subtree.
Retrieving an element with a given rank (selection).
Determining the rank of a given element.
Maintaining subtree sizes.
Dynamic programming
When there is a recursive algorithm based on subproblems, and the total number of subproblems is not too great, it is possible to cache (memoize) all solutions in a table.
Fibonacci numbers.
Binomial coefficients.
(In both cases, the algorithm is by far not as good as computing via the known formulas.)
Longest common subsequence
The diff program in Unix: what does it mean to say that we find the places where two files differ (including insertions and deletions)? Or, what does it mean to keep the common parts?
Let it mean the longest subsequence present in both:
X = a b c b d a b
Y = b d c a b a b a
      b c b a
Running through all subsequences would take exponential time. There is a faster solution, recognizing that we only want to find some longest subsequence.
Let c[i, j] be the length of the longest common subsequence of the prefixes X[1..i] and Y[1..j]. Recursion:
c[i, j] = 0                             if i = 0 or j = 0,
c[i, j] = c[i−1, j−1] + 1               if i, j > 0 and x_i = y_j,
c[i, j] = max(c[i, j−1], c[i−1, j])     otherwise.
We are not computing a function C(X, Y, i, j) by naive recursion, but collect the values c[i, j] as they are computed, in the array c: C(X, Y, i, j, c) checks whether c[i, j] is defined. If yes, it just returns c[i, j]; else it uses the above recursion, and assigns c[i, j] before returning the value.
Non-recursive implementation
Compute the table c[i, j] bottom-up.
Also, store the value b[i, j] = "←", "↑" or "↖" depending on whether the optimum is c[i, j−1], c[i−1, j] or c[i−1, j−1] + 1. (We also allow the combined value "←↑" in case c[i, j] = c[i−1, j] = c[i, j−1], though this is not important.)
Find the longest common subsequence by walking backwards along the arrows.

  j     1  2  3  4  5  6
  i     b  d  c  a  b  a
1 a     0  0  0  1  1  1
2 b     1  1  1  1  2  2
3 c     1  1  2  2  2  2
4 b     1  1  2  2  3  3
5 d     1  2  2  2  3  3
6 a     1  2  2  3  3  4
7 b     1  2  2  3  4  4
Algorithm 11.1: LCS-LENGTH(X, Y)
1 m ← X.length; n ← Y.length
2 b[1..m, 1..n], c[0..m, 0..n] ← new tables
3 for i = 1 to m do c[i, 0] ← 0
4 for j = 1 to n do c[0, j] ← 0
5 for i = 1 to m do
6   for j = 1 to n do
7     if x_i = y_j then
8       c[i, j] ← c[i−1, j−1] + 1
9       b[i, j] ← "↖"
10    else
11      c[i, j] ← max(c[i, j−1], c[i−1, j]); b[i, j] ← ""
12      if c[i−1, j] = c[i, j] then b[i, j] ← "↑"
13      if c[i, j−1] = c[i, j] then b[i, j] ← b[i, j] + "←"
14 return c, b
Algorithm 11.2: LCS-PRINT(b, X, i, j)
Print the longest common subsequence of X[1..i] and Y[1..j] using the table b computed above.
1 if i = 0 or j = 0 then return
2 if b[i, j] contains "↖" then LCS-PRINT(b, X, i−1, j−1); print x_i
3 else if b[i, j] contains "↑" then LCS-PRINT(b, X, i−1, j)
4 else LCS-PRINT(b, X, i, j−1)
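A compact Python version of the bottom-up table computation and the backward walk; instead of storing arrows explicitly it re-derives the direction from the table, and it returns the subsequence rather than printing it.

def lcs(X, Y):
    """Return one longest common subsequence of the strings X and Y."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]      # c[i][j]: LCS length of X[:i], Y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i-1] == Y[j-1]:
                c[i][j] = c[i-1][j-1] + 1
            else:
                c[i][j] = max(c[i][j-1], c[i-1][j])
    out, i, j = [], m, n                           # walk backwards along the arrows
    while i > 0 and j > 0:
        if X[i-1] == Y[j-1]:
            out.append(X[i-1]); i -= 1; j -= 1
        elif c[i-1][j] >= c[i][j-1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("abcbdab", "bdcaba"))   # "bcba", length 4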
The knapsack problem
Given: volumes b ≥ a_1, …, a_n > 0, and integer values w_1, …, w_n > 0.
maximize w_1·x_1 + ⋯ + w_n·x_n
subject to a_1·x_1 + ⋯ + a_n·x_n ≤ b,
x_i ∈ {0, 1}, i = 1, …, n.
In other words, find a subset i_1 < ⋯ < i_k of the set of items 1, …, n (by choosing which x_i = 1) such that
the sum of their volumes a_{i_1} + ⋯ + a_{i_k} is at most the volume b of our knapsack,
the sum of their values w_{i_1} + ⋯ + w_{i_k} is maximal.
Special cases
Subset sum problem Find i_1, …, i_k with a_{i_1} + ⋯ + a_{i_k} = b. Obtained by setting w_i = a_i. Now if there is a solution with value b, we are done.
Partition problem Given numbers a_1, …, a_n, find i_1, …, i_k such that a_{i_1} + ⋯ + a_{i_k} is as close as possible to (a_1 + ⋯ + a_n)/2.
Solve by dynamic programming. For 1 ≤ k ≤ n, let
m_k(p) = min{ a_1·x_1 + ⋯ + a_k·x_k : w_1·x_1 + ⋯ + w_k·x_k = p }.
If the set is empty, the minimum is ∞. Let w = w_1 + ⋯ + w_n. The array m_{k+1}(0), …, m_{k+1}(w) can be computed from m_k(0), …, m_k(w):
If w_{k+1} > p then m_{k+1}(p) = m_k(p) (item k+1 cannot be used).
Otherwise, decide whether to use item k+1:
m_{k+1}(p) = min{ m_k(p), a_{k+1} + m_k(p − w_{k+1}) }.
The optimum is max{ p : m_n(p) ≤ b }.
Complexity: O(nw) steps (counting additions as single steps).
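A short Python sketch of this table computation; m[p] plays the role of m_k(p) for the current k, with infinity marking the empty set, and the inner loop runs downwards so each item is used at most once.

import math

def knapsack(a, w, b):
    """0/1 knapsack: volumes a, integer values w, capacity b.
    Returns the largest total value p with m_n(p) <= b."""
    W = sum(w)
    m = [math.inf] * (W + 1)          # m_0(p): only p = 0 is achievable, with volume 0
    m[0] = 0
    for ak, wk in zip(a, w):          # step from m_k to m_{k+1}
        for p in range(W, wk - 1, -1):
            m[p] = min(m[p], ak + m[p - wk])
    return max(p for p in range(W + 1) if m[p] <= b)

a = [2, 3, 4, 5]     # volumes
w = [3, 4, 5, 6]     # values
print(knapsack(a, w, 9))   # 12: volumes 2 + 3 + 4 fit into 9, value 3 + 4 + 5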
Why is this not a polynomial algorithm?
What if we want the exact equation a_{i_1} + ⋯ + a_{i_k} = b? Assume that the a_i and b are also integers.
For 1 ≤ k ≤ n, let
A_k(p) = { a_1·x_1 + ⋯ + a_k·x_k : w_1·x_1 + ⋯ + w_k·x_k = p } ∩ {1, …, b}.
The array A_{k+1}(0), …, A_{k+1}(w) can be computed from A_k(0), …, A_k(w):
If w_{k+1} > p then A_{k+1}(p) = A_k(p).
Otherwise,
A_{k+1}(p) = A_k(p) ∪ ((a_{k+1} + A_k(p − w_{k+1})) ∩ {1, …, b}),
where x + {u_1, …, u_q} = {x + u_1, …, x + u_q}.
Now the optimum is max{ p : b ∈ A_n(p) }.
Complexity: O(nwb).
Example: let {a_1, …, a_5} = {2, 3, 5, 7, 11}, w_i = 1, b = 13. The sets A_k(p):

k \ p   1                  2                         3          4
1       {2}                ∅                         ∅          ∅
2       {2, 3}             {5}                       ∅          ∅
3       {2, 3, 5}          {5, 7, 8}                 {10}       ∅
4       {2, 3, 5, 7}       {5, 7, 8, 9, 10, 12}      {10, 12}   ∅
5       {2, 3, 5, 7, 11}   {5, 7, 8, 9, 10, 12, 13}  {10, 12}   ∅

max{ p : 13 ∈ A_5(p) } = 2.
Application: the money changer problem. Produce the sum b using the smallest number of coins of denominations a_1, …, a_n (at most one of each).
Greedy algorithms
Activity selection
There is no general recipe. In general, greedy algorithms assume that
there is an objective function f(x_1, …, x_n) to optimize that depends on some choices x_1, …, x_n (say, we need to maximize f(·)),
there is a way to estimate, roughly, the contribution of each choice x_i to the final value, but without taking into account how our choice will constrain the later choices.
The algorithm is greedy if it still makes the choice with the best contribution.
The greedy algorithm is frequently not the best, but sometimes it is.
Example: activity selection. Given: activities
[s_i, f_i)
with starting time s_i and finishing time f_i. Goal: to perform the largest number of activities.
Greedy algorithm: sort by f_i. Repeatedly choose the activity with the smallest f_i compatible with the ones already chosen. Rationale: the smallest f_i restricts our later choices least.
Example
(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10), (8, 11), (8, 12), (2, 13), (12, 14).
Chosen: (1, 4), (5, 7), (8, 11), (12, 14).
Is this correct? Yes, by design, since we always choose an activity compatible with the previous ones.
Is this best? By induction, it is sufficient to see that in the first step, the greedy choice is best possible. And it is, since if an activity is left available after some other choice, it is also left available after the greedy choice.
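A few lines of Python for the greedy rule above: sort by finishing time, then take every activity that starts no earlier than the last chosen finish (assuming half-open intervals [s, f)).

def select_activities(activities):
    """Greedy activity selection: activities are (start, finish) pairs, intervals [s, f)."""
    chosen = []
    last_finish = float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):   # sort by finishing time
        if s >= last_finish:                               # compatible with the chosen ones
            chosen.append((s, f))
            last_finish = f
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 8), (5, 9), (6, 10),
        (8, 11), (8, 12), (2, 13), (12, 14)]
print(select_activities(acts))   # [(1, 4), (5, 7), (8, 11), (12, 14)]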
Knapsack
The knapsack problem, considered above under dynamic
programming, also has a natural greedy algorithm. It is easy to
see that this is not optimal. (Exercise.)
But there is a version of this problem, called the fractional
knapsack problem, in which we are allowed to take any fraction
of an item. The greedy algorithm is optimal in this case.
(Exercise.)
Exhaustive search
Maximum independent set
There are cases of algorithmic problems where nothing much better is known than trying out all possibilities.
Example: maximum independent set in a graph. Let G = (V, E) be an undirected graph on the set of nodes V, with set of edges E.
A set S of vertices is independent if no two vertices u, v ∈ S are connected by an edge: {u, v} ∉ E.
A maximal independent set is one that cannot be extended to a larger independent set.
A maximum independent set is an independent set of maximum size.
(Figure: the black point forms by itself a maximal independent set, the white ones a maximum independent set.)
It is easy to find a maximal independent set, but it is very hard to find a maximum independent set. This is what we are interested in.
Let N_G(x) denote the set of neighbors of point x in graph G, including x itself.
Let us fix a vertex x. There are two kinds of independent vertex sets S: those that contain x, and those that do not.
If x ∉ S then S is an independent subset of the graph G \ {x} (of course, the edges adjacent to x are also removed from G).
If x ∈ S then S does not intersect N_G(x), so S \ {x} is an independent subset of G \ N_G(x).
This gives rise to a simple recursive search algorithm.
Algorithm 13.1: MAX-INDEP(G)
Returns a maximum independent set of the graph G = (V, E). We write V = V(G).
1 if G has no edges then return V(G)
2 else
3   x ← a vertex of G
4   S_1 ← MAX-INDEP(G \ {x})
5   S_2 ← {x} ∪ MAX-INDEP(G \ N_G(x))
6   return the larger of S_1, S_2
We can improve on the simple search by noticing that some searches cannot lead to an improvement. For example, in the path graph
1 — 2 — 3 — 4
if we have already found the set {1, 3}, then since the possibility 1 ∈ S, 3 ∉ S cannot give us an independent set of size > 2 anymore, we need not explore it. There is also no need to explore any possibilities after 2 ∈ S. We formalize this as follows:
MAX-INDEP-LB(G, b)
is an algorithm that returns an independent subset of G if it can return one of size > b; otherwise it returns ∅.
Algorithm 13.2: MAX-INDEP-LB(G, b)
Returns a maximum independent set of the graph G if there is one of size > b; otherwise returns ∅.
1 if |V(G)| ≤ b then return ∅
2 else if G has no edges then return V(G)
3 else
4   x ← a vertex of V
5   S_1 ← MAX-INDEP-LB(G \ {x}, b)
6   b′ ← max(b, |S_1|)
7   S_2 ← {x} ∪ MAX-INDEP-LB(G \ N_G(x), b′ − 1)
8   S ← the larger of S_1, S_2
9   if |S| > b then return S else return ∅
We will call the algorithm as MAX-INDEP-LB(G, 0).
This kind of algorithm is called a branch-and-bound algorithm:
maintaining a bound sometimes helps pruning the branches.
Let us trace this algorithm on the path graph 1 — 2 — 3 — 4, writing k for the decision k ∈ S and ¬k for k ∉ S:
(¬1): (G \ {1}, 0)
  (¬1, ¬2): (G \ {1, 2}, 0)
    (¬1, ¬2, ¬3): (G \ {1, 2, 3}, 0) returns {4}, b′ = 1
    (¬1, ¬2, 3): (G \ {1, 2, 3, 4}, 1 − 1 = 0) returns ∅,
  so (¬1, ¬2): (G \ {1, 2}, 0) returns {4}, b′ = 1.
  (¬1, 2): (G \ {1, 2, 3}, 1 − 1 = 0) returns {4},
so (¬1): (G \ {1}, 0) returns {2, 4}, b′ = 2.
(1): (G \ {1, 2}, 2 − 1 = 1)
  (1, ¬2, ¬3): (G \ {1, 2, 3}, 1) returns ∅ by the bound, b′ = 1.
  (1, ¬2, 3): (G \ {1, 2, 3, 4}, 1 − 1 = 0) returns ∅,
so (1): (G \ {1, 2}, 2 − 1 = 1) returns ∅ by the bound,
giving the final return {2, 4}.
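A small Python sketch of MAX-INDEP-LB on vertex sets with an adjacency dictionary; it returns the empty set when it cannot beat the bound b.

def max_indep_lb(vertices, adj, b=0):
    """Return an independent set of size > b among `vertices` if one exists, else set().
    adj maps each vertex to the set of its neighbors."""
    vertices = set(vertices)
    if len(vertices) <= b:
        return set()
    if all(adj[u].isdisjoint(vertices) for u in vertices):    # no edges left
        return vertices
    x = next(u for u in vertices if adj[u] & vertices)         # a vertex with an edge
    s1 = max_indep_lb(vertices - {x}, adj, b)
    b2 = max(b, len(s1))
    s2 = {x} | max_indep_lb(vertices - ({x} | adj[x]), adj, b2 - 1)
    s = s1 if len(s1) >= len(s2) else s2
    return s if len(s) > b else set()

adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}    # the path 1 - 2 - 3 - 4
print(max_indep_lb(adj.keys(), adj))            # {2, 4}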
Graph representation
Graphs G = (V, E): a set V of vertices (or nodes, or points), and a set E of edges (or lines).
Directed or undirected. Loop edges. Parallel edges.
Representations:
Adjacency list good for sparse graphs, works for all kinds of graphs.
Adjacency matrix (vertex-vertex), good for dense graphs. If there are parallel edges, they can be represented by multiplicities.
Incidence matrix (vertex-edge).
Some other graph notions: path (no repeated edges or nodes), walk (repetitions allowed), cycle.
We write u ⇝ v if there is a directed path from u to v.
Breadth-first search
Directed graph G = (V, E). Call some vertex s the source. Breadth-first search finds a shortest path (by number of edges) from the source s to every vertex reachable from s.
Rough description of breadth-first search:
Distance x.d from s gets computed.
Black nodes have been visited.
Gray nodes form the frontier.
White nodes are unvisited.
Algorithm 14.1: Generic graph search
1 paint s gray
2 while there are gray nodes do
3   take a gray node u, paint it black
4   for each white neighbor v of u do
5     v.d ← u.d + 1
6     make v a gray child of u
To turn this into breadth-first search, make the set of gray nodes a queue Q. Now Q is represented in two ways, as a queue and as a subset of V (the gray nodes). The two representations must remain synchronized.
Analysis: O(|V| + |E|) steps, since every link is handled at most once.
Algorithm 14.2: BFS(G, s)
Input: graph G = (V, Adj).
Returns: the tree parents u.π, and the distances u.d.
The queue Q makes this search breadth-first.
1 for u ∈ V \ {s} do
2   u.color ← white; u.d ← ∞; u.π ← nil
3 s.color ← gray; s.d ← 0; Q ← ⟨s⟩
4 while Q ≠ ∅ do
5   u ← Q.head; DEQUEUE(Q)
6   for v ∈ u.Adj do
7     if v.color = white then
8       v.color ← gray; v.d ← u.d + 1; v.π ← u
9       ENQUEUE(Q, v)
10  u.color ← black
In the breadth-first tree defined by π, each path from a node to the root is a shortest path.
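A direct Python version of BFS on an adjacency-list dictionary; it returns the distances and the tree parents (π), using membership in the distance map in place of the colors.

from collections import deque

def bfs(adj, s):
    """Breadth-first search from s. adj[u] is the list of u's out-neighbors.
    Returns (d, parent): shortest edge counts from s, and the BFS-tree parents."""
    d = {s: 0}
    parent = {s: None}
    Q = deque([s])
    while Q:
        u = Q.popleft()
        for v in adj[u]:
            if v not in d:              # white node: first time we see it
                d[v] = d[u] + 1
                parent[v] = u
                Q.append(v)
    return d, parent

adj = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": []}
dist, parent = bfs(adj, "s")
print(dist)     # {'s': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2}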
Shortest path problems
Composite words
Sometimes it takes some thinking to see that a problem is a shortest path problem.
Example: how to break up some composite words?
Personaleinkommensteuerschätzungskommissionsmitglieds-reisekostenrechnungsergänzungsrevisionsfund (Mark Twain)
With a German dictionary, break it into relatively few components.
Graph points all division points of the word, including start and end.
Edges if the word between the points is in the dictionary (maybe without the s at the end).
A path between the start and the end corresponds to a legal breakup. (Note that this graph is acyclic.)
The word breakup problem is an example where the graph is given only implicitly: to find out whether there is an edge between two points, you must make a dictionary lookup.
We may want to minimize those lookups, but minimizing two different objective functions simultaneously (the number of division points and the number of lookups) is generally not possible.
(For minimizing lookups, depth-first search seems better.)
Car racing
(Figure: a racetrack grid with a start and a finish line.)
In each step, the speed vector can change only by 1 in each direction. We have to start and arrive with speed 1, in the vertical direction. There is a graph in which this is a shortest path problem.
Vertices (point, speed vector) pairs (p, v).
Edges between (p_1, v_1) and (p_2, v_2): if p_2 − p_1 = v_1 and |v_2 − v_1| ≤ 1.
Here |(x, y)| = max(|x|, |y|).

Flows
A flow assigns a value f(u, v) to each pair of nodes u, v; it is convenient to define it in both directions, so we simply require
f(v, u) = −f(u, v).
The total flow entering a non-end node equals the total flow leaving it:
∑_v f(u, v) = 0 for all u ∈ V \ {s, t}.
Each edge (u, v) imposes a capacity c(u, v) ≥ 0 on the flow: f(u, v) ≤ c(u, v). (We may have c(u, v) ≠ c(v, u).)
(Figure: an example network on nodes s, v_1, v_2, v_3, v_4, t, with flows and capacities written as f/c on the edges, for example 9/16, 12/12, 12/20, 3/13, 3/4.)
The notation f/c means flow f along an edge with capacity c.
Our goal is to maximize the value of the flow f, that is
|f| = ∑_v f(s, v) = ∑_v f(v, t).
Application to matching
(Figure: a bipartite graph between a source s and a sink t.)
n points on the left, n on the right. Edges directed to the right, with unit capacity; s is joined to every left point, and every right point is joined to t.
Perfect matching ⟹ flow of value n.
Flow of value n ⟹ perfect matching? Not always (the flow may be fractional), but fortunately (as will be seen) there is always an integer maximum flow.
Ford-Fulkerson theory
Idea for increasing the flow: do it along some path.
(Figure: the example network with its current flow.)
Increment the flow along the path s–v_2–v_4–t by 4. Then increment it along the path s–v_2–v_4–v_3–t by 6. Now we are stuck: there is no more direct way to augment.
(Figures: the flow after these two augmentations.)
New idea: increase the flow (by 1) on (s, v_1), decrease it on (v_1, v_2), and increase it on the edges of the path v_2–v_4–v_3–t.
Residual network, augmenting path
Generalization: given a flow f, the residual capacity is
c_f(u, v) = c(u, v) − f(u, v).
This makes sense even with negative f(u, v). The residual network may have edges (with positive residual capacity) that were not in the original network. An augmenting path is an s–t path in the residual network (with some flow along it). (How does it change the original flow?)
(Figures: the current flow network and its residual network.)
Residual capacity of a path: the minimum of the residual capacities of its edges (in the example, 1).
We obtained:
(Figure: the resulting flow after augmenting along this path by 1.)
This cannot be improved: look at the cut (S, T) with T = {v_3, t}.
Method of augmenting paths
Keep increasing the flow along augmenting paths.
Questions
1. If it terminates, did we reach a maximum flow?
2. Can we make it terminate?
3. How many augmentations may be needed? Is this a polynomial-time algorithm?
None of these questions is trivial. The technique of cuts takes us closer to the solution.
Cuts
A cut (S, T) is a partition of V with s ∈ S, t ∈ T.
Net flow f(S, T) = ∑_{u∈S, v∈T} f(u, v).
Capacity c(S, T) = ∑_{u∈S, v∈T} c(u, v). Obviously, f(S, T) ≤ c(S, T).
(Figure: the example network with a cut (S, T) marked.)
In this example, c(S, T) = 26, f(S, T) = 12.
Lemma
f(S, T) = |f|, the value of the flow.
Corollary
The value of any flow is bounded by the capacity of any cut.
Theorem (Max-flow, min-cut)
The following properties of a flow f are equivalent.
1. |f| = c(S, T) for some cut (S, T).
2. f is a maximum flow.
3. There are no augmenting paths for f.
The equivalence of the first two statements says that the size of the maximum flow is equal to the size of the minimum cut.
Proof: 1 ⟹ 2 and 2 ⟹ 3 are obvious. The crucial step is 3 ⟹ 1.
Given f with no augmenting paths, we construct (S, T): let S be the set of nodes reachable from s in the residual network G_f.
Proof of the marriage theorem
Using max-flow min-cut. Assume there is no perfect matching in the bipartite graph G = (A ∪ B, E), with |A| = |B| = n. We find a bottleneck H ⊆ A with |N(H)| < |H|.
Build the flow network over A ∪ B ∪ {s, t} as before. Since there is no perfect matching, the maximum flow has size < n. So there is a cut (S, T), s ∈ S, t ∈ T, with c(S, T) < n.
(Figure: the cut; H = S ∩ A, and the neighbors of H are split between S and T.)
Let H = S ∩ A, H′ = N(H). Then
n > c(S, T) = c({s}, T) + c(S ∩ A, T ∩ B) + c(S, {t})
≥ (n − |H|) + |H′ ∩ T| + |H′ ∩ S|
= n − |H| + |H′|,
so |H| > |H′|.
Polynomial-time algorithm
Does the Ford-Fulkerson algorithm terminate? Not necessarily (if the capacities are not integers), unless we choose the augmenting paths carefully.
Integer capacities: it always terminates, but may take exponentially long.
Network derived from the bipartite matching problem: each capacity is 1, so we terminate in polynomial time.
Edmonds-Karp: use breadth-first search for the augmenting paths. Why should this terminate?
Lemma
In the Edmonds-Karp algorithm, the shortest-path distance δ_f(s, v) increases monotonically with each augmentation.
Proof: Let δ_f(s, u) be the distance of u from s in G_f, and let f′ be the augmented flow. Assume, by contradiction, that δ_{f′}(s, v) < δ_f(s, v) for some v; let v be the one among these with the smallest δ_{f′}(s, v). Let u → v be a shortest path edge in G_{f′}, and d := δ_f(s, u) (= δ_{f′}(s, u)); then δ_{f′}(s, v) = d + 1.
The edge (u, v) must be new in G_{f′}; so (v, u) was a shortest path edge in G_f, giving δ_f(s, v) = d − 1. But δ_{f′}(s, v) = d + 1 then contradicts δ_{f′}(s, v) < δ_f(s, v).
An edge is said to be critical when it has just been filled to capacity.
Lemma
Between every two times that an edge (u, v) is critical, δ_f(s, u) increases by at least 2.
Proof: When it is critical, δ_f(s, v) = δ_f(s, u) + 1. Then it disappears from the residual network until some later flow f′. When it reappears, (v, u) has just been critical, so
δ_{f′}(s, u) = δ_{f′}(s, v) + 1 ≥ δ_f(s, v) + 1 = δ_f(s, u) + 2.
Corollary
We have a polynomial algorithm.
Proof: Just bound the number of possible augmentations, noticing that each augmentation makes some edge critical.
Let n = |V|, m = |E|. Each edge becomes critical at most n/2 times. Therefore there are at most m·n/2 augmentations. Each augmentation may take O(m) steps: the total bound is O(m²·n).
There are better algorithms: Goldberg's push-relabel algorithm, also given in your book, achieves O(n³).
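A compact Python sketch of the Edmonds-Karp method (BFS augmenting paths) on a capacity dictionary; residual capacities are kept implicitly as capacity minus flow, with skew-symmetric flow values.

from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow from s to t. cap[(u, v)] is the capacity of edge (u, v).
    Returns the flow value."""
    nodes = {u for e in cap for u in e}
    adj = {u: set() for u in nodes}
    for (u, v) in cap:
        adj[u].add(v)
        adj[v].add(u)                            # residual edges go both ways
    flow = {e: 0 for e in cap}
    flow.update({(v, u): 0 for (u, v) in cap if (v, u) not in cap})
    cap = dict(cap)
    for e in flow:
        cap.setdefault(e, 0)

    def residual(u, v):
        return cap[(u, v)] - flow[(u, v)]

    value = 0
    while True:
        parent = {s: None}
        Q = deque([s])
        while Q and t not in parent:             # BFS for a shortest augmenting path
            u = Q.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    Q.append(v)
        if t not in parent:
            return value
        path, v = [], t                          # reconstruct the path, find the bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual(u, v) for (u, v) in path)
        for (u, v) in path:                      # augment along the path
            flow[(u, v)] += bottleneck
            flow[(v, u)] -= bottleneck           # skew symmetry
        value += bottleneck

cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "t"): 2, ("b", "t"): 3}
print(edmonds_karp(cap, "s", "t"))    # 5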
Approximations
The setting
In some cases we cannot hope to find a complete solution to some algorithmic problem in reasonable time: we have to be satisfied with an approximate solution. This makes sense in the case of optimization problems.
Suppose we have to maximize a positive function. For an objective function f(x, y) for x, y ∈ {0, 1}^n, for input x, the optimum is
M(x) = max_y f(x, y),
where y runs over the possible values, which we will call witnesses.
For λ > 0, an algorithm A(x) is a λ-approximation if
f(x, A(x)) > M(x)/λ.
For minimization problems, with minimum m(x), we require
f(x, A(x)) < λ·m(x).
Greedy algorithms
Try local improvements as long as you can.
Example (Maximum cut)
Graph G = (V, E), cut S ⊆ V, S̄ = V \ S. Find a cut S that maximizes the number of edges in the cut:
|{ {u, v} ∈ E : u ∈ S, v ∈ S̄ }|.
Greedy algorithm:
Repeat: find a point on one side of the cut whose moving to the other side increases the cut size.
Theorem
If you cannot improve any more with this algorithm, then you are within a factor 2 of the optimum.
Proof. The unimprovable cut contains at least half of all edges.
Randomized algorithms
Generalize maximum cut to the case where the edges e have weights w_e, that is, maximize
∑_{u∈S, v∈S̄} w_{uv}.
Question The greedy algorithm brings us within a factor 2 of the optimum also in the weighted case. But does it take a polynomial number of steps?
New idea: decide each "v ∈ S?" question by tossing a coin. The expected weight of the cut is (1/2)·∑_e w_e, since each edge is in the cut with probability 1/2.
Less greed is sometimes better
What does the greedy algorithm for vertex cover say?
The following, less greedy algorithm has a better performance guarantee.
Approx_Vertex_Cover(G):
1 C ← ∅
2 E′ ← E[G]
3 while E′ ≠ ∅ do
4   let (u, v) be an arbitrary edge in E′
5   C ← C ∪ {u, v}
6   remove from E′ every edge incident on either u or v
7 return C
Theorem
Approx_Vertex_Cover has a ratio bound of 2.
Proof. The points of C are the endpoints of a matching. Any optimum vertex cover must contain at least one endpoint of each matching edge, hence at least half of them.
How bad is greedy vertex cover?
Greedy algorithm: repeat the following, as long as you can: find a vertex with the largest possible degree; delete it from the graph, along with all the edges incident to it.
We will show a graph where this algorithm does not even have a constant approximation ratio.
A = {a_1, …, a_n}.
B_2 = {b_{2,1}, …, b_{2,⌊n/2⌋}}. Here, b_{2,1} is connected to a_1, a_2, further b_{2,2} to a_3, a_4, and so on.
B_3 = {b_{3,1}, …, b_{3,⌊n/3⌋}}. Here b_{3,1} is connected to a_1, a_2, a_3, further b_{3,2} to a_4, a_5, a_6, and so on.
And so on.
B_n consists of a single point connected to all elements of A.
The n elements of A cover all edges. The greedy algorithm:
Each element of A has degree ≤ n − 1, so the algorithm chooses the point of B_n first. Then B_{n−1}, then B_{n−2}, and so on, down to B_2.
The total number of points chosen is
⌊n/2⌋ + ⌊n/3⌋ + ⋯ + ⌊n/n⌋ ≥ n·(1/2 + ⋯ + 1/n) − n ≈ n·ln n.
The set-covering problem
Can it be worse than this? No, even for a more general problem: set cover.
Given (X, F): a set X and a family F of subsets of X, find a minimum-size subfamily of F covering X.
Example: smallest committee with people covering all skills.
Generalization: set S has weight w(S) > 0. We want a minimum-weight set cover.
The algorithm Greedy_Set_Cover(X, F):
1 U ← X
2 C ← ∅
3 while U ≠ ∅ do
4   select an S ∈ F that maximizes |S ∩ U|/w(S)
5   U ← U \ S
6   C ← C ∪ {S}
7 return C
Analysis
Let H(n) = 1 + 1/2 + ⋯ + 1/n (≈ ln n).
Theorem
Greedy_Set_Cover has a ratio bound of max_S H(|S|).
See the proof in a graduate course (or in the book).
Application to vertex cover.
Approximation scheme
Knapsack problem
Recall the knapsack problem:
Given: integers b ≥ a_1, …, a_n, and integer weights w_1, …, w_n.
maximize ∑_j w_j·x_j
subject to ∑_j a_j·x_j ≤ b,
x_i ∈ {0, 1}, i = 1, …, n.
Idea for approximation: break each w_i into a smaller number of big chunks, and use dynamic programming. Let r > 0, w′_i = ⌊w_i/r⌋.
maximize ∑_j w′_j·x_j
subject to ∑_j a_j·x_j ≤ b,
x_i ∈ {0, 1}, i = 1, …, n.
When r is large then our numbers are small, so the dynamic programming solution is not so expensive. But we still get a good approximation to the optimum. (See the detailed estimate in a graduate course.)
NP problems
Examples
shortest vs. longest simple paths
compositeness
subset sum
perfect matching
graph isomorphism
3-coloring
Ultrasound test of sex of fetus.
Decision problems vs. optimization problems vs. search problems.
Example
Given a graph G.
Decision Given k, does G have an independent subset of size k?
Optimization What is the size of the largest independent set?
Search Given k, give an independent set of size k (if there is one).
Optimization+search Give a maximum size independent set.
Abstract problems Instance. Solution.
Encoding We can pretend that all our problems are about strings.
Polynomial-time verification
Example
Hamiltonian cycles.
An NP problem is defined with the help of a polynomial-time function
V(x, w)
with yes/no values that verifies, for a given input x and witness (certificate) w, whether w is indeed a witness for x.
Decision problem: is there a witness (say, a Hamiltonian cycle)?
Search problem: find a witness!
The same decision problem may belong to very different verification functions (search problems):
Example
Input: a positive integer x that is not of the form z^k for any k > 1.
Decision: is x composite?
Witness kind 1: a positive y ∉ {1, x} dividing x.
Witness kind 2: distinct u, v, w with u² ≡ v² ≡ w² (mod x).
The following theorem will probably be proved for you in a cryptography or algebraic algorithms course:
Theorem
If x is not of the form z^k for k > 1, then it is composite if and only if there is a b such that the equation t² ≡ b (mod x) has more than two solutions in t.
Reduction, completeness
Reduction of problem A_1 to problem A_2 (given by verification functions V_1, V_2) with a reduction (translation) function τ:
∃w V_1(x, w) ⟺ ∃u V_2(τ(x), u).
We will say A_1 ≤_P A_2 if there is a polynomially computable τ.
Examples
Reducing matching to maximum flow.
Vertex cover to maximum independent set: C is a vertex cover iff V \ C is an independent set.
NP-hardness.
NP-completeness.
Satisfiability of Boolean formulas
Boolean formulas.
F(x, y, z) = (x ∧ y ∧ ¬z) ∨ (x ∧ ¬y ∧ z) ∨ (¬x ∧ y ∧ z)
is 1 if and only if exactly two of the variables x, y, z are 1. So it is satisfiable; say, the assignment x = 1, y = 1, z = 0 satisfies it.
G(x, y, z, t) = (x → y) ∧ (y → z) ∧ (z → t) ∧ x ∧ ¬t
is not satisfiable. Indeed, x → y means the same as ¬x ∨ y, that is x ≤ y. So G requires x ≤ y ≤ z ≤ t, but also x = 1, t = 0.
Example expression of some combinatorial problem via Boolean formulas: 2-coloring.
Theorem
Satisfiability is NP-complete.
Theorem
INDEPENDENT SET is NP-complete.
Reducing SAT to it.
Most NP-complete problems are proved such by a chain of reductions from INDEPENDENT SET, or directly from SAT.
Example
Independent set ≤_P vertex cover ≤_P set cover.
The End