
Chapter 1

Randomized algorithms
A randomized algorithm flips coins during its execution to determine what
to do next. When considering a randomized algorithm, we usually care
about its expected worst-case performance, which is the average amount
of time it takes on the worst input of a given size. This average is computed
over all the possible outcomes of the coin flips during the execution of the
algorithm. We may also ask for a high-probability bound, showing that
the algorithm doesn't consume too many resources most of the time.
In studying randomized algorithms, we consider pretty much the same issues as for deterministic algorithms: how to design a good randomized algorithm, and how to prove that it works within given time or error bounds. The
main difference is that it is often easier to design a randomized algorithm
(randomness turns out to be a good substitute for cleverness more often than
one might expect) but harder to analyze it. So much of what one does is
develop good techniques for analyzing the often very complex random processes that arise in the execution of an algorithm. Fortunately, in doing so we
can often use techniques already developed by probabilists and statisticians
for analyzing less overtly algorithmic processes.
Formally, we think of a randomized algorithm as a machine M that
computes M (x, r), where x is the problem input and r is the sequence of
random bits. Our machine model is the usual random-access machine or
RAM model, where we have a memory space that is typically polynomial in
the size of the input n, and in constant time we can read a memory location,
write a memory location, or perform arithmetic operations on integers of
up to O(log n) bits.¹ In this model, we may find it easier to think of the
random bits as supplied as needed by some subroutine, where generating a
random integer of size O(log n) takes constant time; the justification for this
assumption is that it takes constant time to read the next O(log n)-sized
value from the random input.

¹This model is unrealistic in several ways: the assumption that we can perform
arithmetic on O(log n)-bit quantities in constant time omits at least a factor of
Ω(log log n) for addition and probably more for multiplication in any realistic
implementation, while the assumption that we can address n^c distinct locations in
memory in anything less than n^{c/3} time in the worst case requires exceeding the
speed of light. But for reasonably small n, this gives a pretty good approximation
of the performance of real computers, which do in fact perform arithmetic and access
memory in a fixed amount of time, although with fixed bounds on the size of both
arithmetic operands and memory.
Because the number of these various constant-time operations, and thus
the running time for the algorithm as a whole, may depend on the random
bits, it is now a random variable: a function on points in some probability
space. The probability space consists of all possible sequences r, each
of which is assigned a probability Pr[r] (typically 2^{-|r|}), and the running
time for M on some input x is generally given as an expected value²
E_r[time(M(x, r))], where for any X,

\[ \mathrm{E}_r[X] = \sum_r X(r) \Pr[r]. \tag{1.0.1} \]

We can now quote the performance of M in terms of this expected value:
where we would say that a deterministic algorithm runs in time O(f(n)),
where n = |x| is the size of the input, we instead say that our randomized
algorithm runs in expected time O(f(n)), which means that
E_r[time(M(x, r))] = O(f(|x|)) for all inputs x.
This is distinct from traditional worst-case analysis, where there is no
r and no expectation, and average-case analysis, where there is again no
r and the value reported is not a maximum but an expectation over some
distribution on x. The following trivial example shows the distinction.

1.1  A trivial example

Let us consider a variant of the classic card game Find the Lady.³ Here a
dealer puts down three cards and we want to find a specific card among the
three (say, the Queen of Spades). In this version, the dealer will let us turn
over as many cards as we want, but each card we turn over will cost us a
dollar. If we find the queen, we get two dollars.
²We'll see more details of these and other concepts from probability theory in
Chapters 2 and 3.

³Often called Three-card Monte, but that will get confusing when we talk about
Monte Carlo algorithms later.

Because this is a toy example, we assume that the dealer is not cheating.
The author is not responsible for the results of applying the analysis below
to real-world games where this assumption does not hold.
A deterministic algorithm tries the cards in some fixed order. A clever
dealer will place the Queen in the last place we look: so the worst-case payoff
is a loss of a dollar.
In the average case, if we assume that all three positions are equally
likely to hold the target card, then we turn over one card a third of the
time, two cards a third of the time, and all three cards a third of the time;
this gives an expected payoff of

\[ \frac{1}{3}(1 + 2 + 3) - 2 = 0. \]
But this requires making assumptions about the distribution of the cards,
and we are no longer doing worst-case analysis.
The trick to randomized algorithms is that we can obtain the same
expected payoff even in the worst case by supplying the randomness ourselves.
If we turn over cards in a random order, then the same analysis for
the average case tells us we get the same expected payoff of 0, but unlike
the average case, we get this expected performance no matter where the
dealer places the cards.
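
To see the difference concretely, here is a toy Python simulation of the game; the payoff function and the two player strategies are illustrative names of ours, and the dealer is assumed (as above) not to cheat.

    import random

    def payoff(order, queen_pos):
        # Turn over cards in the given order, paying $1 per card,
        # and collect $2 when we find the queen.
        cost = 0
        for pos in order:
            cost += 1
            if pos == queen_pos:
                return 2 - cost
        return -cost

    def random_order():
        order = [0, 1, 2]
        random.shuffle(order)   # supply our own randomness
        return order

    trials = 100_000

    # Deterministic player: always inspects 0, 1, 2, so a clever dealer hides
    # the queen in position 2 and the player loses a dollar every time.
    det = sum(payoff([0, 1, 2], 2) for _ in range(trials)) / trials

    # Randomized player: no fixed placement of the queen is worse than another,
    # so take the worst over all three placements.
    rand_worst = min(
        sum(payoff(random_order(), q) for _ in range(trials)) / trials
        for q in (0, 1, 2)
    )

    print(f"deterministic player, worst case: {det:+.3f}")        # -1.000
    print(f"randomized player, worst case:    {rand_worst:+.3f}")  # close to 0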

1.2  Verifying polynomial identities

A less trivial example is described in [MU05, §1.1]. Here we are given two
products of polynomials and we want to determine if they compute the same
function. For example, we might have

\[ p(x) = (x - 7)(x - 3)(x - 1)(x + 2)(2x + 5) \]
\[ q(x) = 2x^5 - 13x^4 - 21x^3 + 127x^2 + 121x - 210 \]
These expressions both represent degree-5 polynomials, and it is not
obvious without multiplying out the factors of p whether they are equal or
not. Multiplying out all the factors of p may take as much as O(d²) time,
where d is the degree, if we assume integer multiplication takes unit time
and do it the straightforward way.⁴ We can do better than this using
randomization.
The trick is that evaluating p(x) and q(x) takes only O(d) integer operations, and we will find p(x) = q(x) only if either (a) p(x) and q(x) are really
the same polynomial, or (b) x is a root of p(x) − q(x). Since p(x) − q(x) has
degree at most d, it can't have more than d roots. So if we choose x uniformly
at random from some much larger space, it's likely that we will not
get a root. Indeed, evaluating p(11) = 112320 and q(11) = 120306 quickly
shows that p and q are not in fact the same.

⁴It can be faster if we do something sneaky like use fast Fourier transforms [SS71].
This is an example of a Monte Carlo algorithm, which is an algorithm
that runs in a fixed amount of time but only gives the right answer some of
the time. (In this case, with probability 1 − d/r, where r is the size of the
range of random integers we choose x from.) Monte Carlo algorithms have
the unnerving property of not indicating when their results are incorrect,
but we can make the probability of error as small as we like by running
the algorithm repeatedly. For this particular algorithm, the probability of
error after k trials is only (d/r)^k, which means that for fixed d/r we need
O(log(1/ε)) iterations to get the error bound down to any given ε. If we are
really paranoid, we could get the error down to 0 by testing d + 1 distinct
values, but now the cost is as high as multiplying out p again.
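
Here is a minimal Python sketch of the test, using the example polynomials above; the helper probably_equal and the range r = 10**6 are arbitrary illustrative choices.

    import random

    def p(x):
        return (x - 7) * (x - 3) * (x - 1) * (x + 2) * (2 * x + 5)

    def q(x):
        return 2*x**5 - 13*x**4 - 21*x**3 + 127*x**2 + 121*x - 210

    def probably_equal(f, g, d, r=10**6, trials=10):
        # Monte Carlo identity test: a mismatch at any point proves f != g;
        # agreement at all points is wrong with probability at most (d/r)**trials.
        for _ in range(trials):
            x = random.randrange(r)
            if f(x) != g(x):
                return False      # found a witness: definitely not equal
        return True               # probably equal

    print(p(11), q(11))               # 112320 120306, a witness that p != q
    print(probably_equal(p, q, d=5))  # False with overwhelming probability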
The error for this algorithm is one-sided: if we find a witness to the
fact that p ≠ q, we are done, but if we don't, then all we know is that we
haven't found a witness yet. We also have the property that if we check
enough possible witnesses, we are guaranteed to find one.
A similar property holds in the classic Miller-Rabin primality test,
a randomized algorithm for determining whether a large integer is prime
or not.⁵ The original version, due to Gary Miller [Mil76], showed that, as
in polynomial identity testing, it might be sufficient to pick a particular
set of deterministic candidate witnesses. Unfortunately, this result depends
on the truth of the extended Riemann hypothesis, a notoriously difficult
open problem in number theory. Michael Rabin [Rab80] demonstrated that
choosing random witnesses was enough, if we were willing to accept a small
probability of incorrectly identifying a composite number as prime.
For many years it was open whether it was possible to test primality
deterministically in polynomial time without unproven number-theoretic
assumptions, and the randomized Miller-Rabin algorithm was one of the most
widely used randomized algorithms for which no good deterministic
alternative was known. Finally, Agrawal et al. [AKS04] demonstrated how to test
primality deterministically using a different technique, although the cost of
their algorithm is high enough that Miller-Rabin is still used in practice.

⁵We will not describe this algorithm here.

1.3  Randomized QuickSort

The QuickSort algorithm [Hoa61a] works as follows. For simplicity, we
assume that no two elements of the array being sorted are equal.
• If the array has more than one element:
  – Pick a pivot p uniformly at random from the elements of the array.
  – Split the array into A₁ and A₂, where A₁ contains all elements < p
    and A₂ contains all elements > p.
  – Sort A₁ and A₂ recursively and return the sequence A₁, p, A₂.
• Otherwise return the array.
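
A direct transcription of this procedure into Python might look like the following sketch; the comparison counter is only there to connect with the analysis below, and we assume (as stated above) that all elements are distinct.

    import random

    def quicksort(a, counter=None):
        # Randomized QuickSort on a list of distinct elements.  If `counter`
        # is a one-element list, it accumulates the number of comparisons.
        if len(a) <= 1:
            return list(a)
        p = random.choice(a)              # pivot chosen uniformly at random
        if counter is not None:
            counter[0] += len(a) - 1      # n - 1 comparisons to split around p
        a1 = [x for x in a if x < p]      # elements less than the pivot
        a2 = [x for x in a if x > p]      # elements greater than the pivot
        return quicksort(a1, counter) + [p] + quicksort(a2, counter)

    data = random.sample(range(10**6), 100)   # 100 distinct keys
    count = [0]
    assert quicksort(data, count) == sorted(data)
    print("comparisons used:", count[0])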
The splitting step takes exactly n − 1 comparisons, since we have to
check each non-pivot against the pivot. We assume all other costs are
dominated by the cost of comparisons. How many comparisons does randomized
QuickSort do on average?
There are two ways to solve this problem: the dumb way and the smart
way. We'll do it the dumb way now and save the smart way for §1.3.2.

1.3.1  Brute force method: solve the recurrence

Let T(n) be the expected number of comparisons done on an array of n
elements. We have T(0) = T(1) = 0 and for larger n,

\[ T(n) = (n-1) + \frac{1}{n} \sum_{k=0}^{n-1} \bigl(T(k) + T(n-1-k)\bigr). \tag{1.3.1} \]

Why? Because there are n equally likely choices for our pivot (hence the
1/n), and for each choice we pay n − 1 comparisons for the split plus an
expected T(k) + T(n − 1 − k) for the recursive calls, where k is the number
of elements that land in A₁. Formally, we are using here the
law of total probability, which says that for any random variable X and
partition of the probability space into events B₁ . . . Bₙ, then

\[ \mathrm{E}[X] = \sum_i \Pr[B_i] \, \mathrm{E}[X \mid B_i], \]

where

\[ \mathrm{E}[X \mid B_i] = \frac{1}{\Pr[B_i]} \sum_{\omega \in B_i} X(\omega) \Pr[\omega] \]

is the conditional expectation of X conditioned on Bᵢ, which we can
think of as just the average value of X if we know that Bᵢ occurred. (See
§2.3.1 for more details.)
So now we just have to solve this ugly recurrence. We can reasonably
guess that when n ≥ 1, T(n) ≤ an log n for some constant a. Clearly this
holds for n = 1. Now apply induction on larger n to get
\begin{align*}
T(n) &= (n-1) + \frac{1}{n} \sum_{k=0}^{n-1} \bigl(T(k) + T(n-1-k)\bigr) \\
     &= (n-1) + \frac{2}{n} \sum_{k=0}^{n-1} T(k) \\
     &= (n-1) + \frac{2}{n} \sum_{k=1}^{n-1} T(k) \\
     &\le (n-1) + \frac{2}{n} \sum_{k=1}^{n-1} a k \log k \\
     &\le (n-1) + \frac{2a}{n} \int_{k=1}^{n} k \log k \, dk \\
     &= (n-1) + \frac{2a}{n} \left( \frac{n^2 \log n}{2} - \frac{n^2}{4} + \frac{1}{4} \right) \\
     &= (n-1) + a n \log n - \frac{an}{2} + \frac{a}{2n}.
\end{align*}

If we squint carefully at this recurrence for a while we notice that setting
a = 2 makes this less than or equal to an log n, since the remaining terms
become (n − 1) − n + 1/n = 1/n − 1, which is never positive for n ≥ 1. We can
thus confidently conclude that T(n) ≤ 2n log n (for n ≥ 1).
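
If the inductive guess feels slippery, we can also tabulate the recurrence numerically and compare it against the claimed bound; the following Python check (using the natural logarithm) is reassurance, not proof.

    import math

    # T(0) = T(1) = 0; T(n) = (n-1) + (1/n) * sum_{k=0}^{n-1} (T(k) + T(n-1-k)).
    T = [0.0, 0.0]
    for n in range(2, 1001):
        T.append((n - 1) + sum(T[k] + T[n - 1 - k] for k in range(n)) / n)

    for n in (10, 100, 1000):
        print(n, round(T[n], 1), round(2 * n * math.log(n), 1))
    # The tabulated values stay comfortably below 2 n ln n.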

1.3.2  Clever method: use linearity of expectation

Alternatively, we can use linearity of expectation (which we'll discuss
further in §3.4.1) to compute the expected number of comparisons used by
randomized QuickSort.
Imagine we use the following method for choosing pivots: we generate a
random permutation of all the elements in the array, and when asked to sort
some subarray A′, we use as pivot the first element of A′ that appears in our
list. Since each element is equally likely to be first, this is equivalent to the
actual algorithm. Pretend that we are always sorting the numbers 1 . . . n
and define for each pair of elements i < j the indicator variable Xᵢⱼ to be
1 if i is compared to j at some point during the execution of the algorithm
and 0 otherwise. Amazingly, we can actually compute the probability of
this event (and thus E[Xᵢⱼ]): the only time i and j are compared is if one
of them is chosen as a pivot before they are split up into different arrays.
How do they get split up into different arrays? If some intermediate element
k is chosen as pivot first, i.e., if some k with i < k < j appears in the
permutation before both i and j. Occurrences of other elements don't affect
the outcome, so we can concentrate on the restriction of the permutations
to just the numbers i through j, and we win if this restricted permutation
starts with either i or j. This event occurs with probability 2/(j − i + 1), so
we have E[Xᵢⱼ] = 2/(j − i + 1). Summing over all pairs i < j gives:

\begin{align*}
\mathrm{E}\left[\sum_{i<j} X_{ij}\right]
  &= \sum_{i<j} \mathrm{E}[X_{ij}] \\
  &= \sum_{i<j} \frac{2}{j-i+1} \\
  &= \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
  &= \sum_{i=2}^{n} \sum_{k=2}^{i} \frac{2}{k} \\
  &= \sum_{k=2}^{n} \frac{2(n-k+1)}{k} \\
  &= \sum_{k=2}^{n} \left( \frac{2(n+1)}{k} - 2 \right) \\
  &= 2(n+1)(H_n - 1) - 2(n-1) \\
  &= 2(n+1)H_n - 4n.
\end{align*}
Here Hₙ = \sum_{i=1}^{n} 1/i is the n-th harmonic number, equal to
ln n + γ + O(n⁻¹), where γ ≈ 0.5772 is the Euler-Mascheroni constant
(whose exact value is not known in closed form!). For asymptotic purposes
we only need Hₙ = Θ(log n).
For the first step we are taking advantage of the fact that linearity of
expectation doesn't care about the variables not being independent. The
rest is just algebra.
This is pretty close to the bound of 2n log n we computed using the
recurrence in §1.3.1. Given that we now know the exact answer, we could
in principle go back and use it to solve the recurrence exactly.⁶
Which way is better? Solving the recurrence requires less probabilistic
handwaving (a more polite term might be "insight") but more grinding
out inequalities, which is a pretty common trade-off. Since I am personally
not very clever I would try the brute-force approach first. But it's worth
knowing about better methods so you can try them in other situations.
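
As a further sanity check on the algebra, we can compare 2(n + 1)Hₙ − 4n against the average number of comparisons observed over many runs of a comparison-counting QuickSort; the sketch below is self-contained and the parameters are arbitrary choices.

    import random

    def comparisons(a):
        # Number of comparisons made by randomized QuickSort on distinct keys.
        if len(a) <= 1:
            return 0
        p = random.choice(a)
        left = [x for x in a if x < p]
        right = [x for x in a if x > p]
        return (len(a) - 1) + comparisons(left) + comparisons(right)

    n, trials = 200, 2000
    avg = sum(comparisons(random.sample(range(10**6), n)) for _ in range(trials)) / trials
    H_n = sum(1 / i for i in range(1, n + 1))
    print(f"empirical average: {avg:.1f}")
    print(f"2(n+1)H_n - 4n:    {2 * (n + 1) * H_n - 4 * n:.1f}")  # should agree closely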

1.4  Classifying randomized algorithms by their goals

1.4.1  Las Vegas vs Monte Carlo

One difference between QuickSort and polynomial equality testing is that
QuickSort always succeeds, but may run for longer than you expect; while
the polynomial equality tester always runs in a fixed amount of time, but
may produce the wrong answer. These are examples of two classes of
randomized algorithms, which were originally named by László Babai [Bab79]:⁷

• A Las Vegas algorithm fails with some probability, but we can tell
  when it fails. In particular, we can run it again until it succeeds, which
  means that we can eventually succeed with probability 1 (but with a
  potentially unbounded running time). Alternatively, we can think of
  a Las Vegas algorithm as an algorithm that runs for an unpredictable
  amount of time but always succeeds (we can convert such an algorithm
  back into one that runs in bounded time by declaring that it fails if it
  runs too long, a condition we can detect). QuickSort is an example
  of a Las Vegas algorithm.

• A Monte Carlo algorithm fails with some probability, but we can't
  tell when it fails. If the algorithm produces a yes/no answer and
  the failure probability is significantly less than 1/2, we can reduce the
  probability of failure by running it many times and taking a majority of
  the answers. The polynomial equality-testing algorithm is an example
  of a Monte Carlo algorithm.
⁶We won't.

⁷To be more precise, Babai defined Monte Carlo algorithms based on the properties of
Monte Carlo simulation, a technique dating back to the Manhattan Project. The term
Las Vegas algorithm was new.

The heuristic for remembering which class is which is that the names
were chosen to appeal to English speakers: in Las Vegas, the dealer can tell
you whether you've won or lost, but in Monte Carlo, le croupier ne parle
que français, so you have no idea what he's saying.
Generally, we prefer Las Vegas algorithms, because we like knowing when
we have succeeded. But sometimes we have to settle for Monte Carlo
algorithms, which can still be useful if we can get the probability of failure
small enough. For example, any time we try to estimate an average by
sampling (say, inputs to a function we are trying to integrate or political
views of voters we are trying to win over) we are running a Monte Carlo
algorithm: there is always some possibility that our sample is badly
non-representative, but we can't tell if we got a bad sample unless we
already know the answer we are looking for.
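
To illustrate the majority-vote amplification mentioned above, here is a toy Python sketch; the noisy test is invented purely for demonstration.

    import random
    from collections import Counter

    def amplify(monte_carlo, x, runs=101):
        # Run a yes/no Monte Carlo procedure an odd number of times and take
        # the majority answer; if the per-run error is bounded below 1/2, the
        # probability that the majority is wrong drops exponentially in `runs`.
        votes = Counter(monte_carlo(x) for _ in range(runs))
        return votes.most_common(1)[0][0]

    def noisy_is_even(n):
        # Toy Monte Carlo test that answers correctly only 70% of the time.
        truth = (n % 2 == 0)
        return truth if random.random() < 0.7 else not truth

    print(amplify(noisy_is_even, 42))   # True, with overwhelming probability
    print(amplify(noisy_is_even, 43))   # False, with overwhelming probability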

1.4.2  Randomized complexity classes

Las Vegas vs Monte Carlo is the typical distinction made by algorithm designers, but complexity theorists have developed more elaborate classifications. These include algorithms with one-sided failure properties. For
these algorithms, we never get a bogus yes answer but may get a bogus
no answer (or vice versa). This gives us several complexity classes that
act like randomized versions of NP, co-NP, etc.:
• The class R or RP (randomized P) consists of all languages L for
  which a polynomial-time Turing machine M exists such that if x ∈ L,
  then Pr[M(x, r) = 1] ≥ 1/2, and if x ∉ L, then Pr[M(x, r) = 1] = 0.
  In other words, we can find a witness that x ∈ L with constant
  probability. This is the randomized analog of NP (but it's much more
  practical, since with NP the probability of finding a winning witness
  may be exponentially small).
• The class co-R consists of all languages L for which a poly-time Turing
  machine M exists such that if x ∉ L, then Pr[M(x, r) = 1] ≥ 1/2, and
  if x ∈ L, then Pr[M(x, r) = 1] = 0. This is the randomized analog of
  co-NP.
• The class ZPP (zero-error probabilistic P) is defined as RP ∩ co-RP.
  If we run both our RP and co-RP machines for polynomial time, we
  learn the correct classification of x with probability at least 1/2. The
  rest of the time we learn only that we've failed (because both machines
  return 0, telling us nothing). This is the class of (polynomial-time)
  Las Vegas algorithms (see the sketch at the end of this section). The
  reason it is called zero-error is that we can equivalently define it as
  the problems solvable by machines that always output the correct
  answer eventually, but only run in expected polynomial time.
• The class BPP (bounded-error probabilistic P) consists of all languages
  L for which a poly-time Turing machine exists such that if x ∉ L,
  then Pr[M(x, r) = 1] ≤ 1/3, and if x ∈ L, then Pr[M(x, r) = 1] ≥ 2/3.
  These are the (polynomial-time) Monte Carlo algorithms: if our
  machine answers 0 or 1, we can guess whether x ∈ L or not, but we
  can't be sure.
• The class PP (probabilistic P) consists of all languages L for which a
  poly-time Turing machine exists such that if x ∉ L, then
  Pr[M(x, r) = 1] < 1/2, and if x ∈ L, then Pr[M(x, r) = 1] ≥ 1/2. Since
  there is only an exponentially small gap between the two probabilities,
  such algorithms are not really useful in practice; PP is mostly of
  interest to complexity theorists.
Assuming we have a source of random bits, any algorithm in RP, co-RP,
ZPP, or BPP is good enough for practical use. We can usually even
get away with using a pseudorandom number generator, and there are good
reasons to suspect that in fact every one of these classes is equal to P.
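
The ZPP definition can be read as a recipe for a Las Vegas algorithm, sketched below with toy stand-ins for the RP and co-RP machines (the even-number language and both machines are invented purely for illustration).

    import random

    def zpp_decide(rp_machine, co_rp_machine, x):
        # A 1 from the RP machine proves x is in L (it never accepts x not in L);
        # a 1 from the co-RP machine proves x is not in L.  Each round one of
        # them fires with probability at least 1/2, so the expected number of
        # rounds is at most 2, and the answer returned is always correct.
        while True:
            r = random.getrandbits(64)         # fresh random bits each round
            if rp_machine(x, r):
                return True
            if co_rp_machine(x, r):
                return False

    # Toy RP/co-RP pair for L = {even numbers}: each machine "notices" the
    # answer only when its coin (the low bit of r) cooperates.
    def rp_even(x, r):
        return x % 2 == 0 and r % 2 == 0

    def co_rp_even(x, r):
        return x % 2 == 1 and r % 2 == 1

    print(zpp_decide(rp_even, co_rp_even, 10))  # True
    print(zpp_decide(rp_even, co_rp_even, 7))   # False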

1.5  Classifying randomized algorithms by their methods

We can also classify randomized algorithms by how they use their randomness
to solve a problem. Some very broad categories:⁸
• Avoiding worst-case inputs, by hiding the details of the algorithm
  from the adversary. Typically we assume that an adversary supplies
  our input. If the adversary can see what our algorithm is going to
  do (for example, he knows which door we will open first), he can use
  this information against us. By using randomness, we can replace our
  predictable deterministic algorithm by what is effectively a random
  choice of many different deterministic algorithms. Since the adversary
  doesn't know which algorithm we are using, he can't (we hope) pick
  an input that is bad for all of them.

⁸These are largely adapted from the introduction to [MR95].

• Sampling. Here we use randomness to find an example or examples
  of objects that are likely to be typical of the population they are drawn
  from, either to estimate some average value (pretty much the basis of
  all of statistics) or because a typical element is useful in our algorithm
  (for example, when picking the pivot in QuickSort). Randomization
  means that the adversary can't direct us to non-representative samples.
• Hashing. Hashing is the process of assigning a large object x a small
  name h(x) by feeding it to a hash function h. Because the names are
  small, the Pigeonhole Principle implies that many large objects hash to
  the same name (a collision). If we have few objects that we actually
  care about, we can avoid collisions by choosing a hash function that
  happens to map them to different places. Randomization helps here
  by keeping the adversary from choosing the objects after seeing what
  our hash function is.
  Hashing techniques are used both in load balancing (e.g., ensuring
  that most cells in a hash table hold only a few objects) and in
  fingerprinting (e.g., using a cryptographic hash function to record a
  fingerprint of a file, so that we can detect when it has been modified).
• Building random structures. The probabilistic method shows
  the existence of structures with some desired property (often graphs
  with interesting properties, but there are other places where it can be
  used) by showing that a randomly generated structure in some class
  has a nonzero probability of having the property we want. If we can
  beef the probability up to something substantial, we get a randomized
  algorithm for generating these structures.
• Symmetry breaking. In distributed algorithms involving multiple
  processes, progress may be stymied by all the processes trying to
  do the same thing at the same time (this is an obstacle, for example,
  in leader election, where we want only one process to declare itself
  the leader). Randomization can break these deadlocks, as in the
  sketch below.
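
As a toy illustration of randomized symmetry breaking, consider electing a leader by repeated coin flips: identical deterministic processes would behave identically forever, but independent random coins thin the field quickly. The Python sketch below is only a cartoon of the idea.

    import random

    def elect_leader(n):
        # Each round, every surviving process flips a fair coin; if the heads
        # group is a proper, nonempty subset of the survivors, only it stays in.
        # With probability 1 a single process eventually remains, after an
        # expected O(log n) rounds.
        alive = list(range(n))
        rounds = 0
        while len(alive) > 1:
            rounds += 1
            heads = [p for p in alive if random.random() < 0.5]
            if 0 < len(heads) < len(alive):
                alive = heads
        return alive[0], rounds

    leader, rounds = elect_leader(16)
    print(f"process {leader} elected after {rounds} rounds")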
