Randomized algorithms
A randomized algorithm flips coins during its execution to determine what
to do next. When considering a randomized algorithm, we usually care
about its expected worst-case performance, which is the average amount
of time it takes on the worst input of a given size. This average is computed
over all the possible outcomes of the coin flips during the execution of the
algorithm. We may also ask for a high-probability bound, showing that
the algorithm doesn't consume too many resources most of the time.
In studying randomized algorithms, we consider pretty much the same issues as for deterministic algorithms: how to design a good randomized algorithm, and how to prove that it works within given time or error bounds. The
main difference is that it is often easier to design a randomized algorithm (randomness turns out to be a good substitute for cleverness more often than one might expect) but harder to analyze it. So much of what one does is
develop good techniques for analyzing the often very complex random processes that arise in the execution of an algorithm. Fortunately, in doing so we
can often use techniques already developed by probabilists and statisticians
for analyzing less overtly algorithmic processes.
Formally, we think of a randomized algorithm as a machine M that
computes M (x, r), where x is the problem input and r is the sequence of
random bits. Our machine model is the usual random-access machine or
RAM model, where we have a memory space that is typically polynomial in
the size of the input n, and in constant time we can read a memory location,
write a memory location, or perform arithmetic operations on integers of
up to O(log n) bits.1 In this model, we may find it easier to think of the random bits in r as being generated on demand, one coin-flip at a time, as the algorithm runs; this is equivalent to fixing the whole sequence in advance.
1
This model is unrealistic in several ways: the assumption that we can perform arithmetic on O(log n)-bit quantities in constant time omits at least a factor of Ω(log log n) for addition and probably more for multiplication in any realistic implementation, while the assumption that we can address n^c distinct locations in memory in anything less than n^{c/3} time in the worst case requires exceeding the speed of light. But for reasonably small n, this gives a pretty good approximation of the performance of real computers, which do in fact perform arithmetic and access memory in a fixed amount of time, although with fixed bounds on the size of both arithmetic operands and memory.
If X is a random variable determined by the coin-flips r (for example, the running time of M on some fixed input x), then its expectation is
\[ E[X] = \sum_r X(r) \Pr[r]. \tag{1.0.1} \]
1.1  A trivial example
Let us consider a variant of the classic card game Find the Lady.3 Here a
dealer puts down three cards and we want to find a specific card among the
three (say, the Queen of Spades). In this version, the dealer will let us turn
over as many cards as we want, but each card we turn over will cost us a
dollar. If we find the queen, we get two dollars.
2
We'll see more details of these and other concepts from probability theory in Chapters 2 and 3.
3
Often called Three-card Monte, but that will get confusing when we talk about
Monte Carlo algorithms later.
Because this is a toy example, we assume that the dealer is not cheating.
The author is not responsible for the results of applying the analysis below
to real-world games where this assumption does not hold.
A deterministic algorithm tries the cards in some fixed order. A clever
dealer will place the Queen in the last place we look: so the worst-case payoff
is a loss of a dollar.
In the average case, if we assume that all three positions are equally
likely to hold the target card, then we turn over one card a third of the
time, two cards a third of the time, and all three cards a third of the time;
this gives an expected payoff of
\[ \frac{1}{3}(1 + 2 + 3) - 2 = 0. \]
But this requires making assumptions about the distribution of the cards,
and we are no longer doing worst-case analysis.
The trick to randomized algorithms is that we can obtain the same expected payoff even in the worst case by supplying the randomness ourselves. If we turn over cards in a random order, then the same analysis for the average case tells us we get the same expected payoff of 0, but unlike
the average case, we get this expected performance no matter where the
dealer places the cards.
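To make this concrete, here is a small simulation sketch in Python (my own illustration; the function and variable names are not from the text). For any fixed position of the queen, turning the cards over in a uniformly random order yields an average payoff close to 0:

    import random

    def random_order_payoff(queen_position, trials=100_000):
        # Turn cards over in random order, paying $1 per card turned
        # and winning $2 when the queen is found.
        total = 0
        for _ in range(trials):
            order = random.sample(range(3), 3)       # random order of the 3 positions
            cost = order.index(queen_position) + 1   # cards turned before stopping
            total += 2 - cost
        return total / trials

    # Whatever position the dealer chooses, the estimate is close to 0.
    for pos in range(3):
        print(pos, round(random_order_payoff(pos), 3))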
1.2  Verifying polynomial identities

A less trivial example is described in [MU05, §1.1]. Here we are given two
products of polynomials and we want to determine if they compute the same
function. For example, we might have
p(x) = (x - 7)(x - 3)(x - 1)(x + 2)(2x + 5)
q(x) = 2x^5 - 13x^4 - 21x^3 + 127x^2 + 121x - 210.
These expressions both represent degree-5 polynomials, and it is not
obvious without multiplying out the factors of p whether they are equal or
not. Multiplying out all the factors of p may take as much as O(d^2) time for a degree-d polynomial if we assume integer multiplication takes unit time and do it the straightforward way.4 We can do better than this using randomization.
4
It can be faster if we do something sneaky like use fast Fourier transforms [SS71].
The trick is that evaluating p(x) and q(x) takes only O(d) integer operations, and we will find p(x) = q(x) only if either (a) p(x) and q(x) are really the same polynomial, or (b) x is a root of p(x) - q(x). Since p(x) - q(x) has degree at most d, it can't have more than d roots. So if we choose x uniformly at random from some much larger space, it's likely that we will not get a root. Indeed, evaluating p(11) = 112320 and q(11) = 120306 quickly shows that p and q are not in fact the same.
This is an example of a Monte Carlo algorithm, which is an algorithm
that runs in a fixed amount of time but only gives the right answer some of
the time. (In this case, with probability 1 - d/r, where r is the size of the
range of random integers we choose x from.) Monte Carlo algorithms have
the unnerving property of not indicating when their results are incorrect,
but we can make the probability of error as small as we like by running
the algorithm repeatedly. For this particular algorithm, the probability of
error after k trials is only (d/r)^k, which means that for fixed d/r we need O(log(1/ε)) iterations to get the error bound down to any given ε. If we are
really paranoid, we could get the error down to 0 by testing d + 1 distinct
values, but now the cost is as high as multiplying out p again.
The error for this algorithm is one-sided: if we find a witness to the fact that p ≠ q, we are done, but if we don't, then all we know is that we haven't found a witness yet. We also have the property that if we check
enough possible witnesses, we are guaranteed to find one.
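The whole test fits in a few lines of Python; the following is a sketch of my own (the representation of p and q and the choice of range r are illustrative assumptions, not anything fixed by the text):

    import random

    def p(x):
        # p(x) = (x - 7)(x - 3)(x - 1)(x + 2)(2x + 5), kept in factored form
        return (x - 7) * (x - 3) * (x - 1) * (x + 2) * (2 * x + 5)

    def q(x):
        # q(x) = 2x^5 - 13x^4 - 21x^3 + 127x^2 + 121x - 210
        return 2 * x**5 - 13 * x**4 - 21 * x**3 + 127 * x**2 + 121 * x - 210

    def probably_equal(f, g, d=5, trials=10, r=10**6):
        # One-sided Monte Carlo test: a False answer is always correct;
        # a True answer is wrong with probability at most (d/r)**trials.
        for _ in range(trials):
            x = random.randrange(r)    # r should be much larger than d
            if f(x) != g(x):
                return False           # x is a witness that f != g
        return True

    print(p(11), q(11))                # 112320 120306, so p and q differ
    print(probably_equal(p, q))        # almost always False

Evaluating at a single random point already exposes the difference with probability at least 1 - d/r; the trials parameter is the repeated-trial amplification described above.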
A similar property holds in the classic Miller-Rabin primality test, a randomized algorithm for determining whether a large integer is prime or not. The original version, due to Gary Miller [Mil76], showed that, as
in polynomial identity testing, it might be sufficient to pick a particular
set of deterministic candidate witnesses. Unfortunately, this result depends
on the truth of the extended Riemann hypothesis, a notoriously difficult
open problem in number theory. Michael Rabin [Rab80] demonstrated that
choosing random witnesses was enough, if we were willing to accept a small
probability of incorrectly identifying a composite number as prime.
For many years it was open whether it was possible to test primality
deterministically in polynomial time without unproven number-theoretic assumptions, and the randomized Miller-Rabin algorithm was one of the most
widely-used randomized algorithms for which no good deterministic alternative was known. Finally, Agrawal et al. [AKS04] demonstrated how to test
primality deterministically using a different technique, although the cost of
their algorithm is high enough that Miller-Rabin is still used in practice.
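For concreteness, here is a standard textbook rendering of the randomized Miller-Rabin test in Python (a sketch, not code from these notes; the number of trials and the list of small primes used as a shortcut for tiny inputs are my own choices):

    import random

    def miller_rabin(n, trials=20):
        # Returns False only if n is definitely composite; returns True
        # if n survives every trial, so composites slip through only
        # with small probability.
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
            if n % p == 0:
                return n == p
        # Write n - 1 as 2^s * d with d odd.
        d, s = n - 1, 0
        while d % 2 == 0:
            d //= 2
            s += 1
        for _ in range(trials):
            a = random.randrange(2, n - 1)   # random candidate witness
            x = pow(a, d, n)
            if x == 1 or x == n - 1:
                continue
            for _ in range(s - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False                 # a witnesses that n is composite
        return True

    print(miller_rabin(101))    # True: 101 is prime
    print(miller_rabin(1763))   # False (with overwhelming probability): 1763 = 41 * 43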
1.3  Randomized QuickSort

1.3.1  Solving the recurrence
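Randomized QuickSort picks a pivot uniformly at random, splits the remaining elements into two blocks (A1, taken in this sketch to be the elements below the pivot, and the rest above it), and recurses on both blocks. The code below is a minimal sketch of my own, not pseudocode from the notes:

    import random

    def quicksort(a):
        # Randomized QuickSort: the pivot is chosen uniformly at random,
        # so each of the n possible pivot ranks is equally likely.
        if len(a) <= 1:
            return a
        pivot = random.choice(a)
        a1 = [x for x in a if x < pivot]      # elements below the pivot
        a2 = [x for x in a if x > pivot]      # elements above the pivot
        middle = [x for x in a if x == pivot]
        return quicksort(a1) + middle + quicksort(a2)

    print(quicksort([5, 3, 8, 1, 9, 2]))   # [1, 2, 3, 5, 8, 9]

In the analysis we charge n - 1 comparisons for splitting an n-element array around its pivot and assume the elements are distinct.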
The expected number of comparisons T(n) used to sort n elements then satisfies the recurrence
\[ T(n) = (n-1) + \frac{1}{n} \sum_{k=0}^{n-1} \bigl( T(k) + T(n-1-k) \bigr). \tag{1.3.1} \]
Why? Because there are n equally likely choices for our pivot (hence the 1/n), and for each choice the expected cost is T(k) + T(n-1-k), where k is the number of elements that land in A1, the block on one side of the pivot. Formally, we are using here the
law of total probability, which says that for any random variable X and
partition of the probability space into events B_1, ..., B_n, then
\[ E[X] = \sum_i \Pr[B_i] \, E[X \mid B_i], \]
where
\[ E[X \mid B_i] = \frac{1}{\Pr[B_i]} \sum_{\omega \in B_i} X(\omega) \Pr[\omega]. \]
Applying this to the recurrence, and guessing that T(k) ≤ ak log k for all k < n (an induction hypothesis we will verify), we get
\begin{align*}
T(n) &= (n-1) + \frac{1}{n} \sum_{k=0}^{n-1} \bigl( T(k) + T(n-1-k) \bigr) \\
&= (n-1) + \frac{2}{n} \sum_{k=0}^{n-1} T(k) \\
&= (n-1) + \frac{2}{n} \sum_{k=1}^{n-1} T(k) \\
&\le (n-1) + \frac{2}{n} \sum_{k=1}^{n-1} a k \log k \\
&\le (n-1) + \frac{2}{n} \int_{1}^{n} a k \log k \, dk \\
&= (n-1) + \frac{2a}{n} \left( \frac{n^2 \log n}{2} - \frac{n^2}{4} + \frac{1}{4} \right) \\
&= (n-1) + a n \log n - \frac{an}{2} + \frac{a}{2n}.
\end{align*}
For a = 2 (reading log as the natural logarithm, which makes the integral above exact), the leftover terms (n-1) - an/2 + a/(2n) = -1 + 1/n are never positive, so T(n) ≤ 2n log n for all n ≥ 1 and the guess is confirmed.
1.3.2  Using linearity of expectation
An alternative is to compute the expected number of comparisons directly. For i < j, let X_{ij} be the indicator variable for the event that the i-th smallest and the j-th smallest elements are ever compared. This happens exactly when one of them is the first of the j - i + 1 elements with ranks i through j to be chosen as a pivot, which gives E[X_{ij}] = 2/(j - i + 1). By linearity of expectation, the expected total number of comparisons is
\begin{align*}
E\left[ \sum_{i<j} X_{ij} \right] &= \sum_{i<j} E[X_{ij}] \\
&= \sum_{i<j} \frac{2}{j-i+1} \\
&= \sum_{i=1}^{n-1} \sum_{k=2}^{n-i+1} \frac{2}{k} \\
&= \sum_{i=2}^{n} \sum_{k=2}^{i} \frac{2}{k} \\
&= \sum_{k=2}^{n} \frac{2(n-k+1)}{k} \\
&= \sum_{k=2}^{n} \left( \frac{2(n+1)}{k} - 2 \right) \\
&= 2(n+1) \sum_{k=2}^{n} \frac{1}{k} - 2(n-1) \\
&= 2(n+1)(H_n - 1) - 2(n-1) \\
&= 2(n+1) H_n - 4n,
\end{align*}
where H_n = \sum_{k=1}^{n} 1/k is the n-th harmonic number.
The key point is the very first step: linearity of expectation doesn't care about the variables not being independent. The rest is just algebra.
This is pretty close to the bound of 2n log n we computed using the recurrence in §1.3.1. Given that we now know the exact answer, we could in principle go back and use it to solve the recurrence exactly.6
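As a quick numerical check of my own (taking log to be the natural logarithm, as in the integral above), the exact value 2(n + 1)H_n - 4n and the bound 2n log n from the recurrence can be compared directly:

    from math import log

    def harmonic(n):
        return sum(1 / k for k in range(1, n + 1))

    for n in (10, 100, 1000, 10000):
        exact = 2 * (n + 1) * harmonic(n) - 4 * n   # exact expected comparisons
        bound = 2 * n * log(n)                      # bound from the recurrence
        print(n, round(exact, 1), round(bound, 1))

The two agree up to lower-order terms, as the formulas suggest.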
Which way is better? Solving the recurrence requires less probabilistic
"handwaving" (a more polite term might be "insight") but more grinding out inequalities, which is a pretty common trade-off. Since I am personally not very clever I would try the brute-force approach first. But it's worth
knowing about better methods so you can try them in other situations.
6
We won't.

1.4  Classifying randomized algorithms by their goals

1.4.1  Las Vegas vs Monte Carlo

The standard distinction, due to Babai,7 is between Las Vegas algorithms, which can always tell whether they have succeeded (though they may fail, or run long, with some probability), and Monte Carlo algorithms, which run in a fixed amount of time but give a correct answer only with some probability, with no indication of when they are wrong.
7
To be more precise, Babai defined Monte Carlo algorithms based on the properties of Monte Carlo simulation, a technique dating back to the Manhattan Project. The term "Las Vegas algorithm" was new.
The heuristic for remembering which class is which is that the names
were chosen to appeal to English speakers: in Las Vegas, the dealer can tell
you whether you've won or lost, but in Monte Carlo, le croupier ne parle que français (the croupier speaks only French), so you have no idea what he's saying.
Generally, we prefer Las Vegas algorithms, because we like knowing when
we have succeeded. But sometimes we have to settle for Monte Carlo algorithms, which can still be useful if we can get the probability of failure small
enough. For example, any time we try to estimate an average by sampling
(say, inputs to a function we are trying to integrate or political views of
voters we are trying to win over) we are running a Monte Carlo algorithm:
there is always some possibility that our sample is badly non-representative,
but we can't tell if we got a bad sample unless we already know the answer
we are looking for.
1.4.2  Randomized complexity classes
Las Vegas vs Monte Carlo is the typical distinction made by algorithm designers, but complexity theorists have developed more elaborate classifications. These include algorithms with one-sided failure properties. For
these algorithms, we never get a bogus "yes" answer but may get a bogus "no" answer (or vice versa). This gives us several complexity classes that
act like randomized versions of NP, co-NP, etc.:
• The class R or RP (randomized P) consists of all languages L for which a polynomial-time Turing machine M exists such that if x ∈ L, then Pr[M(x, r) = 1] ≥ 1/2, and if x ∉ L, then Pr[M(x, r) = 1] = 0. In other words, we can find a witness that x ∈ L with constant probability. This is the randomized analog of NP (but it's much more practical, since with NP the probability of finding a winning witness may be exponentially small).
• The class co-R consists of all languages L for which a poly-time Turing machine M exists such that if x ∉ L, then Pr[M(x, r) = 1] ≥ 1/2, and if x ∈ L, then Pr[M(x, r) = 1] = 0. This is the randomized analog of co-NP.
• The class ZPP (zero-error probabilistic P) is defined as RP ∩ co-RP. If we run both our RP and co-RP machines for polynomial time, we learn the correct classification of x with probability at least 1/2. The rest of the time we learn only that we've failed (because both machines return 0, telling us nothing). This is the class of (polynomial-time) Las Vegas algorithms: repeating both machines until one of them produces a witness gives an algorithm that is always correct and runs in expected polynomial time, as sketched below.
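Schematically, the combination looks like the following sketch, where rp_machine and co_rp_machine are hypothetical stand-ins for the two polynomial-time machines and each call gets fresh random bits:

    import random

    def zpp_decide(x, rp_machine, co_rp_machine):
        # Repeat both one-sided tests until one of them produces a witness.
        # Each round succeeds with probability at least 1/2, so the expected
        # number of rounds is at most 2, and the answer is never wrong.
        while True:
            r1 = random.getrandbits(64)   # stand-in for the machine's random tape
            r2 = random.getrandbits(64)
            if rp_machine(x, r1) == 1:
                return True               # witness that x is in L
            if co_rp_machine(x, r2) == 1:
                return False              # witness that x is not in L
            # both said 0: no information this round, try again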
1.5  Classifying randomized algorithms by their methods

We can also classify randomized algorithms by how they use their randomness to solve a problem. Some very broad categories:
• Avoiding worst-case inputs, by hiding the details of the algorithm from the adversary. Typically we assume that an adversary supplies our input. If the adversary can see what our algorithm is going to do (for example, he knows which door we will open first), he can use this information against us. By using randomness, we can replace our predictable deterministic algorithm by what is effectively a random choice of many different deterministic algorithms. Since the adversary doesn't know which algorithm we are using, he can't (we hope) pick an input that is bad for all of them.