Lower Bounds by Probabilistic Arguments

(Extended Abstract)

Andrew C. Yao

Computer Science Department


Stanford University
Stanford, California 94305
Abstract

The purpose of this paper is to resolve several open problems in the current literature on Boolean circuits, communication complexity, and hashing functions. These lower bound results share the common feature that their proofs utilize probabilistic arguments in an essential way.

Specifically, we prove that, to compute the majority function of n Boolean variables, the size of any depth-3 monotone circuit must be greater than 2^(n^ε), and the size of any width-2 branching program must have super-polynomial growth. We also show that, for the problem of deciding whether i < j for two n-bit integers i and j, the probabilistic ε-error one-way communication complexity is of order Θ(n), while the two-way ε-error complexity is O((log n)^2). We will also prove that, to compute i·j mod p for an n-bit prime p, the probabilistic ε-error two-way communication complexity is of order Θ(n). Finally, we prove a conjecture of Ullman that uniform hashing is asymptotically optimal in its expected retrieval cost among open address hashing schemes.

1. Introduction.

By definition, probabilistic methods involve the estimation of probabilities for certain events to happen. They are often used in Combinatorics to demonstrate the existence of objects with certain properties without giving an explicit description of such objects (see Erdős and Spencer [ES]). In the study of computational complexity, the probabilistic method did not use to be a major tool, but has become increasingly important in the recent literature (e.g. Fredman [F]; Furst, Saxe, and Sipser [FSS1]; Duris, Galil, Paul, and Reischuk [DGPR]; etc.) for proving both upper bounds and lower bounds. In a way, such a development is particularly welcome for proving lower bounds, as techniques in this area are notoriously lacking at the present time. The main purpose of this paper is to prove a number of lower bound results which resolve several open problems in Boolean circuits, communication complexity, and hashing functions. It also may serve the purpose of further enhancing the interest in probabilistic methods, since a variety of such arguments are used.

For some of these results, the effectiveness of probabilistic methods does not seem to be a priori obvious, as the questions themselves do not involve the evaluation of probabilities. (In the topic of "probabilistic communication complexity", the probabilistic aspect only enters in specifying what an acceptable algorithm is.) As with all lower bound problems, the answers sought concern the existence or non-existence of certain combinatorial objects. In our case, the combinatorial objects in question are Boolean networks with cost measured by the number of gates, matrices with cost measured by the size of their second dimensions, etc.

There are a number of main results in this paper. In this extended abstract, we will present the proofs for only three of the Theorems, one on each of the topics mentioned above (see § 2, 3, and 4).

2. Bounds for Boolean Circuits.

2.1. Background.

Boolean circuits have long been regarded as an excellent model for proving lower bounds on the computational complexity of various specific problems (see e.g. Savage [Sa]). However, progress has been slow in deriving strong lower bounds even in restricted models such as monotone circuits. Recently, super-polynomial lower bounds have been obtained in several restricted models. These results are interesting both in the proof techniques used, and in the fact that these restricted models are closely related to several important issues in computation theory, such as the alternation hierarchy, the logspace problem, etc. (see [FSS1] [BDFP] for discussions).

Furst, Saxe, and Sipser [FSS1] studied circuits of bounded depth, and showed that no polynomial size circuits of fixed depth can compute the parity function (and related functions such as the majority function) of n Boolean variables. Chandra, Fortune, and Lipton [CFL1] extended the above results to a wider class of functions, and Sipser [Si] proved that circuits of bounded depth and polynomial size form a hierarchy with regard to the depth. Valiant [V] showed that, for the restricted case of monotone circuits and depth three, any circuit for the clique problem must have size 2^(n^ε).

†Part of this work was done while the author was visiting IBM Research Center, San Jose, California.

0272-5428/83/0000/0420$01.00 © 1983 IEEE
Borodin, Dolev, Fich, and Paul [BDFP] considered branching programs of bounded width, which can be viewed as a special type of Boolean circuits, and proved an n^2/log n lower bound on width-2 programs for computing the majority function. Exponential lower bounds were obtained in [BDFP] and in Plumstead and Plumstead [PP] for certain functions when only restricted classes of width-2 programs are allowed. Following an approach given in [BDFP], Shearer (private communication) showed an exponential lower bound on the size of width-2 programs for computing the (mod 3)-sum function of n Boolean variables. Along a different direction, Chandra, Furst, and Lipton [CFL2] proved that no linear-size branching programs of fixed width can compute the majority function of n Boolean variables.

Several open problems were raised in the above papers. Among them:

(1) Does the majority function have a polynomial size branching program of width 2? ([BDFP])

(2) Can one show a 2^(n^ε) lower bound for a problem that is known to be in P (in contrast to the clique problem used in [V], which is NP-complete)? ([V])

(3) Does the majority function have a bounded-depth polynomial-size circuit if the parity function is allowed as a primitive gate? ([FSS2])

In this paper we will resolve questions (1) and (2), and along the way prove a partial result for question (3). We will show that, although bounded-depth and bounded-width circuits arose from different origins, the techniques developed in the bounded-depth case [FSS1] [V] are essential for answering some questions in the bounded-width case.

We remark that question (2) has independently been resolved by Klawe, Paul and Pippenger (private communication) recently with a different method.

2.2. Terminology.

As defined in [FSS1], a Σ_k-circuit Q is a logical circuit consisting of k layers of alternating "OR" and "AND" gates with unbounded fan-ins and fan-outs; the inputs are literals from the set of Boolean variables and their complements {x_1, x_2, ..., x_n, x̄_1, x̄_2, ..., x̄_n}, and the output gate is an "OR"-gate. A Π_k-circuit is defined in the same way, except that the output gate is an "AND"-gate. The size of a circuit is the total number of gates in the circuit. A circuit is monotone if no complemented literals x̄_i appear in the inputs.

A branching program [M] is a refinement of the standard decision tree program. Following [BDFP], a branching program P of n Boolean variables {x_1, x_2, ..., x_n} is a directed graph whose nodes are divided into levels L_0, L_1, ..., L_t such that all edges run from L_i to L_{i+1}, and L_0 consists of one node called the root; furthermore, every node is either a sink or a node of outdegree 2, and in the latter case the two edges leaving the node are labelled by x_i = 1 and x_i = 0; each sink is labelled by either a 0 or a 1. We call t the length of P, and define the width of P to be the maximum number of nodes in any level L_i. For any input value assignment of the Boolean variables x_i, a branching program starts at the root, traversing down the directed graph by following only edges with labels consistent with the values of the Boolean variables, until a sink is reached where the stored value is the desired output. We say P computes a function f if for every possible assignment (x_1, x_2, ..., x_n) ∈ {0,1}^n, the program outputs f(x_1, x_2, ..., x_n).

2.3. Results.

Let MAJ(x_1, x_2, ..., x_n) be the Boolean function that takes on value 1 if and only if at least n/2 of the variables are 1.

Theorem 1. Let q(n) be any polynomial. Then, for all large n, any width-2 branching program that computes MAJ(x_1, x_2, ..., x_n) must have length greater than q(n).

Theorem 2. There exists a positive constant ε > 0 such that, for all large n, any monotone Σ_3 (or Π_3)-circuit that computes MAJ(x_1, x_2, ..., x_n) must have size at least 2^(n^ε).

Theorems 1 and 2 resolve some open problems in [BDFP] [V] (questions (1) and (2) mentioned in Section 2.1). Theorem 2 has a succinct proof, which will be given in Section 2.4. The proof of Theorem 1 is much more involved, and uses the "probabilistic reduction" technique introduced in [FSS1]. One side result involved in the proof is the following theorem, which is of interest in its own right (along the line of question (3) in Section 2.1).

A Σ'_k(e)-circuit is defined to be the same as a Σ_k-circuit except that the gates at the lowest level are "EXCLUSIVE-OR"-gates, and that the number of literals input to each "EXCLUSIVE-OR"-gate is limited to e or less.

Theorem 3. Let e be any fixed integer and q(n) a fixed polynomial. Then, for all large n, any Σ'_3(e)-circuit that computes MAJ(x_1, x_2, ..., x_n) must have size greater than q(n).

The next theorem extends Valiant's result [V] in a different way, by giving a strong exponential lower bound for monotone depth-4 circuits.

Let G_n denote the n(n-1)/2-tuple of Boolean variables that represent a graph on n vertices. Let CLIQUE(G_n) be the Boolean function whose value is 1 if and only if the graph represented by G_n has a clique of size n^0.1.

Theorem 4. There exists a positive constant ε > 0 such that, for all large n, any monotone Σ_4-circuit that computes CLIQUE(G_n) must have size at least 2^(n^ε).
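The branching-program model of Section 2.2 is easy to make concrete. The sketch below is ours, not a construction from the paper: it evaluates a leveled branching program, and runs the classical width-2, length-n program for the parity of n bits. The encoding of levels and the name eval_branching_program are illustrative choices.

```python
# Illustrative sketch (not from the paper): evaluating a leveled branching
# program.  Each internal node is a triple (var, succ0, succ1): the variable
# it tests, and the index (within the next level) of the node reached when
# that variable is 0 or 1.  Sinks are stored as the labels 0 or 1.

def eval_branching_program(levels, sinks, x):
    """Start at the root, follow one consistent edge per level, output the sink label."""
    node = 0                          # L_0 consists of the single root node
    for level in levels:
        var, succ0, succ1 = level[node]
        node = succ1 if x[var] else succ0
    return sinks[node]

# A classical width-2 program of length 3 for x0 XOR x1 XOR x2:
# node 0 of each level means "parity so far is 0", node 1 means "it is 1".
levels = [
    [(0, 0, 1)],                      # level L_0: the root tests x0
    [(1, 0, 1), (1, 1, 0)],           # level L_1
    [(2, 0, 1), (2, 1, 0)],           # level L_2
]
sinks = [0, 1]
assert eval_branching_program(levels, sinks, [1, 1, 0]) == 0
assert eval_branching_program(levels, sinks, [1, 0, 0]) == 1
```

The parity program has width 2 and length n; Theorem 1 shows that, in contrast, no width-2 program of polynomial length can compute the majority function.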

2.4. Proof of Theorem 2.

Let x = (x_1, x_2, ..., x_n) be a tuple of n Boolean variables. A monotone clause C is a formula of the form x_{i_1} + x_{i_2} + ... + x_{i_s}, where the "+" is the Boolean "OR". A monotone CNF-formula G is a function of the form C_1 · C_2 ··· C_t, where the "·" is the Boolean "AND", and each C_i is a monotone clause. For example, (x_3 + x_4 + x_8) · (x_2 + x_3 + x_7) · (x_1 + x_2) is a monotone CNF-formula. We will often use C(x) or G(x) to denote the Boolean functions associated with the formulas. A monotone Σ_3-formula F is of the form G_1 + G_2 + ... + G_s, where each G_i is a monotone CNF-formula. For any formula C, G, or F, its size (written as size(C), etc.) is defined to be the total number of logic connectives contained in it.

As it is well known that for any fixed depth, the circuit size and the formula size are polynomially related, the following proposition will imply Theorem 2.

Proposition 1. For all large n, if F is a monotone Σ_3-formula and MAJ(x) = F(x), then size(F) ≥ 2^(n^0.1).

Assume size(F) < 2^(n^0.1); we will show that there exists an input value of x such that F(x) ≠ MAJ(x). Let us write F = G_1 + G_2 + ... + G_s, where the G_i's are monotone CNF-formulas; clearly s < 2^(n^0.1).

Let w(x) denote the weight function, i.e., the number of 1's in the values of the x_i's. Call a CNF-formula G majorized if G(x) = 0 for all x with w(x) < ⌈n/2⌉. If any G_i is not majorized, then by definition there exists an x with w(x) < ⌈n/2⌉ and G_i(x) = 1; clearly for this value of x, F(x) ≠ MAJ(x).

We will thus assume that all G_i are majorized. Here the probabilistic argument comes in: we will show that, for all sufficiently large n, if we take a random x with w(x) = ⌈n/2⌉, then F(x) = 0 ≠ MAJ(x) with probability strictly greater than 0. This implies the existence of such an x and hence the Proposition.

Lemma 1. Let G be a majorized monotone CNF-formula with size less than 2^(n^0.1). Take a random x with w(x) = ⌈n/2⌉. Then Pr{G(x) = 1} = O(2^(-n^0.19)).

This lemma implies the desired result; if we pick a random x with w(x) = ⌈n/2⌉, then Pr{F(x) = 1} ≤ Σ_i Pr{G_i(x) = 1} ≤ s · O(2^(-n^0.19)) < 1 for all sufficiently large n, and as a result, Pr{F(x) = 0} > 0. It remains therefore to prove the lemma.

To prove the lemma, let G = C_1 · C_2 ··· C_t, where t < 2^(n^0.1) and the C_i are clauses. Let us generate a random x by picking sequentially random distinct indices i_1, i_2, ..., i_⌈n/2⌉, and setting x_{i_1} = 1, x_{i_2} = 1, ..., x_{i_⌈n/2⌉} = 1 and all the other x_k's to 0.

Let us stop the process after picking m indices, where m = ⌈n/2⌉ - n^0.4. Let C_i be any clause with size(C_i) ≥ n^0.2. Then, with high probability, this clause will contain at least one literal x_j whose index j is among the m indices chosen so far. In fact, a simple calculation shows that, with probability 1 - 2^(-n^0.19), all clauses of size n^0.2 or more will have been satisfied by x_{i_1} = 1, x_{i_2} = 1, ..., x_{i_m} = 1. Thus, for our purpose, we can assume that this is indeed the case. Without loss of generality, let C_1, C_2, ..., C_h be the unsatisfied clauses, all with size(C_i) < n^0.2.

Let C_{i_1}, C_{i_2}, ..., C_{i_p} be a maximal set of disjoint clauses (i.e. with disjoint sets of literals) among C_1, C_2, ..., C_h. It is easy to see that p ≥ n^0.2; otherwise setting all the fewer than n^0.4 literals in C_{i_1} ∪ C_{i_2} ∪ ... ∪ C_{i_p} together with the already set x_{i_1} = 1, x_{i_2} = 1, ..., x_{i_m} = 1 would satisfy G with fewer than ⌈n/2⌉ of the x's set to 1, which would be a contradiction to the assumption of G being majorized.

Let us now continue with the process of picking the rest of the indices i_{m+1}, i_{m+2}, ..., i_⌈n/2⌉ and set the corresponding x_k's to 1. In order to get an x with G(x) = 1, each of the s = n^0.2 clauses C_{i_1}, C_{i_2}, ..., C_{i_s} must contain at least one of these newly chosen n^0.4 variables x_{i_{m+1}}, x_{i_{m+2}}, ..., x_{i_⌈n/2⌉}. This implies that at least s of these newly chosen variables have to be in A, defined as the set of literals in C_{i_1}, C_{i_2}, ..., C_{i_s}. The probability for this to happen is no greater than r(a, b, c) with a = n^0.4, b = n^0.2, c = n/2 + n^0.4, where r(a, b, c) is the probability of two random size-a subsets from the universe {1, 2, ..., c} having at least b elements in common. A standard calculation shows r(a, b, c) to be O(2^(-n^0.19)). This completes the proof of the Lemma, and hence Proposition 1.

3. Communication Complexity.

3.1. Background.

Let f(i, j) be a Boolean-valued function of n-bit integers i and j. Suppose the values of i and j are known only to persons A and B respectively, and they wish to compute cooperatively the value of f(i, j), by sending information to each other alternately, one bit at a time, according to some algorithm. What is the minimum number of bits they have to exchange? This and several related communication complexity questions at the bit level were raised and studied in Yao [Y2]. (Abelson [A] also considered a model in which f is a continuous function of real variables i, j.) Later, Lipton and Sedgewick [LS] introduced the notion of non-deterministic protocols; Mehlhorn and Schmidt [MS] gave new techniques for finding lower bounds, and exhibited gaps between different communication complexity measures; Papadimitriou and Sipser [PS] introduced the notion of k-round protocols, and showed an exponential gap between the power of 1-round and 2-round protocols. Recently, Duris, Galil and Schnitger (private communication) proved an exponential gap between the power of k-round and (k+1)-round protocols for general k. Aho, Ullman, and Yannakakis [AUY] and El Gamal and Pang [EP] showed that computing the Hamming distance of i and j takes order n bits by two-way communication.
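As a concrete instance of this model, our own illustration (not a construction from the paper) is the classical random-fingerprint protocol for the identity function f(i, j) = [i = j]: A sends i mod p for a random prime p of O(log n) bits, so the whole one-way message takes O(log n) bits. The helper names and the prime range n^2..2n^2 below are illustrative choices.

```python
import random

# Classical fingerprinting sketch for equality (our illustration).
# A knows i, B knows j; both are n-bit integers.  A draws a random prime
# p ~ n^2 and sends (p, i mod p) -- O(log n) bits; B outputs 1 iff the
# residues match.

def is_prime(m):
    if m < 2:
        return False
    return all(m % d for d in range(2, int(m ** 0.5) + 1))

def random_prime(lo, hi, rng):
    while True:
        m = rng.randrange(lo, hi)
        if is_prime(m):
            return m

def one_way_equality(i, j, n, rng):
    p = random_prime(n * n, 2 * n * n, rng)   # A's message is (p, i mod p)
    return (i % p) == (j % p)                 # B's one-bit output

rng = random.Random(1)
n = 64
i = rng.getrandbits(n)
assert one_way_equality(i, i, n, rng)          # equal inputs: never wrong
assert not one_way_equality(i, i + 1, n, rng)  # |i - j| = 1, so never wrong here
```

A false "equal" occurs only when p divides i - j ≠ 0; an n-bit difference has at most n prime factors, while there are on the order of n^2/log n primes of size about n^2, so repeating with independent primes drives the error below any fixed ε.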

In [Y2] several different complexity measures were introduced. The complexity that corresponds to the basic problem described above is the deterministic two-way complexity, denoted by C(f; 1 ↔ 2), as the class of algorithms considered are deterministic algorithms and the bits are sent in both directions. If the algorithms are restricted to those in which A will send B only one string α and after that B will compute f(i, j) from j and α, then the minimum number of bits needed is called the deterministic one-way complexity, denoted by C(f; 1 → 2). Clearly, C(f; 1 ↔ 2) ≤ C(f; 1 → 2). In another direction, if randomized algorithms are allowed and if error probability up to some fraction ε is permissible, then the corresponding complexities are called probabilistic two-way complexity (with error ε) and probabilistic one-way complexity (with error ε), denoted by C_ε(f; 1 ↔ 2) and C_ε(f; 1 → 2). We remark that, as is the standard convention for randomized algorithms (e.g. Rabin [R]), the error probability is defined with respect to the randomized moves of the algorithms, and should be independent of the values of i, j. The above description was of course very rough. The formal definitions will be omitted here (they can be found in [Y2]). For functions f that may take on values other than 0, 1, all these complexity measures can be extended straightforwardly.

Few results about the probabilistic communication complexity (one-way or two-way) for specific functions were known. The only solved case is the identification function I (see [Y2]) or close variations (see [MS]), defined by f(i, j) = 1 if i = j and 0 otherwise; in this case a weak lower bound supplied by a general theorem turns out to be tight. In [Y2], it was left as an open problem to determine the probabilistic complexity of a list of elementary functions. This is one of the few situations where one has the opportunity to compare, in a well-defined setting, the power of randomized algorithms versus that of deterministic algorithms. In the present paper, we will study the complexity aspect for several of these problems.

3.2. Results.

We define an ordering function g_n(i, j) and a set intersection function h_n(i, j) on n-bit integers i, j as follows: g_n(i, j) = 1 if i < j and 0 otherwise; h_n(i, j) = Σ_{k=1}^n a_k b_k, where a_1 a_2 ··· a_n and b_1 b_2 ··· b_n are the binary representations of i and j. If we regard i and j as representing two sets S and T from an n-element universe (S = {k | a_k = 1}), then h_n(i, j) is equal to |S ∩ T|.

The probabilistic one-way complexities for the ordering function and the intersection function are determined by the following theorem. It means that the allowance of error and probabilistic moves does not make it easier in one-way communication to compute these functions; this is in contrast to the case of the identity function, in which case O(n) bits are needed for deterministic communication, but O(log n) bits would suffice for probabilistic one-way communication.

Theorem 5. For any fixed number 0 < ε < 1/2, there exists a constant λ > 0 such that C_ε(g_n; 1 → 2) > λn and C_ε(h_n; 1 → 2) > λn for all n.

The two-way communication complexity for the ordering function is fairly well determined. It is at least Ω(log n) by a general theorem ([Y2]), and it is not hard to prove that C_ε(g_n; 1 ↔ 2) = O((log n)^2). Note that the ordering function exhibits a striking difference in its one-way and two-way probabilistic complexity. More generally, one can show for g_n a trade-off phenomenon; namely, its probabilistic complexity is at most O(n^(1/k) (log n)^2) if k rounds of communication are allowed.

We do not know accurately the two-way probabilistic complexity of the set intersection function. However, the following result shows that it grows faster than log n, and thus the set intersection function is definitely harder than the identity function in this regard.

Theorem 6. For any fixed number 0 < ε < 1/2, C_ε(h_n; 1 ↔ 2)/log n → ∞ as n → ∞.

We will now exhibit a function that has a provably high two-way probabilistic complexity. Let the multiplication function ψ(i, j) be the Boolean function defined as the parity of (i · j) mod p, where p is an n-bit prime number and I = J = {1, 2, ..., p-1}.

Theorem 7. For any fixed number 0 < ε < 1/2, there exists a constant λ > 0 such that C_ε(ψ; 1 ↔ 2) > λn for all n.

Theorem 7 can be derived as an application of a general result, which we will now discuss. Let 0 < μ < 1/2 and λ > 0 be any fixed constants chosen once and for all. We will introduce two concepts: "moderate" and "anticorrelated".

Let f(i, j) be a function defined on I × J where |I| = |J| = N, with f taking on values in a set V. Let n = ⌈log N⌉. For each i ∈ I and v ∈ V, let S_i(v) = {j | f(i, j) = v} ⊆ J. Let Δ_f(v) be the set of (i, j) for which f(i, j) = v.

For any v ∈ V, we will say that f is v-moderate if μ < |Δ_f(v)|/N^2 < 1 - μ. We say that f is v-anticorrelated if |S_i(v) ∩ S_k(v)| ≤ (1/N) |S_i(v)| · |S_k(v)| (1 + O(N^(-λ))) for all i, k ∈ I. Intuitively, if a random column j ∈ J is chosen, the two events that f(i, j) = v and f(k, j) = v do not reinforce each other by more than a factor of 1 + O(N^(-λ)). For example, let μ = 1/5 and λ = 1/4; it is not hard to verify that ψ is v-moderate and v-anticorrelated for v ∈ {0, 1}. Thus Theorem 7 will follow from the following general result.

Theorem 8. Let 0 < ε < 1/2 be any fixed constant. If f is both v-moderate and v-anticorrelated for some v, then C_ε(f; 1 ↔ 2) = Θ(n).

3.3. Proof of Theorem 8.

Clearly, we only need to establish the lower bound for C_ε(f; 1 ↔ 2). As the error bound for a probabilistic algorithm with error ε can be made arbitrarily small by repeating it enough times, we can assume that ε < μ^2/32.
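Before turning to the proof, a quick numeric aside of our own (purely illustrative, not part of the argument): the v-moderate condition for the multiplication function ψ of Section 3.2 can be checked directly on a small prime. Since j ↦ i·j mod p permutes {1, ..., p-1} for each fixed i, exactly half the entries in each row are odd, so |Δ_ψ(1)|/N^2 = 1/2, strictly between μ and 1 - μ for μ = 1/5.

```python
# Numeric check (illustration only) of the v-moderate condition for the
# multiplication function psi(i, j) = parity of (i*j mod p), on a small prime.

p = 101
N = p - 1                        # I = J = {1, ..., p-1}
ones = sum(((i * j) % p) % 2     # count the pairs with psi(i, j) = 1
           for i in range(1, p) for j in range(1, p))
fraction = ones / (N * N)        # this is |Delta_psi(1)| / N^2
print(fraction)                  # 0.5: strictly between mu = 1/5 and 1 - mu
```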

For simplicity, let us further assume that μ < |S_i(v)|/N < 1 - μ for all i ∈ I, i.e., each row in the matrix (f(i, j)) contains a proper fraction of the v's.

Let us first digress to consider a slightly different problem. Suppose that the values of i and j are independently and uniformly distributed over I and J, and that the value of f(i, j) is to be computed by a deterministic two-way algorithm A with average error probability not exceeding ε. Let C(A) denote the average number of bits exchanged. Define the distributional two-way ε-error complexity of f, written D_ε(f; 1 ↔ 2), to be the minimum such C(A).

The problem just introduced is related to the original problem. In fact, these two formulations roughly correspond to viewing the communication problem from the distributional approach and from the randomized approach, as discussed in Yao [Y1]. As a consequence of a general theorem (Yao [Y1]) that relates the complexities from these two viewpoints, we have the following lemma.

Lemma 2. C_ε(f; 1 ↔ 2) ≥ (1/2) D_{2ε}(f; 1 ↔ 2).

We will now prove a lower bound to D_{ε'}. Let us call S = G × H a rectangle when G ⊆ I, H ⊆ J, and call |G| · |H| the area of S. We say that S is (ε', v)-monochromatic if at least a 1 - ε' fraction of the (i, j) ∈ S satisfy f(i, j) = v. Let M(ε', v, f) be the maximum area of any (ε', v)-monochromatic S.

Lemma 3. D_{ε'}(f; 1 ↔ 2) ≥ log_4 (α |I| · |J| / M(ε'', v, f)) for all 0 < ε' < μ/2, where α = μ/4 and ε'' = 4ε'/μ.

Proof. Let A be any deterministic two-way algorithm for computing f(i, j) with error probability ≤ ε'. Recall that A is a 4-ary tree with alternate levels of nodes representing the moves of A and B. Let the leaves with output f(i, j) = v be L_1, L_2, ..., L_t, occurring at depths ℓ_1, ℓ_2, ..., ℓ_t respectively. The set of input values (i, j) arriving at each leaf L_k forms a rectangle S_k; let ε_k be the fraction of S_k for which v is not the correct answer. Clearly,

    Σ_{1≤k≤t} ε_k |S_k| ≤ ε' |I| · |J|.   (1)

Let T = ∪_k S_k. Then every (i, j) ∈ Δ_f(v) - T will be given the wrong answer by A. As |Δ_f(v)| ≥ μ |I| · |J|, the set T must contain at least half of the points in Δ_f(v); otherwise the error probability of A would exceed μ/2 > ε'. It follows that

    |T| ≥ (1/2) μ |I| · |J|.   (2)

From (1) and (2),

    Σ_{1≤k≤t} ε_k |S_k| ≤ (2ε'/μ) |T|.   (3)

Let K = {k | ε_k ≤ 4ε'/μ} and T' = ∪_{k∈K} S_k; then we conclude from (3) that

    |T'| ≥ (1/2) |T|.   (4)

From the entropy bound for the external path length of tree A and (2), (4), we obtain

    Σ_{k∈K} (|S_k|/|T'|) ℓ_k ≥ Σ_{k∈K} (|S_k|/|T'|) log_4 (|T'|/|S_k|)
                            ≥ Σ_{k∈K} (|S_k|/|T'|) log_4 (μ |I| · |J| / (4 |S_k|)).   (5)

The lemma follows from (5) and the definition of M(ε'', v, f).

Theorem 8 follows immediately from Lemmas 2, 3 and the next lemma.

Lemma 4. M(ε', v, f) = O(N^(2-λ)) for 0 < ε' < μ/4.

Proof. Let G × H be any (ε', v)-monochromatic rectangle. We need only show that either |G| = O(N^(1-λ)) or |H| = O(N^(1-λ)). If |G| ≤ N^(1-λ), then we are done. We thus assume that |G| > N^(1-λ). Pick a random j ∈ J; let X_i be the 0-1 random variable that takes the value 1 if and only if f(i, j) = v. Let X = Σ_{i∈G} X_i. By assumption, each row of (f(i, j)) has at most a fraction 1 - μ of the v's, implying

    E(X) = Σ_{i∈G} E(X_i) ≤ (1 - μ) |G|.   (6)

Also, using the property of f being v-anticorrelated, we obtain

    Var(X) = E(X^2) - (E(X))^2
           = Σ_{i∈G} (E(X_i^2) - (E(X_i))^2) + 2 Σ_{i<k} (E(X_i X_k) - E(X_i) E(X_k))
           ≤ |G| + |G|^2 · O(N^(-λ)) = O(|G|^2 N^(-λ)).   (7)

From (6), (7) and Chebyshev's inequality, we obtain

    Pr{(G × {j}) is (2ε', v)-monochromatic} ≤ Pr{|X - E(X)| > ((1 - 2ε') - (1 - μ)) |G|}
                                            ≤ Var(X) / ((μ - 2ε') |G|)^2 = O(N^(-λ)).

This means that there are at most N · O(N^(-λ)) indices j ∈ J such that G × {j} is (2ε', v)-monochromatic. Now, note that for G × H to be (ε', v)-monochromatic, there need to be at least |H|/2 indices j ∈ H for which G × {j} is (2ε', v)-monochromatic. It follows that

    |H| ≤ 2N · O(N^(-λ)) = O(N^(1-λ)).

This proves Lemma 4.

4. Hashing.

4.1. Introduction.

Hashing techniques have often been used to store and retrieve records in the form of a table. In open-address hashing, the "key" of each record is mapped by a hashing function into a sequence of table locations, and the insertion and retrieval of the record follow this sequence.
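The open-address scheme just described is easy to simulate. The sketch below is our own illustration (not part of the proof): it models uniform hashing by giving each key a fresh random permutation as its probe sequence, and compares the empirical average retrieval cost at loading factor α = 1/2 with the classical value (1/α) log 1/(1-α) recalled in Section 4.2. All function names are illustrative choices.

```python
import math
import random

# Simulation sketch (illustration only): under uniform hashing, each key's
# probe sequence h(K) is a fresh uniformly random permutation of the table.

def insert(table, perm):
    """Probe the locations in perm until an empty slot; return the cost."""
    for cost, slot in enumerate(perm, start=1):
        if table[slot] is None:
            table[slot] = True
            return cost

def avg_retrieval_cost(M, N, rng):
    table = [None] * M
    # a key's retrieval cost equals its insertion cost, so averaging the
    # insertion costs of the N keys gives the average retrieval cost
    return sum(insert(table, rng.sample(range(M), M)) for _ in range(N)) / N

rng = random.Random(2)
M, N = 1000, 500                          # loading factor a = N/M = 0.5
a = N / M
print(avg_retrieval_cost(M, N, rng))      # empirical average retrieval cost
print((1 / a) * math.log(1 / (1 - a)))    # 2 log 2, about 1.386
```

For M = 1000 and N = 500 the empirical average should land close to 2 log 2; by Theorem 9, no choice of hashing function can do asymptotically better.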
In particular, for uniform hashing, where the hash function maps keys into random permutations, the expected cost of inserting a new key into a table that is an α-fraction full is essentially equal to 1/(1-α) for a large table, while the expected cost of retrieving a record in the table is essentially (1/α) log (1/(1-α)).†

In 1972, Ullman [U] raised the question about the optimal hash function, and defined a mathematical model for discussing it. He showed that, in terms of the insertion cost of a new key, no hash function can have a lower cost than the uniform hashing function all the time; however, he also exhibited a hash function that performs better than uniform hashing some of the time. Ullman conjectured that, in terms of the retrieval cost, uniform hashing is optimal all the time. Our main result is to prove Ullman's conjecture in the asymptotic sense, namely, the retrieval time using any hash function is at least (1/α) log (1/(1-α)) + o(1).

Knuth [K2] raised a weaker conjecture that, among single-hashing functions, which form a restricted family of hashing functions, no one can perform substantially better than the performance bound of a random single-hashing function (the bounds are very close to those for uniform hashing). That conjecture was proved by Ajtai, Komlós, and Szemerédi [AKS]. The proof of our result uses a refinement of the approach that they developed [AKS]. Readers may refer to [AKS] for more discussions on the intuitions behind the techniques.

4.2. Terminology.

As hashing is a familiar concept (see e.g. [K1]), we will describe the terms informally. We refer to [U] for precise definitions where this model was first formulated.

A sequence of keys are inserted into a table of size M according to a hashing function h. We will define C_h(N, M), the expected cost of inserting the N-th key, and C'_h(N, M), the expected cost of retrieving a key in the table containing N keys.*

Let Ω_M be the set of all permutations of {0, 1, 2, ..., M-1}, where M is any positive integer. A hashing function h assigns each key K a permutation h(K) = i_1 i_2 ··· i_M ∈ Ω_M. In inserting a key K into a hash table, we try locations i_1, i_2, ... in turn until an empty slot is found, where K is then inserted; the insertion cost is measured by the number of locations tried before finding an empty slot. Now, suppose a sequence of N keys have been inserted, where 1 ≤ N ≤ M; let T be the resulting hash table. To retrieve a key K in T, the rule is again to try in sequence the locations i_1, i_2, ... as given by h(K), until K is found; the retrieval cost is the number of locations tried.

An (N, M)-scenario ρ is a sequence (σ_1, σ_2, ..., σ_N) where each σ_i ∈ Ω_M. Let T_ρ be the table obtained when a sequence of N keys K_1, K_2, ..., K_N are inserted, with h(K_i) = σ_i; denote by R(K_i, T_ρ) the retrieval cost if K_i is to be retrieved from T_ρ, and let R_h(T_ρ) = (1/N) Σ_{i=1}^N R(K_i, T_ρ).

We model the performance of a hashing function h with a probability distribution p_h over Ω_M. Consider a random (N, M)-scenario ρ = (σ_1, σ_2, ..., σ_N), where each σ_i ∈ Ω_M is independently distributed according to p_h. Let C_h(N, M) be the expected value of R(K_N, T_ρ), and C'_h(N, M) the expected value of R_h(T_ρ).

Uniform hashing is modeled by choosing p_h(π) = 1/M! for all π ∈ Ω_M. For this hash function, it was known (see, e.g. Knuth [K1]) that C_h(N, M) = (M+1)/(M-N+1) and C'_h(N, M) = ((M+1)/N)(H_{M+1} - H_{M-N+1}), where H_s is the harmonic number 1 + 1/2 + ... + 1/s; for fixed 0 < α = N/M < 1 and N → ∞, this gives C_h(N, M) = 1/(1-α) + o(1) and C'_h(N, M) = (1/α) log (1/(1-α)) + o(1). (In the remainder of this section, we will use α to denote the loading factor N/M.)

Our main result is the following theorem.

Theorem 9. For any ε > 0, there exists a constant a_ε such that for any hashing function h,

    C'_h(N, M) ≥ (1/α) log (1/(1-α)) - a_ε (log M)/M   (8)

if ε < α < 1 - ε. Also, there exists an absolute constant b such that for any hashing function h,

    C'_h(M, M) ≥ log M - log log M - b.   (9)

Note that, in this notation, a single-hashing function h is a hashing function with p_h(π) = 1/M for π in a certain set {π_0, π_1, ..., π_{M-1}}, where π_j ∈ Ω_M starts with j, and p_h(π) = 0 otherwise. Ajtai et al. [AKS] proved that (8) and (9) are true when h is a single-hashing function.

4.3. Proof of Theorem 9.

Without loss of generality, we can assume that the hashing function h satisfies p_h(π) ≠ 0 for every π ∈ Ω_M, as the quantity C'_h(N, M) is a continuous function (in fact a polynomial) in the M! variables {p_h(π) | π ∈ Ω_M}. A random (L, M)-scenario ρ = (σ_1, σ_2, ..., σ_L) will be called an h-random (L, M)-scenario if the σ_i are independently distributed according to p_h; L may be ∞.

Suppose that M, N, h are given, and let 1 ≤ d < N be any integer. Consider the insertion of N keys according to an h-random (N, M)-scenario. For each k ∈ {0, 1, 2, ..., M-1}, let ν_k be the probability that table location k is occupied after N - d keys have been inserted, and let δ_k be the expected number of times location k

† In Section 4, all the logarithms are in base e.
* The notation we are using follows [U]. In [K1], the notation C' is used to denote the insertion cost instead of the retrieval cost.

has been probed during the insertion process of the N keys. Clearly,

    C_h(N, M) = (1/N) Σ_{0≤k<M} δ_k,    (10)

and

    N - d = Σ_{0≤k<M} ν_k.    (11)

Let f(x) = λ - e^{-λ} Σ_{i>d} (i - d) λ^i / i!, where λ = -log(1 - x). Our main effort will be in proving the following proposition. Roughly speaking, it states that if location k is probed at least once, then it is likely to have been probed a fair number of times.

Proposition 2. δ_k ≥ f(ν_k) for each k.

Theorem 9 is an analytic consequence of (10), (11) and Proposition 2, as demonstrated in [AKS]. We shall review it below. First, observe that f(x) is a convex function (which can be verified by showing that f'' > 0); this implies that Σ_i r_i f(x_i) ≥ f(Σ_i r_i x_i) for r_i > 0 and Σ_i r_i = 1. Then, from (10), (11) and Proposition 2, we obtain

    C_h(N, M) ≥ (1/N) Σ_{0≤k<M} f(ν_k)
             ≥ (M/N) f((1/M) Σ_{0≤k<M} ν_k)
             = (M/N) f((N - d)/M).

By choosing d = ⌈10 log M⌉, one can show that f((N - d)/M) is well approximated by -log(1 - (N - d)/M); that is, f(x) ≈ λ in this case. A close examination of the error bounds then leads to Theorem 9.

It remains only to prove Proposition 2. For the rest of the proof, let k ∈ {0, 1, ..., M - 1} be fixed. We first define some notation.

Let L be any integer. For any (L, M)-scenario w, partition Q_M into two disjoint parts Q[w] and Q'[w] as defined below. Consider the table T_w obtained by inserting keys according to w, and let B_w ⊆ {0, 1, ..., M - 1} be the set of occupied positions in T_w. We put π ∈ Q_M into Q[w] if a new key K with h(K) = π will occupy position k when inserted into T_w; otherwise, let π ∈ Q'[w]. In other words, if k ∈ B_w, then Q[w] = ∅; otherwise, Q[w] contains all those π = i_1, i_2, ..., i_{l-1}, k, i_{l+1}, ..., i_M with i_t ∈ B_w for 1 ≤ t < l. For example, when w = the empty string, Q[w] is the set of permutations π that start with k.

For any (L, M)-scenario w = (π_1, π_2, ..., π_L), we will use w(j) to denote its prefix, the (j, M)-scenario (π_1, π_2, ..., π_j). An (N - d, M)-scenario w = (π_1, π_2, ..., π_{N-d}) will be called a skeleton scenario if k is not occupied in the table T_w. Note that we can alternatively define a skeleton scenario as a scenario for which π_{j+1} ∈ Q'[w(j)] for 0 ≤ j < N - d.

For any nonempty subset V ⊆ Q_M, let P_V denote the probability distribution obtained when P_h is restricted to V. Let P_h(V) denote Σ_{π∈V} P_h(π); then P_V(π) = P_h(π)/P_h(V) for π ∈ V. Note that it follows from our assumption P_h(π) ≠ 0 that P_h(V) ≠ 0 for all nonempty V.

We now describe a procedure that generates a random (N, M)-scenario ρ = (σ_1, σ_2, ..., σ_N). It will be seen that the σ_i are independently distributed according to P_h. It proceeds in three steps.

Procedure RANDSCEN;

Step 1: Generate a random skeleton scenario w = (π_1, π_2, ..., π_{N-d}) by successively generating π_1, π_2, ...; each new π_{j+1} is randomly chosen from Q'[w(j)] according to distribution P_{V_j}, where V_j = Q'[w(j)].

Step 2: For each 1 ≤ j ≤ N - d, generate first an integer r_j ≥ 0 distributed geometrically with parameter u_{w,j} = P_h(Q[w(j)]), that is, Pr{r_j = i} = (1 - u_{w,j}) u_{w,j}^i; then generate a random (r_j, M)-scenario w_j = (π_{j,1}, π_{j,2}, ..., π_{j,r_j}), where each π_{j,t} is randomly and independently chosen from W_j = Q[w(j)] distributed according to P_{W_j}.

Step 3: Let w' = (π_1, w_1, π_2, w_2, ..., π_{N-d}, w_{N-d}), and let r = Σ_{1≤j≤N-d} r_j; if r > d, then let ρ be the (N, M)-scenario w'(N); otherwise, generate d - r more random σ_{N-(d-r)+1}, σ_{N-(d-r)+2}, ..., σ_N, each chosen independently from Q_M according to distribution P_h, and let ρ be (w', σ_{N-(d-r)+1}, σ_{N-(d-r)+2}, ..., σ_N).

End RANDSCEN.

Lemma 5. The ρ generated by RANDSCEN is an h-random (N, M)-scenario.

Proof. (Sketch) Let us consider a modified procedure, in which Step 3 is replaced by the generation of an h-random (∞, M)-scenario w'' = σ_1, σ_2, σ_3, ..., with output ρ' = w'w''. A close examination shows that ρ' will also be an h-random (∞, M)-scenario. On the other hand, it is easy to see that the ρ generated by the original procedure has the same distribution as ρ'(N). This proves the lemma. ∎

Suppose ρ is an (N, M)-scenario generated by RANDSCEN, with r being the parameter generated in Step 2 during the process.

Lemma 6. Let s(ρ) be the number of times that table position k is probed during the insertion of N keys according to ρ. Then s(ρ) ≥ min{r, d}.

Proof. Write ρ = (π_1, w_1, π_2, w_2, ...) with w_j = π_{j,1} π_{j,2} ... π_{j,r_j} in the notation of the procedure. Each insertion that corresponds to a π_{j,t} in ρ will probe location k. The total number of π_{j,t} in ρ is at least min{r, d}. ∎

To prove Proposition 2, imagine that we follow the steps in RANDSCEN to generate an h-random (N, M)-scenario ρ. We wish to analyze this process of generating ρ to estimate the expected value of min{r, d}; then Lemma 6 will provide the lower bound needed, since δ_k is the expected value of s(ρ).

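The quantities ν_k and δ_k defined in this section, and the identities (10) and (11), can be illustrated by direct simulation. The sketch below assumes uniform hashing (each key's probe sequence is an independent, uniformly random permutation of the M table slots); the parameter values N = 8, M = 12, d = 2 are illustrative only (the proof takes d = ⌈10 log M⌉).

```python
import random

def insert_all(M, perms):
    """Open-address insertion: probe each key's permutation until an empty slot.
    Returns the occupancy vector and per-slot probe counts."""
    occupied = [False] * M
    probes = [0] * M                      # probes[k] = times slot k was probed
    for perm in perms:
        for slot in perm:
            probes[slot] += 1
            if not occupied[slot]:
                occupied[slot] = True     # key settles here
                break
    return occupied, probes

def estimate(N, M, d, trials=2000):
    """Monte Carlo estimate of nu_k (occupancy prob. after N - d insertions)
    and delta_k (expected probes of slot k over all N insertions)."""
    nu = [0.0] * M
    delta = [0.0] * M
    for _ in range(trials):
        # uniform hashing: each key gets an independent random permutation
        perms = [random.sample(range(M), M) for _ in range(N)]
        occ, _ = insert_all(M, perms[:N - d])
        _, pr = insert_all(M, perms)
        for k in range(M):
            nu[k] += occ[k]
            delta[k] += pr[k]
    return [x / trials for x in nu], [x / trials for x in delta]

N, M, d = 8, 12, 2
nu, delta = estimate(N, M, d)
print(sum(nu))          # = N - d = 6 exactly, matching (11)
print(sum(delta) / N)   # the average cost (1/N) * sum_k delta_k of (10)
```

Note that Σ_k ν_k = N - d holds exactly here, since each of the first N - d keys always occupies one distinct slot.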
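Two analytic facts used in the argument below — the tail-sum identity E(min{d, X}) = Σ_{1≤i≤d} Pr{X ≥ i} behind (13), and the Poisson-tail lower bound of Lemma 7 for a sum of independent geometric variables — can be checked numerically. In this sketch the parameters z_j are illustrative, and distributions are computed by exact convolution truncated at a cutoff.

```python
import math

def geom_pmf(z, n):
    # Pr{Y_j = i} = (1 - z) * z**i, truncated to the first n terms
    return [(1 - z) * z**i for i in range(n)]

def convolve(p, q):
    # distribution of the sum of two independent variables, same truncation
    r = [0.0] * len(p)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < len(r):
                r[i + j] += pi * qj
    return r

CUT = 80
zs = [0.3, 0.5, 0.4]                 # illustrative z_j
pmf = [1.0] + [0.0] * (CUT - 1)      # point mass at 0
for z in zs:                         # pmf of Y = Y_1 + Y_2 + Y_3
    pmf = convolve(pmf, geom_pmf(z, CUT))

# Identity behind (13): E(min{d, Y}) = sum_{i=1..d} Pr{Y >= i}
d = 4
lhs = sum(min(d, i) * p for i, p in enumerate(pmf))
rhs = sum(sum(pmf[i:]) for i in range(1, d + 1))
assert abs(lhs - rhs) < 1e-9

# Lemma 7: Pr{Y >= i} >= e^{-lam} * sum_{l >= i} lam^l / l!
lam = -sum(math.log(1 - z) for z in zs)
for i in range(1, 10):
    tail = sum(pmf[i:])
    poisson_tail = 1 - sum(math.exp(-lam) * lam**l / math.factorial(l)
                           for l in range(i))
    assert tail >= poisson_tail - 1e-9
```

At i = 1 the two tails coincide (both equal 1 - ∏_j (1 - z_j)), which is where the bound of Lemma 7 is tight.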
Consider the execution of RANDSCEN as a stochastic process. Let Ω denote the random variable corresponding to w in Step 1, let X_j denote the random variable for r_j in Step 2, and let X = Σ_{1≤j≤N-d} X_j. Let S denote the random variable corresponding to s(ρ) defined in Lemma 6. Clearly,

    δ_k = E(S).    (12)

We also introduce some scalars. Let c_w = Pr{Ω = w} for skeleton scenario w; let u_{w,j} denote P_h(Q[w(j)]) as defined in Step 2 of the procedure.

Our approach is to analyze the expected value of min{r, d} for fixed w, and then average over w. From Lemma 6, we obtain

    E(S | Ω = w) ≥ E(min{d, X} | Ω = w)
                = Σ_{1≤i≤d} Pr{X ≥ i | Ω = w}
                = Σ_{1≤i≤d} Pr{X^(w) ≥ i},    (13)

where X^(w) = Σ_{1≤j≤N-d} X_j^(w), with X_j^(w) being the random variable X_j restricted to the probability space specified by Ω = w. As X_j^(w), 1 ≤ j ≤ N - d, are independent variables with distribution Pr{X_j^(w) = i} = (1 - u_{w,j})(u_{w,j})^i, the following analytic result applies.

Lemma 7. [AKS] Suppose Y = Σ_{1≤j≤a} Y_j, where Y_1, Y_2, ..., Y_a are independent random variables with Pr{Y_j = i} = (1 - z_j)(z_j)^i. Then

    Pr{Y ≥ i} ≥ e^{-λ} Σ_{l≥i} λ^l / l!,

where λ = -log(∏_{1≤j≤a} (1 - z_j)).

Proof. See [AKS]. ∎

From Lemma 7 and (13) we have

    E(S | Ω = w) ≥ Σ_{1≤i≤d} e^{-λ_w} Σ_{l≥i} λ_w^l / l!
                = λ_w - e^{-λ_w} Σ_{l>d} (l - d) λ_w^l / l!,    (14)

where λ_w = -log(∏_{1≤j≤N-d} (1 - u_{w,j})).

Now, let Δ be the random variable that is equal to 1 if location k is occupied in T_{ρ(N-d)} and 0 otherwise; let Δ_w denote Δ restricted to the situation Ω = w.

For any skeleton scenario w, it is easy to check from the definitions that

    1 - Pr{Δ_w = 1} = Pr{Δ_w = 0} = ∏_{1≤j≤N-d} (1 - u_{w,j}).    (15)

It follows from (14), (15) that

    E(S | Ω = w) ≥ f(Pr{Δ_w = 1}).    (16)

Using (16) and the convexity of f, we obtain

    δ_k = Σ_w c_w E(S | Ω = w)
        ≥ Σ_w c_w f(Pr{Δ_w = 1})
        ≥ f(Σ_w c_w Pr{Δ_w = 1})
        = f(Pr{Δ = 1})
        = f(ν_k).

This proves Proposition 2. ∎

5. Concluding Remarks.

We conclude this paper with some open problems.

(1) Can one prove a super-polynomial lower bound for fixed-width branching programs that compute a familiar function? In particular, is the minimum size of any width-3 branching program for the CLIQUE problem super-polynomial?

(2) Is there a hierarchy theorem for fixed-width branching programs similar to Sipser's result [Si] for bounded-depth circuits? More precisely, let k > 2 be any fixed integer; is there a family of functions {f_1, f_2, ...}, where f_n is a Boolean function of n variables, such that f_n has a width-k polynomial-size branching program, but the minimum size of any width-(k - 1) branching program computing f_n must grow super-polynomially in n?

(3) Klawe, Paul, and Pippenger (private communication) recently have exhibited, for each k, polynomial-time computable monotone functions whose depth-k monotone circuit size is Ω(2^{n^{ε_k}}) for some ε_k > 0. Can one prove that, for every fixed k, the minimum size of any depth-k monotone circuit for computing the majority function is at least Ω(2^{n^{ε_k}})?

(4) Theorem 7 proves that the probabilistic two-way complexity (with error ε) for the set intersection function has to grow faster than log n. We conjecture it to be of order Θ(n).

(5) The k-round probabilistic one-way complexity (with error ε) for the ordering function is O(n^{1/k}), ignoring logarithmic factors. Is this a tight bound?

(6) Can one prove that uniform hashing is also asymptotically optimal in the insertion cost all the time? More precisely, can one prove that, for any fixed 0 < α < 1, C_α(N, M) = 1/(1 - α) + o(1)?

References.

[A] H. Abelson, "Lower bounds on information transfer in distributed computations," Proc. 19th Ann. IEEE Symp. on Foundations of Computer Science, 1978, Ann Arbor, Michigan, 151-158.

[AKS] M. Ajtai, J. Komlós, and E. Szemerédi, "There is no fast single hashing function," Information Processing Letters 7 (1978), 270-273.
[AUY] A. V. Aho, J. D. Ullman, and M. Yannakakis, "On notions of information transfer in VLSI circuits," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 133-139.

[BDFP] A. Borodin, D. Dolev, F. E. Fich, and W. Paul, "Bounds for width two branching programs," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 87-93.

[CFL1] A. K. Chandra, S. Fortune, and R. Lipton, "Unbounded fan-in circuits and associative functions," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 52-60.

[CFL2] A. K. Chandra, M. L. Furst, and R. Lipton, "Multi-party protocols," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 94-99.

[DGPR] P. Duris, Z. Galil, W. Paul, and R. Reischuk, "Two nonlinear lower bounds," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 127-132.

[EP] A. El Gamal and D. F. Pang, "Communication complexity of computing the Hamming distance," draft, 1983.

[ES] P. Erdős and J. Spencer, Probabilistic Methods in Combinatorics, Academic Press, New York, 1974.

[F] M. Fredman, "Two applications of a probabilistic search technique: sorting X + Y and building balanced trees," Proc. 7th Ann. ACM Symp. on Theory of Computing, May 1975, Albuquerque, New Mexico, 240-244.

[FSS1] M. Furst, J. B. Saxe, and M. Sipser, "Parity, circuits, and the polynomial time hierarchy," Proc. 22nd Ann. IEEE Symp. on Foundations of Computer Science, 1981, 260-270.

[FSS2] M. Furst, J. B. Saxe, and M. Sipser, "Parity, circuits, and the polynomial time hierarchy," preprint, a newer version of [FSS1], 1982.

[K1] D. E. Knuth, The Art of Computer Programming, Vol. 3, Addison-Wesley, Reading, Massachusetts, 1975, second printing.

[K2] D. E. Knuth, "Computer science and its relation to mathematics," Am. Math. Monthly 81 (1974), 323-343.

[LS] R. Lipton and R. Sedgewick, "Lower bounds for VLSI," Proc. 13th Ann. ACM Symp. on Theory of Computing, May 1981, Milwaukee, Wisconsin, 300-307.

[M] W. Masek, "A fast algorithm for the string editing problem and decision graph complexity," M.Sc. Thesis, MIT, May 1976.

[MS] K. Mehlhorn and E. M. Schmidt, "Las Vegas is better than determinism in VLSI and distributive computing," Proc. 14th Ann. ACM Symp. on Theory of Computing, May 1982, San Francisco, California, 330-337.

[PP] B. R. Plumstead and J. B. Plumstead, "Bounds for cube coloring," preprint, Berkeley, 1982.

[PS] C. H. Papadimitriou and M. Sipser, "Communication complexity," Proc. 14th Ann. ACM Symp. on Theory of Computing, May 1982, San Francisco, California, 196-200.

[R] M. O. Rabin, "Probabilistic algorithms," in Algorithms and Complexity: New Directions and Recent Results, edited by J. F. Traub, 21-39, Academic Press, New York, 1976.

[Sa] J. E. Savage, The Complexity of Computing, Wiley, New York, 1974.

[Si] M. Sipser, "Borel sets and circuit complexity," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 61-69.

[U] J. D. Ullman, "A note on the efficiency of hashing function," JACM 19 (1972), 569-575.

[V] L. G. Valiant, "Exponential lower bounds for restricted monotone circuits," Proc. 15th Ann. ACM Symp. on Theory of Computing, April 1983, Boston, Massachusetts, 110-117.

[Y1] A. C. Yao, "Probabilistic computations: toward a unified measure of complexity," Proc. 18th Ann. IEEE Symp. on Foundations of Computer Science, October 1977, Providence, Rhode Island, 222-227.

[Y2] A. C. Yao, "Some complexity questions related to distributive computing," Proc. 11th Ann. ACM Symp. on Theory of Computing, April 1979, Atlanta, Georgia, 209-213.