Space-Efficient Approximate Voronoi Diagrams
Sunil Arya
Theocharis Malamatos∗
David M. Mount†
Department of Computer
Science
The Hong Kong University of
Science and Technology
Clear Water Bay, Kowloon,
Hong Kong
Department of Computer
Science
The Hong Kong University of
Science and Technology
Clear Water Bay, Kowloon,
Hong Kong
Department of Computer
Science and
Institute for Advanced
Computer Studies
University of Maryland
College Park, Maryland 20742
[email protected]
[email protected]
[email protected]
ABSTRACT
Given a set $S$ of $n$ points in $\mathbb{R}^d$, a $(t, \epsilon)$-approximate Voronoi
diagram (AVD) is a partition of space into constant complexity cells, where each cell c is associated with t representative
points of S, such that for any point in c, one of the associated representatives approximates the nearest neighbor to
within a factor of (1 + ǫ). Like the Voronoi diagram, this
structure defines a spatial subdivision. It also has the desirable properties of being easy to construct and providing
a simple and practical data structure for answering approximate nearest neighbor queries. The goal is to minimize the
number and complexity of the cells in the AVD.
We assume that the dimension d is fixed. Given a real
parameter γ, where 2 ≤ γ ≤ 1/ǫ, we show that it is possible
to construct a (t, ǫ)-AVD consisting of
$$O\left(n\,\epsilon^{\frac{d-1}{2}}\,\gamma^{\frac{3(d-1)}{2}}\log\gamma\right)$$
cells for $t = O(1/(\epsilon\gamma)^{(d-1)/2})$. This yields a data structure
of $O(n\gamma^{d-1}\log\gamma)$ space (including the space for representatives) that can answer $\epsilon$-NN queries in time $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$. (Hidden constants may depend exponentially on $d$, but do not depend on $\epsilon$ or $\gamma$.)
In the case $\gamma = 1/\epsilon$, we show that the additional $\log\gamma$
factor in space can be avoided, and so we have a data structure that answers $\epsilon$-approximate nearest neighbor queries in
time $O(\log(n/\epsilon))$ with space $O(n/\epsilon^{d-1})$, improving upon the
best known space bounds for this query time. In the case
$\gamma = 2$, we have a data structure that can answer approximate nearest neighbor queries in $O(\log n + 1/\epsilon^{(d-1)/2})$ time
using optimal $O(n)$ space. This dramatically improves the
previous best space bound for this query time by a factor of
$O(1/\epsilon^{(d-1)/2})$.
We also provide lower bounds on the worst-case number of cells assuming that cells are axis-aligned rectangles
of bounded aspect ratio. In the important extreme cases
$\gamma \in \{2, 1/\epsilon\}$, our lower bounds match our upper bounds
asymptotically. For intermediate values of $\gamma$ we show that
our upper bounds are within a factor of $O((1/\epsilon)^{(d-1)/2}\log\gamma)$
of the lower bound.
∗ The work of the first two authors was supported by the
Research Grants Council of Hong Kong, China under project
number HKUST 6158/98E.
† This author’s work was supported in part by the National
Science Foundation under grant CCR-0098151.
STOC’02, May 19-21, 2002, Montreal, Quebec, Canada.
Copyright 2002 ACM 1-58113-495-9/02/0005 ...$5.00.
1. INTRODUCTION
Given a set $S$ of $n$ points in $\mathbb{R}^d$, called sites, the Voronoi
diagram of S is a partition of space into cells, such that
each cell is the region of space consisting of all points that
are closer to a particular site than to any other. Voronoi
diagrams are among the most fundamental and well-studied
objects in computational geometry.
Voronoi diagrams have numerous applications in areas
such as pattern recognition and classification, machine learning, robotics, and graphics. Many of these applications are
in high dimensions but, unfortunately, the complexity of
Voronoi diagrams can be as high as $n^{\lceil d/2 \rceil}$ in dimension
$d$. Constructing Voronoi diagrams can be complicated due
either to numerical inaccuracies or degeneracies, which result when points are cocircular or nearly so. Also, although
Voronoi diagrams implicitly encode the information of what
site is closest to a given point, by themselves they are not
suitable for use as a data structure for nearest neighbor
searching. These shortcomings have led to consideration of
simpler structures that can be used in place of the Voronoi
diagram.
We begin with some definitions. For a real parameter
$\epsilon > 0$, we say that a point $p \in S$ is an $\epsilon$-nearest neighbor
($\epsilon$-NN) of a point $q \in \mathbb{R}^d$ if the distance between $q$ and $p$ is
at most (1 + ǫ) times the distance between q and its nearest
neighbor in S. We assume that distances are measured in the
Euclidean metric. An approximate Voronoi diagram (AVD)
of S is defined to be a partition of space into cells, where each
cell c is associated with a representative rc ∈ S, such that rc
is an ǫ-NN for all the points in c [9]. More generally we may
allow up to some given t ≥ 1 representatives to be stored
with each cell, and require that for any point in the cell,
one of these t representatives is an ǫ-NN. We refer to such a
decomposition as a (t, ǫ)-approximate Voronoi diagram.
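The two definitions above can be made concrete with a brute-force check. The following is a minimal sketch, assuming points are given as coordinate tuples; the function names `is_eps_nn` and `cell_answers_query` are illustrative and do not come from the paper.

```python
import math

def is_eps_nn(p, q, sites, eps):
    """Return True if site p is an eps-nearest neighbor of query q.

    p qualifies when |qp| <= (1 + eps) * (distance from q to its true
    nearest neighbor among the sites), using the Euclidean metric.
    """
    nearest = min(math.dist(q, s) for s in sites)
    return math.dist(q, p) <= (1.0 + eps) * nearest

def cell_answers_query(representatives, q, sites, eps):
    """Check the (t, eps)-AVD condition at q: some representative is an eps-NN."""
    return any(is_eps_nn(r, q, sites, eps) for r in representatives)

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
q = (0.45, 0.0)   # true nearest neighbor is (0, 0) at distance 0.45
```

Here the site `(1.0, 0.0)` lies at distance 0.55 from `q`, so it is an acceptable answer for `eps = 0.25` but not for `eps = 0.1`.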
Vleugels and Overmars [13] considered approximating the
Voronoi diagram of a disjoint set of convex sites in $\mathbb{R}^d$ for the
purpose of motion planning. Har-Peled [9] showed how to
construct a $(1, \epsilon)$-AVD of $O((n/\epsilon^d)(\log n)\log(n/\epsilon))$ size. A
cell in this subdivision is the difference of two axis-aligned
cubes (the inner cube is optional). Moreover, these cells
are stored in a compressed quadtree data structure, which
provides a simple and efficient method for answering ǫ-NN
queries in O(log(n/ǫ)) time. This approach is very appealing
because, like the Voronoi diagram, it defines a subdivision
of space, but there are no problems with geometric degeneracies, and a point-location structure is provided. The construction and search structure are very simple and practical.
The principal shortcoming of Har-Peled’s data structure
is its size. Sabharwal et al. [12] reduced the size by a logarithmic factor. Arya and Malamatos [1] showed that it
is possible to construct a $(1, \epsilon)$-AVD of $O(n/\epsilon^d)$ size, and
they provided a lower bound of $\Omega(n/\epsilon^{d-1})$ on the size of
a $(1, \epsilon)$-AVD. As with Har-Peled, their cells are differences
of two axis-aligned hyperrectangles. Further, they introduced the notion of allowing multiple representatives per
cell and showed that, given a real parameter $2 \le \gamma \le 1/\epsilon$, it
is possible to construct a $(t, \epsilon)$-AVD of $O(n\gamma^d)$ size, where
$t = O(1/(\epsilon\gamma)^{(d-1)/2})$. By varying $\gamma$ it is possible to achieve
a tradeoff in space and query time.
To understand the significance of this line of work, it is
useful to briefly review the evolution of results in the field of
approximate nearest neighbor searching in Euclidean space.
There have been two principal approaches. One direction
has focused on eliminating exponential dependencies on dimension, as characterized by the work of Indyk and Motwani [10] and Kushilevitz et al. [11]. Although these methods provide space and query times without exponential dependence on dimension, their space requirements grow as
$n^{O(1)}$, which may be too high for many applications.
The second direction, which has been developed in computational geometry, is to assume that d is a small constant and n is large. The resulting data structures have
sizes that are nearly linear in n but may have factors in
space and query time that grow as $(1/\epsilon)^{O(d)}$. Arya et al. [3]
and, later, Duncan et al. [8] provided data structures that
achieve $O((1/\epsilon)^d \log n)$ query time and use $O(n)$ space (independent of $\epsilon$). These structures were optimal with respect to space, but the $\epsilon$ factors in the query times were
far from optimal. The cells of the Voronoi diagram are convex polytopes, and results on approximating convex polytopes by Dudley [7] suggest that $(1/\epsilon)^{(d-1)/2}$ should be the
proper bound. Indeed, Clarkson [6] showed that queries
could be answered in $O((1/\epsilon)^{d/2} \log n)$ time. However, Clarkson’s space bounds were larger by a factor of at least $(1/\epsilon)^{d/2}$
and involved a parameter that depended on the geometric
arrangement of the sites. Through an impressive combination of structures, Chan [5] showed that the query time
could be reduced to $O((1/\epsilon)^{(d-1)/2} \log n)$ using a data
structure with space $O((1/\epsilon)^{(d-1)/2}\, n \log n)$. The recent results of Arya and Malamatos [1] further tightened these
bounds by showing that the query time could be improved
to $O(\log n + 1/\epsilon^{(d-1)/2})$ and removed the log factor from the
space bound.
Finally, it seems that the $\epsilon$ factors in the query time are coming exactly in line with the bounds suggested by Dudley’s results.
The one remaining deficiency is the significant factor of
$O((1/\epsilon)^{(d-1)/2})$ in the space bound. In this paper we show
that this factor can be eliminated altogether. In particular
we provide a data structure of size $O(n)$ that can answer approximate nearest neighbor queries in $O(\log n + 1/\epsilon^{(d-1)/2})$
time. Thus we simultaneously provide optimal space bounds
while matching the query time suggested by Dudley’s results. Our approach is based on a more space-efficient version of approximate Voronoi diagrams. Our approach differs
from those of Har-Peled and Arya and Malamatos principally in how cells are generated and how representatives are
chosen. Otherwise, it produces an AVD data structure with
the same desirable characteristics.
More specifically, our main result is the following. Given a set
$S$ of $n$ sites in $\mathbb{R}^d$, $0 < \epsilon \le 1/2$, and a parameter $2 \le
\gamma \le 1/\epsilon$, we show how to construct a $(t, \epsilon)$-AVD, where
$t = O(1/(\epsilon\gamma)^{(d-1)/2})$, consisting of
$$O\left(n\,\epsilon^{\frac{d-1}{2}}\,\gamma^{\frac{3(d-1)}{2}}\log\gamma\right)$$
cells. This yields a data structure of $O(n\gamma^{d-1}\log\gamma)$ space
(including the space for representatives) that can answer $\epsilon$-NN queries in time $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$. The bounds
described above on approximate nearest neighbor searching
arise with γ = 2. In the case γ = 1/ǫ, the additional log γ
factor in space can be avoided, and we have a data structure that answers queries in time O(log(n/ǫ)) with space
$O(n/\epsilon^{d-1})$. These results are presented in Theorem 1 and
Corollary 1.1 in Section 3. Note that our bounds are superior to those of [9] and [1] for all values of γ.
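As a check on how these bounds fit together, the space bound can be recovered by multiplying the number of cells by the number of representatives per cell, and the two extreme values of $\gamma$ can then be substituted. The following is only a worked restatement of the bounds quoted above, treating $d$ as a constant:

```latex
\begin{align*}
\text{space} &= O\!\left(n\,\epsilon^{\frac{d-1}{2}}\gamma^{\frac{3(d-1)}{2}}\log\gamma\right)
  \cdot O\!\left(\frac{1}{(\epsilon\gamma)^{\frac{d-1}{2}}}\right)
  = O\!\left(n\,\gamma^{d-1}\log\gamma\right),\\
\gamma = 2 &: \quad \text{space } O(n), \qquad
  \text{query time } O\!\left(\log n + 1/\epsilon^{\frac{d-1}{2}}\right),\\
\gamma = 1/\epsilon &: \quad t = O(1), \qquad
  \text{query time } O(\log(n/\epsilon)), \qquad
  \text{space } O\!\left(n/\epsilon^{d-1}\right)
  \text{ once the } \log\gamma \text{ factor is avoided.}
\end{align*}
```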
We believe that the factor of log γ in the number of cells
is an artifact of our proof technique. As evidence of this
we provide a significantly different construction and a different analysis based on a charging argument in which this
$\log\gamma$ factor in the number of cells is replaced by a factor of
$\log(2/(\epsilon\gamma))$. We feel that this alternative construction may be
of independent interest. For $\gamma > \sqrt{1/\epsilon}$ this provides a superior space bound. These results are presented in Theorem 2
in Section 4.
We also present the first lower bounds on the worst-case
number of cells in a (t, ǫ)-AVD for multiple representatives.
We assume that cells are based on axis-aligned rectangles
of bounded aspect ratio. These results show that our space
bounds are tight in the two extremes, γ = 2 and γ = 1/ǫ.
For all intermediate values of γ our upper bounds on space
are tight to within a factor of $O((1/\epsilon)^{(d-1)/2}\log\gamma)$. These
results are presented in Section 5.
Our results come about from a few innovations. The principal reduction in space arises from a form of deterministic
sampling based on the BBD-tree. By applying the construction of Arya and Malamatos on an appropriately sampled
subset of points, we can decrease the number of cells significantly while increasing the number of representatives by
only a constant factor. The main difficulty is avoiding an additional factor of O(log n) in the number of representatives,
as would be expected from well known results on ǫ-nets. Our
other innovation is to create cells more economically by considering a well-separated pair decomposition of the points,
and concentrating the generation of cells along the bisector
between well-separated pairs. Our alternative approach is
based on a charging argument which shows that, after certain modifications to Arya and Malamatos’s construction,
the number of representatives for most of the cells is significantly smaller than the maximum possible. This enables us
to reduce the number of cells drastically by combining cells
with fewer representatives into larger cells.
2. PRELIMINARIES
Throughout we assume that the dimension d is a fixed constant, and the constants hidden in the asymptotic bounds
may depend on d (but not on ǫ or γ). We assume that the
set S of points has been scaled and translated to lie within a
ball of radius $\epsilon/8$ placed at the center of the unit hypercube
$[0, 1]^d$.
Let $x$ and $y$ denote any two points in $\mathbb{R}^d$. We use $|xy|$ to
denote the Euclidean distance between $x$ and $y$ and $\overline{xy}$ to
denote the segment joining $x$ and $y$.
We denote by $b(x, r)$ a ball of radius $r$ centered at $x$, i.e.,
$b(x, r) = \{y : |xy| \le r\}$. For a ball $b$ and any positive real
$\gamma$, we use $\gamma b$ to denote the ball with the same center as $b$ and
whose radius is $\gamma$ times the radius of $b$, and $\overline{b}$ to denote the
set of points that are not in $b$. Given a set $X$ of points and a
point $q$, let $\mathrm{NN}_q(X)$ be the distance from $q$ to its nearest neighbor
in $X$. If there are no points in $X$, then $\mathrm{NN}_q(X)$ is
defined to be infinity.
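The notation above maps directly onto a few lines of code. This is a minimal sketch, assuming points are coordinate tuples; the class `Ball` and the function `nn_dist` are illustrative names, not part of the paper.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Ball:
    """Closed ball b(x, r) = {y : |xy| <= r}."""
    center: tuple
    radius: float

    def contains(self, y):
        return math.dist(self.center, y) <= self.radius

    def in_complement(self, y):
        # Membership in the set of points that are not in the ball.
        return not self.contains(y)

    def scaled(self, gamma):
        # The ball "gamma b": same center, radius multiplied by gamma.
        return Ball(self.center, gamma * self.radius)

def nn_dist(q, points):
    """NN_q(X): distance from q to its nearest neighbor in X (infinity if X is empty)."""
    return min((math.dist(q, p) for p in points), default=math.inf)
```

For example, with `b = Ball((0.0, 0.0), 1.0)`, the point `(2.0, 2.0)` is in the complement of `b` but inside `b.scaled(3)`.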
We briefly review the notions of well-separated pair decomposition and balanced box-decomposition trees, as they
play an important role in our constructions.
2.1 The Well-Separated Pair Decomposition
Let $S$ be a set of $n$ points in $\mathbb{R}^d$. We say that two sets of
points X and Y are well-separated if they can be enclosed
within two disjoint d-dimensional balls of radius r, such that
the distance between the centers of these balls is at least αr,
where α ≥ 2 is a real parameter called the separation factor.
If we consider joining the centers of these two balls by a
line segment, the resulting shape resembles a dumbbell. The
balls are the heads of the dumbbell. Define the length of a
dumbbell to be the distance between the centers of the balls.
A dumbbell separates two points x and y if x is contained
in one head and y in the other.
A well-separated pair decomposition (WSPD) of $S$ is a set
$P_{S,\alpha} = \{(X_1, Y_1), \ldots, (X_m, Y_m)\}$ of pairs of subsets of $S$
such that (i) for $1 \le i \le m$, $X_i$ and $Y_i$ are well-separated
and (ii) for any distinct points $x, y \in S$, there exists a unique
pair $(X_i, Y_i)$ such that either $x \in X_i$ and $y \in Y_i$ or $x \in Y_i$
and $y \in X_i$. (We say that the pair $(X_i, Y_i)$ separates $x$
and $y$.) Callahan and Kosaraju [4] have shown that we can
construct a WSPD containing $O(\alpha^d n)$ pairs in $O(n \log n +
\alpha^d n)$ time. For each pair, their construction also provides
the corresponding dumbbell. Throughout we will assume
that the length of the dumbbell is exactly $\alpha$ times the radius
of the dumbbell heads.
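The well-separation condition and the notion of a dumbbell separating two points can be tested directly once the two heads are known. The sketch below assumes the head centers `cx`, `cy` and the common radius `r` are supplied by the caller; it does not construct a WSPD, and the function names are illustrative.

```python
import math

def well_separated(X, Y, cx, cy, r, alpha=2.0):
    """Check that X and Y fit in disjoint balls b(cx, r) and b(cy, r)
    whose centers are at distance at least alpha * r."""
    heads_enclose = (all(math.dist(cx, p) <= r for p in X) and
                     all(math.dist(cy, p) <= r for p in Y))
    gap = math.dist(cx, cy)
    return heads_enclose and gap > 2 * r and gap >= alpha * r

def dumbbell_separates(x, y, cx, cy, r):
    """A dumbbell separates x and y if one lies in each head."""
    in_a = lambda p: math.dist(cx, p) <= r
    in_b = lambda p: math.dist(cy, p) <= r
    return (in_a(x) and in_b(y)) or (in_b(x) and in_a(y))

X = [(0.0, 0.0), (0.5, 0.0)]
Y = [(10.0, 0.0), (10.0, 0.5)]
```

With heads of radius 1 centered at `(0, 0)` and `(10, 0)`, these two sets are well-separated for separation factor 8 but not for 16.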
2.2 The BBD Tree
Let $U = [0, 1]^d$ denote a unit hypercube in $\mathbb{R}^d$. We define
a quadtree box recursively as follows: (i) $U$ is a quadtree box,
and (ii) any hypercube obtained by splitting a quadtree box
into $2^d$ equal parts is a quadtree box. The size of a quadtree
box is its side length. A nice property of quadtree boxes is
that any two quadtree boxes are either disjoint or one is
contained inside the other.
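The recursive definition and the nesting property can be illustrated with an integer encoding of quadtree boxes. In this sketch a box is assumed to be a pair `(level, index)` of side length `2**-level` whose lower corner is `index * 2**-level`, treated as half-open; the encoding and function names are illustrative choices, not taken from the paper.

```python
from itertools import product

def box_of(point, level):
    """Quadtree box of side 2**-level containing a point of [0, 1)^d."""
    return (level, tuple(int(x * 2**level) for x in point))

def children(box):
    """Split a quadtree box into its 2^d equal sub-boxes."""
    level, idx = box
    return [(level + 1, tuple(2 * i + bit for i, bit in zip(idx, bits)))
            for bits in product((0, 1), repeat=len(idx))]

def contains(outer, inner):
    """True if the quadtree box `inner` lies inside the quadtree box `outer`."""
    (lo, io), (li, ii) = outer, inner
    return li >= lo and all(j >> (li - lo) == i for i, j in zip(io, ii))

def relation(a, b):
    """Any two quadtree boxes are either nested or disjoint."""
    return "nested" if contains(a, b) or contains(b, a) else "disjoint"
```

For instance, `box_of((0.3, 0.7), 2)` is the box `(2, (1, 2))`, which is nested inside `(1, (0, 1))` and disjoint from `(1, (0, 0))`.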
The balanced box-decomposition (BBD) tree is a balanced
$2^d$-ary tree that compactly represents a hierarchical decomposition of space [3]. Each node of the tree is associated
with a region of space called a cell, which is the difference
of two quadtree boxes, an outer box and an (optional) inner
box. The root of the tree is associated with U . The cell associated with any node is partitioned into disjoint cells, which
are associated with the children of the node. (For details
see [3].) We define the size of a cell to be the same as the size
of its outer box.
The properties of the BBD tree, relevant to this paper,
are given below.
(i) A set S of n points can be stored in a BBD tree having
O(n) nodes and O(log n) depth.
(ii) A collection C of n quadtree boxes can be stored in a
BBD tree having O(n) nodes and O(log n) depth. The
subdivision induced by its leaves is a refinement of the
subdivision induced by the quadtree boxes in C and,
for any point q, we can determine the leaf containing
q in O(log n) time.
(iii) The number of cells of the BBD tree with pairwise
disjoint interiors, each of size at least s, that intersect
a ball of radius $r$ is at most $O((1 + r/s)^d)$.
(iv) Let 1 ≤ t ≤ n. Consider the set of nodes of the BBD
tree that have at most t points but whose parents have
more than t points. The size of such a set is at most
O(n/t).
(v) Suppose that we assign weights to a subdivision induced by an (unweighted) BBD tree. Let w be the
maximum weight of a cell and let W denote the total
weight of all the cells. Then it is possible to build a
weighted BBD tree for exactly the same set of cells
such that the following holds. Given w ≤ t ≤ W ,
consider a set of nodes of weight at most t but whose
parents have weight more than t. The size of such a
set is O(W/t).
Properties (i) and (iii) are proved in [3]. Properties (ii)
and (v) follow from a generalization of (i). Property (iv)
follows from the balancing aspect of the BBD tree.
Throughout we use the following notation. Given a cell $c$,
let $s_c$ denote the size of $c$, and let $b_c$ be the ball of radius $s_c d/2$
whose center coincides with the center of $c$’s outer box (note
that $c \subseteq b_c$).
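The containment $c \subseteq b_c$ noted above follows from a one-line estimate. Writing $o$ for the center of the outer box of $c$ (a symbol introduced here only for this check), every point $p$ of the outer box is within half a diagonal of $o$:

```latex
\[
|po| \;\le\; \frac{s_c}{2}\sqrt{d} \;\le\; \frac{s_c\, d}{2}
\qquad \text{since } \sqrt{d} \le d \text{ for } d \ge 1 .
\]
```

Since $c$ is contained in its outer box, every point of $c$ lies in the ball of radius $s_c d/2$ about $o$.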
3. SPACE REDUCTION BY SAMPLING
Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $0 < \epsilon \le 1/2$
and $2 \le \gamma \le 1/\epsilon$ be two real parameters. In this section we show how to construct a $(t, \epsilon)$-AVD consisting of
$O(n\epsilon^{(d-1)/2}\gamma^{3(d-1)/2}\log\gamma)$ cells, where $t = O(1/(\epsilon\gamma)^{(d-1)/2})$.
The construction of AVDs described by Arya and Malamatos [1] is based on two fundamental constructions. The
first is that given two concentric balls it is possible to construct a small set of nearest-neighbor representatives so that
given a query point in the inner ball its approximate nearest
neighbor outside the outer ball will be one of these representatives, and vice versa. A similar result holds for two
disjoint balls. Lemmas 1 and 2 below indicate the sizes of
these representative sets as a function of the separation of
the two balls. The second construction is a subdivision of
space into a collection of cells, such that for each cell, the
points of S lying outside this cell satisfy certain separation
properties. This is given in Lemma 3 below. In all three
lemmas (adapted slightly for convenience), S is a set of n
points in IRd , and 0 < ǫ ≤ 1/2 and γ ≥ 2 are two real
parameters.
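Lemma 1. Let $b_1$ and $b_2$ denote two concentric balls of
radius $r$ and $\gamma r$, respectively. There exists a set $R \subseteq S$
consisting of
$$\left(1 + O\left(\frac{1}{\sqrt{\epsilon\gamma}}\right)\right)^{d-1}$$
points such that
(i) for any point $q \in b_1$, $\mathrm{NN}_q(R) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap \overline{b_2})$,
and
(ii) for any point $q \in \overline{b_2}$, $\mathrm{NN}_q(R) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap b_1)$.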
Lemma 2. Let $b_1$ and $b_2$ be two disjoint balls of radius $r_1$
and $r_2$, respectively, whose minimum distance of separation
is at least $\ell'$. Further, suppose that $\ell' \ge \max(r_1, r_2)/2$. Then
there exists a set $R \subseteq S$ consisting of
$$\left(1 + O\left(\frac{\sqrt{r_1 r_2}}{\ell'\sqrt{\epsilon}}\right)\right)^{d-1}$$
points such that for any point $q \in b_1$, $\mathrm{NN}_q(R) \le (1 + \epsilon) \cdot
\mathrm{NN}_q(S \cap b_2)$.
Lemma 3. It is possible to construct a subdivision consisting of $O(n\gamma^d)$ cells, where each cell $c$ is the difference
of two cubes and satisfies at least one of the following three
properties:
(i) $|S \cap \gamma b_c| \le 1$.
(ii) There exists a ball $b'_c$ such that $S \cap \gamma b_c \subseteq b'_c$ and the
ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $S \cap \gamma b_c \subseteq b'_c$. Letting $r_1$ and $r_2$ denote the radius of balls $b_c$ and $b'_c$,
respectively, and $\ell'$ denote the minimum distance of
separation between $b_c$ and $b'_c$, then $\ell' \ge 2\max(r_1, r_2)$
and $\ell'/\sqrt{r_1 r_2} \ge 5\sqrt{\gamma}$.
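The sizes of the representative sets that follow from these three lemmas can be checked with a short calculation. The sketch below assumes $\gamma \le 1/\epsilon$, as in the setting of this section, so that $\epsilon\gamma \le 1$. It substitutes the separation bound of Lemma 3(iii) into the bound of Lemma 2 and then simplifies the resulting expression, which has the same form as the bound of Lemma 1:

```latex
\begin{align*}
\frac{\ell'}{\sqrt{r_1 r_2}} \ge 5\sqrt{\gamma}
  \;&\Longrightarrow\;
  \frac{\sqrt{r_1 r_2}}{\ell'\sqrt{\epsilon}} \le \frac{1}{5\sqrt{\epsilon\gamma}},\\
\left(1 + O\!\left(\frac{1}{\sqrt{\epsilon\gamma}}\right)\right)^{d-1}
  \;&=\; O\!\left(\frac{1}{(\epsilon\gamma)^{(d-1)/2}}\right)
  \qquad \text{since } 1 \le \frac{1}{\sqrt{\epsilon\gamma}} \text{ when } \epsilon\gamma \le 1 .
\end{align*}
```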
The three lemmas given above imply that, for a cell c in
the subdivision of Lemma 3, we can choose a set of representatives of size $O(1/(\epsilon\gamma)^{(d-1)/2})$ each from $\overline{\gamma b_c}$ and from
$b'_c$. This forms the basis of the approach given by Arya and
Malamatos. We can significantly reduce the number of cells
by weakening the separation property in Lemma 3, allowing
for $O(1/(\epsilon\gamma)^{(d-1)/2})$ points in the region $\gamma b_c - b'_c$, and choosing all these points as additional representatives. A natural
method for trying to achieve this goal is to construct the subdivision of Lemma 3 for a set of randomly sampled points
of suitable size. Unfortunately, standard results in random
sampling and ǫ-net theory imply that this would lead to an
increase in the number of representatives by a factor that
is logarithmic in n. We overcome this difficulty by devising
a novel but simple sampling procedure based on the BBD-tree, which has the following property. Let $nf$ denote the
number of points of $S$ that are sampled, where $0 < f \le 1$
is a parameter. Then for any fat region (e.g., a Euclidean
ball) that is free of sampled points, we can contract the region by a constant factor (say, 2), and the shrunken region
is guaranteed to have at most O(1/f ) points of the original
set. As we will see, this sampling procedure suffices for our
application and leads to a large space improvement.
In Lemma 4, we present a simple version of this idea,
intended to illustrate the basic space-reduction mechanism
and, later, in Section 3.1, we apply this technique in full
force.
Lemma 4. Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $\gamma \ge 2$
and $0 < f \le 1$ be two real parameters. It is possible to
construct a subdivision consisting of $O(nf\gamma^d)$ cells, where
each cell $c$ is the difference of two cubes and satisfies at least
one of the following three properties:
(i) $|S \cap \gamma b_c| = O(1/f)$.
(ii) There exists a ball $b'_c$ such that $|S \cap (\gamma b_c - b'_c)| = O(1/f)$
and the ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $|S \cap (\gamma b_c - b'_c)| =
O(1/f)$. Letting $r_1$ and $r_2$ denote the radius of balls
$b_c$ and $b'_c$, respectively, and $\ell'$ denote the minimum
distance of separation between $b_c$ and $b'_c$, then $\ell' \ge
\max(r_1, r_2)/2$ and $\ell'/\sqrt{r_1 r_2} \ge \sqrt{\gamma}$.
Proof. Let T denote the BBD tree for the set S of points.
Let N be the set of nodes of T that contain at most 1/f
points (of S), but whose parents contain more than 1/f
points. Let X be the cells corresponding to these nodes. By
BBD property (iv), |X | is O(nf ). Let S ′ ⊆ S be the set
of points obtained by sampling one point arbitrarily from
each (non-empty) cell in X . We construct the subdivision
described in Lemma 3 for S ′ but using the value 2γ in place
of γ in the lemma. Observe that since |S ′ | = O(nf ), by
Lemma 3, the number of cells in this subdivision is O(nf γ d ).
We claim that any cell c in this subdivision satisfies the
desired property.
If Case (i) of Lemma 3 holds, then $|S' \cap 2\gamma b_c| \le 1$. Let
$\mathcal{X}' \subseteq \mathcal{X}$ be the cells that overlap the ball $\gamma b_c$ and contain
at least one point of S. Since S ′ contains one point from
each cell in X ′ , all but at most one cell in X ′ must intersect
the boundary of 2γbc . By BBD property (iii), the number
of cells that overlap γbc and intersect the boundary of 2γbc
is bounded by a constant. Thus |X ′ | = O(1). Since a cell in
X ′ has at most 1/f points, the total number of points of S
in γbc is O(1/f ).
If Case (ii) of Lemma 3 holds, then there exists a ball b′′c
such that S ′ ∩ 2γbc ⊆ b′′c and the ball 2γb′′c does not overlap
c. Define b′c = 2b′′c . It follows that γb′c does not overlap c.
Next we show that |S ∩ (γbc − b′c )| = O(1/f ). Let X ′ ⊆ X
be the cells that overlap the region γbc − b′c , and contain at
least one point of S. Since S ′ contains one point from each
cell in X ′ , and 2γbc − b′c /2 contains no point of S ′ , it follows
that each cell in X ′ intersects the boundary of both 2γbc
and γbc , or intersects the boundary of both b′c and b′c /2. By
BBD property (iii), the number of such cells is bounded by
a constant. Since a cell in X ′ has at most 1/f points, the
total number of points of S in the region γbc − b′c is O(1/f ).
A similar argument shows that if Case (iii) of Lemma 3
holds, then the corresponding case holds here. The details
are omitted. This completes the proof.
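The sampling step used in this proof can be sketched in code. The example below is only an illustration under a loud assumption: it uses a simple median-split tree as a stand-in for the BBD tree, which reproduces the counting behavior (roughly $nf$ sampled points, each cell holding at most $1/f$ points) but not the geometric guarantees of BBD cells. The names `frontier_cells` and `sample_sites` are hypothetical.

```python
import math

def frontier_cells(points, limit, depth=0):
    """Split recursively until a node holds at most `limit` points.

    Stand-in for the BBD tree (assumption): a median split along
    cycling coordinate axes. Each returned group is a node with at
    most `limit` points whose parent, if any, held more than `limit`.
    """
    if len(points) <= limit:
        return [points]
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (frontier_cells(pts[:mid], limit, depth + 1) +
            frontier_cells(pts[mid:], limit, depth + 1))

def sample_sites(points, f):
    """Pick one arbitrary point from each non-empty frontier cell."""
    limit = max(1, math.floor(1 / f))   # a cell may hold at most 1/f points
    cells = frontier_cells(points, limit)
    return [cell[0] for cell in cells if cell]

grid = [(i / 16, j / 16) for i in range(16) for j in range(16)]
sample = sample_sites(grid, f=1 / 8)
```

On this 256-point grid with `f = 1/8`, the recursion stops at groups of 8 points, so 32 points are sampled, matching the $O(nf)$ bound on the sample size.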
We construct the subdivision described in Lemma 4 for
$f = (\epsilon\gamma)^{(d-1)/2}$. We assign representatives to the cells
as follows. Let $q$ be a point inside a cell $c$. Let $b_c$ and
$b'_c$ be the balls defined in Lemma 4. Since $c$ is contained
within the ball $b_c$, applying Lemma 1(i), it follows that we
can find a set $R'_c$ consisting of $O(1/(\epsilon\gamma)^{(d-1)/2})$ points such
that $\mathrm{NN}_q(R'_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap \overline{\gamma b_c})$. We next consider
the case when the nearest neighbor of $q$ lies within $\gamma b_c$.
Note that one of the three cases given in the statement
of Lemma 4 must hold. If Case (i) holds, then we define
$R''_c = S \cap \gamma b_c$, and if Case (ii) or Case (iii) holds, then we define $R''_c = S \cap (\gamma b_c - b'_c)$. Note that $|R''_c| = O(1/f) = O(1/(\epsilon\gamma)^{(d-1)/2})$.
It is clear that a point in $R'_c \cup R''_c$ is an $\epsilon$-NN of $q$ unless
Case (ii) or (iii) holds and the nearest neighbor of $q$ lies
within $b'_c$. If Case (ii) (Case (iii), resp.) holds then, by
Lemma 1(ii) (Lemma 2, resp.), it follows that we can find
a set $R'''_c$ consisting of $O(1/(\epsilon\gamma)^{(d-1)/2})$ points such that
$\mathrm{NN}_q(R'''_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap b'_c)$. Finally we assign the set
of representatives $R_c$ for $c$ to be $R'_c \cup R''_c$ in Case (i) and
$R'_c \cup R''_c \cup R'''_c$ in Cases (ii) and (iii). Clearly, $R_c$ has size
$O(1/(\epsilon\gamma)^{(d-1)/2})$ and satisfies the desired property, namely,
$\mathrm{NN}_q(R_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S)$. Thus, we have shown how
to construct a $(t, \epsilon)$-AVD with $O(n\epsilon^{(d-1)/2}\gamma^{(3d-1)/2})$ cells
for $t = O(1/(\epsilon\gamma)^{(d-1)/2})$.
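The cell count stated here is obtained by substituting the chosen value of $f$ into the $O(nf\gamma^d)$ bound of Lemma 4, and the bound on $t$ by adding the sizes of the three representative sets. Spelled out as a worked check:

```latex
\begin{align*}
n f \gamma^{d}
  &= n\,(\epsilon\gamma)^{\frac{d-1}{2}}\,\gamma^{d}
   = n\,\epsilon^{\frac{d-1}{2}}\,\gamma^{\frac{d-1}{2}+d}
   = n\,\epsilon^{\frac{d-1}{2}}\,\gamma^{\frac{3d-1}{2}},\\
t &= O\!\left(\frac{1}{f}\right) + O\!\left(\frac{1}{(\epsilon\gamma)^{(d-1)/2}}\right)
   = O\!\left(\frac{1}{(\epsilon\gamma)^{(d-1)/2}}\right).
\end{align*}
```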
3.1 Bisector-Sensitive Construction
In this section, we present a more sophisticated construction that reduces the space by nearly a factor of γ. The
construction used in the previous section simply applied the
existing construction of Arya and Malamatos to a suitable
sample of the points. In this section we combine the sampling process with a new construction, which generates cells
along the Voronoi bisectors of the well-separated pairs.
Let T denote the BBD tree for the set S of points. Let N
be the set of nodes of T that contain at most 1/f points (of
S), but whose parents contain more than 1/f points. Here f
is a parameter between 0 and 1, which will later be assigned
a suitable value depending on ǫ and γ. Let X be the cells
corresponding to these nodes. By BBD property (iv), |X | is
O(nf ). Let S ′ ⊆ S be the set of points obtained by sampling
one point arbitrarily from each (non-empty) cell in X .
We construct a WSPD $P_{S',\alpha}$ for $S'$, using separation factor $\alpha = 16$. Note that the number of pairs in $P_{S',16}$ is
$O(nf)$. Let $D'$ denote the set of dumbbells corresponding
to this WSPD. We expand both the heads of each dumbbell
by a factor of two. Let D denote the new set of dumbbells.
Note that the dumbbells in D have a separation factor of
8. For each dumbbell P ∈ D, we compute a set of quadtree
boxes CP as follows.
Let $X \subseteq S$ and $Y \subseteq S$ denote the sets of points that are
enclosed within the two heads of $P$. Let $x$ and $y$ denote the
centers of the heads enclosing $X$ and $Y$, respectively. Let $\ell =
|xy|$, let $z$ denote the center of the segment $\overline{xy}$, and let $B_P$
denote the set of balls of radius $2^i \ell$, for $4 \le i \le \lceil \log\gamma + 4 \rceil$,
centered at $z$. Let $V_P$ denote the set of points whose nearest
neighbors in $X$ and in $Y$ are equidistant (i.e., the Voronoi
bisector of $X$ and $Y$). For a ball $b \in B_P$, let $C_b$ be the set of
quadtree boxes overlapping $b \cap V_P$ that have the largest size
not exceeding $r_b^2/(2048\gamma d\ell)$, where $r_b$ denotes the radius of $b$.
Let $C'_P$ be the set of quadtree boxes overlapping $b(z, 16\gamma\ell)$
that have the largest size not exceeding $\gamma\ell/d$. Let $C_P =
(\bigcup_{b \in B_P} C_b) \cup C'_P$ and $C = \bigcup_{P \in D} C_P$. Finally we store all the
quadtree boxes in $C$ in a BBD tree $T'$. We will show that the
subdivision induced by the leaves of $T'$, along with suitably
chosen representatives, is the desired approximate Voronoi
diagram.
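To make the scales in this construction concrete, the following sketch lists, for one dumbbell, the radius of each ball in $B_P$ and the quadtree box size used along the bisector inside it. It assumes that the logarithm is base 2 and that quadtree box sizes are powers of 1/2 capped at 1 (the side of the unit hypercube); it computes sizes only and does not enumerate the boxes or the bisector. The function name is illustrative.

```python
import math

def bisector_box_sizes(ell, gamma, d):
    """Return (r_b, box_size) for each ball b in B_P of a dumbbell of length ell.

    r_b = 2^i * ell for 4 <= i <= ceil(log2(gamma) + 4), and box_size is
    the largest quadtree box size not exceeding r_b^2 / (2048 gamma d ell).
    """
    sizes = []
    top = math.ceil(math.log2(gamma) + 4)
    for i in range(4, top + 1):
        r_b = 2**i * ell
        bound = r_b**2 / (2048 * gamma * d * ell)
        # Largest power of two not exceeding the bound, capped at the unit box.
        size = min(1.0, 2.0 ** math.floor(math.log2(bound)))
        sizes.append((r_b, size))
    return sizes

scales = bisector_box_sizes(ell=2**-10, gamma=4, d=2)
```

The box size grows quadratically with the radius, so the boxes are finest near the midpoint of the dumbbell, where the bisector has to be resolved most precisely, and coarser farther away.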
We first bound the number of cells in the subdivision.
The following lemma essentially shows that the Voronoi bisector of a well-separated pair $(X, Y)$ behaves as a $(d-1)$-dimensional hyperplane for the purpose of packing arguments. The proof is omitted, but is similar to one given
in [2].
Lemma 5. Let X and Y be two sets of points enclosed
within two disjoint d-balls of radius r, such that the distance
between the centers of these balls is at least αr, where α ≥ 4.
Let V denote the Voronoi bisector of X and Y , and let b be
any d-ball of radius R. Then the number of quadtree boxes
of size $s$ that overlap $b \cap V$ is at most $O((1 + R/s)^{d-1})$.
By Lemma 5, it follows that for any ball $b \in B_P$, $|C_b|$ is
$O((1 + \gamma\ell/r_b)^{d-1})$. Also, by BBD property (iii), $|C'_P| = O(1)$.
It is now easy to see that $|C_P|$ is $O(\gamma^{d-1})$. Since the number
of dumbbells in $D$ is $O(nf)$, the total number of quadtree
boxes, $|C|$, is $O(nf\gamma^{d-1})$. Since the number of leaves in the
BBD tree T ′ is O(|C|), this bound applies to the number of
cells in the subdivision.
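The step from the per-ball bound to $|C_P| = O(\gamma^{d-1})$ is a geometric sum. The following worked version assumes $d \ge 2$ and a base-2 logarithm, and uses $r_b = 2^i \ell$. For every $i$ in the range of the sum, $2^i \le 32\gamma$, so $1 \le 32\gamma/2^i$ and each term is at most $(33\gamma/2^i)^{d-1}$:

```latex
\begin{align*}
|C_P| \;&\le\; |C'_P| + \sum_{i=4}^{\lceil \log\gamma + 4 \rceil} |C_b|
   \;=\; O(1) + \sum_{i=4}^{\lceil \log\gamma + 4 \rceil}
        O\!\left(\left(1 + \frac{\gamma}{2^{i}}\right)^{d-1}\right) \\
  &=\; O\!\left(\gamma^{d-1} \sum_{i \ge 4} 2^{-i(d-1)}\right)
   \;=\; O\!\left(\gamma^{d-1}\right).
\end{align*}
```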
Lemma 7 describes an important property of the subdivision, which will enable us to choose representatives for the
cells. We need the following technical lemma. Its proof is
omitted due to space limitations, but it uses methods
similar to those used in [1].
Lemma 6. Let c be a cell corresponding to a leaf of T ′ .
Let x be a point in S ∩ 2γbc . Let Sc ⊆ S ∩ 2γbc be the set of
points p such that there is a dumbbell P ∈ D that separates x
and p, and VP intersects cell c. Then either |Sc | = 0 or there
exists a ball b′c such that Sc ∪ {x} ⊆ b′c and which satisfies
at least one of the following two properties:
(i) The ball $8\gamma b'_c$ does not overlap $c$.
(ii) $\ell' \ge 4\max(r_1, r_2)$ and $\ell'/\sqrt{r_1 r_2} \ge 19\sqrt{\gamma}$, where $\ell'$ denotes the minimum distance of separation between $b_c$
and $b'_c$, $r_1$ denotes the radius of $b_c$, and $r_2$ denotes the
radius of ball $b'_c$.
The proof is based on the fact that a dumbbell separating
x from a point in Sc is either “far away” from c or it implies
an upper bound on the size of c (because it generates c
or a cell enclosing c). By considering the dumbbell that
separates x from the point in Sc that is farthest from it, we
can show that these two cases yield properties (i) and (ii),
respectively.
Lemma 7. Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $\gamma \ge 2$
and $0 < f \le 1$ be two real parameters. It is possible to
construct a subdivision consisting of $O(nf\gamma^{d-1})$ cells, where
each cell $c$ is the difference of two cubes and, letting $S_c \subseteq S$ be the set
of points that are the nearest neighbor of some point in $c$, satisfies at least
one of the following three properties:
(i) $|S_c \cap \gamma b_c| = O((1/f)\log\gamma)$.
(ii) There exists a ball $b'_c$ such that $|S_c \cap (\gamma b_c - b'_c)| =
O((1/f)\log\gamma)$ and the ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $|S_c \cap (\gamma b_c - b'_c)| =
O((1/f)\log\gamma)$. Letting $\ell'$ denote the minimum distance of separation between $b_c$ and $b'_c$, $r_1$ denote the
radius of $b_c$, and $r_2$ denote the radius of ball $b'_c$, then
$\ell' \ge \max(r_1, r_2)$ and $\ell'/\sqrt{r_1 r_2} \ge 9\sqrt{\gamma}$.
Proof. If |Sc ∩ γbc | = 0, then (i) trivially holds. So
suppose that |Sc ∩ γbc | ≥ 1. First we show that if |S ′ ∩
2γbc | = 0 then (i) holds. Let X ′ ⊆ X be the set of cells
that overlap the ball γbc , and contain at least one point
of S. Since S ′ contains one point from each cell in X ′ , all
the cells in X ′ must intersect the boundary of 2γbc . By
BBD property (iii), the number of cells that overlap γbc
and intersect the boundary of 2γbc is bounded by a constant.
Thus |X ′ | = O(1). Since a cell in X ′ has at most 1/f points,
the total number of points of S in γbc is O(1/f ), and so
clearly (i) holds.
In the remainder of the proof, we assume that |S ′ ∩2γbc | ≥
1 and |Sc ∩ γbc | ≥ 1. Let (x, w) be the closest pair of points
such that x ∈ S ′ ∩ 2γbc and w ∈ Sc ∩ 2γbc . (In case of
ties, choose any such pair.) Let β = |xw|. (Note that if
$S' \cap S_c \cap 2\gamma b_c \ne \emptyset$, then $x = w$ and $\beta = 0$.)
Let Sc′ ⊆ S ∩ 2γbc be the set of points p such that there is
a dumbbell P ∈ D that separates x and p, and VP intersects
cell c. If |Sc′ | = 0, then let b′′c be the ball of zero radius
centered at x. Otherwise, by Lemma 6, there exists a ball
b′′c such that Sc′ ∪{x} ⊆ b′′c and which satisfies either property
(i) or (ii) listed therein. Let r2′′ be the radius of b′′c and let
z denote its center. We distinguish two cases based on the
closest distance L between z and cell c: (1) L ≥ sc /16 and
(2) L < sc /16.
Case 1: L ≥ sc /16.
Let r2 = max(2r2′′ , sc /(16γ)), and let b′c = b(z, r2 ). Note
that x is at a distance of at most r2 /2 from the center z
of b′c . We consider two subcases: (a) β < r2 /64 and (b)
β ≥ r2 /64.
Subcase (a): β < r2 /64.
It is easy to verify that b′c satisfies either property (ii) or
(iii). It remains to show that |Sc ∩(γbc −b′c )| = O((1/f ) log γ).
To this end, we first show that the cell c′ ∈ X that contains
a point u ∈ Sc ∩(γbc −b′c ) has size at least |zu|/(64d). For the
sake of contradiction, suppose that the size of c′ is less than
|zu|/(64d). Recall that S ′ contains one point from each cell
in X . Let u′ be the point in c′ that belongs to S ′ (note that
u and u′ may be the same point). By the triangle inequality,
we get |xu′ | ≥ |zu| − |uu′ | − |xz|. Since |uu′ | ≤ |zu|/64 and
|xz| ≤ |zu|/2 (because |zu| ≥ r2 and |xz| ≤ r2 /2), it follows
that |xu′ | ≥ 31|zu|/64.
Consider the dumbbell P ′ ∈ D′ that separates x and u′ .
Let P be the dumbbell corresponding to P ′ in D. Let A
and B denote the heads of dumbbell P containing x and u′ ,
respectively. Since the separation factor for the dumbbells
in D is 8, it follows that the radius r ′ of A and B is at
least |xu′ |/10 ≥ 31|zu|/640. Recall that the heads of the
dumbbells in D′ are enlarged by a factor of 2 to obtain the
dumbbells in D. Hence any point within distance r ′ /2 of x
and u′ will lie within A and B, respectively. Since |wx| =
β ≤ r2 /64 ≤ |zu|/64, it follows that w ∈ A and, since |uu′ | ≤
|zu|/64, it follows that u ∈ B. Since both w and u belong
to Sc (i.e., they are the nearest neighbor of some point in
cell c), it is not difficult to see that the Voronoi bisector VP
must intersect c. But then b′′c should have contained u, a
contradiction.
Partition the points in Sc ∩ (γbc − b′c ) into groups, where
the ith group has all the points whose distance from z lies
in the interval $[2^i r_2, 2^{i+1} r_2]$. Since r2 ≥ sc /(16γ), and the
maximum distance of a point in γbc from z is some constant
times γsc , the ratio between the largest and smallest such distances is $O(\gamma^2)$, and so the number of non-empty groups
is O(log γ). Let Xi ⊆ X be the set of cells that overlap some
point in the ith group. By the above observation, the cells
in Xi have size at least $2^i r_2/(64d)$, and so by BBD property
(iii), |Xi | is O(1). Since each cell in Xi has at most 1/f points, the
number of points in the ith group is O(1/f ), which implies
the desired bound on |Sc ∩ (γbc − b′c )|.
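The grouping step can be illustrated with a small sketch. The function name `annulus_groups`, the constant 4 in the maximum distance, and the sample values are hypothetical; the sketch only shows that distances spanning a ratio of $O(\gamma^2)$ fall into $O(\log \gamma)$ doubling intervals.

```python
import math

def annulus_groups(distances, r2):
    """Group distances d >= r2 by the index i with 2^i * r2 <= d < 2^(i+1) * r2."""
    groups = {}
    for dist in distances:
        if dist < r2:
            continue  # points inside the ball b'_c are not part of the partition
        i = int(math.floor(math.log2(dist / r2)))
        groups.setdefault(i, []).append(dist)
    return groups

# Hypothetical numbers: s_c = 1, gamma = 8, r2 = s_c / (16 * gamma),
# and distances from z up to 4 * gamma * s_c (the constant 4 is an assumption).
gamma = 8.0
r2 = 1.0 / (16 * gamma)
max_dist = 4 * gamma
distances = [r2 * (1.5 ** j) for j in range(100) if r2 * (1.5 ** j) <= max_dist]
groups = annulus_groups(distances, r2)
# Upper bound on the number of non-empty groups: log2 of the distance ratio, plus one.
max_groups = math.floor(math.log2(max_dist / r2)) + 1
```

Here the distance ratio is $64\gamma^2$, so the number of groups is at most $2\log_2 \gamma + 7$, in line with the O(log γ) bound used in the proof.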
Subcase (b): β ≥ r2 /64.
Consider a ball b̃ of radius 32β centered at x. It is easy
to show that for any dumbbell separating x and any point
u outside this ball, the dumbbell head containing x also
contains w. As in Subcase (a), we can now show that |Sc ∩
(γbc − b̃)| = O((1/f ) log γ).
Next we claim that |Sc ∩ γbc ∩ b̃| = O(1/f ). To see this,
let X ′ ⊆ X be the set of cells that contain a point in Sc ∩
γbc ∩ b̃. Note that a cell in X ′ must either overlap 2γbc
or it must have size exceeding β/d (otherwise the distance
between any pair of points in this cell that belong to Sc and
S ′ , respectively, would be less than β, a contradiction). It
follows from BBD property (iii) that |X ′ | = O(1), which
implies the desired claim.
Thus we get |Sc ∩ γbc | = O((1/f ) log γ), which implies (i).
Case 2: L < sc /16.
It is obvious that property (ii) given in Lemma 6 cannot
apply to b′′c , so either b′′c satisfies property (i) given there
or its radius must be zero. It follows that r2′′ ≤ L/(8γ).
We consider two subcases: (a) β < sc /(256γ) and (b) β ≥
sc /(256γ).
Subcase (a): β < sc /(256γ).
We set b′c = b(z, L/γ). Note that b′c satisfies property (ii),
or it is a ball of zero radius centered at a point inside cell c.
We now show that |Sc ∩ (γbc − b′c )| = O((1/f ) log γ).
Similar to Subcase (a) of Case 1, we can show that |Sc ∩
(γbc −b(z, sc /(8γ)))| = O((1/f ) log γ). We omit the straightforward details.
We next claim that |S ∩ (b(z, sc /(8γ)) − b′c )| = O(1/f ),
which would imply (ii). Towards this end, we show that
there are no points of S ′ in the region b(z, sc /(4γ)) − b′c /2.
For contradiction, assume that there is a point u ∈ S ′ in
this region. Consider a dumbbell P ∈ D that separates x
and u. Let ℓ be the length of P . By definition of well-separatedness, we get 4|xu|/5 ≤ ℓ ≤ 4|xu|/3. Applying the
triangle inequality, it is easy to see that 3L/(8γ) ≤ |xu| ≤
sc /(2γ), which implies that 3L/(10γ) ≤ ℓ ≤ 2sc /(3γ). It
can be easily checked from the construction that all cells
whose closest distance from x is less than 15γℓ can have size
at most γℓ/d. Note that the closest distance from x to cell
c is at most L + L/(8γ) ≤ 17L/16 ≤ 4γℓ. Thus the size of
cell c cannot exceed 2sc /(3d), a contradiction.
Let X ′ ⊆ X be the set of cells that overlap the region
b(z, sc /(8γ)) − b′c , and contain at least one point of S. Since
S ′ contains one point from each cell in X ′ , and there are no
points of S ′ in the region b(z, sc /(4γ)) − b′c /2, it follows that
each cell in X ′ intersects the boundary of both b(z, sc /(4γ))
and b(z, sc /(8γ)) or the boundary of both b′c and b′c /2. By
BBD property (iii), the number of such cells is bounded by
a constant. Noting that a cell in X ′ has at most 1/f points,
the desired result follows.
Subcase (b): β ≥ sc /(256γ).
Using an argument similar to Subcase (b) of Case 1, we
can show that (i) holds. We omit the straightforward details.
This completes the proof.
In view of the similarity of Lemma 7 with Lemma 4, it
should be clear that the same method for assigning representatives works for this subdivision too. We mention
that the only difference is the value of f , which we set to
$(\epsilon\gamma)^{(d-1)/2} \log \gamma$.
Given a query point q, we can determine the leaf of the
BBD tree T ′ that contains q in O(log(nγ)) time. By computing the distance from q to each of the stored representatives, we can answer queries in $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$
time. We summarize the main result of this section.
Theorem 1. Let S be a set of n points in IRd , and let
0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. We can
construct an $(O(1/(\epsilon\gamma)^{(d-1)/2}), \epsilon)$-approximate Voronoi diagram for S that consists of $O(n \epsilon^{(d-1)/2} \gamma^{3(d-1)/2} \log \gamma)$ cells,
where each cell is the difference of two cubes. Moreover,
for any query point, we can return its ǫ-NN in $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$ time. Here the constants in the O-notation
are independent of ǫ and γ.
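The query procedure can be sketched as follows. This is a minimal illustration, not the paper's data structure: point location is done by a linear scan standing in for the BBD tree, and the names `locate_cell`, `approx_nn`, and the dictionary layout of a cell (an outer box, an optional inner box, and a list of representatives) are assumptions made for the example.

```python
import math

def locate_cell(q, cells):
    """Linear-scan stand-in for BBD-tree point location (hypothetical helper).
    Each cell is an outer box minus an optional inner box."""
    for cell in cells:
        inside_outer = all(lo <= x <= hi for x, (lo, hi) in zip(q, cell["outer"]))
        inner = cell.get("inner")
        inside_inner = inner is not None and all(
            lo < x < hi for x, (lo, hi) in zip(q, inner))
        if inside_outer and not inside_inner:
            return cell
    return None

def approx_nn(q, cells):
    """Return the stored representative of q's cell that is closest to q."""
    cell = locate_cell(q, cells)
    if cell is None:
        return None
    return min(cell["reps"], key=lambda p: math.dist(q, p))

# Toy subdivision: a square with a square hole, and the hole itself.
cells = [
    {"outer": [(0, 4), (0, 4)], "inner": [(1, 2), (1, 2)], "reps": [(0, 0), (5, 5)]},
    {"outer": [(1, 2), (1, 2)], "inner": None, "reps": [(1.5, 1.5)]},
]
```

The cost of `approx_nn` after point location is proportional to the number of representatives stored with the cell, which is the $1/(\epsilon\gamma)^{(d-1)/2}$ term in the query time.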
We obtain a family of data structures that can answer ǫ-NN queries in $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$ time using space
$O(n\gamma^{d-1} \log \gamma)$. Setting γ = 2 we obtain the most space-efficient solution in this family, which we present in the following corollary.
Corollary 1.1. Given a set S of n points in IRd , we can
answer ǫ-NN queries in $O(\log n + 1/\epsilon^{(d-1)/2})$ time using a
data structure of space O(n).
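To get a feel for the tradeoff, the bounds of Theorem 1 can be evaluated numerically. The sketch below sets every hidden constant to 1 and uses base-2 logarithms, both of which are assumptions; the function name `avd_bounds` is hypothetical. It also checks that the number of cells times the number of representatives per cell reproduces the stated space bound.

```python
import math

def avd_bounds(n, eps, gamma, d):
    """Evaluate the Theorem 1 bounds with all hidden constants set to 1 (an assumption)."""
    if not (0 < eps <= 0.5 and 2 <= gamma <= 1 / eps):
        raise ValueError("need 0 < eps <= 1/2 and 2 <= gamma <= 1/eps")
    e = (d - 1) / 2
    reps_per_cell = 1 / (eps * gamma) ** e                      # t
    cells = n * eps ** e * gamma ** (3 * e) * math.log2(gamma)  # number of cells
    space = n * gamma ** (d - 1) * math.log2(gamma)             # total space
    query = math.log2(n * gamma) + reps_per_cell                # query time
    return {"t": reps_per_cell, "cells": cells, "space": space, "query": query}

low_space = avd_bounds(n=1000, eps=0.25, gamma=2, d=3)    # gamma = 2
fast_query = avd_bounds(n=1000, eps=0.25, gamma=4, d=3)   # gamma = 1/eps
```

With γ = 1/ǫ the number of representatives per cell drops to 1, while γ = 2 minimizes the space within this family.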
Remark: For the case of γ = 1/ǫ, we can omit the sampling process altogether in Lemma 7, which allows us to save
a log γ factor in space. By a straightforward extension of the
approach given in this section, we can construct a (1, ǫ)-AVD
of size $O(n/\epsilon^{d-1})$. Arya and Malamatos [1] have shown a
lower bound of $\Omega(n/\epsilon^{d-1})$ on the size of a (1, ǫ)-AVD, assuming that the cells are differences of two axis-aligned hyperrectangles, which implies that our construction has optimal
size.
4. LOWER SPACE BOUNDS BY BETTER CHARGING
Let S be a set of n points in IRd , and let 0 < ǫ ≤ 1/2 and
2 ≤ γ ≤ 1/ǫ be two real parameters. In this section we show
how to construct a (t, ǫ)-AVD, consisting of
$$O\left(n \epsilon^{\frac{d-1}{2}} \gamma^{\frac{3(d-1)}{2}} \log \frac{2}{\epsilon\gamma}\right)$$
cells, where $t = O(1/(\epsilon\gamma)^{(d-1)/2})$.
The construction proceeds as follows. First, we construct
a (t, ǫ)-AVD consisting of $O(n\gamma^{d-1})$ cells, such that the
total number of representatives summed over all cells is
$O(n\gamma^{d-1} \log(2/(\epsilon\gamma)))$. Note that for this AVD, there is a
huge gap between the number of representatives for a cell
in the worst case, which is $O(1/(\epsilon\gamma)^{(d-1)/2})$, and their number averaged over all cells, which is only O(log(2/(ǫγ))). In
the second step of the construction, we take advantage of
this fact by combining cells that have fewer representatives
into larger cells. This enables us to significantly reduce the
number of cells and leads to the desired AVD.
Let D be the set of dumbbells corresponding to the WSPD
for S, using separation factor 8. For each dumbbell P ∈ D,
define the Voronoi bisector VP , the set of balls BP , and the
sets CP′ and CP of quadtree boxes, exactly as in Section 3.1.
We store all the quadtree boxes in C = ∪P ∈D CP in a BBD
tree T . Using the same approach as in Section 3.1, we can
show that the number of leaves in T is $O(n\gamma^{d-1})$.
We now describe a crucial property satisfied by the subdivision induced by the leaves of T , which enables us to select representatives for the cells economically. The proof is
omitted due to space limitations, but is proved by methods
similar to those used in [1].
Lemma 8. Let S be a set of n points in IRd , and let γ ≥ 2
be a real parameter. It is possible to construct a subdivision
consisting of $O(n\gamma^{d-1})$ cells, where each cell is the difference
of two cubes, that satisfies the following properties. In the
following, c is any cell in the subdivision and Sc ⊆ S ∩ γbc is
the set of points that are the nearest neighbor of some point
in c.
(a) The number of cells, each of size at least s, that intersect a ball of radius r is at most $O((1 + r/s)^d)$.
(b) There is a constant k > 1 such that the ball kγbc contains at least one point of S.
(c) The cell c satisfies at least one of the following three
properties.
(i) |Sc | ≤ 1.
(ii) There exists a ball b′c such that Sc ⊆ b′c and the
ball γb′c does not overlap c.
(iii) There exists a ball b′c such that Sc ⊆ b′c . Letting r1
and r2 denote the radius of balls bc and b′c , respectively, and ℓ′ denote the minimum distance of separation between bc and b′c , then $\ell' \ge \max(r_1, r_2)$
and $\ell'/\sqrt{r_1 r_2} \ge 19\sqrt{\gamma}$.
Let X be any subdivision satisfying Lemma 8. Using Lemmas 1 and 2, we can easily show that $O(1/(\epsilon\gamma)^{(d-1)/2})$ representatives suffice for each cell. The following lemma gives a
more complicated method for choosing cell representatives
which, however, has the advantage that we can obtain a
much better bound on the total number of representatives.
Lemma 9. Let S be a set of n points in IRd , and let 0 <
ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. Let $t = O(1/(\epsilon\gamma)^{(d-1)/2})$. Any subdivision X satisfying Lemma 8 is
a (t, ǫ)-AVD such that the total number of representatives
summed over all cells of X is $O(n\gamma^{d-1} \log(2/(\epsilon\gamma)))$.
Proof. (Sketch) Let c be any cell in X and let Sc′ ⊆ S
be the set of points that are the nearest neighbor of some
point in c. Let q be any point in c. We will identify two sets
of points Rc′ ⊆ S (outer representatives) and Rc′′ ⊆ S (inner representatives), each consisting of $t = O(1/(\epsilon\gamma)^{(d-1)/2})$
points, such that NN q (Rc′ ) ≤ (1 + ǫ)NN q (Sc′ − γbc ) and
NN q (Rc′′ ) ≤ (1 + ǫ)NN q (Sc′ ∩ γbc ), where Sc′ − γbc denotes the points of Sc′ lying outside γbc . Finally, we will store
the set of representatives Rc = Rc′ ∪ Rc′′ with c. Clearly
NN q (Rc ) ≤ (1 + ǫ)NN q (S).
In this version of the paper, we only present our method
for computing Rc′ , the set of outer representatives, and show
that $\sum_{c \in X} |R'_c| = O(n\gamma^{d-1} \log(2/(\epsilon\gamma)))$. Using similar ideas,
we can compute the inner representatives and show that this
bound also applies to their total number. Details will appear
in the full version.
Letting k′ = k + 1, we have Sc′ ⊆ k′ γbc because, by
Lemma 8, the ball kγbc contains at least one point of S
and γ is at least 2. Let Sc′′ = Sc′ ∩ (k′ γbc − γbc ). Let
Uc denote the set of quadtree boxes that have the largest
size not exceeding $\epsilon\gamma^2 s_c/8$ and that contain at least one
point of Sc′′ . Clearly $|U_c| = O(1/(\epsilon\gamma)^d)$. Note that any
quadtree box u ∈ Uc can be enclosed within a ball bu of
radius $r_u \le \epsilon\gamma^2 s_c d/16 \le \gamma s_c d/16$, which implies that the
minimum distance of separation between the balls bc and
bu , denoted ℓ′u , is at least $\gamma s_c d/8$. It is now easy to verify that
$\sqrt{r_1 r_u}/(\ell'_u \sqrt{\epsilon}) = O(1)$ and $\ell'_u \ge \max(r_1, r_u)/2$, where $r_1 = s_c d/2$ is the radius of ball bc . Therefore, by Lemma 2, we
can find a set Rc,u ⊆ Sc′′ ∩ u consisting of O(1) points such
that NN q (Rc,u ) ≤ (1 + ǫ/2)NN q (Sc′′ ∩ u). It follows that
the set R̃c′ = ∪u∈Uc Rc,u has at most $O(1/(\epsilon\gamma)^d)$ points and
satisfies NN q (R̃c′ ) ≤ (1 + ǫ/2)NN q (Sc′ − γbc ).
The main difficulty with using R̃c′ as the outer representatives for c is that their number can be too large. To remedy
this, observe that c ⊆ bc and all the points in R̃c′ lie outside γbc . Thus, we can apply Lemma 1 to obtain a set
Rc′ ⊆ R̃c′ consisting of $O(1/(\epsilon\gamma)^{(d-1)/2})$ points such that
NN q (Rc′ ) ≤ (1 + ǫ/4)NN q (R̃c′ ). Thus
$$\mathrm{NN}_q(R'_c) \le (1+\epsilon/4)(1+\epsilon/2)\,\mathrm{NN}_q(S'_c - \gamma b_c) \le (1+\epsilon)\,\mathrm{NN}_q(S'_c - \gamma b_c),$$
as desired.
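For completeness, the inequality between the two approximation factors can be checked by expanding the product, using only the standing assumption 0 < ǫ ≤ 1/2:

```latex
\left(1+\frac{\epsilon}{4}\right)\left(1+\frac{\epsilon}{2}\right)
  = 1 + \frac{3\epsilon}{4} + \frac{\epsilon^{2}}{8}
  \le 1 + \frac{3\epsilon}{4} + \frac{\epsilon}{16}
  \le 1 + \epsilon .
```

The middle step uses $\epsilon^2/8 \le \epsilon/16$, which holds because ǫ ≤ 1/2.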
We next present a charging argument to show that
$$\sum_{c \in X} |R'_c| = O\left(n\gamma^{d-1} \log \frac{2}{\epsilon\gamma}\right).$$
Let D be the set of dumbbells corresponding to the WSPD
for S, using separation factor 8. Each dumbbell P ∈ D
allocates a unit charge to each cell in a set C̃P ⊆ X defined as follows. Let X ⊆ S and Y ⊆ S denote the sets of
points that are enclosed within the two heads of P . Let
x and y denote the centers of the heads enclosing X and
Y , respectively. Let ℓ = |xy|, let z denote the center of
the segment xy, and let B̃P denote the set of balls of radius $2^i \ell$, for $0 \le i \le \lceil \log((40k'd)/(\epsilon\gamma)) \rceil$, centered at z. Let
VP denote the Voronoi bisector of X and Y . For a ball
b ∈ B̃P , let C̃b ⊆ X be the set of cells overlapping b ∩ VP
that have size at least rb /(8k′ γd), where rb denotes the radius of b. Finally, we let C̃P = ∪b∈B̃P C̃b . We claim that: (i)
the total charge allocated to all the cells in X by the above
procedure is $O(n\gamma^{d-1} \log(2/(\epsilon\gamma)))$ and (ii) each cell c ∈ X
receives enough charge to pay for the representatives stored
with c.
To prove (i), note that Lemma 5 implies that for any ball
b ∈ B̃P , |C̃b | is $O(\gamma^{d-1})$. Since the number of balls in B̃P is
O(log(2/(ǫγ))), it follows that each dumbbell allocates a unit
charge to at most $O(\gamma^{d-1} \log(2/(\epsilon\gamma)))$ cells in X . Recalling
that the number of dumbbells in D is O(n), (i) follows.
To prove (ii), we will show that cell c receives charge from
at least Ω(|R̃c′ |) dumbbells. In light of (i) and the fact that
Rc′ ⊆ R̃c′ , this would clearly imply the desired bound on
$\sum_{c \in X} |R'_c|$. We first identify a set of points R̂c′ ⊆ R̃c′ such
that |R̂c′ | = Ω(|R̃c′ |) and the distance between any pair of
points in R̂c′ is at least $\epsilon\gamma^2 s_c/16$. We can obtain the set
R̂c′ using the following procedure. We start with R̂c′ being
empty and consider the quadtree boxes in Uc one by one. For
each quadtree box u examined, we add any one point of Rc,u
to R̂c′ , and eliminate the at most $3^d$ quadtree boxes in Uc that
share a common boundary with u from future consideration.
We continue in this manner until all the quadtree boxes are
eliminated. It is clear that this process yields a set R̂c′ with
the properties mentioned above. We next show that there
are at least |R̂c′ | − 1 distinct dumbbells that separate pairs
of points in R̂c′ , and that each of these dumbbells allocates
a unit charge to cell c. Clearly this would imply (ii).
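The box-elimination procedure can be sketched on an integer grid. This is an illustrative model only: boxes are represented by hypothetical integer grid coordinates, all of the same size, and the function name `greedy_separated_boxes` is an assumption. Picking one point from each chosen box then gives points whose pairwise distance is at least one box side.

```python
from itertools import product

def greedy_separated_boxes(boxes):
    """boxes: set of integer grid coordinates (tuples) of equal-size boxes.
    Returns a subset in which no two boxes share a common boundary."""
    remaining = set(boxes)
    chosen = []
    for box in sorted(boxes):          # any fixed examination order works
        if box not in remaining:
            continue                   # already eliminated by a chosen neighbour
        chosen.append(box)
        dim = len(box)
        # Eliminate the box itself and its grid neighbours (at most 3^d boxes).
        for offset in product((-1, 0, 1), repeat=dim):
            remaining.discard(tuple(b + o for b, o in zip(box, offset)))
    return chosen

boxes = {(i, j) for i in range(6) for j in range(6)}   # a 6 x 6 block of boxes
chosen = greedy_separated_boxes(boxes)
```

Since each chosen box eliminates at most $3^d$ boxes, the number of chosen boxes is at least a $1/3^d$ fraction of the input, which is the Ω(·) relation used in the proof.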
Consider the following process for identifying a set of
dumbbells. At each step, we find the dumbbell in D that
separates the closest pair of points among the remaining
points in R̂c′ . We then eliminate one of these two points and
continue this process with the remaining points. We stop
when only one point in R̂c′ remains. Obviously this process finds a set of |R̂c′ | − 1 dumbbells. Noting that, at each
step, each head of the dumbbell identified contains exactly
one point among the points in R̂c′ that have not yet been
eliminated, it is clear that all the dumbbells obtained are
distinct.
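The elimination process can be sketched as follows. The sketch records point pairs rather than dumbbells, since the WSPD lookup that maps a pair to its separating dumbbell is not modelled here; the function name `closest_pair_elimination` and the brute-force closest-pair search are assumptions made for brevity.

```python
import math
from itertools import combinations

def closest_pair_elimination(points):
    """Repeatedly take the closest remaining pair and drop one of its points.
    Returns the list of pairs found; there are len(points) - 1 of them."""
    remaining = list(points)
    pairs = []
    while len(remaining) > 1:
        p, q = min(combinations(remaining, 2), key=lambda pq: math.dist(*pq))
        pairs.append((p, q))
        remaining.remove(p)   # eliminate one endpoint of the pair
    return pairs

pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
pairs = closest_pair_elimination(pts)
```

Each step removes exactly one point, so the loop runs once per point except the last, matching the count of |R̂c′ | − 1 in the text.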
It remains to show that each of these |R̂c′ | − 1 dumbbells
allocates a charge to cell c. In fact, we will show that any
dumbbell P that separates two points x, y ∈ R̂c′ allocates
a charge to c. Observe first that VP must intersect c since
x and y are both the nearest neighbor of some point in
c. Let x′ and y ′ denote the centers of the heads of P , let
z denote the center of the line segment x′ y ′ , and let ℓ =
|x′ y ′ |. Using the definition of well-separatedness and the
triangle inequality, it follows that |xx′ | ≤ ℓ/8, |yy ′ | ≤ ℓ/8,
and 4|xy|/5 ≤ ℓ ≤ 4|xy|/3. Since x and y are both contained
in k′ γbc , it is easy to show that both ℓ, and the distance
between z and any point in c, are less than 2k′ γsc d. Also,
since the distance between x and y is at least $\epsilon\gamma^2 s_c/16$, it
follows that $\ell \ge \epsilon\gamma^2 s_c/20$. This implies that the radius of
the largest ball in B̃P is at least 2k′ γsc d, and so there must
be a ball in B̃P that overlaps c ∩ VP . Let rb denote the
radius of the smallest ball b ∈ B̃P that overlaps c ∩ VP . If
b is the smallest ball in B̃P , then its radius rb is ℓ, which
is less than 2k′ γsc d. Otherwise, rb cannot exceed twice the
closest distance between z and c ∩ VP , which implies that rb
is less than 4k′ γsc d. Recalling that P allocates a charge to
all cells overlapping b ∩ VP that have size at least rb /(8k′ γd),
it is clear that c receives a charge from P . This completes
the proof.
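The claim about the largest charging ball can be checked numerically. The sketch below is illustrative: it assumes base-2 logarithms (natural here because the radii double), and the function name `charging_ball_radii` and all numeric values are hypothetical.

```python
import math

def charging_ball_radii(ell, eps, gamma, d, k_prime):
    """Radii 2^i * ell for 0 <= i <= ceil(log2(40 k' d / (eps * gamma)))."""
    top = math.ceil(math.log2(40 * k_prime * d / (eps * gamma)))
    return [ell * 2 ** i for i in range(top + 1)]

# Hypothetical values; s_c is the size of cell c and k' = k + 1.
eps, gamma, d, k_prime, s_c = 0.1, 4.0, 2, 3, 1.0
ell_min = eps * gamma ** 2 * s_c / 20        # lower bound on the dumbbell length
radii = charging_ball_radii(ell_min, eps, gamma, d, k_prime)
needed = 2 * k_prime * gamma * s_c * d       # radius that is guaranteed to reach c
```

In general the largest radius is at least $(40k'd/(\epsilon\gamma)) \cdot \epsilon\gamma^2 s_c/20 = 2k'\gamma s_c d$, which is the bound used in the proof.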
To complete the construction, we weight each cell of this
AVD with the associated number of representatives. We
then build a weighted BBD tree for these cells. Letting
$t = 1/(\epsilon\gamma)^{(d-1)/2}$, we apply property (v) of BBD trees to
produce a truncated tree having
$$O\left(n \epsilon^{\frac{d-1}{2}} \gamma^{\frac{3(d-1)}{2}} \log \frac{2}{\epsilon\gamma}\right)$$
cells. For each cell in the resulting tree, the number of
representatives is no greater than the sum of its associated
weights. From this we have the following result.
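As a sanity check on the cell count, suppose that property (v) yields a truncated tree whose number of cells is proportional to the total weight divided by t. Property (v) is stated elsewhere in the paper, so this reading is an assumption. Under it, the bound follows from the total weight given by Lemma 9 by direct arithmetic:

```latex
\frac{O\!\left(n\gamma^{d-1}\log\frac{2}{\epsilon\gamma}\right)}{1/(\epsilon\gamma)^{(d-1)/2}}
 = O\!\left(n\,\gamma^{d-1}\,(\epsilon\gamma)^{\frac{d-1}{2}}\log\frac{2}{\epsilon\gamma}\right)
 = O\!\left(n\,\epsilon^{\frac{d-1}{2}}\gamma^{\frac{3(d-1)}{2}}\log\frac{2}{\epsilon\gamma}\right).
```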
Theorem 2. Let S be a set of n points in IRd , and let
0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. We
can construct an $(O(1/(\epsilon\gamma)^{(d-1)/2}), \epsilon)$-approximate Voronoi
diagram for S that consists of
$$O\left(n \epsilon^{\frac{d-1}{2}} \gamma^{\frac{3(d-1)}{2}} \log \frac{2}{\epsilon\gamma}\right)$$
cells, where each cell is the difference of two cubes. Moreover, for any query point, we can return its ǫ-NN in time
$O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$. Here the constants in the O-notation are independent of ǫ and γ.
5. LOWER BOUNDS
Our lower bound construction is parameterized by the
number of points n, the dimension d, the approximation
factor ǫ and the desired number of representatives per cell
t. Intuitively, to make the number of cells large we must
create a situation in which many cells are small. To make
cells as small as possible, we distribute t + 1 sites evenly on
a (k − 1)-dimensional sphere centered at the origin. The set
of points in IRd that are equidistant from these points lies
on a linear subspace J of dimension d − k. Any cell that
has a significant overlap with J has t + 1 contenders for the
nearest neighbor for each point in the cell. If this cell is
relatively close to the origin we can derive an upper bound
on the size of such a cell. We will see that as k increases this
size bound decreases. However, as k increases, the dimension of J decreases, and hence the number of overlapping
cells decreases. Thus we face a tradeoff. Our construction
takes the dimension k as a parameter, and then derives the
value of k that produces the best lower bound. We make
⌊n/(t + 1)⌋ widely distributed copies of this configuration
to produce the final lower bound.
The main results of this section are provided below. The
aspect ratio of a rectangle is defined to be the ratio between
its longest and shortest side lengths.
Theorem 3. Consider AVDs in which each cell is an
axis-aligned box of bounded aspect ratio or the difference of
two such boxes.
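(i) Given t and ǫ, there exists a set of n sites in IRd such
that the number of cells in any (t, ǫ)-AVD for this set
is
$$\Omega\left(n \left(\frac{1}{\epsilon}\right)^{(d-1)+c-2\sqrt{2c(d-1)}}\right),$$
where $c = -\frac{\ln t}{\ln \epsilon}$ and $c \le \frac{d-1}{2}$.
(ii) The ratio between the upper bound provided in Theorem 1 and this lower bound is
$$O\left(\left(\frac{1}{\epsilon}\right)^{2\sqrt{2c(d-1)}-4c} \log \gamma\right) \le O\left(\left(\frac{1}{\epsilon}\right)^{\frac{d-1}{2}} \log \gamma\right).$$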
Remark: Recalling the observation that the log γ factor in
the upper bound can be avoided if γ = 1/ǫ, it follows that in
the extreme case t = 1 (c = 0) the upper and lower bounds
match asymptotically. Also, when γ is a constant, the factor
log γ can be ignored, and hence it follows that in the other
extreme case $t = 1/\epsilon^{(d-1)/2}$ (c = (d − 1)/2) the upper and
lower bounds match.
The remainder of this section is devoted to proving Theorem 3. Define H to be the linear subspace that is orthogonal
to the diagonal vector whose coordinates are all 1, that is,
$\vec{w} = (w_1, \ldots, w_d) \in H$ if $\sum_i w_i = 0$. Let k, 1 ≤ k ≤ d − 1,
be an integer parameter to be fixed later, and let J be some
(d − k)-dimensional linear subspace of H. Let K be the k-dimensional
orthogonal complement of J. Let U be the
intersection of the unit sphere centered at the origin with K. The principal properties of J, K and U are the following.
(i) The angle between any coordinate vector and a nonzero
vector of H is bounded below by a constant (depending on dimension). Because J ⊆ H, this applies to J
as well.
(ii) Given any set of points on U , every point on J is
equidistant from all of these points.
(iii) A set of points is δ-sparse if for each point in the set
its nearest neighbor in the set is at distance at least δ.
From standard results it follows that the unit sphere
U has a θ-sparse set of cardinality $\Omega(1/\theta^{k-1})$.
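Property (ii) can be verified on a small instance. The sketch below fixes d = 3 and k = 2, which is an illustrative choice and not a case singled out in the paper; the basis vectors and the function name `equidistance_demo` are assumptions. Here J is a line inside H, K is the plane orthogonal to J, and U is the unit circle in K.

```python
import math

def equidistance_demo(t, s):
    """d = 3, k = 2 instance (an illustrative choice, not from the paper).
    J is spanned by j = (1,-1,0)/sqrt(2), which lies in H (coordinates sum to 0).
    K is the orthogonal complement of J, spanned by e1 and e2 below.
    Returns the distances from the point s*j on J to t+1 points on U."""
    j = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    e1 = (1 / math.sqrt(3), 1 / math.sqrt(3), 1 / math.sqrt(3))
    e2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    q = tuple(s * c for c in j)
    dists = []
    for i in range(t + 1):
        a = 2 * math.pi * i / (t + 1)      # evenly spaced on the unit circle U
        u = tuple(math.cos(a) * x + math.sin(a) * y for x, y in zip(e1, e2))
        dists.append(math.dist(q, u))
    return dists

dists = equidistance_demo(t=4, s=0.7)
```

Because q is orthogonal to every point u on U, the squared distance is $|q|^2 + 1$ regardless of which point of U is chosen, so all t + 1 sites are equally good candidates for the nearest neighbor of q.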
Recalling that t is the number of representatives, define θ
to be the maximum value such that there is a θ-sparse set on U of size
at least t + 1. Let Sk denote this set. By the observation
above, θ is $\Omega(1/t^{1/(k-1)})$. Let $\alpha = 18\epsilon/\theta^2$. We show that no
cell of an AVD of S can contain a large ball centered on J
and close to the origin. Let BJ be the intersection of a ball
of unit radius centered at the origin and J.
Lemma 10. Consider a (t, ǫ)-AVD for the set S, whose
cells are the differences of axis-aligned rectangles of bounded
aspect ratio. Consider a ball b of radius α centered at some
point of BJ . Then no cell of the AVD can contain b.
To sketch the proof we observe that if such a cell existed,
then by property (ii) every point of S is a nearest neighbor
to some point in the cell. Since S has t + 1 points, one
point p ∈ S is not a representative for this cell. Consider
the point q on the boundary of the ball that is closest to p.
Because S is θ-sparse, and the cell is relatively close to S, it
can be shown that the distance of every other point in S to
q exceeds the length |qp| by ǫ, providing a contradiction.
Let C denote the set of cells of the (t, ǫ)-AVD that intersect
BJ . For each c ∈ C, define cα to be the set of points of the
interior of c that are at distance at least α from c’s boundary.
Clearly, if cα intersects BJ , then c contains a ball of radius
α that is centered at a point in BJ , and by the previous
lemma, no such cell can be in the AVD. Otherwise, by the
fact that AVD cells are fat and axis-aligned and by property
(i) above, it follows that the diameter of the intersection of
any cell and BJ is O(α). By applying a simple packing
argument to BJ it follows that the number of cells of the
AVD that intersect BJ is at least
$$\Omega\left(\left(\frac{1}{\alpha}\right)^{d-k}\right) = \Omega\left(\left(\frac{\theta^2}{\epsilon}\right)^{d-k}\right) = \Omega\left(\left(\frac{1}{t^{2/(k-1)}\epsilon}\right)^{d-k}\right),$$
for $t \le (1/\epsilon)^{(k-1)/2}$. To complete the construction, we replicate this configuration of points ⌊n/(t + 1)⌋ times at a sufficiently large distance from each other. Up to a constant
factor depending on d, the total number of cells is at least
$$L = \frac{n}{t}\left(t^{2/(k-1)}\epsilon\right)^{k-d}.$$
To simplify this expression, let us express t as $(1/\epsilon)^c$ for
some constant c. Our constraint on t implies that c ≤ (k −
1)/2. Thus we have the following lower bound.
$$L = n\left(\frac{1}{\epsilon}\right)^{(d-k)\left(1-\frac{2c}{k-1}\right)-c} = n\left(\frac{1}{\epsilon}\right)^{(d-1)+c-(k-1)-2c\frac{d-1}{k-1}}.$$
Since k is under our control, to get the best lower bound
we maximize the exponent, by setting $k - 1 = \sqrt{2c(d-1)}$.
It is easy to verify that if c ≤ (d − 1)/2 then this satisfies
our prior constraint on t. As a result we have the following
lower bound, which establishes Theorem 3(i).
$$L = n\left(\frac{1}{\epsilon}\right)^{(d-1)+c-2\sqrt{2c(d-1)}}.$$
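The choice of k can be checked numerically. The sketch below evaluates the exponent of 1/ǫ for every integer k and compares it with the closed form obtained from $k - 1 = \sqrt{2c(d-1)}$. The function names and the values d = 33, c = 4 are hypothetical, chosen so that the optimal k is an integer; in general k must be rounded, which this sketch ignores.

```python
import math

def exponent(d, c, k):
    """Exponent of 1/eps in L = n (1/eps)^{(d-k)(1 - 2c/(k-1)) - c}, for k >= 2."""
    return (d - k) * (1 - 2 * c / (k - 1)) - c

def best_real_k(d, c):
    """Real-valued maximizer k = 1 + sqrt(2c(d-1)); integrality of k is ignored."""
    return 1 + math.sqrt(2 * c * (d - 1))

d, c = 33, 4.0                      # hypothetical values with c <= (d-1)/2
k_star = best_real_k(d, c)          # 1 + sqrt(256) = 17
closed_form = (d - 1) + c - 2 * math.sqrt(2 * c * (d - 1))
```

Writing x = k − 1, the exponent equals $(d-1) + c - x - 2c(d-1)/x$, and $x + 2c(d-1)/x$ is minimized at $x = \sqrt{2c(d-1)}$ by the arithmetic-geometric mean inequality.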
From Theorem 1 it follows that, ignoring constant factors
and the log γ term, there exists a (t, ǫ)-AVD for our point set
with the following number of cells: $U = n\epsilon^{(d-1)/2}\gamma^{3(d-1)/2}$.
To achieve t representatives we set $\gamma = (1/\epsilon)(1/t)^{2/(d-1)}$ and
set $t = (1/\epsilon)^c$ to get
$$U = n\left(\frac{1}{\epsilon}\right)^{(d-1)-3c}.$$
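The substitution behind this expression can be written out step by step, using only the definitions of γ and t given above:

```latex
\gamma = \frac{1}{\epsilon}\left(\frac{1}{t}\right)^{\frac{2}{d-1}}
       = \left(\frac{1}{\epsilon}\right)^{1-\frac{2c}{d-1}},
\qquad
U = n\,\epsilon^{\frac{d-1}{2}}\gamma^{\frac{3(d-1)}{2}}
  = n\left(\frac{1}{\epsilon}\right)^{-\frac{d-1}{2}+\frac{3(d-1)}{2}-3c}
  = n\left(\frac{1}{\epsilon}\right)^{(d-1)-3c}.
```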
To compare the upper and lower bound, we compute their
ratio
$$\rho(c) = \frac{U}{L} = \left(\frac{1}{\epsilon}\right)^{2\sqrt{2c(d-1)}-4c}.$$
To determine the worst-case ratio as a function of t, we
maximize this expression over c, yielding c = (d − 1)/8 (and
hence $t = (1/\epsilon)^{(d-1)/8}$). Note that this satisfies our earlier
requirement that c ≤ (d−1)/2. This yields the following upper bound on the ratio between the upper and lower bounds.
$$\rho(c) \le \left(\frac{1}{\epsilon}\right)^{(d-1)/2}.$$
This establishes Theorem 3(ii).
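The maximization over c is a one-variable calculus exercise on the exponent of ρ(c), written here as g(c) (the name g is introduced only for this derivation):

```latex
g(c) = 2\sqrt{2c(d-1)} - 4c, \qquad
g'(c) = \sqrt{\frac{2(d-1)}{c}} - 4 = 0
\;\Longrightarrow\; c = \frac{d-1}{8},
\qquad
g\!\left(\frac{d-1}{8}\right) = 2\cdot\frac{d-1}{2} - \frac{d-1}{2} = \frac{d-1}{2}.
```

Since g is concave in c, this stationary point is the maximum.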
6. REFERENCES
[1] S. Arya and T. Malamatos. Linear-size approximate
Voronoi diagrams. In Proc. 13th ACM-SIAM Sympos.
Discrete Algorithms, pages 147–155, 2002.
[2] S. Arya and D. M. Mount. Approximate range
searching. Computational Geometry: Theory and
Applications, 17:135–152, 2000.
[3] S. Arya, D. M. Mount, N. Netanyahu, R. Silverman,
and A. Y. Wu. An optimal algorithm for approximate
nearest neighbor searching in fixed dimensions. J.
ACM, 45:891–923, 1998.
[4] P. B. Callahan and S. R. Kosaraju. A decomposition
of multidimensional point sets with applications to
k-nearest-neighbors and n-body potential fields. J.
ACM, 42:67–90, 1995.
[5] T. M. Chan. Approximate nearest neighbor queries
revisited. Discrete Comput. Geom., 20:359–373, 1998.
[6] K. L. Clarkson. An algorithm for approximate
closest-point queries. In Proc. 10th Annu. ACM
Sympos. Comput. Geom., pages 160–164, 1994.
[7] R. M. Dudley. Metric entropy of some classes of sets
with differentiable boundaries. J. Approx. Theory,
10:227–236, 1974.
[8] C. A. Duncan, M. T. Goodrich, and S. G. Kobourov.
Balanced aspect ratio trees: Combining the
advantages of k-d trees and octrees. J. Algorithms,
33:303–333, 2001.
[9] S. Har-Peled. A replacement for Voronoi diagrams of
near linear size. In Proc. 42nd Annu. IEEE Sympos.
Found. Comput. Sci., pages 94–103, 2001.
[10] P. Indyk and R. Motwani. Approximate nearest
neighbors: Towards removing the curse of
dimensionality. In Proc. 30th Annu. ACM Sympos.
Theory Comput., pages 604–613, 1998.
[11] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient
search for approximate nearest neighbor in high
dimensional spaces. In Proc. 30th Annu. ACM
Sympos. Theory Comput., pages 614–623, 1998.
[12] Y. Sabharwal, S. Sen, and N. Sharma. Improved space
bound for approximate Voronoi diagram. Manuscript,
2001.
[13] J. Vleugels and M. Overmars. Approximating Voronoi
diagrams of convex sites in any dimension. Internat. J.
Comput. Geom. Appl., 8:201–222, 1998.