
Space-efficient approximate Voronoi diagrams

2002

Sunil Arya, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Theocharis Malamatos∗, Department of Computer Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
David M. Mount†, Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742

ABSTRACT

Given a set $S$ of $n$ points in $\mathbb{R}^d$, a $(t, \epsilon)$-approximate Voronoi diagram (AVD) is a partition of space into constant complexity cells, where each cell $c$ is associated with $t$ representative points of $S$, such that for any point in $c$, one of the associated representatives approximates the nearest neighbor to within a factor of $(1 + \epsilon)$. Like the Voronoi diagram, this structure defines a spatial subdivision. It also has the desirable properties of being easy to construct and providing a simple and practical data structure for answering approximate nearest neighbor queries. The goal is to minimize the number and complexity of the cells in the AVD. We assume that the dimension $d$ is fixed. Given a real parameter $\gamma$, where $2 \le \gamma \le 1/\epsilon$, we show that it is possible to construct a $(t, \epsilon)$-AVD consisting of $O(n \epsilon^{(d-1)/2} \gamma^{3(d-1)/2} \log \gamma)$ cells for $t = O(1/(\epsilon\gamma)^{(d-1)/2})$. This yields a data structure of $O(n \gamma^{d-1} \log \gamma)$ space (including the space for representatives) that can answer $\epsilon$-NN queries in time $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$. (Hidden constants may depend exponentially on $d$, but do not depend on $\epsilon$ or $\gamma$.) In the case $\gamma = 1/\epsilon$, we show that the additional $\log \gamma$ factor in space can be avoided, and so we have a data structure that answers $\epsilon$-approximate nearest neighbor queries in time $O(\log(n/\epsilon))$ with space $O(n/\epsilon^{d-1})$, improving upon the best known space bounds for this query time.
In the case $\gamma = 2$, we have a data structure that can answer approximate nearest neighbor queries in $O(\log n + 1/\epsilon^{(d-1)/2})$ time using optimal $O(n)$ space. This dramatically improves the previous best space bound for this query time by a factor of $O(1/\epsilon^{(d-1)/2})$. We also provide lower bounds on the worst-case number of cells assuming that cells are axis-aligned rectangles of bounded aspect ratio. In the important extreme cases $\gamma \in \{2, 1/\epsilon\}$, our lower bounds match our upper bounds asymptotically. For intermediate values of $\gamma$ we show that our upper bounds are within a factor of $O((1/\epsilon)^{(d-1)/2} \log \gamma)$ of the lower bound.

∗ The work of the first two authors was supported by the Research Grants Council of Hong Kong, China under project number HKUST 6158/98E.
† This author's work was supported in part by the National Science Foundation under grant CCR-0098151.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. STOC'02, May 19-21, 2002, Montreal, Quebec, Canada. Copyright 2002 ACM 1-58113-495-9/02/0005 ...$5.00.

1. INTRODUCTION

Given a set $S$ of $n$ points in $\mathbb{R}^d$, called sites, the Voronoi diagram of $S$ is a partition of space into cells, such that each cell is the region of space consisting of all points that are closer to a particular site than to any other. Voronoi diagrams are among the most fundamental and well-studied objects in computational geometry. Voronoi diagrams have numerous applications in areas such as pattern recognition and classification, machine learning, robotics, and graphics.
Many of these applications are in high dimensions but, unfortunately, the complexity of Voronoi diagrams can be as high as $n^{\lceil d/2 \rceil}$ in dimension $d$. Constructing Voronoi diagrams can be complicated due either to numerical inaccuracies or degeneracies, which result when points are cocircular or nearly so. Also, although Voronoi diagrams implicitly encode the information of which site is closest to a given point, by themselves they are not suitable for use as a data structure for nearest neighbor searching. These shortcomings have led to consideration of simpler structures that can be used in place of the Voronoi diagram.

We begin with some definitions. For a real parameter $\epsilon > 0$, we say that a point $p \in S$ is an $\epsilon$-nearest neighbor ($\epsilon$-NN) of a point $q \in \mathbb{R}^d$, if the distance between $q$ and $p$ is at most $(1 + \epsilon)$ times the distance between $q$ and its nearest neighbor in $S$. We assume that distances are measured in the Euclidean metric. An approximate Voronoi diagram (AVD) of $S$ is defined to be a partition of space into cells, where each cell $c$ is associated with a representative $r_c \in S$, such that $r_c$ is an $\epsilon$-NN for all the points in $c$ [9]. More generally we may allow up to some given $t \ge 1$ representatives to be stored with each cell, and require that for any point in the cell, one of these $t$ representatives is an $\epsilon$-NN. We refer to such a decomposition as a $(t, \epsilon)$-approximate Voronoi diagram.

Vleugels and Overmars [13] considered approximating the Voronoi diagram of a disjoint set of convex sites in $\mathbb{R}^d$ for the purpose of motion planning. Har-Peled [9] showed how to construct a $(1, \epsilon)$-AVD of $O((n/\epsilon^d)(\log n) \log(n/\epsilon))$ size. A cell in this subdivision is the difference of two axis-aligned cubes (the inner cube is optional). Moreover, these cells are stored in a compressed quadtree data structure, which provides a simple and efficient method for answering $\epsilon$-NN queries in $O(\log(n/\epsilon))$ time.
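The definitions of an $\epsilon$-NN and of a cell's representatives can be made concrete with a brute-force check. The following is a minimal sketch, assuming points are coordinate tuples; the function names `is_eps_nn` and `cell_is_valid` are illustrative and do not come from the paper, and the check only samples query points of a cell rather than proving the property for the whole cell.

```python
import math

def is_eps_nn(p, q, sites, eps):
    """True if site p is an eps-nearest neighbor of query q among sites."""
    nearest = min(math.dist(q, s) for s in sites)
    return math.dist(q, p) <= (1.0 + eps) * nearest

def cell_is_valid(representatives, sample_queries, sites, eps):
    """Spot-check the (t, eps)-AVD condition on sampled query points of a cell.

    For every sampled query point, at least one of the cell's
    representatives must be an eps-NN of that point.
    """
    return all(
        any(is_eps_nn(r, q, sites, eps) for r in representatives)
        for q in sample_queries
    )

sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
query = (0.45, 0.0)   # distance 0.45 to (0, 0) and 0.55 to (1, 0)

print(is_eps_nn((1.0, 0.0), query, sites, eps=0.25))
print(is_eps_nn((1.0, 0.0), query, sites, eps=0.10))
```

With $\epsilon = 0.25$ the site $(1, 0)$ qualifies because $0.55 \le 1.25 \cdot 0.45$, while with $\epsilon = 0.1$ it does not.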
This approach is very appealing because, like the Voronoi diagram, it defines a subdivision of space, but there are no problems with geometric degeneracies, and a point-location structure is provided. The construction and search structure are very simple and practical. The principal shortcoming of Har-Peled's data structure is its size. Sabharwal et al. [12] reduced the size by a logarithmic factor. Arya and Malamatos [1] showed that it is possible to construct a $(1, \epsilon)$-AVD of $O(n/\epsilon^d)$ size, and they provided a lower bound of $\Omega(n/\epsilon^{d-1})$ on the size of a $(1, \epsilon)$-AVD. As with Har-Peled, their cells are differences of two axis-aligned hyperrectangles. Further, they introduced the notion of allowing multiple representatives per cell and showed that given a real parameter $2 \le \gamma \le 1/\epsilon$, it is possible to construct a $(t, \epsilon)$-AVD of $O(n\gamma^d)$ size, where $t = O(1/(\epsilon\gamma)^{(d-1)/2})$. By varying $\gamma$ it is possible to achieve a tradeoff in space and query time.

To understand the significance of this line of work, it is useful to briefly review the evolution of results in the field of approximate nearest neighbor searching in Euclidean space. There have been two principal approaches. One direction has focused on eliminating exponential dependencies on dimension, as characterized by the work of Indyk and Motwani [10] and Kushilevitz et al. [11]. Although these methods provide space and query times without exponential dependence on dimension, their space requirements grow as $n^{O(1)}$, which may be too high for many applications. The second direction, which has been developed in computational geometry, is to assume that $d$ is a small constant and $n$ is large. The resulting data structures have sizes that are nearly linear in $n$ but may have factors in space and query time that grow as $(1/\epsilon)^{O(d)}$. Arya et al. [3] and, later, Duncan et al. [8] provided data structures that achieve $O((1/\epsilon)^d \log n)$ query time and use $O(n)$ space (independent of $\epsilon$).
These structures were optimal with respect to space, but the $\epsilon$ factors in the query times were far from optimal. The cells of the Voronoi diagram are convex polytopes, and results on approximating convex polytopes by Dudley [7] suggest that $(1/\epsilon)^{(d-1)/2}$ should be the proper bound. Indeed, Clarkson [6] showed that queries could be answered in $O((1/\epsilon)^{d/2} \log n)$ time. However, Clarkson's space bounds were larger by a factor of at least $(1/\epsilon)^{d/2}$ and involved a parameter that depended on the geometric arrangement of the sites. Through an impressive combination of structures, Chan [5] showed that the query time could be reduced to $O((1/\epsilon)^{(d-1)/2} \log n)$ using a data structure with space $O((1/\epsilon)^{(d-1)/2} n \log n)$. The recent results of Arya and Malamatos [1] further tightened these bounds by showing that the query time could be improved to $O(\log n + 1/\epsilon^{(d-1)/2})$ and removed the log factor from the space bound. Finally, it seems that the constant factors are coming exactly in line with bounds suggested by Dudley's results.

The one remaining deficiency is the significant factor of $O((1/\epsilon)^{(d-1)/2})$ in the space bound. In this paper we show that this factor can be eliminated altogether. In particular we provide a data structure of size $O(n)$ that can answer approximate nearest neighbor queries in $O(\log n + 1/\epsilon^{(d-1)/2})$ time. Thus we simultaneously provide optimal space bounds while matching the query time suggested by Dudley's results. Our approach is based on a more space-efficient version of approximate Voronoi diagrams. Our approach differs from those of Har-Peled and Arya and Malamatos principally in how cells are generated and how representatives are chosen. Otherwise, it produces an AVD data structure with the same desirable characteristics.
More specifically, our main result is that, given a set of $n$ sites $S$ in $\mathbb{R}^d$, $0 < \epsilon \le 1/2$, and a parameter $2 \le \gamma \le 1/\epsilon$, we show how to construct a $(t, \epsilon)$-AVD, where $t = O(1/(\epsilon\gamma)^{(d-1)/2})$, consisting of $O(n \epsilon^{(d-1)/2} \gamma^{3(d-1)/2} \log \gamma)$ cells. This yields a data structure of $O(n \gamma^{d-1} \log \gamma)$ space (including the space for representatives) that can answer $\epsilon$-NN queries in time $O(\log(n\gamma) + 1/(\epsilon\gamma)^{(d-1)/2})$. The bounds described above on approximate nearest neighbor searching arise with $\gamma = 2$. In the case $\gamma = 1/\epsilon$, the additional $\log \gamma$ factor in space can be avoided, and we have a data structure that answers queries in time $O(\log(n/\epsilon))$ with space $O(n/\epsilon^{d-1})$. These results are presented in Theorem 1 and Corollary 1.1 in Section 3. Note that our bounds are superior to those of [9] and [1] for all values of $\gamma$.

We believe that the factor of $\log \gamma$ in the number of cells is an artifact of our proof technique. As evidence of this we provide a significantly different construction and a different analysis based on a charging argument in which this $\log \gamma$ factor in the number of cells is replaced by a factor of $\log(2/(\epsilon\gamma))$. We feel that this alternative construction may be of independent interest. For $\gamma > \sqrt{1/\epsilon}$ this provides a superior space bound. These results are presented in Theorem 2 in Section 4.

We also present the first lower bounds on the worst-case number of cells in a $(t, \epsilon)$-AVD for multiple representatives. We assume that cells are based on axis-aligned rectangles of bounded aspect ratio. These results show that our space bounds are tight in the two extremes, $\gamma = 2$ and $\gamma = 1/\epsilon$. For all intermediate values of $\gamma$ our upper bounds on space are tight to within a factor of $O((1/\epsilon)^{(d-1)/2} \log \gamma)$. These results are presented in Section 5.

Our results come about from a few innovations. The principal reduction in space arises from a form of deterministic sampling based on the BBD-tree.
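As a consistency check on the main result, the space bound can be read as the number of cells times the number of representatives per cell. The following is a sketch of that arithmetic, under the assumption that each cell stores its $t$ representatives explicitly and that the point-location structure contributes only lower-order space:

```latex
\begin{align*}
\underbrace{O\!\left(n\,\epsilon^{(d-1)/2}\,\gamma^{3(d-1)/2}\log\gamma\right)}_{\text{number of cells}}
\cdot
\underbrace{O\!\left(\frac{1}{(\epsilon\gamma)^{(d-1)/2}}\right)}_{t}
&= O\!\left(n\,\gamma^{\,3(d-1)/2-(d-1)/2}\log\gamma\right) \\
&= O\!\left(n\,\gamma^{d-1}\log\gamma\right).
\end{align*}
```

Setting $\gamma = 2$ gives $O(n)$ space with $t = O(1/\epsilon^{(d-1)/2})$, which agrees with the query time $O(\log n + 1/\epsilon^{(d-1)/2})$ stated for that case.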
By applying the construction of Arya and Malamatos on an appropriately sampled subset of points, we can decrease the number of cells significantly while increasing the number of representatives by only a constant factor. The main difficulty is avoiding an additional factor of $O(\log n)$ in the number of representatives, as would be expected from well known results on $\epsilon$-nets. Our other innovation is to create cells more economically by considering a well-separated pair decomposition of the points, and concentrating the generation of cells along the bisector between well-separated pairs.

Our alternative approach is based on a charging argument which shows that, after certain modifications to Arya and Malamatos's construction, the number of representatives for most of the cells is significantly smaller than the maximum possible. This enables us to reduce the number of cells drastically by combining cells with fewer representatives into larger cells.

2. PRELIMINARIES

Throughout we assume that the dimension $d$ is a fixed constant, and the constants hidden in the asymptotic bounds may depend on $d$ (but not on $\epsilon$ or $\gamma$). We assume that the set $S$ of points has been scaled and translated to lie within a ball of radius $\epsilon/8$ placed at the center of the unit hypercube $[0, 1]^d$. Let $x$ and $y$ denote any two points in $\mathbb{R}^d$. We use $|xy|$ to denote the Euclidean distance between $x$ and $y$ and $\overline{xy}$ to denote the segment joining $x$ and $y$. We denote by $b(x, r)$ a ball of radius $r$ centered at $x$, i.e., $b(x, r) = \{y : |xy| \le r\}$. For a ball $b$ and any positive real $\gamma$, we use $\gamma b$ to denote the ball with the same center as $b$ and whose radius is $\gamma$ times the radius of $b$, and $\overline{b}$ to denote the set of points that are not in $b$. Given a set $X$ of points and a point $q$, let $\mathrm{NN}_q(X)$ be the distance to the nearest neighbor of $q$ in $X$. If there are no points in $X$, then $\mathrm{NN}_q(X)$ is defined to be infinity.
We briefly review the notions of well-separated pair decomposition and balanced box-decomposition trees, as they play an important role in our constructions.

2.1 The Well-Separated Pair Decomposition

Let $S$ be a set of $n$ points in $\mathbb{R}^d$. We say that two sets of points $X$ and $Y$ are well-separated if they can be enclosed within two disjoint $d$-dimensional balls of radius $r$, such that the distance between the centers of these balls is at least $\alpha r$, where $\alpha \ge 2$ is a real parameter called the separation factor. If we consider joining the centers of these two balls by a line segment, the resulting shape resembles a dumbbell. The balls are the heads of the dumbbell. Define the length of a dumbbell to be the distance between the centers of the balls. A dumbbell separates two points $x$ and $y$ if $x$ is contained in one head and $y$ in the other.

A well-separated pair decomposition (WSPD) of $S$ is a set $P_{S,\alpha} = \{(X_1, Y_1), \ldots, (X_m, Y_m)\}$ of pairs of subsets of $S$ such that (i) for $1 \le i \le m$, $X_i$ and $Y_i$ are well-separated and (ii) for any distinct points $x, y \in S$, there exists a unique pair $(X_i, Y_i)$ such that either $x \in X_i$ and $y \in Y_i$ or $x \in Y_i$ and $y \in X_i$. (We say that the pair $(X_i, Y_i)$ separates $x$ and $y$.) Callahan and Kosaraju [4] have shown that we can construct a WSPD containing $O(\alpha^d n)$ pairs in $O(n \log n + \alpha^d n)$ time. For each pair, their construction also provides the corresponding dumbbell. Throughout we will assume that the length of the dumbbell is exactly $\alpha$ times the radius of the dumbbell heads.

2.2 The BBD Tree

Let $U = [0, 1]^d$ denote a unit hypercube in $\mathbb{R}^d$. We define a quadtree box recursively as follows: (i) $U$ is a quadtree box, and (ii) any hypercube obtained by splitting a quadtree box into $2^d$ equal parts is a quadtree box. The size of a quadtree box is its side length. A nice property of quadtree boxes is that any two quadtree boxes are either disjoint or one is contained inside the other.
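Both definitions above reduce to short geometric tests. The sketch below is illustrative only: the encoding of a quadtree box as a pair `(level, index)`, where the box has side $2^{-\text{level}}$ and `index` is its integer grid position, is an assumption made here and is not taken from the paper, and "disjoint" is read as having disjoint interiors.

```python
import math

def well_separated(center_x, center_y, r, alpha=2.0):
    """Dumbbell test for two heads b(center_x, r) and b(center_y, r).

    The enclosed point sets are well-separated if the two closed balls are
    disjoint and their centers are at least alpha * r apart.
    """
    length = math.dist(center_x, center_y)   # length of the dumbbell
    return length > 2.0 * r and length >= alpha * r

def nested_or_disjoint(a, b):
    """Relate two quadtree boxes of [0, 1]^d given as (level, index).

    The deeper box is contained in the shallower one exactly when its
    ancestor at the shallower level is that box; otherwise the interiors
    of the two boxes are disjoint.
    """
    (la, ia), (lb, ib) = a, b
    if la > lb:                               # make `a` the shallower box
        (la, ia), (lb, ib) = (lb, ib), (la, ia)
    ancestor = tuple(i >> (lb - la) for i in ib)
    return "nested" if ancestor == ia else "disjoint"

print(well_separated((0.0, 0.0), (10.0, 0.0), r=1.0, alpha=8.0))
print(nested_or_disjoint((1, (0, 0)), (3, (3, 2))))
print(nested_or_disjoint((1, (1, 0)), (3, (3, 2))))
```

The integer shift works because splitting a box into $2^d$ equal parts doubles the grid resolution in every coordinate, so an ancestor's index is obtained by dropping low-order bits.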
The balanced box-decomposition (BBD) tree is a balanced $2^d$-ary tree that compactly represents a hierarchical decomposition of space [3]. Each node of the tree is associated with a region of space called a cell, which is the difference of two quadtree boxes, an outer box and an (optional) inner box. The root of the tree is associated with $U$. The cell associated with any node is partitioned into disjoint cells, which are associated with the children of the node. (For details see [3].) We define the size of a cell to be the same as the size of its outer box. The properties of the BBD tree relevant to this paper are given below.

(i) A set $S$ of $n$ points can be stored in a BBD tree having $O(n)$ nodes and $O(\log n)$ depth.
(ii) A collection $C$ of $n$ quadtree boxes can be stored in a BBD tree having $O(n)$ nodes and $O(\log n)$ depth. The subdivision induced by its leaves is a refinement of the subdivision induced by the quadtree boxes in $C$ and, for any point $q$, we can determine the leaf containing $q$ in $O(\log n)$ time.
(iii) The number of cells of the BBD tree with pairwise disjoint interiors, each of size at least $s$, that intersect a ball of radius $r$ is at most $O((1 + r/s)^d)$.
(iv) Let $1 \le t \le n$. Consider the set of nodes of the BBD tree that have at most $t$ points but whose parents have more than $t$ points. The size of such a set is at most $O(n/t)$.
(v) Suppose that we assign weights to a subdivision induced by an (unweighted) BBD tree. Let $w$ be the maximum weight of a cell and let $W$ denote the total weight of all the cells. Then it is possible to build a weighted BBD tree for exactly the same set of cells such that the following holds. Given $w \le t \le W$, consider a set of nodes of weight at most $t$ but whose parents have weight more than $t$. The size of such a set is $O(W/t)$.

Properties (i) and (iii) are proved in [3]. Properties (ii) and (v) follow from a generalization of (i). Property (iv) follows from the balancing aspect of the BBD tree. Throughout we use the following notation.
Given a cell $c$, let $s_c$ denote the size of $c$, and $b_c$ be the ball of radius $s_c d/2$ whose center coincides with the center of $c$'s outer box (note that $c \subseteq b_c$).

3. SPACE REDUCTION BY SAMPLING

Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $0 < \epsilon \le 1/2$ and $2 \le \gamma \le 1/\epsilon$ be two real parameters. In this section we show how to construct a $(t, \epsilon)$-AVD, consisting of $O(n \epsilon^{(d-1)/2} \gamma^{3(d-1)/2} \log \gamma)$ cells, where $t = O(1/(\epsilon\gamma)^{(d-1)/2})$.

The construction of AVDs described by Arya and Malamatos [1] is based on two fundamental constructions. The first is that given two concentric balls it is possible to construct a small set of nearest-neighbor representatives so that given a query point in the inner ball its approximate nearest neighbor outside the outer ball will be one of these representatives, and vice versa. A similar result holds for two disjoint balls. Lemmas 1 and 2 below indicate the sizes of these representative sets as a function of the separation of the two balls. The second construction is a subdivision of space into a collection of cells, such that for each cell, the points of $S$ lying outside this cell satisfy certain separation properties. This is given in Lemma 3 below. In all three lemmas (adapted slightly for convenience), $S$ is a set of $n$ points in $\mathbb{R}^d$, and $0 < \epsilon \le 1/2$ and $\gamma \ge 2$ are two real parameters.

Lemma 1. Let $b_1$ and $b_2$ denote two concentric balls of radius $r$ and $\gamma r$, respectively. There exists a set $R \subseteq S$ consisting of $\left(1 + O\!\left(\frac{1}{\sqrt{\epsilon\gamma}}\right)\right)^{d-1}$ points such that (i) for any point $q \in b_1$, $\mathrm{NN}_q(R) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap \overline{b_2})$, and (ii) for any point $q \in \overline{b_2}$, $\mathrm{NN}_q(R) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap b_1)$.

Lemma 2. Let $b_1$ and $b_2$ be two disjoint balls of radius $r_1$ and $r_2$, respectively, whose minimum distance of separation is at least $\ell'$. Further, suppose that $\ell' \ge \max(r_1, r_2)/2$. Then there exists a set $R \subseteq S$ consisting of $\left(1 + O\!\left(\frac{\sqrt{r_1 r_2}}{\ell' \sqrt{\epsilon}}\right)\right)^{d-1}$ points such that for any point $q \in b_1$, $\mathrm{NN}_q(R) \le (1 + \epsilon) \cdot \mathrm{NN}_q(S \cap b_2)$.

Lemma 3.
It is possible to construct a subdivision consisting of $O(n\gamma^d)$ cells, where each cell $c$ is the difference of two cubes and satisfies at least one of the following three properties:
(i) $|S \cap \gamma b_c| \le 1$.
(ii) There exists a ball $b'_c$ such that $S \cap \gamma b_c \subseteq b'_c$ and the ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $S \cap \gamma b_c \subseteq b'_c$. Letting $r_1$ and $r_2$ denote the radius of balls $b_c$ and $b'_c$, respectively, and $\ell'$ denote the minimum distance of separation between $b_c$ and $b'_c$, then $\ell' \ge 2\max(r_1, r_2)$ and $\ell'/\sqrt{r_1 r_2} \ge 5\sqrt{\gamma}$.

The three lemmas given above imply that, for a cell $c$ in the subdivision of Lemma 3, we can choose a set of representatives of size $O(1/(\epsilon\gamma)^{(d-1)/2})$ each from $\overline{\gamma b_c}$ and from $b'_c$. This forms the basis of the approach given by Arya and Malamatos.

We can significantly reduce the number of cells by weakening the separation property in Lemma 3, allowing for $O(1/(\epsilon\gamma)^{(d-1)/2})$ points in the region $\gamma b_c - b'_c$, and choosing all these points as additional representatives. A natural method for trying to achieve this goal is to construct the subdivision of Lemma 3 for a set of randomly sampled points of suitable size. Unfortunately, standard results in random sampling and $\epsilon$-net theory imply that this would lead to an increase in the number of representatives by a factor that is logarithmic in $n$. We overcome this difficulty by devising a novel but simple sampling procedure based on the BBD tree, which has the following property. Let $nf$ denote the number of points of $S$ that are sampled, where $0 < f \le 1$ is a parameter. Then for any fat region (e.g., a Euclidean ball) that is free of sampled points, we can contract the region by a constant factor (say, 2), and the shrunken region is guaranteed to have at most $O(1/f)$ points of the original set. As we will see, this sampling procedure suffices for our application and leads to a large space improvement.
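The sampling rule used in the proof of Lemma 4 takes the tree nodes that hold at most $1/f$ points while their parents hold more, and picks one point from each non-empty such cell. The sketch below illustrates only that rule. It substitutes a plain point quadtree in the plane for the BBD tree, so it does not inherit the $O(nf)$ bound of BBD property (iv); the function names and the assumption of pairwise distinct points in $[0, 1)^2$ are choices made here for illustration.

```python
import math
import random

def leaf_groups(points, capacity, lo=(0.0, 0.0), size=1.0):
    """Split points with a plain quadtree until each box holds <= capacity.

    Returns the non-empty point groups of boxes that hold at most
    `capacity` points while their parent box held more (or are the root).
    Assumes distinct 2-D points in [0, 1)^2 so that the recursion stops.
    """
    if len(points) <= capacity:
        return [points] if points else []
    half = size / 2.0
    groups = []
    for dx in (0, 1):
        for dy in (0, 1):
            x0, y0 = lo[0] + dx * half, lo[1] + dy * half
            inside = [p for p in points
                      if x0 <= p[0] < x0 + half and y0 <= p[1] < y0 + half]
            groups.extend(leaf_groups(inside, capacity, (x0, y0), half))
    return groups

def sample_points(points, f):
    """Pick one (arbitrary, here the first) point per cell of <= 1/f points."""
    capacity = max(1, math.floor(1.0 / f))
    return [group[0] for group in leaf_groups(points, capacity)]

random.seed(0)
sites = [(random.random(), random.random()) for _ in range(1000)]
sample = sample_points(sites, f=0.1)
print(len(sample), "sampled points out of", len(sites))
```

Because every group holds at most $1/f$ points, a region that meets only a constant number of groups can contain only $O(1/f)$ points of the original set, which is the counting step the proof relies on.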
In Lemma 4, we present a simple version of this idea, intended to illustrate the basic space-reduction mechanism and, later, in Section 3.1, we apply this technique in full force.

Lemma 4. Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $\gamma \ge 2$ and $0 < f \le 1$ be two real parameters. It is possible to construct a subdivision consisting of $O(nf\gamma^d)$ cells, where each cell $c$ is the difference of two cubes and satisfies at least one of the following three properties:
(i) $|S \cap \gamma b_c| = O(1/f)$.
(ii) There exists a ball $b'_c$ such that $|S \cap (\gamma b_c - b'_c)| = O(1/f)$ and the ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $|S \cap (\gamma b_c - b'_c)| = O(1/f)$. Letting $r_1$ and $r_2$ denote the radius of balls $b_c$ and $b'_c$, respectively, and $\ell'$ denote the minimum distance of separation between $b_c$ and $b'_c$, then $\ell' \ge \max(r_1, r_2)/2$ and $\ell'/\sqrt{r_1 r_2} \ge \sqrt{\gamma}$.

Proof. Let $T$ denote the BBD tree for the set $S$ of points. Let $N$ be the set of nodes of $T$ that contain at most $1/f$ points (of $S$), but whose parents contain more than $1/f$ points. Let $\mathcal{X}$ be the cells corresponding to these nodes. By BBD property (iv), $|\mathcal{X}|$ is $O(nf)$. Let $S' \subseteq S$ be the set of points obtained by sampling one point arbitrarily from each (non-empty) cell in $\mathcal{X}$. We construct the subdivision described in Lemma 3 for $S'$ but using the value $2\gamma$ in place of $\gamma$ in the lemma. Observe that since $|S'| = O(nf)$, by Lemma 3, the number of cells in this subdivision is $O(nf\gamma^d)$. We claim that any cell $c$ in this subdivision satisfies the desired property.

If Case (i) of Lemma 3 holds, then $|S' \cap 2\gamma b_c| \le 1$. Let $\mathcal{X}' \subseteq \mathcal{X}$ be the cells that overlap the ball $\gamma b_c$, and contain at least one point of $S$. Since $S'$ contains one point from each cell in $\mathcal{X}'$, all but at most one cell in $\mathcal{X}'$ must intersect the boundary of $2\gamma b_c$. By BBD property (iii), the number of cells that overlap $\gamma b_c$ and intersect the boundary of $2\gamma b_c$ is bounded by a constant. Thus $|\mathcal{X}'| = O(1)$.
Since a cell in $\mathcal{X}'$ has at most $1/f$ points, the total number of points of $S$ in $\gamma b_c$ is $O(1/f)$.

If Case (ii) of Lemma 3 holds, then there exists a ball $b''_c$ such that $S' \cap 2\gamma b_c \subseteq b''_c$ and the ball $2\gamma b''_c$ does not overlap $c$. Define $b'_c = 2b''_c$. It follows that $\gamma b'_c$ does not overlap $c$. Next we show that $|S \cap (\gamma b_c - b'_c)| = O(1/f)$. Let $\mathcal{X}' \subseteq \mathcal{X}$ be the cells that overlap the region $\gamma b_c - b'_c$, and contain at least one point of $S$. Since $S'$ contains one point from each cell in $\mathcal{X}'$, and $2\gamma b_c - b'_c/2$ contains no point of $S'$, it follows that each cell in $\mathcal{X}'$ intersects the boundary of both $2\gamma b_c$ and $\gamma b_c$, or intersects the boundary of both $b'_c$ and $b'_c/2$. By BBD property (iii), the number of such cells is bounded by a constant. Since a cell in $\mathcal{X}'$ has at most $1/f$ points, the total number of points of $S$ in the region $\gamma b_c - b'_c$ is $O(1/f)$.

A similar argument shows that if Case (iii) of Lemma 3 holds, then the corresponding case holds here. The details are omitted. This completes the proof.

We construct the subdivision described in Lemma 4 for $f = (\epsilon\gamma)^{(d-1)/2}$. We assign representatives to the cells as follows. Let $q$ be a point inside a cell $c$. Let $b_c$ and $b'_c$ be the balls defined in Lemma 4. Since $c$ is contained within the ball $b_c$, applying Lemma 1(i), it follows that we can find a set $R'_c$ consisting of $O(1/(\epsilon\gamma)^{(d-1)/2})$ points such that $\mathrm{NN}_q(R'_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap \overline{\gamma b_c})$. We next consider the case when the nearest neighbor of $q$ lies within $\gamma b_c$. Note that one of the three cases given in the statement of Lemma 4 must hold. If Case (i) holds, then we define $R''_c = S \cap \gamma b_c$, and if Case (ii) or Case (iii) holds, then we define $R''_c = S \cap (\gamma b_c - b'_c)$. Note that $|R''_c| = O(1/(\epsilon\gamma)^{(d-1)/2})$. It is clear that a point in $R'_c \cup R''_c$ is an $\epsilon$-NN of $q$ unless Case (ii) or (iii) holds and the nearest neighbor of $q$ lies within $b'_c$. If Case (ii) (Case (iii), resp.)
holds then, by Lemma 1(ii) (Lemma 2, resp.), it follows that we can find a set $R'''_c$ consisting of $O(1/(\epsilon\gamma)^{(d-1)/2})$ points such that $\mathrm{NN}_q(R'''_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S \cap b'_c)$. Finally we assign the set of representatives $R_c$ for $c$ to be $R'_c \cup R''_c$ in Case (i) and $R'_c \cup R''_c \cup R'''_c$ in Cases (ii) and (iii). Clearly, $R_c$ has size $O(1/(\epsilon\gamma)^{(d-1)/2})$ and satisfies the desired property, namely, $\mathrm{NN}_q(R_c) \le (1 + \epsilon)\,\mathrm{NN}_q(S)$. We have thus shown how to construct a $(t, \epsilon)$-AVD with $O(n \epsilon^{(d-1)/2} \gamma^{(3d-1)/2})$ cells for $t = O(1/(\epsilon\gamma)^{(d-1)/2})$.

3.1 Bisector-Sensitive Construction

In this section, we present a more sophisticated construction that reduces the space by nearly a factor of $\gamma$. The construction used in the previous section simply applied the existing construction of Arya and Malamatos to a suitable sample of the points. In this section we combine the sampling process with a new construction, which generates cells along the Voronoi bisectors of the well-separated pairs.

Let $T$ denote the BBD tree for the set $S$ of points. Let $N$ be the set of nodes of $T$ that contain at most $1/f$ points (of $S$), but whose parents contain more than $1/f$ points. Here $f$ is a parameter between 0 and 1, which will later be assigned a suitable value depending on $\epsilon$ and $\gamma$. Let $\mathcal{X}$ be the cells corresponding to these nodes. By BBD property (iv), $|\mathcal{X}|$ is $O(nf)$. Let $S' \subseteq S$ be the set of points obtained by sampling one point arbitrarily from each (non-empty) cell in $\mathcal{X}$.

We construct a WSPD $P_{S',\alpha}$ for $S'$, using separation factor $\alpha = 16$. Note that the number of pairs in $P_{S',16}$ is $O(nf)$. Let $D'$ denote the set of dumbbells corresponding to this WSPD. We expand both the heads of each dumbbell by a factor of two. Let $D$ denote the new set of dumbbells. Note that the dumbbells in $D$ have a separation factor of 8. For each dumbbell $P \in D$, we compute a set of quadtree boxes $C_P$ as follows. Let $X \subseteq S$ and $Y \subseteq S$ denote the set of points that are enclosed within the two heads of $P$.
Let $x$ and $y$ denote the centers of the heads enclosing $X$ and $Y$, respectively. Let $\ell = |xy|$, let $z$ denote the center of the segment $\overline{xy}$, and let $B_P$ denote the set of balls of radius $2^i \ell$, for $4 \le i \le \lceil \log \gamma + 4 \rceil$, centered at $z$. Let $V_P$ denote the set of points whose nearest neighbor in $X$ and in $Y$ are equidistant (i.e., the Voronoi bisector of $X$ and $Y$). For a ball $b \in B_P$, let $C_b$ be the set of quadtree boxes overlapping $b \cap V_P$ that have the largest size not exceeding $r_b^2/(2048\gamma d \ell)$, where $r_b$ denotes the radius of $b$. Let $C'_P$ be the set of quadtree boxes overlapping $b(z, 16\gamma\ell)$ that have the largest size not exceeding $\gamma\ell/d$. Let $C_P = (\cup_{b \in B_P} C_b) \cup C'_P$ and $C = \cup_{P \in D} C_P$. Finally we store all the quadtree boxes in $C$ in a BBD tree $T'$. We will show that the subdivision induced by the leaves of $T'$, along with suitably chosen representatives, is the desired approximate Voronoi diagram.

We first bound the number of cells in the subdivision. The following lemma essentially shows that the Voronoi bisector of a well-separated pair $(X, Y)$ behaves as a $(d-1)$-dimensional hyperplane for the purpose of packing arguments. The proof is omitted, but is similar to one given in [2].

Lemma 5. Let $X$ and $Y$ be two sets of points enclosed within two disjoint $d$-balls of radius $r$, such that the distance between the centers of these balls is at least $\alpha r$, where $\alpha \ge 4$. Let $V$ denote the Voronoi bisector of $X$ and $Y$, and let $b$ be any $d$-ball of radius $R$. Then the number of quadtree boxes of size $s$ that overlap $b \cap V$ is at most $O((1 + R/s)^{d-1})$.

By Lemma 5, it follows that for any ball $b \in B_P$, $|C_b|$ is $O((1 + \gamma\ell/r_b)^{d-1})$. Also, by BBD property (iii), $|C'_P| = O(1)$. It is now easy to see that $|C_P|$ is $O(\gamma^{d-1})$. Since the number of dumbbells in $D$ is $O(nf)$, the total number of quadtree boxes, $|C|$, is $O(nf\gamma^{d-1})$. Since the number of leaves in the BBD tree $T'$ is $O(|C|)$, this bound applies to the number of cells in the subdivision.
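The step from the bound on $|C_b|$ to $|C_P| = O(\gamma^{d-1})$ is a geometric sum over the balls of $B_P$. The following sketch spells it out under two assumptions that the text leaves implicit: the logarithm is taken base 2, and $d \ge 2$. It uses $r_b = 2^i \ell$ for the $i$th ball and the inequality $(1 + x)^{d-1} \le 2^{d-1}(1 + x^{d-1})$ for $x \ge 0$.

```latex
\begin{align*}
|C_P|
&\le |C'_P| + \sum_{i=4}^{\lceil \log \gamma + 4 \rceil} |C_{b_i}|
 = O(1) + \sum_{i=4}^{\lceil \log \gamma + 4 \rceil}
   O\!\left(\left(1 + \frac{\gamma \ell}{2^i \ell}\right)^{d-1}\right) \\
&= O(1) + O\!\left(\sum_{i=4}^{\lceil \log \gamma + 4 \rceil}
   \left(1 + \frac{\gamma^{d-1}}{2^{i(d-1)}}\right)\right) \\
&= O\!\left(\log \gamma + \gamma^{d-1} \sum_{i \ge 4} 2^{-i(d-1)}\right)
 = O\!\left(\gamma^{d-1}\right).
\end{align*}
```

The series $\sum_{i \ge 4} 2^{-i(d-1)}$ converges to a constant when $d \ge 2$, and $\log \gamma = O(\gamma^{d-1})$, so the largest boxes near the dumbbell dominate the count. Multiplying by the $O(nf)$ dumbbells gives the stated $O(nf\gamma^{d-1})$ bound on $|C|$.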
Lemma 7 describes an important property of the subdivision, which will enable us to choose representatives for the cells. We need the following technical lemma. Its proof is omitted due to space limitations, but it uses methods similar to those in [1].

Lemma 6. Let $c$ be a cell corresponding to a leaf of $T'$. Let $x$ be a point in $S \cap 2\gamma b_c$. Let $S_c \subseteq S \cap 2\gamma b_c$ be the set of points $p$ such that there is a dumbbell $P \in D$ that separates $x$ and $p$, and $V_P$ intersects cell $c$. Then either $|S_c| = 0$ or there exists a ball $b'_c$ such that $S_c \cup \{x\} \subseteq b'_c$ and which satisfies at least one of the following two properties:
(i) The ball $8\gamma b'_c$ does not overlap $c$.
(ii) $\ell' \ge 4\max(r_1, r_2)$ and $\ell'/\sqrt{r_1 r_2} \ge 19\sqrt{\gamma}$, where $\ell'$ denotes the minimum distance of separation between $b_c$ and $b'_c$, $r_1$ denotes the radius of $b_c$, and $r_2$ denotes the radius of ball $b'_c$.

The proof is based on the fact that a dumbbell separating $x$ from a point in $S_c$ is either "far away" from $c$ or it implies an upper bound on the size of $c$ (because it generates $c$ or a cell enclosing $c$). By considering the dumbbell that separates $x$ from the point in $S_c$ that is farthest from it, we can show that these two cases yield properties (i) and (ii), respectively.

Lemma 7. Let $S$ be a set of $n$ points in $\mathbb{R}^d$, and let $\gamma \ge 2$ and $0 < f \le 1$ be two real parameters. It is possible to construct a subdivision consisting of $O(nf\gamma^{d-1})$ cells, where each cell $c$ is the difference of two cubes and satisfies at least one of the following three properties. Let $S_c \subseteq S$ be the set of points that are the nearest neighbor of some point in $c$.
(i) $|S_c \cap \gamma b_c| = O((1/f) \log \gamma)$.
(ii) There exists a ball $b'_c$ such that $|S_c \cap (\gamma b_c - b'_c)| = O((1/f) \log \gamma)$ and the ball $\gamma b'_c$ does not overlap $c$.
(iii) There exists a ball $b'_c$ such that $|S_c \cap (\gamma b_c - b'_c)| = O((1/f) \log \gamma)$.
Letting $\ell'$ denote the minimum distance of separation between $b_c$ and $b'_c$, $r_1$ denote the radius of $b_c$, and $r_2$ denote the radius of ball $b'_c$, then $\ell' \ge \max(r_1, r_2)$ and $\ell'/\sqrt{r_1 r_2} \ge 9\sqrt{\gamma}$.

Proof. If $|S_c \cap \gamma b_c| = 0$, then (i) trivially holds. So suppose that $|S_c \cap \gamma b_c| \ge 1$. First we show that if $|S' \cap 2\gamma b_c| = 0$ then (i) holds. Let $\mathcal{X}' \subseteq \mathcal{X}$ be the set of cells that overlap the ball $\gamma b_c$, and contain at least one point of $S$. Since $S'$ contains one point from each cell in $\mathcal{X}'$, all the cells in $\mathcal{X}'$ must intersect the boundary of $2\gamma b_c$. By BBD property (iii), the number of cells that overlap $\gamma b_c$ and intersect the boundary of $2\gamma b_c$ is bounded by a constant. Thus $|\mathcal{X}'| = O(1)$. Since a cell in $\mathcal{X}'$ has at most $1/f$ points, the total number of points of $S$ in $\gamma b_c$ is $O(1/f)$, and so clearly (i) holds.

In the remainder of the proof, we assume that $|S' \cap 2\gamma b_c| \ge 1$ and $|S_c \cap \gamma b_c| \ge 1$. Let $(x, w)$ be the closest pair of points such that $x \in S' \cap 2\gamma b_c$ and $w \in S_c \cap 2\gamma b_c$. (In case of ties, choose any such pair.) Let $\beta = |xw|$. (Note that if $S' \cap S_c \cap 2\gamma b_c \ne \emptyset$, then $x = w$, and $\beta = 0$.) Let $S'_c \subseteq S \cap 2\gamma b_c$ be the set of points $p$ such that there is a dumbbell $P \in D$ that separates $x$ and $p$, and $V_P$ intersects cell $c$. If $|S'_c| = 0$, then let $b''_c$ be the ball of zero radius centered at $x$. Otherwise, by Lemma 6, there exists a ball $b''_c$ such that $S'_c \cup \{x\} \subseteq b''_c$ and which satisfies either property (i) or (ii) listed therein. Let $r''_2$ be the radius of $b''_c$ and let $z$ denote its center. We distinguish two cases based on the closest distance $L$ between $z$ and cell $c$: (1) $L \ge s_c/16$ and (2) $L < s_c/16$.

Case 1: $L \ge s_c/16$. Let $r_2 = \max(2r''_2, s_c/(16\gamma))$, and let $b'_c = b(z, r_2)$. Note that $x$ is at a distance of at most $r_2/2$ from the center $z$ of $b'_c$. We consider two subcases: (a) $\beta < r_2/64$ and (b) $\beta \ge r_2/64$.

Subcase (a): $\beta < r_2/64$. It is easy to verify that $b'_c$ satisfies either property (ii) or (iii). It remains to show that $|S_c \cap (\gamma b_c - b'_c)| = O((1/f) \log \gamma)$.
To this end, we first show that the cell c′ ∈ X that contains a point u ∈ Sc ∩ (γbc − b′c) has size at least |zu|/(64d). For the sake of contradiction, suppose that the size of c′ is less than |zu|/(64d). Recall that S′ contains one point from each cell in X. Let u′ be the point in c′ that belongs to S′ (note that u and u′ may be the same point). By the triangle inequality, we get |xu′| ≥ |zu| − |uu′| − |xz|. Since |uu′| ≤ |zu|/64 and |xz| ≤ |zu|/2 (because |zu| ≥ r2 and |xz| ≤ r2/2), it follows that |xu′| ≥ 31|zu|/64. Consider the dumbbell P′ ∈ D′ that separates x and u′. Let P be the dumbbell corresponding to P′ in D. Let A and B denote the heads of dumbbell P containing x and u′, respectively. Since the separation factor for the dumbbells in D is 8, it follows that the radius r′ of A and B is at least |xu′|/10 ≥ 31|zu|/640. Recall that the heads of the dumbbells in D′ are enlarged by a factor of 2 to obtain the dumbbells in D. Hence any point within distance r′/2 of x and u′ will lie within A and B, respectively. Since |wx| = β ≤ r2/64 ≤ |zu|/64, it follows that w ∈ A and, since |uu′| ≤ |zu|/64, it follows that u ∈ B. Since both w and u belong to Sc (i.e., they are the nearest neighbor of some point in cell c), it is not difficult to see that the Voronoi bisector VP must intersect c. But then b′′c should have contained u, a contradiction.

Partition the points in Sc ∩ (γbc − b′c) into groups, where the ith group has all the points whose distance from z lies in the interval [2^i r2, 2^(i+1) r2]. Since r2 ≥ sc/(16γ), and the maximum distance of a point in γbc from z is some constant times γsc, it is clear that the number of non-empty groups is O(log γ). Let Xi ⊆ X be the set of cells that overlap some point in the ith group. By the above observation, the cells in Xi have size at least 2^i r2/(64d), and so by BBD property (iii), |Xi| is O(1).
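The grouping step can be illustrated with a short sketch. It is not part of the paper: the function names and the constant `c_const` (standing in for the unspecified "some constant" above) are hypothetical, and half-open shells [2^i r2, 2^(i+1) r2) are used so that every point falls into exactly one group.

```python
import math
from collections import defaultdict

def dyadic_shell_groups(points, z, r2):
    """Group points by dyadic distance shell around the center z.

    A point at distance dist >= r2 from z goes to group
    i = floor(log2(dist / r2)), i.e. dist lies in [2^i r2, 2^(i+1) r2).
    Points closer than r2 (inside the ball of radius r2) are skipped.
    """
    groups = defaultdict(list)
    for p in points:
        dist = math.dist(p, z)
        if dist < r2:
            continue
        groups[math.floor(math.log2(dist / r2))].append(p)
    return dict(groups)

def shell_count_bound(gamma, c_const=1.0):
    """Upper bound on the number of shells, assuming r2 = s_c / (16 gamma)
    (the smallest allowed r2) and that no point is farther than
    c_const * gamma * s_c from z. The distance ratio is then at most
    16 * c_const * gamma^2, which gives O(log gamma) shells."""
    return math.floor(math.log2(16 * c_const * gamma ** 2)) + 1
```

For example, with z at the origin and r2 = 1, points at distances 1, 3, 4 and 5 fall into groups 0, 1, 2 and 2, and `shell_count_bound(2.0)` evaluates to 7.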
Since each cell in Xi has at most 1/f points, the number of points in the ith group is O(1/f), which implies the desired bound on |Sc ∩ (γbc − b′c)|.

Subcase (b): β ≥ r2/64. Consider a ball b̃ of radius 32β centered at x. It is easy to show that for any dumbbell separating x and any point u outside this ball, the dumbbell head containing x also contains w. As in Subcase (a), we can now show that |Sc ∩ (γbc − b̃)| = O((1/f) log γ). Next we claim that |Sc ∩ γbc ∩ b̃| = O(1/f). To see this, let X′ ⊆ X be the set of cells that contain a point in Sc ∩ γbc ∩ b̃. Note that a cell in X′ must either overlap 2γbc or it must have size exceeding β/d (otherwise the distance between any pair of points in this cell that belong to Sc and S′, respectively, would be less than β, a contradiction). It follows from BBD property (iii) that |X′| = O(1), which implies the desired claim. Thus we get |Sc ∩ γbc| = O((1/f) log γ), which implies (i).

Case 2: L < sc/16. It is obvious that property (ii) given in Lemma 6 cannot apply to b′′c, so either b′′c satisfies property (i) given there or its radius must be zero. It follows that r2′′ ≤ L/(8γ). We consider two subcases: (a) β < sc/(256γ) and (b) β ≥ sc/(256γ).

Subcase (a): β < sc/(256γ). We set b′c = b(z, L/γ). Note that b′c satisfies property (ii), or it is a ball of zero radius centered at a point inside cell c. We now show that |Sc ∩ (γbc − b′c)| = O((1/f) log γ). Similar to Subcase (a) of Case 1, we can show that |Sc ∩ (γbc − b(z, sc/(8γ)))| = O((1/f) log γ). We omit the straightforward details. We next claim that |S ∩ (b(z, sc/(8γ)) − b′c)| = O(1/f), which would imply (ii). Towards this end, we show that there are no points of S′ in the region b(z, sc/(4γ)) − b′c/2. For contradiction, assume that there is a point u ∈ S′ in this region. Consider a dumbbell P ∈ D that separates x and u. Let ℓ be the length of P. By definition of well-separatedness, we get 4|xu|/5 ≤ ℓ ≤ 4|xu|/3.
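The bounds on ℓ can be recovered in two lines, under an assumption that the paper makes explicit only later (in the proof of Lemma 9): ℓ is the distance between the centers of the two heads of P, and separation factor 8 places every point within ℓ/8 of the center of its head. The names a and b for the centers of the heads containing x and u are introduced here purely for illustration.

```latex
\begin{align*}
|xu| &\le |xa| + |ab| + |bu| \le \tfrac{\ell}{8} + \ell + \tfrac{\ell}{8} = \tfrac{5\ell}{4}
  &&\Longrightarrow\quad \ell \ge \tfrac{4}{5}\,|xu|, \\
|xu| &\ge |ab| - |xa| - |bu| \ge \ell - \tfrac{\ell}{8} - \tfrac{\ell}{8} = \tfrac{3\ell}{4}
  &&\Longrightarrow\quad \ell \le \tfrac{4}{3}\,|xu|.
\end{align*}
```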
Applying the triangle inequality, it is easy to see that 3L/(8γ) ≤ |xu| ≤ sc/(2γ), which implies that 3L/(10γ) ≤ ℓ ≤ 2sc/(3γ). It can be easily checked from the construction that all cells whose closest distance from x is less than 15γℓ can have size at most γℓ/d. Note that the closest distance from x to cell c is at most L + L/(8γ) ≤ 17L/16 ≤ 4γℓ. Thus the size of cell c cannot exceed 2sc/(3d), a contradiction.

Let X′ ⊆ X be the set of cells that overlap the region b(z, sc/(8γ)) − b′c and contain at least one point of S. Since S′ contains one point from each cell in X′, and there are no points of S′ in the region b(z, sc/(4γ)) − b′c/2, it follows that each cell in X′ intersects the boundary of both b(z, sc/(4γ)) and b(z, sc/(8γ)) or the boundary of both b′c and b′c/2. By BBD property (iii), the number of such cells is bounded by a constant. Noting that a cell in X′ has at most 1/f points, the desired result follows.

Subcase (b): β ≥ sc/(256γ). Using an argument similar to Subcase (b) of Case 1, we can show that (i) holds. We omit the straightforward details. This completes the proof.

In view of the similarity of Lemma 7 with Lemma 4, it should be clear that the same method for assigning representatives works for this subdivision too. We mention that the only difference is the value of f, which we set to (ǫγ)^((d−1)/2) log γ. Given a query point q, we can determine the leaf of the BBD tree T′ that contains q in O(log(nγ)) time. By computing the distance from q for each of the stored representatives, we can answer queries in O(log(nγ) + 1/(ǫγ)^((d−1)/2)) time. We summarize the main result of this section.

Theorem 1. Let S be a set of n points in IR^d, and let 0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. We can construct an (O(1/(ǫγ)^((d−1)/2)), ǫ)-approximate Voronoi diagram for S that consists of O(nǫ^((d−1)/2) γ^(3(d−1)/2) log γ) cells, where each cell is the difference of two cubes.
Moreover, for any query point, we can return its ǫ-NN in O(log(nγ) + 1/(ǫγ)^((d−1)/2)) time. Here the constants in the O-notation are independent of ǫ and γ.

We obtain a family of data structures that can answer ǫ-NN queries in O(log(nγ) + 1/(ǫγ)^((d−1)/2)) time using space O(nγ^(d−1) log γ). Setting γ = 2 we obtain the most space-efficient solution in this family, which we present in the following corollary.

Corollary 1.1. Given a set S of n points in IR^d, we can answer ǫ-NN queries in O(log n + 1/ǫ^((d−1)/2)) time using a data structure of space O(n).

Remark: For the case of γ = 1/ǫ, we can omit the sampling process altogether in Lemma 7, which allows us to save a log γ factor in space. By a straightforward extension of the approach given in this section, we can construct a (1, ǫ)-AVD of size O(n/ǫ^(d−1)). Arya and Malamatos [1] have shown a lower bound of Ω(n/ǫ^(d−1)) on the size of a (1, ǫ)-AVD, assuming that the cells are differences of two axis-aligned hyperrectangles, which implies that our construction has optimal size.

4. LOWER SPACE BOUNDS BY BETTER CHARGING

Let S be a set of n points in IR^d, and let 0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. In this section we show how to construct a (t, ǫ)-AVD consisting of O(nǫ^((d−1)/2) γ^(3(d−1)/2) log(2/(ǫγ))) cells, where t = O(1/(ǫγ)^((d−1)/2)). The construction proceeds as follows. First, we construct a (t, ǫ)-AVD consisting of O(nγ^(d−1)) cells, such that the total number of representatives summed over all cells is O(nγ^(d−1) log(2/(ǫγ))). Note that for this AVD, there is a huge gap between the number of representatives for a cell in the worst case, which is O(1/(ǫγ)^((d−1)/2)), and their number averaged over all cells, which is only O(log(2/(ǫγ))). In the second step of the construction, we take advantage of this fact by combining cells that have fewer representatives into larger cells. This enables us to significantly reduce the number of cells and leads to the desired AVD.

Let D be the set of dumbbells corresponding to the WSPD for S, using separation factor 8. For each dumbbell P ∈ D, define the Voronoi bisector VP, the set of balls BP, and the sets CP′ and CP of quadtree boxes, exactly as in Section 3.1. We store all the quadtree boxes in C = ∪_{P∈D} CP in a BBD tree T. Using the same approach as in Section 3.1, we can show that the number of leaves in T is O(nγ^(d−1)). We now describe a crucial property satisfied by the subdivision induced by the leaves of T, which enables us to select representatives for the cells economically. The proof is omitted due to space limitations; it uses methods similar to those in [1].

Lemma 8. Let S be a set of n points in IR^d, and let γ ≥ 2 be a real parameter. It is possible to construct a subdivision consisting of O(nγ^(d−1)) cells, where each cell is the difference of two cubes, that satisfies the following properties. In the following, c is any cell in the subdivision and Sc ⊆ S ∩ γbc is the set of points that are the nearest neighbor of some point in c.
(a) The number of cells, each of size at least s, that intersect a ball of radius r is at most O((1 + r/s)^d).
(b) There is a constant k > 1 such that the ball kγbc contains at least one point of S.
(c) The cell c satisfies at least one of the following three properties.
(i) |Sc| ≤ 1.
(ii) There exists a ball b′c such that Sc ⊆ b′c and the ball γb′c does not overlap c.
(iii) There exists a ball b′c such that Sc ⊆ b′c. Letting r1 and r2 denote the radius of balls bc and b′c, respectively, and ℓ′ denote the minimum distance of separation between bc and b′c, we have ℓ′ ≥ max(r1, r2) and ℓ′/√(r1 r2) ≥ 19√γ.

Let X be any subdivision satisfying Lemma 8. Using Lemmas 1 and 2, we can easily show that O(1/(ǫγ)^((d−1)/2)) representatives suffice for each cell. The following lemma gives a more complicated method for choosing cell representatives which, however, has the advantage that we can obtain a much better bound on the total number of representatives.

Lemma 9. Let S be a set of n points in IR^d, and let 0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. Let t = O(1/(ǫγ)^((d−1)/2)). Any subdivision X satisfying Lemma 8 is a (t, ǫ)-AVD such that the total number of representatives summed over all cells of X is O(nγ^(d−1) log(2/(ǫγ))).

Proof. (Sketch) Let c be any cell in X and let Sc′ ⊆ S be the set of points that are the nearest neighbor of some point in c. Let q be any point in c. We will identify two sets of points Rc′ ⊆ S (outer representatives) and Rc′′ ⊆ S (inner representatives), each consisting of t = O(1/(ǫγ)^((d−1)/2)) points, such that NN_q(Rc′) ≤ (1 + ǫ) NN_q(Sc′ − γbc) and NN_q(Rc′′) ≤ (1 + ǫ) NN_q(Sc′ ∩ γbc). Finally, we will store the set of representatives Rc = Rc′ ∪ Rc′′ with c. Clearly NN_q(Rc) ≤ (1 + ǫ) NN_q(S).

In this version of the paper, we only present our method for computing Rc′, the set of outer representatives, and show that Σ_{c∈X} |Rc′| = O(nγ^(d−1) log(2/(ǫγ))). Using similar ideas, we can compute the inner representatives and show that this bound also applies to their total number. Details will appear in the full version.

Letting k′ = k + 1, we have Sc′ ⊆ k′γbc because, by Lemma 8, the ball kγbc contains at least one point of S and γ is at least 2. Let Sc′′ = Sc′ ∩ (k′γbc − γbc). Let Uc denote the set of quadtree boxes that have the largest size not exceeding ǫγ² sc/8 and that contain at least one point of Sc′′. Clearly |Uc| = O(1/(ǫγ)^d). Note that any quadtree box u ∈ Uc can be enclosed within a ball bu of radius ru ≤ ǫγ² sc d/16 ≤ γ sc d/16, which implies that the minimum distance of separation between the balls bc and bu, denoted ℓ′u, is at least γ sc d/8. It is now easy to verify that √(r1 ru)/(ℓ′u √ǫ) = O(1) and ℓ′u ≥ max(r1, ru)/2, where r1 = sc d/2 is the radius of ball bc. Therefore, by Lemma 2, we can find a set Rc,u ⊆ Sc′′ ∩ u consisting of O(1) points such that NN_q(Rc,u) ≤ (1 + ǫ/2) NN_q(Sc′′ ∩ u).
It follows that the set R̃c′ = ∪_{u∈Uc} Rc,u has at most O(1/(ǫγ)^d) points and satisfies NN_q(R̃c′) ≤ (1 + ǫ/2) NN_q(Sc′ − γbc). The main difficulty with using R̃c′ as the outer representatives for c is that their number can be too large. To remedy this, observe that c ⊆ bc and all the points in R̃c′ lie outside γbc. Thus, we can apply Lemma 1 to obtain a set Rc′ ⊆ R̃c′ consisting of O(1/(ǫγ)^((d−1)/2)) points such that NN_q(Rc′) ≤ (1 + ǫ/4) NN_q(R̃c′). Thus

NN_q(Rc′) ≤ (1 + ǫ/4)(1 + ǫ/2) NN_q(Sc′ − γbc) ≤ (1 + ǫ) NN_q(Sc′ − γbc),

as desired. The last inequality holds because (1 + ǫ/4)(1 + ǫ/2) = 1 + 3ǫ/4 + ǫ²/8 ≤ 1 + ǫ for ǫ ≤ 1/2.

We next present a charging argument to show that

Σ_{c∈X} |Rc′| = O(nγ^(d−1) log(2/(ǫγ))).

Let D be the set of dumbbells corresponding to the WSPD for S, using separation factor 8. Each dumbbell P ∈ D allocates a unit charge to each cell in a set C̃P ⊆ X defined as follows. Let X ⊆ S and Y ⊆ S denote the sets of points that are enclosed within the two heads of P. Let x and y denote the centers of the heads enclosing X and Y, respectively. Let ℓ = |xy|, let z denote the center of the segment xy, and let B̃P denote the set of balls of radius 2^i ℓ, for 0 ≤ i ≤ ⌈log((40k′d)/(ǫγ))⌉, centered at z. Let VP denote the Voronoi bisector of X and Y. For a ball b ∈ B̃P, let C̃b ⊆ X be the set of cells overlapping b ∩ VP that have size at least rb/(8k′γd), where rb denotes the radius of b. Finally, we let C̃P = ∪_{b∈B̃P} C̃b.

We claim that: (i) the total charge allocated to all the cells in X by the above procedure is O(nγ^(d−1) log(2/(ǫγ))), and (ii) each cell c ∈ X receives enough charge to pay for the representatives stored with c. To prove (i), note that Lemma 5 implies that for any ball b ∈ B̃P, |C̃b| is O(γ^(d−1)). Since the number of balls in B̃P is O(log(2/(ǫγ))), it follows that each dumbbell allocates a unit charge to at most O(γ^(d−1) log(2/(ǫγ))) cells in X. Recalling that the number of dumbbells in D is O(n), (i) follows. To prove (ii), we will show that cell c receives charge from at least Ω(|R̃c′|) dumbbells.
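As a rough illustration of the family of balls B̃P, the sketch below lists their radii for given parameters. It is not taken from the paper: the function name is hypothetical, the logarithm is assumed to be base 2 (the text writes only "log"), and k′ is passed in as a plain number because its value depends on the constant k of Lemma 8.

```python
import math

def charging_ball_radii(ell, eps, gamma, k_prime, d):
    """Radii 2^i * ell for 0 <= i <= ceil(log2(40 k' d / (eps * gamma))).

    ell      distance between the two head centers of the dumbbell
    eps      approximation parameter, 0 < eps <= 1/2
    gamma    trade-off parameter, gamma >= 2 and eps * gamma <= 1
    k_prime  the constant k' = k + 1 (value assumed by the caller)
    d        dimension
    """
    if not (0 < eps <= 0.5 and gamma >= 2 and eps * gamma <= 1):
        raise ValueError("need 0 < eps <= 1/2 and 2 <= gamma <= 1/eps")
    i_max = math.ceil(math.log2(40 * k_prime * d / (eps * gamma)))
    return [ell * 2 ** i for i in range(i_max + 1)]
```

With ǫ = 1/2, γ = 2, k′ = 3 and d = 2 this yields nine radii, from ℓ up to 256ℓ. The number of radii grows only logarithmically in 1/(ǫγ) for fixed d and k′, which is the count used in the proof of claim (i).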
In light of (i) and the fact that Rc′ ⊆ R̃c′, this would clearly imply the desired bound on Σ_{c∈X} |Rc′|.

We first identify a set of points R̂c′ ⊆ R̃c′ such that |R̂c′| = Ω(|R̃c′|) and the distance between any pair of points in R̂c′ is at least ǫγ² sc/16. We can obtain the set R̂c′ using the following procedure. We start with R̂c′ being empty and consider the quadtree boxes in Uc one by one. For each quadtree box u examined, we add any one point of Rc,u to R̂c′, and eliminate the at most 3^d quadtree boxes in Uc that share a common boundary with u from future consideration. We continue in this manner until all the quadtree boxes are eliminated. It is clear that this process yields a set R̂c′ with the properties mentioned above.

We next show that there are at least |R̂c′| − 1 distinct dumbbells that separate pairs of points in R̂c′, and that each of these dumbbells allocates a unit charge to cell c. Clearly this would imply (ii). Consider the following process for identifying a set of dumbbells. At each step, we find the dumbbell in D that separates the closest pair of points among the remaining points in R̂c′. We then eliminate one of these two points and continue this process with the remaining points. We stop when only one point in R̂c′ remains. Obviously this process finds a set of |R̂c′| − 1 dumbbells. Noting that, at each step, each head of the dumbbell identified contains exactly one point among the points in R̂c′ that have not yet been eliminated, it is clear that all the dumbbells obtained are distinct.

It remains to show that each of these |R̂c′| − 1 dumbbells allocates a charge to cell c. In fact, we will show that any dumbbell P that separates two points x, y ∈ R̂c′ allocates a charge to c. Observe first that VP must intersect c since x and y are both the nearest neighbor of some point in c. Let x′ and y′ denote the centers of the heads of P, let z denote the center of the line segment x′y′, and let ℓ = |x′y′|.
Using the definition of well-separatedness and the triangle inequality, it follows that |xx′| ≤ ℓ/8, |yy′| ≤ ℓ/8, and 4|xy|/5 ≤ ℓ ≤ 4|xy|/3. Since x and y are both contained in k′γbc, it is easy to show that both ℓ and the distance between z and any point in c are less than 2k′γsc d. Also, since the distance between x and y is at least ǫγ² sc/16, it follows that ℓ ≥ ǫγ² sc/20. This implies that the radius of the largest ball in B̃P is at least 2k′γsc d, and so there must be a ball in B̃P that overlaps c ∩ VP. Let rb denote the radius of the smallest ball b ∈ B̃P that overlaps c ∩ VP. If b is the smallest ball in B̃P, then its radius rb is ℓ, which is less than 2k′γsc d. Otherwise, rb cannot exceed twice the closest distance between z and c ∩ VP, which implies that rb is less than 4k′γsc d. Recalling that P allocates a charge to all cells overlapping b ∩ VP that have size at least rb/(8k′γd), it is clear that c receives a charge from P. This completes the proof.

To complete the construction, we weight each cell of this AVD with the associated number of representatives. We then build a weighted BBD tree for these cells. Letting t = 1/(ǫγ)^((d−1)/2), we apply property (v) of BBD trees to produce a truncated tree having O(nǫ^((d−1)/2) γ^(3(d−1)/2) log(2/(ǫγ))) cells. For each cell in the resulting tree, the number of representatives is no greater than the sum of its associated weights. From this we have the following result.

Theorem 2. Let S be a set of n points in IR^d, and let 0 < ǫ ≤ 1/2 and 2 ≤ γ ≤ 1/ǫ be two real parameters. We can construct an (O(1/(ǫγ)^((d−1)/2)), ǫ)-approximate Voronoi diagram for S that consists of O(nǫ^((d−1)/2) γ^(3(d−1)/2) log(2/(ǫγ))) cells, where each cell is the difference of two cubes. Moreover, for any query point, we can return its ǫ-NN in time O(log(nγ) + 1/(ǫγ)^((d−1)/2)). Here the constants in the O-notation are independent of ǫ and γ.

5. LOWER BOUNDS

Our lower bound construction is parameterized by the number of points n, the dimension d, the approximation factor ǫ and the desired number of representatives per cell t. Intuitively, to make the number of cells large we must create a situation in which many cells are small. To make cells as small as possible, we distribute t + 1 sites evenly on a (k − 1)-dimensional sphere centered at the origin. The set of points in IR^d that are equidistant from these points lies on a linear subspace J of dimension d − k. Any cell that has a significant overlap with J has t + 1 contenders for the nearest neighbor for each point in the cell. If this cell is relatively close to the origin we can derive an upper bound on the size of such a cell. We will see that as k increases this size bound decreases. However, as k increases, the dimension of J decreases, and hence the number of overlapping cells decreases. Thus we face a tradeoff. Our construction takes the dimension k as a parameter, and then derives the value of k that produces the best lower bound. We make ⌊n/(t + 1)⌋ widely distributed copies of this configuration to produce the final lower bound.

The main results of this section are provided below. The aspect ratio of a rectangle is defined to be the ratio between its longest and shortest side lengths.

Theorem 3. Consider AVDs in which each cell is an axis-aligned box of bounded aspect ratio or the difference of two such boxes.
(i) Given t and ǫ, there exists a set of n sites in IR^d such that the number of cells in any (t, ǫ)-AVD for this set is Ω(n (1/ǫ)^((d−1) + c − 2√(2c(d−1)))), where c = −(ln t)/(ln ǫ) and c ≤ (d−1)/2.
(ii) The ratio between the upper bound provided in Theorem 1 and this lower bound is O((1/ǫ)^(2√(2c(d−1)) − 4c) log γ) ≤ O((1/ǫ)^((d−1)/2) log γ).

Remark: Recalling the observation that the log γ factor in the upper bound can be avoided if γ = 1/ǫ, it follows that in the extreme case t = 1 (c = 0) the upper and lower bounds match asymptotically.
Also, when γ is a constant, the factor log γ can be ignored, and hence it follows that in the other extreme case t = 1/ǫ^((d−1)/2) (c = (d − 1)/2) the upper and lower bounds match.

The remainder of this section is devoted to proving Theorem 3. Define H to be the linear subspace that is orthogonal to the diagonal vector whose coordinates are all 1, that is, w = (w1, . . . , wd) ∈ H if Σ_i wi = 0. Let k, 1 ≤ k ≤ d − 1, be an integer parameter to be fixed later, and let J be some (d − k)-dimensional linear subspace of H. Let K be the k-dimensional orthogonal linear subspace of J. Let U be the intersection of a unit sphere with K. The principal properties of J, K and U are the following.
(i) The angle between any coordinate vector and a nonzero vector of H is bounded below by a constant (depending on dimension). Because J ⊆ H, this applies to J as well.
(ii) Given any set of points on U, every point on J is equidistant from all of these points.
(iii) A set of points is δ-sparse if for each point in the set its nearest neighbor in the set is at distance at least δ. From standard results it follows that the unit sphere U has a θ-sparse set of cardinality Ω(1/θ^(k−1)).

Recalling that t is the number of representatives, define θ to be maximum such that there is a θ-sparse set on U of size at least t + 1. Let Sk denote this set. By the observation above θ is Ω(1/t^(1/(k−1))). Let α = 18ǫ/θ². We show that no cell of an AVD of S can contain a large ball centered on J and close to the origin. Let BJ be the intersection of a ball of unit radius centered at the origin and J.

Lemma 10. Consider a (t, ǫ)-AVD for the set S, whose cells are the differences of axis-aligned rectangles of bounded aspect ratio. Consider a ball b of radius α centered at some point of BJ. Then no cell of the AVD can contain b.

To sketch the proof we observe that if such a cell existed, then by property (ii) every point of S is a nearest neighbor to some point in the cell.
Since S has t + 1 points, one point p ∈ S is not a representative for this cell. Consider the point q on the boundary of the ball that is closest to p. Because S is θ-sparse, and the cell is relatively close to S, it can be shown that the distance of every other point in S to q exceeds the length |qp| by ǫ, providing a contradiction.

Let C denote the set of cells of the (t, ǫ)-AVD that intersect BJ. For each c ∈ C, define cα to be the set of points of the interior of c that are at distance at least α from c’s boundary. Clearly, if cα intersects BJ, then c contains a ball of radius α that is centered at a point in BJ, and by the previous lemma, no such cell can be in the AVD. Otherwise, by the fact that AVD cells are fat and axis-aligned and by property (i) above, it follows that the diameter of the intersection of any cell and BJ is O(α). By applying a simple packing argument to BJ it follows that the number of cells of the AVD that intersect BJ is at least

Ω((1/α)^(d−k)) = Ω((θ²/ǫ)^(d−k)) = Ω((1/(t^(2/(k−1)) ǫ))^(d−k)),

for t ≤ (1/ǫ)^((k−1)/2). To complete the construction, we replicate this configuration of points ⌊n/(t + 1)⌋ times at a sufficiently large distance from each other. Up to a constant factor depending on d, the total number of cells is at least L = (n/t) (t^(2/(k−1)) ǫ)^(k−d). To simplify this expression, let us express t as (1/ǫ)^c for some constant c. Our constraint on t implies that c ≤ (k − 1)/2. Thus we have the following lower bound.

L = n (1/ǫ)^((d−k)(1 − 2c/(k−1)) − c) = n (1/ǫ)^((d−1) + c − (k−1) − 2c(d−1)/(k−1)).

Since k is under our control, to get the best lower bound we maximize the exponent, by setting k − 1 = √(2c(d − 1)). It is easy to verify that if c ≤ (d − 1)/2 then this satisfies our prior constraint on t. As a result we have the following lower bound, which establishes Theorem 3(i).

L = n (1/ǫ)^((d−1) + c − 2√(2c(d−1))).

From Theorem 1 it follows that, ignoring constant factors and the log γ term, there exists a (t, ǫ)-AVD for our point set with the following number of cells.

U = n ǫ^((d−1)/2) γ^(3(d−1)/2).

To achieve t representatives we set γ = (1/ǫ)(1/t)^(2/(d−1)) and set t = (1/ǫ)^c to get

U = n (1/ǫ)^((d−1) − 3c).

To compare the upper and lower bound, we compute their ratio

ρ(c) = U/L = (1/ǫ)^(2√(2c(d−1)) − 4c).

To determine the worst-case ratio as a function of t, we maximize this expression over c, yielding c = (d − 1)/8 (and hence t = (1/ǫ)^((d−1)/8)). Note that this satisfies our earlier requirement that c ≤ (d − 1)/2. This yields the following upper bound on the ratio between the upper and lower bounds.

ρ(c) ≤ (1/ǫ)^((d−1)/2).

This establishes Theorem 3(ii).

6. REFERENCES

[1] S. Arya and T. Malamatos. Linear-size approximate Voronoi diagrams. In Proc. 13th ACM-SIAM Sympos. Discrete Algorithms, pages 147–155, 2002.
[2] S. Arya and D. M. Mount. Approximate range searching. Computational Geometry: Theory and Applications, 17:135–152, 2000.
[3] S. Arya, D. M. Mount, N. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. J. ACM, 45:891–923, 1998.
[4] P. B. Callahan and S. R. Kosaraju. A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields. J. ACM, 42:67–90, 1995.
[5] T. M. Chan. Approximate nearest neighbor queries revisited. Discrete Comput. Geom., 20:359–373, 1998.
[6] K. L. Clarkson. An algorithm for approximate closest-point queries. In Proc. 10th Annu. ACM Sympos. Comput. Geom., pages 160–164, 1994.
[7] R. M. Dudley. Metric entropy of some classes of sets with differentiable boundaries. J. Approx. Theory, 10:227–236, 1974.
[8] C. A. Duncan, M. T. Goodrich, and S. G. Kobourov. Balanced aspect ratio trees: Combining the advantages of k-d trees and octrees. J. Algorithms, 33:303–333, 2001.
[9] S. Har-Peled. A replacement for Voronoi diagrams of near linear size. In Proc. 42nd Annu. IEEE Sympos. Found. Comput. Sci., pages 94–103, 2001.
[10] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proc. 30th Annu. ACM Sympos. Theory Comput., pages 604–613, 1998.
[11] E. Kushilevitz, R. Ostrovsky, and Y. Rabani. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proc. 30th Annu. ACM Sympos. Theory Comput., pages 614–623, 1998.
[12] Y. Sabharwal, S. Sen, and N. Sharma. Improved space bound for approximate Voronoi diagram. Manuscript, 2001.
[13] J. Vleugels and M. Overmars. Approximating Voronoi diagrams of convex sites in any dimension. Internat. J. Comput. Geom. Appl., 8:201–222, 1998.
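
The exponent calculations of Section 5 can be checked numerically with a small sketch. The function names below are hypothetical and not from the paper; each function returns the exponent of 1/ǫ in the corresponding bound, with constant factors and the log γ term ignored, and k is treated as a real number for the purpose of the check.

```python
import math

def lower_exponent(d, c, k):
    """Exponent of 1/eps in the lower bound L for a given k (needs k > 1)."""
    m = k - 1
    return (d - 1) + c - m - 2 * c * (d - 1) / m

def best_lower_exponent(d, c):
    """Exponent of 1/eps in L after choosing k - 1 = sqrt(2c(d-1))."""
    return (d - 1) + c - 2 * math.sqrt(2 * c * (d - 1))

def upper_exponent(d, c):
    """Exponent of 1/eps in the upper bound U with t = (1/eps)^c."""
    return (d - 1) - 3 * c

def ratio_exponent(d, c):
    """Exponent of 1/eps in rho(c) = U / L."""
    return upper_exponent(d, c) - best_lower_exponent(d, c)
```

For d = 9 and c = (d − 1)/8 = 1, `ratio_exponent(9, 1.0)` evaluates to 4.0 = (d − 1)/2, and the exponent is 0 at both extremes c = 0 and c = (d − 1)/2 = 4, in line with the remark following Theorem 3.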