Christer Ericson Real Time Collision Detection en 324 386
Christer Ericson Real Time Collision Detection en 324 386
Christer Ericson Real Time Collision Detection en 324 386
Spatial Partitioning
Recall from Chapter 2 that pairwise processing is expensive, and that the idea of
broad-phase processing is to restrict pairwise tests to objects near enough that they
could possibly intersect. Spatial partitioning techniques provide broad-phase process-
ing by dividing space into regions and testing if objects overlap the same region of
space. Because objects can only intersect if they overlap the same region of space, the
number of pairwise tests is drastically reduced. This chapter explores three types of
spatial partitioning methods: grids, trees, and spatial sorting.
285
286 Chapter 7 Spatial Partitioning
Figure 7.1 Issues related to cell size. (a) A grid that is too fine. (b) A grid that is too coarse
(with respect to object size). (c) A grid that is too coarse (with respect to object complexity).
(d) A grid that is both too fine and too coarse.
1. The grid is too fine. If the cells are too small, a large number of cells must be updated
with associativity information for the object, which will take both extra time and
space. The problem can be likened to fitting elephant-size objects into matchbox-
size cells. Too fine of a grid can be especially bad for moving objects, due to the
number of object links that must be both located and updated.
2. The grid is too coarse (with respect to object size). If the objects are small and the grid
cells are large, there will be many objects in each cell. As all objects in a specific cell
are pairwise tested against each other, a situation of this nature can deteriorate to
a worst-case all-pairs test.
3. The grid is too coarse (with respect to object complexity). In this case, the grid cell
matches the objects well in size. However, the object is much too complex, affect-
ing the pairwise object comparison. The grid cells should be smaller and the objects
should be broken up into smaller pieces, better matching the smaller grid cell size.
4. The grid is both too fine and too coarse. It is possible for a grid to be both too fine and
too coarse at the same time. If the objects are of greatly varying sizes, the cells can
be too large for the smaller objects while too small for the largest objects.
Case 4 is addressed in more detail in Section 7.2, on hierarchical grids. Taking the
remaining cases into consideration, cell size is generally adjusted to be large enough
(but not much larger) to fit the largest object at any rotation. This way, the number
of cells an object overlaps is guaranteed to be no more than four cells (for a 2D grid;
eight for a 3D grid). Having objects overlap only a small number of cells is important.
7.1 Uniform Grids 287
It limits the amount of work required to insert and update objects in the grid, and to
perform the actual overlap tests.
Under the normal assumption of frame-to-frame noninterpenetration of objects,
with cells and objects being almost the same size the maximum number of objects in
a cell is bounded by a small constant. Consequently, resolving collisions within a cell
using an all-pairs approach results in only a small number of pairwise object tests.
With cell size selected, grid dimensions are determined by dividing the diameter
of the simulation space by the diameter of the cells. If memory use for storing the
grid itself is a concern, the cell size can be increased at the cost of possible decreased
performance. However, more grid cells do not necessarily have to translate into use
of more memory (more on this in the following sections).
In ray tracing a popular method for determining the grid dimensions has been the
n1/3 rule: given n objects, divide space into a k × k × k grid, with k = n1/3 . The ratio of
cells to objects is referred to as the grid density. Recent results indicate that a density
of 1 is not very good, and a much higher density is suggested [Havran99].
It is sometimes claimed that it is difficult to set a near-optimal cell size, and that
if cell size is not correctly set grids become expensive in terms of performance or
prohibitively memory intensive (see, for example, [Cohen95]). Practical experience
suggests that this is not true. In fact, for most applications although it is easy to pick
bad parameter values it is equally easy to pick good ones.
// Cell position
struct Cell {
Cell(int32 px, int32 py, int32 pz) { x = px; y = py; z = pz; }
int32 x, y, z;
};
Note that it is surprisingly easy to construct hash functions that although perhaps
not bad, are just not very good (and there are no guarantees the previous hash
function is particularly strong). For information on constructing more sophisticated
hash functions see [Binstock96], [Jenkins97], or [Knuth98].
Performing a hash mapping of this type has the added benefit of allowing the grid
to be unbounded in size. The mapping function wraps world coordinates into a finite
number of cells (thereby making boundary checks for accessing neighboring cells
unnecessary). Although the world may consist of an infinite amount of cells, only a
finite number of them are overlapped by objects at a given time. Thus, the storage
required for a hashed grid is related to the number of objects and is independent of
the grid size.
To avoid having to perform a hash table lookup to find out that a cell is in fact
empty and is not in the table, a dense bit array with 1 bit per cell in the grid can be
used as a quick pretest indicator of whether a cell is empty or not. (Of course, this
optimization is only possible by keeping the grid fixed in size.)
7.1 Uniform Grids 289
index: 0 1 n–1
Figure 7.2 A (potentially infinite) 2D grid is mapped via a hash function into a small number
of hash buckets.
To handle collisions (the mapping of two keys to the same bucket) in the hash
table, two methods are traditionally used: open hashing and closed hashing. In open
hashing (also known as separate chaining), each bucket is allowed to contain a linked
list of records. When a new entry is added to the hash table, the bucket index is
computed and the entry is added at the start of the linked list of the corresponding
bucket. Each of the stored records contains a key along with associated data. In this
case, the key would be the original cell coordinates and the data would be the head
of a linked list of objects contained in that cell. As this representation ends up with
buckets containing lists of lists, the code for updating the data structure as objects
are added or deleted from a grid cell and its corresponding bucket becomes slightly
more complicated than desired.
In contrast, closed hashing (or open addressing) stores records directly within the
hash buckets themselves. When the computed bucket index of a new entry points
at an occupied bucket, new bucket indices are computed (in a consistent manner)
until an empty bucket is found and the record is inserted. The simplest such method,
known as linear probing, simply tries subsequent buckets until one is found, wrapping
around at the end of the hash table if necessary. Although linear probing has good
cache behavior, it performs poorly as the hash table fills. Using linear probing, hash
table utilization should not be much higher than 50% for good performance. Other,
better but more complicated, probing approaches exist. See [Knuth98] for a detailed
analysis of the behavior of linear probing and alternative probing methods.
A problem with closed hashing is that objects cannot be directly deleted from the
hash table, but must be lazily deleted to avoid having subsequent searches fail. A
lazy deletion involves marking the slot so that it is free for an insertion yet is not
considered empty, which would (incorrectly) terminate a search of the hash table for
some other object.
290 Chapter 7 Spatial Partitioning
A third solution is to use the open hashing method but exclude the first-level
linked list and have the buckets directly contain a list of objects. This results in very
simple code with low overhead, but the lack of a stored key introduces two problems
that must be addressed. First, when performing tests against the content of a specific
cell it is now possible its bucket will contain both objects of interest as well as those
from other cells, which happened to map to the same bucket. To avoid spending time
peforming expensive narrow-phase tests on distant objects, in-depth tests must be
preceded by a bounding volume test to cull irrelevant objects. Such culling is a good
overall strategy anyway, in that nonoverlapping objects can share buckets regardless
of hash table representation.
Second, a related problem occurs when several cells are tested by the same query.
If two or more of the cells have been aliased to the same nonempty hash bucket, its
content will be checked several times for the same query. Fortunately, this seemingly
serious problem is easily solved by a time-stamping method in which hash buckets
are labeled each time they are queried. The label consists of a unique identifier that
changes for each query made. During a query, if a hash bucket is found labeled with
the current identifier the bucket has already been examined and can be ignored. Time
stamping is further described in Section 7.7.2.
A different approach to extending a grid when an object moves out of the region
covered by the grid is described in [Reinhard00]. Separating the grid into both logical
and physical parts, the logical grid is allowed to grow as the object moves outside the
grid. However, the extra cells thus introduced are mapped onto the existing physical
grid through a toroidal wrapping of logical cell coordinates modulo the size of the
physical grid. The size of the logical grid now determines whether an object near
the physical grid boundary must consider collisions against objects in its toroidal
neighbor cells on the opposite side of the grid.
m= ai
0≤i<n
7.1 Uniform Grids 291
G F C A
H B G F C A H B I E D
I E D
(a) (b)
Figure 7.3 (a) A grid storing static data as lists. (b) The same grid with the static data stored
into an array.
is allocated. Each cell is also associated with an index number bk into this array,
where
b0 = 0
bk = bk−1 + ak−1 .
The data is then in a second pass iterated over again. This time, the ak objects over-
lapping cell Ck are stored with help from the associated index number into the array
at indices bk through bk + ak − 1. Because all data for each cell is stored contigu-
ously in the array, and because no links are needed between the contiguous entries,
the resulting array-based grid uses less memory and is more cache efficient than a
representation based on linked lists.
C
C
E E D C
D
B D B
A A
D D E E B
A C
Figure 7.4 A 4 × 5 grid implicitly defined as the intersection of 9 (4 + 5) linked lists. Five
objects have been inserted into the lists and their implied positions in the grid are indicated.
4 11000
3
1 01010
00100
2
0 00101
Figure 7.5 A 4 × 5 grid implicitly defined as the intersection of 9 (4 + 5) bit arrays. Five
objects have been inserted into the grid. The bit position numbers of the bits set in a bit array
indicate that the corresponding object is present in that row or column of the grid.
A more novel storage approach is to allocate, for each row and column of the grid,
a bit array with 1 bit for each object in the world. Placing an object into the grid now
simply involves setting the bit in the bit array in both the row and column bit arrays
for the grid cell corresponding to the object (Figure 7.5).
An object exists in a grid cell if and only if the bit arrays of the row and column of
the cell both have the bit corresponding to that object set. Testing for the presence of
objects in a cell is therefore as simple as performing a bitwise AND operation on the
two bit arrays.
When the query object overlaps several grid cells, the distributive properties of
bitwise operators can be used to reduce the amount of computation needed. For
example, assume an object overlaps rows ri and ri+1 and columns cj and cj+1 . The
objects potentially intersected are given by the union of all bits set in the bitwise
7.1 Uniform Grids 293
intersection of each of the four grid cells (ri , cj ), (ri , cj+1 ), (ri+1 , cj ), and (ri+1 , cj+1 ). If
r[i] and c[j] correspond to the bit masks of row cell i and column cell j, respectively,
this method gives the resulting union b of bits as
b = (r[i] & c[j]) | (r[i] & c[j+1]) | (r[i+1] & c[j]) | (r[i+1] & c[j+1]);
However, the distributive properties of the bitwise operators allow this expression
to be more efficiently computed as
// Compute the merged (bitwise-or’ed) bit array of all overlapped grid rows.
// Ditto for all overlapped grid columns
for (int y = y1; y <= y2; y++)
294 Chapter 7 Spatial Partitioning
// Now go through the intersection of the merged bit arrays and collision test
// those objects having their corresponding bit set
for (int i = 0; i < NUM_OBJECTS_DIV_32; i++) {
int32 objectsMask = mergedRowArray[i] & mergedColumnArray[i];
while (objectsMask) {
// Clears all but lowest bit set (eg. 01101010 -> 00000010)
int32 objectMask = objectsMask & (objectsMask - 1);
// Get index number of set bit, test against corresponding object
// (GetBitIndex(v) returns log_2(v), i.e. n such that 2∧ n = v)
int32 objectIndex = GetBitIndex(objectMask) + i * 32;
TestCollisionAgainstObjectNumberN(objectIndex);
// Mask out tested object, and continue with any remaining objects
objectsMask ∧ = objectMask;
}
}
}
A simplified version of implicit grids using bit arrays can be used as a coarse
broad-phase rejection method. Instead of having a bit array indicating exactly what
objects are in each row and column of the grid, a single bit can be used to indi-
cate the presence of an object in a row or column. If no bit is set for the rows and
columns an object overlaps, the object cannot be in overlap and no further testing is
necessary.
Table 7.1 illustrates memory use for three different grid representations: the implicit
grid using bit arrays, a dense array (with double-linked lists), and a sparse array (also
using double-linked lists). Up to roughly 1,000 objects and a grid size of 100 × 100,
the implicit bit grid uses less memory than the other two representations. The implicit
bit grid also uses memory in a cache-friendly way and is overall a very efficient spatial
partitioning method.
Table 7.1 Memory use (in bytes) for three different grid representations: implicit grid (with bit arrays),
dense array (with double-linked list), and a sparse array (with double-linked list).
Grid size (m × n)
object-object tests. Should objects be placed in just one representative cell or all cells
they overlap? If just one cell, what object feature should be used to determine which
cell the object goes into? The next two sections look at these issues in the context of
how the test queries are issued: if they are issued one at a time (allowing objects to
move between successive tests) or if they are all issued simultaneously.
object. Thus, regardless of which other cells (if any) the current object overlaps all
neighboring cells and their contents must be tested for collision.
This makes a total of nine cells tested (27 cells in 3D), which is not very promising.
If instead the minimum corner of the axis-aligned bounding box has been used, most
neighboring cells have to be tested only if the current object actually overlaps into
them. The pseudocode for the 2D version of this test reads:
At worst, all nine cells will still have to be checked, but at best only four cells
have to be tested. This makes the minimum corner a much better feature for object
placement than the center point. On the other hand, if objects have been placed in
all cells they overlap the single object test would have to check only those exact cells
7.1 Uniform Grids 297
the AABB overlaps, as all colliding objects are guaranteed to be in those cells. The
test would thus simply read:
Four tested cells is now the worst case, and a single cell the best case. Although
this makes for the fastest test, it complicates the code for updating when objects move
(having to update up to eight cells in a 3D grid) and is likely to use more memory than
single-cell storage. In addition, collision between two objects can now be reported
multiple times and thus pair collision status must be maintained (as discussed in
Section 7.7.1).
B
A
Figure 7.6 Objects A and B are assigned to a cell based on the location of their top left-hand
corners. In this case, overlap may occur in a third cell. Thus, to detect intersection between
objects cells must be tested against their NE or SW neighbor cells.
This approach requires that all objects are tested, and thus these five tests cannot
be combined with any border overlap test (as was done earlier), because that would
prevent some necessary tests from occurring. For example, say object A does not
overlap into the east neighbor cell and that the neighbor cell is not tested against.
However, there could still be an object B in the east cell that overlaps into the cell to
its west (overlapping with object A), but because the east cell does not test against
its west neighbor any overlap between A and B would go undetected. Furthermore,
as some (but not all) collisions will be detected twice per frame this approach would
require that overlap status be recorded (for instance in a collision-pair matrix).
The case in which objects are placed in a single cell based on the minimum corner
point can be similarly simplified. Here, however, border overlap tests can be used to
minimize the number of neighboring cells checked further. This test now reads:
When testing neighboring cells, range checking of grid coordinates can be avoided
by placing a layer of sentinels (dummy cells) around the border of the grid.
A B
(a) (b)
Figure 7.7 (a) In a regular grid, grid cell A has eight neighbors. (b) In a hexagonal-type grid,
grid cell B has just six neighbors.
5
4
3 B D
2 F
1 A C E
A B C D E F
Figure 7.8 A small 1D hierarchical grid. Six objects, A through F, have each been inserted in
the cell containing the object center point, on the appropriate grid level. The shaded cells are
those that must be tested when performing a collision check for object C.
hierarchy works well. Let the cells at level 1 just encompass the smallest objects. Then
successively double the cell size so that cells of level k + 1 are twice as wide as cells
of level k. Repeat the doubling process until the highest-level cells are large enough
to encompass the largest objects. In general, this setup serves well to minimize the
number of cells an object overlaps. It is still possible to do better in specific instances,
however. For instance, if there are only two object sizes clearly two grid levels
suffice.
Testing whether an arbitrary object A is in collision with objects stored in the hgrid
is done by traversing through all hgrid levels. At each level, A is tested against not only
those grid cells its bounding volume overlaps but those neighboring cells that could
have objects extending into the cells that are overlapped (as discussed in Section
7.1.6). Note that it is not sufficient to check just the immediately neighboring cells to
the cell the object is in. A large object, when checked against the grid, would cover a
lot of cells at the lowest level and not just the immediate neighborhood.
If not just a single object, but all n objects are tested at the same time, testing does
not have to proceed over all hgrid levels. Instead, each object can start testing from
its insertion level. This follows, as smaller objects will test against the grid levels of
larger objects.
Overall, the hgrid structure is not much different from a tree structure, such as an
octree, implemented using arrays. However, one key feature that sets it apart from
the tree structures is the lack of a tree “top.” That is, the hgrid corresponds to just
the last few levels of the tree. The top levels of a tree rarely contain any objects and
are simply traversed just to reach the lower levels at which the objects reside. Hgrids
avoid these empty levels and gain extra speed in doing so.
To speed up the collision testing further, it is worthwhile keeping track of the total
number of objects on each grid level. Levels that contain no objects can then be
excluded from testing.
Hgrids can be very memory expensive when each grid level is allocated as a dense
array. For this reason they are best implemented using hashed storage, as described
in Section 7.1.3.
302 Chapter 7 Spatial Partitioning
struct HGrid {
uint32 occupiedLevelsMask; // Initially zero (Implies max 32 hgrid levels)
int objectsAtLevel[HGRID_MAX_LEVELS]; // Initially all zero
Object *objectBucket[NUM_BUCKETS]; // Initially all NULL
int timeStamp[NUM_BUCKETS]; // Initially all zero
int tick;
};
It is assumed that objects are inserted into the hgrid based on their bounding
spheres. For the sake of simplicity, the given code links objects within a bucket using
single-linked lists, with the list pointer embedded in the object. Use of a double-
linked list would allow removal in O(1) time, instead of O(n) time. Associating each
hgrid cell with an array (of object pointers) also allows removal in O(1) time. The latter
option requires some bookkeeping to handle the arrays overflowing, but is overall
a faster and more cache-friendly option. Let the object representation contain the
following variables.
struct Object {
Object *pNextObject; // Embedded link to next hgrid object
Point pos; // x, y (and z) position for sphere (or top left AABB corner)
float radius; // Radius for bounding sphere (or width of AABB)
int bucket; // Index of hash bucket object is in
int level; // Grid level for the object
... // Object data
};
Cells are here assumed to be square (cubical for a 3D hgrid). The ratio of the
diameter of the largest sphere that goes into a cell at a particular level with respect to
the cell side size is given by:
The smaller the ratio is, the fewer cells must be examined at a given level, on
average. However, it also means that the average number of objects per cell increases.
An appropriate ratio is application specific. When traversing from level to level, the
grid cell side increases by the following factor:
const float CELL_TO_CELL_RATIO = 2.0f; // Cells at next level are 2*side of current cell
// Add object to grid square, and remember cell and level numbers,
// treating level as a third dimension coordinate
Cell cellPos((int)(obj->pos.x / size), (int)(obj->pos.y / size), level);
int bucket = ComputeHashBucketIndex(cellPos);
obj->bucket= bucket;
obj->level = level;
obj->pNextObject = grid->objectBucket[bucket];
grid->objectBucket[bucket] = obj;
// Mark this level as having one more object. Also indicate level is in use
grid->objectsAtLevel[level]++;
grid->occupiedLevelsMask |= (1 << level);
}
Checking an object for collision against objects in the hgrid can be done as follows.
It is assumed that nothing is known about the number of objects tested at a time,
and thus all grid levels are traversed (except when it can be concluded that there are
no more objects to be found in the grid).
// If all objects are tested at the same time, the appropriate starting
// grid level can be computed as:
// float diameter = 2.0f * obj->radius;
// for ( ; size * SPHERE_TO_CELL_RATIO < diameter; startLevel++) {
// size *= CELL_TO_CELL_RATIO;
// occupiedLevelsMask >>= 1;
// }
7.2 Hierarchical Grids 305
// Has this hash bucket already been checked for this object?
if (grid->timeStamp[bucket] == grid->tick) continue;
grid->timeStamp[bucket] = grid->tick;
When, as here, a callback is used to call a function to handle two colliding objects
it is important that the callback function not change the hgrid data structure itself.
Because CheckObjAgainstGrid() is in the middle of operating on the data structure,
such changes would cause reentrancy problems, leading to missed collisions at best
and crashes at worst.
5
4
3 B B D
2 F
1 A A C C E E
A B C D E F
Figure 7.9 In Mirtich’s (first) scheme, objects are inserted in all cells overlapped at the
insertion level. As in Figure 7.8, the shaded cells indicate which cells must be tested when
performing a collision check for object C.
7.3 Trees 307
In that this test relies on knowing which pairs to test, pair information has to be
tracked separately. Whenever an object A is stored in a grid cell containing another
object B, a counter corresponding to the pair (A, B) is incremented, and the pair
is added to the tracking data structure. When either object moves out of one of
their common cells, the counter is decremented. When the counter reaches zero, the
pair is removed from the tracking data structure because they are no longer close.
Only pairs in the tracking data structure undergo narrow-phase collision detection
checking. The cost of moving an object is now much larger, but thanks to spa-
tial coherence the larger grid cells have to be updated only infrequently. Overall,
this scheme trades the cost of updating cells for the benefit of having to test fewer
grid cells.
7.3 Trees
Trees were discussed in the previous chapter in the context of bounding volume
hierarchies. Trees also form good representations for spatial partitioning. This section
explores two tree approaches to spatial partitioning: the octree and the k-d tree.
308 Chapter 7 Spatial Partitioning
6 3
7
1
0 4
5
Figure 7.10 Numbering of the eight child nodes of the root of an octree.
Figure 7.11 A quadtree node with the first level of subdivision shown in black dotted lines,
and the following level of subdivision in gray dashed lines. Dark gray objects overlap the first-
level dividing planes and become stuck at the current level. Medium gray objects propagate
one level down before becoming stuck. Here, only the white objects descend two levels.
To avoid duplication of geometry between multiple octree nodes, the original set of
primitives is stored in an array beforehand and a duplicate set of primitives is passed
to the construction function. Each of the duplicate primitives is given a reference to
the original primitive. When primitives are split, both parts retain the reference to the
original primitive. As tree leaf nodes are generated, rather than storing the split-up
duplicate geometry the references to the original set of primitives are stored. As the
tree is static, this work is best done in a preprocessing step.
Duplicating the primitives across overlapped octants works great for a static envi-
ronment. However, this approach is inefficient when the octree holds dynamic objects
because the tree must be updated as objects move. For the dynamic scenario, a better
approach is to restrict placement of objects to the lowest octree cell that contains
the object in its interior. By associating the object with a single cell only, a minimum
amount of pointer information must be updated when the object moves. A drawback
of restricting objects to lie within a single octree cell is that objects may be higher in
the tree than their size justifies. Specifically, any object straddling a partitioning plane
will become“stuck”at that level, regardless of the relative size between the object and
the node volume (Figure 7.11). This problem is addressed later. From here on, it is
assumed the octree is of the dynamic variety.
The octree can begin as an empty tree grown on demand as objects are inserted.
It can also be preallocated to alleviate the cost of dynamically growing it. Given
the previous octree node definition, preallocating a pointer-based octree down to a
specified depth can be done as:
else {
// Construct and fill in ’root’ of this subtree
Node *pNode = new Node;
pNode->center = center;
pNode->halfWidth = halfWidth;
pNode->pObjList = NULL;
Because objects have been restricted to reside in a single cell, the simplest linked-
list solution for handling multiple objects in a cell is to embed the linked-list next
pointer in the object. For the sample code given here, the object is assumed to be
structured as follows.
struct Object {
Point center; // Center point for object
float radius; // Radius of object bounding sphere
...
Object *pNextObject; // Pointer to next object when linked into list
};
if (!straddle) {
if (pTree->pChild[index] == NULL) {
pTree->pChild[index] = new Node;
...initialize node contents here...
}
InsertObject(pTree->pChild[index], pObject);
} else {
...same as before...
}
for collisions can be done as a recursive top-down procedure in which the objects of
each nonempty node are checked against all objects in the subtree below the node.
// Tests all objects that could possibly overlap due to cell ancestry and coexistence
// in the same cell. Assumes objects exist in a single cell only, and fully inside it
void TestAllCollisions(Node *pTree)
{
// Keep track of all ancestor object lists in a stack
const int MAX_DEPTH = 40;
static Node *ancestorStack[MAX_DEPTH];
static int depth = 0; // ’Depth == 0’ is invariant over calls
comparisons with the coordinates of the octree nodes. Let the point’s coordinates be
given as three floats x, y, and z. Remap each of these into binary fractions over the range
[0 . . . 1]. For instance, if the octree’s x range spans [10 . . . 50], then an x coordinate of
25 would map into the fraction (25 − 10)/(50 − 10) = 0. 375 = 0. 01100000. Note
that this does not require either all dimensions to be the same or that they be powers
of 2. By virtue of construction, it should be clear that the most significant bit of the
binary fraction for the x coordinate specifies whether the point lies to the left of the
octree yz plane through the octree center point (when the bit is 0) or to the right of
it (when the bit is 1). Taking the three most significant bits of the binary fractions for
x, y, and z together as a unit, they form the index number (0–7) of the root’s child
node, in which the point lies. Similarly, the three next-most significant bits specify
the subnode within that node in which the point lies, and so on until a leaf node has
been reached.
The resulting binary string of interleaved bits of the three binary fractions can be
seen as a key, a locational code, directly describing a node position within the tree.
Nodes output in sorted order based on this locational code are said to be in Morton
order (Figure 7.12).
Given the locational code for the parent node, a new locational code for one of
its child nodes is easily constructed by left-shifting the parent key by 3 and adding
(or ORing) in the parent’s child index number (0–7) for the child node childKey =
(parentKey << 3) + childIndex. Locational codes can be used in a very memory-
efficient octree representation, explored next.
been enhanced to contain its own locational code. The locational codes for both the
parent node and all children nodes can be computed from the stored locational code.
As such, the octree nodes no longer need explicit pointers to the children, thereby
becoming smaller.
The size of a node can be explicitly stored or it can be derived from the “depth”of
the locational code. In the latter case, a sentinel bit is required in the locational code
to be able to distinguish between, say, 011 and 000000011, turning these into 1011
and 1000000011, respectively. The code for this follows.
To be able to access the octree nodes quickly given just a locational code, the
nodes are stored in a hash table using the node’s locational code as the hash key. This
hashed storage representation provides O(1) access to any node, whereas locating an
arbitrary node in a pointer-based tree takes O(log n) operations.
To avoid a failed hash table lookup, the node is often enhanced (as previously) with
a bitmask indicating which of the eight children exist. The following code for visiting
all existing nodes in a linear octree illustrates both how child nodes are computed
and how the bitmask is used.
// Takes three 10-bit numbers and bit-interleaves them into one number
uint32 Morton3(uint32 x, uint32 y, uint32 z)
{
// z--z--z--z--z--z--z--z--z--z-- : Part1By2(z) << 2
// -y--y--y--y--y--y--y--y--y--y- : Part1By2(y) << 1
// --x--x--x--x--x--x--x--x--x--x : Part1By2(x)
// zyxzyxzyxzyxzyxzyxzyxzyxzyxzyx : Final result
return (Part1By2(z) << 2) + (Part1By2(y) << 1) + Part1By2(x);
}
The support function Part1By2() splits up the bits of each coordinate value,
inserting two zero bits between each original data bit. Interleaving the intermedi-
ate results is then done by shifting and adding the bits. The function Part1By2() can
be implemented as:
// Takes two 16-bit numbers and bit-interleaves them into one number
uint32 Morton2(uint32 x, uint32 y)
{
return (Part1By1(y) << 1) + Part1By1(x);
}
If sufficient bits are available in the CPU registers, the calls to the support function
can be done in parallel. For example, using a modified Part1By1() function that
operates on 32-bit numbers giving a 64-bit result, the Morton2() function presented
previously can be written as:
In general, for the 2D case just presented, handling n-bit inputs would require
4n-bit numbers being available in the intermediate calculations.
(a) (b)
Figure 7.13 (a) The cross section of a regular octree, shown as a quadtree. (b) Expanding the
nodes of the octree, here by half the node width in all directions, turns the tree into a loose
octree. (The loose nodes are offset and shown in different shades of gray to better show their
boundaries. The original octree nodes are shown as dashed lines.)
7.3 Trees 319
depth at which an object will be inserted can be computed from the size of the object,
allowing for O(1) insertions.
This power does not come for free, however. The larger nodes overlap more neigh-
boring nodes, and more nodes have to be examined to determine whether any two
objects must be tested against each other. Still, in most cases the reduction in pairwise
tests makes loose octrees attractive for collision detection purposes.
Loose octrees are similar in structure to hierarchical grids. The key difference is
that octrees are rooted. This means that time is spent traversing the (largely) empty
top of the tree to get at the lower-level nodes where the objects are, due to objects
generally being much smaller than the world. In comparison, the hgrid is a shallower,
rootless structure providing faster access to the nodes where the objects are stored.
In addition, hgrid levels can be dynamically deactivated so that no empty levels
are processed and processing stops when no objects exist on the remaining levels.
A linearized or array-based octree allows similar optimizations. The corresponding
optimization for a pointer-based octree is to maintain a subtree object-occupancy
count for each node. The cost of updating these occupancy counts is proportional to
the height of the tree, whereas the hgrid level occupancy counters are updated in O(1)
time. Thanks to the automatic infinite tiling performed by the hash mapping, a hashed
hgrid does not need to know the world size. The octree cannot easily accommodate
an unbounded world without growing in height. Overall, hgrids seem to perform
somewhat better than loose octrees for collision detection applications.
A C I
D F
B
E
G
H I J
H
J A B C
D E F G
(a) (b)
Figure 7.14 A 2D k-d tree. (a) The spatial decomposition. (b) The k-d tree layout.
region it is in), nearest neighbor (find the point in a set of points the query point is
closest to), and range search (locate all points within a given region) queries.
The following code illustrates how to visit all nodes of a k-d tree a given sphere
overlaps. One interesting problem is correctly rejecting subtree volumes the sphere is
outside of, even when the sphere straddles one or more of the supporting hyperplanes
of the halfspaces forming the volume. The solution adopted here is to maintain the
point inside the volume closest to the sphere center during traversal. Initially this
variable, volNearPt, is set to the sphere center. As the traversal recurses the far side
of the splitting plane (the side the sphere is not on), the point is clamped to the
splitting plane to determine the closest point on the volume boundary. The distance
between volNearPt and the query sphere center can now be used to cull subtrees.
struct KDNode {
KDNode *child[2]; // 0 = near, 1 = far
int splitType; // Which axis split is along (0, 1, 2, ...)
float splitValue; // Position of split along axis
...
};
// Visit k-d tree nodes overlapped by sphere. Call with volNearPt = s->c
void VisitOverlappedNodes(KDNode *pNode, Sphere *s, Point &volNearPt)
{
if (pNode == NULL) return;
// Update (by clamping) nearest point on volume when traversing far side.
// Keep old value on the local stack so it can be restored later
float oldValue = volNearPt[pNode->splitType];
volNearPt[pNode->splitType] = pNode->splitValue;
// If sphere overlaps the volume of the far node, recurse that subtree too
if (SqDistPointPoint(volNearPt, s->c) < s->r * s->r)
VisitOverlappedNodes(pNode->child[first ∧ 1], s, volNearPt);
This code framework is easily extended to, for instance, locate the nearest neighbor
object to the query sphere by starting with a sphere of unbounded size, shrinking the
sphere as objects are encountered within the sphere. Note that the shape of the
regions in a k-d tree will affect the number of nodes being visited. For example, if
many narrow regions are next to each other, a wide query is likely to overlap all
regions. To limit the number of nodes being visited, it is best to avoid regions narrow
in one or more dimensions. Such nonnarrow regions are commonly referred to as fat
regions.
The efficiency of k-d trees (in the context of ray tracing) is discussed in
[MacDonald90]. An in-depth treatment of octrees, quadtrees, k-d trees, and similar
spatial data structures is found in [Samet90a] and [Samet90b].
(a) (b)
Figure 7.15 (a) A grid of trees, each grid cell containing a separate tree. (b) A grid indexing
into a single tree hierarchy, each grid cell pointing into the tree at which point traversal
should start.
it suffices to locate the grid cell the object is inside and follow the pointer to the
correct tree node. Using a grid in this manner can help bypass a large portion of the
tree during queries.
To handle the case of going to the correct node when an object is on a grid cell
boundary overlapping two or more cells, the grid cells can be made loose, similar to
the nodes in a loose octree. The grid cell that encompasses the object is now used
to find the entry point into the tree structure.
segment, 0 ≤ t < tmax, the segment straddles the plane and both children of the
tree are recursively descended. If not, only the side containing the segment origin is
recursively visited.
By first descending the side of the segment origin, overlapped k-d tree nodes are
guaranteed to be traversed in order near to far. Thus, assuming the closest intersection
point is sought, when a hit is detected among the node contents the test procedure
can exit early, not having to test any other nodes.
// Visit all k-d tree nodes intersected by segment S = a + t * d, 0 <= t < tmax
void VisitNodes(KDNode *pNode, Point a, Vector d, float tmax)
{
if (pNode == NULL) return;
if (d[dim] == 0.0f) {
// Segment parallel to splitting plane, visit near side only
VisitNodes(pNode->child[first], a, d, tmax);
} else {
// Find t value for intersection between segment and split plane
float t = (pNode->splitValue - a[dim]) / d[dim];
This procedure can be rewritten to avoid explicit recursion, in the manner described
in Chapter 6. Only small changes are required to accommodate traversals for octrees
and BSP trees. It is also possible to directly link the faces of a leaf to the leaf
nodes they border, allowing direct traversal from one leaf to the next. As a face
324 Chapter 7 Spatial Partitioning
(a) (b)
Figure 7.16 Cell connectivity for a 2D line. (a) An 8-connected line. (b) A 4-connected line.
In 3D, the corresponding lines would be 26-connected and 6-connected, respectively.
may border more than just one leaf node, a quadtree or k-d tree stored at the
face is used to divide the face area and link each area with the correct neighboring
leaf node.
(i, j) (i, j + 1) (i + 1, j)
ty ty
tx tx
Figure 7.17 Illustrating the values of tx, ty, tx, and ty. (a) tx is the distance between
two vertical boundaries. (b) ty is the distance between two horizontal boundaries. (c) For
cell (i, j ), the distance tx to the next horizontal boundary is less than the distance ty to the
next horizontal boundary, and thus the next cell to visit is (i + 1, j ).
horizontal boundary. If tx is less than ty, a vertical boundary will be pierced before the
next horizontal one is encountered, and the algorithm proceeds to step left or right
along the x axis (depending on the direction of the ray) to the next cell.
As the step is made, tx is updated to represent the distance to the next vertical
boundary. For uniform-size cells, the distance moved along the ray is constant from
one vertical boundary to the next, and thus the change made to tx can be precom-
puted. This value is called tx (Figure 7.17a). The stepping value from one horizontal
boundary to the next is similarly constant and kept as ty (Figure 7.17b).
In Figure 7.17c, the current cell is denoted (i, j). Because tx (the distance from the
ray origin to the next vertical boundary) is less than ty, the next cell visited is (i + 1, j).
Assuming the cell side to be M units wide, tx is given by
dx 2 + dy2
tx = M .
dx
For the line segment (x1 , y1 ) − (x2 , y2 ), dx = |x2 − x1 | and dy = y2 − y1 . The initial
values of tx and ty are computed slightly differently depending on which direction
the segment is pointing in. With the ray origin at (x1 , y1 ), for the given x coordinate
(x1 ) the nearest vertical boundary to the left has a value of minx = M x1 /M. The
nearest vertical boundary to the right has a value of maxx = minx + M. If the ray is
directed to the right, the value of tx can therefore be computed as
dx 2 + dy2
tx = (maxx − x1 ) .
dx
326 Chapter 7 Spatial Partitioning
兹苵苵苵苵苵苵苵苵苵
dx2 + dy2
兹苵苵苵苵苵苵苵苵苵
dx2 + dy2 tx = (maxx – x1)
tx = (x1 – minx) dx
dx
maxx = M x1 /M + M
minx = M x1 /M tx
minx = M x1 /M
(x1, y1)
(x1, y1)
tx
maxx = M x1 /M + M
x1 – minx maxx – x1
(a) (b)
Figure 7.18 Computing the initial value of tx (done analogously for ty) (a) for a ray directed
to the left and (b) for a ray directed to the right.
Instead of using the i and j coordinates to index into some array grid[i][j], it
is possible to maintain a pointer to the current cell and step the pointer directly by
adding appropriate values to the pointer instead of adding di and dj to a set of index
variables.
328 Chapter 7 Spatial Partitioning
for (;;) {
VisitCell(i, j, k);
if (tx <= ty && tx <= tz) { // tx smallest, step in x
if (i == iend) break;
tx += deltatx;
i += di;
} else if (ty <= tx && ty <= tz) { // ty smallest, step in y
if (j == jend) break;
ty += deltaty;
j += dj;
} else { // tz smallest, step in z
if (k == kend) break;
tz += deltatz;
k += dk;
}
}
It should be noted that as tx and ty are float variables it is possible that rounding
errors may cause the cells visited during traversal to be different from those visited by
an algorithm using exact arithmetic. If this is a concern, objects should be considered
as extended by some small epsilon before being assigned to the grid. It is also possi-
ble to turn this algorithm into an all-integer algorithm by using integer-only inputs
and multiplying through tx, ty, tx, and ty by the denominators involved in the
initialization code. An all-integer 4-connected line-drawing algorithm is described
in [Barerra03].
If the objects stored in the grid can be concave, it is important to make sure that the
intersection point of a detected intersection lies within the current cell. Otherwise,
when a concave object straddles multiple cells the object could have been hit several
cells away, and there might be a hit with a different object in any of the cells between.
Section 7.7 presents methods for avoiding having to perform multiple intersection
tests against the same objects in such situations as just described.
There are several possible optimizations that can be performed on ray queries on
uniform grids. For example, note that a ray at an angle passes through more cells
than an axis-aligned ray. Therefore, one possible optimization is to have several grids
oriented differently and pick the most aligned one to trace a ray through. Another
optimization is to apply a move-to-front reordering heuristic to objects in a cell. As
intersected objects are likely to be intersected again on the next query, having an early
upper bound on the distance to the nearest intersected object allows more distant
objects within the same cell to be quickly culled using a quick distance test. Additional
optimizations can be found in the ray-tracing literature [Glassner89], [Shirley00].
7.5 Sort and Sweep Methods 329
b0 b1 e0 b2 e1 e2 b3 b4 e3 e4
Figure 7.20 Objects clustered on the y axis (caused, for example, by falling objects settling
on the ground). Even small object movements can now cause large positional changes in the
list for the clustered axis.
In general, updating lists for n objects is expected time O(n). However, coherence
can break down due to clustering of objects. Specifically, sorting can deteriorate to
O(n2 ) on the clustered axis because of small world movements causing large positional
movements in the list. Unfortunately, such clustering is common. For instance, having
all objects on a surface, such as a floor, will cause tight clustering on the vertical axis
(Figure 7.20).
One solution to the global situation in which objects tend to cluster on, say, the
y axis is to track only the x and z lists. In most situations, sorting on just two axes is
likely to be sufficient, which also has the benefit of reducing the memory overhead of
this scheme. In some cases, it might even be worthwhile to pick axes other than the
normal x, y, and z ones to sort on. For instance, the x = y axis could be considered
in situations in which objects tend to cluster on x or y. Although this may lessen the
clustering problem for specific applications, the problem still remains. Section 7.5.2
suggests an alternative implementation avoiding the worst-case O(n2 ) behavior.
struct AABB {
Elem *pMin[3]; // Pointers to the three minimum interval values (one for each axis)
Elem *pMax[3]; // Pointers to the three maximum interval values (one for each axis)
Object *pObj; // Pointer to the actual object contained in the AABB
};
struct Elem {
AABB *pAABB; // Back pointer to AABB object (to find matching max/min element)
Elem *pLeft; // Pointer to the previous linked list element
7.5 Sort and Sweep Methods 331
Here, the lists themselves would be represented by the three list headers:
Elem *gListHead[3];
By merging minimum and maximum values into one element each, space is con-
served (as only a single back pointer to the AABB object is needed). In addition, the
combined structure needs just one flag value, as its contained coordinate values are
either min or max values. This results in the following structures.
struct AABB {
Elem *pMin; // Pointer to element containing the three minimum interval values
Elem *pMax; // Pointer to element containing the three minimum interval values
Object *pObj; // Pointer to the actual object contained in the AABB
};
struct Elem {
AABB *pAABB; // Back pointer to AABB object (to find matching max/min element)
Elem *pLeft[3]; // Pointers to the previous linked list element (one for each axis)
Elem *pRight[3]; // Pointers to the next linked list element (one for each axis)
float value[3]; // All min or all max coordinate values (one for each axis)
int minmax:1; // All min values or all max values?
};
Because the Elem structs do not move (only their contained pointers are being
relinked), this representation can be refined further by placing the structs contiguously
in memory. Given a pointer to either Elem struct, the corresponding min or max
element can always be found either before or after (respectively) the current one.
As the AABB struct was only needed to link the Elem structs, the pair of Elem structs
can effectively serve as the new AABB structure. Whereas previously AABB structs and
Elem structs were allocated separately, now the AABB struct contains all required data
and only a single structure need be allocated.
struct AABB {
Elem min; // Element containing the three minimum interval values
Elem max; // Element containing the three maximum interval values
332 Chapter 7 Spatial Partitioning
struct Elem {
Elem *pLeft[3]; // Pointers to the previous linked list element (one for each axis)
Elem *pRight[3]; // Pointers to the next linked list element (one for each axis)
float value[3]; // All min or all max coordinate values (one for each axis)
int minmax:1; // All min values or all max values?
};
Given a pointer to an Elem struct, the corresponding AABB is located in this way:
Compared to the previous representation, this saves four pointers (16 bytes at
4 bytes per pointer) per AABB. By embedding the structure inside the objects them-
selves, the pointer to the object would no longer be needed and an additional 4 bytes
would be saved. With a little trickery, the minmax bit fields could be moved from the
Elem structs into the AABB struct to save some additional memory. They could, in fact,
be embedded within one of the other structure fields and thus take up no extra space.
This is still quite an expensive structure at about 76 to 84 bytes per object, depending
on implementation. Further savings can be had by turning the 32-bit pointers into
16-bit indices, assuming that at no time will there be more than 65,536 objects (which
is a safe assumption for many applications).
In the following code it is assumed the three lists have been initialized to have
both start and end sentinels. Sentinels can be set up as follows.
enum {
MIN_ELEM = 0, // Indicates AABB minx, miny, or minz element
MAX_ELEM = 1 // Indicates AABB maxx, maxy, or maxz element
};
pSentinel->max.pLeft[i] = &pSentinel->min;
pSentinel->max.pRight[i] = NULL; // not strictly needed
pSentinel->min.value[i] = -FLT_MAX;
pSentinel->max.value[i] = FLT_MAX;
gListHead[i] = &pSentinel->min;
}
// Note backwardness of initializing these two
pSentinel->min.minmax = MAX_ELEM;
pSentinel->max.minmax = MIN_ELEM;
Inserting a new AABB into the sorted lists is done by scanning the lists for the
appropriate insertion point. To simplify the presentation here, the overlap pair sta-
tus is calculated in a separate pass after the new AABB has been inserted. To make
the code more efficient, the overlap pair status could be updated at the same time
objects are inserted by interleaving the routines. (Note that it is not possible to deter-
mine all overlapping pairs just by examining the Elem structs between the min and
max element. The latter test would miss overlap with a larger AABB, completely
encompassing the new AABB.)
To keep track of which collisions are active, the following code assumes the
existence of two functions AddCollisionPair() and DeleteCollisionPair() that
register and unregister, respectively, a pair of objects as being colliding.
// Insert min cell at position where pElem points to first larger element.
// Assumes large sentinel value guards from falling off end of list
while (pElem->value[i] < pAABB->min.value[i])
pElem = pElem->pRight[i];
pAABB->min.pLeft[i] = pElem->pLeft[i];
pAABB->min.pRight[i] = pElem;
pElem->pLeft[i]->pRight[i] = &pAABB->min;
pElem->pLeft[i] = &pAABB->min;
// Insert max cell in the same way. Can continue searching from last
// position as list is sorted. Also assumes sentinel value present
while (pElem->value[i] < pAABB->max.value[i])
pElem = pElem->pRight[i];
334 Chapter 7 Spatial Partitioning
pAABB->max.pLeft[i] = pElem->pLeft[i];
pAABB->max.pRight[i] = pElem;
pElem->pLeft[i]->pRight[i] = &pAABB->max;
pElem->pLeft[i] = &pAABB->max;
}
// Now scan through list and add overlap pairs for all objects that
// this AABB intersects. This pair tracking could be incorporated into
// the loops above, but is not done here to simplify the code
for (Elem *pElem = gListHead[0]; ; ) {
if (pElem->minmax == MIN_ELEM) {
if (pElem->value[0] > pAABB->max.value[0])
break;
if (AABBOverlap(pAABB, GetAABB(pElem)))
AddCollisionPair(pAABB, GetAABB(pElem));
} else if (pElem->value[0] < pAABB->min.value[0])
break;
}
}
After the position of an AABB has changed, the lists must be updated:
// This updating code assumes all other elements of list are sorted
void UpdateAABBPosition(AABB *pAABB)
{
// For all three axes
for (int i = 0; i < 3; i++) {
Elem *pMin = &pAABB->min, *pMax = &pAABB->max, *t;
// Try to move min element to the left. Move the roaming pointer t left
// for as long as it points to elem with value larger than pMin’s. While
// doing so, keep track of the update status of any AABBs passed over
for (t = pMin->pLeft[i]; pMin->value[i] < t->value[i]; t = t->pLeft[i])
if (t->minmax == MAX_ELEM)
if (AABBOverlap(pAABB, GetAABB(t)))
if (!HasCollisionPair(pAABB, GetAABB(t)))
AddCollisionPair(pAABB, GetAABB(t));
// If t moves from its original position, move pMin into new place
if (t != pMin->pLeft[i])
MoveElement(i, pMin, t);
7.5 Sort and Sweep Methods 335
}
}
The support function to move the AABB elements within the list is given by:
The preceding code should be a sufficient framework for implementing the addi-
tional code required to complete the linked-list implementation of sort and sweep.
struct AABB {
Point min;
Point max;
...
};
AABB *gAABBArray[MAX_OBJECTS];
For each pass of the sort-and-sweep algorithm one of the coordinate axes is
selected as the axis to sort along. (The selection is described later.) The selected
axis is specified through a global variable.
Given a sorting axis, the pointers are sorted in ascending order based on the
minimum value of the referenced AABB. To simplify the presentation, the following
code uses stdlib’s qsort(). The required comparison function needed for qsort() to
sort the AABBs is:
// Comparison function for qsort. Given two arguments A and B must return a
// value of less than zero if A < B, zero if A = B, and greater than zero if A > B
7.5 Sort and Sweep Methods 337
After the array has been sorted, it is scanned for overlapping boxes. To avoid
keeping track of currently active intervals as the array is swept, this particular imple-
mentation scans the array forward for overlapping objects to the end of the currently
tested box interval.
void SortAndSweepAABBArray(void)
{
// Sort the array on currently selected sorting axis (gSortAxis)
qsort(gAABBArray, MAX_OBJECTS, sizeof(AABB *), cmpAABBs);
During the sweep pass, the variances of the AABB centers along each of the three
axes are computed. After the pass is completed, the axis corresponding to the largest
variance is selected for use the next time the routine is called. This method does
not entirely avoid pathological cases. When objects are clustering in more than one
dimension, worst-case O(n2 ) behavior could still arise.
Insertions are O(1) by adding new AABBs after the position last used in the array.
The sorting pass absorbs the cost of bringing the AABBs to their proper place. When
an AABB is removed, its corresponding pointer also has to be removed from the
array. Given an AABB, a binary search over the array locates the pointer in O(log n)
time. The last pointer in the array replaces the pointer to be deleted. If removals are
frequent, a back pointer inside the AABB structure to the array pointer could be filled
in during the sweep pass, making deletions O(1) as well.
1 2 3
Figure 7.21 A simple portalized world with five cells (numbered) and five portals (dashed).
The shaded region indicates what can be seen from a given viewpoint. Thus, here only cells
2, 3, and 5 must be rendered.
function is called recursively for adjoining cells whose portals are visible to the
camera. During recursion, new portals encountered are clipped against the current
portal, narrowing the view into the scene. Recursion stops when either the clipped
portal becomes empty, indicating the new portal is not visible to the camera, or
when no unvisited neighboring cells are available.
The portals are implemented as non-self-intersecting polygons. The cells them-
selves are traditionally defined by convex polyhedra. Being convex, the polygons in
the cell can be rendered in any order without risk of overlap, thus obviating the
need to sort polygons. (However, with the Z-buffer capabilities of modern hardware
convex cells are no longer a requirement.) The following pseudocode illustrates an
implementation of this rendering procedure.
1 2 3
C
4
5
B
Figure 7.22 There are no objects in the cells overlapped by A, and thus object A does not
need to test against any objects. Objects B and C must be tested against each other, as C
crosses the portal between cells 3 and 5 and thus lies partly in the same cell as B.
bitindex = a(2n − a − 3) 2 + b − 1
instead of
bitindex = an + b.
The following code illustrates a possible implementation of a bit flag array for object-
object intersection testing.
// Allocate enough words to hold a bit flag for each object pair
const int32 MAX_OBJECTS = 1000;
342 Chapter 7 Spatial Partitioning
Although this works well for object-object testing, it is less than ideal in the ray-
shooting case. In the object-object scenario, the array of bit flags would have to be
reset once per frame, which is not too bad. For ray shooting, however, the array
would have to be reset for each ray tested! In a typical game there might be tens
or hundreds of ray-like tests performed each frame. Even for a modest number of
objects, this operation now quickly becomes very expensive.
1 2 3 4
C
A
B
Figure 7.23 Using time stamping avoids having to intersect the ray against object A twice,
in both cells 1 and 2, as the ray is traced through the uniform grid. By storing the computed
intersection with object B into a mailbox, the intersection can be reused in cell 3 without
having to be recomputed.
between A and the ray, A is simply tagged with the time stamp of the ray and the
ray processing continues to the next cell. In cell 2, the ray is again tested against A,
but because it has already been time stamped with the current time object A can be
safely ignored.
However, consider what happens with object B. It is also tested against in cell 1.
There is an intersection between B and the ray, but the intersection point is in cell 3.
Because the intersection point is not in the current cell, it is possible that there could
be another intersection in the cells between (and in fact there is, with object C in
cell 3). Thus, the ray processing must continue to subsequent cells. Before continuing
to the next cell, the intersection with B must be compared to the best intersection
stored so far to see if it constitutes a closer intersection. Because B happens to be
the first object intersected, this is clearly the case and a pointer to B along with the
parametric time at which the ray strikes B are saved as the currently best intersection.
Before B is finally time stamped, it also makes sense to update the number of cells
being traversed by the ray query. Because it is now known that there is an intersection
in cell 3, there is no need to continue processing into cell 4. In cell 2, objects A and
B have to be tested again, but both have been time stamped and are thus ignored,
with processing continued to cell 3. In cell 3, B is again tested and ignored because
of its time stamp. The ray is also tested against object C, which it intersects. As the
intersection with C is found to be closer than the stored intersection with object
B, the best intersection is updated to correspond to this new intersection. At this
point, because there are no more objects to test in cell 3 and the ray processing
was earlier set to stop in this cell the query can now return with the intersection
with C as the result after having performed the intersection test with each object
once only.
One slight problem remains. Regardless of the size used for the time stamp
counter, eventually it will overflow. If enough bits have been assigned for the
counter — which could well approach 64 bits for certain applications — the problem
can safely be ignored as not happening during the uptime of the application. Assume
for now that to conserve memory only an 8-bit counter was used. Every 256 ticks, the
time stamp counter will now overflow and at that time all stored time stamps will
have to be reset. If they are not, an object that has not been time stamped in 256 ticks
344 Chapter 7 Spatial Partitioning
and is suddenly tested would incorrectly report that it has already been tested in the
current time slice!
The time stamping technique described here has been described in the context
of ray tracing as mailboxes [Amanatides87]. Another approach to eliminating mul-
tiple intersections and redundant calculations is the residency masks described in
[Cychosz92].
In most applications, resetting the time stamp values will go unnoticed. However,
for applications with a huge number of objects this operation could become expensive.
For those applications, the solution is then to not reset all time stamp values at once.
This solution is explored in the next section.
Table 7.2 Indicating possible time stamp values for objects in the three blocks at various points. At the
point at which objects are about to be updated there is never an object with a time stamp
equal to the current global time stamp. Thus, even though only a few objects are cleared
each frame there is never a time stamp conflict.
Once at initialization, all object time stamp counters are cleared to zero.
blockToClear = 0;
tickCounter = 0;
for (i = 0; i < MAX_OBJECTS; i++)
object[i].timeStamp = tickCounter;
346 Chapter 7 Spatial Partitioning
During each frame the time stamp counters are used as normal.
At the end of each frame — or between rays in case of shooting rays — a subset of
all stored time stamps is cleared by marking them with the current time stamp value
(which has served its purpose for this frame).
// Reset the time stamp for all objects in the current block to be cleared
from = blockToClear * MAX_OBJECTS / MAX_COUNTER_VALUE;
to = (blockToClear + 1) * MAX_OBJECTS / MAX_COUNTER_VALUE;
for (i = from; i < to; i++)
object[i].timeStamp = tickCounter;
// Indicate that the next block should be cleared the next frame
if (++blockToClear >= NUM_BLOCKS)
blockToClear = 0;
// Wrap the global time stamp counter when it exceeds its maximum value
if (tickCounter >= MAX_COUNTER_VALUE - 1)
tickCounter = 0;
The same technique can be applied for amortized clearing of other buffers. For
example, by setting aside a few bits of the time stamp it could be used to spread the
clearing of a (software) Z-buffer over a number of frames.
7.8 Summary
Spatial partitioning is an approach to accelerating broad-phase processing by dividing
space into disjoint regions and associating scene objects with all regions they overlap.
7.8 Summary 347
Pairwise tests then must be performed only for those objects that share a region. In
some cases objects are associated with just a single region, in which case neighboring
regions also must be explored for potential collisions.
In this chapter, three types of spatial partitioning methods were explored: grids,
trees, and spatial sorting. Grids are often uniform, meaning that they consist of grid
cells of equal size. When large, grids may consume large amounts of memory. One
way to reduce the memory requirements of grids is to use hashed storage, as described
in Section 7.1.3, which has the benefit of allowing the grid to be of infinite extent.
Grids can also be expressed implicitly. The implicit bit grid, in particular, is a very
efficient spatial partitioning method. The hierarchical grid addresses the problem of
objects being of greatly varying sizes, making it difficult to set a grid cell size for a
uniform grid.
One of the tree representations described was the octree. It partitions space into
eight uniform-size cells, then recursively partitions the cells in the same way. Thanks
to their regular structure, octrees can be represented linearly, obviating the need to
represent tree links using pointers, and thereby saving memory. The octree can also
be made loose, meaning that its cells are expanded to overlap partially. Loose octrees
address the problem of objects not being able to descend farther down the tree when
they straddle a splitting plane (and are only allowed to be referenced from a single
octree cell).
The k-d tree is a generalization of the octree in that it can represent the same
spatial partitioning as the octree, but not vice versa. The k-d tree divides space into
two, along one dimension at a time and at an arbitrary position. The simpler structure
of the k-d tree makes queries on it easier.
Spatial sorting involves maintaining all objects in a sorted structure along one or
more axes. The sorted structure is incrementally updated when objects move. Because
the structure is sorted, object collisions are detected by examining neighboring objects
without requiring a search.
Chapter 8 discusses one additional tree representation, which is a generalization
of k-d trees. This representation can also be used as a spatial partitioning method.