IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 11, NO. 1, JANUARY/FEBRUARY 1999
Spatial Databases
Accomplishments and Research Needs
Shashi Shekhar, Senior Member, IEEE, Sanjay Chawla,
Siva Ravada, Member, IEEE, Andrew Fetterer, Member, IEEE,
Xuan Liu, Student Member, IEEE, Chang-tien Lu, Student Member, IEEE
Abstract—Spatial databases, addressing the growing data management and analysis needs of spatial applications such as
Geographic Information Systems, have been an active area of research for more than two decades. This research has produced a
taxonomy of models for space, spatial data types and operators, spatial query languages and processing strategies, as well as
spatial indexes and clustering techniques. However, more research is needed to improve support for network and field data, as well
as query processing (e.g., cost models, bulk load). Another important need is to apply spatial data management accomplishments to
newer applications, such as data warehouses and multimedia information systems. The objective of this paper is to identify recent
accomplishments and associated research needs of the near term.
Index Terms—Spatial databases, multidimensional, object-relational, databases, Geographic Information Systems.
1 INTRODUCTION
1.1 Spatial Databases
SPATIAL database [11], [15], [35] management systems
aim at the effective and efficient management of data
related to
• a space such as the physical world (geography, urban
planning, astronomy);
• parts of living organisms (anatomy of the human body);
• engineering design (very large scale integrated circuits,
the design of an automobile, or the molecular
structure of a pharmaceutical drug); and
• conceptual information space (a multidimensional
decision support system, fluid flow, or an electromagnetic field).
Spatial databases have been an active
area of research for more than two decades. The results of
this research, e.g., spatial multidimensional indexes, are
being used in a number of areas. The field of spatial databases can be defined by its accomplishments; current research is aimed at improving its functionality and its performance. The impetus for improving functionality comes
from the needs of existing applications such as Geographic
Information Systems (GIS) and Computer Aided Design
(CAD), as well as from potential applications such as Multimedia Information Systems (MMIS), Data Warehousing
(DWH), and NASA’s Earth Observation System (EOS). The
acceptance of GIS as an important tool in governmental
decision-making is also documented [34], and military
planners have embraced GIS technology at all levels of tactical, operational, and strategic planning, including battlefield visualization and terrain analysis [20].
• S. Shekhar, S. Chawla, X. Liu, and C.-t. Lu are with the Computer Science
Department, University of Minnesota, 200 Union St. SE, Minneapolis,
MN 55455. E-mail: {shekhar, chawla, xliu, ctlu}@cs.umn.edu.
• S. Ravada is with the Oracle Corporation.
• A. Fetterer is with Panttaja Consulting Group in San Francisco.
Manuscript received 3 June 1997; revised 13 Aug. 1998.
For information on obtaining reprints of this article, please send e-mail
to: [email protected], and reference IEEECS Log Number 108311.
Commercial examples of spatial database management
include Informix’s spatial data-blades (i.e., 2D, 3D, Geodetic), Oracle’s Universal server with either Spatial Data
Option or Spatial Data Cartridge and ESRI’s Spatial Data
Engine (SDE). Research prototype examples of spatial database management systems include spatial datablades with
Postgres [30], Predator, and Paradise [9]. The functionalities
provided by these systems include a set of spatial data
types such as a point, line-segment and polygon, and a set
of spatial operations such as inside, intersection, and distance. The spatial types and operations may be made part
of a query language such as SQL, which allows spatial querying when combined with an object-relational database
management system [6], [32]. The performance enhancement provided by these systems includes a multidimensional spatial index and algorithms for spatial access
methods, spatial range queries, and spatial joins. Spatial indexing with concurrency control may be implemented in
the object-relational server for performance reasons.
Existing and emerging applications require new functionalities including the modeling of network spaces and
continuous fields. The performance needs of emerging applications require not only the management of large data
sets, but also new processing strategies for spatial set operations, field operations (e.g., slope), and network
analysis (e.g., shortest-path, route-evaluation).
1.2 Related Work and Our Contributions
Recent reports [11], [15], [35], [1] have described the accomplishments of spatial database research and have prioritized research needs. A broad survey of spatial database requirements and an overview of research results
are provided in [35], [11], [1]. Basic modeling requirements for spatial objects such as points, lines, and polygons
are given in terms of their geometry, topology and object
relationships (topological, directional, metric, network). Requirements are given for other user-level issues such as
graphical input and output and query language support.
Spatial clustering and indexing techniques [23] such as
Grid-files, Z-order, Quad-tree, Kd-trees, R-trees [12], and
associated join strategies are described. Finally, an architecture for spatial databases is given in terms of the object-relational model.
Research needed to improve the performance of spatial
databases in the context of object-relational databases was
listed in [15]. The primary research needs identified were
concurrency control techniques for spatial indexing methods, the development of cost models for query strategies,
and the development of new spatial join algorithms beyond
nested-loop and tree matching.
Many of the research needs identified in [15] have since
been addressed. For example, concurrency control techniques for R-trees have been studied in the context of R-link
trees [16]. Also, new spatial join strategies using space partitioning [22] have been explored. In this paper, we identify
the recent accomplishments in spatial databases as well as
current research needs, based on publications in journals
and conference proceedings and recent commercial trends.
1.3 Scope and Outline
The role of the spatial database component is dependent on
the type of database management system (DBMS) involved:
relational, object-oriented or object-relational. In this paper,
we focus the discussion of spatial databases in the context
of the object-relational [6], [32], [31] databases, which provide extensibility to many components of traditional databases to support new application domains. These and other
important issues including architectural options, Raster
DBMS and Network spaces are covered in detail in our
forthcoming book [24]. Spatial databases have been one of
the most common applications of object-relational databases and have influenced their design a great deal. Object-relational databases allow the inclusion of spatial data types, spatial operations, and multidimensional indexing
systems. This three-layer architectural framework is shown
in Fig. 1, and it consists of an object-relational database
management system, a spatial database, and a spatial application such as a GIS or MMIS. The interface between the
application and the spatial data system maps application-specific constructs to the spatial database. The spatial database associates the application requirements with the functionality provided by the DBMS. The interface to the DBMS
supports specialized query processing, which in turn supports the core database requirements for achieving acceptable performance.
Emerging trends such as World Wide Web interfaces,
multimedia data, and image processing are likely to impact
the data sharing and analysis needs of spatial databases.
Scaling up to large datasets requires new research in many
areas beyond spatial databases, including research on filesystems, device-drivers for tertiary storage, computer networks, and visualization software and algorithms related to
graphics and computational geometry. This paper does not
explore those issues.
The remainder of the paper is organized as follows:
Section 2 describes the recent advances in spatial databases.
Section 3 states the research needs for spatial databases.
Section 4 highlights our conclusions and motivates exploration of applications whose needs are not currently met by
spatial databases.
2 ACCOMPLISHMENTS
Research into spatial databases has mainly focused on developing a space taxonomy, spatial data models, spatial
query languages and processing strategies, and spatial access methods. This section lists recent important accomplishments, not only for the current applications of spatial
databases, but also for the emerging database problems that
have spatial dimensions.
2.1 Space Taxonomy
Space is a framework to formalize specific relationships
among a set of objects. Depending on the relationships of
interest, different models of space such as set-based
space, topological space, Euclidean space, metric space and
network space can be used [35]. Set-based space uses the
basic notion of elements, element-equality, sets and membership to formalize the set relationships such as set-equality,
subset, union, cardinality, relation, function, and convexity. Relational and object-relational databases use this model
of space.
Topological space uses the basic notion of a neighborhood and points to formalize the extended object relationships such as boundary, interior, open, closed, within, connected, and overlaps, which are invariant under elastic deformation. Combinatorial topological space formalizes relationships such as Euler’s formula (#faces + #vertices − #edges =
1 for planar configuration). Network space is a form of
topological space in which the connectivity property among
nodes formalizes graph properties such as connectivity, isomorphism, shortest-path, and planarity.
Euclidean coordinatized space uses the notion of a coordinate system to transform spatial properties and relationships to properties of tuples of real numbers. Metric spaces
formalize the distance relationships using positive symmetric functions that obey the triangle inequality. Many multidimensional applications use Euclidean coordinatized
space with metrics such as distance.
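As an illustration of the metric-space requirements just described, the following minimal Python sketch checks non-negativity, symmetry, and the triangle inequality on a finite sample of points. The function names and the sample points are assumptions introduced for this example only; passing the check on a sample does not prove that a function is a metric, but a single failure shows that it is not.

```python
import itertools
import math


def euclidean(p, q):
    """Straight-line distance between two coordinate tuples."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_euclidean(p, q):
    """Squared distance; symmetric and non-negative, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def is_metric_on(points, d, tol=1e-9):
    """Check the metric properties of d on a finite sample of points."""
    for p, q in itertools.product(points, repeat=2):
        if d(p, q) < 0 or abs(d(p, q) - d(q, p)) > tol:
            return False
    for p, q, r in itertools.product(points, repeat=3):
        if d(p, r) > d(p, q) + d(q, r) + tol:
            return False
    return True


# Hypothetical sample: three collinear points plus one off the line.
sample = [(0, 0), (1, 0), (2, 0), (1, 3)]
```

On this sample, the Euclidean and Manhattan distances pass, while the squared Euclidean distance fails the triangle inequality for the three collinear points.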
2.2 Spatial Data Model and Query Language
A spatial data model [25], [35] is a type of data-abstraction
that hides the details of data-storage. There are two common models of spatial information: field-based and object-based. The field-based model treats spatial information
such as altitude, rainfall, and temperature as a collection of
spatial functions transforming a space-partition to an attribute domain. The object-based model treats the information space as if it were populated by discrete, identifiable, spatially referenced entities. The operations on spatial objects
include distance and boundary. The operations on fields include local, focal, and zonal operations, as shown in Table 2.
The fields may be continuous, differentiable, discrete, and
isotropic or anisotropic, with positive or negative autocorrelation. Certain field operations (slope or interpolation)
assume certain field properties (differentiable or positive
autocorrelation).
Fig. 1. Three-layer architecture.
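The three classes of field operations named above can be sketched on a small raster. The Python example below is an illustrative assumption rather than the contents of Table 2: it takes a local operation to be a cell-by-cell function, a focal operation to be a function of each cell's 3 x 3 neighborhood (clipped at the raster edge), and a zonal operation to be an aggregate over the cells sharing a zone label. The function names, the elevation grid, and the zone grid are hypothetical.

```python
def local_op(field, fn):
    """Local operation: apply fn independently to every cell."""
    return [[fn(v) for v in row] for row in field]


def focal_mean(field):
    """Focal operation: mean over each cell's 3 x 3 neighborhood."""
    rows, cols = len(field), len(field[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [field[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def zonal_sum(field, zones):
    """Zonal operation: sum the field values within each zone label."""
    totals = {}
    for field_row, zone_row in zip(field, zones):
        for value, zone in zip(field_row, zone_row):
            totals[zone] = totals.get(zone, 0) + value
    return totals


# Hypothetical 3 x 3 elevation field and a zone layer of the same shape.
elevation = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
zones = [["a", "a", "b"],
         ["a", "b", "b"],
         ["c", "c", "c"]]
```

For instance, `zonal_sum(elevation, zones)` aggregates the nine cells into one total per zone, whereas `focal_mean(elevation)` returns a raster of the same shape as its input.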
An implementation of a spatial data model in the context
of object-relational databases consists of a set of spatial data
types and the operations on those types. Much work has
been done over the last decade on the design of spatial Abstract Data Types (ADTs) and their embedding in a query
language. Consensus is slowly emerging via standardization efforts, and recently the OGIS consortium [21] has proposed a specification for incorporating 2D geospatial ADTs
in SQL. The spatial data-type hierarchy, illustrated in Fig. 3, consists of Point, Curve, and Surface classes and a
parallel class of Geometry Collection. The basic operations
operative on all datatypes are shown in Table 1. The topological operations are based on the ubiquitous nine-intersection model [10]. Using the OGIS specification,
common spatial queries can be intuitively posed in SQL. For
example, the query “Find all lakes which have an area greater
than 5 sq. km. and are within 20 km. from the campgrounds” can
be posed as shown in Fig. 2a.
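To make the semantics of the lakes-and-campgrounds query concrete, the following Python sketch evaluates it directly on in-memory geometry. It is not the SQL of Fig. 2a and does not use the OGIS functions; the helper names, the data, and the simplifications (planar coordinates in kilometers, lakes as simple polygons, campgrounds as points lying outside every lake) are assumptions made for illustration.

```python
import math


def polygon_area(ring):
    """Shoelace formula; ring lists (x, y) vertices without repeating the first."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_segment_distance(p, a, b):
    """Distance from point p to the closed segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(p, ring):
    """Distance from p to the polygon boundary (assumes p is outside)."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(ring, ring[1:] + ring[:1]))


def lakes_near_campgrounds(lakes, campgrounds, min_area=5.0, max_dist=20.0):
    """Names of lakes with area > min_area that are within max_dist of a campground."""
    result = []
    for name, ring in lakes.items():
        if polygon_area(ring) <= min_area:
            continue  # selection on Area()
        if any(distance_to_polygon(c, ring) < max_dist
               for c in campgrounds.values()):
            result.append(name)  # join predicate on Distance()
    return sorted(result)


# Hypothetical data, coordinates in km.
lakes = {
    "big_near": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "small_near": [(10, 0), (11, 0), (11, 1), (10, 1)],
    "big_far": [(100, 100), (110, 100), (110, 110), (100, 110)],
}
campgrounds = {"camp_a": (10, 3)}
```

Here only the first lake satisfies both the area selection and the distance predicate; the second is too small and the third is too far from the campground.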
Other example GIS queries which can be implemented
using OGIS operations are provided in Table 3. The OGIS
specification is confined to topological and metric operations on vector data types. Other interesting classes of operations are network, direction, dynamic, and the field operations of focal, local, and zonal (see Table 2). While standards for field-based raster data types are still emerging,
Map Algebra [33], specifically designed for cartographic
modeling, and RaSQL, based on Image Algebra [3] and designed for
general multidimensional discrete objects (satellite images,
X-rays, etc.), are important milestones.
2.3 Spatial Query Processing
The efficient processing of spatial queries requires both efficient representation and efficient algorithms. Common representations of spatial data in an object model include spaghetti, the node-arc-area (NAA) model, the doubly-connected-edge-list (DCEL), and boundary representation [17], some of
which are shown in Fig. 4 using entity-relationship diagrams. The NAA model differentiates between the topological concepts (node, arc, areas) and the embedding space
(points, lines, areas). The spaghetti-ring and DCEL focus on
the topological concepts. The representation of the field data
model includes a regular tessellation (triangular, square, hexagonal grid), as well as triangular irregular networks (TIN).
TABLE 1
REPRESENTATIVE FUNCTIONS SPECIFIED BY OGIS [21]
TABLE 2
A SAMPLE OF SPATIAL OPERATIONS
TABLE 3
TYPICAL SPATIAL QUERIES FROM GIS
Fig. 2. (a) SQL query with spatial operators; (b) corresponding query tree.
Fig. 3. Spatial data type hierarchy [21].
The spatial queries [7], shown in Table 3, are often processed using filter and refine techniques. Approximate geometry such as the minimal orthogonal bounding rectangle
of an extended spatial object is first used to filter out many
irrelevant objects quickly. Exact geometry is then used for
the remaining spatial objects to complete the processing.
Strategies for range-queries include a scan and index-search
in conjunction with the plane-sweep algorithm [5]. Strategies for the spatial-join include the nested loop, tree
matching [5], when indices are present on all participating
relations, and space partitioning [22], in the absence of indices. To speed up computation for large spatial objects (it is
common for polygons to have 1,000 or more edges), object
indices are used in extended filtering. Strategies such
as object approximation and tree matching originated in
spatial-databases, and can potentially be applied in other
domains with similar characteristics.
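The filter and refine technique described above can be sketched for a simple point query ("which polygons contain point p?"). In this minimal Python example, the filter step tests the minimal orthogonal bounding rectangle and the refine step applies an exact ray-casting test to the surviving candidates. The function names and the three polygons are assumptions for illustration; a real system would obtain the candidates from a spatial index rather than a linear scan.

```python
def mbr(ring):
    """Minimal orthogonal bounding rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_contains(box, p):
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def point_in_polygon(p, ring):
    """Exact test by ray casting; points on the boundary are not special-cased."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_and_refine(objects, p):
    """Return (candidates after the filter step, hits after the refine step)."""
    candidates = [name for name, ring in objects.items()
                  if mbr_contains(mbr(ring), p)]
    hits = [name for name in candidates
            if point_in_polygon(p, objects[name])]
    return candidates, hits


# Hypothetical extended objects.
objects = {
    "triangle": [(0, 0), (10, 0), (0, 10)],
    "square": [(20, 20), (30, 20), (30, 30), (20, 30)],
    "l_shape": [(0, 0), (8, 0), (8, 2), (2, 2), (2, 8), (0, 8)],
}
```

For the point (7, 7), the filter step keeps the triangle and the L-shape because their bounding rectangles contain the point, but the refine step rejects both; the square is discarded by the cheap test alone.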
Fig. 4. Entity relationship diagrams for common representations of spatial data.
Fig. 5. Space-filling curves to linearize a multidimensional space.
2.4 Spatial File Organization and Indices
The physical design of a spatial database optimizes
the instructions to storage devices for performing common operations on spatial data files. File designs for secondary storage include clustering methods as well as
spatial hashing methods. The design of spatial clustering
techniques is more difficult than the design of
traditional clustering because there is no natural order in
the multidimensional space where spatial data resides. This is
further complicated by the fact that the storage disk is
logically a one-dimensional device. Thus, what is needed is a
mapping from a higher dimensional space to a one-dimensional space that is distance-preserving, so that elements that are close in space are mapped onto nearby
points on the line, and one-to-one, so that no two points in the space
are mapped onto the same point on the line [2]. Several
mappings, none of them ideal, have been proposed to accomplish this. The most prominent ones include row-order,
z-order, and the Hilbert-curve (see Fig. 5).
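As a small illustration of such a mapping, the Python sketch below computes a Z-order key by interleaving the bits of two non-negative integer cell coordinates. The function names, the choice to place x in the even bit positions, and the 4 x 4 grid are assumptions made for this example.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def row_order(x, y, width):
    """Row-major key for comparison."""
    return y * width + x


# All cells of a hypothetical 4 x 4 grid, sorted along the Z-curve.
cells = [(x, y) for y in range(4) for x in range(4)]
by_z = sorted(cells, key=lambda c: z_order(*c))
```

The mapping is one-to-one on the grid, but it is not distance-preserving everywhere: cells (3, 1) and (0, 2) receive the consecutive keys 7 and 8 although they lie far apart in the grid, which illustrates why none of the proposed mappings is ideal.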
Metric clustering techniques use the notion of distance to
group nearest neighbors together in a metric space. Topological clustering methods like connectivity-clustered access methods [27] use the min-cut partitioning of a graph
representation to efficiently support graph traversal operations. The physical organization of files can be supplemented with indices, which are data-structures to improve
the performance of search operations.
Classical one-dimensional indices such as the B+ tree can
be used for spatial data by linearizing a multidimensional
space using a space-filling curve such as the Z-order (see
Fig. 5). A large number of spatial indices [23] have been
explored for multidimensional Euclidean space. Representative indices for point objects include Grid files, multidimensional grid files [18], Point-Quad-Trees, and Kd-trees.
Representative indices for extended objects include the
R-tree family, the Field tree, Cell tree, BSP tree, and Balanced and Nested grid files.
One of the first access methods created to handle extended objects was Guttman’s R-tree structure [12]. The
R-tree is a height-balanced natural extension of the B+ tree
for higher dimensions. Objects are represented in the R-tree
by their minimum bounding rectangles (MBRs). Nonleaf
nodes are composed of entries of the form (R, child-pointer),
where R is the MBR of all entries contained in the node referenced by child-pointer. Leaf nodes contain the MBRs of the data objects. To
guarantee good space utilization and height-balance, the
parent MBRs are allowed to overlap. Fig. 6a illustrates the
spatial objects organized in an R-tree, while Fig. 6b shows
the file structure where the nodes correspond to disk pages.
Many variations of the R-tree structure exist whose main
emphasis is on discovering new strategies to maintain the
balance of the tree in case of a split and to minimize the
overlap of the MBRs in order to improve the search time.
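The node layout and the search procedure can be sketched as follows. This Python example builds a two-level tree by hand and answers a window query; it omits insertion, node splitting, and disk paging, so it illustrates only the structure described above and is not Guttman's full algorithm. The class and function names, and the four data rectangles, are assumptions for this sketch.

```python
from dataclasses import dataclass


def intersects(a, b):
    """Rectangles are (xmin, ymin, xmax, ymax); touching counts as intersecting."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(boxes):
    """MBR of a collection of rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


@dataclass
class Node:
    leaf: bool
    entries: list  # leaf: (mbr, object_id); nonleaf: (mbr, child Node)


def make_leaf(objects):
    return Node(True, list(objects))


def make_inner(children):
    # Each entry stores the MBR of everything in the referenced child.
    return Node(False, [(union([box for box, _ in child.entries]), child)
                        for child in children])


def search(node, window):
    """Return ids of all objects whose MBR intersects the query window."""
    hits = []
    for box, item in node.entries:
        if intersects(box, window):
            if node.leaf:
                hits.append(item)
            else:
                hits.extend(search(item, window))
    return hits


# Hypothetical data: the two leaf MBRs, (0,0,3,3) and (2,2,8,8), overlap.
leaf1 = make_leaf([((0, 0, 2, 2), "a"), ((1, 1, 3, 3), "b")])
leaf2 = make_leaf([((2, 2, 5, 5), "c"), ((6, 6, 8, 8), "d")])
root = make_inner([leaf1, leaf2])
```

Because the parent MBRs overlap, a window that falls in the overlap region forces the search to descend into both leaves, which is why R-tree variants try to minimize overlap.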
Concurrency control for spatial access methods [16] is
provided by the R-link tree, which is a variant of the R-tree
with additional sibling pointers that allow the tracking of
modifications. Concurrency is provided during operations
such as search, insert, and delete. The R-link tree is also
recoverable in a write-ahead logging environment.
Fig. 6: (a) Spatial objects (bold) arranged in R-tree hierarchy; (b) R-tree
file structure on disk.
2.5 Other Accomplishments
Spatial applications like NASA’s Earth Observation System
(EOS) have some of the largest data sets encountered in
any application to date. This has prompted new research in
database-file design for storage on tertiary storage devices
such as juke-boxes. Representative results include those
from the Sequoia 2000 project [30]. High-performance spatial applications such as flight simulators with geographic
accuracy have triggered the development of new parallel
formalizations for the range query and the spatial join
query, including declustering methods and dynamic-load
balancing techniques for multidimensional spatial data [28],
[19]. Other interesting developments include hierarchical
algorithms for shortest path computation [14] and view
materialization [26].
3 RESEARCH NEEDS
Spatial databases are being used for an increasing number
of new applications, such as Intelligent Transportation
Systems, NASA’s Earth Observation System, Multimedia
Information Systems (MMIS) and Data Warehouses. This
section lists representative research needs.
3.1 Space Taxonomy
Many spatial applications manipulate continuous spaces of
different scales and with different levels of discretization. A
sequence of operations on discretized data can lead to
growing errors similar to the ones introduced by finite-precision arithmetic on numbers. There are preliminary
results [11] on the use of discrete basis and bounding errors
with peg-board semantics. Another related problem concerns interpolation to estimate the continuous field from a
discretization. Negative spatial autocorrelation makes interpolation error-prone. Further work is needed on a
framework to formalize the discretization process, its associated errors, and on interpolation.
3.2 Spatial Data Model
Spatial data models have been developed for topological,
metric and coordinatized Euclidean space. The OGIS specification alluded to in Section 2.2 is confined to topological
operators [8], and more work is needed to incorporate relationships which involve directional [29] and metric properties (see Table 2 for examples). In addition, there has been
very little work toward developing data models, data types
(e.g., node, edge, path), and a kernel set of operations (e.g.,
get-successors, shortest path) for network space, despite
their critical role in applications like transportation and
utility management (telephone, gas, electric).
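A kernel of network operations of the kind mentioned above could look like the following Python sketch, which offers get-successors and shortest-path operations over nodes and directed, weighted edges. The class name, method names, and the use of Dijkstra's algorithm with non-negative edge costs are assumptions for illustration; the paper does not prescribe a particular data type design.

```python
import heapq


class Network:
    """Directed graph with non-negative edge costs."""

    def __init__(self):
        self.adj = {}

    def add_edge(self, u, v, cost):
        self.adj.setdefault(u, []).append((v, cost))
        self.adj.setdefault(v, [])

    def get_successors(self, u):
        """Return (node, cost) pairs reachable from u by one edge."""
        return list(self.adj.get(u, []))

    def shortest_path(self, source, target):
        """Dijkstra's algorithm; returns (path, cost) or (None, inf)."""
        dist = {source: 0.0}
        prev = {}
        heap = [(0.0, source)]
        done = set()
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            if u == target:
                break
            for v, w in self.get_successors(u):
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if target not in dist:
            return None, float("inf")
        path = [target]
        while path[-1] != source:
            path.append(prev[path[-1]])
        return path[::-1], dist[target]


# Hypothetical road network.
roads = Network()
for u, v, cost in [("A", "B", 4), ("A", "C", 1), ("C", "B", 2),
                   ("B", "D", 5), ("C", "D", 8)]:
    roads.add_edge(u, v, cost)
```

On this network, the cheapest route from A to D goes through C and B, even though it uses more edges than the alternatives.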
Similarly, there is a need for developing the field data
model [33] toward a field-based query language. Operations on fields will be needed to help derive new information such as land-cover classification; the fields involved
include temperature, texture, and water content, and are
obtained through imaging in different bands such as infrared, visible bands, or microwave.
3.3 Spatial Query Processing
Many open research areas exist at the logical level of query
processing, including query-cost modeling and strategies
for nearest neighbor, bulk loading as well as queries related to fields and networks. Cost models are used to
rank and select the promising processing strategies, given a
spatial query and a spatial data set. Traditional cost models may not be accurate in estimating the cost of strategies
for spatial operations, due to the distance metric as well
as the semantic gap between relational operators and spatial operations. Cost models are needed to estimate the selectivity of spatial search and join operations so that the execution costs of alternative processing strategies for spatial operations can be compared during query optimization. Preliminary work in the context of the R-tree, tree-matching
join, and fractal-models is promising [4], [36], but more
work is needed.
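To indicate what a selectivity estimate for a spatial range query can look like, the following Python sketch uses a simple uniformity assumption; it is not the fractal-dimension model of [4]. It assumes that data rectangles of average extent s are uniformly distributed in the unit square and ignores boundary effects, so a query window of extent q intersects a data rectangle with probability of roughly (q_x + s_x)(q_y + s_y). The function names are hypothetical.

```python
def estimate_selectivity(query_size, avg_object_size):
    """Estimated fraction of objects intersecting the query window.

    Both arguments are per-dimension extents in a unit data space.
    Assumes uniformly distributed objects and ignores boundary effects.
    """
    selectivity = 1.0
    for q, s in zip(query_size, avg_object_size):
        selectivity *= min(1.0, q + s)
    return selectivity


def estimated_result_size(n_objects, query_size, avg_object_size):
    """Expected number of objects returned by the range query."""
    return n_objects * estimate_selectivity(query_size, avg_object_size)
```

For example, under these assumptions a 0.1 x 0.2 query window over 10,000 objects of average size 0.05 x 0.05 is expected to return about 375 objects. Real spatial data is rarely uniform, which is one reason more accurate models are needed.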
Similarly, common strategies employed in traditional
databases for the logical transformation step in query optimization may not always be applicable in the context of spatial databases. For example, consider the query in Fig. 2a.
Let us assume that the Area() function is not precomputed and that its value is computed afresh every time it
is invoked. A query tree generated for the query is shown
in Fig. 2b.
In the classical situation, the rule “select before join”
would dictate that the Area() function be computed before
the join predicate function, Distance() (Fig. 7a), the underlying assumption being that the computational costs of executing the select and join predicates are equivalent and
negligible compared to the I/O cost of the operations. In
the spatial situation, the relative cost per tuple of Area() and
Distance() is an important factor in deciding the order of
the operations [13]. Depending upon the implementation of
these two functions, the optimal strategy may be to process
the join before the select operation (see Fig. 7b).
Fig. 7. (a) Area() before Distance(); (b) Distance() before Area().
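The trade-off between the two plans can be made concrete with a back-of-the-envelope cost comparison. The Python sketch below is a deliberately simplified model and not the predicate migration algorithm of [13]: it counts only the CPU cost of predicate evaluation, assumes a nested-loop join over all lake and campground pairs, and assumes that Area() is re-evaluated for every joined pair when the join runs first. All parameter names and numbers are hypothetical.

```python
def cost_select_before_join(n_lakes, n_camps, c_area, c_dist, sel_area):
    """Area() once per lake, then Distance() for each surviving pair."""
    return n_lakes * c_area + sel_area * n_lakes * n_camps * c_dist


def cost_join_before_select(n_lakes, n_camps, c_area, c_dist, sel_dist):
    """Distance() for every pair, then Area() for each pair passing the join."""
    return n_lakes * n_camps * c_dist + sel_dist * n_lakes * n_camps * c_area


def cheaper_plan(n_lakes, n_camps, c_area, c_dist, sel_area, sel_dist):
    """Return the name and estimated cost of the cheaper of the two plans."""
    a = cost_select_before_join(n_lakes, n_camps, c_area, c_dist, sel_area)
    b = cost_join_before_select(n_lakes, n_camps, c_area, c_dist, sel_dist)
    if a <= b:
        return "select-before-join", a
    return "join-before-select", b


# Case 1: both predicates cheap, Area() is selective.
plan_cheap_area = cheaper_plan(1000, 10, c_area=1, c_dist=1,
                               sel_area=0.1, sel_dist=0.01)
# Case 2: Area() is expensive and unselective, Distance() is selective.
plan_costly_area = cheaper_plan(1000, 10, c_area=500, c_dist=1,
                                sel_area=0.9, sel_dist=0.001)
```

With equal per-tuple costs the classical rule wins, but when Area() is far more expensive than Distance() and the distance predicate is highly selective, evaluating the join first is cheaper under this model.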
Many processing strategies using the overlap predicate
have been developed for range queries and spatial join queries. However, there is a need to develop and evaluate
strategies for many other frequent queries such as those in
Table 4. These include queries on objects using predicates
other than overlap and queries on fields such as slope
analysis as well as queries on networks such as the shortest
path to a set of destinations. Bulk loading strategies for
spatial data also need further study.
3.4 Spatial File Organization and Indices: Physical Level
Many file organizations and indices with distance metrics
have been developed for coordinatized Euclidean space.
However, little work has been done on file clustering and
on indices for network spaces such as road maps and
telephone networks. Further work is needed, both to characterize the access patterns of the graph algorithms that
underlie network operations and to design access methods.
TABLE 4
DIFFICULT SPATIAL QUERIES FROM GIS
The R-link tree [16] is among the few approaches available for concurrency control on the R-tree. New approaches
for concurrency-control techniques are needed for other
spatial indices. The data volume of emerging spatial applications such as NASA’s EOS is among the highest of any
database application. Sequoia 2000 [30] provides an approach toward tertiary storage files and indices. Other approaches for managing databases on tertiary storage need
to be investigated.
3.5 Other
Other research needs include benchmarking, workflow
modeling, and the visual presentation of results. The Sequoia 2000 [30] benchmark characterizes the data and queries in Earth Science applications. The performance of
loading data, raster queries, spatial selection, spatial joins,
and recursion is addressed in 11 benchmark queries. A few
more are provided in the Paradise system [9]. Similar
benchmarks are needed to characterize the spatial data
management needs of other applications such as GIS,
DWH, and transportation.
The workflow in some spatial applications such as GIS is
based on manipulating layers to produce new, derived layers. Typically, the layers are combined in a tree-based manner, starting with a large number of source layers and producing new layers until a final result layer is produced.
Information about dependence among layers is useful for
change propagation if the source layers are modified.
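Dependence information of this kind can be used to decide which derived layers must be recomputed, and in what order, after a source layer changes. The Python sketch below is one possible approach, assuming the dependencies form a directed acyclic graph and using the standard-library `graphlib` module (Python 3.9 or later). The function name and the layer names are hypothetical.

```python
from graphlib import TopologicalSorter


def layers_to_recompute(derived_from, changed):
    """List the derived layers affected by a change, inputs before outputs.

    derived_from maps each derived layer to the layers it is computed from;
    changed is the set of modified source layers.
    """
    stale = set(changed)
    order = []
    # static_order() yields every layer after all of the layers it depends on.
    for layer in TopologicalSorter(derived_from).static_order():
        if layer in derived_from and stale & set(derived_from[layer]):
            stale.add(layer)
            order.append(layer)
    return order


# Hypothetical GIS workflow: source layers are elevation, land_use, and roads.
derived_from = {
    "slope": ["elevation"],
    "suitability": ["slope", "land_use"],
    "buffer": ["roads"],
    "final": ["suitability", "buffer"],
}
```

In this workflow, a change to the elevation layer invalidates slope, suitability, and the final layer, while the road buffer can be reused as is.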
Spatial databases may require a different type of
concurrency support than is needed by traditional databases. For example, transactions in traditional systems tend
to be short (on the order of seconds). However, in spatial
databases, transactions can last up to a couple of
hours for editing and browsing. Similarly, recovery and
backup issues may also change, as the spatial objects tend
to be large (a few megabytes) when compared to their counterparts in traditional systems. There is a need to characterize the workflow of spatial applications.
Many spatial applications present results visually, in the
form of maps which consist of graphic images, 3D displays,
and animations. They also allow users to query the visual
representation by pointing to the visual representation using devices like a mouse or a pen. Further work is needed
to explore the impact of querying by pointing and visual
presentation of results on database performance.
4 SUMMARY AND DISCUSSION
In this survey, we have presented the major research accomplishments and techniques which have emerged from
the area of SDBMS. These include object-based data modeling, spatial data types, filter and refine techniques for
query processing and spatial indexing. We have also identified areas where more research is needed. Some of these
areas are spatial graphs, field-based modeling, cost modeling, concurrency control, query processing techniques,
and discretization and propagation error.
Many of the spatial techniques highlighted in this survey are being used in an increasing number of applications such as GIS, CAD, and EOS. We believe that other
emerging multidimensional applications such as multimedia information systems will use these methods to solve
problems such as searching and indexing spatial content.
We illustrate the possibilities in the context of multimedia
information systems with text, audio and video data over
the World Wide Web.
Multimedia data has a spatial content which can be queried using the same spatial operators that have become
popular in geographic information systems. For example,
the spatial operator inside of can be applied to text to locate
sentences that contain the word “multimedia.” Also, audio
is often broken into channels with each channel containing
input from a different source; for instance, trumpet, guitar,
and voice. These channels are analogous to layers in GIS
and can be manipulated similarly. A spatial join could determine all of the locations where the input from both piano
and voice is over a certain decibel threshold.
A video database such as a movie server can take advantage of techniques developed for spatial databases.
Consider the movie Toy Story: Each frame contains spatial
content with objects interacting in directional relationships.
For instance, Buzz Lightyear could be above the trees when
he is flying, and frames in the movie could be queried
based on those relationships. For example, if you cannot
remember when in the movie an important event occurred,
but you can remember that Buzz Lightyear was in front of a
tree, you would be able to query the movie using that relationship to determine when in the movie that event took
place. Such queries exploit the directional relationships inherent between all tangible objects.
ACKNOWLEDGMENTS
This work is sponsored, in part, by the United States Army
High Performance Computing Research Center under the
auspices of the Department of the Army, Army Research
Laboratory Cooperative Agreement No. DAAH04-95-2-0003
and Contract No. DAAH04-95-C-0008, the contents of which
do not necessarily reflect the position or the policy of the
government; no official governmental endorsement should
be inferred. This work was also supported, in part, by the
National Science Foundation under Grant No. 9631539. We
thank Professor Jaideep Srivastava for technical commentary and Christiane McCarthy for helping to improve the
readability of the paper.
REFERENCES
[1] N. Adam and A. Gangopadhyay, Database Issues in Geographical
Information Systems, Kluwer, 1997.
[2] T. Asano, D. Ranjan, T. Roos, E. Welzl, and P. Widmayer, “Space
Filling Curves and Their Use In The Design of Geometric Data
Structures,” Theoretical Computer Science, vol. 181, no. 1, pp. 3–15,
July 1996.
[3] P. Baumann, “Management of Multidimensional Discrete Data,”
VLDB J., special issue on spatial database systems, vol. 3, no. 4,
pp. 401–444, Oct. 1994.
[4] A. Belussi and C. Faloutsos, “Estimating the Selectivity of Spatial
Queries Using the ‘Correlation’ Fractal Dimension,” Proc. 21st
Int’l Conf. Very Large Data Bases, pp. 299–310, Zurich, Sept. 1995.
[5] T. Brinkhoff, H.-P. Kriegel, and B. Seeger, “Efficient Processing of
Spatial Joins Using R-Trees,” Proc. SIGMOD Conf. Management of
Data, pp. 237–246, Washington D.C., ACM, June 1993.
[6] D. Chamberlin, Using The New DB2: IBM’s Object Relational System, Morgan Kaufmann, 1997.
[7] N. Chrisman, Exploring Geographic Information Systems, John Wiley
and Sons, 1997.
[8] E. Clementini and P. Di Felice, “Topological Invariants for Lines,”
IEEE Trans. Knowledge and Data Eng., vol. 10, no. 1, pp. 38–54, 1998.
[9] D.J. DeWitt, N. Kabra, J. Luo, J.M. Patel, and J.-B. Yu, “Client-Server Paradise,” Proc. 20th Int’l Conf. Very Large Data Bases,
pp. 558–569, Santiago de Chile, Chile, Sept. 1994.
[10] M. Egenhofer, “Spatial SQL: A Query and Presentation Language,” IEEE Trans. Knowledge and Data Eng., vol. 6, no. 1, pp. 86–
95, 1994.
[11] R.H. Güting, “An Introduction to Spatial Database Systems,”
VLDB J., special issue on spatial database systems, vol. 3, no. 4,
pp. 357–399, 1994.
[12] A. Guttman, “R-Trees: A Dynamic Index Structure for Spatial
Searching,” Proc. SIGMOD Conf., Ann. Meeting, pp. 47–57, Boston,
ACM, 1984.
[13] J.M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates,” Proc. SIGMOD Int’l
Conf. Management of Data, pp. 267–276, Washington, D.C., ACM,
May 1993.
[14] N. Jing, Y. Huang, and E. Rundensteiner, “Hierarchical Encoded
Path Views for Path Query Processing: An Optimal Model and Its
Performance Evaluation,” IEEE Trans. Knowledge and Data Eng.,
vol. 10, no. 3, pp. 409–432, 1998.
[15] W. Kim, J. Garza, and A. Keskin, “Spatial Data Management in
Database Systems,” Proc. Third Int’l Symp. Advances in Spatial Databases, pp. 1–13, Lecture Notes in Computer Science 692, Springer-Verlag, Singapore, 1993.
[16] M. Kornacker and D. Banks, “High-Concurrency Locking in
R-Trees,” Proc. 21st Int’l Conf. Very Large Data Bases, pp. 134–145,
Zurich, Sept. 1995.
[17] R. Laurini and D. Thompson, Fundamentals of Spatial Information
Systems, Academic Press, 1992.
[18] J. Lee, Y. Lee, K. Whang, and I. Song, “A Physical Database Design Method for Multidimensional File Organization,” Information
Sciences, vol. 120, no. 1, pp. 31–65, Oct. 1997.
[19] D.-R. Liu and S. Shekhar, “A Similarity Graph-Based Approach to
Declustering Problems and Its Application Toward Parallelizing
Grid Files,” Proc. 11th Int’l Conf. Data Eng., pp. 373–381, Taipei,
Taiwan, Mar. 1995.
[20] U.S. Army Corps of Engineers, Topographic Engineering Center,
URL: http://www.tec.army.mil/gis-internet2.html.
[21] Open GIS Consortium, OpenGIS Simple Features Specification for
SQL, URL: http://www.opengis.org/public/abstract.html, 1998.
[22] J.M. Patel and D.J. DeWitt, “Partition Based Spatial-Merge Join,”
Proc. 1996 SIGMOD Int’l Conf. Management of Data, pp. 259–270,
Montreal, ACM, 1996.
[23] H. Samet, The Design and Analysis of Spatial Data Structures,
Addison-Wesley, 1990.
[24] S. Shekhar and S. Chawla, Spatial Databases: Concepts, Implementation and Trends, first draft, URL:
http://www.cs.umn.edu/Research/shashi-group/Book/index.html, 1998.
[25] S. Shekhar, M. Coyle, D.-R. Liu, B. Goyal, and S. Sarkar, “Data
Models in Geographic Information Systems,” Comm. ACM,
vol. 40, no. 4, pp. 103–111, 1997.
[26] S. Shekhar, A. Fetterer, and B. Goyal, “Materialization Trade-Offs
in Hierarchical Shortest Path Algorithms,” Proc. Fifth Int’l Symp.
Advances in Spatial Databases, Lecture Notes in Computer Science
1262, Springer-Verlag, pp. 94–111, 1997.
[27] S. Shekhar and D.-R. Liu, “A Connectivity-Clustered Access
Method for Networks and Network Computation,” IEEE Trans.
Knowledge and Data Eng., vol. 9, no. 1, pp. 102–119, Jan. 1997.
[28] S. Shekhar, S. Ravada, V. Kumar, D. Chubb, and G. Turner, “Parallelizing A GIS on A Shared Address Space Architecture,”
Computer, vol. 29, no. 12, Dec. 1996.
[29] S. Shekhar and X. Liu, “Direction As A Spatial Object,” Proc. GIS
Workshop, Maryland, ACM, Nov. 1998; also available at URL:
http://www.cs.umn.edu/Research/shashi-group/paper_list.html.
[30] M. Stonebraker, J. Frew, and J. Dozier, “The Sequoia 2000 Storage
Benchmark,” Proc. SIGMOD Conf. Management of Data, pp. 2–11,
Washington D.C., ACM, May 1993.
[31] M. Stonebraker and G. Kemnitz, “Postgres Next-Generation Database Management System,” Comm. ACM, vol. 34, no. 10, pp. 78–
92, 1993.
[32] M. Stonebraker and D. Moore, Object Relational DBMSs: The Next
Great Wave, Morgan Kaufmann, 1997.
[33] C.D. Tomlin, Geographic Information Systems and Cartographic Modeling, Englewood Cliffs, N.J.: Prentice Hall, 1990.
[34] UCGIS Congressional breakfast, URL: http://urban.rutgers.edu/ucgis,
1998.
[35] M.F. Worboys, Geographic Information Systems: A Computing Perspective, Taylor and Francis, 1995.
[36] Y. Theodoridis, E. Stefanakis, and T. Sellis, “Cost Models for Join
Queries in Spatial Databases,” Proc. 14th Int’l Conf. Data Eng.,
pp. 476–483, Orlando, Fla., Feb. 1998.
Shashi Shekhar received the BTech degree in
computer science from the Indian Institute of
Technology, Kanpur, India, in 1985; and the MS
degree in business administration and the PhD
degree in computer science from the University
of California, Berkeley, in 1989. He is currently
an associate professor in the Department of
Computer Science and an active member of the
United States Army High Performance Computing Research Center as well as the Center
for Transportation Studies at the University of
Minnesota, Minneapolis. His research interests include databases,
geographic information systems (GIS), and intelligent transportation
systems. He has published more than 100 research papers in refereed
journals, conference and workshop proceedings, and edited books. He
is a member of the IEEE Transactions on Knowledge and Data Engineering Editorial Board and the IEEE Computer Society Computer
Science Engineering Practice Board. He is guest editor of a special
issue of Communications of the ACM on GIS, and he was program cochair of the ACM International Workshop on Advances in GIS (1996).
His work in GIS includes databases for managing spatial networks
(e.g., road maps), parallelization of GIS, routing algorithms for Advanced Traveler Information Systems, and archiving traffic measurements. His group has developed some of the most efficient indexing
methods for large roadmaps and algorithms for path evaluation as well
as for computing shortest paths. His sponsors include the U.S. National Science Foundation, the Army Research Laboratories, Control
Data Inc., the USDOT, the MN/DoT, and the ITS Institute. His general
area of research is data and knowledge engineering. Currently, his
work is focused on storage, management, and analysis of scientific
and geographic data, information, and knowledge. The research is
motivated by, and has been applied to, application areas including
transportation (ITS), manufacturing, and finance. In data engineering
and database systems, his group has designed the Connectivity-Clustered Access Method (CCAM), a new storage and access method
for spatial networks which outperforms alternative schemes in carrying
out network computations. He has also worked on semantic query
optimization and high-performance geographic databases. In knowledge engineering, he has worked on the problem of discovery in databases. He has worked with symbolic machine learning techniques as
well as neural networks. He has designed one of the fastest scalable
parallel formulations of backpropagation learning algorithms for neural
networks; these parallel formulations compute in excess of 1 gigabyte of connections per second. He is a senior member of the IEEE,
and a member of the IEEE Computer Society, the ACM, and the AAAI.
Sanjay Chawla received his PhD degree in
mathematics at the University of Tennessee in
1995. He is now a postdoctoral fellow in the Department of Computer Science at the University
of Minnesota. Before that, he was an industrial
postdoctoral fellow at the Institute for Mathematics and its Applications (IMA), also at the University of Minnesota. His research interests include
Geographic Information Systems (GIS), spatial
databases, and optimal control theory.
Siva Ravada received a BTech degree in computer science from Andhra University, India; and
a PhD degree in computer science from the
University of Minnesota. He is a senior member
of technical staff in the Spatial Products Division
at the Oracle Corporation. His main research
interests are the design and analysis of parallel
algorithms for query processing in spatial databases. He is a member of the ACM, the IEEE,
and the IEEE Computer Society.
Andrew Fetterer received his BS degree in
computer science from the University of Minnesota in 1995 and his MS degree in computer science in 1997. He is currently a data warehouse
architect at the Panttaja Consulting Group in San
Francisco, having previously worked at the Oracle
Corporation. His research focuses on materialization issues in routing algorithms for large spatial networks. He is a member of the IEEE.
Xuan Liu received her bachelor of engineering
degree in computer software in 1987 and her
master of science degree in computer science in
1990, both from Xiamen University, Xiamen,
People’s Republic of China. She is currently a
PhD student in the Computer Science Department at the University of Minnesota. She was an
assistant professor at Xiamen University from
1990 to 1995. Her current research interests
include spatial databases, object-oriented databases, Geographical Information Systems, spatial data modeling, spatial query processing and optimization, and
query languages. She is a student member of the IEEE.
Chang-tien Lu received his bachelor’s degree in
computer science and engineering from the
Tatung Institute of Technology, Taipei, Taiwan, in
1991; and the MS degree in computer science
from the Georgia Institute of Technology, Atlanta,
in 1996. He has been a PhD student in the Department of Computer Science at the University
of Minnesota, Twin Cities, since 1996. His research interests include geographic information
systems, multidimensional data and indexes,
spatial query, and spatial join processing. He is a
student member of the IEEE.