We are systematically investigating fundamental concepts in programming languages and their connection with the central ideas of relational databases. Our intention is to identify related notions and to generalize other notions in order to produce a database programming language with the following characteristics. The language should support efficient operations on bulk data, too large to fit into primary memory. The language should follow the structured, imperative tradition of ALGOL 60, but with support for alternative paradigms, such as functional and object-oriented. By thorough exploitation of the language/database concepts adopted, and insistence on orthogonality, the language should employ a minimum number of concepts and thus a very simple syntax. The present paper discusses four important ingredients of the language we intend to design and implement. We have implemented a scoping mechanism and are exploring refinements. We have implemented four types of metadata. We are imp...
This thesis presents three trie organizations for various binary tries. The new trie structures have two distinctive features: (1) they store no pointers and require two bits per node in the worst case, and (2) they partition tries into pages and are suitable for secondary storage. We apply trie structures to indexing, storing and querying both text and spatial data on secondary storage. We are interested in practical problems such as storage compactness, I/O efficiency, and large trie construction. We use our tries to index and search arbitrary substrings of a text. For an index of 100 million keys, our trie is 10% to 25% smaller than the best known method. This difference is important since the index size is crucial for trie methods. We provide methods for dynamic tries and allow texts to be changed. We also use our tries to compress and approximately search large dictionaries. Our algorithm can find strings with k mismatches in sublinear time. To our knowledge, no other sublinear algorithm has been published for this problem. In addition, we use our tries to store and query spatial data such as maps. A trie structure is proposed to permit querying and retrieving spatial data at arbitrary levels of resolution, without reading from secondary storage any more data than is needed for the specified resolution. The trie structure also compresses spatial data substantially. The performance results on map data have confirmed our expectations: the querying cost is linear in the amount of data needed and independent of the data size in practice. We give algorithms for a set of sample queries including geometrical selection, geometrical join and the nearest neighbour. We also show how to control query cost by specifying an acceptable resolution.
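To make the "no pointers, two bits per node" idea concrete, here is a minimal sketch of one way such a binary trie could be laid out. It is an illustration under assumptions, not the thesis's actual organization: keys are fixed-width bit strings, nodes are stored level by level, each node is just a pair of flags (has a 0-child, has a 1-child), and a child's position is found by counting set flags rather than following a pointer. The names `build_levels` and `contains` are hypothetical, and paging for secondary storage is not modelled.

```python
def build_levels(keys, width):
    """Build a level-order list of two-bit nodes from fixed-width bit strings.

    Each node is a pair (has_zero_child, has_one_child); no pointers are stored.
    """
    levels = []
    groups = [sorted(set(keys))]  # one group of keys per node on the current level
    for depth in range(width):
        level, next_groups = [], []
        for group in groups:
            zeros = [k for k in group if k[depth] == "0"]
            ones = [k for k in group if k[depth] == "1"]
            level.append((bool(zeros), bool(ones)))
            if zeros:
                next_groups.append(zeros)
            if ones:
                next_groups.append(ones)
        levels.append(level)
        groups = next_groups
    return levels


def contains(levels, key):
    """Look up a key by counting set bits instead of following pointers."""
    pos = 0  # index of the current node within its level
    for depth, level in enumerate(levels):
        bit = int(key[depth])
        if not level[pos][bit]:
            return False
        # Children appear on the next level in the order of the set bits here,
        # so the child's index is the number of set bits before the chosen one.
        before = sum(zero + one for zero, one in level[:pos])
        pos = before + (level[pos][0] if bit == 1 else 0)
    return True


keys = ["000", "010", "011", "110"]
levels = build_levels(keys, 3)
node_count = sum(len(level) for level in levels)
print(node_count, "nodes,", 2 * node_count, "bits")
print(contains(levels, "011"), contains(levels, "100"))
```

The linear bit count in `contains` keeps the sketch short; a practical structure would presumably keep per-page or per-block counts so that a lookup does not rescan a whole level.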
We propose a new trie organization for large text documents requiring secondary storage. Index size is critical in all trie representations of text, and our organization is smaller than all known methods. Access time is as good as the best known method. Tries can be constructed in good time. For an index of 100 million entries, our experiments show size factors of less than 3, as compared with 3.4 for the best previous method. Our measurements show expected access costs of 0.1 sec., and construction times of 18 to 55 hours, depending on the text characteristics.
Digital trees, or tries, were introduced thirty years ago for sublinear-time retrieval of substrings from large texts. They were exploited for this, as a well-known example, by the University of Waterloo project to put the New Oxford English Dictionary onto CD-ROM. We have recently improved the performance of trie techniques for text and shown their use in searches for approximations to a given string. We have also shown that tries have excellent retrieval properties for spatial data. We have shown how to use tries to represent, without redundancy, spatial data which can be displayed to any resolution, retrieving from disk or from network only the amount of data that will finally be displayed. We have done this particularly for two-dimensional vector data, such as makes up very large maps, but have also established that the trie techniques apply to raster data and to data of other than two dimensions. These results are the basis for a claim that tries offer the best storage ...
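One common way to see why a trie can serve spatial data at any resolution is to interleave the bits of the coordinates into a single key, so that every key prefix names a coarser grid cell. The sketch below assumes that encoding purely for illustration; the abstract does not say that this is the representation used, and the function names `interleave` and `at_resolution` are hypothetical.

```python
def interleave(x, y, bits):
    """Interleave the bits of x and y, most significant first, into one key."""
    key = []
    for i in range(bits - 1, -1, -1):
        key.append(str((x >> i) & 1))
        key.append(str((y >> i) & 1))
    return "".join(key)


def at_resolution(keys, levels):
    """Return the distinct key prefixes, i.e. the cells occupied at a coarser grid.

    Each level consumes one bit of x and one bit of y, so it halves the cell size.
    """
    return sorted({k[: 2 * levels] for k in keys})


points = [(5, 3), (5, 2), (0, 7)]
keys = [interleave(x, y, 3) for x, y in points]
print(keys)
print(at_resolution(keys, 2))
```

In a trie built over such keys, answering a query at a coarse resolution only requires descending to the corresponding depth, which matches the abstract's point that no more data is read than will finally be displayed.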
International Symposium on Cooperative Database Systems for Advanced Applications, 1996
Digital trees, or tries, were introduced thirty years ago for sublinear-time retrieval of substrings from large texts. They were exploited for this, as a well-known example, by the University of Waterloo project to put the New Oxford English Dictionary onto CD-ROM. We have recently improved the performance of trie techniques for text and shown their use in searches for approximations to a given string. We
Papers by Heping Shang