Nowadays, information overload is becoming an ever-growing problem. Finding the exact information is quite difficult. The main problem is that information is not organized according to its natural structures. No existing data model can represent semi-structured or unstructured data in a natural way. On the other hand, almost every information application exploits interactive interfaces in some way. However, the development of user interfaces is still ad hoc and often relies on predefined form-based interfaces. How users can express their queries in a semantic way is an important research topic. In this paper, we present an effective conceptual model to organize data according to their contents based on object-relational databases, and describe a quasi-natural language interface for the conceptual model. We also discuss the design and implementation of our prototype system.
Various query languages have been proposed to extract and restructure information in XML documents. These languages, usually claiming to be declarative, mainly consider the conjunctive relationships among data elements. To express operations in which the hierarchical and the disjunctive relationships need to be considered, such as restructuring hierarchy and handling heterogeneity, programs in these languages often exhibit a procedural style, and thus their declarativeness is less prominent than in conventional query languages like SQL. In this paper, we propose a declarative pattern-based functional XML query language named XML Tree Query (XTQ). XTQ adopts expressive composite patterns to present data extraction, meanwhile establishing the conjunctive, the disjunctive and the hierarchical relationships among data elements. It uses the matching terms, a composite structure of the variables bound to the matched data elements, to present a global sketch of the ex...
This research is partially supported by the National Science Foundation of China and the Open Foundation of State Key Lab of Software Engineering. The authors wish to thank the anonymous referees for their valuable comments and suggestions, which greatly improved the technical content and the presentation of the paper. Also thanks to Tieyun Qian and Ming Zhong for their valuable advice and efforts in revising the submission.
Document databases are becoming popular, but how to express complex document queries to obtain useful information from documents remains an important topic to study. In this paper, we describe the design issues of a pattern-based document database query language named JPQ. JPQ uses various expressive patterns to extract and construct document fragments following a JSON-like document data model. It adopts tree-like extraction patterns with a coherent pattern composition mechanism to extract data elements from hierarchically structured documents and maintain the logical relationships among the elements. Based on these relationships, JPQ deploys a deductive mechanism to declaratively specify data transformation requests and also considers data filtering on hierarchical data structures. We use various examples to show the features of the language and to demonstrate its expressiveness and declarativeness in presenting complex document queries.
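To make the idea of a tree-like extraction pattern concrete, here is a minimal Python sketch. It is not JPQ syntax and not the authors' implementation: the `$name` variable convention, the one-element list pattern, and the `match` function are all assumptions made for illustration. It only shows how a single tree-shaped pattern can bind several variables at once while keeping the bindings that belong together in one result.

```python
from typing import Any, Dict, Iterator

Bindings = Dict[str, Any]


def match(pattern: Any, doc: Any, env: Bindings) -> Iterator[Bindings]:
    """Yield every variable binding under which `pattern` matches `doc`.

    Hypothetical conventions (not JPQ syntax): a string starting with "$"
    is a variable, a dict pattern must match key by key, and a one-element
    list pattern ranges over every item of a list in the document.
    """
    if isinstance(pattern, str) and pattern.startswith("$"):
        # Variable: bind it, or check it against an earlier binding.
        if pattern not in env:
            yield {**env, pattern: doc}
        elif env[pattern] == doc:
            yield env
    elif isinstance(pattern, dict):
        if not isinstance(doc, dict):
            return
        # Thread the bindings through every key of the pattern.
        envs = [env]
        for key, sub in pattern.items():
            if key not in doc:
                return
            envs = [e2 for e in envs for e2 in match(sub, doc[key], e)]
        yield from envs
    elif isinstance(pattern, list):
        if isinstance(doc, list) and len(pattern) == 1:
            for item in doc:
                yield from match(pattern[0], item, env)
    elif pattern == doc:
        # Constant: matches only an equal value.
        yield env


doc = {
    "dept": "CS",
    "staff": [
        {"name": "Ann", "role": "prof"},
        {"name": "Bob", "role": "postdoc"},
    ],
}
pattern = {"dept": "$d", "staff": [{"name": "$n", "role": "prof"}]}
results = list(match(pattern, doc, {}))
```

Here `results` holds one binding that pairs the department with the one staff member whose role is `prof`; the constant `"prof"` in the pattern acts as a filter on the hierarchical data.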
Proceedings of the 1st Joint International Workshop on Entity-Oriented and Semantic Search - JIWES '12, 2012
Keyword search on graphs aims to find minimum connected trees containing the keywords. Normally, the answer trees are ranked by their topological structures. However, this basic ranking scheme does not distinguish answer trees well when many answer trees have the same structure or contain redundant information. This paper proposes a novel ranking scheme, which combines both structure-based and content-based ranking factors. It can effectively prioritize the answer trees with more valuable content and penalize the ones with redundant information. Because the edge re-weighting process is performed offline, the scheme does not reduce the efficiency of top-k search algorithms.
Deductive object-oriented databases are intended to integrate the deductive and object-oriented database techniques to combine the best of the two approaches and to overcome their inherent shortcomings, and a number of deductive object-oriented database languages have been proposed. However, most of these languages are only structurally object-oriented. Important behaviorally object-oriented features such as methods and encapsulation, common in object-oriented database systems, are not properly supported. This chapter presents a novel deductive object-oriented database language called ROL2 that extends the structurally object-oriented database language ROL with all behaviorally object-oriented features. It supports in a rule-based framework all important object-oriented features such as object identity, complex objects, typing, information hiding, rule-based methods, encapsulation of such methods, overloading, late binding, polymorphism, class hierarchies, multiple structural and behavioral inheritanc...
Regular path expressions are one of the core components of XML query languages, and several approaches to evaluating them have been proposed. In this paper, a new path expression evaluation approach, extent join, is proposed to compute both parent-child ('/') and ancestor-descendant ('//') connectors between path steps. Furthermore, two path expression optimization rules, path-shortening and path-complementing, are proposed. The former reduces the number of joins by shortening the path, while the latter optimizes the execution of a path by using an equivalent complementary path expression to compute the original path. Experimental results show that the algorithms proposed in this paper are much more efficient than conventional ones.
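The abstract does not spell out how extent join works, so the following Python sketch does not attempt to reproduce it. Instead it illustrates the two connectors with a widely used, swapped-in technique: region (interval) labelling, where each element carries a start position, an end position, and a depth. The `Node` type, the `join` function, and the sample document are assumptions for illustration, and the nested-loop join is deliberately naive.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    tag: str
    start: int  # position at which the element opens
    end: int    # position at which the element closes
    level: int  # depth in the document tree, root = 0


def is_descendant(a: Node, d: Node) -> bool:
    """'//' connector: d lies strictly inside a's interval."""
    return a.start < d.start and d.end < a.end


def is_child(a: Node, d: Node) -> bool:
    """'/' connector: d is a descendant exactly one level below a."""
    return is_descendant(a, d) and d.level == a.level + 1


def join(ancestors: List[Node], descendants: List[Node],
         child_only: bool) -> List[Tuple[Node, Node]]:
    """Naive nested-loop structural join between two lists of nodes."""
    test = is_child if child_only else is_descendant
    return [(a, d) for a in ancestors for d in descendants if test(a, d)]


# Labels for <book><chapter><title/></chapter><title/></book>
book = Node("book", 1, 8, 0)
chapter = Node("chapter", 2, 5, 1)
inner_title = Node("title", 3, 4, 2)
outer_title = Node("title", 6, 7, 1)

all_titles = join([book], [inner_title, outer_title], child_only=False)
direct_titles = join([book], [inner_title, outer_title], child_only=True)
```

With these labels, `book//title` pairs the book with both titles, while `book/title` keeps only the title directly under the book. Each path step costs one such join, which is why reducing the number of steps can reduce evaluation cost.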
Proceedings of the Third International Conference on Web Information Systems Engineering, 2002. WISE 2002.
How to extract data from XML documents is an important issue for XML research and development. However, how to view XML documents determines how they can be queried. In this paper, we first describe a natural way to view XML documents as in complex object data models so that we can easily comprehend XML data from a database point of view. We then illustrate how to use logical variables to extract data from XML documents. We also describe a rule-based declarative query language for XML. We demonstrate that our rule-based language provides a uniform framework that has advantages over other XML query languages, including XQuery, in the following ways. First, it provides a natural way of separating querying and result construction, using the body and the head of a rule respectively. Second, several rules can be used for the same query so that complex queries can be expressed in a simple and natural way. Third, its use of logical variables and rules makes many functions and operators in XQuery and XPath unnecessary or definable constructively. Finally, it provides natural and direct support for recursion as in deductive databases and has logical foundations that have played a significant role in database research in the past.
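As a rough illustration of what "recursion as in deductive databases" means, the sketch below evaluates two Datalog-style rules bottom-up until a fixpoint is reached. The rules in the docstring use generic Datalog notation, not the syntax of the language described above, and the `child` facts and the `descendants` function are hypothetical names chosen for this example.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]


def descendants(child: Set[Fact]) -> Set[Fact]:
    """Naive bottom-up evaluation of two recursive rules.

    Generic Datalog notation (an assumption, not the paper's syntax):
        desc(X, Y) :- child(X, Y).
        desc(X, Z) :- child(X, Y), desc(Y, Z).
    """
    desc = set(child)  # first rule: every child is a descendant
    while True:
        # Second rule: join child with the descendants derived so far.
        new = {(x, z) for (x, y) in child for (y2, z) in desc if y == y2}
        if new <= desc:  # nothing new was derived: fixpoint reached
            return desc
        desc |= new


# Element nesting of a small, made-up bibliography document.
child_facts = {("bib", "book"), ("book", "author"), ("author", "name")}
closure = descendants(child_facts)
```

The rule heads say what to construct and the rule bodies say what to match, which mirrors the head/body separation mentioned in the abstract. The loop stops once an iteration derives no new fact.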
Support for updating XML documents has recently attracted interest. This paper presents a novel declarative XML update language, which is an extension of the XML-RL query language. We define the XML-RL update syntax and a set of primitive update operations to fully evolve XML into a universal data representation and sharing format.
Seventh International Database Engineering and Applications Symposium, 2003. Proceedings.
Use of path expressions is a common feature in most XML query languages, and many evaluation methods for path expression queries have been proposed recently. However, there has been little research on optimizing regular path expression queries. In this paper, two path expression optimization principles are proposed, named path shortening and path complementing. The path shortening principle reduces the querying cost by shortening path expressions using knowledge of the XML schema. The path complementing principle substitutes user queries with equivalent lower-cost path expressions. The experimental results show that these two techniques can substantially improve the performance of path expression query processing.
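One way to picture path shortening is sketched below in Python. This is an illustrative reading of the principle, not the paper's algorithm: the schema is assumed to be an acyclic mapping from each tag to the tags allowed as its children, and a full path is rewritten to `//tag` only when the schema admits exactly one root-to-tag path. The `Schema` type, `root_paths`, `shorten`, and the sample schema are all hypothetical.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # tag -> tags allowed as its children


def root_paths(schema: Schema, root: str, target: str) -> List[List[str]]:
    """All label paths from `root` to `target` (schema assumed acyclic)."""
    paths: List[List[str]] = []

    def walk(tag: str, prefix: List[str]) -> None:
        path = prefix + [tag]
        if tag == target:
            paths.append(path)
        for child in sorted(schema.get(tag, ())):
            walk(child, path)

    walk(root, [])
    return paths


def shorten(schema: Schema, root: str, path: str) -> str:
    """Rewrite '/a/b/c' to '//c' when the schema makes the two equivalent."""
    steps = path.strip("/").split("/")
    if root_paths(schema, root, steps[-1]) == [steps]:
        return "//" + steps[-1]
    return path  # the last tag is reachable some other way: keep the path


schema: Schema = {
    "bib": {"book", "article"},
    "book": {"title", "author"},
    "article": {"title"},
    "author": {"name"},
}

short_name = shorten(schema, "bib", "/bib/book/author/name")
kept_title = shorten(schema, "bib", "/bib/book/title")
```

In this made-up schema, `name` occurs only under `/bib/book/author`, so the four-step path collapses to a single step. `title` occurs under both `book` and `article`, so shortening `/bib/book/title` would change the result and the path is left alone. Whether the shorter form is actually cheaper depends on how the evaluator handles `//`.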
XML is becoming prevalent in data presentation and data exchange on the internet. One important issue in the XML research community is how to query XML documents to extract and restructure information. Currently, XQuery based on XPath is the most promising standard. In this paper, we discuss limitations of XPath and XQuery, and propose a generalization of XPath called XTree that overcomes these limitations. Using XTree, multiple variable bindings can be instantiated in one expression, and XTree expressions, which represent a tree rather than a path, can be used in both the querying part and the result construction part of a query. Based on XTree, we develop an XTree query language, which is more compact and convenient to use than XQuery, and supports common query operations such as join, negation, grouping, and recursion in a direct way. We describe an algorithm that converts XTree query scripts to XQuery scripts. This algorithm not only provides a means of executing queries written in the XTree query language but also highlights differences between the two query languages.
Multimedia data is a challenge for data management. The semantics of traditional alphanumeric data are mostly explicit, unique, and self-contained, but the semantics of multimedia data are usually dynamic, diverse, and varying from one user’s perspective to another’s. When dealing with different applications in which multimedia data is involved, great challenges arise. We first introduce a novel data model called Information Networking Model (INM), which can represent the dynamic and complex semantic relationships of the real world. In this chapter, we show how to use INM to capture the dynamic and complex semantic relationships of multimedia data. Using INM, we present a multimedia modeling mechanism. The general idea of this novel mechanism is to place the multimedia data in a complex semantic environment based on the real world or application requirements, so that users can make use of both contextual semantics and multimedia metadata to retrieve the precise results they expect.
How to query XML documents to extract and restructure information is an important issue in XML research. Currently, XQuery based on XPath is the most promising W3C standard. In this paper, we introduce a new set of syntax rules called XTree, which is a generalization of XPath. XTree has a tree structure, and a user can bind multiple variables in one XTree expression. It explicitly identifies list-valued variables, and defines some natural built-in functions to manipulate them. XTree expressions can also be used in the result construction part of a query, making the query easy to read and comprehend. With these differences, XTree expressions are much more compact, and more convenient to write and understand, than XPath expressions. We also give algorithms to convert queries based on XTree expressions to standard XQuery queries.
The Relationlog system is a novel persistent deductive database system for advanced data and knowledge-based applications. It directly supports the storage and inference of data with complex structures, especially data supported in nested relational and complex-object models. The Relationlog system supports the Relationlog query language, which is a typed extension of Datalog with tuples and sets and stands in the same relationship to the nested relational and complex-object models as Datalog stands to the relational model. It also supports an SQL-like data definition language and a declarative data manipulation language. This article introduces the Relationlog language, discusses the system architecture, the design decisions incorporated within its implementation, and our experience in developing the system.
The Web is the biggest source of information and contains many entities and relationships between them. Extracting these data from massive numbers of Web pages and integrating them into semi-structured data with rich semantics would make the Web data easier to manage and use. On this premise, a comprehensive method is proposed to extract the entities and relationships from Web pages. The method consists of two steps: 1) the target Web pages that contain these entities are found based on a combination of visual information and keyword content, while the relationships between parent and child target Web pages are recorded; 2) the entities are extracted by analyzing the DOM tree structure of the obtained Web pages and defining some extraction rules. Finally, the extracted data is organized into semi-structured data with special relationships. Experiments on a large number of HTML pages show that this method achieves a high correct rate and coverage.
Over the past decade, a large number of deductive object-oriented database languages have been proposed. The earliest of these languages had few object-oriented features, and more and more features have systematically been incorporated in successive languages. However, a language with a clean logical semantics that naturally accounts for all the key object-oriented features is still missing from the literature. This article takes us another step towards solving this problem. Two features that are currently missing are the encapsulation of rule-based methods in classes, and nonmonotonic structural and behavioral inheritance with overriding, conflict resolution and blocking. This article introduces the syntax of a language with these features. The language is restricted in the sense that we have omitted other object-oriented and deductive features that are now well understood, in order to make our contribution clearer. It then defines a class of databases, called well-defined databas...
This paper describes ROL, a deductive and object-oriented database language which has been implemented. ROL integrates important features of object-oriented and deductive database systems. It supports object identity, complex objects, classes, class hierarchy, multiple inheritance with overriding and blocking, and schema. It also supports structured values such as functor objects and sets, treats them as first-class citizens, and provides powerful mechanisms for representing both partial and complete information on sets. It is an extension of pure value-oriented deductive systems such as Datalog and LDL and subsumes them as special cases. It supports important integrity constraints such as domain, key, referential, functional dependency, and cardinality in a uniform framework. Furthermore, it has a logical semantics that cleanly accounts for all of its object-oriented and value-oriented features.
Papers by Mengchi Liu