Towards Declarative XML Querying
Mengchi Liu
School of Computer Science
Carleton University
Ottawa, Ontario, Canada K1S 5B6
[email protected]
Tok Wang Ling
School of Computing
National University of Singapore
Lower Kent Ridge Road, Singapore 119260
[email protected]
Abstract
How to extract data from XML documents is an important issue for XML research and development. However,
how XML documents are viewed determines how they can be
queried. In this paper, we first describe a natural way to
view XML documents as in complex object data models so
that we can easily comprehend XML data from a database
point of view. We then illustrate how to use logical variables to extract data from XML documents. We also describe a rule-based declarative query language for XML.
We demonstrate that our rule-based language provides a
uniform framework that has the following advantages over other XML
query languages, including XQuery.
First, it provides a natural way of separating querying and
result construction using the body and the head of a rule, respectively.
Second, several rules can be used for the same query so that
complex queries can be expressed in a simple and natural
way. Third, its use of logical variables and rules makes many
functions and operators in XQuery and XPath unnecessary
or definable constructively. Finally, it provides natural
and direct support for recursion as in deductive databases
and has logical foundations that have played a significant
role in database research in the past.
1 Introduction
How to extract data from XML documents is an important issue for XML research and development. Various XML query languages have been proposed in the
past several years, such as Lorel [2], XML-GL [5],
XQL [22], XPath [9], XML-QL [11], XSLT [8], YATL [10],
XDuce [13], and XQuery [7]. For a comparative analysis of
some of these languages, see [3]. Some of them are in the
tradition of database query languages, while others are more closely
inspired by XML. The XML Query Working Group has recently published XML Query Requirements for XML query
languages [6]. The discussion is going on within the World
Wide Web Consortium, within many academic forums, and
within the IT industry, with XQuery having been selected as the basis
for an official W3C query language for XML.
However, there are several serious problems with existing XML query languages, including XQuery. Firstly, they
are based on a low-level data model such as the XML Query
Data Model [12], which uses various nodes, such as element,
attribute, text, and comment nodes, and trees of nodes to represent
XML documents, and forces users to view XML documents
from such a low-level programming point of view. Furthermore, they force users to navigate through the trees of
XML documents in their queries, which makes the queries
hard to comprehend.
Secondly, most of the existing XML query languages
are based on SQL and OQL [4]. Unlike queries on traditional relational databases, whose results are always flat
relations, the results of XML queries are complex. Thus,
XML queries have to have two components: a querying part
and a result constructing part. The existing XML query languages intermix the querying part and the result constructing part in a nested way and thus make queries complicated, which is an inherent problem inherited from SQL
and OQL. For example, in XML-QL, there are two explicit
constructs: where and construct. The where clause is used
for querying and the construct clause is for result constructing. However, the construct clause can contain nested where
and construct clauses so that querying and result constructing are intermixed. XQuery extends the two constructs in
XML-QL into four constructs: for, let, where and return,
i.e., FLWR expressions. The for and let clauses are used
for variable bindings. The for clause binds one value to a
variable at a time, while the let clause binds all values to a
variable. The where clause is used as a filter for the values
generated by the for and let clauses. The return clause is used
for result constructing. Again, the return clause can contain
nested FLWR expressions.
Although XQuery is based on XPath, XQL, XML-QL,
SQL, and OQL, it is an algebraic language that relies on a
number of predefined functions and operators, such as document,
text, node, attribute, element, child, parent, descendant, position, filter, shadow, etc. Some of these functions
and operators are unintuitive, which makes the query language hard to learn and remember.
Thirdly, recursion cannot be expressed using the normal
querying and result constructing constructs. Instead, user-defined functions have to be used to handle recursion. However, the purpose of functions is to capture general and common
computation that is needed many times, rather
than to serve as a workaround for other purposes. In our view, this is a
drawback of the language design.
Finally, existing XML query languages lack the logical foundations that have played a significant role in database research in the past.
In this paper, we describe a natural way to view XML
documents as in complex object data models [1], so that we
can easily comprehend XML data from a database point of
view. Based on this view, we then illustrate how to use logical variables to extract data from XML documents. We also
describe a rule-based declarative query language for XML.
We demonstrate that our rule-based language provides a uniform framework that has the following advantages over other XML query
languages, including XQuery. First,
it provides a natural way of separating querying and result
constructing using the body and the head of a rule, respectively. As
a result, no matter how complex the query is, the querying
part and the result constructing part are strictly separated.
Second, several rules can be used for the same query so
that complex queries can be expressed in a simple and natural way. Third, the use of logical variables and rules makes
many functions and operators in XQuery and XPath unnecessary or definable constructively in our language. Finally, it provides natural and direct support for recursion
as in deductive databases and has logical foundations [18].
The rest of the paper is organized as follows. Section 2 discusses how to model XML documents as in
databases. Section 3 investigates how to query XML documents declaratively. Section 4 shows how to use a rule-based
language for query result construction. Section 5 summarizes and points out further research issues.
2 Modeling XML Documents as in Databases
In this section, we show that we do not have to view
XML data at a low level as in the XML Query Data Model [12].
Instead, we can view it as a complex object in complex object data models. With such a view, query representation
also becomes higher level. Consider the following
simple XML document:
<person id="o111">
<name>
<first>John</first>
<last>Smith</last>
</name>
<age>25</age>
</person>
where person, name, first, last, and age are element names and
id is an attribute name. This XML document is represented
in the XML Query Data Model as a tree shown in Figure 1, in
which person, name, first, last, and age are element nodes, id is an
attribute node, and o111, John, Smith, and 25 are text nodes.
[Figure 1: tree rooted at person, with children @id (o111), name (first: John; last: Smith), and age (25)]
Figure 1. XML Query Data Model Representation
However, this kind of data is not new to the database community. We can view it as a complex object in a complex
object data model as follows:
person ⇒ [
  @id ⇒ o111,
  name ⇒ [
    first ⇒ John,
    last ⇒ Smith],
  age ⇒ 25]
We call this complex object an element object, which is
a pair of an element name and an element value. The element
value [@id ⇒ o111, name ⇒ [first ⇒ John, last ⇒ Smith],
age ⇒ 25] is a tuple object, which contains an attribute
object @id ⇒ o111, an element object name whose value is the nested tuple object [first ⇒ John, last ⇒
Smith], and an element object age ⇒ 25. o111, John,
Smith, and 25 are lexical objects. The symbol @ in front of
id denotes that id is an attribute, and ⇒ is used to separate an element
name from its element value and an attribute name from its attribute value.
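The view above can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions, not the authors' implementation: the function name to_element_object is hypothetical, an element object is modeled as a (name, value) pair, a tuple object as a list of such pairs, and mixed content is ignored for brevity.

```python
import xml.etree.ElementTree as ET

def to_element_object(elem):
    """Return a (name, value) pair for an XML element.

    The value is a string (lexical object) for text-only elements,
    None for empty elements, and otherwise a list of (name, value)
    pairs (tuple object) with attribute names prefixed by '@'.
    Mixed content is not handled in this sketch.
    """
    parts = [("@" + key, val) for key, val in elem.attrib.items()]
    parts += [to_element_object(child) for child in elem]
    if parts:
        return (elem.tag, parts)
    text = (elem.text or "").strip()
    return (elem.tag, text if text else None)

doc = """<person id="o111">
  <name><first>John</first><last>Smith</last></name>
  <age>25</age>
</person>"""

person = to_element_object(ET.fromstring(doc))
print(person)
```

The printed structure mirrors the bracket notation used in the text, with Python lists standing in for tuple objects.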
Compared with the XML Query Data Model, we
have introduced several higher-level notions such as element
object, attribute object, tuple object, and lexical object that
naturally correspond to XML notations. However, element
values are different from element contents in XML, as element values contain both the attributes and the element contents that
elements have.
<people>
<person id="o123">
<name>Jane</name>
</person>
<person id="o234" mother="o456">
<name>John</name>
</person>
<person id="o456" children="o123 o234">
<name>Mary</name>
</person>
<person id="o567">
<name>Joan</name>
</person>
<person id="o678" children="o456 o567">
<name>Tony</name>
</person>
<person/>
</people>
Figure 2. Sample XML Document 1
Besides the above objects, we also need other kinds of
objects. Consider the XML document shown in Figure 2.
It is represented in our data model as an element object as
follows:
people ⇒ [
  person ⇒ [
    @id ⇒ o123,
    name ⇒ Jane],
  person ⇒ [
    @id ⇒ o234,
    @mother ⇒ o456,
    name ⇒ John],
  person ⇒ [
    @id ⇒ o456,
    @children ⇒ {o123,o234},
    name ⇒ Mary],
  person ⇒ [
    @id ⇒ o567,
    name ⇒ Joan],
  person ⇒ [
    @id ⇒ o678,
    @children ⇒ {o456,o567},
    name ⇒ Tony],
  person ⇒ null]
Note here that the attribute name children has list values such
as {o123,o234} and {o456,o567}, and that there are several elements with the same element name person but different
values, including the null value.
From a querying point of view, if we just want one value of
the element person at a time, then the above representation
is enough. However, if we want all the values of the element
person, then it is not clear what should be returned from this
representation. To solve this problem, we extend the notion of
list object to cover element values and represent the above
XML document as follows:
people ⇒ [
  person ⇒ {
    [@id ⇒ o123, name ⇒ Jane],
    [@id ⇒ o234, @mother ⇒ o456, name ⇒ John],
    [@id ⇒ o456, @children ⇒ {o123,o234}, name ⇒ Mary],
    [@id ⇒ o567, name ⇒ Joan],
    [@id ⇒ o678, @children ⇒ {o456,o567}, name ⇒ Tony],
    null }]
In other words, we can treat the person element as list-valued if we are interested in all of its values. This extension
is necessary, as demonstrated by the use of the LET expression
in XQuery.
Thus, elements in tuple objects here can have two kinds
of representations: single-valued and list-valued. Similarly,
attributes in tuple objects can also have two kinds of representations, single-valued and list-valued, as the value of an
attribute can be a list object. Therefore, tuple objects can be
represented in different ways: the flat representation, in which
there are no list objects; the integrated representation, in which
list objects, if any, must be used; and the regular representation, in
which element objects use the flat representation and attribute
objects use the integrated representation.
Example 1 The following examples demonstrate their difference:
[@children ⇒ o123, @children ⇒ o234,
 title ⇒ XML, author ⇒ John, author ⇒ Tony]
[@children ⇒ {o123,o234},
 title ⇒ XML, author ⇒ {John, Tony}]
[@children ⇒ {o123,o234},
 title ⇒ XML, author ⇒ John, author ⇒ Tony]
where the first representation is flat, the second representation is integrated, and the last representation is regular. The regular representation directly corresponds to how data is represented in XML.
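The relationship between the three representations in Example 1 can be sketched in a few lines of Python. This is a minimal illustration, assuming a tuple object is modeled as a list of (name, value) pairs; the function names integrated and regular are hypothetical and not part of the language described in the paper.

```python
def integrated(flat):
    """Group repeated names into list objects; single values stay scalar."""
    groups = {}
    for name, value in flat:
        groups.setdefault(name, []).append(value)  # dicts keep insertion order
    return [(name, vals[0] if len(vals) == 1 else vals)
            for name, vals in groups.items()]

def regular(flat):
    """Attributes use the integrated form, elements keep the flat form."""
    attrs = integrated([(n, v) for n, v in flat if n.startswith("@")])
    elems = [(n, v) for n, v in flat if not n.startswith("@")]
    return attrs + elems

flat = [("@children", "o123"), ("@children", "o234"),
        ("title", "XML"), ("author", "John"), ("author", "Tony")]

print(integrated(flat))
print(regular(flat))
```

In this sketch the regular form lists attributes before elements, which matches the example but is a simplification for documents that interleave them differently.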
XML also allows mixed content documents. Consider
the following example:
<Address>
John lives on
<Street>Main St</Street>
with house number
<Number>123</Number>
in
<City>Ottawa</City>
<Country>Canada</Country>
</Address>
Our notion of tuple object naturally covers this case as
follows:
Address ⇒ [
  John lives on,
  Street ⇒ Main St,
  with house number,
  Number ⇒ 123,
  in,
  City ⇒ Ottawa,
  Country ⇒ Canada]
Note that if ',' and '⇒' occur in the XML document, they
are simply part of the strings and are not interpreted as ','
and '⇒' in our data model.
A well-formed XML document over the web has a URL
that specifies its location and contains exactly one root element. The URL consists of the hostname, domain name, directories, and the filename. In XQuery, the XML document
is obtained through the document function. In our framework, we treat the URL and the associated element as first-class
citizens.
Consider the sample XML document at URL
www.abc.com/people.xml shown in Figure 3, where
www.abc.com/people.xml is the URL, www is the hostname, abc.com is the domain name, people is the directory
name, and people.xml is the file name. It contains one
bib element with two book sub-elements and one journal
sub-element. Its representation in our data model in regular
representation as an XML object is shown in Figure 4.
Note that the data model described here is just intended
for us to conceptualize XML documents, based on which
we can express queries.
3 Declarative Querying
Our objective is to be able to extract data from XML
documents in a declarative way. Thus, we use logical variables for querying. Logical variables are placeholders and
are different from variables in procedural languages. They
are used to bind/match the values at their places. Logical
variables can be typed or non-typed. Based on the data
model introduced above, we have different kinds of objects in an XML document: lexical, list, tuple, attribute with
attribute name and value, and element with element name and
value. Therefore, we should have corresponding variables
so that we can match various components: lexical variables,
list variables, tuple variables, attribute variables, attribute
name variables, attribute value variables, element variables,
element name variables, and element value variables. For a
URL, we may query the host name, domain name, directories,
and file name that contain certain XML documents. Thus
we need corresponding variables for them as well.
<bib name="IT" year="1998">
<book year="1995">
<title>Databases</title>
<author><last>Date</last><first>C.</first>
</author>
<publisher><name>Springer</name>
</publisher>
</book>
<book year="1998">
<title>XML</title>
<author><last>Date</last><first>C.</first>
</author>
<author><last>Darwen</last><first>D.</first>
</author>
<publisher><name>Springer</name>
</publisher>
</book>
<journal year="1998">
<title>XML</title>
<editor><last>Date</last><first>C.</first>
</editor>
</journal>
</bib>
Figure 3. Sample XML document 2
To be flexible and easy to use in practice, we prefer non-typed variables. A variable can be used for any location,
but the same variable occurring in different places in the same
context should denote the same value from a logical point of
view. Because XML objects have a flat representation and an
integrated representation, we could partition the variables
into two kinds: single-valued variables and list-valued variables, but it would be confusing and counterintuitive to have
two sets of variables in the language. Therefore, we only allow single-valued variables that start with $, and $ itself is
an anonymous variable. For a list, we use a grouping variable of the form {$X}, which is constructed from the single-valued variable $X that ranges over the elements in the list.
Such grouping variables are common in advanced deductive
database languages such as F-Logic [14], ROL [15], Relationlog [17], and ROL2 [19].
(www.abc.com/people.xml)/
bib ⇒ [@name ⇒ IT,
       @year ⇒ 1998,
       book ⇒ [@year ⇒ 1995,
               title ⇒ Databases,
               author ⇒ [last ⇒ Date, first ⇒ C.],
               publisher ⇒ [name ⇒ Springer]],
       book ⇒ [@year ⇒ 1998,
               title ⇒ XML,
               author ⇒ [last ⇒ Date, first ⇒ C.],
               author ⇒ [last ⇒ Darwen, first ⇒ D.],
               publisher ⇒ [name ⇒ Springer]],
       journal ⇒ [@year ⇒ 1998,
               title ⇒ XML,
               editor ⇒ [last ⇒ Date, first ⇒ C.]]]
Figure 4. Regular Representation
3.1 URL-related Querying and Built-in Functions
As we make the URL and the associated XML element first-class
citizens in our framework, we can also use various variables for the URL and URL components. Consider the following examples:
($URL)/$document
It is expected that the system should find a URL and the
corresponding document over the Internet one at a time.
The following query finds two distinct URLs that contain
the same document.
($URL1)/$document, ($URL2)/$document,
$URL1 ≠ $URL2
The next query is more specific. It asks for the URL of
an XML document that contains a book element with 1998
for attribute year and XML for element title.
($URL)/bib/book ⇒ [@year ⇒ 1998, title ⇒ XML]
Obviously, the above queries are quite expensive. To reduce the search space, we can limit the search to a certain
site as follows:
(www.abc.com/$Path)
/bib/book ⇒ [@year ⇒ 1998, title ⇒ XML]
This time, the system should just search within the given
site, find the path that contains the expected XML document, and bind it to the variable $Path.
We can even restrict the search to a specific directory and
a specific file type at a site as follows:
(www.abc.com/$File.xml)
/bib/book ⇒ [@year ⇒ 1998, title ⇒ XML]
where $File is a variable that binds to the filename satisfying
the condition, if any.
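To make the idea of URL variables more tangible, the following Python sketch matches a URL pattern such as www.abc.com/$File.xml against a candidate URL and returns the variable bindings. It is only an illustration under several assumptions: the function name match_url is hypothetical, each variable is assumed to appear once in the pattern and to match a single path segment, and the discovery of candidate URLs (the expensive part of such queries) is not modeled at all.

```python
import re

def match_url(pattern, url):
    """Match a URL pattern with $variables; return bindings or None."""
    regex = ""
    # re.split with a capturing group keeps the $variables as tokens.
    for token in re.split(r"(\$\w+)", pattern):
        if token.startswith("$"):
            # A variable matches one non-empty path segment (assumption).
            regex += "(?P<%s>[^/]+)" % token[1:]
        else:
            regex += re.escape(token)
    m = re.fullmatch(regex, url)
    return m.groupdict() if m else None

print(match_url("www.abc.com/$File.xml", "www.abc.com/people.xml"))
print(match_url("www.abc.com/$File.xml", "www.abc.com/docs/people.xml"))
```

A variable such as $Path that may span several directories would need a more permissive segment pattern than the one assumed here.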
Current XPath and XQuery do not support this kind of
query.
To query XML documents, we also provide functions to
manipulate tuple, list, and string objects. However, unlike
XPath and XQuery, which use a large number of unintuitive
functions for almost anything, here we only use a few natural functions that are obvious and easy to understand. We
use several examples to demonstrate them.
The built-in functions supported in the language are used
in an object-oriented fashion.
Example 2 If the variable $name binds to a string, we can
check whether the string contains a specific string using the
built-in function as follows:
$name.contains(XML)
This expression evaluates to true or false depending on what
$name binds to. Only the string that makes it true will be
retrieved.
Example 3 If the grouping variable {$Numbers} binds to
a list of numbers, we can obtain their count, average, minimum, maximum, and sum using the built-in functions as follows:
{$Numbers}.count() = $countNumber
{$Numbers}.avg() = $avgNumber
{$Numbers}.min() = $minNumber
{$Numbers}.max() = $maxNumber
{$Numbers}.sum() = $sumNumber
In such an expression, if the single-valued variable $Numbers binds to something other than a number in the list,
then the expression evaluates to false and nothing will be
retrieved.
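The behaviour described in Example 3 can be approximated by the following Python sketch. It is an assumption-laden illustration rather than the language's actual semantics: the function name aggregate is hypothetical, None stands for "the expression evaluates to false and nothing is retrieved", and an empty list is also treated as a failed match.

```python
from statistics import mean

def aggregate(values, op):
    """Apply a built-in aggregate; return None if any member is not a number."""
    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)
    if not values or not all(is_number(v) for v in values):
        return None  # the match fails, so nothing is retrieved
    ops = {"count": len, "avg": mean, "min": min, "max": max, "sum": sum}
    return ops[op](values)

numbers = [3, 5, 10]
for op in ("count", "avg", "min", "max", "sum"):
    print(op, aggregate(numbers, op))
print(aggregate([3, "XML"], "sum"))
```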
Example 4 If a grouping variable {$authors} binds to a list of authors,
we can have the following comparison expressions using
the built-in operators, whose meanings are self-explanatory.
Note that the expressions on the same line are related.
{$authors}.count() > 2
{$authors}.first() = [last: $Last, first: $F]
{$authors}.firstTwo() ∋ $author
{$authors}.third() = $author
{$authors}.last() = $author
{$authors}.last() = $A, {$authors}.precedes($A) = $A'
{$authors}.sort() = {$sortedAuthors}
{$authors}.distinct() = {$uniqueAuthors}
3.2 XML Document Querying
In the following, we demonstrate how to use variables
to obtain or match various components in XML documents
based on our data model. Our discussion is based on the
two sample XML documents in Figures 2 and 3 in the last
section.
Example 5 If we want to obtain the whole XML document at www.abc.com/people.xml shown in Figure 3, we
can simply use a binding variable $bib to match the XML
object as follows:
(www.abc.com/people.xml)/$bib
The expression is equivalent to the XPath expression /bib.
Example 6 If we want to obtain the value of the bib element, which is a tuple of two attribute objects, two book element objects, and one journal element object, we can use a
binding variable $bibValue as follows:
(www.abc.com/people.xml)/bib ⇒ $bibValue
In XPath, we have to use two expressions, /bib/* and
/bib/@*, to get the same result.
Example 7 If we want to get the title of a book, we can use
the variable $title as follows:
(www.abc.com/people.xml)
/bib/book/title ⇒ $title
The expression corresponds to the XPath expression
/bib/book/title/text(). Note that here the function text() is
used in XPath (also in XQuery). In fact, in XPath and
XQuery, we have to know exactly what is there and use
proper functions such as text(), node(), etc. in order to
express the query properly. In our framework, we can
simply use variables to match whatever is there, so that
query expression becomes a lot easier for users.
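The effect of binding a variable along a path can be imitated with Python's standard ElementTree module. The sketch below is an approximation for illustration only: the helper name bind is hypothetical, the document is an abridged copy of Figure 3 (publishers omitted), and ElementTree paths are written relative to the bib root element rather than in the paper's notation.

```python
import xml.etree.ElementTree as ET

BIB = """<bib name="IT" year="1998">
  <book year="1995"><title>Databases</title>
    <author><last>Date</last><first>C.</first></author></book>
  <book year="1998"><title>XML</title>
    <author><last>Date</last><first>C.</first></author>
    <author><last>Darwen</last><first>D.</first></author></book>
  <journal year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor></journal>
</bib>"""

def bind(root, path, var):
    """Yield one binding {var: text} per element reached by the path."""
    for node in root.findall(path):
        yield {var: node.text}

root = ET.fromstring(BIB)
titles = list(bind(root, "./book/title", "$title"))
print(titles)
```

Each yielded dictionary corresponds to one way of matching $title, which reflects the one-value-at-a-time reading of single-valued variables.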
The attribute values can be queried in the same way as
well.
Example 8 If we want the year of a book, we can use the
variable $year as follows:
(www.abc.com/people.xml)
/bib/book/@year ⇒ $year
The expression corresponds to the XPath expression
/bib/book/@year/text().
Example 9 If we want an attribute in a book element, we
can use the single-valued attribute variable @$attr as follows:
(www.abc.com/people.xml)/bib/book/@$attr
Example 10 Writing the complete path is not convenient
for users. Thus, we also allow users to use path abbreviation
as in XPath. The following are two examples:
(www.abc.com/people.xml)//last ⇒ $lastname
(www.abc.com/people.xml)/bib//last ⇒ $lastname
These expressions correspond to the XPath expressions
//last/text() and /bib//last/text(), respectively.
Example 11 If we want to obtain child element object in
bib, we can use a variable $element as follows:
(www.abc.com/people.xml)/bib/$element
where $element binds one of the book and journal element
objects in the bib element object at a time. The expression
is equivalent to XPath expression /bib/*.
Example 12 If we want to obtain attribute objects with
name year in the bib element object, we can use one of the
following expressions that contain an attribute variable together with a selection condition:
(www.abc.com/people.xml)/bib/@$attr(@year ⇒ $)
(www.abc.com/people.xml)/bib/@$attr(@year)
(www.abc.com/people.xml)/bib/@$attr(@$A),
@$A ≠ name
The attribute variable @$attr binds an attribute object, and
the selection condition deals with the attribute name and
value. When the value is not needed, we can use an anonymous variable, as in the first one, or simply omit it, as in the
second and third. These expressions correspond to the XPath
expression /bib/@year in terms of the result.
Note that in the third expression, we do not specify the
attribute name in the selection condition. This query cannot
be expressed in XPath and XQuery.
Example 13 If we want to obtain the attribute object with
value IT in the bib element object, we can use the following
expression:
(www.abc.com/people.xml)/bib/@$attr($ ⇒ IT)
Note here that we do not specify the name of the attribute in the
selection condition. Instead, we simply use an anonymous
variable. This query also cannot be expressed in XPath and
XQuery.
Example 14 If we want all attributes in the bib element object, we can use the grouping variable {@$attr} as follows:
(www.abc.com/people.xml)/bib/{@$attr}
The expression corresponds to the XPath expression /bib/@*.
Element object queries are similar. The following are
several examples.
Example 15 If we are only interested in the book elements,
not the journal one, we can use one of the following expressions that contain a variable $book together with a selection
condition:
(www.abc.com/people.xml)/bib/$book(book ⇒ $)
(www.abc.com/people.xml)/bib/$book(book)
(www.abc.com/people.xml)/bib/$book($name),
$name ≠ journal
Example 16 If we want to obtain all book element objects in
the bib element, we can use a grouping variable as follows:
(www.abc.com/people.xml)/bib/{$books(book)}
XPath does not support this kind of matching. In XQuery,
we need to use the following let construct to handle it:
let $doc := document("www.abc.com/people.xml")
let $book := $doc/bib/book
Example 17 From all the book element objects obtained
above, if we just want the second book element object, we
can select it into a variable $sBook with the built-in
location operator from the list using one of the following
expressions:
($URL)/bib/{$books(book)},
{$books(book)}.second() = $sBook    (1)
($URL)/bib/{$books(book)}.second() = $sBook    (2)
The first one has two separate expressions. The second
combines the two expressions into a single one. These expressions correspond to the XPath expression /bib/book[2]/*.
Example 18 Continuing with the above result, if we want the
second author of the second book, we can use two variables
as follows, where the grouping variable {$authors} matches
all author element objects in the second book element object
and the single-valued variable $secondAuthor matches the
second author element object:
($URL)/bib/{$books}.second()/{$authors}.second() =
$secondAuthor
The expression corresponds to the XPath expression
/bib/book[2]/author[2]/*.
Sometimes, we want to search through XML documents
and find objects that satisfy more than one condition.
Example 19 The following example finds the book element
that has value 1998 for attribute year and XML for element
title.
(www.abc.com/people.xml)
/bib/$bookElement(@year ⇒ 1998, title ⇒ XML)
The variable $bookElement matches the second book element object. The expression corresponds to the XPath expression /bib/book[@year="1998" and title="XML"].
Example 20 Continuing with the above example, if we are
not interested in the book element but just want the author names, we do not need to use the variable $bookElement. We can simply use the following expression instead:
(www.abc.com/people.xml)
/bib/book ⇒ [@year ⇒ 1998, title ⇒ XML, author ⇒ $A]
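A query with several conditions, such as the one in Example 20, can be mimicked in Python as a filter followed by a binding step. The following is a hedged sketch rather than the paper's evaluation strategy: the function name authors_of_matching_books is hypothetical, the document is an abridged copy of Figure 3, and the attribute predicate uses the limited XPath subset supported by ElementTree.

```python
import xml.etree.ElementTree as ET

BIB = """<bib name="IT" year="1998">
  <book year="1995"><title>Databases</title>
    <author><last>Date</last><first>C.</first></author></book>
  <book year="1998"><title>XML</title>
    <author><last>Date</last><first>C.</first></author>
    <author><last>Darwen</last><first>D.</first></author></book>
  <journal year="1998"><title>XML</title>
    <editor><last>Date</last><first>C.</first></editor></journal>
</bib>"""

def authors_of_matching_books(root, year, title):
    """Yield one binding for $A per author of each book meeting both conditions."""
    for book in root.findall("./book[@year='%s']" % year):
        if book.findtext("title") == title:
            for author in book.findall("author"):
                yield {"$A": (author.findtext("last"), author.findtext("first"))}

root = ET.fromstring(BIB)
bindings = list(authors_of_matching_books(root, "1998", "XML"))
print(bindings)
```

Only book elements are inspected, so the journal with the same year and title does not contribute any binding.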
As shown above, the main difference between XPath
and our querying framework is the use of binding variables
which make a logical foundation for XML query language
possible.
4 Rule-based Query Result Construction
As discussed in Section 2, an XML document can be
viewed as a complex object as in a complex object data
model. Although we can obtain information from XML
documents using logical variables as discussed in the previous section, we cannot yet generate well-formed XML documents, as we have no means of constructing the results into XML documents.
As we use logical variables for querying, it is natural to use a
logic-based language, especially a rule-based language, for
result constructing as well.
Indeed, rule-based languages provide a natural way of
separating querying and result constructing using the body
and the head, respectively, as demonstrated in the advanced deductive database languages Relationlog [17], ROL [16],
and ROL2 [19], and a rule-based HTML document query
language [20]. Also, rule-based languages allow complex
queries to be expressed using several rules.
In XQuery, the FLWR construct is not powerful enough
to support recursion, so recursion has to be dealt with
using user-defined functions. Rule-based languages support
recursion in a natural and direct way.
In this section, we first introduce the result constructing expression. The result constructing expression is similar to an
XML object with a URL part and an element part. The URL
part specifies the URL of the file where the result will be
held. When the file is the standard output, we can simply
omit the URL part to simplify the expression. The element
part is used to construct the result element.
Consider the following five expressions:
/results/$b    (1)
(file:/home/users/xml/result.xml)/results/$b    (2)
(file:/home/users/xml/result.xml)/results/result/$b    (3)
(file:/home/users/xml/result.xml)
/results/result ⇒ [@$year, $title]    (4)
(file:/home/users/xml/result.xml)
/results/book ⇒ [title ⇒ $T, authors ⇒ {$A}]    (5)
The first expression tells the system to use the standard output file, i.e., the screen, for the result. The second expression gives the URL of the file. Obviously, the user should
have write permission on this file. For these two expressions, the resulting element contains the object that variable
$b holds. If $b matches several objects (one at a time), then
the result will not be a well-formed XML document, as it
will have several root elements. The third expression does
not have this problem, as there will be only one root element
results and each element object that $b matches will be inside a child result element object. The next expression constructs a child result element object that has an attribute denoted by variable @$year and an element denoted by variable $title. The last expression contains a grouping variable
{$A} so that each author that $A binds to is grouped into a
list.
A rule has two parts, a querying part and a result constructing part, with the following form:
querying Exp1, ..., Expn constructing Exp
where Exp1, ..., Expn are querying expressions discussed
in the previous section and Exp is a result constructing expression.
A query can be expressed using one or more rules. Let us
look at several queries based on the book XML document
in Figure 3.
Example 21 List book elements for books published by
Springer after 1991.
querying
(http://www.abc.com/bib.xml)/bib/book ⇒ $b
(publisher/name ⇒ Springer, year > 1991, title ⇒ $t)
constructing
/results/book ⇒ $b
The result is a list of book elements under the root element
results:
<results>
<book year="1995">
<title>Databases</title>
<author><last>Date</last><first>C.</first></author>
<publisher><name>Springer</name></publisher>
</book>
<book year="1998">
<title>XML</title>
<author><last>Date</last><first>C.</first></author>
<author><last>Darwen</last><first>D.</first></author>
<publisher><name>Springer</name></publisher>
</book>
</results>
Example 22 Create a flat list of all the title-author pairs,
with each pair enclosed in a result element.
querying
(http://www.abc.com/bib.xml)
/bib/book ⇒ [title ⇒ $t, author ⇒ $a]
constructing
/results/result ⇒ [title: $t, author: $a]
Note here that the variable $a in the querying part matches one
author at a time and the result is
<results>
<result>
<title>Databases</title>
<author><last>Date</last><first>C.</first></author>
</result>
<result>
<title>XML</title>
<author><last>Date</last><first>C.</first></author>
</result>
<result>
<title>XML</title>
<author><last>Darwen</last><first>D.</first></author>
</result>
</results>
Example 23 For each author in the bibliography, list the
author's name and the titles of all books by that author,
grouped inside a result element.
querying
(http://www.abc.com/bib.xml)
/bib/book ⇒ [title ⇒ $t, author ⇒ $a]
constructing
/results/result ⇒ [author ⇒ $a, title ⇒ {$t}]
The grouping variable {$t} in the constructing part is used
to group the titles of all books by the author $a. The result
is as follows:
<results>
<result>
<title>Databases</title>
<author><last>Date</last><first>C.</first></author>
</result>
<result>
<title>XML</title>
<author><last>Date</last><first>C.</first></author>
<author><last>Darwen</last><first>D.</first></author>
</result>
</results>
Example 24 For each book that has at least two authors,
list the title and first two authors.
querying
(http://www.abc.com/bib.xml)
/bib/book ⇒ [title ⇒ $t, author ⇒ {$a}],
{$a}.count() ≥ 2, {$a}.firstTwo() ∋ $b
constructing
/results/result ⇒ [title ⇒ $t, author ⇒ {$b}]
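The separation between the querying part and the constructing part of a rule can be illustrated with a small Python sketch of the query stated in Example 23 (author names with the titles of their books). This is an illustration under assumptions, not the authors' Java implementation: the function names query and construct are hypothetical, bindings are modeled as dictionaries, and the document is an abridged copy of Figure 3.

```python
import xml.etree.ElementTree as ET

BIB = """<bib name="IT" year="1998">
  <book year="1995"><title>Databases</title>
    <author><last>Date</last><first>C.</first></author></book>
  <book year="1998"><title>XML</title>
    <author><last>Date</last><first>C.</first></author>
    <author><last>Darwen</last><first>D.</first></author></book>
</bib>"""

def query(root):
    """Querying part: yield one binding of $t and $a per (title, author) match."""
    for book in root.findall("./book"):
        t = book.findtext("title")
        for a in book.findall("author"):
            yield {"t": t, "a": (a.findtext("last"), a.findtext("first"))}

def construct(bindings):
    """Constructing part: one result per author, with the titles grouped as {$t}."""
    grouped = {}
    for b in bindings:
        grouped.setdefault(b["a"], []).append(b["t"])
    results = ET.Element("results")
    for (last, first), titles in grouped.items():
        result = ET.SubElement(results, "result")
        author = ET.SubElement(result, "author")
        ET.SubElement(author, "last").text = last
        ET.SubElement(author, "first").text = first
        for t in titles:
            ET.SubElement(result, "title").text = t
    return results

results = construct(query(ET.fromstring(BIB)))
print(ET.tostring(results, encoding="unicode"))
```

Because the querying function only produces bindings and the constructing function only consumes them, either side can be changed without touching the other, which is the design point the rule form is meant to convey.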
In the deductive database language Datalog, a query can be
expressed using several rules with different heads or temporary relations. Thus, Datalog does not have universal and
existential quantifiers, as they can be transformed into equivalent rules without them [21]. For XML queries, it is better
to use fewer rules, and thus we include the universal quantifier
foreach and the existential quantifier exists in a construct of
the form foreach ... exists ... such that.
Example 25 List the people who are authors of every book.
querying
(www.abc.com/people.xml)//author ⇒ $a
foreach $b (www.abc.com/people.xml)//book ⇒ $b
exists $a' $b/author ⇒ $a'
such that ($a = $a')
constructing
/results/result ⇒ $a
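The foreach ... exists ... such that construct in Example 25 corresponds to a nested all/any test. The Python sketch below shows this reading on an abridged copy of Figure 3; the function name authors_of_every_book is hypothetical, and authors are compared by their (last, first) pair, which is an assumption about what equality between $a and $a' means.

```python
import xml.etree.ElementTree as ET

BIB = """<bib name="IT" year="1998">
  <book year="1995"><title>Databases</title>
    <author><last>Date</last><first>C.</first></author></book>
  <book year="1998"><title>XML</title>
    <author><last>Date</last><first>C.</first></author>
    <author><last>Darwen</last><first>D.</first></author></book>
</bib>"""

def name(author):
    return (author.findtext("last"), author.findtext("first"))

def authors_of_every_book(root):
    books = root.findall(".//book")
    # Candidate values for $a, in document order without duplicates.
    candidates = list(dict.fromkeys(name(a) for a in root.findall(".//book/author")))
    # foreach $b ... exists $a' ... such that $a = $a'
    return [a for a in candidates
            if all(any(name(a2) == a for a2 in b.findall("author")) for b in books)]

every = authors_of_every_book(ET.fromstring(BIB))
print(every)
```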
Let us now see how to handle recursion using our query
language.
Example 26 Consider the sample XML document in Figure 2. The following query lists the ID of a person and
his/her ancestors' IDs in an ancestors attribute with two
rules:
querying
(http://www.abc.com/people.xml)
/person ⇒ [@id ⇒ $p, @children ⇒ $c]
constructing
/results/result ⇒ [@id ⇒ $c, @ancestors ⇒ {$p}]
querying
(http://www.abc.com/people.xml)
/person ⇒ [@id ⇒ $p, @children ⇒ $c],
/results/result ⇒ [@id ⇒ $p, @ancestors ⇒ $a]
constructing
/results/result ⇒ [@id ⇒ $c, @ancestors ⇒ {$p}]
The first rule says for each person identified by $p, if $c is
a child, then $p is an ancestor of $c. The second rule says if
$c is a child of $p and $a is an ancestor of $p, then $p is also
an ancestor of $c. Note here that the second rule is recursively
defined. The result of this query is as follows:
<results>
<result id="o123" ancestors="o456 o678"/>
<result id="o234" ancestors="o456 o678"/>
<result id="o456" ancestors="o678"/>
<result id="o567" ancestors="o678"/>
</results>
These two rules can be combined into one using or, as in
logic programming, once the variables of the second alternative are
renamed so that both alternatives share the same head:
querying
(http://www.abc.com/people.xml)
/person ⇒ [@id ⇒ $p, @children ⇒ $c]
or
(http://www.abc.com/people.xml)
/person ⇒ [@id ⇒ $q, @children ⇒ $c],
/results/result ⇒ [@id ⇒ $q, @ancestors ⇒ $p]
constructing
/results/result ⇒ [@id ⇒ $c, @ancestors ⇒ {$p}]
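As in deductive databases, a recursive rule of this kind can be evaluated bottom-up by applying it repeatedly until no new facts are derived. The following Python sketch illustrates that fixpoint idea only; it is not our Java implementation. Figure 2 is not reproduced here, so PEOPLE_XML is a hypothetical stand-in whose parent-child data was chosen to agree with the result listed above, and the function name ancestors is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document of Figure 2.
PEOPLE_XML = """
<people>
  <person id="o678" children="o456 o567"/>
  <person id="o456" children="o123 o234"/>
  <person id="o123"/>
  <person id="o234"/>
  <person id="o567"/>
</people>
"""

def ancestors(people_xml: str) -> dict[str, set[str]]:
    """Map each person ID to the set of IDs of all his/her ancestors."""
    root = ET.fromstring(people_xml)
    children_of = {
        p.get("id"): p.get("children", "").split()
        for p in root.findall("person")
    }
    anc: dict[str, set[str]] = {}
    # Non-recursive rule: a parent is an ancestor of each child.
    for p, kids in children_of.items():
        for c in kids:
            anc.setdefault(c, set()).add(p)
    # Recursive rule: ancestors of a parent are ancestors of the child.
    # Repeat until a full pass derives nothing new (the fixpoint).
    changed = True
    while changed:
        changed = False
        for p, kids in children_of.items():
            for c in kids:
                new = anc.get(p, set()) - anc[c]
                if new:
                    anc[c] |= new
                    changed = True
    return anc

if __name__ == "__main__":
    for pid, ids in sorted(ancestors(PEOPLE_XML).items()):
        print(pid, " ".join(sorted(ids)))
```

The loop terminates because each pass can only add IDs drawn from the finite set of persons in the document.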
5 Conclusion
In this paper, we have described a natural way to model
XML documents as in complex object data models. As a
result, we can easily comprehend XML data from a database
point of view. Based on this view, we have also illustrated
how logical variables can be used to extract data from XML
documents and how rules can be used to construct query
results. We have also demonstrated the benefits that a rule-based query language brings. In addition, our language supports not only data extraction from XML documents but also URL-related searches, and thus supports the
functionality of search engines. To our knowledge, none of the existing XML
query languages supports this feature.
Unlike other XML query languages, the language
described here has a well-defined logical foundation [18]. The language has been implemented in Java.
We will make it available from the web site at
http://www.scs.carleton.ca/mengchi/XML/ after further testing and debugging.
We would like to investigate how the data model described here can be used as a basis for other XML query
languages such as XQuery. We would also like to extend
the language into a full-fledged one by adding other useful
features for XML querying and transformation. We plan
to develop a natural language interface to our XML query
language as well, so that users can query XML documents directly in simplified
English. A direct XML storage
manager that supports the rule-based language is also under development.
References
[1] S. Abiteboul, R. Hull, and V. Vianu. Foundations of
Databases. Addison Wesley, 1995.
[2] S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. L.
Wiener. The Lorel Query Language for Semistructured
Data. Intl. Journal of Digital Libraries, 1(1):68–88, 1997.
[3] A. Bonifati and S. Ceri. Comparative Analysis of Five XML
Query Languages. SIGMOD Record, 29(1):68–79, 2000.
[4] R. G. G. Cattell and D. Barry, editors. The Object Database
Standard: ODMG 2.0. Morgan Kaufmann, Los Altos, CA,
1997.
[5] S. Ceri, S. Comai, E. Damiani, P. Fraternali, S. Paraboschi,
and L. Tanca. XML-GL: a Graphical Language for Querying
and Restructuring WWW data. In Proceedings of the 8th International World Wide Web Conference, Toronto, Canada,
1999.
[6] D. Chamberlin, P. Fankhauser, M. Marchiori, and J. Robie.
XML Query Requirements.
http://www.w3.org/TR/2001/WD-xmlquery-req-20010215,
February 2001.
[7] D. Chamberlin, D. Florescu, J. Robie, J. Simeon, and
M. Stefanescu. XQuery: A Query Language for XML.
http://www.w3.org/TR/2001/WD-xquery-20010215, February 2001.
[8] J. Clark. XSL Transformations (XSLT) Version 1.0.
http://www.w3.org/TR/xslt, November 1999.
[9] J. Clark and S. DeRose. XML Path Language
(XPath) Version 1.0. http://www.w3.org/TR/1999/REC-xpath-19991116, November 1999.
[10] S. Cluet and J. Simeon. YATL: a Functional and
Declarative Language for XML.
http://db.bell-labs.com/user/simeon/icfp.ps, 1999.
[11] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, and
D. Suciu. XML-QL: A Query Language for XML.
http://www.w3.org/TR/1998/Note-xml-ql-19980819, August
1998.
[12] M. Fernandez and J. Robie. XML Query Data
Model. http://www.w3.org/TR/2001/WD-Query-datamodel-20010215, February 2001.
[13] H. Hosoya and B. Pierce. XDuce: A Typed XML Processing
Language (Preliminary Report). In Proceedings of WebDB
Workshop, 2000.
[14] M. Kifer, G. Lausen, and J. Wu. Logical Foundations of
Object-Oriented and Frame-Based Languages. Journal of
the ACM, 42(4):741–843, 1995.
[15] M. Liu. ROL: A Deductive Object Base Language. Information Systems, 21(5):431–457, 1996.
[16] M. Liu. The ROL Deductive Object Base Language (Extended Abstract). In Proceedings of the 7th International
Workshop on Database and Expert Systems Applications
(DEXA Workshop ’96), pages 122–131, Zurich, Switzerland,
September 9-10 1996. IEEE CS Press.
[17] M. Liu. Relationlog: A Typed Extension to Datalog with
Sets and Tuples. Journal of Logic Programming, 36(3):271–
299, 1998.
[18] M. Liu. A Logical Foundation for XML. In Proceedings of
the 14th International Conference on Advanced Information
Systems Engineering (CAiSE ’02), pages 568–583, Toronto,
Canada, May 27-31 2002. Springer-Verlag LNCS 2348.
[19] M. Liu and M. Guo. ROL2: A Real Deductive Object-Oriented Database Language. In Proceedings of the 17th International Conference on Conceptual Modeling (ER ’98),
pages 302–315, Singapore, Nov. 16-19 1998. Springer-Verlag LNCS 1507.
[20] M. Liu and T. W. Ling. A Conceptual Model and Rule-based Query Language for HTML. World Wide Web Journal, 4:49–77, 2001.
[21] J. W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.
[22] J. Robie, J. Lapp, and D. Schach. XML Query Language
(XQL). http://www.w3.org/TandS/QL/QL98/pp/xql.html,
1998.