Filling the gap between data federation and
data integration
Antonella Poggi and Marco Ruzzi
Dipartimento di Informatica e Sistemistica
Università di Roma “La Sapienza”
Via Salaria 113, I-00198 Roma, Italy
{poggi,ruzzi}@dis.uniroma1.it
Abstract. Today’s fast and continuous growth of large business organizations enforces the need to integrate and share data coming from a number of heterogeneous and distributed data sources. To address this problem, data federation tools propose the adoption of a database management system as a kind of middleware infrastructure that uses a set of wrappers to access heterogeneous data sources. In this paper, we highlight the limitations of federation tools with respect to the information integration problem and show how data integration systems overcome such limitations. We then propose two different techniques to implement a data integration system using a data federation tool. Finally, we present experimental results that compare the two solutions.
1 Introduction
Today’s fast and continuous growth of large business organizations, often deriving from mergers of smaller enterprises, creates an increasing need to integrate and share large amounts of data coming from a number of heterogeneous and distributed data sources. Similar needs arise in other application fields, such as information systems for administrative organizations, life sciences research, and many others. Moreover, it is not infrequent that different parts of the same organization adopt different systems to produce and maintain their critical data. All these situations are concerned with the problem of information integration.
Some recent software solutions to the problem suggest the adoption of a database management system (DBMS) as a kind of middleware infrastructure that uses a set of software modules, called wrappers, to access heterogeneous data sources [8]. Such wrappers hide the native characteristics of each source, presenting it under the appearance of a common relational table. Furthermore, their aim is to mediate between the federated database and the sources, mapping the data model of each source to the federated database data model, and transforming operations over the federated database into requests that the source can handle. Such a technique takes the name of data federation and allows users to exploit the full power of the SQL language supported by the adopted DBMS to combine the data coming from the sources. Therefore, the user can federate any kind of source for which an appropriate wrapper exists, involving different sources within a single SQL statement. Among the variety of available products, IBM DB2 Information Integrator (DB2II) [2] is the only commercial data federation tool we are aware of [7]; that is why it is the one we consider in the present paper.
As we will better explain in the following, the features provided by data federation tools are often inadequate to obtain an acceptable solution to the problem of information integration. We will show how to make use of a data integration system to overcome these limitations, providing a flexible architecture that allows the user to uniformly access various sources and combine their data into a global unified view. To this aim, the user can define a global schema, i.e., a set of global concepts that represent the domain of interest, specifying, by means of a mapping, the relationships that hold between these global entities and the underlying sources, represented within the data integration system by a source schema [11, 9]. To better represent the domain of interest, the user can enrich the global schema with integrity constraints that the system takes into account while processing requests [5, 3, 4]. The user interacts with the system by posing queries over the global schema, while the system carries out the task of suitably accessing the sources and retrieving the data that form the answer.
In this paper we present an approach to the problem of information integration that applies data integration theory to a data federation environment. More specifically, assuming a relational framework, we define a formal specification of a data integration system, comprising a global schema, a mapping and a source schema, and we show how to implement this specification by means of a commercial tool for data federation. This is obtained by: (i) producing an instance of a federated database through the compilation of such a specification; (ii) translating the user queries posed over the global schema, so as to issue them to the federated database. To this aim, we propose two different compilation techniques; in both cases, we show the correctness of the compilation with respect to the semantics of the system. Finally, we compare the two solutions through a set of experiments conducted using DB2II.
The paper is structured as follows: in Section 2 we present the basic concepts regarding data federation, referring to the architecture of IBM DB2 Information Integrator, and we analyze its limitations with respect to the problem of information integration; in Section 3 we introduce the classical logical framework for data integration; in Section 4 we show two different techniques for specification compilation and query rewriting, which we experiment with and compare in Section 5; finally, we conclude the paper and discuss future work on the topic.
2 DB2II: IBM’s data federation tool
Data federation tools enable data from multiple heterogeneous data sources to appear as if it were contained in a single federated database. Besides uniform access, performance and resource requirements are also considered, in order to suitably address typical needs such as joins or aggregations across different data sources.
In this section, we provide an overview of a commercial solution. We choose DB2 Information Integrator (DB2II), since it is known to provide a very efficient federation of heterogeneous data sources [2]. Then, starting from a critique of DB2II, we discuss the limits of data federation with respect to data integration.
2.1 Overview of DB2II
DB2II is IBM’s solution to the data federation problem [8]. It provides single, uniform access to a large variety of heterogeneous data sources. More precisely, DB2II provides different wrappers (implemented by different libraries) for many popular data sources, such as relational sources (Oracle, MS SQL Server, ...), semistructured XML sources, and many common specialized data sources (such as an Excel table). Note that several homogeneous data sources can refer to the same wrapper. Inside a particular data source, the data sets of interest are modeled as nicknames, each of which is basically a virtual view on a set of data. For example, inside a relational source, DB2II associates a nickname to each relation of interest in the source, while, for a non-relational XML source, DB2II associates a nickname to a fragment of the XML document characterized by an XPath expression.
The DB2II wrapper architecture enables the federation of heterogeneous data sources, maintaining the state information for each of them, managing connections, decomposing queries into fragments that each source can handle, and managing transactions across data sources. It supplies the infrastructure to model multiple data sets and multiple operations, letting specialized functions supported by the external source be invoked in a query even though DB2 does not natively support them. Furthermore, it includes a flexible framework to provide input to the query optimizer about the cost of data source operations and the size of the data that will be returned, supplying at the same time information based on the immediate context of the query. Finally, wrappers participate as resource managers [8] in a DB2 transaction.
2.2 Limits of data federation
Let us consider the DB2II approach to information integration presented above. As a federation tool, DB2II does not let the designer define an arbitrary description of the domain of interest through which the users can access external sources. In particular, we have identified two main limiting features of the product:
1. As we already mentioned, DB2II models an external source data set as a nickname that consists of a virtual view on the data set. Therefore, DB2II establishes a one-to-one correspondence between source data sets and nicknames, instead of letting the designer define a correspondence between a concept of interest, represented as a unique relation, and a view over multiple source data sets.
2. Furthermore, if the source is relational, the nickname schemas are identical to those
of the modeled data source relations, which means that both have the same number,
name and type of attributes. Even if the definition of nickname schemas is a little
more flexible for non-relational data sources, there are still several rules that the
designer has to follow, which limit the expressiveness of the correspondence even
inside a single source data set.
The mentioned limits are typical of data federation. Indeed, in this kind of tool, the designer is provided with a view of the source data sets that is source-dependent, since it basically reflects the structure of the source data sets. In the next section, we present a formal approach to the problem of data integration, showing how this theory can fill the gap left by such limitations.
3 A logical framework for data integration
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data [11, 12, 10]. The adoption of a logical formalism for specifying the global schema enables users to focus on the intensional aspects of the integration (what they want), rather than on the procedural aspects (how to obtain the answers). On the one hand, through the specification of a global schema, the domain of interest can be described independently of the structure of the data sources. On the other hand, the mapping models the relation between the global schema and the sources. The limits of data federation are therefore overcome by data integration systems. Such an approach guarantees a higher level of flexibility, which may eventually lead to extensions of the system, for example by introducing integrity constraints over the global schema.
In this section, we set up a logical framework for data integration. We first formalize
the notion of a data integration system, then we specify its semantics.
3.1 Syntax
We assume an infinite, fixed alphabet Γ of constants representing real world objects, and we take into account only databases having Γ as domain. Furthermore, we adopt the so-called unique name assumption, i.e., we assume that different constants denote different objects. Formally, a data integration system I is a triple ⟨G, S, M⟩, where:
1. G is the global schema expressed in the relational model. In particular, G is a set of
relations, each with an associated arity that indicates the number of its attributes:
G = {R1 (A1 ), R2 (A2 ), ..., Rn (An )}, where Ai is the sequence of attributes of
Ri , for i = 1..n.
2. S is the source schema, constituted by the schemas of the various sources. We assume that the sources are relational. Dealing only with relational sources is not restrictive, since we can always assume that suitable wrappers (such as the IBM Information Integrator wrappers) present the sources in relational format. In particular, S is the union of the sets of data source relations Sj: S = ⋃j=1..m Sj, with Sj = {rj1(aj1), rj2(aj2), ..., rjnj(ajnj)}, where ajh is the sequence of attributes of the h-th relation of the j-th source, for j = 1..m, h = 1..nj.
3. M is the mapping between the global schema and the source schema. In our framework, the mapping is defined following the Global-As-View (GAV) approach, i.e., each global relation in G is associated with a view, i.e., a query over the sources. We assume that views in the mapping are specified through conjunctive queries (CQs). We recall that a CQ q of arity n is a rule of the form:
   q(x1, ..., xn) ← conj(x1, ..., xn, y1, ..., ym)
where conj(x1, ..., xn, y1, ..., ym) is a conjunction of atoms involving the variables x1, ..., xn, y1, ..., ym and a set of constants of Γ. We denote the atom q(x1, ..., xn) as head(q), while body(q) denotes the set conj(x1, ..., xn, y1, ..., ym). A GAV mapping is therefore a set of CQs q, where head(q) is a relation symbol of G and each atom in body(q) is a relation symbol of S. We assume that, for each relation symbol R in G, at most one CQ in M uses R in its head. Such a CQ has the form:
   R(x) ← r1(x1, y1), ..., rk(xk, yk)
where rh ∈ S for h = 1..k, and x = ⋃h=1..k xh. We denote such a CQ, if it exists, as ρ(R).
Finally, a query over the global schema is a formula that is intended to extract a set
of tuples of elements of Γ . We assume that the language used to specify queries is union
of conjunctive queries (UCQ). We recall that a UCQ of arity n is a set of conjunctive
queries Q such that each q ∈ Q has the same arity n and uses the same predicate symbol
in the head. A query over I is a UCQ that only uses relation symbols of G in the body
of the rules. An example of data integration system specification follows.
Example 1. We build a data integration system specification I considering three data sources: the first stores information coming from the Registry Office concerning citizens and enterprises, the second holds geographical information about cities and their location, and the third stores information coming from the Land Register about ownerships (buildings and lands). The source schema S that represents these sources within the system contains four relations:
   s1 = citizen(ssn, name, citycode)
   s2 = enterprise(ssn, name, citycode, emp_number)
   s3 = city(code, name, country)
   s4 = ownership(code, owner_ssn, address, type)
We want to extract from these sources only information about owners and their buildings. We can define the global schema G with two relations
   R1 = owner(name, city)
   R2 = building(address, owner)
and the mapping M can be defined as follows:
   v1 = owner(X, Y) ← citizen(W1, X, Z), city(Z, Y, W2)
   v2 = owner(X, Y) ← enterprise(W1, X, Z, W2), city(Z, Y, W3)
   v3 = building(X, Y) ← ownership(W1, Y, X, 'building')
A possible query q that asks for the owners and addresses of the buildings of Rome is
   q = Q(X, Y) ← owner(X, 'Rome'), building(Y, X).
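To make the formalism concrete, the specification of Example 1 can be encoded in a few ordinary data structures. The following Python sketch is purely illustrative and not part of the system: atoms are (predicate, argument-list) pairs, variables are strings prefixed with "?", and every other term is a constant.

```python
# Illustrative encoding of the specification of Example 1.
# An atom is a (predicate, args) pair; a conjunctive query (CQ) is a
# (head_atom, body_atoms) pair. Variables are strings prefixed with "?";
# every other term (e.g. "building") is a constant.

# Source schema S: relation name -> sequence of attribute names
source_schema = {
    "citizen":    ["ssn", "name", "citycode"],
    "enterprise": ["ssn", "name", "citycode", "emp_number"],
    "city":       ["code", "name", "country"],
    "ownership":  ["code", "owner_ssn", "address", "type"],
}

# Global schema G
global_schema = {
    "owner":    ["name", "city"],
    "building": ["address", "owner"],
}

# GAV mapping M: each global relation is associated with a list of CQs
# over S (a list with more than one element encodes a union, as for owner).
mapping = {
    "owner": [
        # v1 = owner(X, Y) <- citizen(W1, X, Z), city(Z, Y, W2)
        (("owner", ["?X", "?Y"]),
         [("citizen", ["?W1", "?X", "?Z"]), ("city", ["?Z", "?Y", "?W2"])]),
        # v2 = owner(X, Y) <- enterprise(W1, X, Z, W2), city(Z, Y, W3)
        (("owner", ["?X", "?Y"]),
         [("enterprise", ["?W1", "?X", "?Z", "?W2"]),
          ("city", ["?Z", "?Y", "?W3"])]),
    ],
    "building": [
        # v3 = building(X, Y) <- ownership(W1, Y, X, 'building')
        (("building", ["?X", "?Y"]),
         [("ownership", ["?W1", "?Y", "?X", "building"])]),
    ],
}

# The query q = Q(X, Y) <- owner(X, 'Rome'), building(Y, X)
query = (("Q", ["?X", "?Y"]),
         [("owner", ["?X", "Rome"]), ("building", ["?Y", "?X"])])
```

The same representation is reused below when we sketch the compilation and translation algorithms.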
3.2 Semantics
A database instance (or simply database) C for a relational schema DB is a set of facts of the form r(t), where r is a relation of arity n in DB and t is an n-tuple of constants of Γ. We denote by r^C the set {t | r(t) ∈ C} and by q^C the result of the evaluation of the query q (expressed over DB) on C.
Fig. 1. Levels of integration: the global schema (relations, virtual data) is connected through GAV mappings to the federated schema (nicknames, virtual data), which is in turn connected through wrappers to the source data sets (real data).
In order to assign a semantics to a data integration system I = ⟨G, S, M⟩, we start by considering a source database for I, i.e., a database D for the source schema S. We call global database for I any database for G. Given a source database D for ⟨G, S, M⟩, the semantics sem of I w.r.t. D, sem(I, D), is the set of global databases B for I such that B satisfies M with respect to D. In particular, B is such that, for each view ρM(R) in the GAV mapping M, all the tuples that satisfy ρM(R) in D satisfy the global schema relation R, i.e., ρM(R)^D ⊆ R^B.
Finally, we specify the semantics of queries posed to a data integration system. Such queries are expressed in terms of the symbols in the global schema of I. Formally, given a source database D for I, we call certain answers q^{I,D} to a query q with respect to I and D the set of tuples t of objects in Γ such that t ∈ q^B for every global database B for I with respect to D, i.e., q^{I,D} = {t | ∀B ∈ sem(I, D), t ∈ q^B}.
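Under the assumptions of this framework (GAV mapping, conjunctive views, no integrity constraints over G), the certain answers of a positive query coincide with its evaluation over the "retrieved global database" obtained by applying the mapping views to the source database [11]. The following Python sketch, with illustrative data only, makes this operational; the naive backtracking evaluator, the atom encoding (variables prefixed with "?") and the sample tuples are our own assumptions, not part of the system.

```python
from itertools import chain

def is_var(t):
    """Variables are strings prefixed with '?'; anything else is a constant."""
    return isinstance(t, str) and t.startswith("?")

def eval_cq(cq, db):
    """Naively evaluate a conjunctive query over db, a dict mapping relation
    names to sets of tuples, by backtracking over the body atoms."""
    (head_pred, head_args), body = cq
    results = set()
    def bind(term, value, s):
        if is_var(term):
            if term in s:
                return s[term] == value
            s[term] = value
            return True
        return term == value            # constants must match exactly
    def search(atoms, subst):
        if not atoms:
            results.add(tuple(subst.get(a, a) for a in head_args))
            return
        pred, args = atoms[0]
        for tup in db.get(pred, ()):
            s = dict(subst)
            if all(bind(a, v, s) for a, v in zip(args, tup)):
                search(atoms[1:], s)
    search(list(body), {})
    return results

def certain_answers(query_cqs, mapping, source_db):
    """Certain answers of a UCQ: evaluate it over the retrieved global
    database obtained by applying each GAV mapping view to source_db."""
    retrieved = {}
    for rel, cqs in mapping.items():
        retrieved[rel] = set(chain.from_iterable(
            eval_cq(cq, source_db) for cq in cqs))
    answers = set()
    for cq in query_cqs:
        answers |= eval_cq(cq, retrieved)
    return answers

# Illustrative tuples for the schema of Example 1 (invented sample data).
source_db = {
    "citizen":    {("s1", "Rossi", "01")},
    "enterprise": set(),
    "city":       {("01", "Rome", "IT")},
    "ownership":  {("c1", "Rossi", "Via Salaria 113", "building")},
}
mapping = {
    "owner": [
        (("owner", ["?X", "?Y"]),
         [("citizen", ["?W1", "?X", "?Z"]), ("city", ["?Z", "?Y", "?W2"])]),
        (("owner", ["?X", "?Y"]),
         [("enterprise", ["?W1", "?X", "?Z", "?W2"]),
          ("city", ["?Z", "?Y", "?W3"])]),
    ],
    "building": [
        (("building", ["?X", "?Y"]),
         [("ownership", ["?W1", "?Y", "?X", "building"])]),
    ],
}
query = [(("Q", ["?X", "?Y"]),
          [("owner", ["?X", "Rome"]), ("building", ["?Y", "?X"])])]

ans = certain_answers(query, mapping, source_db)
```

Here `ans` contains the single certain answer pairing the owner with the address of the building, as prescribed by the semantics above.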
4 Efficient data integration through data federation
As illustrated in the previous section, data integration supplies a higher level of abstraction with respect to data federation. In particular, as shown in Figure 1, a data integration system can be considered as composed of: (i) a global schema, which constitutes the intensional description of the data of interest; (ii) a source schema, which consists basically of a federated schema; (iii) a set of source data sets, which represent the real data; (iv) a set of GAV mappings that model the relation between the global relations and the federated relations; and (v) a set of wrappers that implement the one-to-one correspondence between the federated relations and the source data sets. Such a scenario can be seen as a specialization of the well-known wrapper-mediator architecture [13].
In this section, we first present an overview of our approach to realizing an efficient data integration system relying on a federation tool; then, we define two different techniques based on such an approach; finally, we discuss the correctness of both techniques.
Fig. 2. System architecture: the compiler maps the data integration system specification into a DB2II instance, while the translator maps each query q over the global schema into a query q′ over that instance.
4.1 Overview of our solution
The solution we propose aims to implement an efficient data integration system that offers all the optimization techniques supplied by the federation tool, besides the expressiveness of the global schema. Note that, in the specification of the logical framework for data integration, we assumed a relational source schema S. Through this assumption, we implicitly assume to take advantage of the wrapper mechanism supplied by the federation tool.
Let us use DB2II to implement a data integration system. We first need to build the source schema S, by creating a set of nicknames. Then, we need to implement two modules:
1. a compiler, responsible for compiling the data integration system I into a DB2II instance I′;
2. a translator, responsible for translating a query Q over the data integration system I into a query Q′ over the corresponding DB2II instance I′.
An overview of the architecture of the entire system is given in Figure 2.
Suppose we start from a DB2II instance that contains all the nicknames in S. There are two possible techniques for implementing the two modules presented above:
– Internal technique: the idea is to rely upon the DB2II management of views. The compiler builds a DB2II instance that contains a view VR for each global relation R; VR is defined by the query ρ(R) expressed in the mapping of the data integration system.
– External technique: the idea is to focus the process on the user queries, rather
than on the system specification, implementing an unfolding technique [13] that
takes into account the correspondences expressed in the mappings.
Note that, in the following, for lack of space, we omit the rules that specify the translation of the logical representation of both mappings and queries into the corresponding SQL statements issued to the DBMS.
4.2 Internal technique: implementing data integration using DB2II views
Since DB2II relies on DB2 UDB (Universal Database), IBM’s DBMS, a possible way to realize data integration is to take advantage of the view management of DB2II in order to implement the correspondences expressed in the mapping. In particular, the DB2II instance Ii′ that results from the compilation of an input data integration system I = ⟨G, S, M⟩ is characterized by a set of views V = {W}, where each view is represented as a pair W = ⟨V(A), Ψ(V)⟩ such that:
– V(A) is the view schema, where A is a vector of attributes and V is the view name;
– Ψ(V) is the conjunctive query on S that defines the view; Ψ(V) has the form:
   V(x) ← S1(x1, y1), ..., Sk(xk, yk)
where x = ⋃h=1..k xh and the components of the vectors x and yh are variables or constants of the set Γ, for h = 1..k.
The compilation algorithm compilei(I), shown in Figure 3, generates the set V of views of Ii′ by applying the following rule: if R(A) ∈ G, then insert into V the pair W = ⟨VR(A), Ψ(VR)⟩, where:
– ρ(R) = R(x) ← r1(x1, y1), ..., rk(xk, yk),
– Ψ(VR) = VR(x) ← r1(x1, y1), ..., rk(xk, yk).
V := {};
for each R(A) ∈ G do
   V := V ∪ {⟨VR(A), VR ← body(ρ(R))⟩}
Fig. 3. compilei(I) algorithm.
Now, after having created the set of views V in the DB2II instance Ii′, we need to provide a translation mechanism that, given a conjunctive query Q over I, returns a corresponding query Qi′ over the set of views V that belong to Ii′. The translation algorithm translatei(Q, I), shown in Figure 4, generates the result by applying the following two syntactical rules:
1. the head of the query Qi′ is identical to the head of the query Q;
2. for each atom R(x1, y1) in the body of Q, insert the atom VR(x1, y1) in the body of Qi′.
Example 2. Referring to the system specification I of Example 1, we show the compiled DB2II instance Ii′:
   ⟨Vowner(name, city), {Vowner(X, Y) ← citizen(W1, X, Z), city(Z, Y, W2),
                          Vowner(X, Y) ← enterprise(W1, X, Z, W2), city(Z, Y, W3)}⟩
   ⟨Vbuilding(address, owner), Vbuilding(X, Y) ← ownership(W1, Y, X, 'building')⟩
Note that the view definition for Vowner is a union of conjunctive queries. The translation of the user query Q to be posed on the defined views is
   Qi = Q(X, Y) ← Vowner(X, 'Rome'), Vbuilding(Y, X).
qaux := {};
for each g ∈ body(Q) such that g = R(xi, yi) do
   qaux := qaux ∪ {VR(xi, yi)}
Q′ := head(Q) ← qaux
Fig. 4. translatei(Q, I) algorithm.
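The two algorithms of Figures 3 and 4 can be sketched in Python over a simple data-structure encoding; the encoding itself (atoms as (predicate, argument-list) pairs, variables prefixed with "?") and the V_ naming prefix are our own illustrative assumptions, not part of DB2II.

```python
# Sketch of the internal technique (Figs. 3 and 4): compile_i associates a
# view V_R with each global relation R, defined by the body of rho(R);
# translate_i rewrites a query over the global schema into a query over the
# views. Atoms are (predicate, args) pairs; a CQ is a (head, body) pair.

def compile_i(mapping):
    """Fig. 3: build one (possibly union) view per global relation.
    mapping maps each global relation name to a list of CQs over S."""
    views = {}
    for rel, cqs in mapping.items():
        view_name = "V_" + rel
        views[view_name] = [((view_name, head_args), body)
                            for (_, head_args), body in cqs]
    return views

def translate_i(query_cq):
    """Fig. 4: keep the head, and replace every global atom R(...) in the
    body with the corresponding view atom V_R(...)."""
    head, body = query_cq
    return (head, [("V_" + pred, args) for pred, args in body])

# The mapping of Example 1 (owner is defined by a union of two CQs).
mapping = {
    "owner": [
        (("owner", ["?X", "?Y"]),
         [("citizen", ["?W1", "?X", "?Z"]), ("city", ["?Z", "?Y", "?W2"])]),
        (("owner", ["?X", "?Y"]),
         [("enterprise", ["?W1", "?X", "?Z", "?W2"]),
          ("city", ["?Z", "?Y", "?W3"])]),
    ],
    "building": [
        (("building", ["?X", "?Y"]),
         [("ownership", ["?W1", "?Y", "?X", "building"])]),
    ],
}

views = compile_i(mapping)
q = (("Q", ["?X", "?Y"]),
     [("owner", ["?X", "Rome"]), ("building", ["?Y", "?X"])])
qi = translate_i(q)
# qi now reads Q(X, Y) <- V_owner(X, 'Rome'), V_building(Y, X), as in Example 2
```

In the actual system these view definitions would of course be issued to DB2II as SQL CREATE VIEW statements over the nicknames; the sketch only captures the syntactical transformation.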
4.3 External technique: external management of global schema and mapping
In the second solution, we specify a technique for implementing a data integration system specification by means of external data structures. In short, we leave inside the DB2II instance Ie′ only the federated schema S, that is, the set of nicknames that wrap the sources, maintaining in appropriate external structures the global schema and the mapping between global relations and nicknames. Therefore, in this solution, the compiler behaves like the identity function, i.e., the DB2II instance Ie′ is constituted only by S.
As we said in the previous sections, the system user poses queries over the global schema. In order to process the user’s requests, we have to provide a translation mechanism that allows the queries to be issued on the compiled DB2II instance Ie′. Informally, this may be done by substituting each atom appearing in the body of the query with the body of its corresponding mapping definition. This process can be seen as an extension of the well-known unfolding algorithm [13], and can be formalized as follows. Given a conjunctive query q over the relational symbols of G, where G is the global schema of I = ⟨G, S, M⟩, we define the translated query for I as q′ = translatee(q, I), a new query expressed over the relational symbols of S.
qaux := q;
for each g ∈ body(q) do
   v := mapping_by_head(g);
   σ := unify(g, head(v));
   if σ = {} then
      return NULL
   else
      qaux := σ[replace(qaux, g, body(v))]
   end if
end for
return qaux
Fig. 5. translatee(q, I) algorithm
In Figure 5 we show the unfolding algorithm we adopt for query reformulation. The algorithm makes use of some subroutines: mapping_by_head(g), which, given an atom g, returns the mapping view whose head corresponds to g; unify(g1, g2), which unifies the variables of g1 and g2, returning, if a unifier can be found, a substitution σ that makes the two atoms equal. Moreover, the unfolding algorithm uses the subroutine replace(q, g, conj), which, given a conjunctive query q, one of its body atoms g, and a conjunction conj, replaces the atom g in the body of q with the conjunction conj.
Example 3. We consider again the system specification I of Example 1. As we said before, there is no need to compile the specification, but we have to unfold the query q over the source relations. Based on the unfolding algorithm, which binds Y to 'Rome' when unfolding owner(X, 'Rome') and swaps the arguments when unfolding building(Y, X), we obtain the following union of conjunctive queries:
   Q′(X, Y) ← citizen(W1, X, Z), city(Z, 'Rome', W2), ownership(W3, X, Y, 'building')
   Q′(X, Y) ← enterprise(W1, X, Z, W2), city(Z, 'Rome', W3), ownership(W4, X, Y, 'building')
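The unfolding step can be sketched in Python along the lines of Figure 5. The sketch below is our own illustration: atoms are (predicate, argument-list) pairs with variables prefixed with "?", it handles global relations defined by a union of CQs (as owner is) by producing one unfolded CQ per combination of view choices, and it renames view variables apart to avoid clashes with query variables.

```python
from itertools import product, count

_fresh = count()

def is_var(t):
    """Variables are strings prefixed with '?'; anything else is a constant."""
    return isinstance(t, str) and t.startswith("?")

def rename_apart(cq):
    """Rename all variables of a mapping CQ with a fresh suffix, so they
    cannot clash with the variables of the query being unfolded."""
    suffix = "_%d" % next(_fresh)
    (h_pred, h_args), body = cq
    r = lambda t: t + suffix if is_var(t) else t
    return ((h_pred, [r(a) for a in h_args]),
            [(p, [r(a) for a in args]) for p, args in body])

def translate_e(query_cq, mapping):
    """Unfold a CQ over the global schema into a union of CQs over the
    sources (cf. Fig. 5). Assumes, as in the paper's definition of rho(R),
    that mapping heads contain only distinct variables, so the unification
    of a head with a query atom cannot fail."""
    head, body = query_cq
    unfolded = []
    # one choice of defining CQ per global atom; unions multiply the choices
    for combo in product(*(mapping[pred] for pred, _ in body)):
        new_body = []
        for (_, g_args), view in zip(body, combo):
            (_, v_args), v_body = rename_apart(view)
            # unify the view head with the query atom: bind each (fresh)
            # head variable to the corresponding query term
            subst = dict(zip(v_args, g_args))
            new_body += [(p, [subst.get(a, a) for a in args])
                         for p, args in v_body]
        unfolded.append((head, new_body))
    return unfolded

# The mapping and query of Example 1, in the illustrative encoding.
mapping = {
    "owner": [
        (("owner", ["?X", "?Y"]),
         [("citizen", ["?W1", "?X", "?Z"]), ("city", ["?Z", "?Y", "?W2"])]),
        (("owner", ["?X", "?Y"]),
         [("enterprise", ["?W1", "?X", "?Z", "?W2"]),
          ("city", ["?Z", "?Y", "?W3"])]),
    ],
    "building": [
        (("building", ["?X", "?Y"]),
         [("ownership", ["?W1", "?Y", "?X", "building"])]),
    ],
}
q = (("Q", ["?X", "?Y"]),
     [("owner", ["?X", "Rome"]), ("building", ["?Y", "?X"])])

ucq = translate_e(q, mapping)  # two unfolded CQs, as in Example 3
```

Note how the constant 'Rome' is pushed down into the city atom and the ownership arguments follow the head of v3, matching the union obtained in Example 3.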
4.4 Correctness
So far we have proposed two different techniques that allow the implementation of a data integration system in DB2II. In both techniques, the compilation and translation algorithms basically consist of syntactical transformations.
The following theorems state the correctness of both solutions (we omit the proofs for space reasons):
Theorem 1. Given a data integration system specification I = ⟨G, S, M⟩, a source database D for I, a conjunctive query Q over G and a tuple t̄, t̄ ∈ Q^{I,D} if and only if t̄ belongs to the set of answers we obtain by querying the compiled DB2II instance Ii′ by means of the query Qi′, where Ii′ is the DB2II instance obtained from the compilation of I by means of the algorithm compilei(I), and Qi′ is the query obtained from Q by means of the algorithm translatei(Q, I).
Theorem 2. Given a data integration system specification I = ⟨G, S, M⟩, a source database D for I, a conjunctive query Q over G and a tuple t̄, t̄ ∈ Q^{I,D} if and only if t̄ belongs to the set of answers we obtain by querying the DB2II instance Ie′ by means of the query Qe′, where Ie′ is constituted only by S, and Qe′ is the query obtained from Q by means of the algorithm translatee(Q, I).
5 Experimental results
In order to test the feasibility of both the techniques presented in the previous sections, we have carried out some experiments on real data, coming from the University
of Rome “La Sapienza”. The experiment scenario comprises three data sources significantly overlapping: (i) a DB2 database instance holding administrative information, i.e.
registry, exams and career information about students (204014 tuples)); (ii) a Microsoft
SQLServer database instance storing information about exam plan of students (28790
tuples); (iii) some XML documents containing information about exams of students
of the computer science diploma course (5836 tuples). Note that the number of tuples
refers to the federated tables, resulting from wrapping.
Fig. 6. Data integration through data federation: (a) query execution times; (b) percentage comparisons.
Starting from these sources, we have built a GAV data integration system whose specification, which we omit for lack of space, contains 10 global relations and 27 source relations. In order to test the efficiency of our solutions regardless of network traffic and delays, we have carried out the experiments on local instances of the data, so as to have a truthful comparison of the two techniques. We have conducted the experiments on a dual Intel Pentium IV Xeon machine with a 3 GHz processor clock frequency, equipped with 2 GB of RAM and a 30 GB SCSI hard disk at 7200 RPM; the machine runs the Windows XP operating system.
We have run 5 test queries on both specifications: one obtained by the internal technique presented in Section 4.2 and the other generated with the external technique proposed in Section 4.3. We observe that the internal technique, based on DB2II view definitions, is faster than the external one, which implements a query rewriting algorithm. Furthermore, the gain obtained with the first technique is greater for queries that present a higher number of joins. This is due to the capability of the DB2II query engine to efficiently process views, taking into account source statistics and some access optimizations. Comparative execution times are shown in Figure 6, where the percentage difference between the slower and the faster technique is also presented.
6 Conclusions and future work
The goal of this paper was to propose a novel approach to realize a data integration
system, by means of a commercial data federation tool (DB2II). Summarizing, we have
provided the following contributions: (i) we have highlighted the limits of data federation tools with respect to information integration; (ii) we have proposed two different
compilation techniques to overcome such limitations; (iii) we have presented experimental results that compare the two approaches.
The proposed solutions take advantage of the wrapper mechanism of the tool, besides offering the expressiveness typical of a data integration system. Two fully running versions of our system have been implemented, one for each technique. Through a set of experiments we compared the two techniques, in order to evaluate which solution makes the most of the tool’s optimization techniques. In particular, we obtained better results using the internal management of views of DB2II than by representing the mappings in appropriate external data structures. Nevertheless, we notice that the external technique may be preferable if the designer wants to take advantage of the full power of the logical approach, retaining the opportunity, for example, to exploit some recent techniques for the treatment of incomplete and inconsistent information [11].
In fact, this work represents the first step in the implementation of a data integration system through a data federation tool. Among the perspectives of our work, we plan to extend our approach by enhancing the modeling power of the system, for example by allowing the specification of typing constraints or integrity constraints over the global schema. Moreover, we plan to investigate how to rely upon a commercial federation tool while following the Local-As-View (LAV) approach to data integration [1, 6], and, finally, to offer different languages for the specification of the global schema, such as object-oriented or semi-structured languages.
References
1. Serge Abiteboul and Oliver Duschka. Complexity of answering queries using materialized
views. In Proc. of the 17th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database
Systems (PODS’98), pages 254–265, 1998.
2. Paolo Bruni, Francis Arnaudies, Amanda Bennett, Susanne Englert, and Gerhard Keplinger.
Data federation with IBM DB2 Information Integrator V8.1, 2003. Redbook, http://
www.redbooks.ibm.com/redbooks/pdfs/sg247052.pdf.
3. Andrea Calì, Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. Data integration under integrity constraints. Information Systems, 2003. To appear.
4. Andrea Calì, Domenico Lembo, and Riccardo Rosati. On the decidability and complexity
of query answering over inconsistent and incomplete databases. In Proc. of the 22nd ACM
SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2003), pages
260–271, 2003.
5. Oliver M. Duschka, Michael R. Genesereth, and Alon Y. Levy. Recursive query plans for
data integration. J. of Logic Programming, 43(1):49–73, 2000.
6. Gösta Grahne and Alberto O. Mendelzon. Tableau techniques for querying information
sources through global schemas. In Proc. of the 7th Int. Conf. on Database Theory
(ICDT’99), volume 1540 of Lecture Notes in Computer Science, pages 332–347. Springer,
1999.
7. L.M. Haas. A researcher’s dream. DB2 Magazine, 8(3):34–40, 2003.
8. L.M. Haas, E.T. Lin, and M.A. Roth. Data integration through database federation. IBM
Systems Journal, 41(4):578–596, 2002.
9. Alon Y. Halevy. Answering queries using views: A survey. Very Large Database J.,
10(4):270–294, 2001.
10. Richard Hull. Managing semantic heterogeneity in databases: A theoretical perspective. In
Proc. of the 16th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems
(PODS’97), pages 51–61, 1997.
11. Maurizio Lenzerini. Data integration: A theoretical perspective. In Proc. of the 21st ACM
SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2002), pages
233–246, 2002.
12. Alon Y. Levy. Logic-based techniques in data integration. pages 575–595. Kluwer Academic
Publishers, 2000.
13. Jeffrey D. Ullman. Information integration using logical views. In Proc. of the 6th Int. Conf.
on Database Theory (ICDT’97), volume 1186 of Lecture Notes in Computer Science, pages
19–40. Springer, 1997.