Schema management for large-scale multidatabase systems
Item Type
text; Dissertation-Reproduction (electronic)
Authors
Wei, Chih-Ping, 1965-
Publisher
The University of Arizona.
Rights
Copyright © is held by the author. Digital access to this material
is made possible by the University Libraries, University of Arizona.
Further transmission, reproduction or presentation (such as
public display or performance) of protected items is prohibited
except with permission of the author.
Link to Item
http://hdl.handle.net/10150/290610
SCHEMA MANAGEMENT FOR
LARGE-SCALE MULTIDATABASE SYSTEMS
by
Chih-Ping Wei
Copyright © Chih-Ping Wei 1996
A Dissertation Submitted to the Faculty of the
COMMITTEE ON BUSINESS ADMINISTRATION
In Partial Fulfillment of the Requirements
For the Degree of
DOCTOR OF PHILOSOPHY
WITH A MAJOR IN MANAGEMENT
In the Graduate College
THE UNIVERSITY OF ARIZONA
1996
THE UNIVERSITY OF ARIZONA
GRADUATE COLLEGE
As members of the Final Examination Committee, we certify that we have
read the dissertation prepared by Chih-Ping Wei
entitled
Schema Management for Large-Scale Multidatabase Systems
and recommend that it be accepted as fulfilling the dissertation
requirement for the Degree of
Doctor of Philosophy
Anindya Datta
Ralph Martinez
Final approval and acceptance of this dissertation is contingent upon
the candidate's submission of the final copy of the dissertation to the
Graduate College.
I hereby certify that I have read this dissertation prepared under my
direction and recommend that it be accepted as fulfilling the dissertation
requirement.
STATEMENT BY AUTHOR
This dissertation has been submitted in partial fulfillment of requirements for an
advanced degree at the University of Arizona and is deposited in the University Library
to be made available to borrowers under the rules of the library.
Brief quotations from this dissertation are allowable without special permission,
provided that accurate acknowledgment of source is made. Requests for permission for
extended quotation from or reproduction of this manuscript in whole or in part may be
granted by the copyright holder.
SIGNED:
ACKNOWLEDGEMENTS
I would like to thank all my committee members, Dr. Olivia R. Liu Sheng, Dr. Hsinchun
Chen, Dr. Anindya Datta, Dr. Ralph Martinez, and Dr. Bernard P. Zeigler, who have
been helpful in many ways. My deepest thanks go to my dissertation advisor, Dr. Olivia
R. Liu Sheng, who is an incredibly talented, thoughtful and inspiring individual. I am
extremely fortunate to have Dr. Sheng as my mentor; she has always been there for me
and has helped me over many hurdles during my research development. This
dissertation would not have achieved such high quality without her belief in my
capabilities and her guiding me to work to my potential.
I am deeply indebted to my close friends, Paul Jen-Hwa Hu, Pony Huei-Hwa Ma,
Hsiao-Tang Chang, Yiming Chung and Ming-Hsuan Yang, who went above and beyond the
call of friendship to help me finish up the dissertation and put life in the right perspective
when the going got tough. My gratitude to them is beyond words.
Moreover, it is impossible to measure my indebtedness to my girlfriend, Mei-Yun Wang.
She has been remarkably understanding and supportive in everything I pursue. I
honestly believe that I would not have achieved what I have without unceasing love,
encouragement and trust from my parents and my sisters. I dedicate this work to those
who love me and make my life blessed with hope, joy and wonders.
DEDICATION
To my grandfather who passed away in 1995 with an unrealized life-time wish whose
realization comes nine months too late.
TABLE OF CONTENTS
LIST OF FIGURES .... 9
LIST OF TABLES .... 11
LIST OF ALGORITHMS .... 12
ABSTRACT .... 13
1. INTRODUCTION .... 15
1.1 BACKGROUND .... 15
1.2 OVERVIEW OF MULTIDATABASE SYSTEMS .... 18
1.2.1 CHARACTERISTICS OF MULTIDATABASE SYSTEMS .... 18
1.2.2 OBJECTIVES OF MULTIDATABASE SYSTEMS .... 20
1.2.3 SCHEMA ARCHITECTURE OF MULTIDATABASE SYSTEMS .... 23
1.2.4 RESEARCH ISSUES IN MULTIDATABASE SYSTEMS .... 26
1.3 ISSUES OF SCHEMA MANAGEMENT FOR MULTIDATABASE SYSTEMS .... 29
1.4 RESEARCH MOTIVATION AND OBJECTIVE .... 32
1.5 ORGANIZATION OF THE DISSERTATION .... 35
2. LITERATURE REVIEW AND RESEARCH FORMULATION .... 37
2.1 LITERATURE REVIEW ON MODEL AND SCHEMA TRANSLATION .... 37
2.1.1 FORMULATION OF MODEL AND SCHEMA TRANSLATION PROBLEMS .... 37
2.1.2 APPROACHES TO MODEL TRANSLATION .... 39
2.2 LITERATURE REVIEW ON SCHEMA INTEGRATION .... 44
2.2.1 FORMULATION OF SCHEMA INTEGRATION PROBLEM .... 44
2.2.2 TAXONOMY OF CONFLICTS .... 46
2.2.3 ANALYSIS OF CONFLICT TYPES AND SCHEMA INTEGRATION PROCESS .... 47
2.3 RESEARCH FOUNDATION .... 49
2.3.1 SEMANTICS CHARACTERISTICS OF DATA MODELS .... 50
2.3.2 IMPLICATION TO SCHEMA NORMALIZATION .... 53
2.3.3 IMPLICATION TO MODEL AND SCHEMA TRANSLATION .... 55
2.3.4 SUMMARY OF RESEARCH FOUNDATION .... 61
2.4 RESEARCH FRAMEWORK AND TASKS .... 63
2.5 RESEARCH QUESTIONS .... 65
3. METAMODEL DEVELOPMENT .... 68
3.1 MODELING PARADIGM AND BEYOND .... 68
3.1.1 MODELING PARADIGM .... 68
3.1.2 COMPONENTS OF A MODEL .... 69
3.1.3 METAMODELING PARADIGM .... 71
3.1.4 REQUIREMENTS OF METAMODEL .... 74
3.1.5 EXTENDED METAMODELING PARADIGM .... 74
3.2 METAMODEL: SYNTHESIZED OBJECT-ORIENTED ENTITY-RELATIONSHIP MODEL .... 77
3.2.1 CONSTRUCTS OF THE SOOER MODEL .... 79
3.2.2 CONSTRAINT SPECIFICATION IN THE SOOER MODEL .... 84
3.3 METAMODEL SCHEMA AND MODEL DEFINITION LANGUAGE .... 90
3.3.1 METAMODEL CONSTRUCTS IN METAMODEL SCHEMA .... 90
3.3.2 EXPLICIT METAMODEL CONSTRAINTS IN METAMODEL SCHEMA .... 93
3.3.3 METAMODEL SEMANTICS IN METAMODEL SCHEMA .... 95
3.3.4 IMPLICIT METAMODEL CONSTRAINTS IN METAMODEL SCHEMA .... 100
3.3.5 MODEL DEFINITION LANGUAGE .... 102
3.4 METAMODELING PROCESS AND MODEL SCHEMA .... 104
4. INDUCTIVE METAMODELING .... 105
4.1 OVERVIEW .... 105
4.1.1 CHARACTERISTICS OF INDUCTIVE METAMODELING PROBLEM .... 105
4.1.2 COMPARISONS WITH EXISTING INDUCTIVE LEARNING TECHNIQUES .... 109
4.2 ABSTRACTION INDUCTION TECHNIQUE FOR INDUCTIVE METAMODELING .... 113
4.2.1 CONCEPT DECOMPOSITION PHASE .... 114
4.2.1.1 CONCEPT HIERARCHY CREATION .... 118
4.2.1.2 CONCEPT HIERARCHY ENHANCEMENT .... 123
4.2.1.3 CONCEPT GRAPH MERGING .... 129
4.2.1.4 CONCEPT GRAPH PRUNING .... 133
4.2.2 CONCEPT GENERALIZATION PHASE .... 135
4.2.2.1 GENERALIZATION OF PROPERTY NODES .... 138
4.2.2.2 GENERALIZATION OF LEAF CONCEPT NODES .... 143
4.2.2.3 GENERALIZATION OF NON-LEAF CONCEPT NODES AND IMMEDIATE DOWNWARD HAS-A LINKS .... 145
4.2.2.4 GENERALIZATION OF REFER-TO LINKS .... 158
4.2.3 CONSTRAINT GENERATION PHASE .... 165
4.3 TIME COMPLEXITY ANALYSIS OF ABSTRACTION INDUCTION TECHNIQUE .... 171
4.4 EVALUATION OF ABSTRACTION INDUCTION TECHNIQUE .... 173
5. CONSTRUCT EQUIVALENCE ASSERTION LANGUAGE .... 183
5.1 DESIGN PRINCIPLES FOR A CONSTRUCT EQUIVALENCE REPRESENTATION .... 183
5.2 DEVELOPMENT OF CONSTRUCT EQUIVALENCE ASSERTION LANGUAGE .... 186
5.2.1 HIGH-LEVEL SYNTAX STRUCTURE OF CONSTRUCT EQUIVALENCE ASSERTION LANGUAGE .... 187
5.2.2 DEFINITION OF CONSTRUCT-SET .... 188
5.2.3 DEFINITION OF CONSTRUCT-CORRESPONDENCE .... 199
5.2.4 DEFINITION OF ANCILLARY-DESCRIPTION .... 204
5.2.5 EXECUTION SEMANTICS OF CONSTRUCT EQUIVALENCES .... 207
5.3 INFORMATION PRESERVING CONSTRUCT EQUIVALENCE TRANSFORMATION FUNCTION .... 212
5.4 EVALUATION OF CONSTRUCT EQUIVALENCE ASSERTION LANGUAGE AGAINST DESIGN PRINCIPLES .... 228
6. CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION AND SCHEMA NORMALIZATION .... 232
6.1 CONSTRUCT EQUIVALENCE TRANSFORMATION METHOD .... 232
6.2 CONSTRUCT EQUIVALENCE REASONING METHOD .... 243
6.3 CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION .... 251
6.3.1 ALGORITHM: CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION .... 251
6.3.2 ADVANTAGES OF CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION .... 252
6.4 CONSTRUCT-EQUIVALENCE-BASED SCHEMA NORMALIZATION .... 255
7. CONTRIBUTIONS AND FUTURE RESEARCH .... 257
7.1 CONTRIBUTIONS .... 257
7.1.1 CONTRIBUTIONS TO MDBS RESEARCH .... 258
7.1.2 CONTRIBUTIONS TO OTHER RESEARCH AREAS .... 261
7.2 FUTURE RESEARCH DIRECTIONS .... 261
APPENDICES .... 265
A. RELATIONSHIPS BETWEEN SYNTHESIZED TAXONOMY OF CONFLICTS AND OTHER TAXONOMIES .... 265
B. MODEL SCHEMA OF A RELATIONAL DATA MODEL .... 266
C. COMMON SEPARATORS AND THEIR IMPLICATIONS TO CONCEPT HIERARCHY CREATION IN ABSTRACTION INDUCTION TECHNIQUE .... 269
D. EVALUATION STUDY 1: RELATIONAL MODEL SCHEMA INDUCED FROM A UNIVERSITY HEALTH CENTER DATABASE .... 270
E. EVALUATION STUDY 2: NETWORK MODEL SCHEMA INDUCED FROM A HYPOTHETICAL COMPANY DATABASE .... 275
F. EVALUATION STUDY 3: HIERARCHICAL MODEL SCHEMA INDUCED FROM A HYPOTHETICAL COMPANY DATABASE .... 280
G. INTER-MODEL CONSTRUCT EQUIVALENCES BETWEEN THE EER AND SOOER MODELS .... 286
H. INTRA-MODEL CONSTRUCT EQUIVALENCES OF THE SOOER MODEL .... 291
REFERENCES .... 295
LIST OF FIGURES
FIGURE 1.1: ENVIRONMENT OF MULTIDATABASE SYSTEM .... 17
FIGURE 1.2: REFERENCE SCHEMA ARCHITECTURE OF AN MDBS .... 24
FIGURE 1.3: SCHEMA MANAGEMENT ISSUES ON THE FIVE-LEVEL SCHEMA ARCHITECTURE .... 30
FIGURE 2.1: SCHEMA INTEGRATION PROCESS .... 45
FIGURE 2.2: PROPOSED SCHEMA INTEGRATION PROCESS .... 49
FIGURE 2.3: NON-ORTHOGONAL MODEL CONSTRUCTS OF DATA MODEL .... 51
FIGURE 2.4: OVERLAPPED SEMANTIC SPACES OF DATA MODEL T AND S .... 52
FIGURE 2.5: CONCEPTUAL FLOW OF CONSTRUCT-EQUIVALENCE-BASED SCHEMA NORMALIZATION .... 54
FIGURE 2.6: SOURCE OF SEMANTIC LOSS AND DOMAIN FOR SEMANTIC ENHANCEMENT .... 56
FIGURE 2.7: EXAMPLE OF INTRA-MODEL AND INTER-MODEL CONSTRUCT EQUIVALENCES .... 59
FIGURE 2.8: CONCEPTUAL FLOW OF SCHEMA TRANSLATION BASED ON INTRA-MODEL AND INTER-MODEL CONSTRUCT EQUIVALENCES .... 60
FIGURE 2.9: ARCHITECTURAL FRAMEWORK FOR CONSTRUCT-EQUIVALENCE-BASED METHODOLOGY FOR SCHEMA TRANSLATION AND SCHEMA NORMALIZATION .... 65
FIGURE 3.1: MODELING PARADIGM .... 69
FIGURE 3.2: METAMODELING PARADIGM .... 72
FIGURE 3.3: SCHEMA HIERARCHY IN METAMODELING PARADIGM .... 73
FIGURE 3.4: EXTENDED METAMODELING PARADIGM .... 75
FIGURE 3.5: SCHEMA HIERARCHY IN EXTENDED METAMODELING PARADIGM .... 77
FIGURE 3.6: GRAPHICAL NOTATIONS OF THE SOOER MODEL CONSTRUCTS .... 80
FIGURE 3.7: METAMODEL CONSTRUCTS IN METAMODEL SCHEMA .... 91
FIGURE 3.8: MODEL DEFINITION LANGUAGE BASED ON THE SOOER METAMODEL .... 103
FIGURE 3.9: MODEL SCHEMA OF RELATIONAL DATA MODEL .... 104
FIGURE 4.1: EXAMPLE OF CONCEPT GRAPH .... 117
FIGURE 4.2: STEPS OF CONCEPT DECOMPOSITION PHASE .... 118
FIGURE 4.3: CONCEPT HIERARCHIES (EXAMPLE 1, 2 AND 3) AFTER CONCEPT HIERARCHY CREATION .... 123
FIGURE 4.4: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AFTER CONCEPT HIERARCHY ENHANCEMENT .... 128
FIGURE 4.5: CONCEPT GRAPHS (EXAMPLE 1, 2 AND 3) AFTER CONCEPT HIERARCHY ENHANCEMENT .... 129
FIGURE 4.6: CONCEPT GRAPHS (EXAMPLE 1.1 AND 1.2) BEFORE CONCEPT GRAPH MERGING .... 130
FIGURE 4.7: CONCEPT GRAPH (EXAMPLE 1.1 WITH 1.2) AFTER CONCEPT GRAPH MERGING .... 131
FIGURE 4.8: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AFTER CONCEPT GRAPH MERGING .... 133
FIGURE 4.9: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AFTER CONCEPT GRAPH PRUNING .... 135
FIGURE 4.10: HIGH-LEVEL VIEW OF CONCEPT GENERALIZATION PHASE .... 137
FIGURE 4.11: EXAMPLE OF PROPERTY GENERALIZATION HIERARCHY .... 139
FIGURE 4.12: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AFTER GENERALIZATION OF PROPERTY NODES .... 143
FIGURE 4.13: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AND MODEL SCHEMA AFTER GENERALIZATION OF LEAF CONCEPT NODES .... 145
FIGURE 4.14: EXAMPLE OF SETS OF SIMILAR NON-LEAF CONCEPT NODES .... 148
FIGURE 4.15: SIMILARITY GRAPH FOR ALL NON-LEAF CONCEPT NODES OF EXAMPLE 1 AND 2 .... 155
FIGURE 4.16: CONCEPT GRAPHS (EXAMPLE 1 AND 2) AND MODEL SCHEMA AFTER GENERALIZATION OF NON-LEAF CONCEPT NODES AND HAS-A LINKS .... 158
FIGURE 4.17: MODEL SCHEMA AFTER GENERALIZATION OF REFER-TO LINKS .... 164
FIGURE 4.18: RELATIONAL MODEL SCHEMA INDUCED FROM UNIVERSITY HEALTH CENTER DATABASE .... 174
FIGURE 4.19: REFERENCE NETWORK MODEL SCHEMA .... 178
FIGURE 4.20: NETWORK MODEL SCHEMA INDUCED FROM HYPOTHETICAL COMPANY DATABASE .... 178
FIGURE 4.21: REFERENCE HIERARCHICAL MODEL SCHEMA .... 180
FIGURE 4.22: HIERARCHICAL MODEL SCHEMA INDUCED FROM HYPOTHETICAL COMPANY DATABASE .... 180
FIGURE 5.1: PROCESS OF TRANSFORMING CONSTRUCT EQUIVALENCE OF EXAMPLE 5 .... 216
FIGURE 5.2: CONSTRUCT EQUIVALENCE TRANSFORMED FROM EXAMPLE 5 .... 218
FIGURE 5.3: PROCESS OF TRANSFORMING CONSTRUCT EQUIVALENCE OF EXAMPLE 6 .... 220
FIGURE 5.4: CONSTRUCT EQUIVALENCE TRANSFORMED FROM EXAMPLE 6 .... 221
FIGURE 5.5: CONSTRUCT EQUIVALENCE TRANSFORMED FROM EXAMPLE 5 .... 226
FIGURE 5.6: CONSTRUCT EQUIVALENCE TRANSFORMED FROM EXAMPLE 5 .... 228
LIST OF TABLES
TABLE 1.1: CHALLENGES TO RESEARCH ISSUES OF MDBS .... 27
TABLE 1.2: REQUIRED SUPPORT FOR DIFFERENT TYPES OF SCHEMA EVOLUTION .... 32
TABLE 3.1: FUNCTIONS FOR FUNCTION EXPRESSION IN SOOER CONSTRAINT SPECIFICATION .... 89
TABLE 3.2: EXPLICIT METAMODEL CONSTRAINTS IN METAMODEL SCHEMA .... 94
TABLE 3.3: METAMODEL SEMANTICS IN METAMODEL SCHEMA .... 97
TABLE 4.1: SUMMARY OF CHARACTERISTICS OF INDUCTIVE METAMODELING PROCESS .... 106
TABLE 4.2: SUMMARY OF EXISTING INDUCTIVE LEARNING TECHNIQUES .... 112
TABLE 4.3: MODEL CONSTRAINTS (EXAMPLE 1 AND 2) AFTER CONSTRAINT GENERATION .... 171
TABLE 4.4: TIME COMPLEXITY OF EACH STEP IN ABSTRACTION INDUCTION TECHNIQUE .... 172
TABLE 4.5: SUMMARY OF EVALUATION STUDY 1 (RELATIONAL MODEL SCHEMA) .... 176
TABLE 4.6: PRECISION RATES OF ABSTRACTION INDUCTION TECHNIQUE IN EVALUATION STUDY 1 .... 177
TABLE 4.7: SUMMARY OF EVALUATION STUDY 2 (NETWORK MODEL SCHEMA) .... 179
TABLE 4.8: SUMMARY OF EVALUATION STUDY 3 (HIERARCHICAL MODEL SCHEMA) .... 181
TABLE 4.9: SUMMARY OF THREE EVALUATION STUDIES .... 182
TABLE 5.1: SOURCES FOR ANCILLARY-DESCRIPTION WHEN EXCHANGING CONSTRUCT-SETS .... 205
TABLE 5.2: SUMMARY OF EXECUTION SEMANTICS OF CONSTRUCT EQUIVALENCE .... 212
TABLE 5.3: RESTRUCTURING OPERATIONS FOR ANCILLARY CLAUSE AND CONSTRUCT-CORRESPONDENCE .... 214
TABLE 5.4: RESTRUCTURING OPERATIONS FOR COMPLEX CONSTRUCT-DOMAINS AND SELECTION CLAUSES ON LHS CONSTRUCT-INSTANCES (OR CONSTRUCT-INSTANCE-SETS) .... 215
TABLE 5.5: INVERSIBILITY OF RESTRUCTURING OPERATIONS OF THE CONSTRUCT EQUIVALENCE TRANSFORMATION FUNCTION .... 223
TABLE 5.6: EVALUATION OF CONSTRUCT EQUIVALENCE ASSERTION LANGUAGE .... 231
LIST OF ALGORITHMS
ALGORITHM 3.1: ALGORITHM OF METAMODEL SEMANTICS INSTANTIATION .... 101
ALGORITHM 4.1: GENERALIZATION OF NON-LEAF CONCEPT NODES AND DOWNWARD HAS-A LINKS .... 150
ALGORITHM 4.2: CONSTRAINT GENERALIZATION PROCESS .... 168
ALGORITHM 6.1: INTER-MODEL CONSTRUCT EQUIVALENCE TRANSFORMATION .... 233
ALGORITHM 6.2: INTRA-MODEL CONSTRUCT EQUIVALENCE TRANSFORMATION FOR SOURCE DATA MODEL .... 236
ALGORITHM 6.3: INTRA-MODEL CONSTRUCT EQUIVALENCE TRANSFORMATION FOR TARGET DATA MODEL .... 240
ALGORITHM 6.4: INTRA-MODEL CONSTRUCT EQUIVALENCE TRANSFORMATION FOR SCHEMA NORMALIZATION .... 242
ALGORITHM 6.5: INTRA-MODEL CONSTRUCT EQUIVALENCE REASONING METHOD .... 247
ALGORITHM 6.6: CONFLICT RESOLUTION .... 248
ALGORITHM 6.7: INTER-MODEL CONSTRUCT EQUIVALENCE REASONING METHOD .... 250
ALGORITHM 6.8: ALGORITHM FOR CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION .... 252
ALGORITHM 6.9: ALGORITHM FOR CONSTRUCT-EQUIVALENCE-BASED SCHEMA TRANSLATION .... 256
ABSTRACT
Advances in networking and database technologies have made the concept of global
information sharing possible. A rapidly growing number of applications require access
to and manipulation of the data residing in multiple pre-existing database systems, which
are usually autonomous and heterogeneous. A promising approach to the problems of
interoperating multiple heterogeneous database systems is the construction of
multidatabase systems.
Among all of the research issues concerning multidatabase systems, schema management,
which involves the management of various schemas at different levels in a dynamic
environment, has been largely overlooked in previous research. Two of the most important
research issues in schema management have been identified: schema translation and schema
integration. The need for a declarative and extensible approach to schema translation and
for support for schema integration is accentuated in a large-scale environment.
This dissertation presents a construct-equivalence-based methodology for schema
translation and schema integration, based on the implications of the semantics
characteristics of data models. The research was undertaken for the purposes of
1) overcoming the methodological inadequacies of existing schema translation approaches
and the conventional schema integration process for large-scale MDBSs, 2) providing an
integrated methodology for schema translation and schema normalization, whose similarity
of problem formulation had not been previously recognized, and 3) inductively learning
model schemas that provide a basis for declaratively specifying construct equivalences for
schema translation and schema normalization. The methodology is based on a metamodel
(the Synthesized Object-Oriented Entity-Relationship (SOOER) model), an inductive
metamodeling approach (the Abstraction Induction Technique), a declarative construct
equivalence representation (the Construct Equivalence Assertion Language, CEAL), and its
associated transformation and reasoning methods. The results of evaluation studies showed
that the Abstraction Induction Technique inductively learned satisfactory model schemas.
CEAL's expressiveness and adequacy in meeting its design principles, the well-defined
construct equivalence transformation and reasoning methods, and the advantages realized by
construct-equivalence-based schema translation and schema normalization suggest that the
construct-equivalence-based methodology is a promising approach for large-scale MDBSs.
CHAPTER 1
Introduction
1.1 Background
For historical reasons, and because different applications can be better supported by
different types of database management systems (DBMSs) that employ different data
models with different query languages, a variety of heterogeneous and autonomous local
database systems (LDBSs) are in use in organizations today [MY95]. Each
LDBS manages data for its own applications (called local applications) autonomously and
is usually not accessible to applications that do not pertain to the LDBS. As
organizations and users become more sophisticated, they increasingly demand both
access to and the ability to manipulate data in multiple pre-existing heterogeneous and
autonomous LDBSs without loss of local autonomy [BHP92]. This demand for global
information sharing arises in two contexts: complementary databases and competing
databases [FN93, Z94, ZSC95]. In the context of complementary databases within a
single organization or multiple organizations, a LDBS covering a subarea of a domain of
interest is a complement of other LDBSs, each of which captures only certain aspects of
the domain. Information concerning the same reality in the domain is therefore scattered
over different LDBSs. Examples of global information sharing in the complementary
context include computer-integrated manufacturing (CIM) [USR93, WD92, DW91],
health care [LWH94a, LWH94b], and office information environments [HM85].
Without the capability to access and manipulate data in all these complementary databases,
complete information about the same reality is hard to obtain and global data consistency
among LDBSs is difficult to achieve. On the other hand, in the context of competing
databases, LDBSs are similar in content but serve competing business interests.
Electronic commerce provides a typical example of global information sharing in the
competing context [MYB87, BW95]. Competing databases once were considered
proprietary by their owning organizations. Now, however, organizations are increasingly
willing to grant external users (e.g., customers and suppliers) access to parts of their
databases in order to maintain competitive advantages and broaden promotion channels.
Consequently, global information sharing on competing databases provides a global
information retrieval platform on which users can easily search for information in
different databases or seek services provided by different organizations.
Applications for global information sharing (called global applications) are required to
access and manipulate data from multiple pre-existing heterogeneous and autonomous
database systems. Without any global support or coordination, the development of such
global applications requires global application developers to have comprehensive
knowledge of the schemas managed by LDBSs, to be able to resolve any schema or
system heterogeneity, and to be able to use various database systems with different data
models and query languages. The performance and effectiveness (e.g., quality of
query results and global data consistency) of global applications therefore depend
heavily on the extensive knowledge of global application developers. To facilitate the
development of global applications, it has been suggested that the most viable and
general solution is a multidatabase system (MDBS) [BHP92, K95]. An MDBS is a
database system that resides unobtrusively on top of existing LDBSs and presents a
single database illusion to global applications [K95], as shown in Figure 1.1. Its
unobtrusiveness means that an MDBS should not require any change to an existing
LDBS or to any existing local applications. Moreover, an MDBS should not limit global
applications only to retrieval operations. Both retrievals and updates on LDBSs from
global applications should be supported by an MDBS.
Figure 1.1: Environment of Multidatabase System
As shown in Figure 1.1, a global transaction in an MDBS environment is a transaction
executed under the MDBS's control and is decomposed by the MDBS into a set of
subtransactions, each of which is an ordinary local transaction from the point of view of
the LDBS where the subtransaction is executed. Afterward, the execution results of the
subtransactions are returned to and consolidated in the MDBS, which in turn returns the
consolidated result to the requesting global application. At the same time, local
applications are still preserved. Outside the control of the MDBS, local applications
submit to their LDBSs local transactions, which have access only to a single LDBS.
Execution results of local transactions are returned to the requesting local applications
directly by the LDBSs.
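To make the flow just described concrete, the following minimal sketch (in Python, with entirely hypothetical class, data and site names; it is an illustration, not an MDBS implementation) shows a global transaction being decomposed into one subtransaction per LDBS and the subtransaction results being consolidated before they are returned to the global application.

```python
# A minimal sketch (assumed names, not an actual MDBS) of decomposing a global
# transaction into per-LDBS subtransactions and consolidating their results,
# mirroring the flow described around Figure 1.1.
from dataclasses import dataclass, field


@dataclass
class LDBS:
    """A participating local database system; it only sees ordinary local transactions."""
    name: str
    data: dict = field(default_factory=dict)

    def execute(self, subtransaction):
        # From the LDBS's point of view this is just another local transaction.
        return [self.data.get(key) for key in subtransaction["keys"]]


class MDBS:
    """Sits on top of the LDBSs and presents a single-database illusion."""

    def __init__(self, ldbss):
        self.ldbss = {ldbs.name: ldbs for ldbs in ldbss}

    def decompose(self, global_transaction):
        # Split the global transaction into one subtransaction per target LDBS.
        return {site: {"keys": keys} for site, keys in global_transaction.items()}

    def run(self, global_transaction):
        # Dispatch each subtransaction to its LDBS and consolidate the results.
        results = {}
        for site, subtxn in self.decompose(global_transaction).items():
            results[site] = self.ldbss[site].execute(subtxn)
        return results  # consolidated result returned to the global application


# Usage: a global transaction touching two autonomous LDBSs.
hospital = LDBS("hospital", {"patient:42": "Lee"})
billing = LDBS("billing", {"patient:42": 120.0})
mdbs = MDBS([hospital, billing])
print(mdbs.run({"hospital": ["patient:42"], "billing": ["patient:42"]}))
```

From each LDBS's point of view, the call it receives is indistinguishable from an ordinary local transaction, which is exactly the unobtrusiveness property discussed above.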
1.2 Overview of Multidatabase Systems
1.2.1 Characteristics of Multidatabase Systems
The first and most substantial characteristic of an MDBS is the autonomy of LDBSs. A
classification of the types of autonomy in an MDBS, proposed by Veijalainen and
Popescu-Zeletin [VP88] and later extended by Sheth and Larson [SL90], is summarized as follows.
1. Design autonomy: refers to the ability of a LDBS to choose its own design with
respect to any matter, including 1) the data being managed, 2) the representation (data
model, query language) and naming of data elements, 3) the conceptualization or
semantic interpretation of data, 4) data constraints, and 5) the functionality and
implementation (e.g., access methods, transaction management model, concurrency
control and recovery method) of the local DBMS.
2. Communication autonomy: refers to the ability of a LDBS to decide whether to
communicate with other LDBSs. A LDBS with communication autonomy is able to
decide when and how it responds to a request from another LDBS.
3. Execution autonomy: refers to the ability of a LDBS to execute local transactions
without interference from external transactions (i.e., subtransactions) submitted by an
MDBS and to decide the order in which to execute external transactions. Thus, an
MDBS cannot enforce an execution order on a LDBS with execution autonomy.
Execution autonomy implies that a LDBS can abort any transaction (local or
external) that does not meet its local constraints and that its local transactions are
logically unaffected by its participation in an MDBS. Furthermore, the LDBS need
not inform an MDBS of the order in which external transactions are executed or of the
order of external transactions with respect to local transactions.
4. Association autonomy: implies that a LDBS has the ability to decide whether and
how much to share its resources and functionality with others. This includes the
ability of a LDBS to associate with or disassociate from an MDBS and the ability of
a LDBS to participate in more than one MDBS with different degrees of sharing.
As a result of the design autonomy of LDBSs, the second characteristic of an MDBS is
heterogeneity among participating LDBSs. Heterogeneities may occur at different
levels:
1. Syntactic heterogeneity: is primarily caused by the fact that LDBSs may not be
homogeneous with respect to their data models and query languages.
2. Semantic heterogeneity: refers to the same reality being represented differently in
different LDBSs. This results partially from multiple equivalent representations of
the same reality expressed in the same data model and partially from the different
perspectives of LDBSs' designers and from conflicts in application semantics [BLN86,
BCN92, SP91, KK92].
3. System heterogeneity: is primarily caused by the fact that LDBSs may not be
homogeneous with respect to their functionality and implementation. For instance, a
LDBS may not be a first-class DBMS and may therefore lack important DBMS features
[LOG92]. Different LDBSs may support different access methods in query
execution. Furthermore, LDBSs may have different concurrency control and
recovery methods, and may use different transaction management models.
4. Operating environment heterogeneity: LDBSs may reside on computer systems with
different architectures and different computational speeds. Moreover, LDBSs may be
connected via different types of networks supporting different communication protocols.
1.2.2 Objectives of Multidatabase Systems
The characteristics of local autonomy and heterogeneity of LDBSs in an MDBS
environment present several development requirements. In particular, an MDBS must
provide an environment that ensures local autonomy preservation, transparency, and
multi-database consistency assurance. Performance and extensibility requirements are
also essential to the development of an MDBS.
1. Local autonomy preservation: preserving the four types of autonomy of LDBSs. The
MDBS must require absolutely no change to LDBSs (including data and DBMSs)
or to existing local applications. In other words, the MDBS must appear to any
participating LDBS as just another local application and must introduce virtually no
change in the administration of any participating LDBS. Furthermore, the MDBS
cannot enforce an association or disassociation decision on a LDBS.
2. Transparency provision: An MDBS needs to hide the above-mentioned
heterogeneities from global applications. Four types of transparency need to be
provided:
a) Syntactic heterogeneity transparency: relieves global applications of the need
to know the different data models and query languages employed by LDBSs. Two
levels of syntactic heterogeneity transparency can be defined. At a minimum,
global applications should be able to access LDBSs in the data model and query
language supported by the MDBS (called the common data model and common
query language). To support this essential requirement, the schemas of LDBSs need
to be translated into equivalent schemas in the common data model. Second, a very
appealing feature is for global applications to be able to work with the
MDBS using a data model and query language other than the common data model
and query language; that is, the MDBS needs to present to global applications
several interfaces in different data models, accessible through different query
languages.
b) Semantic and distribution heterogeneity transparency: Global applications are
shielded by the MDBS from semantic heterogeneities among LDBSs. Moreover,
the MDBS needs to prevent global applications from having to know the locations of
information, whether or not information is replicated, and how information is
dispersed over and can be combined from different LDBSs. To achieve this type of
transparency, an integrated schema (or integrated schemas) representing the
integration of LDBSs' schemas needs to be maintained at the MDBS level. Thus,
by issuing global transactions through the integrated schema of the MDBS,
global applications can access or manipulate the set of participating LDBSs as if
they were a centralized database system.
c) System transparency: Global applications need not be aware of functional and
implementational differences of LDBSs.
d) Operating environment transparency: The MDBS must shield global applications
from the heterogeneous operating environments of LDBSs.
3. Multi-database consistency assurance: ensuring the correct and reliable execution of
global retrieval and update transactions in the presence of local transactions. Both
transactional integrity, which refers to the consistency of all local databases in the
presence of concurrent global and local access, and semantic integrity, which refers
to the consistency of all local databases with respect to integrity constraints, need to
be ensured by the MDBS.
However, maintaining these types of integrity in an
MDBS environment is significantly more difficult than in a homogeneous distributed
database environment.
4. Performance comparable to that of homogeneous distributed database systems:
Hiding heterogeneities and preserving local autonomy of LDBSs considerably
increase the performance overhead of the MDBS. To be practical, the performance
of the MDBS should be at least comparable to that of a homogeneous distributed
database system. This is a difficult yet essential requirement for the MDBS.
5. High extensibility: The MDBS needs to be highly extensible with respect to its
ability to accommodate the new association of a LDBS whose data model, query
language, transaction management model and concurrency control method are
possibly unknown to the MDBS, the disassociation of a participating LDBS, and the
evolution of participating LDBSs. In other words, an MDBS needs to adapt to any
of these changes with minimal effort.
1.2.3 Schema Architecture of Multidatabase Systems
The characteristics of design and association autonomy, heterogeneity and distribution of
an MDBS make the three-level schema architecture, which is appropriate for describing
the architecture of a centralized DBMS, inadequate for describing the architecture of an
MDBS. Several reference schema architectures have been proposed to support these
characteristics [B87, TBC87]. Most of them are similar to the five-level schema
architecture described in [SL90]. The reference five-level schema architecture, as shown
in Figure 1.2, consists of a local schema level, a component schema level, an export
schema level, an integrated schema level, and a global external schema level. Moreover,
to support the coexistence of global and local users in an MDBS environment, each
LDBS usually will have a set of local external schemas defined on top of its local
schema. Since local external schemas are mainly for local users, they are not included in
the schema architecture of an MDBS.
Figure 1.2: Reference Schema Architecture of an MDBS
1. Local schema: A local schema is the conceptual schema of a LDBS. A local schema is
expressed in the local data model of the local DBMS, hence different local schemas
may be expressed in different data models.
2. Component schema: A component schema is derived by translating a local schema into the
common data model of the MDBS. Two reasons for defining component schemas are
(1) they describe the divergent local schemas using a single representation and (2)
semantics that are missing in a local schema can be added to its component schema.
The use of component schemas supports the design autonomy and heterogeneity
features of an MDBS.
3. Export schema: Not all data of a LDBS may be available to an MDBS and its users.
An export schema represents a subset of a component schema that is available to the
MDBS. The purpose of defining export schemas is to facilitate control and
management of association autonomy. The representation in the export schema level is
the common data model of the MDBS.
4. Integrated schema: An integrated schema (sometimes called a global schema or a
federated schema) is an integration of multiple export schemas. An integrated schema
also includes the information on data distribution that is generated when export
schemas are integrated. The integrated schema level supports the distribution and
semantic heterogeneity feature of an MDBS. The representation in the integrated
schema level is the common data model of the MDBS. There may be multiple
integrated schemas in an MDBS, one for each class of federation users performing a
related set of activities.
5. Global external schema: A global external schema defines a schema for a global
user/application or a class of global users/applications. The representation in the global
external schema level (called the external data model) can be the common data model
of the MDBS or any data model preferred by global users/applications, as described in
the objective of syntactic heterogeneity transparency provision.
The schema architecture shown in Figure 1.2 allows only one export schema for each
LDBS and one integrated schema for an MDBS. However, this is only a reference
model. Depending on the assumptions and requirements of the MDBS being designed,
other schema architectures that degenerate or extend from this reference model can be
employed. For example, if all data defined on the local schema of each LDBS fully
participate in the MDBS, the export schema level can be excluded from the schema
architecture; thus, the reference five-level schema architecture degenerates into a
four-level schema architecture. On the other hand, the reference schema architecture
appears too restricted when LDBSs require the flexibility to decide not only
whether to participate in the MDBS but also which federations to join and what data to
share in each federation; in that case, the integrated and export schema levels of the
reference schema architecture need to be extended. In this extended schema architecture
(for an MDBS with multiple federations), several export schemas can be defined on the
component schema of a LDBS. Each of the export schemas can be associated with one
or more federations. There exist multiple integrated schemas in the MDBS, each of
which represents the conceptual view of a specific federation of users/applications. An
integrated schema can be constructed from a set of export schemas, a set of integrated
schemas, or a combination of them.
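As a concrete reading of the reference architecture and its variants, the sketch below models the five schema levels as plain data structures; the names and the two-LDBS example are assumptions made purely for illustration.

```python
# A minimal sketch of the five-level schema architecture (hypothetical names).
# Each level keeps a reference to the level(s) it is derived from, mirroring
# local -> component -> export -> integrated -> global external.
from dataclasses import dataclass, field
from typing import List


@dataclass
class LocalSchema:           # conceptual schema of an LDBS, in its local data model
    ldbs: str
    data_model: str


@dataclass
class ComponentSchema:       # local schema translated into the common data model
    source: LocalSchema
    data_model: str = "common"


@dataclass
class ExportSchema:          # the subset of a component schema made available
    component: ComponentSchema
    exported_constructs: List[str] = field(default_factory=list)


@dataclass
class IntegratedSchema:      # integration of several export schemas
    exports: List[ExportSchema]
    data_model: str = "common"


@dataclass
class GlobalExternalSchema:  # view for a class of global users/applications
    integrated: IntegratedSchema
    external_data_model: str = "common"


# Example: two LDBSs (one relational, one hierarchical) participating in an MDBS.
locals_ = [LocalSchema("hospital", "relational"), LocalSchema("billing", "hierarchical")]
components = [ComponentSchema(ls) for ls in locals_]
exports = [ExportSchema(cs, exported_constructs=["Patient"]) for cs in components]
integrated = IntegratedSchema(exports)
global_view = GlobalExternalSchema(integrated, external_data_model="relational")
```

Allowing several export schemas per component schema and several integrated schemas per MDBS would correspond to the multi-federation extension described above.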
1.2.4 Research Issues in Multidatabase Systems
Based on the five-level
reference schema architecture discussed above, an essential
research issue in developing an MDBS is the development or selection of a common data
model and common query language. In addition, the characteristics and objectives of an
MDBS render several challenges to the management of and the operations in an MDBS.
MDBS management concerns the development and evolution processes of an MDBS,
while the MDBS operation involves issues related to global transactions and query
execution. Table 1.1 illustrates important research issues and their challenges in each of
the categories outlined below.
Schema Management (MDBS Management):
  Autonomy: design and association autonomy
  Heterogeneity: syntactic and semantic heterogeneity
  Objectives of MDBS: autonomy preservation, transparencies provision, and high extensibility

Negotiation (MDBS Management):
  Autonomy: association autonomy
  Objectives of MDBS: autonomy preservation

Global Transaction Management (MDBS Operation):
  Autonomy: design and execution autonomy
  Heterogeneity: system heterogeneity
  Objectives of MDBS: autonomy preservation, transparencies provision, multi-database consistency assurance, performance, and high extensibility

Multidatabase Query Optimization and Processing (MDBS Operation):
  Autonomy: communication autonomy
  Heterogeneity: system and operating environment heterogeneity
  Objectives of MDBS: autonomy preservation, transparencies provision, and performance
Table 1.1: Challenges to Research Issues of MDBS
MDBS Management:
1. Schema management: In terms of the reference schema architecture shown in Figure
1.2, schema management refers to constructing and evolving schemas in the
component, export, integrated and global external schema levels during the MDBS
development and evolution processes. Schema management involves such issues as
schema translation, schema integration, schema definition, and schema evolution
support. A detailed discussion of each of these schema management issues is deferred
to the next section.
2. Negotiation: Negotiation, which occurs between the MDBS and LDBS database
administrators, involves protocols for governing the specification of export schemas
and the declaration of authorizations granted to global users for accessing export
schemas [SL90, AB89]. The negotiation protocols need to facilitate the preservation of
the association autonomy of LDBSs with the goal of satisfying the global users'
requirements and minimizing authorization conflicts.
MDBS Operation:
1. Global transaction management: This is responsible for maintaining database
consistency in the presence of local transactions while allowing concurrent global
updates across multiple LDBSs. Global transaction management is much more
complicated than distributed transaction management for homogeneous distributed
database systems. The complication is introduced by 1) the possibly different
transaction management models and concurrency control methods employed by
LDBSs, 2) execution autonomy, which prevents the MDBS from enforcing transaction
execution orders on LDBSs and from learning the serialization orders of transactions
from LDBSs, 3) the coexistence of global and local transactions, which causes indirect
conflicts of which the MDBS is unaware, and 4) performance requirements [GRS94,
MRB92].
2. Multidatabase query optimization and processing: This is responsible for decomposing a
global query into subqueries, generating and selecting the most efficient execution
strategy among all plausible execution strategies, and translating each subquery into
an equivalent query in the local query language understood by its executing LDBS.
Because of the unavailability of the statistics required for query optimization (a
constraint imposed by communication autonomy), system heterogeneity (e.g., different
access methods supported by different LDBSs, with the resulting performance
differences) and operating environment heterogeneity (different computational speeds
of LDBSs and different speeds of the networks connecting them), additional concerns
need to be taken into account when developing the multidatabase query optimizer and
processor for an MDBS [SL90, DKS92, LOG92, MY95].
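A toy sketch of these steps follows; the fragment format, the per-site translators, and the cost figures are all assumptions standing in for a real multidatabase query optimizer (which, as noted above, may not even have reliable statistics to work with).

```python
# A toy sketch of multidatabase query decomposition and processing (hypothetical
# names and cost model; not the optimizer of any system cited above).

# A "global query" here is just a list of fragments, each naming the LDBS that
# holds the data, the construct to read, and a selection predicate.
global_query = [
    {"site": "hospital", "construct": "Patient", "predicate": "age > 65"},
    {"site": "billing", "construct": "Invoice", "predicate": "unpaid = true"},
]

# Per-site translators stand in for rewriting a subquery into the local query
# language understood by the executing LDBS (e.g., SQL vs. a hierarchical DML).
translators = {
    "hospital": lambda f: f"SELECT * FROM {f['construct']} WHERE {f['predicate']}",
    "billing": lambda f: f"GET {f['construct']} WHERE {f['predicate']}",
}

# Crude assumed costs per site; real statistics may be unavailable because of
# communication autonomy, so an optimizer often has to rely on estimates.
assumed_cost = {"hospital": 2.0, "billing": 5.0}


def decompose_and_plan(query):
    """Group fragments by site, translate them, and order sites by estimated cost."""
    subqueries = {}
    for fragment in query:
        subqueries.setdefault(fragment["site"], []).append(
            translators[fragment["site"]](fragment)
        )
    # list of (site, [local-language subqueries]), cheapest site first
    return sorted(subqueries.items(), key=lambda item: assumed_cost[item[0]])


for site, local_queries in decompose_and_plan(global_query):
    print(site, local_queries)
```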
Among the MDBS research issues mentioned above, schema management is critical to
the development, evolution and operation of an MDBS. However, in past research,
schema management has not been addressed in its entirety. Thus, it deserves the more
detailed discussion provided in the next section.
1.3 Issues of Schema Management for Multidatabase Systems
Figure 1.3 illustrates the main schema management issues considered in the five-level
schema architecture of an MDBS. It is evident that the development of an MDBS is a
bottom-up process that involves schema translation (translating each local schema into
its corresponding component schema), schema definition (defining one or more export
schemas for each component schema), schema integration (integrating all export schemas
into an integrated schema), and schema definition/schema translation (defining and
translating part of the integrated schema into a global external schema).
Figure 1.3: Schema Management Issues on the Five-level Schema Architecture
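Read bottom-up, the process depicted in Figure 1.3 corresponds to a pipeline of schema management operations. The sketch below chains stub functions for these operations in the order just described; the function bodies are placeholders (assumptions), since the actual translation and integration methods are the subject of later chapters.

```python
# A minimal sketch of the bottom-up MDBS development process (stub functions;
# the real translation/integration methods are developed in later chapters).

def translate(local_schema, common_data_model="SOOER"):
    """Schema translation: local schema -> component schema in the common data model."""
    return {"level": "component", "from": local_schema, "model": common_data_model}

def define_export(component_schema, shared_constructs):
    """Schema definition: expose only the constructs the LDBS agrees to share."""
    return {"level": "export", "from": component_schema, "shares": shared_constructs}

def integrate(export_schemas):
    """Schema integration: resolve semantic heterogeneity across export schemas."""
    return {"level": "integrated", "from": export_schemas}

def define_global_external(integrated_schema, external_data_model):
    """Schema definition/translation: derive a view for a class of global applications."""
    return {"level": "global external", "from": integrated_schema,
            "model": external_data_model}

# Bottom-up construction for two participating LDBSs (hypothetical names).
components = [translate("hospital local schema"), translate("billing local schema")]
exports = [define_export(c, shared_constructs=["Patient"]) for c in components]
integrated = integrate(exports)
global_view = define_global_external(integrated, external_data_model="relational")
```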
1. Schema translation hides the syntactic heterogeneity in an MDBS and is needed in
two situations: between the local and component schema levels, and between the
integrated and global external schema levels. Since the local data model of a LDBS
is usually not as expressive as the common data model, some data semantics are
missing in the local schema of the LDBS and are often embedded in the extension of
the local schema (i.e., local database). Therefore, the schema translation from the
local schema to its corresponding component schema involves a schema enrichment
process to improve the semantic level of the component schema. Schema enrichment
discovers missing semantics from local databases and reduces the difficulty of
schema integration, which can make use of these additional semantics to detect
and resolve semantic heterogeneity more easily (a toy illustration of such discovery
follows Table 1.2).
2. Schema integration is required to provide semantic and distribution heterogeneity
transparency. It refers to integrating multiple schemas (export, component and/or
integrated schemas, depending on the schema architecture of the MDBS) into a single
integrated schema by identifying and resolving the semantic heterogeneity existing
between/among schemas to be integrated.
3. Schema definition provides a facility to support the specification of an export schema
from a component schema or a global external schema from an integrated schema of
an MDBS. The schema definition facility requires a view mechanism, similar to that
for specifying external schemas in centralized database systems, in the common
query language of the MDBS.
4. Schema evolution support deals with the propagation of changes in any schema level
of the schema architecture to other affected schema levels in a dynamic MDBS
environment. The schema evolution may be caused by 1) association of a LDBS into
the MDBS, 2) disassociation of a LDBS from the MDBS, 3) changes to the local or
export schema of a LDBS, 4) changes to a global external schema, or 5) creation of a
new global external schema.
Support for different types of schema evolution
involves different combinations and sequences of schema translation, schema
integration, schema definition, and housekeeping operations.
The housekeeping
operations are used for updating the affected schema(s) in the schema architecture
without involving schema translation, integration, and definition. Table 1.2 shows
the required support for each type of schema evolution.
Types of schema evolution (rows): New Association; Disassociation; Changes to Local or Export Schema; Changes to Global External Schema; Adding New Global External Schema.
Required support (columns): Schema Translation; Schema Integration; Schema Definition; Housekeeping Operations.
Table 1.2: Required Support for Different Types of Schema Evolution
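Item 1 above notes that missing semantics are often buried in the extension of a local schema. As a toy illustration of what such discovery can look like (this is not the enrichment method developed in this dissertation), the sketch below scans two hypothetical extensions for an inclusion dependency, a common hint of an unstated referential link that schema enrichment could make explicit in the component schema.

```python
# Toy sketch of schema enrichment: detect a candidate inclusion dependency
# (an unstated referential link) by inspecting the local database extension.
# Data and column names are hypothetical.

visit_rows = [{"visit_id": 1, "pat_no": "P01"}, {"visit_id": 2, "pat_no": "P02"}]
patient_rows = [{"pat_no": "P01"}, {"pat_no": "P02"}, {"pat_no": "P03"}]


def candidate_inclusion_dependencies(child_rows, parent_rows):
    """Return (child_column, parent_column) pairs where every child value
    also appears in the parent column -- a hint of a missing relationship."""
    candidates = []
    for child_col in child_rows[0].keys():
        child_values = {row[child_col] for row in child_rows}
        for parent_col in parent_rows[0].keys():
            parent_values = {row[parent_col] for row in parent_rows}
            if child_values <= parent_values:
                candidates.append((child_col, parent_col))
    return candidates


# Suggests that VISIT.pat_no references PATIENT.pat_no, a semantic link that can
# then be added to the component schema during schema enrichment.
print(candidate_inclusion_dependencies(visit_rows, patient_rows))
```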
1.4 Research Motivation and Objective
Tremendous advances in networking, the exponential growth of databases, and the ever-increasing need for global information sharing will bring an MDBS into a large-scale
environment. It has been recognized that finding solutions to interoperability in a large
network of databases will be a major research area in the next decade or so [BKG93,
SSU91, SYE90].
"Large-scale" is a relative term. In terms of schema management issues, it is not defined
by the number and geographical dispersion of LDBSs participating in an MDBS. Rather,
it is characterized by 1) the magnitude and variation of local data models employed by
LDBSs being associated with an MDBS, 2) complicated semantic heterogeneity of
participating LDBSs, and 3) dynamic changes of schemas, participation and revocation
of LDBSs. A more detailed elaboration of each characteristic and its challenges to
schema management in a large-scale MDBS environment is as follows:
1. In a large-scale MDBS environment, the assumption made in some prototype
MDBSs [BOT86, TBC87, ADD91, ADK91, ADSY93] that the MDBS supports only
a given set of local data models is inappropriate, because other data models
(proprietary, variations of a data model, or possibly newly emerged) may be adopted
as local data models of LDBSs. Moreover, the number of external data models that
need to be supported also increases in a large-scale MDBS environment. The
magnitude and variation of local and external data models impose a fundamental
requirement that schema translation must be declarative and extensible to easily
accommodate various, possibly unseen, local and external data models.
2. The semantic heterogeneity in a large-scale MDBS environment is more complicated
than that in a smaller-scale MDBS environment. Increased and more complicated
semantic heterogeneity calls for a schema integration process that reduces
interactions with database administrators during an extremely difficult schema
integration process. This can be approached in two ways. One is to reduce the
amount and complexity of the semantic heterogeneity that needs to be identified
and resolved, by transforming schemas so that they remain semantically
equivalent but become more representationally compatible. The other is to adopt some
form of intelligent support for discovering data semantics buried in local databases so
that the need for interaction with database administrators is lessened.
3. A large-scale MDBS environment is dynamic because 1) new LDBSs may join
the MDBS, 2) LDBSs already participating in the MDBS may cease their
participation, and 3) local or export schemas of LDBSs may change over time.
Thus, schema evolution support will become a de facto requirement. As shown in
Table 1.2, schema translation, schema integration and schema definition are often
needed in different types of schema evolution. The facility for schema definition
should be invariant, regardless of the scale of the MDBS, because it is simply a view
mechanism on the common query language of an MDBS. However, increased
requests for schema evolution support in a dynamic environment amplify the
challenges to schema translation and schema integration mentioned above.
The first approach to facilitating the schema integration process is to normalize schemas
based on some pre-defined normalization criteria. This can also be regarded as a special
type of schema translation; in other words, transforming a schema amounts to translating
it into another schema expressed in the same data model as the original. Thus, it is
feasible to develop a single technique for solving the two most important schema
management issues: schema translation and schema normalization in support of the
schema integration process. Past research on schema translation focused mostly on the
development of model-to-model translation rules. A declarative and model-independent
approach to schema translation has not yet been proposed in the literature. Moreover,
schema translation and the support for schema integration have been treated as
independent research issues in previous research. The overlap between schema
translation and support for schema integration has not been identified and exploited in
providing a satisfactory solution to these two issues.
This dissertation research was motivated by the growing trend of moving an MDBS into
a large-scale environment and was directed toward developing an integrated
methodology to meet the challenges to schema translation and the support for schema
integration that exist in such an environment.
1.5 Organization of the Dissertation
This chapter presented an overview of the characteristics, objectives, reference schema
architecture, and major research issues of an MDBS in general, and of schema management
for an MDBS in particular. The motivation and objective of this dissertation research
have also been discussed. In Chapter 2, the literature relevant to this dissertation
research will first be reviewed, followed by the formulation of research approaches, tasks
and framework. Detailed research questions to be addressed will also be defined.
Chapter 3 will depict the metamodeling paradigm as well as the development of a
metamodel, the Synthesized Object-Oriented Entity-Relationship (SOOER) model, which
can also be adopted as a common data model of an MDBS. Chapter 4 will present an
inductive metamodeling technique, the Abstraction Induction Technique, which induces a
model schema from a set of application schemas expressed in a data model. The model
schema serves as a foundation for construct-equivalence-based schema translation and
schema normalization. Evaluation of the Abstraction Induction Technique for inductive
metamodeling will also be conducted and analyzed in that chapter. In Chapter 5, the
detailed language, the Construct Equivalence Assertion Language (CEAL), for
construct-equivalence-based schema translation and schema normalization will be presented.
Chapter 6 will be devoted to the development of algorithms for the construct equivalence
transformation and reasoning methods as well as for construct-equivalence-based schema
translation and schema normalization. Finally, Chapter 7 will summarize the
contributions of this dissertation research and suggest future research directions.
CHAPTER 2
Literature Review and Research Formulation
This chapter reviews the literature related to this dissertation research, focusing on the
formulation of and approaches to the schema translation and integration problems. Its aim is
to identify common areas between these two schema management issues as well as the
adequacies and inadequacies of existing approaches. Based on the literature review, the
foundation of this dissertation research will be analyzed as a preliminary to establishing
the framework for an integrated methodology for schema translation and support for
schema integration that is needed in a large-scale MDBS environment. Research
questions specific to this methodology will also be detailed.
2.1 Literature Review on Model and Schema Translation
2.1.1 Formulation of Model and Schema Translation Problems
(Data) model translation and schema translation are often used interchangeably.
However, as implied by their names, the former deals with translation at the data model
level, while the latter deals with translation at the schema level. The problem of
model translation is usually formulated as:
- Given two data models,
- Define the translation knowledge between the model constructs of one data model
and the model constructs of the other data model.
On the other hand, the problem of schema translation is formulated as:
- Given 1) an application schema represented in a data model (called a source data
model), and 2) translation knowledge from the source data model to another data model
(called a target data model),
- Derive an equivalent application schema represented in the target data model.
Model translation deals with the specification of translation knowledge between data
models. Based on the semantics of the model constructs of two data models, the translation
knowledge defines equivalences between the model constructs of one data model and
those of the other. The detailed properties of model constructs may also need to be
specified in the translation knowledge. For example, an attribute whose multiplicity
property is 'multi-valued' in an entity-relationship model can be translated into a relation
in a relational model. The relation includes an attribute which is the same as the
multi-valued attribute but now has the multiplicity property 'single-valued', as well as the
identifier attribute(s) of the entity on which the multi-valued attribute is defined. In this
example, the translation knowledge between the entity-relationship and relational
models is defined not only at the model construct level (i.e., attribute and relation) but
also at the level of model construct properties (i.e., the multiplicity property of an attribute).
Moreover, the translation knowledge between two data models should ideally be
bidirectional. That is, the same set of translation knowledge can be used for translating
from one data model to the other and vice versa.
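As a concrete rendering of the example above, the sketch below encodes the multi-valued-attribute-to-relation mapping as a small declarative rule applied by a generic rule interpreter. The rule format and every name in it are assumptions for illustration only; they are not the CEAL notation developed later in this dissertation.

```python
# Sketch of a declarative translation rule for the example above: a multi-valued
# attribute of an entity (ER model) becomes a relation (relational model) holding
# the entity's identifier plus the attribute, now single-valued.
# The rule format and names are illustrative assumptions, not CEAL.

def multivalued_attribute_rule(attribute, entity):
    """Applies only when the source construct has multiplicity 'multi-valued'."""
    if attribute["multiplicity"] != "multi-valued":
        return None
    return {
        "construct": "relation",
        "name": f"{entity['name']}_{attribute['name']}",
        "attributes": entity["identifier"] + [
            {"name": attribute["name"], "multiplicity": "single-valued"}
        ],
    }

def translate_schema(source_schema, rules):
    """Apply each translation rule to every (attribute, entity) pair it matches."""
    target = []
    for entity in source_schema["entities"]:
        for attribute in entity["attributes"]:
            for rule in rules:
                translated = rule(attribute, entity)
                if translated is not None:
                    target.append(translated)
    return target

# Example: Employee with a multi-valued 'skill' attribute.
employee = {
    "name": "Employee",
    "identifier": [{"name": "emp_id", "multiplicity": "single-valued"}],
    "attributes": [{"name": "skill", "multiplicity": "multi-valued"}],
}
print(translate_schema({"entities": [employee]}, [multivalued_attribute_rule]))
```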
Schema translation is the process of applying translation knowledge to an application
schema expressed in the source data model to generate an equivalent application schema
expressed in the target data model. Since the translation knowledge involved in a
schema translation is required in only one direction (from a source data model to a target
data model), schema translation is a directional process.
2.1.2 Approaches to Model Translation
Model translations (and schema translations) have usually been developed for two major
cases [YL93, ADSY93]: 1) to support a database design process, where a conceptual
schema expressed in a conceptual data model is subsequently converted into a logical
schema; and 2) to support an MDBS development or evolution process, where a local
schema expressed in a local data model is converted into a component schema expressed
in the common data model and where the integrated schema expressed in the common
data model is translated into a global external schema expressed in an external data
model.
Most work on model translation has been related to the database design process [HC93,
G93, MHR93]. Since translations in this case mainly concern mapping from a
conceptual data model to a logical data model (e.g., from an entity-relationship data
model or an object-oriented data model to a relational, network, or hierarchical data
model), the direct-translation approach is usually adopted. The direct-translation
approach refers to defining translation knowledge from the model constructs of one data
model to those of another data model without having formalized the two data models
(i.e., without formalized model constructs, their relationships and their semantics). The
translation knowledge is usually represented as a set of translation rules which are
always directed from a source data model to a target data model.
Several disadvantages are associated with the direct-translation approach. First and most
importantly, it works on a data-model-by-data-model basis rather than at a general level.
Thus, no formal framework is defined and employed to guide the process of specifying
translation knowledge between two data models. Second, it lacks formal support for
specifying the translation knowledge (rules) from one data model to another. Because
the model constructs and their semantics are not explicitly formalized, model engineers
are required to have a comprehensive understanding of the types, relationships and
semantics of the model constructs of both data models. Third, the translation knowledge
is specified from one data model to another data model; it is uni-directional. Thus, the
inverse model translation always requires effort by model engineers to specify another
set of translation knowledge. Finally, although the translation knowledge can be
represented as a set of translation rules, without formally defined model constructs the
development of a formal language as a representation of translation knowledge
universal to any two data models is hard to achieve. Thus, the translation knowledge is
usually implemented procedurally rather than declaratively. These disadvantages hamper
the use of the direct-translation approach for the MDBS development or evolution
process.
To overcome the problems inherent in the direct-translation approach, a metamodel-translation approach has been proposed [AT91, AT93, JJ95].
The notion of
metamodels and metamodeling has been adopted in the design and development of
information systems [BF92, BdL89, H092] as well as computer-aided software
engineering (CASE) [GCK92]. A metamodel is a model one level higher than a model
and provides a set of constructs (called metamodel constructs) for formally specifying
models [vG87]. The formal specification of a model expressed in a metamodel is called
the model schema of the model, while the process of conceptualization of a model is
called the metamodeling process.
A detailed description of metamodel and
metamodeling paradigm will be provided in the next chapter.
The general process in the metamodel-translation approach consists of the following two
steps: 1) the model schema for each data model to be translated is formally represented in
a metamodel and 2) the translation knowledge is then specified on the two model
schemas. In work by Atzeni and Torlone [AT91, AT93], the metamodel employed
consists of four metamodel constructs: lexical (atomic concept), abstract (abstract
concept or type), aggregation (relationship between/among abstracts) and function
(mapping between a concept, which can be a lexical, abstract or aggregation, and another
concept). Once the model schemas for the source and target data models are represented
in this metamodel, the translation knowledge is then specified on the model schemas
procedurally by Pascal-like programs. Their metamodel-translation method overcomes
the first two problems of the direct-translation approach. However, their method still suffers from the last two disadvantages of the direct-translation approach (i.e.,
uni-directional and procedural translation knowledge). Moreover, the metamodel used in
this method allows modeling a data model only at the model construct level.
The
detailed properties of each model construct of a data model cannot be represented in such
a metamodel.
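As a rough illustration of the first step of the metamodel-translation approach, the sketch below records a fragment of an entity-relationship model schema as instances of the four metamodel constructs just described (lexical, abstract, aggregation, function). The particular assignments and names are assumptions made for illustration; they are not the representation used in [AT91, AT93].

# Hypothetical encoding of a fragment of an ER model schema in terms of the four
# metamodel constructs discussed above (a sketch, not the representation of [AT91, AT93]).

er_model_schema = [
    # (metamodel construct, ER model construct formalized as its instance)
    ("abstract",    "entity"),        # entities are abstract concepts
    ("lexical",     "value_domain"),  # attribute values are atomic concepts
    ("function",    "attribute"),     # an attribute maps an entity to a value domain
    ("aggregation", "relationship"),  # a relationship aggregates entities
]

def constructs_formalized_by(metamodel_construct, model_schema):
    """List the model constructs represented as instances of one metamodel construct."""
    return [m for mm, m in model_schema if mm == metamodel_construct]

print(constructs_formalized_by("aggregation", er_model_schema))  # ['relationship']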
Jeusfeld and Johnen [JJ95] proposed another method based on the metamodel-translation
approach. The metamodel adopted is a specialization hierarchy of concepts. The most
general concept is an element which can be specialized into either a unit or a link. A unit
may consist of other units and may be connected by links. A unit can be specialized into
an object unit which may have instances in the database or a type unit which does not
have explicit instances in the database. On the other hand, links are distinguished
according to their arity and direction. Thus, a link can be an undirected link, directed
link, binary link, total link, or partial link. These subclasses of a unit or link can be
further specialized.
Since the metamodel adopted consists of a large number of
metamodel constructs (represented as concepts in the specialization hierarchy), the model
schemas of the source and target data models imply some general translation knowledge
between the two data models. For example, if a model construct of the source data
model and a model construct of the target data model are instances of the same
metamodel construct, they are regarded as being equivalent. However, the instantiation
from a model construct into a metamodel construct is not always one-to-one. In this case, a set of first order logic-like rules needs to be defined to classify the model construct into
several cases each of which is an instance of one metamodel construct. The set of first
order logic-like rules constitutes another part of the translation knowledge. To deal with
the situation when a model construct in the source data model cannot find a
corresponding model construct in the target data model which is an instance of the same
metamodel construct as the former, or when a model construct in the source data model
has more than one corresponding model construct in the target data model, a set of mapping relationships (expressed again as first order logic-like rules) between the source and target data models is used. The set of mapping relationships is independent of
specific data models and reusable for translation between any two data models.
Whether the translation knowledge specified in this method is bi-directional is not explicitly stated in [JJ95]. Thus, it is hard to judge whether this method
overcomes the third disadvantage of the direct-translation approach.
Although the
translation knowledge is declaratively represented in the first order logic-like rules, first
order logic-like representation may not be easy to use. Moreover, as in the previous
method, the detailed properties of each model construct of a data model cannot be
represented in this metamodel.
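To suggest what the classification part of this translation knowledge might look like, the sketch below splits a model construct into cases before each case is matched against a metamodel construct. The rule content is entirely hypothetical and much simpler than the first order logic-like rules of [JJ95].

# Hypothetical sketch of classifying one model construct into several cases, each of
# which would then be matched against a single metamodel construct.

def classify_attribute(attribute):
    """Split the 'attribute' model construct into cases for metamodel matching."""
    if attribute.get("is_key"):
        return "key_attribute"           # case 1
    if attribute.get("multi_valued"):
        return "multi_valued_attribute"  # case 2
    return "plain_attribute"             # default case

examples = [
    {"name": "emp_id", "is_key": True},
    {"name": "skill", "multi_valued": True},
    {"name": "name"},
]
print([classify_attribute(a) for a in examples])
# ['key_attribute', 'multi_valued_attribute', 'plain_attribute']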
In sum, the goal of model translation is to develop the translation knowledge (i.e.,
equivalence between model constructs of two data models) based on the semantics of
model constructs. The direct-translation approach for model translation is not suitable in
an MDBS environment. The metamodel-translation approach has been shown to be
appropriate for schema translation in the MDBS context. Due to the problems exhibited by the two methods using the metamodel-translation approach reviewed above [AT91, AT93, JJ95], a new schema translation method based on the metamodel-translation approach is needed.
2.2 Literature Review on Schema Integration
2.2.1 Formulation of Schema Integration Problem
In some of the literature, the problem of schema integration encompasses the process of
schema translation [WCN92, SPD92, SP91], while in other literature the schema
translation process is assumed to have been completed before the schema integration
begins [BLN86]. Since schema translation is a separate schema management issue, as
depicted in the previous chapter, the second view of the schema integration problem is
adopted in this dissertation. Based on the five-level schema architecture shown in Figure
1.2, the formulation of the schema integration problem is as follows:
- Given a collection of export schemas,
- Construct an integrated schema that will support all of the export schemas. The
construction of the integrated schema should not result in loss of information or the
ability to query and/or update LDBSs either individually or collectively.
The processes of schema integration proposed in the literature [BLN86, KK92, WCN92,
SPD92] generally consist of three main steps: conflict identification, conflict resolution,
and schema merging and restructuring. Figure 2.1 illustrates the conventional process of
schema integration.
Figure 2.1: Schema Integration Process (diagram: the input schemas pass through Conflict Identification, which produces inter-schema correspondences and inter-schema relationships, then Conflict Resolution, which attaches resolution strategies to the correspondences, and finally Schema Merging/Restructuring, which produces the integrated schema)
1. Conflict Identification: Input schemas are analyzed and compared to detect possible
conflicts represented as a set of inter-schema correspondences. In addition, inter-schema relationships, which are relationships between/among concepts in different
schemas but not existing in any individual schema, may be discovered while comparing
schemas.
2. Conflict Resolution: Each type of inter-schema correspondences may be associated
with multiple resolution strategies. It is the responsibility of this step to determine an
appropriate resolution strategy for each inter-schema correspondence. During this step,
interaction with designers and users is usually required before compromises can be
achieved.
3. Schema Merging and Restructuring: After resolution strategies for inter-schema correspondences have been determined and inter-schema relationships have been collected, the schemas are ready to be superimposed, giving rise to an integrated schema which will be analyzed and, if necessary, restructured in order to achieve such desirable qualities as completeness, correctness, minimality, and understandability.
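A minimal sketch of this three-step pipeline is given below. The conflict-detection and resolution logic is deliberately trivial (name equality only) and every name is hypothetical; the sketch is meant only to show the order of the steps and their intermediate products.

# Sketch of the conventional three-step schema integration process; all logic is
# intentionally simplistic and hypothetical.

def identify_conflicts(schemas):
    """Step 1: compare schemas and record inter-schema correspondences."""
    names = [set(s) for s in schemas]
    return [{"concept": c, "kind": "same_name"} for c in set.intersection(*names)]

def resolve_conflicts(correspondences):
    """Step 2: attach a resolution strategy to each inter-schema correspondence."""
    return [dict(c, strategy="merge_under_common_name") for c in correspondences]

def merge_and_restructure(schemas, resolved):
    """Step 3: superimpose the schemas, merging corresponded concepts."""
    to_merge = {c["concept"] for c in resolved}
    integrated = {}
    for schema in schemas:
        for concept, attrs in schema.items():
            if concept in to_merge:
                merged = integrated.setdefault(concept, [])
                merged.extend(a for a in attrs if a not in merged)
            else:
                integrated[concept] = list(attrs)
    return integrated

export_schemas = [
    {"Employee": ["emp_id", "name"], "Project": ["proj_id"]},
    {"Employee": ["emp_id", "salary"], "Department": ["dept_id"]},
]
resolved = resolve_conflicts(identify_conflicts(export_schemas))
print(merge_and_restructure(export_schemas, resolved))
# {'Employee': ['emp_id', 'name', 'salary'], 'Project': ['proj_id'], 'Department': ['dept_id']}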
The main methodological issues are related to the problem of discovering and resolving
possible conflicts in the schemas to be integrated [SBD93]. Conflicts between/among
schemas are mainly caused by different perspectives of designers of schemas to be
integrated, equivalence among model constructs in the data model, conflicts in
application semantics, etc. [BLN86, BCN92, SP91, KK92]. In the next subsection, a
taxonomy of conflicts will be presented.
2.2.2 Taxonomy of Conflicts
Several taxonomies of conflicts have been proposed in the literature [SP91, BCN92,
SPD92, KS94, RPR94] and a synthesized taxonomy of conflicts is depicted as follows.
The relationships between the synthesized taxonomy of conflicts and other taxonomies
found in the literature are listed in Appendix A.
1. Semantic Conflict: Different designers may not perceive exactly the same sets of
objects or may adopt different classifications. Semantic conflict indicates that inter-schema relationships are missing from individual export schemas [YP93, SP91,
SPD92].
2. Descriptive Conflicts: When describing related sets of real world objects, different
designers do not perceive exactly the same set of properties. Descriptive conflicts include naming conflicts due to homonyms and synonyms, as well as conflicts in attribute domain, scale, constraints, operations, etc. [SP91, SPD92].
3. Structural Conflicts: Designers can choose different model constructs to represent the
same real world objects. The extent to which structural conflicts may arise is related
to the semantic relativism of the data model in use (i.e., to its ability to support
different, although equivalent, representations of the same reality) [SP91, SPD92].
4. Extension Conflict: Data for related concepts in different LDBSs may not be
compatible. The incompatibility occurs due to different levels of accuracy required by different LDBSs, asynchronous updates among LDBSs, etc. [RPR94].
5. Schematic Conflict: This type of conflict arises when data in one LDBS correspond
to metadata of another LDBS [SP91, SPD92, SCG93].
2.2.3 Analysis of Conflict Types and Schema Integration Process
Among the five types of conflicts defined in the previous subsection, the extension
conflict is the only one occurring in the data level rather than the schema level. It
complicates operations in an MDBS (e.g., how can inconsistent data from different
LDBSs be reconciled into a global query result?). However, conflict identification and
resolution in the schema integration process is not concerned with this type of conflict.
The identification and resolution of the semantic, descriptive and schematic conflicts
mainly depend on understanding the semantics of export schemas and usually require the
use of their extensions. On the other hand, since structural conflict results from the
semantic relativism of the data model in use, understanding the semantics of model
constructs of the data model and equivalences among the model constructs is essential to
the identification and resolution of this type of conflict.
Because the nature of and the knowledge required to identify and resolve structural
conflict differ from those of the other three conflict types, the former should be treated
differently in the schema integration process.
If equivalences among the model
constructs (called construct equivalences) of the data model and an associated reasoning
mechanism exist, each input schema can be transformed into a semantically equivalent
and representationally compatible schema. Thus, most (if not all) structural conflicts
among all input schemas can be eliminated from the transformed input schemas. The
process of transforming each input schema into a semantically equivalent schema is
called schema normalization, and the resulting schema is called the normalized schema.
Therefore, the complexity of the conflict identification and resolution steps will be
mitigated because the magnitude of the structural conflicts has previously been reduced,
if not fully eliminated. Based on this analysis, a new schema integration process is then
proposed as shown in Figure 2.2. In this proposed schema integration process, the first
step is schema normalization which employs a set of construct equivalences and is
guided by normalization criteria to transform each input schema into a normalized
schema. A normalization criterion is an undesired (or desired) quality that must be
avoided (or achieved) by a schema.
An undesired quality can be defined as an
expression of an undesired model construct or an undesired combination of model
constructs. After the schema normalization step, the normalized schemas replace the
original input schemas for subsequent steps in the schema integration process.
Figure 2.2: Proposed Schema Integration Process (diagram: the input schemas first undergo Schema Normalization, driven by construct equivalences and normalization criteria, to produce normalized schemas; the normalized schemas then pass through Conflict Identification, Conflict Resolution, and Schema Merging/Restructuring to produce the integrated schema)
2.3 Research Foundation
As stated in the research motivation in Section 1.4, the objective of this dissertation
research has been to develop an integrated methodology that addresses the challenges of schema translation and the support of schema integration that are accentuated in a large-scale
MDBS environment. The literature review on schema (model) translation and schema
integration indicates that construct equivalences (i.e., equivalences among model
constructs) of two data models or the same data model are essential to model translation
and schema normalization in the proposed schema integration process. Thus, the notion
of construct equivalences establishes the foundation for this dissertation research. In this
section, an analysis of construct equivalences based on the semantic characteristics of
data models will first be conducted. Its implications for schema normalization and
model translation will then be discussed.
2.3.1 Semantic Characteristics of Data Models
Non-orthogonal model constructs:
A data model provides a set of model constructs for representing the data semantics
(structural, behavioral, and constraints) of application domains. The semantics of the set
of model constructs constitute the semantic space of the data model [SZ93]. As has been
mentioned, when using a data model it is possible that the same reality can be modeled
by equivalent representations (i.e., with different combinations of model constructs)
[BLN86, BCN92, SP94]. The phenomenon of multiplicity of possible representations of
the same real world is called semantic relativism [SP94, SP91, SPD92] and results from the
non-orthogonal constructs supported by the data model. A set of model constructs M are
orthogonal if the semantics of any proper subset of M is inexpressible by any other
proper subset of M. On the other hand, a set of model constructs N are non-orthogonal if
a model construct (or model constructs) in N can equivalently be expressed by other
model construct(s) in N. Figure 2.3 graphically represents the semantic space of a data
model and the non-orthogonal constructs supported by the data model. For example, the
model constructs of the data model S consist of C_S1, C_S2, ..., and C_Sk. C_S1, C_S2 and C_Si are not orthogonal since the union of the semantics of C_S1 and C_S2 is equivalent to part of the semantics of C_Si. Thus, there exists a construct equivalence between the union of C_S1 and C_S2 and part of C_Si. On the other hand, a construct equivalence exists between C_S3 and the union of C_Sj and C_Sk because the semantics of C_S3 is equivalent to the union of the semantics of C_Sj and C_Sk. It is evident that the set of model constructs C_S1, C_S2 and C_S3, or the set of model constructs C_Si, C_Sj and C_Sk, is orthogonal since any proper subset of model constructs in each set is not expressible by any other proper subset of model constructs in the same set.
Figure 2.3: Non-orthogonal Model Constructs of Data Model (diagram of the semantic space of data model S, where C_Si denotes model construct i of data model S: the union of C_S1 and C_S2 is equivalent to part of C_Si, and C_S3 is equivalent to the union of C_Sj and C_Sk)
The non-orthogonality of model constructs in a data model can be represented as a set of
construct equivalences within the data model (called intra-model construct equivalences).
Furthermore, these intra-model construct equivalences are bi-directional. In other words,
an intra-model construct equivalence defines that a set of model construct(s) is
equivalent to another set of model construct(s) in the same data model and vice versa.
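One way to picture such bi-directional intra-model equivalences is as unordered pairs of construct sets, either side of which may be rewritten into the other. The representation below is a hypothetical sketch (not the construct equivalence representation developed later in this dissertation), using the constructs of the Figure 2.3 example.

# Hypothetical representation of bi-directional intra-model construct equivalences
# of a data model S, using the example constructs of Figure 2.3.

intra_model_equivalences_S = [
    (frozenset({"C_S1", "C_S2"}), frozenset({"part_of_C_Si"})),
    (frozenset({"C_S3"}), frozenset({"C_Sj", "C_Sk"})),
]

def equivalent_to(constructs, equivalences):
    """Return the construct set(s) equivalent to the given set, in either direction."""
    key = frozenset(constructs)
    results = []
    for left, right in equivalences:
        if key == left:
            results.append(set(right))
        elif key == right:
            results.append(set(left))
    return results

print(equivalent_to({"C_S3"}, intra_model_equivalences_S))          # [{'C_Sj', 'C_Sk'}]
print(equivalent_to({"C_Sj", "C_Sk"}, intra_model_equivalences_S))  # [{'C_S3'}]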
Overlapped semantic spaces:
As mentioned, each data model takes up a semantic space.
Different data models occupy semantic spaces of different sizes. The semantic space of one data model overlaps with that of another data model if there exist equivalence relationships among model constructs of the two data models.
In other words, the semantics
represented by overlapping model constructs in one data model can also be expressible
by their equivalent counterparts in another data model. As shown in Figure 2.4, the
semantic spaces of the data models S and T overlap in terms of the constructs C_Si, C_Sj, ... of the data model S and C_Tm, C_Tn, C_To, ... of the data model T. For example, part of the semantics of C_Si in the data model S is equivalent to the semantics of C_Tm in the data model T, while the semantics of C_Sj in the data model S is equivalent to the union of the semantics of C_Tn and C_To of the data model T.
Figure 2.4: Overlapped Semantic Spaces of Data Models T and S (diagram: the semantic spaces of data models S and T overlap; constructs C_Si, C_Sj, ... of S are equivalent to constructs C_Tm, C_Tn, C_To, ... of T in the overlapped region, while other constructs of each model lie in the non-overlapped regions; C_Si denotes construct i of data model S and C_Tm denotes construct m of data model T)
The notion of overlapped semantic spaces provides a sound foundation for defining translatability between data models. Two data models are translatable if and only if their
semantic spaces overlap; otherwise, they are not translatable. The overlapped semantic
spaces of two data models can be represented as a set of construct equivalences between
the two data models (called inter-model construct equivalences).
Like intra-model
construct equivalences, inter-model construct equivalences are also bi-directional.
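Under the same hypothetical representation, translatability can be read as a simple overlap test: two data models are translatable exactly when at least one inter-model construct equivalence relates them, as the short sketch below indicates.

# Sketch: translatability as a non-empty overlap of semantic spaces, i.e., the
# existence of at least one inter-model construct equivalence (hypothetical encoding).

inter_model_equivalences_S_T = [
    (frozenset({"part_of_C_Si"}), frozenset({"C_Tm"})),
    (frozenset({"C_Sj"}), frozenset({"C_Tn", "C_To"})),
]

def translatable(inter_model_equivalences):
    """Two data models are translatable iff their semantic spaces overlap."""
    return len(inter_model_equivalences) > 0

print(translatable(inter_model_equivalences_S_T))  # True
print(translatable([]))                            # False: disjoint semantic spaces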
2.3.2 Implications for Schema Normalization
Since schema normalization is a special type of schema translation, the implications of
the semantic characteristics of a data model for schema normalization will first be
discussed.
Schema Normalization Problem Formulation:
The non-orthogonality of model constructs in a data model makes schema normalization
possible. Accordingly, the schema normalization problem can be formally formulated
as:
- Given 1) a set of intra-model construct equivalences of a data model S,
2) normalization criteria represented as a set of undesired model
constructs, and
3) an application schema represented in S.
- Derive a normalized application schema represented in S which satisfies the
normalization criteria.
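A rough procedural reading of this formulation is sketched below, with hypothetical structures throughout: every instance of an undesired model construct is rewritten, via an intra-model construct equivalence, into instances of the equivalent constructs, so that the resulting schema satisfies the normalization criteria.

# Sketch of construct-equivalence-based schema normalization. The application schema
# is a hypothetical list of construct instances; the rewrite rule stands in for an
# intra-model construct equivalence of data model S used in one direction.

undesired_constructs = {"C_S3"}                  # normalization criteria
rewrite_rules = {"C_S3": ["C_Sj", "C_Sk"]}       # derived from C_S3 <-> {C_Sj, C_Sk}

def normalize(schema, rules, undesired):
    """Replace every instance of an undesired construct by equivalent construct instances."""
    normalized = []
    for construct, payload in schema:
        if construct in undesired:
            normalized.extend((target, payload) for target in rules[construct])
        else:
            normalized.append((construct, payload))
    return normalized

application_schema = [("C_S1", "a"), ("C_S3", "b"), ("C_S2", "c")]
print(normalize(application_schema, rewrite_rules, undesired_constructs))
# [('C_S1', 'a'), ('C_Sj', 'b'), ('C_Sk', 'b'), ('C_S2', 'c')]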
Conceptual Flow of Construct-Equivalence-Based Schema Normalization:
Since the schema normalization process is based on the notion of intra-model construct
equivalences, it is called the construct-equivalence-based schema normalization
approach. For example, given a set of intra-model construct equivalences of the data
model S as shown in Figure 2.3, with C_S3 being the undesired model construct in the normalization criteria, the semantic space of the data model S is transformed accordingly, as shown in Figure 2.5. Since C_S3 is the undesired model construct in this normalization task, the bi-directional intra-model construct equivalence between C_S3 and the union of C_Sj and C_Sk is transformed into a uni-directional one from C_S3 to the union of C_Sj and C_Sk. Thus, the schema normalization process normalizes an application schema by following the direction of this intra-model construct equivalence to transform all instances of the model construct C_S3 in the application schema into equivalent instances of the model constructs C_Sj and C_Sk. The resulting application schema will not contain any instance of the model construct C_S3. Thus, it satisfies the normalization criteria specified previously.
Figure 2.5: Conceptual Flow of Construct-Equivalence-Based Schema Normalization (diagram of the semantic space of data model S with C_S3 marked as the undesired model construct in the normalization criteria; its equivalence with the union of C_Sj and C_Sk is applied only in the direction from C_S3 to C_Sj and C_Sk)
2.3.3 Implications for Model and Schema Translation
Model and Schema Translation Problem Formulation:
As mentioned, the overlapped semantic spaces of two data models present an opportunity
and provide direct mappings for translation between these two data models. However,
the non-overlapped semantic spaces of the two data models pose two challenges for model translation: semantic loss and the need for semantic enhancement. In the context of schema translation, semantic loss refers to a situation where semantics covered in the source application schema cannot be fully expressed by the model constructs of the
target data model, thus resulting in loss of part of the semantics of the source application
schema. On the other hand, the semantic enhancement enriches the target application
schema by discovering more semantics which are neither expressible in the source data
model nor explicitly expressed in the source application schema.
As depicted in Figure 2.6, the semantic space of the source data model S which is not
overlapped with that of the target data model T is the cause of semantic loss, since model constructs in this non-overlapped semantic space may not be directly representable in the target data model. For example, there exists no model construct in the target data model T directly equivalent to the model construct C_S1 in the source data model. As a result, any semantics in the source application schema represented by C_S1 may not be able to be expressed in the target data model. Thus, this part of the semantics in the application schema will be lost when translating the application schema represented in the data model S into one represented in the data model T. Therefore, one of the objectives of model translation is to provide a systematic way of minimizing the semantic loss resulting from the non-overlapped semantic space of the source data model.
Figure 2.6: Source of Semantic Loss and Domain for Semantic Enhancement (diagram: in a translation from the source data model S to the target data model T, the non-overlapped semantic space of S is the source of semantic loss, while the non-overlapped semantic space of T is the domain for semantic enhancement)
On the other hand, the semantic space of the target data model which is not overlapped
with that of the source data model defines the domain for semantic enhancement since
the semantics expressible by model constructs in this non-overlapping semantic space of
the target data model are not readily available in any model construct of the source data
model. For example, as shown in Figure 2.6, the semantics of the model constructs C_T1, C_T2 and C_T3 in the target data model T are not in the semantic space overlapped with that of the source data model S. When translating a source application schema represented in the data model S into one represented in the data model T, these model constructs will not be present in the target application schema since the semantics covered by these model constructs are
not available in the source application schema. Thus, the second objective of model
translation is to maximize the semantic enhancement result in a systematic manner.
To achieve these two objectives of model translation, in addition to the inter-model
construct equivalences between the source and target data models, the intra-model
construct equivalences of both data models need to be employed in schema translation.
Obviously, the inter-model construct equivalences serve as the translation bridge
between these two data models. Moreover, the intra-model construct equivalences of the
two data models can be used to minimize semantic loss and maximize the semantic
enhancement result during a translation. For example, as shown in Figure 2.6, there exists no construct in the target data model which is equivalent to C_S1 or C_S2. However, if there exists an equivalence between the union of the semantics of C_S1 and C_S2 and part of the semantics of C_Si (as shown in Figure 2.3), it can be concluded that the semantics covered by C_S1 and C_S2 in a source application schema is representable (e.g., by the construct C_Tm) in the target data model and can be contained in its corresponding target application schema. Thus, by the use of the intra-model construct equivalences of the source data model, semantic loss can be minimized. On the other hand, if there exists in the target data model an equivalence between some constructs in the non-overlapped semantic space and some other constructs in the overlapped semantic space, this intra-model construct equivalence can be employed to guide the semantic enhancement process and to increase the semantics level of the target application schema by discovering the semantics which are missing in the source application schema but are representable by those constructs in the non-overlapped semantic space of the target data model.
Essentially, construct equivalence serves as the representation of translation knowledge;
the inter-model and intra-model construct equivalences are translation knowledge.
Model translation based on this notion is called the construct-equivalence-based model
translation approach.
Hence, the problem of model translation is formally formulated
as:
- Given 1) the semantic space of a data model S
2) the semantic space of a data model T
- Derive 1) the intra-model construct equivalences of S
2) the intra-model construct equivalences of T
3) the inter-model construct equivalences between S and T
The formulation of the schema translation problem becomes:
- Given 1) the intra-model construct equivalences of two data models (S and T)
2) the inter-model construct equivalences between S and T
3) an application schema represented in S
- Derive an application schema represented in T.
Conceptual Flow of Construct-Equivalence-Based Schema Translation:
The construct-equivalence-based schema translation approach becomes straightforward
and systematic, as will be seen shortly. Assume the intra-model construct equivalences
of two data models (S and T) and the inter-model construct equivalences between S and T are as shown in Figure 2.7. The schema translation from the data model S (source) to the data
model T (target) is shown in Figure 2.8. Conceptually, the construct-equivalence-based
schema translation consists of three stages:
1. source convergence stage in which all instances (in the source application schema) of
the model constructs in the non-overlapped semantic space of S are transformed into
instances (in the source application schema) of the model constructs of S in the
overlapped semantic space,
2. source-target projection stage in which all instances (in the source application
schema) of the model constructs of S in the overlapped semantic space are mapped
into instances (in the target application schema) of the model constructs of T in the
overlapped space, and
3. target enhancement stage in which some instances (in the target application schema)
of the model constructs of T in the overlapped semantic space will be considered for
semantic enhancement by transforming them to instances of other model constructs,
including those in the non-overlapped semantic space of T.
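The three stages can be read as a composition of rewriting steps over construct instances. The sketch below is hypothetical: the rule tables only stand in for the intra-model and inter-model construct equivalences of Figure 2.7 and do not reproduce them exactly, and real equivalences may relate sets of constructs on both sides.

# Sketch of the three-stage construct-equivalence-based schema translation from S to T.

source_convergence = {"C_S3": ["C_Sj", "C_Sk"]}      # intra-model equivalences of S
source_target_projection = {"C_Si": ["C_Tm"],         # inter-model equivalences S -> T
                            "C_Sj": ["C_Tn", "C_To"],
                            "C_Sk": ["C_Tn"]}
target_enhancement = {"C_Tn": ["C_T2"]}               # intra-model equivalences of T

def rewrite(schema, rules):
    """Rewrite every construct instance that has a rule; keep the rest unchanged."""
    result = []
    for construct, payload in schema:
        for target in rules.get(construct, [construct]):
            result.append((target, payload))
    return result

A1 = [("C_Si", "x"), ("C_S3", "y")]        # source application schema
A2 = rewrite(A1, source_convergence)        # only overlapped constructs of S remain
A3 = rewrite(A2, source_target_projection)  # now expressed in constructs of T
A4 = rewrite(A3, target_enhancement)        # enhanced with non-overlapped constructs of T
print(A4)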
Figure 2.7: Example of Intra-model and Inter-model Construct Equivalences (diagram of the semantic spaces of data models S and T, showing intra-model construct equivalences within each data model and inter-model construct equivalences between the two overlapped semantic spaces)
Figure 2.8: Conceptual Flow of Schema Translation Based on Intra-model and Inter-model Construct Equivalences (diagram: the translation from the source data model S to the target data model T proceeds through the source convergence stage, the source-target projection stage, and the target enhancement stage)
As shown in Figure 2.8, the source convergence stage utilizes the intra-model construct
equivalences of the source data model, the source-target projection stage employs the
inter-model construct equivalences between the source and target data models, while the
target enhancement stage manipulates the intra-model construct equivalences of the
target data model. In terms of evolution of application schema in this schema translation
process, the source convergence stage transforms the source application schema A1 into
an intermediate application schema A2 which is composed of only the model constructs
of S in the overlapped semantic space, the source-target projection stage translates A2
into another intermediate application schema A3 with only the model constructs of T in
the overlapped semantic space, and, finally, the target enhancement stage generates the
target application schema A4 which enhances the semantics level of A3 by incorporating
those model constructs in the non-overlapped semantic space of T. Compared with
Figure 2.7, the flow of schema translation from S to T shown in Figure 2.8 follows the
right-directed arrows of both intra-model and inter-model construct equivalences. If the translation direction is from T to S, then the flow of schema translation would follow the left-directed arrows in Figure 2.7.
2.3.4 Summary of Research Foundation
This subsection summarizes the research foundation discussed above and identifies the
essential components of the construct-equivalence-based methodology for schema
translation and schema normalization.
1. The translation knowledge is represented as inter-model construct equivalences
between two data models and intra-model construct equivalences of the two data
models.
2. The construct-equivalence-based schema translation is bi-directional.
As illustrated in Figure 2.7, the direction for the inter-model and intra-model construct equivalences is determined according to the translation direction of a particular translation task. Given a method for transforming the direction of construct equivalences (inter-model and intra-model), the construct-equivalence-based schema translation approach becomes bi-directional (a small sketch of such a direction transformation follows this list).
3. The notion of construct equivalence empowers an integrated methodology for both
model (and schema) translation and schema integration.
Compared with the construct-equivalence-based schema translation, the construct-equivalence-based schema normalization process is the same as the process of the source convergence or the target enhancement stage in its ability to transform an application
schema into another application schema (represented in the same data model as the
original application schema) via intra-model construct equivalences. Normalization
criteria required by the schema normalization process are used to direct the
transformation process.
Although normalization criteria are not required by the
source convergence and target enhancement stages in the schema translation process,
the former is directed by the model constructs in the overlapped semantic space of
the source data model, while the latter is directed toward the model constructs in the non-overlapped semantic space of the target data model. Thus, reasoning procedures on
intra-model construct equivalences for these three processes should be highly similar
if not exactly alike. The methodology based on the construct equivalence concept
(called the construct-equivalence-based methodology), including a representation for
and reasoning procedures on inter-model and intra-model construct equivalences, can
be applied to both model/schema translation and schema normalization.
4. The construct-equivalence-based methodology for schema translation and schema
normalization requires a metamodel and its associated metamodeling process.
Inter-model and intra-model construct equivalences are defined according to the
model constructs included in the semantic spaces of data models. To facilitate the
specification of inter-model and intra-model construct equivalences required by both
the construct-equivalence-based schema translation and the construct-equivalence-based schema normalization, the semantic spaces need to be formally specified.
Therefore, a metamodel and its associated metamodeling process are essential
components of the construct-equivalence-based methodology.
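The direction transformation mentioned under point 2 can be pictured as follows: each bi-directional construct equivalence is turned into a directed rewrite rule according to the translation direction of the task at hand. The encoding is a hypothetical sketch, not the construct equivalence transformation method developed in this dissertation.

# Hypothetical sketch of the construct equivalence transformation step: choosing a
# direction for each bi-directional equivalence for a particular translation task.

equivalences = [
    (frozenset({"C_S3"}), frozenset({"C_Sj", "C_Sk"})),
    (frozenset({"part_of_C_Si"}), frozenset({"C_Tm"})),
]

def direct(equivalences, right_to_left=False):
    """Turn bi-directional equivalences into directed rewrite rules."""
    if right_to_left:
        return [(right, left) for left, right in equivalences]   # e.g., translating from T to S
    return [(left, right) for left, right in equivalences]       # e.g., translating from S to T

print(direct(equivalences))                      # left-to-right rules
print(direct(equivalences, right_to_left=True))  # the inverse direction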
2.4 Research Framework and Tasks
Components essential to
the development of
the construct-equivalence-based
methodology for schema translation and schema normalization in a large-scale MDBS
environment have been identified in Section 2.3.4. They include 1) a metamodel as the
formal representation of data models (or their semantic spaces), 2) a metamodeling
process, 3) a construct equivalence representation for inter-model and intra-model
construct equivalences, 4) a construct equivalence transformation method which
transforms the direction of inter-model and intra-model construct equivalences to conform to a
desired translation direction, and 5) a construct equivalence reasoning method for
schema translation and schema normalization.
The metamodel and the metamodeling process are used to formally specify data models
and generate model schemas for the data models. Based on the model schemas of two
data models and the construct equivalence representation, inter-model and intra-model
construct equivalences required by the construct-equivalence-based model (and schema)
translation and/or schema normalization can be specified. The construct equivalence
transformation and reasoning methods are the processing components of the construct-equivalence-based schema translation and schema normalization. The metamodeling
process is knowledge-intensive and usually error-prone, similar to a modeling process for
formally specifying an application schema using a data model. Therefore, an inductive
metamodeling process which learns the model schema for a data model from some
example application schemas without interacting with users is critical to the success of
the construct-equivalence-based methodology. In the MDBS context, LDBSs and hence
local schemas pre-exist and can readily be used by the inductive metamodeling process.
Figure 2.9 depicts the architectural framework
for the construct-equivalence-based
methodology for schema translation and schema normalization.
The inductive
metamodeling induces the model schema for a data model from example application
schemas represented in the data model. The construct-equivalence-based schema translation or schema normalization requires both the construct equivalence transformation and reasoning methods. They differ only with regard to the construct equivalences and normalization criteria required.
The construct-equivalence-based
schema translation translates a source application schema represented in one data model
into a target application schema represented in another data model, based on the inter-model construct equivalences between, and the intra-model construct equivalences of, the two data models. In contrast, the construct-equivalence-based schema
normalization, which requires only the intra-model construct equivalences of one data
model and normalization criteria, transforms an application schema represented in one
data model into a normalized application schema represented in the same data model.
Figure 2.9: Architectural Framework for Construct-Equivalence-Based Methodology for Schema Translation and Schema Normalization (diagram: inductive metamodeling derives the model schemas of data models S and T from example application schemas; the intra-model construct equivalences of S and T and the inter-model construct equivalences feed the construct equivalence transformation and reasoning methods, which transform an application schema in S into either a target application schema in T or a normalized application schema in S)
2.5 Research Questions
Specific research questions that need to be addressed in each component of the construct-equivalence-based methodology for schema translation and schema normalization are
listed below.
Research Questions Related to Metamodel:
1. What are the requirements of a metamodel? Specifically, can any existing data
model be adopted as a metamodel? What are the components of a metamodel? What
are the relationships among components of a metamodel?
Research Questions Related to Metamodeling Process:
2. A metamodeling process is one level higher than a modeling process. Is there any
process higher than the metamodeling process? If so, is it required in the context of
schema translation and schema normalization?
3. How does the inductive metamodeling process differ from other inductive learning
problems?
Can any existing inductive learning technique be adopted by the
inductive metamodeling process?
4. How efficient and effective is the inductive metamodeling technique developed in
this dissertation research?
Research Questions Related to Construct Equivalence Representation:
5. What are the requirements for the representation of construct equivalences? Can a
construct equivalence representation be declarative and suitable to both inter-model
and intra-model construct equivalences?
6. What is the execution semantics of an inter-model construct equivalence? Is it the
same as the execution semantics of an intra-model construct equivalence?
Research Questions Related to Construct Equivalence Transformation Method:
7. What are the requirements for the construct equivalence transformation method?
Research Questions Related to Construct Equivalence Reasoning Method:
8. Related to research question 6, is there any difference between the construct-equivalence reasoning method for inter-model construct equivalences and that for intra-model construct equivalences?
9. Are there any differences among the construct-equivalence reasoning methods for the
source convergence stage, the target enhancement stage, and schema normalization
stage?
10. What are the other advantages of the construct-equivalence-based model (and
schema) translation?
CHAPTER 3
Metamodel Development
This chapter details the metamodeling paradigm extended from the modeling paradigm,
requirements for metamodel development, and a metamodel named SOOER (Synthesized Object-Oriented Entity-Relationship Model). The model constructs of
SOOER, a model definition language based on SOOER, and an instantiation mechanism
between a metamodel and models will also be discussed.
3.1 Modeling Paradigm and Beyond
3.1.1 Modeling Paradigm
In the modeling paradigm, shown in Figure 3.1, the modeling process takes a model as
input to produce a formalized application schema. Thus, an application schema is an
instantiation of constructs supported by the model and therefore should be constrained by
the constraints of the model's constructs. However, a model in the modeling paradigm
lacks a formal specification, which results in several problems [vBtH91].
First,
ambiguity may arise. Different analysts may have different interpretations or knowledge
about the meaning and constraints of the constructs in the model, thus increasing the
possibility of erroneous application schemas.
Second, since the constraints of the
model's constructs are not explicitly formalized, the verification of an application
schema represented in the model is usually performed in an ad hoc manner and cannot be carried out on a formal basis. Finally, comparison or interoperation with other models is
difficult, if not impossible.
Figure 3.1: Modeling Paradigm (diagram: at the modeling level, the modeling process applies a model to produce an application schema at the schema level)
3.1.2 Components of A Model
To solve the problems pertaining to the modeling paradigm discussed above, a formal
specification of a model is needed before embarking on the modeling process. To
formally specify a model requires the understanding of its components.
A model,
basically, consists of the following components:
1. Model constructs:
Model constructs are a set of high-level building blocks for defining real world
systems. Based on different abstraction' views, the constructs of a model (called
model constructs) can be classified into three types:
' An abstraction is a mental process used to select some characteristics and properties of a set of objects
and exclude other characteristics that are not relevant [BCN91].
•
Structural constructs: real world systems are viewed as a set of data, data
properties, and data relationships.
•
Behavioral constructs: real world systems are viewed as behavior of data.
•
Constraint constructs: real world systems are viewed as a set of logical
restrictions on the data existing below the application schema level. Constraints
specify data that are considered permissible [QW86, SK86,1091]. Constraints in
an application schema can be classified into two types: implicit and explicit
constraints [B78, LS83, UL93, EN94]. Implicit constraints are implied by the
semantics of model constructs when instantiating model constructs into an
application schema. For example, in a relational model, specifying that an
attribute is the key of a relation implies a constraint on the attribute's values, a
unique value for each tuple. However, implicit constraints are not capable of
capturing all the constraints that may occur in an application domain. As a result,
additional constraints, called explicit constraints, need to explicitly be specified
in an application schema. The constraint constructs of a model are not intended
to represent implicit constraints; rather they are used to represent only explicit
constraints.
2. Model constraints:
Every model has a set of built-in constraints (called model constraints) associated
with the model constructs. Analogous to constraints in an application schema, which specify the permissible instantiations from an application schema to data, model constraints are used to verify whether an application schema is permissible. For example, in a relational data model, there exists a model constraint demanding that every relation must have a unique name. Accordingly, duplicate relation names are not allowed in an application schema (see the sketch after this list).
3. Model semantics:
Model semantics ascertain the semantics of the model constructs (e.g., the meaning
of a key) which will be instantiated as implicit constraints of application schemas.
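As a concrete, purely illustrative reading of these three components, the sketch below checks one model constraint against an application schema and instantiates one piece of model semantics as implicit application constraints, using the relational examples mentioned in this list; the encoding of schemas and constraints is hypothetical.

# Hypothetical sketch: one model constraint (unique relation names) and one piece of
# model semantics (a key implies unique values per tuple) applied to an application schema.

application_schema = {
    "relations": [
        {"name": "Employee", "attributes": ["emp_id", "name"], "key": ["emp_id"]},
        {"name": "Department", "attributes": ["dept_id", "dname"], "key": ["dept_id"]},
    ]
}

def model_constraint_unique_relation_names(schema):
    """Model constraint: every relation must have a unique name."""
    names = [r["name"] for r in schema["relations"]]
    return len(names) == len(set(names))

def implicit_constraints_from_key_semantics(schema):
    """Model semantics of 'key', instantiated as implicit application constraints."""
    return ["values of " + "+".join(r["key"]) + " are unique in relation " + r["name"]
            for r in schema["relations"]]

print(model_constraint_unique_relation_names(application_schema))  # True
print(implicit_constraints_from_key_semantics(application_schema))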
3.1.3 Metamodeling Paradigm
The need for formally specifying a model extends the modeling paradigm into the
metamodeling paradigm, as shown in Figure 3.2. In the metamodeling paradigm, a
metamodel which provides metamodel constructs is employed to formally define the
specification of a model (i.e., the specification of its three components: model constructs,
model constraints, and model semantics). The conceptualization process of a model is
called metamodeling, while the formal specification of a model is called the model
schema, which serves as the formal specification in the modeling process and provides
the basis for representing the application schema.
Figure 3.2: Metamodeling Paradigm (diagram: at the modeling level, metamodeling applies a metamodel to produce a model schema; modeling then applies the model to produce an application schema at the schema level)
As shown in the rightmost column of Figure 3.2, a two-level schema hierarchy
associated by an instantiation relationship exists in the metamodeling paradigm: model
schema and application schema levels. Figure 3.3 provides a detailed view of this
schema hierarchy.
A model schema includes the formal specification of model
constructs, model constraints, and model semantics of a model.
As
mentioned in the previous subsection, model constraints specify logical restrictions on
model constructs, while model semantics define the semantics of model constructs.
When model constructs are instantiated from the model schema into an application
schema, the instantiations need to be verified against model constraints.
Valid
instantiations of model constructs will in turn trigger the instantiations of model
semantics into implicit constraints of the application schema. Consisting of both explicit
and implicit constraints, application constraints define logical restrictions on application
constructs and are used to ensure that every extension of the application schema is valid.
Figure 3.3: Schema Hierarchy in Metamodeling Paradigm (diagram: a model schema consists of model constructs, model constraints, and model semantics; model constraints verify the instantiation of model constructs into application constructs, and model semantics are instantiated as implicit constraints, which together with explicit constraints form the application constraints of the application schema)
The advantages of extending the modeling paradigm to the metamodeling paradigm
become obvious.
Formally defined model constructs, along with their associated
constraints and semantics, will reduce if not eliminate ambiguity related to the meaning
and use of its model constructs. Verification of an application schema of the model can
be achieved on a formal basis because the model constraints are formally specified and
available in the model schema. Moreover, the comparison and interoperation of different
models can also be performed at the model schema level.
3.1.4 Requirements of Metamodel
An unanswered question in the metamodeling paradigm is "what should be the
components of a metamodel?" If a model resulting from the metamodeling process is
considered as an application of a metamodel, this question becomes identical to "what
are the components of a model" which has been answered in Section 3.1.2. Accordingly,
a metamodel is required to provide the following metamodel constructs essential to the
metamodeling process:
1. Structural constructs to be used to model the structural aspect of the three types of
model constructs.
2. Behavioral constructs to be used to capture the behavior of model constructs.
3. Constraint constructs to be used to represent model constraints.
In addition, a metamodel should contain specific metamodel constraints on metamodel
constructs and metamodel semantics defining the semantics of metamodel constructs.
3.1.5 Extended Metamodeling Paradigm
As mentioned earlier, the modeling paradigm (in Figure 3.1) is extended to the
metamodeling paradigm (in Figure 3.2) to address the need for formal specification of
models. The same analogy should be applied to the metamodel level. In Figure 3.2, the
metamodel used in the metamodeling process to formalize a model is not formally
defined. However, without the formal specification of the metamodel, the metamodeling
process has the same problems as the modeling process in the modeling paradigm. Thus,
the metamodel schema also should be formally defined for use in the metamodeling
process.
The conceptualization process of the metamodel is called the meta-
metamodeling. If a new meta-metamodel is adopted for the meta-metamodeling process,
another process, called meta-meta-metamodeling, would be required to formally specify
the meta-metamodel. The process could continue to infinity if there were no way to stop
it at some level. This can be done by using the same model for a specific level and its
next higher level [BF92]. Since the main interest of this dissertation research is at the
model level, there is no need to go beyond the metamodel level, at which the process
should be terminated by using the metamodel as the meta-metamodel. In other words,
the metamodel will formally be specified by its own constructs. Figure 3.4 depicts this
extended metamodeling paradigm in which the process terminates at the metamodel
level.
Figure 3.4: Extended Metamodeling Paradigm (diagram: meta-metamodeling produces the metamodel schema, metamodeling produces the model schema, and modeling produces the application schema, each level being an instantiation of the next higher level; the metamodel serves as its own meta-metamodel)
The extended metamodeling paradigm expands the schema hierarchy from two levels to
three levels with an additional metamodel schema level. Figure 3.5 depicts the upper two levels in the schema hierarchy. As with the two types of application constraints at the application
schema level, model constraints consist of explicit model constraints and implicit model
constraints. Implicit model constraints are implied by and derived from the semantics of
metamodel constructs when instantiating metamodel constructs into model constructs in
a model schema, while explicit model constraints refer to those not captured by implicit
model constraints but which can be explicitly specified in a model schema.
The
components and their relationships in the metamodel schema are almost identical to
those in a model schema since the domains of both schemas are models (in spite of their
difference in role, one being the metamodel while the other being metamodeled). There
is an additional instantiation relationship between metamodel semantics and implicit
metamodel constraints in the metamodel schema. This instantiation relationship reflects
the decision that the termination process in the extended metamodeling paradigm is the
meta-metamodeling process in which the metamodel is specified by its own constructs.
Consequently, the metamodel semantics will be instantiated as the implicit metamodel
constraints of the metamodel schema. On the other hand, since the schema in the lower
level is an instantiation of that in the next higher level in the schema hierarchy,
relationships between the model schema level and the metamodel schema level (as
shown in Figure 3.5) are the same as those between the application schema level and the
model schema level (as shown in Figure 3.3). Thus, the validity of a model schema is
governed by metamodel constraints while instantiations from metamodel semantics into
implicit model constraints are triggered by valid instantiations from
metamodel
constructs into model constructs.
Figure 3.5: Schema Hierarchy in Extended Metamodeling Paradigm (diagram: the metamodel schema consists of metamodel constructs, metamodel constraints (explicit and implicit), and metamodel semantics; metamodel constraints verify the instantiation of metamodel constructs into model constructs, and metamodel semantics are instantiated as implicit model constraints, which together with explicit model constraints form the model constraints of the model schema)
3.2 Metamodel: Synthesized Object-Oriented Entity-Relationship Model
The extended entity-relationship (EER) model, object-oriented (OO) model, and first
order logic (FOL) are prevalent alternatives to the metamodel in the literature [SZ93].
The EER model and its variants provide a set of semantics-rich structural constructs, but
they lack the behavioral and constraint constructs required by a metamodel, as defined in
section 3.1.4.
The OO model and its variants are capable of describing both the structural and behavioral aspects of models. However, the semantics-richness of their structural constructs is less than that provided by the EER model. Like the EER model, the OO model does not provide the declarative constraint specification capability, so by
itself it cannot completely satisfy the requirements of a metamodel.
The FOL
representation consists of a set of well-formed formulas. The main strength of the FOL
lies in its declarative constraint specification capability, which also permits structural
modeling. However, representing the structural constructs in well-formed formulas has
been found difficult [SZ93]. Furthermore, the behavioral aspect of models cannot be represented using FOL. Therefore, FOL alone cannot completely satisfy the
requirements of a metamodel either.
Although none of the discussed models alone can fully satisfy the requirements of a
metamodel, each has its unique strength in meeting the requirements of a metamodel.
An immediate solution is to integrate or synthesize these three models in such a way that
the weakness of one can be complemented by the strengths of others. The Synthesized
Object-Oriented Entity-Relationship (SOOER) model [LW91] represents an effort
toward the desired approach by synthesizing and extending the concepts and notations
belonging to the families of OO and EER models to provide the necessary constructs to
model the structural, behavioral, semantic constraint and heuristic knowledge pertaining
to coupled knowledge-base/database systems. In the SOOER model, a production rule is
employed as the representation of the constraint construct, but it has limited capability to
express complicated constraints. In this section, an FOL-based constraint construct of
the SOOER model will be proposed to replace the rule-based constraint construct. By
integrating the semantics-rich structural constructs from the EER model, the behavioral constructs from the OO model, and the declarative constraint constructs from FOL, the SOOER model can satisfy the requirements of a metamodel, making it an appropriate choice for the metamodel. In the following, the constructs of the SOOER model are first described, followed by a detailed discussion of its constraint specifications.
3.2.1 Constructs of the SOOER Model
The constructs of the SOOER model provide the means of representing properties,
relationships, behaviors, and constraints of data of interest.
The main structural
constructs of the SOOER model include entity classes and relationship classes.
Constructs of attributes, methods, and constraints are encapsulated in entity classes and
relationship classes. The graphical notations of the structural model constructs adopted
by the SOOER model are summarized in Figure 3.6.
Figure 3.6: Graphical Notations of the SOOER Model Constructs (table of graphical notations for the entity class, single-valued and multi-valued attributes, the identifier, the association relationship with (min, max) cardinalities, the specialization relationship between a superclass and its subclasses, and the aggregation relationship between an assembly class and its component classes with (min, max) cardinalities)
Entity class: An entity class is an abstraction of a group of objects which have common
characteristics (attributes), behavior (methods), and relationships with other objects, and
share the same set of semantic constraints. Therefore, the constituents of an entity class
include attributes, methods, and constraints. A subset of its attributes will be designated
as the identifier of the entity class. Furthermore, an entity class may have relationships
with other entity classes.
Relationship class: A relationship class is a logical connection between or among entity
classes.
Three types of relationship classes are supported by the SOOER model: specialization, aggregation, and association. Each type of relationship class has its own distinct semantics and is described as follows.
1. Specialization relationship class: A specialization relationship class categorizes a
general entity class into one or many specialized entity classes. The general entity
class serves as a superclass and each specialized entity class is a subclass of its
general entity class. Specialization relationships are transitive: if A is a subclass of B
and B is a subclass of C, then A is a subclass of C. One important mechanism of
specialization relationship classes is inheritance, by which a subclass inherits
properties (i.e., attributes, methods, constraints, and relationships) from
its
superclass. A specialization relationship class is characterized by completeness and
disjointness properties. The completeness property of a specialization relationship
class is total if the aggregation of all objects of the superclass is the same as the
union of all objects of its subclasses; otherwise, the specialization relationship is
partial. The disjointness property of a specialization relationship is disjoint if there
is no overlapping between the objects of any two subclasses; otherwise, the
specialization relationship class is said to be overlapped.
2. Aggregation relationship class: An aggregation relationship class indicates that a component entity class is a-part-of an assembly entity class. An aggregation relationship class often implies an existence dependency between the assembly entity class and its component entity class; that is, when an object of the assembly entity class is deleted, all of its component objects are also deleted. Another property of an aggregation relationship class, called operation propagation, states that an operation performed on an assembly object is usually propagated to its component objects. Like specialization relationship classes, aggregation relationship classes are transitive. The participation of a component entity class in an aggregation relationship class is characterized by its minimal and maximal cardinalities. The minimal cardinality of a component entity class specifies the minimum number of objects of this entity class required to contribute to an object of the assembly entity class, while the maximal cardinality specifies the maximum number of objects of this component entity class that can contribute to an object of the assembly entity class.
3. Association relationship class: An association relationship class can be a unary, a
binary or an n-ary relationship, in which n otherwise independent entity classes are
related to one another. Similar to an entity class, an association relationship class
may have its own attributes, methods, and constraints.
To distinguish one
association relationship class from others, each association relationship class is
named uniquely. Each entity class that participates in an association relationship
class plays a distinct role in the relationship. If there is no confusion with roles in an
association relationship class, role names can be omitted. Furthermore, participation
of each entity class related to an association relationship class is again constrained by
minimal and maximal cardinalities.
The minimal cardinality of an entity class
participating in an association relationship class indicates the minimum number of
association relationship instances in which an object of the entity class must
participate, while the maximal cardinality specifies the maximum number of
association relationship instances in which an object of the entity class can
participate.
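To make these structural constructs concrete, the following sketch gives one possible in-memory representation of SOOER entity classes and the three kinds of relationship classes. The Python names (EntityClass, Association, Cardinality, and so on) and the Employee/Department example are illustrative assumptions, not part of the SOOER definition.

from dataclasses import dataclass, field
from typing import List, Tuple, Union

# (min, max) participation; max may be an integer or the symbol 'm' for "many"
Cardinality = Tuple[int, Union[int, str]]

@dataclass
class EntityClass:
    name: str
    attributes: List[str] = field(default_factory=list)   # attribute names
    identifier: List[str] = field(default_factory=list)   # subset of attributes
    methods: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)  # FOL constraint strings

@dataclass
class Association:
    name: str
    # one entry per participant: (entity class, role name, (min, max))
    participants: List[Tuple[EntityClass, str, Cardinality]] = field(default_factory=list)

@dataclass
class Specialization:
    superclass: EntityClass
    subclasses: List[Tuple[EntityClass, Cardinality]]
    total: bool = False      # completeness property
    disjoint: bool = True    # disjointness property

@dataclass
class Aggregation:
    assembly: EntityClass
    components: List[Tuple[EntityClass, Cardinality]]

# hypothetical example: employees work for departments
employee = EntityClass("Employee", ["Emp-Id", "Name"], identifier=["Emp-Id"])
department = EntityClass("Department", ["Dept-Id", "Name"], identifier=["Dept-Id"])
works_for = Association("Works-For",
                        [(employee, "worker", (1, 1)), (department, "employer", (0, "m"))])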
Attribute: An attribute may be either atomic or composite (i.e., decomposable into a set
of subattributes which themselves may be either atomic or composite). An attribute can
be stored or derived. A stored attribute of an entity class or relationship class (in the
following, a generic term "class" is used for both types of classes) explicitly contains
value(s) for each object of the class.
A derived attribute, which captures interrelationships among attribute values caused by functional composition or heuristic knowledge, derives its value for a particular object from the value(s) of other attribute(s) of the same or different objects. Unlike the original SOOER model [LW91], derived attributes rather than an additional rule construct are used to model heuristic knowledge. The advantages of using derived attributes for this purpose include uniform treatment of information (stored or derived) and a reduction in the number of model constructs, resulting in lower model complexity. The derivation form of a derived attribute may contain arithmetic operations, IF-THEN rules, or a combination of both. An attribute is characterized by its data type, multiplicity, uniqueness, and null-specification properties, and may have a default value. On the multiplicity dimension, an attribute may be single-valued (i.e., an object has at most one value for this attribute) or multi-valued (i.e., an object may have more than one value for this attribute).
Regarding the uniqueness property of an attribute, if an attribute of a class is specified as "unique", the value of this attribute is unique across all objects of the class; otherwise, duplication is allowed. The null-specification of an attribute specifies whether a null value can be assumed by this attribute. Moreover, if the values (or composite values) of an attribute or a set of attributes of an entity class are distinct for each object of the entity class, that attribute or set of attributes can serve as the identifier of the entity class.
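As an illustration of these attribute properties, the sketch below (again an assumption made only for illustration) represents an attribute with its multiplicity, uniqueness, and null-specification, plus an optional derivation that combines an IF-THEN rule with arithmetic.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class AttributeSpec:
    name: str
    data_type: str
    multi_valued: bool = False      # multiplicity
    unique: bool = False            # uniqueness
    nullable: bool = True           # null-specification
    default: object = None
    derive: Optional[Callable[[dict], object]] = None  # rule for a derived attribute

# hypothetical derived attribute: shipping cost derived from a Weight attribute
shipping_cost = AttributeSpec(
    name="Shipping-Cost", data_type="real",
    derive=lambda o: 0.0 if o["Weight"] <= 1.0 else round(2.5 + 0.8 * o["Weight"], 2))

print(shipping_cost.derive({"Weight": 4.0}))   # 5.7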
Methods: Methods of a class define the behavior of the class and in turn define the
behavior of all objects of the class. Following the object-oriented paradigm, the interface of a method includes the name of the method (i.e., its signature), its parameter(s), and its return type if applicable. A method of a class can be either an object method, which is applied to an individual object of the class, or a class method, which is applied to the class as a whole.
Constraints: Constraints define logical restrictions on the constructs mentioned above.
Due to the declarative nature and expressive power of first-order logic (FOL) and its variations, most of the constraint specification languages proposed, including ALICE [U89], CE [M84, M86], and ERC+ [T93], are FOL-based. Thus, an FOL-based constraint construct of the SOOER model will be developed and discussed in the next subsection.
3.2.2 Constraint Specification in the SOOER Model
Constraints, associated with classes, are well-formed first-order logic formulae.
Extending the FOL [W92], the basic building blocks of constraints are variables, path
expressions, and function expressions on variables or path expressions. Upon these basic
building blocks, terms, atomic formulae and finally well-formed formulae are specified.
These terminologies are defined as follows.
Definition: Variable
A variable provides a way to reference an object of a class (an entity class or a
relationship class), a set of objects of a class, or an attribute. Accordingly, a variable can
be one of the following three forms:
1. Instance variable: refers to an object of a class. The class can be a directly-referenced
class or a path-referenced class (which will be defined next). An instance variable
can be quantified universally (∀ or all) or existentially (∃ or exists). The expression of an instance variable with its domain is expressed as:
∀ instance-variable: class | path-referenced-class or
∃ instance-variable: class | path-referenced-class
where | denotes "or". The first expression defines a universally quantified instance variable, which denotes each object of a particular class or path-referenced-class. The second expression defines an existentially quantified instance variable, which denotes that there exists an object of a particular class or path-referenced-class.
2. Instance-set variable: refers to a set of objects of a class as a whole. Again, the
domain of an instance-set variable can be a directly-referenced or path-referenced
class. An instance-set variable cannot be quantified by an existential or universal
quantifier. The expression of an instance-set variable with its domain is expressed
as:
instance-set-variable = class | path-referenced-class
which states that the instance-set-variable refers to the set of objects of a particular
class or a path-referenced-class.
3. Attribute variable: is bound to an attribute value of an object of a class. How binding is performed depends on whether an attribute variable is quantified universally or
existentially. Furthermore, if an attribute variable refers to a composite attribute, the
aggregation of its subattributes will be the domain of the attribute variable. Since an
attribute can only be visited from an object of a class (which is expressed as an
instance variable) followed by a sequence of attribute links with or without traversing
through other classes, the attribute referenced by an attribute variable is always a
path-referenced-attribute. The expression of an attribute variable with its domain is
expressed as:
∀ attribute-variable: path-referenced-attribute or
∃ attribute-variable: path-referenced-attribute
The first expression defines a universally quantified attribute variable which denotes
each value of the path-referenced-attribute. An existentially quantified attribute variable, defined in the second expression, expresses that there exists a value of the path-referenced-attribute.
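For illustration, assume a hypothetical entity class Employee with attributes Name and Salary (this schema is not part of the SOOER definition). The three variable forms could then be written as:

∀ e: Employee (an instance variable ranging over every Employee object),
E = Employee (an instance-set variable bound to the set of all Employee objects), and
∃ s: e.Salary (an attribute variable bound to some salary value of the object denoted by e).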
Definition: Path Expression
Let t be an instance variable on the class T. A path expression, denoted by t.A1.A2...An (t is the origin of the path, while An is the terminal of the path), refers to a path in a SOOER schema satisfying the following statements for each j ∈ {1, ..., n}:
1. If Aj is an attribute, then Ak is an attribute of Ak-1 for each k ∈ {j+1, ..., n}.
2. If Aj is an entity class, then Aj is the end of the path or there exists Aj+1 which is either an attribute of Aj or a relationship in which Aj participates.
3. If Aj is an association relationship class, then Aj is the end of the path or there exists Aj+1 which is either an attribute of Aj or an entity class participating in Aj.
4. If Aj is a specialization or aggregation relationship class, then Aj+1 is an entity class participating in Aj.
The above statements for a path essentially have the following properties:
1. A path starts from an instance variable.
2. The terminal of a path is an entity class, an association relationship class, or an
attribute, but can never be a specialization or an aggregation relationship class.
3. If a path ends at an entity class or an association relationship class, the terminal class
is called a path-referenced-class. For each object of T (i.e., an object denoted by t),
the path returns a set of objects of the terminal class associated to t via this path.
4. If a path ends at an attribute, the attribute is called the path-referenced-attribute. For
each object of T (i.e., an object denoted by t), the path returns a value or a set of
values of the terminal attribute associated to t via this path. Whether it returns a
single value or a set of values depends on the maximum cardinality of each entity
class and the multiplicity of the attribute involved in the path.
If all maximal cardinalities involved in the path are 1 and the multiplicity of the attribute is single-valued, then the path returns a single value; otherwise, it returns a set of values.
The path expression t.A1.A2...An is an abbreviated form in which the role name of an entity class participating in its subsequent association relationship class in the path is omitted. In the full expression, to express an association relationship class in a path, the preceding and subsequent entity classes of the association relationship should be signified by particular role names indicating how the path traverses this association relationship class. In other words, if Aj-1.Aj.Aj+1 is part of a path where Aj is an association relationship class and Aj-1 and Aj+1 are entity classes, it is represented as Aj-1.rolej-1->Aj->rolej+1.Aj+1, where the preceding entity class Aj-1 takes the role rolej-1 and the subsequent entity class Aj+1 takes the role rolej+1. Role names are not necessary in an association relationship class when all the participating entity classes are distinct.
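As a worked example under the hypothetical Employee/Department schema used earlier, where Employee participates in the association Works-For with role worker and Department participates with role employer, the abbreviated path expression

e.Works-For.Department.Name

has the full form

e.worker->Works-For->employer.Department.Name

For an object denoted by the instance variable e, this path returns the name(s) of the department(s) associated with that employee; it returns a single value only if the maximal cardinality of Employee in Works-For is 1 and Name is single-valued.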
Definition: Function Expression
Variables, entity classes, association relationship classes, and path expressions (path-referenced-classes or path-referenced-attributes) can be manipulated by functions to derive aggregate information. The most commonly used functions are summarized in
Table 3.1.
Function: min()
  Input Parameter: an atomic path-referenced-attribute
  Description: returns the minimum value of a set of attribute values
Function: max()
  Input Parameter: an atomic path-referenced-attribute
  Description: returns the maximum value of a set of attribute values
Function: sum()
  Input Parameter: an atomic, numerical path-referenced-attribute
  Description: returns the summation of a set of attribute values
Function: avg()
  Input Parameter: an atomic, numerical path-referenced-attribute
  Description: returns the average of a set of attribute values
Function: count()
  Input Parameter: an instance-set variable, path-referenced-class, or path-referenced-attribute
  Description: returns the total number of objects (or values) in a set without duplication elimination
Function: countdistinct()
  Input Parameter: an instance-set variable, path-referenced-class, or path-referenced-attribute
  Description: returns the total number of objects (or values) in a set with duplication elimination
Table 3.1: Functions for Function Expression in SOOER Constraint Specification
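As a brief illustration (still under the hypothetical Employee/Department schema assumed above), the function expression count(e.Works-For.Department) returns the number of departments associated with the employee denoted by e, and avg(d.employer->Works-For->worker.Employee.Salary) returns the average salary of the employees working for the department denoted by d.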
Definition: Term and Atomic Formulae
A term is either a variable, a path expression, a function expression, a constant, or a set of constants, while atomic formulae are constructed by composition of terms as follows:
• A term is an atomic formula.
• If f1 and f2 are atomic formulae, then f1 φ f2 is also an atomic formula, where φ is an arithmetic operator, a value-comparison operator, a set-membership operator (∈), a set-comparison operator (=, ⊂, ⊆, ⊃, ⊇), or a set operator (∪, ∩, −).
Definition: Well-Formed Formulae
Well-formed formulae are constructed by composition of atomic formulae as follows:
• An atomic formula is a well-formed formula.
• If f, f1, and f2 are well-formed formulae, then ¬f, f1 ∨ f2, f1 ∧ f2, and f1 => f2 are also well-formed formulae, where "¬" stands for not, "∨" for or, "∧" for and, and "=>" for implies.
3.3 Metamodel Schema and Model Definition Language
As shown in Figure 3.4, the metamodel (i.e., the SOOER model) needs to be formally
specified through the meta-metamodeling process to derive the metamodel schema which
serves as the formal specification representation for modeling models (i.e., through the
metamodeling process). The meta-metamodeling process is similar to the modeling
process except that the target of the meta-metamodeling process is the metamodel and
the formal specification representation is again the metamodel. As depicted in Figure
3.5, the goals of the meta-metamodeling process are to formally define the four
components of the metamodel schema: metamodel constructs, explicit metamodel
constraints, implicit metamodel constraints and metamodel semantics. Therefore, the
meta-metamodeling process can be considered as the process of structurally modeling
metamodel constructs, specifying explicit metamodel constraints, defining metamodel
semantics, and instantiating metamodel semantics into implicit metamodel constraints.
3.3.1 Metamodel Constructs in Metamodel Schema
The metamodel constructs are structurally modeled in the metamodel schema and graphically illustrated by Figure 3.7.
91
hss^subattnbute
subattribute
/\
sy
Constraint
Uniqueaess
Entity
Class
Attnbute
Null-spw3/
^^ecificatiq^
composite
(O.m) (O.m)
Name ^Xco.m)
Data-type denved
Min-card
Max«card
Relationship
Class
(0,m)
base
compose
denved'froni
Method
^^rivatioo-ni
Association
Identifier
superclass
ecla ratio
subclass
Specialization
^mplementatiq^
Aggregation
Min-card
""Max-cai^
"
component
Figure 3.7: Metamodel Constructs in Metamodel Schema
Explanation:
As shown in Figure 3.7, the metamodel provides two main metamodel constructs: Entity-Class and Relationship-Class. These two metamodel constructs are described by a name
property because when metamodeling a model (in the metamodeling process) a name
needs to be specified for each model construct which is an instance of Entity-Class or
Relationship-Class (e.g., the relation model construct in the relational data model is an
instance of Entity-Class with the name "Relation"). Since the two metamodel constructs
share a common property and relationships to other metamodel constructs (which will be
explained later), a superclass called Class is created in the metamodel schema for these
two metamodel constructs.
Relationship-Class is further specialized into three
metamodel constructs: Association, Specialization, and Aggregation. An association relationship relating two or more possibly identical entity-classes, and an entity-class optionally associating with one or more association relationships, are described by a relationship between the metamodel constructs Association and Entity-Class in the metamodel schema. Each link between an association relationship and an entity-class can be further depicted by a minimal cardinality, a maximal cardinality, and a role taken by the entity-class when participating in the association relationship.
On the other hand, the metamodel supports the constructs for modeling attributes,
constraints, and methods, as defined in Section 3.2.1. Three metamodel constructs are
included in the metamodel schema: Attribute, Constraint, and Method.
Since each
entity-class or relationship-class may contain attributes, methods and constraints, an
aggregation relationship is created between the metamodel construct Class as its
aggregate-class and the metamodel constructs Attribute, Constraint, and Method as its
component-classes. The optional inclusions of attribute, constraint, and method by an
entity-class or a relationship-class are described by zero minimal cardinalities on the
component-classes of the aggregation relationship. The metamodel construct Attribute is
described with name, multiplicity, uniqueness, and null-spec. When a model construct
which is an instance of Attribute is created in the model schema during the metamodeling process, the name of this model construct and whether it is single- or multi-valued, unique or duplicable, and null-allowed or disallowed need to be specified for the next-level instantiation (i.e., when this model construct is instantiated into an application schema).
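The layering just described (metamodel constructs instantiated into model constructs, which are in turn instantiated into application constructs) can be sketched as follows; the dictionary layout and the relational names used are assumptions for illustration and are not the formal metamodel schema.

# Metamodel level: metamodel constructs and their properties.
metamodel_schema = {
    "Entity-Class": ["Name"],
    "Attribute": ["Name", "Multiplicity", "Uniqueness", "Null-Spec"],
}

# Model level (metamodeling): the relational construct Relation is an instance of
# Entity-Class, and the relational construct Attribute is an instance of Attribute.
relational_model_schema = {
    "Relation": {"instance-of": "Entity-Class", "Name": "Relation"},
    "Relational-Attribute": {"instance-of": "Attribute", "Name": "Attribute",
                             "Multiplicity": "single-valued",
                             "Uniqueness": "not-unique", "Null-Spec": "not-null"},
}

# Application level (modeling): a concrete relation is an instance of Relation.
application_schema = {
    "Patient": {"instance-of": "Relation",
                "attributes": ["name", "patient-id", "address"]},
}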
3.3.2 Explicit Metamodel Constraints in Metamodel Schema
Since explicit metamodel constraints define logical restrictions on metamodel constructs and will be used to verify instantiations of metamodel constructs into a model schema as shown in Figure 3.5, explicit metamodel constraints generally take the form:
∀instance-variable: metamodel construct ... =>
    assertion on path-referenced-attributes
The general form consists of two parts, an antecedent and an assertion, separated by =>. If the antecedent defined on each instance of a particular metamodel construct (i.e., each model construct which is an instance of this metamodel construct) is true, the assertion on an attribute reachable from the instance (i.e., a property reachable from the model construct) must be satisfied. Table 3.2 lists the explicit metamodel constraints of the metamodel SOOER.
Metamodel Construct: Class
• ∀c: Class, ∀a: c.Identifier.compose.Attribute =>
  a.Null-Spec = 'not-null' ∧ a.Multiplicity = 'single-valued'
  (Identifier attributes must be not-null and single-valued)
• ∀c: Class, count(c.Identifier) = 1 =>
  c.Identifier.compose.Attribute ⊆ c.Attribute
  (Inclusion constraint on identifier and attributes)
Metamodel Construct: Relationship-Class
• ∀r: Relationship-Class => r.Type ∈ {'Association', 'Aggregation', 'Specialization'}
  (Domain constraint on Type)
Metamodel Construct: Attribute
• ∀a: Attribute => a.Uniqueness ∈ {'unique', 'not-unique'}
  (Domain constraint on Uniqueness)
• ∀a: Attribute => a.Null-Spec ∈ {'null', 'not-null'}
  (Domain constraint on Null-Spec)
• ∀a: Attribute => a.Multiplicity ∈ {'single-valued', 'multi-valued'}
  (Domain constraint on Multiplicity)
Metamodel Construct: Association
• ∀a: Association, ∀r: a.relate => r.Min-Card ≥ 0
  (Domain constraint on Min-Card)
• ∀a: Association, ∀r: a.relate => r.Max-Card > 0 ∨ r.Max-Card = 'm'
  (Domain constraint on Max-Card)
• ∀a: Association, ∀r: a.relate, r.Max-Card ≠ 'm' => r.Min-Card ≤ r.Max-Card
  (Domain constraint on Min-Card and Max-Card)
• ∀a: Association, ∀r1: a.relate, ∀r2: a.relate, r1 ≠ r2 => r1.Role ≠ r2.Role
  (Unique role name for each association)
Metamodel Construct: Specialization
• ∀a: Specialization, ∀s: a.subclass => s.Min-Card ≥ 0
  (Domain constraint on Min-Card)
• ∀a: Specialization, ∀s: a.subclass => s.Max-Card > 0 ∨ s.Max-Card = 'm'
  (Domain constraint on Max-Card)
• ∀a: Specialization, ∀s: a.subclass, s.Max-Card ≠ 'm' => s.Min-Card ≤ s.Max-Card
  (Domain constraint on Min-Card and Max-Card)
Metamodel Construct: Aggregation
• ∀a: Aggregation, ∀s: a.component => s.Min-Card ≥ 0
  (Domain constraint on Min-Card)
• ∀a: Aggregation, ∀s: a.component => s.Max-Card > 0 ∨ s.Max-Card = 'm'
  (Domain constraint on Max-Card)
• ∀a: Aggregation, ∀s: a.component, s.Max-Card ≠ 'm' => s.Min-Card ≤ s.Max-Card
  (Domain constraint on Min-Card and Max-Card)
Table 3.2: Explicit Metamodel Constraints in Metamodel Schema
Explanation:
In Table 3.2, the first explicit metamodel constraint for the metamodel construct Class defines that for every model construct which is an instance of the Class (i.e., every instance of the Class, denoted by ∀c: Class) and for each attribute which is part of the identifier of this instance (expressed as ∀a: c.Identifier.compose.Attribute), this attribute must be declared as 'not-null' and 'single-valued' (denoted by a.Null-Spec = 'not-null' ∧ a.Multiplicity = 'single-valued'). The second explicit metamodel constraint for the metamodel construct Class asserts that for every model construct which is an instance of the Class, if the model construct has an identifier (expressed as count(c.Identifier) = 1), then the attributes included in the identifier must be a subset of all attributes of the model construct (denoted by c.Identifier.compose.Attribute ⊆ c.Attribute). The first explicit metamodel constraint for the metamodel construct Attribute states that for every instance of the Attribute (i.e., any model construct which is an instance of the Attribute), its uniqueness specification can only be either 'unique' or 'not-unique'.
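A minimal sketch of how the first explicit metamodel constraint on Class could be checked mechanically against a model schema is given below; the dictionary layout of the model schema and the function name are assumptions made only for illustration.

def check_identifier_constraint(model_schema):
    """For every model construct (instance of Class), every attribute in its
    identifier must be declared 'not-null' and 'single-valued'."""
    violations = []
    for construct_name, construct in model_schema.items():
        for attr_name in construct.get("identifier", []):
            attr = construct["attributes"][attr_name]
            if attr["null-spec"] != "not-null" or attr["multiplicity"] != "single-valued":
                violations.append((construct_name, attr_name))
    return violations

# hypothetical model schema fragment for a relational data model
relational_model = {
    "Relation": {
        "attributes": {"name": {"null-spec": "not-null",
                                "multiplicity": "single-valued",
                                "uniqueness": "unique"}},
        "identifier": ["name"],
    }
}

print(check_identifier_constraint(relational_model))   # [] means no violations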
3.3.3 Metamodel Semantics in Metamodel Schema
Metamodel semantics define the semantics of metamodel constructs and will be
instantiated as implicit model constraints in a model schema.
Thus, metamodel
semantics usually follow the form of:
∀instance-variable1: metamodel construct ... =>
    (∀instance-variable2: instance-variable1 ... =>
        assertion on path-referenced-attributes from instance-variable2)
In this general form of metamodel semantics, the outer antecedent (i.e., ∀instance-variable1: metamodel construct ...) refers to an instance of a particular metamodel construct (i.e., a model construct which is an instance of the metamodel construct). The inner antecedent (i.e., ∀instance-variable2: instance-variable1 ...) and its assertion (i.e., "assertion on path-referenced-attributes from instance-variable2") refer to instances of the model construct declared in the outer antecedent (i.e., application constructs which are instances of the model construct). In fact, the semantics of a metamodel construct is defined in the assertion of the inner antecedent. The interpretation of this form is: for each instance (model construct) of a particular metamodel construct, and for each instance of that model construct, the assertion on the instance of the model construct must be satisfied.
satisfied. Using this expression, the metamodel semantics of the metamodel SOOER are
listed in Table 3.3.
Metamodel Construct: Class
• ∀c: Class, ∀a: c.Attribute, a.Uniqueness = 'unique' =>
  (∀o1: c, ∀o2: c, o1 ≠ o2 => o1.a ≠ o2.a)
  (Semantics of Uniqueness)
• ∀c: Class, ∀a: c.Attribute, a.Null-Spec = 'not-null' =>
  (∀o1: c => count(o1.a) ≥ 1)
  (Semantics of Null-Spec)
• ∀c: Class, ∀a: c.Attribute, ∀b: a.composite->Has-subattribute->subattribute.Attribute,
  b.Null-Spec = 'not-null' =>
  (∀o1: c, ∀a1: o1.a.b => count(a1) ≥ 1)
  (Semantics of Null-Spec)
• ∀c: Class, ∀a: c.Attribute, a.Multiplicity = 'single-valued' =>
  (∀o1: c => count(o1.a) ≤ 1)
  (Semantics of Multiplicity)
• ∀c: Class, ∀a: c.Attribute, ∀b: a.composite->Has-subattribute->subattribute.Attribute,
  b.Multiplicity = 'single-valued' =>
  (∀o1: c, ∀a1: o1.a.b => count(a1) ≤ 1)
  (Semantics of Multiplicity)
Metamodel Construct: Association
• ∀a: Association, ∀r1: a.relate, ∀r2: a.relate, r1.Role ≠ r2.Role,
  c1 := r1.Entity-Class, c2 := r2.Entity-Class =>
  (∀o1: c1 => count(o1.r1.Role->a->r2.Role.c2) ≥ r2.Min-Card)
  (Semantics of Min-Card of Unary and Binary Associations)
• ∀a: Association, ∀r1: a.relate, ∀r2: a.relate, r1.Role ≠ r2.Role, r2.Max-Card ≠ 'm',
  c1 := r1.Entity-Class, c2 := r2.Entity-Class =>
  (∀o1: c1 => count(o1.r1.Role->a->r2.Role.c2) ≤ r2.Max-Card)
  (Semantics of Max-Card of Unary and Binary Associations)
• ∀a: Association, ∀r1: a.relate, ∀r2: a.relate, ∀r3: a.relate,
  r1.Role ≠ r2.Role, r1.Role ≠ r3.Role, r2.Role ≠ r3.Role,
  c1 := r1.Entity-Class, c2 := r2.Entity-Class, c3 := r3.Entity-Class =>
  (∀o1: c1, ∀o2: o1.r1.Role->a->r2.Role.c2 =>
   count(o1.r1.Role->a->r3.Role.c3) ≥ r3.Min-Card)
  (Semantics of Min-Card of Ternary Associations)
• ∀a: Association, ∀r1: a.relate, ∀r2: a.relate, ∀r3: a.relate,
  r1.Role ≠ r2.Role, r1.Role ≠ r3.Role, r2.Role ≠ r3.Role, r3.Max-Card ≠ 'm',
  c1 := r1.Entity-Class, c2 := r2.Entity-Class, c3 := r3.Entity-Class =>
  (∀o1: c1, ∀o2: o1.r1.Role->a->r2.Role.c2 =>
   count(o1.r1.Role->a->r3.Role.c3) ≤ r3.Max-Card)
  (Semantics of Max-Card of Ternary Associations)
Table 3.3: Metamodel Semantics in Metamodel Schema
Metamodel Construct: Specialization
• ∀a: Specialization, ∀e: a.subclass,
  s := a.superclass.Entity-Class, c := e.Entity-Class =>
  (∀o1: s => count(o1.superclass->a->subclass.c) ≥ e.Min-Card)
  (Semantics of Min-Card of subclass in Specialization)
• ∀a: Specialization, ∀e: a.subclass, e.Max-Card ≠ 'm',
  s := a.superclass.Entity-Class, c := e.Entity-Class =>
  (∀o1: s => count(o1.superclass->a->subclass.c) ≤ e.Max-Card)
  (Semantics of Max-Card of subclass in Specialization)
Metamodel Construct: Aggregation
• ∀a: Aggregation, ∀e: a.component,
  s := a.aggregate.Entity-Class, c := e.Entity-Class =>
  (∀o1: s => count(o1.aggregate->a->component.c) ≥ e.Min-Card)
  (Semantics of Min-Card of component in Aggregation)
• ∀a: Aggregation, ∀e: a.component, e.Max-Card ≠ 'm',
  s := a.aggregate.Entity-Class, c := e.Entity-Class =>
  (∀o1: s => count(o1.aggregate->a->component.c) ≤ e.Max-Card)
  (Semantics of Max-Card of component in Aggregation)
Table 3.3 (Continued): Metamodel Semantics in Metamodel Schema
Explanation:
In Table 3.3, the first metamodel semantics for the metamodel construct Class defines
that for each attribute of an instance of the metamodel construct Class (i.e., each
attribute of a model construct which is an instance of the Class, expressed as ∀c: Class, ∀a: c.Attribute), if the attribute's uniqueness specification is 'unique' (expressed as a.Uniqueness = 'unique'), then for any two distinct instances of this model construct (i.e., any two distinct application constructs which are instances of this model construct, depicted as ∀o1: c, ∀o2: c, o1 ≠ o2), these two instances must have distinct values on this attribute (expressed as o1.a ≠ o2.a). Essentially, this metamodel semantics
defines the semantics of the uniqueness specification on attributes. For example, the
model construct Relation in the relational data model is specified as an instance of the
metamodel construct Entity-Class and, in turn, is also an instance of the metamodel
construct Class because Entity-Class is a subclass of Class in the metamodel schema.
Further assume that the name attribute of the model construct Relation be specified as
unique.
According to the assertion of the inner antecedent of the first metamodel
semantics for the metamodel construct Class, the name attribute of the model construct
Relation being unique delineates that any two distinct instances of the model construct
Relation (i.e., any two relations in an application schema) must have distinct names (i.e.,
distinct relation names in an application schema). The specification derived from this
metamodel semantics coincides with the intention of specifying the name attribute of the
model construct Relation as unique as well as with the constraints imposed on the
definition for the relational data model.
The second metamodel semantics for the metamodel construct Class defines for each
attribute of an instance of the metamodel construct Class (i.e., each attribute of a model
construct which is an instance of the Class, expressed as ∀c: Class, ∀a: c.Attribute), if the attribute's null-specification is 'not-null' (expressed as a.Null-Spec = 'not-null'), then every instance of this model construct (each application construct which is an instance of this model construct, depicted as ∀o1: c) must have at least one value on this attribute (expressed as count(o1.a) ≥ 1). As such, this metamodel semantics defines the semantics
of attributes being specified as 'not-null'.
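To make the instantiation concrete, carrying the Relation example through the first metamodel semantics would yield the following implicit model constraint in the model schema (shown here only as an illustration of the mechanism):

∀o1: Relation, ∀o2: Relation, o1 ≠ o2 => o1.name ≠ o2.name

that is, any two distinct relations in an application schema must have distinct names.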
3.3.4 Implicit Metamodel Constraints in Metamodel Schema
The instantiation of metamodel semantics into implicit metamodel constraints is basically a process of instantiating the instance variables defined in the outer antecedent of each metamodel semantics in terms of metamodel constructs, evaluating the conditions specified in the outer antecedent of each substituted metamodel semantics against the metamodel constructs, and adding the inner antecedent and its assertion into the metamodel schema as implicit metamodel constraints when the conditions evaluate to true. As shown in Figure 3.5, another instantiation relationship exists between metamodel semantics and implicit model constraints of a model schema. The instantiation process just described for deriving implicit metamodel constraints is also valid for deriving implicit model constraints. The detailed metamodel semantics instantiation algorithm is summarized in Algorithm 3.1.
Global-Instantiate(S, M)
/* Input: Metamodel semantics S and metamodel schema (or model schema) M
   Result: Metamodel schema with implicit metamodel constraints (or model schema
           with implicit constraints) */
Begin
  For each metamodel construct (or model construct) C in M
    For each metamodel semantics L of the metamodel construct Class in S
      C-Instantiate(L, C).
    If C is an instance of the metamodel construct Association
      For each metamodel semantics A of the metamodel construct Association in S
        C-Instantiate(A, C).
    If C is an instance of the metamodel construct Aggregation
      For each metamodel semantics G of the metamodel construct Aggregation in S
        C-Instantiate(G, C).
    If C is an instance of the metamodel construct Specialization
      For each metamodel semantics P of the metamodel construct Specialization in S
        C-Instantiate(P, C).
End.

C-Instantiate(T, C)
/* Input: A metamodel semantics T and a metamodel construct (or model construct) C
   Result: C with implicit metamodel constraints (or implicit model constraints) */
Begin
  Initialize Stack.
  T = Instantiate all occurrences of the first instance-variable defined in the outer
      antecedent of T by C's name.
  Push(T, Stack).
  While Stack is not empty
    T = Pop(Stack).
    If T contains an uninstantiated instance-variable I in the outer antecedent
      For each metamodel construct (or model construct) R which is related to C
          and is an instance of the domain of I
        T = Instantiate all occurrences of I in T by R's name.
        Push(T, Stack).
    Else If (T contains no condition in the outer antecedent) or
            (the conditions in the outer antecedent are evaluated as true)
      Add the inner antecedent and its assertion of T into C.
End.
Algorithm 3.1: Algorithm of Metamodel Semantics Instantiation
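The following Python sketch mirrors the control flow of C-Instantiate for a single metamodel semantics. The representation of a semantics as (outer variables, condition, inner part) and the helper names are simplifying assumptions for illustration, not the implemented system.

def c_instantiate(semantics, construct, related_constructs):
    """Instantiate the outer-antecedent variables of one metamodel semantics
    against construct C and return the inner parts whose conditions hold.

    semantics: dict with keys
      'variables' - ordered list of (variable-name, domain-construct) pairs
      'condition' - function(bindings) -> bool, or None
      'inner'     - template string for the inner antecedent and its assertion
    related_constructs: list of (name, instance-of) pairs related to construct
    """
    implicit_constraints = []
    first_var, _ = semantics["variables"][0]
    stack = [{first_var: construct["name"]}]          # bind the first variable to C
    while stack:
        bindings = stack.pop()
        unbound = [(v, d) for v, d in semantics["variables"] if v not in bindings]
        if unbound:
            var, domain = unbound[0]
            for name, instance_of in related_constructs:
                if instance_of == domain:             # R is an instance of the domain of I
                    stack.append({**bindings, var: name})
        elif semantics["condition"] is None or semantics["condition"](bindings):
            implicit_constraints.append(semantics["inner"].format(**bindings))
    return implicit_constraints

# hypothetical use: a one-variable uniqueness semantics instantiated for 'Relation'
uniqueness = {
    "variables": [("c", "Class")],
    "condition": None,
    "inner": "forall o1: {c}, forall o2: {c}, o1 != o2 => o1.a != o2.a",
}
print(c_instantiate(uniqueness, {"name": "Relation"}, []))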
Given the metamodel schema shown in Figure 3.7, Algorithm 3.1 will generate the
implicit metamodel constraints for the metamodel according to the metamodel semantics
specified in Table 3.3. For example, since the name attribute of the metamodel construct
Class is the identifier (as shown in Figure 3.7), the uniqueness, null-spec and multiplicity
properties of this attribute are 'unique', 'not-null', and 'single-valued', respectively.
Accordingly, the following implicit metamodel constraints for the metamodel construct
Class can be instantiated:
1. ∀o1: Class, ∀o2: Class, o1 ≠ o2 => o1.Name ≠ o2.Name
2. ∀o1: Class => count(o1.Name) ≥ 1
3. ∀o1: Class => count(o1.Name) ≤ 1
The first implicit metamodel constraint states that when two instances of the metamodel construct Class are distinct, their names must be different. That is, the names of model constructs in a model schema must be unique. The second and third implicit metamodel constraints together define that each instance of the metamodel construct Class must have one and only one name (i.e., a name is required for every model construct in a model schema). Because implicit metamodel constraints are derivable by nature, a complete listing of all implicit metamodel constraints will not be provided.
3.3.5 Model Definition Language
Based on the metamodel constructs and the explicit metamodel constraints, a model
definition language (MDL) is developed, as shown in Figure 3.8. Rather than replacing
the graphical representation of the SOGER metamodel, the MDL provides an alternative
way of textually describing a model schema.
MODEL: model-name
{
< ENTITY-CLASS: entity-name
{ [ ATTRIBUTE: attribute-declaration
IDENTIFIER: attribute-name [ <, attribute-name> ] ]
[ METHOD: <method-name (declaration): implementation> ]
[CONSTRAINT: <constraint-name: specification> ] } >
< RELATIONSHIP-CLASS: relationship-name
{ TYPE: type-declaration
[ ATTRIBUTE: attribute-declaration
[ IDENTIFIER: attribute-name [ <, attribute-name> ] ] ]
[ METHOD: <method-name (declaration): implementation> ]
[CONSTRAINT: <constraint-name: specification>] } >
attribute-declaration :=
<att-name (DATA-TYPE: data-type-specification
UNIQUENESS: uniqueness-specification
NULL-SPEC: null-specification
MULTIPLICITY: multiplicity-specification
[ HAS-SUBATTRIBUTE: attribute-declaration ]
[ DERIVED-FROM: attribute-name ([ <, attribute-name> ]) derivation-rule ])>
type-declaration :=
ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
<entity-name (ROLE: role-name,
MIN-CARDINALITY: cardinality-specification,
MAX-CARDINALITY: cardinality-specification)> |
SPECIALIZATION
SUPERCLASS: entity-name
SUBCLASS: <entity-name (MIN-CARDINALITY: cardinality-specification,
MAX-CARDINALITY: cardinality-specification)> |
AGGREGATION
AGGREGATE: entity-name
COMPONENT: <entity-name (MIN-CARDINALITY: cardinality-specification,
MAX-CARDINALITY: cardinality-specification)>
uniqueness-specification := unique | not-unique
null-specification := null | not-null
multiplicity-specification := single-valued | multi-valued
cardinality-specification := 0 | 1 | ... | m
Annotations:
< >    denotes that the enclosed item repeats one or more times.
[ ]    denotes optional.
A | B  denotes A or B.
Figure 3.8: Model Definition Language Based on the SOOER Metamodel
3.4 Metamodeling Process and Model Schema
Using the metamodel and its metamodel schema defined previously, the metamodeling
process is the process of formalizing a model as the model schema which consists of four
components: model constructs, explicit model constraints, implicit model constraints and
model semantics. Appendix B lists the model schema resulting from metamodeling a
relational data model. Its model constructs are graphically depicted in Figure 3.9.
[Figure 3.9 shows the model constructs Relation, Attribute, Primary-Key, and Foreign-Key, together with their properties (e.g., name, data type, uniqueness, default value) and the consist-of, compose-of, participate, and referenced relationships, each annotated with its (min, max) cardinalities.]
Figure 3.9: Model Schema of Relational Data Model
The metamodeling process is similar to a modeling process through which an application
schema is formally specified. As mentioned in Section 2.4, it is required to support an
inductive metamodeling process which learns the model schema of a data model from some example application schemas, without interacting with users, in a large-scale MDBS environment. The development of an inductive metamodeling technique (called the Abstraction Induction Technique) will be discussed in the next chapter.
CHAPTER 4
Inductive Metamodeling
This chapter first analyzes the characteristics of the inductive metamodeling problem,
reviews existing inductive learning techniques based on a proposed analysis framework,
and investigates the applicability of any existing inductive learning technique to the
inductive metamodeling. Subsequently, the development of a technique for inductive metamodeling, the Abstraction Induction Technique, will be described in detail. An evaluation of this technique will also be performed and analyzed.
4.1 Overview
4.1.1 Characteristics of Inductive Metamodeling Problem
The inductive metamodeling process learns the model schema of a data model from some
example application schemas (called training examples) represented by this data model.
More specifically, the problem of inductive metamodeling can be overviewed in the
following aspects: learning strategy, type of induction, noises in training examples,
representation, and specific induction output. Table 4.1 summarizes each of these aspects, which are described in the following.
Learning Strategy: Learning from examples (positive examples only)
Type of Induction: Characteristic and abstraction concepts
Noises in Training Examples: Noise-free
Representation:
  - External representation of training examples: DD (structured) or DDL (textual)
  - Internal representation of training examples: to be discussed in Section 4.2.1
  - Representation of induction output: metamodel (SOOER)
Specific Induction Output: Model schema with model constructs and model constraints only
Table 4.1: Summary of Characteristics of Inductive Metamodeling Process
Learning Strategy:
Several learning strategies have been distinguished [CMM83, Mi86, KM86, RK91] and
can be classified as follows:
1. Rote learning: The learning system (called the learner) simply accepts and memorizes the information provided to it.
2. Learning from instruction: The learner transforms instructions into an internally-usable form and integrates them with prior knowledge.
3. Learning by analogy: The learner acquires new facts by transforming and augmenting
existing knowledge that bears strong similarity to the desired new concept into a
form effectively useful in the new situation.
4. Inductive learning: Inductive learning can be subdivided into learning from examples
and learning by observation and discovery. In learning from examples, the learner
induces a general concept description that describes all positive examples and none
of the negative examples. In learning by observation and discovery, the learner
investigates a domain in an unguided fashion for regularities and general rules
explaining all, or at least most, observations.
Among these learning strategies, the inductive metamodeling process adopts the
inductive learning strategy.
Specifically, since all training examples provided are
positive examples, the inductive metamodeling process belongs to the category of
learning from examples with only positive examples.
Type of Induction:
The nature of this induction task is to generate a higher-level abstraction which
generalizes the structures and constraints exhibited in the training examples. In other
words, the inductive metamodeling is a process of structure and constraint
generalization.
The induced abstraction (i.e., model schema) explains the model
constructs with which training examples were defined and the model constraints to which
training examples conform. Inversely, a training example is a specific instance of the
induced model schema.
Noises in Training Examples:
Training examples used in the inductive metamodeling process are noise-free since
training examples (application schemas) are extracted from or provided by pre-existing
LDBSs.
Representation:
The representation employed by the inductive metamodeling is the language employed
for expressing training examples and induction output. Two levels of the representation
of training examples need to be distinguished: external and internal representation. The
external representation of training examples refers to the original format of training
examples. A training example can be extracted from the data dictionary (DD) of a
database system or formulated in a data definition language (DDL). On the other hand,
the internal representation of training examples is the representation which can be
efficiently and effectively manipulated by the inductive metamodeling process. If the
internal representation differs from the external representation, transformation of training
examples is required. The internal representation of training examples developed for the
inductive metamodeling process will be defined in Section 4.2.1. Since the induction
output from the inductive metamodeling is the model schema of a data model, the
representation of induction output is a metamodel. Specifically, the SOOER model is
adopted as the representation of induction output.
Specific Induction Output:
As mentioned, the induction output from the inductive metamodeling process is a model
schema. A model schema consists of the specifications of model constructs, model
constraints (including explicit and implicit model constraints), and model semantics.
However, it may not be possible for all of the components of a model schema to be
induced from application schemas. Instantiation relationships in the schema hierarchy of
the (extended) metamodeling paradigm, as shown in Figure 3.3 and Figure 3.5, indicate
that application constructs and explicit constraints of an application schema are
instantiations of model constructs.
Thus, model constructs can be learned from
application constructs and explicit constraints described in training examples. Ideally,
model semantics can be induced from implicit constraints of an application schema.
However, since implicit constraints are application independent, they are usually
embedded in DBMSs and are not specified in application schemas. Hence, the induction
of model semantics in a model schema solely from application schemas is extremely
difficult, if not impossible. On the other hand, since instantiations from model constructs
into application constructs and explicit constraints are verified by model constraints,
some patterns exhibited in an application schema may be generalized into model
constraints.
Therefore, some model constraints can be induced from
application
schemas. In all, only the model constructs and model constraints of a model schema can
possibly be induced from application schemas.
4.1.2 Comparisons with Existing Inductive Learning Techniques
As mentioned earlier, the inductive metamodeling process employs the learning from
examples strategy which is a special type of inductive learning strategy. Therefore, it is
necessary to investigate the differences between the inductive metamodeling process and
other inductive learning problems as well as the applicability of existing inductive
learning techniques to the inductive metamodeling process.
The analysis framework for inductive learning techniques consists of three aspects: the
description of training examples, the type of concept induced, and whether the process is
constructive or not. The description of training examples can be either structural or
attribute-oriented.
The types of concept induced from inductive learning generally
include characteristic, discriminant, and taxonomic. An inductive learning technique is
constructive if its induction process changes the description space; that is, it produces
new descriptions that were not present in the training examples. These three aspects are
described in more detail as follows.
Description of Training Examples [DM83]:
1. Structural description: Structural descriptions portray training examples as composite
structures consisting of various components. For example, a structural description of
a building could represent the building in terms of the floors, the walls, the roof, etc.
2. Attribute description: Training examples are described by a set of properties rather
than their internal structures. The attribute description of a building might list its
cost, total square-foot, condition, etc.
Type of Concept Induced [DM83, C90]:
1. Characteristic: A characteristic concept is a description of a class of training
examples that states facts true for all training examples in the class.
Thus,
characteristic concepts do not necessarily comply with the strict discrimination
criterion. Characteristic concepts are often encoded as frames or logical formulae.
2. Discriminant: A discriminant concept is a description of a class of training examples
that states only those properties of all training examples in the class that are
necessary to distinguish them from the training examples in other classes. Often
discriminant concepts are encoded as paths from
the root to the leaves of an
incrementally acquired decision tree.
3. Taxonomic: A taxonomic concept is a description of a class of training examples that
subdivides the class into subclasses. An important kind of taxonomic concept is a
description that determines a concept clustering. Generally speaking, determination
of characteristic and discriminant concepts is the subject of learning from (pre-classified) examples, while determination of taxonomic concepts is the subject of
learning by observation or discovery.
Constructive Induction [DM83, P94, K94]:
To overcome a situation in which the initially given description space of training
examples yields poor learning results, the basic idea of constructive induction is to
somehow transform the original description space into a space where training examples
exhibit (more) regularity. Usually this is done by introducing new descriptions through
combining or aggregating existing descriptions (original or constructed).
The new
descriptions represent concepts with higher-level abstraction than the descriptions from
which the new descriptions were derived.
Table 4.2 summarizes some existing induction learning techniques in the analysis
framework depicted above. Any cell marked by "—" denotes not available or not found.
Most inductive learning techniques deal with attribute-oriented training examples. Only
a few inductive learning techniques concern structural description of training examples.
All structural inductive learning techniques induce characteristic concepts, and none of
them is constructive.
Type of Concept Induced: Characteristic
  Structural, Non-constructive: Winston's System [W75], Version Spaces [M77, M78]
  Structural, Constructive: —
  Attribute-Oriented, Non-constructive: AQ7Uni [S78]
  Attribute-Oriented, Constructive: DBLearn [CCH89]
Type of Concept Induced: Discriminant
  Structural, Non-constructive: —
  Structural, Constructive: —
  Attribute-Oriented, Non-constructive: AQ15 [MMH86], ID3 [Q83], CN2 [CN89], C4.5 [Q93]
  Attribute-Oriented, Constructive: DBLearn [CCH90], AQ17-DCI [BM91], AQ17-HCI [WM94], CN2-MCI [K94]
Type of Concept Induced: Taxonomic
  Structural, Non-constructive: —
  Structural, Constructive: —
  Attribute-Oriented, Non-constructive: EPAM [FS84], COBWEB [F87], CLASSIT [GLF90]
  Attribute-Oriented, Constructive: —
Table 4.2: Summary of Existing Inductive Learning Techniques
Training examples for the inductive metamodeling process are described by their internal
structures (e.g., a patient table consists of the name, patient-id and address attributes)
rather than by a set of properties. In terms of type of concept induced, the inductive
metamodeling process aims at inducing a model schema which is a characteristic rather
than a discriminant or taxonomic concept.
Furthermore, training examples for the
inductive metamodeling process are instances of model constructs. Since its goal is to
induce concepts (model constructs and model constraints) at the model level rather than
at the application level, the original description spaces need to be aggregated and
generalized from application specific spaces to model specific spaces (e.g., from such
specific tables as "patient", "doctor", etc. to a model construct called "table"). Thus, the
inductive metamodeling process needs to be constructive. In sum, in terms of the analysis framework, the technique for inductive metamodeling needs to deal with structural descriptions of training examples for inducing characteristic concepts in a constructive manner. However, as shown in Table 4.2, none of the existing inductive learning techniques can be adopted as the technique for inductive metamodeling. Thus,
the development of a new technique is inevitable and will be described in the next
section.
4.2 Abstraction Induction Technique for Inductive Metamodeling
To deal with the structural descriptions of training examples (i.e., application schemas) in the inductive metamodeling process, each training example needs to be decomposed to reflect its internal structure. Since inductive metamodeling is a constructive induction process, training examples at the application level need to be abstracted into concepts (i.e., model constructs) at the model level. Subsequently, the remaining induction output (i.e., model constraints) will be generalized or generated. Based on this conceptual flow of the inductive metamodeling process, the technique for inductive metamodeling (called the Abstraction Induction Technique) consists of three phases: concept decomposition, concept generalization, and constraint generation. The following are detailed discussions of each of the phases.
4.2.1 Concept Decomposition Phase
This phase is to represent the internal structure of each training example so as to perform
the subsequent phases with effectiveness and efficiency. The internal representation for
training examples needs to reflect the hierarchical decomposition structure embedded in
each training example and to describe the relationships within or between hierarchical
decomposition structures.
Accordingly, a representation called concept graph is
developed. Definitions for the concept graph and its related terminologies are given as
follows.
Definition: Concept Graph
A concept graph, a structural representation of a training example, consists of nodes and
directed links. A node represents a term in a training example. A directed link denotes a relationship between nodes and can be classified into two types: has-a and refer-to links.
A has-a link denotes that its destination node (called constituent node) is a part or a
property of its origin node (called composite node), while a refer-to link expresses that
its origin node refers to a concept of its destination node defined in the same or different
concept graph. A concept node^ refers to a non-leaf node or a leaf node from which some refer-to link(s) originates. A leaf node which is not associated with any refer-to link is called a property node of its composite node.

^ The definition of a concept node is valid for typed languages but may not be applicable to typeless languages. Since all database systems and their underlying data models, to the author's knowledge, are typed systems, this assumption does not result in any loss of the generality of the concept graph representation.

Definition: Concept Hierarchy
A concept hierarchy is a subgraph of a concept graph and consists of only nodes and has-a links. The height of a node in a concept hierarchy is the inclusive number of nodes
between the node and its root node.
Definition: Spurious Refer-to Links
If a node C refers to a node N and to a direct or indirect composite node S of N
simultaneously, the refer-to link from C to S is called a spurious refer-to link because the
semantics of this refer-to link are implied by the refer-to link from C to N.
Definition: Derivable Refer-to Links and Consequential Refer-to Links
If a node C refers to a set of nodes S which in turn are referenced by another node N
where N refers to no other node but S, the refer-to links from C to each node in S are
called the derivable refer-to links because, by adding a refer-to link from C to N, they
can be derived from the refer-to link from C to N and the refer-to links from N to S. The
refer-to link from C to N is called the consequential refer-to link of its corresponding
derivable refer-to links.
Definition: Redundant Refer-to Links
If a set of derivable refer-to links and their consequential refer-to link coexist, the set of
derivable refer-to links are called redundant refer-to links because they are implied by the
consequential refer-to link.
Definition: Well-structured Concept Graph
A well-structured concept graph is a concept graph without spurious, derivable or
redundant refer-to links.
Example:
Assume that an example of the relational application schema is:
Create Table T1
(A1 char(15) not-null,
A2 char(20) not-null,
A3 Int null,
Primary-Key (A1, A2))
The corresponding concept graph for this example is shown in Figure 4.1. In this example, the T1 node has four constituent nodes: A1, A2, A3, and Primary-Key. T1, A1, A2, A3, and Primary-Key are concept nodes because they are either non-leaf nodes or leaf nodes from which some refer-to link(s) originates. The A1 node has two constituent nodes: char(15) and not-null. The char(15) node is a leaf node without any refer-to link originating from it and therefore is a property node of A1. The node Primary-Key is associated with two refer-to links: one to A1 and the other to A2. These two refer-to links correspond to the primary key specification in the example (i.e., Primary-Key (A1, A2)). Furthermore, this concept graph is well-structured because it contains no spurious, derivable, or redundant refer-to links.
[Figure 4.1 shows the concept graph for this example: T1 has has-a links to A1, A2, A3, and Primary-Key; A1, A2, and A3 have has-a links to their data type and null-specification property nodes (char(15)/not-null, char(20)/not-null, Int/null); and Primary-Key has refer-to links to A1 and A2.]
Figure 4.1: Example of Concept Graph
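A concept graph can be represented directly as a small data structure. The sketch below, with assumed class and field names, builds the graph of Figure 4.1.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    has_a: List["Node"] = field(default_factory=list)     # constituent nodes
    refer_to: List["Node"] = field(default_factory=list)  # referenced concept nodes

    def is_concept_node(self):
        # a non-leaf node, or a leaf node that originates some refer-to link
        return bool(self.has_a) or bool(self.refer_to)

# concept graph for the example schema of Figure 4.1
a1, a2, a3 = Node("A1"), Node("A2"), Node("A3")
a1.has_a = [Node("char(15)"), Node("not-null")]
a2.has_a = [Node("char(20)"), Node("not-null")]
a3.has_a = [Node("Int"), Node("null")]
pk = Node("Primary-Key", refer_to=[a1, a2])   # refer-to links to A1 and A2
t1 = Node("T1", has_a=[a1, a2, a3, pk])

print([n.name for n in t1.has_a if n.is_concept_node()])  # ['A1', 'A2', 'A3', 'Primary-Key']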
The goal of the concept decomposition phase is to construct a well-structured concept
graph for each training example. As shown in Figure 4.2, the concept decomposition
phase can be decomposed into four steps: 1) concept hierarchy creation which creates a
concept hierarchy for each training example, 2) concept hierarchy enhancement which
transforms each concept hierarchy into a concept graph by adding appropriate refer-to
links, 3) concept graph merging which merges each single-node concept graph with
another related concept graph, and 4) concept graph pruning which transforms each
concept graph into a well-structured one.
[Figure 4.2 depicts the steps of the concept decomposition phase: application schemas (examples of a data model), together with data-model-dependent stopping words and keywords, feed into concept hierarchy creation, which produces concept hierarchies; concept hierarchy enhancement turns these into concept graphs; concept graph merging and concept graph pruning then yield well-structured concept graphs.]
Figure 4.2: Steps of Concept Decomposition Phase
4.2.1.1 Concept Hierarchy Creation
This step is to create a concept hierarchy for each training example. The complexity of
this step depends on the external representation of training examples. If it is a structured format (e.g., training examples are extracted from the DD), creation of concept hierarchies is straightforward because no parsing is required. However, if the external representation of training examples is a DDL format, the prerequisite of concept hierarchy creation is to parse these unstructured, textual training examples.
When training examples are represented in a DDL format, the correctness of the concept hierarchy creation depends on the effectiveness of the parsing algorithm. Stopping words (e.g., Create) need not be included in concept hierarchies. Keywords (e.g., Table) need to be detached from training examples and, at the same time, associated with concept hierarchies because these associations indicate generalization possibilities (e.g., if T1 and T2 are prefixed by the keyword Table, they can be considered as instances of the model construct "Table"). Separators (e.g., "(" before A1, "," after not-null, etc.), which imply the internal structure exhibited in training examples, should be preserved to avoid undesirably flattened concept hierarchies. Stopping words and keywords are data model dependent. For example, Table is a keyword in the relational data model, but it may be a non-keyword concept in other data models. Thus, one important piece of domain knowledge required by the concept decomposition step, as shown in Figure 4.2, is the set of data-model-dependent stopping words and keywords. Although different data models may use different sets of separators, a set of common separators can be identified, and their implication for the creation of concept hierarchies may be invariant among data models. For example, regardless of the data model, a comma that divides two terms in a training example usually indicates that they are siblings in a concept hierarchy. A set of common separators and their implications for concept hierarchy creation are analyzed and listed in Appendix C. Using the domain knowledge (data-model-dependent stopping words and keywords) and the heuristics for separators, the parsing algorithm identifies non-stopping-word, non-keyword, and non-separator terms (called regular terms) in training examples.
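Under these heuristics, the first job of the parsing step is simply to classify tokens. A minimal sketch, assuming the relational stopping words and keywords of the running example, is:

STOPPING_WORDS = {"create"}                            # data model dependent
KEYWORDS = {"table", "primary-key", "foreign-key"}     # data model dependent
SEPARATORS = {"(", ")", ","}

def classify(token):
    """Classify a token as a stopping word, keyword, separator, or regular term."""
    t = token.lower()
    if t in STOPPING_WORDS:
        return "stopping-word"
    if t in KEYWORDS:
        return "keyword"
    if t in SEPARATORS:
        return "separator"
    return "regular-term"

ddl = "Create table T1 ( A1 char(15) not-null , Primary-Key ( A1 ) )"
print([(tok, classify(tok)) for tok in ddl.split()])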
Accordingly, for each training example the concept hierarchy creation step applies the
following rules:
Creation Rule 1 (C-Rule 1):
If a regular term C is the first regular term of the training example, create a new
concept hierarchy with the root node named C for the example.
C-Rule 2:
If a regular term C in the training example is a constituent concept of another
regular term P, create a node (named C) for C and add a has-a link from the node P
to the node C in the concept hierarchy corresponding to the example.
C-Rule 3:
If the keyword K in the training example is not immediately followed by a regular term, then
1) create a node for K by applying C-Rule 1 or C-Rule 2, and
2) set the possible-generalization-name of the node K as K.
C-Rule 4:
If a regular term C in the training example is a sibling concept of another regular
term S and the node S is the root of the concept hierarchy representing the example,
then
1) create a node with the name C,
2) create a node with a name same as the keyword K prior to S in the example,
3) set the node K as the root node of the concept hierarchy,
4) add has-a links from K to S and from K to C,
5) set the possible-generalization-name of the node K as K, and
6) reset the possible-generalization-name of the node S as un-initialized.
C-Rule 5:
If a regular term C in the training example is a sibling concept of another regular
term S and the node S is not the root node of the concept hierarchy representing the
example, then
1) create a node with the name C,
2) add a has-a link from the composite node of S to C, and
3) if the possible-generalization-name of the node S has been set, set the possible-generalization-name of C as that of S.
C-Rule 6:
If the term immediately prior to a regular term C in the training example is a keyword
K, set the possible-generalization-name of the node C as K.
Example:
Assume that the three training examples in the relational data model and the keywords for the
relational data model are as below:
Example 1
Create table T1
(A1 char(15) not-null,
A2 char(20) not-null,
A3 int null,
Primary-Key (A1, A2))
Example 2
Create table T2
(A1 int not-null,
A2 real null,
B1 char(15) null,
B2 char(20) null,
Primary-Key (A1))
Example 3
Foreign-Key T2 (B1, B2)
reference T1 (A1, A2)
Keywords
Table, Primary-Key, Foreign-Key
The corresponding concept hierarchy created for each training example is shown in
Figure 4.3. According to C-Rule 6, the possible-generalization-name of the node T1 in
the first concept hierarchy and of the node T2 in the second concept hierarchy would be
set to "Table". As a result of applying C-Rule 3, the possible-generalization-name of the
node Primary-Key in the first and the second concept hierarchy is "Primary-Key". When
constructing the concept hierarchy for Example 3, before T1 is encountered the root node
of the concept hierarchy is T2, because the immediately prior term of T2 is a keyword and
thus C-Rule 6 is applied. Thus, the possible-generalization-name of T2 is "Foreign-Key"
at this point. When processing the regular term T1 in this example, T1 should be a sibling node
of T2. According to C-Rule 4, a new root node "Foreign-Key", whose possible-generalization-name
is also "Foreign-Key", is created for this concept hierarchy; its
immediate constituent nodes are T1 and T2, and the possible-generalization-name
of the node T1 remains un-initialized.
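The following Python sketch shows one way the nodes of such a concept hierarchy could be represented, together with the effect of C-Rule 1, C-Rule 2, and C-Rule 6 on Example 1. The class name, field names, and helper method are illustrative assumptions rather than the dissertation's implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ConceptNode:
    name: str
    possible_generalization_name: Optional[str] = None     # set by C-Rule 3 or C-Rule 6
    children: List["ConceptNode"] = field(default_factory=list)  # has-a links

    def add_child(self, child: "ConceptNode") -> "ConceptNode":
        self.children.append(child)
        return child

# Sketch for Example 1: "Create table T1 (A1 char(15) not-null, ...)"
root = ConceptNode("T1", possible_generalization_name="Table")  # C-Rule 1 + C-Rule 6
a1 = root.add_child(ConceptNode("A1"))                          # C-Rule 2 (constituent of T1)
a1.add_child(ConceptNode("char(15)"))                           # property node of A1
a1.add_child(ConceptNode("not-null"))                           # property node of A1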
[Figure omitted: the concept hierarchies built for Examples 1, 2, and 3; arrows denote has-a links.]
Figure 4.3: Concept Hierarchies (Examples 1, 2, and 3) After Concept Hierarchy Creation
4.2.1.2 Concept Hierarchy Enhancement
Concept hierarchy enhancement evolves concept hierarchies into concept graphs by
substituting the leaf nodes which imply reference relationships with refer-to links. A
reference relationship exists when a leaf node of a concept hierarchy is identical to a non-
leaf node (i.e., concept node) in the same or different concept hierarchy or when a leaf
node which is qualified by the root node of another concept hierarchy is identical to a
leaf node of that concept hierarchy. In the first case, if the global naming scheme (i.e.,
the names of the concept nodes in all training examples should be unique) is employed
in defining these training examples, the names of all non-leaf nodes in all concept
hierarchies would be different. As such, there would be no confusion in determining
whether the non-leaf node referenced by a leaf-node is in the same or different concept
hierarchy. However, the names of non-leaf node concepts which are unique within a
training example usually are not unique globally.
For example, the names of the
attributes of a relational table are unique, though different relational tables may have
attributes with the same name. To avoid ambiguity resulting from the local naming
scheme in defining these examples, inter-example references (i.e., a concept in one
training example refers to a concept in another training example) require
qualification, while intra-example references need not be qualified. The qualification
often begins with the top-most concept of the training example (i.e., the root node of the
concept hierarchy) and ends with the concept being referenced. On the other hand, the second
case does not contain any ambiguity because the leaf node which implies a reference
relationship has already been qualified by the root node of another concept hierarchy.
Referring to Figure 4.3, the concept hierarchy for Example 3 contains two qualification
nodes: T2 and T1. Each of these qualification nodes explicitly identifies the concept
hierarchy to which its constituent nodes refer. Thus, it is unambiguous to say that A1
and A2 of T1 refer to the concept hierarchy of Example 1 rather than that of Example 2.
As shown in Figure 4.3, the qualification nodes appear as the composite nodes of the
leaf nodes being qualified. However, it is also possible that a qualification node appears
as a sibling node of the qualified leaf nodes. Therefore, rules for concept hierarchy
enhancement need to deal with both situations. Moreover, in the case of an inter-example
reference, the qualified leaf nodes need to be substituted by refer-to links. That is, refer-to
links are created and the qualified leaf nodes are removed. After qualified leaf nodes
are substituted and removed, their qualification nodes should also be removed from the
concept hierarchy. A qualification node is removed because it is a product of the local
naming scheme and its existence depends on the existence of the leaf nodes it qualifies.
Based on the local naming scheme, the possible locations of the qualification nodes, and
the removal of the qualification nodes as discussed above, the rules for the concept
hierarchy enhancement are as follows:
For each leaf node L in every concept hierarchy,
Enhancement Rule 1 (E-Rule 1):
If the leaf node L is the same as a non-leaf node N in the same concept hierarchy
and N does not appear in any other concept hierarchy, then create a refer-to link
from the direct composite node of L to N and remove L from the concept hierarchy.
E-Rule 2:
If the leaf node L is the same as some non-leaf nodes N in the same and different
concept hierarchies, and neither the non-root composite (direct and indirect) nodes
of L nor the sibling nodes of L include any root node of the concept hierarchy
which contains N, then create a refer-to link from the direct composite node of L to
N of the same concept hierarchy as L and remove L from the concept hierarchy.
E-Rule 3:
If the leaf node L is the same as some (leaf or non-leaf) nodes N in the same and/or
different concept hierarchies, and C1, C2, ..., Cn (C1 is the composite node of L, C2 is
the composite node of C1, ..., Cn is the composite node of Cn-1) are the non-root
direct and indirect composite nodes of L, where Cn is the same as the root node of a
different concept hierarchy H, then create a refer-to link from the direct composite
node of Cn to N of H and remove L from the concept hierarchy. Furthermore, after
removing L, if C1 becomes a leaf node, remove C1 as well. This removal process
continues from C1 to Ci (where i < n) and terminates when Ci is a non-leaf node.
E-Rule 4:
If the leaf node L is the same as some (leaf or non-leaf) nodes N in the same and/or
different concept hierarchies, and the non-root composite (direct and indirect)
nodes of L do not include any root node of the concept hierarchy which contains N,
and one of the sibling nodes S of N is the same as the root node of a different
concept hierarchy H which contains N, then create a refer-to link from the direct
composite node of L to N of H and remove L from
the concept hierarchy.
Furthermore, after removing L, if S does not have any sibling node, remove S as
well.
For each leaf node L in every concept hierarchy, E-Rule 1 deals with unambiguous
intra-example references, and E-Rule 2 is for references without qualification which imply
intra-example references. E-Rule 3 is for references with qualification (expressed in the
composite nodes of L) which denote inter-example references, while E-Rule 4 is for
references with qualification (expressed in the sibling node of L) which also denote
inter-example references. If none of the rules is applicable, no action needs to be taken on L.
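As an illustration, the following Python sketch implements only E-Rule 1, the simplest of the four rules, on the ConceptNode structure sketched earlier, assuming each node is additionally given a refer_to list. The helper names and the interpretation of "N does not appear in any other concept hierarchy" (read here as: no other hierarchy has a non-leaf node with that name) are assumptions.

def non_leaf_nodes(root):
    """Collect the non-leaf (concept) nodes of one hierarchy, keyed by name."""
    found, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.children:
            found.setdefault(node.name, node)
        stack.extend(node.children)
    return found

def apply_e_rule_1(root, all_roots):
    """For each leaf node L named like a non-leaf node N of the same hierarchy, where
    that name is not a non-leaf node of any other hierarchy, add a refer-to link from
    L's direct composite node to N and remove L (E-Rule 1)."""
    local = non_leaf_nodes(root)
    foreign = set()
    for other in all_roots:
        if other is not root:
            foreign.update(non_leaf_nodes(other))      # names of non-leaf nodes elsewhere
    def visit(node):
        for child in list(node.children):
            if child.children:
                visit(child)
            elif child.name in local and child.name not in foreign:
                node.refer_to.append(local[child.name])  # create the refer-to link
                node.children.remove(child)              # remove the leaf node L
    visit(root)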
Example:
Referring to Example 1 in Figure 4.3, the leaf node A1 (whose composite node is
Primary-Key) is the same as the non-leaf node A1 of T1 (in Example 1) and A1 of T2 (in
Example 2). Since the non-root composite (direct and indirect) nodes of this leaf node
(in this case, only Primary-Key) do not include T2 (i.e., the root node of the other concept
hierarchy in which A1 also appears), the reference relationship implied by this leaf node
should be an intra-concept-hierarchy reference and thus E-Rule 2 is applied. According
to E-Rule 2, the leaf node A1 (of Primary-Key) of Example 1 is deleted and a refer-to link
is added from Primary-Key to the non-leaf node A1 in Figure 4.3. The same process can
be applied to the leaf node A2 of Primary-Key in Example 1. The remaining
leaf nodes of Example 1 are not the same as any non-leaf node in any concept
hierarchy, so no action will be taken for these leaf nodes, since they represent the
properties of their direct composite nodes. The enhancement process for the concept
hierarchy of Example 2 is similar to that for Example 1. The resulting concept graphs
enhanced from Figure 4.3 for Examples 1 and 2 are shown in Figure 4.4.
[Figure omitted: the concept graphs for Examples 1 and 2; solid arrows denote has-a links and dashed arrows denote refer-to links.]
Figure 4.4: Concept Graphs (Examples 1 and 2) After Concept Hierarchy Enhancement
The concept hierarchy enhancement process continues for Example 3. As shown in
Figure 4.3, the leaf node B1 is qualified by its composite node (T2) because T2 is the root
node of the concept hierarchy for Example 2. Thus, E-Rule 3 is triggered. As a result, a
refer-to link is added connecting the composite node of T2 (i.e., Foreign-Key) in
Example 3 to the node B1 in Example 2, and B1 of T2 in Example 3 is removed.
However, the qualification node T2 will not be deleted since it still has a constituent node
(B2). The same process is applied to the leaf node B2, which is qualified by its composite
node T2. After the substitution of this leaf node by another refer-to link from the
composite node of T2 in Example 3 to the node B2, the qualification node T2 in Example
3 should be removed since it is now a leaf node. The concept hierarchy enhancement
process proceeds for Example 3: E-Rule 3 is applied to the leaf nodes A1 and A2 of T1. As
a result, two refer-to links are added from the composite node of T1 (i.e., Foreign-Key) in
Example 3 to the nodes A1 and A2 in Example 1, respectively. The nodes A1, A2, and T1 in
the concept hierarchy of Example 3 are removed accordingly. The resulting concept
graphs after the concept hierarchy enhancement are shown in Figure 4.5.
[Figure omitted: the concept graphs for Examples 1, 2, and 3 with has-a and refer-to links.]
Figure 4.5: Concept Graphs (Examples 1, 2, and 3) After Concept Hierarchy Enhancement
4.2.1.3 Concept Graph Merging
The goal of concept graph merging is to merge all single-node concept graphs with other
concept graphs. A single-node concept graph is defined as a concept graph containing
only one node from which some refer-to link(s) originate. A single-node concept
graph exists because the information pertaining to one training example was fragmented
and represented in different examples. For example, assume Example 1 given above is
represented in two training examples as below.
Example 1.1
Create Table T1
(A1 char(15) not-null,
A2 char(20) not-null,
A3 int null)
Example 1.2
Primary-Key T1 (A1, A2)
The concept graphs after the concept hierarchy enhancement for these two training
examples are shown in Figure 4.6. As shown on the right of Figure 4.6, the concept
graph of Example 1.2 consists of only one node from which two refer-to links
originate: to A1 of T1 and to A2 of T1 in Example 1.1. The concept graph of Example
1.2 can be merged with the concept graph of Example 1.1, and the node Primary-Key will
become one of the constituent nodes of T1 in the concept graph of Example 1.1. The
result of this merging is shown in Figure 4.7. The concept graph for Examples 1.1 and
1.2 after merging is identical to that for Example 1 (i.e., the upper part of Figure 4.5).
This is because the unification of Examples 1.1 and 1.2 is the same as Example 1.
[Figure omitted: the concept graphs for Examples 1.1 and 1.2 with has-a and refer-to links.]
Figure 4.6: Concept Graphs (Examples 1.1 and 1.2) Before Concept Graph Merging
[Figure omitted: the merged concept graph.]
Figure 4.7: Concept Graph (Example 1.1 with 1.2) After Concept Graph Merging
Based on the illustration given above, the concept graph merging algorithm can be stated
as: "if a concept graph has only one node R and the destination nodes of all refer-to links
originated from R are all in another concept graph H, then add a has-a link from the root
node of H to R." However, this algorithm is not complete because it neglects the
situation when R refers to multiple concept graphs. Example 3 shown in Figure 4.5
serves as an explication of this situation. The concept graph for Example 3 has only one
node (i.e., Foreign-Key) which refers to two concept graphs. To which concept graph
should the node Foreign-Key be merged? The decision can be made based on the roles
of the referenced nodes with respect to the node Foreign-Key. Referring to Example 3,
the clause T2 (B1, B2) plays an active role in defining the concept Foreign-Key, while the
clause T1 (A1, A2) plays a passive role. Semantically, the concept graph with one single
node R should be merged into the concept graph whose constituent concepts actively
participate in defining R. However, without additional domain knowledge, it is difficult
to determine the role played by a referenced node. A heuristic for determining into
which concept graph the single-node concept graph should be merged is therefore needed. Based
on the observation that the concepts referenced by the single-node concept graph
in the active role usually appear before those in the passive role (e.g., in Example 3,
T2 (B1, B2) appears before T1 (A1, A2)), the heuristic can be stated as: "the concept
graph with the single node R should be merged into the concept graph first referenced by
R." Accordingly, the concept graph of Example 3 in Figure 4.5 is merged into the
concept graph of Example 2. Using this heuristic based on the reference-sequence
principle, the rules for concept graph merging can be formalized as follows:
For each concept graph,
Merging Rule 1 (M-Rule 1):
If the concept graph has only one node R and the destination node(s) of the refer-to
link(s) originated from R are all in another concept graph H, then add a has-a link
from the root node of H to R (i.e., R becomes the constituent node of the root node
ofH).
M-Rule 2:
If the concept graph has only one node R, the destination node(s) of the refer-to
link(s) originated from R are in more than one concept graph, and the concept
graph H contains the concept first referenced by R, then add a has-a link from the
root node of H to R.
Example:
As discussed above, the concept graph of Example 3 should be merged into the concept
graph of Example 2, according to M-Rule 2. The concept graphs after the concept graph
merging step are shown in Figure 4.8. Please note that Example 2 in Figure 4.8 is the
unification of the original Examples 2 and 3.
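The following Python sketch captures M-Rule 1 and M-Rule 2 together, using the reference-sequence heuristic described above. The function and parameter names are illustrative assumptions; root_of is assumed to be a function mapping any node to the root of its concept graph.

def merge_single_node_graphs(graphs, root_of):
    """Attach every single-node concept graph, via a has-a link, to the root of the
    concept graph it references first (M-Rule 1 when all references go to one graph,
    M-Rule 2 otherwise).  Returns the remaining list of concept graph roots."""
    merged_ids = set()
    for root in graphs:
        if root.children:                         # not a single-node concept graph
            continue
        targets = [root_of(t) for t in root.refer_to]
        if not targets:
            continue
        host = targets[0]                         # first-referenced graph (heuristic)
        host.children.append(root)                # has-a link from host's root to R
        merged_ids.add(id(root))
    return [g for g in graphs if id(g) not in merged_ids]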
[Figure omitted: the concept graphs for Examples 1 and 2 after merging, with has-a and refer-to links.]
Figure 4.8: Concept Graphs (Examples 1 and 2) After Concept Graph Merging
4.2.1.4 Concept Graph Pruning
Concept graph pruning aims at removing spurious and redundant refer-to links and
replacing derivable refer-to links with consequential refer-to links. The result of this step
is a set of well-structured concept graphs. The rules for concept graph pruning are as
follows:
For each concept node C of every concept graph,
Pruning Rule 1 (P-Rule 1):
If any spurious refer-to link originated from C is found, remove this refer-to link.
P-Rule 2:
If any set of redundant refer-to links originated from C are found, remove these
redundant refer-to links.
P-Rule 3:
If any set of derivable refer-to links originated from C are found, replace them by
their consequential refer-to link.
Example:
Since the concept graphs for Examples 1 and 2 (as shown in Figure 4.8) do not contain
any spurious or redundant refer-to links, P-Rule 1 and P-Rule 2 are not applied.
However, the refer-to links from the node Foreign-Key of Example 2 to A1 and to A2 of
Example 1 are derivable, because A1 and A2 of T1 are referenced by Primary-Key of T1,
which does not reference any other node. Therefore, according to P-Rule 3, these two
derivable refer-to links will be replaced by a single refer-to link from Foreign-Key of T2
to Primary-Key of T1. The final concept graphs after the concept graph pruning are
shown in Figure 4.9. As can be proved, each concept graph in Figure 4.9 is a well-structured concept graph.
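The following Python sketch shows one reading of P-Rule 3 that is consistent with the Foreign-Key/Primary-Key example: if the refer-to targets of a node include the complete set of targets of some other node Y, those links can be replaced by a single refer-to link to Y. This is only an illustrative interpretation of "derivable", not the dissertation's precise definition, and the names are assumptions.

def replace_derivable_links(node, candidates):
    """Replace derivable refer-to links of `node` by one consequential link to a node Y
    (from `candidates`, e.g. Primary-Key nodes) whose own refer-to targets are all
    contained in node's targets."""
    target_ids = {id(t) for t in node.refer_to}
    for y in candidates:
        y_ids = {id(t) for t in y.refer_to}
        if y is not node and y_ids and y_ids <= target_ids:
            node.refer_to = [t for t in node.refer_to if id(t) not in y_ids]
            node.refer_to.append(y)               # consequential refer-to link to Y
    return node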
[Figure omitted: the well-structured concept graphs for Examples 1 and 2.]
Figure 4.9: Concept Graphs (Examples 1 and 2) After Concept Graph Pruning
4.2.2 Concept Generalization Phase
Once all training examples are represented in well-structured concept graphs, the concept
generalization phase is initiated. The goal of the concept generalization phase is to
generalize each set of "similar" nodes or links into a model construct (in the model
schema) which in fact is an instance of a metamodel construct (e.g., attribute, entity-class, association relationship, specialization relationship, or aggregation relationship).
Specifically, given a set of the well-structured concept graphs generated by the concept
decomposition phase, the concept generalization generalizes each set of "similar"
property nodes into a higher-level property concept which will later become an attribute
of entity-classes, each set of "similar" concept nodes into an entity-class, has-a links into
aggregation relationships or attributes of entity-classes, and refer-to links into association
or specialization relationships in the model schema. These generalization tasks require
the following four steps:
1. Generalization of property nodes,
2. Generalization of leaf concept nodes,
3. Generalization of non-leaf concept nodes and their immediate downward has-a
links, and
4. Generalization of refer-to links.
One of the challenges in the above-mentioned generalization tasks of the concept
generalization process concerns defining the similarity of property nodes or concept
nodes. The similarity measurement for property nodes is based on the abstraction of
property values (i.e., the higher-level concept these property values represent). For
example, as shown in Figure 4.9, there exist the property nodes "char(15)",
"char(20)", "int", and "real", which should be regarded as a set of "similar" property
nodes since the higher-level concept these property values represent is "data-type".
However, without domain knowledge which defines the higher-level concept for all or
some of these property values, it is extremely difficult to assert that these property values
should be generalized into the same higher-level concept, even by looking up a thesaurus
if one is available. Therefore, the domain knowledge of property generalization hierarchies needs
to be employed in the concept generalization phase. On the other hand, the similarity
measurement for the concept nodes is based on the structural decomposition of these
concept nodes.
In other words, two concept nodes are similar if their immediate
constituent concept nodes and immediate property nodes are the same or largely
overlapping.
According to this structural decomposition similarity measurement, no
domain knowledge is required for identifying a set of "similar" concept nodes from all
possible concept nodes.
The input and output of each step in the concept generalization phase are shown in
Figure 4.10. Each of these steps, the definition of property generalization hierarchies, and
the similarity measurements will be presented and illustrated in the following
subsections.
[Figure omitted: data-flow diagram of the concept generalization phase. Well-structured concept graphs and property generalization hierarchies feed the generalization of property nodes (producing new property values and new property generalization hierarchies); the generalization of leaf concept nodes produces entity-class instances; the generalization of non-leaf concept nodes and downward has-a links produces entity-class instances, aggregation relationship instances, and attributes; and the generalization of refer-to links produces association and specialization relationship instances, all added to the model schema.]
Figure 4.10: High-Level View of Concept Generalization Phase
4.2.2.1 Generalization of Property Nodes
The purpose of the generalization of property nodes is to generalize each set of "similar"
property nodes into a higher-level property concept which will later become an attribute
of entity-classes corresponding to the composite nodes of these property nodes. As
mentioned, the similarity measurement for the property nodes requires the domain
knowledge of property generalization hierarchies.
Definition: Property generalization hierarchy
A property generalization hierarchy consists of nodes and links. The root node of a
property generalization hierarchy is the property concept, while the intermediate nodes
and leaf nodes are specific property values of this property concept. A link, linking a
child node to a parent node and representing an "is-a" relationship, defines that the
parent node (a property concept or a property value) is a generalization of the child node
(property value).
Example:
As shown in Figure 4.11, char(15) and char(20) are generalized into the property value
char(n) where n is an integer, while data-type is the property concept of char(n) and int.
[Figure omitted: a property generalization hierarchy in which Data-type = {int, char(n)} and char(n) = {char(15), char(20)}.]
Figure 4.11: Example of Property Generalization Hierarchy
However, determination of property generalization hierarchies to be included and their
degree of completeness can be problematic. Although it is evident that a more complete
set of property generalization hierarchies will improve the effectiveness of the generalization of property nodes, it is neither
feasible nor practical to enumerate all of the possible property concepts of each data
model or all possible property values. Hence, the approach of having a complete set of
property generalization hierarchies for each data model or for all data models is not
achievable.
Rather, an adaptive approach which evolves the initial set of property
generalization hierarchies from training examples during the inductive metamodeling
process is chosen. Rules which expand existing property generalization hierarchies by
adding new property values or new property generalization hierarchies are needed and
will be discussed later.
Since property nodes will be generalized into attributes of entity-classes in model
schemas, it is necessary to examine the dimensions related to the notion of attribute. An
attribute, besides its name, usually is described in such dimensions as data types, null
specification (whether the attribute allows null values or not), uniqueness specification
(whether the values of the attribute are unique or not), and multiplicity (whether the
attribute can take more than one value). Hence, four property generalization hierarchies
which are data model independent are constructed and employed during the concept
generalization phase: data-type, null-spec, uniqueness, and multiplicity. Initial property
values of each property generalization hierarchy are listed below:
1. data-type = {int, integer, char(n), character(n), text, real, float}
2. null-spec = {null, not null, null allowed, null not allowed}
3. uniqueness = {unique, not unique}
4. multiplicity = {single-valued, multi-valued}
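For illustration, the four initial hierarchies can be represented as nested dictionaries in Python, with a helper for the adaptive growth described above. The representation and the helper name add_property_value are assumptions of this sketch, not the dissertation's data structures.

INITIAL_HIERARCHIES = {
    "data-type":    {"int": {}, "integer": {}, "char(n)": {}, "character(n)": {},
                     "text": {}, "real": {}, "float": {}},
    "null-spec":    {"null": {}, "not null": {}, "null allowed": {}, "null not allowed": {}},
    "uniqueness":   {"unique": {}, "not unique": {}},
    "multiplicity": {"single-valued": {}, "multi-valued": {}},
}

def add_property_value(hierarchies, root, parent, new_value):
    """Insert new_value as a child of `parent` inside the hierarchy rooted at `root`."""
    def insert(subtree):
        if parent in subtree:
            subtree[parent][new_value] = {}
            return True
        return any(insert(child) for child in subtree.values())
    insert(hierarchies[root])

# e.g. add_property_value(INITIAL_HIERARCHIES, "data-type", "char(n)", "char(15)")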
The similarity between a property node i in a concept graph and a property value j in a
property generalization hierarchy is defined as:
Similarity(i, j) = 1, if n_i = n_j
                 = [L(F(n_i, n_j)) + L(R(n_i, n_j))] / [(L(n_i) + L(n_j)) / 2], if n_i ≠ n_j
where
n_i is the name of property node i,
n_j is the name of property value j,
F(s, t) returns the longest common leading substring of s and t,
R(s, t) returns the longest common trailing substring of s and t, and
L(s) returns the length of string s.
This similarity function measures the proportion of the number of characters in common
(by position from both ends of strings) of two strings.
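The computation can be transcribed directly into Python as a small sketch. The function name property_similarity is an illustrative choice, and limiting the trailing match so that it does not overlap the leading match is an assumption the definition above leaves implicit.

def property_similarity(ni: str, nj: str) -> float:
    if ni == nj:
        return 1.0
    # L(F(ni, nj)): longest common leading substring
    lead = 0
    while lead < min(len(ni), len(nj)) and ni[lead] == nj[lead]:
        lead += 1
    # L(R(ni, nj)): longest common trailing substring (not overlapping the leading match)
    trail = 0
    while trail < min(len(ni), len(nj)) - lead and ni[-1 - trail] == nj[-1 - trail]:
        trail += 1
    return (lead + trail) / ((len(ni) + len(nj)) / 2)

# property_similarity("char(15)", "char(n)") == 0.8
# property_similarity("not-null", "not null") == 0.875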
Example:
The similarity of the property node "char(15)" and the property value "char(n)" is:
F("char(15)", "char(n)") = "char(" => L(FC'char(15)", "char(n)")) = 5.
R("char(15)", "char(n)") = ")"
=> L(R("char(15)", "char(n)")) = 1.
141
Siinilarity("char(15)", "char(n)") = (5+lH((8+7)/2) = 0.8
The similarity of the property node "not-null" and the property value "not null" is:
F("not-nuir', "not null") = "not" => L(F("not-nuU", "not null")) = 3.
RC'not-null", "not null") = "null" => L(R("not-nuH", "not null")) = 4.
Sunilarity("not-null", "not null") = (3+4)H-((8+8)/2) = 0.875
Employing the initial data model independent property generalization hierarchies and the
similarity measurement function, the generalization of property nodes generalizes each
property node if possible and updates the property generalization hierarchies if needed.
The rules for the generalization of property nodes are defined as follows:
For every property node of each concept graph,
Generalization Rule 1 (G-Rule 1):
If the possible-generalization-name of the property node N is set as P (in the concept
hierarchy creation step), then set the generalization-name of N as P.
G-Rule 2:
If the possible-generalization-name of the property node N is not set and the highest
Similarity(N, P), where P is any property value in the property generalization
hierarchies, is > h (a threshold whose default is 0.5), then
1) set the generalization-name of N as the name of the root node of P, and
2) if Similarity(N, P) < 1, create a new node with the name of N as the child of P in
the property generalization hierarchy where P is located.
If neither of these two generalization rules is applicable to a property node, the
generalization-name of this property node cannot be determined. Since no other
knowledge is available, no action needs to be taken for these un-generalized property
nodes. Further generalization of the un-generalized property nodes will be performed in
the generalization of non-leaf concept nodes and has-a links step.
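A compact sketch of G-Rule 1 and G-Rule 2 in Python, reusing property_similarity and the hierarchy helpers sketched above, might look as follows. The node fields and function names are assumptions; the threshold default mirrors the value stated above.

def generalize_property_node(node, hierarchies, threshold=0.5):
    """Set node.generalization_name according to G-Rule 1/2 and grow the hierarchies."""
    if node.possible_generalization_name:                    # G-Rule 1
        node.generalization_name = node.possible_generalization_name
        return
    best = None                                              # (similarity, root, value)
    for root, tree in hierarchies.items():
        stack = list(tree.items())
        while stack:
            value, children = stack.pop()
            stack.extend(children.items())
            s = property_similarity(node.name, value)
            if best is None or s > best[0]:
                best = (s, root, value)
    if best and best[0] > threshold:                         # G-Rule 2
        node.generalization_name = best[1]                   # name of the root node
        if best[0] < 1:
            add_property_value(hierarchies, best[1], best[2], node.name)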
Example:
The property value with the highest similarity to the property node char(15) of A1 in
Example 1 (as shown in the upper part of Figure 4.9) is char(n). Since the
possible-generalization-name of the property node char(15) was not initialized during the concept
hierarchy creation step and Similarity("char(15)", "char(n)") is 0.8 (see the similarity
computation shown above), which is higher than the default threshold, G-Rule 2 is
applied. As a result, the generalization-name of this property node is set to the name of
the root node of the hierarchy containing char(n) (i.e., data-type), and a new property value
char(15) is inserted into the data-type property generalization hierarchy as the child of the
property value char(n). The generalization process continues for every property node of
Examples 1 and 2. The concept graphs for Examples 1 and 2 after the generalization of
property nodes are shown in Figure 4.12.
[Figure omitted: the concept graphs for Examples 1 and 2 with each property node annotated by its generalization-name (data-type or null-spec).]
Figure 4.12: Concept Graphs (Examples 1 and 2) After Generalization of Property Nodes
4.2.2.2 Generalization of Leaf Concept Nodes
Leaf concept nodes in a concept graph refer to those leaf nodes from which any refer-to
link originates. Foreign-Key in Figure 4.12 is an example of a leaf concept node. The
presence of associated refer-to links to other concept nodes (leaf or non-leaf) suggests
that entity-classes be created for leaf concept nodes. Since a leaf concept node is at the
leaf-node level of a concept graph, it does not have any constituent node beneath it. Therefore,
no attribute will be created for the entity-class corresponding to the leaf concept node.
The rules for the generalization of leaf concept nodes are listed and illustrated below.
For every leaf concept node C of each concept graph,
G-Rule 3: Leaf Concept Node -> Entity-Class
Set the generalization-name G of the leaf concept node C as its possible-generalization-name
(if available) or the name of the leaf concept node (i.e., C). If
there exists no entity-class in the model schema with the same name as G, then create
an entity-class with the name G in the model schema.
Example:
The leaf concept node of "Primary-Key" of Example 1 (as depicted in Figure 4.12) has
two refer-to links to the concept nodes A, of Tj and A2 of Tj, respectively. As illustrated
in the concept hierarchy creation step, the possible-generalization-name of the concept
node "Primary-Key" is "Primary-Key". According to G-Rule 3, the generalization-name
of this leaf concept node is assigned as "Primary-Key" and an entity-class Primary-Key
is created in the model schema. The concept graph for Example 2, as shown in Figure
4.12, consists of two leaf concept nodes: Primary-Key and Foreign-Key. Since there
already exists an entity-class Primary-Key in the model schema created previously, no
entity-class creation action will be performed for the leaf concept node "Primary-Key" of
Example 2. Similar to the actions taken for the leaf concept node of "Primary-Key" in
Example 1, the generalization-name of this leaf concept node is set to "Foreign-Key" and
an entity-class Foreign-Key is created in the model schema. The resulting concept
graphs and the model schema after the generalization of leaf concept nodes are shown in
Figure 4.13.
[Figure omitted: the concept graphs for Examples 1 and 2 with the leaf concept nodes Primary-Key and Foreign-Key generalized, and the model schema containing the entity-classes Primary-Key and Foreign-Key.]
Figure 4.13: Concept Graphs (Examples 1 and 2) and Model Schema After Generalization of Leaf Concept Nodes
4.2.2.3 Generalization of Non-Leaf Concept Nodes and Immediate Downward Has-a
Links
Once the generalizations have been performed on all leaf nodes (property nodes or leaf
concept nodes) of every concept graph, the generalization process moves up to the
non-leaf concept nodes and their immediate downward has-a links. The goals of this
generalization step are to identify each set of "similar" non-leaf concept nodes, which will be
generalized into an entity-class in the model schema, and to generalize their immediate
has-a links into aggregation relationships or attributes of entity-classes. As mentioned,
the similarity measurement for non-leaf concept nodes is based on their structural
decomposition. Formally stated, the similarity between two non-leaf concept nodes N_i
and N_j is defined as:
Similarity(N_i, N_j) = 1, if p(N_i) = p(N_j) (both initialized)
                     = 0, if p(N_i) ≠ p(N_j) (both initialized)
                     = (|G_i ∩ G_j| × |G_j ∩ G_i|) / (|C_i| × |C_j|), otherwise
where
p(N) is the possible-generalization-name of the concept node N,
C_k is the set of immediate constituent nodes of the concept node N_k,
G_k is the set of generalization-names of C_k (with null removal but
without duplicate elimination),
S ∩ T is the subset of S which appears in the intersection of S and T
(without duplicate elimination), and
|S| is the cardinality of the set S.
The choice of this similarity function can be justified as follows. If the
possible-generalization-names of two non-leaf concept nodes are identical, regardless of their
structural decomposition, they should be regarded as instances of the same model
construct; hence, the similarity between them is always 1. On the other hand, if their
possible-generalization-names are initialized in the concept hierarchy creation step but
are distinct, they are instances of different model constructs semantically; thus, the
similarity is always 0. Since N_i and N_j are non-leaf concept nodes, |C_i| and |C_j| > 0.
Moreover, 0 ≤ |G_i ∩ G_j| ≤ |C_i| and 0 ≤ |G_j ∩ G_i| ≤ |C_j|. The last portion of the similarity
function is therefore defined in [0, 1], and so is the similarity function.
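The structural similarity can be transcribed into Python as a sketch. The function name and the node fields (possible_generalization_name, generalization_name, children) are assumptions; the multiset intersection keeps duplicates, as the definition above requires.

def concept_similarity(ni, nj) -> float:
    pi, pj = ni.possible_generalization_name, nj.possible_generalization_name
    if pi is not None and pj is not None:
        return 1.0 if pi == pj else 0.0
    if not ni.children or not nj.children:          # both must be non-leaf concept nodes
        return 0.0
    gi = [c.generalization_name for c in ni.children if c.generalization_name]  # null removal
    gj = [c.generalization_name for c in nj.children if c.generalization_name]
    seti, setj = set(gi), set(gj)
    inter_ij = sum(1 for g in gi if g in setj)      # |Gi ∩ Gj|, duplicates kept
    inter_ji = sum(1 for g in gj if g in seti)      # |Gj ∩ Gi|, duplicates kept
    return (inter_ij * inter_ji) / (len(ni.children) * len(nj.children))

# For the P/Q example above this yields (3 * 2) / (4 * 3) = 0.5.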
Example:
Assume the possible-generalization-names of the non-leaf concept nodes A1 and A2 in
Example 1 (shown in Figure 4.12) are not set. The similarity between A1 and A2 is
computed as follows:
C_A1 = {char(15), not-null} => |C_A1| = 2.
C_A2 = {char(20), not-null} => |C_A2| = 2.
G_A1 = {data-type, null-spec}, G_A2 = {data-type, null-spec} =>
|G_A1 ∩ G_A2| = |{data-type, null-spec}| = 2 and |G_A2 ∩ G_A1| = |{data-type, null-spec}| = 2.
Therefore, Similarity(A1, A2) = (2×2) / (2×2) = 1, since the generalizations of the
constituent nodes of A1 and A2 are identical.
Example:
This example will demonstrate the situation when SnT
TnS. Assume a non-leaf
concept node P have constituent nodes pi, p2, P3, and P4. Their generalization-names are
gi, g2, g3, and g3, respectively. Another non-leaf concept node Q has three constituent
nodes qi (whose generalization-name is gi), q2 (which has not been generalized yet), and
q3 (whose generalization-name is gs). The Similarity(P, Q) is computed as follows:
P = {Pb P2> P3, P4}
=> |P|=4.
Q = {qb 02.03}
=> IQI = 3.
Gp = {gi, g2, g3, gs}. Gq = {g,, g3} => |Gp n GqI = I {g,, g3, g3} I = 3 and
|GQ n Gp| = I {gi, gs} I = 2.
Hence, Sunilarity(P, Q) = (3*2)-;-(4»3) = 0.5 < 1 since the generalization of the
constituent nodes of P is not identical to that of Q.
Given a set of non-leaf concept nodes and the similarity between any two nodes, defining
a set of similar non-leaf concept nodes with respect to the similarity threshold η is
essential. Assume C1, C2, ..., and Cn are non-leaf concept nodes. A similarity which is
greater than or equal to η between two nodes is highlighted by a solid line, while
similarities less than η are graphically represented as dashed lines, as shown in Figure
4.14 a). Based on this graphical notation, the determination of sets of similar non-leaf
concept nodes essentially is to partition the well-connected similarity graph into
subgraphs, each of which contains nodes reachable (via solid lines) directly or indirectly
from any other node in the subgraph and unreachable (via solid lines) from any node in
other subgraphs. Using this definition, the set of non-leaf concept nodes in Figure 4.14
a) is partitioned into two sets of similar non-leaf concept nodes: {C1, C2, C3, C4, C6} and
{C5}, shown in Figure 4.14 b).
[Figure omitted: (a) similarity between any two non-leaf concept nodes, with solid lines for similarity ≥ η and dashed lines for similarity < η; (b) the two resulting sets of similar non-leaf concept nodes.]
Figure 4.14: Example of Sets of Similar Non-Leaf Concept Nodes
It is obvious that the set of non-leaf concept nodes which are similar to a particular node
C_i with respect to η is the set of nodes in the well-connected similarity graph reachable
via solid lines directly or indirectly from C_i. Formally, the set of non-leaf concept nodes
S similar to C_i can be defined recursively as:
S = {C_j | Similarity(C_j, C_i) ≥ η, C_j ∈ U, C_j ≠ C_i} ∪
    {C_k | Similarity(C_k, C_j) ≥ η, C_k ∈ U, C_j ∈ S, C_k ≠ C_j}
where U is the universe (i.e., the set of all non-leaf concept nodes).
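Operationally, this recursive definition amounts to collecting every node reachable from C_i through similarity edges at or above the threshold. The following Python sketch does exactly that with a simple graph traversal; the function name and parameters are assumptions, and concept_similarity is the sketch given earlier.

def similar_set(ci, universe, eta):
    """Return the set of non-leaf concept nodes similar to ci (excluding ci itself)."""
    visited, frontier, result = {id(ci)}, [ci], []
    while frontier:
        current = frontier.pop()
        for other in universe:
            if id(other) not in visited and concept_similarity(other, current) >= eta:
                visited.add(id(other))
                result.append(other)
                frontier.append(other)
    return result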
Given the functions of computing the similarities of non-leaf concept nodes and defining
sets of similar non-leaf concept nodes, the generalization of non-leaf concept nodes and
immediate has-a links follows the algorithm described in Algorithm 4.1.
/* Input: All concept graphs G, model schema M, and
          property generalization hierarchies P
   Results: M with new entity-classes and aggregation relationships,
            G with the generalization-name of non-leaf concept nodes initialized, and
            P with new property generalization hierarchies and/or new property
            values added into existing property generalization hierarchies. */
Begin
  Add all non-leaf concept nodes in G into U.
  Sort U in accordance with the heights of non-leaf concept nodes in a descending
  order so as to perform the generalization of non-leaf concept nodes and
  has-a links in a bottom-up manner.
  Repeat
    > Construct the well-connected similarity graph from U.
    > Get the first non-leaf concept node Ci in U.
    > Find S as the union of Ci and the set of non-leaf concept nodes similar to Ci.
    > Apply the generalization rules G-Rule 4 to 10 (defined below) on S.
    > U = U - S   /* remove all of the non-leaf concept nodes in S from U */
  Until U is empty.
End.
Algorithm 4.1: Generalization of Non-Leaf Concept Nodes and Downward Has-a Links
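A Python rendering of Algorithm 4.1 as a sketch might look as follows. It reuses similar_set from above and assumes an all_nodes helper, a node.height attribute, and an apply_generalization_rules function implementing G-Rules 4 to 10; all of these are illustrative assumptions.

def generalize_non_leaf_nodes(concept_graphs, model_schema, hierarchies, eta=0.2):
    universe = [n for g in concept_graphs for n in all_nodes(g) if n.children]
    # bottom-up: process the deepest non-leaf concept nodes first
    universe.sort(key=lambda n: n.height, reverse=True)
    while universe:
        ci = universe[0]
        s = [ci] + similar_set(ci, universe, eta)
        apply_generalization_rules(s, model_schema, hierarchies)   # G-Rule 4 to 10
        s_ids = {id(n) for n in s}
        universe = [n for n in universe if id(n) not in s_ids]     # U = U - S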
For a set of similar non-leaf concept nodes S, the following generalization rules can be
applied:
G-Rule 4: Set of Similar Non-leaf Concept Nodes -> Entity-Class
Create an entity-class E in the model schema for S. If a possible-generalization-name
G exists in S, set the name of E as G; otherwise, set the name of E as the default
"construct_#" where # starts at 1 and increments by 1 every time it is employed. Set
the generalization-name of each non-leaf concept node in S as the name of E.
G-Rule 5: Identifier of Entity-Class
If |S| > 1, add an attribute (default name is "name") into E as the identifier of E. This
attribute is used to distinguish the non-leaf concept nodes in S from each other. The
null-specification, uniqueness and multiplicity properties of this attribute are set to
"not-null", "unique" and "single-valued", respectively.
G-Rule 6: Identifier of Entity-Class
If |S| = 1 and the immediate property nodes of the only non-leaf concept node N in S
contain one or more uninitialized generalization-name, add an attribute (default name
is "name") into E as the identifier of E. This attribute is used to distinguish the
property nodes of N, which may be instances of N, from each other. Same as in G-Rule
5, the null-specification, uniqueness, and multiplicity properties of this attribute
are set to "not-null", "unique" and "single-valued", respectively.
G-Rule 7: Generalized Property Node -> Attribute of Entity-Class
For each generalized property node P (i.e., the generalization-name of P has been set)
of every non-leaf concept node in S,
If there exists no attribute (in E where E is the entity-class for S) whose name is
the same as the generalization-name of P, add a new attribute A (whose name is
the same as the generalization-name of P) into E in the same order as that of
property nodes in all non-leaf concept nodes in S.
G-Rule 8: Ungeneralized Property Node —> Attribute of Entity-Class
For each ungeneralized property node P (i.e., the generalization-name of P has not
been set) of every non-leaf concept node in S,
If all of the sibling property nodes of P are ungeneralized and the number of
attributes of E (where E is the entity-class for S) is the same as the number of
immediate property nodes of S,
1) set the generalization-name of P as the name of the attribute of E whose
position in the attribute list of E is the same as P's position in the property
node list of S, and
2) create a new property value (with the same name as P) whose parent node
is a property concept which has the same name as the generalization-name
of P.
If all of the sibling property nodes of P are ungeneralized and the number of
attributes of E (where E is the entity-class for S) is different from the number of
immediate property nodes of S,
1) create a new property generalization hierarchy where the name of its root
node (i.e., property concept) is set as "property_#" where # starts at 1 and
increments by 1 every time a new property generalization hierarchy is
created,
2) add a new property value (with the same name as P) as the child node of
the root node of the newly created property generalization hierarchy,
3) set the generalization-name of P as the same name as the root node of the
newly created property generalization hierarchy, and
4) add a new attribute A (whose name is the same as the generalization-name
of P) into E in the same order as that of property nodes in all non-leaf
concept nodes in S.
If one of the left sibling property nodes of P has been generalized,
1) set L as the nearest generalized left sibling property node of P,
2) find the attribute A (from the attribute list of E) which is the f-th attribute to the right
of the attribute for L, where f is the distance between P and L (i.e., 1 + the
number of property nodes of S between P and L),
3) set the generalization-name of P as the same name as A, and
4) create a new property value (with the same name as P) whose parent node
is a property concept which has the same name as A.
If none of the left sibling property nodes of P has been generalized and one of
the right sibling property nodes of P has been generalized,
1) set L as the nearest generalized right sibling property node of P,
2) find the attribute A (from the attribute list of E) which is the f-th attribute to the left of
the attribute for L, where f is the distance between P and L (i.e., 1 + the
number of property nodes of S between P and L),
3) set the generalization-name of P as the same name as A, and
4) create a new property value (with the same name as P) whose parent node
is a property concept which has the same name as A.
G-Rule 9: Properties of Attributes of Entity-Class
For each attribute A,
1. If every non-leaf concept node in S has at least one property node corresponding
to A, set the null-specification property of A to "not-null", and "null" otherwise.
2. If the null-specification property of A is "not-null" and if the names of all
property nodes corresponding to A of all non-leaf concept nodes in S are all
different, set the uniqueness property of A to "unique", and "not-unique"
otherwise.
3. If every non-leaf concept node in S has at most one property node corresponding
to A, set the multiplicity property of A to "single-valued", and "multi-valued"
otherwise.
G-Rule 10: Immediate Downward Has-a Links —> Aggregation Relationship
If any of the non-leaf concept nodes in S has constituent concept node(s) (leaf or
non-leaf),
1. Create an aggregation relationship R in the model schema between E and the
entity-classes {E1, E2, ..., En} which are the union of the entity-classes
corresponding to all constituent concept nodes of all non-leaf concept nodes in S.
2. The default name of R is "consist_of_#" where # starts at 1 and increments every
time a new aggregation relationship is created.
3. The entity-classes E1, E2, ..., En are connected to R as component-classes, while E
is the aggregate-class of R.
4. For each Ei related to R:
The minimal cardinality of Ei with respect to E is the minimum of the numbers of
constituent concept nodes (whose generalization-name is Ei) of all non-leaf
concept nodes in S. The maximal cardinality of Ei with respect to E is "m" (for
many) if the number N of the constituent concept nodes (whose generalization-name
is Ei) of every non-leaf concept node in S is not identical; otherwise, the
maximal cardinality of Ei with respect to E is set as the number N.
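As a small illustration of G-Rule 9, the following Python sketch derives the null-specification, uniqueness, and multiplicity properties of one attribute from the property nodes of the similar non-leaf concept nodes in S. The function name and node fields are assumptions, and leaf children with a matching generalization-name are taken to be the attribute's property nodes.

def attribute_properties(s_nodes, attribute_name):
    """Return (null-spec, uniqueness, multiplicity) for the attribute per G-Rule 9."""
    per_node = []                       # names of the attribute's property nodes per concept node
    for node in s_nodes:
        per_node.append([c.name for c in node.children
                         if not c.children and c.generalization_name == attribute_name])
    counts = [len(names) for names in per_node]
    null_spec = "not-null" if all(c >= 1 for c in counts) else "null"
    all_names = [name for names in per_node for name in names]
    unique = (null_spec == "not-null" and len(all_names) == len(set(all_names)))
    multiplicity = "single-valued" if all(c <= 1 for c in counts) else "multi-valued"
    return null_spec, ("unique" if unique else "not-unique"), multiplicity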
Example:
Referring to Examples 1 and 2 in Figure 4.13, U after sorting by height is {T1.A1, T1.A2,
T1.A3, T2.A1, T2.A2, T2.B1, T2.B2, T1, T2}. Prefixing a non-leaf concept node (e.g., A1)
with the name of its composite node (e.g., T1) is only for illustrative and
representational clarity and convenience. The well-connected similarity graph for U is
shown in Figure 4.15 (similarity links with zero values are not shown in the diagram). It
is obvious that the similarity between any pair of nodes in the upper part of Figure 4.15
is 1 because the generalizations of their constituent nodes are identical (i.e., data-type
and null-spec). In addition, Similarity(T1, T2) = 1 since their possible-generalization-names
are the same.
[Figure omitted: the well-connected similarity graph; all shown similarity links have value 1.]
Figure 4.15: Similarity Graph for All Non-Leaf Concept Nodes of Examples 1 and 2
Assume the similarity threshold η is 0.2. S (the set of non-leaf concept nodes similar to
the first non-leaf concept node T1.A1) includes T1.A1, T1.A2, T1.A3, T2.A1, T2.A2, T2.B1,
and T2.B2, as shown in Figure 4.15. According to G-Rule 4 and 5, an entity-class
construct_1 with the name attribute as its identifier is created in the model schema and
the generalization-name of each non-leaf concept node in S is set as "construct_1". G-Rule
7 adds two attributes into the entity-class construct_1: data-type and null-spec.
According to G-Rule 9, the null-specification (and multiplicity) property of the data-type
attribute of construct_1 is set to "not-null" (and "single-valued") because the data-type
property node appears at least once (and at most once) as a constituent node of each
non-leaf concept node in S. The uniqueness property of the data-type attribute is set to
"not-unique" since some of the data-type property nodes of the non-leaf concept nodes in
S share the same name (e.g., char(15) is shared by T1.A1 and T2.B1). Similarly, the
null-specification, uniqueness, and multiplicity properties of the null-spec attribute of
construct_1 can then be determined by applying G-Rule 9. Since none of the non-leaf
concept nodes in S has any constituent concept node, G-Rule 10 is not applicable.
Subsequently, as depicted in Algorithm 4.1, U = U - S is executed and the next
generalization iteration starts. Now, U contains only T1 and T2, and Similarity(T1, T2)
again is 1 since the possible-generalization-names of T1 and T2 are the same. Since
Similarity(T1, T2) is greater than the similarity threshold, S = {T1, T2}. According to
G-Rule 4 and 5, an entity-class called "Table" (i.e., the possible-generalization-name of T1
and T2) with the name attribute as its identifier is created in the model schema and the
generalization-name of T1 and T2 is set as "Table". Because neither T1 nor T2 has an
immediate constituent property node, G-Rule 7 cannot be applied. On the other hand, T1
has four immediate constituent concept nodes, which correspond to the entity-classes
construct_1 and Primary-Key, while the six immediate constituent concept
nodes of T2 correspond to the entity-classes construct_1, Primary-Key, and Foreign-Key.
Thus, G-Rule 10 is applied, resulting in the creation of an aggregation relationship
consist_of_1 linking the entity-class Table (as the aggregate-class) to the entity-classes
construct_1, Primary-Key, and Foreign-Key (as the component-classes) in the model
schema. The minimal cardinality of construct_1 with respect to Table is 3, which is the
minimum of 3 (T1 has three immediate constituent concept nodes A1, A2, and A3 which
are generalized into construct_1) and 4 (A1, A2, B1, and B2 of T2). The maximal
cardinality of construct_1 with respect to Table is "m" because T1 and T2 have different
numbers of immediate constituent concept nodes corresponding to construct_1 (3 and 4,
respectively). The minimal and maximal cardinalities of Primary-Key and Foreign-Key
are determined likewise. After the determination of cardinalities for component-classes
of consist_of_1, T1 and T2 are removed from U, which now becomes an empty set. The
generalization of non-leaf concept nodes and has-a links is considered completed. The
resulting concept graphs (for Examples 1 and 2) and the model schema are shown in
Figure 4.16.
[Figure omitted: the concept graphs for Examples 1 and 2 with generalization-names assigned, and the model schema containing the entity-classes Table, Construct_1, Primary-Key, and Foreign-Key connected by the aggregation relationship consist_of_1, with attributes Table.name (not-null, unique, single-valued), Construct_1.name (not-null, unique, single-valued), Construct_1.data-type (not-null, not-unique, single-valued), and Construct_1.null-spec (not-null, not-unique, single-valued).]
Figure 4.16: Concept Graphs (Examples 1 and 2) and Model Schema After Generalization of Non-Leaf Concept Nodes and Has-a Links
4.2.2.4 Generalization of Refer-to Links
By now, the remaining ungeneralized parts of the concept graphs are the refer-to links. As
shown in Figure 4.16, Primary-Key has refer-to links to A1 and to A2 of T1. At the meta
level, these two refer-to links signify that the model construct Primary-Key relates to the
model construct construct_1; hence, an association relationship should be constructed
between them in the model schema.
Semantically, refer-to links denote association or specialization relationships between the
entity-classes created in the previous steps. Whether a refer-to link is generalized into an
association or specialization relationship depends on whether the entity-classes
corresponding to the origin and to the destination concept nodes of the refer-to link are
the same or different. If a refer-to link connects concept nodes corresponding to different
entity-classes, as illustrated previously, it will be generalized into an association
relationship because a model construct (i.e., entity-class instance in the model schema)
usually is not a superclass or subclass of a different model construct. On the contrary, if
a refer-to link connects concept nodes corresponding to the same entity-class, the origin
concept node usually is the subclass of the destination concept node. In an OO data
model, for example, a class C is specified as a subclass of another class S. When
constructing concept graphs for this example, the concept node for C refers to the
concept node for S. Since C and S are instantiations of the same model construct, their
corresponding entity-class E should be the same. The refer-to link from C to S denotes a
specialization relationship in which E for the origin concept node C is the subclass and E
for the destination concept node S is the superclass.
Furthermore, if a concept node (whose corresponding entity-class is C) contains a set of
refer-to links R whose destination concept nodes correspond to the same entity-class E in
the model schema, a decision needs to be made regarding whether R should be generalized into
one or more relationships in the model schema between C and E. The answer is
contingent first on the type of relationship into which R should be generalized and
secondly on whether all of the destination concept nodes of R are in the same or different
concept graph. If specialization relationship is an appropriate type for R (i.e., C is the
same as E), only one specialization relationship should be created for R, because all
specialization relationships imply identical semantic meaning and interpretation
(superclass-and-subclass).
If there is more than one specialization relationship
between two entity-classes, only one of them is significant and the rest are
redundant and therefore should not exist. Unlike the decision on the number of
specialization relationships for R, the semantic correctness of the permissible coexistence
of multiple association relationships with distinct meanings between two entity-classes
of multiple association relationships with distinct meanings between two entity-classes
requires additional consideration for the decision on the number of association
relationships for R. Different concept graphs referenced by a concept node assume
different roles in defining this concept node, as discussed in Section 4.2.1.3. Thus, even
though all destination concept nodes of R correspond to the same entity-class, the
subset of R which ends at one concept graph should not be considered the same as
another subset of R ending at different concept graphs. Accordingly, when association
relationship is the type for R (i.e., C differs from E), if all of the destination concept
nodes of R are in the same concept graph, one association relationship is sufficient for
generalizing R; otherwise, the number of association relationships for R is the number of
different concept graphs being referenced by R.
Since the coexistence of multiple association relationships with distinct meanings
between two entity-classes is unusual in model schemas, the above-mentioned
consideration for the decision on the number of association relationships for R will not
be taken into account in the following rules for the generalization of refer-to links.
Accordingly, the process of and the rules for the generalization of refer-to links are defined as
follows:
For each refer-to link,
G-Rule 11: Refer-to Link -> Association Relationship
If the entity-classes E1 and E2 corresponding to the origin and the destination concept
nodes of the refer-to link L are different and if there exists no association relationship
whose first associated-class is E1 and second associated-class is E2 in the model
schema, then
1. Create an association relationship R in the model schema for L. The default
name for R is "relate_#" where # starts at 1 and increments by 1 every time an
association relationship is created in the model schema.
2. R connects E1 (the first associated-class of R) with E2 (the second associated-class of R).
3. The minimal cardinality of E2 on R is the minimum number of the concept nodes
of E2 (i.e., those concept nodes whose generalization-names are E2) referred to by
each concept node of E1 of every concept graph. The minimal cardinality of E1
on R is the minimum number of the concept nodes of E1 referring to each concept
node of E2 on every concept graph.
4. The maximal cardinality of E2 on R is "m" (for many) if the number N of the
concept nodes of E2 referred to by each concept node of E1 of every concept graph
is not identical; otherwise, the maximal cardinality of E2 on R is set to the
number N. Similarly, the maximal cardinality of E1 on R is "m" if the number N
of the concept nodes of E1 referring to each concept node of E2 of every concept
graph is not identical; otherwise, the maximal cardinality of E1 on R is set as N.
G-Rule 12: Refer-to Link -> Specialization Relationship
If the entity-classes corresponding to the origin and the destination concept nodes of
the refer-to link L are the same (say E) and if there exists no specialization
relationship whose superclass and subclass are E in the model schema, then
1. Create a specialization relationship S in the model schema for L. The default
name for S is "specialize_#" where # starts at 1 and increments by 1 every time a
specialization relationship is created in the model schema.
2. S connects E (as the superclass) with itself (as the subclass).
3. The minimal cardinality of the subclass of S is the minimum number of the
concept nodes of E referring to each concept node of E of every concept graph.
4. The maximal cardinality of the subclass of S is "m" if the number N of the concept
nodes of E referring to each concept node of E of every concept graph is not
identical; otherwise, the maximal cardinality of the subclass of S is set as the
number N.
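The essential choices made by G-Rules 11 and 12 can be sketched in Python as follows: the relationship type is decided by comparing the entity-classes of a link's origin and destination, and the destination-side cardinalities are derived from reference counts. The helpers entity_class_of and all_nodes, and the link fields, are illustrative assumptions.

def relationship_type(link):
    origin_ec = entity_class_of(link.origin)
    dest_ec = entity_class_of(link.destination)
    return "specialization" if origin_ec == dest_ec else "association"

def destination_cardinality(origin_ec, dest_ec, concept_graphs):
    """(min, max) number of dest_ec concept nodes referenced by each origin_ec node."""
    counts = []
    for graph in concept_graphs:
        for node in all_nodes(graph):
            if entity_class_of(node) == origin_ec:
                counts.append(sum(1 for t in node.refer_to
                                  if entity_class_of(t) == dest_ec))
    minimum = min(counts) if counts else 0
    maximum = counts[0] if counts and len(set(counts)) == 1 else "m"
    return minimum, maximum

# For Primary-Key referencing construct_1 in the example below, counts are 2 and 1,
# so the minimal cardinality is 1 and the maximal cardinality is "m".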
Example:
Assume the first refer-to link in the concept graphs for Examples 1 and 2 (as shown in
Figure 4.16) is from Primary-Key to A1. Since the entity-classes corresponding to the
origin and to the destination concept nodes of this refer-to link are Primary-Key and
construct_1, which are different, G-Rule 11 is applied. As a result, an association
relationship relate_1 which connects Primary-Key with construct_1 is created in the
model schema. In Example 1, the number of the concept nodes of construct_1 referred
to by the concept node of Primary-Key is 2, while it is 1 in Example 2. Thus, the minimal
and the maximal cardinality of construct_1 on relate_1 are 1 and "m", respectively. On
the other hand, since not all of the concept nodes of construct_1 are referred to by the
concept node Primary-Key in both examples, the minimal cardinality of Primary-Key on
relate_1 is 0. Moreover, there is at most one refer-to link from a concept node of Primary-Key
to a concept node of construct_1 in both examples; thus, the maximal cardinality of
Primary-Key on relate_1 is 1.
After the association relationship relate_1 is created for the refer-to link from the concept
node of Primary-Key to the concept node of construct_1, no action needs to be taken for
the refer-to links from Primary-Key to A2 (in Example 1) and from Primary-Key to A1
(in Example 2). The concept node Foreign-Key (in Example 2) has three refer-to
links: two to concept nodes of construct_1 and one to the concept node of Primary-Key.
Therefore, two association relationships, relate_2 (connecting Foreign-Key with
construct_1) and relate_3 (connecting Foreign-Key with Primary-Key), will be created in
the model schema. The minimal and maximal cardinalities of each of the two association
relationships can be determined using G-Rule 11. The resulting model schema after the
generalization of refer-to links is shown in Figure 4.17.
[Figure 4.17 shows the model schema after generalization of refer-to links: the entity-classes Table, Construct_1, Primary-Key, and Foreign-Key, related through consist_of_1 and the association relationships relate_1, relate_2, and relate_3, with their cardinality annotations. Attribute annotations: Table.name (not-null, unique, single-valued); Construct_1.name (not-null, unique, single-valued); Construct_1.data-type (not-null, not-unique, single-valued); Construct_1.null-spec (not-null, not-unique, single-valued).]
Figure 4.17: Model Schema After Generalization of Refer-to Links
The generalization of refer-to links step concludes the concept generalization phase. In
other words, induction and abstraction of model constructs from the training examples
are achieved.
By substituting the entity-class Construct_1 in Figure 4.17 with a semantically meaningful name such as "Attribute", the induced model constructs demonstrate the key concepts of the relational data model and can be explained as follows:
A table in the relational data model consists of three or more attributes (i.e., Construct_1 in the diagram), one and only one primary key, and zero or one foreign key. Each primary key is defined on (i.e., relate_1) one or many attributes, but an attribute can be associated with at most one primary key. On the other hand, a foreign key refers to (i.e., relate_3) one and only one primary key, while a primary key is referenced by zero or one foreign key. Finally, a foreign key is composed of (i.e., relate_2) two attributes, and an attribute can be included in at most one foreign key.
The induced model constructs and the relationships among them conform to the general definition of the relational data model. However, some cardinalities are problematic. For example, the minimal cardinality of the component-class Attribute on consist_of_1 and the maximal cardinality of the component-class Foreign-Key on consist_of_1 are not correct with respect to the general definition of the relational data model, but are accurate with respect to the training examples employed. Such incorrect cardinalities result from the limited representativeness of the training examples and can be corrected when more representative training examples are included in the induction process.
4.2.3 Constraint Generation Phase
This phase generates the explicit model constraints and implicit model constraints pertaining to the model constructs induced from training examples. Two major types of explicit model constraints are considered in this phase: domain and inclusion model constraints. A domain constraint specifies the possible values an attribute can have [EN94], while an inclusion constraint specifies an inclusion restriction between an association relationship connecting component-classes in an aggregation relationship and this aggregation relationship. Taking the relational data model as an example, the attribute(s) serving as the primary key of a table are attributes of the same table (i.e., the primary key attributes of a table are a subset of the attributes of the table on which the primary key is defined). Although domain and inclusion model constraints are not inherent to the metamodel constructs, the training examples from which the model constructs are generalized afford the possibility of inducing them. On the other hand, implicit model constraints are instantiations of the metamodel semantics triggered by the instantiations of metamodel constructs into model constructs (as depicted in Figure 3.5). The algorithm for instantiating implicit model constraints from the metamodel semantics was specified in Algorithm 3.1.
During the generalization of property nodes step, attributes of entity-classes in the model schema are generalized from property nodes in concept graphs, and the property generalization hierarchies are updated accordingly. Therefore, the possible values an attribute can take are all leaf nodes of the property generalization hierarchy to which the attribute corresponds (i.e., all property nodes from which the attribute is generalized); the root node of that property generalization hierarchy carries the same name as the attribute. However, incorporating all leaf nodes of a property generalization hierarchy into a domain constraint may cause an over-specialization problem. For example, assume an attribute corresponds to the property generalization hierarchy of data type and some of its leaf nodes include char(5), char(10), char(15), etc. If all of the leaf nodes in the property generalization hierarchy are included in the domain constraint of this attribute, the domain constraint will list many values for character strings of different lengths (e.g., char(5), char(10), char(15), etc.). If char(5), char(10), char(15), etc. can be substituted by the property value char(n), where n > 0, the over-specialization problem can be alleviated. A solution toward this end is as follows. If a set of leaf nodes in a property generalization hierarchy shares the same non-root parent node, that non-root parent node (in place of the set of child leaf nodes) is included in the domain constraint. For those leaf nodes whose parents are the root nodes of property generalization hierarchies, no further generalization is possible; thus, they appear in domain constraints as they are.
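The leaf-node generalization just described can be sketched as follows. The hierarchy representation (a dictionary mapping each node to its parent, with the root mapped to None) is an assumption made only for this sketch.

```python
# Sketch of the leaf-node generalization described above: leaves sharing a
# non-root parent are replaced by that parent in the domain value set; leaves
# whose parent is the root are kept as they are.

def domain_value_set(parent_of, leaves):
    values = set()
    root = next(n for n, p in parent_of.items() if p is None)
    for leaf in leaves:
        parent = parent_of[leaf]
        values.add(leaf if parent == root else parent)  # generalize upward
    return values

# Example: char(5), char(10), char(15) share the non-root parent char(n), so
# the induced domain becomes {int, char(n), real} rather than listing every
# character-string length separately.
hierarchy = {"data-type": None,
             "int": "data-type", "real": "data-type", "char(n)": "data-type",
             "char(5)": "char(n)", "char(10)": "char(n)", "char(15)": "char(n)"}
print(domain_value_set(hierarchy,
                       ["int", "real", "char(5)", "char(10)", "char(15)"]))
```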
Inclusion constraints can be induced from training examples by checking whether all refer-to links from which an association relationship is generalized are intra-concept-graph refer-to links. If so, an inclusion constraint is created on the entity-class that is the lowest aggregate-class, in the aggregation hierarchy, of the entity-classes related by the association relationship. The discussion of the generic form of inclusion constraints is deferred until Constraint Generation Rule 2 (CG-Rule 2).
Accordingly, the constraint generation phase includes explicit model constraint induction and inherent model constraint instantiation. Its algorithm is summarized in Algorithm 4.2, followed by the constraint generation rules employed in the algorithm.
/* Input:  Model schema M, metamodel semantics S, and
           property generalization hierarchies P.
   Result: M with explicit and implicit model constraints */
Begin
  For each model construct E of M
    For each attribute A of E
      Apply constraint generation rule CG-Rule 1 on A.
    If E is an entity-class
      For each metamodel semantics T in the metamodel construct Class of S
        Instantiate(T, E).  /* see Algorithm 3.1 */
    If E is an association relationship
      Apply constraint generation rule CG-Rule 2 on E.
      For each metamodel semantics T in the metamodel construct Association of S
        Instantiate(T, E).
    If E is an aggregation relationship
      For each metamodel semantics T in the metamodel construct Aggregation of S
        Instantiate(T, E).
    If E is a specialization relationship
      For each metamodel semantics T in the metamodel construct Specialization of S
        Instantiate(T, E).
End.
Algorithm 4.2: Constraint Generalization Process
Constraint Generation Rule 1 (CG-Rule 1): Property Generalization Hierarchy ->
Domain Constraint
If the attribute A of the model construct E is not the name attribute, create the following domain constraint in E:
    ∀a: e_name ⇒ a.a_name ∈ value-set
where e_name is the name of E,
      a_name is the name of A, and
      value-set is the collection of (1) the leaf nodes whose parents are the root node of the property generalization hierarchy to which A belongs and (2) the immediate parents of the leaf nodes whose parents are not the root node of the property generalization hierarchy.
CG-Rule 2: Intra-structural Concept Graph Refer-to Links -> Inclusion Constraint
If all refer-to links from which the association relationship R is generalized (R connects the entity-classes S and T) are intra-concept-graph refer-to links, create the following inclusion constraint in the lowest common aggregate-class C of S and T:
    ∀c: cname, ∀e: c.sname, A = e.rname.tname ⇒ A ⊆ c.tname
where cname is the name of the entity-class C,
      sname is the name of the entity-class S,
      tname is the name of the entity-class T, and
      rname is the name of the association relationship R.
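A minimal sketch of this rule is given below. The data shapes (aggregate_of mapping an entity-class to its aggregate-class, and each refer-to link recorded as a pair of concept-graph ids) are assumptions introduced only for the sketch; the emitted constraint is written with plain-text keywords rather than the symbols used in the rule.

```python
# Sketch of CG-Rule 2: if every refer-to link behind the association
# relationship stays inside a single concept graph, emit an inclusion
# constraint in the lowest common aggregate-class of the related entity-classes.

def lowest_common_aggregate(aggregate_of, s, t):
    ancestors = []
    while s is not None:
        ancestors.append(s)
        s = aggregate_of.get(s)
    while t is not None and t not in ancestors:
        t = aggregate_of.get(t)
    return t

def cg_rule_2(relationship, links, aggregate_of):
    # links: [(origin_graph_id, destination_graph_id), ...] for relationship R
    if not all(og == dg for og, dg in links):
        return None                      # some link crosses concept graphs
    c = lowest_common_aggregate(aggregate_of,
                                relationship["source"], relationship["target"])
    return ("forall c: %s, forall e: c.%s, A = e.%s.%s => A subset-of c.%s"
            % (c, relationship["source"], relationship["name"],
               relationship["target"], relationship["target"]))

# Mirrors the example in the text: relate_1 relates Primary-Key and
# construct_1, both aggregated into Table, with only intra-graph links.
aggregate_of = {"Primary-Key": "Table", "construct_1": "Table", "Table": None}
print(cg_rule_2({"name": "relate_1", "source": "Primary-Key",
                 "target": "construct_1"},
                [(1, 1), (2, 2)], aggregate_of))
```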
Example:
Since two non-name attributes exist in the entity-class construct_1, two domain constraints are created and encapsulated in the entity-class construct_1 (according to CG-Rule 1):
    ∀a: construct_1 ⇒ a.data-type ∈ {int, char(n), real}
    ∀a: construct_1 ⇒ a.null-spec ∈ {not-null, null}
An examination of the concept graphs for Examples 1 and 2 (as shown in the upper portion of Figure 4.16) indicates that all of the refer-to links originating from Primary-Key are intra-concept-graph refer-to links. According to CG-Rule 2, an inclusion constraint (as shown below) is generated and encapsulated in the entity-class Table, which is the lowest common aggregate-class of the entity-classes for the origin and destination concept nodes of these refer-to links (i.e., Primary-Key and construct_1 in the model schema). Analogously, the intra-concept-graph refer-to links from Foreign-Key to B1 of T2 and B2 of T2 also suggest that an inclusion constraint be created in the entity-class Table. These two inclusion constraints are listed below:
    ∀c: Table, ∀e: c.Primary-Key, A = e.relate_1.construct_1 ⇒ A ⊆ c.construct_1
    ∀c: Table, ∀e: c.Foreign-Key, A = e.relate_2.construct_1 ⇒ A ⊆ c.construct_1
By applying Algorithm 3.1, nineteen inherent model constraints are instantiated from the
implicit metamodel constraints of the metamodel schema. For example, the inherent model constraints 5) and 6) in Table 4.3 state that the names of tables must be unique and that each table must have a name. The explicit model constraints induced from
examples and the inherent model constraints instantiated from the implicit metamodel
constraints in the constraint generation phase are listed in Table 4.3.
Domain Constraints:
1. ∀a: construct_1 ⇒ a.data-type ∈ {int, char(n), real}
2. ∀a: construct_1 ⇒ a.null-spec ∈ {null, not-null}
Inclusion Constraints:
1. ∀c: Table, ∀e: c.Primary-Key, A = e.relate_1.construct_1 ⇒ A ⊆ c.construct_1
2. ∀c: Table, ∀e: c.Foreign-Key, A = e.relate_2.construct_1 ⇒ A ⊆ c.construct_1
Inherent Model Constraints:
1. ∀o1: construct_1, ∀o2: construct_1, o1 ≠ o2 ⇒ o1.name ≠ o2.name
2. ∀o1: construct_1 ⇒ count(o1.name) = 1
3. ∀o1: construct_1 ⇒ count(o1.data-type) = 1
4. ∀o1: construct_1 ⇒ count(o1.null-spec) = 1
5. ∀o1: Table, ∀o2: Table, o1 ≠ o2 ⇒ o1.name ≠ o2.name
6. ∀o1: Table ⇒ count(o1.name) = 1
7. ∀o1: Table ⇒ count(o1->consist_of_1->construct_1) ≥ 3
8. ∀o1: Table ⇒ count(o1->consist_of_1->Primary-Key) = 1
9. ∀o1: Table ⇒ count(o1->consist_of_1->Foreign-Key) ≥ 0
10. ∀o1: Table ⇒ count(o1->consist_of_1->Foreign-Key) ≤ 1
11. ∀o1: Primary-Key ⇒ count(o1->relate_1->construct_1) ≥ 1
12. ∀o1: construct_1 ⇒ count(o1->relate_1->Primary-Key) ≥ 0
13. ∀o1: construct_1 ⇒ count(o1->relate_1->Primary-Key) ≤ 1
14. ∀o1: Foreign-Key ⇒ count(o1->relate_2->construct_1) = 2
15. ∀o1: construct_1 ⇒ count(o1->relate_2->Foreign-Key) ≥ 0
16. ∀o1: construct_1 ⇒ count(o1->relate_2->Foreign-Key) ≤ 1
17. ∀o1: Foreign-Key ⇒ count(o1->relate_3->Primary-Key) = 1
18. ∀o1: Primary-Key ⇒ count(o1->relate_3->Foreign-Key) ≥ 0
19. ∀o1: Primary-Key ⇒ count(o1->relate_3->Foreign-Key) ≤ 1
Table 4.3: Model Constraints (Examples 1 and 2) After Constraint Generation
4.3 Time Complexity Analysis of Abstraction Induction Technique
The time complexity of each step in Abstraction Induction Technique for inductive
metamodeling is determined by a number of factors, including 1) the number of training
examples, 2) the number of property nodes on a concept graph, 3) the number of leaf
concept nodes on a concept graph, 4) the number of non-leaf concept nodes on a concept
graph, 5) the number of refer-to links on a concept node, 6) the number of property
hierarchies, 7) the number of property values on a property hierarchy, 8) the number of model constructs in the model schema being induced, 9) the number of relationships among model constructs, and 10) the number of attributes of a model construct. Incorporating all of the above-mentioned factors into the time complexity analysis would bury the most important factors in complicated order-of-magnitude functions. Thus, only the most important factors are selected for this time complexity analysis,
including the number of training examples (n), the number of model constructs in the
model schema being induced (m), and the number of relationships among model
constructs (r). The time complexity of Abstraction Induction Technique is summarized in Table 4.4. The order of magnitude of the concept hierarchy enhancement step is O(n²) because all concept hierarchies (non-leaf nodes) must be examined for every leaf node in each concept hierarchy to detect potential reference relationships. The need to construct a well-connected similarity graph over all non-leaf concept nodes of all concept graphs makes the order of magnitude of the generalization of non-leaf concept nodes O(n²) as well. The orders of magnitude of the remaining steps in Abstraction Induction Technique are straightforward and will not be explained further.
Phase                     Step                                       Order of Magnitude
Concept Decomposition     Concept hierarchy creation                 O(n)
                          Concept hierarchy enhancement              O(n²)
                          Concept graph merging                      O(n)
                          Concept graph pruning                      O(n)
Concept Generalization    Generalization of property nodes           O(n)
                          Generalization of leaf concept nodes       O(nm)
                          Generalization of non-leaf concept nodes   O(n²)
                          Generalization of refer-to links           O(nr)
Constraint Generation     (entire phase)                             O(m+r)
Table 4.4: Time Complexity of Each Step in Abstraction Induction Technique
The overall time complexity of Abstraction Induction Technique is O(nm + n² + nr). If the number of training examples is the factor of greatest concern, the overall time complexity of Abstraction Induction Technique is O(n²).
4.4 Evaluation of Abstraction Induction Technique
In order to evaluate Abstraction Induction Technique for inductive metamodeling, a
prototype of this technique has been implemented. Three evaluation studies will be
conducted in this section. The model schema induced from each of the studies will be
compared with a reference model schema corresponding to the data model employed in
the study. The reference model schema conforms to the general definition of the data
model and is engineered by an expert model engineer. An inconsistency between these
two model schemas can be classified into one of the following categories (a small classification sketch follows the list):
1. Incorrect: A specification in the induced model schema appears in the reference model schema, but its details are not identical to those of its counterpart in the reference model schema.
2. Missing: A specification in the reference model schema cannot be found in the
induced model schema.
3. Excessive: A specification in the induced model schema cannot be found in the
reference model schema.
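This three-way classification can be sketched in a few lines. The assumption here, made only for illustration, is that each model schema has been flattened into a dictionary from a specification identifier to its details.

```python
# A small sketch of the incorrect / missing / excessive classification above.

def classify_inconsistencies(induced, reference):
    incorrect = [k for k in induced if k in reference and induced[k] != reference[k]]
    missing = [k for k in reference if k not in induced]
    excessive = [k for k in induced if k not in reference]
    return incorrect, missing, excessive

print(classify_inconsistencies(
    {"Table.min_card": 2, "Attribute.name": "ok", "DataType.domain": "..."},
    {"Table.min_card": 1, "Attribute.name": "ok", "Attribute.IsUnique": "ok"}))
# (['Table.min_card'], ['Attribute.IsUnique'], ['DataType.domain'])
```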
Evaluation Study 1: Inducing Relational Model Schema From A University Health-Center Database
The first evaluation study is based on a relational database for a university health center.
The health-center database is mainly for maintaining patient and diagnosis information.
It consists of six relational tables: Patient, Doctor, Drugs, Treatment, Dispense, and
Specialties. These relational tables, which serve as training examples for inducing the relational model schema, are represented in Structured Query Language (i.e., the DDL of the relational data model). The detailed inputs of and output from Abstraction Induction Technique are listed in Appendix D. The induced relational model schema is graphically shown in Figure 4.18. The model construct "Construct_1" is in fact the model construct "Attribute" in relational data model terminology. The reference relational
model schema which conforms to the general definition of a relational data model is
shown in Appendix B and Figure 3.9.
[Figure 4.18 depicts the induced relational model schema: the entity-classes Table, Construct_1, Primary-Key, and Foreign-Key, related through consist_of_1 and the association relationships relate_1, relate_2, and relate_3, with their cardinality annotations. Attribute annotations: Table.name (not-null, unique, single-valued); Construct_1.name (not-null, unique, single-valued); Construct_1.data-type (not-null, not-unique, single-valued); Construct_1.null-spec (not-null, not-unique, single-valued).]
Figure 4.18: Relational Model Schema Induced from University Health Center Database
The summary of this evaluation study is provided in Table 4.5. The induced relational
model schema is consistent with the reference relational model schema in terms of model
constructs and their relationships but contains inconsistencies in the remaining
components of the induced model schema.
It consists of two incorrect cardinality specifications. For example, by the definition of the relational data model, a table contains one or more attributes. However, since all training examples happen to have two or more attributes (i.e., Construct_1), the minimal cardinality on the model construct "Construct_1" in relation to the model construct "Table" is over-specialized as 2. As mentioned at the end of the concept generalization phase in Section 4.2.2.4, this over-specialization problem results from the limited representativeness of the training examples employed in this evaluation study. Thus, when more representative training examples are provided, such incorrectness will not be present in the induced model
schema. On the other hand, two attributes (i.e., IsUnique and DefaultValue) of the
model construct "Attribute" in the reference model schema are missing in the induced
model schema. This may be due to the limited representativeness of the training examples or to the lack of these specifications in the data model with which the training examples were defined. In the former case, the incompleteness of the induced model schema as compared to the reference one should be treated as an error, while it is not an error in the latter case. All
inconsistencies in implicit model constraints can be attributed to the incorrect
cardinalities and the incomplete attributes of model constructs in the induced model
schema. The induced model schema contains an excessive explicit model constraint (the
domain constraint for DataType). Its absence in the reference relational model schema is
because this explicit model constraint is system-dependent (varying from one DBMS to
another) while the reference model schema is not. Thus, it should not be considered an
error in the induced model schema.
Induced Relational Model Schema (statistics; description/causes):
  Model Construct: 4; perfect match.
  Relationship between Model Constructs: 4; perfect match.
  Cardinality: 18 (incorrect: 2); the minimal cardinality between Table and Construct_1 and the maximal cardinality on Construct_1 to Foreign-Key.
  Attribute of Model Construct: 4 (missing: 2); IsUnique and DefaultValue are missing in the Attribute model construct.
  Explicit Model Constraint: 4 (excessive: 1); the domain constraint for DataType is not shown in the reference relational model schema.
  Implicit Model Constraint: 17 (incorrect: 2; missing: 3); the incorrect constraints resulted from incorrect cardinalities and the missing constraints from missing attributes.
Table 4.5: Summary of Evaluation Study 1 (Relational Model Schema)
The precision rates of this evaluation study, if all of the specifications in a model schema
are equally weighted, are shown in Table 4.6. As shown in the table, the worst precision
rate of this evaluation study is 82.35% (the number of consistent specifications divided
by the total number of specifications) when missing attributes are considered as
erroneous induction results and inconsistent implicit model constraints are treated as errors independent of the incorrect cardinalities and incomplete attributes. However, it is more reasonable to exclude inconsistent implicit model constraints from the computation of the precision rate because they can be attributed to, and are derived from, incorrect cardinalities and incomplete attributes of model constructs in the induced model schema. Thus, the precision rates of the evaluation study are 88.24% (when missing attributes are treated as erroneous induction results) and 94.12% (when they are not). According to these precision rates, it can be concluded that Abstraction Induction Technique produces a satisfactory relational model schema in evaluation study 1.
                                     Implicit Model Constraints    Implicit Model Constraints
                                     Are Included                  Are Not Included
Missing Attributes Are Errors        82.35%                        88.24%
Missing Attributes Are Not Errors    90.20%                        94.12%
Table 4.6: Precision Rates of Abstraction Induction Technique in Evaluation Study 1
Evaluation Study 2: Inducing Network Model Schema From A Hypothetical
Company Database
Training examples in the second evaluation study are extracted from a hypothetical
company database schema (in the network data model) depicted in [EN94].
This company database maintains information concerning employees, their departments and supervisors, projects they are working on, and their dependents. The external representation of the training examples is a CODASYL DBTG DDL-like language. The detailed inputs of and output from Abstraction Induction Technique are listed in Appendix E. The reference network model schema is graphically depicted in Figure 4.19, and the network model schema induced from the training examples in the hypothetical company database is shown in Figure 4.20.
[Figure 4.19 depicts the reference network model schema: the entity-classes Record, Set, Attribute (with name and data-type), and Not-Duplicate, related through consist_of and refer_by relationships with their cardinality annotations. Attribute annotations: Record.name (not-null, unique, single-valued); Set.name (not-null, unique, single-valued); Attribute.name (not-null, unique, single-valued); Attribute.data-type (not-null, not-unique, single-valued).]
Figure 4.19: Reference Network Model Schema
[Figure 4.20 depicts the induced network model schema: the entity-classes Record, Set, Construct_1, and Not-Duplicate, with the association relationships relate_1 and relate_2 and their cardinality annotations. Attribute annotations: Record.name (not-null, unique, single-valued); Set.name (not-null, unique, single-valued); Construct_1.name (not-null, unique, single-valued); Construct_1.data-type (not-null, not-unique, single-valued).]
Figure 4.20: Network Model Schema Induced from Hypothetical Company Database
The model construct "Construct_l" in the induced network model schema corresponds to
the model construct "Attribute" in the reference network model schema. In terms of
model constructs, their attributes and relationships among model constructs, the induced
network model schema is consistent with the reference network model schema.
However, the induced network model schema consists of two incorrect cardinality
specifications: the maximal cardinality on the model construct "Set" in relation to the
model construct "Record" and the minimal cardinality on the model construct
"Construct_r' in relation to the model construct "Record". As discussed previously,
inconsistent implicit model constraints are derived errors; hence, they will not be
179
included in the evaluation summary of this study as shown in Table 4.7. Since the
precision rate of this study is 91.30% (i.e., (4+3+10+4)-r(4+3+12+4)), Abstraction
Induction Technique produces a satisfactory network model schema in this evaluation
study.
Induced Network Model Schema (statistics; description/causes):
  Model Construct: 4; perfect match.
  Relationship between Model Constructs: 3; perfect match.
  Cardinality: 12 (incorrect: 2); the minimal cardinality between Record and Construct_1 and the maximal cardinality on Set to Record.
  Attribute of Model Construct: 4; perfect match.
Table 4.7: Summary of Evaluation Study 2 (Network Model Schema)
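The Study 2 precision rate quoted above is simply the number of consistent specifications divided by the total number of specifications, with derived implicit-constraint errors excluded. A quick check of the figure, written out as a small computation:

```python
# Precision rate for evaluation study 2, reproducing (4+3+10+4) / (4+3+12+4):
# consistent specifications over total specifications, per category
# (model constructs, relationships, cardinalities, attributes).
consistent = 4 + 3 + 10 + 4      # 12 cardinalities minus the 2 incorrect ones
total = 4 + 3 + 12 + 4
print(round(100 * consistent / total, 2))   # 91.3, i.e. the 91.30% in the text
```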
Evaluation Study 3: Inducing Hierarchical Model Schema From A Hypothetical
Company Database
The third evaluation study is based on the same hypothetical company database as used
in evaluation study 2. However, the application schema is represented in the hierarchical data model and expressed in a hierarchical data definition language, as shown in [EN94]. The detailed inputs of and output from Abstraction Induction Technique are
listed in Appendix F. The reference hierarchical model schema is shown in Figure 4.21,
while the hierarchical model schema induced from the hypothetical company database is
shown in Figure 4.22.
[Figure 4.21 depicts the reference hierarchical model schema: the entity-classes Hierarchy, Record, Pointer, Attribute, Key, and Parent, related through has_root, point_to, refer_to, consist_of, and composed_of relationships with their cardinality annotations. Attribute annotations: Hierarchy.name (not-null, unique, single-valued); Record.name (not-null, unique, single-valued); Set.name (not-null, unique, single-valued); Attribute.name (not-null, unique, single-valued); Attribute.data-type (not-null, not-unique, single-valued).]
Figure 4.21: Reference Hierarchical Model Schema
[Figure 4.22 depicts the induced hierarchical model schema: the entity-classes Hierarchy, Record, Pointer, Construct_1, Key, and Parent, related through consist_of_1 and the association relationships relate_1 through relate_4, with their cardinality annotations. Attribute annotations: Hierarchy.name (not-null, unique, single-valued); Record.name (not-null, unique, single-valued); Set.name (not-null, unique, single-valued); Construct_1.name (not-null, unique, single-valued); Construct_1.data-type (not-null, not-unique, single-valued).]
Figure 4.22: Hierarchical Model Schema Induced from Hypothetical Company Database
The model construct "Construct_l" in the induced hierarchical model schema
corresponds to the model construct "Attribute" in the reference hierarchical model
schema. In terms of model constructs, their attributes and relationships among model
constructs, the induced hierarchical model schema is consistent with the reference
181
hierarchical model schema. However, the induced hierarchical model schema consists of
five incorrect cardinality specifications as depicted in the evaluation summary of this
study, as shown in Table 4.8
The satisfactory precision rate of this study (i.e.,
(6+5+19+4)-r(6+5+24+4) = 87.18%) indicates that Abstraction Induction Technique is
effective in producing a hierarchical model schema in this evaluation study.
Induced Hierarchical Model Schema (statistics; description/causes):
  Model Construct: 6; perfect match.
  Relationship between Model Constructs: 5; perfect match.
  Cardinality: 24 (incorrect: 5); 1) the minimal cardinality on "Record" to "Hierarchy", 2) the maximal cardinality on "Pointer" to "Record" via "relate_4", 3) the maximal cardinality on "Pointer" to "Record" via "consist_of_1", 4) the maximal cardinality on "Construct_1" to "Key", and 5) the maximal cardinality on "Parent" to "Record" via "consist_of_1".
  Attribute of Model Construct: 4; perfect match.
Table 4.8: Summary of Evaluation Study 3 (Hierarchical Model Schema)
Summary of Evaluation Studies:
The three evaluation studies are summarized in Table 4.9. As discussed above, the
induced model schema in each evaluation study is consistent with its corresponding
reference model schema in terms of model constructs, their attributes, and relationships
between model constructs. Cardinality specifications, however, could not be accurately induced in any of the evaluation studies. To improve the precision rate of Abstraction Induction Technique for inductive metamodeling, the generalization of maximal and minimal cardinalities from training examples requires further research.
Study 1: data model to be induced: Relational; application schema: University Health Center Database; number of training examples: 6; precision rate (excluding model constraints): 94.12% (only 2 incorrect cardinalities).
Study 2: data model to be induced: Network; application schema: Hypothetical Company Database; number of training examples: 8; precision rate (excluding model constraints): 91.30% (only 2 incorrect cardinalities).
Study 3: data model to be induced: Hierarchical; application schema: Hypothetical Company Database; number of training examples: 9; precision rate (excluding model constraints): 87.18% (only 5 incorrect cardinalities).
Table 4.9: Summary of Three Evaluation Studies
CHAPTER 5
Construct Equivalence Assertion Language
A construct equivalence representation is one of the essential components in the construct-equivalence-based methodology for schema translation and schema normalization. This chapter is devoted to the development of a construct equivalence representation, the Construct Equivalence Assertion Language (CEAL). Design principles for the development of a construct equivalence representation are specified first, followed by the detailed CEAL syntactic specifications. The execution semantics of intra-model and inter-model construct equivalences specified in CEAL will also be defined. A construct equivalence transformation function, which is the building block of the construct equivalence transformation method, will be developed in this chapter. Finally, CEAL will be evaluated according to its design principles.
5.1 Design Principles for A Construct Equivalence Representation
A construct equivalence representation serves as the formal specification for intra-model
and inter-model construct equivalences required by schema translation and schema
normalization. The design principles for a construct equivalence representation are
defined as follows:
1. Declarative:
The construct equivalence representation must be declarative in nature.
As
mentioned in Section 2.4, the method for reasoning on intra-model and inter-model
construct equivalences for schema translation and schema normalization is separated
from the representation.
2. Supporting bi-directional construct equivalences:
A bi-directional construct equivalence refers to the ability to interpret and reason from both directions of the construct equivalence. As mentioned earlier, this property is important to construct-equivalence-based schema translation in order to avoid the need to specify two sets of uni-directional translation knowledge between two data models. It is even more important when considering the reusability of the intra-model construct equivalences of a data model when defining translation knowledge between that data model and other data models, and when considering the use of intra-model construct equivalences in schema normalization, because the actual transformation direction is determined by normalization criteria, which can be any combination of undesired model constructs.
3. Instantiable by applications constructs:
Since construct equivalences are defined at the model level, they will be instantiated
by application constructs in an application schema during the schema translation and
schema normalization process. Therefore, the use of variables to be instantiated by
application constructs in the construct equivalence representation is inevitable.
4. Capable of specifying various numbers of constructs in a construct equivalence:
Based on the number of constructs involved in each side of a construct equivalence,
three types of construct equivalences can be identified: 1) one-to-one which involves
only one model construct on each side of a construct equivalence, 2) one-to-many (or
many-to-one) which involves one model construct on one side of a construct
equivalence and multiple model constructs on the other side, and 3) many-to-many.
The construct equivalence representation needs to be capable of specifying these
three types of construct equivalences.
5. Capable of expressing part of a model construct in a construct equivalence:
A construct equivalence may involve model constructs which partially participate in
the construct equivalence.
A partial model construct in a construct equivalence
denotes that not all but only some of its instances will participate in the construct
equivalence.
A partial model construct can be viewed as the model construct
associated with certain restriction(s). For example, "single-valued attribute" is the
attribute model construct with the restriction that its multiplicity property is "single-valued."
Thus, the construct equivalence representation needs to allow the
expression of restrictions on model constructs when partial model constructs in
construct equivalences are required.
6. Allowing the expression of multiple instances of the same model construct in a
construct equivalence:
Multiple instances of the same model construct may be involved in a construct
equivalence. For example, "a set of relations in the relational data model" refers to
multiple instances of the relation model construct in the relational data model. Since
the number of instances of a model construct is not static and will dynamically be
determined at the schema translation or schema normalization process, the construct
equivalence representation needs to be flexible enough to deal with this dynamics.
7. Capable of defining detailed correspondences for construct equivalences:
Defining construct equivalences only at the construct level is not sufficient to
describe the full spectrum of construct equivalences. As mentioned, each model
construct in a model schema is described by some attributes (i.e., properties) and has
relationships with other model constructs. Detailed correspondences on these two
levels are required when specifying construct equivalences.
For example, when
specifying a non-foreign key attribute in the relational model which is equivalent to a
single-valued attribute in the SOOER model, correspondences between properties
(e.g., name, null specification, uniqueness specification, etc.) of the single-valued
attribute in the SOOER model and the non-foreign key attribute in the relational data
model need to be specified as well. Thus, the construct equivalence representation should allow specifying, and reasoning on, such detailed correspondences.
8. Single language for both intra-model and inter-model construct equivalences:
The construct equivalence representation should serve as the specification language
for both intra-model and inter-model construct equivalences.
5.2 Development of Construct Equivalence Assertion Language
This section details the development of a construct equivalence representation, the Construct Equivalence Assertion Language (CEAL). In this section, all examples illustrating the
use of CEAL for specifying inter-model or intra-model construct equivalences are based
on the SOOER model schema and the relational data model schema, shown in Figure 3.7
and Figure 3.9, respectively.
5.2.1 High-Level Syntax Structure of Construct Equivalence Assertion Language
Construct Equivalence Assertion Language (CEAL) is used to express intra-model or
inter-model construct equivalences each of which follows the syntax of:
construct-set ≡ construct-set
WITH construct-correspondence
[ HAVING ancillary-description ]
where ≡ refers to "is equivalent to" and [ ] denotes "optional".
A construct equivalence defined in the above syntax is interpreted as "two construct-sets
are equivalent with specific correspondences defined in the construct-correspondence
clause." As discussed, a construct equivalence transformation method is a necessity for
achieving
bi-directional
construct
equivalences.
The
construct
equivalence
transformation method determines the direction of each construct equivalence and
employs a construct equivalence transformation function for exchanging the LHS with
the RHS construct-set of the construct equivalence.
The construct equivalence
transformation function needs to ensure the validity of transformed construct
equivalences. Moreover, it must be an information preserving fimction; that is, the result
of exchanging construct-sets on the construct equivalence which has already been
exchanged should be the same as the original construct equivalence. However, as will be
seen shortly, a LHS construct-set may be associated with some selection conditions, while a RHS construct-set cannot be. Some but not all of the selection conditions on the
LHS construct-set can be moved to the construct-correspondence clause when
performing the construct-set exchange between two sides of a construct equivalence.
The ancillary-description is reserved for those selection conditions which are initially associated with the LHS construct-set but cannot be specified in the construct-correspondence clause after the construct-set exchange is performed. The ancillary-description clause is optional since there may not exist any ancillary descriptions in a construct equivalence.
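As a rough illustration of this skeleton, a CEAL assertion could be held in memory as follows. The class and field names are assumptions made only for this sketch; they are not part of CEAL, and a full transformation function would also relocate LHS selection conditions into the correspondence or ancillary clauses, which the sketch omits.

```python
# An assumed, minimal in-memory shape for a CEAL construct equivalence,
# mirroring the "construct-set ≡ construct-set WITH ... [ HAVING ... ]"
# skeleton above.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ConstructEquivalence:
    lhs: List[str]                          # LHS construct-instances / -sets
    rhs: List[str]                          # RHS construct-instances / -sets
    correspondences: List[str]              # WITH clause entries
    ancillary: Optional[List[str]] = None   # HAVING clause entries, if any

    def exchanged(self) -> "ConstructEquivalence":
        # Swapping the two construct-sets twice yields the original assertion,
        # reflecting the information-preservation requirement stated above.
        return ConstructEquivalence(self.rhs, self.lhs,
                                    self.correspondences, self.ancillary)
```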
5.2.2 Definition of Construct-Set
The basic building blocks of a construct-set in a construct equivalence are the construct-instance and the construct-instance-set. A construct-set consists of a set of construct-instances and/or construct-instance-sets connected by AND operators. A construct-instance specified in the LHS of a construct equivalence provides a way to reference each instance of a construct-domain (i.e., an instance of a model construct in an application schema) satisfying certain selection conditions (called the instance-selection-condition). On the other hand, a construct-instance specified in the right-hand side (RHS) of a construct equivalence refers to a particular instance or implies a new instance of a construct-domain. A particular instance or a new instance of a construct-domain referred to by a RHS construct-instance can be specified in the construct-correspondence clause, which will be discussed later. Thus, an instance-selection-condition applicable to a LHS construct-instance is not allowed for a RHS construct-instance. Furthermore, since a LHS construct-instance is always qualified universally and a RHS construct-instance is always qualified existentially, both the universal qualifier (i.e., ∀) and the existential qualifier (i.e., ∃) are omitted in the construct equivalence assertion language.
The
general expression of a construct-instance is:
construct-instance: construct-domain [ (WHERE instance-selection-condition) ]
The WHERE clause is optional for a LHS construct-instance but not allowed in a RHS
construct-instance.
On the other hand, a construct-instance-set refers to a set of instances of the same
construct-domain. Thus, two levels of reference are required: an inner construct-instance
for an instance in a construct-domain satisfying certain instance-selection-condition and
an outer construct-instance-set for a set of such construct-instances from the same
construct-domain satisfying a certain set-selection-condition. As discussed above, a LHS construct-instance-set may be associated with an instance-selection-condition and/or a set-selection-condition, but neither can be associated with a RHS construct-instance-set. The general expression of a construct-instance-set is specified as:
construct-instance-set:
{construct-instance: construct-domain [ (WHERE instance-selection-condition) ]}
[ (WHERE set-selection-condition) ]
Both WHERE clauses are optional for a LHS construct-instance-set but not allowed
in a RHS construct-instance-set.
Reference to a construct-instance-set in the construct equivalence refers to all instances
in the construct-instance-set. The reference protocol to the inner construct-instance of a
construct-instance-set is defined as follows.
When the inner construct-instance is referenced, the reference applies to each instance in the construct-instance-set. When a set operator (i.e., ∩, ∪, or −) is applied on the inner construct-instance, it returns the result of applying the set operator to all instances in the construct-instance-set. A set comparison operator on the inner construct-instance returns true if all pairs of instances in the construct-instance-set satisfy this set comparison operator; otherwise, it returns false. Because the reference order of the instances in the construct-instance-set is undefined, only set operators and set comparison operators with the commutative property are applicable to the inner construct-instance. Thus, the set difference operator and the subset or superset comparison operators cannot be applied on the inner construct-instance.
Moreover, because set operators and set comparison operators are binary operations, the traditional convention for expressing set operators or set comparison operators (e.g., A ∩ B, A = B, etc.) is not appropriate for expressing a set operator or a set comparison operator on a single inner construct-instance. Thus, a new expression of a set operator or a set comparison operator is defined as
<set-op> | <comp-op>(inner-construct-instance[.path.model-construct])
where | denotes "or",
<set-op> is an applicable set operator (union and intersect are used for ∪ and ∩, respectively), and
<comp-op> is a set comparison operator (equal and not_equal are used for = and ≠).
Construct-Domain:
A construct-domain can be a simple or a complex construct-domain. A simple construct-domain refers to a model construct in a model schema and can be either directly-referenced or path-referenced. A directly-referenced construct-domain addresses a model construct by its name, while a path-referenced construct-domain, similar to the path-expression of the constraint-specification language of the SOOER model defined in Section 3.2.2, traverses a model schema via a path of relationships from a construct-instance to a model construct (called the terminal of the path) in the model schema. A simple construct-domain is formally expressed as:
[model-name.]model-construct |
construct-instance.path.model-construct
The former is for a directly-referenced construct-domain, while the latter is for a path-referenced construct-domain. If a directly-referenced construct-domain appears in an inter-model construct equivalence, it is necessary to include a model name; otherwise, the model name can be omitted. However, a path-referenced construct-domain need not be signified by a model name because its leading construct-instance can be used to trace on which model schema this path-referenced construct-domain is defined.
On the other hand, a complex construct-domain restricts a simple construct-domain by other simple construct-domains referring to the same model construct as the first simple construct-domain. Thus, a complex construct-domain is composed of a set of simple construct-domains connected by either set intersection or set difference operators. The union operator cannot be used since it is not a set-reduction operator. All simple construct-domains in a complex construct-domain should refer to the same model construct. Since a complex construct-domain defines a selection condition at the model construct level, it can appear only in the LHS of a construct equivalence (i.e., a LHS construct-instance may be defined on a complex construct-domain, but a RHS construct-instance must be defined on a simple construct-domain).
1. R: Relational.Relation
R is a construct-instance which denotes each relation in the relational data model.
2. A: Relational.Attribute - Relational.Foreign-Key.Attribute
A denotes each non-foreign key attribute (specified as Relational.Attribute - Relational.Foreign-Key.Attribute) in the relational data model.
3. S: {E: SOOER.Entity-Class}
S is a construct-instance-set which consists of a set of entity-classes (denoted by {E: SOOER.Entity-Class}) in the SOOER model.
Instance-Selection-Condition of LHS Construct-Instances:
An instance-selection-condition enclosed in the WHERE clause of a construct-instance further restricts the instances of the construct-domain to be referenced by the construct-instance. For example, "each single-valued attribute in the SOOER model" is specified as "A: SOOER.Attribute (WHERE A.Multiplicity = 'single-valued')." A: SOOER.Attribute refers to each instance of the attribute model construct in an application schema, regardless of whether it is single- or multi-valued (i.e., A is a single- or multi-valued application construct which is an instance of the attribute model construct). The "WHERE A.Multiplicity = 'single-valued'" clause is an instance-selection-condition on A and as a result restricts A to instances of single-valued attributes. An instance-selection-condition defined for its preceding construct-instance is a conjunction of selection clauses, each of which can be one of the following types:
1. Property selection: A property selection is used to select the subset of instances in a construct-domain whose property is equal to a particular value. The single-valued attribute example described above illustrates the meaning and use of a property selection. Formally, a property selection is expressed as:
cinst.property = value
where cinst is the construct-instance on which the property selection is defined and property is a property of cinst.
2. Path selection: A path selection, used to select a subset of instances in a construct-domain whose path(s) exhibit certain characteristics, can be expressed in one of the following forms:
• cinst1.path.model-construct <set-comp> set1
• cinst1.path.model-construct [ <set-op> set2 ... ] <set-comp> set3 | ∅
• agg-function(cinst1.path.model-construct [ <set-op> set2 ... ]) <comp-op> agg-function(set3 [ <set-op> set4 ... ]) | value
where
- cinst1 is the construct-instance on which the path selection is defined.
- set1 refers to a set of instances. It can be a set of construct-instance(s) enclosed in { }, a construct-instance-set, or a path-referenced model construct originating from a construct-instance possibly different from cinst1.
- set2, set3, or set4 refers to a set of instances. It can be a construct-instance-set or a path-referenced model construct originating from a construct-instance possibly different from cinst1.
- <set-comp> is a set comparison operator including =, ⊃, ⊇, ∈, and ⊆.
- <set-op> is a set-reduction operator including ∩ and −.
- <comp-op> is a value comparison operator including =, >, ≥, <, and ≤.
- agg-function() is an aggregation function on a set, including count() and count-distinct().
A path selection expressed in the first form specifies that an instance in the construct-domain will be included in the reference scope of the construct-instance (cinst1) if the specified set comparison (denoted by <set-comp>) between a path originating from the instance and a set (or the intersection or difference of multiple sets) originating from the same or a different instance is satisfied. The second form performs intersection or difference operations on a path originating from an instance denoted by the construct-instance with other set(s) of instances, and then compares the set operation result with another set of instances originating from the same or a different construct (or with an empty set). For example, "each relation whose primary key attributes do not contain any of its foreign key attributes in the relational model" is represented as: "R: Relational.Relation (WHERE R.Primary-Key.Attribute ∩ R.Foreign-Key.Attribute = ∅)". The last form compares aggregate information derived from a path originating from an instance denoted by the construct-instance with other aggregate information or with a value. For example, "each specialization relationship in the SOOER model which is associated with a single subclass" is specified as "S: SOOER.Specialization (WHERE count(S.subclass.Entity-Class) = 1)".
3. Extension selection: An extension selection defines a restriction on the extension (i.e., the actual data in the database) of an application construct (i.e., an instance denoted by the construct-instance or by a path originating from the construct-instance). Similar to those for path selections, three expression forms can be used for specifying extension selections:
• ext(cinst1[.path.model-construct]) <set-comp> ext(set1 [ <set-op> ext(set2) ... ])
• ext(cinst1[.path.model-construct] <set-op> set1 ...) <set-comp> ext(set2) | ∅
• ext-function(ext(cinst1[.path.model-construct] [ <set-op> set1 ... ])) <comp-op> agg-function(ext(set2 [ <set-op> set3 ... ])) | value
where
- ext(set) is the extension function.
- ext-function() is an aggregation function on an extension, including count(), count-distinct(), has_duplicate(), and has_null().
For example, "two relations in the relational model whose extensions on their primary key attributes are the same" is represented as: "T: Relational.Relation AND S: Relational.Relation (WHERE ext(S.Primary-Key.Attribute) = ext(T.Primary-Key.Attribute))".
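The way a LHS construct-instance with a selection condition narrows a construct-domain can be illustrated with a small property-selection example. The dictionary-based modeling of application constructs and the helper name are assumptions made only for this sketch, not part of CEAL.

```python
# Sketch of evaluating a LHS construct-instance with a property selection,
# e.g. "A: SOOER.Attribute (WHERE A.Multiplicity = 'single-valued')".

def select_instances(construct_domain, prop=None, value=None):
    """Return the instances of the construct-domain satisfying an optional
    property selection (property = value)."""
    if prop is None:
        return list(construct_domain)
    return [inst for inst in construct_domain if inst.get(prop) == value]

attributes = [                      # instances of the Attribute model construct
    {"Name": "SSN",     "Multiplicity": "single-valued"},
    {"Name": "Phone",   "Multiplicity": "multi-valued"},
    {"Name": "Address", "Multiplicity": "single-valued"},
]
single_valued = select_instances(attributes, "Multiplicity", "single-valued")
print([a["Name"] for a in single_valued])    # ['SSN', 'Address']
```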
Example 1: Inter-model construct equivalence
That "each relation in the relational data model whose primary key attributes do not
contain any of its foreign key attributes is equivalent to an entity-class in the SOOER
model...." is expressed as:
R: Relational.Relation (WHERE R.Primary-Key.Attribute.Foreign-Key = 0)
E: SOOER.Entity-Class
WITH ...
"R.Primary-Key.Attribute.Foreign-Key = 0" is a path selection of the second form.
Example 2: Inter-model construct equivalence
That "each non-foreign key attribute of a relation in the relational data model is
equivalent to a single-valued attribute of the entity-class for the relation in the SOOER
model. ..." is represented as:
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute
≡ E: SOOER.Entity-Class AND
A: E.Attribute
WITH ...
The construct-domain for the construct-instance N is a complex construct-domain. The LHS of the construct equivalence denotes each non-foreign key attribute N (i.e., R.Attribute - R.Foreign-Key.Attribute) of each relation R in the relational model, while the RHS represents an entity-class E (i.e., SOOER.Entity-Class) and an attribute A of the entity-class (i.e., A: E.Attribute) in the SOOER model. The detailed correspondences related to the construct-instances defined in the RHS (e.g., "single-valued attribute" and "the entity-class for the relation") will be specified in the construct-correspondence clause.
Example 3: Intra-model construct equivalence
"A multi-valued attribute of an entity-class in the SOGER model is equivalent to an
entity-class, a single-valued attribute, and an association relationship in the SOGER
model. The association relationship connects the new entity-class and the entity-class to
which the multi-valued attribute belongs. ..."
E; SOOER.Entity-Class AND
M: E.Attribute (WHERE M.Multiplicity = 'multi-valued')
N: SOOER.Entity-Class AND
A: Attribute AND
R: SOOER.Assoclation
WITH ...
The LHS of the construct equivalence denotes "each multi-valued attribute M of every
entity-class E in the SOGER model." The RHS of the construct equivalence involves
three construct-instances and denotes "an attribute A, an entity-class N and an
association relationship in the SOGER model." Again, the detailed correspondences for
"single-valued attribute" and "the association relationship connecting the new entityclass and the entity-class to which the multi-valued attribute belongs" will be specified in
the construct-correspondence clause.
Set-Selection-Condition on LHS Construct-Instance-Sets:
A set-selection-condition enclosed in the WHERE clause of a construct-instance-set either identifies the maximal subset of construct-instances (from the construct-domain denoted by the inner construct-instance) to be included in the construct-instance-set or tests the validity of the set of construct-instances currently included in the construct-instance-set. As mentioned, since a set-selection-condition on a construct-instance-set serves as restriction criteria, it appears only in the LHS of a construct equivalence. A set-selection-condition consists of conjunctive selection clauses. If a selection clause on a construct-instance-set is defined upon the inner construct-instance (or a construct reachable from the inner construct-instance), it applies to every instance in the construct-instance-set according to the reference protocol defined above and is used to select from the construct-domain the maximal subset of construct-instances satisfying the selection clause to be included in the construct-instance-set. If none of the possible subsets of the construct-instances in the construct-domain satisfies the selection clause, the construct-instance-set will be an empty set. On the other hand, a selection clause defined on the construct-instance-set itself applies to the set of instances in the construct-instance-set. For example, assume S is a construct-instance-set and a selection clause "count(S) > 1" is defined on it, requiring that the number of instances in S be greater than 1. If so, S retains all construct-instances currently included in it. If the number of instances in S is less than or equal to 1, the set of construct-instances currently included in S fails to satisfy the testing condition and S becomes an empty set.
A selection clause in a set-selection-condition can be expressed in one of the following forms:
1. union | intersection(icinst.path.model-construct) <set-comp> set1 [ <set-op> set2 ... ]
2. union | intersection(ext(icinst.path.model-construct)) <set-comp> ext(set1 [ <set-op> set2 ... ])
3. agg-function(icinstset) <comp-op> value
where
- icinst is the inner construct-instance of the construct-instance-set on which the selection clause is defined.
- set1 or set2 refers to a set of instances. It can be a set of construct-instance(s) enclosed in { }, a construct-instance-set, or a path-referenced model construct originating from a construct-instance possibly different from icinst.
- <set-comp> is a set comparison operator including =, ⊃, ⊇, ⊂, and ⊆.
- <comp-op> is a value comparison operator including =, >, ≥, <, and ≤.
- agg-function(), an aggregation function, includes count() and count-distinct().
- icinstset is the construct-instance-set on which the selection clause is defined.
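The two-level evaluation described above, with the instance-selection-condition filtering the construct-domain and a set-level clause either keeping or emptying the selected set, can be sketched as follows. The function and argument names, and the dictionary modeling of entity-classes, are illustrative assumptions only; the set-level test shown is the "count(S) > 1" example from the text.

```python
# Sketch of the two-level semantics for a LHS construct-instance-set.

def build_instance_set(construct_domain, instance_test, set_test):
    selected = [inst for inst in construct_domain if instance_test(inst)]
    return selected if set_test(selected) else []   # failed set test => empty

entity_classes = [{"name": "Car",   "identifier": ("VIN",)},
                  {"name": "Truck", "identifier": ("VIN",)},
                  {"name": "Owner", "identifier": ("SSN",)}]

s = build_instance_set(entity_classes,
                       instance_test=lambda e: e["identifier"] == ("VIN",),
                       set_test=lambda sel: len(sel) > 1)   # count(S) > 1
print([e["name"] for e in s])    # ['Car', 'Truck']
```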
Example 4: Intra-model construct equivalence
"In the SOGER model, a set of entity-classes having the same identifier as another
entity-class whose extension on its identifier attributes is a superset of the union of that
of the formers are equivalent to a new specialization relationship in which the latter is its
superclass and the formers are its subclasses...." is expressed as:
C: SOOER.Entity-Class AND
T: {E: SOOER.Entity-Class (WHERE E.Identifier.Attribute = C.ldentlfier.Attribute)}
(WHERE unlon(ext(E.ldentlfier.Attribute)) e ext(C.Identifier.Attribure))
S: SOOER.Speciailzatlon
WITH ...
"E: SOOER.Entity-Class (WHERE E.Identifier.Attribute = C.Identifier.Attribute)" is the
iimer construct-instance of the construct-instance-set T and denotes every entity-class
whose identifier attributes are the same as those of the entity-class C. The set-selection-
199
condition on T (i.e., "union(ext(E.Identifier.Attribute)) c ext(C.Identifier.Attribure)")
further restricts the set of entity-classes to be included in T if the union of their
extensions on identifier attributes is the same as or a subset of the extension on identifier
attributes of C.
The RHS of this intra-model construct equivalence denotes a
specialization relationship. The detailed correspondence for "C is the superclass of T in
the specialization relationship" will be specified in the construct-correspondence clause.
5.2.3 Definition of Construct-Correspondence
So far, a construct equivalence is defined at the construct-set level. That is, a set of
model constructs defined in the LHS of a construct equivalence are equivalent to another
set of model constructs defined in the RHS. The construct-correspondence clause is
employed to specify correspondences within or between construct-sets of a construct
equivalence. The construct-correspondence of a construct equivalence consists of a set
of correspondences each of which can be either a connection correspondence, property
correspondence or relationship correspondence. For example, in Example 1, besides
declaring each relation whose primary key attributes do not contain any of its foreign key
attributes in the relational model being equivalent to an entity-class in the SOOER
model, a correspondence at the property level (i.e., property correspondence) is needed.
It is that the name of the entity-class is the same as that of the relation. Moreover, as
depicted in Example 2, the correspondences for "single-valued attribute A" and "the entity-class E for the relation R" (where A and E are RHS construct-instances and R is a LHS construct-instance) need to be specified in the construct-correspondence of the construct equivalence. The former is an example of a property correspondence while the latter is an example of a connection correspondence. That "the association relationship R connects the new entity-class N and the entity-class E to which the multi-valued attribute belongs" (where R and N are RHS construct-instances and E is a LHS construct-instance), described in Example 3, demonstrates a relationship correspondence.
Connection Correspondence:
A connection correspondence specifies that a construct-instance in the RHS of a construct equivalence corresponds to another construct-instance (or a path-referenced model construct originating from some construct-instance) defined in the LHS of the construct equivalence. "The entity-class E for the relation R" in Example 2 indicates that E must be the entity-class corresponding to the relation R and thus is a connection correspondence. Moreover, connection correspondences can only be used in inter-model construct equivalences. The expression of a connection correspondence is formally specified as:
cinst1 = cinst2[.path.model-construct]
where cinst1 is a RHS construct-instance and cinst2 is a LHS construct-instance of an inter-model construct equivalence.
Property Correspondence:
A property correspondence, which may appear in an inter-model or intra-model construct
equivalence, asserts that a property of a construct-instance defined in the RHS of the
construct equivalence equals a constant value or a property of another construct-instance
defined in the LHS of the construct equivalence. A property correspondence may be
conditional; that is, the property correspondence can be established only when a
condition is satisfied. A property correspondence is formally expressed as:
cinst1.property = value | cinst2.property [ IF condition ]
where cinst1 is a construct-instance defined in the RHS of the construct equivalence.
cinst2 is a construct-instance defined in the LHS of the construct equivalence.
The expression of a condition is the same as that defined in property, path, or
extension selections.
Example 5: (continued from Example 2)
"Each non-foreign key attribute of a relation in the relational model is equivalent to a
single-valued attribute of the entity-class for the relation in the SOOER model. The
name (or data-type) of the single-valued attribute is the same as that of the non-foreign
key attribute. The null-specification of the single-valued attribute is determined by
whether the non-foreign key attribute allows null values or not. The value of the
uniqueness property of the single-valued attribute is the same as that of the IsUnique
property of the non-foreign key attribute."
This inter-model construct equivalence is
represented as:
R: Relational.Relation AND N: R.Attribute - R.Foreign-Key.Attribute
E: SOOER.Entity-Class AND A: E.Attribute
WITH E = R AND
A.Multiplicity = 'single-valued' AND
A.Name = N.Name AND
A.Null-Spec = N.IsNull AND
A.Uniqueness = N.IsUnique AND
A.DataType = N.DataType
The construct-correspondence enclosed in the WITH clause of this inter-model construct
equivalence specifies a connection correspondence between two construct-instances (i.e.,
E = R, which denotes that the entity-class E corresponds to the relation R), a property
correspondence for the single-valued attribute A (i.e., A.Multiplicity = 'single-valued')
and several property correspondences between the single-valued attribute A and the
non-foreign key attribute N.
Relationship Correspondence:
A relationship correspondence specifies a change to the relationships of a construct-
instance as a result of participating in a construct equivalence. That is, it defines how the
content of a construct reachable from a construct-instance will be changed when the
construct-instance is involved in the construct equivalence. Relationship
correspondences can be used in both inter-model and intra-model construct equivalences.
A relationship correspondence can be defined for a RHS or a LHS construct-instance.
As with a property correspondence, a relationship correspondence can be conditional.
A relationship correspondence is formally expressed as:
cinst1.path.model-construct = set1 [ <set-op> set2 ... ] [ IF condition ]
where cinst1 is a construct-instance defined in the construct equivalence.
set1 or set2 refers to a set of instances. It can be a set of construct-instance(s)
enclosed within { }, a construct-instance-set, or a path-referenced model
construct originating from a construct-instance possibly different from cinst1.
The expression of a condition is the same as that of property, path, or
extension selections.
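For a programmatic intuition, the three kinds of correspondences can be pictured as simple
record structures. The following Python sketch is illustrative only; the class and field names
are invented here for exposition and are not CEAL syntax or part of any implementation
described in this dissertation.

    # Illustrative sketch only: one possible in-memory representation of the three
    # correspondence kinds. The field names are hypothetical, not CEAL syntax.
    from dataclasses import dataclass, field
    from typing import List, Optional


    @dataclass
    class ConnectionCorrespondence:
        # cinst1 = cinst2[.path.model-construct]; legal only in inter-model equivalences
        rhs_instance: str            # cinst1, a RHS construct-instance
        lhs_instance: str            # cinst2, a LHS construct-instance
        path: Optional[str] = None   # optional path.model-construct suffix


    @dataclass
    class PropertyCorrespondence:
        # cinst1.property = value | cinst2.property [ IF condition ]
        rhs_instance: str
        prop: str
        rhs_value: str               # a constant value or "cinst2.property"
        condition: Optional[str] = None


    @dataclass
    class RelationshipCorrespondence:
        # cinst1.path.model-construct = set1 [ <set-op> set2 ... ] [ IF condition ]
        instance: str
        path: str
        operands: List[str] = field(default_factory=list)
        set_ops: List[str] = field(default_factory=list)
        condition: Optional[str] = None


    # Example 5's connection correspondence E = R and Example 6's relationship
    # correspondence R.relate.Entity-Class = {E, N} would then be represented as:
    conn = ConnectionCorrespondence(rhs_instance="E", lhs_instance="R")
    rel = RelationshipCorrespondence(instance="R", path="relate.Entity-Class",
                                     operands=["{E, N}"])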
Example 6: (continued from Example 3)
"A multi-valued attribute in the SOGER model is equivalent to an association
relationship, an entity-class and a single-valued attribute in the SOGER model. The
203
association relationship connects the new entity-class and the entity-class to which the
multi-valued attribute belongs. The single-valued attribute becomes the only attribute as
well as the identifier of the new entity-class. As the identifier of the new entity-class, the
uniqueness and niUl-spec properties of the single-valued attribute are 'unique' and 'notnull', respectively. If different instances of the existing entity-class can share the same
value of the multi-valued attribute (i.e., the uniqueness property of multi-valued attribute
is 'not-unique'), the maximal cardinality on the existing entity-class is m; otherwise, it is
1. The minimal cardinality on the existing entity-class is always I since each value of
the multi-valued attribute was associated to an instance of the existing entity-class. ..."
This intra-model construct-equivalence is represented as:
E: Entity-Class AND
M: E.Attribute (WHERE M.Multiplicity = 'multi-valued')
N: Entity-Class AND
A: Attribute AND
R: Association
WITH A.Multiplicity = 'single-valued' AND
R.relate.Entity-Class = {E, N} AND
N.Attribute = {A} AND
N.Identifier.Attribute = N.Attribute AND
A.Uniqueness = 'unique' AND
A.Null-Spec = 'not-null' AND
R.relate.E(Max-Card) = m IF M.Uniqueness = 'not-unique' AND
R.relate.E(Max-Card) = 1 IF M.Uniqueness = 'unique' AND
R.relate.E(Min-Card) = 1 AND ...
The first, fifth and sixth correspondences (i.e., A.Multiplicity = 'single-valued',
A.Uniqueness = 'unique' and A.Null-Spec = 'not-null') are property correspondences
defined on the properties of the attribute A and state that the attribute A is single-valued,
must be unique and cannot allow null values, respectively. The second, third and fourth
correspondences (i.e., R.relate.Entity-Class = {E, N}, N.Attribute = {A} and
N.Identifier.Attribute = N.Attribute) are relationship correspondences which indicate that
"the association relationship R must link the entity-classes E and N," that "the single-
valued attribute becomes the only attribute of the new entity-class," and that "the
attribute becomes the identifier of the new entity-class," respectively. The last three
correspondences define values for the maximal and minimal cardinality of the link
between the entity-class E and the association relationship R. "R.relate.E(Max-Card) =
m IF M.Uniqueness = 'not-unique'" is a conditional property correspondence. It states
that the maximal cardinality between the entity-class E and the association relationship R
is m (for many) if the Uniqueness property of the multi-valued attribute M is 'not-unique'.
Otherwise, the maximal cardinality between the entity-class E and the association
relationship R is 1 (as defined in R.relate.E(Max-Card) = 1 IF M.Uniqueness = 'unique').
5.2.4 Definition of Ancillary-Description
The ancillary-description clause exists in a construct equivalence to ensure that the
construct equivalence is bi-directional. The ancillary-description consists of a
conjunction of ancillary clauses. An ancillary clause is used to contain a selection clause
of an instance-selection-condition or set-selection-condition on a LHS construct-instance
of a construct equivalence which cannot be specified in the construct-correspondence
clause after exchanging construct-sets between the two sides of the construct
equivalence. Besides, there are some other sources for the ancillary-description when
exchanging construct-sets of a construct equivalence; Table 5.1 provides a summary of
all potential sources of the ancillary-description.
Original Construct Equivalence                              Ancillary-Description of
                                                            Transformed Construct Equivalence
Instance-selection-condition on a LHS construct-instance
  - Property selection clause                               no
  - Path selection clause                                   partial
  - Extension selection clause                              yes
Set-selection-condition on a LHS construct-instance-set     yes
Complex construct-domain of a LHS construct-instance        yes
Conditional construct-correspondence                        partial

Table 5.1: Sources for Ancillary-Description When Exchanging Construct-Sets
A property selection on a LHS construct-instance defining a value that must be satisfied
by a property of each instance denoted by the construct-instance will become a property
correspondence which ascertains the property value of an instance denoted by the
construct-instance (formerly in the LHS) after exchanging the construct-sets of the
construct equivalence. Similarly, the first form of a path selection clause with the set
equality comparison operator will become a relationship correspondence in the construct-
correspondence clause of the transformed construct equivalence. Since these two types
of selection clauses on LHS construct-instances can be transformed into property or
relationship correspondences in the construct-correspondence of the transformed
construct equivalence, they will not be specified in the ancillary-description clause of the
transformed construct equivalence. However, there exists no counterpart in the
construct-correspondence clause for the first form of path selections involving
non-equality set comparison operators, the second and third forms of path selections,
extension selections, and set-selection-conditions. These need to be described by
ancillary clauses in the ancillary-description of the transformed construct equivalence.
In addition, as mentioned in the previous subsection, a complex construct-domain can
appear only in the LHS of a construct equivalence. When exchanging the construct-sets
of the construct equivalence, all complex construct-domains originally defined for LHS
construct-instances are not permitted in any of the RHS construct-instances of the
transformed construct equivalence. As a result, complex construct-domains need to be
described in the ancillary-description of the transformed construct equivalence.
Moreover, some of the conditional construct-correspondences in a construct equivalence
cannot be transformed and specified in the construct-correspondence of the transformed
construct equivalence. Thus, they represent another source for the ancillary-description
when exchanging construct-sets of a construct equivalence. The detailed discussion of
the construct equivalence transformation function (i.e., exchanging construct-sets of a
construct equivalence) will be deferred until Section 5.3.
The definition of ancillary-description concludes the syntactic specifications of CEAL.
To demonstrate the expressiveness of CEAL, the inter-model construct equivalences
between the EER and SOOER models (whose model schemas are shown in Figure 3.9
and Figure 3.7 respectively) and some of the intra-model construct equivalences of the
SOOER model are engineered and listed in Appendix G and Appendix H.
5.2.5 Execution Semantics of Construct Equivalences
Once the syntax of CEAL is defined, the execution semantics of inter-model and intra-
model construct equivalences expressed in CEAL need to be explicitly specified. As
shown in Figure 2.8, inter-model construct equivalences are used in the source-target
projection stage of the construct-equivalence-based schema translation. Thus, the LHSs
of inter-model construct equivalences represent instantiations from application constructs
in the source application schema, while the RHSs of inter-model construct equivalences
may involve instantiations of application constructs in the target application schema and
result in the creation of corresponding application constructs in the target application
schema. The detailed execution semantics of inter-model construct equivalences is
depicted as follows (a concrete sketch follows the list):
1. Instantiation of LHS construct-instances from the source application schema:
A LHS construct-instance refers to each application construct which belongs to the
construct-domain (directly-referenced, path-referenced, or complex) of the construct-
instance and satisfies the instance-selection-condition of the construct-instance. A
LHS construct-instance-set refers to the maximal subset of the application constructs
(referenced by its inner construct-instance) which satisfies the set-selection-condition
of the construct-instance-set.
2. Instantiation of RHS construct-instances from the target application schema:
A RHS construct-instance associated with a connection correspondence in the
construct-correspondence clause of the construct equivalence results in searching the
target application schema for its corresponding application construct (as indicated
in the RHS of the connection correspondence). If the corresponding application
construct does not yet exist in the target application schema, execution of this
construct equivalence will be deferred.
3. Application construct creation in the target application schema:
If a RHS construct-instance is not associated with a connection correspondence in the
construct-correspondence clause of the construct equivalence, an application
construct denoted by this construct-instance will be created in the target application
schema when the construct equivalence is executed. If the RHS construct-instance is
defined on a path-referenced construct-domain, a linkage between the application
construct denoted by the origin of the path and the newly created application
construct for the RHS construct-instance will automatically be established.
4. Assignment implied by property and relationship correspondences:
A property (or relationship) correspondence indicates an assignment operation by
which the value represented in the RHS of the correspondence is assigned to the
property of the application construct (or the terminal of the path) denoted by the LHS
of the correspondence. If a property or relationship correspondence involves an inner
construct-instance which is not the operand of a set operator, the property or
relationship correspondence will be executed for every application construct included
in the construct-instance-set to which the inner construct-instance belongs.
5. No execution effect from ancillary-description:
The ancillary-description of a construct equivalence does not have any execution
semantics. That is, when a construct equivalence is executed, its ancillary-
description will not cause any change in the source or target application schema. The
ancillary-description may be listed as additional constraints on the target application
schema.
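As a concrete illustration of the five steps above, the following self-contained Python sketch
executes the equivalences of Example 1 (relation to entity-class) and Example 5 (non-foreign
key attribute to single-valued attribute) against dictionary-based schemas. The schema layout
and helper names are invented here for exposition and are not the implementation described
in this dissertation.

    def translate_relations(source, target):
        # Steps 1 and 3: each relation instantiates R; since no connection
        # correspondence is involved, an entity-class E is created in the target.
        for rel in source["relations"]:
            ec = {"name": rel["name"], "attributes": [], "from_relation": rel["name"]}
            target.setdefault("entity-classes", []).append(ec)

    def translate_attributes(source, target):
        deferred = []
        for rel in source["relations"]:
            # Step 2: the connection correspondence E = R requires the entity-class
            # for this relation to exist in the target schema already; otherwise
            # execution of the equivalence for this binding is deferred.
            ec = next((e for e in target.get("entity-classes", [])
                       if e.get("from_relation") == rel["name"]), None)
            if ec is None:
                deferred.append(rel["name"])
                continue
            for attr in rel["attributes"]:
                # Complex construct-domain R.Attribute - R.Foreign-Key.Attribute
                if attr["name"] in rel.get("foreign_keys", []):
                    continue
                # Steps 3 and 4: create A and apply the property correspondences.
                ec["attributes"].append({"name": attr["name"],
                                         "multiplicity": "single-valued",
                                         "null_spec": attr["is_null"],
                                         "uniqueness": attr["is_unique"],
                                         "data_type": attr["data_type"]})
            # Step 5: the ancillary-description (if any) has no execution effect.
        return deferred

    source = {"relations": [{"name": "Employee", "foreign_keys": [],
                             "attributes": [{"name": "SSN", "is_null": "not-null",
                                             "is_unique": "unique",
                                             "data_type": "string"}]}]}
    target = {}
    translate_relations(source, target)
    print(translate_attributes(source, target))                  # []: nothing deferred
    print(target["entity-classes"][0]["attributes"][0]["name"])  # SSN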
During the schema translation process, intra-model construct equivalences of the source
data model are employed in the source convergence stage and those of the target data
model are employed in the target enhancement stage, as shown in Figure 2.8. On the
other hand, intra-model construct equivalences can be used in the schema normalization
process. In all cases, the execution of an intra-model construct equivalence involves
instantiating LHS and possibly RHS construct-instances from the application schema,
creating new application constructs in the application schema, and possibly deleting
existing application constructs from the application schema. The detailed execution
semantics of intra-model construct equivalences is depicted as follows:
1. Instantiation of LHS construct-instances from the application schema:
Same as that defined in the execution semantics of inter-model construct equivalence.
2. Instantiation of RHS construct-instances from the application schema:
If the RHS of a selection-condition on a LHS construct-instance refers to a RHS
construct-instance, an instantiation of the RHS construct-instance will be triggered.
Since intra-model construct equivalences do not involve any connection
correspondence, no execution delay will occur.
3. Application construct creation in the application schema:
If a RHS construct-instance (or construct-instance-set) is not referenced by the RHS
of any path selection on any LHS construct-instance, an application construct
denoted by this construct-instance (or construct-instance-set) will be created in the
application schema when the construct equivalence is executed. If the RHS
construct-instance is defined on a path-referenced construct-domain, a linkage
between the application construct denoted by the origin of the path and the newly
created application construct for the RHS construct-instance will automatically be
established.
4. Application construct deletion in the application schema:
If a LHS construct-instance (or construct-instance-set) neither appears in the LHS of
any property or relationship correspondence nor is enclosed within { } in the RHS of
any relationship correspondence, the application construct denoted by the
construct-instance (or construct-instance-set) will be removed from the application
schema after executing the construct equivalence (see the sketch after this list).
5. Assignment implied by property and relationship correspondences:
Same as that defined in the execution semantics of inter-model construct equivalence.
6. No execution effect from ancillary-description:
Same as that defined in the execution semantics of inter-model construct equivalence.
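Because the deletion rule in step 4 is the main point where intra-model execution differs from
inter-model execution, the following small Python sketch spells it out; correspondences are
modelled as plain dictionaries whose field names are invented for this illustration.

    def should_delete(lhs_name, correspondences):
        """True if the application construct bound to this LHS construct-instance
        is removed once the construct equivalence has been executed."""
        for c in correspondences:
            # Appearing in the LHS of a property or relationship correspondence
            # keeps the construct.
            if c["lhs_instance"] == lhs_name:
                return False
            # Being enclosed within { } in the RHS of a relationship correspondence
            # (e.g., R.relate.Entity-Class = {E, N}) also keeps it.
            if c["kind"] == "relationship" and lhs_name in c.get("rhs_set", []):
                return False
        return True

    # In Example 6, the entity-class E survives (it appears inside {E, N}) while
    # the multi-valued attribute M is deleted after the equivalence is executed.
    corrs = [{"kind": "relationship", "lhs_instance": "R.relate.Entity-Class",
              "rhs_set": ["E", "N"]},
             {"kind": "property", "lhs_instance": "A.Name", "rhs_set": []}]
    print(should_delete("E", corrs), should_delete("M", corrs))   # False True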
The execution semantics of inter-model and intra-model construct equivalences is
summarized in Table 5.2. Two major execution semantics differences between inter-
model and intra-model construct equivalences can be identified. One difference
concerns the instantiation of RHS construct-instances and the application construct
creation and results from the use of connection correspondences in inter-model
construct equivalences but not in intra-model construct equivalences. The other
difference is the application construct deletion. Since inter-model construct equivalences
are used in the source-target projection stage of the schema translation, the main objective
is to create a target application schema corresponding to the source application schema.
Thus, the deletion of application constructs from either the source application schema or
the target application schema is not required. On the other hand, since intra-model
construct equivalences are used in the source convergence or target enhancement stage of
the schema translation or in the schema normalization process, transforming from the
current application schema into a semantically equivalent one may involve the deletion
of application constructs in the current application schema.
Instantiation of LHS construct-instances
  Inter-model construct equivalence: from the source application schema
  Intra-model construct equivalence: from the current application schema

Instantiation of RHS construct-instances
  Inter-model construct equivalence: triggered by connection correspondences
  Intra-model construct equivalence: triggered by references in LHS construct-instances

Application construct creation
  Inter-model construct equivalence: when a RHS construct-instance is not associated
    with a connection correspondence
  Intra-model construct equivalence: when a RHS construct-instance is not referenced by
    the RHS of any path selection on any LHS construct-instance

Application construct deletion
  Inter-model construct equivalence: not applicable
  Intra-model construct equivalence: when a LHS construct-instance neither appears in
    the LHS of any correspondence nor is enclosed within { } in the RHS of any
    relationship correspondence

Property and relationship correspondence
  Inter-model construct equivalence: resulting in an assignment operation
  Intra-model construct equivalence: resulting in an assignment operation

Ancillary-description
  Inter-model construct equivalence: no execution semantics
  Intra-model construct equivalence: no execution semantics

Table 5.2: Summary of Execution Semantics of Construct Equivalences
5.3 Information Preserving Construct Equivalence Transformation
Function
To allow a construct equivalence to be bi-directional, a construct equivalence
transformation function (employed by the construct equivalence transformation method)
for exchanging construct-sets between the two sides of a construct equivalence and
restructuring other components of the construct equivalence is necessary. As discussed,
it needs to ensure the validity of transformed construct equivalences and be an
information preserving function.
The tasks and steps of transforming a construct equivalence consist of 1) restructuring all
ancillary clauses in the ancillary-description clause, 2) restructuring all correspondences
(connection, property, and relationship correspondences) in the construct-correspondence
clause, 3) restructuring all complex construct-domains of LHS construct-instances, 4)
restructuring all selection clauses of the instance-selection-condition on each LHS
construct-instance, 5) restructuring all selection clauses of the set-selection-condition on
each LHS construct-instance-set, and 6) exchanging the LHS construct-set with the RHS
construct-set. Each step employs a set of restructuring operations. The restructuring
operations for steps 1 and 2 are listed in Table 5.3, and those for steps 3, 4 and 5 are listed
in Table 5.4. The operation required by step 6 is simply an exchange operation
between the two sides of a construct equivalence.
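A skeleton of this six-step function, expressed as a Python sketch, is shown below. Only the
ordering of the steps follows the text; the dictionary representation of a construct equivalence
and the placeholder step bodies are assumptions made for illustration.

    def transform(equivalence):
        eq = dict(equivalence)                       # work on a copy
        eq = restructure_ancillary_clauses(eq)       # step 1 (Table 5.3, OP1-OP4)
        eq = restructure_correspondences(eq)         # step 2 (Table 5.3, OP5-OP10)
        eq = restructure_complex_domains(eq)         # step 3 (Table 5.4, OP11-OP12)
        eq = restructure_instance_selections(eq)     # step 4 (Table 5.4, OP13-OP15)
        eq = restructure_set_selections(eq)          # step 5 (Table 5.4, OP15)
        eq["lhs"], eq["rhs"] = eq["rhs"], eq["lhs"]  # step 6: exchange construct-sets
        return eq

    # Placeholder step bodies so the skeleton runs; a real implementation would
    # apply the restructuring operations of Tables 5.3 and 5.4 here.
    def restructure_ancillary_clauses(eq): return eq
    def restructure_correspondences(eq): return eq
    def restructure_complex_domains(eq): return eq
    def restructure_instance_selections(eq): return eq
    def restructure_set_selections(eq): return eq

    print(transform({"lhs": "R, N", "rhs": "E, A"}))   # the two sides are exchanged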
Original Construct Equivalence / Restructuring Operation

Ancillary clause (N)
  - B [ ∪ C ...] ∩ {V} = ∅
    OP1: Append "- B [ - C ...]" to the construct-domain of the RHS
    construct-instance V and remove N from the ancillary-description.
  - B [ ∩ C ...] ∩ {V} = {V}
    OP2: Append "∩ B [ ∩ C ...]" to the construct-domain of the RHS
    construct-instance V and remove N from the ancillary-description.
  - Antecedent IF condition
    OP3: Exchange the antecedent of N with the condition of N. If the new
    antecedent of N is a property correspondence or a path selection of the
    first form with a set equality comparison operator, N is converted into a
    correspondence in the construct-correspondence.
  - Others
    OP4: N is converted into a selection clause of the construct-instance (or
    construct-instance-set) denoted by the first construct-instance (or
    construct-instance-set) in N.

Construct-Correspondence (C)
  - Connection correspondence
    OP5: Exchange the LHS of C with the RHS of C.
  - Unconditional property correspondence
    => S.property = value
       OP6: C is converted into a property selection of the RHS construct-instance S.
    => S.property = T.property
       OP7: Exchange the LHS of C with the RHS of C.
  - Unconditional relationship correspondence
    => S.path.M = P
       OP8: If P is a path-referenced model construct and the origin
       construct-instance of P is associated with a connection correspondence,
       exchange the LHS of C with the RHS of C. C is converted into a path
       selection clause of the construct-instance denoted by the origin
       construct-instance of P.
    => S.path.M = P <set-op> Q ...
       OP9: If <set-op> is '∩', change it into '∪'; if <set-op> is '∪', change
       it into '∩'.
  - Conditional property correspondence
    OP10: Exchange the antecedent of C with the condition of C. If the new
    antecedent of C is neither a property correspondence nor a path selection
    of the first form with a set equality comparison operator, C is converted
    into an ancillary clause in the ancillary-description.
  - Conditional relationship correspondence
    See OP10.

Table 5.3: Restructuring Operations for Ancillary Clause and Construct-Correspondence
Original Construct Equivalence / Restructuring Operation

LHS construct-instance (V) with complex construct-domain
  - V: A - B [ - C ...]
    OP11: Add an ancillary clause (B [ ∪ C ...] ∩ {V} = ∅) in the
    ancillary-description of the construct equivalence and remove
    "- B [ - C ...]" from the construct-domain of V.
  - V: A ∩ B [ ∩ C ...]
    OP12: Add an ancillary clause (B [ ∩ C ...] ∩ {V} = {V}) in the
    ancillary-description of the construct equivalence and remove
    "∩ B [ ∩ C ...]" from the construct-domain of V.

Selection clause (C) of the instance-selection-condition on a LHS construct-instance
  - Property selection clause
    OP13: C is converted into a property correspondence.
  - Path selection clause
    => First form with set equality comparison operator (i.e., S.path.M = P)
       OP14: If P is a path-referenced model construct and S is associated with
       a connection correspondence, exchange the LHS of C with its RHS (i.e.,
       C now becomes P = S.path.M). C is converted into a relationship
       correspondence.
    => First form with non-equality comparison operator, second form, and third form
       OP15: C is converted into an ancillary clause.
  - Extension selection clause
    See OP15.

Selection clause (C) of the set-selection-condition on a LHS construct-instance-set
  See OP15.

Table 5.4: Restructuring Operations for Complex Construct-Domains and Selection
Clauses on LHS Construct-Instances (or Construct-Instance-Sets)
Example 7: (continued from Example 5)
As shown in Example 5, the inter-model construct equivalence which is specified as
"Each non-foreign key attribute of a relation in the relational model is equivalent to a
single-valued attribute of the entity-class for the relation in the SOOER model. ..." is
represented as:
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute
E: SOOER.Entity-Class AND A: E.Attribute
WITH E = R AND
A.Multiplicity = 'single-valued' AND
A.Name = N.Name AND
A.Null-Spec = N.IsNull AND
A.Uniqueness = N.IsUnique AND
A.DataType = N.DataType
The transformation process of this inter-model construct equivalence is depicted below
and graphically summarized in Figure 5.1.
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute          <- OP11 (step 3)
E: SOOER.Entity-Class AND A: E.Attribute
E = R AND                                         <- OP5  (step 2)
A.Multiplicity = 'single-valued' AND              <- OP6  (step 2)
A.Name = N.Name AND                               <- OP7  (step 2)
A.Null-Spec = N.IsNull AND                        <- OP7  (step 2)
A.Uniqueness = N.IsUnique AND                     <- OP7  (step 2)
A.DataType = N.DataType                           <- OP7  (step 2)

Figure 5.1: Process of Transforming Construct Equivalence of Example 5
1) Restructuring all ancillary clauses in the ancillary-description clause:
Since the
original construct equivalence does not contain any ancillary clause, no action is
performed in this step.
2) Restructuring all correspondences in the construct-correspondence clause:
According to the restructuring operation OP5, the LHS of the connection
correspondence E = R will be exchanged with its RHS (i.e., the connection
correspondence will become R = E). Since the RHS of the property correspondence
A.Multiplicity = 'single-valued' is a constant value, OP6 is executed. Consequently,
the property correspondence becomes a property selection clause of the construct-
instance A. According to OP7, the next property correspondence clause A.Name =
N.Name is restructured and becomes N.Name = A.Name. The same restructuring
operation is applied to the remaining property correspondence clauses since their
format is the same as A.Name = N.Name.
3) Restructuring complex construct-domains of LHS construct-instances: The LHS
construct-instance N is defined on the complex construct-domain R.Attribute -
R.Foreign-Key.Attribute. Therefore, according to OP11, the ancillary clause
R.Foreign-Key.Attribute ∩ {N} = ∅ will be added to the ancillary-description.
4) Restructuring instance-selection-conditions on LHS construct-instances: Since the
construct equivalence has no instance-selection-condition, no action is taken in this
step.
5) Restructuring set-selection-conditions on LHS construct-instance-sets: Since the LHS
construct-set of the construct equivalence does not involve any construct-instance-set,
no action is taken in this step. The inter-model construct equivalence after this
step is as follows:
R: Relational.Relation AND
N: R.Attribute
E: SOOER.Entity-Class AND
A: E.Attribute (WHERE A.Multiplicity = 'single-valued')
WITH R = E AND
N.Name = A.Name AND
N.IsNull = A.Null-Spec AND
N.IsUnique = A.Uniqueness AND
N.DataType = A.DataType
HAVING R.Foreign-Key.Attribute ∩ {N} = ∅
6) Exchanging the LHS construct-set with the RHS construct-set: The LHS construct-
instances R and N are exchanged with the RHS construct-instances E and A. The
resulting construct equivalence is shown in Figure 5.2.
E: SOOER.Entity-Class AND
A: E.Attribute (WHERE A.Multiplicity = 'single-valued')
R: Relational.Relation AND
N: R.Attribute
WITH R = E AND
N.Name = A.Name AND
N.IsNull = A.Null-Spec AND
N.IsUnique = A.Uniqueness AND
N.DataType = A.DataType
HAVING R.Foreign-Key.Attribute ∩ {N} = ∅
Figure 5.2: Construct Equivalence Transformed from Example 5
The transformed inter-model construct equivalence can be interpreted as: "Each single-
valued attribute of an entity-class in the SOOER model is equivalent to an attribute of the
relation for the entity-class in the relational model. The attribute in the relation is a
non-foreign key attribute (as specified in the ancillary clause R.Foreign-Key.Attribute ∩
{N} = ∅). The name (or data-type) of the non-foreign key attribute is the same as that of
the single-valued attribute. Whether the non-foreign key attribute allows null values or
not is determined by the null-specification of the single-valued attribute. The IsUnique
property of the non-foreign key attribute corresponds to the uniqueness property of the
single-valued attribute." The meaning of the transformed inter-model construct
equivalence conforms to the reverse direction of its original construct equivalence.
Example 8: (continued from Example 6)
The intra-model construct equivalence within the SOOER model shown in Example 6
translates a multi-valued attribute into a new entity-class, a single-valued attribute, and
an association relationship connecting the new entity-class and the entity-class to which
the multi-valued attribute belongs. The complete intra-model construct equivalence is
represented as below.
E: Entity-Class AND
M: E.Attribute (WHERE M.Multiplicity = 'multi-valued')
N: Entity-Class AND
A: Attribute AND
R: Association
WITH A.Multiplicity = 'single-valued' AND
R.relate.Entity-Class = {E, N} AND
N.Attribute = {A} AND
N.Identifier.Attribute = N.Attribute AND
A.Name = M.Name AND
A.DataType = M.DataType AND
A.Uniqueness = 'unique' AND
A.Null-Spec = 'not-null' AND
R.relate.E(Max-Card) = m IF M.Uniqueness = 'not-unique' AND
R.relate.E(Max-Card) = 1 IF M.Uniqueness = 'unique' AND
R.relate.E(Min-Card) = 1 AND
R.relate.N(Max-Card) = m AND
R.relate.N(Min-Card) = 0 IF M.Null-Spec = 'null' AND
R.relate.N(Min-Card) = 1 IF M.Null-Spec = 'not-null'
The transformation process of this intra-model construct equivalence is graphically
depicted in Figure 5.3. The resulting construct equivalence is shown in Figure 5.4.
E: Entity-Class AND
M: E.Attribute (WHERE M.Multiplicity = 'multi-valued')        <- OP13 (step 4)
N: Entity-Class AND
A: Attribute AND
R: Association
WITH A.Multiplicity = 'single-valued' AND                     <- OP6  (step 2)
R.relate.Entity-Class = {E, N} AND                            <- OP8  (step 2)
N.Attribute = {A} AND                                         <- OP8  (step 2)
N.Identifier.Attribute = N.Attribute AND                      <- OP8  (step 2)
A.Name = M.Name AND                                           <- OP7  (step 2)
A.DataType = M.DataType AND                                   <- OP7  (step 2)
A.Uniqueness = 'unique' AND                                   <- OP6  (step 2)
A.Null-Spec = 'not-null' AND                                  <- OP6  (step 2)
R.relate.E(Max-Card) = m IF M.Uniqueness = 'not-unique' AND   <- OP10 (step 2)
R.relate.E(Max-Card) = 1 IF M.Uniqueness = 'unique' AND       <- OP10 (step 2)
R.relate.E(Min-Card) = 1 AND                                  <- OP6  (step 2)
R.relate.N(Max-Card) = m AND                                  <- OP6  (step 2)
R.relate.N(Min-Card) = 0 IF M.Null-Spec = 'null' AND          <- OP10 (step 2)
R.relate.N(Min-Card) = 1 IF M.Null-Spec = 'not-null'          <- OP10 (step 2)

Figure 5.3: Process of Transforming Construct Equivalence of Example 6
N: Entity-Class (WHERE N.Attribute = {A} AND
                       N.Attribute = N.Identifier.Attribute) AND
A: Attribute (WHERE A.Multiplicity = 'single-valued' AND
                    A.Uniqueness = 'unique' AND
                    A.Null-Spec = 'not-null') AND
R: Association (WHERE R.relate.Entity-Class = {E, N} AND
                      R.relate.E(Min-Card) = 1 AND
                      R.relate.N(Max-Card) = m)
E: Entity-Class AND
M: E.Attribute
WITH M.Name = A.Name AND
M.DataType = A.DataType AND
M.Uniqueness = 'not-unique' IF R.relate.E(Max-Card) = m AND
M.Uniqueness = 'unique' IF R.relate.E(Max-Card) = 1 AND
M.Null-Spec = 'null' IF R.relate.N(Min-Card) = 0 AND
M.Null-Spec = 'not-null' IF R.relate.N(Min-Card) = 1 AND
M.Multiplicity = 'multi-valued'

Figure 5.4: Construct Equivalence Transformed from Example 6
This transformed intra-model construct equivalence shown in Figure 5.4 can be
interpreted as: "In the SOOER model, each entity-class N having a 'single-valued',
'unique' and 'not-null' attribute as its only attribute as well as its identifier, and an
association relationship R connecting N (with the maximal cardinality as m) to another
entity-class E (with the minimal cardinality as 1) are equivalent to a multi-valued attribute
of the entity-class E. The name and the data type of the multi-valued attribute are the
same as those of the single-valued attribute. If the maximal cardinality of the link
between the association relationship R and the entity-class E is m (i.e., an instance of the
entity-class N can be associated with multiple instances of the entity-class E), the
Uniqueness property of the multi-valued attribute is 'not-unique'; otherwise, it is
'unique'. If the minimal cardinality of the link between the association relationship R
and the entity-class N is 0 (i.e., an instance of the entity-class E may not be associated
with any instance of the entity-class N), the Null-Spec property of the multi-valued
attribute is 'null'; otherwise, it is 'not-null'."
As discussed, an essential requirement of the construct equivalence transformation
function is information preservation. A construct equivalence transformation function F
is an information preserving function if F(F(E)) = E, where E is a construct equivalence.
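Stated operationally, information preservation is a round-trip property that can be checked
mechanically. The following Python sketch uses a deliberately minimal transform that only
exchanges the two construct-sets; a full implementation would also apply the restructuring
operations of Tables 5.3 and 5.4. The dictionary layout is assumed for illustration only.

    def transform(eq):
        # Minimal stand-in: exchange the two construct-sets and keep the rest.
        return {"lhs": eq["rhs"], "rhs": eq["lhs"],
                "with": eq["with"], "having": eq["having"]}

    def is_information_preserving(transform_fn, eq):
        # F is information preserving on eq exactly when F(F(eq)) = eq.
        return transform_fn(transform_fn(eq)) == eq

    example = {"lhs": ["R: Relational.Relation", "N: R.Attribute"],
               "rhs": ["E: SOOER.Entity-Class", "A: E.Attribute"],
               "with": ["E = R", "A.Name = N.Name"],
               "having": []}
    print(is_information_preserving(transform, example))   # True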
Theorem 1: The construct equivalence transformation function based on the restructuring
operations is an information preserving function.
[Proof]
An information preserving construct equivalence transformation function requires that
the effect resulting from a restructuring operation can be inverted by the same or a
different restructuring operation. In other words, every restructuring operation O needs
to have an inverse restructuring operation R whose inverse restructuring operation is the
restructuring operation O. Table 5.5 shows the inverse restructuring operation of each
restructuring operation employed by the construct equivalence transformation function.
Restructuring Operation (with its Inverse Restructuring Operation)

OP1 (ancillary clause -> complex construct-domain)
    inverse: OP11 (complex construct-domain -> ancillary clause)
OP2 (ancillary clause -> complex construct-domain)
    inverse: OP12 (complex construct-domain -> ancillary clause)
OP3 (ancillary clause with condition -> ancillary clause with condition)
    inverse: OP3 (ancillary clause with condition -> ancillary clause with condition)
OP3 (ancillary clause with condition -> conditional correspondence)
    inverse: OP10 (conditional correspondence -> ancillary clause with condition)
OP4 (ancillary clause -> selection clause)
    inverse: OP15 (path selection, extension selection, and set-selection-condition
    -> ancillary clause)
OP5 (connection correspondence exchange)
    inverse: OP5 (connection correspondence exchange)
OP6 (unconditional property correspondence -> property selection)
    inverse: OP13 (property selection -> unconditional property correspondence)
OP7 (unconditional property correspondence exchange)
    inverse: OP7 (unconditional property correspondence exchange)
OP8 (unconditional relationship correspondence -> path selection)
    inverse: OP14 (path selection -> unconditional relationship correspondence)
OP9 (unconditional relationship correspondence set-op conversion)
    inverse: OP9 (unconditional relationship correspondence set-op conversion)
OP10 (conditional correspondence -> conditional correspondence)
    inverse: OP10 (conditional correspondence -> conditional correspondence)
OP10 (conditional correspondence -> ancillary clause with condition)
    inverse: OP3 (ancillary clause with condition -> conditional correspondence)
OP11 (complex construct-domain -> ancillary clause)
    inverse: OP1 (ancillary clause -> complex construct-domain)
OP12 (complex construct-domain -> ancillary clause)
    inverse: OP2 (ancillary clause -> complex construct-domain)
OP13 (property selection -> unconditional property correspondence)
    inverse: OP6 (unconditional property correspondence -> property selection)
OP14 (path selection -> unconditional relationship correspondence)
    inverse: OP8 (unconditional relationship correspondence -> path selection)
OP15 (path selection, extension selection, and set-selection-condition -> ancillary clause)
    inverse: OP4 (ancillary clause -> selection clause)

Table 5.5: Invertibility of the Restructuring Operations of the Construct Equivalence
Transformation Function
Let the construct equivalence to be transformed be E, the transformed construct
equivalence of E be E', and the transformed construct equivalence of E' be E". In the
following, the proof of the invertibility between the restructuring operations OP1 and
OP11 is provided. The proofs of the invertibility between the rest of the restructuring
operations can be obtained similarly and thus will not be provided further.
Case 1: The inverse restructuring operation of OP1 is OP11.
For each ancillary clause N of E with the form "B [ ∪ C ...] ∩ {V} = ∅," the
restructuring operation OP1 appends "- B [ - C ...]" to the construct-domain of the
RHS construct-instance V and removes N from the ancillary-description. Let the
original construct-domain of V be A. A must be a simple construct-domain since the
construct-instance V is in the RHS of the construct equivalence to be transformed. After
the construct-set exchange as indicated in step 6 of the transformation function, the
construct-instance V becomes a LHS construct-instance with a complex construct-
domain "V: A - B [ - C ...]" in E'. When E' is transformed into E", the restructuring
operation OP11 adds an ancillary clause "B [ ∪ C ...] ∩ {V} = ∅" into the ancillary-
description and removes "- B [ - C ...]" from the construct-domain of the LHS
construct-instance V (i.e., the construct-domain of V is A now). After the construct-sets
of E' are exchanged, the construct-instance V becomes a RHS construct-instance in E".
Since V is a RHS construct-instance with a simple construct-domain A in both E and E",
the ancillary clause N is "B [ ∪ C ...] ∩ {V} = ∅" in both E and E", and no other
component in a construct equivalence is restructured by OP1 and OP11, E and E" are
identical with respect to all ancillary clauses with the form "B [ ∪ C ...] ∩ {V} = ∅."
Thus, the inverse restructuring operation of OP1 is OP11.
Case 2: The inverse restructuring operation of OP11 is OP1.
For each complex construct-domain with the form "V: A - B [ - C ...]" for a LHS
construct-instance V, the restructuring operation OP11 adds an ancillary clause "B [ ∪ C
...] ∩ {V} = ∅" into the ancillary-description and removes "- B [ - C ...]" from the
construct-domain of the LHS construct-instance V (i.e., V has a simple construct-domain
A now). After the construct-sets of E are exchanged, the construct-instance V becomes a
RHS construct-instance in E'. When E' is transformed into E", since E' has an ancillary
clause with the form "B [ ∪ C ...] ∩ {V} = ∅," the restructuring operation OP1
appends "- B [ - C ...]" to the construct-domain of the RHS construct-instance V (i.e.,
now the construct-instance V has a complex construct-domain "A - B [ - C ...]") and
removes the ancillary clause from the ancillary-description. After the construct-sets of
E' are exchanged, the construct-instance V becomes a LHS construct-instance with a
complex construct-domain in E". Since V is a LHS construct-instance with a complex
construct-domain "V: A - B [ - C ...]" in both E and E", the ancillary clause
"B [ ∪ C ...] ∩ {V} = ∅" added by OP11 has been removed by OP1, and no other
component in a construct equivalence is restructured by OP11 and OP1, E and E" are
identical with respect to each complex construct-domain of a LHS construct-instance.
Thus, the inverse restructuring operation of OP11 is OP1.
Thus, combining Case 1 and Case 2, the invertibility of the restructuring operations OP1
and OP11 is proved. As mentioned, the invertibility between the rest of the restructuring
operations can also be proved similarly. Hence, the construct equivalence transformation
function based on the restructuring operations OP1 to OP15 is an information preserving
function.
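The Case 1 and Case 2 arguments can also be checked concretely. The following Python
sketch implements OP1 and OP11 over a toy record that keeps only the parts these two
operations touch (the field names are invented here) and verifies that applying one after
the other restores the original; the intervening construct-set exchange is omitted because
it does not alter these components.

    def op1(eq):
        """Ancillary clause "B ∩ {V} = ∅" -> complex construct-domain "V: A - B"."""
        subtracted = [a[1] for a in eq["ancillary"] if a[0] == "disjoint"]
        remaining = [a for a in eq["ancillary"] if a[0] != "disjoint"]
        return {"v_domain": (eq["v_domain"][0], eq["v_domain"][1] + subtracted),
                "ancillary": remaining}

    def op11(eq):
        """Complex construct-domain "V: A - B" -> ancillary clause "B ∩ {V} = ∅"."""
        base, subtracted = eq["v_domain"]
        return {"v_domain": (base, []),
                "ancillary": eq["ancillary"] + [("disjoint", b) for b in subtracted]}

    # E: V has the simple domain R.Attribute together with the ancillary clause
    # R.Foreign-Key.Attribute ∩ {V} = ∅ (cf. Example 7).
    E = {"v_domain": ("R.Attribute", []),
         "ancillary": [("disjoint", "R.Foreign-Key.Attribute")]}
    E_prime = op1(E)          # domain becomes R.Attribute - R.Foreign-Key.Attribute
    E_double = op11(E_prime)  # OP11 restores the ancillary clause and the simple domain
    print(E_double == E)      # True: OP1 and OP11 undo each other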
Example 9: (continued from Example 7)
The construct equivalence specified as "each non-foreign key attribute of a relation in the
relational model is equivalent to a single-valued attribute of the entity-class for the
relation in the SOOER model. ..." is transformed into the construct equivalence shown
in Figure 5.2. Here, it will be shown that, after applying the construct equivalence
transformation function to this transformed construct equivalence, the resulting construct
equivalence will be the same as the original construct equivalence. The process of
transforming the construct equivalence transformed from Example 5 is summarized in
Figure 5.5.
E: SOOER.Entity-Class AND
A: E.Attribute (WHERE A.Multiplicity = 'single-valued')    <- OP13 (step 4)
R: Relational.Relation AND
N: R.Attribute
WITH R = E AND                                             <- OP5  (step 2)
N.Name = A.Name AND                                        <- OP7  (step 2)
N.IsNull = A.Null-Spec AND                                 <- OP7  (step 2)
N.IsUnique = A.Uniqueness AND                              <- OP7  (step 2)
N.DataType = A.DataType                                    <- OP7  (step 2)
HAVING R.Foreign-Key.Attribute ∩ {N} = ∅                   <- OP1  (step 1)

Figure 5.5: Construct Equivalence Transformed from Example 5.
The invertibility of the restructuring operations can be observed by comparing Figure 5.1
with Figure 5.5. For example, A.Multiplicity = 'single-valued' initially was a property
correspondence. As shown in Figure 5.1, this property correspondence was transformed
by the restructuring operation OP6 and became a property selection clause of the
construct-instance A (as shown in Figure 5.2). When transforming the construct
equivalence shown in Figure 5.5, the restructuring operation OP13 is applied to the
property selection clause A.Multiplicity = 'single-valued' of the LHS construct-instance
A and converts the property selection clause into a property correspondence. Thus, the
restructuring operation OP13 offsets the effect of OP6 and is therefore the inverse
restructuring operation of OP6. Furthermore, the inverse restructuring operation of
OP5 is itself, as shown in Table 5.5 and illustrated by the restructuring operation OP5 on
E = R in Figure 5.1 and OP5 on R = E in Figure 5.5.
Consequently, the construct equivalence resulting from Figure 5.5 is depicted in Figure
5.6. It is the same as its original construct equivalence, as shown in Example 5 and the
beginning of Example 7.
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute
E: SOOER.Entity-Class AND
A: E.Attribute
WITH E = R AND
A.Name = N.Name AND
A.Null-Spec = N.IsNull AND
A.Uniqueness = N.IsUnique AND
A.DataType = N.DataType AND
A.Multiplicity = 'single-valued'
Figure 5.6: Construct Equivalence Transformed from Example 5
5.4 Evaluation of Construct Equivalence Assertion Language Against
Design Principles
A construct equivalence specified in CEAL only asserts an equivalence (similar to a
mathematical equation) between two sets of model constructs.
Although construct
equivalences are associated with execution semantics, the reasoning method is not
embedded in construct equivalences. Thus, it is declarative and satisfies the first design
principle (i.e., declarative nature) for a construct equivalence representation defined in
Section 5.1. The use of the ancillary-description clause and the information-preserving
construct equivalence transformation function ensures that a construct equivalence can
be interpreted and reasoned from either direction. Thus, the second design principle (i.e.,
bi-directional construct equivalences) is observed. A construct-instance (or a construct-
instance-set) in the construct-set of a construct equivalence is a variable for an instance
(or a set of instances) of a model construct and thus can be instantiated by application
constructs in an application schema to be translated (as required by the third design
principle, instantiable by application constructs).
As discussed, a construct equivalence defines an equivalence between two construct-sets
(i.e., construct-set = construct-set), each of which consists of a conjunction of construct-
instances and/or construct-instance-sets. If both construct-sets in a construct equivalence
consist of a single construct-instance or construct-instance-set, the construct equivalence
is one-to-one. It is a one-to-many (or many-to-one) construct equivalence if one of its
construct-sets consists of a single construct-instance or construct-instance-set and the
other has multiple construct-instances and/or construct-instance-sets. If both of the
construct-sets include multiple construct-instances and/or construct-instance-sets, it
becomes a many-to-many construct equivalence. Accordingly, CEAL is capable of
expressing one-to-one, one-to-many (or many-to-one), and many-to-many construct
equivalences; thus, the fourth design principle is satisfied. Moreover, the construct-
domain of a construct-instance can be annotated by a model name. All construct-
instances and construct-instance-sets belonging to the same construct-set of a construct
equivalence must be drawn from the same model schema, while the construct-sets on
different sides of a construct equivalence can be defined upon different model schemas.
If both of the construct-sets of a construct equivalence are from the same model schema,
the construct equivalence is an intra-model one; otherwise, it becomes an inter-model
construct equivalence. Therefore, a single language for both intra-model and inter-model
construct equivalences, as required by the last design principle, is supported in CEAL.
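Under a dictionary-style representation such as the one used in the earlier sketches, both
classifications can be read directly off a construct equivalence. The following Python
example is a sketch with invented construct-set contents; it is not drawn from the CEAL
processor itself.

    def arity(eq):
        left, right = len(eq["lhs"]), len(eq["rhs"])
        if left == 1 and right == 1:
            return "one-to-one"
        if left == 1 or right == 1:
            return "one-to-many (or many-to-one)"
        return "many-to-many"

    def kind(eq):
        # A construct-domain such as "Relational.Relation" is prefixed by the name
        # of its model schema; both sides from the same model -> intra-model.
        def models(side):
            return {domain.split(".")[0] for domain in side}
        return "intra-model" if models(eq["lhs"]) == models(eq["rhs"]) else "inter-model"

    eq_a = {"lhs": ["Relational.Relation", "Relational.Attribute"],
            "rhs": ["SOOER.Entity-Class", "SOOER.Attribute"]}
    eq_b = {"lhs": ["SOOER.Attribute"],
            "rhs": ["SOOER.Entity-Class", "SOOER.Attribute", "SOOER.Association"]}
    print(arity(eq_a), kind(eq_a))   # many-to-many inter-model
    print(arity(eq_b), kind(eq_b))   # one-to-many (or many-to-one) intra-model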
As for the fifth design principle (i.e., partial model construct in a construct equivalence)
with respect to LHS construct-instances, the supporting features in CEAL include
complex construct-domains for, and instance-selection-conditions on, LHS
construct-instances. As illustrated previously, a partial model construct "each
non-foreign key attribute in the relational model" can be expressed with a complex
construct-domain as "Relational.Attribute - Relational.Foreign-Key.Attribute". As
such, a complex construct-domain can be used to define a partial model construct in a
construct equivalence in terms of set operations on two or more model constructs. In
addition, an instance-selection-condition on a LHS construct-instance can be employed
to specify a partial model construct in terms of restrictions on its property, path, or
extension. On the other hand, property correspondences with values in their RHSs and
relationship correspondences whose origin construct-instances are specified in the RHS
of construct equivalences determine some characteristics of RHS construct-instances.
Thus, even though RHS construct-instances cannot be associated with complex
construct-domains or instance-selection-conditions, partial model constructs in the RHS
of a construct equivalence can still be represented in the construct-correspondence clause.
The use of construct-instance-sets and the protocol for referencing the inner
construct-instances of construct-instance-sets give CEAL the expressiveness for
specifying multiple instances of the same model construct in construct equivalences and
thus support the sixth design principle. The provision of property and relationship
correspondences in the construct-correspondence contributes to CEAL's capability in
defining detailed correspondences for construct equivalences, as required by the seventh
design principle. Table 5.6 summarizes the evaluation of CEAL against its design
principles.
Design Principle / Supporting Features in Construct Equivalence Assertion Language

1. Declarative nature
   => Expression of construct equivalences similar to mathematical equations
2. Bi-directional construct equivalences
   => Ancillary-description in the HAVING clause
   => Information-preserving transformation function
3. Instantiable by application constructs
   => Construct-instance and construct-instance-set
4. 1-to-1, 1-to-m or m-to-m construct equivalence
   => Construct-set as a conjunction of construct-instances and/or construct-instance-sets
5. Partial model construct in a construct equivalence
   => Complex construct-domain for a LHS construct-instance
   => Selection-condition on a construct-instance
   => Property and relationship correspondence for a RHS construct-instance
6. Multiple instances of the same model construct in a construct equivalence
   => Construct-instance-set possibly with set-selection-condition
7. Detailed correspondences for construct equivalences
   => Construct-correspondence in the WITH clause
      - Property correspondence
      - Relationship correspondence
8. Single language for both inter-model and intra-model construct equivalence
   => Model name prefixed in model constructs when constructing construct-domains

Table 5.6: Evaluation of Construct Equivalence Assertion Language
CHAPTER 6
Construct-Equivalence-Based Schema Translation and
Schema Normalization
This chapter details algorithms for the construct-equivalence-based schema translation
and construct-equivalence-based schema normalization. As shown in Figure 2.9, since
both are built upon the construct equivalence transformation and reasoning methods, the
development of these two methods will be described first.
6.1 Construct Equivalence Transformation Method
The construct equivalence transformation method ensures that the construct-sets of each
construct equivalence involved in a particular translation or normalization task are in the
desired direction. The construct-equivalence-based schema translation involves three
transformation tasks: 1) transformation of the inter-model construct equivalences
between the source and the target data model, 2) transformation of the intra-model
construct equivalences of the source data model, and 3) transformation of the intra-model
construct equivalences of the target data model. On the other hand, construct-
equivalence-based schema normalization requires only one transformation task, which is
to transform, based on pre-determined normalization criteria, the intra-model construct
equivalences of the data model in which the application schema is specified. Because the
transformation task required by the construct-equivalence-based schema normalization is
highly similar to the third transformation task of the construct-equivalence-based schema
translation, the development of these two algorithms will be discussed together.
Algorithm for Transformation of Inter-Model Construct Equivalences:
This algorithm ensures that the LHS construct-set of each inter-model construct
equivalence is specified on the model constructs of the source data model and that its
RHS construct-set is specified on the model constructs of the target data model. Thus, it
simply involves exchanging the construct-sets of all inter-model construct equivalences
whose current directions are the reverse of the desired direction. The detailed algorithm
of this transformation is depicted in Algorithm 6.1.
InterCE-Transform(CEST, SM, TM):
/* Input: CEST (inter-model construct equivalences between the source and the
     target data model), SM (the name of the source data model), and TM (the
     name of the target data model).
   Result: transformed CEST (each inter-model construct equivalence whose LHS
     construct-set is specified on the model constructs of the source data model
     and whose RHS construct-set is specified on the model constructs of the
     target data model) */
Begin
   For each inter-model construct equivalence E in CEST
      If the LHS construct-set of E is specified on the model constructs of TM
      Then transform E according to the construct equivalence transformation
           function described in Section 5.3.
End.
Algorithm 6.1: Inter-model Construct Equivalence Transformation
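A direct Python rendering of Algorithm 6.1 might look as follows, assuming a minimal
representation in which each inter-model construct equivalence records the model name on
which each of its construct-sets is specified; this is a sketch, not the system's actual code.

    def transform(eq):
        # Stand-in for the transformation function of Section 5.3: only the sides
        # (and the models they are specified on) are exchanged in this sketch.
        return {"lhs_model": eq["rhs_model"], "rhs_model": eq["lhs_model"],
                "body": eq["body"][::-1]}

    def inter_ce_transform(ce_st, sm, tm):
        """Orient every inter-model construct equivalence so that its LHS is on the
        source data model (sm) and its RHS is on the target data model (tm).
        As in Algorithm 6.1, only tm is needed for the direction check; sm is
        carried for symmetry with the algorithm's signature."""
        result = []
        for eq in ce_st:
            if eq["lhs_model"] == tm:          # currently in the reverse direction
                eq = transform(eq)
            result.append(eq)
        return result

    ces = [{"lhs_model": "Relational", "rhs_model": "SOOER", "body": ("R", "E")},
           {"lhs_model": "SOOER", "rhs_model": "Relational", "body": ("E", "R")}]
    oriented = inter_ce_transform(ces, sm="Relational", tm="SOOER")
    print([eq["lhs_model"] for eq in oriented])   # ['Relational', 'Relational']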
Algorithm for Transformation of Intra-Model Construct Equivalences of the Source
Data Model:
The goal of this transformation is to ensure that all model constructs in the non-overlapped
semantic space of the source data model can be translated into the model
constructs in the overlapped semantic space of the source data model. Since the
overlapped semantic space of the source data model has, in fact, been represented in the
LHS construct-sets of the inter-model construct equivalences, this transformation is
directed by the LHS construct-sets of all inter-model construct equivalences.
Accordingly, the transformation process is initiated by identifying the intra-model
construct equivalences which are directly RHS-associated with some inter-model
construct equivalences, followed by identifying the intra-model construct equivalences
which are indirectly RHS-associated with the inter-model construct equivalences (i.e.,
directly RHS-associated with the union of the inter-model construct equivalences and the
intra-model construct equivalences which have been identified as directly or indirectly
RHS-associated with the inter-model construct equivalences). A construct equivalence is
said to be directly RHS-associated with a set of construct equivalences if the RHS
construct-set of the former is the same as or a subset of any combination of the LHS
construct-sets of the latter. Here, a construct-set CS1 is said to be the same as another
construct-set CS2 if 1) for each construct-instance CI in CS1 there exists one
construct-instance in CS2 whose construct-domain and instance-selection-condition are
the same as those of CI, 2) for each construct-instance-set CIS in CS1 there exists one
construct-instance-set in CS2 whose inner construct-instance and set-selection-condition
are the same as those of CIS, 3) for each construct-instance CI in CS2 there exists one
construct-instance in CS1 whose construct-domain and instance-selection-condition are
the same as those of CI, and 4) for each construct-instance-set CIS in CS2 there exists one
construct-instance-set in CS1 whose inner construct-instance and set-selection-condition
are the same as those of CIS. A construct-set CS1 is said to be a subset of another
construct-set CS2 if 1) for each construct-instance CI in CS1 there exists one
construct-instance in CS2 whose construct-domain and instance-selection-condition are
the same as those of CI, and 2) for each construct-instance-set CIS in CS1 there exists one
construct-instance-set in CS2 whose inner construct-instance and set-selection-condition
are the same as those of CIS. Since the construct-sets between the two sides of a construct
equivalence are interchangeable, checking whether a construct equivalence is
RHS-associated with a set of construct equivalences needs to be performed on both the
construct equivalence and its transformed construct equivalence (according to the
construct equivalence transformation function described in Section 5.3). Furthermore,
after all of the intra-model construct equivalences directly and indirectly RHS-associated
with the inter-model construct equivalences are identified, the remaining intra-model
construct equivalences are removed from the set of intra-model construct equivalences
for the intended schema translation because they cannot lead to any model construct in
the overlapped semantic space. The detailed algorithm for transforming the intra-model
construct equivalences of the source data model is shown in Algorithm 6.2.
SourceCE-Transform(CEs, CEST):
/* Input: CEs (intra-model construct equivalences of the source data model) and
     CEST (inter-model construct equivalences).
   Result: transformed CEs */
Begin
   Goal-Set = CEST.
   Mark every intra-model construct equivalence in CEs as 'unprocessed'.
   Repeat
      For each 'unprocessed' intra-model construct equivalence E in CEs
         > Transform E and assign the resulting construct equivalence to E'.
         > If E is directly RHS-associated with Goal-Set (i.e., the RHS construct-
           set of E is the same as or a subset of any combination of LHS construct-
           sets in Goal-Set)
           Then If E' is directly RHS-associated with Goal-Set
                Then
                   n1 = the number of application constructs created - the
                        number of application constructs deleted, if E is
                        executed (based on the execution semantics defined in
                        Section 5.2.5).
                   n2 = the number of application constructs created - the
                        number of application constructs deleted, if E' is
                        executed.
                   If n1 >= n2 Then select E Else select E' (see Discussion 1)
                Else select E.
           Else If E' is directly RHS-associated with Goal-Set
                Then select E'.
         > If E is selected Then insert E into Goal-Set and mark E as 'processed'.
         > If E' is selected
           Then insert E' into Goal-Set, replace E by E', and mark E as 'processed'.
   Until (no intra-model construct equivalence is inserted into the Goal-Set within this
         for-loop) or (all intra-model construct equivalences in CEs have the status of
         'processed').
   Remove all of the 'unprocessed' intra-model construct equivalences from CEs.
End.
Algorithm 6.2: Intra-model Construct Equivalence Transformation for Source Data
Model
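The test of being directly RHS-associated, which Algorithm 6.2 relies on, can be sketched as
follows. A construct-set is reduced to a set of (construct-domain, selection-condition) pairs
so that the "same as or a subset of" definitions above become a set-containment check; this
encoding and the example construct names are invented for illustration.

    def rhs_associated(eq, goal_set):
        """True if eq's RHS construct-set is the same as, or a subset of, some
        combination of the LHS construct-sets of the equivalences in goal_set.
        Checking against the union of all LHS construct-sets is equivalent to
        checking against every possible combination."""
        combined_lhs = set()
        for goal_eq in goal_set:
            combined_lhs |= goal_eq["lhs"]
        return eq["rhs"] <= combined_lhs

    inter_ce = {"lhs": frozenset({("Relational.Relation", None)}),
                "rhs": frozenset({("SOOER.Entity-Class", None)})}
    # A hypothetical intra-model construct equivalence of the relational model
    # whose RHS construct-set consists of a single relation construct-instance.
    intra_ce = {"lhs": frozenset({("Relational.Nested-Table", None)}),
                "rhs": frozenset({("Relational.Relation", None)})}
    print(rhs_associated(intra_ce, [inter_ce]))   # True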
Discussion 1:
When both a construct equivalence E and its transformed construct equivalence E' are
directly RHS-associated with a set of construct equivalences S, E (and of course E') is a
construct equivalence involving a subset of model constructs of S. In this case, three
alternatives can be employed: both E and E' are included, neither of them is included, or
either E or E' is included in chains of reasoning. The first alternative will result in a
cycle when performing source convergence of the schema translation. In other words,
some application constructs (in the source application schema) denoted by the LHS
construct-set of E can be translated into application constructs denoted by the RHS of E.
By applying E', the application constructs resulting from applying E can be translated
back to the application constructs before both E and E' are applied. To avoid increasing
the complexity of the source convergence algorithm, this alternative will not be adopted.
Although the second alternative does not cause a reasoning cycle during the source
convergence stage, dropping both E and E' may degrade the quality of the target
application schema since any additional semantics possibly generated by E or E' is not
available for the target-enhancement stage. Thus, the second alternative also will not be
adopted.
The third alternative, which does not have the drawback of the second
alternative but preserves its advantage, selects either E or E' to be included in chains of
reasoning. The decision on which construct equivalence to select can be determined by
the number of net application constructs created by them (i.e., the number of application
constructs created minus the number of application constructs deleted, if E or E' is
executed). A simple heuristic can be developed accordingly: if the number of net
application constructs resulting from executing E is greater than or equal to that of E',
then E is selected; otherwise, E' is selected.
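A minimal Python sketch of this heuristic; net_constructs is an assumed caller-supplied function returning the number of application constructs created minus the number deleted when a construct equivalence is executed (the execution semantics of Section 5.3 are not reimplemented here).

    def select_between(e, e_transformed, net_constructs):
        """Choose between a construct equivalence and its transformed form when both
        are directly RHS-associated with the goal set: keep the one that creates at
        least as many net application constructs; ties favour the original E."""
        n1 = net_constructs(e)              # constructs created minus deleted for E
        n2 = net_constructs(e_transformed)  # the same quantity for E'
        return e if n1 >= n2 else e_transformed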
Algorithm for Transformation of Intra-Model Construct Equivalences of the Target
Data Model (or the Data Model Used in Schema Normalization):
The goal of this transformation in the construct-equivalence-based schema translation is
to expand the use of model constructs of the target data model by incorporating the
model constructs in the non-overlapped semantic space of the target data model. The
unrestricted scenario is to utilize all model constructs available in the target data model
for expressing the target application schema. However, sometimes, it may be preferable
not to employ some model constructs of the target data model in the target application
schema. In such a case, chains of reasoning in the target enhancement stage in the
construct-equivalence-based schema translation should not be terminated at these
undesired model constructs.
Thus, the transformation process of the intra-model
construct equivalences of the target data model is driven by the overlapped semantic
space of the target data model, represented in the RHS of the inter-model construct
equivalences, and is constrained by all undesired model constructs of the target data
model. The representation of each undesired model construct is the same as the syntax
for construct-instance defined in Section 5.2.2. The set of all undesired model constructs
(called the aversion set) is connected by conjunctive operators.
The general transformation flow for the intra-model construct-equivalences of the target
data model is similar to that for the intra-model construct-equivalences of the source data
model (as shown in Algorithm 6.2). Since this transformation process is driven by the
RHS construct-sets of the inter-model construct equivalences, the identification of intra-model construct equivalences directly and indirectly associated with the inter-model
construct equivalences should be defined on the LHS construct-sets of the intra-model
construct equivalences of the target data model. A construct equivalence is said to be
directly LHS-associated with a set of construct equivalences if the LHS construct-set of
the former is the same as or a subset of any combination of RHS construct-sets of the
latter. After all of the intra-model construct equivalences directly and indirectly LHS-associated with the inter-model construct equivalences are identified, the remaining
intra-model construct equivalences are removed from the set of intra-model construct
equivalences for the intended schema translation since they cannot be visited directly or
indirectly from the overlapped semantic space. In addition, all terminal intra-model
construct equivalences (i.e., the intra-model construct equivalences which are not
directly RHS-associated with the rest of intra-model construct equivalences) whose RHS
construct-sets include any undesired model construct in the aversion set should be
removed as well.
The detailed algorithm for transforming intra-model construct
equivalences of the target data model is depicted in Algorithm 6.3.
TargetCE-Transform(CEt, CEst, A):
/* Input: CEt (intra-model construct equivalences of the target data model),
          CEst (inter-model construct equivalences), and A (aversion set).
   Result: transformed CEt */
Begin
  Origin-Set = CEst.
  Mark every intra-model construct equivalence in CEt as 'unprocessed'.
  Repeat
    For each 'unprocessed' intra-model construct equivalence E in CEt
    > Transform E and assign the resulting construct equivalence to E'.
    > If E is directly LHS-associated with Origin-Set (i.e., the LHS construct-set
      of E is the same as or a subset of any combination of RHS construct-sets
      in Origin-Set)
      Then If E' is directly LHS-associated with Origin-Set
           Then n1 = the number of application constructs created - the number
                     of application constructs deleted, if E is executed.
                n2 = the number of application constructs created - the number
                     of application constructs deleted, if E' is executed.
                If n1 >= n2 Then select E Else select E'.
           Else select E.
      Else If E' is directly LHS-associated with Origin-Set
           Then select E'.
    > If E is selected Then insert E into Origin-Set and mark E as 'processed'.
    > If E' is selected
      Then insert E' into Origin-Set, replace E by E', and mark E as 'processed'.
  Until (no new construct equivalence is inserted into Origin-Set within this
        for-loop) or (all intra-model construct equivalences in CEt have the
        status of 'processed').
  Remove all 'unprocessed' intra-model construct equivalences from CEt.
  Repeat
    For each intra-model construct equivalence E in CEt
    > If the construct-domain and the selection-condition of any construct-instance
      in the RHS construct-set of E are the same as those of any undesired model
      construct in A
      Then If E is not directly RHS-associated with the rest of the intra-model
           construct equivalences in CEt Then remove E from CEt.
  Until (no more construct equivalence is removed from CEt within this for-loop).
End.
Algorithm 6.3: Intra-model Construct Equivalence Transformation for Target Data
Model
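The final pruning loop of Algorithm 6.3 can be sketched as follows. It assumes the same simplified representation as the earlier sketch (each construct equivalence carries lhs and rhs sets of construct names), and it collapses the construct-domain and selection-condition comparison against the aversion set into simple name membership, so it is an illustration rather than the full test.

    def prune_terminal_undesired(ces, aversion_set):
        """Repeatedly remove construct equivalences that are terminal (not directly
        RHS-associated with the remaining ones) and whose RHS construct-set mentions
        an undesired model construct from the aversion set."""
        ces = list(ces)
        removed = True
        while removed:
            removed = False
            for e in list(ces):
                rest_lhs = set()
                for other in ces:
                    if other is not e:
                        rest_lhs |= other.lhs
                terminal = not (e.rhs <= rest_lhs)   # not directly RHS-associated
                if terminal and (e.rhs & set(aversion_set)):
                    ces.remove(e)
                    removed = True
        return ces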
The transformation involved in the construct-equivalence-based schema normalization is
the same as the restricted case of transforming the intra-model construct equivalences of
the target data model for the construct-equivalence-based schema translation as described
above, except that the former is not driven by the overlapped semantic space of the target
data model but by the aversion set (and also constrained by the aversion set). Thus, with
some minor modifications, the transformation algorithm for intra-model construct
equivalences of the target data model can be adopted as the algorithm for intra-model
construct equivalences for schema normalization.
Such modifications include 1)
dropping the set of inter-model construct equivalences (i.e., CEst in Algorithm 6.3) from
the input of the transformation algorithm and 2) changing Origin-Set = CEst into Origin-Set = A (where A is the aversion set). The resulting transformation algorithm required
by the construct-equivalence-based schema normalization is depicted in Algorithm 6.4.
NormalCE-Transform(CEt, A):
/* Input: CEt (intra-model construct equivalences of the data model used in the
          schema normalization process) and A (aversion set).
   Result: transformed CEt */
Begin
  Origin-Set = A.        /* different from Algorithm 6.3 */
  Mark every intra-model construct equivalence in CEt as 'unprocessed'.
  Repeat
    For each 'unprocessed' intra-model construct equivalence E in CEt
    > Transform E and assign the resulting construct equivalence to E'.
    > If E is directly LHS-associated with Origin-Set (i.e., the LHS construct-set
      of E is the same as or a subset of any combination of RHS construct-sets
      in Origin-Set)
      Then If E' is directly LHS-associated with Origin-Set
           Then n1 = the number of application constructs created - the number
                     of application constructs deleted, if E is executed.
                n2 = the number of application constructs created - the number
                     of application constructs deleted, if E' is executed.
                If n1 >= n2 Then select E Else select E'.
           Else select E.
      Else If E' is directly LHS-associated with Origin-Set
           Then select E'.
    > If E is selected Then insert E into Origin-Set and mark E as 'processed'.
    > If E' is selected
      Then insert E' into Origin-Set, replace E by E', and mark E as 'processed'.
  Until (no new construct equivalence is inserted into Origin-Set within this
        for-loop) or (all intra-model construct equivalences in CEt have the
        status of 'processed').
  Remove all 'unprocessed' intra-model construct equivalences from CEt.
  Repeat
    For each intra-model construct equivalence E in CEt
    > If the construct-domain and the selection-condition of any construct-instance
      in the RHS construct-set of E are the same as those of any undesired model
      construct in A
      Then If E is not directly RHS-associated with the rest of the intra-model
           construct equivalences in CEt Then remove E from CEt.
  Until (no more construct equivalence is removed from CEt within this for-loop).
End.
Algorithm 6.4: Intra-model Construct Equivalence Transformation for Schema
Normalization
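Since Algorithms 6.3 and 6.4 differ only in their inputs and in the initialisation of Origin-Set, one possible way to organise an implementation is a single shared routine parameterised by the driving set. The sketch below is structural only (the loop body is elided); ce_transform and the two wrappers are illustrative names, not routines defined in this dissertation.

    def ce_transform(ce_list, origin_set, aversion_set):
        """Shared body of Algorithms 6.3 and 6.4 (elided in this sketch; see the
        pseudocode above for the repeat/for/until structure)."""
        ...

    def target_ce_transform(ce_t, ce_st, aversion_set):
        # Algorithm 6.3: Origin-Set is initialised to the inter-model CEs (CEst).
        return ce_transform(ce_t, origin_set=list(ce_st), aversion_set=aversion_set)

    def normal_ce_transform(ce_t, aversion_set):
        # Algorithm 6.4: CEst is dropped and Origin-Set starts as the aversion set A.
        return ce_transform(ce_t, origin_set=list(aversion_set), aversion_set=aversion_set)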
6.2 Construct Equivalence Reasoning Method
The construct equivalence reasoning method deals with the reasoning of intra-model
construct equivalences and that of inter-model construct equivalences. In terms of its
application, the construct-equivalence-based schema translation requires both types of
reasoning in its three stages (i.e., source convergence, source-target projection, and target
enhancement), while the construct-equivalence-based schema normalization involves
only the reasoning of intra-model construct equivalences.
Algorithm for Intra-Model Construct Equivalence Reasoning Method:
The intra-model construct equivalence reasoning method is employed in the source
convergence and target enhancement stages of the schema translation process and by the
schema normalization process. The source convergence starts from an initial source
application schema and, through a series of intra-model construct equivalence executions,
works toward another application schema represented only by the model constructs in the
overlapped semantic space of the source data model. The target enhancement goes
through the same process; that is, transforming from an initial target application schema
represented only by the model constructs in the overlapped semantic space of the target
data model to another application schema expressed with additional model constructs in
the non-overlapped semantic space of the target data model. Moreover, the described
process is also applicable to the schema normalization process in which an initial
application schema is transformed through a series of intra-model construct equivalence
executions into another application schema containing no undesired model constructs.
Essentially, these three processes can be considered as a forward-chaining process [W92,
GD93] in a rule-based system, where the application constructs in the source (or target)
application schema are facts and the intra-model construct equivalences are knowledge
(or rules). Consistent with a forward-chaining process, the source convergence, target
enhancement, or schema normalization (after intra-model construct equivalences have
been transformed appropriately by the construct-equivalence transformation method)
involves the repetition of the following basic steps (a minimal Python sketch of this loop is given after the footnote below):
1. Matching: In this step, the intra-model construct equivalences are instantiated from
an application schema to decide which intra-model construct equivalences are
satisfied. A construct equivalence is satisfied if its instantiations from the application
schema (i.e., the binding-set^ of the construct equivalence) are not empty (i.e., the binding-set is not an empty set and no column in the binding-set is completely empty).
2. Conflict Resolution: It is possible that the matching step will find multiple intra-model construct equivalences that are satisfied. The ones that have potential to be
executed (i.e., their instantiations are not empty) constitute a conflict-set. Conflict
resolution involves determining the priority of each intra-model construct
equivalence in the conflict-set and then selecting the one with the highest priority to
be executed.
3. Execution: The last step is the execution of the selected intra-model construct
equivalence based on the execution semantics for intra-model construct equivalences
defined in Section 5.2.5.
^ Binding-set is an n-dimensional array, where n is the number of construct-instances and construct-instance-sets in the
LHS construct-set. Each column denotes a construct-instance or construct-instance-set in the
LHS construct-set. The order of columns conforms to that in the LHS construct-set. Each row denotes an
instantiation of the LHS construct-set from the application schema to be translated or normalized.
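A minimal Python sketch of this match / conflict-resolution / execution cycle; Algorithm 6.5 below is the authoritative version. The helpers match_bindings, has_empty_column, resolve_conflict and execute are assumed stand-ins for the binding-set construction, the satisfaction test, the conflict resolution scheme of Algorithm 6.6, and the execution semantics of Section 5.2.5; none of them is defined under these names in the dissertation.

    def intra_ce_reasoning(ces, schema, match_bindings, has_empty_column,
                           resolve_conflict, execute):
        """Forward chaining over intra-model construct equivalences: repeat the
        matching, conflict-resolution and execution steps until no construct
        equivalence is satisfied by the application schema."""
        while True:
            # Matching: collect the satisfied construct equivalences.
            conflict_set = []
            for e in ces:
                bindings = match_bindings(e, schema)        # rows = instantiations
                if bindings and not has_empty_column(bindings):
                    conflict_set.append((e, bindings))
            if not conflict_set:
                break                                       # nothing left to fire
            # Conflict resolution: choose one satisfied construct equivalence.
            chosen, bindings = resolve_conflict(conflict_set)
            # Execution: instantiate and assert the RHS of the chosen equivalence,
            # updating the application schema in place.
            execute(chosen, bindings, schema)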
Determination of the priority of each intra-model construct equivalence in the conflict-set during the conflict resolution step requires a conflict resolution scheme. Commonly
used conflict resolution schemes for forward-chaining production rule-based systems
include specificity, complexity, and recency of rules [W92, GD93].
Adopting the concepts behind these common conflict resolution schemes, while taking into
account that the representation of construct equivalences is distinct from that of rules,
the conflict resolution scheme for intra-model construct equivalence reasoning includes
the following principles.
1. LHS specificity principle:
When the LHS construct-set of an intra-model construct equivalence in the conflict-set is a superset of that of another intra-model construct equivalence in the conflict-set, select the superset intra-model construct equivalence on the grounds that it deals
with a more specific situation.
2. LHS complexity principle:
The intra-model construct equivalence in the conflict-set with the maximal number of
construct-instances and construct-instance-sets in its LHS construct-set will be
selected.
This principle always tries to attack the most refined portion of the
problem.
3. Extent of consequence principle:
In the conflict-set, the intra-model construct equivalence with the maximal number of
net application constructs being created will be selected. This principle always tries
to generate more application constructs of possibly more different types of model
constructs in an application schema.
4. Applicability principle:
The intra-model construct equivalence in the conflict-set with the maximal number of
rows in its binding-set (i.e., maximal number of instantiations from the application
schema) will be selected to be executed. This principle may improve the efficiency
of the schema translation or schema normalization because the maximum number of
application constructs in the application schema will be translated or normalized in
the execution of a single intra-model construct equivalence.
The detailed algorithm for the intra-model construct equivalence reasoning method is
shown in Algorithm 6.5, while the procedure for the conflict resolution is depicted in
Algorithm 6.6.
IntraCE-Reasoning(CE, AS):
/* Input: CE (intra-model construct equivalences of a data model) and AS
(application schema in the data model).
Result: updated AS */
Begin
Repeat
Initialize Conflict-Set as an empty set.
/* matching step */
For each intra-model construct equivalence E in CE
- Create a binding-set for all possible instantiations of the LHS construct-set of E from AS.
- If the binding-set is not empty and none of its columns is completely empty
Then add E into Conflict-Set.
If Conflict-Set is not empty, then
/* conflict resolution step */
- C = Conflict-Resolution(Conflict-Set, binding-sets).
/* execution step */
- For every row which has no empty cell in the binding-set of C
> Instantiate the RHS construct-set of C according to the execution
semantics for intra-model construct equivalences defined in
Section 5.2.5.
> Assert the instantiation of the RHS construct-set in AS.
> Apply all property and relationship correspondences on AS for
this instantiation.
Until Conflict-Set is empty.
End.
Algorithm 6.5: Intra-model Construct Equivalence Reasoning Method
Conflict-Resolution(CS, BS):
/* Input: CS (conflict-set which contains a set of construct equivalences) and BS
(binding-sets each of which is associated with a construct equivalence in
CS).
Result: a construct equivalence selected from CS */
Begin
- If CS has more than one construct equivalence
  Then Select the construct equivalence(s) from CS according to the LHS
       specificity principle and discard the unselected ones from CS.
- If CS still has more than one construct equivalence
  Then Select the construct equivalence(s) from CS according to the LHS
       complexity principle and discard the unselected ones from CS.
- If CS still has more than one construct equivalence
  Then Select the construct equivalence(s) from CS according to the extent of
       consequence principle and discard the unselected ones from CS.
- If CS still has more than one construct equivalence
  Then Select the construct equivalence(s) from CS according to the applicability
       principle and discard the unselected ones from CS.
- If CS still has more than one construct equivalence
  Then Randomly select a construct equivalence from CS and discard the rest of
       the construct equivalences from CS.
- Return the only construct equivalence in CS.
End.
Algorithm 6.6: Conflict Resolution
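One way to realise Algorithm 6.6 is to apply the four principles as successive filters over the conflict-set and break any remaining tie at random. The sketch assumes each candidate is summarised by its LHS construct-set (a set of names), the number of construct-instances/-sets on its LHS, its net application constructs created, and its binding-set row count; the Candidate class and its field names are illustrative only.

    import random
    from dataclasses import dataclass

    @dataclass
    class Candidate:
        ce: object           # the construct equivalence itself
        lhs: frozenset       # LHS construct-set, simplified to construct names
        lhs_size: int        # number of construct-instances/-sets on the LHS
        net_created: int     # application constructs created minus deleted
        binding_rows: int    # number of instantiations (rows in the binding-set)

    def conflict_resolution(candidates):
        cs = list(candidates)
        # 1. LHS specificity: discard candidates whose LHS construct-set is a
        #    proper subset of another candidate's LHS construct-set.
        cs = [c for c in cs if not any(c.lhs < d.lhs for d in cs)]
        # 2. LHS complexity, 3. extent of consequence, 4. applicability:
        #    keep only the candidates maximal under each criterion in turn.
        for key in (lambda c: c.lhs_size,
                    lambda c: c.net_created,
                    lambda c: c.binding_rows):
            if len(cs) > 1:
                best = max(key(c) for c in cs)
                cs = [c for c in cs if key(c) == best]
        # Random tie-break if several candidates remain.
        return random.choice(cs)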
Algorithm for Inter-Model Construct Equivalence Reasoning Method:
The inter-model construct equivalence reasoning method is required only by the source-target projection stage in the construct-equivalence-based schema translation. It receives
an application schema expressed in the source data model and produces, through a series
of inter-model construct equivalence executions, an application schema expressed in the
target data model. Unlike the intra-model construct equivalence reasoning method which
updates the initial application schema into the final application schema, the source-target
projection stage produces a new application schema expressed in the target data model
rather than updating the initial application schema expressed in the source data model.
Since the source application schema will not be updated at all during the source-target
projection stage, all 'satisfied' inter-model construct equivalences eventually will be
executed. To ensure that the order of the inter-model construct equivalences will not
affect the source-target projection result, an assumption of no conflicting inter-model
construct equivalences needs to be made.
That is, any two inter-model construct
equivalences with identical LHS construct-sets and overlapped or distinct RHS
construct-sets are not allowed. However, it is permissible for one inter-model construct
equivalence to have the same LHS construct-set as another inter-model construct
equivalence and for the RHS construct-set of the former to be a superset or subset of that
of the latter. Under this assumption, the conflict resolution step in the intra-model
construct equivalence reasoning method is not needed. Thus, the process for the inter-model construct equivalence reasoning method (i.e., the source-target projection stage)
involves the repetition of the matching and execution steps.
As described above, in
the matching step, each inter-model construct equivalence is instantiated from the source
application schema to decide which inter-model construct equivalences are satisfied. If
satisfied, this inter-model construct equivalence will be executed based on the execution
semantics for inter-model construct equivalences. The detailed algorithm for the inter-model construct equivalence reasoning method is shown in Algorithm 6.7.
InterCE-Reasoning(CEst, ASs, ASt):
/* Input: CEst (inter-model construct equivalences) and ASs (application schema
          in the source data model).
   Result: ASt (application schema in the target data model) */
Begin
  Initialize the status of each inter-model construct equivalence in CEst as 'valid'.
  Repeat
    For each inter-model construct equivalence E in CEst
    - Create a binding-set for all possible instantiations of the LHS construct-set
      of E from ASs.
    - If the binding-set is empty or any of its columns (for a construct-instance
      or construct-instance-set) is completely empty
      Then mark the status of E as 'invalid' and continue with the next construct
      equivalence in the for-loop.
    - If none of the RHS construct-instances of E is associated with a connection
      correspondence, or all instantiations (in the binding-set) of every LHS
      construct-instance involved in a connection correspondence have corresponding
      application constructs in ASt, then
      > Mark the status of E as 'fired'.
      > For every row which has no empty cell in the binding-set
        • Instantiate the RHS construct-set and all correspondences (property and
          relationship) of E according to the execution semantics for inter-model
          construct equivalences defined in Section 5.3 and store the resulting
          instantiation in T.
        • If T has not been asserted in ASt
          Then assert T in ASt
          Else If T has been partially asserted in ASt by another construct
               equivalence in CEst (see Discussion 2 below)
               Then assert the un-asserted part of T in ASt.
  Until none of the inter-model construct equivalences in CEst has the status of
  'valid'.
End.
Algorithm 6.7: Inter-model Construct Equivalence Reasoning Method
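A rough Python sketch of the source-target projection loop. It simplifies Algorithm 6.7 in two ways: the target application schema is modelled as a plain set of constructs, so the partial-assertion case of Discussion 2 below reduces to adding the missing elements, and the repeat-until handling of connection correspondences is omitted. match_bindings and instantiate_rhs are assumed helpers.

    def inter_ce_reasoning(ce_st, source_schema, match_bindings, instantiate_rhs):
        """Source-target projection: every satisfied inter-model construct
        equivalence fires against the (read-only) source schema; the target
        application schema is accumulated as a set of application constructs."""
        target_schema = set()
        for e in ce_st:                      # no conflict resolution is needed
            bindings = match_bindings(e, source_schema)
            for row in bindings:             # one instantiation per complete row
                for construct in instantiate_rhs(e, row):
                    # adding to a set asserts only the not-yet-asserted parts
                    target_schema.add(construct)
        return target_schema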
Discussion 2:
This case deals with the situation in which the RHS construct-set of an inter-model
construct equivalence is a subset of that of another inter-model construct equivalence
which has been instantiated and asserted in the target application schema.
6.3 Construct-Equivalence-Based Schema Translation
6.3.1 Algorithm: Construct-Equivalence-Based Schema Translation
With the construct equivalence transformation method and the construct equivalence
reasoning method, the development of the construct-equivalence-based schema
translation becomes straightforward.
Since schema translation is directional (from
source to target data model), all construct equivalences need to be transformed, by using
the construct equivalence transformation method described in Section 6.1, to establish
chains of reasoning from the non-overlapped semantic space of the source data model via
the overlapped semantic space, and finally to the non-overlapped semantic space of the
target data model.
Subsequently, according to the stage in the schema translation
process, either the intra-model or the inter-model construct equivalence reasoning
method will be adopted. As a result, the construct-equivalence-based schema translation
consists of six steps, as depicted in Algorithm 6.8. The first three steps, involving the
construct equivalence transformation method, are to transform the inter-model construct
equivalences between, and the intra-model construct equivalences of, the source and the
target data models.
The remaining three steps, adopting the construct equivalence
reasoning method, correspond to the three stages of the construct-equivalence-based
schema translation.
/* Input: CEs (intra-model construct equivalences of the source data model S),
          CEt (intra-model construct equivalences of the target data model T),
          CEst (inter-model construct equivalences between S and T),
          ASs (application schema in the source data model), and
          A (aversion set, that is, undesired model constructs of T in the final
          application schema)
   Result: ASt (application schema in the target data model T) */
Begin
  InterCE-Transform(CEst, 'S', 'T').     /* Transforming Inter-model CE */
  SourceCE-Transform(CEs, CEst).         /* Transforming Intra-model CE of S */
  TargetCE-Transform(CEt, CEst, A).      /* Transforming Intra-model CE of T */
  IntraCE-Reasoning(CEs, ASs).           /* Source Convergence Stage */
  InterCE-Reasoning(CEst, ASs, ASt).     /* Source-Target Projection Stage */
  IntraCE-Reasoning(CEt, ASt).           /* Target Enhancement Stage */
End.
Algorithm 6.8: Algorithm for Construct-Equivalence-Based Schema Translation
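Read as a pipeline, Algorithm 6.8 simply composes the transformation and reasoning methods. The sketch below only wires the six calls together; the methods object bundling the routines is an illustrative convenience, not something defined in the dissertation, and each routine is assumed to implement the corresponding algorithm above.

    def schema_translation(ce_s, ce_t, ce_st, as_s, aversion_set, methods):
        """Construct-equivalence-based schema translation, following Algorithm 6.8."""
        # Construct equivalence transformation (Section 6.1).
        ce_st = methods.inter_ce_transform(ce_st, 'S', 'T')
        ce_s = methods.source_ce_transform(ce_s, ce_st)
        ce_t = methods.target_ce_transform(ce_t, ce_st, aversion_set)
        # Construct equivalence reasoning (Section 6.2).
        as_s = methods.intra_ce_reasoning(ce_s, as_s)     # source convergence
        as_t = methods.inter_ce_reasoning(ce_st, as_s)    # source-target projection
        as_t = methods.intra_ce_reasoning(ce_t, as_t)     # target enhancement
        return as_t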
6.3.2 Advantages of Construct-Equivalence-Based Schema Translation
The intra-model and inter-model construct equivalences in the construct-equivalence-based
schema translation constitute the schema translation knowledge, which is represented in
CEAL based on the model schema derived from the inductive metamodeling process.
The transformation and reasoning required by the construct-equivalence-based schema
translation are separated from the declaration of translation knowledge. Several
advantages can be derived from the decomposition of translation knowledge into inter-model and intra-model construct equivalences and the separation of the translation
knowledge from its transformation and reasoning methods.
1. The separation allows model engineers to focus only on the specification of the
translation knowledge without worrying about how the knowledge will be processed.
Each knowledge element (e.g., an intra-model or inter-model construct equivalence)
can be incorporated or removed independently without affecting others.
Any
incorrect translation can easily be attributed to the knowledge element that caused it. Since
translation knowledge is represented in a declarative format and as independent
elements, the specification, debugging and modification of translation knowledge can
be performed more easily than for translation knowledge represented in a procedural
form.
2. The constituents of translation knowledge make the problem decomposition possible.
The task of specifying the translation knowledge for two data models is naturally
decomposed into five subtasks (i.e., specifying the model schema for each data
model, defining the intra-model construct equivalences for each data model, and
determining the inter-model construct equivalences between these two data models).
Furthermore, with the facilitation of the inductive metamodeling process described in
Chapter 4, two of the tasks (i.e., specifying the model schema for each data model)
essentially are automated without interaction with model designers. Thus, model
engineers need not face a very large complicated task at one time. Rather, they deal
with three smaller subtasks one at a time.
3. The characteristic of bi-directional construct equivalences supported in CEAL
eliminates the need to specify two different sets of translation knowledge between
two data models.
4. The model schema and the intra-model construct equivalences of a data model can be
reused in specifying translation knowledge between the data model and different data
models. This advantage is important in a large-scale MDBS environment since all
local data models need to be translated into a common data model and the common
data model needs to be translated into different external data models. Assume that
model engineers have defined intra-model construct equivalences for the data model
S based on the model schema derived by the inductive metamodeling process. When
there is a need to perform the model translation between the data model S and T,
what model engineers need to do is to specify intra-model construct equivalences for
the data model T based on its model schema and the inter-model construct
equivalences between S and T. The existing knowledge pertaining to the data model
S can be reused and need not be re-specified. Afterward, if a need arises for the
translation between S (or T) and another new data model N, the existing knowledge
pertaining to the data model S (or T) can again be reused. As shown in these two
examples, when defining the translation between a known data model and a new data
model, the reusability of existing knowledge reduces the number of subtasks to be
carried out from three to two. If the intra-model construct equivalences of both data
models already exist, the translation knowledge to be specified involves only the
inter-model construct equivalences (i.e., the reusability of existing knowledge
reduces the number of subtasks to be performed from three to one).
5. Automatic derivation of translation knowledge can be achieved in certain cases.
Assuming that translation knowledge between the data models S and T and between T
and N has been specified, if there is a need to perform translation between S and N,
one way is to perform a translation from S to T and then from T to N. Even though
this approach requires no effort from model engineers to define the inter-model
construct equivalences between S and N, its performance suffers due to the indirect
translation through an intermediate data model (T). Another approach is to deduce
the inter-model construct equivalences between S and N from
the intra-model
construct equivalences of S, T, and N and the inter-model construct equivalences
between S and T as well as T and N. Whether all of the inter-model construct
equivalences between S and N can be derived deserves further investigation. If not
all of them can be deduced, partial results can complement the first approach to
improve the performance of the indirect translation.
6. Finally, as explained in Section 2.3, the use of intra-model construct equivalences in
the non-overlapped semantic spaces of the source data model and the target data
model in the construct-equivalence-based schema translation approach provides a
systematic way of minimizing the semantic loss and maximizing the semantic
enhancement results.
6.4 Construct-Equivalence-Based Schema Normalization
The algorithm for the construct-equivalence-based schema normalization is much
simpler than the algorithm for the construct-equivalence-based schema translation since
the former is a special case of the latter.
The construct-equivalence-based schema
normalization only involves 1) the transformation of the intra-model construct
equivalence of the data model employed in this schema normalization process and 2) the
reasoning on the intra-model construct equivalences which is constrained by the
normalization criteria specified for this normalization process. Hence, the construct-equivalence-based schema normalization consists of two steps, as depicted in Algorithm
6.9.
/* Input: CE (intra-model construct equivalences of the data model employed in
          this normalization process), AS (application schema in the data model),
          and A (aversion set, that is, normalization criteria)
   Result: updated AS */
Begin
  NormalCE-Transform(CE, A).     /* Transforming Intra-CE */
  IntraCE-Reasoning(CE, AS).     /* Normalization Process */
End.
Algorithm 6.9: Algorithm for Construct-Equivalence-Based Schema Normalization
CHAPTER 7
Contributions and Future Research
Based on the implications of the semantic characteristics of data models for schema
translation and schema integration, a construct-equivalence-based methodology has been
developed for the purposes of 1) overcoming the methodological inadequacies of
existing schema translation approaches and the conventional schema integration process
for large-scale MDBSs, 2) providing an integrated methodology for schema translation
and schema normalization whose similarities of problem formulation have not been
previously recognized, and 3) inductively learning model schemas that provide a basis for
declaratively specifying construct equivalences for schema translation and schema
normalization.
In previous chapters, we have detailed the development and some
evaluation results of the construct-equivalence-based methodology. In this chapter, the
contributions of this dissertation research will be summarized. Possible future research
directions will also be discussed.
7.1 Contributions
The contributions of this dissertation research will be discussed from two perspectives:
those that are essential to the MDBS research and those to other research areas.
7.1.1 Contributions to MDBS Research
The overall contribution of this dissertation research is to provide the missing theoretical
foundation and effective techniques for an important problem, schema management, in
large information systems, which are proliferating rapidly with emerging IT such as the
Intranet/Internet. Specific contributions of this dissertation research are:
1. Formalization of the construct-equivalence-based approach in a metamodeling
paradigm for schema translation: The notion of inter-model and intra-model
construct equivalences establishes a sound foundation for formalizing the
representation and reasoning of translation knowledge for schema translation, as
shown in Figure 2.6. To support the specification of inter-model and intra-model
construct equivalences, the semantic spaces of data models are formalized as model
schemas using a metamodel. Supported by construct equivalence transformation and
reasoning methods, the construct-equivalence-based approach in a metamodeling
paradigm provides a formal, declarative, bi-directional, and extensible technique for
schema translation.
Furthermore, as depicted in Section 6.3.2, the construct-
equivalence-based schema translation technique facilitates the specification of
translation knowledge by problem decomposition and reusability of existing
translation knowledge. The use of intra-model construct equivalences can improve
schema translation quality because semantic loss resulting from the non-overlapped
semantic space of the source data model can be minimized and semantic
enhancement resulting from the non-overlapped semantic space of the target data
model can be maximized.
2. A pioneering attempt to facilitate schema integration using intra-model
construct equivalences in solving structural conflicts: The conventional schema
integration process treats structural conflicts and other types of conflicts (i.e.,
semantic, descriptive and extension conflicts) uniformly. However, as analyzed in
Section 2.2.3, knowledge for identifying and resolving structural conflicts differs
from that for identifying and resolving other types of conflicts. The former requires
formalizing the semantics of model constructs and equivalences among model
constructs of a data model, while the latter are based on understanding the data
semantics of LDBSs.
Schema normalization based on intra-model construct
equivalences (defined on the model schema of a data model) separates the processing
of structural conflicts from that of other types of conflicts and facilitates schema
integration by reducing, if not totally resolving, structural conflicts before dealing
with other types of conflicts.
3. An integrated methodology for schema translation and schema normalization in
a large-scale MDBS environment: The notion of construct equivalences reveals
similarities in problem formulation of schema translation and schema normalization.
The construct-equivalence-based methodology in a metamodeling
paradigm
developed in this dissertation research provides an integrated technique for schema
translation and schema normalization essential to schema management for large-scale
MDBSs. Intra-model construct equivalences defined for schema translation can be
utilized by schema normalization. Moreover, construct equivalence transformation
and reasoning methods are applicable to both the schema translation and schema
normalization processes.
4. Formalization of a metamodel for formalizing data models: As discussed in
Section 3.2, none of the prevalent alternatives (i.e., EER, OO and FOL) to the
metamodel can fully satisfy the requirements of a metamodel. The SOOER model
proposed in this dissertation integrates aspects of these alternatives in meeting the
requirements of a metamodel.
It consists of a set of semantics-rich structural,
behavioral, and declarative constraint constructs to represent three components of a
model: model constructs, model constraints and model semantics.
5. A technique for inductive metamodeling: The metamodeling paradigm for
construct-equivalence-based schema translation and schema normalization requires
metamodeling data models. The Abstraction Induction Technique for inductive
metamodeling aims at automating the metamodeling process to make the
construct-equivalence-based methodology feasible and promising.
Satisfactory evaluation
results (see Section 4.4) show the effectiveness of using the Abstraction Induction
Technique for inducing model schemas from application schemas without interaction
with model designers.
6. Formal language for expressing schema translation and normalization
knowledge: Past research failed to propose a formal and declarative language for
expressing schema translation and normalization knowledge. This deficiency results
in schema translation and schema normalization being procedural. With associated
transformation and reasoning methods, CEAL serves as a formal and declarative
language to represent schema translation and normalization knowledge in the form of
construct equivalences.
7.1.2 Contributions to Other Research Areas
1. From a global perspective, schema translation and schema normalization for large-scale MDBSs are within a broad research area of method/model integration and
interoperability.
The proposed research framework
and integrated methodology
based on the notion of construct equivalence in the metamodeling paradigm can be
adopted or provide insights for methodology development for managing or
integrating IT-related methods/models (e.g., system analysis and design methods,
transaction management models, query language translation, etc.).
2. An efficient metamodeling process is an essential component of a meta-CASE tool.
The Abstraction Induction Technique for inductive metamodeling can be adopted to
provide this missing component of existing meta-CASE tools and could also be
applied when integration of CASE tools or meta-CASE tools is inevitable.
7.2 Future Research Directions
This dissertation encompasses a broad research domain that includes a metamodeling
paradigm and metamodel development, inductive metamodeling technique, a formal and
declarative construct equivalence assertion language, as well as construct-equivalence-based schema translation and schema normalization.
Although initial efforts to formulate a theoretical framework
for developing and evaluating the construct-
equivalence-based methodology for schema translation and schema normalization have
been described in this dissertation, much additional work is needed.
1. Empirical evaluation of the construct-equivalence-based methodology: More
empirical evaluation of the Abstraction Induction Technique for inductive
metamodeling should be conducted to validate its applicability to inducing model
schemas of different data models.
The effectiveness of bi-directional schema
translation and schema normalization of the proposed construct-equivalence-based
methodology should also be evaluated using real-world applications. Moreover,
the expressiveness and user-friendliness of CEAL also need to be evaluated.
2. Development of a construct equivalence verification method: The quality of
schema translation or schema normalization depends heavily on the correctness of the
inter-model and intra-model construct equivalences provided. Thus, a method for
verifying the completeness and consistency of construct equivalences is needed and
essential to the construct-equivalence-based methodology.
3. Development of a construct equivalence induction/deduction method: Currently,
model engineers play an important role in specifying inter-model and intra-model
construct equivalences. Like the knowledge-engineer-driven knowledge engineering
process in developing knowledge-based systems, the model-engineer-driven
construct equivalence specification process is knowledge-intensive and usually error-prone and could become a bottleneck for schema translation and schema
normalization. If inter-model and intra-model construct equivalences of two data
models can be induced from translation or normalization examples from those two
data models and/or can be deduced from their model schemas or existing construct
equivalences related to intermediate data models, the problems pertaining to the
model-engineer-driven construct equivalence specification process can be overcome
and the correctness of construct equivalences will be better ensured.
4. System development of the construct-equivalence-based methodology: Partial
implementation (i.e., the Abstraction Induction Technique) of the construct-equivalence-based methodology has been prototyped in this dissertation research. More system
development efforts involving the implementation of construct equivalence
transformation and reasoning methods and linking of the current prototype of
Abstraction Induction Technique to existing DBMSs are required.
In addition to efforts for continuing the development and validation of the construct-equivalence-based methodology, the following future research for extending this
methodology to other research areas is proposed.
1. Extending the methodology to query language translation and integration of
transaction management models: Query language translation and integration of
transaction management models in an MDBS are important to its operations.
Investigation of applicability and extension of the research framework
and the
construct-equivalence-based methodology to these two research issues are suggested.
2. Development of a meta-CASE tool incorporating the inductive metamodeling
process: The Abstraction Induction Technique for inductive metamodeling can be
adopted for development of a meta-CASE tool or the functionality enhancement of
an existing meta-CASE tool. If the meta-CASE tool (existing or to be developed)
into which the inductive metamodeling process will be incorporated is based on the
SOOER metamodel, the Abstraction Induction Technique developed in Section 4.2
can be directly adopted without any modification. However, if it is not SOOER-based, some modification to the concept generalization and constraint generation
rules would be needed.
APPENDIX A
Relationships between Synthesized Taxonomy of Conflicts and Other
Taxonomies
Semantic
Conflict
Descriptive
Conflict
Structural
Conflict
Extension
Conflict
[RPR94]
- Identification
of related
concepts
Naming, key, behavioral,
attribute incompatibility,
abstraction level of
attributes, and scaling
conflicts
Type conflict
Level of
accuracy,
asynchronous
updates, and
lack of
security
[BLN86]
Naming conflict
- Structural
conflict
including
type conflict
[KS94]
- Structural
conflict
including
dependency,
key, and
behavioral
conflicts
- Abstraction
level
incompatibility
including
generalization
and
aggregation
conflicts
[SP91],
[SPD92]
Semantic
conflict
- Domain definition
incompatibility
including naming, data
representation, scaling,
data coding, default
value, and integrity
constraint conflicts
- Entity definition
Incompatibility
including naming,
identifier, schema
isomorphism, (i.e.,
abstraction level of
attributes), union
compatibility (i.e.,
attribute incompatibility)
conflicts
Descriptive conflict
including naming,
attribute domain, scale,
cardinalities, and
operations.
- Data value
incompatibility
including
known
inconsistency,
temporal
inconsistency
and
acceptable
inconsistency
Structural
conflict
Schematic
Conflict
Data value-attribute,
attribute-entity, and
data value-entity
conflicts
APPENDIX B
Model Schema of A Relational Data Model
MODEL: Relational
{
ENTITY-CLASS: Relation
{
ATTRIBUTE:
Name
(UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued)
IDENTIFIER: Name
CONSTRAINT:
/* explicit constraints specific to the Relational model */
RC1:
    ∀r: Relation, ∀p: r.Primary-key, A = p.compose-of.Attribute
    => A ⊆ r.Attribute
RC2:
    ∀r: Relation, ∀f: r.Foreign-key, A = f.participate.Attribute
    => A ⊆ r.Attribute
/* implicit constraints instantiated from the metamodel semantics */
RC3:
    ∀o1: Relation, ∀o2: Relation, o1 ≠ o2 => o1.Name ≠ o2.Name
RC4:
    ∀o1: Relation => count(o1.Name) = 1
RC5:
    ∀o1: Relation => count(o1.aggregate->Consist-of->Attribute) ≥ 1
RC6:
    ∀o1: Relation => count(o1.aggregate->Consist-of->Primary-Key) = 1
RC7:
    ∀o1: Relation => count(o1.aggregate->Consist-of->Foreign-Key) ≥ 0
}
ENTITY-CLASS: Attribute
{
ATTRIBUTE:
    Name         (UNIQUENESS: unique
                  NULL-SPEC: not-null
                  MULTIPLICITY: single-valued)
    DataType     (UNIQUENESS: not-unique
                  NULL-SPEC: not-null
                  MULTIPLICITY: single-valued)
    IsUnique     (UNIQUENESS: not-unique
                  NULL-SPEC: not-null
                  MULTIPLICITY: single-valued)
    IsNull       (UNIQUENESS: not-unique
                  NULL-SPEC: not-null
                  MULTIPLICITY: single-valued)
    DefaultValue (UNIQUENESS: not-unique
                  NULL-SPEC: null
                  MULTIPLICITY: single-valued)
IDENTIFIER: Name
CONSTRAINT:
/* explicit constraints specific to the Relational model */
AC1:
    ∀a: Attribute => a.IsUnique ∈ {unique, not-unique}
AC2:
    ∀a: Attribute => a.IsNull ∈ {null, not-null}
/* implicit constraints instantiated from the metamodel semantics */
AC3:
    ∀o1: Attribute, ∀o2: Attribute, o1 ≠ o2 => o1.Name ≠ o2.Name
AC4:
    ∀o1: Attribute => count(o1.Name) = 1
AC5:
    ∀o1: Attribute => count(o1.DataType) = 1
AC6:
    ∀o1: Attribute => count(o1.IsUnique) = 1
AC7:
    ∀o1: Attribute => count(o1.IsNull) = 1
AC8:
    ∀o1: Attribute => count(o1.DefaultValue) ≤ 1
AC9:
    ∀o1: Attribute => count(o1.Compose-of.Primary-Key) ≥ 0
AC10:
    ∀o1: Attribute => count(o1.Participate.Foreign-Key) ≥ 0
AC11:
    ∀o1: Attribute => count(o1.Compose-of.Primary-Key) ≤ 1
AC12:
    ∀o1: Attribute => count(o1.Participate.Foreign-Key) ≤ 1
}
ENTITY-CLASS: Primary-Key
{
CONSTRAINT:
/* the implicit constraints instantiated from the metamodel semantics */
PC1:
    ∀o1: Primary-Key => count(o1.Compose-of.Attribute) ≥ 1
PC2:
    ∀o1: Primary-Key => count(o1.Referenced.Foreign-Key) ≥ 0
}
ENTITY-CLASS: Foreign-Key
{
CONSTRAINT:
/* implicit constraints instantiated from the metamodel semantics */
FC1:
    ∀o1: Foreign-Key => count(o1.Participate.Attribute) ≥ 1
FC2:
    ∀o1: Foreign-Key => count(o1.Referenced.Primary-Key) = 1
}
RELATIONSHIP CLASS: Consist-of
{
TYPE: AGGREGATION
AGGREGATE-CLASS: Entity
COMPONENT-CLASS:
Attribute (MIN-CARDINALITY: 1, MAX-CARDINALITY: m)
Primary-Key (MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
Foreign-Key (MIN-CARDINALITY: 0, MAX-CARDINALITY: m)
}
RELATIONSHIP CLASS: Compose-of
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
Primary-Key (MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
Attribute (MIN-CARDINALITY: 1, MAX-CARDINALITY: m)
}
RELATIONSHIP CLASS: Participate
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
Attribute (MIN-CARDINALITY: 1, MAX-CARDINALITY: m)
Foreign-Key (MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
}
RELATIONSHIP CLASS: Referenced
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
Primary-Key (MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
Foreign-Key (MIN-CARDINALITY: 0, MAX-CARDINALITY: m)
}
APPENDIX C
Common Separators and Their Implications to Concept Hierarchy
Creation in Abstraction Induction Technique
The set of commonly used separators includes (not exhaustively): '.', ',', ';', ':', '(', ')', '{', '}', '->', '=>', '=', and 'of'.

Separator   Implications to Concept Hierarchy Creation

.           A dot at the end of a line indicates the end of an example. If a dot is
            prefixed and postfixed by terms without spaces in between, it implies
            that the prefixed term is a qualification term of the postfixed term or
            the postfixed term is a constituent concept of the prefixed term. In
            either case, a has-a link is implied from the prefixed term to the
            postfixed term.
,           A comma separates two terms or two sequences of terms. These two terms
            (or sequences of terms) usually are in the same level of the concept
            hierarchy.
;           Same as ','.
:           A colon usually is prefixed and postfixed by terms with or without
            spaces in between. It may indicate that the postfixed term is a
            constituent concept of the prefixed term; thus, a has-a link is implied
            from the prefixed term to the postfixed term.
( and )     Any sequence of terms enclosed within an opening bracket and its
            corresponding closing bracket usually is the constituent concept of the
            term before the opening bracket. Hence, a set of has-a links is implied
            from the term before the opening bracket to each of the terms enclosed
            within the brackets. If the character immediately before '(' is not a
            space or a separator, the pair '(' and ')' is treated as part of a term.
{ and }     Same as '(' and ')'.
->          If there is no space immediately before and after '->', it may indicate
            that the prefixed term is a qualification term of the postfixed term or
            that the postfixed term is a constituent concept of the prefixed term.
            Thus, a has-a link is implied from the prefixed term to the postfixed
            term.
=>          Same as '->'.
=           An equality sign indicates that the postfixed term or sequence of terms
            is the constituent concept of the prefixed term. A has-a link is implied
            from the prefixed term to the postfixed term(s).
of          The word 'of' indicates that the prefixed term is the constituent
            concept of the postfixed term, or that the postfixed term is the
            qualification term of the prefixed term. In either case, a has-a link is
            implied from the postfixed term to the prefixed one.
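A minimal sketch of how the binary-separator conventions above might be encoded when scanning a training example. Only the separators that relate two terms are covered (the end-of-example use of '.' and the bracket rules are omitted); the names HAS_A_DIRECTION and implied_has_a_link are illustrative and not part of the Abstraction Induction Technique prototype.

    # Direction of the implied has-a link for binary separators:
    #   'LR' = link from the prefixed (left) term to the postfixed (right) term
    #   'RL' = link from the postfixed (right) term to the prefixed (left) term
    #   None = no has-a link (the separator only indicates sibling terms)
    HAS_A_DIRECTION = {
        '.': 'LR', ':': 'LR', '->': 'LR', '=>': 'LR', '=': 'LR',
        'of': 'RL',
        ',': None, ';': None,
    }

    def implied_has_a_link(left_term, separator, right_term):
        """Return the has-a link implied by a binary separator as a (parent, child)
        pair, or None when the separator only separates sibling terms."""
        direction = HAS_A_DIRECTION.get(separator)
        if direction == 'LR':
            return (left_term, right_term)
        if direction == 'RL':
            return (right_term, left_term)
        return None

    # Illustrative use: 'Attribute of Relation' implies Relation has-a Attribute.
    assert implied_has_a_link('Attribute', 'of', 'Relation') == ('Relation', 'Attribute')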
APPENDIX D
Evaluation Study 1: Relational Model Schema Induced from A
University Health Center Database
Property Hierarchies:
Data-type = {int, integer, char(n), text(n), double, integer,
character(n), numeric}
char(n)
= {char(10), char(20)}
Null-spec = {null, not-null}
Stopword and Keywords:
## STOPWORD
CREATE
DEFINE
REFERENCE
## KEYWORD
RELATION
TABLE
ATTRIBUTE
Training Examples:
##
TABLE DISPENSE
(TREATMENT_NO DOUBLE NOT-NULL,
MED_CODE DOUBLE NOT-NULL,
MED_QTY INTEGER NOT-NULL,
PRIMARY-KEY TREATMENT_NO MED_CODE)
##
TABLE DOCTOR
(DOCTOR_ID DOUBLE NOT-NULL,
NAME TEXT(30) NOT-NULL,
PRIMARY-KEY DOCTOR_ID)
##
TABLE DRUGS
(CODE DOUBLE NOT-NULL,
MED_NAME TEXT(30) NOT-NULL,
MED_DESC MEMO NOT-NULL,
USE_METHOD MEMO NOT-NULL,
UNIT TEXT(5) NOT-NULL,
PRIMARY-KEY CODE)
##
TABLE STUDENT
(STUDENT_ID INTEGER NOT-NULL,
NAME TEXT(30) NOT-NULL,
ADDRESS TEXT(100) NULL,
DEPT TEXT(4) NULL,
AGE INTEGER NOT-NULL,
GENDER TEXT(1) NOT-NULL,
TEL TEXT(15) NULL,
STATUS TEXT(1) NOT-NULL,
CREDIT INTEGER NULL,
REMARKS MEMO NULL,
DATE_BIRTH DATE NOT-NULL,
DATE_REGISTER DATE NOT-NULL,
REGISTER_STATUS YES/NO NOT-NULL,
REGISTER_TIME TIME NULL,
TREATMENT_ROOM INTEGER NOT-NULL,
PRIMARY-KEY STUDENT_ID)
##
TABLE TREATMENT
(TREATMENT_NO COUNTER NOT-NULL,
D_ID DOUBLE NOT-NULL,
S_ID INTEGER NOT-NULL,
TREATMENT_DATE DATE NOT-NULL,
TREATMENT_TIME TIME NULL,
DIAGNOSIS MEMO NULL,
DISPENSE_STATUS YES/NO NOT-NULL,
PRIMARY-KEY TREATMENT_NO)
##
TABLE SPECIALTIES
(DOCTOR_ID DOUBLE NOT-NULL,
SPECIALTY TEXT(20) NOT-NULL,
PRIMARY-KEY DOCTOR_ID SPECIALTY)
##
FOREIGN-KEY (SPECIALTIES.DOCTOR_ID) REFERENCE (DOCTOR.DOCTOR_ID)
##
FOREIGN-KEY (DISPENSE.TREATMENT_NO) REFERENCE (TREATMENT.TREATMENT_NO)
##
FOREIGN-KEY (DISPENSE.MED_CODE) REFERENCE (DRUGS.CODE)
##
FOREIGN-KEY (TREATMENT.D_ID) REFERENCE (DOCTOR.DOCTOR_ID)
##
FOREIGN-KEY (TREATMENT.S_ID) REFERENCE (STUDENT.STUDENT_ID)
Induced Model Schema:
ENTITY-CLASS: PRIMARY-KEY
{
}
ENTITY-CLASS: FOREIGN-KEY
{
}
ENTITY-CLASS: CONSTRUCT_1
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
DATA-TYPE:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
NULL-SPEC:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_1: forall o1:CONSTRUCT_1, forall o2:CONSTRUCT_1,
    o1 != o2 => o1.name != o2.name
Constraint_2: forall o1:CONSTRUCT_1 => count(o1.name) = 1
Constraint_3: forall a:CONSTRUCT_1 => a.DATA-TYPE in {INT,
    CHAR(N), TEXT(N), DOUBLE, INTEGER, CHARACTER(N), NUMERIC,
    DATE, MEMO, MEMO, COUNTER, MEMO, YES/NO, TIME, COUNTER,
    TIME, MEMO, YES/NO}
Constraint_4: forall o1:CONSTRUCT_1 => count(o1.DATA-TYPE) = 1
Constraint_5: forall a:CONSTRUCT_1 =>
    a.NULL-SPEC in {NULL, NOT-NULL}
Constraint_6: forall o1:CONSTRUCT_1 => count(o1.NULL-SPEC) = 1
}
ENTITY-CLASS: TABLE
{
ATTRIBUTES:
name:
TYPE: Char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_7: forall o1:TABLE, forall o2:TABLE,
    o1 != o2 => o1.name != o2.name
Constraint_8: forall o1:TABLE => count(o1.name) = 1
Constraint_14: forall c:TABLE, forall e:c.PRIMARY-KEY,
    A=e.RELATE_1.CONSTRUCT_1 => A included-in c.CONSTRUCT_1
Constraint_18: forall c:TABLE, forall e:c.FOREIGN-KEY,
    A=e.RELATE_2.CONSTRUCT_1 => A included-in c.CONSTRUCT_1
}
RELATIONSHIP-CLASS: CONSIST_OF_1
{
    TYPE: AGGREGATION
    AGGREGATE: TABLE
    COMPONENT:
        CONSTRUCT_1 (MIN-CARDINALITY: 2, MAX-CARDINALITY: M)
        PRIMARY-KEY (MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
        FOREIGN-KEY (MIN-CARDINALITY: 0, MAX-CARDINALITY: M)
    CONSTRAINT:
        Constraint_9: forall o1:TABLE =>
            count(o1.aggregate->CONSIST_OF_1->component.CONSTRUCT_1) >= 2
        Constraint_10: forall o1:TABLE =>
            count(o1.aggregate->CONSIST_OF_1->component.PRIMARY-KEY) = 1
        Constraint_11: forall o1:TABLE =>
            count(o1.aggregate->CONSIST_OF_1->component.FOREIGN-KEY) >= 0
}
RELATIONSHIP-CLASS: RELATE_1
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
PRIMARY-KEY (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRUCT_1 (ROLE: REFERED_BY,
    MIN-CARDINALITY: 1, MAX-CARDINALITY: M)
CONSTRAINT:
Constraint_12: forall o1:PRIMARY-KEY =>
    count(o1.REFER_TO->RELATE_1->REFERED_BY.CONSTRUCT_1) >= 1
Constraint_13: forall o1:CONSTRUCT_1 =>
    count(o1.REFERED_BY->RELATE_1->REFER_TO.PRIMARY-KEY) >= 0
}
RELATIONSHIP-CLASS: RELATE_2
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
FOREIGN-KEY (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRUCT_1 (ROLE: REFERED_BY,
    MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_15: forall o1:FOREIGN-KEY =>
    count(o1.REFER_TO->RELATE_2->REFERED_BY.CONSTRUCT_1) = 1
Constraint_16: forall o1:CONSTRUCT_1 =>
    count(o1.REFERED_BY->RELATE_2->REFER_TO.FOREIGN-KEY) >= 0
Constraint_17: forall o1:CONSTRUCT_1 =>
    count(o1.REFERED_BY->RELATE_2->REFER_TO.FOREIGN-KEY) <= 1
}
RELATIONSHIP-CLASS: RELATE_3
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
FOREIGN-KEY (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: M)
PRIMARY-KEY (ROLE: REFERED_BY,
MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_19: forall o1:FOREIGN-KEY =>
    count(o1.REFER_TO->RELATE_3->REFERED_BY.PRIMARY-KEY) = 1
Constraint_20: forall o1:PRIMARY-KEY =>
    count(o1.REFERED_BY->RELATE_3->REFER_TO.FOREIGN-KEY) >= 0
Constraint_21: forall o1:PRIMARY-KEY =>
    count(o1.REFERED_BY->RELATE_3->REFER_TO.FOREIGN-KEY) <= M
}
APPENDIX E
Evaluation Study 2: Network Model Schema Induced from A
Hypothetical Company Database
Property Hierarchies:
Data-type = {int, integer, char(n), text(n), double, integer,
character(n), numeric}
char(n)
= {char(10), char(20)}
Null-spec = {null, not-null}
Stopword and Keywords:
## STOPWORD
TYPE
IS
ASCENDING
DECENDING
## KEYWORD
RECORD
SET
INSERTION
RETENTION
OWNER
MEMBER
Training Examples:
##
RECORD IS EMPLOYEE
{NOT-DUPLICATE SSN,
NOT-DUPLICATE FNAME MINIT LNAME,
FNAME TYPE IS CHARACTER(15),
MINIT TYPE IS CHARACTER(1) ,
LNAME TYPE IS CHARACTER(15) ,
SSN TYPE IS CHARACTER(9),
DEPTNAME TYPE IS CHARACTER(IS)}
##
RECORD IS DEPARTMENT
{NOT-DUPLICATE NAME,
NAME TYPE IS CHARACTER(15),
NUMBER TYPE IS INTEGER,
LOCATION TYPE IS CHARACTER(15),
MGRSTART TYPE IS CHARACTER(15)}
##
RECORD IS PROJECT
{NOT-DUPLICATE NAME,
NOT-DUPLICATE NUMBER,
NAME TYPE IS CHARACTER(15),
NUMBER TYPE IS INTEGER,
LOCATION TYPE IS CHARACTER(15)}
##
RECORD IS WORKS_ON
{NOT-DUPLICATE ESSN PNUMBER,
ESSN TYPE IS CHARACTER(9),
PNUMBER TYPE IS INTEGER,
HOURS TYPE IS NUMERIC}
##
SET IS WORKS_FOR
{OWNER IS DEPARTMENT,
MEMBER IS EMPLOYEE,
INSERTION IS MANUAL,
RETENTION IS OPTIONAL}
##
SET IS MANAGES
{OWNER IS EMPLOYEE,
MEMBER IS DEPARTMENT,
INSERTION IS AUTOMATIC,
RETENTION IS MANDATORY}
##
SET IS CONTROLS
{OWNER IS DEPARTMENT,
MEMBER IS PROJECT,
INSERTION IS AUTOMATIC,
RETENTION IS MANDATORY}
##
SET IS P_WORKSON
{OWNER IS PROJECT,
MEMBER IS WORKS_ON,
INSERTION IS MANUAL,
RETENTION IS FIXED}
Induced Model Schema:
ENTITY-CLASS: NOT-DUPLICATE
{
}
ENTITY-CLASS: CONSTRUCT_1
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
DATA-TYPE:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_1: forall o1:CONSTRUCT_1, forall o2:CONSTRUCT_1,
    o1 != o2 => o1.name != o2.name
Constraint_2: forall o1:CONSTRUCT_1 => count(o1.name) = 1
Constraint_3: forall a:CONSTRUCT_1 =>
    a.DATA-TYPE in {INT, CHAR(N), INTEGER, CHARACTER(N),
    NUMERIC, DATE}
Constraint_4: forall o1:CONSTRUCT_1 => count(o1.DATA-TYPE) = 1
}
ENTITY-CLASS: RECORD
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_5: forall o1:RECORD, forall o2:RECORD,
    o1 != o2 => o1.name != o2.name
Constraint_6: forall o1:RECORD => count(o1.name) = 1
Constraint_17: forall c:RECORD, forall e:c.NOT-DUPLICATE,
    A=e.RELATE_1.CONSTRUCT_1 => A included-in c.CONSTRUCT_1
}
RELATIONSHIP-CLASS: CONSIST_OF_1
{
TYPE: AGGREGATION
AGGREGATE: RECORD
COMPONENT:
NOT-DUPLICATE (MIN-CARDINALITY: 1, MAX-CARDINALITY: M)
CONSTRUCT_1 (MIN-CARDINALITY: 3, MAX-CARDINALITY: M)
CONSTRAINT:
Constraint_7: forall o1:RECORD =>
    count(o1.aggregate->CONSIST_OF_1->component.NOT-DUPLICATE) >= 1
Constraint_8: forall o1:RECORD =>
    count(o1.aggregate->CONSIST_OF_1->component.CONSTRUCT_1) >= 3
}
ENTITY-CLASS: SET
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
INSERTION:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
RETENTION:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_9: forall o1:SET, forall o2:SET, o1 != o2 =>
o1.name != o2.name
Constraint_10: forall o1:SET => count(o1.name) = 1
Constraint_11: forall a:SET => a.INSERTION in {MANUAL, AUTOMATIC}
Constraint_12: forall o1:SET => count(o1.INSERTION) = 1
Constraint_13: forall a:SET =>
a.RETENTION in {OPTIONAL, MANDATORY, FIXED}
Constraint_14: forall o1:SET => count(o1.RETENTION) = 1
}
RELATIONSHIP-CLASS: RELATE_1
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
NOT-DUPLICATE (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRUCT_1 (ROLE: REFERED_BY,
MIN-CARDINALITY: 1, MAX-CARDINALITY: M)
CONSTRAINT:
Constraint_15: forall o1:NOT-DUPLICATE =>
count(o1.REFER_TO->RELATE_1->REFERED_BY.CONSTRUCT_1) >= 1
Constraint_16: forall o1:CONSTRUCT_1 =>
count(o1.REFERED_BY->RELATE_1->REFER_TO.NOT-DUPLICATE) >= 0
}
RELATIONSHIP-CLASS: RELATE_2
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
SET (ROLE: REFER_TO,
MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
RECORD (ROLE: REFERED_BY, MIN-CARDINALITY: 2, MAX-CARDINALITY: 2)
CONSTRAINT:
Constraint_18: forall o1:SET =>
count(o1.REFER_TO->RELATE_2->REFERED_BY.RECORD) = 2
Constraint_19: forall o1:RECORD =>
count(o1.REFERED_BY->RELATE_2->REFER_TO.SET) >= 1
}
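To make the constraint notation concrete, the following is a minimal sketch, under assumed in-memory structures, of how a count-based cardinality constraint such as Constraint_18 above (each SET instance is related to exactly two RECORD instances through RELATE_2) could be checked. The dictionary is illustrative only, not the dissertation's representation.

# Hypothetical instance data: each SET instance maps to the RECORD instances
# it refers to through RELATE_2.
relate_2 = {
    "WORKS_FOR": ["DEPARTMENT", "EMPLOYEE"],
    "MANAGES": ["EMPLOYEE", "DEPARTMENT"],
    "CONTROLS": ["DEPARTMENT", "PROJECT"],
    "P_WORKSON": ["PROJECT", "WORKS_ON"],
}

def violations_of_constraint_18(relate):
    """Return the SET instances whose related RECORD count is not exactly 2."""
    return [s for s, records in relate.items() if len(records) != 2]

print(violations_of_constraint_18(relate_2))  # an empty list means the constraint holds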
APPENDIX F
Evaluation Study 3: Hierarchical Model Schema Induced from A
Hypothetical Company Database
Property Hierarchies:
Data-type = {int, integer, char(n), text(n), double, integer,
character(n), numeric}
char(n) = {char(10), char(20)}
Null-spec = {null, not-null}
Stopword and Keywords:
## STOPWORD
IS
OF
ASCENDING
DESCENDING
ASC
DESC
## KEYWORD
RECORD
ROOT
CHILD-NUMBER
POINTER
Training Examples:
##
RECORD EMPLOYEE
{ROOT OF HIERARCHIES.HIERARCHY2,
FNAME CHARACTER(15),
MINIT CHARACTER(1),
LNAME CHARACTER(15),
SSN CHARACTER(9),
BDATE CHARACTER(9),
KEY SSN}
##
RECORD DEPARTMENT
{ROOT OF HIERARCHIES.HIERARCHY1,
DNAME CHARACTER(15),
DNUMBER INTEGER,
KEY DNAME,
KEY DNUMBER}
##
RECORD DMANAGER
{PARENT DEPARTMENT,
CHILD-NUMBER 1,
MGRSTARTDATE CHARACTER(9),
POINTER MPTR EMPLOYEE}
##
RECORD PROJECT
{PARENT DEPARTMENT,
CHILD-NUMBER 4,
PNAME CHARACTER(15),
PNUMBER INTEGER,
PLOCATION CHARACTER(15),
KEY PNAME,
KEY PNUMBER}
##
RECORD PWORKER
{PARENT PROJECT,
CHILD-NUMBER 1,
HOURS CHARACTER(4),
POINTER WPTR EMPLOYEE}
##
RECORD DEMPLOYEES
{PARENT DEPARTMENT,
CHILD-NUMBER 2,
POINTER EPTR EMPLOYEE}
##
RECORD ESUPERVISEES
{PARENT EMPLOYEE,
CHILD-NUMBER 2,
POINTER SPTR EMPLOYEE}
##
RECORD DEPENDENT
{PARENT EMPLOYEE,
CHILD-NUMBER 1,
DEPNAME CHARACTER(15),
SEX CHARACTER(1),
BIRTHDATE CHARACTER(9),
RELATIONSHIP CHARACTER(10)}
##
HIERARCHIES {HIERARCHY1, HIERARCHY2}
Induced Model Schema:
ENTITY-CLASS: KEY
ENTITY-CLASS: PARENT
ENTITY-CLASS: POINTER
ENTITY-CLASS: CONSTRUCT_1
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
DATA-TYPE:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_1: forall o1:CONSTRUCT_1, forall o2:CONSTRUCT_1,
o1 != o2 => o1.name != o2.name
Constraint_2: forall o1:CONSTRUCT_1 => count(o1.name) = 1
Constraint_3: forall a:CONSTRUCT_1 =>
a.DATA-TYPE in {INT, CHAR(N), INTEGER, CHARACTER(N),
NUMERIC, DATE}
Constraint_4: forall o1:CONSTRUCT_1 => count(o1.DATA-TYPE) = 1
}
ENTITY-CLASS: RECORD
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CHILD-NUMBER:
TYPE: char(15)
UNIQUENESS: not-unique
NULL-SPEC: null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_5: forall o1:RECORD, forall o2:RECORD,
o1 != o2 => o1.name != o2.name
Constraint_6: forall o1:RECORD => count(o1.name) = 1
Constraint_7: forall a:RECORD => a.CHILD-NUMBER in {1, 4, 2}
Constraint_8: forall o1:RECORD => count(o1.CHILD-NUMBER) <= 1
Constraint_24: forall c:RECORD, forall e:c.KEY,
A=e.RELATE_2.CONSTRUCT_1 => A included-in c.CONSTRUCT_1
}
RELATIONSHIP-CLASS: CONSIST_OF_1
{
TYPE: AGGREGATION
AGGREGATE: RECORD
COMPONENT:
CONSTRUCT_1 (MIN-CARDINALITY: 0, MAX-CARDINALITY: M)
KEY (MIN-CARDINALITY: 0, MAX-CARDINALITY: M)
PARENT (MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
POINTER (MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_9: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.CONSTRUCT_1) >= 0
Constraint_10: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.KEY) >= 0
Constraint_11: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.PARENT) >= 0
Constraint_12: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.PARENT) <= 1
Constraint_13: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.POINTER) >= 0
Constraint_14: forall o1:RECORD =>
count(o1.aggregate->CONSIST_OF_1->component.POINTER) <= 1
}
ENTITY-CLASS: HIERARCHIES
{
ATTRIBUTES:
name:
TYPE: char(15)
UNIQUENESS: unique
NULL-SPEC: not-null
MULTIPLICITY: single-valued
CONSTRAINT:
Constraint_15: forall o1:HIERARCHIES, forall o2:HIERARCHIES,
o1 != o2 => o1.name != o2.name
Constraint_16: forall o1:HIERARCHIES => count(o1.name) = 1
}
RELATIONSHIP-CLASS: RELATE_1
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
RECORD (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
HIERARCHIES (ROLE: REFERED_BY,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_17: forall o1:RECORD =>
count(o1.REFER_TO->RELATE_1->REFERED_BY.HIERARCHIES) >= 0
Constraint_18: forall o1:RECORD =>
count(o1.REFER_TO->RELATE_1->REFERED_BY.HIERARCHIES) <= 1
Constraint_19: forall o1:HIERARCHIES =>
count(o1.REFERED_BY->RELATE_1->REFER_TO.RECORD) >= 0
Constraint_20: forall o1:HIERARCHIES =>
count(o1.REFERED_BY->RELATE_1->REFER_TO.RECORD) <= 1
}
RELATIONSHIP-CLASS: RELATE_2
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
KEY (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
CONSTRUCT_1 (ROLE: REFERED_BY,
MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_21: forall o1:KEY =>
count(o1.REFER_TO->RELATE_2->REFERED_BY.CONSTRUCT_1) = 1
Constraint_22: forall o1:CONSTRUCT_1 =>
count(o1.REFERED_BY->RELATE_2->REFER_TO.KEY) >= 0
Constraint_23: forall o1:CONSTRUCT_1 =>
count(o1.REFERED_BY->RELATE_2->REFER_TO.KEY) <= 1
}
RELATIONSHIP-CLASS: RELATE_3
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
PARENT (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
RECORD (ROLE: REFERED_BY,
MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_25: forall o1:PARENT =>
count(o1.REFER_TO->RELATE_3->REFERED_BY.RECORD) = 1
Constraint_26: forall o1:RECORD =>
count(o1.REFERED_BY->RELATE_3->REFER_TO.PARENT) >= 0
Constraint_27: forall o1:RECORD =>
count(o1.REFERED_BY->RELATE_3->REFER_TO.PARENT) <= 1
}
RELATIONSHIP-CLASS: RELATE_4
{
TYPE: ASSOCIATION
ASSOCIATED-ENTITY-CLASS:
POINTER (ROLE: REFER_TO,
MIN-CARDINALITY: 0, MAX-CARDINALITY: 1)
RECORD (ROLE: REFERED_BY,
MIN-CARDINALITY: 1, MAX-CARDINALITY: 1)
CONSTRAINT:
Constraint_28: forall o1:POINTER =>
count(o1.REFER_TO->RELATE_4->REFERED_BY.RECORD) = 1
Constraint_29: forall o1:RECORD =>
count(o1.REFERED_BY->RELATE_4->REFER_TO.POINTER) >= 0
Constraint_30: forall o1:RECORD =>
count(o1.REFERED_BY->RELATE_4->REFER_TO.POINTER) <= 1
}
APPENDIX G
Inter-Model Construct Equivalences between the Relational and SOOER
Models
1. Each relation whose primary key attributes do not contain any of its foreign key
attributes in the relational model is equivalent to an entity-class in the SOOER
model. The name of the entity-class corresponds to that of the relation.
R: Relational.Relation (WHERE R.Primary-Key.Attribute.Foreign-Key = ∅)
E: SOOER.Entity-Class
WITH E.Name = R.Name
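To make the pattern behind rule 1 concrete, the following is a minimal Python sketch under assumed data structures (the Relation class and attribute lists are illustrative, not the dissertation's notation): relations whose primary key shares no attribute with any foreign key become entity-classes with the same name.

from dataclasses import dataclass, field

@dataclass
class Relation:
    name: str
    attributes: list
    primary_key: list
    foreign_keys: list = field(default_factory=list)  # each FK is a list of attribute names

def rule_1(relations):
    """Return the names of the entity-classes produced by rule 1."""
    entity_classes = []
    for r in relations:
        fk_attrs = {a for fk in r.foreign_keys for a in fk}
        if not (set(r.primary_key) & fk_attrs):       # PK contains no FK attribute
            entity_classes.append(r.name)             # E.Name = R.Name
    return entity_classes

emp = Relation("EMPLOYEE", ["SSN", "FNAME", "LNAME", "DNO"], ["SSN"], [["DNO"]])
works_on = Relation("WORKS_ON", ["ESSN", "PNO", "HOURS"], ["ESSN", "PNO"], [["ESSN"], ["PNO"]])
print(rule_1([emp, works_on]))   # ['EMPLOYEE'] -- WORKS_ON's key is made of FK attributes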
2. In the relational model, each relation 1) whose primary key contains a foreign key to
a single relation and non-foreign key attributes, and 2) whose attributes are its
primary key is equivalent to a new entity-class and a new association relationship
which relates the new entity-class and the entity-class for the referenced relation in
the SOOER model. The name of the entity-class corresponds to that of the relation.
The attribute(s) of the new entity-class is the non-foreign key attribute(s) of the
relation and becomes the primary key attribute(s) of the new entity-class.
R1: Relational.Relation
(WHERE R1.Primary-Key.Attribute - R1.Foreign-Key.Attribute ≠ ∅ AND
R1.Attribute = R1.Primary-Key.Attribute) AND
R2: Relational.Relation
(WHERE R2.Primary-Key.Attribute =
R1.Primary-Key.Attribute.Foreign-Key.Attribute)
E1: SOOER.Entity-Class AND
E2: SOOER.Entity-Class AND
A: SOOER.Association
WITH E2 = R2 AND
A.relate.Entity-Class = {E1, E2} AND
E1.Name = R1.Name AND
E1.Attribute = R1.Attribute - R2.Primary-Key.Attribute AND
E1.Primary-Key.Attribute = E1.Attribute AND
A.relate.E1(Max-Card) = m AND
A.relate.E1(Min-Card) = 1
IF ext(R1.Primary-Key.Attribute ∩ R1.Foreign-Key.Attribute) =
ext(R2.Primary-Key.Attribute) AND
A.relate.E1(Min-Card) = 0
IF ext(R1.Primary-Key.Attribute ∩ R1.Foreign-Key.Attribute) ≠
ext(R2.Primary-Key.Attribute) AND
A.relate.E2(Min-Card) = 1 AND
A.relate.E2(Max-Card) = m
IF has_duplicate(ext(R1.Primary-Key.Attribute - R1.Foreign-Key.Attribute)) = true AND
A.relate.E2(Max-Card) = 1
IF has_duplicate(ext(R1.Primary-Key.Attribute - R1.Foreign-Key.Attribute)) = false
3. Each relation R whose primary key is a foreign key to a single relation in the
relational model is equivalent to a new entity-class and a new specialization
relationship in the SOOER model. The new entity-class is the subclass and the
entity-class for the relation referenced by the foreign key is the superclass of the
specialization relationship. The name of the specialization relationship corresponds
to that of R.
R1: Relational.Relation AND
R2: Relational.Relation
(WHERE
R2.Primary-Key = R1.Foreign-Key.Primary-Key AND
R2.Primary-Key.Attribute = R1.Primary-Key.Attribute)
E1: SOOER.Entity-Class AND
E2: SOOER.Entity-Class AND
S: SOOER.Specialization
WITH E2 = R2 AND
S.superclass.Entity-Class = {E2} AND
S.subclass.Entity-Class = {E1} AND
E1.Name = R1.Name AND
R1.Foreign-Key.Attribute = R1.Foreign-Key.Attribute - R1.Primary-Key.Attribute AND
R1.Attribute = R1.Attribute - R2.Attribute
4. In the relational model, each relation R whose primary key attributes are foreign key
attributes referencing more than one relation is equivalent to a new association
relationship in the SOOER model which connects the entity-classes for the relations
referenced by the foreign keys. The name of the association relationship is the
same as that of R.
R: Relational.Relation AND
S: {T: R.Primary-Key.Attribute.Foreign-Key.Primary-Key.Relation}
(WHERE count(S) > 1)
D: {E: SOOER.Entity-Class} AND
A: SOOER.Association
WITH E = T AND
A.relate.Entity-Class = D AND
A.Name = R.Name AND
R.Attribute = R.Attribute - union(E.Identifier.Attribute) AND
R.Primary-Key.Attribute = R.Primary-Key.Attribute - union(E.Identifier.Attribute) AND
R.Foreign-Key.Attribute = R.Foreign-Key.Attribute - union(E.Identifier.Attribute) AND
A.relate.E(Max-Card) = m IF
has_duplicate(ext(R.Primary-Key.Attribute - T.Primary-Key.Attribute))
= true AND
A.relate.E(Max-Card) = 1 IF
has_duplicate(ext(R.Primary-Key.Attribute - T.Primary-Key.Attribute))
= false
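The structural part of rule 4 can be illustrated with a minimal Python sketch under assumed data structures; only the construction of the n-ary association is reproduced here (the cardinality decisions are omitted), and the mapping representation is hypothetical.

def rule_4(relation_name, fk_targets):
    """fk_targets: mapping from a foreign-key attribute to the name of the
    relation it references (illustrative representation only)."""
    referenced = set(fk_targets.values())
    if len(referenced) <= 1:
        return None                              # rule 4 applies only when several relations are referenced
    return {"association": relation_name,        # A.Name = R.Name
            "relates": sorted(referenced)}       # entity-classes for the referenced relations

# WORKS_ON(ESSN, PNO, HOURS): ESSN references EMPLOYEE, PNO references PROJECT
print(rule_4("WORKS_ON", {"ESSN": "EMPLOYEE", "PNO": "PROJECT"}))
# {'association': 'WORKS_ON', 'relates': ['EMPLOYEE', 'PROJECT']}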
5. Each non-foreign key attribute of a relation in the relational model is equivalent to a
single-valued attribute of the entity-class for the relation in the SOOER model. The
name (or data-type) of the single-valued attribute is the same as that of the non-foreign
key attribute. The null-specification of the single-valued attribute is
determined by whether the non-foreign key attribute allows null values or not. The
uniqueness property of the single-valued attribute corresponds to the IsUnique
property of the non-foreign key attribute.
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute
E: SOOER.Entity-Class AND
A: E.Attribute
WITH E = R AND
A.multiplicity = 'single-valued' AND
A.Name = N.Name AND
A.Null-Spec = N.IsNull AND
A.Uniqueness = N.IsUnique AND
A.DataType = N.DataType
6. Each non-foreign key attribute of a relation in the relational model corresponds to a
single-valued attribute of the association relationship for the relation in the SOOER
model. The name (or data-type) of the single-valued attribute is the same as that of
the non-foreign key attribute. The null-specification of the single-valued attribute is
determined by whether the non-foreign key attribute allows null values or not. The
uniqueness property of the single-valued attribute corresponds to the IsUnique
property of the non-foreign key attribute.
R: Relational.Relation AND
N: R.Attribute - R.Foreign-Key.Attribute
E: SOOER.Association (WHERE E = R) AND
A: E.Attribute
WITH A.Name = N.Name AND
A.Multiplicity = 'single-valued' AND
A.Null-Spec = N.IsNull AND
A.Uniqueness = N.IsUnique AND
A.DataType = N.DataType
7. Each non-foreign key primary key attribute of a relation in the relational model is
equivalent to an identifier attribute of the entity-class for the relation in the SOOER
model.
R: Relational.Relation AND
K: R.Primary-Key.Attribute - R.Foreign-Key.Attribute
E: SOOER.Entity-Class
WITH E = R AND
E.Identifier.Attribute = E.Identifier.Attribute ∪ {K}
8. Each foreign key which is not part of the primary key in the relational model is
equivalent to an association relationship in the SOOER model which connects to the
entity-classes for the relation where the foreign key is defined and the relation
referenced by the foreign key. Furthermore, if the extension of the foreign key
contains duplicate tuples, the maximum cardinality on the entity-class for the relation
where the foreign key is defined is m; otherwise it is 1. If the extension of the
foreign key is the same as that of the primary key to which the foreign key refers, the
minimum cardinality on the entity-class for the relation where the foreign key is
defined is 1; else, it is 0. The maximum cardinality on the entity-class for the
relation referenced by the foreign key is 1. If the extension of the foreign key
contains null values, the minimum cardinality on the entity-class for the relation
referenced by the foreign key is 0; otherwise, it is 1.
R: Relational.Relation AND
F: R.Foreign-Key (WHERE F.Attribute ∩ R.Primary-Key.Attribute = ∅) AND
R2: F.Primary-Key.Relation
E1: SOOER.Entity-Class AND
E2: SOOER.Entity-Class AND
A: SOOER.Association
WITH E1 = R AND
E2 = R2 AND
A.relate.Entity-Class = {E1, E2} AND
A.relate.E1(Max-Card) = m IF
has_duplicate(ext(F.Attribute)) = true AND
A.relate.E1(Max-Card) = 1 IF
has_duplicate(ext(F.Attribute)) = false AND
A.relate.E1(Min-Card) = 1 IF
ext(F.Attribute) = ext(F.Primary-Key.Attribute) AND
A.relate.E1(Min-Card) = 0 IF
ext(F.Attribute) ≠ ext(F.Primary-Key.Attribute) AND
A.relate.E2(Max-Card) = 1 AND
A.relate.E2(Min-Card) = 0 IF has_null(ext(F.Attribute)) = true AND
A.relate.E2(Min-Card) = 1 IF has_null(ext(F.Attribute)) = false AND
R.Attribute = R.Attribute - R.Foreign-Key.Attribute
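The cardinality reasoning in rule 8 can be sketched in Python using hypothetical extensions (lists of values) in place of a real database; only the decision logic stated in the rule is reproduced, and the treatment of nulls in the set comparison is a simplifying assumption.

def fk_association_cardinalities(fk_values, referenced_pk_values):
    """Derive (min, max) cardinalities for E1 (the relation that defines the
    foreign key) and E2 (the relation referenced by the foreign key)."""
    non_null = [v for v in fk_values if v is not None]
    e1_max = "m" if len(non_null) != len(set(non_null)) else 1       # duplicate FK values
    e1_min = 1 if set(non_null) == set(referenced_pk_values) else 0  # ext(F) = ext(PK)?
    e2_max = 1                                                       # each row references one tuple
    e2_min = 0 if len(non_null) != len(fk_values) else 1             # nulls allowed in the FK
    return {"E1": (e1_min, e1_max), "E2": (e2_min, e2_max)}

# EMPLOYEE.DNO referencing DEPARTMENT.DNUMBER, with made-up extensions
print(fk_association_cardinalities([1, 2, 2, None], [1, 2, 3]))
# {'E1': (0, 'm'), 'E2': (0, 1)}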
APPENDIX H
Intra-Model Construct Equivalences of the SOOER Model
1. A multi-valued attribute in the SOOER model is equivalent to an entity-class, a
single-valued attribute and an association relationship in the SOOER model. The
association relationship connects the new entity-class and the existing entity-class to
which the multi-valued attribute belongs. The multi-valued attribute is removed from
its entity-class. The single-valued attribute becomes the identifier of the new
entity-class. As the identifier of the new entity-class, the uniqueness and null-spec
properties of the single-valued attribute are 'unique' and 'not-null', respectively. The
name and data type of the single-valued attribute are the same as those of the
multi-valued attribute. If different instances of the existing entity-class can share the same
value of the multi-valued attribute (i.e., the uniqueness property of the multi-valued
attribute is 'not-unique'), the maximum cardinality on the existing entity-class is m;
otherwise, it is 1. The minimum cardinality on the existing entity-class is always 1
since each value of the multi-valued attribute was associated with an instance of the
existing entity-class. On the other hand, because it results from the transformation of a
multi-valued attribute, the maximum cardinality on the new entity-class is always m. If
the IsNull property of the multi-valued attribute is 'null', an instance of the existing
entity-class may not have any value in this multi-valued attribute, so the minimum
cardinality on the new entity-class is 0. If the IsNull property of the multi-valued
attribute is 'not-null', the minimum cardinality on the new entity-class is 1.
E: Entity-Class AND
M: E.Attribute (WHERE M.Multiplicity = 'multi-valued')
N: Entity-Class AND
A: Attribute AND
R: Association
WITH A.Multiplicity = 'single-valued' AND
R.relate.Entity-Class = {E, N} AND
N.Attribute = {A} AND
N.Identifier.Attribute = N.Attribute AND
A.Name = M.Name AND
A.DataType = M.DataType AND
A.Uniqueness = 'unique' AND
A.Null-Spec = 'not-null' AND
R.relate.E(Max-Card) = m IF M.Uniqueness = 'not-unique' AND
R.relate.E(Max-Card) = 1 IF M.Uniqueness = 'unique' AND
R.relate.E(Min-Card) = 1 AND
R.relate.N(Max-Card) = m AND
R.relate.N(Min-Card) = 0 IF M.Null-Spec = 'null' AND
R.relate.N(Min-Card) = 1 IF M.Null-Spec = 'not-null'
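A minimal Python sketch of equivalence rule 1 above follows: a multi-valued attribute M of an entity-class E is replaced by a new entity-class N, a single-valued attribute A (N's identifier), and an association between E and N. Dictionaries stand in for the SOOER constructs; all field names, and the choice to name the new class after the attribute, are assumptions.

def split_multivalued(entity_class, attr):
    new_entity = {
        "name": attr["name"],                       # hypothetical naming choice
        "attributes": [{
            "name": attr["name"],
            "data_type": attr["data_type"],
            "multiplicity": "single-valued",
            "uniqueness": "unique",                 # identifier of the new class
            "null_spec": "not-null",
        }],
    }
    association = {
        "relate": [entity_class["name"], new_entity["name"]],
        # cardinality on the existing class E
        "max_E": "m" if attr["uniqueness"] == "not-unique" else 1,
        "min_E": 1,
        # cardinality on the new class N
        "max_N": "m",
        "min_N": 0 if attr["null_spec"] == "null" else 1,
    }
    entity_class["attributes"].remove(attr)         # M is removed from E
    return new_entity, association

employee = {"name": "EMPLOYEE", "attributes": [
    {"name": "PHONE", "data_type": "char(10)", "multiplicity": "multi-valued",
     "uniqueness": "not-unique", "null_spec": "null"}]}
print(split_multivalued(employee, employee["attributes"][0]))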
2. A set of entity-classes having the same identifier as another entity-class, whose
extension on its identifier attributes is a superset of the union of those of the former
entity-classes, is equivalent to a new specialization relationship in which the latter is
the superclass and the former entity-classes are the subclasses. Furthermore, the
identifier attributes of each subclass will be removed.
C: Entity-Class AND
T: {E: Entity-Class (WHERE E.Identifier.Attribute = C.Identifier.Attribute)}
(WHERE union(ext(E.Identifier.Attribute)) ⊆ ext(C.Identifier.Attribute))
S: Specialization
WITH S.superclass.Entity-Class = {C} AND
S.subclass.Entity-Class = T AND
E.Identifier = E.Identifier - C.Identifier AND
E.Attribute = E.Attribute - C.Identifier.Attribute
3. If a set of entity-classes shares the same identifier as another entity-class and the
extension of the latter is a superset of the union of those of the former entity-classes,
the set of association relationships between the latter entity-class and each of the
former entity-classes is equivalent to a new specialization relationship with the latter
as the superclass and the former entity-classes as the subclasses. Furthermore, the
identifier attributes of each subclass will be removed.
C: Entity-Class AND
T: {E: Entity-Class (WHERE E.Identifier.Attribute = C.Identifier.Attribute)}
(WHERE union(ext(E.Identifier.Attribute)) ⊆ ext(C.Identifier.Attribute))
R: {A: E.relate.Association (WHERE A.relate.Entity-Class = {C, E})}
S: Specialization
WITH S.superclass.Entity-Class = {C} AND
S.subclass.Entity-Class = T AND
E.Identifier = E.Identifier - C.Identifier AND
E.Attribute = E.Attribute - C.Identifier.Attribute
4. A set of entity-classes sharing the same identifier are equivalent to a new
specialization with a new entity-class as the superclass and the set of entity-classes as
the subclasses. The identifier and common attributes of the subclasses will be
promoted to the new superclass.
T: {E: Entity-Class} (WHERE equal(E.Identifier.Attribute) = true)
N: Entity-Class AND
S: Specialization
WITH S.superclass.Entity-Class = {N} AND
S.subclass.Entity-Class = T AND
N.Attribute = intersect(E.Attribute) AND
N.Identifier.Attribute = E.Identifier.Attribute AND
E.Attribute = E.Attribute - intersect(E.Attribute)
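A minimal Python sketch of rule 4 above follows: entity-classes sharing the same identifier are generalized under a new superclass that receives their identifier and common attributes. Sets of attribute names stand in for SOOER entity-classes, and the superclass name used here is hypothetical.

def generalize(subclasses, identifier, superclass_name="GENERALIZED"):
    """subclasses: dict mapping entity-class name -> set of attribute names,
    all assumed to share the given identifier attributes."""
    common = set.intersection(*subclasses.values())        # intersect(E.Attribute)
    superclass = {"name": superclass_name,
                  "identifier": set(identifier),           # promoted identifier
                  "attributes": common}                    # promoted common attributes
    trimmed = {name: attrs - common for name, attrs in subclasses.items()}
    return superclass, trimmed

subs = {"CAR": {"VIN", "PRICE", "DOORS"}, "TRUCK": {"VIN", "PRICE", "PAYLOAD"}}
print(generalize(subs, identifier={"VIN"}))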
5. A set of specialization relationships, each of which has a single subclass and shares
the same superclass, is equivalent to a new specialization relationship with their
superclass as the superclass and the set of subclasses of those specialization
relationships as the subclasses of the new specialization relationship.
E: Entity-Class AND
S: {T: Specialization (WHERE count(T.subclass.Entity-Class) = 1 AND
T.superclass.Entity-Class = E)} AND
B: {C: T.subclass.Entity-Class}
N: Specialization
WITH N.superclass.Entity-Class = {E} AND
N.subclass.Entity-Class = B
6. If an entity-class E is not associated with any aggregation relationship and has the
maximum and minimum cardinality of 1 on every entity-class C directly linked to E
via an association relationship, the entity-class E and the set of association
relationships are equivalent to a new association relationship connecting all Cs.
The minimum and maximum cardinality on each C with the new association
relationship are the same as those on E with the association relationship to C. All
attributes of E become attributes of the new association relationship.
E: Entity-Class (WHERE E.aggregate.Aggregation = ∅ AND
E.component.Aggregation = ∅) AND
S: {A: E.relate.Association (WHERE count(A.relate.Entity-Class) = 2)} AND
O: {C: A.relate.Entity-Class (WHERE
C ≠ E AND A.relate.C(min-card) = 1 AND A.relate.C(max-card) = 1)}
(WHERE count(O) = count(S))
R: Association
WITH
R.relate.Entity-Class = O AND
R.relate.C(min-card) = A.relate.E(min-card) AND
R.relate.C(max-card) = A.relate.E(max-card) AND
R.Attribute = E.Attribute
7. An aggregation relationship is equivalent to a set of association relationships each of
which connects a component-class to the aggregate-class of the aggregation
relationship. The maximum and minimum cardinality on each component-class with
its new association relationship are the same as those on the component-class with
the aggregation relationship. The maximum and minimum cardinality on the
aggregate-class with each new association relationship are both 1.
A: Aggregation AND
G: A.aggregate.Entity-Class AND
E: {C: A.component.Entity-Class}
S: {R: Association}
WITH
R.relate.Entity-Class = {G, C} AND
R.relate.C(min-card) = A.component.C(min-card) AND
R.relate.C(max-card) = A.component.C(max-card) AND
R.relate.G(min-card) = 1 AND
R.relate.G(max-card) = 1
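A minimal Python sketch of rule 7 above follows: an aggregation is flattened into one binary association per component-class, copying the component-side cardinalities and fixing the aggregate-side cardinalities at (1, 1). The data structures are assumed, not the dissertation's representation.

def flatten_aggregation(aggregate, components):
    """components: mapping component-class name -> (min_card, max_card)."""
    associations = []
    for comp, (cmin, cmax) in components.items():
        associations.append({
            "relate": [aggregate, comp],
            comp: (cmin, cmax),          # same cardinalities as in the aggregation
            aggregate: (1, 1),           # aggregate side is always (1, 1)
        })
    return associations

# Illustrative call, mirroring the CONSIST_OF_1 structure induced in Appendix F
print(flatten_aggregation("RECORD", {"CONSTRUCT_1": (0, "m"), "KEY": (0, "m")}))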