Entity-Relationship Modeling Re-Revisited: JANUARY 2004
Entity-Relationship Modeling Re-Revisited: JANUARY 2004
Entity-Relationship Modeling Re-Revisited: JANUARY 2004
discussions, stats, and author profiles for this publication at: http://www.researchgate.net/publication/28674356
CITATIONS
READS
75
2 AUTHORS:
Don Goelman
Il-Yeol Song
Villanova University
Drexel University
15 PUBLICATIONS 66 CITATIONS
SEE PROFILE
SEE PROFILE
Introduction
Since its inception [5], the Entity-Relationship (ER) model has been the primary
approach for presenting and communicating a database schema at the conceptual level
(i.e., independent of its subsequent implementation), especially by means of the associated
Entity-Relationship Diagram (ERD). Theres also a fairly standard method for converting
it to a relational database schema. In fact, if the ER model is in some sense correct, then
the associated relational database schema should be in pretty good normal form [15]. Of
course, there have been some suggested extensions to Chens original ideas (e.g.,
specialization and aggregation as in [10, 19]), some different approaches for capturing
information in the ERD, and some variations on the mapping to the relational model, but
the degree of variability has been relatively minor. One reason for the remarkable
robustness and popularity of the approach is no doubt the wide appreciation for the
simplicity of the diagram. Consequently, the desirability of incorporating additional
features in the ERD must be weighed against the danger of overloading it with so much
information that it loses its visual power in communicating the structure of a database. In
fact, the models versatility is also evident in its relatively straightforward mappability to
the newer Object Data Model [7]. Now admittedly an industrial strength ERD reflecting
an actual enterprise would necessarily be some order of magnitude more complex than
even the production numbers in standard texts [e.g., 10]. However, this does not weaken
the ability of a simple ERD to capture local pieces of the enterprise, nor does it lessen the
importance of ER-type thinking in communicating a conceptual model.
Quite recently, however, both Camps and Badia have demonstrated [4, and 2 (from
whose paper the title of this one is derived)] some apparent shortcomings in the ER
model, both in the model itself and in the processes of conversion to the relational model
and its subsequent normalization. They have illustrated these problems through some
interesting examples. They also make some recommendations for improvements, based on
these examples. However, while not claiming that the ER model can be all things to all
users, we believe that the problems presented in the examples described in those two
papers are due less to the model and more to its incorrect application.
Extending the ERD to represent complex multi-relation constraints or constraints at
the attribute level are interesting research topics, but are not always desirable. We claim
that representing them would clutter the ERD as a conceptual model at the enterprise
level; complex constraints would be better specified in a textual or language-oriented
format than at the ERD level.
The purpose of this paper is to take these examples as a starting point to discuss the
possible shortcomings of the ER model and the necessity, or lack thereof, for modifying it
in order to address them. We therefore begin by reviewing and analyzing those
illustrations. Section 2 describes and critiques Camps scenarios; Section 3 does Badias.
Section 4 considers some related issues, most notably a general design principle only
minimally offered in the ER model. Section 5 concludes our paper.
we are told that for a given state/product combination, there can only be one dealer. Thus,
a minimal set of dependencies is as follows:
{s-Id, p-Id} d-Id
{s-Id, p-Id} c-Date
d-Id d-Address
p-Id p-Type
s-Id s-Capital
(A)
An ERD for this is given in Figure 1 (attributes are eliminated in the figures, for
the sake of clarity), and the obvious relational database schema is as follows:
State(s-Id, s-Capital)
Product(p-Id, p-Type)
Dealer(d-Id, d-Address)
Concession(s-Id, p-Id, d-Id, c-Date)
(B)
_________________________________________________
_________________________________________________
The foreign key constraints derive here from the two components of Concessions
key, which are primary keys of their native schemas. Since the only functional
dependencies are those induced by keys, the schema is in BCNF. Here Camps imposes
further constraints:
p-Id d-Id
s-Id d-Id
In other words, if a product is offered as a concession, then it can only be with a single
dealer regardless of the state; and analogously on the state-dealer side. The author is
understandably unhappy about the absence of a standard ERD approach to accommodate
the resulting binary constraining relationships (using the language of [12]), which he
renders in a rather UML-like fashion [17], similar to Figure 2. At this point, in order to
highlight the generic structure, he introduces new notation (A, B, C, D for State, Dealer,
Product, Concession, respectively). However, we will keep the current ones for the sake
of comfort, while still pursuing the structure of his narrative. He notes that the resulting
relational database schema includes the non-3NF relation schema Concession(s-Id,p-Id,dId,c-Date). Further, when Camps wishes to impose the constraints that a state
(respectively product) instance can determine a dealer if and only if there has been a
concession arranged with some product (respectively state), he expresses them with these
conditions:
(C)
_________________________________________________
Now we note that it is actually possible to capture the structural properties of the
enterprise at this stage by the simple (i.e., ternary-free) ERD of either Figure 3a [13] or
Figure 3b [18]. The minimal set of associated functional dependencies in Figure 3a is as
follows:
s-Id s-Capital
p-Id p-Type
d-Id d-Address
s-Id d-Id
p-Id d-Id
{s-Id, p-Id} c-Date
(D)
One, therefore, obtains the following relational database schema, which is, of
course, in BCNF, since all functional dependencies are due to keys:
State(s-Id,s-Capital,d-Id)
(E)
Product(p-Id,p-Type,d-Id)
Dealer(d-Id,d-Address)
Concession(s-Id,p-Id,c-Date)
_________________________________________________
_________________________________________________
Admittedly, this approach loses something: the ternary character of Concession.
However, any dealer-relevant information to a concession instance can be discovered by a
simple join; a view can also be conveniently defined. The ternary relationship in Figure 2
is therefore something of a red herring when constraining binary relationships are imposed
to a ternary relationship. In other words, it is possible that an expansion of the standard
ERD language to include n-ary relationships being constrained by m-ary ones might be a
very desirable feature, but its absence is not a surprising one.
Jones and Song showed that the ternary schema with FDs imposed in Figure 2 can
have lossless decomposition, but cannot have an FD-preserving schema (Pattern 11 in
[13]). Camps now arrives at the same schema (E) (by normalizing his non-3NF one, not
by way of our ERD in Figure 3a). The problem he sees is incorporating the semantics of
(C). The constraints he develops are:
s-Id, p-Id (Concessions) s-Id, p-Id (State*Product)
s-Id (State) s-Id (Concessions) iff State.d-Id is not null
p-Id (Product) p-Id (Concessions) iff Product.d-Id is not null
(F)
The last two conditions seem not to make sense syntactically. The intention is most
likely the following (keeping the first condition and rephrasing the other two):
s-Id, p-Id (Concessions) s-Id, p-Id (State*Product)
(G)
(s0s-Id(State))(s0 s-Id(Concessions) iff (d0)(<s0,d0> s-Id, d-Id(State)))
(p0p-Id(Product))(p0 p-Id(Concessions) iff (d0)(<p0,d0> p-Id, d-Id(Product)))
At any rate, Camps shows how SQL can accommodate these conditions too using
CHECKs in the form of ASSERTIONS, but he considers any such effort (to need any
conditions besides key dependencies and inclusion constraints) to be anomalous. We feel
that this is not so surprising a situation after all. The complexity of real-world database
design is so great that, on the contrary, it is quite common to encounter a situation where
many integrity constraints are not expressible in terms of functional and inclusion
dependencies alone. Instead, one must often use the type of constructions that Camps
shows us or use triggers to implement complex real-world integrity constraints.
minimality and power. In this section we give a brief recapitulation of the examples,
together with our analyses.
_________________________________________________
_________________________________________________
In certain limited situations [8] the Offers relationship might be superfluous and
recovered by composing the other two relationships (or, in the relational database schema,
by performing the appropriate joins). We would need to be careful about dropping Offers,
however. For example, if a particular course were at present unstaffed, then the Teaches
link would be broken. This is the case when Course entity has partial (optional)
participation to Department entity. Without an explicit Offers instance, we wouldnt
know which department offers the course. This is an example of a chasm trap which
requires an explicit Offers relationship [6]. Another case where we couldnt rely on
merely dropping one of the relationship links would arise if a commutative diagram
involved the composition of two relationships in each path; then we would surely need to
retain them both and to implement the constraint explicitly.
We note that allowing cycles and redundancies in ERDs has been a topic of research
in the past. Atzenti and Parker [1] advise against it; Markowitz and Shoshani [15] feel that
it is not harmful if it is done right. Dullea and Song [8, 9] provide a complete analysis of
redundant relationships in cyclic ERDs. Their decision rules on redundant relationships
are based on both maximum and minimum cardinality constraints.
entities as shown in Figure 6. Figure 6 avoids the difficulties at both the ER and relational
levels. In fact, this fix is even exhibited in the Connolly source itself. We note that the
chasm trap discussed in Section 3.2 and the fan trap are commonly called connection
traps [6] which make the connection between two entities separated by the third entity
ambiguous.
_________________________________________________
Figure 5. A semantically wrong ERD with a fan trap (from Figure 2 in [2] and Figure
11.19(a) from [6]).
_________________________________________________
_________________________________________________
through 7d, we show several variations of this case related to capturing the history of
works-in relationships and the above constraint. Well comment additionally on this in
Section 4.
_________________________________________________
Figure 7a. An employee may work in only one project and each project can have many
employees. The diagram already assumes that an employee must work for only one
project at a time. This diagram is not intended to capture any history of works-in
relationship.
Figure 7b. An employee may work in many projects and each project may have many
employees. The diagram assumes that an employee may work for many projects at the
same time. This diagram is also not intended to capture any history of works-in
relationship.
Figure 7c. An employee may work in only one project at a time. This diagram can capture
a history of works-in relationship of an employee for projects and still satisfies the
constraint that an employee may work in only one project at a time.
Figure 7d. In Figure 7.c, entity TimePeriod is not easily materialized, we can reify the
relationship Works-in to an intersection entity. This diagram can capture the history of
works-in relationship, but does not satisfy the constraint that an employee may work in
only one project.
_________________________________________________
sometimes called the Chen approach to cardinality [11] or look across; when A is a
singleton set and B its complement, it is called the Merise approach [11] or look here.
All told, McAllister shows that there are 3n-2n+1+1 different combinations possible for A
and B, where n is the number of different roles.
Clearly, given this explosive growth, it is impractical to include all possible
cardinality constraints in a general ERD, although McAllister shows a tabular approach
that works pretty well for ternary relationships. He shows further that there are many
equalities and inequalities that must hold among the cardinalities, so that the entries in the
table are far from independent. The question arises as to which cardinalities have the
highest priorities and should thus appear in an ERD. It turns out that the Merise and Chen
approaches give the same information in the binary case but not in the ternary one, which
becomes the contentious case (n>3 is rare enough not to be a serious issue). In fact one
finds both Chen [as in 10] and Merise [as in 3] systems in practice. In his article, Genova
feels that UML [17] made the wrong choice by using the Chen method for its Cmins, and
he suggests that class diagrams include both sets of information (but only when either A
or B is singleton). That does not seem likely to happen, though.
Still, consideration of these general cardinality constraints and McAllisters
axioms comes in handy in a couple of the settings we have discussed. The general setting
helps understand connections between, for example, ternary and related binary
relationships as in Figure 2 and [12]. And it similarly sheds light on preservation (and
loss) of information in Section 3.5 above, when a binary relationship is replaced by a
ternary one. Finally, we believe that it also provides the deep structural information for
describing the properties of decompositions of the associated relation schemas. It is
therefore indisputable in our opinion that these general cardinality constraints do much to
describe the fundamental structure of a relationship in the ER model; only portions of
which, like the tip of an iceberg, are currently visible in a typical ERD. And yet we are not
claiming that such information should routinely be included in the model.
5 Conclusion
We have reviewed recent literature ([4] and [2]) that illustrate through some interesting
examples areas of conceptual database design that are not accommodated sufficiently at
the present time by the Entity-Relationship model. However, some of these examples
seem not to hold up under scrutiny.
Capabilities that the model does indeed lack are constraints on commutative
diagrams (Section 3.2 above), recursive closures (3.3), and some range conditions (3.6) as
pointed out by Badia. Another major conceptual modeling tool missing in the ER model is
that of general cardinality constraints [16]. These constraints are the deep structure that
underlies such more visible behavior as constraining and related relationships, Chen and
Merise cardinality constraints, functional dependencies and decompositions, and
participation constraints. How many of these missing features should actually be
incorporated into the ER model is pretty much a question of triage, of weighing the
benefits of a feature against the danger of circuit overload.
We believe that some complex constraints such as multi-relation constraint are better
to be represented as a textual or a language-oriented syntax, such as OCL [17], rather than
at the ER diagram level. We also believe that information and constraints at the attribute
level could be expressed at the data dictionary level or in a separate low level diagram
below the ERD level. In these ways, we will keep an ERD as a conceptual model at
enterprise level to deliver major semantics without visual overload and too much clutter.
Consider the complexity of an ERD for a real-world application that has over 50 entities
and hundreds of attributes and representing all those complex multi-relation and attribute
constraints in the ERD. The use of a CASE tool that supports a conceptual ERD with its
any low level diagram for attributes and/or its associated data dictionary should be a right
direction for this problem.
We note that we do not claim that some research topics suggested by Baida, such as
relationships over relationships and attributes over attributes, are not interesting or worthy.
Research in those topics would bring interesting new insights and powerful ways of
representing complex semantics. What we claim here is that the ERD itself has much
value as it is now, especially for relational applications, where all the examples of Baida
indicate. We believe, however, that extending the ER model to support new application
semantics such as biological applications should be encouraged.
The D in ERD connotes to many researchers and practitioners the simplicity and
power of communication that account for the models popularity. Indeed, as the EntityRelationship model nears its 30th birthday, we find its robustness remarkable.
References
1.Atzeni, P. and Parker, D.S., Assumptions in relational database theory, in Proceedings
of the 1st ACM Symposium on Principles of Database Systems, March 1982.
2. Badia, A. Entity-Relationship Modeling Revisited, SIGMOD Record, 33(1), March
2004, pp. 77-82.
3.
Navathe,
S.,
Conceptual
Database
Design,