Untitled

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

Chapter_3

Logical Database Design and the Relational Model


 Learning Objectives:
– Some basic definitions
– Relational data structure.
– Relational keys (Primary key, Foreign key , Composite key)
– Properties of relations.
– Transforming ER-Diagram into relations:
 Step 1: Map Regular Entities
 Step 2: Map Weak Entities
 step 3: Map Binary Relationships:
– Map binary One-to-Many relationships.
– Map binary Many-to-Many relationships.
– Map binary One-to-One relationships.
 Step 4: Map Associative Entities.
 Step 5: Map Unary Relationships:
– Unary One-to-many Relationships.
– Unary Many-to-many Relationships.
 Step 6: Map Ternary Relationships:
– Steps In Normalization.
– Integrity constraints.
Relational Model Integrity Constraints
In a relational database, there will typically be many relations,
and the tuples in those relations are usually related in various
ways.
The state of the whole database will correspond to the states of
all its relations at a particular point in time.
There are generally many restrictions or constraints on the actual
values in a database state.
The relational data model includes several types of constraints,
or rules limiting acceptable values and actions, whose purpose is
to facilitate maintaining the accuracy and integrity of data in the
database.
Constraints on databases can generally be divided into three
main categories:
1. Model-based constraints or Implicit constraints.
2. Schema-based constraints or Explicit constraints.
3. Application-based or Semantic Constraints or Business Rules.
Three categories of Constraints
1. Model-based constraints or Implicit constraints: Constraints that are
inherent in the data model.
o inherent means existing in something as a permanent or essential
attribute.
o The characteristics of relations that we discussed in the previous
sections are the inherent constraints of the relational model.
o For example, the constraint that “a relation cannot have
duplicate tuples” is an inherent constraint.
o “A relational model does not allow a list as a value for any
attribute” is an inherent constraint.

2. Schema-based constraints or Explicit constraints: Constraints that


can be directly expressed in the schemas of the data model, typically
by specifying them in the Data Definition Language. (DDL).
o There are four main types of this categories : (1) Domain
constraints, (2) Key constraints, (3) Entity integrity constraints,
and (4) Referential integrity constraints.
o We will discuss them later in this section.
3. Application-based or Semantic Constraints or Business Rules.
Constraints that cannot be directly expressed in the schemas of
the data model, and hence must be expressed and enforced by
the application programs or in some other way.

row

 Domain constraints specify that within each tuple, the value of


each attribute A must be an atomic value from the domain
dom(A).
 All of the values that appear in a column of a relation must be
from the same domain.

A common method of specifying a domain is to specify a data


type from which the data values forming the domain are drawn.
It is also useful to specify a name for the domain, to help in
interpreting its values.
Some examples of domains follow:
Local_Phone_Numbers. The set of seven-digit phone numbers
valid within a particular area code in the United States.
Social_Security_Numbers. The set of valid nine-digit Social
Security Numbers: which is a unique identifier assigned to each
person in the United States.
Names: The set of character strings that represent names of
persons.
Grade_Point_Averages. Possible values of computed GPAs
each must be a real (floating-point) number between 0 and 4.
Student_Ages. Possible ages of students in a university; each
must be an integer value between 18 and 28.
Academic_Department_Names. The set of academic
department names in a university, such as Computer Science,
Economics, and Physics.
Academic_Department_Codes. The set of academic department
codes, such as ‘CS’, ‘ECON’, and ‘PHYS’.
The preceding are called logical definitions of domains
A data type or format is also specified for each domain. For
example,
The data type for the domain Local_Phone_Numbers can be
declared as a character string of the form ddd-dddd, where each
d is a numeric (decimal) digit.
The data type for Student_Ages is an integer number
between 18 and 28.
For Academic_Department_Names, the data type is the set
of all character strings that represent valid department
names.
A domain is thus given a name, data type, or format.
Additional information for interpreting the values of a domain
can also be given; for example, a numeric domain such as
Person_weights should have the units of measurement, such as
pounds or kilograms.
In the formal relational model, a relation is defined as a set of
tuples.
By definition, all elements of a set are distinct; hence, all tuples in
a relation must also be distinct.
This means that no two tuples can have the same combination of
values for all their attributes.
Usually, there are other subsets of attributes of a relation schema
R with the property that no two tuples in any relation state r of R
should have the same combination of values for these attributes.
Suppose that we denote one such subset of attributes by SK; then
for any two distinct tuples t1 and t2 in a relation state r of R, we
have the constraint that:
t1[SK] ≠ t2[SK]
Any such set of attributes SK is called a superkey of the relation
schema R.
A superkey SK specifies a uniqueness constraint that no two distinct
tuples in any state r of R can have the same value for SK.
Every relation has at least one default superkey.
However, a superkey can have redundant attributes, so a more
useful concept is that of a key, which has no redundancy.
Hence, a key satisfies two properties:
1. Two distinct tuples in any state of the relation cannot have
identical values for (all) the attributes in the key. This uniqueness
property also applies to a superkey.
2. It is a minimal superkey—that is, a superkey from which we
cannot remove any attributes and still have the uniqueness
constraint hold.
 This minimality property is required for a key but is optional
for a superkey.
Hence, a key is a superkey but not vice versa.
A superkey may be a key (if it is minimal) or may not be a key (if it is
not minimal).
Consider the STUDENT relation of Figure 3.29
The attribute set {Ssn} is a key of STUDENT because no two
student tuples can have the same value for Ssn.
Any set of attributes that includes Ssn—for example, {Ssn,
Name, Age}—is a superkey.
However, the superkey {Ssn, Name, Age} is not a key of
STUDENT because removing Name or Age or both from the set
still leaves us with a superkey.
In general, any superkey formed from a single attribute is also a
key.
A key with multiple attributes must require all its attributes
together to have the uniqueness property.
The value of a key attribute can be used to identify uniquely each
tuple in the relation.
For example, the Ssn value 305-61-2435 identifies uniquely the
tuple corresponding to Benjamin Bayer in the STUDENT
relation.
Figure 3.29 superkey and key in STUDENT relation
In general, a relation schema may have more than one key. In this
case, each of the keys is called a candidate key. For Example,

Figure 3.30
the CAR relation,
with two candidate keys
The CAR relation in Figure 3.30 has two candidate keys:
License_number and Engine_serial_number.
It is common to designate one of the candidate keys as the primary key
of the relation.
This is the candidate key whose values are used to identify tuples in the
relation.
We use the convention that the attributes that form the primary key of a
relation schema are underlined.
Notice that when a relation schema has several candidate keys, the
choice of one to become the primary key is somewhat arbitrary;
however, it is usually better to choose a primary key with a single
attribute or a small number of attributes.
The other candidate keys are designated as unique keys and are not
underlined.

Another constraint on attributes specifies whether NULL values are or


are not permitted.
For example, if every STUDENT tuple must have a valid, non-NULL
value for the Name attribute, then Name of STUDENT is constrained to
be NOT NULL.
Relational Databases and Relational
Database Schemas Example
In this example, we define a relational database and a relational
database schema.
Figure 5.5 shows a relational database schema that we call
COMPANY = {EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS,
PROJECT, WORKS_ON, DEPENDENT}.

In each relation schema, the underlined attribute represents the


primary key.
Figure 5.6 shows a relational database state corresponding to the
COMPANY schema.
When we refer to a relational database, we implicitly include
both its schema and its current state.
(Figure 5.5 ,5.6, and 5.7 are drown in separate pdf file attached
with the slides)
 NOTES:
1. A database state that does not obey all the integrity constraints
is called not valid, and a state that satisfies all the constraints in
the defined set of integrity constraints IC is called a valid state.
2. In Figure 5.5, the Dnumber attribute in both DEPARTMENT and
DEPT_LOCATIONS stands for the same real-world concept—the
number given to a department.
That same concept is called Dno in EMPLOYEE and Dnum in
PROJECT.
Attributes that represent the same real-world concept may or
may not have identical names in different relations.
3. Alternatively, attributes that represent different concepts may
have the same name in different relations.
For example, we could have used the attribute name Name for
both Pname of PROJECT and Dname of DEPARTMENT;
in this case, we would have two attributes that share the
same name but represent different real-world concepts —
project names and department names.
4. In some early versions of the relational model, an assumption
was made that the same real-world concept, when represented by
an attribute, would have identical attribute names in all relations.
This creates problems when the same real-world concept is
used in different roles (meanings) in the same relation.
For example, the concept of Social Security number appears
twice in the EMPLOYEE relation of Figure 5.5: once in the role
of the employee’s SSN, and once in the role of the
supervisor’s SSN.
We are required to give them distinct attribute names—Ssn
and Super_ssn, respectively—because they appear in the
same relation and in order to distinguish their meaning.
5. Integrity constraints are specified on a database schema and are
expected to hold on every valid database state of that schema.
6. In addition to domain, key, and NOT NULL constraints, two
other types of constraints are considered part of the relational
model: entity integrity and referential integrity. (see the
following sections).
The entity integrity constraint states that no primary key value can be
NULL.
This is because the primary key value is used to identify individual
tuples in a relation.
Having NULL values for the primary key implies that we cannot
identify some tuples.
For example, if two or more tuples had NULL for their primary keys,
we may not be able to distinguish them if we try to reference them from
other relations.

Key constraints and entity integrity constraints are specified on


individual relations.
The referential integrity constraint is specified between two relations
and is used to maintain the consistency ‫ ترابط‬among tuples in the two
relations.
Informally, the referential integrity constraint states that a tuple in one
relation that refers to another relation must refer to an existing tuple in
that relation.
For example, in Figure 5.6, the attribute Dno of EMPLOYEE gives the
department number for which each employee works; hence, its
value in every EMPLOYEE tuple must match the Dnumber value of
some tuple in the DEPARTMENT relation.
To define referential integrity more formally, first we define the
concept of a foreign key.
The conditions for a foreign key, given below, specify a referential
integrity constraint between the two relation schemas R1 and R2.
A set of attributes FK in relation schema R1 is a foreign key of R1
that references relation R2 if it satisfies the following rules:
1. The attributes in FK have the same domain(s) as the primary key
attributes PK of R2; the attributes FK are said to reference or
refer to the relation R2.
2. A value of FK in a tuple t1 of the current state r1(R1) either occurs
as a value of PK for some tuple t2 in the current state r2(R2) or is
NULL.
In the former case, we have t1[FK] = t2[PK], and we say that
the tuple t1 references or refers to the tuple t2.
In this definition, R1 is called the referencing relation and R2 is the
referenced relation.
If these two conditions hold, a referential integrity constraint from R1
to R2 is said to hold.
In a database of many relations, there are usually many referential
integrity constraints.
Referential integrity constraints typically arise from the relationships
among the entities represented by the relation schemas.

For example, consider the database shown in Figure 5.6:


In the EMPLOYEE relation, the attribute Dno refers to the
department for which an employee works; hence, we designate Dno
to be a foreign key of EMPLOYEE referencing the DEPARTMENT
relation.
This means that a value of Dno in any tuple t1 of the EMPLOYEE
relation must match a value of the primary key of DEPARTMENT—
the Dnumber attribute—in some tuple t2 of the DEPARTMENT
relation,
or the value of Dno can be NULL if the employee does not belong to
a department or will be assigned to a department later.
For example, in Figure 5.6 the tuple for employee ‘John Smith’
references the tuple for the ‘Research’ department, indicating that
‘John Smith’ works for this department.
Notice that a foreign key can refer to its own relation. (How)
For example, the attribute Super_ssn in EMPLOYEE refers to the
supervisor of an employee; this is another employee, represented
by a tuple in the EMPLOYEE relation.
Hence, Super_ssn is a foreign key that references the EMPLOYEE
relation itself.
In Figure 5.6 the tuple for employee ‘John Smith’ references the
tuple for employee ‘Franklin Wong,’ indicating that ‘Franklin
Wong’ is the supervisor of ‘John Smith’.
We can diagrammatically display referential integrity constraints by
drawing a directed arc from each foreign key to the relation it
references.
For clarity, the arrowhead may point to the primary key of the
referenced relation.
Figure 5.7 shows the schema in Figure 5.5 with the referential
integrity constraints displayed in this manner.
The operations of the relational model can be categorized into
retrievals and updates.
In this section, we concentrate on the database modification or
update operations.
There are three basic operations that can change the states of
relations in the database: Insert, Delete, and Update (or Modify).
They insert new data, delete old data, or modify existing data
records, respectively.
1. Insert: is used to insert one or more new tuples in a relation,
2. Delete: is used to delete tuples, and
3. Modify: is used to change the values of some attributes in
existing tuples.
Whenever these operations are applied, the integrity constraints
specified on the relational database schema should not be
violated.
In this section we use the database shown in Figure 5.6 for examples
and discuss the four types of constraints discussed before.
For each type of operation, we give some examples and discuss any
constraints that each operation may violate.

The Insert operation provides a list of attribute values for a new


tuple t that is to be inserted into a relation R.
Insert can violate any of the four types of constraints.
Domain constraints can be violated if an attribute value is given
that does not appear in the corresponding domain or is not of the
appropriate data type.
Key constraints can be violated if a key value in the new tuple t
already exists in another tuple in the relation r(R).
Entity integrity can be violated if any part of the primary key of
the new tuple t is NULL.
Referential integrity can be violated if the value of any foreign
key in t refers to a tuple that does not exist in the referenced
relation. Here are some examples to illustrate this discussion:
If an insertion violates one or more constraints, the default option is to
reject the insertion.
In this case, it would be useful if the DBMS could provide a reason to the
user as to why the insertion was rejected.

The Delete operation can violate only referential integrity.


This occurs if the tuple being deleted is referenced by foreign keys from
other tuples in the database.
To specify deletion, a condition on the attributes of the relation selects the
tuple (or tuples) to be deleted. Here are some examples.
The Update (or Modify) operation is used to change the values of one
or more attributes in a tuple (or tuples) of some relation R.
It is necessary to specify a condition on the attributes of the relation to
select the tuple (or tuples) to be modified. Here are some examples.
End of Lecture_13

You might also like