Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau
Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau
Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau
4th Edition
Lecture Notes
Contents
I. Introduction ................................................................………...……2
Relational database life cycle 3
Characteristics of a good database design process 6
II. The Entity-Relationship (ER) Model …………...……………….7
Basic ER concepts 7
Ternary relationships 11
1
I. Introduction
Introductory Concepts
data—a fact, something upon which an inference is based (information or knowledge has
value, data has cost)
data item—smallest named unit of data that has meaning in the real world (examples:
last name, address, ssn, political party)
data aggregate (or group) -- a collection of related data items that form a
whole concept; a simple group is a fixed collection, e.g. date (month, day, year); a
repeating group is a variable length collection, e.g. a set of aliases.
database—collection of interrelated stored data that serves the needs of multiple users
within one or more organizations; a collection of tables in the relational model.
database administrator (DBA) -- person or group responsible for the effective use of
database technology in an organization or enterprise.
2
4. Management control—DBA: lifecycle control, training, maintenance
*Y2K (year 2000) problem—many systems store 2-digit years (e.g. ‘02-OCT-98’) in
their programs and databases, that give incorrect results when used in date arithmetic
(especially subtraction), so that ‘00’ is still interpreted as 1900 rather than 2000. Fixing
this problem requires many hours of reprogramming and database alterations for many
companies and government agencies.
3
4
Figure 1.2
5
Characteristics of a Good Database Design Process
* Iterative requirements analysis
- interview top-down
- use simple models for data flow and data relationships
- verify model
6
II. Entity-Relationship (ER) Model
Entity - a class of real world objects having common characteristics and properties about
which we wish to record information.
7
Figure 2.1
8
Figure 2.2
9
Figure 2.3
10
Super-class (super-type)/subclass (subtype) relationship
Generalization
* similarities are generalized to a super-class entity, differences are specialized to a subclass
called an “ISA” relationship (“specialization” is the inverse relationship)
* special property: inheritance - subclass inherits the primary key of the super-class, super-c
common nonkey attributes, each subclass has specialized non-key attributes
Aggregation
* “part-of” relationship among entities to a higher type aggregate entity (“contains” is the in
relationship)
11
Figure 2.4
Constraints in ER modeling
* exclusion constraint - restricts an entity to be related to only of several other
- mandatory/optional
- specifies lower bound of connectivity of entity instances
- participating in a relationship as 1 or 0
12
Ternary Relationships
Figure 2.6
13
14
III. The Unified Modeling Language (UML)
Class Diagrams
15
Degree
manag 1
reflexive
association Employe *
managed
binary * 1
association Department Division
*
Employee
Multiplicit
i one-to-one manager
Department Employe
1 1
one-to- Department 1 Employe
*
WorkAssignmen
task-
assignment
many-to-
Employe Project
* *
Existence
manager
optional Department Employe
0..1 1
mandatory occupant
Office Employe
1 0..*
16
Employee
Individual
Complete
enumeration
of sub-
Employee Customer
EmpCust
Software
Progra Electronic
Course
Teacher Text
17
Figure 3.5 UML n-ary relationship (parallel to Figure 2.7)
Composition Invoice
example with
«pk» inv_num
customer_id
inv date
1 .. *
LineItem
«pk» inv_num
«pk»
line_num
description
18
Musi
Group
Artist Medi Distributio
Composer MusicMedi Studio
Lyricist a Publisher
Musician Album RetailStore
Instrument CD
Song
Rendition
19
Figure 3.9 Relationships between classes in the music package
Publishe
Grou
Music Studio
Artis
Produce
Album CD
Track
Rendition
Figure 3.10 Classes of the media package, and related classes
20
Activity Diagrams
21
Customer Manufacturer
Request Generate
Review quote
[unacceptabl
]
[acceptabl
]
Place Enter
Produce order
Ship order
Receive
Receive Generate
Pay Record
22
Rules of Thumb for UML Usage
1. Decide what you wish to communicate first, and then focus your
description. Illustrate the details that further your purpose, and elide the rest..
Be concise.
2. Keep each UML diagram to one page. Diagrams are easier to understand
if they can be seen in one shot.
3. Use the UML when it is useful. Don't feel compelled to write a UML
document just because you feel you need a UML document.
5. Take care to clearly organize each diagram. Avoid crossing associations. Group
elements together if there is a connection in your mind.
23
IV. Requirements Analysis and Conceptual Data Modeling
Requirements Analysis
Purpose - identify the real-world situation in enough detail
to be able to define database components. Collect two types of data: natural data (input to
the database) and processing data (output from the database).
1. Organizational objectives
- sell more cars this year
- move into to recreational vehicle market
3. Organizational structure/chart
24
Data and Process Dictionary Entries for Requirements Analysis
in the Database Design Lifecycle
Entity Description (possibly in a data dictionary)
Name customer
Reference-no 4201
Cardinality 10,000
Growth rate 100 per month
Synonyms user, buyer
Role (or description) someone who purchases or rents a
product made by the company.
Security level 0 (customer list is public)
Subtypes adults, minors
Key attribute(s) cust-no
Non-key attribute(s) cust-name, addr, phone, payment-status Relationship to
other entities salesperson, order, product
Used in which applications billing, advertising
Relationship description
Name purchase
Reference-no 511037
Degree binary
Entities and connectivity customer(0,n), product(1,n)
Synonyms buy
Attributes (of the relationship) quantity, order-no
Assertions a customer must have purchased at
least one product, but some products
may not have been purchased as yet by
any customers.
25
Entities used employee
Data volume (how many entities) implicit from entity cardinality
Function: Take customer orders and either fill them or make adjustments.
Frequency: daily
26
View Integration Methods
Goal in schema integration
- to create a non-redundant unified (global) conceptual schema
1. Comparing of schemas
* look for correspondence (identity) among entities
2. Conforming of schemas
* resolve conflicts (often user interaction is required)
27
3. Merging and restructuring
* superimpose entities
28
Figure 4.5
29
Figure 4.6
30
Figure 4.7
31
Entity Clustering for ER Models
Motivation
* conceptual (ER) models are difficult to read and understand for large and complex databases, e.g.
10,000 or more data elements
* there is a need for a tool to abstract the conceptual database schema (e. g. clustering of the ER
diagram)
* potential applications
- documentation of the database conceptual schema (in coordination with the data dictionary)
Clustering Methodology
Given an extended ER diagram for a database.....
32
Figure 4.8
33
Figure 4.9
V. Transforming the Conceptual Data Model to SQL
Tables
* Entity – directly to a SQL table
* Ternary relationship – directly to a SQL table, taking the 3 primary keys of the 3
entities associated with this relationship as foreign keys in the new table
* Generalization subclass (subtype) entity – directly to a SQL table, but with the
primary key of its super-class (super-type) propagated down as a foreign key into its table
34
* Mandatory constraint (1 lower bound) on the “one” side of a one-to-many
relationship – the foreign key in the “many” side table associated with the primary key
in the “one” side table should be set as “not null” (when the lower bound is 0, nulls are
allowed as the default in SQL)
35
36
Figure 5.1
37
Figure 5.3
38
Figure 5.5
39
Figure 5.7
40
VI. Normalization
First normal form (1NF) to third normal form (3NF) and BCNF
Goals of normalization
1. Integrity
2. Maintainability
super-key -- a set of one or more attributes, which, when taken collectively, allows us to identify
uniquely an entity or table.
candidate key—any subset of the attributes of a super-key that is also a super-key, but not
reducible.
primary key -- arbitrarily selected from the set of candidate keys, as needed for indexing.
41
First Normal Form (single table)
TABLE SUPPLIER_PART (100k rows, 73 bytes/row => 7.3 MB)
SNUM SNAME STATUS CITY PNUM PNAME WT QTY SHIPDATE
S1 SMITH 20 LONDON P1 NUT 12 3 1-4-90
S1 SMITH 20 LONDON P2 BOLT 22 2 2-17-90
S1 SMITH 20 LONDON P3 WRENCH 27 6 11-5-89
S1 SMITH 20 LONDON P4 WRENCH 24 2 6-30-91
S1 SMITH 20 LONDON P5 CLAMP 22 1 8-12-91
S1 SMITH 20 LONDON P6 LEVEL 19 5 4-21-91
S2 JONES 10 PARIS P1 NUT 12 3 5-3-90
S2 JONES 10 PARIS P2 BOLT 22 4 12-31-90
S3 BLAKE 10 PARIS P3 WRENCH 27 4 3-25-91
S3 BLAKE 10 PARIS P5 CLAMP 22 2 3-27-91
S4 CLARK 20 LONDON P2 BOLT 22 2 10-31-89
S4 CLARK 20 LONDON P4 WRENCH 24 3 7-14-90
S4 CLARK 20 LONDON P5 CLAMP 22 7 8-20-90
S5 ADAMS 30 ATHENS P5 CLAMP 22 5 8-11-91
Functional dependencies
SNUM --> SNAME, STATUS,CITY
CITY --> STATUS
PNUM --> PNAME, WT
SNUM,PNUM,SHIPDATE --> QTY
42
S1 P1 3 1-4-90 SNUM, PNUM, SHIPDATE--> QTY
S1 P2 2 2-17-90
S1 P3 6 11-5-89
S1 P4 2 6-30-90
S1 P5 1 8-12-91
S1 P6 5 4-21-91
S2 P1 3 5-3-90
S2 P2 4 12-31-90
S3 P3 4 3-25-91
S3 P5 2 3-27-91
S4 P2 2 10-31-89
S4 P4 3 7-14-90
S4 P5 7 8-20-90
S5 P5 5 8-11-91
43
Third Normal Form
44
Functional Dependency Inference rules
(Armstrong’s Axioms)
1. Reflexivity
If Y is a subset of the attributes of X, then X->Y.
X = ABCD, Y = ABC => X->Y
X->X trivial case
2. Augmentation
If X->Y and Z is a subset of table R (i.e. Z is any set of attributes in R), then XZ -> YZ .
3. Transitivity
If X->Y and Y->Z then X->Z.
4. Pseudo-transitivity
If X->Y and YW->Z then XW->Z.
(transitivity is a special case of pseudo-transitivity when W is null)
5. Union
If X->Y and X->Z then X->YZ.
6. Decomposition
If X->YZ then X->Y and X->Z.
Given: any FD containing all attributes in the table R(W,X,Y,Z), i.e. XY -> WZ.
Proof:
(1) XY -> WZ given
(2) XY -> XY by the reflexivity axiom
(3) XY -> XYWZ by the union axiom
(4) XY uniquely determines every attribute in table R, as shown in (3)
(5) XY uniquely defines table R, by the definition of a table as having no duplicate rows
(6) XY is therefore a super-key, by the definition of a super-key.
45
3NF Synthesis Algorithm (Bernstein)
Basic definitions
g e H set of FDs
H+ closure of H - set of all FDs derivable from H using all the FD inference rules
Example
Given a set of FDs H, determine a minimal set of tables in 3NF,
while preserving all FDs and maintaining only lossless decomposition/joins.
H: AB->C DM->NP D->KL
A->DEFG D->M
E->G L->D
F->DJ PR->S
G->DI PQR->ST
Step 3: Partition H’ into tables such that all FDs with the
same left side are in one table, thus eliminating any non-fully functional FDs. (Note: creating
tables at this point would be a feasible solution for 3NF, but not necessarily minimal.)
R1: AB->C R4: G->DI R7: L->D
R2: A->EF R5: F->DJ R8: PQR->T
R3: E->G R6: D->KLMNP
Step 4: Merge equivalent keys, i.e. merge tables where all FD’s satisfy 3NF.
46
4.1 Write out the closure of all LHS attributes resulting from Step 3, based on transitivities.
4.2 Using the closures, find tables that are subsets of other groups and try to merge them. Use Rule
1 and Rule 2 to establish if the merge will result in FDs with super-keys on the LHS. If not, try
using the axioms to modify the FDs to fit the definition of super-keys.
4.3 After the subsets are exhausted, look for any overlaps among tables and apply Rules 1 and 2
(and the axioms) again.
In this example, note that R7 (L->D) has a subset of the attributes of R6 (D->KLMNP).
Therefore we merge to a single table with FDs D->KLMNP, L->D because it satisfies 3NF: D is
a super-key by Rule 1 and L is a super-key by Rule 2.
Final 3NF (and BCNF) table attributes, FDs, and candidate keys:
R1: ABC (AB->C with key AB) R5: DFJ (F->DJ with key F)
R2: AEF (A->EF with key A) R6: DKLMNP (D->KLMNP, L->D, w/keys D, L)
R3: EG (E->G with key E) R7: PQRT (PQR->T with key PQR)
R4: DGI (G->DI with key G) R8: PRS (PR->S with key PR)
Step 4a. Check to see whether all tables are also BCNF. For any table that is not BCNF, add the
appropriate partially redundant table to eliminate the delete anomaly.
47
Maier’s Example using 3NF Synthesis
[Maier,D. The Theory of Relational Databases, Computer Science Press, 1983]
R = {A,B,C,D,E,F,G,H,I,J,K }
Functional dependencies (FDs):
(1) E --> A B C D F G H I J K (7) H I --> J
(2) A B C --> E D F G H I J K (8) I J --> H
(3) A B D --> E C F G H I J K (9) H J --> I
(4) G --> H I J
(5) C F --> K
(6) D F --> K
48
Example of a 3NF table that is not BCNF,
Delete anomaly: If Sutton drops Journalism, then we have no record of Murrow teaching Journalism.
How can we decompose this table into BCNF?
49
Decomposition 2 (better).....eliminates the delete anomaly
SI (no FD) and I -> C
Advantages – eliminates the delete anomaly, lossless
Disadvantage - dependency SC -> I is not preserved
The new row is allowed in SI using unique(student,instructor) in the create table command, and the
join of SI and IC is lossless. However, a join of SI and IC now produces the following two rows:
student course instructor
Sutton Math Von Neumann
Sutton Math Dantzig which violates the FD SC -> I.
Oracle, for instance, has no way to automatically check SC->I, although you could write a procedure to
do this at the expense of a lot of overhead.
50
VII. An Example of Logical Database Design
Requirements Specification
The management of a large retail store would like a database to keep track of sales
activities. The requirements analysis for this database led to the following six entities and
their unique identifiers:
• Each customer has one job-title, but different customers may have the same job-title.
• Each customer may place many orders, but only one customer may place a particular
order.
• Each department has many salespeople, but each salesperson must work in only one
department.
• Each department has many items for sale, but each item is sold in only one
department. (Item means item type, like IBM PC).
• For each order, items ordered in different departments must involve different
salespeople, but all items ordered within one department must be handled by
exactly one salesperson. In other words, for each order, each item has exactly one
salesperson; and for each order, each department has exactly one salesperson.
ER Construct FDs
Customer(many):Job(one) cust-no -> job-title
Order(many): Customer(one) order-no -> cust-no
Salesperson(many): Department(one) sales-id -> dept-no
Item(many): Department(one) item-no -> dept-no
Order(many): Item(many): order-no,item-no->sales-id
Salesperson(one)
Order(many): Department(many): order-no,dept-no-> sales-id
Salesperson(one)
51
Figure 7.1
52
foreign key (dept_no) references department
on delete set null on update cascade);
53
order_item_sales order_no,item_no sales_id
order_dept_sales order_no,dept_no sales_id
54
55
VIII. Business Intelligence
Data Warehousing
Table 8.1 Comparison between OLTP and data warehouse databases
Staging Area
Transform
Load
Data
Warehouse
Ad Hoc Query Tools Data Mining
Customer
1
«pk» CustId
Name
CustType
Fact Table City
Ship Calendar State Province
1 «fk» CustID * Country
«pk» ShipDateID «fk» ShipDateID
Ship Date * «fk» BindID
Ship Month
«dd» JobID * Bind Style
Ship Quarter
Ship Year Cost
Ship Day of Week Sell «pk» BindId
1 Bind Desc
Bind Category
Country
State Province
Cust Type
City
Customer
Ship Year
57
Dimensional Design Process
Select a Business
Determine
Choose
Identify
«fk» shape id
«fk» color id
«fk» texture id
Texture Density
«fk» density id
«fk» size id
«fk» estimate date id
«dd» estimate number
Size «fk» win date id Estimate Date
«dd» job number
«fk» customer id
«fk» promotion id
Win Date «fk» cost center id Customer
widget quantity
estimated hours
hourly rate
estimated cost
Promotion markup Cost Center
discount
price
58
Color
«pk» color id
color description
hue
intensity
glows in dark
59
Productivity Detail
Cost Center
«dd» job number Actual Start Date
«fk» cost center id
Employee «fk» employee id
«fk» actual start date id Actual Start Time
«fk» actual start time id
«fk» actual finish date id
«fk» actual finish time id Actual Finish Date
widget quantity
finished on time
estimated hours Actual Finish Time
actual hours
Invoice Date
Cost Center
Promotion
Employee
Customer
Win Date
Density
Texture
Shape
Color
Size
Estimating x x x x x x x x x x
Scheduling x x x x x x x x x
Productivity Tracking x x x x x x
Job Costing x x x x x x x x x
60
Shape Job Costing Detail Color
«fk» shape id
«fk» color id
«fk» texture id
Texture Density
«fk» density id
«fk» size id
«fk» estimate date id
«dd» estimate number
Size «fk» win date id Estimate Date
«dd» job number
«fk» customer id
«fk» promotion id
Win Date «fk» cost center id Customer
«fk» invoice date id
widget quantity
estimated hours
hourly rate
Promotion estimated cost Cost Center
markup
discount
price
actual hours
Invoice Date actual cost
61
On-Line Analytical Processing (OLAP)
(4, 2) (3, 3)
(4, 3)
d
Possible views = Π hi (8.1)
i=1
0.1 P=0.9
{} 1
{c, s} 0x2=0
{s} 0.79M x 2 = 1.58M
{c} 5.9M x 2 = 11.8M
{} 6M - 1
63
64
IX. CASE Tools for Logical Database Design
65
Similarly these tools produce the transformation table types described in Chapter 5:
• An entity table with the same information content as the original entity.
• An entity table with the embedded foreign key of the parent entity.
• A relationship table with the foreign keys of all the entities in the
relationship.
Figure 9.3 Rational Data Architect entity relationship modeling (courtesy IBM)
66
Generating a database from a design
Database support
Collaborative support
1. Concurrency control.
2. Merge and collaboration capabilities that allow designers to combine designs
or merge their latest changes into a larger design project.
Distributed development
Application life cycle tooling integration
Design compliance checking
Design and Normalization
Discover 1st 2nd 3rd normalization
Index and Storage
check for excessive indexing
Naming Standards
Security Compliance
Sarbanes-Oxley compliance
Valid data model and rules
Model syntax checks
Reporting
Modeling a Data Warehouse
Semi-Structured data, XML
##
67