Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau

Database Modeling and Design: Logical Design
4th Edition
Toby Teorey, Sam Lightstone, Tom Nadeau
Lecture Notes
Contents
I. Introduction ................................................................………...……2
Relational database life cycle 3
Characteristics of a good database design process 6
II. The Entity-Relationship (ER) Model …………...……………….7
Basic ER concepts 7
Ternary relationships 11
III. The Unified Modeling Language (UML)………...…………….13

Class diagrams 13
Activity diagrams 19
Rules of thumb for UML 21
IV. Requirements Analysis and Conceptual Data Modeling….…..22

Requirements analysis 22
Conceptual data modeling 24
View integration methods 25
Entity Clustering 30
V. Transforming the Conceptual Model to SQL…………...………32

VI. Normalization and normal forms ………………………………38
First normal form to third normal form (3NF) and BCNF 38
3NF synthesis algorithm (Bernstein) 43
VII. An Example of Logical Database Design………………………48
VIII. Business Intelligence………………………………..……….....52

Data warehousing 52
On-line analytical processing (OLAP) 58
IX. CASE Tools for Logical Database Design……………………….60
1
I. Introduction
Introductory Concepts
data—a fact, something upon which an inference is based (information or knowledge has
value, data has cost)
data item—smallest named unit of data that has meaning in the real world (examples:
last name, address, ssn, political party)
data aggregate (or group) -- a collection of related data items that form a
whole concept; a simple group is a fixed collection, e.g. date (month, day, year); a
repeating group is a variable length collection, e.g. a set of aliases.
record—group of related data items treated as a unit by an application program

(examples: presidents, elections, congresses)
file—collection of records of a single type (examples: president, election)
database—collection of interrelated stored data that serves the needs of multiple users
within one or more organizations; a collection of tables in the relational model.
database management system (DBMS) -- a generalized software system for storing

and manipulating databases. Includes logical view (schema, sub-schema), physical view
(access methods, clustering), data manipulation language, data definition language,
utilities - security, recovery, integrity, etc.
database administrator (DBA) -- person or group responsible for the effective use of
database technology in an organization or enterprise.
Objectives of Database Management

1. Data availability—make an integrated collection of data available to a wide variety of
users
* at reasonable cost—performance in query update, eliminate or control data
redundancy
* in meaningful format—data definition language, data dictionary
* easy access—query language (4GL, SQL, forms, windows, menus);
embedded SQL, etc.; utilities for editing, report generation, sorting
2. Data integrity—insure correctness and validity

* checkpoint/restart/recovery
* concurrency control and multi-user updates
* accounting, audit trail (financial, legal)
3. Privacy (the goal) and security (the means)

* schema/sub-schema, passwords
2
4. Management control—DBA: lifecycle control, training, maintenance
5. Data independence (a relative term) -- avoids reprogramming of applications, allows

easier conversion and reorganization
* physical data independence—program unaffected by changes in the storage structure or

access methods
* logical data independence—program unaffected by changes in the schema
* Social Security Administration example

- changed benefit checks from $999.99 to $9999.99 format
- had to change 600 application programs
- 20,000 work hours needed to make the changes (10 work years)
*Y2K (year 2000) problem—many systems store 2-digit years (e.g. ‘02-OCT-98’) in
their programs and databases, that give incorrect results when used in date arithmetic
(especially subtraction), so that ‘00’ is still interpreted as 1900 rather than 2000. Fixing
this problem requires many hours of reprogramming and database alterations for many
companies and government agencies.
Relational Database Lifecycle

1. Requirements analysis
* natural data relationships (process-independent)
* usage requirements (process-dependent)
* hardware/software platform (OS, DBMS)
* performance and integrity constraints
* result: requirements specification document, data dictionary entries
2. Logical database design

2.1 Conceptual data modeling
2.2 View integration
2.3 Transformation of the conceptual model to SQL tables
2.4 Normalization of SQL tables
*result: global database schema, transformed to table definitions
3. Physical database design

* index selection
* materialized views, clustering, partitioning, denormalization
*data distribution over the network
4. Database implementation, monitoring, and modification
3
4
Figure 1.2
5
Characteristics of a Good Database Design Process
* Iterative requirements analysis
- interview top-down
- use simple models for data flow and data relationships
- verify model
* Stepwise refinement and iterative re-design
* Well-defined design review team and process

-database designers
-DBMS software group
-end users in the application areas when to review
* When to do a design review

- after requirements analysis & conceptual design
- after physical design
- after implementation (tuning) meeting format
*Design review rules

- short documentation in advance
- formal presentation
- criticize product, not person
- goal is to locate problems, do solutions off line
6
II. Entity-Relationship (ER) Model
Basic ER Modeling Concepts
Entity - a class of real world objects having common characteristics and properties about
which we wish to record information.
Relationship - an association among two or more entities
* occurrence - instance of a relationship is the collective instances of the related entities

* degree - number of entities associated in the relationship (binary, ternary, other n-ary)
* connectivity - one-to-one, one-to-many, many-to-many
* existence dependency (constraint) - optional/mandatory
Attribute - a characteristic of an entity or relationship
* Identifier - uniquely determines an instance of an entity

* Identity dependence - when a portion of an identifier is inherited from another entity
* Multi-valued - same attribute having many values for one entity
* Surrogate - system created and controlled unique key (e.g. Oracle’s “create sequence”)
7
Figure 2.1
8
Figure 2.2
9
Figure 2.3
10
Super-class (super-type)/subclass (subtype) relationship
Generalization
* similarities are generalized to a super-class entity, differences are specialized to a subclass
called an “ISA” relationship (“specialization” is the inverse relationship)
* disjointness constraint - there is no overlap among subclasses
* completeness constraint - constrains subclasses to be all-inclusive of the super-class or not

total or partial coverage of the superclass)
* special property: hierarchical in nature
* special property: inheritance - subclass inherits the primary key of the super-class, super-c
common nonkey attributes, each subclass has specialized non-key attributes
Aggregation
* “part-of” relationship among entities to a higher type aggregate entity (“contains” is the in
relationship)
* attributes within an entity, data aggregate (mo-day-year)
* entity clustering variation: membership or “is-member-of” relationship
11
Figure 2.4
Constraints in ER modeling
* exclusion constraint - restricts an entity to be related to only of several other
* entities at a given point in time
- mandatory/optional
- specifies lower bound of connectivity of entity instances
- participating in a relationship as 1 or 0
* uniqueness constraint – one-to-one functional dependency among key attributes

in a relationship: binary, ternary, or higher n-ary
12
Ternary Relationships
Figure 2.6
13
14
III. The Unified Modeling Language (UML)
Class Diagrams
Figure 3.1 Basic UML class diagram constructs
15
Degree
manag 1
reflexive
association Employe *
managed
binary * 1
association Department Division
ternary skill assignme

Skill Project
association * *
*
Employee
Multiplicit
i one-to-one manager
Department Employe
1 1
one-to- Department 1 Employe
*
WorkAssignmen
task-
assignment
many-to-
Employe Project
* *
Existence
manager
optional Department Employe
0..1 1
mandatory occupant
Office Employe
1 0..*
Figure 3.2 Selected UML relationship types (parallel to Figure 2.2)
16
Employee
Manage Engine Technicia Secreta
Individual
Complete
enumeration
of sub-
Employee Customer
EmpCust
Figure 3.3 UML generalization constructs (parallel to Figure 2.4)
Software
Progra Electronic
Course
Teacher Text
Figure 3.4 UML aggregation constructs (parallel to Figure 2.5)
17
Figure 3.5 UML n-ary relationship (parallel to Figure 2.7)
Primary key as a Car

t t
«pk» vin
mileage
color
Composition Invoice
example with
«pk» inv_num
customer_id
inv date
1 .. *
LineItem
«pk» inv_num
«pk»
line_num
description
Figure 3.6 UML constructs illustrating primary keys
Example from the Music Industry
Music Media Distributio

n
Figure 3.7 Example of related packages
18
Musi
Group
Artist Medi Distributio
Composer MusicMedi Studio
Lyricist a Publisher
Musician Album RetailStore
Instrument CD
Song
Rendition
Figure 3.8 Example illustrating classes grouped into packages
19
Figure 3.9 Relationships between classes in the music package
Publishe
Grou
Music Studio
Artis
Produce
Album CD
Track
Rendition
Figure 3.10 Classes of the media package, and related classes
20
Activity Diagrams
Figure 3.11 UML activity diagram constructs
21
Customer Manufacturer
Request Generate
Review quote
[unacceptabl
]
[acceptabl
]
Place Enter
Produce order
Ship order
Receive
Receive Generate
Pay Record
Figure 3.12 UML activity diagram, manufacturing example
22
Rules of Thumb for UML Usage
1. Decide what you wish to communicate first, and then focus your
description. Illustrate the details that further your purpose, and elide the rest..
Be concise.
2. Keep each UML diagram to one page. Diagrams are easier to understand
if they can be seen in one shot.
3. Use the UML when it is useful. Don't feel compelled to write a UML
document just because you feel you need a UML document.
4. Accompany your diagrams with textual descriptions..
5. Take care to clearly organize each diagram. Avoid crossing associations. Group
elements together if there is a connection in your mind.
23
IV. Requirements Analysis and Conceptual Data Modeling
Requirements Analysis
Purpose - identify the real-world situation in enough detail
to be able to define database components. Collect two types of data: natural data (input to
the database) and processing data (output from the database).
Natural data requirements (what goes into the database)
1. Organizational objectives
- sell more cars this year
- move into to recreational vehicle market
2. Information system objectives

- keep track of competitors’ products and prices
- improve quality and timing of data to management regarding production schedule delays,
etc.
- keep track of vital resources needed to produce and market a product
3. Organizational structure/chart
4. Administrative and operational policies

- annual review of employees
- weekly progress reports
- monthly inventory check
- trip expense submission
5. Data elements, relationships, constraints, computing environment
Processing requirements (what comes out of the database)
1. Existing applications - manual, computerized
2. Perceived new applications
* quantifies how data is used by applications
* should be a subset of data identified in the natural relationships

(but may not be due to unforeseen applications)
* problem - many future applications may be unknown
24
Data and Process Dictionary Entries for Requirements Analysis
in the Database Design Lifecycle
Entity Description (possibly in a data dictionary)
Name customer
Reference-no 4201
Cardinality 10,000
Growth rate 100 per month
Synonyms user, buyer
Role (or description) someone who purchases or rents a
product made by the company.
Security level 0 (customer list is public)
Subtypes adults, minors
Key attribute(s) cust-no
Non-key attribute(s) cust-name, addr, phone, payment-status Relationship to
other entities salesperson, order, product
Used in which applications billing, advertising
Attribute description (data elements in a data dictionary)

Name cust-no
Reference-no 4202
Range of legal values 1 to 999,999
Synonyms cno, customer-number
Data type integer
Description customer id number set by the company.
Key or nonkey key
Source of data table of allowable id numbers
Used in applications billing
Attribute trigger /*describes actions that occur when a
data element is queried or updated*/
Relationship description
Name purchase
Reference-no 511037
Degree binary
Entities and connectivity customer(0,n), product(1,n)
Synonyms buy
Attributes (of the relationship) quantity, order-no
Assertions a customer must have purchased at
least one product, but some products
may not have been purchased as yet by
any customers.
Process (application) description

Name payroll
Reference-no 163
Frequency bi-weekly
Priority 10
Deadline noon Fridays
Data elements used emp-name, emp-salary
25
Entities used employee
Data volume (how many entities) implicit from entity cardinality
Interviews at different levels
Top management - business definition, plan/objectives, future plans

Middle management - functions in operational areas, technical areas, job-titles, job functions
Employees - individual tasks, data needed, data out
Specific end-users of a DBMS - applications and data of interest
Basic rules in interviewing
1. Investigate the business first

2. Agree with the interviewee on format for documentation (ERD, DFD, etc.)
3. Define human tasks and known computer applications
4. Develop and verify the flow diagram(s) and ER diagram(s)
5. Relate applications to data (this helps your programmers)
Example: order entry clerk
Function: Take customer orders and either fill them or make adjustments.
Frequency: daily
Task Def Volume Data Elements
1. Create order 2000 A, B, E, H

2. Validate order 2000 A, B, G, H, J
3. Fill out error form 25 A, C
4. Reserve item/price 6000 A, D, H
5. Request alternate items 75 A, E, I, K,M
6. Enter unit price 5925 A, F, J, N
Conceptual Data Modeling
1. Classify entities and attributes

* entities should contain descriptive information
* multivalued attributes should be classified as entities
* attributes should be attached to the entities they most directly describe
2. Identify the generalization hierarchies
3. Define relationships – binary, unary, ternary
26
View Integration Methods
Goal in schema integration
- to create a non-redundant unified (global) conceptual schema
(1) completeness - all components must appear in the global schema
(2) minimality - remove redundant concepts in the global schema
(3) understandability - does global schema make sense?
1. Comparing of schemas
* look for correspondence (identity) among entities
* detect possible conflicts

- naming conflicts
homonyms - same name for different concepts
synonyms - different names for the same concept
- structural conflicts
type conflicts - different modeling construct for the same concept (e. g. “order” as an entity,
attribute, relationship)
- dependency conflicts - connectivity is different for different views (e.g. job-title vs. job-
title-history)
- key conflicts - same concept but different keys are assigned (e.g. ID-no vs. SSN)
- behavioral conflicts - different integrity constraints (e.g. null rules for optional/mandatory:
insert/delete rules)
* determine inter-schema properties

- possible new relationships to combine schemas
- possible abstractions on existing entities or create new super-classes (super-types)
2. Conforming of schemas
* resolve conflicts (often user interaction is required)
* conform or align schemas to make compatible for integration
* transform the schema via

- renaming (homonyms, synonyms, key conflicts)
- type transformations (type or dependency conflicts)
- modify assertions (behavioral conflicts)
27
3. Merging and restructuring
* superimpose entities
* restructure result of superimposition
28
Figure 4.5
29
Figure 4.6
30
Figure 4.7
31
Entity Clustering for ER Models
Motivation
* conceptual (ER) models are difficult to read and understand for large and complex databases, e.g.
10,000 or more data elements
* there is a need for a tool to abstract the conceptual database schema (e. g. clustering of the ER
diagram)
* potential applications
- end user communication
- application design team communication
- documentation of the database conceptual schema (in coordination with the data dictionary)
Clustering Methodology
Given an extended ER diagram for a database.....
Step 1. Define points of grouping within functional areas.
Step 2. Form entity clusters

* group entities within the same functional area
* resolve conflicts by combining at a higher functional grouping
Step 3. Form higher entity clusters.
Step 4. Validate the cluster diagram.

* check for consistency of interfaces.
* end-users must concur with each level.
32
Figure 4.8
33
Figure 4.9
V. Transforming the Conceptual Data Model to SQL
Tables
* Entity – directly to a SQL table
* Many-to-many binary relationship – directly to a SQL table, taking the 2 primary

keys in the 2 entities associated with this relationship as foreign keys in the new table
* One-to-many binary relationship – primary key on “one” side entity copied as a

foreign key in the “many” side entity’s table
* Recursive binary relationship – same rules as other binary relationships
* Ternary relationship – directly to a SQL table, taking the 3 primary keys of the 3
entities associated with this relationship as foreign keys in the new table
* Attribute of an entity – directly to be an attribute of the table transformed from this

entity
* Generalization super-class (super-type) entity – directly to a SQL table
* Generalization subclass (subtype) entity – directly to a SQL table, but with the
primary key of its super-class (super-type) propagated down as a foreign key into its table
34
* Mandatory constraint (1 lower bound) on the “one” side of a one-to-many
relationship – the foreign key in the “many” side table associated with the primary key
in the “one” side table should be set as “not null” (when the lower bound is 0, nulls are
allowed as the default in SQL)
35
36
Figure 5.1
37
Figure 5.3
38
Figure 5.5
39
Figure 5.7
40
VI. Normalization
First normal form (1NF) to third normal form (3NF) and BCNF
Goals of normalization
1. Integrity
2. Maintainability
Side effects of normalization

* Reduced storage space required (usually, but it could increase)
* Simpler queries (sometimes, but some could be more complex)
* Simpler updates (sometimes, but some could be more complex)
First normal form (1NF) -- a table R is in 1NF iff all underlying

domains contain only atomic values, i.e. there are no repeating groups in
a row.
functional dependency—given a table R, a set of attributes B is functionally dependent on

another set of attributes A if at each instant of time each A value is associated with only one B
value. This is denoted by A -> B. A trivial FD is of the form XY --> X (subset).
super-key -- a set of one or more attributes, which, when taken collectively, allows us to identify
uniquely an entity or table.
candidate key—any subset of the attributes of a super-key that is also a super-key, but not
reducible.
primary key -- arbitrarily selected from the set of candidate keys, as needed for indexing.
Third normal form (3NF)

A table is in 3NF if, for every nontrivial FD X --> A, either:
(1) attribute X is a super-key, or
(2) attribute A is a member of a candidate key (prime attribute)
Boyce-Codd normal form (BCNF)

A table is in BCNF if, for every nontrivial FD X --> A,
(1) attribute X is a super-key.
41
First Normal Form (single table)
TABLE SUPPLIER_PART (100k rows, 73 bytes/row => 7.3 MB)
SNUM SNAME STATUS CITY PNUM PNAME WT QTY SHIPDATE
S1 SMITH 20 LONDON P1 NUT 12 3 1-4-90
S1 SMITH 20 LONDON P2 BOLT 22 2 2-17-90
S1 SMITH 20 LONDON P3 WRENCH 27 6 11-5-89
S1 SMITH 20 LONDON P4 WRENCH 24 2 6-30-91
S1 SMITH 20 LONDON P5 CLAMP 22 1 8-12-91
S1 SMITH 20 LONDON P6 LEVEL 19 5 4-21-91
S2 JONES 10 PARIS P1 NUT 12 3 5-3-90
S2 JONES 10 PARIS P2 BOLT 22 4 12-31-90
S3 BLAKE 10 PARIS P3 WRENCH 27 4 3-25-91
S3 BLAKE 10 PARIS P5 CLAMP 22 2 3-27-91
S4 CLARK 20 LONDON P2 BOLT 22 2 10-31-89
S4 CLARK 20 LONDON P4 WRENCH 24 3 7-14-90
S4 CLARK 20 LONDON P5 CLAMP 22 7 8-20-90
S5 ADAMS 30 ATHENS P5 CLAMP 22 5 8-11-91
Functional dependencies
SNUM --> SNAME, STATUS,CITY
CITY --> STATUS
PNUM --> PNAME, WT
SNUM,PNUM,SHIPDATE --> QTY
Attribute sizes (bytes)

SNUM 5 PNAME 10
SNAME 20 WT 5
STATUS 2 QTY 5
CITY 10 SHIPDATE 8
PNUM 8 Total size 73
Third Normal Form
TABLE PART (100 rows, 23 bytes/row => 2.3 KB)

PNUM PNAME WT Functional dependencies
P1 NUT 12 PNUM --> PNAME, WT
P2 BOLT 17
P3 WRENCH 17
P4 WRENCH 24
P5 CLAMP 12
P6 LEVEL 19
TABLE SHIPMENT (100k rows, 26 bytes/row => 2.6 MB)

SNUM PNUM QTY SHIPDATE Functional dependency
42
S1 P1 3 1-4-90 SNUM, PNUM, SHIPDATE--> QTY
S1 P2 2 2-17-90
S1 P3 6 11-5-89
S1 P4 2 6-30-90
S1 P5 1 8-12-91
S1 P6 5 4-21-91
S2 P1 3 5-3-90
S2 P2 4 12-31-90
S3 P3 4 3-25-91
S3 P5 2 3-27-91
S4 P2 2 10-31-89
S4 P4 3 7-14-90
S4 P5 7 8-20-90
S5 P5 5 8-11-91
NOT Third Normal Form
TABLE SUPPLIER (200 rows, 37 bytes/row => 7.4 KB)

SNUM SNAME STATUS CITY Functional dependencies
S1 SMITH 20 LONDON SNUM --> SNAME, STATUS, CITY
S2 JONES 10 PARIS CITY --> STATUS
S3 BLAKE 10 PARIS
S4 CLARK 20 LONDON
S5 ADAMS 30 ATHENS
Decomposition of Table Supplier into two Third Normal Form (3NF) Tables
43
Third Normal Form
TABLE SUPPLIER_W/O_STATUS (200 rows, 35 bytes/row => 7 KB)

SNUM SNAME CITY Functional dependency
S1 SMITH LONDON SNUM --> SNAME, CITY
S2 JONES PARIS
S3 BLAKE PARIS
S4 CLARK LONDON
S5 ADAMS ATHENS
TABLE CITY_AND_STATUS (100 rows, 12 bytes/row => 1.2 KB)

CITY STATUS Functional dependency
LONDON 20 CITY --> STATUS
PARIS 10
ATHENS 30
In general, the FDs can be derived from

1. Explicit assertions given
2. ER diagram (implied by ER constructs)
3. Intuition (your experience with the problem data)
44
Functional Dependency Inference rules
(Armstrong’s Axioms)
1. Reflexivity
If Y is a subset of the attributes of X, then X->Y.
X = ABCD, Y = ABC => X->Y
X->X trivial case
2. Augmentation
If X->Y and Z is a subset of table R (i.e. Z is any set of attributes in R), then XZ -> YZ .
3. Transitivity
If X->Y and Y->Z then X->Z.
4. Pseudo-transitivity
If X->Y and YW->Z then XW->Z.
(transitivity is a special case of pseudo-transitivity when W is null)
5. Union
If X->Y and X->Z then X->YZ.
6. Decomposition
If X->YZ then X->Y and X->Z.
Superkey Rule 1. Any FD involving all attributes of a table defines

a super-key on the LHS of the FD.
Given: any FD containing all attributes in the table R(W,X,Y,Z), i.e. XY -> WZ.
Proof:
(1) XY -> WZ given
(2) XY -> XY by the reflexivity axiom
(3) XY -> XYWZ by the union axiom
(4) XY uniquely determines every attribute in table R, as shown in (3)
(5) XY uniquely defines table R, by the definition of a table as having no duplicate rows
(6) XY is therefore a super-key, by the definition of a super-key.
Super-key Rule 2. Any attribute that functionally determines a

Super-key of a table, is also a super-key for that table.
Given: Attribute A is a super-key for table R(A,B,C,D,E), and E -> A.

Proof:
(1) Attribute A uniquely defines each row in table R, by the def. of a super-key
(2) A -> ABCDE by the definition of a super-key and a relational table
(3) E -> A given
(4) E -> ABCDE by the transitivity axiom
(5) E is a super-key for table R, by the definition of a super-key.
45
3NF Synthesis Algorithm (Bernstein)
Basic definitions
g e H set of FDs
H+ closure of H - set of all FDs derivable from H using all the FD inference rules
H’ cover of H - any set of FDs from which every FD in H+ can

be derived
H’(non-redundant) – non-redundant cover of H, i.e. a cover which contains no proper subset

which is also a cover. Can be determined with quadratic complexity O(n2).
Example
Given a set of FDs H, determine a minimal set of tables in 3NF,
while preserving all FDs and maintaining only lossless decomposition/joins.
H: AB->C DM->NP D->KL
A->DEFG D->M
E->G L->D
F->DJ PR->S
G->DI PQR->ST
Step 1: Eliminate any extraneous attributes in the left hand

sides of the FDs. We want to reduce the left hand sides of as many FDs as possible.
In general: XY->Z and X->Z => Y is extraneous (Reduction Rule 1)
XYZ->W and X->Y => Y is extraneous (Reduction Rule 2)
For this example we mix left side reduction with the union and decomposition axioms:
DM->NP => D->NP => D -> MNP
D->M D->M
PQR->ST => PQR->S, PQR->T => PQR->.T

PR->S PR->S PR->S
Step 2: Find a non-redundant cover H’ of H, i.e. eliminate any FD

derivable from others in H using the inference rules (most frequently the transitivity axiom).
A->E->G => eliminate A->G from the cover
A->F->D => eliminate A->D from the cover
Step 3: Partition H’ into tables such that all FDs with the
same left side are in one table, thus eliminating any non-fully functional FDs. (Note: creating
tables at this point would be a feasible solution for 3NF, but not necessarily minimal.)
R1: AB->C R4: G->DI R7: L->D
R2: A->EF R5: F->DJ R8: PQR->T
R3: E->G R6: D->KLMNP
Step 4: Merge equivalent keys, i.e. merge tables where all FD’s satisfy 3NF.
46
4.1 Write out the closure of all LHS attributes resulting from Step 3, based on transitivities.
4.2 Using the closures, find tables that are subsets of other groups and try to merge them. Use Rule
1 and Rule 2 to establish if the merge will result in FDs with super-keys on the LHS. If not, try
using the axioms to modify the FDs to fit the definition of super-keys.
4.3 After the subsets are exhausted, look for any overlaps among tables and apply Rules 1 and 2
(and the axioms) again.
In this example, note that R7 (L->D) has a subset of the attributes of R6 (D->KLMNP).
Therefore we merge to a single table with FDs D->KLMNP, L->D because it satisfies 3NF: D is
a super-key by Rule 1 and L is a super-key by Rule 2.
Final 3NF (and BCNF) table attributes, FDs, and candidate keys:
R1: ABC (AB->C with key AB) R5: DFJ (F->DJ with key F)
R2: AEF (A->EF with key A) R6: DKLMNP (D->KLMNP, L->D, w/keys D, L)
R3: EG (E->G with key E) R7: PQRT (PQR->T with key PQR)
R4: DGI (G->DI with key G) R8: PRS (PR->S with key PR)
Step 4a. Check to see whether all tables are also BCNF. For any table that is not BCNF, add the
appropriate partially redundant table to eliminate the delete anomaly.
47
Maier’s Example using 3NF Synthesis
[Maier,D. The Theory of Relational Databases, Computer Science Press, 1983]
R = {A,B,C,D,E,F,G,H,I,J,K }
Functional dependencies (FDs):
(1) E --> A B C D F G H I J K (7) H I --> J
(2) A B C --> E D F G H I J K (8) I J --> H
(3) A B D --> E C F G H I J K (9) H J --> I
(4) G --> H I J
(5) C F --> K
(6) D F --> K
Step 1 - No reduction of determinants necessary.

Step 2 - Find non-redundant cover.
(4) G->HIJ => eliminate HIJ from (1), (2), and (3)
(7) HI->J => reduce (4) to G->HI, eliminating J from (4)
(5) CF -> K => eliminate K from (1) and (3)
(6) DF->K => eliminate K from (2)
(1) E->DFG => eliminate DFG from (2)
(1) E->CFG => eliminate CFG from (3)
Step 3 - Partition into groups with the same left side.

G1: E->ABCDFG G6: DF->K
G2: ABC->E G7: HI->J
G3: ABD->E G8: IJ->H
G4: G->HI G9: HJ->I
G5: CF->K
Step 4 - Merge equivalent keys, forming new groups. Construct final set of tables, attributes, FDs, and
candidate keys.
R1: ABCDEFG ( E->ABCDFG, ABC->E, ABD->E with keys E, ABC, ABD)
R2: GHI (G->HI with key G)
R3: CFK (CF->K with key CF)
R4: DFK (DF->K with key DF
R5: HIJ (HI->J, IJ->H, HJ->I with keys HI, IJ, HJ)
48
Example of a 3NF table that is not BCNF,
i.e. it has further anomalies:

S = student, C = course, I = instructor
SC -> I For each course, each student is taught by only one instructor. A course may be taught by more
than one instructor.
I -> C Each instructor teaches only one course.
This table is 3NF with a candidate key SC:

SCI student course instructor
Sutton Math Von Neumann
Sutton Journalism Murrow
Niven Math Von Neumann
Niven Physics Fermi
Wilson Physics Einstein
Delete anomaly: If Sutton drops Journalism, then we have no record of Murrow teaching Journalism.
How can we decompose this table into BCNF?
Decomposition 1 (bad)........eliminates the delete anomaly

SC (no FDs) and I -> C (two tables)
Problems - 1. lossy join
2. dependency SC -> I is not preserved
SC student course IC instructor course

Sutton Math Von Neumann Math
Sutton Journalism Murrow Journalism
Niven Math Fermi Physics
Niven Physics Einstein Physics
Wilson Physics
----------------join SC and IC ------------------

SCI’ student course instructor
Sutton Journalism Murrow
Niven Math Von Neumann
Niven Physics Fermi
Niven Physics Einstein (spurious row)
Wilson Physics Fermi (spurious row)
Wilson Physics Einstein
49
Decomposition 2 (better).....eliminates the delete anomaly
SI (no FD) and I -> C
Advantages – eliminates the delete anomaly, lossless
Disadvantage - dependency SC -> I is not preserved
SI student instructor IC instructor course

Sutton Von Neumann Von Neumann Math
Sutton Murrow Murrow Journalism
Niven Von Neumann Fermi Physics
Niven Fermi Einstein Physics
Wilson Einstein Dantzig Math (new)
Sutton Dantzig (new)
The new row is allowed in SI using unique(student,instructor) in the create table command, and the
join of SI and IC is lossless. However, a join of SI and IC now produces the following two rows:
student course instructor
Sutton Math Dantzig which violates the FD SC -> I.
Oracle, for instance, has no way to automatically check SC->I, although you could write a procedure to
do this at the expense of a lot of overhead.
Decomposition 3 (tradeoff between integrity and performance)

SC -> I and I -> C (two tables with redundant data)
Problems -extra updates and storage cost
50
VII. An Example of Logical Database Design
Requirements Specification
The management of a large retail store would like a database to keep track of sales
activities. The requirements analysis for this database led to the following six entities and
their unique identifiers:
Entity Entity key Key length(max) No. of

in characters occurrences
Customer cust-no 6 80,000

Job job-no 24 80
Order order-no 9 200,000
Salesperson sales-id 20 150
Department dept-no 2 10
Item item-no 6 5,000
The following assertions describe the data relationships:
• Each customer has one job-title, but different customers may have the same job-title.
• Each customer may place many orders, but only one customer may place a particular
order.
• Each department has many salespeople, but each salesperson must work in only one
department.
• Each department has many items for sale, but each item is sold in only one
department. (Item means item type, like IBM PC).
• For each order, items ordered in different departments must involve different
salespeople, but all items ordered within one department must be handled by
exactly one salesperson. In other words, for each order, each item has exactly one
salesperson; and for each order, each department has exactly one salesperson.
ER Construct FDs
Customer(many):Job(one) cust-no -> job-title
Order(many): Customer(one) order-no -> cust-no
Salesperson(many): Department(one) sales-id -> dept-no
Item(many): Department(one) item-no -> dept-no
Order(many): Item(many): order-no,item-no->sales-id
Salesperson(one)
Order(many): Department(many): order-no,dept-no-> sales-id
Salesperson(one)
51
Figure 7.1
create table customer

(cust_no char(6),
job_title varchar(256),
primary key (cust_no),
foreign key (job_title) references job
on delete set null on update cascade);
create table job

(job_no char(6),
job_title varchar(256),
primary key (job_no));
create table order

(order_no char(9),
cust_no char(6) not null,
primary key (order_no),
foreign key (cust_no) references customer
create table salesperson

(sales_id char(10)
sales_name varchar(256),
dept_no char(2),
primary key (sales_id),
52
foreign key (dept_no) references department
create table department

(dept_no char(2),
dept_name varchar(256),
manager_name varchar(256),
primary key (dept_no));
create table item

(item_no char(6),
dept_no char(2),
primary key (item_no),
create table order_item_sales

(order_no char(9),
item_no char(6),
sales_id varchar(256) not null,
primary key (order_no, item_no),
foreign key (order_no) references order
on delete cascade on update cascade,
foreign key (item_no) references item
foreign key (sales_id) references salesperson
on delete cascade on update cascade);
create table order_dept_sales

(order_no char(9),
dept_no char(2),
sales_id varchar(256) not null,
primary key (order_no, dept_no),
foreign key (order_no) references order
foreign key (sales_id) references salesperson
on delete cascade on update cascade);
Table Primary key Likely non-keys
customer cust_no job_title, cust_name, cust_address

order order_no cust_no, item_no, date_of_purchase,
price
salesperson sales_id dept_no, sales_name, phone_no
item item_no dept_no, color, model_no
53
order_item_sales order_no,item_no sales_id
order_dept_sales order_no,dept_no sales_id
54
55
VIII. Business Intelligence
Data Warehousing
Table 8.1 Comparison between OLTP and data warehouse databases
OLTP Data warehouse
Transaction oriented Business process oriented

Thousands of users Few users (typically under 100)
Generally small (MB up to several GB) Large (hundreds of GB up to
several TB)
Current data Historical data
Normalized data Denormalized data
(many tables, few columns per table) (few tables, many columns per table)
Continuous updates Batch updates*
Simple to complex queries Usually very complex queries
Operational Operational Operational

Applications Applications Applications
feeder feeder feeder

DB1 DB2 DB3
Extract Extract Extract
Staging Area
Transform
Load
Report Generators OLAP
Data
Warehouse
Ad Hoc Query Tools Data Mining
Figure 8.1 Basic data warehouse architecture
Core Requirements for Data Warehousing

1. DWs are organized around subject areas.
2. DWs should have some integration capability.
3. The data is considered to be nonvolatile and should be mass loaded.
4. Data tends to exist at multiple levels of granularity. Most important, the data tends to
be of a historical nature, with potentially high time variance.
5. The DW should be flexible enough to meet changing requirements rapidly.
6. The DW should have a capability for rewriting history, that is, allowing for “what-if”
analysis.
7. A usable DW user interface should be selected.
8. Data should be either centralized or distributed physically.
56
Logical Design
Customer
1
«pk» CustId
Name
CustType
Fact Table City
Ship Calendar State Province
1 «fk» CustID * Country
«pk» ShipDateID «fk» ShipDateID
Ship Date * «fk» BindID
Ship Month
«dd» JobID * Bind Style
Ship Quarter
Ship Year Cost
Ship Day of Week Sell «pk» BindId
1 Bind Desc
Bind Category
Figure 8.3 Example star schema for a data warehouse
Country
State Province
Cust Type
City
Ship Day of Week
Customer
Ship Date Fact Table
Ship Month Bind Style
Ship Quarter Bind Category
Ship Year
Figure 8.4 Example snow flake schema for a data warehouse
57
Dimensional Design Process
Select a Business
Determine
Choose
Identify
Figure 8.5 Four step dimensional design process [Kimb02]
Dimensional Modeling Example
Shape Estimating Detail Color
«fk» shape id
«fk» color id
«fk» texture id
Texture Density
«fk» density id
«fk» size id
«fk» estimate date id
«dd» estimate number
Size «fk» win date id Estimate Date
«dd» job number
«fk» customer id
«fk» promotion id
Win Date «fk» cost center id Customer
widget quantity
estimated hours
hourly rate
estimated cost
Promotion markup Cost Center
discount
price
Figure 8.6 Star schema for estimating process
58
Color
«pk» color id
color description
hue
intensity
glows in dark
Figure 8.7 Color dimension showing attributes
Date Estimate Date Win Date
«pk» date id «pk» estimate date id «pk» win date id

date description estimate date description win date description
month estimate month win month
quarter estimate quarter win quarter
year estimate year win year
day of week estimate day of week win day of week
Figure 8.8 Date dimensions showing attributes
Cost Center Scheduling Detail
«dd» job number

«fk» cost center id
Sched Start Date
«fk» sched start date id
Actual Start Date
«fk» sched start time id
Sched Start Time «fk» sched finish date id
«fk» sched finish time id Actual Start Time
«fk» actual start date id
Sched Finish Date
«fk» actual start time id
Actual Finish Date
«fk» actual finish date id
Sched Finish Time «fk» actual finish time id
finished on time Actual Finish Time
estimated hours
actual hours
Figure 8.9 Star schema for the scheduling process
59
Productivity Detail
Cost Center
«dd» job number Actual Start Date
«fk» cost center id
Employee «fk» employee id
«fk» actual start date id Actual Start Time
«fk» actual start time id
«fk» actual finish date id
«fk» actual finish time id Actual Finish Date
widget quantity
finished on time
estimated hours Actual Finish Time
actual hours
Figure 8.10 Star schema for the productivity tracking process
Table 8.2 Data warehouse bus for widget example
Actual Finish Time

Sched Finish Time
Actual Finish Date

Sched Finish Date
Actual Start Time

Sched Start Time
Actual Start Date

Sched Start Date
Estimate Date
Invoice Date
Cost Center
Promotion
Employee
Customer
Win Date
Density
Texture
Shape
Color
Size
Estimating x x x x x x x x x x
Scheduling x x x x x x x x x
Productivity Tracking x x x x x x
Job Costing x x x x x x x x x
60
Shape Job Costing Detail Color
«fk» shape id
«fk» color id
«fk» texture id
Texture Density
«fk» density id
«fk» size id
«fk» estimate date id
«dd» estimate number
Size «fk» win date id Estimate Date
«dd» job number
«fk» customer id
«fk» promotion id
Win Date «fk» cost center id Customer
«fk» invoice date id
widget quantity
estimated hours
hourly rate
Promotion estimated cost Cost Center
markup
discount
price
actual hours
Invoice Date actual cost
Figure 8.11 Star schema for the job costing process
Job Costing Daily Snapshot
Invoice Date «fk» invoice date id

widget quantity
estimated hours
estimated cost
price
actual hours
actual cost
Figure 8.12 Schema for the job costing daily snapshot
61
On-Line Analytical Processing (OLAP)
Calendar Dimension Fact Table Customer Dimension

(first ordinate) (second ordinate)
0: date id (0, 0) 0: cust id
1: month 1: city
2: quarter 2: state
3: year 3: all
(1, 0) (0, 1)
4: all
(2, 0) (1, 1) (0, 2)
(3, 0) (2, 1) (1, 2) (0, 3)
(4, 0) (3, 1) (2, 2) (1, 3)
(4, 1) (3, 2) (2, 3)
(4, 2) (3, 3)
(4, 3)
Figure 8.13 Product graph labeled with aggregation level coordinates
d
Possible views = Π hi (8.1)
i=1
If we express Equation 8.1 in different terms, the problem of exponential explosion

becomes more apparent. Let g be the geometric mean of the number of hierarchical levels
in the dimensions. Then Equation 8.1 becomes Equation 8.2.
Possible views = gd (8.2)
0.1 P=0.9
0.1 0.9 0.1 0.9
0.1 0.9 0.1 0.9 0.1 0.9 0.1 0.9

62
0.001 0.009 0.009 0.081 0.009 0.081 0.081 0.729

Selection of Materialized Views
Fact Table c = Customer

{c, p, s} 6M p = Part
s = Supplier
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Figure 8.16 Example hypercube lattice structure
Table 8.3 Two iterations of HRU, based on Figure 8.16
Iteration 1 Benefit Iteration 2 Benefit

{p, s} 5.2M x 4 = 20.8M
{c, s} 0x4=0 0x2=0
{c, p} 0x4=0 0x2=0
{s} 5.99M x 2 = 11.98M 0.79M x 2 = 1.58M
{p} 5.8M x 2 = 11.6M 0.6M x 2 = 1.2M
{c} 5.9M x 2 = 11.8M 5.9M x 2 = 11.8M
{} 6M - 1 0.8M - 1
Table 8.4 First iteration of PGA, based on Figure 8.16
Candidates Iteration 1 Benefit

{p, s} 5.2M x 4 = 20.8M
{s} 5.99M x 2 = 11.98M
{} 6M - 1
Table 8.5 Second iteration of PGA, based on Figure 8.16
Candidates Iteration 2 Benefit
{c, s} 0x2=0
{s} 0.79M x 2 = 1.58M
{c} 5.9M x 2 = 11.8M
{} 6M - 1
63
64
IX. CASE Tools for Logical Database Design
Software tools provide significant value to database designers by

a) Dramatically reduced the complexity of conceptual and logical design, both
of which can be rather subtle to do well.
b) Automating the transformation of the logical design to a physical design (at
least the basic physical design).) to create the physical database.
c) Providing reporting, round trip engineering and reverse engineering that
make such tools invaluable in maintaining systems over a long period of
time.
Key Capabilities to Watch For

• Complete round trip engineering
• UML design
• Schema evolution, Change management
• Reverse engineering of existing systems
• Team support, allowing multiple people to work on the same project concurrently.
• Integrated with Eclipse and .NET and other tooling products
• Component and convention re-use (being able to reuse naming standard, domain,
logical models over multiple design projects)
• Reusable assets (extensibility, template, ...)
• Reporting
Basic Transformations of the Conceptual Model to SQL Tables

• Transform each entity into a table containing the key and nonkey attributes
of the entity.
• Transform every many-to-many binary or binary recursive relationship into
a relationship table with the keys of the entities and the attributes of the
relationship.
• Transform every ternary or higher-level n-ary relationship into a
relationship table.
65
Similarly these tools produce the transformation table types described in Chapter 5:
• An entity table with the same information content as the original entity.
• An entity table with the embedded foreign key of the parent entity.
• A relationship table with the foreign keys of all the entities in the
relationship.
Figure 9.3 Rational Data Architect entity relationship modeling (courtesy IBM)
66
Generating a database from a design
Database support
Collaborative support
1. Concurrency control.
2. Merge and collaboration capabilities that allow designers to combine designs
or merge their latest changes into a larger design project.
Distributed development
Application life cycle tooling integration
Design compliance checking
Design and Normalization
Discover 1st 2nd 3rd normalization
Index and Storage
check for excessive indexing
Naming Standards
Security Compliance
Sarbanes-Oxley compliance
Valid data model and rules
Model syntax checks
Reporting
Modeling a Data Warehouse
Semi-Structured data, XML
##
67

Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau

Uploaded by

Copyright:

Available Formats

Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Database Modeling and Design: Logical Design: Toby Teorey, Sam Lightstone, Tom Nadeau

Uploaded by

Copyright:

Available Formats

Database Modeling and Design: Logical Design

Toby Teorey, Sam Lightstone, Tom Nadeau

III. The Unified Modeling Language (UML)………...…………….13

IV. Requirements Analysis and Conceptual Data Modeling….…..22

V. Transforming the Conceptual Model to SQL…………...………32

VII. An Example of Logical Database Design………………………48

VIII. Business Intelligence………………………………..……….....52

IX. CASE Tools for Logical Database Design……………………….60

record—group of related data items treated as a unit by an application program

file—collection of records of a single type (examples: president, election)

database management system (DBMS) -- a generalized software system for storing

Objectives of Database Management

2. Data integrity—insure correctness and validity

3. Privacy (the goal) and security (the means)

5. Data independence (a relative term) -- avoids reprogramming of applications, allows

* physical data independence—program unaffected by changes in the storage structure or

* logical data independence—program unaffected by changes in the schema

* Social Security Administration example

Relational Database Lifecycle

2. Logical database design

3. Physical database design

4. Database implementation, monitoring, and modification

* Stepwise refinement and iterative re-design

* Well-defined design review team and process

* When to do a design review

*Design review rules

Basic ER Modeling Concepts

Relationship - an association among two or more entities

* occurrence - instance of a relationship is the collective instances of the related entities

Attribute - a characteristic of an entity or relationship

* Identifier - uniquely determines an instance of an entity

* disjointness constraint - there is no overlap among subclasses

* completeness constraint - constrains subclasses to be all-inclusive of the super-class or not

* special property: hierarchical in nature

* attributes within an entity, data aggregate (mo-day-year)

* entity clustering variation: membership or “is-member-of” relationship

* entities at a given point in time

* uniqueness constraint – one-to-one functional dependency among key attributes

Figure 3.1 Basic UML class diagram constructs

ternary skill assignme

Figure 3.2 Selected UML relationship types (parallel to Figure 2.2)

Manage Engine Technicia Secreta

Figure 3.3 UML generalization constructs (parallel to Figure 2.4)

Figure 3.4 UML aggregation constructs (parallel to Figure 2.5)

Primary key as a Car

Figure 3.6 UML constructs illustrating primary keys

Example from the Music Industry

Music Media Distributio

Figure 3.8 Example illustrating classes grouped into packages

Figure 3.11 UML activity diagram constructs

Figure 3.12 UML activity diagram, manufacturing example

4. Accompany your diagrams with textual descriptions..

Natural data requirements (what goes into the database)

2. Information system objectives

4. Administrative and operational policies

5. Data elements, relationships, constraints, computing environment

Processing requirements (what comes out of the database)

1. Existing applications - manual, computerized

2. Perceived new applications