er

Download as pdf or txt
Download as pdf or txt
You are on page 1of 60

The Entity-Relationship (ER)

Model

Lecture 2
INFS614

INFS614, GMU, Lecture 2 1


Administrivia
❖ Everyone have texts, Oracle accounts?
❖ Homework #1 is on the website.
❖ Lectures will follow text, but I may add
some materials. I’ll try to let you know
when.
– You are responsible for both on tests.
❖ Check online syllabus for updates

INFS614, GMU, Lecture 2 2


Database Design Overview
2) Conceptual 3) Schema
Design (this lecture) Design
(next lecture)

R1 R2 R3

CREATE TABLE ....


Feature not modeled: xxxxx

1) Requirements
4) Physical
Analysis
Design
requirement

4 Stages
requirement
requirement
requirement
requirement
requirement Queries ->
requirement
requirement
requirement
requirement
requirement
requirement
R1 R2 R3
requirement
requirement
requirement
Indices, Replication,...

INFS614, GMU, Lecture 2 3


Database Design Overview
❖ Stage 1: Requirements analysis
– lots of interviews! lots of listening/notetaking!
– What queries do your users ask? What analyses are
being performed? What decisions must be made &
what information is needed to make them?
– What data sources are available to address all this?
◆ data sources have “personalities”: pond, fire hose, un-dug
well, muddy stream, vast ocean, ...
– what are the integrity requirements for this info?
– hopefully this converges to one set of requirements!
◆ but what if Bob wants weekly salary after taxes, and Jane
wants monthly gross earnings?
◆ but what what if some information systems already exist??
– conflicts lead to an “integration phase”
INFS614, GMU, Lecture 2 4
Database Design Overview
❖ Stage 2: Conceptual Design (ER Model)
– what concepts (abstract data structures) satisfy
these requirements?
– what integrity constraints apply?
– the ER (Entity-Relationship) Model is used to
describe answers to these questions
– you will learn it in this lecture

❖ An aside
– data structures + integrity constraints = a data
model (T/F)?

INFS614, GMU, Lecture 2 5


Database Design Overview
❖ An aside
– False! The ER model is 2/3 of a data model.
◆ no operations explicitly modeled! (assumes CRUD)
– Anyone know UML?
◆ UML targets the broader enterprise, models application
code / operations well, but is not as good for DBMS-
specific models

❖ Point #1: UML and ER have some overlap


❖ Point #2: ER doesn’t capture operations
– don’t try, it will confuse you!

INFS614, GMU, Lecture 2 6


Database Design Overview
❖ Stage 3: Schema Design
– the design of specific information objects which
user queries & applications can access

– Part 1: translation of ER diagrams into a relational


schema (SQL DDL)
◆ we learn this next week
– Part 2: normalization (refinement) of a schema
◆ elegant process discussed in Chapter 15
◆ we defer this to the end of the semester

INFS614, GMU, Lecture 2 7


Database Design Overview
❖ Stage 4: Physical Database Design
– now you consider operations!
◆ query mix (read-only? updates? how many tables involved?)
◆ workload (# of concurrent users? what are they doing?)
◆ access patterns (peak load? when can we install patches?)
– you decide on file structures, add indices, rewrite
queries so the optimizer will work better, perhaps
adjust your schema again, perhaps use replication
– things change, so you never stop doing this!
◆ this is called “database tuning”
◆ science + art, an entire field itself....

INFS614, GMU, Lecture 2 8


What Can Go Wrong?
❖ We’ve just described the classic sequential
“waterfall model”:

❖ Sometimes, sequential is too simplistic:


– Required data & attributes are unknown
– Constraints keep changing
❖ And you have to prototype & iterate ....
– Used to be hard for databases (MongoDB now helps)
INFS614, GMU, Lecture 2 9 9
Great Book On Design In
Challenging Circumstances

❖ The Design of Design: Essays from


a Computer Scientist
– by Fred Brooks (Turing Award
Winner)

❖ We will learn a waterfall approach to database


design (this lecture & next)
– A great strategy exists
– But: remember the value of asking lots of good questions,
prototyping, staying aware of evolving circumstances
INFS614, GMU, Lecture 2 1010
What Else Can Go Wrong?
❖ What if the number of entities & relationships
is (or appears) trivial?
❖ Related: What if one entity is so huge, it
dominates all phases (esp. physical design)?
– Perhaps obscuring smaller datasets?

❖ These challenges can occur in today’s “big data”


scenarios.
– There is even a “cultural challenge”: people attracted to
massive fast-changing data problems tend to differ from
people attracted to small precise data problems!
❖ Key: keep the big picture in mind as you do either ...
INFS614, GMU, Lecture 2 11 11
Now Back to the Lecture:
The Entity Relationship Model
❖ The E-R Model is used in conceptual design
(the 2nd stage):
– Three major concepts: entities, attributes,
relationshps
– An E-R Model can be represented pictorially:
◆ Result is an ER Diagram
– An E-R Model can be mapped into a Relational
Database Schema

– Fasten your seatbelts ...

INFS614, GMU, Lecture 2 12


ER Terminology Checklist
❖ entity ❖ recursive ❖ identifying
❖ entity set relationship relationship set
❖ attributes ❖ role ❖ partial key,
❖ domain ❖ key constraint, discriminant
❖ key cardinality ❖ ISA relationship
❖ candidate key ❖ many-to-one ❖ specialization
❖ primary key ❖ one-to-many ❖ generalization
❖ super key ❖ one-to-one ❖ superclass/subclass
❖ relationship
❖ participation ❖ inheritance
❖ relationship set constraints ❖ overlap constraint
❖ instance
❖ total ❖ covering constraint
❖ arity, degree
❖ partial ❖ aggregation
❖ binary
❖ ternary ❖ weak entity set
❖ identifying owner
INFS614, GMU, Lecture 2 13
ER Model Basics
❖ Entity: A real-world object of interest, uniquely
distinguishable from other objects.
– Distinguishable via its description, attributes & values
– Examples: me, you, your address, a student, the president
of GMU, the Universe
❖ Entity Set: A collection of similar entities.
– E.g., all students, all instructors
– Entity sets need not be disjoint: all GTAs
– The entity sets that end up in your ER diagram are
relevant to your enterprise

❖ Is this class an entity, an entity set, or both?


– Note: as an entity, it has special attributes (e.g.,
enrollment) which make it a different kind of entity than a
student ...

INFS614, GMU, Lecture 2 14


ER Model Basics
❖ Attribute: every entity has a set of attributes
– e.g. for a student entity: age, major, GPA, weight
– ER attributes are single-valued (not set-valued)
– what about address? (entity or attribute?)
– the attributes you choose reflect the level of detail
you need
◆ don’t overdo it, don’t underdo it!
❖ Entities in the same entity set have the same
attributes

INFS614, GMU, Lecture 2 15


ER Model Basics
❖ Each attribute has a Domain : a set of possible
values for that attribute :
Domain (age) : {0,1,..,100}
Domain (gender) : {female, male}
Domain (sname) : any string of at most 20 characters
❖ Can the value “null” be part of a domain?
– what if a student doesn’t have a major yet?
– but this is not good for special attributes called “keys”! ....
– note: what does a null mean!? (unknown, not applicable, zero, ...?)

INFS614, GMU, Lecture 2 16


Keys of Entity Sets
❖ Key: a minimal set of (1 or more) attributes whose
value uniquely identifies each entity in an entity set
– critical concept -- distinguishes among entities
– almost every entity set has a key
– examples: for GMU students: G#, for a house: address, for
people: {address, name, birthday}
❖ Minimality: no attribute can be removed and it is still
a key
– for GMU students: is {G#, age} minimal?
❖ Superkey: Any set of attributes containing the fields
of a key
– not necessarily minimal
– {G#, age} is a superkey; is {G#}?

INFS614, GMU, Lecture 2 17


Keys of Entity Sets
❖ There are 2 important types of keys:
❖ Candidate key
– There may be several possible keys for an entity set
– e.g., {G#}, {SS#}, {id of 7:20pm Monday class, name, age}
– each is a candidate key
❖ Primary key
– the candidate key you pick to be the primary one
– important because (when translated into the relational
model) the DBMS can enforce its uniqueness
– also because you have to keep it unique over time
◆ this takes work, and you only want to do it for one key!
– the primary key is underlined in an ER diagram
◆ (but not the candidate keys)

INFS614, GMU, Lecture 2 18


ER Diagram:
Example Entity Set

name
ssn lot

Employees

Diagram rule:

Entity set: Box (rectangle)


Attribute: “bubble” (oval)
Primary key: underlined

INFS614, GMU, Lecture 2 19


ER Basics: Relationships
❖ Relationship: Association among two or more entities.
– e.g., employee Mary works in the Pharmacy department
❖ Relationship Set: Collection of similar relationships.
– similar because it relates entities from the same entity sets
– Bob works in the Accounting dept is in the same set
– relationship sets can also have “descriptive” attributes
◆ e.g. when did Mary start to work in the Pharmacy dept?

❖ Arity
– relationships do not have to be binary, can be ternary, or ....
– An n-ary relationship set R relates n entity sets E1 ... En
– each relationship in R relates specific entities ea in E1,...,ez in
En

INFS614, GMU, Lecture 2 20


Relationship Set Example
since
name dname
ssn lot did budget

Employees Works_In Departments

Diagram rule:

Relationship set: Diamond


Attributes: “bubble” (oval)

INFS614, GMU, Lecture 2 21


Ternary Relationship Example
(arity = 3)
since
name dname
ssn lot did budget

Employees Works_In2 Departments

addresses Locations capacity

INFS614, GMU, Lecture 2 22


Identifying a Relationship Set
since
name dname
ssn lot did budget

Employees Works_In Departments

Each relationship is uniquely identified by the


combination of the keys of the participating entities

For Works_In: employee ssn and department did


Thus, for a given employee-department pair, we
cannot have more than one associated since value.
INFS614, GMU, Lecture 2 23
“Recursive” Relationship Sets

name
role indicators
ssn lot

Employees

supervisor subordinate

Reports_To since

Each Reports_To relationship is uniquely identified


by subordinate_ssn and supervisor_ssn.
INFS614, GMU, Lecture 2 24
Instance of a Relationship
Set
❖ An instance of a relationship set is a specific set
of relationships.
❖ Snapshot of the relationship set at some instant in
time.
❖ E.g., Instance of relationship set Works_In:
– each relationship in a Works_In instance includes:
{an employee’s SSN, a department’s DID, the “since” date}

INFS614, GMU, Lecture 2 25


Example Relationship Set Instance
Example : Instance of relationship set Works_In :
– Employee 123-22-366 (Sue) has worked in A-Dept since 3/3/96
– Employee 123-22-366 (Sue) has worked in B-Dept since 11/3/98
– etc.
Represented as: {123-22-366, A, 3/3/96}, {123-22-366, B, 11/3/98}, ...

3/3/96

11/3/98
123-22-366 A
11/3/98
534-55-928 B

633-90-9767 11/3/98
C
231-89-6598
7/30/95

7/30/95

Employees Works-In Departments


INFS614, GMU, Lecture 2 26
Relationship Set Cardinalities
❖ Every relationship set has a “cardinality” constraint
❖ For binary relationship sets:

1-to-1 1-to Many Many-to-1 Many-to-Many


(e.g. husband-wife) (e.g. manager- (e.g. car- (e.g. student-class)
employee) manfacturer)

INFS614, GMU, Lecture 2 27


Key Constraints (1:N)
since
name dname
ssn lot did budget

Employees Manages Departments


1 N

❖ Manages has a 1-N cardinality


– each dept has at most 1 (one) manager
– this is also called a key constraint on Department with
respect to Manages.
◆ i.e., did (by itself) is the key for relationship set Manages
(Why?)
❖ In ER diagrams:
– you must use an arrow on the “N” side in an ER diagram
– you may also use the “1” and the “N”
INFS614, GMU, Lecture 2 28
Key Constraints (1:1)
since
name dname
ssn lot did budget

Employees Manages Departments


1 1

❖ Now, add the restriction that each employee can


manage at most one department.
❖ We obtain a one-to-one (1:1) relationship set.
– Two arrows!

INFS614, GMU, Lecture 2 29


Key Constraints (M:N)
since
name dname
ssn lot did budget
Manages

Employees Departments

M Works_In N

since

❖ Works_In has a M-N cardinality


– no restrictions -- no key constraints
– no arrows
– (what is the key for relationship set Works_In?)

INFS614, GMU, Lecture 2 30


Key Constraints for Ternary
Relationships
since
name dname
ssn lot did budget

Employees Works_In2 Departments

address Locations capacity

❖ Each employee works in at most one department and


at a single location.
– ssn is the key (it uniquely determines did and address)

INFS614, GMU, Lecture 2 31


Why Is It Called a Key
Constraint?
❖ In a relationship instance, each entity of
the “key” entity set (having the arrow) can
participate at most once.
– i.e., it may not map to multiple entities
❖ Thus, it uniquely identifies a relationship
instance
– the entity’s key is also a key (or part of a key)
of the relationship set
❖ This will make more sense next week

INFS614, GMU, Lecture 2 32


Participation Constraints
❖ Whether every entity in an entity set must
participate in a relationship set
❖ Does every department have a manager?
– If so, this is a participation constraint, and the participation
of Departments in Manages is said to be total (vs. partial).

since
name dname
ssn lot did budget

Employees Manages Departments

Works_In

since
INFS614, GMU, Lecture 2 33
Participation Constraints

3/3/96

123-22-366 23

11/3/98
534-55-928
37

633-90-9767
51
7/30/95
231-89-6598

Employees Manages Departments


Partial Participation Total Participation

❖ note: what can we infer about key constraints in this diagram?


– its much easier to prove existence than non-existence

INFS614, GMU, Lecture 2 34


Weak Entities
❖ Not all entities have a key!
❖ Consider the following situation: employees can purchase
insurance policies to cover their dependents.

name
ssn lot pname age

Employees Policy Dependents

The attribute pname (dependent name in the


policy) does not identify a dependent uniquely.
– Dependents are unique through their relationship!
INFS614, GMU, Lecture 2 35
Weak Entity Sets
❖ A weak entity is dependent on another (owner)
entity.
– Owner entity set and weak entity set must participate in
a one-to-many relationship set (one owner, many weak
entities).
– Weak entity set must have total participation in this
identifying relationship set (why?)

partial key, or discriminant


name
cost pname age
ssn lot

Employees Policy Dependents

INFS614, GMU, Lecture 2 36


Another example

vNote: In the ER model, the primary key of the strong


entity set is not explicitly including in the weak entity
set, since it is implicit in the identifying relationship.

INFS614, GMU, Lecture 2 37


Strong vs. Weak entity sets
❖ Strong entity set:
– Has sufficient attributes to form a key
❖ Weak entity set:
– Lacks sufficient attributes to form a key
– But each entity can be distinguished within the weak
entity set by the partial key / discriminant
❖ Weak entities are uniquely identified by:
– a) their partial key / discriminant, and
– b) the primary key of the entity set in it’s identifying
relationship

INFS614, GMU, Lecture 2 38


Class Hierarchies
❖ We may want to organize the entities of an
entity set into superclasses and subclasses.
❖ E.g.: classify the entities in Employees as
Hourly_Emps or Contract_Emps.
❖ Semantics: every entity in Hourly_Emps
and Contract_Emps is also in Employees.
Thus, they must have all attributes of
Employees.

INFS614, GMU, Lecture 2 39


ER ISA (`is a’) Hierarchies
Triangle denotes A ISA B
– every A entity is also considered to
be a B entity
– Query to superclass retrieves all
instances of subclass(es) as well.
name
– Superclass attributes are inherited ssn lot
by subclasses
– ISA is also known as
Employees
“specialization”
hourly_wages hours_worked
ISA
contractid

Hourly_Emps Contract_Emps

INFS614, GMU, Lecture 2 40


Additional Example

(note: we will keep the triangles pointed upwards, in this class)


INFS614, GMU, Lecture 2 41
name
ssn lot

ISA (`is a’) Hierarchies Employees

hourly_wages hours_worked
ISA
contractid

Hourly_Emps Contract_Emps

❖ Overlap constraint: Subclasses are allowed to overlap


(share an entity)
❖ E.g., if Joe can be an Hourly_Emps as well as a Contract_Emps

❖ Covering constraint: Subclasses whose elements cover


(contain) all elements of their superclass
❖ E.g., if Hourly_Emps and Contract_Emps contain all elements of
Employees; each Employee must belong to some subclass
INFS614, GMU, Lecture 2 42
Design Motivations For Using ISA
❖ To model information as class hierarchies
– e.g., taxonomies
❖ You discover special attributes that don’t apply to all
entities in an entity set, so you specialize
– or vice versa!
– E.g., hourly_wages does not make sense for Contract_Emps.
❖ You realize only certain entities in an entity set can
participate in a relationship
– Define Senior_Emps ISA Employees;
– Define Manages relationship between entity sets Senior_Emps
and Departments to ensure that only senior employees can be
managers.

INFS614, GMU, Lecture 2 43


Aggregation
❖ Entity set Projects. ❖ A department that sponsors
❖ Each Projects a project might assign
entity is sponsored employees to monitor the
by one or more sponsorship.
departments. HOW!?

started_on since
dname
pid pbudget did budget

Projects Sponsors Departments

INFS614, GMU, Lecture 2 44


Aggregation
❖ Used when we have to model a ssn
name
lot
relationship between: an entity
set and a relationship set. Employees

– Aggregation allows us to treat a


relationship set as an entity set
Monitors
for purposes of participation in until
(other) relationships.

started_on since
dname
pid pbudget did budget

Projects Sponsors Departments


❖ Notes:
❖ Aggregations are infrequent
❖ The term “aggregation”is also used for “is-part-of”, a non-ER construct
with a different meaning!
INFS614, GMU, Lecture 2 45
Aggregation
☛ Aggregation vs. name
ssn lot
ternary relationship:
❖ Monitors is a distinct Employees
(from “sponsors”)
relationship, with its
own attribute. Monitors until

❖ Also, one can say each


sponsorship is
started_on since
monitored by at most dname
pid pbudget did budget
one employee (key
constraint). Projects Sponsors Departments

INFS614, GMU, Lecture 2 46


Aggregation: another example

Suppose we want to record managers for jobs performed by an employee


at a branch office
INFS614, GMU, Lecture 2 47
Example (cont.)
❖ Relationship sets works-on (ternary) and manages (quaternary)
represent overlapping information
– Note: Every manages relationship corresponds to a works-on
relationship
– Note: Some works-on relationships may not correspond to any
manages relationships
◆ So we can’t discard the works-on relationship

❖ Untangle this via aggregation


– Treat a relationship as an abstract entity
– Allow relationships between relationships
– Abstraction of an entire relationship into new “entity”
❖ The following diagram (much more cleanly) represents:
– An employee works on a particular job at a particular branch
– An employee, branch, job combination may have an associated manager

INFS614, GMU, Lecture 2 48


With aggregation

INFS614, GMU, Lecture 2 49


ER Design Challenges
❖ Design choices:
– Should a concept be modeled as an entity or an
attribute?
– Should a concept be modeled as an entity or a
relationship?
– Identifying relationships: Binary or ternary?
Aggregation?
❖ Note: Limits of the ER Model
– A lot of data semantics can (and should) be captured.
– But some things just cannot be captured in ER diagrams.

INFS614, GMU, Lecture 2 50


Entity or Attribute?
❖ Should address be modeled as an attribute of
Employees or an independent (related) entity?
◆ Address could be a character-string attribute of
Employee entities.
◆ If we want to model that several employees use the
same address, address must be a separate entity.
◆ If we want to support queries of address parts (e.g.,
retrieve employees in a given city, address can be
modeled as a separate entity with sub-attributes
◆ If there can be multiple addresses per employee,
address must be an entity
– ER limitation: attributes cannot be set-valued
– Note: semi-structured models permit multivalued atts!
INFS614, GMU, Lecture 2 51
Attribute Placement (1)
❖ Works_In2 does not allow an
employee to work in a dept. from to
name
for two or more periods. dname
ssn lot did budget
❖ Similar to the problem of
wanting to record several Employees Works_In2 Departments
addresses for an employee:
we want to record several
values of the descriptive
attributes for each instance
of this relationship.
❖ Solution: invent a separate ssn name lot did
dname
budget
entity for from and to
Employees Works_In3 Departments

from Duration to

INFS614, GMU, Lecture 2 52


Attribute Placement (2)
❖ First ER diagram OK if a
manager gets a separate
discretionary budget for since dbudget
each dept. name dname
❖ What if a manager gets a ssn lot did budget
discretionary budget that
covers all managed depts? Employees Manages2 Departments
– Redundancy of dbudget,
which is stored for each
dept managed by the
since
manager.
– Misleading: suggests ssn name dname
lot did
dbudget is tied to budget
managed dept.
Employees Manages3 Departments
❖ Soln: move dbudget over
to the manager
– (need new managers ISA
entity)
Managers dbudget
INFS614, GMU, Lecture 2 53
Ternary Relationships That Should Be
❖ Add new constraints: Binary
– at most 1 employee name
can own a policy ssn lot pname age
– every policy must be
Employees Covers Dependents
owned by an
employee
Policies
– dependents depends
on policies policyid cost
❖ Insight:
– no real need for ssn name lot pname age
dependents to
Dependents
relate directly to Employees
employees...
– covers is overkill Owner
Beneficiary

Policies

INFS614, GMU, Lecture 2 policyid cost 54


Binary Relationships That Should be
Ternary
Parts Needs Departments

Can-Supply Contracts Deals-With

qty

Suppliers

❖ No combination of binary relationships substitutes!


– they can’t express the 3-argument predicate contract(D,P,S)
◆ i.e., that D has agreed to buy P from S.
– And: How do we record qty?

INFS614, GMU, Lecture 2 55


Some Things ER Just Doesn’t
Model Well (or at all) !
❖ Multivalued attributes
❖ Numeric constraints
– e.g., managers manage between 3 and 7 employees
❖ Functional dependencies (ch 15)
– fields that determine the value of other fields
– e.g., a dept can’t order two different parts from the same
supplier (can be done with an aggregation, but ugly/hard!)
❖ Inclusion dependencies
– values in one attribute must be a subset of the values in another
❖ More general constraints
– managers must make 10% more than any of their employees
– this takes triggers/table constraints/application code

INFS614, GMU, Lecture 2 56


Summary of Conceptual Design
❖ Conceptual design follows requirements analysis
– ER model is a popular approach to conceptual design
– close to the way people think about their applications

❖ Major ER constructs:
– entities, relationships, and attributes
– key (cardinality) and participation constraints
– weak entities, ISA hierarchies, and aggregation
❖ Notes:
– Many variations on ER model.
– ER is subjective; normalization will address this later ...
– In big data scenarios, conceptual design may be not needed (due to
very few entities) and/or impossible (due to uncertain in
attributes)
INFS614, GMU, Lecture 2 57
Putting It All Together!
❖ You start with an English (or...) description of the
modeling problem
– either handed to you, or you extract it from interviews
❖ You attempt to capture all the features described
in an initial ER diagram
– you determine entities, attributes, relationships
❖ You (almost certainly will) refine it several times
– e.g., moving attributes to the right place, making sure
every constraint is correctly marked
– ask clarifying questions as needed
– test it out by thinking of example instances
– explicitly state in English what you couldn’t model in ER

INFS614, GMU, Lecture 2 58


Putting It All Together! (2)
❖ The resulting ER diagram has several important
uses
– 1) you can translate it directly into a set of relations,
which can be implemented via SQL DDL statements
◆ (come back next week!)
◆ note: some tools exist for this, but a) you can’t use them on
tests, b) you will be graded on these algorithms, not the ones
embedded in your tools, and c) you are in grad school -- use the
tool later and be smarter than it!
– 2) ER diagrams are great documentation
◆ succinct view of an entire information system
◆ smart people (like you) will be able to pick one up and
understand volumes!

INFS614, GMU, Lecture 2 59


Example Problem:
❖ Assume the university computing services (UCS) administers all campus
computer networks. Each computer in a network is uniquely identifed by
an inventory number, and also has a name (e.g., cs.gmu.edu) and a model
(e.g., Dell Laptop). Computers are connected via networks. Each network
has an identification number (id) and a capacity. Each computer is
connected to a single network. The peak network traffic from the
computers that are connected to a network may not exceed the capacity of
the network. Each computer can have several peripheral devices. Each
peripheral is installed in a specific io-port (e.g., USB slot #1); io-port
numbers are unique within the computer, and has device name for the
device currently associated with it (e.g, Samsung SSD). Users are identified
by a social security number, and also have a name. Users can access the
computers assigned to them; users are guaranteed access to at least one
computer. For each computer that a user may access, the user is assigned a
user-id and a password.

INFS614, GMU, Lecture 2 60

You might also like