Unit-Ii Database Design: Er Model & Er Diagrams
Unit-Ii Database Design: Er Model & Er Diagrams
Unit-Ii Database Design: Er Model & Er Diagrams
DATABASE DESIGN
ER MODEL & ER DIAGRAMS
Database design/ Database model
• well-designed database shall:
– Eliminate Data Redundancy: the same piece of data shall
not be stored in more than one place. This is because
duplicate data not only waste storage spaces but also easily
lead to inconsistencies.
– Ensure Data Integrity and Accuracy
• Database design is a logic structure of a database
Design can be done in different levels:
• Physical level
• Logical level
• User/view level
ER Model – Basic Concepts
• The ER model defines the three most relevant steps
• It works around real-world entities and the associations among
them
• At view level, the ER model is considered a good option for
designing databases.
Examples of Entities:
• Person: Employee, Student, Patient
• Place: Store, Building
• Object: Machine, product and Car
• Event: Sale, Registration, Renewal
• Concept: Account, Course
Entities can be characterized into two types:
• Strong entity: A strong entity has a primary key attribute which uniquely
identifies each entity. Symbol of strong entity is same as an entity.
• Weak entity: A weak entity does not have a primary key attribute and
depends on other entity via a foreign key attribute.
Difference between Strong Entity & Weak Entity
B. Attribute:
Each entity has a set of properties. These properties of each entity are
termed as attributes. For example, a car entity would be described by
attributes such as price, registration number, model number, color etc.
Attributes are indicated by ovals in an e-r diagram.
• Hence, as part of the Enhanced ER Model, along with other improvements, three new concepts were
added to the existing ER Model, they were:
• Generalization
• Specialization
• Aggregation
• Subclass / Super class
• Union collection of superclass
• Inheritance
Generalization:
Generalization is a bottom-up approach in which two lower level
entities combine to form a higher level entity. In generalization,
the higher level entity can also combine with other lower level
entities to make further higher level entity.
It's more like Superclass and Subclass system, but the only
difference is the approach, which is bottom-up. Hence, entities
are combined to form a more generalized entity, in other words,
sub-classes are combined to form a super- class
Specialization:
Specialization is opposite to Generalization. It is a top-down
approach in which one higher level entity can be broken down
into two lower level entity. In specialization, a higher level entity
may not have any lower-level entity sets, it's possible.
Aggregation:
Aggregation is a process when relation between two entities is
treated as a single entity.
In the diagram above, the relationship between Center and
Course together, is acting as an Entity, which is in relationship
with another entity Visitor. Now in real world, if a Visitor or a
Student visits a Coaching Center, he/she will never enquire about
the center only or just about the course, rather he/she will ask
enquire about both.
ER to Relational Mapping
• ER Model, when conceptualized into diagrams, gives a good
overview of entity-relationship, which is easier to understand.
• ER diagrams can be mapped to relational schema, that is, it is
possible to create relational schema using ER diagram.
• We cannot import all the ER constraints into relational model,
but an approximate schema can be generated.
• There are several processes and algorithms available to
convert ER Diagrams into Relational Schema.
• Some of them are automated and some of them are manual.
ER diagrams mainly comprise of −
• Entity and its attributes
• Relationship, which is association among entities.
Mapping Entity:
An entity is a real-world object with some attributes.
Mapping Process (Algorithm)
• Create table for each entity.
• Entity's attributes should become fields of tables with their
respective data types.
• Declare primary key
Mapping Relationship:
A relationship is an association among entities.
Mapping Process
• Create table for a relationship.
• Add the primary keys of all participating Entities as fields of table with
their respective data types.
• If relationship has any attribute, add each attribute as field of table.
• Declare a primary key composing all the primary keys of participating
entities.
• Declare all foreign key constraints.
Mapping Weak Entity Sets:
A weak entity set is one which does not have any primary key
associated with it.
Mapping Process
• Create table for weak entity set.
• Add all its attributes to table as field.
• Add the primary key of identifying entity set.
• Declare all foreign key constraints.
Mapping Hierarchical Entities:
ER specialization or generalization comes in the form of hierarchical entity
sets.
Mapping Process
• Create tables for all higher-level entities.
• Create tables for lower-level entities.
• Add primary keys of higher-level entities in the table of lower-level
entities.
• In lower-level tables, add all other attributes of lower-level entities.
• Declare primary key of higher-level table and the primary key for lower-
level table.
• Declare foreign key constraints.
Functional Dependency:
The functional dependency is a relationship that exists between two attributes.
It typically exists between the primary key and non-key attribute within a
table.
X → Y
The left side of FD is known as a determinant, the right side of the production
is known as a dependent.
For example:
Assume we have an employee table with attributes:
Emp_Id, Emp_Name, Emp_Address.
Emp_Id → Emp_Name
Types of Functional dependency:
• Insertion Anomaly
• Updation Anomaly
• Deletion Anomaly
Normalization Rule
The table already satisfies 3 rules out of the 4 rules, as all our
column names are unique, we have stored data in the order we
wanted to and we have not inter-mixed different type of data in
columns.
• But out of the 3 different students in our table, 2 have opted for
more than 1 subject. And we have stored the subject names in a
single column. But as per the 1st Normal form each column must
contain atomic value. It's very simple, because all we have to do
is break the values into atomic values.
• Here is our updated table and it now satisfies the First Normal
Form.
• By doing so, although a few values are getting repeated but
values for the subject column are now atomic for each
record/row. Using the First Normal Form, data redundancy
increases, as there will be many columns with same data in
multiple rows but each row as a whole will be unique.
In this table, student_id is the primary key and will be unique for
every row, hence we can use student_id to fetch any row of data
from this table
Even for a case, where student names are same, if we know the
student_id we can easily fetch the correct record.
• Hence we can say a Primary Key for a table is the column or a
group of columns(composite key) which can uniquely identify
each record in the table.
• I can ask from branch name of student with student_id 10, and I
can get it. Similarly, if I ask for name of student with student_id
10 or 11, I will get it. So all I need is student_id and every other
column depends on it, or can be fetched using it.This is
Dependency and we also call it Functional Dependency.
Partial Dependency:
Now that we know what dependency is, we are in a better state to
understand what partial dependency is.
For a simple table like Student, a single column like student_id can
uniquely identfiy all the records in a table.
But this is not true all the time. So now let's extend our example to
see if more than 1 column together can act as a primary key.
Let's create another table for Subject, which will have subject_id
and subject_name fields and subject_id will be the primary key.
• Now we have a Student table with student information and
another table Subject for storing subject information.
• Let's create another table Score, to store the marks obtained by
students in the respective subjects. We will also be saving name
of the teacher who teaches that subject along with marks.
• Now as we just discussed that the primary key for this table is a
composition of two columns which is student_id & subject_id
but the teacher's name only depends on subject, hence the
subject_id, and has nothing to do with student_id.
And our Score table is now in the second normal form, with no
partial dependency.
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
1.It is in the Second Normal form.
2.And, it doesn't have Transitive Dependency.
So let's use the same example, where we have 3 tables, Student,
Subject and Score.
Student Table
Subject Table
Score Table
Transitive Dependency
With exam_name and total_marks added to our Score table, it saves
more data now. Primary key for the Score table is a composite key,
which means it's made up of two attributes or columns → student_id
+ subject_id. The new column exam_name depends on both student
and subject. For example, a mechanical engineering student will
have Workshop exam but a computer science student won't. And for
some subjects you have Practical exams and for some you don't. So
we can say that exam_name is dependent on both student_id and
subject_id.
Professor Table
Fourth Normal Form (4NF):
A table is said to be in the Fourth Normal Form when,
1.It is in the Boyce-Codd Normal Form.
2.it doesn't have Multi-Valued Dependency.
Multi-valued Dependency
A table is said to have multi-valued dependency, if the following
conditions are true,
1.For a dependency A → B, if for a single value of A, multiple value
of B exists, then the table may have multi-valued dependency.
2.Also, a table should have at-least 3 columns for it to have a multi-
valued dependency.
3.And, for a relation R(A,B,C), if there is a multi-valued
dependency between, A and B, then B and C should be independent
of each other.
Example
Below we have a college enrolment table with columns s_id, course
and hobby.
Well the two records for student with s_id 1, will give rise to two more records, as
shown below, because for one student, two hobbies exists, hence along with both
the courses, these hobbies should be specified.
And, in the table above, there is no relationship between the columns course and
hobby. They are independent of each other.
So there is multi-value dependency, which leads to un-necessary repetition of data
and other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we can
decompose the table into 2 tables.
Course Opted Table
Hobbies Table
The above table can be decomposed into the following three tables; therefore it is not
in 5NF:
Domain Key Normal Form(DKNF)
• A relation is in DKNF if every constraint on the relation is a logical
consequences of the definition of Keys and Domains.
• A constraint in this definition is any rule which is precise enough so that
you can evaluate whether or not it is true.
• A key is a unique identifier of a row in a table.
• A domain is the set of permitted values of an attribute.
Look at this database, which is in 1NF, to see what you must do to put that
database in DKNF.
• Table: SALES (Customer_ID, Product, Price)
• Key: Customer_ID
• Constraints:
• Customer_ID determines Product
• Product determines Price
• Customer_ID must be an integer >1000
To enforce Constraint 3 (that Customer_ID must be an integer greater than
1000), you can simply define the domain for Customer_ID to incorporate this
constraint. That makes the constraint a logical consequence of the domain of the
CustomerID column. Product depends on Customer_ID, and Customer_ID is a
key, so you have no problem with Constraint 1, which is a logical consequence
of the definition of the key.