DBMS Normalization

Download as pdf or txt
Download as pdf or txt
You are on page 1of 53

NORMALIZATION

UNIT-3
KEY CONCEPTS
Formal Name Common Name Also Known As

Relation Table Entity


Tuple Row Record
Attribute Column Field

2
KEYS in DBMS
• KEYS in DBMS is an attribute or set of attributes which helps you to
identify a row(tuple) in a relation(table).

• They allow you to find the relation between two tables. Keys help you
uniquely identify a row in a table by a combination of one or more columns
in that table.

• Key is also helpful for finding unique record or row from the table.
Database key is also helpful for finding unique record or row from the
table.
• There are mainly seven different types of Keys in DBMS and each key has
it’s different functionality:

• Super Key - A super key is a group of single or multiple keys which


identifies rows in a table. A Super key may have additional attributes that
are not needed for unique identification.
• Candidate Key - is a set of attributes that uniquely identify tuples in a
table. Candidate Key is a super key with no repeated attributes.
• The Primary key should be selected from the candidate keys. Every table
must have at least a single candidate key.
• A table can have multiple candidate keys but only a single primary key.

• Properties of Candidate key:


• It must contain unique values
• Candidate key may have multiple attributes
• Must not contain null values
• It should contain minimum fields to ensure uniqueness
• Uniquely identify each record in a table
• Primary Key - is a column or group of columns in a table that uniquely
identify every row in that table. The Primary Key can't be a duplicate
meaning the same value can't appear more than once in the table.
• A table cannot have more than one primary key.

• Rules for defining Primary key:


• Two rows can't have the same primary key value
• It must for every row to have a primary key value.
• The primary key field cannot be null.
• The value in a primary key column can never be modified or updated if any
foreign key refers to that primary key.
• Alternate Key - is a column or group of columns in a table that uniquely
identify every row in that table.
• A table can have multiple choices for a primary key but only one can be set
as the primary key. All the keys which are not primary key are called an
Alternate Key.
• FOREIGN KEY is a column that creates a relationship between two
tables. The purpose of Foreign keys is to maintain data integrity and allow
navigation between two different instances of an entity. It acts as a cross-
reference between two tables as it references the primary key of another
table.
• Composite key: A key that has more than one attributes is known as
composite key. It is also known as compound key.
• Note: Any key such as super key, primary key, candidate key etc. can be
called composite key if it has more than one attributes.
• Lets consider a table Sales. This table has four columns (attributes) –
cust_Id, order_Id, product_code & product_count.

• Table – Sales
Relational Decomposition
• When a relation in the relational model is not in appropriate normal form
then the decomposition of a relation is required.
• In a database, it breaks the table into multiple tables.
• If the relation has no proper decomposition, then it may lead to problems
like loss of information.
• Decomposition is used to eliminate some of the problems of bad design like
anomalies, inconsistencies, and redundancy.
• Lossless Decomposition
• If the information is not lost from the relation that is decomposed, then the
decomposition will be lossless.
• The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
• The relation is said to be lossless decomposition if natural joins of all the
decomposition give the original relation.
EMPLOYEE table:
• Now, when these two relations are joined on the common column
"EMP_ID", then the resultant relation will look like:
In above example Marks is fully functionally dependent on STUDENT#
COURSE#and not on sub set of STUDENT# COURSE#. This means Marks
can not be determined either by STUDENT# OR COURSE# alone. It can be
determined only using STUDENT# AND COURSE# together. Hence Marks
is fully functionally dependent on STUDENT# COURSE#.
Another Example
• CourseName is not fully functionally dependent on STUDENT#
COURSE# because subset of STUDENT# COURSE#
• i.e only COURSE# determines the CourseName and STUDENT# does not
have any role in deciding CourseName.

• Hence CourseName is not fully functionally dependent on STUDENT#


COURSE#.

• In the above relationship CourseName is partially dependent on composite


attributes STUDENT# COURSE#because COURSE# alone defines the
CourseName.
Another Example
In above example, Grade depends on Marks and in turn Marks depends on
Student#COURSE#.

Hence Grade transitively depends on Student#COURSE#.


Transitive:Indirect
Another Example
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
• Normalization divides the larger table into the smaller table and links them
using relationship.
• The normal form is used to reduce redundancy from the database table.
Problems Without Normalization
• If a table is not properly normalized and have data redundancy then it will not
only eat up extra memory space but will also make it difficult to handle and
update the database, without facing data loss.
• Insertion, Updation and Deletion Anomalies are very frequent if database is not
normalized. To understand these anomalies let us take an example of
a Student table.
Problem Without Normalization
Database Normalization is used for following Purpose:

• To Eliminate the redundant or useless data

• To Reduce the complexity of the data

• To Ensure the relationship between tables as well as data


in the tables

• To Ensure data dependencies and data is logically


stored.
First Normal Form (1NF)
• A relation will be 1NF if it contains an atomic value.
• It states that an attribute of a table cannot hold multiple values.
• It must hold only single-valued attribute.
• First normal form disallows the multi-valued attribute, composite
attribute, and their combinations.
• Example: Relation EMPLOYEE is not in 1NF because of multi-
valued attribute EMP_PHONE.
Second Normal Form (2NF)
• In the 2NF, relational must be in 1NF.
• In the second normal form, all non-key attributes are fully functional
dependent on the primary key
• Example: Let's assume, a school can store the data of teachers and the
subjects they teach. In a school, a teacher can teach more than one subject.
• In the given table, non-prime attribute TEACHER_AGE is dependent on
TEACHER_ID which is a proper subset of a candidate key. That's why it
violates the rule for 2NF.
Third Normal Form (3NF)
• A relation will be in 3NF if it is in 2NF and not contain any
transitive partial dependency.

• 3NF is used to reduce the data duplication. It is also used to achieve


the data integrity.

• If there is no transitive dependency for non-prime attributes, then


the relation must be in third normal form.

• A relation is in third normal form if it holds atleast one of the


following conditions for every non-trivial function dependency
X → Y.

• X is a super key.

• Y is a prime attribute, i.e., each element of Y is part of some


candidate key.
Super key in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on
EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super
key(EMP_ID). It violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE TABLE
BCNF
• Boyce-Codd Normal Form or BCNF is an extension to the third normal form, and
is also known as 3.5 Normal Form.

• Rules for BCNF


• For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following
conditions:
• BCNF is the advance version of 3NF. It is stricter than 3NF
1. It should be in the Third Normal Form.
2. A relation is in BCNF, if and only if every determinant is a candidate key..
• ie. A table is in BCNF if every functional dependency X → Y, X is the super key of
the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
4th Normal Form
• For a table to satisfy the Fourth Normal Form, it should satisfy the
following two conditions:
• It should be in the Boyce-Codd Normal Form.
• And, the table should not have any Multi-valued Dependency.

• What is Multi-valued Dependency?


• A table is said to have multi-valued dependency, if the following conditions
are true,

• For a dependency A → B, if for a single value of A, multiple value of B


exists, then the table may have multi-valued dependency.

• Also, a table should have at-least 3 columns for it to have a multi-valued


dependency.

• And, for a relation R(A,B,C), if there is a multi-valued dependency


between, A and B, then B and C should be independent of each other.

• If all these conditions are true for any relation(table), it is said to have
multi-valued dependency.
As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.

You must be thinking what problem this can lead to, right?
Well the two records for student with s_id 1, will give rise to two more records, as
shown below, because for one student, two hobbies exists, hence along with both the
courses, these hobbies should be specified.
Now this relation satisfies the fourth normal form.
A table can also have functional dependency along with multi-valued dependency.
In that case, the functionally dependent columns are moved in a separate table and
the multi-valued dependent columns are moved to separate tables.
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and not contains any join dependency
and joining should be lossless.
• 5NF is satisfied when all the tables are broken into as many tables as
possible in order to avoid redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
• In the above table, John takes both Computer and Math class for
Semester 1 but he doesn't take Math class for Semester 2. In this
case, combination of all these fields required to identify a valid
data.

• Suppose we add a new Semester as Semester 3 but do not know


about the subject and who will be taking that subject so we leave
Lecturer and Subject as NULL.

• But all three columns together acts as a primary key, so we can't


leave other two columns blank.

• So to make the above table into 5NF, we can decompose it into


three relations P1, P2 & P3:
The Process of
Normalization up to
5NF

54

You might also like