Normalization

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 13

If a dataset is maintained in the form of just a single table, it

leads to Data Redundancy, which means a single value of data is


stored multiple times. This leads to many issues like an increment in
the database size, slower data retrieval, and data inconsistency.
Thus, to overcome this, Normalization in DBMS is used in which a
large table is reduced into smaller tables until each of the single
tables contains one relation.

Normalization
Normalization is a process of organizing data in a database to reduce
redundancy and improve data consistency. Primary keys are really important in
organizing information in a database. They help to make sure that every row in
a table has a unique identification so that nothing gets mixed up or lost.
Normalization is the process of organizing data in a database to minimize
redundancy and dependency. In database design, there are different normal
forms based on the primary keys of a table.

First Normal Form (1NF)


1NF requires that each column in a table contains atomic values and that each
row is uniquely identified. This means that a table cannot have repeating groups
or arrays as columns, and each row must have a unique primary key.
Example

A table is in 1NF if each column contains atomic values and each row is uniquely
identified. For example, a table that lists customers and their phone numbers −

Customer ID Name Phone Numbers

1 John 555-1234, 555-5678

2 Jane 555-9876

3 Michael 555-5555

This violates 1NF because the Phone Numbers column contains repeating
groups.

To normalize this table to 1NF, we can split the Phone Numbers column into
separate rows and add a separate primary key column −

Customer ID Name Phone Number

1 John 555-1234

1 John 555-5678

2 Jane 555-9876

3 Michael 555-5555

 All columns contain atomic values.


 All values of a column(attribute) belong to the same domain.
 Column names are unique.
Normalization and Normal Forms in DBMS
If a dataset is maintained in the form of just a single table, it
leads to Data Redundancy, which means a single value of data is
stored multiple times. This leads to many issues like an increment in
the database size, slower data retrieval, and data inconsistency.
Thus, to overcome this, Normalization in DBMS is used in which a
large table is reduced into smaller tables until each of the single
tables contains one relation.

As a result of Normalization, the redundancy in a table is removed


which improves the efficiency of the database system. Let’s
understand the Need for Normalization in DBMS using a simple
example. Consider a table having Student ID, name, Branch Name,
and Branch Code.

This is the single table storing all the data related to a college
student. Storing data in this way leads to the following problems:
 If you want to “insert” a name and ID of a new student, you
cannot do it until you don’t have his branch name and
branch code. This is called Insertion Anomaly.
 If you want to “delete” a student name, then the branch
name and code will also be deleted. The deleted branch
name and code cannot be recovered again. This is called
Deletion Anomaly.
 If you want to “update” the branch name of John from CS to
Civil, the update for Brat will also happen. Thus, multiple
updations of single data occur. This is called Updation
Anomaly.

A simple solution to the above problems is to reduce the dataset


into two tables using Normalization in DBMS in the following way.

As you can see, there is no need to repeatedly store the branch


data. Also, new data can be inserted into the data separately
without any anomaly. Therefore, Normalization in DBMS provides us
with a way to remove the above anomalies using the concept of
Normal Forms in DBMS. Before learning about the types of Normal
Forms in DBMS, you should be aware of the concept of Functional
Dependency in DBMS.
Types of Normal Forms
There are the following Normal Forms in DBMS:

First Normal Form (1NF)

The First Normal Form of Normalization in DBMS states that an


attribute of a Table cannot hold Multiple Values. We can also say
that an attribute must be single-valued. In other words, the
composite attribute or multivalued attribute is not allowed.

Thus, A Table is said to be in 1NF if:

 All columns contain atomic values.


 All values of a column(attribute) belong to the same domain.
 Column names are unique.

For example, if there are multiple branch names for a student, they
must be converted to a single-valued attribute as shown below.

Second Normal Form (2NF)

A Table ( Relation) in the Database exists in the Second Normal


Form if:

 It exists in the 1NF.


 It does not have Partial Dependency

Let’s understand the Second Normal Form of Normalization in DBMS


with an example. Suppose there is a table ‘Product’ containing the
Customer ID, Name, and Price of Products.

This table contains some redundancy as the price of the Fan is


stored twice. This table is in 1NF as all values are atomic. Now, let’s
see for 2NF.

Here,

Candidate Keys are {Customer_ID, Name}. The Prime Attributes


( attributes that are a part of the candidate key) will be Customer_ID
and Name and the Non-Prime Attribute will be Price. We see that
Customer_ID->Price is a Partial Functional Dependency because
Price (non-prime attribute) depends on Customer_ID(a subset of the
candidate key).
Therefore, we conclude the table is not in 2NF. To convert the table
to 2NF using Normalization in DBMS, we decompose it into two
tables as given below:

Both the above tables are in 1NF. Also, the Functional Dependencies
are Customer ID->Product Name and Product Name->Price in which
there is no Partial Dependency. Also, the price of the Fan is not
stored multiple times. So, Data Redundancy is removed.

Third Normal Form (3NF)

The Third Normal Form exists in a table if:

 It exists in the 2NF.


 There is no Transitive Dependency for non-prime attributes
in the table.

Now, let’s first see Transitive Functional Dependency before going


ahead with the Normalization in DBMS.

Transitive Functional Dependency


Transitive Functional Dependency occurs when a non-prime
attribute is dependent on another non-prime attribute. For example,
in a relation, if Student ID determines Student’s City then Student ID
-> Student City is a Functional Dependency. And, if Student’s State
is determined by Student’s City then Student City->Student State is
also a Functional Dependency.

But, Student City is a Non-Prime Attribute that depends on Student


ID (Prime Attribute). Therefore, Student State becomes indirectly
dependent upon Student ID. We conclude that If Student ID ->
Student City and Student City->Student State exists, then Student
ID->Student State also exists.

Now, let’s see how to convert a table into the Third Normal Form
(3NF) of Normalization in DBMS. Suppose there is a table containing
the data of students as follows:

In this table, the Candidate Key is Student ID which is a prime


attribute, and the rest of the attributes are Non-Prime. Thus, there is
no partial dependency. So, it is in 2NF.

But, STU_ID->ZIPCODE and ZIPCODE->STU_STATE are the two


functional dependencies that exist. Since Zipcode is a non-prime
attribute, STU_STATE indirectly depends on STU_ID. Therefore,
STU_ID->STU_STATE also exists which is a Transitive Functional
Dependency.

To convert this into 3NF, we decompose the relation into two tables
as below.

Now, both the tables are free from Transitive Functional


Dependency and thus exist in 3NF of Normalization in DBMS.

Boyce-Codd Normal Form (BCNF)

A table or a relation exists in the BCNF if:

 It exists in 3NF.
 For every Functional Dependency A->B in the table, A is a
Super Key.

In simple words, if there exists a Functional Dependency X->Y in the


table such that Y is a Prime Attribute and X is a non-prime attribute,
then the table is not in BCNF.

Let’s see how to identify whether a Table exists in BCNF or not using
an example. Suppose there is a table named ‘Customer Service’
which contains the data about the Product, Customer, and Seller.
We can observe here that despite Normalization up to 3NF, there is
Data Redundancy in the table.

For the given table:

 Candidate Keys = {Product ID, Customer Name}


 Prime Attributes: Product ID and Customer Name
 Non-Prime Attributes: Seller Name

Therefore, the following conclusions can be drawn from the table:

1. Since all the values in the columns are atomic, the Table is
in 1NF.
2. The dependency Customer Name ->Seller Name is not
possible because One Customer may purchase from more
than one seller. Thus, there is no Partial Dependency i.e.
seller name(non-prime attribute) does not depend on
Customer Name(a subset of Candidate Key). Thus, the
Table is in 2NF.
3. Functional Dependencies present in the table are:

· Product ID, Customer Name->Seller Name

· Seller Name->Customer Name

For Transitive Dependency, a non-prime attribute should determine


another non-prime attribute. But here, Seller Name is a non-prime
attribute but Customer Name is a Prime Attribute. Therefore, there
is no Transitive Dependency. Thus, Table exists in 3NF.

4. For a table to exist in BCNF, in Every Functional Dependency,


LHS should be a Super Key. But in Seller Name->Customer Name,
Seller Name is not a Key. Therefore, the table does not exist in
BCNF.

Now, let’s convert the table into BCNF Form of Normalization in


DBMS. We break the table into two tables as follows:
Therefore, the above two tables satisfy the conditions of BCNF. This
is how the Normal Forms in DBMS remove the data redundancy in a
Relational Database.

You might also like