Normalization in DBMS11

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Normalization in DBMS: 1NF, 2NF, 3NF and

BCNF in Database
Normalization is a process of organizing the data in database to avoid data
redundancy, insertion anomaly, update anomaly & deletion anomaly. Let’s
discuss about anomalies first then we will discuss normal forms with
examples.

Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad
dream for any database administrator. Managing a database with anomalies is
next to impossible.

• Update anomalies − If data items are scattered and are not linked to each other
properly, then it could lead to strange situations. For example, when we try to update
one data item having its copies scattered over several places, a few instances get
updated properly while a few others are left with old values. Such instances leave the
database in an inconsistent state.

• Deletion anomalies − We tried to delete a record, but parts of it was left undeleted
because of unawareness, the data is also saved somewhere else.

• Insert anomalies − We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a
consistent state.

First Normal Form


First Normal Form is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.
We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form


Before we learn about the second normal form, we need to understand the following −

• Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime


attribute.

• Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a


non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute. That is, if X → A holds, then there
should not be any proper subset Y of X, for which Y → A also holds true.

We see here in Student_Project relation that the prime key attributes are Stu_ID
and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and
Proj_Name must be dependent upon both and not on any of the prime key attribute
individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name
can be identified by Proj_ID independently. This is called partial dependency,
which is not allowed in Second Normal Form.
We broke the relation in two as depicted in the above picture. So there exists no
partial dependency.

Third Normal Form


For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −

• No non-prime attribute is transitively dependent on prime key attribute.

• For any non-trivial functional dependency, X → A, then either − o X is a superkey or, o A is prime
attribute.

We find that in the above Student_detail relation, Stu_ID is the key and only prime
key attribute. We find that City can be identified by Stu_ID as well as Zip itself.
Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip
→ City, so there exists transitive dependency.

To bring this relation into third normal form, we break the relation into two relations as
follows −

Boyce-Codd Normal Form


Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms.
BCNF states that −

• For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip
is the super-key in the relation ZipCodes. So, Stu_ID → Stu_Name, Zip and

Zip → City

Which confirms that both the relations are in BCNF.


Database normalization is a database schema design technique, by which an existing
schema is modified to minimize redundancy and dependency of data. Normalization
split a large table into smaller tables and define relationships between them to
increases the clarity in organizing data.

• The words normalization and normal form refer to the structure of a database.
• Normalization was developed by IBM researcher E.F. Codd In the 1970s.
• Normalization increases the clarity in organizing data in Database.

Normalization of a Database is achieved by following a set of rules called 'forms' in


creating the database.

First Normal Form

Each column is unique in 1NF.

Example:
Sample Employee table, it displays employees are working with multiple departments.

Employee Age Department


Melvin 32 Marketing, Sales

Edward 45 Quality
Assurance

Alex 36 Human
Resource
Employee table following 1NF:

Employee Age Department

Melvin 32 Marketing

Melvin 32 Sales

Edward 45 Quality Assurance

Alex 36 Human Resource

Second Normal Form

The entity should be considered already in 1NF, and all attributes within the entity
should depend solely on the unique identifier of the entity.

Example:

Sample Products table:

productID product Brand

1 Monitor Apple

2 Monitor Samsung
3 Scanner HP

4 Head phone JBL

Product table following 2NF:


Products Category table:

productID product

1 Monitor

2 Scanner

3 Head
phone
Brand table:

Products Brand table:

pbID productID brandID

1 1 1
brandID brand

1 Apple

2 Samsung
2 1 2

3 HP
3 2 3

4 JBL

4 3 4

Third Normal Form (3NF)


The entity should be considered already in 2NF, and no column entry should be
dependent on any other entry (value) other than the key for the table.

If such an entity exists, move it outs outside into a new table.

3NF is achieved considered as the database is normalized.

Boyce-Codd Normal Form (BCNF)


3NF and all tables in the database should be only one primary key.

Fourth Normal Form (4NF)


Tables cannot have multi-valued dependencies on a Primary Key.

Fifth Normal Form (5NF)


A composite key shouldn't have any cyclic dependencies.

Well, this is a highly simplified explanation for Database Normalization. One can study this
process extensively though. After working with databases for some time, you'll
automatically create Normalized databases, as it's logical and practical.

First Normal Form (1NF)

A table is said to be in First Normal Form (1NF) if and only if each attribute of the relation is atomic.
That is,

• Each row in a table should be identified by primary key (a unique column value or group
of unique column values)
• No rows of data should have repeating group of column values.

Let's consider the STUDENT table with his ID, Name address and 2 subjects that he has opted for.

• Look at Chris entry. He has only subject. Hence Subject2 for him is NULL. Here storage
space for second entry is simply wasted.
• In the case of Joseph, he has two subjects, Mathematics and Physics, both the columns
have values. Imagine if he opts for third subject? There is no column for his third entry. In
this case, whole table needs to be altered, which is not good at this stage. Once
database is designed, it should be a perfect one. We should not be modifying it as we
start adding/updating data.
• One of the requirements of 1NF is, each table should have primary key. This key in the
table makes each record unique. In our example we have it already- STUDENT_ID.
• Here, SUBJECT1 and SUBJECT2 are same set of columns, i.e.; it has same kind of
information stored - Subject, which is a violation of first rule of 1NF. As it states, there
should not be any repeating columns. We have to remove such columns. But think how?
In order to have STUDENT in 1NF, we have to remove multiple SUBJECT columns from STUDENT
table. Instead, create only one SUBJECT column, and for each STUDENT enters as many rows
as SUBJECT he has. After making this change, the above table will change as follows:
Now STUDENT_ID alone cannot be a primary key, because it does not uniquely determines
each record in the table. If we want to records for Joseph, and we query by his ID,100, gives
us two records. Hence Student_ID is no more a primary key. When we observe the data in the
table, all the four field uniquely determines each record. Hence all four fields together
considered as primary key.

Thus, the above table is in 1NF form.

Second Normal Form (2NF)


A relation is said to be in a second normal form if and only if,
• it's in first normal form
• Every non-key attributes are identified by the use of primary key

• All subset of data, which applies to have multiple rows in a table must be removed and
placed in a new table. And this new table and the parent table should be related by the
use of foreign key.

In the 1NF STUDENT table above, Joseph and Allen have multiple rows because of their
SUBJECTS. Although it is in 1NF form, it wastes storage space by repeating whole of their
information - name and address in each row. In addition, Student ID alone is strong enough to
be a primary key. If we make Student_ID as primary, all other attributes in the table cannot be
uniquely identified. This is because of multiple rows exists for single ID. Hence it does not
satisfy second condition of 2NF.

So what we can do here is, apply the third condition of 2NF. Remove Subject from the
STUDENT table and create a separate table for it. So the two tables are - STUDENT and
SUBJECT. Now the STUDENT table will have only STUDENT information - STUDENT_ID,
STUDENT_NAME and ADDRESS. New SUBJECT table will have STUDENT_ID and
SUBJECT_NAME.

Now there is no repeating group of columns in STUDENT table and STUDENT_ID is the
primary key of STUDENT table. It uniquely identifies the Student name and address which are
non key attributes of this table. Hence it satisfies both 1NF and 2NF.

In the new SUBJECT table, Subject names opted by each student. Since same student
cannot opt for same subject multiple times in academic year, there will not be any duplicity
of data. But Student_ID alone is not unique; hence it cannot be a primary key. Both
Student_ID and Subject_Name is unique in this table. Hence both of them together become a
primary key. Hence SUBJECT table satisfies 1NF.

There is no non-key attributes in SUBJECT table. Hence we cannot verify second condition
of 2NF. According to the third condition of 2NF, we have removed the data which is forming
multiple rows and put those details in new table. It also states that there should be
relationship between the original table and new table by using foreign key constraint. In the
SUBJECT table, STUDENT_ID is derived from STUDENT table. In this table, STUDENT _ID is
part of primary key as well as it is a foreign key. Hence we can easily relate both STUDENT
and SUBJECT table by using STUDENT_ID. Hence satisfies the third condition of 2NF.

If we want to know which all subjects Joseph has opted for, we would query as below:

SELECT std.STUDENT_ID,
std.STUDENT_NAME,
sb.SUBJECT

FROM STUDENT std, SUBJECT sb

WHERE std.STUDENT_ID = sb.STUDENT_ID

AND std.STUDENT_NAME = 'Joseph';


Similarly, if there are multiple students staying in same area, STUDENT table is having huge
amount of records and there is a change requested for street or city name, then whole
STUDENT table needs to be searched and updated. Imagine, we have to update 'Fraser
Village Drive' to Fraser Village Dr'. The Update statement would be

UPDATE STUDENT std

SET std.STREET = 'Fraser Village Dr'

WHERE std .STREET = 'Fraser Village Drive';

Above query will search whole student table for 'Fraser Village Drive' and then update it to
'Fraser Village Dr’. But searching a huge table and updating the single or multiple records will
be a very time consuming, hence affecting the performance of the database.

Instead, if we have these details in a separate table ZIPCODE and is related to STUDENT
table using zip? However ZIPCODE table will have comparatively less amount of records
and we just have to update ZIPCODE table once. It will automatically reflect in the STUDENT
table! Hence making the database and query simpler! And table is in 3NF.

UPDATE ZIPCODE z

SET z.STREET = 'Fraser Village Dr'


WHERE z .STREET = 'Fraser Village Drive';
Now if we have to select the whole address of a student, Chris, we join both STUDENT and
ZIPCODE table using ZIP and get the whole address.

SELECT std.STUDENT_NAME,

z.STREET,

z.CITY,

z.ZIP

FROM STUDENT std, ZIPCODE z

WHERE std.ZIP = z.ZIP

AND std.STUDENT_NAME = 'Chris';

Boyce-Codd Normal Form (3.5NF)


This normal form is also referred as 3.5 normal forms. This normal form
• Meets all the requirement of 3NF
• Any table is said to be in BCNF, if its candidate keys do not have any partial
dependency on the other attributes. i.e.; in any table with (x, y, z) columns, if (x, y)->z
and z->x then it's a violation of BCNF. If (x, y) are composite keys and (x, y)->z, then
there should not be any reverse dependency, directly or partially.

In the above 3NF example, STUDENT_ID is the Primary key in STUDENT table and ZIP is the
primary key in the ZIPCODE table. There is no other key column in each of the tables which
determines the functional dependency. Hence it's in BCNF form. That is, with STUDENT_ID,
we can retrieve STUDENT_NAME and ZIP from STUDENT table. Similarly, with ZIP value, we
can retrieve STREET and CITY from ZIPCODE table

Let us consider another example - consider each student who has taken major subjects has
different advisory lecturers. Each student will have different advisory lecturers for same
Subjects. There exists following relationship, which is violation of BCNF.

(STUDENT_ID, MAJOR_SUBJECT) -> ADVISORY_LECTURER

ADVISORY_LECTURER -> MAJOR_SUBJECT


i.e. Major Subject which is a part of composite candidate key is determined non-key attribute of
the same table, which is against the rule.

Below table will have all the anomalies too. If we delete any student from below table, it
deletes lecturer's information too. If we add any new lecturer/student to the database, it
needs other related information also. Also, if we update subject for any student, his lecturer
info also needs to be changed, else it will lead to inconsistency.
Hence we need to decompose the table so that eliminates so that it eliminates such
relationship. Now in the new tables below, there are no inter-dependent composite keys
(moreover, there is no composite key in both the tables). If we need to add/update/delete
any lecturer, we can directly insert record into STUDENT_ADVISOR table, without affecting
STUDENT_MAJOR table. If we need to insert/update/delete any subject for a student, then
we can directly do it on STUDENT_MAJOR table, without affecting STUDENT_ADVISOR
table. When we have both advisor as well as major subject information, then we can directly
add/update both the tables. Hence we have eliminated all the anomalies in the database.

You might also like