Normalization FORM

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 5

Normalization of Database Table

Normalization is a process or technique of evaluating and correcting table structure to minimize


data redundancy and to improve data integrity.
The presence of data redundancy and the loss of data integrity in a table structure are pointers
to a bad database design.
Data Redundancy is the repetition of the same piece of data at multiple places within a table.
The following are the drawbacks of data redundancy.

● Increased storage space which could lead to high cost and over-high resources
consumption.
● Data inconsistency

● Insertion Anomaly

● Deletion Anomaly

● Updation Anomaly
Data Integrity refers to the accuracy, completeness and consistency of data through its life-
cycle.
By accuracy, it means data must reflect reality. For example a customer’s address in the
database table must be up-to-date and correct.

By consistency, it means the data should be homogeneous across different systems and
formats. For example, the customer's name should be the same, no spelling changes
throughout. Example of a table with data – redundancy:

STUDENT-TABLE
student_id student_name department hod phone

1 ABC Computer Science Dr M 080123

2 DEF Computer Science Dr M. 080123

3 GHJ Computer Science Dr M. 080123

4 PQR Computer Science Dr M. 080123


The same value is repeated here for all the students on the columns: department,hod and
phone .This is an example of data redundancy and the table is said to be un – normalized.
If we imagine a large dataset of over hundreds of thousand records of students, the
space/storage consumption would be high and that in turn could increase data management
cost.

Insertion Anomaly
When certain data or attributes cannot be inserted into the database table without the
presence of other data of which existence of both are independent of each other , this situation
is called Insertion Anomaly.
For example, in the student-table above, let's assume to insert a new department (say
Mathematics), the insertion would not be possible without inserting the student name. But
both the student and department values are not dependent on each other for existence. So the
inability to add data to the database table due to the absence of other data whose existence is
not dependent on the former data, is called Insertion Anomaly.
To correct this, another table must be created for departments so that we can add more
departments without depending on the addition of student records.

Deletion Anomaly
If deletion of data from a database table causes deletion of other information(or data ) which
exist independently, such a deletion is said to be an anomaly. For example if there was a need
to delete all the students’ records from the table, the deletion of all the departments would go
together. Since the existence of departments is independent of the student, a separate table
must be created for departments which can give room to deletion of students’ records without
affecting the records of the departments, and vice-versa.

Updation Anomaly
If a change in any single piece of data in a table calls for changes at all the other copies of the
same data which had already come under insertion anomaly, the update is said to be anomaly.
Like if the HOD name of a department is to be updated, the same update must happen at all the
rows where the copies exist. Instead , a separate table for the department will correct this.

Normalization of database table works through a series of stages called Normal Forms:
They are:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce – Codd Normal Form (BNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF)

FIRST NORMAL FORM (1NF)


FIRST NORMAL FORM (1NF) defines the basic rules for structuring data in a relational database
table. The rules are:-
1. Each column must contain atomic value.
i.e Each cell should contain a single, indivisible value. There should not be
storing lists,arrays or complex structure within a single cell.
There shouldn’t be a composite attribute in a single cell.
For example there shouldn’t be storing of “city, state, LGA” in one cell. Each
should be in a separate column.
2. For every column, mixed types must be avoided.
For example, address, phone and email must not be in a single column.
3. Each column should have unique name
4. There must be a unique identity (primary key) for each table.
5. Order in which data is stored should be over- looked as SQL query can
easily
handle that in any order.

SECOND NORMAL FORM(2NF)

A table is in 2NF if it is in 1NF and there are no partial dependencies. A partial


dependency is a dependency where a non-key attribute is dependent on only part
of the primary key.

THIRD NORMAL FORM(3NF)


Rules: In addition to being in 2NF, no transitive dependencies exist. A transitive
dependency means a non-key attribute depends on another non-key attribute that
itself depends on the primary key.

Boyce-Codd Normal Form (BCNF)

A stricter form of 3NF where every determinant (attribute or set of attributes


causing another attribute to be uniquely determined) is a candidate key (a
potential primary key).

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) is a level of database normalization aimed at


minimizing multivalued dependencies, which can impact data integrity and
efficiency. While it builds upon the principles of lower normal forms (1NF, 2NF,
and 3NF), it specifically addresses these non-standard dependencies.

Fifth Normal Form (5NF)

Fifth Normal Form (5NF), also known as Projection-Join Normal Form (PJ/NF), is
the highest level of database normalization aimed at eliminating all potential
anomalies and redundancies arising from complex data relationships. While
achieving 5NF is not always necessary or applicable, it represents the purest
form of data organization in relational databases.

You might also like