What Is Normalization


What is normalization?

Database normalization is the process of organizing data into tables in such a way that the results
of using the database are always unambiguous and as intended. Such normalization is intrinsic to
relational database theory. It generally reduces redundant copies of data, although key values are
repeated across tables in order to link them, and it often results in the creation of additional
tables.
The concept of database normalization is generally traced back to E.F. Codd, an IBM researcher
who, in 1970, published a paper describing the relational database model. What Codd described
as "a normal form for database relations" was an essential element of the relational technique.
Such data normalization found a ready audience in the 1970s and 1980s, a time when disk
drives were quite expensive and a highly efficient means of data storage was a necessity.
Since that time, other techniques, including denormalization, have also found favor.

Reasons for normalization

• To correct duplicate data and database anomalies.
• To avoid creating and updating any unwanted data connections and dependencies.
• To prevent unwanted deletions of data.
• To optimize storage space.
• To reduce the delay when new types of data need to be introduced.

1NF

First normal form (1NF). This is the "basic" level of database normalization, and it generally
corresponds to the definition of any database, namely:
• It contains two-dimensional tables with rows and columns.
• Each column corresponds to a subobject or an attribute of the object represented by the
entire table.
• Each row represents a unique instance of that subobject or attribute and must be different
in some way from any other row (that is, no duplicate rows are possible).
• All entries in any column must be of the same kind. For example, in the column labeled
"Customer," only customer names or numbers are permitted.
2NF

Second normal form (2NF). At this level of normalization, the table is already in 1NF and each
column that is not part of the key must depend on the whole key, not on just part of it.
For example, in a table with three columns containing the customer ID, the product sold
and the price of the product when sold, the price would be a function of both the customer ID
(a customer may be entitled to a discount) and the specific product. In this instance the data in
the third column is said to be dependent upon the data in the first and second columns. 1NF alone
does not require this dependency.
Because the price depends on both of them, the customer ID and product columns together are
considered the primary key: in combination they uniquely identify the rows in that table, and
they meet the other accepted requirements for a key. They do not have NULL values and their
values won't change over time.
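
The dependency in the customer, product and price example can be tested against sample data. The sketch below is an illustration under stated assumptions: the helper determines and the rows in sales are hypothetical, and a dependency that holds in a sample is only evidence about the data, not proof of the underlying business rule.

```python
def determines(rows, determinant, dependent):
    """Return True if equal values in the determinant columns always
    come with the same value in the dependent column."""
    seen = {}
    for row in rows:
        key = tuple(row[column] for column in determinant)
        value = row[dependent]
        # setdefault stores the first value seen for this key and returns it;
        # any later, different value for the same key breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sales rows: customer C001 is assumed to receive a discount.
sales = [
    {"customer_id": "C001", "product": "widget", "price": 9.00},
    {"customer_id": "C002", "product": "widget", "price": 10.00},
    {"customer_id": "C001", "product": "gadget", "price": 18.00},
    {"customer_id": "C002", "product": "gadget", "price": 20.00},
]

# Price is fixed once both the customer and the product are known.
by_both = determines(sales, ("customer_id", "product"), "price")
# Neither column alone is enough to fix the price.
by_customer = determines(sales, ("customer_id",), "price")
by_product = determines(sales, ("product",), "price")
```

Here by_both is True while by_customer and by_product are both False, which is the pattern 2NF expects: the price depends on the combination of the first two columns rather than on either one alone.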

3NF

Third normal form (3NF). At the second normal form, modification anomalies are still possible
because a change to one row in a table may affect data that refers to this information from
another table. For example, using the customer table just cited, removing a row describing a
customer purchase (because of a return, perhaps) will also remove the fact that the product has a
certain price. In the third normal form, where every non-key column depends only on the key, this
table would be divided into two tables so that product pricing would be tracked separately.
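
The deletion problem described above can be reproduced with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table names purchase_flat, purchase and product_price are hypothetical, and the price is assumed here to depend on the product alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-table design: the price is stored only on purchase rows.
    CREATE TABLE purchase_flat (customer_id TEXT, product TEXT, price REAL);
    INSERT INTO purchase_flat VALUES ('C001', 'widget', 9.5);

    -- Split design: pricing is tracked in its own table.
    CREATE TABLE product_price (
        product TEXT PRIMARY KEY,
        price   REAL NOT NULL
    );
    CREATE TABLE purchase (
        customer_id TEXT NOT NULL,
        product     TEXT NOT NULL REFERENCES product_price(product),
        PRIMARY KEY (customer_id, product)
    );
    INSERT INTO product_price VALUES ('widget', 9.5);
    INSERT INTO purchase VALUES ('C001', 'widget');
""")

# The customer returns the widget, so the purchase is removed in both designs.
conn.execute("DELETE FROM purchase_flat WHERE customer_id = 'C001' AND product = 'widget'")
conn.execute("DELETE FROM purchase WHERE customer_id = 'C001' AND product = 'widget'")

# In the single-table design the price of a widget is gone with the purchase.
flat_prices = conn.execute(
    "SELECT price FROM purchase_flat WHERE product = 'widget'"
).fetchall()

# In the split design the price is still on record.
split_prices = conn.execute(
    "SELECT price FROM product_price WHERE product = 'widget'"
).fetchall()

conn.close()
```

After the deletes, flat_prices is an empty list and split_prices still contains the widget's price, so the return no longer erases the pricing fact.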
