Normalization Examples
Normalization Examples
Normalization Examples
Definition: A relation is said to be in First Normal Form (1NF) if and only if each
attribute of the relation is atomic. More simply, to be in 1NF, each column must
contain only a single value and each row must contain the same columns.
Example: The following table is NOT in First Normal Form:
Manager
Employees
Jim
Mary
Renee
Mike
Joe
Alan, Tim
Manager Employee
Jim
Susan
Jim
Rob
Jim
Beth
Mary
Alice
Mary
John
Mary
Asim
Renee
Mike
Joe
Alan
Joe
Tim
Create separate tables for each group of related data and identify each row
with a unique column (the primary key).
What do these rules mean when contemplating the practical design of a database?
Its actually quite simple.
The first rule dictates that we must not duplicate data within the same row of a
table. Within the database community, this concept is referred to as the atomicity of
a table. Tables that comply with this rule are said to be atomic. Lets explore this
principle with a classic example a table within a human resources database that
stores the manager-subordinate relationship. For the purposes of our example, well
impose the business rule that each manager may have one or more subordinates
while each subordinate may have only one manager.
Intuitively, when creating a list or spreadsheet to track this information, we would
draw up something like this:
Manager
Subordinate1
Subordinate2
Subordinate3
Bob
Jim
Mary
Beth
Mary
Mike
Jason
Carol
Jim
Alan
Subordinate4
Mark
However, recall the first rule imposed by 1NF: eliminate duplicative columns from the
same table. Clearly, the Subordinate1-Subordinate4 columns are duplicative. Take a
moment and ponder the problems raised by this scenario. Jim only has one
subordinate the Subordinate2-Subordinate4 columns are simply wasted storage
space (a precious database commodity). Furthermore, Mary already has 4
subordinates what happens if she takes on another employee? The whole table
structure would require modification.
At this point, a second bright idea usually occurs to database novices: We dont want
to have more than one column and we want to allow for a flexible amount of data
storage. Lets try something like this:
Manager
Subordinates
Bob
Mary
Jim
Alan
This solution is closer, but it also falls short of the mark. The subordinates column is
still duplicative and non-atomic. What happens when we need to add or remove a
subordinate? We need to read and write the entire contents of the table. Thats not
a big deal in this situation, but what if one manager had one hundred employees?
Also, it complicates the process of selecting data from the database in future queries.
Heres a table that satisfies the first rule of 1NF:
Manager
Subordinate
Bob
Jim
Bob
Mary
Bob
Beth
Mary
Mike
Mary
Jason
Mary
Carol
Mary
Mark
Jim
Alan
Now, what about the second rule: identify each row with a unique column or set of
columns (the primary key)? You might take a look at the table above and suggest
the use of the subordinate column as a primary key. In fact, the subordinate column
is a good candidate for a primary key due to the fact that our business rules specified
that each subordinate may have only one manager. However, the data that weve
chosen to store in our table makes this a less than ideal solution. What happens if
we hire another employee named Jim? How do we store his manager-subordinate
relationship in the database?
Its best to use a truly unique identifier (like an employee ID or SSN) as a primary
key. Our final table would look like this:
Manager
Subordinate
182
143
182
201
182
123
201
156
201
041
201
187
201
196
143
202
Now, our table is in first normal form! Join us next time as we explore the second normal
form.
Definition: In order to be in Second Normal Form, a relation must first fulfill the
requirements to be in First Normal Form. Additionally, each nonkey attribute in the
relation must be functionally dependent upon the primary key.
Example: The following relation is in First Normal Form, but not Second Normal
Form:
Order # Customer
Contact
Person
Total
Acme Widgets
John Doe
$134.23
Acme Widgets
John Doe
$1042.42
Acme Widgets
John Doe
$928.53
In the table above, the order number serves as the primary key. Notice that the
customer and total amount are dependent upon the order number -- this data is
specific to each order. However, the contact person is dependent upon the
customer. An alternative way to accomplish this would be to create two tables:
Customer
Contact Person
Acme Widgets
John Doe
ABC Corporation
Fred Flintstone
Order
Customer
#
Total
$134.23
Acme Widgets
Acme Widgets
$1042.42
Acme Widgets
$928.53
The creation of two separate tables eliminates the dependency problem experienced
in the previous case. In the first table, contact person is dependent upon the
primary key -- customer name. The second table only includes the information
unique to each order. Someone interested in the contact person for each order could
obtain this information by performing a JOIN operation.
Over the past month, we've looked at several aspects of normalizing a database
table. First, we discussed the basic principles of database normalization. Last time,
we explored the basic requirements laid down by the first normal form (1NF). Now,
let's continue our journey and cover the principles of second normal form (2NF).
Recall the general requirements of 2NF:
Remove subsets of data that apply to multiple rows of a table and place them
in separate rows.
Create relationships between these new tables and their predecessors through
the use of foreign keys.
These rules can be summarized in a simple statement: 2NF attempts to reduce the
amount of redundant data in a table by extracting it, placing it in new table(s) and
creating relationships between those tables.
Let's look at an example. Imagine an online store that maintains customer
information in a database. Their Customers table might look something like this:
CustNum FirstName
LastName Address
John
Doe
Alan
Johnson
Beth
4
5
City
State
ZIP
NY
11579
82 Evergreen
Tr
NY
11579
FL
33157
Jacob
Smith
IN
46637
Sue
Ryan
FL
33157
Sea Cliff
A brief look at this table reveals a small amount of redundant data. We're storing
the "Sea Cliff, NY 11579" and "Miami, FL 33157" entries twice each. Now, that might
not seem like too much added storage in our simple example, but imagine the
wasted space if we had thousands of rows in our table. Additionally, if the ZIP code
for Sea Cliff were to change, we'd need to make that change in many places
throughout the database.
In a 2NF-compliant database structure, this redundant information is extracted and
stored in a separate table. Our new table (let's call it ZIPs) might look like this:
ZIP
City
State
11579
Sea Cliff
NY
33157
Miami
FL
46637
South Bend
IN
If we want to be super-efficient, we can even fill this table in advance -- the post
office provides a directory of all valid ZIP codes and their city/state relationships.
Surely, you've encountered a situation where this type of database was utilized.
Someone taking an order might have asked you for your ZIP code first and then
knew the city and state you were calling from. This type of arrangement reduces
operator error and increases efficiency.
Now that we've removed the duplicative data from the Customers table, we've
satisfied the first rule of second normal form. We still need to use a foreign key to
tie the two tables together. We'll use the ZIP code (the primary key from the ZIPs
table) to create that relationship. Here's our new Customers table:
CustNum
FirstName
LastName
Address
ZIP
John
Doe
12 Main Street
11579
Alan
Johnson
82 Evergreen Tr
11579
Beth
Thompson
1912 NE 1st St
33157
Jacob
Smith
46637
Sue
Ryan
412 NE 1st St
33157
We've now minimized the amount of redundant information stored within the
database and our structure is in second normal form!
Definition: In order to be in Third Normal Form, a relation must first fulfill the
requirements to be in Second Normal Form. Additionally, all attributes that are not
dependent upon the primary key must be eliminated.
Examples: The following table is NOT in Third Normal Form:
Company
City
State ZIP
Acme Widgets
New York
NY 10169
ABC Corporation
Miami
FL
XYZ, Inc.
Columbia
MD 21046
33196
In this example, the city and state are dependent upon the ZIP code. To place this
table in 3NF, two separate tables would be created -- one containing the company
name and ZIP code and the other containing city, state, ZIP code pairings.
This may seem overly complex for daily applications and indeed it may be. Database
designers should always keep in mind the tradeoffs between higher level normal
forms and the resource issues that complexity creates.
Welcome to the fourth of five articles on database normalization! Over the past two
months, we've explored the basic principles of database normalization as well as the
specific requirements of first normal form (1NF) and second normal form (2NF).
This week we'll look at third normal form, otherwise known as 3NF.
In our first article, we looked at the basic requirements of 3NF. Here they are again
to refresh your memory:
Order Number
Customer
Number
Unit Price
Quantity
Total
241
$10
$20
842
$9
20
$180
919
$19
$19
919
$12
10
$120
Remember, our first requirement is that the table must satisfy the requirements of
1NF and 2NF. Are there any duplicative columns? No. Do we have a primary key?
Yes, the order number. Therefore, we satisfy the requirements of 1NF. Are there any
subsets of data that apply to multiple rows? No, so we also satisfy the requirements
of 2NF.
Now, are all of the columns fully dependent upon the primary key? The customer
number varies with the order number and it doesn't appear to depend upon any of
the other fields. What about the unit price? This field could be dependent upon the
Order Number
Customer Number
Unit Price
Quantity
241
$10
842
$9
20
919
$19
919
$12
10
Now our table is in 3NF. But, you might ask, what about the total? This is a derived
field and it's best not to store it in the database at all. We can simply compute it "on
the fly" when performing database queries. For example, we might have previously
used this query to retrieve order numbers and totals:
SELECTOrderNumber,Total
FROMWidgetOrders
We can now use the following query:
SELECTOrderNumber,UnitPrice*QuantityASTotal
FROMWidgetOrders
to achieve the same results without violating normalization rules.