5 normalizationDBMS

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

DATABASE MANAGEMENT SYSTEMS

5.NORMALIZATION AND DATABASE TABLES


Database Tables and Normalization
❖ Normalization is a process for evaluating and correcting table structures to
minimize data redundancies, and eliminates data anomalies.
• Normalization works through a series of stages called normal forms. The first three
stages are described as first normal form (1NF), second normal form (2NF), and third
normal form (3NF). From a structural point of view, 2NF is better than 1NF, and 3NF is
better than 2NF.
• In order to meet performance requirements, we need to denormalize some
portions of database design.
• Denormalization produces a lower normal form, i.e., a 3NF will be
converted to a 2NF through denormalization.
The Need for Normalization:
To get a better idea of the normalization process, consider the database
activities of Construction Company that manages several building projects.

The Data reflects the assignment of employees to projects .An the employee can be
assigned to more than one project.
Example: Darlene M.Smithson (EMP_NUM =112) has been assigned to two project
Amber wave and star flight.
The structure of data set in the above table has the following deficiencies
1. The project number (PROJ_NUM) is intended to be a primary key or at least a
part of primary key, but it contains nulls.
2. The table entries invite data inconsistencies. example , The
JOB_CLASS_VALUE “ Elect engineer “ might be entered as “Elect .Engg”.
3. The table displays data redundancies .Those redundancies yield the following
anomalies :
❖ Update Anomalies : Modifying JOB CLASS for Employee number 105
requires many alterations , one for each EMP_NUM=105

1
DATABASE MANAGEMENT SYSTEMS

❖ Insertion Anomalies : To complete a row definition , a new employee must be


assigned to a project . If the employee is not assigned, a phantom project must
be created to complete the employee data.
❖ Deletion Anomalies: Only one employee is associated with a given project .If
that employee leaves the company and the employee data are deleted, the
project will also be deleted. To prevent the loss of information a fictitious
employee must be created to save the project information.
The Normalization Process:
The objective of normalization is to ensure that each table conforms to the concept of
well-formed relations, i.e., tables that have the following characteristics:
1. Each table represents a single subject .For example, a course table will contain
only data that directly pertains to courses .Similarly, and a student table will
contain only student data.
2. No data item will be unnecessarily stored in more than one table .The reason for
this requirement is to ensure that the data are updated in only one place.
3. All nonprime attributes in a table are dependent on the primary key .The reason for
this requirement is to ensure that the data are uniquely identifiable by a primary
key value.
4. Each table is void of insertion, update, deletion anomalies .This is to ensure the
integrity and consistency of the data.
The most common normal forms and their basic characteristics are listed in the table
below.
NORMAL FORMS CHARACTERISTICS
First Normal Form (1NF) Table format , no repeating groups and PK
identified
Second Normal Form(2NF) 1NF and no partial dependencies
Third Normal Form (3NF) 2NF and no transitive dependencies
Boyce-Codd Normal Form (BCNF) Every determinant is a candidate key
(Special case of 3NF)
Fourth Normal Form (4NF) 3NF and no independent multi-valued
dependencies

Functional Dependency
The attribute B is fully functionally dependent on the attribute A if each value of A determines
one and only one value of B.
Example : PROJ_NUM ? PROJ_NAME (read as PROJ_NUM functionally determines
PROJ_NAME)
In this case, the attribute PROJ_NUM is known as the determinant attribute and the attribute
PROJ_NAME is known as the dependent attribute.
Fully Functional Dependency: If the attribute B is functionally dependent on a composite key A but
not on any subset of that composite key, the attribute B is fully functionally dependent on A.

FIRST NORMAL FORM (1NF):


A relation is in 1NF if it contains no multivalued attributes i.e., eliminating repeated groups in a
table.
A relation is in 1NF if all the attributes come from domains with only atomic values .Thus a table that
contains multivalued attributes or repeating groups is not a relation
Example: Below Table which is not in 1NF

2
DATABASE MANAGEMENT SYSTEMS

The below table contain multivalued attributes .so the above relation is not in 1NF
Table 1:

Normalizing the table structure will reduce the data redundancies.


Conversion to First Normal Form:
The conversion to 1NF process starts with a simple three –step procedure.
Step 1: Eliminate the Repeating Groups:
To eliminate the repeating groups, eliminate nulls by making sure that each repeating group
attribute contains an appropriate data value.
Table 2:

Now the above Table 2 is in First norm Form


Step 2: Identify the Primary Key:
In the above table, the PROJ_NUM is not an adequate primary key because the project number does
not uniquely identify a row. For example, PROJ_NUM value 15 can identify any one of five
employees.
To maintain a proper primary key that will uniquely identify any attribute value, the new key
composed of PROJ_NUM and EMP_NUM.
For example consider the PROJ_NUM =15 and EMP_NUM=101 uniquely identifies the entries for
attributes PROJ_NAME , EMP_NAME , JOB_CLASS,CHG_HOUR and HOURS must be Evergreen
, John G.News , Database Engineer ,$105.00,23.8
Step 3: Identify All Dependencies:
From the above example, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR and HOURS
values are all dependent on i.e., they are determined by the combination of PROJ_NUM and
EMP_NUM.
That can be represented as
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
There are additional dependencies. For example, the project number identifies the project name i.e.,
project name is dependent on the project number i.e., PROJ_NUM →PROJ_NAME

3
DATABASE MANAGEMENT SYSTEMS

Similarly, EMP_NUM →EMP_NAME, JOB_CLASS, CHG_HOUR


All the dependencies for the above example can be represented in the below figure
Figure 1NF

Note the following dependency diagram features:


1. The primary key attributes are bold, underlined.
2. The arrows above the attributes indicate all dependences that are based on the primary key i.e.,
in this example the attributes are dependent on the combination of PROJ_NUM, EMP_NUM.
3. The arrows below the dependency diagram indicate less desirable dependencies.
Two types of dependencies exist:
a. Partial Dependency: A dependency based on only a part of a composite primary key is called
Partial Dependency.
For example: only PROJ_NUM determines the PROJ_NAME .that is PROJ_NAME is dependent
on the part of the primary key (PROJ_NUM, EMP_NUM).
b.Transitive Dependency :A transitive dependency is a dependency of one nonprime attribute on
another nonprime attribute .For example , CHG_HOUR nor JOB_CLASS is a prime attribute that
is , neither attribute is atleast part of a key –The condition is known as a transitive dependency

NOTE: Transitive dependencies yield data anomalies.

SECOND NORMAL FORM:


A Relation is in 2NF if it is in 1NF and every non-key attribute is Fully Functionally Dependent
(FFD) on the primary key , i.e., no non-key attribute is functionally dependent on part of the primary
key
CONVERSION TO SECOND NORMAL FORM:

Converting to 2NF is done only when the 1NF has a composite primary key .If the 1NF has a single
attribute primary key , then the table is automatically in 2NF.The 1NF to 2NF conversion is simple
starting with the 1NF format example above figure 1NF
Step 1 : Write each key component on a separate line ;
• Write each key component on a separate line , then write the original key on the last line
• Example :
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
• Each component will become the key in a new table i.e, the original table is now divided into
three tables (PROJECT,EMPLOYEE and ASSIGNMENT )

4
DATABASE MANAGEMENT SYSTEMS

Step 2:Assign Corresponding Dependent Attributes :


Determine those attributes that are dependent on other attributes .
For example consider , the three new tables ( PROJECT , EMPLOYEE , and ASSIGNMENT ) are
described as by the following relational schemas :
PROJECT(PROJ_NUM ,PROJ_NAME)

EMPLOYEE(EMP_NUM , EMP_NAME ,JOB_CLASS ,CHG_HOUR)

ASSIGNMENT(PROJ_NUM ,EMP_NUM ,ASSIGN_HOURS)

The results of step 1 and step are shown in below figure .Here Most of the anomalies have been
eliminated .For example ,if we want to add ,change or delete a PROJECT record , we need to go to
only to the PROJECT table and make the change to only one record
FIGURE 2NF

NOTE : When a table’s primary key is composed of several attributes then the partial dependency
can exist .
A table whose primary key consists of only a single attribute is automatically in 2NF once it is in
1NF
THIRD NORMAL FORM :
A relation is in 3NF if and only if it is in 2NF and no transitive dependencies exist .
A transitive dependency is a functional dependency between two non-key attributes
CONVERSION TO THIRD NORMAL FORM:
The 2NF to 3NF conversion is simple starting with the 2NF format example shown in above figure
2NF
The anomalies resulted in Database organization shown in figure 2NF are easily eliminated by
completing the following three steps :
Step 1:Identify Each New Determinant :
• For every transitive dependency , write its determinant as a PK for a new Table .
• A determinant is any attribute whose value determines other values within the row .
• For example ,The figure 2NF shows only one table that contains a transitive dependency .
Therefore write the determinant for this transitive dependency as : JOB_CLASS
Step 2 : Identify the Dependent Attributes :
Identify the attributes that are dependent on each determinant identified in step 1 and identify the
dependency
In this example : JOB_CLASS → CHG_HOUR

5
DATABASE MANAGEMENT SYSTEMS

Name the table to reflect its contents and function .In this case table name is JOB
STEP 3: Remove the Dependent Attribute from Transitive Dependencies :
Eliminate all dependent attributes in the transitive relationships from each of the tables that have such
a transitive relationship .
In this example , eliminate CHG_HOUR from the EMPLOYEE table
i.e.,
EMP_NUM → EMP_NAME , JOB_CLASS
JOB_CLASS remains in the EMPLOYEE table to serve as a foreign key (FK).
After the 3NF conversion has been completed , the database contains four tables :
PROJECT (PROJ_NUM , PROJ_NAME)
EMPLOYEE (EMP_NUM , EMP_NAME ,JOB_CLASS )
JOB (JOB_CLASS ,CHG_HOUR)
ASSIGNMENT(PROJ_NUM,EMP_NUM ,ASSIGN_HOURS)

IMPROVING THE DESIGN


Normalization cannot be relied on to make good designs .Instead , normalization is valuable because
its use helps eliminate data redundancies .
Evaluate PK Assignments :
Each time a new employee is entered into the EMPLOYEE table , a JOB_CLASS value must be
entered .Unfortunately it is too easy to make data-entry errors that lead to referential integrity
violations .For example , entering DB Designer instead of Database Designer for the JOB_CLASS
attribute in the EMPLOYEE table will trigger such a violation . Therefore , it would be better to add a
JOB_CODE attribute to create a unique identifier
Evaluate Naming Conventions :
It is best adhere to the naming conventions .For example ,CHG_HOUR will be changed to
JOB_CHG_HOUR to indicate its association with the JOB table .
Refine Attribute Atomicity:
It is generally good practice to pay attention to the atomicity requirement .An atomic attribute is one
that cannot be further subdivided .Such an attribute is said to exhibit atomicity. Clearly , the use of the
EMP_NAME in the EMPLOYEE table is not atomic because , EMP _NAME can be subdivided into
last name , first name .In general designers prefer to use simple , single-valued attributes as indicated
by the business rules and processing requirements .
Identify New Attributes :
If the EMPLOYEE table were used in a real world environment ,several other attributes would have to
be added . For example , gross salary payments ,social security payments ,medical payments etc
.would be desirable . The same principle must be applied to all other tables in the design .

6
DATABASE MANAGEMENT SYSTEMS

Identify New Relationships :


The designer must take care to place the right attributes in the right table by using normalization
principles .
Refine Primary key as required for Data Granularity :
Granularity refers to the level of detail represented by the values stored in a tables row .Data stored at
their lowest level of granularity are said to be atomic data .For example ,the ASSIGNMENT table uses
the ASSIGN_HOURS attribute to represent the hours worked by a given employee on a given project .
In other words ASSIGN_HOURS represent hourly total , daily total ,weekly total ,monthly total or
yearly total ? clearly ASSIGN_HOURS requires more careful definition .

Maintain Historical Accuracy :


Writing the Job charge per hour into the ASSIGNMENT table is crucial to maintaining the historical
accuracy of the data in the ASSIGNMENT table .It would be appropriate to name this attribute
ASSIGN_CHG_HOUR .Although this attribute would appear to have the same value as
JOB_CHG_HOUR , that is true if the JOB_CHG_HOUR value remains same forever .
Evaluate Using Derived Attributes
We can use a derived attribute in the ASSIGNMENT table to store the actual charge made to a project
.That derived attribute , to be named ASSIGN_CHARGE , is the result of multiplying the ASSIGN
_HOURS by the ASSIGN_CHG_HOUR . Such derived attribute values can be calculated when they
are needed to write reports or invoices
Surrogate Key Considerations :
At the implementation level, a Surrogate key is a system defined attribute generally created and
maintained via the DBMS. Usually , a surrogate key is numeric and its value is automatically
incremented for each new row . For example , Microsoft Access uses an AutoNumber datatype ,
Microsoft SQL Server uses an identity column and Oracle uses sequence object
The JOB_CODE attribute was designated to be the JOB table’s primary key .Here JOB_CODE
does not prevent duplicate entries from being made , as shown in below figure

The data entries in the above table are inappropriate because they duplicate existing records yet there
is no violation of either the entity integrity or referential integrity. This “Multiple duplicate records “
problem was created when the JOB_CODE attribute was added as the PK .In any case , if JOB_CODE
is to be the surrogate PK , we still must ensure the existence of unique values in the
JOB_DESCRIPTION through the use of a unique index
Higher Normal Forms :
The BOYCE-CODD NORMAL FORM(BCNF):
When a relation has more than one candidate key, anomalies may present even though the relation is
in 3NF .
A relation is said to be in BCNF if and only if every determinant in the table is a candidate key.
That is When a table contains only one candidate key, the 3NF and BCNF are equivalent .

7
DATABASE MANAGEMENT SYSTEMS

For example , consider the following example STUDENT –ADVISOR table .


SID MAJOR ADVISOR MAJOR_GPA

100 Electronics AAA 4.0


100 Computers BBB 3.5
200 Physics CCC 3.4
300 Computers DDD 3.7
400 Electronics AAA 4.8
300 Physics CCC 4.9

In the above table , primary key is SID+MAJOR .


Attributes ADVISOR , MAJOR_GPA functionally dependent on primary key .This reflects the
constraint that A given STUDENT may have more than one MAJOR ,for each MAJOR a STUDENT
has exactly one ADVISOR and GPA.
There is a 2nd functional dependency i.e., MAJOR is functionally dependency , i.e., MAJOR is
functionally dependent on ADVISOR and Each ADVISOR advices exactly one MAJOR .
Anomalies in the STUDENT_ADVISOR Table :
• Update Anomaly :To replace Physics advisor CCC by YYY this change must be made in two
rows in the table
• Insertion Anomaly : To insert a row with the information that FFF advices in Computers ,it
is not possible until atleast one student in computer is assigned to advisor FFF .
• Deletion Anomaly : If a student number 100 in computers withdraw from college , we loss the
information that BBB advices Computers .
• All the above anomalies result from the fact that there is determinant (ADVISOR in this
example ) that is not a candidate key in the relation .
• The Relation which is in 3NF can be converted into BCNF using a simple two-step process .It
is shown in the following figure

STEP 1: Modify the relation by making determinant as the component of the primary key
as illustrated below
Revised Table :

SID ADVISOR MAJOR MAJOR_GPA

Step 2:If we examine above modified table , we will discover that revised table has a partial
functional dependency , so as a second step decompose the table to eliminate the Partial functional
dependency as shown below

SID ADVISOR MAJOR_GPA ADVISOR MAJOR

FOURTH NORMAL FORM (4NF):


When a relation is in BCNF , there are no longer anomalies that result from functional dependencies
.However , there may be some anomalies that result from multivalued dependencies .
8
DATABASE MANAGEMENT SYSTEMS

Multi-valued Dependencies (MVD):


The type of dependency that exists when there are atleast three attributes , for example , A , B and C
in a relation , For each value of ‘A’ a well-defined set of values of ‘B’ and values of ‘C’, but those
values ‘B’ and ‘C’ values are independent of each other .
Consider the following table :
OFFERING
COURSE LECTURER TEXTBOOK
Management Pranav Navathe
Harsha R.K.Taxali
Sri chand
IT Sashank Navathe
Bala Guruswami
In this table the following assumptions hold :
• Each Course has a well-defined set of Lecturers .
• Each Course has a well-defined set of text books that are used
• The text books that are used for a given course are independent of the Lecturers for that
Course.
The above table is converted into OFFERING by filling the empty cells
OFFERING
COURSE LECTURER TEXTBOOK

Management Pranav Navathe


Management Pranav R.K.Taxali
Management Harsha Navathe
Management Harsha R.K.Taxali
Management Sri chand Navathe
Management Sri chand R.K.Taxali
IT Sashank Navathe
IT Sashank Bala Guruswami
• In the above relation primary key is combination of
(COURSE, LECTURER , TEXTBOOK)
• Relation has no determinants other than primary key . Hence the relation is in BCNF
• The above relation suffers from the following update anomaly . For example , we want to add a
3rd text book say author name Yashwant to the Management course , three rows must be added
to the OFFERING relation for each Lecturer .
• This type of dependency is known as Multi-valued dependency .To remove multivalued
dependency the relation is decomposed into the following two relations
TEXT
LECTURER

COURSE LECTURER COURSE TEXT

• Now the relation is in 4NF


NORMALIZATION AND DATABASE DESIGN :
Normalization should be part of the design process .Therefore make sure that proposed entities meet
the required normal form before the table structures are created .
❖ First , an ERD is created through an iterative process .we begin by identifying relevant entities
, their attributes and their relationships .The ERD provides the big picture or macro view of an
organization’s data requirements and operations

9
DATABASE MANAGEMENT SYSTEMS

❖ Second ,normalization focuses on the characteristics of specific entities , i.e., Normalization


represents a micro view of the entities within the ERD .
❖ To illustrate the proper role of normalization in the design process,let’s examine the operations
of the contracting company
❖ Simple description of company’s operations ,two entities and their attributes are defined as :
PROJECT(PROJ_NUM , PROJ_NAME)
EMPLOYEE(EMP_NUM,EMP_LNAME,EMP_FNAME , EMP_INITIAL
,JOB_DESCRIPTION ,JOB_CHG_HOUR)
❖ PROJECT is in 3NF and needs no modification at this point
❖ EMPLOYEE requires additional scrutiny .The JOB_DESCRIPTION attribute defines
job classifications such as Systems analysts ,database designer ,and programmer.
Initial contracting Company ERD

Modified Contracting company ERD

TO represent the M:N relationship between EMPLOYEE and PROJECT , we might think that two
1:M relationships could be used –An can be assigned to many projects , and each project can have
many employees assigned to it .
Incorrect M:N Relationship representation

10
DATABASE MANAGEMENT SYSTEMS

Final Contract Company ERD is depicted below

DENORMALIZATION :
Denormalization: Denormalization is the process of combining two or more tables into a single table
by using denormalization process. We can reduce the data in memory space or storage space. In
general, Denormalization may partition a relation into several physical records, may combine
attributes from several relations together into one physical record, or may do combination of both.
The problem with normalization is that as tables are decomposed to
conform to normalization requirements, the number of database tables expands. Therefore, in order to
generate information data must be put together from various tables. Joining a large number of tables
takes additional input output (I/O) operations and processing logic, thereby reducing the system speed
The database design process could, in some cases, introduce some small degree of redundant data in
the model. This, in effect, creates “denormalized” relations.
Common Denormalization Examples
Case Example Rationale and Controls
Redundant data Storing ZIP ,CITY attributes in • Avoid extra join operations
the CUSTOMER table when ZIP • program can validate city (drop
determines CITY down box) based on the zip code

Storing STU_HRS and


Derived Data STU_CLASS (student • Avoid extra join operations
classification )When STU_HRS • program can validate classification
determines STU_CLASS (lookup) based on the student hours

Pre-aggregated Storing the student grade point • Avoid extra join operations
data (also average(STU_GPA) aggregate
derived data) value in the STUDENT table • Program computes the GPA every
when this can be calculated from time a grade is entered or updated
the ENROLL and COURSE
tables
• STU_GPA can be updated only via
administrative routine .

Information Using a temporary denormalized • Impossible to generate the data


Requirements table to hold report data .This is required by the report using the
required when creating a tabular plain SQL .
report in which the columns • No need to maintain table
represent data that is stored in the .Temporary table is deleted once
table as rows report is done
• Processing speed is not an issue

11
DATABASE MANAGEMENT SYSTEMS

In this case there is enough storage space , the designer’s choices could be narrowed down to
• Store the data in permanent denormalized table
• Create temporary denormalized table from the permanent normalized tables .The denormalized
tables exist only as long as it takes to generate the report .

12

You might also like