Database I: Methodology Normalization
Database I: Methodology Normalization
Database I: Methodology Normalization
Methodology
Normalization
1
Normalization ? (1/2)
Normalization is the technique for analyzing
relations based on their primary key (or candidate
keys in the case of BCNF) and functional
dependencies
Normalization is executed as a series of steps
Each step corresponds to a specific normal form
Different normal forms or levels are called as first,
second and so on
Each normalized form has certain requirements or
conditions, which must be fulfilled
2
Normalization ? (2/2)
If a relation fulfills any particular form then it
is said to be in that normal form
The minimum form in which all the tables are
in is called the normal form of entire database
It is performed after the logical database
design
3
Why Normalization ? (1/3)
Main aim of relational database design is to group
attributes into relations so as to minimize data
redundancy and thereby reduce the file storage
space
Consider StaffBranch relation below
It is clear that there is redundant information in
StaffBranch relation
SNo SName Position Salary BNo BAddress PhNo
S123 Asad Manager 23000 B5 Peshawar 20456789
S125 Jamal Programmer 40000 B5 Peshawar 20456789
S130 Ghamgeen Peon 5000 B6 Islamabad 34564567
4
6
Functional Dependencies (1/4)
Functional dependency describes the
relationship between attributes
In a database, we often have the case where
one field defines the other
For example, we can say that Employee ID
(EmpID) defines a name
What does this mean?
It means that if I have a database with
EmpIDs and names, and if I know someone's
EmpID, then I can find their name
7
Functional Dependencies (2/4)
From the word "defines," we means that for
every EmpID we will have one and only one
name
So we will say that we have defined name as
being functionally dependent on EmpID
A formal definition can be:
If A and B are attributes or sets of attributes of
relation R, we say that B is functionally dependent
on A if each value of A in R has associated with it
exactly one value of B in R 8
Functional Dependencies (3/4)
We write a functional dependency (FD) as:
EmpID → Name
Can be read as “EmpID defines Name" or “EmpID
implies Name“ or Name is functionally dependent on
EmpID”
Let us look at a an interesting example:
EmpID Name Job Salary
101 Darya Khan Programmer 40000
104 Shahrukh Designer 50000
103 Gul Jana System Engineer 35000
103 Gul Jana Project Manager 55000 9
Functional Dependencies (4/4)
We have the FD that EmpID → Name
This means that every time we find 103, we
find the name, Gul Jana
Thus something is on the left-hand side of a
FD, it does not imply that you have a key or
that it will be unique in the database
The FD X → Y only means that for every
occurrence of X you will get the same value
of Y
10
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
RNo Name School Location
Let us now consider
101 Rasheed PPS Peshawar
another example 102 Ani FCA Mardan
Here, we will define: 103 Naseer ECS Swabi
RNo → Name 104 Ghafoor FCA Mardan
RNo → School 105 Bilal PPS Peshawar
106 Sohail PPS Peshawar
Name → School
Have we violated any FDs with our data?
Because all RNos are unique, there cannot be a FD
violation of RNo → Name
The same comment is true for RNo → School and
Name → School 11
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
If we define a FD X → Y and we define a FD Y → Z,
then we know by inference that X → Z
So we can infer that RNo → Location
The inference illustrated is called the transitivity rule
of FD inference
To see that the FD RNo→ Location is true in our
data, you can note that given any value of RNo, you
always find a unique location for that person
This prove that transitivity rule exists
12
FD Inference Axioms or Armstrong
Axioms (Pseudo Transitivity Rule )
If A → B and CB → D, then AC → D
If RNo → Name and
Name,School →Location
Then we can write it as
RNo,Name → Location
13
FD Inference Axioms or Armstrong
Axioms ( Reflexive Rule )
If X is a composite, composed of A Name City
and B, then X→ A and X→ B Haleem Peshawar
Example: X= Name, City. Then we Rasheed Mardan
Kaleem Charsadda
are saying that X → Name and
X → City
The rule, says if I give you the combination
<Rasheed, Mardan>, what is this person's Name?
What is this person's City?
14
FD Inference Axioms or Armstrong
Axioms (Augmentation Rule)
If X→ Y, then XZ→ Y
Name City ShoeSize
Now, I claim that because Haleem Peshawar 10
Name→ City, that Rasheed Mardan 6
Name+ShoeSize → City Kaleem Charsadda 7
16
FD Inference Axioms or Armstrong
Axioms (Union Rule)
The union/additivity rule is the reverse of the
decomposition rule in that if X → Y and
X → Z, then X → YZ
If we were given that Name → City and given
that Name → ShowSize
We can immediately write Name→ City, Shoe
Size
17
Keys and Functional
Dependencies (1/4)
The main reason we identify the FDs and
inference rules is to be able to find keys and
develop normal forms for relational
databases
In any table, we want to find out which, if any
attribute(s), will identify the rest of the
attributes
An attribute that will identify all the other
attributes in row is called a "candidate key“
Consider the example on next slide 18
Keys and Functional
Dependencies (2/4)
Now suppose we defineRNo Name School Location
101 Rasheed PPS Peshawar
the following FDs: 102 Ani FCA Mardan
24
Normal Forms
(2N Form)
A relation is in second normal form (2NF) if and only
if it is in first normal form and all the non-key
attributes are fully functional dependent on the key
Partial dependencies are not allowed
The only time we have to be concerned about 2NF
is when the key is composite
Second normal form (2NF) addresses the concept
of removing duplicative data
A relation that is not in 2NF exhibits the update,
insertion and deletion anomalies
Consider the example on next slide 25
Normal Forms
(2N Form)
name job salary address
Employee (name, job,
Asad Welder 2300 Peshawar
salary, address) Asad Programmer 30000 Peshawar
FDs Karim Programmer 45000 Lahore
name → address
The problems developing here are redundancy and
anomalies
Here address depends only on the name, not the
job; this is an example of a partial dependency
26
Normal Forms
(2N Form)
As the table is in 1NF
The process for transforming a 1NF table to 2NF is:
1. Identify any partial determinants (attribute on LHS) other
than the composite key, and the columns they determine
2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table to
the new table. The determinate becomes the primary key
of the new table
4. Delete the columns you just moved from the original table
except for the determinant which will serve as a foreign
key
5. The original table may be renamed to maintain semantic
meaning 27
Normal Forms
(2N Form)
Consider the non-2NF name job salary address
2 relation Asad Welder 2300 Peshawar
3 5 4
Program Student
stdId stName stAdr prName
prName prCrdts
MCS 64 S1020 Sohail Dar I-8 Islamabad MCS
35
Normal Forms
(Boyce-Codd Normal Form (BCNF))
Non-BCNF sno sname cno cname dateofenroll
S1020 Sohail Dar C104 E-Com 12/02/2007
S1038 Shoaib Ali C104 E-Com 10/01/2008
S1015 Tahira Ejaz H234 English 01/02/2007
Crs
cno cname
Std
C104 E-Com
sno sname H234 English
S1020 Sohail Dar Std_Crs
S1038 Shoaib Ali sno cname datofenroll
S1015 Tahira Ejaz S1020 C104 10/01/2008
S1038 C104 01/02/2007
36
S1015 H234 12/02/2007
Other Normal Forms
5NF deals with multi-valued dependency and
possible loss less decompositions
Domain Key Normal Form (DKNF) reduces
further chances of any possible inconsistency
37