Database I: Methodology Normalization

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 37

Database I

Methodology

Normalization

1
Normalization ? (1/2)
 Normalization is the technique for analyzing
relations based on their primary key (or candidate
keys in the case of BCNF) and functional
dependencies
 Normalization is executed as a series of steps
 Each step corresponds to a specific normal form
 Different normal forms or levels are called as first,
second and so on
 Each normalized form has certain requirements or
conditions, which must be fulfilled
2
Normalization ? (2/2)
 If a relation fulfills any particular form then it
is said to be in that normal form
 The minimum form in which all the tables are
in is called the normal form of entire database
 It is performed after the logical database
design

3
Why Normalization ? (1/3)
 Main aim of relational database design is to group
attributes into relations so as to minimize data
redundancy and thereby reduce the file storage
space
 Consider StaffBranch relation below
 It is clear that there is redundant information in
StaffBranch relation
SNo SName Position Salary BNo BAddress PhNo
S123 Asad Manager 23000 B5 Peshawar 20456789
S125 Jamal Programmer 40000 B5 Peshawar 20456789
S130 Ghamgeen Peon 5000 B6 Islamabad 34564567
4

S133 Khamar Sweaper 3000 B7 Mardan 23567890


Why Normalization ? (2/3)
 The serious problem with relations having
redundant information is update anomalies
 Update anomalies can be classified as:
 Insertion Anomalies
 There are two insertion anomalies in StaffBranch
relation
 To insert the details of new member, we have to also
enter the branch details for new entry, thus we have to
take great care that branch details is consistent
 To insert details of a new branch that currently has no
members of staff, the solution could be to insert nulls
for staff, but wait SNo is primary key ?
5
Why Normalization ? (3/3)
 Deletion Anomalies
 If we delete the row for staff number S133, the
information relating to branch number B7 is also lost
from the database
 Modification Anomalies
 If we want to change the phone number for branch
number B5, we must update the rows of all staff
located at that branch to avoid data inconsistency

6
Functional Dependencies (1/4)
 Functional dependency describes the
relationship between attributes
 In a database, we often have the case where
one field defines the other
 For example, we can say that Employee ID
(EmpID) defines a name
 What does this mean?
 It means that if I have a database with
EmpIDs and names, and if I know someone's
EmpID, then I can find their name
7
Functional Dependencies (2/4)
 From the word "defines," we means that for
every EmpID we will have one and only one
name
 So we will say that we have defined name as
being functionally dependent on EmpID
 A formal definition can be:
 If A and B are attributes or sets of attributes of
relation R, we say that B is functionally dependent
on A if each value of A in R has associated with it
exactly one value of B in R 8
Functional Dependencies (3/4)
 We write a functional dependency (FD) as:
EmpID → Name
 Can be read as “EmpID defines Name" or “EmpID
implies Name“ or Name is functionally dependent on
EmpID”
 Let us look at a an interesting example:
EmpID Name Job Salary
101 Darya Khan Programmer 40000
104 Shahrukh Designer 50000
103 Gul Jana System Engineer 35000
103 Gul Jana Project Manager 55000 9
Functional Dependencies (4/4)
 We have the FD that EmpID → Name
 This means that every time we find 103, we
find the name, Gul Jana
 Thus something is on the left-hand side of a
FD, it does not imply that you have a key or
that it will be unique in the database
 The FD X → Y only means that for every
occurrence of X you will get the same value
of Y
10
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
RNo Name School Location
 Let us now consider
101 Rasheed PPS Peshawar
another example 102 Ani FCA Mardan
 Here, we will define: 103 Naseer ECS Swabi
RNo → Name 104 Ghafoor FCA Mardan
RNo → School 105 Bilal PPS Peshawar
106 Sohail PPS Peshawar
Name → School
 Have we violated any FDs with our data?
 Because all RNos are unique, there cannot be a FD
violation of RNo → Name
 The same comment is true for RNo → School and
Name → School 11
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
 If we define a FD X → Y and we define a FD Y → Z,
then we know by inference that X → Z
 So we can infer that RNo → Location
 The inference illustrated is called the transitivity rule
of FD inference
 To see that the FD RNo→ Location is true in our
data, you can note that given any value of RNo, you
always find a unique location for that person
 This prove that transitivity rule exists

12
FD Inference Axioms or Armstrong
Axioms (Pseudo Transitivity Rule )
 If A → B and CB → D, then AC → D
 If RNo → Name and
Name,School →Location
 Then we can write it as
RNo,Name → Location

13
FD Inference Axioms or Armstrong
Axioms ( Reflexive Rule )
 If X is a composite, composed of A Name City
and B, then X→ A and X→ B Haleem Peshawar
 Example: X= Name, City. Then we Rasheed Mardan
Kaleem Charsadda
are saying that X → Name and
X → City
 The rule, says if I give you the combination
<Rasheed, Mardan>, what is this person's Name?
What is this person's City?

14
FD Inference Axioms or Armstrong
Axioms (Augmentation Rule)
 If X→ Y, then XZ→ Y
Name City ShoeSize
 Now, I claim that because Haleem Peshawar 10
Name→ City, that Rasheed Mardan 6
Name+ShoeSize → City Kaleem Charsadda 7

(i.e., we augmented Name with ShoeSize)


 Will there be a contradiction here, ever?
 No, because we defined Name → City, Name
plus more information will always identify the
unique City for that individual
 We can always add information to the LHS of
an FD and still have the FD be true
15
FD Inference Axioms or Armstrong
Axioms (Decomposition Rule)
 The decomposition/projectivity rule says that
if it is given that X → YZ (that is, X defines
both Y and Z), then X → Y and X → Z
 Suppose I define Name → City, ShoeSize
 This means for every occurrence of Name, I
have a unique value of City and a unique
value of ShoeSize

16
FD Inference Axioms or Armstrong
Axioms (Union Rule)
 The union/additivity rule is the reverse of the
decomposition rule in that if X → Y and
X → Z, then X → YZ
 If we were given that Name → City and given
that Name → ShowSize
 We can immediately write Name→ City, Shoe
Size

17
Keys and Functional
Dependencies (1/4)
 The main reason we identify the FDs and
inference rules is to be able to find keys and
develop normal forms for relational
databases
 In any table, we want to find out which, if any
attribute(s), will identify the rest of the
attributes
 An attribute that will identify all the other
attributes in row is called a "candidate key“
 Consider the example on next slide 18
Keys and Functional
Dependencies (2/4)
 Now suppose we defineRNo Name School Location
101 Rasheed PPS Peshawar
the following FDs: 102 Ani FCA Mardan

RNo → Name 103 Naseer ECS Swabi


104 Ghafoor FCA Mardan
RNo → School 105 Bilal PPS Peshawar

School → Location 106 Sohail PPS Peshawar

 What we want is to find the least number of


attributes that can identify all the rest
attributes (hopefully only one attribute)
 Suppose we take RNo as candidate key 19
Keys and Functional
Dependencies (3/4)
 Can we show that RNo "defines" all attributes in the
relation?
 We can use the transitive, reflexive and union rule
RNo → Name (given)
RNo → School (given)
School → Location (given)
RNo → Location (derived by the transitive rule)
RNo → RNo (reflexive rule)
RNo → RNo, Name, School, Location (union rule)
 Therefore RNo is a candidate key for this relation
 Finding a candidate key is the finding of a “closure of an
attribute or a set of attributes” that defines all the other
attributes 20
Keys and Functional
Dependencies (4/4)
 Are there any other candidate keys?
 Of course! By augmentation rule we can augment
RNo and name to form new candidate keys: RNo,
Name
 Is School a candidate key? No
 Once we have found a set of candidate keys (or
perhaps only one as in this case), we designate one
of the candidate keys as the primary key and move
on to normal forms
 FD rules are useful in developing Normal forms
21
Normal Forms
(1N Form)
 A relation is in first normal form if and only if
every attribute is single valued for each tuple or
 There be no repeating fields or groups in the row
 This means that each attribute in each row , or
each cell of the table, contains only one value
 First normal form say that the domains of
attributes of a relation are atomic
 The concept of 1NF is demonstrated on next
slide
22
Normal Forms
(1N Form)
Employee
Non-1NF: Name Address Dependant
Sohail Dar I-8 Islamabad Jamal,Kamal,Karim
Shoiab Ali Wah Cantt. Haider, Ihsan
Tahira Ijaz Peshawar Gul Mast
There are more than one values for
1NF Tables: Dependant, thus it is not in 1N form
Dependant
DependantName EmployeeName Employee
Jamal Sohail Dar Name Address
Kamal Sohail Dar Sohail Dar I-8 Islamabad
Karim Sohail Dar Shoiab Ali Wah Cantt.
Haider Shoiab Ali Tahira Ijaz Peshawar
Ihsan Shoiab Ali 23

Gul Mast Tahira Ijaz


Normal Forms
(1N Form)
 Note that the original table could be
reconstructed by combining these two tables
 By recording all the rows in the EMPLOYEE
table and combining them with the
corresponding rows in the EMPLOYEE table
where the names were equal (an equi-join
operation)

24
Normal Forms
(2N Form)
 A relation is in second normal form (2NF) if and only
if it is in first normal form and all the non-key
attributes are fully functional dependent on the key
 Partial dependencies are not allowed
 The only time we have to be concerned about 2NF
is when the key is composite
 Second normal form (2NF) addresses the concept
of removing duplicative data
 A relation that is not in 2NF exhibits the update,
insertion and deletion anomalies
 Consider the example on next slide 25
Normal Forms
(2N Form)
name job salary address
 Employee (name, job,
Asad Welder 2300 Peshawar
salary, address) Asad Programmer 30000 Peshawar
 FDs Karim Programmer 45000 Lahore

name+job → salary Javad Designer 40000 Lahore

name → address
 The problems developing here are redundancy and
anomalies
 Here address depends only on the name, not the
job; this is an example of a partial dependency
26
Normal Forms
(2N Form)
 As the table is in 1NF
 The process for transforming a 1NF table to 2NF is:
1. Identify any partial determinants (attribute on LHS) other
than the composite key, and the columns they determine
2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table to
the new table. The determinate becomes the primary key
of the new table
4. Delete the columns you just moved from the original table
except for the determinant which will serve as a foreign
key
5. The original table may be renamed to maintain semantic
meaning 27
Normal Forms
(2N Form)
 Consider the non-2NF name job salary address
2 relation Asad Welder 2300 Peshawar

3 Asad Programmer 30000 Peshawar


NameAdd
Karim Programmer 45000 Lahore
name address Javad Designer 40000 Lahore
Asad Peshawar
Karim Lahore 1
NameJob
Javad Lahore
name job salary 5
Asad Welder 2300
4
Asad Programmer 30000
Karim Programmer 45000
28
Javad Designer 40000
Normal Forms
(3N Form)
 A relational table is in 3NF if it is already in
2NF and every non-key column is non-
transitively dependent upon its primary key
 OR All non-key attributes are functionally
dependent only upon the primary key
 Transitive Dependency
 Transitive dependency occurs when one non-key
attribute determines another non-key attribute
 E.g.
STD(stId, stName, stAdr, prName, prCrdts)
29
Normal Forms
(3N Form)
stId → stName, stAdr, prName, prCrdts
prName → prCrdts
 Here stId is the key
 prCrdts can be determined by prName
(Transitive Dependency)
 Thus transitive dependency exists which
means STUDENT relation is not in 3NF
 Transitive dependencies cause insertion,
deletion, and update anomalies 30
Normal Forms
(3N Form)
 So for 3NF we will concentrate on relations with
one candidate key
 The process for transforming a 2NF table to 3NF is:
1. Identify any determinants, other the primary key, and the
columns they determine
2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table to
the new table. The determinate becomes the primary key
of the new table
4. Delete the columns you just moved from the original table
except for the determinate which will serve as a foreign
key
5. The original table may be renamed to maintain semantic
meaning 31
Normal Forms
(3N Form)
Non-3NF stdId stName stAdr prName prCrdts
S1020 Sohail Dar I-8 Islamabad MCS 64
S1038 Shoaib Ali G-6 Islamabad BCS 132
S1015 Tahira Ejaz L Rukh Wah MCS 64
2
1

3 5 4
Program Student
stdId stName stAdr prName
prName prCrdts
MCS 64 S1020 Sohail Dar I-8 Islamabad MCS

BCS 132 S1038 Shoaib Ali G-6 Islamabad BCS


S1015 Tahira Ejaz L Rukh Wah MCS
32
Normal Forms
(Boyce-Codd Normal Form (BCNF))
 1NF and 2NF identifies partial and transitive
dependencies on the basis of primary keys
 But what if such dependencies remain on other
candidate keys, if any exist
 BCNF identifies the functional dependencies on all
candidate keys
 It means that if relation has one candidate key and it
is in 3NF, it is also in BCNF
 3NF relation is not in BCNF if:
 The candidate keys in the relation are composite keys
 There is more than one candidate key in the relation 33
Normal Forms
(Boyce-Codd Normal Form (BCNF))
 The keys are not disjoint, that is, some attributes in the
keys are common
 For example consider the Enroll relation
Enroll (sno, sname, cno, cname, dateofenroll)
 Let us assume that the relation has the following
candidate keys:
sno,cno
sno,cname
sname,cno
sname, cname
 The relation is in 3NF but not in BCNF because
there are dependencies
34
Normal Forms
(Boyce-Codd Normal Form (BCNF))
sno → sname
cno → cname
 Where attributes are part of a candidate key are
dependent on part of another candidate key
 By decomposing the Enroll relation in the following
three relations result in BCNF
Std(sno, sname)
Crs(cno, cname)
Std_Crs(sno, cno, dateofenroll)

35
Normal Forms
(Boyce-Codd Normal Form (BCNF))
Non-BCNF sno sname cno cname dateofenroll
S1020 Sohail Dar C104 E-Com 12/02/2007
S1038 Shoaib Ali C104 E-Com 10/01/2008
S1015 Tahira Ejaz H234 English 01/02/2007
Crs
cno cname
Std
C104 E-Com
sno sname H234 English
S1020 Sohail Dar Std_Crs
S1038 Shoaib Ali sno cname datofenroll
S1015 Tahira Ejaz S1020 C104 10/01/2008
S1038 C104 01/02/2007
36
S1015 H234 12/02/2007
Other Normal Forms
 5NF deals with multi-valued dependency and
possible loss less decompositions
 Domain Key Normal Form (DKNF) reduces
further chances of any possible inconsistency

37

You might also like