De Unit 3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

UNIT-3

Relational database design (RDD)


Relational database design (RDD) models’ information and data into a set of tables with
rows and columns. Each row of a relation/table represents a record, and each column
represents an attribute of data. The Structured Query Language (SQL) is used to manipulate
relational databases. The design of a relational database is composed of four stages, where
the data are modeled into a set of related tables. The stages are −

• Define relations/attributes
• Define primary keys
• Define relationships
• Normalization
Relational databases differ from other databases in their approach to organizing data and
performing transactions. In an RDD, the data are organized into tables and all types of data
access are carried out via controlled transactions. Relational database design satisfies the
ACID (atomicity, consistency, integrity, and durability) properties required from a database
design. Relational database design mandates the use of a database server in applications for
dealing with data management problems.

Relational Database Design Process


Database design is more art than science, as you have to make many decisions. Databases
are usually customized to suit a particular application. No two customized applications are
alike, and hence, no two databases are alike. Guidelines (usually in terms of what not to do
instead of what to do) are provided in making these design decision, but the choices
ultimately rest on the designer.
Step 1 − Define the Purpose of the Database (Requirement Analysis)

• Gather the requirements and define the objective of your database.


• Drafting out the sample input forms, queries and reports often help.
Step 2 − Gather Data, Organize in tables and Specify the Primary Keys
• Once you have decided on the purpose of the database, gather the data that are
needed to be stored in the database. Divide the data into subject-based tables.
• Choose one column (or a few columns) as the so-called primary key, which uniquely
identifies the each of the rows.
Step 3 − Create Relationships among Tables
A database consisting of independent and unrelated tables serves little purpose (you may
consider using a spreadsheet instead). The power of a relational database lies in the
relationship that can be defined between tables. The most crucial aspect in designing a
relational database is to identify the relationships among tables. The types of relationship
include:

• one-to-many
• many-to-many
• one-to-one
One-to-Many
In a "class roster" database, a teacher may teach zero or more classes, while a class is taught
by one (and only one) teacher. In a "company" database, a manager manages zero or more
employees, while an employee is managed by one (and only one) manager. In a "product
sales" database, a customer may place many orders; while an order is placed by one
particular customer. This kind of relationship is known as one-to-many.
Many-to-Many
In a "product sales" database, a customer's order may contain one or more products; and a
product can appear in many orders. In a "bookstore" database, a book is written by one or
more authors; while an author may write zero or more books. This kind of relationship is
known as many-to-many.
One-to-One
In a "product sales" database, a product may have optional supplementary information such
as image, more description and comment. Keeping them inside the Products table results in
many empty spaces (in those records without these optional data). Furthermore, these large
data may degrade the performance of the database.
Instead, we can create another table (say ProductDetails, ProductLines or ProductExtras) to
store the optional data. A record will only be created for those products with optional data.
The two tables, Products and ProductDetails, exhibit a one-to-one relationship. That is, for
every row in the parent table, there is at most one row (possibly zero) in the child table. The
same column productID should be used as the primary key for both tables.
Column Data Types
You need to choose an appropriate data type for each column. Commonly data types
include integers, floating-point numbers, string (or text), date/time, binary, collection (such
as enumeration and set).
Step 4 − Refine & Normalize the Design
For example,

• adding more columns,


• create a new table for optional data using one-to-one relationship,
• split a large table into two smaller tables,
• Other methods.
Normalization
Normalization
o Normalization is the process of organizing the data in the database.

o Normalization is used to minimize the redundancy from a relation or set of relations. It is


also used to eliminate the undesirable characteristics like Insertion, Update and Deletion
Anomalies.

o Normalization divides the larger table into the smaller table and links them using
relationship.

o The normal form is used to reduce redundancy from the database table.

Types of Normal Forms


There are the four types of normal forms:

Normal Description
Form

1NF A relation is in 1NF if it contains an atomic value.

2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functional dependent on the primary key.

3NF A relation will be in 3NF if it is in 2NF and no transition dependency


exists.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.

5NF A relation is in 5NF if it is in 4NF and not contains any join dependency
and joining should be lossless.

Apply the so-called normalization rules to check whether your database is structurally
correct and optimal.
First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of
values. This property is known as atomic. 1NF also prohibits a repeating group of columns
such as item1, item2, itemN. Instead, you should create another table using a one-to-many
relationship.
Second Normal Form (2NF) − A table is 2NF if it is 1NF and every non-key column is fully
dependent on the primary key. Furthermore, if the primary key is made up of several
columns, every non-key column shall depend on the entire set and not part of it.
For example, the primary key of the OrderDetails table comprising orderID and productID. If
unitPrice is dependent only on productID, it shall not be kept in the OrderDetails table (but
in the Products table). On the other hand, if the unit price is dependent on the product as
well as the particular order, then it shall be kept in the OrderDetails table.
Third Normal Form (3NF) − A table is 3NF if it is 2NF and the non-key columns are
independent of each other. In other words, the non-key columns are dependent on primary
key, only on the primary key and nothing else. For example, suppose that we have a
Products table with columns productID (primary key), name and unitPrice. The column
discountRate shall not belong to the Products table if it is also dependent on the unitPrice,
which is not part of the primary key.
Higher Normal Form: 3NF has its inadequacies, which leads to a higher Normal form, such
as Boyce/Codd Normal form

At times, you may decide to break some of the normalization rules, for performance reason
(e.g., create a column called totalPrice in Orders table which can be derived from the
orderDetails records); or because the end-user requested for it. Make sure that you fully
aware of it, develop programming logic to handle it, and properly document the decision.
Integrity Rules
You should also apply the integrity rules to check the integrity of your design −
1. Entity Integrity Rule − The primary key cannot contain NULL. Otherwise, it cannot
uniquely identify the row. For composite key made up of several columns, none of the
columns can contain NULL. Most of the RDBMS check and enforce this rule.
2.Referential Integrity Rule − Each foreign key value must be matched to a primary key
value in the table referenced (or parent table).
You can insert a row with a foreign key in the child table only if the value exists in the parent
table.
If the value of the key changes in the parent table (e.g., the row updated or deleted), all
rows with this foreign key in the child table(s) must be handled accordingly. You could either
(a) disallow the changes; (b) cascade the change (or delete the records) in the child tables
accordingly; (c) set the key value in the child tables to NULL.
Most RDBMS can be set up to perform the check and ensure the referential integrity, in a
specified manner.
3.Business logic Integrity − Besides the above two general integrity rules, there could be
integrity (validation) pertaining to the business logic, e.g., zip code shall be 5-digit within a
certain ranges, delivery date and time shall fall in the business hours; quantity ordered shall
be equal or less than quantity in stock, etc. These could be carried out invalidation rule (for
the specific column) or programming logic.
Column Indexing
You could create an index on the selected column(s) to facilitate data searching and
retrieval. An index is a structured file that speeds up data access for SELECT but may slow
down INSERT, UPDATE, and DELETE. Without an index structure, to process a SELECT query
with a matching criterion (e.g., SELECT * FROM Customers WHERE name='Tan Ah Teck'), the
database engine needs to compare every record in the table. A specialized index (e.g., in
BTREE structure) could reach the record without comparing every record. However, the
index needs to be rebuilt whenever a record is changed, which results in overhead
associated with using indexes.
The index can be defined on a single column, a set of columns (called concatenated index),
or part of a column (e.g., first 10 characters of a VARCHAR(100)) (called partial index). You
could build more than one index in a table. For example, if you often search for a customer
using either customerName or phone number, you could speed up the search by building an
index on column customerName, as well as phoneNumber. Most RDBMS builds an index on
the primary key automatically.

What is Functional Dependency


Functional dependency in DBMS, as the name suggests is a relationship between attributes
of a table dependent on each other. Introduced by E. F. Codd, it helps in preventing data
redundancy and gets to know about bad designs.
To understand the concept thoroughly, let us consider P is a relation with attributes A and B.
Functional Dependency is represented by -> (arrow sign)
Then the following will represent the functional dependency between attributes with an
arrow sign −
A -> B

Above suggests the following:

Example
The following is an example that would make it easier to understand functional dependency

We have a <Department> table with two attributes − DeptId and DeptName.

DeptId = Department ID
DeptName = Department Name

The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute.
This is because if you want to know the department name, then at first you need to have
the DeptId.

DeptId DeptName

001 Finance

002 Marketing

003 HR

Therefore, the above functional dependency between DeptId and DeptName can be
determined as DeptId is functionally dependent on DeptName −

DeptId -> DeptName

Types of Functional Dependency


Functional Dependency has three forms −

• Trivial Functional Dependency


• Non-Trivial Functional Dependency
• Completely Non-Trivial Functional Dependency
Let us begin with Trivial Functional Dependency −
Trivial Functional Dependency
It occurs when B is a subset of A in −

A ->B

Example
We are considering the same <Department> table with two attributes to understand the
concept of trivial dependency.
The following is a trivial functional dependency since DeptId is a subset
of DeptId and DeptName

{ DeptId, DeptName } -> Dept Id

Non –Trivial Functional Dependency


It occurs when B is not a subset of A in −
A ->B

Example

DeptId -> DeptName

The above is a non-trivial functional dependency since DeptName is a not a subset of


DeptId.
Completely Non - Trivial Functional Dependency
It occurs when A intersection B is null in −

A ->B

Armstrong’s Axioms Property of Functional Dependency


Armstrong’s Axioms property was developed by William Armstrong in 1974 to reason about
functional dependencies.
The property suggests rules that hold true if the following are satisfied:

• Transitivity
If A->B and B->C, then A->C i.e. a transitive relation.
• Reflexivity
A-> B, if B is a subset of A.
• Augmentation
The last rule suggests: AC->BC, if A->B
Data Anomalies
Normalization is necessary if you do not do it then the overall integrity of the data stored in
the database will eventually degrade. Specifically, this is due to data anomalies. These
anomalies naturally occur and result in data that does not match the real-world the
database purports to represent.

Anomalies are caused when there is too much redundancy in the database's information.
Anomalies can often be caused when the tables that make up the database suffer from poor
construction. So, what does "poor construction" mean? Poor table design will become
evident if, when the designer creates the database, he doesn't identify the entities that
depend on each other for existence, like the rooms of a hotel and the hotel, and then
minimize the chance that one would ever exist independent of the other.

The normalization process was created largely in order to reduce the negative effects of
creating tables that will introduce anomalies into the database.

There are three types of Data Anomalies: Update Anomalies, Insertion Anomalies, and
Deletion Anomalies.

Update Anomalies happen when the person charged with the task of keeping all the
records current and accurate, is asked, for example, to change an employee’s title due to a
promotion. If the data is stored redundantly in the same table, and the person misses any of
them, then there will be multiple titles associated with the employee. The end user has no
way of knowing which is the correct title.

Insertion Anomalies happen when inserting vital data into the database is not possible
because other data is not already there. For example, if a system is designed to require that
a customer be on file before a sale can be made to that customer, but you cannot add a
customer until they have bought something, then you have an insert anomaly. It is the
classic "catch-22" situation.

Deletion Anomalies happen when the deletion of unwanted information causes desired
information to be deleted as well. For example, if a single database record contains
information about a particular product along with information about a salesperson for the
company and the salesperson quits, then information about the product is deleted along
with salesperson information.

You might also like