De Unit 3
De Unit 3
De Unit 3
• Define relations/attributes
• Define primary keys
• Define relationships
• Normalization
Relational databases differ from other databases in their approach to organizing data and
performing transactions. In an RDD, the data are organized into tables and all types of data
access are carried out via controlled transactions. Relational database design satisfies the
ACID (atomicity, consistency, integrity, and durability) properties required from a database
design. Relational database design mandates the use of a database server in applications for
dealing with data management problems.
• one-to-many
• many-to-many
• one-to-one
One-to-Many
In a "class roster" database, a teacher may teach zero or more classes, while a class is taught
by one (and only one) teacher. In a "company" database, a manager manages zero or more
employees, while an employee is managed by one (and only one) manager. In a "product
sales" database, a customer may place many orders; while an order is placed by one
particular customer. This kind of relationship is known as one-to-many.
Many-to-Many
In a "product sales" database, a customer's order may contain one or more products; and a
product can appear in many orders. In a "bookstore" database, a book is written by one or
more authors; while an author may write zero or more books. This kind of relationship is
known as many-to-many.
One-to-One
In a "product sales" database, a product may have optional supplementary information such
as image, more description and comment. Keeping them inside the Products table results in
many empty spaces (in those records without these optional data). Furthermore, these large
data may degrade the performance of the database.
Instead, we can create another table (say ProductDetails, ProductLines or ProductExtras) to
store the optional data. A record will only be created for those products with optional data.
The two tables, Products and ProductDetails, exhibit a one-to-one relationship. That is, for
every row in the parent table, there is at most one row (possibly zero) in the child table. The
same column productID should be used as the primary key for both tables.
Column Data Types
You need to choose an appropriate data type for each column. Commonly data types
include integers, floating-point numbers, string (or text), date/time, binary, collection (such
as enumeration and set).
Step 4 − Refine & Normalize the Design
For example,
o Normalization divides the larger table into the smaller table and links them using
relationship.
o The normal form is used to reduce redundancy from the database table.
Normal Description
Form
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully
functional dependent on the primary key.
5NF A relation is in 5NF if it is in 4NF and not contains any join dependency
and joining should be lossless.
Apply the so-called normalization rules to check whether your database is structurally
correct and optimal.
First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of
values. This property is known as atomic. 1NF also prohibits a repeating group of columns
such as item1, item2, itemN. Instead, you should create another table using a one-to-many
relationship.
Second Normal Form (2NF) − A table is 2NF if it is 1NF and every non-key column is fully
dependent on the primary key. Furthermore, if the primary key is made up of several
columns, every non-key column shall depend on the entire set and not part of it.
For example, the primary key of the OrderDetails table comprising orderID and productID. If
unitPrice is dependent only on productID, it shall not be kept in the OrderDetails table (but
in the Products table). On the other hand, if the unit price is dependent on the product as
well as the particular order, then it shall be kept in the OrderDetails table.
Third Normal Form (3NF) − A table is 3NF if it is 2NF and the non-key columns are
independent of each other. In other words, the non-key columns are dependent on primary
key, only on the primary key and nothing else. For example, suppose that we have a
Products table with columns productID (primary key), name and unitPrice. The column
discountRate shall not belong to the Products table if it is also dependent on the unitPrice,
which is not part of the primary key.
Higher Normal Form: 3NF has its inadequacies, which leads to a higher Normal form, such
as Boyce/Codd Normal form
At times, you may decide to break some of the normalization rules, for performance reason
(e.g., create a column called totalPrice in Orders table which can be derived from the
orderDetails records); or because the end-user requested for it. Make sure that you fully
aware of it, develop programming logic to handle it, and properly document the decision.
Integrity Rules
You should also apply the integrity rules to check the integrity of your design −
1. Entity Integrity Rule − The primary key cannot contain NULL. Otherwise, it cannot
uniquely identify the row. For composite key made up of several columns, none of the
columns can contain NULL. Most of the RDBMS check and enforce this rule.
2.Referential Integrity Rule − Each foreign key value must be matched to a primary key
value in the table referenced (or parent table).
You can insert a row with a foreign key in the child table only if the value exists in the parent
table.
If the value of the key changes in the parent table (e.g., the row updated or deleted), all
rows with this foreign key in the child table(s) must be handled accordingly. You could either
(a) disallow the changes; (b) cascade the change (or delete the records) in the child tables
accordingly; (c) set the key value in the child tables to NULL.
Most RDBMS can be set up to perform the check and ensure the referential integrity, in a
specified manner.
3.Business logic Integrity − Besides the above two general integrity rules, there could be
integrity (validation) pertaining to the business logic, e.g., zip code shall be 5-digit within a
certain ranges, delivery date and time shall fall in the business hours; quantity ordered shall
be equal or less than quantity in stock, etc. These could be carried out invalidation rule (for
the specific column) or programming logic.
Column Indexing
You could create an index on the selected column(s) to facilitate data searching and
retrieval. An index is a structured file that speeds up data access for SELECT but may slow
down INSERT, UPDATE, and DELETE. Without an index structure, to process a SELECT query
with a matching criterion (e.g., SELECT * FROM Customers WHERE name='Tan Ah Teck'), the
database engine needs to compare every record in the table. A specialized index (e.g., in
BTREE structure) could reach the record without comparing every record. However, the
index needs to be rebuilt whenever a record is changed, which results in overhead
associated with using indexes.
The index can be defined on a single column, a set of columns (called concatenated index),
or part of a column (e.g., first 10 characters of a VARCHAR(100)) (called partial index). You
could build more than one index in a table. For example, if you often search for a customer
using either customerName or phone number, you could speed up the search by building an
index on column customerName, as well as phoneNumber. Most RDBMS builds an index on
the primary key automatically.
Example
The following is an example that would make it easier to understand functional dependency
−
We have a <Department> table with two attributes − DeptId and DeptName.
DeptId = Department ID
DeptName = Department Name
The DeptId is our primary key. Here, DeptId uniquely identifies the DeptName attribute.
This is because if you want to know the department name, then at first you need to have
the DeptId.
DeptId DeptName
001 Finance
002 Marketing
003 HR
Therefore, the above functional dependency between DeptId and DeptName can be
determined as DeptId is functionally dependent on DeptName −
A ->B
Example
We are considering the same <Department> table with two attributes to understand the
concept of trivial dependency.
The following is a trivial functional dependency since DeptId is a subset
of DeptId and DeptName
Example
A ->B
• Transitivity
If A->B and B->C, then A->C i.e. a transitive relation.
• Reflexivity
A-> B, if B is a subset of A.
• Augmentation
The last rule suggests: AC->BC, if A->B
Data Anomalies
Normalization is necessary if you do not do it then the overall integrity of the data stored in
the database will eventually degrade. Specifically, this is due to data anomalies. These
anomalies naturally occur and result in data that does not match the real-world the
database purports to represent.
Anomalies are caused when there is too much redundancy in the database's information.
Anomalies can often be caused when the tables that make up the database suffer from poor
construction. So, what does "poor construction" mean? Poor table design will become
evident if, when the designer creates the database, he doesn't identify the entities that
depend on each other for existence, like the rooms of a hotel and the hotel, and then
minimize the chance that one would ever exist independent of the other.
The normalization process was created largely in order to reduce the negative effects of
creating tables that will introduce anomalies into the database.
There are three types of Data Anomalies: Update Anomalies, Insertion Anomalies, and
Deletion Anomalies.
Update Anomalies happen when the person charged with the task of keeping all the
records current and accurate, is asked, for example, to change an employee’s title due to a
promotion. If the data is stored redundantly in the same table, and the person misses any of
them, then there will be multiple titles associated with the employee. The end user has no
way of knowing which is the correct title.
Insertion Anomalies happen when inserting vital data into the database is not possible
because other data is not already there. For example, if a system is designed to require that
a customer be on file before a sale can be made to that customer, but you cannot add a
customer until they have bought something, then you have an insert anomaly. It is the
classic "catch-22" situation.
Deletion Anomalies happen when the deletion of unwanted information causes desired
information to be deleted as well. For example, if a single database record contains
information about a particular product along with information about a salesperson for the
company and the salesperson quits, then information about the product is deleted along
with salesperson information.