DBMS Notes UNIT 3-4
UNIT-III
Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in insertion, update, and deletion anomalies and makes the relation difficult to maintain.
So to handle these problems, we should analyze and decompose the relations with redundant data
into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller tables and links them using relationships.
The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is to remove these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that help you create a good
database structure.
o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into
a relation due to a lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of data results in the
unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated, as illustrated in the sketch below.
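The sketch below (plain Python, no DBMS) illustrates the update anomaly with a hypothetical, unnormalized EMPLOYEE table in which the department location is repeated in every row; the table, column names, and values are made up purely for illustration.

```python
# A minimal sketch of an update anomaly in an unnormalized relation:
# the department location is repeated in every employee row, so changing
# it requires touching many rows.

employee = [
    {"emp_id": 1, "emp_name": "Asha",  "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 2, "emp_name": "Ravi",  "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 3, "emp_name": "Meera", "dept": "HR",    "dept_location": "Delhi"},
]

# Update anomaly: moving the Sales department means updating every Sales row.
# Missing even one row would leave the data inconsistent.
for row in employee:
    if row["dept"] == "Sales":
        row["dept_location"] = "Mumbai"

# After decomposition, the same fact is stored exactly once.
employee_n = [
    {"emp_id": 1, "emp_name": "Asha",  "dept": "Sales"},
    {"emp_id": 2, "emp_name": "Ravi",  "dept": "Sales"},
    {"emp_id": 3, "emp_name": "Meera", "dept": "HR"},
]
department = {"Sales": "Pune", "HR": "Delhi"}
department["Sales"] = "Mumbai"   # one update, no chance of inconsistency
```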
Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.
Normal Form   Description
1NF           A relation is in 1NF if it contains only atomic (single) values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.
BCNF          A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y, X is a super key.
4NF           A relation is in 4NF if it is in BCNF and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF and contains no join dependency; joining should be lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o It helps to ensure data consistency within the database.
o It avoids insertion, update, and deletion anomalies.
o It results in a more flexible and better-organized database design.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
Fourth Normal Form (4NF)
A multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute. A multivalued dependency involves at least two attributes that are
dependent on a third attribute, which is why it always requires at least three attributes.
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A multiple values of B exist, then the relation
contains a multi-valued dependency.
Example
STUDENT
STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
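The following sketch, using Python's built-in sqlite3 module, loads the decomposed STUDENT_COURSE and STUDENT_HOBBY tables from the example above; the added hobby row is hypothetical and only shows that, after the 4NF decomposition, recording a new hobby touches a single table.

```python
# A small sketch of the 4NF decomposition shown above: courses and hobbies
# are stored in separate tables, so adding a new hobby for a student does not
# force us to repeat (or invent) any course information.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE STUDENT_COURSE (STU_ID INTEGER, COURSE TEXT)")
cur.execute("CREATE TABLE STUDENT_HOBBY  (STU_ID INTEGER, HOBBY  TEXT)")

cur.executemany("INSERT INTO STUDENT_COURSE VALUES (?, ?)",
                [(21, "Computer"), (21, "Math"), (34, "Chemistry"),
                 (74, "Biology"), (59, "Physics")])
cur.executemany("INSERT INTO STUDENT_HOBBY VALUES (?, ?)",
                [(21, "Dancing"), (21, "Singing"), (34, "Dancing"),
                 (74, "Cricket"), (59, "Hockey")])

# Student 21 picks up a new hobby: only STUDENT_HOBBY is touched.
cur.execute("INSERT INTO STUDENT_HOBBY VALUES (?, ?)", (21, "Painting"))

print(cur.execute("SELECT HOBBY FROM STUDENT_HOBBY WHERE STU_ID = 21").fetchall())
con.close()
```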
Fifth Normal Form (5NF)
o A relation is in 5NF if it is in 4NF and does not contain any join dependency, and joining should be
lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take
the Math class for Semester 2. In this case, a combination of all these fields is required to identify valid
data.
Suppose we add a new Semester 3 but do not yet know the subject or who will
be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns together
act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
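As a rough check of the join dependency, the sketch below represents the relations as sets of tuples, builds the projections P1, P2 and P3 from the base relation of this example, and verifies that their natural join reproduces the base relation without spurious tuples; the tuple representation is an illustrative assumption, not a DBMS feature.

```python
# A sketch (plain Python, relations as sets of tuples) checking that the
# three projections P1, P2 and P3 above join back losslessly to the base
# relation of the example, which is what 5NF / the join dependency requires.

base = {
    ("Computer",  "Anshika", "Semester 1"),
    ("Computer",  "John",    "Semester 1"),
    ("Math",      "John",    "Semester 1"),
    ("Math",      "Akash",   "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, subj) for (subj, _lect, sem) in base}      # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for (subj, lect, _sem) in base}     # (SUBJECT, LECTURER)
p3 = {(sem, lect) for (_subj, lect, sem) in base}      # (SEMESTER, LECTURER)

# Natural join of the three projections.
rejoined = {
    (subj, lect, sem)
    for (sem, subj) in p1
    for (s2, lect) in p2 if s2 == subj
    if (sem, lect) in p3
}

assert rejoined == base   # lossless: no tuples lost, none spurious
print("Join of P1, P2, P3 reproduces the original relation:", rejoined == base)
```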
Domain-Key Normal Form (DKNF)
There is no hard and fast rule for defining normal forms beyond 5NF. Historically, the process
of normalization and the process of discovering undesirable dependencies were carried
through 5NF, but it has been possible to define a stricter normal form that takes into
account additional types of dependencies and constraints.
The basic idea behind the DKNF is to specify the normal form that takes into account
all the possible dependencies and constraints.
In simple words, we can say that DKNF is a normal form used in database
normalization which requires that the database contains no constraints other than
domain constraints and key constraints.
In other words, a relation schema is said to be in DKNF only if all the constraints and
dependencies that should hold on the valid relation state can be enforced simply by
enforcing the domain constraints and the key constraints on the relation. For a relation
in DKNF, it becomes very straightforward to enforce all the database constraints by
simply checking that each attribute value in a tuple is of the appropriate domain and that
every key constraint is enforced.
Reasons to use DKNF are as follows:
1. To avoid general constraints in the database that are not clear domain or key constraints.
2. Most databases can easily test or check domain and key constraints on attributes.
However, because of the difficulty of including complex constraints in a DKNF relation,
its practical utility is limited; it is rarely used in practice, since it may be
quite difficult to specify general integrity constraints.
Let’s understand this by taking an example:
Example –
Consider relations CAR (MAKE, vin#) and MANUFACTURE (vin#, country),
where vin# represents the vehicle identification number and 'country' represents the name
of the country where the vehicle is manufactured.
A general constraint may be of the following form:
If the MAKE is either 'HONDA' or 'MARUTI', then the first character of the vin# is a
'B' if the country of manufacture is 'INDIA'.
If the MAKE is 'FORD' or 'ACCURA', then the second character of the vin# is a 'B' if the
country of manufacture is 'INDIA'.
There is no simplified way to represent such constraints short of writing a procedure or
general assertion to test them; hence such a procedure is needed to enforce the appropriate
integrity constraint. However, transforming a higher normal form into domain/key
normal form is not always a dependency-preserving transformation, and it is not
always possible.
Domain-Key Normal Form (DKNF) is sometimes informally called 6NF.
Relations in DKNF have no constraints other than domain constraints and key constraints.
Domain constraints
Domain constraints specify the possible values of the attribute.
E.g. The only colors of hair are black and white
E.g. The CGPA of a student is between 0 and 4.
General constraints
General constraints are any other constraints that can be expressed in first-order logic.
E.g. If the first digit of a bank account number is 2, then the balance of the account is
greater than or equal to 4000.
Consider a table account(AccountNo, Balance, Type).
This design is logically wrong because Balance, a non-prime attribute, determines Type; in this case,
a transitive dependency is occurring.
Domain constraint: If Regular_Account, then the first digit of AccountNo will be 2 and
balance < 4000
Domain constraint: If Executive_Account, then the first digit of AccountNo will be 1 and
balance ≥ 4000.
Solution: Create separate tables for Regular_Account and Executive_Account, each with attributes
(AccountNo, Balance).
Table: Regular_Account (AccountNo, Balance)
Table: Executive_Account (AccountNo, Balance)
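A minimal sketch of the DKNF idea, using sqlite3, is given below: after the decomposition, the business rules stated above can be enforced purely as domain (CHECK) constraints and key constraints on each table. Storing AccountNo as text so that its first digit can be checked, and the sample values, are assumptions made for illustration.

```python
# After decomposition, every rule is a domain (CHECK) or key constraint,
# which is exactly what DKNF asks for.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Regular_Account (
        AccountNo TEXT PRIMARY KEY CHECK (AccountNo LIKE '2%'),
        Balance   INTEGER NOT NULL  CHECK (Balance < 4000)
    )""")
con.execute("""
    CREATE TABLE Executive_Account (
        AccountNo TEXT PRIMARY KEY CHECK (AccountNo LIKE '1%'),
        Balance   INTEGER NOT NULL  CHECK (Balance >= 4000)
    )""")

con.execute("INSERT INTO Regular_Account   VALUES ('2001', 1500)")   # satisfies both checks
con.execute("INSERT INTO Executive_Account VALUES ('1001', 9000)")

try:
    # Violates the domain constraint: a regular account with balance >= 4000.
    con.execute("INSERT INTO Regular_Account VALUES ('2002', 8000)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
con.close()
```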
UNIT-IV
Log-Based Recovery
To be able to recover from transaction failures, the system maintains a log. It contains
information about the start and end of each transaction and any updates
that occur during the transaction. The log keeps track of all transaction operations
that affect the values of database items. This information is needed to recover from
transaction failures.
The log is kept on disk. The main types of log entries are listed below; a small sketch of how such
records might be represented follows the list.
o start_transaction(T): This log entry records that transaction T starts its execution.
o read_item(T, X): This log entry records that transaction T reads the value of
database item X.
o write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of database item X from old_value to new_value. The old
value is sometimes known as the before-image of X, and the new value is known as
the after-image of X.
o commit(T): This log entry records that transaction T has completed all accesses to
the database successfully and that its effect can be committed (recorded permanently) to
the database.
o abort(T): This records that transaction T has been aborted.
o checkpoint: A checkpoint is a mechanism where all the previous logs are removed
from main memory and stored permanently on the storage disk. A checkpoint declares a
point before which the DBMS was in a consistent state and all the transactions were
committed.
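The sketch below shows one illustrative way such log records could be represented in Python; the dataclass fields and sample values are assumptions, not any real DBMS's log format.

```python
# A minimal sketch of the log record types listed above, represented as a
# Python dataclass appended to an in-memory log.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LogRecord:
    kind: str                      # 'start', 'write', 'commit', 'abort', 'checkpoint'
    txn: Optional[str] = None      # transaction id, e.g. 'T1'
    item: Optional[str] = None     # database item, e.g. 'X'
    old_value: object = None       # before-image (for 'write')
    new_value: object = None       # after-image  (for 'write')

log: List[LogRecord] = []
log.append(LogRecord("start", "T1"))
log.append(LogRecord("write", "T1", "X", old_value=100, new_value=80))
log.append(LogRecord("commit", "T1"))
log.append(LogRecord("checkpoint"))
print(log)
```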
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully i.e. the transaction has reached the point at
which it will not abort (terminate without completing). Once committed, the
transaction is permanently recorded in the database. Commitment always involves
writing a commit entry to the log and writing the log to disk. At the time of a system
crash, the log is searched backwards for all transactions T that have written a
start_transaction(T) entry into the log but have not written a commit(T) entry yet; these
transactions may have to be rolled back to undo their effect on the database during the
recovery process.
Undoing – If a transaction crashes, then the recovery manager may undo the
transaction, i.e. reverse its operations. This involves examining the log for every
write_item(T, X, old_value, new_value) entry of the transaction and setting the
value of item X in the database back to old_value. There are two major techniques for
recovery from non-catastrophic transaction failures: deferred updates and immediate
updates.
Deferred update – This technique does not physically update the database on disk
until a transaction has reached its commit point. Before reaching commit, all
transaction updates are recorded in the local transaction workspace. If a transaction
fails before reaching its commit point, it will not have changed the database in any
way so UNDO is not needed. It may be necessary to REDO the effect of the
operations that are recorded in the local transaction workspace, because their effect
may not yet have been written to the database. Hence, deferred update is also
known as the NO-UNDO/REDO algorithm.
Immediate update – In the immediate update, the database may be updated by
some operations of a transaction before the transaction reaches its commit point.
However, these operations are recorded in a log on disk before they are applied to
the database, making recovery still possible. If a transaction fails before reaching its commit
point, the effect of its operations must be undone, i.e. the transaction must be rolled
back; hence we require both undo and redo. This technique is known as the UNDO/REDO
algorithm.
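The sketch below illustrates the UNDO/REDO idea on a simplified in-memory log and database; the log layout and values are assumptions chosen only to show uncommitted writes being undone and committed writes being redone.

```python
# A sketch of the UNDO/REDO (immediate update) recovery idea described above,
# working over a simplified log of tuples. This is illustrative pseudologic,
# not any particular DBMS's recovery manager.

# (kind, txn, item, old_value, new_value)
log = [
    ("start",  "T1", None, None, None),
    ("write",  "T1", "X",  100,  80),
    ("commit", "T1", None, None, None),
    ("start",  "T2", None, None, None),
    ("write",  "T2", "Y",  50,   70),      # T2 never commits before the crash
]

# Database state on disk at the time of the crash (T2's update already applied).
database = {"X": 80, "Y": 70}

committed = {rec[1] for rec in log if rec[0] == "commit"}

# UNDO: scan the log backwards and restore before-images of uncommitted writes.
for kind, txn, item, old, new in reversed(log):
    if kind == "write" and txn not in committed:
        database[item] = old

# REDO: scan the log forwards and reapply after-images of committed writes.
for kind, txn, item, old, new in log:
    if kind == "write" and txn in committed:
        database[item] = new

print(database)   # {'X': 80, 'Y': 50} -> T1's effect kept, T2's rolled back
```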
Caching/Buffering – In this technique, one or more disk pages that include the data items to be
updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers called the DBMS
cache is kept under the control of the DBMS for holding these pages. A directory is
used to keep track of which database items are in the buffers. A dirty bit is associated
with each buffer; it is 0 if the buffer has not been modified and 1 if it has.
Shadow paging – Shadow paging provides atomicity and durability. A directory with n entries is
constructed, where the i-th entry points to the i-th database page on disk. When a
transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a new page is allocated in which the changes are
made, and when the transaction is ready to become durable, all directory entries that refer
to the original page are updated to refer to the new replacement page.
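A minimal sketch of shadow paging follows; the page contents and directory layout are illustrative assumptions, intended only to show that commit or abort reduces to switching between the current and shadow directories.

```python
# A sketch of the shadow paging idea above: the current page directory is
# copied to a shadow directory, updated pages go to fresh pages, and commit
# is an atomic switch of the directory. Page contents are strings purely
# for illustration.

pages = {0: "page0-v1", 1: "page1-v1", 2: "page2-v1"}   # "disk" pages
current_dir = {0: 0, 1: 1, 2: 2}                        # logical page -> physical page

# Transaction starts: keep an untouched shadow copy of the directory.
shadow_dir = dict(current_dir)

# Transaction modifies logical page 1: allocate a new physical page for it.
new_phys = max(pages) + 1
pages[new_phys] = "page1-v2"
current_dir[1] = new_phys

commit = True
if commit:
    pass                      # current_dir already points at the new pages
else:
    current_dir = shadow_dir  # abort: simply fall back to the shadow directory

print(pages[current_dir[1]])  # 'page1-v2' after commit, 'page1-v1' after abort
```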
Backward Recovery – The terms "rollback" and "UNDO" also refer to backward recovery.
When a backup of the data is not available and previous modifications need to be
undone, this technique can be helpful. With the backward recovery method, the unwanted
modifications are removed and the database is returned to its prior state. All changes made
during the previous transaction are reversed during backward recovery; in other words,
the erroneous database updates are undone.
Forward Recovery – The terms "roll forward" and "REDO" refer to forward recovery.
When a database needs to be brought up to date with all verified changes, this forward
recovery technique is helpful.
The modifications of committed transactions are reapplied to the database to roll those
changes forward. In other words, the database is restored using the preserved data, and
the valid transactions recorded since the last save are reapplied.
Some of the backup techniques are as follows:
Full database backup – In this, the full database, including the data and the database
meta-information needed to restore the whole database (such as full-text catalogs), is
backed up at predefined intervals.
Differential backup – It stores only the data changes that have occurred since the
last full database backup. When some data has changed many times since the last full
database backup, a differential backup stores only the most recent version of the changed
data. To restore from it, the full database backup must be restored first.
Transaction log backup – In this, all events that have occurred in the database, i.e.
a record of every single statement executed, are backed up. It is the backup of the
transaction log entries and contains all transactions that have happened to the
database. Through this, the database can be recovered to a specific point in time. It
is even possible to recover from a transaction log backup if the data files are
destroyed, without losing a single committed transaction.
Concurrency Control
We will now discuss the protocols and the problems related to concurrency control in DBMS.
1. Lock based
The lock-based protocol applies a lock condition on a data element, which restricts other
transactions from performing read and write operations on that element while the lock is active.
There are mainly two types of locks: the shared (read-only) lock and the exclusive lock.
2. Validation based
The validation-based protocol is also known as an optimistic concurrency control technique. It
involves a read phase, a validation phase, and a write phase for concurrency control.
3. Timestamp based
The timestamp-based protocol uses the system time or a logical counter as a timestamp to serialize
the execution of concurrent transactions, which helps to maintain the order of the transactions.
4. Two-phase locking protocol
The two-phase locking protocol (2PL) is a locking mechanism that ensures serializability by using
two distinct phases: an expanding (growing) phase, in which locks are acquired, and a shrinking
phase, in which locks are released, to maintain concurrency control (a small sketch follows).
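A minimal sketch of a two-phase-locked transaction is given below; the lock table, data items, and transfer operation are hypothetical, and the point is simply that all locks are acquired before any is released.

```python
# A sketch of two-phase locking for a single transaction: all locks are
# acquired in the growing phase and only released in the shrinking phase,
# after which no new lock may be taken. The lock table is a simplified
# illustration, not a full lock manager.
import threading

lock_table = {"X": threading.Lock(), "Y": threading.Lock()}

def transfer(database, amount):
    # Growing phase: acquire every lock the transaction needs.
    for item in ("X", "Y"):
        lock_table[item].acquire()
    try:
        database["X"] -= amount
        database["Y"] += amount
    finally:
        # Shrinking phase: release locks; acquiring a new lock now would
        # violate the two-phase rule.
        for item in ("X", "Y"):
            lock_table[item].release()

db = {"X": 100, "Y": 0}
transfer(db, 40)
print(db)   # {'X': 60, 'Y': 40}
```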
1. Dirty Read
The dirty read or temporary update problem happens when there is an incomplete transaction. In this
scenario, a data item is updated by one transaction, which then fails before completing.
Another transaction accesses the data item before it is rolled back to its
last committed value.
Transaction T1                Transaction T2
Read(X)
X = X - n
Write(X)
                              Read(X)    # X holds the uncommitted value X - n
                              X = X + n1
                              Write(X)
T1 fails and rolls back
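The sketch below simulates this schedule on an in-memory value to show why the result is called a dirty read; the variable names and amounts are illustrative.

```python
# A sketch of the dirty read (temporary update) problem shown above: T2 reads
# a value written by T1 before T1 commits, and T1 then rolls back. No locking
# is used here, which is exactly why the anomaly appears.

database = {"X": 100}
n, n1 = 30, 10

# T1 updates X but has not committed yet.
before_image = database["X"]
database["X"] = database["X"] - n          # X = 70 (uncommitted)

# T2 reads the uncommitted ("dirty") value and uses it.
dirty_value = database["X"]                # 70
t2_result = dirty_value + n1               # T2 computes 80

# T1 fails and rolls back to the before-image.
database["X"] = before_image               # X is 100 again

# T2's computation was based on a value that never officially existed.
print("committed X:", database["X"], "| value T2 worked with:", t2_result)
```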
Database Security:-
Overview
Data is a valuable resource that must be handled and managed with care, just like any other
economic resource. As a result, some or all of the commercial data may be tactically important to
a specific company and hence must be kept protected and confidential. In this article,
you will learn about the scope of database security. A variety of computer-based controls
are available as countermeasures to these threats.
Database security in DBMS is a technique for protecting and securing a database from
intentional or accidental threats. Security considerations will apply not only to the data stored in
an organization's database: a breach of security may impact other aspects of the system, which
may ultimately affect the database structure. As a result, database security encompasses
hardware parts, software parts, human resources, and data.
To use security efficiently, appropriate controls are required, each with a
specific goal and purpose for the system. The demand for effective security, which was
frequently neglected or overlooked in the past, is now being rigorously examined by many
businesses.
These above-stated conditions generally represent the areas where the organization should focus
on lowering the risk, which is the possibility of data loss or destruction within a database. Since
all of the data inside an organization is interrelated, an activity that results in a loss in one area
may also lead to a loss in another.
Although most security breaches are caused by hackers, in reality, insiders account for 80% of
data loss. The extent to which an incident, such as a data breach, can harm our company is
determined by several factors.
SQL Injection
This is a type of attack that occurs when malicious code is injected into frontend
(web) apps and then transmitted to the backend database. SQL injections give
hackers unrestricted access to any data stored in a database. There are two types
of such attacks: SQL injection attacks on traditional
databases and NoSQL injection attacks on big data databases. Typically, these
are queries created as an extension of web application forms or received via
HTTP requests. Any database system is vulnerable to these attacks if developers do
not follow secure coding practices and the organization does not conduct regular
vulnerability testing.
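A common countermeasure is to use parameterized queries instead of building SQL strings from user input. The sketch below, using sqlite3, contrasts the two approaches; the users table and the injection payload are hypothetical.

```python
# A sketch of the standard countermeasure to SQL injection: never build SQL
# by string concatenation; pass user input as bound parameters instead.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (username TEXT, password TEXT)")
con.execute("INSERT INTO users VALUES ('alice', 'secret')")

user_input = "' OR '1'='1"        # a classic injection payload

# UNSAFE: the payload becomes part of the SQL text and matches every row.
unsafe_sql = f"SELECT * FROM users WHERE username = '{user_input}'"
print("unsafe:", con.execute(unsafe_sql).fetchall())     # returns alice's row

# SAFE: the driver treats the payload as a plain string value, not SQL.
safe_rows = con.execute(
    "SELECT * FROM users WHERE username = ?", (user_input,)
).fetchall()
print("parameterized:", safe_rows)                       # returns []
con.close()
```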
Malware is software designed to corrupt data or harm a database. Malware could enter your
system via any endpoint device connected to the database's network and exploit vulnerabilities in
your system. Malware protection is important on any endpoint, but it is particularly necessary on
database servers due to their high value and sensitivity. Examples of common malware include
spyware, Trojan viruses, viruses, worms, adware, and ransomware.
Countermeasures:
Safeguard any endpoints on your databases. Install specialized malware protection software and
spread awareness among users on risk and preventative measures.
Databases are breached and leaked due to an insufficient level of IT security expertise and
education among non-technical employees, who may violate basic database security standards and
endanger databases. IT security employees may also lack the expertise necessary to create
security controls, enforce rules, or execute incident response processes.
Countermeasures:
Provide regular security training and awareness programmes for all employees, and ensure that IT
security staff have the expertise needed to create security controls, enforce policies, and execute
incident response processes.
In a denial of service (DoS) attack, the cybercriminal uses a huge number of fake requests to
overwhelm the target service—in this case, the database server. As a result, the server cannot
handle legitimate requests from actual users and frequently crashes or becomes unstable.
A DoS (Denial of Service) attack causes a database server to slow down and may render it
unavailable to all users. Even though a DoS attack does not reveal the contents of a database, it
can cost victims a significant amount of time and money. Moreover, what is the use of having a
database if you can't use or access it?
In a distributed denial of service (DDoS) attack, fake traffic is generated by a large number of
computers that are part of an attacker-controlled botnet. This results in extremely high traffic
volumes, which are difficult to stop without a highly scalable defensive architecture. Cloud-
based DDoS prevention services can dynamically scale up to deal with massive DDoS attacks.
Countermeasures:
Harden the TCP/IP stack by increasing the size of the TCP connection queue with the
appropriate registry settings.
Reduce the time it takes to establish a connection.
Use dynamic backlog methods to ensure that the connection queue is never full.
Use an Intrusion Detection System (IDS) on your network.
Attackers are continuously attempting to isolate and target software vulnerabilities, and database
management software is a particularly desirable target. New vulnerabilities are identified on
a daily basis, and security updates are issued regularly by all open-source database management
platforms and commercial database software vendors. However, if you do not apply these
patches promptly, your database may be vulnerable to attack.
Even if you deploy patches on time, there is always the risk of zero-day attacks, which occur
when attackers find a vulnerability that the database vendor has not yet found and patched.
Countermeasures:
Conduct a regular search for new sensitive data in your databases. You may accomplish this very
successfully with the Periodic Data Discovery tool and Compliance Manager, which will
automatically discover and secure newly uploaded sensitive data.
Database users in DBMS may have varying levels of access. However, users may abuse them,
and the three basic categories of privilege abuse are as follows: excessive privilege abuse,
legitimate privilege abuse, and unused privilege abuse. Excessive privileges always introduce
unnecessary risks. According to statistics, 80% of attacks on company databases are carried out
by current or former workers.
Countermeasures:
It is recommended that a strict access and privileges control policy be implemented and enforced.
Don't give staff too many privileges, and revoke outdated privileges as soon as possible.
If a database is not audited, it represents a risk of noncompliance with sensitive data protection
rules at the national and international levels. All database events must be automatically logged
and registered, and automatic auditing solutions must be used. Failure or unwillingness to do so
represents a major risk on multiple levels.
Countermeasures:
Enable automatic auditing solutions so that all database events are logged and registered, and
review the audit records regularly.
Authentication
Authentication is the process of confirming that a user logs in only with the rights granted to
them to perform database operations. A user can log in only up to their privilege level and
cannot access other sensitive data.
The ability to access sensitive data is restricted by the use of authentication. For example,
a mobile phone performs authentication by requesting a PIN, fingerprint, or by face recognition.
Similarly, a computer verifies a username by requesting the appropriate password.
However, in the context of databases, authentication takes on a new dimension because it can
occur at multiple levels. It can be done by the database itself, or the configuration can be adjusted
to allow the operating system or another external means to authenticate users.
For example, when creating a database in Microsoft's SQL Server a user must specify whether to
use database authentication, operating system authentication, or both (the so-called mixed-mode
authentication). Other databases that prioritize security use near-foolproof authentication methods
such as fingerprint recognition and retinal scanning.
By using various authentication technologies such as biometrics for retina and fingerprints, you
can protect your data from unauthorized/malicious users.
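The sketch below shows one way password-based authentication can be implemented with a salted, iterated hash using only Python's standard library; the stored record and parameters are illustrative assumptions, not a specific DBMS's authentication scheme.

```python
# A sketch of authentication using a salted, iterated password hash.
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Enrolment: store only the salt and the hash, never the plain password.
salt = os.urandom(16)
stored = {"username": "alice", "salt": salt,
          "pwd_hash": hash_password("s3cret!", salt)}

def authenticate(username: str, password: str) -> bool:
    if username != stored["username"]:
        return False
    candidate = hash_password(password, stored["salt"])
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, stored["pwd_hash"])

print(authenticate("alice", "s3cret!"))   # True
print(authenticate("alice", "wrong"))     # False
```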
Access Control
Database access control is a means of restricting access to sensitive company data so that only those
people (database users) who are authorized can access such data, while access is denied to
unauthorized persons. It is a key security concept that reduces risk to the business or organization.
Physical and logical access control are the two types of access control. Access to campuses,
buildings, rooms, and physical IT assets is restricted through physical access control. Connections
to computer networks, system files, and data are restricted through logical access control.
To safeguard a facility, corporations use electronic access control systems to track employee
access to restricted company locations and private regions, such as data centers, using user
credentials, access card readers, auditing, and reports. Some of these systems include access
control panels to restrict access to rooms and buildings, as well as alarms and lockdown features
to prevent unauthorized access or operations.
Logical access control systems execute user and entity identification, authentication, and
authorization by evaluating needed login credentials, which can include passwords, personal
identification numbers, biometric scans, security tokens, or other authentication
factors. Multifactor authentication (MFA), which needs two or more authentication factors, is
frequently used as part of a layered defense to safeguard access control systems.
The most well-known Database Access Control examples are:
o Discretionary Access Control (DAC)
o Mandatory Access Control (MAC)
o Role-Based Access Control (RBAC)
o Attribute-Based Access Control (ABAC)
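As an illustration of the RBAC model from this list, the sketch below attaches permissions to roles and checks a user's action through the roles they hold; the role and permission names are hypothetical.

```python
# A sketch of role-based access control (RBAC): permissions are attached to
# roles, and users acquire permissions only through the roles they hold.

role_permissions = {
    "analyst": {"SELECT"},
    "clerk":   {"SELECT", "INSERT"},
    "dba":     {"SELECT", "INSERT", "UPDATE", "DELETE", "GRANT"},
}
user_roles = {"meena": {"analyst"}, "ravi": {"clerk", "analyst"}}

def is_allowed(user: str, action: str) -> bool:
    return any(action in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))

print(is_allowed("meena", "SELECT"))   # True
print(is_allowed("meena", "DELETE"))   # False: the analyst role lacks DELETE
print(is_allowed("ravi", "INSERT"))    # True, via the clerk role
```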
Inference Control
Inference control prevents users from inferring confidential information about individuals by
combining the results of otherwise legitimate queries, such as statistical summaries.
Flow Control
Distributed systems involve a large amount of data flow from one site to another as well as within
a site. Flow control prevents data from being transferred in such a way that it could be
accessed by unauthorized agents.
A flow policy specifies the channels through which data can flow. It also defines security classes
for data as well as transactions. Covert channels are pathways along which information flows
implicitly, in ways that violate a company's privacy policy.
Statistical database security focuses on the protection of sensitive individual values stored in so-
called statistical databases and used for statistical purposes, as well as retrieving summaries of
values based on categories. They do not allow the retrieval of individual information.
Examples include patient records utilized by medical researchers and extensive phone call records
statistically examined by phone companies to improve their services.
It provides access to the database to obtain statistical information about the number of employees
in the company but not to obtain detailed confidential/personal information about a specific
individual employee.
The techniques used to prevent statistical database compromise are classified into two types:
noise addition, in which all data and/or statistics are available but are only approximate rather
than exact, and restriction, in which the system only delivers statistics and/or data that are deemed
safe.
In any situation, a technique is evaluated by assessing both the amount of information lost and the
level of privacy attained. Statistical data protection aims to maximize privacy while minimizing
information loss. To evaluate a particular technique, a theoretical lower bound on the information
loss required to achieve a given level of privacy must be established.
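The sketch below illustrates the noise-addition idea on a toy salary list: only a perturbed aggregate is released. The noise scale is an arbitrary illustrative choice, not a calibrated privacy mechanism.

```python
# A sketch of the "noise addition" technique above: the statistical database
# returns only an approximate aggregate, so individual values cannot be
# pinned down exactly.
import random

salaries = [42000, 55000, 61000, 38000, 72000, 50000]   # sensitive individual values

def noisy_average(values, noise_scale=1000.0):
    true_avg = sum(values) / len(values)
    return true_avg + random.gauss(0, noise_scale)       # perturbed statistic

print("released average salary:", round(noisy_average(salaries), 2))
```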
Encryption
Several companies now offer data protection platforms and tools. A comprehensive approach
should include features such as encryption of data at rest and in transit, together with key
management.
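As one illustration of column-level encryption, the sketch below encrypts a sensitive value before it would be stored, assuming the third-party cryptography package is installed; key management is deliberately left out of scope.

```python
# A sketch of encrypting a sensitive column value before storage, assuming
# the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice: fetched from a key management service
f = Fernet(key)

plaintext = b"4111-1111-1111-1111"          # e.g. a card number column value
ciphertext = f.encrypt(plaintext)            # store this in the database

print(ciphertext)                            # unreadable without the key
print(f.decrypt(ciphertext))                 # b'4111-1111-1111-1111'
```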
Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The value of
the attribute must be available in the corresponding domain.
2. Entity integrity constraints
o The entity integrity constraint states that the primary key value cannot be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the
primary key had a null value, we could not identify those rows.
o A table can contain null values in columns other than the primary key field.
3. Referential integrity constraints
o A referential integrity constraint is specified between two tables.
o If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign
key in Table 1 must either be null or be available in Table 2.
4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but out of them one key will be the primary key. A primary
key can contain a unique value but not a null value in the relational table.
Example:
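The sketch below, using sqlite3, shows hypothetical tables on which a domain constraint (CHECK), entity integrity and key constraints (PRIMARY KEY, UNIQUE), and referential integrity (FOREIGN KEY) are all enforced by the DBMS.

```python
# A sketch of the integrity constraints discussed above, enforced by SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")      # SQLite enforces FKs only when enabled

con.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,                     -- entity integrity / key
        dept_name TEXT NOT NULL UNIQUE
    )""")
con.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        age     INTEGER CHECK (age BETWEEN 18 AND 65),     -- domain constraint
        dept_id INTEGER REFERENCES department(dept_id)     -- referential integrity
    )""")

con.execute("INSERT INTO department VALUES (1, 'Sales')")
con.execute("INSERT INTO employee VALUES (10, 30, 1)")     # valid row

for bad in [
    "INSERT INTO employee VALUES (11, 12, 1)",    # violates the CHECK (domain)
    "INSERT INTO employee VALUES (12, 40, 99)",   # violates the FOREIGN KEY
    "INSERT INTO department VALUES (1, 'HR')",    # violates PRIMARY KEY uniqueness
]:
    try:
        con.execute(bad)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
con.close()
```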
Distributed Databases
A distributed database is a collection of multiple interconnected databases that are spread
physically across various locations and that communicate via a computer network.
Features
Databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.
Factors Encouraging the Use of Distributed Databases
Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands common
databases or replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both these processing by providing
synchronized data.
Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
database in any site is damaged. Users can access data from other sites while the damaged
site is being reconstructed. Thus, database failure may become almost inconspicuous to
users.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
Adversities of Distributed Databases
Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the sites.
Data integrity − The need for updating data at multiple sites poses problems of data
integrity.
Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.
Design of Distributed Databases
There are several different architectures for distributed database systems, including:
Client-server architecture: In this architecture, clients connect to a central
server, which manages the distributed database system. The server is responsible
for coordinating transactions, managing data storage, and providing access
control.
Peer-to-peer architecture: In this architecture, each site in the distributed
database system is connected to all other sites. Each site is responsible for
managing its own data and coordinating transactions with other sites.
Federated architecture: In this architecture, each site in the distributed
database system maintains its own independent database, but the databases are
integrated through a middleware layer that provides a common interface for
accessing and querying the data.
Distributed database systems can be used in a variety of applications, including
e-commerce, financial services, and telecommunications. However, designing
and managing a distributed database system can be complex and requires careful
consideration of factors such as data distribution, replication, and consistency.
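As one illustration of a data distribution decision, the sketch below horizontally fragments a hypothetical customer table across two in-memory SQLite databases standing in for two sites; the routing rule (id parity) is an arbitrary illustrative choice.

```python
# A sketch of horizontal fragmentation: rows are routed to different sites by
# a partitioning rule. Two in-memory SQLite databases stand in for two sites.
import sqlite3

sites = {"site_A": sqlite3.connect(":memory:"),
         "site_B": sqlite3.connect(":memory:")}
for con in sites.values():
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

def site_for(customer_id: int) -> str:
    return "site_A" if customer_id % 2 == 0 else "site_B"

# Each row is stored only at the site chosen by the fragmentation rule.
for cid, name in [(1, "Asha"), (2, "Ravi"), (3, "Meera"), (4, "Vikram")]:
    sites[site_for(cid)].execute("INSERT INTO customer VALUES (?, ?)", (cid, name))

# A point query is routed straight to the owning site.
cid = 3
row = sites[site_for(cid)].execute(
    "SELECT name FROM customer WHERE customer_id = ?", (cid,)).fetchone()
print(row)          # ('Meera',)

# A global query must gather results from every site.
all_rows = [r for con in sites.values()
            for r in con.execute("SELECT * FROM customer")]
print(sorted(all_rows))
```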
Advantages of Distributed Database System :
1) There is fast data processing as several sites participate in request processing.
2) Reliability and availability of this system is high.
3) It possesses reduced operating costs.
4) It is easier to expand the system by adding more sites.
5) It has improved sharing ability and local autonomy.
Disadvantages of Distributed Database System :
1) The system becomes complex to manage and control.
2) The security issues must be carefully managed.
3) The system requires deadlock handling during transaction processing, otherwise
the entire system may end up in an inconsistent state.
4) There is a need for standardization in the processing of distributed database
systems.