DBMS Notes UNIT 3-4
UNIT-III
Normalization
A large database defined as a single relation may result in data duplication. This repetition of
data may result in insertion, update, and deletion anomalies and makes the relation difficult to maintain.
So to handle these problems, we should analyze and decompose the relations with redundant data
into smaller, simpler, and well-structured relations that satisfy desirable properties.
Normalization is a process of decomposing the relations into relations with fewer attributes.
What is Normalization?
Normalization is used to minimize the redundancy from a relation or set of relations. It is also
used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
Normalization divides the larger table into smaller tables and links them using relationships.
The normal form is used to reduce redundancy from the database table.
The main reason for normalizing the relations is to remove these anomalies. Failure to eliminate
anomalies leads to data redundancy and can cause data integrity and other problems as the
database grows. Normalization consists of a series of guidelines that help you create a good
database structure.
o Insertion Anomaly: An insertion anomaly occurs when a new tuple cannot be inserted into
a relation due to a lack of data.
o Deletion Anomaly: A deletion anomaly occurs when the deletion of data results in the
unintended loss of some other important data.
o Update Anomaly: An update anomaly occurs when an update of a single data value
requires multiple rows of data to be updated, as illustrated in the sketch below.
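The sketch below (plain Python, no DBMS) illustrates the update anomaly with a hypothetical, unnormalized EMPLOYEE table in which the department location is repeated in every row; the table, column names, and values are made up purely for illustration.

```python
# A minimal sketch of an update anomaly in an unnormalized relation:
# the department location is repeated in every employee row, so changing
# it requires touching many rows.

employee = [
    {"emp_id": 1, "emp_name": "Asha",  "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 2, "emp_name": "Ravi",  "dept": "Sales", "dept_location": "Pune"},
    {"emp_id": 3, "emp_name": "Meera", "dept": "HR",    "dept_location": "Delhi"},
]

# Update anomaly: moving the Sales department means updating every Sales row.
# Missing even one row would leave the data inconsistent.
for row in employee:
    if row["dept"] == "Sales":
        row["dept_location"] = "Mumbai"

# After decomposition, the same fact is stored exactly once.
employee_n = [
    {"emp_id": 1, "emp_name": "Asha",  "dept": "Sales"},
    {"emp_id": 2, "emp_name": "Ravi",  "dept": "Sales"},
    {"emp_id": 3, "emp_name": "Meera", "dept": "HR"},
]
department = {"Sales": "Pune", "HR": "Delhi"}
department["Sales"] = "Mumbai"   # one update, no chance of inconsistency
```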
Normalization works through a series of stages called normal forms. The normal forms apply to
individual relations. A relation is said to be in a particular normal form if it satisfies the constraints of that form.
Normal Form   Description
1NF           A relation is in 1NF if it contains only atomic (single) values.
2NF           A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the primary key.
3NF           A relation is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.
BCNF          A relation is in BCNF if it is in 3NF and, for every non-trivial functional dependency X → Y, X is a super key.
4NF           A relation is in 4NF if it is in BCNF and has no multi-valued dependency.
5NF           A relation is in 5NF if it is in 4NF and contains no join dependency; joining should be lossless.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o It helps to ensure data consistency within the database.
o It avoids insertion, update, and deletion anomalies.
o It results in a more flexible and better-organized database design.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.
Fourth Normal Form (4NF)
A multivalued dependency occurs when two attributes in a table are independent of each other but
both depend on a third attribute. A multivalued dependency involves at least two attributes that are
dependent on a third attribute, which is why it always requires at least three attributes.
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A multiple values of B exist, then the relation
contains a multi-valued dependency.
Example
STUDENT
STU_ID   COURSE      HOBBY
21       Computer    Dancing
21       Math        Singing
34       Chemistry   Dancing
74       Biology     Cricket
59       Physics     Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two independent
entities; there is no relationship between COURSE and HOBBY.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
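The following sketch, using Python's built-in sqlite3 module, loads the decomposed STUDENT_COURSE and STUDENT_HOBBY tables from the example above; the added hobby row is hypothetical and only shows that, after the 4NF decomposition, recording a new hobby touches a single table.

```python
# A small sketch of the 4NF decomposition shown above: courses and hobbies
# are stored in separate tables, so adding a new hobby for a student does not
# force us to repeat (or invent) any course information.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE STUDENT_COURSE (STU_ID INTEGER, COURSE TEXT)")
cur.execute("CREATE TABLE STUDENT_HOBBY  (STU_ID INTEGER, HOBBY  TEXT)")

cur.executemany("INSERT INTO STUDENT_COURSE VALUES (?, ?)",
                [(21, "Computer"), (21, "Math"), (34, "Chemistry"),
                 (74, "Biology"), (59, "Physics")])
cur.executemany("INSERT INTO STUDENT_HOBBY VALUES (?, ?)",
                [(21, "Dancing"), (21, "Singing"), (34, "Dancing"),
                 (74, "Cricket"), (59, "Hockey")])

# Student 21 picks up a new hobby: only STUDENT_HOBBY is touched.
cur.execute("INSERT INTO STUDENT_HOBBY VALUES (?, ?)", (21, "Painting"))

print(cur.execute("SELECT HOBBY FROM STUDENT_HOBBY WHERE STU_ID = 21").fetchall())
con.close()
```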
Fifth Normal Form (5NF)
o A relation is in 5NF if it is in 4NF and does not contain any join dependency, and joining should be
lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester 1
Computer    John       Semester 1
Math        John       Semester 1
Math        Akash      Semester 2
Chemistry   Praveen    Semester 1
In the above table, John takes both the Computer and Math classes for Semester 1, but he doesn't take
the Math class for Semester 2. In this case, a combination of all these fields is required to identify valid
data.
Suppose we add a new Semester 3 but do not yet know the subject or who will
be taking that subject, so we leave Lecturer and Subject as NULL. But all three columns together
act as the primary key, so we can't leave the other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
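As a rough check of the join dependency, the sketch below represents the relations as sets of tuples, builds the projections P1, P2 and P3 from the base relation of this example, and verifies that their natural join reproduces the base relation without spurious tuples; the tuple representation is an illustrative assumption, not a DBMS feature.

```python
# A sketch (plain Python, relations as sets of tuples) checking that the
# three projections P1, P2 and P3 above join back losslessly to the base
# relation of the example, which is what 5NF / the join dependency requires.

base = {
    ("Computer",  "Anshika", "Semester 1"),
    ("Computer",  "John",    "Semester 1"),
    ("Math",      "John",    "Semester 1"),
    ("Math",      "Akash",   "Semester 2"),
    ("Chemistry", "Praveen", "Semester 1"),
}

p1 = {(sem, subj) for (subj, _lect, sem) in base}      # (SEMESTER, SUBJECT)
p2 = {(subj, lect) for (subj, lect, _sem) in base}     # (SUBJECT, LECTURER)
p3 = {(sem, lect) for (_subj, lect, sem) in base}      # (SEMESTER, LECTURER)

# Natural join of the three projections.
rejoined = {
    (subj, lect, sem)
    for (sem, subj) in p1
    for (s2, lect) in p2 if s2 == subj
    if (sem, lect) in p3
}

assert rejoined == base   # lossless: no tuples lost, none spurious
print("Join of P1, P2, P3 reproduces the original relation:", rejoined == base)
```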
Domain-Key Normal Form (DKNF)
There is no hard and fast rule for defining normal forms beyond 5NF. Historically, the process
of normalization and the process of discovering undesirable dependencies were carried
through 5NF, but it has been possible to define a stricter normal form that takes into
account additional types of dependencies and constraints.
The basic idea behind the DKNF is to specify the normal form that takes into account
all the possible dependencies and constraints.
In simple words, we can say that DKNF is a normal form used in database
normalization which requires that the database contains no constraints other than
domain constraints and key constraints.
In other words, a relation schema is said to be in DKNF only if all the constraints and
dependencies that should hold on the valid relation state can be enforced simply by
enforcing the domain constraints and the key constraints on the relation. For a relation
in DKNF, it becomes very straightforward to enforce all the database constraints by
simply checking that each attribute value in a tuple is of the appropriate domain and that
every key constraint is enforced.
Reasons to use DKNF are as follows:
1. To avoid general constraints in the database that are not clear domain or key constraints.
2. Most databases can easily test or check domain and key constraints on attributes.
However, because of the difficulty of including complex constraints in a DKNF relation,
its practical utility is limited; it is rarely used in practice, since it may be
quite difficult to specify general integrity constraints.
Let’s understand this by taking an example:
Example –
Consider relations CAR (MAKE, vin#) and MANUFACTURE (vin#, country),
where vin# represents the vehicle identification number and 'country' represents the name
of the country where the vehicle is manufactured.
A general constraint may be of the following form:
If the MAKE is either 'HONDA' or 'MARUTI', then the first character of the vin# is a
'B' if the country of manufacture is 'INDIA'.
If the MAKE is 'FORD' or 'ACCURA', then the second character of the vin# is a 'B' if the
country of manufacture is 'INDIA'.
There is no simplified way to represent such constraints short of writing a procedure or
general assertion to test them; hence such a procedure is needed to enforce the appropriate
integrity constraint. However, transforming a higher normal form into domain/key
normal form is not always a dependency-preserving transformation, and it is not
always possible.
Domain-Key Normal Form (DKNF) is sometimes informally called 6NF.
Relations in DKNF have no constraints other than domain constraints and key constraints.
Domain constraints
Domain constraints specify the possible values of the attribute.
E.g. The only colors of hair are black and white
E.g. The CGPA of a student is between 0 and 4.
General constraints
General constraints are any other constraints that can be expressed in first-order logic.
E.g. If the first digit of a bank account number is 2, then the balance of the account is
greater than or equal to 4000.
Consider a table account(AccountNo, Balance, Type).
This design is logically wrong because Balance, a non-prime attribute, determines Type; in this case,
a transitive dependency is occurring.
Domain constraint: If Regular_Account, then the first digit of AccountNo will be 2 and
balance < 4000
Domain constraint: If Executive_Account, then the first digit of AccountNo will be 1 and
balance ≥ 4000.
Solution: Create separate tables for Regular_Account and Executive_Account, each with attributes
(AccountNo, Balance).
Table: Regular_Account (AccountNo, Balance)
Table: Executive_Account (AccountNo, Balance)
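A minimal sketch of the DKNF idea, using sqlite3, is given below: after the decomposition, the business rules stated above can be enforced purely as domain (CHECK) constraints and key constraints on each table. Storing AccountNo as text so that its first digit can be checked, and the sample values, are assumptions made for illustration.

```python
# After decomposition, every rule is a domain (CHECK) or key constraint,
# which is exactly what DKNF asks for.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Regular_Account (
        AccountNo TEXT PRIMARY KEY CHECK (AccountNo LIKE '2%'),
        Balance   INTEGER NOT NULL  CHECK (Balance < 4000)
    )""")
con.execute("""
    CREATE TABLE Executive_Account (
        AccountNo TEXT PRIMARY KEY CHECK (AccountNo LIKE '1%'),
        Balance   INTEGER NOT NULL  CHECK (Balance >= 4000)
    )""")

con.execute("INSERT INTO Regular_Account   VALUES ('2001', 1500)")   # satisfies both checks
con.execute("INSERT INTO Executive_Account VALUES ('1001', 9000)")

try:
    # Violates the domain constraint: a regular account with balance >= 4000.
    con.execute("INSERT INTO Regular_Account VALUES ('2002', 8000)")
except sqlite3.IntegrityError as err:
    print("Rejected:", err)
con.close()
```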
UNIT-IV
Log-Based Recovery
To be able to recover from transaction failures, the system maintains a log. It contains
information about the start and end of each transaction and any updates
that occur during the transaction. The log keeps track of all transaction operations
that affect the values of database items. This information is needed to recover from
transaction failures.
The log is kept on disk. The main types of log entries are listed below; a small sketch of how such
records might be represented follows the list.
o start_transaction(T): This log entry records that transaction T starts its execution.
o read_item(T, X): This log entry records that transaction T reads the value of
database item X.
o write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of database item X from old_value to new_value. The old
value is sometimes known as the before-image of X, and the new value is known as
the after-image of X.
o commit(T): This log entry records that transaction T has completed all accesses to
the database successfully and that its effect can be committed (recorded permanently) to
the database.
o abort(T): This records that transaction T has been aborted.
o checkpoint: A checkpoint is a mechanism where all the previous logs are removed
from main memory and stored permanently on the storage disk. A checkpoint declares a
point before which the DBMS was in a consistent state and all the transactions were
committed.
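The sketch below shows one illustrative way such log records could be represented in Python; the dataclass fields and sample values are assumptions, not any real DBMS's log format.

```python
# A minimal sketch of the log record types listed above, represented as a
# Python dataclass appended to an in-memory log.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LogRecord:
    kind: str                      # 'start', 'write', 'commit', 'abort', 'checkpoint'
    txn: Optional[str] = None      # transaction id, e.g. 'T1'
    item: Optional[str] = None     # database item, e.g. 'X'
    old_value: object = None       # before-image (for 'write')
    new_value: object = None       # after-image  (for 'write')

log: List[LogRecord] = []
log.append(LogRecord("start", "T1"))
log.append(LogRecord("write", "T1", "X", old_value=100, new_value=80))
log.append(LogRecord("commit", "T1"))
log.append(LogRecord("checkpoint"))
print(log)
```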
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully i.e. the transaction has reached the point at
which it will not abort (terminate without completing). Once committed, the
transaction is permanently recorded in the database. Commitment always involves
writing a commit entry to the log and writing the log to disk. At the time of a system
crash, the log is searched backwards for all transactions T that have written a
start_transaction(T) entry into the log but have not written a commit(T) entry yet; these
transactions may have to be rolled back to undo their effect on the database during the
recovery process.
Undoing – If a transaction crashes, then the recovery manager may undo the
transaction, i.e. reverse its operations. This involves examining the log for every
write_item(T, X, old_value, new_value) entry of the transaction and setting the
value of item X in the database back to old_value. There are two major techniques for
recovery from non-catastrophic transaction failures: deferred updates and immediate
updates.
Deferred update – This technique does not physically update the database on disk
until a transaction has reached its commit point. Before reaching commit, all
transaction updates are recorded in the local transaction workspace. If a transaction
fails before reaching its commit point, it will not have changed the database in any
way so UNDO is not needed. It may be necessary to REDO the effect of the
operations that are recorded in the local transaction workspace, because their effect
may not yet have been written to the database. Hence, deferred update is also
known as the NO-UNDO/REDO algorithm.
Immediate update – In the immediate update, the database may be updated by
some operations of a transaction before the transaction reaches its commit point.
However, these operations are recorded in a log on disk before they are applied to
the database, making recovery still possible. If a transaction fails before reaching its commit
point, the effect of its operations must be undone, i.e. the transaction must be rolled
back; hence we require both undo and redo. This technique is known as the UNDO/REDO
algorithm.
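The sketch below illustrates the UNDO/REDO idea on a simplified in-memory log and database; the log layout and values are assumptions chosen only to show uncommitted writes being undone and committed writes being redone.

```python
# A sketch of the UNDO/REDO (immediate update) recovery idea described above,
# working over a simplified log of tuples. This is illustrative pseudologic,
# not any particular DBMS's recovery manager.

# (kind, txn, item, old_value, new_value)
log = [
    ("start",  "T1", None, None, None),
    ("write",  "T1", "X",  100,  80),
    ("commit", "T1", None, None, None),
    ("start",  "T2", None, None, None),
    ("write",  "T2", "Y",  50,   70),      # T2 never commits before the crash
]

# Database state on disk at the time of the crash (T2's update already applied).
database = {"X": 80, "Y": 70}

committed = {rec[1] for rec in log if rec[0] == "commit"}

# UNDO: scan the log backwards and restore before-images of uncommitted writes.
for kind, txn, item, old, new in reversed(log):
    if kind == "write" and txn not in committed:
        database[item] = old

# REDO: scan the log forwards and reapply after-images of committed writes.
for kind, txn, item, old, new in log:
    if kind == "write" and txn in committed:
        database[item] = new

print(database)   # {'X': 80, 'Y': 50} -> T1's effect kept, T2's rolled back
```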
Caching/Buffering – In this technique, one or more disk pages that include the data items to be
updated are cached into main memory buffers and then updated in memory before
being written back to disk. A collection of in-memory buffers called the DBMS
cache is kept under the control of the DBMS for holding these pages. A directory is
used to keep track of which database items are in the buffers. A dirty bit is associated
with each buffer; it is 0 if the buffer has not been modified and 1 if it has.
Shadow paging – Shadow paging provides atomicity and durability. A directory with n entries is
constructed, where the i-th entry points to the i-th database page on disk. When a
transaction begins executing, the current directory is copied into a shadow directory.
When a page is to be modified, a new page is allocated in which the changes are
made, and when the transaction is ready to become durable, all directory entries that refer
to the original page are updated to refer to the new replacement page.
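A minimal sketch of shadow paging follows; the page contents and directory layout are illustrative assumptions, intended only to show that commit or abort reduces to switching between the current and shadow directories.

```python
# A sketch of the shadow paging idea above: the current page directory is
# copied to a shadow directory, updated pages go to fresh pages, and commit
# is an atomic switch of the directory. Page contents are strings purely
# for illustration.

pages = {0: "page0-v1", 1: "page1-v1", 2: "page2-v1"}   # "disk" pages
current_dir = {0: 0, 1: 1, 2: 2}                        # logical page -> physical page

# Transaction starts: keep an untouched shadow copy of the directory.
shadow_dir = dict(current_dir)

# Transaction modifies logical page 1: allocate a new physical page for it.
new_phys = max(pages) + 1
pages[new_phys] = "page1-v2"
current_dir[1] = new_phys

commit = True
if commit:
    pass                      # current_dir already points at the new pages
else:
    current_dir = shadow_dir  # abort: simply fall back to the shadow directory

print(pages[current_dir[1]])  # 'page1-v2' after commit, 'page1-v1' after abort
```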
Backward Recovery – The terms "rollback" and "UNDO" also refer to backward recovery.
When a backup of the data is not available and previous modifications need to be
undone, this technique can be helpful. With the backward recovery method, the unwanted
modifications are removed and the database is returned to its prior state. All changes made
during the previous transaction are reversed during backward recovery; in other words,
the erroneous database updates are undone.
Forward Recovery – The terms "roll forward" and "REDO" refer to forward recovery.
When a database needs to be brought up to date with all verified changes, this forward
recovery technique is helpful.
The modifications of committed transactions are reapplied to the database to roll those
changes forward. In other words, the database is restored using the preserved data, and
the valid transactions recorded since the last save are reapplied.
Some of the backup techniques are as follows:
Full database backup – In this, the full database, including the data and the database
meta-information needed to restore the whole database (such as full-text catalogs), is
backed up at predefined intervals.
Differential backup – It stores only the data changes that have occurred since the
last full database backup. When some data has changed many times since the last full
database backup, a differential backup stores only the most recent version of the changed
data. To restore from it, the full database backup must be restored first.
Transaction log backup – In this, all events that have occurred in the database, i.e.
a record of every single statement executed, are backed up. It is the backup of the
transaction log entries and contains all transactions that have happened to the
database. Through this, the database can be recovered to a specific point in time. It
is even possible to recover from a transaction log backup if the data files are
destroyed, without losing a single committed transaction.
Concurrency Control
We will now discuss the protocols and the problems related to concurrency control in DBMS.
1. Lock based
The lock-based protocol applies a lock condition on a data element, which restricts other
transactions from performing read and write operations on that element while the lock is active.
There are mainly two types of locks: the shared (read-only) lock and the exclusive lock.
2. Validation based
The validation-based protocol is also known as an optimistic concurrency control technique. It
involves a read phase, a validation phase, and a write phase for concurrency control.
3. Timestamp based
The timestamp-based protocol uses the system time or a logical counter as a timestamp to serialize
the execution of concurrent transactions, which helps to maintain the order of the transactions.
4. Two-phase locking protocol
The two-phase locking protocol (2PL) is a locking mechanism that ensures serializability by using
two distinct phases: an expanding (growing) phase, in which locks are acquired, and a shrinking
phase, in which locks are released, to maintain concurrency control (a small sketch follows).
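A minimal sketch of a two-phase-locked transaction is given below; the lock table, data items, and transfer operation are hypothetical, and the point is simply that all locks are acquired before any is released.

```python
# A sketch of two-phase locking for a single transaction: all locks are
# acquired in the growing phase and only released in the shrinking phase,
# after which no new lock may be taken. The lock table is a simplified
# illustration, not a full lock manager.
import threading

lock_table = {"X": threading.Lock(), "Y": threading.Lock()}

def transfer(database, amount):
    # Growing phase: acquire every lock the transaction needs.
    for item in ("X", "Y"):
        lock_table[item].acquire()
    try:
        database["X"] -= amount
        database["Y"] += amount
    finally:
        # Shrinking phase: release locks; acquiring a new lock now would
        # violate the two-phase rule.
        for item in ("X", "Y"):
            lock_table[item].release()

db = {"X": 100, "Y": 0}
transfer(db, 40)
print(db)   # {'X': 60, 'Y': 40}
```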
1. Dirty Read
The dirty read or temporary update problem happens when there is an incomplete transaction. In this
scenario, a data item is updated by one transaction, which then fails before completing.
Another transaction accesses the data item before it is rolled back to its
last committed value.
Transaction T1                Transaction T2
Read(X)
X = X - n
Write(X)
                              Read(X)    # X holds the uncommitted value X - n
                              X = X + n1
                              Write(X)
T1 fails and rolls back
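The sketch below simulates this schedule on an in-memory value to show why the result is called a dirty read; the variable names and amounts are illustrative.

```python
# A sketch of the dirty read (temporary update) problem shown above: T2 reads
# a value written by T1 before T1 commits, and T1 then rolls back. No locking
# is used here, which is exactly why the anomaly appears.

database = {"X": 100}
n, n1 = 30, 10

# T1 updates X but has not committed yet.
before_image = database["X"]
database["X"] = database["X"] - n          # X = 70 (uncommitted)

# T2 reads the uncommitted ("dirty") value and uses it.
dirty_value = database["X"]                # 70
t2_result = dirty_value + n1               # T2 computes 80

# T1 fails and rolls back to the before-image.
database["X"] = before_image               # X is 100 again

# T2's computation was based on a value that never officially existed.
print("committed X:", database["X"], "| value T2 worked with:", t2_result)
```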
Database Security:-
Overview
Data is a valuable resource that must be handled and managed with care, just like any other
economic resource. As a result, some or all of the commercial data may be tactically important to
a specific company and hence must be kept protected and confidential. In this article,
you will learn about the scope of database security. A variety of computer-based controls
are available as countermeasures to these threats.
Database security in DBMS is a technique for protecting and securing a database from
intentional or accidental threats. Security considerations will apply not only to the data stored in
an organization's database: a breach of security may impact other aspects of the system, which
may ultimately affect the database structure. As a result, database security encompasses
hardware parts, software parts, human resources, and data.
To use security efficiently, appropriate controls are required, each with a
specific goal and purpose for the system. The demand for effective security, which was
frequently neglected or overlooked in the past, is now being rigorously examined by many
businesses.
These above-stated conditions generally represent the areas where the organization should focus
on lowering the risk, which is the possibility of data loss or destruction within a database. Since
all of the data inside an organization is interrelated, an activity that results in a loss in one area
may also lead to a loss in another.
Although most security breaches are caused by hackers, in reality, insiders account for 80% of
data loss. The extent to which an incident, such as a data breach, can harm our company is
determined by several factors.
SQL Injection
This is a type of attack that occurs when malicious code is injected into frontend
(web) apps and then transmitted to the backend database. SQL injections give
hackers unrestricted access to any data stored in a database. There are two types
of such attacks: SQL injection attacks on traditional
databases and NoSQL injection attacks on big data databases. Typically, these
are queries created as an extension of web application forms or received via
HTTP requests. Any database system is vulnerable to these attacks if developers do
not follow secure coding practices and the organization does not conduct regular
vulnerability testing.
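A common countermeasure is to use parameterized queries instead of building SQL strings from user input. The sketch below, using sqlite3, contrasts the two approaches; the users table and the injection payload are hypothetical.

```python
# A sketch of the standard countermeasure to SQL injection: never build SQL
# by string concatenation; pass user input as bound parameters instead.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (username TEXT, password TEXT)")
con.execute("INSERT INTO users VALUES ('alice', 'secret')")

user_input = "' OR '1'='1"        # a classic injection payload

# UNSAFE: the payload becomes part of the SQL text and matches every row.
unsafe_sql = f"SELECT * FROM users WHERE username = '{user_input}'"
print("unsafe:", con.execute(unsafe_sql).fetchall())     # returns alice's row

# SAFE: the driver treats the payload as a plain string value, not SQL.
safe_rows = con.execute(
    "SELECT * FROM users WHERE username = ?", (user_input,)
).fetchall()
print("parameterized:", safe_rows)                       # returns []
con.close()
```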
Malware is software designed to corrupt data or harm a database. Malware could enter your
system via any endpoint device connected to the database's network and exploit vulnerabilities in
your system. Malware protection is important on any endpoint, but it is particularly necessary on
database servers due to their high value and sensitivity. Examples of common malware include
spyware, Trojan viruses, viruses, worms, adware, and ransomware.
Countermeasures:
Safeguard any endpoints on your databases. Install specialized malware protection software and
spread awareness among users on risk and preventative measures.
Databases are breached and leaked due to an insufficient level of IT security expertise and
education among non-technical employees, who may violate basic database security standards and
endanger databases. IT security employees may also lack the expertise necessary to create
security controls, enforce rules, or execute incident response processes.
Countermeasures:
Provide regular security training and awareness programmes for all employees, and ensure that IT
security staff have the expertise needed to create security controls, enforce policies, and execute
incident response processes.
In a denial of service (DoS) attack, the cybercriminal uses a huge number of fake requests to
overwhelm the target service—in this case, the database server. As a result, the server cannot
handle legitimate requests from actual users and frequently crashes or becomes unstable.
A DoS (Denial of Service) attack causes a database server to slow down and may render it
unavailable to all users. Even though a DoS attack does not reveal the contents of a database, it
can cost victims a significant amount of time and money. Moreover, what is the use of having a
database if you can't use or access it?
In a distributed denial of service (DDoS) attack, fake traffic is generated by a large number of
computers that are part of an attacker-controlled botnet. This results in extremely high traffic
volumes, which are difficult to stop without a highly scalable defensive architecture. Cloud-
based DDoS prevention services can dynamically scale up to deal with massive DDoS attacks.
Countermeasures:
Harden the TCP/IP stack by increasing the size of the TCP connection queue with the
appropriate registry settings.
Reduce the time it takes to establish a connection.
Use dynamic backlog methods to ensure that the connection queue is never full.
Use an Intrusion Detection System (IDS) on your network.
Attackers are continuously attempting to isolate and target software vulnerabilities, and database
management software is a particularly desirable target. New vulnerabilities are identified on
a daily basis, and security updates are issued regularly by all open-source database management
platforms and commercial database software vendors. However, if you do not apply these
patches promptly, your database may be vulnerable to attack.
Even if you deploy patches on time, there is always the risk of zero-day attacks, which occur
when attackers find a vulnerability that the database vendor has not yet found and patched.
Countermeasures:
Conduct a regular search for new sensitive data in your databases. You may accomplish this very
successfully with the Periodic Data Discovery tool and Compliance Manager, which will
automatically discover and secure newly uploaded sensitive data.
Database users in DBMS may have varying levels of access. However, users may abuse them,
and the three basic categories of privilege abuse are as follows: excessive privilege abuse,
legitimate privilege abuse, and unused privilege abuse. Excessive privileges always introduce
unnecessary risks. According to statistics, 80% of attacks on company databases are carried out
by current or former workers.
Countermeasures:
It is recommended that a strict access and privileges control policy be implemented and enforced.
Don't give staff too many privileges, and revoke outdated privileges as soon as possible.
If a database is not audited, it represents a risk of noncompliance with sensitive data protection
rules at the national and international levels. All database events must be automatically logged
and registered, and automatic auditing solutions must be used. Failure or unwillingness to do so
represents a major risk on multiple levels.
Countermeasures:
Enable automatic auditing solutions so that all database events are logged and registered, and
review the audit records regularly.
Authentication
Authentication is the process of confirming that a user logs in only with the rights granted to
them to perform database operations. A user can log in only up to their privilege level and
cannot access other sensitive data.
The ability to access sensitive data is restricted by the use of authentication. For example,
a mobile phone performs authentication by requesting a PIN, fingerprint, or by face recognition.
Similarly, a computer verifies a username by requesting the appropriate password.
However, in the context of databases, authentication takes on a new dimension because it can
occur at multiple levels. It can be done by the database itself, or the configuration can be adjusted
to allow the operating system or another external means to authenticate users.
For example, when creating a database in Microsoft's SQL Server a user must specify whether to
use database authentication, operating system authentication, or both (the so-called mixed-mode
authentication). Other databases that prioritize security use near-foolproof authentication methods
such as fingerprint recognition and retinal scanning.
By using various authentication technologies such as biometrics for retina and fingerprints, you
can protect your data from unauthorized/malicious users.
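The sketch below shows one way password-based authentication can be implemented with a salted, iterated hash using only Python's standard library; the stored record and parameters are illustrative assumptions, not a specific DBMS's authentication scheme.

```python
# A sketch of authentication using a salted, iterated password hash.
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Enrolment: store only the salt and the hash, never the plain password.
salt = os.urandom(16)
stored = {"username": "alice", "salt": salt,
          "pwd_hash": hash_password("s3cret!", salt)}

def authenticate(username: str, password: str) -> bool:
    if username != stored["username"]:
        return False
    candidate = hash_password(password, stored["salt"])
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, stored["pwd_hash"])

print(authenticate("alice", "s3cret!"))   # True
print(authenticate("alice", "wrong"))     # False
```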
Access Control
Database access control is a means of restricting access to sensitive company data so that only those
people (database users) who are authorized can access such data, while access is denied to
unauthorized persons. It is a key security concept that reduces risk to the business or organization.
Physical and logical access control are the two types of access control. Access to campuses,
buildings, rooms, and physical IT assets is restricted through physical access control. Connections
to computer networks, system files, and data are restricted through logical access control.
To safeguard a facility, corporations use electronic access control systems to track employee
access to restricted company locations and private regions, such as data centers, using user
credentials, access card readers, auditing, and reports. Some of these systems include access
control panels to restrict access to rooms and buildings, as well as alarms and lockdown features
to prevent unauthorized access or operations.
Logical access control systems execute user and entity identification, authentication, and
authorization by evaluating needed login credentials, which can include passwords, personal
identification numbers, biometric scans, security tokens, or other authentication
factors. Multifactor authentication (MFA), which needs two or more authentication factors, is
frequently used as part of a layered defense to safeguard access control systems.
The most well-known Database Access Control examples are:
o Discretionary Access Control (DAC)
o Mandatory Access Control (MAC)
o Role-Based Access Control (RBAC)
o Attribute-Based Access Control (ABAC)
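As an illustration of the RBAC model from this list, the sketch below attaches permissions to roles and checks a user's action through the roles they hold; the role and permission names are hypothetical.

```python
# A sketch of role-based access control (RBAC): permissions are attached to
# roles, and users acquire permissions only through the roles they hold.

role_permissions = {
    "analyst": {"SELECT"},
    "clerk":   {"SELECT", "INSERT"},
    "dba":     {"SELECT", "INSERT", "UPDATE", "DELETE", "GRANT"},
}
user_roles = {"meena": {"analyst"}, "ravi": {"clerk", "analyst"}}

def is_allowed(user: str, action: str) -> bool:
    return any(action in role_permissions.get(role, set())
               for role in user_roles.get(user, set()))

print(is_allowed("meena", "SELECT"))   # True
print(is_allowed("meena", "DELETE"))   # False: the analyst role lacks DELETE
print(is_allowed("ravi", "INSERT"))    # True, via the clerk role
```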
Inference Control
Inference control prevents users from inferring confidential information about individuals by
combining the results of otherwise legitimate queries, such as statistical summaries.
Flow Control
Distributed systems involve a large amount of data flow from one site to another as well as within
a site. Flow control prevents data from being transferred in such a way that it could be
accessed by unauthorized agents.
A flow policy specifies the channels through which data can flow. It also defines security classes
for data as well as transactions. Covert channels are pathways along which information flows
implicitly, in ways that violate a company's privacy policy.
Statistical database security focuses on the protection of sensitive individual values stored in so-
called statistical databases and used for statistical purposes, as well as retrieving summaries of
values based on categories. They do not allow the retrieval of individual information.
Examples include patient records utilized by medical researchers and extensive phone call records
statistically examined by phone companies to improve their services.
It provides access to the database to obtain statistical information about the number of employees
in the company but not to obtain detailed confidential/personal information about a specific
individual employee.
The techniques used to prevent statistical database compromise are classified into two types:
noise addition, in which all data and/or statistics are available but are only approximate rather
than exact, and restriction, in which the system only delivers statistics and/or data that are deemed
safe.
In any situation, a technique is evaluated by assessing both the amount of information lost and the
level of privacy attained. Statistical data protection aims to maximize privacy while minimizing
information loss. To evaluate a particular technique, a theoretical lower bound on the information
loss required to achieve a given level of privacy must be established.
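The sketch below illustrates the noise-addition idea on a toy salary list: only a perturbed aggregate is released. The noise scale is an arbitrary illustrative choice, not a calibrated privacy mechanism.

```python
# A sketch of the "noise addition" technique above: the statistical database
# returns only an approximate aggregate, so individual values cannot be
# pinned down exactly.
import random

salaries = [42000, 55000, 61000, 38000, 72000, 50000]   # sensitive individual values

def noisy_average(values, noise_scale=1000.0):
    true_avg = sum(values) / len(values)
    return true_avg + random.gauss(0, noise_scale)       # perturbed statistic

print("released average salary:", round(noisy_average(salaries), 2))
```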
Encryption
Several companies now offer data protection platforms and tools. A comprehensive approach
should include features such as encryption of data at rest and in transit, together with key
management.
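As one illustration of column-level encryption, the sketch below encrypts a sensitive value before it would be stored, assuming the third-party cryptography package is installed; key management is deliberately left out of scope.

```python
# A sketch of encrypting a sensitive column value before storage, assuming
# the third-party 'cryptography' package (pip install cryptography).
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice: fetched from a key management service
f = Fernet(key)

plaintext = b"4111-1111-1111-1111"          # e.g. a card number column value
ciphertext = f.encrypt(plaintext)            # store this in the database

print(ciphertext)                            # unreadable without the key
print(f.decrypt(ciphertext))                 # b'4111-1111-1111-1111'
```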
Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The value of
the attribute must be available in the corresponding domain.
2. Entity integrity constraints
o The entity integrity constraint states that the primary key value cannot be null.
o This is because the primary key value is used to identify individual rows in a relation, and if the
primary key had a null value, we could not identify those rows.
o A table can contain null values in columns other than the primary key field.
3. Referential integrity constraints
o A referential integrity constraint is specified between two tables.
o If a foreign key in Table 1 refers to the primary key of Table 2, then every value of the foreign
key in Table 1 must either be null or be available in Table 2.
4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but out of them one key will be the primary key. A primary
key can contain a unique value but not a null value in the relational table.
Example:
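The sketch below, using sqlite3, shows hypothetical tables on which a domain constraint (CHECK), entity integrity and key constraints (PRIMARY KEY, UNIQUE), and referential integrity (FOREIGN KEY) are all enforced by the DBMS.

```python
# A sketch of the integrity constraints discussed above, enforced by SQLite.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")      # SQLite enforces FKs only when enabled

con.execute("""
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,                     -- entity integrity / key
        dept_name TEXT NOT NULL UNIQUE
    )""")
con.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        age     INTEGER CHECK (age BETWEEN 18 AND 65),     -- domain constraint
        dept_id INTEGER REFERENCES department(dept_id)     -- referential integrity
    )""")

con.execute("INSERT INTO department VALUES (1, 'Sales')")
con.execute("INSERT INTO employee VALUES (10, 30, 1)")     # valid row

for bad in [
    "INSERT INTO employee VALUES (11, 12, 1)",    # violates the CHECK (domain)
    "INSERT INTO employee VALUES (12, 40, 99)",   # violates the FOREIGN KEY
    "INSERT INTO department VALUES (1, 'HR')",    # violates PRIMARY KEY uniqueness
]:
    try:
        con.execute(bad)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
con.close()
```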
Distributed Databases
A distributed database is a collection of multiple interconnected databases that are spread
physically across various locations and that communicate via a computer network.
Features
Databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
Data is physically stored across multiple sites. Data in each site can be managed by a
DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have any
multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not synonymous with
a transaction processing system.
Factors Encouraging the Use of Distributed Databases
Need for Sharing of Data − The multiple organizational units often need to
communicate with each other and share their data and resources. This demands common
databases or replicated databases that should be used in a synchronized manner.
Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and
Online Analytical Processing (OLAP) work upon diversified systems which may have
common data. Distributed database systems aid both these processing by providing
synchronized data.
Database Recovery − One of the common techniques used in DDBMS is replication of
data across different sites. Replication of data automatically helps in data recovery if
database in any site is damaged. Users can access data from other sites while the damaged
site is being reconstructed. Thus, database failure may become almost inconspicuous to
users.
Support for Multiple Application Software − Most organizations use a variety of
application software each with its specific database support. DDBMS provides a uniform
functionality for using the same data among different platforms.
Adversities of Distributed Databases
Need for complex and expensive software − DDBMS demands complex and often
expensive software to provide data transparency and co-ordination across the several
sites.
Processing overhead − Even simple operations may require a large number of
communications and additional calculations to provide uniformity in data across the sites.
Data integrity − The need for updating data at multiple sites poses problems of data
integrity.
Overheads for improper data distribution − Responsiveness of queries is largely
dependent upon proper data distribution. Improper data distribution often leads to very
slow response to user requests.
Design of Distributed Databases
There are several different architectures for distributed database systems, including:
Client-server architecture: In this architecture, clients connect to a central
server, which manages the distributed database system. The server is responsible
for coordinating transactions, managing data storage, and providing access
control.
Peer-to-peer architecture: In this architecture, each site in the distributed
database system is connected to all other sites. Each site is responsible for
managing its own data and coordinating transactions with other sites.
Federated architecture: In this architecture, each site in the distributed
database system maintains its own independent database, but the databases are
integrated through a middleware layer that provides a common interface for
accessing and querying the data.
Distributed database systems can be used in a variety of applications, including
e-commerce, financial services, and telecommunications. However, designing
and managing a distributed database system can be complex and requires careful
consideration of factors such as data distribution, replication, and consistency.
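As one illustration of a data distribution decision, the sketch below horizontally fragments a hypothetical customer table across two in-memory SQLite databases standing in for two sites; the routing rule (id parity) is an arbitrary illustrative choice.

```python
# A sketch of horizontal fragmentation: rows are routed to different sites by
# a partitioning rule. Two in-memory SQLite databases stand in for two sites.
import sqlite3

sites = {"site_A": sqlite3.connect(":memory:"),
         "site_B": sqlite3.connect(":memory:")}
for con in sites.values():
    con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

def site_for(customer_id: int) -> str:
    return "site_A" if customer_id % 2 == 0 else "site_B"

# Each row is stored only at the site chosen by the fragmentation rule.
for cid, name in [(1, "Asha"), (2, "Ravi"), (3, "Meera"), (4, "Vikram")]:
    sites[site_for(cid)].execute("INSERT INTO customer VALUES (?, ?)", (cid, name))

# A point query is routed straight to the owning site.
cid = 3
row = sites[site_for(cid)].execute(
    "SELECT name FROM customer WHERE customer_id = ?", (cid,)).fetchone()
print(row)          # ('Meera',)

# A global query must gather results from every site.
all_rows = [r for con in sites.values()
            for r in con.execute("SELECT * FROM customer")]
print(sorted(all_rows))
```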
Advantages of Distributed Database System :
1) There is fast data processing as several sites participate in request processing.
2) Reliability and availability of this system is high.
3) It possesses reduced operating costs.
4) It is easier to expand the system by adding more sites.
5) It has improved sharing ability and local autonomy.
Disadvantages of Distributed Database System :
1) The system becomes complex to manage and control.
2) The security issues must be carefully managed.
3) The system requires deadlock handling during transaction processing, otherwise
the entire system may end up in an inconsistent state.
4) There is a need for standardization in the processing of distributed database
systems.