Unit 4 DBMS Notes
Overview
Transactions are a set of operations used to perform a logical unit of work. A transaction
is made to change data in a database, which can be done by inserting new data, updating existing
data, or deleting data that is no longer required. There are certain transaction states
which tell the user about the current condition of a database transaction and what further steps to
follow for its proper processing.
Introduction
You might have encountered a situation where your system crashed due to some hardware or
software issue and got rebooted to ensure that all the data is restored in a consistent state. This
protection of the user's data even in the case of a system failure is one of the major advantages
of a database management system. Various transactions are done as a part of manipulating the data
in a database; these transactions can be seen as a set of operations executed by the user
program in the DBMS. Execution of a similar transaction multiple times leads to the generation of
multiple transactions. For example, withdrawing some amount of money from an ATM can be seen
as one transaction, and withdrawing money several times generates several transactions.
A certain set of operations takes place when a transaction is carried out to perform a
logical unit of work. For example, when we go to withdraw money from an ATM, we go through a
sequence of steps that begins with the transaction being initiated and ends with the transaction
being processed.
The above is the set of operations done by you. But in the case of a transaction in a
DBMS, there are three major operations used for a transaction to execute in an efficient
manner: read, write, and commit.
Let's understand these three operations in a transaction with a real-life example of
transferring money between two bank accounts.
Initial balances before the start of the transaction: Account1 = ₹ 5000 and Account2 = ₹ 2000.
This data is stored in secondary memory (the hard disk) before the transaction starts and,
once the transaction is initiated, is brought into the primary memory (RAM) of the system for faster access.
Now, for a transfer of ₹ 500 from Account1 to Account2 to occur, the following set of operations
takes place:

Read(Account1)  --> 5000
Account1 = Account1 - 500
Write(Account1) --> 4500
Read(Account2)  --> 2000
Account2 = Account2 + 500
Write(Account2) --> 2500
Commit
The COMMIT statement permanently saves the changes made by the current transaction.
When a transaction is successful, COMMIT is applied. If the system fails before a COMMIT is
applied, the transaction reaches its previous state after ROLLBACK.
After the commit operation, the transaction ends with the updated values Account1 = ₹ 4500 and
Account2 = ₹ 2500. Every operation that occurs before the commit is said to be in a
partially committed state and is stored in primary memory (RAM). After the transaction is
committed, the updated data is accepted and written to secondary memory (the hard disk).
If the transaction fails anywhere before committing, it gets aborted and has to start from
the beginning, as it cannot be continued from the point of failure. This is known as rollback.
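The read-write-commit-rollback cycle above can be sketched in code. Below is a minimal Python sketch; the dictionaries standing in for the hard disk and RAM and the transfer helper are illustrative assumptions, not real DBMS internals.

    # A minimal sketch of the Read -> Write -> Commit / Rollback cycle described above.
    # "disk" stands in for secondary memory (hard disk); "buffer" for primary memory (RAM).

    disk = {"Account1": 5000, "Account2": 2000}      # committed state

    def transfer(amount):
        buffer = {}                                  # uncommitted work lives in RAM
        try:
            buffer["Account1"] = disk["Account1"] - amount   # Read(Account1), subtract
            buffer["Account2"] = disk["Account2"] + amount   # Read(Account2), add
            if buffer["Account1"] < 0:
                raise ValueError("insufficient funds")       # simulated failure
            disk.update(buffer)                              # Commit: changes become permanent
            print("committed:", disk)
        except ValueError as err:
            # Rollback: the buffer is simply discarded, so the disk keeps the old
            # consistent values and the transaction can restart from the beginning.
            print("rolled back:", err, "| disk unchanged:", disk)

    transfer(500)     # committed: {'Account1': 4500, 'Account2': 2500}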
Transaction States in DBMS
During its lifetime, a transaction goes through many states. These states inform the
operating system about the current condition of the transaction and tell the user how to plan
further processing of the transaction. These states decide the regulations that determine the fate of
the transaction. The ROLLBACK statement undoes the changes made by the current transaction; a
transaction that fails before committing is rolled back to its last consistent state. The main
transaction states are:
Active State: When the operations of a transaction are running, the transaction is said to
be in the active state. If all the read and write operations are performed without any error, it
progresses to the partially committed state; if any operation fails, it goes to the failed state.
Partially Committed: After all the read and write operations are completed, the changes are
held in main memory (the local buffer). If these changes are made permanent in the database,
the state progresses to the committed state; in case of a failure, it goes to the
failed state.
Failed State: If any operation during the transaction fails due to a software or hardware
issue, the transaction goes to the failed state. A failure during a transaction does not make
a permanent change to the data in the database; the changes made in local memory are discarded.
Aborted State: If the transaction fails during its execution, it goes from the failed state to
the aborted state, and because in the previous states all the changes were made only in main
memory, these uncommitted changes are either deleted or rolled back. The transaction at this
point can restart afresh from the active state.
Committed State: If the transaction completes all its operations successfully, all the
changes made during the partially committed state are permanently stored, the
transaction is said to be complete, and it can progress to finally get terminated.
Terminated State: If the transaction gets aborted after rollback, or the transaction comes
from the committed state, the database reaches a consistent state and is ready for a new
transaction.
ACID Properties
Every transaction must follow four properties to maintain state consistency in the database, both
before and after the transaction. These are called ACID properties.
1. Atomicity: This property means that either the transaction takes place completely at once
or doesn't happen at all. There is no middle option, i.e., transactions do not occur partially.
Each transaction is considered as one single unit which either runs completely or is not
executed at all.
2. Consistency: This property means that the integrity constraints of a database are
maintained, so that the database is consistent before and after the transaction. It refers to the
correctness of a database.
3. Isolation: This property means that multiple transactions can occur concurrently without
causing any inconsistency in the database state. These transactions occur independently
without any external interference. Changes that occur in a particular transaction are not
visible/accessible to any other transaction until that particular change in that transaction has
been committed.
4. Durability: This property ensures that once the transaction has completed execution, the
updates and modifications to the database are stored in and written to disk, and they remain
intact even if a system failure occurs. These updates become permanent and are stored in
non-volatile memory.
Problems Faced When Concurrent Transactions Are Executed
Concurrent execution allows multiple transactions to operate on a shared database at the same
time. However, when concurrent transactions are executed in an uncontrolled manner, several
problems arise. The five leading problems that occur due to uncontrolled execution of concurrent
transactions are:
1. Lost Update Problem
2. Dirty Read Problem
3. Unrepeatable Read
4. Incorrect Summary
5. Phantom Read
1. Lost Update Problem
A lost update problem appears when two or more transactions access the same data item
and perform their operations on it. The data is first read by the transaction, the operation is
performed, and then a write operation permanently saves the change in the actual database.
Similarly, two or more transactions read the data from the database and perform their own
operations, but during the write operation the updated data from one transaction is sometimes
overwritten by another transaction, due to which the new data from the first transaction gets
lost. This situation is called the lost update problem.
Lost Update Problem example

T1                     T2
Read(X)
                       Read(X)
Update X = X + 1000
                       Update X = X - 500
Write(X)
                       Write(X)
Let there be two transactions, T1 and T2. Transaction T1 first reads the data item X from the
database, and then transaction T2 reads the same data. After that, T1 performs its operation to
add 1000 to X, and T2 performs its operation to subtract 500 from the value of X it read. Then
T1 performs the write operation, which stores the value of X according to the changes made by
T1, and then T2 runs its write operation, which again updates the value of X in the database.
This causes the loss of the change to X made by T1, as T2 overwrites X after it was updated by
T1. We can say that the update made by T1 is lost.
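A small Python sketch can reproduce this anomaly; the threads, the shared db dictionary, and the deliberate sleep that forces the bad interleaving are illustrative assumptions.

    # A sketch of the lost update anomaly: two "transactions" read X, compute,
    # and write back without any locking, so one update overwrites the other.
    import threading, time

    db = {"X": 1000}

    def transaction(delta):
        local_x = db["X"]          # Read(X)  -- both threads may read the same value
        time.sleep(0.1)            # interleaving window
        db["X"] = local_x + delta  # Write(X) -- the later write silently wins

    t1 = threading.Thread(target=transaction, args=(1000,))   # T1: X = X + 1000
    t2 = threading.Thread(target=transaction, args=(-500,))   # T2: X = X - 500
    t1.start(); t2.start()
    t1.join(); t2.join()

    # A serial execution would end with X = 1500; here one update is lost.
    print(db["X"])   # typically 500 or 2000, never 1500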
2. Dirty Read Problem
This is another problem that occurs due to concurrent execution in an uncontrolled
manner. In this problem, one transaction reads a data value from the database and updates it
with some operation but does not complete (commit); another transaction reads that updated
value and completes its execution with the 'commit' statement. Afterwards, the first
transaction reads some other data or performs another operation and fails.
According to the atomicity property, the failed transaction is rolled back and
restarted from the beginning, i.e., from the old value of the data. But the other transaction,
which has already completed after reading the uncommitted updated value, is now an
incorrect transaction, as the data it read is wrong; such a read is called a dirty read.
This whole scenario is called the dirty read problem.
Dirty read problem example

T1                     T2
Read(Y)
Update Y = Y + 1000
Write(Y)
                       Read(Y)
                       Update Y = Y + 500
                       Write(Y)
                       Commit
Read(Z) // Failed
Write(Z)
Let transaction T1 be executing: it reads a data item Y from the database, updates it by adding
1000, and then executes the write statement (but the transaction is not finished). Then another
transaction T2 comes into action: it reads the updated value of Y, adds 500 to it, executes its
'write' statement, and completes its execution with the commit command.
Then transaction T1 performs a read operation on some data item Z and fails. This failure
aborts transaction T1, and then transaction T2, which has already completed after reading the
updated value of Y that was part of the aborted transaction, is no longer a valid database
transaction. Hence the data returned or written by transaction T2 is considered to be wrong.
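The dirty read can be sketched sequentially in Python; the uncommitted buffer that T2 is allowed to read from is an illustrative stand-in for a system running with no isolation.

    # A sequential sketch of the dirty read: T2 reads a value that T1 has written
    # but not yet committed; when T1 aborts, T2 has already committed wrong data.

    db = {"Y": 1000}              # committed values
    uncommitted = {}              # T1's writes, visible to T2 (no isolation)

    # T1: Read(Y), Y = Y + 1000, Write(Y)  -- not yet committed
    uncommitted["Y"] = db["Y"] + 1000

    # T2: reads the dirty value, adds 500, writes and commits
    dirty_y = uncommitted.get("Y", db["Y"])       # dirty read: sees 2000
    db["Y"] = dirty_y + 500                       # T2 commits Y = 2500

    # T1: Read(Z) fails, so T1 aborts and its update to Y is discarded
    uncommitted.clear()

    # T2's committed value was derived from data that never officially existed.
    print(db["Y"])   # 2500, built on the aborted update of T1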
3. Unrepeatable Read Problem
An unrepeatable read appears when two or more transactions execute on the same
data. In this problem, one transaction reads a data item from the database, and another
transaction reads the same data item. Then the first transaction updates the value of that data
item and performs a write operation, after which the other transaction again reads the updated
value of the data.
This time the data has changed, because the first transaction performed an operation on that
value, but the other transaction is unaware of it: from its point of view, both read operations
should return the same value, yet the data is found to be different when the read operation is
performed the second time. This causes inconsistency of data.
Let's understand this problem with the help of an example:

T1                     T2
Read(X)
                       Read(X)
Update X = X + 100
Write(X)
                       Read(X)
                       Read(Y)
                       Write(Y)
Let two transactions be T1 and T2, where T1 starts executing and performs a read
operation on data item X from the database. Then T2 also reads the same value of X from
the database. T1 then adds 100 to X and performs a write operation
on X. When T2 reads the value of X again, the value has changed to X + 100,
which makes the transaction inconsistent: for T2, the value of X should be the same
when the read operation is performed on X the second time.
4. Incorrect Summary Problem
The incorrect summary problem occurs when two or more transactions execute their
operations concurrently: one transaction calculates an aggregate (summary) function over data
from the database while the other updates the same data. The aggregate may therefore include
some values before they are updated and some after, which leads to inconsistency. In effect,
the result of one transaction is used as the input data value of another transaction.
Let us understand the situation with the help of an example:

T1                     T2
Read(X)
Update X = X + 1000
Write(X)
                       Read(X)
                       Update X = X - 500
                       Write(X)
Let there be two transactions T1 and T2. Transaction T1 reads X from the database, performs
the addition X + 1000, and then performs the write operation on X. Now
transaction T2 reads the value of X, which has been updated to X + 1000, and subtracts 500
from it. However, T2 may calculate some values before they have been updated by
transaction T1, producing an incorrect summary.
5. Phantom Read Problem
The phantom read problem occurs when a transaction reads data from the database,
another transaction reads the same data, and the data is then deleted as an
operation of the first transaction. Now, when the other transaction tries to read that data
again, it does not find the data to read. It is a special case of the unrepeatable read problem.
Let's understand more with the help of an example:

T1                     T2
Read(X)
                       Read(X)
Delete(X)
                       Read(X)
Let transaction T1 read the data item X from the database, and then let transaction T2 read the
same data item X from the database. Now transaction T1 deletes X from the database, and
then T2 tries to read X again. Since X has already been deleted by T1, the transaction will not
be able to read X.
Serializability in DBMS
Overview
A transaction is a set of instructions used to perform logical operations on a database.
What is Serializability?
Serializability helps in managing transactions so that they can execute simultaneously without
interfering with one another. In simple words, serializability is a way to check whether the
execution of two or more transactions maintains the consistency of the database.
A schedule in DBMS is a sequence in which the operations of multiple transactions are executed.
R(X) means Reading the value: X; and W(X) means Writing the value: X.
Schedules in DBMS are of two types:
1. Serial Schedule - A schedule in which only one transaction is executed at a time, i.e., one
transaction completes its execution before the next one begins.
Example:

Transaction-1          Transaction-2
R(a)
W(a)
R(b)
W(b)
                       R(b)
                       W(b)
                       R(a)
                       W(a)
Here, we can see that Transaction-2 starts its execution after the completion of Transaction-1.
Serial schedules are always serializable because the transactions work only one after the other.
Also, for n transactions, n! serial schedules are possible.
2. Non-Serial Schedule - A schedule in which the transactions execute in an interleaved
manner. Several transactions run simultaneously as the system switches among them, often
working on the same piece of data. Hence, the serializability of non-serial schedules is a major
concern, so that our database is consistent before and after the execution of the transactions.
Example:

Transaction-1          Transaction-2
R(a)
W(a)
                       R(b)
                       W(b)
R(b)
                       R(a)
W(b)
                       W(a)
We can see that Transaction-2 starts its execution before the completion of Transaction-1, and they
are interchangeably working on the same data, i.e., "a" and "b".
A non-serial schedule is called a serializable schedule if it can be converted to an equivalent
serial schedule. In simple words, if a non-serial schedule and a serial schedule produce the same
result, then the non-serial schedule is called serializable.
To test the serializability of a schedule, we can use a Serialization Graph or Precedence Graph.
A serialization graph is nothing but a directed graph over the transactions of a schedule.
It can be defined as a graph G(V, E) consisting of a set of directed edges E = {E1, E2, E3, ..., En}
and a set of vertices V = {V1, V2, V3, ..., Vn}. An edge
Ti -> Tj means that transaction Ti performs a conflicting read or write before transaction Tj.
NOTE:
If there is a cycle present in the serialization graph, then the schedule is non-serializable, because
the cycle indicates that one transaction is dependent on the other transaction and vice versa. It also
means that there are one or more conflicting pairs of operations in the transactions. On the other
hand, if there is no cycle, the schedule is serializable.
Two operations inside a schedule are called conflicting if they meet these three conditions:
1. They belong to different transactions.
2. They operate on the same data item.
3. At least one of them is a WRITE operation.
To conclude, let’s take two operations on data: "a". The conflicting pairs are:
1. READ(a) - WRITE(a)
2. WRITE(a) - WRITE(a)
3. WRITE(a) - READ(a)
Note: READ(a) - READ(a) is not a conflicting pair, since neither operation changes the value of "a".
Let's take an example of a schedule "S" having three transactions t1, t2, and t3 working
simultaneously:

t1            t2            t3
R(x)
                            R(z)
                            W(z)
R(y)
              R(y)
              W(y)
                            W(x)
              W(z)
W(x)
This is a non-serializable schedule. R(x) of T1 conflicts with W(x) of T3, so there is a directed edge
from T1 to T3. R(y) of T1 conflicts with W(y) of T2, so there is a directed edge from T1 to T2. W(x)
of T3 conflicts with W(x) of T1, so there is a directed edge from T3 to T1. Similarly, we make
edges for every conflicting pair. Since a cycle is formed (T1 -> T3 -> T1), the transactions are not serializable.
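The precedence-graph test used above can be sketched in Python; the tuple encoding of the schedule and the helper names are illustrative assumptions.

    # A sketch of the precedence-graph test: build edges between conflicting
    # operations of different transactions, then look for a cycle via DFS.

    def conflict_edges(schedule):
        """schedule: list of (txn, op, item) tuples in execution order."""
        edges = set()
        for i, (ti, op_i, x_i) in enumerate(schedule):
            for tj, op_j, x_j in schedule[i + 1:]:
                # conflicting pair: different txns, same item, at least one write
                if ti != tj and x_i == x_j and "W" in (op_i, op_j):
                    edges.add((ti, tj))
        return edges

    def has_cycle(edges):
        graph = {}
        for u, v in edges:
            graph.setdefault(u, []).append(v)
        visiting, done = set(), set()
        def dfs(u):
            if u in visiting: return True
            if u in done: return False
            visiting.add(u)
            if any(dfs(v) for v in graph.get(u, [])): return True
            visiting.remove(u); done.add(u)
            return False
        return any(dfs(u) for u in graph)

    # The schedule from the example above (columns flattened to one list):
    S = [("t1","R","x"), ("t3","R","z"), ("t3","W","z"), ("t1","R","y"),
         ("t2","R","y"), ("t2","W","y"), ("t3","W","x"), ("t2","W","z"),
         ("t1","W","x")]
    edges = conflict_edges(S)
    print(edges)                                   # includes ('t1','t3') and ('t3','t1')
    print("serializable:", not has_cycle(edges))   # False -> not conflict serializable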
Types of Serializability
The serializability of a non-serial schedule can be verified in two main ways: Conflict
Serializability and View Serializability.
One more way to check serializability is to form an equivalent serial schedule that produces the
same result as the original non-serial schedule. Since this process focuses only on the output rather
than on the operations taking place between the switching of transactions, it is not practically used.
Conflict Serializability
A non-serial schedule is conflict serializable if, after swapping its non-conflicting
operations, it results in a serial schedule. It is checked by comparing the non-serial schedule with an
equivalent serial schedule. This process of checking is called conflict serializability in DBMS.
It is tedious to use if we have many operations and transactions, as it requires a lot of swapping.
For checking, we use the same precedence graph technique discussed above. First, we
find the conflicting pairs of operations (read-write, write-read, and write-write) and then form directed
edges between the transactions of those conflicting pairs. If we can find a loop in the graph, the
schedule is not conflict serializable; otherwise, it is conflict serializable.
Example:
We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously. Let's form
its precedence graph.

t1            t2            t3
R(x)
R(y)
                            R(y)
                            W(y)
W(x)
                            W(x)
              R(x)
              W(x)
As there is no loop in the precedence graph, the schedule "S" is a conflict serializable schedule.
The order of the transactions: t1 executes first, as there is no incoming edge on t1; t3 executes
second, as it depends only on t1; t2 executes last, as it depends on both t1 and t3.
NOTE:
If a schedule is conflict serializable, then it is surely a consistent schedule. On the other hand, a
schedule that is not conflict serializable may or may not be serializable. To further check its
serializable behavior, we use view serializability.
View Serializability
If a non-serial schedule is view equivalent to some serial schedule, then the schedule is called
a view serializable schedule.
The two conditions needed by schedules (S1 and S2) to be view equivalent are:
1. Initial read: the first read of each data item must be done by the same transaction in both
schedules. Example: if transaction t1 reads "A" from the database first in schedule S1, then in
schedule S2, t1 should also read "A" first.
2. Final write: the final write on each data item must be done by the same transaction in both
schedules. Example: if transaction t1 updated "A" last in S1, then in S2, t1 should perform the
final write as well.
Example: We have a schedule "S" having two transactions t1 and t2 working simultaneously.
S:

t1            t2
R(x)
W(x)
              R(x)
              W(x)
R(y)
W(y)
              R(y)
              W(y)
Let's form its view equivalent schedule (S') by interchanging the mid read-write operations of both
transactions.
S':

t1            t2
R(x)
W(x)
R(y)
W(y)
              R(x)
              W(x)
              R(y)
              W(y)
Since a view equivalent schedule is possible, it is a view serializable schedule.
NOTE:
A conflict serializable schedule is always view serializable, but the converse is not always true.
To check whether a schedule is view serializable:
1. Check for conflict serializability first; every conflict serializable schedule is also view
serializable.
2. If the schedule is not conflict serializable, check for a blind write (a write performed without
first reading the data item). If there is a blind write, then the schedule can be view serializable,
so check its view serializability using the view equivalent schedule technique stated
above. If there is no blind write, then the schedule can never be view serializable.
Example:
We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously.
S:

t1            t2            t3
R(x)
              W(x)
W(x)
                            W(x)
Looking at its precedence graph: since a loop is present (t1 -> t2 and t2 -> t1), the schedule is not
conflict serializable. However, there are blind writes [t2 -> W(x) and t3 -> W(x)] present, hence we
check for view serializability.
S':

t1            t2            t3
R(x)
W(x)
              W(x)
                            W(x)
Hence, the schedule is view serializable.
NOTE:
One important thing to note here is that for three transactions, 3! = 6 serial orderings are possible:
t1 t2 t3
t1 t3 t2
t2 t1 t3
t2 t3 t1
t3 t1 t2
t3 t2 t1
Benefits of Serialization
Serialization helps in checking concurrency control between multiple transactions. It also helps in
maintaining consistency in the database before and after any transaction. Serializable schedules are
resource-efficient and help in improving CPU throughput (total work done in a certain time).
Conclusion
Serializability ensures that the concurrent execution of transactions is equivalent to some serial
execution, keeping the database consistent. It can be verified through conflict serializability (using
a precedence graph) or through view serializability (using view equivalence).
Recoverability in DBMS
Overview
In a database management system, multiple transactions happen in parallel. Some of these
transactions are dependent on each other, while some are independent. If one transaction in a chain
of dependent transactions fails, can we minimize its impact on the other transactions?
Recoverability in DBMS is the area that deals with such issues. In this article, we discuss
recoverable and irrecoverable schedules, followed by cascading and cascadeless rollback schedules.
Recoverable Schedule
Suppose Aarnav has 10 lakhs in his account, and his friend Jaanvi, who has 1 lakh in her account,
needs 2 lakhs more to buy a new gaming laptop. The laptop is a limited edition, and once she places
the order, she cannot cancel it. The money must be paid within 30 minutes of placing the order.
She texted Aarnav to transfer her 2 lakhs and placed an order for the 3 lakh gaming laptop. If
Aarnav doesn't transfer the money to Jaanvi, Jaanvi may get in trouble, since she can't cancel the
order and she does not have the money. Hence, a wise decision would have been to place the order
only after receiving the 2 lakhs from Aarnav.
Therefore, Jaanvi placing the order only once she receives the money from Aarnav can be
considered a recoverable schedule, since she keeps the option of whether to place the order or not.
A schedule is a recoverable schedule if each transaction present in the schedule
commits only after all the transactions from which it has read the values are
executed/committed entirely. If T1 and T2 are two transactions and T2 reads data
from T1, then T2 should commit only after T1 commits. Mathematically,
(T1→T2 Reads)⟹(T1 Commits→T2 Commits)
Irrecoverable Schedule
A schedule is said to be irrecoverable if a transaction commits before the transaction from which
it has read the data commits.
Let us see one more example for better understanding. Consider two transactions T1 and T2 in a
schedule. T1 reads and writes a data item but is not committed. T2 starts, reads the data
updated by T1, and commits. Now, if T1 fails, both T1 and T2 need to roll back and start again. This
is an irrecoverable schedule, since T1 can roll back (it was not committed) but T2 can't, since it is
already committed.
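The recoverability rule can be turned into a small checker. A Python sketch, assuming a simple tuple encoding of read, write, and commit events:

    # A sketch of the recoverability rule: every transaction must commit only
    # after each transaction it read uncommitted data from has committed.

    def is_recoverable(schedule):
        """schedule: ordered events like ('W','T1','x'), ('R','T2','x'), ('C','T1')."""
        last_writer = {}      # item -> txn holding the latest uncommitted write
        reads_from = {}       # txn -> set of txns it read uncommitted data from
        committed = set()
        for event in schedule:
            if event[0] == "W":
                _, txn, item = event
                last_writer[item] = txn
            elif event[0] == "R":
                _, txn, item = event
                writer = last_writer.get(item)
                if writer and writer != txn and writer not in committed:
                    reads_from.setdefault(txn, set()).add(writer)
            else:                              # ('C', txn): commit
                _, txn = event
                if any(src not in committed for src in reads_from.get(txn, ())):
                    return False               # committed before its source did
                committed.add(txn)
        return True

    # T2 reads T1's uncommitted write and commits first -> irrecoverable
    print(is_recoverable([("W","T1","x"), ("R","T2","x"), ("C","T2"), ("C","T1")]))  # False
    # T2 waits for T1's commit -> recoverable
    print(is_recoverable([("W","T1","x"), ("R","T2","x"), ("C","T1"), ("C","T2")]))  # True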
Cascade Recoverable Rollback Schedule
Let us first understand what "cascading" means through an example. Cascade generally means that
successive events depend on the occurrence of previous events. For example, water falls onto
the nth stair after flowing over the (n-1)th stair, and it reaches the (n-1)th stair after
traversing the (n-2)th stair; all the stairs form a cascade. The water would not fall
through the successive stairs of the cascade if it did not flow over one of the previous stairs.
If the transactions form a cascade without any commits, then the schedule is called a cascading
recoverable schedule. In this type of schedule, if a previous transaction fails, the following
transactions need to be aborted and started again, since otherwise there would be inconsistency in
the data.
Cascadeless Recoverable Rollback Schedule
Abhijeet, Bhaswat, and Clara are sitting in the exam hall one behind another. Clara is copying from
Bhaswat, and Bhaswat is copying from Abhijeet. In what order should they submit their answer
sheets?
Abhijeet should submit the paper first, followed by Bhaswat and then Clara. Otherwise, say Abhijeet
finds a mistake in his answer after Bhaswat and Clara have submitted their answer sheets: he can
update the answer and then submit, and Abhijeet will get more marks, since Bhaswat and Clara
can't update their solutions once their answer sheets are submitted.
Hence, in a cascadeless recoverable schedule, a transaction should read data from another
transaction only after that transaction has committed.
Conclusion
Recoverability in DBMS is like backup and restore used to satisfy the durability of
transaction schedules.
If each transaction present in the schedule commits only after all the transactions from
which it has read the values are committed, then the schedule is said to be recoverable
schedule.
Cascadeless schedule is a recoverable schedule in which the following transactions can read
the data of the previous transactions only when the previous transactions are committed.
Log-Based Recovery in DBMS
Overview
Log-based recovery in DBMS provides the ability to maintain or recover data in case of system
failure. The DBMS keeps a record of every transaction on some stable storage device to provide
easy access to the data when the system fails. A log record is created for every operation performed
on the database, and the log record must be stored before the corresponding change is applied to the
database.
As a real-world example, Apache HBase, a distributed data store based on HDFS, employs
log-based recovery to recover from system failures. A write-ahead log (WAL) entry is written every
time a database write occurs, and it is replicated on HDFS. After the transaction is recorded in the
WAL, it is moved to the memstore, which lives on the HBase region server. The memstore fills up
after multiple writes; as soon as it is full, an HFile is created and flushed to disk.
If the system holding the memstore shuts down or restarts, the data in the memstore cache that has
not yet been flushed to disk as HFiles is lost. The WAL is then replayed to reconstruct the data and
repopulate the memstore, so that when it fills again, the data can be flushed to permanent storage.
Introduction to Log-Based Recovery in DBMS
A log is a series of records. The logs for each transaction are kept in a log file to allow recovery in
case of failure. A log is kept for each operation performed on the database. It is important to store
the log before the actual transactions are applied to the database. Take the example of modifying
the City field of a student record.
A new log record is written to the file when the City is changed from Chennai to NCR: <Tn, City, 'Chennai', 'NCR'>.
Once the transaction has been completed, another log record is written to indicate that the
transaction has committed: <Tn, commit>.
Each log record contains the following fields:
Name Description
Transaction identifier It is an identifier that distinguishes one transaction from another.
Data item identifier It uniquely identifies data.
Old value This is the value of the data item before writing.
New value This is the value of an item after it has been written.
The different types of log records are denoted by the following names:

Command            Description
<Ti start>         The transaction Ti has begun.
<Ti, Xi, V1, V2>   The transaction Ti performed a write on data item Xi; Xi contained value V1 before the write and contains value V2 after it.
<Ti commit>        The transaction Ti has committed.
<Ti abort>         The transaction Ti has aborted.
Transactions that perform a write operation to the database modify it only after a log record is
created for that write operation. The logs may or may not record the reading of data items, because
reading a data item has no effect on the consistency of the database and is not essential for
recovery. The two recovery operations are:
1. Undo(Ti): Restores the old values of all data items updated by transaction Ti. It is applied
when the log contains the record <Ti start> but not the record <Ti commit>.
2. Redo(Ti): Sets all data items updated by transaction Ti to their new values. It is applied
when the log contains both the record <Ti start> and the record <Ti commit>.
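Both operations can be sketched over a toy log in Python; the record encoding and the db dictionary standing in for the disk are illustrative assumptions.

    # A sketch of recovery with Undo/Redo over a log of
    # <start>, <T, X, old, new> and <commit> records.

    log = [
        ("start", "T1"),
        ("write", "T1", "City", "Chennai", "NCR"),   # <T1, City, Chennai, NCR>
        ("commit", "T1"),
        ("start", "T2"),
        ("write", "T2", "Balance", 5000, 4500),
        # system crashed before <T2 commit>
    ]

    db = {"City": "NCR", "Balance": 4500}            # state on disk after the crash

    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo(Ti): reapply new values of every transaction with <start> and <commit>
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, txn, item, old, new = rec
            db[item] = new

    # Undo(Ti): restore old values of transactions with <start> but no <commit>,
    # scanning the log backwards
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, txn, item, old, new = rec
            db[item] = old

    print(db)   # {'City': 'NCR', 'Balance': 5000} -- T1 redone, T2 undone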
Checkpoint in DBMS
Overview
The DBMS checkpoint is a procedure for compressing the transaction log file by transferring old
transactions to permanent storage. It helps in system recovery when a failure occurs. The
procedure of system recovery includes reading the log file in reverse order and maintaining redo
and undo lists.
Introduction
In a real-time environment, the transaction log file occupies an ample amount of storage space. A
transaction log file contains a record of the operations performed by the transactions in the
database; it ensures consistency on crashes or hardware failures. Tracing every single update and
maintaining the log files also contributes to filling up the system's memory. The DBMS checkpoint
comes into the picture when the size of the transaction log file becomes too large to manage and
handle easily. The DBMS checkpoint is a mechanism for compressing the transaction log file by
transferring the old transactions to permanent storage. The checkpoint marks the position up to
which the consistency of the transactions is maintained. During the execution of the transactions,
the cursor passes through the marked checkpoint; at that point, all the transactions get saved into
the database and erased from the log file. The log file then starts getting filled up with a new list
of operations until the next checkpoint.
The schematic representation below shows system recovery when a failure occurs during the
concurrent execution of transactions. A checkpoint is taken while T1 is active, and the failure
occurs before T4 commits:

T1            T2            T3            T4
START
-------------- CHECKPOINT --------------
              START
COMMIT
                            START
              COMMIT
                            COMMIT
                                          START
-------------- FAILURE -----------------
Following are the steps performed for system recovery using a checkpoint:
The transaction log file is read in reverse order, i.e., from T4 to T1.
Redo and undo lists are created and maintained by the system.
If a transaction contains the operations <Tn, start> and <Tn, commit> together, or <Tn, commit>
alone, the transaction is stored in the redo list. In the diagram above, transaction T1 contains
only <Tn, commit> after the checkpoint, while transactions T2 and T3 contain both <Tn, start>
and <Tn, commit>; therefore transactions T1, T2, and T3 are stored in the redo list.
If a transaction contains the operation <Tn, start> but not <Tn, commit>, the transaction is
stored in the undo list. In the diagram above, transaction T4 contains <Tn, start> but not
<Tn, commit>, and therefore transaction T4 is stored in the undo list.
Operations                        List
<Tn, start> and <Tn, commit>      Redo list
<Tn, commit>                      Redo list
<Tn, start>                       Undo list
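The reverse scan that builds these two lists can be sketched in Python; the record encoding is an illustrative assumption.

    # A sketch of building the redo and undo lists while reading the log
    # backwards from the failure point to the last checkpoint.

    log_after_checkpoint = [
        ("commit", "T1"),                 # T1: only <commit> after the checkpoint
        ("start", "T2"), ("commit", "T2"),
        ("start", "T3"), ("commit", "T3"),
        ("start", "T4"),                  # T4: <start> but no <commit> -> undo
    ]

    redo_list, undo_list = [], []
    seen_commit = set()

    for record in reversed(log_after_checkpoint):   # read in reverse order
        kind, txn = record
        if kind == "commit":
            seen_commit.add(txn)
            if txn not in redo_list:
                redo_list.append(txn)
        elif kind == "start" and txn not in seen_commit and txn not in undo_list:
            undo_list.append(txn)

    print("redo:", sorted(redo_list))   # ['T1', 'T2', 'T3']
    print("undo:", sorted(undo_list))   # ['T4']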
Advantages of Checkpoint
A checkpoint enhances the consistency of the database during the execution of transactions.
Checkpoints help in getting transactions back in the case of an accidental shutdown of the
database.
Checkpoints harden the dirty pages; hardening dirty pages refers to writing all the dirty pages
from memory to disk.
A checkpoint serves as the synchronization point between the database and the transaction log
file.
Concurrency Control in DBMS
Overview
When several transactions execute concurrently without any rules and protocols, various problems
arise that may harm the data integrity of several databases. These problems are known
as concurrency control problems. Therefore, several rules are designed to maintain consistency in
the transactions while they execute concurrently; these rules are known as concurrency control
protocols.
A transaction is a single logical unit of work that can retrieve or change the data of a
database. Executing each transaction individually increases the waiting time of the other
transactions, and the overall execution gets delayed. Hence, to increase the throughput and reduce
the waiting time, transactions are executed concurrently. For example: if many trains are set in a
row and only one train is allowed to move from station A to B while the others wait for the first
train to reach its destination, it will take a lot of time for all the trains to travel from station A
to B. To reduce the overall time, the trains should be allowed to move concurrently from station A
to B on parallel tracks.
When several transactions execute simultaneously, there is a risk of violating the integrity and
consistency of the data:
In a multi-user system, multiple users can access and use the same database at one time,
which is known as concurrent execution of the database. It means that the same database
is used simultaneously by multiple users and transactions.
While working with database transactions, multiple users may need to use the database to
perform different operations, in which case concurrent execution of the database takes place.
The simultaneous execution should be performed in an interleaved manner, and no operation
should affect the other executing operations, thus maintaining the consistency of the database.
Thus, with concurrent execution of transaction operations, several challenging problems arise
that need to be solved.
Lock-Based Protocol in DBMS
Overview
The database management system (DBMS) stores data items that can relate to one another and
can be altered at any point. There are instances where more than one user may attempt to access
the same data item at the same time, which brings the concurrent processing of transactions into
the picture. Lock-based protocols in DBMS are used to handle such situations.
We can define a lock-based protocol in DBMS as a mechanism that prevents a
transaction from reading or writing data until the necessary lock is obtained. The concurrency
problem can be solved by securing or locking a data item for a specific transaction; the lock is a
variable associated with a data item that denotes which operations can be performed on it.
Consider a real-life example of the lock-based protocol in DBMS: every second, millions of
payments take place. Consider buying a movie ticket at your preferred theatre. There is a chance
that two or more customers will try to reserve the same seat at the same time.
A scenario like this, when repeated on a big scale, can damage database consistency, resulting in
the corruption of data. To avoid any conflicts over user access to read and write the database
and to ensure that there are no concurrency anomalies, each transaction must be handled
separately. Lock-based protocols in DBMS are used to accomplish this purpose.
In DBMS lock-based protocols, there are two modes for locking and unlocking data items: Shared
Lock (lock-S) and Exclusive Lock (lock-X). Let's go through the two types of locks in detail:
Shared Lock
Shared locks, often denoted as lock-S(), are locks that provide read-only access to the
information associated with them. Whenever a shared lock is applied on a data item, it can be
read by several users, but these users who are reading the information will not have
permission to edit it or make any changes to the data item.
To put it another way, shared locks don't provide write access. Because
numerous users can read the data items simultaneously, multiple shared locks can be
placed on them at the same time, but the data item must not have any other lock, such as an
exclusive lock, on it.
A shared lock, also known as a read lock, is used solely for reading data objects, so read integrity
is preserved. Shared locks can also be used to prevent records from being updated.
An S-lock is requested via the lock-S instruction.
Example of shared locks: Consider a situation where the value of variable X equals 50 and
there are two transactions reading X. If one transaction wanted to change the value of X in the
middle, another transaction that tries to read the value could read an incorrect value of X.
However, the shared lock stops the writer from updating X until the readers are done reading.
When the lock-based protocol is applied to the transaction (say T1) discussed above, T1 holds a
shared lock on X, so other transactions may read X concurrently, but none can update it until the
lock is released.
Exclusive Lock
An exclusive lock allows the data item to be both read and written. Only one transaction can
hold an exclusive lock on a data item at a time, and the same item cannot carry any other lock
simultaneously. To obtain an X-lock, the user needs to use the lock-X instruction. After
finishing the 'write' step, a transaction can unlock the data item.
By imposing an X-lock on a transaction that needs to update a person's account balance, for
example, you can allow it to proceed. As a result of the exclusive lock, a second transaction that
wants to read or write the same balance must wait until the first transaction releases the lock.
At any given time, an exclusive lock can be owned by only one transaction.
Example of exclusive locks: Consider an instance where the value of a data item X is equal
to 50 and a transaction needs to deduct 20 from the data item X. By putting an X-lock on this
particular transaction, we can make it possible. As a result, the exclusive lock prevents any other
transaction from reading or writing X until the deduction is complete.
A vital point to remember when using lock-based protocols in a database management system is
that a shared lock can be held by any number of transactions, while an exclusive lock can be held
by only one transaction. This is because a shared lock only reads data and performs no other
activities, whereas an exclusive lock performs read as well as write activities.
When two transactions both seek to read a specific data item, both are authorized and no conflict
occurs; but when one transaction intends to write the data item while another transaction attempts
to read or write the same item, a conflict arises and one of them must wait.
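The shared/exclusive rules above can be sketched as a small lock table in Python; the class and method names are illustrative assumptions, not a real DBMS API.

    # A sketch of a lock table enforcing the shared/exclusive rules described
    # above: many readers may share lock-S, but lock-X excludes everyone else.

    class LockTable:
        def __init__(self):
            self.locks = {}   # item -> ("S", {txns}) or ("X", {txn})

        def lock_S(self, txn, item):
            mode, holders = self.locks.get(item, ("S", set()))
            if mode == "X" and holders:
                return False                    # someone is writing: must wait
            holders.add(txn)
            self.locks[item] = ("S", holders)
            return True

        def lock_X(self, txn, item):
            mode, holders = self.locks.get(item, ("S", set()))
            if holders and holders != {txn}:
                return False                    # any other holder blocks lock-X
            self.locks[item] = ("X", {txn})
            return True

        def unlock(self, txn, item):
            mode, holders = self.locks.get(item, ("S", set()))
            holders.discard(txn)
            if not holders:
                self.locks.pop(item, None)

    lt = LockTable()
    print(lt.lock_S("T1", "X"), lt.lock_S("T2", "X"))  # True True  (shared readers)
    print(lt.lock_X("T3", "X"))                        # False (readers still hold S)
    lt.unlock("T1", "X"); lt.unlock("T2", "X")
    print(lt.lock_X("T3", "X"))                        # True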
Two-Phase Locking Protocol
Overview
Locking in a database management system is used for handling transactions. The two-phase
locking protocol ensures conflict serializable schedules. A schedule is called conflict serializable
if it can be converted into a serial schedule by swapping its non-conflicting operations.
Scope: the growing phase and the shrinking phase of two-phase locking.
Two-Phase Locking
Before understanding the two phases of Locking, let's understand the types of locks used in
transaction control.
Shared Lock: Data can only be read when a shared lock is applied; data cannot be written.
It is denoted as lock-S.
Exclusive Lock: Data can be read as well as written when an exclusive lock is applied. It is
denoted as lock-X.
Growing Phase: In the growing phase, the transaction only obtains locks; it cannot release
any lock in this phase.
Shrinking Phase: In the shrinking phase, no new locks are obtained; locks are only released.
Only when all the data changes are stored are the locks released.
In the growing phase, locks are obtained until the point at which all the locks needed by the
transaction have been acquired. This point is called the lock point. After the lock point, the
transaction enters the shrinking phase.
In the strict two-phase locking protocol:
The transaction can release a shared lock after the lock point.
The transaction cannot release any exclusive lock until the transaction commits.
Without strictness, if one transaction rolls back, then any other transaction that has read its
uncommitted data must also roll back; the transactions become dependent on each other. Holding
every exclusive lock until commit avoids such cascading rollbacks.
In the conservative two-phase locking protocol:
The transaction must lock all the data items it requires before the transaction begins.
If any of the data items are not available for locking, the transaction does not begin and waits.
The set of data items to be read and written must be known before the transaction begins,
which is normally not possible.
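The two-phase discipline itself is easy to check mechanically. A minimal Python sketch, assuming a simple (kind, item) encoding of one transaction's lock and unlock actions:

    # A sketch checking that a transaction's lock/unlock sequence obeys the
    # two-phase rule: once any lock is released, no new lock may be acquired.

    def follows_2pl(actions):
        """actions: list of ('lock', item) / ('unlock', item) for one transaction."""
        shrinking = False
        for kind, item in actions:
            if kind == "unlock":
                shrinking = True               # lock point passed: shrinking phase
            elif shrinking:                    # a lock after an unlock breaks 2PL
                return False
        return True

    print(follows_2pl([("lock","A"), ("lock","B"), ("unlock","A"), ("unlock","B")]))  # True
    print(follows_2pl([("lock","A"), ("unlock","A"), ("lock","B")]))                  # False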
TIMESTAMP PROTOCOL
A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are
usually assigned in the order in which the transactions are submitted to the system. The basic
timestamp ordering (TO) rule for a read operation is:
If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X)
to the larger of TS(T) and the current R_TS(X). Otherwise (W_TS(X) > TS(T)), the read arrives
too late, and T is aborted and rolled back.
Whenever the basic TO algorithm detects two conflicting operations that occur in an
incorrect order, it rejects the later of the two operations by aborting the transaction
that issued it. Schedules produced by basic TO are guaranteed to be conflict serializable.
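The read rule quoted above, together with the symmetric write rule, can be sketched in Python; the per-item timestamp dictionaries and function names are illustrative assumptions.

    # A sketch of the Basic Timestamp Ordering rules, including the rejection
    # branch: an operation arriving "too late" aborts its transaction.

    read_ts, write_ts = {}, {}     # R_TS(X), W_TS(X) per data item

    def read_item(ts_t, item):
        """ts_t = TS(T), the timestamp of the reading transaction."""
        if write_ts.get(item, 0) > ts_t:
            # a younger transaction already wrote X: conflicting order -> abort T
            return f"abort T(ts={ts_t})"
        read_ts[item] = max(ts_t, read_ts.get(item, 0))   # R_TS(X) = max(TS(T), R_TS(X))
        return f"read OK, R_TS={read_ts[item]}"

    def write_item(ts_t, item):
        if read_ts.get(item, 0) > ts_t or write_ts.get(item, 0) > ts_t:
            return f"abort T(ts={ts_t})"       # a younger txn already read/wrote X
        write_ts[item] = ts_t
        return f"write OK, W_TS={ts_t}"

    print(write_item(5, "X"))    # write OK, W_TS=5
    print(read_item(10, "X"))    # W_TS(X)=5 <= 10 -> read OK, R_TS=10
    print(read_item(3, "X"))     # W_TS(X)=5 > 3  -> abort T(ts=3)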
VALIDATION-BASED PROTOCOL
The validation-based protocol avoids locking during transaction execution and works on the
assumption that if no transactions run concurrently, no interference occurs. This is why it is also
called the optimistic concurrency control technique.
In this protocol, a transaction doesn't make any changes to the database directly; instead, it
performs all its changes on local copies of the data items that are maintained inside the
transaction. At the end of the transaction, a validation is performed. If it doesn't violate any
serializability rule, the transaction commits the changes to the database; otherwise, it is rolled
back and restarted. The protocol has three phases:
1. Read phase: In this phase, the transaction reads the values of the data items and performs all
its write operations on temporary local copies, without updating the actual database.
2. Validation phase: In this phase, a validation check is performed on the temporary variables
to see whether the transaction violates the rules of serializability.
3. Write phase: This is the final phase of the validation-based protocol. If the validation of the
transaction is successful, the values of the temporary local variables are written to the
database and the transaction is committed. If the validation fails in the second phase, the
updates are discarded and the transaction is restarted later.
Each transaction is associated with three timestamps:
Start(Tn): the timestamp when the transaction Tn starts its execution.
Validation(Tn): the timestamp when the transaction Tn finishes the read phase and starts the
validation phase.
Finish(Tn): the timestamp when the transaction Tn finishes all its write operations.
This protocol uses Validation(Tn) as the timestamp of the transaction Tn, because this is the
actual phase of the transaction where all the checks happen; so it is safe to say that
TS(Tn) = Validation(Tn).
If there are two transactions T1 and T2 managed by the validation-based protocol, and
Finish(T1) < Start(T2), then the validation will be successful, as serializability is
maintained: T1 finished its execution well before transaction T2 started its read
phase.
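A minimal Python sketch of the three phases, using the simple Finish(Ti) < Start(T) validation condition described above; the class and method names are illustrative assumptions.

    # A sketch of validation-based (optimistic) concurrency control: a transaction
    # works on local copies and commits only if validation succeeds.

    class OptimisticTxn:
        def __init__(self, db, start_ts):
            self.db, self.start_ts = db, start_ts
            self.local = {}          # local copies (read phase)

        def read(self, item):
            if item not in self.local:
                self.local[item] = self.db[item]
            return self.local[item]

        def write(self, item, value):
            self.local[item] = value # changes stay local until the write phase

        def commit(self, finish_times):
            # Validation phase: the simple condition Finish(Ti) < Start(self)
            # must hold for every other transaction Ti.
            if all(fin < self.start_ts for fin in finish_times):
                self.db.update(self.local)    # write phase
                return "committed"
            return "restarted"                # validation failed

    db = {"X": 50}
    t = OptimisticTxn(db, start_ts=10)
    t.write("X", t.read("X") - 20)
    print(t.commit(finish_times=[5, 8]))   # committed
    print(db)                              # {'X': 30}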
Multiple Granularity in DBMS
Transactions in databases are atomic, which means that when a transaction needs to access values
in the database, a locking protocol is used to lock the relevant data. Locking prevents other ongoing
transactions from simultaneously modifying (or, with exclusive locks, even viewing) the locked data.
It is crucial to lock the database optimally while performing a transaction. Locking an entire
database unnecessarily costs a loss of concurrency. The concept of multiple granularity in DBMS
lets us lock at different levels of the database hierarchy, from the whole database down to a single
record.
Example
We can understand the concept of multiple granularity with a tree diagram. Consider the tree
structure below:
1. The first level, or the tree's root, represents our entire database, containing several tables and
records.
2. The next (second) level has nodes that represent areas; the database consists of these
areas.
3. The children of the second level are known as files. A file cannot be present within more
than one area.
4. The bottommost layer of the tree consists of the records. Records are child nodes of files,
and a record cannot be present in more than one file.

Database
  └── Area
        └── File
              └── Record
Now that we understand the levels in our tree diagram, we can see that each node in the tree can be
locked individually. As in the two-phase locking protocol, a lock on a node can be shared or
exclusive, and once a node is locked, its descendant nodes are implicitly locked in the same mode.
However, with only shared and exclusive locks, two problems arise:
1. If we want to apply a lock on a node, we need to ensure that no ancestor of the node is
locked. To ensure that none of the ancestors is locked, we need to traverse from the root to
the node.
2. To acquire a lock on the entire database, the complete tree needs to be traversed to ensure
that no descendant node is already locked. Traversing the entire tree to acquire a lock on the
entire database nullifies the benefit of multiple granularity. We will see a better approach to
solve these limitations in the upcoming sections of the article.
Shared locks: Shared locks are used while only reading data; the lock prevents other
transactions from updating the data but allows them to read it.
Exclusive locks: When data is being modified, exclusive locks prevent other transactions
from reading or modifying it.
Intention Locks in Multiple Granularity
For the problem mentioned in the above section, we introduce three additional locks besides the
shared and exclusive locks:
1. Intention Shared (IS) lock: If there is an intention-shared lock on a node, the lower-level
subtree has explicit locking only with shared locks. This means that a node can have an IS
lock only when nodes at a lower level are to be locked in shared mode.
2. Intention Exclusive (IX) lock: If there is an intention-exclusive lock on a node, nodes at a
lower level of the subtree may be explicitly locked in exclusive (or shared) mode.
3. Shared and Intention Exclusive (SIX) lock: With this type of lock, the subtree of the locked
node is explicitly locked in shared mode, while some nodes at a lower level may additionally
be locked in exclusive mode.
The compatibility matrix for the five lock modes that we now know is:

        IS     IX     S      SIX    X
IS      yes    yes    yes    yes    no
IX      yes    yes    no     no     no
S       yes    no     yes    no     no
SIX     yes    no     no     no     no
X       no     no     no     no     no
To ensure serializability, we use the intention lock modes in the multiple granularity protocol. If a
transaction T wants to lock a node, it must follow these rules:
Transaction T must respect the compatibility matrix above.
If T holds the parent node in IX or IS mode, then T may lock the node only in IS or S mode.
If T holds the parent node in SIX or IX mode, then T may lock the node in X, SIX, or IX
mode.
Transaction T can lock a node only if it has not previously unlocked any node (the two-phase
condition).
Transaction T can unlock a node only if none of that node's children are currently locked by T.
The locks in multiple granularity are acquired from the top down (top-down order), but the locks
are released from the bottom up (bottom-up order).
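The compatibility matrix and the grant rule can be sketched in Python; the COMPAT dictionary mirrors the matrix above, and the function name is an illustrative assumption.

    # A sketch of the standard five-lock compatibility matrix (IS, IX, S, SIX, X)
    # used by the multiple-granularity protocol.

    COMPAT = {
        "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
        "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
        "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
        "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
        "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
    }

    def can_grant(requested, held_modes):
        """A lock is granted only if it is compatible with every lock held."""
        return all(COMPAT[requested][held] for held in held_modes)

    # T2 holds IX on file Fa (it wants to modify record Ra8):
    print(can_grant("IS", ["IX"]))   # True  -> T1's read of Ra2 can proceed
    print(can_grant("S",  ["IX"]))   # False -> T3's scan of file Fa must wait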
Let us see a simulation of the protocol with an example on the tree mentioned in the article:
Suppose transaction T1 reads the record Ra2 in file Fa. Then T1 will lock the database, area A1,
and file Fa in IS mode (in that order), and record Ra2 in shared (S) mode.
Say transaction T2 modifies the record Ra8 in the same file Fa. For this operation, the
database, area A1, and file Fa will be locked in IX mode, and the target record Ra8 will be
locked in exclusive (X) mode.
Another transaction, T3, tries to read all the records in file Fa. Then T3 will lock the nodes in
its path, i.e., the database and area A1, in IS mode, and then file Fa will be locked in shared (S)
mode.
Lastly, say transaction T4 reads the entire database. Then transaction T4 will lock the whole
database in shared (S) mode.
Here, transactions T1, T3, and T4 can access the database concurrently with each other.
Transaction T2 can execute concurrently with T1, but not with T3 or T4.
This way, multiple granularity improves concurrency and reduces the overhead of locking, but
deadlocks are still possible in the multiple granularity protocol, just as in the two-phase locking
protocol.
Conclusion
A locking protocol locks the relevant portion of the database whenever a transaction wants to
access it.
Multiple granularity hierarchically breaks up the database into smaller blocks that can be
locked individually.
Besides shared and exclusive locks, three additional locks are introduced in multiple
granularity: intention shared (IS), intention exclusive (IX), and shared with intention
exclusive (SIX).