
UNIT-4

Transaction Processing Concepts

Overview
Transactions are a set of operations used to perform a logical unit of work. A transaction is made to change data in a database, which can be done by inserting new data, updating existing data, or deleting data that is no longer required. There are certain transaction states which tell the user about the current condition of a database transaction and what further steps are to be followed for its processing.

Introduction

You might have encountered a situation where your system crashed due to some hardware or software issue and rebooted to ensure that all the data is restored in a consistent state. This protection of the user's data even in case of a system failure is one of the major advantages of a database management system. Various transactions are done as part of manipulating the data in a database; these transactions can be seen as a set of operations executed by the user program in the DBMS. Executing a similar transaction multiple times will lead to the generation of multiple transactions. For example, withdrawing some amount of money from an ATM can be seen as a transaction that can be done multiple times.


Operations in Transaction

A certain set of operations takes place when a transaction is carried out to perform some logical unit of work. For example, when we go to withdraw money from an ATM, we encounter the following set of operations:

1. Transaction Initiated

2. You have to insert an ATM card

3. Select your choice of language

4. Select whether savings or current account

5. Enter the amount to withdraw

6. Entering your ATM pin

7. Transaction processes

8. You collect the cash

9. You press finish to end transaction

The above mentioned are the set of operations done by you. But in the case of a transaction in DBMS, there are three major operations that are used for a transaction to get executed in an efficient manner. These are:

1. Read/Access Data
2. Write/Change Data
3. Commit

Let's understand the above three sets of operations in a transaction with a real-life example of

transferring money from Account1 to Account2.

Initial balance in both accounts before the start of the transaction:

Account1 - ₹ 5000, Account2 - ₹ 2000

This data before the start of the transaction is stored in secondary memory (hard disk), and once the transaction is initiated it is brought into the primary memory (RAM) of the system for faster and better access. Now for a transfer of ₹ 500 from Account1 to Account2 to occur, the following set of operations will take place:

Read(Account1) --> 5000
Account1 = Account1 - 500
Write(Account1) --> 4500
Read(Account2) --> 2000
Account2 = Account2 + 500
Write(Account2) --> 2500
Commit

The COMMIT statement permanently saves the changes made by the current transaction.
When a transaction is successful, COMMIT is applied. If the system fails before a COMMIT is
applied, the transaction reaches its previous state after ROLLBACK.
After the commit operation, the transaction ends with updated values of Account1 = ₹ 4500 and Account2 = ₹ 2500. Every operation that occurs before the commit is said to be in a partially committed state and is stored in primary memory (RAM). After the transaction is committed, the updated data is accepted and written to secondary memory (hard disk). If the transaction fails anywhere before committing, it gets aborted and has to start from the beginning, as it cannot be continued from the point of failure. This is known as rollback.
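The read-compute-write-commit sequence above can also be sketched in a few lines of code. The Python snippet below is only an illustration, not a real DBMS: a dictionary named disk stands in for secondary storage, and the copy made inside transfer() stands in for the working copy in RAM.

```python
# Minimal sketch of read -> compute -> write -> commit (illustration only).
disk = {"Account1": 5000, "Account2": 2000}   # secondary memory (hard disk)

def transfer(amount):
    ram = dict(disk)                # Read: bring the values into primary memory
    ram["Account1"] -= amount       # Write (in RAM only): debit Account1
    ram["Account2"] += amount       # Write (in RAM only): credit Account2
    disk.update(ram)                # Commit: make the changes permanent

transfer(500)
print(disk)                         # {'Account1': 4500, 'Account2': 2500}
```

If a failure occurs before disk.update(ram) runs, the changes held in ram are simply discarded and the disk still shows the old balances, which is exactly the rollback behaviour described above.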
Transaction States in DBMS

During the lifetime of a transaction, there are many states to go through. These states inform the system about the current state of the transaction and tell the user how to plan further processing of the transaction. These states also determine the rules which decide the fate of a transaction, whether it will commit or abort.

The ROLLBACK statement undoes the changes made by the current transaction. A transaction cannot undo changes after COMMIT has been executed.

Following are the different types of transaction States :

 Active State: When the operations of a transaction are running, the transaction is said to be in the active state. If all the read and write operations are performed without any error, it progresses to the partially committed state; if any operation fails, it goes to a state known as the failed state.

 Partially Committed: After all the read and write operations are completed, the transaction enters the partially committed state, where the changes so far are still held in main memory. If these changes are successfully made permanent in the database, the transaction progresses to the committed state; in case of a failure it goes to the failed state.
 Failed State: If any operation during the transaction fails due to some software or hardware issue, the transaction goes to the failed state. A failure during a transaction does not make any permanent change to the data in the database; the changes made in local memory are rolled back to the previous consistent state.

 Aborted State: If the transaction fails during its execution, it goes from the failed state to the aborted state, and because in the previous states all the changes were made only in main memory, these uncommitted changes are either deleted or rolled back. The transaction at this point can restart afresh from the active state.

 Committed State: If the transaction completes all of its operations successfully, all the changes made during the partially committed state are permanently stored, the transaction is said to be completed, and it can progress to finally get terminated in the terminated state.

 Terminated State: If the transaction gets aborted after rollback, or the transaction arrives from the committed state, the database reaches a consistent state and is ready for further new transactions, since the previous transaction is now terminated. (A small code sketch of these state transitions follows this list.)
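The state diagram described by this list can be captured as a small transition table. The sketch below is a hypothetical illustration; the state names and the move() helper are our own, not part of any DBMS API.

```python
# Hypothetical sketch of the transaction state diagram described above.
ALLOWED = {
    "ACTIVE":              {"PARTIALLY_COMMITTED", "FAILED"},
    "PARTIALLY_COMMITTED": {"COMMITTED", "FAILED"},
    "FAILED":              {"ABORTED"},
    "ABORTED":             {"TERMINATED", "ACTIVE"},  # the transaction may restart
    "COMMITTED":           {"TERMINATED"},
    "TERMINATED":          set(),
}

def move(state, new_state):
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

state = move("ACTIVE", "PARTIALLY_COMMITTED")
state = move(state, "COMMITTED")
print(move(state, "TERMINATED"))  # TERMINATED
```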

Properties of Transaction in DBMS


There are four major properties that are vital for a transaction to be successful. These are used to

maintain state consistency in the database, both before and after the transaction. These are called

ACID properties.
1. Atomicity: This property means that either the transaction takes place completely at once or doesn't happen at all; there is no middle option, i.e., transactions do not occur partially. Each transaction is treated as a single step which either runs completely or is not executed at all.

2. Consistency: This property means that the integrity constraints of a database are maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database.

3. Isolation: This property means that multiple transactions can occur concurrently without causing any inconsistency in the database state. These transactions occur independently without external interference. Changes that occur in a particular transaction are not visible or accessible to any other transaction until that change has been committed.

4. Durability: This property ensures that once the transaction has completed execution, the updates and modifications to the database are written to disk and remain intact even if a system failure occurs. These updates become permanent and are stored in non-volatile memory.
Problems faced when concurrent transactions execute, with examples

Concurrency control is the process of managing the simultaneous execution of transactions in a shared database. When concurrent transactions are executed in an uncontrolled manner, several problems arise.

The five leading problems that occur due to uncontrolled execution of concurrent transactions are:
1. Lost Update Problem
2. Dirty Read Problem
3. Unrepeated Read
4. Incorrect summary
5. Phantom Read
1. Lost Update Problem

A lost update problem appears when two or more transactions access the same data item and perform their operations on it. The data is first read by the system, an operation is performed on it, and then a write operation permanently saves the change in the actual database. When two or more transactions read the same data item and perform their own operations, the update written by one transaction can be overwritten during the write operation of another transaction. The new value from the first transaction is then lost; this situation is called the lost update problem.
Lost Update Problem example

T1                   T2
Read(X)
                     Read(X)
Update X=X+1000
                     Update X=X-500
Write(X)
                     Write(X)
Let there be two transactions T1 and T2. Transaction T1 first reads the data item X from the database, and then transaction T2 reads the same data item. After that, T1 performs its operation to add 1000 to X, and T2 performs its operation to subtract 500 from the value of X it read. Then T1 performs its write operation, which stores the value of X according to the changes made by T1, and then T2 runs its write operation, which again updates the value of X in the database. This causes the loss of the change to X performed by T1, as T2 overwrites X after it was updated by T1. We can also say that the update made by T1 is lost.
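The interleaving in the table can be replayed deterministically in code. In the sketch below, local variables stand in for each transaction's private copy of X; it is an illustration only.

```python
# Deterministic replay of the lost-update interleaving shown above.
X = 1000                # shared data item in the database

x_t1 = X                # T1: Read(X)
x_t2 = X                # T2: Read(X)
x_t1 = x_t1 + 1000      # T1: Update X = X + 1000
x_t2 = x_t2 - 500       # T2: Update X = X - 500
X = x_t1                # T1: Write(X) -> 2000
X = x_t2                # T2: Write(X) -> 500, overwriting T1's write

print(X)                # 500, not the expected 1500: T1's update is lost
```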
2. Dirty read problem

This is another problem that occurs due to concurrent execution in an uncontrolled manner. When two or more transactions run, one transaction reads a data value from the database and updates it, but has not yet completed. Another transaction then reads that updated value and completes its execution with a 'commit' statement. Afterwards, the first transaction performs another operation (for example, reading some other data) and fails. According to the atomicity property, the failed transaction is rolled back and restarted from the beginning, i.e., from the old value of that data. But the other transaction, which has already completed after reading the uncommitted updated value, has read data that is now considered wrong; such a read is called a dirty read. This whole scenario is called the dirty read problem.
Dirty read problem example

T1                   T2
Read(Y)
Update Y=Y+1000
Write(Y)
                     Read(Y)
                     Update Y=Y+500
                     Write(Y)
                     Commit
Read(Z) // Failed
Write(Z)
Let transaction T1 be executing: it reads a data item Y from the database, updates it by adding 1000, and then executes the write statement (but the transaction is not yet finished). Then transaction T2 comes into action; it reads the updated value of Y, adds 500 to it, executes the 'write' statement, and completes its execution with the commit command. Transaction T1 then performs a read operation on some data item Z and fails. This failure aborts transaction T1, and transaction T2, which has already completed after reading the updated value of Y that was part of an aborted transaction, is no longer a stable database transaction. Hence the data returned or written by transaction T2 is considered to be wrong.
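The dirty-read interleaving can be replayed the same way. The sketch below is an illustration only; the variable names are our own.

```python
# Deterministic replay of the dirty-read interleaving shown above.
Y_committed = 1000            # last committed value of Y in the database

y_t1 = Y_committed + 1000     # T1: Update Y = Y + 1000, Write(Y) -- uncommitted
y_t2 = y_t1 + 500             # T2: Read(Y) sees T1's uncommitted 2000 (dirty read)
Y_committed = y_t2            # T2: Write(Y) and commit -> 2500

# T1 now fails on Read(Z) and rolls back to Y = 1000, but T2 has already
# committed 2500, a value derived from an update that officially never happened.
print(Y_committed)            # 2500
```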
3. Unrepeated Read problem

An unrepeated read appears when two or more transactions are executing on the same data. One transaction starts executing and performs a read operation on a data item from the database, and another transaction reads the same data item. The first transaction then updates the value of that data item and performs the write operation, after which the other transaction reads the updated value of the data item again. This time the data has changed, since the first transaction performed an operation on that value, but the other transaction is unaware of it: for that transaction, both read values should be the same, yet the data is found to be different when the read operation is performed the second time. This causes inconsistency of data.
Let’s understand this problem with the help of an example
T1                   T2
Read(X)
                     Read(X)
Update X=X+100
Write(X)
                     Read(X)
                     Read(Y)
                     Write(Y)
Let there be two transactions T1 and T2, where T1 starts executing and performs a read operation on data item X from the database. Then T2 also reads the same value of X. T1 then adds 100 to X and performs a write operation on X. When T2 reads the value of X again, the value has changed to X+100, which makes the transactions inconsistent, as for T2 the value of X should be the same both times the read operation is performed on it.
4. Incorrect Summary problem

The incorrect summary problem occurs when two or more transactions execute their operations concurrently: one transaction calculates an aggregate function over data from the database while another updates the same data in the database. This leads to inconsistency, because the aggregate may mix values from before and after the other transaction's update.
Let us understand the situation with the help of an example
T1                   T2
Read(X)
Update X=X+1000
Write(X)
                     Read(X)
                     Update X=X-500
                     Write(X)
Let there be two transactions T1 and T2. Transaction T1 reads X from the database, performs the addition operation X+1000, and then performs the write operation on X. Now transaction T2 reads the value of X, which has been updated to X+1000, and subtracts 500 from it. However, T2 may instead calculate with values before they have been updated by transaction T1, producing an incorrect summary.
5. Phantom Read problem

The phantom read problem occurs when a transaction reads data from the database, another transaction reads the same data, and then the data is deleted as an operation of the first transaction. Now, when the other transaction tries to read that data again, it does not find the data to read. It is a special case of the unrepeated read problem.
Let’s understand more with the help of an example
T1                   T2
Read(X)
                     Read(X)
Delete(X)
                     Read(X)
Let transaction T1 read the data item X from the database, and then let transaction T2 read the same data item X. Now transaction T1 deletes X from the database, and then T2 tries to read X again, which has already been deleted by T1, so the transaction will not be able to read X.
Serializability in DBMS

Overview

Serializability is related to schedules and transactions. A schedule is a set of transactions, and a transaction is a set of instructions used to perform logical operations on the database.

What is Serializability?

Serializability of schedules ensures that a non-serial schedule is equivalent to some serial schedule. It allows transactions to execute concurrently, with their operations interleaved, while still behaving as if they ran one after another. In simple words, serializability is a way to check whether the execution of two or more transactions maintains the consistency of the database.

Schedules and Serializable Schedules in DBMS

A schedule in DBMS is a series of operations from one transaction to the other. R(X) means reading the value X, and W(X) means writing the value X.
Schedules in DBMS are of two types:

1. Serial Schedule - A schedule in which only one transaction is executed at a time, i.e., one

transaction is executed completely before starting another transaction.

Example:

Transaction-1        Transaction-2
R(a)
W(a)
R(b)
W(b)
                     R(b)
                     W(b)
                     R(a)
                     W(a)
Here, we can see that Transaction-2 starts its execution after the completion of Transaction-1.
Serial schedules are always serializable because the transactions only work one after the other. Also, for n transactions, there are n! possible serial schedules (where n is the number of transactions).

2. Non-serial Schedule - A schedule in which the transactions are interleaving or

interchanging. There are several transactions executing simultaneously as they are being

used in performing real-world database operations. These transactions may be working on

the same piece of data. Hence, the serializability of non-serial schedules is a major concern

so that our database is consistent before and after the execution of the transactions.

Example:

Transaction-1        Transaction-2
R(a)
W(a)
                     R(b)
                     W(b)
R(b)
                     R(a)
W(b)
                     W(a)
We can see that Transaction-2 starts its execution before the completion of Transaction-1, and they

are interchangeably working on the same data, i.e., "a" and "b".

What is a serializable schedule?

A non-serial schedule is called a serializable schedule if it can be converted into an equivalent serial schedule. In simple words, if a non-serial schedule and a serial schedule produce the same result, then the non-serial schedule is called a serializable schedule.


Testing of Serializability

To test the serializability of a schedule, we can use a Serialization Graph or Precedence Graph. A serialization graph is nothing but a directed graph over all the transactions of a schedule. It can be defined as a graph G(V, E) consisting of a set of directed edges E = {E1, E2, E3, ..., En} and a set of vertices V = {V1, V2, V3, ..., Vn}, where the vertices are the transactions. An edge is drawn for each conflicting pair of READ/WRITE operations performed by different transactions: Ti -> Tj means transaction Ti performs a conflicting read or write before transaction Tj.

NOTE:

If there is a cycle present in the serialization graph, then the schedule is non-serializable, because a cycle indicates that one transaction is dependent on the other transaction and vice versa. It also means that there are one or more conflicting pairs of operations in the transactions. On the other hand, if there is no cycle, the non-serial schedule is serializable.


What is a conflicting pair in transactions?

Two operations inside a schedule are called conflicting if they meet these three conditions:

1. They belong to two different transactions.

2. They are working on the same data piece.

3. One of them is performing the WRITE operation.

To conclude, let’s take two operations on data: "a". The conflicting pairs are:

1. READ(a) - WRITE(a)

2. WRITE(a) - WRITE(a)

3. WRITE(a) - READ(a)

Note:

There can never be a read-read conflict as there is no change in the data.

Let's take an example of schedule "S" having three transactions t1, t2, and t3 working

simultaneously, to get a better understanding.

t1          t2          t3
R(x)
                        R(z)
                        W(z)
R(y)
            R(y)
            W(y)
                        W(x)
W(z)
W(x)
This is a non-serializable schedule. R(x) of T1 conflicts with W(x) of T3, so there is a directed edge from T1 to T3. R(y) of T1 conflicts with W(y) of T2, so there is a directed edge from T1 to T2. W(x) of T3 conflicts with W(x) of T1, so there is a directed edge from T3 to T1. Similarly, we make edges for every conflicting pair. Since a cycle is formed, the schedule is not serializable.
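The precedence-graph test can be sketched in code. The snippet below builds the conflict edges of a schedule and checks for a cycle with a depth-first search. The encoding of a schedule as (transaction, operation, item) triples, and the column assignment used for S, are our own assumptions.

```python
# Sketch: build a precedence graph from a schedule and test it for a cycle.

def precedence_edges(schedule):
    """Directed edges Ti -> Tj for every conflicting pair with Ti first."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if item_i == item_j and ti != tj and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def visit(node, stack):                 # DFS carrying the recursion stack
        if node in stack:
            return True
        stack.add(node)
        found = any(visit(n, stack) for n in graph.get(node, ()))
        stack.discard(node)
        return found
    return any(visit(n, set()) for n in graph)

# The schedule S from the example above, with the reconstructed columns.
S = [("T1", "R", "x"), ("T3", "R", "z"), ("T3", "W", "z"),
     ("T1", "R", "y"), ("T2", "R", "y"), ("T2", "W", "y"),
     ("T3", "W", "x"), ("T1", "W", "z"), ("T1", "W", "x")]

print(has_cycle(precedence_edges(S)))  # True -> not conflict serializable
```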

Types of Serializability

Serializability of any non-serial schedule can be verified using two types mainly: Conflict

Serializability and View Serializability.

One more way to check serializability is by forming an equivalent serial schedule that produces the same result as the original non-serial schedule. Since this process only focuses on the output rather than the operations taking place in between the switching of transactions, it is not practically used. Now let's discuss Conflict and View Serializability in detail.


Conflict Serializability and Conflict Serializable Schedules

A non-serial schedule is conflict serializable if performing some swaps on its non-conflicting operations results in a serial schedule. It is checked using the non-serial schedule and an equivalent serial schedule. This process of checking is called Conflict Serializability in DBMS.

It is tedious to use if we have many operations and transactions as it requires a lot of swapping.

For checking, we use the same Precedence Graph technique discussed above. First, we identify the conflicting pairs of operations (read-write, write-read, and write-write) and then form directed edges between the transactions of those conflicting pairs. If we can find a cycle in the graph, the schedule is not conflict serializable; otherwise, it is surely a conflict serializable schedule.

Example:

We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously. Let's form its precedence graph.

t1          t2          t3
R(x)
                        R(y)
            R(y)
            W(y)
                        W(x)
            W(x)
            R(x)
            W(x)
As there is no cycle in the graph (the graph is a DAG), it is a conflict serializable schedule. Since it is conflict serializable, we can also determine the order of an equivalent serial schedule.

The order of the transactions: t1 will execute first, as there is no incoming edge on t1. t3 will execute second, as it depends only on t1. t2 will execute last, as it depends on both t1 and t3.

So, the order of its equivalent serial schedule is: t1 --> t3 --> t2

NOTE:

If a schedule is conflict serializable, then it is surely a consistent schedule. On the other hand, a schedule that is not conflict serializable may or may not be serializable. To further check its behavior, we use the concept of View Serializability.

View Serializability and View Serializable Schedules

If a non-serial schedule is view equivalent to some serial schedule, then the schedule is called a View Serializable Schedule. It is needed to ensure the consistency of a schedule.


What is view equivalency?

The three conditions needed for schedules (S1 and S2) to be view equivalent are:

1. Initial read must be on the same piece of data.

Example: If transaction t1 is reading "A" from database in schedule S1, then in schedule

S2, t1 must read A.

2. Final write must be on the same piece of data.

Example: If a transaction t1 updated A at last in S1, then in S2, t1 should perform final

write as well.

3. The intermediate sequence of reads and writes should also be in the same order.

Example: If t1 is reading A which is updated by t2 in S1, then in S2, t1 should read A which has been updated by t2.

This process of checking view equivalency of a schedule is called View Serializability.

Example: We have a schedule "S" having two transactions t1, and t2 working simultaneously.

S:

t1          t2
R(x)
W(x)
            R(x)
            W(x)
R(y)
W(y)
            R(y)
            W(y)
Let's form its view equivalent schedule (S') by interchanging mid-read-write operations of both the

transactions. S':

t1          t2
R(x)
W(x)
R(y)
W(y)
            R(x)
            W(x)
            R(y)
            W(y)
Since a view equivalent schedule is possible, it is a view serializable schedule.

NOTE:

A conflict serializable schedule is always view serializable, but the converse is not always true.

Actual process for checking view serializability

1. First, check for conflict serializability. If the schedule is conflict serializable, it is also view serializable.

2. If it is not conflict serializable, check for a blind write. If there is a blind write, then the schedule may be view serializable, so check its view serializability using the view equivalent schedule technique (stated above). If there is no blind write, then the schedule can never be view serializable.

Blind write means writing a value or piece of data without reading it.

Example:

We have a schedule "S" having three transactions t1, t2, and t3 working simultaneously.

S:
t1          t2          t3
R(x)
            W(x)
W(x)
                        W(x)
Its precedence graph contains a cycle (t1 -> t2 from the read-write conflict on x, and t2 -> t1 from the write-write conflict on x). Since there is a cycle present, the schedule is not conflict serializable. However, there are blind writes [t2 -> W(x) and t3 -> W(x)] present, hence we check for view serializability.

One of its view equivalent schedules can be:

S':

t1          t2          t3
R(x)
W(x)
            W(x)
                        W(x)
Hence, the schedule is view serializable.
NOTE:

One important thing to note here is that we can form a number of orderings of the transactions, such as:

t1 t2 t3
t1 t3 t2
t2 t1 t3
t2 t3 t1
t3 t1 t2
t3 t2 t1

Checking all such orderings makes view serializability an NP-hard problem.

Benefits of Serialization

Serialization helps in checking concurrency control between multiple transactions. It also helps in maintaining consistency in the database before and after any transaction. Serializable schedules are resource-efficient and help in improving CPU throughput (total work done in a certain time).

Conclusion

 Serialization is a way to check if the execution of two or more transactions is maintaining

the database consistency or not.


 Serialization helps to maintain isolation between multiple transactions under a schedule.

Recoverability in DBMS

Overview

In a database management system, multiple transactions execute in parallel. Some of these transactions are dependent on each other, while some are independent. If one transaction fails among the dependent transactions, can we minimize its impact on the other transactions? Recoverability in DBMS is the area that deals with such issues. In this article, we will discuss recoverable and irrecoverable schedules, followed by cascade rollback and cascadeless recoverable schedules.

Recoverable Schedule

Suppose Aarnav has 10 lakhs in his account and his friend Jaanvi who has 1 lakh in her account,
needs 2 lakhs more to buy a new gaming laptop. The laptop is a limited edition, and once she places
the order, she cannot cancel it. The money can be paid only within 30 minutes of placing the order.
She texted Aarnav to transfer her 2 lakhs and placed an order for the 3 lakh gaming laptop.

Transaction in their bank account


If Aarnav doesn't transfer the money to Jaanvi, Jaanvi may get into trouble, since she can't cancel the order and she does not have the money. Hence, a wise decision would have been to place the order only when she receives the 2 lakhs from Aarnav.

Therefore, if Jaanvi places the order only after she receives the money from Aarnav, it can be considered a recoverable schedule, since she retains the option of whether to place the order or not.
A schedule is a recoverable schedule if each transaction present in the schedule
commits only after all the transactions from which it has read the values are
executed/committed entirely. If T1 and T2 are two transactions and T2 reads data
from T1, then T2 should commit only after T1 commits. Mathematically,
(T1→T2 Reads)⟹(T1 Commits→T2 Commits)

Irrecoverable Schedule

A schedule is said to be irrecoverable if the transaction commits before the transaction from which

it has read the data commits. Mathematically,

(T1→T2 Reads)⟹(T2 Commits→T1 Commits)


If Jaanvi puts the order before the money arrives, then it can be considered as an irrecoverable

schedule since she can't cancel the order now.

Let us see one more example for better understanding. Consider two transactions T1 and T2 in a schedule. T1 initially reads and writes the data but is not committed. T2 starts, reads the data updated by T1, and commits. Now, if T1 fails, both T1 and T2 need to roll back and start again. But this is impossible: T1 can roll back as it was not committed, but T2 can't, since it is already committed. This makes the schedule irrecoverable.
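This commit-ordering rule can be checked mechanically. The sketch below scans a schedule and reports whether any transaction commits before a transaction whose uncommitted value it has read; the event encoding is an assumption made for illustration.

```python
# Sketch: test whether a schedule is recoverable.
# Events: ("W", T, item), ("R", T, item), ("C", T) for commit.

def is_recoverable(schedule):
    last_writer = {}      # item -> transaction that last wrote it
    reads_from = {}       # T -> transactions whose uncommitted writes T read
    committed = set()
    for event in schedule:
        if event[0] == "W":
            _, t, item = event
            last_writer[item] = t
        elif event[0] == "R":
            _, t, item = event
            src = last_writer.get(item)
            if src and src != t and src not in committed:
                reads_from.setdefault(t, set()).add(src)
        else:             # ("C", T): T tries to commit
            _, t = event
            if any(src not in committed for src in reads_from.get(t, ())):
                return False   # T commits before a transaction it read from
            committed.add(t)
    return True

# T2 reads T1's uncommitted write and commits first -> irrecoverable.
bad = [("W", "T1", "x"), ("R", "T2", "x"), ("C", "T2"), ("C", "T1")]
print(is_recoverable(bad))    # False
```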
Cascade Recoverable Rollback Schedule
Let us first understand what "cascading" means with an example. Cascade generally means that successive events depend on the occurrence of previous events. For example, water falls onto the nth stair after flowing over the (n-1)th stair, which it reaches after traversing the (n-2)th stair. Hence all the stairs form a cascade. The water would not fall through the successive stairs of the cascade if it didn't flow through one of the previous stairs.

If the transactions form a cascade without any commits, as shown below, then it is called a cascade rollback recoverable schedule. In this type of schedule, if a previous transaction fails, then the following transactions need to be aborted and started again, since otherwise there would be inconsistency in the data.
Cascadeless Recoverable Rollback Schedule
Abhijeet, Bhaswat, and Clara are sitting in the exam hall one behind another. Clara is copying from Bhaswat, and Bhaswat is copying from Abhijeet. In what order should they submit their answer sheets to get equal marks?

Abhijeet should submit the paper first, followed by Bhaswat and then Clara. Otherwise, say Abhijeet found a mistake in his answer after Bhaswat and Clara had submitted their answer sheets. He could then update the answer and submit, and Abhijeet would get more marks, since Bhaswat and Clara can't update their solutions once they have submitted.

Hence, in a cascadeless recoverable schedule, a transaction should only read data from another transaction in the schedule once that previous transaction has committed.


The above diagram depicts a cascadeless recoverable schedule. There are 4 transactions: T1, T2, T3 and T4. Initially, T1 reads and writes the data and commits. Then T2 reads the data and commits, followed by T3 and T4. If any one of the transactions fails in between, it will roll back to its initial state without affecting the other transactions.
Cascadeless schedules are more preferred than cascading schedules as the loss of CPU

cycles is minimized when a rollback is required.

Conclusion

 Recoverability in DBMS is like backup and restore used to satisfy the durability of

transaction schedules.

 If each transaction present in the schedule commits only after all the transactions from

which it has read the values are committed, then the schedule is said to be recoverable

schedule.

 If a schedule is not recoverable then it is irrecoverable.

 Cascadeless schedule is a recoverable schedule in which the following transactions can read

the data of the previous transactions only when the previous transactions are committed.

 If a schedule is not cascadeless, then it is a cascade rollback schedule.


RECOVERY FROM TRANSACTION FAILURES

1. Log based recovery

Overview
Log-based recovery in DBMS provides the ability to maintain or recover data in case of system failure. The DBMS keeps a record of every transaction on some stable storage device to provide easy access to data when the system fails. A log record is created for every operation performed on the database. The log record should be written to stable storage before the corresponding change is applied to the database. To understand log-based recovery in DBMS, let us look at an example.

To recover from a system failure, Apache HBase, a distributed data store based on HDFS, employs log-based recovery. A write-ahead log (WAL) record is written every time a database write occurs, and it is replicated on HDFS. After recording the transaction in the WAL, the write is applied to the memstore, which lives on the region server in HBase. The memstore fills up as writes accumulate; as soon as the memstore is full, an HFile is created and flushed to disk.

During a shutdown or restart of the system holding the memstore, all the data in the memstore cache that could not yet be flushed to disk as HFiles is lost. On restart, the WAL is replayed to reconstruct the data and repopulate the memstore, so that when it is full it can again flush the data to permanent storage.
Introduction to Log-Based Recovery in DBMS

A log is a series of records. The logs for each transaction are kept in a log file to allow recovery in

case of failure. A log is kept for each operation performed on the database. It is important to store

the log before the actual transactions are applied to the database. Take the example of modifying a

student's City. This transaction produces the following logs.

 A start log record is produced when the transaction begins: <Tn, Start>

 A new log record is written to the file when the City is changed from Chennai to NCR: <Tn, City, 'Chennai', 'NCR'>

 Once the transaction has been completed, another log record is written to indicate the transaction's completion: <Tn, Commit>

How to Perform Log-Based Recovery in DBMS?

The following terms are used for describing log-based recovery.

Name Description
Transaction identifier It is an identifier that distinguishes one transaction from another.
Data item identifier It uniquely identifies data.
Old value This is the value of the data item before writing.
New value This is the value of an item after it has been written.
The different types of log records are denoted by the following names:

Command Description
<Ti start> The transaction Ti has begun.
<Ti, Xi, V1, V2> The transaction Ti performed a write operation on item Xi; Xi contained value V1 before the write and contains value V2 after it completed.
<Ti commit> The transaction Ti has committed.
<Ti abort> The transaction Ti has aborted.
A transaction that performs a write operation modifies the database only after a log record has been created for that write operation. The logs may or may not record reads of data items, since reading a data item has no effect on the consistency of the database and is not useful to the recovery mechanism.

During the recovery process, we perform two operations:

1. Undo(Ti): Restores the old values of all items updated by transaction Ti. Undo is applied when the log contains the record <Ti start> but not <Ti commit>.

2. Redo(Ti): Sets all data items updated by transaction Ti to their new values. Redo is applied only when the log contains both the record <Ti start> and the record <Ti commit>.
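Putting these pieces together, a minimal sketch of recovery with such log records might look as follows. The encoding of records as Python tuples is an assumption made for illustration; real systems use binary log formats.

```python
# Sketch of the undo/redo pass over a log.
# Records: ("start", T), ("write", T, X, old, new), ("commit", T).

def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo(Ti): forward pass re-applies the new values of committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, t, x, old, new = rec
            db[x] = new

    # Undo(Ti): backward pass restores the old values of uncommitted transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, t, x, old, new = rec
            db[x] = old
    return db

log = [("start", "T1"), ("write", "T1", "City", "Chennai", "NCR"),
       ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "City", "NCR", "Pune")]
print(recover(log, {"City": "Pune"}))  # {'City': 'NCR'}: T1 redone, T2 undone
```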
Checkpoint in DBMS

Overview
The DBMS checkpoint is a procedure of compressing the transaction log file by transferring the old transactions to permanent storage. It helps in system recovery when a failure occurs. The procedure of system recovery includes reading the log file in reverse order and maintaining redo and undo lists.

Introduction

In a real-time environment, the transaction log file quickly occupies an ample amount of storage space. A transaction log file contains a record of the operations performed by the transactions in the database; it ensures consistency on crashes or hardware failures. Tracing every single update and maintaining log files also contributes to filling up the system's memory. The DBMS checkpoint comes into the picture when the size of the transaction log file becomes too large to manage and handle easily. The DBMS checkpoint is a mechanism of compressing the transaction log file by transferring the old transactions to permanent storage. The checkpoint marks the position up to which the consistency of the transactions is maintained. During the execution of the transactions, the cursor passes through the marked checkpoint; at that point, all the transactions are saved into the database and erased from the log file. The log file then starts getting filled up by a new list of operations until the next checkpoint.

Recovery using Checkpoint

Below is a schematic representation of system recovery when a failure occurs during the execution of concurrent transactions.

Transactions and their operations in the above diagram:

T1        T2        T3        T4
START
--- checkpoint ---
          START
COMMIT
                    START
          COMMIT
                    COMMIT
                              START
--- system failure ---
Following are the steps to be performed for system recovery using a checkpoint:

 The transaction log file is read in reverse order, i.e., from T4 to T1.
 Redo and undo are the lists that are created and maintained by the system.

 If a transaction contains the operations <Tn, start> and <Tn, commit> together, or <Tn, commit> alone, then the transaction is placed in the redo list. In the above diagram, transaction T1 contains only <Tn, commit>, while transactions T2 and T3 contain both <Tn, start> and <Tn, commit>; therefore transactions T1, T2 and T3 are placed in the redo list.

 If a transaction contains <Tn, start> but not <Tn, commit>, then the transaction is placed in the undo list. In the above diagram, transaction T4 contains <Tn, start> but not <Tn, commit>, and therefore transaction T4 is placed in the undo list.

Table of transactions operations and the lists they are placed in

Operations List
<Tn,start> and <Tn,commit> Redo list
<Tn,commit> Redo list
<Tn,start> Undo list
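The table above translates directly into code. The sketch below scans a log in reverse and builds the redo and undo lists; the tuple encoding of log records is assumed for illustration, and for simplicity the scan covers the whole log rather than stopping at the checkpoint.

```python
# Sketch: build the redo and undo lists by reading the log in reverse.
# Records: ("start", T) and ("commit", T).

def redo_undo_lists(log):
    redo, undo = [], []
    committed = set()
    for kind, t in reversed(log):        # read the log file in reverse order
        if kind == "commit":
            committed.add(t)
        elif kind == "start":
            (redo if t in committed else undo).append(t)
    return redo, undo

log = [("start", "T1"), ("start", "T2"), ("commit", "T1"),
       ("start", "T3"), ("commit", "T2"), ("commit", "T3"),
       ("start", "T4")]                  # the system fails here
print(redo_undo_lists(log))              # (['T3', 'T2', 'T1'], ['T4'])
```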
Advantages of Checkpoint

 A checkpoint enhances the consistency of the database when multiple transactions execute concurrently.

 Checkpoints help in getting our transactions back in the case of an accidental shutdown of the database.

 Checkpoints harden the dirty pages. Hardening dirty pages refers to writing all the dirty pages from log files or the buffer to physical disk.

 A checkpoint serves as the synchronization point between the database and the transaction log file.

 Checkpoints accelerate the data recovery process.

Concurrency Control Techniques

Overview
When several transactions execute concurrently without any rules or protocols, various problems arise that may harm the data integrity of the database. These problems are known as concurrency control problems. Therefore, several rules are designed to maintain consistency in the transactions while they are executing concurrently; these are known as concurrency control protocols.

What is concurrency control in DBMS?

A transaction is a single logical unit of work that can retrieve or change the data of a database. Executing each transaction individually increases the waiting time for the other transactions, and the overall execution also gets delayed. Hence, to increase throughput and reduce waiting time, transactions are executed concurrently.


Example: Suppose, between two railway stations, A and B, 5 trains have to travel, if all the trains

are set in a row and only one train is allowed to move from station A to B and others have to wait

for the first train to reach its destination then it will take a lot of time for all the trains to travel from

station A to B. To reduce time all the trains should be allowed to move concurrently from

station A to B ensuring no risk of collision between them.

When several transactions execute simultaneously, there is a risk of violating the integrity of the data. Concurrency control in DBMS is the procedure of managing simultaneous transactions while ensuring their atomicity, isolation, consistency and serializability.

Concurrent Execution in DBMS

 In a multi-user system, multiple users can access and use the same database at one time,

which is known as the concurrent execution of the database. It means that the same database

is executed simultaneously on a multi-user system by different users.

 While working on the database transactions, there occurs the requirement of using the

database by multiple users for performing different operations, and in that case, concurrent

execution of the database is performed.

 The simultaneous execution should be performed in an interleaved manner, and no operation should affect the other executing operations, thus maintaining the consistency of the database. In making the transaction operations execute concurrently, several challenging problems occur that need to be solved.

Lock Based Protocol in DBMS

Overview

The database management system (DBMS) stores data items that relate to one another and can be altered at any point. There are instances where more than one user may attempt to access the same data item simultaneously, resulting in concurrency.

As a result, there is a requirement to handle concurrency, i.e., the concurrent processing of transactions across databases. Lock-based protocols in DBMS are an example of such an approach.

Introduction to Lock-Based Protocol

We can define a lock-based protocol in DBMS as a mechanism that prevents a transaction from reading or writing data until the necessary lock is obtained. The concurrency problem can be solved by locking a data item for a particular transaction. A lock is a variable that specifies which activities are allowed on a certain data item.


Let's go through one real-life example which will help us understand the concept of the lock-based

protocol in DBMS. Every second, millions of payments take place. Consider buying a movie ticket

at your preferred theatre. Now, there's a chance that two or more customers will try to reserve the

very same seat without knowing it.

A scenario like this, when it occurs on a big scale, can damage database consistency, resulting in the corruption of data. To avoid conflicts over user access to read and write the database and to ensure that there are no concurrency anomalies, each transaction must be handled separately. Lock-based protocols in DBMS are used to accomplish this purpose.

Types of Locks in DBMS

In DBMS lock-based protocols, there are two modes for locking and unlocking data items: Shared Lock (lock-S) and Exclusive Lock (lock-X). Let's go through the two types of locks in detail:
Shared Lock

 Shared locks, often denoted as lock-S(), are locks that provide read-only access to the information associated with them. Whenever a shared lock is used on a database item, it can be read by several users, but the users reading the information will not have permission to edit it or make any changes to the data item.

 To put it another way, shared locks don't provide write access. Because numerous users can read a data item simultaneously, multiple shared locks can be placed on it at the same time, provided the data item has no exclusive lock connected with it.

 A shared lock, also known as a read lock, is solely used to read data objects. Read integrity

is supported via shared locks.

 Shared locks can also be used to prevent records from being updated.
 S-lock is requested via the Lock-S instruction.

Example of Shared Locks: Consider the situation where the value of variable X equals 50 and there are a total of 2 transactions reading X. If one transaction wants to change the value of X, without locking, another transaction that tries to read the value might read an incorrect value of X. However, the shared lock stops the writer from updating X until the readers are done reading.

When the lock-based protocol in DBMS is applied to the transaction (let's say T1) discussed above, the processes listed below occur:

1. T1 will acquire a shared lock on the data item X.

2. T1 reads the current value of data item X.

3. The data item will be released once the transaction is finished.

Exclusive Lock
 An exclusive lock allows the data item to be both read and written. Only one transaction can hold an exclusive lock on a data item at a time. To obtain an X-lock, the user makes use of the lock-X instruction. After finishing the 'write' step, the transaction can unlock the data item.

 By imposing an X lock on a transaction that needs to update a person's account balance, for example, you can allow it to proceed safely. As a result of the exclusive lock, no second transaction is able to read or write that item concurrently.

 The other name for an exclusive lock is write lock.

 At any given time, an exclusive lock can be owned by only one transaction.

Example of exclusive locks: Consider the instance where the value of a data item X is equal to 50 and a transaction requires a deduction of 20 from the data item X. We can make this possible by putting an X lock on this data item for the transaction. As a result, the exclusive lock prevents any other transaction from reading or writing X.

Lock Compatibility Matrix

A vital point to remember when using Lock-based protocols in Database Management System is

that a Shared Lock can be held by any amount of transactions. On the other hand, an Exclusive

Lock can only be held by one transaction in DBMS, this is because a shared lock only reads data
but does not perform any other activities, whereas an exclusive lock performs read as well as

writing activities.

The figure given below demonstrates that when two transactions are involved, and both of these

transactions seek to read a specific data item, the transaction is authorized, and no conflict occurs;

but, in a situation when one transaction intends to write the data item and another transaction

attempts to read or write simultaneously, the interaction is rejected.
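The rule in this figure can be written down as a small lookup table. The following sketch (the names are our own) grants a lock request only if it is compatible with every lock currently held on the data item:

```python
# Shared/exclusive lock compatibility as a lookup table (sketch).
# True means the requested lock can coexist with the held lock.
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share a data item
    ("S", "X"): False,   # a reader blocks a writer
    ("X", "S"): False,   # a writer blocks readers
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(held_modes, requested):
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant(["S", "S"], "S"))   # True: another shared lock is fine
print(can_grant(["S"], "X"))        # False: the exclusive request must wait
```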


Two Phase Locking Protocol

Overview

Locking in a database management system is used for handling transactions. The two-phase locking protocol ensures conflict serializable schedules. A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations.

Scope

This article covers the concepts involved in the two-phase locking protocol:

 Types of locks in transactions - Shared lock and Exclusive lock

 Two phases in the two-phase locking protocol

 Growing Phase

 Shrinking phase

 Two-phase locking types

 Strict two-phase locking protocol


 Rigorous two-phase locking protocol

 Conservative Two-Phase Locking Protocol

Two-Phase Locking

Before understanding the two phases of Locking, let's understand the types of locks used in

transaction control.

 Shared Lock: Data can only be read when a shared lock is applied. Data cannot be written.

It is denoted as lock-S

 Exclusive lock: Data can be read as well as written when an exclusive lock is applied. It is

denoted as lock-X

Now, let's understand the two phases of locking.

The two phases of Locking are :

 Growing Phase: In the growing phase, the transaction only acquires locks; it cannot release any lock. The growing phase continues until the transaction has acquired all the locks it needs.

 Shrinking Phase: In the shrinking phase, the transaction only releases locks; no new locks can be obtained. Once the transaction releases its first lock, the shrinking phase begins.
In the above figure, we can see that in the growing phase, locks are obtained until the point when all the locks needed by the transaction have been acquired. This point is called the Lock Point. After the lock point, the transaction enters the shrinking phase.
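The two-phase rule itself is easy to sketch in code. The class below is a hypothetical illustration, not a DBMS API: it simply enforces that no lock may be acquired once any lock has been released.

```python
# Sketch of the two-phase rule: after the first release (shrinking phase),
# the transaction may never acquire another lock.

class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquiring after a release")
        self.locks.add(item)        # growing phase

    def release(self, item):
        self.shrinking = True       # the lock point has been passed
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.acquire("A"); t.acquire("B")      # growing phase
t.release("A")                      # shrinking phase begins
try:
    t.acquire("C")                  # not allowed under two-phase locking
except RuntimeError as err:
    print(err)
```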

Two-Phase Locking Types

Two-phase Locking is further classified into three types :

1. Strict two-phase locking protocol :

 The transaction can release a shared lock after the lock point.

 The transaction can not release any exclusive lock until the transaction commits.

 Because exclusive locks are held until commit, no other transaction can read uncommitted data, so strict two-phase locking avoids cascading rollbacks (where the rollback of one transaction forces dependent transactions to roll back as well).

2. Rigorous two-phase locking protocol :

 The transaction cannot release either kind of lock, neither shared nor exclusive, until it commits.

 Serializability is guaranteed in the rigorous two-phase locking protocol.

 Freedom from deadlock is not guaranteed in the rigorous two-phase locking protocol.

3. Conservative two-phase locking protocol :

 The transaction must lock all the data items it requires before the transaction begins.

 If any of the data items are not available for locking, then no data items are locked.

 The read and write data items need to be known before the transaction begins, which is not normally possible.

 The conservative two-phase locking protocol is deadlock-free.

 The conservative two-phase locking protocol does not ensure a strict schedule.

TIMESTAMP PROTOCOL

A timestamp is a unique identifier created by the DBMS to identify a transaction. Timestamps are usually assigned in the order in which the transactions are submitted to the system. The timestamp of a transaction T is denoted TS(T).


Timestamp Ordering Protocol –
The main idea of this protocol is to order the transactions based on their timestamps. A schedule in which the transactions participate is then serializable, and the only equivalent serial schedule permitted has the transactions in the order of their timestamp values. Stated simply, the schedule is equivalent to the particular serial order corresponding to the order of the transaction timestamps. The algorithm must ensure that, for each item accessed by conflicting operations in the schedule, the order in which the item is accessed does not violate the timestamp ordering. To ensure this, two timestamp values are maintained for each database item X:

W_TS(X) is the largest timestamp of any transaction that executed write(X) successfully.

R_TS(X) is the largest timestamp of any transaction that executed read(X) successfully.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system. Suppose, if an
old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj)
such that TS(Ti) < TS(Tj). The protocol manages concurrent execution such that the
timestamps determine the serializability order. The timestamp ordering protocol ensures that
any conflicting read and write operations are executed in timestamp order. Whenever some
Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares
the timestamp of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is not
violated. This describes the Basic TO protocol in the following two cases.

1. Whenever a transaction T issues a W_item(X) operation, check the following conditions:

If R_TS(X) > TS(T) or W_TS(X) > TS(T), then abort and roll back T and reject the operation.

Otherwise, execute the W_item(X) operation of T and set W_TS(X) to TS(T).

2. Whenever a transaction T issues a R_item(X) operation, check the following conditions:

If W_TS(X) > TS(T), then abort and roll back T and reject the operation.

If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set R_TS(X) to the larger of TS(T) and the current R_TS(X).

Whenever the Basic TO algorithm detects two conflicting operations that occur in an incorrect order, it rejects the later of the two operations by aborting the transaction that issued it. Schedules produced by Basic TO are guaranteed to be conflict serializable.
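The two cases above translate almost line by line into code. The following sketch keeps R_TS and W_TS in dictionaries and raises an exception where the protocol says "abort and roll back"; the function names are our own.

```python
# Sketch of the Basic Timestamp Ordering checks described above.
r_ts, w_ts = {}, {}     # R_TS(X) and W_TS(X) per item, 0 if never accessed

def write(ts_t, x):     # transaction with timestamp ts_t issues W_item(X)
    if r_ts.get(x, 0) > ts_t or w_ts.get(x, 0) > ts_t:
        raise Exception("abort and roll back the transaction")
    w_ts[x] = ts_t                          # W_TS(X) = TS(T)

def read(ts_t, x):      # transaction with timestamp ts_t issues R_item(X)
    if w_ts.get(x, 0) > ts_t:
        raise Exception("abort and roll back the transaction")
    r_ts[x] = max(r_ts.get(x, 0), ts_t)     # R_TS(X) = max(R_TS(X), TS(T))

write(10, "X")          # ok: W_TS(X) becomes 10
read(5, "X")            # raises: an older transaction reads a newer write
```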

Validation Based Protocol in DBMS

The validation-based protocol does not restrict transactions while they execute; it works on the optimistic assumption that transactions rarely interfere with one another. This is why it is also called the Optimistic Concurrency Control Technique.

In this protocol, a transaction doesn't make any changes to the database directly; instead, it performs all the changes on local copies of the data items that are maintained in the transaction itself. At the end of the transaction, a validation check is performed on the transaction. If it doesn't violate any serializability rule, the transaction commits its changes to the database; otherwise it is rolled back and restarted.

Three phases of Validation based Protocol


1. Read phase: In this phase, the transaction reads the values of data items from the database and stores them in temporary local variables. The transaction then starts executing, but it doesn't update the data items in the database; instead, it performs all its operations on the temporary local variables.

2. Validation phase: In this phase, a validation check is done on the temporary variables to see whether committing the transaction would violate the rules of serializability.

3. Write phase: This is the final phase of the validation-based protocol. If the validation of the transaction is successful, the values of the temporary local variables are written to the database and the transaction is committed. If validation fails in the second phase, the updates are discarded and the transaction is rolled back to be restarted later.

Let's look at the timestamps of each phase of a transaction:

Start(Tn): the timestamp when the transaction Tn starts its execution.

Validation(Tn): the timestamp when the transaction Tn finishes the read phase and starts the validation phase.

Finish(Tn): the timestamp when the transaction Tn finishes all the write operations.

This protocol uses Validation(Tn) as the timestamp of the transaction Tn, because this is the actual phase of the transaction where all the checks happen. So it is safe to say that TS(Tn) = Validation(Tn).

If two transactions T1 and T2 are managed by the validation-based protocol and Finish(T1) < Start(T2), then validation succeeds, as serializability is maintained: T1 finished its execution well before transaction T2 started its read phase.
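The condition stated above can be expressed directly in code. The sketch below checks only the Finish(T1) < Start(T2) case described in this section; full validation tests also compare the read and write sets of the transactions, which are omitted here.

```python
# Sketch of the simple validation test for two transactions T1 and T2.
def validation_ok(finish_t1, start_t2):
    # Serializability holds trivially if T1 finished all its writes
    # before T2 even began its read phase.
    return finish_t1 < start_t2

print(validation_ok(finish_t1=8, start_t2=10))   # True: T2 may commit
print(validation_ok(finish_t1=12, start_t2=10))  # False: further checks needed
```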

What is Multiple Granularity in DBMS?

Granularity is the size of data that is allowed to be locked. Multiple granularity is


hierarchically breaking up our database into smaller blocks that can be locked. This locking
technique allows the locking of various data sizes and sets. This way of breaking the data
into blocks that can be locked decreases lock overhead and increases the concurrency in our
database. Multiple granularity helps maintain a track of data that can be locked and how the
data is locked.
What is Granularity in DBMS?

Transactions in databases are atomic, which means that when a transaction needs to access values in the database, a locking protocol is used to lock them. Locking prevents other ongoing transactions from modifying (or, for exclusive locks, viewing) the locked data simultaneously. It is crucial to lock the database optimally while performing a transaction: locking the entire database unnecessarily costs us concurrency. We have the concept of multiple granularity in DBMS to solve this issue.

Example

We can understand the concept of multiple granularity with a tree diagram. Consider a tree structure

as mentioned below.

Here, our tree has four node levels.

1. The first level, or the tree's root, represents our entire database containing several tables and

records.

2. The next level (second level) has nodes that represent areas. The database consists of these

areas.
3. The children of the second level are known as files. A file can not be present within more

than one area.

4. The bottommost layer in our trees is known as the records. Records are child nodes of files

and can not be present in more than one file.

Hence, the four levels in our tree are:

 Database

 Area

 File

 Record

Now that we understand the levels in our tree diagram, we can see that each node in the tree can be

individually locked. So far, we know from 2 phase locking protocol that when a transaction locks a

node, it can be shared or exclusive. Once a node is locked, the children nodes are implicitly locked

with the identical lock mode.

There are two problems with this approach:

1. If we want to apply a lock on a node, we need to ensure that no ancestor of that node is locked. To verify this, we must traverse from the root to the node, checking that no node on the path is locked.

2. To acquire a lock on the entire database, the complete tree needs to be traversed to ensure that none of the descendants (of the root node) is locked.


These limitations arise because we have a two-phase locking mechanism. The need to search the

entire tree to acquire a lock on the entire database nullifies our concept of multiple granularity. We

will see a better approach to solve the limitations discussed above in upcoming sections of the

article.

 Shared locks: Shared locks are used while reading data only, and the lock prevents other

transactions from updating the data but allows them to read the data.

 Exclusive locks: When the data is being modified, exclusive locks prevent other transactions

from reading or modifying the data.

What are the Different Types of Intention Mode Locks in

Multiple Granularity?

For the problem mentioned in the above section, we introduce three additional locks other than

shared and exclusive locks, which are:

1. Intention Shared (IS) lock: An intention-shared lock on a node means that explicit locking is being done at a lower level of the tree, but only with shared locks. That is, a node carries an IS lock when nodes at a lower level are (or will be) locked in shared mode.

2. Intention Exclusive (IX) lock: An intention-exclusive lock on a node means that explicit locking is being done at a lower level of the tree with either shared or exclusive locks.

3. Shared and Intention Exclusive (SIX) lock: With this type of lock, the subtree rooted at the locked node is explicitly locked in shared mode, and explicit exclusive-mode locking will be done at a lower level.

Let us see the compatibility matrix for the five locks that we now know:
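Since the matrix figure itself is not reproduced here, the sketch below writes out the standard compatibility matrix for the five modes as code; the dictionary encoding is our own.

```python
# Standard compatibility matrix for IS, IX, S, SIX and X (sketch).
COMPAT = {                       # held mode -> modes a new request may take
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),                # X is incompatible with everything
}

def compatible(held, requested):
    return requested in COMPAT[held]

print(compatible("IX", "IS"))    # True: intention modes coexist
print(compatible("S", "IX"))     # False: a reader blocks intention-exclusive
print(compatible("IS", "X"))     # False: X conflicts with every other mode
```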

To ensure serializability, we use these intention lock modes in the multiple granularity protocol. If a transaction T attempts to lock a node, it needs to follow these protocols:

 The transaction T must follow the lock-compatibility matrix.

 Transaction T must lock the root node first (it can be locked in any mode).

 Transaction T can lock a node in IS or S mode only if it has the node's parent locked in either IS or IX mode.

 Transaction T can lock a node in X, SIX, or IX mode only if it has the node's parent locked in either IX or SIX mode.

 Transaction T can lock a node only if it has not previously unlocked any node (the two-phase rule).

 Transaction T can unlock a node only if none of that node's children are currently locked.
The locks in multiple granularity are acquired from top to down (top-down order), but the locks

must be released in the bottom-up order.

Let us see a simulation of the protocol with an example on the tree mentioned in the article:

 Suppose transaction T1 reads the record Ra2 in file Fa. Then T1 will lock the database, area A1, and file Fa in IS mode (in that order), and lock record Ra2 in S (shared) mode.

 Say transaction T2 modifies the record Ra8 in the same file Fa. Then T2 will lock the database, area A1, and file Fa in IX mode, and lock the target record Ra8 in X (exclusive) mode.

 Another transaction T3 tries to read all the records in file Fa. Then T3 will lock the nodes on its path, i.e. the database and area A1, in IS mode, and then lock file Fa in S (shared) mode.

 Lastly, say transaction T4 reads the entire database. Then T4 will lock the database in S mode and read it.

Here, transactions T1, T3, and T4 can access the database concurrently. T2 can execute concurrently with T1, but not with either T3 or T4.

This way, multiple granularity improves concurrency and reduces lock overhead, but deadlocks are still possible in the multiple granularity protocol, just as in the two-phase locking protocol.

Conclusion

 A locking protocol locks the database whenever a transaction wants to access the database.

 Granularity is the size of data that is allowed to be locked.

 Multiple granularity is hierarchically breaking up our database into smaller blocks that can

be locked.

 Other than shared and exclusive locks, three additional locks are introduced in multiple

granularity locking protocols:

 Intention Shared (IS)

 Intention Exclusive (IX)

 Shared and Intention Exclusive (SIX)
