Module 5

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 74

1

Transaction management,
Concurrency Control
2
3
What is a
Database
Transaction?
 A Database
Transaction is a
logical unit of
processing in a
DBMS which
entails one or
more database
access operation.
4
Transaction in DBMS

 Transaction in DBMS refers to


operations like insertion, updation, and
deletion of data. This set of logical
works requires one or more database
operations. A transaction means that
there is a change in the database.
Introduction to Transaction 5
Management:

Transaction management in
DBMS ensures that data is
restored consistently when
the computer restarts after a
crash. After a system crash,
restoring the data is essential
for data recovery. The
restoration uses checkpoints,
transaction logs, and crash
recovery methods.
Example of a Transaction in DBMS: 6

 A simple example of a transaction will be dealing with the bank accounts of two
users let say A and B. A simple transaction of moving an amount of 5000 from A
to B engages many low-level jobs.
 As the amount of Rs. 5000 gets transferred from the A's account to B's account, a
series of tasks gets performed in the background of the screen.
This very simple and small transaction includes several steps:
1.Reading the amount of A’s account. Denoted as R(A)
2.debit A's bank account by 5000 i.e., A-5000
3.writing the modified account details with write operation. i.e., W(A)
4.Reading the amount of B’s account i.e., R(B)
5.credit A's bank account by 5000 i.e., B+5000
6.writing the modified account details with write operation. i.e., W(B)
Why do you need 7
concurrency in Transactions?

 A database is a shared resource accessed. It is used by many


users and processes concurrently. For example, the banking
system, railway, and air reservations systems, stock market
monitoring, supermarket inventory, and checkouts, etc.
 Not managing concurrent access may create issues like:
• Hardware failure and system crashes
• Concurrent execution of the same transaction, deadlock, or slow
performance
8
State Transition Diagram for a
Database Transaction
9
Different state
10
Process flow

1. Once a transaction states execution, it becomes active. It can issue


READ or WRITE operation.
2. Once the READ and WRITE operations complete, the transactions
becomes partially committed state.
3. Next, some recovery protocols need to ensure that a system failure will
not result in an inability to record changes in the transaction
permanently. If this check is a success, the transaction commits and
enters into the committed state.
4. If the check is a fail, the transaction goes to the Failed state.
5. If the transaction is aborted while it’s in the active state, it goes to the
failed state. The transaction should be rolled back to undo the effect of
its write operations on the database.
6. The terminated state refers to the transaction leaving the system.
11
Properties of Transaction
 A transaction is an execution of a user program, as a series of
read and writes operations. There are properties that all
transactions should follow and possess. The four basic are in
combination termed as ACID properties.
 A DBMS must ensure four important properties of transactions
to maintain data in the face of concurrent access and system
failures:
12
Goal: The ACID properties

 Atomicity: Either all actions are carried


out, or none are.
 Consistency: If each transaction is
consistent, and the database is initially
consistent, then it is left consistent
 Isolation: Transactions are isolated, or
protected, from the effects of other
scheduled transactions
 Durability: If a transactions completes
successfully, then its effects persist
13
Atomicity

 A transaction can
 Commit after completing its actions, or
 Abort because of
 Internal DBMS decision: restart
 System crash: power, disk failure, …
 Unexpected situation: unable to access disk, data
value, …
 A transaction interrupted in the middle could leave
the database inconsistent
 DBMS needs to remove the effects of partial
transactions to ensure atomicity: either all a
transaction’s actions are performed or none
14
Atomicity cont.

 A DBMS ensures atomicity by undoing the


actions of partial transactions
 To enable this, the DBMS maintains a record,
called a log, of all writes to the database
 The component of a DBMS responsible for this is
called the recovery manager
15
Consistency

 Each transaction, with concurrent execution of other


transactions, must preserve the consistency of the database.
For example, the user have the consistency criterion for
fund transfers between bank accounts, total amount of money
in the accounts should not change before and after the
transaction.
To transfer money from one account to another, a transaction
must debit one account, temporarily leaving the database
inconsistent, and then credit the second account, restoring
consistency.
 Users are responsible for ensuring transaction consistency.
 Integrity constraints should not be violated.
16
Isolation

 Users should be able to understand a transaction


without considering the effect of other concurrently
executing transactions, even though actions of
several transactions might be interleaved, the net
effect is identical to executing all transactions one
after the other in some serial order
 For example, if transactions T1 and T2 are
executed concurrently, the net effect is equivalent
to executing
 T1 followed by T2, or
 T2 followed by T1
17
Durability

 DBMS uses the log to ensure durability


 Once the DBMS informs the user that a transaction
has been successfully completed, its effects should
persist even if the system crashes before all its
changes are reflected on disk. This property is called
durability.
 If the system crashed before the changes made by a
completed transaction are written to disk, the log is
used to remember and restore these changes when
the system is restarted
 The DBMS component that ensures atomicity and
durability, called the recovery manager.
18
Transactions and schedules
 A transaction is seen by the DBMS as a series, or list, of actions
 Includes read and write of objects
 We’ll write this as R(o) and W(o) (sometimes RT(o) and WT(o) )
 For example
T1: [R(a), W(a), R(c), W(c)]
T2: [R(b), W(b)]
 In addition, a transaction should specify as its final action either
commit, or abort
19
Schedules
A schedule is a list of actions from a set
of transactions
A well-formed schedule is one where the
actions of a particular transaction T are in the
same order as they appear in T
20
Schedules cont.

A schedule that contains either an abort or a


commit for each transaction whose actions are listed
in it is called a complete schedule. A complete
schedule must contain all the actions of every
transaction that appears in it.
If the actions of different transactions are not

interleaved that is, transactions are executed from


start to finish, one by one-we call the schedule a
serial schedule.
CONCURRENT EXECUTION OF TRANSACTIONS
21

 Ensuring transaction isolation while permitting concurrent execution is difficult


but necessary for performance reasons.
 First, while one transaction is waiting for a page to be read in from disk, the CPU
can process another transaction. This is because I/O activity can be done in
parallel with CPU activity in a computer.
 Second, interleaved execution of a short transaction with a long transaction
usually allows the short transaction to complete quickly. In serial execution, a
short transaction could get stuck behind a long transaction, leading to
unpredictable delays in response time, or average time taken to complete a
transaction.
The DBMS interleaves the actions of different transactions to improve
performance, but not all interleavings should be allowed. Only for Serializable
schedules DBMS should allow interleavings.
22
Serializability
A serializable schedule is a non-serial schedule whose effect on
any consistent database instance is guaranteed to be identical to that
of some complete serial schedule that results from executing the
transactions in some serial order.
 As an example, the schedule shown 23in
Figure(below) is serializable. Even though the
actions of T1 and T2 are interleaved, the result of
this schedule is equivalent to running T2 (in its
entirety) and then running T1 i.e., similar to serial
schedule.
 A schedule is complete only if every transaction of
the schedule are ended with either commit or abort.
For a schedule to be serializable it should be
complete schedule.
24
Anomalies with interleaved
execution

 Concurrent transactions may leave database in an


inconsistent state. Two actions on the same data
object conflict if at least one of them is a
write.
 We’ll now consider three ways in which a
schedule involving two consistency-preserving
transactions can leave a consistent database
inconsistent
Reading Uncommitted Data 25
(WR Conflicts)
26

The first source of anomalies is that a transaction T2 could


read a database object A that has been modified by another
transaction T1, which has not yet committed. Such a read
operation is called a dirty read.
 A simple example illustrates how such a schedule could
lead to an inconsistent database state. Consider two
transactions T1 and T2. each of which, run alone,
preserves data consistency.
27

 The general problem illustrated here is that Tl may write some value
into A but that is not yet committed because its transaction is still in
progress.T1 is interleaved ,Transaction T2 started executing and used
value of A. After its transaction is done, it is committed.
 If T1 is aborted before it is committed because of some system or power
failure then T2 has read a value for A that should never have been there.
 If T2 had not yet committed, we could deal with the situation by
cascading the abort of T1 and also aborting T2; this process
recursively aborts any transaction that read data written by T2, and so
on. But T2 has already committed, and so we cannot undo its actions
which is unrecoverable schedule.(as below)
28

 In a recoverable schedule, transactions commit only after all


transactions whose changes they read commit. If transactions read
only the changes of committed transactions, the schedule is
recoverable.
A serializable schedule over a set S of transactions is a schedule
whose effect on any consistent database instance is guaranteed to be
identical to that of some complete serial schedule over the set of
committed transactions in S.
29
Unrepeatable Reads (RW
Conflicts)

 The second way in which anomalous behavior could result


is that a transaction T2 could change the value of an object
A that has been read by a, transaction T1, while T1 is still
in progress.
 If T1 tries to read the value of A again, it will get a
different result, even though it has not modified A in the
meantime.
 This situation could not arise in a serial execution of two
transactions; it is called an unrepeatable read.
30
Unrepeatable Reads (RW
Conflicts)
31

 Suppose that A is read and modified by Transaction


T1. Now T1 is interleaved and T2 starts execution ,
which reads and modifies data object A and B.
 Now if T1 reads A it finds different value for A
which is not given by T1.Here concurrency of
execution is violating isolation property of
transaction.
 This situation can never arise in a serial execution
of T1 and T2.
Overwriting Uncommitted 32

Data (WW Conflicts):

Here in this example suppose value of A is 100,in T1 transaction A is reduced by 50


and by T2 its value is again reduced by 25.At the end of two transactions value of a
should be 25.
33

 The third source of anomalous behavior is that a transaction T2 could overwrite


the value of an object A, which has already been modified by a transaction T1,
while T1 is still in progress i.e., before value of A(50) in T1 is updated,T1 is
interleaved.So read operation of T2 reads value of A as 100 instead of 50.
 As T2 does not read the value of A written by T1, a potential problem exists. By
T1 value of A is committed as 50 and then value of A by T2 is committed as 100-
25=75.
 The result is not identical to the result of either of the serial execution of
transactions, and the interleaved schedule is therefore not serializable. It violates
the desired consistency criterion that the final value of A should be 25. The
problem is that we have a lost update.After the first transaction is commit, T2
overwrites and commits again which leads to lose of previous update.
34
LOCK-BASED CONCURRENCY
CONTROL

DBMS must be able to ensure that only serializable,
recoverable schedules are allowed and that no actions of
committed transactions are lost while undoing aborted
transactions. A DBMS typically uses a locking protocol to
achieve this.
 A locking protocol is a set of rules to be followed by each
transaction to ensure that, even though several
transactions might be interleaved (i.e.,when concurrency
is used), the net effect is identical to executing all
transactions in some serial order.
 Different locking protocols use different types of locks, such
as shared locks or exclusive locks.
35
Strict Two-Phase Locking
(Strict 2PL)
 The most widely used locking protocol, called Strict Two-Phase Locking,
or Strict 2PL, has two rules. The first rule is
1. If a transaction T wants to read an object, it requests a shared lock
on the object. If a transaction T wants to modify an object, it
requests a exclusive lock on the object.
 A transaction that has an exclusive lock can also read the object; an
additional shared lock is not required.
 A transaction that requests a lock is suspended until the DBMS is able to
grant it the requested lock.
 The DBMS keeps track of the locks it has granted and ensures that if
a transaction holds an exclusive lock on an object, no other
transaction holds a shared or exclusive lock on the same object.
36

The second rule in Strict 2PL is


All locks held by a transaction are released when the
transaction is completed.
Requests to acquire and release locks can be
automatically inserted into transactions by the DBMS.
We denote the action of a transaction T requesting a
shared lock on object 0 as ST(0)
exclusive lock as XT(O)).
37

 In effect, the locking protocol allows only 'safe' interleavings of


transactions. If two transactions access completely independent
parts of the database, they concurrently obtain the locks they need
and proceed merrily on their ways.

 If two transactions access the same object, and one wants to modify
it, their actions are effectively ordered serially· all actions of one of
these transactions (the one that gets the lock on the common object
first) are completed before (this lock is released and) the other
transaction can proceed.
38

Strict 2PL algorithm allows only serializable schedules


39
Deadlocks

 Consider the following example. Transaction T1 sets an exclusive


lock on object A, T2 sets an exclusive lock on B, T1 requests an
exclusive lock on B and is queued, and T2 requests an exclusive lock
on A and is queued.
 Now, T1 is waiting for T2 to release its lock and T2 is waiting
for T1 to release its lock. Such a cycle of transactions waiting for
locks to be released is called a deadlock.
 Clearly, these two transactions will make no further progress.
Worse, they hold locks that may be required by other transactions.
40

 The DBMS must either prevent or detect (and


resolve) such deadlock situations; the common
approach is to detect and resolve deadlocks. A
simple way to identify deadlocks is to use a
timeout mechanism.
 If a transaction has been waiting too long for a
lock, we can assume that it is in a deadlock cycle
and abort it.
41
42
2PL, SERIALIZABILITY, AND
RECOVERABILITY
 We consider how locking protocols guarantee some important
properties of schedules; namely, serializability and recoverability.
 Two schedules are said to be conflict equivalent if they involve the
(same set of) actions of the same transactions and they order every
pair of conflicting actions of two committed transactions in the same
way
43
Conflict Equivalent

 Two actions conflict if they operate on the same data


object and at least one of them is a write.
 The outcome of a schedule depends only on the order
of conflicting operations.
 We can interchange any pair of nonconflicting
operations without altering the effect of the schedule
on the database.
 If two schedules are conflict equivalent, it is easy to
see that they have the same effect on a database.
Indeed, because they order all pairs of conflicting
operations in the same way, we can obtain one of
them from the other by repeatedly swapping pairs of
44
Conflict equivalent

 Non-conflicting pairs
 R(A),R(A)
 R(B),R(A)
 W(B),W(A)
 R(B), W(A)
 W(A), R(B)
 Conflicting Pairs
• W(A), R(A)
 R(A), W(A)
45
EXAMPLE1:

S2
S1
46

S1 and S2 are
conflict equivalent
but S3 schedule is
not conflict
equivalent to S1,S2
as the conflicting
operations order is
not same

S3
47

S S’
 T1 T2 T1 T2 R(A)
R(A)
W(A)
W(A)
R(B)

R(A) R(A)
W(A) W(A)

R(B)
48

 S
R(A)
T1 T2 T1 R(A)
T2
W(A) W(A)
R(B)
R(A)

R(B) R(A)
W(A)
W(A)
49

 Check for adjacent non-conflict pairs, if yes then


swap the positions.

 S and S’ are conflict equivalent.


50

 S
Conflict pairs no change in position
 T1
R(A) T2
W(A)

R(A)

W(A)
R(B)
51
Conflict Serializability
 Schedule is conflict serializable if it is conflict
equivalent to some serial schedule.
 It is useful to capture all potential conflicts
between the transactions in a schedule in a
precedence graph, also called a serializability
graph.
 The precedence graph for a schedule S
contains:
 A node for each committed transaction in S.
 An arc from Ti to Tj if an action of Ti precedes
and conflicts with one of Tj's actions.
52
EXAMPLE1
53

 The precedence graph for the schedule shown in


Figure.
54
EXAMPLE2

R(X)
T1 T2 T3
R(Y)
R(X)

R(Y)
R(Z)

W(Y)

W(Z)
R(Z)
W(X)
W(Z)
55 if
• A schedule S is conflict serializable if and only
its precedence graph is acyclic.
• An equivalent serial schedule in this case is given
by any topological sort over the precedence
graph.

T1 T2
Equivalent to
T2-T3-T1
SERIAL
SCHEDULE
T3
56
View Serializability
 Conflict serializability is sufficient but not necessary for
serializability. A more general sufficient condition is view
serializability.
 Two schedules S1 and S2 over the same set of transactions -
any transaction that appears in either S1 or S2 must also appear
in the other - are view equivalent under these conditions:
1. If Ti reads the initial value of object A in S1, it must also read
the initial value of A in S2.
2. If Ti reads a value of A written by Tj in S1, it must also read the
value of A written by Tj in S2.
3. For each data object A, the transaction (if any) that performs the
final write on A in S1 must also perform the final write on A in S2.
57

 S S’
R(A)
T1 T2 T1 R(A)
T2
W(A) W(A)
R(B)
R(A) W(B)
W(A)
R(B)
R(A)
W(B)
W(A)
R(B) R(B)
W(B) W(B)
58
2PL
Strict 2PL ensures that the precedence graph
for any schedule that it allows is acyclic. A
widely studied variant of Strict 2PL, called Two-
Phase Locking (2PL), relaxes
the second rule of Strict 2PL to allow
transactions to release locks before the end,
that is, before the commit or abort action.
Thus, every transaction has a `growing' phase
in which it acquires locks, followed by a
`shrinking' phase in which it releases locks.
59

 Even (nonstrict) 2PL ensures acyclicity of the precedence


graph and therefore allows only serializable schedules
 A schedule is said to be strict if a value written by a
transaction T is not read or overwritten by other
transactions until T either aborts or commits. Strict
schedules do not require cascading aborts, an actions of
aborted transactions can be undone by restoring the
original values of modified objects.
 The reason is that when a transaction T writes an object
under Strict 2PL, it holds the (exclusive) lock until it
commits or aborts. Thus, no other transaction can see or
modify this object until T
is complete where as 2PL has to use cascading abort as it
uses the uncommitted data to recover the schedule.
60
LOCK MANAGEMENT
• The part of the DBMS that keeps track of the locks issued to
transactions is called the lock manager.
 The lock manager maintains a lock table, which is a hash table
with the data object identifier as the key. The DBMS also
maintains a descriptive entry for each transaction in a transaction
table.
 A lock table entry for an object- which can be a page, a record,
and so on, depending on the DBMS contains the following
information: the number of transactions currently holding a lock on
the object (this can be more than one if the object is locked in
shared mode), the nature of the lock (shared or exclusive), and a
pointer to a queue of lock requests.
61
Implementing Lock and Unlock
Requests
According to the Strict 2PL protocol, before a transaction T reads or writes a
database object O, it must obtain a shared or exclusive lock on O and must hold
on to the lock until it commits or aborts. When a transaction needs a lock on an
object, it issues a lock request to the lock manager:
1. If a shared lock is requested, the queue of requests is empty, and the object is
not currently locked in exclusive mode, the lock manager grants the lock and
updates the lock table entry for the object (indicating that the object is locked in
shared mode, and incrementing the number of transactions holding a lock by one).
2. If an exclusive lock is requested, and no transaction currently holds a lock onthe
object (which also implies the queue of requests is empty), the lock manager
grants the lock and updates the lock table entry.
3. Otherwise, the requested lock cannot be immediately granted, and the lock
request is added to the queue of lock requests for this object.
62

 When a transaction aborts or commits, it releases all its


locks. When a lock on an object is released, the lock
manager updates the lock table entry for the object and
examines the lock request at the head of the queue for
this object.
 Indeed, if there are several requests for a shared lock on
the object at the front of the queue, all of these requests
can now be granted together.
 Example if T1 has a shared lock on O, and T2 requests an
exclusive lock, T2's request is queued. Now, if T3 requests
a exclusive lock, its request enters the queue behind that of
T2.
Atomicity of Locking and 63
Unlocking

 The implementation of lock and unlock


commands must ensure that these are
atomic operations.
 To ensure atomicity of these operations
when several instances of the lock
manager code can execute concurrently,
access to the lock table has to be
guarded by an operating system
synchronization mechanism such as
a semaphore.
Additional Issues: Lock Conversion, Convoys, 64
Latches
 A transaction may need to acquire an exclusive lock on
an object for which it already holds a shared lock. Such a
lock upgrade request is handled specially by granting
the write lock immediately if no other transaction
holds a shared lock on the object and inserting the
request at the front of the queue otherwise.
 The rationale for favouring the transaction thus is that it
already holds a shared lock on the object and queuing it
behind another transaction that wants an exclusive lock
on the same object causes both transactions to wait for
each other and therefore be blocked forever.
 A better approach is to avoid the need for lock upgrades altogether by
obtaining exclusive locks initially, and downgrading to a shared lock
once it is clear that this is sufficient
 In addition to locks, which are held over a long duration, a DBMS also supports
short-duration latches. Setting a latch before reading or writing a page ensures
that the physical read or write operation is atomic; otherwise, two 65 read/write
operations might conflict if the objects being locked do not correspond to disk
pages (the units of I/O). Latches are unset immediately after the physical read or
write operation is completed.
 We concentrated thus far on how the DBMS schedules transactions based on
their requests for locks. This interleaving interacts with the operating system’s
scheduling of processes' access to the CPU and can lead to a situation called
a convoy, where most of the CPU cycles are spent on process switching.
 The problem is that a transaction T holding a heavily used lock may be
suspended by the operating system. Until T is resumed, every other transaction
that needs this lock is queued. Such queues, called convoys, can quickly become
very long; a convoy, once forrned, tends to be stable. Convoys are one of the
drawbacks of building a DBMS on top of a general-purpose operating system
with preemptive scheduling.
66
Deadlocks
 Consider the following example. Transaction T1 sets an exclusive
lock on object A, T2 sets an exclusive lock on B, T1 requests an
exclusive lock on B and is queued, and T2 requests an exclusive lock
on A and is queued.
 Now, T1 is waiting for T2 to release its lock and T2 is waiting
for T1 to release its lock. Such a cycle of transactions waiting for
locks to be released is called a deadlock.
 Clearly, these two transactions will make no further progress.
Worse, they hold locks that may be required by other transactions.
67
Deadlock Detection

 Deadlocks tend to be rare and typically involve very few


transactions. This observation suggests that rather than
taking measures to prevent deadlocks, it may be better to
detect and resolve deadlocks as they arise.
 In the detection approach, the DBMS must periodically check
for deadlocks.When a transaction Ti is suspended because a
lock that it requests cannot be granted, it must wait until all
transactions Tj that currently hold conflicting locks release
them.
 The lock manager maintains a structure called a wait-for
graph to detect deadlock cycles. The nodes correspond to
active transactions, and there is an arc from Ti to Tj if (and
only if) Ti is waiting for Tj to release a lock. The lock manager
adds edges to this graph when it queues lock requests and
removes edges when it grants lock requests.
 The waits-for graph is periodically checked
68
for cycles, which indicate deadlock. A
deadlock is resolved by aborting a
transaction that is on a cycle and releasing
its locks; this action allows some of the
waiting transactions to proceed.
 an alternative to maintaining a waits-for
graph, a simplistic way to identify dead-
locks is to use a timeout mechanism: if a
transaction has been waiting too long for a
lock, we can assume (pessimistically) that it
is in a deadlock cycle and abort it.
 Consider the schedule shown in Figure. The last
step, shown below the line, creates a cycle in69
the
waits-for graph.
70
71
Deadlock Prevention
 If there is a high level of contention for locks and
therefore an increased likelihood of deadlocks,
prevention based schemes could perform better.
 We can prevent deadlocks by giving each transaction a
priority and ensuring that lower-priority transactions
are not allowed to wait for higher-priority transactions
(or vice versa).
 One way to assign priorities is to give each transaction
a timestamp when it starts up. The lower the timestamp,
the higher is the transaction's priority; that is, the oldest
transaction has the highest priority.
 If a transaction Ti requests a lock and transaction
72Tj
holds a conflicting lock, the lock manager can use
one of the following two policies:
 Wait-die: If Ti has higher priority, it is allowed to
wait; otherwise it is aborted.
 Wound-wait: If Ti has higher priority, abort Tj;
otherwise Ti waits.
 In the wait-die scheme, lower priority transactions
can never wait for higher priority transactions. In
the wound-wait scheme, higher priority transactions
never wait for lower priority transactions. In either
case no deadlock cycle can develop.
 When a transaction is aborted and restarted, it
should be given the same timestamp that 73 it
had originally. Reissuing timestamps in this
way ensures that each transaction will
eventually become the oldest transaction, and
thus the one with the highest priority, and will
get all the locks that it requires.
 The wait-die scheme is non-premptive,
only a transaction requesting a lock can be
aborted. As a transaction grows older (and its
priority increases), it tends to wait for more
and more younger transactions. A younger
transaction that conflicts with an older
transaction may be repeatedly aborted , but
on the other hand, a transaction that has all
74
Conservative 2PL
 A variant of 2PL called Conservative 2PL can also
prevent deadlocks. Under Conservative2PL, a
transaction obtains all the locks that it will ever
need when it begins, or blocks waiting for these
locks to become available.
 This scheme ensures that there will not be any
deadlocks, and, perhaps more importantly, that a
transaction that already holds some locks will not
block waiting for other locks.
 It has to know exactly what locks are needed ahead of time, and
this approach leads to setting more locks than necessary. It also
has higher overhead for setting locks because a transaction has to
release all locks

You might also like