UNIT 4-fdb

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 38

Transaction management and

Concurrency control
ACID properties
• ACID properties in DBMS are necessary for
maintaining data consistency, integrity, and reliability
while performing transactions in the database.
• A transaction is a single logical unit of work that
accesses and possibly modifies the contents of a
database. Transactions access data using read-and-
write operations. To maintain consistency in a
database, before and after the transaction, certain
properties are followed. These are
called ACID properties.
Atomicity:
• Either the entire transaction takes place at once or
doesn’t happen at all.
• There is no midway i.e. transactions do not occur
partially. Each transaction is considered as one unit and
either runs to completion or is not executed at all.
• It involves the following two operations.
— Abort : If a transaction aborts, changes made to the
database are not visible.
— Commit : If a transaction commits, changes made are
visible.
• Atomicity is also known as the ‘All or nothing rule’.
Example
• Consider the following transaction T consisting
of T1 and T2 : Transfer of 100 from account X to
account Y .
If the transaction fails after
completion of T1 but before
completion of T2 .( say,
after write(X) but before write(Y) ),
then the amount has been
deducted from X but not added
to Y . This results in an inconsistent
database state. Therefore, the
transaction must be executed in its
entirety in order to ensure the
correctness of the database state.
Consistency:
• This means that integrity constraints must be
maintained so that the database is consistent before and
after the transaction. It refers to the correctness of a
database. Referring to the example above,
• The total amount before and after the transaction must
be maintained.
Total before T occurs = 500 + 200 = 700 .
Total after T occurs = 400 + 300 = 700 .
• Therefore, the database is consistent . Inconsistency
occurs in case T1 completes but T2 fails. As a result, T is
incomplete.
Isolation:
• This property ensures that multiple transactions
can occur concurrently without leading to the
inconsistency of the database state.
• Transactions occur independently without
interference.
• Changes occurring in a particular transaction
will not be visible to any other transaction until
that particular change in that transaction is
written to memory or has been committed.
Example
• Let X = 500, Y = 500.
Consider two transactions T and T”.
Suppose T has been executed till Read
(Y) and then T’’ starts. As a result,
interleaving of operations takes place
due to which T’’ reads the correct value
of X but the incorrect value of Y and
sum computed by
T’’: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at
end of the transaction:
T: (X+Y = 50, 000 + 450 = 50, 450) .
This results in database inconsistency,
due to a loss of 50 units.
Durability:
• This property ensures that once the
transaction has completed execution, the
updates and modifications to the database are
stored in and written to disk and they persist
even if a system failure occurs.
• These updates now become permanent and
are stored in non-volatile memory. The effects
of the transaction, thus, are never lost.
Concurrency
• First we will study isolation of transactions, ensured
by the concurrency control subsystem.

• Isolation is a problem only when multiple users are


accessing the same data, and their actions
interleave.

• Why is concurrency necessary?


– Applications demand it
– Better utilization of resources: While one
user/transaction is reading the disk, another can be
using the CPU or reading another disk. This results in
better throughput and response time.
Serial Schedules
• Consider these transactions

T1: BEGIN A+=100, B-=100 END //Deposit, withdraw


T2: BEGIN C = A+B END //Compute bank balance
T3: BEGIN A =1.06*A, B=1.06*B //Give interest

• Definition: A schedule of transactions is an interleaving of


the actions of the transactions so that each transaction’s
order is preserved
• Definition: A schedule of transactions is serial if its
transactions occur consecutively, one after another.
Schedule, Serial Schedules*
• Which of these is a schedule of T1,T2, and/or T3? A serial schedule?

Time

S1
T1: A+=100, B-=100
T3: A=1.06*A, B=1.06*B

S2 T1: A+=100, B-=100


T3: A=1.06*A, B=1.06*B

S3 T1: A+=100, B-=100


T2: C=A+B

S4 T1: B-=100, A+=100


T2: C=A+B
Allowable Concurrency*
• We want to allow interleaved schedules like S2&S3,
otherwise the DBMS becomes a (very slow) batch
system.
• S2 Looks good. Why?

• Any DBMS should allow S2.


• What is wrong with S3?

• Any DBMS should forbid S3.


• How can we distinguish between S2 and S3?
Serializable Schedules
• Definition: A schedule is serializable if its effect on
the DB is the same as the effect of some serial
schedule.
– Clearly serial schedules like S1 are serializable
– S2 is serializable

• Serializability is the same as the isolation condition


of ACID.

• The goal of the concurrency control subsystem of


a DBMS is to ensure serializability*
A complex way to prove Serializability*

• Recall S2:
S2 T1: A+=100, B-=100
T3: A=1.06*A, B=1.06*B

• Why is S2 serializable?
Schedules as reads and writes
• The expression A =1.06*A in the previous schedule
means
– Read A from disk
– Set A equal to 1.06*A
– Write A to disk
• Only the read and write to disk matter to the DBMS.
– We use the notation R(A), W(A)
• So henceforth we write schedules in terms of reads and
writes to the DBMS
• This will be less intuitive but we will be able to capture
the DBMS activity better
Proving Serializability, ctd*
S2 T1: A+=100, B-=100
T3: A=1.06*A, B=1.06*B

• Here is S2 with our new notation:


S2
T1: R(A),W(A), R(B),W(B)
T3: R(A),W(A), R(B),W(B)

• Why is S2 serializable?

• Why is S5 serializable?
S5 T4: R(C),W(C), R(D),
T5: R(D), R(E),W(E)
Non-conflicting Actions
• S2 and S5 have a special structure that makes it
possible to prove that they are serializable.
Let’s formalize this structure.
• Definition: Two actions are non-conflicting if
they are in different transactions and either
they access different data items or both are
reads.
– The green actions in S2 and S5 actions are
nonconflicting
• Observation: If non-conflicting actions are
commuted then the new schedule gives the
same result in the DB as the old one
Conflict serializability
• Definition: Two schedules are conflict equivalent if
– One can be transformed into the other by commuting
non-conflicting actions
• Theorem: Conflict equivalent schedules produce
the same result in the database.
– Proof: Use the preceding observation
• Definition: A schedule is conflict serializable if it is
conflict equivalent to a serializable schedule
• Theorem: every conflict serializable schedule is
serializable
– Proof: By the previous theorem, a conflict serializable
schedule produces the same result as a serializable
schedule.
Precedence graphs
• So far the only way we have to prove serializability is to
verify conflict serializability, and that is laborious. But
there is an easier way.
• Precedence graph: One node per transaction. Edge
from Ti to Tj if an action in Ti occurs before an action in
Tj and they conflict.
– Actions conflict if they act on the same data item and one
of them is a write.
• Theorem: A schedule is conflict serializable if and only
if its precedence graph is acyclic (not cyclic).
• Draw the precedence graph for the following
schedules to see if they are conflict serializable and
therefore serializable.
LO7.2: Which of these is serializable?*
S6 T1: R(A),W(A), R(B),W(B)
T2: R(A),W(A), R(B),W(B)

S7
T1: R(A), W(A)
T2: R(A), W(A), R(B)

S8 T1: R(A), R(B) W(A)


T2: W(A)
T3: W(A) R(B)

S9 T1: R(A), W(A)


T2: W(A)
T3: W(A)
Precedence Graphs Rule! (mostly)
• As on the preceding page we can use a precedence
graph to prove that a schedule like S6 is serializable.
• Other schedules, like S7 and S8, with cyclic
precedence graphs, we may not be able to prove
serializable, so a DBMS would forbid them.
• Notice S9. It has a cyclic precedence graphs so it is
not conflict serializable, but we can show by a trick
(nice guys finish last) that it is serializable.
– This shows that the inclusions on the next page are strict
Conclusions
Serializable

Conflict
Serializable

Serial

Acyclic Precedence Graph


Serializability in the real world
• So far we have dealt with the theory, which
shows us how to tell if a schedule is
serializable.
• But a real DBMS is not presented with
schedules, it sees only a stream of
transactions.
• What can a real DBMS do to enforce
serializability and thus achieve the “isolation”
ACID property?
Locking: Used by most DBMSs to enforce serializability

• Transaction must get a lock – before it can read or update data

• There are two kinds of locks:


shared (S) locks and exclusive (X) locks

• To read a record you MUST get an S lock


To write (modify or delete) a record you MUST get an X lock

• Lock info maintained by a “lock manager”


How Locks Work
• If an object has an S lock, new
transactions can get S locks lock on data item
but not X locks. -- S X

-- ok ok ok

lock you want


• If an object has an X lock, no
other transaction can get S ok ok no
any lock (S or X) on that object.
X ok no no

• If a transaction can’t get a Lock compatibility

lock, it waits (in a queue).


Lock-based Protocols System
• Simple Lock-based protocol (or Binary
Locking):
– basic form of concurrency control.
– It operates by ensuring that a transaction obtains
a lock on a data item before accessing it (read or
write). However, it lacks strict rules for when locks
should be acquired or released, which can lead to
several disadvantages.
Lock-based Protocols System
• Disadvantage:
– Lost Updates
– Cascading Rollbacks
– Deadlocks
– No Guarantee of Serializability
Lock-based Protocols System
• Pre-Claiming Lock Protocol
– A transaction declares all the data items it will
access (read/write) beforehand.
– This protocol checks whether all the required locks
can be acquired at the start of the transaction.
– If yes, the transaction is allowed to proceed. If not,
the transaction is delayed until all required locks
are available.
Lock-based Protocols System
• Advantages:
– Deadlock free
– Simplified lock management
• Disadvantages:
– Conservative and inefficient
– Starvation: A transaction may be delayed indefinitely if other
transactions continuously hold the locks it requires.
– Potential for Resource Wastage
– Difficult to Predict Access Requirements:
Lock-based Protocols System
• 2-Phase Locking: In this locking protocols locking
and unlocking of data items are done in two phases:
– Growing Phase: New locks on data items may be
acquired but none can be released.
– Shrinking Phase: Existing locks may be released
but no new locks can be acquired.

Note: If lock conversion is allowed, then upgrading of


lock( from S(a) to X(a) ) is allowed in the Growing Phase,
and downgrading of lock (from X(a) to S(a)) must be done in
the shrinking phase.
Lock-based Protocols System
• Example:
T1 T2
1 lock-S(A)
2 lock-S(A)
3 lock-X(B)
4 ………. ……….
5 Unlock(A)
6 Lock-X(C)
7 Unlock(B)
8 Unlock(A)
9 Unlock(C)
10 ………. ……….
Lock-based Protocols System
• Transaction T1
T1 T2
– The growing Phase is from steps 1-3
1 lock-S(A)
– The shrinking Phase is from steps 5-7
– Lock Point at 3 2 lock-S(A)
3 lock-X(B)
• Transaction T2
4 ………. ……….
– The growing Phase is from steps 2-6
– The shrinking Phase is from steps 8-9 5 Unlock(A)
– Lock Point at 6 6 Lock-X(C)
• Lock Point 7 Unlock(B)
– The Point at which the growing 8 Unlock(A)
phase ends, i.e., when a transaction 9 Unlock(C)
takes the final lock it needs to carry
on its work. 10 ………. ……….
Lock-based Protocols System
• Advantage:
– Ensures Serializability.
• Disadvantages:
– Cascading Rollback is possible
– Deadlocks and starvation are possible
Lock-based Protocols System
• Cascading Rollbacks in 2-PL
Deadlock in DBMSs
• What is a deadlock?
– A cycle of transactions, T1, T2, ... , Tn=T1 where each Ti is waiting for Ti-1 to
release a lock.
– Causes these transactions to sleep forever.
• A Deadlock can happen whenever you allow a transaction to wait for
a lock, even with strict two phase locking.
– Simple example:
T1: R(B), W(A)
T2: R(A), W(B)

• Users can eliminate deadlocks by accessing resources in a fixed order.


• DBMSs typically detect deadlocks and abort the transaction that (it
thinks) has used the least resources.
Lock-based Protocols System
Schedule:
• T1: Acquires a lock on A (in its growing phase).
– T1 holds the lock on A.
• T2: Acquires a lock on B (in its growing phase).
– T2 holds the lock on B.
• T1: Requests a lock on B.
– T1 is blocked because T2 holds the lock on B.
• T2: Requests a lock on A.
– T2 is blocked because T1 holds the lock on A.
Lock-based Protocols System
Deadlock Situation
• T1 is waiting for T2 to release the lock on B,
and T2 is waiting for T1 to release the lock on
A.
• Neither transaction can proceed, resulting in a
deadlock.

You might also like