18CSC303J DBMS Unit-V
Course Learning Rationale (CLR-6)
• Understand the practical problems of concurrency control and gain knowledge
about failures and recovery
• Concurrency control
• Deadlock
T1 (transfers $50 from A to B):
1. read(A)
2. A := A – 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

T1 interleaved with T2 (T2 reads A and B and prints A+B):
T1                                T2
1. read(A)
2. A := A – 50
3. write(A)
                                  read(A), read(B), print(A+B)
4. read(B)
5. B := B + 50
6. write(B)
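The two transactions above can be sketched in Python. This is a hedged illustration: a plain dict stands in for the database, and the function names t1/t2 are hypothetical.

```python
# Sketch of the two transactions above; a plain dict stands in for the
# database, and the function names t1/t2 are illustrative.
db = {"A": 1000, "B": 2000}

def t1():
    # Transfer $50 from A to B
    a = db["A"]          # read(A)
    a = a - 50           # A := A - 50
    db["A"] = a          # write(A)
    b = db["B"]          # read(B)
    b = b + 50           # B := B + 50
    db["B"] = b          # write(B)

def t2():
    # read(A), read(B), print(A+B)
    print(db["A"] + db["B"])

t1()
t2()  # prints 3000: run serially, the transfer preserves A + B
```

If t2 instead ran between t1's write(A) and write(B), it would print 2950 rather than 3000 — exactly the kind of anomaly that isolation must prevent.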
• Isolation can be ensured trivially by running transactions serially
– That is, one after the other.
• However, executing multiple transactions concurrently has significant benefits, as we
will see later.
Required Properties of a Transaction (Cont.)
• Isolation. Even though multiple transactions may execute concurrently, each
transaction must be unaware of the partial, uncommitted results of the others.
• Atomicity. Either all operations of the transaction are properly reflected in the
database or none are.
• Durability. After a transaction completes successfully, the changes it has made to the
database persist, even if there are system failures.
Transaction State
• Active – the initial state; the transaction stays in this state while it is executing
• Partially committed – after the final statement has been executed.
• Failed – after the discovery that normal execution can no longer proceed.
• Aborted – after the transaction has been rolled back and the database
restored to its state prior to the start of the transaction. Two options after it
has been aborted:
– Restart the transaction
• can be done only if no internal logical error
– Kill the transaction
• Committed – after successful completion.
Concurrent Executions
– Reduced average response time for transactions: short transactions need not wait
behind long ones.
– Must preserve the order in which the instructions appear in each individual
transaction.
• A transaction that fails to successfully complete its execution will have an abort
instruction as the last statement.
Schedule 1
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B.
• An example of a serial schedule in which T1 is followed by T2 :
Schedule 2
• A serial schedule in which T2 is followed by T1 :
Schedule 3
• Let T1 and T2 be the transactions defined previously. The following schedule is
not a serial schedule, but it is equivalent to Schedule 1.
• With concurrent transactions, all transactions share a single disk buffer and a single
log
– A buffer block can have data items updated by one or more transactions
– Can be ensured by obtaining exclusive locks on updated items and holding the locks till
end of transaction (strict two-phase locking)
• Undo of a log record <Ti, X, V1, V2> writes the old value V1 to X
• Redo of a log record <Ti, X, V1, V2> writes the new value V2 to X
• Undo and Redo of Transactions
– undo(Ti) restores the value of all data items updated by Ti to their old values, going
backwards from the last log record for Ti
• each time a data item X is restored to its old value V a special log record <Ti , X, V> is written
out
• when undo of a transaction is complete, a log record
<Ti abort> is written out.
– redo(Ti) sets the value of all data items updated by Ti to the new values, going forward
from the first log record for Ti
• No logging is done in this case
Undo and Redo on Recovering from Failure
• When recovering after failure:
– Transaction Ti needs to be undone if the log
• contains the record <Ti start>,
• but does not contain either the record <Ti commit> or <Ti abort>.
– Transaction Ti needs to be redone if the log
• contains the records <Ti start>
• and contains the record <Ti commit> or <Ti abort>
• Note that If transaction Ti was undone earlier and the <Ti abort> record written to
the log, and then a failure occurs, on recovery from failure Ti is redone
– such a redo redoes all the original actions including the steps that restored old values
• Known as repeating history
• Seems wasteful, but simplifies recovery greatly
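The undo/redo rules above can be sketched as follows — a minimal illustration assuming log records are tuples of the shapes named in the comments (this encoding is an assumption, not the book's format):

```python
# Minimal sketch of log-based recovery with "repeating history".
# Assumed record shapes: ("start", T), ("update", T, X, V1, V2),
# ("commit", T), ("abort", T).
def recover(log, db):
    undo_set = set()
    for rec in log:                      # forward scan: find incomplete txns
        if rec[0] == "start":
            undo_set.add(rec[1])
        elif rec[0] in ("commit", "abort"):
            undo_set.discard(rec[1])
    # Redo phase: repeat history by replaying every update forward,
    # including steps that earlier undos had performed.
    for rec in log:
        if rec[0] == "update":
            db[rec[2]] = rec[4]          # write the new value V2
    # Undo phase: scan backwards, restoring old values of incomplete txns.
    new_records = []
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in undo_set:
            db[rec[2]] = rec[3]          # restore the old value V1
            new_records.append(("compensate", rec[1], rec[2], rec[3]))
    for t in undo_set:
        new_records.append(("abort", t)) # <Ti abort> written after undo
    return new_records
```

Committed transactions keep their new values after the redo pass; incomplete ones are rolled back and an abort record is emitted for each.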
Immediate DB Modification Recovery Example
Below we show the log as it appears at three instances of time.
(a) undo (T0): B is restored to 2000 and A to 1000, and log records
<T0, B, 2000>, <T0, A, 1000>, <T0 abort> are written out
(b) redo (T0) and undo (T1): A and B are set to 950 and 2050 and C is restored to 700. Log records <T1, C, 700>, <T1 abort> are
written out.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050; C is set to 600
1. exclusive (X) mode. Data item can be both read as well as written. X-lock is requested using lock-X
instruction.
2. shared (S) mode. Data item can only be read. S-lock is requested using lock-S instruction.
• Lock requests are made to the concurrency-control manager by the programmer. The transaction can
proceed only after the request is granted.
• Lock-compatibility matrix
• A transaction may be granted a lock on an item if the requested lock is compatible with locks
already held on the item by other transactions
• Any number of transactions can hold shared locks on an item,
– But if any transaction holds an exclusive lock on the item, no other transaction may hold any lock on
the item.
• If a lock cannot be granted, the requesting transaction is made to wait till all incompatible
locks held by other transactions have been released. The lock is then granted.
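The grant rule above can be written as a small lookup over the standard S/X compatibility matrix — a minimal sketch, with illustrative names:

```python
# The S/X lock-compatibility matrix as a lookup table.
COMPATIBLE = {
    ("S", "S"): True,    # any number of shared locks can coexist
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,   # exclusive is incompatible with every lock
}

def can_grant(requested, held_modes):
    """Grant only if the request is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)
```

For example, an S request succeeds against any number of held S locks, while an X request succeeds only when no lock at all is held.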
Lock-Based Protocols (Cont.)
• Example of a transaction performing locking:
T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)
• Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between
the read of A and B, the displayed sum would be wrong.
• A locking protocol is a set of rules followed by all transactions while requesting and releasing
locks. Locking protocols restrict the set of possible schedules.
The Two-Phase Locking Protocol
• A protocol with two phases:
– Phase 1 (growing phase): the transaction may obtain locks but may not release any lock.
– Phase 2 (shrinking phase): the transaction may release locks but may not obtain any new lock.
• This protocol ensures conflict-serializable schedules. It can be proved that the transactions
can be serialized in the order of their lock points (i.e., the point where a transaction
acquired its final lock).
The Two-Phase Locking Protocol (Cont.)
– Given a transaction Ti that does not follow two-phase locking, we can find a
transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not
conflict serializable.
Lock Conversions
• Two-phase locking with lock conversions:
– First Phase:
– can acquire a lock-S on item
– can acquire a lock-X on item
– can convert a lock-S to a lock-X (upgrade)
– Second Phase:
– can release a lock-S
– can release a lock-X
– can convert a lock-X to a lock-S (downgrade)
• This protocol assures serializability. But still relies on the programmer to insert the
various locking instructions.
Automatic Acquisition of Locks
• A transaction Ti issues the standard read/write instruction, without explicit locking calls.
• The operation read(D) is then processed as:
if Ti has a lock on D
then
read(D)
else begin
if necessary, wait until no other transaction has a lock-X on D;
grant Ti a lock-S on D;
read(D)
end
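The same idea can be sketched in Python — a hedged illustration where class and method names are assumptions, and the "wait until no incompatible lock is held" steps are reduced to comments:

```python
# Sketch of automatic lock acquisition on read/write; names are
# illustrative and waiting/blocking is reduced to comments.
class Txn:
    def __init__(self, db):
        self.db = db
        self.locks = {}                  # data item -> mode held by this txn

    def read(self, item):
        if item not in self.locks:
            # (a real manager would wait here until no lock-X is held)
            self.locks[item] = "S"       # grant lock-S on item
        return self.db[item]

    def write(self, item, value):
        if self.locks.get(item) != "X":
            # (wait until no other transaction holds any lock on item)
            self.locks[item] = "X"       # acquire, or upgrade S to X
        self.db[item] = value
```

Under two-phase locking, all locks recorded in `locks` would be released only in the shrinking phase (or, under strict/rigorous 2PL, at commit/abort).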
Automatic Acquisition of Locks (Cont.)
• Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while
executing lock-X(A) causes T3 to wait for T4 to release its lock on A.
• Starvation occurs if the concurrency control manager is badly designed. For example:
– A transaction may be waiting for an X-lock on an item, while a sequence of other transactions
request and are granted an S-lock on the same item.
• The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
• Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol
called strict two-phase locking -- a transaction must hold all its exclusive locks till it
commits/aborts.
• Rigorous two-phase locking is even stricter. Here, all locks are held till commit/abort. In this
protocol, transactions can be serialized in the order in which they commit.
• The lock manager replies to a lock request by sending a lock grant message (or a
message asking the transaction to roll back, in case of a deadlock)
• The lock manager maintains a data-structure called a lock table to record granted locks
and pending requests
• The lock table is usually implemented as an in-memory hash table indexed on the name
of the data item being locked
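A minimal sketch of such a lock table follows, assuming one FIFO request queue per item (granting only when no request waits ahead also avoids the starvation scenario described earlier). The structure and names are illustrative:

```python
from collections import defaultdict, deque

class LockTable:
    """Sketch: in-memory hash table, data-item name -> request queue."""
    def __init__(self):
        self.table = defaultdict(deque)

    def request(self, txn, item, mode):
        queue = self.table[item]
        # Grant only if every earlier request is granted (FIFO fairness)
        # and the new mode is compatible with them (only S with S).
        granted = all(r["granted"] and r["mode"] == "S" and mode == "S"
                      for r in queue)
        queue.append({"txn": txn, "mode": mode, "granted": granted})
        return granted

    def release(self, txn, item):
        # Drop this transaction's requests; a real manager would then
        # try to grant the pending requests at the front of the queue.
        self.table[item] = deque(r for r in self.table[item]
                                 if r["txn"] != txn)
```

Note how an S request arriving behind a waiting X request is itself made to wait, which prevents a stream of readers from starving a writer.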
Lock Table
• Deadlock prevention protocols ensure that the system will never enter into a
deadlock state. Some prevention strategies:
– Require that each transaction locks all its data items before it begins execution
(predeclaration).
– Impose partial ordering of all data items and require that a transaction can lock data
items only in the order specified by the partial order.
• Timeout-Based Schemes:
– a transaction waits for a lock only for a specified amount of time. If the lock has not
been granted within that time, the transaction is rolled back and restarted.
• When Ti requests a data item currently being held by Tj, then the edge Ti → Tj is
inserted in the wait-for graph. This edge is removed only when Tj is no longer
holding a data item needed by Ti.
• The system is in a deadlock state if and only if the wait-for graph has a cycle. Must
invoke a deadlock-detection algorithm periodically to look for cycles.
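The cycle check on the wait-for graph can be sketched with a depth-first search; representing the graph as a dict of adjacency sets is an assumption for illustration:

```python
# Deadlock detection: search the wait-for graph for a cycle via DFS.
# wait_for maps each transaction to the set of transactions it waits on.
def has_deadlock(wait_for):
    visited, on_stack = set(), set()

    def dfs(t):
        visited.add(t)
        on_stack.add(t)
        for u in wait_for.get(t, ()):
            if u in on_stack:            # back edge => cycle => deadlock
                return True
            if u not in visited and dfs(u):
                return True
        on_stack.discard(t)
        return False

    return any(dfs(t) for t in list(wait_for) if t not in visited)
```

When a cycle is found, the system must pick a victim transaction on the cycle and roll it back to break the deadlock.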
Multiple Granularity Locking
• When a transaction locks a node in the tree explicitly, it implicitly locks all the
node's descendents in the same mode.
• In addition to S and X lock modes, there are three additional lock modes with multiple
granularity:
– intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared
locks.
– intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks
– shared and intention-exclusive (SIX): the sub-tree rooted by that node is locked explicitly in
shared mode and explicit locking is being done at a lower level with exclusive-mode locks.
• Intention locks allow a higher-level node to be locked in S or X mode without having to
check all descendent nodes.
• Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-
to-root order.
• Lock granularity escalation: in case there are too many locks at a particular level,
switch to a single coarser-granularity (higher-level) S or X lock
Recovery System
• Failure Classification
• Storage Structure
• Recovery and Atomicity
• Log-Based Recovery
• Remote Backup Systems
Failure Classification
• Transaction failure :
– Logical errors: transaction cannot complete due to some internal error condition
– System errors: the database system must terminate an active transaction due to an error
condition (e.g., deadlock)
• System crash: a power failure or other hardware or software failure causes the
system to crash.
– Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by
system crash
• Database systems have numerous integrity checks to prevent corruption of disk data
• Disk failure: a head crash or similar disk failure destroys all or part of disk storage
– Destruction is assumed to be detectable: disk drives use checksums to detect failures
Recovery Algorithms
• Consider transaction Ti that transfers $50 from account A to account B
– Two updates: subtract 50 from A and add 50 to B
• Transaction Ti requires updates to A and B to be output to the database.
– A failure may occur after one of these modifications has been made but before
both of them are made.
– Modifying the database without ensuring that the transaction will commit may
leave the database in an inconsistent state
– Not modifying the database may result in lost updates if failure occurs just after
transaction commits
• Recovery algorithms have two parts
1. Actions taken during normal transaction processing to ensure enough information
exists to recover from failures
2. Actions taken after a failure to recover the database contents to a state that ensures
atomicity, consistency and durability
Storage Structure
• Volatile storage:
– does not survive system crashes
– examples: main memory, cache memory
• Nonvolatile storage:
– survives system crashes
– examples: disk, tape, flash memory,
non-volatile (battery backed up) RAM
– but may still fail, losing data
• Stable storage:
– a mythical form of storage that survives all failures
– approximated by maintaining multiple copies on distinct nonvolatile
media
– See book for more details on how to implement stable storage
Stable-Storage Implementation
• Maintain multiple copies of each block on separate disks
– copies can be at remote sites to protect against disasters such as fire or flooding.
• Failure during data transfer can still result in inconsistent copies. Block transfer can result in
– Successful completion
– Partial failure: destination block has incorrect information
– Total failure: destination block was never updated
• Protecting storage media from failure during data transfer (one solution):
1. Write the information onto the first physical block.
2. When the first write successfully completes, write the same information onto the
second physical block.
3. The output is completed only after the second write successfully completes.
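The steps above can be sketched with two ordinary files standing in for the two physical blocks; the file paths are illustrative, and `fsync` approximates "the write has completed":

```python
import os

# Sketch of the two-copy output procedure; two ordinary files stand in
# for the two physical blocks on separate disks.
def stable_write(path1, path2, data):
    # 1. Write the information onto the first physical block.
    with open(path1, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())             # wait for the write to complete
    # 2. Only when the first write has completed, write the second copy.
    with open(path2, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # 3. The output is complete only after the second write succeeds;
    #    a crash in between leaves at least one intact copy.
```

The ordering is the whole point: because the second write starts only after the first is durable, a failure at any instant leaves at least one consistent copy from which to repair the other.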
Stable-Storage Implementation (Cont.)
• Block movements between disk and main memory are initiated through the
following two operations:
– input(B) transfers the physical block B to main memory.
– output(B) transfers the buffer block B to the disk, and replaces the appropriate
physical block there.
• We assume, for simplicity, that each data item fits in, and is stored inside, a
single block.
Example of Data Access
Data Access (Cont.)
• Each transaction Ti has its private work-area in which local copies of all data
items accessed and updated by it are kept.
– Ti's local copy of a data item X is called xi.
• Transferring data items between system buffer blocks and its private work-
area done by:
– read(X) assigns the value of data item X to the local variable xi.
– write(X) assigns the value of local variable xi to data item X in the buffer block.
– Note: output(BX) need not immediately follow write(X). System can perform the
output operation when it deems fit.
• Transactions
– Must perform read(X) before accessing X for the first time (subsequent reads can
be from local copy)
– write(X) can be executed at any time before the transaction commits
Recovery and Atomicity
• To ensure atomicity despite failures, we first output information describing
the modifications to stable storage without modifying the database itself.
• We study log-based recovery mechanisms in detail
– We first present key concepts
– And then present the actual recovery algorithm
• Less used alternative: shadow-copy and shadow-paging
Log-Based Recovery
• A log is kept on stable storage.
– The log is a sequence of log records, and maintains a record of update activities on the database.
• Output of updated blocks to stable storage can take place at any time before or after
transaction commit
• Order in which blocks are output can be different from the order in which they are
written.
• The deferred-modification scheme performs updates to buffer/disk only at the time of
transaction commit
– Simplifies some aspects of recovery
– But has overhead of storing local copy
Transaction Commit
• A transaction is said to have committed when its commit log record is output to stable
storage
– all previous log records of the transaction must have been output already
Checkpoints
• Redoing/undoing all transactions recorded in the log can be very slow
1. processing the entire log is time-consuming if the system has run for a long time
2. we might unnecessarily redo transactions which have already output their updates
to the database.
• Streamline recovery by periodically performing checkpointing:
1. Output all log records currently residing in main memory onto stable storage.
2. Output all modified buffer blocks to the disk.
3. Write a log record <checkpoint L> onto stable storage where L is a list of all
transactions active at the time of checkpoint.
Recovery Algorithm
• Undo phase: scan the log backwards, rolling back incomplete transactions
– Once the record <Ti start> is found, stop the scan and write the log record <Ti abort>
Log Record Buffering
• Log records are buffered in main memory, instead of being output directly to stable storage.
– Log records are output to stable storage when a block of log records in the buffer is full,
or a log force operation is executed.
• Log force is performed to commit a transaction by forcing all its log records
(including the commit record) to stable storage.
• Several log records can thus be output using a single output operation, reducing
the I/O cost.
– Transaction Ti enters the commit state only when the log record
<Ti commit> has been output to stable storage.
– Before a block of data in main memory is output to the database, all log records
pertaining to data in that block must have been output to stable storage.
• This rule is called the write-ahead logging or WAL rule
– Strictly speaking WAL only requires undo information to be output
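The buffering rules can be sketched as a toy model — a hedged illustration where Python lists stand in for the in-memory log tail and stable storage, and the class/record names are assumptions:

```python
# Toy model of log buffering and the WAL rule; lists stand in for the
# in-memory log buffer and stable storage.
class LogBuffer:
    def __init__(self):
        self.buffer = []                 # log records not yet stable
        self.stable = []                 # records on stable storage

    def append(self, record):
        self.buffer.append(record)

    def log_force(self):
        # Force the buffered records to stable storage (done at commit).
        self.stable.extend(self.buffer)
        self.buffer.clear()

    def can_output_block(self, items_in_block):
        # WAL rule: a data block may be written out only if every update
        # record touching its items has already reached stable storage.
        pending = {r[1] for r in self.buffer if r[0] == "update"}
        return pending.isdisjoint(items_in_block)
```

A block holding item A cannot be flushed while an update record for A still sits in the buffer; after a log force, the flush becomes legal.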