Chapter 4

Download as pdf or txt
Download as pdf or txt
You are on page 1of 12

Chapter 4 - Transaction Concepts and concurrency control

4.1 Failure classification


4.2 Recovery concepts
4.3 Log base recovery techniques (Deferred and Immediate update)
4.4 Checkpoints, Relationship between database manager and buffer cache. Aries recovery
algorithm.
4.5 Recovery with concurrent transactions (Rollback, checkpoints, commit)
4.6 Database backup and recovery from catastrophic failure

Introduction
• A crash is the sudden failure of a software application or operating system or of a
hardware device such as a hard disk.
• An important feature of conventional database systems is the ability to recover from
system crashes.
• Recovery means storing data into previous state before incident.
• Database recovery is the process of restoring the database to a correct (consistent) state in
the event of a failure.
• Crash recovery is the process by which database is moved back to consistent state.
Failure Classification
Failure refers to the state when system can no longer continue with its normal execution and that
results in loss of information.
• Transaction failure :
⮚ Logical errors: transaction cannot complete due to some internal error condition.
E.g. Data not found, bad input
⮚ System errors: the database system must terminate an active transaction due to an
error condition (e.g., deadlock)
• System crash: a power failure or other hardware or software failure causes the system to
crash.
Fail-stop assumption: non-volatile storage contents are assumed to not be corrupted by
system crash Database systems have numerous integrity checks to prevent corruption of disk
data.
• Disk failure: a head crash or similar disk failure destroys all or part of disk storage
Destruction is assumed to be detectable: disk drives use checksums to detect failures.

Recovery Concept
• Consider transaction Ti that transfers $50 from account A to account B
⮚ Two updates: subtract $50 from A and add $50 to B
• Transaction Ti requires updates to A and B to be output to the database.
⮚ A failure may occur after one of these modifications have been made but before
both of them are made.
⮚ Modifying the database without ensuring that the transaction will commit may
leave the database in an inconsistent state
⮚ Not modifying the database may result in lost updates if failure occurs just after
transaction commits
• Recovery algorithms have two parts
⮚ Actions taken during normal transaction processing to ensure enough information
exists to recover from failures
⮚ Actions taken after a failure to recover the database contents to a state that ensures
atomicity, consistency and durability

Concept of log:
• Log means recording of database modifications.
• To denote various types of log records:
1. Start of transaction:
<Ti, start>
2. Update log:
<Ti,Xj,V1,V2>
Ti - Transaction identifier
Xj – Data item identifier
V1 – Old value of Xj
V2 – New value Xj after the write operation
3. Transaction Commit:
<Ti, commit>
4. Transaction abort:
<Ti, abort>
• Log is stored in stable storage.
• Whenever a transaction performs a write, it is essential that the log record for that
write be created before the database is modified.
Log based recovery techniques:
• The log based recovery technique maintain transaction logs to keep track of all update
operations of the transaction.
• It assumes that transactions are executed serially ie only one transaction is active at a
time.
• It uses structure called log to store modification.
Recovery Procedure:
Two operations applied to recover from a failure:
1. Undo: This operation recovers (roll backs) the changes made to database by
uncommitted transaction and restores the database to the consistent state that existed
before the start of the transaction.
2. Redo: This operation reapplied the changes of a committed transaction and restores the
database to the consistent state it would be at the end of the transaction. This operation is
required when the changes of a committed transaction are not partially reflected to the
database on disk. The redo modifies the new values for the committed transaction.
Two recovery algorithms are:
1. Deferred update (or NO UNDO/REDO) algorithm
2. Immediate update (or UNDO/REDO) algorithm

Deferred update (or NO UNDO/REDO):


• In deferred, if transaction fails a database before reaching its commit point, it will not
have changed the database in any way, so UNDO is not needed.
• It may be necessary to REDO the effect of the operations of a committed transaction
from the log, because their effect may not yet have been recorded in the database.
• Hence deferred update is also known as No UNDO/REDO algorithm.
Example 1:

Example 2:

In case of crash,
If transaction commits then do REDO action.
If no commit action is for transaction then perform NO UNDO
Immediate update (UNDO/REDO):
• It allows the database modifications to be output to the database while transaction is still
active state. This is called as uncommitted modifications.
• It allows database to modify by any transaction after write operation.
• If transaction fails after recording some changes in database but before reaching its
commit point, the effect of its operation on database must be undone ie transaction must
be roll backed.
• For committed transaction REDO action is taken.
• Hence deferred update is also known as UNDO/REDO algorithm
Example 1:

In log T1 contains commit so changes are updated to database. This is REDO action.
Example 2:
When T1 starts and performs write operation then immediately changes are made into the
database.
Same for T2 also.
But in case of system crash changes for Transaction T1 are Redone ie REDO because commit is
log.
For T2 changes are undone as no commit is in log.
i.e. For T1 action is REDO
For T2 action is UNDO

In case of crash,
If transaction commits then do REDO action.
If no commit action is for transaction then perform UNDO
Immediate update examples:
Recovery actions in each case above are:
(a) undo (T0): B is restored to 2000 and A to 1000, and log records
<T0, B, 2000>, <T0, A, 1000>, <T0, abort> are written out
(b) redo (T0) and undo (T1): A and B are set to 950 and 2050 and C is restored to 700. Log
records <T1, C, 700>, <T1, abort> are written out.
(c) redo (T0) and redo (T1): A and B are set to 950 and 2050
respectively. Then C is set to 600

Checkpoints, Relationship between database manager and buffer cache:


Checkpoint:
• Redoing/undoing all transactions recorded in the log can be very slow processing the
entire log is time-consuming if the system has run for a long time.
• It might unnecessarily redo transactions which have already output their updates to the
database.
• Streamline recovery procedure by periodically performing check pointing.
• Write a log record < checkpoint L> onto stable storage where L is a list of all transactions
active at the time of checkpoint.
• All updates are stopped while doing check pointing.
• Checkpoint is a mechanism where all the previous logs are removed from the system and
stored permanently in a storage disk.
• Checkpoint declares a point before which the DBMS was in consistent state and all the
transactions were committed.
• During recovery we need to consider only the most recent transaction Ti that started
before the checkpoint, and transactions that started after Ti.
• Scan backwards from end of log to find the most recent <checkpoint L> record
• Only transactions that are in L or started after the checkpoint need to be redone or
undone
• Transactions that committed or aborted before the checkpoint already have all
their updates output to stable storage.
• Some earlier part of the log may be needed for undo operations
• Continue scanning backwards till a record <Ti start> is found for every transaction Ti in
L.
• Parts of log prior to earliest <Ti start> record above are not needed for recovery, and can
be erased whenever desired.

T1 can be ignored (updates already output to disk due to checkpoint)


T2 and T3 redone.
T4 undone

Relationship between database manager and buffer cache:


• A database manager (DB manager) is a computer program or set of computer programs,
that provide basic database management functionalities including creation and
maintenance of databases.
• Database managers have several capabilities as backup and restore, create, delete
databases.
• Buffer Cache is a shared and reserved memory area that stores the most recently
accessed data blocks from hard disk in RAM. It is also called as data cache. It may
contain “dirty” (modified data blocks) and “clean” (unused) blocks. Buffer cache is
mainly used to take advantage of a computer’s fast primary memory compared to the
slower secondary memory, thereby minimizing the number of input/output (I/O)
operations between the primary and secondary memories.
• For example, when you execute a query to find some information the DBMS will fetch
the data blocks that might contain the required records into buffer cache. Then the data
blocks in the buffer cache will be searched for the required records.
• Database manager plays important role in managing the buffer cache.
• Database manager decides which data needs to keep on buffer cache and data which is
not required swapped out to increase the response time.
ARIES Recovery Algorithms:
This algorithm is for Recovery and Isolation Exploiting Semantics.
This algorithm is simple and flexible.
ARIES recovery process consist of following three phases:
1. Analysis: The analysis phase identifies the dirty(updated) pages in the buffer and the set
of transactions active at the time of crash. The appropriate point in the log where the
REDO operation should be determined.
2. REDO: REDO operation is applied to only committed transactions. However in ARIES
log will provide the start point for REDO from which REDO operations are applied until
the end of the log is reached.
Information stored by ARIES and in the data pages will allow ARIES to
determine whether the operation to be redone has actually been applied to the database
and hence need not be reapplied. Thus only the necessary REDO operations are applied
during recovery.
3. UNDO: During the UNDO phase, the log is scanned backward and the operations of
transactions that were active at the time of crash are undone in reverse order.
ARIES includes the log, the Transaction table, Dirty page table. In addition to this
check pointing is used. These two tables are maintained by the transaction manager and
written to the log during check pointing.

Database backup and recovery from catastrophic failure:


• The recovery techniques in previous sections applied in case of non-catastrophic failures.
The recovery manager of a DBMS must also be equipped to handle more catastrophic
failures such as disk crashes.
• The main technique used to handle such crashes is a database backup.
• We have two different strategies to recover data from such a catastrophic failure −
⮚ Database backups can be taken on magnetic tapes and stored at a safer place. This
backup can later be transferred onto a freshly installed database to bring it to the
point of backup.
⮚ Remote backup. Here a backup copy of the database is stored at a remote location
from where it can be restored in case of a catastrophe.
• To avoid losing all the effects of transactions that have been executed since the last
backup, it is necessary to back up the system log at more frequent intervals than full
database backup by periodically copying to other storages.
Examples:
1. If deferred update a technique with checkpoint is used, what will be the recovery
procedure?
[start – transaction, T1]
[write – T1, A, 10]
[commit T1]
[start – transaction, T3]
[write – T3, B, 15]
[checkpoint]
[commit T3]
[start – transaction, T2]
[write – T2, B, 20]
[start – transaction, T4]
[write – T4, D, 25]
[write – T2, C, 30] <- System crash
T1 is before checkpoint and committed also – Don’t take any action for T1(Don’t apply
recovery algorithm)
T3 is before checkpoint but has not committed, It has committed after checkpoint – Redo
T2, T4 has started after checkpoint and not committed before system crash – NO UNDO

2. If deferred update a technique with checkpoint is used, what will be the recovery
procedure?
[start – transaction, T1]
[write – T1, B, 100]
[commit T1]
[checkpoint]
[start – transaction, T2]
[write – T2, B, 15]
[commit T2]
[start – transaction, T3]
[write – T3, D, 200]
[write – T3, B, 200] <- System crash
T1 is before checkpoint and committed also – Don’t take any action for T1(Don’t apply
srecovery algorithm)
T2 It has committed after checkpoint – Redo
T3 has started after checkpoint and not committed before system crash – NO UNDO

3. If immediate update a technique with checkpoint is used, what will be the


recovery procedure?
[start – transaction, T1]
[write – T1, A, 10, 100]
[commit T1]
[checkpoint]
[start – transaction, T2]
[write – T2, B, 20, 200]
[commit T2]
[start – transaction, T3]
[write – T3, C, 30, 300] <- System crash
T1 is before checkpoint and committed also – Don’t take any action for T1(Don’t apply
recovery algorithm)
T2 It has committed after checkpoint – Redo
T3 has started after checkpoint and not committed before system crash – UNDO
4. If immediate update a technique with checkpoint is used, what will be the
recovery procedure?
[start – transaction, T1]
[read – T1, A]
[read – T1, B]
[write – T1, D, 10, 100]
[commit T1]
[checkpoint]
[start – transaction, T2]
[read – T2, B]
[write – T2, B, 15, 30]
[start – transaction, T4]
[read – T4, B]
[write – T4, B, 50, 200]
[start – transaction, T3]
[read – T3, A]
[write – T3, A, 20, 40]
[commit T4]
[read – T2, B]
[write – T2, D, 100, 200] <- System crash

T1 is before checkpoint and committed also – Don’t take any action for T1(Don’t apply
recovery algorithm)
T4 It has committed after checkpoint – Redo
T2, T3 has started after checkpoint and not committed before system crash – UNDO

You might also like