DBMS Module5 Notes

Download as pdf or txt
Download as pdf or txt
You are on page 1of 16

Transaction:

o The transaction is a set of logically related operation. It contains a group of tasks.


o A transaction is an action or series of actions. It is performed by a single user to
perform operations for accessing the contents of the database.
Example: Suppose an employee of bank transfers Rs 800 from X's account to Y's account.
This small transaction contains several low-level tasks:
X's Account
1. Open_Account(X)
2. Old_Balance = X.balance
3. New_Balance = Old_Balance - 800
4. X.balance = New_Balance
5. Close_Account(X)
Y's Account
1. Open_Account(Y)
2. Old_Balance = Y.balance
3. New_Balance = Old_Balance + 800
4. Y.balance = New_Balance
5. Close_Account(Y)
Operations of Transaction:
Following are the main operations of transaction:
Read(X): Read operation is used to read the value of X from the database and stores it in a
buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the buffer.
Let's take an example to debit transaction from an account which consists of following
operations:
1. 1. R(X);
2. 2. X = X - 500;
3. 3. W(X);
Let's assume the value of X before starting of the transaction is 4000.
o The first operation reads X's value from database and stores it in a buffer.
o The second operation will decrease the value of X by 500. So buffer will contain
3500.
o The third operation will write the buffer's value to the database. So X's final value
will be 3500.
But it may be possible that because of the failure of hardware, software or power, etc. that
transaction may fail before finished all the operations in the set.
For example: If in the above transaction, the debit transaction fails after executing
operation 2 then X's value will remain 4000 in the database which is not acceptable by the
bank.
To solve this problem, we have two important operations:
Commit: It is used to save the work done permanently.
Rollback: It is used to undo the work done.
States of Transaction
In a database, the transaction can be in one of the following states -

Active state
o The active state is the first state of every transaction. In this state, the transaction is
being executed.
o For example: Insertion or deletion or updating a record is done here. But all the
records are still not saved to the database.
Partially committed
o In the partially committed state, a transaction executes its final operation, but the
data is still not saved to the database.
o In the total mark calculation example, a final display of the total marks step is
executed in this state.
Committed
A transaction is said to be in a committed state if it executes all its operations successfully.
In this state, all the effects are now permanently saved on the database system.
Failed state
o If any of the checks made by the database recovery system fails, then the transaction
is said to be in the failed state.
o In the example of total mark calculation, if the database is not able to fire a query to
fetch the marks, then the transaction will fail to execute.
Aborted
o If any of the checks fail and the transaction has reached a failed state then the
database recovery system will make sure that the database is in its previous
consistent state. If not then it will abort or roll back the transaction to bring the
database into a consistent state.
o If the transaction fails in the middle of the transaction then before executing the
transaction, all the executed transactions are rolled back to its consistent state.
o After aborting the transaction, the database recovery module will select one of the
two operations:
1. Re-start the transaction
2. Kill the transaction
Transaction property
The transaction has the four properties. These are used to maintain consistency in a
database, before and after the transaction.
Property of Transaction
1. Atomicity
2. Consistency
3. Isolation
4. Durability

Atomicity
o It states that all operations of the transaction take place at once if not, the
transaction is aborted.
o There is no midway, i.e., the transaction cannot occur partially. Each transaction is
treated as one unit and either run to completion or is not executed at all.
Atomicity involves the following two operations:
Abort: If a transaction aborts then all the changes made are not visible.
Commit: If a transaction commits then all the changes made are visible.
Example: Let's assume that following transaction T consisting of T1 and T2. A consists of
Rs 600 and B consists of Rs 300. Transfer Rs 100 from account A to account B.
T1 T2

Read(A) Read(B)
A:=A-100 Y:=Y+100
Write(A) Write(B)
After completion of the transaction, A consists of Rs 500 and B consists of Rs 400.
If the transaction T fails after the completion of transaction T1 but before completion of
transaction T2, then the amount will be deducted from A but not added to B. This shows
the inconsistent database state. In order to ensure correctness of database state, the
transaction must be executed in entirety.
Consistency
o The integrity constraints are maintained so that the database is consistent before
and after the transaction.
o The execution of a transaction will leave a database in either its prior stable state or
a new stable state.
o The consistent property of database states that every transaction sees a consistent
database instance.
o The transaction is used to transform the database from one consistent state to
another consistent state.
For example: The total amount must be maintained before or after the transaction.
1. Total before T occurs = 600+300=900
2. Total after T occurs= 500+400=900
Therefore, the database is consistent. In the case when T1 is completed but T2 fails, then
inconsistency will occur.
Isolation
o It shows that the data which is used at the time of execution of a transaction cannot
be used by the second transaction until the first one is completed.
o In isolation, if the transaction T1 is being executed and using the data item X, then
that data item can't be accessed by any other transaction T2 until the transaction T1
ends.
o The concurrency control subsystem of the DBMS enforced the isolation property.
Durability
o The durability property is used to indicate the performance of the database's
consistent state. It states that the transaction made the permanent changes.
o They cannot be lost by the erroneous operation of a faulty transaction or by the
system failure. When a transaction is completed, then the database reaches a state
known as the consistent state. That consistent state cannot be lost, even in the event
of a system's failure.
o The recovery subsystem of the DBMS has the responsibility of Durability property.
Types of Schedules in DBMS

Schedule, as the name suggests, is a process of lining the transactions and executing them
one by one. When there are multiple transactions that are running in a concurrent manner
and the order of operation is needed to be set so that the operations do not overlap each
other, Scheduling is brought into play and the transactions are timed accordingly. The
basics of Transactions and Schedules is discussed in Concurrency Control (Introduction),
and Transaction Isolation Levels in DBMS articles. Here we will discuss various types of
schedules.

1. Serial Schedules: Schedules in which the transactions are executed non-


interleaved, i.e., a serial schedule is one in which no transaction starts until a
running transaction has ended are called serial schedules.
Example: Consider the following schedule involving two transactions T1 and
T2.
T1 T2

R(A)

W(A)

R(B)

W(B)

R(A)

R(B)

2. where R(A) denotes that a read operation is performed on some data item
‘A’ This is a serial schedule since the transactions perform serially in the
order T1 —> T2
3. Non-Serial Schedule: This is a type of Scheduling where the operations of
multiple transactions are interleaved. This might lead to a rise in the
concurrency problem. The transactions are executed in a non-serial manner,
keeping the end result correct and same as the serial schedule. Unlike the
serial schedule where one transaction must wait for another to complete all
its operation, in the non-serial schedule, the other transaction proceeds
without waiting for the previous transaction to complete. This sort of
schedule does not provide any benefit of the concurrent transaction. It can
be of two types namely, Serializable and Non-Serializable Schedule. The
Non-Serial Schedule can be divided further into Serializable and Non-
Serializable.
a. Serializable: This is used to maintain the consistency of the database. It
is mainly used in the Non-Serial scheduling to verify whether the
scheduling will lead to any inconsistency or not. On the other hand, a
serial schedule does not need the serializability because it follows a
transaction only when the previous transaction is complete. The non-
serial schedule is said to be in a serializable schedule only when it is
equivalent to the serial schedules, for an n number of transactions. Since
concurrency is allowed in this case thus, multiple transactions can
execute concurrently. A serializable schedule helps in improving both
resource utilization and CPU throughput.
These are of two types:
1. Conflict Serializable: A schedule is called conflict serializable if it can
be transformed into a serial schedule by swapping non-conflicting
operations. Two operations are said to be conflicting if all conditions
satisfy:
• They belong to different transactions
• They operate on the same data item
• At Least one of them is a write operation
2. View Serializable: A Schedule is called view serializable if it is view
equal to a serial schedule (no overlapping transactions). A conflict
schedule is a view serializable but if the serializability contains blind
writes, then the view serializable does not conflict serializable.
b. Non-Serializable: The non-serializable schedule is divided into two
types, Recoverable and Non-recoverable Schedule.
1. Recoverable Schedule: Schedules in which transactions commit only
after all transactions whose changes they read commit are called
recoverable schedules. In other words, if some transaction T j is
reading value updated or written by some other transaction T i, then the
commit of Tj must occur after the commit of T i.
Example – Consider the following schedule involving two transactions
T1 and T2.
T1 T2

R(A)

W(A)

W(A)

R(A)

commit

commit
2. This is a recoverable schedule since T1 commits before T2, that makes
the value read by T2 correct. There can be three types of recoverable
schedule:
a. Cascading Schedule: Also called Avoids cascading
aborts/rollbacks (ACA). When there is a failure in one transaction
and this leads to the rolling back or aborting other dependent
transactions, then such scheduling is referred to as Cascading
rollback or cascading abort. Example:

b. Cascadeless Schedule: Schedules in which transactions read


values only after all transactions whose changes they are going to
read commit are called cascadeless schedules. Avoids that a single
transaction abort leads to a series of transaction rollbacks. A
strategy to prevent cascading aborts is to disallow a transaction
from reading uncommitted changes from another transaction in the
same schedule. In other words, if some transaction Tj wants to read
value updated or written by some other transaction Ti, then the
commit of Tj must read it after the commit of T i.
c. Example: Consider the following schedule involving two
transactions T1 and T2.
T1 T2

R(A)

W(A)

W(A)

commit

R(A)

commit

d. This schedule is cascadeless. Since the updated value of A is read


by T2 only after the updating transaction i.e.
T1 commits. Example: Consider the following schedule involving
two transactions T1 and T2.
T1 T2

R(A)

W(A)

R(A)

W(A)

abort

abort

e. It is a recoverable schedule but it does not avoid cascading aborts.


It can be seen that if T1 aborts, T2 will have to be aborted too in
order to maintain the correctness of the schedule as T 2 has already
read the uncommitted value written by T 1.
f. Strict Schedule: A schedule is strict if for any two transactions Ti,
Tj, if a write operation of Ti precedes a conflicting operation of
Tj (either read or write), then the commit or abort event of T i also
precedes that conflicting operation of Tj. In other words, Tj can read
or write updated or written value of T i only after Ti commits/aborts.
g. Example: Consider the following schedule involving two
transactions T1 and T2.
T1 T2

R(A)

R(A)

W(A)

commit

W(A)

R(A)

commit

h. This is a strict schedule since T2 reads and writes A which is written


by T1 only after the commit of T1.
Non-Recoverable Schedule: Example: Consider the following
schedule involving two transactions T1 and T2.
T1 T2

R(A)

W(A)

W(A)

R(A)
T1 T2

commit

abort

T2 read the value of A written by T 1, and committed. T1 later aborted,


therefore the value read by T2 is wrong, but since T2 committed, this
schedule is non-recoverable.
Note – It can be seen that:
1. Cascadeless schedules are stricter than recoverable schedules or are a
subset of recoverable schedules.
2. Strict schedules are stricter than cascadeless schedules or are a subset of
cascadeless schedules.
3. Serial schedules satisfy constraints of all recoverable, cascadeless and strict
schedules and hence is a subset of strict schedules.
The relation between various types of schedules can be depicted as:
Example: Consider the following schedule:
S:R1(A), W2(A), Commit2, W1(A), W3(A), Commit3, Commit1
Which of the following is true?
(A) The schedule is view serializable schedule and strict recoverable schedule
(B) The schedule is non-serializable schedule and strict recoverable schedule
(C) The schedule is non-serializable schedule and is not strict recoverable
schedule.
(D) The Schedule is serializable schedule and is not strict recoverable
schedule
Solution: The schedule can be re-written as:-
T1 T2 T3

R(A)

W(A)

Commit

W(A)

W(A)

Commit

Commit

First of all, it is a view serializable schedule as it has view equal serial schedule
T1 —> T2 —> T3 which satisfies the initial and updated reads and final write on
variable A which is required for view serializability. Now we can see there is
write – write pair done by transactions T1 followed by T3 which is violating the
above-mentioned condition of strict schedules as T 3 is supposed to do write
operation only after T 1 commits which is violated in the given schedule. Hence
the given schedule is serializable but not strict recoverable. So, option (D) is
correct.
Log based Recovery in DBMS
The atomicity property of DBMS states that either all the operations of
transactions must be performed or none. The modifications done by an aborted
transaction should not be visible to the database and the modifications done by
the committed transaction should be visible. To achieve our goal of atomicity,
the user must first output stable storage information describing the
modifications, without modifying the database itself. This information can help
us ensure that all modifications performed by committed transactions are
reflected in the database. This information can also help us ensure that no
modifications made by an aborted transaction persist in the database.

Log based Recovery in DBMS

Log and log records


The log is a sequence of log records, recording all the updated activities in the
database. In stable storage, logs for each transaction are maintained. Any
operation which is performed on the database is recorded on the log. Prior to
performing any modification to the database, an updated log record is created
to reflect that modification. An update log record represented as: <Ti, Xj, V1,
V2> has these fields:
1. Transaction identifier: Unique Identifier of the transaction that performed
the write operation.
2. Data item: Unique identifier of the data item written.
3. Old value: Value of data item prior to write.
4. New value: Value of data item after write operation.
Other types of log records are:
1. <Ti start>: It contains information about when a transaction Ti starts.
2. <Ti commit>: It contains information about when a transaction Ti commits.
3. <Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations
Because all database modifications must be preceded by the creation of a log
record, the system has available both the old value prior to the modification of
the data item and new value that is to be written for data item. This allows
system to perform redo and undo operations as appropriate:
1. Undo: using a log record sets the data item specified in log record to old
value.
2. Redo: using a log record sets the data item specified in log record to new
value.
The database can be modified using two approaches –
1. Deferred Modification Technique: If the transaction does not modify the
database until it has partially committed, it is said to use deferred
modification technique.
2. Immediate Modification Technique: If database modification occur while
the transaction is still active, it is said to use immediate modification
technique.
Recovery using Log records
After a system crash has occurred, the system consults the log to determine
which transactions need to be redone and which need to be undone.
1. Transaction Ti needs to be undone if the log contains the record <Ti start>
but does not contain either the record <Ti commit> or the record <Ti abort>.
2. Transaction Ti needs to be redone if log contains record <Ti start> and either
the record <Ti commit> or the record <Ti abort>.
Use of Checkpoints – When a system crash occurs, user must consult the log.
In principle, that need to search the entire log to determine this information.
There are two major difficulties with this approach:
1. The search process is time-consuming.
2. Most of the transactions that, according to our algorithm, need to be redone
have already written their updates into the database. Although redoing them
will cause no harm, it will cause recovery to take longer.
To reduce these types of overhead, user introduce checkpoints. A log record of
the form <checkpoint L> is used to represent a checkpoint in log where L is a
list of transactions active at the time of the checkpoint. When a checkpoint log
record is added to log all the transactions that have committed before this
checkpoint have <Ti commit> log record before the checkpoint record. Any
database modifications made by Ti is written to the database either prior to the
checkpoint or as part of the checkpoint itself. Thus, at recovery time, there is no
need to perform a redo operation on Ti. After a system crash has occurred, the
system examines the log to find the last <checkpoint L> record. The redo or
undo operations need to be applied only to transactions in L, and to all
transactions that started execution after the record was written to the log. Let us
denote this set of transactions as T. Same rules of undo and redo are
applicable on T as mentioned in Recovery using Log records part. Note that
user need to only examine the part of the log starting with the last checkpoint
log record to find the set of transactions T, and to find out whether a commit or
abort record occurs in the log for each transaction in T. For example, consider
the set of transactions {T0, T1, . . ., T100}. Suppose that the most recent
checkpoint took place during the execution of transaction T67 and T69, while
T68 and all transactions with subscripts lower than 67 completed before the
checkpoint. Thus, only transactions T67, T69, . . ., T100 need to be considered
during the recovery scheme. Each of them needs to be redone if it has
completed (that is, either committed or aborted); otherwise, it was incomplete,
and needs to be undone.
Log-based recovery is a technique used in DBMS to recover a database to a
consistent state in the event of a failure or crash. It involves the use of
transaction logs, which are records of all the transactions performed on the
database.
In log-based recovery, the DBMS uses the transaction log to reconstruct the
database to a consistent state.
The transaction log contains records of all the changes made to the database,
including updates, inserts, and deletes. It also records information about each
transaction, such as its start and end times.
When a failure occurs, the DBMS uses the transaction log to determine which
transactions were incomplete at the time of the failure. It then performs a series
of operations to undo the incomplete transactions and redo the completed ones.
This process is called the redo/undo recovery algorithm.
The redo operation involves reapplying the changes made by completed
transactions that were not yet saved to the database at the time of the failure.
This ensures that all changes are applied to the database.
The undo operation involves undoing the changes made by incomplete
transactions that were saved to the database at the time of the failure. This
restores the database to a consistent state by reversing the effects of the
incomplete transactions.
Once the redo and undo operations are completed, the DBMS can bring the
database back online and resume normal operations.
Log-based recovery is an essential feature of modern DBMSs and provides a
reliable mechanism for recovering from failures and ensuring the consistency of
the database.
Advantages of Log based Recovery
• Durability: In the event of a breakdown, the log file offers a dependable and
long-lasting method of recovering data. It guarantees that in the event of a
system crash, no committed transaction is lost.
• Faster Recovery: Since log-based recovery recovers databases by
replaying committed transactions from the log file, it is typically faster than
alternative recovery methods.
• Incremental Backup: Backups can be made in increments using log-based
recovery. Just the changes made since the last backup are kept in the log
file, rather than creating a complete backup of the database each time.
• Lowers the Risk of Data Corruption: By making sure that all transactions
are correctly committed or canceled before they are written to the database,
log-based recovery lowers the risk of data corruption.
Disadvantages of Log based Recovery
• Additional overhead: Maintaining the log file incurs an additional overhead
on the database system, which can reduce the performance of the system.
• Complexity: Log-based recovery is a complex process that requires careful
management and administration. If not managed properly, it can lead to data
inconsistencies or loss.
• Storage space: The log file can consume a significant amount of storage
space, especially in a database with a large number of transactions.
• Time-Consuming: The process of replaying the transactions from the log
file can be time-consuming, especially if there are a large number of
transactions to recover.
Conclusion
In conclusion, data integrity and system reliability are maintained by log-based
recovery in database management systems. It minimizes data loss and ensures
reliability by assisting in the continuous restoration of databases following
failures.

You might also like