Transaction
Transaction
Transaction
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not leave effects on the database at all, as if it never existed. Consistency: The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules. Isolation: The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction. Durability: Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory. Examples The following examples further illustrate the ACID properties. In these examples, the database table has two fields, A and B, in two records. An integrity constraint requires that the value in A and the value in B must sum to 100.
constraint after it has executed. However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retained A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction, or neither, be complete.
Transaction State
Because failures occurs, transaction are broken up into states to handle various situation.
Schedule 3: The following concurrent schedule does not preserve the value of the sum A + B.
Fig. 2 (Schedule-2)
Fig. 3 (Schedule-3)
Active, the initial state; the transaction stays in this state while it is executing. Partially committed, after the final statement has been executed. Failed, after the discovery that normal execution can no longer proceed. Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: Restart the transaction only if no internal logical error but hardware or software failure. Kill the transaction once internal logical error occurs like incorrect data input. Committed, after successful completion. The transaction is terminated once it is aborted or committed.
Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule. Serializable schedule: A schedule that is equivalent to some serial execution of the transactions. If the dependency graph of a schedule is acyclic, the schedule is called conflict serializable. Such a schedule is equivalent to a serial schedule. This is the condition that is typically enforced in a DBMS(although it is not necessary for serializability). A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of: 1. Conflict serializability 2. View serializability Conflict Serializability: Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q. 1. Ii = read(Q), Ij = read(Q). Ii and Ij dont conflict. 2. Ii = read(Q), Ij = write(Q). They conflict. 3. Ii = write(Q), Ij = read(Q). They conflict 4. Ii = write(Q), Ij = write(Q). They conflict If a schedule S can be transformed into a schedule S by a series of swaps of non-conflicting instructions, we say that S and S are conflict equivalent. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. Test for conflict serializability: A schedule is conflict serializable if it produces an acyclic precedence graph
Schedule:
Schedules is sequences that indicate the chronological order in which instructions of concurrent transactions are executed. a schedule for a set of transactions must consist of all instructions of those transactions Must preserve the order in which the instructions appear in each individual transaction. Example of schedules (refer figure 1) Schedule 1: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule, in which T1 is followed by T2. Fig: 1 Schedule1 Schedule 2: Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.
View Serializability: Let S and S be two schedules with the same set of transactions. S and S are view equivalent if the following three conditions are met: 1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S, also read the initial value of Q. 2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S also read the value of Q that was produced by transaction Tj. 3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S. As can be seen, view equivalence is also based purely on reads and writes alone. A schedule S is view serializable it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable. Schedule in figure a schedule which is viewserializable but not conflict serializable. Every view serializable schedule that is not conflict serializable has blind writes.
before the read operation of Tj. Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless.
Implementation of Isolation: Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless. A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency. Concurrency-control schemes tradeoff between the amount of concurrency they allow and the amount of overhead that they incur. Some schemes allow only conflict-serializable schedules to be generated, while others allow viewserializable schedules that are not conflictserializable.
Recoverability Need to address the effect of transaction failures on concurrently running transactions Recoverable schedule if a transaction Tj reads a data items previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj. The schedule in right figure is not recoverable if T9 commits immediately after the read. If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence database must ensure that schedules are recoverable. Cascading rollback a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable) If T10 fails, T11 and T12 must also be rolled back. Can lead to the undoing of a significant amount of work.
CONCURRENCY CONTROL: Process of managing simultaneous operations on the database without having them interfere with one another. Prevents interference when two or more users are accessing database simultaneously and at least one is updating data. Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result. Need for Concurrency Control: Three examples of potential problems concurrency: Lost update problem Uncommitted dependency problem Inconsistent analysis problem. caused by
Problem avoided by preventing T3 from reading balx until after T4 commits or aborts. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ Inconsistent Analysis Problem(inconsistent Retrieval) Occurs when transaction reads several values but second transaction updates some of them during execution of first. Example: T6 is totaling balances of account x (100), account y (50), and account z (25). Meantime, T5 has transferred 10 from balx to balz, so T6 now has wrong result (10 too high).
Lost Update Problem : Successfully completed update is overridden by another user. Example: T1 withdraws 10 from an account with ba lx, initially 100. T2 deposits 100 into same account. Serially, final balance would be 190. Time T1 T2 BalX t1 Begin_transc 100 t2 Begin_transc read(balx) 100 t3 read(balx) Balx=balx+100 100 t4 Balx=balx-10 Write(balx) 200 t5 Write(balx) commit 90 t6 commit 90 Loss of T2's update!! This can be avoided by preventing T1 from reading balx until after update. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ Uncommitted Dependency Problem : It occurs when one transaction can see intermediate results of another transaction before it has committed. Example: T4 updates balx to 200 but it aborts, so balx should be back at original value of 100. T3 has read new value of balx (200) and uses value as basis of 10 reduction, giving a new balance of 190, instead of 90. Time t1 t2 t3 t4 t5 t6 t7 t8 T1 T2 Begin_transc read(balx) Balx=balx+100 Write(balx) rollback BalX 100 100 100 200 200 100 190 190
Problem avoided by preventing T6 from reading balx and balz until after T5 completed updates. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________
How Should Lock be Used? In a transaction, a data item which we want to read/write should first be locked before the read/write is done. After the operation is over, the transaction should then unlock the data item so that other transaction can lock that same data item for their respective usage. In the earlier chapter we had seen a transaction to deposit Rs 100/- from account A to account B. The transaction should now be written as the following: Lock-X (A); (Exclusive Lock, we want to both read As value and modify it) Read A; A = A 100; Write A; Unlock (A); (Unlocking A after the modification is done) Lock-X (B); (Exclusive Lock, we want to both read Bs value and modify it) Read B; B = B + 100; Write B; Unlock (B); (Unlocking B after the modification is done) And the transaction that deposits 10% amount of account A to account C should now be written as: Lock-S (A); (Shared Lock, we only want to read As value) Read A; Temp = A * 0.1; Unlock (A); (Unlocking A) Lock-X (C); (Exclusive Lock, we want to both read Cs value and modify it) Read C; C = C + Temp; Write C; Unlock (C); (Unlocking C after the modification is done) Let us see how these locking mechanisms help us to create error free schedules. You should remember that in the previous chapter we discussed an example of an erroneous schedule: T1 T2
Read A; A = A 100; Read A; Temp = A * 0.1; Read C; C = C + Temp; Write C; Write A; Read B; B = B + 100; Write B;
We detected the error based on common sense only, that the Context Switching is being performed before the new value has been updated in A. T2 reads the old value of A, and thus deposits a wrong amount in C. Had we used the locking mechanism, this error could never have occurred. Let us rewrite the schedule using the locks.
T2
Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Write A; Unlock (A) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B)
Growing Phase: In this phase the transaction can only acquire locks, but cannot release any lock. The transaction enters the growing phase as soon as it acquires the first lock it wants. From now on it has no option but to keep acquiring all the locks it would need. It cannot release any lock at this phase even if it has finished working with a locked data item. Ultimately the transaction reaches a point where all the lock it may need has been acquired. This point is called Lock Point. Shrinking Phase: After Lock Point has been reached, the transaction enters the shrinking phase. In this phase the transaction can only release locks, but cannot acquire any new lock. The transaction enters the shrinking phase as soon as it releases the first lock after crossing the Lock Point. From now on it has no option but to keep releasing all the acquired locks. There are two different versions of the Two Phase Locking Protocol. One is called the Strict Two Phase Locking Protocol and the other one is called the Rigorous Two Phase Locking Protocol. Strict Two Phase Locking Protocol In this protocol, a transaction may release all the shared locks after the Lock Point has been reached, but it cannot release any of the exclusive locks until the transaction commits. This protocol helps in creating cascade less schedule. A Cascading Schedule is a typical problem faced while creating concurrent schedule. Consider the following schedule once again.
T1 Lock-X (A) Read A; A = A 100; Write A; Unlock (A) Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B) T2
We cannot prepare a schedule like the above even if we like, provided that we use the locks in the transactions. See the first statement in T2 that attempts to acquire a lock on A. This would be impossible because T1 has not released the excusive lock on A, and T2 just cannot get the shared lock it wants on A. It must wait until the exclusive lock on A is released by T1, and can begin its execution only after that. So the proper schedule would look like the following:
T1 Lock-X (A) Read A; A = A 100; Write A; Unlock (A) Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B) T2
And this automatically becomes a very correct schedule. We need not apply any manual effort to detect or correct the errors that may creep into the schedule if locks are not used in them. Two Phase Locking Protocol : The use of locks has helped us to create neat and clean concurrent schedule. The Two Phase Locking Protocol defines the rules of how to acquire the locks on a data item and how to release the locks. The Two Phase Locking Protocol assumes that a transaction can only be in one of two phases.
The schedule is theoretically correct, but a very strange kind of problem may arise here. T1 releases the exclusive lock on A, and immediately after that the Context Switch is made. T2 acquires a shared lock on A to read its value, perform a calculation, update the content of account C and then issue COMMIT. However, T1 is not finished yet. What if the remaining portion of T1 encounters a problem (power failure, disc failure etc) and cannot be committed? In that case T1 should be rolled back and the old BFIM value of A should be restored. In such a case T2, which has read the updated (but not committed) value of A and
calculated the value of C based on this value, must also have to be rolled back. We have to rollback T2 for no fault of T2 itself, but because we proceeded with T2 depending on a value which has not yet been committed. This phenomenon of rolling back a child transaction if the parent transaction is rolled back is called Cascading Rollback, which causes a tremendous loss of processing power and execution time. Using Strict Two Phase Locking Protocol, Cascading Rollback can be prevented. In Strict Two Phase Locking Protocol a transaction cannot release any of its acquired exclusive locks until the transaction commits. In such a case, T1 would not release the exclusive lock on A until it finally commits, which makes it impossible for T2 to acquire the shared lock on A at a time when As value has not been committed. This makes it impossible for a schedule to be cascading. Rigorous Two Phase Locking Protocol In Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any lock (either shared or exclusive) until it commits. This means that until the transaction commits, other transaction might acquire a shared lock on a data item on which the uncommitted transaction has a shared lock; but cannot acquire any lock on a data item on which the uncommitted transaction has an exclusive lock.
Q does not exist there anymore, and T would be rolled back. 2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item Q which has been written and committed by some other transaction earlier. Hence T will be allowed to read the value of Q, and the Rtimestamp of Q should be updated to TS(T). For Write operations: 1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for transaction T to write its value, and the delay has become so great that it has allowed another transaction to read the old value of data item Q. In such a case T has lost its relevance and will be rolled back. 2. Else if TS (T) < W-timestamp (Q), then transaction T has delayed so much that the system has allowed another transaction to write into the data item Q. in such a case too, T has lost its relevance and will be rolled back. 3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS(T).