Transaction

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 7

Unit-VI Transaction and Concurrency control

In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not leave effects on the database at all, as if it never existed. Consistency: The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including but not limited to constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code) but merely that any programming errors do not violate any defined rules. Isolation: The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not even be visible to another transaction. Durability: Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory. Examples The following examples further illustrate the ACID properties. In these examples, the database table has two fields, A and B, in two records. An integrity constraint requires that the value in A and the value in B must sum to 100.

constraint after it has executed. However, assume that after removing 10 from A, the transaction is unable to modify B. If the database retained A's new value, atomicity and the constraint would both be violated. Atomicity requires that both parts of this transaction, or neither, be complete.

Consistency failure : Consistency is a very general term


which demands that the data must meet all validation rules. In the previous example, the validation is a requirement that A + B = 100. Also, it may be inferred that both A and B must be integers. A valid range for A and B may also be inferred. All validation rules must be checked to ensure consistency. Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is inconsistent with the rules of the database. The entire transaction must be cancelled and the affected rows rolled back to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation would have been checked in the same way as above before the transaction was committed.

Isolation failure To demonstrate isolation, we assume


two transactions execute at the same time, each attempting to modify the same data. One of the two must wait until the other completes in order to maintain isolation. Consider two transactions. T1 transfers 10 from A to B. T2 transfers 10 from B to A. Combined, there are four actions: subtract 10 from A add 10 to B. subtract 10 from B add 10 to A. If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if T1 fails half-way through. The database eliminates T1's effects, and T2 sees only valid data. By interleaving the transactions, the actual order of actions might be: , , , . Again, consider what happens if T1 fails. T1 still subtracts 10 from A. Now, T2 adds 10 to A restoring it to its initial value. Now T1 fails. What should A's value be? T2 has already changed it. Also, T1 never changed B. T2 subtracts 10 from it. If T2 is allowed to complete, B's value will be 10 too low, and A's value will be unchanged, leaving an invalid database. This is known as a write-write failure, because two transactions attempted to write to the same data field.

Durability failure Assume that a transaction transfers


10 from A to B. It removes 10 from A. It then adds 10 to B. At this point, a "success" message is sent to the user. However, the changes are still queued in the disk buffer waiting to be committed to the disk. Power fails and the changes are lost. The user assumes (understandably) that the changes have been made.

Atomicity failure : Assume that a transaction attempts


to subtract 10 from A and add 10 to B. This is a valid transaction, since the data continue to satisfy the

Transaction State
Because failures occurs, transaction are broken up into states to handle various situation.

Schedule 3: The following concurrent schedule does not preserve the value of the sum A + B.

Fig. 2 (Schedule-2)

Fig. 3 (Schedule-3)

Active, the initial state; the transaction stays in this state while it is executing. Partially committed, after the final statement has been executed. Failed, after the discovery that normal execution can no longer proceed. Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted: Restart the transaction only if no internal logical error but hardware or software failure. Kill the transaction once internal logical error occurs like incorrect data input. Committed, after successful completion. The transaction is terminated once it is aborted or committed.

Equivalent schedules: For any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule. Serializable schedule: A schedule that is equivalent to some serial execution of the transactions. If the dependency graph of a schedule is acyclic, the schedule is called conflict serializable. Such a schedule is equivalent to a serial schedule. This is the condition that is typically enforced in a DBMS(although it is not necessary for serializability). A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of: 1. Conflict serializability 2. View serializability Conflict Serializability: Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q. 1. Ii = read(Q), Ij = read(Q). Ii and Ij dont conflict. 2. Ii = read(Q), Ij = write(Q). They conflict. 3. Ii = write(Q), Ij = read(Q). They conflict 4. Ii = write(Q), Ij = write(Q). They conflict If a schedule S can be transformed into a schedule S by a series of swaps of non-conflicting instructions, we say that S and S are conflict equivalent. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule. Test for conflict serializability: A schedule is conflict serializable if it produces an acyclic precedence graph

Schedule:
Schedules is sequences that indicate the chronological order in which instructions of concurrent transactions are executed. a schedule for a set of transactions must consist of all instructions of those transactions Must preserve the order in which the instructions appear in each individual transaction. Example of schedules (refer figure 1) Schedule 1: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule, in which T1 is followed by T2. Fig: 1 Schedule1 Schedule 2: Let T1 and T2 be the transactions defined previously. The following schedule is not a serial schedule, but it is equivalent to Schedule 1.

Algorithm to produce a precedence graph:


o Add a node for each transaction; o Add a directed edge from Ti to Tj , if Tj reads the value of an item written by Ti; o Add a directed edge from Ti to Tj , if Tj writes a value in to an item after it has been read by Ti. Ask: Is the graph cyclic? Yes?: Schedule is not ConflictSerializable

View Serializability: Let S and S be two schedules with the same set of transactions. S and S are view equivalent if the following three conditions are met: 1. For each data item Q, if transaction Ti reads the initial value of Q in schedule S, then transaction Ti must, in schedule S, also read the initial value of Q. 2. For each data item Q if transaction Ti executes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S also read the value of Q that was produced by transaction Tj. 3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the final write(Q) operation in schedule S. As can be seen, view equivalence is also based purely on reads and writes alone. A schedule S is view serializable it is view equivalent to a serial schedule. Every conflict serializable schedule is also view serializable. Schedule in figure a schedule which is viewserializable but not conflict serializable. Every view serializable schedule that is not conflict serializable has blind writes.

Cascadeless schedules cascading rollbacks


cannot occur; for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears

before the read operation of Tj. Every cascadeless schedule is also recoverable
It is desirable to restrict the schedules to those that are cascadeless.

Implementation of Isolation: Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless. A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency. Concurrency-control schemes tradeoff between the amount of concurrency they allow and the amount of overhead that they incur. Some schemes allow only conflict-serializable schedules to be generated, while others allow viewserializable schedules that are not conflictserializable.

Recoverability Need to address the effect of transaction failures on concurrently running transactions Recoverable schedule if a transaction Tj reads a data items previously written by a transaction Ti, the commit operation of Ti appears before the commit operation of Tj. The schedule in right figure is not recoverable if T9 commits immediately after the read. If T8 should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence database must ensure that schedules are recoverable. Cascading rollback a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable) If T10 fails, T11 and T12 must also be rolled back. Can lead to the undoing of a significant amount of work.

CONCURRENCY CONTROL: Process of managing simultaneous operations on the database without having them interfere with one another. Prevents interference when two or more users are accessing database simultaneously and at least one is updating data. Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result. Need for Concurrency Control: Three examples of potential problems concurrency: Lost update problem Uncommitted dependency problem Inconsistent analysis problem. caused by

Problem avoided by preventing T3 from reading balx until after T4 commits or aborts. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ Inconsistent Analysis Problem(inconsistent Retrieval) Occurs when transaction reads several values but second transaction updates some of them during execution of first. Example: T6 is totaling balances of account x (100), account y (50), and account z (25). Meantime, T5 has transferred 10 from balx to balz, so T6 now has wrong result (10 too high).

Lost Update Problem : Successfully completed update is overridden by another user. Example: T1 withdraws 10 from an account with ba lx, initially 100. T2 deposits 100 into same account. Serially, final balance would be 190. Time T1 T2 BalX t1 Begin_transc 100 t2 Begin_transc read(balx) 100 t3 read(balx) Balx=balx+100 100 t4 Balx=balx-10 Write(balx) 200 t5 Write(balx) commit 90 t6 commit 90 Loss of T2's update!! This can be avoided by preventing T1 from reading balx until after update. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ Uncommitted Dependency Problem : It occurs when one transaction can see intermediate results of another transaction before it has committed. Example: T4 updates balx to 200 but it aborts, so balx should be back at original value of 100. T3 has read new value of balx (200) and uses value as basis of 10 reduction, giving a new balance of 190, instead of 90. Time t1 t2 t3 t4 t5 t6 t7 t8 T1 T2 Begin_transc read(balx) Balx=balx+100 Write(balx) rollback BalX 100 100 100 200 200 100 190 190

Problem avoided by preventing T6 from reading balx and balz until after T5 completed updates. __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________ __________________________________________________

Begin_transc read(balx) Balx=balx-10 Write(balx) commit

Concurrency Control Techniques:


When multiple transactions are trying to access the same sharable resource, there could arise many problems if the access control is not done properly. There are some important mechanisms to which access control can be maintained. Earlier we talked about theoretical concepts like serializability, but the practical concept of this can be implemented by using Locks and Timestamps. Here we shall discuss some protocols where Locks and Timestamps can be used to provide an environment in which concurrent transactions can preserve their Consistency and Isolation properties. Two basic concurrency control techniques: Locking methods Timestamping Both are conservative approaches: delay transactions in case they conflict with other transactions. Optimistic methods assume conflict is rare and only check for conflicts at commit. Lock Based Protocol A lock is nothing but a mechanism that tells the DBMS whether a particular data item is being used by any transaction for read/write purpose. Since there are two types of operations, i.e. read and write, whose basic nature are different, the locks for read and write operation may behave differently. Read operation performed by different transactions on the same data item poses less of a challenge. The value of the data item, if constant, can be read by any number of transactions at any given time. Write operation is something different. When a transaction writes some value into a data item, the content of that data item remains in an inconsistent state, starting from the moment when the writing operation begins up to the moment the writing operation is over. If we allow any other transaction to read/write the value of the data item during the write operation, those transaction will read an inconsistent value or overwrite the value being written by the first transaction. In both the cases anomalies will creep into the database. The simple rule for locking can be derived from here. If a transaction is reading the content of a sharable data item, then any number of other processes can be allowed to read the content of the same data item. But if any transaction is writing into a sharable data item, then no other transaction will be allowed to read or write that same data item. Depending upon the rules we have found, we can classify the locks into two types. Shared Lock: A transaction may acquire shared lock on a data item in order to read its content. The lock is shared in the sense that any other transaction can acquire the shared lock on that same data item for reading purpose. Exclusive Lock: A transaction may acquire exclusive lock on a data item in order to both read/write into it. The lock is excusive in the sense that no other transaction can acquire any kind of lock (either shared or exclusive) on that same data item. The relationship between Shared and Exclusive Lock can be represented by the following table which is known as Lock Matrix.

Locks already existing Locks to Be granted Shared Exclusive

Shared TRUE FALSE

Exclusive FALSE FALSE

How Should Lock be Used? In a transaction, a data item which we want to read/write should first be locked before the read/write is done. After the operation is over, the transaction should then unlock the data item so that other transaction can lock that same data item for their respective usage. In the earlier chapter we had seen a transaction to deposit Rs 100/- from account A to account B. The transaction should now be written as the following: Lock-X (A); (Exclusive Lock, we want to both read As value and modify it) Read A; A = A 100; Write A; Unlock (A); (Unlocking A after the modification is done) Lock-X (B); (Exclusive Lock, we want to both read Bs value and modify it) Read B; B = B + 100; Write B; Unlock (B); (Unlocking B after the modification is done) And the transaction that deposits 10% amount of account A to account C should now be written as: Lock-S (A); (Shared Lock, we only want to read As value) Read A; Temp = A * 0.1; Unlock (A); (Unlocking A) Lock-X (C); (Exclusive Lock, we want to both read Cs value and modify it) Read C; C = C + Temp; Write C; Unlock (C); (Unlocking C after the modification is done) Let us see how these locking mechanisms help us to create error free schedules. You should remember that in the previous chapter we discussed an example of an erroneous schedule: T1 T2
Read A; A = A 100; Read A; Temp = A * 0.1; Read C; C = C + Temp; Write C; Write A; Read B; B = B + 100; Write B;

We detected the error based on common sense only, that the Context Switching is being performed before the new value has been updated in A. T2 reads the old value of A, and thus deposits a wrong amount in C. Had we used the locking mechanism, this error could never have occurred. Let us rewrite the schedule using the locks.

T1 Lock-X (A) Read A; A = A 100;

T2

Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Write A; Unlock (A) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B)

Growing Phase: In this phase the transaction can only acquire locks, but cannot release any lock. The transaction enters the growing phase as soon as it acquires the first lock it wants. From now on it has no option but to keep acquiring all the locks it would need. It cannot release any lock at this phase even if it has finished working with a locked data item. Ultimately the transaction reaches a point where all the lock it may need has been acquired. This point is called Lock Point. Shrinking Phase: After Lock Point has been reached, the transaction enters the shrinking phase. In this phase the transaction can only release locks, but cannot acquire any new lock. The transaction enters the shrinking phase as soon as it releases the first lock after crossing the Lock Point. From now on it has no option but to keep releasing all the acquired locks. There are two different versions of the Two Phase Locking Protocol. One is called the Strict Two Phase Locking Protocol and the other one is called the Rigorous Two Phase Locking Protocol. Strict Two Phase Locking Protocol In this protocol, a transaction may release all the shared locks after the Lock Point has been reached, but it cannot release any of the exclusive locks until the transaction commits. This protocol helps in creating cascade less schedule. A Cascading Schedule is a typical problem faced while creating concurrent schedule. Consider the following schedule once again.
T1 Lock-X (A) Read A; A = A 100; Write A; Unlock (A) Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B) T2

We cannot prepare a schedule like the above even if we like, provided that we use the locks in the transactions. See the first statement in T2 that attempts to acquire a lock on A. This would be impossible because T1 has not released the excusive lock on A, and T2 just cannot get the shared lock it wants on A. It must wait until the exclusive lock on A is released by T1, and can begin its execution only after that. So the proper schedule would look like the following:
T1 Lock-X (A) Read A; A = A 100; Write A; Unlock (A) Lock-S (A) Read A; Temp = A * 0.1; Unlock (A) Lock-X (C) Read C; C = C + Temp; Write C; Unlock (C) Lock-X (B) Read B; B = B + 100; Write B; Unlock (B) T2

And this automatically becomes a very correct schedule. We need not apply any manual effort to detect or correct the errors that may creep into the schedule if locks are not used in them. Two Phase Locking Protocol : The use of locks has helped us to create neat and clean concurrent schedule. The Two Phase Locking Protocol defines the rules of how to acquire the locks on a data item and how to release the locks. The Two Phase Locking Protocol assumes that a transaction can only be in one of two phases.

The schedule is theoretically correct, but a very strange kind of problem may arise here. T1 releases the exclusive lock on A, and immediately after that the Context Switch is made. T2 acquires a shared lock on A to read its value, perform a calculation, update the content of account C and then issue COMMIT. However, T1 is not finished yet. What if the remaining portion of T1 encounters a problem (power failure, disc failure etc) and cannot be committed? In that case T1 should be rolled back and the old BFIM value of A should be restored. In such a case T2, which has read the updated (but not committed) value of A and

calculated the value of C based on this value, must also have to be rolled back. We have to rollback T2 for no fault of T2 itself, but because we proceeded with T2 depending on a value which has not yet been committed. This phenomenon of rolling back a child transaction if the parent transaction is rolled back is called Cascading Rollback, which causes a tremendous loss of processing power and execution time. Using Strict Two Phase Locking Protocol, Cascading Rollback can be prevented. In Strict Two Phase Locking Protocol a transaction cannot release any of its acquired exclusive locks until the transaction commits. In such a case, T1 would not release the exclusive lock on A until it finally commits, which makes it impossible for T2 to acquire the shared lock on A at a time when As value has not been committed. This makes it impossible for a schedule to be cascading. Rigorous Two Phase Locking Protocol In Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any lock (either shared or exclusive) until it commits. This means that until the transaction commits, other transaction might acquire a shared lock on a data item on which the uncommitted transaction has a shared lock; but cannot acquire any lock on a data item on which the uncommitted transaction has an exclusive lock.

Q does not exist there anymore, and T would be rolled back. 2. If TS (T) >= W-timestamp (Q), then the transaction T is trying to read a value of data item Q which has been written and committed by some other transaction earlier. Hence T will be allowed to read the value of Q, and the Rtimestamp of Q should be updated to TS(T). For Write operations: 1. If TS (T) < R-timestamp (Q), then it means that the system has waited too long for transaction T to write its value, and the delay has become so great that it has allowed another transaction to read the old value of data item Q. In such a case T has lost its relevance and will be rolled back. 2. Else if TS (T) < W-timestamp (Q), then transaction T has delayed so much that the system has allowed another transaction to write into the data item Q. in such a case too, T has lost its relevance and will be rolled back. 3. Otherwise the system executes transaction T and updates the W-timestamp of Q to TS(T).

Timestamp Ordering Protocol:


A timestamp is a tag that can be attached to any transaction or any data item, which denotes a specific time on which the transaction or data item had been activated in any way. We, who use computers, must all be familiar with the concepts of Date Created or Last Modified properties of files and folders. Well, timestamps are things like that. A timestamp can be implemented in two ways. The simplest one is to directly assign the current value of the clock to the transaction or the data item. The other policy is to attach the value of a logical counter that keeps incrementing as new timestamps are required. The timestamp of a transaction denotes the time when it was first activated. The timestamp of a data item can be of the following two types: W-timestamp (Q): This means the latest time when the data item Q has been written into. R-timestamp (Q): This means the latest time when the data item Q has been read from. These two timestamps are updated each time a successful read/write operation is performed on the data item Q. How should timestamps be used? The timestamp ordering protocol ensures that any pair of conflicting read/write operations will be executed in their respective timestamp order. This is an alternative solution to using locks. For Read operations: 1. If TS (T) < W-timestamp (Q), then the transaction T is trying to read a value of data item Q which has already been overwritten by some other transaction. Hence the value which T wanted to read from

You might also like