Distributed Database Systems-Chhanda Ray
Distributed Transaction Management
ACID Properties of Transactions
A transaction has four properties that lead to the consistency and reliability of a database
management system. These are atomicity, consistency, isolation and durability.
Atomicity Atomicity refers to the fact that each transaction is a single logical unit of
work in a database environment that may consist of a number of operations. Thus, either
all operations of a transaction are to be completed successfully, or none of them are
carried out at all. This is known as the all-or-nothing property of a transaction. It is
the responsibility of the DBMS to maintain atomicity even when failures occur in the
system. If the execution of a transaction is interrupted by any sort of failures, the DBMS
will determine what actions should be performed with that transaction to recover from
the failed state. To maintain atomicity during failures, either the system should complete
the remaining operations of the transaction or it should undo all the operations that
have already been executed by the transaction before the failure.
The execution of a transaction may be interrupted owing to a variety of reasons, such as bad input data, deadlock, system crash, processor failure, power failure, or media and communication link failures (in the case of a distributed system). The recovery after a transaction
failure is divided into two different categories based on the different types of failures that
can occur in a database environment. The activity of ensuring transaction atomicity in the
case of transaction aborts due to failures caused by bad input data, deadlock and other
factors (except system crash and hardware failures) is called transaction recovery. On the
other hand, the activity of preserving transaction atomicity in the case of system crash or
hardware failures is called crash recovery.
Consistency Referring to the correctness of data, this property states that a transaction must
transform the database from one consistent state to another consistent state. It is the
responsibility of both the semantic data controller (integrity manager) and the concurrency
control mechanisms to maintain consistency of the data. The semantic data controller can
ensure consistency by enforcing all the constraints that have been specified in the database
schema, such as integrity and enterprise constraints. Similarly, the responsibility of a
concurrency control mechanism is to prevent transactions from reading or updating “dirty data”. Dirty data refers to data that have been updated by a transaction that has not yet committed. (The details of distributed
concurrency control mechanisms are discussed in Chapter 8).
A classification of consistency has been defined [Gray et al., 1976] based on dirty data, which groups transactions into four degrees (levels) of consistency, listed in the following.
Degree 3: A transaction T1 observes degree 3 consistency if it is restricted by the following
rules:
T1 does not overwrite dirty data of other transactions.
T1 does not commit any write operation until it completes all its write operations, that is,
until the end of the transaction.
T1 does not read dirty data from other transactions.
Other transactions do not update (dirty) any data read by T1 before T1 completes.
Degree 2: A transaction T1 observes degree 2 consistency if it is restricted by the following rules:
T1 does not overwrite dirty data of other transactions.
T1 does not commit any write operation until it completes all its write operations, that is, until the end of the transaction.
T1 does not read dirty data from other transactions.
Degree 1: A transaction T1 observes degree 1 consistency if it is restricted by the following rules:
T1 does not overwrite dirty data of other transactions.
T1 does not commit any write operation until it completes all its write operations, that is, until the end of the transaction.
Degree 0: A transaction T1 observes degree 0 consistency if it is restricted by the following rule:
T1 does not overwrite dirty data of other transactions.
Thus, it is clear that a higher degree of consistency covers all the lower levels. These
different degrees of consistency provide flexibility to application developers to define
transactions that can operate at different levels.
Transaction manager – The transaction manager at each site manages the execution of the transactions that access data stored at that local site. Each such transaction may be a local transaction or part of a global transaction. The structure of the transaction manager is similar to that of its counterpart in a centralized system, and it is responsible for the following:
Maintaining a log for recovery purposes.
Participating in an appropriate concurrency control scheme to coordinate the concurrent execution of the transactions executing at that site.
Lost update anomaly – This anomaly can occur when an apparently successful update operation completed by one user (transaction) is overridden by another user (transaction). This anomaly is illustrated in Example 8.1.
Example 8.1.
Consider the example of an online electronic funds transfer system accessed via
remote automated teller machines (ATMs). Assume that one customer (T1) is
withdrawing $500 from an account with balance B, initially $2,000, while another
customer (T2) is depositing an amount $600 into the same account via ATM from a
different place. If these two tasks are executed serially, then the final balance B will be $2,100. Further assume that both tasks start at nearly the same time and both read the same value, $2,000. Due to concurrent access to the same data item, customer T1 decrements the value of B to $1,500 while customer T2 increments it to $2,600. Thus, the update to the data item B made by customer T1 is overridden by customer T2, and the stored balance becomes $2,600 instead of the correct $2,100. Here, the consistency of the data is violated owing to this anomaly, which is known as the lost update anomaly.
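The interleaving in Example 8.1 can be reproduced with a minimal Python sketch; the sleep call is only there to force the unfortunate interleaving and is, of course, not part of any DBMS.

```python
import threading
import time

balance = 2000  # initial balance B, as in Example 8.1

def withdraw_500():
    global balance
    b = balance        # T1 reads B = 2000
    time.sleep(0.1)    # force the interleaving: T2 also reads before T1 writes
    balance = b - 500  # T1 writes 1500

def deposit_600():
    global balance
    b = balance        # T2 reads B = 2000 (stale once T1 writes)
    time.sleep(0.1)
    balance = b + 600  # T2 writes 2600, overwriting T1's update

t1 = threading.Thread(target=withdraw_500)
t2 = threading.Thread(target=deposit_600)
t1.start(); t2.start(); t1.join(); t2.join()

# Whichever write happens last wins: the result is 2600 or 1500,
# never the correct serial result of 2100.
print(balance)
```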
Uncommitted dependency or dirty read anomaly – This problem occurs when one transaction allows other transactions to read its data before it has committed and then decides to abort; any transaction that has read such uncommitted data has read values that never formed part of a committed database state.
Inconsistent analysis anomaly – This anomaly occurs when a transaction reads several values from the database while a second transaction updates some of them during its execution. The inconsistent analysis anomaly is explained in Example 8.2.
Example 8.2.
Let us consider the example of a banking system in which two transactions T3 and T4 are accessing the database concurrently. The transaction T3 is calculating the average balance of all customers of a particular branch while the transaction T4 is depositing $1,000 into one customer account with balance balx at the same time. Hence, the average balance calculated by the transaction T3 is incorrect, because the transaction T4 incremented and updated the balance of one customer during the execution of the transaction T3. This anomaly, known as the inconsistent analysis anomaly, can be avoided by preventing the transaction T4 from reading and updating the value of balx until the transaction T3 completes its execution.
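A minimal Python sketch, with assumed account names and values, ties the two anomalies above together: T3 computes an average using a balance that T4 has updated but not yet committed, so once T4 aborts, T3's result is based on data that never existed in any committed database state.

```python
committed = {'A1': 1000, 'A2': 2000, 'A3': 3000}   # committed balances (assumed)

# T4 deposits $1,000 into A2 but has not committed yet (private working copy)
t4_workspace = dict(committed)
t4_workspace['A2'] += 1000

# T3 reads A2 through T4's uncommitted update while computing the average
dirty_view = {'A1': committed['A1'], 'A2': t4_workspace['A2'], 'A3': committed['A3']}
average = sum(dirty_view.values()) / len(dirty_view)   # about 2333.33, uses dirty data

# T4 aborts: its update is discarded, so the committed average is still 2000
print(average, sum(committed.values()) / len(committed))
```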
Non-repeatable or fuzzy read anomaly – A fuzzy read or non-repeatable read anomaly occurs when one transaction reads the value of a data item that is subsequently modified or deleted by another transaction. If the first transaction attempts to re-read the data item, either the data item may not be found or a different value may be read.
Example 8.4.
Assume that in the employee database, one transaction T1 reads the salary of an
employee while another transaction T2 is updating the salary of the same employee
(or the same employee record is deleted from the employee database by the
transaction T2) concurrently. Now, if the transaction T1 attempts to re-read the value of the salary of the same employee, it will read a different value (or the record will not be found), since the employee database has been updated by the transaction T2. Thus, two read operations within the same transaction T1 return different values. This anomaly, known as the fuzzy read anomaly, occurs owing to concurrent access to the same data item.
Phantom read anomaly – This anomaly occurs when, while a transaction is performing some operation on the database based on a selection predicate, another transaction inserts new tuples satisfying that predicate into the same database. This is known as a phantom read. For example, assume that in the employee database, one transaction T1 retrieves all employees belonging to the R&D department while another transaction T2 inserts new employees into that department. Hence, if the transaction T1 re-executes the same operation, the retrieved data set will contain additional (phantom) tuples that have been inserted by the transaction T2 in the meantime.
Besides all the above anomalies, the multiple-copy consistency problem can arise in a distributed database system as a result of data distribution. This problem occurs when data items are replicated and stored at different locations in a DDBMS. To maintain the
consistency of the global database, when a replicated data item is updated at one site, all
other copies of the same data item must be updated. If any copy is not updated, the
database becomes inconsistent. The updates of the replicated data items are carried out
either synchronously or asynchronously to preserve the consistency of the data.
Distributed Serializability
A transaction consists of a sequence of read and write operations together with some
computation steps that represent a single logical unit of work in a database environment. All the operations of a transaction are treated as a single atomic unit; that is, either all the operations should be completed successfully or none of them should be carried out at all. Ozsu has defined a formal
notation for the transaction concept. According to this formalization, a transaction Ti is a
partial ordering of its operations and the termination condition. The partial order P = {Σ, α}
defines an ordering among the elements of Σ (called the domain), where Σ consists of the
read and write operations and the termination condition (abort, commit) of the transaction Ti,
and α indicates the execution order of these operations within Ti.
The execution sequence or execution ordering of operations of a transaction is called the
schedule. Let E denote an execution sequence of transactions T1, T2, . . ., Tn. E is a serial execution or serial schedule if no transactions execute concurrently in E; that is, each transaction is executed to completion before the next one begins. Every serial execution or serial schedule is deemed to be correct, because the properties of transactions imply that a serial execution terminates properly and preserves database consistency. On the other hand, an execution is said to be a non-serial execution or non-serial schedule if the operations from a set of concurrent transactions execute simultaneously. A non-serial schedule (or execution) is serializable if it is computationally equivalent to a serial schedule, that is, if it produces the same result and has the same effect on the database as some serial execution.
To prevent data inconsistency, it is essential to guarantee serializability of concurrent
transactions.
In serializability, the ordering of read and write operations is important. In this context, it is necessary to introduce the concept of conflicting transactions. If two transactions perform read or write operations on different data items, they do not conflict, and hence their execution order is not important. If two transactions only perform read operations on the same data item, they do not conflict either. Two transactions are said to be conflicting if they access the same data item concurrently and at least one of them performs a write operation. Thus, concurrent read-read operations by two transactions are non-conflicting, but concurrent read-write, write-read and write-write operations by two transactions are conflicting. The execution order is important if there are conflicting operations from two transactions.
If concurrent accesses of data items are allowed in a database environment, then the schedule
may be non-serial, which indicates that the consistency of data may be violated. To allow
maximum concurrency and to preserve the consistency of data, it is sometimes possible to
generate a schedule from a non-serial schedule that is equivalent to a serial schedule or serial
execution order by swapping the order of the non-conflicting operations. If a schedule S1 is derived from a non-serial schedule S2 by swapping the execution order of non-conflicting operations in S2, and if S1 is equivalent to a serial schedule S3 over the same set of transactions, then S2 is called a serializable schedule. This type of serializability is known as conflict serializability, and schedule S2 is said to be conflict equivalent to the serial schedule S3. A conflict serializable schedule orders any conflicting operations in the same way as some serial execution. A directed precedence graph or serialization graph can be constructed to test for conflict serializability.
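The precedence graph test can be sketched as follows (a generic illustration, assuming a schedule is given as an ordered list of (transaction, operation, data item) triples): an edge Ti → Tj is added whenever an operation of Ti conflicts with and precedes an operation of Tj, and the schedule is conflict serializable exactly when the graph is acyclic.

```python
from itertools import combinations

def conflict_serializable(schedule):
    """schedule: ordered list of (txn, op, item) triples, op in {'R', 'W'}."""
    # Build precedence edges: for each conflicting pair, the earlier
    # transaction must precede the later one in any equivalent serial order.
    edges = set()
    for (t1, op1, x1), (t2, op2, x2) in combinations(schedule, 2):
        if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
            edges.add((t1, t2))

    # The schedule is conflict serializable iff the precedence graph is acyclic.
    def reaches_cycle(node, on_path, finished):
        on_path.add(node)
        for a, b in edges:
            if a == node:
                if b in on_path:
                    return True
                if b not in finished and reaches_cycle(b, on_path, finished):
                    return True
        on_path.discard(node)
        finished.add(node)
        return False

    return not any(reaches_cycle(t, set(), set()) for t in {t for t, _, _ in schedule})

# T1 and T2 both read x before either writes it: not conflict serializable.
s = [('T1', 'R', 'x'), ('T2', 'R', 'x'), ('T1', 'W', 'x'), ('T2', 'W', 'x')]
print(conflict_serializable(s))              # False
print(conflict_serializable(s[:1] + s[2:]))  # True: equivalent to T1 followed by T2
```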
Serializability theory for centralized databases can be extended in a straightforward manner
for distributed non-replicated databases. In a distributed system, each local site maintains a
schedule or execution order of transactions or subtransactions (part of a global transaction)
running at that site, called the local schedule. The global schedule or global execution order is the union of all local schedules in a non-replicated distributed database system. Hence, if each local schedule is serializable and the local serialization orders are identical, then the global schedule is also serializable. Replication of data items in a distributed system
adds extra complexity in maintaining global serializability or distributed serializability. In
this case also, it is possible that local schedules are serializable, but mutual consistency of
the replicated data items may not be preserved owing to the conflicting operations on the
same data item at multiple sites. Thus, a distributed schedule or global schedule (the union
of all local schedules) is said to be distributed serializable if the execution order at each
local site is serializable and the local serialization orders are identical. This requires that all
subtransactions appear in the same order in the equivalent serial schedule at all sites.
Formally, distributed serializability can be defined as follows. Let S be the union of all local serializable schedules S1, S2, . . ., Sn in a distributed system. The global schedule S is said to be distributed serializable if, for each pair of conflicting operations Oi and Oj from distinct transactions Ti and Tj respectively, Oi precedes Oj in the total ordering of S if and only if Ti precedes Tj in all of the local schedules in which they appear together. To attain serializability, a DDBMS must incorporate
synchronization techniques that control the relative ordering of conflicting operations.
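A minimal sketch of this condition, assuming each site has already computed the equivalent serial order of its local schedule: the global schedule cannot be distributed serializable if two sites serialize the same pair of transactions in opposite orders.

```python
def compatible_serialization_orders(local_orders):
    """local_orders: {site: [txn, ...]} giving the equivalent serial order at each site."""
    precedes = set()
    for order in local_orders.values():
        for i, ti in enumerate(order):
            for tj in order[i + 1:]:
                if (tj, ti) in precedes:   # another site orders this pair the other way
                    return False
                precedes.add((ti, tj))
    return True

# Site S1 serializes T1 before T2, site S2 serializes T2 before T1:
# the local orders are not identical, so the global schedule is not serializable.
print(compatible_serialization_orders({'S1': ['T1', 'T2'], 'S2': ['T2', 'T1']}))  # False
print(compatible_serialization_orders({'S1': ['T1', 'T2'], 'S2': ['T1', 'T2']}))  # True
```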
Example 8.5.
Let us consider the following transaction Ti:
Read(a);
Read(b);
a := a + b;
Write(a);
Commit;
The partial ordering of the above transaction Ti can be formally represented as P = {Σ, α},
where Σ = {R(a), R(b), W(a), C} and
α = {(R(a), W(a)), (R(b), W(a)), (W(a), C), (R(a), C), (R(b), C)}.
Here, the ordering relation specifies the relative ordering of all operations with respect to
the termination condition. The partial ordering of a transaction facilitates deriving the corresponding directed acyclic graph (DAG) for the transaction. The DAG of the above
transaction Ti is illustrated in figure 8.1.
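Since the ordering relation α of Example 8.5 is acyclic, the partial order can be represented directly as a DAG, and any topological order of that DAG is a valid execution order. The following sketch, using Python's standard graphlib module (Python 3.9+), merely illustrates this point.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

sigma = {'R(a)', 'R(b)', 'W(a)', 'C'}
alpha = {('R(a)', 'W(a)'), ('R(b)', 'W(a)'),
         ('W(a)', 'C'), ('R(a)', 'C'), ('R(b)', 'C')}

# TopologicalSorter expects, for each node, the set of nodes that must precede it.
predecessors = {op: {a for (a, b) in alpha if b == op} for op in sigma}

# Any topological order respects alpha, e.g. ['R(b)', 'R(a)', 'W(a)', 'C'].
print(list(TopologicalSorter(predecessors).static_order()))
```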
Centralized 2PL
In the centralized 2PL method, the lock manager or scheduler is a central component, and
hence a single site is responsible for managing all activities of the lock manager for the
entire distributed system. Before accessing any data item at any site, appropriate locks must
be obtained from the central lock manager. If a global transaction Ti is initiated at site Si of
the distributed system, then the centralized 2PL for that global transaction should be
implemented in the following way.
The transaction manager at the site Si (called transaction coordinator) partitions the global
transaction into several subtransactions using the information stored in the global system
catalog. Thus, the transaction coordinator is responsible for maintaining consistency of data. If
the data items are replicated, then it is also the responsibility of the transaction coordinator to
ensure that all replicas of the data item are updated. Therefore, for a write operation, the
transaction coordinator requests exclusive locks on all replicas of the data item that are
stored at different sites. However, for a read operation, the transaction coordinator can select
any one of the replicas of the data item that are available, preferably at its own site. The local
transaction managers of the participating sites request and release locks from/to the
centralized lock manager following the normal rules of the 2PL protocol. Participating sites
are those sites that are involved in the execution of the global transaction (subtransactions).
After receiving the lock request, the centralized lock manager checks whether that lock
request is compatible with the existing locking status or not. If it is compatible, the lock
manager sends a message to the corresponding site that the lock has been granted; otherwise,
the lock request is put in a queue until it can be granted.
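A highly simplified sketch of this behaviour is given below; it is an illustration under assumed interfaces, not the actual implementation of a lock manager. A request that is compatible with the current lock state is granted immediately; otherwise it is queued and granted when conflicting locks are released.

```python
from collections import defaultdict, deque

class CentralLockManager:
    """Single lock manager for the whole distributed system (centralized 2PL)."""

    def __init__(self):
        self.holders = defaultdict(dict)   # data item -> {transaction: 'S' or 'X'}
        self.waiting = defaultdict(deque)  # data item -> queue of (transaction, mode)

    def _compatible(self, item, txn, mode):
        others = {t: m for t, m in self.holders[item].items() if t != txn}
        # Only shared (read) locks are mutually compatible.
        return not others or (mode == 'S' and all(m == 'S' for m in others.values()))

    def request(self, txn, item, mode):
        if self._compatible(item, txn, mode):
            self.holders[item][txn] = mode
            return 'granted'               # "lock granted" message to the requesting site
        self.waiting[item].append((txn, mode))
        return 'queued'                    # request waits until it can be granted

    def release(self, txn, item):
        self.holders[item].pop(txn, None)
        # Grant queued requests that have now become compatible (FIFO order).
        while self.waiting[item]:
            w_txn, w_mode = self.waiting[item][0]
            if not self._compatible(item, w_txn, w_mode):
                break
            self.waiting[item].popleft()
            self.holders[item][w_txn] = w_mode

lm = CentralLockManager()
print(lm.request('T1', 'x', 'X'))   # granted
print(lm.request('T2', 'x', 'S'))   # queued (conflicts with T1's exclusive lock)
lm.release('T1', 'x')               # T2's shared lock is granted on release
print(lm.holders['x'])              # {'T2': 'S'}
```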
In some systems, a slight variation of this method is followed, in which the transaction coordinator sends lock requests to the centralized lock manager on behalf of the participating sites. In this case, a global update operation that involves n sites requires a minimum of 2n + 3 messages to implement the centralized 2PL method: 1 lock request from the transaction coordinator, 1 lock grant message from the centralized lock manager, n update messages from the transaction coordinator, n acknowledgement messages from the n participating sites, and 1 unlock request from the transaction coordinator, as illustrated in figure 8.3.
The main advantage of the centralized 2PL method in DDBMSs is that it is a straightforward
extension of the 2PL protocol for centralized DBMSs; thus, it is less complicated and
implementation is relatively easy. Deadlock detection can be handled easily, because the centralized lock manager maintains all the locking information. The communication cost is also relatively low. In the case of DDBMSs, the disadvantages of implementing the centralized 2PL method are system bottlenecks and lower reliability. For example, as all lock requests go to the centralized lock manager, that central site may become a bottleneck for the system. Similarly, the failure of the central site causes a major failure of the system; thus, it is less reliable. This method also hampers local autonomy.
Distributed 2PL
The distributed 2PL method implements a lock manager at each site of a distributed system. Thus,
the lock manager at each site is responsible for managing locks on data items that are stored
locally. If the database is not replicated, then distributed 2PL becomes the same as primary copy
2PL. If the database is replicated, then distributed 2PL implements a read-one-write-all
(ROWA) replica control protocol. In ROWA replica control protocol, any copy of a replicated
data item can be used for a read operation, but for a write operation, all copies of the replicated
data item must be exclusively locked before performing the update operation.
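The ROWA rule itself can be sketched in a few lines; the replica placement and the lock bookkeeping below are assumed purely for illustration. The point is only that a read touches a single copy, whereas a write must exclusively lock every copy before the update is applied.

```python
# Hypothetical replica placement: data item 'x' is replicated at three sites.
replicas = {'x': ['site1', 'site2', 'site3']}
x_locks = {site: {} for site in ('site1', 'site2', 'site3')}  # site -> {item: txn}

def rowa_read(txn, item):
    # Read-one: any single copy may be used, preferably the local one.
    site = replicas[item][0]
    return f"{txn} reads {item} from {site}"

def rowa_write(txn, item):
    # Write-all: every replica must be exclusively locked before updating.
    if any(x_locks[s].get(item) not in (None, txn) for s in replicas[item]):
        return f"{txn} must wait for exclusive locks on all copies of {item}"
    for s in replicas[item]:
        x_locks[s][item] = txn
    return f"{txn} holds exclusive locks on {item} at {replicas[item]}"

print(rowa_read('T1', 'x'))
print(rowa_write('T2', 'x'))
print(rowa_write('T3', 'x'))   # blocked: T2 already holds the exclusive locks
```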
In distributed 2PL, when a global transaction is initiated at a particular site, the transaction
coordinator (the transaction manager of that site is called transaction coordinator) sends lock
requests to lock managers of all participating sites. In response to the lock requests the lock
manager at each participating site can send a lock granted message to the transaction
coordinator. However, the transaction coordinator does not wait for the lock granted
message in some implementations of the distributed 2PL method. The operations that are to
be performed at a participating site are passed by the lock manager of that site to the
corresponding transaction manager instead of the transaction coordinator. At the end of the operation, the transaction manager at each participating site sends an “end of operation” message to the transaction coordinator. In an alternative approach, the transaction manager at
a participating site can also pass the “end of operation” message to its own lock manager,
who can then release the locks and inform the transaction coordinator. The communication
between the participating sites and the transaction coordinator when executing a global
transaction using distributed 2PL is depicted in figure 8.4.
In the distributed 2PL method, locks are handled in a decentralized manner; thus, it overcomes the drawbacks of the centralized 2PL method. This protocol ensures better reliability than the centralized 2PL and primary copy 2PL methods. However, the main disadvantage of this approach is that deadlock handling is more complex owing to the presence of multiple lock managers. In addition, the communication cost is much higher than in primary copy 2PL, as all replicas of a data item must be locked before an update operation. In the distributed 2PL method, a global update operation that involves n participating sites may require a minimum of 5n messages: n lock requests, n lock granted messages, n update messages, n acknowledgement messages and n unlock requests.
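The two message counts quoted above can be compared directly; the sketch below simply evaluates the formulas 2n + 3 and 5n for a global update involving n participating sites.

```python
def centralized_2pl_messages(n):
    # 1 lock request + 1 lock grant + n updates + n acknowledgements + 1 unlock request
    return 2 * n + 3

def distributed_2pl_messages(n):
    # n lock requests + n grants + n updates + n acknowledgements + n unlock requests
    return 5 * n

for n in (2, 5, 10):
    print(n, centralized_2pl_messages(n), distributed_2pl_messages(n))
# e.g. for n = 5: 13 messages under centralized 2PL versus 25 under distributed 2PL
```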
In figure 9.3, local deadlock detection is performed at the leaf nodes site1, site2, site3, site4 and site5. The deadlock detector at site6, namely DD12, is responsible for detecting any deadlock situation involving its child nodes site1 and site2. Similarly, site3, site4 and site5 send their LWFGs to site7
and the deadlock detector at site7 searches for any deadlocks involving its adjacent child nodes.
A global deadlock detector exists at the root of the tree that would detect the occurrence of
global deadlock situations in the entire distributed system. Here, the global deadlock detector
resides at site8 and would detect the deadlocks involving site6 and site7.
The performance of the hierarchical deadlock detection approach depends on the
hierarchical organization of nodes in the system. This organization should reflect the
network topology and the pattern of access requests to different sites of the network. This
approach is more reliable than centralized deadlock detection as it reduces the dependence
on a central site; also it reduces the communication cost. However, its implementation is
considerably more complicated, particularly with the possibility of site and communication
link failures. In this approach, detection of false deadlocks can also occur.
The LWFGs are merged so as to determine global deadlock situations. To avoid sites
transmitting their LWFGs to each other, a simple strategy is followed here. According to this
strategy, a timestamp value is allocated to each transaction and a rule is imposed such that a site Si transmits its LWFG to a site Sk only if a transaction, say Tk, at site Sk is waiting for a data item that is currently being held by a transaction Ti at site Si and ts(Ti) < ts(Tk). In that case, the site Sk adds the received information to its own LWFG and checks for cycles not involving the external node Tex in the extended graph. If there is no such cycle in the extended graph, the process continues; it may happen that the entire GWFG is eventually constructed and still no cycle is detected. In this case, it is
decided that there is no deadlock in the entire distributed system. On the other hand, if the
GWFG contains a cycle not involving the external node Tex, it is concluded that a deadlock
has occurred in the system. The distributed deadlock detection method is illustrated in figure
9.4.
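The merge-and-check step of this scheme can be sketched as follows, with assumed data structures rather than the exact algorithm: a site extends its LWFG with the edges received from another site and reports a deadlock only if the extended graph contains a cycle that does not pass through the external node Tex.

```python
def has_real_cycle(edges):
    """edges: set of (waiting_txn, holding_txn); cycles through 'Tex' are ignored."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)

    def dfs(node, path, visited):
        if node == 'Tex':          # a cycle through the external node is not a deadlock
            return False
        if node in path:           # back to a node on the current path: real cycle
            return True
        if node in visited:
            return False
        visited.add(node)
        return any(dfs(nxt, path | {node}, visited) for nxt in graph.get(node, ()))

    return any(dfs(n, set(), set()) for n in graph)

# Site Sk's LWFG merged with the LWFG transmitted by site Si:
merged = {('T2', 'T1'), ('T1', 'Tex')}
print(has_real_cycle(merged))                     # False: keep extending/forwarding
print(has_real_cycle(merged | {('T1', 'T2')}))    # True: T1 -> T2 -> T1 is a deadlock
```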
For practicals, see the slides on this topic discussed in class.
Chapter 10. Distributed Recovery Management