HSNetwork For DSystems
Department of Electrical Engineering - Systems, University of Southern California, Los Angeles, CA 90089-2565, and Department of Computer Science, University of California, Riverside, CA 92521-0304
Abstract

In a distributed database system (DDBS), the data is partitioned into smaller databases and distributed over multiple sites in a computer network. Associated with a DDBS are functions like query processing and concurrency control. Traditionally in DDBS research, the computer network has been considered a performance bottleneck, and a lot of research effort has been directed towards the design of database operations that minimize the data transmission cost. With the development of high speed networks, the network transmission costs go down, and new algorithms that efficiently utilize the huge bandwidth available are required. In this paper, we first identify the issues involved in developing this distributed application in a high speed environment. Then we demonstrate the inadequacy of existing database protocols in utilizing the Gigabit-per-second network. Finally, we develop a new concurrency control protocol that performs better than traditional DDBS in a high speed network. Both analytical and simulation results are presented. In this paper, we have concentrated on the concurrency control (CC) aspect of DDBS, since this protocol is at the heart of the overall functioning of the distributed system.
This research is supported in part by the National Science Foundation under grant No. NCR-9016348, by the Department of Defense Joint Services Electronics Program under contract No. F49620-91-0028, and by the Pacific Bell External Technology Program.
we describe a new concurrency control algorithm that works well in the high speed environment. The main goal of the datacycle and the proposed concurrency control scheme is to use the large bandwidth available to buy extra transaction throughput.
2 Main Issues
As mentioned earlier, for a DDBS in a high speed environment, the main motivation is to be able to trade transaction throughput with the available bandwidth. All of the following discussion is motivated with the aim of reducing the average transaction response time (which increases the transaction throughput) using a broadband network resource. Conventional networks are different from high speed networks in that the propagation delay in the former is small relative to the transmission delay of data. For example, the propagation delay across the United States (at the speed of light) is about 20 msec. In the emerging high-speed network, at say, 1 Gigabit/sec, it will only take 1 msec to transmit a 1 Megabit file, for a total delay of 21 msec. For a traditional network like the Internet, operating at 50 Kbits/sec, the transmission delay is 20 sec, for a total delay of 20020 msec. In addition, the transmission delay dominates the local processing delay at the computer sites. Therefore, most existing distributed database algorithms have focused on minimizing the volume of data to be transmitted. In a high speed environment, however, the emphasis now should be on reducing the total propagation and processing delay.

It has been postulated [1] that the processor speed has a great impact on the data communication performance. Much of the time (up to 90%) required by the transfer of data from one site to another is spent by the processor at each site (packetizing, etc.). Thus, it is even more important to have fast processors and to optimize the processing time. In recent years, however, with the availability of higher bandwidth, research has been undertaken to make the communication protocols faster.

An important criterion when dealing with a DDBS in a high speed network is that the transaction volume is expected to be quite large, because of the high capacity of the system. This has a significant effect on the design of the DDBS protocols. Some of the other distinguishing characteristics of high speed networks include high reliability (owing to the low bit-error-rates (BER) of the optical fiber transmission medium) and regular topology (to simplify routing). Although these two specific issues will not be considered in this paper, they are to be kept in mind for future work.

Summarizing, high speed networks are characterised by the following properties; the rest of this section is concerned with the way in which these properties affect our design of a DDBS.

- Large propagation delay compared to transmission delay.
- Large propagation delay compared to processing delay.
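To make the delay arithmetic above concrete, the short sketch below (illustrative only; it simply re-computes the numbers quoted in this section) treats the end-to-end delay as the sum of the transmission and propagation delays:

```c
/* Illustrative only: end-to-end delay = transmission delay + propagation
 * delay, using the coast-to-coast numbers quoted in the text. */
#include <stdio.h>

static double total_delay_ms(double file_bits, double rate_bps, double prop_ms)
{
    double transmission_ms = file_bits / rate_bps * 1000.0;
    return transmission_ms + prop_ms;
}

int main(void)
{
    double file_bits = 1e6;   /* 1 Megabit file                     */
    double prop_ms   = 20.0;  /* cross-US propagation delay, msec   */

    printf("50 Kbit/s : %.0f msec\n", total_delay_ms(file_bits, 50e3, prop_ms)); /* 20020 */
    printf("1 Gbit/s  : %.0f msec\n", total_delay_ms(file_bits, 1e9,  prop_ms)); /* 21    */
    return 0;
}
```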
3 Related Work
So far, not much work has been done in the area of distributed database systems on wide area high speed networks. Two notable attempts are the Fragment and Replicate query processing strategy designed for high speed local area networks, and the Datacycle Architecture that was aimed at designing a database machine with very high throughputs for primarily read-oriented transactions. These contributions are described in some detail in the following subsections.
Figure 1: A schematic diagram of the datacycle architecture

Strictly speaking, the datacycle architecture does not possess all the characteristics of a distributed database system. For instance, the data must be physically distributed over a computer network for the system to pass as a DDBS [22]. The datacycle architecture resembles a multi-processor database machine more than a DDBS. It is given such prominent coverage here since it is the only database system to date that has attempted to utilize the high bandwidth of optical fibers to provide higher transaction throughputs.
4.1 Disadvantages of Using the Datacycle Architecture as a Wide Area High Speed DDBS
The main disadvantages that would appear if the Datacycle architecture were to be used as a distributed database system in a high speed environment are enumerated in this section.

- It is mainly a read-oriented system. The update throughput is limited by the centralized Update Manager.
- The multiversion optimistic concurrency control scheme does introduce more concurrency and prevents multiple aborts of long transactions. However, the database size increases proportionately to the number of versions being maintained. This has the adverse effect of increasing the data-cycle duration, which in turn increases the query response time.
- Optimistic schemes for concurrency control usually do not perform very well under situations of high probability of conflict. With a very high volume of transactions coming into a high speed environment, the probability of conflict will be high too, and thus an optimistic scheme is not the appropriate choice.
- The central database pump is the most vulnerable component in this architecture. In case of a pump failure, the entire system shuts down.
- Data dependent transactions as well as transactions with large access sets incur high response times and have a high abort probability.

It is our intention to avoid (or at least minimize) the above problems in our design, and compare it with both traditional DDBS and the Datacycle system.
5 Performance Comparison of a Read-Only System with the Datacycle Architecture and Traditional Locking scheme
The concurrency control (CC) scheme is at the heart of any DDBS. The performance of a DDBS is largely determined by the efficiency of its concurrency control scheme. Much work has been done in the area of CC; [28] provides a comprehensive collection of CC schemes proposed in the available literature. The two-phase locking (2-PL) CC scheme is the currently accepted industry standard. It is our belief that the datacycle concept of periodically broadcasting the entire database works very well for a read-only system, and will outperform the standard 2-PL, given a communication network with a high enough data rate. This fact will be demonstrated with an example in this section using performance models developed by us. In conventional distributed databases, the data items reside at specific computer sites, and all requests to read or write a data item are directed to the respective computer site. It must be noted that in a DDBS where updates are not made, there is no need to exercise concurrency control, since reading a data item can be done simultaneously by two or more transactions. However, in a conventional DDBS running two-phase locking, even a query (a read-only transaction) would have to request the data items in its access set from the respective computer sites. In a pure read-only system, the lock-request messages may be renamed data-request messages. An example comparing the performance of a read-only system with traditional locking and a read-only system with the datacycle scheme follows. It is assumed that the communication network as well as the computer sites are perfectly reliable.
5.1 Example
The datacycle implementation of a read-only DDBS on a high speed network is considered first. Although ultimately it would make sense to consider a distributed version of the datacycle architecture, we look at the centralized data pump implementation here. In any case, for a read-only system, the distributed version would have the effect of producing a lower cycle time, and all our results here can be scaled accordingly. Next, a very small DDBS example, for illustrative purposes alone, is described.
The data cycle duration $T_{dc}$ and the transmission time per data item $T_s$ are

$$T_{dc} = \frac{G}{C} \qquad \text{and} \qquad T_s = \frac{T_{dc}}{D} \qquad (1)$$

where G is the size of the broadcast database, C is the network data rate and D is the number of data items. The time $\tau$ to access a single data item is then uniformly distributed over an interval of length $T_{dc}$:

$$f(\tau) = \begin{cases} \dfrac{1}{T_{dc}}, & T_s < \tau < T_{dc} + T_s \\[4pt] 0, & \text{otherwise} \end{cases} \qquad (2)$$
(3) (4)
This is not true in reality, but since it does not affect our results in any significant manner, we shall make this assumption.
Figure 2: DDBS architecture for (a) the datacycle scheme (b) the locking scheme

The minimum time to access a data item by a site is zero (if the data item resides at that same site) and the maximum time to access a data item is $N T_p + T_{tr}$ (the sum of the propagation delay for both the access request and the data itself, and the transmission delay for the data; the transmission delay for the access request is neglected owing to the small size of the access request). Let us denote by $\tau$ the time to access a single data item, as before, and assume that it is uniformly distributed in $(0, N T_p + T_{tr})$. Now,
$$f(\tau) = \begin{cases} \dfrac{1}{N T_p + T_{tr}}, & 0 < \tau < N T_p + T_{tr} \\[4pt] 0, & \text{otherwise} \end{cases} \qquad (5)$$
Let $\tau_k$ denote the time taken to access all the data items in an access set of cardinality k. With pre-declared access sets, all the data items can be requested (or scanned for) simultaneously, so that

$$\tau_k = \max(\tau_1, \tau_2, \ldots, \tau_k) \qquad (6)$$

The random variables $\tau_1, \ldots, \tau_k$ are independent and identically distributed with the uniform densities $f(\tau)$ in Equations 2 and 5, respectively, for the datacycle implementation and the 2-PL scheme. Hence,
$$f_{\tau_k}(\tau) = k\, f(\tau)\, \left(F(\tau)\right)^{k-1} \qquad (7)$$

where $F(\tau)$ is the probability distribution function of the random variable $\tau$. Thus, from the above probability density function of $\tau_k$, the mean $\bar{\tau}_k$ and second moment $\overline{\tau_k^2}$ may be easily calculated. As mentioned earlier, the access set cardinality k of a transaction ranges from 1 to K with probability $g_k$. We are interested in the mean $\bar{X}$ and second moment $\overline{X^2}$ of the random variable X, which denotes the time for any transaction to access all the data items in its access set. Clearly,
$$\bar{X} = \sum_{k=1}^{K} \bar{\tau}_k\, g_k \qquad \text{and} \qquad \overline{X^2} = \sum_{k=1}^{K} \overline{\tau_k^2}\, g_k \qquad (8)$$
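For completeness, the moments $\bar{\tau}_k$ and $\overline{\tau_k^2}$ used in Equation 8 can be written in closed form. Assuming, as in Equations 2 and 5, that the single-item access time is uniform on an interval (a, b) (with (a, b) = ($T_s$, $T_{dc} + T_s$) for the datacycle scheme and (0, $N T_p + T_{tr}$) for locking), Equation 7 gives

$$\bar{\tau}_k = \int_a^b \tau\, k f(\tau) F(\tau)^{k-1}\, d\tau = a + (b-a)\,\frac{k}{k+1}, \qquad \overline{\tau_k^2} = a^2 + 2a(b-a)\,\frac{k}{k+1} + (b-a)^2\,\frac{k}{k+2}.$$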
Let the random variable E denote the transaction execution time. Also, let $\bar{E}$ and $\overline{E^2}$ denote the first and second moments of E respectively. As defined in earlier sections, the random variable X denotes the time duration to access all the data items in the access set of the transaction being processed. The transaction service time is denoted by S, which is the sum of the data access time and the execution time. We assume that the execution does not begin until all the data items have been read. Thus, the following equations may be written, assuming that the execution time and the data access time are independent.

$$S = X + E, \qquad \bar{S} = \bar{X} + \bar{E}, \qquad \text{and} \qquad \overline{S^2} = \overline{X^2} + \overline{E^2} + 2\,\bar{X}\,\bar{E} \qquad (9)$$
The transaction execution time here is assumed to be the same for all transactions. This may not be true in general, and a better assumption perhaps might be that the transaction execution time is a function of the cardinality of its access set. Then E = E(k), where k is the cardinality of the transaction access set. In that case, we define

$$\bar{S}_k = \bar{\tau}_k + E(k) \qquad \text{and} \qquad \overline{S_k^2} = \overline{\tau_k^2} + E^2(k) + 2\,\bar{\tau}_k\, E(k) \qquad (10)$$

and

$$\bar{S} = \sum_{k=1}^{K} \bar{S}_k\, g_k \qquad \text{and} \qquad \overline{S^2} = \sum_{k=1}^{K} \overline{S_k^2}\, g_k \qquad (11)$$
Transactions queue up at each of the computer sites at an average arrival rate of $\lambda$ transactions per second, and are serviced with an average service time of $\bar{S}$ and second moment $\overline{S^2}$. Now, the average time spent by each transaction in the system, R, may be calculated using the standard M/G/1 formula [29] as given below.

$$R = \bar{S} + \frac{\lambda\, \overline{S^2}}{2\,(1 - \lambda \bar{S})} \qquad (12)$$
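As an illustration of how Equations 1, 2, 7-9 and 12 fit together, the short C sketch below recomputes one analytical point for the read-only datacycle case. It is a sketch only: the database size G, the arrival rate and the execution time are assumed values chosen for illustration, not necessarily those behind the figures.

```c
/* Sketch (not the paper's code): one analytical response-time point for
 * the datacycle scheme with pre-declared access sets, using Eqs. 1, 2,
 * 7-9 and 12.  Parameter values below are illustrative assumptions. */
#include <stdio.h>

int main(void)
{
    double C   = 1.5e9;        /* network data rate, bits/s (assumed)      */
    double G   = 50e6;         /* broadcast database size, bits (assumed)  */
    double D   = 100.0;        /* number of data items                     */
    int    K   = 10;           /* maximum access-set cardinality           */
    double E   = 0.0;          /* execution time, s (the E = 0 case)       */
    double lam = 20.0;         /* transaction arrival rate, 1/s (assumed)  */

    double Tdc = G / C;        /* data cycle time, Eq. 1                   */
    double Ts  = Tdc / D;      /* transmission time per data item, Eq. 1   */

    /* Moments of X: max of k i.i.d. uniforms on (Ts, Ts + Tdc), Eqs. 7-8. */
    double X1 = 0.0, X2 = 0.0;
    for (int k = 1; k <= K; k++) {
        double gk = 1.0 / K;                       /* equally likely cardinalities */
        double m1 = Ts + Tdc * k / (k + 1.0);      /* E[tau_k]                      */
        double m2 = Ts * Ts + 2.0 * Ts * Tdc * k / (k + 1.0)
                    + Tdc * Tdc * k / (k + 2.0);   /* E[tau_k^2]                    */
        X1 += gk * m1;
        X2 += gk * m2;
    }

    /* Service-time moments (Eq. 9, deterministic execution time). */
    double S1 = X1 + E;
    double S2 = X2 + E * E + 2.0 * X1 * E;

    /* Average response time, M/G/1 (Eq. 12). */
    double rho = lam * S1;
    if (rho >= 1.0) { printf("unstable: rho = %.3f\n", rho); return 1; }
    double R = S1 + lam * S2 / (2.0 * (1.0 - rho));
    printf("R = %.2f msec at lambda = %.0f/s\n", R * 1e3, lam);
    return 0;
}
```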
Figure 3: Comparison of datacycle and traditional locking for a read-only system with pre-declared access sets: E=0
The example considered has N = 10 computer sites. Deadlocks were taken into consideration in the simulation, although the deadlock resolution time was not included in the overall query response time. Also, the access time of a single data item was assumed to be uniformly distributed. Results of the comparison between the datacycle scheme and traditional locking are presented in Figure 3 for three different values of the network speed C, viz., 1, 1.5 and 2 Gbps, with zero execution time. Figure 4 contains the same results for the datacycle scheme with an average execution time of 50 msec. The execution time was assumed to be exponentially distributed and independent of the cardinality of the access set. The maximum data access set cardinality K has been assigned a value of 10, with $g_k = 1/K$, k = 1, ..., K (i.e., all access set cardinalities between 1 and K are equally likely). A comparison of the simulation and analysis results for the datacycle and the traditional locking scheme (with zero execution time) is given in Figures 5 and 6 respectively. Clearly, the simulation and analytical results match fairly well for both cases. The simulations, written in the C programming language, typically executed 50,000 transactions and required a few minutes of CPU time (SUN Sparc Workstation) per data point. The analysis typically took some fraction of a second to compute all the necessary data points. As the speed of the network is increased, the maximum sustainable transaction arrival rate also increases as before. However, as the average query execution time increases, increasing the network data rate is not as effective as before in improving the query throughput. This is a fairly obvious result, since the bottleneck device is now the database machine that processes the queries. Figure 4 has been provided here to stress the fact that there is little point in implementing distributed databases on very high speed networks if the query processing capabilities of database machines cannot keep up. From Figure 3, it is evident that the datacycle scheme performs worse than the 2-PL scheme at
Figure 5: Comparison of simulation and analysis of the datacycle scheme for a read-only system with pre-declared access sets: E=0
Figure 6: Comparison of simulation and analysis of traditional locking for a read-only system with pre-declared access sets: E=0
a low data rate (1 Gbps); the two schemes perform almost equally at an intermediate data rate (1.5 Gbps); and at a high data rate (2 Gbps), the datacycle scheme outperforms the traditional 2-PL scheme. The same pattern is noticed for both data access distributions. Further, it is interesting to note that the performance of the 2-PL scheme does not change very much with an increase in the data rate. This is to be expected, since the 2-PL scheme is limited by the propagation delay between sites, which is unaffected by the network data rate. On the other hand, since the performance of the datacycle scheme directly depends on the transmission time of the entire database, its performance improves drastically as the data rate increases. Thus, there is a cross over point in the speed of the network beyond which the datacycle scheme does better than the traditional 2-PL system. This cross over occurs just over 1.5 Gbps for both data access distributions, as can be seen in the figures provided. It would be very useful to be able to theoretically derive this cross over point, and this is what is done in the following sub-section. The datacycle scheme, however, suffers from the extra cost involved in equipping the computer sites with sophisticated data filters to read the data on the broadcast bus "on the fly" and having a reliable centralized data pump.
C is the network data rate at which the cross over in performance occurs, and the two quantities being equated are the data access times

$$T_A = T_{dc} \;\left(= \frac{G}{C}\right) \qquad (13)$$

$$T_B = N\,T_p + T_{tr} \;\left(= N\,T_p + \frac{G}{DC}\right) \qquad (14)$$

Now, $T_A = T_B$ is solved for C, with G = 0.05, D = 100 data items, N = 10 computer sites, and $T_p$ calculated using Equation 3 with L = 3000. C is found to be 1.534 Gbps, quite close to the value obtained from the figures.
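Taking the parenthetical forms of Equations 13 and 14 at face value (a sketch of the algebra, not the paper's own derivation), the crossover rate follows in one step:

$$\frac{G}{C} = N T_p + \frac{G}{DC} \;\Longrightarrow\; \frac{G}{C}\left(1 - \frac{1}{D}\right) = N T_p \;\Longrightarrow\; C = \frac{(D-1)\,G}{D\, N\, T_p}\,.$$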
Let $P^k_n$ denote the probability that all the k data items in the readset can still be accessed in the current cycle, given that n out of the D data items have already gone past:

$$P^k_n = \begin{cases} \dbinom{D-n}{k} \bigg/ \dbinom{D}{k}, & k \le D-n \\[8pt] 0, & k > D-n \end{cases} \qquad (15)$$
$$Q^k_{n,m} = \Pr\{m \text{ of the } k \text{ data items are accessed in the same cycle} \mid n \text{ out of } D \text{ data items have gone past}\} = \frac{\dbinom{D-n}{m}\dbinom{n}{k-m}}{\dbinom{D}{k}}, \qquad m \le D-n \text{ and } k-m \le n \qquad (16)$$
Since the processing of a query may begin anywhere within a data cycle,

$$P_n = \Pr\{n \text{ out of } D \text{ data items have gone past}\} = \frac{1}{D}, \qquad n = 1, 2, \ldots, D$$
Let $X^k_n$ be the average time for a query to access all the k data items in its readset, given that n of the data items in the current cycle have already gone past the site. Equation 18 is the formula for $X^k_n$ and has three terms. The first term accounts for the case where all the k data items are accessed in the current cycle. The second term is for the case where all of the k required data items have gone past. Here, the query has to wait until the end of the current cycle, which is the time required for the remaining (D - n) data items to go past. As per Equation 1, the transmission time per data item is $T_s$. Assuming that queries arrive exactly at the beginning of a data item time slot, the remaining time until the end of the current data cycle is $(D-n)T_s$. Then, since all the k data items are among the first n data items in the new cycle, the time to access the data items can be a maximum of $nT_s$. The third term considers a scenario where only m out of the k data items can be accessed in the current cycle. Owing to the data consistency criterion, all the data items have to be accessed in the same cycle. Thus, again the query has to wait until the end of the current cycle and access all the k data items in the new cycle. Figure 7 should help clarify the scenario being considered.

$$X^k_n = P^k_n\, \Gamma^k_{1,D-n} + Q^k_{n,0}\left[(D-n)T_s + \Gamma^k_{1,n}\right] \qquad (17)$$
$$\qquad\qquad + \sum_{m=1}^{k-1} Q^k_{n,m}\left[(D-n)T_s + \Gamma^m_{n,D}\right] \qquad (18)$$
$\Gamma^r_{i,j}$ denotes the average time required to access r data items, the access times of each of which are uniformly distributed in $[iT_s, jT_s]$, $i < j$. This parameter may be easily calculated as done before. Similarly, the second moment $\overline{(X^k_n)^2}$ may be computed.
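As a small worked step (added here for concreteness), if $\Gamma^r_{i,j}$ is taken to be the mean of the maximum of r i.i.d. access times uniform on $[iT_s, jT_s]$, then

$$\Gamma^r_{i,j} = iT_s + (j-i)T_s\,\frac{r}{r+1}\,,$$

and the corresponding second moment is $(iT_s)^2 + 2\,iT_s\,(j-i)T_s\,\frac{r}{r+1} + \left((j-i)T_s\right)^2\frac{r}{r+2}$.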
Figure 7: Timing diagram for a query that arrives when n data items have gone past

Now, $\bar{X}^k$, the average time required by a query having k data items in its readset, may be evaluated as below.
$$\bar{X}^k = \sum_{n=1}^{D} X^k_n\, P_n \qquad \text{and} \qquad \overline{(X^k)^2} = \sum_{n=1}^{D} \overline{(X^k_n)^2}\, P_n \qquad (19)$$
The average time $\bar{X}$ and its second moment $\overline{X^2}$ required by a query to access its entire readset, averaged over all the K classes of queries, are given below. From earlier sections, $g_k$ = Prob{the cardinality of the readset of a query = k}.

$$\bar{X} = \sum_{k=1}^{K} \bar{X}^k\, g_k \qquad \text{and} \qquad \overline{X^2} = \sum_{k=1}^{K} \overline{(X^k)^2}\, g_k \qquad (20)$$
Now the average response time of a query may be calculated using Equation 12. The graphs in Figure 3 correspond to the case where there are no updates in the system. The effect of updates will be to shift the graphs to the left. In Figure 8, the graphs corresponding to the case where all the data items are updated at the beginning of each cycle are presented. For easy comparison, the throughput curves of the update-free case are also provided in the same figure. It is interesting to note (by comparing Figures 3 and 8) that at a higher data rate (of 2 Gbps), the datacycle scheme in an update-intensive environment performs slightly better than the traditional locking protocol in an update-free environment. As the data rate is further increased, the margin of improvement in performance is expected to increase. Since some assumptions were made to simplify the analysis (basically, the same assumptions made earlier), a simulation was done to validate the analysis. The comparison between the analysis and simulation results is presented in Figure 9. For low loads, the simulation and analysis results match very well, while at higher loads the results deviate a little. This is owing to the fact that at high loads there is a dependency between the response times of successive queries, which is not taken into account in the analysis.
Figure 8: The effect of updates on the query throughput in the datacycle scheme: E = 0

Figure 9: Comparison of simulation and analysis of the effect of frequent updates on the query throughput in the datacycle scheme: E = 0
From the previous sections, it is clear that the datacycle concept works well for a read-only system. We have looked at some other concurrency control (CC) strategies with respect to implementing them in a DDBS in a high speed environment. Specifically, we have looked at the datacycle implementation of update transactions, and the standard two-phase locking. The shortcomings of these schemes are highlighted below. In this section, we develop a new CC protocol termed Send-on-Demand, bearing in mind the various requirements of a DDBS in a high speed environment. Our database correctness notion is that of conflict serializability [22, 28]. To avoid rollbacks to the maximum possible extent, we use the consistency criteria of the highest degree (degree 3) as defined by Gray et al. in [22, 30].

CC problems in the datacycle architecture:
1. The centralized update manager is the ultimate performance and reliability bottleneck.
2. A multiversion optimistic algorithm is used that may not work very efficiently under conditions of high data contention (which is to be expected in a DDBS in a high speed environment).

CC problems in locking:
1. There is a fair possibility of deadlocks.
2. There is also an overhead associated with the sequence of lock request, lock grant and lock release messages while accessing any data item.
Figure 11: Structure of the claim queue for a data object

At system start-up time, all the data items reside at certain sites. As the transactions start arriving, the claim queues start filling up. Every site transmits each of its resident data items to the site where the first confirmed transaction in the respective claim queue was initiated. To improve overall efficiency, each site may transmit the remainder of the associated claim queue along with the data item, although construction of the claim queue may also be done by processing the broadcast information stored earlier. While the claim queue is being transmitted, some other transactions requiring that data item may arrive in the system (broadcast their access set information and subsequently get confirmed). The information about these transactions may be derived from the broadcast information. Every transaction-initiating site waits for the entire access set to arrive at its location, finishes processing the transaction, and then sends out the data items to other sites that are next on the respective claim queues. Clearly, this mechanism eliminates the necessity to unlock data items, leading to better performance. In traditional systems, to maintain the atomicity of transactions, special commit protocols like the two-phase commit [22] may have to be implemented. However, with a little thought, it is clear that only an inexpensive local commit protocol is required here, thus enhancing performance at another level. Since each site maintains a claim queue for only those data items that currently reside at that site, the memory requirement is not very large. However, each site has to allocate memory to store the access set information of confirmed transactions. A typical update transaction has a readset as well as a writeset. Thus a typical claim queue for a data item has read entries as well as write entries. Now, consecutive read entries can be processed in parallel, while the write entries have to be served serially. Consider the claim queue of a single data item Dj, where there are two write entries separated by a few read entries. After the first write has been completed by the site that currently possesses the write copy of Dj, it sends the updated version of Dj to all sites that have a read entry in the claim queue before the next write. The actual write copy is sent to the site that is responsible for the last read-entry before the second write entry. However, after each transaction that needed to read Dj is completed, there should be a mechanism to inform the site that needs to write on Dj. This is achieved if the sites reading Dj send a message to the site that is waiting to write on Dj the moment the update requiring the read is over. The site that has the write copy transmits the write copy of Dj along with the message,
and then the update at that site can begin. One of the important assumptions made here is that the confirmation duration is set to a value such that total ordering of the messages is achieved, i.e., every site sees the same ordering of confirmed transactions in a claim queue. In a real system, messages may be delayed and the broadcast mechanism may have to be sophisticated enough to guarantee total ordering even under those circumstances. Further work is planned on studying the performance of the send-on-demand protocol under the relaxed assumption that total ordering is not met by the broadcasting mechanism. To illustrate the working of the send-on-demand protocol, a simple example has been worked out in Figure 12 with a 3-node DDBS and 3 data items. The communication links are labelled with the communication time (transmission + propagation delay, the major component of which in high speed networks is the propagation time).
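The forwarding rule described above (read copies to every reader ahead of the next write, the write copy to the next writer, as in Figure 14) can be sketched in a few lines of C. This is an illustration only; the type and function names (claim_entry, release_item, send_to) are hypothetical, and message transport as well as the completion-message handshake are abstracted away.

```c
/* Hypothetical sketch of claim-queue processing in the send-on-demand
 * protocol.  All names are illustrative; sending a data item is
 * abstracted by send_to(). */
#include <stdio.h>

typedef enum { READ_ENTRY, WRITE_ENTRY } entry_kind;

typedef struct claim_entry {
    int site;                   /* site that initiated the transaction */
    entry_kind kind;            /* read entry or write entry           */
    struct claim_entry *next;   /* next confirmed transaction in queue */
} claim_entry;

static void send_to(int site, const char *what, int item)
{
    printf("item D%d: %s sent to site %d\n", item, what, site);
}

/* Called at the site holding the write copy of item Dj once its own
 * update of Dj is finished: read copies go to every read entry before
 * the next write; the write copy goes to the next write entry. */
static claim_entry *release_item(int item, claim_entry *queue)
{
    claim_entry *e = queue;
    while (e != NULL && e->kind == READ_ENTRY) {
        send_to(e->site, "read copy", item);
        e = e->next;
    }
    if (e != NULL)                       /* e is the next write entry   */
        send_to(e->site, "write copy", item);
    return e;                            /* remainder of the claim queue */
}

int main(void)
{
    /* Claim queue for D1: two readers followed by a writer. */
    claim_entry w  = { 3, WRITE_ENTRY, NULL };
    claim_entry r2 = { 2, READ_ENTRY,  &w  };
    claim_entry r1 = { 1, READ_ENTRY,  &r2 };
    release_item(1, &r1);
    return 0;
}
```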
Figure 12: A simple example illustrating send-on-demand execution

Any query executed within a single data cycle is guaranteed to be correct, since updates are incorporated in the broadcast stream only at the beginning of the cycle. The information may not be the most up-to-date, but the result of the query will be consistent. However, in the case of the distributed datacycle implementation, it is not very clear how consistent data may be derived, since the updates of data items affected by the same update operation may be reflected in the different broadcast streams at different times.

Figure 12: Example (continued)

The data items read from one broadcast stream are of course still consistent within a single cycle. The example below will demonstrate how this problem is tackled in our system.

Example

Consider a query that has three data items, say D1, D2 and D3, in its readset. Let the pair $(TS_{D_j}, S_{D_j})_t$ represent the timestamp information, as read by a particular site at time t, where $TS_{D_j}$ is the write timestamp of the data item $D_j$ and $S_{D_j} = \{D_{i_1}, \ldots, D_{i_k}\}$ is the set of k data items (including $D_j$) updated by the same update that created the latest version of $D_j$. The second piece of information may be derived from the transaction ID information. Without any loss of generality, let $t_1$, $t_2$ and $t_3$ be the respective points in time at which data items D1, D2 and D3 are available to the query in question. Figure 13a illustrates this scenario, where the data items in the readset are available on two different broadcast channels.
In Figure 13a, the timestamp information read is

$(TS_{D_1}, S_{D_1})_{t_1} = (ts_1, \{D1, D4\})$
$(TS_{D_2}, S_{D_2})_{t_2} = (ts_2, \{D2, D3\})$
$(TS_{D_3}, S_{D_3})_{t_3} = (ts_3, \{D3\})$

If $ts_2 \le ts_3$, it is obvious that no consistency criterion is violated, since no conflicting updates are reflected in the versions being read. Figure 13b depicts the case when $ts_2 > ts_3$. In this case, an update that should have been reflected in the version of D3 has not yet taken effect, and using these versions of the three data items to compute the query would give inconsistent results. Hence, the site has to wait until a consistent set of versions can be obtained. Suppose

$(TS_{D_1}, S_{D_1})_{t_4} = (ts_4, \{D1, D2\})$
$(TS_{D_2}, S_{D_2})_{t_5} = (ts_4, \{D1, D2\})$
$(TS_{D_3}, S_{D_3})_{t_6} = (ts_2, \{D2, D3\})$
Let $ts_4 > ts_2$. Now, D3 has been updated to the new version. However, in the meantime, the versions of D1 and D2 have changed. Thus the data versions at time instants $t_1$, $t_2$ and $t_6$ will constitute a consistent set. From the above example, the consistency criteria for queries may be developed. A query that requires k data items, say $D_1, \ldots, D_k$ without loss of generality, may read those versions of the data items that satisfy at least one of the following conditions.
For any two data items $D_m$ and $D_n$, $1 \le m, n \le k$ and $m \ne n$: if $D_m \in S_{D_n}$ or $D_n \in S_{D_m}$, then at least one of $S_{D_m}$ or $S_{D_n}$ contains both $D_m$ and $D_n$. Let $S_{D_m}$ be that update set. Then, the set of versions accessed is consistent if $TS_{D_m} \le TS_{D_n}$.

The integrated send-on-demand algorithm is summarized in Figure 14.
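As an illustration of the pairwise check implied by this condition, the following C sketch tests whether a set of versions read by a query is mutually consistent. The types are hypothetical (version, MAX_SET); the update set S_Dj is represented as a small array of item identifiers.

```c
/* Hypothetical sketch of the version-consistency check for a query
 * readset, following the pairwise condition above.  v.ts holds the
 * write timestamp TS_Dj; v.set holds the update set S_Dj. */
#include <stdbool.h>
#include <stdio.h>

#define MAX_SET 8

typedef struct {
    int  item;                /* identifier of the data item Dj      */
    long ts;                  /* write timestamp TS_Dj               */
    int  set[MAX_SET];        /* update set S_Dj (item identifiers)  */
    int  set_len;
} version;

static bool in_set(const version *v, int item)
{
    for (int i = 0; i < v->set_len; i++)
        if (v->set[i] == item) return true;
    return false;
}

/* Returns true if the k versions read are mutually consistent. */
static bool versions_consistent(const version *v, int k)
{
    for (int m = 0; m < k; m++) {
        for (int n = 0; n < k; n++) {
            if (m == n) continue;
            /* A conflict exists only if one item appears in the other's set. */
            if (!in_set(&v[m], v[n].item) && !in_set(&v[n], v[m].item))
                continue;
            /* Let S_Dm be the update set containing both items ...          */
            if (!(in_set(&v[m], v[m].item) && in_set(&v[m], v[n].item)))
                continue;      /* handled with roles swapped in the (n,m) pass */
            /* ... then the versions are consistent only if TS_Dm <= TS_Dn.   */
            if (v[m].ts > v[n].ts)
                return false;
        }
    }
    return true;
}

int main(void)
{
    /* The inconsistent case from the example: ts2 > ts3 while D3 is in S_D2. */
    version v[3] = {
        { 1, 10, {1, 4}, 2 },   /* D1: (ts1, {D1, D4}) */
        { 2, 30, {2, 3}, 2 },   /* D2: (ts2, {D2, D3}) */
        { 3, 20, {3},    1 }    /* D3: (ts3, {D3}), ts3 < ts2 */
    };
    printf("consistent: %s\n", versions_consistent(v, 3) ? "yes" : "no");
    return 0;
}
```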
All Sites:
- Update claim queues as and when transactions are confirmed.
- If responsible for broadcasting a fragment of the database, append timestamp information to the data items as the updated data items arrive.
- Remove the update transaction information from the claim queues as and when they depart, by checking the timestamp information in the broadcast streams.

Sites processing a query:
- Scan broadcast channels on which data items in the query access set are being broadcast.
- Read only those copies that are mutually consistent by checking the respective timestamp information.
- Execute the query.

Sites processing an update:
- Broadcast the entire access set (readset + writeset) of the update to all the sites.
- Wait for the following events:
  - All the data items in the access set arrive.
  - The "completion" message arrives from the transactions that hold a read-copy of a data item in the writeset.
- Execute the update.
- Send a copy of each updated data item (along with the time at which the update was executed) to the respective sites responsible for broadcasting them.
- If holding a read-copy of data item Dj, send a "completion" message to the next write in the claim queue for Dj.
- If holding the write-copy of Dj, send a read-copy of Dj to all sites which have a read entry in the claim queue before the next write, and send the write-copy to the next write.
Figure 14: The Integrated Send-on-Demand CC Algorithm

It is clear that the send-on-demand protocol does better in a high contention scenario. At the lower data rate of 50 Mbps, the performance curves for the two protocols are fairly close, and this is because the data transmission time is a dominant factor. At even lower data rates (or in local area networks, where the propagation delay is insignificant), the performance of both protocols is about the same. As the data rate is increased (and the effect of the propagation delay dominates), send-on-demand outperforms locking. At a low transaction arrival rate, the average response time for locking is lower than that in send-on-demand, and that is because of the initial broadcasting overhead in send-on-demand. As the arrival rate is increased, the queueing delay for the transactions in the locking case dominates and consequently the response time increases.
Figure 15: Comparison of send-on-demand and traditional locking for a write-only system (K = 1, 20% of the updates access the same data item; dotted line: locking, solid line: send-on-demand; data rates of 50 Mbps and 1 Gbps)
The send-on-demand protocol is also propagation delay bound, i.e., its performance does not improve with the speed of the network after a certain threshold data rate. Further, at low data contention, the standard 2-PL CC protocol performs better than send-on-demand because of the overhead of the broadcast phase in the latter. Thus, a CC algorithm that is a hybrid of the send-on-demand and locking schemes might prove to be better, and is under active consideration. As part of the future work, the operation of the send-on-demand protocol in the face of network and computer site failures is being considered. The effect of different data granularities, different data access distributions and multiple copies of data items on the performance of the send-on-demand protocol is also being looked at. Data security and network architecture issues in the context of this new protocol are also being studied.
Acknowledgements
The authors would like to thank the anonymous reviewers for their detailed comments on the original manuscript. They were of great assistance in improving the overall presentation of this paper.
For data-dependent transactions, the data items in the access set must be accessed one after another, so the time to access the entire access set is the sum of the individual access times:

$$\tau_k = \tau_1 + \tau_2 + \cdots + \tau_k \qquad (21)$$

Again, we make the assumption that the random variables $\tau_1, \ldots, \tau_k$ are independent and identically distributed with the uniform distributions in Equations 2 and 5, as the case may be. Using the characteristic function of the p.d.f. $f(\tau)$, denoted by $\phi(s)$, the characteristic function of $\tau_k$, $\phi_{\tau_k}(s)$, may be calculated as below:

$$\phi_{\tau_k}(s) = \left[\phi(s)\right]^k \qquad (22)$$
Now, the first and second moments of $\tau_k$, viz., $\bar{\tau}_k$ and $\overline{\tau_k^2}$, may be calculated by standard procedures. As before, Equation 8 may be used to calculate $\bar{X}$ and $\overline{X^2}$. The average response time of a query, R, may be calculated as in Equation 12. The results of the comparison of the datacycle and the traditional scheme for data-dependent transactions are provided in Figure 17, with the same parameters as in the previous section. A comparison of simulation and analysis for both cases is presented in Figures 18 and 19.
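The "standard procedures" referred to above amount to differentiating $\phi_{\tau_k}(s)$ at s = 0; equivalently, for the sum of k i.i.d. access times with mean $\bar{\tau}$ and second moment $\overline{\tau^2}$ (a worked step added for completeness),

$$\bar{\tau}_k = k\,\bar{\tau} \qquad \text{and} \qquad \overline{\tau_k^2} = k\,\overline{\tau^2} + k(k-1)\,\bar{\tau}^2\,.$$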
Figure 17: Comparison of datacycle and traditional locking for a read-only system with data dependent transactions: Ei = 0
Figure 18: Comparison of simulation and analysis of the datacycle scheme for a readonly system with data dependent transactions: Ei = 0
Figure 19: Comparison of simulation and analysis of traditional locking for a read-only system with data dependent transactions: Ei = 0
References
[1] W. Luk, X. Wang, and F. Ling, "On the Communication Cost of Distributed Database Processing," in Proceedings of the International Conference on Distributed Computing Systems, pp. 528-535, 1988.
[2] C. Date, An Introduction to Database Systems, vol. I. Addison-Wesley, 4th ed., 1986.
[3] H. Garcia-Molina, R. J. Lipton, and J. Valdes, "A Massive Memory Machine," IEEE Transactions on Computers, vol. C-33, pp. 391-399, May 1984.
[4] A. Ammann, M. Harahan, and R. Krishnamurthy, "Design of a Memory Resident DBMS," in Proceedings of the IEEE Spring Computer Conference COMPCON-85, (San Francisco, California), pp. 54-57, February 1985.
[5] D. Dewitt, R. Katz, F. Olken, L. Shapiro, M. Stonebraker, and D. Wood, "Implementation Techniques for Main Memory Database Systems," in Proceedings of the 1984 ACM SIGMOD Conference, (Boston, Massachusetts), pp. 1-8, June 1984.
[6] M. H. Eich, "MARS: The Design of a Main Memory Database Machine," in Proceedings of the International Workshop on Database Machines, pp. 468-481, October 1987.
[7] M. H. Eich, "Main Memory Database Research Directions," in Proceedings of the Sixth International Workshop on Database Machines (H. Boral and P. Faudemay, eds.), (Deauville, France), pp. 251-268, Springer-Verlag, June 1989.
[8] J. Gray and F. Putzolu, "The 5-minute Rule for Trading Memory for Disc Accesses and the 10-byte Rule for Trading Memory for CPU time," in Proceedings of the ACM SIGMOD Conference, (San Francisco, California), pp. 395-398, May 1987.
[9] R. Hagmann, "A Crash Recovery Scheme for a Memory Resident Database System," IEEE Transactions on Computers, vol. C-35, pp. 839-843, September 1986.
[10] T. J. Lehman and M. J. Carey, "A Recovery Algorithm for a High Performance Memory Resident Database System," in Proceedings of the ACM SIGMOD Conference, (San Francisco, California), pp. 104-117, May 1987.
[11] K. Salem and H. Garcia-Molina, "System M: A Transaction Processing Testbed for Memory Resident Data," IEEE Transactions on Knowledge and Data Engineering, vol. 2, pp. 161-172, March 1990.
[12] J. Gray, B. Good, D. Gawlick, P. Homan, and H. Sammer, "One Thousand Transactions per Second," in Proceedings of the IEEE Spring Computer Conference COMPCON-85, (San Francisco, California), pp. 96-101, February 1985.
[13] S.-C. Shyu, Design and Performance Analysis of Locking Algorithms for Distributed Databases. PhD thesis, University of Southern California, Los Angeles, CA 90089, December 1989.
[14] S.-C. Shyu and V. O. Li, "Performance Analysis of Static Locking in Distributed Database Systems," IEEE Transactions on Computers, vol. 39, pp. 741-751, June 1990.
[15] A. Chen, D. Brill, M. Templeton, and C. Yu, "Distributed Query Processing in a Multiple Database System," IEEE Journal on Selected Areas in Communications, vol. 7, pp. 390-398, April 1989.
[16] M. Templeton, D. Brill, S. Dao, E. Lund, P. Ward, A. Chen, and R. MacGregor, "Mermaid - A Front-End to Distributed Heterogeneous Databases," Proceedings of the IEEE, vol. 75, pp. 695-708, May 1987.
[17] C. Yu, K. Guh, D. Brill, and A. Chen, "Partitioning Relation for Parallel Processing in Fast Local Networks," in Proceedings of the IEEE 1986 Parallel Processing, pp. 1021-1028, 1986.
[18] C. Yu, K. Guh, D. Brill, and A. Chen, "Partition Strategy for Distributed Query Processing in Fast Local Networks," IEEE Transactions on Software Engineering, vol. 15, pp. 780-793, June 1989.
[19] C. Yu, K. Guh, C. Chang, C. Chen, M. Templeton, and D. Brill, "An Algorithm to Process Queries in a Fast Distributed Network," in Proceedings of the IEEE 1984 Real Time Systems Symposium, pp. 115-122, 1984.
[20] C. Yu, K. Guh, W. Zhang, M. Templeton, and A. C. D. Brill, "Algorithms to Process Distributed Queries in Fast Local Networks," IEEE Transactions on Computers, vol. C-36, pp. 1153-1163, October 1987.
[21] P. S. Yu, D. Cornell, D. M. Dias, and A. Thomasian, "On Coupling Partitioned Database Systems," in Proceedings of the 6th International Conference on Distributed Computing Systems, (Massachusetts), pp. 148-157, May 1986.
[22] M. T. Ozsu and P. Valduriez, Principles of Distributed Database Systems. Prentice Hall, 1991.
[23] J. D. Ullman, Principles of Database Systems. Computer Science Press, second ed., 1982.
[24] T. Bowen, G. Gopal, G. Herman, and J. William Mansfield, "A Scale Database Architecture for Network Services," IEEE Communications Magazine, vol. 29, pp. 52-59, January 1991.
[25] G. Herman and G. Gopal, "The Case for Orderly Sharing," in Springer Verlag Lecture Notes in Computer Science on High Performance Transaction Systems (D. Gawlick, M. Haynie, and A. Reuter, eds.), pp. 148-174, Springer-Verlag, 1989.
[26] G. Herman, G. Gopal, K. Lee, and A. Weinrib, "The Datacycle Architecture for Very High Throughput Database Systems," in Proceedings of the ACM SIGMOD, pp. 97-103, 1987.
[27] A. Weinrib and G. Gopal, "Decentralized Resource Allocation for Distributed Systems," in Proceedings of INFOCOM 87, (San Francisco), pp. 328-336, April 1987.
[28] W. Cellary, E. Gelenbe, and T. Morzy, Concurrency Control in Distributed Database Systems. Studies in Computer Science and Artificial Intelligence, Elsevier Science Publishers, 1988.
[29] L. Kleinrock, Queueing Systems, vol. I: Theory. McGraw Hill, 1960.
[30] J. Gray, R. Lorie, G. Putzolu, and I. Traiger, "Granularity of Locks and Degrees of Consistency in a Shared Data Base," in Modelling in Data Base Management Systems (G. Nijssen, ed.), pp. 365-394, Amsterdam: North-Holland, 1976.