
The International Arab Journal of Information Technology, Vol. 2, No. 1, January 2005

A Survey of Distributed Query Optimization


Alaa Aljanaby (1), Emad Abuelrub (1), and Mohammed Odeh (2)
(1) Computer Science Department, Zarqa Private University, Jordan
(2) Faculty of Computing, University of the West of England, UK

Abstract: Distributed query optimization is one of the hardest problems in the database area. The great commercial success of database systems is partly due to the development of sophisticated query optimization technology, where users pose queries in a declarative way using SQL or OQL and the optimizer of the database system finds a good way (i.e., plan) to execute these queries. The optimizer, for example, determines which indices should be used to execute a query and in which order the operations of a query (e.g., joins, selects, and projects) should be executed. To this end, the optimizer enumerates alternative plans, estimates the cost of every plan using a cost model, and chooses the plan with the lowest cost. There has been much research in this field. In this paper, we study the problem of distributed query optimization, focusing on the basic components of the distributed query optimizer, i.e., the search space, the search strategy, and the cost model. A survey of the available work in this field is given. Finally, some future work is highlighted, based on recent work that uses mobile agent technologies.

Keywords: Distributed query optimization, deterministic strategies, randomized strategies.

Received October 1, 2003; accepted March 3, 2004

1. Introduction
The future of large database systems lies in the realm of distributed computing. The main reason for this is that distributed systems can be constructed at low cost without the need for any specialized technology, using existing sequential computers and relatively cheap computer networks. A database is physically distributed across different sites by fragmenting and replicating the data. Fragmentation subdivides each relation into horizontal fragments using the select operation or vertical fragments using the project operation. Fragmentation is desirable because it enables the placement of data in close proximity to its place of use. Each fragment may also be replicated to a number of sites; this is preferable when the same data is accessed by applications that run at a number of sites [16, 17].

The great commercial success of database systems is partly due to the development of sophisticated query optimization technology, where users pose queries in a declarative way using SQL or OQL and the optimizer of the database system finds a good way (i.e., plan) to execute these queries. The optimizer, for example, determines which indices should be used to execute a query and in which order the operations of a query (e.g., joins, selects, and projects) should be executed. To this end, the optimizer enumerates alternative plans, estimates the cost of every plan using a cost model, and chooses the plan with the lowest cost [9]. Selecting the optimal execution strategy for a query is NP-hard in the number of relations [5]. For complex queries with many relations, this incurs a prohibitive optimization cost. Therefore, the actual objective of the optimizer is to find a strategy close to optimal and to avoid bad strategies. The selection of the optimal strategy generally requires predicting the execution cost of the alternative candidate orderings prior to actually executing the query. The execution cost is expressed as a weighted combination of I/O, CPU, and communication costs [18].

In this paper, we discuss some of the ways in which queries can be optimized for distributed environments. In section 2, the problem of query processing and optimization is discussed. Different kinds of search spaces are discussed in section 3. Section 4 discusses the different types of search strategies, showing the advantages and disadvantages of each strategy. Some kinds of cost models are discussed in section 5. Section 6 concludes and highlights some future directions.

2. Query Processing and Optimization


Query processing is the process of translating a query expressed in a high-level language such as SQL into low-level data manipulation operations. Query optimization refers to the process by which the best execution strategy for a given query is found from a set of alternatives. Typically, query processing involves many steps. The first step is query decomposition, in which an SQL query is first scanned, parsed, and validated. The scanner identifies the language tokens, such as SQL keywords, attribute names, and relation names, in the text of the query, whereas the parser checks the query syntax to determine whether it is formulated according to the syntax rules of the query language. The query must also be validated, by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried. An internal representation of the query is then created. A query expressed in relational algebra is usually called an initial algebraic query and can be represented as a tree data structure called a query tree, which represents the input relations of the query as leaf nodes and the relational algebra operations as internal nodes. For a given SQL query, there is more than one possible algebraic query, and some of these algebraic queries are better than others; the quality of an algebraic query is defined in terms of expected performance. Therefore, the second step is the query optimization step, which transforms the initial algebraic query, using relational algebra transformations, into other algebraic queries until the best one is found. A Query Execution Plan (QEP) is then produced; represented as a query tree, it includes information about the access methods available for each relation as well as the algorithms used to compute the relational operations in the tree. The next step is to generate the code for the selected QEP; this code is then executed in either compiled or interpreted mode to produce the query result [3, 7]. Figure 1 shows the different steps of query processing.
Figure 1. Query processing steps: query in a high-level language -> scanning, parsing, and validating -> initial algebraic query -> query optimizer -> execution plan -> code generator -> code to execute the query -> runtime database processor -> result of the query.
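As a small illustration of why some algebraic queries are better than others, consider a hypothetical query over relations Emp(name, dno) and Dept(dno, loc) (the relation and attribute names are invented for this example) asking for the names of employees in departments located in Amman. Two equivalent algebraic queries are:

π_name(σ_loc='Amman'(Emp ⋈ Dept))
π_name(Emp ⋈ σ_loc='Amman'(Dept))

Both produce the same result, but the second pushes the selection below the join, so the join processes only the matching departments; the transformations applied by the optimizer are intended to discover such cheaper equivalent forms.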

2.1. Distributed Query Optimization


In distributed query optimization, two more steps are involved between query decomposition and query optimization: data localization and global query optimization. The input to data localization is the initial algebraic query generated by the query decomposition step. The initial algebraic query is specified on global relations, irrespective of their fragmentation or distribution. The main role of data localization is to localize the query using data distribution information. In this step, the fragments that are involved in the query are determined, and the query is transformed into one that operates on fragments rather than on global relations. Thus, during the data localization step, each global relation is first replaced by its localization program (the union of its fragments for a horizontally fragmented relation, or the join of its fragments for a vertically fragmented one), and the resulting fragment query is then simplified and restructured to produce another good query. Simplification and restructuring may be done according to the same rules used in the decomposition step. The final fragment query is generally far from optimal; this process only eliminates bad queries.

The input to the third step is a fragment query, that is, an algebraic query on fragments. By permuting the ordering of operations within one fragment query, many equivalent query execution plans may be found. The goal of query optimization is to find an execution strategy for the query that is close to optimal. An execution strategy for a distributed query can be described with relational algebra operations and communication primitives (send/receive operations) for transferring data between sites [16]. A query optimizer that follows this approach can be seen as three components: a search space, a search strategy, and a cost model. The search space is the set of alternative execution plans that represent the input query. These plans are equivalent, in the sense that they yield the same result, but they differ in the execution order of operations and the way these operations are implemented. The search strategy explores the search space and selects the best plan; it defines which plans are examined and in which order. The cost model predicts the cost of a given execution plan, which may consist of the following components [3]:

Secondary storage cost: the cost of searching for, reading, and writing data blocks on secondary storage.
Memory storage cost: the cost pertaining to the number of memory buffers needed during query execution.
Computation cost: the cost of performing in-memory operations on the data buffers during query execution.
Communication cost: the cost of shipping the query and its results from the database site to the site or terminal where the query originated.

3. Search Space
The optimizer considers each algebraic operation independently, since the query decomposition step has already taken the global reorganization decisions. Optimization of all operations but the n-way Select-Project-Join (SPJ) is quite straightforward: it consists of choosing the algorithm and the home of the operation. The crucial issue in terms of the search strategy is the join ordering problem, which is NP-hard in the number of relations [5]. The search space, or solution space, is the set of all QEPs that compute the same result. A point in the solution space is one particular plan, i.e., one solution to the problem. A solution is described by the query tree for executing the join expression. Every point of the search space has a cost associated with it; a cost function maps query trees to their respective costs. The query tree itself is a binary tree that consists of base relations as its leaves and join operations as its inner nodes; edges denote the flow of data from the leaves of the tree to the root. As the join operation is commutative and associative, the number of possible query trees increases quickly with the number of relations involved in the query [23]. Thus, the goal of the query optimizer reduces to finding the best join order, together with the best join method for each join [6]. For a complex query, involving many relations and many operators, the number of equivalent operator trees can be very high. Investigating a large search space may make the optimization time prohibitive, sometimes much more expensive than the execution time. Therefore, query optimizers typically restrict the size of the search space they consider. The first restriction is to use heuristics. Another important restriction is the shape of the join tree [18]. The following sections discuss the characteristics of the most interesting kinds of search spaces, depending on the shape of the join trees they include.


Figure 2. Different query trees.

3.1. Left-Deep and Right-Deep Trees


The search space restricted to left-deep trees consists of all query trees in which the inner relation of each join is a base relation, as in Figure 2-a. For a fixed number of base relations, a left-deep tree does not leave any degree of freedom concerning the shape of the tree, but there are n! ways to allocate n base relations to the tree leaves. In most sequential optimizers, like the optimizer of IBM's System R [21], the search space is restricted to left-deep trees. The reason for this heuristic restriction is that the space including all trees is much larger, while an optimal or nearly optimal plan can often be found in the much smaller space of left-deep trees. When join algorithms are not symmetric, which is the case for hash-join, it is useful to distinguish between left-deep and right-deep trees, see Figure 2-b. Hash-based join algorithms have been shown to be the most efficient for equi-joins [2] and are particularly interesting in a parallel environment because they naturally induce intra-operation parallelism. A hash-join usually consists of two consecutive phases: build and probe. In the build phase, a hash table is constructed on the join attribute of the smaller relation. In the probe phase, the larger relation is sequentially scanned and the hash table is consulted for each of its tuples.
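To make the two phases concrete, here is a minimal Python sketch of a hash-join (an illustration only; the relation format as lists of dictionaries and the sample data are assumptions, not from the paper):

def hash_join(build_rel, probe_rel, build_key, probe_key):
    # Build phase: hash the smaller relation on its join attribute.
    table = {}
    for tup in build_rel:
        table.setdefault(tup[build_key], []).append(tup)
    # Probe phase: scan the larger relation and look up matches.
    result = []
    for tup in probe_rel:
        for match in table.get(tup[probe_key], []):
            result.append({**match, **tup})
    return result

# Example: R1(a, b) joined with R2(b, c) on attribute b.
R1 = [{"a": 1, "b": 10}, {"a": 2, "b": 20}]
R2 = [{"b": 10, "c": "x"}, {"b": 10, "c": "y"}, {"b": 30, "c": "z"}]
print(hash_join(R1, R2, "b", "b"))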

Consider the example presented in [13, 14] and shown in Figure 3. In the left-deep tree of Figure 3-a, the temporary relation Temp1 must be completely produced and the hash table in Build2 must be finished before Probe2 can start consuming R3. The same is true for Temp2, Build3, and Probe3. Thus, this tree is executed in four consecutive phases: build R1's hash table, probe it with R2 and build Temp1's hash table, probe it with R3 and build Temp2's hash table, and finally probe it with R4 and produce the result. In the right-deep tree of Figure 3-b, the arrows indicate that the temporary results are pipelined to the next operation. This tree can be executed in two phases if enough memory is available to build the hash tables: build the hash tables for R1, R3, and R4, and then execute Probe1, Probe2, and Probe3 in a pipeline.

3.2. Zigzag Trees


Zigzag trees are mainly of interest in distributed and parallel databases. Ziane, Zait, and Borla-Salamet [25] proposed zigzag trees as an intermediate format between left-deep and right-deep trees, see Figure 2-c. Zigzag trees can lead to better response times than right-deep trees in the case of limited memory, especially when temporary relations are not staged to disk. The rationale is to turn right instead of simply slicing the right-deep tree. Turning right means that the temporary relation produced by the right-deep subtree is used as the building relation in the following hash-join operation (it is assumed that all joins in the tree are hash-joins), instead of as the probing relation, as it would be if the tree were simply right-deep. Note that choosing a temporary relation as a building one is particularly useful when the build phase can be avoided, that is, when it is already hashed on the attribute of the next join. A reasonable heuristic is to favor right-deep or zigzag trees when relations are partially fragmented on disjoint homes and intermediate relations are rather large. The zigzag tree of Figure 3-c needs less memory for storing the hash tables than the right-deep tree of Figure 3-b. This tree is executed in three phases: build the hash tables of R1 and R4, execute Probe1 and Build2, and then execute Probe2 and Probe3 in a pipeline.


Figure 3. Scheduling hash-join with different query trees.

3.3. Bushy Trees


This search space permits join nodes whose operands are both composites, i.e., not base relations, as in Figure 2-d. Thus, the solutions in this space are not restricted; consequently, this search space includes left-deep trees as well as other special tree shapes as subsets. Since the shape of possible query trees can be arbitrary, the cardinality (number of solutions) of this space is much higher than the cardinality of the left-deep space. Bushy trees offer the best opportunity for exploiting independent parallelism and are the most suitable for parallel machines. Independent parallelism is useful when the relations are partitioned on disjoint homes; the set of nodes where a relation is stored is called its home [1, 10, 23]. Suppose the sample query in Figure 2-d is executed on a shared-nothing machine, and that the relations (R1 and R2, R3 and R4) are partitioned on disjoint homes (h1 and h2, respectively). Then R1 ⋈ R2 and R3 ⋈ R4 could be executed independently, in parallel, by the sets of nodes that constitute h1 and h2.
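As a rough illustration of how quickly these spaces grow (a sketch that counts only tree shapes and leaf assignments, ignoring the choice of join methods), the number of left-deep trees for n relations is n!, while the number of bushy trees is n! times the Catalan number C(n-1):

from math import comb, factorial

def left_deep_count(n):
    # One tree shape, n! ways to assign relations to the leaves.
    return factorial(n)

def bushy_count(n):
    # Catalan(n-1) binary tree shapes times n! leaf assignments.
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan

for n in (4, 6, 8, 10):
    print(n, left_deep_count(n), bushy_count(n))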

4. Search Strategies

One central component of a query optimizer is its search strategy, or enumeration algorithm. The enumeration algorithm of the optimizer determines which plans to enumerate, and is classically based on dynamic programming. There are basically two approaches to this problem. The first approach is the deterministic strategies, which proceed by building plans, starting from the base relations and joining one more relation at each step until complete plans are obtained. When constructing QEPs through dynamic programming, equivalent partial plans are constructed and compared using a cost model. To reduce the optimization cost, partial plans that are not likely to lead to the optimal plan are pruned (discarded) as soon as possible, and the cheaper partial plans are retained and used to construct the full plan. A greedy strategy builds only one such plan using depth-first search, while dynamic programming builds all possible plans breadth-first. The other approach is the randomized strategies, which concentrate the search for the optimal solution around some particular points. They do not guarantee that the optimal plan is found, but they avoid the high cost of optimization in terms of memory and time consumption [13]. First, a start plan is built by a greedy strategy. Then, the algorithm tries to improve the start plan by visiting its neighbors. A neighbor is obtained by applying a random transformation to a plan; a typical transformation exchanges two randomly chosen operand relations of the plan. Iterative improvement and simulated annealing differ in the criteria for replacing the current plan by the transformed one and in their stopping criteria.
4.1. Deterministic Strategies


In this section, the basic deterministic strategies, i.e., dynamic programming and the greedy algorithm, are discussed, together with a brief discussion of an approach that combines both algorithms.

4.1.1. Dynamic Programming

This algorithm was pioneered in IBM's System R project [21] and is used in almost all commercial database products [15, 24]. The basic dynamic programming algorithm for query optimization, as presented in [8], is shown in Figure 4. It works in a bottom-up way by building more complex sub-plans from simpler sub-plans until the complete plan is constructed. In the first phase, the algorithm builds access plans for every table in the query (Lines 1 to 4 of Figure 4). Typically, there are several different access plans for a relation (table). If relation A, for instance, is replicated at sites S1 and S2, the algorithm would enumerate table-scan(A, S1) and table-scan(A, S2) as alternative access plans for table A. In the second phase, the algorithm enumerates all two-way join plans using the access plans as building blocks (Lines 5 to 13 of Figure 4). Again, the algorithm would enumerate alternative join plans for all relevant sites, i.e., consider carrying out joins with A at S1 and at S2. Next, the algorithm builds three-way join plans, using access plans and two-way join plans as building blocks. The algorithm continues in this way until it has enumerated all n-way join plans. In the third phase, the n-way join plans are massaged by the finalizePlans function so that they become complete plans for the query; e.g., project, sort, or group-by operators are attached, if necessary (Line 14 of Figure 4).

Note that in every step of the second phase, dynamic programming uses the same joinPlans function to produce more and more complex plans from simpler plans as building blocks. Just as there are usually several access plans, there are usually several different ways to join two tables (e.g., nested-loop join, hash join, sort-merge join), and the joinPlans function will return a plan for every alternative join method. The beauty of dynamic programming is that inferior plans are discarded as early as possible (Lines 3, 10, and 15 of Figure 4). This approach is called pruning and is carried out by the prunePlans function. A plan can be pruned if an alternative plan exists that does the same or more work at lower cost. While enumerating two-way join plans, for example, dynamic programming would enumerate A ⋈ B and B ⋈ A as two alternative plans to execute this join, but only the cheaper of the two would be kept in the optPlan({A, B}) structure after pruning, and would be used as a building block for three-way, four-way, and larger join plans involving A and B. Pruning significantly reduces the complexity of query optimization.

It should be noted that there are situations in which two plans, say two join plans of A and B, are incomparable and must both be retained in the optPlan({A, B}) structure, even though one plan is more expensive than the other. For example, A sort-merge-join B and A hash-join B are incomparable if the sort-merge join is more expensive than the hash join but the ordering of the result produced by the sort-merge join is interesting: the ordering of the result might help to reduce the cost of later operations. All final plans are comparable, so that only one plan is retained as the final plan. In a distributed DBMS, neither table-scan(A, S1) nor table-scan(A, S2) may be immediately pruned, in order to guarantee that the optimizer finds a good plan. Both plans do the same work, but they produce their result at different sites. Even if table-scan(A, S1) is cheaper than table-scan(A, S2), it must be kept because it might be a building block of the overall optimal plan if, for instance, the query results are to be presented at S2. Only if the cost of table-scan(A, S1) plus the cost of shipping A from S1 to S2 is lower than the cost of table-scan(A, S2) is table-scan(A, S2) pruned.

Input: SPJ query q on relations R1, R2, ..., Rn
Output: A query plan for q
1  for i = 1 to n do
2    optPlan({Ri}) = accessPlans(Ri)
3    prunePlans(optPlan({Ri}))
4  end for
5  for i = 2 to n do
6    for all S ⊆ {R1, R2, ..., Rn} such that |S| = i do
7      optPlan(S) = ∅
8      for all O ⊂ S do
9        optPlan(S) = optPlan(S) ∪ joinPlans(optPlan(O), optPlan(S - O))
10       prunePlans(optPlan(S))
11     end for
12   end for
13 end for
14 finalizePlans(optPlan({R1, R2, ..., Rn}))
15 prunePlans(optPlan({R1, R2, ..., Rn}))
16 return optPlan({R1, R2, ..., Rn})
Figure 4. Dynamic programming algorithm.
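To make the enumeration concrete, here is a minimal Python sketch of Figure 4 (an illustration, not the paper's implementation: access_plans, join_plans, and cost are assumed callables, and pruning is simplified to keeping the single cheapest plan per relation set, ignoring interesting orders and sites):

from itertools import combinations

def optimize(relations, access_plans, join_plans, cost):
    opt = {}
    # Phase 1: best access plan per base relation (Lines 1-4).
    for r in relations:
        opt[frozenset([r])] = min(access_plans(r), key=cost)
    # Phase 2: i-way join plans from smaller building blocks (Lines 5-13).
    for i in range(2, len(relations) + 1):
        for subset in combinations(relations, i):
            s = frozenset(subset)
            candidates = []
            for k in range(1, i):
                for part in combinations(subset, k):
                    o = frozenset(part)
                    candidates += join_plans(opt[o], opt[s - o])
            # Pruning, simplified: keep only the cheapest plan for s.
            opt[s] = min(candidates, key=cost)
    # Phase 3 (finalizePlans) is omitted in this sketch.
    return opt[frozenset(relations)]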

Lanzelotte et al. [13] addressed an important issue: which plans are equivalent, so that the expensive ones can be pruned? At first glance, equivalent partial plans are those that produce the same result (tuples). In fact, the order of the resulting tuples is an important equivalence criterion as well: in the presence of sort-merge joins, a partial plan with a high cost could lead to a better plan if a sort operation can be avoided. The researchers made experiments showing that dynamic programming performs better than a randomized strategy for queries with a small number of relations, but that this situation is inverted when the query has 7 relations or more.

4.1.2. Greedy Algorithm

As an alternative to dynamic programming, greedy algorithms have been proposed. These greedy algorithms run much faster than dynamic programming, but they typically produce worse plans [23]. Figure 5 shows the basic greedy algorithm for query optimization. Just like dynamic programming, this greedy algorithm has three phases and constructs plans in a bottom-up way. It makes use of the same accessPlans, joinPlans, and finalizePlans functions in order to generate plans. However, in the second phase this greedy algorithm carries out a very simple and rigorous selection of the join order. In every iteration of the greedy loop (Lines 5 to 11 of Figure 5), the algorithm applies a plan evaluation function in order to select the next best join. As an example, for a five-way join query with tables A, B, C, D, and E, the plan evaluation function could determine that A and D should be joined first; the result of A ⋈ D should be joined with C next; B and E should be joined next; and finally, the results of C ⋈ (A ⋈ D) and B ⋈ E should be joined. Obviously, the quality of the plans produced by this algorithm strongly depends on the plan evaluation function [9].

Input: SPJ query q on relations R1, R2, ..., Rn
Output: A query plan for q
1  for i = 1 to n do
2    optPlan({Ri}) = accessPlans(Ri)
3    prunePlans(optPlan({Ri}))
4  end for
5  toDo = {R1, R2, ..., Rn}
6  for i = 1 to n-1 do
7    find O, I ∈ toDo and P ∈ joinPlans(optPlan(O), optPlan(I)) such that
       eval(P) = min{eval(P') | P' ∈ joinPlans(optPlan(O'), optPlan(I')); O', I' ∈ toDo}
8    generate a new symbol T
9    optPlan({T}) = {P}
10   toDo = (toDo - {O, I}) ∪ {T}
11   delete(optPlan(O)); delete(optPlan(I))
12 end for
13 finalizePlans(optPlan({R1, R2, ..., Rn}))
14 prunePlans(optPlan({R1, R2, ..., Rn}))
15 return optPlan({R1, R2, ..., Rn})
Figure 5. Greedy algorithm.
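For illustration, the greedy loop of Figure 5 can be sketched in Python as follows (a simplified sketch: access_plans, join_plans, and eval_plan are assumed callables standing in for the accessPlans, joinPlans, and eval functions of the figure; finalization is omitted):

def greedy_optimize(relations, access_plans, join_plans, eval_plan):
    # One plan per item in toDo; start from the best access plan of each relation.
    todo = {r: min(access_plans(r), key=eval_plan) for r in relations}
    while len(todo) > 1:
        # Pick the pair of items whose join plan minimizes eval_plan (Line 7).
        best = None
        for o in list(todo):
            for i in list(todo):
                if o == i:
                    continue
                for p in join_plans(todo[o], todo[i]):
                    if best is None or eval_plan(p) < eval_plan(best[2]):
                        best = (o, i, p)
        o, i, p = best
        del todo[o], todo[i]          # Line 10: replace O and I ...
        todo[("T", o, i)] = p         # ... by a fresh symbol T for the new plan
    (plan,) = todo.values()
    return plan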

4.1.3. Iterative Dynamic Programming

Kossmann and Stocker [9] present and evaluate a new class of enumeration algorithms, based on the early work in [22], called Iterative Dynamic Programming (IDP). The main idea of IDP is to apply dynamic programming several times in the process of optimizing a query; it can be seen as a combination of dynamic programming and the greedy algorithm. IDP has reasonable (i.e., polynomial) complexity and produces very good plans in most situations. The researchers made experiments showing that IDP produces better plans than any other algorithm in situations in which dynamic programming is not viable because of its high (exponential) complexity. One particular advantage is that certain IDP variants adapt to the optimization problem. If the query is simple, these IDP variants produce an optimal plan in the same time as dynamic programming. If the query is too complex for dynamic programming, these IDP variants produce a plan close to optimal that is significantly better than the plans produced by the other algorithms applicable in those situations (e.g., randomized algorithms). IDP can be classified as a generalization of dynamic programming and the greedy algorithm, with the goal of combining the advantages of both. The authors presented eight variants of IDP that differ in three ways: the time at which the iteration takes place, the size of the building blocks generated in each algorithm, and the number of building blocks produced in every iteration.

4.2. Randomized Strategies


Randomized algorithms usually perform random walks in the solution space via series of moves. The kinds of moves that are considered depend on the solution space. If left-deep processing trees are desired, each solution can be represented uniquely by an ordered list of the relations participating in the join. There are two moves for modifying these lists, Swap and 3Cycle. Swap exchanges the positions of two arbitrary relations in the list, and 3Cycle performs a cyclic rotation of three arbitrary relations in the list. For instance, if R1 R2 R3 R4 R5 is a point in the solution space, applying Swap might lead to R1 R4 R3 R2 R5, whereas 3Cycle could yield R5 R2 R1 R4 R3 [4, 23]. If bushy processing trees are considered, the following moves, introduced by Ioannidis and Kang [6], are used to traverse the solution space:

Commutativity: A ⋈ B -> B ⋈ A
Associativity: (A ⋈ B) ⋈ C -> A ⋈ (B ⋈ C)
Left join exchange: (A ⋈ B) ⋈ C -> (A ⋈ C) ⋈ B
Right join exchange: A ⋈ (B ⋈ C) -> B ⋈ (A ⋈ C)

The points that can be reached in one move from a point P are called the neighbors of P. A move is called uphill (downhill) if the cost of the source point is lower (higher) than the cost of the destination point. A point is a local minimum if, on all paths starting at that point, any downhill move comes after at least one uphill move. A point is a global minimum if it has the lowest cost among all points.
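A minimal Python sketch of the two moves on left-deep join orders, represented as lists of relation names (choosing the positions uniformly at random is an assumption consistent with the description above):

import random

def swap(order):
    # Exchange the positions of two arbitrary relations.
    s = order[:]
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def three_cycle(order):
    # Cyclically rotate three arbitrary relations: i <- k, j <- i, k <- j.
    s = order[:]
    i, j, k = random.sample(range(len(s)), 3)
    s[i], s[j], s[k] = s[k], s[i], s[j]
    return s

print(swap(["R1", "R2", "R3", "R4", "R5"]))
print(three_cycle(["R1", "R2", "R3", "R4", "R5"]))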



4.2.1. Iterative Improvement

The iterative improvement algorithm is shown in Figure 6. The inner loop of the algorithm is called a local optimization; it starts at a random point and improves the solution by repeatedly accepting random downhill moves until it reaches a local minimum. Iterative improvement repeats these local optimizations until a stopping condition is met (a predetermined number of starting points has been processed or a time limit is exceeded), at which point it returns the local minimum with the lowest cost [6, 23].

4.2.2. Simulated Annealing

Iterative improvement suffers from a major drawback: because local optimization performs only downhill moves, it is possible that even with a high number of starting points the final result is still unacceptable. This is especially the case when the solution space contains a large number of high-cost local minima, in which the algorithm easily gets trapped. Simulated annealing is a refinement of iterative improvement that removes this restriction. It accepts uphill moves with some probability, trying to avoid being caught in a high-cost local minimum. The algorithm is shown in Figure 7. It was originally derived by analogy to the process of annealing crystals. The inner loop of the algorithm is called a stage. Each stage is performed under a fixed value of a parameter T, called the temperature, which controls the probability of accepting uphill moves. Each stage ends when the algorithm is considered to have reached equilibrium. Then the temperature is reduced according to some function and another stage begins. The algorithm stops when it is considered to be frozen, i.e., when the temperature is equal to zero [6, 23].

4.2.3. Two Phase Optimization

The basic idea of this variant is to combine iterative improvement and simulated annealing in order to obtain the advantages of both [6]. Iterative improvement, if applied repeatedly, is capable of covering a large part of the search space and descends rapidly into a local minimum, whereas simulated annealing is very well suited for thoroughly covering the neighborhood of a given point in the search space. Thus, two phase optimization works as follows:

1. For a number of randomly selected starting points, local minima are sought by way of iterative improvement.
2. From the lowest of these local minima, the simulated annealing algorithm is started in order to search the neighborhood for better solutions.

Because only the close proximity of the local minimum needs to be covered, the initial temperature for the simulated annealing pass is set lower than it would be for a simulated annealing run by itself [23].

Function IterativeImprovement
Output: MinState, the optimized processing tree
  MinCost = ∞
  do
    State = random starting point
    Cost = cost(State)
    do
      NewState = State after random move
      NewCost = cost(NewState)
      if NewCost < Cost then
        State = NewState
        Cost = NewCost
      end if
    while local minimum not reached
    if Cost < MinCost then
      MinState = State
      MinCost = Cost
    end if
  while time limit not exceeded
  return MinState
End IterativeImprovement
Figure 6. Iterative improvement.
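The same loop structure as Figure 6 can be sketched in Python (an illustration; random_start, random_move, and cost are assumed callables, and "local minimum reached" is approximated by a fixed number of consecutive failed moves, which is an implementation assumption):

def iterative_improvement(random_start, random_move, cost, starts=10, patience=50):
    best_state, best_cost = None, float("inf")
    for _ in range(starts):            # outer loop: repeat local optimizations
        state = random_start()
        c = cost(state)
        fails = 0
        while fails < patience:        # inner loop: accept only downhill moves
            new = random_move(state)
            nc = cost(new)
            if nc < c:
                state, c, fails = new, nc, 0
            else:
                fails += 1             # approximation of "local minimum reached"
        if c < best_cost:              # keep the cheapest local minimum found
            best_state, best_cost = state, c
    return best_state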

Function SimulatedAnnealing
Input: State, a random starting point
Output: MinState, the optimized processing tree
  MinState = State; Cost = cost(State); MinCost = Cost
  T = starting temperature
  do
    do
      NewState = State after random move
      NewCost = cost(NewState)
      if NewCost <= Cost then
        State = NewState
        Cost = NewCost
      else, with probability e^(-(NewCost - Cost)/T):
        State = NewState
        Cost = NewCost
      end if
      if Cost < MinCost then
        MinState = State
        MinCost = Cost
      end if
    while equilibrium not reached
    reduce temperature T
  while not frozen
  return MinState
End SimulatedAnnealing
Figure 7. Simulated annealing.
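In isolation, the uphill acceptance rule of Figure 7 can be sketched as follows (illustrative; the geometric cooling factor 0.95 is an assumed schedule, not taken from the paper). An uphill move of size delta = NewCost - Cost is accepted with probability e^(-delta/T), so large uphill moves become increasingly unlikely as T decreases:

import math
import random

def accept(delta, T):
    # Downhill moves (delta <= 0) are always accepted;
    # uphill moves are accepted with probability e^(-delta/T).
    return delta <= 0 or random.random() < math.exp(-delta / T)

T = 100.0
for stage in range(5):
    print(T, accept(10.0, T))  # an uphill move of +10 gets harder to accept
    T *= 0.95                  # assumed geometric cooling schedule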

5. Cost Model
Lanzelotte, Valduriez, and Zait [12] introduced a cost model that captures all aspects of parallelism and scheduling. They define the cost estimate of a QEP containing only join nodes. All formulas given below compute response time, which is simply referred to as cost. In addition to the traditional assumptions of uniform distribution of values and independence of attributes, they also assume that the tuples of a relation are uniformly partitioned among the nodes of its home, and that there is no overlap between nodes of different homes, although several relations may share the same home. In the following, R refers to a base relation of the physical schema, and N to the operation captured by a QEP node. P denotes, at the same time, a QEP and the transient relation produced by that QEP. The parameters (database schema or system parameters) used in the cost model are shown in Table 1.

An optimal execution of the join operation requires each operand to be partitioned in the same way. For example, if p and q are both partitioned on n nodes using the same function on the join attribute, the operation join(p, q) is equivalent to the union of n parallel operations join(pi, qi), with i = 1, ..., n. If this condition is not satisfied, parallel join algorithms attempt to establish it by reorganizing the relations, i.e., by dynamically repartitioning the tuples of the operand relations on the nodes using the same function on the join attribute.
Table 1. Cost model parameters.

card(R)    Number of tuples in relation R
width(R)   Size of one tuple of relation R
cpu        CPU speed
network    Network speed
packet     The size of a packet
send       The time for a send operation
receive    The time for a receive operation

First, estimate the cost of partitioning an operand relation R. Obviously, if the relation is already appropriately partitioned, this cost is 0. Let #source be the number of nodes over which R is partitioned, and #dest be the number of nodes of the destination home. Each source node contains card(R)/#source tuples, and thus sends card(R) * width(R) / (#source * packet) packets. If we assume that tuples are uniformly distributed over the destination nodes, then each destination node receives card(R)/#dest tuples, and thus processes card(R) * width(R) / (#dest * packet) incoming packets. Since a destination node starts processing only when the first packet arrives, the cost of repartitioning R on #dest nodes is:

cost(part(R)) = max((card(R) * width(R) / (#source * packet)) * send,
                    (card(R) * width(R) / (#dest * packet)) * receive + send + packet/network)

The cost of joining the tuples of p and q, where p and q are, respectively, the pipelined and stored operands of the join operation, is:

cost(join(p, q)) = max(costalg(join(p, q)), cost(part(p))) + cost(part(q))

where costalg(join(p, q)) is the cost of processing the join at one node; it depends on the join algorithm used. The partitioning of p is performed simultaneously with the join processing, after the repartitioning of q has completed. Given a QEP p scheduled in phases, each denoted by ph, the cost of p is computed as follows:

cost(p) = Σ_{ph ∈ p} [ max_{N ∈ ph} (respTime(N) + pipe_delay(N)) + store_delay(ph) ]

where pipe_delay(N) is the waiting period of node N, necessary for the producer to deliver the first result tuples; it is equal to 0 if the input relations of N are stored. store_delay(ph) is the time necessary to store the output results of phase ph; it is equal to 0 if ph is the last phase, assuming that the results are delivered as soon as they are produced.

Based on the above model, and taking into account the aspects of parallel execution, the same researchers define the cost of a plan as three components: Total Work (TW), Response Time (RT), and Memory Consumption (MC). TW and RT are expressed in seconds, and MC in Kbytes [13]. The first two components express a trade-off between response time and throughput, while the third represents the size of the memory needed to execute the plan. The cost function is a combination of the first two components, and plans that need more memory than available are discarded. Given a plan p, its cost is computed by a parameterized function cost(WRT, WTW)(p), defined as follows:

cost(WRT, WTW)(p) = WRT * RT + WTW * TW, if the MC of plan p does not exceed the available memory
cost(WRT, WTW)(p) = ∞, otherwise

where WRT and WTW are weight factors between 0 and 1 such that WRT + WTW = 1. A major difficulty in evaluating the cost is assigning values to these two weights. They depend on the system state (e.g., the load of the system and the number of queries submitted to it), and are ideally determined at run time. Lanzelotte et al. [13] suppose that only one query is submitted to the system at a time, so the cost function becomes:

cost(P) = RT, if the MC of plan P does not exceed the available memory
cost(P) = ∞, otherwise

The response time of P, scheduled in phases (each denoted by ph), is computed as follows:

respTime(P) = Σ_{ph ∈ P} [ max_{O ∈ ph} (respTime(O) + pipe_delay(O)) + store_delay(ph) ]

where O denotes an operation and respTime(O) is the response time of O. pipe_delay(O) is the waiting period of O, necessary for the producer to deliver the first result tuples; it is equal to 0 if the input relations of O are stored. store_delay(ph) is the time necessary to store the output results of phase ph; it is equal to 0 if ph is the last phase, assuming that the results are delivered as soon as they are produced.

The cost model in a parallel environment depends on dynamic parameters. For example, the amount of available memory may have an impact on the choice of a scheduling strategy: insufficient memory forces the execution plan to be split into more phases, and memory size is usually unknown to the optimizer, which operates at compile time. Parallelism introduces another crucial dynamic parameter, which is the way the load is balanced among the processors. In some cases, the impact of dynamic parameters is limited; for example, in a shared-memory architecture, if we do not assume inter-operation parallelism, knowing the amount of available memory is not relevant, because only one operation is executed at a time. However, in the general case, some optimization decisions should be made at run time. One solution to this problem is to build several execution plans, put together by means of choose operators.
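As an illustration of the two formulas above, the following Python sketch computes cost(part(R)) and cost(join(p, q)) (the parameter values in the example are invented, and costalg is passed in as an assumed per-node join cost):

def part_cost(card, width, n_source, n_dest, packet, send, receive, network):
    # Sending cost at each source node vs. receiving cost at each destination
    # node; the destination starts working as soon as the first packet arrives.
    send_side = (card * width / (n_source * packet)) * send
    recv_side = (card * width / (n_dest * packet)) * receive + send + packet / network
    return max(send_side, recv_side)

def join_cost(alg_cost, part_p, part_q):
    # Partitioning of the pipelined operand p overlaps the join itself;
    # repartitioning of the stored operand q must finish first.
    return max(alg_cost, part_p) + part_q

# Invented example values: one million 100-byte tuples moved from 4 to 8 nodes.
p = part_cost(card=1_000_000, width=100, n_source=4, n_dest=8,
              packet=4096, send=0.0001, receive=0.0001, network=1e7)
print(p, join_cost(alg_cost=5.0, part_p=p, part_q=p))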


6. Conclusions and Future Work


We have studied the problem of distributed query optimization, focusing on the major optimization issues addressed in distributed databases. We have seen that a query optimizer mainly consists of three components: the search space, the search strategy, and the cost model. Different kinds of search spaces were discussed, with different schedules. Search strategies, the central part of the optimizer, fall into two classes. The deterministic strategies based on dynamic programming have exponential time and space complexity but are guaranteed to find the optimal plan. The big advantage of the algorithms of the second class, the randomized strategies, is that they have constant space overhead; typically, randomized algorithms are slower than heuristics and dynamic programming for simple queries, but this is inverted for large queries. Randomized strategies do not guarantee that the optimal plan is found. Some cost models were discussed and the basic parameters for a parallel environment were shown. Finally, a promising future direction is the use of mobile-agent technologies [11, 19, 20], which provide a combination of flexibility and precision, to optimize the execution of distributed queries. The proposed mobile agent system is rule-based and should contain the necessary information, i.e., the rules used in the optimization process, such as selectivity and cardinality.


References
[1] Chen M. S., Yu P. S., and Wu K. L., Optimization of Parallel Execution for Multi-Join Queries, IEEE Transactions on Knowledge and Data Engineering, vol. 6, no. 3, 1996.
[2] DeWitt D. J. and Gray J., Parallel Database Systems: The Future of High Performance Database Systems, Communications of the ACM, vol. 35, no. 6, pp. 85-98, 1992.
[3] Elmasri R. and Navathe S. B., Fundamentals of Database Systems, Addison-Wesley, Reading, MA, 2000.
[4] Galindo-Legaria C., Pellenkoft A., and Kersten M., Fast, Randomized Join-Order Selection - Why Use Transformations?, in Proceedings of the Conference on Very Large Databases (VLDB), Santiago, Chile, pp. 85-95, 1994.
[5] Ibaraki T. and Kameda T., On the Optimal Nesting Order for Computing N-Relational Joins, ACM Transactions on Database Systems, vol. 9, no. 3, pp. 482-502, 1984.
[6] Ioannidis Y. E. and Kang Y. C., Randomized Algorithms for Optimizing Large Join Queries, in Proceedings of the ACM SIGMOD Conference on Management of Data, Atlantic City, USA, pp. 312-321, 1990.
[7] Ioannidis Y. E., Query Optimization, in Tucker A. (Ed), The Computer Science and Engineering Handbook, CRC Press, pp. 1038-1054, 1996.
[8] Kossmann D., The State of the Art in Distributed Query Processing, ACM Computing Surveys, September 2000.
[9] Kossmann D. and Stocker K., Iterative Dynamic Programming: A New Class of Query Optimization Algorithms, ACM Transactions on Database Systems, March 2000.
[10] Kremer M. and Gryz J., A Survey of Query Optimization in Parallel Databases, Technical Report CS-04-1999, York University, Canada, 1999.
[11] Lange D. B., Mobile Objects and Mobile Agents: The Future of Distributed Computing, in Jul E. (Ed), ECOOP'98, LNCS 1445, pp. 1-12, 1998.
[12] Lanzelotte R. S. G., Valduriez P., and Zait M., On the Effectiveness of Optimization Search Strategies for Parallel Execution Spaces, in Proceedings of the Conference on Very Large Databases (VLDB), Dublin, Ireland, pp. 493-504, 1993.
[13] Lanzelotte R. S. G., Valduriez P., Zait M., and Ziane M., Industrial-Strength Parallel Query Optimization: Issues and Lessons, Information Systems, vol. 19, no. 4, pp. 311-330, 1994.
[14] Niccum T. M., Srivastava J., Himatsingka B., and Li J., A Tree-Decomposition Approach to Parallel Query Optimization, Technical Report TR 93-016, University of Minnesota, 1993.
[15] Ono K. and Lohman G. M., Measuring the Complexity of Join Enumeration in Query Optimization, in Proceedings of the Conference on Very Large Databases (VLDB), Brisbane, Australia, pp. 314-325, 1990.
[16] Ozsu M. T. and Valduriez P., Distributed and Parallel Database Systems, in Tucker A. (Ed), The Computer Science and Engineering Handbook, CRC Press, pp. 1093-1111, 1997.
[17] Ozsu M. T. and Valduriez P., Distributed Database Systems: Where Are We Now?, IEEE Computer, vol. 24, no. 8, pp. 68-78, 1991.
[18] Ozsu M. T. and Valduriez P., Principles of Distributed Database Systems, Prentice Hall International, NJ, 1999.
[19] Pan L., Bic L. F., Dillencourt M. B., and Lai M. K., Mobile Agents: The Right Vehicle for Distributed Sequential Computing, in Sahni S. et al. (Eds), HiPC 2002, LNCS 2552, pp. 575-584, 2002.
[20] Pham V. A. and Karmouch A., Mobile Software Agents: An Overview, IEEE Communications Magazine, pp. 26-37, 1998.
[21] Selinger P. G., Astrahan M. M., Chamberlin D. D., Lorie R. A., and Price T. G., Access Path Selection in a Relational Database Management System, in Proceedings of the ACM SIGMOD Conference on Management of Data, Boston, USA, pp. 23-34, 1979.
[22] Shekita E. and Young H., Iterative Dynamic Programming, IBM Technical Report, 1998.
[23] Steinbrunn M., Moerkotte G., and Kemper A., Heuristic and Randomized Optimization for the Join-Ordering Problem, VLDB Journal, vol. 6, no. 3, pp. 191-208, 1997.
[24] Vance B. and Maier D., Rapid Bushy Join-Order Optimization with Cartesian Products, in Proceedings of the ACM SIGMOD Conference on Management of Data, Montreal, Canada, pp. 35-46, 1996.
[25] Ziane M., Zait M., and Borla-Salamet P., Parallel Query Processing with Zigzag Trees, VLDB Journal, vol. 2, no. 3, pp. 277-301, 1993.

Mohammed Odeh is a senior lecturer in software engineering and an associate of the Complex Cooperative Systems Centre at the University of the West of England, Bristol, UK. He holds a PhD degree in computer science from the University of Bath, 1993, in addition to a PGCert in higher education and membership of the ACM and ILT. He has more than 18 years of experience, including extensive project management experience in planning and leading a range of IT-related projects, in addition to management posts. He supervises five PhD students in bioinformatics, information management and integration, process modeling, and knowledge management. He also leads and teaches modules at both BSc and MSc levels in computer science and software engineering.

Alaa Aljanaby received both his BSc and MSc in computer science from Basrah University, Iraq, in 1993 and 1996, respectively. Currently, he is a lecturer in the Computer Science Department at Zarqa Private University, Jordan. Previously, he was a lecturer in the Computer Science Department at Shatt El-Arab University College, Iraq. His research interests are parallel processing, systolic algorithms, and distributed databases.

Emad Abuelrub is an associate professor and dean of the Faculty of Science and Information Technology at Zarqa Private University, Jordan. He received his bachelor degrees in computer engineering and computer science from Oklahoma State University, USA, in 1984 and 1985, respectively. He then joined Alabama A&M University, USA, where he obtained his MSc degree in computer science in 1987. He completed his PhD degree in computer science at Louisiana State University, USA, in 1993. His areas of interest include parallel and distributed systems, interconnection networks, fault-tolerant computing, parallel algorithms, and parallel architectures. He is a member of the IEEE, ACM, and JEA.
