Academia.eduAcademia.edu

Towards Indexing Schemes for Self-Tuning DBMS

2005, 21st International Conference on Data Engineering Workshops (ICDEW'05)

Index tuning as part of database tuning is the task of selecting and creating indexes with the goal of reducing query processing times. However, in dynamic environments with various ad-hoc queries it is difficult to identify potentially useful indexes in advance. In this paper, based on previous research regarding automatic index creation at runtime we point out the need for new indexing schemes suitable for self-tuning. Based on problems with previous approaches we describe the key concepts, which are sparse and partial indexing, usage-balanced instead of data-balanced structures, and dynamic resource assignment. We illustrate the approach by a simple index structure, which provides adaptability as well as improved access characteristics. Furthermore, we will outline key tasks to introduce according concepts in future DBMS.

Towards Indexing Schemes for Self-Tuning DBMS Kai-Uwe Sattler Eike Schallehn Ingolf Geist Department of Computer Science and Automation Department of Computer Science TU Ilmenau University of Magdeburg P.O. Box 100565, D-98684 Ilmenau, Germany P.O. Box 4120, D-39016 Magdeburg, Germany [email protected] Abstract Index tuning as part of database tuning is the task of selecting and creating indexes with the goal of reducing query processing times. However, in dynamic environments with various ad-hoc queries it is difficult to identify potentially useful indexes in advance. In this paper, based on previous research regarding automatic index creation at runtime we point out the need for new indexing schemes suitable for self-tuning. Based on problems with previous approaches we describe the key concepts, which are sparse and partial indexing, usage-balanced instead of data-balanced structures, and dynamic resource assignment. We illustrate the approach by a simple index structure, which provides adaptability as well as improved access characteristics. Furthermore, we will outline key tasks to introduce according concepts in future DBMS. 1. Introduction Today’s enterprise database applications are often characterized by a large volume of data and high demand with regard to query response time and transaction throughput. Besides investing in new powerful hardware, database tuning plays an important role for fulfilling the requirements. However, database tuning requires a thorough knowledge about system internals, data characteristics, applications, and the query workload. Among others index selection is a main tuning task. Here, the problem is to decide how queries can be supported by creating indexes on certain columns. This requires to balance between the benefit of an index and the loss caused by space consumption and maintenance costs. While such considerations have been an integral part of physical database design for a long time and are well studied, most often this results in a very high complexity and tremendous efforts required to provide an efficient solution {eike,geist }@iti.cs.uni-magdeburg.de for a given application. During recent years the need for system support on this task became obvious. The paper is structured according to implications from our research and can be divided into the following three sections. Static index management and recommendation (Section 2): the current state of the art within commercial DBMS includes index wizards or advisors, which help in analyzing single queries or complete workloads and deriving the recommendation of an index configuration for a given application. This is done by solving an optimization problem based on cost estimations for query processing with potential sets of index candidates. Nevertheless, this task remains a responsibility of the system designer or administrator and has to be carried out frequently to adjust to changing usage, data, and environment. Dynamic index management and autonomous tuning (Section 3): in our previous research we investigated whether this task can be carried out autonomously and continuously during runtime. For this purpose, we implemented a component gathering and updating statistics based on which decisions about changing the current index configuration can be made. This approach was implemented and showed very encouraging results during evaluation. Despite the overall positive results, we received some critical feedback about the involved overhead and lack of control of the system behavior during query processing. Dynamic self-tuning index structures (Section 4): as a possible consequence of the previous drawbacks, we currently consider moving the task of index tuning one step down, i.e. from the level of index configurations down to the indexes themselves. If the indexes could in some way adjust to their usage, e.g. grow with increasing or shrink with decreasing numbers of accesses, this would lead to a much more fine-grained control of selftuning within the system. We will conclude the paper by giving an overview of related work in Section 5 and, finally, describing conclusions in Section 6. 2. Static Index Management and Recommendation Though index design is not very complicated for small or medium-sized schemas and rather static query workloads, it can be quite difficult in scenarios with explorative analysis and many ad-hoc queries where the required indexes cannot be foreseen. A particular example scenario is OLAP where business intelligence and/or ROLAP tools produce queries as a result of an information request initiated by a user. These tools produce sequences of statements including creating and building tables, inserts, and queries [9]. The most current releases of the major commercial DBMS such as Oracle10g, IBM DB2 Version 8, and SQL Server 2000 provide (limited) support for this scenario. They include so-called index wizards which are able to analyze a workload (in terms of costs of previously performed queries) and – based on some heuristics – derive recommendations for index creation. This is implemented using virtual indexes that are not physically created but only considered during query optimization in a “what if” manner. Though these tools utilize workload information collected at runtime they still work in design mode. That means, the DBA has to decide about index creation and index creation is completely separated from query processing. The drawback of this approach with respect to the above mentioned scenarios is that queries are dynamically generated – eventually on temporary tables – and therefore an offline workload analysis is rather difficult. Another drawback of index advisors results from the approach of analyzing a workload once over a fixed period of time and create a static index configuration based on this observation. In many applications database usage changes over time, e.g. over longer periods due to adjustments to changing work processes, with new applications working on the same data set, or frequent changes like seasonal usage or towards the end of business quarters. Furthermore, a collected workload is very likely incomplete and biased by the impact of collecting the workload on a not fully optimized database. In addition, the necessity of index configuration changes may be triggered by changes of the hardware involved, like CPUs, main memory, and disks containing the database. All this renders the index tuning task as an ongoing process with a continuous need of DBA interaction. Index advisors do not change this, they just minimize the required human input of expert knowledge. 3. Dynamic Index Management and Autonomous Tuning Our previous work regarding autonomous query-driven index tuning is based on the introduced concepts of static index recommendation, but in addition proposes the autonomous adjustment of index configurations during runtime. The overall approach and architecture is outlined in [13] while in [14] the full details and evaluation results are given. Processing Index-building Queries At first, we will sketch the overall process of executing index-building queries. The main objective of our approach is to improve the execution times of (possible future) queries by creating useful indexes automatically. As creating indexes without limits could exhaust the available database space, we assume an index pool – an index space of limited size acting as a persistent index cache. The size of this pool is configured by the DBA as a system parameter. Based on this assumption a query is processed as follows: (1) For a given query Q the potentially useful indexes are determined. (2) For query Q a cost-optimized query plan is derived. (3) The query Q is re-optimized using the virtual indexes based on step (1). (4) The profits of sets of virtual indexes are computed as the difference of the costs of the plans from step (2) and from step (3). The index set with the highest profit is called index recommendation and is used to update a global index configuration where cumulative profits of all indexes (both materialized and virtual) are maintained. From this index configuration we have finally to decide about: creating indexes from the virtual index set, and replacing other indexes from the index pool if there is not enough space for the newly created indexes. The above discussion about processing queries leaves out several important issues. First, it should be noted that step (2) and (3) can be merged. An optimizer based on the usual dynamic programming approach can consider relation access via virtual indexes in the first iteration. The only required modifications to the optimization algorithm are to generate access plans with virtual indexes if a table scan operator was chosen and to not prune a plan if only plans with virtual indexes are better. Thus, the result of the optimization step consists of at least two plans: a plan without virtual indexes and one or more plans using virtual indexes. A second issue is the “self-interest” of a query. If we consider only the best plan generated in step (3) we are able to find only an index set contributing to the current query Q because we try to maximize the benefit of this query (local optimization). If we would consider all index sets from step (2) that provide a positive profit or at least no high loss, we could create indexes that are possibly useful in the future (i.e. for other queries), too. However, this global optimization requires to consider more index sets. Finally, under the assumption of a space-limited index pool it can be necessary to replace existing indexes in the pool by other indexes if the new virtual indexes promise a higher benefit than the old one. For this purpose, different strategies are possible. Beside classical replacement strategies which have been developed over the past years (e.g. LRU, LFU), the profit of an index can be taken into account. However, this requires to maintain statistics about global profits, e.g. by monitoring and cumulating local profits of an index for different queries. The query processing described above is to some limited extent supported by current database management systems. Virtual indexes, existing for instance in Oracle, allow the “as if”-runs of the optimizer as in step (3) to check the usefulness of indexes without materializing them. Relying on such a virtual optimization, we still have to find possibly applicable index sets. The DB2 optimizer goes one step further by providing index recommendations covering most of steps (1) to (3). ized and virtual indexes considered so far in the index catalog D = {i1 , . . . , ik }. Here, for each index i the following information is kept: • i.benefit is the benefit, i.e. the cumulative profit, of the index, • i.type ∈ {0, 1} denotes the type of index, with i.type = 1, if i is materialized and 0 otherwise, • i.size is the size of the index, which is estimated based on the typical parameters available as databases statistics, e.g. the attribute size and the number of tuples in the relation. The profit of an index set according to a query can be calculated in different ways, as outlined in the following. However, as we used the DB2 system for our evaluation, we could use the optimizer and recommended virtual indexes. For evaluation purposes we used the following technique to extract cost estimations of queries for different index configurations: 1. compute the costs for the query without any indexes except for primary key indexes via the EXPLAIN mode, 2. derive a recommended index set via the RECOMMEND INDEXES mode, 3. compute the cost for the recommended index set. Cost Model For dealing with costs and benefits of indexes as part of automatic index creation we have to distinguish between materialized and virtual (i.e. currently not materialized) indexes. Note, that we do not consider explicitly created indexes such as primary indexes defined by the schema designer. Furthermore, we assume that statistics for both kind of indexes (virtual/materialized) are computed on demand: When a certain index is considered for the first time, statistical information about it is obtained. A set of indexes i1 , . . . , in which are used for processing a query Q is called index set and denoted by I. The set of all virtual indexes of I is virt(I), the set of all materialized indexes is mat(I). Let be cost(Q) the cost for executing query Q using only existing indexes and cost(Q, I) the cost of processing Q using in addition indexes from I. Then, the profit of I for processing query Q is simply profit(Q, I) = cost(Q) − cost(Q, I) Obviously, if virt(I) = ∅ then profit(Q, I) = 0. In order to evaluate the benefit of creating certain indexes for other queries or to choose among several possible indexes for materialization we have to maintain information about them. Thus, we collect the set of all material- This way, we cannot only derive the potential profit of an index set, but the advisor mode of the DB2 optimizer also provides statistical information such as the cardinality and the number of leaf nodes that allow a precise estimation of the index size required for our strategies. Note that we use the cost model of the underlying DBMS. Thus, we can guarantee that our profit estimations are as accurate as the estimated query costs. The subset of D comprising all materialized indexes is called index configuration C = mat(D). For such a configuration X size(C) = i.size ≤ MAX SIZE i∈C holds, i.e., the size of the configuration is less or equal the maximum size of the index pool. By maintaining cumulative profit and cost information about all possible indexes we are able to determine an index configuration optimal for a given (historical) query workload. Assuming this workload is also representative for the near future, the problem of index creation is basically the problem of finding an index configuration Cnew which maximizes the overall benefit: X max i.benefit i∈Cnew This can be achieved by materializing virtual indexes (i.e. add them to the current configuration) and/or replace existing indexes. In order to avoid thrashing, a replacement is performed only if the difference between the benefit of the new configuration Cnew and the benefit of the current configuration Ccurr is above a given threshold. Here,P the benefit of a configuration is computed by benefit(C) = i∈C i.benefit. In addition, we have to take into account the cost building the new indexes costbuild (i) which appear as negative profit: So far we have focused on the analysis of a given query and how to maintain statistics of data gathered from virtual optimization. Now the question arises: is it necessary to update the materialized index configuration? If an index set I can replace a subset Irepl ⊆ C = mat(D) of the currently materialized index configuration, such that benefit(C ∪ I \ Irepl ) − benefit(C) − X costbuild (i) > MIN DIFF ∧ i∈virt(I) benefit(Cnew ) − benefit(Ccurr ) − X costbuild (i) > MIN DIFF i∈virt(Cnew ) Considering the cumulative profit of an index as a criterion for decisions about a globally optimal index configuration raises an issue related to the historic aspects of the gathered statistics. Assuming that future queries are most similar to the most recent workload, because database usage changes in a medium or long term, the statistics have to represent the current workload as exactly as possible. Less recently gathered statistics should have less impact on building indexes for future use. Therefore, we applied an aging strategy for cumulative profit statistics based on an idea presented by O’Neil et al. in [11]. Criteria for Index Selection As described before, it is easy to decide whether a query can locally benefit from a certain index configuration by quantifying the profit of feasible index combinations using virtual optimization. In order to globally decide about an optimal index configuration for future queries, the information about possible profits has to be gathered, condensed and maintained to best represent the current workload of the system, and finally based on these information a decision has to be made if an index configuration can be changed at a certain point in time. While processing a query Q the statistics must be updated by adding profits to each involved index. At this point we considered various strategies for assigning profits to each index involved. One alternative relates to the fact, that there may be various combinations of indexable attributes yielding a profit during virtual optimization. In this case, we can either add profits for all minimal index sets Ii yielding a profit, or dd only profits for the minimal index set that is locally optimal, i.e. yields the most profit, where an index set Ii is minimal, if there is no index set Ij ⊂ Ii yielding the same profit. While the former yields a more complete picture of possible gains of certain index configurations, the latter introduces less overhead while over large workloads still providing a reasonable approximation of configuration benefits. size(C ∪ I \ Irepl ) < MAX SIZE holds, indexes in Irepl can be dropped and those in I can be created. These conditions allow only improvements of the index configurations according to the current workload and conforming to our requirements regarding index space, and the criterion to avoid thrashing. For choosing I from locally beneficial index sets we considered two strategies: from the beneficial index sets choose only the locally optimal index set, or check all beneficial index sets for a possibly globally optimal configuration. The replacement index set Irepl is computed using the currently materialized index set C = mat(D) applying a greedy approach. To do this, we sort C ascending to a replacement criterion and choose the least beneficial indexes, until our space requirements are fulfilled. As replacement criteria we considered the number of references for an index, the cumulative profit of an index, and the ratio of profit per query or reference of an index. Now, if the found replacement candidate is significantly less beneficial than the index set we investigate for a possible materialization, the index configuration can either be changed during query execution as described or scheduled to be changed later on. In our implementation and experiments we only considered strategies based on locally optimal index sets. First of all, the number of indexes which have to be taken into account during processing is much smaller and therefore this approach has a significant lower overhead. Secondly, this strategy can be easily implemented on top of index recommendation facilities provided by commercial DBMS such as the index advisor of DB2, which recommends only the best index set for a given query or workload respectively. 4. Dynamic Self-tuning Index Structures The approach for self-tuning index configurations described in the previous section provides a solution for continuous tuning on the level of index configurations, where configurations are a set of common index structures. In this section we will motivate our current research approach, that moves the solution of the problem at hand to the level of the index structures. Though the impact on DBMS implementations is quite severe in this case, we will illustrate that more efficient and more organic solutions can be realized by rethinking the kind of indexes used in DBMS. Motivation for new Index Structures While indexing in databases using derivatives of BTrees, R-Trees, Grid Files, Tries, and many others has been applied successfully for years, obviously these index structures were not conceived having self-tuning in mind. This is for the following reasons. Data-balanced instead of access-balanced structure: with current DBMS index structures data access is optimized for all existing values of a certain, possibly multidimensional, range/domain. This is done to achieve O(1) or O(log n) complexity for accessing a single data unit, i.e. a page, tuple, object, etc. Nevertheless, this is done not considering if the data unit is accessed very often, rarely, or not at all. Hence, the benefit of the index structure is greater for data accessed more frequently than for data accessed less often, though both come at the same price, i.e. used system resources. Coarse granularity of tuning: with current index structures it is either all or nothing: an index can be created on an access path or not. Though the system may benefit for instance from partial indexes as proposed by Stonebraker in [16] or incomplete and sparse indexes, this is not implemented in most current DBMS. Furthermore, compared to the size the overall benefit of an incomplete index grows with O(log size), i.e. the bigger an index grows the more the gained profit ∆benefit/∆size declines. Considering the often tremendous disk space requirements, index structures suitable for self-tuning should be able to grow or even shrink on demand. Unawareness of system resource usage: related to the previously mentioned aspects, currently used index structures are not suitable to consider space limitations during runtime. Though index sizes can be estimated very precisely before creation, the index itself cannot comply to space restrictions during runtime, which would be required to realize adaptive index configurations. A Simple Adaptable Binary Tree To illustrate a possible approach and get an overview of all the related issues we implemented a simple index structure based on a binary tree which adapts to its usage during runtime, by 1. having a adjustable size, i.e. number of nodes, that can be increased or decreased during runtime, which results in a sparse tree with page containers as leaves, and 2. being access-balanced, i.e. the tree is deeper for data regions which are accessed more often to minimize page accesses. An example tree is sketched in Figure 1. Root node with overall page access statistics Page container with local page access statistics 51 25 74 38 28 63 44 55 Figure 1. An access-balanced binary tree The sparse index has page containers as leaf nodes, containing pages of tuples with keys of the value range specified by the tree. This way, currently data is organized by the index. The implications for secondary indexes and multidimensional indexing schemes are discussed later on. Nevertheless, the current approach allows assigning system resources to the tree by increasing or decreasing the allowed number of nodes within the tree. The lookup in the tree works as usual, whereas the lookup in the page containers requires a sequential scan through the pages. The reorganization and access-based balancing can only be sketched roughly due to space limitations. In principle it works as follows: the root node keeps track of all page accesses and the current size of the tree. Accordingly it computes a balance balance = all page accesses number of page containers for all page containers. For the page containers pc the number of all pageAccesses(pc) is stored. If a leaf node with two page containers pc1 and pc2 fulfills pageAccesses(pc1 ) + pageAccesses(pc2 ) < balance it is removed and added to a pool of free nodes, which is maintained as a number of free nodes in the tree root. The two page containers are merged and linked to the predecessor in the tree. If the number of page accesses of a single page container fulfills pageAccesses(pc1 ) > 2 ∗ balance Number of accessed pages (cumulative) 1600000 1400000 fully balanced access-balanced 1200000 1000000 800000 600000 400000 200000 0 0 200000 400000 600000 800000 Number of queries 1000000 Figure 2. Comparison of page accesses In Figure 2 the results for page accesses are shown. Because it would not make sense to compare our approach to a dense index, we instead compare to an alternative implementation of a fixed size tree without the automatic access-based balancing, which is fully balanced from the beginning. Though in both cases the complexity of a single lookup remains in the order of O(log n), the accessbalanced tree outperforms the fully balanced tree with a linear factor because access to more often used data is more efficient. 900 800 Number of tree reorganizations and, furthermore, there are free nodes in the node pool available, a new node is inserted with the median key from the container, and the container is split into two according new containers. Growing the tree works simply by increasing the free node pool variable and relying on the previously sketched reorganization algorithms. Shrinking can be done the same way, but also may be forced to immediately regain resources by removing nodes and merging their page containers having the least page accesses. Currently, these reorganization as well as statistic updates are performed during lookup operations. For reasons discussed later on they may also be deferred to avoid concurrency problems. Based on this reorganization schema, the tree can adjust to size restrictions as well as data usage, e.g. the example tree in Figure 1 may index data with keys uniformly distributed in the range of 1 to 100, while accesses are normally distributed with a mean key value of 50, i.e. there are far more accesses in the middle of the range. Therefore, the tree is deeper towards this range and has smaller page containers to grant efficient lookup of often accessed data. For experimental evaluation we tested a similar scenario of uniformly distributed data (100000 tuples, approx. 100 tuples per page, tree with 1000 nodes) and normally distributed exact match lookups. 700 600 500 400 300 200 100 0 0 200000 400000 600000 Number of queries 800000 1000000 Figure 3. Number of tree reorganizations In Figure 3 we illustrate the number of tree reorganizations triggered by the number of exact match lookups. After ca 100000 lookups the tree reaches a relatively stable structure with very few reorganizations required afterwards. Nevertheless, we could not yet quantify the major advantage of the proposed approach: instead of choosing to create indexes within a configuration or not, we now can assign space to indexes where needed within a more complex indexing schema. And, as discussed before, even small indexes may provide a huge benefit. This way, having a great number of those small indexes is preferable to having a small number of very huge indexes. An evaluation of these aspects, apart from solving the problems described in the following, requires a real life scenario with multiple tables, columns, etc., e.g. the TPC benchmarks and a full implementation of the global indexing schema. Because we cannot build our test environment based on available DBMS technology due to the strong difference of our approach, this is a very complex task we are dealing with in our current research. Open Issues It is quite obvious that there are a great number of open problems and questions with the approach, which we are currently trying to resolve. For our future work we will focus on the following aspects: An overall indexing schema: the basic ideas outlined before mostly only refer to new indexes, which must be the building blocks of an indexing schema covering the task of index tuning for a full database, for instance by deciding about whether a certain index makes sense at all or assigning space resources to indexes. A simplistic approach would be to support all indexable columns with indexes, and control the tuning only via resource assignment, e.g. the previously presented binary tree also allows trees with size 0, i.e. no nodes and only a list of pages. Data structures suitable for DBMS storage: of course binary trees are not the optimal choice for a DBMS index structure. Because we balance based on access characteristics and not on the data distribution, the only interesting property of B-Trees is the fixed node size that fits well with the storage management of a DBMS. If we want to support multiple (or even all possible) access paths, the best solution are multidimensional index structures, such as Grid Files or kdB-Trees. Nevertheless, these structures will have to be adjusted to be access-balanced and to serve as sparse indexes to support resource assignment. Index-organized data vs. secondary indexes: to support multiple access paths in current DBMS most often secondary indexes are used. These can cause a tremendous space overhead for sparse indexes, because possibly very long lists of tuple identifiers are required even for small indexes. To solve this issue, our current research again focuses on multi-dimensional index structures which are used to organize data on secondary storage similar to Grid Files. As an alternative we consider Bitmap indexes, but for these the well-known problems regarding the update granularity would have to be resolved. Concurrency issues: concurrency issues can result from keeping often updated statistics within the index structures. For instance, the points where statistics are kept within the binary tree are marked grey in Figure 1. Our goal is to isolate such hot spots as much as possible. On the other hand, the statistics do not necessarily underly strong transaction requirements and therefore can be managed by the DBMS separately. Further concurrency problems relate to the reorganization of data and index structures, which may also be triggered by read-only accesses. A possible solution would be deferred reorganizations. Operations based on access-balanced index structures: as a consequence of new index structures, index based operations and their implementation have to be investigated. Simple examples would be range scans, sorting and ordered scans, merge joins, etc. Possibilities of extending commercial DBMS: because the basic principles of the proposed approach are quite oppositional to index management and usage in current DBMS, the integration of according concepts in commercial solutions will not be an easy task. Therefore, we investigate the transfer of partial results, for instance based on partial or multi-layered indexes, into existing systems. Evaluation: as pointed out before, the introduced concepts require a thorough evaluation with a complex test environment. On the one hand, index structures must be evaluated regarding the advantages of access-based balancing as well as the overhead and speed of reorganization. Finally, the concept of sparse indexing must be evaluated separately to quantify the performance of the overall indexing schema. 5. Related Work Index tuning in general – whether carried out by experts, supported by tools, or even accomplished autonomously by the DBMS – always comprises the selection of an index configuration providing the highest possible system efficiency. Autonomous index tuning furthermore requires support for index replacement strategies. The selection of an index configuration is an important task of physical database design. However, this problem is considered only during design time in literature and in current practice, for overviews see [18, 8]. Our approach of building indexes during query processing has some different characteristics and challenges: early adaption to a current workload, iterative update of the index statistics, index replacement strategies, and possible usage of table scans of a query for index building. Therefore, there are many similarities to other self-tuning features of a database system, for instance the cache and buffer management. There are several academic approaches as well as database products for advising index selections [4, 3, 12, 2, 15, 5]. A common approach is the analysis of a workload given by the database administrator or by former queries from a log file. Using this approach several techniques were developed which use either a separate cost model or rely on the optimizer estimates. The works [3, 6] belong to the class of techniques which make use of a separate cost model. Relying on a stand-alone cost model has the important disadvantage, that the tool cannot exactly estimate the real system behavior. In contrast, an optimizer-based approach works directly with the system’s estimations. The work of [6] also deals with adaption to changing workloads at run time, but it is based on its own cost model, in contrast our approach is based on optimizer estimations. An early realization of the optimizer-based approach is described in [4]. This work describes the design tool DBDSGN, which relies on the System R optimizer and computes for a given workload an optimal index configuration. This approach inspired the index wizards and advisors of current database management systems [2, 17, 1]. The work described in [10] deals with view and index selection in a data warehouse environment. It combines a cost-based method with a set of rules of thumb. The cost- based technique uses an A* algorithm, but it does not sufficiently reduce the search space for real world scenarios, which leads to the rule set. The authors of [5, 7] propose another technique for index selection for OLAP. Here, indexes are considered during the selection of materialized views. The cost model is based on the estimated number of returned rows of a query and is independent from the optimizer. As an optimization algorithm the authors used a greedy algorithm. The greedy behavior prevents the discovery of index interaction, e.g. in a merge-join. Therefore, Chaudhuri and Narasayya included in their index selection an exhaustive search for the best configurations [2]. Furthermore, the algorithm in [2] consists of two phases: enumerating possible configurations from every single query of the workload and subsequently, selecting the final configuration by using the mentioned combined greedy approach with an optimizer-based cost model. 6. Conclusion Index tuning is – among others – an important task for fulfilling the query execution time requirements of modern database applications. Today’s commercial database systems support this task with so-called index wizards or advisors recommending indexes based on a given workload. In our previous research we argued that this support can be further improved by a tight integration of query monitoring, index selection and building with the actual query processing. Based on a cost model we presented different strategies for identifying potentially beneficial indexes, maintaining statistics on index profits and deciding about indexes for creating and/or dropping. Furthermore, we discussed how this approach can be implemented on top of a commercial DBMS providing basic facilities for index recommendation. Though, in our implementation we exploited some special features of DB2 the approach basically can be ported to other systems providing similar support for index recommendation. Finally, to overcome some problems with the previous approach and provide a more natural solution we proposed moving the index tuning problem from the configuration layer to the index layer itself. For this purpose, we illustrated some advantages of the new approach based on a simple, flexible, and access-balanced binary tree index. Because this is research work that started only recently, we gave an overview of the issues to be considered for selftuning index structures. References [1] Oracle9i Database Online Documentation, Release 2 (9.2), 2002. [2] S. Chaudhuri and V. R. Narasayya. An Efficient CostDriven Index Selection Tool for Microsoft SQL Server. In VLDB’1997, pages 146–155, 1997. [3] S. Choenni, H. M. Blanken, and T. Chang. On the Selection of Secondary Indices in Relational Databases. DKE, 11(3):207 – 234, December 1993. [4] S. J. Finkelstein, M. Schkolnick, and P. Tiberio. Physical Database Design for Relational Databases. TODS, 13(1):91– 128, 1988. [5] H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. Index Selection for OLAP. In ICDE’1997, pages 208–219, 1997. [6] M. Hammer and A. Chan. Index Selection in a Self-Adaptive Data Base Management System. In SIGMOD’1976, pages 1–8, 1976. [7] V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing Data Cubes Efficiently. In SIGMOD’1996, pages 205–216, 1996. [8] IEEE. Bulletin of the Technical Committee on Data Engineering, volume 22, June 1999. [9] T. Kraft, H. Schwarz, R. Rantzau, and B. Mitschang. CoarseGrained Optimization: Techniques for Rewriting SQL Statement Sequences. In VLDB’2003, pages 488–499, 2003. [10] W. Labio, D. Quass, and B. Adelberg. Physical Database Design for Data Warehouses. In ICDE’1997, pages 277–288, 1997. [11] E. J. O’Neil, P. E. O’Neil, and G. Weikum. The LRU-K Page Replacement Algorithm For Database Disk Buffering. In SIGMOD’1993, pages 297–306, 1993. [12] S. Rozen and D. Shasha. A Framework for Automating Physical Database Design. In VLDB’1991, pages 401–411, 1991. [13] K. Sattler, I. Geist, and E. Schallehn. QUIET: Continuous Query-driven Index Tuning (Software Demonstration). In VLDB’2003, pages 1129–1132, 2003. [14] K. Sattler, E. Schallehn, and I. Geist. Autonomous querydriven index tuning. In Proc. Int. Database Engineering and Applications Symposium (IDEAS 2004), Coimbra, Portugal, pages 439–448, July 2004. [15] B. Schiefer and G. Valentin. DB2 Universal Database Performance Tuning. Bulletin of the Technical Committee on Data Engineering, 22(2):12–19, June 1999. [16] M. Stonebraker. The case for partial indexes. Sigmod Record, 18(4):4–11, Dec. 1989. [17] G. Valentin, M. Zuliani, D. Zilio, G. Lohman, and A. Skelley. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE’2000, pages 101–110, Mar. 2000. [18] G. Weikum, A. Mönkeberg, C. Hasse, and P. Zabback. Selftuning Database Technology and Information Services: From Wishful Thinking to Viable Engineering. In VLDB’2002, pages 20 – 31, 2002.