Database Self-Management: Taming the Monster

2011

Database Self-Management: Taming the Monster

Mohammed Abouzour, Ivan T. Bowman, Peter Bumbulis, David E. DeHaan, Anil K. Goel, Anisoara Nica, G. N. Paulley, John Smirnios
Sybase, An SAP Company

Abstract

We describe our experience in the development of self-managing technologies in Sybase SQL Anywhere, a full-featured relational database system that is commonly embedded with applications developed by Independent Software Vendors (ISVs). We illustrate how SQL Anywhere’s embeddability features work in concert to provide a robust data management solution in zero-administration environments.

1 Introduction

Sybase SQL Anywhere is a full-function, ISO SQL standard-compliant relational database server designed to work in a variety of frontline environments, from traditional server-class back-office installations to handheld devices running the Windows CE operating system. SQL Anywhere includes features common to enterprise-class database management systems, such as intra-query parallelism, materialized views, OLAP functionality, stored procedures, triggers, full-text search, and hot failover. SQL Anywhere supports a variety of 32- and 64-bit hardware platforms, including Microsoft Windows; various flavours of UNIX including Linux, Solaris, AIX, HP-UX, and Apple OS X; and Windows CE. However, the strength of SQL Anywhere is in its self-managing technologies, which allow the server to offer SQL data management, query processing, and synchronization capabilities in zero-administration environments. In this paper, we describe our experience with some of the self-managing technologies present in SQL Anywhere. These self-managing technologies are fundamental components of the server, not merely administrative add-ons that assist a database administrator in configuring the server’s operation—since in embedded environments there is often no trained administrator to do so.
These technologies work in concert to offer the level of self-management and adaptiveness that embedded application software requires. It is, in our view, impossible to achieve effective self-management by considering these technologies in isolation.

2 Query Optimization

Certainly one of the most significant areas within the SQL Anywhere server that concerns self-managing behaviour is the SQL Anywhere query optimizer [1–3]. A significant portion of SQL Anywhere revenue stems from Independent Software Vendors (ISVs) who embed SQL Anywhere with their applications, some of which are deployed on thousands, and in some cases millions, of computers. These types of deployments typically share the following characteristics:

Copyright 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering

• There is no DBA available to ‘tune’ the application, to run workload analysis tools, perform capacity planning analysis, or monitor the system.
• The database workload is mixed, with both OLTP-like transactions and reporting queries mixed together. The nature of embedded applications renders workload partitioning, or sharding, moot.
• There is little correlation between the size of the database instance and the size of the schema, or the complexity of the application’s queries. SQL queries with a join degree of 15 to 20 are routine. SQL queries with 50 or more total quantifiers (tables, views, or derived tables) are common enough to be unsurprising.
• In turn, there is often little correlation between a query’s complexity and its execution time.
In some cases, even complex queries can execute very quickly, particularly over small instances. In such cases, the optimization process must be very efficient. For these queries, the optimizer should not spend ten seconds (for example) optimizing a query that will run in under one second unless this time can be amortized over many executions.
• The database server executes concurrently with other applications on the same computer. Hence the server must co-operate with these applications for limited resources, particularly memory.
• The system hardware, operating system platform, and application workload characteristics, including peak-to-average ratios, may vary from installation to installation, making a priori application tuning unhelpful.

Moreover, a significant trend in application development over the past several years is the use of Object-Relational Mapping tools, such as Hibernate and NHibernate, and complex application development environments such as Microsoft Entity Framework, that map a relational schema into object-oriented abstractions within the application [4]. As a result, these frameworks can generate a mix of both simple and complex queries whose construction is largely outside the control of the application developer. Hence the ability to syntactically annotate SQL statements with tuning parameters is largely absent with the use of these frameworks. As a result of the above constraints, we ‘tamed the monster’ by engineering the SQL Anywhere optimizer to be relatively inexpensive—in terms of both memory and CPU consumption—so that each SQL request can be optimized while taking into account the server context at that particular moment [5]. Server context includes the amount of available query memory for query execution, the number of available worker threads for intra-query parallelism, and the approximate contents of the server’s buffer pool, which directly influences I/O cost.
One of SQL Anywhere’s enumeration methods is a branch-and-bound algorithm [1], which is cheap due to careful engineering to minimize memory allocation and aggressively prune sub-optimal execution plans. Over time, and due in part to the ever-increasing sophistication of SQL Anywhere applications—particularly those that involve synchronization—along with the ever-expanding SQL language and the explosion in the use of Object-Relational mapping tools, SQL Anywhere’s query optimizer has become increasingly sophisticated—and self-adapting. Included in the optimization process are sophisticated static and cost-based query rewrite optimizations [6, 7], including those for optimizing queries over materialized views [3]; the detailed analysis of a wide range of physical query execution operators; sophisticated selectivity and cardinality estimation; and an exploration of a space of possible plans that include parallelization. Our server software has been spectacularly successful at meeting the needs of our customers, whether they deploy SQL Anywhere server software on 64-bit installations of Windows 2008 Server, or the identical server software on Windows CE handheld devices.

Experience. However, despite the fact that the SQL Anywhere optimizer is relatively cheap, with this greater sophistication comes longer code paths, and consequently greater overhead, with each new SQL Anywhere release. OPEN time for a request is the critical measure for many applications, which includes the time to construct the statement’s necessary abstractions within the server in addition to optimization time per se. As with other commercial relational database systems, we too have tried to mitigate this additional overhead, in largely two ways. The first technique is execution plan caching, which amortizes the optimization cost over multiple invocations of the same SQL statement.
The second is ‘bypass’ optimization for ‘simple’ SQL statements where the best execution strategy can be determined heuristically. These mitigation techniques have been difficult to ‘get right’ in the sense that errors in selecting which plans to cache (or which cached plans to abandon), or precisely which queries will be optimized heuristically, lead to inconsistencies in query execution times. This inconsistency, coupled with a cost model that takes server context into account, yields another ‘monster’ that we term the ‘dancing optimizer’, named after the stereotypical dancing Russian circus bear. We are actively researching new optimization techniques to tame this monster. The usage patterns of SQL Anywhere described above dictate that the optimization process continue to adapt to available resources, query complexity, and estimated query cost. Join enumeration algorithms can be extremely expensive for complex queries—for example, the exhaustive enumeration of the space of bushy trees—regardless of how small the estimated runtime is. Moreover, the memory consumption of these algorithms may be prohibitive. One important lesson learned from designing the SQL Anywhere optimizer is that there is no ideal join enumeration algorithm which will perform well—that is, generate good-quality access plans in a CPU- and memory-efficient way—for all types of queries, or available system resources. A robust design is to have a set of join enumeration algorithms to choose from when optimizing a particular query, based on the available system resources, query complexity, and its estimated runtime [2]. Query complexity, for example, can be used to predict the memory needed for the memoization process as well as the CPU required for enumeration. Efficient cost-based and heuristic pruning should be employed to adjust the optimization time to the expected query cost. 
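The enumeration strategy described above can be made concrete with a small sketch. The following is a minimal branch-and-bound search over left-deep join orders; the `cost` callback and the tie-breaking heuristic are hypothetical stand-ins for an optimizer's cost model, not SQL Anywhere's actual implementation.

```python
def best_join_order(tables, cost):
    """Branch-and-bound search over left-deep join orders.

    `cost(order)` estimates the cost of joining the tables in `order`
    left to right. Because cost grows monotonically as a prefix is
    extended, any partial order already costlier than the best complete
    plan found so far can be pruned without exploring its completions.
    """
    best_order, best_cost = None, float("inf")

    def extend(prefix, remaining, prefix_cost):
        nonlocal best_order, best_cost
        if prefix_cost >= best_cost:
            return                       # prune this subtree
        if not remaining:
            best_order, best_cost = prefix, prefix_cost
            return
        # Try cheaper extensions first so good plans tighten the bound early.
        for t in sorted(remaining, key=lambda t: cost(prefix + (t,))):
            extend(prefix + (t,), remaining - {t}, cost(prefix + (t,)))

    extend((), frozenset(tables), 0.0)
    return best_order, best_cost
```

With a toy cost function that sums intermediate-result sizes, the search places small tables early and prunes most of the factorial search space without visiting it.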
In addition, we are researching diagnostic tools with which to support the analysis of performance issues in the field. One novel capability of the SQL Anywhere optimizer is to log the set of rejected plans for a given SQL statement, so that these may be compared with the plan that was eventually chosen [8]. We are also continuing to research techniques that will permit the diagnosis of performance issues, due to sub-optimal execution plans chosen by the optimizer, when the customer’s database is unavailable (for example, due to regulatory restrictions). We have augmented the execution plan information output by the query optimizer to contain information ranging from the cost-model computations to the cost-based and heuristic decisions made by the optimizer during join enumeration. These traces of the exploration of the search space can be analyzed in isolation, without the customer database, and also can be used to generate synthetic data to simulate the original database instance.

3 Self-managing Statistics

SQL Anywhere has used query feedback techniques since 1992 to automatically gather statistics during query processing. Rather than require explicit generation of statistics, by scanning or sampling persistent data, the server automatically collects statistics as part of query execution. More recent work by other researchers [9–13] has also exploited this notion of collecting statistics as a by-product of query execution. In its early releases, SQL Anywhere computed frequent-value statistics from the evaluation of equality and IS NULL predicates, and stored these statistics persistently in the database for use in the optimization of subsequent queries. Later, support was added for inequality and LIKE conditions. An underlying assumption of this model is that the data distribution is skewed; values that do not appear as a frequent-value statistic are assumed to be in the ‘tail’ of the distribution.
Today, SQL Anywhere uses a variety of techniques to gather statistics and maintain them automatically. These include index statistics, index probing, analysis of referential integrity constraints, three types of single-column self-managing histograms, and join histograms.

Histogram implementation. SQL Anywhere histograms are equi-depth histograms whose number of buckets expands and contracts dynamically as column distribution changes are detected. As is typical, we use the uniform distribution assumption when interpolating within a bucket. SQL Anywhere histograms combine traditional buckets with frequent-value statistics, known as ‘singleton buckets’. A value that constitutes at least 1% of the column, or is among the ‘top N’ most frequent values, is saved as a singleton bucket. The number of singletons retained in any histogram depends on the size of the table and the column’s distribution, but lies in the range [0, 100]. An additional metric, which we term histogram density, is maintained for each histogram. Histogram density is an estimate of the average selectivity of all values in the column that are not covered by singleton buckets. In other words, the histogram density is the average selectivity of individual values in the tail of the distribution. Thus density gives an approximate selectivity estimate for an equality predicate whose value falls within a non-singleton bucket. An advantage of our definition of histogram density is that it does not reflect the high-frequency outliers in the data distribution, so a single average value will serve as an accurate estimate for many data distributions. For efficiency and simplicity, the same infrastructure is used for all data types except longer string and binary data. An order-preserving hash function, whose range is a double-precision floating point value, is used to derive the bucket boundaries on these data types.
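The interplay of equi-depth buckets, singleton buckets, and density can be illustrated with a simplified sketch. The class below is an illustrative model under the assumptions stated in its docstring, not SQL Anywhere's data structure; all names are hypothetical.

```python
class Histogram:
    """Simplified single-column histogram: equi-depth range buckets,
    'singleton' buckets for frequent values, and a density metric giving
    the average per-value selectivity in the tail of the distribution."""

    def __init__(self, buckets, singletons, density):
        self.buckets = buckets              # list of (low, high, fraction_of_rows)
        self.singletons = dict(singletons)  # frequent value -> selectivity
        self.density = density              # avg selectivity of a tail value

    def equality_selectivity(self, value):
        # A frequent value has its own stored estimate; any other value is
        # assumed to lie in the tail, where density is the average estimate.
        return self.singletons.get(value, self.density)

    def range_selectivity(self, lo, hi):
        # Uniform-distribution assumption when interpolating within a bucket.
        sel = 0.0
        for b_lo, b_hi, frac in self.buckets:
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            if overlap > 0.0:
                sel += frac * overlap / (b_hi - b_lo)
        return sel
```

Because density averages only the tail, a single high-frequency outlier stored as a singleton does not distort estimates for the many unremarkable values around it.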
For longer string and binary data types, SQL Anywhere uses a different infrastructure that dynamically maintains a list of observed predicates and their selectivities. In addition, not only are buckets created for entire string values, but also for LIKE patterns intended to match a ‘word’ somewhere in the string.

Statistics collection during query processing. Histograms are created automatically when data is loaded into the database using a LOAD TABLE statement, when an index is created, or when an explicit CREATE STATISTICS statement is executed. A modified version of Greenwald’s algorithm [14] is used to create the cumulative distribution function for each column. Our modifications significantly reduce the overhead of statistics collection with a marginal reduction in quality. In addition to these bulk operations, histograms are automatically updated during query processing. During normal operation, the evaluation of (almost) any predicate over a base column can lead to an update of the histogram for this column. INSERT, UPDATE, and DELETE statements also update the histograms for the modified columns. Each histogram’s density metric is updated using a moving average with exponential smoothing. That is, given histogram density d and an observed value selectivity v, we take the new histogram density to be d + λ(v − d), where λ is a small constant. We apply this procedure to observed selectivities due to query feedback from equality predicates, and also for selectivities of singleton buckets that are removed from the histogram.

Experience. Techniques such as query feedback underpin SQL Anywhere’s ability to perform well in zero-administration environments. However, a monster lurks: these self-managing techniques would occasionally fail to satisfactorily maintain high-quality statistics. Part of the issue is that statistics are modified on a per-connection basis, so concurrent transactions can result in incorrect histogram modifications.
Another issue is that, to avoid concurrency bottlenecks, histogram maintenance is not transactional. These issues have prompted the development of ‘self-healing’ measures to identify inaccurate statistics, determine appropriate remedies to correct the problem, and schedule those remedies for execution, all without DBA intervention. The SQL Anywhere server automatically compares estimated predicate selectivities against actual values, and when the amount or frequency of error is considered high enough, determines how the error should be rectified. One technique is to replace outright the current histogram with a reconstructed one based on query feedback. If the error is more widespread, however, the server may schedule one of several different piggybacking mechanisms (cf. reference [13]) to reconstruct portions of the histogram through (a) piggybacking off a user query, (b) index sampling, or (c) stratified table scan sampling. In addition, the server tracks the frequency with which specific histograms are used, and concentrates its maintenance efforts on those histograms most critical to the user’s workload. These feedback mechanisms, along with on-the-fly, real-time index and table statistics, such as the number of distinct values, number of leaf pages, and clustering statistics, have greatly improved the accuracy of the statistics used for query optimization in SQL Anywhere. Recent customer experience strongly suggests that these techniques have significantly reduced the need for manual or scheduled intervention to correct statistical errors. Nonetheless, we are continuing to enhance our histogram maintenance techniques, including the development of analysis and reporting tools to enable diagnosis of histogram and other statistical anomalies without requiring access to the raw customer data.
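The density update rule d + λ(v − d) described above is a standard exponentially smoothed moving average; the sketch below shows the arithmetic. The value of λ here is an arbitrary illustration, not SQL Anywhere's actual constant.

```python
def update_density(density, observed_selectivity, lam=0.05):
    """One exponential-smoothing step: d + λ(v − d).

    Small λ means each observation nudges the density only slightly,
    so transient mis-estimates from concurrent feedback are damped."""
    return density + lam * (observed_selectivity - density)

# Repeated feedback from equality predicates gradually pulls the stored
# density toward the selectivities actually observed at run time.
d = 0.10
for v in (0.02, 0.02, 0.02):
    d = update_density(d, v)
```

After three observations of 0.02, the density has moved from 0.10 to roughly 0.089; in the limit it converges to the observed value.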
4 Adaptive Query Execution

SQL Anywhere uses feedback control to ‘tune’ buffer pool memory to meet overall system requirements, including memory demands from other applications. The feedback control mechanism uses the OS working set size, the amount of free physical memory, and the buffer miss rate as its inputs, which are ordinarily polled once per minute but are polled much more frequently at server startup in anticipation of application activity. Since the buffer pool size can grow—or shrink—on demand, query execution must be able to adapt to execution-time changes in the amount of available physical memory. All page frames in SQL Anywhere’s heterogeneous buffer pool are the same size, and can be used for table and index pages, undo or redo log pages, bitmaps, query memory, or connection heaps. Each type of page has different usage patterns: heap pages, for example, are local to a connection, while table pages are shared. Therefore, the buffer pool’s page replacement algorithm, a modified generalized ‘clock’ algorithm [15], must be aware of differences in reference locality. No attempt is made in SQL Anywhere to support buffer pool partitions. In our view, dynamic changes to the system’s workload that occur frequently in SQL Anywhere deployments render static workload configuration tools ineffective. SQL Anywhere’s query optimizer can automatically annotate a chosen query execution strategy with alternative plan operators that offer a cheaper execution technique if either the optimizer’s choices are found to be suboptimal at run-time, or the operator requires a low-memory strategy at the behest of the memory governor (see below). For example, the optimizer may construct an alternative index-nested loops strategy for a hash join, in case the size of the build input is considerably smaller than expected at optimization time. Hash join, hash-based duplicate elimination, and sorting are examples of operators that have low-memory execution alternatives.
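The buffer-pool feedback loop described above can be sketched as a single polling step. Everything here is a hypothetical illustration of the control idea; the thresholds, step size, and decision rules are invented for the example and are not SQL Anywhere's.

```python
def next_cache_size(cur_pages, free_pages, miss_rate,
                    step=0.05, high_miss=0.05, min_free=1024):
    """One polling step of a (hypothetical) buffer-pool controller.

    Shrink when free physical memory runs low, so the server yields
    memory to other applications on the machine; grow when the buffer
    miss rate is high and the OS still has memory to spare; otherwise
    hold steady."""
    if free_pages < min_free:
        return int(cur_pages * (1 - step))          # give memory back
    if miss_rate > high_miss:
        grow = int(cur_pages * step)
        return cur_pages + min(grow, free_pages - min_free)
    return cur_pages
```

Polling such a rule once per minute (more often at startup) lets the pool track both the workload's miss rate and the memory pressure exerted by co-resident applications.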
A special operator for execution of RECURSIVE UNION is able to switch between several alternative strategies, possibly using a different one for each recursive iteration. To ensure that SQL Anywhere maintains a consistent memory footprint, in-memory data structures used by query processing operators are allocated within heaps that are subject to page replacement within the buffer pool. Moreover, as the buffer pool can shrink during query execution, memory-intensive query execution operators must be able to adapt to changing memory availability. The memory governor controls query execution by limiting a statement’s memory consumption to its quota. Approaching this quota may result in the memory governor requesting query execution operators to free memory if possible. For example, hash-based operations in SQL Anywhere choose a number of buckets based on the expected number of distinct hash keys; the goal is to have a small number of distinct key values per bucket. In turn, buckets are divided uniformly into a small, fixed number of partitions. The number of partitions is selected to provide a balance between I/O behaviour and fanout. During the processing of the hash operation, the governor detects when the query plan needs to stop allocating more memory—that is, it has exceeded its quota. When this happens, the partition with the largest number of rows is evicted from memory. The in-memory rows are written out to a temporary table, and incoming rows that hash to the partition are also written to disk. By selecting the partition with the most rows, the governor frees up the most memory for future processing, in the spirit of other documented approaches in the literature [16]. By paying attention to the memory quota while building the hash table on the smaller input, the server can exploit as much memory as possible, while degrading adaptively if the input does not fit completely in memory.
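The evict-the-largest-partition behaviour can be sketched as follows. This is a toy model of the technique, not SQL Anywhere's code: the quota is counted in rows rather than pages, the partition count is arbitrary, and the "temporary table" is just a list.

```python
NUM_PARTITIONS = 8  # arbitrary for illustration

class HashBuild:
    """Quota-limited hash build: rows are hashed into partitions, and
    when the in-memory row count exceeds the quota, the partition with
    the most rows is spilled to free the most memory in one step."""

    def __init__(self, quota_rows):
        self.quota = quota_rows
        self.partitions = [[] for _ in range(NUM_PARTITIONS)]  # in memory
        self.spilled = [[] for _ in range(NUM_PARTITIONS)]     # on 'disk'
        self.evicted = [False] * NUM_PARTITIONS

    def insert(self, key, row):
        p = hash(key) % NUM_PARTITIONS
        if self.evicted[p]:
            # Incoming rows for an evicted partition go straight to disk.
            self.spilled[p].append((key, row))
            return
        self.partitions[p].append((key, row))
        if sum(len(part) for part in self.partitions) > self.quota:
            self._evict_largest()

    def _evict_largest(self):
        # Spill the partition with the most in-memory rows.
        p = max(range(NUM_PARTITIONS), key=lambda i: len(self.partitions[i]))
        self.spilled[p].extend(self.partitions[p])
        self.partitions[p] = []
        self.evicted[p] = True
```

The invariant the governor preserves is visible here: total in-memory rows never stay above the quota, while no row is lost, which is exactly the "use all the memory you have, degrade gracefully when you run out" behaviour described above.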
In addition, the server also pays attention to the quota while processing the probe input. If the probe input uses memory-intensive operators, their execution may compete with the hash join for memory. If an input operator needs more memory to execute, the memory governor evicts a partition from the consuming operator’s hash table. This approach prevents an input operator from being starved for memory by a consumer operator.

Experience. Adaptive query execution techniques are essential in SQL Anywhere due to the other autonomic systems within it, such as dynamic cache sizing, that can negatively impact query execution strategies. Today, adaptive execution in SQL Anywhere involves trying to salvage an execution strategy when the server context at execution time was not anticipated by the optimizer. This conservative approach ‘tames’ the monster of catastrophically slow execution plans or memory panic situations at execution time. So far, we have not implemented potentially expensive mitigation techniques such as on-the-fly query re-optimization. However, to better manage the tradeoff between conservative resource allocation and optimal performance, we are considering other approaches, such as more sophisticated execution plan caching techniques.

5 Self-tuning Multiprogramming Level

Several commercial database servers (e.g., Microsoft SQL Server, Oracle, and SQL Anywhere) employ a worker-per-request server architecture. In this architecture, a single queue is used for all database server requests from all connections. A worker dequeues a request from the request queue, processes the request, and then returns to service the next available request in the queue, or blocks if the queue is empty. In this configuration, there are no guarantees that a single connection will be serviced by the same worker. This architecture has proved to be more effective in handling large numbers of connections and, if configured properly, has less overhead [17].
The difficult issue with this architecture is how to set the size of the worker pool to achieve good throughput levels [18]. This parameter effectively controls the server’s multiprogramming level (MPL). Setting the server’s MPL is subject to tradeoffs. A large MPL permits the server to process a large number of requests concurrently, but with the risk of thrashing due to hardware or software resource contention (locks) or to excessive context switching between threads [19]. With a small MPL, the server is able to apportion a larger amount of server memory for query execution. The obvious drawback, however, is that the server’s hardware resources are under-utilized. The issue in zero-administration or deployed environments is that setting the server’s MPL a priori is largely impossible because of changes to both the server’s execution environment and the database workload over time. SQL Anywhere overcomes this problem by dynamically adjusting the server’s multiprogramming level to maximize throughput during server operation [20]. To perform the MPL adjustment, the SQL Anywhere kernel uses a feedback controller that oversees two algorithms—a hill climbing algorithm and a global parabola approximation algorithm—that together significantly extend prior work by Heiss and Wagner [21]. The feedback controller periodically switches between the two algorithms. The combined approach is effective at maintaining throughput over varying workloads without large oscillations.

Experience. Hand-tuning of a server’s multiprogramming level is a notoriously difficult task even with well-understood workloads, making MPL tuning a monster to be tamed. In SQL Anywhere, hand-tuning is doubly difficult because SQL Anywhere can assign multiple workers to a single request to achieve intra-query parallelism. The degree of parallelization is determined by the query optimizer in a cost-based manner, made with respect to the characteristics of the system context at optimization time.
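The hill-climbing half of the controller can be sketched as a single adjustment step. This is a generic illustration of throughput hill climbing, not SQL Anywhere's controller: the step size and bounds are invented, and the real kernel alternates this with a parabola-approximation algorithm under a supervising feedback controller.

```python
def next_mpl(mpl, throughput, prev_mpl, prev_throughput,
             step=2, lo=1, hi=256):
    """One hill-climbing step for the multiprogramming level.

    Keep moving the worker-pool size in the direction that last improved
    measured throughput; reverse direction when throughput dropped."""
    direction = 1 if mpl >= prev_mpl else -1
    if throughput < prev_throughput:
        direction = -direction           # last move hurt: back off
    return max(lo, min(hi, mpl + direction * step))
```

Run periodically against measured throughput, such a rule climbs toward the MPL at which the throughput curve peaks, then oscillates gently around it rather than thrashing.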
There is no static partitioning of work amongst the workers assigned to a query. Rather, a parallel query plan is organized so that any active thread can grab and process a small unit of work from anywhere in the data flow tree. The unit of work is usually a page’s worth of rows. It is not important which thread executes a given unit of work; a single thread can execute the entire query if additional threads are not available. This means that parallel queries adapt gracefully to fluctuations in server load. However, autonomic MPL tuning has not slayed the monster outright; in the field, automatic adjustment does not perform well with periodic, ‘bursty’ workloads. Additional research is ongoing to develop better feedback control techniques to solve this problem.

6 Indexing Spatial Data

SQL Anywhere offers support for spatial data, with functionality modelled after the SQL/MM spatial standard [22]. Spatial data is indexed by linear quadtrees [23], which use a Z-curve to map two-dimensional quadtree tiles into 64-bit integer keys suitable for storage in a B-tree index. Query performance over linear quadtrees is heavily impacted by the granularity with which spatial objects are tessellated [24]. Choosing a tessellation granularity requires navigating a trade-off between index filtering efficacy and index size, and entails knowledge of both the data distribution and the query workload. Other commercial systems specify the tessellation granularity at index creation time, and the same tessellation algorithm is applied to both data objects (upon insertion) and to query objects over the index [24, 25]. Unfortunately, choosing tessellation granularity up front becomes problematic in embedded environments where either the data or the query distributions may not be known at design time. The approach we have taken in SQL Anywhere is to separate the choice of tessellation granularity for data and query objects.
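The Z-curve mapping that underlies linear quadtrees is simple bit interleaving; a minimal sketch (the function name is ours, and real implementations also encode the tile's quadtree level, which is omitted here):

```python
def z_key(x, y):
    """Interleave the bits of two 32-bit tile coordinates into a single
    64-bit Z-curve key (x in the even bit positions, y in the odd ones).

    Spatially nearby tiles get numerically close keys, and the four
    children of a quadtree tile occupy a contiguous key range, which is
    what makes the keys suitable for storage in a B-tree index."""
    key = 0
    for i in range(32):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key
```

Because a tile's descendants share a key prefix, an index probe for a tile becomes a single contiguous B-tree range scan.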
In order to ensure robust index performance in all scenarios, data objects are tessellated with a fixed coarse granularity. This conservative approach favours index size over filtering efficacy and guarantees that index scans remain as cheap as possible in case the query workload includes large containment queries; any reduction in filtering quality at the index level is mitigated by various second-stage filters. Tessellation of spatial query objects is a great example of how several self-tuning technologies in SQL Anywhere work together. Query objects are tessellated dynamically during query execution by a cost-based algorithm that examines both the data distribution and the current server state. Within sparse data regions a query object is tessellated coarsely, thereby reducing the number of key range probes against the index; as data density increases the tessellation becomes more fine-grained, yielding more but tighter range probes. The query execution plan includes alternatives for both the spatial index and a table scan, allowing the tessellation algorithm to make a cost-based decision to revert to a table scan for large query objects whose index scanning cost is prohibitive. Given a spatial join where objects from one table form the queries against the spatial index of a second table, changes in buffer pool contents during execution of the join are taken into account by the cost function when tessellating successive query objects. Furthermore, because knowledge of the data distribution is obtained from the server’s self-tuning histograms, query tessellation adapts dynamically to both changes in the underlying data and improvements in histogram accuracy resulting from query feedback.

7 Conclusions and Future Work

Taming the various self-management monsters is exceedingly difficult. System and regression testing of autonomic systems like SQL Anywhere remain unsolved problems.
Yet as challenging as these problems are, diagnosing performance problems experienced by customers can be even more difficult. Increasingly, privacy and legislative constraints prevent access to customer databases. In these situations, the existence of diagnostic tools that the customer can execute is essential. As one example, graphical representations of SQL Anywhere execution plans can contain information related to the search space considered by the query optimizer. This information ranges from the various cost model computations to the cost-based and heuristic decisions made by the optimizer during join enumeration. These traces of optimization search spaces can be analyzed in isolation, and also can be used to generate synthetic data to match the original customer database. Database self-management does have the potential to exacerbate its own monster: unpredictable query execution times. However, our twenty years of customer experience has shown that SQL Anywhere’s capacity to adapt to changing execution contexts largely trumps those infrequent situations when a specific query’s execution time has become intolerable. In the main, the term ‘database self-management’ typically applies to performance characteristics. However, an often-neglected aspect of self-management is error handling, increasingly relevant due to the use of commodity hardware that will fail in predictable ways [26]. Finally, we note that with the proliferation of database systems in many different environments, backwards compatibility has become increasingly important. While such constraints are occasionally inconvenient, backwards compatibility is critical when an ISV has millions of deployed seats, and cannot possibly upgrade all software instantaneously.

References

[1] Ivan T. Bowman and G. N. Paulley, “Join enumeration in a memory-constrained environment”, in Proceedings, Sixteenth IEEE International Conference on Data Engineering, San Diego, California, Mar. 2000, pp. 645–654.
[2] Anisoara Nica, “A call for order in search space generation process of query optimization”, in Proceedings, ICDE Workshops (Self-Managing Database Systems), Hanover, Germany, Apr. 2011, IEEE Computer Society Press.

[3] Anisoara Nica, “Immediate materialized views with outerjoins”, in Proceedings, ACM Thirteenth International Workshop on Data Warehousing and OLAP (DOLAP), in the ACM Nineteenth Conference on Information and Knowledge Management (CIKM), Toronto, Canada, Oct. 2010, pp. 45–52.

[4] Craig Russell, “Bridging the object-relational divide”, ACM Queue, vol. 6, no. 3, pp. 16–27, May 2008.

[5] Ivan T. Bowman, Peter Bumbulis, Dan Farrar, Anil K. Goel, Brendan Lucier, Anisoara Nica, G. N. Paulley, John Smirnios, and Matthew Young-Lai, “SQL Anywhere: A holistic approach to database self-management”, in Proceedings, ICDE Workshops (Self-Managing Database Systems), Istanbul, Turkey, Apr. 2007, pp. 414–423, IEEE Computer Society Press.

[6] G. N. Paulley and Per-Åke Larson, “Exploiting uniqueness in query optimization”, in Proceedings, Tenth IEEE International Conference on Data Engineering, Houston, Texas, Feb. 1994, pp. 68–79, IEEE Computer Society Press.

[7] Kristofer Vorwerk and G. N. Paulley, “On implicate discovery and query optimization”, in Proceedings, IEEE International Data Engineering and Applications Symposium, Edmonton, Alberta, July 2002, pp. 2–11, IEEE Computer Society Press.

[8] Anisoara Nica, Daniel Scott Brotherston, and David William Hillis, “Extreme visualisation of the query optimizer search spaces”, in ACM SIGMOD International Conference on Management of Data, Providence, Rhode Island, June 2009, pp. 1067–1070.

[9] Ashraf Aboulnaga and Surajit Chaudhuri, “Self-tuning histograms: Building histograms without looking at data”, in ACM SIGMOD International Conference on Management of Data, Philadelphia, Pennsylvania, May 1999, pp. 181–192.
[10] Nicolas Bruno and Surajit Chaudhuri, “Efficient creation of statistics over query expressions”, in Proceedings, Nineteenth IEEE International Conference on Data Engineering, Bangalore, India, Mar. 2003, pp. 201–212.

[11] Surajit Chaudhuri and Vivek Narasayya, “Automating statistics management for query optimizers”, in Proceedings, Sixteenth IEEE International Conference on Data Engineering, San Diego, California, Mar. 2000, pp. 339–348, IEEE Computer Society Press.

[12] Ihab F. Ilyas, Volker Markl, Peter J. Haas, Paul Brown, and Ashraf Aboulnaga, “CORDS: Automatic discovery of correlations and soft functional dependencies”, in ACM SIGMOD International Conference on Management of Data, Paris, France, June 2004, pp. 647–658.

[13] Qiang Zhu, Brian Dunkel, Wing Lau, Suyun Chen, and Berni Schiefer, “Piggyback statistics collection for query optimization: Towards a self-maintaining database management system”, Computer Journal, vol. 47, no. 2, pp. 221–244, 2004.

[14] Michael Greenwald, “Practical algorithms for self-scaling histograms or better than average data collection”, Performance Evaluation, vol. 20, no. 2, pp. 19–40, June 1996.

[15] Alan Jay Smith, “Sequentiality and prefetching in database systems”, ACM Transactions on Database Systems, vol. 3, no. 3, pp. 223–247, Sept. 1978.

[16] Hans-Jörg Zeller and Jim Gray, “An adaptive hash join algorithm for multiuser environments”, in Proceedings of the 16th International Conference on Very Large Data Bases, Brisbane, Australia, Aug. 1990, pp. 186–197.

[17] Bianca Schroeder, Mor Harchol-Balter, Arun Iyengar, Erich Nahum, and Adam Wierman, “How to determine a good multi-programming level for external scheduling”, in Proceedings, 22nd IEEE International Conference on Data Engineering, Washington, DC, Apr. 2006, p. 60, IEEE Computer Society.

[18] Stavros Harizopoulos, Staged Database Systems, PhD thesis, Carnegie Mellon University, 2005.

[19] Shimin Chen, Phillip B. Gibbons, Todd C. Mowry, and Gary Valentin, “Fractal prefetching B+-trees: Optimizing both cache and disk performance”, in ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, June 2002, pp. 157–168, Association for Computing Machinery.

[20] Mohammed Abouzour, Kenneth Salem, and Peter Bumbulis, “Automatic tuning of the multiprogramming level in Sybase SQL Anywhere”, in Proceedings, 26th IEEE International Conference on Data Engineering: Workshops, Long Beach, California, Mar. 2010, pp. 99–104, IEEE Computer Society Press.

[21] Hans-Ulrich Heiss and Roger Wagner, “Adaptive load control in transaction processing systems”, in Proceedings, 17th International Conference on Very Large Data Bases, Barcelona, Spain, Sept. 1991, pp. 47–54, Morgan-Kaufmann.

[22] International Organization for Standardization, ISO/IEC 13249-3:2011, Information technology—Database languages—SQL multimedia and application packages—Part 3: Spatial, 2011.

[23] Irene Gargantini, “An effective way to represent quadtrees”, Communications of the ACM, vol. 25, no. 12, pp. 905–910, Dec. 1982.

[24] Ravi Kanth V Kothuri, Siva Ravada, and Daniel Abugov, “Quadtree and R-tree indexes in Oracle Spatial: a comparison using GIS data”, in ACM SIGMOD International Conference on Management of Data, Madison, Wisconsin, 2002, pp. 546–557, Association for Computing Machinery.

[25] Yi Fang, Marc Friedman, Giri Nair, Michael Rys, and Ana-Elisa Schmid, “Spatial indexing in Microsoft SQL Server 2008”, in ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, 2008, pp. 1207–1216, Association for Computing Machinery.

[26] Lakshmi N. Bairavasundaram, Garth R. Goodson, Bianca Schroeder, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, “An analysis of data corruption in the storage stack”, in FAST, Mary Baker and Erik Riedel, Eds., 2008, pp. 223–238, USENIX.