
SQL Anywhere: A Holistic Approach to Database Self-management

2007 IEEE 23rd International Conference on Data Engineering Workshop


Ivan T. Bowman, Peter Bumbulis, Anisoara Nica, G. N. Paulley, Dan Farrar, Anil K. Goel, Brendan Lucier, John Smirnios, Matthew Young-Lai
Sybase iAnywhere, Waterloo, Ontario, Canada
E-mail: [email protected]

∗ © 2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Abstract

In this paper we present an overview of the self-management features of SQL Anywhere, a full-function relational database system designed for frontline business environments with minimal administration. SQL Anywhere can serve as a high-performance workgroup server, an embedded database that is installed along with an application, or as a mobile database installed on a handheld device that provides full database services, including two-way synchronization, to applications when the device is disconnected from the corporate intranet. We illustrate how the various self-management features work in concert to provide a robust data management solution in zero-administration environments.

1. Introduction

Database systems have become ubiquitous across the computing landscape. This is partly because of the basic facilities offered by database management systems: physical data independence, ACID transaction properties, a high-level query language, stored procedures, and triggers. These facilities permit sophisticated applications to 'push' much of their complexity into the database itself. The proliferation of database systems in the mobile and embedded market segments is due, in addition to the features above, to the support for two-way database replication and synchronization offered by most commercial database management systems. Data synchronization technology makes it possible for remote users to both access and update corporate data at a remote, off-site location. With local (database) storage, this can be accomplished even when disconnected from the corporate network, a commonplace circumstance in frontline business environments.

SQL Anywhere¹ is a full-function, ANSI SQL-compliant relational database server designed to work in a variety of frontline environments, from traditional server-class back-office installations to handheld devices running the Windows CE operating system. SQL Anywhere supports features common to enterprise-class database management systems, such as intra-query parallelism, materialized views, OLAP functionality, stored procedures, triggers, and hot failover, and does so on a variety of 32- and 64-bit hardware platforms, including Microsoft Windows, various flavours of UNIX (including Linux, Solaris, and AIX), Novell Netware, Apple Macintosh, and Windows CE. However, the strength of SQL Anywhere lies in its ability to offer SQL data management, query processing, and synchronization capabilities in zero-administration environments.

¹ Sybase and SQL Anywhere are trademarks of Sybase Inc. Other company or product names referenced herein are trademarks and/or servicemarks of their respective companies.

SQL Anywhere was designed from the outset to offer self-management features permitting its deployment as an embedded database system. As an example, a SQL Anywhere database can be started by a simple client API call from the application, and can shut down automatically when the last connection disconnects. As a second example, SQL Anywhere databases are stored as ordinary OS files and can be managed with the file utilities provided by the operating system.
Each database consists of a main database file, a separate transaction log file, and up to 12 additional database files that can be placed on the same filesystem or spread across several. Raw partitions are not supported. The benefit of this simplicity is deployment flexibility. To image-copy a database, one simply copies all of its associated files; execution of a database-specific copy utility is not required. Database files are portable amongst all supported platforms, including Windows CE devices, even when the machines have different CPU architectures. This flexibility makes database deployment, application development, and problem determination significantly easier.

In this paper, we describe a wide range of technologies that permit SQL Anywhere to be used in embedded and/or zero-administration environments. These technologies are fundamental components of the server, not merely administrative add-ons that assist a database administrator in configuring the server's operation. It is important to note that these technologies work in concert to offer the level of self-management and adaptiveness that embedded application software requires. It is, in our view, impossible to achieve effective self-management by considering these technologies in isolation.

The paper is structured as follows. In Section 2 we describe how SQL Anywhere manages memory, specifically how it permits the buffer pool to grow and shrink dynamically. Section 3 contains an overview of the server's self-managing statistics implementation. Section 4 describes the server's adaptive query optimization and execution features, especially those that pertain to memory usage. Section 5 describes tools provided to assist with physical database and application design. Section 6 concludes the paper.

2. Dynamic buffer pool management

When a database system is embedded in an application as part of an installable package, it cannot normally use all the machine's resources. Rather, it must co-exist with other software and system tools whose configuration and memory usage vary from installation to installation, and from moment to moment. It may be possible to perform workload analysis of the application to determine database configuration parameters such as the server's multiprogramming level [3]. However, it is difficult to predict the system load or the amount of memory that will be available at any point in time. Therefore, SQL Anywhere uses the following approach to buffer pool management: rather than attempting to 'tune' buffer pool memory in isolation, the server tunes buffer pool allocation to fit the overall system requirements. It does this by using a feedback control mechanism with the OS working set size as one input (see Figure 1). The OS working set size, which is polled every minute, is the operating system's measure of the amount of real memory in use by the process. Using this feedback control loop, the server's buffer pool grows or shrinks on demand, depending on system-wide resource usage and the memory management policy of the operating system. This adjustment can be done very efficiently on operating systems that permit address space to be allocated to a process independently of backing physical memory.

Figure 1. Cache sizing feedback control

The variability of the buffer pool size has implications for query processing. Queries must adapt to execution-time changes in the amount of available physical memory (see Section 4).
The database server attempts to set its buffer pool size to the current OS working set size plus the amount of unused physical memory, keeping a small amount (5 MB) in reserve for use by the OS. If the target size differs from the current size by less than 64 KB, the buffer pool size remains unchanged. The target buffer pool size is constrained to lie between a lower bound and an upper bound. These bounds do not change during the lifetime of the server, and their default values can be overridden at server start. The upper bound determined at startup is a hard limit, but SQL Anywhere also imposes a soft limit by ensuring that the target buffer pool size is between the lower bound and

    min(database size + main heap size, upper bound).    (1)

Database size includes the size of the temporary files used for intermediate results. Hence, larger temporary files automatically relax the constraint on the maximum buffer pool size.

In addition to the two reference inputs in the feedback control loop (the OS working set size and the amount of free physical memory), the buffer pool manager also monitors the buffer miss rate. If there are no buffer pool misses between polling times, the buffer pool governor will not permit the buffer pool to grow. A lack of buffer pool page replacements may mean the server is largely idle, or that the working set of database pages is entirely resident in the buffer pool; either situation makes it unnecessary to increase the pool's size. However, the buffer pool is always allowed to shrink, regardless of buffer pool activity, if the new target buffer pool size is smaller than the current size. To avoid undesirable fluctuations, the server applies a damping factor to size changes by resizing the pool to

    0.9 × new ideal size + 0.1 × current size.    (2)

Our general experience is that the SQL Anywhere dynamic buffer pool sizing heuristics are stable when the overall system load is not wildly fluctuating. Since the sampling period is nominally one minute, the buffer pool size does not respond rapidly to sudden changes in memory demands within the system. However, the server decreases its sampling period to 20 seconds at startup, and when the database grows significantly, to respond more quickly when the demands on memory by the database server are expected to be changing rapidly. The sampling period is not affected by sudden memory usage fluctuations elsewhere in the system.

A slightly modified version of this feedback control algorithm is used on Windows CE platforms, because the Windows CE operating system resource manager lacks the ability to report the current working set size for an application. Instead, the feedback control algorithm uses the current buffer pool size as its reference input. In this case, the buffer pool will grow only when there is an increase in the amount of free memory available on the device. However, and more importantly, the buffer pool may shrink when other applications allocate additional memory.
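The interplay of these rules is compact enough to express in pseudocode. The following Python sketch is our paraphrase of the published description, not SQL Anywhere source; all identifiers, and the assumption of a single polling function supplying the inputs, are ours.

```python
# A minimal sketch of the buffer pool sizing policy described above.
RESERVE = 5 * 1024 * 1024   # 5 MB kept free for the OS
DEAD_BAND = 64 * 1024       # ignore target changes smaller than 64 KB

def next_pool_size(current, working_set, free_mem, lower, upper,
                   db_size, main_heap, misses_since_poll):
    # Target: the working set plus unused physical memory, less the reserve.
    target = working_set + free_mem - RESERVE
    # Soft upper limit (equation (1)): no benefit in caching more than the
    # database (including temporary files) plus the main heap.
    target = min(target, db_size + main_heap, upper)
    target = max(target, lower)
    if abs(target - current) < DEAD_BAND:
        return current                       # within the dead band
    if misses_since_poll == 0 and target > current:
        return current                       # idle or fully cached: never grow
    # Damped adjustment (equation (2)) to avoid oscillation.
    return int(0.9 * target + 0.1 * current)
```

On Windows CE, where the working set size is unavailable, the same loop would substitute the current pool size for working_set, as described above.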
2.1. Heaps

A novel feature of SQL Anywhere is that the buffer pool is a single heterogeneous pool of all types of pages: table pages, index pages, undo and redo log pages, bitmaps, free pages, and heap pages. To support efficient buffer pool management, all page frames are the same size.

To ensure that SQL Anywhere maintains a consistent memory footprint, in-memory data structures created and utilized for query processing, including hash tables, prepared statements, cursors, and similar structures, are allocated within heaps. When a heap is not in use (for example, when the server is awaiting the next FETCH request from the application), the heap is 'unlocked'. Pages in unlocked heaps can be stolen and used by the buffer pool manager for other purposes, such as table or index pages, as required. When this happens, the stolen pages are swapped out to the temporary file. To resume processing of the request, the heap is re-locked, pinning its pages in physical memory. A pointer swizzling technique (cf. reference [7]) is used to reset pointers in pages relocated during re-locking.

For a number of key data structures, SQL Anywhere uses disk-based implementations to eliminate or reduce the need for limits that would require tuning, automatic or otherwise. For example, to reduce stack pressure, SQL Anywhere copies and resets the stack before executing a stored procedure, and restores it afterwards. As well, long-term locks are stored in a disk-based extensible hash table, avoiding the need to specify a lock table size or lock escalation thresholds. Similarly, strings longer than a page have disk-based implementations. These techniques allow SQL Anywhere to eliminate restrictions on which data types can be indexed.

SQL Anywhere uses lightweight fibers rather than OS threads on systems where they are available. One advantage of fibers is that the server can schedule tasks itself, rather than accepting the decisions made by the OS thread scheduling algorithm. However, the main adaptive advantage is that fibers contribute to an overall framework for flexible reallocation of memory resources. If a request running on a fiber blocks or is suspended, and its heaps are swapped out, then its memory and address space requirements are very small. This, together with techniques described below, such as low-memory fallbacks for query plans, gives the server much flexibility in reducing its memory footprint on demand.

2.2. Page replacement strategy

As described above, all page frames in the buffer pool are the same size, and can be used for table and index pages, undo or redo log pages, bitmaps, or connection heaps. Each type of page has different usage patterns: heap pages, for example, are local to a connection, while table pages are shared. Therefore, the buffer pool's page replacement algorithm must be aware of differences in reference locality. For example, it must recognize that adjacent references to a single page during a table scan differ from other reference patterns.

SQL Anywhere uses a modified generalized 'clock' algorithm [18] for page replacement. In addition, we have implemented techniques to reduce the overhead of the clock algorithm while still quickly recognizing page frames that can be reused. Conceptually, the clock algorithm uses a moving 'window' over the entire buffer pool, with pages ordered by their time of last reference. The buffer pool is divided into eight segments based on this page reference time series. The 'score' of a page is incremented as it moves from segment to segment. Pages with lower scores are candidates for replacement. Page scores are decayed exponentially to ensure that all pages eventually become candidates for replacement if they are not re-referenced.
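A toy model of this scoring scheme is sketched below. The eight-segment recency division and the exponential decay follow the text; the decay factor and the data layout are illustrative assumptions.

```python
DECAY = 0.5   # assumed decay factor; the text does not give one

class Frame:
    def __init__(self):
        self.score = 0.0   # higher score = recently and repeatedly referenced

def on_segment_advance(frame):
    # A page's score is incremented as it crosses one of the eight
    # reference-time segments.
    frame.score += 1.0

def on_decay_tick(frames):
    # Exponential decay guarantees that an un-referenced page eventually
    # becomes a replacement candidate.
    for f in frames:
        f.score *= DECAY

def clock_victim(frames, hand):
    # Sweep from the clock hand and evict the lowest-scoring frame seen.
    n = len(frames)
    offset = min(range(n), key=lambda i: frames[(hand + i) % n].score)
    return (hand + offset) % n
```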
One modification to this basic algorithm is the addition of a lookaside queue of pages that can be reused immediately. Typically, pages in this queue are heap and temporary table pages. The queue is implemented using a lock-free array that allows a fast decision on whether a page is reusable. If the queue is empty, the clock algorithm is used to choose a page to replace. It is important that the queue be lock-free to avoid the use of semaphores, which are expensive on most hardware platforms.

3. Self-managing statistics

SQL Anywhere has used query execution feedback techniques since 1992 to gather statistics automatically during query processing. Rather than requiring explicit statistics generation by scanning or sampling persistent data, the server automatically collects statistics as part of query execution. More recent work by other researchers [1, 24] has also exploited this notion of collecting statistics as a side effect of query execution.

In its early releases, SQL Anywhere computed frequent-value statistics from the evaluation of equality and IS NULL predicates in a manner similar to that described by Lynch [11], and stored these statistics persistently in the database for use in the optimization of subsequent queries. Later, support was added for inequality and LIKE conditions. An underlying assumption of this model is that the data distribution is skewed; values that do not appear as a frequent-value statistic are assumed to be in the 'tail' of the distribution. These frequent-value statistics were also used to compute a density measure for each base table column, which in turn was used to compute join selectivity.

Today, SQL Anywhere uses a variety of techniques to gather and maintain statistics automatically. These include the maintenance of index statistics, index probing, analysis of referential integrity constraints, three types of single-column self-managing histograms, and join histograms.

3.1. Histogram implementation

SQL Anywhere histograms are equi-depth histograms [10] whose number of buckets expands and contracts dynamically as column distribution changes are detected. As is typical, we use the uniform distribution assumption when interpolating within a bucket. SQL Anywhere histograms combine traditional buckets with frequent-value statistics, known as 'singleton buckets'. A value that constitutes at least 1% of the column's values, or that is among the 'top N' values, is saved as a singleton bucket. The number of singletons retained in any histogram depends on the cardinality of the table and the column's data distribution, but lies in the range [0, 100]. A 'density' value is also computed for each column, representing the average selectivity of a single value that is not saved as a singleton bucket. Density values are used by the query optimizer as a guide for estimating the cardinality of intermediate results and for interpolating within a bucket. When appropriate, a histogram may consist entirely of singleton buckets, in which case a 'compressed' representation is employed that maintains only one domain value for each bucket.

For efficiency and simplicity, the same infrastructure is used for all data types except longer string and binary data. An order-preserving hash function, whose range is a double-precision floating point value, is used to derive the bucket boundaries for these data types. For numeric data types, including date and time, the hash function is simply a conversion of the underlying value to a double-precision floating point value.
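The following sketch shows how an estimator might combine these ingredients: singleton buckets for frequent values, equi-depth range buckets over an order-preserving hash, and a per-column density for the non-frequent 'tail'. The class layout and names are ours, for illustration only.

```python
class Histogram:
    def __init__(self, singletons, boundaries, bucket_fraction, density):
        self.singletons = singletons    # {hashed value: selectivity}
        self.boundaries = boundaries    # sorted hashed bucket boundaries
        self.bucket_fraction = bucket_fraction  # fraction of rows per bucket
        self.density = density          # average selectivity of a tail value

    def eq_selectivity(self, h):
        # Frequent values are answered exactly; the tail falls back to density.
        return self.singletons.get(h, self.density)

    def range_selectivity(self, lo, hi):
        # Uniform-distribution interpolation within partially covered buckets.
        sel = 0.0
        for b_lo, b_hi in zip(self.boundaries, self.boundaries[1:]):
            overlap = max(0.0, min(hi, b_hi) - max(lo, b_lo))
            if overlap > 0.0:
                sel += self.bucket_fraction * overlap / (b_hi - b_lo)
        return sel
```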
For short strings, the hash is obtained by constructing an integer value representing the binary values of the characters in the string. To maintain the discreteness of domains, each data type is also assigned a value width representing the difference between two consecutive values in the domain; e.g., the value widths for INT and REAL are 1 and 1e-35, respectively.

For longer string and binary data types, SQL Anywhere uses a different infrastructure that dynamically maintains a list of observed predicates and their selectivities. Rather than saving individual string values for bucket boundaries, which can potentially be very long, a non-order-preserving hash function value is used. Each bucket is represented by a hash value, a relational predicate (equality, non-equality, BETWEEN, IS NULL, or LIKE), and the associated selectivity for the specified predicate on the histogram column. When collecting statistics on these string columns, buckets are created not only for entire string values but also for 'words' in the string, where a 'word' is loosely defined as any sequence of characters separated by any amount of white space. These buckets are useful in estimating the selectivity of LIKE predicates since, in our experience, many applications perform string searches using a LIKE pattern intended to match a 'word' somewhere in the string.

3.2. Statistics collection during query processing

Histograms are created automatically when data is loaded into the database using a LOAD TABLE statement, when an index is created, or when an explicit CREATE STATISTICS statement is executed. A modified version of Greenwald's algorithm [8] is used to create the cumulative distribution function for each table column. Our modifications significantly reduce the overhead of statistics collection with only a marginal reduction in quality. In addition to these bulk operations, histograms are automatically updated during query processing. During normal operation, the evaluation of (almost) any predicate over a base column can lead to an update of the histogram for that column. In addition, INSERT, UPDATE, and DELETE statements also update the histograms for the modified columns.

Join histograms are computed on-the-fly during query optimization to determine the cardinality of intermediate results. As with column histograms, join histograms are over a single attribute. In cases where the join condition is over multiple columns, a combination of existing referential integrity constraints, index statistics, and density values is used to compute and/or constrain join selectivity estimates.

A variety of statistics other than column histograms are automatically maintained for other database objects. For stored procedures used in a FROM clause, the server maintains a summary of statistics for previous invocations, including total CPU time and result cardinality. A moving average of these statistics is saved persistently in the database for the optimization of subsequent queries. In addition, statistics specific to certain values of the procedure's input parameters are saved and managed separately if they differ sufficiently from the moving average. Index statistics, such as the number of distinct values, the number of leaf pages, and clustering statistics, are maintained in real time during server operation.
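One way to picture this execution feedback is as a fold of observed selectivities back into the Histogram sketch from Section 3.1. The blending weight and promotion rule below are illustrative assumptions on our part; only the 1% singleton threshold comes from the text.

```python
def observe_equality(histogram, h, rows_in, rows_out, weight=0.25):
    # Blend the selectivity observed at execution time into the histogram.
    observed = rows_out / rows_in if rows_in else 0.0
    blended = (1.0 - weight) * histogram.eq_selectivity(h) + weight * observed
    if blended >= 0.01:
        histogram.singletons[h] = blended     # promote to a singleton bucket
    else:
        histogram.singletons.pop(h, None)     # demote back into the 'tail'
```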
Table statistics, in particular the percentage of a table resident in the buffer pool, are also maintained in real time and used by the cost model when computing the cost of an access method.

4. Query processing

In our experience, there is little correlation between application or schema complexity and the database size or deployment platform. Developers tend to complicate, rather than simplify, application design when they migrate applications to business front-lines, even when targeting platforms such as hand-held devices with few computing resources. It is usually only the user interface that is re-architected, because of the input mode differences on such devices. This complexity makes sophisticated query processing an essential component of SQL Anywhere.

As described in Section 2, the operating characteristics of the server can change from moment to moment. In simple OLTP workloads with uniform data distributions, the effect of changes in buffer pool size can be easily approximated [20]. However, in mixed-workload systems with complex queries, this flexibility demands that query processing algorithms adapt to changes in the amount of memory they can use. It also means that the query optimizer must take the server state into account when choosing access plans.

4.1. Query optimization

SQL Anywhere (re)optimizes a query at each invocation. (In this context we use the term 'query' to refer not only to queries, but also to INSERT, UPDATE, and DELETE statements.) There are two broad exceptions to this. The first class of exceptions comprises simple DML statements, restricted to a single table, where the cost of optimization approaches the cost of statement execution. Such statements bypass the cost-based optimizer and are optimized heuristically. The second class comprises statements within stored procedures, user-defined functions, and triggers. For these statements, access plans are cached on an LRU basis for each connection. A statement's plan is only cached, however, if the access plans obtained by successive optimizations of that statement during a 'training period' are identical. After the training period is over, the cached plan is used for subsequent invocations. However, to ensure the plan remains 'fresh', the statement is periodically re-verified at intervals taken from a decaying logarithmic scale.

Re-optimization of each query means that optimization cost cannot be amortized over many executions. Optimization must therefore be cheap. One of several techniques used to reduce optimization cost is to limit the size of the optimizer's search space. The SQL Anywhere optimizer uses a proprietary branch-and-bound, depth-first search enumeration algorithm [16] over left-deep processing trees. (Left-deep trees are used except for cases involving complex derived tables or table expressions containing outer joins.) Depth-first search has the significant advantage of using very little memory; in fact, much of the state information required by the algorithm can be kept on the processor stack. As an example, a 100-way join query against a small TPC-H database can be optimized and executed by SQL Anywhere on a Dell Axim device running Windows Mobile 5 with as little as 3 MB of buffer pool, of which only 1 MB is needed for optimization.

The enumeration algorithm first determines a heuristic ranking of the tables involved in the join strategy. In addition to enumerating tables and table functions, the algorithm also enumerates complex subqueries by converting them into joins on a cost basis [15]. By considering tables in rank order, the enumeration algorithm initially (and automatically) defers Cartesian products to as late in the strategy as possible. Hence, it is likely that the first join strategy generated, though not necessarily optimal, will be one with a 'reasonable' overall cost relative to the entire search space. The algorithm is branch-and-bound in the sense that a partial join strategy is retained only if its cost is provably less than the cost of the best complete join strategy discovered thus far.

There are several interesting characteristics of the branch-and-bound enumeration algorithm. The first is how the search space is pruned during join strategy generation. The algorithm incrementally costs the prefix of a join strategy and backtracks as soon as the cost of an intermediate result exceeds that of the cheapest complete plan discovered thus far. Since any additional quantifiers can only add to the plan's cost, no join strategy with this prefix of quantifiers can possibly be cheaper, and the entire set of such strategies can be pruned outright. This pruning is the essence of the algorithm's branch-and-bound paradigm [2].

Additional care must be taken when analyzing cost measures for an intermediate result. For example, a significant component of any plan's cost concerns its buffer pool utilization [9, 12, 19]. However, measures such as buffer hit ratios can be accurately estimated only for a complete execution strategy, because in a fully-pipelined plan the most-recently used pages will be from those tables at the root of the processing tree. Nonetheless, it is possible to estimate the cost of computing an intermediate result based on a very optimistic metric: assume that half the buffer pool is available for each quantifier in the plan. Clearly this is nonsense for any join degree greater than 1. However, the point is not to cost intermediate results accurately, but to prune grossly inefficient strategies from the search space quickly.

A second notable characteristic of the join enumeration involves the control strategy for the search algorithm. As described in reference [16], the SQL Anywhere optimizer's search space is a tree. Conceptually, the root of the tree represents an 'empty' join strategy, with no quantifiers placed in the plan. Each of the root's children represents a ⟨quantifier, index, join method⟩ 3-tuple that can be the first quantifier in the plan. Level 1 has at most n quantifiers, but may have many more 3-tuples, depending on how many different index and join method combinations can be used with each quantifier. This 'first' quantifier corresponds to the left-most, and therefore deepest, plan node in the left-deep strategy. Each of these 3-tuples at level 1 in turn has m 3-tuple children formed from the other n − 1 quantifiers that can be placed at that position. The 3-tuple children are heuristically ordered by the enumeration algorithm such that the most 'promising' ones are enumerated first. Note that the search space tree may be unbalanced due to inherent constraints of processing the particular query. For example, if the query contains a LEFT OUTER JOIN, the preserved side of the join must precede the null-supplied side in the left-deep join strategy tree.

A problem with traversing the above search tree using a branch-and-bound approach with early halting is that the search effort is not well-distributed over the entire search space.
If only a small portion of the entire space is visited, most of the enumerated plans will be very similar. SQL Anywhere addresses this problem by employing an optimizer governor [21] to manage the search. The governor dynamically allocates a quota of search effort across sections of the search space to increase the likelihood that an efficient plan is found. The amount of effort spent in optimization is measured by the number of visits to each node in the search tree. A quota of visits is used to limit the search effort in any subtree. This quota is unevenly distributed across siblings, so that more is given to 'promising' 3-tuples at each level based on their heuristic ordering. Conceptually, each node assigns 1/2 of its quota to its first child, 1/2 of whatever remains after visiting that child to the second child, and so on. The initial quota can be specified within the application, if desired, allowing fine-grained tuning of the optimization effort spent on each statement.

The implementation of the above algorithm differs slightly from the conceptual view, since the route taken by the search is not known in advance. Potential 3-tuples for each remaining unplaced quantifier are determined for each join strategy prefix. During enumeration, if a set of strategies is pruned at level m in the tree, any unused quota is returned to the node above it at level m − 1. As a further refinement, when a new optimal plan is discovered that improves the overall estimated cost by at least 20%, any remaining quota for that search path is completely redistributed, starting at the root, which represents the lowest-level node in the left-deep processing tree. This has the benefit of concentrating additional quota in a portion of the search space that has yielded at least one good plan, in anticipation of possibly finding other efficient plans in that portion of the space.
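The core of this quota-driven, depth-first branch-and-bound search can be sketched as follows. Here children_of and cost_of are placeholders for the heuristically ordered 3-tuple generator and the incremental cost model, and the quota bookkeeping is simplified for illustration.

```python
def enumerate_plans(prefix, quota, best, children_of, cost_of):
    """best is a dict {'cost': float, 'plan': list} holding the incumbent."""
    if quota <= 0:
        return 0
    quota -= 1                                  # this node visit costs effort
    if cost_of(prefix) >= best['cost']:
        return quota                            # prune: bounded by incumbent
    children = children_of(prefix)              # most 'promising' first
    if not children:                            # a complete join strategy
        best['cost'], best['plan'] = cost_of(prefix), list(prefix)
        return quota
    for child in children:
        share = quota // 2                      # half of whatever remains
        unused = enumerate_plans(prefix + [child], share, best,
                                 children_of, cost_of)
        quota = (quota - share) + unused        # pruned quota flows back up
    return quota
```

Starting the search with best = {'cost': float('inf'), 'plan': None} reproduces the behaviour in which the first complete left-deep strategy found becomes the incumbent bound.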
4.2. Disk transfer time model

SQL Anywhere uses a Disk Transfer Time (DTT) model [7] to estimate a query's expected I/O cost. A 'generic' model (see Figure 2(a)) is used by default. It has been validated using a systematic testing framework over a variety of machine architectures and disk subsystems. While complete accuracy of any cost model is a worthwhile goal, in practice such accuracy is not necessary. The primary objective for the cost model is to ensure that, for any query plans P1 and P2, we have

    Cost_E(P1) > Cost_E(P2) iff Cost_A(P1) > Cost_A(P2).    (3)

That is, the relationship between the expected costs of the two plans is the same as that between their actual costs at run time.

The DTT function summarizes disk subsystem behaviour with respect to an application (in this case, the SQL Anywhere server). The DTT function models the amortized cost of reading one page randomly over a band-size area of the disk. If the band size is 1, the I/O is sequential; otherwise it is random. The two read curves in Figure 2(a) illustrate better random retrieval times when the band size is smaller. Significantly larger band sizes increase the average cost of each retrieval because of the higher probability that each retrieval requires a seek, and the increase in the travel time of the disk arm to reach the correct cylinder.

Note that each write curve in the model DTT (Figure 2(a)) shows a lower amortized cost than its corresponding read curve for larger band sizes. On the surface, this is counterintuitive: disk subsystems are typically optimized for read operations, particularly sequential reads [17, pp. 44–7], and write operations usually require additional overhead [4]. The explanation of the DTT curves is that in a database system, ignoring issues such as prefetching, read requests are typically synchronous; a join algorithm, for example, cannot proceed until the required page is present in the buffer pool. Writes, however, are typically asynchronous, since it is the buffer pool manager that decides at what point dirty pages are flushed to disk, and these deferred I/O requests can take advantage of parallelism, shortest seek-time scheduling, and other optimizations [7, 17].

Figure 2. DTT models. (a) Default DTT model. (b) Calibrated DTT for an Intel Bensley processor with a single Seagate 7200 RPM 'Barracuda' disk (note the logarithmic scale).

If the default DTT model is unsatisfactory, a DBA can substitute a different one. For specialized hardware, a CALIBRATE DATABASE statement can determine the read DTT curve from the actual system. The write DTT curve is approximated using the read curve as a baseline. Figure 2(b) illustrates an actual read DTT and an approximate write DTT taken from a dual-core, multi-CPU Intel Bensley processor with a 'Barracuda' 7200 RPM disk. Calibration can also be important for Windows CE platforms, particularly because persistent storage, often compact flash memory, has substantially different characteristics than disk media. Figure 3 illustrates the DTT curve of a 512 MB SanDisk SD card on a Pocket PC 2003 handheld device; note the uniform random access times. In SQL Anywhere, the DTT model is stored in the catalog and can be altered or loaded with the execution of a DDL statement. Consequently, it is straightforward to deploy hundreds or thousands of databases to CE devices with a cost model derived from a representative device.

Figure 3. DTT for an SD storage card

4.3. Adaptive query execution

SQL Anywhere's query optimizer makes choices based on estimates and predictions made at optimization time, but there is necessarily a level of uncertainty in these predictions due to estimation error and concurrent activity. Individual query execution operators are able to detect particular prediction problems and adapt locally to the current conditions. One example of this adaptivity is the hash join operator. The optimizer may favour a hash join over an index nested-loops strategy based, in part, on the estimated number of rows in the build input. After building the hash table on the build input, the hash join operator knows the precise number of build rows, and can determine whether the optimizer was wrong and an index nested-loops strategy would in fact have been cheaper. The query optimizer annotates the hash join operator with an alternate strategy, so that an index nested-loops strategy can be employed after reading the build input if the number of rows is low enough to make it preferable. In a similar way, a special operator for the execution of RECURSIVE UNION is able to switch between several alternative strategies, possibly using a different one for each recursive iteration, and also possibly sharing work from iteration to iteration.
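A minimal sketch of the hash join adaptation described above follows. The build input is materialized first; if the actual row count falls below the optimizer's annotated threshold, the operator switches to the index nested-loops alternative. All interfaces here are our assumptions.

```python
def adaptive_hash_join(build_rows, probe_rows, nlj_threshold,
                       index_lookup, join_key):
    build = list(build_rows)             # build phase: counts are now exact
    if len(build) <= nlj_threshold:
        # Cheap alternative: probe an existing index once per build row.
        for row in build:
            yield from index_lookup(row)
        return
    table = {}
    for row in build:                    # otherwise, the classic hash join
        table.setdefault(join_key(row), []).append(row)
    for row in probe_rows:
        for match in table.get(join_key(row), ()):
            yield (match, row)
```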
In addition to adapting to errors in cardinality estimation, the SQL Anywhere query execution operators adapt to changes in the amount of available buffer pool and to fluctuations in the number of concurrent, active requests. Each task, or unit of work, within the SQL Anywhere server has a memory governor that controls memory usage for both fixed-sized data structures and the variable-sized structures used by memory-intensive operators such as sorting or hashing. The memory governor provides two quotas to each task:

• a hard memory limit: if exceeded, the statement is terminated with an error. This limit is

    (3/4 × maximum buffer pool size) / (number of active requests).    (4)

• a soft memory limit, which the statement's query processing algorithms should not exceed. When this limit is reached, the memory governor requests query execution operators to free memory. The soft limit is computed as

    (current buffer pool size) / (server multiprogramming level).    (5)

The memory governor controls query execution by limiting memory consumption for a statement to the soft limit. The query optimizer uses the predicted soft limit to estimate execution costs, and annotates memory-intensive operators with a maximum number of pages to use in their processing. Memory-intensive operators include hash join, hash group by, hash distinct, and sort. If the actual memory available at execution time is much lower than the optimizer expected, a memory-intensive operator may not be able to complete in a reasonable time. Memory-intensive operators detect this condition at execution time and switch locally to a low-memory fallback strategy. For example, the low-memory fallback for hash group by uses a temporary table containing partially computed groups, with an index on the grouping columns.

Low-memory fallback strategies are used only in extraordinary cases. In normal operation, the concern is how to share memory effectively between operators within a single statement's execution plan, and between concurrent requests [5, 14]. This sharing is accomplished using the memory governor's soft limit and annotations from the query optimizer. For example, a hash join operation in SQL Anywhere chooses a number of buckets based on the expected number of distinct hash keys; the goal is to have a small number of distinct key values per bucket. In turn, buckets are divided uniformly into a small, fixed number of partitions. The number of partitions is selected to provide a balance between I/O behaviour and fanout. During processing of the hash operation, memory usage of the hash table is monitored. When the hash table reaches the memory governor's soft limit, the partition with the largest number of rows is evicted from memory. The in-memory rows are written out to a temporary table, and incoming rows that hash to that partition are also written to disk. By selecting the partition with the most rows, the governor frees the most memory for future processing, in the spirit of other approaches documented in the literature [23]. By paying attention to the soft limit while building the hash table on the smaller input, the server can exploit as much memory as possible, while degrading adaptively if the input does not fit completely in memory.
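The two limits and the eviction response can be sketched directly from equations (4) and (5). The partition interface below is an assumption for illustration.

```python
def hard_limit(max_pool_pages, active_requests):
    return (0.75 * max_pool_pages) / active_requests       # equation (4)

def soft_limit(current_pool_pages, multiprogramming_level):
    return current_pool_pages / multiprogramming_level     # equation (5)

def on_memory_use(pages_used, soft, hard, partitions):
    if pages_used > hard:
        raise MemoryError("statement exceeded its hard memory limit")
    if pages_used > soft:
        # Evict the partition holding the most rows; its in-memory rows go
        # to a temporary table, and later arrivals for it spill directly.
        victim = max(partitions, key=lambda p: p.rows_in_memory)
        victim.spill_to_temp_file()
```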
In some cases, multiple memory-intensive operators within a query execution plan may use memory concurrently. For example, the probe input to a hash join may use memory for a hash group by. When the soft limit is reached for the statement, the memory governor begins requesting that memory be relinquished, starting at the 'highest' consuming operator and moving down the execution tree. This approach prevents an input operator from being starved of memory by a consumer operator, while still allowing individual operators to proceed with as much memory as possible.

4.4. Adaptive intra-query parallelism

SQL Anywhere can assign multiple threads or fibers to a single request, thus achieving intra-query parallelism. Manegold et al. describe a technique that has very good automatic load balancing characteristics [13]. They construct a right-deep pipeline of hash joins and execute the probe phase with multiple threads. A thread fetches a row in a first-come, first-serve manner from the single scan feeding the pipeline and then executes the probe against all hash tables in the pipeline. Any number of threads can do this simultaneously, independent of the number of joins in the plan. First-come, first-serve partitioning also has the advantage that it preserves a sequential disk access pattern for table scans. This is particularly important for parallelizing queries on SMP or multi-core machines where the database resides on a single disk.

We use this technique with several extensions. One is that the build phase of the hash joins is also parallelized. The threads assigned to perform the probe phase are first used to execute the build phase by fetching rows first-come, first-serve from the scans and building separate hash tables. The hash tables are then merged into a single hash table for each join, and the probe phase proceeds. Thus, a plan involving an arbitrary number of joins composed in an arbitrary way (not just right-deep) can be executed with much the same load balancing characteristics on the build phases as on the probe phases. We also extend the technique to support complex plans containing both hash joins and additional operators, including nested loop joins, Bloom filters, and hash group bys, while maintaining the load balancing characteristics.

In addition to load balancing, an adaptive advantage of the technique is that the number of threads assigned to a plan can very easily be changed during execution. In particular, if the number of threads is dynamically reduced to one, the total cost of the query is only slightly worse than if it had never been set up to use parallelism. This gives the server considerable flexibility to adapt gracefully to fluctuations in load. It also makes query optimization relatively easy, since costing a parallel plan is a slight modification of costing the corresponding non-parallel plan.
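A sketch of the parallelized build phase follows: worker threads pull rows first-come, first-serve from the single scan feeding the pipeline, build private hash tables, and the tables are merged before the probe phase. The structure is illustrative; scan_queue is assumed to be a queue.Queue terminated by a single None sentinel.

```python
import threading

def parallel_build(scan_queue, n_workers, join_key):
    tables = [dict() for _ in range(n_workers)]

    def worker(ix):
        while True:
            row = scan_queue.get()
            if row is None:
                scan_queue.put(None)      # re-post sentinel for other workers
                return
            tables[ix].setdefault(join_key(row), []).append(row)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    merged = {}                           # one merged hash table per join
    for t in tables:
        for k, rows in t.items():
            merged.setdefault(k, []).extend(rows)
    return merged
```

Because any number of workers can pull from the shared scan, the degree of parallelism can be changed simply by varying n_workers, which mirrors the flexibility described above.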
5. Database design and workload analysis

The goals that drive self-management features in database systems, such as reduced total cost of ownership, also extend to other phases of the design and implementation of database applications. Database application design, physical database design, and database setup are phases of the complete database application lifecycle that have traditionally required a level of user expertise similar to that needed to tune an active database system. SQL Anywhere provides an integrated toolset, known as Application Profiling, to advise DBAs and application developers on each of these tasks.

To support this analysis, as much detail as possible is collected about a database application and a database instance. This is accomplished by obtaining a detailed trace of all server activity, including SQL statements processed, performance counters, and contention for rows in the database. This trace information is captured as an application runs, and is transferred via a TCP/IP link into any SQL Anywhere database, where it can be analyzed. This flexible architecture allows the trace to be captured with a focus on convenience (by storing the trace in the same database that generated it) or on performance (by storing the trace data in a database on a separate physical machine). The architecture also permits the Application Profiling tool to analyze, and make recommendations for, databases on mobile devices running Windows CE, including index recommendations.

The Application Profiling tool contains a database of commonly seen design flaws. It is able to detect incorrect database option settings. It can also detect suboptimal query patterns coming from an application. For instance, it can detect the presence of a client-side join, in which many nearly identical statements arrive from an application, differing only by some constant value used in a predicate (a sketch of this kind of analysis appears at the end of this section). The tool can then point out to the user that such a loop in the application would be more efficiently carried out as a single statement issued to the server.

In addition to detecting design or implementation problems, the Application Profiling tool also includes an Index Consultant that can recommend the creation or removal of indexes. The Index Consultant uses a novel technique to provide useful recommendations without requiring excessive resources, whereby the query optimizer generates specifications for indexes it would like to have [6]. These 'virtual index' specifications can be very general, allowing flexibility in the composition, sequence, and ordering of index columns. The virtual index specification becomes tighter as optimization proceeds and the optimizer requires more specific orderings of the input [22]. When the Index Consultant is finished, a physical composition and ordering is imposed on the index. This technique allows the optimizer to consider large numbers of potential physical indexes efficiently.

All of the existing parts of the Application Profiling tool require some intervention from the DBA; in some cases, the DBA is only required to approve or disapprove of a recommendation (for example, a recommendation to create an index). In other cases, the DBA must make manual changes to the database (for instance, moving the transaction log to a separate disk drive). The fulfillment of the promise of self-management would be to have these recommendations implemented automatically.
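As referenced above, one of the tool's analyses, detecting a client-side join, can be pictured as normalizing traced statement text and flagging templates that repeat many times. The regex and threshold below are illustrative assumptions on our part, not the tool's actual rules.

```python
import re
from collections import Counter

LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")

def client_side_join_candidates(traced_statements, threshold=100):
    # Replace string and numeric literals with '?' so that statements that
    # differ only by a constant collapse to the same template.
    templates = Counter(LITERAL.sub("?", s) for s in traced_statements)
    return [t for t, n in templates.items() if n >= threshold]
```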
6. Future work

Virtually every new feature implemented in SQL Anywhere is designed to be adaptive or self-managing. In addition to the self-management technologies described above, SQL Anywhere also includes a variety of tools and technologies that are useful during the development of a database application, including graphical administration and modeling tools and a full-function stored procedure debugger. As with any software system, SQL Anywhere continues to evolve and improve its capabilities. In the future we will be studying a variety of opportunities to further improve self-managing performance.

Self-management of database statistics poses unique challenges in some user environments. For example, the inevitable delay, no matter how small, between changes to the underlying data and the capture and integration of the corresponding distribution statistics can cause problems in application environments where data distributions change drastically at regular intervals. Moreover, the overhead incurred by the server for automatic statistics collection must be carefully managed: the server must be able to intelligently trade off the cost of collection against the potential improvement in query workload performance. We are working on a new statistics collection framework that includes feedback from the optimizer, in order to proactively determine desired, unnecessary, or inaccurate statistics.

Other items on our research agenda include:

• dynamically changing the server's multiprogramming level in response to database workload and changes to overall system performance;
• better algorithms for apportioning memory across query execution operators within an access plan (cf. references [5, 14]);
• automatic reclustering and/or reorganization of tables and indexes;
• improvements to buffer pool growth and shrinkage strategies to avoid hysteresis and make server performance more predictable;
• better modeling of write performance on removable media, particularly for Windows CE devices; and
• adaptive prioritization and scheduling of I/O requests.

References

[1] A. Aboulnaga and S. Chaudhuri. Self-tuning histograms: Building histograms without looking at data. In ACM SIGMOD International Conference on Management of Data, pages 181–192, Philadelphia, Pennsylvania, May 1999.
[2] I. T. Bowman and G. N. Paulley. Join enumeration in a memory-constrained environment. In Proceedings, Sixteenth IEEE International Conference on Data Engineering, pages 645–654, San Diego, California, Mar. 2000.
[3] K. P. Brown, M. J. Carey, and M. Livny. Goal-oriented buffer management revisited. In ACM SIGMOD International Conference on Management of Data, pages 353–364, Montréal, Québec, June 1996.
[4] P. M. Chen and D. A. Patterson. A new approach to I/O performance evaluation: self-scaling I/O benchmarks, predicted I/O performance. ACM Transactions on Computer Systems, 12(4):308–339, Nov. 1994.
[5] B. Dageville and M. Zait. SQL memory management in Oracle9i. In Proceedings of the 28th International Conference on Very Large Data Bases, pages 962–973, Hong Kong, China, Aug. 2002.
[6] D. J. Farrar and A. Nica. Database system with methodology for automated determination and selection of optimal indexes. US Patent 2005/0203940, Sept. 2005.
[7] A. K. Goel. Exact Positioning of Data Approach to Memory Mapped Persistent Stores: Design, Analysis and Modelling. PhD thesis, University of Waterloo, Waterloo, Ontario, 1996.
[8] M. Greenwald. Practical algorithms for self-scaling histograms or better than average data collection. Performance Evaluation, 20(2):19–40, June 1996.
[9] L. M. Haas, M. J. Carey, M. Livny, and A. Shukla. Seeking the truth about ad-hoc join costs. The VLDB Journal, 6(3):241–256, Aug. 1997.
[10] Y. E. Ioannidis. The history of histograms (abridged). In Proceedings of the 29th International Conference on Very Large Data Bases, pages 19–30, Berlin, Germany, Sept. 2003.
[11] C. A. Lynch. Selectivity estimation and query optimization in large databases with highly skewed distributions of column values. In Proceedings of the 14th International Conference on Very Large Data Bases, pages 240–251, Los Angeles, California, Aug. 1988.
[12] L. F. Mackert and G. M. Lohman.
Index scans using a finite LRU buffer: A validated I/O model. ACM Transactions on Database Systems, 14(3):401–424, Sept. 1989.
[13] S. Manegold, J. K. Obermaier, and F. Waas. Load balanced query evaluation in shared-everything environments. In European Conference on Parallel Processing, pages 1117–1124, 1997.
[14] B. Nag and D. J. DeWitt. Memory allocation strategies for complex decision support queries. In Proceedings, Seventh International Conference on Information and Knowledge Management (CIKM), pages 116–123, Bethesda, Maryland, Nov. 1998.
[15] A. Nica. System and methodology for cost-based subquery optimization using a left-deep tree join enumeration algorithm. US Patent 2004/0220923, Nov. 2004.
[16] A. Nica. System and methodology for generating bushy trees using a left-deep tree join enumeration algorithm. US Patent 2004/0006561, Jan. 2004.
[17] D. Shasha and P. Bonnet. Database Tuning. Morgan Kaufmann, San Francisco, California, 2003.
[18] A. J. Smith. Sequentiality and prefetching in database systems. ACM Transactions on Database Systems, 3(3):223–247, Sept. 1978.
[19] A. Swami and K. B. Schiefer. Estimating page fetches for index scans with finite LRU buffers. The VLDB Journal, 4(4):675–701, Oct. 1995.
[20] T.-F. Tsuei, A. N. Packer, and K.-T. Ko. Database buffer size investigation for OLTP workloads. In ACM SIGMOD International Conference on Management of Data, pages 112–122, Tucson, Arizona, May 1997.
[21] M. Young-Lai. Database system with methodology for distributing query optimization effort over large search spaces. US Patent 6,807,546, Oct. 2004.
[22] M. Young-Lai and A. Nica. Database system with methodology for generalized order optimization. US Patent 2006/0136368, June 2006.
[23] H.-J. Zeller and J. Gray. An adaptive hash join algorithm for multiuser environments. In Proceedings of the 16th International Conference on Very Large Data Bases, pages 186–197, Brisbane, Australia, Aug. 1990.
[24] Q. Zhu, B. Dunkel, W. Lau, S. Chen, and B. Schiefer. Piggyback statistics collection for query optimization: Towards a self-maintaining database management system. Computer Journal, 47(2):221–244, 2004.