An Evolutionary Approach to the Index Selection Problem
Javier Calle, Yago Sáez, Dolores Cuadra
Computer Science Department - Carlos III University of Madrid
{fcalle,ysaez,dcuadra}@inf.uc3m.es
Abstract— In this paper, evolutionary algorithms are explored
with the objective of demonstrating that they offer the most
efficient and adequate solution to the Index Selection Problem
(ISP). The final target is to develop a self-tuning database
system requiring little (or no) intervention from experts in
physical design. Following the evaluation of the proposal and
the discussion of experimental results, conclusions are made
regarding the possibilities presented by evolutionary
algorithms for future projects.
Keywords—Index Selection Problem, Evolutionary Algorithms, Self-Tuning.
I. INTRODUCTION
One of the principal concerns present when trying to
improve the performance of a database instance is that of
finding the most appropriate physical design for that
database. Within these general concerns, the Index Selection
Problem (ISP) can be defined as the search for a particular
combination of indexes such that the cost for a given
workload in the database is minimized. This problem has
been traditionally formalized as the linear combination of 0-1
values on a string of variables, each representing a different
candidate index (i.e., 0-1 integer linear programming) [7].
Since the ISP is NP-hard [9], any implementation ought
to consider certain restrictions that allow a solution to be
reached in a reasonable amount of time. Almost all proposed
solutions to the ISP, for example, first begin with a search for
a reduced subset of candidate indexes. Later, many of these
proposals focus on the search for heuristics and efficient
pruning techniques in order to avoid the exploration of index
combinations known, a priori, to be ineffective. Finally,
certain authors [14] opt for statistics-based simulations
(rather than taking measurements on real environment
executions) in order to save time and not affect databases in
use. While the algorithms usually studied as potential
solutions to the ISP fix the characteristics of the database and
the database management system (DBMS), and even set the
workload as static, it is nevertheless the case that each of
these parameters is, in reality, dynamic. Thus, it would be
preferable to find an algorithm capable, from the start,
of adapting dynamically to any change in these parameters.
Furthermore, a number of additional studies can be found
focusing on specific system types (e.g., relational systems,
OLTP, etc.) or even certain auxiliary structures [4]. The
ability, therefore, of an algorithm to find general solutions
applicable to distinct systems and given diverse structures
would make its use even more recommendable. Given these
considerations, evolutionary algorithms present themselves
as the most promising solution, insofar as they can perfectly
adapt to the definition of the problem and, in addition, can
dynamically adapt to variations in the objective function [10]
[11]. Other related studies demonstrating the viability of the
proposed solution can also be found [18].
It is the principal objective of this study to offer an actual
measurement of the goodness and superiority of this proposal
when compared with the results obtained by frequently-used
tools, as well as expert database administrators. This first
evaluation, therefore, is made using a relational database in
static conditions (i.e., fixing characteristics of the database,
DBMS and workload), looking for any type of auxiliary
structure offered by the DBMS. It is the hope of the authors
of this study that the demonstrated relative goodness of
evolutionary algorithms as a solution to the ISP may ground
future studies that are more general (i.e., applicable to
distinct systems) and focus on dynamic conditions.
II. RELATED WORK AND PROPOSAL
Let C be the set of candidate indexes for a database, with
a cost function f: W × DB_i × P(C) → R providing a real value
for the execution of a workload (W) on a database with a given
state (DB_i) and a particular physical design which, for
reasons of brevity, is here restricted to the selection of an
index combination from the power set of C, that is, k ∈ P(C).
The ISP can now be defined as the search for the k which
minimizes the cost.
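Stated compactly, and fixing the workload W and the database state DB_i, the definition above amounts to the following minimization; this is only a restatement of the prose definition in standard notation, with no additional assumptions:

```latex
k^{*} \;=\; \operatorname*{arg\,min}_{k \,\in\, \mathcal{P}(C)} \; f\!\left(W,\, DB_i,\, k\right)
```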
One of the first problems to be addressed when selecting
the correct indexes for a particular database schema is to
decide the workload for which the indexes will optimize the
performance of the database. The workload is a significantly
large set of updating and query instructions (i.e., sentences)
representing the operations that occur in a database. One way
of selecting a representative workload is by utilizing the
logging capabilities [2] of many DBMSs to capture the trace
of queries and modifications made in any of those particular
systems. In certain published works [3], for example, the
new self-tuning characteristics of Oracle RDBMS are
presented. The selection of a good physical design is largely
influenced by the analysis of the most frequent operations
with the largest margin for improvement. In the
proposal presented here, design is grounded in a set of
candidate indexes which will be iteratively optimized during
the execution of a predefined set of operations. To select
these operations, Oracle uses the Automatic Workload Repository
(AWR), which is updated every hour with operations and
statistics collected by the DBMS.
The second problem to resolve is the selection of a set of
candidate indexes for a given workload. Given that, in many
cases, the search space of candidates is often computationally
unmanageable (i.e., with an exponential number of solutions),
a selection should be made among all possible index
combinations, eliminating those which are not representative
for the selected workload. In the majority of the proposals
studied, this filtration is carried out by the DBMS advisor.
To give some examples, the SQL Server Advisor bases its
recommendation on the syntactical structure of SQL
sentences [2]. In Oracle [3], the history stored (weekly) in
the AWR is used for the creation of four lists whose rankings
depend on the total time spent during (1) a given week, (2)
any given day of the week, (3) any given hour of the week,
and (4) the average time consumed during the week.
Next, the execution cost of each SQL sentence in the selected
workload is estimated under a given configuration of candidate
indexes. In most cases, dynamic programming [2][5] or what-if
analysis and optimization based on greedy algorithms
[8][14] are used. At this point, the procedure for
solving the ISP does not execute the definitive configuration
of indexes in the database schema. Rather, it is the
responsibility of the administrator, according to the data
provided by the advisor tool and the administrator’s own
experience, to select the definitive configuration of indexes
to be introduced in the DBMS. Additionally, certain other
proposals increase the degree of confidence in the
solution through the use of visual tools which, for example,
can demonstrate the interaction between indexes [19].
Solving the ISP for a DB instance at run time and
without the intervention of a DB administrator is one of the
goals of self-tuning. From the 1970s until the present day,
researchers have studied not only the ISP in static conditions,
but also other physical structures like materialized views and
their relationships with indexes [1], as well as the tuning of
database configuration parameters [13].
Furthermore, the ISP, once restricted to a set of
candidate indexes, amounts to a search for combinations
of elements (each of which may or may not appear) that
optimize an objective function. Therefore, the ISP may be
approached as an optimization problem whose objective
is the minimization of the cost, as measured in
response time or number of logical reads. The situation,
therefore, is ideal for the proposal of search and optimization
techniques like genetic algorithms for the solution of the ISP.
Genetic algorithms (GAs) are stochastic search and
optimization techniques inspired by the theory of evolution.
Over many generations, populations evolve according to the
principles of natural selection and the survival of the fittest.
Imitating this process, GAs simulate populations of
individuals – each representing a possible or candidate
solution to a single problem – capable of evolving under the
evolutionary pressure exerted by the objective function. The
best solutions (i.e., fittest individuals) have a greater
probability of surviving and, as a result, reproducing (i.e.,
being bred) with other surviving solutions. As will be
described in more detail in the following section, proper
evolution depends largely on the correct initial encoding (i.e.,
genetic representation) of these candidate solutions.
The process carried out by a GA can be summarized in
the following steps. First, an initial population is randomly
generated in which each individual of that population
represents a candidate solution to a single problem and is
genetically represented by one or more chromosomes
(normally bit strings). Once this population has been created,
individuals are evaluated according to a fitness function. The
fittest individuals are selected for reproduction using the
genetic operators, crossover and mutation. In crossover, part
of the genetic representation of each parent is handed down
to the child. Mutation, on the other hand, allows for the
appearance of new genetic characteristics in the child, similar
to what occurs in nature, through the random modification of
certain genes. Once this new population has been created and
has replaced the former population, the new individuals are
evaluated for fitness. This process is repeated numerous
times until a termination condition programmed by the
designer has been met. Diverse types of genetic operators for
selection, crossover and mutation exist, as well as distinct
probabilities with which they may be applied. The parameters
chosen in this study allow for the replication of the
experimentation environment and will be described in
greater detail in the third section.
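As an illustration of these steps only, the following minimal Python sketch reproduces the canonical generational loop just described. It is not the implementation used in the experiments (which relies on the AForge.Genetic library, as explained in Section III); the sketch uses simple truncation selection with elitism rather than the ranking/roulette-wheel selection applied later, the default rates merely mirror the parameters listed in Section III, and every helper name is illustrative.

```python
import random

def run_ga(fitness, n_bits, pop_size=40, generations=100,
           crossover_rate=0.75, mutation_rate=0.10):
    """Canonical generational GA over bit-string chromosomes (minimal sketch).

    `fitness` maps a bit list to a cost to be minimized (lower = fitter).
    """
    def random_individual():
        return [random.randint(0, 1) for _ in range(n_bits)]

    def crossover(a, b):
        # Uniform crossover: every gene is inherited from one parent or the other.
        return [random.choice(pair) for pair in zip(a, b)]

    def mutate(ind):
        # Single random bit flip.
        ind = ind[:]
        ind[random.randrange(len(ind))] ^= 1
        return ind

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness)           # evaluate and rank
        parents = scored[: pop_size // 2]                  # fittest half breeds
        children = [scored[0]]                             # elitism: best survives
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = crossover(a, b) if random.random() < crossover_rate else a[:]
            if random.random() < mutation_rate:
                child = mutate(child)
            children.append(child)
        population = children                              # replace the old generation
    return min(population, key=fitness)

# Toy run: minimize the number of set bits (a stand-in for a real cost function).
best = run_ga(fitness=sum, n_bits=20)
```

In the ISP setting, each bit string encodes the presence or absence of candidate indexes, and the fitness callable would return the cost of executing the workload under that configuration.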
The selection here of GAs over other techniques is due to
a number of reasons including, firstly, their behavior when
working with very large search spaces, something quite
common in real-world examples of the ISP. Furthermore,
GAs are highly recommendable techniques for the solution
of non-linear optimization problems, and their performance
has been sufficiently demonstrated for both academic
[12],[17] and real-world [16] search and optimization
problems. As an additional consideration, in real-world
instances of the ISP, it is not a necessary prerequisite that the
global optimal solution be found if the pinpointing of a local
optimal solution can offer a significant improvement.
Finally, GAs have proven to be robust techniques in the
presence of fitness function noise, something generally
present in real-world cases.
It is important to mention that the application of GAs to
the ISP has been proposed in this study as a hypothesis
which, in the case that it proves to be valid, can be used as a
launching point for future studies and improvements. The
proposal of this technique here, however, is not intended to
rule out or take the place of a deep analysis of the fitness
landscape to determine if any alternative and more adequate
optimization techniques exist.
In order to adapt the ISP to the GA, the former may be
reformulated in the following way: the system ought to find
an array of variables x ∈ M – where M is the total
number of possible index configurations in the system – that
minimizes the response time and number of logical reads
required for a defined set of instructions. The objective
function (i.e., fitness function) will be that which minimizes
the cost in time and/or reads resulting from the execution of
a predefined set of SQL sentences under the configuration of
indexes proposed by the array of variables.
The parameter M can be calculated using Equation (1):

(1)   M = \sum_{i=1}^{nCols} C_{nCols}^{i} \cdot nIndexType_{i}

being C_{nCols}^{i} = \frac{nCols!}{(nCols - i)! \cdot i!},
where nCols represents the total number of columns
involved in the study, where i permits the study of each
possible index grouping including anywhere from 1 to nCols
elements, and where nIndexType represents the total number
of index types (e.g., secondary, bitmap, cluster, etc.) that
could be included in the study. The resulting number of
possible combinations (M) will then be used to determine the
size in bits of the chromosome (N) in Equation (2).
(2)   N = \lfloor \log_2(M) \rfloor + 1, where \lfloor x \rfloor = n if n < x < n + 1 and n ∈ \mathbb{N}
This encoding has the advantage of working with bit
strings, the most recommendable for GAs. Values 1 and 0
represent indexes that exist and do not exist, respectively.
Nevertheless, the encoding has the inconvenience of
presenting a certain amount of redundancy depending on the
proximity of M to a power of two. As a solution, this paper
proposes the use of non-binary chromosomes based on the
types of possible indexes (nIndexTypes).
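As a concrete illustration of Equations (1) and (2), the short sketch below computes the search-space size M and the chromosome length N for an invented configuration. For simplicity it assumes that the same number of index types applies to every column grouping; the specific figures are examples only and do not correspond to the experimental scenario.

```python
from math import comb, floor, log2

def search_space_size(n_cols, n_index_type):
    # Equation (1): every grouping of 1..nCols columns may be materialized
    # as any of the nIndexType index types (assumed constant here).
    return sum(comb(n_cols, i) * n_index_type for i in range(1, n_cols + 1))

def chromosome_bits(m):
    # Equation (2): smallest bit-string length able to encode M combinations.
    return floor(log2(m)) + 1

M = search_space_size(n_cols=9, n_index_type=2)   # e.g., 9 columns, 2 index types
N = chromosome_bits(M)                            # -> M = 1022, N = 10 bits
```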
III. EXPERIMENTAL DESIGN
The principal objective of this study is to determine
whether the GA can solve the ISP efficiently. In
order to carry out these experiments, a relational database
management system (RDBMS) was required that allows for the
use of indexes and provides statistical information about
their use and performance. As a result, the Oracle Database
11g™ (http://www.oracle.com/us/products/database) was
chosen not only because it fulfilled these prerequisites, but
also due to its widespread use by consumers (that said,
however, it is important to recognize that the proposal could
be scalable to any other DBMS, as well). The DBMS was
installed on a dedicated server (Quad-Core AMD Opteron™
8356 processor at 2.3 GHz with 8 GB of RAM) running
a Windows Server 2008 Enterprise (64-bit) OS. The
experimental scenario and files used to run the experiments
can be found at http://labda.inf.uc3m.es/evolutionary/.
As mentioned previously, the hypothesis tested by this
study was whether the method based on GAs could yield
solutions to the ISP with lower costs than other methods with
respect to response time and number of logical reads
(specifically, consistent gets [6]). These other methods tested
here included the recommendations from tuning experts for
the selected DBMS, as well as the proposals from the widely
used analytical tool, Toad® (http://www.quest.com/toad)
for Oracle. In each case, a resulting set k ∈ P(C0) was
yielded which was efficient for a database (DBi) and given
workload (W), and with a set of candidate indexes C0.
A. Independent Variables
For this experiment, a database was used with a single
table whose description included 25 fields, four of which
were numeric and the rest of which were alphanumeric.
Thus, while the database used contained only a
single table, that table had to be significantly
large. The hypothesis was tested in two different
scenarios. In the first, the table contained 6 million
records (approximately 3 GB); in the second, it was
increased to 30 million records (nearly 15 GB).
In both scenarios, the maximum and average record sizes per
table were 521 and 371.63 bytes, respectively.
Regarding the measurement of performance, not only the
database itself and its design are crucial, but so too are the
state of the DBMS (in particular, those of its buffers) and,
naturally, the workload (W). Even though the design of W does not
affect the behavior of the proposal, it is nevertheless
convenient to design W such that performance can be
improved or worsened depending on the inclusion of
different indexes. Therefore, in this experiment, W included
a sufficient number of updating and query operations (with a
0.1% volatility in the former). The queries (eight in total)
involved conditional expressions over one, two or three
columns in the table, affecting nine columns overall. In order
that each execution could be carried out under the same
conditions, buffers were flushed in each of the iterations.
Additionally, since the different updating processes change
the state of the database, it was returned to its original state
following each execution of W. In this study, only the cost
for the execution of W, rather than for re-initialization and
preparation processes, is taken into account.
In order to demonstrate the goodness of the proposal,
indexes for each of the columns involved in a selection were
included in the set of candidate indexes for the experimental
scenarios. Some of these columns, for example, gave rise to
multiple candidates since two distinct index types (i.e.,
secondary and bitmap) were considered for columns with
low cardinality (i.e., fewer than 28 distinct values). Finally,
toward a more realistic experiment, multi-attribute indexes
were included for columns appearing together in the same
sentence. In the end, the set contained some twenty indexes
among which the most appropriate were to be identified.
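A candidate set of this kind can be sketched programmatically as follows. This is a hypothetical helper written for illustration, not the procedure used by the authors or by any advisor tool: single-column secondary indexes are proposed for every column referenced in a selection, a bitmap alternative is added when the column's cardinality falls below the threshold mentioned above, and a multi-attribute index is proposed for columns appearing together in the same sentence.

```python
def candidate_indexes(query_columns, cardinality, bitmap_threshold=28):
    """Hypothetical candidate-set builder.

    query_columns: one set of referenced columns per query in the workload.
    cardinality:   mapping column name -> number of distinct values.
    """
    candidates = set()
    for cols in query_columns:
        for col in cols:
            candidates.add(("SECONDARY", (col,)))               # default B-tree index
            if cardinality.get(col, float("inf")) < bitmap_threshold:
                candidates.add(("BITMAP", (col,)))              # low-cardinality alternative
        if len(cols) > 1:
            candidates.add(("SECONDARY", tuple(sorted(cols))))  # multi-attribute index
    return candidates

# Toy usage with invented columns and cardinalities.
cands = candidate_indexes(
    query_columns=[{"city", "status"}, {"status"}, {"amount"}],
    cardinality={"city": 500, "status": 5, "amount": 100000},
)
```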
B. Dependent Variables
The metric used here to demonstrate the efficiency of a
particular method was the number of data block consistent
gets [6]. This measurement can be as representative as
response time and offers a good deal of independence from
other factors (e.g., state of the database server, processes
foreign to the DBMS, state and management policy of the
intermediate memory, state and configuration of the storage
devices, etc.) that could potentially affect performance. As a
secondary metric, response time was also observed. In order
to reduce alterations due to other factors, each execution was
repeated 50 times with the DBMS cache emptied for each
respective repetition. For response time, the average of the
measurements obtained in each execution was used.
C. Methods
This study compares the following methods:
• Toad® for Oracle: a commercial tool with two
distinct configurations (Toad2 and Toad5).
• Experts: recommendations proposed by two
professional administrators, Expert #1 and Expert
#2, with access to the database and to all the
statistical data collected by the DBMS.
• GA: a recommendation based on the application of a
simple or canonical GA following Goldberg [16].
• None: configuration by default from the DBMS.
In order to allow for the easy replication of these
experiments, AForge.Genetic, an open-source genetic
algorithm library, was used. Furthermore, the genetic
operators of selection, crossover and mutation utilized in the
experiments were those implemented by default in the
library. The GA execution parameters tested were:
• Population Size: 40 / 100
• Selection Meth.: Ranking/Roulette-wheel (both with
elitism)
• Crossover (type and rate): Uniform; 75%
• Mutation (type and rate): Single Chr. random bit
flip; 10%
• Termination Condition: 100 generations
• Evaluation Function: fitness is the number of Oracle
consistent gets, to be minimized
The GA procedure applied in the experiments can be
summarized as follows. First, the individuals of the initial
population were randomly generated. Following this
initialization process, each individual was evaluated during
the sequential execution of the workload with the number of
consistent gets used as the fitness value. Once evaluated, the
fittest individuals were selected using the ranking/roulette-wheel technique. Of the selected individuals from the initial
population, 75% were then subjected to crossover, followed
by the mutation of one of the genes in 10% of the selected
individuals. At this point, the new individuals obtained were
evaluated (it was not necessary to re-evaluate the unmodified
individuals) and selection, crossover and mutation were
repeated until satisfying the termination condition. The
computational cost of each iteration (i.e., generation) of the
process depended on the evaluation of the new individuals in
that iteration. In each evaluation, the database was returned
to its initial state, the indexes proposed by the individual to
be evaluated were created and the workload was launched.
Thus, the execution time for a complete generation could be
measured as the time taken for the DBMS to carry out this
process multiplied by the number of individuals to be
evaluated. Depending upon the convergence of the GA, the
number of new individuals analyzed would diminish, thereby
increasing the execution velocity. Despite the procedure
described and used in the present experiment, the possibility
of carrying out a simulation of the workload rather than its
execution should not be ruled out for future studies.
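The per-individual evaluation can be pictured roughly as in the sketch below. This is only an illustration of the procedure just described, not the code used in the experiments: the cx_Oracle connection, the reset_database and decode_indexes helpers, and the workload list are all assumed; the buffer cache is flushed so that every run starts from the same state, and the fitness is taken as the increase in the instance-wide consistent gets counter (read from Oracle's v$sysstat view) caused by executing the workload.

```python
import cx_Oracle  # assumed driver; any connection mechanism would do

def consistent_gets(cur):
    # Cumulative counter; the fitness uses its increase around the workload run.
    cur.execute("SELECT value FROM v$sysstat WHERE name = 'consistent gets'")
    return cur.fetchone()[0]

def evaluate(individual, conn, workload_sql, reset_database, decode_indexes):
    """Fitness of one index configuration = consistent gets spent on the workload."""
    cur = conn.cursor()
    reset_database(conn)                              # return the DB to its initial state
    for ddl in decode_indexes(individual):            # chromosome -> CREATE INDEX statements
        cur.execute(ddl)
    cur.execute("ALTER SYSTEM FLUSH BUFFER_CACHE")    # identical buffer state per run
    before = consistent_gets(cur)
    for sentence in workload_sql:                     # sequential execution of W
        cur.execute(sentence)
    conn.commit()
    return consistent_gets(cur) - before              # lower is fitter
```

Caching the fitness of unmodified individuals, as described above, avoids repeating this costly evaluation in later generations.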
IV. EXPERIMENTAL RESULTS
Two experiments were carried out using tables with 6
and 30 million records, respectively. Once the solutions
proposed by the GA and other sources (see Methods) had
been obtained, 50 executions of each solution in the same
conditions were carried out. Averages of the measurements
from those executions were taken for a comparative analysis.
A. First scenario: 6 million records
In the first scenario, different index combinations were
obtained by each of the methods tested – the GA, the
commercial tool with two different configurations (i.e.,
Toad2 and Toad5), the two experts consulted and the empty
(i.e., default) configuration. In figure 1.b, response times for
each of the proposed combinations (averages from the 50
executions) are compared. In figures 1.a and 1.c, the same
solutions for each different method are detailed with respect
to consistent gets and database block gets, respectively.
B. Second scenario: 30 million records
In the second scenario, the same experiments were
repeated on a table five times larger than in the previous
scenario. Similar to the first scenario, the proposed solutions
were different for each method evaluated. However, results
also differed in the majority of cases from those observed in
the first scenario with the smaller table. As evidenced by
figure 2.a, the GA again optimized consistent
gets, which in turn led to the response time optimization
seen in figure 2.b. In addition, as can be seen in figures 2.c
and 2.b, the large reduction in database block gets in the
combinations proposed by the index experts resulted in a
response time that came significantly close to, yet without
surpassing, that of the solution proposed by the GA.
C. Additional experimentation
The virtual machine software VMware (www.vmware.com) was
installed on the DB server, allowing processor and main memory
resources to be restricted, and each
experiment was repeated under different conditions. While the
experiments, therefore, were executed with diverse hardware
configurations and workloads, the results obtained in each
case were nevertheless similar with respect to the superior
performance of the GA over the other methods tested.
In order to analyze the diversity of solutions proposed by
the GA (due to its stochasticity), an additional experiment
was performed which sought to repeat the executions of the
GA with the same conditions as in the first execution.
However, variations in the proposed solutions were rarely
produced, and in those cases they could be traced to the
appearance of a local minimum that was close to the global
optimum and, objectively speaking, superior to all those
proposed by either the commercial tool or the experts.
D. Statistical testing
In the two scenarios, statistical tests were performed to
prove the significance of the results obtained. First, results
were subjected to the Kolmogorov-Smirnov test to evaluate
the normality of their distribution. Measurements based on
consistent gets were found to have a normal distribution and
a later Student’s t-test (p < 0.05) showed significant results,
ruling out the null hypothesis. Regarding the measurements
based on response time, however, the normality of the
distribution was not clear. As a result, the non-parametric
Mann-Whitney-Wilcoxon test was used, finding statistically
significant differences (p < 0.05) in each case.
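Assuming the 50 measurements per method are available as plain arrays, the testing pipeline described above can be reproduced with standard routines from scipy.stats; the data below are synthetic placeholders, not the values measured in the experiments.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ga_gets = rng.normal(41_000, 500, size=50)       # placeholder: 50 runs of the GA solution
expert_gets = rng.normal(52_000, 800, size=50)   # placeholder: 50 runs of an expert solution

# 1. Kolmogorov-Smirnov test for normality (against a standardized normal).
z = (ga_gets - ga_gets.mean()) / ga_gets.std(ddof=1)
_, p_normal = stats.kstest(z, "norm")

# 2. Student's t-test if normality holds, Mann-Whitney-Wilcoxon otherwise.
if p_normal >= 0.05:
    _, p_value = stats.ttest_ind(ga_gets, expert_gets)
else:
    _, p_value = stats.mannwhitneyu(ga_gets, expert_gets)

significant = p_value < 0.05   # reject the null hypothesis of equal performance
```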
V. DISCUSSION
The metric used in the objective function for the GA in
this study was the number of consistent gets rather than
response time. This was due to the far greater variability of the
latter metric, which is influenced by diverse factors that introduce
noise into the measurements. Even though the selected algorithm is
robust with respect to noise, using the response time metric
for convergence would nevertheless be much more costly.
Furthermore, the selected metric is also effective in reducing
database access time, which led the GA to obtain optimal
results, as can be observed in figures 1.a and 2.a.
While the commercial tool demonstrated positive
improvements in performance, they were inferior to those
presented by the GA. One of the causes for this difference
can be found in the volatility of the tables (since the
workload includes updating operations). In general,
commercial tools tend to focus on the optimization of record
localization operations and often only on the localizations
with larger margin for improvement. Thus, the tools pay
little or no attention to the cost coming from table and
resulting index updating operations. Such operations are
similarly difficult for experts to assess. By contrast,
the nature of the operations executed remains
completely foreign to the GA, which only observes the
associated metric (i.e., consistent gets, in this case).
It is also important to note that the proposals of the
commercial tools include the tuning of certain instance
configuration parameters which, when combined, may yield
satisfactory results. The present study has left these
parameters out of the experiments discussed here, despite the
fact that they could also be used by the GA for the
comprehensive tuning of the database. This fact constitutes
an extremely interesting area for future research.
Regarding the performance of the GA relative to the
other mechanisms tested, one must note that the latter
required data about the execution of the workload over a
number of days, data which the GA did not need.
The GA, however, has disadvantages in resource consumption and
execution time due to its need to test, establish and retract diverse
configurations. Since this lowered performance would be
undesirable in many production servers, all experiments
of the present study were conducted on a replicated server.
The results presented here clearly support the use of GAs
in the ISP. Nevertheless, future results could be improved further by
testing the application of additional metrics or even the
simultaneous application of multiple metrics (multi-objective
algorithms). Finally, the improvement of the effectiveness of
the algorithm without efficiency loss is another future goal.
VI. CONCLUSIONS
The current study has evaluated a proposal for the
solution of the ISP using GAs. In the course of that
evaluation, the efficacy of GAs has been compared with that
of commercial tools as well as experts in database
administration and tuning. In each case studied, the solution
offered from the GA was superior to those obtained through
the other methods used. While these results clearly support
the case for the use of GAs on a professional level, there
nevertheless exist certain performance-related weaknesses as
well as a large margin for improvement that ought to be
explored before any particular product is marketed.
The consumption of DB resources during the exploration
of solutions is high and unacceptable for many production
servers. In such cases, therefore, it is the recommendation of
the authors to execute the GA over a replicated server.
Figure 1 (a, b, c). Average results from table with 6 million records
Figure 2 (a, b, c). Average results for table with 30 million records
A particular advantage presented by GAs is that their
continued execution provides results that dynamically adapt
to variations in the workload over time. Thus, it would be
particularly interesting to dedicate future research efforts to
measure the time elapsed from the moment that workload
changes are produced to the moment when these changes
have a noticeable effect upon the proposal.
The parameters of the experiments carried out in this
study – with respect to the environment, DBMS and applied
workload – were all highly general. Nevertheless, and in
order to further demonstrate this generality and exploit the
use of GAs for the ISP, currently planned studies by the
authors aim to expand research to additional DBMSs, other
auxiliary structures (e.g., R-tree and clustered indexes), as
well as instance configuration parameters. In this way, the
authors hope to encounter a reliable mechanism for
comprehensive database tuning. Similar to the majority of
ISP-related techniques, the proposal presented here is
grounded in a set of candidate indexes. Compared with other
ISP-related techniques, however, this set can be made
relatively large while maintaining reasonable cost margins.
Nevertheless, performance decreases with an increased
number of possible solutions. Furthermore, it must be added
that a large part of the other improvements suggested – the
inclusion of configuration parameters (thereby increasing
chromosome size), the application of additional metrics
(either individual or simultaneous with a multi-objective
algorithm), etc. – may result in even lower performance and a
higher number of generations required to obtain good solutions.
Insofar as the performance of the GA, which was already
a weak point of the technique, could decrease further with
the implementation of these new improvements, it would be
convenient to propose a change in the way solutions are
evaluated by replacing the real executions in the replicated
server with approximate cost calculations using simulations or
heuristics. While this proposal would undoubtedly take away
a certain amount of realism from the measurements, it would
also allow for the improvement of the proposed solutions (by
carrying out larger-scale tests) without greatly affecting the
cost. However, given the fact that the GA explores a volume
of solutions much lower than the total number of possible
solutions, the cost could nevertheless be maintained within
margins unthinkable for more exhaustive searches, even
those carried out through simulation.
Finally, and insofar as it would allow for a more precise
classification of the ISP [15], the analysis of the fitness
landscape constitutes a necessary topic for future research.
With such a classification available, researchers could
therefore perform an important comparative study of the
implementation of alternative optimization techniques.
ACKNOWLEDGMENT
This work has been supported by the Spanish Ministry
of Education and Science (project Thuban TIN2008-02711).
REFERENCES
[1] Agrawal, S., Chaudhuri, S., and Narasayya, V. 2000.
Automated selection of materialized views and indexes for
SQL databases. In Procs. 26th VLDB 2000, pp 496–505.
[2] Agrawal, S. Chaudhuri, S., Kollar, L., Marathe, A. P.,
Narasayya, V.R., and Syamala, M. 2004. Database Tuning
Advisor for Microsoft SQL Server 2005. In Procs. 30th VLDB
Conference 2004, pp. 1110–1121.
[3] Belknap, P., Dageville, B., Dias, K., and Yagoub, K. 2009.
Self-Tuning for SQL Performance in Oracle Database 11g.
IEEE Int. Conf. on Data Engineering (ICDE), pp.1694-1700.
[4] Bellatreche, L., Missaoui, R., Necir, H., and Drias, H. 2007.
Selection and Pruning Algorithms for Bitmap Index Selection
Problem Using Data Mining. In Song et al. (Eds.): DaWak
2007. LNCS 4654, pp 221-230. Springer.
[5] Bruno, N., and Chaudhuri, S. 2005. Automatic Physical
Database Tuning: a Relaxation-based Approach. In Procs.
ACM SIGMOD Conference 2005, pp. 227–238.
[6] Burleson, D.K. 2010. Oracle Tuning: The Definitive
Reference. Second ed., Rampant TechPress.
[7] Caprara, A., Fischetti, M., and Maio, D. 1995. Exact and
Approximate Algorithms for the Index Selection Problem in
Physical Database Design. IEEE Transactions on Knowledge
and Data Engineering 7(6): pp. 955-967.
[8] Chaudhuri, S., and Narasayya, V. 1998. Autoadmin ‘what-if’
index analysis utility. SIGMOD Records 27(2), pp. 367–378.
[9] Chaudhuri, S., Datar, M., and Narasayya, V. 2004. Index
selection for databases: A hardness study and a principled
heuristic solution. IEEE Transactions on Knowledge and Data
Engineering, 16(11): pp. 1313-1323.
[10] Cobb, H.G., and Grefenstette, J.J. 1993. Genetic Algorithms
for Tracking Changing Environments. Proc. 5th Intl. Conf. on
Genetic Algorithms, pp. 523-529.
[11] Dasgupta, D., and McGregor, D.R. 1992. Nonstationary
function optimization using structured genetic algorithm. In
Procs. of PPSN II, pp. 145-154.
[12] DeJong, K. 1986. An Analysis of Reproduction and
Crossover in a Binary-coded Genetic Algorithm. PhD thesis,
University of Michigan, Ann Arbor.
[13] Duan, S., Thummala, V., and Babu, S. 2009. Tuning database
configuration parameters with iTuned. Proc. VLDB Endow.
2, 1, pp. 1246-1257.
[14] Finkelstein, S., Schkolnick, M., and Tiberio, P. 1988. Physical
database design for relational databases. ACM Transactions
on Database Systems 13(1): pp. 91–128.
[15] Galindo-Legaria, C., Waas, F. 2002. The Effect of Cost
Distributions on Evolutionary Optimization Algorithms.
GECCO 2002: 351-358.
[16] Goldberg, D. 1989. Genetic Algorithms in Search,
Optimization, and Machine Learning. Addison-Wesley.
[17] Holland, J. 1975. Adaptation in Natural and Artificial
Systems. University of Michigan Press.
[18] Kratica, J., Ljubić, I., and Tošić, D. 2003. A genetic algorithm
for the index selection problem. In G. R. Raidl et al. (Eds.),
EvoWorkshops 2003. LNCS vol 2611, pp 281-291, Springer.
[19] Schnaitter, K., Polyzotis, N., and Getoor, L. 2009. Index
Interactions in Physical Design Tuning: Modeling, Analysis,
and Applications. VLDB 2009, Vol. 2, N.1, pp. 1234-124.