Iterative-Deepening Search with
On-line Tree Size Prediction
Ethan Burns and Wheeler Ruml
University of New Hampshire
Department of Computer Science
eaburns at cs.unh.edu and ruml at cs.unh.edu
Abstract. The memory requirements of best-first graph search algorithms such as A* often prevent them from solving large problems. The
best-known approach for coping with this issue is iterative deepening,
which performs a series of bounded depth-first searches. Unfortunately,
iterative deepening only performs well when successive cost bounds visit
a geometrically increasing number of nodes. While it happens to work
acceptably for the classic sliding tile puzzle, IDA* fails for many other
domains. In this paper, we present an algorithm that adaptively chooses
appropriate cost bounds on-line during search. During each iteration, it
learns a model of the search tree that helps it to predict the bound to use
next. Our search tree model has three main benefits over previous approaches: 1) it will work in domains with real-valued heuristic estimates,
2) it can be trained on-line, and 3) it is able to make predictions with
only a small number of training examples. We demonstrate the power
of our improved model by using it to control an iterative-deepening A*
search on-line. While our technique has more overhead than previous
methods for controlling iterative-deepening A*, it can give more robust
performance by using its experience to accurately double the amount of
search effort between iterations.
1 Introduction
Best-first search is a fundamental tool for automated planning and problem
solving. One major drawback of best-first search algorithms, such as A* [1],
is that they store every node that is generated. This means that for difficult
problems in which many nodes must be generated, A* runs out of memory. If
optimal solutions are still required, however, iterative deepening A* (IDA*) [3]
can often be used instead. IDA* performs a series of depth-first searches where
each search expands all nodes whose estimated solution cost falls within a given
bound. As with A*, the solution cost of a node n is estimated using the value
f (n) = g(n) + h(n) where g(n) is the cost accrued along the current path from
the root to n and h(n) is a lower-bound on the cost that will be required to
reach a goal node, which we call the heuristic value of n. After every iteration
that fails to expand a goal, the bound is increased to the minimum f value of
any node that was generated but not previously expanded. Because the heuristic
Fig. 1. Geometric versus non-geometric growth. [Left panel: a tree with integer f layers, f = 3, 4, 5. Right panel: a tree with real-valued f layers, f = 3, 3.1, 3.2, 3.3.]
estimator is defined to be a lower-bound on the cost-to-go and because the bound
is increased by the minimum amount, any solution found by IDA* is guaranteed
to be optimal. Also, since IDA* uses depth-first search at its core, it only uses an
amount of memory that is linear in the maximum search depth. Unfortunately,
it performs poorly on domains with few nodes per f layer as it will re-expand
many interior nodes in order to expand only a very small number of new frontier
nodes on each iteration.
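The iterated bounded depth-first scheme just described can be sketched as follows. This is our own toy encoding for illustration, not the authors' implementation: the graph, heuristic values, and helper signatures (succ, cost, h, is_goal) are all assumptions.

```python
import math

def ida_star(start, succ, cost, h, is_goal):
    """Toy IDA*: repeated cost-bounded DFS, raising the bound to the
    minimum f value of any generated-but-pruned node after each
    iteration that fails to expand a goal."""
    bound = h(start)
    while True:
        next_bound = math.inf

        def dfs(n, g):
            nonlocal next_bound
            f = g + h(n)
            if f > bound:                        # out of bounds: prune,
                next_bound = min(next_bound, f)  # but remember its f value
                return None
            if is_goal(n):
                return g                         # first in-bound goal is optimal
            for c in succ(n):
                r = dfs(c, g + cost(n, c))
                if r is not None:
                    return r
            return None

        sol = dfs(start, 0)
        if sol is not None:
            return sol
        if next_bound == math.inf:
            return None                          # whole space exhausted
        bound = next_bound                       # minimal possible increase
```

Because the heuristic is a lower bound and the bound rises by the minimum amount, the first goal found within the bound is optimal.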
One reason why IDA* performs well on classic academic benchmarks like the sliding tiles puzzle and Rubik’s cube is that both of these domains have a geometrically increasing number of nodes that fall within the successive iterations
as the bound used for the search is increased by the minimum possible amount.
This means that each iteration of IDA* will re-expand not only all of the nodes
of the previous iterations but it will also expand a significant number of new
nodes that were previously out-of-bounds. Sarkar et al. [10] show that, in a domain with this geometric growth, IDA* will expand O(n) nodes where n is the
number of nodes expanded by A*. They also show, however, that in a domain
that does not exhibit geometric growth, IDA* may expand as many as O(n^2)
nodes. Figure 1 shows this graphically. The diagram on the left shows a tree with
three f-layers, each of an integer value, and each layer encompasses a sufficient
portion of the tree such that successive iterations of IDA* will each expand many
new nodes that were not expanded previously. The right diagram in Figure 1, on
the other hand, shows a tree with real-valued f layers where each layer contains
only a very small number of nodes and therefore IDA* will spend a majority of
its time re-expanding nodes that it has expanded previously. Because domains
with real-valued edge costs tend to have many distinct f values, they fall within
this latter category in which IDA* performs poorly.
The main contribution of this work is a new type of model that can be used to
estimate the number of nodes expanded in an iteration of IDA*. While the state-of-the-art approach to estimating search effort is able to predict the number of expansions with surprising accuracy in several domains, it has two drawbacks: 1) it
requires a large amount of off-line training to learn the distribution of heuristic
values and 2) it does not extend easily to domains with real-valued heuristic
estimates. Our new model, which we call an incremental model, is able to predict
as accurately as the current state-of-the-art model for the 15-puzzle when trained
off-line. Unlike the previous approaches, however, our incremental model can
also handle domains with real-valued heuristic estimates. Furthermore, while the
previous approaches require large amounts of off-line training, our model may be
trained on-line during a search. We show that our model can be used to control an
IDA* search by using information learned on completed iterations to determine
a bound to use in the subsequent iteration. Our results show that our new
model accurately predicts IDA* search effort. While IDA* guidance using our
model tends to be expensive in terms of CPU time, the gain in accuracy allows
the search to remain robust. Unlike the other IDA* variants which occasionally
give very poor performance, IDA* using an incremental model is the only IDA*
variant that can perform well over all of the domains used in our experiments.
2 Previous Work
Korf et al. [4] give a formula (henceforth abbreviated KRE) for predicting the
number of nodes IDA* will expand with a given heuristic when searching to a
given cost threshold. The KRE method uses an estimate of the heuristic value
distribution in the search space to determine the percentage of nodes at a given
depth that are fertile. A fertile node is a node within the cost bound of the
current search iteration and hence will be expanded by IDA*.
The KRE formula requires two components: 1) the heuristic distribution in
the search space and 2) a function for predicting the number of nodes at a
given depth in the brute-force search tree. They showed that off-line random
sampling can be used to learn the heuristic distribution. For their experiments,
a sample size of ten billion states was used to estimate the distribution of the 15-puzzle. Additionally, they demonstrate that a set of recurrence relations, based
on a special feature that they called the type of a node, can be used to find
the number of nodes at a given depth in the brute-force search tree for a tiles
puzzle or Rubik’s cube. The node type used by the KRE method for the 15-puzzle is the location of the blank tile: on a side, in a corner, or in the middle.
Throughout this paper, a node type can be any feature of a state that is useful
for predicting information about its offspring. The results of the KRE formula
using these two techniques gave remarkably accurate predictions when averaged
over a large number of initial states for each domain.
Zahavi et al. [14] provide a further generalization of the KRE formula called
Conditional Distribution Prediction (CDP). The CDP formula uses a conditional
heuristic distribution to predict the number of nodes within a cost threshold.
The formula takes into account more information than KRE such as the heuristic value and node type of the parent and grandparent of each node as conditions
on the heuristic distribution. This extra information enables CDP to make predictions for individual initial states and to extend to domains with inconsistent
heuristics. Using CDP, Zahavi et al. show that substantially more accurate predictions can be made on the sliding tiles puzzle and Rubik’s cube given different
initial states with the same heuristic value.
While the KRE and CDP formulas are able to give accurate predictions,
their main drawback is that they require copious amounts of off-line training
to estimate the heuristic distribution in a state space. Not only does this type
of training take an excessive amount of time but it also does not allow the
model to learn any instance-specific information. In addition, the implementation
of these formulas as specified by Zahavi et al. [14] assumes that the heuristic
estimates have integer values so that they can be used to index into a large
multi-dimensional array. Many real-world domains have real-valued edge costs
and therefore these techniques are not applicable in those domains.
2.1 Controlling Iterative Search
The problem with IDA* in domains with many distinct f values is well known
and has been explored in past work. Vempaty et al. [12] present an algorithm
called DFS*. DFS* is a combination of IDA* and depth-first search with branch-and-bound that sets the bounds between iterations more liberally than standard
IDA*. While the authors describe a sampling approach to estimate the bound
increase between iterations, in their experiments, the bound is simply increased
by doubling.
Wah et al. [13] present a set of three linear regression models to control an
IDA* search. Unfortunately, intimate knowledge of the growth properties of f
layers in the desired domain is required before the method can be used. In many
settings, such as domain-independent planning for example, this knowledge is
not available in advance.
IDA* with Controlled Re-expansion (IDA*CR ) [10] uses a method similar to
that of DFS*. IDA*CR uses a simple model of the search space that tracks the
f values of the nodes that were pruned during the previous iteration and uses
them to find a bound for the next iteration. IDA*CR uses a histogram to count
the number of nodes with each out-of-bound f value during each iteration of
search. When the iteration is complete, the histogram is used to estimate the f
value that will double the number of nodes in the next iteration. The remainder
of the search proceeds as in DFS*, by increasing the bound and performing
branch-and-bound on the final iteration to guarantee optimality.
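The bound-setting rule of IDA*CR can be sketched roughly as follows. This is a simplified reconstruction of the idea: it counts individual pruned f values rather than the coarse histogram buckets the actual algorithm uses, and the function name is our own.

```python
def next_bound_cr(expanded_last, pruned_fs):
    """Choose the smallest pruned f value below which enough pruned
    nodes fall to roughly double the previous iteration's expansions.
    If there are too few pruned nodes, fall back to the greatest
    pruned f value seen (which may advance the search only a little)."""
    count = 0
    for f in sorted(pruned_fs):
        count += 1
        if count >= expanded_last:   # new nodes alone match last iteration's work
            return f
    return max(pruned_fs)            # insufficient pruned nodes: best we can do
```

The fallback branch illustrates the first assumption discussed below: when fewer nodes were pruned than expanded, no available bound can double the work.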
While IDA*CR is simple, the model that it uses to estimate search effort relies
upon two assumptions about the search space to achieve good performance. The
first is that the number of nodes that are generated outside of the bound must
be at least the same as the number of nodes that were expanded. If there are
an insufficient number of pruned nodes, IDA*CR sets the bound to the greatest
pruned f value that it has seen. This value may be too small to significantly
advance the search. The second assumption is that none of the children of the
pruned frontier nodes of one iteration should fall within the bound on the next
iteration. If this happens, then the next iteration may be much larger than twice
the size of the previous. As we will see, this can cause the search to overshoot the
optimal solution cost on its final iteration, giving rise to excessive search effort.
3 Incremental Models of Search Trees
To estimate the number of nodes that IDA* will expand when using a given cost
threshold, we would like to know the distribution of f values in the search space.
Assuming a consistent heuristic¹, all nodes with f values within the threshold
will be expanded. If this distribution is given as a histogram that contains the
number of nodes with each f value, then we can simply find the bound for which
the number of nodes with f values less than the bound matches our desired value.
Our new incremental model performs this task and has the ability to be trained
both off-line with sampling and on-line during a search.
We will estimate the distribution of f values in two steps. In the first step, we
learn a model of how the f values are changing from nodes to their offspring. In
the second step, we extrapolate from the model of change in f values to estimate
the overall distribution of f values. This means that our incremental model
manipulates two main distributions: we call the first one the ∆f distribution
and the second one the f distribution. In the next section, we will describe the
∆f distribution and give two techniques for learning it. We will then describe
how the ∆f distribution can be used to estimate the f distribution.
3.1 The ∆f Distribution
The goal of learning the ∆f distribution is to predict how the f values in the
search space change between nodes and their offspring. The advantage of storing
∆f values instead of storing the f values themselves is that it enables our model
to extrapolate to portions of the search space for which it has no training data, a
necessity when using the model on-line or with few training samples. We will use
the information from the ∆f distribution to build an estimate of the distribution
of f values over the search nodes.
The CDP technique of Zahavi et al. [14] learns a conditional distribution of
the heuristic value and node type of a child node c, conditioned on the node type
and heuristic estimate of the parent node p, notated P (h(c), t(c)|h(p), t(p)). As
described by [14], this requires indexing into a multi-dimensional array according
to h(p) and so the heuristic estimate must be an integer value. Our incremental
model also learns a conditional distribution; however, in order to handle real-valued heuristic estimates, our incremental model uses the integer-valued search-space-steps-to-go estimate d of a node instead of its cost-to-go lower bound, h.
In unit-cost domains, d, also known as the distance estimate, will typically be
the same as h, however in domains with real-valued edge costs they will differ.
d is typically easy to compute while computing h [11]. The distribution that is
learned by the incremental model is P (∆f (c), t(c), ∆d(c)|d(p), t(p)), that is, the
distribution over the change in f value between a parent and child, the child
node type and the change in d estimate between a parent and child, given the
distance estimate of the parent and the type of the parent node.
The only non-integer term used by the incremental model is ∆f (c). Our
implementation uses a large multi-dimensional array of fixed-sized histograms
¹ A heuristic is consistent when the change in the h value between a node and its
successor is no greater than the cost of the edge between the nodes. If the heuristic is
not consistent then a procedure called pathmax [6] can be used to make it consistent
locally along each path traversed by the search.
over ∆f (c) values. Each of the integer-valued features is used to index into
the array, resulting in a histogram of the ∆f (c) values. By storing counts, the
model can estimate the branching factor of the search space by dividing the total number of offspring by the total number of nodes with a given d and t.
This branching factor will be used below to estimate the number of successors
of a node when building the f distribution.
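A minimal sketch of this data structure follows, assuming plain dicts in place of the fixed-size histograms of the implementation; all class and method names here are illustrative, not the authors' code.

```python
from collections import defaultdict

class DeltaFModel:
    """Counts of (possibly real-valued) delta-f values, keyed by the
    integer-valued features (t(c), delta-d(c), d(p), t(p)).  Also tracks
    expansion and offspring counts per parent (t, d) so the branching
    factor can be estimated."""
    def __init__(self):
        self.hist = defaultdict(lambda: defaultdict(float))
        self.expansions = defaultdict(int)   # parents expanded per (t, d)
        self.offspring = defaultdict(int)    # children generated per (t, d)

    def expanded(self, t_p, d_p):
        self.expansions[(t_p, d_p)] += 1

    def train(self, t_p, d_p, t_c, delta_d, delta_f):
        # add a count of 1 for this (possibly real-valued) delta-f
        self.hist[(t_c, delta_d, d_p, t_p)][delta_f] += 1.0
        self.offspring[(t_p, d_p)] += 1

    def branching(self, t_p, d_p):
        # offspring per expansion for parents of this type and distance
        return self.offspring[(t_p, d_p)] / max(self.expansions[(t_p, d_p)], 1)
```

Both the off-line and on-line training procedures described below reduce to calls to `train` with features computed from a parent–child pair.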
Zahavi et al. [14] found that it is often important to take into account information about the grandparent of a node for the distributions used in CDP. We
accomplish this with the incremental model by rolling together the node types
of the parent and grandparent into a single type. For example, on the 15-puzzle,
if the parent state has the blank in the center and it was generated by a state
with the blank on the side, then the parent type would be a side–center node.
This allows us to use an array with the same dimensionality across domains that
take different amounts of ancestry into account.
Learning Off-line. We can learn an incremental ∆f model off-line using the
same method as with KRE and CDP. A large number of random states from
a domain are sampled, and the children (or grandchildren) of each sampled
state are generated. The change in distance estimate ∆d(c) = d(c) − d(p), node
type t(c) of the child node, node type t(p) of the parent node, and the distance
estimate d(p) of the parent node are computed and a count of 1 is then added to
the appropriate histogram for the (possibly real-valued) change in f , ∆f (c) =
f (c) − f (p) between parent and child.
Learning On-line. An incremental ∆f model can also be learned on-line during
search. Each time a node is generated, the ∆d(c), t(c), t(p) and d(p) values are
computed for the parent node p and child node c and a count of 1 is added to
the corresponding histogram for ∆f (c), as in the off-line case. In addition, when
learning a ∆f model on-line, the depth of the parent node in the search tree is
also known. We have found that this feature greatly improves accuracy in some
domains (such as the vacuum domain described below) and so we always add it
as a conditioning feature when learning an incremental model on-line.
Each iteration of IDA* search will expand a superset of the nodes expanded
during the previous iteration. To avoid duplicating effort, our implementation
tracks the bound used in the previous iteration and the model is only updated
when expanding a node that would have been pruned on the previous iteration.
Additionally, the search spaces for many domains form graphs instead of trees.
In these domains, our implementation of depth-first search does cycle checking
by using a hash table of all of the nodes along the current path. In order for our
model to take this extra pruning into account, we only train the model on the
successors of a node that pass the cycle detection.
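The two on-line training rules above might look roughly like this in code. Every helper name here (succ, cost, h, d, t, and the model's train method) is an assumption for illustration; RecordingModel is a stand-in for the real model.

```python
class RecordingModel:
    """Stand-in for the real delta-f model: just records training tuples."""
    def __init__(self):
        self.rows = []
    def train(self, *features):
        self.rows.append(features)

def expand_and_train(node, g, path, prev_bound, succ, cost, h, d, t, model):
    """Generate the successors of `node`, training the model only when
    this expansion is new to the current iteration (f > prev_bound) and
    only on successors that survive cycle checking against the current
    path."""
    f_n = g + h(node)
    is_new = f_n > prev_bound    # would the previous iteration have pruned it?
    kept = []
    for c in succ(node):
        if c in path:            # cycle: prune, and never train on it
            continue
        kept.append(c)
        if is_new:
            f_c = g + cost(node, c) + h(c)
            model.train(t(node), d(node), t(c), d(c) - d(node), f_c - f_n)
    return kept
```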
Learning a Backed-off Model. Due to data sparsity, and because the ∆f
model will be used to extrapolate information about the search space for which it
may not have any training data, a backed-off version of the model may be needed
that is conditioned on fewer features of each node. When querying the model, if
there is no training data for a given set of features, the more general backed-off
model is consulted instead. When learning a model on-line, because the model
is learned on instance-specific data, we found that it was only necessary to learn
a model that backs off the depth feature. When training off-line, however, we
learn a series of two back-off models, first eliminating the parent node distance
estimate and then eliminating both the parent distance and type.
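The back-off scheme can be sketched as a cascade of lookups from most to least specific; the dict-of-dicts representation is our simplification of the multi-dimensional arrays in the implementation.

```python
def lookup_with_backoff(models, keys):
    """Try the most specific model first, then progressively more
    general ones that condition on fewer features.  `models` is a list
    of dicts from feature tuples to histograms, most specific first,
    and `keys` gives the lookup key for each level."""
    for model, key in zip(models, keys):
        hist = model.get(key)
        if hist:                 # non-empty training data at this level
            return hist
    return {}                    # no training data at any level
```

For on-line use the cascade has two levels (with and without the depth feature); for off-line use it has three, dropping first the parent distance estimate and then the parent type as well.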
3.2 The f Distribution
Our incremental model predicts a bound that will result in expanding the desired
number of nodes for a given start state by estimating the distribution of f
values of the nodes in the search space. The f value distribution of one search
depth and the model of ∆f are used to generate the f value distribution for the
next depth. By beginning with the root node, which has a known f value, our
procedure simulates the expansions of each depth layer to incrementally compute
estimates of the f value distribution at the next layer. The accumulation of these
depth-based f value distributions can then be used to make our prediction.
To increase accuracy, the distribution of f values at each depth is conditioned
on node type t and distance estimate d. We begin our simulation with a model of
depth 0 which is simply a count of 1 for f = f (root ), t = t(root ) and d = d(root ).
Next, the ∆f model is used to find a distribution over ∆f , t and ∆d values for the
offspring of the nodes at each combination of t and d values at the current depth.
By storing ∆ values, we can compute d(c) = d(p)+∆d(c) and f (c) = f (p)+∆f (c)
for each parent p with a child c. This gives us the number of nodes with each f ,
t and d value at the next depth of the search.
Because the ∆f values may be real numbers, they are stored as histograms by
our ∆f model. In order to add f (p) + ∆f (c), we use a procedure called additive
convolution [9, 8]. Each node, i.e. every count in the histogram for the current
layer, will have offspring whose f values differ according the ∆f distribution.
The additive convolution procedure sums the distribution of child f values for
every count in the current layer’s f histogram, resulting in a histogram of f
values over all successors. More formally, the convolution of two histograms ω_a and ω_b, where ω_a and ω_b are functions from values to weights, is a histogram ω_c, where ω_c(k) = Σ_{i ∈ Domain(ω_a)} ω_a(i) · ω_b(k − i). By convolving the f distribution
of a set of nodes with the distribution of the change in f values between these
nodes and their offspring, we get the f distribution of the offspring.
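For histograms represented as sparse maps from value to weight, additive convolution is a direct transcription of the formula above; the dict representation is our own simplification.

```python
def convolve(wa, wb):
    """Additive convolution of two sparse histograms (value -> weight):
    wc(k) = sum over i of wa(i) * wb(k - i)."""
    wc = {}
    for i, wi in wa.items():
        for j, wj in wb.items():         # j plays the role of k - i
            wc[i + j] = wc.get(i + j, 0.0) + wi * wj
    return wc
```

For example, convolving an f histogram of two nodes at f = 3 with a ∆f histogram giving one child at ∆f = 0 and two at ∆f = 0.5 yields two offspring at f = 3 and four at f = 3.5.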
Since the maximum depth of a shortest-path search tree is typically unknown,
our simulation must use a special criterion to determine when to stop. With a
consistent heuristic the f values of nodes will be non-decreasing along a path [7]
and therefore the change in f stored in our model will always be positive. Since
the change in f is always positive, the f values encountered during the simulation
will always increase between layers. As soon as the simulation estimates that a
sufficient number of nodes will be generated to meet our desired count, the
maximum f value can be fixed as an upper bound since selecting a greater f
Simulate(bound, desired, depth, accum, nodes)
1.  nodes′ = SimExpand(depth, nodes)
2.  accum′ = add(accum, nodes − nodes′)
3.  bound′ = find-bound(accum′, bound, desired)
4.  if weight-left-of(bound′, nodes − nodes′) > ε
5.      depth′ = depth + 1
6.      Simulate(bound′, desired, depth′, accum′, nodes′)
7.  else return accum′

SimExpand(depth, nodes)
8.  nodes′ = new 2d histogram array
9.  for each t and d with weight(nodes[t, d]) > 0 do
10.     fs = nodes[t, d]
11.     SimGen(depth, t, d, fs, nodes′)
12. return nodes′

SimGen(depth, t, d, fs, nodes′)
13. for each type t′ and ∆d
14.     ∆fs = delta-f-model[t′, ∆d, d, t]
15.     if weight(∆fs) > 0 then
16.         d′ = d + ∆d
17.         fs′ = Convolve(fs, ∆fs)
18.         nodes′[t′, d′] = add(nodes′[t′, d′], fs′)
19. done
Fig. 2. Pseudo code for the simulation procedure used to estimate the f distribution.
value can only give more nodes than desired. As the simulation proceeds further, we re-evaluate the f value that gives our desired number of nodes to account for new node generations. This upper bound will continue to decrease and the simulation will estimate fewer and fewer new nodes within the bound at each depth. When the expected number of new nodes is only a fractional value smaller than some ε the simulation can stop. In our experiments we use ε = 10^-3. Additionally,
because the d value of a node can never be negative, we can prune all nodes that
would be generated with d ≤ 0.
Figure 2 shows the pseudo-code for the procedure that estimates the f distribution. The entry point is the Simulate function which has the following
parameters: the cost bound, desired number of nodes, the current depth, a histogram that contains the accumulated distribution of f values so far and a
2-dimensional array of histograms which stores the conditional distribution of f
values among the nodes at the current depth. Simulate begins by simulating
the expansion of the nodes at the current depth (line 1). The result of this is
the conditional distribution of f values for the nodes generated as offspring at
the next depth. These f values are accumulated into a histogram of all f values
seen by the simulation thus far (line 2). An upper bound is determined (line 3)
and if greater than ε new nodes are estimated to be in the next depth then the
simulation continues recursively (lines 4–6), otherwise the accumulation of all f
values is returned as the final result.
The Sim-Expand function is used to build the conditional distribution of the
f values for the offspring of the nodes at the current simulation-depth. For each
node type t and distance estimate d for which there exist nodes at the current
depth, the Sim-Gen function is called to estimate the conditional f distribution
of their offspring (lines 9–11). Sim-Gen uses the ∆f distribution (line 14) to
compute the frequency of f values for nodes generated from parents with the
specified combination of type and distance-estimate. Because this distribution
is over ∆f , t and ∆d, we have all of the information that is needed to construct
the conditional f distribution for the offspring (lines 16–18).
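A runnable, heavily simplified version of this simulation is sketched below, assuming dict-based histograms and a pre-trained ∆f model keyed only by parent (t, d). The per-depth bound re-evaluation of lines 3–4 is omitted; only the final bound lookup is shown, and all names are our own.

```python
def convolve(wa, wb):
    # additive convolution of sparse histograms (value -> weight)
    wc = {}
    for i, wi in wa.items():
        for j, wj in wb.items():
            wc[i + j] = wc.get(i + j, 0.0) + wi * wj
    return wc

def simulate(root_f, root_t, root_d, delta_model, eps=1e-3, max_depth=1000):
    """Estimate the f distribution layer by layer.  `delta_model` maps a
    parent (t, d) to a list of (t', delta-d, delta-f histogram) entries."""
    layer = {(root_t, root_d): {root_f: 1.0}}
    accum = {root_f: 1.0}
    for _ in range(max_depth):
        nxt = {}
        for (t, d), fs in layer.items():
            for t2, dd, dfs in delta_model.get((t, d), []):
                if d + dd <= 0:          # distance estimates cannot go negative
                    continue
                fs2 = convolve(fs, dfs)  # shift parent f values by delta-f
                tgt = nxt.setdefault((t2, d + dd), {})
                for v, w in fs2.items():
                    tgt[v] = tgt.get(v, 0.0) + w
        new_weight = sum(w for fs in nxt.values() for w in fs.values())
        if new_weight < eps:             # too little new weight: stop
            break
        for fs in nxt.values():
            for v, w in fs.items():
                accum[v] = accum.get(v, 0.0) + w
        layer = nxt
    return accum

def find_bound(accum, desired):
    """Smallest f value whose cumulative weight reaches the desired count."""
    total = 0.0
    for v in sorted(accum):
        total += accum[v]
        if total >= desired:
            return v
    return max(accum)
```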
Warm Starting. As an iterative deepening search progresses, some of the
shallower depths become completely expanded: no nodes are pruned at that depth
or any shallower depth. All of the children of nodes in a completely expanded
depth are completely generated. When learning the ∆f distribution on-line, our
incremental model has the exact depth, d and f values for all of the layers that
have been completely generated. We “warm start” the simulation by seeding it
with the perfect information for completed layers and beginning at the first depth
that has not been completely generated. This can speed up the computation of
the f distribution and can increase accuracy.
4 Empirical Evaluation
In the following sections we show an empirical study of our new model and
some of the related previous approaches. We begin by evaluating the accuracy
of the incremental model when trained off-line. We then show the accuracy of
the incremental model when used on-line to control an IDA* search.
4.1 Off-line Learning
We evaluate the quality of the predictions given by the incremental model when
using off-line training by comparing the predictions of the model with the true
node expansion counts. For each problem instance the optimal solution cost is
used as the cost bound. Because both CDP and the incremental model estimate
all of the nodes within a cost bound, the truth values are computed by running
a full depth-first search of the tree bounded by the optimal solution cost. This
search is equivalent to the final iteration of IDA* assuming that the algorithm
finds the goal node after having expanded all other nodes that fall within the
cost bound.
Estimation Accuracy. We trained both CDP [14] and an incremental model
off-line on ten billion random 15-puzzle states using the Manhattan distance
heuristic. We then compared the predictions given by each model to the true
number of nodes within the optimal-solution-cost bound for each of the standard
100 15-puzzle instances due to Korf [3]. The leftmost plot of Figure 3 shows the
[Figure 3, four panels for IM and CDP: log10(estimate/actual) vs. log10 nodes (two panels); fraction correct vs. log10 sample size; log10(estimate/actual) vs. log10 sample size.]
Fig. 3. Accuracy when trained off-line.
results of this experiment. The x axis is on a log scale; it shows the actual
number of nodes within the cost bound. The y axis is also on a log scale; it
shows the ratio of the estimated number of nodes to the actual number of nodes; we call this metric the estimation factor. The closer the estimation factor is to one (recall that log10 1 = 0), the more accurate the estimation. The
median estimation factor for the incremental model was 1.435 and the median
estimation factor for CDP was 1.465 on this set of instances. From the plot we
can see that, on each instance, the incremental model gave estimations that were
nearly equivalent to those given by CDP, the current state-of-the-art predictor
for this domain.
To demonstrate our incremental model’s ability to make predictions in domains with real-valued edge costs and with real-valued heuristic estimates, we
created a modified version of the 15-puzzle where each move costs the square
root of the tile number that is being moved. We call this problem the square root
tiles puzzle, and for the heuristic we use a modified version of the Manhattan
distance heuristic that takes into account the cost of each individual tile.
As presented by Zahavi et al. [14], CDP is not able to make predictions on
this domain because of the real-valued heuristic estimates. The second panel in
Figure 3 shows the estimation factor for the predictions given by the incremental
model trained off-line on fifty billion random square root tiles states. The same
100 puzzle states were used. Again, both axes are on a log scale. The median
estimation factor on this set of puzzles was 2.807.
Small Sample Sizes. Haslum et al. [2] use a technique loosely based on the
KRE formula to select between different heuristics for domain-independent planning. When given a choice between two heuristic lower-bound functions, we would
like to select the heuristic that will expand fewer nodes. Using KRE (or CDP)
to estimate node expansions requires a very large off-line sample of the heuristic
distribution to achieve accurate predictions, which is not achievable in applications such as Haslum et al.’s. Since the incremental model uses ∆ values and a
backed-off model, however, it is able to make useful predictions with very little
training data. To demonstrate this, we created 100 random pairs of instances
from Korf’s set of 15-puzzles. We used both CDP and the incremental model
to estimate the number of expansions required by each instance when given its
optimal solution cost. We rated the performance of each model based on the fraction of pairs for which it was able to correctly determine the more difficult of the two instances.

[Figure 4, top row: median growth factor vs. iteration number for sliding tiles, square root tiles, vacuum maze and uniform tree, with a y=2 reference line; bottom row: % solved vs. CPU time, for IDA*, IDA*CR and IDA*IM.]
Fig. 4. IDA*, IDA*CR and IDA*IM growth rates and number of instances solved.
The third plot in Figure 3 shows the fraction of pairs that were ordered
correctly by each model for various sample sizes. Error bars represent 95% confidence intervals on the mean. We can see from this plot that the incremental
model was able to achieve much higher accuracy when ordering the instances
with as few as ten training samples. CDP required 10,000 training samples or
more to achieve comparable accuracy. The rightmost plot in this figure shows
the log10 estimation factor of the estimates made by each model. While CDP
achieved higher quality estimates when given 10,000 or more training instances,
the incremental model was able to make much more accurate predictions when
trained on only 10, 100 and 1,000 samples.
4.2 On-line Learning
In this section, we evaluate the incremental model when trained and used on-line during an IDA* search. When it comes time to set the bound for the next
iteration, the incremental model is consulted to find a bound that is predicted
to double the number of node expansions from that of the previous iteration. We
call this algorithm IDA*IM . As we will see, because the model is trained on the
exact instance for which it will be predicting, the estimations tend to be more
accurate than the off-line estimations, even with a much smaller training set.
In the following subsections, we evaluate the incremental model by comparing
IDA*IM to the original IDA* [3] and IDA*CR [10].
Sliding Tiles. The unit-cost sliding tiles puzzle is a domain where standard
IDA* search works very well. The minimum cost increase between iterations
is two and this leads to a geometric increase in the number of nodes between
subsequent iterations.
The top left panel of Figure 4 shows the median growth factor, the relative
size of one iteration compared to the next, on the y axis, for IDA*, IDA*CR and
IDA*IM . Ideally, all algorithms would have a median growth factor of two. All
three of the lines for the algorithms are drawn directly on top of one another in
this plot. While both IDA*CR and IDA*IM attempted to double the work done
by subsequent iterations, all algorithms still achieved no less than 5x growth.
This is because, due to the coarse granularity of f values in this domain, no
threshold can actually achieve the target growth factor. However, the median
estimation factor of the incremental model over all iterations in all instances
was 1.029. This is very close to the optimal estimation factor of one. So, while
the granularity of f values made doubling impossible, the incremental model still
predicted the amount of work with great accuracy. The bottom panel shows
the percentage of instances solved within the time given on the x axis. Because
IDA*IM and IDA*CR must use branch-and-bound on the final iteration of the search, they are unable to outperform IDA* in this domain.
Square Root Tiles. While IDA* works well on the classic sliding tile puzzle, a
trivial modification exposes its fragility: changing the edge costs. In this section,
we look at the square root cost variant of the sliding tiles. This domain has many
distinct f values, so when IDA* increases the bound to the smallest out-of-bound
f value, it will visit a very small number of new nodes with the same f in the
next iteration. We do not show the results for IDA* on this domain because it
gave extremely poor performance. IDA* was unable to solve any instances within a one-hour timeout, and at least one instance required more than a week to solve.
The second column of Figure 4 presents the results for IDA*IM and IDA*CR .
Even with the branch-and-bound requirement, IDA*IM and IDA*CR easily outperform IDA* by increasing the bound more liberally between iterations. While
IDA*CR gave slightly better performance with respect to CPU time, its model
was not able to provide very accurate predictions. Each IDA*CR iteration was at least eight times the size of the previous one, when the goal was merely to double. The incremental model, however, was
able to keep the growth factor very close to doubling. The median estimation
factor was 0.871 for the incremental model, which is much closer to the optimal
estimation factor of one than when the model was trained off-line. We conjecture
that the model was able to learn features that were specific to the instance for
which it was predicting.
One reason why IDA*CR was able to achieve competitive performance in
this domain is because, by increasing the bound very quickly, it was able to skip
many iterations of search that IDA*IM performed. IDA*CR performed no more
than 10 iterations on any instance in this set whereas IDA*IM performed up
to 33 iterations on a single instance. Although the rapid bound increase was
beneficial in the square root tiles domain, in a subsequent section we will see
that increasing the bound too quickly can severely hinder performance.
Vacuum Maze. The objective of the vacuum maze domain is to navigate a
robot through a maze in order for it to vacuum up spots of dirt. In our experiments, we used 20 instances of 500x500 mazes that were built with a depth-first
search. Long hallways with no branching were then collapsed into single edges
with a cost equivalent to the hallway length. Each maze contained 10 pieces of
dirt and any state in which all dirt had been vacuumed was a goal. The median
number of states per instance was 56 million and the median optimal solution
cost was 28,927. The heuristic was the size of the minimum spanning tree of the
locations of the dirt and vacuum. The pathmax procedure [6] was used to make
the f values non-decreasing along a path.
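A minimal sketch of this heuristic, under stated assumptions: the paper does not give code or specify the edge weights used between dirt locations, so Manhattan distance serves as a stand-in here; the pathmax rule follows the standard formulation [6].

```python
def mst_cost(points):
    """Total weight of a minimum spanning tree over the given grid
    locations (Prim's algorithm, Manhattan-distance edge weights)."""
    if len(points) < 2:
        return 0
    dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
    in_tree = {points[0]}
    total = 0
    while len(in_tree) < len(points):
        # Cheapest edge from the partial tree to an outside point.
        d, nearest = min((dist(a, b), b)
                         for a in in_tree
                         for b in points if b not in in_tree)
        total += d
        in_tree.add(nearest)
    return total

def pathmax(h_child, h_parent, edge_cost):
    """Pathmax: raise a child's heuristic so that f = g + h never
    decreases along a path from parent to child."""
    return max(h_child, h_parent - edge_cost)
```

For the vacuum maze, the points would be the dirt locations plus the vacuum's position, and a child's heuristic would be replaced by `pathmax(h(child), h(parent), cost)` at generation time.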
The third column of Figure 4 shows the median growth factor and number
of instances solved by each algorithm for a given amount of time. Again, IDA*
is not shown due to its very poor performance in this domain. Because there
are many dead ends in each maze, the branching factor in this domain is very
close to one. The model used by IDA*CR gave very inaccurate predictions and
the algorithm often increased the bound by too small an increment between
iterations. IDA*CR performed up to 386 iterations on a single instance. With
the exception of a dip near iterations 28–38, the incremental model was able to
accurately find a bound that doubled the amount of work between iterations.
The dip in the growth factors may be attributed to histogram inaccuracy on the
later iterations of the search. The median estimation factor of the incremental
model was 0.968, which is very close to the perfect factor of one. Because of the
poor predictions given by the IDA*CR model, it was not able to solve instances
as quickly as IDA*IM on this domain.
While our results demonstrate that the incremental model gave very accurate
predictions in the vacuum maze domain, it should be noted that, due to the
small branching factor, iterative searches are not ideal for this domain. A simple
implementation of frontier A* [5] was able to solve each instance in this set in
no more than 1,887 CPU seconds.
Uniform Trees. We also designed a simple synthetic domain that illustrates
the brittleness of IDA*CR . We created a set of trees with 3-way branching where
each node has outgoing edges of cost 1, 20 and 100. The goal node lies at a depth of 19 along a random path composed of 1- and 20-cost edges, and the heuristic is h = 0 for all nodes. We have found that the model used by IDA*CR will
often increase the bound extremely quickly due to the large 100-cost branches.
Because of the extremely large searches created by IDA*CR, we use a five-hour time limit in this domain.
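The explosion a too-liberal bound causes here is easy to reproduce with a direct count of the tree. The sketch below is our own illustration: with h = 0 as in the domain, it counts the nodes whose path cost falls within a bound, i.e. the size of one IDA* iteration, showing how sharply the count grows once the bound passes the larger edge costs.

```python
def count_within(bound, depth, g=0.0, costs=(1, 20, 100)):
    """Number of nodes in the uniform 3-ary tree (edge costs 1, 20,
    100) whose path cost stays within the given bound, down to the
    given depth. With h = 0, this is the size of an IDA* iteration."""
    if g > bound or depth < 0:
        return 0
    return 1 + sum(count_within(bound, depth - 1, g + c, costs)
                   for c in costs)
```

For example, a bound of 2 admits only a 3-node chain of 1-cost edges, a bound of 21 admits 23 nodes, and a bound past 100 admits over twenty thousand, so a model that overshoots the bound pays an enormous price.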
The top right plot in Figure 4 shows the growth factors and number of instances solved in a given amount of time for IDA*IM , IDA*CR and IDA*. Again,
the incremental model was able to achieve very accurate predictions with a median estimation factor of 0.978. IDA*IM was able to solve ten of twenty instances
and IDA* solved eight within the time limit. IDA*IM solved every instance in
less time than IDA*. IDA*CR was unable to solve more than two instances within
the time limit. It grew the bounds in between iterations extremely quickly, as
can be seen in the growth factor plot on the bottom right in Figure 4.
Although IDA* tended to have reasonable CPU time performance in this
domain, its growth factors were very close to one. The only reason that IDA*
achieved reasonable performance is because expansions in this synthetic tree
domain required virtually no computation. This advantage would not hold in a more realistic domain where each expansion requires non-trivial computation.
4.3 Summary
When trained off-line, the incremental model was able to make predictions on
the 15-puzzle domain that were nearly indistinguishable from those of CDP, the current state of the art. In addition, the incremental model was able to estimate the
number of node expansions on a real-valued variant of the sliding tiles puzzle
where each move costs the square root of the tile number being moved. When
presented with pairs of 15-puzzle instances, the incremental model trained with only 10 samples predicted which instance would require fewer expansions more accurately than CDP trained with 10,000 samples.
The incremental model made very accurate predictions across all domains when trained on-line, and when used to control the bounds for IDA* it yielded a robust search. While the alternative approaches occasionally gave
extremely poor performance, IDA* controlled by the incremental model achieved
the best performance of the IDA* searches in the vacuum maze and uniform tree
domains and was competitive with the best search algorithms for both of the
sliding tiles domains.
5 Discussion
In search spaces with small branching factors, such as the vacuum maze domain, the backed-off model seems to have a greater impact on prediction accuracy than in search spaces with larger branching factors, such as the sliding tiles domains. Because the branching factor in the vacuum maze domain is small,
however, the simulation must extrapolate out to great depths (many of which
the model has not been trained on) to accumulate the desired number of expansions. The simple backed-off model used here merely ignored depth. While
this tended to give accurate predictions for the vacuum maze domain, a different
model may be required for other domains.
6 Conclusion
In this paper, we presented a new incremental model for predicting the distribution of solution cost estimates in a search tree and hence the number of
nodes that bounded depth-first search will visit. Our new model is comparable
to state-of-the-art methods in domains where those methods apply. The three
main advantages of our new model are that it works naturally in domains with
real-valued heuristic estimates, it is accurate with few training samples, and it
can be trained on-line. We demonstrated that training the model on-line can
lead to more accurate predictions. Additionally, we have shown that the incremental model can be used to control an IDA* search, giving a robust algorithm,
IDA*IM. Given the prevalence of real-valued costs in real-world problems, on-line incremental models are an important step in broadening the applicability of
iterative deepening search.
7 Acknowledgements
We gratefully acknowledge support from NSF (grant IIS-0812141) and the DARPA
CSSG program (grant HR0011-09-1-0021).
References
1. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics SSC-4(2), 100–107 (July 1968)
2. Haslum, P., Botea, A., Helmert, M., Bonet, B., Koenig, S.: Domain-independent construction of pattern database heuristics for cost-optimal planning. In: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07) (July 2007)
3. Korf, R.E.: Iterative-deepening-A*: An optimal admissible tree search. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-85), pp. 1034–1036 (1985)
4. Korf, R.E., Reid, M., Edelkamp, S.: Time complexity of iterative-deepening-A*. Artificial Intelligence 129, 199–218 (2001)
5. Korf, R.E., Zhang, W., Thayer, I., Hohwald, H.: Frontier search. Journal of the ACM 52(5), 715–748 (2005)
6. Mérő, L.: A heuristic search algorithm with modifiable estimate. Artificial Intelligence, pp. 13–27 (1984)
7. Pearl, J.: Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley (1984)
8. Rose, K., Burns, E., Ruml, W.: Best-first search for bounded-depth trees. In: The 2011 International Symposium on Combinatorial Search (SOCS-11) (2011)
9. Ruml, W.: Adaptive Tree Search. Ph.D. thesis, Harvard University (May 2002)
10. Sarkar, U., Chakrabarti, P., Ghose, S., Sarkar, S.D.: Reducing reexpansions in iterative-deepening search by controlling cutoff bounds. Artificial Intelligence 50, 207–221 (1991)
11. Thayer, J., Ruml, W.: Using distance estimates in heuristic search. In: Proceedings of ICAPS-2009 (2009)
12. Vempaty, N.R., Kumar, V., Korf, R.E.: Depth-first vs best-first search. In: Proceedings of AAAI-91, pp. 434–440 (1991)
13. Wah, B.W., Shang, Y.: Comparison and evaluation of a class of IDA* algorithms. International Journal on Artificial Intelligence Tools 3(4), 493–523 (October 1995)
14. Zahavi, U., Felner, A., Burch, N., Holte, R.C.: Predicting the performance of IDA* using conditional distributions. Journal of Artificial Intelligence Research 37, 41–83 (2010)