
Planning as Heuristic Search: New Results

Blai Bonet and Hector Geffner


Depto. de Computación
Universidad Simón Bolívar
Aptdo. 89000, Caracas 1080-A, Venezuela
{bonet,hector}@usb.ve

Abstract. In the recent AIPS98 Planning Competition, the hsp planner, based on a forward state search and a domain-independent heuristic, showed that heuristic search planners can be competitive with state-of-the-art Graphplan and Satisfiability planners. hsp solved more problems than the other planners, but it often took more time or produced longer plans. The main bottleneck in hsp is the computation of the heuristic for every new state. This computation may take up to 85% of the processing time. In this paper, we present a solution to this problem that uses a simple change in the direction of the search. The new planner, which we call hspr, is based on the same ideas and heuristic as hsp, but searches backward from the goal rather than forward from the initial state. This allows hspr to compute the heuristic estimates only once. As a result, hspr can produce better plans, often in less time. For example, hspr solves each of the 30 logistics problems from Kautz and Selman in less than 3 seconds. This is two orders of magnitude faster than blackbox. At the same time, in almost all cases, the plans are substantially smaller. hspr is also more robust than hsp as it visits a larger number of states, makes deterministic decisions, and relies on a single adjustable parameter that can be fixed for most domains. hspr, however, is not better than hsp across all domains; in particular, in the blocks world, hspr fails on some large instances that hsp can solve. We also discuss the relation between hspr and Graphplan, and argue that Graphplan can also be understood as a heuristic search planner with a precise heuristic function and search algorithm.

1 Introduction
The last few years have seen a number of promising new approaches in Planning. Most prominent among these are Graphplan [BF95] and Satplan [KS96]. Both work in stages by building suitable structures and then searching them. In Graphplan, the structure is a graph, while in Satplan, it is a set of clauses. Both planners have shown impressive performance and have attracted a good deal of attention. Recent implementations and significant extensions have been reported in [KNHD97,LF99,KS99,ASW98].
In the recent AIPS98 Planning Competition [McD98], three out of the four planners in the Strips track, ipp [KNHD97], stan [LF99], and blackbox [KS99], were based on these ideas. The fourth planner, hsp [BG98], was based on heuristic search [Nil80,Pea83]. In hsp, the search is assumed to be similar to the search in problems like the 8-puzzle, the sole difference being in the heuristic: while in problems like the 8-puzzle the heuristic is normally given (e.g., as the sum of Manhattan distances), in planning it has to be extracted automatically from the representation of the problem. hsp thus appeals to a simple scheme for computing the heuristic from Strips encodings and uses the heuristic to guide the search. Similar ideas have been used recently in [McD96] and [BLG97].
In this paper, we review the heuristic and search algorithm used in hsp, discuss the problem of having to compute the heuristic from scratch in every new state, and introduce a reformulation of hsp that avoids this problem. We also analyze the relation between the resulting planner and Graphplan, and argue that Graphplan can also be understood as a heuristic search planner with a precise heuristic and search algorithm.

2 HSP: Heuristic Search Planner


hsp casts planning problems as problems of heuristic search. A Strips problem $P = \langle A, O, I, G \rangle$ is a tuple where $A$ is a set of atoms, $O$ is a set of operators, and $I \subseteq A$ and $G \subseteq A$ encode the initial and goal situations. The operators $op \in O$ are all assumed grounded, and each has precondition, add, and delete lists denoted as $Prec(op)$, $Add(op)$, and $Del(op)$, given by sets of atoms from $A$. A Strips problem $P = \langle A, O, I, G \rangle$ defines a state-space $S = \langle S, s_0, S_G, A(\cdot), f, c \rangle$ where

S1. the states $s \in S$ are collections of atoms from $A$
S2. the initial state $s_0$ is $I$
S3. the goal states $s \in S_G$ are such that $G \subseteq s$
S4. the actions $a \in A(s)$ are the operators $op \in O$ such that $Prec(op) \subseteq s$
S5. the transition function $f$ maps states $s$ into states $s' = f(s, a)$ such that $s' = s - Del(a) + Add(a)$ for $a \in A(s)$

hsp searches this state-space, starting from $s_0$, with the aid of a heuristic extracted from the Strips representation of the problem (see also [McD96,BLG97]).
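To fix ideas, the following Python sketch shows one way to represent such a problem and the progression space S1-S5. The names (Operator, applicable, progress, is_goal) are ours, introduced for illustration; they are not part of the hsp implementation.

```python
from typing import FrozenSet, List, NamedTuple

Atom = str

class Operator(NamedTuple):
    name: str
    prec: FrozenSet[Atom]    # Prec(op)
    add: FrozenSet[Atom]     # Add(op)
    delete: FrozenSet[Atom]  # Del(op)

def applicable(s: FrozenSet[Atom], operators: List[Operator]) -> List[Operator]:
    """A(s): the operators whose preconditions hold in s (S4)."""
    return [op for op in operators if op.prec <= s]

def progress(s: FrozenSet[Atom], op: Operator) -> FrozenSet[Atom]:
    """The transition function f: s' = s - Del(a) + Add(a) (S5)."""
    return (s - op.delete) | op.add

def is_goal(s: FrozenSet[Atom], goal: FrozenSet[Atom]) -> bool:
    """Goal states are those containing all goal atoms (S3)."""
    return goal <= s
```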

2.1 Heuristic
The heuristic function $h$ for a problem $P$ is obtained by considering a `relaxed' problem $P'$ in which all delete lists are ignored. From any state $s$, the optimal cost $h'(s)$ for solving $P'$ is a lower bound on the optimal cost of solving $P$ from $s$. The heuristic $h(s)$ could thus be set to $h'(s)$. Solving the relaxed problem $P'$ optimally, and hence obtaining $h'(s)$, is still exponentially hard, however. We thus settle for an approximation: we set $h(s)$ to an estimate of the optimal value function $h'(s)$ of the relaxed problem. This estimate is computed as follows.

For a given state $s$ and every atom $p$ we compute a measure $g(p)$ that provides an estimate of the cost (number of steps) of achieving $p$ from $s$. These measures are initialized to $0$ if $p \in s$ and to $\infty$ otherwise. Then, every time an operator $op$ with preconditions $C = Prec(op)$ is applicable in $s$, each atom $p \in Add(op)$ is added to $s$ and $g(p)$ is updated to

  $g(p) := \min [\, g(p),\ 1 + g(C) \,]$   (1)

where $g(C)$ stands for the cost of achieving the conjunction of atoms $C$ (see below). The expansions and updates continue until the measures $g(p)$ do not change. This is basically a shortest-path algorithm over the graph that connects action preconditions with their effects.

In hsp, the cost $g(C)$ for a conjunction of atoms $C$ is defined as the sum of the costs $g(r)$ of the individual atoms $r \in C$:

  $g(C) \stackrel{def}{=} \sum_{r \in C} g(r)$   (additive costs)   (2)

The heuristic function $h(s)$, which provides an estimate of the number of steps required to make the goal $G$ true from $s$, is then defined as:

  $h(s) \stackrel{def}{=} g(G)$   (3)

Note that the $g$ measures depend on $s$, which is the state from which they are computed. In hsp, these measures are recomputed from scratch in every new state. This is expensive and may take up to 85% of the processing time.
The definition of the cost of conjunctions in (2) assumes that `subgoals' are independent; namely, that the plans that achieve one subgoal have no effect on the plans that achieve the others. In general, this is not true, and as a result the heuristic function defined by (1)-(3) is not admissible (it may overestimate the true optimal costs).

An admissible heuristic function could be defined by simply changing formula (2) to

  $g(C) = \max_{r \in C} g(r)$   (max costs)   (4)

The heuristic function that would result from this change, which we call the `max heuristic', is admissible but less informative than the additive function defined above. This is the reason it is not used in hsp. In fact, while the `additive' heuristic combines the costs of all subgoals, the `max' heuristic focuses only on the most difficult subgoals, ignoring all others. In Sect. 5 we will see that a refined version of the `max heuristic' is used in Graphplan.
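A direct, unoptimized Python reading of Eqs. 1-4 follows. This is our sketch of the computation, not the actual hsp code; note that each call recomputes all measures from scratch, which is precisely the overhead discussed above. The combine argument selects between the additive costs of Eq. 2 and the max costs of Eq. 4.

```python
import math

def cost_measures(s, operators, combine=sum):
    """Fixpoint computation of the measures g(p) from state s (Eq. 1).
    combine=sum yields the additive costs of Eq. 2; combine=max yields
    the admissible 'max' costs of Eq. 4."""
    g = {p: 0 for p in s}  # atoms missing from g have cost infinity

    def g_conj(C):
        # g(C): cost of a conjunction of atoms (Eqs. 2 and 4)
        vals = [g.get(r, math.inf) for r in C]
        return combine(vals) if vals else 0

    changed = True
    while changed:
        changed = False
        for op in operators:
            c = g_conj(op.prec)
            if c == math.inf:
                continue  # op is not yet 'reachable' from s
            for p in op.add:
                if 1 + c < g.get(p, math.inf):
                    g[p] = 1 + c  # the update of Eq. 1
                    changed = True
    return g

def h(s, goal, operators, combine=sum):
    """h(s) = g(G) (Eq. 3), with the g measures recomputed for s."""
    g = cost_measures(s, operators, combine)
    vals = [g.get(p, math.inf) for p in goal]
    return combine(vals) if vals else 0
```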

2.2 Algorithm
Provided with the heuristic h, hsp approaches planning as a problem of heuristic search. While standard search algorithms like a* or ida* [Kor93] could be used, hsp uses a variation of hill-climbing. This is due to several reasons, the most important being that the computation of h(s) for every new state is expensive. hsp thus tries to get to the goal with as few node evaluations as possible. Surprisingly, the hill-climbing search plus a few simple enhancements, like a limited number of `plateau' moves, restarts, and a memory of past states, performs relatively well. We have obtained similar results before using Korf's lrta* [Kor90] in place of hill-climbing [BLG97]. This is probably due to the fact that the heuristic h, while not admissible, is informative and tends to drive the search in a good direction.
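A minimal sketch of such a hill-climbing loop, reusing applicable, progress, is_goal, and h from the sketches above. The plateau bound, restart count, and random tie-breaking are our assumptions; the exact enhancements in hsp are not spelled out here.

```python
import random

def hill_climb(s0, goal, operators, max_plateau=50, max_restarts=20):
    """Hill-climbing on h with a memory of past states, a limited
    number of 'plateau' moves, and random restarts."""
    for _ in range(max_restarts):
        s, plan, visited, plateau = s0, [], {s0}, 0
        while not is_goal(s, goal):
            children = [(h(s2, goal, operators), op.name, s2)
                        for op in applicable(s, operators)
                        for s2 in [progress(s, op)]
                        if s2 not in visited]
            if not children:
                break  # dead end: restart
            best_h = min(c[0] for c in children)
            if best_h >= h(s, goal, operators):
                plateau += 1
                if plateau > max_plateau:
                    break  # too long on a plateau: restart
            else:
                plateau = 0
            _, name, s2 = random.choice([c for c in children if c[0] == best_h])
            visited.add(s2)
            plan.append(name)
            s = s2
        else:
            return plan  # while-loop ended with the goal achieved
    return None
```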
Table 1, from [McD98], shows the results for the Strips track of the recent AIPS Competition. As can be seen, hsp solved more problems than the other planners but often took more time or produced longer plans (see [McD98] for details).

Round     Planner    Avg. Time   Solved   Fastest   Shortest
Round 1   blackbox      1.49       63       16        55
          hsp          35.48       82       19        61
          ipp           7.40       63       29        49
          stan         55.41       64       24        47
Round 2   blackbox      2.46        8        3         6
          hsp          25.87        9        1         5
          ipp          17.37       11        3         8
          stan          1.33        7        5         4

Table 1. Results of the AIPS98 Competition (Strips track). Columns show the average time, the number of problems solved by each planner, and the number of problems in which each planner was fastest or produced the shortest plans (from McDermott)

3 HSPr: Heuristic Regression Planning


In hsp, the main bottleneck is the computation of the heuristic in every new state. The main innovation in hspr is a simple change in the direction of the search that avoids this problem. As a result, the search proceeds faster, more states can be visited, and often better plans can be found in less time.

hspr searches backward from the goal rather than forward from the initial state. This is an old idea in planning known as regression search [Nil80,Wel94]. In regression search, the states can be thought of as subgoals; i.e., the `application' of an action in a goal yields a situation in which the execution of the action achieves the goal. The crucial point in hspr is that the measures $g(p)$ computed from $s_0$, which estimate the cost of achieving each atom $p$ from $s_0$ (Sect. 2.1), can be used with no recomputation to estimate the heuristic of any state arising during the regression. For example, if $s = \{p, q, r\}$ is one such state, the cost of reaching $s_0$ from $s$ by regression can be estimated in terms of the costs $g(p)$, $g(q)$, and $g(r)$ of achieving $p$, $q$, and $r$ from $s_0$. The same applies to any other state. In other words, in hspr, the cost estimates $g(p)$ are computed once from $s_0$ and are used to define the heuristic of any state. This is the key change from hsp to hspr, which is formalized below.

3.1 State space


hspr and hsp search two different state spaces. hsp searches the progression space $S$ defined by [S1]-[S5] above, where the actions are the available Strips operators. hspr, on the other hand, searches a regression space $R$ where the actions correspond to the `reverse' application of the Strips operators. The regression space $R$ associated with a Strips problem $P = \langle A, O, I, G \rangle$ can be defined in analogy to the progression space $S$ defined by [S1]-[S5] above, as the tuple $\langle S, s_0, S_G, A(\cdot), f, c \rangle$ where

R1. the states $s$ are sets of atoms from $A$
R2. the initial state $s_0$ is the goal $G$
R3. the goal states $s \in S_G$ are such that $s \subseteq I$
R4. the set of actions $A(s)$ applicable in $s$ are the operators $op \in O$ that are relevant and consistent; namely, $Add(op) \cap s \neq \emptyset$ and $Del(op) \cap s = \emptyset$
R5. the state $s' = f(a, s)$ that follows the application of $a \in A(s)$ is such that $s' = s - Add(a) + Prec(a)$

The solution of this state space is, like the solution of any space $\langle S, s_0, S_G, A(\cdot), f, c \rangle$, a finite sequence of actions $a_0, a_1, \ldots, a_n$ such that for a sequence of states $s_0, s_1, \ldots, s_{n+1}$, $s_{i+1} = f(a_i, s_i)$ and $a_i \in A(s_i)$ for $i = 0, \ldots, n$, and $s_{n+1} \in S_G$. The solutions of the progression and regression spaces are related in the obvious way; one is the inverse of the other.
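In the same style as the progression sketch of Sect. 2, the regression space amounts to the following (again, the function names are ours):

```python
def regression_applicable(s, operators):
    """A(s) in R4: operators that are relevant (add some atom of s)
    and consistent (delete no atom of s)."""
    return [op for op in operators
            if (op.add & s) and not (op.delete & s)]

def regress(s, op):
    """R5: s' = s - Add(a) + Prec(a)."""
    return (s - op.add) | op.prec

def regression_goal(s, init):
    """R3: goal states of R are contained in the initial situation I."""
    return s <= init
```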

3.2 Heuristic
hspr searches the regression space [R1]-[R5] using a heuristic based on the additive cost estimates $g(p)$ described in Sect. 2.1. These estimates are computed only once, from the initial state $s_0 \in S$. The heuristic $h(s)$ associated with any state $s$ is then defined as

  $h(s) = \sum_{p \in s} g(p)$

While in hsp the heuristic $h(s)$ combines the cost estimates $g(p)$ of a fixed set of goal atoms computed from each state $s$, in hspr, $h(s)$ combines the estimates $g(p)$ computed from a fixed state $s_0$ over the set of subgoals $p \in s$.
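Schematically, and assuming the cost_measures sketch of Sect. 2.1 with the problem data s0 and operators in scope, the measures are now computed once per problem, and each heuristic evaluation reduces to a sum:

```python
import math

# computed once, from the initial state s0 of the problem (not per node)
g0 = cost_measures(s0, operators, combine=sum)

def h_regression(s):
    """h(s) = sum of g(p) over the atoms p of the regression state s."""
    return sum(g0.get(p, math.inf) for p in s)
```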

3.3 Mutexes
The regression search often leads to states $s$ that are not reachable from the initial state $s_0$ in $S$ and cannot lead to a solution. Graphplan [BF95] recognized this problem and showed a way to deal with it based on the notion of mutual exclusivity relations or mutexes. In hspr, we use the same idea with a slightly different formulation.

Roughly, two atoms $p$ and $q$ form a mutex $\langle p, q \rangle$ when they cannot both be true in a state reachable from $s_0$. For example, in blocks problems, atoms like $on(a, b)$ and $on(a, c)$ will normally form a mutex. States containing mutex pairs can be safely pruned as they cannot be part of a solution. We are thus interested in recognizing as many mutexes as possible.

A tentative definition is to take a pair of atoms $\langle p, q \rangle$ as a mutex when $p$ and $q$ are not both true in $s_0$, every action that asserts $p$ deletes $q$, and every action that asserts $q$ deletes $p$. This definition, however, while sound, is too weak. In particular, it does not recognize a pair like $\langle on(a, b), on(a, c) \rangle$ as a mutex, since actions like $move(a, d, b)$ add the first atom but do not delete the second.

We thus use a different definition in which a pair $\langle p, q \rangle$ is recognized as a mutex when the actions that add $p$ and do not delete $q$ can guarantee through their preconditions that $q$ will not be true after the action (and symmetrically for the actions that add $q$). For that we consider sets of mutexes.
Definition 1. A set $M$ of atom pairs is a mutex set, given a set of operators $O$ and an initial state $s_0$, iff for all $\langle p, q \rangle$ in $M$:

1. $p$ and $q$ are not both true in $s_0$,
2. for every $op \in O$ that adds $p$: either $q \in Del(op)$, or else $q \notin Add(op)$ and $\langle r, q \rangle \in M$ for some $r \in Prec(op)$, and
3. for every $op \in O$ that adds $q$: either $p \in Del(op)$, or else $p \notin Add(op)$ and $\langle r, p \rangle \in M$ for some $r \in Prec(op)$.
It is easy to verify that if $\langle p, q \rangle$ belongs to a mutex set, then $p$ and $q$ are really mutually exclusive; i.e., no reachable state will contain them both. Also, if $M_1$ and $M_2$ are two mutex sets, $M_1 \cup M_2$ is a mutex set as well, and hence, according to this definition, there is a single largest mutex set. Rather than computing this set, however, which is difficult, we compute an approximation as follows.

We say that $\langle p, q \rangle$ is a `bad pair' in $M$ when it does not comply with one of the conditions 1-3 above. The procedure for approximating the largest mutex set starts with a set of pairs $M := M_0$ and iteratively removes all bad pairs from $M$ until no bad pair remains. The initial set $M_0$ can be chosen as the set of all pairs $\langle p, q \rangle$ for atoms $p$ and $q$. In practice, however, we have found that we could speed up the computation without losing mutexes by considering a smaller initial set defined as the union of two sets $M_A$ and $M_B$, where

- $M_A$ is the set of pairs $\langle p, q \rangle$ such that for some $op \in O$, $p \in Add(op)$ and $q \in Del(op)$,
- $M_B$ is the set of pairs $\langle r, q \rangle$ such that for some $\langle p, q \rangle$ in $M_A$ and some $op \in O$, $r \in Prec(op)$ and $p \in Add(op)$.

The structure of this definition mirrors that of Def. 1. We treat pairs symmetrically and hence assume that $\langle q, p \rangle$ is in a set when $\langle p, q \rangle$ is, and vice versa (in the implementation, atoms are ordered so only one pair is stored).
A mutex in hspr refers to a pair in the set $M^*$ obtained from $M_0 = M_A \cup M_B$ by sequentially removing all `bad' pairs. Like the analogous definition in Graphplan, the set $M^*$ does not capture all actual mutexes, yet it can be computed fast, and in the domains we have considered it appears to prune the obvious unreachable states. A difference with Graphplan is that this definition identifies non-temporal mutexes, while Graphplan identifies time-dependent mutexes. On the other hand, because of the fixed point construction, it can identify mutexes that Graphplan cannot. For example, in the complete TSP domain [LF99], pairs like $\langle at(city_1), at(city_2) \rangle$ would be recognized as a mutex by this definition but not by Graphplan, as the actions of going to different cities are not mutually exclusive for Graphplan (an additional distinction is that Graphplan focuses on parallel macro-actions, while we focus on sequential primitive actions; see Sect. 5).
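A sketch of this construction under the operator representation of Sect. 2. The helper bad checks conditions 1 and 2 of Def. 1 for an oriented pair, with condition 3 obtained by symmetry; unlike the actual implementation, both orientations of each pair are stored for simplicity.

```python
def mutex_set(operators, s0):
    """Approximate the largest mutex set of Def. 1: start from
    M0 = MA U MB and iteratively remove all 'bad' pairs."""
    # MA: pairs <p, q> such that some op adds p and deletes q
    MA = {(p, q) for op in operators
          for p in op.add for q in op.delete}
    # MB: pairs <r, q> such that <p, q> is in MA for some op
    # with r in Prec(op) and p in Add(op)
    MB = {(r, q) for op in operators for p in op.add
          for (p2, q) in MA if p2 == p for r in op.prec}
    # treat pairs symmetrically by storing both orientations
    M = (MA | MB) | {(q, p) for (p, q) in MA | MB}

    def bad(p, q):
        """True when <p, q> violates condition 1 or condition 2 of Def. 1."""
        if p in s0 and q in s0:
            return True  # condition 1
        for op in operators:  # condition 2
            if p in op.add and q not in op.delete:
                if q in op.add or not any((r, q) in M for r in op.prec):
                    return True
        return False

    changed = True
    while changed:
        changed = False
        for (p, q) in list(M):
            if (p, q) in M and (bad(p, q) or bad(q, p)):
                M.discard((p, q))
                M.discard((q, p))
                changed = True
    return M
```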

3.4 Algorithm
The search algorithm in hspr uses the heuristic $h$ to guide the search in the regression space and the mutex set $M^*$ to prune states. The choice of the actual search algorithm follows from several considerations:

1. We want hspr to solve problems with large state-spaces (e.g., spaces of $10^{20}$ states are not uncommon).
2. Node generation in hspr, while faster than in hsp, is slow in comparison with domain-specific search algorithms (hspr generates a few thousand nodes per second, while Korf's algorithms for Rubik's Cube and the 24-puzzle generate on the order of a million nodes per second [Kor98,KT96]).
3. The heuristic function, while often quite informative, is not admissible and often overestimates the true optimal costs.
4. State spaces in planning are quite redundant, with many paths leading to the same states.
These constraints are common to hsp, where node generation is several times slower because of the computation of the heuristic. Slow node generation with large state-spaces practically rules out optimal search algorithms such as a* and ida* that visit too many states before finding a solution. hsp uses instead a Hill-Climbing search with a few extensions like a memory of past states, a limited number of `plateau' moves, and multiple restarts. In hspr, we wanted to take advantage of the faster node generation to use a more systematic algorithm. After some experimentation, we settled on a simple algorithm that we call Greedy Best First (gbfs). gbfs is a bfs algorithm that uses a cost function $f(n) = g(n) + W \cdot h(n)$, where $g(n)$ is the accumulated cost (number of steps; not to be confused with the cost measures $g(p)$ defined in Sect. 2.1 for atoms), $h(n)$ is the estimated cost, and $W \geq 1$ is a real parameter (for the role of $W$ in bfs, see [Kor93]). As any bfs algorithm, gbfs maintains the nodes for expansion in an open list, and the nodes already expanded in a closed list. The only difference between gbfs and bfs is that before selecting a node $n$ with minimum $f$-cost from open, gbfs first checks whether the $f$-best children of the last node expanded are `good enough'. If so, gbfs selects one such child; else it selects $n$ from open as bfs does. In hspr, a child $n'$ of a node $n$ is `good enough' when it appears closer to the goal, i.e., when $h(n') < h(n)$. gbfs thus performs a sort of Hill-Climbing search but `backtracks' in a bfs mode when the heuristic does not improve. We also explored schemes in which `discrepancies' are counted and used to control an lds algorithm [HG95], but did not get as good results. The results reported below for hspr are all based on the gbfs algorithm described above with $W = 5$. Small changes in $W$ have little effect on hspr, but some problems are not solved with $W = 1$.
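A compact rendition of gbfs (our sketch; duplicate detection and tie-breaking are simplified relative to the actual planner). Here successors(s) is assumed to yield (action name, next state) pairs, in hspr the regression of s through the relevant and consistent operators with mutex-containing states filtered out, and h is the heuristic of Sect. 3.2.

```python
import heapq
import itertools

def gbfs(s0, successors, h, is_goal, W=5.0):
    """Greedy Best First search: best-first on f(n) = g(n) + W*h(n),
    except that a child of the last expanded node is expanded next
    when it is 'good enough', i.e., when h(child) < h(parent)."""
    tie = itertools.count()  # tie-breaker so the heap never compares states
    open_list = []           # heap entries: (f, tie, g, state, plan)
    closed = set()
    node = (0, s0, [])       # (g, state, plan): expand s0 first
    while node is not None or open_list:
        if node is None:     # no 'good enough' child: fall back on open
            _, _, g, s, plan = heapq.heappop(open_list)
        else:
            g, s, plan = node
        node = None
        if s in closed:
            continue
        closed.add(s)
        if is_goal(s):
            return plan
        hs, best_h = h(s), None
        for a, s2 in successors(s):
            if s2 in closed:
                continue
            h2 = h(s2)
            heapq.heappush(open_list,
                           (g + 1 + W * h2, next(tie), g + 1, s2, plan + [a]))
            if h2 < hs and (best_h is None or h2 < best_h):
                node, best_h = (g + 1, s2, plan + [a]), h2  # hill-climbing step
    return None
```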

4 Experiments
We report results on two domains for which there is a widely used body of difficult but solvable instances. We use the 30 logistics instances and the 5 largest blocks-world instances included in the blackbox distribution, which we refer to as log-i, i = 1, ..., 30, and bw-k, k = 1, ..., 5 (the logistics instances are from the `logistics-strips' directory, and the blocks-world instances are from the `prodigy-bw' directory; they contain 7, 9, 11, 15, and 19 blocks, respectively), and 5 hard blocks-world instances of our own, which we refer to as bw-i, i = 6, ..., 10.

In logistics, we compare hspr with hsp, blackbox, and tlplan. tlplan [BK98] is not a domain-independent planner (it uses control information tailored to each domain) but provides a reference for the results achievable in the domain. The results are shown in Table 2. The `time' column measures CPU time in seconds, while `steps' displays the number of actions in the solutions found. The results for blackbox and tlplan were taken from [BK98]; they are compatible with the results in the distribution but slightly more detailed. In [BK98], blackbox was run on a SPARC Ultra 2 with a 296MHz clock and 256M of RAM, while tlplan was run on a Pentium Pro with a 200MHz clock and 128M of RAM. The results for hsp and hspr were obtained on a SPARC Ultra 5 running at 333MHz with 128M of RAM. In the case of hsp and hspr, individual problems are converted into C programs that are then compiled and run; this takes on the order of a couple of seconds for each instance and is not included in the table.
As the table shows, hspr solves each of the 30 logistics problems in less than 3 seconds. This is faster (and less erratic) than hsp, and two orders of magnitude faster than blackbox. At the same time, in almost all cases, the plans produced by hspr are substantially smaller than those produced by blackbox and hsp. The speed of hspr is comparable with that of tlplan, which includes fine-tuned knowledge for the domain. The plans obtained by tlplan, however, are shorter (with one exception).
                      Time                         Steps
Problem    hspr    hsp     bbox     tlplan    hspr  hsp  bbox  tlplan
log-01     0.09    0.63      0.57     0.26     27    39    25    25
log-02     0.08    0.48     95.97     0.28     28    31    31    27
log-03     0.08    0.47     98.99     0.24     29    27    28    27
log-04     0.23    0.95    130.74     1.37     67    58    71    51
log-05     0.19    1.12    231.93     1.10     51    53    69    42
log-06     0.33    1.51    321.27     1.91     69    60    82    51
log-07     1.12     --     264.04     5.54     81    --    96    70
log-08     1.49    9.43    317.42     6.84     82   170   110    70
log-09     0.76     --    1609.45     3.79     77    --   121    70
log-10     0.99    4.77     84.04     2.42     46   109    71    41
log-11     0.64    1.61    137.93     2.24     54    47    68    46
log-12     0.43    1.30    136.22     1.93     41    36    49    38
log-13     1.26    5.78    165.84     6.54     74   102    85    66
log-14     1.46    8.48     77.74     9.34     82   141    89    73
log-15     1.10    3.53    424.36     5.36     69    76    91    63
log-16     0.34    1.09    926.96     1.14     44    43    85    39
log-17     0.36     --     758.47     1.24     48    --    83    43
log-18     2.36    5.88    152.35     9.27     56    57   105    46
log-19     0.74    2.97    149.22     2.66     50    68    78    45
log-20     1.77    7.46    538.22    10.18     99   144   113    89
log-21     1.51    5.09    190.49     6.83     69    81    87    59
log-22     1.22    4.12    846.84     6.40     87    99   111    75
log-23     0.95     --     173.93     4.69     70    --    85    62
log-24     1.05    5.00     74.83     4.71     73   113    87    64
log-25     1.04     --      73.99     4.09     67    --    84    57
log-26     0.80    3.66    233.40     3.64     52    77    80    55
log-27     1.08    4.45    145.16     5.52     76   115    97    70
log-28     2.96     --     867.34    14.53     88    --   118    74
log-29     2.40    8.40     89.51     5.99     50   105    84    45
log-30     0.89     --     495.37     3.42     52    --    92    51

Table 2. Performance on the logistics problems. A dash indicates that the planner was not able to solve the problem within 10 minutes or 100 MB.
                       hspr            hsp
Problem   Blocks   Time   Steps   Time   Steps
bw-01        7      0.07     6     0.16     6
bw-02        9      0.15     6     0.59    10
bw-03       11      0.33    10     0.77    12
bw-04       15      8.11    18     5.88    19
bw-05       19      6.96    18    15.80    28
bw-06        8      0.12    19     0.42    12
bw-07       10      1.10    13     0.65    12
bw-08       12       --     --     3.83    17   [bw-09: 14 blocks]
bw-09       14       --     --     3.83    17
bw-10       16       --     --    12.63    20

Table 3. Performance on the blocks-world problems. A dash indicates that a planner was not able to solve the problem within 10 minutes or 100 MB.

hspr is not better than hsp across all domains, and there are problems, including a number of `grid', `mystery', and `mprime' instances from the AIPS competition (see [McD98]), where neither planner does well. In the blocks-world, in particular, hsp appears to do better than hspr over some hard instances (bottom of Table 3). This appeared surprising to us, as hsp and hspr use similar heuristics and hspr can visit many more nodes than hsp. The problem we found is that the additive heuristic may not work as well for hspr as for hsp, due to the changes in the size of the states during the regression search. Since $h(s)$ is defined as the sum of the costs $g(p)$ for $p \in s$, the heuristic favors smaller states over larger states. This is not necessarily bad, but it often makes the definition of `good children' in the Greedy bfs algorithm inadequate. Indeed, sometimes perfectly good `children' do not appear closer to the goal than their `parents' simply because they contain more atoms. We are currently exploring possible ways around this problem. The problem does not arise in hsp because in the forward search all states $s$ have the same `size': atoms that are not in $s$ can safely be assumed to be false. The same is not true in the regression search.

5 Related Work
hspr and hsp are descendants of the real-time planner asp reported in [BLG97]. All three planners perform planning as heuristic search and use the same heuristic, but search in different directions or with different algorithms. unpop is a planner based on similar ideas developed independently by Drew McDermott [McD96].
The search in hspr can be understood as consisting of two phases. In the first, a forward propagation is used to compute the measures $g(p)$ that estimate the cost of achieving each of the atoms $p$ from the initial state $s_0$; in the second, a regression search is performed using those measures. These two phases are in correspondence with the two phases in Graphplan [BF95], where a plan graph is built forward in the first phase and is searched backward in the second. The two planners are also related in their use of mutexes, which hspr borrows from Graphplan. For the rest, hspr and Graphplan look quite different. However, we argue below that Graphplan, like hspr, can also be understood as a heuristic search planner with a precise heuristic function and search algorithm. From this point of view, the main innovations in Graphplan are the implementation of the search algorithm, which is quite efficient, and the derivation of the heuristic function making use of the mutex information. Basically, we argue that the main features of Graphplan can be understood as follows:
1. Graph: The graph encodes an admissible heuristic $h_G$ that is a refined version of the `max' heuristic discussed in Sect. 2.1, where $h_G(s) = j$ iff $j$ is the index of the first level in the graph that contains (the set of atoms) $s$ without a mutex, and in which $s$ is not memoized (note that memoizations are updates on the heuristic function; see 4).
2. Mutex: Mutexes are used to prune states in the regression search (as in hspr) and to refine the `max' heuristic into $h_G$.
3. Algorithm: The search algorithm is a version of Iterative Deepening a* (ida*) [Kor93], where the sum of the accumulated cost $g(n)$ and the estimated cost $h_G(n)$ is used to prune nodes $n$ whose cost exceeds the current threshold. Actually, Graphplan never generates such nodes.
4. Memoization: Memoizations are updates on the heuristic function $h_G$ (see 1). The resulting algorithm is a memory-extended version of ida* that closely corresponds to the mrec algorithm [SB89]. In mrec, the heuristic of a node $n$ is updated and stored in a hash table after the search below the children of $n$ completes without a solution (given the current threshold).
5. Parallelism: Graphplan searches a regression space $R_p$ that is slightly different from $R$ above, in which the actions are macro-actions (parallel actions) composed of compatible primitive actions. Solution costs in $R_p$ are given by the number of macro-actions used (the number of time steps). The heuristic $h_G(\cdot)$ is admissible in this space as well, where it provides a tighter lower bound than in $R$. While the branching factor in $R_p$ can be very high (the number of macro-actions applicable in a state), Graphplan makes smart use of the information in the graph to generate only the children that are `relevant' and whose cost does not exceed the current threshold.
This interpretation can be tested by extracting the heuristic $h_G$ from the graph and plugging it into a suitable version of ida* (mrec) running over the parallel-regression space $R_p$. The same care as in Graphplan has to be taken when generating the applicable `parallel' actions in each state, avoiding redundant or unnecessary effort. Our claim is that the performance of the resulting algorithm should be close to that of the basic Graphplan system [BF95], except for a small constant factor. We haven't done this experiment ourselves but hope others may want to do it. If the account is correct, it will show that Graphplan, like hsp and hspr, is best understood as a heuristic search planner. This would provide a different perspective from which to evaluate the strengths and limitations of Graphplan, and may suggest ways in which Graphplan can be improved. For another perspective on Graphplan, see [KLP97].
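For concreteness, here is how $h_G$ could be read off a plan graph under this interpretation. This is our paraphrase, not Graphplan code: graph_levels is a hypothetical list with one (atom set, mutex pair set) entry per level, mutex pairs are assumed stored in both orientations, and memoizations, which would further update the value, are ignored.

```python
import math

def h_G(s, graph_levels):
    """h_G(s): index of the first graph level that contains all atoms
    of s with no two of them mutex at that level (memoizations omitted)."""
    for j, (atoms, mutexes) in enumerate(graph_levels):
        if not s <= atoms:
            continue  # some atom of s has not appeared yet at level j
        if any((p, q) in mutexes for p in s for q in s if p != q):
            continue  # two atoms of s are still mutex at level j
        return j
    return math.inf  # s never appears mutex-free in the graph built so far
```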
6 Discussion
We have presented a reformulation of hsp that makes `planning as heuristic search' more competitive with state-of-the-art planners. A number of problems and challenges remain open. Among those that we think are the most important are the following:
- better heuristics: the `additive' heuristic used in hsp and hspr is too simple and often widely overestimates the true costs. The `max' heuristic is too simple in the other direction, focusing only on the most difficult subgoals and ignoring all others. Better heuristics are needed.
- node generation: the node generation rates in hsp and hspr are orders of magnitude slower than in specialized search algorithms [KT96,Kor98]. Implementations need to be improved substantially if domain-independent planning approaches are to compete with domain-specific search algorithms.
- use of memory: in hard problems, hspr may run out of memory. A number of approaches for searching with limited memory are discussed in the literature, but none is currently incorporated in the planners discussed.
- optimal search: so far we have been concerned with finding `reasonable' plans, yet if better admissible heuristics can be obtained and node generation rates are improved, it will be feasible to use standard optimal search algorithms such as ida*.
- modeling languages: hsp and hspr are tied to the Strips language, yet the ideas can be generalized to more expressive planning languages. In [Gef99], we describe Functional Strips, a language that extends Strips with first-class function symbols and allows the codification of a number of problems (e.g., Hanoi, 8-puzzle, etc.) in a way that mimics more specialized representations. In the current implementation, the heuristic function is given by the user, but we expect to be able to exploit the richer language for extracting better heuristics.
Except for the modeling point, the rest of these issues are common to the problems encountered in the application of domain-specific heuristic search methods [JS99]. Indeed, the only thing that distinguishes planning as heuristic search from classical heuristic search is the use of a general declarative language for encoding problems and extracting the heuristic information. The biggest challenge is to make the declarative approach competitive with specialized methods while being more general.
Acknowledgements: Part of this work was done while H. Geffner was visiting IRIT, Toulouse. He thanks J. Lang, D. Dubois, H. Prade, and H. Farreny for their hospitality and for making the visit possible. Both authors also thank R. Korf and D. Furcy for comments related to this work. Partial funding is due to Conicit. B. Bonet is currently at UCLA under a USB-Conicit fellowship.
The code for hspr is available at www.ldc.usb.ve/~hector.
References
[ASW98] C. Anderson, D. Smith, and D. Weld. Conditional effects in Graphplan. In Proc. AIPS-98. AAAI Press, 1998.
[BF95] A. Blum and M. Furst. Fast planning through planning graph analysis. In Proceedings of IJCAI-95, Montreal, Canada, 1995.
[BG98] B. Bonet and H. Geffner. HSP: Planning as heuristic search. http://www.ldc.usb.ve/~hector, 1998.
[BK98] F. Bacchus and F. Kabanza. Using temporal logics to express search control knowledge for planning, 1998. Submitted. At http://www.lpaig.uwaterloo.ca/fbacchus.
[BLG97] B. Bonet, G. Loerincs, and H. Geffner. A robust and fast action selection mechanism for planning. In Proceedings of AAAI-97, 1997.
[Gef99] H. Geffner. Functional Strips: a more general language for planning and problem solving. Presented at the Logic-based AI Workshop, Washington D.C., June 1999. At http://www.ldc.usb.ve/~hector.
[HG95] W. Harvey and M. Ginsberg. Limited discrepancy search. In Proc. IJCAI-95, 1995.
[JS99] A. Junghanns and J. Schaeffer. Domain-dependent single-agent search enhancements. In Proc. IJCAI-99. Morgan Kaufmann, 1999.
[KLP97] S. Kambhampati, E. Lambrecht, and E. Parker. Understanding and extending Graphplan. In Proc. 4th European Conf. on Planning, LNAI 1248. Springer, 1997.
[KNHD97] J. Koehler, B. Nebel, J. Hoffmann, and Y. Dimopoulos. Extending planning graphs to an ADL subset. In Proc. 4th European Conf. on Planning, LNAI 1248. Springer, 1997.
[Kor90] R. Korf. Real-time heuristic search. Artificial Intelligence, 42:189-211, 1990.
[Kor93] R. Korf. Linear-space best-first search. Artificial Intelligence, 62:41-78, 1993.
[Kor98] R. Korf. Finding optimal solutions to Rubik's Cube using pattern databases. In Proceedings of AAAI-98, pages 1202-1207, 1998.
[KS96] H. Kautz and B. Selman. Pushing the envelope: Planning, propositional logic, and stochastic search. In Proceedings of AAAI-96, 1996.
[KS99] H. Kautz and B. Selman. Unifying SAT-based and Graph-based planning. In Proceedings of IJCAI-99, 1999.
[KT96] R. Korf and L. Taylor. Finding optimal solutions to the twenty-four puzzle. In Proceedings of AAAI-96, pages 1202-1207. MIT Press, 1996.
[LF99] D. Long and M. Fox. The efficient implementation of the plan-graph. JAIR, 1999.
[McD96] D. McDermott. A heuristic estimator for means-ends analysis in planning. In Proc. Third Int. Conf. on AI Planning Systems (AIPS-96), 1996.
[McD98] D. McDermott. AIPS-98 Planning Competition results. http://ftp.cs.yale.edu/mcdermott/aipscomp-results.html, 1998.
[Nil80] N. Nilsson. Principles of Artificial Intelligence. Tioga, 1980.
[Pea83] J. Pearl. Heuristics. Morgan Kaufmann, 1983.
[SB89] A. Sen and A. Bagchi. Fast recursive formulations for BFS that allow controlled use of memory. In Proc. IJCAI-89, pages 297-302, 1989.
[Wel94] D. Weld. An introduction to least commitment planning. AI Magazine, 1994.
