Cerberus: Red-Black Heuristic for Planning Tasks with Conditional Effects Meets Novelty Heuristic and Enhanced Mutex Detection


Michael Katz
IBM Research
Yorktown Heights, NY, USA
[email protected]

Abstract

Red-black planning is the state-of-the-art approach to satisficing classical planning. The Mercury planner, empowered by the red-black planning heuristic, was the runner-up of the latest International Planning Competition (IPC) 2014, despite its trivial handling of conditional effects by compiling them away. Conditional effects are important for classical planning and are required in many domains for efficient modeling. Another recent success in satisficing classical planning is novelty-based heuristic guidance. When the novelty of heuristic values is considered, the search space is partitioned into novelty layers. Exploring these layers in the order of their novelty considerably improves the performance of the underlying heuristics. Yet another recent success relates to the translation of planning tasks from the input PDDL language to a grounded multi-valued variable based representation, such as SAS+. Recent methods of invariant synthesis allow for deriving richer SAS+ representations.

We herein present a satisficing classical planner, which we baptize Cerberus, that incorporates these three recent improvements. It starts by performing enhanced mutex detection to derive a SAS+ planning task with conditional effects. Then, it performs best-first search of various greediness, exploiting the red-black planning heuristic with direct handling of conditional effects and using this red-black heuristic as a base for a novelty heuristic.

Introduction

Delete relaxation heuristics have played a key role in the success of satisficing planning systems (Bonet and Geffner 2001; Hoffmann and Nebel 2001; Richter and Westphal 2010). A well-known pitfall of delete relaxation is its inability to account for repetitive achievements of facts. It has thus been an actively researched question from the outset how to take some deletes into account, e.g., (Fox and Long 2001; Gerevini, Saetti, and Serina 2003; Helmert 2004; Helmert and Geffner 2008; Baier and Botea 2009; Cai, Hoffmann, and Helmert 2009; Haslum 2012; Keyder, Hoffmann, and Haslum 2012). The red-black planning framework (Domshlak, Hoffmann, and Katz 2015), where a subset of red state variables takes on the relaxed value-accumulating semantics, while the other black variables retain the regular semantics, introduced a convenient way of interpolating between fully relaxed and regular planning.

Katz, Hoffmann, and Domshlak (2013b) introduced the red-black framework and conducted a theoretical investigation of tractability. Following up on this, they devised practical red-black plan heuristics, non-admissible heuristics generated by repairing fully delete-relaxed plans into red-black plans (Katz, Hoffmann, and Domshlak 2013a). Observing that this technique often suffers from dramatic over-estimation incurred by following arbitrary decisions taken in delete-relaxed plans, Katz and Hoffmann (2013) refined the approach to rely less on such decisions, yielding a more flexible algorithm delivering better search guidance. Subsequently, Katz and Hoffmann (2014b) presented red-black DAG heuristics for a tractable fragment characterized by DAG black causal graphs and devised some enhancements targeted at making the resulting red-black plans executable in the real task, stopping the search if they succeed in reaching the goal. Red-black DAG heuristics are at the heart of the Mercury planner (Katz and Hoffmann 2014a), the runner-up of the sequential satisficing track in the latest International Planning Competition (IPC 2014).

All aforementioned work on red-black planning, however, handles the SAS+ fragment without conditional effects, despite conditional effects being a main feature in the domains of IPC 2014. The planner Mercury, which favorably participated in IPC 2014, handles conditional effects by simply compiling them away (Nebel 2000). The number of actions in the resulting planning tasks grows exponentially, and thus such straightforward compilation does not scale well. Nebel (2000) presents an alternative compilation that does not lead to an exponential blow-up in the task size. This compilation, however, does not preserve the delete relaxation. Thus, several delete relaxation based heuristics were adapted to natively support conditional effects (Haslum 2013; Röger, Pommerening, and Helmert 2014). Recently, Katz (2018) has shown that the fragment of red-black planning characterized by DAG black causal graphs remains tractable in the presence of conditional effects, extending the existing red-black planning heuristics to natively handle conditional effects.

Search-boosting and pruning techniques have considerably advanced the state of the art in planning as heuristic search (Richter and Helmert 2009; Richter and Westphal 2010; Xie et al. 2014; Valenzano et al. 2014; Domshlak, Katz, and Shleyfman 2013; Lipovetzky and Geffner 2012).
One such technique is based on the concept of novelty of a state, where the search procedure prunes nodes that do not qualify as novel. The concept has been successfully exploited in classical planning via the SIW+ and DFS(i) search algorithms, in heuristic search in conjunction with helpful actions (Lipovetzky and Geffner 2012; 2014; 2017), and in blind state-space search for deterministic online planning in Atari-like problems (Lipovetzky, Ramirez, and Geffner 2015), where it was later generalized to account for rewards (Shleyfman, Tuisov, and Domshlak 2016; Jinnai and Fukunaga 2017). The latter work, although applied to Atari-like problems, is valid for planning with rewards in general, when rewards are defined on states. Consequently, Katz et al. (2017) brought the concept of novelty back to heuristic search, adapting the novelty definition of Shleyfman, Tuisov, and Domshlak (2016) to a novelty of a state with respect to its heuristic estimate. The new novelty notion was no longer used solely for pruning search nodes, but rather as a heuristic function, for node ordering in a queue. However, since such heuristics are not goal-aware, Katz et al. (2017) use the base goal-aware heuristic as a secondary (tie-breaking) heuristic for node ordering.

In this work we construct a planner Cerberus, named after the monstrous three-headed guardian of the gates of the Underworld in Greek mythology. The planner incorporates three main recent improvements, namely enhanced mutex detection, the recent novelty heuristic, and the extension of the red-black planning heuristic to conditional effects. Two variants of the planner submitted to the International Planning Competition (IPC) 2018 differ in the red-black planning heuristic they use. In the remainder of this paper we describe the components in detail.

Configurations

Both Cerberus variants participate in three tracks, namely satisficing, agile, and bounded-cost. They are built on top of the adaptation of the Mercury planner (Katz and Hoffmann 2014a), runner-up of the sequential satisficing track of IPC 2014, to the recent version of the Fast Downward framework (Helmert 2006). Further, the implementation is extended to natively support conditional effects (Katz 2018). In contrast to the Mercury planner, the red-black planning heuristic is enhanced by the novelty heuristic (Katz et al. 2017), replacing the queues ordered by the red-black planning heuristic hRB in the Mercury planner with queues ordered by the novelty of a state with respect to its red-black planning heuristic estimate hRB, with ties broken by hRB. In what follows, we describe the parts that are shared between the tracks and then detail the configuration for each track.

Enhanced Invariance Detection

As the search and the heuristic computation are performed on the finite domain representation SAS+ (Bäckström and Nebel 1995), invariance detection plays a significant role in the quality of the translation from the PDDL representation to SAS+. To reduce the number of multi-valued state variables we exploit h^2 mutex detection as a preprocessing step (Alcázar and Torralba 2015). In our preliminary experiments, this step was observed to make a significant contribution to the performance of the overall planning system.

Red-Black Planning Heuristic

In order to describe the configuration of the red-black planning heuristic hRB, we need to specify how a red-black task is constructed (which variables are chosen to be red and which black), also known as the painting strategy, as well as how the red-black task is solved. In both cases, we followed the choices made by the Mercury planner. Specifically, for red-black task construction we followed one of the basic strategies, namely ordering the variables by causal graph level, and either (a) iteratively painting variables red until the black causal graph becomes a DAG (Domshlak, Hoffmann, and Katz 2015), or (b) iteratively painting variables black as long as the black causal graph is a DAG. There are two submitted planners that differ in their painting strategies. While the planner that (similarly to the Mercury planner) uses strategy (a) is called Cerberus, the planner that uses strategy (b) is denoted by Cerberus-gl. These two planners differ in red-black planning task creation only, and therefore in what follows, we describe the configurations without mentioning the actual planner. A further difference from the Mercury planner is in the definition of invertibility in the presence of conditional effects. In our planners we follow the definition of Katz (2018).

For solving the red-black task, we use the algorithm presented in Figure 2 of Katz (2018). It is an adaptation of the algorithm of Katz and Hoffmann (2014a) to tasks with conditional effects. The algorithm receives a red-black planning task, as well as a set of red facts that is sufficient for reaching the red-black goals. Such a set is typically obtained from a relaxed solution to the task. Then, it iteratively (i) selects an action that can achieve some previously unachieved fact from that set, (ii) achieves its preconditions, and (iii) applies the action. Finally, when all the facts in the set are achieved, it achieves the goal of the task. We follow Katz and Hoffmann (2014a) in the two optimizations applied to enhance red-black plan applicability: selecting the next action in (i), preferring actions such that achieving their black preconditions does not involve deleting facts from the set above, and selecting the sequences of actions in (ii), preferring those that are executable in the current state.

Landmarks Count Heuristic

Following the successful approaches of the Mercury and LAMA planners, we use additional queues ordered by the landmark count heuristic (Richter and Westphal 2010).

Novelty Heuristic

The novelty heuristic used in our planners measures the novelty of a state with respect to its red-black planning heuristic estimate hRB. Specifically, we use the hQB heuristic, as described in Equation 3 of Katz et al. (2017). The hQB heuristic, which quantifies both novel and non-novel states, is designed not only to distinguish novel states from non-novel ones, but also to separate novel states, and even to separate non-novel ones. Consequently, we use the best performing overall configuration of Katz et al. (2017) in the Cerberus planners.
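
To make the idea of a quantified novelty measure concrete, the following is a minimal sketch of one way such a measure could be computed over SAS+ states. It is an illustration under stated assumptions, not the definition from Equation 3 of Katz et al. (2017) and not the planner's implementation: the class name `QuantifiedNovelty`, the dictionary state encoding, and the exact scoring rule (number of variables minus the count of facts whose best-known base heuristic value improves, or plus the count of facts whose best-known value is strictly better) are all hypothetical choices.

```python
from typing import Dict, Hashable, Tuple

Fact = Tuple[str, Hashable]  # a (variable, value) pair of a SAS+ state


class QuantifiedNovelty:
    """Illustrative quantified novelty of a state w.r.t. a base heuristic.

    Hypothetical scoring rule (an assumption, not taken from the paper):
    lower scores are better; novel states score below num_variables,
    non-novel states score at or above it.
    """

    def __init__(self, num_variables: int) -> None:
        self.num_variables = num_variables
        # Lowest base-heuristic value seen so far for each fact.
        self.best_h: Dict[Fact, float] = {}

    def evaluate(self, state: Dict[str, Hashable], h_value: float) -> float:
        inf = float("inf")
        facts = list(state.items())
        # Facts for which this state strictly improves the best-known value.
        improved = sum(1 for f in facts if h_value < self.best_h.get(f, inf))
        # Facts that were already seen with a strictly better value.
        worsened = sum(1 for f in facts if h_value > self.best_h.get(f, inf))
        # Record the improvements for later evaluations.
        for f in facts:
            if h_value < self.best_h.get(f, inf):
                self.best_h[f] = h_value
        if improved > 0:
            return self.num_variables - improved
        return self.num_variables + worsened


if __name__ == "__main__":
    novelty = QuantifiedNovelty(num_variables=2)
    for state, h in [({"x": 0, "y": 0}, 5), ({"x": 0, "y": 1}, 5), ({"x": 0, "y": 0}, 7)]:
        # A queue could be ordered by the pair (novelty score, base heuristic value).
        print((novelty.evaluate(state, h), h))
```

In a search queue, the pair (novelty score, base heuristic value) could serve as the sort key, which mirrors the tie-breaking by the base heuristic described in the text.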

Satisficing Track

The configuration runs a sequence of search iterations of decreasing level of greediness. The first iteration is the greedy best-first search (GBFS) with deferred heuristic evaluation, alternating between four queues. The first queue is ordered by the novelty of a state with respect to its red-black planning heuristic estimate hRB, with ties broken by hRB. The second queue consists of states achieved by preferred operators of the red-black planning heuristic hRB (these are basically the preferred operators of the full delete relaxation, the FF heuristic), ordered by hRB. The third and fourth queues are ordered by the landmark count heuristic, with all successors and those achieved by the preferred operators, respectively.

The next iterations perform a weighted A* with deferred heuristic evaluation and decreasing weights w = 5, 3, 2, 1, continuing with w = 1. All these iterations alternate between the four queues as in the Mercury planner, with the first two ordered by hRB, with all successors and those achieved by the preferred operators, respectively, and the last two as in the first iteration. In case a solution is found in the previous iteration, its cost is passed as a pruning bound to the next iteration.

In case of non-unit costs, a cost transformation is performed, adding a constant 1 to all costs. Further, the first iteration is performed twice, once with unit costs and once with the increased costs.

Agile Track

The configuration in the agile track mimics the first iteration of the configuration in the satisficing track as described above.

Bounded-Cost Track

The configuration in the bounded-cost track mimics the configuration in the agile track as described above. The only difference is that the cost bound is provided as an input.

References

Alcázar, V., and Torralba, Á. 2015. A reminder about the importance of computing and exploiting invariants in planning. In Brafman, R.; Domshlak, C.; Haslum, P.; and Zilberstein, S., eds., Proceedings of the Twenty-Fifth International Conference on Automated Planning and Scheduling (ICAPS 2015), 2–6. AAAI Press.
Bäckström, C., and Nebel, B. 1995. Complexity results for SAS+ planning. Computational Intelligence 11(4):625–655.
Baier, J. A., and Botea, A. 2009. Improving planning performance using low-conflict relaxed plans. In Gerevini, A.; Howe, A.; Cesta, A.; and Refanidis, I., eds., Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS 2009), 10–17. AAAI Press.
Bonet, B., and Geffner, H. 2001. Planning as heuristic search. Artificial Intelligence 129(1):5–33.
Cai, D.; Hoffmann, J.; and Helmert, M. 2009. Enhancing the context-enhanced additive heuristic with precedence constraints. In Gerevini, A.; Howe, A.; Cesta, A.; and Refanidis, I., eds., Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS 2009), 50–57. AAAI Press.
Domshlak, C.; Hoffmann, J.; and Katz, M. 2015. Red-black planning: A new systematic approach to partial delete relaxation. Artificial Intelligence 221:73–114.
Domshlak, C.; Katz, M.; and Shleyfman, A. 2013. Symmetry breaking: Satisficing planning and landmark heuristics. In Borrajo, D.; Kambhampati, S.; Oddi, A.; and Fratini, S., eds., Proceedings of the Twenty-Third International Conference on Automated Planning and Scheduling (ICAPS 2013), 298–302. AAAI Press.
Fox, M., and Long, D. 2001. Stan4: A hybrid planning strategy based on subproblem abstraction. AI Magazine 22(3):81–84.
Gerevini, A.; Saetti, A.; and Serina, I. 2003. Planning through stochastic local search and temporal action graphs in LPG. Journal of Artificial Intelligence Research 20:239–290.
Haslum, P. 2012. Incremental lower bounds for additive cost planning problems. In McCluskey, L.; Williams, B.; Silva, J. R.; and Bonet, B., eds., Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling (ICAPS 2012), 74–82. AAAI Press.
Haslum, P. 2013. Optimal delete-relaxed (and semi-relaxed) planning with conditional effects. In Rossi, F., ed., Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), 2291–2297. AAAI Press.
Helmert, M., and Geffner, H. 2008. Unifying the causal graph and additive heuristics. In Rintanen, J.; Nebel, B.; Beck, J. C.; and Hansen, E., eds., Proceedings of the Eighteenth International Conference on Automated Planning and Scheduling (ICAPS 2008), 140–147. AAAI Press.
Helmert, M. 2004. A planning heuristic based on causal graph analysis. In Zilberstein, S.; Koehler, J.; and Koenig, S., eds., Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), 161–170. AAAI Press.
Helmert, M. 2006. The Fast Downward planning system. Journal of Artificial Intelligence Research 26:191–246.
Hoffmann, J., and Nebel, B. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research 14:253–302.
Jinnai, Y., and Fukunaga, A. 2017. Learning to prune dominated action sequences in online black-box planning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017). AAAI Press.
Katz, M., and Hoffmann, J. 2013. Red-black relaxed plan heuristics reloaded. In Helmert, M., and Röger, G., eds., Proceedings of the Sixth Annual Symposium on Combinatorial Search (SoCS 2013), 105–113. AAAI Press.
Katz, M., and Hoffmann, J. 2014a. Mercury planner: Pushing the limits of partial delete relaxation. In Eighth International Planning Competition (IPC-8): planner abstracts, 43–47.
Katz, M., and Hoffmann, J. 2014b. Pushing the limits of partial delete relaxation: Red-black DAG heuristics. In ICAPS 2014 Workshop on Heuristics and Search for Domain-independent Planning (HSDIP), 40–44.
Katz, M.; Lipovetzky, N.; Moshkovich, D.; and Tuisov, A. 2017. Adapting novelty to classical planning as heuristic search. In Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling (ICAPS 2017), 172–180. AAAI Press.
Katz, M.; Hoffmann, J.; and Domshlak, C. 2013a. Red-black relaxed plan heuristics. In desJardins, M., and Littman, M. L., eds., Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2013), 489–495. AAAI Press.
Katz, M.; Hoffmann, J.; and Domshlak, C. 2013b. Who said we need to relax all variables? In Borrajo, D.; Kambhampati, S.; Oddi, A.; and Fratini, S., eds., Proceedings of the Twenty-Third International Conference on Automated Planning and Scheduling (ICAPS 2013), 126–134. AAAI Press.
Katz, M. 2018. Red-black heuristic for planning tasks with conditional effects. Technical report, IBM. Available at http://ibm.biz/ceffRBTr.
Keyder, E.; Hoffmann, J.; and Haslum, P. 2012. Semi-relaxed plan heuristics. In McCluskey, L.; Williams, B.; Silva, J. R.; and Bonet, B., eds., Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling (ICAPS 2012), 128–136. AAAI Press.
Lipovetzky, N., and Geffner, H. 2012. Width and serialization of classical planning problems. In De Raedt, L.; Bessiere, C.; Dubois, D.; Doherty, P.; Frasconi, P.; Heintz, F.; and Lucas, P., eds., Proceedings of the 20th European Conference on Artificial Intelligence (ECAI 2012), 540–545. IOS Press.
Lipovetzky, N., and Geffner, H. 2014. Width-based algorithms for classical planning: New results. In Schaub, T.; Friedrich, G.; and O’Sullivan, B., eds., Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), 1059–1060. IOS Press.
Lipovetzky, N., and Geffner, H. 2017. Best-first width search: Exploration and exploitation in classical planning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017). AAAI Press.
Lipovetzky, N.; Ramirez, M.; and Geffner, H. 2015. Classical planning with simulators: Results on the atari video games. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015), 1610–1616. AAAI Press.
Nebel, B. 2000. On the compilability and expressive power of propositional planning formalisms. Journal of Artificial Intelligence Research 12:271–315.
Richter, S., and Helmert, M. 2009. Preferred operators and deferred evaluation in satisficing planning. In Gerevini, A.; Howe, A.; Cesta, A.; and Refanidis, I., eds., Proceedings of the Nineteenth International Conference on Automated Planning and Scheduling (ICAPS 2009), 273–280. AAAI Press.
Richter, S., and Westphal, M. 2010. The LAMA planner: Guiding cost-based anytime planning with landmarks. Journal of Artificial Intelligence Research 39:127–177.
Röger, G.; Pommerening, F.; and Helmert, M. 2014. Optimal planning in the presence of conditional effects: Extending LM-Cut with context splitting. In Schaub, T.; Friedrich, G.; and O’Sullivan, B., eds., Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), 765–770. IOS Press.
Shleyfman, A.; Tuisov, A.; and Domshlak, C. 2016. Blind search for atari-like online planning revisited. In Kambhampati, S., ed., Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), 3251–3257. AAAI Press.
Valenzano, R.; Sturtevant, N. R.; Schaeffer, J.; and Xie, F. 2014. A comparison of knowledge-based GBFS enhancements and knowledge-free exploration. In Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling (ICAPS 2014), 375–379. AAAI Press.
Xie, F.; Müller, M.; Holte, R. C.; and Imai, T. 2014. Type-based exploration with multiple search queues for satisficing planning. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI 2014). AAAI Press.
