
Preferred Explanations: Theory and Generation via Planning

2011

In this paper we examine the general problem of generating preferred explanations for observed behavior with respect to a model of the behavior of a dynamical system. This problem arises in a diversity of applications including diagnosis of dynamical systems and activity recognition. We provide a logical characterization of the notion of an explanation. To generate explanations we identify and exploit a correspondence between explanation generation and planning. The determination of good explanations requires additional ...

Proceedings of the Ninth International Workshop on Non-Monotonic Reasoning, Action and Change (NRAC’11) Editors: Sebastian Sardina School of Computer Science and IT RMIT University Melbourne, VIC, 3000 Australia [email protected] Stavros Vassos Department of Informatics and Telecommunications National and Kapodistrian University of Athens Athens, 15784 Greece [email protected] Technical Report RMIT-TR-11-02 July 2011 School of Computer Science and Information Technology RMIT University Melbourne 3000, Australia Preface We present here the informal proceedings for the Ninth International Workshop on Non-Monotonic Reasoning, Action and Change (NRAC’11), a well-established forum to foster discussion and sharing of experiences among researchers interested in the broad areas of nonmonotonic reasoning, and reasoning about action and change, including belief revision, planning, logic programming, argumentation, causality, probabilistic and possibilistic approaches to KR, and other related topics. Since its inception in 1995, NRAC has always been held in conjunction with International Joint Conference on Artificial Intelligence (IJCAI), each time with growing success, and showing an active and loyal community. Previous editions were held in 2009 in Pasadena, USA; in 2007 in Hyderabad, India; in 2005 in Edinburgh, Scotland; in 2003 in Acapulco, Mexico; in 2001 in Seattle, USA; in 1999 in Stockholm, Sweden; in 1997 in Nagoya, Japan; and in 1995 in Montreal, Canada. This time, NRAC’11 is held as a 1.5-day satellite workshop of IJCAI’11, in Barcelona, Spain, and will take place on July 16 & 17. An intelligent agent exploring a rich, dynamic world, needs cognitive capabilities in addition to basic functionalities for perception and reaction. The abilities to reason nonmonotonically, to reason about actions, and to change one’s beliefs, have been identified as fundamental high-level cognitive functions necessary for common sense. Many deep relationships have already been established between the three areas and the primary aim of this workshop is to further promote this cross-fertilization. A closer look at recent developments in the three fields reveals how fruitful such cross-fertilization can be. Comparing and contrasting current formalisms for Nonmonotonic Reasoning, Reasoning about Action, and Belief Revision helps identify the strengths and weaknesses of the various methods available. It is an important activity that allows researchers to evaluate the state-of-the-art. Indeed a significant advantage of using logical formalisms as representation schemes is that they facilitate the evaluation process. Moreover, following the initial success, more complex real-world applications are now within grasp. Experimentation with prototype implementations not only helps to identify obstacles that arise in transforming theoretical solutions into operational solutions, but also highlights the need for the improvement of existing formal integrative frameworks for intelligent agents at the ontological level. This workshop will bring together researchers from all three areas with the aim to compare and evaluate existing formalisms, report on new developments and innovations, identify the most important open problems in all three areas, identify possibilities of solution transferal between the areas, and identify important challenges for the advancement of the areas. 
As part of the program we will be considering the status of the field and discussing questions such as: What nonmonotonic logics and what theories of action and change have been implemented?; how to compare them?; which frameworks are implementable?; what can be learned from existing applications?; what is needed to improve their scope and performance? In addition to the paper sessions, this year’s workshop features invited talks by two internationally renowned researchers: Jürgen Dix (TU Clausthal University, Germany) on ‘‘How to test and compare Multi-agent systems?” and Grigoris Antoniou (University of Crete, Greece) on “Nonmonotonic Reasoning in the Real: Reasoning about Context in Ambient Intelligence Environments” The programme chairs would like to thank all authors for their contributions and are also very grateful to the program committee for their hard work during the review phase and for providing excellent feedback to the authors. The programme chairs are also very grateful to Pavlos Peppas and Mary-Anne Williams from the steering committee for always being available for consultation, and to Maurice Pagnucco for helping us to put these Proceedings together. June 2011 Sebastian Sardina Stavros Vassos RMIT University National and Kapodistrian University of Athens Organization Organizing Committee Sebastian Sardina Stavros Vassos RMIT University, Australia National and Kapodistrian University of Athens, Greece Steering Committee Gerhard Brewka Michael Thielscher Leora Morgenstern Maurice Pagnucco Pavlos Peppas Mary-Anne Williams Andreas Herzig Benjamin Johnston University of Leipzig, Germany University of NSW, Australia SAIC Advanced Systems and Concepts, USA University of NSW, Australia University of Patras, Greece University of Technology, Sydney, Australia Universite Paul Sabatier, France University of Technology, Sydney, Australia Program Committee Xiaoping Chen Jim Delgrande Jérôme Lang Thomas Meyer Michael Thielscher Sheila McIlraith Eduardo Fermé Dongmo Zhang Mehdi Dastani Giuseppe De Giacomo Christian Fritz Leora Morgenstern Pavlos Peppas Sajjad Haider Alfredo Gabaldon University of Science and Technology China, China Simon Fraser University, Canada Universite Paul Sabatier, France Meraka Institute, South Africa University of NSW, Australia University of Toronto, Canada University of Madeira, Portugal University of Western Sydney, Australia Utrecht University, The Netherlands Sapienza Universita’ di Roma, Italy PARC (Palo Alto Research Center), USA SAIC Advanced Systems and Concepts, USA University of Patras, Greece Institute of Business Administratio, Pakistan Universidade Nova de Lisboa, Portugal Table of Contents An Adaptive Logic-based Approach to Abduction in AI (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . 1 Tjerk Gauderis Default Reasoning about Conditional, Non-Local and Disjunctive Effect Actions . . . . . . . . . . . . . . . . . . . . . 7 Hannes Strass A Logic for Specifying Partially Observable Stochastic Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 Gavin Rens, Thomas Meyer, Alexander Ferrein and Gerhard Lakemeyer Agent Supervision in Situation-Determined ConGolog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Giuseppe De Giacomo, Yves Lespérance and Christian Muise On the Use of Epistemic Ordering Functions as Decision Criteria for Automated and Assisted Belief Revision in SNePS (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 Ari Fogel and Stuart Shapiro Decision-Theoretic Planning for Golog Programs with Action Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Daniel Beck and Gerhard Lakemeyer Verifying properties of action theories by bounded model checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Laura Giordano, Alberto Martelli and Daniele Theseider Dupré Efficient Epistem ic Reasoning in Partially Observable Dynamic Domains Using Hidden Causal Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Theodore Patkos and Dimitris Plexousakis Preferred Explanations: Theory and Generation via Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Shirin Sohrabi, Jorge A. Baier and Sheila A. Mcilraith The Method of ILP+ASP on Psychological Models (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Javier Romero, Alberto Illobre, Jorge Gonzalez and Ramon Otero Tractable Strong Outlier Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary and Luigi Palopoli Topics in Horn Contraction: Supplementary Postulates, Package Contraction, and Forgetting . . . . . . . 87 James Delgrande and Renata Wassermann A Selective Semantics for Logic Programs with Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 Alfredo Gabaldon An Adaptive Logic-based Approach to Abduction in AI∗ (Preliminary Report) Tjerk Gauderis Centre for Logic and Philosophy of Science Ghent University, Belgium [email protected] Abstract language of the logic over which the considered theory T is defined and ∆, the condition of the formula, is a set of regular well-formed formulas that are assumed to be false. To express this assumption, these formulas are generally called abnormalities in adaptive logic literature.2 For an adaptive logic in standard format, the abnormalities are characterized by a logical form. The set of plausibly derivable formulas P from a logical theory T is formed in the following way: In a logic-based approach to abductive reasoning, the background knowledge is represented by a logical theory. A sentence φ is then considered as an explanation for ω if it satisfies some formal conditions. In general, the following three conditions are considered crucial: (1) φ together with the background knowledge implies ω; (2) φ is logically consistent with what is known; and (3) φ is the most ‘parsimonious’ explanation. But, since abductive reasoning is a non-monotonic form of reasoning, each time the background knowledge is extended, the status of previously abduced explanations becomes once again undefined. The adaptive logics program is developed to address these types of non-monotonic reasoning. In addition to deductive reasoning steps, it allows for direct implementation of defeasible reasoning steps, but it adds to each formula the explicit set of conditions that would defeat this formula. So, in an adaptive logic for abduction, a formula is an abduced hypothesis as long as none of its conditions is deduced. This implies that we will not have to recheck all hypotheses each time an extension to our background knowledge is made. 
This is the key advantage of this approach, which allows us to save repetitive re-computations in fast growing knowledge bases. 1 1. Premise Rule: if A ∈ T , then (A, ∅) ∈ P 2. Unconditional Inference Rule: if A1 , . . . , An ⊢ B and (A1 , ∆1 ), . . . , (An , ∆n ) ∈ P, then (B, ∆1 ∪ . . . ∪ ∆n ) ∈ P 3. Conditional Inference Rule: if A1 , ..., An ⊢ B ∨ Dab(Θ) and (A1 , ∆1 ), . . . , (An , ∆n ) ∈ P, then (B, ∆1 ∪ . . . ∪ ∆n ∪ Θ) ∈ P The Adaptive Logics Framework The adaptive logics program is established to offer insight in the direct application of defeasible reasoning steps.1 This is done by focussing on which formulas would falsify a defeasible reasoning step. Therefore, in adaptive logics a formula is a pair (A, ∆) with A a regular well-formed formula in the ∗ Research for this paper was supported by project subventions from the Special Fund for Research (BOF) of Ghent University. I am grateful to the anonymous referees for their helpful suggestions. 1 The adaptive logics program is founded by Batens in the eighties. For a more recent overview of the general results, see [Batens, 2007]. For a philosophical defense of the use of adaptive logics, see [Batens, 2004]. 1 where Dab(Θ) stands for disjunction of abnormalities, i.e. the classical disjunction of all elements in the finite set of abnormalities Θ. This third rule, which adds new conditions, makes clear how defeasible steps are modeled. The idea is that if we can deductively derive the disjunction of a defeasible result B and the formulas, the truth of which would make us to withdraw B, we can defeasibly derive B on the assumption that none of these formulas is true. Apart from the set of plausible formulas P we need a mechanism that selects which defeasible results should be withdrawn. This is done by defining a marking strategy. In the adaptive logics literature, several strategies have been developed, but for our purposes it is sufficient to consider the simple strategy. According to this strategy, the set of the derivable formulas or consequences D ⊆ P consists of : 1. Deductive Results: if (A, ∅) ∈ P, then (A, ∅) ∈ D 2. Unfalsified Defeasible Results: if (A, Θ) ∈ P (with Θ 6= ∅) and if for every ω ∈ Θ : (ω, ∅) 6∈ P, then (A, Θ) ∈ D 2 This representation of adaptive logics is a reinterpretation of the standard representation of adaptive logics, which is in terms of a proof theory. I made this reinterpretation for purposes of comparison with other approaches in AI. So, apart from the deductive results – which are always derivable – this strategy considers all defeasible results as derived, as long as none of the elements of their condition is deductively derived. From the definitions of the sets P and D, we can understand how adaptive logics model the non-monotonic character of defeasible reasoning. If our theory T is extended to the new theory T ′ (T ⊂ T ′ ), then we can define the corresponding sets P ′ and D′ . On the one hand, the set of plausibly derivable formulas will be monotonic (P ⊂ P ′ ), since there is no mechanism to withdraw elements from this set and it can only grow larger.3 On the other hand, we know that the set of derivable formulas is non-monotonic (D 6⊂ D′ ). It is possible that a condition of a defeasible result in D, is suddenly – in light of the new information in T ′ – deductively derivable. So, this result will not be part of D′ any more. Obviously, no deductive result will ever be revoked. 
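To make the interplay of the monotonic set P and the non-monotonic set D concrete, the following is a minimal sketch, under simplifying assumptions that are not in the paper: formulas are plain strings, the defeasible steps are listed explicitly as (conclusion, abnormalities) pairs rather than derived by the conditional rule, and the classical consequence test `entails` is a stub (plain membership) rather than a theorem prover.

```python
# Minimal sketch of the sets P (plausibly derivable) and D (derivable) from
# Section 1, under assumed simplifications: formulas are strings, defeasible
# steps are given explicitly as (conclusion, abnormalities), and `entails`
# is a stub for classical consequence (here: membership in the theory).

def build_P(theory, defeasible_steps):
    P = {(f, frozenset()) for f in theory}            # Premise Rule: (A, {}) for A in T
    for conclusion, theta in defeasible_steps:        # Conditional Rule: conditions Theta attached
        P.add((conclusion, frozenset(theta)))
    return P

def build_D(P, theory, entails=lambda T, f: f in T):
    # Simple strategy: keep (A, Theta) unless some abnormality in Theta
    # is deductively derivable from the (possibly extended) theory.
    return {(f, theta) for (f, theta) in P
            if all(not entails(theory, omega) for omega in theta)}

T = {"¬flies(Tweety)", "∀x(penguin(x) ⊃ ¬flies(x))"}
steps = [("penguin(Tweety)", {"¬penguin(Tweety)", "∀x ¬flies(x)"})]

P = build_P(T, steps)
print(build_D(P, T))                                  # hypothesis is derivable
T2 = T | {"¬penguin(Tweety)"}                         # the knowledge base grows ...
print(build_D(P, T2))                                 # ... hypothesis drops out of D, stays in P
```

In this sketch an extension of the theory never forces P to be rebuilt; only the conditions of the hypotheses one actually needs are re-checked, which is precisely the computational saving described in the text.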
This makes this kind of logics very apt to model fast growing knowledge bases.4 If one needs a previously defeasibly derived result at a certain point, we cannot be sure whether it is still valid, because there might have been several knowledge base updates in the meantime. But, since the set of plausible formulas is monotonic, we know this formula will still be in P. So, instead of recalculating the whole nonmonotonic set D after each knowledge base extension (which is the traditional approach), it is sufficient to expand the monotonic set P. Of course, in this approach, if we want to use a defeasible result at a certain stage of knowledge base expansion, we will first have to check its condition. Still, it is easily seen that a lot of repetitive re-computation is avoided, certainly in situations in which we only need a small percentage of the defeasible results at every stage of knowledge base expansion. Moreover, it is proven that if the adaptive logic is in standard format, which means that the abnormalities have a fixed logical form, the corresponding logic will have all interesting meta-theoretic properties. The logic for abduction developed in this article will be in standard format and will therefore be sound, complete, proof invariant and have the fixed-point property.5 2 Other “conditional” approaches formulas together with consistency conditions that need to be satisfied to make these formulas acceptable. The main difference with these research programs is that the abnormalities in adaptive logics are based on a fixed logical form. This means that, for instance, the logical form for abduction – explained in this paper – is the form of abnormalities for any premise set on which we want to apply abductive reasoning. Put in other words, as soon as a fully classical premise set is given, all the possible abnormalities and, therefore, all the plausible and finally derivable abductive results can be calculated. There is no element of choice. In the other approaches, the conditions of defeasible steps must be given in the premise set, which leaves an element of choice which conditions we want to add to which defeasible implications. In adaptive logics, the defeasible consequences can be derived as soon as we have a classical premise set and as soon as we have chosen the appropriate logic for the kind of reasoning we want to do (e.g. abduction). 3 The problem of multiple explanatory hypotheses in Abduction If we focus our attention now to the abductive problem, we cannot allow that the different defeasible results – the abduced hypotheses – are together in the set P. For instance, if Tweety is a non-flying bird, he may be a penguin or an ostrich. But a set containing both the formulas (penguin(T weety), Θ1 ) and (ostrich(T weety), Θ2 ) is inconsistent.6 An elegant solution to this problem is found by translating this problem to a modal framework. When we introduce a possibility operator ♦ to indicate hypotheses and the corresponding necessity operator ( =df ¬♦¬) to represent background knowledge, we evade this problem. The Tweetyexample translates, for instance, as such (for variables ranging over the domain of all birds): Background Knowledge: (∀x(penguin(x) ⊃ ¬f lies(x)), ∅) (∀x(ostrich(x) ⊃ ¬f lies(x)), ∅) (¬f lies(T weety), ∅) Plausible defeasible results: As far as I can see, two other approaches in AI have used the idea of directly adding conditions or restrictions to formulas. 
On the one hand, there is a line of research, called “Cumulative Default Reasoning”, going back to a paper of [Brewka, 1991] with the same title. On the other hand, in the area of argumentation theory, some work on defeasible logic programs (see, for instance, [Garcı́a and Simari, 2004]) is also based on 3 It is important to understand “plausible” as “initially plausible” (at the time of derivation) and not as “plausible according to our present insights”. The second definition would, of course, have led to a non-monotonic set. 4 In that way, this kind of logic can offer a solution to what [Paul, 2000] mentioned as one of the main problems of both set-coverbased and some logic-based approaches to abduction. 5 For an overview of the generic proofs of these properties, see [Batens, 2007]. 2 (♦penguin(T weety), Θ1 ) (♦ostrich(T weety), Θ2 ) So, with this addition the sets P and D are consistent again. Though, in this situation it is not really necessary to maintain the modal operators, because we can quite easily make a translation to a hierarchical set-approach, by borrowing some ideas of the Kripke-semantics for modal logics.7 In these semantics, a hypothesis is said to be true in a possible world that is accessible from the world in which the hypothesis is stated, while necessities are true in all accessible worlds. 6 At this point, we make abstraction of the exact conditions. The details of the conditions will be explained below. 7 It is important to remember that we are constructing a syntactical representation, not a semantics for the underlying logic. If we define now a world(-set) as the set of formulas assigned to that world, we can finish our translation from modalities to sets. We define the actual world w as the set of all formulas of the knowledge base and all deductive consequences. The elements of the set w are the only formulas that have a -operator in our modal logic, and are thus the only elements that will be contained in every world-set in our system. Subsequently, for every abduced hypothesis we define a new world-set that contains it. This world is hierarchically directly beneath the world from which the formula is abduced. This new set contains further the formulas of all the world-sets hierarchically above, and will be closed under deduction. To make this hierarchy clear, we will use the names w1 , w2 , . . . for the worlds containing hypotheses directly abduced from the knowledge base, w1.1 , w1.2 , . . . , w2.1 , . . . for hypotheses abduced from a first-level world, etc. With this translation in mind, we can omit the modal operators and just keep for every formula track of the hierarchically highest world-set that contains it. So, our Tweety example can be respresented as such: (∀x(penguin(x) ⊃ ¬f lies(x)), ∅) (∀x(ostrich(x) ⊃ ¬f lies(x)), ∅) (¬f lies(T weety), ∅) (penguin(T weety), Θ1 ) (ostrich(T weety), Θ2 ) w w w w1 w2 Since the hierarchical system of sets wi is equivalent to the set P (the plausibly derivable results) of a logic for abduction, the definition of the set D (of this logic) can be applied to this system of sets too. It is clear that only the deductive consequences – the only formulas with an empty condition – will be the formulas in the set w. Further, since all formulas in a world-set have the same conditions, i.e. 
the condition of the hypothesis for which the world is created, the definition of D does not only select on the level of the formulas, but actually also on the level of the world-sets.8 Put in other words, D selects a subsystem of the initial system of hierarchically ordered sets. The different sets in this subsystem are equivalent with what [Flach and Kakas, 2000] called abductive extensions of some theory. In this way, the logic can handle mutually contradictory hypotheses,9 without the risk that any set of formulas turns out to be inconsistent. 4 now use this set representation to reformulate the syntax of the logic MLAs , which is previously developed in [Gauderis, 2011].10 This adaptive logic, the name of which stands for Modal Logic for Abduction, is an adaptive logic designed to handle contradictory hypotheses in abduction. The reformulation in terms of sets is performed with the goal to integrate the adaptive approach with other AI-approaches. First we need to define the abductive problem in a formal way. Definition 1. An abductive system T is a triple (H, O, d) of the following three sets • a set of clauses H of the form ∀x(A1 (α) ∧ . . . ∧ An (α) ⊃ B(α)) with A1 (α), . . . , An (α), B(α) literals and α ranging over d. • a set of observations O of the form C(γ) with C a literal and a constant γ ∈ d. • a domain d of constants. All formulas are closed formulas defined over a standard predicative first order logic. Furthermore, the notation does not imply that predicates should be of rank 1. Predicates can have any rank, the only preliminaries are that in the clauses all Ai and B share a common variable, and that the observations have at least one variable that is replaced by a constant. Obviously, for predicates of higher rank, extra quantifiers for the other variables need to be added to make sure that all formulas are closed. Definition 2. The background knowledge or actual world w of an abductive system T = (H, O, d) is the set w = {(P, ∅) | H ∪ O ⊢ P } Since it was the goal of an adaptive logic-approach to implement directly defeasible reasoning steps, we will consider instances of the Peircean schema for abduction [Peirce, 1960, 5.171]: The surprising fact, C is observed; But if A were true, C would be a matter of course, Hence, there is reason to suspect that A is true. When we translate his schema to the elements of T = (H, O, d), we get the following schema: ∀x(A1 (α) ∧ . . . ∧ An (α) ⊃ B(α)) B(γ) A1 (γ) ∧ . . . ∧ An (γ) Reformulation of the abductive problem in the adaptive logics format So far, in this paper we have shown – in the first section – how we can represent the standard format of adaptive logics in terms of two sets P and D, and – in the third section – how we can cope with contradictory hypotheses by using a hierarchical system of world-sets. In this section we will 8 Strictly speaking, each world-set contains also all formulas of the world-sets hierarchically above. But since these formulas are also contained in those worlds above, no information is lost if we allow that D can select on the level of the world-sets. 9 Consider, for instance, the famous quaker/republican example: our approach will lead to two different abductive extensions, one in which Nixon will be a pacifist and another one in which he isn’t. 3 To implement this schema – better-known as the logical fallacy Affirming the Consequent – in an adaptive logic, we need to specify the logical form of the conditions that would falsify the application of this rule. 
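As a rough illustration of Definitions 1 and 2 together with the translated Peircean schema, the sketch below enumerates candidate hypotheses for the Tweety system. The class names, the restriction to unary predicates, and the simple self-explanation check are assumptions made here for brevity; they are not part of the paper's formalism.

```python
# Assumed encoding (unary predicates only) of an abductive system (H, O, d)
# and of the abductive schema: from a clause A1(x) ∧ ... ∧ An(x) ⊃ B(x) in H
# and an observation B(γ) in O, hypothesise A1(γ) ∧ ... ∧ An(γ).

from dataclasses import dataclass

@dataclass
class Clause:
    antecedent: tuple   # predicate names A1, ..., An
    consequent: str     # predicate name B

@dataclass
class AbductiveSystem:
    H: list             # clauses
    O: list             # observations as (predicate, constant) pairs
    d: set              # domain of constants

def abduce(system):
    hypotheses = []
    for pred, const in system.O:
        for clause in system.H:
            # match the observation and rule out self-explanations (B among the Ai)
            if clause.consequent == pred and pred not in clause.antecedent:
                hypotheses.append(" ∧ ".join(f"{a}({const})" for a in clause.antecedent))
    return hypotheses

tweety = AbductiveSystem(
    H=[Clause(("penguin",), "not_flies"), Clause(("ostrich",), "not_flies")],
    O=[("not_flies", "Tweety")],
    d={"Tweety"},
)
print(abduce(tweety))   # ['penguin(Tweety)', 'ostrich(Tweety)']
```

Each such hypothesis would then be placed in its own world-set, with the defeating conditions Θ attached as spelled out in the defeasible inference rule given below.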
As we can see from how the conditional inference rule is introduced in the first section, the disjunction of the hypothesis and all defeating conditions needs to be derivable from the theory. To specify these conditions, we will first overview the different desiderata for our abductions. 10 In the original article, the syntax of the logic MLAs is defined in terms of a proof theory. Obviously, it is straightforward that if the negation of the hypothesis can be derived from our background knowledge, the abduction is falsified. If we know that Tweety lives in Africa, we know that he cannot be a penguin. So, in light of this information, the hypothesis cannot longer be considered as derivable: (penguin(T weety), Θ1 ) 6∈ D. But the hypothesis still remains in the monotonic set of ‘initially’ plausible results: (penguin(T weety), Θ1 ) ∈ P. So, if we define A(α) to denote the full conjunction, Now we can define the defeasible reasoning steps. Therefore we will need a new notation, which has the purpose to lift out one element from the conjunction A1 (α) ∧ . . . ∧ An (α). This will be used to check for more parsimonious explanations. Notation 1 (A−1 i (α)). A(α) =def A1 (α) ∧ . . . ∧ An (α) the first formal condition that could falsify the defeasible step will be To avoid self-explanations we will further add the condition that A(α) and B(α) share no predicates. The reason why this condition also states the two premises of the abductive schema is because, in an adaptive logic, we can apply the conditional rule each time the disjunction is derivable. So, if we didn’t state the two premises in the abnormality, we could derive anything as a hypothesis since ⊢ A(γ) ∨ ¬A(γ) for any A(γ). But with the current form, only hypotheses for which the two premises are true can be derived. This abnormality would already be sufficient to create an adaptive logic. Still, we want to add some other defeating conditions. This could be done by replacing the abnormality by a disjunction of the already found abnormality and the other wanted conditions. Then, each time one of the conditions is derivable, the whole disjunction is derivable (by addition), and so, the formula defeated. But this result is obtained in the same way if we allow that one defeasible inference step adds more than one element to the condition instead of this complex disjunction. Hence, we will add these extra conditions in this way. A lot of times, it is stated that the abduced hypothesis must be as parsimonious as possible. One of the main reasons for this is that one has to avoid random explanations. For instance, have a look at the following example: H = {∀x(penguin(x) ⊃ ¬f lies(x))} O = {¬f lies(T weety)} d = {x | x is a bird} The following formulas are derivable from this: (∀x(penguin(x) ∧ is green(x) ⊃ ¬f lies(x)), ∅) (penguin(T weety) ∧ is green(x), Θ1 ) (is green(T weety), Θ1 ) w w1 w1 The fact that T weety is green is not an explanation for the fact that T weety doesn’t fly, nor is it something that follows from our background knowledge. Since we want to avoid that our abductions yield this kind of random hypotheses, we will add a mechanism to control that our hypothesis is the most parsimonious. A final condition that we have to add is that our observation is not a tautology. Since we use a material implication, anything could be derived as an explanation for a tautology, because ⊢ B(α) ⊃ ⊤ for any B(α). 4 : A−1 i (α) =df (A1 (α) ∧ . . . ∧ Ai−1 (α) ∧ Ai+1 (α) ∧ . . . ∧ An (α)) if n = 1 : A−1 1 (α) =df ⊤ Definition 3. 
The set of abnormalities Ω for an abductive system T is given by Ω ∀α(A1 (α) ∧ . . . ∧ An (α) ⊃ B(α)) ∧ B(γ) ∧ ¬A(γ). if n > 1 = {(∀x(A1 (α) ∧ . . . ∧ An (α) ⊃ B(α)) ∧ B(γ) ∧ ¬A(γ)) n _ ∨ ∀αB(α) ∨ ∀α(A−1 i (α) ⊃ B(α) | γ ∈ d, i=1 α ranging over d, Ai and B literals, B 6∈ {Ai }} It is easily seen that the generic conditional rule for adaptive logics – as defined in section 1 – defined by this set of abnormalities is equivalent with the following inference rule that is written in the style of the Peircean schema stated above. Definition 4. Defeasible Inference rule for Abduction ( ∀α(A1 (α) ∧ . . . ∧ An (α) ⊃ B(α)), ∅) w ( B(γ), ∅) wi ( A1 (γ) ∧ . . . ∧ An (γ), Θ) wij with wij a new world hierarchically directly beneath wi and Θ = {¬(A1 (γ) ∧ . . . ∧ An (γ)), ∀αB(α), −1 ∀α(A−1 1 (α) ⊃ B(α)), . . . , ∀α(An (α) ⊃ B(α))} So, it is possible to abduce further on hypothetical observations (and generate in that way further abductive extensions), but the implications need to be present in the background knowledge w. It is quite obvious, that if the abduced hypothesis is already abduced before (from, for instance, another implication), the resulting world-set will contain the same formulas, but with other conditions. Finally, as explained in section 1, this body of definitions is formulated in the general framework of adaptive logics. This means that we have the following property. Property 1. The logic MLAs is a fixed-point logic which has a sound and complete semantics with respect to its syntax. For the semantics and proof theory of this logic, and the proof that this logic is in the standard format of adaptive logics, we refer to [Gauderis, 2011]. For the soundness and completeness proof, we refer to the generic proof provided in [Batens, 2007] for all adaptive logics in standard format. 5 Example Motivation and comparison with other approaches In this section we will consider an elaborate example of the dynamics of this framework. The main goal is to illustrate the key advantage of this approach, i.e. that there is no longer the need to recalculate all non-monotonic results at any stage of a growing knowledge base, but that one only needs to check the non-monotonic derivability of the needed formulas at a certain stage against the monotonic plausibility. This is the main difference with other approaches to abduction such as the ones explicated in, for instance, [Paul, 2000], [Flach and Kakas, 2000] or [Kakas and Denecker, 2002]. Since these approaches focus on a fixed and not an expanding knowledge base, they require in cases of expansion a full recomputation to keep the set of derived non-monotonic results updated. It is not claimed that the adaptive approach yields better results than these other approaches in cases of a fixed knowledge base. In fact, it is an issue for future research to investigate whether the integration of the existing approaches for fixed knowledge bases with the adaptive approach does not yield better results. Since the background information is extended, we only know that all previously derived hypotheses are still in the set of plausible hypotheses P. If we want to check whether they are in the set of derivable hypotheses D, we need to check whether their conditions are derivable from this extended information or not. But – this has already been cited several times as the key advantage of this system – we don’t need to check all hypotheses. Since we don’t have any further information on the penguin case, we just leave the hypothesis (4) for what it is. 
Thus, we save a computation, because at this stage we are not planning on reasoning or communicating on the penguin hypothesis. We just want to check whether this new information is a problem for the ostrich hypothesis; and indeed, it is easily seen that (5) 6∈ D′ . Initial system T Our elaborate example will be an abductive learning situation about the observation of a nonflying bird, called Tweety. Initially, our abductive system T = (H, O, d) contains in addition to this observation only very limited background knowledge. Second Extension T ′′ At this stage, we will investigate further the penguin hypothesis and retrieve additional background information about penguins. H = {∀x(penguin(x) ⊃ ¬f lies(x)), ∀x(ostrich(x) ⊃ ¬f lies(x))} O = {¬f lies(T weety)} d = {x | x is a bird} H′′ = H′ ∪ {∀x(penguin(x) ⊃ eats f ish(x)), ∀x(on south pole(x) ∧ in wild(x) ⊃ penguin(x))} O′′ = O′ d = {x | x is a bird} The following formulas can now further be retrieved: Thus, our background knowledge contains the following formulas: (∀x(penguin(x) ⊃ ¬f lies(x)), ∅) (∀x(ostrich(x) ⊃ ¬f lies(x)), ∅) (¬f lies(T weety), ∅) w w w (1) (2) (3) w1 w2 (4) (5) with the sets Θ1 and Θ2 defined as Θ1 Θ2 = = {¬penguin(T weety), ∀x ¬f lies(x)} {¬ostrich(T weety), ∀x ¬f lies(x)} Since both implications have only one conjunct in the antecedent, their parsimony conditions – as defined in the general logical form – trivially coincide with the second condition. Since none of the conditions is deductively derivable in w, both (4) and (5) are elements of the set of derivable formulas D. First Extension T ′ At this stage, we discover that Tweety can swim, something we know ostriches can’t. H′ = H ∪ {∀x(ostrich(x) ⊃ ¬swims(x))}, O′ = O ∪ {swims(T weety)} d = {x | x is a bird} From which the following formulas can be derived: (∀x(swims(x) ⊃ ¬ostrich(x)), ∅) (¬ostrich(T weety), ∅) w w (6) (7) 5 w1 w1.1 w1.1 (8) (9) (10) with the set Θ1.1 defined as Θ1.1 And the following abductive hypotheses can be derived: (penguin(T weety), Θ1 ) (ostrich(T weety), Θ2 ) (eats f ish(T weety), Θ1 ) (on south pole(T weety), Θ1.1 ) (in wild(T weety), Θ1.1 ) = {¬(on south pole(T weety) ∧ in wild(T weety)), ∀x penguin(x), ∀x(on south pole(x) ⊃ penguin(x)), ∀x(in wild(x) ⊃ penguin(x))} Since the first element of Θ1.1 is actually a disjunction, the first condition can even be split in two. This stage is added to illustrate the other aspects of adaptive reasoning. Firstly, as (8) illustrates, there is no problem in reasoning further on previously deductively derived hypotheses. Only, to reason further, we must first check the condition of these hypotheses (This poses no problem here, because we can easily verify that (4) ∈ D′′ ). The deductively derived formula has the same conditions as the hypothesis on which it is built (and is contained in the same world). So, these results stand as long as the hypotheses on which assumption they are derived, hold. This characteristic of adaptive logics is very interesting, because it allows to derive predictions that can be tested in further investigation. In this example, we can test whether Tweety eats fish. In case this experiment fails and ¬eats f ish(T weety) is added to the observations in the next stage, the hypothesis (and all results derived on its assumption) will be falsified. Secondly, the set of conditions Θ1.1 for the formulas (9) and (10) contains now also conditions that check for parsimony. Let us illustrate their functioning with the final extension. 
Third Extension T ′′′ At this stage, we learn that even in captivity the only birds that can survive on the South Pole are penguins. In addition to that, we get to know that Tweety is held in captivity. H′′′ = H′′ ∪ {∀x(on south pole(x) ⊃ penguin(x))}, O′′′ = O′′ ∪ {¬in wild(T weety)} d = {x | x is a bird} If we now check the parsimony conditions of Θ1.1 , we see that an element of this condition can be derived from our background knowledge. This means that all formulas assigned to world w1.1 are not derivable anymore on this condition. Still, one might wonder whether this parsimony condition should not keep (9) and only withdraw (10). But, that this is not a good road is proven by the fact that in that case (10) would be falsified by the extra observation that Tweety does not live in the wild. In fact, that it was a good decision to withdraw the whole world w1.1 is illustrated by the fact that the South Pole hypothesis of (9) can also be derived from H′′′ in another world. (on south pole(T weety), Θ1.2 ) w1.2 (11) with the set Θ1.2 defined as Θ1.1 = {¬on south pole(T weety), ∀x penguin(x)} So, at the end, we find that the set D′′′ of derivable formulas consists of all formulas derivable in the worlds w, w1 and w1.2 . The formulas of w2 and w1.1 are not an element of this final set of derivable results. 6 Conclusion In this article we presented a new logic-based approach to abduction which is based on the adaptive logics program. The main advantages of this approach are : 1. Each abduced formula is presented together with the specific conditions that would defeat it. In that way, it is not necessary to check the whole system for consistency after each extension of the background knowledge. Only the formulas that are needed at a certain stage need to be checked. Furthermore, it allows for the conditions to contain additional requirements, such as parsimony. 2. In comparison with other approaches that add conditions to formulas, the conditions are here fixed by a logical form and hence only determined by the (classical) premise set. In this way, there is no element of choice in stating conditions (as, for instance, in default logics). 3. By integrating a hierarchical system of sets, it provides an intuitive representation of multiple hypotheses without causing conflicts between contradictory hypotheses. 4. It allows for further deductive and abductive reasoning on previous retrieved abduced hypotheses. 5. The approach is based on a proper sound and complete fixed point logic (MLAs ). 6 Limitations and Future Research It has been argued that these advantages make this approach apt for systems in which not all non-monotonic derivable results are needed at every stage of expansion of a knowledge base. Still, it needs to be examined whether an integration with existing systems (for a fixed knowledge base) do not yield better results. Furthermore, since the key feature of this approach is the saving of computations in expanding knowledge bases, it needs to be investigated whether there is no integration possible with assumption-based Truth Maintenance Systems (building on the ideas of [Reiter and de Kleer, 1987]). References [Batens, 2004] Diderik Batens. The need for adaptive logics in epistemology. In D. Gabbay, S. Rahman, J. Symons, and J.P. Van Bendegem, editors, Logic, Epistemology and the Unity of Science, pages 459–485. Kluwer Academic Publishers, Dordrecht, 2004. [Batens, 2007] Diderik Batens. A universal logic approach to adaptive logics. Logica Universalis, 1:221–242, 2007. 
[Brewka, 1991] Gerhard Brewka. Cumulative Default Logic. Artificial Intelligence, 50(2):183–205, 1991. [Flach and Kakas, 2000] Peter A. Flach and Antonis C. Kakas. Abductive and Inductive Reasoning: Background and Issues. In Peter A. Flach and Antonis C. Kakas, editors, Abduction and Induction. Essays on their Relation and their Integration, volume 18 of Applied Logic Series, pages 1–27. Kluwer Academic Publishers, Dordrecht, 2000. [Garcı́a and Simari, 2004] Alejandro J. Garcı́a and Guillermo R. Simari. Defeasible Logic Programming: An Argumentative Approach. Theory and Practice of Logic Programming, 4(1):95–2004, 2004. [Gauderis, 2011] Tjerk Gauderis. Modelling Abduction in Science by means of a Modal Adaptive Logic. Foundations of Science, 2011. Forthcoming. [Kakas and Denecker, 2002] Antonis Kakas and Marc Denecker. Abduction in Logic Programming. In A. Kakas and F. Sadri, editors, Computational Logic: Logic Programming and Beyond. Part I, pages 402–436. Springer Verlag, 2002. [Paul, 2000] Gabriele Paul. AI Approaches to Abduction. In Dov M. Gabbay and Rudolf Kruse, editors, Abductive Reasoning and Uncertainty Management Systems, volume 4 of Handbook of Defeasible Reasoning and Uncertainty Management Systems, pages 35–98. Kluwer Academic Publishers, Dordrecht, 2000. [Peirce, 1960] Charles S. Peirce. Collected Papers. Belknap Press of Harvard University Press, Cambridge, Massachusetts, 1960. [Reiter and de Kleer, 1987] Raymond Reiter and Johan de Kleer. Foundations of Assumption-based Truth Maintenance Systems: Preliminary Report. In Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI’87), pages 183–188, 1987. Default Reasoning about Conditional, Non-Local and Disjunctive Effect Actions Hannes Strass Institute of Computer Science University of Leipzig [email protected] Abstract objects. In the presence of a (simple) state default expressing that objects are to be considered not broken unless there is information to the contrary, this could lead to the following reasoning: After dropping an object x of which nothing further is known, we can apply the default and infer it is not broken. But this means it cannot have been fragile before (since otherwise it would be broken). This line of reasoning violates the principle of causality: while a fragile object will be broken after dropping it, this does not mean that objects should be assumed not fragile before dropping them. We will formally define when such undesired inferences arise and devise a modification to the basic framework that provably disables them. Interestingly, the counterintuitive consequences occur already with conditional, local-effect actions; our modification however prevents them also for actions with nondeterministic, non-local effects. Since the introduction of effect preconditions represents our most significant change, we will prove that it is a proper generalisation of the original framework: for all action default theories with only unconditional, local effect actions, the “old” and “new” approach yield the same results. For the subsequent extensions it will be straightforward to see that they are proper generalisations. The paper proceeds as follows. In the next section, we provide the necessary background. The sections thereafter extend the basic approach introduced in [Baumann et al., 2010] by conditional effects (Section 3), non-local effects (Section 4) and disjunctive effects (Section 5). 
In the penultimate section, we prove several desirable properties of the extended framework; Section 7 discusses related work and concludes. Recently, Baumann et al. [2010] provided a comprehensive framework for default reasoning about actions. Alas, the approach was only defined for a very basic class of domains where all actions have mere unconditional, local effects. In this paper, we show that the framework can be substantially extended to domains with action effects that are conditional (i.e. are context-sensitive to the state in which they are applied), non-local (i.e. the range of effects is not pre-determined by the action arguments) and even disjunctive (thus nondeterministic). Notably, these features can be carefully added without sacrificing important nice properties of the basic framework, such as modularity of domain specifications or existence of extensions. 1 Introduction Reasoning about actions and non-monotonic reasoning are two important fields of logic-based knowledge representation and reasoning. While reasoning about actions deals with dynamic domains and their evolution over time, default reasoning is usually concerned with closing gaps in incomplete static knowledge bases. Both areas have received considerable attention and have reached remarkable maturity by now. However, a unifying approach that combines the full expressiveness of both fields was still lacking, until a recent paper [Baumann et al., 2010] took an important first step into the direction of uniting these two lines of research. There, a logical framework was proposed that lifted default reasoning about a domain to a temporal setting where defaults, action effects and the frame assumption interact in a well-defined way. In this paper, we develop a substantial extension of their work: we significantly generalise the theoretical framework to be able to deal with a broad class of action domains where effects may be conditional, non-local and non-deterministic. As we will show in the paper, extending the approach to conditional effects is straightforward. However, retaining their construction of defaults leads to counterintuitive conclusions. Roughly, this is due to eager default application in the presence of incomplete knowledge about action effects. As an example, consider the classical drop action that breaks fragile 2 Background 2.1 Unifying Action Calculus The unifying action calculus (UAC) was proposed in [Thielscher, 2011] to allow for a treatment of problems in reasoning about actions that is independent of a particular calculus. It is based on a finite, sorted logic language with equality which includes the sorts FLUENT, ACTION and TIME along with the predicates < : TIME × TIME, that denotes a (possibly partial) ordering on time points; Holds : FLUENT × TIME, that is used to state that a fluent is true at a given time point; and Poss : ACTION × TIME × TIME, expressing that an action is possible for given starting and ending time points. As a most fundamental notion in the UAC, a state formula 7 Φ[~s ] in ~s is a first-order formula with free TIME variables ~s where (1) for each occurrence of Holds(f, s) in Φ[~s ] we have s ∈ ~s and (2) predicate Poss does not occur. State formulas allow to express properties of action domains at given time points. Although this definition is quite general in that it allows an arbitrary finite sequence of time points, for our purposes two time points will suffice. 
For a function A into sort ACTION, a precondition axiom for A(~x) is of the form Poss(A(~x), s, t) ≡ πA [s] if its prerequisite, justification and consequent are sentences, that is, have no free variables; otherwise, it is open. The semantics of defaults is defined via the notion of extensions for default theories. A default theory is a pair (W, D), where W is a set of sentences in first-order logic and D is a set of defaults. A default theory is closed if all its defaults are closed; otherwise, it is open. For a set T of formulas, we say that a default α : β/γ is applicable to T iff α ∈ T and ¬β ∈ / T ; we say that the default has been applied to T if it is applicable and additionally γ ∈ T . Extensions for a default theory (W, D) are deductively closed sets of formulas which contain all elements of W , are closed under application of defaults from D and which are grounded in the sense that each formula in them has a non-cyclic derivation. For closed default theories this is captured by the following definition. Definition 2 (Theorem 2.1, [Reiter, 1980]). Let (W, D) be a closed default theory and E be a set of closed formulas. def def Define E0 = W and Ei+1 = Th(Ei ) ∪ Di for i ≥ 0, where   α:β def ∈ D, α ∈ Ei , ¬β ∈ /E Di = γ γ S∞ Then E is an extension for (W, D) iff E = i=0 Ei . We will interpret open defaults as schemata representing all of their ground instances. Therefore, open default theories can be viewed as shorthand notation for their closed counterparts.2 When we use an extension E or set of defaults D with an integer subscript, we refer to the Ei and Di from above. We write (W, D) |≈ Ψ to express that the formula Ψ is contained in each extension of the default theory (W, D). (1) where πA [s] is a state formula in s with free variables among s, t, ~x. The formula πA [s] thus defines the necessary and sufficient conditions for the action A to be applicable for the arguments ~x at time point s, resulting in t. The UAC also provides a general form for effect axioms; we however omit this definition because we only use a special form of effect axioms here. The last notion we import formalises how action domains are axiomatised in the unifying action calculus. Definition 1. A (UAC) domain axiomatisation consists of a finite set of foundational axioms Ω defining a time structure, a set Π of precondition axioms (1) and a set Υ of effect axioms; the latter two for all functions into sort ACTION; lastly, it contains uniqueness-of-names axioms for all finitely many function symbols into sorts FLUENT and ACTION. The foundational axioms Ω serve to instantiate the UAC by a concrete time structure, for example the branching situations with their usual ordering from the situation calculus. We restrict our attention to domains that make intuitive sense; one of the basic things we require is that actions actually consume time: A domain axiomatisation is progressing, if Ω |= (∃s : TIME)(∀t : TIME)s ≤ t and Ω ∪ Π |= Poss(a, s, t) ⊃ s < t. Here, we are only concerned with progressing domain axiomatisations; we use the macro def ¬(∃s)s < t to refer to the unique initial time point. Init(t) = For presentation purposes, we will make use of the concept of fluent formulas, where terms of sort FLUENT play the role of atomic formulas, and complex formulas can be built using the usual first-order constructors. For a fluent formula Φ, we will denote by Φ[s] the state formula that is obtained by replacing all fluent literals [¬]f in Φ by [¬]Holds(f, s). 
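Returning briefly to the default-logic background, Definition 2 characterises extensions as fixed points. One way to make that definition operational is a guess-and-check over candidate sets, sketched below under assumed simplifications (literals as strings, prefix "¬" for negation, and Th as a stub for deductive closure); it is an illustration only, not the machinery used in the paper.

```python
# Sketch of Reiter's extension check (Definition 2) under assumed
# simplifications: formulas are literal strings, negation is the prefix "¬",
# and Th is a stub for deductive closure (identity here).

from itertools import combinations

def neg(lit):
    return lit[1:] if lit.startswith("¬") else "¬" + lit

def extensions(W, defaults, Th=lambda S: set(S)):
    """defaults: triples (alpha, beta, gamma). Returns all extensions."""
    found = []
    # An extension must equal Th(W ∪ C) for some set C of default consequents.
    for r in range(len(defaults) + 1):
        for comb in combinations(defaults, r):
            E = Th(W | {g for (_, _, g) in comb})
            stage = Th(W)                          # E0
            while True:                            # Ei+1 = Th(Ei) ∪ Di, ¬β checked against E
                Di = {g for (a, b, g) in defaults if a in stage and neg(b) not in E}
                nxt = Th(stage | Di)
                if nxt == stage:
                    break
                stage = nxt
            if stage == E and E not in found:
                found.append(E)
    return found

W = {"bird(Tweety)"}
D = [("bird(Tweety)", "flies(Tweety)", "flies(Tweety)")]   # bird : flies / flies
print(extensions(W, D))   # [{'bird(Tweety)', 'flies(Tweety)'}] (element order may vary)
```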
The operator |·| will be used to extract the affirmative component of a fluent literal, that is, |¬f | = |f | = f ; the polarity of a fluent literal is given by sign(¬f ) = − and sign(f ) = +. 2.3 Default Reasoning in Action Domains with Unconditional, Local Effect Actions The approach of [Baumann et al., 2010] combines default logic with the unifying action calculus: domain axiomatisations are viewed as incomplete knowledge bases that are completed by defaults. It takes as input a description of a particular action domain with normality statements. This description comprises the following: (1) a domain signature, that defines the vocabulary of the domain; (2) a description of the direct effects of actions; (3) a set of state defaults Φ ψ, constructs that specify conditions Φ under which a fluent literal ψ normally holds in the domain.3 The state defaults from the domain description are translated into Reiter defaults, where the special predicates DefT(f, s, t) and DefF(f, s, t) are used to express that a fluent f becomes normally true (false) from s to t.4 For each state default δ, two Reiter defaults are created: δInit , that is used for default conclusions about the initial time point; and δReach , that is used for default conclusions about time points that can be reached via action application. 2.2 Default Logic Default logic as introduced by [Reiter, 1980] uses defaults to extend incomplete world knowledge. They are of the form1 α:β γ (shorthand: α : β/γ) Here, α, the prerequisite, the β, the justification, and γ, the consequent, are first-order formulas. These expressions are to be read as “whenever we know α and nothing contradicts β, we can safely conclude γ”. A default is normal if β = γ, that is, justification and consequent coincide. A default is closed 2 Free variables of formulas not in a default will however be implicitly universally quantified from the outside. 3 Here, Φ, the prerequisite, is a fluent formula; ψ, the consequent, being a fluent literal also allows to express that a fluent normally does not hold in the domain. 4 It should be noted that DefF(f, s, t) is not the same as ¬DefT(f, s, t) – the latter only means that f becomes not normally true from s to t. 1 Reiter [1980] introduces a more general version of defaults with an arbitrary number of justifications, which we do not need here. 8 Definition 3. Let δ = Φ ψ be a state default. Init(t) ∧ Φ[t] : ψ[t] ψ[t] Pre (s, t) : Def (ψ, s, t) δ def = Def (ψ, s, t) def δInit = δReach Note that a default conclusion of a state property in a noninitial state crucially depends on an action execution leading to that state. Hence, whenever it is definitely known that Holds(f, t) after Poss(a, s, t), it follows from the effect axiom that ¬DefF(f, s, t); a symmetrical argument applies if ¬Holds(f, t). This means that definite knowledge about a fluent inhibits the opposite default conclusion. But observe that the addition of DefT and DefF as “causes” to the effect axiom weakened the solution to the frame problem established earlier. The following definition ensures that the persistence assumption is restored in its full generality. Definition 5. Let ∆ be a set of state defaults, ψ be a fluent literal and s, t be variables of sort TIME. 
The default closure axiom for ψ with respect to ∆ is   ^  (11) ¬PreΦ ψ (s, t) ⊃ ¬Def (ψ, s, t) (2) (3) def Preδ (s, t) = Φ[t] ∧ ¬(Φ[s] ∧ ¬ψ[s])  DefT(ψ, s, t) if ψ = |ψ| def Def (ψ, s, t) = DefF(|ψ| , s, t) otherwise For a set ∆ of state defaults, the corresponding defaults are def def ∆Init = {δInit | δ ∈ ∆} and ∆Reach = {δReach | δ ∈ ∆}. For the Reach defaults concerning two time points s, t connected via action application, we ensure that the state default δ was not violated at the starting time point s by requiring ¬(Φ[s] ∧ ¬ψ[s]) in the prerequisite.5 The consequent is then inferred unless there is information to the contrary. Being true (or false) by default is then built into the effect axiom by accepting it as a possible “cause” to determine a fluent’s truth value. The other causes are the ones already known from monotonic formalisms for reasoning about actions: direct action effects, and a notion of persistence that provides a solution to the frame problem [McCarthy and Hayes, 1969]. Definition 4. Let f : FLUENT and s, t : TIME be variables. The following macros express that f persists from s to t: def FrameT(f, s, t) = Holds(f, s) ∧ Holds(f, t) def FrameF(f, s, t) = ¬Holds(f, s) ∧ ¬Holds(f, t) Φ For a fluent literal ψ not mentioned as a consequent in ∆ the default closure axiom is just ⊤ ⊃ ¬Def (ψ, s, t). Given a domain axiomatisation Σ and a set ∆ of state defaults, we denote by Σ∆ the default closure axioms with respect to ∆ and the fluent signature of Σ. The fundamental notion of the solution to the state default problem by [Baumann et al., 2010] is now a default theory where the incompletely specified world consists of a UAC domain axiomatisation augmented by suitable default closure axioms. The default rules are the automatic translations of user-specified, domain-dependent state defaults. For a domain axiomatisation Σ and a set ∆ of state defaults, the corresponding domain axiomatisation with state defaults is the pair (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ). We use a well-known example domain [Reiter, 1991] to illustrate the preceding definitions. To ease the presentation, in this example we instantiate the UAC to the branching time structure of situations. Example 1 (Breaking Objects). Imagine a robot that can move around and carry objects, among them a vase. When the robot drops an object x, it does not carry x any more and additionally x is broken. Usually, however, objects are not broken unless there is information to the contrary. The fluents that we use to describe this domain are Carries(x) (the robot carries x) and Broken(x) (x is broken); the only function of sort ACTION is Drop(x). Dropping an object is possible if and only if the robot carries the object: (4) (5) Let A be a function into sort ACTION and ΓA be a set of fluent literals with free variables in ~x that denote the positive and negative direct effects of A(~x), respectively. 
The following pair of macros expresses that f is a direct effect of A(~x): _ def DirectT(f, A(~x), s, t) = f = F (~x′ ) (6) F (~ x′ )∈ΓA , ~ x′ ⊆~ x def DirectF(f, A(~x), s, t) = _ ¬F (~ x′ )∈Γ A f = F (~x′ ) (7) ,~ x′ ⊆~ x An effect axiom with unconditional effects, the frame assumption and normal state defaults is of the form Poss(A(~x), s, t) ⊃ (∀f )(Holds(f, t) ≡ CausedT(f, A(~x), s, t)) ∧ (∀f )(¬Holds(f, t) ≡ CausedF(f, A(~x), s, t)) Poss(Drop(x), s, t) ≡ Holds(Carries(x), s) ∧ t = Do(Drop(x), s) The effects of dropping an object x are given by the set ΓDrop(x) = {¬Carries(x), Broken(x)} (8) where def CausedT(f, A(~x), s, t) = DirectT(f, A(~x), s, t) ∨ FrameT(f, s, t) ∨ DefT(f, s, t) def CausedF(f, A(~x), s, t) = DirectF(f, A(~x), s, t) ∨ FrameF(f, s, t) ∨ DefF(f, s, t) ψ∈∆ The set of state defaults ∆break = {⊤ ¬Broken(x)} says that objects are normally not broken. Applying the definitions from above to this specification results in the domain axbreak break iomatisation with defaults (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Reach ), break where Σ contains effect axiom (8) and the above precondition axiom for Drop, the set ∆break Init contains only Init(t) : ¬Holds(Broken(x), t) ¬Holds(Broken(x), t) (9) (10) 5 The reason for this is to prevent application of initially definitely violated state defaults through irrelevant actions. A default violation occurs when the prerequisite Φ[s] of a state default δ is known to be met, yet the negation of the consequent prevails, ¬ψ[s]. 9 and the defaults ∆break Reach for action application consist of ¬Holds(Broken(x), s) : DefF(Broken(x), s, t) DefF(Broken(x), s, t) Finally, the default closure axioms for the fluent Broken are Holds(Broken(x), s) ⊃ ¬DefF(Broken(x), s, t) and ¬DefT(Broken(x), s, t), and ¬Def (ψ, s, t) for all other fluent def Do(Drop(Vase), S0 ), the default literals ψ. With S1 = theory sanctions the sceptical conclusions that the vase is initially not broken, but is so after dropping it: Definition 7. Let ε = Φ/ψ be a conditional effect expression and f : FLUENT and s, t : TIME be variables. The following macro expresses that ε has been activated for f from s to t:7 def Activatedε (f, s, t) = (f = |ψ| ∧ Φ[s]) Let A be a function into sort ACTION with a set of conditional effect expressions ΓA(~x) that is local-effect. The direct positive and negative effect formulas for A(~x) are _ DirT(f, A(~x), s, t) ≡ Activatedε (f, s, t) (12) break break (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Reach ) |≈ ¬Holds(Broken(Vase), S0 ) ∧ Holds(Broken(Vase), S1 ) One of the main theoretical results of [Baumann et al., 2010] was the guaranteed existence of extensions for the class of domain axiomatisations with defaults considered there. As we will see later on, a similar result holds for our generalisation of the theory. Proposition 1 (Theorem 4, [Baumann et al., 2010]). Let Σ be a domain axiomatisation and ∆ be a set of state defaults. Then the corresponding domain axiomatisation with state defaults (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ) has an extension. If furthermore Σ is consistent, then so are all extensions for (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ). 
3 ε∈Γ+ A(~ x) DirF(f, A(~x), s, t) ≡ _ Activatedε (f, s, t) (13) ε∈Γ− A(~ x) An effect axiom with conditional effects, the frame assumption and normal state defaults is of the form (8), where def CausedT(f, A(~x), s, t) = DirT(f, A(~x), s, t) ∨ FrameT(f, s, t) ∨ DefT(f, s, t) (14) def CausedF(f, A(~x), s, t) = DirF(f, A(~x), s, t) ∨ FrameF(f, s, t) ∨ DefF(f, s, t) (15) The only difference between the effect axioms of [Baumann et al., 2010] and the effect axioms defined here is the replacement of their macros DirectT, DirectF for unconditional direct effects with the predicates DirT, DirF for conditional effects. In the following, we will understand domain axiomatisations to contain – for each action – effect axioms of the form (8) along with the respective direct positive and negative effect formulas. To ease notation, for predicates with an obvious polarity (like DirT, DirF), we use a neutral version (like Dir) with fluent literals L, where Dir(L, a, s, t) denotes DirF(F, a, s, t) if L = ¬F for some fluent F and DirT(L, a, s, t) otherwise. While this extended definition of action effects is straightforward, it severely affects the correctness of default reasoning in the action theory: as the following example shows, one cannot naı̈vely take this updated version of the effect axioms and use the Reiter defaults as before. Example 1 (Continued). We add a unary fluent Fragile with the obvious meaning and modify the Drop action such that dropping only breaks objects that are fragile: ΓDrop(x) = {⊤/¬Carries(x), Fragile(x)/Broken(x)}. Assume that all we know is that the robot initially carries the vase, Holds(Carries(Vase), S0 ). As before, the effect axiom tells us that the robot does not carry the vase any more at S1 . Additionally, since we do not know whether the vase was fragile at S0 , there is no reason to believe that it is broken after dropping it, hence ¬Broken(Vase) still holds by default at S1 . But now, due to the presence of conditional effects, the effect axiom for Drop(Vase) clearly entails ¬Holds(Broken(Vase), S1 ) ⊃ ¬Holds(Fragile(Vase), S0 ),8 Conditional Effects We first investigate how the default reasoning framework of [Baumann et al., 2010] can be extended to conditional effect actions. As we will show, there is subtle interdependence between conditional effects and default conclusions, which requires a revision of the defaults constructed in Definition 3. We begin by formalising how to represent conditional effects in the domain specification language. Recall that in the unconditional case, action effects were just literals denoting the positive and negative effects. In the case of conditional effects, theses literals are augmented with a fluent formula that specifies the conditions under which the effect materialises. Definition 6. A conditional effect expression is of the form Φ/ψ, where Φ is a fluent formula and ψ a fluent literal. Φ/ψ is called positive if sign(ψ) = + and negative if sign(ψ) = −. For an action A and sequence of variables ~x matching A’s arity, a conditional effect expression ε is called local for A(~x) iff all free variables in ε are among ~x. Throughout the paper, we will assume given a set ΓA(~x) of conditional effect expressions for each function A into sort ACTION with matching sequence of variables ~ x. Such a set ΓA(~x) is called local-effect if all ε ∈ ΓA(~x) are local for A(~x). − By Γ+ A(~ x) we refer to the positive, by ΓA(~ x) to the negative elements of ΓA(~x) . 
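To make Definitions 6 and 7 concrete, here is a small Python sketch of conditional effect expressions and of the activation test behind the direct effect formulas (12) and (13). The representation (ground literals as strings, conjunctive conditions as sets of fluents) is an assumption made purely for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CEE:
    """Conditional effect expression Phi/psi (Definition 6), ground for simplicity."""
    condition: frozenset   # fluents that must hold at s (a conjunctive Phi)
    effect: str            # fluent literal psi, e.g. "Broken(Vase)" or "~Carries(Vase)"

def activated(e: CEE, fluent: str, state_s: set) -> bool:
    """Activated_e(f, s, t): f = |psi| and Phi[s] (Definition 7)."""
    return fluent == e.effect.lstrip("~") and e.condition <= state_s

def dir_t(fluent: str, effects: set, state_s: set) -> bool:
    """DirT: some positive expression of the action is activated for f (formula 12)."""
    return any(not e.effect.startswith("~") and activated(e, fluent, state_s)
               for e in effects)

def dir_f(fluent: str, effects: set, state_s: set) -> bool:
    """DirF: some negative expression of the action is activated for f (formula 13)."""
    return any(e.effect.startswith("~") and activated(e, fluent, state_s)
               for e in effects)

# Gamma_Drop(Vase) = { T/~Carries(Vase), Fragile(Vase)/Broken(Vase) }
gamma_drop = {CEE(frozenset(), "~Carries(Vase)"),
              CEE(frozenset({"Fragile(Vase)"}), "Broken(Vase)")}
s0 = {"Carries(Vase)", "Fragile(Vase)"}
print(dir_t("Broken(Vase)", gamma_drop, s0))   # True: the vase is fragile at s
print(dir_f("Carries(Vase)", gamma_drop, s0))  # True: unconditional negative effect
```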
With this specification of action effects, it is easy to express the implication “effect precondition implies effect” via suitable formulas. For this purpose, we introduce the new predicates DirT and DirF. Intuitively, DirT(f, a, s, t) says that f is a direct positive effect of action a from s to t; symmetrically, DirF(f, a, s, t) says that f is a direct negative effect.6 7 The second time argument t of macro Activatedε (f, s, t) will only be needed later when we introduce non-deterministic effects. 8 This is just the contrapositive of the implication expressed by the effect axiom. 6 Notice that these new predicates are in contrast to Definition 4, where DirectT and DirectF are merely syntactic sugar. 10 and thus we can draw the conclusion default only if it is known that a conflict cannot arise, that is, if it is known that the contradictory direct effect cannot materialise. To this end, we extend the original default prerequisite Preδ (s, t) = Φ[t] ∧ ¬(Φ[s] ∧ ¬ψ[s]) that only requires the precondition to hold and the default not to be violated previously: we will additionally stipulate that any action a happening at the same time cannot create a conflict. Definition 9. Let δ = Φ ψ be a state default and s, t : TIME be variables. break break (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Reach ) |≈ ¬Holds(Fragile(Vase), S0 ) This is undesired as it lets us conclude something about the present (S0 ) using knowledge about the future (S1 ) which we could not conclude using only knowledge and default knowledge about the present (there is no default that could conclude ¬Fragile(Vase)). The flaw with this inference is that it makes default conclusions about a fluent whose truth value is affected by an action at the same time. This somewhat contradicts our intended usage of defaults about states: we originally wanted to express reasonable assumptions about fluents whose values are unknown. Generalising the example, the undesired behaviour occurs whenever there exists a default ΦD ψ with conclusion ψ whose negation ¬ψ might be brought about by a conditional effect ΦC /¬ψ. The faulty inference then goes like this: def Safeδ (s, t) = (∀a)(Poss(a, s, t) ⊃ ¬Dir(¬ψ, a, s, t)) Preδ (s, t) ∧ Safeδ (s, t) : Def (ψ, s, t) def (16) δPoss = Def (ψ, s, t) def For a set ∆ of state defaults, ∆Poss = {δPoss | δ ∈ ∆}. In the example domain, applying the above definition yields the following. Example 1 (Continued). For the state default δ break saying that objects are usually not broken, we have Safeδbreak (s, t) = (∀a)(Poss(a, s, t) ⊃ ¬DirT(Broken(x), a, s, t)). This expresses that the state default can be safely applied from s to t whenever for any action a happening at the same time, it is known that a does not cause a violation of this default at the break ending time point t. The resulting default δPoss is ΦD [t] ⊃ Def (ψ, s, t) ⊃ ψ[t] ⊃ ¬Dir(¬ψ, s, t) ⊃ ¬ΦC [s] Obviously, this inference is only undesired if there is no information about the effect’s precondition at the starting time point of the action. This motivates our formal definition of the conditions under which a so-called conflict between an action effect and a default conclusion arises. Definition 8. Let (Σ, ∆) be a domain axiomatisation with defaults, E be an extension for (Σ, ∆), α be a ground action and δ = Φ ψ be a ground state default. We say that there is a conflict between α and δ in E iff there exist ground time points σ and τ such that for some i ≥ 0 we have 1. (a) Ei 6|= Poss(α, σ, τ ) ⊃ ¬Dir(¬ψ, α, σ, τ ) (b) Ei 6|= Def (ψ, α, σ, τ ) 2. 
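Under incomplete knowledge, Safe_δ(s, t) of Definition 9 demands that the conflicting direct effect is known not to materialise. The Python sketch below approximates "known not to materialise" by requiring that some conjunct of the conflicting effect's condition is known false at s; the encoding (CEE tuples, known_false_lits) is an illustrative assumption, not the paper's machinery.

```python
from collections import namedtuple

CEE = namedtuple("CEE", "condition effect")   # conditional effect expression Phi/psi

def safe(default_conseq, possible_actions, known_false_lits):
    """Sketch of Safe_delta(s, t) (Definition 9): the default with consequent psi is
    safely applicable only if, for every action possible at s, every conditional
    effect with consequent ~psi is known not to be activated, i.e. some conjunct
    of its condition is known false at s."""
    negated = default_conseq[1:] if default_conseq.startswith("~") else "~" + default_conseq
    for action, effects in possible_actions.items():
        for cond, eff in effects:
            if eff == negated and not any(l in known_false_lits for l in cond):
                return False   # the conflicting effect might still occur: not safe
    return True

drop = {"Drop(Vase)": [CEE(frozenset(), "~Carries(Vase)"),
                       CEE(frozenset({"Fragile(Vase)"}), "Broken(Vase)")]}
# Fragile(Vase) unknown at s: the default ~Broken(Vase) must not fire.
print(safe("~Broken(Vase)", drop, known_false_lits=set()))              # False
# Fragile(Vase) known false at s: no conflict can arise, the default may fire.
print(safe("~Broken(Vase)", drop, known_false_lits={"Fragile(Vase)"}))  # True
```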
(a) Ei+1 |= Poss(α, σ, τ ) ⊃ ¬Dir(¬ψ, α, σ, τ ) (b) Ei+1 |= Def (ψ, σ, τ ) In words, a conflict arises in an extension if up to some stage i, before we make the default conclusion ψ, we cannot conclude the effect ¬ψ will not occur (1); after concluding ψ by default, we infer that ¬ψ cannot occur as direct effect (2). We can now go back to the example seen earlier and verify that the counter-intuitive conclusion drawn there was indeed due to a conflict in the sense of the above definition. Example 1 (Continued). Consider the only extension E break break break for (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Reach ). Before applying any defaults whatsoever, we know that dropping the vase is possible: E0break |= Poss(Drop(Vase), S0 , S1 ); but we do not know if the vase is fragile and hence E0break 6|= ¬DirT(Broken(Vase), Drop(Vase), S0 , S1 ) (item 1). After applying all the defaults, we know that the vase is not broken at S1 : E1break |= DefF(Broken(Vase), S0 , S1 ). Hence, it cannot have been broken by dropping it in S0 , that is, E1break |= ¬DirT(Broken(Vase), Drop(Vase), S0 , S1 ) (item 2), thus cannot have been fragile in the initial situation. In the following, we will modify the definition of Reiter defaults from [Baumann et al., 2010] to eliminate the possibility of such conflicts. The underlying idea is to apply a ¬Holds(Broken(x), s) ∧ Safeδbreak (s, t) : DefF(Broken(x), s, t) DefF(Broken(x), s, t) As we will see later (Theorem 3), the default closure axioms ¬PreΦ ψ (s, t) ⊃ ¬Def (ψ, s, t) for preserving the commonsense principle of inertia in the presence of inapplicable defaults need not be modified. With our new defaults, we can now redefine the concept of a domain axiomatisation with defaults for conditional effect actions. Definition 10. Let Σ be a domain axiomatisation where the effect axioms are given by Definition 7 and let ∆ be a set of state defaults. The corresponding domain axiomatisation with defaults is the pair (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ). The direct effect formulas that determine DirT and DirF will be redefined twice in this paper. We will understand the above definition to be retrofitted with their latest version. The extension to conditional effects is a proper generalisation of the original approach of Section 2.3 for the special case of unconditional effect actions, as is shown below. Theorem 2. Consider a domain axiomatisation with only unconditional action effects and a set ∆ of state defaults. Let Ξ1 = (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ) be the corresponding domain axiomatisation with defaults of [Baumann et al., 2010], and let Ξ2 = (Σ′ ∪ Σ∆ , ∆Init ∪ ∆Poss ) be the domain axiomatisation with defaults according to Definition 10. For a state formula Ψ and time point τ , we have Ξ1 |≈ Ψ[τ ] iff Ξ2 |≈ Ψ[τ ]. Proof sketch. For unconditional effects, a ground Dir atom is by Definition 7 equivalent to the corresponding Direct macro, hence the effect axioms of the two approaches are equivalent. Furthermore, the truth values of ground DirT and DirF atoms are always fixed, and consequently each Reiter default (16) defined above is applicable whenever the original Reach default (3) of [Baumann et al., 2010] is applicable. 11 4 Non-Local Effects 2. Defaults override persistence: (A) Let Φ′′ /ψ, Φ′′ /¬ψ ∈ / Γα for all Φ′′ ; ′ ′ (B) for each δ = Φ ¬ψ ∈ ∆, let δ ′ be not applicable to E; and (C) E |= Preδ (σ, τ ) ∧ Safeδ (σ, τ ). Then E |= ψ[τ ]. 3. 
The frame assumption is correctly implemented: For all fluent formulas Φ′′ , let Φ′′ /ψ, Φ′′ /¬ψ ∈ / Γα and for all state defaults δ ′ with consequent ψ or ¬ψ, let E |= ¬Preδ′ (σ, τ ). Then E |= ψ[σ] ≡ ψ[τ ]. Up to here, conditional effect expressions for an action A(~x) were restricted to contain only variables among ~x. Considering a ground instance A(~ς) of an action, this means that the set of objects that can possibly be affected by this action is already fixed to ~ς. This is a restriction because it can make the specification of certain actions at least cumbersome or utterly impossible, for example actions that affect a vast number of (or all of the) domain elements at once. The gain in expressiveness when allowing non-local action effects comes at a relatively low cost: it suffices to allow additional free variables ~y in the conditional effect expressions. They represent the objects that may be affected by the action without being among the action arguments ~x. Definition 11. Let A be a function into sort ACTION and ~x a sequence of variables matching A’s arity. Let ε be a conditional effect expression of the form Φ/F (~x′ , ~y ) or Φ/¬F (~x′ , ~y ) with free variables ~x′ , ~y , where ~x′ ⊆ ~x and ~y is disjoint from ~x. For variables f : FLUENT and s, t : TIME, the following macro expresses that ε has been activated for f from s to t: Proof sketch. Similar to the proof of Theorem 3 in [Baumann et al., 2010], adapted to our definition of Reiter defaults. 5 Disjunctive Effects The next and final addition to effect axiom (8) is the step of generalising purely deterministic action effects. Disjunctive action effects have been studied in the past [Kartha, 1994; Shanahan, 1997; Giunchiglia et al., 1997; Thielscher, 2000]. Our contribution in this paper is two-fold. First, we express disjunctive effects by building them into the effect axiom inspired by work on nonmonotonic causal theories [Giunchiglia et al., 2004]. This works without introducing additional function symbols – called determining fluents [Shanahan, 1997] – for which persistence is not assumed and that are used to derive indeterminate effects via conditional effects. The second and more important contribution is the combination of non-deterministic effects with state defaults. We claim that it brings a significant representational advantage: Disjunctive effects can explicitly represent potentially different outcomes of an action of which none is necessarily predictable. At the same time, state defaults can be used to model the action effect that normally obtains. For example, dropping an object might not always completely break it, but most of the time only damage it. This can be modelled in our framework by specifying “broken or damaged” as disjunctive effect of the drop action, and then including the default “normally, dropped objects are damaged” to express the usual outcome. Next, we define how disjunctive effects are declared by the user and accommodated into the theory. The basic idea is to allow disjunctions of fluent literals ψ1 ∨ . . . ∨ ψn in the effect part of a direct effect expression. The intended meaning of these disjunctions is that after action execution, at least one of the effects ψi holds. Definition 12. Let Φ be a fluent formula and Ψ = ψ1 ∨ . . . ∨ ψn be a disjunction of fluent literals. The pair Φ/Ψ is called a conditional disjunctive effect expression (or cdee). Firstly, we want to guarantee that at least one effect out of ψ1 ∨ . . . ∨ ψn occurs. 
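Grounding a non-local effect expression means letting the extra variables ~y range over the whole object domain. The Python sketch below, with a hypothetical string-template encoding of the effect condition, illustrates Definition 11 on the Exploding Bomb action.

```python
def nonlocal_dir_t(fluent_name, action_args, condition, domain, state_s):
    """Sketch of activation for non-local effects (Definition 11): the extra
    variables ~y range over all domain objects, so the action has a direct
    positive effect F(x', y) for every object y whose grounded condition Phi[s]
    is satisfied."""
    affected = []
    for y in domain:
        if all(c.format(y=y, **action_args) in state_s for c in condition):
            affected.append(f"{fluent_name}({y})")
    return affected

# Exploding Bomb: Gamma_Detonate(b) = { Bomb(b) & Near(b, y) / Broken(y) }
s0 = {"Bomb(B1)", "Near(B1,Vase)", "Near(B1,Cup)"}
print(nonlocal_dir_t("Broken", {"b": "B1"},
                     condition=["Bomb({b})", "Near({b},{y})"],
                     domain=["Vase", "Cup", "Table"], state_s=s0))
# ['Broken(Vase)', 'Broken(Cup)']
```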
To this end, we say for each ψi that non-occurrence of all the other effects ψj with j 6= i is a sufficient cause for ψi to occur. We build into the effect axiom (in the same way as before) the n implications Φ[s] ∧ ¬ψ2 [t] ∧ . . . ∧ ¬ψn [t] ⊃ Caused(ψ1 , a, s, t) .. . Φ[s] ∧ ¬ψ1 [t] ∧ . . . ∧ ¬ψn−1 [t] ⊃ Caused(ψn , a, s, t) def Activatedε (f, s, t) = (∃~y )(f = F (~x′ , ~y ) ∧ Φ[s]) The direct positive and negative effect formulas are of the form (12) and (13). Note that according to this definition, free variables ~y are quantified existentially when they occur in the context Φ and universally when they occur in the consequence ψ. They thus not only express non-local effects but also non-local contexts. Example 2 (Exploding Bomb [Reiter, 1991]). In this domain, objects might get broken not by getting dropped, but because a bomb in their proximity explodes: ΓDetonate(b) = {Bomb(b) ∧ Near(b, x)/Broken(x)}. Def. 11 yields the direct effect formulas DirT(f, Detonate(b), s, t) ≡ (∃x)(f = Broken(x) ∧ Holds(Near(x, b), s)) and DirF(f, Detonate(b), s, t) ≡ ⊥. In this example, the defaults from Definition 9 also prevented conflicts possibly arising from non-local effects. We will later see that this is the case for all domains with local and non-local effect actions. Like the original framework, our extension implements a particular preference ordering between causes that determine a fluent’s truth value. This means that whenever two causes are in conflict – for example, a state default says an object is not broken, and an action effect says it is – the preferred cause takes precedence. The preferences are direct effects < default conclusions < persistence, where a < b means “a is preferred to b”. The theorem below proves that this preference ordering is indeed established. Theorem 3. Let Σ be a domain axiomatisation, ∆ be a set of state defaults, δ = Φ ψ ∈ ∆ be a state default, E be an extension for the domain axiomatisation with state defaults (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ), ϕ be a ground fluent, and E |= Poss(α, σ, τ ) for ground action α and time points σ, τ . 1. Effects override everything: Φ/(¬)ϕ ∈ Γα and E |= Φ[σ] imply E |= (¬)ϕ[τ ]. 12 mined about Damaged(x) not being among its negative effects), the default δPoss is applicable and we conclude This, together with the persistence assumption, is in effect an exclusive or where only exactly one effect occurs (given that no other effects occur simultaneously). Thus we add, for each literal, its truth as sufficient cause for itself being true: break break (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Poss ) |≈ Holds(Carries(Vase), S0 ) ∧ Holds(Damaged(Vase), S1 ) Φ[s] ∧ ψ1 [t] ⊃ Caused(ψ1 , a, s, t) .. . Φ[s] ∧ ψn [t] ⊃ Caused(ψn , a, s, t) If we now observe that the vase is broken after all – Holds(Broken(Vase), S1 ) – and add this information to the knowledge base, we will learn that this was an action effect: This makes every interpretation where at least one of the mentioned literals became true a model of the effect axiom. For the next definition, we identify a disjunction of literals Ψ = ψ1 ∨ . . . ∨ ψn with the set of literals {ψ1 , . . . , ψn }. break break (Σbreak ∪ Σbreak ∆ , ∆Init ∪ ∆Poss ) |≈ Holds(Broken(Vase), S1 ) ⊃ DirT(Broken(Vase), Drop(Vase), S0 , S1 ) Definition 13. Let ε = Φ/Ψ be a conditional disjunctive effect expression, ψ ∈ Ψ and f : FLUENT and s, t : TIME be variables. 
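The disjunctive encoding makes every completion in which at least one disjunct became true a model of the effect axiom. The following Python sketch enumerates such candidate successor states for the modified Drop action; it deliberately ignores the finer interaction with persistence and state defaults and is meant only to illustrate the "at least one effect occurs" reading.

```python
from itertools import product

def successors_disjunctive(state_s, certain_pos, certain_neg, disjunctive):
    """Sketch of the models admitted by a disjunctive effect axiom: deterministic
    effects are applied, unaffected fluents persist, and for each activated
    disjunction psi_1 v ... v psi_n at least one disjunct must be true at t;
    every such completion counts as a possible successor state."""
    base = (state_s - certain_neg) | certain_pos
    open_fluents = sorted({l.lstrip("~") for disj in disjunctive for l in disj})
    succs = []
    for bits in product([False, True], repeat=len(open_fluents)):
        t = set(base)
        for f, b in zip(open_fluents, bits):
            t.discard(f)
            if b:
                t.add(f)
        if all(any((l.lstrip("~") in t) != l.startswith("~") for l in disj)
               for disj in disjunctive):
            succs.append(t)
    return succs

# Drop(Vase) with a fragile vase: certain effects ~Carries, Dropped;
# disjunctive effect Broken v Damaged.
s0 = {"Carries(Vase)", "Fragile(Vase)"}
for t in successors_disjunctive(s0, {"Dropped(Vase)"}, {"Carries(Vase)"},
                                [["Broken(Vase)", "Damaged(Vase)"]]):
    print(sorted(t))
```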
The following macro expresses that effect ψ of cdee ε has been activated for f from s to t: Furthermore, the observation allows us to rightly infer that the vase was fragile at S0 . It is worth noting that for a cdee Φ/Ψ with deterministic effect Ψ = {ψ}, the macro ActivatedΦ/Ψ,ψ (f, s, t) expressing activation of this effect is equivalent to ActivatedΦ/ψ (f, s, t) from Definition 7 for activation of the conditional effect; hence the direct effect formulas (17) for disjunctive effects are a generalisation of (12), the ones for deterministic effects. We have considered here only local non-deterministic effects to keep the presentation simple. Of course, the notion can be extended to non-local effects without harm. def Activatedε,ψ (f, s, t) =  f = |ψ| ∧ Φ[s] ∧  ^ ψ ′ ∈Ψ\{ψ}   ¬ψ ′ [t] ∨ ψ[t] Let A be a function into sort ACTION and ΓA be a set of conditional disjunctive effect expressions with free variables in ~x that denote the direct conditional disjunctive effects of A(~x). The direct positive and negative effect formulas are _ DirT(f, A(~x), s, t) ≡ Activatedε,ψ (f, s, t) (17) 6 We have already seen in previous sections that the approach to default reasoning about actions presented here has certain nice properties: it is a generalisation of the basic approach [Baumann et al., 2010] and it implements a particular preference ordering among causes. While those results were mostly straightforward adaptations, the theorem below is novel. It states that conflicts between conditional effects and default conclusions in the sense of Definition 8 cannot occur. Theorem 4. Let (Σ, ∆) be a domain axiomatisation with defaults, E be an extension for (Σ, ∆) and δ = Φ ψ be a state default. Furthermore, let i ≥ 0 be such that Def (ψ, σ, τ ) ∈ / Ei and Def (ψ, σ, τ ) ∈ Ei+1 . Then for all ground actions α, Poss(α, σ, τ ) ⊃ ¬Dir(¬ψ, α, σ, τ ) ∈ Ei . Φ/Ψ∈ΓA(~ x) , ψ∈Ψ, sign(ψ)=+ DirF(f, A(~x), s, t) ≡ _ Activatedε,ψ (f, s, t) Properties of the Extended Framework (18) Φ/Ψ∈ΓA(~ x) , ψ∈Ψ, sign(ψ)=− The implementation of the example sketched above illustrates the definition. Example 1 (Continued). We once again modify the action Drop(x). Now a fragile object that is dropped becomes not necessarily completely broken, but might only get damaged. To this end, we record in the new fluent Dropped(x) that the object has been dropped and write the state default δ = Dropped(x) Damaged(x) saying that dropped objects are usually damaged. Together, these two express the normal outcome of the action drop. Formally, the action effects are ΓDrop(x) = { ⊤/¬Carries(x), ⊤/Dropped(x), Fragile(x)/Broken(x) ∨ Damaged(x)}. Constructing the direct effect formulas as per Definition 13 yields Proof. According to Def. 2, we have Ei+1 = Th(Ei ) ∪ ∆i ; hence, Def (ψ, σ, τ ) ∈ Ei+1 can have two possible reasons: 1. Def (ψ, σ, τ ) ∈ Th(Ei ) \ Ei . By construction, this can only be due to effect axiom (8), more specifically, we have (1) Ei |= Caused(ψ, α, σ, τ ) ∧ ¬Frame(ψ, σ, τ ) ∧ ¬Dir(ψ, σ, τ ) and (2) Ei |= ¬Caused(¬ψ, α, σ, τ ), whence Ei |= ¬Dir(¬ψ, α, σ, τ ) proving the claim. 2. Def (ψ, σ, τ ) ∈ ∆i . By definition of δPoss in Def. 9, Preδ (σ, τ ) ∧ Safeδ (σ, τ ) ∈ Ei , whereby we can conclude Poss(α, σ, τ ) ⊃ ¬Dir(¬ψ, α, σ, τ ) ∈ Ei . 
DirT(f, Drop(x), s, t) ≡ f = Dropped(x) ∨ (f = Broken(x) ∧ Holds(Fragile(x), s) ∧ (¬Holds(Damaged(x), t) ∨ Holds(Broken(x), t))) ∨ (f = Damaged(x) ∧ Holds(Fragile(x), s) ∧ (¬Holds(Broken(x), t) ∨ Holds(Damaged(x), t))) Note that conflicts already arise with conditional, local effects; the framework however makes sure there are no conflicts even for conditional, non-local, disjunctive effects. Finally, the existence of extensions for domain axiomatisations with state defaults can still be guaranteed for the extended framework. Since the effect axiom of Drop(x) is itself not determined about the status of Broken(x) and Damaged(x) (but is deter- 13 [Denecker and Ternovska, 2007] Marc Denecker and Eugenia Ternovska. Inductive Situation Calculus. AIJ, 171(5– 6):332–360, 2007. [Giunchiglia et al., 1997] Enrico Giunchiglia, G. Neelakantan Kartha, and Vladimir Lifschitz. Representing Action: Indeterminacy and Ramifications. AIJ, 95(2):409–438, 1997. [Giunchiglia et al., 2004] Enrico Giunchiglia, Joohyung Lee, Vladimir Lifschitz, Norman McCain, and Hudson Turner. Nonmonotonic Causal Theories. AIJ, 153(1-2):49–104, 2004. [Kartha, 1994] G. Neelakantan Kartha. Two Counterexamples Related to Baker’s Approach to the Frame Problem. AIJ, 69(1–2):379–391, 1994. [Lakemeyer and Levesque, 2009] Gerhard Lakemeyer and Hector Levesque. A Semantical Account of Progression in the Presence of Defaults. In Proceedings of IJCAI, pages 842–847, 2009. [McCarthy and Hayes, 1969] John McCarthy and Patrick J. Hayes. Some Philosophical Problems from the Standpoint of Artificial Intelligence. In Machine Intelligence, pages 463–502. Edinburgh University Press, 1969. [Michael and Kakas, 2011] Loizos Michael and Antonis Kakas. A Unified Argumentation-Based Framework for Knowledge Qualification. In E. Davis, P. Doherty, and E. Erdem, editors, Proceedings of the Tenth International Symposium on Logical Formalizations of Commonsense Reasoning, Stanford, CA, March 2011. [Reiter, 1980] Raymond Reiter. A Logic for Default Reasoning. AIJ, 13:81–132, 1980. [Reiter, 1991] Raymond Reiter. The Frame Problem in the Situation Calculus: A Simple Solution (Sometimes) and a Completeness Result for Goal Regression. In Artificial Intelligence and Mathematical Theory of Computation – Papers in Honor of John McCarthy, pages 359–380. Academic Press, 1991. [Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, September 2001. [Shanahan, 1997] Murray Shanahan. Solving the Frame Problem: A Mathematical Investigation of the Common Sense Law of Inertia. The MIT Press, February 1997. [Strass and Thielscher, 2009] Hannes Strass and Michael Thielscher. Simple Default Reasoning in Theories of Action. In Proceedings of AI, pages 31–40, Melbourne, Australia, December 2009. Springer-Verlag Berlin Heidelberg. [Thielscher, 2000] Michael Thielscher. Nondeterministic Actions in the Fluent Calculus: Disjunctive State Update Axioms. In Intellectics and Computational Logic (to Wolfgang Bibel on the occasion of his 60th birthday), pages 327–345, Deventer, The Netherlands, The Netherlands, 2000. Kluwer, B.V. [Thielscher, 2011] Michael Thielscher. A Unifying Action Calculus. AIJ, 175(1):120–141, 2011. Theorem 5. Let Σ be a domain axiomatisation and ∆ be a set of state defaults. Then the corresponding domain axiomatisation with defaults (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ) has an extension. If furthermore Σ is consistent, then so are all extensions for (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ). Proof. 
Existence of an extension is a corollary of Theorem 3.1 in [Reiter, 1980] since the defaults in ∆Init ∪ ∆Poss are still normal. If Σ is consistent, then so is Σ ∪ Σ∆ by the argument in the proof of Theorem 4 in [Baumann et al., 2010]. Consistency of all extensions then follows from Corollary 2.2 in [Reiter, 1980]. Additionally, it is easy to see that the domain specifications provided by the user are still modular: different parts of the specifications, such as conditional effect expressions and state defaults, are completely independent of each other from a user’s point of view. Yet, the intricate semantic interactions between them are correctly dealt with. 7 Discussion We have presented an extension to a recently introduced framework for default reasoning in theories of actions and change. The extension increases the range of applicability of the framework while fully retaining its desirable properties: we can now express context-dependent effects of actions, actions with a potentially global effect range and indeterminate effects of actions – all the while domain descriptions have not become significantly more complex, and default extensions of the framework still provably exist. There is not much related work concerning the kind of default reasoning about actions we consider here. [Denecker and Ternovska, 2007] enriched the situation calculus [Reiter, 2001] with inductive definitions. While they provide a nonmonotonic extension of an action calculus, the intended usage is to solve the ramification problem rather than to do the kind of defeasible reasoning we are interested in this work. [Lakemeyer and Levesque, 2009] provide a progression-based semantics for state defaults in a variant of the situation calculus, but without looking at nondeterministic actions. In an earlier paper [Strass and Thielscher, 2009], we explored default effects of nondeterministic actions, albeit in a much more restricted setting: there, actions had only unconditional effects – either deterministic or disjunctive of the form f ∨ ¬f –, and defaults had only atomic components, that is, they were of the form (¬)Holds(f, t) : (¬)Holds(g, t)/(¬)Holds(g, t). Most recently, [Michael and Kakas, 2011] gave an argumentationbased semantics for propositional action theories with state defaults. While being more flexible in terms of preferences between causes, their approach is constricted to a linear time structure built into the language and does not make a clear ontological distinction between fluents and actions. References [Baumann et al., 2010] Ringo Baumann, Gerhard Brewka, Hannes Strass, Michael Thielscher, and Vadim Zaslawski. State Defaults and Ramifications in the Unifying Action Calculus. In Proceedings of KR, pages 435–444, Toronto, Canada, May 2010. 14 A Logic for Specifying Partially Observable Stochastic Domains Gavin Rens1,2 and Thomas Meyer1,2 and Alexander Ferrein3 and Gerhard Lakemeyer3 {grens,tmeyer}@meraka.org.za {ferrein,gerhard}@cs.rwth-aachen.de 1 2 CSIR Meraka Institute, Pretoria, South Africa University of KwaZulu-Natal, School of Computer Science, South Africa 3 RWTH Aachen University, Informatik, Germany Abstract with expected meanings. The robot can perceive observations only from the set Ω = {obsNil , obsLight, obsMedium, obsHeavy}. When the robot performs a weigh action (i.e., it activates its ‘weight’ sensor) it will perceive either obsLight, obsMedium or obsHeavy; for other actions, it will perceive obsNil . 
The robot experiences its environs through three Boolean features: P = {full , drank, holding} meaning respectively that the oil-can is full, that the robot has drunk the oil and that it is currently holding something in its gripper. Given a formalization K of our scenario, the robot may have the following queries: We propose a novel modal logic for specifying agent domains where the agent’s actuators and sensors are noisy, causing uncertainty in action and perception. The logic draws both on POMDP theory and logics of action and change. The development of the logic builds on previous work in which a simple multi-modal logic was augmented with first-class observation objects. These observations can then be used to represent the set of observations in a POMDP model in a natural way. In this paper, a subset of the simple modal logic is taken for the new logic, in which modal operators may not be nested. The modal operators are then extended with notions of probability. It will be shown how stochastic domains can be specified, including new kinds of axioms dealing with perception and a frame solution for the proposed logic. 1 • Is the probability of perceiving that the oil-can is light 0.7 when the can is not full, and have I drunk the oil, and am I holding the can? Does (obsLight | weigh)0.7 (¬full ∧ drank ∧ holding) follow from K? • If the oil-can is empty and I’m not holding it, is there a 0.9 probability that I’ll be holding it after grabbing it, and a 0.1 probability that I’ll have missed it? Does (¬full ∧ ¬holding) → ([grab]0.9 (¬full ∧ holding) ∧ [grab]0.1 (¬full ∧ ¬holding)) follow from K? Introduction and Motivation In order for robots and intelligent agents in stochastic domains to reason about actions and observations, they must first have a model of the domain over which to reason. For example, a robot may need to represent available knowledge about its grab action in its current situation. It may need to represent that when ‘grabbing’ the oil-can, there is a 5% chance that it will knock over the oil-can. As another example, if the robot has access to information about the weight of an oil-can, it may want to represent the fact that the can weighs heavy with a 90% chance in ‘situation A’, but that it is heavy with a 98% chance in ‘situation B’. Logic-based artificial intelligence for agent reasoning is well established. In particular, a domain expert choosing to represent domains with a logic can take advantage of the progress made in cognitive robotics [Levesque and Lakemeyer, 2008] to specify domains in a compact and transparent manner. Modal logic is considered to be well suited to reasoning about beliefs and changing situations. POMDP theory has proven to be a good general framework for formalizing dynamic, stochastic systems. A drawback of traditional POMDP models is that they cannot include information about general facts and laws. Moreover, succinct axioms describing the dynamics of a domain cannot be writ- In the physical real world, or in extremely complex engineered systems, things are not black-and-white. We live in a world where there can be shades of truth and degrees of belief. Part of the problem is that agents’ actuators and sensors are noisy, causing uncertainty in their action and perception. In this paper, we propose a novel logic that draws on partially observable Markov decision process (POMDP) theory and on logics for reasoning about action and change, combining both in a coherent language to model change and uncertainty. 
Imagine a robot that is in need of an oil refill. There is an open can of oil on the floor within reach of its gripper. If there is nothing else in the robot’s gripper, it can grab the can (or miss it, or knock it over) and it can drink the oil by lifting the can to its ‘mouth’ and pouring the contents in (or miss its mouth and spill). The robot may also want to confirm whether there is anything left in the oil-can by weighing its contents. And once holding the can, the robot may wish to place it back on the floor. In situations where the oil-can is full, the robot gets 5 units of reward for grabbing the can, and it gets 10 units for a drink action. Otherwise, the robot gets no rewards. Rewards motivate an agent to behave as desired. The domain is (partially) formalized as follows. The robot has the set of actions A = {grab, drink, weigh, replace} 15 and Schmolze, 2005; Sanner and Kersting, 2010; Poole, 1998]. But for two of these, the frameworks are not logics per se. The first [Wang and Schmolze, 2005] is based on Functional STRIPS, “which is a simplified first-order language that involves constants, functions, and predicate symbols but does not involve variables and quantification”. Their representations of POMDPs are relatively succinct and they have the advantage of using first-order predicates. The STRIPS-like formalism is geared specifically towards planning, though, and their work does not mention reasoning about general facts. Moreover, in their approach, action-nondeterminism is modeled by associating sets of deterministic action-outcomes per nondeterministic action, whereas SLAOP will model nondeterminism via action effects—arguably, ours is a more natural and succinct method. Sanner and Kersting [2010] is similar to the first formalism, but instead of Functional STRIPS, they use the situation calculus to model POMDPs. Although reified situations make the meaning of formulae perspicuous, and reasoning with the situation calculus, in general, has been accepted by the community, when actions are nondeterministic, ‘action histories’ cause difficulties in our work: The set of possible alternative histories is unbounded and some histories may refer to the same state [Rens, 2010, Chap. 6]. When, in future work, SLAOP is extended to express belief states (i.e., sets of possible alternative states), dealing with duplicate states will be undesirable. The Independent Choice Logic [Poole, 1998] is relatively different from SLAOP; it is an extension of Probabilistic Horn Abduction. Due to its difference, it is hard to compare to SLAOP, but it deserves mentioning because it shares its application area with SLAOP and both are inspired by decision theory. The future may tell which logic is better for certain representations and for reasoning over the representations. Finally, SLAOP was not conceived as a new approach to represent POMDPs, but as the underlying specification language in a larger meta-language for reasoning robots that include notions of probabilistic uncertainty. The choice of POMDPs as a semantic framework is secondary. ten in POMDP theory. In this work, we develop a logic that will further our goal of combining modal logic with POMDP theory. That is, here we design a modal logic that can represent POMDP problems specifically for reasoning tasks in cognitive robotics (with domain axioms). The logic for actual decision-making will be developed in later work. 
To facilitate the correspondence between POMDPs and an agent logic, we require observation objects in the logic to correspond to the POMDPs’ set of observations. Before the introduction of the Logic of Actions and Observations (LAO) [Rens et al., 2010], no modal logic had explicit observations as first-class elements; sensing was only dealt with via special actions or by treating actions in such a way that they somehow get hold of observations. LAO is also able to accommodate models of nondeterminism in the actions and models of uncertainty in the observations. But in LAO, these notions are non-probabilistic. In this paper we present the Specification Logic of Actions and Observations with Probability (SLAOP). SLAOP is derived from LAO and thus also considers observations as firstclass objects, however, a probabilistic component is added to LAO for expressing uncertainty more finely. We have invented a new knowledge representation framework for our observation objects, based on the established approaches for specifying the behavior of actions. We continue our motivation with a look at the related work, in Section 2. Section 3 presents the logic and Section 4 provides some of the properties that can be deduced. Section 5 illustrates domain specification with SLAOP, including a solution to the frame problem. Section 6 concludes the paper. 2 Related Work Although SLAOP uses probability theory, it is not for reasoning about probability; it is for reasoning about (probabilistic) actions and observations. There have been many frameworks for reasoning about probability, but most of them are either not concerned with dynamic environments [Fagin and Halpern, 1994; Halpern, 2003; Shirazi and Amir, 2007] or they are concerned with change, but they are not actually logics [Boutilier et al., 2000; Bonet and Geffner, 2001]. Some probabilistic logics for reasoning about action and change do exist [Bacchus et al., 1999; Iocchi et al., 2009], but they lack some desirable attributes, for example, a solution to the frame problem, nondeterministic actions, or catering for sensing. There are some logics that come closer to what we desire [Weerdt et al., 1999; Van Diggelen, 2002; Gabaldon and Lakemeyer, 2007; Van Benthem et al., 2009], that is, they are modal and they incorporate notions of probability, but they were not created with POMDPs in mind and they don’t take observations as first-class objects. One nonlogical formalism for representing POMDPs [Boutilier and Poole, 1996] exploits structure in the problems for more compact representations. In (logic-based) cognitive robotics, such compact representation is the norm, for example, specifying only local effects of actions, and specifying a value related to a set of states in only one statement. On the other hand, there are three formalisms for specifying POMDPs that employ logic-based representation [Wang 3 Specification Logic of Actions and Observations with Probability SLAOP is a non-standard modal logic for POMDP specification for robot or intelligent agent design. The specification of robot movement has a ‘single-step’ approach in SLAOP. As such, the syntax will disallow nesting of modal operators; sentences with sequences of actions, like [grab][drink][replace]drank are not allowed. Sentences will involve at most unit actions, like [grab]holding ∨ [drink]drank. Nevertheless, the ‘single-step’ approach is sufficient for specifying the probabilities of transitions due to action executions. 
The logic to be defined in a subsequent paper will allow an agent to query the probability of some propositional formula ϕ after an arbitrary sequence of actions and observations. 16 3.1 Syntax state-transition function, representing, for each action, transition probabilities between states; R is the reward function, giving the expected immediate reward gained by the agent, for any state and agent action; Ω is a finite set of observations the agent can experience of its environment; and O is the observation function, giving, for each action and the resulting state, a probability distribution over observations, representing the agent’s ‘trust’ in its observations. Our semantics follows that of multi-modal logic K. However, SLAOP structures are non-standard in that they are extensions of structures with the form hW, Ri, where W is a finite set of worlds such that each world assigns a truth value to each atomic proposition, and R is a binary relation on W . Intuitively, when talking about some world w, we mean a set of features (fluents) that the agent understands and that describes a state of affairs in the world or that describes a possible, alternative world. Let w : P 7→ {0, 1} be a total function that assigns a truth value to each fluent. Let C be the set of all possible functions w. We call C the conceivable worlds. The vocabulary of our language contains four sorts: 1. a finite set of fluents (alias propositional atoms) P = {p1 , . . . , pn }, 2. a finite set of names of atomic actions A {α1 , . . . , αn }, = 3. a finite set of names of atomic observations Ω = {ς1 , . . . , ςn }, 4. a countable set of names Q = {q1 , q2 , . . .} of rational numbers in Q. From now on, denote Q ∩ (0, 1] as Q∩ . We refer to elements of A ∪ Ω ∪ Q as constants. We are going to work in a multimodal setting, in which we have modal operators [α]q , one for each α ∈ A, and predicates (ς | α)q and (ς | α)✸ , for each pair in Ω × A. Definition 3.1 Let α, α′ ∈ A, ς, ς ′ ∈ Ω, q ∈ (Q ∩ (0, 1]), r, c ∈ Q and p ∈ P. The language of SLAOP, denoted LSLAOP , is the least set of Φ defined by the grammars: Definition 3.2 A SLAOP structure is a tuple S hW, R, O, N, Q, U i such that ϕ ::= p | ⊤ | ¬ϕ | ϕ ∧ ϕ. Φ ::= ϕ | [α]q ϕ | (ς | α)q | (ς | α)✸ | α = α′ | = 1. W ⊆ C: the set of possible worlds (corresponding to S); ς = ς ′ | Reward(r) | Cost(α, c) | ¬Φ | Φ ∧ Φ. 2. R: a mapping that provides an accessibility relation Rα : W × W × Q∩ for each α ∈ A (correspondingPto T ); Given some w− ∈ W , we require that (w− ,w+ ,pr)∈Rα pr = 1; If (w− , w+ , pr), (w− , w+ , pr′ ) ∈ Rα , then pr = pr′ ; As usual, we treat ⊥, ∨, → and ↔ as abbreviations. We shall refer to formulae ϕ ::= p | ⊤ | ¬ϕ | ϕ ∧ ϕ as static. If a formula is static, it mentions no actions and no observations. [α]q ϕ is read ‘The probability of reaching a world in which ϕ holds after executing α, is equal to q’. [α] abbreviates [α]1 . hαiϕ abbreviates ¬[α]¬ϕ. (ς | α)q can be read ‘The probability of perceiving ς is equal to q, given α was performed’. (ς | α)✷ abbreviates (ς | α)1 . (ς | α)✸ is read ‘It is possible to perceive ς’, given α was performed’. The definition of a POMDP reward function R(a, s) may include not only the expected rewards for being in the states reachable from s via a, but it may deduct the cost of performing a in s. To specify rewards and execution costs in SLAOP, we require Reward and Cost as special predicates. 
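For readers who prefer an operational view, a SLAOP structure in the sense of Definition 3.2 can be prototyped as a small data structure whose well-formedness conditions (the probability sums of items 2 and 5 and the observation-per-action condition of item 7) become runtime checks. The Python sketch below does this for a fragment of the oil-can scenario; the class name, the dictionary encoding of the relations, and the particular reward and cost numbers are illustrative assumptions, not part of the logic.

```python
class SLAOPStructure:
    """Minimal sketch of a SLAOP structure <W, R, O, N, Q, U>: worlds are
    frozensets of true fluents, R[a][w] is a probability distribution over
    successor worlds, Q[a][w'] a distribution over observation names in the
    reached world, Re the reward function and Co the action cost function."""
    def __init__(self, worlds, R, Q, Re, Co):
        self.worlds, self.R, self.Q, self.Re, self.Co = worlds, R, Q, Re, Co

    def check(self):
        # Transition probabilities out of each world sum to 1 (item 2).
        for a, rel in self.R.items():
            for w, dist in rel.items():
                assert abs(sum(dist.values()) - 1.0) < 1e-9
        # Every reached world has observations whose probabilities sum to 1
        # (items 5 and 7).
        reached = {(a, w2) for a, rel in self.R.items()
                   for dist in rel.values() for w2 in dist}
        for a, w2 in reached:
            dist = self.Q[a].get(w2, {})
            assert dist and abs(sum(dist.values()) - 1.0) < 1e-9

# Tiny fragment of the oil-can scenario (illustrative reward/cost figures).
w0 = frozenset({"full"})                 # full, not holding
w1 = frozenset({"full", "holding"})      # grab succeeded
w2 = frozenset()                         # knocked over: not full, not holding
S = SLAOPStructure(
    worlds={w0, w1, w2},
    R={"grab": {w0: {w1: 0.7, w2: 0.2, w0: 0.1}}},
    Q={"grab": {w0: {"obsNil": 1.0}, w1: {"obsNil": 1.0}, w2: {"obsNil": 1.0}}},
    Re={w0: 0, w1: 5, w2: 0},
    Co={("grab", w0): 2},
)
S.check()
print("structure satisfies the probability constraints")
```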
Reward(r) can be read ‘The reward for being in the current situation is r units,’ and we read Cost(α, c) as ‘The cost for executing α is c units.’ Let VA = {v1α , v2α , . . .} be a countable set of action variables and VΩ = {v1ς , v2ς , . . .} a countable set of observation α α variables. Let ϕ|vα1 ∧ . . . ∧ ϕ|vαn be abbreviated by (∀v α )ϕ, where ϕ|vc means ϕ with all variables v ∈ (VA ∪ VΩ ) appearing in it replaced by constant c of the right sort (action or observation). Quantification over observations is similar to that for actions; the symbol ∃ is also available for abbreviation, with the usual meaning. 3. O: a nonempty finite set of observations (corresponding to Ω); 4. N : Ω 7→ O is a bijection that associates to each name in Ω, a unique observation in O; 5. Q: a mapping that provides a perceivability relation Qα : O × W × Q∩ for each α ∈ A (correspond+ ing that P to O); Given some w ∈ + W , we require + ′ pr = 1; If (ς, w , pr), (ς, w , pr ) ∈ + (o,w ,pr)∈Qα Qα , then pr = pr′ ; 6. U : a pair hRe, Coi (corresponding to R), where Re : W 7→ Q is a reward function and Co is a mapping that provides a cost function Coα : W 7→ Q for each α ∈ A; 7. Observation-per-action condition: For all α ∈ A, if (w, w′ , prα ) ∈ Rα , then there is an o ∈ O s.t. (o, w′ , pro ) ∈ Qα ; 8. Nothing-for-nothing condition: For all w, if there exists no w′ s.t. (w, w′ , pr) ∈ Rα for some pr, then Coα (w) = 0. A corresponds to A and Ω to Ω. Rα defines which worlds w+ are accessible via action α performed in world w− and the transition probability pr ∈ Q∩ . Qα defines which observations o are perceivable in worlds w+ accessible via action α and the observation probability pr ∈ Q∩ . We prefer to exclude relation elements referring to transitions that cannot occur, hence why pr ∈ Q∩ and not pr ∈ Q ∩ [0, 1]. 3.2 Semantics While presenting our semantics, we show how a POMDP, as defined below, can be represented by a SLAOP structure. A POMDP [Kaelbling et al., 1998] (for our purposes) is a tuple hS, A, T , R, Ω, Oi, where S is a finite set of states that the agent can be in; A is a finite set of agent actions; T is the 17 Because N is a bijection, it follows that |O| = |Ω| (we take |X| to be the cardinality of set X). The value of the reward function Re(w) is a rational number representing the reward an agent gets for being in or getting to the world w. It must be defined for each w ∈ W . The value of the cost function Co(α, w− ) is a rational number representing the cost of executing α in the world w− . It must be defined for each action α ∈ A and each w− ∈ W . Item 7 of Definition 3.2 implies that actions and observations always appear in pairs, even if implicitly. And item 8 seems reasonable; it states that any action that is inexecutable in world w incurs no cost for it in the world w. Proposition 4.1 Assume an arbitrary structure S and some w in S. Assume S, w |= [α]q θ ∧ [α]q′ ψ. Then 1. if q = q ′ then no deduction can be made; 2. if q 6= q ′ then S, w |= hαi¬(θ ↔ ψ); 3. if q > q ′ then S, w |= hαi¬(θ → ψ); 4. if q + q ′ > 1 then S, w |= hαi(θ ↔ ψ); 5. S, w |= [α]¬(θ ∧ ψ) → [α]q+q′ (θ ∨ ψ); 6. if q = 1 then S, w |= [α](ψ → θ) and S, w |= [α]q′ (θ ∧ ψ); 7. S, w |= [α]q ⊤ is a contradiction if q < 1; 8. S, w |= [α]1−q ¬ϕ iff S, w |= [α]q ϕ and q 6= 1; 9. S, w |= ¬[α]1−q ¬ϕ iff S, w |= ¬[α]q ϕ and q 6= 1. Definition 3.3 (Truth Conditions) Let S be a SLAOP structure, with α, α′ ∈ A, ς, ς ′ ∈ Ω, q ∈ (Q ∩ (0, 1]) or Q∩ as applicable, and r ∈ Q or Q as applicable. Let p ∈ P and let ϕ be any sentence in LSLAOP . 
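Truth condition 7 is a finite sum over α-successors and condition 8 a lookup in Q_α, so both can be evaluated directly. Below is a minimal Python sketch that reuses the dictionary encoding of the previous sketch (an assumed encoding, not the paper's relational notation).

```python
def prob_box(R, alpha, w, phi):
    """Truth condition 7: sum of the transition probabilities from w via alpha
    to successor worlds satisfying phi."""
    return sum(pr for w2, pr in R.get(alpha, {}).get(w, {}).items() if phi(w2))

def holds_box_q(R, alpha, w, q, phi):
    """S, w |= [alpha]_q phi  iff that sum equals q."""
    return abs(prob_box(R, alpha, w, phi) - q) < 1e-9

def holds_obs_q(Q, alpha, w, obs, q):
    """S, w |= (obs | alpha)_q  iff the perceivability relation assigns
    probability q to obs in world w (truth condition 8)."""
    return abs(Q.get(alpha, {}).get(w, {}).get(obs, 0.0) - q) < 1e-9

# The grab fragment from the previous sketch:
w0, w1, w2 = frozenset({"full"}), frozenset({"full", "holding"}), frozenset()
R = {"grab": {w0: {w1: 0.7, w2: 0.2, w0: 0.1}}}
Q = {"grab": {w0: {"obsNil": 1.0}, w1: {"obsNil": 1.0}, w2: {"obsNil": 1.0}}}
print(holds_box_q(R, "grab", w0, 0.7, lambda w: "holding" in w))       # True
print(holds_box_q(R, "grab", w0, 0.3, lambda w: "holding" not in w))   # True
print(holds_obs_q(Q, "grab", w1, "obsNil", 1.0))                       # True
```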
We say ϕ is satisfied at world w in structure S (written S, w |= ϕ) if and only if the following holds: 1. S, w |= p iff w(p) = 1 for w ∈ W ; 2. S, w |= ⊤ for all w ∈ W ; 3. S, w |= ¬ϕ iff S, w 6|= ϕ; 4. S, w |= ϕ ∧ ϕ′ iff S, w |= ϕ and S, w |= ϕ′ ; 5. S, w |= α = α′ iff α and α′ are identical; 6. S, w |= ς = ς ′ iff ς and ς ′ are identical;  P 7. S, w |= [α]q ϕ iff (w,w′ ,pr)∈Rα ,S,w′ |=ϕ pr = q; 8. 9. 10. 11. S, w S, w S, w S, w Proof: Please refer to our draft report [Rens and Meyer, 2011]. Q.E.D. It is worth noting that in the case when q > q ′ (item 3), S, w |= hαi¬(θ ∧ ψ) is also a consequence. But hαi¬(θ → ψ) logically implies hαi¬(θ ∧ ψ). Consider item 8 further: Suppose [α]q∗ ϕ where q ∗ = 1 (in some structure at some world). Then, in SLAOP, one could represent S, w |= [α]1−q∗ ¬ϕ as ¬hαi¬ϕ. But this is just [α]ϕ (≡ [α]q∗ ϕ). The point is that there is no different way to represent [α]ϕ in SLAOP (other than syntactically). Hence, in item 8, we need not cater for the case when q = 1. Proposition 4.2 |=SLAOP ([α]q θ ∧ ¬[α]q ψ) → ¬[α](θ ↔ ψ). Proof: Let S be any structure and w a world in S. Assume S, w |= [α]q θ ∧ ¬[α]q ψ. Assume S, w |= [α](θ ↔ ψ). Then because S, w |= [α]q θ, one can deduce S, w |= [α]q ψ. This is a contradiction, therefore S, w 6|= [α](θ ↔ ψ). Hence, S, w |= ([α]q θ ∧ ¬[α]q ψ) → ¬[α](θ ↔ ψ). Q.E.D. Proposition 4.3 Assume an arbitrary structure S and an arbitrary world w in S. There exists some constant q such that S, w |= [α]q ϕ if and only if S, w |= hαiϕ. Proof: Assume an arbitrary structure S and an arbitrary world w in it. Then S, w |= [α] Pq ϕ for some constant q  ⇔ ∃q . (w,w′ ,pr)∈Rα ,S,w′ |=ϕ pr = q  P pr = 0 ⇔ Not: ∃q . ′ ′  P(w,w ,pr)∈Rα ,S,w |=ϕ ⇔ Not: ∃q . ′ ′ (w,w ,pr)∈Rα ,S,w |=¬ϕ pr = 1 ⇔ Not: S, w |= [α]¬ϕ ⇔ S, w |= hαiϕ. Q.E.D. |= (ς | α)q iff (N (ς), w, q) ∈ Qα ; |= (ς | α)✸ iff a q exists s.t. (N (ς), w, q) ∈ Qα ; |= Reward(r) iff Re(w) = r; |= Cost(α, c) iff Coα (w) = c. The definition of item 7 comes from probability theory, which says that the probability of an event (ϕ) is simply the sum of the probabilities of the atomic events (worlds) where the event (ϕ) holds. A formula ϕ is valid in a SLAOP structure (denoted S |= ϕ) if S, w |= ϕ for every w ∈ W . We define global logical entailment (denoted K |=GS ϕ) as follows: for all S, if S |= V ψ then S |= ϕ. ψ∈K 4 Some Properties Remark 4.1 Item 7 of Definition 3.2, the observation-peraction condition, implies that if S, w |= hαiϕ then S, w′ |= ϕ → (∃v ς )(v ς | α)✸ , for some w, w′ ∈ W . Remark 4.2 Item 8 of Definition 3.2, the nothing-for-nothing condition, implies that |=SLAOP (∀v α ) ¬hv α i⊤ → Cost(v α , 0). We are also interested in noting the interactions of any two percept events—when sentences of the form (ς | α)q ϕ are satisfied in the same world. Only two consequences could be gleaned, given Definition 3.3, item 8: Proposition 4.4 Assume an arbitrary structure S and some w in S. 1. If S, w |= (ς | α)q ∧ (ς ′ | α)q′ and ς is the same observation as ς ′ , then q = q ′ ; In the terminology of probability theory, a single world would be called an atomic event. Probability theory says that the probability of an event e is simply the sum of the probabilities of the atomic events (worlds) where e holds. We are interested in noting the interactions of any two sentences of the form [α]q ϕ being satisfied in the same world. Given the principle of the sum of atomic events, we get the following properties. 18 2. 
If S, w |= (ς | α)q ∧ (ς ′ | α)q′ and ς is not the same observation as ς ′ , then q + q ′ ≤ 1. 5.1 The Action Description In the following discussion, W ϕ is the set of worlds in which static formula ϕ holds (the ‘models’ of ϕ). A formal description for the construction of conditional effect axioms follows. For one action, there is a set of axioms that take the form Proof: Directly from probability theory and algebra. Q.E.D. Proposition 4.5 Assume an arbitrary structure S and an arbitrary world w in it. There exists some constant q such that S, w |= (ς | α)q if and only if S, w |= (ς | α)✸ . Proof: Let N (ς) = o. Assume an arbitrary structure S and an arbitrary world w in S. Then S, w |= (ς | α)q for some constant q ⇔ ∃q . (o, w, q) ∈ Qα ⇔ S, w |= (ς | α)✸ . Q.E.D. φ1 → ([α]q11 ϕ11 ∧ . . . ∧ [α]q1n ϕ1n ); φ2 → ([α]q21 ϕ21 ∧ . . . ∧ [α]q2n ϕ2n ); φj → ([α]qj1 ϕj1 ∧ . . . ∧ [α]qjn ϕjn ), where the φi and ϕik are static, and where the φi are conditions for the respective effects to be applicable, and in any one axiom, each ϕik represents a set W ϕik of worlds. The number qik is the probability that the agent will end up in a world in W ϕik , as the effect of performing α in the right condition φi . For axioms generated from the effect axioms (later in Sec. 5.1), we shall assume that ϕik is a minimal disjunctive normal form characterization of W ϕik . The following constraints apply. The following is a direct consequence of Propositions 4.3 and 4.5. Corollary 4.1 |=SLAOP [α]q ϕ → hαiϕ and |=SLAOP (ς | α)q → (ς | α)✸ . Further Properties of Interest − Recall that Rα = {(w, w′ ) | (w, w′ , pr) ∈ Rα }. We now justify treating [α]1 as [α] of regular multi-modal logic. Proposition 4.6 [α]1 is the regular [α]. That is, S, w |= − ′ [α]1 ϕ if and only if for all w′ , if wRα w , then S, w′ |= ϕ, for any structure S and any world w in S. Proof: S, wP |= [α]1 ϕ  ⇔ (w,w′ ,pr)∈Rα ,S,w′ |=ϕ pr = 1 ⇔ ∀w′ . if ∃pr . (w, w′ , pr) ∈ Rα then S, w′ |= ϕ − ′ ⇔ ∀w′ . if wRα w then S, w′ |= ϕ. Q.E.D. Proposition 4.7 hαi has normal semantics. That is, S, w |= hαiϕ if and only if there exist w′ , pr such that (w, w′ , pr) ∈ Rα and S, w′ |= ϕ. Proof: S, w |= hαiϕ ⇔ S, w |= ¬[α]¬ϕ ⇔ S, w |= ¬[α]1 ¬ϕ ⇔ S,Pw 6|= [α]1 ¬ϕ  ⇔ (w,w′ ,pr)∈Rα ,S,w′ |=¬ϕ pr 6= 1 ⇔ ∃w′ , pr . (w, w′ , pr) ∈ Rα and S, w′ 6|= ¬ϕ ⇔ ∃w′ , pr . (w, w′ , pr) ∈ Rα and S, w′ |= ϕ. Q.E.D. 5 ··· ; • There must be a set of effect axioms for each action α ∈ A. • The φi must be mutually exclusive, i.e., the conjunction of any pair of conditions causes a contradiction. However, it is not necessary that W ϕi1 ∪ . . . ∪ W ϕin = C. • A set of effects ϕi1 to ϕin in any axiom i must be mutually exclusive. • The transition probabilities qi1 , . . . , qin of any axiom i must sum to 1. The following sentence is an effect axiom for the grab action: (full ∧ ¬holding) → ([grab]0.7 (full ∧ holding) ∧ [grab]0.2 (¬full ∧ ¬holding) ∧ [grab]0.1 (full ∧ ¬holding)). Executability axioms of the form φk → hαi⊤ must be supplied, for each action, where φk is a precondition conveying physical restrictions in the environment with respect to α. The sentence ¬holding → hgrabi⊤ states that if the robot is not holding the oil-can, then it is possible to grab the can. A set of axioms must be generated that essentially states that if the effect or executability axioms do not imply executability for some action, then that action is inexecutable. Hence, given α, assume the presence of an executability closure axiom of the following form: ¬(φ1 ∨ . . . ∨ φj ∨ φk ) → ¬hαi⊤. 
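The well-formedness constraints on effect axioms (pairwise mutually exclusive conditions, mutually exclusive effects within an axiom, probabilities summing to 1) lend themselves to a machine check. Below is a minimal Python sketch of such a check for the grab effect axiom; the list-of-pairs encoding and the purely syntactic test for mutual exclusivity are simplifying assumptions.

```python
def check_effect_axioms(axioms):
    """Sketch of the well-formedness constraints of Sec. 5.1: within each effect
    axiom the transition probabilities must sum to 1, and the conditions of the
    axioms for one action must be pairwise mutually exclusive.  Conditions and
    effects are sets of literals; two conjunctions count as mutually exclusive
    when one contains the negation of a literal of the other."""
    def contradict(c1, c2):
        return any(("~" + l in c2) or (l.startswith("~") and l[1:] in c2) for l in c1)

    for cond, outcomes in axioms:
        assert abs(sum(q for q, _ in outcomes) - 1.0) < 1e-9, f"probabilities of {cond}"
    for i, (c1, _) in enumerate(axioms):
        for c2, _ in axioms[i + 1:]:
            assert contradict(c1, c2), f"conditions {c1} and {c2} overlap"

# The grab effect axiom of Sec. 5.1:
grab_axioms = [
    ({"full", "~holding"},
     [(0.7, {"full", "holding"}),
      (0.2, {"~full", "~holding"}),
      (0.1, {"full", "~holding"})]),
]
check_effect_axioms(grab_axioms)
print("grab effect axioms are well formed")
```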
The sentence holding → ¬hgrabi⊤ states that if the robot is holding the oil-can, then it is not possible to grab it. Now we show the form of sentences that specify what does not change under certain conditions—conditional frame axioms. Let φi → ([α]qi1 ϕi1 ∧ . . . ∧ [α]qin ϕin ) be the i-th effect axiom for α. For each α ∈ A, for each effect axiom i, do: For each fluent p ∈ P, if p is not mentioned in ϕi1 to ϕin , then (φi ∧ p) → [α]p and (φi ∧ ¬p) → [α]¬p are part of the domain specification. For our scenario, the conditional frame axioms of grab are Specifying Domains with SLAOP We briefly describe and illustrate a framework to formally specify—in the language of SLAOP—the domain in which an agent or robot is expected to live. Let BK be an agent’s background knowledge (including non-static formulae) and let IC be its initial condition, a static formula describing the world the agent finds itself in when it becomes active. In the context of SLAOP, we are interested in determining BK |=GS IC → ϕ, where ϕ is any sentence. The agent’s background knowledge may include static law axioms which are facts about the domain that do not change. They have no predictable form, but by definition, they are not dynamic and thus exclude mention of actions. drank → ¬full is one static law axiom for the oil-can scenario. The other kinds of axioms in BK are described below. (full ∧ ¬holding ∧ drank) → [grab]drank; (full ∧ ¬holding ∧ ¬drank) → [grab]¬drank; (¬full ∧ ¬holding ∧ drank) → [grab]drank; (¬full ∧ ¬holding ∧ ¬drank) → [grab]¬drank. 19 Given frame and effect axioms, it may still happen that the probability to some worlds cannot be logically deduced. Suppose (for the purpose of illustration only) that the sentence [grab]0.7 (full ∧ holding) ∧ [grab]0.3 (full ∧ ¬holding ∧ drank). via some action, there exists an observation associated with the action, perceivable in that world. The perceivability axioms must adhere to this remark. • For every pair of perceivability axioms φ → (ς | α)q and φ′ → (ς | α)q′ for the same observation ς, W φ must ′ be disjoint from W φ . P • For every particular condition φ, φ→(ς|α)q q = 1. This P is so that N (ς):(N (ς),w+ ,pr)∈Qα pr = 1. (1) can be logically deduced from the frame and effect axioms in BK. Now, according to (1) the following worlds are reachable: (full ∧ holding ∧ drank), (full ∧ holding ∧ ¬drank) and (full ∧ ¬holding ∧ drank). The transition probability to (full ∧ ¬holding ∧ drank) is 0.3, but what are the transition probabilities to (full ∧ holding ∧ drank) and (full ∧holding ∧¬drank)? We have devised a process to determine such hidden probabilities via uniform axioms [Rens and Meyer, 2011]. Uniform axioms describes how to distribute probabilities of effects uniformly in the case sufficient information is not available. It is very similar to what [Wang and Schmolze, 2005] do to achieve compact representation. A uniform axiom generated for (1) would be Some perceivability axioms for the oil-can scenario might be (obsNil | grab)✷ ; (¬full ∧ drank ∧ holding) → (obsLight | weigh)0.7 ; (¬full ∧ drank ∧ holding) → (obsHeavy | weigh)0.1 ; (¬full ∧ drank ∧ holding) → (obsMedium | weigh)0.2 . Perceivability axioms for sensory actions also state when the associated observations are possible. The following set of axioms states when the associated observations are impossible for sensory action weigh of our scenario. [grab]0.35 (full ∧ holding ∧ drank) ∧ [grab]0.35 (full ∧ holding ∧ ¬drank) ∧ [grab]0.3 (full ∧ ¬holding ∧ drank). 
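The generation rule for conditional frame axioms is mechanical: for every effect-axiom condition φ_i of an action and every fluent p not mentioned in that axiom's effects, add (φ_i ∧ p) → [α]p and (φ_i ∧ ¬p) → [α]¬p. Below is a small Python sketch of this rule, run only on the single grab effect axiom shown above (the scenario's second, ¬full condition would contribute the remaining two of the four frame axioms listed in the paper).

```python
def conditional_frame_axioms(action, effect_axioms, fluents):
    """Sketch of the frame-axiom generation rule of Sec. 5.1: for each effect-axiom
    condition phi_i and each fluent p not mentioned in that axiom's effects, emit
    (phi_i & p) -> [action]p and (phi_i & ~p) -> [action]~p."""
    axioms = []
    for cond, outcomes in effect_axioms:
        mentioned = {lit.lstrip("~") for _, eff in outcomes for lit in eff}
        prefix = " & ".join(sorted(cond))
        for p in fluents:
            if p not in mentioned:
                axioms.append(f"({prefix} & {p}) -> [{action}]{p}")
                axioms.append(f"({prefix} & ~{p}) -> [{action}]~{p}")
    return axioms

grab_axioms = [({"full", "~holding"},
                [(0.7, {"full", "holding"}),
                 (0.2, {"~full", "~holding"}),
                 (0.1, {"full", "~holding"})])]
for a in conditional_frame_axioms("grab", grab_axioms, ["full", "drank", "holding"]):
    print(a)
# (full & ~holding & drank) -> [grab]drank
# (full & ~holding & ~drank) -> [grab]~drank
```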
((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧ ¬holding)) → ¬(lobsLight | weigh)✸ ; ((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧ ¬holding)) → ¬(obsHeavy | weigh)✸ ; ((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧ ¬holding)) → ¬(obsMedium | weigh)✸ . The following axiom schema represents all the effect condition closure axioms. (¬(φ1 ∨. . .∨φj )∧P ) → [A]P , where there is a different axiom for each substitution of α ∈ A for A and each literal for P . For example, (holding ∧ P ) → [grab]P , where P is any p ∈ P or its negation. The perceivability condition closure axiom schema is 5.2 The Perception Description ¬(φ11 ∨ · · · ∨ φ1j ) → ¬(ς1 | α)✸ ; ¬(φ21 ∨ · · · ) → ¬(ς2 | α)✸ ; · · · ; ¬(· · · ∨ φnk ) → ¬(ςn | α)✸ , One can classify actions as either ontic (physical) or sensory. This classification also facilitates specification of perceivability. Ontic actions have intentional ontic effects, that is, effects on the environment that were the main intention of the agent. grab, drink and replace are ontic actions. Sensory actions—weigh in our scenario—result in perception, maybe with (unintended) side-effects. Perceivability axioms specify what conditions must hold after the applicable action is performed, for the observation to be perceivable. Ontic actions each have perceivability axioms of the form (obsNil | α)✷ . Sensory actions typically have multiple observations and associated conditions for perceiving them. The probabilities for perceiving the various observations associated with sensory actions must be specified. The following set of perceivability axiom schemata does this: φ11 → (ς1 | α)q11 ; φ12 → (ς1 | α)q12 ; φ1j → (ς1 | α)q1n ; φ21 → (ς2 | α)q21 ; φnk → (ςn | α)qkn , where the φi are taken from the perceivability axioms. There are no perceivability closure axioms for ontic actions, because they are always tautologies. Ontic actions each have unperceivability axioms of the form (∀v ς )((v ς | α)✸ ↔ v ς = obsNil ). The axiom says that no other observation is perceivable given the ontic action. That is, for any instantiation of an observation ς ′ other than obsNil , ¬(ς ′ | α)✸ is a logical consequence. For sensory actions, to state that the observations not associated with action α are always impossible given α was executed, we need an axiom of the form (∀v ς )(v ς 6= o1 ∧ v ς 6= o2 ∧ · · · ∧ v ς 6= on ) → ¬(v ς | α)✸ . For the oil-can scenario, they are ··· ; ··· ; (∀v ς )(v ς | grab)✸ ↔ v ς = obsNil ; (∀v ς )(v ς | drink)✸ ↔ v ς = obsNil ; (∀v ς )(v ς | replace)✸ ↔ v ς = obsNil ; (∀v ς )(v ς 6= obsHeavy ∧ v ς 6= obsLight ∧ v ς 6= obsMedium) → ¬(v ς | weigh)✸ . where {ς1 , ς2 , . . . , ςn } is the set of first components of all elements in Qα and the φi are the conditions expressed as static formulae. The following constraints apply to these axioms. • There must be a set of perceivability axioms for each action α ∈ A. 5.3 The Utility Function • In the semantics section, item 7 of the definition of a SLAOP structure states that for every world reachable A sufficient set of axioms concerning ‘state rewards’ and ‘action costs’ constitutes a utility function. 20 and There must be a means to express the reward an agent will get for performing an action in a world it may find itself— for every action and every possible world. The domain expert must supply a set of reward axioms of the form φi → Reward(ri ), where φi is a condition specifying the world in which the rewards can be got (e.g., holding → Reward(5) and drank → Reward(10)). 
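The uniform-axiom idea of distributing an underdetermined transition probability evenly over the worlds in the target set can be sketched directly. The Python fragment below reproduces the numbers of the example: 0.7 spread over the two (full ∧ holding) worlds gives 0.35 each, while 0.3 goes entirely to the single (full ∧ ¬holding ∧ drank) world. The tuple encoding of the partially determined effects is an assumption made for illustration.

```python
from itertools import product

def uniform_split(partial_effects, fluents):
    """Sketch of the uniform-axiom idea of Sec. 5.1: when an entailed sentence only
    fixes the probability of reaching a *set* of worlds, distribute that probability
    uniformly over the worlds in the set (the models of the effect formula)."""
    worlds = [frozenset(f for f, b in zip(fluents, bits) if b)
              for bits in product([False, True], repeat=len(fluents))]
    result = {}
    for prob, pos, neg in partial_effects:
        models = [w for w in worlds if pos <= w and not (neg & w)]
        for w in models:
            result[w] = result.get(w, 0.0) + prob / len(models)
    return result

fluents = ["full", "drank", "holding"]
# [grab]_0.7 (full & holding)  and  [grab]_0.3 (full & ~holding & drank)
partial = [(0.7, {"full", "holding"}, set()),
           (0.3, {"full", "drank"}, {"holding"})]
for w, pr in sorted(uniform_split(partial, fluents).items(), key=str):
    print(sorted(w), pr)
# two (full & holding) worlds at 0.35, one (full & drank & ~holding) world at 0.3
```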
The conditions of the reward axioms must identify worlds that are pairwise disjoint. This holds for cost axioms too: The domain expert must also supply a set of cost axioms of the form (φi ∧ hαi⊤) → Cost(α, ci ), where φi is a condition specifying the world in which the cost ci will be incurred for action α. For example, (full ∧ hgrabi⊤) → Cost(grab, 2); (¬full ∧ hgrabi⊤) → Cost(grab, 1); (full ∧ hdrinki⊤) → Cost(drink, 2); (¬full ∧ hdrinki⊤) → Cost(drink, 1); hreplacei⊤ → Cost(replace, 0.8). (∀v α )¬p → (v α = β1 ∧ ¬Cond− (β1 , p)) → [β1 ])¬p ∧ .. . (v α = βm ∧ ¬Cond− (βm , p)) → [βm ])¬p ∧ (v α 6= β1 ∧ · · · ∧ v α 6= βm ) → [v α ])¬p. Claim 5.1 The collection of pairs of compact frame axioms for each fluent in P is logically equivalent to the collection of all conditional frame axioms and effect closure axioms generated with the processes presented above. Proof: Please refer to our draft report [Rens and Meyer, 2011]. Q.E.D. 5.4 A Frame Solution The method we propose for avoiding generating all the frame and effect closure axioms, is to write the effect and executability axioms, generate the uniform axioms, and then generate a set of a new kind of axioms representing the frame and effect closure axioms much more compactly. By looking at the effect axioms of a domain, one can define for each fluent p ∈ P a set Cause+ (p) of actions that can (but not necessarily always) cause p (as a positive literal) to flip to ¬p, and a set Cause− (p) of actions that can (but not necessarily always) causes ¬p (as a negative literal) to flip to p.1 For instance, grab ∈ Cause+ (f ull), because in effect axiom (f ull ∧ ¬holding) → ([grab]0.7 (f ull ∧ holding) ∧ [grab]0.2 (¬f ull ∧ ¬holding) ∧ [grab]0.1 (f ull ∧ ¬holding)), grab flips f ull to ¬f ull (with probability 0.2). The axiom also shows that grab ∈ Cause− (holding) because it flips ¬holding to holding (with probability 0.7). The actions mentioned in these sets may have deterministic or stochastic effects on the respective propositions. Furthermore, by looking at the effects axioms, Cond functions can be defined: For each α ∈ Cause+ (p), Cond+ (α, p) returns a sentence that represents the disjunction of all φi under which α caused p to be a negative literal. Cond− (α, p) is defined similarly. Suppose that Cause+ (p) = {α1 , . . . , αm } and Cause− (p) = {β1 , . . . , βn }. We propose, for any fluent p, a pair of compact frame axioms with schema (∀v α )p → (v α = α1 ∧ ¬Cond+ (α1 , p)) → [α1 ]p ∧ ... There are in the order of |A| · 2|Ω| · D frame axioms, where D is the average number of conditions on effects per action (the φi ). Let N be the average size of |Cause+ (p)| or |Cause− (p)| for any p ∈ P. With the two compact frame axioms (per fluent), no separate frame or effect closure axioms are required in the action description (AD). If we consider each of the most basic conjuncts and disjuncts as a unit length, then the size of each compact frame axiom is O(N ), and the size of all compact frame axioms in AD is in the order of N · 2|P|. For reasonable domains, N will be much smaller than |A|, and the size of all compact frame axioms is thus much smaller than the size of all frame and effect closure axioms (|A| · 2|P| · (D + 1)). 5.5 Some Example Entailment Results The following entailments have been proven concerning the oil-can scenario [Rens and Meyer, 2011]. BK oc is the background knowledge of an agent in the scenario. To save space and for neater presentation, we abbreviate constants and fluents by their initials. 
BK oc |=GS (f ∧ d ∧ ¬h) → [g]0.7 (f ∧ d ∧ h): If the can is full and the oil has been drunk, the probability of successfully grabbing it without spilling oil is 0.7. BK oc |=GS (f ∧ ¬d ∧ h) → ¬[d]0.2 (f ∨ ¬d ∨ ¬h): If the robot is in a situation where it is holding the full oil-can (and has not yet attempted drinking), then the probability of having failed to drink the oil is not 0.2. BK oc |=GS (∃v ς )(v ς | drink)✷ : In any world, there always exists an observation after the robot has drunk. BK oc |=GS hdi⊤ ↔ h: In any world, it is possible to drink the oil if and only if the can is being held. BK oc |=GS (f ∧ hdi⊤) → ¬Cost(d, 3): Assuming it is possible to drink and the can is full of oil, then the cost of doing the drink action is not 3 units. 6 (v α = αm ∧ ¬Cond+ (αm , p)) → [αm ]p ∧ (v α 6= α1 ∧ · · · ∧ v α 6= αm ) → [v α ]p Concluding Remarks We introduced a formal language specifically for robots that must deal with uncertainty in affection and perception. It is one step towards a general reasoning system for robots, not the actual system. 1 Such sets and functions are also employed by Demolombe, Herzig and Varzinczak [Demolombe et al., 2003]. 21 [Halpern, 2003] J. Y. Halpern. Reasoning about Uncertainty. The MIT Press, Cambridge, MA, 2003. [Iocchi et al., 2009] L. Iocchi, T. Lukasiewicz, D. Nardi, and R. Rosati. Reasoning about actions with sensing under qualitative and probabilistic uncertainty. ACM Transactions on Computational Logic, 10(1):5:1–5:41, 2009. [Kaelbling et al., 1998] L. Kaelbling, M. Littman, and A. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2):99– 134, 1998. [Levesque and Lakemeyer, 2008] H. Levesque and G. Lakemeyer. Cognitive Robotics. In B. Porter F. Van Harmelen, V. Lifshitz, editor, Handbook of Knowledge Representation, pages 869–886. Elsevier Science, 2008. [Poole, 1998] D. Poole. Decision theory, the situation calculus and conditional plans. Linköping Electronic Articles in Computer and Information Science, 8(3), 1998. [Rens and Meyer, 2011] G. Rens and T. Meyer. Logic and utility based agent planning language, Part II: Specifying stochastic domains. Technical Report KRR-10-01, KRR, CSIR Meraka Institute, Pretoria, South Africa, January 2011. url: http://krr.meraka.org.za/publications/2011. [Rens et al., 2010] G. Rens, I. Varzinczak, T. Meyer, and A. Ferrein. A logic for reasoning about actions and explicit observations. In Jiuyong Li, editor, Proc. of 23rd Australasian Joint Conf. on AI, pages 395–404, 2010. [Rens, 2010] G. Rens. A belief-desire-intention architecture with a logic-based planner for agents in stochastic domains. Master’s thesis, School of Computing, University of South Africa, 2010. [Rens, 2011] G. Rens. From an agent logic to an agent programming language for partially observable stochastic domains. In Proc. of 22nd Intl. Joint Conf. on AI, Menlo Park, CA, 2011. AAAI Press. To appear. [Sanner and Kersting, 2010] S. Sanner and K. Kersting. Symbolic dynamic programming for first-order POMDPs. In Proc. of 24th Natl. Conf. on AI, pages 1140–1146, 2010. [Shirazi and Amir, 2007] A. Shirazi and E. Amir. Probabilistic modal logic. In Proc. of 22nd Natl. Conf. on AI, pages 489–494. AAAI Press, 2007. [Van Benthem et al., 2009] J. Van Benthem, J. Gerbrandy, and B. Kooi. Dynamic update with probabilities. Studia Logica, 93(1):67–96, 2009. [Van Diggelen, 2002] J. Van Diggelen. Using modal logic in mobile robots. Master’s thesis, Cognitive Artificial Intelligence, Utrecht University, 2002. 
[Wang and Schmolze, 2005] C. Wang and J. Schmolze. Planning with POMDPs using a compact, logic-based representation. In Proc. of 17th IEEE Intl. Conf. on Tools with AI, pages 523–530, 2005. [Weerdt et al., 1999] M. De Weerdt, F. De Boer, W. Van der Hoek, and J.-J. Meyer. Imprecise observations of mobile robots specified by a modal logic. In Proc. of ASCI-99, pages 184–190, 1999. POMDP theory is used as an underlying modeling formalism. The formal language is based on multi-modal logic and accepts basic principals of cognitive robotics. We have also included notions of probability to represent the uncertainty, but we have done so ‘minimally’, that is, only as far as is necessary to represent POMDPs for the intended application. Beyond the usual elements of logics for reasoning about action and change, the logic presented here adds observations as first-class objects, and a means to represent utility functions. In an associated report [Rens and Meyer, 2011], the frame problem is addressed, and we provided a belief network approach to domain specification for cases when the required information is available. The computational complexity of SLAOP was not determined, and is left for future work. Due to the nature of SLAOP structures, we conjecture that entailment in SLAOP is decidable. It’s worth noting that the three latter frameworks discussed in Section 2 [Wang and Schmolze, 2005; Sanner and Kersting, 2010; Poole, 1998] do not mention decidability results either. The next step is to prove decidability of SLAOP entailment, and then to develop a logic for decision-making in which SLAOP will be employed. Domains specified in SLAOP will be used to make decisions in the ‘meta’ logic, with sentences involving sequences of actions and the epistemic knowledge of an agent. This will also show the significance of SLAOP in a more practical context. Please refer to our extended abstract [Rens, 2011] for an overview of our broader research programme. References [Bacchus et al., 1999] F. Bacchus, J. Y. Halpern, and H. J. Levesque. Reasoning about noisy sensors and effectors in the situation calculus. Artificial Intelligence, 111(1– 2):171–208, 1999. [Bonet and Geffner, 2001] B. Bonet and H. Geffner. Planning and control in artificial intelligence: A unifying perspective. Applied Intelligence, 14(3):237–252, 2001. [Boutilier and Poole, 1996] C. Boutilier and D. Poole. Computing optimal policies for partially observable decision processes using compact representations. In Proc. of 13th Natl. Conf. on AI, pages 1168–1175, 1996. [Boutilier et al., 2000] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proc. of 17th Natl. Conf. on AI, pages 355–362. AAAI Press, Menlo Park, CA, 2000. [Demolombe et al., 2003] R. Demolombe, A. Herzig, and I. Varzinczak. Regression in modal logic. Journal of Applied Non-Classical Logics, 13(2):165–185, 2003. [Fagin and Halpern, 1994] R. Fagin and J. Y. Halpern. Reasoning about knowledge and probability. J. of ACM, 41(2):340–367, 1994. [Gabaldon and Lakemeyer, 2007] A. Gabaldon and G. Lakemeyer. ESP: A logic of only-knowing, noisy sensing and acting. In Proc. of 22nd Natl. Conf. on AI, pages 974–979, 2007. 22 Agent Supervision in Situation-Determined ConGolog Giuseppe De Giacomo Sapienza – Università di Roma Rome, Italy Yves Lespérance York University Toronto, Canada Christian Muise University of Toronto Toronto, Canada [email protected] [email protected] [email protected] Abstract out). 
For example, we could have an agent process representing a child and its possible behaviors, and a second process representing a babysitter that specifies the behaviors by the child that can be allowed. If the supervisor can control all the actions of the supervised agent, then it is straightforward to specify the behaviors that may result as a kind of synchronized concurrent execution of the agent and supervisor processes. A more interesting case arises when some agent actions are uncontrollable. For example, it may be impossible to prevent the child from getting muddy once he/she is allowed outside. In such circumstances, the supervisor may have to block some agent actions, not because they are undesirable in themselves (e.g. going outside), but because if they are allowed, the supervisor cannot prevent the agent from performing some undesirable actions later on (e.g. getting muddy). We follow previous work [McIlraith and Son, 2002; Fritz and McIlraith, 2006] in assuming that processes are specified in a high level agent programming language defined in the Situation Calculus [Reiter, 2001].1 In fact, we define and use a restricted version of the ConGolog agent programming language [De Giacomo et al., 2000] that we call Situation-Determined ConGolog (SDConGolog). In this version, following [De Giacomo et al., 2010] all transitions involve performing an action (i.e. there are no transitions that merely perform a test). Moreover, nondeterminism is restricted so that the remaining program is a function of the action performed, i.e. there is a unique remaining program δ ′ such that a given program δ can perform a transition (δ, s) →a (δ ′ , do(a, s)) involving action a in situation s. This means that a run of such a program starting in a given situation can be taken to be simply a sequence of actions, as all the intermediate programs one goes through are functionally determined by the starting program and situation and the actions performed. Thus we can see a program and a starting situation as specifying a language, that of all the sequences of actions that are runs of the program in the situation. This allows us to define language theoretic notions such as union, intersection, and difference/complementation in terms of op- We investigate agent supervision, a form of customization, which constrains the actions of an agent so as to enforce certain desired behavioral specifications. This is done in a setting based on the Situation Calculus and a variant of the ConGolog programming language which allows for nondeterminism, but requires the remainder of a program after the execution of an action to be determined by the resulting situation. Such programs can be fully characterized by the set of action sequences that they generate. The main results are a characterization of the maximally permissive supervisor that minimally constrains the agent so as to enforce the desired behavioral constraints when some agent actions are uncontrollable, and a sound and complete technique to execute the agent as constrained by such a supervisor. 1 Introduction There has been much work on process customization, where a generic process for performing a task or achieving a goal is customized to satisfy a client’s constraints or preferences [Fritz and McIlraith, 2006; Lin et al., 2008; Sohrabi et al., 2009]. This approach was originally proposed in [McIlraith and Son, 2002] in the context of web service composition [Su, 2008]. 
The idea is that the generic process provides a wide range of alternative ways to perform the task. During customization, alternatives that violate the constraints are eliminated. Some parameters in the remaining alternatives may be restricted or instantiated so as to ensure that any execution of the customized process will satisfy the client’s constraints. Another approach to service composition synthesizes an orchestrator that controls the execution of a set of available services to ensure that they realize a desired service [Sardiña and De Giacomo, 2009; Bertoli et al., 2010]. In this paper, we develop a framework for a similar type of process refinement that we call supervised execution. We assume that we have a nondeterministic process that specifies the possible behaviors of an agent, and a second process that specifies the possible behaviors that a supervisor wants to allow (or alternatively, of the behaviors that it wants to rule 1 Clearly, there are applications where a declarative formalism is preferable, e.g. linear temporal logic (LTL), regular expressions over actions, or some type of business rules. However, there has been previous work on compiling such declarative specification languages into ConGolog, for instance [Fritz and McIlraith, 2006], which handles an extended version of LTL interpreted over a finite horizon. 23 erations on the corresponding programs, which has applications in many areas (e.g. programming by demonstration and programming by instruction [Fritz and Gil, 2010], and plan recognition [Demolombe and Hamon, 2002]).Working with situation-determined programs also greatly facilitates the formalization of supervision/customization. In [De Giacomo et al., 2010], it is in fact shown that any ConGolog program can be made situation-determined by recording nondeterministic choices made in the situation. Besides a detailed characterization of SDConGolog,2 the main contributions of the paper are as follows: first, based on previous work in discrete event control [Wonham and Ramadge, 1987], we provide a characterization of the maximally permissive supervisor that minimally constrains the actions of the agent so as to enforce the desired behavioral specifications, showing its existence and uniqueness; secondly, we define a program construct for supervised execution that takes the agent program and supervisor program, and executes them to obtain only runs allowed by the maximally permissive supervisor, showing its soundness and completeness. The rest of the paper proceeds as follows. In the next section, we briefly review the Situation Calculus and the ConGolog agent programming language. In Section 3, we define SDConGolog, discuss its properties, and introduce some useful programming constructs and terminology. Then in Section 4, we develop our account of agent supervision, and define the maximal permissive supervisor and supervised execution. Finally in Section 5, we review our contributions and discuss related and future work. 2 while ϕ do δ δ1 |δ2 πx.δ δ∗ δ1 kδ2 In the above, α is an action term, possibly with parameters, and ϕ is situation-suppressed formula, that is, a formula in the language with all situation arguments in fluents suppressed. As usual, we denote by ϕ[s] the situation calculus formula obtained from ϕ by restoring the situation argument s into all fluents in ϕ. 
Program δ1 |δ2 allows for the nondeterministic choice between programs δ1 and δ2 , while πx.δ executes program δ for some nondeterministic choice of a legal binding for variable x (observe that such a choice is, in general, unbounded). δ ∗ performs δ zero or more times. Program δ1 kδ2 expresses the concurrent execution (interpreted as interleaving) of programs δ1 and δ2 . Formally, the semantics of ConGolog is specified in terms of single-step transitions, using the following two predicates [De Giacomo et al., 2000]: (i) T rans(δ, s, δ ′ , s′ ), which holds if one step of program δ in situation s may lead to situation s′ with δ ′ remaining to be executed; and (ii) F inal(δ, s), which holds if program δ may legally terminate in situation s. The definitions of T rans and F inal we use are as in [De Giacomo et al., 2010]; these are in fact the usual ones [De Giacomo et al., 2000], except that the test construct ϕ? does not yield any transition, but is final when satisfied. Thus, it is a synchronous version of the original test construct (it does not allow interleaving). A consequence of this is that in the version of ConGolog that we use, every transition involves the execution an action (tests do not make transitions), i.e., Preliminaries The situation calculus is a logical language specifically designed for representing and reasoning about dynamically changing worlds [Reiter, 2001]. All changes to the world are the result of actions, which are terms in the language. We denote action variables by lower case letters a, action types by capital letters A, and action terms by α, possibly with subscripts. A possible world history is represented by a term called a situation. The constant S0 is used to denote the initial situation where no actions have yet been performed. Sequences of actions are built using the function symbol do, such that do(a, s) denotes the successor situation resulting from performing action a in situation s. Predicates and functions whose value varies from situation to situation are called fluents, and are denoted by symbols taking a situation term as their last argument (e.g., Holding(x, s)). Within the language, one can formulate action theories that describe how the world changes as the result of actions [Reiter, 2001]. To represent and reason about complex actions or processes obtained by suitably executing atomic actions, various so-called high-level programming languages have been defined. Here we concentrate on (a fragment of) ConGolog that includes the following constructs: α ϕ? δ1 ; δ2 if ϕ then δ1 else δ2 while loop nondeterministic branch nondeterministic choice of argument nondeterministic iteration concurrency Σ ∪ C |= Trans(δ, s, δ ′ , s′ ) ⊃ ∃a.s′ = do(a, s). Here and in the remainder, we use Σ to denote the foundational axioms of the situation calculus from [Reiter, 2001] and C to denote the axioms defining the ConGolog language. 3 Situation-Determined Programs As mentioned earlier, we are interested in process customization. For technical reasons, we will focus on a restricted class of ConGolog programs for describing processes, namely “situation-determined programs”. A program δ is situationdetermined in a situation s if for every sequence of transitions, the remaining program is determined by the resulting situation, i.e., . SituationDetermined (δ, s) = ∀s′ , δ ′ , δ ′′ . ∗ ∗ ′ ′ Trans (δ, s, δ , s ) ∧ Trans (δ, s, δ ′′ , s′ ) ⊃ δ ′ = δ ′′ , where Trans∗ denotes the reflexive transitive closure of Trans. 
Thus, a (partial) execution of a situation-determined program is uniquely determined by the sequence of actions it has produced. This is a key point. In general, the possible executions of a ConGolog program are characterized by sequences of configurations formed by the remaining program and the current situation. In contrast, the execution of situationdetermined programs can be characterized in terms of sequences of actions only, those sequences that correspond to the situations reached from where the program started. atomic action test for a condition sequence conditional 2 In [De Giacomo et al., 2010], situation-determined programs were only dealt with incidentally. 24 For example, the ConGolog program (a; b) | (a; c) is not situation-determined in situation S0 as it can make a transition to a configuration (b, do(a, S0 )), where the situation is do(a, S0 ) and the remaining program is b, and it can also make a transition to a configuration (c, do(a, S0 )), where the situation is also do(a, S0 ) and the remaining program is instead c. It is impossible to determine what the remaining program is given only a situation, e.g. do(a, S0 ), reached along an execution. In contrast, the program a; (b | c) is situationdetermined in situation S0 . There is a unique remaining program (b | c) in situation do(a, S0 ) (and similarly for the other reachable situations). When we restrict our attention to situation-determined programs, we can use a simpler semantic specification for the language; instead of Trans we can use a next (partial) function, where next(δ, a, s) returns the program that remains after δ does a transition involving action a in situation s (if δ is situation determined, such a remaining program must be unique). We will axiomatize the next function so that it satisfies the following properties: next(πx.δ, a, s) = if next(δ, a, s) 6= ⊥ Interleaving concurrency: next(δ1 kδ2 , a, s) =  next(δ1 , a, s)kδ2    if next(δ1 , a, s) 6= ⊥ and next(δ2 , a, s) = ⊥  δ1 knext(δ2 , a, s)  if next(δ2 , a, s) 6= ⊥ and next(δ1 , a, s) = ⊥    ⊥ otherwise Test, empty program, undefined: next(ϕ?, a, s) = ⊥ next(nil, a, s) = ⊥ next(⊥, a, s) = ⊥ Moreover the undefined program is never Final: Final(⊥, s) ≡ false. Let C n be the set of ConGolog axioms extended with the above axioms specifying next and Final(⊥, s). It is easy to show that: Proposition 1 Properties N1, N2, and N3 are entailed by Σ∪ Cn. Note in particular that as per N3, if the remaining program is not uniquely determined, then next(δ, a, s) is undefined. Notice that for situation-determined programs this will never happen, and if next(δ, a, s) returns ⊥ it is because δ cannot make any transition using a in s: Corollary 2 ∃!δ ′ .Trans(δ, s, δ ′ , do(a, s)) ⊃ ∀δ ′ .(Trans(δ, s, δ ′ , do(a, s)) ⊃ next(δ, a, s) = δ ′ ) (N2) ¬∃!δ ′ .Trans(δ, s, δ ′ , do(a, s)) ⊃ next(δ, a, s) = ⊥ (N3) Here ∃!x.φ(x) means that there exists a unique x such that φ(x); this is defined in the usual way. ⊥ is a special value that stands for “undefined”. The function next(δ, a, s) is only defined when there is a unique remaining program after program δ does a transition involving the action a; if there is such a unique remaining program, then next(δ, a, s) denotes it. We define the function next inductively on the structure of programs using the following axioms: Atomic action:  nil if P oss(a, s) and α = a next(α, a, s) = ⊥ otherwise Σ ∪ C n |= ∀δ, s.SituationDetermined (δ, s) ⊃ ∀a [(next(δ, a, s) = ⊥) ≡ (¬∃δ ′ .Trans(δ, s, δ ′ , do(a, s)))]. 
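As a concreteness check on the next function, the following Python sketch uses an assumed encoding of programs as nested tuples, covers only atomic actions, sequence and nondeterministic branch, and ignores Poss and situations. It reproduces the behaviour described for (a; b) | (a; c) versus a; (b | c): the former yields the undefined program after a, the latter a unique remainder.

BOT = "BOT"     # stands for the undefined program
NIL = ("nil",)  # the empty program

def final(d):
    if d == NIL: return True
    if d[0] == ";": return final(d[1]) and final(d[2])
    if d[0] == "|": return final(d[1]) or final(d[2])
    return False

def nxt(d, a):
    if d in (BOT, NIL): return BOT
    if d[0] == "act":                      # atomic action
        return NIL if d[1] == a else BOT
    if d[0] == ";":                        # sequence
        d1, d2 = d[1], d[2]
        n1, n2 = nxt(d1, a), nxt(d2, a)
        if n1 != BOT and (not final(d1) or n2 == BOT): return (";", n1, d2)
        if final(d1) and n1 == BOT: return n2
        return BOT
    if d[0] == "|":                        # nondeterministic branch
        n1, n2 = nxt(d[1], a), nxt(d[2], a)
        if n2 == BOT or n1 == n2: return n1
        if n1 == BOT: return n2
        return BOT                         # remaining program not unique

A, B, C = ("act", "a"), ("act", "b"), ("act", "c")
p1 = ("|", (";", A, B), (";", A, C))       # (a;b) | (a;c)
p2 = (";", A, ("|", B, C))                 # a;(b|c)
print(nxt(p1, "a"))   # BOT: two different remaining programs are possible
print(nxt(p2, "a"))   # (';', ('nil',), ('|', ..., ...)): unique remainder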
Let’s look at an example. Imagine an agent specified by δB1 below that can repeatedly pick an available object and repeatedly use it and then discard it, with the proviso that if during use the object breaks, the agent must repair it: Sequence: next(δ1 ; δ2 , a, s) =  next(δ1 , a, s); δ2 if next(δ1 , a, s) 6= ⊥ and    (¬F inal(δ1 , s) or next(δ2 , a, s) = ⊥) δB1 = [π x.Available(x)?; [use(x); (nil | [break(x); repair(x)])]∗ ; discard(x)]∗ next(δ2 , a, s) if Final(δ1 , s) and next(δ1 , a, s) = ⊥    ⊥ otherwise We assume that there is a countably infinite number of available unbroken objects initially, that objects remain available until they are discarded, that available objects can be used if they are unbroken, and that objects are unbroken unless they break and are not repaired (this is straightforwardly axiomatized in the situation calculus). Notice that this program is situation-determined, though very nondeterministic. Conditional: next(if ϕ then δ1 else δ2 , a, s) = next(δdx , a, s) if ∃!d.next(δdx , a, s) 6= ⊥ ⊥ otherwise Nondeterministiciteration: next(δ, a, s); δ ∗ next(δ ∗ , a, s) = ⊥ otherwise next(δ, a, s) = δ ′ ∧ δ ′ 6= ⊥ ⊃ Trans(δ, s, δ ′ , do(a, s)) (N1)   next(δ1 , a, s) if ϕ[s] next(δ2 , a, s) if ¬ϕ[s] Loop:   next(δ, a, s); while ϕ do δ if ϕ[s] and next(δ, a, s) 6= ⊥ next(while ϕ do δ, a, s) =  ⊥ otherwise Nondeterministic branch:  next(δ1 , a, s) if next(δ2 , a, s) = ⊥ or   next(δ2 , a, s) = next(δ1 , a, s) next(δ1 |δ2 , a, s) = next(δ  2 , a, s) if next(δ1 , a, s) = ⊥  ⊥ otherwise Nondeterministic choice of argument: 25 Language theoretic operations on programs. We can extend the SDConGolog language so as to close it with respect to language theoretic operations, such as union, intersection and difference/complementation. We can already see the nondeterministic branch construct as a union operator, and intersection and difference can be defined as follows: Intersection/synchronous concurrency:   next(δ1 , a, s) & next(δ2 , a, s) if both are different from ⊥ next(δ1 & δ2 , a, s) =  ⊥ otherwise by executing δ from s which can be extended until a Final Difference: next(δ1 − δ2 , a, s) =  configuration is reached:   next(δ1 , a, s) − next(δ2 , a, s) if both are different from ⊥ next(δ1 , a, s)   if next(δ2 , a, s) = ⊥ GR(δ, s) = {~a | ∃~b.Final(next ∗ (δ, ~a~b, s), do(~a~b, s))} ⊥ if next(δ1 , a, s) = ⊥ It is easy to see that CR(δ, s) ⊆ GR(δ, s) ⊆ RR(δ, s), i.e., complete runs are good runs, and good runs are indeed runs. Moreover, CR(δ, s) = CR(δ ′ , s) implies GR(δ, s) = GR(δ ′ , s), i.e., if two programs in a situation have the same complete runs, then they also have the same good runs; however they may still differ in their sets of non-good runs, since CR(δ, s) = CR(δ ′ , s) does not imply RR(δ, s) = RR(δ ′ , s). We say that a program δ in s is non-blocking iff RR(δ, s) = GR(δ, s), i.e., if all runs of the program δ in s can be extended to runs that reach a Final configuration. For these new constructs, Final is defined as follows: Final(δ1 & δ2 , s) ≡ Final(δ1 , s) ∧ Final(δ2 , s) Final(δ1 − δ2 , s) ≡ Final(δ1 , s) ∧ ¬Final(δ2 , s) We can express the complement of a program δ using difference as follows: (πa.a)∗ − δ. It is easy to check that Proposition 1 and Corollary 2 also hold for programs involving these new constructs. As we will see later, synchronous concurrency can be used to constrain/customize a process. Difference can be used to prohibit certain process behaviors: δ1 − δ2 is the process where δ1 is executed but δ2 is not. 
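Because situation-determined programs can be identified with their run sets, the language-theoretic constructs above have a direct set-based reading. The sketch below abstracts a program (in a fixed starting situation) by a prefix-closed set RR of partial runs and a set CR of complete runs; the run sets are invented toy data, and runs are tuples of action names.

def good_runs(RR, CR):
    # GR: partial runs that can be extended to a complete run
    return {r for r in RR if any(c[:len(r)] == r for c in CR)}

def non_blocking(RR, CR):
    return RR == good_runs(RR, CR)

def sync(p1, p2):
    # language-level reading of delta1 & delta2: each step and Final need both programs
    (RR1, CR1), (RR2, CR2) = p1, p2
    return RR1 & RR2, CR1 & CR2

def difference(p1, p2):
    # language-level reading of delta1 - delta2: execute delta1, but a run is
    # complete only where delta2 is not Final
    (RR1, CR1), (RR2, CR2) = p1, p2
    return RR1, CR1 - CR2

agent = ({(), ("use",), ("use", "discard"), ("use", "break")},
         {("use", "discard")})
supervisor = ({(), ("use",), ("use", "discard")},
              {("use", "discard")})

RRs, CRs = sync(agent, supervisor)
print(sorted(RRs), sorted(CRs))     # the 'break' run is filtered out
print(non_blocking(RRs, CRs))       # True: every partial run extends to a Final one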
To illustrate, consider an agent specified by program δS1 that repeatedly picks an available object and does anything to it provided it is broken at most once before it is discarded: The search construct. We can add to the language a search construct Σ, as in [De Giacomo et al., 1998]:   Σ(next(δ, a, s)) if there exists ~a s.t. Final(next ∗ (δ, a~a, s)) next(Σ(δ), a, s) =  ⊥ otherwise δS1 = [π x.Available(x)?; [π a.(a−(break(x) | discard(x)))]∗ ; (nil | (break(x)); [π a.(a−(break(x) | discard(x)))]∗ ); discard(x)]∗ F inal(Σ(δ), s) ≡ F inal(δ, s). Intuitively, next(Σ(δ), a, s) does lookahead to ensure that action a is in a good run of δ in s, otherwise it returns ⊥. Notice that: (i) RR(Σ(δ), s) = GR(Σ(δ), s), i.e., under the search construct all programs are non-blocking; (ii) RR(Σ(δ), s) = GR(δ, s), i.e., Σ(δ) produces exactly the good runs of δ; (iii) CR(Σ(δ), s) = CR(δ, s), i.e., Σ(δ) and δ produce exactly the same set of complete runs. Thus Σ(δ) trims the behavior of δ by eliminating all those runs that do not lead to a Final configuration. Note also that if a program is non-blocking in s, then RR(Σ(δ), s) = RR(δ, s), in which case there is no point in using the search construct. Finally, we have that: CR(δ, s) = CR(δ ′ , s) implies RR(Σ(δ), s) = RR(Σ(δ ′ ), s), i.e., if two programs have the same complete runs, then under the search construct they have exactly the same runs. Sequences of actions generated by programs. We can extend the function next to the function next ∗ (δ, ~a, s) that takes a program δ, a finite sequence of actions ~a,3 and a situation s, and returns the remaining program δ ′ after executing δ in s producing the sequence of actions ~a, defined by induction on the length of the sequence of actions as follows: next ∗ (δ, ǫ, s) = δ next ∗ (δ, a~a, s) = next ∗ (next(δ, a, s), ~a, do(a, s)) where ǫ denotes the empty sequence. Note that if along ~a the program becomes ⊥ then next ∗ returns ⊥ as well. We define the set RR(δ, s) of (partial) runs of a program δ in a situation s as the sequences of actions that can be produced by executing δ from s:4 4 Supervision Let us assume that we have two agents: an agent B with behavior represented by the program δB and a supervisor S with behavior represented by δS . While both are represented by programs, the roles of the two agents are quite distinct. The first is an agent B that acts freely within its space of deliberation represented by δB . The second, S, is supervising B so that as B acts, it remains within the behavior permitted by S. This role makes the program δS act as a specification of allowed behaviors for agent B. Note that, because of these different roles, one may want to assume that all configurations generated by (δS , s) are F inal, so that we leave B unconstrained on when it may terminate. This amounts to requiring the following property to hold: CR(δS , s) = GR(δS , s) = RR(δS , s). While reasonable, for the technical development below, we do not need to rely on this assumption. The behavior of B under the supervision of S is constrained so that at any point B can execute an action in its original behavior, only if such an action is also permitted in RR(δ, s) = {~a | next ∗ (δ, ~a, s) 6= ⊥} Note that if ~a ∈ RR(δ, s), then all prefixes of ~a are in RR(δ, s) as well. 
We define the set CR(δ, s) of complete runs of a program δ in a situation s as the sequences of actions that can be produced by executing δ from s until a Final configuration is reached: CR(δ, s) = {~a | Final(next ∗ (δ, ~a, s), do(~a, s))} We define the set GR(δ, s) of good runs of a program δ in a situation s as the sequences of actions that can be produced 3 Notice that such sequences of actions have to be axiomatized in second-order logic, similarly to situations (with UNA and domain closure). As a short cut they could also be characterized directly in terms of “difference” between situations. 4 Here and in what follows, we use set notation for readability; if we wanted to be very formal, we could introduce RR as a defined predicate, and similarly for CR, etc. 26 S’s behavior. Using the synchronous concurrency operator, this can be expressed simply as: Relaxed supervision. To define relaxed supervision we first need to introduce two operations on programs: projection and, based on it, relaxation. The projection operation takes a program and an action filter Au , and projects all the actions that satisfy the action filter (e.g., are uncontrollable), out of the execution. To do this, projection substitutes each occurrence of an atomic action term αi by a conditional statement that replaces it with the trivial test true? when Au (αi ) holds in the current situation, that is: δB & δS . Note that unless δB & δS happens to be non-blocking, it may get stuck in dead end configurations. To avoid this, we need to apply the search construct, getting Σ(δB & δS ). In general, the use of the search construct to avoid blocking, is always needed in the development below. We can use the example programs presented earlier to illustrate. The execution of δB1 under the supervision of δS1 is simply δB1 & δS1 (assuming all actions are controllable). It is straightforward to show that the resulting behavior is to repeatedly pick an available object and use it as long as one likes, breaking it at most once, and repairing it whenever it breaks, before discarding it. It can be shown that the set of partial/complete runs of δB1 & δS1 is exactly that of: pj (δ, Au ) = δ αi if Au (αi ) then true? else αi for every occurrence of an action term αi in δ. (Recall that such a test does not perform any transition in our variant of ConGolog.) The relaxation operation on δ wrt Au (a, s) is as follows: rl (δ, Au ) = pj (δ, Au )k(πa.Au (a)?; a)∗ . [π x.Available(x)?; use(x)∗ ; [nil | (break(x); repair(x); use(x)∗ )]; discard(x)]∗ In other words, we project out the actions in Au from δ and run the resulting program concurrently with one that picks (uncontrollable) actions filtered by Au and executes them. The resulting program no longer constrains the occurrence of actions from Au in any way. In fact, notice that the remaining program of (πa.Au (a)?; a)∗ after the execution of an (uncontrollable) filtered action is (πa.Au (a)?; a)∗ itself, and that such a program is always Final. Now we are ready to define relaxed supervision. Let us consider a supervisor S with behavior δS for agent B with behavior δB . Let the action filter Au (au , s) specify the uncontrollable actions. Then the relaxed supervision of S (for Au (au , s)) in s is the relaxation of δS so as that it allows every uncontrollable action, namely: rl (δS , Au ). So we can characterize the behavior of B under the relaxed supervision of S as: δB & rl (δS , Au ). The following properties are immediate consequences of the definitions: Uncontrollable actions. 
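A language-level reading of projection and relaxation can be stated directly, under the simplifying assumption that controllability does not depend on the situation: a run is permitted by rl(deltaS, Au) exactly when its controllable subsequence is a run of deltaS with the uncontrollable actions projected out. The Python sketch below uses toy run sets and illustrative action names.

def project(run, uncontrollable):
    # drop the uncontrollable actions, keeping the controllable ones in order
    return tuple(a for a in run if a not in uncontrollable)

def allowed_by_relaxation(run, supervisor_runs, uncontrollable):
    projected_supervisor = {project(r, uncontrollable) for r in supervisor_runs}
    return project(run, uncontrollable) in projected_supervisor

Au = {"break"}                                        # uncontrollable actions
supervisor_runs = {(), ("use",), ("use", "discard")}  # S itself never mentions 'break'

# The relaxation no longer constrains uncontrollable actions in any way:
print(allowed_by_relaxation(("use", "break", "discard"), supervisor_runs, Au))  # True
print(allowed_by_relaxation(("use", "use"), supervisor_runs, Au))               # False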
In the above, we implicitly assumed that all actions of agent B could be controlled by the supervisor S. This is often too strong an assumption, e.g. once we let a child out in a garden after rain, there is nothing we can do to prevent her/him from getting muddy. We now want to deal with such cases. Following [Wonham and Ramadge, 1987], we distinguish between actions that are uncontrollable by the supervisor and actions that are controllable. The supervisor can block the execution of the controllable actions but cannot prevent the supervised agent from executing the uncontrollable ones. To characterize the uncontrollable actions in the situation calculus, we use a special fluent Au (au , s), which we call an action filter, that expresses that action au is uncontrollable in situation s. Notice that, differently from the Wonham and Ramadge work, we allow controllability to be context dependent by allowing an arbitrary specification of the fluent Au (au , s) in the situation calculus. While we would like the supervisor S to constrain agent B so that δB & δS is executed, in reality, since S cannot prevent uncontrollable actions, S can only constrain B on the controllable actions. When this is sufficient, we say that the supervisor is “effective”. Technically, following again Wonham and Ramadge’s ideas, this can be captured by saying that the supervision by δS is effective for δB in situation s iff: Proposition 3 The relaxed supervision rl (δS , Au ) is effective for δB in situation s. Proposition 4 CR(δB & δS , s) ⊆ CR(δB & rl (δS , Au ), s). Proposition 5 If CR(δB & rl (δS , Au ), s) ⊆ CR(δB & δS , s), then δS is effective for δB in situation s. Notice that, the first one is what we wanted. But the second one says that rl (δS , Au ) may indeed by more permissive than δS : some complete runs that are disallowed in δS may be permitted by its relaxation rl (δS , Au ). This is not always acceptable. The last one, says that when the converse of Proposition 4 holds, we have that the original supervision δS is indeed effective for δB in situation s. Notice however that even if δS effective for δB in situation s, it may still be the case that CR(δB & rl (δS , Au ), s) ⊂ CR(δB & δS , s). ∀~aau .~a ∈ GR(δB & δS , s) and Au (au , do(~a, s)) implies if ~aau ∈ GR(δB , s) then ~aau ∈ GR(δS , s). What this says is that if we postfix a good run ~a for both B and S with an uncontrollable action au that is good for B, then this uncontrollable action au must also be good for S. By the way, notice that ~aau ∈ GR(δB , s) and ~aau ∈ GR(δS , s) together imply that ~aau ∈ GR(δB & δS , s). What about if such a property does not hold? We can take two orthogonal approaches: (i) relax δS so that it places no constraints on the uncontrollable actions; (ii) require that δS be indeed enforced, but disallow all those runs that prevent δS from being effective. We look at both approaches below. Maximal permissive supervisor. Next we study a more conservative approach: we require the supervision δS to be fulfilled, and for getting effectiveness we restrict it further. Interestingly, we show that there is a single maximal way of restricting the supervisor S so that it both fulfills δS and becomes effective. We call the resulting supervisor the maximal permissive supervisor. 27 are controllable), the supervisor S1 can only ensure that its constraints are satisfied if it forces B1 to discard an object as soon as it is broken and repaired. 
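The effectiveness condition above can be checked directly over finite sets of good runs. The following sketch (runs as tuples, Au assumed situation-independent, run sets invented) shows a supervisor that forbids the uncontrollable break failing the test.

def effective(GR_B, GR_S, GR_BS, Au):
    # for every good run of the supervised system and every uncontrollable action,
    # if the agent can do it, the supervisor must allow it too
    for run in GR_BS:
        for au in Au:
            extended = run + (au,)
            if extended in GR_B and extended not in GR_S:
                return False
    return True

Au = {"break"}
GR_B = {(), ("use",), ("use", "break"), ("use", "break", "repair"),
        ("use", "break", "repair", "discard"), ("use", "discard")}
GR_S = {(), ("use",), ("use", "discard")}        # supervisor simply forbids 'break'
GR_BS = {r for r in GR_B if r in GR_S}

print(effective(GR_B, GR_S, GR_BS, Au))          # False: after 'use', the agent can do
# the uncontrollable 'break', which S does not allow, so S is not effective here.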
This is what we get as maximal permissive supervisor mps(δB1 , δS1 , S0 ), whose set of partial/complete runs can be shown to be exactly that of: We start by introducing a new abstract program construct set(E) taking as argument a possibly infinite set E of sequences of actions, with next and Final defined as follows: ( set(E ′ ) with E ′ = {~a | a~a ∈ E} if E ′ 6= ∅ next(set(E), a, s) = ⊥ if E ′ = ∅ Final(set(E), s) ≡ (ǫ ∈ E) [π x.Available(x)?; use(x)∗ ; [nil | (break(x); repair(x))]; discard(x)]∗ Thus set(E) can be executed to produce any of the sequences of actions in E. Notice that for every program δ and situation s, we can define Eδ = CR(δ, s) such that CR(set(Eδ ), s) = CR(δ, s). The converse does not hold in general, i.e., there are abstract programs set(E) such that for all programs δ, not involving the set(·) construct, CR(set(Eδ ), s) 6= CR(δ, s). That is, the syntactic restrictions in ConGolog may not allow us to represent some possible sets of sequences of actions. With the set(E) construct at hand, following [Wonham and Ramadge, 1987], we may define the maximal permissive supervisor mps(δB , δS , s) of B with behavior δB by S with behavior δS in situation s, as: S mps(δB , δS , s) = set( E∈E E) where By the way, notice that (δB1 & rl (δS1 , Au )) instead is completely ineffective since it has exactly the runs as δB1 . Unfortunately, in general, mps(δB , δS , s) requires the use of the abstract program construct set(E), which can be expressed directly in ConGolog only if E is finite.5 For this reason the above characterization remains essentially mathematical. So next, we develop a new construct for execution of programs under maximal permissive supervision, which is indeed realizable. Maximal permissive supervised execution. To capture the notion of maximal permissive execution of agent B with behavior δB under the supervision of S with behavior δS in situation s, we introduce a special version of the synchronous concurrency construct that takes into account the fact the some actions are uncontrollable. Without loss of generality, we assume that δB and δS both start with a common controllable action (if not, it is trivial to add a dummy action in front of both so as to fullfil the requirement). Then, we characterize the construct through next and Final as follows: next(δB &Au δS , a, s) =  ⊥ if ¬Au (a, s) and ∃a~u .Au (a~u , do(a, s)) s.t.    next ∗ (Σ(δB ), aa~u , s) 6= ⊥ and next ∗ (Σ(δS ), aa~u , s) = ⊥ E = {E | E ⊆ CR(δB & δS , s) and set(E) is effective for δB in s} Intuitively mps denotes the maximal set of runs that are effectively allowable by a supervisor that fulfills the specification δS , and which can be left to the arbitrary decisions of the agent δB on the non-controllable actions. A quite interesting result is that, even in the general setting we are presenting, such a maximally permissive supervisor always exists and is unique. Indeed, we can show: Theorem 6 For the maximal permissive supervisor mps(δB , δS , s) the following properties hold: 1. mps(δB , δS , s) always exists and is unique; 2. mps(δB , δS , s) is an effective supervisor for δB in s; ⊥ if next(δB , a, s) = ⊥ or next(δS , a, s) = ⊥    otherwise next(δB , a, s) &Au next(δS , a, s) Here Au (a~u , s) is inductively defined on the length of a~u as the smallest predicate such that: (i) Au (ǫ, s) ≡ true; (ii) Au (au a~u , s) ≡ Au (au , s) ∧ Au (a~u , do(au , s)). Final for the new construct is as follows: 3. 
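For finite languages, mps can be computed by the usual supremal-controllable-sublanguage fixpoint of [Wonham and Ramadge, 1987], here phrased with the CR/GR sets of the text. The data below is a hand-built finite approximation of the deltaB1/deltaS1 scenario (one object, bounded reuse), not the programs themselves, and Au is assumed situation-independent.

def prefixes_of(language):
    return {run[:i] for run in language for i in range(len(run) + 1)}

def mps_language(CR_BS, GR_B, Au):
    K = set(CR_BS)
    while True:
        closure = prefixes_of(K)
        # remove complete runs having a prefix from which the agent can do an
        # uncontrollable action that the candidate supervisor cannot follow
        bad = {run for run in K
               if any(p + (au,) in GR_B and p + (au,) not in closure
                      for p in prefixes_of({run}) for au in Au)}
        if not bad:
            return K
        K -= bad

Au = {"break"}
CR_B = {("use", "discard"),
        ("use", "break", "repair", "discard"),
        ("use", "break", "repair", "use", "discard"),
        ("use", "break", "repair", "use", "break", "repair", "discard")}
GR_B = prefixes_of(CR_B)
CR_S = {("use", "discard"),                       # supervisor: at most one break
        ("use", "break", "repair", "discard"),
        ("use", "break", "repair", "use", "discard")}
CR_BS = CR_B & CR_S

print(sorted(mps_language(CR_BS, GR_B, Au)))
# The run that uses the object again after a repair is dropped: once it is used
# again, a second uncontrollable 'break' could not be prevented.  This mirrors
# the maximal permissive supervisor described for the example in the text.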
For every possible effective supervisor δ̂S for δB in s such that CR(δB & δ̂S , s) ⊆ CR(δB & δS , s), we have that CR(δB & δ̂S , s) ⊆ CR(δB & mps(δB , δS , s), s). Proof: We prove the three claims separately. Claim 1 follows directly from the fact set(∅) satisfies the conditions to be included in mps(δB , δS , s). Claim 3 also follows immediately from the definition of mps(δB , δS , s), by recalling that CR(δB & δ̂S , s) = CR(δB & set(Eδ̂S ), s). Final(δB &Au δS , s) ≡ Final(δB , s) ∧ Final(δS , s). This new construct captures exactly the maximal permissive supervisor; indeed the theorem below shows the correctness of maximal permissive supervised execution: Theorem 7 CR(δB &Au δS , s) = CR(δB & mps(δB , δS , s), s). Proof: We start by showing: For Claim 2, it suffices to show that ∀~aau .~a ∈ GR(δB & mps(δB , δS , s), s) and Au (au , do(~a, s)) we have that if ~aau ∈ GR(δB , s) then ~aau ∈ GR(mps(δB , δS , s), s). Indeed, if ~a ∈ GR(δB & mps(δB , δS , s), s) then there is an effective supervisor set(E) such that ~a ∈ GR(δB & set(E), δS , s), s). set(E) being effective for δB in s, if ~aau ∈ GR(δB , s) then ~aau ∈ GR(set(E), s), but then ~aau ∈ GR(mps(δB , δS , s), s). CR(δB &Au δS , s) ⊆ CR(δB & mps(δB , δS , s), s). It suffices to show that δB &Au δS is effective for δB in s. Indeed, if this is the case, by considering that δB & mps(δB , δS , s) is the largest effective supervisor for δB in s, and that RR(δB & (δB &Au δS ), s) = RR(δB &Au δS , s), we get the thesis. 5 Note that the object domain may be uncountable in general, hence not even an infinitary ConGolog program could capture set(E) in general. We can illustrate using our example programs. If we assume that the break action is uncontrollable (and the others 28 So we have to show that: ∀~aau .~a ∈ GR(δB &Au δS , s) and Au (au , do(~a, s)) we have that if ~aau ∈ GR(δB , s) then ~aau ∈ GR(δB &Au δS , s). Since, wlog we assume that δB and δS started with a common controllable action, we can write ~a = a~′ ac a~u , where ′ ¬Au (ac , do(a~′ , s)) and Au (a~u , do(a~′ ac , s)) holds. Let δB = ∗ ∗ ′ ′ ′ ′ ′ ~ ~ ~ next (δB , a , s), δS = next (δS , a , s), and s = do(a , s). By the fact that a~′ ac a~u ∈ GR(δB &Au δS , s) we know that ′ next(δB &Au δS′ , do(ac , s′ )) 6= ⊥. But then, by de definition of next, we have that for all b~u such that Au (b~u , s′ ) if ′ b~u ∈ GR(δB , do(ac , s′ )) then b~u ∈ GR(δS′ , do(ac , s′ )). In particular this holds for b~u = a~u au . Hence we have that if ~aau ∈ GR(δB , s) then ~aau ∈ GR(δS , s). runs that lead to Final configurations. We can ensure that an agent finds such executions by having it do lookahead/search. Also of interest is the case in which agents act boldly without necessarily performing search to get to Final configurations. In this case, we need to consider all partial runs, not just good ones. Note that this would actually yield the same result if we engineered the agent behavior such that all of its runs are good runs, i.e. if RR(δB , s) = GR(δB , s), i.e., all configurations are final. In fact, one could define a closure construct cl (δ) that would make all configurations of δ final. Using this, one can apply our specification of the maximal permissive supervisor to this case as well if we replace δB & δS by cl (δB & δS ) in the definition. Observe also, that under the assumption RR(δB , s) = GR(δB , s), in next(δB &Au δS , a, s) we no longer need to do the search Σ(δB ) and Σ(δS ) and can directly use δB and δS . 
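The check performed by the &Au construct can also be sketched at the level of good runs: a step is refused either because the agent or the supervisor cannot do it, or because it is controllable and some purely uncontrollable continuation would then be possible for the agent but not for the supervisor. The run sets below are illustrative, and Au is again taken to be situation-independent.

def uncontrollable_extensions(run, GR, Au):
    # all ways to extend 'run' within GR using only uncontrollable actions
    out, frontier = set(), {run}
    while frontier:
        step = {r + (au,) for r in frontier for au in Au if r + (au,) in GR}
        out |= step
        frontier = step
    return out

def step_allowed(run, a, GR_B, GR_S, Au):
    new = run + (a,)
    if new not in GR_B or new not in GR_S:
        return False
    if a in Au:                        # an uncontrollable step is never blocked further
        return True
    return all(ext in GR_S for ext in uncontrollable_extensions(new, GR_B, Au))

Au = {"break"}
GR_B = {(), ("use",), ("use", "break"), ("use", "break", "repair"),
        ("use", "break", "repair", "discard"), ("use", "discard"),
        ("use", "break", "repair", "use"), ("use", "break", "repair", "use", "break")}
GR_S = {r for r in GR_B if r.count("break") <= 1}    # supervisor: break at most once

print(step_allowed(("use", "break", "repair"), "use", GR_B, GR_S, Au))      # False
print(step_allowed(("use", "break", "repair"), "discard", GR_B, GR_S, Au))  # True
# Using the object again after a repair is refused, because the agent could then
# break it a second time uncontrollably; discarding it is allowed.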
We conclude by mentioning that if the object domain is finite, then ConGolog programs assume only a finite number of possible configurations. In this case, we can take advantage of the finite state machinery that was originally proposed by Wonham and Ramage (generalizing it to deal with situationdependent sets of controllable actions), and the recent work on translating ConGolog into finite state machines and back [Fritz et al., 2008], to obtain a program that actually characterizes the maximally permissive supervisor. In this way, we can completely avoid doing search during execution. We leave an exploration of this notable case for future work. Next we prove: CR(δB & mps(δB , δS , s), s) ⊆ CR(δB &Au δS , s). Suppose not. Then there exist a complete run ~a such that ~a ∈ CR(δB & mps(δB , δS , s), s) but ~a 6∈ CR(δB &Au δS , s). As an aside, notice that ~a ∈ CR(δ, s) then ~a ∈ GR(δ, s) and for all prefixes a~′ such that a~′~b = ~a we have a~′ ∈ GR(δ, s). Hence, let a~′ = a~′′ a such that a~′ ∈ GR(δB &Au δS , s) but ′′ ′′ ~′′ = next ∗ (δB , a , s), a~′′ a 6∈ GR(δB &Au δS , s), and let δB ∗ ′ ′′ δ = next (δS , a~′′ , s), and s = do(a~′′ , s). S Since a~′′ a 6∈ GR(δB &Au δS , s), it must be the case ′′ that next(δB &Au δS′′ , a, s′′ ) = ⊥. But then, consider′′ , a, s′′ ) 6= ⊥ and next(δS′′ , a, s′′ ) 6= ing that both next(δB ⊥, it must be the case that ¬Au (a, s′′ ) and exists b~u such ′′ that Au (b~u , do(a, s′′ )), and ab~u ∈ GR(δB , s′′ ) but ab~u 6∈ ′′ ′′ GR(δS , s ). Notice that b~u 6= ǫ, since we have that a ∈ GR(δS′′ , s′′ ). So b~u = c~u bu d~u with ac~u ∈ GR(δS′′ , s′′ ) but ac~u bu 6∈ GR(δS′′ , s′′ ). Now a~′ ∈ GR(δB & mps(δB , δS , s), s) and since Au (c~u bu , do(a~′ , s)), we have that a~′ c~u bu ∈ GR(δB & mps(δB , δS , s), s). Since, mps(δB , δS , s) is effective for δB in s, we have that, if a~′ c~′u bu ∈ GR(δB , s) then a~′ c~u bu ∈ GR(mps(δB , δS , s), s). This, by definition of mps(δB , δS , s), implies a~′ c~u bu ∈ GR(δB & δS , s), and hence, in turn, a~′ c~u bu ∈ GR(δS , s). Hence, we can conclude that ac~′u bu ∈ GR(δS′′ , s′′ ), getting a contradiction. 5 Acknowledgments We thank Murray Wonham for inspiring discussions on supremal controllable languages in finite state discrete event control, which actually made us look into agent supervision from a different and very fruitful point of view. We also thank the anonymous referees for their comments. We acknowledge the support of EU Project FP7-ICT ACSI (257593). References [Bertoli et al., 2010] Piergiorgio Bertoli, Marco Pistore, and Paolo Traverso. Automated composition of web services via planning in asynchronous domains. Artif. Intell., 174(3-4):316–361, 2010. [De Giacomo et al., 1998] Giuseppe De Giacomo, Raymond Reiter, and Mikhail Soutchanski. Execution monitoring of high-level robot programs. In KR, pages 453–465, 1998. [De Giacomo et al., 2000] Giuseppe De Giacomo, Yves Lespérance, and Hector J. Levesque. ConGolog, a concurrent programming language based on the situation calculus. Artificial Intelligence, 121(1–2):109–169, 2000. Conclusion In this paper, we have investigated agent supervision in situation-determined ConGolog programs. Our account of maximal permissive supervisor builds on [Wonham and Ramadge, 1987]. However, Wonham and Ramage’s work deals with finite state automata, while we handle infinite state systems in the context of the rich agent framework provided by the situation calculus and ConGolog. 
We used ConGolog as a representative of an unbounded-states process specification language, and it should be possible to adapt our account of supervision to other related languages. We considered a form of supervision that focuses on complete runs, i.e., [De Giacomo et al., 2010] Giuseppe De Giacomo, Yves Lespérance, and Adrian R. Pearce. Situation calculus based programs for representing and reasoning about game structures. In KR, 2010. [Demolombe and Hamon, 2002] Robert Demolombe and Erwan Hamon. What does it mean that an agent is performing a typical procedure? a formal definition in the situation calculus. In AAMAS, pages 905–911, 2002. 29 [Fritz and Gil, 2010] Christian Fritz and Yolanda Gil. Towards the integration of programming by demonstration and programming by instruction using Golog. In PAIR, 2010. [Fritz and McIlraith, 2006] Christian Fritz and Sheila McIlraith. Decision-theoretic Golog with qualitative preferences. In KR, pages 153–163, June 2–5 2006. [Fritz et al., 2008] Christian Fritz, Jorge A. Baier, and Sheila A. McIlraith. ConGolog, sin trans: Compiling ConGolog into basic action theories for planning and beyond. In KR, pages 600–610, 2008. [Lin et al., 2008] Naiwen Lin, Ugur Kuter, and Evren Sirin. Web service composition with user preferences. In ESWC, pages 629–643, 2008. [McIlraith and Son, 2002] S. McIlraith and T. Son. Adapting Golog for composition of semantic web services. In KR, pages 482–493, 2002. [Reiter, 2001] Ray Reiter. Knowledge in Action. Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, 2001. [Sardiña and De Giacomo, 2009] Sebastian Sardiña and Giuseppe De Giacomo. Composition of ConGolog programs. In IJCAI, pages 904–910, 2009. [Sohrabi et al., 2009] Shirin Sohrabi, Nataliya Prokoshyna, and Sheila A. McIlraith. Web service composition via the customization of Golog programs with user preferences. In Conceptual Modeling: Foundations and Applications, pages 319–334. Springer, 2009. [Su, 2008] Jianwen Su. Special issue on semantic web services: Composition and analysis. IEEE Data Eng. Bull., 31(3), 2008. [Wonham and Ramadge, 1987] WM Wonham and PJ Ramadge. On the supremal controllable sub-language of a given language. SIAM J Contr Optim, 25(3):637659, 1987. 30 On the Use of Epistemic Ordering Functions as Decision Criteria for Automated and Assisted Belief Revision in SNePS (Preliminary Report) Ari I. Fogel and Stuart C. Shapiro University at Buffalo, The State University of New York, Buffalo, NY {arifogel,shapiro}@buffalo.edu Abstract to give up, but this has to be decided by some other means. What makes things more complicated is that beliefs in a database have logical consequences. So when giving up a belief you have to decide as well which of its consequences to retain and which to retract... [Gärdenfors and Rott, 1995] In later sections we will discuss in detail how to make a choice of belief(s) to retract when presented with an inconsistent belief set. We implement belief revision in SNePS based on a user-supplied epistemic ordering of propositions. We provide a decision procedure that performs revision completely automatically when given a well preorder. We also provide a decision procedure for revision that, when given a total preorder, simulates a well preorder by making a minimal number of queries to the user when multiple propositions within a minimally-inconsistent set are minimally-epistemically-entrenched. 
The first procedure uses O(|Σ|) units of space, and completes within O(|Σ|² · s_max) units of time, where Σ is the set of distinct minimally-inconsistent sets, and s_max is the number of propositions in the largest minimally-inconsistent set. The second procedure uses O(|Σ|² · s_max²) space and O(|Σ|² · s_max²) time. We demonstrate how our changes generalize previous techniques employed in SNePS.

1 Introduction

1.1 Belief Revision

Several varieties of belief revision have appeared in the literature over the years. AGM revision typically refers to the addition of a belief to a belief set, at the expense of its negation and of any other beliefs supporting its negation [Alchourron et al., 1985]. Removal of a belief and of the beliefs that support it is called contraction. Alternatively, revision can refer to the process of resolving inconsistencies in a contradictory knowledge base, or one known to be inconsistent [Martins and Shapiro, 1988]. This is accomplished by removing one or more beliefs responsible for the inconsistency, or culprits. This is the task with which we are concerned. In particular, we have devised a means of automatically resolving inconsistencies by discarding the least-preferred beliefs in a belief base, according to some epistemic ordering [Gärdenfors, 1988; Williams, 1994; Gärdenfors and Rott, 1995].

AGM Paradigm
In [Gärdenfors, 1982; Alchourron et al., 1985], operators and rationality postulates for theory change are discussed. In general, any operator that satisfies those postulates may be thought of as an AGM operation. Let Cn(A) refer to the closure under logical consequence of a set of propositions A. A theory is defined to be a set of propositions closed under logical consequence; thus for any set of propositions A, Cn(A) is a theory. It is worth noting that theories are infinite sets. [Alchourron et al., 1985] discusses operations that may be performed on theories. Partial meet contraction and revision, defined in [Alchourron et al., 1985], satisfy all of the postulates for a rational contraction and revision operator, respectively.

Theory Change on Finite Bases
It is widely accepted that agents, because of their limited resources, believe some but by no means all of the logical consequences of their beliefs. [Lakemeyer, 1991] A major issue with the AGM paradigm is that it tends to operate on and produce infinite sets (theories). A more practical model would include operations to be performed on finite belief sets, or belief bases; such operators would be useful in supporting computer-based implementations of revision systems [Williams, 1994]. It has been argued that the AGM paradigm uses a coherentist approach [Gärdenfors, 1989], in that all beliefs require some sort of external justification (it has also been argued otherwise [Hansson and Olsson, 1999]). On the other hand, finite-base systems are said to use a foundationalist approach, wherein some beliefs indeed have their own epistemic standing, and others can be derived from them. SNePS, as we shall see, uses the finite-base foundationalist approach. The problem of belief revision is that logical considerations alone do not tell you which beliefs . . .

Epistemic Entrenchment
Let us assume that the decision on which beliefs to retract from a belief base is based on the relative importance of each belief, which is called its degree of epistemic entrenchment [Gärdenfors, 1988]. Then we need an ordering ≤ with which to compare the entrenchment of individual beliefs.
Beliefs that are less entrenched are preferentially discarded during revision over beliefs that are more entrenched. An epistemic entrenchment ordering is used to uniquely determine the result of AGM contraction. Such an ordering is a noncircular total preorder (that satisfies certain other postulates) on all propositions.

Ensconcements
Ensconcements, introduced in [Williams, 1994], consist of a set of formulae together with a total preorder on that set. They can be used to construct epistemic entrenchment orderings, and to determine theory base change operators.

Safe Contraction
In [Alchourron and Makinson, 1985], the operation safe contraction is introduced. Let < be a non-circular relation over a belief set A. An element a of A is safe with respect to x iff a is not a minimal element of any minimal subset B of A such that x ∈ Cn(B). Let A/x be the set of all elements of A that are safe with respect to x. Then the safe contraction of A by x, denoted A −s x, is defined to be A ∩ Cn(A/x).

Kernel Contraction
In [Hansson, 1994], the operation kernel contraction is introduced. A kernel set A ⊥⊥ α is defined to be the set of all minimal subsets of A that imply α. A kernel set is like a set of origin sets from [Martins and Shapiro, 1988]. Let σ be an incision function for A. Then for all α, σ(A ⊥⊥ α) ⊆ ∪(A ⊥⊥ α), and if ∅ ≠ X ∈ A ⊥⊥ α, then X ∩ σ(A ⊥⊥ α) ≠ ∅. The kernel contraction of A by α based on σ, denoted A ∼σ α, is equal to A \ σ(A ⊥⊥ α).

Prioritized Versus Non-Prioritized Belief Revision
In the AGM model of belief revision [Alchourron et al., 1985] . . . the input sentence is always accepted. This is clearly an unrealistic feature, and . . . several models of belief change have been proposed in which no absolute priority is assigned to the new information due to its novelty. . . . One way to construct non-prioritized belief revision is to base it on the following two-step process: First we decide whether to accept or reject the input. After that, if the input was accepted, it is incorporated into the belief state [Hansson, 1999]. Hansson goes on to describe several other models of nonprioritized belief revision, but they all have one unifying feature distinguishing them from prioritized belief revision: the input, i.e. the RHS argument to the revision operator, is not always accepted. To reiterate: prioritized belief revision is revision in which the proposition by which the set is revised is always present in the result (as long as it is not a contradiction); non-prioritized belief revision is revision in which the RHS argument to the revision operator is not always present in the result (even if it is not a contradiction). The closest approximation from Hansson's work to our work is the operation of semi-revision [Hansson, 1997]. Semi-revision is a type of non-prioritized belief revision that may be applied to belief bases.
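A small sketch of the safe part and of kernel contraction on a toy base may help. Entailment here is a deliberately crude closure under two listed modus-ponens instances, and the numeric ranks stand in for the ordering, so this illustrates the kernel/safe bookkeeping rather than the full definitions (safe contraction proper is A ∩ Cn(A/x)); all data is invented.

from itertools import chain, combinations

A = {"p", "q", "p->r", "q->r"}
RULES = {("p", "p->r"): "r", ("q", "q->r"): "r"}
rank = {"p": 0, "q": 1, "p->r": 2, "q->r": 2}     # lower = less entrenched

def entails(subset, x):
    facts, changed = set(subset), True
    while changed:
        changed = False
        for prem, concl in RULES.items():
            if set(prem) <= facts and concl not in facts:
                facts.add(concl); changed = True
    return x in facts

def kernels(A, x):
    # all minimal subsets of A that imply x (the kernel set A ⊥⊥ x)
    subs = chain.from_iterable(combinations(A, n) for n in range(len(A) + 1))
    ks = [set(s) for s in subs if entails(s, x)]
    return [k for k in ks if not any(j < k for j in ks)]

def safe_part(A, x):
    # A/x: elements that are not minimal (by rank) in any kernel
    unsafe = {b for K in kernels(A, x) for b in K if rank[b] == min(rank[k] for k in K)}
    return A - unsafe

def kernel_contraction(A, x):
    # incision: cut one least-entrenched element from every kernel
    incision = {min(K, key=rank.get) for K in kernels(A, x)}
    return A - incision

print(sorted(safe_part(A, "r")))           # ['p->r', 'q->r']: p and q are unsafe
print(sorted(kernel_contraction(A, "r")))  # ['p->r', 'q->r']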
Belief Change in SNePS Every belief in a SNePS knowledge base (which consists of a belief base and all currently-known propositions derived therefrom) has one or more support sets, each of which consists of an origin tag and an origin set. The origin tag identifies a belief as either being introduced as a hypothesis, or derived (note that it is possible for a belief to be both introduced as a hypothesis and derived from other beliefs). The origin set contains those hypotheses that were used to derive the belief. In the case of the origin tag denoting a hypothesis, the corresponding origin set would be a singleton set containing only the belief itself. The contents of the origin set of a derived belief are computed by the implemented rules of inference at the time the inference is drawn [Martins and Shapiro, 1988; Shapiro, 1992]. The representation of beliefs in SNePS lends itself well to the creation of processes for contraction and revision. Specifically, in order to contract a belief, one must merely remove at least one hypothesis from each of its origin sets. Similarly, prioritized revision by a belief b (where ¬b is already believed) is accomplished by removing at least one belief from each origin set of ¬b. Non-prioritized belief revision under this paradigm is a bit more complicated. We discuss both types of revision in more detail in §2.

Kernel Contraction In [Hansson, 1994], the operation kernel contraction is introduced. A kernel set A ⊥⊥ α is defined to be the set of all minimal subsets of A that imply α. A kernel set is like a set of origin sets from [Martins and Shapiro, 1988]. Let σ be an incision function for A. Then for all α, σ(A ⊥⊥ α) ⊆ ∪(A ⊥⊥ α), and if ∅ ≠ X ∈ A ⊥⊥ α, then X ∩ σ(A ⊥⊥ α) ≠ ∅. The kernel contraction of A by α based on σ, denoted A ∼σ α, is equal to A \ σ(A ⊥⊥ α).

Prioritized Versus Non-Prioritized Belief Revision
In the AGM model of belief revision [Alchourron et al., 1985] ... the input sentence is always accepted. This is clearly an unrealistic feature, and ... several models of belief change have been proposed in which no absolute priority is assigned to the new information due to its novelty. ... One

That is, a proposition asserted by a believe action takes priority over any other proposition. When either both or neither of the propositions being compared have been asserted by the believe action, then we use the same ordering as we would for nonprioritized revision.

SNeBR SNeBR, the SNePS Belief Revision subsystem, is responsible for resolving inconsistencies in the knowledge base as they are discovered. In the current release of SNePS (version 2.7.1), SNeBR is able to automatically resolve contradictions under a limited variety of circumstances [Shapiro and The SNePS Implementation Group, 2010, 76]. Otherwise "assisted culprit choosing" is performed, where the user must manually select culprits for removal. After belief revision is performed, the knowledge base might still be inconsistent, but every known derivation of an inconsistency has been eliminated.
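As a concrete illustration of the kernel-contraction definition above, here is a small Python sketch under simplifying assumptions: beliefs are opaque objects, logical consequence is supplied from outside as a callable implies(subset, alpha), and the incision function simply cuts one least-entrenched belief from every kernel. The names and the brute-force subset enumeration are ours, for exposition only; they are not taken from [Hansson, 1994] or from the SNePS code.

from itertools import combinations

def kernel_set(A, alpha, implies):
    # All minimal subsets of A that imply alpha (Hansson's kernel set).
    # Brute force over subsets, so only suitable for tiny belief bases.
    kernels = []
    for r in range(1, len(A) + 1):
        for subset in combinations(A, r):
            s = set(subset)
            if implies(s, alpha) and not any(k < s for k in kernels):
                kernels.append(s)
    return kernels

def incision(kernels, entrenchment):
    # A sample incision function: pick one least-entrenched belief per kernel,
    # so the cut intersects every non-empty kernel, as required.
    return {min(k, key=entrenchment) for k in kernels}

def kernel_contraction(A, alpha, implies, entrenchment):
    # The kernel contraction of A by alpha based on the incision above.
    return set(A) - incision(kernel_set(A, alpha, implies), entrenchment)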
2 New Belief Revision Algorithms

2.1 Problem Statement

Nonprioritized Belief Revision Suppose we have a knowledge base that is not known to be inconsistent, and suppose that at some point we add a contradictory belief to that knowledge base. Either that new belief directly contradicts an existing belief, or we derive a belief that directly contradicts an existing one as a result of performing forward and/or backward inference on the new belief. Now the knowledge base is known to be inconsistent. We will refer to the contradictory beliefs as p and ¬p. Since SNePS tags each belief with one or more origin sets, or sets of supporting hypotheses, we can identify the underlying beliefs that support each of the two contradictory beliefs. In the case where p and ¬p each have one origin set, OS_p and OS_¬p respectively, we may resolve the contradiction by removing at least one hypothesis from OS_p ∪ OS_¬p. We shall refer to such a union as a no-good. If there are m origin sets for p, and n origin sets for ¬p, then there will be at most m × n distinct no-goods (some unions may be duplicates of others). To resolve a contradiction in this case, we must retract at least one hypothesis from each no-good (Sufficiency). We wish to devise an algorithm that will select the hypotheses for removal from the set of no-goods. The first priority will be that the hypotheses selected should be minimally epistemically entrenched (Minimal Entrenchment) according to some total preorder ≤. Note that we are not referring strictly to an AGM entrenchment order, but to a total preorder on the set of hypotheses, without regard to the AGM postulates. The second priority will be not to remove any more hypotheses than are necessary in order to resolve the contradiction (Information Preservation), while still satisfying priority one.

2.2 Common Requirements for a Rational Belief Revision Algorithm

Primary Requirements The inputs to the algorithm are:
• A set of formulae Φ: the current belief base, which is known to be inconsistent
• A total preorder ≤ on Φ: an epistemic entrenchment ordering that can be used to compare the relative desirability of each belief in the current belief base
• Minimally-inconsistent sets of formulae σ1, . . . , σn, each of which is a subset of Φ: the no-goods
• A set Σ = {σ1, . . . , σn}: the set of all the no-goods

The algorithm should produce a set T that satisfies the following conditions:
(EE_SNePS 1) ∀σ[σ ∈ Σ → ∃τ[τ ∈ (T ∩ σ)]] (Sufficiency)
(EE_SNePS 2) ∀τ[τ ∈ T → ∃σ[σ ∈ Σ ∧ τ ∈ σ ∧ ∀w[w ∈ σ → τ ≤ w]]] (Minimal Entrenchment)
(EE_SNePS 3) ∀T′[T′ ⊂ T → ¬∀σ[σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)]]] (Information Preservation)

Condition (EE_SNePS 1) states that T contains at least one formula from each set in Σ. Condition (EE_SNePS 2) states that every formula in T is a minimally-entrenched formula of some set in Σ. Condition (EE_SNePS 3) states that if any formula is removed from T, then Condition (EE_SNePS 1) will no longer hold. In addition to the above conditions, our algorithm must terminate on all possible inputs, i.e. it must be a decision procedure.

Supplementary Requirement In any case where queries must be made of the user in order to determine the relative epistemic ordering of propositions, the number of such queries must be kept to a minimum.

2.3 Implementation

We present algorithms to solve the problem as stated. Where we refer to ≤ below, we are using the prioritized entrenchment ordering from §2.1.
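The three output conditions can be checked mechanically for any candidate culprit set. The following Python sketch is nothing more than an executable restatement of (EE_SNePS 1) through (EE_SNePS 3), assuming that no-goods are given as Python sets and that the entrenchment preorder is supplied as a boolean function leq(a, b); it is offered for clarity and is not part of the algorithms themselves.

from itertools import combinations

def sufficiency(T, Sigma):
    # (EE_SNePS 1): T contains at least one formula from each no-good.
    return all(T & sigma for sigma in Sigma)

def minimal_entrenchment(T, Sigma, leq):
    # (EE_SNePS 2): every culprit is a minimally-entrenched member of
    # some no-good that contains it.
    return all(any(tau in sigma and all(leq(tau, w) for w in sigma)
                   for sigma in Sigma)
               for tau in T)

def information_preservation(T, Sigma):
    # (EE_SNePS 3): no proper subset of T still satisfies Sufficiency.
    return not any(sufficiency(set(sub), Sigma)
                   for r in range(len(T))
                   for sub in combinations(T, r))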
In the case of nonprioritized revision we may assume that P “ H Prioritized Belief Revision The process of Prioritized Belief Revision in SNePS occurs when a contradiction is discovered after a belief is asserted explicitly using the believe act of SNeRE. The major difference here is that a subtle change is made to the entrenchment ordering ď. If ďnonpri is the ordering used for nonprioritized belief revision, then for prioritized belief revision we use an ordering ď pri as follows: Let P be the set of beliefs asserted by a believe action. Then @e1 , e2 re1 P P ^ e2 R P Ñ ␣pe1 ď pri e2 q ^ e2 ď pri e1 s @e1 , e2 re1 R P ^ e2 R P Ñ pe1 ď pri e2 Ø e1 ďnonpri e2 qs @e1 , e2 re1 P P ^ e2 P P Ñ pe1 ď pri e2 Ø e1 ďnonpri e2 qs Using a well preorder Let tď be the output of a function f whose input is a total preorder ď, such that tď Ďď The idea is that f creates the well preorder tď from ď by removing some pairs from the total preorder ď. Note that in the case where ď is already a well preorder, tď “ď. Then we may use Algorithm 1 to solve the problem. Algorithm 1 Algorithm to compute T given a well preorder 33 Input: Σ, tď Output: T 1: T ð H 2: for all pσ P Σq do 3: Move minimally entrenched belief in σ to first position in σ , using tď as a comparator 4: end for 5: Sort elements of Σ into descending order of the values of the first element in each σ using tď as a comparator 6: AddLoop : 7: while pΣ ‰ Hq do 8: currentCulprit ð σ11 9: T ð T Y tcurrentCulpritu 10: DeleteLoop : 11: for all pσcurrent P Σq do 12: if pcurrentCulprit P σcurrent q then 13: Σ ð Σzσcurrent 14: end if 15: end for 16: end while 17: return T 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: Using a total preorder Unfortunately it is easy to conceive of a situation in which the supplied entrenchment ordering is a total preorder, but not a well preorder. For instance, let us say that, when reasoning about a changing world, propositional fluents (propositions that are only true of a specific time or situation) are abandoned over non-fluent propositions. It is not clear then how we should rank two distinct propositional fluents, nor how to rank two distinct non-fluent propositions. If we can arbitrarily specify a well preorder that is a subset of the total preorder we are given, then algorithm 1 will be suitable. Otherwise, we can simulate a well order t through an iterative construction by querying the user for the unique minimally-entrenched proposition of a particular set of propositions at appropriate times in the belief-revision process. Algorithm 2 accomplishes just this. other no-good via an lσ j ,(1 ď j ď |Σ| , i ‰ j)) then T ð T Y tpu for all pσcurrent P Σq do if pp P σcurrent q then Σ ð Σzσcurrent end if end for if pΣ “ Hq then return T end if end if end for Modi f yLoop: for all pσ P Σq do if (σ has multiple minimally-entrenched propositions) then query which proposition l of the minimallyentrenched propostions is least desired. Modify ď so that l is strictly less entrenched than those other propositions. break out of Modi f yLoop end if end for end loop Characterization These algorithms perform an operation similar to incision functions [Hansson, 1994], since they select one or more propositions to be removed from each minimally-inconsistent set. Their output seems analogous to σ pΦápp ^ ␣pqq, where σ is the incision function, á is the kernel-set operator from [Hansson, 1994], and p is a proposition. But we are actually incising Σ, the set of known no-goods. 
The known no-goods are of course a subset of all no-goods, i.e. Σ Ď Φápp ^ ␣pq. This happens because SNeBR resolves contradictions as soon as they are discovered, rather than performing inference first to discover all possible sources of contradictions. The type of contraction eventually performed is similar to safe contraction [Alchourron and Makinson, 1985], except that there are fewer restrictions on our epistemic ordering. Algorithm 2 Algorithm to compute T given a total preorder 3 Analysis of Algorithm 1 Input: Σ, ď Output: T 1: T ð H 2: MainLoop: 3: loop 4: ListLoop: 5: for all pσi P Σ, 1 ď i ď |Σ|q do 6: Make a list lσi of all minimally-entrenched propositions, i.e. propositions that are not strictly more entrenched than any other, among those in σi , using ď as a comparator. 7: end for 8: RemoveLoop: 9: for all (σi P Σ, 1 ď i ď |Σ|) do 10: if (According to lσi , σ has exactly one minimallyentrenched proposition p AND the other propositions in σi are not minimally-entrenched in any 3.1 Proofs of Satisfaction of Requirements by Algorithm 1 We show that Algorithm 1 satisfies the requirements established in section 2: pEESNePS 1q (Sufficiency) During each iteration of AddLoop an element τ is added to T from some σ P Σ. Then each set σ P Σ containing τ is removed from Σ. The process is repeated until Σ is empty. Therefore each removed set σ in Σ contains some τ in T (Note that each σ will be removed from Σ by the end of the process). So @σ rσ P Σ Ñ Dτ rτ P pT X σ qs. Q.E.D. pEESNePS 2q (Minimal Entrenchment) From lines 8-9, we see that T is comprised solely of first elements of sets in Σ. And from lines 2-4, we see that those first elements are all minimal under tď relative to the other 34 largest σ in Σ, then lines 2-4 will take Op|Σ| ¨ smax q time. In line 5, we sort the no-goods’ positions in Σ using their first elements as keys. This takes Op|Σ| ¨ logp|Σ|qq time. Lines 716 iterate through the elements of Σ at most once for each element in Σ. During each such iteration, a search is performed for an element within a no-good. Also, during each iteration through all the no-goods, at least one σ is removed, though this does not help asymptotically. Since the no-goods are not sorted, the search takes linear time in smax . So lines 7-16 take Op|Σ|2 ¨ smax q time. Therefore, the running time is Op|Σ|2 ¨ smax q time. Note that the situation changes slightly if we sort the no-goods instead of just placing the minimally-entrenched proposition at the front, as in lines 2-4. In this case, each search through a no-good will take Oplogpsmax qq time, yielding a new total time of Op|Σ| ¨ smax ¨ logpsmax q ` |Σ|2 ¨ logpsmax qq. elements in each set. Since @e1 , e2 , ď re1 tď e2 Ñ e1 ď e2 s, those first elements are minimal under ď as well. That is, @τ rτ P T Ñ Dσ rσ P Σ^ τ P σ ^@wrw P σ Ñ τ ď wsss. Q.E.D. pEESNePS 3q (Information Preservation) From the previous proof we see that during each iteration of AddLoop, we are guaranteed that at least one set σ containing the current culprit is removed from Σ. And we know that the current culprit for that iteration is minimally-entrenched in σ . We also know from pEESNePS 2q that each subsequently chosen culprit will be minimally entrenched in some set. From lines 2-5 and AddLoop, we know that subsequently chosen culprits will be less entrenched than the current culprit. From lines 2-5, we also see that all the other elements in σ have higher entrenchment than the current culprit. Therefore subsequent culprits cannot be elements in σ . 
So, they cannot be used to eliminate σ. Obviously, previous culprits were also not members of σ. Therefore, if we exclude the current culprit from T, then there will be a set in Σ that does not contain any element of T. That is,
∀T′[T′ ⊂ T → ∃σ[σ ∈ Σ ∧ ¬∃τ[τ ∈ (T′ ∩ σ)]]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬¬(σ ∈ Σ ∧ ¬∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬(¬(σ ∈ Σ) ∨ ∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬(σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ¬∀σ[σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)]]]
Q.E.D.

4 Analysis of Algorithm 2

4.1 Proofs of Satisfaction of Requirements by Algorithm 2

We show that Algorithm 2 satisfies the requirements established in section 2:

(EE_SNePS 1) (Sufficiency) Since every set of propositions must contain at least one proposition that is minimally entrenched, at least one proposition is added to the list in each iteration of ListLoop. In the worst case, assume that for each iteration of MainLoop, only one of RemoveLoop and ModifyLoop does any work. We know that at least this much work is done for the following reasons: if ModifyLoop cannot operate on any no-good during an iteration of MainLoop, then all no-goods have only one minimally-entrenched proposition. So either RemoveLoop's condition at line 10 would hold, or:
1. A no-good has multiple minimally-entrenched propositions, causing ModifyLoop to do work. This contradicts our assumption that ModifyLoop could not do any work during this iteration of MainLoop, so we set this possibility aside.
2. Some proposition p1 is a non-minimally-entrenched proposition in some no-good σ_n, and a minimally-entrenched one in another no-good σ_m. In this case, either p1 is removed during the iteration of RemoveLoop where σ_m is considered, or there is another proposition p2 in σ_m that is not minimally-entrenched in σ_m, but is in σ_m1. This chaining must eventually terminate at a no-good σ_m_final since ≤ is transitive. And the final proposition in the chain p_final must be the sole minimally-entrenched proposition in σ_m_final, since otherwise ModifyLoop would have been able to do work for this iteration of MainLoop, which is a contradiction.
ModifyLoop can only do work once for each no-good, so eventually its work is finished. If ModifyLoop has no more work left to do, then RemoveLoop must do work at least once for each iteration of MainLoop. And in doing so, it will create a list of culprits of which each no-good contains at least one. Q.E.D.

Decidability We see that DeleteLoop is executed once for each element in Σ, which is a finite set. So it always terminates. We see that AddLoop terminates when Σ is empty. And from lines 8 and 13 we see that at least one set is removed from Σ during each iteration of AddLoop. So AddLoop always terminates. Lines 2-4 involve finding a minimum element, which is a decision procedure. Line 5 performs sorting, which is also a decision procedure. Since every portion of Algorithm 1 always terminates, it is a decision procedure. Q.E.D.

Supplementary Requirement Algorithm 1 is a fully-automated procedure that makes no queries of the user. Q.E.D.

3.2 Complexity of Algorithm 1

Space Complexity Algorithm 1 can be run completely in-place, i.e. it can use only the memory allocated to the input, with the exception of the production of the set of culprits T. Let us assume that the space needed to store a single proposition is O(1) memory units. Since we only need to remove one proposition from each no-good to restore consistency, Algorithm 1 uses O(|Σ|) memory units.
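Before turning to the time complexity, it may help to see Algorithm 1 in executable form. The Python sketch below mirrors the pseudocode of §2.3 under two assumptions of ours: the well preorder is supplied as a key function rank (a smaller rank means less entrenched), and no-goods are ordinary Python collections. It is an illustration, not the SNeBR implementation.

def algorithm1(Sigma, rank):
    # Lines 2-4: identify the minimally entrenched belief of each no-good.
    nogoods = [(min(sigma, key=rank), set(sigma)) for sigma in Sigma]
    # Line 5: sort the no-goods into descending order of their minimal element.
    nogoods.sort(key=lambda pair: rank(pair[0]), reverse=True)
    T = set()
    # AddLoop and DeleteLoop (lines 7-16): take the first no-good's minimal
    # belief as the current culprit and discard every no-good containing it.
    while nogoods:
        culprit = nogoods[0][0]
        T.add(culprit)
        nogoods = [ng for ng in nogoods if culprit not in ng[1]]
    return T

# Hypothetical example with numeric entrenchment ranks (lower = less entrenched).
ranks = {"a": 0, "b": 1, "c": 2, "d": 3}
print(algorithm1([{"a", "c"}, {"b", "d"}, {"a", "d"}], ranks.get))
# -> {'a', 'b'}: b eliminates the second no-good, a eliminates the other two.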
Time Complexity The analysis for time complexity is based on a sequentialprocesing system. Let us assume that we implement lists as array structures. Let us assume that we may determine the size of an array in Op1q time. Let us also assume that performing a comparison using tď takes Op1q time. Then in lines 2-4, for each array σ P Σ we find the minimum element σ and perform a swap on two elements at most once for each element in σ . If we let smax be the cardinality of the pEESNePS 2q (Minimal Entrenchment) Since propositions are only added to T when the condition in line 10 is satisfied, it is guaranteed that every proposition in 35 T is a minimally-entrenched proposition in some no-good σ . Op|Σ|q time. We noted earlier that during each iteration of MainLoop, RemoveLoop or Modi f yLoop will do work. In the worst case, only one will do work each time. And they each may do work at most |Σ| times. So the total running time for the procedure is Op|Σ|2 ¨ s2max q. pEESNePS 3q (Information Preservation) From line 10, we see that when a proposition p is removed, none of the other propositions in its no-good are minimallyentrenched in any other no-good. That means none of the other propositions could be a candidate for removal. So, the only way to remove the no-good in which p appears is by removing p. So if p were not removed, then pEESNePS 1q would not be satisfied. Q.E.D. 5 Annotated Demonstrations A significant feature of our work is that it generalizes previous published work on belief revision in SNePS [Johnson and Shapiro, 1999; Shapiro and Johnson, 2000; Shapiro and Kandefer, 2005]. The following demonstrations showcase the new features we have introduced to SNeBR, and capture the essence of belief revision as seen in the papers mentioned above by using well-specified epistemic ordering functions. The demos have been edited for formatting and clarity. The commands br-tie-mode auto and br-tie-mode manual indicate that Algorithm 1 and Algorithm 2 should be used respectively. A wff is a well-formed formula. A wff followed by a period (.) indicates that the wff should be asserted, i.e. added to the knowledge base. A wff followed by an exclamation point (!) indicates that the wff should be asserted, and that forward inference should be performed on it. Decidability ListLoop creates lists of minimal elements of lists. This is a decision procedure since the comparator is a total preorder. From the proof of pEESNePS 1q above, we see that either RemoveLoop or Modi f yLoop must do work for each iteration of MainLoop. Modi f yLoop cannot operate more than once on the same no-good, because there are no longer multiple minimally-entrenched propositions in the no-good after it does its work. Nor can RemoveLoop operate twice on the same no-good, since the no-good is removed when Modi f yLoop does work. So, eventually Modi f yLoop has no more work to do, and at that point RemoveLoop will remove at least one no-good for each iteration of MainLoop. By lines 17-18, when the last no-good is removed, the procedure terminates. So it always terminates. Q.E.D. Says Who? We present a demonstration on how the source-credibilitybased revision behavior from [Shapiro and Johnson, 2000] is generalized by our changes to SNeBR. The knowledge base in the demo is taken from [Johnson and Shapiro, 1999]. In the following example, the command set-order source sets the epistemic ordering used by SNeBR to be a lisp function that compares two propositions based on the relative credibility of their sources. 
Unsourced propositions are assumed to have maximal credibility. The sources, as well as their relative credibility are represented as meta-knowledge in the SNePS knowledge base. This was also done in [Johnson and Shapiro, 1999] and [Shapiro and Johnson, 2000]. The source function makes SNePSLOG queries to determine sources of propositions and credibility of sources, using the askwh and ask commands [Shapiro and The SNePS Implementation Group, 2010]. This allows it to perform inference in making determinations about sources. Here we see that the nerd and the sexist make the generalizations that all jocks are not smart and all females are not smart respectively, while the holy book and the professor state that all old people are smart, and all grad students are smart respectively. Since Fran is an old female jock graduate student, there are two sources that would claim she is smart, and two that would claim she is not, which is a contradiction. Supplementary Requirement RemoveLoop attempts to compute T each time it is run from MainLoop. If the procedure does not terminate within RemoveLoop, then we run Modi f yLoop on at most one nogood. Afterwards, we run RemoveLoop again. Since the user is only queried when the procedure cannot automatically determine any propositions to remove, we argue that this means minimal queries are made of the user. Q.E.D. 4.2 Complexity of Algorithm 2 Space Complexity As before, let smax be the cardinality of the largest no-good in Σ. In the worst case all propositions are minimally entrenched, so ListLoop will recreate Σ. So ListLoop will use Op|Σ| ¨ smax q space. RemoveLoop creates a culprit list, which we stated before takes Op|Σ|q space. ModifyLoop may be implemented in a variety of ways. We will assume that it creates a list of pairs, of which the first and second elements range over propositions in the no-goods. In this case Modi f yLoop uses Op|Σ|2 ¨ s2max q space. So the total space requirement is Op|Σ|2 ¨ s2max q memory units. Time Complexity The analysis for time complexity is based on a sequentialprocesing system. For each no-good σ , in the worst case, ListLoop will have to compare each proposition in σ agains every other. So, for each iteration of MainLoop, ListLoop takes Op|Σ| ¨ s2max q time. There are at most Opsmax q elements in each list created by ListLoop. So, checking the condition in line 10 takes Op|Σ| ¨ s2max q time. Lines 12-16 can be executed in Op|Σ| ¨ smax q time. Therefore, RemoveLoop takes Op|Σ| ¨ s2max q time. We assume that all the work in lines 2427 can be done in constant time. So, Modi f yLoop takes ; ; ; Show origin s e t s : expert : br´mode auto Automatic b e l i e f revision will now be automatically selected . : br´t ie´mode manual The user will be consulted when an entrenchment t i e occurs ; ; ; Use source c r e d i b i l i t i e s as epistemic ordering c r i t e r i a . set´order source ; ; ; The holy book i s a b e t t e r source than the professor . IsBetterSource ( holybook , prof ) . ; ; ; The professor i s a b e t t e r source than the nerd . IsBetterSource ( prof , nerd ) . ; ; ; The nerd i s a b e t t e r source than the s e x i s t . IsBetterSource ( nerd , s e x i s t ) . 36 less credible than the sources for “Fran is smart.” ; ; ; Fran i s a b e t t e r source than the nerd . IsBetterSource ( fran , nerd ) . ; ; ; Better´Source i s a t r a n s i t i v e r e l a t i o n a l l (x , y , z ) ({IsBetterSource (x , y) , IsBetterSource (y , z )} &=> IsBetterSource (x , z ) ) ! ; ; ; All jocks are not smart . 
a l l (x) ( jock (x)=>˜smart (x) ) . ; wff10 ; ; ; The source of the statement ’ All jocks are not smart ’ i s the nerd HasSource (wff10 , nerd ) . ; ; ; All females are not smart . a l l (x) ( female (x)=>˜smart (x) ) . ; wff12 ; ; ; The source of the statement ’ All females are not smart ’ i s the sexist . HasSource (wff12 , s e x i s t ) . ; ; ; All graduate students are smart . a l l (x) ( grad (x)=>smart (x) ) . ; wff14 ; ; ; The source of the statement ’ All graduate students are smart ’ i s the professor . HasSource (wff14 , prof ) . ; ; ; All old people are smart . a l l (x) ( old (x)=>smart (x) ) . ; wff16 ; ; ; The source of the statement ’ All old people are smart ’ i s the holy book . HasSource (wff16 , holybook ) . ; ; ; The source of the statement ’Fran i s an old female jock who i s a graduate student ’ i s fran . HasSource (and{jock ( fran ) , grad ( fran ) , female ( fran ) , old ( fran ) }, fran ) . ; ; ; The KB thus far l i s t´asserted´wffs wff23 ! : HasSource ( old ( fran ) and female ( fran ) and grad ( fran ) and jock ( fran ) , fran ) {<hyp,{wff23}>} wff17 ! : HasSource ( a l l (x) ( old (x) => smart (x) ) , holybook ){<hyp,{wff17}>} wff16 ! : a l l (x) ( old (x) => smart (x) ) {<hyp,{wff16}>} wff15 ! : HasSource ( a l l (x) ( grad (x) => smart (x) ) , prof ) {<hyp,{wff15}>} wff14 ! : a l l (x) ( grad (x) => smart (x) ) {<hyp,{wff14}>} wff13 ! : HasSource ( a l l (x) ( female (x) => ( ˜ smart (x) ) ) , s e x i s t ) {<hyp,{wff13}>} wff12 ! : a l l (x) ( female (x) => ( ˜ smart (x) ) ) {<hyp,{wff12}>} wff11 ! : HasSource ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) , nerd ) <hyp,{wff11}>} wff10 ! : a l l (x) ( jock (x) => ( ˜ smart (x) ) ) {<hyp,{wff10}>} wff9 ! : IsBetterSource ( fran , s e x i s t ) {<der ,{wff3 , wff4 , wff5}>} wff8 ! : IsBetterSource ( prof , s e x i s t ) {<der ,{wff2 , wff3 , wff5}>} wff7 ! : IsBetterSource ( holybook , s e x i s t ) {<der ,{wff1 , wff2 , wff3 , wff5}>} wff6 ! : IsBetterSource ( holybook , nerd ) {<der ,{wff1 , wff2 , wff5}>} wff5 ! : a l l ( z , y , x) ({IsBetterSource (y , z ) , IsBetterSource (x , y)} &=> {IsBetterSource (x , z ) }) {<hyp,{wff5}>} wff4 ! : IsBetterSource ( fran , nerd ) {<hyp,{wff4}>} wff3 ! : IsBetterSource ( nerd , s e x i s t ) {<hyp,{wff3}>} wff2 ! : IsBetterSource ( prof , nerd ) {<hyp,{wff2}>} wff1 ! : IsBetterSource ( holybook , prof ) {<hyp,{wff1}>} ; ; ; Fran i s an old female jock who i s a graduate student ( asserted with forward inference ) . and{jock ( fran ) , grad ( fran ) , female ( fran ) , old ( fran ) }! wff50 ! : ˜ ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) ) {<ext ,{wff16 , wff22}>,<ext ,{wff14 , wff22}>} wff24 ! : smart ( fran ) {<der ,{wff16 , wff22}>,<der ,{wff14 , wff22}>} ; ; ; The r e s u l t i n g knowledge base ( HasSource and IsBetterSource omited for c l a r i t y ) l i s t´asserted´wffs wff50 ! : ˜ ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) ) {<ext ,{wff16 , wff22}>, <ext ,{wff14 , wff22}>} wff37 ! : ˜ ( a l l (x) ( female (x) => ( ˜ smart (x) ) ) ) {<ext ,{wff16 , wff22}>} wff24 ! : smart ( fran ) {<der ,{wff16 , wff22}>,<der ,{wff14 , wff22}>} wff22 ! : old ( fran ) and female ( fran ) and grad ( fran ) and jock ( fran ) {<hyp,{wff22}>} wff21 ! : old ( fran ) {<der ,{wff22}>} wff20 ! : female ( fran ) {<der ,{wff22}>} wff19 ! : grad ( fran ) {<der ,{wff22}>} wff18 ! : jock ( fran ) {<der ,{wff22}>} wff16 ! : a l l (x) ( old (x) => smart (x) ) {<hyp,{wff16}>} wff14 ! 
: a l l (x) ( grad (x) => smart (x) ) {<hyp,{wff14}>} Wumpus World We present a demonstration on how the state-constraintbased revision behavior from [Shapiro and Kandefer, 2005] is generalized by our changes to SNeBR. The command setorder fluent says that propositional fluents are strictly less entrenched than non-fluent propositions. The fluent order was created specifically to replace the original belief revision behavior of the SNeRE believe act. In the version of SNeBR used in [Shapiro and Kandefer, 2005], propositions of the form andorpă 0|1 ą, 1qpp1 , p2 , . . .q were assumed to be state contraints, while the inner propositions, p1 , p2 , etc., were assumed to be fluents. The fluents were less entrenched than the state constraints. We see that the ordering was heavily syntax-dependent. In our new version, the determination of which propositions are fluents is made by checking for membership of the predicate symbol of an atomic proposition in a list called *fluents*, which is defined by the user to include the predicate symbols of all propositional fluents. So the entrenchment ordering defined here uses metaknowledge about the knowledge base that is not represented in the SNePS knowledge base. The command br-tie-mode manual indicates that Algorithm 2 should be used. Note that the xor connective [Shapiro, 2010] used below replaces instances of andor(1,1)(. . . ) from [Shapiro and Kandefer, 2005]. The command perform believe(wff) is identical to the command wff!, except that the former causes wff to be strictly more entrenched than every other proposition during belief revision. That is, wff is guaranteed to be safe (unless wff is itself a contradiction). So we would be using prioritized belief revision. ; ; ; Show origin s e t s : expert ; ; ; Always use automatic b e l i e f revision : br´mode auto Automatic b e l i e f revision will now be automatically selected . ; ; ; Use algorithm 2 : br´t ie´mode manual The user will be consulted when an entrenchment t i e occurs . ; ; ; Use an entrenchment ordering t ha t favors non´fluents over ; ; ; fluents set´order fluent ; ; ; Establish what kinds of propositions are fluents ; specifically , t h a t the agent i s facing some direction i s a f a c t t h a t may change over time . ˆ ( s e t f ∗fluents∗ ’( Facing ) ) ; ; ; The agent i s Facing west Facing ( west ) . ; ; ; At any given time , the agent i s facing e i t h e r north , south , east , or west ( asserted with forward inference ) . xor{Facing ( north ) , Facing ( south ) , Facing ( east ) , Facing ( west ) }! ; ; ; The knowledge base as i t stands l i s t´asserted´wffs wff8 ! : ˜ Facing ( north ) {<der ,{wff1 , wff5}>} wff7 ! : ˜ Facing ( south ) {<der ,{wff1 , wff5}>} wff6 ! : ˜ Facing ( east ) {<der ,{wff1 , wff5}>} wff5 ! : xor{Facing ( east ) , Facing ( south ) , Facing ( north ) , Facing ( west )} {<hyp,{wff5}>} wff1 ! : Facing ( west ) {<hyp,{wff1}>} ; ; ; Tell the agent to believe i t i s now facing east . perform believe ( Facing ( east ) ) ; ; ; The r e s u lt i n g knowledge base l i s t´asserted´wffs wff10 ! : ˜ Facing ( west ) {<ext ,{wff4 , wff5}>} wff8 ! : ˜ Facing ( north ) {<der ,{wff1 , wff5}>,<der ,{wff4 , wff5}>} wff7 ! : ˜ Facing ( south ) {<der ,{wff1 , wff5}>,<der ,{wff4 , wff5}>} We see that the statements that all jocks are not smart and that all females are not smart are no longer asserted at the end. These statements supported the statement that Fran is not smart. 
The statements that all old people are smart and that all grad students are smart supported the statement that Fran is smart. The contradiction was resolved by contracting “Fran is not smart,” since the sources for its supports were 37 [Gärdenfors, 1988] P. Gärdenfors. Knowledge in Flux: Modeling the Dynamics of Epistemic States. The MIT Press, Cambridge, Massachusetts, 1988. [Gärdenfors, 1989] P. Gärdenfors. The dynamics of belief systems: Foundations vs. coherence. Revue Internationale de Philosophie, 1989. [Hansson and Olsson, 1999] S. O. Hansson and E. J. Olsson. Providing foundations for coherentism. Erkenntnis, 51(2– 3):243–265, 1999. [Hansson, 1994] S. O. Hansson. Kernel contraction. The Journal of Symbolic Logic, 59(3):845–859, 1994. [Hansson, 1997] S. O. Hansson. Semi-revision. Journal of Applied Non-Classical Logics, 7(2):151–175, 1997. [Hansson, 1999] S. O. Hansson. A survey of non-prioritized belief revision. Erkenntnis, 50:413–427, 1999. [Johnson and Shapiro, 1999] F. L. Johnson and S. C. Shapiro. Says Who? - Incorporating Source Credibility Issues into Belief Revision. Technical Report 99-08, Department of Computer Science and Engineering, SUNY Buffalo, Buffalo, NY, 1999. [Lakemeyer, 1991] Lakemeyer. On the relation between explicit and implicit beliefs. In Proc. KR-1991, pages 368– 375. Morgan Kaufmann, 1991. [Martins and Shapiro, 1988] J. P. Martins and S. C. Shapiro. A model for belief revision. Artificial Intelligence, 35(1):25–79, 1988. [Shapiro and Johnson, 2000] S. C. Shapiro and F. L. Johnson. Automatic belief revision in SNePS. In C. Baral and M. Truszczynski, editors, Proc. NMR-2000, 2000. unpaginated, 5 pages. [Shapiro and Kandefer, 2005] S. C. Shapiro and M. Kandefer. A SNePS Approach to the Wumpus World Agent or Cassie Meets the Wumpus. In L. Morgenstern and M. Pagnucco, editors, NRAC-2005, pages 96–103, 2005. [Shapiro and The SNePS Implementation Group, 2010] Stuart C. Shapiro and The SNePS Implementation Group. SNePS 2.7.1 USER’S MANUAL. Department of Computer Science and Engineering, SUNY Buffalo, December 2010. [Shapiro, 1992] Stuart C. Shapiro. Relevance logic in computer science. Section 83 of A. R. Anderson and N. D. Belnap, Jr. and J. M/ Dunn et al. Entailment, Volume II, pages 553–563. Princeton University Press, Princeton, NJ, 1992. [Shapiro, 2010] S. C. Shapiro. Set-oriented logical connectives: Syntax and semantics. In F. Lin, U. Sattler, and M. Truszczynski, editors, KR-2010, pages 593–595. AAAI Press, 2010. [Williams, 1994] M.-A. Williams. On the logic of theory base change. In C. MacNish, D. Pearce, and L. Pereira, editors, Logics in Artificial Intelligence, volume 838 of Lecture Notes in Computer Science, pages 86–105. Springer Berlin / Heidelberg, 1994. wff5 ! : xor{Facing ( east ) , Facing ( south ) , Facing ( north ) , Facing ( west )} {< hyp,{wff5}>} wff4 ! : Facing ( east ) {<hyp,{wff4}>} There are three propositions in the no-good when revision is performed: Facing(west), Facing,east, and xor(1,1){Facing(...}. Facing(east) is not considered for removal since it was prioritized by the believe action. The state-constraint xor(1,1){Facing...} remains in the knowledge base at the end, because it is more entrenched than Facing(west), a propositional fluent, which is ultimately removed. 6 Conclusions Our modified version of SNeBR provides decision procedures for belief revision in SNePS. By providing a single resulting knowledge base, these procedures essentially perform maxichoice revision for SNePS. 
Using a well preorder, belief revision can be performed completely automatically. Given a total preorder, it may be necessary to consult the user in order to simulate a well preorder. The simulated well preorder need only be partially specified; it is only necessary to query the user when multiple beliefs are minimally-epistemicallyentrenched within a no-good, and even then only in the case where no other belief in the no-good is already being removed. In any event, the epistemic ordering itself is usersupplied. Our algorithm for revision given a well preorder uses asymptotically less time and space than the other algorithm, which uses a total preorder. Our work generalize previous belief revision techniques employed in SNePS. Acknowledgments We would like to thank Prof. William Rapaport for providing editorial review, and Prof. Russ Miller for his advice concerning the analysis portion of this paper. References [Alchourron and Makinson, 1985] C.E. Alchourron and D. Makinson. On the logic of theory change: Safe contraction. Studia Logica, (44):405–422, 1985. [Alchourron et al., 1985] C. E. Alchourron, P. Gärdenfors, and D. Makinson. On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 20:510–530, 1985. [de Kleer, 1986] J. de Kleer. An Assumption-Based TMS. Artificial Intelligence, 28:127–162, 1986. [Gärdenfors and Rott, 1995] P. Gärdenfors and H. Rott. Belief revision. In Gabbay, Hogger, and Robinson, editors, Epistemic and Temporal Reasoning, volume 4 of Handbook of Logic in Artificial Intelligence and Logic Programming, pages 35–131. Clarendon Press, Oxford, 1995. [Gärdenfors, 1982] P. Gärdenfors. Rules for rational changes of belief. In T. Pauli, editor, Philosophical Essays Dedicated to Lennart Åqvist on His Fiftieth Birthday, number 34 in Philosophical Studies, pages 88–101, Uppsala, Sweden, 1982. The Philosophical Society and the Department of Philosophy, University at Uppsala. 38 Decision-Theoretic Planning for Golog Programs with Action Abstraction Daniel Beck and Gerhard Lakemeyer Knowledge Based Systems Group RWTH Aachen University, Aachen, Germany {dbeck,gerhard}@cs.rwth-aachen.de Abstract troublesome in DTGolog. Whereas the semantics of Golog allow the agent to freely choose those arguments, DTGolog needs to restrict the choice to a finite, pre-defined list. The reason being that DTGolog performs a forward search and branches over the possible continuations of the remaining program (and also over the outcomes of stochastic actions) which requires that the number of successor states in the search tree is finite. Generally, what the possible choices are and how many there are in any domain instance is unknown a-priori and thus the approach of DTGolog is not directly extensible to handle an unconstrained nondeterministic choice of arguments. In [Boutilier et al., 2001] an approach that allows to solve an MDP using dynamic programming methods on a purely symbolic level was presented. The key idea was that from the first-order description of the MDP a first-order representation of the value function can be derived. This representation of the value function allows not only abstraction over the state space but also it allows to abstract over action instances. We show how these ideas extend in the presence of programs that constrain the search for the optimal policy. 
Finding the optimal execution of a DTGolog program (or, more precisely, the optimal policy compatible with the program) is understood as a multi-objective optimization problem where the objectives are the expected cumulative reward and the probability of successfully executing the program. The latter refers to the probability of not ending up in a configuration in which the program cannot be executed any further. We show how symbolic representations of the functions representing these quantities can be derived. With the help of these functions we then can extend the semantics of DTGolog to programs containing an unrestricted choice of arguments. In fact, we show that for DTGolog programs the original DTGolog interpreter and our extended version compute the same policies. DTGolog combines the ability to specify an MDP in a first-order language with the possibility to restrict the search for an optimal policy by means of programs. In particular, it employs decisiontheoretic planning to resolve the nondeterminism in the programs in an optimal fashion (wrt an underlying optimization theory). One of the nondeterministic constructs DTGolog offers is the nondeterministic choice of action arguments. The possible choices, though, are restricted to a finite, predefined list. We present an extension to DTGolog that overcomes this restriction but still retains the optimality property of DTGolog. That is, in our extended version of DTGolog we can formulate programs that allow for an unrestricted choice of action arguments even in domains where there are infinitely many possible choices. The key to this is that we compute the optimal execution strategy for a program on the basis of abstract value functions. We present experiments which show that these extensions may lead to a speed-up in the computation time in comparison to the original DTGolog. 1 Introduction Markov decision processes (MDPs) [Puterman, 1994] have proved to be a conceptually adequate model for decisiontheoretic planning. Their solution, though, is often intractable. DTGolog [Boutilier et al., 2000], a decisiontheoretic extension of the high-level agent programming language Golog [Levesque et al., 1997], tackles this problem by constraining the search space with a Golog program. In particular, only the policies which comply with the program are considered during the search. The agent programming language Golog is based on the situation calculus, has a clearly defined semantics and offers programming constructs known from other programming languages (e.g., conditionals, nondeterministic choice, etc.). Thus, DTGolog programs can be understood as an advice to the decision-theoretic planner. Their semantics is understood as the optimal execution of the program. There is one particular nondeterministic construct, namely the nondeterministic choice of arguments, which is a little We provide a short overview of the situation calculus, Golog and DTGolog in Section 1.1. In Section 2 we introduce the case notation which we use to represent the abstract value functions for Golog programs presented in Section 3. Using these abstract value functions we provide the semantics of our DTGolog extension in Section 4. We discuss the advantages and disadvantages of our extension over the original DTGolog in Section 5. 39 1.1 The Situation Calculus and Golog For every combination of a stochastic action and one of its associated, deterministic actions the probability with which Nature chooses the deterministic action needs to be specified. 
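The following small Python sketch illustrates what such a specification amounts to: each associated deterministic outcome receives a state-dependent probability, and in every state the probabilities over the outcomes must sum to 1. The encoding, the state dictionary, and the function names are our own illustration (the numbers anticipate the Blocks World move example given below); they are not DTGolog syntax.

def prob0_move(outcome, state):
    # Nature's choice probabilities for the stochastic action move(b1, b2),
    # conditioned on whether the block being moved is heavy.
    heavy = state["heavy_b1"]
    if outcome == "moveS":            # the move succeeds
        return 0.1 if heavy else 0.9
    if outcome == "moveF":            # the move fails
        return 0.9 if heavy else 0.1
    return 0.0                        # any other action has probability 0

def well_defined(prob, outcomes, state, tol=1e-9):
    # The axiomatizer must ensure the distribution sums to 1 in every state.
    return abs(sum(prob(o, state) for o in outcomes) - 1.0) < tol

assert well_defined(prob0_move, ["moveS", "moveF"], {"heavy_b1": True})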
To continue the example from above we might have The situation calculus is a first-order logic (with second-order elements which are of no concern to us, here) with sorts for situations and actions. The binary predicate symbol do(a, s) denotes the situation resulting from executing action a in situation s; the constant S0 denotes the initial situation. Fluents are regular function- or predicate-symbols that take a term of sort situation as their last argument. According to Reiter’s solution of the frame problem (cf. [Reiter, 1991]) the value of a fluent in a particular situation can be determined with the help of so-called successor-state axioms (SSAs) of which one has to exists for every fluent. For instance for the fluent F (~x, s): F (~x, do(a, s)) ≡ ΦF (~x, a, s) where, intuitively, ΦF (~x, a, s) describes the conditions which have to hold in situation s such that in the successor situation do(a, s), after executing the action a, the fluent F holds for the parameters ~x. The preconditions for actions are specified by axioms of the form P oss(A(~x), s) ≡ ΠA (~x, s) where ΠA (~x, s) describes the preconditions that have to hold before the action A(~x) can be executed. A basic action theory (BAT) D then consists of the foundational axioms Σ constraining the form of situation terms, the successor-state axiom Dssa , the action preconditions Dap , unique names assumptions for actions Duna , and a description of the initial situation DS0 . By means of regression a regressable formula, basically a formula where all terms of sort situation are rooted in S0 , can be transformed into an equivalent formula which only mentions the initial situation S0 . Thereby reasoning is restricted to reasoning about formulas in the initial situation. In particular, every occurrence of a fluent having a non-initial situation as its last argument is replaced with the right-hand side of its SSA. The regression operator R for a formula whose situation arguments are of the form do(a, s) is defined as follows: R(F (~x, do(a, s))) = ΦF (~x, a, s) R(¬φ) = ¬R(φ) R(φ ∧ ψ) = R(φ) ∧ R(ψ) R(∃x. φ) = ∃x. R(φ) In order to model stochastic domains, that is, domains where the effect of performing an action is not deterministic, but different effects might occur with certain probabilities, some kind of stochastic actions are necessary. In DTGolog those stochastic actions are modelled with the help of a number of associated, deterministic actions that describe the possible outcomes when executing the stochastic action. The understanding is that Nature chooses between the associated, deterministic actions when executing the stochastic action. For instance, in a Blocks World domain the (stochastic) move(b1 , b2 ) action whose outcomes are described by the deterministic actions moveS(b1 , b2 ) and moveF (b1 , b2 ), respectively. Notationally, this is captured by def. prob0 (moveS(b1 , b2 ), move(b1 , b2 ), s) = p = ¬heavy(b1 ) ∧ p = 0.9 ∨ heavy(b1 ) ∧ p = 0.1 def. prob0 (moveF (b1 , b2 ), move(b1 , b2 ), s) = p = ¬heave(b1 ) ∧ p = 0.1 ∨ heavy(b1 ) ∧ p = 0.9 which says that if the block to be moved is heavy the moveaction succeeds with a probability of 0.1 and fails with a probability of 0.9. Note that prob0 neglects the preconditions of the associated, deterministic actions. Also, the probability of Nature choosing another than one of the associated, deterministic actions has to be 0: def. 
prob(a, α, s) = p = choice(α, a) ∧ P oss(a, s) ∧ p = prob0 (a, α, s) ∨ ¬(choice(α, a) ∧ P oss(a, s)) ∧ p = 0 It is crucial that the axiomatizer ensures that the probability distribution is well-defined, that is, the probabilities over the deterministic outcome actions always sum up to 1. When actually executing a program containing stochastic actions, it is necessary to determine which of the associated, deterministic actions has been selected by Nature during the execution. Consequently, some kind of sensing is required. In particular, we assume that for every stochastic action there is a unique associated sense action (which itself is a stochastic action) and sense outcome conditions which discriminate Nature’s choices. The intention behind this is that when the agent actually executes a stochastic action, it can execute the associated sense action to acquire the necessary information from its sensors afterwards to unambiguously determine the action chosen by Nature with the help of the sense outcome conditions. Since we assume full observability, we can assume that the sensing is accurate and consequently the associated sense action is a noop-action in the theory. Besides the BAT DTGolog also requires an optimization theory in order to determine the optimal execution strategy for a program. This theory includes axioms defining the reward function reward(s) which assesses the current situation. For instance: reward(do(moveS(B1 , B2 ), s)) = 10 The kind of programs we consider are similar to regular Golog programs with the only exception that the primitives in the programs are not deterministic but stochastic actions. In particular, the following program constructs are available: δ1 ; δ2 sequences ϑ? test actions if ϑ then δ1 else δ2 end conditionals while ϑ do δ end loops (δ1 | δ2 ) nondeterministic branching π v. (γ) nondeterministic choice of argument proc P (~x) δP end procedures (including recursion) In DTGolog only a restricted version of the nondeterministic def. choice(move(b1 , b2 ), a) = a = moveS(b1 , b2 ) ∨ a = moveF (b1 , b2 ) 40 For the casemax-operator we assume that the formulas in the input case statement are sorted in a descending order, that is, vi > vi+1 . For this it is necessary that the vi are numerical constants which is what we assume for the remainder of this paper. Then, the operator is defined as follows: choice of argument is supported which is semantically equivalent to a nondeterministic branching over the same program but with different arguments. In our extension of DTGolog we support the unrestricted nondeterministic choice of arguments. 2 def. The Case Notation casemax case[φi , vi : i ≤ n] = We use a case notation similar to that introduced in [Boutilier et al., 2001]. This notation is convenient for the representation of finite case distinctions, that is, piecewise constant functions which have a finite number of different values. We write case[φ1 , v1 ; . . . ; φn , vn ] (or case[φi , vi : i ≤ n] for short) as an abbreviation for n _ case[φi ∧ ¬φj , vi : i ≤ n] . j<i Generally, a formula v = case[φi , vi : i ≤ n] might be ambiguous wrt the value of v, i.e., the φi are not required to hold mutually exclusively. Applying the casemax-operator to case[φi , vi : i ≤ n] remedies this problems. In the resulting case statement the formulas hold mutual exclusively and, furthermore, the value of v is maximized. 
Given an ordering over two-valued tuples that allow to pre-sort the formulas in the input case statement the casemax-operator can be applied on two-value case statements as well. The expressions ψ ∧ case[φi , vi : i ≤ n] and ∃x. case[φi , vi : i ≤ n] are case statements as well. Due to the disjunctive nature of the case statements the conjunction can be distributed into the disjunction and the existential quantifier can be moved into the disjunction. The resulting case statements then are case[ψ ∧ φi , vi : i ≤ n] and case[∃x. φi , vi : i ≤ n], respectively. We assume that the reward function reward(s) and the probability distribution over the deterministic actions associated with a stochastic action are specified using case statements: φi ∧ µ = vi i=1 The vi are numerical expressions, that is, expressions that evaluate to numbers. The variable µ is a special variable that is reserved for the use in case statements and must not be used anywhere else. In order to use case statements in formulas without explicitly referring to µ we define the following macro: def. v = case[φi , vi : i ≤ n] = case[φi , vi : i ≤ n]µv Furthermore, we slightly extend the case notation to allow the representation of a two-valued function: case[φ1 , (v1 , p1 ); . . . ; φn , (vn , pn )] is used as an abbreviation for: n _ φi ∧ µ1 = vi ∧ µ2 = pi . rew reward(s) = case[φrew 1 (s), r1 ; · · · ; φm (s), rm ] i=1 and The vi and pi are numerical expressions and µ1 and µ2 are reserved variables. In a similar fashion as in the single-valued case we define the macro: prob(Nj (~x), A(~x), s) = x, s), p1 ; · · · ; φA x, s), pn ] case[φA j,1 (~ j,n (~ def. (v, p) = case[φi , (vi , pi ) : i ≤ n] = case[φi , (vi , pi ) : i ≤ n]µv 1 ^ µ2 p We denote them by rCase(s) and pCaseA x, s), respecj (~ tively. Since these case statements define functions it is necx, s)} (s)} and the {φA essary that each of the sets {φrew j,i (~ i partitions the state space. That is, for every ~x and s a unique value can be determined. Formally, a set of formulas W {ψi (~x, s)} is said to partition the state space iff |= ∀~x, s. i ψi (~x, s) and |= ∀~x, s. ψi (~x, s) ⊃ ¬ψj (~x, s) for all i, i 6= j. . By means of the ◦-operator two single-valued case statements can be combined to a two-valued case statement: def. case[φi , vi : i ≤ n] ◦ case[ψj , vj′ : j ≤ m] = case[φi ∧ ψj , (vi , vj′ ) : i ≤ n, j ≤ m] Further operators we use (for single-valued case statements) are the binary operators ⊕, ⊗, and ∪ and the unary casemax-operator for symbolic maximization (cf. [Sanner and Boutilier, 2009]). 3 Abstract Value Functions The type of programs we consider cannot be directly executed, the nondeterminism in the program needs to be resolved first. Of course, the agent executing the program strives for an optimal execution strategy for the program. Optimality, in this case, is defined with respect to the expected reward accumulated during the first h steps of the program and the probability that these first h steps can be executed successfully (i.e., the probability of not running into a situation in which the program cannot be executed any further). Those two quantities are measured by the value functions Vhδ (s) and case[φi , vi , : i ≤ n] ⊗ case[ψj , vj′ : j ≤ m] = case[φi ∧ ψj , vi · vj′ : i ≤ n, j ≤ m] case[φi , vi : i ≤ n] ⊕ case[ψj , vj′ : j ≤ m] = case[φi ∧ ψj , vi + vj′ : i ≤ n, j ≤ m] case[φi , vi : i ≤ n] ∪ case[ψj , vj′ : j ≤ m] = ′ case[φ1 , vi ; . . . ; φn , vn ; ψ1 , v1′ ; . . . ; ψm , vm ] 41 Phδ (s), respectively. 
Our intention and the key to abstracting from the actual situation is to identify regions of the state space in which these functions are constant. The advantages of such an abstract function are twofold. First, these functions can be pre-computed since they are independent of the actual situation (and the initial situation). This allows to apply some simplification in order to lower the time necessary to evaluate the formula. Second, these abstract value functions allow to asses the values of a program containing nondeterministic choices of action arguments without explicitly enumerating all possible (ground) choices for these arguments. Rather the abstract value functions abstract from the actual choices by identifying properties for the action arguments that lead to a certain value for the expected reward and the probability of successfully executing the program, respectively. For instance, if a high reward is given to situations where there is a green block on top of a non-green block and the program tells the agent to pick a block and move it onto another nondeterministically chosen block, then the value function for the expected reward distinguishes the cases where a green block is moved on top of a non-green block from the other constellations. What it does not do is to explicitly refer to the green and the non-green blocks in the domain instance. Thus, given these abstract value functions, the nondeterminism in the program can be resolved by settling on the choice maximizing the abstract value functions when evaluated in the current situation. For a program δ and a horizon h we compute case statements Vhδ (s) and Phδ (s) representing the abstract value functions. As can be seen in the definition below the computation of these case statements is independent of the situation s. Vhδ (s) and Phδ (s) are inductively defined on the structure of δ. Since the definition is recursive we first need to assume that the horizon h is finite and that the programs are nil-terminated which can be achieved easily by sequentially combining a program with the empty program nil. 1. Zero horizon: def. to the situation s and not to any of the successor situations. For the probability of successfully executing the program the definition is quite similar only that the immediate reward is ignored: A(~ x);δ Ph j=1  δ ⊗ R(Ph−1 (do(Nj (~x), s))) 4. The program begins with a test action: def. Vhϑ?;δ (s) = (ϑ[s] ∧ Vhδ (s)) ∪ (¬ϑ[s] ∧ rCase(s)) In case the test does not hold the execution of the program has to be aborted and consequently no further rewards are obtained. def. Phϑ?;δ (s) = (ϑ[s] ∧ Phδ (s)) ∪ case[¬ϑ[s], 0] 5. The program begins with a conditional: def. Vhif ϑ then δ1 else δ2 end;δ (s) = (ϑ[s] ∧ Vhδ1 ;δ (s)) ∪ (¬ϑ[s] ∧ Vhδ2 ;δ (s)) Analogous for Phif ϑ then δ1 else δ2 end;δ (s). 6. The program begins with a nondeterministic branching: (δ1 | δ2 );δ Vh case[φi , vi → idxi ] def. where idxi = 1 if φi stems from Vhδ1 ;δ (s) and idxi = 2 if φi stems from Vhδ2 ;δ (s). This allows the agent to reconstruct what branch has to be chosen when φi holds in the current situation. For all further operations on the case statement those mappings can be ignored. (δ | δ );δ Ph 1 2 (s) is defined analogously. 7. The program begins with a nondeterministic choice of arguments: For the remaining cases we assume h > 0. 2. The empty program nil: def. Vhnil (s) = rCase(s) and Phnil (s) = case[true, 1] 3. The program begins with a stochastic action A(~x) with outcomes N1 (~x), . . . , Nk (~x): A(~ x);δ Vh k M j=1 def. 
(s) = casemax (Vhδ1 ;δ (s) ∪≥ Vhδ2 ;δ (s)) where ∪≥ is an extended version of the ∪-operator that additionally sorts the formulas according to their values such that vi ≥ vi+1 holds in the resulting case statement. Another minor modification of the ∪-operator is necessary to keep track of from where the formulas originate. The resulting case statement then looks like this: V0δ (s) = rCase(s) and P0δ (s) = case[true, 1] def. k M  pCaseA x, s) j (~ def. (s) = def. (s) = rCase(s)⊕ π x. (γ);δ Vh   δ pCaseA x, s) ⊗ R(Vh−1 (do(Nj (~x), s))) j (~ def. (s) = casemax ∃x. Vhγ;δ (s) Note that the resulting case statement is independent of the actually available choices for x. The formulas φi (x, s) in Vhγ;δ (s) (which mention x as a free variable) describe how the choice for x influences the expected reward for the remaining program γ; δ. To obtain π v. (γ);δ Vh (s) it is then maximized over the existentially π x. (γ);δ quantified case statement Vhγ;δ (s). Again, Ph (s) is defined analogously. That is, the expected value is determined as the sum of the immediate reward and the sum over the expected values executing the remaining program in the possible successor situations do(Nj (~x), s) each weighted by the probability of seeing the deterministic actions Nj (~x) as the outcome. Due to regression the formulas only refer 42 execution of δ is terminated. This is denoted by the special action Stop. 8. The program begins with a sequence: [δ1 ;δ2 ];δ3 Vh def. δ ;[δ2 ;δ3 ] (s) = Vh 1 (s) def. BestDo+ (δ, 0, s, ρ) = ρ = Stop that is, we associate the sequential composition to the right. By possibly repetitive application of this rule the program is transformed into a form such that one of the cases above can be applied. If the program begins with a stochastic action α a policy for every possibly outcome n1 , . . . , nk is determined by means of the auxiliary macro BestDoAux+ which expects a list of deterministic outcome actions as its first argument. senseEffect α is the sense action associated with α. 9. Procedure calls: The problem with procedures is that it is not clear how to macro expand a procedure’s body when it includes a recursive procedure call. Similar to how procedures are handled by Golog’s Do-macro we define an auxiliary macro: P (t1 ,...,tn );δ Vh def. BestDo+ ([α; δ], h, s, ρ) = ∃ρ′ . ρ = [α; senseEffect a ; ρ′ ] ∧ BestDoAux+ ({n1 , . . . , nk }, h, s, ρ′ ) def. (s) = P (t1 [s], . . . , tn [s], δ, s, h, v) If the first argument, the list of outcome actions, is empty then BestDoAux+ expands to We consider programs including procedure definitions to have the following form: def. BestDoAux+ ({}, h, s, ρ) = ρ = Stop. {proc P1 (~v1 ) δ1 end; · · · ; proc Pn (~vn ) δn end; δ0 } Otherwise, the first action n1 of the list is extracted and (if it is possible) a policy for the remaining program starting in the situation do(n1 , s) is computed by BestDo+ . Then, the policy is assembled by branching over the sense outcome condition θ1 for outcome n1 . The if-branch is determined by BestDo+ (δ, do(ni , s), h − 1), ρ1 ); the else branch by the BestDoAux+ macro for the remaining outcome actions. Then, we define the optimal expected value obtainable for executing the first h steps of such a program as: {proc P1 (~ v1 ) δ1 end;··· ;proc Pn (~ vn ) δn end;δ0 } Vh ∀Pi . [ n ^ def. (s) = ′ ∀~vi , s′ , h′ , δ ′ , v. v = Vhδ′i ;δ (s′ ) def. i=1 ′ ⊃ P (~vi , δ , s, h, v)] ⊃ BestDoAux+ ({n1 , . . . , nk }, h, s, ρ) = Vhδ0 (s) ¬P oss(n1 , s) ∧ BestDoAux+ ({n2 , . . . , nk }, h, s, ρ) ∨ P oss(n1 , s) Lemma 1. 
For every δ and h, the formulas in Vhδ (s) and Phδ (s) partition the state space. ∧ ∃ρ′ . BestDoAux+ ({n2 , . . . , nk }), h, s, ρ′ ) ∧ ∃ρ1 . BestDo(δ, h − 1, do(n1 , s), ρ1 ) Proof. (Sketch) By definition the formulas in rCase(s) and pCaseA x, s) partition the state space. The operations on j (~ case statements used in the definition of Vhδ (s) Phδ (s) retain this property. 4 ∧ ρ = if θ1 then ρ1 else ρ′ The cases where the program begins with a test-action or a conditional are handled in a quite similar manner by DTGolog’s BestDo which is why we omit them here. The cases where the program begins with a nondeterministic statement are handled quite differently, though. Whereas BestDo computes the expected reward as well as the probability of successfully executing the remaining program for the current situation, BestDo+ (δ, s, h, ρ) relies on Vhδ (s) and Phδ (s) for that. If the program begins with a nondeterministic branching another auxiliary macro is necessary: Semantics Informally speaking, the semantics for the kind of programs we consider is given by the optimal h-step execution of the program. Formally, it is defined by means of the macro BestDo+ (δ, s, h, ρ) where δ is the program for which a hstep policy ρ in situation s shall be computed. A policy is a special kind of program that is intended to be directly handed over to the execution system of the agent and executed without further deliberation. A policy for a program δ “implements” a (h-step) execution strategy for δ: it resolves the nondeterminism in δ and considers the possible outcomes of stochastic actions. In particular, it may proceed differently depending on what outcome actually has been chosen by Nature. The macro BestDo+ (δ, s, h, ρ) is defined inductively on the structure of δ. Its definition is in parts quite similar to that of DTGolog’s BestDo which is why we do not present all cases here, but focus on those where the definitions differ. Clearly, if h equals zero the horizon has been reached and the def. BestDo+ ((δ1 | δ2 ); δ, s, h, ρ) = BestDoNDet((δ1 | δ2 ); δ, s, h, case[φi (s), (vi , pi ) → idxi ], ρ) where the forth argument of BestDoNDet, the case statement, is the result obtained from the casemax applying   δ1 ;δ δ1 ;δ δ2 ;δ operator on Vh (s) ◦ Ph (s) ∪≥ Vh (s) ◦ Phδ2 ;δ (s) where ≥ implies an ordering over tuples (vi , pi ) and implements the trade-off between the expected reward and the probability of successfully executing the program. The 43 5 BestDoNDet-macro then is defined as: The major difference between the original DTGolog and our extended version of DTGolog is that the latter allows for an unrestricted nondeterministic choice of arguments. DTGolog, on the other hand, only allows the agent to choose from a finite, pre-defined list of possibilities. The practical implications of this are that the programs are tailored to specific domain instantiations—allowing the agent to choose between the blocks B1 , B2 , and B3 , for instance, only makes sense if there are blocks with those names. On the other hand there might be other blocks than those three and and in that case limiting the choice to B1 , B2 , and B3 is not always what is intended. In our extension of DTGolog the choice is, in general, unrestricted. Any intended restriction on the choice can be implemented by including a corresponding test in the program. For instance, if the programmer wants the agent to choose a green block and do something with it she would write π x. (?(green(x)); · · · ); · · · BestDoNDet((δ1 | δ2 ); δ, s, h, def. 
case[φi (s), (vi , pi ) → idxi ], ρ) = _ φi (s) ∧ BestDo+ (δidxi ; δ, s, h, ρ) i According to Lemma 1 exactly one of the φi (s) holds and thus the decision of whether to continue with the policy computed for δ1 ; δ or for δ2 ; δ is unambiguous. If the remaining program begins with a nondeterministic choice of arguments the definition of BestDo+ again relies on an auxiliary macro BestDoP ick: def. BestDo+ (π x. (γ); δ, s, h, ρ) = BestDoPick (π x. (γ); δ, s, h, Vhγ;δ (s) ◦ Phγ;δ (s), ρ) The definition of BestDoPick (π x. (γ); δ, s, h, case[φi (x, s), (vi , pi )], ρ) resembles the operation method of the casemax-operator. We assume that the φi (x, s) are sorted such that (vi , pi ) ≥ (vi+1 , pi+1 ). Then: Our approach can handle the unrestricted choice of arguments due to the (first-order) state- as well as action-abstraction that is achieved by means of the case-statements Vhδ (s) and Phδ (s). The consequence thereof is that the branching factor of the search tree spanned by BestDo+ depends on the number of cases in Vhδ (s) and Phδ (s) and not on the number of (ground) choices given to the agent as it is the case for BestDo. Although DTGolog is not limited to finite domains it is not capable of incorporating infinitely many ground choices into its decisions. Our extension can do so due to its abstraction mechanisms. On the other hand the formulas in the case statements can get large, actually even unmanageable large, quite quickly. But one can argue that the complexity of the formulas resulting form expanding the BestDo-macro is comparable. To see how this turns out in practise and whether we can even achieve a speed-up in the computation of a policy in comparison to DTGolog we performed tests in two different domains. The first domain is a Blocks World domain. The purpose of this domain is to test how the number of available blocks affects the time it takes to compute a policy. The second domain is the logistics domain which consists of several cities, trucks, and boxes which can be transported from one city to another. Here the goal is to see how DTGolog with and without action abstraction compare with each other in a domain that is a little bit more complex than the Blocks World domain. Both interpreters, the standard DTGolog interpreter and our extended version, have been implemented in Prolog in a straightforward manner. The only optimization we employed is to represent the case statements Vhδ (s) and Phδ (s) using first-order arithmetic decision diagrams (FOADD) as it has been suggested in [Sanner and Boutilier, 2009]. All experiments were carried out on a machine with a 2.6 GHz Core 2 duo and 2 GB of RAM. def. BestDoPick (πx.(γ); δ, s, h, case[φi (x, s), (vi , pi )], ρ) = _^ ¬∃x. φj (x, s) i j<i ∧ ∃x. [φi (x, s) ∧ BestDo+ (γ; δ, s, h, ρ)] ∨ ^ Comparison with DTGolog ¬∃x. φi (x, s) ∧ ρ = Stop i Note that the existential quantifier over the the φi also ranges over the macro BestDo+ and thus the x which occurs as a free variable in the policy returned by BestDo+ (γ; δ, s, h, ρ) is bound by the existential such that φi (x, s) holds. Theorem 1. For any DTGolog program δ, D |= ∀ρ. ∃p, v. BestDo(δ, h, S0 , p, v, ρ) ≡ BestDo+ (δ, h, S0 , ρ) (We assume that all restricted nondeterministic choices of arguments in δ have been rewritten as nondeterministic branchings.) There seems to be an anomaly in the definition of DTGolog’s BestDo-macro. 
Whereas for primitive actions the reward obtained in the situation before the action is executed is taken into account, this is not the case for stochastic actions. For instance, let A be a primitive, deterministic action and B a stochastic action with A as its sole outcome action (chosen by Nature with probability 1). Then the expected rewards for executing A and B may differ, which seems strange. The anomaly can easily be "fixed" by also considering the reward obtained in the situation before a stochastic action is executed:

BestDo([α; δ], s, h, ρ, v, pr) =def ∃ρ′, v′. BestDoAux({n1, . . . , nk}, δ, s, h, ρ′, v′, pr) ∧ v = reward(s) + v′ ∧ ρ = [α; senseEffectα; ρ′]

For the proof of Theorem 1 we assumed this definition of BestDo.

5.1 Blocks World
In our instances of the Blocks World there are coloured blocks. The fluent On(b1, b2, s) denotes that block b1 is on top of block b2 in situation s. Predicates like green(b) or blue(b) encode the colouring of the blocks. The stochastic action move(b1, b2) has two outcomes: if it succeeds, block b1 is moved on top of block b2; if it fails, block b1 remains in its current location. The probability with which the move action succeeds or fails depends on whether the block to be moved is heavy; the probability of failure is higher for heavy blocks. The reward function assigns a reward of 10 to situations in which there is a green block on top of a non-green block. In the experiments the number of blocks varied between 10 and 500, and we computed policies with BestDo and BestDo+ for a program that nondeterministically picks two blocks and moves one of them on top of the other:
π x. (π y. (move(x, y)))
In the DTGolog variant of this program the nondeterministic choice of action arguments ranges over all the blocks in the domain instance. Consequently, the search tree branches over all possible combinations of the two blocks. With action abstraction the branching factor of the search tree is constant and independent of the number of objects. This is reflected in the time it takes to compute a policy (cf. Figure 1). With an increasing number of blocks the computation time for DTGolog rises exponentially, whereas the computation time for our extended version of DTGolog with action abstraction remains nearly constant. The slow increase in computation time is explained by the fact that evaluating the quantified formulas appearing in the case statements for the program above takes longer as the number of objects grows. In comparison to the computation time of DTGolog, however, this increase is negligible.

[Figure 1: Influence of the number of possible action arguments on the planning time of DTGolog (w/o action abstraction) and our extended version featuring action abstraction.]
[Figure 2: Planning times for different horizons.]

5.2 Logistics Domain
In a second experiment we compared our version of DTGolog with action abstraction to the original DTGolog version in the logistics domain. In that domain trucks are supposed to transport boxes from one city to another. The world is described by the fluents boxIn(b, c), meaning that box b is in city c, boxOn(b, t), meaning that box b is on truck t, and truckIn(t, c), meaning that truck t is in city c. In our setting there are five cities and the intended goal is to have a box in city C1. The program we computed policies for is:
while ¬∃b. boxIn(b, C1) do
  π c. (drive(T1, c)); π b. (load(b, T1)); drive(T1, C1); π b. (unload(b, T1))
end
This time we intend to examine whether our extended version of DTGolog allows longer policies to be computed in the same amount of time. We therefore recorded the computation time for planning policies of different lengths. The results are shown in Figure 2: with action abstraction, policies that are two steps longer can be computed in the same time. It has to be noted, though, that there are also domains in which no advantage in terms of computation time is gained from the abstraction applied in our extended version of DTGolog. One such example is a slight variation of the logistics domain above. In the version above every city is directly reachable from every other city. If we restrict this, the reachability relation has to be encoded explicitly in the domain description. This not only increases the complexity of the formulas in Vhδ(s) and Phδ(s) but, in particular, leads to formulas with more deeply nested quantifiers. This in turn increases the time it takes to evaluate those formulas (at least with our rather unsophisticated implementation) to such an extent that in the end DTGolog is a little faster. Additionally, we did not precompute the case statements but computed them on the fly, since the required computation time was marginal.

To sum up, these experiments show that our extension of DTGolog can lead to a tremendous speed-up in planning. Such a speed-up should be observable in domains and domain instances where the branching factor of the search tree can be drastically reduced through state and action abstraction. There are, however, also domains where this is not possible; there the speed-up is modest, or our extended version of DTGolog is even slower than the original version.
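To make the case-statement machinery underlying these experiments a little more concrete, the following sketch (in Python) shows one possible way to represent a case statement as a list of condition-value branches and to realize rough analogues of the ⊕ and ∪≥/casemax operations on it. This is an illustration only: the representation, the function names, and the propositional stand-in for first-order conditions are assumptions of this sketch and are not taken from the paper's Prolog/FOADD implementation. Its point is simply that evaluating such a statement touches one branch per case, independently of how many ground objects the domain contains.

# Illustrative sketch only; names and representation are assumptions.
class Case:
    """A case statement: (condition, value) branches over situations."""
    def __init__(self, branches):
        self.branches = branches          # list of (cond: s -> bool, value)

    def oplus(self, other):
        """Rough analogue of the (+)-operation: combine branches pairwise,
        adding their values."""
        return Case([(lambda s, c1=c1, c2=c2: c1(s) and c2(s), v1 + v2)
                     for (c1, v1) in self.branches
                     for (c2, v2) in other.branches])

    def evaluate(self, situation):
        """Return the value of the first branch whose condition holds."""
        for cond, value in self.branches:
            if cond(situation):
                return value
        raise ValueError("no branch applies to this situation")

def union_sorted(a, b):
    """Rough analogue of the sorted union followed by casemax: keep all
    branches, ordered by decreasing value, so evaluate() picks the maximal
    applicable value."""
    return Case(sorted(a.branches + b.branches,
                       key=lambda bv: bv[1], reverse=True))

# Toy reward case for the Blocks World example: reward 10 if some green
# block sits on a non-green block, 0 otherwise.
r_case = Case([(lambda s: s["green_on_nongreen"], 10.0),
               (lambda s: not s["green_on_nongreen"], 0.0)])
print(r_case.evaluate({"green_on_nongreen": True}))   # -> 10.0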
6 Related Work
There are numerous approaches that aim for a compact representation of MDPs by using representation languages of varying expressiveness (e.g., a probabilistic variant of STRIPS [Dearden and Boutilier, 1997], relational logic [Kersting et al., 2004], or first-order logic as in DTGolog [Boutilier et al., 2000]). These representations provide abstract descriptions of the states, the state transitions, and the transition probabilities, respectively. The next step is to exploit those compact representations when solving the MDP, which is exactly what we did here for DTGolog's first-order MDP representation. The technique we use was first presented in [Boutilier et al., 2001], where it was shown how an abstract representation of the value function can be derived from a first-order description of an MDP. The main difference to our approach is that they do not consider programs to restrict the search for the optimal policy. A first approach that combines abstract value functions and Golog programs was presented in [Finzi and Lukasiewicz, 2007]. Contrary to our approach, they assume an incompletely specified model (in particular, the probability distribution of Nature's choices is unspecified) and apply Q-learning techniques to obtain an optimal policy. The update of the Q-values and the assembling of the policy, though, is not handled within the language. Furthermore, their approach does not incorporate action abstraction. Restricting the search space for the optimal policy by means of partial programs has been explored extensively, for instance, in [Parr and Russell, 1997], [Andre and Russell, 2000], and [Andre and Russell, 2002]. Some of these approaches include abstraction mechanisms, too, but these rely on manual intervention, since the partial programs they use have no properly defined semantics.

7 Conclusion
In this paper we presented an extension of DTGolog that allows an execution strategy for a given program to be determined decision-theoretically while abstracting over action instances. This not only increases the expressiveness of the programs, since we are no longer limited to the restricted nondeterministic choice of DTGolog.
Additionally, we have shown that in practice this can lead to a speed-up in the computation of the policy, despite the complexity of the formulas that have to be handled to achieve action abstraction. Nevertheless, for more complex domains or larger horizons the formulas still become unmanageably large, even with the ADD representation. A subject of future research will therefore be to explore how the complexity of the formulas can be reduced. One possible approach is to approximate the value function by a linear combination of weighted basis functions. The problem with this is that it is not clear how to find basis functions that allow for a good approximation in the context of programs.

References
[Andre and Russell, 2000] D. Andre and S. Russell. Programmable reinforcement learning agents. In Advances in Neural Information Processing Systems 13 (NIPS 2000), pages 1019-1025, 2000.
[Andre and Russell, 2002] D. Andre and S. J. Russell. State abstraction for programmable reinforcement learning agents. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), pages 119-125, 2002.
[Boutilier et al., 2000] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pages 355-362, 2000.
[Boutilier et al., 2001] C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 690-700, 2001.
[Dearden and Boutilier, 1997] R. Dearden and C. Boutilier. Abstraction and approximate decision-theoretic planning. Artificial Intelligence, 89(1-2):219-283, 1997.
[Finzi and Lukasiewicz, 2007] A. Finzi and T. Lukasiewicz. Adaptive multi-agent programming in GTGolog. In KI 2006: Advances in Artificial Intelligence, 29th Annual German Conference on AI, pages 389-403. Springer, 2007.
[Kersting et al., 2004] K. Kersting, M. van Otterlo, and L. De Raedt. Bellman goes relational. In Proceedings of the Twenty-First International Conference on Machine Learning, page 59. ACM, 2004.
[Levesque et al., 1997] H. J. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. B. Scherl. GOLOG: A logic programming language for dynamic domains. The Journal of Logic Programming, 31(1-3):59-83, 1997.
[Marthi et al., 2005] B. Marthi, S. Russell, D. Latham, and C. Guestrin. Concurrent hierarchical reinforcement learning. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 1652-1653, 2005.
[Parr and Russell, 1997] R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10 (NIPS 1997), pages 1043-1049, 1997.
[Puterman, 1994] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY, USA, 1994.
[Reiter, 1991] R. Reiter. The frame problem in the situation calculus: a simple solution (sometimes) and a completeness result for goal regression. In Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359-380, 1991.
[Sanner and Boutilier, 2009] S. Sanner and C. Boutilier. Practical solution techniques for first-order MDPs. Artificial Intelligence, 173(5-6):748-788, 2009.

Verifying properties of action theories by bounded model checking
Laura Giordano, Dipartimento di Informatica, Università del Piemonte Orientale, Italy
Alberto Martelli, Dipartimento di Informatica, Università di Torino, Italy
Daniele Theseider Dupré, Dipartimento di Informatica, Università del Piemonte Orientale, Italy

Abstract
Temporal logics are well suited for reasoning about actions, as they allow for the specification of domain descriptions including temporal constraints as well as for the verification of temporal properties of the domain. In this paper, we exploit bounded model checking (BMC) techniques in the verification of properties of an action theory formulated in a temporal extension of answer set programming. To achieve completeness, we follow an approach to BMC that exploits the Büchi automaton construction. The paper provides an encoding in ASP of the temporal action domain and of bounded model checking of LTL formulas.

1 Introduction
Temporal logics are well suited for reasoning about actions, as they allow for the specification of domain descriptions including temporal constraints as well as for the verification of temporal properties of the domain. In this paper, we exploit bounded model checking (BMC) techniques in the verification of properties of an action theory formulated in a temporal extension of answer set programming (ASP [10]). Given a system model (a transition system) and a property to be checked, bounded model checking (BMC) [4] searches for a counterexample of the property by looking for a path in the model. BMC does not require a tableau or automaton construction. It searches for a counterexample as a path of length k and generates a propositional formula that is satisfiable iff such a counterexample exists. The bound k is iteratively increased and, if no model exists, the iterative procedure never stops. As a consequence, bounded model checking (as defined in [4]) provides only a partial decision procedure for checking validity. Techniques for achieving completeness are described in [4], where upper bounds on k are defined for some classes of properties, namely unnested properties. To address this problem, [5] proposes a semantic translation scheme based on Büchi automata. Heljanko and Niemelä [18] developed a compact encoding of bounded model checking of LTL formulas as the problem of finding stable models of logic programs. In this paper, we propose an alternative encoding of BMC of LTL formulas in ASP, with the aim of achieving completeness.
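The overall iteration just outlined (and detailed in Sections 5 and 6) can be summarized in a few lines of Python. The helper functions exists_k_loop and exists_simple_path stand in for the two ASP queries described in the paper, namely the search for a k-loop satisfying the constraints plus the negated property and the search for a loop-free path of length k with pairwise distinct states; both helpers, and all names below, are assumptions of this sketch rather than part of the paper's encoding.

def bounded_model_check(domain, constraints, neg_property, k_max=None):
    """Iteratively search for a k-loop counterexample; stop when no simple
    (loop-free, repetition-free) path of length k exists any more, which
    bounds the length of the longest path of the product automaton."""
    k = 0
    while k_max is None or k <= k_max:
        # Assumed helper: is there a k-loop of length k satisfying the
        # constraints and the negated property?
        loop = exists_k_loop(domain, constraints, neg_property, k)
        if loop is not None:
            return ("counterexample", loop)
        # Assumed helper: is there still a simple path of length k?
        # If not, the bound has been reached and the property holds.
        if not exists_simple_path(domain, constraints, neg_property, k):
            return ("valid", None)
        k += 1
    return ("unknown", None)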
In contrast to [18], the computed path is built by exploiting the Büchi automaton construction [14]: it is an accepting path of the product Büchi automaton which can be finitely represented as a k-loop, i.e., a finite path of length k, with pairwise distinct states, terminating in a loop back to an earlier state. In the verification of a given property, the iterative procedure looks for a k-loop that provides a counterexample to the property, increasing k until either a counterexample is found or it can be established that no k-loop of length greater than or equal to the current k exists. The second condition can be verified by checking that there is no path of length k whose states are all distinct from each other. In the paper, the transition system defining the system model on which the property is to be checked is provided by a domain description in a temporal action theory. The action theory is given in a temporal extension of ASP, and the extensions of a domain description are defined by generalizing the standard notion of answer set [10] to temporal answer sets. The encoding of BMC in ASP is based on the definition of the Büchi automaton in [19] and exploits the tableau-based procedure in [15] to construct the automaton on the fly. The tableau procedure is directly encoded in ASP to build a path of the product automaton. The encoding in ASP uses a number of ground atoms which is linear in the size of the formula and quadratic in k.

2 Linear Time Temporal Logic
In this paper we refer to a formulation of LTL (linear time temporal logic), introduced in [19], in which the next state modality is indexed by actions. Let Σ be a finite non-empty alphabet. The members of Σ are actions. Let Σ* and Σω be the sets of finite and infinite words over Σ, and let Σ∞ = Σ* ∪ Σω. We denote by σ, σ′ the words over Σω and by τ, τ′ the words over Σ*. Moreover, we denote by ≤ the usual prefix ordering over Σ* and, for u ∈ Σ∞, we denote by prf(u) the set of finite prefixes of u. Let P = {p1, p2, . . .} be a countable set of atomic propositions. The set of formulas of LTL(Σ) is defined as follows:
LTL(Σ) ::= p | ¬α | α ∨ β | ⟨a⟩α | αUβ
where p ∈ P and α, β range over LTL(Σ). A model of LTL(Σ) is a pair M = (σ, V) where σ ∈ Σω and V : prf(σ) → 2^P is a valuation function. Given a model M = (σ, V), a finite word τ ∈ prf(σ) and a formula α, the satisfiability of α at τ in M, written M, τ |= α, is defined as follows:
• M, τ |= p iff p ∈ V(τ);
• M, τ |= ¬α iff M, τ ⊭ α;
• M, τ |= α ∨ β iff M, τ |= α or M, τ |= β;
• M, τ |= ⟨a⟩α iff τa ∈ prf(σ) and M, τa |= α;
• M, τ |= αUβ iff there exists τ′ such that ττ′ ∈ prf(σ) and M, ττ′ |= β, and moreover, for every τ′′ such that ε ≤ τ′′ < τ′, M, ττ′′ |= α.
A formula α is satisfiable iff there is a model M = (σ, V) and a finite word τ ∈ prf(σ) such that M, τ |= α. The symbols ⊤ and ⊥ can be defined as ⊤ ≡ p ∨ ¬p and ⊥ ≡ ¬⊤. The derived modalities [a]α, ◯ (next), ✸ and ✷ are defined as follows: [a]α ≡ ¬⟨a⟩¬α, ◯α ≡ ⋁a∈Σ ⟨a⟩α, ✸α ≡ ⊤Uα, ✷α ≡ ¬✸¬α.

3 Temporal action language
A domain description Π is defined as a set of laws describing the effects of actions and their executability preconditions. Atomic propositions describing the state of the domain are called fluents. Actions may have direct effects, described by action laws, and indirect effects, described by causal laws capturing the causal dependencies among fluents.
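As a small, self-contained illustration of the satisfaction clauses of Section 2, the sketch below evaluates an LTL(Σ) formula at a position of a finite prefix of a run, with the valuation given per position. The tuple-based formula representation, the function name, and the treatment of the end of the prefix (a ⟨a⟩-formula is simply false when the prefix has no continuation) are choices made for this illustration and are not taken from the paper.

# Formulas as nested tuples, e.g. ("dia", "a", ("prop", "p")) for <a>p.
def sat(actions, labels, i, f):
    """Check whether formula f holds at position i of the finite prefix
    given by the action word `actions` and the per-position valuation
    `labels` (len(labels) == len(actions) + 1)."""
    kind = f[0]
    if kind == "prop":
        return f[1] in labels[i]
    if kind == "not":
        return not sat(actions, labels, i, f[1])
    if kind == "or":
        return sat(actions, labels, i, f[1]) or sat(actions, labels, i, f[2])
    if kind == "dia":                      # <a> alpha
        _, a, alpha = f
        return i < len(actions) and actions[i] == a \
               and sat(actions, labels, i + 1, alpha)
    if kind == "until":                    # alpha U beta
        _, alpha, beta = f
        for j in range(i, len(labels)):
            if sat(actions, labels, j, beta):
                return all(sat(actions, labels, m, alpha) for m in range(i, j))
        return False
    raise ValueError("unknown connective: " + str(kind))

# Toy trace in the spirit of the mail example of Section 3.
actions = ["sense_mail", "deliver_a"]
labels  = [set(), {"mail_a"}, set()]
print(sat(actions, labels, 0, ("dia", "sense_mail", ("prop", "mail_a"))))  # True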
Let L be a first order language which includes a finite number of constants and variables, but no function symbol. Let P be a set of atomic literals p(t1 , . . . , tn ), that we call fluent names. A simple fluent literal l is a fluent name f or its negation ¬f . We denote by LitS the set of all simple fluent literals. LitT is the set of temporal fluent literals: if l ∈ LitS , then [a]l, l ∈ LitT , where a is an action name (an atomic proposition, possibly containing variables), and [a] and are the temporal operators introduced in the previous section. Let Lit = LitS ∪ LitT ∪ {⊥}, where ⊥ represents inconsistency. Given a (simple or temporal) fluent literal l, not l represents the default negation of l. A (simple or temporal) fluent literal possibly preceded by a default negation, will be called an extended fluent literal. The laws are formulated as rules of a temporally extended logic programming language. Rules have the form t0 ← t1 , . . . , tm , not tm+1 , . . . , not tn (1) where the ti ’s are either simple fluent literals or temporal fluent literals. As usual in ASP, the rules with variables will be used as a shorthand for the set of their ground instances. In the following, to define our action language, we make use of a notion of state: a set of ground fluent literals. A state is said to be consistent if it is not the case that both f and ¬f belong to the state, or that ⊥ belongs to the state. A state is said to be complete if, for each fluent name p ∈ P, either p or ¬p belong to the state. The execution of an action in a state may possibly change the values of fluents in the state through its direct and indirect effects, thus giving rise to a new state. We assume that a law as (1) can be applied in all states, while we prefix a law with Init if it only applies to the initial state. 1 We define τ ≤ τ ′ iff ∃τ ′′ such that τ τ ′′ = τ ′ . Moreover, τ < τ ′ iff τ ≤ τ ′ and τ 6= τ ′ . 48 Example 1 This example describes a mail delivery agent, which checks if there is mail in the mailbox of some employees and delivers the mail to them. The actions in Σ are: sense mail (the agent verifies if there is mail in all mailboxes), deliver(E) (the agent delivers the mail to employee E),wait. The fluent names are mail(E) (there is mail in the mailbox of E). The domain description Π contains the following immediate effects and persistency laws: [deliver(E)]¬mail(E) [sense mail]mail(E) ← not [sense mail]¬mail(E) mail(E) ← mail(E), not ¬mail(E) ¬mail(E) ← ¬mail(E), not mail(E) Their meaning is (in the order) that: after delivering the mail to E, there is no mail for E any more; the action sense mail may (non-monotonically) cause mail(E) to become true. The last two rules define the persistency of fluent mail. Observe that, the persistency laws interact with the immediate effect laws above. The execution of sense mail in a state in which there is no mail for some E (¬mail(E)), may either lead to a state in which mail(E) holds (by the second action law) or to a state in which ¬mail(E) holds (by the persistency of ¬mail(E)). Thus sense mail is a nondeterministic action. We can also add the following precondition laws: [deliver(E)] ⊥← ¬mail(E) [wait] ⊥← mail(E) specifying that, if there is no mail for E, deliver(E) is not executable, while, if there is mail for E, wait is not executable. We assume that there are only two employees, a and b, and that in the initial state there is neither mail for a nor for b: Init ¬mail(a) Init ¬mail(b). are included in Π. 
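To illustrate how the laws of Example 1 induce (possibly nondeterministic) state transitions, the following sketch enumerates the successor states of a complete state under each action, applying the effect, persistency, and precondition laws of the example directly. The Python encoding of a state as the set of employees that have mail, and the action names used, are assumptions of this illustration; the actual ASP encoding of the example is given later in Section 6.

EMPLOYEES = ["a", "b"]

def successors(state, action):
    """Possible successor states (frozensets of employees with mail) of
    `state` under `action`, following the laws of Example 1."""
    if action == "wait":
        # [wait] _|_ <- mail(E): wait is only executable if nobody has mail.
        return [] if state else [frozenset(state)]
    if action.startswith("deliver_"):
        e = action.split("_", 1)[1]
        # deliver(E) requires mail(E); its direct effect is ~mail(E).
        return [frozenset(state - {e})] if e in state else []
    if action == "sense_mail":
        # sense_mail may nondeterministically make mail(E) true; otherwise
        # the old value persists, so every superset of the current state is
        # a possible successor.
        results = [frozenset(state)]
        for e in EMPLOYEES:
            results = [s | extra for s in results
                       for extra in (frozenset(), frozenset({e}))]
        return sorted(set(results), key=sorted)
    raise ValueError("unknown action: " + action)

# From the initial state (no mail at all), sense_mail has four successors,
# while deliver(a) is not executable:
print(successors(frozenset(), "sense_mail"))
print(successors(frozenset(), "deliver_a"))   # -> []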
Although not included in the example, the language is also well suited to describe causal dependencies among fluents by means of static causal laws such as, for instance, light on ← voltage (if there is voltage, the light is on), or dynamic causal laws as (form the shooting domain) f rightened ← in sight, ¬in sight, alive (if the turkey is alive, it becomes frightened, if it is not already, when it starts seeing the hunter). Similar causal rules can be formulated in the action languages K [9] and C + [16]. 3.1 Temporal answer sets To define the the semantics of a domain description, we extend the notion of answer set [10] to capture the linear structure of temporal models. In the following, we consider the ground instantiation of the domain description Π, and we denote by Σ the set of all the ground instances of the action names in Π. We define a temporal interpretation as a pair (σ, S), where σ ∈ Σω is a sequence of actions and S is a consistent set of literals of the form [a1 ; . . . ; ak ]l, where a1 . . . ak is a prefix of σ, meaning that l holds in the state obtained by executing a1 . . . ak . S is consistent iff it is not the case that both [a1 ; . . . ; ak ]l ∈ S and [a1 ; . . . ; ak ]¬l ∈ S, for some l, or [a1 ; . . . ; ak ]⊥ ∈ S. A temporal interpretation (σ, S) is said to be total if either [a1 ; . . . ; ak ]p ∈ S or [a1 ; . . . ; ak ]¬p ∈ S, for each a1 . . . ak prefix of σ and for each fluent name p. We define the satisfiability of a simple, temporal or extended literal t in a partial temporal interpretation (σ, S) in the state a1 . . . ak , (written (σ, S), a1 . . . ak |= t) as follows: (σ, S), a1 . . . ak |= ⊤, (σ, S), a1 . . . ak 6|= ⊥ (σ, S), a1 . . . ak |= l iff [a1 ; . . . ; ak ]l ∈ S, for a literal l (σ, S), a1 . . . ak |= [a]l iff [a1 ; . . . ; ak ; a]l ∈ S or a1 . . . ak , a is not a prefix of σ (σ, S), a1 . . . ak |= l iff [a1 ; . . . ; ak ; b]l ∈ S, where a1 . . . ak b is a prefix of σ (σ, S), a1 . . . ak |= not l iff (σ, S), a1 . . . ak 6|= l The satisfiability of rule bodies in a temporal interpretation are defined as usual. A rule H ← Body is satisfied in a temporal interpretation (σ, S) if, for all action sequences a1 . . . ak (including the empty one), (σ, S), a1 . . . ak |= Body implies (σ, S), a1 . . . ak |= H. A rule Init H ← Body is satisfied in a partial temporal interpretation (σ, S) if, (σ, S), ε |= Body implies (σ, S), ε |= H, where ε is the empty action sequence. To define the answer sets of Π, we introduce the notion of reduct of Π, containing rules of the form: [a1 ; . . . ; ah ](H ← Body). Such rules are evaluated in the state a1 . . . ah . Let Π be a set of rules over an action alphabet Σ, not containing default negation, and let σ ∈ Σω . Definition 1 A temporal interpretation (σ, S) is a temporal answer set of Π if S is minimal (in the sense of set inclusion) among the S ′ such that (σ, S ′ ) is a partial interpretation satisfying the rules in Π. To define answer sets of a program Π containing negation, given a temporal interpretation (σ, S) over σ ∈ Σω , we define the reduct, Π(σ,S) , of Π relative to (σ, S) extending Gelfond and Lifschitz’ transform [11] to compute a different reduct of Π for each prefix a1 , . . . , ah of σ. (σ,S) Definition 2 The reduct, Πa1 ,...,ah , of Π relative to (σ, S) and to the prefix a1 , . . . , ah of σ , is the set of all the rules [a1 ; . . . ; ah ](H ← l1 , . . . , lm ) such that H ← l1 , . . . , lm , not lm+1 , . . . , not ln is in Π and (σ, S), a1 , . . . , ah 6|= li , for all i = m + 1, . . . , n. 
The reduct Π(σ,S) of Π relative to (σ, S) is the union of all (σ,S) reducts Πa1 ,...,ah for all prefixes a1 , . . . , ah of σ. Definition 3 A temporal interpretation (σ, S) is an answer set of Π if (σ, S) is an answer set of the reduct Π(σ,S) . Although the answer sets of a domain description Π are partial interpretations, in some cases, e.g., when the initial state is complete and all fluents are inertial, it is possible to guarantee that the temporal answer sets of Π are total. In case the initial state is not complete,we consider all the possible ways to complete the initial state by introducing in Π, for each fluent name f , the rules: Init f ← not ¬f Init ¬f ← not f 49 The case of total temporal answer sets is of special interest as a total temporal answer set (σ, S) can be regarded as temporal model (σ, V ), where, for each finite prefix a1 . . . ak of σ, V (a1 , . . . , ak ) = {p : [a1 , . . . , ak ]p ∈ S}. In the following, we restrict our consideration to domain descriptions Π, such that all the answer sets of Π are total. A total temporal interpretation (σ, S) provides, for each prefix a1 . . . ak , a complete state corresponding to that prefix. (σ,S) We denote by wa1 ...ak the state obtained by the execution of (σ,S) the actions a1 . . . ak in the sequence, namely wa1 ...ak = {l : [a1 ; . . . ; ak ]l ∈ S}. Given a domain description Π over Σ with total answer sets, a transition system (W, I, T ) can be associated with Π as follows: - W is the set of all the possible consistent and complete states of the domain description; - I is the set of all the states in W satisfying the initial state laws in Π; - T ⊆ W × Σ × W is the set of all triples (w, a, w′ ) such that: w, w′ ∈ W , a ∈ Σ and for some total answer set (σ,S) (σ,S) (σ, S) of Π: w = w[a1 ;...;ah ] and w′ = w[a1 ;...;ah ;a] , for some h. 3.2 Reasoning with LTL on domain descriptions As a total temporal answer set of a domain description can be interpreted as an LTL model, it is easy to combine domain descriptions with LTL formulas. This can be done in two ways: on the one hand, LTL formulas can be used as constraints C on the executions of the domain description; on the other hand, LTL formulas can encode properties φ to be verified on the domain description. Example 2 Assume we want to constrain our domain description in Example 1 so that the agent continuously executes a loop where it senses mail, but there cannot be two consecutive executions of sense mail. These constraints can be formulated as follows: ✷✸hsense maili⊤ ✷[sense mail]¬hsense maili⊤ Furthermore, we may want to check that, if there is mail for a, the agent will eventually deliver it to a. This property, which can be formalized as ✷(mail(a) ⊃ ✸¬mail(a)), does not hold as there is a possible scenario in which there is always mail for a and for b, but the mail is repeatedly delivered to b and never to a. The mail delivery agent we have described is not correct with respect to this property. In the following, we will assume that a set of constraints C is added to the domain description, beside the rules in Π, and we denote by (Π, C) the enriched domain description. We define the extensions of (Π, C) to be the temporal answer sets of Π satisfying the constraints C. 4 Model checking The above verification and satisfiability problems can be solved by means of model checking techniques. 
Given a domain description, with its associated transition system, the extension of the domain description satisfying a set of constraints C can be found by looking for a path in the transition system satisfying the formulas in C. On the other hand, given a property ϕ formulated as a LTL formula, we can check its validity by checking the unsatisfiability of ¬ϕ in the transition system. In this case, if a model satisfying ¬ϕ is found, it represents a counterexample to the validity of ϕ. The standard approach to model checking for LTL is based on Büchi automata. A Büchi automaton over an alphabet Σ is a tuple B = (Q, →, Qin , F ) where: Q is a finite nonempty set of states; →⊆ Q×Σ×Q is a transition relation; Qin ⊆ Q is the set of initial states; F ⊆ Q is a set of accepting states. Let σ ∈ Σω ; a run of B over σ is a map ρ : prf (σ) → Q a such that: ρ(ε) ∈ Qin and ρ(τ ) → ρ(τ a) for each τ a ∈ prf (σ) with a ∈ Σ. The run ρ is accepting iff inf(ρ) ∩ F 6= ∅, where inf(ρ) ⊆ Q is given by: q ∈ inf (ρ) iff ρ(τ ) = q for infinitely many τ ∈ prf (σ). The language of ω-words accepted by B is: L(B) = {σ|∃ an accepting run of B over σ}. The satisfiability problem for LTL can be solved in deterministic exponential time by constructing for each formula α ∈ LT L(Σ) a Büchi automaton Bα [14] such that the language of ω-words accepted by Bα is non-empty if and only if α is satisfiable. In case of model checking we have a property which is represented as an LTL formula ϕ, and a model (transition system) which directly corresponds to a Büchi automaton where all the states are accepting. The property can be proved by taking the product of the model and of the automaton derived from ¬ϕ, and by checking for emptiness of the accepted language. In [4] it has been shown that, in some cases, model checking can be more efficient if, instead of building the product automaton and checking for an accepting run on it, we look for an infinite path of the transition system satisfying C ∪ {¬ϕ}. This technique is called bounded model checking (BMC), since it looks for paths whose length is bounded by some integer k, by iteratively increasing the length k until a model satisfying the formulas in C ∪ {¬ϕ} is found (if one exists). More precisely, it considers infinite paths which can be represented as a finite path of length k with a back loop, i.e. with an edge from state k to a previous state in the path. A BMC problem can be efficiently reduced to a propositional satisfiability problem or to an ASP problem [18]. Unfortunately, if no model exists, the iterative procedure will never stop, if the transition system contains a loop. Thus it is a partial decision procedure for checking validity. Techniques for achieving completeness are described in [4] for some kinds of LTL formulas. 5 Büchi automata and Model checking In this paper, we propose an approach which combines the advantages of BMC and the possibility of formulating it easily and efficiently as an ASP problem, with the advantages of reasoning on the product Büchi automaton described above, mainly its completeness. In this section, we show how to build the product automaton and how to use the automaton tableau construction for BMC. In the next section we describe how to encode the transition system and BMC in ASP. 50 The problem of constructing a Büchi automaton from a LTL formula has been deeply studied. In this section we show how to build a Büchi automaton for a given LTL(Σ) formula φ using the tableau-like procedure. 
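Before turning to the tableau construction, it may help to fix the automaton-side notions of Section 4 in code. The sketch below checks whether a given finite run with a back edge, i.e. the k-loop shape used by the procedure, is an accepting run of a Büchi automaton represented explicitly by its initial states, transition relation, and accepting states. All names and the tiny example automaton are assumptions of this illustration, not part of the paper's construction.

def is_accepting_k_loop(init, trans, accepting, states, actions, loop_to):
    """states[0..k] with edge labels actions[0..k] form an accepting run:
    actions[i] labels the edge states[i] -> states[i+1], actions[-1] labels
    the back edge from states[-1] to states[loop_to], and at least one
    accepting state occurs inside the loop part."""
    if states[0] not in init:
        return False
    for i in range(len(states) - 1):
        if (states[i], actions[i], states[i + 1]) not in trans:
            return False
    if (states[-1], actions[-1], states[loop_to]) not in trans:
        return False
    return any(s in accepting for s in states[loop_to:])

# Tiny automaton accepting the words with infinitely many `b` actions.
init, accepting = {"q0"}, {"q1"}
trans = {("q0", "a", "q0"), ("q0", "b", "q1"),
         ("q1", "a", "q0"), ("q1", "b", "q1")}
# The run q0 -a-> q0 -b-> q1 with back edge q1 -b-> q1 is accepting:
print(is_accepting_k_loop(init, trans, accepting,
                          ["q0", "q0", "q1"], ["a", "b", "b"], 2))  # True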
The construction is adapted from the procedure given in [19; 15] for Dynamic Linear Time Logic (DLTL), a logic which extends LTL by indexing the until operator with regular programs. The main procedure to construct the Büchi automaton for a formula φ builds a graph G(φ) whose nodes are labelled by sets of formulas, and whose edges are labelled by symbols from the alphabet Σ. States and transitions of the Büchi automaton are obtained directly from the nodes and edges of the graph. The construction of the states makes use of an auxiliary tableau-based function tableau which handles signed formulas, i.e. formulas prefixed with the symbol T or F. This function takes as input a set of formulas2 and returns a set of sets of formulas, obtained by expanding the input set according to a set of tableau rules, formulated as follows: • φ ⇒ ψ1 , ψ2 , if φ belongs to the set of formulas, then add ψ1 and ψ2 to the set • φ ⇒ ψ1 |ψ2 , if φ belongs to the set of formulas, then replace the set with two copies of the set and add ψ1 to one of them and ψ2 to the other one. The rules are the following: Tor: For: Tneg: Fneg: Tuntil: Funtil: T(α ∨ β) ⇒ Tα|Tβ F(α ∨ β) ⇒ Fα, Fβ T¬α ⇒ Fα F¬α ⇒ Tα TαUβ ⇒ T(β ∨ (α ∧ FαUβ ⇒ F(β ∨ (α ∧ αUβ)) αUβ)) where the tableau rules for the until formula make use of the equivalence: αUβ ≡ (β ∨ (α ∧ αUβ)). This set of rules can be easily extended to deal with other boolean connectives and modal operators like ✷ or ✸ by making use of the equivalences ✷β ≡ (β ∧ ✷β)) and ✸β ≡ (β ∨ ✸β)). Given a set of formulas s, function tableau repeatedly applies the above rules to the formulas of s (by possibly creating new sets) until all formulas in all sets have been expanded. If the expansion of a set of formulas produces an inconsistent set, then this set is deleted. A set of formulas s is inconsistent in the following cases: (i) T⊥ ∈ s; (ii) F⊤ ∈ s; (iii) Tα ∈ s and Fα ∈ s; (iv) Thaiα ∈ s and Thbiβ ∈ s with a 6= b, because in a linear time logic two different actions cannot be executed in the same state. To build the graph for a formula φ, we begin by building the initial states, obtained by applying function tableau to the W set {φ , T a∈Σ hai⊤}, where the second formula takes into account the fact that runs must be infinite and thus there must be at least an outgoing edge from each state. After execution of tableau, every resulting set contains exactly one Thai⊤ formula, for some a ∈ Σ. The above tableau rules do not expand formulas whose top operator is a next time operator, i.e. haiα or α. Expanding such formulas from a node n means creating a new node containing α connected to n through an edge labelled with a 2 In this section “formulas” means “signed formulas”. in the first case, or with any symbol in Σ in the second case. Thus an obvious procedure for building the graph is to apply to all sets obtained by the tableau procedure the following construction: if node n contains a formula Thaiα, then build the set of the nodes connected to n through an edge labelled a as tableau({Tα|Thaiα ∈ n} ∪ {Tα|T α ∈ W n}∪{Fα|Fhaiα ∈ n}∪{Fα|F α ∈ n}∪{T a∈Σ hai⊤}). The construction is iterated on the new nodes. States and transitions of the Büchi automaton correspond directly to the nodes and edges of the graph. We must now define the accepting states of the automaton. Intuitively, we would like to define as accepting those runs in which all the until formulas of the form TαUβ are fulfilled. 
If a node n contains the formula TαUβ, then we can accept an infinite run containing n, if node n is followed in the run by a node n′ containing Tβ. Furthermore all nodes between n and n′ must contain Tα. Let us assume that a node n contains the until formula TαUβ. After the expansion of this formula, n either contains Tβ or T αUβ. In the latter case, each successor node will contain a formula TαUβ. We say that this until formula is derived from formula TαUβ in node n. If a node contains an until formula which is not derived from a predecessor node, we will say that the formula is new. New until formulas are obtained during the expansion of the tableau procedure. In order to formulate the accepting condition, we must be able to trace the until formulas along the paths of the graph to make sure that they are fulfilled. To do this we extend Tsigned formulas so that all until formulas have a label 0 or 1, i.e. they have the form TαU l β where l ∈ {0, 1}3 . Note that two formulas TαU 0 β and TαU 1 β are considered to be different. Furthermore, we define each node of the graph as a triple (F, x, f ), where F is an expanded set of formulas built by function tableau, x ∈ {0, 1}, and f ∈ {↓, X}. f = X means that the node represents an accepting state. For each node (F, x, f ), the label of an until formula in F will be assigned as follows: if it is a derived until formula, then its label is the same as that of the until formula in the predecessor node it derives from, otherwise, if the formula is new, it is given the label 1 − x. Given a node (F, x, f ) and a successor (F ′ , x′ , f ′ ), x′ and f ′ are defined as follows: if f = X then x′ := 1 − x else x′ := x , ′ if there is no T αU x β ∈ F ′ then f ′ := X else f ′ :=↓ Let us call 0-sequences or 1-sequences the sequences of nodes of a run ρ with x = 0 or x = 1 respectively. Intuitively, every new until formula created in a node of a 0-sequence will be fulfilled within the end of the next 1-sequence, and vice versa. In fact, the formula will be given label 1 and propagated in the following nodes with the same label, and the 1-sequence cannot terminate until the until formula is fulfilled. If ρ is an accepting run, then it must contain infinitely many nodes containing X, and thus all 0-sequences and 1sequences must be finite and, as a consequence, all until formulas will be fulfilled. Given a graph G(φ), the states and transitions of the Büchi 3 If we introduce also the ✷ and ✸ operators, we have to label them in the analogous way. 51 automaton B(φ) correspond directly to the nodes and edges of G(φ), and the set of accepting states of B(φ) consists of all states whose corresponding node contains f = X. In [15] it is proved that there is a σ ∈ L(B(φ)) if and only if there is a model M = (σ, V ) such that M, ε |= φ. The same construction can be used in model checking for building the product automaton of B(φ) and the transition system. Every state of the product automaton is the union of a set of fluents forming a state of the transition system and a set of signed formulas corresponding to a state of B(φ), while transitions must agree both with transitions of B(φ) and those of the action theory. We assume that the action theory and the LTL formulas refer to the same set of actions and atomic propositions. Of course, the states of the product automaton must be consistent, i.e. they cannot contain the literal ¬f and the signed formula Tf or f and Ff 4 . 
The construction of the automaton can be done on-the-fly, while checking for the emptiness of the language accepted by the automaton. In this paper, following the BMC approach, we aim at generating a single path of the automaton at a time. Given an integer k, we look for a path of length k of the automaton, with a loop back from the last state to a previous state l in the path, such that there is an accepting state j, l ≤ j ≤ k. Such a k-loop finitely represents an accepting run of the automaton. Note that we can consider only simple paths, that is paths without repeated nodes. This property allows to define a terminating algorithm, thus achieving completeness: the bound k is increased until a k-loop is found or the length of the longest path of the automaton is reached. To find the length of the longest path we can proceed iteratively by looking for a simple path of length k (without loop), incrementing k at each iteration. Since the product automaton has a finite size, this procedure terminates. Example 3 Let us consider the domain description in Example 1 with the constraint ✷✸hsense maili⊤. The following is a k-loop satisfying the constraint for k = 4. It consist of the wait wait states s0 , . . . , s4 with the transitions s0 → s1 , s1 → s2 , sense mail deliver(b) deliver(a) −→ s3 , s3 −→ s4 , s4 −→ s1 . State s0 is obtained by applying tableau to the LTL formula expressing the constraint, and adding the fluent literals holding in the initial state. Thus we get5 : T✷✸hsense maili⊤, T✸1 hsense maili⊤, T ✷✸hsense maili⊤, T ✸1 hsense maili⊤, Thwaiti⊤, ¬a, ¬b, x = 0, f = X The second and third formulas are obtained by the expansion of the first one, while the fourth formula is obtained by the expansion of the second one. State s1 is obtained by propagating the next time formulas and expanding them: T✷✸hsense maili⊤, T✸1 hsense maili⊤, T✸0 hsense maili⊤, T ✷✸hsense maili⊤, T ✸1 hsense maili⊤, T ✸0 hsense maili⊤, Thwaiti⊤, ¬a, ¬b, x = 1, f =↓ s2 4 Remember that the states of the transition systems are complete and thus each state must contain either f or ¬f 5 We omit formulas having as topmost operator a boolean connective, and we use a and b as a shorthand for mail(a), mail(b). The second and third formulas are identical but the index of the ✸ operator: the second formula derives from the previous state, while the third one derives from the first formula of this state; f is ↓ because there is a next time formula with label 1. State s2 is: T✷✸hsense maili⊤, T✸1 hsense maili⊤, T✸0 hsense maili⊤, T ✷✸hsense maili⊤, Thsense maili⊤, ¬a, ¬b, x = 1, f = X The value of f is X because there are no next time formulas with label 1. The formulas T✸l hsense maili⊤ are fulfilled because sense mail will be the next action. State s3 is: T✷✸hsense maili⊤, T✸1 hsense maili⊤, T ✷✸hsense maili⊤, T ✸1 hsense maili⊤, Thdeliver mail(b)i⊤, a, b, x = 0, f = X Note that the execution of sense mail changes the value of a and b. State s4 is: T✷✸hsense maili⊤, T✸0 hsense maili⊤, T✸1 hsense maili⊤, T ✷✸hsense maili⊤, T ✸1 hsense maili⊤, T ✸0 hsense maili⊤, Thdeliver mail(a)i⊤, a, ¬b, x = 1, f =↓ By executing action deliver mail(a) we have a transition back to state s1 . Example 4 Let us consider now our domain description with the two constraints in Example 2. 
To check whether the formula ϕ = ✷(mail(a) ⊃ ✸¬mail(a)) is valid, we add to the domain description the two constraints and ¬ϕ, and we get the following k-loop which represent a counterexample to the property: s0 sense mail −→ s1 , s1 deliver(b) −→ s2 , s2 sense mail −→ s3 , deliver(b) s3 −→ s2 . Furthermore, we have the following fluents in each state: s0 : ¬a, ¬b, s1 : a, b, s2 : a, ¬b, s3 : a, b. Thus the mail of a is never delivered. Let us now modify the domain theory by adding the precondition [sense mail] ⊥← mail(E). In this case, we expect ϕ to hold. To check this, we first compute the length of the longest path in the Büchi automaton, which turns out to be 9, and then check that there is no k-loop for k up to 9. 6 Encoding bounded model checking in ASP We give now a translation into standard ASP of the above procedure for building a path of the product Büchi automaton. The translation has been run in the DLV-Complex extension of DLV [20]. In the translation we use predicates like fluent, action, state, to express the type of atoms. As we are interested in infinite runs represented as k-loops, we assume a bound K to the number of states. States are represented in ASP as integers from 0 to K, where K is given by the predicate laststate(State). The predicate occurs(Action,State) describes transitions. Occurrence of exactly one action in each state can be encoded as: -occurs(A,S):- occurs(A1,S),action(A), action(A1),A!=A1,state(S). occurs(A,S):- not -occurs(A,S),action(A), state(S). 52 As we have seen, states are associated with a set of fluent literals, a set of signed formulas, and the values of x and f . Fluent literals are represented with the predicate holds(Fluent, State), T or F formulas with tt(Formula,State) or ff(Formula,State), x with the predicate x(Val,State) and f with the predicate acc(State), which is true if State is an accepting state. States on the path must be all different, and thus we need to define a predicate eq(S1,S2) to check whether the two states S1 and S2 are equal: eq(S1,S2):- state(S1), state(S2), not diff(S1,S2). diff(S1,S2):- state(S1), state(S2), tt(F,S1), not tt(F,S2). diff(S1,S2):- state(S1), state(S2), holds(F,S1), not holds(F,S2). and similarly for any other kind of component of a state. The following constraint requires all states up to K to be different: :- state(S1), state(S2), S1!=S2, eq(S1,S2), laststate(K), S1<=K, S2<=K. Furthermore we have to define suitable constraints stating that there will be a transition from state K to a previous state L6 , and that there must be a state S, L ≤ S ≤ K, such that acc(S) holds, i.e. S is an accepting state. To do this we compute the successor of state K, and check that it is equal to S. loop(L):- state(L), laststate(K), L<=K, SuccK=K+1, eq(L,SuccK). accept:- loop(L), state(S), laststate(K), L<=S, S<=K, acc(S). :- not accept. A problem we want to translate to ASP consists of a domain description Π and a set of LTL formulas ϕ1 , . . . ϕn , representing constraints or negated properties, to be satisfied on the domain description. The rules of the domain description can be easily translated to ASP, similarly to [10]. In the following we give the translation of our running example7 . action(sense mail). action(deliver(a)). action(deliver(b)). action(wait). fluent(mail(a)). fluent(mail(b)). action effects: holds(mail(E),NS):- occurs(sense mail,S), fluent(mail(E)), NS=S+1, not -holds(mail(E),NS). -holds(mail(E),NS):- occurs(deliver(E),S), fluent(mail(E)), NS=S+1. 
persistence: holds(F,NS):- holds(F,S), fluent(F), NS=S+1, 6 Since states are all different, there will be at most one state equal to the successor of K. 7 Actually, a more general approach to deal with variables in action names and fluents, consists in introducing , as done in [8], type predicates for fluents and actions and to include type conditions in the translation. not -holds(F,NS). holds(F,NS):- -holds(F,S), fluent(F), NS=S+1, not holds(F,NS). preconditions: :- occurs(deliver(E),S),-holds(mail(E),S). :- occurs(wait,S), holds(mail(E),S). initial state: -holds(mail(a),0). -holds(mail(b),0). LTL formulas are represented as ASP terms. The expansion of signed formulas can be formulated by means of ASP rules corresponding to the rules given in the previous section. Disjunction: tt(F1,S) v tt(F2,S):- tt(or(F1,F2),S). ff(F1,S):- ff(or(F1,F2),S). ff(F2,S):- ff(or(F1,F2),S). Negation: ff(F,S):- tt(neg(F),S). tt(F,S):- ff(neg(F),S). Until: tt(lab until(F1,F2,Lab),S):tt(until(F1,F2),S), x(VX,S), 1=Lab+VX. ff(or(F2,and(F1,next(until(F1,F2)))),S):ff(until(F1,F2),S). tt(or(F2,and(F1,next(lab until(F1,F2,L)))),S):tt(lab until(F1,F2,L),S). Note that, to express splitting of sets of formulas, as in the case of disjunction, we can exploit disjunction in the head of clauses, provided by some ASP languages such as DLV. We have introduced the term lab until(F1,F2,Lab) for labeled until formulas, as described in the previous section. Expansions of next time formulas hai (diamond) and (next) are defined as: occurs(Act,S):- tt(diamond(Act,F),S). tt(F,NS):- tt(diamond(Act,F),S), NS=S+1. ff(F,NS):- ff(diamond(Act,F),S), occurs(Act,S), NS=S+1. tt(F,NS):- tt(next(F),S), NS=S+1. ff(F,NS):- ff(next(F),S), NS=S+1. Inconsistency of signed formulas is formulated with the following constraints: :- ff(true,S), state(S). :- tt(F,S), ff(F,S), state(S). :- tt(diamond(Act1,F),S), tt(diamond(Act2,F),S), Act1!=Act2. :- tt(F,S), not holds(F,S). :- ff(F,S), not -holds(F,S). Finally, predicates x and acc are defined as follows. x(NN,NS):- acc(S), x(N,S), NS=S+1, 1=NN+N. x(N,NS):- -acc(S), x(N,S), NS=S+1. -acc(NS):- x(N,NS), tt(lab until( , ,N),NS), NS=S+1. acc(NS):- not -acc(NS), NS=S+1. x(0,0). acc(0). We must also add a fact tt(tr(ϕi ),0) for each ϕi , where tr(ϕi ) is the ASP term representing ϕi . It is easy to see that the (groundization of the) encoding in ASP is linear in the size of the formula and quadratic in the size of k. Observe that the number of the ground instances 53 of all predicates is O(|φ| × k), except for eq, whose ground instances are k 2 . We can prove that there is a one to one correspondence between the extensions of a domain description satisfying a given temporal formula and the answer sets of the ASP program encoding the domain and the formula. Proposition 1 Let Π be a domain description whose temporal answer sets are total, let tr(Π) be the ASP encoding of Π (for a given k), and let φ be an LTL formula. There is a one to one correspondence between the temporal answer sets of Π that satisfy the formula φ and the answer sets of the ASP program tr(Π) ∪ tt(tr(φ, 0)), where tr(φ) is the ASP term representing φ. Completeness of BMC can be achieved considering that that the longest simple path in the product Büchi automaton determines an upper bound k0 on the length of the k-loops searched for by the iterative procedure. To check the validity of a formula φ, we look for a k-loop satisfying ¬φ. During the iterative search, we can check whether the bound k0 has been already reached or not. 
In practice, at each iteration k, either we find a k-loop of length k, or we check if there is a simple path (a path with no loop and without repeated nodes) of length k. If not, we can conclude that the bound has been reached and we can stop. The search for a simple path of length k can be done by removing from the above ASP encoding the rules for defining loops and the rules for defining the Buchi acceptance condition (the definitions of x, acc and accept and the constraint :- not accept). 7 Conclusions We have presented a bounded model checking approach for the verification of properties of temporal action theories in ASP. The temporal action theory is formulated in a temporal extension of ASP, where the presence of LTL constraints in the domain description, allows for state trajectory constraints to be captured, as advocated in PDDL3 [13]. The proposed approach can be easily extended to the logic DLTL, which extends LTL with regular programs of propositional dynamic logic [19]. It provides a uniform ASP methodology for specifying domain descriptions and for verifying them, which can to be used for several reasoning tasks, including reasoning about communication protocols [2; 15], business process verification [7], planning with temporal constraints [1], to mention some of them. Helianko and Niemelä [18] developed a compact encoding of bounded model checking of LTL formulas as the problem of finding stable models of logic programs. In this paper, to achieve completeness, we follow a different approach to BMC which exploits the Büchi automaton construction. This makes the proposed approach well suited both for verifying that there is an extension of the domain description satisfying/falsifying a given property, and for verifying that all the extensions of the domain description satisfy a given property. [5] first proposed the use of the Büchi automaton in BMC. As a difference, our encoding in ASP is defined without assuming that the Büchi automaton is computed in advance. The states of the Büchi automaton are indeed computed on the fly, when building the path of the product automaton. This requires the equality among states to be checked during the construction of k-loops, which makes the size of the translation quadratic in k. This quadratic blowup is the price we pay for achieving completeness with respect to the translation to stable models in [18]. Apart from the presence of the temporal constraints, the action language we introduced in Section 3 has strong relations with the languages K and C. The logic programming based planning language K [8; 9] is well suited for planning under incomplete knowledge and which allows concurrent actions. The temporal action language introduced in section 3 for defining the rules in Π can be regarded as a fragment of K in which concurrent actions are not allowed. The planning system DLV K provides an implementation of K in the disjunctive logic programming system DLV. DLV K does not appear to support other kinds of reasoning besides planning, and, in particular, does not allow to express temporal properties and to verify them. The languages C and C + [17; 16] also deal with actions with indirect and non-deterministic effects and with concurrent actions, and are based on nonmonotonic causation rules syntactically similar to those of K, where head and body of causation rules can be boolean combination of constants. Their semantics is based on a nonmonotonic causal logic [16]. 
Due to the different semantics, a mapping between our action language and the languages C and C + appears not to be straightforward. If a causal theory is definite (the head of a rule is an atom), it is possible to reason about it by turning the theory into a set of propositional formulas by means of a completion process, and then invoke a satisfiability solver. In this way it is possible to perform various kinds of reasoning such as prediction, postdiction or planning. However the language does not exploits standard temporal logic constructs to reason about actions. The action language defined in this paper can be regarded as a temporal extension of the language A [12]. The extension allows to deal with general temporal constraints and infinite computations. Instead, it does not deal with concurrent actions and incomplete knowledge. The presence of temporal constraints in our action language is related to the work on temporally extended goals in [6; 3], which, however, is concerned with expressing preferences among goals and exceptions in goal specification. References [1] F. Bacchus and F. Kabanza. Planning for temporally extended goals. Annals of Mathematics and AI, 22:5– 27, 1998. [2] M. Baldoni, C. Baroglio, and E. Marengo. Behaviororiented Commitment-based Protocols. In Proc. 19th ECAI, pages 137–142, 2010. [3] C. Baral and J. Zhao. Non-monotonic temporal logics for goal specification. In IJCAI 2007, pages 236–242, 2007. [4] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and Y. Zhu. Bounded model checking. Advances in Computers, 58:118–149, 2003. 54 [5] E.M. Clarke, D. Kroening, J. Ouaknine, and O. Strichman. Completeness and complexity of bounded model checking. In VMCAI, pages 85–96, 2004. [6] U. Dal Lago, M. Pistore, and P: Traverso. Planning with a language for extended goals. In Proc. AAAI02, 2002. [7] D. D’Aprile, L. Giordano, V. Gliozzi, A. Martelli, G. L. Pozzato, and D. Theseider Dupré. Verifying Business Process Compliance by Reasoning about Actions. In CLIMA XI, volume 6245 of LNAI, 2010. [8] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres. A logic programming approach to knowledge-state planning, II: The DLVk system. Artificial Intelligence, 144(1-2):157–211, 2003. [9] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres. A logic programming approach to knowledge-state planning: Semantics and complexity. ACM Trans. Comput. Log., 5(2):206–263, 2004. [10] M. Gelfond. Handbook of Knowledge Representation, chapter 7, Answer Sets. Elsevier, 2007. [11] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Logic Programming, Proc. of the 5th Int. Conf. and Symposium, 1988. [12] M. Gelfond and V. Lifschitz. Representing action and change by logic programs. Journal of logic Programming, 17:301–322, 1993. [13] A. Gerevini and D. Long. Plan constraints and preferences in PDDL3. Technical Report, Department of Electronics and Automation, University of Brescia, Italy, 2005. [14] R. Gerth, D. Peled, M.Y.Vardi, and P. Wolper. Simple on-the-fly automatic verification of linear temporal logic. In Proc. 15th Work. Protocol Specification, Testing and Verification, 1995. [15] L. Giordano and A. Martelli. Tableau-based automata construction for dynamic linear time temporal logic. Annals of Mathematics and AI, 46(3):289–315, 2006. [16] E. Giunchiglia, J. Lee, V. Lifschitz, N. McCain, , and H. Turner. Nonmonotonic causal theories. Artificial Intelligence, 153(1-2):49–104, 2004. [17] E. Giunchiglia and V. Lifschitz. 
An action language based on causal explanation: Preliminary report. In AAAI/IAAI, pages 623–630, 1998. [18] K. Heljanko and I. Niemelä. Bounded LTL model checking with stable models. TPLP, 3(4-5):519–550, 2003. [19] J.G. Henriksen and P.S. Thiagarajan. Dynamic linear time temporal logic. Annals of Pure and Applied logic, 96(1-3):187–207, 1999. [20] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The DLV system for knowledge representation and reasoning. ACM Transactions on Computational Logic, 7(3):499–562, 2006. Efficient Epistemic Reasoning in Partially Observable Dynamic Domains Using Hidden Causal Dependencies Theodore Patkos and Dimitris Plexousakis Foundation for Research and Technology Hellas - FORTH Heraklion, Greece {patkos,dp}@ics.forth.gr Abstract Reasoning about knowledge and action in realworld domains requires establishing frameworks that are sufficiently expressive yet amenable to efficient computation. Where partial observability is concerned, contemporary research trends concentrate on alternative representations for knowledge, as opposed to the possible worlds semantics. In this paper we study hidden causal dependencies, a recently proposed approach to model an agent’s epistemic notions. We formally characterize its properties, substantiate its computational benefits and provide a generic implementation method. 1 Introduction Research in epistemic action theories has extended the competence of cognitive robotics to different domains. Powerful frameworks with expressive formal accounts for knowledge and change and elegant mathematical specifications have been developed, motivated primarily by the adaptation of the possible worlds semantics in action theories, e.g., [Scherl and Levesque, 2003; Thielscher, 2000; Lob, 2001]. Nevertheless, their employment to large-scale systems raises legitimate concerns, due to their dependence on a computationally intensive structure: according to the possible worlds specifications, for a domain of n atomic formulae determining whether a formula is known may require up to 2n worlds to check truth in. Aiming at efficiency, contemporary research departs from the standard practice and explores alternative characterizations of knowledge focusing on classes of restricted expressiveness or sacrificing completeness. In many cases, a demand for context-complete domains, i.e., deterministic domains where all action preconditions are known upon action execution, is introduced to prove logical equivalence to possible worlds. Yet, for real-world domains such restrictions are often too strong to accept. In a previous study we proposed a formal theory for reasoning about knowledge, action and time that can reach sound and complete conclusions with respect to possible worldsbased theories for a wide range of commonsense phenomena [Patkos and Plexousakis, 2009]. One innovative aspect of the approach was the introduction of a generic form of implication rules, called hidden causal dependency (HCD), 55 that captures the relation among unknown preconditions and effects of actions. In this paper we uncover the insights of HCDs and present the complete axiomatization. HCDs can be a valuable tool as they offer a way to represent an agent’s best knowledge about a partially observable world without having to maintain all alternative states. 
Setting off from previous studies, we adopt a more concrete representation that does not require the introduction of auxiliary fluents, offering a generic theory that is not attached to a particular underlying formalism. In addition, we elaborate on complexity issues substantiating the claims of efficiency, which stems from the fact that HCDs are treated as ordinary epistemic concepts. As the objective of this research is to contribute to the creation of practical applications without sacrificing expressiveness, this paper also discusses a way to implement the axiomatization and introduces a new tool for epistemic reasoning. We begin with a presentation of state-of-the-art approaches and, after a brief familiarization with the underlying knowledge theory, we describe in detail the axiomatization concerning HCDs. Section 5 provides a property analysis, while section 6 discusses implementation issues. 2 Related Work To alleviate the computational intractability of reasoning under the possible worlds specifications, as well as to address other problematic issues, such as the logical omniscience side-effect, alternative approaches for handling knowledge change have been proposed that are disengaged from the accessibility relation. These theories are rapidly moving from specialized frameworks to an important research field, but adopt certain restrictions on the type of knowledge formulae or domain classes that they can support. Maybe the most influential initial approach towards an alternative formal account for reasoning about knowledge and action is due to Demolombe and Pozos-Parra [2000] who introduced two different knowledge fluents to explicitly represent the knowledge that an ordinary fluent is true or false. Working on the Situation Calculus, they treated knowledge change as changing each of these fluents individually, the same way ordinary fluent change is performed in the calculus, thus reducing reasoning complexity by linearly increasing the number of fluents. Nevertheless, the expressive power of the representation was limited to knowledge of literals, while it enforced knowledge of disjunctions to be broken apart into knowledge of the individual disjuncts. Petrick and Levesque [2002] proved the correspondence of this approach to the possible worlds-based Situation Calculus axiomatization for successor state axioms of a restricted form. Moreover, they defined a combined action theory that extended knowledge fluents to also account for first-order formulae when disjunctive knowledge is tautology-free, still enforcing it to be broken apart into knowledge of the individual parts. Regression used by standard Situation Calculus is considered impractical for large sequences of actions and introduces restrictive assumptions, such as closed-world and domain closure, which are problematic when reasoning with incomplete knowledge. Recent approaches deploy different forms of progression. Liu and Levesque [2005] for instance, study a class of incomplete knowledge that can be represented in so called proper KBs and perform progression on them. The idea is to focus on domains where a proper KB will remain proper after progression, so that an efficient evaluation-based reasoning procedure can be applied. Domains where the actions have local effects (i.e., when the properties of fluents that get altered are contained in the action) provide such a safeguard. 
The approach is efficient and sound for local effect action theories and may also be complete given certain restrictions, still proper KBs under this weak progression do not permit some general forms of disjunctions to emerge. Recently, Vassos et al. [Vassos et al., 2009] investigated an extension to theories with incomplete knowledge in the Situation Calculus where the effects are not local and progression is still appropriate for practical purposes. Dependencies between unknown preconditions and effects have been incorporated in an extension of the FLUX programming language [Thielscher, 2005b] under the Fluent Calculus semantics. The extension, presented in [Thielscher, 2005a], handles dependencies by appending implication constraints to the existing store of constraint handling rules, in a spirit very similar to the HCDs proposed in the present study. The emphasis is on building an efficient constraint solver, thus limiting the expressiveness. Moreover, it is not clear how the extensive set of complicated implication rules that are defined there are related to possible worlds. Apart from Thielscher’s work, a recent study by Forth and Shanahan [2004] is highly related to ours, as they attempt to capture knowledge change as ordinary fluent change. The authors utilized knowledge fluents in the Event Calculus to specify when an agent possesses enough knowledge to execute an action in a partially observable environment. Still, their intention was to handle ramifications, focusing on closed, controlled environments, rather than building a generic epistemic theory for the Event Calculus. An agent is only assumed to perform ”safe” actions, i.e., actions for which enough knowledge about its preconditions is available. In an open environment the occurrence of exogenous actions might also fall under the agent’s attention, whose effects are dependent on -unknown to it- preconditions. It is not clear how knowledge evolves in terms of such uncertain effects, neither how knowledge about disjunction of fluents can be modeled. Within DECKT we attempt a broader treatment of knowledge evolution within open environments, unifying a wide range of complex commonsense phenomena. 56 3 Background This study uses the DECKT knowledge theory [Patkos and Plexousakis, 2009] as an underlying formalism. DECKT extends the Event Calculus [Kowalski and Sergot, 1986] with epistemic features enabling reasoning about a wide range of commonsense phenomena, such as temporal and delayed knowledge effects, knowledge ramifications, concurrency, non-determinism and others. The Event Calculus is a narrative-based many-sorted first-order language for reasoning about action and change, where events indicate changes in the environment, fluents denote time-varying properties and a timepoint sort implements a linear time structure. The calculus applies the principle of inertia, which captures the property that things tend to persist over time unless affected by some event; when released from inertia, a fluent may have a fluctuating truth value at each time instant. It also uses circumscription to solve the frame problem and support default reasoning. A set of predicates is defined to express which fluents hold when (HoldsAt), what events happen (Happens), which their effects are (Initiates, T erminates, Releases) and whether a fluent is subject to the law of inertia or released from it (ReleasedAt). DECKT employs the discrete time Event Calculus axiomatization described in [Mueller, 2006]. 
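To make the role of these predicates concrete, here is a minimal, non-epistemic sketch in Python (an illustration only; the door domain, the fluent and action names, and the progress helper are assumptions introduced for exposition, not part of DECKT or of [Mueller, 2006]):

```python
# Illustrative sketch only: a toy, non-epistemic Event Calculus-style progression.
# The domain, action names and the 'progress' helper are expository assumptions.

from typing import Dict, Set, Tuple

# Effect axioms: action -> (preconditions, initiated fluents, terminated fluents)
EFFECTS: Dict[str, Tuple[Set[str], Set[str], Set[str]]] = {
    # push gently: if the robot is at the door and it is unlocked, the door opens
    "push": ({"at_door", "unlocked"}, {"open"}, set()),
    "lock": (set(), set(), {"unlocked"}),
}

def progress(state: Set[str], action: str) -> Set[str]:
    """Compute HoldsAt(f, t+1) from HoldsAt(f, t) and one event occurrence.

    Inertia: fluents that are neither initiated nor terminated keep their value."""
    pre, init, term = EFFECTS[action]
    if not pre <= state:           # preconditions fail: no effect
        return set(state)
    return (state - term) | init   # apply Terminates, then Initiates

if __name__ == "__main__":
    s0 = {"at_door", "unlocked"}   # HoldsAt at time 0
    s1 = progress(s0, "push")      # Happens(push, 0)
    print("open" in s1)            # True: the door is open at time 1
```

The epistemic machinery described next replaces the raw truth values of such a state with Knows/KP fluents, so that the same style of progression operates on the agent's knowledge rather than on the world itself.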
DECKT assumes agents acting in dynamic environments with accurate but potentially incomplete knowledge, able to perform sensing and actions with context-dependent effects. It uses four new epistemic fluents, namely Knows, Kw (for "knows whether"), KP (for "knows persistently") and KPw. The Knows fluent expresses knowledge about domain fluents and formulae. Whenever knowledge is subject to inertia, the KP fluent is used; it is related to the Knows fluent by the following axiom (free variables are implicitly universally quantified; fluent formulae inside any epistemic fluent are reified, i.e., Knows(f1 ∧ f2) is a term of first-order logic, not an atom):

(KT2) HoldsAt(KP(φ), t) ⇒ HoldsAt(Knows(φ), t).

Moreover, knowledge can also be inferred indirectly by means of appropriate ramifications, usually modeled as state constraints. In brief, direct action effects that are subject to inertia affect the KP fluent, while indirect effects and ramifications may interact with the Knows fluent explicitly. Finally, we have that HoldsAt(Kw(f), t) ≡ HoldsAt(Knows(f), t) ∨ HoldsAt(Knows(¬f), t), and the abbreviation for HoldsAt(KPw(f), t) is analogous. (To clarify matters, the abbreviation only refers to the Kw and KPw fluents inside the distinguished predicate HoldsAt; these epistemic fluents can still be used as ordinary fluents inside any other predicate of the calculus, e.g., Terminates(e, KPw(f), t).)

The objective is to extend a given domain axiomatization with a set of meta-axioms that enable an agent to perform epistemic derivations under incomplete information. For instance, for positive effect axioms that specify under what condition action e causes fluent f to become true, i.e., ⋀_i HoldsAt(fi, t) ⇒ Initiates(e, f, t), DECKT introduces a statement expressing that if the conjunction of preconditions C = {fi} is known, then after e the effect will be known:

(KT3.1) ⋀_{fi∈C} HoldsAt(Knows(fi), t) ∧ Happens(e, t) ⇒ Initiates(e, KP(f), t)

However, if some precondition is unknown while none is known false, then after e knowledge about the effect is lost:

(KT5.1) ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ∧ ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ¬HoldsAt(Knows(f), t) ∧ Happens(e, t) ⇒ Terminates(e, KPw(f), t)

The approach is analogous for negative effect axioms ⋀_i HoldsAt(fi, t) ⇒ Terminates(e, f, t) and release axioms ⋀_i HoldsAt(fi, t) ⇒ Releases(e, f, t). The latter model non-deterministic effects and therefore result in loss of knowledge about the effect. Finally, knowledge-producing (sense) actions provide information about the truth value of fluents and, by definition, only affect the agent's mental state:

(KT4) Initiates(sense(f), KPw(f), t)

4 Hidden Causal Dependencies

HCDs emerge when an agent performs actions with unknown preconditions. Consider the positive effect axiom HoldsAt(f′, t) ⇒ Initiates(e, f, t), with f′ unknown and f known to be false at t (f may denote that a door is open, f′ that a robot stands in front of that door, and e the action of pushing forward gently). If e happens at t, f becomes unknown at t+1, as dictated by (KT5.1); still, a dependency between f′ and f must be created to denote that if we later sense either of them we can infer information about the value of the other, assuming no event interacted with them in the meantime (either the robot was standing in front of the door and opened it, or the door remained closed).
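Before turning to the representation of HCDs, the following toy sketch (an assumption-laden illustration, not the DECKT axioms themselves; the three-valued encoding of knowledge and the function name are invented for exposition) shows roughly how (KT3.1) and (KT5.1) behave on this door example: knowledge of the effect is gained when all preconditions are known true, kept by inertia when some precondition is known false, and lost otherwise.

```python
# Toy illustration of the epistemic effect axioms (KT3.1)/(KT5.1).
# The three-valued encoding and the function name are expository assumptions.

from typing import Dict, Optional

def effect_knowledge_after(preconds: Dict[str, Optional[bool]],
                           effect_known: Optional[bool]) -> Optional[bool]:
    """Return the agent's knowledge of the effect fluent f after action e.

    preconds maps each precondition to True (known true), False (known false)
    or None (unknown); effect_known is the agent's prior knowledge of f."""
    if all(v is True for v in preconds.values()):
        return True                # (KT3.1): the effect becomes known true
    if any(v is False for v in preconds.values()):
        return effect_known        # no effect fires: prior knowledge persists by inertia
    if effect_known is True:
        return True                # f already known true: a positive effect cannot falsify it
    return None                    # (KT5.1): some precondition unknown -> knowledge is lost

if __name__ == "__main__":
    # Pushing the door while unsure whether the robot stands in front of it:
    print(effect_knowledge_after({"in_front_of_door": None}, effect_known=False))  # None
```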
We propose here a representation of HCDs as disjunctive formulae and describe when they are created and destroyed and what knowledge is preserved when a HCD is destroyed. First, a word about notation. Let C denote the context of an effect axiom (the set of precondition fluents), i.e., C = {f0, ..., fn}, n ≥ 0 (we omit to specify the axiom it refers to as it will be clear from the context). Let C(t)+ be the subset of known fluents from C at a given time instant t, i.e., C(t)+ = {f ∈ C | HoldsAt(Knows(f), t)}. Finally, let C(t)− = C \ C(t)+ be the set of fluents that the agent either does not know or knows that they do not hold at t.

4.1 Creation of HCDs

Each time an action with unknown effect preconditions occurs, a HCD is created. We assume in this study that no action affects the preconditions at the same time (except, of course, if the effect's precondition is the effect fluent itself).

Positive effect axioms ⋀_i HoldsAt(fi, t) ⇒ Initiates(e, f, t): If an action with a positive effect occurs with none of its preconditions known to be false, but some unknown to the agent, a HCD is created between the latter and the effect (by sensing that the robot is standing in front of the door after the push gently action, it can infer that the door must be open):

(KT6.1.1) ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ⇒ Initiates(e, KP(f ∨ ⋁_{fj∈C(t)−} ¬fj), t)

In other words, and considering (KT2), we augment the theory with a disjunctive knowledge formula that is equivalent to HoldsAt(Knows(⋀_{fj∈C(t)−} fj ⇒ f), t+1). In addition, a HCD is also created between the effect fluent and its unknown preconditions, given that the agent knew that the effect did not hold before the action:

(KT6.1.2) ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ∧ HoldsAt(Knows(¬f), t) ⇒ ⋀_{fj∈C(t)−} Initiates(e, KP(¬f ∨ fj), t)

Axiom (KT6.1.2) is triggered with (KT6.1.1), resulting in the creation of an epistemic biimplication relation among the preconditions and the effect fluent.

Example 1. Consider the positive effect axiom HoldsAt(f1, t) ∧ HoldsAt(f2, t) ⇒ Initiates(e, f, t), denoting that when the robot stands in front of a door (f1) and the door is not locked (f2), a gentle push (e) will cause the door to open (f). Let the robot know initially that f does not hold and that f1 holds, but not know whether f2 holds, i.e., HoldsAt(Knows(¬f), 0) ∧ HoldsAt(Knows(f1), 0) ∧ ¬HoldsAt(Kw(f2), 0). In this case, C = {f1, f2}, while C(0)− = {f2}. After Happens(e, 0) both (KT6.1.1,2) are triggered, resulting in HoldsAt(KP(¬f2 ∨ f), 1) ∧ HoldsAt(KP(¬f ∨ f2), 1). This is equivalent to HoldsAt(Knows(f2 ⇔ f), 1). □

Negative effect axioms ⋀_i HoldsAt(fi, t) ⇒ Terminates(e, f, t): The situation is similar. Now, the HCDs are created for ¬f:

(KT6.1.3) ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ⇒ Initiates(e, KP(¬f ∨ ⋁_{fj∈C(t)−} ¬fj), t)

(KT6.1.4) ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ∧ HoldsAt(Knows(f), t) ⇒ ⋀_{fj∈C(t)−} Initiates(e, KP(f ∨ fj), t)

Release axioms ⋀_i HoldsAt(fi, t) ⇒ Releases(e, f, t): It is trivial to see that in the case of non-deterministic effects a HCD is only created if the agent has prior knowledge about the effects.
Specifically, only if it senses that a previously known effect fluent has changed its truth value will the agent be certain that the preconditions must have been true:

(KT6.1.5) ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t) ∧ ⋁_{fi∈C} ¬HoldsAt(Kw(fi), t) ∧ HoldsAt(Knows((¬)f), t) ⇒ Initiates(e, KP((¬)f ∨ ⋁_{fj∈C(t)−} fj), t)

4.2 Expiration of HCDs

In contrast to state constraints, which express implication relations that must be satisfied at all times, HCDs are valid only for limited time periods, as they are created due to the agent's epistemic state. Specifically, the dependency is valid for as long as the involved fluents remain unaffected by occurring events; if an event modifies them the HCD expires. First, let us define an abbreviation stating that an event e affects or may affect a fluent f if there is some effect axiom none of whose preconditions fi is known to be false:

(KmA) KmAffect(e, f, t) ≡ KmInitiate(e, f, t) ∨ KmTerminate(e, f, t) ∨ KmRelease(e, f, t)

where KmInitiate(e, f, t) ≡ Initiates(e, f, t) ∧ ¬HoldsAt(Knows(⋁_{fi∈C} ¬fi), t), and similarly for KmTerminate(e, f, t) and KmRelease(e, f, t). These epistemic predicates do not cause any actual effect to f.

HCD Termination
If an event occurs that affects or may affect any fluent of a HCD, then this HCD is no longer valid:

(KT6.2.1) HoldsAt(KP(⋁_d fd), t) ∧ Happens(e, t) ∧ ⋁_d KmAffect(e, fd, t) ⇒ Terminates(e, KP(⋁_d fd), t)

Example 2. Imagine a robot that speaks the correct passcode into a microphone (action e) with the intention of opening a safe (fluent f), without knowing whether the microphone is recording (fluent f1); the following HCD is created from (KT6.1.1): HoldsAt(Knows(¬f1 ∨ f), t). Under this simplistic setting, if later on the robot obtains information through sensing (axiom (KT4)) that the safe is still locked, it will also infer that the device is not recording. Now, at some timepoint t1 > t the robot becomes aware of an action by a third party that switches the microphone on. At this point the HCD needs to expire, as the epistemic relation among the fluents is no longer valid and no sense action on either of the two fluents can provide information about the other. Axiom (KT6.2.1) accomplishes this effect. □

HCD Reduction
Depending on the type of action and the related context, there are situations where, although a HCD becomes invalidated due to (KT6.2.1), there may still be knowledge that should be preserved. Specifically, if before the action the agent has inferred indirectly that the fluent that may be affected does not hold, then this fluent did not contribute to the HCD in the first place; the remaining components should create a new HCD:

(KT6.2.2) HoldsAt(KP(⋁_{f∈D} f), t) ∧ Happens(e, t) ∧ ⋁_{f∈D} [KmAffect(e, f, t) ∧ HoldsAt(Knows(¬f), t)] ⇒ Initiates(e, KP(⋁_{f′∈D′(t)} f′), t)

where, if f ∈ D are the fluents of the initial HCD, then D′(t) denotes those fluents of D that are not known at t.

HCD Expansion
Consider now the particular situation where a context-dependent event occurs, the preconditions of which are unknown to the agent and whose effect is part of some HCD. In this case, the agent cannot be certain whether the HCD will be affected by the event, as this depends on the truth value of the preconditions. In fact, the HCD itself becomes contingent on this set; if the preconditions prove to be false, the original HCD should still be applicable, otherwise it must be invalidated, according to the previous analysis.
The way to handle this situation is to expand the original HCD with the negation of the action's unknown preconditions. As a result, by obtaining knowledge about them the agent can distinguish whether the original dependency should persist or not.

Example 2. (cont'd) If at timepoint t2 (t1 > t2 > t) the robot itself attempts to switch the microphone on under the (unknown to it) precondition of having pressed the proper button (fluent f2) then, apart from the new HCD HoldsAt(Knows(¬f2 ∨ f1), t2+1) according to (KT6.1.1), the initial HCD needs to be modified. In particular, it should capture the fact that only if the microphone has not been switched on should the HCD remain valid, i.e., HoldsAt(Knows(f2 ∨ ¬f1 ∨ f), t2+1). □

It becomes clear that the unknown preconditions of a context-dependent effect should result in the expansion of any HCD that includes the effect. Before modeling this situation though, one must notice a crucial contingency: the agent uses these preconditions to determine whether the original HCD is applicable or not; what if this distinction cannot be made? Such a situation may be, for instance, the result of an action leading to a world state where the precondition fluents have the same truth value regardless of the state before the action (e.g., the action of closing a door if it is open). To capture such a situation we introduce the following abbreviation stating that a fluent may be inverted by an occurring event:

(INV) KmInverted(f, t) ≡ ∃e (Happens(e, t) ∧ (EffectPredicate(e, f, t) ∨ KmRelease(e, f, t)))

where, for a fluent literal f and its corresponding atom F, EffectPredicate(e, f, t) denotes KmTerminate(e, F, t) when f = F, and KmInitiate(e, F, t) when f = ¬F. Notice that the KmInverted predicate is completely independent of the truth value a fluent might have at any time instant. For example, for an effect axiom of the form HoldsAt(f1, t) ⇒ Initiates(e, f, t) we are interested whether KmInverted(f1, t) is true, while for the axiom ¬HoldsAt(f1, t) ⇒ Initiates(e, f′, t) we should seek whether KmInverted(¬f1, t) holds.

We can now formalize the axiomatization for HCD expansion: for any action e that may initiate, terminate or release a fluent of a HCD, if its unknown preconditions fi are not or may not be inverted, then a new HCD is created that involves all the components of the original HCD along with the unknown preconditions of e's effect axiom:

(KT6.2.3) HoldsAt(KP(⋁_{f∈D} f), t) ∧ Happens(e, t) ∧ ⋁_{f∈D} [KmAffect(e, f, t) ∧ ¬HoldsAt(Kw(f), t)] ∧ ¬(⋀_{fi∈C(t)−} KmInverted(fi, t)) ⇒ ⋀_{fi∈C(t)−} [Initiates(e, KP(fi ∨ ⋁_{f′∈D′(t)} f′), t)]

Intuitively, since any HCD represents an epistemic implication relation, axiom (KT6.2.3) creates a nested implication relation with head the HCD and body the negated unknown preconditions of the effect axiom that may affect it.

Transitivity
Finally, we also need to consider the transitivity property of implication relations. Whenever an agent knows that f1 implies f2 and f2 implies f3, there is an implicit relation stating that also f1 implies f3. If an action affects f2, the two original HCDs will expire due to (KT6.2.1); still, the relation between f1 and f3 that has been established should persist:

(KT6.2.4) HoldsAt(Knows(f ∨ ⋁_{fi∈Di} fi), t) ∧ HoldsAt(Knows(¬f ∨ ⋁_{fj∈Dj} fj), t) ∧ Happens(e, t) ∧ KmAffect(e, f, t) ⇒ Initiates(e, KP(⋁_{fi′∈Di′(t)} fi′ ∨ ⋁_{fj′∈Dj′(t)} fj′), t)

Figure 1: Relation among axiomatic sets.
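As an informal summary of the HCD lifecycle, the sketch below (a simplified, assumption-based illustration in the spirit of (KT6.1.1), (KT6.1.2) and (KT6.2.1); the clause representation and helper names are invented for exposition) creates the biimplication of Example 1 and expires it when an event may affect one of its fluents.

```python
# Simplified illustration of HCD creation and termination; the clause
# representation (frozensets of signed literals) is an expository assumption.

from typing import FrozenSet, Set, Tuple

Literal = Tuple[str, bool]      # (fluent, sign): ("f2", False) stands for ¬f2
HCD = FrozenSet[Literal]        # a disjunction of literals, kept as an inertial fact

def create_hcds(unknown_preconds: Set[str], effect: str,
                effect_known_false: bool) -> Set[HCD]:
    """HCDs created by a positive effect axiom with unknown preconditions."""
    hcds = {frozenset({(effect, True)} | {(p, False) for p in unknown_preconds})}  # (KT6.1.1)
    if effect_known_false:                                                          # (KT6.1.2)
        hcds |= {frozenset({(effect, False), (p, True)}) for p in unknown_preconds}
    return hcds

def terminate_hcds(hcds: Set[HCD], affected_fluent: str) -> Set[HCD]:
    """(KT6.2.1): drop every HCD that mentions a fluent an event may affect."""
    return {d for d in hcds if all(f != affected_fluent for f, _ in d)}

if __name__ == "__main__":
    # Example 1: f2 unknown, f known false, gentle push e.
    hcds = create_hcds({"f2"}, "f", effect_known_false=True)
    print(hcds)                          # {¬f2 ∨ f} and {¬f ∨ f2}, i.e. Knows(f2 <-> f)
    print(terminate_hcds(hcds, "f2"))    # empty: a later action on f2 expires both
```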
5 Correctness and Complexity DECKT has been shown to derive sound and complete inferences with respect to possible worlds-based theories [Patkos and Plexousakis, 2009], based on a correspondence established to an epistemic extension of Mueller’s branching Event Calculus (BDEC) [Mueller, 2007]. Exploiting the existing machinery we arrived to the same result about our extension with HCDs, proving that the (KT6) axioms constitute a complete and sufficient set: Corollary 1 After any ground sequence of actions with deterministic effects but with potentially unknown preconditions, a fluent formula φ is known whether it holds in DECKT if and only if it is known whether it holds in BDECKT, under the bridging set of axioms L and M. Proof sketch3 : The branching discrete Event Calculus (BDEC) devised by Mueller is a modified version of the linear discrete Event Calculus (LDEC) (see Figure 1). It replaces the timepoint sort with the sort of situations, lifting the requirement that every situation must have a unique successor state. The Branching Discrete Event Calculus Knowledge Theory (BDECKT) that we have developed follows on from Moore’s [1985] formalization of possible world semantics in action theories, where the number of K-accessible worlds remains unchanged upon ordinary event occurrences and reduces as appropriate when sense actions occur. Similar to Scherl and Levesque’s [2003] approach for the Situation Calculus , BDECKT generalizes BDEC in that there is no single initial situation in the tree of alternative situations, rather a forest of trees each with its own initial situation. The DECKT axiomatization is based on the linear Event Calculus that treats knowledge as a fluent and uses a set of axioms to determine the way this epistemic fluent changes its truth value as a result of event occurrences and the knowledge already obtained about relevant context. BDECKT on the other hand is based on a branching time version of the Event Calculus where knowledge is understood as reasoning about the accessibility relation over possible situations (Figure 1). Mueller has established a set L of mapping rules between the underlying linear and branching versions of the Event Calcu- lus and proved that these two versions can be logically equivalent [Mueller, 2007]. The L axioms restrict -among othersBDEC to a linear past. Based on this corollary established by Mueller, we have shown that the two knowledge theories manipulate knowledge change the same way, i.e., the set of known formulae is the same after a sequence of actions (in contrast to the underlying theories, our equivalence result is not an one-to-one mapping of all the axioms). We define a set M that serves as a bridge between DECKT and BDECKT and construct each individual effect axiom of one theory from the axioms of the other and the bridging rules (and vice versa). This way, the conjunction of DECKT, BDEC, LDEC, L and M can provide all BDECKT epistemic derivations leading to completeness with respect to the possible worlds semantics and respectively, the conjunction of BDECKT, BDEC, LDEC, L and M can provide all DECKT epistemic derivations resulting in soundness of DECKT inferences.  In what follows, we additionally show that reasoning with HCDs is computationally more efficient. It is an important conclusion as it paves the way for practical applications of knowledge representation without substantial sacrifice in expressiveness. 
The objective is to study the complexity of checking whether a fluent formula holds after the occurrence of an event sequence in total chronological order, given a domain theory comprising a fixed set of context-dependent effect axioms and a set of implication rules (either in the form of state constraints or in the form of HCDs). 5.1 Classic Event Calculus Without Knowledge For n domain fluents there are potentially 2n distinct knowledge bases (KBs) (when all n fluents are released) that need to be progressed according to occurring events and at most n HoldsAt() and n ReleasedAt() predicates to search through for each KB. All predicates are stored as facts: Algorithm: For each event ei occurring at ti and each KB Vj 1. Retrieve all effect axioms of ei : [HoldsAt(fj , t)] ⇒ θ(ei , f ′ , t)?, for θ = Initiates, T erminates, Releases This information is already known at design time. Therefore, it requires constant time, regardless of the type of action, the number of effect axioms or the size of the domain (number of fluents). 2. Query the KB for the truth value of the precondition fluVj ents of ei : [HoldsAt(fj , ti )]? The intention is to determine which of the previously retrieved axioms will be triggered, i.e., which effect fluents will change their truth value. The problem of query answering on (untyped) ground facts (without rules) reduces to the problem of unifying the query with the facts, which is O(n), where n is the size of the KB. 3. Determine which fluents are inertial: ¬Released(f, t)? Inertial fluents that have not been affected by ei in step 2, i.e., are neither released nor the event releases them, need to maintain their truth value in the successor timepoint. As before, the cost of the query is O(n). 4. Assert in the KB the new truth values of fluents. As the new truth values refer to the successor timepoint, this step does not involve any update of existing facts, 3 The full proof is available at: http://www.csd.uoc.gr/∼patkos/Proof.pdf 59 rather an assertion of facts to an empty KB. We assume constant time, regardless of the number of assertions. 5. Use state constraints to infer all indirect effects. The truth value of those fluents that are released from inertia, yet ruled by state constraints, is determined. In order to perform all derivations from the set of rules, one may apply standard techniques from classical logic inference, such as resolution. To preserve generality of results, by a minor abuse of notation we denote this complexity as O(IN F SC ) or O(IN F HCD ) in the sequel, based on whether the rules involve only the state constraints or both state constraints and HCDs. We revert to this complexity at the end of this section. Also, in this step multiple models may be produced and added to the set of KBs, owed to the unconstrained released fluents, i.e., non-inertial fluents subject to no state constraint. Summarizing, the complexity of reasoning with the Event Calculus given a sequence of e actions is characterized by O(e∗2n ∗(2∗n+IN F SC ))4 . The steps of the algorithm follow on from previous complexity analysis of simpler formulations of the classic Event Calculus, as in [Paschke, 2006]. 5.2 Possible Worlds Approach The number of possible worlds depends on the number of unknown fluents, i.e., in a domain of n fluents, u of which being unknown, we need to store at most 2u possible worlds, where (u ≤ n). 
One reasoning task needs to be performed for each of these worlds, since the same effect axioms of a given domain theory may give rise to diverse conclusions in each world. As such, the size of the KB of fluents that is maintained at each timepoint is O(2u−1 ) (a fluent may hold only in half of the total 2u worlds). Moreover, it is easy to verify that, according to the definition of knowledge, answering if a conjunctive (resp. disjunctive) query of m fluents is known requires at most 2u−m (resp. 2u − 2u−m ) worlds to check truth in (plus one, if the formula turns out to be unknown). The algorithm and its logical inferences need to be performed for each possible world, given as input the domain fluents and the fixed set of effect axioms and state constraints. Given a sequence of e actions, the complexity for conjunctive queries is O(e ∗ 2u ∗ (2 ∗ n + IN F SC ) + 2u−m ∗ n) (resp. O(e∗2u ∗(2∗n+IN F SC )+(2u −2u−m )∗n) for disjunctive queries), as we first need to progress all possible worlds and then issue a query concerning the desirable fluent formula to a subset of them, with cost O(n). It should be noted that each fluent that becomes released from the law of inertia causes the number of possible worlds to double, i.e., u increases by one. As a result, both the size of the KB and the reasoning effort increase significantly. Unsettlingly, one should also expect that u ≃ n even for the real-world case, as we argue below. 5.3 DECKT Approach DECKT performs a single reasoning task with each action using the new axiomatization’s meta axioms and substitutes 4 Although constants could be eliminated, we include them for emphasis, so that the reader can follow each step of the algorithm. 60 each atomic domain fluent with the corresponding KP and Knows epistemic fluents. KP is always subject to inertia, whereas Knows is always released, therefore step 3 can be disregarded altogether, along with the need to preserve multiple versions of KBs for unconstrained fluents (the Knows fluents never fluctuate). Furthermore, disjunctive epistemic expressions are preserved in this single KB without the need to be broken apart, since all appropriate derivations are reached by means of HCDs. The size of the input for step 2 is equal to that of reasoning without knowledge, as we only search through those Knows fluents that refer to atomic domain ones. The difference now is that each domain effect axiom is replaced by 5 new: 2 due to (KT3), 1 due to (KT5) and 2 due to (KT6.1). Nevertheless, as with the non-epistemic theory, all we need to query in order to progress the KB after any action are the precondition fluents (plus the effect fluent for some of the axioms). Therefore, as before, the complexity of this step is O(n), since one predefined query to a domain of n fluents suffices to provide the necessary knowledge about all DECKT effect axioms mentioned above. Apart from the epistemic effect axioms, we also introduced axioms for handling HCDs (KT6.2-4). Since HCDs are treated as ordinary inertial fluents (they are modeled in terms of the KP fluent), they fall under the influence of traditional Event Calculus inference (steps 1,2). For these axioms the necessary knowledge that needs to be obtained is whether some HCD that incorporates the effect fluent is among the HCDs stored in the KB. Their number increases as the agent performs actions with unknown preconditions. Let d denote the number of KP fluents that represent HCDs, then the complexity of querying the KB is O(d), where d ≤ 2n . 
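As a rough back-of-the-envelope illustration of the gap between the two bookkeeping schemes (the figures below are illustrative assumptions, not measurements reported by the authors):

```python
# Back-of-the-envelope comparison for a domain with n fluents, u of them unknown.
# All numbers are illustrative assumptions, not experimental results.

n, u, d = 40, 20, 30      # fluents, unknown fluents, HCD fluents kept by DECKT

possible_worlds = 2 ** u   # KBs the possible-worlds approach must progress per action
deckt_kb_facts  = n + d    # a single KB: epistemic fluents plus HCD fluents

print(possible_worlds)     # 1048576 worlds, each progressed at every action
print(deckt_kb_facts)      # 70 facts in one KB, queried once per action
```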
Following the algorithm, we can see that the complexity of reasoning with DECKT is O(e ∗ (n + d + IN F HCD ) + n), where O(IN F HCD ) is the complexity of logical inference with state constraints and HCDs. The input is the atomic inertial fluents, as usual, reified in the Knows fluent. The last addition refers to querying n atomic epistemic fluents, i.e., the query formula, after the narrative of actions. In fact, even when the underlying domain axiomatization is non-deterministic, its epistemic meta-program introduces no additional complexity: the KP fluent is always subject to inertia and whenever a domain fluent is released due to uncertainty, its corresponding KP fluents become false according to (KT5). As such, reasoning with DECKT requires only a reduced version of the Event Calculus, where the Releases predicate is removed from the foundational axioms. 5.4 Discussion on Complexity Results We see that the dominant factor in the complexity of reasoning with possible worlds is the number u of unknown world aspects. In the worst case, u = n resulting in exponential complexity to the size of the domain; yet, even in real-world problems u ≃ n, as we expect that in large-scale dynamic domains many more world aspects would be unknown to the agent than known ones at any given time instant. Furthermore, since in practice the query formula that needs to be evaluated is often orders of magnitude smaller in size than the domain itself, i.e., (n ≫ m), query answering of either conjunctive or disjunctive formulae spirals quickly out of control. With DECKT, on the other hand, it is the number of extra fluents capturing HCDs that predominates the complexity. In fact, although in the worst case it can be that d = 2n this is a less likely contingency to meet in practice: it would mean that the agent has interacted with all world aspects having no knowledge about any precondition or that state constraints that capture interrelated fluents embody the entire domain (so called dominos domains which lead to chaotic environments are not commonly met in commonsense reasoning). Moreover, HCDs fall under the agent’s control; even for longlived agents that execute hundreds of actions, HCDs provide a guide as to which aspects to sense in order to obtain knowledge about the largest set of interrelated fluents, thus enabling the agent to manage their number according to resources. Apparently, the number and length of HCDs also affect the inference task. Still, the transition from O(IN F SC ) to O(IN F HCD ) has polynomial cost; the complexity of most inference procedures, such as resolution, is linearly affected when increasing the number of implication rules, given that the size of the domain is constant. Finally, one should notice that even in the worst case one reasoning task needs to be performed for each action. Specifically, the factor d does not influence the entire process, as is the case of 2u for possible worlds, significantly reducing the overall complexity. 6 Implementation of HCD-enabled Reasoner The formal treatment of knowledge and change we develop aims at programming rational agents for practical implementations. Faced with the task of implementing an agent’s mental state, two features are most desirable by a reasoner in order to exploit DECKT’s full potential: • It should enable reasoning to progress incrementally to allow for run-time execution of knowledge-based programs, where an agent can benefit from planning with the knowledge at hand (online reasoning). 
Each time a program interpreter adds a new action to its agenda, the reasoner should update its current KB appropriately. • It should permit reification of the epistemic fluents in Event Calculus predicates, to allow for instance the epistemic proposition Knows(Open(S1)) to be handled as a term of a first-order logic rather than an atom. Based on this syntactical treatment proposition HoldsAt(Knows(Open(S1)), 0) can be regarded as a well-formed formula. Most existing Event Calculus reasoners do not satisfy the latter requirement, while only recently an online reasoner was released based on the Cached Event Calculus [Chesani et al., 2009]. Consequently, in order to implement and evaluate different use cases we have constructed an Event Calculus reasoner on top of Jess5 , a rule-based engine that deploys the efficient Rete algorithm for rule matching. Predicates are asserted as facts in the reasoner’s agenda, specified by the following template definition: (deftemplate EC (slot predicate) (slot event (default nil)) (slot epistemic (default no)) 5 (multislot posLtrs ) (multislot negLtrs ) (slot time (default 0))) Multislots create lists denoting fluent disjunctions (conjunctions are decomposable into their components according to the definition for knowledge). For instance, knowledge about formula (f1 ∨ f2 ∨ ¬f3 ) at time 1 is captured by the fact: (EC (predicate HoldsAt) (epistemic Knows) (posLtrs f_1 f_2) (negLtrs f_3) (time 1)) The exploitation of lists for maintaining positive and negative literals of formulae enables the representation of HCDs in a syntax-independent manner, so that all meta-axioms of DECKT be translated into appropriately defined rules. This way, the reasoning process can be fully automated, despite the fact that the (KT6) set is time-dependent: the meta-axioms adapt to the facts that exist in the reasoner’s agenda at each timepoint. Among the basic features of the new reasoner6 are: • given a domain axiomatization, the user can select between the execution of classical Event Calculus reasoning or epistemic reasoning using DECKT. • the domain axiomatization is written based on a simple, intuitive Event Calculus-like syntax, which is then parsed into appropriate Jess rules (Figure 2). The user may modify the Jess program as well, thus augmenting the axiomatization with advanced and more expressive components, such as rules and constraints not yet supported by the Event Calculus parser. • new events and observations can be asserted on-the-fly, based on information acquired at execution time, e.g., from the user or the agent’s sensors. • reasoning can progress incrementally, while the user can decide the time span of the execution step. • a GUI is provided for modifying and storing Event Calculus or Jess programs, for visualizing the output and for providing input to the reasoner at execution time. We should note, though, that the implementation of DECKT described here is general enough to be written in any prolog-like syntax and is not restricted to the Jess tool. 7 Conclusions The DECKT framework has been used to extended benchmark commonsense problems with incomplete knowledge, e.g., those included in [Mueller, 2006]. It is also integrated in an Ambient Intelligence project that is currently in progress in our institute, which introduces highly demanding challenges within dynamic environments. 
The benefits of HCDs are investigated in a number of other interesting aspects in cognitive robotics as well, such as for representing the potential effects of physical actions in unknown worlds, on whose occurrences the agent can only speculate, as well as for temporal indeterminacy of events. Among our future goals is also to extend the applicability of the new reasoner, constituting it a usable educational tool for epistemic action theories. 6 Jess, http://www.jessrules.com/ (last accessed: May 2011) 61 Jess-EC Reasoner: http://www.csd.uoc.gr/∼patkos/deckt.htm Figure 2: The Jess-EC Reasoner with epistemic capabilities: the Event Calculus domain is translated into Jess rules, whose input and execution the user can modify at execution time. References [Chesani et al., 2009] Federico Chesani, Paola Mello, Marco Montali, and Paolo Torroni. Commitment tracking via the reactive event calculus. In Proceedings of the 21st international jont conference on Artifical intelligence, pages 91–96, San Francisco, CA, USA, 2009. Morgan Kaufmann Publishers Inc. [Demolombe and Parra, 2000] Robert Demolombe and Maria del Pilar Pozos Parra. A simple and tractable extension of situation calculus to epistemic logic. pages 515–524, 2000. [Forth and Shanahan, 2004] Jeremy Forth and Murray Shanahan. Indirect and Conditional Sensing in the Event Calculus. In ECAI, pages 900–904, 2004. [Kowalski and Sergot, 1986] R Kowalski and M Sergot. A logicbased calculus of events. New Generation Computing, 4:67–95, January 1986. [Liu and Levesque, 2005] Yongmei Liu and Hector J. Levesque. Tractable reasoning with incomplete first-order knowledge in dynamic systems with context-dependent actions. In Proceedings of the 19th international joint conference on Artificial intelligence, pages 522–527, San Francisco, CA, USA, 2005. [Lob, 2001] Knowledge and the Action Description Language A. Theory and Practice of Logic Programming, 1:129–184, 2001. [Moore, 1985] R. C. Moore. A Formal Theory of Knowledge and Action. In Formal Theories of the Commonsense World, pages 319–358. J. Hobbs, R. Moore (Eds.), 1985. [Mueller, 2006] Erik Mueller. Commonsense Reasoning. Morgan Kaufmann, 1st edition, 2006. [Mueller, 2007] Erik Mueller. Discrete Event Calculus with Branching Time. In Eigth International Symposium on Logical Formalizations of Commonsense Reasoning (Commonsense’07), pages 126–131, 2007. [Paschke, 2006] Adrian Paschke. ECA-RuleML: An Approach Combining ECA Rules with Temporal Interval-based KR 62 Event/Action Logics and Transactional Update Logics. Computer Research Repository, abs/cs/0610167, 2006. [Patkos and Plexousakis, 2009] Theodore Patkos and Dimitris Plexousakis. Reasoning with knowledge, action and time in dynamic and uncertain domains. In Proceedings of the 21st international jont conference on Artifical intelligence, pages 885–890, USA, 2009. Morgan Kaufmann Publishers Inc. [Petrick and Levesque, 2002] R. Petrick and H. Levesque. Knowledge Equivalence in Combined Action Theories. In Proceedings of the 8th International Conference on Principles of Knowledge Representation and Reasoning (KR-02), pages 303–314, 2002. [Scherl and Levesque, 2003] Richard B. Scherl and Hector J. Levesque. Knowledge, Action, and the Frame Problem. Artificial Intelligence, 144(1-2):1–39, 2003. [Thielscher, 2000] Michael Thielscher. Representing the knowledge of a robot. In A. Cohn, F. Giunchiglia, and B. Selman, editors, Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 109–120. 
Morgan Kaufmann, 2000. [Thielscher, 2005a] M. Thielscher. Handling Implication and Universal Quantification Constraints in FLUX. In Proceedings of the 11th International Conference on Principles and Practice of Constraint Programming (CP11), pages 667–681, 2005. [Thielscher, 2005b] Michael Thielscher. FLUX: A Logic Programming Method for Reasoning Agents. Theory and Practice of Logic Programming, 5(4–5):533–565, 2005. [Vassos et al., 2009] Stavros Vassos, Stavros Sardina, and Hector Levesque. Progressing Basic Action Theories with non-Local Effect Actions. In Proceedings of the Ninth International Symposium on Logical Formalizations of Commonsense Reasoning (CS’09), pages 135–140, 2009. Preferred Explanations: Theory and Generation via Planning∗ Shirin Sohrabi Jorge A. Baier Sheila A. McIlraith Department of Computer Science University of Toronto Toronto, Canada [email protected] Depto. de Ciencia de la Computación Pontificia Universidad Católica de Chile Santiago, Chile [email protected] Department of Computer Science University of Toronto Toronto, Canada [email protected] Abstract observations might be of the actions of an agent, and the explanation a plan that captures what the agent is doing and/or the final goal of that plan. Here we conceive the computational core underlying explanation generation of dynamical systems as a nonclassical planning task. Our focus in this paper is with the generation of preferred explanations – how to specify preference criteria, and how to compute preferred explanations using planning technology. Most explanation generation tasks that distinguish a subset of preferred explanations appeal to some form of domain-independent criteria such as minimality or simplicity. Domain-specific knowledge has been extensively studied within the static-system explanation and abduction literature as well as in the literature on specific applications such as diagnosis. Such domain-specific criteria often employ probabilistic information, or in its absence default logic of some notion of specificity (e.g., Brewka 1994). In 2010, we examined the problem of diagnosis of discrete dynamical systems (a task within the family of explanation generation tasks), exploiting planning technology to compute diagnoses and suggesting the potential of planning preference languages as a means of specifying preferred diagnoses (Sohrabi, Baier, and McIlraith 2010). Building on our previous work, in this paper we explicitly examine the use of preference languages for the broader task of explanation generation. In doing so, we identify a number of somewhat unique representational needs. Key among these is the need to talk about the past (e.g., “If I observe that my car has a flat tire then I prefer explanations where my tire was previously punctured.”) and the need to encode complex observation patterns (e.g., “My brakes have been failing intermittently.”) and how these patterns relate to possible explanations. To address these requirements we specify preferences in Past Linear Temporal Logic (P LTL), a superset of Linear Temporal Logic (LTL) that is augmented with modalities that reference the past. We define a finite variant of P LTL, f-P LTL, that is augmented to include action occurrences. Motivated by a desire to generate explanations using state-of-the-art planning technology, we propose a means of compiling our f-P LTL preferences into the syntax of PDDL3, the Planning Domain Description Language 3 that supports the representation of temporally extended preferences (Gerevini et al. 
2009). Although, f-P LTL is more In this paper we examine the general problem of generating preferred explanations for observed behavior with respect to a model of the behavior of a dynamical system. This problem arises in a diversity of applications including diagnosis of dynamical systems and activity recognition. We provide a logical characterization of the notion of an explanation. To generate explanations we identify and exploit a correspondence between explanation generation and planning. The determination of good explanations requires additional domainspecific knowledge which we represent as preferences over explanations. The nature of explanations requires us to formulate preferences in a somewhat retrodictive fashion by utilizing Past Linear Temporal Logic. We propose methods for exploiting these somewhat unique preferences effectively within state-of-the-art planners and illustrate the feasibility of generating (preferred) explanations via planning. 1. Introduction In recent years, planning technology has been explored as a computational framework for a diversity of applications. One such class of applications is the class that corresponds to explanation generation tasks. These include narrative understanding, plan recognition (Ramı́rez and Geffner 2009), finding excuses (Göbelbecker et al. 2010), and diagnosis (e.g., Sohrabi, Baier, and McIlraith 2010; Grastien et al. 2007).1 While these tasks differ, they share a common computational core, calling upon a dynamical system model to account for system behavior, observed over a period of time. The observations may be over aspects of the state of the world, or over the occurrence of events; the account typically takes the form of a set or sequence of actions and/or state that is extracted from the construction of a plan that embodies the observations. For example, in the case of diagnosis, the observations might be of the, possibly aberrant, behavior of an electromechanical device over a period of time, and the explanation a sequence of actions that conjecture faulty events. In the case of plan recognition, the ∗ A version of this paper appears in the Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11) Copyright c 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. 1 (Grastien et al. 2007) characterized diagnosis in terms of SAT but employed a planning-inspired encoding. 63 A sequence of actions α is executable in s if δ(α, s) is defined. Furthermore α is executable in Σ iff it is executable in s, for any s consistent with I. expressive than the preference subset of PDDL3 (e.g., fP LTL has action occurrences and arbitrary nesting of temporal modalities), our compilation preserves the f-P LTL semantics while conforming to PDDL3 syntax. This enables us to exploit PDDL3-compliant preference-based planners for the purposes of generating preferred explanations. We also propose a further compilation to remove all temporal modalities from the syntax of our preferences (while preserving their semantics) enabling the exploitation of costbased planners for computing preferred explanations. Additionally, we exploit the fact that observations are known a priori to pre-process our suite of explanation preferences prior to explanation generation in a way that further simplifies the preferences and their exploitation. We show that this compilation significantly improves the time required to find preferred explanations, sometimes by orders of magnitude. 
Experiments illustrate the feasibility of generating (preferred) explanations via planning. 2.2 Past LTL with Action Occurrences Past modalities have been exploited for a variety of specialized verification tasks and it is well established that LTL augmented with such modalities has the same expressive power as LTL restricted to future modalities (Gabbay 1987). Nevertheless, certain properties (including fault models and explanation models) are more naturally specified and read in this augmentation of LTL. For example specifying that every alarm is due to a fault can easily be expressed by ✷(alarm → f ault), where ✷ means always and  means once in the past. Note that ¬U(¬f ault, (alarm ∧ ¬f ault)) is an equivalent formulation that uses only future modalities but is much less intuitive. In what follows we define the syntax and semantics of, f-P LTL, a variant of LTL that is augmented with past modalities and action occurrences. 2. Explaining Observed Behavior Syntax Given a set of fluent symbols F and a set of action symbols A, the atomic formulae of the language are: either a fluent symbol, or occ(a), for any a ∈ A. Non-atomic formulae are constructed by applying negation, by applying a standard boolean connective to two formulae, or by including the future temporal modalities “until” (U), “next” ( ), “always”(✷), and “eventually”(♦), or the past temporal modalities “since” (S), “yesterday” ( ), “always in the past”(), and “eventually in the past”(). We say φ is expressed in future f-P LTL if it does not contain any past temporal modalities. Similarly, φ is expressed in past f-P LTL if it does not contain any future temporal modalities. A nontemporal formula does not contain any temporal modalities. In this section we provide a logical characterization of a preferred explanation for observed behavior with respect to a model of a dynamical system. In what follows we define each of the components of this characterization, culminating in our characterization of a preferred explanation. 2.1 Dynamical Systems • Dynamical systems can be formally described in many ways. In this paper we assume a finite domain and model dynamical systems as transition systems. For convenience, we define transitions systems using a planning language. As such transitions occur as the result of actions described in terms of preconditions and effects. Formally, a dynamical system is a tuple Σ = (F, A, I), where F is a finite set of fluent symbols, A is a set of actions, and I is a set of clauses over F that defines a set of possible initial states. Every action a ∈ A is defined by a precondition prec(a), which is a conjunction of fluent literals, and a set of conditional effects of the form C → L, where C is a conjunction of fluent literals and L is a fluent literal. A system state s is a set of fluent symbols, which intuitively defines all that is true in a particular state of the dynamical system. For a system state s, we define Ms : F → {true, f alse} as the truth assignment that assigns the truth value true to f if f ∈ s, and assigns f alse to f otherwise. We say a state s is consistent with a set of clauses C, if Ms |= c, for every c ∈ C. Given a state s consistent with I, we denote Σ/s as the dynamical system (F, A, I/s), where I/s stands for the set of unit clauses whose only model is Ms . We say a dynamical system Σ = (F, A, I) has a complete initial state iff there is a unique truth assignment M : F → {true, f alse} such that M |= I. We assume that an action a is executable in a state s if Ms |= prec(a). 
If a is executable in a state s, we define its successor state as δ(a, s) = (s \ Del) ∪ Add, where Add contains a fluent f iff C → f is an effect of a and Ms |= C. On the other hand Del contains a fluent f iff C → ¬f is an effect of a, and Ms |= C. We define δ(a0 a1 . . . an , s) = δ(a1 . . . an , δ(a0 , s)), and δ(ǫ, s) = s. Semantics Given a system Σ, a sequence of actions α, and an f-P LTL formula ϕ, the semantics defines when α satisfies ϕ in Σ. An f-P LTL formula is interpreted over finite rather than infinite sequences of states. Its semantics resembles that of LTL on so-called truncated paths (Eisner et al. 2003). Before we define the semantics formally, we give two definitions. Let s be a state and α = a0 a1 . . . an be a (finite) sequence of actions. We say that σ is an execution trace of α in s iff σ = s0 s1 s2 . . . sn+1 and δ(ai , si ) = si+1 , for any i ∈ [0, n]. Furthermore, if l is the sequence ℓ0 ℓ1 . . . ℓn , we abbreviate its suffix ℓi ℓi+1 . . . ℓn by li . Definition 1 (Truth of an f-P LTL Formula) An f-P LTL formula ϕ is satisfied by α in a dynamical system Σ = (F, A, I) iff for any state s consistent with I, the execution trace σ of α in s is such that hσ, αi |= ϕ, where 2 • hσi , αi i |= ϕ, where ϕ ∈ F iff ϕ is an element of the first state of σi . • hσi , αi i |= occ(a) iff i < |α| and ai is the first action of αi . • hσi , αi i |= ϕ iff i < |σ| − 1 and hσi+1 , αi+1 i |= ϕ • hσi , αi i |= U(ϕ, ψ) iff there exists a j ∈ {i, ..., |σ| − 1} such that hσj , αj i |= ψ and for every k ∈ {i, ..., j − 1}, hσk , αk i |= ϕ 2 64 We omit standard definitions for ¬, ∨. • • hσi , αi i |= ϕ iff i > 0 and hσi−1 , αi−1 i |= ϕ • hσi , αi i |= S(ϕ, ψ) iff there exists a j ∈ {0, ..., i} such that hσj , αj i |= ψ and for every k ∈ {j + 1, ..., i}, hσk , αk i |= ϕ Definition 2 (Explanation) Given a dynamical system Σ = (F, A, I), and an observation formula ϕ, expressed in future f-P LTL, an explanation is a tuple (H, α), where H is a set of clauses over F such that I ∪ H is satisfiable, I 6|= H, and α is a sequence of actions in A such that α satisfies ϕ in the system ΣA = (F, A, I ∪ H). The semantics of other temporal modalities are defined in def def terms of these basic elements, e.g., ϕ = ¬¬ϕ, ϕ = def S(true, ϕ), and ♦ϕ = U(true, ϕ). Observe that our semantics adopts a strong next operator; i.e., φ will not be satisfied if evaluated in the final state of a finite sequence. It is well recognized that some properties are more naturally expressed using past modalities. An additional property of such modalities is that they can construct formulae that are exponentially more succinct than their future modality counterparts. Indeed let Σn be a system with Fn = {p0V , . . . , pn }, let ψi = pi ↔ (¬ true ∧ pi ), and let n Ψ = ✷ i=1 ψi → ψ0 . Intuitively, ψi expresses that “pi has the same truth value now as it did in the initial state”. Theorem 1 (Following Markey 2003) Any formula ψ, expressed in future f-P LTL, equivalent to Ψ (defined as above) has size Ω(2|Ψ| ). Note that although Markey’s theorem is related to temporal logic evaluated on infinite paths, the property also holds when it is interpreted on truncated paths. In the following sections we provide a translation of formulae with past modalities into future-only formulae, in order to use existing planning technology. Despite Markey’s theorem, it is possible to show that the blowup for Ψ can be avoided if one modifies the transition system to include additional predicates that keep track of the initial truth value of each of p0 , . . 
. , pn . Such a modification can be done in linear time. Example Assume a standard logistics domain with one truck, one package, and in which all that is known initially is that the truck is at loc1 . We observe pkg is unloaded from truck1 in loc1 , and later it is observed that pkg is in loc2 . One can express the observation as ♦[occ(unload(pkg, loc1 )) ∧ ♦at(pkg, loc2 )] A possible explanation (H, α), is such that H = {in(pkg, truck1 )}, and α is unload(pkg, loc1 ), load(pkg, loc1 ), drive(loc1 , loc2 ), unload(pkg, loc2 ). Note that aspects of H and α can be further filtered to identify elements of interest to a particular user following techniques such as those in (McGuinness et al. 2007). Given a system and an observation, there are many possible explanations, not all of high quality. At a theoretical level, one can assume a reflexive and transitive preference relation  between explanations. If E1 and E2 are explanations and E1  E2 we say that E1 is at least as preferred as E2 . E1 ≺ E2 is an abbreviation for E1  E2 and E2 6 E1 . • Definition 3 (Optimal Explanation) Given a system Σ, E is an optimal explanation for observation ϕ iff E is an explanation for ϕ and there does not exist another explanation E ′ for ϕ such that E ′ ≺ E. 3. Complexity and Relationship to Planning It is possible to establish a relationship between explanation generation and planning. Before doing so, we give a formal definition of planning. A planning problem with temporally extended goals is a tuple P = (Σ, G), where Σ is a dynamical system, and G is a goal formula expressed in future f-P LTL. The sequence of actions α is a plan for P if α is executable in Σ and α satisfies G in Σ. A planning problem (Σ, G) is classical if Σ has a complete initial state, and conformant otherwise. The following is straightforward from the definition. 2.3 Characterizing Explanations Given a description of the behavior of a dynamical system and a set of observations about the state of the system and/or action occurrences, we define an explanation to be a pairing of actions, orderings, and possibly state that account for the observations in the context of the system dynamics. The definitions in this section follow (but differ slightly from) the definitions of dynamical diagnosis we proposed in (Sohrabi, Baier, and McIlraith 2010), which in turn elaborate and extend previous work (e.g., McIlraith 1998; Iwan 2001). Assuming our system behavior is defined as a dynamical system and that the observations are expressed in future f-P LTL, we define an explanation as a tuple (H, α) where H is a set of clauses representing an assumption about the initial state and α is an executable sequence of actions that makes the observations satisfiable. If the initial state is complete, then H is empty, by definition. In cases where we have incomplete information about the initial state, H denotes assumptions that we make, either because we need to establish the preconditions of actions we want to conjecture in our explanation or because we want to avoid conjecturing further actions to establish necessary conditions. Whether it is better to conjecture more actions or to make an assumption is dictated by domain-specific knowledge, which we will encode in preferences. Proposition 1 Given a dynamical system Σ = (F, A, I) and an observation formula ϕ, expressed in future f-P LTL, then (H, α) is an explanation iff α is a plan for conformant planning problem P = ((F, A, I ∪ H), ϕ) where I ∪ H is satisfiable and where ϕ is a temporally extended goal. 
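For instance, instantiating Proposition 1 on the logistics example above: taking H = {in(pkg, truck1 )}, the pair (H, α) is an explanation precisely when the action sequence α = unload(pkg, loc1 ), load(pkg, loc1 ), drive(loc1 , loc2 ), unload(pkg, loc2 ) is a plan for the conformant planning problem P = ((F, A, I ∪ {in(pkg, truck1 )}), ϕ), whose temporally extended goal is the observation ϕ = ♦[occ(unload(pkg, loc1 )) ∧ ♦at(pkg, loc2 )].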
In systems with complete initial states, the generation of a single explanation corresponds to classical planning with temporally extended goals. Proposition 2 Given a dynamical system Σ such that Σ has complete initial state, and an observation formula ϕ, expressed in future f-P LTL, then (∅, α) is an explanation iff α is a plan for classical planning problem P = (Σ, ϕ) with temporally extended goal ϕ. Indeed, the complexity of explanation existence is the same as that of classical planning. 65 Theorem 2 Given a system Σ and a temporally extended formula ϕ, expressed in future f-P LTL, explanation existence is PSPACE-complete. include reasonable facts that we wish to posit about the initial state (e.g., that it’s below freezing outside – a common precursor to a car battery being dead). In response to the somewhat unique representational requirements, we express preferences in f-P LTL. In order to generate explanations using state-of-the-art planners, an objective of our work was to make the preference input language PDDL3 compatible. However, f-P LTL is more expressive than the subset of LTL employed in PDDL3, and we did not wish to lose this expressive power. In the next section we show how to compile away some or all temporal modalities by exploiting the correspondence between past and future modalities and by exploiting the correspondence between LTL and Büchi automata. In so doing we preserve the expressiveness of f-P LTL within the syntax of PDDL3. Proof sketch. For membership, we propose the following NPSPACE algorithm: guess an explanation H such that I ∪H has a unique model, then call a PSPACE algorithm (like the one suggested by de Giacomo and Vardi (1999)) to decide (classical) plan existence. Then we use the fact that NPSPACE=PSPACE. Hardness is given by Proposition 2 and the fact that classical planning is PSPACE-hard (Bylander 1994). § The proof of Theorem 2 appeals to a non-deterministic algorithm that provides no practical insight into how to translate plan generation into explanation generation. At a more practical level, there exists a deterministic algorithm that maps explanation generation to classical plan generation. 4.1 Preferred Explanations A high quality explanation is determined by the optimization of an objective function. The PDDL3 metric function we employ for this purpose is a weighted linear sum of formulae to be minimized. I.e., (minimize (+ (∗ w1 φ1 ) . . . (∗ wk φk ))) where each φi is a formula that evaluates to 0 or 1 depending on whether an associated preference formula, a property of the explanation trajectory, is satisfied or violated; wi is a weight characterizing the importance of that property (Gerevini et al. 2009). The key role of our preferences is to convey domain-specific knowledge regarding the most preferred explanations for particular observations. Such preferences take on the following canonical form. Definition 4 (Explanation Preferences) ✷(φobs → φexpl ) is an explanation preference formula where φobs , the observation formula, is any formula expressed in future fP LTL, and φexpl , the explanation formula, is any formula expressed in past f-P LTL. Non-temporal expressions may appear in either formula. An observation formula, φobs , can be as simple as the observation of a single fluent or action occurrence (e.g., my car won’t start.), but it can also be a complex formula. 
In many explanation scenarios, observations describe a telltale ordering of system properties or events that suggest a unique explanation such as a car that won’t start every time it rains. To simplify the description of observation formulae, we employ precedes as a syntactic constructor of observation patterns. ϕ1 precedes ϕ2 indicates that ϕ1 is observed before ϕ2 . More generally, one can express ordering among observations by using formula of the form (ϕ1 precedes ϕ2 ... precedes ϕn ) with the following interpretation: Theorem 3 Given an observation formula ϕ, expressed in future f-P LTL, and a system Σ, there is an exponential-time procedure to construct a classical planning problem P = (Σ′ , ϕ) with temporally extended goal ϕ, such that if α is a plan for P , then an explanation (H, α′ ) can be generated in linear time from α. Proof sketch. Σ′ , the dynamical system that describes P is the same as Σ = (F, A, I), augmented with additional actions that “complete” the initial state. Essentially, each such action generates a successor state s that is consistent with I. There is an exponential number of them. If a0 a1 . . . an is a plan for P , we construct the explanation (H, α′ ) as follows. H is constructed with the facts true in the state s that a0 generates. α′ is set to a1 . . . an . § All the previous results can be re-stated in a rather straightforward way if the desired problem is to find an optimal explanation. In that case the reductions are made to preference-based planning (Baier and McIlraith 2008). The proofs of the theorems above unfortunately do not provide a practical solution to the problem of (high-quality) explanation generation. In particular, we have assumed that planning problems contain temporally extended goals expresed in future f-P LTL. No state-of-the-art planner that we are aware of supports these goals directly. We have not provided a compact and useful way to represent the  relation. 4. Specifying Preferred Explanations The specification of preferred explanations in dynamical settings presents a number of unique representational requirements. One such requirement is that preferences over explanations be contextualized with respect to observations, and these observations themselves are not necessarily single fluents, but rich temporally extended properties – sometimes with characteristic forms and patterns. Another unique representational requirement is that the generation of explanations (and preferred explanations) necessitates reflecting on the past. Given some observations over a period of time, we wish to conjecture what preceded these observations in order to account for their occurrence. Such explanations may include certain system state that explains the observations, or it may include action occurrences. Explanations may also ϕ1 ∧ ♦(ϕ2 ∧ ♦(ϕ3 ...(ϕn−1 ∧ ♦ϕn )...)) (1) Equations (2) and (3) illustrate the use of precedes to encode a total (respectively, partial) ordering among observations. These are two common forms of observation formulae. (obs1 precedes obs2 precedes obs3 precedes obs4 ) (obs3 precedes obs4 ) ∧ (obs1 precedes obs2 ) (2) (3) Further characteristic observation patterns can also be easily described using precedes. The following is an example of an intermittent fault. 66 if and only if α satisfies φ in P ’s dynamical system. Predicate acceptφ is the (classical) goal in problem P ′ . Below we introduce an extension of the BM compilation that allows compiling away formulae expressed in past f-P LTL. 
Our compilation takes dynamical system Σ, an observation ϕ, a set Γ of formulae corresponding to explanation preferences, and produces a PDDL3 planning problem. Step 1 Takes Σ and ϕ and generates a classical planning problem P1 with temporally extended goal ϕ using the procedure described in the proof for Theorem 3. Step 2 Compiles away occ in P1 , generating P2 . For each occurrence of occ(a) in Γ or ϕ, it generates an additional fluent happeneda which is made true by a and is deleted by all other actions. Replace occ(a) by happeneda in Γ and ϕ. Step 3 Compiles away all the past elements of preferences in Γ. It uses the BM compilation over P2 to compile away past temporal operators in preference formulae of the form ✷(φobs → φexpl ), generating P3 . For every explanation formula φexpl , expressed in past f-P LTL, in Γ we do the following. We compute the reverse of φexpl , φrexpl , as a formula just like φexpl but with all past temporal operators changed to their future counterparts (i.e., by ,  by ♦, S by U). Note that φexpl is satisfied in a trajectory of states σ iff φrexpl is satisfied in the reverse of σ. Then, we use phase 1 of the BM compilation to build a finite state automaton Aφrexpl for φrexpl . We now compute the reverse of Aφrexpl by switching accepting and initial states and reversing the direction of all transitions. Then we continue with phase 2 of the BM compilation, generating a new planning problem for the reverse of Aφrexpl . In the resulting problem the new predicate acceptφexpl becomes true as soon as the formula φexpl , expressed in past f-P LTL, is made true by the execution of an action sequence. We replace any occurrence of φexpl in Γ by acceptφexpl . We similarly use the BM compilation to remove future temporal modalities from ϕ and φobs . This is only necessary if they contain nested modalities or , which they often will. The output of this step is PDDL3 compliant. To generate PDDL3 output without any temporal operators, we perform the following further step. Step 4 (optional) Compiles away temporal operators in Γ and ϕ using the BM compilation, ending with simple preferences that refer only to the final state. (alarm precedes no alarm precedes alarm precedes no alarm) Similarly, explanation formulae, φexp , can be complex temporally extended formulae over action occurrences and fluents. However, in practice these explanations may be reasonably simple assertions of properties or events that held (resp. occurred) in the past. The following are some canonical forms of explanation formulae: (e1 ∧ ... ∧ en ), and (e1 ⊗ ... ⊗ en ), where n ≥ 1, and ei is either a fluent ∈ F or occ(a), a ∈ A and ⊗ is exclusive or. 5. Computing Preferred Explanations In previous sections we addressed issues related to the specification and formal characterization of preferred explanations. In this section we examine how to effectively generate explanations using state-of-the-art planning technology. Propositions 1 and 2 establish that we can generate explanations by treating an observation formula ϕ as the temporally extended goal of a conformant (resp. classical) planning problem. Preferred explanations can be similarly computed using preference-based planning techniques. To employ state-of-the-art planners, we must represent our observation formulae and the explanation preferences in syntactic forms that are compatible with some version of PDDL. Both types of formulae are expressed in f-P LTL so PDDL3 is a natural choice since it supports preferences and some LTL constructs. 
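Returning to Step 3 of the compilation, the automaton reversal it relies on is a simple operation: swap the initial and accepting state sets and flip every transition. The following is a minimal sketch of that operation only, not of the BM compilation itself; it assumes a set-based NFA encoding, and the type and function names are ours and purely illustrative.

```python
from typing import NamedTuple, Tuple, FrozenSet

# Illustrative NFA encoding; these names are not part of the BM compilation tooling.
class NFA(NamedTuple):
    states: FrozenSet[int]
    initial: FrozenSet[int]
    accepting: FrozenSet[int]
    transitions: FrozenSet[Tuple[int, str, int]]  # (source, label, target)

def reverse(aut: NFA) -> NFA:
    """Reverse an automaton as in Step 3: swap the initial and accepting
    state sets and flip the direction of every transition."""
    flipped = frozenset((t, lbl, s) for (s, lbl, t) in aut.transitions)
    return NFA(aut.states, initial=aut.accepting,
               accepting=aut.initial, transitions=flipped)

if __name__ == "__main__":
    # A two-state automaton whose accepted words end with the label "fault".
    a = NFA(frozenset({0, 1}), frozenset({0}), frozenset({1}),
            frozenset({(0, "other", 0), (0, "fault", 1),
                       (1, "fault", 1), (1, "other", 0)}))
    print(reverse(a))
```

Note that an automaton with several accepting states yields, after reversal, several initial states; a nondeterministic representation accommodates this directly.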
However, f-P LTL is more expressive than PDDL3, supporting arbitrarily nested past and future temporal modalities, action occurrences, and most importantly the next modality, , which is essential to the encoding of an ordered set of properties or action occurrences that occur over time. As a consequence, partial- and total-order observations are not expressible in PDDL3’s subset of LTL, and so it follows that the precedes constructor commonly used in the φobs component of explanation preferences is not expressible in the PDDL3 LTL subset. There are similarly many typical φexpl formulae that cannot be expressed directly in PDDL3 because of the necessity to nest temporal modalities. So to generate explanations using planners, we must devise other ways to encode our observation formulae and our explanation preferences. • 5.1 Approach 1: PDDL3 via Compilation Although it is not possible to express our preferences directly in PDDL3, it is possible to compile unsupported temporal formulae into other formulae that are expressible in PDDL3. To translate to PDDL3, we utilize Baier and McIlraith’s future LTL compilation approach (2006), which we henceforth refer to as the BM compilation. Given an LTL goal formula φ, expressed in future f-P LTL, and a planning problem P , the BM compilation executes the following two steps: (phase 1) generates a finite state automaton for φ, and (phase 2) encodes the automaton in the planning problem by adding new predicates to describe the changing configuration of the automaton as actions are performed. The result is a new planning problem P ′ that augments P with a newly introduced accepting predicate acceptφ that becomes true after performing a sequence of actions α in the initial state Theorem 4 Let P3 be defined as above for a description Σ, an observation ϕ, and a set of preferences Γ. If α is a plan for P3 with an associated metric function value M , then we can construct an explanation (H, α) for Σ and ϕ with associated metric value M in linear time. Although Step 4 is not required, it has practical value. Indeed, it enables potential application of other compilation approaches that work directly with PDDL3 without temporal operators. For example, it enables the use of Keyder and Geffner’s compilation (2009) to compile preferences into corresponding actions costs so that standard cost-based planners can be used to find explanations. This is of practical importance since cost-based planners are (currently) more mature than PDDL3 preference-based planners. 67 5.2 Approach 2: Pre-processing (Sometimes) To further address our first objective, we compared the performance of FF (Hoffmann and Nebel 2001), L AMA (Richter, Helmert, and Westphal 2008), SGPlan6 (Hsu and Wah 2008) and HPLAN -P (Baier, Bacchus, and McIlraith 2009) on our compiled problems but with no preferences. The results show that in the total-order cases, all planners except HPLAN -P solved all problems within seconds, while HPLAN -P took much longer, and could not solve all problems (i.e., it exceeded the 600 second time limit). The same results were obtained with the partial-order problems, except that L AMA took a bit longer but still was far faster than HPLAN -P. This suggests that our domains are reasonably challenging. To address our second objective we turned to preferencebased planner HPLAN -P. We created different versions of the same problem by increasing the number of preferences they used. In particular, for each problem we tested with 10, 20, and 30 preferences. 
To measure the change in computation time between problems with different numbers of preference, we calculated the percentage difference between the computation time for the problem with the larger and with the smaller number of preferences, all relative to the computation time of the larger numbered problem. The average percentage difference was 6.5% as we increased the number of preferences from 10 to 20, and was 3.5% as we went from 20 to 30 preferences. The results suggest that as we increase the number of preferences, the time it takes to find a solution does increase but this increase is not significant. As noted previously the Approach 1 compilation technique (including Step 4) results in the complete removal of temporal modalities and therefore enables the use of the Keyder and Geffner compilation technique (2009). This techniques supports the computation of preference-based plans (and now preferred explanations) using cost-based planners. However, the output generated by our compilation requires a planner compatible with ADL or derived predicates. Among the rather few that support any of these, we chose to experiment with L AMA since it is currently the best-known cost-based planner. Figure 1 shows the time it takes to find the optimal explanation using HPLAN -P and L AMA as well as the time comparison between our “Approach 1” and “Approach 2” encodings (Section 5). To measure the gain in computation time from the “Approach 2” technique, we computed the percentage difference between the two, relative to “Approach 1”. (We assigned a time of 600 to those marked NF.) The results show that on average we gained 22.9% improvement for HPLAN -P and 29.8 % improvement for L AMA in the time it takes to find the optimal solution. In addition, we calculated the time ratio (“Approach 1”/ “Approach 2”). The results show that on average HPLAN -P found plans 2.79 times faster and L AMA found plans 31.62 times faster when using “Approach 2”. However, note that “Approach 2” does not always improve the performance. There are a few cases where the planners take longer when using “Approach 2”. While the definite cause of this decrease in performance is currently unknown, we believe this decrease may depend on the structure of the problem and/or on the difference in the size of the translated domains. On average the translated problems used in “Ap- The compiled planning problem resulting from the application of Approach 1 can be employed with a diversity of planners to generate explanations. Unfortunately, the preferences may not be in a form that can be effectively exploited by delete relaxation based heuristic search. Consider the preference formula γ = ✷(φobs → φexpl ). Step 4 culminates in an automaton with accepting predicate acceptγ . Unfortunately, acceptγ is generally true at the outset of plan construction because φobs is false – the observations have not yet occurred in the plan – making φobs → φexpl , and thus γ, trivially true. This deactivates the heuristic search to achieve acceptγ and thus the satisfaction of this preference does not benefit from heuristic guidance. For a restricted but compelling V class of preferences, namely those of the form ✷(φobs → i ei ) with ei a non-temporal formula, we can pre-process our preference formula in advance of applying Approach 1, by exploiting the fact that we know a priori what observations have occured. Our pre-processing utilizes the following LTL identity: ^ ^ ✷(φobs → ei ) ∧ ♦φobs ≡ ¬φobs U(ei ∧ ♦φobs ). 
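Unfolding the outer conjunction, the identity states, for each non-temporal conjunct ei of the preference body,
✷(φobs →  ei ) ∧ ♦φobs ≡ U(¬φobs , ei ∧ ♦φobs ),
so, provided the observation φobs does eventually occur, the preference is satisfied precisely when ei is achieved no later than the first occurrence of φobs . The until-formula on the right gives the heuristic search a goal-like target to pursue, rather than a condition that is trivially true before the observations occur.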
i i V Given a preference in the form ✷(φobs → i ei ) we determine whether φobs is entailed by the observation ϕ (this can be done efficiently given the form of our observations). If this is the case, we use the identity above to transform our preferences, followed by application of Approach 1. The accepting predicate of the resulting automaton becomes true if φexpl is satisfied prior to φobs . In the section to follow, we see that exploiting this pre-processing can improve planner performance significantly. 6. Experimental Evaluation The objective of our experimental analysis was to gain some insight into the behavior of our proposed preference formalism, specifically, we wanted to: 1) develop a set of somewhat diverse benchmarks and illustrate the use of planners in the generation of explanations; 2) examine how planners perform when the number of preferences is increased; and 3) investigate the computational time gain resulting from Approach 2. We implemented all compilation techniques discussed in Section 5 to produce PDDL3 planning problem with simple preferences that are equivalent to the original explanation generation problems. We used four domains in our experiments: a computer domain (see Grastien et al. 2007), a car domain (see McIlraith and Scherl 2000), a power domain (see McIlraith 1998), and the trucks domain from IPC 2006. We modified these domains to account for how observations and explanations occur within the domain. In addition, we created two instances of the same problem, one with total-order observations and another with partial-order observations. Since the observations we considered were either total- or partial-order, we were able to compile them away using a technique that essentially makes an observation possible only after all preceding observations have been observed (Haslum and Grastien 2009; 2011). Finally, we increased problem difficulty by increasing the number of observations in each problem. 68 Total-Order HPLAN -P L AMA Partial-Order HPLAN -P L AMA that was amenable to heuristic search. In so doing, we were able to reduce the time required for explanation generation by orders of magnitude, sometimes. 
[Figure 1 (runtime table) appears here; see the caption below. The flattened numeric columns, giving per-problem runtimes in seconds for computer-1..8, car-1..8, power-1..8, and truck-1..8 under the "Approach 1" and "Approach 2" encodings with total- and partial-order observations, are omitted.]

Acknowledgements
We thank Alban Grastien and Patrik Haslum for providing us with an encoding of the computer problem, which we modified and used in this paper for benchmarking. We also gratefully acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC). Jorge Baier was funded by the VRI-38-2010 grant from Universidad Católica de Chile.

References
Baier, J., and McIlraith, S. 2006. Planning with first-order temporally extended goals using heuristic search. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 788–795.
Baier, J., and McIlraith, S. 2008. Planning with preferences. AI Magazine 29(4):25–36.
Baier, J.; Bacchus, F.; and McIlraith, S. 2009. A heuristic search approach to planning with temporally extended preferences. Artificial Intelligence 173(5-6):593–618.
Brewka, G. 1994. Adding priorities and specificity to default logic. In Proceedings of the Logics in Artificial Intelligence, European Workshop (JELIA), 247–260.
Bylander, T. 1994. The computational complexity of propositional STRIPS planning. Artificial Intelligence 69(1-2):165–204.
de Giacomo, G., and Vardi, M. Y. 1999. Automata-theoretic approach to planning for temporally extended goals. In Biundo, S., and Fox, M., eds., ECP, volume 1809 of LNCS, 226–238. Durham, UK: Springer.
Eisner, C.; Fisman, D.; Havlicek, J.; Lustig, Y.; McIsaac, A.; and Van Campenhout, D. 2003. Reasoning with temporal logic on truncated paths. In Proceedings of the 15th International Conference on Computer Aided Verification (CAV), volume 2725 of LNCS, 27–39. Boulder, CO: Springer.
Gabbay, D. M. 1987. The declarative past and imperative future: Executable temporal logic for interactive systems. In Temporal Logic in Specification, 409–448.
Gerevini, A.; Haslum, P.; Long, D.; Saetti, A.; and Dimopoulos, Y. 2009.
Deterministic planning in the 5th int’l planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence 173(5-6):619–668. Göbelbecker, M.; Keller, T.; Eyerich, P.; Brenner, M.; and Nebel, B. 2010. Coming up with good excuses: What to do when no plan can be found. In Proceedings of the 20th International Conference on Automated Planning and Scheduling (ICAPS), 81–88. Grastien, A.; Anbulagan; Rintanen, J.; and Kelareva, E. 2007. Diagnosis of discrete-event systems using satisfiability algorithms. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI), 305–310. Haslum, P., and Grastien, A. 2009. Personal communication. Figure 1: Runtime comparison between HPLAN -P and L AMA on problems of known optimal explanation. NF means the optimal explanation was not found within the time limit of 600 seconds. proach 2” are 1.4 times larger, hence this increase in the size of the problem may be one reason behind the decrease in performance. Nevertheless, this result shows that “Approach 2” can significantly improve the time required to find the optimal explanation, sometimes by orders of magnitude, in so doing it allows us to solve more problem instances than with “Approach 1” alone (see car-4 and car-5). 7. Summary In this paper, we examined the task of generating preferred explanations. To this end, we presented a logical characterization of the notion of a (preferred) explanation and established its correspondence to planning, including the complexity of explanation generation. We proposed a finite variant of LTL, f-P LTL, that includes past modalities and action occurrences and utilized it to express observations and preferences over explanation. To generate explanations using state-of-the-art planners, we proposed and implemented a compilation technique that preserves f-P LTL semantics while conforming to PDDL3 syntax. This enables computation of preferred explanations with PDDL3-compliant preference-based planners as well as with cost-based planners. Exploiting the property that observations are known a priori we transformed explanation preferences into a form 69 Haslum, P., and Grastien, A. 2011. Diagnosis as planning: Two case studies. In Proceedings of the International Scheduling and Planning Applications workshop (SPARK). Hoffmann, J., and Nebel, B. 2001. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research 14:253–302. Hsu, C.-W., and Wah, B. 2008. The SGPlan planning system. In 6th International Planning Competition Booklet (IPC-2008). Iwan, G. 2001. History-based diagnosis templates in the framework of the situation calculus. In Proceedings of the Joint German/Austrian Conference on Artificial Intelligence (KR/ÖGAI). 244–259. Keyder, E., and Geffner, H. 2009. Soft Goals Can Be Compiled Away. Journal of Artificial Intelligence Research 36:547–556. Markey, N. 2003. Temporal logic with past is exponentially more succinct, concurrency column. Bulletin of the EATCS 79:122–128. McGuinness, D. L.; Glass, A.; Wolverton, M.; and da Silva, P. P. 2007. Explaining task processing in cognitive assistants that learn. In Proceedings of the 20th International Florida Artificial Intelligence Research Society Conference (FLAIRS), 284–289. McIlraith, S., and Scherl, R. B. 2000. What sensing tells us: Towards a formal theory of testing for dynamical systems. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI), 483–490. McIlraith, S. 1998. 
Explanatory diagnosis: Conjecturing actions to explain observations. In Proceedings of the 6th International Conference of Knowledge Representation and Reasoning (KR), 167–179. Ramı́rez, M., and Geffner, H. 2009. Plan recognition as planning. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), 1778–1783. Richter, S.; Helmert, M.; and Westphal, M. 2008. Landmarks revisited. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI), 975–982. Sohrabi, S.; Baier, J.; and McIlraith, S. 2010. Diagnosis as planning revisited. In Proceedings of the 12th International Conference on the Principles of Knowledge Representation and Reasoning (KR), 26–36. 70 The Method of ILP+ASP on Psychological Models J. Romero, A. Illobre, J. Gonzalez and R. Otero AI Lab. Computer Science Department, University of Corunna (Spain) {jromerod,infjdi00,jgonzalezi,otero}@udc.es Abstract In the next section we introduce the logic programming methods that are used in the method of ILP+ASP. Then in section 3 we describe the method of Experimental Psychology and the method of ILP+ASP, and in section 4 we show the application of the proposed method to HRDM. Finally, section 5 presents conclusions and future work. We propose to apply a new method of Inductive Logic Programming (ILP) and Answer Set Programming (ASP) to Experimental Psychology. The idea is to use ILP to build a model from experimental data and then use ASP with the resulting model to solve reasoning tasks as explanation or planning. For learning in dynamic domains without the frame problem we use the method of [Otero, 2003] and for reasoning in dynamic domains without the frame problem we use actions in ASP [Lifschitz, 2002]. We have applied this method to an experiment in a dynamic domain of Human Reasoning and Decision Making. The results show that the method can be used for learning and reasoning in real-world dynamic domains, thus improving the methods used in Experimental Psychology, that do not consider these problems. 1 2 Logic programming methods Inductive Logic Programming. ILP [Muggleton, 1995] is an area of Machine Learning for the induction of hypothesis from examples and background knowledge, using logic programming as a single representation for them. Inverse Entailment (IE) is a correct and complete ILP method proposed by S. Muggleton that can deal with recursive rules, implemented in the ILP systems Progol [Muggleton, 1995] and Aleph1 . Given a set of examples and background knowledge, these systems can find the simplest hypothesis that explains every example and is consistent with the background. ILP has been successfully applied in other areas of science such as Molecular Biology (see for example [King et al., 1996]). Inductive Logic Programming for Actions. Induction of the effects of actions consists in learning an action description of a dynamic system from evidence on its behavior. General logic-based induction methods can deal with this problem but, unfortunately, most of the solutions provided have the frame problem. Instead we propose to use the method of [Otero, 2003], implemented in the system Iaction [Otero and Varela, 2006]. This is a correct, complete and efficient method for induction of action descriptions that can cope with the frame problem in induction. Answer Set Programming. ASP is a form of logic programming based in the stable models (answer set) semantics [Gelfond and Lifschitz, 1991]. 
An ASP program can have none, one or several answer sets that can be computed with an ASP system (e.g., Clasp2 ). The method of ASP is (1) to encode a problem as an ASP program such that solutions of the problem correspond to answer sets of the program, and (2) to use an ASP system to compute answer sets of the program. Answer Set Programming for Actions. ASP is suitable for representing action descriptions without the frame problem, and it can be used to solve different tasks like prediction, Introduction The objective of Experimental Psychology is building models of human behavior supported by experimental data. These models are often incomplete and not formally defined, and usually a method of linear regression is used to complete and formalize them. In this paper we propose to use logic programming methods instead. To our knowledge there are few previous attempts to use symbolic methods in this area [Gigerenzer and Selten, 2002][Balduccini and Girotto, 2010]. The idea of the method is to apply Inductive Logic Programming (ILP) to build a psychological model and then apply Answer Set Programming (ASP) with the resulting model to solve reasoning tasks. For induction in dynamic domains without the frame problem [McCarthy and Hayes, 1969] we use the method of [Otero, 2003] and for reasoning in dynamic domains without the frame problem we use actions in ASP [Lifschitz, 2002]. We have applied this method to an experiment in a dynamic domain of Human Reasoning and Decision Making (HRDM), a field of Experimental Psychology. The objective of the experiment is to study how people select the strategies for solving repeatedly a given task. We use ILP to automatically build a model of the process of strategy selection, and then use ASP to reason about the model. 1 2 71 http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph http://www.cs.uni-potsdam.de/clasp/ Subject s1 s2 s3 s4 diagnosis and planning [Lifschitz, 2002]. For example, the Decision Support System of the Space Shuttle [Nogueira et al., 2000] is an ASP system capable of solving planning and diagnostic tasks related to the operation of the Space Shuttle. 3 Behavior 1 2 3 5 Intention 2 3 3 5 Control 1 2 4 4 The Method of ILP+ASP Table 1: Behavior, intention and control for 4 subjects. In this section we explain the method of Experimental Psychology, we present the method of ILP+ASP and describe its application in detail to an example of Experimental Psychology. For example, the resulting model for behavior may consist of the following equation: 3.1 The Method of Experimental Psychology behavior = 1 ∗ intention + 0.4 ∗ control − 1.5 The method of Experimental Psychology follows these steps: (1) Step 5. Reasoning The model built can be used to predict the behavior of other persons. For example, if someone’s intention is high (4) but control is very low (1), the model predicts its behavior will be medium (2.9). Step 1. Psychological Theory First a psychological theory about human behavior is proposed. For example, the Theory of Planned Behavior [Ajzen, 1985] states that human behavior can be modeled by the following concepts: • Intention: the intention to perform the behavior. • Perceived Behavioral Control (control): the perceived ease or difficulty of performing the behavior. According to this theory, behavior is related to intention and control but the particular relation is not known in advance, and it is assumed to depend on the type of behavior. In this example we will consider ecological behavior, i.e. 
human behavior that is relevant for environmental issues: waste management, water and energy consumption, etc. Actions Actions can modify the behavior of people. For example, giving a course on ecology may promote ecological behavior, and removing recycling bins may diminish it. The behavior and the other concepts can be measured before and after the execution of the actions. Again the relation between these actions and the concepts of the theory is not known in advance. We know that some instances of these actions have been done on different people and we want to know the particular relation that holds on every one, which may depend on conditions of the particular person. This is a problem of learning the effects of actions, and the result is a model of the particular relation among the actions and the concepts of the theory. This problem is not considered in the method of Experimental Psychology. Step 2. Representation The concepts of the proposed theory are represented formally to be used in the step of model construction. In Experimental Psychology it is usual to represent the concepts as variables with several values. For example, behavior is not represented as a boolean variable that states whether a given behavior is performed or not, but instead it has values ranging from 1 to 5, which represent how often the behavior is performed. The same holds for intention and control. 3.2 The Method of ILP+ASP The method of ILP+ASP can be outlined as follows: 1. Substitute the numerical representation by a logic programming representation such that the final model is a logic program instead of a linear equation. Step 3. Data Collection Models are based in experimental data, that can be collected following different methods. For example, we do a survey to many persons in which we ask them whether they performed a given ecological behavior and what was their intention and their perceived behavioral control towards that behavior. Each answer is then quantified so that for each person we have a value for behavior, intention and control. Example data is shown in table 1, where each row corresponds to a subject and each column corresponds to a concept. For example, subject s1 has behavior 1, intention 2 and control 1. 2. Substitute the linear regression method by a method for induction of logic programs (ILP). Thus the logic program solution is built from instance data of the survey by a Machine Learning method. 3. Substitute the use of the model for prediction by reasoning with ASP. Thus we can do additional relevant tasks like explanation and planning, which are not considered in the method of Experimental Psychology. To justify the correctness and significance of the ILP+ASP method consider the following: 1. Logic programs provide a representation more general than linear equations. The relation among the variables may not be as simple as a linear equation, and a logic program can represent alternative relations, e.g. not continuous, which could be the case in the domain of Psychology. Note that logic programming allows the repre- Step 4. Model Construction A model with the particular relation among the concepts is built. Typically it is used a method of linear regression, that uses the representation chosen in step 2 and the data collected in step 3. 72 sentation of simple numerical formulas, e.g. linear equations, so we are clearly improving the form of representation. 2. ILP is a correct and complete method of induction for logic programs from instances. 
Thus the result will have the correction at the same level as linear regression has. The method has the power to identify a model if it exists or to tell us that there is no model, i.e. to validate the psychological theory on experimental data. 3. ASP is able to use the model built, which is a logic program, for tasks done with linear equations like prediction, but also for other additional relevant tasks like explanation and planning. In summary, the correctness of ILP provides the correctness of the method when building the model and the correction of ASP provides the correctness of the method when using the model. and lteq to allow Progol to make comparisons with different values of those concepts: 3.3 An example The first two lines define some predicates, called types, used in the mode declarations. The following sentences, called modes, describe the predicates that the system can use in the head of the learned rules (modeh declarations) or in the body (modeb declarations)3 . The modeh declaration states that the head of a learned rule has predicate symbol behavior and two parameters, one of type subject and another of type value. The meaning of the modeb declarations is very similar, but they refer to the predicates that can appear in the body of the rules learned: intention, control, gteq and lteq. Sentences with the set predicate are used to configure the search of the hypothesis. The final sentence states that a subject cannot have two levels of behavior. Given these sentences Progol finds two rules: i n t e n t i o n ( s1 , 2 ) . c o n t r o l ( s1 , 1 ) . i n t e n t i o n ( s2 , 3 ) . c o n t r o l ( s2 , 2 ) . i n t e n t i o n ( s3 , 3 ) . c o n t r o l ( s3 , 4 ) . i n t e n t i o n ( s4 , 5 ) . c o n t r o l ( s4 , 4 ) . g t e q (X, Y) :− v a l u e (X) , v a l u e (Y) , X >= Y . l t e q (X, Y) :− v a l u e (X) , v a l u e (Y) , X =< Y . These sentences, together with the code below, tell Progol to construct a definition for the behavior predicate. s u b j e c t ( s1 ) . s u b j e c t ( s2 ) . s u b j e c t ( s3 ) . s u b j e c t ( s4 ) . value ( 1 ) . value ( 2 ) . value ( 3 ) . value ( 4 ) . value ( 5 ) . :− :− :− :− :− modeh ( 1 , b e h a v i o r ( + s u b j e c t , + v a l u e ) ) ? modeb ( ∗ , i n t e n t i o n ( + s u b j e c t ,− v a l u e ) ) ? modeb ( ∗ , c o n t r o l ( + s u b j e c t ,− v a l u e ) ) ? modeb ( ∗ , g t e q ( + v a l u e , # v a l u e ) ) ? modeb ( ∗ , l t e q ( + v a l u e , # v a l u e ) ) ? :− s e t ( i n f l a t e , 1 0 0 0 ) ? :− s e t ( nodes , 1 0 0 0 ) ? :− b e h a v i o r ( S , X) , b e h a v i o r ( S , Y) , n o t (X==Y ) . Next we explain the particular steps of the method of ILP+ASP with an example. Step 1. Psychological Theory This step is the same as in the method of Experimental Psychology. We apply the Theory of Planned Behavior to study ecological behavior. Step 2. Representation Define the concepts of the theory as predicates of a logic program. In this example: • behavior(S, X): subject S behaves ecologically with degree X. For example, behavior(s4, 5) represents that s4 behaves very ecologically. • intention(S, X): subject S has the intention to be ecological with degree X. • control(S, X): subject S perceives that it is easy for her to be ecological to a degree X. b e h a v i o r ( S , X) :− c o n t r o l ( S , X) , l t e q (X , 2 ) . b e h a v i o r ( S , X) :− i n t e n t i o n ( S , X) , c o n t r o l ( S , Y) , g t e q (Y , 3 ) . 
If the control of a subject is less or equal to 2, behavior has the same value as control, and in other case the behavior is equal to the intention. The rules are a solution to induction because they explain every example and they are consistent with the background. We can compare these rules with the linear equation of section 3.1. Only in 6 out of 25 pairs of values of intention and control their predictions differ in more than one unit, so they predict similar values. But the result of Progol is more insightful. The linear equation simply states how much does every change in intention or control contribute to a change in behavior, while the rules of Progol provide more information about how do these changes happen: when control is very low it blocks the behavior, and in other case the behavior is determined by the intention. There is an underlying problem in the construction of the model. Surveys are not a safe instrument to measure the concepts of the theory: some people can give answers that, intentionally or not, are false. We can handle this problem with Progol allowing it to predict incorrectly a given proportion of the examples. Besides, the method of ILP+ASP provides a very precise way to detect outliers. To improve the quality of Step 3. Data Collection This step is the same as in the method of Experimental Psychology: a survey is done and the results are those of table 1. Step 4. Model construction Apply an ILP system (in this example, Progol 4.4) to automatically build a model. Progol constructs logic programs from examples and background knowledge. In this case there are 4 examples of behavior that we represent as ground facts (rules without body or variables) with predicate behavior: b e h a v i o r ( s1 , 1 ) . b e h a v i o r ( s2 , 2 ) . b e h a v i o r ( s3 , 3 ) . b e h a v i o r ( s4 , 5 ) . Progol uses the background knowledge to construct rules that explain these examples. The background knowledge encodes the knowledge that the expert thinks that is relevant for the learning process, e.g., part of a psychological theory. In this example we represent the other concepts of the theory with predicates intention and control and we add predicates gteq 3 73 For further details we refer the reader to [Muggleton, 1995]. 3.4 An example in Actions the surveys psychologists can introduce related questions, so that some combinations of answers are inconsistent. For example, question q1 could be “Do you recycle the paper?” and question q2 could be “Do you recycle anything?”. If someone answers 5 (always) to q1 and 1 (never) to q2 that person is answering inconsistently. ASP can be used to precisely identify the subjects, outliers, that give inconsistent answers. For example, consider the program We explain the application of the method of ILP+ASP to dynamic domains. Step 1. Psychological theory We consider the application of the Theory of Planned Behavior to study the effects of actions on ecological intention and on ecological control. q1 ( s1 , 2 ) . q2 ( s1 , 2 ) . q1 ( s2 , 5 ) . q2 ( s2 , 1 ) . q1 ( s3 , 5 ) . q2 ( s3 , 4 ) . q1 ( s4 , 3 ) . q2 ( s4 , 3 ) . o u t l i e r ( S ) :− q1 ( S , 5 ) , q2 ( S , 1 ) . Step 2. Representation Define predicates for the actions and for the concepts of the theory that change over time, called fluents. Each instant of time is called a situation, and situations range from 0 (initial situation) to n (final situation), where n depends on each experiment. 
Predicates now have a new term to represent the situation in which the action took place or the situation in which the fluent value holds. Actions. Two actions may change the fluents: where the first lines represent the answer of different subjects to questions q1 and q2 and the last rule is used to detect outliers. This program has a unique answer set, which can be computed with an ASP system like Clasp, that contains the atom outlier(s2), thus detecting the unique outlier of the experiment. The outliers identified can be removed from the data set and studied apart, and the model construction step can be repeated with the new set of examples. • course(S): a course on ecology is given at situation S. Step 5. Reasoning The logic program built in the previous step is a model of ecological behavior: • car sharing(S): a project for sharing cars to go to work is done at situation S. b e h a v i o r ( S , X) :− c o n t r o l ( S , X) , l t e q (X , 2 ) . b e h a v i o r ( S , X) :− i n t e n t i o n ( S , X) , c o n t r o l ( S , C ) , g t e q ( C , 3 ) . g t e q (X, Y) :− v a l u e (X) , v a l u e (Y) , X >= Y . l t e q (X, Y) :− v a l u e (X) , v a l u e (Y) , X <= Y . value ( 1 ) . value ( 2 ) . value ( 3 ) . value ( 4 ) . value ( 5 ) . • intention(S, A, X): at S subject A has intention to be ecological with degree X. For example, intention(0, s1, 5) represents that at 0 subject s1 has a high ecological intention, and intention(2, s1, 1) represents that at 2 her intention is low. Fluents. We modify the predicates of the static case: This program can be used in ASP to solve different tasks. Prediction. Given the intention and control of a new subject we can use the model to predict her behavior. For example, if subject s5 has intention 5 and control 2 we add to the previous program the facts: • control(S, A, X): at S subject A perceives that it is easy for her to be ecological to a degree X. Step 3. Data Collection To study the effects of actions we do surveys at different situations. In this example a survey was done to 2 subjects, then a course on ecology was given and another survey was done, and finally a car sharing project was done followed by another survey. The next program represents the data: i n t e n t i o n ( s5 , 5 ) . c o n t r o l ( s5 , 2 ) . The resulting program has a unique answer set that can be computed by Clasp and contains the prediction for the behavior of s5: i n t e n t i o n ( 0 , s1 , 3 ) . i n t e n t i o n ( 0 , s2 , 2 ) . course ( 1 ) . i n t e n t i o n ( 1 , s1 , 3 ) . i n t e n t i o n ( 1 , s2 , 2 ) . car sharing (2). i n t e n t i o n ( 2 , s1 , 5 ) . i n t e n t i o n ( 2 , s2 , 2 ) . b e h a v i o r ( s5 , 2 ) Explanation. Given the behavior of a new subject and possibly some additional information we can use the model to explain her behavior. For example, we want to explain why s6 has behavior 5, and we know her control is 3. For this task we add the following sentences: c o n t r o l ( 0 , s1 , 2 ) . c o n t r o l ( 0 , s2 , 3 ) . c o n t r o l ( 1 , s1 , 5 ) . c o n t r o l ( 1 , s2 , 5 ) . c o n t r o l ( 2 , s1 , 5 ) . c o n t r o l ( 2 , s2 , 5 ) . Step 4. Model construction Apply system Iaction [Otero and Varela, 2006] to automatically build a model of the relations between the actions and the concepts considered. Iaction implements the method of [Otero, 2003]. The syntax and use of Iaction is very similar to that of Progol. 
For example, the mode declarations for this example are: 1 { i n t e n t i o n ( s6 , 1 ) , i n t e n t i o n ( s6 , 2 ) , i n t e n t i o n ( s6 , 3 ) , i n t e n t i o n ( s6 , 4 ) , i n t e n t i o n ( s6 , 5 ) } 1 . c o n t r o l ( s6 , 3 ) . b e h a v i o r ( s6 , 5 ) . :− b e h a v i o r ( S , X) , b e h a v i o r ( S , Y) , X! =Y . The first rule forces to choose among one of the possible values of intention, the next rule represents the known data, and the last one, like the one we used in Progol, eliminates the answer sets that predict two different values for behavior. The output of Clasp: :− :− :− :− :− :− :− i n t e n t i o n ( s6 , 5 ) , c o n t r o l ( s6 , 3 ) , b e h a v i o r ( s6 , 5 ) gives the explanation for the very high behavior: the intention is also very high. 74 modeh ( ∗ , c o n t r o l ( + s i t u a t i o n , + s u b j e c t , # v a l u e ) ) ? modeh ( ∗ , i n t e n t i o n ( + s i t u a t i o n , + s u b j e c t , # v a l u e ) ) ? modeb ( ∗ , c o u r s e ( + s i t u a t i o n ) ) ? %ACTION modeb ( ∗ , c a r s h a r i n g ( + s i t u a t i o n , + s u b j e c t ) ) ? %ACTION modeb ( ∗ , i n t e n t i o n ( + s i t u a t i o n , + s u b j e c t ,− v a l u e ) ) ? modeb ( ∗ , g t e q ( + v a l u e , # v a l u e ) ) ? modeb ( ∗ , l t e q ( + v a l u e , # v a l u e ) ) ? Planning. Given the initial and the final state of a domain, the objective of planning is to find a sequence of actions that leads from the initial state to the final one. For example, subject s4 has low behavior, medium intention and low control, and we want to find a sequence of actions that can make him become very ecological. This problem can be represented adding the next program to the domain description: Symbol %ACTION tells the system which predicates are actions. We have instructed Iaction to induce an action description for fluents control and intention (we use one modeh for each). Finally, Iaction finds this action description: i n t e n t i o n ( S , A, 5 ) :− c o u r s e ( S ) , p r e v ( S , PS ) , i n t e n t i o n ( PS , A, X) , g t e q (X , 3 ) . c o n t r o l ( S , A, 5 ) :− c a r s h a r i n g ( S ) . The course improves the intention of subjects that had at least medium intention, and the car sharing project improves the control of all subjects. The solution found by Iaction, as guaranteed by the method of [Otero, 2003], explains all the examples and is consistent with the background. s i t u a t i o n ( 0 . . 2 ) . s u b j e c t ( s4 ) . i n t e n t i o n ( 0 , s4 , 3 ) . c o n t r o l ( 0 , s4 , 2 ) . 1 { c o u r s e ( S ) , c a r s h a r i n g ( S ) } 1 :− p r e v ( S , PS ) . :− n o t b e h a v i o r ( 2 , s4 , 5 ) . The line with brackets states that each answer set must contain one and only one of the atoms inside, i.e. for each situation we must choose one of the actions. The last line defines the goal of the planning problem. The program has two answer sets that represent the solutions to the planning problems. This is part of the output of Clasp: Step 5. Reasoning We have a description of the effects of actions on fluents control and behavior. From previous experiments we also know what is the relation of behavior with intention and control. In this step we apply ASP to this model for reasoning about actions. For all tasks we use a file representing the domain description and another file representing the particular task. The domain description file contains these rules to represent the changes in the domain: Answer : 1 . . . course (1) car sharing (2) Answer : 2 . . . 
carsharing(1) course(2)

Rules for intention and control are the result of the previous learning process, and rules for behavior are known from previous experiments. For each fluent we have to add the indirect effects for the negation of the fluent and the inertia law. For example, for fluent behavior we add rules:

4 An experiment on Human Reasoning and Decision Making
The theory of the Adaptive Toolbox [Gigerenzer and Selten, 2002] proposes that human reasoning and decision making can be modeled as a collection of fast and frugal heuristics. A fast and frugal heuristic is a simple algorithm to both build a model and make predictions on a domain. Under this hypothesis people use one of these fast and frugal heuristics to decide the solution to a problem. However, the mechanism by which a subject selects one of the heuristics is still under study. It is also possible that the same subject, trying to solve a similar problem several times, uses different heuristics at different moments. In [Rieskamp and Otto, 2006] SSL (Strategy Selection Learning), a theory based on reinforcement learning, is proposed. It explains how people decide to apply one heuristic depending on the feedback received. The theory is tested on 4 experimental studies. We apply the ILP+ASP method to one of these experiments to: 1) model how a subject decides to use one of the heuristics available in a given situation, and 2) use this model to solve prediction and planning tasks.

-behavior(S,A,X) :- behavior(S,A,Y), X != Y.
behavior(S,A,X) :- behavior(PS,A,X), not -behavior(S,A,X), prev(S,PS).
-behavior(S,A,X) :- -behavior(PS,A,X), not behavior(S,A,X), prev(S,PS).

This domain description can be used for solving different tasks.

Prediction. Given the state of a domain at an initial situation and a sequence of actions, the objective of prediction is to determine the state of the domain in the final situation and the others. For example, at the initial situation subject s3 has low intention and control. Then a car sharing project is done in his workplace and after that he goes to a course on ecology. We can represent this by adding the following program to the domain description:

situation(0..2). subject(s3).
intention(0,s3,2). control(0,s3,2).
carsharing(1). course(2).

Step 1. Psychological Theory
Suppose we have two unnamed companies, A and B, and we want to decide which one is the most creditworthy. Each company is described by 6 binary cues (Financial Flexibility, Efficiency, Capital Structure...). For each cue a validity value, representing the probability of success, is also available. Both companies with their respective cues are shown to a subject, see table 2. The subject, based on this information, has to

This program has a unique answer set that solves the prediction problem. This is part of the output of Clasp:

behavior(2,s3,2) intention(2,s3,2) control(2,s3,5) ...

To improve the behavior it is necessary to improve both intention and control, and to this aim both the course and the car sharing actions have to be executed. However, if instead of intention(0,s4,3) we write intention(0,s4,2) the program has no answer set and thus there is no solution to the planning problem: the intention is too low to be improved by giving a course, so there is no way to improve her behavior.
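The prediction and planning tasks above are run by handing the domain description file and the task file to Clasp. For readers who want to script such experiments, the following is a minimal sketch using the clingo Python bindings (which bundle a grounder and Clasp); the package, the file names domain.lp and prediction.lp, and the command-line flag are assumptions made for illustration, not part of the paper's setup.

import clingo

def solve(files):
    # "0" asks for all answer sets (the planning task above has two)
    ctl = clingo.Control(["0"])
    for f in files:
        ctl.load(f)                      # e.g. the domain description and the task file
    ctl.ground([("base", [])])
    answers = []
    with ctl.solve(yield_=True) as handle:
        for model in handle:
            answers.append([str(atom) for atom in model.symbols(shown=True)])
    return answers

for i, atoms in enumerate(solve(["domain.lp", "prediction.lp"]), start=1):
    print("Answer:", i, " ".join(atoms))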
i n t e n t i o n ( S , A, 5 ) :− c o u r s e ( S ) , p r e v ( S , PS ) , i n t e n t i o n ( PS , A, X) , g t e q (X , 3 ) . c o n t r o l ( S , A, 5 ) :− c a r s h a r i n g ( S ) . b e h a v i o r ( S , A, X) :− c o n t r o l ( S , A, X) , l t e q (X , 2 ) . b e h a v i o r ( S , A, X) :− i n t e n t i o n ( S , A, X) , c o n t r o l ( S , A, Y) , g t e q (Y , 3 ) . ... ... ... The course had no effect on the intention of s3 so even if the car sharing project increased her control the behavior remains low. 75 Cue Financial Flexibility Efficiency Capital Structure Management Own Financial Resources Qualifications of Employees Validity 79% 90% 75% 70% 85% 60% A 1 1 0 1 1 1 B 0 1 0 0 0 0 then it is assumed that he has used the TTB heuristic. In any other case, the heuristic selected by the subject (from TTB or WADD) is unknown. Summarizing, for each subject in the experiment it is shown to him a sequence of trials. In each trial the subject has to select the company A or the company B as the most creditworthy. It is assumed that the subject has used the heuristic (TTB or WADD) which selects the same answer as the subject. After each trial feedback showing if the answer was correct or incorrect is shown. The objective is to model which heuristic is used by a subject in each trial. Table 2: Example of a possible trial. A subject will select company A or company B as the most creditworthy. After the subject’s choice, feedback saying if the answer is correct or incorrect is given. Step 2. Representation We define the trials of the survey, and the answers given by the subjects, with the following predicates of a logic program. decide whether the company A or the company B is the most creditworthy. After the subject’s choice it is shown if the answer is correct or not. Then another two unnamed companies are presented to the same subject and the previous process is repeated, each repetition is named a trial. The objective of the experiment is to model how a subject decides which company is the most creditworthy. In this study, subjects are shown 168 selection problems, trials. The study is divided in 7 trial blocks, each consisting on 24 trials. In each trial block, the same 24 pairs of companies, items, are shown to the subjects, but its order is varied. The order of each company in the screen is varied too, e.g. suppose that the trial involves two companies o1 and o2, in some trials the company o1 is named A and it is showed on the left side (the column under the label A in the table 2), while the company o2 is named B and it is showed on the right (the column under the label B), while in other trials the company o1 is named B and showed on the right, and the company o2 is named A and showed on the left. For each cue ci (e.g. Efficiency) the value 1 for a given company O represents that O has the property ci (e.g. the company O is efficient). The value 0 represents that O does not have the property ci (e.g. the company O is not efficient). Each cue ci has a value, 1 or 0, associated to each company A and B. Also each cue has associated a validity value, representing the Probability of success. For example, in the table 2 for the first cue Financial Flexibility the company A has the property Financial Flexibility (1), the company B has not the property Financial Flexibility (0) and the validity for this cue is 79%. It means that the probability of choosing the right answer, if you select the company with Financial Flexibility, is of 0.79. 
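Before turning to how subjects are assumed to pick between the two heuristics (described in the next paragraph), it may help to make explicit what each heuristic computes on a trial such as the one in table 2. The sketch below follows the standard descriptions of Take The Best and the weighted additive rule from the fast and frugal heuristics literature; it is an illustration, not code from the study.

def take_the_best(cues_a, cues_b, validities):
    # Inspect cues from highest to lowest validity and decide on the first
    # cue whose value differs between the two companies.
    order = sorted(range(len(validities)), key=lambda i: validities[i], reverse=True)
    for i in order:
        if cues_a[i] != cues_b[i]:
            return "A" if cues_a[i] > cues_b[i] else "B"
    return None          # no cue discriminates: the subject has to guess

def weighted_additive(cues_a, cues_b, validities):
    # Compare the validity-weighted sums of the cue values.
    score_a = sum(v * c for v, c in zip(validities, cues_a))
    score_b = sum(v * c for v, c in zip(validities, cues_b))
    if score_a == score_b:
        return None      # tie: guess
    return "A" if score_a > score_b else "B"

# The trial of table 2: both heuristics happen to select company A here.
validities = [0.79, 0.90, 0.75, 0.70, 0.85, 0.60]
company_a  = [1, 1, 0, 1, 1, 1]
company_b  = [0, 1, 0, 0, 0, 0]
print(take_the_best(company_a, company_b, validities),
      weighted_additive(company_a, company_b, validities))

On trials like this one, where the two heuristics agree, the experiment cannot tell which heuristic the subject actually used, which is exactly the ambiguity discussed below.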
In the experiment it is assumed that, for each trial, subjects decide which is the most creditworthy company based on the results of one of these two heuristics: Take the Best (TTB) or Weighted Additive Heuristic (WADD) [Gigerenzer and Selten, 2002]. However a subject can use TTB in one trial and WADD in another trial. Note that the subject decides to select company A or company B in each trial. Hence it is not directly known which of the two heuristic has been used by the subject. In this study it is assumed that, if only one heuristic selects the same object as the subject, then the subject must necessarily have used that heuristic. For example, if the TTB heuristic selects company A, the WADD heuristic selects company B, and the subject has selected the company A, Actions. For this experiment, it is enough to define a single action predicate, show. • show(T,Sb,I,P): item I is shown to subject Sb on trial T. The parameter P represents if the correct company is at the left or right of the computer screen. Fluents. To represent the decisions of the subject on each trial, and the information she might keep to decide which heuristic to apply, we define the following fluents: • selectedheuristic(T,Sb,H): subject Sb used heuristic H (TTB or WADD) on trial T. The system Iaction [Otero and Varela, 2006] will learn an action description for this fluent. • itemselectedheuristic(T,Sb,I,H): subject Sb used heuristic H the last time item I was shown. • answeredcorrectly(T,Sb,D): subject Sb answered correctly on trial T. Parameter D is used to keep information from previous trials, that the subject might use to take a decision. For example, answeredcorrectly(T,Sb,d0) represents that the subject answered correctly at trial T, and answeredcorrectly(T,Sb,d1) that the subject answered correctly at T − 1. • itemansweredcorrectly(T,Sb,I): subject Sb answered correctly the last time item I was shown. • selected(T,Sb,O,D): subject Sb selected company O on trial T. • itemselected(T,Sb,I,O): subject Sb selected company O the last time item I was shown. • selectedposition(T,Sb,P): subject Sb selected the company at position P of the computer screen on trial T. • itemselectedposition(T,Sb,I,P): subject Sb selected the company at position P of the computer screen the last time item I was shown. • showed(T,Sb,I,P,D): item I was shown on trial T. • feedback(T,Sb,H,R,D): the subject might have chosen the company selected by ttb, wadd, or any of the two (parameter H). This decision was either correct or incorrect (parameter R). 76 Trials selectedheuristic answeredcorrectly actions (show) companies c1 c2 cue values c3 c4 c5 c6 • sfeedback(T,Sb,H,R,C,D): counts the feedback received by the subject on the last D trials. For example, sfeedback(T,Sb,ttb,incorrect,2,d2) represents that the subject used the TTB heuristic in T and T − 1, and both times received negative feedback. Static background. The following static background is defined to represent companies, their cue values, items and the selection that the TTB and WADD heuristic would make for each of them. t0 wadd yes t1 wadd no i6,left o22 o24 1 0 0 1 0 0 0 1 1 1 0 1 t2 ttb yes i1,right o33 o23 0 1 1 0 1 0 1 1 0 1 1 1 Table 3: Predicting heuristic selection problem. • within(I,P,O1,O2): companies O1 and O2 are within item I. P represents the position of the screen where the correct company appears. This set of action laws represents what makes a subject select an heuristic. 
The first action law states that, if 1) the subject has used the WADD heuristic on the last two trials and answered incorrectly on both cases, and 2) for the last shown item cue c6 in the screen was different for both companies, then she will use the TTB heuristic on the next trial. The second action law states that, if 1) the subject has answered incorrectly to the last trial, 2) the subject is shown an item where both companies have the same value for cue c6, and 3) the last time this item appeared, she selected the object on the left of the computer screen, then she will select the TTB heuristic. The third action law states that, if 1) the subject has answered correctly to the last trial, 2) the subject is shown an item where both companies have a different value for cue c5, and 3) the TTB heuristic would select the company with a value of 0 in this cue, then she will apply the WADD heuristic. • correctobject(I,O): company O is the correct answer for item I. • selects(H,I,O): heuristic H selects company O for item I. • only(I,O,C,V): company O is the only within item I that has a value of V for cue C. For example, if item i1 is formed by companies o1 and o2, only(i1,o1,c1,1) represents that o1 has a value of 1 for cue c1, while o2 has a value of 0. • same(I,C): both companies on item I have the same value for cue C. Step 3. Data collection We use the answers of 20 subjects from Study 1 of [Rieskamp and Otto, 2006]4 . For each subject, the results of the survey are represented as ground facts in a logic program, using the predicates of step 2. The following are examples of these: show ( t 1 , s1 , i 6 , l e f t ) . w i t h i n ( i 6 , l e f t , o22 , o24 ) . s e l e c t e d h e u r i s t i c ( t 1 , s1 , wadd ) . n a n s w e r e d c o r r e c t l y ( t 1 , s1 , d0 ) . ... The first fact represents that, at trial t1, subject s1 was shown item i6, with the correct object at the left of the computer screen. The second fact represents that item i6 is formed by companies o22 and o24. Finally, the last two facts represent the answer of the subject: he has used the WADD heuristic and thus answered incorrectly. Step 5. Reasoning The action theory built on the previous step is combined with the background defined on step 2. The resulting model can be used to predict, explain and plan for subject answers. Prediction. Given a sequence of trials, the model can be used to predict which heuristic the subject will use on each of them. For example, consider the problem summarized in table 3. At trial 0 (t0), the subject has selected the WADD heuristic, and has answered correctly. We now know that the subject will have to answer to items i6 and i11, and that the correct object appears in the left and right of the screen, respectively. Table 3 shows the companies within each item, their cue values, and which company would be selected by the TTB and WADD heuristic. With this information, the goal is to decide which heuristics the subject will use to answer to these two items. To solve this problem we add the following rules to the logic program: Step 4. Model construction We use the system Iaction [Otero and Varela, 2006] to build an action theory representing how a subject selects heuristics. An action theory is built for each subject. 
The following is an action theory built by the system: s e l e c t e d h e u r i s t i c ( T , Sb , t t b ): − show ( T , Sb , I , P ) , p r e v ( T , P t ) , f e e d b a c k ( P t r , Sb , wadd , i n c o r r e c t , d0 ) , showed ( P t r , Sb , I2 , P2 , d0 ) , o n l y ( I2 , O, c6 , 1 ) . s e l e c t e d h e u r i s t i c ( T , Sb , t t b ): − show ( T , Sb , I , P ) , p r e v ( Tr , P t ) , n a n s w e r e d c o r r e c t l y ( P t r , Sb , d0 ) , csame ( I , c6 ) , i t e m s e l e c t e d ( P t r , Sb , I , O1 ) , w i t h i n ( I , P , O1 , O2 ) . s e l e c t e d h e u r i s t i c ( Tr , Sb , wadd ): − show ( T , Sb , I , P ) , p r e v ( Tr , P t ) , a n s w e r e d c o r r e c t l y ( Pt , Sb , d0 ) , o n l y ( I , O, c5 , 0 ) , s e l e c t s ( t t b , I , O ) . s e l e c t e d h e u r i s t i c ( t 0 , s1 , wadd ) . a n s w e r e d c o r r e c t l y ( t 0 , s 1 ) . .... show ( t 1 , s1 , i 6 , l e f t ) . show ( t 2 , s1 , i 1 , r i g h t ) . First, we specify the state of the subject at trial t0 as a set of ground facts. Then, we specify the sequence of actions that the subject will be shown. With these rules we get a single solution, the prediction shown in table 3: 4 The authors wish to thank Dr. Jörg Rieskamp for providing the data used in this section. 77 dynamic domains need to be modeled. s e l e c t e d h e u r i s t i c ( t 0 , s1 , wadd ) s e l e c t e d h e u r i s t i c ( t 1 , s1 , wadd ) s e l e c t e d h e u r i s t i c ( t 2 , s1 , t t b ) Acknowledgments. This research is partially supported by the Government of Spain, grant AP2008-03841, and in part by the Government of Galicia (Spain), under grant PGIDIT08-IN840C. Planning. In the planning task, the goal is to find a sequence of actions that would make the subject show a certain behavior, e.g. using an heuristic or answering a question incorrectly. As an example, consider the same problem shown in table 1. This time, however, we just know that the subject has used the WADD heuristic at trial 0, and that we want her to use the TTB heuristic on trial 2. To solve this problem we add the following rules to the program: References [Ajzen, 1985] I. Ajzen. From intentions to actions: a theory of planned behavior. Action-control: from cognition to behavior, pages 11–39, 1985. [Balduccini and Girotto, 2010] M. Balduccini and S. Girotto. Formalization of psychological knowledge in answer set programming and its application. Theory Pract. Log. Program., 10:725–740, 2010. [Gelfond and Lifschitz, 1991] M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991. [Gigerenzer and Selten, 2002] G. Gigerenzer and R. Selten. Bounded Rationality: The Adaptive Toolbox. The MIT Press, 2002. [King et al., 1996] R.D. King, S.H. Muggleton, A. Srinivasan, and M. Sternberg. Structure-activity relationships derived by machine learning: the use of atoms and their bond connectives to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences, 93:438–442, 1996. [Lifschitz, 2002] Vladimir Lifschitz. Answer set programming and plan generation. Artificial Intelligence, 138:39– 54, 2002. [McCarthy and Hayes, 1969] J. McCarthy and P. J. Hayes. Some philosophical problems from the standpoint of artificial intelligence. In Machine Intelligence, pages 463–502. Edinburgh University Press, 1969. [Muggleton, 1995] S. H. Muggleton. Inverse entailment and progol. New Generation Computing, 13:245–286, 1995. [Nogueira et al., 2000] M. 
Nogueira, M. Balduccini, M. Gelfond, R. Watson, and M. Barry. An a-prolog decision support system for the space shuttle. In In PADL 2001, pages 169–183. Springer, 2000. [Otero and Varela, 2006] R. Otero and M. Varela. Iaction, a system for learning action descriptions for planning. Proceedings of the 16th Int. Conference on Inductive Logic Programming, ILP-06. LNAI, 4455, 2006. [Otero, 2003] R. Otero. Induction of the effects of actions by monotonic methods. Proceedings of the 13th Int. Conference on Inductive Logic Programming, ILP-03. LNAI, 2835:193–205, 2003. [Rieskamp and Otto, 2006] J. Rieskamp and P. E Otto. Ssl: a theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135(2):219–238, 2006. s e l e c t e d h e u r i s t i c ( t 0 , s1 , wadd ) . a n s w e r e d c o r r e c t l y ( t 0 , s 1 ) . .... 1 { show ( T , Sb , I , P ) : i t e m ( I ) : p o s i t i o n ( P ) } 1 :− p r e v ( T , P t ) , s u b j e c t ( Sb ) . :− s e l e c t e d h e u r i s t i c ( T , Sb , H) , s e l e c t e d h e u r i s t i c ( T , Sb , H2 ) , H! =H2 . :− n o t s e l e c t e d h e u r i s t i c ( t 2 , s1 , t t b ) . First, we specify the state of the subject at trial t0 as in the prediction task. Then, we specify the planning problem using three rules. The first rule defines the set of possible solutions for the task, the second rule grants that the solutions found are consistent, and the last rule represents the goal of the problem. Running Clasp we get all possible solutions. For example: show ( t 8 8 , s107 , i 6 , l e f t ) show ( t 8 9 , s107 , i 1 , r i g h t ) that is the same used in the prediction example. Discussion. To model the process of strategy selection [Rieskamp and Otto, 2006] have proposed the SSL theory. According to this theory subjects start with an initial preference for each heuristic. At each trial subjects use the most preferred heuristic, and they update their preferences depending on the performance of the heuristics (see [Rieskamp and Otto, 2006] for a formal definition). On a preliminary study SSL correctly predicted 80% of the trials where the result of TTB and WADD is different, and in the same setting the ILP+ASP method predicted correctly 88% of the trials. Note that with ILP+ASP the model is built automatically, without prior knowledge of how the process of strategy selection is done. And the resulting model can provide new insight on this process. For example, the action theory constructed by Iaction suggests that the cue c6, that appears in the bottom of the computer screen, could be relevant for the decisions of the subjects. Finally, we have seen how these models can be used in an ASP system to predict and plan about subject answers. 5 Conclusions We have proposed a new method of ILP+ASP for Experimental Psychology. It is a correct and complete method for induction of logic programs that provides a general form of representation and can be used to solve relevant tasks like explanation and planning. We have applied the method for learning and reasoning in a real-world dynamic domain, thus improving the methods used in Experimental Psychology, that do not consider these problems. As of future work we will apply the method to other fields of Psychology where 78 Tractable Strong Outlier Identification Fabrizio Angiulli [email protected] ECSS Dept. University of Calabria Via P. Bucci, 41C, 87036 Rende (CS), Italy Rachel Ben-Eliyahu–Zohary [email protected] Software Engineering Dept. 
Jerusalem College of Engineering Jerusalem, Israel Abstract Intelligence literature. In particular, for such a formalism to be suitable to attack interesting KR problems it must be nonmonotonic, so that it is possible to naturally exploit defeasible reasoning schemas. Among the nonmonotonic knowledge representation formalisms, Reiter’s default logic (Reiter, 1980) occupies a preeminent and well-recognized role. In a recent paper (Angiulli, Zohary, & Palopoli, 2008) formally defined the outlier detection problem in the context of Reiter’s default logic knowledge bases and studied some associated computational problems. Following (Angiulli et al., 2008), this problem can be intuitively described as follows: outliers are sets of observations that demonstrate some properties contrasting with those that can be logically “justified” according to the given knowledge base. Thus, along with outliers, their witnesses, which are sets of observations encoding the unexpected properties associated with outliers, are singled out. To illustrate this technique, consider a case where during the same day, a credit card number is used several times to pay for services provided through the Internet. This sounds normal enough, but add to that the interesting fact that the payment is done through different IPs, each of which is located in a different country! It might be the case that the credit card owner is traveling on this particular day, but if the different countries from which the credit card is used are located in different continents we might get really suspicious about who has put his hands on these credit card numbers. Another way to put it, is to say that the fact that the credit card number is used in different continents during the same day makes this credit card an outlier, and one of the probable explanations for such a phenomenon is that the credit card numbers have been stolen. This example is discussed further in Section 3.3. As noted in (Angiulli et al., 2008),outlier detection problems are generally computationally quite hard, with their associated complexities ranging from DP -complete to DP 3 -complete, depending on the specific form of problem one decides to deal with. For this reason, (Angiulli, Zohary, & Palopoli, 2010) singled out several cases where a very basic outlier detection problem, that is, the problem of recognizing an outlier set and its witness set, can be solved in polynomial time. In knowledge bases expressed in default logic, outliers are sets of literals, or observations, that feature unexpected properties. This paper introduces the notion of strong outliers and studies the complexity problems related to outlier recognition in the fragment of acyclic normal unary theories and the related one of mixed unary theories. We show that recognizing strong outliers in acyclic normal unary theories can be done in polynomial time and, moreover, that this result is sharp, since switching to either general outliers, cyclic theories or acyclic mixed unary theories makes the problem intractable. This is the only fragment of default theories known so far for which the general outlier recognition problem is tractable. Based on these results, we have designed a polynomial time algorithm for enumerating all strong outliers of bounded size in an acyclic normal unary default theory. These tractability results rely on the Incremental Lemma which is also presented. This useful Lemma provides conditions under which a mixed unary default theory displays a monotonic reasoning behavior. 
1 Luigi Palopoli [email protected] ECSS Dept. University of Calabria Via P. Bucci, 41C, 87036 Rende (CS), Italy Introduction Detecting outliers is a premiere task in data mining. Although there is no universal definition of outlier, it is usually referred to as an observation that appears to deviate markedly from the other observations or to be inconsistent with the remainder of the data (Hawkins, 1980). Applications of outlier detection include fraud detection, intrusion detection, activity and network monitoring, detecting novelties in various contexts, and many others (Hodge & Austin, 2004; Chandola, Banerjee, & Kumar, 2009). Consider a rational agent acquiring information about the world stated in the form of a sets of facts. It is analogously relevant to recognize if some of these facts disagree with her own view of the world. Obviously such a view has to be encoded somehow using one of the several KR&R formalisms defined and studied in the Artificial 79 proofs are ommited. All the proofs can be found in the full version of the paper, see (Angiulli, Zohary, & Palopoli, ). A cumulative look at the results presented in (Angiulli et al., 2008, 2010), provides an idea of the tractability frontier associated with outlier detection problems in default logic. In this paper, we continue along this line of research and attempt to draw, as precisely as possible, such a tractability frontier. We want to depict the contour of a tractability region for outlier detection problems that refers to the well-known fragment of unary propositional default theories. In particular, motivated by the intractability of the general outlier recognition problem in all the classes of theories considered thus far in the literature, we investigate this problem within further subsets of the classes already studied, such as the fragment of acyclic normal unary theories and the related one of mixed unary theories. We also introduce a new type of outliers which we will call strong outliers. Informally speaking, acyclic normal unary theories are normal unary theories characterized by a bounded degree of cyclicity, while strong outliers are outliers characterized by a stronger relationship with their witness set than in the general case. In this context, we have been able to prove that recognizing strong outliers under acyclic normal unary theories can be done in polynomial time and, moreover, that this result is sharp, since switching either to general outliers, to cyclic theories or to acyclic mixed unary theories makes the problem intractable. Notably, this is the only only fragment of default theories known so far for which the general outlier recognition problem is tractable. Based on these results, we designed a polynomial time algorithm for enumerating all strong outliers of bounded size in an acyclic normal unary default theory. This algorithm can also be employed to enumerate all strong outliers of bounded size in a general normal mixed unary theory and, with some minor modifications, all the general outliers and witness pairs of bounded size. However, in this latter case, since the problems at hand are NP-hard, its worst case running time is exponential, even if from a practical point of view it can benefit from some structural optimizations which allows the algorithm to reduce the size of the search space. The rest of the paper is organized as follows. Section 2 recalls the definition of default logics and that of the outlier detection task in the framework of default reasoning. 
Section 3 introduces the definitions of mixed unary and acyclic unary default theories, the definition of strong outlier, and provides a roadmap of the technical results that will be presented in the rest of the paper. Section ?? presents intractability results while Section ?? presents some computational characterizations of mixed unary theories, the tractability result concerning strong outlier recognition that completes the layout of the tractability frontier, and the polynomial time strong outlier enumeration algorithm for acyclic unary default theories. To conclude, Section 4 runs through the complexity results presented in Sections ?? and ?? once more, this time focusing on their complementarity and commenting also upon the application of the outlier enumeration algorithm within more general scenarios. The section ends with our conclusions. Due to space constraints, many

2 Outlier Detection using Default Logic

Default logic was introduced by Reiter (Reiter, 1980). We first recall basic facts about its propositional fragment. For T, a propositional theory, and S, a set of propositional formulae, T* denotes the logical closure of T, and ¬S the set {¬(s) | s ∈ S}. A set of literals L is inconsistent if ¬ℓ ∈ L for some literal ℓ ∈ L. Given a literal ℓ, letter(ℓ) denotes the letter in the literal ℓ. Given a set of literals L, letter(L) denotes the set {A | A = letter(ℓ) for some ℓ ∈ L}.

2.1 Syntax

A propositional default theory ∆ is a pair (D, W) where W is a set of propositional formulae and D is a set of default rules. We assume that both sets D and W are finite. A default rule δ is of the form

α : β1, . . . , βm / γ    (1)

where α (called prerequisite), βi, 1 ≤ i ≤ m (called justifications) and γ (called consequent) are propositional formulae. For δ a default rule, pre(δ), just(δ), and concl(δ) denote the prerequisite, justification, and consequent of δ, respectively. Analogously, given a set of default rules, D = {δ1, . . . , δn}, pre(D), just(D), and concl(D) denote, respectively, the sets {pre(δ1), . . . , pre(δn)}, {just(δ1), . . . , just(δn)}, and {concl(δ1), . . . , concl(δn)}. The prerequisite may be missing, whereas the justification and the consequent are required (an empty justification denotes the presence of the identically true literal true specified therein). Next, we introduce some well-known subsets of propositional default theories relevant to our purposes.

Normal theories. If the conclusion of a default rule is identical to the justification the rule is called normal. A default theory containing only normal default rules is called normal.

Disjunction-free theories. A propositional default theory ∆ = (D, W) is disjunction free (DF for short) (Kautz & Selman, 1991), if W is a set of literals, and, for each δ in D, pre(δ), just(δ), and concl(δ) are conjunctions of literals.

Normal mixed unary theories. A DF default theory is normal mixed unary (NMU for short) if its set of defaults contains only rules of the form α : β / β, where α is either empty or a single literal and β is a single literal.

Normal and dual normal unary theories. An NMU default theory is normal unary (NU for short) if the prerequisite of each default is either empty or positive. An NMU default theory is dual normal unary (DNU for short) if the prerequisite of each default is either empty or negative.

Figure 1 highlights the set-subset relationships between the above fragments of default logic. of generating defaults of E.
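To make the fragments just defined concrete, here is one possible in-memory representation of NMU theories together with checks for the NU and DNU restrictions; the encoding of a literal as a signed integer is an assumption made only for this sketch, not notation from the paper.

from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# A literal is a nonzero integer: +n for the atom n, -n for its negation.

@dataclass(frozen=True)
class NMUDefault:
    pre: Optional[int]   # prerequisite: a single literal, or None if missing
    concl: int           # consequent; the rule is normal, so just(d) = concl(d)

@dataclass(frozen=True)
class NMUTheory:
    defaults: Tuple[NMUDefault, ...]
    w: FrozenSet[int]    # W: a finite set of literals

def is_normal_unary(t: NMUTheory) -> bool:
    # NU: every prerequisite is empty or a positive literal
    return all(d.pre is None or d.pre > 0 for d in t.defaults)

def is_dual_normal_unary(t: NMUTheory) -> bool:
    # DNU: every prerequisite is empty or a negative literal
    return all(d.pre is None or d.pre < 0 for d in t.defaults)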
We assume that the set of generating defaults is maximal, that is, for every δ ∈ D, if δ is applicable in E then, for some 1 ≤ i ≤ n, δ = δi.

Although default theories are non-monotonic, normal default theories satisfy the property of semimonotonicity (see Theorem 3.2 of (Reiter, 1980)). Semimonotonicity in default logic means the following: Let ∆ = (D, W) and ∆′ = (D′, W) be two default theories such that D ⊆ D′; then for every extension E of ∆ there is an extension E′ of ∆′ such that E ⊆ E′.

A default theory may not have any extensions (an example is the theory ({ :β / ¬β }, ∅)). Then, a default theory is called coherent if it has at least one extension, and incoherent otherwise. Normal default theories are always coherent. A coherent default theory ∆ = (D, W) is called inconsistent if it has just one extension which is inconsistent. By Theorem 2.2 of (Reiter, 1980), the theory ∆ is inconsistent iff W is inconsistent. The theories examined in this paper are always coherent and consistent, since only normal default theories (D, W) with W a consistent set of literals are taken into account.

The entailment problem for default theories is as follows: Given a default theory ∆ and a propositional formula φ, does every extension of ∆ contain φ? In the affirmative case, we write ∆ |= φ. For a set of propositional formulas S, we analogously write ∆ |= S to denote (∀φ ∈ S)(∆ |= φ).

Figure 1: A map of the investigated fragments.

2.2 Semantics

The informal meaning of a default rule δ is as follows: If pre(δ) is known to hold and if it is consistent to assume just(δ), then infer concl(δ). The formal semantics of a default theory ∆ is defined in terms of extensions. A set E is an extension for a theory ∆ = (D, W) if it satisfies the following set of equations:

• E0 = W,
• for i ≥ 0, Ei+1 = Ei* ∪ { γ | (α : β1, . . . , βm / γ) ∈ D, α ∈ Ei, ¬β1 ∉ E, . . . , ¬βm ∉ E },
• E = ⋃_{i≥0} Ei.

2.3 Outliers in Default Logic

The issue of outlier detection in default theories is extensively discussed in (Angiulli et al., 2008). The formal definition of outlier there proposed is given as follows. For a given set W and a list of sets S1, . . . , Sn, WS1,...,Sn denotes the set W \ (S1 ∪ S2 ∪ . . . ∪ Sn).

Definition 2.2 (Outlier and Outlier Witness Set) (Angiulli et al., 2008) Let ∆ = (D, W) be a propositional default theory and let L ⊆ W be a set of literals. If there exists a non-empty subset S of WL such that:
1. (D, WS) |= ¬S, and
2. (D, WS,L) ⊭ ¬S
then L is an outlier set in ∆ and S is an outlier witness set for L in ∆.

The intuitive explanation of the different roles played by an outlier and its witness is as follows. Condition (i) of Definition 2.2 states that the outlier witness set S denotes something that does not agree with the knowledge encoded in the defaults. Indeed, by removing S from the theory at hand, we obtain ¬S. In other words, if S had not been observed, then, according to the given defaults, we would have concluded the exact opposite. Moreover, condition (ii) of Definition 2.2 states that the outlier L is a set of literals that, when removed from the theory, makes such a disagreement disappear. Indeed, by removing both S and L from the theory, ¬S is no longer obtained. In other words, disagreement for S is a consequence of the presence of L in the theory. To summarize, the set S witnesses that the piece of knowledge

Given a default δ and an extension E, we say that δ is applicable in E if pre(δ) ∈ E and (∄ c ∈ just(δ))(¬c ∈ E).
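The applicability condition suggests a simple way to build one extension of the restricted theories used in this paper: start from W and keep applying applicable defaults. For normal rules the justification coincides with the consequent, so a default whose consequent conflicts with an already derived literal is simply blocked, and the construction never needs to backtrack. The sketch below is only an illustration for normal disjunction-free theories with consistent W, not a general default-logic reasoner.

def one_extension(defaults, w):
    # defaults: iterable of (pre, concl) pairs, each a set of literals
    # (integers, -x for the negation of x); justification = concl (normal rules).
    e = set(w)
    changed = True
    while changed:
        changed = False
        for pre, concl in defaults:
            applicable = (pre <= e                             # prerequisite derived
                          and all(-c not in e for c in concl)  # consistent to assume
                          and not concl <= e)                  # not applied yet
            if applicable:
                e |= concl
                changed = True
    return e

# Toy theory: W = {a}, one default a : not-b / not-b   (a = 1, b = 2)
print(one_extension([({1}, {-2})], {1}))      # {1, -2}
print(one_extension([({1}, {-2})], {1, 2}))   # {1, 2}: the default is blocked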
It is well known that an extension E of a finite propositional default theory ∆ = (D, W ) can be finitely characterized through the set DE of the generating defaults for E w.r.t. ∆ (Reiter, 1980; Zhang & Marek, 1990). Next we introduce a characterization of an extension of a finite DF propositional theory which is based on a lemma from (Kautz & Selman, 1991). Lemma 2.1 Let ∆ = (D, W ) be a DF default theory; then E is an extension of ∆ if there exists a sequence of defaults δ1 , ..., δn from D and a sequence of sets E0 , E1 , ..., En , such that for all i > 0: • E0 = W , • Ei = Ei−1 ∪ concl (δi ), • pre(δi ) ⊆ Ei−1 , • (6 ∃c ∈ just(δi ))(¬c ∈ En ), • (6 ∃δ ∈ D)(pre(δ) ⊆ En ∧ concl (δ) 6⊆ En ∧ (6 ∃c ∈ just(δ))(¬c ∈ En )), • E is the logical closure of En , where En is called the signature set of E and is denoted liter(E) and the sequence of rules δ1 , ..., δn is the set DE 81 denoted by L behaves, in a sense, exceptionally, tells us that L is an outlier set and S is its associated outlier witness set. The intuition here is better illustrated by referring to the example on stolen credit card numbers given in the Introduction. A default theory ∆ = (D, W ) that describes such an episode might be as follows: n o umber:¬M ultipleIP s , – D = CreditN¬M ultipleIP s Table 1 summarizes previous complexity results, together with the results that constitute the contributions of the present work that will be detailed later in this section. In particular, the complexity of the Outlier Recognition and the Outlier-Witness Recognition problems has been studied in (Angiulli et al., 2008) for general and disjunction-free (DF) default theories and in (Angiulli et al., 2010) for normal unary (NU) and dual normal unary (DNU) default theories. The results there pointed out that the general problem of recognizing an outlier set is always intractable (see Theorem 4.3 in (Angiulli et al., 2008) and Theorem 3.6 in (Angiulli et al., 2010)). As for recognizing an outlier together with its witness, this problem is intractable for general and disjunctionfree default theories (see Theorem 4.6 in (Angiulli et al., 2008)), but can be solved in polynomial time if either NU or DNU default theories are considered. Regarding the latter result, it is interesting to note that, while for both NU and DNU default theories the entailment of a literal can be decided in polynomial time, deciding the entailment in DF default theories is intractable. Motivated by the intractability of the general Outlier Recognition problem on all classes of default theories considered so far, in this paper we take some further steps in analyzing the complexity of outlier detection problems in default logics in order to try to chart the associated tractability frontier. To this end, in the next sections we consider further subsets of the classes already mentioned, referred to as Acyclic Normal Unary and Acyclic Dual Normal Unary theories, and a specific kind of outlier, which we will call Strong Outliers. The latter, loosely speaking are characterized by a stronger relationship with their witness set than in the general case. Then, in Subsection 3.5, the main results of our complexity analysis are overviewed. – W = {CreditN umber, M ultipleIP s}. Here, the credit card number might be stolen, for otherwise it wouldn’t have been used over different continents during the same day. Accordingly, L = {CreditN umber} is an outlier set here, and S = {M ultipleIP s} is the associated witness set. 
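Given an entailment procedure for the fragment at hand (entailment is polynomial for the NU and DNU fragments, as the paper recalls), Definition 2.2 can be checked directly, as the following sketch shows; entails is an assumed oracle passed in as a parameter, not an implementation taken from the paper.

def is_outlier_with_witness(defaults, w, outlier, witness, entails):
    # entails(defaults, w, lit) must answer skeptical entailment:
    # does every extension of (defaults, w) contain lit?
    # Literals are signed integers, so the negation of lit is -lit.
    if not witness or not outlier <= w or not witness <= (w - outlier):
        return False
    w_s  = w - witness             # W_S
    w_sl = w - witness - outlier   # W_{S,L}
    cond1 = all(entails(defaults, w_s, -lit) for lit in witness)       # (D, W_S) |= neg(S)
    cond2 = not all(entails(defaults, w_sl, -lit) for lit in witness)  # (D, W_{S,L}) does not entail neg(S)
    return cond1 and cond2

# For the credit-card theory above (CreditNumber = 1, MultipleIPs = 2),
# L = {1} and S = {2} satisfy both conditions, matching the discussion in the text.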
This reasoning agrees with our intuition that an outlier is, in some sense, abnormal and that the corresponding witness testifies to it. Note that sets of outliers and their corresponding witness sets are selected among those explicitly embodied in the given knowledge base. Hence, outlier detection using default reasoning is essentially a knowledge discovery technique. As such, it can be very useful, to give one example, when applied to information systems for crime prevention and homeland security, because the outlier detection technique can be exploited in order to highlight suspicious individuals and/or events. Several examples for the usefulness of this approach are given in (Angiulli et al., 2008) and in Section 3.3 below. 3 Charting the Tractability Frontier of Outlier Detection Subsection 3.1 recalls two main outlier detection tasks and the related complexity results known so far; Subsection 3.2 introduces acyclic theories and an interesting restriction of the above defined concept of outlier, that we call strong outliers, and finally, Subsection 3.5 presents the plan of a set of results that will allow us to chart the tractability frontier in the context of propositional normal mixed unary default theories. 3.1 3.2 Strong Outliers and Acyclic Theories Next, the definitions of strong outlier set (Section 3.3) and of acyclic default theory (Section 3.4) are given. 3.3 Strong Outliers Recall the definition of outlier set already provided in Section 2.3 (see Definition 2.2). Conditions 1 and 2 of the Definition 2.2 of outlier of Subsection 2.3 can be rephrased as follows: 1. (∀ℓ ∈ S)(D, WS ) |= ¬ℓ, and 2. (∃ℓ ∈ S)(D, WS,L ) 6|= ¬ℓ. In other words, condition 1 states that the negation of every literal ℓ ∈ S must be entailed by (D, WS ) while, according to condition 2, it is sufficient for just one literal ℓ ∈ S to exist whose negation is not entailed by (D, WS,L ). Hence, there is a sort of “asymmetry” between the two conditions, which is the direct consequence of the semantics of the entailment established for sets of literals. It is clear that, at least from a purely syntactic point of view, the relationship between the outlier set and its Outlier Detection Problems The computational complexity of discovering outliers in default theories under various classes of default logics has been previously investigated in (Angiulli et al., 2008). In particular, the two main recognition tasks in outlier detection are the Outlier Recognition and the Outlier-Witness Recognition problems (also called Outlier(L) and Outlier(S)(L), respectively, in (Angiulli et al., 2008)), and are defined as follows: - Outlier Recognition Problem: Given a default theory ∆ = (D, W ) and a set of literals L ⊆ W , is L an outlier set in ∆? - Outlier-Witness Recognition Problem: Given a default theory ∆ = (D, W ) and two sets of literals L ⊂ W and S ⊆ WL , is L an outlier set with witness set S in ∆? 82 Problem Outlier Type General Outlier Recognition Strong Outlier-Witness Recognition General Strong DF Default (D)NU Default ΣP ΣP 3 -c 2 -c ∗ Th.4.3 Th.4.3∗ NP-hard Th.3.13 DP DP -c 2 -c ∗ Th.4.6 Th.4.6∗ NP-hard Th.3.4 NP-c Th.3.6∗∗ NP-c Th.3.13 P Th.3.1∗∗ P Th.3.3 General Default Acyclic (D)NU Default NP-c Th.3.9 P Th.3.14 P Th.3.1∗∗ P Th.3.3 Table 1: Complexity results for outlier detection (∗ =reported in (Angiulli et al., 2008), et al., 2010)). 
witness set can be strengthened by replacing the existential quantifier in Condition 2 with the universal one, thus breaking the aforementioned asymmetry between the two conditions and obtaining the following definition of strong outlier set. Definition 3.1 (Strong Outlier) Let ∆ = (D, W ) be a propositional default theory and let L ⊂ W be a set of literals. If there exists a non-empty subset S of WL such that: ∗∗ =reported in (Angiulli CreditN umber:¬M ultipleIP s ¬M ultipleIP s 1. – Normally, credit card numbers are not used in different continents during the same day; 2. CellU se:M f C MfC – (MfC stands for “mostly from contacts”) - Normally numbers dialed from a cell phone are mostly from the contact list; CellU se:¬QuietT ime ¬QuietT ime – Normally people do not use the phone during their “quiet time” - e.g. late at night; se:¬N ewLocation 4. CellU – Normally cell phones are ¬N ewLocation used in locations in which the device was used in the past. Now, suppose that a pickpocket stole Michelle’s cellphone and purse from her handbag. She came home late and didn’t notice the theft till morning. While she was sleeping, the pickpocket could broadcast her credit card numbers through malicious servers over the Internet and use her cellphone to make expensive phone calls. A sophisticated crime prevention information system could automatically notice exceptional behaviors, and make the following observations: QuietTime – calls are made from the device during abnormal hours; ¬MfC – It is not the case that most of the calls’ destinations are from the phone’s contact list; NewLocation – The device is in a location where it hasn’t been before; MultipleIPs – The credit card number is used in different continents during the last day. Let us now consider the default theory ∆ = (D, W ), where D is the set of Defaults 1-4 introduced above, and W = {CreditN umber, CellU se, ¬M f C, QuietT ime, N ewLocation, M ultipleIP s}. According to the definition of outlier given in (Angiulli et al., 2008) (see Definition 2.2), we get that L = {CreditN umber} is an outlier and S = {¬M f C, N ewLocation, QuietT ime, M ultipleIP s} is a possible witness set for L. This last witness set is also a witness set for the outlier {CellU se}. However, although the observations M f C and QuietT ime are in the witness set of the outlier {CreditN umber}, they do not explain why 3. 1. (∀ℓ ∈ S)(D, WS ) |= ¬ℓ, and 2. (∀ℓ ∈ S)(D, WS,L ) 6|= ¬ℓ then L is a strong outlier set in ∆ and S is a strong outlier witness set for L in ∆. The following proposition is immediately proved: Proposition 3.2 If L is a strong outlier set then L is an outlier set. Proof: Straightforward. Note that, in general the vice versa of Proposition 3.2 does not hold. We study next the impact of restricting attention to strong outliers on the computational complexity of outlier detection problems. Before doing that, we first discuss the significance of the knowledge associated with strong outliers. We begin with an example that is an extension of the credit card scenario presented in the Introduction. Recall that a credit card number is suspected to be stolen since it was used from several IPs in different continents during the same day. The example we give now is related to violating normal behavioral patterns in using a cellular phone. Normally, almost all the numbers that people call are from their contacts list. In addition, for each cell phone user, there are hours of the day during which she normally does not use the phone. 
For example, most users would not use the phone during the night hours. Finally, for a typical cellphone user, there is a list of locations from which she normally calls. The knowledge described above can be summarized using the following defaults: 83 {CreditN umber} is an outlier. Similarly, the observation M ultipleIP s is in the witness set of the outlier {CellU se} but it does not explain why {CellU se} is an outlier. One might suggest that in order to improve the behavior demonstrated above, we should look for a minimal witness set. However, it seems to us counter intuitive to look for a minimal witness set. If we identify an outlier, we would like to have a maximal set of the observations that support our suspicion. In the example above, {¬M f C} is a minimal witness set for the outlier {CellU se}, but its superset {¬M f C, N ewLocation, QuietT ime}, which is also a witness set for the outlier {CellU se}, provides more information. The notion of strong outlier presented above seems to adequately capture, in such scenarios as that depicted in this example situation, the notion of outlier and its witness set. If we use the definition of strong outlier we get that S = {¬M f C, N ewLocation, QuietT ime, M ultipleIP s} is neither a witness set for the outlier {CreditN umber} nor a witness set for the outlier {CellU se}. A witness set for the outlier {CellU se} is, instead, the set {¬M f C, N ewLocation, QuietT ime} or any of its nonempty subsets, while a witness set for the outlier {CreditN umber} is the set {M ultipleIP s}. We now turn to the complexity issues. In order to mark the tractability landscape of the new strong outlier detection problem, we provide two results, the former one regarding the tractability of the outlier-witness recognition problem and the latter one pertaining to its intractability. In order to complete the proof, we note that a singleton witness set is always a strong witness set and, hence, the above reduction immediately applies to strong outliers as well. 3.4 Acyclic NU and DNU theories In this section, acyclic normal mixed unary default theories are defined. We begin by introducing the notions of atomic dependency graph and that of tightness of a NMU default theory. Definition 3.5 (Atomic Dependency Graph) Let ∆ = (D, W ) be a NMU default theory. The atomic dependency graph (V, E) of ∆ is a directed graph such that – V = {l | l is a letter occurring in ∆}, and – E = {(x, y) | letters x and y occur respectively in the prerequisite and the consequent of a default in D}. Definition 3.6 (A set influences a literal) Let ∆ = (D, W ) be an NMU default theory. We say that a set of literals S influences a literal l in ∆ if for some t ∈ S there is a path from letter(t) to letter(l) in the atomic dependency graph of ∆. Definition 3.7 (Tightness of an NMU theory) The tightness c of an NMU default theory is the size c (in terms of number of atoms) of the largest strongly connected component (SCC) of its atomic dependency graph. Intuitively, an acyclic NMU default theory is a theory whose degree of cyclicity is fixed, where its degree of cyclicity is measured by means of its tightness, as formalized in the following definition. Theorem 3.3 Strong Outlier-Witness Recognition on propositional NU default theories is in P. 
Proof: The proof is immediate since the statement follows from the definition of strong outlier set (Definition 3.1) and the fact that the entailment problem on propositional NU default theories is polynomial time solvable (as proved in (Kautz & Selman, 1991; Zohary, 2002)). Definition 3.8 (Acyclic NMU theory) Given a fixed positive integer c, a NMU default theory is said to be (c-)acyclic, if its tightness is not greater than c. Figure 1 in Section 2 highlights the containment relationship among DF, NMU, NU, DNU, and acyclic default theories. For the sake of simplicity, in the following sections we refer to c-acyclic theories simply as acyclic theories. As for the complexity of the Strong Outlier-Witness Recognition problem on propositional DF and general default theories, the following statement holds. Theorem 3.4 Strong Outlier-Witness Recognition on propositional DF default theories is NP-hard. Proof: The statement follows from the reduction employed in Theorem 4.6 of (Angiulli et al., 2008), where it is proved that given two DF default theories ∆1 = (D1 , ∅) and ∆2 = (D2 , ∅), and two letters s1 and s2 , the problem q of deciding whether ((∆1 |= s1 )∧(∆2 |= s2 )) is valid can be reduced to the outlier-witness problem; that is, to the problem of deciding whether L = {s2 } is an outlier having witness set S = {¬s1 } in the theory ∆(q), where ∆(q) = (D(q), W (q)) is the propositional DF de| α:β fault theory with D(q) = { s2 ∧α:β β β ∈ D1 } ∪ D2 and W (q) = {¬s1 , s2 }. Since the former problem is NP-hard, it follows from the reduction that the latter problem is NP-hard as well. 3.5 Main Results It is clear from the definition of outlier that tractable subsets for outlier detection problems necessarily have to be singled out by considering theories for which the entailment operator is tractable. Thus, with the aim of identifying tractable fragments for the outlier recognition problem, we have investigated its complexity on acyclic (dual) normal unary default theories. These theories form a strict subset of normal unary default theories already considered in (Angiulli et al., 2010) (other than being a subset of acyclic NMU theories), for which the entailment problem is indeed polynomially time solvable (proved in (Kautz & Selman, 1991; Zohary, 2002)) and for which the outlier recognition problem is known to be NP-complete (Th. 3.6 of (Angiulli et al., 2010)). 84 procedure can be built that enumerates all the potential witness sets S for the outlier L and checks that S is actually a witness set for L. The formal proofs of Theorem 3.14 and of Lemmas 3.10 and 3.11 are reported in the full paper (See (Angiulli et al., ). It is important to note that Lemma 3.11 cannot actually be exploited to prove the tractability of the Strong Outlier Recognition for NMU theories since for these theories the entailment problem remains intractable. Indeed, we show in the full paper that deciding the entailment is co-NP-complete even for NMU theories with tightness one. This latter result is complemented by the following one. Unexpectedly, it turns out that recognizing general outliers is intractable even in this rather restricted class of default theories, as accounted for in the theorem whose statement is reported below. Theorem 3.9 Outlier Recognition for NU acyclic default theories is NP-complete. Note that the results for NU (DNU, resp.) theories immediately apply to DNU theories, since given an NU (DNU, resp.) 
theory ∆, the dual theory ∆ of ∆ is obtained from ∆ by replacing each literal ℓ in ∆ with ¬ℓ is a DNU (NU, resp.) theory that has the same properties of its dual. Unfortunately, this result confirms that detecting outliers even in default theories as structurally simple as acyclic NU and DNU ones remains inherently intractable. Therefore, in order to chart the tractability frontier for this problem, we looked into the case of strong outliers. To characterize the complexity of this problem, a technical lemma, called the incremental lemma, is needed. The Incremental Lemma provides an interesting monotonicity characterization in NMU theories which is valuable on its own. The statement of the incremental lemma is reported next. Theorem 3.13 Strong Outlier Recognition for NU cyclic default theories is NP-complete. In particular, both Theorems 3.9 and 3.13 make use of a lemma that informally speaking, establishes that, despite the difficulty to encode the conjunction of a set of literals using a NU theory, a CNF formula can nonetheless be evaluated by means of condition 1 of Definition 2.2 applied to an acyclic NU theory, provided that the size of S is polynomial in the number of conjuncts in the formula. The tractability results complements the intractability results since Lemma 3.11 establishes that the size of a minimal strong outlier witness set is upper bounded by the tightness of the NU theory. Recognizing strong outliers under acyclic (dual) normal default theories is the only outlier recognition problem known so far to be tractable; Furthermore, this result is indeed sharp, since switching either to general outliers, to cyclic theories, or to acyclic NMU theories makes the problem intractable. Based on the above results, we designed a polynomial time algorithm for enumerating all strong outliers of bounded size in an acyclic (dual) normal unary default theory. The algorithm can also be employed to enumerate all strong outliers of bounded size in a general NMU theory and, with some minor modifications, all the general outliers and witness pairs of bounded size. However, in this latter case, since the problems at hand are NPhard, its worst case running time will remain exponential, even if from a practical point of view it can benefit from some structural optimizations, based on Lemmas 3.10 and 3.11, which would allow it to reduce the size of the search space. All complexity results presented inthis work, together with those already presented in the literature, are summarized in Table 1, where the problems lying on the tractability frontier are underlined. Lemma 3.10 [The Incremental Lemma] Let (D, W ) be an NMU default theory, q a literal and Sa set of literals such that W ∪ S is consistent and S does not influence q in (D, W ). Then the following hold: Monotonicity of brave reasoning: If q is in some extension of (D, W ) then q is in some extension of (D, W ∪ S). Monotonicity of skeptical reasoning: If q is in every extension of (D, W ) then q is in every extension of (D, W ∪ S). This lemma helps us to state an upper bound on the size of any minimal outlier witness set in an acyclic NMU (and, hence, also NU and DNU) default theory. Lemma 3.11 Let (D, W ) be a consistent NMU default theory and let L be a set of literals in W . If S is a minimal strong outlier witness set for L in (D, W ), then letter(S) is a subset of a SCC in the atomic dependency graph of (D, W ). Taken together, the following tractability result can be proved. 
Theorem 3.12 Strong Outlier Recognition for NU acyclic default theories is in P.

The proof is informally as follows. Since by Lemma 3.11 the size of a minimal strong outlier witness set is upper bounded by c, where c is the tightness of the theory, the number of potential witness sets is polynomially bounded in the size of the theory. Moreover, checking the conditions of Definition 2.2 can be done in polynomial time on NU theories, as the associated entailment is tractable. Based on these properties, a polynomial time

Theorem 3.14 Strong Outlier Recognition for NU acyclic default theories is in P.

Proof: Given a NU default theory (D, W) of tightness c and a set of literals L from W, by Lemma 3.11 a minimal outlier witness set S for L in (D, W) has a size of at most c, where c is the maximum size of an SCC in the atomic dependency graph of (D, W). Thus, the strong outlier recognition problem can be decided by solving the strong outlier-witness recognition problem for each subset S of literals in WL having a size of at most c. Since the latter problem is polynomial time solvable (by Theorem 3.3) and since the number of times it has to be evaluated, that is, O(|W|^c), is polynomial in the size of the input, the depicted procedure solves the strong outlier recognition problem in polynomial time.

Input: ∆ = (D, W) – a NU default theory.
Output: Out – the set of all strong outlier sets L in ∆ s.t. |L| ≤ k.
let C1, . . . , CN be the ordered SCCs in the atomic dependency graph of ∆;
set Out to ∅;
for i = 1..N do
  for all S ⊂ W s.t. letter(S) ⊆ Ci do
    if (∀ℓ ∈ S)(D, WS) |= ¬ℓ then
      for all L ⊆ WS s.t. |L| ≤ k and letter(S) ⊆ (C1 ∪ . . . ∪ Ci) do
        if (∀ℓ ∈ S)(D, WS,L) ⊭ ¬ℓ then
          set Out to Out ∪ {L};
        end if
      end for
    end if
  end for
end for

Figure 2: Algorithm Outlier Enumeration.

4 Discussion and Conclusions

In this paper we have analyzed the tractability border associated with outlier detection in default logics. From Theorems 3.9 and 3.13, it is clear that neither acyclicity nor strongness alone is sufficient to achieve tractability. However, if both constraints are imposed together, the complexity of the outlier recognition problem falls below the tractability frontier, as shown in Theorem 3.14. Overall, the results and arguments reported in this paper indicate that outlier recognition, even in its strong version, remains challenging and difficult on default theories. The tractability results we have provided nonetheless indicate that there are significant cases which can be efficiently implemented. A complete package for performing outlier detection in general default theories might therefore try to attain reasonable efficiency by recognizing such tractable fragments. Techniques by which the outlier detection task in default logics can be rendered practically affordable remain a major subject area for future research.

References

Angiulli, F., Zohary, R. B.-E., & Palopoli, L. Tractable strong outlier identification, submitted.
Angiulli, F., Zohary, R. B.-E., & Palopoli, L. (2008). Outlier detection using default reasoning. Artificial Intelligence, 172 (16-17), 1837–1872.
Angiulli, F., Zohary, R. B.-E., & Palopoli, L. (2010). Outlier detection for simple default theories. Artificial Intelligence, 174 (15), 1247–1253.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Comput. Surv., 41 (3).
Hawkins, D. (1980). Identification of Outliers. Chapman and Hall, London, New York.
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies.
Artif. Intell. Rev., 22 (2), 85–126. Kautz, H. A., & Selman, B. (1991). Hard problems for simple default logics. Artificial Intelligence, 49 (13), 243–279. Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13 (1-2), 81–132. Zhang, A., & Marek, W. (1990). On the classification and existence of structures in default logic. Fundamenta Informaticae, 13 (4), 485–499. Zohary, R. B.-E. (2002). Yet some more complexity results for default logic. Artificial Intelligence, 139 (1), 1–20. To take a step further and present an outlier enumeration algorithm, a known proposition is recalled. Proposition 3.15 (proved in (Kautz & Selman, 1991; Zohary, 2002)) Let ∆ be an NU or a DNU propositional default theory and let L be a set of literals. Deciding whether ∆ |= L is O(n2 ), where n is the size of the theory ∆. Based on the above properties, we are now ready to describe the algorithm Outlier Enumeration which, for a fixed integer k, enumerates in polynomial time all the strong outlier sets of size at most k in an acyclic NU default theory. The algorithm is presented in Figure 2. The SCCs C1 , . . . , CN of the atomic dependency graph of the theory are ordered such that there do not exist Ci and Cj with i < j and two letters l ∈ Ci and q ∈ Cj such that there exists a path from letter(q) to letter(j). By Theorem 3.15, the cost of steps 5 and 7 is O(n2 ). Thus, the cost of the algorithm is O(2c (n/c) · (cn2 + nk cn2 )) = O(2c nk+3 ). Since c and k are fixed, the algorithm enumerates the strong outliers in polynomial time in the size of (D, W ). For example, all the singleton strong outlier sets can be enumerated in time O(n4 ). 86 Topics in Horn Contraction: Supplementary Postulates, Package Contraction, and Forgetting James P. Delgrande School of Computing Science, Simon Fraser University, Burnaby, B.C., Canada V5A 1S6. [email protected] Renata Wassermann Department of Computer Science University of São Paulo 05508-090 São Paulo, Brazil [email protected] Abstract overall framework of Horn contraction based on remainder sets. Previous work in this area has addressed counterparts to the basic AGM postulates; consequently we first examine prospects for extending the approach to counterparts of the supplemental AGM postulates. Second, we address package contraction, in which one may contract by a set of formulas, and the result is that no (contingent) formula in the set is believed. In the AGM approach, for a finite number of formulas this can be accomplished by contracting by the disjunction of the formulas. Since the disjunction of Horn formulas may not be in Horn form, package contraction then becomes an important accessory operation. Last we briefly examine a forgetting operator, in which one effectively reduces the language of discourse. The next section introduces belief change while the third section discusses Horn clause reasoning, and previous work in the area. Section 4 examines the supplementary postulates; Section 5 addresses package contraction; and Section 6 covers forgetting. The last section contains a brief conclusion. In recent years there has been interest in studying belief change, specifically contraction, in Horn knowledge bases. Such work is arguably interesting since Horn clauses have found widespread use in AI; as well, since Horn reasoning is weaker than classical reasoning, this work also sheds light on the foundations of belief change. In this paper, we continue our previous work along this line. 
Our earlier work focussed on defining contraction in terms of weak remainder sets, or maximal subsets of an agent’s belief set that fail to imply a given formula. In this paper, we first examine issues regarding the extended contraction postulates with respect to Horn contraction. Second, we examine package contraction, or contraction by a set of formulas. Last, we consider the closely-related notion of forgetting in Horn clauses. This paper then serves to address remaining major issues concerning Horn contraction based on remainder sets. 1 2 The AGM Framework for Contraction As mentioned, the AGM approach [Alchourrón et al., 1985; Gärdenfors, 1988] is the best-known approach to belief change. Belief states are modelled by deductively-closed sets of sentences, called belief sets, where the underlying logic includes classical propositional logic. Thus a belief set K satisfies the constraint: Introduction Belief change addresses how a rational agent may alter its beliefs in the presence of new information. The best-known approach in this area is the AGM paradigm [Alchourrón et al., 1985; Gärdenfors, 1988], named after the original developers. This work focussed on belief contraction, in which an agent may reduce its stock of beliefs, and belief revision, in which new information is consistently incorporated into its belief corpus. In this paper we continue work in belief contraction in the expressively weaker language of Horn formulas, where a Horn formula is a conjunction of Horn clauses and a Horn clause can be written as a rule in the form a1 ∧a2 ∧· · ·∧an → a for n ≥ 0, and where a, ai (1 ≤ i ≤ n) are atoms. (Thus, expressed in conjunctive normal form, a Horn clause will have at most one positive literal.) Horn contraction has been addressed previously in [Delgrande, 2008; Booth et al., 2009; Delgrande and Wassermann, 2010; Zhuang and Pagnucco, 2010b]. With the exception of the last reference, this work centres on the notion of a remainder set, or maximal subset of a knowledge base that fails to imply a given formula. In this paper we continue work in Horn belief contraction, on a number of aspects; our goal is to essentially complete the If K logically entails φ then φ ∈ K. The most basic operator is called expansion: For belief set K and formula φ, the expansion of K by φ, K + φ, is the deductive closure of K ∪ {φ}. Of more interest are contraction, in which an agent reduces its set of beliefs, and revision, in which an agent consistently incorporates a new belief. These operators can be characterised by two means. First, a set of rationality postulates for a belief change function may be provided; these postulates stipulate constraints that should govern any rational belief change function. Second, specific constructions for a belief change function are given. Representation results can then be given (or at least are highly desirable) showing that a set of rationality postulates exactly captures the operator given by a particular construction. Our focus in this paper is on belief contraction, and so we review these notions with respect to this operator. Informally, 87 For arbitrary theory K and function −̇ from 2L × L to 2L , it proves to be the case that −̇ is a partial meet contraction function iff it satisfies the basic contraction postulates (K −̇1)– (K −̇6). Last, let  be a transitive relation on 2K , and let the selection function be defined by: the contraction of a belief set by a formula is a belief set in which that formula is not believed. 
Formally, a contraction function −̇ is a function from 2L × L to 2L satisfying the following postulates: (K −̇1) K −̇φ is a belief set. (K −̇2) K −̇φ ⊆ K. (K −̇3) If φ 6∈ K, then K −̇φ = K. (K −̇4) If not ⊢ φ, then φ 6∈ K −̇φ. (K −̇5) If φ ∈ K, then K ⊆ (K −̇φ) + φ. (K −̇6) If ⊢ φ ≡ ψ, then K −̇φ = K −̇ψ. (K −̇7) K −̇φ ∩ K −̇ψ ⊆ K −̇(φ ∧ ψ). (K −̇8) If ψ 6∈ K −̇(ψ ∧ φ) then K −̇(φ ∧ ψ) ⊆ K −̇ψ. The first six postulates are called the basic contraction postulates, while the last two are referred to as the supplementary postulates. We have the following informal interpretations of the postulates: contraction yields a belief set (K −̇1) in which the sentence for contraction φ is not believed (unless φ is a tautology) (K −̇4). No new sentences are believed (K −̇2), and if the formula is not originally believed then contraction has no effect (K −̇3). The fifth postulate, the so-called recovery postulate, states that nothing is lost if one contracts and expands by the same sentence. This postulate is controversial; see for example [Hansson, 1999]. The sixth postulate asserts that contraction is independent of how a sentence is expressed. The last two postulates express relations between contracting by conjunctions and contracting by the constituent conjuncts. (K −̇7) says that if a formula is in the result of contracting by each of two formulas then it is in the result of contracting by their conjunction. (K −̇8) says that if a conjunct is not in the result of contracting by a conjunction, then contracting by that conjunct is (using (K −̇7)) the same as contracting by the conjunction. Several constructions have been proposed to characterise belief change. The original construction was in terms of remainder sets, where a remainder set of K with respect to φ is a maximal subset of K that fails to imply φ. Formally: Definition 1 Let K ⊆ L and let φ ∈ L. K ↓ φ is the set of sets of formulas s.t. K ′ ∈ K ↓ φ iff 1. K ′ ⊆ K 2. K ′ 6⊢ φ 3. For any K ′′ s.t. K ′ ⊂ K ′′ ⊆ K, it holds that K ′′ ⊢ φ. X ∈ K ↓ φ is a remainder set of K wrt φ. From a logical point of view, the remainder sets comprise equally-good candidates for a contraction function. Selection functions are introduced to reflect the extra-logical factors that need to be taken into account, to obtain the “best” or most plausible remainder sets. In maxichoice contraction, the selection function determines a single selected remainder set as the contraction. In partial meet contraction, the selection function returns a subset of the remainder sets, the intersection of which constitutes the contraction. Thus if the selection function is denoted by γ(·), then the contraction of K by formula φ can be expressed by \ K −̇φ = γ(K ↓ φ). γ(K ↓ φ) = {K ′ ∈ K ↓ φ | ∀K ′′ ∈ K ↓ φ, K ′′  K ′ }. γ is a transitively relational selection function, and −̇ defined in terms of such a γ is a transitively relational partial meet contraction function. Then we have: Theorem 1 ([Alchourrón et al., 1985]) Let K be a belief set and let −̇ be a function from 2L × L to 2L . Then 1. −̇ is a partial meet contraction function iff it satisfies the contraction postulates (K −̇1)–(K −̇6). 2. −̇ is a transitively relational partial meet contraction function iff it satisfies the contraction postulates (K −̇1)– (K −̇8). The second major construction for contraction functions is called epistemic entrenchment. 
The general idea is that extralogic factors related to contraction are given by an ordering on formulas in the agent’s belief set, reflecting how willing the agent would be to give up a formula. Then a contraction function can be defined in terms of removing less entrenched formulas from the belief set. It is shown in [Gärdenfors and Makinson, 1988] that for logics including classical propositional logic, the two types of constructions, selection functions over remainder sets and epistemic entrenchment orderings, capture the same class of contraction functions; see also [Gärdenfors, 1988] for details. 3 Horn Theories and Horn Contraction 3.1 Preliminary Considerations Let P = {a, b, c, . . . } be a finite set of atoms, or propositional letters, that includes the distinguished atom ⊥. LH is the language of Horn formulas. That is, LH is given by: 1. Every p ∈ P is a Horn clause. 2. a1 ∧ a2 ∧ · · · ∧ an → a, where n ≥ 0, and a, ai (1 ≤ i ≤ n) are atoms, is a Horn clause. 3. Every Horn clause is a Horn formula. 4. If φ and ψ are Horn formulas then so is φ ∧ ψ. For a rule r as in 2 above, head(r) is a, and body(r) is the set {a1 , a2 , . . . , an }. Allowing conjunctions of rules, as given in 4, adds nothing of interest to the expressivity of the language with respect to reasoning. However, it adds to the expressibility of contraction, as we are able to contract by more than a single Horn clause. For convenience, we use ⊤ to stand for some arbitrary tautology. An interpretation of LH is a function from P to {true, f alse} such that ⊥ is assigned f alse. Sentences of LH are true or false in an interpretation according to the standard rules in propositional logic. An interpretation M is a model of a sentence φ (or set of sentences), written M |= φ, just if M makes φ true. M od(φ) is the set of models of formula (or set of formulas) φ; thus M od(⊤) is the set of 88 countermodel a ac b bc ∅ c interpretations of LH . An interpretation is usually identified with the atoms true in that interpretation. Thus, for P = {p, q, r, s} the interpretation {p, q} is that in which p and q are true and r and s are false. For convenience, we also express interpretations by juxtaposition of atoms. Thus the interpretations {{p, q}, {p}, {}} will usually be written as {pq, p, ∅}. A key point is that Horn theories are characterised semantically by the fact that the models of a Horn theory are closed under intersections of positive atoms in an interpretation. That is, Horn theories satisfy the constraint: If M1 , M2 ∈ M od(H) then M1 ∩ M2 ∈ M od(H). This leads to the notion of the characteristic models [Khardon, 1995] of a Horn theory: M is a characteristic model of theory H just if for every M1 , M2 ∈ M od(H), M1 ∩ M2 = M implies that M = M1 or M = M2 . E.g. the theory expressed by {p ∧ q → ⊥, r}) has models {pr, qr, r} and characteristic models {pr, qr}. Since pr ∩ qr = r, r isn’t a characteristic model of H. A Horn formula ψ is entailed by a set of Horn formulas A, A ⊢H ψ, just if any model of A is also a model of ψ. For simplicity, and because we work exclusively with Horn formulas, we drop the subscript and write A ⊢ ψ. If A = {φ} is a singleton set then we just write φ ⊢ ψ. A set of formulas A is inconsistent just if A ⊢ ⊥. We use φ ↔ ψ to represent logical equivalence, that is φ ⊢ ψ and ψ ⊢ φ. induced models a b ∅ resulting KB r.s. 
a ∧ (c → b) a b ∧ (c → a) b (a → b) ∧ (b → a) ∧ (c → a ∧ b) (a → b) ∧ (b → a) Figure 1: Example: Candidates for Horn contraction 3.2 Horn Contraction The last few years have seen work on Horn contraction. Delgrande [2008] addressed maxichoice Horn belief set contraction based on (Horn) remainder sets, called e-remainder sets. The definition of e-remainder sets for Horn clause belief sets is the same as that for a remainder set (Definition 1) but with respect to Horn clauses and Horn derivability. For H a Horn belief set and φ ∈ LH , the set of e-remainder sets with respect to H and φ is denoted by H ↓e φ. Booth, Meyer, and Varzinczak [2009] subsequently investigated this area by considering partial meet contraction, as well as a generalisation of partial-meet, based on the idea of infra-remainder sets and package contraction. In [Booth et al., 2009], an infra remainder sets is defined as follows: Definition 2 For belief sets H and T X, X ∈ H ⇓e φ iff there is some X ′ ∈ H ↓e φ such that ( H ↓e φ) ⊆ X ⊆ X ′ . The elements of H ⇓e φ are the infra e-remainder sets of H with respect to φ. All e-remainder sets are infra e-remainder sets, as is the intersection of any set of e-remainder sets. It proved to be the case that e-remainder sets (and including the infra-remainder sets of [Booth et al., 2009]) are not sufficiently expressive for contraction. The problem arises from the relation between remainder sets on the one hand, and their counterpart in terms of interpretations on the other. In the classical AGM approach, a remainder set is characterised semantically by a minimal superset of the models of the agent’s belief set such that this superset does not entail the formula for contraction. As a result, the models of a remainder set consist of the models of a belief set H together with a countermodel of the formula φ for contraction. With Horn clauses, things are not quite so simple, in that for a countermodel M of φ, there may be no Horn remainder set that has M as a model. To see this, consider the following example, adapted from [Delgrande and Wassermann, 2010]. Example 1 Let P = {a, b, c} and H = Cnh (a ∧ b). Consider candidates for H −̇(a ∧ b). There are three remainder sets, given by the Horn closures of a ∧ (c → b), b ∧ (c → a), and (a → b) ∧ (b → a) ∧ (c → a ∧ b)). Any infra-remainder set contains the closure of (c → a) ∧ (c → b). See Figure 1. In the first line of the table, we have that a (viz. {a, ¬b, ¬c}) is a countermodel of a ∧ b. Adding this model to the models of H yields the models of the formula a ∧ (c → b). This characterises a remainder set, as indicated in the last column. In the second line, we have that ac (viz. Notation: We collect here notation that is used in the paper. Lower-case Greek characters φ, ψ, . . ., possibly subscripted, denote arbitrary formulas of LH . Upper case Roman characters A, B, . . . , possibly subscripted, denote arbitrary sets of formulas. H (H1 , H ′ , etc.) denotes Horn belief sets, so that φ ∈ H iff H ⊢H φ. Cnh (A) is the deductive closure of a Horn formula or set of formulas A under Horn derivability. |φ| is the set of maximal, consistent Horn theories that contain φ. m (and subscripted variants) represents a maximum consistent set of Horn formulas. M (M1 , M ′ , etc.) denote interpretations over some fixed language. M od(A) is the set of models of A. Arbitrary sets of interpretations will be denoted M (M′ etc.). 
Cl∩ (M) is the intersection closure of a set of interpretations M;1 that is, Cl∩ (M) is the least set such that M ⊆ Cl∩ (M) and M1 , M2 ∈ Cl∩ (M) implies that M1 ∩ M2 ∈ Cl∩ (M). Note that M denotes an interpretation expressed as a set of atoms, while m denotes a maximum consistent set of Horn formulas. Thus the logical content is the same, in that an interpretation defines a maximum consistent set of Horn formulas, and vice versa. We retain these two interdefinable notations, since each is useful in the subsequent development. Similar comments apply to M od(φ) vs. |φ|. Since P is finite, a (Horn or propositional logic) belief set may be finitely represented, that is, for X a belief set, there is a formula φ such that Cnh (φ) = X. As well, we make use of the fact that there is a 1-1 correspondence between elements of |φ| and of M od(φ). 1 Recall that an interpretation is represented by the set of atoms true in the interpretation. 89 √ √ √ {a, ¬b, c}) is another countermodel of H. However, since H has a model ab, the intersection of these models, ab ∩ ac = a must also be included; this is the item in the second column. The resulting belief set is characterised by the interpretations M od(H) ∪ {ac, a} = {abc, ab, ac, a}, which is the set of models of formula a, as given in the third column. However, the result isn’t a remainder set, since Cnh (a ∧ (c → b)) is a logically stronger belief set than Cnh (a), which also fails to imply a ∧ b. This result is problematic for both [Delgrande, 2008] and [Booth et al., 2009]. For example, in none of the approaches in these papers is it possible to obtain H −̇e (a ∧ b) ↔ a, nor H −̇e (a ∧ b) ↔ (a ≡ b). But presumably these possibilities are desirable as potential contractions. Thus, in all of the approaches developed in the cited papers, it is not possible to have a contraction wherein a ∧ ¬b ∧ c corresponds to a model of the contraction. This issue was addressed in [Delgrande and Wassermann, 2010]. There the characteristic models of maxichoice candidates for H −̇e φ consist of the characteristic models of H together with a single interpretation from M od(⊤)\M od(φ). The resulting theories, called weak remainder sets, corresponded to the theories given in the third column in Figure 1. in Horn logics. A postulate set is provided and shown to characterise entrenchment-based Horn contraction. The fact that AGM contraction refers to disjunctions of formulas, which in general will not be Horn, is handled by considering Horn strengthenings in their postulate set, which is to say, logically weakest Horn formulas that subsume the disjunction. In contrast to earlier work, their postulate set includes equivalents to the supplemental postulates, and so goes beyond the set of basic postulates. For a given clause ϕ, the set of its Horn strengthenings (ϕ)H is the set such that ψ ∈ (ϕ)H if and only if ψ is a Horn clause and there is no Horn clause ψ ′ such that ψ ⊂ ψ ′ ⊆ ϕ. Of the set of ten postulates given in [Zhuang and Pagnucco, 2010b], five correspond to postulates characterizing partial meet contraction based on weak remainders as defined in [Delgrande and Wassermann, 2010] and two correspond to the supplementary postulates (K −̇7) and (K −̇8). The three new postulates are: (H −̇5) If ψ ∈ H −̇ϕ ∧ ψ then ψ ∈ H −̇ϕ ∧ ψ ∧ δ (H −̇9) If ψ ∈ H \ H −̇ϕ then ∀χ ∈ (ϕ ∨ ψ)H , χ 6∈ H −̇ϕ (H −̇10) If ∀χ ∈ (ϕ ∨ ψ)H , χ 6∈ H −̇ϕ ∧ ψ then ψ 6∈ H \ H −̇ϕ Definition 3 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set, and let φ be a Horn formula. 
H ↓↓e φ is the set of sets of formulas s.t. H ′ ∈ H ↓↓e φ iff ′ H = H ∩ m for some m ∈ |⊤| \ |φ|. H ′ ∈ H ↓↓e φ is a weak remainder set of H and φ. While there has been other work on belief change and Horn logic, such work focussed on specific aspects of the problem, rather than a general characterisation of Horn clause belief change. For example, Eiter and Gottlob [1992] address the complexity of specific approaches to revising knowledge bases, including the case where the knowledge base and formula for revision are conjunctions of Horn clauses. Not unexpectedly, results are generally better in the Horn case. Liberatore [2000] considers the problem of compact representation for revision in the Horn case. Basically, given a knowledge base K and formula φ, both Horn, the main problem addressed is whether the knowledge base, revised according to a given operator, can be expressed by a propositional formula whose size is polynomial with respect to the sizes of K and φ. [Langlois et al., 2008] approaches the study of revising Horn formulas by characterising the existence of a complement of a Horn consequence; such a complement corresponds to the result of a contraction operator. This work may be seen as a specific instance of a general framework developed in [Flouris et al., 2004]. In [Flouris et al., 2004], belief change is studied under a broad notion of logic, where a logic is a set closed under a Tarskian consequence operator. In particular, they give a criterion for the existence of a contraction operator satisfying the basic AGM postulates in terms of decomposability. The following characterizations were given for maxichoice and partial meet Horn contraction: Theorem 2 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set. Then −̇w is an operator of maxichoice Horn contraction based on weak remainders iff −̇w satisfies the following postulates. (H −̇w 1) H −̇w φ is a belief set. (H −̇w 2) If not ⊢ φ, then φ 6∈ H −̇w φ. (H −̇w 3) H −̇w φ ⊆ H. (H −̇w 4) If φ 6∈ H, then H −̇w φ = H. (H −̇w 5) If ⊢ φ then H −̇w φ = H (closure) (success) (inclusion) (vacuity) (failure) (H −̇w 6) If φ ↔ ψ, then H −̇w φ = H −̇w ψ. (extensionality) (H −̇w 7) If H 6= H −̇w φ then ∃β ∈ LH s.t. {φ, β} is inconsistent, H −̇w φ ⊆ Cnh ({β}) and ∀H ′ s.t H −̇w φ ⊂ H ′ ⊆ H we have H ′ 6⊆ Cnh ({β}). (maximality) Theorem 3 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set. Then −̇w is an operator of partial meet Horn contraction based on weak remainders iff −̇w satisfies the postulates (H −̇w 1) – (H −̇w 6) and: 4 Supplementary postulates In this section we investigate how the different proposals for Horn contraction operations behave with respect to the supplementary postulates (K-7) and (K-8). Throughout the section, we consider all selection functions to be transitively relational. First we consider the operation of Horn Partial Meet eContraction as defined in [Delgrande, 2008]. The following example shows that, considering ↓e as defined in [Del- (H −̇pm 7) If β ∈ H\(H−α), then there is some H ′ such that H − α ⊆ H ′ , α 6∈ Cnh (H ′ ) and α ∈ Cnh (H ′ ∪ {β}) (weak relevance) More recently, [Zhuang and Pagnucco, 2010b] have addressed Horn contraction from the point of view of epistemic entrenchment. They compare AGM contraction via epistemic entrenchment in classical propositional logic with contraction 90 grande, 2008], Horn Partial Meet e-Contraction does not satisfy (K −̇7): Proposition 3 PMWR satisfies (H −̇9) PMWR in general does not satisfy (H −̇10), as the following example shows. 
Let H = Cnh ({a, b}). Then H ↓↓e a = {H1 , H3 } and H ↓↓e a ∧ b = {H1 , H2 , H3 }, where H1 = Cnh ({a ∨ ¬b, b ∨ ¬a}), H2 = Cnh ({a}) and H3 = Cnh ({b}). Assuming a selection function based on a transitive relation such that H1 ≺ H2 and H1 ≺ H3 (and H2  H3 and H3  H2 ), we have H − a = H3 and H − a ∧ b = H2 ∩ H3 Since (a∨b)H = {a, b}, we have that for any χ ∈ (a∨b)H , χ 6∈ H − a ∧ b, but b ∈ H − a. Example 2 Let H = Cnh ({a → b, b → c, a → d, d → c}). We then have H ↓e a → c = {H1 , H2 , H3 , H4 } H ↓e b → c = {H5 } where: H1 = Cnh ({a → b, a → d}), H2 = Cnh ({a → b, a ∧ c → d, d → c}), H3 = Cnh ({b → c, a ∧ c → b, a → d}), H4 = Cnh ({a ∧ c → b, b → c, a ∧ c → d, d → c, a ∧ d → b, a ∧ b → d}), and H5 = Cnh ({a → b, a → d, d → c}) Note that the two first elements of H ↓e a → c are subsets of the single element of H ↓e b → c and hence, cannot belong to H ↓e a → c ∧ b → c. In order to finish the comparison between the sets of postulates, it is interesting to note the following: H ↓e a → c ∧ b → c = {H3 , H4 , H5 } Observation 1 (H −̇9) implies weak relevance. If we take a selection function based on a transitive relation between remainder sets that gives priority in the order in which they appear in this example, i.e., H5 ≺ H4 ≺ H3 ≺ H2 ≺ H1 , we will have: 5 Package Contraction In this section we consider Horn package contraction. For belief set H and a set of formulas Φ, the package contraction H −̇p Φ is a form of contraction in which no member of Φ is in H −̇p Φ. As [Booth et al., 2009] points out, this operation is of interest in Horn clause theories given their limited expressivity: in order to contract by φ and ψ simultaneously, one cannot contract by the disjunction φ ∨ ψ, since the disjunction is generally not a Horn clause. Hence, one expresses the contraction of both φ and ψ as the package contraction H −̇p {φ, ψ}. We define the notion of Horn package contraction, and show that it is in fact expressible in terms of maxichoice Horn contraction. H − a → c = H1 H − b → c = H5 H − a → c ∧ b → c = H3 And we see that H − a → c ∩ H − b → c = H1 6⊆ H3 = H −a→c∧b→c The same example shows that the operation does not satisfy (K −̇8): a → c 6∈ H − a → c ∧ b → c, but H − a → c ∧ b → c 6⊆ H − a → c. If there are no further restrictions on the selection function, the same example also shows that contraction based on infra-remainders does not satisfy the supplementary postulates. Note that each remainder set in the example is also an infra-remainder and that the selection function always selects a single element. It suffices to assign all the remaining infraremainders lower priority. Now we can show that the operation of partial meet based on weak remainders (PMWR) has a better behaviour with respect to the supplementary postulates: Definition 4 Let H be a Horn belief set, and let Φ = {φ1 , . . . , φn } be a set of Horn formulas. H ↓↓p Φ is the set of sets of formulas s.t. H ′ ∈ H ↓↓p Φ iff ∃m1 , . . . , mn such that, for 1 ≤ i ≤ n: mi ∈ |⊤| \ |φT i | if 6⊢ φi , otherwise mi = LH n and H ′ = H ∩ i=1 mi . Definition 5 Let γ be a selection function on H such that γ(H ↓↓p Φ) = {H ′ } for some H ′ ∈ H ↓↓p Φ. The (maxichoice) package Horn contraction based on weak remainders is given by: Proposition 1 Partial meet based on weak remainders and a transitive relational selection function satisfies (K −̇7) and (K −̇8). H −̇p Φ = γ(H ↓↓p Φ) We have seen that Epistemic Entrenchment Horn Contraction (EEHC) is characterized by a set of ten postulates. 
In [Zhuang and Pagnucco, 2010a], it is shown that transitively relational PMWR as defined above is more general than EEHC. This means that any operation satisfying their set of 10 postulates (which include (K −̇7) and (K −̇8)) is a PMWR. We have seen that PMWR satisfies (K −̇7) and (K −̇8), hence, in order to compare PMWR and EEHC, we need to know whether PMWR satisfies (H −̇5), (H −̇9) and (H −̇10). if ∅ = 6 Φ ∩ H 6⊆ Cnh (⊤); and H otherwise. The following result relates elements of H ↓↓p Φ to weak remainders. Proposition 4 Let H be a Horn belief set and let Φ = {φ1 , . . . , φn } be a set of Horn formulas where for 1 ≤ i ≤ n we have 6⊢ φi . Then H ′ ∈ T H ↓↓p Φ iff for 1 ≤ i ≤ n there are Hi ∈ H ↓↓e n φi and H ′ = i=1 Hi . Proposition 2 PMWR satisfies (H −̇5). 91 It follows immediately from this that any maxichoice Horn contraction defines a package contraction, and vice versa. (H −̇p 5) H −̇p Φ = H −̇p (Φ \ Cnh (⊤)) Example 3 Consider the Horn belief set H = Cnh ({a, b}) over P = {a, b, c}. We want to determine elements of (H −̇p 6) If φ ↔ ψ, then H −̇p (Φ ∪ {φ}) = H −̇p (Φ ∪ {ψ}) (extensionality) (H −̇p 5b) H −̇p ∅ = H H ↓↓p Φ = Cnh ({a, b}) ↓↓p {a, b}. Φ′ = (Φ \ Cnh (⊤)) ∩ H = {φ1 , . . . , φn } there is {β1 , . . . , βn } s.t. {φi , βi } ⊢ ⊥ and H −̇p Φ ⊆ Cnh (βi ) for 1 ≤ i ≤ n; and ∀H ′ s.t H −̇p Φ ⊂ H ′ ⊆ H, ∃βi s.t. H ′ 6⊆ Cnh (βi ). (maximality) The following result, which shows that package contraction generalises maxichoice contraction, is not surprising, nor is the next result, which shows that a maxichoice contraction defines a package contraction. Proposition 5 Let −̇p be an operator of maxichoice Horn package contraction. Then H −̇φ = H −̇p Φ for Φ = {φ} is an operator of maxichoice Horn contraction based on weak remainders. Proposition 6 Let −̇ be an operator of maxichoice Horn contraction based on weak remainders. Then \ H −̇p Φ = H −̇φ 1. There are 4 countermodels of a, given by: A = {bc, b, c, ∅}. Thus there are four weak remainders corresponding to these countermodels, and so four candidates for maxichoice Horn contraction by a. 2. Similarly there are 4 countermodels of b: B = {ac, a, c, ∅}. 3. Members of H ↓↓p Φ are given by Cl∩ (M od(H) ∪ {x} ∪ {y}) for x ∈ A and y ∈ B. For example, for x = bc, y = ∅, we have that Cl∩ (M od(H)∪ {x} ∪ {y}) = {abc, ab, bc, b, ∅}, which is the set of models of (c → b) ∧ (a → b). For x = bc, y = ac, we have that Cl∩ (M od(H) ∪ {x} ∪ {y}) = Cnh (⊤); this holds for no other choice of x and y. φ∈Φ is an operator of maxichoice Horn package contraction. As described, a characteristic of maxichoice package contraction is that there are a large number of members of H ↓↓p Φ, some of which may be quite weak logically. Of course, a similar point can be made about maxichoice contraction, but in the case of package contraction we can eliminate some candidates via pragmatic concerns. We have that a package contraction H −̇p Φ is a belief set H ′ ∈ H ↓↓p Φ such that, informally, models of H ′ contain a countermodel for each φi ∈ Φ along with models of H. In general, some interpretations will be countermodels of more than one member of Φ, and so pragmatically, one canTselect minimal sets of countermodels. Hence in the case that i (M od(⊤)T\ M od(φi )) 6= ∅, a single countermodel, that is some m ∈ i (M od(⊤) \ M od(φi )), would be sufficient to yield T a package contraction. Now, it may be that i (M od(⊤) \ M od(φi )) is empty. A simple example illustrates this case: Example 4 Let H = Cnh (a → b, b → a) where P = {a, b}. 
Then H −̇p {a → b, b → a} = Cnh (⊤). That is, the sole countermodel of a → b is {a} while that of b → a is {b}. The intersection closure of these interpretations with those of H is {ab, a, b, ∅} = M od(⊤). Informally then one can select a minimal set of models such that a countermodel of each member of Φ is in the set. These considerations yield the following definition: Definition 6 Let H be a Horn belief set, and let Φ = {φ1 , . . . , φn } be a set of Horn formulas. HS(Φ), the set of (minimal) hitting sets of interpretations with respect to Φ, is defined by: S ∈ HS(Φ) iff What this example indicates informally is that there is a great deal of scope with respect to candidates for package contraction. To some extent, such a combinatorial explosion of possibilities is to be expected, given the fact that a formula will in general have a large number of countermodels, and that this is compounded by the fact that each formula in a package contraction does not hold in the result. However, it can also be noted that some candidate package contractions appear to be excessively weak; for example it would be quite drastic to have Cnh (⊤) as the result of such a contraction. As well, some candidate package contractions appear to contain redundancies, in that a selected countermodel of a may also be a countermodel of b, in which case there seems to be no reason to allow the possible incorporation of a separate countermodel of b. Consequently, we also consider versions of package contraction that in some sense yield a maximal belief set. However, first we provide results regarding package contraction. We have the following result: Theorem 4 Let H be a Horn belief set. Then if −̇p is an operator of maxichoice Horn package contraction based on weak remainders then −̇p satisfies the following postulates. (H −̇p 2) For φ ∈ Φ, if not ⊢ φ, then φ 6∈ H −̇p Φ (H −̇p 3) H −̇p Φ ⊆ H (H −̇p 4) H −̇p Φ = H −̇p (H ∩ Φ) (triviality) (H −̇p 7) If H 6= H −̇p Φ then for It proves to be the case that there are a total of 14 elements in H ↓↓p Φ and so 14 candidate package contractions. We have the following. (H −̇p 1) H −̇p Φ is a belief set. (failure) (closure) (success) (inclusion) (vacuity) 92 V V ( φ∈S1 φ)∨( φ∈S2 φ). Of course, all such sets will be guaranteed to be finite. We introduce the following notation for this section, where S is a set of Horn clauses. 1. S ⊆ |⊤| 2. S ∩ (|⊤| \ |φi |) 6= ∅ for 1 ≤ i ≤ n. 3. For S ′ ⊂ S, S ′ ∩ (|⊤| \ |φi |) = ∅ for some 1 ≤ i ≤ n. Thus we look for sets of sets of interpretations, elements of such a set S are interpretations represented as maximum consistent sets of formulas (Condition 1). As well, this set S contains a countermodel for each member of Φ (2) and moreover S is a subset-minimal set that satisfies these conditions (3). The notion of a hitting set is not new; see [Garey and Johnson, 1979] and see [Reiter, 1987] for an early use in AI. Thus S ∈ HS(Φ) corresponds to a minimal set of countermodels of members of Φ. • S[p/t] is the result of uniformly substituting t ∈ {⊥, ⊤} for atom p in S. • S↓p = {φ ∈ S | φ does not mention p} Assume without loss of generality that for φ ∈ S, that head (φ) 6∈ body(φ). The following definition adapts the standard definition for forgetting to Horn clauses. Definition 8 For set of Horn clauses S and atom p, define f orget(S, p) to be S[p/⊥] ∨ S[p/⊤]. Definition 7 H ↓↓p Φ is the set T of sets of formulas s.t. H ′ ∈ H ↓↓p Φ iff H ′ = H ∩ m∈S for some S ∈ HS(Φ). 
Cnh (c → a, c → b), This is not immediately useful for us, since a disjunction is generally not Horn. However, the next result shows that this definition nonetheless leads to a Horn-definable forget operator. Recall that for clauses c1 and c2 , expressed as sets of literals where p ∈ c1 and ¬p ∈ c2 , that the resolvent of c1 and c2 is the clause (c1 \ {p}) ∪ (c2 \ {¬p}). As well, recall that if c1 and c2 are Horn, then so is their resolvent. In the following, Res(S, p) is the set of Horn clauses obtained from S by carrying out all possible resolutions with respect to p. Cnh (a → b, b → a, c → a, c → b) }. Definition 9 Let S be a set of Horn clauses and p an atom. Define ′ ′ Proposition 7 For H ∈ H ↓↓p Φ, H is an operator of maxichoice Horn package contraction. Example 5 Consider where H = Cnh (a, b), P = {a, b, c}. 1. Let Φ = {a, b}. We obtain that H ↓↓p Φ = { Cnh (⊤), Cnh (c → a), Cnh (c → b), Cnh (a → b, b → a), Compare this with Example 3, where we have 14 candidate package contractions. Res(S, p) = {φ | ∃φ1 , φ2 ∈ S s.t. p ∈ body(φ1 ), p = head (φ2 ), and φ = (body(φ1 ) \ {p} ∪ body(φ2 )) → head (φ1 )} 2. Let Φ = {a, a ∧ b}. We obtain that H ↓↓p Φ = { Cnh (b), Cnh (b ∧ (c → a)), Theorem 5 f orget(S, p) ↔ S↓p ∪ Res(S, p). Cnh (a → b, b → a), Corollary 1 Let S be a set of Horn clauses and p an atom. Then f orget(S, p) is equivalent to a set of Horn clauses. Cnh (a → b, b → a, c → a, c → b) }. Corollary 2 Let S1 and S2 be sets of Horn clauses and p an atom. Then S1 ↔ S2 implies that f orget(S1 , p) ↔ f orget(S2 , p). Any set of formulas that satisfies Definition 7 clearly also satisfies Definition 5. One can further restrict the set of candidate package contractions by replacing S ′ ⊂ S by |S ′ | < |S| in the third part of Definition 7. As well, of course, one could continue in the obvious fashions to define a notion of partial meet Horn package contraction. 6 There are several points of interest about these results. The theorem is expressed in terms of arbitrary sets of Horn clauses, and not just deductively-closed Horn belief sets. Hence the second corollary states a principle of irrelevance of syntax for the case for forgetting for belief bases. As well, the expression S↓p ∪ Res(S, p) is readily computable, and so the theorem in fact provides a means of computing f orget. Further, the approach clearly iterates for more than one atom. We obtain the additional result: Forgetting in Horn Formulas This section examines another means of removing beliefs from an agent’s belief set, that of forgetting [Lin and Reiter, 1994; Lang and Marquis, 2002]. Forgetting is an operation on belief sets and atoms of the language; the result of forgetting an atom can be regarded as decreasing the language by that atom. In general it will be easier to work with a set of Horn clauses, rather than Horn formulas. Since there is no confusion, we will freely switch between sets of Horn clauses and the corresponding Horn formula comprising the conjunction of clauses in the set. Thus any time that a set appears as an element in a formula, it can be understood as standing for the conjunction of members of the set. Thus for sets of clauses S1 and S2 , S1 ∨ S2 will stand for the formula Corollary 3 f orget(f orget(S, p), q) ≡ f orget(f orget(S, q), p). (In fact, this is an easy consequence of the definition of forget.) Given this, we can define for set of atoms A, f orget(S, A) = f orget(f orget(S, a), A \ {a}) where a ∈ A. On the other hand, forgetting an atom may result in a quadratic blowup of the knowledge base. 
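Theorem 5 makes forgetting directly computable: keep the clauses that do not mention p and add all resolvents on p. The following Python sketch illustrates this S↓p ∪ Res(S, p) construction. The encoding of a Horn clause a1 ∧ ... ∧ an → a as a pair (body_atoms, head_atom) is an assumption of this sketch, as is the worked example at the end; neither is taken from the paper.

def forget(clauses, p):
    # clauses: set of Horn clauses, each encoded as (frozenset_of_body_atoms, head_atom).
    # Theorem 5: forget(S, p) is equivalent to S↓p (the clauses of S not
    # mentioning p) together with Res(S, p) (all resolvents of S on p).
    def mentions_p(clause):
        body, head = clause
        return head == p or p in body
    s_restricted = {c for c in clauses if not mentions_p(c)}
    resolvents = set()
    for body1, head1 in clauses:
        if p not in body1:
            continue
        for body2, head2 in clauses:
            if head2 == p:
                # resolvent on p: (body1 \ {p}) ∪ body2 → head1
                resolvents.add((frozenset((body1 - {p}) | body2), head1))
    return s_restricted | resolvents

# Example (ours): forgetting q in {p → q, q → r} yields {p → r}:
#   forget({(frozenset({'p'}), 'q'), (frozenset({'q'}), 'r')}, 'q')
#     == {(frozenset({'p'}), 'r')}

Iterating the call, as in forget(forget(S, p), q), gives forgetting of a set of atoms in line with Corollary 3, although, as noted above, each step may grow the clause set quadratically.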
Finally, it might seem that the approach allows for the definition of a revision operator – and a procedure for computing 93 a revision – by using something akin to the Levi Identity. Let A(φ) be the set of atoms appearing in (formula or set of formulas) φ. Then: Preliminary results and applications. In Proceedings of the 10th International Workshop on Non-Monotonic Reasoning (NMR-04), pages 171–179, Whistler BC, Canada, June 2004. [Gärdenfors and Makinson, 1988] P. Gärdenfors and D. Makinson. Revisions of knowledge systems using epistemic entrenchment. In Proc. Second Theoretical Aspects of Reasoning About Knowledge Conference, pages 83–95, Monterey, Ca., 1988. [Gärdenfors, 1988] P. Gärdenfors. Knowledge in Flux: Modelling the Dynamics of Epistemic States. The MIT Press, Cambridge, MA, 1988. [Garey and Johnson, 1979] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., New York, 1979. [Hansson, 1999] S. O. Hansson. A Textbook of Belief Dynamics. Applied Logic Series. Kluwer Academic Publishers, 1999. [Khardon, 1995] Roni Khardon. Translating between Horn representations and their characteristic models. Journal of Artificial Intelligence Research, 3:349–372, 1995. [Lang and Marquis, 2002] J. Lang and P. Marquis. Resolving inconsistencies by variable forgetting. In Proceedings of the Eighth International Conference on the Principles of Knowledge Representation and Reasoning, pages 239– 250, San Francisco, 2002. Morgan Kaufmann. [Langlois et al., 2008] M. Langlois, R.H. Sloan, B. Szörényi, and G. Turán. Horn complements: Towards Horn-to-Horn belief revision. In Proceedings of the AAAI National Conference on Artificial Intelligence, Chicago, Il, July 2008. [Liberatore, 2000] Paolo Liberatore. Compilability and compact representations of revision of Horn knowledge bases. ACM Transactions on Computational Logic, 1(1):131– 161, 2000. [Lin and Reiter, 1994] F. Lin and R. Reiter. Forget it! In AAAI Fall Symposium on Relevance, New Orleans, November 1994. [Reiter, 1987] R. Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32(1):57–96, 1987. [Zhuang and Pagnucco, 2010a] Z. Zhuang and Maurice Pagnucco. Two methods for constructing horn contractions. AI 2010: Advances in Artificial Intelligence, pages 72–81, 2010. [Zhuang and Pagnucco, 2010b] Zhi Qiang Zhuang and Maurice Pagnucco. Horn contraction via epistemic entrenchment. In Tomi Janhunen and Ilkka Niemelä, editors, Logics in Artificial Intelligence - 12th European Conference (JELIA 2010), volume 6341 of Lecture Notes in Artificial Intelligence, pages 339–351. Springer Verlag, 2010. def F Revise(S, φ) = f orget(S, A(S) ∩ A(φ)) + φ. In fact, this does yield a revision operator, but an operator that in general is far too drastic to be useful. To see this, consider a taxonomic knowledge base which asserts that whales are fish, whale → f ish. Of course, whales are mammals, but in using the above definition to repair the knowledge base, one would first forget all knowledge involving whales. Such an example doesn’t demonstrate that there are no reasonable revision operators definable via forget, but it does show that a naı̈ve approach is problematic. 7 Conclusions This paper has collected various results concerning Horn belief set contraction. Earlier work has established a general framework for maxichoice and partial meet Horn contraction. The present paper then extends this work in various ways. 
We examined issues related to supplementary postulates, developed an approach to package contraction, and explored the related notion of forgetting. For future work, it would be interesting to investigate relationships between remainderbased and entrenchment-based Horn contraction, as well as to explore connections to constructions for (Horn) belief revision. References [Alchourrón et al., 1985] C.E. Alchourrón, P. Gärdenfors, and D. Makinson. On the logic of theory change: Partial meet functions for contraction and revision. Journal of Symbolic Logic, 50(2):510–530, 1985. [Booth et al., 2009] Richard Booth, Thomas Meyer, and Ivan José Varzinczak. Next steps in propositional Horn contraction. In Proceedings of the International Joint Conference on Artificial Intelligence, Pasadena, CA, 2009. [Delgrande and Wassermann, 2010] James Delgrande and Renata Wassermann. Horn clause contraction functions: Belief set and belief base approaches. In Fangzhen Lin and Uli Sattler, editors, Proceedings of the Twelfth International Conference on the Principles of Knowledge Representation and Reasoning, pages 143–152, Toronto, 2010. AAAI Press. [Delgrande, 2008] J.P. Delgrande. Horn clause belief change: Contraction functions. In Gerhard Brewka and Jérôme Lang, editors, Proceedings of the Eleventh International Conference on the Principles of Knowledge Representation and Reasoning, pages 156–165, Sydney, Australia, 2008. AAAI Press. [Eiter and Gottlob, 1992] T. Eiter and G. Gottlob. On the complexity of propositional knowledge base revision, updates, and counterfactuals. Artificial Intelligence, 57(23):227–270, 1992. [Flouris et al., 2004] Giorgos Flouris, Dimitris Plexousakis, and Grigoris Antoniou. Generalizing the AGM postulates: 94 A Selective Semantics for Logic Programs with Preferences Alfredo Gabaldon CENTRIA – Center for Artificial Intelligence Universidade Nova de Lisboa [email protected] Abstract standard answer sets of the logic program. Another characteristic of the selective approaches is that most of them extend logic programs with preferences without increasing computational complexity. In this work we focus on a selective approach and propose a new semantics for answer set programming with preferences. The main motivation for introducing a new semantics is that all of the existing selective approaches seem to be too strong in the sense that there are programs that possess answer sets but not preferred answer sets. At the same time, the same approaches seem to be weak in the sense that there are programs that possess multiple answers sets that cannot be distinguished apart even by a full prioritization. Our proposed semantics yields at most one preferred answer set when a complete set of priorities is specified. Moreover, for a large class of propositional logic programs (called negative-cyclefree and head-consistent) that are guaranteed to have answer sets, we show that a preferred answer set always exists under our proposed semantics. In the case of logic programs without classical negation, this is the most general known class of programs guaranteed to have answer sets [Baral, 2003]. Our starting point of reference is the preferred answer set semantics introduced by Brewka and Eiter [1999] (for brevity, we will refer to this semantics as the BE semantics). 
Among the selective semantics within the NP complexity class, the BE semantics is the least restrictive in the sense that, for a given logic program with a fixed set of preferences, it selects a collection of preferred answer sets that is a superset of those selected by the other approaches, as shown in [Schaub and Wang, 2001]. In other words, if a program does not have preferred answer sets under the BE semantics, neither does it have any preferred answer sets under the other selective semantics. Since our aim is a semantics that always assigns a preferred answer set to a large class of logic programs, the BE semantics seems to be a good point of reference and comparison. Agents in complex domains need to be able to make decisions even if they lack complete knowledge about the state of their environment. One approach that has been fairly successful is to use logic programming with answer set semantics (ASP) to represent the beliefs of the agent and solve various reasoning problems such as planning. The ASP approach has been extended with preferences and several semantics have been proposed for selecting preferred answer sets of a logic program. Among the available semantics, one proposed by Brewka and Eiter has been shown to be more permissive than others in the sense of allowing the selection of a larger number of answer sets as the preferred ones. Although the semantics is permissive enough to allow multiple preferred answer sets even for some fully prioritized programs, there are on the other hand programs that have answer sets but not preferred ones. We consider a semantics that selects at most one answer set as the preferred one for fully prioritized (propositional) programs, and show that programs in a large class guaranteed to have an answer set (negative-cycle-free, headconsistent) are also guaranteed to have a preferred answer set. 1 Introduction Reasoning with preferences is widely recognized as an important problem. Many knowledge representation formalisms have been extended to represent preferences as priorities between propositions in a knowledge base. In particular, prioritized versions of logic programming with answer set semantics [Gelfond and Lifschitz, 1991; Gelfond, 2008] have been studied and various semantics have been proposed [Sakama and Inoue, 1996; Gelfond and Son, 1997; Zhang and Foo, 1997; Brewka and Eiter, 1999; Delgrande et al., 2000; Wang et al., 2000]. Some of these approaches [Brewka and Eiter, 1999; Delgrande et al., 2000; Wang et al., 2000] are classified as being selective in the sense that the preferences are effectively used as a selection mechanism for choosing among the answer sets of a logic program. That is, the preferred answer sets are always chosen from the collection of 2 Prioritized Extended Logic Programs In this section we give an overview of the answer set semantics and of the BE preferred answer set semantics. 2.1 Extended Logic Programs We start with the syntax of Extended Logic Programs (elps) [Gelfond and Lifschitz, 1991; Gelfond, 2008]. A literal is an 95 atom p or its negation ¬p. Literals p and ¬p are called contrary and l denotes the literal contrary to l. If the language of an elp is not explicitly defined then it is understood to consist of all the atoms that appear in the program. Lit denotes the set of all literals in the language of an elp. A rule r is an expression of the form Let us next look at the BE preferred answer set semantics. We will refer to preferred answers sets under the BE semantics as BE-preferred answer sets. 
These definitions are simplified versions of those in [Brewka and Eiter, 1999] as we focus in this work on propositional programs. Definition 5. Let P = (P, <) be a fully prioritized elp where P is a set of n prerequisite-free rules and let S be a set of literals. The sequence of sets S0 , S1 , . . . , Sn is defined as follows: S0 = ∅ and for 0 < i ≤ n, l0 ← l1 , . . . , ln , not ln+1 , . . . , not lm (1) where l0 , . . . , lm are literals and not denotes negation-asfailure (or default negation). Expressions not l are called extended literals. For a rule r of the form (1), the head, l0 , is denoted by head(r), the set of literals {l1 , . . . , ln } by body + (r) and the set of literals {ln+1 , . . . , lm } by body − (r). An extended logic program is a finite set of rules. A set of literals S ⊆ Lit is called a partial interpretation. A rule r is said to be defeated by a literal l if l ∈ body − (r). A partial interpretation S defeats a rule r if there is a literal l ∈ S that defeats r. S satisfies the body of a rule r if body + (r) ⊆ S and S does not defeat r. S satisfies r if head(r) ∈ S or S does not satisfy the body of r. The answer sets of an elp whose rules do not contain not is defined as follows. Definition 1. Let P be an elp without default negation. A partial interpretation S is an answer set of P if S is minimal (wrt set inclusion) among the partial interpretations that satisfy the rules of P , and S is logically closed, i.e. if S contains contrary literals then S is Lit. For arbitrary programs, the definition is extended by introducing the Gelfond-Lifschitz reduct: let S be a partial interpretation and P be an elp. The reduct, P S , of P relative to S is the set of rules l0 ← l1 , . . . , ln for all rules (1) in P that are not defeated by S. Definition 2. (Answer Set) A partial interpretation S is an answer set of an elp P if S is an answer set of P S . Si =  Si−1 ,         Si−1 ∪ {head(ri )}, if ri is defeated by Si−1 or head(ri ) ∈ S and ri is defeated by S, otherwise. The set CP (S) is defined to be the smallest set of literals such that Sn ⊆ CP (S) and CP (S) is logically closed (consistent or equal to Lit). Definition 6. (BE-preferred Answer Sets) Let P = (P, <) be a fully prioritized elp with prerequisite-free P and let A be an answer set of P . Then A is the BE-preferred answer set of P iff A = CP (A). As the definition suggests, a fully prioritized, prerequisitefree elp has at most one BE-preferred answer set. For non prerequisite-free prioritized elps, a transformation is applied similar to the Gelfond-Lifschitz reduct but that produces rules without prerequisites. The precise definition is as follows. Definition 7. Let P = (P, <) be a fully prioritized elp and let S be a set of literals. Define SP = (SP, S<) to be the fully prioritized elp such that SP is the set of rules obtained from P by 1. deleting every rule r ∈ P s.t. body + (r) 6⊆ S, and 2. deleting body + (r) from every remaining rule r; and S < is inherited from < by the mapping f : SP 7→ P where f (r′ ) is the first rule in P wrt < such that r′ results from r by step (2) above. In other words, for every r1 , r2 ∈ P , r1′ S< r2′ iff f (r1′ ) < f (r2′ ). Definition 8. A set of literals A is a BE-preferred answer set of a fully prioritized elp P = (P, <) iff A is a BE-preferred answer set of AP. In the next section we present some examples illustrating this semantics, including programs without preferred answer sets and fully prioritized programs with multiple ones. 
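Since Definitions 1 and 2 are used throughout the rest of the paper, a small executable rendering of the reduct construction may be helpful. The Python sketch below encodes a ground rule as a triple (head, positive body, default-negated body) over string literals, with classical negation written as a leading '-'; these encoding choices and the brute-force closure test are ours, not part of the formalism.

def contrary(l):
    # The literal contrary to l, writing classical negation as a leading '-'.
    return l[1:] if l.startswith('-') else '-' + l

def reduct(rules, s):
    # Gelfond-Lifschitz reduct relative to S: drop every rule defeated by S and
    # delete the default-negated parts of the remaining rules.
    return [(head, pos) for (head, pos, neg) in rules if not (neg & s)]

def least_closed_set(positive_rules, lit):
    # Smallest partial interpretation satisfying a not-free program, obtained by
    # forward chaining; if contrary literals are derived, the result is Lit.
    closure, changed = set(), True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= closure and head not in closure:
                closure.add(head)
                changed = True
    return set(lit) if any(contrary(l) in closure for l in closure) else closure

def is_answer_set(rules, s, lit):
    # Definition 2: S is an answer set of P iff S is an answer set of P^S.
    return least_closed_set(reduct(rules, s), lit) == s

# For instance, for the two-rule program { c <- not b,  b <- not a }:
#   rules = [('c', set(), {'b'}), ('b', set(), {'a'})]
#   is_answer_set(rules, {'b'}, {'a', 'b', 'c', '-a', '-b', '-c'})   # True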
2.2 Prioritized extended logic programs We now turn to prioritized elps, adapting the definitions from [Brewka and Eiter, 1999]. Let us start with the syntax. An elp rule r of the form (1) is called prerequisite-free if body + (r) = ∅ and an elp P is prerequisite-free if all its rules are prerequisite-free. Definition 3. A prioritized elp is a pair P = (P, <) where P is an elp and < is a strict partial order on the rules of P . The answer sets of a prioritized elp P = (P, <) are defined as the answer sets of P and are denoted by AS(P). Definition 4. A full prioritization of a prioritized elp P is any pair P ′ = (P, <′ ) where <′ is a total order on P that is compatible with <, i.e., r1 < r2 implies r1 <′ r2 for all r1 , r2 in P . The total ordering in a fully prioritized elp induces an enumeration r1 , r2 , . . . of its rules with r1 having the highest priority. Throughout the paper, we use such an enumeration in examples and write 3 Limitations of existing semantics As we discussed in the introduction, the motivation for proposing a new semantics for prioritized logic programs is twofold. First, there are elps that while containing no discernible conflict or contradiction and indeed possessing answer sets, have no BE-preferred answer sets and therefore no preferred answer sets under the other semantics which are more restrictive. Second, there are programs that even by providing a full prioritization of its rules, it is not possible to ri : l ← l1 , . . . , ln , not ln+1 , . . . , not lm to denote the ith rule in such an enumeration. 96 4 select only one of the answer sets as the preferred one. Most of the following examples are from [Brewka and Eiter, 1999]. Recall that ri means the rule is the ith rule in the enumeration of the rules by priority. Example 1. Consider the program P1 with rules A new semantics for prioritized elps Our proposed new semantics is intuitively based on the view, following Brewka and Eiter, that priorities are used to resolve conflicts. Intuitively, it is also based on the idea of taking conflicts between rules somewhat more literally by appealing to a notion of “attack” that is to some degree inspired by argument attacks in argumentation. Here, by an attack of a rule on another we simply mean that if the attacking rule fires, it will defeat the other one. We then consider rules to be in conflict when they attack each other, as in the program: r1 : c ← not b. r2 : b ← not a. This program has one answer set, A = {b}. Since c 6∈ A nor is r1 defeated by ∅, c ∈ CP1 (A). Therefore this program has no BE-preferred answer sets. Brewka and Eiter’s approach to preferences is based on the notion (which we follow as well) that preferences are introduced in order to “solve potential conflicts...to conclude more than in standard answer semantics.” [Brewka and Eiter, 1999]. Since the rules in the above program show no apparent conflict between them and in fact the program has only one answer set, it seems reasonable that it should have a preferred answer set. a ← not b. b ← not a. But these attacks can be indirect, through a chain of other rules, as in the program: a ← c. b ← not a. c ← not b. The following example shows this shortcoming even more directly, since one of the rules is a fact, i.e. does not involve defaults at all. Example 2. Consider the program P2 with rules Here, the second rule attacks the first indirectly through the third rule. 
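Before moving on, it may help to see the test of Definitions 5 and 6 spelled out executably. The following Python sketch is our reading of those definitions for the fully prioritized, prerequisite-free case, using the same string-literal encoding as the earlier sketch and giving rules as (head, default-negated body) pairs listed from highest to lowest priority; the prerequisite-eliminating reduction of Definitions 7 and 8 is omitted. On Example 1 it reproduces the discussion above: the single answer set {b} fails the condition A = CP(A).

def contrary(l):
    # As in the earlier sketch: classical negation written as a leading '-'.
    return l[1:] if l.startswith('-') else '-' + l

def c_operator(ordered_rules, a, lit):
    # Definition 5, prerequisite-free case.
    s = set()
    for head, neg in ordered_rules:
        defeated_by_s = bool(neg & s)
        defeated_by_a = bool(neg & a)
        if defeated_by_s or (head in a and defeated_by_a):
            continue                     # S_i = S_{i-1}
        s.add(head)                      # S_i = S_{i-1} ∪ {head(r_i)}
    # C_P(A): the logical closure of S_n (Lit if contrary literals were derived).
    return set(lit) if any(contrary(l) in s for l in s) else s

def is_be_preferred(ordered_rules, a, lit):
    # Definition 6: an answer set A is BE-preferred iff A = C_P(A).
    return c_operator(ordered_rules, a, lit) == a

# Example 1:  r1: c <- not b  (higher priority),  r2: b <- not a.
# The only answer set is {b}, but C_P({b}) = {b, c}, so:
#   is_be_preferred([('c', {'b'}), ('b', {'a'})], {'b'},
#                   {'a', 'b', 'c', '-a', '-b', '-c'})    # False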
In order to simplify the development of our semantics for prioritized logic programs, we appeal to a well know unfolding operation which would transform the above program into the program: r1 : a ← not b. r2 : b. a ← not b. b ← not a. c ← not b. This program has one answer set, A = {b}. By a similar argument as in the previous example, we have that a ∈ CP2 (A) and so the program has no BE-preferred answer sets. The above examples seem to show that the semantics is in some sense too strong (and so are other proposed selective semantics which have been shown to be even stronger). On the other hand, as already mentioned, in some cases this semantics assigns multiple preferred answer sets to programs that are fully prioritized. In other words, under the BE-preferred answer set semantics there are cases where it is not possible to “solve potential conflicts” completely even with a full prioritization. Consider the following example. Example 3. Consider the program P3 with rules The formal definitions follow. Definition 9. (Unfolding [Aravindan and Dung, 1995]) Let Pi be an elp and r be a rule in Pi of the form H ← L, Γ where L is a literal different from H and Γ is the rest of the rule’s body. Suppose that r1 , . . . , rk are all the rules in Pi such that each rj is of the form L ← Γj such that L 6∈ body + (rj ). Then Pi+1 = (Pi \ {r}) ∪ {H ← Γj , Γ : 1 ≤ j ≤ k}. This operation is called unfolding r in Pi (or unfolding in general), r is called the unfolded rule, and L is called the selected literal in the unfolding. r1 : b ← not ¬b, a. r2 : c ← not b. r3 : a ← not c. The answer set semantics is one of the logic programming semantics that satisfies the Generalized Principle of Partial Evaluation [Aravindan and Dung, 1995; Dix, 1995; Brass and Dix, 1999], which means that the unfolding transformation above results in a program that has exactly the same answer sets as the original program. This fully prioritized elp has two answer sets: A1 = {c} and A2 = {a, b}. They are both BE-preferred answer sets. Consider A1 . Rule r1 does not belong to the reduct of the program since prerequisite a 6∈ A1 . Then rule r2 is not defeated by ∅ nor by A1 , so we get c which then defeats rule r3 and we have A1 = CP3 (A1 ). Now consider A2 . Prerequisite a is removed from r1 in the reduct of the program. Then we have that rule r1 is not defeated by ∅ nor by A2 , so we get b which then defeats rule r2 allowing r3 to fire. Thus we have A2 = CP3 (A2 ). This program is already fully prioritized. It is not possible to use further priorities to select one of the two answer sets as the preferred one. Ideally, a fully prioritized elp should have at most one preferred answer set. Let us define the unfolding operation for prioritized elps. For our purposes it suffices to define it for fully prioritized elps. An unfolding operation for fully prioritized elps is defined as follows. Pi+1 = (Pi+1 , <i+1 ) is the result of applying an unfolding on Pi = (Pi , <i ) if 1. Pi+1 is the result of unfolding r ∈ Pi such that r is replaced with rules r1′ , . . . , rk′ , and 2. for each rule rj′ obtained in the previous step, if rj′ ∈ Pi , i.e. an identical rule was already in the program, then 97 We will use the notation r → r′ to mean that head(r) ∈ body − (r′ ), and r ։ r′ to mean that there is a sequence r → r1 , r1 → r2 , . . . , rk → r′ where k is an odd number. We say that r attacks r′ in X if r is active in X and r → r′ . (a) if rj′ <i r then let rj′ <i+1 r∗ (resp. r∗ <i+1 rj′ ) for every rule r∗ such that rj′ <i r∗ (resp. 
Let us define the unfolding operation for prioritized elps. For our purposes it suffices to define it for fully prioritized elps. An unfolding operation for fully prioritized elps is defined as follows. Pi+1 = (Pi+1, <i+1) is the result of applying an unfolding on Pi = (Pi, <i) if
1. Pi+1 is the result of unfolding r ∈ Pi such that r is replaced with rules r1′, ..., rk′, and
2. for each rule rj′ obtained in the previous step, if rj′ ∈ Pi, i.e. an identical rule was already in the program, then
(a) if rj′ <i r, then let rj′ <i+1 r∗ (resp. r∗ <i+1 rj′) for every rule r∗ such that rj′ <i r∗ (resp. r∗ <i rj′), i.e. rj′ retains its priority, since it has higher priority in Pi than the unfolded rule r;
(b) if r <i rj′, then let rj′ <i+1 r∗ (resp. r∗ <i+1 rj′) for every rule r∗ such that r <i r∗ (resp. r∗ <i r), i.e. rj′ now gets the priority that r has in Pi, since r had the higher priority; and
3. for each rule rj′ obtained in step one such that rj′ ∉ Pi, i.e. it is a new rule, <i+1 extends <i with the priorities rj′ <i+1 r∗ (resp. r∗ <i+1 rj′) if r <i r∗ (resp. r∗ <i r), i.e. these new rules are assigned the same priority r has in Pi.
It is easy to see that applying an unfolding operation results in a fully prioritized elp.

Definition 10. A transformation sequence is a sequence of fully prioritized elps P0, ..., Pn such that each Pi+1 is obtained by applying an unfolding operation on Pi.

Definition 11. The unfolding of a fully prioritized P, denoted P̄, is the fully prioritized elp Pn such that there is a transformation sequence P0, ..., Pn where P0 = P and there is no rule in Pn that can be unfolded.

Example 4. Consider again the program
r1 : b ← not ¬b, a.
r2 : c ← not b.
r3 : a ← not c.
The unfolding of this program consists of the following rules:
r1′ : b ← not ¬b, not c.
r2 : c ← not b.
r3 : a ← not c.
Here, the unfolding helps reveal more directly that there is a conflict between the first two rules: in the unfolded program the head of one rule appears negated in the body of the other, and vice versa.

Let us now proceed to define our proposed preferred answer set semantics, starting with the semantics for unfolded, fully prioritized elps. We start with some terminology. Let P be an elp and X be a set of literals. We say a literal l holds in X if l ∈ X. An extended literal not l is defeated by X if l holds in X. Obviously, it is possible for not l neither to hold nor to be defeated in X. A rule r is defeated by X if there is a literal l ∈ body−(r) such that not l is defeated by X. An extended literal not l holds in X if the literal contrary to l holds in X, or if every rule r ∈ P with head(r) = l is defeated by X. For a rule r, the body body(r) holds in X if body+(r) holds in X and not l holds in X for each l ∈ body−(r). A rule r is active in X if neither head(r) nor its contrary literal holds in X, body+(r) holds in X, and r is not defeated by X. (A similar notion of "active rule" is used in [Schaub and Wang, 2001].)

We will use the notation r → r′ to mean that head(r) ∈ body−(r′), and r ↠ r′ to mean that there is a chain of attacks r → r1, r1 → r2, ..., rk → r′ comprising an odd number of attacks. We say that r attacks r′ in X if r is active in X and r → r′.

Definition 12. Let P = (P, <) be an unfolded, fully prioritized elp. We define the sequence X0, X1, ... satisfying the following conditions: X0 = ∅, and Xi+1 = Xi ∪ {head(r)} for the rules r that are active in Xi and satisfy one of the following:
1. body(r) holds in Xi; or
2. there is no active rule whose body holds in Xi (i.e., the previous case does not apply to any rule), and for all r′, if r′ attacks r then r ↠ r′ and r < r′.
(According to this definition, the rules in the odd attack chain need not be active. But when a rule in the cycle from r to r′ is not active, there is no longer a conflict to be resolved by priorities.)

Intuitively, in each iteration we first check whether there are any rules whose body is definitely satisfied by Xi. If so, we add the heads of those rules and skip to the next iteration. If there are no rules whose body is satisfied, then the second case adds the heads of all rules r that are not attacked by any rule or, if they are attacked by a rule r′, such that the two rules are in an even cycle and r has the higher priority.

Proposition 1. There exists n such that for all m > n, Xm = Xn, i.e., the sequence reaches a fixpoint.

Let I_P be the fixpoint set Xn as defined above if Xn does not contain contrary literals, or Lit otherwise.

Definition 13. Let P be an unfolded, fully prioritized elp. The set I_P is a preferred answer set of P if I_P is an answer set of P.

It trivially follows from this definition that all preferred answer sets of a prioritized elp P are answer sets of P. It is also easy to see that, according to these definitions, if P has a preferred answer set at all, it has exactly one, namely I_P.

The computation of I_P may fail to produce one of the answer sets of the program, hence the test done afterwards to check whether it is one of them. For programs that are not negative-cycle-free, the computation may reach a fixpoint prematurely. The computation may also fail by producing a set that contains contrary literals. Later we show that for negative-cycle-free, head-consistent programs, this computation is guaranteed to produce one of the answer sets, making the check unnecessary.
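For finite ground programs the construction of Definition 12 is directly computable. The sketch below implements the terminology above and the two cases of the definition; the encoding (rules as (head, positive body, negative body) triples, the full prioritization given by list order with earlier rules having higher priority, classical negation written with a leading '-') and all function names are ours, and the "odd number of attacks" test is our reading of r ↠ r′. Per the remarks above, the caller must still replace the result by Lit if it contains contrary literals and, following Definition 13, accept it only if it is an answer set.

```python
def contrary(l):
    return l[1:] if l.startswith("-") else "-" + l

def defeated(rule, X):
    """A rule is defeated by X if some literal in its negative body holds in X."""
    return any(l in X for l in rule[2])

def not_lit_holds(l, prog, X):
    """'not l' holds if the contrary of l holds in X or every rule for l is defeated."""
    return contrary(l) in X or all(defeated(r, X) for r in prog if r[0] == l)

def body_holds(rule, prog, X):
    return set(rule[1]) <= X and all(not_lit_holds(l, prog, X) for l in rule[2])

def active(rule, prog, X):
    head = rule[0]
    return (head not in X and contrary(head) not in X
            and set(rule[1]) <= X and not defeated(rule, X))

def attacks(r, rp):
    """r -> r': the head of r occurs in the negative body of r'."""
    return r[0] in rp[2]

def odd_attack_path(r, rp, prog):
    """r ->> r': some chain of attacks from r to r' of odd length."""
    seen, frontier = set(), {(r, 0)}      # second component: attacks taken mod 2
    while frontier:
        nxt = set()
        for x, par in frontier:
            for y in prog:
                if attacks(x, y) and (y, 1 - par) not in seen:
                    seen.add((y, 1 - par))
                    nxt.add((y, 1 - par))
        if (rp, 1) in seen:
            return True
        frontier = nxt
    return False

def compute_ip(prog):
    """Fixpoint sequence X0, X1, ... of Definition 12 (list order = priority)."""
    X = set()
    while True:
        act = [r for r in prog if active(r, prog, X)]
        fire = [r for r in act if body_holds(r, prog, X)]          # case 1
        if not fire:                                               # case 2
            fire = [r for r in act
                    if all(not (active(rp, prog, X) and attacks(rp, r))
                           or (odd_attack_path(r, rp, prog)
                               and prog.index(r) < prog.index(rp))
                           for rp in prog)]
        new = {r[0] for r in fire} - X
        if not new:
            return X      # fixpoint reached; the contrary-literal and
        X |= new          # answer-set checks are left to the caller
```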
For an arbitrary prioritized elp, preferred answer sets are defined as follows.

Definition 14. For an arbitrary prioritized elp P, A is a preferred answer set of P if A is a preferred answer set of the unfolding P̄′ of one of the full prioritizations P′ of P. The set of all preferred answer sets of P will be denoted by PAS(P).

Given the above definition, the most interesting programs to analyze are the fully prioritized programs. Therefore all our examples use the latter.

Example 5. Consider the program from Example 1:
r1 : c ← not b.
r2 : b ← not a.
Since there is no rule with head a, body(r2) holds in X0 = ∅, so X1 = {b}, which is the fixpoint. Since {b} is also an answer set, it is the preferred answer set.

Example 6. Consider next the program from Example 3, which has the following unfolding (also shown in Example 4):
r1′ : b ← not ¬b, not c.
r2 : c ← not b.
r3 : a ← not c.
The extended literal not ¬b holds in X0 = ∅ since there are no rules with head ¬b. Also, r2 attacks r1′, but r1′ ↠ r2 and r1′ < r2. So r1′ fires. On the other hand, both r2 and r3 are attacked by higher-priority rules that are active in X0. Hence X1 = {b}. Then r2 is defeated in {b}, so it is no longer active. Hence X2 = {a, b}, which is the fixpoint, an answer set, and therefore the (only) preferred answer set.
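For concreteness, feeding the unfolded programs of Examples 5 and 6 to the compute_ip sketch given earlier reproduces exactly the computations just described (this is an illustration under our own encoding, not code from the paper; priorities are again given by list order).

```python
# Example 5 (the program from Example 1):  r1: c <- not b.   r2: b <- not a.
ex5 = [("c", (), ("b",)), ("b", (), ("a",))]

# Example 6 (the unfolded program of Examples 3 and 4):
#   r1': b <- not -b, not c.   r2: c <- not b.   r3: a <- not c.
ex6 = [("b", (), ("-b", "c")), ("c", (), ("b",)), ("a", (), ("c",))]

print(compute_ip(ex5))   # {'b'}       -- X1 = {b} is the fixpoint
print(compute_ip(ex6))   # {'a', 'b'}  -- X1 = {b}, X2 = {a, b}
```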
The following example shows that there are programs that have no preferred answer sets.

Example 7. Consider the program:
r1 : p ← not p, a.
r2 : a ← not b.
r3 : b ← not a.
which unfolds into the program consisting of r1′ : p ← not p, not b and r2, r3. In this case r1′ attacks itself, so it never fires. The resulting fixpoint is {a}, which is not an answer set and therefore not a preferred answer set. This program thus has an answer set, {b}, but no preferred answer sets according to our semantics.

The above program does not have BE-preferred answer sets either. The fact that it does not have preferred answer sets is not really surprising. The program essentially has two parts: one part consisting of rules r2, r3, which intuitively generates a choice between a and b, and rule r1, which says that answer sets that contain a must be ruled out. But the priorities on the bottom two rules say to prefer a, which conflicts with the constraint represented by r1. Note that if the priorities on r2, r3 are relaxed, i.e. the priority r2 < r3 is removed, then {b} is a preferred answer set.

It is well known in answer set programming that cycles with an odd number of negative edges in the dependency graph, such as the cycle involving p and rule r1 above, are used as constraints that eliminate answer sets. In the following section we show that, absent such constraints, a prioritized elp is guaranteed to have a preferred answer set according to our semantics.

5 Properties

We start with a result establishing that prioritized elps with our proposed semantics are a conservative extension of elps. Given an elp P, a set of literals S is said to be generated by R if R is the set of all the rules r ∈ P whose bodies are satisfied by S and such that head(r) ∈ S.

Theorem 1. Let P = (P, <) be a prioritized elp with empty <, that is, without priorities. Then AS(P) = PAS(P).

Proof. By definition PAS(P) ⊆ AS(P). We show that AS(P) ⊆ PAS(P). Note that, by the equivalence-preserving property of the unfolding operation, AS(P) = AS(P̄). Thus it suffices to show that AS(P̄) ⊆ PAS(P). Note that the preference relation is empty, so P̄ is defined in terms of the original unfolding without priorities. Let A ∈ AS(P̄). Suppose A = Lit. Then there are two rules r1, r2 in P̄^A with contrary heads and such that the smallest set S closed under the rules of P̄^A satisfies the bodies of r1, r2. Since r1, r2 ∈ P̄^A, body−(r1) = body−(r2) = ∅. Also, since r1, r2 are unfolded and their bodies are satisfied by S, body+(r1) = body+(r2) = ∅. Therefore, for any full prioritization of P, I_P = Lit, and hence A is a preferred answer set. Suppose A ≠ Lit and let R be the generating rules of A. Consider a full prioritization P′ = (P′, <′) of P̄ where for every rule r1 ∈ R and r2 ∈ (P′ \ R), r1 <′ r2. Since the rules in R are generating rules, there are no rules r1, r2 ∈ R s.t. r1 → r2, i.e. rules in R never attack each other. Consider a rule r ∉ R. Since this rule is not generating, either there is a literal l ∈ body+(r) s.t. there is no rule r′ ∈ P′ with head(r′) = l, or there is a rule r′ ∈ R s.t. r′ → r. In the former case, r is never active. In the latter case, since r′ < r, r is first attacked and then defeated. Thus head(r) ∉ I_P′. On the other hand, every rule r′ ∈ R is attacked only by rules in (P′ \ R), which are defeated in I_P′. Therefore head(r′) ∈ I_P′. We conclude that I_P′ = A and therefore that A is a preferred answer set.

As we discussed in the introduction, one of the motivations for a new semantics is the intention that, by adding priorities, it should be possible to select only one of the multiple answer sets of a program. Example 7 shows that the existence of answer sets does not guarantee the existence of preferred answer sets, although this seems to occur only in programs involving implicit constraints. The following theorem establishes the existence of preferred answer sets for a class of programs where such constraints are ruled out. An elp P is said to be head-consistent if the set of literals {head(r) : r ∈ P} is consistent, i.e. does not contain contrary literals. It is said to be negative-cycle-free if the dependency graph of P does not contain cycles with an odd number of negative edges.
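Both conditions are simple syntactic checks on the ground program and can be sketched as follows (same rule encoding as before; the head-to-body orientation of the dependency graph and the function names are ours).

```python
def contrary(l):
    return l[1:] if l.startswith("-") else "-" + l

def head_consistent(prog):
    heads = {r[0] for r in prog}
    return not any(contrary(h) in heads for h in heads)

def negative_cycle_free(prog):
    """No cycle of the dependency graph carries an odd number of negative edges."""
    edges = [(r[0], l, 0) for r in prog for l in r[1]] + \
            [(r[0], l, 1) for r in prog for l in r[2]]
    nodes = {n for (src, dst, _) in edges for n in (src, dst)}
    for start in nodes:
        seen, frontier = set(), {(start, 0)}
        while frontier:                      # BFS over (literal, parity) states
            nxt = set()
            for node, par in frontier:
                for src, dst, neg in edges:
                    state = (dst, (par + neg) % 2)
                    if src == node and state not in seen:
                        seen.add(state)
                        nxt.add(state)
            frontier = nxt
        if (start, 1) in seen:               # odd-parity cycle through `start`
            return False
    return True

# Example 7's program has the odd negative cycle through p (rule r1);
# Example 3's program is negative-cycle-free and head-consistent.
ex7 = [("p", ("a",), ("p",)), ("a", (), ("b",)), ("b", (), ("a",))]
ex3 = [("b", ("a",), ("-b",)), ("c", (), ("b",)), ("a", (), ("c",))]
print(negative_cycle_free(ex7), negative_cycle_free(ex3), head_consistent(ex3))
# False True True
```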
Theorem 2. Let P = (P, <) be a fully prioritized elp s.t. P is negative-cycle-free and head-consistent. Then P has a preferred answer set.

Proof. Consider the program P′ of the unfolding P̄. Unfolding cannot add negative cycles, so P′ is negative-cycle-free. Let X denote the set I_P. We show that X is an answer set of P′. Let A be the answer set of the reduct P′^X. For any rule r ∈ P′, by r^X we denote the rule that results from r in computing the reduct P′^X. If r is deleted from the reduct, we will write r^X ∉ P′^X.

X ⊆ A: Assume X ⊈ A. Let X0, X1, ..., Xn = X be the sequence of I_P. Let i be the smallest index such that Xi ⊈ A. Let l be any literal such that l ∈ Xi but l ∉ A. There must be a rule r ∈ P′ such that l = head(r) and r is active in Xi−1. Then either body(r) holds in Xi−1, or for every rule r′ that attacks r in Xi−1, r < r′ and r ↠ r′. In case body(r) holds in Xi−1 ⊆ A, then l ∈ A. Contradiction. Consider otherwise any literal b ∈ body−(r) and any rule r′ ∈ P′ with b = head(r′). Then 1) r′ is not active in Xi−1, or 2) r < r′ and r ↠ r′. In case (1), we have three possibilities: i) b ∈ body+(r′); ii) ¬b ∈ Xi−1, hence ¬b ∈ X and ¬b ∈ A, and therefore b ∉ A unless A = Lit, which is a contradiction; iii) r′ is defeated in Xi−1 and hence defeated in A. We conclude that for any rule r′ not active in Xi−1, if r′^X ∈ P′^X then b ∈ body+(r′). In case (2), we have that l ∈ body−(r′), hence r′^X ∉ P′^X. We conclude that the only rules in P′^X with head b have b in their positive body. Therefore b ∉ A. Since this holds for any b ∈ body−(r), we have that body−(r) ∩ A = ∅. Since r is active in Xi−1, body+(r) holds in Xi−1 and in A. Since P′ is head-consistent, X ≠ Lit and r^X ∈ P′^X. Therefore l ∈ A. Contradiction.

A ⊆ X: Assume A ⊈ X. Let l be any literal such that l ∈ A but l ∉ X. There must be a rule r ∈ P′ such that l = head(r) and r^X ∈ P′^X. Since l ∈ A and P′ is unfolded, body+(r) = ∅ and body−(r) ∩ X = ∅. This means that r is active in X. Since l ∉ X, there is at least one rule that attacks r in X. Let r′ be any rule that attacks r in X. Since r^X ∈ P′^X, head(r′) ∉ X, and since r′ attacks r in X, r′ is active in X. Since head(r′) ∉ X, r′ must be attacked in X by some other rule whose head is not in X either. This implies that there is a set of rules active in X that includes r and r′ and that are in a cycle of attacks. Consider the largest such cycle. Since P′ is negative-cycle-free, the number of rules in the cycle is even. Let r1 be the rule in the cycle with the highest priority. For any rule r2 that attacks r1 in X it must be the case that r1 < r2 and r1 ↠ r2. But this implies that head(r1) ∈ X, and therefore that either l ∈ X or head(r′) ∈ X. Contradiction.

Corollary 1. If P is a fully prioritized elp with negative-cycle-free and head-consistent P, then the set I_P is the preferred answer set.

The main point of this corollary is that for a prioritized elp P with a negative-cycle-free, head-consistent program, the computation of the sequence X0, ..., Xn of Definition 12 is guaranteed to produce an answer set, making the check stipulated in Definition 13 unnecessary.
In [Brewka and Eiter, 1999] one can find multiple examples illustrating how the BE-preferred answer set semantics overcomes some of the shortcomings of previous approaches. Based on their study of these shortcomings and the development of their semantics, Brewka and Eiter proposed two principles that ought to be satisfied by any system based on prioritized defeasible rules. We reproduce them here in their particular form for prioritized elps.

The first principle is a postulate meant as a minimal requirement on the treatment of < as a preference relation.

Principle 1. Let A1, A2 be two answer sets of a prioritized elp P = (P, <) generated by the rules R ∪ {r1} and R ∪ {r2}, respectively, where r1, r2 ∉ R. If r1 < r2, then A2 is not a preferred answer set of P.

The second principle is about relevance. It says that a preferred answer set A should not become non-preferred after adding a rule with a prerequisite not in A while keeping all preferences intact.

Principle 2. Let A be a preferred answer set of (P, <) and r be a rule such that at least one prerequisite of r is not in A. Then A is a preferred answer set of (P ∪ {r}, <′) whenever <′ agrees with < on the rules in P.

Let us show that our proposed semantics satisfies the first principle.

Theorem 3. The preferred answer set semantics based on the computation of the set I_P satisfies Principle 1.

Proof. Let A1, A2 be answer sets of a fully prioritized, unfolded P = (P, <) generated by R ∪ {r1} and R ∪ {r2}, respectively, where r1, r2 ∉ R and r1 < r2. Since r1, r2 are unfolded and their bodies are satisfied by A1 and A2 respectively, we have body+(r1) = body+(r2) = ∅. Assume that A2 is a preferred answer set of P. Since r1 is not a generating rule of A2, r1 is defeated in A2. Then there exists b ∈ body−(r1) s.t. b ∈ A2. Hence there is a rule r′ ∈ P s.t. b = head(r′). There are two cases: a) r′ ∈ R. Then b ∈ A1 and r1 is defeated in A1. Contradiction. b) r′ ∉ R. Then r′ must be r2. Since r1 < r2, it must be the case that body(r2) holds in A2 (by case 1 of Definition 12). But this implies that body(r2) holds in A1, hence that b ∈ A1 and that r1 is defeated in A1. Contradiction. Therefore A2 is not a preferred answer set.

Principle 2, which is about relevance, is not satisfied by our semantics. We use the program from Example 3 to make some observations.

Example 8. Consider again the program P from Example 3:
r1 : b ← not ¬b, a.
r2 : c ← not b.
r3 : a ← not c.
Consider the program P′ consisting only of r2, r3 with r2 < r3. This program has one answer set, A1 = {c}, which is also preferred according to our semantics. The full program P has two answer sets, A1 and the now preferred A2 = {a, b}. P′ has no BE-preferred answer sets, while for P both A1 and A2 are BE-preferred. According to Principle 2, {c} should remain a preferred answer set because r1 is not applicable in A1 (not relevant). But in terms of attacks, r1 seems relevant, since it can attack r2 and has higher priority. Moreover, if we replace r1 with the unfolded version b ← not ¬b, not c, which results in an equivalent program, Principle 2 no longer says anything about it, since it defines relevance in terms of prerequisites (literals in body+). In other words, applying an unfolding operation allows switching from violating to satisfying Principle 2, even though unfolding has no effect on a program's semantics.

The example above also shows that satisfying Principle 2 necessarily requires that some programs either have no preferred answer sets or have multiple ones. In the above example, P′ has one answer set, {c}, which is not the same as the intuitively preferred answer set of P. But Principle 2 requires that if {c} is a preferred answer set, it must remain one after adding r1. For these reasons we believe that Principle 2 as stated is not entirely suitable.

It is worth mentioning that Brewka and Eiter [1999] define another semantics, called the weakly preferred semantics, which assigns preferred answer sets to programs that do not have one under the BE semantics. However, this semantics is based on quantitative measures of the relative satisfaction of the preferences, which is very different in style from the semantics we propose here and from the BE semantics. Moreover, the weakly preferred semantics does not satisfy either of the above principles.
6 Conclusions

We have proposed a new (selective) semantics for prioritized extended logic programs. In contrast to previously proposed semantics, ours selects at most one preferred answer set for all programs that are fully prioritized. Furthermore, for a large class of programs guaranteed to have an answer set, the existence of a preferred answer set is also guaranteed. We have also shown that our semantics captures the intended meaning of preferences as postulated by Principle 1 from [Brewka and Eiter, 1999]. Future work includes looking at whether the set of preferred answer sets of a program under our semantics is a subset of the BE-preferred answer sets when the latter exist. Another direction is to generalize the results to programs with variables.

References

[Aravindan and Dung, 1995] Chandrabose Aravindan and Phan Minh Dung. On the correctness of unfold/fold transformation of normal and extended logic programs. Journal of Logic Programming, 24(3):201–217, 1995.

[Baral, 2003] Chitta Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, 2003.

[Brass and Dix, 1999] Stefan Brass and Jürgen Dix. Semantics of (disjunctive) logic programs based on partial evaluation. Journal of Logic Programming, 40(1):1–46, 1999.

[Brewka and Eiter, 1999] Gerhard Brewka and Thomas Eiter. Preferred answer sets for extended logic programs. Artificial Intelligence, 109(1–2):297–356, 1999.

[Delgrande et al., 2000] James P. Delgrande, Torsten Schaub, and Hans Tompits. Logic programs with compiled preferences. In Werner Horn, editor, Proceedings of the 14th European Conference on Artificial Intelligence (ECAI'00), pages 464–468, 2000.

[Dix, 1995] Jürgen Dix. A classification theory of semantics of normal logic programs: II. Weak properties. Fundamenta Informaticae, 22(3):257–288, 1995.

[Gelfond and Lifschitz, 1991] Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9(3–4):365–386, 1991.

[Gelfond and Son, 1997] Michael Gelfond and Tran Cao Son. Reasoning with prioritized defaults. In Jürgen Dix, Luis Moniz Pereira, and Teodor C. Przymusinski, editors, Selected Papers of the Third International Workshop on Logic Programming and Knowledge Representation, number 1471 in LNCS, pages 164–223. Springer, 1997.

[Gelfond, 2008] Michael Gelfond. Answer sets. In F. van Harmelen, V. Lifschitz, and B. Porter, editors, Handbook of Knowledge Representation, chapter 7, pages 285–316. Elsevier, 2008.

[Sakama and Inoue, 1996] Chiaki Sakama and Katsumi Inoue. Representing priorities in logic programs. In M. Maher, editor, Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 82–96, 1996.

[Schaub and Wang, 2001] Torsten Schaub and Kewen Wang. A comparative study of logic programs with preference. In Bernhard Nebel, editor, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01), pages 597–602, 2001.

[Wang et al., 2000] Kewen Wang, Lizhu Zhou, and Fangzhen Lin. Alternating fixpoint theory for logic programs with priority. In John W. Lloyd, Veronica Dahl, Ulrich Furbach, Manfred Kerber, Kung-Kiu Lau, Catuscia Palamidessi, Luis M. Pereira, Yehoshua Sagiv, and Peter J. Stuckey, editors, Proceedings of the First International Conference on Computational Logic, number 1861 in LNCS, pages 164–178. Springer, 2000.

[Zhang and Foo, 1997] Yan Zhang and Norman Y. Foo. Answer sets for prioritized logic programs. In Jan Maluszynski, editor, Proceedings of the International Symposium on Logic Programming (ILPS'97), pages 69–83, 1997.