Proceedings of the Ninth International Workshop on
Non-Monotonic Reasoning, Action and Change
(NRAC’11)
Editors:
Sebastian Sardina
School of Computer Science and IT
RMIT University
Melbourne, VIC, 3000
Australia
[email protected]
Stavros Vassos
Department of Informatics and Telecommunications
National and Kapodistrian University of Athens
Athens, 15784
Greece
[email protected]
Technical Report
RMIT-TR-11-02
July 2011
School of Computer Science and Information Technology
RMIT University
Melbourne 3000, Australia
Preface
We present here the informal proceedings for the Ninth International Workshop on Non-Monotonic Reasoning, Action and Change (NRAC’11), a well-established forum to foster discussion and sharing of
experiences among researchers interested in the broad areas of nonmonotonic reasoning, and reasoning
about action and change, including belief revision, planning, logic programming, argumentation, causality,
probabilistic and possibilistic approaches to KR, and other related topics.
Since its inception in 1995, NRAC has always been held in conjunction with the International Joint Conference on Artificial Intelligence (IJCAI), each time with growing success, and showing an active and loyal
community. Previous editions were held in 2009 in Pasadena, USA; in 2007 in Hyderabad, India; in 2005
in Edinburgh, Scotland; in 2003 in Acapulco, Mexico; in 2001 in Seattle, USA; in 1999 in Stockholm,
Sweden; in 1997 in Nagoya, Japan; and in 1995 in Montreal, Canada. This time, NRAC’11 is held as a
1.5-day satellite workshop of IJCAI’11, in Barcelona, Spain, and will take place on July 16 & 17.
An intelligent agent exploring a rich, dynamic world needs cognitive capabilities in addition to basic
functionalities for perception and reaction. The abilities to reason nonmonotonically, to reason about
actions, and to change one’s beliefs, have been identified as fundamental high-level cognitive functions
necessary for common sense. Many deep relationships have already been established between the three
areas and the primary aim of this workshop is to further promote this cross-fertilization. A closer look at
recent developments in the three fields reveals how fruitful such cross-fertilization can be. Comparing and
contrasting current formalisms for Nonmonotonic Reasoning, Reasoning about Action, and Belief Revision
helps identify the strengths and weaknesses of the various methods available. It is an important activity
that allows researchers to evaluate the state-of-the-art. Indeed a significant advantage of using logical
formalisms as representation schemes is that they facilitate the evaluation process. Moreover, following
the initial success, more complex real-world applications are now within grasp. Experimentation with
prototype implementations not only helps to identify obstacles that arise in transforming theoretical
solutions into operational solutions, but also highlights the need for the improvement of existing formal
integrative frameworks for intelligent agents at the ontological level.
This workshop will bring together researchers from all three areas with the aim to compare and evaluate
existing formalisms, report on new developments and innovations, identify the most important open
problems in all three areas, identify possibilities of solution transferal between the areas, and identify
important challenges for the advancement of the areas. As part of the program we will be considering the
status of the field and discussing questions such as: Which nonmonotonic logics and which theories of action and change have been implemented? How can they be compared? Which frameworks are implementable? What can be learned from existing applications? What is needed to improve their scope and performance?
In addition to the paper sessions, this year’s workshop features invited talks by two internationally
renowned researchers: Jürgen Dix (Clausthal University of Technology, Germany) on "How to test and compare multi-agent systems?" and Grigoris Antoniou (University of Crete, Greece) on "Nonmonotonic Reasoning in the Real: Reasoning about Context in Ambient Intelligence Environments".
The programme chairs would like to thank all authors for their contributions and are also very grateful to
the program committee for their hard work during the review phase and for providing excellent feedback
to the authors. The programme chairs are also very grateful to Pavlos Peppas and Mary-Anne Williams
from the steering committee for always being available for consultation, and to Maurice Pagnucco for
helping us to put these Proceedings together.
June 2011
Sebastian Sardina
Stavros Vassos
RMIT University
National and Kapodistrian University of Athens
Organization
Organizing Committee
Sebastian Sardina
Stavros Vassos
RMIT University, Australia
National and Kapodistrian University of Athens, Greece
Steering Committee
Gerhard Brewka
Michael Thielscher
Leora Morgenstern
Maurice Pagnucco
Pavlos Peppas
Mary-Anne Williams
Andreas Herzig
Benjamin Johnston
University of Leipzig, Germany
University of NSW, Australia
SAIC Advanced Systems and Concepts, USA
University of NSW, Australia
University of Patras, Greece
University of Technology, Sydney, Australia
Université Paul Sabatier, France
University of Technology, Sydney, Australia
Program Committee
Xiaoping Chen
Jim Delgrande
Jérôme Lang
Thomas Meyer
Michael Thielscher
Sheila McIlraith
Eduardo Fermé
Dongmo Zhang
Mehdi Dastani
Giuseppe De Giacomo
Christian Fritz
Leora Morgenstern
Pavlos Peppas
Sajjad Haider
Alfredo Gabaldon
University of Science and Technology of China, China
Simon Fraser University, Canada
Université Paul Sabatier, France
Meraka Institute, South Africa
University of NSW, Australia
University of Toronto, Canada
University of Madeira, Portugal
University of Western Sydney, Australia
Utrecht University, The Netherlands
Sapienza Università di Roma, Italy
PARC (Palo Alto Research Center), USA
SAIC Advanced Systems and Concepts, USA
University of Patras, Greece
Institute of Business Administration, Pakistan
Universidade Nova de Lisboa, Portugal
Table of Contents
An Adaptive Logic-based Approach to Abduction in AI (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . 1
Tjerk Gauderis
Default Reasoning about Conditional, Non-Local and Disjunctive Effect Actions . . . . . . . . . . . . . . . . . . . . . 7
Hannes Strass
A Logic for Specifying Partially Observable Stochastic Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Gavin Rens, Thomas Meyer, Alexander Ferrein and Gerhard Lakemeyer
Agent Supervision in Situation-Determined ConGolog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Giuseppe De Giacomo, Yves Lespérance and Christian Muise
On the Use of Epistemic Ordering Functions as Decision Criteria for Automated and Assisted Belief
Revision in SNePS (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Ari Fogel and Stuart Shapiro
Decision-Theoretic Planning for Golog Programs with Action Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Daniel Beck and Gerhard Lakemeyer
Verifying properties of action theories by bounded model checking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Laura Giordano, Alberto Martelli and Daniele Theseider Dupré
Efficient Epistemic Reasoning in Partially Observable Dynamic Domains Using Hidden
Causal Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Theodore Patkos and Dimitris Plexousakis
Preferred Explanations: Theory and Generation via Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Shirin Sohrabi, Jorge A. Baier and Sheila A. McIlraith
The Method of ILP+ASP on Psychological Models (Preliminary Report) . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Javier Romero, Alberto Illobre, Jorge Gonzalez and Ramon Otero
Tractable Strong Outlier Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Fabrizio Angiulli, Rachel Ben-Eliyahu-Zohary and Luigi Palopoli
Topics in Horn Contraction: Supplementary Postulates, Package Contraction, and Forgetting . . . . . . . 87
James Delgrande and Renata Wassermann
A Selective Semantics for Logic Programs with Preferences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Alfredo Gabaldon
An Adaptive Logic-based Approach to Abduction in AI∗
(Preliminary Report)
Tjerk Gauderis
Centre for Logic and Philosophy of Science
Ghent University, Belgium
[email protected]
Abstract

In a logic-based approach to abductive reasoning, the background knowledge is represented by a logical theory. A sentence φ is then considered as an explanation for ω if it satisfies some formal conditions. In general, the following three conditions are considered crucial: (1) φ together with the background knowledge implies ω; (2) φ is logically consistent with what is known; and (3) φ is the most 'parsimonious' explanation. But, since abductive reasoning is a non-monotonic form of reasoning, each time the background knowledge is extended, the status of previously abduced explanations becomes once again undefined.

The adaptive logics program was developed to address these types of non-monotonic reasoning. In addition to deductive reasoning steps, it allows for the direct implementation of defeasible reasoning steps, but it adds to each formula the explicit set of conditions that would defeat that formula. So, in an adaptive logic for abduction, a formula is an abduced hypothesis as long as none of its conditions is deduced. This implies that we do not have to recheck all hypotheses each time an extension to our background knowledge is made. This is the key advantage of the approach, which allows us to save repetitive re-computations in fast-growing knowledge bases.

∗ Research for this paper was supported by project subventions from the Special Fund for Research (BOF) of Ghent University. I am grateful to the anonymous referees for their helpful suggestions.

1 The Adaptive Logics Framework

The adaptive logics program was established to offer insight into the direct application of defeasible reasoning steps.[1] This is done by focussing on which formulas would falsify a defeasible reasoning step. Therefore, in adaptive logics a formula is a pair (A, ∆), with A a regular well-formed formula in the language of the logic over which the considered theory T is defined, and ∆, the condition of the formula, a set of regular well-formed formulas that are assumed to be false. To express this assumption, these formulas are generally called abnormalities in the adaptive logics literature.[2] For an adaptive logic in standard format, the abnormalities are characterized by a logical form.

[1] The adaptive logics program was founded by Batens in the eighties. For a more recent overview of the general results, see [Batens, 2007]. For a philosophical defense of the use of adaptive logics, see [Batens, 2004].
[2] This representation of adaptive logics is a reinterpretation of the standard representation of adaptive logics, which is in terms of a proof theory. I made this reinterpretation for purposes of comparison with other approaches in AI.

The set of plausibly derivable formulas P from a logical theory T is formed in the following way:

1. Premise Rule: if A ∈ T, then (A, ∅) ∈ P
2. Unconditional Inference Rule: if A1, . . . , An ⊢ B and (A1, ∆1), . . . , (An, ∆n) ∈ P, then (B, ∆1 ∪ . . . ∪ ∆n) ∈ P
3. Conditional Inference Rule: if A1, . . . , An ⊢ B ∨ Dab(Θ) and (A1, ∆1), . . . , (An, ∆n) ∈ P, then (B, ∆1 ∪ . . . ∪ ∆n ∪ Θ) ∈ P

where Dab(Θ) stands for the disjunction of abnormalities, i.e. the classical disjunction of all elements of the finite set of abnormalities Θ. This third rule, which adds new conditions, makes clear how defeasible steps are modeled. The idea is that if we can deductively derive the disjunction of a defeasible result B and the formulas whose truth would make us withdraw B, then we can defeasibly derive B on the assumption that none of these formulas is true.

Apart from the set of plausible formulas P we need a mechanism that selects which defeasible results should be withdrawn. This is done by defining a marking strategy. In the adaptive logics literature, several strategies have been developed, but for our purposes it is sufficient to consider the simple strategy. According to this strategy, the set of derivable formulas or consequences D ⊆ P consists of:

1. Deductive Results: if (A, ∅) ∈ P, then (A, ∅) ∈ D
2. Unfalsified Defeasible Results: if (A, Θ) ∈ P (with Θ ≠ ∅) and for every ω ∈ Θ: (ω, ∅) ∉ P, then (A, Θ) ∈ D
So, apart from the deductive results – which are always
derivable – this strategy considers all defeasible results as derived, as long as none of the elements of their condition is
deductively derived.
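To make this two-set picture concrete, the following is a minimal Python sketch of my own (not from the paper): annotated formulas are (formula, condition) pairs, formulas are opaque strings, the underlying deductive machinery is left out, and only the simple strategy is implemented.

```python
from typing import FrozenSet, Set, Tuple

Annotated = Tuple[str, FrozenSet[str]]   # (A, ∆): a formula plus its set of abnormalities

def premise(a: str) -> Annotated:
    """Premise rule: members of the theory T carry the empty condition."""
    return (a, frozenset())

def infer(b: str, premises, theta=()) -> Annotated:
    """Unconditional rule if theta is empty, conditional rule otherwise:
    B inherits the union of the premises' conditions plus the abnormalities Θ."""
    conds = frozenset(theta).union(*(d for _, d in premises))
    return (b, conds)

def derivable(P: Set[Annotated]) -> Set[Annotated]:
    """Simple strategy: deductive results, plus every defeasible result none of
    whose abnormalities has been derived on the empty condition."""
    deduced = {a for (a, cond) in P if not cond}
    return {(a, cond) for (a, cond) in P if not (cond & deduced)}
```

Under this reading, withdrawing a defeasible result never requires modifying P; only the membership test against the unconditionally derived abnormalities changes.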
From the definitions of the sets P and D, we can understand how adaptive logics model the non-monotonic character of defeasible reasoning. If our theory T is extended to a new theory T′ (T ⊂ T′), then we can define the corresponding sets P′ and D′. On the one hand, the set of plausibly derivable formulas is monotonic (P ⊂ P′), since there is no mechanism to withdraw elements from this set and it can only grow larger.[3] On the other hand, the set of derivable formulas is non-monotonic (D ⊄ D′). It is possible that a condition of a defeasible result in D is suddenly – in light of the new information in T′ – deductively derivable. In that case, this result will no longer be part of D′. Obviously, no deductive result will ever be revoked.
This makes this kind of logic very apt for modeling fast-growing knowledge bases.[4] If a previously defeasibly derived result is needed at a certain point, we cannot be sure whether it is still valid, because there might have been several knowledge base updates in the meantime. But, since the set of plausible formulas is monotonic, we know this formula will still be in P. So, instead of recalculating the whole non-monotonic set D after each knowledge base extension (which is the traditional approach), it is sufficient to expand the monotonic set P. Of course, in this approach, if we want to use a defeasible result at a certain stage of knowledge base expansion, we first have to check its condition. Still, it is easily seen that a lot of repetitive re-computation is avoided, certainly in situations in which we only need a small percentage of the defeasible results at every stage of knowledge base expansion.
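Continuing the sketch above, the lazy check described here amounts to testing a single formula's condition against what has meanwhile become unconditionally derivable, instead of recomputing D; the concrete formulas below are placeholders only.

```python
# P grows monotonically as the knowledge base is extended.
P = {premise("bird(Tweety)"), premise("~flies(Tweety)")}
P.add(infer("penguin(Tweety)", [], theta={"~penguin(Tweety)"}))   # an abduced hypothesis

def still_derivable(formula: str, P) -> bool:
    """On-demand check of one formula: none of its abnormalities may be deduced."""
    deduced = {a for (a, cond) in P if not cond}
    return any(a == formula and not (cond & deduced) for (a, cond) in P)

P.add(premise("~penguin(Tweety)"))            # a later extension defeats the hypothesis
print(still_derivable("penguin(Tweety)", P))  # False: its condition is now deduced
```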
Moreover, it has been proven that if the adaptive logic is in standard format, which means that the abnormalities have a fixed logical form, the corresponding logic has all the interesting meta-theoretic properties. The logic for abduction developed in this article is in standard format and is therefore sound, complete, proof invariant, and has the fixed-point property.[5]
2 Other "Conditional" Approaches

As far as I can see, two other approaches in AI have used the idea of directly adding conditions or restrictions to formulas. On the one hand, there is a line of research called "Cumulative Default Reasoning", going back to a paper of [Brewka, 1991] with the same title. On the other hand, in the area of argumentation theory, some work on defeasible logic programs (see, for instance, [García and Simari, 2004]) is also based on formulas together with consistency conditions that need to be satisfied to make these formulas acceptable.

The main difference with these research programs is that the abnormalities in adaptive logics are based on a fixed logical form. This means that, for instance, the logical form for abduction – explained in this paper – is the form of the abnormalities for any premise set to which we want to apply abductive reasoning. Put in other words, as soon as a fully classical premise set is given, all the possible abnormalities and, therefore, all the plausible and finally derivable abductive results can be calculated. There is no element of choice. In the other approaches, the conditions of defeasible steps must be given in the premise set, which leaves an element of choice as to which conditions we want to add to which defeasible implications. In adaptive logics, the defeasible consequences can be derived as soon as we have a classical premise set and have chosen the appropriate logic for the kind of reasoning we want to do (e.g. abduction).

3 The Problem of Multiple Explanatory Hypotheses in Abduction

If we now focus our attention on the abductive problem, we cannot allow the different defeasible results – the abduced hypotheses – to sit together in the set P. For instance, if Tweety is a non-flying bird, he may be a penguin or an ostrich. But a set containing both the formulas (penguin(Tweety), Θ1) and (ostrich(Tweety), Θ2) is inconsistent.[6]

An elegant solution to this problem is found by translating it to a modal framework. When we introduce a possibility operator ♦ to indicate hypotheses and the corresponding necessity operator □ (□ =df ¬♦¬) to represent background knowledge, we evade this problem. The Tweety example translates, for instance, as follows (for variables ranging over the domain of all birds):

Background Knowledge:
(∀x(penguin(x) ⊃ ¬flies(x)), ∅)
(∀x(ostrich(x) ⊃ ¬flies(x)), ∅)
(¬flies(Tweety), ∅)

Plausible defeasible results:
(♦penguin(Tweety), Θ1)
(♦ostrich(Tweety), Θ2)

So, with this addition the sets P and D are consistent again. Though, in this situation it is not really necessary to maintain the modal operators, because we can quite easily make a translation to a hierarchical set-approach by borrowing some ideas from the Kripke semantics for modal logics.[7] In these semantics, a hypothesis is said to be true in a possible world that is accessible from the world in which the hypothesis is stated, while necessities are true in all accessible worlds.

[3] It is important to understand "plausible" as "initially plausible" (at the time of derivation) and not as "plausible according to our present insights". The second definition would, of course, have led to a non-monotonic set.
[4] In that way, this kind of logic can offer a solution to what [Paul, 2000] mentioned as one of the main problems of both set-cover-based and some logic-based approaches to abduction.
[5] For an overview of the generic proofs of these properties, see [Batens, 2007].
[6] At this point, we abstract from the exact conditions; the details of the conditions will be explained below.
[7] It is important to remember that we are constructing a syntactical representation, not a semantics for the underlying logic.
If we now define a world(-set) as the set of formulas assigned to that world, we can finish our translation from modalities to sets. We define the actual world w as the set of all formulas of the knowledge base together with all their deductive consequences. The elements of the set w are the only formulas that carry a □-operator in our modal logic, and are thus the only elements that will be contained in every world-set in our system. Subsequently, for every abduced hypothesis we define a new world-set that contains it. This world is hierarchically directly beneath the world from which the formula is abduced. This new set further contains the formulas of all the world-sets hierarchically above it, and is closed under deduction. To make this hierarchy clear, we will use the names w1, w2, . . . for the worlds containing hypotheses directly abduced from the knowledge base, w1.1, w1.2, . . . , w2.1, . . . for hypotheses abduced from a first-level world, and so on.

With this translation in mind, we can omit the modal operators and simply keep track, for every formula, of the hierarchically highest world-set that contains it. So, our Tweety example can be represented as follows:

(∀x(penguin(x) ⊃ ¬flies(x)), ∅)    w
(∀x(ostrich(x) ⊃ ¬flies(x)), ∅)    w
(¬flies(Tweety), ∅)                w
(penguin(Tweety), Θ1)              w1
(ostrich(Tweety), Θ2)              w2
Since the hierarchical system of sets wi is equivalent to the set P (the plausibly derivable results) of a logic for abduction, the definition of the set D (of this logic) can be applied to this system of sets as well. It is clear that only the deductive consequences – the only formulas with an empty condition – will be the formulas in the set w. Further, since all formulas in a world-set have the same condition, i.e. the condition of the hypothesis for which the world was created, the definition of D does not only select on the level of the formulas, but actually also on the level of the world-sets.[8] Put in other words, D selects a subsystem of the initial system of hierarchically ordered sets. The different sets in this subsystem are equivalent to what [Flach and Kakas, 2000] called abductive extensions of a theory. In this way, the logic can handle mutually contradictory hypotheses[9] without the risk that any set of formulas turns out to be inconsistent.
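A minimal data-structure sketch of this hierarchy (my illustration, not the author's): worlds are named as in the text, each world stores only the formulas abduced at that level together with their shared condition (the text additionally closes each world under the formulas of all worlds above it), and selecting the derivable subsystem just drops every world whose condition has been deduced in w.

```python
worlds = {
    "w":  {"formulas": {"forall x (penguin(x) -> ~flies(x))", "~flies(Tweety)"},
           "condition": set()},                       # background knowledge: empty condition
    "w1": {"formulas": {"penguin(Tweety)"},
           "condition": {"~penguin(Tweety)", "forall x ~flies(x)"}},
    "w2": {"formulas": {"ostrich(Tweety)"},
           "condition": {"~ostrich(Tweety)", "forall x ~flies(x)"}},
}

def parent(name: str) -> str:
    """Naming convention of the text: w1.2 sits directly beneath w1, w1 beneath w."""
    return "w" if "." not in name else name.rsplit(".", 1)[0]

def selected(worlds) -> set:
    """Worlds whose condition contains no formula deduced in the actual world w."""
    deduced = worlds["w"]["formulas"]
    return {n for n, wd in worlds.items() if not (wd["condition"] & deduced)}

print(selected(worlds))   # {'w', 'w1', 'w2'} as long as neither hypothesis is refuted
```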
4 Reformulation of the Abductive Problem in the Adaptive Logics Format

So far, in this paper we have shown – in the first section – how we can represent the standard format of adaptive logics in terms of two sets P and D, and – in the third section – how we can cope with contradictory hypotheses by using a hierarchical system of world-sets. In this section we will now use this set representation to reformulate the syntax of the logic MLAs, which was previously developed in [Gauderis, 2011].[10] This adaptive logic, whose name stands for Modal Logic for Abduction, is an adaptive logic designed to handle contradictory hypotheses in abduction. The reformulation in terms of sets is performed with the goal of integrating the adaptive approach with other AI approaches. First we need to define the abductive problem in a formal way.

Definition 1. An abductive system T is a triple (H, O, d) of the following three sets:

• a set of clauses H of the form ∀α(A1(α) ∧ . . . ∧ An(α) ⊃ B(α)), with A1(α), . . . , An(α), B(α) literals and α ranging over d;
• a set of observations O of the form C(γ), with C a literal and γ ∈ d a constant;
• a domain d of constants.

All formulas are closed formulas defined over a standard predicative first-order logic.

Furthermore, the notation does not imply that predicates must be of rank 1. Predicates can have any rank; the only requirements are that in the clauses all Ai and B share a common variable, and that the observations have at least one variable that is replaced by a constant. Obviously, for predicates of higher rank, extra quantifiers for the other variables need to be added to make sure that all formulas are closed.

Definition 2. The background knowledge or actual world w of an abductive system T = (H, O, d) is the set

w = {(P, ∅) | H ∪ O ⊢ P}

Since it was the goal of the adaptive logic approach to implement defeasible reasoning steps directly, we will consider instances of the Peircean schema for abduction [Peirce, 1960, 5.171]:

The surprising fact, C, is observed;
But if A were true, C would be a matter of course,
Hence, there is reason to suspect that A is true.

When we translate this schema to the elements of T = (H, O, d), we get the following schema:

∀α(A1(α) ∧ . . . ∧ An(α) ⊃ B(α))
B(γ)
――――――――――――――――――
A1(γ) ∧ . . . ∧ An(γ)

To implement this schema – better known as the logical fallacy of Affirming the Consequent – in an adaptive logic, we need to specify the logical form of the conditions that would falsify the application of this rule. As we can see from how the conditional inference rule was introduced in the first section, the disjunction of the hypothesis and all defeating conditions needs to be derivable from the theory. To specify these conditions, we will first give an overview of the different desiderata for our abductions.

[8] Strictly speaking, each world-set also contains all formulas of the world-sets hierarchically above it. But since these formulas are also contained in those worlds above, no information is lost if we allow that D can select on the level of the world-sets.
[9] Consider, for instance, the famous quaker/republican example: our approach will lead to two different abductive extensions, one in which Nixon will be a pacifist and another one in which he isn't.
[10] In the original article, the syntax of the logic MLAs is defined in terms of a proof theory.
Obviously, it is straightforward that if the negation of the hypothesis can be derived from our background knowledge, the abduction is falsified. If we know that Tweety lives in Africa, we know that he cannot be a penguin. So, in light of this information, the hypothesis can no longer be considered as derivable: (penguin(Tweety), Θ1) ∉ D. But the hypothesis still remains in the monotonic set of 'initially' plausible results: (penguin(Tweety), Θ1) ∈ P.
So, if we define A(α) to denote the full conjunction,

A(α) =def A1(α) ∧ . . . ∧ An(α),

the first formal condition that could falsify the defeasible step will be

∀α(A1(α) ∧ . . . ∧ An(α) ⊃ B(α)) ∧ B(γ) ∧ ¬A(γ).

To avoid self-explanations we will further add the condition that A(α) and B(α) share no predicates.

The reason why this condition also states the two premises of the abductive schema is that, in an adaptive logic, we can apply the conditional rule each time the disjunction is derivable. So, if we did not state the two premises in the abnormality, we could derive anything as a hypothesis, since ⊢ A(γ) ∨ ¬A(γ) for any A(γ). But with the current form, only hypotheses for which the two premises are true can be derived. This abnormality would already be sufficient to create an adaptive logic.

Still, we want to add some other defeating conditions. This could be done by replacing the abnormality by a disjunction of the abnormality already found and the other wanted conditions. Then, each time one of the conditions is derivable, the whole disjunction is derivable (by addition), and so the formula is defeated. But the same result is obtained if we allow one defeasible inference step to add more than one element to the condition instead of this complex disjunction. Hence, we will add these extra conditions in this way.

It is often stated that the abduced hypothesis must be as parsimonious as possible. One of the main reasons for this is that one has to avoid random explanations. For instance, consider the following example:

H = {∀x(penguin(x) ⊃ ¬flies(x))}
O = {¬flies(Tweety)}
d = {x | x is a bird}

The following formulas are derivable from this:

(∀x(penguin(x) ∧ is_green(x) ⊃ ¬flies(x)), ∅)    w
(penguin(Tweety) ∧ is_green(Tweety), Θ1)         w1
(is_green(Tweety), Θ1)                           w1

The fact that Tweety is green is not an explanation for the fact that Tweety doesn't fly, nor is it something that follows from our background knowledge. Since we want to avoid that our abductions yield this kind of random hypotheses, we will add a mechanism to ensure that our hypothesis is the most parsimonious one.

A final condition that we have to add is that our observation is not a tautology. Since we use a material implication, anything could be derived as an explanation for a tautology, because ⊢ B(α) ⊃ ⊤ for any B(α).

Now we can define the defeasible reasoning steps. For this we need a new notation, whose purpose is to lift one element out of the conjunction A1(α) ∧ . . . ∧ An(α). This will be used to check for more parsimonious explanations.

Notation 1 (Ai⁻¹(α)).

if n > 1:   Ai⁻¹(α) =df A1(α) ∧ . . . ∧ Ai−1(α) ∧ Ai+1(α) ∧ . . . ∧ An(α)
if n = 1:   A1⁻¹(α) =df ⊤

Definition 3. The set of abnormalities Ω for an abductive system T is given by

Ω = { (∀α(A1(α) ∧ . . . ∧ An(α) ⊃ B(α)) ∧ B(γ) ∧ ¬A(γ)) ∨ ∀αB(α) ∨ ∀α(A1⁻¹(α) ⊃ B(α)) ∨ . . . ∨ ∀α(An⁻¹(α) ⊃ B(α)) | γ ∈ d, α ranging over d, Ai and B literals, B ∉ {Ai} }
It is easily seen that the generic conditional rule for adaptive logics – as defined in Section 1 – with this set of abnormalities is equivalent to the following inference rule, written in the style of the Peircean schema stated above.
Definition 4. Defeasible Inference Rule for Abduction

(∀α(A1(α) ∧ . . . ∧ An(α) ⊃ B(α)), ∅)    w
(B(γ), ∅)                                wi
――――――――――――――――――
(A1(γ) ∧ . . . ∧ An(γ), Θ)               wij

with wij a new world hierarchically directly beneath wi and

Θ = {¬(A1(γ) ∧ . . . ∧ An(γ)), ∀αB(α), ∀α(A1⁻¹(α) ⊃ B(α)), . . . , ∀α(An⁻¹(α) ⊃ B(α))}
So, it is possible to abduce further on hypothetical observations (and generate in that way further abductive extensions), but the implications need to be present in the background knowledge w. It is quite obvious that if the abduced hypothesis was already abduced before (from, for instance, another implication), the resulting world-set will contain the same formulas, but with other conditions.
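The condition set Θ of Definition 4 can be generated mechanically from a clause and an observation. The sketch below is my illustration of that schema (formulas rendered as plain strings, no logic engine attached): it builds the hypothesis and the three kinds of defeaters – the negated hypothesis, the tautology check on the observation, and one parsimony check per dropped conjunct.

```python
def abduce(antecedent, consequent, constant):
    """Instantiate the Peircean schema for a clause
    forall a (A1(a) & ... & An(a) -> B(a)) and an observation B(c)."""
    hyp = " & ".join(f"{p}({constant})" for p in antecedent)
    theta = {f"~({hyp})",                                  # negation of the hypothesis
             f"forall a {consequent}(a)"}                  # the observation is a tautology
    for i in range(len(antecedent)):                       # parsimony: a smaller antecedent
        rest = [p for j, p in enumerate(antecedent) if j != i]   # already suffices
        body = " & ".join(f"{p}(a)" for p in rest) if rest else "true"
        theta.add(f"forall a ({body} -> {consequent}(a))")
    return hyp, theta

hyp, theta = abduce(["on_south_pole", "in_wild"], "penguin", "Tweety")
# hyp: 'on_south_pole(Tweety) & in_wild(Tweety)'
# theta contains, e.g., 'forall a (in_wild(a) -> penguin(a))'
```

For n = 1 the parsimony defeater degenerates to the tautology condition, matching the remark in the example of Section 5 that the two then trivially coincide.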
Finally, as explained in section 1, this body of definitions is
formulated in the general framework of adaptive logics. This
means that we have the following property.
Property 1. The logic MLAs is a fixed-point logic which
has a sound and complete semantics with respect to its syntax.
For the semantics and proof theory of this logic, and the
proof that this logic is in the standard format of adaptive logics, we refer to [Gauderis, 2011]. For the soundness and
completeness proof, we refer to the generic proof provided
in [Batens, 2007] for all adaptive logics in standard format.
5 Example
Motivation and comparison with other approaches In
this section we will consider an elaborate example of the dynamics of this framework. The main goal is to illustrate the
key advantage of this approach, i.e. that there is no longer the
need to recalculate all non-monotonic results at any stage of
a growing knowledge base, but that one only needs to check
the non-monotonic derivability of the needed formulas at a
certain stage against the monotonic plausibility.
This is the main difference with other approaches to abduction such as the ones explicated in, for instance, [Paul, 2000],
[Flach and Kakas, 2000] or [Kakas and Denecker, 2002].
Since these approaches focus on a fixed and not an expanding
knowledge base, they require in cases of expansion a full recomputation to keep the set of derived non-monotonic results
updated. It is not claimed that the adaptive approach yields
better results than these other approaches in cases of a fixed
knowledge base. In fact, it is an issue for future research to
investigate whether the integration of the existing approaches
for fixed knowledge bases with the adaptive approach does
not yield better results.
Initial system T    Our elaborate example will be an abductive learning situation about the observation of a non-flying bird, called Tweety. Initially, our abductive system T = (H, O, d) contains, in addition to this observation, only very limited background knowledge:

H = {∀x(penguin(x) ⊃ ¬flies(x)), ∀x(ostrich(x) ⊃ ¬flies(x))}
O = {¬flies(Tweety)}
d = {x | x is a bird}

Thus, our background knowledge contains the following formulas:

(∀x(penguin(x) ⊃ ¬flies(x)), ∅)    w     (1)
(∀x(ostrich(x) ⊃ ¬flies(x)), ∅)    w     (2)
(¬flies(Tweety), ∅)                w     (3)

And the following abductive hypotheses can be derived:

(penguin(Tweety), Θ1)              w1    (4)
(ostrich(Tweety), Θ2)              w2    (5)

with the sets Θ1 and Θ2 defined as

Θ1 = {¬penguin(Tweety), ∀x ¬flies(x)}
Θ2 = {¬ostrich(Tweety), ∀x ¬flies(x)}

Since both implications have only one conjunct in the antecedent, their parsimony conditions – as defined in the general logical form – trivially coincide with the second condition. Since none of the conditions is deductively derivable in w, both (4) and (5) are elements of the set of derivable formulas D.

First Extension T′    At this stage, we discover that Tweety can swim, something we know ostriches can't.

H′ = H ∪ {∀x(ostrich(x) ⊃ ¬swims(x))}
O′ = O ∪ {swims(Tweety)}
d = {x | x is a bird}

From which the following formulas can be derived:

(∀x(swims(x) ⊃ ¬ostrich(x)), ∅)    w     (6)
(¬ostrich(Tweety), ∅)              w     (7)

Since the background information is extended, we only know that all previously derived hypotheses are still in the set of plausible hypotheses P. If we want to check whether they are in the set of derivable hypotheses D, we need to check whether their conditions are derivable from this extended information or not. But – and this has already been cited several times as the key advantage of this system – we don't need to check all hypotheses. Since we don't have any further information on the penguin case, we just leave hypothesis (4) for what it is. Thus, we save a computation, because at this stage we are not planning on reasoning or communicating about the penguin hypothesis. We just want to check whether this new information is a problem for the ostrich hypothesis; and indeed, it is easily seen that (5) ∉ D′.

Second Extension T′′    At this stage, we investigate the penguin hypothesis further and retrieve additional background information about penguins.

H′′ = H′ ∪ {∀x(penguin(x) ⊃ eats_fish(x)), ∀x(on_south_pole(x) ∧ in_wild(x) ⊃ penguin(x))}
O′′ = O′
d = {x | x is a bird}

The following formulas can now further be retrieved:

(eats_fish(Tweety), Θ1)            w1    (8)
(on_south_pole(Tweety), Θ1.1)      w1.1  (9)
(in_wild(Tweety), Θ1.1)            w1.1  (10)

with the set Θ1.1 defined as

Θ1.1 = {¬(on_south_pole(Tweety) ∧ in_wild(Tweety)), ∀x penguin(x), ∀x(on_south_pole(x) ⊃ penguin(x)), ∀x(in_wild(x) ⊃ penguin(x))}

Since the first element of Θ1.1 is actually a disjunction, the first condition can even be split in two.

This stage is added to illustrate the other aspects of adaptive reasoning. Firstly, as (8) illustrates, there is no problem in reasoning further deductively on previously abduced hypotheses. Only, to reason further, we must first check the condition of these hypotheses (this poses no problem here, because we can easily verify that (4) ∈ D′′). The deductively derived formula has the same conditions as the hypothesis on which it is built (and is contained in the same world). So, these results stand as long as the hypotheses on whose assumption they are derived hold. This characteristic of adaptive logics is very interesting, because it allows us to derive predictions that can be tested in further investigation. In this example, we can test whether Tweety eats fish. In case this experiment fails and ¬eats_fish(Tweety) is added to the observations in the next stage, the hypothesis (and all results derived on its assumption) will be falsified. Secondly, the set of conditions Θ1.1 for the formulas (9) and (10) now also contains conditions that check for parsimony. Let us illustrate their functioning with the final extension.

Third Extension T′′′    At this stage, we learn that even in captivity the only birds that can survive on the South Pole are penguins. In addition to that, we get to know that Tweety is held in captivity.

H′′′ = H′′ ∪ {∀x(on_south_pole(x) ⊃ penguin(x))}
O′′′ = O′′ ∪ {¬in_wild(Tweety)}
d = {x | x is a bird}

If we now check the parsimony conditions of Θ1.1, we see that an element of this condition can be derived from our background knowledge. This means that all formulas assigned to world w1.1 are no longer derivable. Still, one might wonder whether this parsimony condition should not keep (9) and only withdraw (10). But that this is not a good road is proven by the fact that in that case (10) would be falsified by the extra observation that Tweety does not live in the wild. In fact, that it was a good decision to withdraw the whole world w1.1 is illustrated by the fact that the South Pole hypothesis of (9) can also be derived from H′′′ in another world:

(on_south_pole(Tweety), Θ1.2)      w1.2  (11)

with the set Θ1.2 defined as

Θ1.2 = {¬on_south_pole(Tweety), ∀x penguin(x)}

So, in the end, we find that the set D′′′ of derivable formulas consists of all formulas derivable in the worlds w, w1 and w1.2. The formulas of w2 and w1.1 are not elements of this final set of derivable results.
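Read operationally, the final state of the example can be checked world by world: a world survives exactly when none of its condition formulas has become deducible from the extended background knowledge. The sketch below simply transcribes the relevant conditions and the facts that have become deducible by stage T′′′ into strings (the actual world w trivially survives and is omitted).

```python
conditions = {
    "w1":   {"~penguin(Tweety)", "forall x ~flies(x)"},                        # Θ1
    "w2":   {"~ostrich(Tweety)", "forall x ~flies(x)"},                        # Θ2
    "w1.1": {"~(on_south_pole(Tweety) & in_wild(Tweety))", "forall x penguin(x)",
             "forall x (on_south_pole(x) -> penguin(x))",
             "forall x (in_wild(x) -> penguin(x))"},                           # Θ1.1
    "w1.2": {"~on_south_pole(Tweety)", "forall x penguin(x)"},                 # Θ1.2
}
# Some formulas that have become deductively derivable from H''' and O''':
deduced = {"~ostrich(Tweety)",                             # Tweety swims, ostriches don't
           "forall x (on_south_pole(x) -> penguin(x))",    # added in the third extension
           "~in_wild(Tweety)"}                             # Tweety is held in captivity

surviving = {w for w, theta in conditions.items() if not (theta & deduced)}
print(sorted(surviving))   # ['w1', 'w1.2'] – w2 and w1.1 are withdrawn, as in the text
```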
6 Conclusion
In this article we presented a new logic-based approach to abduction based on the adaptive logics program. The main advantages of this approach are:

1. Each abduced formula is presented together with the specific conditions that would defeat it. In that way, it is not necessary to check the whole system for consistency after each extension of the background knowledge; only the formulas that are needed at a certain stage need to be checked. Furthermore, it allows the conditions to contain additional requirements, such as parsimony.
2. In comparison with other approaches that add conditions to formulas, the conditions are here fixed by a logical form and hence determined only by the (classical) premise set. In this way, there is no element of choice in stating conditions (as there is, for instance, in default logics).
3. By integrating a hierarchical system of sets, it provides an intuitive representation of multiple hypotheses without causing conflicts between contradictory hypotheses.
4. It allows for further deductive and abductive reasoning on previously abduced hypotheses.
5. The approach is based on a sound and complete fixed-point logic (MLAs).
Limitations and Future Research    It has been argued that these advantages make this approach apt for systems in which not all non-monotonically derivable results are needed at every stage of expansion of a knowledge base. Still, it needs to be examined whether an integration with existing systems (for a fixed knowledge base) does not yield better results. Furthermore, since the key feature of this approach is the saving of computations in expanding knowledge bases, it needs to be investigated whether an integration with assumption-based Truth Maintenance Systems is possible (building on the ideas of [Reiter and de Kleer, 1987]).
References

[Batens, 2004] Diderik Batens. The need for adaptive logics in epistemology. In D. Gabbay, S. Rahman, J. Symons, and J. P. Van Bendegem, editors, Logic, Epistemology and the Unity of Science, pages 459–485. Kluwer Academic Publishers, Dordrecht, 2004.

[Batens, 2007] Diderik Batens. A universal logic approach to adaptive logics. Logica Universalis, 1:221–242, 2007.

[Brewka, 1991] Gerhard Brewka. Cumulative default logic. Artificial Intelligence, 50(2):183–205, 1991.

[Flach and Kakas, 2000] Peter A. Flach and Antonis C. Kakas. Abductive and inductive reasoning: Background and issues. In Peter A. Flach and Antonis C. Kakas, editors, Abduction and Induction: Essays on their Relation and their Integration, volume 18 of Applied Logic Series, pages 1–27. Kluwer Academic Publishers, Dordrecht, 2000.

[García and Simari, 2004] Alejandro J. García and Guillermo R. Simari. Defeasible logic programming: An argumentative approach. Theory and Practice of Logic Programming, 4(1):95–138, 2004.

[Gauderis, 2011] Tjerk Gauderis. Modelling abduction in science by means of a modal adaptive logic. Foundations of Science, 2011. Forthcoming.

[Kakas and Denecker, 2002] Antonis Kakas and Marc Denecker. Abduction in logic programming. In A. Kakas and F. Sadri, editors, Computational Logic: Logic Programming and Beyond, Part I, pages 402–436. Springer Verlag, 2002.

[Paul, 2000] Gabriele Paul. AI approaches to abduction. In Dov M. Gabbay and Rudolf Kruse, editors, Abductive Reasoning and Uncertainty Management Systems, volume 4 of Handbook of Defeasible Reasoning and Uncertainty Management Systems, pages 35–98. Kluwer Academic Publishers, Dordrecht, 2000.

[Peirce, 1960] Charles S. Peirce. Collected Papers. Belknap Press of Harvard University Press, Cambridge, Massachusetts, 1960.

[Reiter and de Kleer, 1987] Raymond Reiter and Johan de Kleer. Foundations of assumption-based truth maintenance systems: Preliminary report. In Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI'87), pages 183–188, 1987.
Default Reasoning about Conditional, Non-Local and Disjunctive Effect Actions
Hannes Strass
Institute of Computer Science
University of Leipzig
[email protected]
Abstract

Recently, Baumann et al. [2010] provided a comprehensive framework for default reasoning about actions. Alas, the approach was only defined for a very basic class of domains where all actions have mere unconditional, local effects. In this paper, we show that the framework can be substantially extended to domains with action effects that are conditional (i.e. are context-sensitive to the state in which they are applied), non-local (i.e. the range of effects is not pre-determined by the action arguments) and even disjunctive (thus nondeterministic). Notably, these features can be carefully added without sacrificing important nice properties of the basic framework, such as modularity of domain specifications or existence of extensions.

1 Introduction

Reasoning about actions and non-monotonic reasoning are two important fields of logic-based knowledge representation and reasoning. While reasoning about actions deals with dynamic domains and their evolution over time, default reasoning is usually concerned with closing gaps in incomplete static knowledge bases. Both areas have received considerable attention and have reached remarkable maturity by now. However, a unifying approach that combines the full expressiveness of both fields was still lacking, until a recent paper [Baumann et al., 2010] took an important first step into the direction of uniting these two lines of research. There, a logical framework was proposed that lifted default reasoning about a domain to a temporal setting where defaults, action effects and the frame assumption interact in a well-defined way.

In this paper, we develop a substantial extension of their work: we significantly generalise the theoretical framework to be able to deal with a broad class of action domains where effects may be conditional, non-local and non-deterministic. As we will show in the paper, extending the approach to conditional effects is straightforward. However, retaining their construction of defaults leads to counterintuitive conclusions. Roughly, this is due to eager default application in the presence of incomplete knowledge about action effects. As an example, consider the classical drop action that breaks fragile objects. In the presence of a (simple) state default expressing that objects are to be considered not broken unless there is information to the contrary, this could lead to the following reasoning: After dropping an object x of which nothing further is known, we can apply the default and infer it is not broken. But this means it cannot have been fragile before (since otherwise it would be broken). This line of reasoning violates the principle of causality: while a fragile object will be broken after dropping it, this does not mean that objects should be assumed not fragile before dropping them. We will formally define when such undesired inferences arise and devise a modification to the basic framework that provably disables them. Interestingly, the counterintuitive consequences occur already with conditional, local-effect actions; our modification however prevents them also for actions with nondeterministic, non-local effects. Since the introduction of effect preconditions represents our most significant change, we will prove that it is a proper generalisation of the original framework: for all action default theories with only unconditional, local-effect actions, the "old" and "new" approach yield the same results. For the subsequent extensions it will be straightforward to see that they are proper generalisations.

The paper proceeds as follows. In the next section, we provide the necessary background. The sections thereafter extend the basic approach introduced in [Baumann et al., 2010] by conditional effects (Section 3), non-local effects (Section 4) and disjunctive effects (Section 5). In the penultimate section, we prove several desirable properties of the extended framework; Section 7 discusses related work and concludes.

2 Background

2.1 Unifying Action Calculus

The unifying action calculus (UAC) was proposed in [Thielscher, 2011] to allow for a treatment of problems in reasoning about actions that is independent of a particular calculus. It is based on a finite, sorted logic language with equality which includes the sorts FLUENT, ACTION and TIME along with the predicates < : TIME × TIME, that denotes a (possibly partial) ordering on time points; Holds : FLUENT × TIME, that is used to state that a fluent is true at a given time point; and Poss : ACTION × TIME × TIME, expressing that an action is possible for given starting and ending time points. As a most fundamental notion in the UAC, a state formula
Φ[~s] in ~s is a first-order formula with free TIME variables ~s where (1) for each occurrence of Holds(f, s) in Φ[~s] we have s ∈ ~s and (2) predicate Poss does not occur. State formulas allow to express properties of action domains at given time points. Although this definition is quite general in that it allows an arbitrary finite sequence of time points, for our purposes two time points will suffice. For a function A into sort ACTION, a precondition axiom for A(~x) is of the form

Poss(A(~x), s, t) ≡ πA[s]    (1)

where πA[s] is a state formula in s with free variables among s, t, ~x. The formula πA[s] thus defines the necessary and sufficient conditions for the action A to be applicable for the arguments ~x at time point s, resulting in t. The UAC also provides a general form for effect axioms; we however omit this definition because we only use a special form of effect axioms here. The last notion we import formalises how action domains are axiomatised in the unifying action calculus.

Definition 1. A (UAC) domain axiomatisation consists of a finite set of foundational axioms Ω defining a time structure, a set Π of precondition axioms (1) and a set Υ of effect axioms; the latter two for all functions into sort ACTION; lastly, it contains uniqueness-of-names axioms for all finitely many function symbols into sorts FLUENT and ACTION.

The foundational axioms Ω serve to instantiate the UAC by a concrete time structure, for example the branching situations with their usual ordering from the situation calculus. We restrict our attention to domains that make intuitive sense; one of the basic things we require is that actions actually consume time: a domain axiomatisation is progressing, if Ω |= (∃s : TIME)(∀t : TIME)s ≤ t and Ω ∪ Π |= Poss(a, s, t) ⊃ s < t. Here, we are only concerned with progressing domain axiomatisations; we use the macro Init(t) =def ¬(∃s)s < t to refer to the unique initial time point.

For presentation purposes, we will make use of the concept of fluent formulas, where terms of sort FLUENT play the role of atomic formulas, and complex formulas can be built using the usual first-order constructors. For a fluent formula Φ, we will denote by Φ[s] the state formula that is obtained by replacing all fluent literals [¬]f in Φ by [¬]Holds(f, s). The operator |·| will be used to extract the affirmative component of a fluent literal, that is, |¬f| = |f| = f; the polarity of a fluent literal is given by sign(¬f) = − and sign(f) = +.

2.2 Default Logic

Default logic as introduced by [Reiter, 1980] uses defaults to extend incomplete world knowledge. They are of the form[1]

α : β / γ    (shorthand: α : β/γ)

Here α, the prerequisite, β, the justification, and γ, the consequent, are first-order formulas. These expressions are to be read as "whenever we know α and nothing contradicts β, we can safely conclude γ". A default is normal if β = γ, that is, justification and consequent coincide. A default is closed if its prerequisite, justification and consequent are sentences, that is, have no free variables; otherwise, it is open.

[1] Reiter [1980] introduces a more general version of defaults with an arbitrary number of justifications, which we do not need here.

The semantics of defaults is defined via the notion of extensions for default theories. A default theory is a pair (W, D), where W is a set of sentences in first-order logic and D is a set of defaults. A default theory is closed if all its defaults are closed; otherwise, it is open. For a set T of formulas, we say that a default α : β/γ is applicable to T iff α ∈ T and ¬β ∉ T; we say that the default has been applied to T if it is applicable and additionally γ ∈ T. Extensions for a default theory (W, D) are deductively closed sets of formulas which contain all elements of W, are closed under application of defaults from D and which are grounded in the sense that each formula in them has a non-cyclic derivation. For closed default theories this is captured by the following definition.

Definition 2 (Theorem 2.1, [Reiter, 1980]). Let (W, D) be a closed default theory and E be a set of closed formulas. Define E0 =def W and Ei+1 =def Th(Ei) ∪ Di for i ≥ 0, where

Di =def { γ | α : β/γ ∈ D, α ∈ Ei, ¬β ∉ E }

Then E is an extension for (W, D) iff E = ⋃i≥0 Ei.

We will interpret open defaults as schemata representing all of their ground instances. Therefore, open default theories can be viewed as shorthand notation for their closed counterparts.[2] When we use an extension E or set of defaults D with an integer subscript, we refer to the Ei and Di from above. We write (W, D) |≈ Ψ to express that the formula Ψ is contained in each extension of the default theory (W, D).

[2] Free variables of formulas not in a default will however be implicitly universally quantified from the outside.

2.3 Default Reasoning in Action Domains with Unconditional, Local Effect Actions

The approach of [Baumann et al., 2010] combines default logic with the unifying action calculus: domain axiomatisations are viewed as incomplete knowledge bases that are completed by defaults. It takes as input a description of a particular action domain with normality statements. This description comprises the following: (1) a domain signature, that defines the vocabulary of the domain; (2) a description of the direct effects of actions; (3) a set of state defaults Φ ⇝ ψ, constructs that specify conditions Φ under which a fluent literal ψ normally holds in the domain.[3]

The state defaults from the domain description are translated into Reiter defaults, where the special predicates DefT(f, s, t) and DefF(f, s, t) are used to express that a fluent f becomes normally true (false) from s to t.[4] For each state default δ, two Reiter defaults are created: δInit, that is used for default conclusions about the initial time point; and δReach, that is used for default conclusions about time points that can be reached via action application.

[3] Here, Φ, the prerequisite, is a fluent formula; ψ, the consequent, being a fluent literal also allows to express that a fluent normally does not hold in the domain.
[4] It should be noted that DefF(f, s, t) is not the same as ¬DefT(f, s, t) – the latter only means that f does not normally become true from s to t.
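As a concrete illustration of the extension construction in Definition 2 above, here is a minimal sketch of my own for a toy setting in which every formula is a ground literal and the deductive closure Th(Ei) is approximated by Ei itself; the point it makes is that the justification test ¬β ∉ E refers to the candidate extension E, not to the stage Ei.

```python
def neg(lit: str) -> str:
    """Complementary literal, with '~' marking negation."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def is_extension(W, D, E, rounds=100):
    """Check E against Definition 2 for defaults given as (prerequisite,
    justification, consequent) literal triples. Toy approximation only:
    deductive closure is dropped, so this is not a general prover."""
    Ei = set(W)
    for _ in range(rounds):
        Di = {c for (a, b, c) in D if a in Ei and neg(b) not in E}  # ¬β checked against E
        nxt = Ei | Di
        if nxt == Ei:
            break
        Ei = nxt
    return Ei == set(E)

# Normal default "birds normally fly" applied to Tweety:
W = {"Bird(Tweety)"}
D = [("Bird(Tweety)", "Flies(Tweety)", "Flies(Tweety)")]
print(is_extension(W, D, {"Bird(Tweety)", "Flies(Tweety)"}))   # True
print(is_extension(W, D, {"Bird(Tweety)"}))                    # False: default not applied
```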
Definition 3. Let δ = Φ ⇝ ψ be a state default.

δInit =def  Init(t) ∧ Φ[t] : ψ[t] / ψ[t]    (2)
δReach =def  Preδ(s, t) : Def(ψ, s, t) / Def(ψ, s, t)    (3)

where

Preδ(s, t) =def Φ[t] ∧ ¬(Φ[s] ∧ ¬ψ[s])
Def(ψ, s, t) =def DefT(ψ, s, t) if ψ = |ψ|, and DefF(|ψ|, s, t) otherwise

For a set ∆ of state defaults, the corresponding defaults are ∆Init =def {δInit | δ ∈ ∆} and ∆Reach =def {δReach | δ ∈ ∆}.

For the Reach defaults concerning two time points s, t connected via action application, we ensure that the state default δ was not violated at the starting time point s by requiring ¬(Φ[s] ∧ ¬ψ[s]) in the prerequisite.[5] The consequent is then inferred unless there is information to the contrary.

[5] The reason for this is to prevent application of initially definitely violated state defaults through irrelevant actions. A default violation occurs when the prerequisite Φ[s] of a state default δ is known to be met, yet the negation of the consequent prevails, ¬ψ[s].

Being true (or false) by default is then built into the effect axiom by accepting it as a possible "cause" to determine a fluent's truth value. The other causes are the ones already known from monotonic formalisms for reasoning about actions: direct action effects, and a notion of persistence that provides a solution to the frame problem [McCarthy and Hayes, 1969].

Definition 4. Let f : FLUENT and s, t : TIME be variables. The following macros express that f persists from s to t:

FrameT(f, s, t) =def Holds(f, s) ∧ Holds(f, t)    (4)
FrameF(f, s, t) =def ¬Holds(f, s) ∧ ¬Holds(f, t)    (5)

Let A be a function into sort ACTION and ΓA be a set of fluent literals with free variables in ~x that denote the positive and negative direct effects of A(~x), respectively. The following pair of macros expresses that f is a direct effect of A(~x):

DirectT(f, A(~x), s, t) =def ⋁_{F(~x′) ∈ ΓA, ~x′ ⊆ ~x} f = F(~x′)    (6)
DirectF(f, A(~x), s, t) =def ⋁_{¬F(~x′) ∈ ΓA, ~x′ ⊆ ~x} f = F(~x′)    (7)

An effect axiom with unconditional effects, the frame assumption and normal state defaults is of the form

Poss(A(~x), s, t) ⊃
  (∀f)(Holds(f, t) ≡ CausedT(f, A(~x), s, t)) ∧
  (∀f)(¬Holds(f, t) ≡ CausedF(f, A(~x), s, t))    (8)

where

CausedT(f, A(~x), s, t) =def DirectT(f, A(~x), s, t) ∨ FrameT(f, s, t) ∨ DefT(f, s, t)    (9)
CausedF(f, A(~x), s, t) =def DirectF(f, A(~x), s, t) ∨ FrameF(f, s, t) ∨ DefF(f, s, t)    (10)

Note that a default conclusion of a state property in a non-initial state crucially depends on an action execution leading to that state. Hence, whenever it is definitely known that Holds(f, t) after Poss(a, s, t), it follows from the effect axiom that ¬DefF(f, s, t); a symmetrical argument applies if ¬Holds(f, t). This means that definite knowledge about a fluent inhibits the opposite default conclusion. But observe that the addition of DefT and DefF as "causes" to the effect axiom weakened the solution to the frame problem established earlier. The following definition ensures that the persistence assumption is restored in its full generality.

Definition 5. Let ∆ be a set of state defaults, ψ be a fluent literal and s, t be variables of sort TIME. The default closure axiom for ψ with respect to ∆ is

⋀_{Φ⇝ψ ∈ ∆} ¬Pre_{Φ⇝ψ}(s, t) ⊃ ¬Def(ψ, s, t)    (11)

For a fluent literal ψ not mentioned as a consequent in ∆ the default closure axiom is just ⊤ ⊃ ¬Def(ψ, s, t). Given a domain axiomatisation Σ and a set ∆ of state defaults, we denote by Σ∆ the default closure axioms with respect to ∆ and the fluent signature of Σ.

The fundamental notion of the solution to the state default problem by [Baumann et al., 2010] is now a default theory where the incompletely specified world consists of a UAC domain axiomatisation augmented by suitable default closure axioms. The default rules are the automatic translations of user-specified, domain-dependent state defaults. For a domain axiomatisation Σ and a set ∆ of state defaults, the corresponding domain axiomatisation with state defaults is the pair (Σ ∪ Σ∆, ∆Init ∪ ∆Reach). We use a well-known example domain [Reiter, 1991] to illustrate the preceding definitions. To ease the presentation, in this example we instantiate the UAC to the branching time structure of situations.

Example 1 (Breaking Objects). Imagine a robot that can move around and carry objects, among them a vase. When the robot drops an object x, it does not carry x any more and additionally x is broken. Usually, however, objects are not broken unless there is information to the contrary.

The fluents that we use to describe this domain are Carries(x) (the robot carries x) and Broken(x) (x is broken); the only function of sort ACTION is Drop(x). Dropping an object is possible if and only if the robot carries the object:

Poss(Drop(x), s, t) ≡ Holds(Carries(x), s) ∧ t = Do(Drop(x), s)

The effects of dropping an object x are given by the set

ΓDrop(x) = {¬Carries(x), Broken(x)}

The set of state defaults ∆break = {⊤ ⇝ ¬Broken(x)} says that objects are normally not broken. Applying the definitions from above to this specification results in the domain axiomatisation with defaults (Σbreak ∪ Σbreak_∆, ∆break_Init ∪ ∆break_Reach), where Σbreak contains effect axiom (8) and the above precondition axiom for Drop, the set ∆break_Init contains only

Init(t) : ¬Holds(Broken(x), t) / ¬Holds(Broken(x), t)
and the defaults ∆break
Reach for action application consist of
¬Holds(Broken(x), s) : DefF(Broken(x), s, t)
DefF(Broken(x), s, t)
Finally, the default closure axioms for the fluent Broken
are Holds(Broken(x), s) ⊃ ¬DefF(Broken(x), s, t) and
¬DefT(Broken(x), s, t), and ¬Def (ψ, s, t) for all other fluent
def
Do(Drop(Vase), S0 ), the default
literals ψ. With S1 =
theory sanctions the sceptical conclusions that the vase is
initially not broken, but is so after dropping it:
Definition 7. Let ε = Φ/ψ be a conditional effect expression and f : FLUENT and s, t : TIME be variables. The following macro expresses that ε has been activated for f from s to t:7

Activatedε(f, s, t) =def (f = |ψ| ∧ Φ[s])
Let A be a function into sort ACTION with a set of conditional effect expressions ΓA(~x) that is local-effect. The direct positive and negative effect formulas for A(~x) are

DirT(f, A(~x), s, t) ≡ ∨_{ε ∈ Γ+_{A(~x)}} Activatedε(f, s, t)        (12)
(Σ^break ∪ Σ^break_∆, ∆^break_Init ∪ ∆^break_Reach) |≈
¬Holds(Broken(Vase), S0) ∧ Holds(Broken(Vase), S1)
One of the main theoretical results of [Baumann et al., 2010]
was the guaranteed existence of extensions for the class of domain axiomatisations with defaults considered there. As we
will see later on, a similar result holds for our generalisation
of the theory.
Proposition 1 (Theorem 4, [Baumann et al., 2010]). Let
Σ be a domain axiomatisation and ∆ be a set of state defaults. Then the corresponding domain axiomatisation with
state defaults (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ) has an extension. If
furthermore Σ is consistent, then so are all extensions for
(Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ).
DirF(f, A(~x), s, t) ≡ ∨_{ε ∈ Γ−_{A(~x)}} Activatedε(f, s, t)        (13)
An effect axiom with conditional effects, the frame assumption and normal state defaults is of the form (8), where

CausedT(f, A(~x), s, t) =def DirT(f, A(~x), s, t) ∨ FrameT(f, s, t) ∨ DefT(f, s, t)        (14)
CausedF(f, A(~x), s, t) =def DirF(f, A(~x), s, t) ∨ FrameF(f, s, t) ∨ DefF(f, s, t)        (15)
The only difference between the effect axioms of [Baumann et al., 2010] and the effect axioms defined here is the replacement of their macros DirectT, DirectF for unconditional
direct effects with the predicates DirT, DirF for conditional
effects. In the following, we will understand domain axiomatisations to contain – for each action – effect axioms of the
form (8) along with the respective direct positive and negative effect formulas. To ease notation, for predicates with
an obvious polarity (like DirT, DirF), we use a neutral version (like Dir) with fluent literals L, where Dir(L, a, s, t)
denotes DirF(F, a, s, t) if L = ¬F for some fluent F and
DirT(L, a, s, t) otherwise.
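To make this polarity convention concrete, the following minimal Python sketch (the string encoding of literals and the function names are our own illustration, not notation from the paper) dispatches the neutral Dir on a fluent literal to DirF or DirT exactly as just described.

def dir_neutral(literal, action, s, t, dir_t, dir_f):
    # A fluent literal is encoded as a fluent name, prefixed by "~" when negative,
    # e.g. "Broken(Vase)" or "~Carries(Vase)" (an assumption made for this sketch).
    # Dir(L, a, s, t) is DirF(F, a, s, t) if L = ~F for some fluent F, else DirT(L, a, s, t).
    if literal.startswith("~"):
        return dir_f(literal[1:], action, s, t)
    return dir_t(literal, action, s, t)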
While this extended definition of action effects is straightforward, it severely affects the correctness of default reasoning in the action theory: as the following example shows, one cannot naïvely take this updated version of the effect axioms and use the Reiter defaults as before.
Example 1 (Continued). We add a unary fluent Fragile
with the obvious meaning and modify the Drop action
such that dropping only breaks objects that are fragile:
ΓDrop(x) = {⊤/¬Carries(x), Fragile(x)/Broken(x)}.
Assume that all we know is that the robot initially carries the
vase, Holds(Carries(Vase), S0 ). As before, the effect axiom
tells us that the robot does not carry the vase any more at
S1 . Additionally, since we do not know whether the vase
was fragile at S0 , there is no reason to believe that it is
broken after dropping it, hence ¬Broken(Vase) still holds by
default at S1 . But now, due to the presence of conditional
effects, the effect axiom for Drop(Vase) clearly entails
¬Holds(Broken(Vase), S1 ) ⊃ ¬Holds(Fragile(Vase), S0 ),8
3
Conditional Effects
We first investigate how the default reasoning framework of
[Baumann et al., 2010] can be extended to conditional effect
actions. As we will show, there is subtle interdependence between conditional effects and default conclusions, which requires a revision of the defaults constructed in Definition 3.
We begin by formalising how to represent conditional effects
in the domain specification language. Recall that in the unconditional case, action effects were just literals denoting the
positive and negative effects. In the case of conditional effects, these literals are augmented with a fluent formula that
specifies the conditions under which the effect materialises.
Definition 6. A conditional effect expression is of the form
Φ/ψ, where Φ is a fluent formula and ψ a fluent literal. Φ/ψ is called positive if sign(ψ) = + and negative if
sign(ψ) = −. For an action A and sequence of variables ~x
matching A’s arity, a conditional effect expression ε is called
local for A(~x) iff all free variables in ε are among ~x.
Throughout the paper, we will assume given a set ΓA(~x) of conditional effect expressions for each function A into sort ACTION with matching sequence of variables ~x. Such a set ΓA(~x) is called local-effect if all ε ∈ ΓA(~x) are local for A(~x). By Γ+_{A(~x)} we refer to the positive, by Γ−_{A(~x)} to the negative elements of ΓA(~x).
With this specification of action effects, it is easy to express
the implication “effect precondition implies effect” via suitable formulas. For this purpose, we introduce the new predicates DirT and DirF. Intuitively, DirT(f, a, s, t) says that f
is a direct positive effect of action a from s to t; symmetrically, DirF(f, a, s, t) says that f is a direct negative effect.6
Footnote 7: The second time argument t of macro Activatedε(f, s, t) will only be needed later when we introduce non-deterministic effects.
Footnote 8: This is just the contrapositive of the implication expressed by the effect axiom.
Footnote 6: Notice that these new predicates are in contrast to Definition 4, where DirectT and DirectF are merely syntactic sugar.
and thus we can draw the conclusion
default only if it is known that a conflict cannot arise, that is,
if it is known that the contradictory direct effect cannot materialise. To this end, we extend the original default prerequisite Preδ (s, t) = Φ[t] ∧ ¬(Φ[s] ∧ ¬ψ[s]) that only requires
the precondition to hold and the default not to be violated
previously: we will additionally stipulate that any action a
happening at the same time cannot create a conflict.
Definition 9. Let δ = Φ ⇝ ψ be a state default and s, t : TIME be variables.
(Σ^break ∪ Σ^break_∆, ∆^break_Init ∪ ∆^break_Reach) |≈
¬Holds(Fragile(Vase), S0)
This is undesired as it lets us conclude something about the
present (S0 ) using knowledge about the future (S1 ) which we
could not conclude using only knowledge and default knowledge about the present (there is no default that could conclude
¬Fragile(Vase)).
The flaw with this inference is that it makes default conclusions about a fluent whose truth value is affected by an action
at the same time. This somewhat contradicts our intended
usage of defaults about states: we originally wanted to express reasonable assumptions about fluents whose values are
unknown.
Generalising the example, the undesired behaviour occurs whenever there exists a default ΦD ⇝ ψ with conclusion ψ whose negation ¬ψ might be brought about by a conditional effect ΦC/¬ψ. The faulty inference then goes like this:
Safeδ(s, t) =def (∀a)(Poss(a, s, t) ⊃ ¬Dir(¬ψ, a, s, t))

δPoss =def
Preδ(s, t) ∧ Safeδ(s, t) : Def(ψ, s, t)
Def(ψ, s, t)        (16)

For a set ∆ of state defaults, ∆Poss =def {δPoss | δ ∈ ∆}.
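As a small illustration of Definition 9 (a sketch under our own string-based encoding of formulas; the helper names are not from the paper), the guarded default δPoss can be assembled syntactically from the two components Φ and ψ of a state default:

def negate(lit):
    # literals as strings, "~F" for the negation of F (assumed encoding)
    return lit[1:] if lit.startswith("~") else "~" + lit

def poss_default(phi, psi):
    # Build the textual Reiter default of (16) for the state default  phi ~> psi.
    pre = f"Pre_{{{phi} ~> {psi}}}(s,t)"
    safe = f"(ALL a)(Poss(a,s,t) -> ~Dir({negate(psi)},a,s,t))"   # Safe_delta(s,t)
    return f"{pre} & {safe} : Def({psi},s,t) / Def({psi},s,t)"

# For Example 1, poss_default("T", "~Broken(x)") produces the default
# delta^break_Poss shown in the continuation of the example.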
In the example domain, applying the above definition
yields the following.
Example 1 (Continued). For the state default δ^break saying that objects are usually not broken, we have Safe_δ^break(s, t) = (∀a)(Poss(a, s, t) ⊃ ¬DirT(Broken(x), a, s, t)). This expresses that the state default can be safely applied from s to t whenever, for any action a happening at the same time, it is known that a does not cause a violation of this default at the ending time point t. The resulting default δ^break_Poss is
ΦD [t] ⊃ Def (ψ, s, t) ⊃ ψ[t] ⊃ ¬Dir(¬ψ, s, t) ⊃ ¬ΦC [s]
Obviously, this inference is only undesired if there is no information about the effect’s precondition at the starting time
point of the action. This motivates our formal definition of
the conditions under which a so-called conflict between an
action effect and a default conclusion arises.
Definition 8. Let (Σ, ∆) be a domain axiomatisation with defaults, E be an extension for (Σ, ∆), α be a ground action and δ = Φ ⇝ ψ be a ground state default. We say that there is a conflict between α and δ in E iff there exist ground time points σ and τ such that for some i ≥ 0 we have
1. (a) Ei ⊭ Poss(α, σ, τ) ⊃ ¬Dir(¬ψ, α, σ, τ)
   (b) Ei ⊭ Def(ψ, σ, τ)
2. (a) Ei+1 |= Poss(α, σ, τ) ⊃ ¬Dir(¬ψ, α, σ, τ)
   (b) Ei+1 |= Def(ψ, σ, τ)
In words, a conflict arises in an extension if up to some stage
i, before we make the default conclusion ψ, we cannot conclude the effect ¬ψ will not occur (1); after concluding ψ by
default, we infer that ¬ψ cannot occur as direct effect (2). We
can now go back to the example seen earlier and verify that
the counter-intuitive conclusion drawn there was indeed due
to a conflict in the sense of the above definition.
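The conflict conditions of Definition 8 can also be phrased operationally. The sketch below (our own simplification) represents the stages E0, E1, ... of an extension as sets of ground formula strings and uses set membership as a crude stand-in for classical consequence, which is adequate only for illustration.

def has_conflict(stages, alpha, psi, sigma, tau):
    # stages = [E0, E1, ...], each a set of ground formula strings.
    neg_psi = psi[1:] if psi.startswith("~") else "~" + psi
    guard = f"Poss({alpha},{sigma},{tau}) -> ~Dir({neg_psi},{alpha},{sigma},{tau})"
    concl = f"Def({psi},{sigma},{tau})"
    for e_i, e_next in zip(stages, stages[1:]):
        if guard not in e_i and concl not in e_i:        # condition 1 at stage i
            if guard in e_next and concl in e_next:      # condition 2 at stage i+1
                return True
    return False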
Example 1 (Continued). Consider the only extension E^break for (Σ^break ∪ Σ^break_∆, ∆^break_Init ∪ ∆^break_Reach). Before applying any defaults whatsoever, we know that dropping the vase is possible: E^break_0 |= Poss(Drop(Vase), S0, S1); but we do not know if the vase is fragile and hence E^break_0 ⊭ ¬DirT(Broken(Vase), Drop(Vase), S0, S1) (item 1). After applying all the defaults, we know that the vase is not broken at S1: E^break_1 |= DefF(Broken(Vase), S0, S1). Hence, it cannot have been broken by dropping it in S0, that is, E^break_1 |= ¬DirT(Broken(Vase), Drop(Vase), S0, S1) (item 2), thus it cannot have been fragile in the initial situation.
In the following, we will modify the definition of Reiter
defaults from [Baumann et al., 2010] to eliminate the possibility of such conflicts. The underlying idea is to apply a
¬Holds(Broken(x), s) ∧ Safe_δ^break(s, t) : DefF(Broken(x), s, t)
DefF(Broken(x), s, t)
As we will see later (Theorem 3), the default closure axioms ¬Pre_{Φ⇝ψ}(s, t) ⊃ ¬Def(ψ, s, t) for preserving the commonsense principle of inertia in the presence of inapplicable defaults need not be modified. With our new defaults,
we can now redefine the concept of a domain axiomatisation
with defaults for conditional effect actions.
Definition 10. Let Σ be a domain axiomatisation where the
effect axioms are given by Definition 7 and let ∆ be a set
of state defaults. The corresponding domain axiomatisation
with defaults is the pair (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ).
The direct effect formulas that determine DirT and DirF
will be redefined twice in this paper. We will understand the
above definition to be retrofitted with their latest version. The
extension to conditional effects is a proper generalisation of
the original approach of Section 2.3 for the special case of
unconditional effect actions, as is shown below.
Theorem 2. Consider a domain axiomatisation with only unconditional action effects and a set ∆ of state defaults. Let
Ξ1 = (Σ ∪ Σ∆ , ∆Init ∪ ∆Reach ) be the corresponding domain
axiomatisation with defaults of [Baumann et al., 2010], and
let Ξ2 = (Σ′ ∪ Σ∆ , ∆Init ∪ ∆Poss ) be the domain axiomatisation with defaults according to Definition 10. For a state formula Ψ and time point τ , we have Ξ1 |≈ Ψ[τ ] iff Ξ2 |≈ Ψ[τ ].
Proof sketch. For unconditional effects, a ground Dir atom is
by Definition 7 equivalent to the corresponding Direct macro,
hence the effect axioms of the two approaches are equivalent.
Furthermore, the truth values of ground DirT and DirF atoms
are always fixed, and consequently each Reiter default (16)
defined above is applicable whenever the original Reach default (3) of [Baumann et al., 2010] is applicable.
4
Non-Local Effects
2. Defaults override persistence:
   (A) Let Φ′′/ψ, Φ′′/¬ψ ∉ Γα for all Φ′′;
   (B) for each δ′ = Φ′ ⇝ ¬ψ ∈ ∆, let δ′ be not applicable to E; and
   (C) E |= Preδ(σ, τ) ∧ Safeδ(σ, τ).
   Then E |= ψ[τ].
3. The frame assumption is correctly implemented:
   For all fluent formulas Φ′′, let Φ′′/ψ, Φ′′/¬ψ ∉ Γα and for all state defaults δ′ with consequent ψ or ¬ψ, let E |= ¬Preδ′(σ, τ). Then E |= ψ[σ] ≡ ψ[τ].
Up to here, conditional effect expressions for an action A(~x)
were restricted to contain only variables among ~x. Considering a ground instance A(~ς) of an action, this means that the
set of objects that can possibly be affected by this action is already fixed to ~ς. This is a restriction because it can make the
specification of certain actions at least cumbersome or utterly
impossible, for example actions that affect a vast number of
(or all of the) domain elements at once.
The gain in expressiveness when allowing non-local action
effects comes at a relatively low cost: it suffices to allow additional free variables ~y in the conditional effect expressions.
They represent the objects that may be affected by the action
without being among the action arguments ~x.
Definition 11. Let A be a function into sort ACTION and
~x a sequence of variables matching A’s arity. Let ε be
a conditional effect expression of the form Φ/F (~x′ , ~y ) or
Φ/¬F (~x′ , ~y ) with free variables ~x′ , ~y , where ~x′ ⊆ ~x and ~y
is disjoint from ~x.
For variables f : FLUENT and s, t : TIME, the following
macro expresses that ε has been activated for f from s to t:
Proof sketch. Similar to the proof of Theorem 3 in [Baumann
et al., 2010], adapted to our definition of Reiter defaults.
5
Disjunctive Effects
The next and final addition to effect axiom (8) is the step of
generalising purely deterministic action effects. Disjunctive
action effects have been studied in the past [Kartha, 1994;
Shanahan, 1997; Giunchiglia et al., 1997; Thielscher, 2000].
Our contribution in this paper is two-fold. First, we express
disjunctive effects by building them into the effect axiom inspired by work on nonmonotonic causal theories [Giunchiglia
et al., 2004]. This works without introducing additional function symbols – called determining fluents [Shanahan, 1997]
– for which persistence is not assumed and that are used to
derive indeterminate effects via conditional effects. The second and more important contribution is the combination of
non-deterministic effects with state defaults. We claim that
it brings a significant representational advantage: Disjunctive effects can explicitly represent potentially different outcomes of an action of which none is necessarily predictable.
At the same time, state defaults can be used to model the
action effect that normally obtains. For example, dropping
an object might not always completely break it, but most of
the time only damage it. This can be modelled in our framework by specifying “broken or damaged” as disjunctive effect
of the drop action, and then including the default “normally,
dropped objects are damaged” to express the usual outcome.
Next, we define how disjunctive effects are declared by the
user and accommodated into the theory. The basic idea is to
allow disjunctions of fluent literals ψ1 ∨ . . . ∨ ψn in the effect
part of a direct effect expression. The intended meaning of
these disjunctions is that after action execution, at least one
of the effects ψi holds.
Definition 12. Let Φ be a fluent formula and
Ψ = ψ1 ∨ . . . ∨ ψn be a disjunction of fluent literals.
The pair Φ/Ψ is called a conditional disjunctive effect
expression (or cdee).
Firstly, we want to guarantee that at least one effect out of ψ1 ∨ . . . ∨ ψn occurs. To this end, we say for each ψi that non-occurrence of all the other effects ψj with j ≠ i is a sufficient cause for ψi to occur. We build into the effect axiom (in the same way as before) the n implications

Φ[s] ∧ ¬ψ2[t] ∧ . . . ∧ ¬ψn[t] ⊃ Caused(ψ1, a, s, t)
...
Φ[s] ∧ ¬ψ1[t] ∧ . . . ∧ ¬ψn−1[t] ⊃ Caused(ψn, a, s, t)
Activatedε(f, s, t) =def (∃~y)(f = F(~x′, ~y) ∧ Φ[s])
The direct positive and negative effect formulas are of the
form (12) and (13).
Note that according to this definition, free variables ~y are
quantified existentially when they occur in the context Φ and
universally when they occur in the consequence ψ. They thus
not only express non-local effects but also non-local contexts.
Example 2 (Exploding Bomb [Reiter, 1991]). In this domain, objects might get broken not by getting dropped, but because a bomb in their proximity explodes: ΓDetonate(b) = {Bomb(b) ∧ Near(b, x)/Broken(x)}. Def. 11 yields the direct effect formulas DirT(f, Detonate(b), s, t) ≡ (∃x)(f = Broken(x) ∧ Holds(Near(x, b), s)) and DirF(f, Detonate(b), s, t) ≡ ⊥.
In this example, the defaults from Definition 9 also prevented conflicts possibly arising from non-local effects. We
will later see that this is the case for all domains with local
and non-local effect actions.
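For Example 2, the derived direct effect formula can be evaluated directly once a starting state is given explicitly; in the sketch below the state is a set of ground fluent strings, which is our own encoding rather than anything prescribed by the paper.

def dirt_detonate(state, b):
    # Fluents f with DirT(f, Detonate(b), s, t): every Broken(x) such that
    # Near(x, b) holds in the starting state (cf. the formula derived above).
    broken = set()
    for fluent in state:
        if fluent.startswith("Near(") and fluent.endswith(f",{b})"):
            x = fluent[len("Near("):-len(f",{b})")]
            broken.add(f"Broken({x})")
    return broken

# dirt_detonate({"Bomb(B1)", "Near(Vase,B1)", "Near(Cup,B1)"}, "B1")
# == {"Broken(Vase)", "Broken(Cup)"}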
Like the original framework, our extension implements a
particular preference ordering between causes that determine
a fluent’s truth value. This means that whenever two causes
are in conflict – for example, a state default says an object
is not broken, and an action effect says it is – the preferred
cause takes precedence. The preferences are
direct effects < default conclusions < persistence,
where a < b means “a is preferred to b”. The theorem below
proves that this preference ordering is indeed established.
Theorem 3. Let Σ be a domain axiomatisation, ∆ be a set of state defaults, δ = Φ ⇝ ψ ∈ ∆ be a state default, E be an extension for the domain axiomatisation with state defaults (Σ ∪ Σ∆, ∆Init ∪ ∆Poss), ϕ be a ground fluent, and E |= Poss(α, σ, τ) for ground action α and time points σ, τ.
1. Effects override everything:
   Φ/(¬)ϕ ∈ Γα and E |= Φ[σ] imply E |= (¬)ϕ[τ].
mined about Damaged(x) not being among its negative effects), the default δPoss is applicable and we conclude
This, together with the persistence assumption, is in effect
an exclusive or where only exactly one effect occurs (given
that no other effects occur simultaneously). Thus we add, for
each literal, its truth as sufficient cause for itself being true:
(Σ^break ∪ Σ^break_∆, ∆^break_Init ∪ ∆^break_Poss) |≈
Holds(Carries(Vase), S0) ∧ Holds(Damaged(Vase), S1)
Φ[s] ∧ ψ1[t] ⊃ Caused(ψ1, a, s, t)
...
Φ[s] ∧ ψn[t] ⊃ Caused(ψn, a, s, t)
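Both families of implications that the effect axiom acquires for a conditional disjunctive effect expression can be generated mechanically; the following sketch produces them as plain strings (the encoding and helper names are assumptions made only for this illustration).

def cdee_implications(phi, psis):
    # phi / (psi_1 v ... v psi_n): emit, for each psi_i, the implication that
    # non-occurrence of the other effects causes psi_i, and the implication
    # that psi_i's own truth is a sufficient cause for itself.
    neg = lambda l: l[1:] if l.startswith("~") else "~" + l
    out = []
    for i, psi in enumerate(psis):
        others = " & ".join(f"{neg(p)}[t]" for j, p in enumerate(psis) if j != i)
        context = f"{phi}[s]" + (f" & {others}" if others else "")
        out.append(f"{context} -> Caused({psi},a,s,t)")
        out.append(f"{phi}[s] & {psi}[t] -> Caused({psi},a,s,t)")
    return out

# cdee_implications("Fragile(x)", ["Broken(x)", "Damaged(x)"]) yields the four
# implications used for the modified Drop action in the continued example.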
If we now observe that the vase is broken after all –
Holds(Broken(Vase), S1 ) – and add this information to the
knowledge base, we will learn that this was an action effect:
This makes every interpretation where at least one of the
mentioned literals became true a model of the effect axiom.
For the next definition, we identify a disjunction of literals
Ψ = ψ1 ∨ . . . ∨ ψn with the set of literals {ψ1 , . . . , ψn }.
(Σ^break ∪ Σ^break_∆, ∆^break_Init ∪ ∆^break_Poss) |≈
Holds(Broken(Vase), S1) ⊃ DirT(Broken(Vase), Drop(Vase), S0, S1)
Definition 13. Let ε = Φ/Ψ be a conditional disjunctive effect expression, ψ ∈ Ψ and f : FLUENT and s, t : TIME be
variables. The following macro expresses that effect ψ of cdee
ε has been activated for f from s to t:
Furthermore, the observation allows us to rightly infer that
the vase was fragile at S0 .
It is worth noting that for a cdee Φ/Ψ with deterministic effect Ψ = {ψ}, the macro ActivatedΦ/Ψ,ψ (f, s, t) expressing
activation of this effect is equivalent to ActivatedΦ/ψ (f, s, t)
from Definition 7 for activation of the conditional effect;
hence the direct effect formulas (17) for disjunctive effects
are a generalisation of (12), the ones for deterministic effects.
We have considered here only local non-deterministic effects
to keep the presentation simple. Of course, the notion can be
extended to non-local effects without harm.
Activatedε,ψ(f, s, t) =def f = |ψ| ∧ Φ[s] ∧ ((∧_{ψ′ ∈ Ψ\{ψ}} ¬ψ′[t]) ∨ ψ[t])
Let A be a function into sort ACTION and ΓA be a set of
conditional disjunctive effect expressions with free variables
in ~x that denote the direct conditional disjunctive effects of
A(~x). The direct positive and negative effect formulas are
DirT(f, A(~x), s, t) ≡ ∨_{Φ/Ψ ∈ ΓA(~x), ψ ∈ Ψ, sign(ψ)=+} Activated_{Φ/Ψ,ψ}(f, s, t)        (17)
6
Properties of the Extended Framework
We have already seen in previous sections that the approach
to default reasoning about actions presented here has certain
nice properties: it is a generalisation of the basic approach
[Baumann et al., 2010] and it implements a particular preference ordering among causes. While those results were mostly
straightforward adaptations, the theorem below is novel. It
states that conflicts between conditional effects and default
conclusions in the sense of Definition 8 cannot occur.
Theorem 4. Let (Σ, ∆) be a domain axiomatisation with defaults, E be an extension for (Σ, ∆) and δ = Φ ⇝ ψ be a state default. Furthermore, let i ≥ 0 be such that Def(ψ, σ, τ) ∉ Ei and Def(ψ, σ, τ) ∈ Ei+1. Then for all ground actions α, Poss(α, σ, τ) ⊃ ¬Dir(¬ψ, α, σ, τ) ∈ Ei.
DirF(f, A(~x), s, t) ≡ ∨_{Φ/Ψ ∈ ΓA(~x), ψ ∈ Ψ, sign(ψ)=−} Activated_{Φ/Ψ,ψ}(f, s, t)        (18)
The implementation of the example sketched above illustrates the definition.
Example 1 (Continued). We once again modify the action
Drop(x). Now a fragile object that is dropped becomes
not necessarily completely broken, but might only get damaged. To this end, we record in the new fluent Dropped(x)
that the object has been dropped and write the state default δ = Dropped(x) ⇝ Damaged(x) saying that dropped
objects are usually damaged. Together, these two express
the normal outcome of the action drop. Formally, the action effects are ΓDrop(x) = { ⊤/¬Carries(x), ⊤/Dropped(x),
Fragile(x)/Broken(x) ∨ Damaged(x)}. Constructing the direct effect formulas as per Definition 13 yields
Proof. According to Def. 2, we have Ei+1 = Th(Ei ) ∪ ∆i ;
hence, Def (ψ, σ, τ ) ∈ Ei+1 can have two possible reasons:
1. Def (ψ, σ, τ ) ∈ Th(Ei ) \ Ei . By construction, this can
only be due to effect axiom (8), more specifically, we
have (1) Ei |= Caused(ψ, α, σ, τ ) ∧ ¬Frame(ψ, σ, τ ) ∧
¬Dir(ψ, σ, τ ) and (2) Ei |= ¬Caused(¬ψ, α, σ, τ ),
whence Ei |= ¬Dir(¬ψ, α, σ, τ ) proving the claim.
2. Def (ψ, σ, τ ) ∈ ∆i . By definition of δPoss in Def. 9,
Preδ (σ, τ ) ∧ Safeδ (σ, τ ) ∈ Ei , whereby we can conclude Poss(α, σ, τ ) ⊃ ¬Dir(¬ψ, α, σ, τ ) ∈ Ei .
DirT(f, Drop(x), s, t) ≡
f = Dropped(x)
∨ (f = Broken(x) ∧ Holds(Fragile(x), s) ∧
(¬Holds(Damaged(x), t) ∨ Holds(Broken(x), t)))
∨ (f = Damaged(x) ∧ Holds(Fragile(x), s) ∧
(¬Holds(Broken(x), t) ∨ Holds(Damaged(x), t)))
Note that conflicts already arise with conditional, local effects; the framework however makes sure there are no conflicts even for conditional, non-local, disjunctive effects.
Finally, the existence of extensions for domain axiomatisations with state defaults can still be guaranteed for the extended framework.
Since the effect axiom of Drop(x) is itself not determined
about the status of Broken(x) and Damaged(x) (but is deter-
[Denecker and Ternovska, 2007] Marc Denecker and Eugenia Ternovska. Inductive Situation Calculus. AIJ, 171(5–
6):332–360, 2007.
[Giunchiglia et al., 1997] Enrico Giunchiglia, G. Neelakantan Kartha, and Vladimir Lifschitz. Representing Action:
Indeterminacy and Ramifications. AIJ, 95(2):409–438,
1997.
[Giunchiglia et al., 2004] Enrico Giunchiglia, Joohyung
Lee, Vladimir Lifschitz, Norman McCain, and Hudson Turner.
Nonmonotonic Causal Theories.
AIJ,
153(1-2):49–104, 2004.
[Kartha, 1994] G. Neelakantan Kartha. Two Counterexamples Related to Baker’s Approach to the Frame Problem.
AIJ, 69(1–2):379–391, 1994.
[Lakemeyer and Levesque, 2009] Gerhard Lakemeyer and
Hector Levesque. A Semantical Account of Progression in
the Presence of Defaults. In Proceedings of IJCAI, pages
842–847, 2009.
[McCarthy and Hayes, 1969] John McCarthy and Patrick J.
Hayes. Some Philosophical Problems from the Standpoint
of Artificial Intelligence. In Machine Intelligence, pages
463–502. Edinburgh University Press, 1969.
[Michael and Kakas, 2011] Loizos Michael and Antonis
Kakas. A Unified Argumentation-Based Framework for
Knowledge Qualification. In E. Davis, P. Doherty, and
E. Erdem, editors, Proceedings of the Tenth International
Symposium on Logical Formalizations of Commonsense
Reasoning, Stanford, CA, March 2011.
[Reiter, 1980] Raymond Reiter. A Logic for Default Reasoning. AIJ, 13:81–132, 1980.
[Reiter, 1991] Raymond Reiter. The Frame Problem in the
Situation Calculus: A Simple Solution (Sometimes) and
a Completeness Result for Goal Regression. In Artificial
Intelligence and Mathematical Theory of Computation –
Papers in Honor of John McCarthy, pages 359–380. Academic Press, 1991.
[Reiter, 2001] Raymond Reiter. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. The MIT Press, September 2001.
[Shanahan, 1997] Murray Shanahan. Solving the Frame
Problem: A Mathematical Investigation of the Common
Sense Law of Inertia. The MIT Press, February 1997.
[Strass and Thielscher, 2009] Hannes Strass and Michael
Thielscher. Simple Default Reasoning in Theories of Action. In Proceedings of AI, pages 31–40, Melbourne, Australia, December 2009. Springer-Verlag Berlin Heidelberg.
[Thielscher, 2000] Michael Thielscher. Nondeterministic
Actions in the Fluent Calculus: Disjunctive State Update
Axioms. In Intellectics and Computational Logic (to Wolfgang Bibel on the occasion of his 60th birthday), pages
327–345, Deventer, The Netherlands, The Netherlands,
2000. Kluwer, B.V.
[Thielscher, 2011] Michael Thielscher. A Unifying Action
Calculus. AIJ, 175(1):120–141, 2011.
Theorem 5. Let Σ be a domain axiomatisation and ∆ be a set
of state defaults. Then the corresponding domain axiomatisation with defaults (Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ) has an extension.
If furthermore Σ is consistent, then so are all extensions for
(Σ ∪ Σ∆ , ∆Init ∪ ∆Poss ).
Proof. Existence of an extension is a corollary of Theorem
3.1 in [Reiter, 1980] since the defaults in ∆Init ∪ ∆Poss are
still normal. If Σ is consistent, then so is Σ ∪ Σ∆ by the argument in the proof of Theorem 4 in [Baumann et al., 2010].
Consistency of all extensions then follows from Corollary 2.2
in [Reiter, 1980].
Additionally, it is easy to see that the domain specifications provided by the user are still modular: different parts of
the specifications, such as conditional effect expressions and
state defaults, are completely independent of each other from
a user’s point of view. Yet, the intricate semantic interactions
between them are correctly dealt with.
7
Discussion
We have presented an extension to a recently introduced
framework for default reasoning in theories of actions and
change. The extension increases the range of applicability of
the framework while fully retaining its desirable properties:
we can now express context-dependent effects of actions, actions with a potentially global effect range and indeterminate
effects of actions – all the while domain descriptions have not
become significantly more complex, and default extensions of
the framework still provably exist.
There is not much related work concerning the kind of default reasoning about actions we consider here. [Denecker
and Ternovska, 2007] enriched the situation calculus [Reiter,
2001] with inductive definitions. While they provide a nonmonotonic extension of an action calculus, the intended usage
is to solve the ramification problem rather than to do the kind
of defeasible reasoning we are interested in this work. [Lakemeyer and Levesque, 2009] provide a progression-based semantics for state defaults in a variant of the situation calculus,
but without looking at nondeterministic actions. In an earlier
paper [Strass and Thielscher, 2009], we explored default effects of nondeterministic actions, albeit in a much more restricted setting: there, actions had only unconditional effects
– either deterministic or disjunctive of the form f ∨ ¬f –, and
defaults had only atomic components, that is, they were of the
form (¬)Holds(f, t) : (¬)Holds(g, t)/(¬)Holds(g, t). Most
recently, [Michael and Kakas, 2011] gave an argumentation-based semantics for propositional action theories with state
defaults. While being more flexible in terms of preferences
between causes, their approach is constricted to a linear time
structure built into the language and does not make a clear
ontological distinction between fluents and actions.
References
[Baumann et al., 2010] Ringo Baumann, Gerhard Brewka,
Hannes Strass, Michael Thielscher, and Vadim Zaslawski.
State Defaults and Ramifications in the Unifying Action
Calculus. In Proceedings of KR, pages 435–444, Toronto,
Canada, May 2010.
A Logic for Specifying Partially Observable Stochastic Domains
Gavin Rens1,2 and Thomas Meyer1,2 and Alexander Ferrein3 and Gerhard Lakemeyer3
{grens,tmeyer}@meraka.org.za {ferrein,gerhard}@cs.rwth-aachen.de
1
2
CSIR Meraka Institute, Pretoria, South Africa
University of KwaZulu-Natal, School of Computer Science, South Africa
3
RWTH Aachen University, Informatik, Germany
Abstract
with expected meanings. The robot can perceive observations only from the set Ω = {obsNil , obsLight, obsMedium,
obsHeavy}. When the robot performs a weigh action (i.e., it
activates its ‘weight’ sensor) it will perceive either obsLight,
obsMedium or obsHeavy; for other actions, it will perceive
obsNil . The robot experiences its environs through three
Boolean features: P = {full , drank, holding} meaning respectively that the oil-can is full, that the robot has drunk the
oil and that it is currently holding something in its gripper.
Given a formalization K of our scenario, the robot may
have the following queries:
We propose a novel modal logic for specifying
agent domains where the agent’s actuators and sensors are noisy, causing uncertainty in action and
perception. The logic draws both on POMDP theory and logics of action and change. The development of the logic builds on previous work in which
a simple multi-modal logic was augmented with
first-class observation objects. These observations
can then be used to represent the set of observations in a POMDP model in a natural way. In this
paper, a subset of the simple modal logic is taken
for the new logic, in which modal operators may
not be nested. The modal operators are then extended with notions of probability. It will be shown
how stochastic domains can be specified, including
new kinds of axioms dealing with perception and a
frame solution for the proposed logic.
• Is the probability of perceiving that the oil-can is light
0.7 when the can is not full, and have I drunk the
oil, and am I holding the can? Does (obsLight |
weigh)0.7 (¬full ∧ drank ∧ holding) follow from K?
• If the oil-can is empty and I’m not holding it, is there
a 0.9 probability that I’ll be holding it after grabbing
it, and a 0.1 probability that I’ll have missed it? Does
(¬full ∧ ¬holding) → ([grab]0.9 (¬full ∧ holding) ∧
[grab]0.1 (¬full ∧ ¬holding)) follow from K?
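The vocabulary of the scenario can be written down directly; the following Python sketch fixes the fluents, actions and observations and enumerates the conceivable worlds as truth assignments (the dictionary encoding is our own illustration; the probabilities mentioned in the queries are not part of it).

from itertools import product

P = ["full", "drank", "holding"]                         # Boolean features
A = ["grab", "drink", "weigh", "replace"]                 # actions
OMEGA = ["obsNil", "obsLight", "obsMedium", "obsHeavy"]   # observations

# A world assigns a truth value to every fluent: 2**3 = 8 conceivable worlds.
worlds = [dict(zip(P, values)) for values in product([True, False], repeat=len(P))]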
1
Introduction and Motivation
In order for robots and intelligent agents in stochastic domains to reason about actions and observations, they must
first have a model of the domain over which to reason. For
example, a robot may need to represent available knowledge
about its grab action in its current situation. It may need
to represent that when ‘grabbing’ the oil-can, there is a 5%
chance that it will knock over the oil-can. As another example, if the robot has access to information about the weight
of an oil-can, it may want to represent the fact that the can
weighs heavy with a 90% chance in ‘situation A’, but that it
is heavy with a 98% chance in ‘situation B’.
Logic-based artificial intelligence for agent reasoning is
well established. In particular, a domain expert choosing
to represent domains with a logic can take advantage of the
progress made in cognitive robotics [Levesque and Lakemeyer, 2008] to specify domains in a compact and transparent
manner. Modal logic is considered to be well suited to reasoning about beliefs and changing situations.
POMDP theory has proven to be a good general framework
for formalizing dynamic, stochastic systems. A drawback of
traditional POMDP models is that they cannot include information about general facts and laws. Moreover, succinct axioms describing the dynamics of a domain cannot be writ-
In the physical real world, or in extremely complex engineered systems, things are not black-and-white. We live in
a world where there can be shades of truth and degrees of belief. Part of the problem is that agents’ actuators and sensors
are noisy, causing uncertainty in their action and perception.
In this paper, we propose a novel logic that draws on partially
observable Markov decision process (POMDP) theory and on
logics for reasoning about action and change, combining both
in a coherent language to model change and uncertainty.
Imagine a robot that is in need of an oil refill. There is
an open can of oil on the floor within reach of its gripper.
If there is nothing else in the robot’s gripper, it can grab the
can (or miss it, or knock it over) and it can drink the oil by
lifting the can to its ‘mouth’ and pouring the contents in (or
miss its mouth and spill). The robot may also want to confirm
whether there is anything left in the oil-can by weighing its
contents. And once holding the can, the robot may wish to
place it back on the floor. In situations where the oil-can is
full, the robot gets 5 units of reward for grabbing the can, and
it gets 10 units for a drink action. Otherwise, the robot gets
no rewards. Rewards motivate an agent to behave as desired.
The domain is (partially) formalized as follows. The robot
has the set of actions A = {grab, drink, weigh, replace}
and Schmolze, 2005; Sanner and Kersting, 2010; Poole,
1998]. But for two of these, the frameworks are not logics per
se. The first [Wang and Schmolze, 2005] is based on Functional STRIPS, “which is a simplified first-order language that
involves constants, functions, and predicate symbols but does
not involve variables and quantification”. Their representations of POMDPs are relatively succinct and they have the advantage of using first-order predicates. The STRIPS-like formalism is geared specifically towards planning, though, and
their work does not mention reasoning about general facts.
Moreover, in their approach, action-nondeterminism is modeled by associating sets of deterministic action-outcomes per
nondeterministic action, whereas SLAOP will model nondeterminism via action effects—arguably, ours is a more natural
and succinct method. Sanner and Kersting [2010] is similar to the first formalism, but instead of Functional STRIPS,
they use the situation calculus to model POMDPs. Although
reified situations make the meaning of formulae perspicuous,
and reasoning with the situation calculus, in general, has been
accepted by the community, when actions are nondeterministic, ‘action histories’ cause difficulties in our work: The set
of possible alternative histories is unbounded and some histories may refer to the same state [Rens, 2010, Chap. 6]. When,
in future work, SLAOP is extended to express belief states
(i.e., sets of possible alternative states), dealing with duplicate states will be undesirable.
The Independent Choice Logic [Poole, 1998] is relatively
different from SLAOP; it is an extension of Probabilistic Horn
Abduction. Due to its difference, it is hard to compare to
SLAOP, but it deserves mentioning because it shares its application area with SLAOP and both are inspired by decision
theory. The future may tell which logic is better for certain
representations and for reasoning over the representations.
Finally, SLAOP was not conceived as a new approach to
represent POMDPs, but as the underlying specification language in a larger meta-language for reasoning robots that
include notions of probabilistic uncertainty. The choice of
POMDPs as a semantic framework is secondary.
ten in POMDP theory. In this work, we develop a logic that
will further our goal of combining modal logic with POMDP
theory. That is, here we design a modal logic that can represent POMDP problems specifically for reasoning tasks in
cognitive robotics (with domain axioms). The logic for actual decision-making will be developed in later work. To facilitate the correspondence between POMDPs and an agent
logic, we require observation objects in the logic to correspond to the POMDPs’ set of observations. Before the introduction of the Logic of Actions and Observations (LAO)
[Rens et al., 2010], no modal logic had explicit observations
as first-class elements; sensing was only dealt with via special
actions or by treating actions in such a way that they somehow
get hold of observations. LAO is also able to accommodate
models of nondeterminism in the actions and models of uncertainty in the observations. But in LAO, these notions are
non-probabilistic.
In this paper we present the Specification Logic of Actions
and Observations with Probability (SLAOP). SLAOP is derived from LAO and thus also considers observations as first-class objects; however, a probabilistic component is added
to LAO for expressing uncertainty more finely. We have invented a new knowledge representation framework for our
observation objects, based on the established approaches for
specifying the behavior of actions.
We continue our motivation with a look at the related work,
in Section 2. Section 3 presents the logic and Section 4 provides some of the properties that can be deduced. Section 5
illustrates domain specification with SLAOP, including a solution to the frame problem. Section 6 concludes the paper.
2
Related Work
Although SLAOP uses probability theory, it is not for reasoning about probability; it is for reasoning about (probabilistic) actions and observations. There have been many
frameworks for reasoning about probability, but most of them
are either not concerned with dynamic environments [Fagin and Halpern, 1994; Halpern, 2003; Shirazi and Amir,
2007] or they are concerned with change, but they are not
actually logics [Boutilier et al., 2000; Bonet and Geffner,
2001]. Some probabilistic logics for reasoning about action
and change do exist [Bacchus et al., 1999; Iocchi et al., 2009],
but they lack some desirable attributes, for example, a solution to the frame problem, nondeterministic actions, or catering for sensing. There are some logics that come closer to
what we desire [Weerdt et al., 1999; Van Diggelen, 2002;
Gabaldon and Lakemeyer, 2007; Van Benthem et al., 2009],
that is, they are modal and they incorporate notions of probability, but they were not created with POMDPs in mind and
they don’t take observations as first-class objects. One nonlogical formalism for representing POMDPs [Boutilier and
Poole, 1996] exploits structure in the problems for more compact representations. In (logic-based) cognitive robotics, such
compact representation is the norm, for example, specifying
only local effects of actions, and specifying a value related to
a set of states in only one statement.
On the other hand, there are three formalisms for specifying POMDPs that employ logic-based representation [Wang
3
Specification Logic of Actions and
Observations with Probability
SLAOP is a non-standard modal logic for POMDP specification for robot or intelligent agent design. The specification of robot movement has a ‘single-step’ approach
in SLAOP. As such, the syntax will disallow nesting of
modal operators; sentences with sequences of actions, like
[grab][drink][replace]drank are not allowed. Sentences
will involve at most unit actions, like [grab]holding ∨
[drink]drank. Nevertheless, the ‘single-step’ approach is
sufficient for specifying the probabilities of transitions due to
action executions. The logic to be defined in a subsequent
paper will allow an agent to query the probability of some
propositional formula ϕ after an arbitrary sequence of actions and observations.
3.1 Syntax
state-transition function, representing, for each action, transition probabilities between states; R is the reward function,
giving the expected immediate reward gained by the agent,
for any state and agent action; Ω is a finite set of observations
the agent can experience of its environment; and O is the observation function, giving, for each action and the resulting
state, a probability distribution over observations, representing the agent’s ‘trust’ in its observations.
Our semantics follows that of multi-modal logic K. However, SLAOP structures are non-standard in that they are extensions of structures with the form hW, Ri, where W is a
finite set of worlds such that each world assigns a truth value
to each atomic proposition, and R is a binary relation on W .
Intuitively, when talking about some world w, we mean a
set of features (fluents) that the agent understands and that
describes a state of affairs in the world or that describes a
possible, alternative world. Let w : P 7→ {0, 1} be a total
function that assigns a truth value to each fluent. Let C be
the set of all possible functions w. We call C the conceivable
worlds.
The vocabulary of our language contains four sorts:
1. a finite set of fluents (alias propositional atoms) P = {p1, . . . , pn},
2. a finite set of names of atomic actions A = {α1, . . . , αn},
3. a finite set of names of atomic observations Ω = {ς1, . . . , ςn},
4. a countable set of names Q = {q1, q2, . . .} of rational numbers in Q.
From now on, denote Q ∩ (0, 1] as Q∩ . We refer to elements
of A ∪ Ω ∪ Q as constants. We are going to work in a multimodal setting, in which we have modal operators [α]q , one
for each α ∈ A, and predicates (ς | α)q and (ς | α)✸ , for
each pair in Ω × A.
Definition 3.1 Let α, α′ ∈ A, ς, ς ′ ∈ Ω, q ∈ (Q ∩ (0, 1]),
r, c ∈ Q and p ∈ P. The language of SLAOP, denoted
LSLAOP , is the least set of Φ defined by the grammars:
ϕ ::= p | ⊤ | ¬ϕ | ϕ ∧ ϕ.
Φ ::= ϕ | [α]q ϕ | (ς | α)q | (ς | α)✸ | α = α′ | ς = ς′ | Reward(r) | Cost(α, c) | ¬Φ | Φ ∧ Φ.

As usual, we treat ⊥, ∨, → and ↔ as abbreviations. We shall refer to formulae ϕ ::= p | ⊤ | ¬ϕ | ϕ ∧ ϕ as static. If a formula is static, it mentions no actions and no observations.

[α]q ϕ is read 'The probability of reaching a world in which ϕ holds after executing α, is equal to q'. [α] abbreviates [α]1. ⟨α⟩ϕ abbreviates ¬[α]¬ϕ. (ς | α)q can be read 'The probability of perceiving ς is equal to q, given α was performed'. (ς | α)✷ abbreviates (ς | α)1. (ς | α)✸ is read 'It is possible to perceive ς, given α was performed'.

The definition of a POMDP reward function R(a, s) may include not only the expected rewards for being in the states reachable from s via a, but it may deduct the cost of performing a in s. To specify rewards and execution costs in SLAOP, we require Reward and Cost as special predicates. Reward(r) can be read 'The reward for being in the current situation is r units,' and we read Cost(α, c) as 'The cost for executing α is c units.'

Let VA = {v1^α, v2^α, . . .} be a countable set of action variables and VΩ = {v1^ς, v2^ς, . . .} a countable set of observation variables. Let ϕ|v^α1 ∧ . . . ∧ ϕ|v^αn be abbreviated by (∀v^α)ϕ, where ϕ|v^c means ϕ with all variables v ∈ (VA ∪ VΩ) appearing in it replaced by constant c of the right sort (action or observation). Quantification over observations is similar to that for actions; the symbol ∃ is also available for abbreviation, with the usual meaning.

Definition 3.2 A SLAOP structure is a tuple S = ⟨W, R, O, N, Q, U⟩ such that
1. W ⊆ C: the set of possible worlds (corresponding to S);
2. R: a mapping that provides an accessibility relation Rα : W × W × Q∩ for each α ∈ A (corresponding to T ); Given some w− ∈ W, we require that Σ_{(w−,w+,pr)∈Rα} pr = 1; If (w−, w+, pr), (w−, w+, pr′) ∈ Rα, then pr = pr′;
3. O: a nonempty finite set of observations
(corresponding to Ω);
4. N : Ω 7→ O is a bijection that associates to each name
in Ω, a unique observation in O;
5. Q: a mapping that provides a perceivability relation Qα : O × W × Q∩ for each α ∈ A (corresponding to O); Given some w+ ∈ W, we require that Σ_{(o,w+,pr)∈Qα} pr = 1; If (o, w+, pr), (o, w+, pr′) ∈ Qα, then pr = pr′;
6. U : a pair ⟨Re, Co⟩ (corresponding to R), where Re : W 7→ Q is a reward function and Co is a mapping that provides a cost function Coα : W 7→ Q for each α ∈ A;
7. Observation-per-action condition: For all α ∈ A, if
(w, w′ , prα ) ∈ Rα , then there is an o ∈ O s.t.
(o, w′ , pro ) ∈ Qα ;
8. Nothing-for-nothing condition: For all w, if there exists
no w′ s.t. (w, w′ , pr) ∈ Rα for some pr, then
Coα (w) = 0.
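Read operationally, Definition 3.2 fixes the data a SLAOP structure carries. The container below is a minimal sketch of that data under an assumed encoding (worlds as frozensets of true fluents, actions and observations as strings); it is not the paper's notation.

from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

World = FrozenSet[str]   # a world, given by the set of fluents it makes true

@dataclass
class SLAOPStructure:
    W: Set[World]                                    # possible worlds
    R: Dict[str, Set[Tuple[World, World, float]]]    # per-action transition triples
    O: Set[str]                                      # observations
    N: Dict[str, str]                                # observation names -> observations
    Q: Dict[str, Set[Tuple[str, World, float]]]      # per-action perceivability triples
    Re: Dict[World, float]                           # reward function
    Co: Dict[str, Dict[World, float]]                # per-action cost function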
A corresponds to A and Ω to Ω. Rα defines which worlds
w+ are accessible via action α performed in world w− and
the transition probability pr ∈ Q∩ . Qα defines which observations o are perceivable in worlds w+ accessible via action
α and the observation probability pr ∈ Q∩ . We prefer to
exclude relation elements referring to transitions that cannot
occur, hence why pr ∈ Q∩ and not pr ∈ Q ∩ [0, 1].
3.2 Semantics
While presenting our semantics, we show how a POMDP, as
defined below, can be represented by a SLAOP structure.
A POMDP [Kaelbling et al., 1998] (for our purposes) is a
tuple hS, A, T , R, Ω, Oi, where S is a finite set of states that
the agent can be in; A is a finite set of agent actions; T is the
17
Because N is a bijection, it follows that |O| = |Ω| (we take
|X| to be the cardinality of set X). The value of the reward
function Re(w) is a rational number representing the reward
an agent gets for being in or getting to the world w. It must
be defined for each w ∈ W . The value of the cost function
Co(α, w− ) is a rational number representing the cost of executing α in the world w− . It must be defined for each action
α ∈ A and each w− ∈ W . Item 7 of Definition 3.2 implies
that actions and observations always appear in pairs, even if
implicitly. And item 8 seems reasonable; it states that any action that is inexecutable in world w incurs no cost for it in the
world w.
Proposition 4.1 Assume an arbitrary structure S and some w in S. Assume S, w |= [α]q θ ∧ [α]q′ ψ. Then
1. if q = q′ then no deduction can be made;
2. if q ≠ q′ then S, w |= ⟨α⟩¬(θ ↔ ψ);
3. if q > q′ then S, w |= ⟨α⟩¬(θ → ψ);
4. if q + q′ > 1 then S, w |= ⟨α⟩(θ ↔ ψ);
5. S, w |= [α]¬(θ ∧ ψ) → [α]q+q′ (θ ∨ ψ);
6. if q = 1 then S, w |= [α](ψ → θ) and S, w |= [α]q′ (θ ∧ ψ);
7. S, w |= [α]q ⊤ is a contradiction if q < 1;
8. S, w |= [α]1−q ¬ϕ iff S, w |= [α]q ϕ and q ≠ 1;
9. S, w |= ¬[α]1−q ¬ϕ iff S, w |= ¬[α]q ϕ and q ≠ 1.
Definition 3.3 (Truth Conditions) Let S be a SLAOP structure, with α, α′ ∈ A, ς, ς ′ ∈ Ω, q ∈ (Q ∩ (0, 1]) or Q∩ as
applicable, and r ∈ Q or Q as applicable. Let p ∈ P and
let ϕ be any sentence in LSLAOP . We say ϕ is satisfied at
world w in structure S (written S, w |= ϕ) if and only if the
following holds:
1. S, w |= p iff w(p) = 1 for w ∈ W ;
2. S, w |= ⊤ for all w ∈ W ;
3. S, w |= ¬ϕ iff S, w 6|= ϕ;
4. S, w |= ϕ ∧ ϕ′ iff S, w |= ϕ and S, w |= ϕ′ ;
5. S, w |= α = α′ iff α and α′ are identical;
6. S, w |= ς = ς ′ iff ς and ς ′ are identical;
7. S, w |= [α]q ϕ iff Σ_{(w,w′,pr)∈Rα, S,w′|=ϕ} pr = q;
8. S, w |= (ς | α)q iff (N(ς), w, q) ∈ Qα;
9. S, w |= (ς | α)✸ iff a q exists s.t. (N(ς), w, q) ∈ Qα;
10. S, w |= Reward(r) iff Re(w) = r;
11. S, w |= Cost(α, c) iff Coα(w) = c.
Proof:
Please refer to our draft report [Rens and Meyer, 2011].
Q.E.D.
It is worth noting that in the case when q > q′ (item 3), S, w |= ⟨α⟩¬(θ ∧ ψ) is also a consequence. But ⟨α⟩¬(θ → ψ) logically implies ⟨α⟩¬(θ ∧ ψ).
Consider item 8 further: Suppose [α]q∗ ϕ where q∗ = 1 (in some structure at some world). Then, in SLAOP, one could represent S, w |= [α]1−q∗ ¬ϕ as ¬⟨α⟩¬ϕ. But this is just [α]ϕ (≡ [α]q∗ ϕ). The point is that there is no different way to represent [α]ϕ in SLAOP (other than syntactically). Hence, in item 8, we need not cater for the case when q = 1.
Proposition 4.2 |=SLAOP ([α]q θ ∧ ¬[α]q ψ) → ¬[α](θ ↔
ψ).
Proof:
Let S be any structure and w a world in S. Assume S, w |=
[α]q θ ∧ ¬[α]q ψ. Assume S, w |= [α](θ ↔ ψ). Then because S, w |= [α]q θ, one can deduce S, w |= [α]q ψ. This
is a contradiction, therefore S, w 6|= [α](θ ↔ ψ). Hence,
S, w |= ([α]q θ ∧ ¬[α]q ψ) → ¬[α](θ ↔ ψ). Q.E.D.
Proposition 4.3 Assume an arbitrary structure S and an arbitrary world w in S. There exists some constant q such that S, w |= [α]q ϕ if and only if S, w |= ⟨α⟩ϕ.
Proof:
Assume an arbitrary structure S and an arbitrary world w in
it. Then
S, w |= [α]q ϕ for some constant q
⇔ ∃q . Σ_{(w,w′,pr)∈Rα, S,w′|=ϕ} pr = q
⇔ Not: Σ_{(w,w′,pr)∈Rα, S,w′|=ϕ} pr = 0
⇔ Not: Σ_{(w,w′,pr)∈Rα, S,w′|=¬ϕ} pr = 1
⇔ Not: S, w |= [α]¬ϕ
⇔ S, w |= ⟨α⟩ϕ. Q.E.D.
The definition of item 7 comes from probability theory, which
says that the probability of an event (ϕ) is simply the sum
of the probabilities of the atomic events (worlds) where the
event (ϕ) holds.
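Truth condition 7 is then a straightforward summation. The sketch below checks it against the container sketched after Definition 3.2 (again our own encoding, with a predicate holds standing in for S, w′ |= ϕ).

def satisfies_box(struct, w, alpha, q, holds):
    # S, w |= [alpha]_q phi  iff the transition probabilities from w via alpha
    # into phi-worlds sum to exactly q (Definition 3.3, item 7).
    total = sum(pr for (src, dst, pr) in struct.R.get(alpha, set())
                if src == w and holds(dst))
    return abs(total - q) < 1e-9   # tolerate floating-point rounding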
A formula ϕ is valid in a SLAOP structure (denoted S |= ϕ) if S, w |= ϕ for every w ∈ W. We define global logical entailment (denoted K |=GS ϕ) as follows: for all S, if S |= ∧_{ψ∈K} ψ, then S |= ϕ.
4
Some Properties
Remark 4.1 Item 7 of Definition 3.2, the observation-per-action condition, implies that if S, w |= ⟨α⟩ϕ then S, w′ |= ϕ → (∃v^ς)(v^ς | α)✸, for some w, w′ ∈ W.
Remark 4.2 Item 8 of Definition 3.2, the nothing-for-nothing condition, implies that |=SLAOP (∀v^α) ¬⟨v^α⟩⊤ → Cost(v^α, 0).
We are also interested in noting the interactions of any two
percept events—when sentences of the form (ς | α)q ϕ are
satisfied in the same world. Only two consequences could be
gleaned, given Definition 3.3, item 8:
Proposition 4.4 Assume an arbitrary structure S and some
w in S.
1. If S, w |= (ς | α)q ∧ (ς ′ | α)q′ and ς is the same observation as ς ′ , then q = q ′ ;
In the terminology of probability theory, a single world
would be called an atomic event. Probability theory says that
the probability of an event e is simply the sum of the probabilities of the atomic events (worlds) where e holds. We are
interested in noting the interactions of any two sentences of
the form [α]q ϕ being satisfied in the same world. Given the
principle of the sum of atomic events, we get the following
properties.
2. If S, w |= (ς | α)q ∧ (ς ′ | α)q′ and ς is not the same
observation as ς ′ , then q + q ′ ≤ 1.
5.1 The Action Description
In the following discussion, W ϕ is the set of worlds in which
static formula ϕ holds (the ‘models’ of ϕ). A formal description for the construction of conditional effect axioms follows.
For one action, there is a set of axioms that take the form
Proof:
Directly from probability theory and algebra. Q.E.D.
Proposition 4.5 Assume an arbitrary structure S and an arbitrary world w in it. There exists some constant q such that
S, w |= (ς | α)q if and only if S, w |= (ς | α)✸ .
Proof:
Let N (ς) = o. Assume an arbitrary structure S and an arbitrary world w in S. Then
S, w |= (ς | α)q for some constant q
⇔ ∃q . (o, w, q) ∈ Qα
⇔ S, w |= (ς | α)✸ . Q.E.D.
φ1 → ([α]q11 ϕ11 ∧ . . . ∧ [α]q1n ϕ1n);
φ2 → ([α]q21 ϕ21 ∧ . . . ∧ [α]q2n ϕ2n);
··· ;
φj → ([α]qj1 ϕj1 ∧ . . . ∧ [α]qjn ϕjn),
where the φi and ϕik are static, and where the φi are conditions for the respective effects to be applicable, and in any
one axiom, each ϕik represents a set W ϕik of worlds. The
number qik is the probability that the agent will end up in a
world in W ϕik , as the effect of performing α in the right condition φi . For axioms generated from the effect axioms (later
in Sec. 5.1), we shall assume that ϕik is a minimal disjunctive normal form characterization of W ϕik . The following
constraints apply.
The following is a direct consequence of Propositions 4.3 and 4.5.
Corollary 4.1 |=SLAOP [α]q ϕ → ⟨α⟩ϕ and |=SLAOP (ς | α)q → (ς | α)✸.
Further Properties of Interest
Recall that Rα− = {(w, w′) | (w, w′, pr) ∈ Rα}. We now justify treating [α]1 as [α] of regular multi-modal logic.
Proposition 4.6 [α]1 is the regular [α]. That is, S, w |= [α]1 ϕ if and only if for all w′, if wRα−w′, then S, w′ |= ϕ, for any structure S and any world w in S.
Proof:
S, w |= [α]1 ϕ
⇔ Σ_{(w,w′,pr)∈Rα, S,w′|=ϕ} pr = 1
⇔ ∀w′ . if ∃pr . (w, w′, pr) ∈ Rα then S, w′ |= ϕ
⇔ ∀w′ . if wRα−w′ then S, w′ |= ϕ. Q.E.D.
Proposition 4.7 ⟨α⟩ has normal semantics. That is, S, w |= ⟨α⟩ϕ if and only if there exist w′, pr such that (w, w′, pr) ∈ Rα and S, w′ |= ϕ.
Proof:
S, w |= ⟨α⟩ϕ
⇔ S, w |= ¬[α]¬ϕ
⇔ S, w |= ¬[α]1 ¬ϕ
⇔ S, w ⊭ [α]1 ¬ϕ
⇔ Σ_{(w,w′,pr)∈Rα, S,w′|=¬ϕ} pr ≠ 1
⇔ ∃w′, pr . (w, w′, pr) ∈ Rα and S, w′ ⊭ ¬ϕ
⇔ ∃w′, pr . (w, w′, pr) ∈ Rα and S, w′ |= ϕ. Q.E.D.
• There must be a set of effect axioms for each action α ∈
A.
• The φi must be mutually exclusive, i.e., the conjunction
of any pair of conditions causes a contradiction. However, it is not necessary that W ϕi1 ∪ . . . ∪ W ϕin = C.
• A set of effects ϕi1 to ϕin in any axiom i must be mutually exclusive.
• The transition probabilities qi1 , . . . , qin of any axiom i
must sum to 1.
The following sentence is an effect axiom for the grab action: (full ∧ ¬holding) → ([grab]0.7 (full ∧ holding) ∧
[grab]0.2 (¬full ∧ ¬holding) ∧ [grab]0.1 (full ∧ ¬holding)).
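Effect axioms of this shape are easy to keep machine-checkable. The sketch below stores the grab axiom as (condition, list of (probability, effect)) pairs, with formulas as uninterpreted strings, and verifies the constraint that the probabilities of each axiom sum to 1 (the data layout is our own assumption).

effect_axioms = {
    "grab": [
        ("full & ~holding", [(0.7, "full & holding"),
                             (0.2, "~full & ~holding"),
                             (0.1, "full & ~holding")]),
    ],
}

def check_effect_axioms(axioms):
    # Constraint from above: the transition probabilities q_i1, ..., q_in of
    # any effect axiom must sum to 1.
    for action, rows in axioms.items():
        for condition, outcomes in rows:
            total = sum(q for q, _ in outcomes)
            assert abs(total - 1.0) < 1e-9, (action, condition, total)

check_effect_axioms(effect_axioms)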
Executability axioms of the form φk → ⟨α⟩⊤ must be supplied, for each action, where φk is a precondition conveying physical restrictions in the environment with respect to α. The sentence ¬holding → ⟨grab⟩⊤ states that if the robot is not holding the oil-can, then it is possible to grab the can.
A set of axioms must be generated that essentially states that if the effect or executability axioms do not imply executability for some action, then that action is inexecutable. Hence, given α, assume the presence of an executability closure axiom of the following form: ¬(φ1 ∨ . . . ∨ φj ∨ φk) → ¬⟨α⟩⊤. The sentence holding → ¬⟨grab⟩⊤ states that if the robot is holding the oil-can, then it is not possible to grab it.
Now we show the form of sentences that specify what does
not change under certain conditions—conditional frame axioms. Let φi → ([α]qi1 ϕi1 ∧ . . . ∧ [α]qin ϕin ) be the i-th
effect axiom for α. For each α ∈ A, for each effect axiom
i, do: For each fluent p ∈ P, if p is not mentioned in ϕi1 to
ϕin , then (φi ∧ p) → [α]p and (φi ∧ ¬p) → [α]¬p are part of
the domain specification.
For our scenario, the conditional frame axioms of grab are
5
Specifying Domains with SLAOP
We briefly describe and illustrate a framework to formally
specify—in the language of SLAOP—the domain in which
an agent or robot is expected to live. Let BK be an agent’s
background knowledge (including non-static formulae) and
let IC be its initial condition, a static formula describing
the world the agent finds itself in when it becomes active.
In the context of SLAOP, we are interested in determining
BK |=GS IC → ϕ, where ϕ is any sentence.
The agent’s background knowledge may include static law
axioms which are facts about the domain that do not change.
They have no predictable form, but by definition, they are
not dynamic and thus exclude mention of actions. drank →
¬full is one static law axiom for the oil-can scenario. The
other kinds of axioms in BK are described below.
(full ∧ ¬holding ∧ drank) → [grab]drank;
(full ∧ ¬holding ∧ ¬drank) → [grab]¬drank;
(¬full ∧ ¬holding ∧ drank) → [grab]drank;
(¬full ∧ ¬holding ∧ ¬drank) → [grab]¬drank.
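The generation scheme for conditional frame axioms is mechanical, so it can be sketched directly; the function below (our own illustration, using a naive substring test to decide whether a fluent is mentioned in an effect) reproduces the drank axioms for the condition full ∧ ¬holding.

FLUENTS = ["full", "drank", "holding"]

def frame_axioms(action, condition, effect_formulas, fluents=FLUENTS):
    # For every fluent p not mentioned in the effects, emit
    # (condition & p) -> [action]p  and  (condition & ~p) -> [action]~p.
    axioms = []
    for p in fluents:
        if not any(p in phi for phi in effect_formulas):
            axioms.append(f"({condition} & {p}) -> [{action}]{p}")
            axioms.append(f"({condition} & ~{p}) -> [{action}]~{p}")
    return axioms

# frame_axioms("grab", "full & ~holding",
#              ["full & holding", "~full & ~holding", "full & ~holding"])
# yields the two drank axioms for this condition, matching the list above.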
Given frame and effect axioms, it may still happen that the probability to some worlds cannot be logically deduced. Suppose (for the purpose of illustration only) that the sentence

[grab]0.7 (full ∧ holding) ∧
[grab]0.3 (full ∧ ¬holding ∧ drank)        (1)
via some action, there exists an observation associated
with the action, perceivable in that world. The perceivability axioms must adhere to this remark.
• For every pair of perceivability axioms φ → (ς | α)q
and φ′ → (ς | α)q′ for the same observation ς, W φ must
′
be disjoint from W φ .
• For every particular condition φ, Σ_{φ→(ς|α)q} q = 1. This is so that Σ_{N(ς):(N(ς),w+,pr)∈Qα} pr = 1.
can be logically deduced from the frame and effect axioms in
BK. Now, according to (1) the following worlds are reachable: (full ∧ holding ∧ drank), (full ∧ holding ∧ ¬drank)
and (full ∧ ¬holding ∧ drank). The transition probability to (full ∧ ¬holding ∧ drank) is 0.3, but what are the
transition probabilities to (full ∧ holding ∧ drank) and
(full ∧holding ∧¬drank)? We have devised a process to determine such hidden probabilities via uniform axioms [Rens
and Meyer, 2011]. Uniform axioms describe how to distribute probabilities of effects uniformly in the case sufficient
information is not available. It is very similar to what [Wang
and Schmolze, 2005] do to achieve compact representation.
A uniform axiom generated for (1) would be
Some perceivability axioms for the oil-can scenario might be
(obsNil | grab)✷ ;
(¬full ∧ drank ∧ holding) → (obsLight | weigh)0.7 ;
(¬full ∧ drank ∧ holding) → (obsHeavy | weigh)0.1 ;
(¬full ∧ drank ∧ holding) → (obsMedium | weigh)0.2 .
Perceivability axioms for sensory actions also state when
the associated observations are possible. The following set of
axioms states when the associated observations are impossible for sensory action weigh of our scenario.
[grab]0.35 (full ∧ holding ∧ drank) ∧
[grab]0.35 (full ∧ holding ∧ ¬drank) ∧
[grab]0.3 (full ∧ ¬holding ∧ drank).
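The effect of a uniform axiom can be mimicked numerically: probability mass that is deduced only for a group of reachable worlds as a whole is shared evenly among them. The function below is our own simplification of that idea, not the axiom-generation process of [Rens and Meyer, 2011].

def distribute_uniformly(mass, open_worlds):
    # Share `mass`, known only for the group as a whole, evenly among its worlds.
    share = mass / len(open_worlds)
    return {w: share for w in open_worlds}

# The case above: 0.7 is deduced for full & holding as a whole, so the two
# matching worlds receive 0.35 each, while the 0.3 world is already determined.
print(distribute_uniformly(0.7, ["full & holding & drank", "full & holding & ~drank"]))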
((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧
¬holding)) → ¬(obsLight | weigh)✸ ;
((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧
¬holding)) → ¬(obsHeavy | weigh)✸ ;
((¬full ∧ drank ∧ ¬holding) ∨ (full ∧ ¬drank ∧
¬holding)) → ¬(obsMedium | weigh)✸ .
The following axiom schema represents all the effect condition closure axioms. (¬(φ1 ∨. . .∨φj )∧P ) → [A]P , where
there is a different axiom for each substitution of α ∈ A for
A and each literal for P . For example, (holding ∧ P ) →
[grab]P , where P is any p ∈ P or its negation.
The perceivability condition closure axiom schema is
5.2 The Perception Description
¬(φ11 ∨ · · · ∨ φ1j ) → ¬(ς1 | α)✸ ;
¬(φ21 ∨ · · · ) → ¬(ς2 | α)✸ ; · · · ;
¬(· · · ∨ φnk ) → ¬(ςn | α)✸ ,
One can classify actions as either ontic (physical) or sensory.
This classification also facilitates specification of perceivability. Ontic actions have intentional ontic effects, that is, effects on the environment that were the main intention of the
agent. grab, drink and replace are ontic actions. Sensory
actions—weigh in our scenario—result in perception, maybe
with (unintended) side-effects.
Perceivability axioms specify what conditions must hold
after the applicable action is performed, for the observation
to be perceivable. Ontic actions each have perceivability axioms of the form (obsNil | α)✷ . Sensory actions typically
have multiple observations and associated conditions for perceiving them. The probabilities for perceiving the various observations associated with sensory actions must be specified.
The following set of perceivability axiom schemata does this:
φ11 → (ς1 | α)q11 ; φ12 → (ς1 | α)q12 ; · · · ; φ1j → (ς1 | α)q1j ;
φ21 → (ς2 | α)q21 ; · · · ;
· · · ;
φnk → (ςn | α)qnk ,
where the φi are taken from the perceivability axioms. There
are no perceivability closure axioms for ontic actions, because they are always tautologies.
Ontic actions each have unperceivability axioms of the
form (∀v ς )((v ς | α)✸ ↔ v ς = obsNil ). The axiom says
that no other observation is perceivable given the ontic action. That is, for any instantiation of an observation ς ′ other
than obsNil , ¬(ς ′ | α)✸ is a logical consequence.
For sensory actions, to state that the observations not associated with action α are always impossible given α was executed, we need an axiom of the form (∀v ς )(v ς ≠ o1 ∧ v ς ≠
o2 ∧ · · · ∧ v ς ≠ on ) → ¬(v ς | α)✸ . For the oil-can scenario,
they are
(∀v ς )(v ς | grab)✸ ↔ v ς = obsNil ;
(∀v ς )(v ς | drink)✸ ↔ v ς = obsNil ;
(∀v ς )(v ς | replace)✸ ↔ v ς = obsNil ;
(∀v ς )(v ς ≠ obsHeavy ∧ v ς ≠ obsLight ∧
v ς ≠ obsMedium) → ¬(v ς | weigh)✸ .
where {ς1 , ς2 , . . . , ςn } is the set of first components of all elements in Qα and the φi are the conditions expressed as static
formulae. The following constraints apply to these axioms.
• There must be a set of perceivability axioms for each
action α ∈ A.
5.3 The Utility Function
• In the semantics section, item 7 of the definition of a
SLAOP structure states that for every world reachable
A sufficient set of axioms concerning ‘state rewards’ and ‘action costs’ constitutes a utility function.
There must be a means to express the reward an agent will get for performing an action in any world in which it may find itself—for every action and every possible world. The domain expert must supply a set of reward axioms of the form φi → Reward(ri ), where φi is a condition specifying the worlds in which the reward ri can be obtained (e.g., holding → Reward(5) and drank → Reward(10)).
The conditions of the reward axioms must identify worlds
that are pairwise disjoint. This holds for cost axioms too:
The domain expert must also supply a set of cost axioms of
the form (φi ∧ hαi⊤) → Cost(α, ci ), where φi is a condition
specifying the world in which the cost ci will be incurred for
action α. For example,
(full ∧ hgrabi⊤) → Cost(grab, 2);
(¬full ∧ hgrabi⊤) → Cost(grab, 1);
(full ∧ hdrinki⊤) → Cost(drink, 2);
(¬full ∧ hdrinki⊤) → Cost(drink, 1);
hreplacei⊤ → Cost(replace, 0.8).
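As a rough illustration (ours, not part of SLAOP) of how the reward and cost axioms induce a utility signal, the sketch below looks up the cost axioms above together with a simple reward table. Since the conditions of reward axioms must be pairwise disjoint, the cases are simply checked in a fixed order here; that ordering is our own simplification.

```python
# Toy lookup of reward and cost for the oil-can scenario.
# A world is a dict of fluent values.

def reward(world):
    # reward axioms; per the text their conditions should be pairwise disjoint,
    # so for illustration we check the cases in a fixed order
    if world["drank"]:
        return 10                 # drank -> Reward(10)
    if world["holding"]:
        return 5                  # holding -> Reward(5)
    return 0

def cost(action, world):
    table = {
        ("grab", True): 2, ("grab", False): 1,    # (full ∧ <grab>⊤) -> Cost(grab, 2), ...
        ("drink", True): 2, ("drink", False): 1,
    }
    if action == "replace":
        return 0.8                # <replace>⊤ -> Cost(replace, 0.8)
    return table[(action, world["full"])]

w = {"full": True, "holding": True, "drank": False}
print(reward(w) - cost("drink", w))   # net utility of drinking here: 5 - 2 = 3
```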
5.4 A Frame Solution
The method we propose for avoiding generating all the frame
and effect closure axioms, is to write the effect and executability axioms, generate the uniform axioms, and then generate a set of a new kind of axioms representing the frame and
effect closure axioms much more compactly. By looking at
the effect axioms of a domain, one can define for each fluent
p ∈ P a set Cause+ (p) of actions that can (but not necessarily always) cause p (as a positive literal) to flip to ¬p,
and a set Cause− (p) of actions that can (but not necessarily always) cause ¬p (as a negative literal) to flip to p.1 For
instance, grab ∈ Cause+ (f ull), because in effect axiom
(f ull ∧ ¬holding) →
([grab]0.7 (f ull ∧ holding) ∧
[grab]0.2 (¬f ull ∧ ¬holding) ∧ [grab]0.1 (f ull ∧ ¬holding)),
grab flips f ull to ¬f ull (with probability 0.2). The axiom
also shows that grab ∈ Cause− (holding) because it flips
¬holding to holding (with probability 0.7). The actions
mentioned in these sets may have deterministic or stochastic
effects on the respective propositions.
Furthermore, by looking at the effect axioms, Cond functions can be defined: For each α ∈ Cause+ (p), Cond+ (α, p) returns a sentence that represents the disjunction of all φi under which α causes p to become a negative literal.
Cond− (α, p) is defined similarly.
Suppose that Cause+ (p) = {α1 , . . . , αm } and Cause− (p) = {β1 , . . . , βn }. We propose, for any fluent p, a pair of compact frame axioms with schema
(∀v α )p →
(v α = α1 ∧ ¬Cond+ (α1 , p)) → [α1 ]p ∧
· · · ∧
(v α = αm ∧ ¬Cond+ (αm , p)) → [αm ]p ∧
(v α ≠ α1 ∧ · · · ∧ v α ≠ αm ) → [v α ]p
and
(∀v α )¬p →
(v α = β1 ∧ ¬Cond− (β1 , p)) → [β1 ]¬p ∧
· · · ∧
(v α = βn ∧ ¬Cond− (βn , p)) → [βn ]¬p ∧
(v α ≠ β1 ∧ · · · ∧ v α ≠ βn ) → [v α ]¬p.
Claim 5.1 The collection of pairs of compact frame axioms for each fluent in P is logically equivalent to the collection of all conditional frame axioms and effect closure axioms generated with the processes presented above.
Proof: Please refer to our draft report [Rens and Meyer, 2011]. Q.E.D.
There are in the order of |A| · 2|Ω| · D frame axioms,
where D is the average number of conditions on effects per
action (the φi ). Let N be the average size of |Cause+ (p)| or
|Cause− (p)| for any p ∈ P. With the two compact frame
axioms (per fluent), no separate frame or effect closure axioms are required in the action description (AD). If we consider each of the most basic conjuncts and disjuncts as a unit
length, then the size of each compact frame axiom is O(N ),
and the size of all compact frame axioms in AD is in the order of N · 2|P|. For reasonable domains, N will be much
smaller than |A|, and the size of all compact frame axioms is
thus much smaller than the size of all frame and effect closure
axioms (|A| · 2|P| · (D + 1)).
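To make the construction concrete, here is a small sketch — our own reading of the definitions, not the authors' implementation — that extracts Cause+ , Cause− and the associated conditions from effect axioms encoded as (condition, action, outcome-distribution) triples; the string encoding of conditions and literals is an assumption made only for this illustration.

```python
from collections import defaultdict

# Effect axioms as (condition, action, outcomes), where outcomes maps a
# probability to the set of literals it makes true; names follow the
# oil-can scenario.
effect_axioms = [
    ("full ∧ ¬holding", "grab",
     {0.7: {"full", "holding"}, 0.2: {"¬full", "¬holding"}, 0.1: {"full", "¬holding"}}),
]

def causes_and_conds(axioms, fluents):
    cause_pos, cause_neg = defaultdict(set), defaultdict(set)   # Cause+ / Cause-
    cond_pos, cond_neg = defaultdict(list), defaultdict(list)   # Cond+ / Cond-
    for condition, action, outcomes in axioms:
        conjuncts = condition.split(" ∧ ")
        for literals in outcomes.values():
            for p in fluents:
                # action may flip p to ¬p under this condition
                if f"¬{p}" in literals and p in conjuncts:
                    cause_pos[p].add(action)
                    cond_pos[(action, p)].append(condition)
                # action may flip ¬p to p under this condition
                if p in literals and f"¬{p}" in conjuncts:
                    cause_neg[p].add(action)
                    cond_neg[(action, p)].append(condition)
    return cause_pos, cause_neg, cond_pos, cond_neg

cp, cn, _, _ = causes_and_conds(effect_axioms, ["full", "holding", "drank"])
print(dict(cp))  # {'full': {'grab'}}    grab can flip full to ¬full
print(dict(cn))  # {'holding': {'grab'}} grab can flip ¬holding to holding
```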
5.5 Some Example Entailment Results
The following entailments have been proven concerning the
oil-can scenario [Rens and Meyer, 2011]. BK oc is the background knowledge of an agent in the scenario. To save space
and for neater presentation, we abbreviate constants and fluents by their initials.
BK oc |=GS (f ∧ d ∧ ¬h) → [g]0.7 (f ∧ d ∧ h):
If the can is full and the oil has been drunk, the probability of
successfully grabbing it without spilling oil is 0.7.
BK oc |=GS (f ∧ ¬d ∧ h) → ¬[d]0.2 (f ∨ ¬d ∨ ¬h):
If the robot is in a situation where it is holding the full oil-can
(and has not yet attempted drinking), then the probability of
having failed to drink the oil is not 0.2.
BK oc |=GS (∃v ς )(v ς | drink)✷ :
In any world, there always exists an observation after the
robot has drunk.
BK oc |=GS hdi⊤ ↔ h:
In any world, it is possible to drink the oil if and only if the
can is being held.
BK oc |=GS (f ∧ hdi⊤) → ¬Cost(d, 3):
Assuming it is possible to drink and the can is full of oil, then
the cost of doing the drink action is not 3 units.
6 Concluding Remarks
We introduced a formal language specifically for robots that
must deal with uncertainty in action and perception. It is
one step towards a general reasoning system for robots, not
the actual system.
1
Such sets and functions are also employed by Demolombe,
Herzig and Varzinczak [Demolombe et al., 2003].
POMDP theory is used as an underlying modeling formalism. The formal language is based on multi-modal logic and accepts basic principles of cognitive robotics. We have also included notions of probability to represent the uncertainty, but we have done so ‘minimally’, that is, only as far as is necessary to represent POMDPs for the intended application.
Beyond the usual elements of logics for reasoning about action and change, the logic presented here adds observations as
first-class objects, and a means to represent utility functions.
In an associated report [Rens and Meyer, 2011], the frame problem is addressed, and we provide a belief-network approach to domain specification for cases when the required information is available.
The computational complexity of SLAOP was not determined, and is left for future work. Due to the nature of SLAOP structures, we conjecture that entailment in SLAOP is decidable. It is worth noting that the latter three frameworks discussed in Section 2 [Wang and Schmolze, 2005; Sanner and Kersting, 2010; Poole, 1998] do not mention decidability results either.
The next step is to prove decidability of SLAOP entailment, and then to develop a logic for decision-making in
which SLAOP will be employed. Domains specified in
SLAOP will be used to make decisions in the ‘meta’ logic,
with sentences involving sequences of actions and the epistemic knowledge of an agent. This will also show the significance of SLAOP in a more practical context. Please refer
to our extended abstract [Rens, 2011] for an overview of our
broader research programme.
References
[Bacchus et al., 1999] F. Bacchus, J. Y. Halpern, and H. J.
Levesque. Reasoning about noisy sensors and effectors
in the situation calculus. Artificial Intelligence, 111(1–
2):171–208, 1999.
[Bonet and Geffner, 2001] B. Bonet and H. Geffner. Planning and control in artificial intelligence: A unifying perspective. Applied Intelligence, 14(3):237–252, 2001.
[Boutilier and Poole, 1996] C. Boutilier and D. Poole. Computing optimal policies for partially observable decision
processes using compact representations. In Proc. of 13th
Natl. Conf. on AI, pages 1168–1175, 1996.
[Boutilier et al., 2000] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent
programming in the situation calculus. In Proc. of 17th
Natl. Conf. on AI, pages 355–362. AAAI Press, Menlo
Park, CA, 2000.
[Demolombe et al., 2003] R. Demolombe, A. Herzig, and
I. Varzinczak. Regression in modal logic. Journal of Applied Non-Classical Logics, 13(2):165–185, 2003.
[Fagin and Halpern, 1994] R. Fagin and J. Y. Halpern. Reasoning about knowledge and probability. J. of ACM,
41(2):340–367, 1994.
[Gabaldon and Lakemeyer, 2007] A. Gabaldon and G. Lakemeyer. ESP: A logic of only-knowing, noisy sensing and
acting. In Proc. of 22nd Natl. Conf. on AI, pages 974–979,
2007.
[Halpern, 2003] J. Y. Halpern. Reasoning about Uncertainty. The MIT Press, Cambridge, MA, 2003.
[Iocchi et al., 2009] L. Iocchi, T. Lukasiewicz, D. Nardi, and R. Rosati. Reasoning about actions with sensing under qualitative and probabilistic uncertainty. ACM Transactions on Computational Logic, 10(1):5:1–5:41, 2009.
[Kaelbling et al., 1998] L. Kaelbling, M. Littman, and A. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2):99–134, 1998.
[Levesque and Lakemeyer, 2008] H. Levesque and G. Lakemeyer. Cognitive robotics. In F. van Harmelen, V. Lifschitz, and B. Porter, editors, Handbook of Knowledge Representation, pages 869–886. Elsevier Science, 2008.
[Poole, 1998] D. Poole. Decision theory, the situation calculus and conditional plans. Linköping Electronic Articles in Computer and Information Science, 8(3), 1998.
[Rens and Meyer, 2011] G. Rens and T. Meyer. Logic and utility based agent planning language, Part II: Specifying stochastic domains. Technical Report KRR-10-01, KRR, CSIR Meraka Institute, Pretoria, South Africa, January 2011. URL: http://krr.meraka.org.za/publications/2011.
[Rens et al., 2010] G. Rens, I. Varzinczak, T. Meyer, and A. Ferrein. A logic for reasoning about actions and explicit observations. In Jiuyong Li, editor, Proc. of 23rd Australasian Joint Conf. on AI, pages 395–404, 2010.
[Rens, 2010] G. Rens. A belief-desire-intention architecture with a logic-based planner for agents in stochastic domains. Master’s thesis, School of Computing, University of South Africa, 2010.
[Rens, 2011] G. Rens. From an agent logic to an agent programming language for partially observable stochastic domains. In Proc. of 22nd Intl. Joint Conf. on AI, Menlo Park, CA, 2011. AAAI Press. To appear.
[Sanner and Kersting, 2010] S. Sanner and K. Kersting. Symbolic dynamic programming for first-order POMDPs. In Proc. of 24th Natl. Conf. on AI, pages 1140–1146, 2010.
[Shirazi and Amir, 2007] A. Shirazi and E. Amir. Probabilistic modal logic. In Proc. of 22nd Natl. Conf. on AI, pages 489–494. AAAI Press, 2007.
[Van Benthem et al., 2009] J. Van Benthem, J. Gerbrandy, and B. Kooi. Dynamic update with probabilities. Studia Logica, 93(1):67–96, 2009.
[Van Diggelen, 2002] J. Van Diggelen. Using modal logic in mobile robots. Master’s thesis, Cognitive Artificial Intelligence, Utrecht University, 2002.
[Wang and Schmolze, 2005] C. Wang and J. Schmolze. Planning with POMDPs using a compact, logic-based representation. In Proc. of 17th IEEE Intl. Conf. on Tools with AI, pages 523–530, 2005.
[Weerdt et al., 1999] M. De Weerdt, F. De Boer, W. Van der Hoek, and J.-J. Meyer. Imprecise observations of mobile robots specified by a modal logic. In Proc. of ASCI-99, pages 184–190, 1999.
Agent Supervision in Situation-Determined ConGolog
Giuseppe De Giacomo
Sapienza – Università di Roma
Rome, Italy
Yves Lespérance
York University
Toronto, Canada
Christian Muise
University of Toronto
Toronto, Canada
[email protected]
[email protected]
[email protected]
Abstract
We investigate agent supervision, a form of customization, which constrains the actions of an agent
so as to enforce certain desired behavioral specifications. This is done in a setting based on the
Situation Calculus and a variant of the ConGolog
programming language which allows for nondeterminism, but requires the remainder of a program
after the execution of an action to be determined
by the resulting situation. Such programs can be
fully characterized by the set of action sequences
that they generate. The main results are a characterization of the maximally permissive supervisor
that minimally constrains the agent so as to enforce
the desired behavioral constraints when some agent
actions are uncontrollable, and a sound and complete technique to execute the agent as constrained
by such a supervisor.
1 Introduction
There has been much work on process customization, where
a generic process for performing a task or achieving a goal
is customized to satisfy a client’s constraints or preferences
[Fritz and McIlraith, 2006; Lin et al., 2008; Sohrabi et al.,
2009]. This approach was originally proposed in [McIlraith and Son, 2002] in the context of web service composition [Su, 2008]. The idea is that the generic process
provides a wide range of alternative ways to perform the
task. During customization, alternatives that violate the constraints are eliminated. Some parameters in the remaining
alternatives may be restricted or instantiated so as to ensure that any execution of the customized process will satisfy the client’s constraints. Another approach to service
composition synthesizes an orchestrator that controls the execution of a set of available services to ensure that they
realize a desired service [Sardiña and De Giacomo, 2009;
Bertoli et al., 2010].
In this paper, we develop a framework for a similar type
of process refinement that we call supervised execution. We
assume that we have a nondeterministic process that specifies the possible behaviors of an agent, and a second process
that specifies the possible behaviors that a supervisor wants
to allow (or alternatively, of the behaviors that it wants to rule out). For example, we could have an agent process representing a child and its possible behaviors, and a second process representing a babysitter that specifies the behaviors by the child that can be allowed. If the supervisor can control all the actions of the supervised agent, then it is straightforward to specify the behaviors that may result as a kind of synchronized concurrent execution of the agent and supervisor processes. A more interesting case arises when some agent actions are uncontrollable. For example, it may be impossible to prevent the child from getting muddy once he/she is allowed outside. In such circumstances, the supervisor may have to block some agent actions, not because they are undesirable in themselves (e.g. going outside), but because if they are allowed, the supervisor cannot prevent the agent from performing some undesirable actions later on (e.g. getting muddy).
We follow previous work [McIlraith and Son, 2002; Fritz and McIlraith, 2006] in assuming that processes are specified in a high level agent programming language defined in the Situation Calculus [Reiter, 2001].1 In fact, we define and use a restricted version of the ConGolog agent programming language [De Giacomo et al., 2000] that we call Situation-Determined ConGolog (SDConGolog). In this version, following [De Giacomo et al., 2010], all transitions involve performing an action (i.e. there are no transitions that merely perform a test). Moreover, nondeterminism is restricted so that the remaining program is a function of the action performed, i.e. there is a unique remaining program δ′ such that a given program δ can perform a transition (δ, s) →a (δ′, do(a, s)) involving action a in situation s. This means that a run of such a program starting in a given situation can be taken to be simply a sequence of actions, as all the intermediate programs one goes through are functionally determined by the starting program and situation and the actions performed. Thus we can see a program and a starting situation as specifying a language, that of all the sequences of actions that are runs of the program in the situation. This allows us to define language theoretic notions such as union, intersection, and difference/complementation in terms of operations on the corresponding programs, which has applications in many areas (e.g. programming by demonstration and programming by instruction [Fritz and Gil, 2010], and plan recognition [Demolombe and Hamon, 2002]). Working with situation-determined programs also greatly facilitates the formalization of supervision/customization. In [De Giacomo et al., 2010], it is in fact shown that any ConGolog program can be made situation-determined by recording nondeterministic choices made in the situation.
1
Clearly, there are applications where a declarative formalism is
preferable, e.g. linear temporal logic (LTL), regular expressions over
actions, or some type of business rules. However, there has been previous work on compiling such declarative specification languages
into ConGolog, for instance [Fritz and McIlraith, 2006], which handles an extended version of LTL interpreted over a finite horizon.
Besides a detailed characterization of SDConGolog,2 the
main contributions of the paper are as follows: first, based
on previous work in discrete event control [Wonham and Ramadge, 1987], we provide a characterization of the maximally
permissive supervisor that minimally constrains the actions
of the agent so as to enforce the desired behavioral specifications, showing its existence and uniqueness; secondly,
we define a program construct for supervised execution that
takes the agent program and supervisor program, and executes them to obtain only runs allowed by the maximally permissive supervisor, showing its soundness and completeness.
The rest of the paper proceeds as follows. In the next
section, we briefly review the Situation Calculus and the
ConGolog agent programming language. In Section 3, we define SDConGolog, discuss its properties, and introduce some
useful programming constructs and terminology. Then in
Section 4, we develop our account of agent supervision, and
define the maximal permissive supervisor and supervised execution. Finally in Section 5, we review our contributions and
discuss related and future work.
2 Preliminaries
The situation calculus is a logical language specifically designed for representing and reasoning about dynamically changing worlds [Reiter, 2001]. All changes to the world are the result of actions, which are terms in the language. We denote action variables by lower case letters a, action types by capital letters A, and action terms by α, possibly with subscripts. A possible world history is represented by a term called a situation. The constant S0 is used to denote the initial situation where no actions have yet been performed. Sequences of actions are built using the function symbol do, such that do(a, s) denotes the successor situation resulting from performing action a in situation s. Predicates and functions whose value varies from situation to situation are called fluents, and are denoted by symbols taking a situation term as their last argument (e.g., Holding(x, s)). Within the language, one can formulate action theories that describe how the world changes as the result of actions [Reiter, 2001].
To represent and reason about complex actions or processes obtained by suitably executing atomic actions, various so-called high-level programming languages have been defined. Here we concentrate on (a fragment of) ConGolog that includes the following constructs:
α                         atomic action
ϕ?                        test for a condition
δ1 ; δ2                   sequence
if ϕ then δ1 else δ2      conditional
while ϕ do δ              while loop
δ1 | δ2                   nondeterministic branch
πx.δ                      nondeterministic choice of argument
δ∗                        nondeterministic iteration
δ1 ‖ δ2                   concurrency
In the above, α is an action term, possibly with parameters, and ϕ is a situation-suppressed formula, that is, a formula in the language with all situation arguments in fluents suppressed. As usual, we denote by ϕ[s] the situation calculus formula obtained from ϕ by restoring the situation argument s into all fluents in ϕ. Program δ1 |δ2 allows for the nondeterministic choice between programs δ1 and δ2 , while πx.δ executes program δ for some nondeterministic choice of a legal binding for variable x (observe that such a choice is, in general, unbounded). δ∗ performs δ zero or more times. Program δ1 ‖δ2 expresses the concurrent execution (interpreted as interleaving) of programs δ1 and δ2 .
Formally, the semantics of ConGolog is specified in terms of single-step transitions, using the following two predicates [De Giacomo et al., 2000]: (i) Trans(δ, s, δ′, s′), which holds if one step of program δ in situation s may lead to situation s′ with δ′ remaining to be executed; and (ii) Final(δ, s), which holds if program δ may legally terminate in situation s. The definitions of Trans and Final we use are as in [De Giacomo et al., 2010]; these are in fact the usual ones [De Giacomo et al., 2000], except that the test construct ϕ? does not yield any transition, but is final when satisfied. Thus, it is a synchronous version of the original test construct (it does not allow interleaving). A consequence of this is that in the version of ConGolog that we use, every transition involves the execution of an action (tests do not make transitions), i.e.,
Σ ∪ C |= Trans(δ, s, δ′, s′) ⊃ ∃a.s′ = do(a, s).
Here and in the remainder, we use Σ to denote the foundational axioms of the situation calculus from [Reiter, 2001] and C to denote the axioms defining the ConGolog language.
3 Situation-Determined Programs
As mentioned earlier, we are interested in process customization. For technical reasons, we will focus on a restricted class of ConGolog programs for describing processes, namely “situation-determined programs”. A program δ is situation-determined in a situation s if for every sequence of transitions, the remaining program is determined by the resulting situation, i.e.,
SituationDetermined(δ, s) ≐ ∀s′, δ′, δ′′.
    Trans∗(δ, s, δ′, s′) ∧ Trans∗(δ, s, δ′′, s′) ⊃ δ′ = δ′′,
where Trans∗ denotes the reflexive transitive closure of Trans. Thus, a (partial) execution of a situation-determined program is uniquely determined by the sequence of actions it has produced. This is a key point. In general, the possible executions of a ConGolog program are characterized by sequences of configurations formed by the remaining program and the current situation. In contrast, the execution of situation-determined programs can be characterized in terms of sequences of actions only, those sequences that correspond to the situations reached from where the program started.
2 In [De Giacomo et al., 2010], situation-determined programs were only dealt with incidentally.
For example, the ConGolog program (a; b) | (a; c) is not
situation-determined in situation S0 as it can make a transition to a configuration (b, do(a, S0 )), where the situation
is do(a, S0 ) and the remaining program is b, and it can also
make a transition to a configuration (c, do(a, S0 )), where the
situation is also do(a, S0 ) and the remaining program is instead c. It is impossible to determine what the remaining program is given only a situation, e.g. do(a, S0 ), reached along
an execution. In contrast, the program a; (b | c) is situationdetermined in situation S0 . There is a unique remaining program (b | c) in situation do(a, S0 ) (and similarly for the other
reachable situations).
When we restrict our attention to situation-determined programs, we can use a simpler semantic specification for the
language; instead of Trans we can use a next (partial) function, where next(δ, a, s) returns the program that remains after δ does a transition involving action a in situation s (if δ
is situation determined, such a remaining program must be
unique). We will axiomatize the next function so that it satisfies the following properties:
next(δ, a, s) = δ′ ∧ δ′ ≠ ⊥ ⊃ Trans(δ, s, δ′, do(a, s))   (N1)
∃!δ′.Trans(δ, s, δ′, do(a, s)) ⊃
    ∀δ′.(Trans(δ, s, δ′, do(a, s)) ⊃ next(δ, a, s) = δ′)   (N2)
¬∃!δ′.Trans(δ, s, δ′, do(a, s)) ⊃ next(δ, a, s) = ⊥   (N3)
Here ∃!x.φ(x) means that there exists a unique x such that φ(x); this is defined in the usual way. ⊥ is a special value that stands for “undefined”. The function next(δ, a, s) is only defined when there is a unique remaining program after program δ does a transition involving the action a; if there is such a unique remaining program, then next(δ, a, s) denotes it.
We define the function next inductively on the structure of programs using the following axioms:
Atomic action:
next(α, a, s) =
    nil if Poss(a, s) and α = a
    ⊥ otherwise
Sequence:
next(δ1 ; δ2 , a, s) =
    next(δ1 , a, s); δ2 if next(δ1 , a, s) ≠ ⊥ and (¬Final(δ1 , s) or next(δ2 , a, s) = ⊥)
    next(δ2 , a, s) if Final(δ1 , s) and next(δ1 , a, s) = ⊥
    ⊥ otherwise
Conditional:
next(if ϕ then δ1 else δ2 , a, s) =
    next(δ1 , a, s) if ϕ[s]
    next(δ2 , a, s) if ¬ϕ[s]
Nondeterministic iteration:
next(δ∗, a, s) =
    next(δ, a, s); δ∗ if next(δ, a, s) ≠ ⊥
    ⊥ otherwise
Loop:
next(while ϕ do δ, a, s) =
    next(δ, a, s); while ϕ do δ if ϕ[s] and next(δ, a, s) ≠ ⊥
    ⊥ otherwise
Nondeterministic branch:
next(δ1 |δ2 , a, s) =
    next(δ1 , a, s) if next(δ2 , a, s) = ⊥ or next(δ2 , a, s) = next(δ1 , a, s)
    next(δ2 , a, s) if next(δ1 , a, s) = ⊥
    ⊥ otherwise
Nondeterministic choice of argument:
next(πx.δ, a, s) =
    next(δdx , a, s) if ∃!d.next(δdx , a, s) ≠ ⊥
    ⊥ otherwise
Interleaving concurrency:
next(δ1 ‖δ2 , a, s) =
    next(δ1 , a, s)‖δ2 if next(δ1 , a, s) ≠ ⊥ and next(δ2 , a, s) = ⊥
    δ1 ‖next(δ2 , a, s) if next(δ2 , a, s) ≠ ⊥ and next(δ1 , a, s) = ⊥
    ⊥ otherwise
Test, empty program, undefined:
next(ϕ?, a, s) = ⊥    next(nil, a, s) = ⊥    next(⊥, a, s) = ⊥
Moreover the undefined program is never Final:
Final(⊥, s) ≡ false.
Let C n be the set of ConGolog axioms extended with the above axioms specifying next and Final(⊥, s). It is easy to show that:
Proposition 1 Properties N1, N2, and N3 are entailed by Σ ∪ C n .
Note in particular that as per N3, if the remaining program is not uniquely determined, then next(δ, a, s) is undefined. Notice that for situation-determined programs this will never happen, and if next(δ, a, s) returns ⊥ it is because δ cannot make any transition using a in s:
Corollary 2
Σ ∪ C n |= ∀δ, s.SituationDetermined(δ, s) ⊃
    ∀a [(next(δ, a, s) = ⊥) ≡ (¬∃δ′.Trans(δ, s, δ′, do(a, s)))].
Let’s look at an example. Imagine an agent specified by δB1 below that can repeatedly pick an available object and repeatedly use it and then discard it, with the proviso that if during use the object breaks, the agent must repair it:
δB1 = [π x.Available(x)?;
    [use(x); (nil | [break(x); repair(x)])]∗ ;
    discard(x)]∗
We assume that there is a countably infinite number of available unbroken objects initially, that objects remain available until they are discarded, that available objects can be used if they are unbroken, and that objects are unbroken unless they break and are not repaired (this is straightforwardly axiomatized in the situation calculus). Notice that this program is situation-determined, though very nondeterministic.
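To see how the inductive definition of next behaves, the following is a small, self-contained sketch — ours, not the axiomatization itself — for a fragment with atomic actions, sequence, nondeterministic branch and iteration. Poss and situations are ignored, so the sketch only shows how the unique remaining program is computed and how ⊥ arises when it is not unique.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Act:
    name: str

@dataclass(frozen=True)
class Seq:
    left: "Prog"
    right: "Prog"

@dataclass(frozen=True)
class Branch:
    left: "Prog"
    right: "Prog"

@dataclass(frozen=True)
class Star:
    body: "Prog"

NIL = None                      # the empty program nil
Prog = Union[Act, Seq, Branch, Star, type(None)]
BOTTOM = object()               # ⊥, "undefined"

def final(delta: Prog) -> bool:
    if delta is NIL or isinstance(delta, Star):
        return True
    if isinstance(delta, Seq):
        return final(delta.left) and final(delta.right)
    if isinstance(delta, Branch):
        return final(delta.left) or final(delta.right)
    return False                # an atomic action is not final

def next_(delta: Prog, a: str) -> Prog:
    if isinstance(delta, Act):
        return NIL if delta.name == a else BOTTOM
    if isinstance(delta, Seq):
        d1 = next_(delta.left, a)
        if d1 is not BOTTOM and (not final(delta.left) or next_(delta.right, a) is BOTTOM):
            return Seq(d1, delta.right)
        if final(delta.left) and d1 is BOTTOM:
            return next_(delta.right, a)
        return BOTTOM
    if isinstance(delta, Branch):
        d1, d2 = next_(delta.left, a), next_(delta.right, a)
        if d1 is not BOTTOM and (d2 is BOTTOM or d1 == d2):
            return d1
        if d1 is BOTTOM and d2 is not BOTTOM:
            return d2
        return BOTTOM           # ambiguous remaining program
    if isinstance(delta, Star):
        d = next_(delta.body, a)
        return Seq(d, delta) if d is not BOTTOM else BOTTOM
    return BOTTOM               # nil and ⊥ make no transition

# (a; b) | (a; c) is not situation-determined: next is undefined on 'a'.
prog = Branch(Seq(Act("a"), Act("b")), Seq(Act("a"), Act("c")))
print(next_(prog, "a") is BOTTOM)                               # True
# a; (b | c): the unique remaining program after 'a' is (b | c).
print(next_(Seq(Act("a"), Branch(Act("b"), Act("c"))), "a"))
```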
Language theoretic operations on programs. We can extend the SDConGolog language so as to close it with respect
to language theoretic operations, such as union, intersection
and difference/complementation. We can already see the nondeterministic branch construct as a union operator, and intersection and difference can be defined as follows:
Intersection/synchronous concurrency:
next(δ1 & δ2 , a, s) =
    next(δ1 , a, s) & next(δ2 , a, s) if both are different from ⊥
    ⊥ otherwise
by executing δ from s which can be extended until a Final configuration is reached:
GR(δ, s) = {~a | ∃~b.Final(next ∗ (δ, ~a~b, s), do(~a~b, s))}
Difference:
next(δ1 − δ2 , a, s) =
    next(δ1 , a, s) − next(δ2 , a, s) if both are different from ⊥
    next(δ1 , a, s) if next(δ2 , a, s) = ⊥
    ⊥ if next(δ1 , a, s) = ⊥
It is easy to see that CR(δ, s) ⊆ GR(δ, s) ⊆ RR(δ, s),
i.e., complete runs are good runs, and good runs are indeed
runs. Moreover, CR(δ, s) = CR(δ ′ , s) implies GR(δ, s) =
GR(δ ′ , s), i.e., if two programs in a situation have the same
complete runs, then they also have the same good runs; however they may still differ in their sets of non-good runs,
since CR(δ, s) = CR(δ ′ , s) does not imply RR(δ, s) =
RR(δ ′ , s). We say that a program δ in s is non-blocking iff
RR(δ, s) = GR(δ, s), i.e., if all runs of the program δ in s
can be extended to runs that reach a Final configuration.
For these new constructs, Final is defined as follows:
Final(δ1 & δ2 , s) ≡ Final(δ1 , s) ∧ Final(δ2 , s)
Final(δ1 − δ2 , s) ≡ Final(δ1 , s) ∧ ¬Final(δ2 , s)
We can express the complement of a program δ using difference as follows: (πa.a)∗ − δ.
It is easy to check that Proposition 1 and Corollary 2 also
hold for programs involving these new constructs.
As we will see later, synchronous concurrency can be used
to constrain/customize a process. Difference can be used to
prohibit certain process behaviors: δ1 − δ2 is the process
where δ1 is executed but δ2 is not.
To illustrate, consider an agent specified by program δS1
that repeatedly picks an available object and does anything to
it provided it is broken at most once before it is discarded:
δS1 = [π x.Available(x)?;
    [π a.(a−(break(x) | discard(x)))]∗ ;
    (nil | (break(x)); [π a.(a−(break(x) | discard(x)))]∗ );
    discard(x)]∗
The search construct. We can add to the language a search construct Σ, as in [De Giacomo et al., 1998]:
next(Σ(δ), a, s) =
    Σ(next(δ, a, s)) if there exists ~a s.t. Final(next ∗ (δ, a~a, s))
    ⊥ otherwise
Final(Σ(δ), s) ≡ Final(δ, s).
Intuitively, next(Σ(δ), a, s) does lookahead to ensure that action a is in a good run of δ in s, otherwise it returns ⊥.
Notice that: (i) RR(Σ(δ), s) = GR(Σ(δ), s), i.e., under the search construct all programs are non-blocking; (ii)
RR(Σ(δ), s) = GR(δ, s), i.e., Σ(δ) produces exactly the
good runs of δ; (iii) CR(Σ(δ), s) = CR(δ, s), i.e., Σ(δ) and
δ produce exactly the same set of complete runs. Thus Σ(δ)
trims the behavior of δ by eliminating all those runs that do
not lead to a Final configuration.
Note also that if a program is non-blocking in s, then
RR(Σ(δ), s) = RR(δ, s), in which case there is no point in
using the search construct. Finally, we have that: CR(δ, s) =
CR(δ ′ , s) implies RR(Σ(δ), s) = RR(Σ(δ ′ ), s), i.e., if two
programs have the same complete runs, then under the search
construct they have exactly the same runs.
Sequences of actions generated by programs. We can extend the function next to the function next ∗ (δ, ~a, s) that takes
a program δ, a finite sequence of actions ~a,3 and a situation
s, and returns the remaining program δ ′ after executing δ in s
producing the sequence of actions ~a, defined by induction on
the length of the sequence of actions as follows:
next ∗ (δ, ǫ, s) = δ
next ∗ (δ, a~a, s) = next ∗ (next(δ, a, s), ~a, do(a, s))
where ǫ denotes the empty sequence. Note that if along ~a the
program becomes ⊥ then next ∗ returns ⊥ as well.
We define the set RR(δ, s) of (partial) runs of a program δ in
a situation s as the sequences of actions that can be produced
by executing δ from s:4
4 Supervision
Let us assume that we have two agents: an agent B with behavior represented by the program δB and a supervisor S with
behavior represented by δS . While both are represented by
programs, the roles of the two agents are quite distinct. The
first is an agent B that acts freely within its space of deliberation represented by δB . The second, S, is supervising B
so that as B acts, it remains within the behavior permitted by
S. This role makes the program δS act as a specification of
allowed behaviors for agent B.
Note that, because of these different roles, one may want to
assume that all configurations generated by (δS , s) are F inal,
so that we leave B unconstrained on when it may terminate.
This amounts to requiring the following property to hold:
CR(δS , s) = GR(δS , s) = RR(δS , s). While reasonable,
for the technical development below, we do not need to rely
on this assumption.
The behavior of B under the supervision of S is constrained so that at any point B can execute an action in its
original behavior, only if such an action is also permitted in
RR(δ, s) = {~a | next ∗ (δ, ~a, s) ≠ ⊥}
Note that if ~a ∈ RR(δ, s), then all prefixes of ~a are in
RR(δ, s) as well.
We define the set CR(δ, s) of complete runs of a program δ in
a situation s as the sequences of actions that can be produced
by executing δ from s until a Final configuration is reached:
CR(δ, s) = {~a | Final(next ∗ (δ, ~a, s), do(~a, s))}
We define the set GR(δ, s) of good runs of a program δ in a
situation s as the sequences of actions that can be produced
3
Notice that such sequences of actions have to be axiomatized in
second-order logic, similarly to situations (with UNA and domain
closure). As a short cut they could also be characterized directly in
terms of “difference” between situations.
4
Here and in what follows, we use set notation for readability; if
we wanted to be very formal, we could introduce RR as a defined
predicate, and similarly for CR, etc.
S’s behavior. Using the synchronous concurrency operator,
this can be expressed simply as:
Relaxed supervision. To define relaxed supervision we first
need to introduce two operations on programs: projection
and, based on it, relaxation. The projection operation takes a
program and an action filter Au , and projects all the actions
that satisfy the action filter (e.g., are uncontrollable), out of
the execution. To do this, projection substitutes each occurrence of an atomic action term αi by a conditional statement
that replaces it with the trivial test true? when Au (αi ) holds
in the current situation, that is:
δB & δS .
Note that unless δB & δS happens to be non-blocking, it may
get stuck in dead end configurations. To avoid this, we need to
apply the search construct, getting Σ(δB & δS ). In general,
the use of the search construct to avoid blocking is always
needed in the development below.
We can use the example programs presented earlier to illustrate. The execution of δB1 under the supervision of δS1
is simply δB1 & δS1 (assuming all actions are controllable).
It is straightforward to show that the resulting behavior is to
repeatedly pick an available object and use it as long as one
likes, breaking it at most once, and repairing it whenever it
breaks, before discarding it. It can be shown that the set of
partial/complete runs of δB1 & δS1 is exactly that of:
pj (δ, Au ) = δ with each occurrence of an action term αi in δ replaced by
    if Au (αi ) then true? else αi .
(Recall that such a test does not perform any transition in our
variant of ConGolog.)
The relaxation operation on δ wrt Au (a, s) is as follows:
rl (δ, Au ) = pj (δ, Au ) ‖ (πa.Au (a)?; a)∗ .
[π x.Available(x)?;
use(x)∗ ;
[nil | (break(x); repair(x); use(x)∗ )];
discard(x)]∗
In other words, we project out the actions in Au from δ and
run the resulting program concurrently with one that picks
(uncontrollable) actions filtered by Au and executes them.
The resulting program no longer constrains the occurrence
of actions from Au in any way. In fact, notice that the remaining program of (πa.Au (a)?; a)∗ after the execution of
an (uncontrollable) filtered action is (πa.Au (a)?; a)∗ itself,
and that such a program is always Final.
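As a toy rendering of relaxation — ours, and assuming a fixed, situation-independent set of uncontrollable action names whose occurrences the supervisor's own runs do not mention — a sequence is a complete run of the relaxed supervisor exactly when deleting its uncontrollable actions leaves a complete run of the original supervisor:

```python
# Run-set illustration of relaxation: the relaxed supervisor places no
# constraints on uncontrollable actions.  Assumes the supervisor's own runs
# contain no uncontrollable actions, so projecting them out of a candidate
# run suffices.

def relax_accepts(supervisor_runs, uncontrollable, run):
    """True iff 'run' is a complete run of the relaxed supervisor, i.e. after
    deleting uncontrollable actions it is a complete run of the supervisor."""
    projected = tuple(a for a in run if a not in uncontrollable)
    return projected in supervisor_runs

sitter = {("out",), ("stay",)}
print(relax_accepts(sitter, {"muddy"}, ("out", "muddy")))   # True
print(relax_accepts(sitter, {"muddy"}, ("out", "nap")))     # False: 'nap' is not allowed
```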
Now we are ready to define relaxed supervision. Let us
consider a supervisor S with behavior δS for agent B with
behavior δB . Let the action filter Au (au , s) specify the uncontrollable actions. Then the relaxed supervision of S (for
Au (au , s)) in s is the relaxation of δS so that it allows
every uncontrollable action, namely: rl (δS , Au ). So we can
characterize the behavior of B under the relaxed supervision
of S as:
δB & rl (δS , Au ).
The following properties are immediate consequences of
the definitions:
Uncontrollable actions. In the above, we implicitly assumed
that all actions of agent B could be controlled by the supervisor S. This is often too strong an assumption, e.g. once we
let a child out in a garden after rain, there is nothing we can
do to prevent her/him from getting muddy. We now want to
deal with such cases.
Following [Wonham and Ramadge, 1987], we distinguish
between actions that are uncontrollable by the supervisor and
actions that are controllable. The supervisor can block the
execution of the controllable actions but cannot prevent the
supervised agent from executing the uncontrollable ones.
To characterize the uncontrollable actions in the situation
calculus, we use a special fluent Au (au , s), which we call an
action filter, that expresses that action au is uncontrollable in
situation s. Notice that, differently from the Wonham and Ramadge work, we allow controllability to be context dependent
by allowing an arbitrary specification of the fluent Au (au , s)
in the situation calculus.
While we would like the supervisor S to constrain agent B
so that δB & δS is executed, in reality, since S cannot prevent uncontrollable actions, S can only constrain B on the
controllable actions. When this is sufficient, we say that the
supervisor is “effective”. Technically, following again Wonham and Ramadge’s ideas, this can be captured by saying that
the supervision by δS is effective for δB in situation s iff:
Proposition 3 The relaxed supervision rl (δS , Au ) is effective for δB in situation s.
Proposition 4 CR(δB & δS , s) ⊆ CR(δB & rl (δS , Au ), s).
Proposition 5 If CR(δB & rl (δS , Au ), s) ⊆ CR(δB &
δS , s), then δS is effective for δB in situation s.
Notice that the first one is what we wanted. But the second one says that rl (δS , Au ) may indeed be more permissive than δS : some complete runs that are disallowed in δS may be permitted by its relaxation rl (δS , Au ). This is not always acceptable. The last one says that when the converse of Proposition 4 holds, we have that the original supervision δS is indeed effective for δB in situation s. Notice however that even if δS is effective for δB in situation s, it may still be the case that CR(δB & rl (δS , Au ), s) ⊂ CR(δB & δS , s).
∀~aau .~a ∈ GR(δB & δS , s) and Au (au , do(~a, s)) implies
if ~aau ∈ GR(δB , s) then ~aau ∈ GR(δS , s).
What this says is that if we postfix a good run ~a for both
B and S with an uncontrollable action au that is good for B,
then this uncontrollable action au must also be good for S. By
the way, notice that ~aau ∈ GR(δB , s) and ~aau ∈ GR(δS , s)
together imply that ~aau ∈ GR(δB & δS , s).
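The effectiveness condition can be illustrated on finite sets of runs. The sketch below is our own simplification — runs are plain tuples of action names and good runs are approximated by prefixes of complete runs — not the situation calculus formalization:

```python
# Effectiveness on finite run sets: after any good run shared by agent and
# supervisor, every uncontrollable action the agent can do must also be
# allowed by the supervisor.

def prefixes(run):
    return [run[:i] for i in range(len(run) + 1)]

def is_effective(agent_runs, supervisor_runs, uncontrollable):
    """agent_runs / supervisor_runs: finite sets of complete runs (tuples).
    uncontrollable: set of action names the supervisor cannot block."""
    agent_good = {p for r in agent_runs for p in prefixes(r)}
    sup_good = {p for r in supervisor_runs for p in prefixes(r)}
    for run in agent_good & sup_good:
        for a in uncontrollable:
            if run + (a,) in agent_good and run + (a,) not in sup_good:
                return False        # the supervisor would have to block a
    return True

# Child/babysitter flavour: going outside is controllable, getting muddy is not.
child = {("out", "muddy"), ("stay",)}
sitter_strict = {("out",), ("stay",)}          # allows out but not muddy
sitter_safe = {("stay",)}                      # never lets the child out
print(is_effective(child, sitter_strict, {"muddy"}))  # False
print(is_effective(child, sitter_safe, {"muddy"}))    # True
```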
What about if such a property does not hold? We can take
two orthogonal approaches: (i) relax δS so that it places no
constraints on the uncontrollable actions; (ii) require that δS
be indeed enforced, but disallow all those runs that prevent
δS from being effective. We look at both approaches below.
Maximal permissive supervisor. Next we study a more conservative approach: we require the supervision δS to be fulfilled, and for getting effectiveness we restrict it further. Interestingly, we show that there is a single maximal way of
restricting the supervisor S so that it both fulfills δS and becomes effective. We call the resulting supervisor the maximal
permissive supervisor.
are controllable), the supervisor S1 can only ensure that its
constraints are satisfied if it forces B1 to discard an object as
soon as it is broken and repaired. This is what we get as maximal permissive supervisor mps(δB1 , δS1 , S0 ), whose set of
partial/complete runs can be shown to be exactly that of:
We start by introducing a new abstract program construct set(E), taking as argument a possibly infinite set E of sequences of actions, with next and Final defined as follows:
next(set(E), a, s) =
    set(E ′ ) with E ′ = {~a | a~a ∈ E} if E ′ ≠ ∅
    ⊥ if E ′ = ∅
Final(set(E), s) ≡ (ǫ ∈ E)
[π x.Available(x)?;
use(x)∗ ;
[nil | (break(x); repair(x))];
discard(x)]∗
Thus set(E) can be executed to produce any of the sequences
of actions in E.
Notice that for every program δ and situation s, we can
define Eδ = CR(δ, s) such that CR(set(Eδ ), s) = CR(δ, s).
The converse does not hold in general, i.e., there are abstract
programs set(E) such that for all programs δ, not involving
the set(·) construct, CR(set(Eδ ), s) 6= CR(δ, s). That is,
the syntactic restrictions in ConGolog may not allow us to
represent some possible sets of sequences of actions.
With the set(E) construct at hand, following [Wonham
and Ramadge, 1987], we may define the maximal permissive
supervisor mps(δB , δS , s) of B with behavior δB by S with
behavior δS in situation s, as:
mps(δB , δS , s) = set(∪E∈E E), where
By the way, notice that (δB1 & rl (δS1 , Au )) instead is completely ineffective since it has exactly the same runs as δB1 .
Unfortunately, in general, mps(δB , δS , s) requires the use
of the abstract program construct set(E), which can be expressed directly in ConGolog only if E is finite.5 For this
reason the above characterization remains essentially mathematical. So next, we develop a new construct for execution
of programs under maximal permissive supervision, which is
indeed realizable.
Maximal permissive supervised execution. To capture the
notion of maximal permissive execution of agent B with behavior δB under the supervision of S with behavior δS in situation s, we introduce a special version of the synchronous
concurrency construct that takes into account the fact that some actions are uncontrollable. Without loss of generality, we assume that δB and δS both start with a common controllable action (if not, it is trivial to add a dummy action in front of both so as to fulfil the requirement). Then, we characterize
the construct through next and Final as follows:
next(δB &Au δS , a, s) =
⊥ if ¬Au (a, s) and ∃a~u .Au (a~u , do(a, s)) s.t.
next ∗ (Σ(δB ), aa~u , s) ≠ ⊥ and next ∗ (Σ(δS ), aa~u , s) = ⊥
E = {E | E ⊆ CR(δB & δS , s)
and set(E) is effective for δB in s}
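For finite sets of runs, this definition can be evaluated by brute force: take every subset of the jointly allowed complete runs, keep those that are effective, and return their union. The following sketch is our own toy rendering under that finite-state simplification, not the general construction:

```python
from itertools import chain, combinations

# Brute-force maximal permissive supervisor over finite run sets: the union of
# all subsets E of the jointly allowed complete runs such that set(E) is an
# effective supervisor for the agent.

def prefixes(run):
    return {run[:i] for i in range(len(run) + 1)}

def good(runs):
    return set().union(*(prefixes(r) for r in runs)) if runs else set()

def effective(E, agent_runs, uncontrollable):
    agent_good, sup_good = good(agent_runs), good(E)
    for p in agent_good & sup_good:
        for a in uncontrollable:
            if p + (a,) in agent_good and p + (a,) not in sup_good:
                return False
    return True

def mps(agent_runs, supervisor_runs, uncontrollable):
    joint = {r for r in agent_runs if r in supervisor_runs}   # finite stand-in for CR(δB & δS)
    candidates = chain.from_iterable(combinations(joint, k) for k in range(len(joint) + 1))
    allowed = set()
    for E in map(set, candidates):
        if effective(E, agent_runs, uncontrollable):
            allowed |= E
    return allowed

child = {("out", "muddy"), ("stay",)}
sitter = {("out",), ("stay",)}            # would have to block the uncontrollable 'muddy'
print(mps(child, sitter, {"muddy"}))      # {('stay',)} — 'out' cannot be permitted
```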
Intuitively mps denotes the maximal set of runs that are effectively allowable by a supervisor that fulfills the specification δS , and which can be left to the arbitrary decisions of the
agent δB on the non-controllable actions. A quite interesting
result is that, even in the general setting we are presenting,
such a maximally permissive supervisor always exists and is
unique. Indeed, we can show:
Theorem 6 For the maximal permissive supervisor
mps(δB , δS , s) the following properties hold:
1. mps(δB , δS , s) always exists and is unique;
2. mps(δB , δS , s) is an effective supervisor for δB in s;
⊥ if next(δB , a, s) = ⊥ or next(δS , a, s) = ⊥
next(δB , a, s) &Au next(δS , a, s) otherwise
Here Au (a~u , s) is inductively defined on the length of a~u as
the smallest predicate such that: (i) Au (ǫ, s) ≡ true; (ii)
Au (au a~u , s) ≡ Au (au , s) ∧ Au (a~u , do(au , s)).
Final for the new construct is as follows:
3. For every possible effective supervisor δ̂S for δB in s
such that CR(δB & δ̂S , s) ⊆ CR(δB & δS , s), we have
that CR(δB & δ̂S , s) ⊆ CR(δB & mps(δB , δS , s), s).
Proof: We prove the three claims separately.
Claim 1 follows directly from the fact that set(∅) satisfies the
conditions to be included in mps(δB , δS , s).
Claim 3 also follows immediately from the definition of
mps(δB , δS , s), by recalling that CR(δB & δ̂S , s) =
CR(δB & set(Eδ̂S ), s).
Final(δB &Au δS , s) ≡ Final(δB , s) ∧ Final(δS , s).
This new construct captures exactly the maximal permissive
supervisor; indeed the theorem below shows the correctness
of maximal permissive supervised execution:
Theorem 7
CR(δB &Au δS , s) = CR(δB & mps(δB , δS , s), s).
Proof: We start by showing:
For Claim 2, it suffices to show that ∀~aau .~a ∈ GR(δB &
mps(δB , δS , s), s) and Au (au , do(~a, s)) we have that if
~aau ∈ GR(δB , s) then ~aau ∈ GR(mps(δB , δS , s), s). Indeed, if ~a ∈ GR(δB & mps(δB , δS , s), s) then there is
an effective supervisor set(E) such that ~a ∈ GR(δB &
set(E), s). set(E) being effective for δB in s, if
~aau ∈ GR(δB , s) then ~aau ∈ GR(set(E), s), but then
~aau ∈ GR(mps(δB , δS , s), s).
CR(δB &Au δS , s) ⊆ CR(δB & mps(δB , δS , s), s).
It suffices to show that δB &Au δS is effective for δB in
s. Indeed, if this is the case, by considering that δB &
mps(δB , δS , s) is the largest effective supervisor for δB in s,
and that RR(δB & (δB &Au δS ), s) = RR(δB &Au δS , s),
we get the thesis.
5
Note that the object domain may be uncountable in general,
hence not even an infinitary ConGolog program could capture
set(E) in general.
We can illustrate using our example programs. If we assume that the break action is uncontrollable (and the others
So we have to show that: ∀~a au . ~a ∈ GR(δB &Au δS , s) and Au (au , do(~a, s)) we have that if ~a au ∈ GR(δB , s) then ~a au ∈ GR(δB &Au δS , s).
Since, wlog, we assume that δB and δS started with a common controllable action, we can write ~a = a~′ ac a~u , where ¬Au (ac , do(a~′ , s)) and Au (a~u , do(a~′ ac , s)) holds. Let δ′B = next ∗ (δB , a~′ , s), δ′S = next ∗ (δS , a~′ , s), and s′ = do(a~′ , s). By the fact that a~′ ac a~u ∈ GR(δB &Au δS , s) we know that next(δ′B &Au δ′S , do(ac , s′ )) ≠ ⊥. But then, by the definition of next, we have that for all b~u such that Au (b~u , s′ ), if b~u ∈ GR(δ′B , do(ac , s′ )) then b~u ∈ GR(δ′S , do(ac , s′ )). In particular this holds for b~u = a~u au . Hence we have that if ~a au ∈ GR(δB , s) then ~a au ∈ GR(δS , s).
Next we prove:
CR(δB & mps(δB , δS , s), s) ⊆ CR(δB &Au δS , s).
Suppose not. Then there exists a complete run ~a such that ~a ∈ CR(δB & mps(δB , δS , s), s) but ~a ∉ CR(δB &Au δS , s). As an aside, notice that if ~a ∈ CR(δ, s) then ~a ∈ GR(δ, s) and for all prefixes a~′ such that a~′~b = ~a we have a~′ ∈ GR(δ, s). Hence, let a~′ = a~′′ a such that a~′′ ∈ GR(δB &Au δS , s) but a~′′ a ∉ GR(δB &Au δS , s), and let δ′′B = next ∗ (δB , a~′′ , s), δ′′S = next ∗ (δS , a~′′ , s), and s′′ = do(a~′′ , s).
Since a~′′ a ∉ GR(δB &Au δS , s), it must be the case that next(δ′′B &Au δ′′S , a, s′′ ) = ⊥. But then, considering that both next(δ′′B , a, s′′ ) ≠ ⊥ and next(δ′′S , a, s′′ ) ≠ ⊥, it must be the case that ¬Au (a, s′′ ) and there exists b~u such that Au (b~u , do(a, s′′ )), and a b~u ∈ GR(δ′′B , s′′ ) but a b~u ∉ GR(δ′′S , s′′ ).
Notice that b~u ≠ ǫ, since we have that a ∈ GR(δ′′S , s′′ ). So b~u = c~u bu d~u with a c~u ∈ GR(δ′′S , s′′ ) but a c~u bu ∉ GR(δ′′S , s′′ ).
Now a~′ ∈ GR(δB & mps(δB , δS , s), s) and since Au (c~u bu , do(a~′ , s)), we have that a~′ c~u bu ∈ GR(δB & mps(δB , δS , s), s). Since mps(δB , δS , s) is effective for δB in s, we have that, if a~′ c~u bu ∈ GR(δB , s) then a~′ c~u bu ∈ GR(mps(δB , δS , s), s). This, by definition of mps(δB , δS , s), implies a~′ c~u bu ∈ GR(δB & δS , s), and hence, in turn, a~′ c~u bu ∈ GR(δS , s). Hence, we can conclude that a c~u bu ∈ GR(δ′′S , s′′ ), getting a contradiction.
5 Conclusion
In this paper, we have investigated agent supervision in situation-determined ConGolog programs. Our account of maximal permissive supervisor builds on [Wonham and Ramadge, 1987]. However, Wonham and Ramadge’s work deals with finite state automata, while we handle infinite state systems in the context of the rich agent framework provided by the situation calculus and ConGolog. We used ConGolog as a representative of an unbounded-states process specification language, and it should be possible to adapt our account of supervision to other related languages. We considered a form of supervision that focuses on complete runs, i.e., runs that lead to Final configurations. We can ensure that an agent finds such executions by having it do lookahead/search.
Also of interest is the case in which agents act boldly without necessarily performing search to get to Final configurations. In this case, we need to consider all partial runs, not just good ones. Note that this would actually yield the same result if we engineered the agent behavior such that all of its runs are good runs, i.e. if RR(δB , s) = GR(δB , s), i.e., all configurations are final. In fact, one could define a closure construct cl (δ) that would make all configurations of δ final. Using this, one can apply our specification of the maximal permissive supervisor to this case as well if we replace δB & δS by cl (δB & δS ) in the definition. Observe also that under the assumption RR(δB , s) = GR(δB , s), in next(δB &Au δS , a, s) we no longer need to do the search Σ(δB ) and Σ(δS ) and can directly use δB and δS .
We conclude by mentioning that if the object domain is finite, then ConGolog programs assume only a finite number of possible configurations. In this case, we can take advantage of the finite state machinery that was originally proposed by Wonham and Ramadge (generalizing it to deal with situation-dependent sets of controllable actions), and the recent work on translating ConGolog into finite state machines and back [Fritz et al., 2008], to obtain a program that actually characterizes the maximally permissive supervisor. In this way, we can completely avoid doing search during execution. We leave an exploration of this notable case for future work.
Acknowledgments
We thank Murray Wonham for inspiring discussions on
supremal controllable languages in finite state discrete event
control, which actually made us look into agent supervision
from a different and very fruitful point of view. We also thank
the anonymous referees for their comments. We acknowledge
the support of EU Project FP7-ICT ACSI (257593).
References
[Bertoli et al., 2010] Piergiorgio Bertoli, Marco Pistore, and
Paolo Traverso. Automated composition of web services via planning in asynchronous domains. Artif. Intell.,
174(3-4):316–361, 2010.
[De Giacomo et al., 1998] Giuseppe De Giacomo, Raymond
Reiter, and Mikhail Soutchanski. Execution monitoring of
high-level robot programs. In KR, pages 453–465, 1998.
[De Giacomo et al., 2000] Giuseppe De Giacomo, Yves
Lespérance, and Hector J. Levesque. ConGolog, a concurrent programming language based on the situation calculus. Artificial Intelligence, 121(1–2):109–169, 2000.
[De Giacomo et al., 2010] Giuseppe De Giacomo, Yves
Lespérance, and Adrian R. Pearce. Situation calculus
based programs for representing and reasoning about game
structures. In KR, 2010.
[Demolombe and Hamon, 2002] Robert Demolombe and
Erwan Hamon. What does it mean that an agent is performing a typical procedure? a formal definition in the
situation calculus. In AAMAS, pages 905–911, 2002.
[Fritz and Gil, 2010] Christian Fritz and Yolanda Gil. Towards the integration of programming by demonstration
and programming by instruction using Golog. In PAIR,
2010.
[Fritz and McIlraith, 2006] Christian Fritz and Sheila McIlraith. Decision-theoretic Golog with qualitative preferences. In KR, pages 153–163, June 2–5 2006.
[Fritz et al., 2008] Christian Fritz, Jorge A. Baier, and
Sheila A. McIlraith. ConGolog, sin trans: Compiling ConGolog into basic action theories for planning and beyond.
In KR, pages 600–610, 2008.
[Lin et al., 2008] Naiwen Lin, Ugur Kuter, and Evren Sirin.
Web service composition with user preferences. In ESWC,
pages 629–643, 2008.
[McIlraith and Son, 2002] S. McIlraith and T. Son. Adapting
Golog for composition of semantic web services. In KR,
pages 482–493, 2002.
[Reiter, 2001] Ray Reiter. Knowledge in Action. Logical
Foundations for Specifying and Implementing Dynamical
Systems. The MIT Press, 2001.
[Sardiña and De Giacomo, 2009] Sebastian Sardiña and
Giuseppe De Giacomo.
Composition of ConGolog
programs. In IJCAI, pages 904–910, 2009.
[Sohrabi et al., 2009] Shirin Sohrabi, Nataliya Prokoshyna,
and Sheila A. McIlraith. Web service composition via the
customization of Golog programs with user preferences.
In Conceptual Modeling: Foundations and Applications,
pages 319–334. Springer, 2009.
[Su, 2008] Jianwen Su. Special issue on semantic web services: Composition and analysis. IEEE Data Eng. Bull.,
31(3), 2008.
[Wonham and Ramadge, 1987] W.M. Wonham and P.J. Ramadge. On the supremal controllable sub-language of a given language. SIAM J. Contr. Optim., 25(3):637–659, 1987.
On the Use of Epistemic Ordering Functions as Decision Criteria
for Automated and Assisted Belief Revision in SNePS
(Preliminary Report)
Ari I. Fogel and Stuart C. Shapiro
University at Buffalo, The State University of New York, Buffalo, NY
{arifogel,shapiro}@buffalo.edu
Abstract
We implement belief revision in SNePS based on a user-supplied epistemic ordering of propositions. We provide a decision procedure that performs revision completely automatically when given a well preorder. We also provide a decision procedure for revision that, when given a total preorder, simulates a well preorder by making a minimal number of queries to the user when multiple propositions within a minimally-inconsistent set are minimally-epistemically-entrenched. The first procedure uses O(|Σ|) units of space, and completes within O(|Σ|² · smax ) units of time, where Σ is the set of distinct minimally-inconsistent sets, and smax is the number of propositions in the largest minimally-inconsistent set. The second procedure uses O(|Σ|² · s²max ) space and O(|Σ|² · s²max ) time. We demonstrate how our changes generalize previous techniques employed in SNePS.
1 Introduction
1.1 Belief Revision
Several varieties of belief revision have appeared in the
literature over the years. AGM revision typically refers to
the addition of a belief to a belief set, at the expense of
its negation and any other beliefs supporting its negation
[Alchourron et al., 1985]. Removal of a belief and beliefs
that support it is called contraction. Alternatively, revision
can refer to the process of resolving inconsistencies in a
contradictory knowledge base, or one known to be inconsistent [Martins and Shapiro, 1988]. This is accomplished
by removing one or more beliefs responsible for the inconsistency, or culprits. This is the task with which we
are concerned. In particular, we have devised a means of
automatically resolving inconsistencies by discarding the
least-preferred beliefs in a belief base, according to some
epistemic ordering [Gärdenfors, 1988; Williams, 1994;
Gärdenfors and Rott, 1995].
Theory Change on Finite Bases
It is widely accepted that agents, because of their
limited resources, believe some but by no means all
of the logical consequences of their beliefs. [Lakemeyer, 1991]
A major issue with the AGM paradigm is that it tends to operate on
and produce infinite sets (theories). A more practical model
would include operations to be performed on finite belief
sets, or belief bases. Such operators would be useful in supporting computer-based implementations of revision systems
[Williams, 1994].
It has been argued that the AGM paradigm uses a coherentist approach¹ [Gärdenfors, 1989], in that all beliefs require some sort of external justification. On the other hand,
finite-base systems are said to use a foundationalist approach,
wherein some beliefs indeed have their own epistemic standing, and others can be derived from them. SNePS, as we shall
see, uses the finite-base foundationalist approach.
The problem of belief revision is that logical considerations alone do not tell you which beliefs to give up, but this has to be decided by some other means. What makes things more complicated is that beliefs in a database have logical consequences. So when giving up a belief you have to decide as well which of its consequences to retain and which to retract... [Gärdenfors and Rott, 1995]
In later sections we will discuss in detail how to make a choice of belief(s) to retract when presented with an inconsistent belief set.
¹ It has also been argued otherwise [Hansson and Olsson, 1999].
Epistemic Entrenchment
Let us assume that the decision on which beliefs to retract from a belief base is based on the relative importance of each belief, which is called its degree of epistemic entrenchment [Gärdenfors, 1988]. Then we need an ordering ≤ with which to compare the entrenchment of individual
beliefs. Beliefs that are less entrenched are preferentially discarded during revision over beliefs that are more entrenched.
An epistemic entrenchment ordering is used to uniquely determine the result of AGM contraction. Such an ordering is
a noncircular total preorder (that satisfies certain other postulates) on all propositions.
Ensconcements
Ensconcements, introduced in [Williams, 1994], consist of a
set of forumlae together with a total preorder on that set. They
can be used to construct epistemic entrenchment orderings,
and determine theory base change operators.
Safe Contraction
In [Alchourron and Makinson, 1985], the operation safe contraction is introduced. Let < be a non-circular relation over a belief set A. An element a of A is safe with respect to x iff a is not a minimal element of any minimal subset B of A such that x ∈ Cn(B). Let A/x be the set of all elements of A that are safe with respect to x. Then the safe contraction of A by x, denoted A −s x, is defined to be A ∩ Cn(A/x).
1.2 SNePS
Description of the System
“SNePS is a logic-, frame-, and network- based knowledge
representation, reasoning, and acting system. Its logic is
based on Relevance Logic [Shapiro, 1992], a paraconsistent
logic (in which a contradiction does not imply anything whatsoever) [Shapiro and Johnson, 2000].”
SNeRE, the SNePS Rational Engine, provides an acting
system for SNePS-based agents, whose beliefs must change
to keep up with a changing world. Of particular interest is
the believe action, which is used to introduce beliefs that take
priority over all other beliefs at the time of their introduction.
Assumption-based Truth Maintenance Systems
In an assumption-based truth maintenance system (ATMS),
the system keeps track of the assumptions (base beliefs) underlying each belief [de Kleer, 1986]. One of the roles of a conventional TMS is to keep the database contradiction-free. In an ATMS, contradictions are removed as
they are discovered. When a contradiction is detected in an
ATMS, then there will be one or more minimally-inconsistent
sets of assumptions underlying the contradiction. Such sets
are called no-goods. [Martins and Shapiro, 1988] presented
SNeBR, an early implementation of an ATMS that uses the
logic of SNePS. In that paper, sets of assumptions supporting
a belief are called origin sets. They correspond to antecedents
of a justification from [de Kleer, 1986]. The focus of this paper is modifications to the modern version of SNeBR.
Belief Change in SNePS
Every belief in a SNePS knowledge base (which consists
of a belief base and all currently-known derived propositions
therefrom) has one or more support sets, each of which consists of an origin tag and an origin set. The origin tag will
identify a belief as either being introduced as a hypothesis, or
derived (note that it is possible for a belief to be both introduced as a hypothesis and derived from other beliefs). The
origin set contains those hypotheses that were used to derive
the belief. In the case of the origin tag denoting a hypothesis, the corresponding origin set would be a singleton set
containing only the belief itself. The contents of the origin
set of a derived belief are computed by the implemented rules
of inference at the time the inference is drawn [Martins and
Shapiro, 1988; Shapiro, 1992].
The representation of beliefs in SNePS lends itself well to
the creation of processes for contraction and revision. Specifically, in order to contract a belief, one must merely remove
at least one hypothesis from each of its origin sets. Similarly,
prioritized revision by a belief b (where ¬b is already believed) is accomplished by removing at least one belief from each origin set of ¬b. Non-prioritized belief revision under
this paradigm is a bit more complicated. We discuss both
types of revision in more detail in §2.
Kernel Contraction
In [Hansson, 1994], the operation kernel contraction is introduced. A kernel set A ⊥⊥ α is defined to be the set of all minimal subsets of A that imply α. A kernel set is like a set of origin sets from [Martins and Shapiro, 1988]. Let σ be an incision function for A. Then for all α, σ(A ⊥⊥ α) ⊆ ∪(A ⊥⊥ α), and if ∅ ≠ X ∈ A ⊥⊥ α, then X ∩ σ(A ⊥⊥ α) ≠ ∅. The kernel contraction of A by α based on σ, denoted A ∼σ α, is equal to A \ σ(A ⊥⊥ α).
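To make the kernel-contraction machinery concrete, the following Python sketch (not from the paper; the belief base, the entailment test, and the incision function are illustrative assumptions) computes kernels by brute force over subsets and then removes the incised elements.

from itertools import combinations

def kernels(base, alpha, implies):
    """All minimal subsets of `base` that imply `alpha` (the kernel set A ⊥⊥ alpha).
    `implies(subset, alpha)` is an assumed, user-supplied entailment test."""
    base = list(base)
    found = []
    for size in range(len(base) + 1):
        for subset in combinations(base, size):
            s = frozenset(subset)
            if implies(s, alpha) and not any(k < s for k in found):
                found.append(s)
    return found

def kernel_contraction(base, alpha, implies, incision):
    """A ~sigma alpha = A minus sigma(A ⊥⊥ alpha), where `incision` picks at least
    one element from every non-empty kernel."""
    cut = incision(kernels(base, alpha, implies))
    return set(base) - cut

# Toy example: beliefs are strings; 'implies' says {p, p->q} implies q.
def implies(subset, alpha):
    return alpha in subset or ({"p", "p->q"} <= subset and alpha == "q")

def incision(kernel_set):
    # Minimal illustrative incision: take one (arbitrary) element from each kernel.
    return {min(k) for k in kernel_set}

print(kernel_contraction({"p", "p->q", "r"}, "q", implies, incision))

The brute-force kernel search is exponential and only meant to mirror the definition; the origin sets maintained by SNePS play the role of precomputed kernels.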
Prioritized Versus Non-Prioritized Belief Revision
In the AGM model of belief revision [Alchourron et al., 1985] . . . the input sentence is always accepted. This is clearly an unrealistic feature, and . . . several models of belief change have been proposed in which no absolute priority is assigned to the new information due to its novelty. . . . One way to construct non-prioritized belief revision is to base it on the following two-step process: First we decide whether to accept or reject the input. After that, if the input was accepted, it is incorporated into the belief state [Hansson, 1999].
Hansson goes on to describe several other models of non-prioritized belief revision, but they all have one unifying feature distinguishing them from prioritized belief revision: the input, i.e. the RHS argument to the revision operator, is not always accepted. To reiterate: Prioritized belief revision is revision in which the proposition by which the set is revised is always present in the result (as long as it is not a contradiction). Non-prioritized belief revision is revision in which the RHS argument to the revision operator is not always present in the result (even if it is not a contradiction).
The closest approximation from Hansson's work to our work is the operation of semi-revision [Hansson, 1997]. Semi-revision is a type of non-prioritized belief revision that may be applied to belief bases.
SNeBR
SNeBR, The SNePS Belief Revision subsystem, is responsible for resolving inconsistencies in the knowledge base as
they are discovered. In the current release of SNePS (version
2.7.1), SNeBR is able to automatically resolve contradictions
under a limited variety of circumstances [Shapiro and The
SNePS Implementation Group, 2010, 76]. Otherwise “assisted culprit choosing” is performed, where the user must
manually select culprits for removal. After belief revision
is performed, the knowledge base might still be inconsistent,
but every known derivation of an inconsistency has been eliminated.
2.2 Common Requirements for a Rational Belief
Revision Algorithm
Primary Requirements
The inputs to the algorithm are:
• A set of formulae Φ: the current belief base, which is
known to be inconsistent
• A total preorder ≤ on Φ: an epistemic entrenchment ordering that can be used to compare the relative desirability of each belief in the current belief base
2 New Belief Revision Algorithms
2.1 Problem Statement
Nonprioritized Belief Revision
Suppose we have a knowledge base that is not known to be
inconsistent, and suppose that at some point we add a contradictory belief to that knowledge base. Either that new belief
directly contradicts an existing belief, or we derive a belief
that directly contradicts an existing one as a result of performing forward and/or backward inference on the new belief. Now the knowledge base is known to be inconsistent.
We will refer to the contradictory beliefs as p and ¬p.
Since SNePS tags each belief with one or more origin sets,
or sets of supporting hypotheses, we can identify the underlying beliefs that support each of the two contradictory beliefs.
In the case where p and ¬p each have one origin set, OSp and OS¬p respectively, we may resolve the contradiction by removing at least one hypothesis from OSp ∪ OS¬p. We shall refer to such a union as a no-good. If there are m origin sets for p, and n origin sets for ¬p, then there will be at most m × n distinct no-goods (some unions may be duplicates of others).
To resolve a contradiction in this case, we must retract at least
one hypothesis from each no-good (Sufficiency).
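As an illustration of how the no-goods arise from origin sets, the following Python sketch (hypothetical data structures, not the SNeBR implementation) forms the at most m × n unions and discards duplicates.

def no_goods(origin_sets_p, origin_sets_not_p):
    """Each argument is a list of origin sets (sets of hypotheses) for p and for ~p.
    Every union of one origin set for p with one for ~p is a candidate culprit set."""
    unions = set()
    for os_p in origin_sets_p:
        for os_np in origin_sets_not_p:
            unions.add(frozenset(os_p) | frozenset(os_np))
    return unions  # at most m * n distinct no-goods

# Hypothetical example: p has two origin sets, ~p has one.
print(no_goods([{"h1", "h2"}, {"h3"}], [{"h2", "h4"}]))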
We wish to devise an algorithm that will select the hypotheses for removal from the set of no-goods. The first priority will be that the hypotheses selected should be minimally-epistemically-entrenched (Minimal Entrenchment) according to some total preorder ≤. Note that we are not referring strictly to an AGM entrenchment order, but to a total preorder on the set of hypotheses, without regard to the AGM
postulates. The second priority will be not to remove any
more hypotheses than are necessary in order to resolve the
contradiction (Information Preservation), while still satisfying priority one.
• Minimally-inconsistent sets of formulae σ1 , . . . , σn , each
of which is a subset of Φ: the no-goods
• A set Σ = {σ1, . . . , σn}: the set of all the no-goods
The algorithm should produce a set T that satisfies the following conditions:
(EESNePS 1) ∀σ[σ ∈ Σ → ∃τ[τ ∈ (T ∩ σ)]] (Sufficiency)
(EESNePS 2) ∀τ[τ ∈ T → ∃σ[σ ∈ Σ ∧ τ ∈ σ ∧ ∀w[w ∈ σ → τ ≤ w]]] (Minimal Entrenchment)
(EESNePS 3) ∀T′[T′ ⊂ T → ¬∀σ[σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)]]] (Information Preservation)
Condition (EESNePS 1) states that T contains at least one formula from each set in Σ. Condition (EESNePS 2) states that every formula in T is a minimally-entrenched formula of some set in Σ. Condition (EESNePS 3) states that if any formula is removed from T, then Condition (EESNePS 1) will no longer
terminate on all possible inputs, i.e. it must be a decision
procedure.
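The three conditions can be checked mechanically for a candidate culprit set T. The Python sketch below is illustrative only; Σ is given as a collection of sets and ≤ as a comparison function, both assumed inputs.

def satisfies_conditions(T, sigma_sets, leq):
    """Check (EESNePS 1)-(EESNePS 3) for a candidate culprit set T.
    `sigma_sets` holds the no-goods; `leq(a, b)` is the assumed entrenchment preorder."""
    # (EESNePS 1): T hits every no-good.
    sufficiency = all(T & s for s in sigma_sets)
    # (EESNePS 2): every culprit is minimally entrenched in some no-good containing it.
    minimal = all(
        any(t in s and all(leq(t, w) for w in s) for s in sigma_sets)
        for t in T
    )
    # (EESNePS 3): no proper subset of T still hits every no-good.
    # Checking the subsets obtained by dropping a single element suffices.
    preservation = all(
        not all((T - {t}) & s for s in sigma_sets)
        for t in T
    )
    return sufficiency and minimal and preservation

# Example with two no-goods and alphabetical entrenchment (earlier letters less entrenched).
sigma = [frozenset({"a", "b"}), frozenset({"a", "c"})]
print(satisfies_conditions({"a"}, sigma, lambda x, y: x <= y))  # True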
Supplementary Requirement
In any case where queries must be made of the user in order
to determine the relative epistemic ordering of propositions,
the number of such queries must be kept to a minimum.
2.3 Implementation
We present algorithms to solve the problem as stated:
Where we refer to ≤ below, we are using the prioritized entrenchment ordering from §2.1. In the case of nonprioritized revision we may assume that P = ∅.
Prioritized Belief Revision
The process of Prioritized Belief Revision in SNePS occurs
when a contradiction is discovered after a belief is asserted
explicitly using the believe act of SNeRE. The major difference here is that a subtle change is made to the entrenchment
ordering ≤. If ≤nonpri is the ordering used for nonprioritized belief revision, then for prioritized belief revision we use an ordering ≤pri as follows:
Let P be the set of beliefs asserted by a believe action. Then
∀e1, e2 [e1 ∈ P ∧ e2 ∉ P → ¬(e1 ≤pri e2) ∧ e2 ≤pri e1]
∀e1, e2 [e1 ∉ P ∧ e2 ∉ P → (e1 ≤pri e2 ↔ e1 ≤nonpri e2)]
∀e1, e2 [e1 ∈ P ∧ e2 ∈ P → (e1 ≤pri e2 ↔ e1 ≤nonpri e2)]
That is, a proposition asserted by a believe action takes priority over any other proposition. When either both or neither of the propositions being compared have been asserted by the believe action, then we use the same ordering as we would for nonprioritized revision.
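A direct way to realize ≤pri on top of a given nonprioritized ordering is sketched below in Python; this is illustrative, and `leq_nonpri` and the believed set P are assumed inputs rather than part of SNeBR's interface.

def make_leq_pri(leq_nonpri, prioritized):
    """Build the prioritized entrenchment comparison from a nonprioritized one.
    `prioritized` is the set P of beliefs asserted by a believe action."""
    def leq_pri(e1, e2):
        in1, in2 = e1 in prioritized, e2 in prioritized
        if in1 and not in2:
            return False           # e1 is strictly more entrenched than e2
        if in2 and not in1:
            return True            # e2 takes priority, so e1 <= e2 holds
        return leq_nonpri(e1, e2)  # both or neither prioritized: fall back
    return leq_pri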
Using a well preorder
Let t≤ be the output of a function f whose input is a total preorder ≤, such that t≤ ⊆ ≤. The idea is that f creates the well preorder t≤ from ≤ by removing some pairs from the total preorder ≤. Note that in the case where ≤ is already a well preorder, t≤ = ≤. Then we may use Algorithm 1 to solve the problem.
Algorithm 1 Algorithm to compute T given a well preorder
Input: Σ, t≤
Output: T
 1: T ⇐ ∅
 2: for all (σ ∈ Σ) do
 3:   Move the minimally entrenched belief in σ to the first position in σ, using t≤ as a comparator
 4: end for
 5: Sort the elements of Σ into descending order of the values of the first element in each σ, using t≤ as a comparator
 6: AddLoop:
 7: while (Σ ≠ ∅) do
 8:   currentCulprit ⇐ σ1,1 (the first element of the first no-good in Σ)
 9:   T ⇐ T ∪ {currentCulprit}
10:   DeleteLoop:
11:   for all (σcurrent ∈ Σ) do
12:     if (currentCulprit ∈ σcurrent) then
13:       Σ ⇐ Σ \ σcurrent
14:     end if
15:   end for
16: end while
17: return T
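A minimal Python rendering of Algorithm 1 is given below. It is a sketch, not the SNeBR code: it assumes the well preorder is supplied as a numeric key `rank` with distinct values (smaller rank meaning less entrenched), and that the no-goods are ordinary sets.

def revise_well_preorder(no_goods, rank):
    """Algorithm 1: `no_goods` plays the role of Sigma; `rank(p)` is an assumed
    numeric key realizing the well preorder."""
    # Lines 2-4: put the minimally entrenched belief of each no-good first.
    sigma = [sorted(s, key=rank) for s in no_goods]
    # Line 5: consider no-goods whose minimal element is most entrenched first.
    sigma.sort(key=lambda s: rank(s[0]), reverse=True)
    culprits = set()
    while sigma:                               # AddLoop
        culprit = sigma[0][0]
        culprits.add(culprit)
        # DeleteLoop: drop every no-good already hit by the chosen culprit.
        sigma = [s for s in sigma if culprit not in s]
    return culprits

# Hypothetical example: lower rank means less entrenched.
ranks = {"h1": 0, "h2": 1, "h3": 2}
print(revise_well_preorder([{"h1", "h2"}, {"h2", "h3"}], ranks.get))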
Using a total preorder
Unfortunately it is easy to conceive of a situation in which
the supplied entrenchment ordering is a total preorder, but
not a well preorder. For instance, let us say that, when
reasoning about a changing world, propositional fluents
(propositions that are only true of a specific time or situation)
are abandoned over non-fluent propositions. It is not clear
then how we should rank two distinct propositional fluents,
nor how to rank two distinct non-fluent propositions. If
we can arbitrarily specify a well preorder that is a subset
of the total preorder we are given, then Algorithm 1 will be suitable. Otherwise, we can simulate a well preorder t≤
through an iterative construction by querying the user for
the unique minimally-entrenched proposition of a particular
set of propositions at appropriate times in the belief-revision
process. Algorithm 2 accomplishes just this.
Algorithm 2 Algorithm to compute T given a total preorder
Input: Σ, ≤
Output: T
 1: T ⇐ ∅
 2: MainLoop:
 3: loop
 4:   ListLoop:
 5:   for all (σi ∈ Σ, 1 ≤ i ≤ |Σ|) do
 6:     Make a list lσi of all minimally-entrenched propositions, i.e. propositions that are not strictly more entrenched than any other, among those in σi, using ≤ as a comparator.
 7:   end for
 8:   RemoveLoop:
 9:   for all (σi ∈ Σ, 1 ≤ i ≤ |Σ|) do
10:     if (according to lσi, σi has exactly one minimally-entrenched proposition p AND the other propositions in σi are not minimally-entrenched in any other no-good via an lσj (1 ≤ j ≤ |Σ|, i ≠ j)) then
11:       T ⇐ T ∪ {p}
12:       for all (σcurrent ∈ Σ) do
13:         if (p ∈ σcurrent) then
14:           Σ ⇐ Σ \ σcurrent
15:         end if
16:       end for
17:       if (Σ = ∅) then
18:         return T
19:       end if
20:     end if
21:   end for
22:   ModifyLoop:
23:   for all (σ ∈ Σ) do
24:     if (σ has multiple minimally-entrenched propositions) then
25:       query which proposition l of the minimally-entrenched propositions is least desired.
26:       Modify ≤ so that l is strictly less entrenched than those other propositions.
27:       break out of ModifyLoop
28:     end if
29:   end for
30: end loop
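A compact Python sketch in the spirit of Algorithm 2 follows. It is illustrative rather than the SNeBR code, and it collapses the RemoveLoop/ModifyLoop interplay slightly: the user is queried only when no automatic removal is possible in a pass. The total preorder `leq` and the query callback `ask_user` are assumed inputs.

def revise_total_preorder(no_goods, leq, ask_user):
    """`leq(a, b)` is the assumed total preorder on hypotheses; `ask_user(props)` must
    return the member of `props` that the user is most willing to give up."""
    sigma = [set(s) for s in no_goods]
    forced = set()   # pairs (a, b) recorded by the tie-breaking step: a is strictly below b

    def less_eq(a, b):
        return False if (b, a) in forced else ((a, b) in forced or leq(a, b))

    def minima(s):
        # ListLoop: members of s that no other member of s is strictly less entrenched than.
        return {p for p in s
                if not any(less_eq(q, p) and not less_eq(p, q) for q in s)}

    culprits = set()
    while sigma:
        mins = [minima(s) for s in sigma]
        progressed = False
        for i, s in enumerate(sigma):                      # RemoveLoop
            if len(mins[i]) == 1:
                (p,) = mins[i]
                if all(q == p or all(q not in mins[j] for j in range(len(sigma)) if j != i)
                       for q in s):
                    culprits.add(p)
                    sigma = [t for t in sigma if p not in t]
                    progressed = True
                    break
        if not progressed:                                  # ModifyLoop: break one tie
            for i, s in enumerate(sigma):
                if len(mins[i]) > 1:
                    least = ask_user(mins[i])
                    forced.update((least, other) for other in mins[i] if other != least)
                    break
    return culprits

The `ask_user` callback corresponds roughly to the entrenchment-tie query the text describes for br-tie-mode manual.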
Characterization
These algorithms perform an operation similar to incision
functions [Hansson, 1994], since they select one or more
propositions to be removed from each minimally-inconsistent
set. Their output seems analogous to σ(Φ ⊥⊥ (p ∧ ¬p)), where σ is the incision function, ⊥⊥ is the kernel-set operator from [Hansson, 1994], and p is a proposition. But we are actually incising Σ, the set of known no-goods. The known no-goods are of course a subset of all no-goods, i.e. Σ ⊆ Φ ⊥⊥ (p ∧ ¬p).
This happens because SNeBR resolves contradictions as soon
as they are discovered, rather than performing inference first
to discover all possible sources of contradictions.
The type of contraction eventually performed is similar to
safe contraction [Alchourron and Makinson, 1985], except
that there are fewer restrictions on our epistemic ordering.
3 Analysis of Algorithm 1
3.1 Proofs of Satisfaction of Requirements by
Algorithm 1
We show that Algorithm 1 satisfies the requirements established in section 2:
(EESNePS 1) (Sufficiency)
During each iteration of AddLoop an element τ is added to T from some σ ∈ Σ. Then each set σ ∈ Σ containing τ is removed from Σ. The process is repeated until Σ is empty. Therefore each removed set σ in Σ contains some τ in T (note that each σ will be removed from Σ by the end of the process). So ∀σ[σ ∈ Σ → ∃τ[τ ∈ (T ∩ σ)]]. Q.E.D.
(EESNePS 2) (Minimal Entrenchment)
From lines 8-9, we see that T is comprised solely of first elements of sets in Σ. And from lines 2-4, we see that those first elements are all minimal under t≤ relative to the other
elements in each set. Since ∀e1, e2 [e1 t≤ e2 → e1 ≤ e2], those first elements are minimal under ≤ as well. That is, ∀τ[τ ∈ T → ∃σ[σ ∈ Σ ∧ τ ∈ σ ∧ ∀w[w ∈ σ → τ ≤ w]]]. Q.E.D.
(EESNePS 3) (Information Preservation)
From the previous proof we see that during each iteration of AddLoop, we are guaranteed that at least one set σ containing the current culprit is removed from Σ. And we know that the current culprit for that iteration is minimally-entrenched in σ. We also know from (EESNePS 2) that each subsequently chosen culprit will be minimally entrenched in some set. From lines 2-5 and AddLoop, we know that subsequently chosen culprits will be less entrenched than the current culprit. From lines 2-5, we also see that all the other elements in σ have higher entrenchment than the current culprit. Therefore subsequent culprits cannot be elements in σ. So, they cannot be used to eliminate σ. Obviously, previous culprits were also not members of σ. Therefore, if we exclude the current culprit from T, then there will be a set in Σ that does not contain any element of T. That is,
∀T′[T′ ⊂ T → ∃σ[σ ∈ Σ ∧ ¬∃τ[τ ∈ (T′ ∩ σ)]]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬¬(σ ∈ Σ ∧ ¬∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬(¬(σ ∈ Σ) ∨ ∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ∃σ[¬(σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)])]]
∴ ∀T′[T′ ⊂ T → ¬∀σ[σ ∈ Σ → ∃τ[τ ∈ (T′ ∩ σ)]]] Q.E.D.
4 Analysis of Algorithm 2
4.1 Proofs of Satisfaction of Requirements by
Algorithm 2
We show that Algorithm 2 satisfies the requirements established in section 2:
(EESNePS 1) (Sufficiency)
Since every set of propositions must contain at least one
proposition that is minimally entrenched, at least one proposition is added to the list in each iteration of ListLoop. In the
worst case, assume that for each iteration of MainLoop, only
either RemoveLoop or ModifyLoop does any work. We know that at least this much work is done for the following reasons: if ModifyLoop cannot operate on any no-good during an iteration of MainLoop, then all no-goods have only one minimally-entrenched proposition. So either RemoveLoop's condition at line 10 would hold, or:
1. A no-good has multiple minimally-entrenched propositions, causing ModifyLoop to do work. This contradicts our assumption that ModifyLoop could not do any work during this iteration of MainLoop, so we set this possibility aside.
2. Some proposition p1 is a non-minimally-entrenched proposition in some no-good σn, and a minimally-entrenched one in another no-good σm. In this case, either p1 is removed during the iteration of RemoveLoop where σm is considered, or there is another proposition p2 in σm that is not minimally-entrenched in σm, but is in σm1. This chaining must eventually terminate at a no-good σm_final since ≤ is transitive. And the final proposition in the chain p_final must be the sole minimally-entrenched proposition in σm_final, since otherwise ModifyLoop would have been able to do work for this iteration of MainLoop, which is a contradiction. ModifyLoop can only do work once for each no-good, so eventually its work is finished. If ModifyLoop has no more work left to do, then RemoveLoop must do work at least once for each iteration of MainLoop. And in doing so, it will create a list of culprits of which each no-good contains at least one. Q.E.D.
Decidability
We see that DeleteLoop is executed once for each element in
Σ, which is a finite set. So it always terminates. We see that
AddLoop terminates when Σ is empty. And from lines 8 and
13 we see that at least one set is removed from Σ during each
iteration of AddLoop. So AddLoop always terminates. Lines
2-4 involve finding a minimum element, which is a decision
procedure. Line 5 performs sorting, which is also a decision
procedure. Since every portion of Algorithm 1 always terminates, it is a decision procedure. Q.E.D.
Supplementary Requirement
Algorithm 1 is a fully-automated procedure that makes no
queries of the user. Q.E.D.
3.2 Complexity of Algorithm 1
Space Complexity
Algorithm 1 can be run completely in-place, i.e. it can use
only the memory allocated to the input, with the exception of
the production of the set of culprits T . Let us assume that the
space needed to store a single proposition is O(1) memory units. Since we only need to remove one proposition from each no-good to restore consistency, Algorithm 1 uses O(|Σ|) memory units.
Time Complexity
The analysis for time complexity is based on a sequential-processing system. Let us assume that we implement lists as array structures. Let us assume that we may determine the size of an array in O(1) time. Let us also assume that performing a comparison using t≤ takes O(1) time. Then in lines 2-4, for each array σ ∈ Σ we find the minimum element of σ and perform a swap on two elements at most once for each element in σ. If we let smax be the cardinality of the largest σ in Σ, then lines 2-4 will take O(|Σ| · smax) time. In line 5, we sort the no-goods' positions in Σ using their first elements as keys. This takes O(|Σ| · log(|Σ|)) time. Lines 7-16 iterate through the elements of Σ at most once for each element in Σ. During each such iteration, a search is performed for an element within a no-good. Also, during each iteration through all the no-goods, at least one σ is removed, though this does not help asymptotically. Since the no-goods are not sorted, the search takes linear time in smax. So lines 7-16 take O(|Σ|² · smax) time. Therefore, the running time is O(|Σ|² · smax).
Note that the situation changes slightly if we sort the no-goods instead of just placing the minimally-entrenched proposition at the front, as in lines 2-4. In this case, each search through a no-good will take O(log(smax)) time, yielding a new total time of O(|Σ| · smax · log(smax) + |Σ|² · log(smax)).
(EESNePS 2) (Minimal Entrenchment)
Since propositions are only added to T when the condition in line 10 is satisfied, it is guaranteed that every proposition in T is a minimally-entrenched proposition in some no-good σ.
(EESNePS 3) (Information Preservation)
From line 10, we see that when a proposition p is removed, none of the other propositions in its no-good are minimally-entrenched in any other no-good. That means none of the other propositions could be a candidate for removal. So, the only way to remove the no-good in which p appears is by removing p. So if p were not removed, then (EESNePS 1) would not be satisfied. Q.E.D.
5 Annotated Demonstrations
A significant feature of our work is that it generalizes previously published work on belief revision in SNePS [Johnson
and Shapiro, 1999; Shapiro and Johnson, 2000; Shapiro and
Kandefer, 2005]. The following demonstrations showcase the
new features we have introduced to SNeBR, and capture the
essence of belief revision as seen in the papers mentioned
above by using well-specified epistemic ordering functions.
The demos have been edited for formatting and clarity.
The commands br-tie-mode auto and br-tie-mode manual
indicate that Algorithm 1 and Algorithm 2 should be used respectively. A wff is a well-formed formula. A wff followed
by a period (.) indicates that the wff should be asserted, i.e.
added to the knowledge base. A wff followed by an exclamation point (!) indicates that the wff should be asserted, and
that forward inference should be performed on it.
Decidability
ListLoop creates lists of minimal elements of lists. This
is a decision procedure since the comparator is a total preorder. From the proof of (EESNePS 1) above, we see that either RemoveLoop or ModifyLoop must do work for each iteration of MainLoop. ModifyLoop cannot operate more than once on the same no-good, because there are no longer multiple minimally-entrenched propositions in the no-good after it does its work. Nor can RemoveLoop operate twice on the same no-good, since the no-good is removed when ModifyLoop does work. So, eventually ModifyLoop has no more work left to do, and at that point RemoveLoop will remove at least one no-good for each iteration of MainLoop. By lines 17-18, when the last no-good is removed, the procedure terminates. So it always terminates. Q.E.D.
Says Who?
We present a demonstration on how the source-credibility-based revision behavior from [Shapiro and Johnson, 2000] is
generalized by our changes to SNeBR. The knowledge base
in the demo is taken from [Johnson and Shapiro, 1999]. In the
following example, the command set-order source sets the
epistemic ordering used by SNeBR to be a lisp function that
compares two propositions based on the relative credibility of
their sources. Unsourced propositions are assumed to have
maximal credibility. The sources, as well as their relative
credibility are represented as meta-knowledge in the SNePS
knowledge base. This was also done in [Johnson and Shapiro,
1999] and [Shapiro and Johnson, 2000]. The source function
makes SNePSLOG queries to determine sources of propositions and credibility of sources, using the askwh and ask
commands [Shapiro and The SNePS Implementation Group,
2010]. This allows it to perform inference in making determinations about sources.
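For illustration, an ordering of this kind can be phrased as an ordinary comparison function. The Python sketch below is hypothetical and stands in for the Lisp source function; the source and credibility lookups are assumed to be answered by queries against the knowledge base, and unsourced propositions are treated as maximally credible.

def make_source_order(source_of, is_better_source):
    """`source_of(p)` gives p's source or None; `is_better_source(a, b)` says a is a
    strictly more credible source than b."""
    def strictly_better(s1, s2):
        if s1 is None and s2 is not None:
            return True              # unsourced beats any sourced proposition
        if s1 is None or s2 is None:
            return False
        return is_better_source(s1, s2)

    def leq(p1, p2):
        # p1 is at most as entrenched as p2 unless p1's source is strictly better.
        return not strictly_better(source_of(p1), source_of(p2))
    return leq

# Hypothetical example mirroring the demo's credibility chain (transitive closure given directly).
better = {("holybook", "prof"), ("prof", "nerd"), ("nerd", "sexist"), ("fran", "nerd"),
          ("holybook", "nerd"), ("holybook", "sexist"), ("prof", "sexist"), ("fran", "sexist")}
leq = make_source_order({"wff10": "nerd", "wff12": "sexist"}.get,
                        lambda a, b: (a, b) in better)
print(leq("wff12", "wff10"))  # True: the sexist is a less credible source than the nerd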
Here we see that the nerd and the sexist make the generalizations that all jocks are not smart and all females are not
smart respectively, while the holy book and the professor state
that all old people are smart, and all grad students are smart
respectively. Since Fran is an old female jock graduate student, there are two sources that would claim she is smart, and
two that would claim she is not, which is a contradiction.
Supplementary Requirement
RemoveLoop attempts to compute T each time it is run
from MainLoop. If the procedure does not terminate within
RemoveLoop, then we run Modi f yLoop on at most one nogood. Afterwards, we run RemoveLoop again. Since the user
is only queried when the procedure cannot automatically determine any propositions to remove, we argue that this means
minimal queries are made of the user. Q.E.D.
4.2 Complexity of Algorithm 2
Space Complexity
As before, let smax be the cardinality of the largest no-good
in Σ. In the worst case all propositions are minimally entrenched, so ListLoop will recreate Σ. So ListLoop will use
O(|Σ| · smax) space. RemoveLoop creates a culprit list, which we stated before takes O(|Σ|) space. ModifyLoop may be implemented in a variety of ways. We will assume that it creates a list of pairs, of which the first and second elements range over propositions in the no-goods. In this case ModifyLoop uses O(|Σ|² · smax²) space. So the total space requirement is O(|Σ|² · smax²) memory units.
Time Complexity
The analysis for time complexity is based on a sequential-processing system. For each no-good σ, in the worst case, ListLoop will have to compare each proposition in σ against every other. So, for each iteration of MainLoop, ListLoop takes O(|Σ| · smax²) time. There are at most O(smax) elements in each list created by ListLoop. So, checking the condition in line 10 takes O(|Σ| · smax²) time. Lines 12-16 can be executed in O(|Σ| · smax) time. Therefore, RemoveLoop takes O(|Σ| · smax²) time. We assume that all the work in lines 24-27 can be done in constant time. So, ModifyLoop takes O(|Σ|) time. We noted earlier that during each iteration of MainLoop, RemoveLoop or ModifyLoop will do work. In the worst case, only one will do work each time. And they each may do work at most |Σ| times. So the total running time for the procedure is O(|Σ|² · smax²).
;;; Show origin sets
: expert
: br-mode auto
Automatic belief revision will now be automatically selected.
: br-tie-mode manual
The user will be consulted when an entrenchment tie occurs.
;;; Use source credibilities as epistemic ordering criteria.
set-order source
;;; The holy book is a better source than the professor.
IsBetterSource(holybook, prof).
;;; The professor is a better source than the nerd.
IsBetterSource(prof, nerd).
;;; The nerd is a better source than the sexist.
IsBetterSource(nerd, sexist).
;;; Fran is a better source than the nerd.
IsBetterSource(fran, nerd).
;;; Better-Source is a transitive relation
all(x,y,z)({IsBetterSource(x,y), IsBetterSource(y,z)} &=> IsBetterSource(x,z))!
;;; All jocks are not smart.
all(x)(jock(x)=>~smart(x)). ; wff10
;;; The source of the statement 'All jocks are not smart' is the nerd.
HasSource(wff10, nerd).
;;; All females are not smart.
all(x)(female(x)=>~smart(x)). ; wff12
;;; The source of the statement 'All females are not smart' is the sexist.
HasSource(wff12, sexist).
;;; All graduate students are smart.
all(x)(grad(x)=>smart(x)). ; wff14
;;; The source of the statement 'All graduate students are smart' is the professor.
HasSource(wff14, prof).
;;; All old people are smart.
all(x)(old(x)=>smart(x)). ; wff16
;;; The source of the statement 'All old people are smart' is the holy book.
HasSource(wff16, holybook).
;;; The source of the statement 'Fran is an old female jock who is a graduate student' is fran.
HasSource(and{jock(fran), grad(fran), female(fran), old(fran)}, fran).
;;; The KB thus far
list-asserted-wffs
wff23 ! : HasSource ( old ( fran ) and female ( fran ) and grad ( fran ) and
jock ( fran ) , fran ) {<hyp,{wff23}>}
wff17 ! : HasSource ( a l l (x) ( old (x) => smart (x) ) , holybook ){<hyp,{wff17}>}
wff16 ! : a l l (x) ( old (x) => smart (x) ) {<hyp,{wff16}>}
wff15 ! : HasSource ( a l l (x) ( grad (x) => smart (x) ) , prof ) {<hyp,{wff15}>}
wff14 ! : a l l (x) ( grad (x) => smart (x) ) {<hyp,{wff14}>}
wff13 ! : HasSource ( a l l (x) ( female (x) => ( ˜ smart (x) ) ) , s e x i s t )
{<hyp,{wff13}>}
wff12 ! : a l l (x) ( female (x) => ( ˜ smart (x) ) ) {<hyp,{wff12}>}
wff11 ! : HasSource ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) , nerd ) <hyp,{wff11}>}
wff10 ! : a l l (x) ( jock (x) => ( ˜ smart (x) ) ) {<hyp,{wff10}>}
wff9 ! : IsBetterSource ( fran , s e x i s t ) {<der ,{wff3 , wff4 , wff5}>}
wff8 ! : IsBetterSource ( prof , s e x i s t ) {<der ,{wff2 , wff3 , wff5}>}
wff7 ! : IsBetterSource ( holybook , s e x i s t ) {<der ,{wff1 , wff2 , wff3 , wff5}>}
wff6 ! : IsBetterSource ( holybook , nerd ) {<der ,{wff1 , wff2 , wff5}>}
wff5 ! : a l l ( z , y , x) ({IsBetterSource (y , z ) , IsBetterSource (x , y)} &=>
{IsBetterSource (x , z ) }) {<hyp,{wff5}>}
wff4 ! : IsBetterSource ( fran , nerd ) {<hyp,{wff4}>}
wff3 ! : IsBetterSource ( nerd , s e x i s t ) {<hyp,{wff3}>}
wff2 ! : IsBetterSource ( prof , nerd ) {<hyp,{wff2}>}
wff1 ! : IsBetterSource ( holybook , prof ) {<hyp,{wff1}>}
;;; Fran is an old female jock who is a graduate student (asserted with forward inference).
and{jock(fran), grad(fran), female(fran), old(fran)}!
wff50 ! : ˜ ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) )
{<ext ,{wff16 , wff22}>,<ext ,{wff14 , wff22}>}
wff24 ! : smart ( fran ) {<der ,{wff16 , wff22}>,<der ,{wff14 , wff22}>}
;;; The resulting knowledge base (HasSource and IsBetterSource omitted for clarity)
list-asserted-wffs
wff50 ! : ˜ ( a l l (x) ( jock (x) => ( ˜ smart (x) ) ) )
{<ext ,{wff16 , wff22}>, <ext ,{wff14 , wff22}>}
wff37 ! : ˜ ( a l l (x) ( female (x) => ( ˜ smart (x) ) ) ) {<ext ,{wff16 , wff22}>}
wff24 ! : smart ( fran ) {<der ,{wff16 , wff22}>,<der ,{wff14 , wff22}>}
wff22 ! : old ( fran ) and female ( fran ) and grad ( fran ) and jock ( fran )
{<hyp,{wff22}>}
wff21 ! : old ( fran ) {<der ,{wff22}>}
wff20 ! : female ( fran ) {<der ,{wff22}>}
wff19 ! : grad ( fran ) {<der ,{wff22}>}
wff18 ! : jock ( fran ) {<der ,{wff22}>}
wff16 ! : a l l (x) ( old (x) => smart (x) ) {<hyp,{wff16}>}
wff14 ! : a l l (x) ( grad (x) => smart (x) ) {<hyp,{wff14}>}
Wumpus World
We present a demonstration on how the state-constraint-based revision behavior from [Shapiro and Kandefer, 2005] is generalized by our changes to SNeBR. The command set-order fluent says that propositional fluents are strictly less entrenched than non-fluent propositions. The fluent order was
created specifically to replace the original belief revision behavior of the SNeRE believe act. In the version of SNeBR
used in [Shapiro and Kandefer, 2005], propositions of the
form andor(⟨0|1⟩, 1)(p1, p2, . . .) were assumed to be state constraints, while the inner propositions, p1, p2, etc., were assumed to be fluents. The fluents were less entrenched than
the state constraints. We see that the ordering was heavily
syntax-dependent.
In our new version, the determination of which propositions are fluents is made by checking for membership of the
predicate symbol of an atomic proposition in a list called
*fluents*, which is defined by the user to include the
predicate symbols of all propositional fluents. So the entrenchment ordering defined here uses metaknowledge about
the knowledge base that is not represented in the SNePS
knowledge base. The command br-tie-mode manual indicates that Algorithm 2 should be used. Note that the xor connective [Shapiro, 2010] used below replaces instances of andor(1,1)(. . . ) from [Shapiro and Kandefer, 2005]. The command perform believe(wff) is identical to the command wff!, except that the former causes wff to be strictly
more entrenched than every other proposition during belief
revision. That is, wff is guaranteed to be safe (unless wff
is itself a contradiction). So we would be using prioritized
belief revision.
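The fluent-based ordering can be sketched in a few lines of Python. This is purely illustrative; the *fluents* list and the extraction of a proposition's predicate symbol are assumptions standing in for the SNePS internals.

FLUENT_PREDICATES = {"Facing"}   # stand-in for the user-defined *fluents* list

def predicate_symbol(proposition):
    # Assumed helper: 'Facing(west)' -> 'Facing'. Real SNePS propositions are not strings.
    return proposition.split("(", 1)[0].lstrip("~")

def fluent_leq(p1, p2):
    """p1 is at most as entrenched as p2 unless p1 is a non-fluent and p2 is a fluent."""
    fluent1 = predicate_symbol(p1) in FLUENT_PREDICATES
    fluent2 = predicate_symbol(p2) in FLUENT_PREDICATES
    return fluent1 or not fluent2

print(fluent_leq("Facing(west)", "xor{Facing(north), Facing(south)}"))  # True: fluents go first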
;;; Show origin sets
: expert
;;; Always use automatic belief revision
: br-mode auto
Automatic belief revision will now be automatically selected.
;;; Use Algorithm 2
: br-tie-mode manual
The user will be consulted when an entrenchment tie occurs.
;;; Use an entrenchment ordering that favors non-fluents over fluents
set-order fluent
;;; Establish what kinds of propositions are fluents; specifically, that the agent is facing some direction is a fact that may change over time.
^(setf *fluents* '(Facing))
;;; The agent is Facing west
Facing(west).
;;; At any given time, the agent is facing either north, south, east, or west (asserted with forward inference).
xor{Facing(north), Facing(south), Facing(east), Facing(west)}!
;;; The knowledge base as it stands
list-asserted-wffs
wff8 ! : ˜ Facing ( north ) {<der ,{wff1 , wff5}>}
wff7 ! : ˜ Facing ( south ) {<der ,{wff1 , wff5}>}
wff6 ! : ˜ Facing ( east ) {<der ,{wff1 , wff5}>}
wff5 ! : xor{Facing ( east ) , Facing ( south ) , Facing ( north ) , Facing ( west )}
{<hyp,{wff5}>}
wff1 ! : Facing ( west ) {<hyp,{wff1}>}
;;; Tell the agent to believe it is now facing east.
perform believe(Facing(east))
;;; The resulting knowledge base
list-asserted-wffs
wff10 ! : ˜ Facing ( west ) {<ext ,{wff4 , wff5}>}
wff8 ! : ˜ Facing ( north ) {<der ,{wff1 , wff5}>,<der ,{wff4 , wff5}>}
wff7 ! : ˜ Facing ( south ) {<der ,{wff1 , wff5}>,<der ,{wff4 , wff5}>}
We see that the statements that all jocks are not smart and
that all females are not smart are no longer asserted at the
end. These statements supported the statement that Fran is
not smart. The statements that all old people are smart and
that all grad students are smart supported the statement that
Fran is smart. The contradiction was resolved by contracting
“Fran is not smart,” since the sources for its supports were less credible than the sources for “Fran is smart.”
[Gärdenfors, 1988] P. Gärdenfors. Knowledge in Flux: Modeling the Dynamics of Epistemic States. The MIT Press,
Cambridge, Massachusetts, 1988.
[Gärdenfors, 1989] P. Gärdenfors. The dynamics of belief
systems: Foundations vs. coherence. Revue Internationale
de Philosophie, 1989.
[Hansson and Olsson, 1999] S. O. Hansson and E. J. Olsson.
Providing foundations for coherentism. Erkenntnis, 51(2–
3):243–265, 1999.
[Hansson, 1994] S. O. Hansson. Kernel contraction. The
Journal of Symbolic Logic, 59(3):845–859, 1994.
[Hansson, 1997] S. O. Hansson. Semi-revision. Journal of
Applied Non-Classical Logics, 7(2):151–175, 1997.
[Hansson, 1999] S. O. Hansson. A survey of non-prioritized
belief revision. Erkenntnis, 50:413–427, 1999.
[Johnson and Shapiro, 1999] F. L. Johnson and S. C.
Shapiro. Says Who? - Incorporating Source Credibility
Issues into Belief Revision. Technical Report 99-08, Department of Computer Science and Engineering, SUNY
Buffalo, Buffalo, NY, 1999.
[Lakemeyer, 1991] Lakemeyer. On the relation between explicit and implicit beliefs. In Proc. KR-1991, pages 368–
375. Morgan Kaufmann, 1991.
[Martins and Shapiro, 1988] J. P. Martins and S. C. Shapiro.
A model for belief revision.
Artificial Intelligence,
35(1):25–79, 1988.
[Shapiro and Johnson, 2000] S. C. Shapiro and F. L. Johnson. Automatic belief revision in SNePS. In C. Baral and
M. Truszczynski, editors, Proc. NMR-2000, 2000. unpaginated, 5 pages.
[Shapiro and Kandefer, 2005] S. C. Shapiro and M. Kandefer. A SNePS Approach to the Wumpus World Agent or
Cassie Meets the Wumpus. In L. Morgenstern and M. Pagnucco, editors, NRAC-2005, pages 96–103, 2005.
[Shapiro and The SNePS Implementation Group, 2010]
Stuart C. Shapiro and The SNePS Implementation Group.
SNePS 2.7.1 USER’S MANUAL. Department of Computer
Science and Engineering, SUNY Buffalo, December
2010.
[Shapiro, 1992] Stuart C. Shapiro. Relevance logic in computer science. Section 83 of A. R. Anderson and N. D.
Belnap, Jr. and J. M. Dunn et al. Entailment, Volume II,
pages 553–563. Princeton University Press, Princeton, NJ,
1992.
[Shapiro, 2010] S. C. Shapiro. Set-oriented logical connectives: Syntax and semantics. In F. Lin, U. Sattler, and
M. Truszczynski, editors, KR-2010, pages 593–595. AAAI
Press, 2010.
[Williams, 1994] M.-A. Williams. On the logic of theory
base change. In C. MacNish, D. Pearce, and L. Pereira, editors, Logics in Artificial Intelligence, volume 838 of Lecture Notes in Computer Science, pages 86–105. Springer
Berlin / Heidelberg, 1994.
wff5 ! : xor{Facing ( east ) , Facing ( south ) , Facing ( north ) , Facing ( west )} {<
hyp,{wff5}>}
wff4 ! : Facing ( east ) {<hyp,{wff4}>}
There are three propositions in the no-good when revision is performed: Facing(west), Facing(east), and xor(1,1){Facing(...)}. Facing(east) is not considered for removal since it was prioritized by the believe action. The state-constraint xor(1,1){Facing(...)} remains in the knowledge base at the end, because it is more
entrenched than Facing(west), a propositional fluent,
which is ultimately removed.
6 Conclusions
Our modified version of SNeBR provides decision procedures for belief revision in SNePS. By providing a single resulting knowledge base, these procedures essentially perform
maxichoice revision for SNePS. Using a well preorder, belief
revision can be performed completely automatically. Given a
total preorder, it may be necessary to consult the user in order to simulate a well preorder. The simulated well preorder
need only be partially specified; it is only necessary to query
the user when multiple beliefs are minimally-epistemically-entrenched within a no-good, and even then only in the case where no other belief in the no-good is already being removed. In any event, the epistemic ordering itself is user-supplied. Our algorithm for revision given a well preorder uses asymptotically less time and space than the other algorithm, which uses a total preorder. Our work generalizes previous belief revision techniques employed in SNePS.
Acknowledgments
We would like to thank Prof. William Rapaport for providing
editorial review, and Prof. Russ Miller for his advice concerning the analysis portion of this paper.
References
[Alchourron and Makinson, 1985] C.E. Alchourron and
D. Makinson. On the logic of theory change: Safe
contraction. Studia Logica, (44):405–422, 1985.
[Alchourron et al., 1985] C. E. Alchourron, P. Gärdenfors,
and D. Makinson. On the logic of theory change: Partial meet contraction and revision functions. Journal of
Symbolic Logic, 20:510–530, 1985.
[de Kleer, 1986] J. de Kleer. An Assumption-Based TMS.
Artificial Intelligence, 28:127–162, 1986.
[Gärdenfors and Rott, 1995] P. Gärdenfors and H. Rott. Belief revision. In Gabbay, Hogger, and Robinson, editors,
Epistemic and Temporal Reasoning, volume 4 of Handbook of Logic in Artificial Intelligence and Logic Programming, pages 35–131. Clarendon Press, Oxford, 1995.
[Gärdenfors, 1982] P. Gärdenfors. Rules for rational changes
of belief. In T. Pauli, editor, Philosophical Essays Dedicated to Lennart Åqvist on His Fiftieth Birthday, number 34 in Philosophical Studies, pages 88–101, Uppsala,
Sweden, 1982. The Philosophical Society and the Department of Philosophy, University at Uppsala.
Decision-Theoretic Planning for Golog Programs with Action Abstraction
Daniel Beck and Gerhard Lakemeyer
Knowledge Based Systems Group
RWTH Aachen University, Aachen, Germany
{dbeck,gerhard}@cs.rwth-aachen.de
Abstract
DTGolog combines the ability to specify an MDP
in a first-order language with the possibility to restrict the search for an optimal policy by means
of programs. In particular, it employs decision-theoretic planning to resolve the nondeterminism
in the programs in an optimal fashion (wrt an underlying optimization theory). One of the nondeterministic constructs DTGolog offers is the nondeterministic choice of action arguments. The possible choices, though, are restricted to a finite, predefined list. We present an extension to DTGolog
that overcomes this restriction but still retains the
optimality property of DTGolog. That is, in our extended version of DTGolog we can formulate programs that allow for an unrestricted choice of action arguments even in domains where there are infinitely many possible choices. The key to this is
that we compute the optimal execution strategy for
a program on the basis of abstract value functions.
We present experiments which show that these extensions may lead to a speed-up in the computation
time in comparison to the original DTGolog.
1 Introduction
Markov decision processes (MDPs) [Puterman, 1994] have
proved to be a conceptually adequate model for decision-theoretic planning. Their solution, though, is often intractable. DTGolog [Boutilier et al., 2000], a decision-theoretic extension of the high-level agent programming language Golog [Levesque et al., 1997], tackles this problem by
constraining the search space with a Golog program. In particular, only the policies which comply with the program are
considered during the search. The agent programming language Golog is based on the situation calculus, has a clearly
defined semantics and offers programming constructs known
from other programming languages (e.g., conditionals, nondeterministic choice, etc.). Thus, DTGolog programs can
be understood as advice to the decision-theoretic planner.
Their semantics is understood as the optimal execution of the
program.
There is one particular nondeterministic construct, namely
the nondeterministic choice of arguments, which is a little troublesome in DTGolog. Whereas the semantics of Golog allow the agent to freely choose those arguments, DTGolog needs to restrict the choice to a finite, pre-defined list. The reason is that DTGolog performs a forward search and branches over the possible continuations of the remaining program (and also over the outcomes of stochastic actions), which requires that the number of successor states in the search tree be finite. Generally, what the possible choices are and how many there are in any domain instance is unknown a priori and thus the approach of DTGolog is not directly extensible to handle an unconstrained nondeterministic choice of arguments. In [Boutilier et al., 2001] an approach was presented that allows an MDP to be solved using dynamic programming methods on a purely symbolic level. The key idea was that from the first-order description of the MDP a first-order representation of the value function can be derived. This representation of the value function allows not only abstraction over the state space but also abstraction over action instances. We show how these ideas extend in the presence of programs that constrain the search for the optimal policy.
Finding the optimal execution of a DTGolog program (or, more precisely, the optimal policy compatible with the program) is understood as a multi-objective optimization problem where the objectives are the expected cumulative reward and the probability of successfully executing the program. The latter refers to the probability of not ending up in a configuration in which the program cannot be executed any further. We show how symbolic representations of the functions representing these quantities can be derived. With the help of these functions we then can extend the semantics of DTGolog to programs containing an unrestricted choice of arguments. In fact, we show that for DTGolog programs the original DTGolog interpreter and our extended version compute the same policies.
We provide a short overview of the situation calculus,
Golog and DTGolog in Section 1.1. In Section 2 we introduce the case notation which we use to represent the abstract
value functions for Golog programs presented in Section 3.
Using these abstract value functions we provide the semantics of our DTGolog extension in Section 4. We discuss the
advantages and disadvantages of our extension over the original DTGolog in Section 5.
1.1 The Situation Calculus and Golog
The situation calculus is a first-order logic (with second-order
elements which are of no concern to us, here) with sorts for
situations and actions. The binary predicate symbol do(a, s)
denotes the situation resulting from executing action a in situation s; the constant S0 denotes the initial situation. Fluents
are regular function- or predicate-symbols that take a term of
sort situation as their last argument. According to Reiter’s solution of the frame problem (cf. [Reiter, 1991]) the value of a
fluent in a particular situation can be determined with the help
of so-called successor-state axioms (SSAs) of which one has
to exists for every fluent. For instance for the fluent F (~x, s):
F (~x, do(a, s)) ≡ ΦF (~x, a, s)
where, intuitively, ΦF (~x, a, s) describes the conditions which
have to hold in situation s such that in the successor situation
do(a, s), after executing the action a, the fluent F holds for
the parameters ~x.
The preconditions for actions are specified by axioms of
the form
P oss(A(~x), s) ≡ ΠA (~x, s)
where ΠA (~x, s) describes the preconditions that have to hold
before the action A(~x) can be executed.
A basic action theory (BAT) D then consists of the foundational axioms Σ constraining the form of situation terms,
the successor-state axiom Dssa , the action preconditions Dap ,
unique names assumptions for actions Duna , and a description of the initial situation DS0 .
By means of regression a regressable formula, basically a
formula where all terms of sort situation are rooted in S0 , can
be transformed into an equivalent formula which only mentions the initial situation S0 . Thereby reasoning is restricted
to reasoning about formulas in the initial situation. In particular, every occurrence of a fluent having a non-initial situation
as its last argument is replaced with the right-hand side of its
SSA. The regression operator R for a formula whose situation arguments are of the form do(a, s) is defined as follows:
R(F (~x, do(a, s))) = ΦF (~x, a, s)
R(¬φ) = ¬R(φ)
R(φ ∧ ψ) = R(φ) ∧ R(ψ)
R(∃x. φ) = ∃x. R(φ)
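As a toy illustration of this operator (a sketch with a purely syntactic formula representation and an invented successor-state axiom, not the paper's formalism), one can regress formula trees by replacing fluent atoms whose situation is of the form do(a, s) with the right-hand sides of their SSAs.

# Formulas are nested tuples: ('fluent', name, args, situation), ('not', f),
# ('and', f, g), ('or', f, g), ('exists', var, f); anything else is left untouched.
def regress(formula, ssa_rhs):
    """One regression pass: `ssa_rhs(name, args, action, situation)` is an assumed lookup
    returning the right-hand side Phi_F of the fluent's successor-state axiom."""
    if not isinstance(formula, tuple):
        return formula
    kind = formula[0]
    if kind == 'fluent':
        _, name, args, situation = formula
        if isinstance(situation, tuple) and situation[0] == 'do':
            _, action, s = situation
            return regress(ssa_rhs(name, args, action, s), ssa_rhs)
        return formula
    if kind == 'not':
        return ('not', regress(formula[1], ssa_rhs))
    if kind in ('and', 'or'):
        return (kind, regress(formula[1], ssa_rhs), regress(formula[2], ssa_rhs))
    if kind == 'exists':
        return ('exists', formula[1], regress(formula[2], ssa_rhs))
    return formula

def ssa_rhs(name, args, action, situation):
    # Toy SSA, purely illustrative: F(x, do(a, s)) holds iff a makes it true or it already held.
    return ('or', ('eq', action, 'make_' + name), ('fluent', name, args, situation))

print(regress(('fluent', 'F', ('x',), ('do', 'a1', 'S0')), ssa_rhs))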
In order to model stochastic domains, that is, domains
where the effect of performing an action is not deterministic, but different effects might occur with certain probabilities, some kind of stochastic actions are necessary. In DTGolog those stochastic actions are modelled with the help of
a number of associated, deterministic actions that describe
the possible outcomes when executing the stochastic action.
The understanding is that Nature chooses between the associated, deterministic actions when executing the stochastic action. For instance, in a Blocks World domain the (stochastic) move(b1 , b2 ) action whose outcomes are described by the
deterministic actions moveS(b1 , b2 ) and moveF (b1 , b2 ), respectively. Notationally, this is captured by
def.
prob0 (moveS(b1 , b2 ), move(b1 , b2 ), s) = p =
¬heavy(b1 ) ∧ p = 0.9 ∨ heavy(b1 ) ∧ p = 0.1
def.
prob0 (moveF (b1 , b2 ), move(b1 , b2 ), s) = p =
¬heave(b1 ) ∧ p = 0.1 ∨ heavy(b1 ) ∧ p = 0.9
which says that if the block to be moved is heavy the move action succeeds with a probability of 0.1 and fails with a probability of 0.9. Note that prob0 neglects the preconditions of
the associated, deterministic actions. Also, the probability of
Nature choosing another than one of the associated, deterministic actions has to be 0:
prob(a, α, s) = p =def choice(α, a) ∧ Poss(a, s) ∧ p = prob0(a, α, s) ∨ ¬(choice(α, a) ∧ Poss(a, s)) ∧ p = 0
It is crucial that the axiomatizer ensures that the probability
distribution is well-defined, that is, the probabilities over the
deterministic outcome actions always sum up to 1.
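A quick sanity check of this requirement can be written directly against a tabulated choice of probabilities. The Python sketch below uses an invented table rather than DTGolog code and verifies that Nature's choices for each stochastic action sum to 1 in a given situation.

def probabilities_well_defined(prob0, choices, situation, tol=1e-9):
    """`choices` maps each stochastic action to its associated deterministic actions;
    `prob0(det_action, stoch_action, situation)` is the assumed probability lookup."""
    return all(
        abs(sum(prob0(n, a, situation) for n in nature_choices) - 1.0) <= tol
        for a, nature_choices in choices.items()
    )

# Invented Blocks World numbers matching the example axioms (block b1 not heavy).
choices = {"move(b1,b2)": ["moveS(b1,b2)", "moveF(b1,b2)"]}
prob0 = lambda n, a, s: {"moveS(b1,b2)": 0.9, "moveF(b1,b2)": 0.1}[n]
print(probabilities_well_defined(prob0, choices, "S0"))  # True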
When actually executing a program containing stochastic
actions, it is necessary to determine which of the associated,
deterministic actions has been selected by Nature during the
execution. Consequently, some kind of sensing is required.
In particular, we assume that for every stochastic action there
is a unique associated sense action (which itself is a stochastic action) and sense outcome conditions which discriminate
Nature’s choices. The intention behind this is that when the
agent actually executes a stochastic action, it can execute the
associated sense action to acquire the necessary information
from its sensors afterwards to unambiguously determine the
action chosen by Nature with the help of the sense outcome
conditions. Since we assume full observability, we can assume that the sensing is accurate and consequently the associated sense action is a noop-action in the theory.
Besides the BAT DTGolog also requires an optimization
theory in order to determine the optimal execution strategy for
a program. This theory includes axioms defining the reward
function reward(s) which assesses the current situation. For
instance:
reward(do(moveS(B1 , B2 ), s)) = 10
The kind of programs we consider are similar to regular
Golog programs with the only exception that the primitives
in the programs are not deterministic but stochastic actions.
In particular, the following program constructs are available:
δ1 ; δ2 (sequences)
ϑ? (test actions)
if ϑ then δ1 else δ2 end (conditionals)
while ϑ do δ end (loops)
(δ1 | δ2) (nondeterministic branching)
π v. (γ) (nondeterministic choice of argument)
proc P(~x) δP end (procedures, including recursion)
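For readers who prefer a concrete data structure, the constructs can be mirrored as a small abstract syntax. The Python sketch below is ours, not DTGolog's; conditions and primitive actions are kept as plain strings for brevity.

from dataclasses import dataclass

@dataclass
class Seq:          # delta1 ; delta2
    first: object
    second: object

@dataclass
class Test:         # theta?
    condition: str

@dataclass
class If:           # if theta then delta1 else delta2 end
    condition: str
    then_branch: object
    else_branch: object

@dataclass
class While:        # while theta do delta end
    condition: str
    body: object

@dataclass
class Choice:       # (delta1 | delta2), nondeterministic branching
    left: object
    right: object

@dataclass
class Pick:         # pi v. (gamma), nondeterministic choice of argument
    var: str
    body: object

@dataclass
class Proc:         # proc P(x) deltaP end
    name: str
    params: tuple
    body: object

# Example: pi b. (move(b, table) ; nil) as a nested value.
program = Pick("b", Seq("move(b, table)", "nil"))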
In DTGolog only a restricted version of the nondeterministic
choice of argument is supported, which is semantically equivalent to a nondeterministic branching over the same program
but with different arguments. In our extension of DTGolog
we support the unrestricted nondeterministic choice of arguments.
2 The Case Notation
We use a case notation similar to that introduced in [Boutilier et al., 2001]. This notation is convenient for the representation of finite case distinctions, that is, piecewise constant functions which have a finite number of different values. We write case[φ1, v1; . . . ; φn, vn] (or case[φi, vi : i ≤ n] for short) as an abbreviation for
⋁_{i=1}^{n} φi ∧ µ = vi .
The vi are numerical expressions, that is, expressions that evaluate to numbers. The variable µ is a special variable that is reserved for the use in case statements and must not be used anywhere else. In order to use case statements in formulas without explicitly referring to µ we define the following macro:
v = case[φi, vi : i ≤ n] =def case[φi, vi : i ≤ n]^µ_v
Furthermore, we slightly extend the case notation to allow the representation of a two-valued function: case[φ1, (v1, p1); . . . ; φn, (vn, pn)] is used as an abbreviation for:
⋁_{i=1}^{n} φi ∧ µ1 = vi ∧ µ2 = pi .
The vi and pi are numerical expressions and µ1 and µ2 are reserved variables. In a similar fashion as in the single-valued case we define the macro:
(v, p) = case[φi, (vi, pi) : i ≤ n] =def case[φi, (vi, pi) : i ≤ n]^{µ1}_{v} ^{µ2}_{p} .
By means of the ◦-operator two single-valued case statements can be combined to a two-valued case statement:
case[φi, vi : i ≤ n] ◦ case[ψj, v′j : j ≤ m] =def case[φi ∧ ψj, (vi, v′j) : i ≤ n, j ≤ m]
Further operators we use (for single-valued case statements) are the binary operators ⊕, ⊗, and ∪ and the unary casemax-operator for symbolic maximization (cf. [Sanner and Boutilier, 2009]):
case[φi, vi : i ≤ n] ⊗ case[ψj, v′j : j ≤ m] = case[φi ∧ ψj, vi · v′j : i ≤ n, j ≤ m]
case[φi, vi : i ≤ n] ⊕ case[ψj, v′j : j ≤ m] = case[φi ∧ ψj, vi + v′j : i ≤ n, j ≤ m]
case[φi, vi : i ≤ n] ∪ case[ψj, v′j : j ≤ m] = case[φ1, v1; . . . ; φn, vn; ψ1, v′1; . . . ; ψm, v′m]
For the casemax-operator we assume that the formulas in the input case statement are sorted in a descending order, that is, vi > vi+1. For this it is necessary that the vi are numerical constants, which is what we assume for the remainder of this paper. Then, the operator is defined as follows:
casemax case[φi, vi : i ≤ n] =def case[φi ∧ ⋀_{j<i} ¬φj, vi : i ≤ n] .
Generally, a formula v = case[φi, vi : i ≤ n] might be ambiguous wrt the value of v, i.e., the φi are not required to hold mutually exclusively. Applying the casemax-operator to case[φi, vi : i ≤ n] remedies this problem. In the resulting case statement the formulas hold mutually exclusively and, furthermore, the value of v is maximized. Given an ordering over two-valued tuples that allows the formulas in the input case statement to be pre-sorted, the casemax-operator can be applied to two-valued case statements as well.
The expressions ψ ∧ case[φi, vi : i ≤ n] and ∃x. case[φi, vi : i ≤ n] are case statements as well. Due to the disjunctive nature of the case statements the conjunction can be distributed into the disjunction and the existential quantifier can be moved into the disjunction. The resulting case statements then are case[ψ ∧ φi, vi : i ≤ n] and case[∃x. φi, vi : i ≤ n], respectively.
We assume that the reward function reward(s) and the probability distribution over the deterministic actions associated with a stochastic action are specified using case statements:
reward(s) = case[φ^rew_1(s), r1; · · · ; φ^rew_m(s), rm]
and
prob(Nj(~x), A(~x), s) = case[φ^A_{j,1}(~x, s), p1; · · · ; φ^A_{j,n}(~x, s), pn]
We denote them by rCase(s) and pCase^A_j(~x, s), respectively. Since these case statements define functions it is necessary that each of the sets {φ^rew_i(s)} and {φ^A_{j,i}(~x, s)} partitions the state space. That is, for every ~x and s a unique value can be determined. Formally, a set of formulas {ψi(~x, s)} is said to partition the state space iff ⊨ ∀~x, s. ⋁_i ψi(~x, s) and ⊨ ∀~x, s. ψi(~x, s) ⊃ ¬ψj(~x, s) for all i, j with i ≠ j.
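To make these operators concrete, here is a small Python sketch using our own representation (a case statement is a list of (formula, value) pairs with formulas kept as strings); it is not the paper's implementation, only an illustration of how ⊕, ⊗, ∪ and casemax behave.

def combine(c1, c2, op):
    """Pairwise combination underlying ⊕ and ⊗: conjoin the formulas, combine the values."""
    return [("(%s) ∧ (%s)" % (f1, f2), op(v1, v2)) for f1, v1 in c1 for f2, v2 in c2]

def oplus(c1, c2):  return combine(c1, c2, lambda a, b: a + b)
def otimes(c1, c2): return combine(c1, c2, lambda a, b: a * b)
def union(c1, c2):  return c1 + c2

def casemax(c):
    """Sort by value (descending) and make the formulas mutually exclusive, as in the
    definition above: the i-th formula is conjoined with the negations of all earlier ones."""
    ordered = sorted(c, key=lambda fv: fv[1], reverse=True)
    result, earlier = [], []
    for f, v in ordered:
        guard = f if not earlier else "(%s) ∧ %s" % (f, " ∧ ".join("¬(%s)" % g for g in earlier))
        result.append((guard, v))
        earlier.append(f)
    return result

# Toy reward-style case statement.
r = [("green_on_nongreen(s)", 10), ("True", 0)]
print(casemax(union(r, [("heavy(b1)", 5)])))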
3 Abstract Value Functions
The type of programs we consider cannot be directly executed, the nondeterminism in the program needs to be resolved first. Of course, the agent executing the program
strives for an optimal execution strategy for the program. Optimality, in this case, is defined with respect to the expected
reward accumulated during the first h steps of the program
and the probability that these first h steps can be executed successfully (i.e., the probability of not running into a situation
in which the program cannot be executed any further). Those
two quantities are measured by the value functions Vhδ (s) and
41
Phδ (s), respectively. Our intention and the key to abstracting from the actual situation is to identify regions of the state
space in which these functions are constant. The advantages of such abstract functions are twofold. First, they can be pre-computed since they are independent of the actual situation (and of the initial situation). This allows us to apply some simplification in order to lower the time necessary to evaluate the formula. Second, these abstract value functions allow us to assess the value of a program containing nondeterministic choices of action arguments without explicitly enumerating all possible (ground) choices for these arguments. Rather, the abstract value functions abstract from the actual choices by identifying properties of the action arguments that lead to a certain value for the expected reward and for the probability of successfully executing the program, respectively. For instance, if a high reward is given to situations where there is a green block on top of a non-green block and the program tells the agent to pick a block and move it onto another nondeterministically chosen block, then the value function for the expected reward distinguishes the cases where a green block is moved on top of a non-green block from all other constellations. What it does not do is refer explicitly to the green and non-green blocks in the domain instance. Thus, given these abstract value functions, the nondeterminism in the program can be resolved by settling on the choice that maximizes the abstract value functions when evaluated in the current situation.
For a program δ and a horizon h we compute case statements V_h^δ(s) and P_h^δ(s) representing the abstract value functions. As can be seen in the definition below, the computation of these case statements is independent of the situation s. V_h^δ(s) and P_h^δ(s) are defined inductively on the structure of δ. Since the definition is recursive, we first need to assume that the horizon h is finite and that the programs are nil-terminated, which can be achieved easily by sequentially combining a program with the empty program nil.
1. Zero horizon:

V_0^δ(s) =def rCase(s)   and   P_0^δ(s) =def case[true, 1]

For the remaining cases we assume h > 0.

2. The empty program nil:

V_h^nil(s) =def rCase(s)   and   P_h^nil(s) =def case[true, 1]

3. The program begins with a stochastic action A(~x) with outcomes N_1(~x), . . . , N_k(~x):

V_h^{A(~x);δ}(s) =def rCase(s) ⊕ ⊕_{j=1}^{k} [ pCase_j^A(~x, s) ⊗ R(V_{h-1}^δ(do(N_j(~x), s))) ]

That is, the expected value is determined as the sum of the immediate reward and the sum over the expected values of executing the remaining program in the possible successor situations do(N_j(~x), s), each weighted by the probability of seeing the deterministic action N_j(~x) as the outcome. Due to regression the formulas only refer to the situation s and not to any of the successor situations.

For the probability of successfully executing the program the definition is quite similar, only that the immediate reward is ignored:

P_h^{A(~x);δ}(s) =def ⊕_{j=1}^{k} [ pCase_j^A(~x, s) ⊗ R(P_{h-1}^δ(do(N_j(~x), s))) ]

4. The program begins with a test action:

V_h^{ϑ?;δ}(s) =def (ϑ[s] ∧ V_h^δ(s)) ∪ (¬ϑ[s] ∧ rCase(s))

In case the test does not hold the execution of the program has to be aborted and consequently no further rewards are obtained.

P_h^{ϑ?;δ}(s) =def (ϑ[s] ∧ P_h^δ(s)) ∪ case[¬ϑ[s], 0]

5. The program begins with a conditional:

V_h^{if ϑ then δ_1 else δ_2 end;δ}(s) =def (ϑ[s] ∧ V_h^{δ_1;δ}(s)) ∪ (¬ϑ[s] ∧ V_h^{δ_2;δ}(s))

Analogously for P_h^{if ϑ then δ_1 else δ_2 end;δ}(s).

6. The program begins with a nondeterministic branching:

V_h^{(δ_1 | δ_2);δ}(s) =def casemax (V_h^{δ_1;δ}(s) ∪_≥ V_h^{δ_2;δ}(s))

where ∪_≥ is an extended version of the ∪-operator that additionally sorts the formulas according to their values such that v_i ≥ v_{i+1} holds in the resulting case statement. Another minor modification of the ∪-operator is necessary to keep track of where the formulas originate from. The resulting case statement then looks like this:

case[φ_i, v_i → idx_i]

where idx_i = 1 if φ_i stems from V_h^{δ_1;δ}(s) and idx_i = 2 if φ_i stems from V_h^{δ_2;δ}(s). This allows the agent to reconstruct which branch has to be chosen when φ_i holds in the current situation. For all further operations on the case statement those mappings can be ignored. P_h^{(δ_1 | δ_2);δ}(s) is defined analogously.

7. The program begins with a nondeterministic choice of arguments:

V_h^{π x. (γ);δ}(s) =def casemax ∃x. V_h^{γ;δ}(s)

Note that the resulting case statement is independent of the actually available choices for x. The formulas φ_i(x, s) in V_h^{γ;δ}(s) (which mention x as a free variable) describe how the choice for x influences the expected reward for the remaining program γ; δ. To obtain V_h^{π x. (γ);δ}(s), the casemax-operator is then applied to the existentially quantified case statement ∃x. V_h^{γ;δ}(s). Again, P_h^{π x. (γ);δ}(s) is defined analogously.
8. The program begins with a sequence:

V_h^{[δ_1;δ_2];δ_3}(s) =def V_h^{δ_1;[δ_2;δ_3]}(s)

That is, we associate the sequential composition to the right. By possibly repeated application of this rule the program is transformed into a form such that one of the cases above can be applied.

9. Procedure calls:

The problem with procedures is that it is not clear how to macro-expand a procedure's body when it includes a recursive procedure call. Similar to how procedures are handled by Golog's Do-macro we define an auxiliary macro:

V_h^{P(t_1,...,t_n);δ}(s) =def P(t_1[s], . . . , t_n[s], δ, s, h, v)

We consider programs including procedure definitions to have the following form:

{proc P_1(~v_1) δ_1 end; · · · ; proc P_n(~v_n) δ_n end; δ_0}

Then, we define the optimal expected value obtainable for executing the first h steps of such a program as:

V_h^{{proc P_1(~v_1) δ_1 end; ··· ; proc P_n(~v_n) δ_n end; δ_0}}(s) =def
∀P_i. [ ⋀_{i=1}^{n} ∀~v_i, s', h', δ', v. (v = V_{h'}^{δ_i;δ'}(s') ⊃ P_i(~v_i, δ', s', h', v)) ] ⊃ V_h^{δ_0}(s)

Lemma 1. For every δ and h, the formulas in V_h^δ(s) and P_h^δ(s) partition the state space.

Proof. (Sketch) By definition the formulas in rCase(s) and pCase_j^A(~x, s) partition the state space. The operations on case statements used in the definition of V_h^δ(s) and P_h^δ(s) retain this property.
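The following sketch illustrates the shape of the recursion defined in the items above for a few of the cases (zero horizon, nil, stochastic actions, tests, and nondeterministic branching). It reuses the hypothetical case-statement helpers oplus, otimes, union, and casemax from the earlier sketch; regression R(.) is abstracted into a caller-supplied function, the idx bookkeeping of item 6 and the cases 5 and 7-9 are omitted. It conveys only the structure of the definition, not the paper's implementation.

def V(h, prog, r_case, p_cases, regress):
    # prog is a nested tuple: ("nil",), ("stoch", A, rest), ("test", phi, rest),
    # or ("branch", p1, p2); p_cases maps an action to {outcome: pCase};
    # regress(outcome, case) stands for R(.) applied after do(outcome, s).
    if h == 0 or prog[0] == "nil":                      # items 1 and 2
        return r_case
    if prog[0] == "stoch":                              # item 3
        _, action, rest = prog
        acc = [("true", 0)]
        for outcome, p_case in p_cases[action].items():
            v_next = regress(outcome, V(h - 1, rest, r_case, p_cases, regress))
            acc = oplus(acc, otimes(p_case, v_next))
        return oplus(r_case, acc)
    if prog[0] == "test":                               # item 4
        _, phi, rest = prog
        then = [("(" + phi + ") & (" + f + ")", v)
                for f, v in V(h, rest, r_case, p_cases, regress)]
        fail = [("!(" + phi + ") & (" + f + ")", v) for f, v in r_case]
        return union(then, fail)
    if prog[0] == "branch":                             # item 6 (idx bookkeeping omitted)
        _, p1, p2 = prog
        return casemax(union(V(h, p1, r_case, p_cases, regress),
                             V(h, p2, r_case, p_cases, regress)))
    raise NotImplementedError("conditionals, pick, sequences, procedures omitted")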
4 Semantics

Informally speaking, the semantics for the kind of programs we consider is given by the optimal h-step execution of the program. Formally, it is defined by means of the macro BestDo+(δ, s, h, ρ), where δ is the program for which an h-step policy ρ in situation s shall be computed. A policy is a special kind of program that is intended to be directly handed over to the execution system of the agent and executed without further deliberation. A policy for a program δ "implements" an (h-step) execution strategy for δ: it resolves the nondeterminism in δ and considers the possible outcomes of stochastic actions. In particular, it may proceed differently depending on which outcome actually has been chosen by Nature.

The macro BestDo+(δ, s, h, ρ) is defined inductively on the structure of δ. Its definition is in parts quite similar to that of DTGolog's BestDo, which is why we do not present all cases here, but focus on those where the definitions differ. Clearly, if h equals zero the horizon has been reached and the execution of δ is terminated. This is denoted by the special action Stop:

BestDo+(δ, 0, s, ρ) =def ρ = Stop

If the program begins with a stochastic action α, a policy for every possible outcome n_1, . . . , n_k is determined by means of the auxiliary macro BestDoAux+, which expects a list of deterministic outcome actions as its first argument. senseEffect_α is the sense action associated with α:

BestDo+([α; δ], h, s, ρ) =def ∃ρ'. ρ = [α; senseEffect_α; ρ'] ∧ BestDoAux+({n_1, . . . , n_k}, h, s, ρ')

If the first argument, the list of outcome actions, is empty, then BestDoAux+ expands to

BestDoAux+({}, h, s, ρ) =def ρ = Stop.

Otherwise, the first action n_1 of the list is extracted and (if it is possible) a policy for the remaining program starting in the situation do(n_1, s) is computed by BestDo+. Then, the policy is assembled by branching over the sense outcome condition θ_1 for outcome n_1. The if-branch is determined by BestDo+(δ, h − 1, do(n_1, s), ρ_1); the else-branch by the BestDoAux+ macro for the remaining outcome actions:

BestDoAux+({n_1, . . . , n_k}, h, s, ρ) =def
  ¬Poss(n_1, s) ∧ BestDoAux+({n_2, . . . , n_k}, h, s, ρ)
  ∨ Poss(n_1, s)
    ∧ ∃ρ'. BestDoAux+({n_2, . . . , n_k}, h, s, ρ')
    ∧ ∃ρ_1. BestDo+(δ, h − 1, do(n_1, s), ρ_1)
    ∧ ρ = if θ_1 then ρ_1 else ρ'

The cases where the program begins with a test action or a conditional are handled in a quite similar manner by DTGolog's BestDo, which is why we omit them here. The cases where the program begins with a nondeterministic statement are handled quite differently, though. Whereas BestDo computes the expected reward as well as the probability of successfully executing the remaining program for the current situation, BestDo+(δ, s, h, ρ) relies on V_h^δ(s) and P_h^δ(s) for that. If the program begins with a nondeterministic branching, another auxiliary macro is necessary:

BestDo+((δ_1 | δ_2); δ, s, h, ρ) =def BestDoNDet((δ_1 | δ_2); δ, s, h, case[φ_i(s), (v_i, p_i) → idx_i], ρ)

where the fourth argument of BestDoNDet, the case statement, is the result obtained from applying the casemax-operator to V_h^{δ_1;δ}(s) ◦ P_h^{δ_1;δ}(s) ∪_≥ V_h^{δ_2;δ}(s) ◦ P_h^{δ_2;δ}(s), where ≥ implies an ordering over tuples (v_i, p_i) and implements the trade-off between the expected reward and the probability of successfully executing the program. The BestDoNDet-macro then is defined as:

BestDoNDet((δ_1 | δ_2); δ, s, h, case[φ_i(s), (v_i, p_i) → idx_i], ρ) =def ⋁_i φ_i(s) ∧ BestDo+(δ_{idx_i}; δ, s, h, ρ)

According to Lemma 1 exactly one of the φ_i(s) holds, and thus the decision of whether to continue with the policy computed for δ_1; δ or for δ_2; δ is unambiguous.

If the remaining program begins with a nondeterministic choice of arguments, the definition of BestDo+ again relies on an auxiliary macro BestDoPick:

BestDo+(π x. (γ); δ, s, h, ρ) =def BestDoPick(π x. (γ); δ, s, h, V_h^{γ;δ}(s) ◦ P_h^{γ;δ}(s), ρ)

The definition of BestDoPick(π x. (γ); δ, s, h, case[φ_i(x, s), (v_i, p_i)], ρ) resembles the operation method of the casemax-operator. We assume that the φ_i(x, s) are sorted such that (v_i, p_i) ≥ (v_{i+1}, p_{i+1}). Then:

BestDoPick(π x. (γ); δ, s, h, case[φ_i(x, s), (v_i, p_i)], ρ) =def
  ⋁_i ( ⋀_{j<i} ¬∃x. φ_j(x, s) ∧ ∃x. [φ_i(x, s) ∧ BestDo+(γ; δ, s, h, ρ)] )
  ∨ ⋀_i ¬∃x. φ_i(x, s) ∧ ρ = Stop

Note that the existential quantifier over the φ_i also ranges over the macro BestDo+, and thus the x which occurs as a free variable in the policy returned by BestDo+(γ; δ, s, h, ρ) is bound by the existential such that φ_i(x, s) holds.

Theorem 1. For any DTGolog program δ,

D |= ∀ρ. ∃p, v. BestDo(δ, h, S_0, p, v, ρ) ≡ BestDo+(δ, h, S_0, ρ)

(We assume that all restricted nondeterministic choices of arguments in δ have been rewritten as nondeterministic branchings.)

There seems to be an anomaly in the definition of DTGolog's BestDo-macro. Whereas for primitive actions the reward obtained in the situation before the primitive action is executed is considered, this is not the case for stochastic actions. For instance, let A be a primitive, deterministic action and B a stochastic action with A being its sole outcome action (which is chosen by Nature with a probability of 1). Then the expected rewards for executing A and B may be different, which seems strange. This anomaly can easily be "fixed" by considering the reward obtained in a situation before a stochastic action is executed:

BestDo([α; δ], s, h, ρ, v, pr) =def ∃ρ', v'. BestDoAux({n_1, . . . , n_k}, δ, s, h, ρ', v', pr) ∧ v = reward(s) + v' ∧ ρ = [α; senseEffect_α; ρ']

For the proof of Theorem 1 we assumed the definition of BestDo as shown above.
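As a small illustration of how the precomputed case statements are used at policy-construction time (cf. BestDoNDet and Lemma 1), the following hypothetical sketch returns the branch index attached to the unique case formula that holds in the current situation; holds(phi, s) stands for whatever evaluation or theorem-proving machinery is available, and nothing here is part of the authors' interpreter.

def choose_branch(indexed_cases, situation, holds):
    # indexed_cases: list of (phi, (v, p), idx) entries, already sorted and made
    # mutually exclusive by casemax; holds(phi, s) evaluates phi in situation s.
    for phi, _value, idx in indexed_cases:
        if holds(phi, situation):
            return idx        # by Lemma 1 exactly one phi holds
    return "Stop"             # no case applies: abort (cf. the Stop action)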
5 Comparison with DTGolog

The major difference between the original DTGolog and our extended version is that the latter allows for an unrestricted nondeterministic choice of arguments. DTGolog, on the other hand, only allows the agent to choose from a finite, pre-defined list of possibilities. The practical implication is that programs are tailored to specific domain instantiations: allowing the agent to choose between the blocks B_1, B_2, and B_3, for instance, only makes sense if there are blocks with those names. On the other hand, there might be other blocks than those three, and in that case limiting the choice to B_1, B_2, and B_3 is not always what is intended. In our extension of DTGolog the choice is, in general, unrestricted. Any intended restriction on the choice can be implemented by including a corresponding test in the program. For instance, if the programmer wants the agent to choose a green block and do something with it, she would write

π x. (?(green(x)); · · · ); · · ·

Our approach can handle the unrestricted choice of arguments due to the (first-order) state abstraction as well as action abstraction achieved by means of the case statements V_h^δ(s) and P_h^δ(s). The consequence thereof is that the branching factor of the search tree spanned by BestDo+ depends on the number of cases in V_h^δ(s) and P_h^δ(s) and not on the number of (ground) choices given to the agent, as is the case for BestDo. Although DTGolog is not limited to finite domains, it is not capable of incorporating infinitely many ground choices into its decisions. Our extension can do so due to its abstraction mechanisms. On the other hand, the formulas in the case statements can get large, actually even unmanageably large, quite quickly. But one can argue that the complexity of the formulas resulting from expanding the BestDo-macro is comparable.

To see how this turns out in practice, and whether we can even achieve a speed-up in the computation of a policy in comparison to DTGolog, we performed tests in two different domains. The first domain is a Blocks World domain. The purpose of this domain is to test how the number of available blocks affects the time it takes to compute a policy. The second domain is the logistics domain, which consists of several cities, trucks, and boxes which can be transported from one city to another. Here the goal is to see how DTGolog with and without action abstraction compare with each other in a domain that is a little more complex than the Blocks World domain.

Both interpreters, the standard DTGolog interpreter and our extended version, have been implemented in Prolog in a straightforward manner. The only optimization we employed is to represent the case statements V_h^δ(s) and P_h^δ(s) using first-order arithmetic decision diagrams (FOADDs), as suggested in [Sanner and Boutilier, 2009]. All experiments were carried out on a machine with a 2.6 GHz Core 2 Duo and 2 GB of RAM.
5.1 Blocks World

In our instances of the Blocks World there are coloured blocks. The fluent On(b_1, b_2, s) denotes that block b_1 is on top of block b_2 in situation s. Predicates like green(b) or blue(b) encode the colouring of the blocks. The stochastic action move(b_1, b_2) has two outcomes: if it succeeds, block b_1 is moved on top of block b_2; if it fails, block b_1 remains in its current location. The probability with which the move-action succeeds or fails depends on whether the block to be moved is heavy or not; the probability that the action fails is higher if the block is heavy. The reward function assigns a reward of 10 to situations where there exists a green block on top of a non-green block. In the experiments the number of blocks varied between 10 and 500 and we computed policies with BestDo and BestDo+ for a program that nondeterministically picks two blocks and moves one of them on top of the other:

π x. (π y. (move(x, y)))

In the DTGolog variant of this program the nondeterministic choice of action arguments ranges over all the blocks in the domain instance. Consequently, the search tree branches over all possible combinations for the two blocks. With action abstraction the branching factor of the search tree is constant and independent of the number of objects. This is reflected in the time it takes to compute a policy (cf. Figure 1). With an increasing number of blocks the computation time for DTGolog rises exponentially, whereas the computation time for our extended version of DTGolog with action abstraction remains nearly constant. The slow increase of computation time can be explained by the fact that the evaluation of quantified formulas as they appear in the case statements for the program above takes longer with an increasing number of objects. Nevertheless, in comparison to the computation time of DTGolog this increase is negligible.

[Figure 1: Influence of the number of possible action arguments on the planning time of DTGolog (w/o action abstraction) and our extended version featuring action abstraction; planning time [s] vs. number of blocks.]

5.2 Logistics Domain

In a second experiment we compared our version of DTGolog with action abstraction to the original DTGolog version in the logistics domain. In that domain trucks are supposed to transport boxes from one city to another. The world is described by the fluents boxIn(b, c), meaning that box b is in city c, boxOn(b, t), meaning that box b is on truck t, and truckIn(t, c), meaning that truck t is in city c. In our setting there are five cities and the intended goal is to have a box in city C_1. The program we computed policies for is:

while ¬∃b. boxIn(b, C_1) do
  π c. (drive(T_1, c));
  π b. (load(b, T_1));
  drive(T_1, C_1);
  π b. (unload(b, T_1))
end

This time we intend to examine whether we gain any advantage, in terms of being able to compute longer policies in the same amount of time, from using our extended version of DTGolog. Thus we recorded the computation time for planning policies of different lengths. The results are shown in Figure 2. It can be seen that with action abstraction, policies that are two steps longer can be computed in the same time. It has to be noted, though, that there are also domains in which no advantage in terms of computation time can be gained from the abstraction we apply in our extended version of DTGolog. One such example is a slight variation of the aforementioned logistics domain. In the version above every city is directly reachable from every other city. If we restrict this, it is necessary to explicitly encode the reachability in the domain description. This not only increases the complexity of the formulas in V_h^δ(s) and P_h^δ(s), but in particular it leads to formulas with more deeply nested quantifiers. This in turn increases the time it takes to evaluate those formulas (at least with our rather unsophisticated implementation) by such an amount that in the end DTGolog is faster by a little bit. Additionally, we did not precompute the case statements but computed them on the fly, since the required computation time was marginal.

[Figure 2: Planning times for different horizons (planning time [s] vs. horizon), with and without action abstraction.]

To sum up, these experiments show that our extension of DTGolog can lead to a tremendous speed-up in planning. Such a speed-up should be observable in domains or domain instances where the branching factor of the search tree can be drastically reduced due to state and action abstraction. But then, there are also domains where this is not possible, and there the speed-up is modest or our extended version of DTGolog is even slower than the original version.
6 Related Work

There are numerous approaches that aim for a compact representation of MDPs by using representation languages of varying expressiveness (e.g., a probabilistic variant of STRIPS [Dearden and Boutilier, 1997], relational logic [Kersting et al., 2004], or first-order logic as in DTGolog [Boutilier et al., 2000]). These representations provide abstract representations of the states and of the state transitions and transition probabilities, respectively. The next step then is to exploit those compact representations when solving the MDP, which is exactly what we did here for DTGolog's first-order MDP representation. The technique we use for that was first presented in [Boutilier et al., 2001], where it was shown how an abstract representation of the value function can be derived from a first-order description of an MDP. The main difference to our approach is that they do not consider programs to restrict the search for the optimal policy. A first approach that combines abstract value functions and Golog programs was presented in [Finzi and Lukasiewicz, 2007]. Contrary to our approach, they assume an incompletely specified model (in particular, the probability distribution of Nature's choice is unspecified) and apply Q-learning techniques to obtain an optimal policy. The update of the Q-values and the assembling of the policy, though, is not handled within the language. Furthermore, their approach does not incorporate action abstraction.

Restricting the search space for the optimal policy by means of partial programs has been explored extensively in [Parr and Russell, 1997], [Andre and Russell, 2000], and [Andre and Russell, 2002], for instance. Some of these approaches include abstraction mechanisms, too, but these rely on manual intervention since the partial programs used by them have no properly defined semantics.

7 Conclusion

In this paper we presented an extension of DTGolog that allows an execution strategy for a given program to be determined decision-theoretically while abstracting over action instances. This not only heightens the expressiveness of the programs, since we are no longer limited to the restricted nondeterministic choice as in DTGolog. Additionally, we have shown that in practice this can lead to a speed-up in the computation of the policy, despite the complexity of the formulas that has to be dealt with to achieve action abstraction. Nevertheless, for more complex domains or larger horizons the formulas still become unmanageably large, even with the ADD representation. Consequently, a subject of future research will be to explore how the complexity of the formulas can be minimized. One possible approach is to approximate the value function by a linear combination of weighted basis functions. The problem with this is that it is not clear how to find basis functions that allow for a good approximation in the context of programs.

References

[Andre and Russell, 2000] D. Andre and S. Russell. Programmable reinforcement learning agents. In Advances in Neural Information Processing Systems 13 (NIPS 2000), pages 1019–1025, 2000.

[Andre and Russell, 2002] D. Andre and S. J. Russell. State abstraction for programmable reinforcement learning agents. In Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-02), pages 119–125, 2002.

[Boutilier et al., 2000] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), pages 355–362, 2000.

[Boutilier et al., 2001] C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 690–700, 2001.

[Dearden and Boutilier, 1997] R. Dearden and C. Boutilier. Abstraction and approximate decision-theoretic planning. Artificial Intelligence, 89(1-2):219–283, 1997.

[Finzi and Lukasiewicz, 2007] A. Finzi and T. Lukasiewicz. Adaptive multi-agent programming in GTGolog. In KI 2006: Advances in Artificial Intelligence, 29th Annual German Conference on AI, pages 389–403. Springer, 2007.

[Kersting et al., 2004] K. Kersting, M. van Otterlo, and L. De Raedt. Bellman goes relational. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004), page 59. ACM, 2004.

[Levesque et al., 1997] H. J. Levesque, R. Reiter, Y. Lespérance, F. Lin, and R. B. Scherl. GOLOG: A logic programming language for dynamic domains. The Journal of Logic Programming, 31(1-3):59–83, 1997.

[Marthi et al., 2005] B. Marthi, S. Russell, D. Latham, and C. Guestrin. Concurrent hierarchical reinforcement learning. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 1652–1653, 2005.

[Parr and Russell, 1997] R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10 (NIPS 1997), pages 1043–1049, 1997.

[Puterman, 1994] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, New York, NY, USA, 1994.

[Reiter, 1991] R. Reiter. The frame problem in the situation calculus: a simple solution (sometimes) and a completeness result for goal regression. In Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pages 359–380, 1991.

[Sanner and Boutilier, 2009] S. Sanner and C. Boutilier. Practical solution techniques for first-order MDPs. Artificial Intelligence, 173(5-6):748–788, 2009.
Verifying properties of action theories by bounded model checking
Laura Giordano
Dipartimento di Informatica
Università del Piemonte Orientale, Italy

Alberto Martelli
Dipartimento di Informatica
Università di Torino, Italy

Daniele Theseider Dupré
Dipartimento di Informatica
Università del Piemonte Orientale, Italy
Abstract
Temporal logics are well suited for reasoning about
actions, as they allow for the specification of domain descriptions including temporal constraints as
well as for the verification of temporal properties
of the domain. In this paper, we exploit bounded
model checking (BMC) techniques in the verification of properties of an action theory formulated in
a temporal extension of answer set programming.
To achieve completeness, in this paper, we follow
an approach to BMC which exploits the Büchi automaton construction. The paper provides an encoding in ASP of the temporal action domain and
of bounded model checking of LTL formulas.
1 Introduction
Temporal logics are well suited for reasoning about actions,
as they allow for the specification of domain descriptions including temporal constraints as well as for the verification of
temporal properties of the domain. In this paper, we exploit
bounded model checking (BMC) techniques in the verification of properties of an action theory formulated in a temporal
extension of answer set programming (ASP [10]).
Given a system model (a transition system) and a property
to be checked, bounded model checking (BMC) [4] searches
for a counterexample of the property by looking for a path
in the model. BMC does not require a tableau or automaton
construction. It searches for a counterexample as a path of
length k and generates a propositional formula that is satisfiable iff such a counterexample exists. The bound k is iteratively increased and if no model exists, the iterative procedure
will never stop. As a consequence, bounded model checking (as defined in [4]) provides a partial decision procedure
for checking validity. Techniques for achieving completeness
have been described in [4], where upper bounds for k are defined for some classes of properties, namely unnested properties. To solve this problem [5] proposes a semantic translation
scheme, based on Büchi automata.
Heljanko and Niemelä [18] developed a compact encoding
of bounded model checking of LTL formulas as the problem
of finding stable models of logic programs. In this paper, we
propose an alternative encoding of BMC of LTL formulas in
ASP, with the aim of achieving completeness. As a difference
with [18], the computed path is built by exploiting the Büchi
automaton construction [14]: it is an accepting path of the
product Büchi automaton which can be finitely represented
as a k-loop, i.e., a finite path of length k terminating with a loop back to a previous state, in which the states are all distinct from each other. In the
verification of a given property, the iterative procedure looks
for a k-loop which provides a counterexample to the property
by increasing k until either a counterexample is found, or no
k-loop of length greater or equal to k can be found. The second condition can be verified by checking that there is no path
of length k whose states are all distinct from each other.
In the paper, the transition system defining the system
model on which the property is to be checked is provided
by defining a domain description in a temporal action theory. The action theory is given in a temporal extension of
ASP and the extensions of a domain description are defined
by generalizing the standard notion of answer set [10] to temporal answer sets. The encoding of BMC in ASP is based on
the definition of the Büchi automaton in [19] and exploits the
tableau-based procedure in [15] to provide the construction
on-the-fly of the automaton. The tableau procedure is directly
encoded in ASP to build a path of the product automaton. The
encoding in ASP uses a number of ground atoms which is linear in the size of the formula and quadratic in k.
2 Linear Time Temporal Logic
In this paper we refer to a formulation of LTL (linear time
temporal logic), introduced in [19], where the next state
modality is indexed by actions.
Let Σ be a finite non-empty alphabet. The members of Σ
are actions. Let Σ∗ and Σω be the set of finite and infinite
words on Σ. Let Σ∞ =Σ∗ ∪ Σω . We denote by σ, σ ′ the
words over Σω and by τ, τ ′ the words over Σ∗ . Moreover,
we denote by ≤ the usual prefix ordering over Σ∗ and, for
u ∈ Σ∞ , we denote by prf(u) the set of finite prefixes of u.
Let P = {p1 , p2 , . . .} be a countable set of atomic propositions. The set of formulas of LTL(Σ) is defined as follows:
LTL(Σ) ::= p | ¬α | α ∨ β | haiα | αUβ
where p ∈ P and α, β range over LTL(Σ).
A model of LTL(Σ) is a pair M = (σ, V ) where σ ∈ Σω
and V : prf (σ) → 2^P is a valuation function. Given a model
M = (σ, V ), a finite word τ ∈ prf (σ) and a formula α, the
satisfiability of a formula α at τ in M , written M, τ |= α, is
defined as follows:
• M, τ |= p iff p ∈ V (τ );
• M, τ |= ¬α iff M, τ 6|= α;
• M, τ |= α ∨ β iff M, τ |= α or M, τ |= β;
• M, τ |= haiα iff τ a ∈ prf (σ) and M, τ a |= α.
• M, τ |= αUβ iff there exists τ ′ such that τ τ ′ ∈ prf (σ)
and M, τ τ ′ |= β. Moreover, for every τ ′′ such that ε ≤
τ′′ < τ′ (see footnote 1), M, τ τ′′ |= α.
A formula α is satisfiable iff there is a model M = (σ, V )
and a finite word τ ∈ prf (σ) such that M, τ |= α.
The symbols ⊤ and ⊥ can be defined as ⊤ ≡ p ∨ ¬p and ⊥ ≡ ¬⊤. The derived modalities [a]α, ○ (next), ✸ and ✷ can be defined as follows: [a]α ≡ ¬⟨a⟩¬α, ○α ≡ ⋁_{a∈Σ} ⟨a⟩α, ✸α ≡ ⊤Uα, ✷α ≡ ¬✸¬α.
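For concreteness, the following minimal sketch (our own illustration, not part of the paper) encodes the LTL(Σ) syntax above as nested Python tuples, with the derived modalities defined exactly by the stated equivalences; all function names are hypothetical.

def prop(p):      return ("prop", p)
def neg(a):       return ("neg", a)
def lor(a, b):    return ("or", a, b)
def dia(act, a):  return ("dia", act, a)          # <a> alpha
def until(a, b):  return ("until", a, b)

TOP = lor(prop("p0"), neg(prop("p0")))             # T == p v ~p

def box(act, a):  return neg(dia(act, neg(a)))     # [a] alpha == ~<a>~alpha
def nxt(sigma, a):                                 # O alpha == OR_{a in Sigma} <a> alpha
    f = dia(sigma[0], a)
    for act in sigma[1:]:
        f = lor(f, dia(act, a))
    return f
def diamond(a):   return until(TOP, a)             # <> alpha == T U alpha
def always(a):    return neg(diamond(neg(a)))      # [] alpha == ~<>~alpha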
3 Temporal action language
A domain description Π is defined as a set of laws describing the effects of actions and their executability preconditions. Atomic propositions describing the state of the domain
are called fluents. Actions may have direct effects, that are
described by action laws, and indirect effects, described by
causal laws capturing the causal dependencies among fluents.
Let L be a first order language which includes a finite number of constants and variables, but no function symbol. Let
P be a set of atomic literals p(t1 , . . . , tn ), that we call fluent names. A simple fluent literal l is a fluent name f or its
negation ¬f . We denote by LitS the set of all simple fluent
literals. Lit_T is the set of temporal fluent literals: if l ∈ Lit_S, then [a]l, ○l ∈ Lit_T, where a is an action name (an atomic proposition, possibly containing variables), and [a] and ○ are the temporal operators introduced in the previous section. Let
Lit = LitS ∪ LitT ∪ {⊥}, where ⊥ represents inconsistency.
Given a (simple or temporal) fluent literal l, not l represents
the default negation of l. A (simple or temporal) fluent literal possibly preceded by a default negation, will be called an
extended fluent literal.
The laws are formulated as rules of a temporally extended
logic programming language. Rules have the form
t0 ← t1 , . . . , tm , not tm+1 , . . . , not tn
(1)
where the ti ’s are either simple fluent literals or temporal fluent literals. As usual in ASP, the rules with variables will be
used as a shorthand for the set of their ground instances.
In the following, to define our action language, we make
use of a notion of state: a set of ground fluent literals. A state
is said to be consistent if it is not the case that both f and ¬f
belong to the state, or that ⊥ belongs to the state. A state is
said to be complete if, for each fluent name p ∈ P, either p or
¬p belong to the state. The execution of an action in a state
may possibly change the values of fluents in the state through
its direct and indirect effects, thus giving rise to a new state.
We assume that a law of the form (1) can be applied in all states,
while we prefix a law with Init if it only applies to the initial
state.
(Footnote 1: We define τ ≤ τ′ iff ∃τ′′ such that τ τ′′ = τ′. Moreover, τ < τ′ iff τ ≤ τ′ and τ ≠ τ′.)
Example 1 This example describes a mail delivery agent,
which checks if there is mail in the mailbox of some employees and delivers the mail to them. The actions in Σ are:
sense mail (the agent verifies if there is mail in all mailboxes), deliver(E) (the agent delivers the mail to employee
E),wait. The fluent names are mail(E) (there is mail in the
mailbox of E). The domain description Π contains the following immediate effects and persistency laws:
[deliver(E)]¬mail(E)
[sense mail]mail(E) ← not [sense mail]¬mail(E)
mail(E) ← mail(E), not ¬mail(E)
¬mail(E) ← ¬mail(E), not mail(E)
Their meaning is (in the order) that: after delivering the mail
to E, there is no mail for E any more; the action sense mail
may (non-monotonically) cause mail(E) to become true.
The last two rules define the persistency of fluent mail.
Observe that the persistency laws interact with the immediate effect laws above. The execution of sense mail in a
state in which there is no mail for some E (¬mail(E)), may
either lead to a state in which mail(E) holds (by the second
action law) or to a state in which ¬mail(E) holds (by the
persistency of ¬mail(E)). Thus sense mail is a nondeterministic action.
We can also add the following precondition laws:
[deliver(E)] ⊥← ¬mail(E)
[wait] ⊥← mail(E)
specifying that, if there is no mail for E, deliver(E) is not executable, while, if there is mail for E, wait is not executable.
We assume that there are only two employees, a and b, and that in the initial state there is neither mail for a nor for b, i.e., the laws

Init ¬mail(a)
Init ¬mail(b)

are included in Π.
Although not included in the example, the language is also well suited to describe causal dependencies among fluents by means of static causal laws such as, for instance, light_on ← voltage (if there is voltage, the light is on), or dynamic causal laws as (from the shooting domain) ○frightened ← ○in_sight, ¬in_sight, alive (if the turkey is alive, it becomes frightened, if it is not already, when it starts seeing the hunter). Similar causal rules can be formulated in the action languages K [9] and C+ [16].
3.1 Temporal answer sets
To define the semantics of a domain description, we extend the notion of answer set [10] to capture the linear structure of temporal models. In the following, we consider the
ground instantiation of the domain description Π, and we denote by Σ the set of all the ground instances of the action
names in Π.
We define a temporal interpretation as a pair (σ, S), where
σ ∈ Σω is a sequence of actions and S is a consistent set of
literals of the form [a1 ; . . . ; ak ]l, where a1 . . . ak is a prefix
of σ, meaning that l holds in the state obtained by executing a1 . . . ak . S is consistent iff it is not the case that both
[a1 ; . . . ; ak ]l ∈ S and [a1 ; . . . ; ak ]¬l ∈ S, for some l, or
[a1 ; . . . ; ak ]⊥ ∈ S. A temporal interpretation (σ, S) is said
to be total if either [a1 ; . . . ; ak ]p ∈ S or [a1 ; . . . ; ak ]¬p ∈ S,
for each a1 . . . ak prefix of σ and for each fluent name p.
We define the satisfiability of a simple, temporal or extended literal t in a partial temporal interpretation (σ, S) in
the state a1 . . . ak , (written (σ, S), a1 . . . ak |= t) as follows:
(σ, S), a1 . . . ak |= ⊤,
(σ, S), a1 . . . ak 6|= ⊥
(σ, S), a1 . . . ak |= l iff [a1 ; . . . ; ak ]l ∈ S, for a literal l
(σ, S), a1 . . . ak |= [a]l iff [a1 ; . . . ; ak ; a]l ∈ S or
a1 . . . ak , a is not a prefix of σ
(σ, S), a1 . . . ak |= ○l iff [a1 ; . . . ; ak ; b]l ∈ S,
where a1 . . . ak b is a prefix of σ
(σ, S), a1 . . . ak |= not l iff (σ, S), a1 . . . ak 6|= l
The satisfiability of rule bodies in a temporal interpretation
are defined as usual. A rule H ← Body is satisfied in
a temporal interpretation (σ, S) if, for all action sequences
a1 . . . ak (including the empty one), (σ, S), a1 . . . ak |=
Body implies (σ, S), a1 . . . ak |= H.
A rule Init H ← Body is satisfied in a partial temporal
interpretation (σ, S) if, (σ, S), ε |= Body implies (σ, S), ε |=
H, where ε is the empty action sequence.
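A hypothetical sketch of these satisfiability clauses, assuming S is given as a finite set of (prefix, literal) pairs standing for the literals [a1; . . . ; ak]l, and σ as a sufficiently long list of action names; only simple literals, [a]l, ○l, and default negation are covered, and none of the names below come from the paper.

def sat(sigma, S, prefix, lit):
    # Checks (sigma, S), a1...ak |= t for the literal forms defined above.
    kind, payload = lit
    if kind == "lit":                     # simple fluent literal l
        return (prefix, payload) in S
    if kind == "box":                     # [a] l
        a, l = payload
        longer = prefix + (a,)
        return (longer, l) in S or not is_prefix(longer, sigma)
    if kind == "next":                    # O l: l holds after the next action of sigma
        b = sigma[len(prefix)]
        return (prefix + (b,), payload) in S
    if kind == "not":                     # default negation
        return not sat(sigma, S, prefix, payload)

def is_prefix(p, sigma):
    return tuple(sigma[:len(p)]) == tuple(p)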
To define the answer sets of Π, we introduce the notion of
reduct of Π, containing rules of the form: [a1 ; . . . ; ah ](H ←
Body). Such rules are evaluated in the state a1 . . . ah .
Let Π be a set of rules over an action alphabet Σ, not containing default negation, and let σ ∈ Σω .
Definition 1 A temporal interpretation (σ, S) is a temporal
answer set of Π if S is minimal (in the sense of set inclusion)
among the S ′ such that (σ, S ′ ) is a partial interpretation satisfying the rules in Π.
To define answer sets of a program Π containing negation,
given a temporal interpretation (σ, S) over σ ∈ Σω , we define
the reduct, Π(σ,S) , of Π relative to (σ, S) extending Gelfond
and Lifschitz’ transform [11] to compute a different reduct of
Π for each prefix a1 , . . . , ah of σ.
Definition 2 The reduct, Π^{(σ,S)}_{a1,...,ah}, of Π relative to (σ, S) and to the prefix a1, . . . , ah of σ, is the set of all the rules

[a1; . . . ; ah](H ← l1, . . . , lm)

such that H ← l1, . . . , lm, not lm+1, . . . , not ln is in Π and (σ, S), a1, . . . , ah ⊭ li, for all i = m + 1, . . . , n.

The reduct Π^{(σ,S)} of Π relative to (σ, S) is the union of all reducts Π^{(σ,S)}_{a1,...,ah} for all prefixes a1, . . . , ah of σ.
Definition 3 A temporal interpretation (σ, S) is an answer
set of Π if (σ, S) is an answer set of the reduct Π(σ,S) .
Although the answer sets of a domain description Π are
partial interpretations, in some cases, e.g., when the initial
state is complete and all fluents are inertial, it is possible to
guarantee that the temporal answer sets of Π are total.
In case the initial state is not complete, we consider all the
possible ways to complete the initial state by introducing in
Π, for each fluent name f , the rules:
Init f ← not ¬f
Init ¬f ← not f
The case of total temporal answer sets is of special interest as
a total temporal answer set (σ, S) can be regarded as temporal model (σ, V ), where, for each finite prefix a1 . . . ak of σ,
V (a1 , . . . , ak ) = {p : [a1 , . . . , ak ]p ∈ S}. In the following,
we restrict our consideration to domain descriptions Π, such
that all the answer sets of Π are total.
A total temporal interpretation (σ, S) provides, for each
prefix a1 . . . ak , a complete state corresponding to that prefix.
We denote by w^{(σ,S)}_{a1...ak} the state obtained by the execution of the actions a1 . . . ak in the sequence, namely w^{(σ,S)}_{a1...ak} = {l : [a1; . . . ; ak]l ∈ S}.
Given a domain description Π over Σ with total answer
sets, a transition system (W, I, T ) can be associated with Π
as follows:
- W is the set of all the possible consistent and complete
states of the domain description;
- I is the set of all the states in W satisfying the initial
state laws in Π;
- T ⊆ W × Σ × W is the set of all triples (w, a, w′ ) such
that: w, w′ ∈ W , a ∈ Σ and for some total answer set
(σ, S) of Π: w = w^{(σ,S)}_{a1;...;ah} and w′ = w^{(σ,S)}_{a1;...;ah;a}, for some h.
3.2 Reasoning with LTL on domain descriptions
As a total temporal answer set of a domain description can be
interpreted as an LTL model, it is easy to combine domain descriptions with LTL formulas. This can be done in two ways:
on the one hand, LTL formulas can be used as constraints
C on the executions of the domain description; on the other
hand, LTL formulas can encode properties φ to be verified on
the domain description.
Example 2 Assume we want to constrain our domain description in Example 1 so that the agent continuously executes a loop where it senses mail, but there cannot be two
consecutive executions of sense mail. These constraints can
be formulated as follows:
✷✸⟨sense_mail⟩⊤
✷[sense_mail]¬⟨sense_mail⟩⊤
Furthermore, we may want to check that, if there is mail for a,
the agent will eventually deliver it to a. This property, which
can be formalized as ✷(mail(a) ⊃ ✸¬mail(a)), does not
hold as there is a possible scenario in which there is always
mail for a and for b, but the mail is repeatedly delivered to b
and never to a. The mail delivery agent we have described is
not correct with respect to this property.
In the following, we will assume that a set of constraints C
is added to the domain description, beside the rules in Π, and
we denote by (Π, C) the enriched domain description. We
define the extensions of (Π, C) to be the temporal answer sets
of Π satisfying the constraints C.
4 Model checking
The above verification and satisfiability problems can be
solved by means of model checking techniques. Given a domain description, with its associated transition system, the
extension of the domain description satisfying a set of constraints C can be found by looking for a path in the transition
system satisfying the formulas in C. On the other hand, given
a property ϕ formulated as a LTL formula, we can check its
validity by checking the unsatisfiability of ¬ϕ in the transition system. In this case, if a model satisfying ¬ϕ is found, it
represents a counterexample to the validity of ϕ.
The standard approach to model checking for LTL is based
on Büchi automata. A Büchi automaton over an alphabet Σ
is a tuple B = (Q, →, Qin , F ) where: Q is a finite nonempty
set of states; →⊆ Q×Σ×Q is a transition relation; Qin ⊆ Q
is the set of initial states; F ⊆ Q is a set of accepting states.
Let σ ∈ Σω ; a run of B over σ is a map ρ : prf (σ) → Q
such that: ρ(ε) ∈ Q_in and ρ(τ) →^a ρ(τa) for each τa ∈ prf(σ) with a ∈ Σ. The run ρ is accepting iff inf(ρ) ∩ F ≠ ∅,
where inf(ρ) ⊆ Q is given by: q ∈ inf (ρ) iff ρ(τ ) = q
for infinitely many τ ∈ prf (σ). The language of ω-words
accepted by B is: L(B) = {σ|∃ an accepting run of B over
σ}.
The satisfiability problem for LTL can be solved in deterministic exponential time by constructing for each formula
α ∈ LT L(Σ) a Büchi automaton Bα [14] such that the language of ω-words accepted by Bα is non-empty if and only
if α is satisfiable. In case of model checking we have a property which is represented as an LTL formula ϕ, and a model
(transition system) which directly corresponds to a Büchi automaton where all the states are accepting. The property can
be proved by taking the product of the model and of the automaton derived from ¬ϕ, and by checking for emptiness of
the accepted language.
In [4] it has been shown that, in some cases, model checking can be more efficient if, instead of building the product automaton and checking for an accepting run on it, we
look for an infinite path of the transition system satisfying
C ∪ {¬ϕ}. This technique is called bounded model checking
(BMC), since it looks for paths whose length is bounded by
some integer k, by iteratively increasing the length k until a
model satisfying the formulas in C ∪ {¬ϕ} is found (if one
exists). More precisely, it considers infinite paths which can
be represented as a finite path of length k with a back loop,
i.e. with an edge from state k to a previous state in the path.
A BMC problem can be efficiently reduced to a propositional satisfiability problem or to an ASP problem [18]. Unfortunately, if no model exists, the iterative procedure will
never stop, if the transition system contains a loop. Thus it
is a partial decision procedure for checking validity. Techniques for achieving completeness are described in [4] for
some kinds of LTL formulas.
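The basic BMC loop just described can be summarized by the following hypothetical driver, where find_path(k) stands for the propositional (or ASP) search for a length-k path with a back loop satisfying C ∪ {¬ϕ}; as noted above, without an upper bound on k this is only a partial decision procedure. The names are ours, not the paper's.

def bounded_model_check(find_path, k_max=None):
    # find_path(k): search for a length-k path (with back loop) satisfying
    # C u {~phi}; returns it, or None if no such path of length k exists.
    k = 0
    while k_max is None or k <= k_max:
        counterexample = find_path(k)
        if counterexample is not None:
            return counterexample      # the property phi is violated
        k += 1
    return None                        # no counterexample found up to k_max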
5 Büchi automata and Model checking
In this paper, we propose an approach which combines the
advantages of BMC and the possibility of formulating it easily and efficiently as an ASP problem, with the advantages of
reasoning on the product Büchi automaton described above,
mainly its completeness. In this section, we show how to
build the product automaton and how to use the automaton
tableau construction for BMC. In the next section we describe
how to encode the transition system and BMC in ASP.
The problem of constructing a Büchi automaton from a
LTL formula has been deeply studied. In this section we
show how to build a Büchi automaton for a given LTL(Σ)
formula φ using the tableau-like procedure. The construction
is adapted from the procedure given in [19; 15] for Dynamic
Linear Time Logic (DLTL), a logic which extends LTL by
indexing the until operator with regular programs.
The main procedure to construct the Büchi automaton for
a formula φ builds a graph G(φ) whose nodes are labelled by
sets of formulas, and whose edges are labelled by symbols
from the alphabet Σ. States and transitions of the Büchi automaton are obtained directly from the nodes and edges of the
graph. The construction of the states makes use of an auxiliary tableau-based function tableau which handles signed
formulas, i.e. formulas prefixed with the symbol T or F. This
function takes as input a set of formulas (see footnote 2) and returns a set of
sets of formulas, obtained by expanding the input set according to a set of tableau rules, formulated as follows:
• φ ⇒ ψ1 , ψ2 , if φ belongs to the set of formulas, then
add ψ1 and ψ2 to the set
• φ ⇒ ψ1 |ψ2 , if φ belongs to the set of formulas, then
replace the set with two copies of the set and add ψ1 to
one of them and ψ2 to the other one.
The rules are the following:
Tor: T(α ∨ β) ⇒ Tα | Tβ
For: F(α ∨ β) ⇒ Fα, Fβ
Tneg: T¬α ⇒ Fα
Fneg: F¬α ⇒ Tα
Tuntil: TαUβ ⇒ T(β ∨ (α ∧ ○(αUβ)))
Funtil: FαUβ ⇒ F(β ∨ (α ∧ ○(αUβ)))
where the tableau rules for the until formula make use of the equivalence αUβ ≡ (β ∨ (α ∧ ○(αUβ))). This set of rules can be easily extended to deal with other boolean connectives and modal operators like ✷ or ✸ by making use of the equivalences ✷β ≡ (β ∧ ○✷β) and ✸β ≡ (β ∨ ○✸β).
Given a set of formulas s, function tableau repeatedly applies the above rules to the formulas of s (by possibly creating
new sets) until all formulas in all sets have been expanded. If
the expansion of a set of formulas produces an inconsistent
set, then this set is deleted. A set of formulas s is inconsistent
in the following cases: (i) T⊥ ∈ s; (ii) F⊤ ∈ s; (iii) Tα ∈ s
and Fα ∈ s; (iv) T⟨a⟩α ∈ s and T⟨b⟩β ∈ s with a ≠ b,
because in a linear time logic two different actions cannot be
executed in the same state.
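The following sketch (ours, not the paper's implementation) shows the overall shape of such a tableau expansion on sets of signed formulas: formulas are nested tuples, until formulas are unfolded once via the equivalence above, conjunctions and next-time formulas are deliberately left unexpanded, and inconsistent sets are discarded according to conditions (i)-(iv).

def expand(signed_set):
    # Expand a set of signed formulas into fully expanded, consistent sets.
    # Signed formulas are ("T", f) or ("F", f); formulas are nested tuples such
    # as ("or", a, b), ("neg", a), ("until", a, b), ("and", a, b), ("next", a),
    # ("dia", act, a); atoms are strings.
    sets, done = [frozenset(signed_set)], []
    while sets:
        s = set(sets.pop())
        item = next(((sg, f) for sg, f in s if isinstance(f, tuple)
                     and f[0] in ("or", "neg", "until")), None)
        if item is None:                 # nothing left to expand here
            if consistent(s):
                done.append(frozenset(s))
            continue
        s.discard(item)
        sign, f = item
        if f[0] == "or" and sign == "T":                   # Tor: split the set
            sets.append(frozenset(s | {("T", f[1])}))
            sets.append(frozenset(s | {("T", f[2])}))
        elif f[0] == "or":                                 # For
            sets.append(frozenset(s | {("F", f[1]), ("F", f[2])}))
        elif f[0] == "neg":                                # Tneg / Fneg
            sets.append(frozenset(s | {("F" if sign == "T" else "T", f[1])}))
        else:                                              # (T/F)until: unfold once
            a, b = f[1], f[2]
            sets.append(frozenset(s | {(sign, ("or", b, ("and", a, ("next", f))))}))
    return done

def consistent(s):
    # Conditions (i)-(iv): falsum/verum, complementary signs, two next actions.
    if ("T", "bot") in s or ("F", "top") in s:
        return False
    if any(("T", f) in s and ("F", f) in s for _, f in s):
        return False
    next_actions = {f[1] for sg, f in s
                    if sg == "T" and isinstance(f, tuple) and f[0] == "dia"}
    return len(next_actions) <= 1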
To build the graph for a formula φ, we begin by building the initial states, obtained by applying function tableau to the set {φ, T ⋁_{a∈Σ} ⟨a⟩⊤}, where the second formula takes into account the fact that runs must be infinite and thus there must be at least one outgoing edge from each state. After execution of tableau, every resulting set contains exactly one T⟨a⟩⊤ formula, for some a ∈ Σ.
The above tableau rules do not expand formulas whose top operator is a next time operator, i.e. ⟨a⟩α or ○α. Expanding such formulas from a node n means creating a new node containing α connected to n through an edge labelled with a in the first case, or with any symbol in Σ in the second case. Thus an obvious procedure for building the graph is to apply to all sets obtained by the tableau procedure the following construction: if node n contains a formula T⟨a⟩α, then build the set of the nodes connected to n through an edge labelled a as tableau({Tα | T⟨a⟩α ∈ n} ∪ {Tα | T○α ∈ n} ∪ {Fα | F⟨a⟩α ∈ n} ∪ {Fα | F○α ∈ n} ∪ {T ⋁_{a∈Σ} ⟨a⟩⊤}). The construction is iterated on the new nodes.
(Footnote 2: In this section "formulas" means "signed formulas".)
States and transitions of the Büchi automaton correspond
directly to the nodes and edges of the graph. We must now
define the accepting states of the automaton. Intuitively, we
would like to define as accepting those runs in which all the
until formulas of the form TαUβ are fulfilled. If a node n
contains the formula TαUβ, then we can accept an infinite
run containing n, if node n is followed in the run by a node
n′ containing Tβ. Furthermore all nodes between n and n′
must contain Tα.
Let us assume that a node n contains the until formula
TαUβ. After the expansion of this formula, n either contains
Tβ or T○(αUβ). In the latter case, each successor node will
contain a formula TαUβ. We say that this until formula is
derived from formula TαUβ in node n. If a node contains an
until formula which is not derived from a predecessor node,
we will say that the formula is new. New until formulas are
obtained during the expansion of the tableau procedure.
In order to formulate the accepting condition, we must be
able to trace the until formulas along the paths of the graph
to make sure that they are fulfilled. To do this we extend T-signed formulas so that all until formulas have a label 0 or 1, i.e. they have the form TαU^l β where l ∈ {0, 1} (see footnote 3). Note that two formulas TαU^0 β and TαU^1 β are considered to be
different. Furthermore, we define each node of the graph as a
triple (F, x, f ), where F is an expanded set of formulas built
by function tableau, x ∈ {0, 1}, and f ∈ {↓, X}. f = X
means that the node represents an accepting state.
For each node (F, x, f ), the label of an until formula in F
will be assigned as follows: if it is a derived until formula,
then its label is the same as that of the until formula in the
predecessor node it derives from, otherwise, if the formula is
new, it is given the label 1 − x.
Given a node (F, x, f) and a successor (F′, x′, f′), x′ and f′ are defined as follows:
if f = X then x′ := 1 − x else x′ := x;
if there is no T○(αU^{x′}β) ∈ F′ then f′ := X else f′ := ↓.
Let us call 0-sequences or 1-sequences the sequences of
nodes of a run ρ with x = 0 or x = 1 respectively. Intuitively,
every new until formula created in a node of a 0-sequence
will be fulfilled within the end of the next 1-sequence, and
vice versa. In fact, the formula will be given label 1 and
propagated in the following nodes with the same label, and
the 1-sequence cannot terminate until the until formula is fulfilled. If ρ is an accepting run, then it must contain infinitely
many nodes containing X, and thus all 0-sequences and 1sequences must be finite and, as a consequence, all until formulas will be fulfilled.
(Footnote 3: If we also introduce the ✷ and ✸ operators, we have to label them in an analogous way.)
Given a graph G(φ), the states and transitions of the Büchi automaton B(φ) correspond directly to the nodes and edges
of G(φ), and the set of accepting states of B(φ) consists of all
states whose corresponding node contains f = X.
In [15] it is proved that there is a σ ∈ L(B(φ)) if and only
if there is a model M = (σ, V ) such that M, ε |= φ.
The same construction can be used in model checking for
building the product automaton of B(φ) and the transition
system. Every state of the product automaton is the union of
a set of fluents forming a state of the transition system and a
set of signed formulas corresponding to a state of B(φ), while
transitions must agree both with transitions of B(φ) and those
of the action theory. We assume that the action theory and
the LTL formulas refer to the same set of actions and atomic
propositions. Of course, the states of the product automaton
must be consistent, i.e., they cannot contain the literal ¬f and the signed formula Tf, or f and Ff (see footnote 4).
The construction of the automaton can be done on-the-fly,
while checking for the emptiness of the language accepted by
the automaton. In this paper, following the BMC approach,
we aim at generating a single path of the automaton at a time.
Given an integer k, we look for a path of length k of the automaton, with a loop back from the last state to a previous
state l in the path, such that there is an accepting state j,
l ≤ j ≤ k. Such a k-loop finitely represents an accepting
run of the automaton. Note that we can consider only simple paths, that is paths without repeated nodes. This property
allows to define a terminating algorithm, thus achieving completeness: the bound k is increased until a k-loop is found or
the length of the longest path of the automaton is reached.
To find the length of the longest path we can proceed iteratively by looking for a simple path of length k (without loop),
incrementing k at each iteration. Since the product automaton
has a finite size, this procedure terminates.
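The resulting complete procedure can be summarized by the following hypothetical driver, where find_kloop(k) and find_simple_path(k) stand for the ASP searches described in the next section: the bound k grows until either an accepting k-loop is found or no simple path of length k exists, in which case no longer k-loop can exist either. The names are ours, not the paper's.

def check(find_kloop, find_simple_path):
    # find_kloop(k): accepting k-loop (counterexample) of length k, or None;
    # find_simple_path(k): path of length k with pairwise distinct states, or None.
    k = 0
    while True:
        loop = find_kloop(k)
        if loop is not None:
            return loop                 # counterexample found
        if find_simple_path(k) is None:
            return None                 # longest path exceeded: the property holds
        k += 1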
Example 3 Let us consider the domain description in Example 1 with the constraint ✷✸⟨sense_mail⟩⊤. The following is a k-loop satisfying the constraint for k = 4. It consists of the states s_0, . . . , s_4 with the transitions s_0 −wait→ s_1, s_1 −wait→ s_2, s_2 −sense_mail→ s_3, s_3 −deliver(b)→ s_4, s_4 −deliver(a)→ s_1.

State s_0 is obtained by applying tableau to the LTL formula expressing the constraint, and adding the fluent literals holding in the initial state. Thus we get (see footnote 5):

T✷✸⟨sense_mail⟩⊤, T✸¹⟨sense_mail⟩⊤, T○✷✸⟨sense_mail⟩⊤, T○✸¹⟨sense_mail⟩⊤, T⟨wait⟩⊤, ¬a, ¬b, x = 0, f = X

The second and third formulas are obtained by the expansion of the first one, while the fourth formula is obtained by the expansion of the second one.

State s_1 is obtained by propagating the next time formulas and expanding them:

T✷✸⟨sense_mail⟩⊤, T✸¹⟨sense_mail⟩⊤, T✸⁰⟨sense_mail⟩⊤, T○✷✸⟨sense_mail⟩⊤, T○✸¹⟨sense_mail⟩⊤, T○✸⁰⟨sense_mail⟩⊤, T⟨wait⟩⊤, ¬a, ¬b, x = 1, f = ↓

The second and third formulas are identical except for the index of the ✸ operator: the second formula derives from the previous state, while the third one derives from the first formula of this state; f is ↓ because there is a next time formula with label 1.

State s_2 is:

T✷✸⟨sense_mail⟩⊤, T✸¹⟨sense_mail⟩⊤, T✸⁰⟨sense_mail⟩⊤, T○✷✸⟨sense_mail⟩⊤, T⟨sense_mail⟩⊤, ¬a, ¬b, x = 1, f = X

The value of f is X because there are no next time formulas with label 1. The formulas T✸^l⟨sense_mail⟩⊤ are fulfilled because sense_mail will be the next action.

State s_3 is:

T✷✸⟨sense_mail⟩⊤, T✸¹⟨sense_mail⟩⊤, T○✷✸⟨sense_mail⟩⊤, T○✸¹⟨sense_mail⟩⊤, T⟨deliver(b)⟩⊤, a, b, x = 0, f = X

Note that the execution of sense_mail changes the values of a and b.

State s_4 is:

T✷✸⟨sense_mail⟩⊤, T✸⁰⟨sense_mail⟩⊤, T✸¹⟨sense_mail⟩⊤, T○✷✸⟨sense_mail⟩⊤, T○✸¹⟨sense_mail⟩⊤, T○✸⁰⟨sense_mail⟩⊤, T⟨deliver(a)⟩⊤, a, ¬b, x = 1, f = ↓

By executing action deliver(a) we have a transition back to state s_1.

(Footnote 4: Remember that the states of the transition system are complete and thus each state must contain either f or ¬f.)
(Footnote 5: We omit formulas having a boolean connective as topmost operator, and we use a and b as shorthand for mail(a) and mail(b).)
Example 4 Let us consider now our domain description with the two constraints in Example 2. To check whether the formula ϕ = ✷(mail(a) ⊃ ✸¬mail(a)) is valid, we add to the domain description the two constraints and ¬ϕ, and we get the following k-loop which represents a counterexample to the property: s_0 −sense_mail→ s_1, s_1 −deliver(b)→ s_2, s_2 −sense_mail→ s_3, s_3 −deliver(b)→ s_2. Furthermore, we have the following fluents in each state: s_0: ¬a, ¬b; s_1: a, b; s_2: a, ¬b; s_3: a, b. Thus the mail of a is never delivered.

Let us now modify the domain theory by adding the precondition [sense_mail] ⊥ ← mail(E). In this case, we expect ϕ to hold. To check this, we first compute the length of the longest path in the Büchi automaton, which turns out to be 9, and then check that there is no k-loop for k up to 9.
6 Encoding bounded model checking in ASP
We give now a translation into standard ASP of the above
procedure for building a path of the product Büchi automaton.
The translation has been run in the DLV-Complex extension
of DLV [20].
In the translation we use predicates like fluent,
action, state, to express the type of atoms.
As we are interested in infinite runs represented as k-loops,
we assume a bound K to the number of states. States are
represented in ASP as integers from 0 to K, where K is
given by the predicate laststate(State). The predicate
occurs(Action,State) describes transitions. Occurrence
of exactly one action in each state can be encoded as:
-occurs(A,S):- occurs(A1,S),action(A),
action(A1),A!=A1,state(S).
occurs(A,S):- not -occurs(A,S),action(A),
state(S).
As we have seen, states are associated with a set of
fluent literals, a set of signed formulas, and the values of x and f . Fluent literals are represented with
the predicate holds(Fluent, State), T or F formulas with tt(Formula,State) or ff(Formula,State), x
with the predicate x(Val,State) and f with the predicate
acc(State), which is true if State is an accepting state.
States on the path must be all different, and thus we need
to define a predicate eq(S1,S2) to check whether the two
states S1 and S2 are equal:
eq(S1,S2):- state(S1), state(S2),
not diff(S1,S2).
diff(S1,S2):- state(S1), state(S2),
tt(F,S1), not tt(F,S2).
diff(S1,S2):- state(S1), state(S2),
holds(F,S1), not holds(F,S2).
and similarly for any other kind of component of a state.
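For instance, following the same pattern, the analogous rules for the remaining components could be written as follows (a sketch; these rules are not spelled out in the text, but they only use the predicates introduced above):
diff(S1,S2):- state(S1), state(S2), ff(F,S1), not ff(F,S2).
diff(S1,S2):- state(S1), state(S2), -holds(F,S1), not -holds(F,S2).
diff(S1,S2):- state(S1), state(S2), x(V,S1), not x(V,S2).
diff(S1,S2):- state(S1), state(S2), acc(S1), not acc(S2).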
The following constraint requires all states up to K to be
different:
:- state(S1), state(S2), S1!=S2, eq(S1,S2),
laststate(K), S1<=K, S2<=K.
Furthermore, we have to define suitable constraints stating that there will be a transition from state K to a previous state L (since states are all different, there will be at most one state equal to the successor of K), and that there must be a state S, L ≤ S ≤ K, such that acc(S) holds, i.e., S is an accepting state. To do this we compute the successor of state K, and check that it is equal to L.
loop(L):- state(L), laststate(K), L<=K,
SuccK=K+1, eq(L,SuccK).
accept:- loop(L), state(S), laststate(K),
L<=S, S<=K, acc(S).
:- not accept.
A problem we want to translate to ASP consists of a domain description Π and a set of LTL formulas ϕ1, . . . , ϕn, representing constraints or negated properties, to be satisfied on the domain description. The rules of the domain description can be easily translated to ASP, similarly to [10]. In the following we give the translation of our running example. (A more general approach to dealing with variables in action names and fluents consists in introducing, as done in [8], type predicates for fluents and actions and including type conditions in the translation.)
action(sense_mail).
action(deliver(a)).
action(deliver(b)).
action(wait).
fluent(mail(a)).
fluent(mail(b)).
action effects:
holds(mail(E),NS):- occurs(sense_mail,S),
fluent(mail(E)), NS=S+1,
not -holds(mail(E),NS).
-holds(mail(E),NS):- occurs(deliver(E),S),
fluent(mail(E)), NS=S+1.
persistence:
holds(F,NS):- holds(F,S), fluent(F), NS=S+1,
not -holds(F,NS).
-holds(F,NS):- -holds(F,S), fluent(F), NS=S+1,
not holds(F,NS).
preconditions:
:- occurs(deliver(E),S),-holds(mail(E),S).
:- occurs(wait,S), holds(mail(E),S).
initial state:
-holds(mail(a),0).
-holds(mail(b),0).
LTL formulas are represented as ASP terms. The expansion of signed formulas can be formulated by means of ASP
rules corresponding to the rules given in the previous section.
Disjunction:
tt(F1,S) v tt(F2,S):- tt(or(F1,F2),S).
ff(F1,S):- ff(or(F1,F2),S).
ff(F2,S):- ff(or(F1,F2),S).
Negation:
ff(F,S):- tt(neg(F),S).
tt(F,S):- ff(neg(F),S).
Until:
tt(lab_until(F1,F2,Lab),S):- tt(until(F1,F2),S),
x(VX,S), 1=Lab+VX.
ff(or(F2,and(F1,next(until(F1,F2)))),S):-
ff(until(F1,F2),S).
tt(or(F2,and(F1,next(lab_until(F1,F2,L)))),S):-
tt(lab_until(F1,F2,L),S).
Note that, to express splitting of sets of formulas, as in the
case of disjunction, we can exploit disjunction in the head
of clauses, provided by some ASP languages such as DLV.
We have introduced the term lab_until(F1,F2,Lab) for
labeled until formulas, as described in the previous section.
Expansions of next time formulas ⟨a⟩ (diamond) and ○ (next) are defined as:
occurs(Act,S):- tt(diamond(Act,F),S).
tt(F,NS):- tt(diamond(Act,F),S), NS=S+1.
ff(F,NS):- ff(diamond(Act,F),S),
occurs(Act,S), NS=S+1.
tt(F,NS):- tt(next(F),S), NS=S+1.
ff(F,NS):- ff(next(F),S), NS=S+1.
Inconsistency of signed formulas is formulated with the
following constraints:
:- ff(true,S), state(S).
:- tt(F,S), ff(F,S), state(S).
:- tt(diamond(Act1,F),S),
tt(diamond(Act2,F),S), Act1!=Act2.
:- tt(F,S), not holds(F,S).
:- ff(F,S), not -holds(F,S).
Finally, predicates x and acc are defined as follows.
x(NN,NS):- acc(S), x(N,S), NS=S+1, 1=NN+N.
x(N,NS):- -acc(S), x(N,S), NS=S+1.
-acc(NS):- x(N,NS), tt(lab_until(_,_,N),NS),
NS=S+1.
acc(NS):- not -acc(NS), NS=S+1.
x(0,0). acc(0).
We must also add a fact tt(tr(ϕi ),0) for each ϕi , where
tr(ϕi ) is the ASP term representing ϕi .
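For instance, for the constraint ✷✸⟨sense_mail⟩⊤ of the running example, using the standard LTL equivalences ✸ψ ≡ ⊤ U ψ and ✷ψ ≡ ¬✸¬ψ, such a fact could be written as follows (a sketch using only the term constructors introduced above; the paper does not fix dedicated terms for ✷ and ✸):
tt(neg(until(true,neg(until(true,diamond(sense_mail,true))))),0).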
It is easy to see that the (grounding of the) encoding in ASP is linear in the size of the formula and quadratic in the size of k. Observe that the number of ground instances of all predicates is O(|φ| × k), except for eq, whose ground instances are k^2.
We can prove that there is a one-to-one correspondence between the extensions of a domain description satisfying a given temporal formula and the answer sets of the ASP program encoding the domain and the formula.
Proposition 1 Let Π be a domain description whose temporal answer sets are total, let tr(Π) be the ASP encoding of Π (for a given k), and let φ be an LTL formula. There is a one-to-one correspondence between the temporal answer sets of Π that satisfy the formula φ and the answer sets of the ASP program tr(Π) ∪ {tt(tr(φ),0)}, where tr(φ) is the ASP term representing φ.
Completeness of BMC can be achieved by considering that the longest simple path in the product Büchi automaton determines an upper bound k0 on the length of the k-loops
searched for by the iterative procedure. To check the validity
of a formula φ, we look for a k-loop satisfying ¬φ. During
the iterative search, we can check whether the bound k0 has
been already reached or not. In practice, at each iteration k,
either we find a k-loop of length k, or we check if there is a
simple path (a path with no loop and without repeated nodes)
of length k. If not, we can conclude that the bound has been
reached and we can stop.
The search for a simple path of length k can be done by removing from the above ASP encoding the rules for defining
loops and the rules for defining the Büchi acceptance condition (the definitions of x, acc and accept and the constraint
:- not accept).
7 Conclusions
We have presented a bounded model checking approach for
the verification of properties of temporal action theories in
ASP. The temporal action theory is formulated in a temporal extension of ASP, where the presence of LTL constraints in the domain description allows state trajectory constraints to be captured, as advocated in PDDL3 [13].
The proposed approach can be easily extended to the logic
DLTL, which extends LTL with regular programs of propositional dynamic logic [19]. It provides a uniform ASP
methodology for specifying domain descriptions and for verifying them, which can be used for several reasoning
tasks, including reasoning about communication protocols [2;
15], business process verification [7], planning with temporal
constraints [1], to mention a few.
Heljanko and Niemelä [18] developed a compact encoding
of bounded model checking of LTL formulas as the problem
of finding stable models of logic programs. In this paper,
to achieve completeness, we follow a different approach to
BMC which exploits the Büchi automaton construction. This
makes the proposed approach well suited both for verifying
that there is an extension of the domain description satisfying/falsifying a given property, and for verifying that all the
extensions of the domain description satisfy a given property.
[5] first proposed the use of the Büchi automaton in BMC.
In contrast, our encoding in ASP is defined without assuming that the Büchi automaton is computed in advance.
The states of the Büchi automaton are indeed computed on
the fly, when building the path of the product automaton. This
requires the equality among states to be checked during the
construction of k-loops, which makes the size of the translation quadratic in k. This quadratic blowup is the price we pay
for achieving completeness with respect to the translation to
stable models in [18].
Apart from the presence of the temporal constraints, the
action language we introduced in Section 3 has strong relations with the languages K and C. The logic programming based planning language K [8; 9] is well suited for planning under incomplete knowledge and allows concurrent actions. The temporal action language introduced in Section 3
for defining the rules in Π can be regarded as a fragment of K
in which concurrent actions are not allowed.
The planning system DLVK provides an implementation of K in the disjunctive logic programming system DLV. DLVK does not appear to support other kinds of reasoning besides planning and, in particular, does not allow temporal properties to be expressed and verified.
The languages C and C+ [17; 16] also deal with actions with indirect and non-deterministic effects and with concurrent actions, and are based on nonmonotonic causation rules syntactically similar to those of K, where the head and body of causation rules can be boolean combinations of constants.
Their semantics is based on a nonmonotonic causal logic
[16]. Due to the different semantics, a mapping between our
action language and the languages C and C + appears not to
be straightforward. If a causal theory is definite (the head of
a rule is an atom), it is possible to reason about it by turning
the theory into a set of propositional formulas by means of a
completion process, and then invoke a satisfiability solver. In
this way it is possible to perform various kinds of reasoning
such as prediction, postdiction or planning. However, the language does not exploit standard temporal logic constructs to reason about actions.
The action language defined in this paper can be regarded
as a temporal extension of the language A [12]. The extension allows general temporal constraints and infinite computations to be dealt with. However, it does not deal with concurrent actions and incomplete knowledge.
The presence of temporal constraints in our action language is related to the work on temporally extended goals
in [6; 3], which, however, is concerned with expressing preferences among goals and exceptions in goal specification.
References
[1] F. Bacchus and F. Kabanza. Planning for temporally
extended goals. Annals of Mathematics and AI, 22:5–
27, 1998.
[2] M. Baldoni, C. Baroglio, and E. Marengo. Behaviororiented Commitment-based Protocols. In Proc. 19th
ECAI, pages 137–142, 2010.
[3] C. Baral and J. Zhao. Non-monotonic temporal logics
for goal specification. In IJCAI 2007, pages 236–242,
2007.
[4] A. Biere, A. Cimatti, E. M. Clarke, O. Strichman, and
Y. Zhu. Bounded model checking. Advances in Computers, 58:118–149, 2003.
[5] E.M. Clarke, D. Kroening, J. Ouaknine, and O. Strichman. Completeness and complexity of bounded model
checking. In VMCAI, pages 85–96, 2004.
[6] U. Dal Lago, M. Pistore, and P. Traverso. Planning with
a language for extended goals. In Proc. AAAI02, 2002.
[7] D. D’Aprile, L. Giordano, V. Gliozzi, A. Martelli, G. L.
Pozzato, and D. Theseider Dupré. Verifying Business
Process Compliance by Reasoning about Actions. In
CLIMA XI, volume 6245 of LNAI, 2010.
[8] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres.
A logic programming approach to knowledge-state
planning, II: The DLVk system. Artificial Intelligence,
144(1-2):157–211, 2003.
[9] T. Eiter, W. Faber, N. Leone, G. Pfeifer, and A. Polleres.
A logic programming approach to knowledge-state
planning: Semantics and complexity. ACM Trans. Comput. Log., 5(2):206–263, 2004.
[10] M. Gelfond. Handbook of Knowledge Representation,
chapter 7, Answer Sets. Elsevier, 2007.
[11] M. Gelfond and V. Lifschitz. The stable model semantics for logic programming. In Logic Programming,
Proc. of the 5th Int. Conf. and Symposium, 1988.
[12] M. Gelfond and V. Lifschitz. Representing action and
change by logic programs. Journal of logic Programming, 17:301–322, 1993.
[13] A. Gerevini and D. Long. Plan constraints and preferences in PDDL3. Technical Report, Department of Electronics and Automation, University of Brescia, Italy,
2005.
[14] R. Gerth, D. Peled, M. Y. Vardi, and P. Wolper. Simple on-the-fly automatic verification of linear temporal
logic. In Proc. 15th Work. Protocol Specification, Testing and Verification, 1995.
[15] L. Giordano and A. Martelli. Tableau-based automata
construction for dynamic linear time temporal logic.
Annals of Mathematics and AI, 46(3):289–315, 2006.
[16] E. Giunchiglia, J. Lee, V. Lifschitz, N. McCain, and
H. Turner. Nonmonotonic causal theories. Artificial Intelligence, 153(1-2):49–104, 2004.
[17] E. Giunchiglia and V. Lifschitz. An action language
based on causal explanation: Preliminary report. In
AAAI/IAAI, pages 623–630, 1998.
[18] K. Heljanko and I. Niemelä. Bounded LTL model
checking with stable models. TPLP, 3(4-5):519–550,
2003.
[19] J.G. Henriksen and P.S. Thiagarajan. Dynamic linear
time temporal logic. Annals of Pure and Applied logic,
96(1-3):187–207, 1999.
[20] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob,
S. Perri, and F. Scarcello. The DLV system for knowledge representation and reasoning. ACM Transactions
on Computational Logic, 7(3):499–562, 2006.
Efficient Epistemic Reasoning in Partially Observable Dynamic Domains
Using Hidden Causal Dependencies
Theodore Patkos and Dimitris Plexousakis
Foundation for Research and Technology Hellas - FORTH
Heraklion, Greece
{patkos,dp}@ics.forth.gr
Abstract
Reasoning about knowledge and action in realworld domains requires establishing frameworks
that are sufficiently expressive yet amenable to efficient computation. Where partial observability is
concerned, contemporary research trends concentrate on alternative representations for knowledge,
as opposed to the possible worlds semantics. In this
paper we study hidden causal dependencies, a recently proposed approach to model an agent’s epistemic notions. We formally characterize its properties, substantiate its computational benefits and
provide a generic implementation method.
1 Introduction
Research in epistemic action theories has extended the competence of cognitive robotics to different domains. Powerful
frameworks with expressive formal accounts for knowledge
and change and elegant mathematical specifications have
been developed, motivated primarily by the adaptation of the
possible worlds semantics in action theories, e.g., [Scherl and
Levesque, 2003; Thielscher, 2000; Lob, 2001]. Nevertheless, their employment in large-scale systems raises legitimate concerns, due to their dependence on a computationally intensive structure: according to the possible worlds specifications, for a domain of n atomic formulae, determining whether a formula is known may require up to 2^n worlds to check truth in. Aiming at efficiency, contemporary research departs from the standard practice and explores alternative characterizations of knowledge focusing on classes
of restricted expressiveness or sacrificing completeness. In
many cases, a demand for context-complete domains, i.e., deterministic domains where all action preconditions are known
upon action execution, is introduced to prove logical equivalence to possible worlds. Yet, for real-world domains such
restrictions are often too strong to accept.
In a previous study we proposed a formal theory for reasoning about knowledge, action and time that can reach sound
and complete conclusions with respect to possible worldsbased theories for a wide range of commonsense phenomena [Patkos and Plexousakis, 2009]. One innovative aspect
of the approach was the introduction of a generic form of
implication rules, called hidden causal dependency (HCD),
that captures the relation among unknown preconditions and
effects of actions. In this paper we uncover the insights of
HCDs and present the complete axiomatization. HCDs can
be a valuable tool as they offer a way to represent an agent’s
best knowledge about a partially observable world without
having to maintain all alternative states. Setting off from previous studies, we adopt a more concrete representation that
does not require the introduction of auxiliary fluents, offering
a generic theory that is not attached to a particular underlying
formalism. In addition, we elaborate on complexity issues
substantiating the claims of efficiency, which stems from the
fact that HCDs are treated as ordinary epistemic concepts. As
the objective of this research is to contribute to the creation of
practical applications without sacrificing expressiveness, this
paper also discusses a way to implement the axiomatization
and introduces a new tool for epistemic reasoning.
We begin with a presentation of state-of-the-art approaches
and, after a brief familiarization with the underlying knowledge theory, we describe in detail the axiomatization concerning HCDs. Section 5 provides a property analysis, while section 6 discusses implementation issues.
2 Related Work
To alleviate the computational intractability of reasoning under the possible worlds specifications, as well as to address
other problematic issues, such as the logical omniscience
side-effect, alternative approaches for handling knowledge
change have been proposed that are disengaged from the accessibility relation. These theories are rapidly moving from
specialized frameworks to an important research field, but
adopt certain restrictions on the type of knowledge formulae
or domain classes that they can support.
Perhaps the most influential initial approach towards an alternative formal account for reasoning about knowledge and
action is due to Demolombe and Pozos-Parra [2000] who introduced two different knowledge fluents to explicitly represent the knowledge that an ordinary fluent is true or false.
Working on the Situation Calculus, they treated knowledge
change as changing each of these fluents individually, the
same way ordinary fluent change is performed in the calculus,
thus reducing reasoning complexity by linearly increasing the
number of fluents. Nevertheless, the expressive power of the
representation was limited to knowledge of literals, while it
enforced knowledge of disjunctions to be broken apart into
knowledge of the individual disjuncts. Petrick and Levesque
[2002] proved the correspondence of this approach to the possible worlds-based Situation Calculus axiomatization for successor state axioms of a restricted form. Moreover, they defined a combined action theory that extended knowledge fluents to also account for first-order formulae when disjunctive
knowledge is tautology-free, still enforcing it to be broken
apart into knowledge of the individual parts.
Regression used by standard Situation Calculus is considered impractical for large sequences of actions and introduces
restrictive assumptions, such as closed-world and domain closure, which are problematic when reasoning with incomplete
knowledge. Recent approaches deploy different forms of progression. Liu and Levesque [2005] for instance, study a class
of incomplete knowledge that can be represented in so called
proper KBs and perform progression on them. The idea is
to focus on domains where a proper KB will remain proper
after progression, so that an efficient evaluation-based reasoning procedure can be applied. Domains where the actions
have local effects (i.e., when the properties of fluents that get
altered are contained in the action) provide such a safeguard.
The approach is efficient and sound for local effect action theories and may also be complete given certain restrictions, still
proper KBs under this weak progression do not permit some
general forms of disjunctions to emerge. Recently, Vassos et
al. [Vassos et al., 2009] investigated an extension to theories
with incomplete knowledge in the Situation Calculus where
the effects are not local and progression is still appropriate for
practical purposes.
Dependencies between unknown preconditions and effects
have been incorporated in an extension of the FLUX programming language [Thielscher, 2005b] under the Fluent
Calculus semantics. The extension, presented in [Thielscher,
2005a], handles dependencies by appending implication constraints to the existing store of constraint handling rules, in a
spirit very similar to the HCDs proposed in the present study.
The emphasis is on building an efficient constraint solver,
thus limiting the expressiveness. Moreover, it is not clear how the extensive set of complicated implication rules defined there is related to possible worlds.
Apart from Thielscher’s work, a recent study by Forth
and Shanahan [2004] is highly related to ours, as they attempt to capture knowledge change as ordinary fluent change.
The authors utilized knowledge fluents in the Event Calculus to specify when an agent possesses enough knowledge
to execute an action in a partially observable environment.
Still, their intention was to handle ramifications, focusing
on closed, controlled environments, rather than building a
generic epistemic theory for the Event Calculus. An agent
is only assumed to perform "safe" actions, i.e., actions for which enough knowledge about their preconditions is available. In an open environment the occurrence of exogenous actions might also come to the agent's attention, and their effects may depend on preconditions that are unknown to it. It is not clear how knowledge evolves in terms of such uncertain effects, nor how knowledge about disjunctions of fluents can be modeled. Within DECKT we attempt a broader treatment of
knowledge evolution within open environments, unifying a
wide range of complex commonsense phenomena.
3 Background
This study uses the DECKT knowledge theory [Patkos and
Plexousakis, 2009] as an underlying formalism. DECKT
extends the Event Calculus [Kowalski and Sergot, 1986]
with epistemic features enabling reasoning about a wide
range of commonsense phenomena, such as temporal and delayed knowledge effects, knowledge ramifications, concurrency, non-determinism and others. The Event Calculus is a
narrative-based many-sorted first-order language for reasoning about action and change, where events indicate changes in
the environment, fluents denote time-varying properties and a
timepoint sort implements a linear time structure. The calculus applies the principle of inertia, which captures the property that things tend to persist over time unless affected by
some event; when released from inertia, a fluent may have a
fluctuating truth value at each time instant. It also uses circumscription to solve the frame problem and support default
reasoning. A set of predicates is defined to express which fluents hold when (HoldsAt), what events happen (Happens),
what their effects are (Initiates, Terminates, Releases)
and whether a fluent is subject to the law of inertia or released
from it (ReleasedAt).
DECKT employs the discrete time Event Calculus axiomatization described in [Mueller, 2006]. It assumes agents acting in dynamic environments having accurate but potentially
incomplete knowledge and able to perform sensing and actions with context-dependent effects. It uses four new epistemic fluents, namely Knows, Kw (for "knows whether"), KP (for "knows persistently") and KPw. The Knows fluent expresses knowledge about domain fluents and formulae.
Whenever knowledge is subject to inertia the KP fluent is used, which is related to the Knows fluent by the axiom¹:
(KT2) HoldsAt(KP(φ), t) ⇒ HoldsAt(Knows(φ), t).
Moreover, knowledge can also be inferred indirectly by
means of appropriate ramifications, usually modeled as state
constraints. In brief, direct action effects that are subject to inertia affect the KP fluent, while indirect effects
and ramifications may interact with the Knows fluent explicitly. Finally, we have that HoldsAt(Kw(f), t) ≡ HoldsAt(Knows(f), t) ∨ HoldsAt(Knows(¬f), t) (the abbreviation for HoldsAt(KPw(f), t) is analogous)².
The objective is to extend a given domain axiomatization with a set of meta-axioms that enable an agent to perform epistemic derivations under incomplete information. For instance, for positive effect axioms that specify under what condition action e causes fluent f to become true, i.e., ∧_i HoldsAt(fi, t) ⇒ Initiates(e, f, t), DECKT introduces a statement expressing that if the conjunction of preconditions C = {fi} is known then after e the effect will be known:
(KT3.1) ∧_{fi∈C} [HoldsAt(Knows(fi), t)] ∧ Happens(e, t) ⇒ Initiates(e, KP(f), t)
¹ Free variables are implicitly universally quantified. Fluent formulae inside any epistemic fluent are reified, i.e., Knows(f1 ∧ f2) is a term of first-order logic, not an atom.
² To clarify matters, the abbreviation only refers to the Kw and KPw fluents inside the distinguished predicate HoldsAt; these epistemic fluents can still be used as ordinary fluents inside any other predicate of the calculus, e.g., Terminates(e, KPw(f), t).
However, if some precondition is unknown while none is known false, then after e knowledge about the effect is lost:
(KT5.1) ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ∧ ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ¬HoldsAt(Knows(f), t) ∧ Happens(e, t) ⇒ Terminates(e, KPw(f), t)
The approach is analogous for negative effect axioms ∧_i HoldsAt(fi, t) ⇒ Terminates(e, f, t) and release axioms ∧_i HoldsAt(fi, t) ⇒ Releases(e, f, t). The latter model non-deterministic effects, therefore they result in loss of knowledge about the effect. Finally, knowledge-producing (sense) actions provide information about the truth value of fluents and, by definition, only affect the agent's mental state:
(KT4) Initiates(sense(f), KPw(f), t)
4 Hidden Causal Dependencies
HCDs emerge when an agent performs actions with unknown preconditions. Consider the positive effect axiom
HoldsAt(f ′ , t) ⇒ Initiates(e, f, t), with f ′ unknown and
f known to be false at t (f may denote that a door is open,
f ′ that a robot stands in front of that door and e the action
of pushing forward gently). If e happens at t, f becomes unknown at t + 1, as dictated by (KT5.1); still, a dependency between f′ and f must be created to denote that if we later sense
any of them we can infer information about the value of the
other, assuming no event interacted with them in the meantime (either the robot was standing in front of the door and
opened it or the door remained closed). We propose here a
representation of HCDs as disjunctive formulae and describe
when they are created and destroyed and what knowledge is
preserved when a HCD is destroyed.
First, a word about notation. Let C denote the context of
an effect axiom (the set of precondition fluents), i.e. C =
{f0, ..., fn}, n ≥ 0 (we omit specifying the axiom it refers to
as it will be clear from the context). Let C(t)+ be the subset of known fluents from C at a given time instant t, i.e.,
C(t)+ = {f ∈ C|HoldsAt(Knows(f ), t)}. Finally, let
C(t)− = C \ C(t)+ be the set of fluents that the agent either does not know or knows that they do not hold at t.
4.1 Creation of HCDs
Each time an action with unknown effect preconditions occurs, a HCD is created. We assume in this study that no action affects the preconditions at the same time (except, of course, if the effect's precondition is the effect fluent itself).
Pos. effect axioms ∧_i HoldsAt(fi, t) ⇒ Initiates(e, f, t):
If an action with a positive effect occurs with none of its preconditions known to be false, but some unknown to the agent, a HCD is created between the latter and the effect (by sensing that the robot is standing in front of the door after the push gently action, it can infer that the door must be open):
(KT6.1.1) ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ⇒ Initiates(e, KP(f ∨ ∨_{fj∈C(t)−} ¬fj), t)
In other words, and considering (KT2), we augment the theory with a disjunctive knowledge formula that is equivalent to HoldsAt(Knows(∧_{fj∈C(t)−} fj ⇒ f), t + 1).
In addition, a HCD is also created between the effect fluent and its unknown preconditions, given that the agent knew that the effect did not hold before the action:
(KT6.1.2) ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ∧ HoldsAt(Knows(¬f), t) ⇒ ∧_{fj∈C(t)−} Initiates(e, KP(¬f ∨ fj), t)
Axiom (KT6.1.2) is triggered together with (KT6.1.1), resulting in the creation of an epistemic bi-implication relation among the preconditions and the effect fluent.
Example 1. Consider the positive effect axiom HoldsAt(f1, t) ∧ HoldsAt(f2, t) ⇒ Initiates(e, f, t), denoting that when the robot stands in front of a door (f1) and the door is not locked (f2), a gentle push (e) will cause the door to open (f). Let the robot initially know that f does not hold and that f1 holds, but not know whether f2 holds, i.e., HoldsAt(Knows(¬f), 0) ∧ HoldsAt(Knows(f1), 0) ∧ ¬HoldsAt(Kw(f2), 0). In this case, C = {f1, f2}, while C(0)− = {f2}. After Happens(e, 0) both (KT6.1.1) and (KT6.1.2) are triggered, resulting in HoldsAt(KP(¬f2 ∨ f), 1) ∧ HoldsAt(KP(¬f ∨ f2), 1). This is equivalent to HoldsAt(Knows(f2 ⇔ f), 1).
Neg. effect axioms ∧_i HoldsAt(fi, t) ⇒ Terminates(e, f, t):
The situation is similar. Now, the HCDs are created for ¬f:
(KT6.1.3) ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ⇒ Initiates(e, KP(¬f ∨ ∨_{fj∈C(t)−} ¬fj), t)
(KT6.1.4) ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ∧ HoldsAt(Knows(f), t) ⇒ ∧_{fj∈C(t)−} Initiates(e, KP(f ∨ fj), t)
Release axioms ∧_i HoldsAt(fi, t) ⇒ Releases(e, f, t):
It is trivial to see that in the case of non-deterministic effects a HCD is only created if the agent has prior knowledge about the effects. Specifically, only if it senses that a previously known effect fluent has changed its truth value will the agent be certain that the preconditions must have been true:
(KT6.1.5) ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t) ∧ ∨_{fi∈C} [¬HoldsAt(Kw(fi), t)] ∧ HoldsAt(Knows((¬)f), t) ⇒ Initiates(e, KP((¬)f ∨ ∨_{fj∈C(t)−} fj), t)
4.2 Expiration of HCDs
In contrast to state constraints that express implication relations that must be satisfied at all times, HCDs are valid only
for limited time periods, as they are created due to the agent’s
epistemic state. Specifically, the dependency is valid for as
long as the involved fluents remain unaffected by occurring
events; if an event modifies them the HCD expires.
First, let us define an abbreviation stating that an event e affects or may affect a fluent f if there is some effect axiom none of whose preconditions fi is known false:
(KmA) KmAffect(e, f, t) ≡ (KmInitiate(e, f, t) ∨ KmTerminate(e, f, t) ∨ KmRelease(e, f, t))
where KmInitiate(e, f, t) ≡ Initiates(e, f, t) ∧ ¬HoldsAt(Knows(∨_{fi∈C} ¬fi), t), and similarly for KmTerminate(e, f, t) and KmRelease(e, f, t). These epistemic predicates do not have any actual effect on f.
HCD Termination
If an event occurs that affects or may affect any fluent of a
HCD then this HCD is no longer valid:
(KT6.2.1) HoldsAt(KP(∨_d fd), t) ∧ Happens(e, t) ∧ ∨_d [KmAffect(e, fd, t)] ⇒ Terminates(e, KP(∨_d fd), t)
Example 2. Imagine a robot that speaks the correct passcode into a microphone (action e) with the intention of opening a safe (fluent f ), without knowing whether the microphone is recording (fluent f1 ); the following HCD is created
from (KT6.1.1): HoldsAt(Knows(¬f1 ∨ f ), t). Under this
simplistic setting, if later on the robot obtains information
through sensing (axiom (KT4)) that the safe is still locked, it
will also infer that the device is not recording. Now, at some
timepoint t1 > t the robot becomes aware of an action by
a third party that switches the microphone on. At this point
the HCD needs to expire, as the epistemic relation among the
fluents is no longer valid and no sense action about any of the
two fluents can provide information about the other. Axiom
(KT6.2.1) accomplishes this effect.
HCD Reduction
Depending on the type of action and the related context there
are situations where although a HCD becomes invalidated
due to (KT6.2.1) there may still be knowledge that should be
preserved. Specifically, if before the action the agent has inferred indirectly that the fluent that may be affected does not
hold, then this fluent did not contribute to the HCD in the first
place; the remaining components should create a new HCD:
(KT6.2.2) HoldsAt(KP(∨_{f∈D} f), t) ∧ Happens(e, t) ∧ ∨_{f∈D} [KmAffect(e, f, t) ∧ HoldsAt(Knows(¬f), t)] ⇒ Initiates(e, KP(∨_{f′∈D′(t)} f′), t)
where if f ∈ D are the fluents of the initial HCD, then D′ (t)
denotes those fluents of D that are not known at t.
HCD Expansion
Consider now the particular situation where a context-dependent event occurs, the preconditions of which are unknown to the agent and whose effect is part of some HCD. In
this case, the agent cannot be certain whether the HCD will
be affected by the event, as this depends on the truth value
of the preconditions. In fact, the HCD itself becomes contingent on this set; if the preconditions prove to be false, the
original HCD should still be applicable, otherwise it must be
invalidated, according to the previous analysis. The way to
handle this situation is to expand the original HCD with the
negation of the action’s unknown preconditions. As a result,
by obtaining knowledge about them the agent can distinguish
whether the original dependency should persist or not.
Example 2. (cont’d) If at timepoint t2 (t1 > t2 > t)
the robot itself attempts to switch the microphone on under the (unknown to it) precondition of having pressed the
proper button (fluent f2 ) then, apart from the new HCD
HoldsAt(Knows(¬f2 ∨ f1 ), t2 + 1) according to (KT6.1.1),
the initial HCD needs to be modified. In particular, it
should capture the fact that only if the microphone has
not been switched on, should the HCD remain valid, i.e.,
HoldsAt(Knows(f2 ∨ ¬f1 ∨ f ), t2 + 1).
It becomes clear that the unknown preconditions of a
context-dependent effect should result in the expansion of any
HCD that includes the effect. Before modeling this situation
though, one must notice a crucial contingency: the agent uses
these preconditions to determine whether the original HCD
is applicable or not; what if this distinction cannot be made?
Such a situation may be, for instance, the result of an action
leading to a world state where the precondition fluents have
the same truth value regardless of the state before the action
(e.g., the action of closing a door if it is open). To capture
such a situation we introduce the following abbreviation stating that a fluent may be inverted by an occurring event:
(INV) KmInverted(f, t) ≡ ∃e (Happens(e, t) ∧ (EffectPredicate(e, f, t) ∨ KmRelease(e, f, t)))
where, for a fluent literal f and its corresponding atom F, EffectPredicate(e, f, t) denotes KmTerminate(e, F, t) when f = F, and KmInitiate(e, F, t) when f = ¬F.
Notice that the KmInverted predicate is completely independent of the truth value a fluent might have at any
time instant. For example, for an effect axiom of the
form HoldsAt(f1, t) ⇒ Initiates(e, f, t) we are interested in whether KmInverted(f1, t) is true, while for the axiom ¬HoldsAt(f1, t) ⇒ Initiates(e, f′, t) we should check
whether KmInverted(¬f1 , t) holds.
We can now formalize the axiomatization for HCD expansion: for any action e that may initiate, terminate or release a fluent of a HCD, if its unknown preconditions fi are not or may not be inverted, then a new HCD is created that involves all the components of the original HCD along with the unknown preconditions of e's effect axiom:
(KT6.2.3) HoldsAt(KP(∨_{f∈D} f), t) ∧ Happens(e, t) ∧ ∨_{f∈D} [KmAffect(e, f, t) ∧ ¬HoldsAt(Kw(f), t)] ∧ ¬(∧_{fi∈C(t)−} [KmInverted(fi, t)]) ⇒ ∧_{fi∈C(t)−} [Initiates(e, KP(fi ∨ ∨_{f′∈D′(t)} f′), t)]
Intuitively, since any HCD represents an epistemic implication relation, axiom (KT6.2.3) creates a nested implication
relation with head the HCD and body the negated unknown
preconditions of the effect axiom that may affect it.
Transitivity
Finally, we also need to consider the transitivity property of implication relations. Whenever an agent knows that f1 implies f2 and f2 implies f3, there is an implicit relation stating that f1 also implies f3. If an action affects f2, the two original HCDs will expire due to (KT6.2.1); still, the relation between f1 and f3 that has been established should persist:
(KT6.2.4) HoldsAt(Knows(f ∨ (∨_{fi∈Di} fi)), t) ∧ HoldsAt(Knows(¬f ∨ (∨_{fj∈Dj} fj)), t) ∧ Happens(e, t) ∧ KmAffect(e, f, t) ⇒ Initiates(e, KP(∨_{fi′∈D′i(t)} fi′ ∨ ∨_{fj′∈D′j(t)} fj′), t)
Figure 1: Relation among axiomatic sets.
5 Correctness and Complexity
DECKT has been shown to derive sound and complete inferences with respect to possible worlds-based theories [Patkos
and Plexousakis, 2009], based on a correspondence established to an epistemic extension of Mueller’s branching Event
Calculus (BDEC) [Mueller, 2007]. Exploiting the existing
machinery we arrived at the same result for our extension
with HCDs, proving that the (KT6) axioms constitute a complete and sufficient set:
Corollary 1 After any ground sequence of actions with deterministic effects but with potentially unknown preconditions, it is known in DECKT whether a fluent formula φ holds if and only if it is known in BDECKT, under the bridging set of axioms L and M.
Proof sketch (the full proof is available at http://www.csd.uoc.gr/∼patkos/Proof.pdf): The branching discrete Event Calculus
(BDEC) devised by Mueller is a modified version of the linear discrete Event Calculus (LDEC) (see Figure 1). It replaces the timepoint sort with the sort of situations, lifting the
requirement that every situation must have a unique successor state. The Branching Discrete Event Calculus Knowledge
Theory (BDECKT) that we have developed follows on from
Moore’s [1985] formalization of possible world semantics in
action theories, where the number of K-accessible worlds remains unchanged upon ordinary event occurrences and reduces as appropriate when sense actions occur. Similar to
Scherl and Levesque’s [2003] approach for the Situation Calculus , BDECKT generalizes BDEC in that there is no single
initial situation in the tree of alternative situations, rather a
forest of trees each with its own initial situation.
The DECKT axiomatization is based on the linear Event
Calculus that treats knowledge as a fluent and uses a set of
axioms to determine the way this epistemic fluent changes its
truth value as a result of event occurrences and the knowledge
already obtained about relevant context. BDECKT on the
other hand is based on a branching time version of the Event
Calculus where knowledge is understood as reasoning about
the accessibility relation over possible situations (Figure 1).
Mueller has established a set L of mapping rules between the
underlying linear and branching versions of the Event Calculus and proved that these two versions can be logically equivalent [Mueller, 2007]. The L axioms restrict, among others, BDEC to a linear past. Based on this corollary established
by Mueller, we have shown that the two knowledge theories
manipulate knowledge change the same way, i.e., the set of
known formulae is the same after a sequence of actions (in
contrast to the underlying theories, our equivalence result is
not a one-to-one mapping of all the axioms). We define a set
M that serves as a bridge between DECKT and BDECKT and
construct each individual effect axiom of one theory from the
axioms of the other and the bridging rules (and vice versa).
This way, the conjunction of DECKT, BDEC, LDEC, L and
M can provide all BDECKT epistemic derivations leading
to completeness with respect to the possible worlds semantics and respectively, the conjunction of BDECKT, BDEC,
LDEC, L and M can provide all DECKT epistemic derivations resulting in soundness of DECKT inferences.
In what follows, we additionally show that reasoning with
HCDs is computationally more efficient. It is an important
conclusion as it paves the way for practical applications of
knowledge representation without substantial sacrifice in expressiveness. The objective is to study the complexity of
checking whether a fluent formula holds after the occurrence
of an event sequence in total chronological order, given a domain theory comprising a fixed set of context-dependent effect axioms and a set of implication rules (either in the form
of state constraints or in the form of HCDs).
5.1 Classic Event Calculus Without Knowledge
For n domain fluents there are potentially 2^n distinct knowledge bases (KBs) (when all n fluents are released) that need
to be progressed according to occurring events and at most n
HoldsAt() and n ReleasedAt() predicates to search through
for each KB. All predicates are stored as facts:
Algorithm: For each event ei occurring at ti and each KB
1. Retrieve all effect axioms of ei: ∧_j [HoldsAt(fj, t)] ⇒ θ(ei, f′, t)?, for θ = Initiates, Terminates, Releases.
This information is already known at design time.
Therefore, it requires constant time, regardless of the
type of action, the number of effect axioms or the size
of the domain (number of fluents).
2. Query the KB for the truth value of the precondition fluents of ei: ∧_j [HoldsAt(fj, ti)]?
The intention is to determine which of the previously
retrieved axioms will be triggered, i.e., which effect fluents will change their truth value. The problem of query
answering on (untyped) ground facts (without rules) reduces to the problem of unifying the query with the facts,
which is O(n), where n is the size of the KB.
3. Determine which fluents are inertial: ¬Released(f, t)?
Inertial fluents that have not been affected by ei in step
2, i.e., are neither released nor the event releases them,
need to maintain their truth value in the successor timepoint. As before, the cost of the query is O(n).
4. Assert in the KB the new truth values of fluents.
As the new truth values refer to the successor timepoint,
this step does not involve any update of existing facts,
rather an assertion of facts to an empty KB. We assume
constant time, regardless of the number of assertions.
5. Use state constraints to infer all indirect effects.
The truth value of those fluents that are released from
inertia, yet ruled by state constraints, is determined. In
order to perform all derivations from the set of rules,
one may apply standard techniques from classical logic
inference, such as resolution. To preserve generality of
results, by a minor abuse of notation we denote this complexity as O(INF_SC) or O(INF_HCD) in the sequel, based on whether the rules involve only the state constraints or both state constraints and HCDs. We return to
this complexity at the end of this section. Also, in this
step multiple models may be produced and added to the
set of KBs, owing to the unconstrained released fluents,
i.e., non-inertial fluents subject to no state constraint.
Summarizing, the complexity of reasoning with the Event
Calculus given a sequence of e actions is characterized by
O(e ∗ 2^n ∗ (2 ∗ n + INF_SC)) (although constants could be eliminated, we include them for emphasis, so that the reader can follow each step of the algorithm). The steps of the algorithm follow on from previous complexity analyses of simpler formulations of the classic Event Calculus, as in [Paschke, 2006].
5.2 Possible Worlds Approach
The number of possible worlds depends on the number of unknown fluents, i.e., in a domain of n fluents, u of which are unknown, we need to store at most 2^u possible worlds, where u ≤ n. One reasoning task needs to be performed for each of these worlds, since the same effect axioms of a given domain theory may give rise to diverse conclusions in each world. As such, the size of the KB of fluents that is maintained at each timepoint is O(2^(u−1)) (a fluent may hold only in half of the total 2^u worlds). Moreover, it is easy to verify that, according to the definition of knowledge, answering whether a conjunctive (resp. disjunctive) query of m fluents is known requires at most 2^(u−m) (resp. 2^u − 2^(u−m)) worlds to check truth in (plus one, if the formula turns out to be unknown).
The algorithm and its logical inferences need to be performed for each possible world, given as input the domain fluents and the fixed set of effect axioms and state constraints. Given a sequence of e actions, the complexity for conjunctive queries is O(e ∗ 2^u ∗ (2 ∗ n + INF_SC) + 2^(u−m) ∗ n) (resp. O(e ∗ 2^u ∗ (2 ∗ n + INF_SC) + (2^u − 2^(u−m)) ∗ n) for disjunctive queries), as we first need to progress all possible worlds and then issue a query concerning the desired fluent formula to a subset of them, with cost O(n).
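As a rough numerical illustration (with hypothetical figures, not taken from the paper): for u = 15 unknown fluents, progressing a single action already involves 2^15 = 32768 possible worlds, and answering a conjunctive query over m = 3 fluents requires checking up to 2^(15−3) = 4096 of them, while a disjunctive query over the same fluents may require up to 2^15 − 2^12 = 28672 worlds.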
It should be noted that each fluent that becomes released
from the law of inertia causes the number of possible worlds
to double, i.e., u increases by one. As a result, both the size
of the KB and the reasoning effort increase significantly. Unsettlingly, one should also expect that u ≃ n even for the
real-world case, as we argue below.
5.3 DECKT Approach
DECKT performs a single reasoning task with each action
using the new axiomatization’s meta axioms and substitutes
each atomic domain fluent with the corresponding KP and
Knows epistemic fluents. KP is always subject to inertia, whereas Knows is always released, therefore step 3
can be disregarded altogether, along with the need to preserve multiple versions of KBs for unconstrained fluents (the
Knows fluents never fluctuate). Furthermore, disjunctive
epistemic expressions are preserved in this single KB without
the need to be broken apart, since all appropriate derivations
are reached by means of HCDs. The size of the input for
step 2 is equal to that of reasoning without knowledge, as we
only search through those Knows fluents that refer to atomic
domain ones. The difference now is that each domain effect
axiom is replaced by 5 new ones: 2 due to (KT3), 1 due to (KT5)
and 2 due to (KT6.1). Nevertheless, as with the non-epistemic
theory, all we need to query in order to progress the KB after
any action are the precondition fluents (plus the effect fluent
for some of the axioms). Therefore, as before, the complexity
of this step is O(n), since one predefined query to a domain of
n fluents suffices to provide the necessary knowledge about
all DECKT effect axioms mentioned above.
Apart from the epistemic effect axioms, we also introduced axioms for handling HCDs (KT6.2-4). Since HCDs
are treated as ordinary inertial fluents (they are modeled in
terms of the KP fluent), they fall under the influence of traditional Event Calculus inference (steps 1,2). For these axioms
the necessary knowledge that needs to be obtained is whether
some HCD that incorporates the effect fluent is among the
HCDs stored in the KB. Their number increases as the agent
performs actions with unknown preconditions. Let d denote
the number of KP fluents that represent HCDs; then the complexity of querying the KB is O(d), where d ≤ 2^n.
Following the algorithm, we can see that the complexity of
reasoning with DECKT is O(e ∗ (n + d + INF_HCD) + n), where O(INF_HCD) is the complexity of logical inference
with state constraints and HCDs. The input is the atomic inertial fluents, as usual, reified in the Knows fluent. The last
addition refers to querying n atomic epistemic fluents, i.e.,
the query formula, after the narrative of actions.
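Under the same hypothetical figures as above (u = 15 unknown fluents out of, say, n = 20, and assuming d = 50 HCD fluents have accumulated), each action requires a single pass over roughly n + d = 70 facts plus the inference step, rather than repeating the progression in each of the 2^15 = 32768 possible worlds.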
In fact, even when the underlying domain axiomatization
is non-deterministic, its epistemic meta-program introduces
no additional complexity: the KP fluent is always subject to
inertia and whenever a domain fluent is released due to uncertainty, its corresponding KP fluents become false according
to (KT5). As such, reasoning with DECKT requires only a
reduced version of the Event Calculus, where the Releases
predicate is removed from the foundational axioms.
5.4 Discussion on Complexity Results
We see that the dominant factor in the complexity of reasoning with possible worlds is the number u of unknown world
aspects. In the worst case, u = n, resulting in exponential complexity in the size of the domain; yet, even in real-world
problems u ≃ n, as we expect that in large-scale dynamic
domains many more world aspects would be unknown to the
agent than known ones at any given time instant. Furthermore, since in practice the query formula that needs to be
evaluated is often orders of magnitude smaller in size than the
domain itself, i.e., (n ≫ m), query answering of either conjunctive or disjunctive formulae spirals quickly out of control.
With DECKT, on the other hand, it is the number of extra fluents capturing HCDs that dominates the complexity. In fact, although in the worst case it can be that d = 2^n, this is a less likely contingency to meet in practice: it would mean that the agent has interacted with all world aspects having no knowledge about any precondition, or that state constraints that capture interrelated fluents embody the entire domain (so-called domino domains, which lead to chaotic environments, are not commonly met in commonsense reasoning).
Moreover, HCDs fall under the agent's control; even for long-lived agents that execute hundreds of actions, HCDs provide
a guide as to which aspects to sense in order to obtain knowledge about the largest set of interrelated fluents, thus enabling
the agent to manage their number according to resources.
Apparently, the number and length of HCDs also affect
the inference task. Still, the transition from O(INF_SC) to O(INF_HCD) has polynomial cost; the complexity of most
inference procedures, such as resolution, is linearly affected
when increasing the number of implication rules, given that
the size of the domain is constant. Finally, one should notice that even in the worst case one reasoning task needs to be
performed for each action. Specifically, the factor d does not
influence the entire process, as is the case with 2^u for possible
worlds, significantly reducing the overall complexity.
6 Implementation of HCD-enabled Reasoner
The formal treatment of knowledge and change we develop
aims at programming rational agents for practical implementations. Faced with the task of implementing an agent's mental state, two features of a reasoner are most desirable in order
to exploit DECKT’s full potential:
• It should enable reasoning to progress incrementally to
allow for run-time execution of knowledge-based programs, where an agent can benefit from planning with
the knowledge at hand (online reasoning). Each time a
program interpreter adds a new action to its agenda, the
reasoner should update its current KB appropriately.
• It should permit reification of the epistemic fluents in
Event Calculus predicates, to allow for instance the
epistemic proposition Knows(Open(S1)) to be handled as a term of first-order logic rather than an
atom. Based on this syntactical treatment proposition
HoldsAt(Knows(Open(S1)), 0) can be regarded as a
well-formed formula.
Most existing Event Calculus reasoners do not satisfy the latter requirement, while only recently an online reasoner was
released based on the Cached Event Calculus [Chesani et al.,
2009]. Consequently, in order to implement and evaluate different use cases we have constructed an Event Calculus reasoner on top of Jess (http://www.jessrules.com/, last accessed May 2011), a rule-based engine that deploys the
efficient Rete algorithm for rule matching. Predicates are asserted as facts in the reasoner’s agenda, specified by the following template definition:
(deftemplate EC (slot predicate)
  (slot event (default nil))
  (slot epistemic (default no))
  (multislot posLtrs)
  (multislot negLtrs)
  (slot time (default 0)))
Multislots create lists denoting fluent disjunctions (conjunctions are decomposable into their components according to
the definition for knowledge). For instance, knowledge about
formula (f1 ∨ f2 ∨ ¬f3 ) at time 1 is captured by the fact:
(EC (predicate HoldsAt)
(epistemic Knows)
(posLtrs f_1 f_2)
(negLtrs f_3)
(time 1))
The exploitation of lists for maintaining positive and negative literals of formulae enables the representation of HCDs
in a syntax-independent manner, so that all meta-axioms of DECKT can be translated into appropriately defined rules. This
way, the reasoning process can be fully automated, despite the
fact that the (KT6) set is time-dependent: the meta-axioms
adapt to the facts that exist in the reasoner’s agenda at each
timepoint. Among the basic features of the new reasoner (the Jess-EC Reasoner, available at http://www.csd.uoc.gr/∼patkos/deckt.htm) are:
• given a domain axiomatization, the user can select between the execution of classical Event Calculus reasoning or epistemic reasoning using DECKT.
• the domain axiomatization is written based on a simple, intuitive Event Calculus-like syntax, which is then
parsed into appropriate Jess rules (Figure 2). The user
may modify the Jess program as well, thus augmenting
the axiomatization with advanced and more expressive
components, such as rules and constraints not yet supported by the Event Calculus parser.
• new events and observations can be asserted on-the-fly,
based on information acquired at execution time, e.g.,
from the user or the agent’s sensors.
• reasoning can progress incrementally, while the user can
decide the time span of the execution step.
• a GUI is provided for modifying and storing Event Calculus or Jess programs, for visualizing the output and for
providing input to the reasoner at execution time.
We should note, though, that the implementation of
DECKT described here is general enough to be written in any
Prolog-like syntax and is not restricted to the Jess tool.
7 Conclusions
The DECKT framework has been used to extend benchmark commonsense problems with incomplete knowledge,
e.g., those included in [Mueller, 2006]. It is also integrated in
an Ambient Intelligence project that is currently in progress in
our institute, which introduces highly demanding challenges
within dynamic environments. The benefits of HCDs are investigated in a number of other interesting aspects of cognitive
effects of physical actions in unknown worlds, on whose occurrences the agent can only speculate, as well as for temporal indeterminacy of events. Among our future goals is also
to extend the applicability of the new reasoner, making it a usable educational tool for epistemic action theories.
Figure 2: The Jess-EC Reasoner with epistemic capabilities: the Event Calculus domain is translated into Jess rules, whose
input and execution the user can modify at execution time.
References
[Chesani et al., 2009] Federico Chesani, Paola Mello, Marco Montali, and Paolo Torroni. Commitment tracking via the reactive
event calculus. In Proceedings of the 21st international joint conference on Artificial intelligence, pages 91–96, San Francisco,
CA, USA, 2009. Morgan Kaufmann Publishers Inc.
[Demolombe and Parra, 2000] Robert Demolombe and Maria del
Pilar Pozos Parra. A simple and tractable extension of situation
calculus to epistemic logic. pages 515–524, 2000.
[Forth and Shanahan, 2004] Jeremy Forth and Murray Shanahan.
Indirect and Conditional Sensing in the Event Calculus. In ECAI,
pages 900–904, 2004.
[Kowalski and Sergot, 1986] R. Kowalski and M. Sergot. A logic-based calculus of events. New Generation Computing, 4:67–95,
January 1986.
[Liu and Levesque, 2005] Yongmei Liu and Hector J. Levesque.
Tractable reasoning with incomplete first-order knowledge in dynamic systems with context-dependent actions. In Proceedings of
the 19th international joint conference on Artificial intelligence,
pages 522–527, San Francisco, CA, USA, 2005.
[Lob, 2001] J. Lobo, G. Mendez, and S. R. Taylor. Knowledge and the Action Description Language A. Theory and Practice of Logic Programming, 1:129–184, 2001.
[Moore, 1985] R. C. Moore. A Formal Theory of Knowledge and
Action. In Formal Theories of the Commonsense World, pages
319–358. J. Hobbs, R. Moore (Eds.), 1985.
[Mueller, 2006] Erik Mueller. Commonsense Reasoning. Morgan
Kaufmann, 1st edition, 2006.
[Mueller, 2007] Erik Mueller. Discrete Event Calculus with Branching Time. In Eighth International Symposium on Logical
Formalizations of Commonsense Reasoning (Commonsense’07),
pages 126–131, 2007.
[Paschke, 2006] Adrian Paschke. ECA-RuleML: An Approach
Combining ECA Rules with Temporal Interval-based KR
Event/Action Logics and Transactional Update Logics. Computer Research Repository, abs/cs/0610167, 2006.
[Patkos and Plexousakis, 2009] Theodore Patkos and Dimitris
Plexousakis. Reasoning with knowledge, action and time in
dynamic and uncertain domains. In Proceedings of the 21st
international joint conference on Artificial intelligence, pages
885–890, USA, 2009. Morgan Kaufmann Publishers Inc.
[Petrick and Levesque, 2002] R. Petrick and H. Levesque. Knowledge Equivalence in Combined Action Theories. In Proceedings
of the 8th International Conference on Principles of Knowledge
Representation and Reasoning (KR-02), pages 303–314, 2002.
[Scherl and Levesque, 2003] Richard B. Scherl and Hector J.
Levesque. Knowledge, Action, and the Frame Problem. Artificial Intelligence, 144(1-2):1–39, 2003.
[Thielscher, 2000] Michael Thielscher. Representing the knowledge of a robot. In A. Cohn, F. Giunchiglia, and B. Selman,
editors, Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR), pages
109–120. Morgan Kaufmann, 2000.
[Thielscher, 2005a] M. Thielscher. Handling Implication and Universal Quantification Constraints in FLUX. In Proceedings of
the 11th International Conference on Principles and Practice of
Constraint Programming (CP 2005), pages 667–681, 2005.
[Thielscher, 2005b] Michael Thielscher. FLUX: A Logic Programming Method for Reasoning Agents. Theory and Practice of
Logic Programming, 5(4–5):533–565, 2005.
[Vassos et al., 2009] Stavros Vassos, Stavros Sardina, and Hector
Levesque. Progressing Basic Action Theories with non-Local
Effect Actions. In Proceedings of the Ninth International Symposium on Logical Formalizations of Commonsense Reasoning
(CS’09), pages 135–140, 2009.
Preferred Explanations: Theory and Generation via Planning∗
Shirin Sohrabi
Jorge A. Baier
Sheila A. McIlraith
Department of Computer Science
University of Toronto
Toronto, Canada
[email protected]
Depto. de Ciencia de la Computación
Pontificia Universidad Católica de Chile
Santiago, Chile
[email protected]
Department of Computer Science
University of Toronto
Toronto, Canada
[email protected]
Abstract
In this paper we examine the general problem of generating preferred explanations for observed behavior with respect to a model of the behavior of a dynamical system. This problem arises in a diversity of applications including diagnosis of dynamical systems and activity recognition. We provide a logical characterization of the notion of an explanation. To generate explanations we identify and exploit a correspondence between explanation generation and planning. The determination of good explanations requires additional domain-specific knowledge which we represent as preferences over explanations. The nature of explanations requires us to formulate preferences in a somewhat retrodictive fashion by utilizing Past Linear Temporal Logic. We propose methods for exploiting these somewhat unique preferences effectively within state-of-the-art planners and illustrate the feasibility of generating (preferred) explanations via planning.
1. Introduction
In recent years, planning technology has been explored as a computational framework for a diversity of applications. One such class of applications is the class that corresponds to explanation generation tasks. These include narrative understanding, plan recognition (Ramírez and Geffner 2009), finding excuses (Göbelbecker et al. 2010), and diagnosis (e.g., Sohrabi, Baier, and McIlraith 2010; Grastien et al. 2007).¹ While these tasks differ, they share a common computational core, calling upon a dynamical system model to account for system behavior, observed over a period of time. The observations may be over aspects of the state of the world, or over the occurrence of events; the account typically takes the form of a set or sequence of actions and/or state that is extracted from the construction of a plan that embodies the observations. For example, in the case of diagnosis, the observations might be of the, possibly aberrant, behavior of an electromechanical device over a period of time, and the explanation a sequence of actions that conjecture faulty events. In the case of plan recognition, the observations might be of the actions of an agent, and the explanation a plan that captures what the agent is doing and/or the final goal of that plan.
Here we conceive the computational core underlying explanation generation of dynamical systems as a nonclassical planning task. Our focus in this paper is on the generation of preferred explanations – how to specify preference criteria, and how to compute preferred explanations using planning technology. Most explanation generation tasks that distinguish a subset of preferred explanations appeal to some form of domain-independent criteria such as minimality or simplicity. Domain-specific knowledge has been extensively studied within the static-system explanation and abduction literature as well as in the literature on specific applications such as diagnosis. Such domain-specific criteria often employ probabilistic information, or in its absence default logic or some notion of specificity (e.g., Brewka 1994).
In 2010, we examined the problem of diagnosis of discrete dynamical systems (a task within the family of explanation generation tasks), exploiting planning technology to compute diagnoses and suggesting the potential of planning preference languages as a means of specifying preferred diagnoses (Sohrabi, Baier, and McIlraith 2010). Building on our previous work, in this paper we explicitly examine the use of preference languages for the broader task of explanation generation. In doing so, we identify a number of somewhat unique representational needs. Key among these is the need to talk about the past (e.g., "If I observe that my car has a flat tire then I prefer explanations where my tire was previously punctured.") and the need to encode complex observation patterns (e.g., "My brakes have been failing intermittently.") and how these patterns relate to possible explanations. To address these requirements we specify preferences in Past Linear Temporal Logic (P LTL), a superset of Linear Temporal Logic (LTL) that is augmented with modalities that reference the past. We define a finite variant of P LTL, f-P LTL, that is augmented to include action occurrences.
Motivated by a desire to generate explanations using state-of-the-art planning technology, we propose a means of compiling our f-P LTL preferences into the syntax of PDDL3, the Planning Domain Description Language 3 that supports the representation of temporally extended preferences (Gerevini et al. 2009). Although f-P LTL is more expressive than the preference subset of PDDL3 (e.g., f-P LTL has action occurrences and arbitrary nesting of temporal modalities), our compilation preserves the f-P LTL semantics while conforming to PDDL3 syntax. This enables us to exploit PDDL3-compliant preference-based planners for the purposes of generating preferred explanations. We also propose a further compilation to remove all temporal modalities from the syntax of our preferences (while preserving their semantics), enabling the exploitation of cost-based planners for computing preferred explanations. Additionally, we exploit the fact that observations are known a priori to pre-process our suite of explanation preferences prior to explanation generation in a way that further simplifies the preferences and their exploitation. We show that this compilation significantly improves the time required to find preferred explanations, sometimes by orders of magnitude. Experiments illustrate the feasibility of generating (preferred) explanations via planning.
∗ A version of this paper appears in the Proceedings of the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11). Copyright © 2011, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
¹ (Grastien et al. 2007) characterized diagnosis in terms of SAT but employed a planning-inspired encoding.
2.2 Past LTL with Action Occurrences
Past modalities have been exploited for a variety of specialized verification tasks and it is well established that LTL
augmented with such modalities has the same expressive
power as LTL restricted to future modalities (Gabbay 1987).
Nevertheless, certain properties (including fault models and
explanation models) are more naturally specified and read
in this augmentation of LTL. For example, specifying that every alarm is due to a fault can easily be expressed by ✷(alarm → ⧫fault), where ✷ means always and ⧫ means once in the past. Note that ¬U(¬fault, (alarm ∧ ¬fault)) is an equivalent formulation that uses only future modalities but is much less intuitive. In what follows we define the syntax and semantics of f-P LTL, a variant of LTL that is
augmented with past modalities and action occurrences.
2. Explaining Observed Behavior
Syntax Given a set of fluent symbols F and a set of action
symbols A, the atomic formulae of the language are: either
a fluent symbol, or occ(a), for any a ∈ A. Non-atomic
formulae are constructed by applying negation, by applying a standard boolean connective to two formulae, or by
including the future temporal modalities "until" (U), "next" (◯), "always" (✷), and "eventually" (♦), or the past temporal modalities "since" (S), "yesterday" (⊖), "always in the past" (⊟), and "eventually in the past" (⧫). We say φ is expressed in future f-P LTL if it does not contain any past temporal modalities. Similarly, φ is expressed in past f-P LTL if it does not contain any future temporal modalities. A non-temporal formula does not contain any temporal modalities.
In this section we provide a logical characterization of a preferred explanation for observed behavior with respect to a
model of a dynamical system. In what follows we define
each of the components of this characterization, culminating in our characterization of a preferred explanation.
2.1 Dynamical Systems
Dynamical systems can be formally described in many
ways. In this paper we assume a finite domain and model
dynamical systems as transition systems. For convenience,
we define transition systems using a planning language. As such, transitions occur as the result of actions described in
terms of preconditions and effects. Formally, a dynamical
system is a tuple Σ = (F, A, I), where F is a finite set of
fluent symbols, A is a set of actions, and I is a set of clauses
over F that defines a set of possible initial states. Every action a ∈ A is defined by a precondition prec(a), which is
a conjunction of fluent literals, and a set of conditional effects of the form C → L, where C is a conjunction of fluent
literals and L is a fluent literal.
A system state s is a set of fluent symbols, which intuitively defines all that is true in a particular state of the dynamical system. For a system state s, we define Ms : F →
{true, false} as the truth assignment that assigns the truth value true to f if f ∈ s, and assigns false to f otherwise. We say a state s is consistent with a set of clauses C, if
Ms |= c, for every c ∈ C. Given a state s consistent with I,
we denote Σ/s as the dynamical system (F, A, I/s), where
I/s stands for the set of unit clauses whose only model
is Ms . We say a dynamical system Σ = (F, A, I) has a
complete initial state iff there is a unique truth assignment
M : F → {true, false} such that M |= I.
We assume that an action a is executable in a state s if
Ms |= prec(a). If a is executable in a state s, we define
its successor state as δ(a, s) = (s \ Del) ∪ Add, where
Add contains a fluent f iff C → f is an effect of a and
Ms |= C. On the other hand Del contains a fluent f iff
C → ¬f is an effect of a, and Ms |= C. We define
δ(a0 a1 . . . an , s) = δ(a1 . . . an , δ(a0 , s)), and δ(ε, s) = s. A sequence of actions α is executable in s if δ(α, s) is defined. Furthermore, α is executable in Σ iff it is executable in s, for any s consistent with I.
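To make these definitions concrete, the following minimal Python sketch (our own illustration; the names and representation choices are ours, not the paper's) encodes states as frozensets of fluent symbols, actions as preconditions plus conditional effects C → L, and implements the successor function δ and its extension to action sequences.

from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Literal = Tuple[str, bool]   # (fluent, polarity), e.g. ("at_loc1", True)

@dataclass(frozen=True)
class Action:
    name: str
    prec: FrozenSet[Literal]                                   # conjunction of fluent literals
    effects: Tuple[Tuple[FrozenSet[Literal], Literal], ...]    # conditional effects (C, L)

def holds(lit: Literal, state: FrozenSet[str]) -> bool:
    fluent, positive = lit
    return (fluent in state) == positive

def executable(a: Action, state: FrozenSet[str]) -> bool:
    return all(holds(l, state) for l in a.prec)

def delta(a: Action, state: FrozenSet[str]) -> Optional[FrozenSet[str]]:
    # Successor state (s \ Del) U Add, or None if a is not executable in state.
    if not executable(a, state):
        return None
    add = {f for cond, (f, pos) in a.effects
           if pos and all(holds(l, state) for l in cond)}
    delete = {f for cond, (f, pos) in a.effects
              if not pos and all(holds(l, state) for l in cond)}
    return frozenset((state - delete) | add)

def delta_seq(actions, state):
    # delta extended to sequences; returns None if some action along the way is inexecutable.
    for a in actions:
        state = delta(a, state)
        if state is None:
            return None
    return state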
Semantics Given a system Σ, a sequence of actions α, and
an f-P LTL formula ϕ, the semantics defines when α satisfies
ϕ in Σ. An f-P LTL formula is interpreted over finite rather
than infinite sequences of states. Its semantics resembles
that of LTL on so-called truncated paths (Eisner et al. 2003).
Before we define the semantics formally, we give two definitions. Let s be a state and α = a0 a1 . . . an be a (finite)
sequence of actions. We say that σ is an execution trace of
α in s iff σ = s0 s1 s2 . . . sn+1 with s0 = s and δ(ai , si ) = si+1 , for any
i ∈ [0, n]. Furthermore, if l is the sequence ℓ0 ℓ1 . . . ℓn , we
abbreviate its suffix ℓi ℓi+1 . . . ℓn by li .
Definition 1 (Truth of an f-P LTL Formula) An f-P LTL formula ϕ is satisfied by α in a dynamical system Σ = (F, A, I) iff for any state s consistent with I, the execution trace σ of α in s is such that ⟨σ, α⟩ |= ϕ, where:²
• ⟨σi, αi⟩ |= ϕ, where ϕ ∈ F, iff ϕ is an element of the first state of σi.
• ⟨σi, αi⟩ |= occ(a) iff i < |α| and a is the first action of αi.
• ⟨σi, αi⟩ |= ◯ϕ iff i < |σ| − 1 and ⟨σi+1, αi+1⟩ |= ϕ
• ⟨σi, αi⟩ |= U(ϕ, ψ) iff there exists a j ∈ {i, ..., |σ| − 1} such that ⟨σj, αj⟩ |= ψ and for every k ∈ {i, ..., j − 1}, ⟨σk, αk⟩ |= ϕ
• ⟨σi, αi⟩ |= ⊖ϕ iff i > 0 and ⟨σi−1, αi−1⟩ |= ϕ
• ⟨σi, αi⟩ |= S(ϕ, ψ) iff there exists a j ∈ {0, ..., i} such that ⟨σj, αj⟩ |= ψ and for every k ∈ {j + 1, ..., i}, ⟨σk, αk⟩ |= ϕ
² We omit standard definitions for ¬, ∨.
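Read operationally, Definition 1 is a recursive check over the finite trace. The sketch below is our own illustration (not the authors' code); it assumes formulae are represented as nested tuples such as ('until', phi, psi), sigma is the list of states s0 . . . sn+1, and alpha the action sequence a0 . . . an.

def sat(phi, sigma, alpha, i=0):
    op = phi[0]
    if op == 'true':
        return True
    if op == 'fluent':                     # atomic fluent formula
        return phi[1] in sigma[i]
    if op == 'occ':                        # occ(a): a is the next action to occur
        return i < len(alpha) and alpha[i] == phi[1]
    if op == 'not':
        return not sat(phi[1], sigma, alpha, i)
    if op == 'and':
        return sat(phi[1], sigma, alpha, i) and sat(phi[2], sigma, alpha, i)
    if op == 'next':                       # strong next: false in the final state
        return i < len(sigma) - 1 and sat(phi[1], sigma, alpha, i + 1)
    if op == 'until':                      # U(phi1, phi2)
        return any(sat(phi[2], sigma, alpha, j) and
                   all(sat(phi[1], sigma, alpha, k) for k in range(i, j))
                   for j in range(i, len(sigma)))
    if op == 'yesterday':
        return i > 0 and sat(phi[1], sigma, alpha, i - 1)
    if op == 'since':                      # S(phi1, phi2)
        return any(sat(phi[2], sigma, alpha, j) and
                   all(sat(phi[1], sigma, alpha, k) for k in range(j + 1, i + 1))
                   for j in range(0, i + 1))
    raise ValueError("unknown operator: %s" % op)

# Derived modalities, following the abbreviations used in the text:
def eventually(phi): return ('until', ('true',), phi)    # ♦
def once(phi):       return ('since', ('true',), phi)    # eventually in the past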
Definition 2 (Explanation) Given a dynamical system Σ =
(F, A, I), and an observation formula ϕ, expressed in future
f-P LTL, an explanation is a tuple (H, α), where H is a set
of clauses over F such that I ∪ H is satisfiable, I ⊭ H, and
α is a sequence of actions in A such that α satisfies ϕ in the
system ΣA = (F, A, I ∪ H).
The semantics of other temporal modalities are defined in terms of these basic elements, e.g., ⊟ϕ ≝ ¬⧫¬ϕ, ⧫ϕ ≝ S(true, ϕ), and ♦ϕ ≝ U(true, ϕ). Observe that our semantics adopts a strong next operator; i.e., ◯φ will not be satisfied if evaluated in the final state of a finite sequence.
It is well recognized that some properties are more naturally expressed using past modalities. An additional property of such modalities is that they can construct formulae that are exponentially more succinct than their future modality counterparts. Indeed, let Σn be a system with Fn = {p0 , . . . , pn }, let ψi = pi ↔ ⧫(¬⊖true ∧ pi ), and let Ψ = ✷(ψ1 ∧ · · · ∧ ψn → ψ0 ). Intuitively, ψi expresses that "pi has the same truth value now as it did in the initial state".
Theorem 1 (Following Markey 2003) Any formula ψ, expressed in future f-P LTL, equivalent to Ψ (defined as above) has size Ω(2^|Ψ|).
Note that although Markey’s theorem is related to temporal logic evaluated on infinite paths, the property also holds
when it is interpreted on truncated paths.
In the following sections we provide a translation of formulae with past modalities into future-only formulae, in order to use existing planning technology. Despite Markey’s
theorem, it is possible to show that the blowup for Ψ can
be avoided if one modifies the transition system to include
additional predicates that keep track of the initial truth value
of each of p0 , . . . , pn . Such a modification can be done in
linear time.
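As a rough illustration of the modification mentioned above (ours, not the paper's), one can add an auxiliary fluent init_p for every fluent p, make it true exactly when p is true in the (completed) initial state, and ensure no action ever affects it; "p now has the same value as initially" then reduces to a present-tense comparison.

def add_initial_value_fluents(initial_state, fluents):
    # initial_state: set of fluents true in the (completed) initial state;
    # returns the extended initial state with one frozen marker per true fluent.
    markers = {"init_%s" % p for p in fluents if p in initial_state}
    return set(initial_state) | markers
# No action mentions any init_* fluent, so each keeps its initial value forever,
# and the comparison p <-> init_p can be checked without any temporal modality.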
Example Assume a standard logistics domain with one
truck, one package, and in which all that is known initially
is that the truck is at loc1 . We observe pkg is unloaded from
truck1 in loc1 , and later it is observed that pkg is in loc2 .
One can express the observation as
♦[occ(unload(pkg, loc1 )) ∧ ♦at(pkg, loc2 )]
A possible explanation (H, α) is such that H = {in(pkg, truck1 )}, and α is unload(pkg, loc1 ), load(pkg, loc1 ), drive(loc1 , loc2 ), unload(pkg, loc2 ).
Note that aspects of H and α can be further filtered to
identify elements of interest to a particular user following
techniques such as those in (McGuinness et al. 2007).
Given a system and an observation, there are many possible explanations, not all of high quality. At a theoretical
level, one can assume a reflexive and transitive preference
relation ⪯ between explanations. If E1 and E2 are explanations and E1 ⪯ E2 we say that E1 is at least as preferred as E2 . E1 ≺ E2 is an abbreviation for E1 ⪯ E2 and E2 ⋠ E1 .
Definition 3 (Optimal Explanation) Given a system Σ, E
is an optimal explanation for observation ϕ iff E is an explanation for ϕ and there does not exist another explanation
E ′ for ϕ such that E ′ ≺ E.
3. Complexity and Relationship to Planning
It is possible to establish a relationship between explanation
generation and planning. Before doing so, we give a formal
definition of planning.
A planning problem with temporally extended goals is a
tuple P = (Σ, G), where Σ is a dynamical system, and G
is a goal formula expressed in future f-P LTL. The sequence
of actions α is a plan for P if α is executable in Σ and α
satisfies G in Σ. A planning problem (Σ, G) is classical if
Σ has a complete initial state, and conformant otherwise.
The following is straightforward from the definition.
2.3 Characterizing Explanations
Given a description of the behavior of a dynamical system
and a set of observations about the state of the system and/or
action occurrences, we define an explanation to be a pairing
of actions, orderings, and possibly state that account for the
observations in the context of the system dynamics. The definitions in this section follow (but differ slightly from) the
definitions of dynamical diagnosis we proposed in (Sohrabi,
Baier, and McIlraith 2010), which in turn elaborate and extend previous work (e.g., McIlraith 1998; Iwan 2001).
Assuming our system behavior is defined as a dynamical system and that the observations are expressed in future
f-P LTL, we define an explanation as a tuple (H, α) where
H is a set of clauses representing an assumption about the
initial state and α is an executable sequence of actions that
makes the observations satisfiable. If the initial state is complete, then H is empty, by definition. In cases where we have
incomplete information about the initial state, H denotes assumptions that we make, either because we need to establish
the preconditions of actions we want to conjecture in our explanation or because we want to avoid conjecturing further
actions to establish necessary conditions. Whether it is better to conjecture more actions or to make an assumption is
dictated by domain-specific knowledge, which we will encode in preferences.
Proposition 1 Given a dynamical system Σ = (F, A, I)
and an observation formula ϕ, expressed in future f-P LTL,
then (H, α) is an explanation iff α is a plan for the conformant
planning problem P = ((F, A, I ∪ H), ϕ) where I ∪ H is
satisfiable and where ϕ is a temporally extended goal.
In systems with complete initial states, the generation of
a single explanation corresponds to classical planning with
temporally extended goals.
Proposition 2 Given a dynamical system Σ such that Σ has a complete initial state, and an observation formula ϕ, expressed in future f-P LTL, then (∅, α) is an explanation iff α is a plan for the classical planning problem P = (Σ, ϕ) with
temporally extended goal ϕ.
Indeed, the complexity of explanation existence is the same
as that of classical planning.
Theorem 2 Given a system Σ and a temporally extended
formula ϕ, expressed in future f-P LTL, explanation existence is PSPACE-complete.
include reasonable facts that we wish to posit about the initial state (e.g., that it’s below freezing outside – a common
precursor to a car battery being dead).
In response to the somewhat unique representational requirements, we express preferences in f-P LTL. In order to
generate explanations using state-of-the-art planners, an objective of our work was to make the preference input language PDDL3 compatible. However, f-P LTL is more expressive than the subset of LTL employed in PDDL3, and
we did not wish to lose this expressive power. In the next
section we show how to compile away some or all temporal
modalities by exploiting the correspondence between past
and future modalities and by exploiting the correspondence
between LTL and Büchi automata. In so doing we preserve
the expressiveness of f-P LTL within the syntax of PDDL3.
Proof sketch. For membership, we propose the following
NPSPACE algorithm: guess an explanation H such that I ∪H
has a unique model, then call a PSPACE algorithm (like
the one suggested by de Giacomo and Vardi (1999)) to decide (classical) plan existence. Then we use the fact that
NPSPACE=PSPACE. Hardness is given by Proposition 2
and the fact that classical planning is PSPACE-hard (Bylander 1994).
§
The proof of Theorem 2 appeals to a non-deterministic algorithm that provides no practical insight into how to translate plan generation into explanation generation. At a more
practical level, there exists a deterministic algorithm that
maps explanation generation to classical plan generation.
4.1 Preferred Explanations
A high quality explanation is determined by the optimization of an objective function. The PDDL3 metric function
we employ for this purpose is a weighted linear sum of formulae to be minimized. I.e., (minimize (+ (∗ w1 φ1 ) . . . (∗
wk φk ))) where each φi is a formula that evaluates to 0 or
1 depending on whether an associated preference formula,
a property of the explanation trajectory, is satisfied or violated; wi is a weight characterizing the importance of that
property (Gerevini et al. 2009). The key role of our preferences is to convey domain-specific knowledge regarding
the most preferred explanations for particular observations.
Such preferences take on the following canonical form.
Definition 4 (Explanation Preferences) ✷(φobs → φexpl )
is an explanation preference formula where φobs , the observation formula, is any formula expressed in future fP LTL, and φexpl , the explanation formula, is any formula
expressed in past f-P LTL. Non-temporal expressions may
appear in either formula.
An observation formula, φobs , can be as simple as the observation of a single fluent or action occurrence (e.g., my
car won’t start.), but it can also be a complex formula. In
many explanation scenarios, observations describe a telltale
ordering of system properties or events that suggest a unique
explanation such as a car that won’t start every time it rains.
To simplify the description of observation formulae, we employ precedes as a syntactic constructor of observation patterns. ϕ1 precedes ϕ2 indicates that ϕ1 is observed before
ϕ2 . More generally, one can express ordering among observations by using formula of the form (ϕ1 precedes ϕ2 ...
precedes ϕn ) with the following interpretation:
Theorem 3 Given an observation formula ϕ, expressed in
future f-P LTL, and a system Σ, there is an exponential-time
procedure to construct a classical planning problem P =
(Σ′ , ϕ) with temporally extended goal ϕ, such that if α is a
plan for P , then an explanation (H, α′ ) can be generated in
linear time from α.
Proof sketch. Σ′ , the dynamical system that describes P is
the same as Σ = (F, A, I), augmented with additional actions that “complete” the initial state. Essentially, each such
action generates a successor state s that is consistent with
I. There is an exponential number of them. If a0 a1 . . . an
is a plan for P , we construct the explanation (H, α′ ) as follows. H is constructed with the facts true in the state s that
a0 generates. α′ is set to a1 . . . an .
§
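The construction in the proof can be pictured as follows (our own sketch, not the authors' implementation): enumerate the truth assignments over F that satisfy the clauses in I; each one becomes a candidate "completion" of the initial state, and hence one of the (exponentially many) completing actions.

from itertools import product

def completions(fluents, clauses):
    # clauses: iterable of clauses, each a set of literals (fluent, polarity).
    # Yields every state (frozenset of true fluents) whose assignment satisfies all clauses.
    for bits in product([False, True], repeat=len(fluents)):
        state = {f for f, b in zip(fluents, bits) if b}
        if all(any((f in state) == pos for f, pos in clause) for clause in clauses):
            yield frozenset(state)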
All the previous results can be re-stated in a rather
straightforward way if the desired problem is to find an optimal explanation. In that case the reductions are made to
preference-based planning (Baier and McIlraith 2008).
The proofs of the theorems above unfortunately do not
provide a practical solution to the problem of (high-quality)
explanation generation. In particular, we have assumed that
planning problems contain temporally extended goals expressed in future f-P LTL. No state-of-the-art planner that we are aware of supports these goals directly. We have not provided a compact and useful way to represent the preference relation ⪯.
4. Specifying Preferred Explanations
The specification of preferred explanations in dynamical settings presents a number of unique representational requirements. One such requirement is that preferences over explanations be contextualized with respect to observations,
and these observations themselves are not necessarily single
fluents, but rich temporally extended properties – sometimes
with characteristic forms and patterns. Another unique representational requirement is that the generation of explanations (and preferred explanations) necessitates reflecting on
the past. Given some observations over a period of time, we
wish to conjecture what preceded these observations in order to account for their occurrence. Such explanations may
include certain system state that explains the observations,
or they may include action occurrences. Explanations may also
ϕ1 ∧ ◯♦(ϕ2 ∧ ◯♦(ϕ3 . . . (ϕn−1 ∧ ◯♦ϕn ) . . .))        (1)
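As an illustration only (our own helper, not part of the paper), the precedes constructor can be unrolled mechanically into the nested pattern of formula (1), using the tuple representation from the earlier evaluator sketch:

def precedes(*phis):
    # (phi1 precedes phi2 precedes ... precedes phin) unrolled as in formula (1).
    if len(phis) == 1:
        return phis[0]
    rest = precedes(*phis[1:])
    eventually_rest = ('until', ('true',), rest)       # ♦rest, i.e. U(true, rest)
    return ('and', phis[0], ('next', eventually_rest))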
Equations (2) and (3) illustrate the use of precedes to encode
a total (respectively, partial) ordering among observations.
These are two common forms of observation formulae.
(obs1 precedes obs2 precedes obs3 precedes obs4 )        (2)
(obs3 precedes obs4 ) ∧ (obs1 precedes obs2 )        (3)
Further characteristic observation patterns can also be
easily described using precedes. The following is an example of an intermittent fault.
if and only if α satisfies φ in P ’s dynamical system. Predicate acceptφ is the (classical) goal in problem P ′ . Below
we introduce an extension of the BM compilation that allows
compiling away formulae expressed in past f-P LTL.
Our compilation takes dynamical system Σ, an observation ϕ, a set Γ of formulae corresponding to explanation
preferences, and produces a PDDL3 planning problem.
Step 1 Takes Σ and ϕ and generates a classical planning
problem P1 with temporally extended goal ϕ using the procedure described in the proof for Theorem 3.
Step 2 Compiles away occ in P1 , generating P2 . For each
occurrence of occ(a) in Γ or ϕ, it generates an additional
fluent happened_a which is made true by a and is deleted by all other actions. Replace occ(a) by happened_a in Γ and ϕ.
Step 3 Compiles away all the past elements of preferences
in Γ. It uses the BM compilation over P2 to compile away
past temporal operators in preference formulae of the form
✷(φobs → φexpl ), generating P3 . For every explanation
formula φexpl , expressed in past f-P LTL, in Γ we do the following. We compute the reverse of φexpl , written φʳexpl , as a formula just like φexpl but with all past temporal operators changed to their future counterparts (i.e., ⊖ by ◯, ⧫ by ♦, S by U).
Note that φexpl is satisfied in a trajectory of states σ iff φrexpl
is satisfied in the reverse of σ. Then, we use phase 1 of the
BM compilation to build a finite state automaton Aφrexpl for
φrexpl . We now compute the reverse of Aφrexpl by switching accepting and initial states and reversing the direction of
all transitions. Then we continue with phase 2 of the BM
compilation, generating a new planning problem for the reverse of Aφrexpl . In the resulting problem the new predicate
acceptφexpl becomes true as soon as the formula φexpl , expressed in past f-P LTL, is made true by the execution of an
action sequence. We replace any occurrence of φexpl in Γ
by acceptφexpl . We similarly use the BM compilation to remove future temporal modalities from ϕ and φobs . This is
only necessary if they contain nested modalities or the ◯ modality, which
they often will. The output of this step is PDDL3 compliant.
To generate PDDL3 output without any temporal operators,
we perform the following further step.
Step 4 (optional) Compiles away temporal operators in Γ
and ϕ using the BM compilation, ending with simple preferences that refer only to the final state.
(alarm precedes no alarm precedes alarm precedes no alarm)
Similarly, explanation formulae, φexpl , can be complex temporally extended formulae over action occurrences and fluents. However, in practice these explanations may be reasonably simple assertions of properties or events that held (resp. occurred) in the past. The following are some canonical forms of explanation formulae: (e1 ∧ ... ∧ en ), and (e1 ⊗ ... ⊗ en ), where n ≥ 1, ei is either a fluent in F or occ(a) with a ∈ A, and ⊗ is exclusive or.
5. Computing Preferred Explanations
In previous sections we addressed issues related to the specification and formal characterization of preferred explanations. In this section we examine how to effectively generate
explanations using state-of-the-art planning technology.
Propositions 1 and 2 establish that we can generate explanations by treating an observation formula ϕ as the temporally extended goal of a conformant (resp. classical) planning problem. Preferred explanations can be similarly computed using preference-based planning techniques. To employ state-of-the-art planners, we must represent our observation formulae and the explanation preferences in syntactic forms that are compatible with some version of PDDL.
Both types of formulae are expressed in f-P LTL so PDDL3
is a natural choice since it supports preferences and some
LTL constructs. However, f-P LTL is more expressive than
PDDL3, supporting arbitrarily nested past and future temporal modalities, action occurrences, and most importantly
the next modality, ◯, which is essential to the encoding of
an ordered set of properties or action occurrences that occur over time. As a consequence, partial- and total-order
observations are not expressible in PDDL3’s subset of LTL,
and so it follows that the precedes constructor commonly
used in the φobs component of explanation preferences is
not expressible in the PDDL3 LTL subset. There are similarly many typical φexpl formulae that cannot be expressed
directly in PDDL3 because of the necessity to nest temporal
modalities. So to generate explanations using planners, we
must devise other ways to encode our observation formulae
and our explanation preferences.
5.1 Approach 1: PDDL3 via Compilation
Although it is not possible to express our preferences directly in PDDL3, it is possible to compile unsupported temporal formulae into other formulae that are expressible in
PDDL3. To translate to PDDL3, we utilize Baier and McIlraith’s future LTL compilation approach (2006), which we
henceforth refer to as the BM compilation. Given an LTL
goal formula φ, expressed in future f-P LTL, and a planning
problem P , the BM compilation executes the following two
steps: (phase 1) generates a finite state automaton for φ, and
(phase 2) encodes the automaton in the planning problem
by adding new predicates to describe the changing configuration of the automaton as actions are performed. The result
is a new planning problem P ′ that augments P with a newly
introduced accepting predicate acceptφ that becomes true
after performing a sequence of actions α in the initial state
Theorem 4 Let P3 be defined as above for a description
Σ, an observation ϕ, and a set of preferences Γ. If α is
a plan for P3 with an associated metric function value M ,
then we can construct an explanation (H, α) for Σ and ϕ
with associated metric value M in linear time.
Although Step 4 is not required, it has practical value. Indeed, it enables potential application of other compilation
approaches that work directly with PDDL3 without temporal operators. For example, it enables the use of Keyder and
Geffner’s compilation (2009) to compile preferences into
corresponding action costs so that standard cost-based planners can be used to find explanations. This is of practical
importance since cost-based planners are (currently) more
mature than PDDL3 preference-based planners.
5.2 Approach 2: Pre-processing (Sometimes)
The compiled planning problem resulting from the application of Approach 1 can be employed with a diversity of planners to generate explanations. Unfortunately, the preferences may not be in a form that can be effectively exploited by delete relaxation based heuristic search. Consider the preference formula γ = ✷(φobs → φexpl ). Step 4 culminates in an automaton with accepting predicate acceptγ . Unfortunately, acceptγ is generally true at the outset of plan construction because φobs is false – the observations have not yet occurred in the plan – making φobs → φexpl , and thus γ, trivially true. This deactivates the heuristic search to achieve acceptγ and thus the satisfaction of this preference does not benefit from heuristic guidance. For a restricted but compelling class of preferences, namely those of the form ✷(φobs → ⋀ᵢ ⧫eᵢ ) with each eᵢ a non-temporal formula, we can pre-process our preference formula in advance of applying Approach 1, by exploiting the fact that we know a priori what observations have occurred. Our pre-processing utilizes the following LTL identity:
✷(φobs → ⋀ᵢ ⧫eᵢ ) ∧ ♦φobs ≡ ⋀ᵢ ¬φobs U(eᵢ ∧ ♦φobs ).
Given a preference of the form ✷(φobs → ⋀ᵢ ⧫eᵢ ), we determine whether φobs is entailed by the observation ϕ (this can be done efficiently given the form of our observations). If this is the case, we use the identity above to transform our preferences, followed by application of Approach 1. The accepting predicate of the resulting automaton becomes true if φexpl is satisfied prior to φobs . In the section to follow, we see that exploiting this pre-processing can improve planner performance significantly.
6. Experimental Evaluation
The objective of our experimental analysis was to gain some insight into the behavior of our proposed preference formalism. Specifically, we wanted to: 1) develop a set of somewhat diverse benchmarks and illustrate the use of planners in the generation of explanations; 2) examine how planners perform when the number of preferences is increased; and 3) investigate the computational time gain resulting from Approach 2. We implemented all compilation techniques discussed in Section 5 to produce PDDL3 planning problems with simple preferences that are equivalent to the original explanation generation problems.
We used four domains in our experiments: a computer domain (see Grastien et al. 2007), a car domain (see McIlraith and Scherl 2000), a power domain (see McIlraith 1998), and the trucks domain from IPC 2006. We modified these domains to account for how observations and explanations occur within the domain. In addition, we created two instances of the same problem, one with total-order observations and another with partial-order observations. Since the observations we considered were either total- or partial-order, we were able to compile them away using a technique that essentially makes an observation possible only after all preceding observations have been observed (Haslum and Grastien 2009; 2011). Finally, we increased problem difficulty by increasing the number of observations in each problem.
To further address our first objective, we compared the performance of FF (Hoffmann and Nebel 2001), LAMA (Richter, Helmert, and Westphal 2008), SGPlan6 (Hsu and Wah 2008) and HPLAN-P (Baier, Bacchus, and McIlraith 2009) on our compiled problems but with no preferences. The results show that in the total-order cases, all planners except HPLAN-P solved all problems within seconds, while HPLAN-P took much longer, and could not solve all problems (i.e., it exceeded the 600 second time limit). The same results were obtained with the partial-order problems, except that LAMA took a bit longer but still was far faster than HPLAN-P. This suggests that our domains are reasonably challenging.
To address our second objective we turned to the preference-based planner HPLAN-P. We created different versions of the same problem by increasing the number of preferences they used. In particular, for each problem we tested with 10, 20, and 30 preferences. To measure the change in computation time between problems with different numbers of preferences, we calculated the percentage difference between the computation time for the problem with the larger and with the smaller number of preferences, relative to the computation time of the problem with the larger number. The average percentage difference was 6.5% as we increased the number of preferences from 10 to 20, and was 3.5% as we went from 20 to 30 preferences. The results suggest that as we increase the number of preferences, the time it takes to find a solution does increase but this increase is not significant.
As noted previously, the Approach 1 compilation technique (including Step 4) results in the complete removal of temporal modalities and therefore enables the use of the Keyder and Geffner compilation technique (2009). This technique supports the computation of preference-based plans (and now preferred explanations) using cost-based planners. However, the output generated by our compilation requires a planner compatible with ADL or derived predicates. Among the rather few that support any of these, we chose to experiment with LAMA since it is currently the best-known cost-based planner. Figure 1 shows the time it takes to find the optimal explanation using HPLAN-P and LAMA as well as the time comparison between our "Approach 1" and "Approach 2" encodings (Section 5). To measure the gain in computation time from the "Approach 2" technique, we computed the percentage difference between the two, relative to "Approach 1". (We assigned a time of 600 to those marked NF.) The results show that on average we gained a 22.9% improvement for HPLAN-P and a 29.8% improvement for LAMA in the time it takes to find the optimal solution. In addition, we calculated the time ratio ("Approach 1"/"Approach 2"). The results show that on average HPLAN-P found plans 2.79 times faster and LAMA found plans 31.62 times faster when using "Approach 2". However, note that "Approach 2" does not always improve the performance. There are a few cases where the planners take longer when using "Approach 2". While the definite cause of this decrease in performance is currently unknown, we believe this decrease may depend on the structure of the problem and/or on the difference in the size of the translated domains. On average the translated problems used in "Approach 2" are 1.4 times larger, hence this increase in the size of the problem may be one reason behind the decrease in performance. Nevertheless, this result shows that "Approach 2" can significantly improve the time required to find the optimal explanation, sometimes by orders of magnitude; in so doing it allows us to solve more problem instances than with "Approach 1" alone (see car-4 and car-5).
[Figure 1 (table) appears here: runtimes in seconds for HPLAN-P and LAMA under the "Appr 1" and "Appr 2" encodings, for the total-order and partial-order variants of problems computer-1..8, car-1..8, power-1..8, and truck-1..8.]
Figure 1: Runtime comparison between HPLAN-P and LAMA on problems of known optimal explanation. NF means the optimal explanation was not found within the time limit of 600 seconds.
7. Summary
In this paper, we examined the task of generating preferred explanations. To this end, we presented a logical characterization of the notion of a (preferred) explanation and established its correspondence to planning, including the complexity of explanation generation. We proposed a finite variant of LTL, f-P LTL, that includes past modalities and action occurrences and utilized it to express observations and preferences over explanations. To generate explanations using state-of-the-art planners, we proposed and implemented a compilation technique that preserves f-P LTL semantics while conforming to PDDL3 syntax. This enables computation of preferred explanations with PDDL3-compliant preference-based planners as well as with cost-based planners. Exploiting the property that observations are known a priori, we transformed explanation preferences into a form that was amenable to heuristic search. In so doing, we were able to reduce the time required for explanation generation by orders of magnitude, sometimes.
Acknowledgements
We thank Alban Grastien and Patrik Haslum for providing us
with an encoding of the computer problem, which we modified and used in this paper for benchmarking. We also gratefully acknowledge funding from the Natural Sciences and
Engineering Research Council of Canada (NSERC). Jorge
Baier was funded by the VRI-38-2010 grant from Universidad Católica de Chile.
References
Baier, J., and McIlraith, S. 2006. Planning with first-order
temporally extended goals using heuristic search. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), 788–795.
Baier, J., and McIlraith, S. 2008. Planning with preferences.
AI Magazine 29(4):25–36.
Baier, J.; Bacchus, F.; and McIlraith, S. 2009. A heuristic
search approach to planning with temporally extended preferences. Artificial Intelligence 173(5-6):593–618.
Brewka, G. 1994. Adding priorities and specificity to default
logic. In Proceedings of the Logics in Artificial Intelligence,
European Workshop (JELIA), 247–260.
Bylander, T. 1994. The computational complexity of
propositional STRIPS planning. Artificial Intelligence 69(1-2):165–204.
de Giacomo, G., and Vardi, M. Y. 1999. Automata-theoretic
approach to planning for temporally extended goals. In Biundo, S., and Fox, M., eds., ECP, volume 1809 of LNCS,
226–238. Durham, UK: Springer.
Eisner, C.; Fisman, D.; Havlicek, J.; Lustig, Y.; McIsaac,
A.; and Campenhout, D. V. 2003. Reasoning with temporal
logic on truncated paths. In Proceedings of the 15th International Conference on Computer Aided Verification (CAV),
volume 2725 of LNCS. Boulder, CO: Springer. 27–39.
Gabbay, D. M. 1987. The declarative past and imperative
future: Executable temporal logic for interactive systems. In
Temporal Logic in Specification, 409–448.
Gerevini, A.; Haslum, P.; Long, D.; Saetti, A.; and Dimopoulos, Y. 2009. Deterministic planning in the 5th int’l
planning competition: PDDL3 and experimental evaluation
of the planners. Artificial Intelligence 173(5-6):619–668.
Göbelbecker, M.; Keller, T.; Eyerich, P.; Brenner, M.; and
Nebel, B. 2010. Coming up with good excuses: What
to do when no plan can be found. In Proceedings of the
20th International Conference on Automated Planning and
Scheduling (ICAPS), 81–88.
Grastien, A.; Anbulagan; Rintanen, J.; and Kelareva, E.
2007. Diagnosis of discrete-event systems using satisfiability algorithms. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI), 305–310.
Haslum, P., and Grastien, A. 2009. Personal communication.
Haslum, P., and Grastien, A. 2011. Diagnosis as planning: Two case studies. In Proceedings of the International
Scheduling and Planning Applications workshop (SPARK).
Hoffmann, J., and Nebel, B. 2001. The FF planning system:
Fast plan generation through heuristic search. Journal of
Artificial Intelligence Research 14:253–302.
Hsu, C.-W., and Wah, B. 2008. The SGPlan planning system. In 6th International Planning Competition Booklet
(IPC-2008).
Iwan, G. 2001. History-based diagnosis templates in the
framework of the situation calculus. In Proceedings of the
Joint German/Austrian Conference on Artificial Intelligence
(KR/ÖGAI). 244–259.
Keyder, E., and Geffner, H. 2009. Soft Goals Can Be
Compiled Away. Journal of Artificial Intelligence Research
36:547–556.
Markey, N. 2003. Temporal logic with past is exponentially
more succinct, concurrency column. Bulletin of the EATCS
79:122–128.
McGuinness, D. L.; Glass, A.; Wolverton, M.; and da Silva,
P. P. 2007. Explaining task processing in cognitive assistants that learn. In Proceedings of the 20th International
Florida Artificial Intelligence Research Society Conference
(FLAIRS), 284–289.
McIlraith, S., and Scherl, R. B. 2000. What sensing tells us:
Towards a formal theory of testing for dynamical systems.
In Proceedings of the 17th National Conference on Artificial
Intelligence (AAAI), 483–490.
McIlraith, S. 1998. Explanatory diagnosis: Conjecturing
actions to explain observations. In Proceedings of the 6th
International Conference of Knowledge Representation and
Reasoning (KR), 167–179.
Ramírez, M., and Geffner, H. 2009. Plan recognition as
planning. In Proceedings of the 21st International Joint
Conference on Artificial Intelligence (IJCAI), 1778–1783.
Richter, S.; Helmert, M.; and Westphal, M. 2008. Landmarks revisited. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI), 975–982.
Sohrabi, S.; Baier, J.; and McIlraith, S. 2010. Diagnosis as
planning revisited. In Proceedings of the 12th International
Conference on the Principles of Knowledge Representation
and Reasoning (KR), 26–36.
The Method of ILP+ASP on Psychological Models
J. Romero, A. Illobre, J. Gonzalez and R. Otero
AI Lab. Computer Science Department, University of Corunna (Spain)
{jromerod,infjdi00,jgonzalezi,otero}@udc.es
Abstract
We propose to apply a new method of Inductive Logic Programming (ILP) and Answer Set Programming (ASP) to Experimental Psychology. The idea is to use ILP to build a model from experimental data and then use ASP with the resulting model to solve reasoning tasks such as explanation or planning. For learning in dynamic domains without the frame problem we use the method of [Otero, 2003] and for reasoning in dynamic domains without the frame problem we use actions in ASP [Lifschitz, 2002]. We have applied this method to an experiment in a dynamic domain of Human Reasoning and Decision Making. The results show that the method can be used for learning and reasoning in real-world dynamic domains, thus improving the methods used in Experimental Psychology, which do not consider these problems.
1 Introduction
The objective of Experimental Psychology is to build models of human behavior supported by experimental data. These models are often incomplete and not formally defined, and usually a method of linear regression is used to complete and formalize them. In this paper we propose to use logic programming methods instead. To our knowledge there are few previous attempts to use symbolic methods in this area [Gigerenzer and Selten, 2002] [Balduccini and Girotto, 2010].
The idea of the method is to apply Inductive Logic Programming (ILP) to build a psychological model and then apply Answer Set Programming (ASP) with the resulting model to solve reasoning tasks. For induction in dynamic domains without the frame problem [McCarthy and Hayes, 1969] we use the method of [Otero, 2003] and for reasoning in dynamic domains without the frame problem we use actions in ASP [Lifschitz, 2002].
We have applied this method to an experiment in a dynamic domain of Human Reasoning and Decision Making (HRDM), a field of Experimental Psychology. The objective of the experiment is to study how people select strategies for repeatedly solving a given task. We use ILP to automatically build a model of the process of strategy selection, and then use ASP to reason about the model.
In the next section we introduce the logic programming methods that are used in the method of ILP+ASP. Then in section 3 we describe the method of Experimental Psychology and the method of ILP+ASP, and in section 4 we show the application of the proposed method to HRDM. Finally, section 5 presents conclusions and future work.
2 Logic programming methods
Inductive Logic Programming. ILP [Muggleton, 1995] is an area of Machine Learning for the induction of hypotheses from examples and background knowledge, using logic programming as a single representation for them. Inverse Entailment (IE) is a correct and complete ILP method proposed by S. Muggleton that can deal with recursive rules, implemented in the ILP systems Progol [Muggleton, 1995] and Aleph¹. Given a set of examples and background knowledge, these systems can find the simplest hypothesis that explains every example and is consistent with the background. ILP has been successfully applied in other areas of science such as Molecular Biology (see for example [King et al., 1996]).
¹ http://www.comlab.ox.ac.uk/activities/machinelearning/Aleph
Inductive Logic Programming for Actions. Induction of the effects of actions consists in learning an action description of a dynamic system from evidence on its behavior. General logic-based induction methods can deal with this problem but, unfortunately, most of the solutions provided have the frame problem. Instead we propose to use the method of [Otero, 2003], implemented in the system Iaction [Otero and Varela, 2006]. This is a correct, complete and efficient method for induction of action descriptions that can cope with the frame problem in induction.
Answer Set Programming. ASP is a form of logic programming based on the stable model (answer set) semantics [Gelfond and Lifschitz, 1991]. An ASP program can have none, one or several answer sets that can be computed with an ASP system (e.g., Clasp²). The method of ASP is (1) to encode a problem as an ASP program such that solutions of the problem correspond to answer sets of the program, and (2) to use an ASP system to compute answer sets of the program.
² http://www.cs.uni-potsdam.de/clasp/
Answer Set Programming for Actions. ASP is suitable for representing action descriptions without the frame problem, and it can be used to solve different tasks like prediction,
diagnosis and planning [Lifschitz, 2002]. For example, the Decision Support System of the Space Shuttle [Nogueira et al., 2000] is an ASP system capable of solving planning and diagnostic tasks related to the operation of the Space Shuttle.

Subject   Behavior   Intention   Control
s1        1          2           1
s2        2          3           2
s3        3          3           4
s4        5          5           4
Table 1: Behavior, intention and control for 4 subjects.

3 The Method of ILP+ASP
In this section we explain the method of Experimental Psychology, we present the method of ILP+ASP and describe its application in detail to an example of Experimental Psychology.
3.1 The Method of Experimental Psychology
The method of Experimental Psychology follows these steps:
Step 1. Psychological Theory
First a psychological theory about human behavior is proposed.
For example, the Theory of Planned Behavior [Ajzen, 1985] states that human behavior can be modeled by the following concepts:
• Intention: the intention to perform the behavior.
• Perceived Behavioral Control (control): the perceived ease or difficulty of performing the behavior.
According to this theory, behavior is related to intention and control but the particular relation is not known in advance, and it is assumed to depend on the type of behavior. In this example we will consider ecological behavior, i.e. human behavior that is relevant for environmental issues: waste management, water and energy consumption, etc.
Step 2. Representation
The concepts of the proposed theory are represented formally to be used in the step of model construction. In Experimental Psychology it is usual to represent the concepts as variables with several values.
For example, behavior is not represented as a boolean variable that states whether a given behavior is performed or not, but instead it has values ranging from 1 to 5, which represent how often the behavior is performed. The same holds for intention and control.
Step 3. Data Collection
Models are based on experimental data, which can be collected following different methods.
For example, we give a survey to many persons in which we ask them whether they performed a given ecological behavior and what was their intention and their perceived behavioral control towards that behavior. Each answer is then quantified so that for each person we have a value for behavior, intention and control. Example data is shown in table 1, where each row corresponds to a subject and each column corresponds to a concept. For example, subject s1 has behavior 1, intention 2 and control 1.
Step 4. Model Construction
A model with the particular relation among the concepts is built. Typically a method of linear regression is used, which uses the representation chosen in step 2 and the data collected in step 3.
For example, the resulting model for behavior may consist of the following equation:
behavior = 1 ∗ intention + 0.4 ∗ control − 1.5        (1)
Step 5. Reasoning
The model built can be used to predict the behavior of other persons.
For example, if someone's intention is high (4) but control is very low (1), the model predicts their behavior will be medium (2.9).
Actions
Actions can modify the behavior of people. For example, giving a course on ecology may promote ecological behavior, and removing recycling bins may diminish it. The behavior and the other concepts can be measured before and after the execution of the actions. Again the relation between these actions and the concepts of the theory is not known in advance. We know that some instances of these actions have been done on different people and we want to know the particular relation that holds on every one, which may depend on conditions of the particular person. This is a problem of learning the effects of actions, and the result is a model of the particular relation among the actions and the concepts of the theory. This problem is not considered in the method of Experimental Psychology.
3.2 The Method of ILP+ASP
The method of ILP+ASP can be outlined as follows:
1. Substitute the numerical representation by a logic programming representation such that the final model is a logic program instead of a linear equation.
2. Substitute the linear regression method by a method for induction of logic programs (ILP). Thus the logic program solution is built from instance data of the survey by a Machine Learning method.
3. Substitute the use of the model for prediction by reasoning with ASP. Thus we can do additional relevant tasks like explanation and planning, which are not considered in the method of Experimental Psychology.
To justify the correctness and significance of the ILP+ASP method consider the following:
1. Logic programs provide a representation more general than linear equations. The relation among the variables may not be as simple as a linear equation, and a logic program can represent alternative relations, e.g. not continuous, which could be the case in the domain of Psychology. Note that logic programming allows the repre-
sentation of simple numerical formulas, e.g. linear equations, so we are clearly improving the form of representation.
2. ILP is a correct and complete method of induction for
logic programs from instances. Thus the result has the same level of correctness as linear regression.
The method has the power to identify a model if it exists
or to tell us that there is no model, i.e. to validate the
psychological theory on experimental data.
3. ASP is able to use the model built, which is a logic program, for tasks done with linear equations like prediction, but also for other additional relevant tasks like explanation and planning.
In summary, the correctness of ILP provides the correctness of the method when building the model and the correctness of ASP provides the correctness of the method when using the model.
3.3 An example
Next we explain the particular steps of the method of ILP+ASP with an example.
Step 1. Psychological Theory
This step is the same as in the method of Experimental Psychology. We apply the Theory of Planned Behavior to study ecological behavior.
Step 2. Representation
Define the concepts of the theory as predicates of a logic program. In this example:
• behavior(S, X): subject S behaves ecologically with degree X. For example, behavior(s4, 5) represents that s4 behaves very ecologically.
• intention(S, X): subject S has the intention to be ecological with degree X.
• control(S, X): subject S perceives that it is easy for her to be ecological to a degree X.
Step 3. Data Collection
This step is the same as in the method of Experimental Psychology: a survey is done and the results are those of Table 1.
Step 4. Model construction
Apply an ILP system (in this example, Progol 4.4) to automatically build a model.
Progol constructs logic programs from examples and background knowledge. In this case there are 4 examples of behavior that we represent as ground facts (rules without body or variables) with predicate behavior:

behavior(s1,1). behavior(s2,2).
behavior(s3,3). behavior(s4,5).

Progol uses the background knowledge to construct rules that explain these examples. The background knowledge encodes the knowledge that the expert considers relevant for the learning process, e.g., part of a psychological theory. In this example we represent the other concepts of the theory with predicates intention and control, and we add predicates gteq and lteq to allow Progol to make comparisons with different values of those concepts:

intention(s1,2). control(s1,1). intention(s2,3). control(s2,2).
intention(s3,3). control(s3,4). intention(s4,5). control(s4,4).
gteq(X,Y) :- value(X), value(Y), X >= Y.
lteq(X,Y) :- value(X), value(Y), X =< Y.

These sentences, together with the code below, tell Progol to construct a definition for the behavior predicate.

subject(s1). subject(s2). subject(s3). subject(s4).
value(1). value(2). value(3). value(4). value(5).
:- modeh(1, behavior(+subject,+value))?
:- modeb(*, intention(+subject,-value))?
:- modeb(*, control(+subject,-value))?
:- modeb(*, gteq(+value,#value))?
:- modeb(*, lteq(+value,#value))?
:- set(inflate,1000)?
:- set(nodes,1000)?
:- behavior(S,X), behavior(S,Y), not(X==Y).

The first two lines define some predicates, called types, used in the mode declarations. The following sentences, called modes, describe the predicates that the system can use in the head of the learned rules (modeh declarations) or in the body (modeb declarations); for further details we refer the reader to [Muggleton, 1995]. The modeh declaration states that the head of a learned rule has predicate symbol behavior and two parameters, one of type subject and another of type value. The meaning of the modeb declarations is very similar, but they refer to the predicates that can appear in the body of the learned rules: intention, control, gteq and lteq. Sentences with the set predicate are used to configure the search for the hypothesis. The final sentence states that a subject cannot have two levels of behavior.
Given these sentences Progol finds two rules:

behavior(S,X) :- control(S,X), lteq(X,2).
behavior(S,X) :- intention(S,X), control(S,Y), gteq(Y,3).

If the control of a subject is less than or equal to 2, behavior has the same value as control; otherwise the behavior is equal to the intention. The rules are a solution to the induction problem because they explain every example and they are consistent with the background.
We can compare these rules with the linear equation of section 3.1. Only in 6 out of 25 pairs of values of intention and control do their predictions differ by more than one unit, so they predict similar values. But the result of Progol is more insightful. The linear equation simply states how much every change in intention or control contributes to a change in behavior, while the rules of Progol provide more information about how these changes happen: when control is very low it blocks the behavior, and otherwise the behavior is determined by the intention.
There is an underlying problem in the construction of the model. Surveys are not a safe instrument to measure the concepts of the theory: some people can give answers that, intentionally or not, are false. We can handle this problem by allowing Progol to predict incorrectly a given proportion of the examples. Besides, the method of ILP+ASP provides a very precise way to detect outliers. To improve the quality of
the surveys, psychologists can introduce related questions, so that some combinations of answers are inconsistent. For example, question q1 could be “Do you recycle the paper?” and question q2 could be “Do you recycle anything?”. If someone answers 5 (always) to q1 and 1 (never) to q2, that person is answering inconsistently. ASP can be used to precisely identify the subjects, outliers, that give inconsistent answers. For example, consider the program

q1(s1,2). q2(s1,2). q1(s2,5). q2(s2,1).
q1(s3,5). q2(s3,4). q1(s4,3). q2(s4,3).
outlier(S) :- q1(S,5), q2(S,1).

where the first lines represent the answers of different subjects to questions q1 and q2 and the last rule is used to detect outliers. This program has a unique answer set, which can be computed with an ASP system like Clasp, that contains the atom outlier(s2), thus detecting the unique outlier of the experiment. The outliers identified can be removed from the data set and studied apart, and the model construction step can be repeated with the new set of examples.
Step 5. Reasoning
The logic program built in the previous step is a model of ecological behavior:

behavior(S,X) :- control(S,X), lteq(X,2).
behavior(S,X) :- intention(S,X), control(S,C), gteq(C,3).
gteq(X,Y) :- value(X), value(Y), X >= Y.
lteq(X,Y) :- value(X), value(Y), X <= Y.
value(1). value(2). value(3). value(4). value(5).

This program can be used in ASP to solve different tasks.
Prediction. Given the intention and control of a new subject we can use the model to predict her behavior. For example, if subject s5 has intention 5 and control 2, we add to the previous program the facts:

intention(s5,5). control(s5,2).

The resulting program has a unique answer set, which can be computed by Clasp and contains the prediction for the behavior of s5:

behavior(s5,2)

Explanation. Given the behavior of a new subject, and possibly some additional information, we can use the model to explain her behavior. For example, we want to explain why s6 has behavior 5, and we know her control is 3. For this task we add the following sentences:

1 { intention(s6,1), intention(s6,2), intention(s6,3),
    intention(s6,4), intention(s6,5) } 1.
control(s6,3). behavior(s6,5).
:- behavior(S,X), behavior(S,Y), X != Y.

The first rule forces a choice among the possible values of intention, the next rule represents the known data, and the last one, like the one we used in Progol, eliminates the answer sets that predict two different values for behavior. The output of Clasp:

intention(s6,5), control(s6,3), behavior(s6,5)

gives the explanation for the very high behavior: the intention is also very high.
3.4 An example in Actions
We explain the application of the method of ILP+ASP to dynamic domains.
Step 1. Psychological theory
We consider the application of the Theory of Planned Behavior to study the effects of actions on ecological intention and on ecological control.
Step 2. Representation
Define predicates for the actions and for the concepts of the theory that change over time, called fluents. Each instant of time is called a situation, and situations range from 0 (the initial situation) to n (the final situation), where n depends on each experiment. Predicates now have a new term to represent the situation in which the action took place or the situation in which the fluent value holds.
Actions. Two actions may change the fluents:
• course(S): a course on ecology is given at situation S.
• car_sharing(S): a project for sharing cars to go to work is done at situation S.
Fluents. We modify the predicates of the static case:
• intention(S, A, X): at S subject A has intention to be ecological with degree X. For example, intention(0, s1, 5) represents that at 0 subject s1 has a high ecological intention, and intention(2, s1, 1) represents that at 2 her intention is low.
• control(S, A, X): at S subject A perceives that it is easy for her to be ecological to a degree X.
Step 3. Data Collection
To study the effects of actions we do surveys at different situations. In this example a survey was done to 2 subjects, then a course on ecology was given and another survey was done, and finally a car sharing project was done, followed by another survey. The next program represents the data:

intention(0,s1,3). intention(0,s2,2).
course(1).
intention(1,s1,3). intention(1,s2,2).
car_sharing(2).
intention(2,s1,5). intention(2,s2,2).
control(0,s1,2). control(0,s2,3).
control(1,s1,5). control(1,s2,5).
control(2,s1,5). control(2,s2,5).

Step 4. Model construction
Apply the system Iaction [Otero and Varela, 2006] to automatically build a model of the relations between the actions and the concepts considered. Iaction implements the method of [Otero, 2003]. The syntax and use of Iaction are very similar to those of Progol. For example, the mode declarations for this example are:
:- modeh(*, control(+situation,+subject,#value))?
:- modeh(*, intention(+situation,+subject,#value))?
:- modeb(*, course(+situation))?                      %ACTION
:- modeb(*, car_sharing(+situation,+subject))?        %ACTION
:- modeb(*, intention(+situation,+subject,-value))?
:- modeb(*, gteq(+value,#value))?
:- modeb(*, lteq(+value,#value))?

The symbol %ACTION tells the system which predicates are actions. We have instructed Iaction to induce an action description for the fluents control and intention (we use one modeh declaration for each). Finally, Iaction finds this action description:

intention(S,A,5) :- course(S), prev(S,PS),
    intention(PS,A,X), gteq(X,3).
control(S,A,5) :- car_sharing(S).

The course improves the intention of the subjects that had at least medium intention, and the car sharing project improves the control of all subjects. The solution found by Iaction, as guaranteed by the method of [Otero, 2003], explains all the examples and is consistent with the background.
Step 5. Reasoning
We now have a description of the effects of actions on the fluents intention and control. From previous experiments we also know the relation of behavior with intention and control. In this step we apply ASP to this model for reasoning about actions. For all tasks we use a file representing the domain description and another file representing the particular task. The domain description file contains these rules to represent the changes in the domain:

intention(S,A,5) :- course(S), prev(S,PS),
    intention(PS,A,X), gteq(X,3).
control(S,A,5) :- car_sharing(S).
behavior(S,A,X) :- control(S,A,X), lteq(X,2).
behavior(S,A,X) :- intention(S,A,X),
    control(S,A,Y), gteq(Y,3).

The rules for intention and control are the result of the previous learning process, and the rules for behavior are known from previous experiments. For each fluent we have to add the indirect effects for the negation of the fluent and the inertia law. For example, for the fluent behavior we add the rules:

-behavior(S,A,X) :- behavior(S,A,Y), X != Y.
behavior(S,A,X) :- behavior(PS,A,X),
    not -behavior(S,A,X), prev(S,PS).
-behavior(S,A,X) :- -behavior(PS,A,X),
    not behavior(S,A,X), prev(S,PS).
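The paper shows these rules only for behavior. As an illustration, the corresponding completion for the fluent intention would look as follows (a sketch, not part of the original text; the rules for control are analogous, and both the prev/2 definition and the value/1 guard are our assumptions):

% indirect effect: a fluent value excludes the others (value/1 added here to keep the rule safe)
-intention(S,A,X) :- intention(S,A,Y), value(X), X != Y.
% inertia: intention keeps its previous value unless it is known to change
intention(S,A,X) :- intention(PS,A,X), not -intention(S,A,X), prev(S,PS).
-intention(S,A,X) :- -intention(PS,A,X), not intention(S,A,X), prev(S,PS).
% prev/2 is used throughout but never defined in the paper; this definition is our assumption
prev(S,PS) :- situation(S), situation(PS), S == PS+1.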
This domain description can be used for solving different tasks.
Prediction. Given the state of the domain at the initial situation and a sequence of actions, the objective of prediction is to determine the state of the domain at the final situation (and at the intermediate ones). For example, at the initial situation subject s3 has low intention and control. Then a car sharing project is done in his workplace, and after that he attends a course on ecology. We can represent this by adding the following program to the domain description:

situation(0..2). subject(s3).
intention(0,s3,2). control(0,s3,2).
car_sharing(1).
course(2).

This program has a unique answer set that solves the prediction problem. This is part of the output of Clasp:

behavior(2,s3,2) intention(2,s3,2) control(2,s3,5) ...

The course had no effect on the intention of s3, so even though the car sharing project increased her control, the behavior remains low.
Planning. Given the initial and the final state of a domain, the objective of planning is to find a sequence of actions that leads from the initial state to the final one. For example, subject s4 has low behavior, medium intention and low control, and we want to find a sequence of actions that can make him become very ecological. This problem can be represented by adding the next program to the domain description:

situation(0..2). subject(s4).
intention(0,s4,3). control(0,s4,2).
1 { course(S), car_sharing(S) } 1 :- prev(S,PS).
:- not behavior(2,s4,5).

The line with brackets states that each answer set must contain one and only one of the atoms inside, i.e. for each situation we must choose one of the actions. The last line defines the goal of the planning problem. The program has two answer sets, which represent the solutions to the planning problem. This is part of the output of Clasp:

Answer: 1
... course(1) car_sharing(2)
Answer: 2
... car_sharing(1) course(2)

To improve the behavior it is necessary to improve both intention and control, and to this aim both the course and the car sharing actions have to be executed. However, if instead of intention(0,s4,3) we write intention(0,s4,2), the program has no answer set and thus there is no solution to the planning problem: the intention is too low to be improved by giving a course, so there is no way to improve her behavior.
4 An experiment on Human Reasoning and Decision Making
The theory of the Adaptive Toolbox [Gigerenzer and Selten, 2002] proposes that human reasoning and decision making can be modeled as a collection of fast and frugal heuristics. A fast and frugal heuristic is a simple algorithm to both build a model and make predictions on a domain.
Under this hypothesis people use one of these fast and frugal heuristics to decide the solution to a problem. However, the mechanism by which a subject selects one of the heuristics is still under study. It is also possible that the same subject, trying to solve a similar problem several times, uses different heuristics at different moments.
In [Rieskamp and Otto, 2006] SSL (Strategy Selection Learning), a theory based on reinforcement learning, is proposed. It explains how people decide to apply one heuristic depending on the feedback received. The theory is tested on 4 experimental studies. We apply the ILP+ASP method to one of these experiments to: 1) model how a subject decides to use one of the heuristics available in a given situation, and 2) use this model to solve prediction and planning tasks.
Step 1. Psychological Theory
Suppose we have two unnamed companies, A and B, and we want to decide which one is the most creditworthy. Each company is described by 6 binary cues (Financial Flexibility, Efficiency, Capital Structure, ...). For each cue a validity value, representing the probability of success, is also available. Both companies with their respective cues are shown to a subject, see Table 2. The subject, based on this information, has to
decide whether company A or company B is the most creditworthy. After the subject’s choice it is shown whether the answer is correct or not. Then another two unnamed companies are presented to the same subject and the previous process is repeated; each repetition is called a trial. The objective of the experiment is to model how a subject decides which company is the most creditworthy.

Cue                           Validity   A   B
Financial Flexibility         79%        1   0
Efficiency                    90%        1   1
Capital Structure             75%        0   0
Management                    70%        1   0
Own Financial Resources       85%        1   0
Qualifications of Employees   60%        1   0

Table 2: Example of a possible trial. A subject will select company A or company B as the most creditworthy. After the subject’s choice, feedback saying whether the answer is correct or incorrect is given.

In this study, subjects are shown 168 selection problems, called trials. The study is divided into 7 trial blocks, each consisting of 24 trials. In each trial block, the same 24 pairs of companies (items) are shown to the subjects, but their order is varied. The position of each company on the screen is varied too: e.g., suppose that a trial involves two companies o1 and o2; in some trials company o1 is named A and shown on the left side (the column under the label A in Table 2), while company o2 is named B and shown on the right (the column under the label B), whereas in other trials company o1 is named B and shown on the right, and company o2 is named A and shown on the left.
For each cue ci (e.g. Efficiency), the value 1 for a given company O represents that O has the property ci (e.g. the company O is efficient). The value 0 represents that O does not have the property ci (e.g. the company O is not efficient). Each cue ci has a value, 1 or 0, associated with each company A and B.
Also, each cue has an associated validity value, representing the probability of success. For example, in Table 2, for the first cue, Financial Flexibility, company A has the property Financial Flexibility (1), company B does not have the property Financial Flexibility (0), and the validity for this cue is 79%. This means that the probability of choosing the right answer, if you select the company with Financial Flexibility, is 0.79.
In the experiment it is assumed that, for each trial, subjects decide which is the most creditworthy company based on the result of one of these two heuristics: Take The Best (TTB) or the Weighted Additive heuristic (WADD) [Gigerenzer and Selten, 2002]. However, a subject can use TTB in one trial and WADD in another trial. Note that the subject only decides to select company A or company B in each trial; hence it is not directly known which of the two heuristics has been used by the subject. In this study it is assumed that, if only one heuristic selects the same object as the subject, then the subject must necessarily have used that heuristic. For example, if the TTB heuristic selects company A, the WADD heuristic selects company B, and the subject has selected company A, then it is assumed that he has used the TTB heuristic. In any other case, the heuristic selected by the subject (TTB or WADD) is unknown.
Summarizing, each subject in the experiment is shown a sequence of trials. In each trial the subject has to select company A or company B as the most creditworthy. It is assumed that the subject has used the heuristic (TTB or WADD) which selects the same answer as the subject. After each trial, feedback showing whether the answer was correct or incorrect is shown. The objective is to model which heuristic is used by a subject in each trial.
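To make the two heuristics concrete (this worked example is ours, based on their standard definitions in [Gigerenzer and Selten, 2002]): for the trial of Table 2, TTB inspects the cues in decreasing order of validity and stops at the first one that discriminates; Efficiency (90%) does not discriminate (both companies have 1), Own Financial Resources (85%) does, so TTB selects company A. WADD instead adds up the validities of the positive cues of each company: 79 + 90 + 70 + 85 + 60 = 384 for A versus 90 for B, so WADD also selects A. On this particular item the two heuristics agree, so a subject choosing A would not reveal which heuristic she used.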
Step 2. Representation
We define the trials of the survey, and the answers given by the subjects, with the following predicates of a logic program.
Actions. For this experiment it is enough to define a single action predicate, show.
• show(T,Sb,I,P): item I is shown to subject Sb on trial T. The parameter P represents whether the correct company is at the left or right of the computer screen.
Fluents. To represent the decisions of the subject on each trial, and the information she might keep to decide which heuristic to apply, we define the following fluents:
• selectedheuristic(T,Sb,H): subject Sb used heuristic H (TTB or WADD) on trial T. The system Iaction [Otero and Varela, 2006] will learn an action description for this fluent.
• itemselectedheuristic(T,Sb,I,H): subject Sb used heuristic H the last time item I was shown.
• answeredcorrectly(T,Sb,D): subject Sb answered correctly on trial T. Parameter D is used to keep information from previous trials that the subject might use to take a decision. For example, answeredcorrectly(T,Sb,d0) represents that the subject answered correctly at trial T, and answeredcorrectly(T,Sb,d1) that the subject answered correctly at T − 1.
• itemansweredcorrectly(T,Sb,I): subject Sb answered correctly the last time item I was shown.
• selected(T,Sb,O,D): subject Sb selected company O on trial T.
• itemselected(T,Sb,I,O): subject Sb selected company O the last time item I was shown.
• selectedposition(T,Sb,P): subject Sb selected the company at position P of the computer screen on trial T.
• itemselectedposition(T,Sb,I,P): subject Sb selected the company at position P of the computer screen the last time item I was shown.
• showed(T,Sb,I,P,D): item I was shown on trial T.
• feedback(T,Sb,H,R,D): the subject might have chosen the company selected by ttb, by wadd, or by either of the two (parameter H). This decision was either correct or incorrect (parameter R).
• sfeedback(T,Sb,H,R,C,D): counts the feedback received by the subject on the last D trials. For example, sfeedback(T,Sb,ttb,incorrect,2,d2) represents that the subject used the TTB heuristic in T and T − 1, and both times received negative feedback.
Static background. The following static background is defined to represent companies, their cue values, items, and the selection that the TTB and WADD heuristics would make for each of them.
• within(I,P,O1,O2): companies O1 and O2 are within item I. P represents the position of the screen where the correct company appears.
• correctobject(I,O): company O is the correct answer for item I.
• selects(H,I,O): heuristic H selects company O for item I.
• only(I,O,C,V): company O is the only one within item I that has a value of V for cue C. For example, if item i1 is formed by companies o1 and o2, only(i1,o1,c1,1) represents that o1 has a value of 1 for cue c1, while o2 has a value of 0.
• same(I,C): both companies of item I have the same value for cue C.
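The selects/3 facts for TTB and WADD are simply listed as part of this static background. Purely as an illustration of how the TTB facts could instead be derived (this clingo-style encoding is ours, not part of the paper, and it assumes an additional fact validity(C,N) giving each cue's validity as an integer), one could write:

% a cue discriminates within an item if exactly one company has value 1 for it
discriminates(I,C) :- only(I,_,C,1).
% TTB looks at the discriminating cue with the highest validity (ties would yield several candidates)
deciding(I,C) :- discriminates(I,C), validity(C,N),
                 N = #max { M : discriminates(I,C2), validity(C2,M) }.
% the company carrying the value 1 for the deciding cue is the one TTB selects
selects(ttb,I,O) :- deciding(I,C), only(I,O,C,1).

WADD could be encoded analogously by comparing, for each company of an item, the sum of the validities of its positive cues.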
Step 3. Data collection
We use the answers of 20 subjects from Study 1 of [Rieskamp and Otto, 2006] (the authors wish to thank Dr. Jörg Rieskamp for providing the data used in this section). For each subject, the results of the survey are represented as ground facts in a logic program, using the predicates of step 2. The following are examples of these:

show(t1,s1,i6,left). within(i6,left,o22,o24).
selectedheuristic(t1,s1,wadd). nansweredcorrectly(t1,s1,d0).
...

The first fact represents that, at trial t1, subject s1 was shown item i6, with the correct object at the left of the computer screen. The second fact represents that item i6 is formed by companies o22 and o24. Finally, the last two facts represent the answer of the subject: he has used the WADD heuristic and has answered incorrectly.
Step 4. Model construction
We use the system Iaction [Otero and Varela, 2006] to build an action theory representing how a subject selects heuristics. An action theory is built for each subject. The following is an action theory built by the system:

selectedheuristic(T,Sb,ttb) :- show(T,Sb,I,P), prev(T,Pt),
    feedback(Pt,Sb,wadd,incorrect,d0),
    showed(Pt,Sb,I2,P2,d0),
    only(I2,O,c6,1).
selectedheuristic(T,Sb,ttb) :- show(T,Sb,I,P), prev(T,Pt),
    nansweredcorrectly(Pt,Sb,d0),
    same(I,c6),
    itemselected(Pt,Sb,I,O1), within(I,P,O1,O2).
selectedheuristic(T,Sb,wadd) :- show(T,Sb,I,P), prev(T,Pt),
    answeredcorrectly(Pt,Sb,d0),
    only(I,O,c5,0), selects(ttb,I,O).

This set of action laws represents what makes a subject select a heuristic.
The first action law states that, if 1) the subject has used the WADD heuristic on the last two trials and answered incorrectly in both cases, and 2) for the last shown item cue c6 on the screen was different for the two companies, then she will use the TTB heuristic on the next trial.
The second action law states that, if 1) the subject answered incorrectly on the last trial, 2) the subject is shown an item where both companies have the same value for cue c6, and 3) the last time this item appeared she selected the object on the left of the computer screen, then she will select the TTB heuristic.
The third action law states that, if 1) the subject answered correctly on the last trial, 2) the subject is shown an item where both companies have a different value for cue c5, and 3) the TTB heuristic would select the company with a value of 0 in this cue, then she will apply the WADD heuristic.
Step 5. Reasoning
The action theory built in the previous step is combined with the background defined in step 2. The resulting model can be used to predict, explain and plan for subject answers.
Prediction. Given a sequence of trials, the model can be used to predict which heuristic the subject will use on each of them. For example, consider the problem summarized in Table 3. At trial 0 (t0), the subject has selected the WADD heuristic and has answered correctly. We now know that the subject will have to answer items i6 and i1, and that the correct object appears on the left and on the right of the screen, respectively. Table 3 shows the companies within each item, their cue values, and which company would be selected by the TTB and WADD heuristics. With this information, the goal is to decide which heuristics the subject will use to answer these two items.

Trials               t0     t1          t2
selectedheuristic    wadd   wadd        ttb
answeredcorrectly    yes    no          yes
actions (show)              i6, left    i1, right
companies                   o22   o24   o33   o23
cue values    c1            1     0     0     1
              c2            0     1     1     0
              c3            0     0     1     0
              c4            0     1     1     1
              c5            1     1     0     1
              c6            0     1     1     1

Table 3: Predicting heuristic selection problem.

To solve this problem we add the following rules to the logic program:

selectedheuristic(t0,s1,wadd). answeredcorrectly(t0,s1).
....
show(t1,s1,i6,left). show(t2,s1,i1,right).

First, we specify the state of the subject at trial t0 as a set of ground facts. Then, we specify the sequence of actions that the subject will be shown. With these rules we get a single solution, the prediction shown in Table 3:
selectedheuristic(t0,s1,wadd) selectedheuristic(t1,s1,wadd)
selectedheuristic(t2,s1,ttb)

Planning. In the planning task, the goal is to find a sequence of actions that would make the subject show a certain behavior, e.g. using a given heuristic or answering a question incorrectly. As an example, consider the same problem shown in Table 3. This time, however, we just know that the subject has used the WADD heuristic at trial 0, and we want her to use the TTB heuristic at trial 2.
To solve this problem we add the following rules to the program:

selectedheuristic(t0,s1,wadd). answeredcorrectly(t0,s1).
....
1 { show(T,Sb,I,P) : item(I) : position(P) } 1 :-
    prev(T,Pt), subject(Sb).
:- selectedheuristic(T,Sb,H),
   selectedheuristic(T,Sb,H2), H != H2.
:- not selectedheuristic(t2,s1,ttb).

First, we specify the state of the subject at trial t0 as in the prediction task. Then, we specify the planning problem using three rules. The first rule defines the set of possible solutions for the task, the second rule guarantees that the solutions found are consistent, and the last rule represents the goal of the problem. Running Clasp we get all possible solutions. For example:

show(t88,s107,i6,left) show(t89,s107,i1,right)

which is the same sequence used in the prediction example.
Discussion. To model the process of strategy selection, [Rieskamp and Otto, 2006] have proposed the SSL theory. According to this theory, subjects start with an initial preference for each heuristic. At each trial subjects use the most preferred heuristic, and they update their preferences depending on the performance of the heuristics (see [Rieskamp and Otto, 2006] for a formal definition). In a preliminary study SSL correctly predicted 80% of the trials where the results of TTB and WADD differ, and in the same setting the ILP+ASP method correctly predicted 88% of the trials. Note that with ILP+ASP the model is built automatically, without prior knowledge of how the process of strategy selection works. Moreover, the resulting model can provide new insight into this process. For example, the action theory constructed by Iaction suggests that cue c6, which appears at the bottom of the computer screen, could be relevant for the decisions of the subjects. Finally, we have seen how these models can be used in an ASP system to predict and plan about subject answers.
5 Conclusions
We have proposed a new method, ILP+ASP, for Experimental Psychology. It is a correct and complete method for the induction of logic programs that provides a general form of representation and can be used to solve relevant tasks like explanation and planning. We have applied the method for learning and reasoning in a real-world dynamic domain, thus improving on the methods used in Experimental Psychology, which do not consider these problems. As future work we will apply the method to other fields of Psychology where dynamic domains need to be modeled.
Acknowledgments. This research is partially supported by the Government of Spain, grant AP2008-03841, and in part by the Government of Galicia (Spain), under grant PGIDIT08-IN840C.
References
[Ajzen, 1985] I. Ajzen. From intentions to actions: a theory of planned behavior. Action-control: from cognition to behavior, pages 11–39, 1985.
[Balduccini and Girotto, 2010] M. Balduccini and S. Girotto. Formalization of psychological knowledge in answer set programming and its application. Theory Pract. Log. Program., 10:725–740, 2010.
[Gelfond and Lifschitz, 1991] M. Gelfond and V. Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9:365–385, 1991.
[Gigerenzer and Selten, 2002] G. Gigerenzer and R. Selten. Bounded Rationality: The Adaptive Toolbox. The MIT Press, 2002.
[King et al., 1996] R.D. King, S.H. Muggleton, A. Srinivasan, and M. Sternberg. Structure-activity relationships derived by machine learning: the use of atoms and their bond connectives to predict mutagenicity by inductive logic programming. Proceedings of the National Academy of Sciences, 93:438–442, 1996.
[Lifschitz, 2002] Vladimir Lifschitz. Answer set programming and plan generation. Artificial Intelligence, 138:39–54, 2002.
[McCarthy and Hayes, 1969] J. McCarthy and P. J. Hayes. Some philosophical problems from the standpoint of artificial intelligence. In Machine Intelligence, pages 463–502. Edinburgh University Press, 1969.
[Muggleton, 1995] S. H. Muggleton. Inverse entailment and Progol. New Generation Computing, 13:245–286, 1995.
[Nogueira et al., 2000] M. Nogueira, M. Balduccini, M. Gelfond, R. Watson, and M. Barry. An A-Prolog decision support system for the Space Shuttle. In PADL 2001, pages 169–183. Springer, 2000.
[Otero and Varela, 2006] R. Otero and M. Varela. Iaction, a system for learning action descriptions for planning. Proceedings of the 16th Int. Conference on Inductive Logic Programming, ILP-06. LNAI, 4455, 2006.
[Otero, 2003] R. Otero. Induction of the effects of actions by monotonic methods. Proceedings of the 13th Int. Conference on Inductive Logic Programming, ILP-03. LNAI, 2835:193–205, 2003.
[Rieskamp and Otto, 2006] J. Rieskamp and P. E. Otto. SSL: a theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135(2):219–238, 2006.
Tractable Strong Outlier Identification
Fabrizio Angiulli
[email protected]
ECSS Dept.
University of Calabria
Via P. Bucci, 41C, 87036
Rende (CS), Italy
Rachel Ben-Eliyahu–Zohary
[email protected]
Software Engineering Dept.
Jerusalem College of Engineering
Jerusalem, Israel
Luigi Palopoli
[email protected]
ECSS Dept.
University of Calabria
Via P. Bucci, 41C, 87036
Rende (CS), Italy
Abstract
In knowledge bases expressed in default logic, outliers are sets of literals, or observations, that feature unexpected properties. This paper introduces the notion of strong outliers and studies the complexity problems related to outlier recognition in the fragment of acyclic normal unary theories and the related one of mixed unary theories. We show that recognizing strong outliers in acyclic normal unary theories can be done in polynomial time and, moreover, that this result is sharp, since switching to either general outliers, cyclic theories or acyclic mixed unary theories makes the problem intractable. This is the only fragment of default theories known so far for which the general outlier recognition problem is tractable. Based on these results, we have designed a polynomial time algorithm for enumerating all strong outliers of bounded size in an acyclic normal unary default theory. These tractability results rely on the Incremental Lemma, which is also presented. This useful lemma provides conditions under which a mixed unary default theory displays a monotonic reasoning behavior.
1 Introduction
Detecting outliers is a premier task in data mining. Although there is no universal definition of outlier, it usually refers to an observation that appears to deviate markedly from the other observations or to be inconsistent with the remainder of the data (Hawkins, 1980). Applications of outlier detection include fraud detection, intrusion detection, activity and network monitoring, detecting novelties in various contexts, and many others (Hodge & Austin, 2004; Chandola, Banerjee, & Kumar, 2009).
Consider a rational agent acquiring information about the world stated in the form of a set of facts. It is analogously relevant to recognize if some of these facts disagree with her own view of the world. Obviously such a view has to be encoded somehow using one of the several KR&R formalisms defined and studied in the Artificial Intelligence literature. In particular, for such a formalism to be suitable to attack interesting KR problems it must be nonmonotonic, so that it is possible to naturally exploit defeasible reasoning schemas. Among the nonmonotonic knowledge representation formalisms, Reiter’s default logic (Reiter, 1980) occupies a preeminent and well-recognized role.
A recent paper (Angiulli, Zohary, & Palopoli, 2008) formally defined the outlier detection problem in the context of Reiter’s default logic knowledge bases and studied some associated computational problems. Following (Angiulli et al., 2008), this problem can be intuitively described as follows: outliers are sets of observations that demonstrate some properties contrasting with those that can be logically “justified” according to the given knowledge base. Thus, along with outliers, their witnesses, which are sets of observations encoding the unexpected properties associated with outliers, are singled out.
To illustrate this technique, consider a case where, during the same day, a credit card number is used several times to pay for services provided through the Internet. This sounds normal enough, but add to that the interesting fact that the payments are made through different IPs, each of which is located in a different country! It might be the case that the credit card owner is traveling on this particular day, but if the different countries from which the credit card is used are located in different continents we might get really suspicious about who has put his hands on these credit card numbers. Another way to put it is to say that the fact that the credit card number is used in different continents during the same day makes this credit card an outlier, and one of the probable explanations for such a phenomenon is that the credit card numbers have been stolen. This example is discussed further in Section 3.3.
As noted in (Angiulli et al., 2008), outlier detection problems are generally computationally quite hard, with their associated complexities ranging from DP-complete to DP3-complete, depending on the specific form of problem one decides to deal with. For this reason, (Angiulli, Zohary, & Palopoli, 2010) singled out several cases where a very basic outlier detection problem, that is, the problem of recognizing an outlier set and its witness set, can be solved in polynomial time.
A cumulative look at the results presented in (Angiulli et al., 2008, 2010) provides an idea of the tractability frontier associated with outlier detection problems in default logic. In this paper, we continue along this line of research and attempt to draw, as precisely as possible, such a tractability frontier. We want to depict the contour of a tractability region for outlier detection problems that refers to the well-known fragment of unary propositional default theories. In particular, motivated by the intractability of the general outlier recognition problem in all the classes of theories considered thus far in the literature, we investigate this problem within further subsets of the classes already studied, such as the fragment of acyclic normal unary theories and the related one of mixed unary theories. We also introduce a new type of outliers which we will call strong outliers. Informally speaking, acyclic normal unary theories are normal unary theories characterized by a bounded degree of cyclicity, while strong outliers are outliers characterized by a stronger relationship with their witness set than in the general case. In this context, we have been able to prove that recognizing strong outliers under acyclic normal unary theories can be done in polynomial time and, moreover, that this result is sharp, since switching either to general outliers, to cyclic theories or to acyclic mixed unary theories makes the problem intractable. Notably, this is the only fragment of default theories known so far for which the general outlier recognition problem is tractable. Based on these results, we designed a polynomial time algorithm for enumerating all strong outliers of bounded size in an acyclic normal unary default theory. This algorithm can also be employed to enumerate all strong outliers of bounded size in a general normal mixed unary theory and, with some minor modifications, all the general outlier and witness pairs of bounded size. However, in this latter case, since the problems at hand are NP-hard, its worst case running time is exponential, even if from a practical point of view it can benefit from some structural optimizations which allow the algorithm to reduce the size of the search space.
The rest of the paper is organized as follows. Section 2 recalls the definition of default logics and that of the outlier detection task in the framework of default reasoning. Section 3 introduces the definitions of mixed unary and acyclic unary default theories, the definition of strong outlier, and provides a roadmap of the technical results that will be presented in the rest of the paper. Section ?? presents intractability results, while Section ?? presents some computational characterizations of mixed unary theories, the tractability result concerning strong outlier recognition that completes the layout of the tractability frontier, and the polynomial time strong outlier enumeration algorithm for acyclic unary default theories. To conclude, Section 4 runs through the complexity results presented in Sections ?? and ?? once more, this time focusing on their complementarity and commenting also upon the application of the outlier enumeration algorithm within more general scenarios. The section ends with our conclusions. Due to space constraints, many proofs are omitted. All the proofs can be found in the full version of the paper, see (Angiulli, Zohary, & Palopoli, ).
2 Outlier Detection using Default Logic
Default logic was introduced by Reiter (Reiter, 1980).
We first recall basic facts about its propositional fragment. For T , a propositional theory, and S, a set of
propositional formulae, T ∗ denotes the logical closure
of T , and ¬S the set {¬(s)|s ∈ S}. A set of literals L is inconsistent if ¬ℓ ∈ L for some literal ℓ ∈ L.
Given a literal ℓ, letter(ℓ) denotes the letter in the literal ℓ. Given a set of literals L, letter(L) denotes the set
{A | A = letter(ℓ) for some ℓ ∈ L}.
2.1 Syntax
A propositional default theory ∆ is a pair (D, W ) where
W is a set of propositional formulae and D is a set of
default rules. We assume that both sets D and W are
finite. A default rule δ is
α : β1 , . . . , βm
γ
(1)
where α (called prerequisite), βi , 1 ≤ i ≤ m
(called justifications) and γ (called consequent) are
propositional formulae. For δ a default rule, pre(δ),
just(δ), and concl (δ) denote the prerequisite, justification, and consequent of δ, respectively. Analogously, given a set of default rules, D = {δ1 , . . . , δn },
pre(D), just(D), and concl (D) denote, respectively, the
sets {pre(δ1 ), . . ., pre(δn )}, {just(δ1 ), . . . , just(δn )}, and
{concl (δ1 ), . . . , concl (δn )}. The prerequisite may be
missing, whereas the justification and the consequent are
required (an empty justification denotes the presence of
the identically true literal true specified therein).
Next, we introduce some well-known subsets of propositional default theories relevant to our purposes.
Normal theories. If the conclusion of a default rule
is identical to the justification the rule is called normal.
A default theory containing only normal default rules is
called normal.
Disjunction-free theories. A propositional default
theory ∆ = (D, W ) is disjunction free (DF for short)
(Kautz & Selman, 1991), if W is a set of literals, and,
for each δ in D, pre(δ), just(δ), and concl (δ) are conjunctions of literals.
Normal mixed unary theories. A DF default theory is normal mixed unary (NMU for short) if its set of
defaults contains only rules of the form α:β/β, where α is
either empty or a single literal and β is a single literal.
Normal and dual normal unary theories. An
NMU default theory is normal unary (NU for short) if
the prerequisite of each default is either empty or positive. An NMU default theory is dual normal unary (DNU for short) if the prerequisite of each default is either
empty or negative.
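For instance (our own illustrative rules, not from the paper): b:c/c and ¬a:c/c are both NMU rules; the first is also an NU rule, since its prerequisite is positive, while the second is a DNU rule, since its prerequisite is negative; a rule with an empty prerequisite, such as :c/c, belongs to both NU and DNU.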
Figure 1 highlights the set-subset relationships between the above fragments of default logic.
Figure 1: A map of the investigated fragments of default theory.
2.2 Semantics
The informal meaning of a default rule δ is as follows: If pre(δ) is known to hold and if it is consistent to assume just(δ), then infer concl(δ). The formal semantics of a default theory ∆ is defined in terms of extensions. A set E is an extension for a theory ∆ = (D, W) if it satisfies the following set of equations:
• E0 = W,
• Ei+1 = Ei∗ ∪ { γ | α:β1,...,βm/γ ∈ D, α ∈ Ei, ¬β1 ∉ E, ..., ¬βm ∉ E }, for i ≥ 0,
• E = ⋃i≥0 Ei.
Given a default δ and an extension E, we say that δ is applicable in E if pre(δ) ∈ E and (∄c ∈ just(δ))(¬c ∈ E). It is well known that an extension E of a finite propositional default theory ∆ = (D, W) can be finitely characterized through the set DE of the generating defaults for E w.r.t. ∆ (Reiter, 1980; Zhang & Marek, 1990).
Next we introduce a characterization of an extension of a finite DF propositional theory which is based on a lemma from (Kautz & Selman, 1991).
Lemma 2.1 Let ∆ = (D, W) be a DF default theory; then E is an extension of ∆ if there exists a sequence of defaults δ1, ..., δn from D and a sequence of sets E0, E1, ..., En, such that for all i > 0:
• E0 = W,
• Ei = Ei−1 ∪ concl(δi),
• pre(δi) ⊆ Ei−1,
• (∄c ∈ just(δi))(¬c ∈ En),
• (∄δ ∈ D)(pre(δ) ⊆ En ∧ concl(δ) ⊈ En ∧ (∄c ∈ just(δ))(¬c ∈ En)),
• E is the logical closure of En,
where En is called the signature set of E and is denoted liter(E), and the sequence of rules δ1, ..., δn is the set DE of generating defaults of E. We assume that the set of generating defaults is maximal, that is, for every δ ∈ D, if δ is applicable in E then, for some 1 ≤ i ≤ n, δ = δi.
Although default theories are non-monotonic, normal default theories satisfy the property of semi-monotonicity (see Theorem 3.2 of (Reiter, 1980)). Semi-monotonicity in default logic means the following: Let ∆ = (D, W) and ∆′ = (D′, W) be two default theories such that D ⊆ D′; then for every extension E of ∆ there is an extension E′ of ∆′ such that E ⊆ E′.
A default theory may not have any extensions (an example is the theory ({:β/¬β}, ∅)). A default theory is called coherent if it has at least one extension, and incoherent otherwise. Normal default theories are always coherent. A coherent default theory ∆ = (D, W) is called inconsistent if it has just one extension which is inconsistent. By Theorem 2.2 of (Reiter, 1980), the theory ∆ is inconsistent iff W is inconsistent. The theories examined in this paper are always coherent and consistent, since only normal default theories (D, W) with W a consistent set of literals are taken into account.
The entailment problem for default theories is as follows: Given a default theory ∆ and a propositional formula φ, does every extension of ∆ contain φ? In the affirmative case, we write ∆ |= φ. For a set of propositional formulas S, we analogously write ∆ |= S to denote (∀φ ∈ S)(∆ |= φ).
2.3 Outliers in Default Logic
The issue of outlier detection in default theories is extensively discussed in (Angiulli et al., 2008). The formal definition of outlier there proposed is given as follows. For a given set W and a list of sets S1, ..., Sn, WS1,...,Sn denotes the set W \ (S1 ∪ S2 ∪ ... ∪ Sn).
Definition 2.2 (Outlier and Outlier Witness Set) (Angiulli et al., 2008) Let ∆ = (D, W) be a propositional default theory and let L ⊆ W be a set of literals. If there exists a non-empty subset S of WL such that:
1. (D, WS) |= ¬S, and
2. (D, WS,L) ⊭ ¬S
then L is an outlier set in ∆ and S is an outlier witness set for L in ∆.
The intuitive explanation of the different roles played by an outlier and its witness is as follows. Condition (i) of Definition 2.2 states that the outlier witness set S denotes something that does not agree with the knowledge encoded in the defaults. Indeed, by removing S from the theory at hand, we obtain ¬S. In other words, if S had not been observed, then, according to the given defaults, we would have concluded the exact opposite. Moreover, condition (ii) of Definition 2.2 states that the outlier L is a set of literals that, when removed from the theory, makes such a disagreement disappear. Indeed, by removing both S and L from the theory, ¬S is no longer obtained. In other words, the disagreement for S is a consequence of the presence of L in the theory. To summarize, the set S witnesses that the piece of knowledge
denoted by L behaves, in a sense, exceptionally; this tells us that L is an outlier set and S is its associated outlier witness set.
The intuition here is better illustrated by referring to the example on stolen credit card numbers given in the Introduction. A default theory ∆ = (D, W) that describes such an episode might be as follows:
– D = { CreditNumber : ¬MultipleIPs / ¬MultipleIPs },
– W = { CreditNumber, MultipleIPs }.
Here, the credit card number might be stolen, for otherwise it wouldn’t have been used over different continents during the same day. Accordingly, L = {CreditNumber} is an outlier set here, and S = {MultipleIPs} is the associated witness set. This reasoning agrees with our intuition that an outlier is, in some sense, abnormal and that the corresponding witness testifies to it.
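Spelling the two conditions out on this small theory (our own check, not in the original text): here S = {MultipleIPs} and L = {CreditNumber}. For condition 1, WS = {CreditNumber}; the default is applicable, so the unique extension of (D, WS) contains ¬MultipleIPs, i.e. (D, WS) |= ¬S. For condition 2, WS,L = ∅; the prerequisite CreditNumber is no longer available, the default cannot fire, and (D, WS,L) ⊭ ¬MultipleIPs. Both conditions of Definition 2.2 are therefore satisfied.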
Note that sets of outliers and their corresponding witness sets are selected among those explicitly embodied in
the given knowledge base. Hence, outlier detection using default reasoning is essentially a knowledge discovery
technique. As such, it can be very useful, to give one example, when applied to information systems for crime
prevention and homeland security, because the outlier
detection technique can be exploited in order to highlight suspicious individuals and/or events. Several examples for the usefulness of this approach are given in
(Angiulli et al., 2008) and in Section 3.3 below.
3 Charting the Tractability Frontier of Outlier Detection
Subsection 3.1 recalls two main outlier detection tasks
and the related complexity results known so far; Subsection 3.2 introduces acyclic theories and an interesting restriction of the above defined concept of outlier, that we
call strong outliers, and finally, Subsection 3.5 presents
the plan of a set of results that will allow us to chart
the tractability frontier in the context of propositional
normal mixed unary default theories.
3.1 Outlier Detection Problems
The computational complexity of discovering outliers in default theories under various classes of default logics has been previously investigated in (Angiulli et al., 2008). In particular, the two main recognition tasks in outlier detection are the Outlier Recognition and the Outlier-Witness Recognition problems (also called Outlier(L) and Outlier(S)(L), respectively, in (Angiulli et al., 2008)), and are defined as follows:
- Outlier Recognition Problem: Given a default theory ∆ = (D, W) and a set of literals L ⊆ W, is L an outlier set in ∆?
- Outlier-Witness Recognition Problem: Given a default theory ∆ = (D, W) and two sets of literals L ⊂ W and S ⊆ WL, is L an outlier set with witness set S in ∆?
Table 1 summarizes previous complexity results, together with the results that constitute the contributions of the present work that will be detailed later in this section.

Problem                      Outlier Type  General Default    DF Default        (D)NU Default    Acyclic (D)NU Default
Outlier Recognition          General       ΣP3-c  Th.4.3*     ΣP2-c  Th.4.3*    NP-c  Th.3.6**   NP-c  Th.3.9
Outlier Recognition          Strong        NP-hard  Th.3.13   NP-hard  Th.3.13  NP-c  Th.3.13    P  Th.3.14
Outlier-Witness Recognition  General       DP2-c  Th.4.6*     DP-c  Th.4.6*     P  Th.3.1**      P  Th.3.1**
Outlier-Witness Recognition  Strong        NP-hard  Th.3.4    NP-hard  Th.3.4   P  Th.3.3        P  Th.3.3

Table 1: Complexity results for outlier detection (* = reported in (Angiulli et al., 2008), ** = reported in (Angiulli et al., 2010)).

In particular, the complexity of the Outlier Recognition and the Outlier-Witness Recognition problems has been studied in (Angiulli et al., 2008) for general and disjunction-free (DF) default theories and in (Angiulli et al., 2010) for normal unary (NU) and dual normal unary (DNU) default theories. The results there pointed out that the general problem of recognizing an outlier set is always intractable (see Theorem 4.3 in (Angiulli et al., 2008) and Theorem 3.6 in (Angiulli et al., 2010)). As for recognizing an outlier together with its witness, this problem is intractable for general and disjunction-free default theories (see Theorem 4.6 in (Angiulli et al., 2008)), but can be solved in polynomial time if either NU or DNU default theories are considered. Regarding the latter result, it is interesting to note that, while for both NU and DNU default theories the entailment of a literal can be decided in polynomial time, deciding entailment in DF default theories is intractable.
Motivated by the intractability of the general Outlier Recognition problem on all classes of default theories considered so far, in this paper we take some further steps in analyzing the complexity of outlier detection problems in default logics in order to try to chart the associated tractability frontier. To this end, in the next sections we consider further subsets of the classes already mentioned, referred to as Acyclic Normal Unary and Acyclic Dual Normal Unary theories, and a specific kind of outlier, which we will call Strong Outliers. The latter, loosely speaking, are characterized by a stronger relationship with their witness set than in the general case. Then, in Subsection 3.5, the main results of our complexity analysis are overviewed.
3.2 Strong Outliers and Acyclic Theories
Next, the definitions of strong outlier set (Section 3.3) and of acyclic default theory (Section 3.4) are given.
3.3 Strong Outliers
Recall the definition of outlier set provided in Section 2.3 (Definition 2.2). Conditions 1 and 2 of that definition can be rephrased as follows:
1. (∀ℓ ∈ S)(D, WS) |= ¬ℓ, and
2. (∃ℓ ∈ S)(D, WS,L) ⊭ ¬ℓ.
In other words, condition 1 states that the negation of every literal ℓ ∈ S must be entailed by (D, WS) while, according to condition 2, it is sufficient for just one literal ℓ ∈ S to exist whose negation is not entailed by (D, WS,L). Hence, there is a sort of “asymmetry” between the two conditions, which is the direct consequence of the semantics of the entailment established for sets of literals.
It is clear that, at least from a purely syntactic point of view, the relationship between the outlier set and its
witness set can be strengthened by replacing the existential quantifier in Condition 2 with the universal one, thus breaking the aforementioned asymmetry between the two conditions and obtaining the following definition of strong outlier set.
Definition 3.1 (Strong Outlier) Let ∆ = (D, W) be a propositional default theory and let L ⊂ W be a set of literals. If there exists a non-empty subset S of WL such that:
1. (∀ℓ ∈ S)(D, WS) |= ¬ℓ, and
2. (∀ℓ ∈ S)(D, WS,L) ⊭ ¬ℓ
then L is a strong outlier set in ∆ and S is a strong outlier witness set for L in ∆.
The following proposition is immediately proved:
Proposition 3.2 If L is a strong outlier set then L is an outlier set.
Proof: Straightforward.
Note that, in general, the converse of Proposition 3.2 does not hold.
We study next the impact of restricting attention to strong outliers on the computational complexity of outlier detection problems. Before doing that, we first discuss the significance of the knowledge associated with strong outliers.
We begin with an example that is an extension of the credit card scenario presented in the Introduction. Recall that a credit card number is suspected to be stolen since it was used from several IPs in different continents during the same day. The example we give now is related to violating normal behavioral patterns in using a cellular phone. Normally, almost all the numbers that people call are from their contacts list. In addition, for each cell phone user, there are hours of the day during which she normally does not use the phone. For example, most users would not use the phone during the night hours. Finally, for a typical cellphone user, there is a list of locations from which she normally calls. The knowledge described above can be summarized using the following defaults:
1. CreditNumber : ¬MultipleIPs / ¬MultipleIPs – Normally, credit card numbers are not used in different continents during the same day;
2. CellUse : MfC / MfC – (MfC stands for “mostly from contacts”) Normally, numbers dialed from a cell phone are mostly from the contact list;
3. CellUse : ¬QuietTime / ¬QuietTime – Normally, people do not use the phone during their “quiet time”, e.g. late at night;
4. CellUse : ¬NewLocation / ¬NewLocation – Normally, cell phones are used in locations in which the device was used in the past.
Now, suppose that a pickpocket stole Michelle’s cellphone and purse from her handbag. She came home late and didn’t notice the theft till morning. While she was sleeping, the pickpocket could broadcast her credit card numbers through malicious servers over the Internet and use her cellphone to make expensive phone calls. A sophisticated crime prevention information system could automatically notice exceptional behaviors, and make the following observations:
QuietTime – calls are made from the device during abnormal hours;
¬MfC – it is not the case that most of the calls’ destinations are from the phone’s contact list;
NewLocation – the device is in a location where it hasn’t been before;
MultipleIPs – the credit card number has been used in different continents during the last day.
Let us now consider the default theory ∆ = (D, W), where D is the set of Defaults 1–4 introduced above, and W = {CreditNumber, CellUse, ¬MfC, QuietTime, NewLocation, MultipleIPs}. According to the definition of outlier given in (Angiulli et al., 2008) (see Definition 2.2), we get that L = {CreditNumber} is an outlier and S = {¬MfC, NewLocation, QuietTime, MultipleIPs} is a possible witness set for L. This last witness set is also a witness set for the outlier {CellUse}. However, although the observations ¬MfC and QuietTime are in the witness set of the outlier {CreditNumber}, they do not explain why
{CreditNumber} is an outlier. Similarly, the observation MultipleIPs is in the witness set of the outlier {CellUse} but it does not explain why {CellUse} is an outlier.

One might suggest that, in order to improve the behavior demonstrated above, we should look for a minimal witness set. However, it seems to us counterintuitive to look for a minimal witness set. If we identify an outlier, we would like to have a maximal set of the observations that support our suspicion. In the example above, {¬MfC} is a minimal witness set for the outlier {CellUse}, but its superset {¬MfC, NewLocation, QuietTime}, which is also a witness set for the outlier {CellUse}, provides more information.

The notion of strong outlier presented above seems to adequately capture, in scenarios such as the one depicted in this example, the notion of outlier and its witness set. If we use the definition of strong outlier, we get that S = {¬MfC, NewLocation, QuietTime, MultipleIPs} is neither a witness set for the outlier {CreditNumber} nor a witness set for the outlier {CellUse}. A witness set for the outlier {CellUse} is, instead, the set {¬MfC, NewLocation, QuietTime} or any of its nonempty subsets, while a witness set for the outlier {CreditNumber} is the set {MultipleIPs}.
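To make the two conditions of Definition 3.1 concrete, the following is a minimal sketch (not taken from the paper) of how a candidate pair (L, S) could be tested and how strong outlier sets could be enumerated by brute force. It assumes a hypothetical oracle skeptically_entails(D, W, lit) deciding whether a literal holds in every extension of the default theory (D, W), and it encodes literals as strings with a '~' prefix for negation; both are illustrative choices, not notation from the paper.

```python
from itertools import chain, combinations

def is_strong_outlier_witness(D, W, L, S, skeptically_entails):
    """Check Definition 3.1 for a candidate outlier set L and witness set S.

    D is the set of defaults, W a set of literals, L a subset of W,
    S a non-empty subset of W \\ L.  skeptically_entails(D, W, lit) is an
    assumed oracle for skeptical entailment in default logic.
    """
    W_S = W - S            # W with the witness literals removed
    W_SL = W - S - L       # additionally remove the candidate outlier literals
    neg = lambda lit: lit[1:] if lit.startswith('~') else '~' + lit
    cond1 = all(skeptically_entails(D, W_S, neg(l)) for l in S)       # condition 1
    cond2 = all(not skeptically_entails(D, W_SL, neg(l)) for l in S)  # condition 2
    return cond1 and cond2

def strong_outliers(D, W, k, skeptically_entails):
    """Naively enumerate all strong outlier sets of size at most k."""
    def subsets(xs, max_size=None):
        xs = list(xs)
        top = len(xs) if max_size is None else max_size
        return chain.from_iterable(combinations(xs, r) for r in range(1, top + 1))
    out = set()
    for L in subsets(W, k):
        L = frozenset(L)
        for S in subsets(W - L):
            if is_strong_outlier_witness(D, W, L, frozenset(S), skeptically_entails):
                out.add(L)
                break
    return out
```

For NU theories such an oracle is polynomial time, as recalled later in the paper; the exponential blow-up here comes only from the unrestricted enumeration of S and L, which is exactly what the results below address.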
We now turn to the complexity issues. In order to mark the tractability landscape of the new strong outlier detection problem, we provide two results, the former regarding the tractability of the outlier-witness recognition problem and the latter pertaining to its intractability.

Theorem 3.3 Strong Outlier-Witness Recognition on propositional NU default theories is in P.
Proof: The proof is immediate since the statement follows from the definition of strong outlier set (Definition 3.1) and the fact that the entailment problem on propositional NU default theories is polynomial time solvable (as proved in (Kautz & Selman, 1991; Zohary, 2002)).

As for the complexity of the Strong Outlier-Witness Recognition problem on propositional DF and general default theories, the following statement holds.

Theorem 3.4 Strong Outlier-Witness Recognition on propositional DF default theories is NP-hard.
Proof: The statement follows from the reduction employed in Theorem 4.6 of (Angiulli et al., 2008), where it is proved that given two DF default theories ∆1 = (D1, ∅) and ∆2 = (D2, ∅), and two letters s1 and s2, the problem q of deciding whether ((∆1 |= s1) ∧ (∆2 |= s2)) is valid can be reduced to the outlier-witness problem; that is, to the problem of deciding whether L = {s2} is an outlier having witness set S = {¬s1} in the theory ∆(q), where ∆(q) = (D(q), W(q)) is the propositional DF default theory with D(q) = { (s2 ∧ α : β / β) | (α : β / β) ∈ D1 } ∪ D2 and W(q) = {¬s1, s2}. Since the former problem is NP-hard, it follows from the reduction that the latter problem is NP-hard as well. In order to complete the proof, we note that a singleton witness set is always a strong witness set and, hence, the above reduction immediately applies to strong outliers as well.

3.4 Acyclic NU and DNU theories

In this section, acyclic normal mixed unary default theories are defined. We begin by introducing the notions of atomic dependency graph and of tightness of an NMU default theory.

Definition 3.5 (Atomic Dependency Graph) Let ∆ = (D, W) be an NMU default theory. The atomic dependency graph (V, E) of ∆ is a directed graph such that
– V = {l | l is a letter occurring in ∆}, and
– E = {(x, y) | letters x and y occur respectively in the prerequisite and the consequent of a default in D}.

Definition 3.6 (A set influences a literal) Let ∆ = (D, W) be an NMU default theory. We say that a set of literals S influences a literal l in ∆ if for some t ∈ S there is a path from letter(t) to letter(l) in the atomic dependency graph of ∆.

Definition 3.7 (Tightness of an NMU theory) The tightness c of an NMU default theory is the size c (in terms of number of atoms) of the largest strongly connected component (SCC) of its atomic dependency graph.

Intuitively, an acyclic NMU default theory is a theory whose degree of cyclicity is fixed, where its degree of cyclicity is measured by means of its tightness, as formalized in the following definition.

Definition 3.8 (Acyclic NMU theory) Given a fixed positive integer c, an NMU default theory is said to be (c-)acyclic if its tightness is not greater than c.

Figure 1 in Section 2 highlights the containment relationships among DF, NMU, NU, DNU, and acyclic default theories. For the sake of simplicity, in the following sections we refer to c-acyclic theories simply as acyclic theories.
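The atomic dependency graph and the tightness measure of Definitions 3.5 and 3.7 are easy to compute. The sketch below is an illustration under stated assumptions, not code from the paper: defaults are encoded as (prerequisite, justification, consequent) triples of literals with a '~' prefix for negation, and only the atoms occurring in defaults are taken as graph nodes, which is what matters for the size of the largest SCC.

```python
import networkx as nx

def letter(lit):
    """Atom underlying a literal; literals are encoded as 'p' or '~p' (an assumption)."""
    return lit.lstrip('~')

def atomic_dependency_graph(D):
    """Definition 3.5: one edge from the prerequisite's atom to the consequent's atom
    for every default, here encoded as (prerequisite, justification, consequent) triples."""
    G = nx.DiGraph()
    for prereq, _just, conseq in D:
        G.add_node(letter(conseq))
        if prereq is not None:                 # prerequisite-free defaults add no edge
            G.add_edge(letter(prereq), letter(conseq))
    return G

def tightness(D):
    """Definition 3.7: size of the largest SCC of the atomic dependency graph."""
    G = atomic_dependency_graph(D)
    if G.number_of_nodes() == 0:
        return 0
    return max(len(scc) for scc in nx.strongly_connected_components(G))

# The cellphone defaults of the running example yield an acyclic graph,
# so the theory has tightness 1 (it is 1-acyclic).
D = [('CreditNumber', '~MultipleIPs', '~MultipleIPs'),
     ('CellUse', 'MfC', 'MfC'),
     ('CellUse', '~QuietTime', '~QuietTime'),
     ('CellUse', '~NewLocation', '~NewLocation')]
print(tightness(D))   # -> 1
```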
3.5 Main Results
It is clear from the definition of outlier that tractable
subsets for outlier detection problems necessarily have
to be singled out by considering theories for which the
entailment operator is tractable. Thus, with the aim
of identifying tractable fragments for the outlier recognition problem, we have investigated its complexity on
acyclic (dual) normal unary default theories. These theories form a strict subset of normal unary default theories
already considered in (Angiulli et al., 2010) (other than
being a subset of acyclic NMU theories), for which the
entailment problem is indeed polynomially time solvable
(proved in (Kautz & Selman, 1991; Zohary, 2002)) and
for which the outlier recognition problem is known to be
NP-complete (Th. 3.6 of (Angiulli et al., 2010)).
procedure can be built that enumerates all the potential witness sets S for the outlier L and checks that S is
actually a witness set for L.
The formal proofs of Theorem 3.14 and of Lemmas 3.10 and 3.11 are reported in the full paper (see (Angiulli et al., submitted)). It is important to note that Lemma 3.11 cannot actually be exploited to prove the tractability of
the Strong Outlier Recognition for NMU theories since
for these theories the entailment problem remains intractable. Indeed, we show in the full paper that deciding the entailment is co-NP-complete even for NMU
theories with tightness one.
This latter result is complemented by the following
one.
Unexpectedly, it turns out that recognizing general
outliers is intractable even in this rather restricted class
of default theories, as accounted for in the theorem whose
statement is reported below.
Theorem 3.9 Outlier Recognition for NU acyclic default theories is NP-complete.
Note that the results for NU (resp. DNU) theories immediately apply to DNU (resp. NU) theories, since the dual theory of an NU (resp. DNU) theory ∆, obtained from ∆ by replacing each literal ℓ in ∆ with ¬ℓ, is a DNU (resp. NU) theory with the same properties as ∆.
Unfortunately, this result confirms that detecting outliers even in default theories as structurally simple
as acyclic NU and DNU ones remains inherently intractable. Therefore, in order to chart the tractability frontier for this problem, we looked into the case
of strong outliers. To characterize the complexity of
this problem, a technical lemma, called the incremental
lemma, is needed. The Incremental Lemma provides an
interesting monotonicity characterization in NMU theories which is valuable on its own. The statement of the
incremental lemma is reported next.
Theorem 3.13 Strong Outlier Recognition for NU
cyclic default theories is NP-complete.
In particular, both Theorems 3.9 and 3.13 make use of
a lemma that informally speaking, establishes that, despite the difficulty to encode the conjunction of a set of
literals using a NU theory, a CNF formula can nonetheless be evaluated by means of condition 1 of Definition
2.2 applied to an acyclic NU theory, provided that the
size of S is polynomial in the number of conjuncts in the
formula.
The tractability result complements the intractability results, since Lemma 3.11 establishes that the size of a minimal strong outlier witness set is upper bounded by the tightness of the NU theory.

Recognizing strong outliers under acyclic (dual) normal unary default theories is the only outlier recognition problem known so far to be tractable. Furthermore, this result is indeed sharp, since switching either to general outliers, to cyclic theories, or to acyclic NMU theories makes the problem intractable.
Based on the above results, we designed a polynomial time algorithm for enumerating all strong outliers
of bounded size in an acyclic (dual) normal unary default
theory. The algorithm can also be employed to enumerate all strong outliers of bounded size in a general NMU
theory and, with some minor modifications, all the general outliers and witness pairs of bounded size. However,
in this latter case, since the problems at hand are NPhard, its worst case running time will remain exponential, even if from a practical point of view it can benefit
from some structural optimizations, based on Lemmas
3.10 and 3.11, which would allow it to reduce the size of
the search space.
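The enumeration procedure referred to here is spelled out as Algorithm Outlier Enumeration in Figure 2 below. The following sketch is only an illustration of its loop structure, not the authors' implementation: it again delegates skeptical entailment to an assumed oracle, and the helpers sccs, letter and neg (the ordered SCC list, the atom of a literal, and literal complementation) are assumed to be supplied by the caller. Restricting candidate witness sets to single SCCs is what keeps the search polynomial for fixed tightness c and bound k.

```python
from itertools import chain, combinations

def nonempty_subsets(xs, max_size=None):
    xs = list(xs)
    top = len(xs) if max_size is None else min(max_size, len(xs))
    return chain.from_iterable(combinations(xs, r) for r in range(1, top + 1))

def enumerate_strong_outliers(D, W, sccs, k, skeptically_entails, letter, neg):
    """Sketch of Algorithm Outlier Enumeration (Figure 2).

    sccs is the list C1, ..., CN of SCCs of the atomic dependency graph,
    ordered so that no later component reaches an earlier one;
    skeptically_entails is an assumed entailment oracle for NU theories.
    """
    out = set()
    seen = set()                                   # atoms of C1 ∪ ... ∪ Ci
    for C in sccs:
        seen |= set(C)
        # candidate witness sets S: literals of W whose atoms lie in the current SCC
        local = [l for l in W if letter(l) in C]
        for S in map(frozenset, nonempty_subsets(local)):
            W_S = W - S
            if not all(skeptically_entails(D, W_S, neg(l)) for l in S):
                continue
            # candidate outlier sets L of size at most k drawn from W \ S
            candidates = [l for l in W_S if letter(l) in seen]
            for L in map(frozenset, nonempty_subsets(candidates, k)):
                if all(not skeptically_entails(D, W_S - L, neg(l)) for l in S):
                    out.add(L)
    return out
```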
All complexity results presented in this work, together with those already presented in the literature, are summarized in Table 1, where the problems lying on the tractability frontier are underlined.
Lemma 3.10 [The Incremental Lemma] Let (D, W ) be
an NMU default theory, q a literal and S a set of literals
such that W ∪ S is consistent and S does not influence
q in (D, W ). Then the following hold:
Monotonicity of brave reasoning: If q is in some extension of (D, W ) then q is in some extension of
(D, W ∪ S).
Monotonicity of skeptical reasoning: If q is in every extension of (D, W ) then q is in every extension of
(D, W ∪ S).
This lemma helps us to state an upper bound on the size
of any minimal outlier witness set in an acyclic NMU
(and, hence, also NU and DNU) default theory.
Lemma 3.11 Let (D, W ) be a consistent NMU default
theory and let L be a set of literals in W . If S is a
minimal strong outlier witness set for L in (D, W ), then
letter(S) is a subset of a SCC in the atomic dependency
graph of (D, W ).
Taken together, the following tractability result can be
proved.
Theorem 3.12 Strong Outlier Recognition for NU
acyclic default theories is in P.
The proof is informally as follows. Since by Lemma 3.11
the size of a minimal strong outlier witness set is upper bounded by c, where c is the tightness of the theory,
then the number of potential witness sets is polynomially bounded in the size of the theory. Moreover, checking conditions of Definition 2.2 can be done in polynomial time on NU theories, as the associated entailment is
tractable. Based on these properties, a polynomial time
Theorem 3.14 Strong Outlier Recognition for NU
acyclic default theories is in P.
Proof: Given a NU default theory (D, W ) of tightness c
and a set of literals L from W , by Lemma 3.11 a minimal
outlier witness set S for L in (D, W ) has a size of at most
c, where c is the maximum size of an SCC in the atomic dependency graph of (D, W). Thus, the strong outlier recognition problem can be decided by solving the strong outlier-witness recognition problem for each subset S of literals in W_L having a size of at most c. Since the latter problem is polynomial time solvable (by Theorem 3.3) and since the number of times it has to be evaluated, that is O(|W|^c), is polynomial in the size of the input, the depicted procedure solves the strong outlier recognition problem in polynomial time.

To take a step further and present an outlier enumeration algorithm, a known proposition is recalled.

Proposition 3.15 (proved in (Kautz & Selman, 1991; Zohary, 2002)) Let ∆ be an NU or a DNU propositional default theory and let L be a set of literals. Deciding whether ∆ |= L is O(n^2), where n is the size of the theory ∆.

Based on the above properties, we are now ready to describe the algorithm Outlier Enumeration which, for a fixed integer k, enumerates in polynomial time all the strong outlier sets of size at most k in an acyclic NU default theory.

Input: ∆ = (D, W) – a NU default theory.
Output: Out – the set of all strong outlier sets L in ∆ s.t. |L| ≤ k.
let C1, . . . , CN be the ordered SCCs in the atomic dependency graph of ∆;
set Out to ∅;
for i = 1..N do
  for all S ⊂ W s.t. letter(S) ⊆ Ci do
    if (∀ℓ ∈ S)(D, W_S) |= ¬ℓ then
      for all L ⊆ W_S s.t. |L| ≤ k and letter(L) ⊆ (C1 ∪ . . . ∪ Ci) do
        if (∀ℓ ∈ S)(D, W_{S,L}) ⊭ ¬ℓ then
          set Out to Out ∪ {L};
        end if
      end for
    end if
  end for
end for

Figure 2: Algorithm Outlier Enumeration.

The algorithm is presented in Figure 2. The SCCs C1, . . . , CN of the atomic dependency graph of the theory are ordered such that there do not exist Ci and Cj with i < j and two letters l ∈ Ci and q ∈ Cj such that there exists a path from q to l.

By Proposition 3.15, the cost of steps 5 and 7 is O(n^2). Thus, the cost of the algorithm is O(2^c (n/c) · (c n^2 + n^k c n^2)) = O(2^c n^{k+3}). Since c and k are fixed, the algorithm enumerates the strong outliers in polynomial time in the size of (D, W). For example, all the singleton strong outlier sets can be enumerated in time O(n^4).

4 Discussion and Conclusions

In this paper we have analyzed the tractability border associated with outlier detection in default logics. From Theorems 3.9 and 3.13, it is clear that neither acyclicity nor strongness alone is sufficient to achieve tractability. However, if both constraints are imposed together, the complexity of the outlier recognition problem falls below the tractability frontier, as shown in Theorem 3.14.

Overall, the results and arguments reported in this paper indicate that outlier recognition, even in its strong version, remains challenging and difficult on default theories. The tractability results we have provided nonetheless indicate that there are significant cases which can be efficiently implemented. A complete package for performing outlier detection in general default theories might therefore try to attain reasonable efficiency by recognizing such tractable fragments. Techniques by which the outlier detection task in default logics can be rendered practically affordable remain a major subject area for future research.

References

Angiulli, F., Zohary, R. B.-E., & Palopoli, L. Tractable strong outlier identification. Submitted.
Angiulli, F., Zohary, R. B.-E., & Palopoli, L. (2008). Outlier detection using default reasoning. Artificial Intelligence, 172 (16-17), 1837–1872.
Angiulli, F., Zohary, R. B.-E., & Palopoli, L. (2010). Outlier detection for simple default theories. Artificial Intelligence, 174 (15), 1247–1253.
Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys, 41 (3).
Hawkins, D. (1980). Identification of Outliers. Chapman and Hall, London, New York.
Hodge, V. J., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22 (2), 85–126.
Kautz, H. A., & Selman, B. (1991). Hard problems for simple default logics. Artificial Intelligence, 49 (1-3), 243–279.
Reiter, R. (1980). A logic for default reasoning. Artificial Intelligence, 13 (1-2), 81–132.
Zhang, A., & Marek, W. (1990). On the classification and existence of structures in default logic. Fundamenta Informaticae, 13 (4), 485–499.
Zohary, R. B.-E. (2002). Yet some more complexity results for default logic. Artificial Intelligence, 139 (1), 1–20.
Topics in Horn Contraction: Supplementary Postulates, Package Contraction, and
Forgetting
James P. Delgrande
School of Computing Science,
Simon Fraser University,
Burnaby, B.C.,
Canada V5A 1S6.
[email protected]
Renata Wassermann
Department of Computer Science
University of São Paulo
05508-090 São Paulo,
Brazil
[email protected]
Abstract

In recent years there has been interest in studying belief change, specifically contraction, in Horn knowledge bases. Such work is arguably interesting since Horn clauses have found widespread use in AI; as well, since Horn reasoning is weaker than classical reasoning, this work also sheds light on the foundations of belief change. In this paper, we continue our previous work along this line. Our earlier work focussed on defining contraction in terms of weak remainder sets, or maximal subsets of an agent's belief set that fail to imply a given formula. In this paper, we first examine issues regarding the extended contraction postulates with respect to Horn contraction. Second, we examine package contraction, or contraction by a set of formulas. Last, we consider the closely-related notion of forgetting in Horn clauses. This paper then serves to address remaining major issues concerning Horn contraction based on remainder sets.

1 Introduction

Belief change addresses how a rational agent may alter its beliefs in the presence of new information. The best-known approach in this area is the AGM paradigm [Alchourrón et al., 1985; Gärdenfors, 1988], named after the original developers. This work focussed on belief contraction, in which an agent may reduce its stock of beliefs, and belief revision, in which new information is consistently incorporated into its belief corpus. In this paper we continue work in belief contraction in the expressively weaker language of Horn formulas, where a Horn formula is a conjunction of Horn clauses and a Horn clause can be written as a rule in the form a1 ∧ a2 ∧ · · · ∧ an → a for n ≥ 0, and where a, ai (1 ≤ i ≤ n) are atoms. (Thus, expressed in conjunctive normal form, a Horn clause will have at most one positive literal.) Horn contraction has been addressed previously in [Delgrande, 2008; Booth et al., 2009; Delgrande and Wassermann, 2010; Zhuang and Pagnucco, 2010b]. With the exception of the last reference, this work centres on the notion of a remainder set, or maximal subset of a knowledge base that fails to imply a given formula.

In this paper we continue work in Horn belief contraction, on a number of aspects; our goal is to essentially complete the overall framework of Horn contraction based on remainder sets. Previous work in this area has addressed counterparts to the basic AGM postulates; consequently we first examine prospects for extending the approach to counterparts of the supplemental AGM postulates. Second, we address package contraction, in which one may contract by a set of formulas, and the result is that no (contingent) formula in the set is believed. In the AGM approach, for a finite number of formulas this can be accomplished by contracting by the disjunction of the formulas. Since the disjunction of Horn formulas may not be in Horn form, package contraction then becomes an important accessory operation. Last we briefly examine a forgetting operator, in which one effectively reduces the language of discourse.

The next section introduces belief change while the third section discusses Horn clause reasoning, and previous work in the area. Section 4 examines the supplementary postulates; Section 5 addresses package contraction; and Section 6 covers forgetting. The last section contains a brief conclusion.

2 The AGM Framework for Contraction

As mentioned, the AGM approach [Alchourrón et al., 1985; Gärdenfors, 1988] is the best-known approach to belief change. Belief states are modelled by deductively-closed sets of sentences, called belief sets, where the underlying logic includes classical propositional logic. Thus a belief set K satisfies the constraint:

If K logically entails φ then φ ∈ K.
The most basic operator is called expansion: For belief set
K and formula φ, the expansion of K by φ, K + φ, is the
deductive closure of K ∪ {φ}. Of more interest are contraction, in which an agent reduces its set of beliefs, and revision, in which an agent consistently incorporates a new belief.
These operators can be characterised by two means. First, a
set of rationality postulates for a belief change function may
be provided; these postulates stipulate constraints that should
govern any rational belief change function. Second, specific
constructions for a belief change function are given. Representation results can then be given (or at least are highly
desirable) showing that a set of rationality postulates exactly
captures the operator given by a particular construction.
Our focus in this paper is on belief contraction, and so we
review these notions with respect to this operator. Informally,
the contraction of a belief set by a formula is a belief set in
which that formula is not believed. Formally, a contraction
function −̇ is a function from 2L × L to 2L satisfying the
following postulates:
(K −̇1) K −̇φ is a belief set.
(K −̇2) K −̇φ ⊆ K.
(K −̇3) If φ 6∈ K, then K −̇φ = K.
(K −̇4) If not ⊢ φ, then φ 6∈ K −̇φ.
(K −̇5) If φ ∈ K, then K ⊆ (K −̇φ) + φ.
(K −̇6) If ⊢ φ ≡ ψ, then K −̇φ = K −̇ψ.
(K −̇7) K −̇φ ∩ K −̇ψ ⊆ K −̇(φ ∧ ψ).
(K −̇8) If ψ 6∈ K −̇(ψ ∧ φ) then K −̇(φ ∧ ψ) ⊆ K −̇ψ.
The first six postulates are called the basic contraction postulates, while the last two are referred to as the supplementary
postulates. We have the following informal interpretations of
the postulates: contraction yields a belief set (K −̇1) in which
the sentence for contraction φ is not believed (unless φ is a
tautology) (K −̇4). No new sentences are believed (K −̇2),
and if the formula is not originally believed then contraction
has no effect (K −̇3). The fifth postulate, the so-called recovery postulate, states that nothing is lost if one contracts
and expands by the same sentence. This postulate is controversial; see for example [Hansson, 1999]. The sixth postulate asserts that contraction is independent of how a sentence is expressed. The last two postulates express relations
between contracting by conjunctions and contracting by the
constituent conjuncts. (K −̇7) says that if a formula is in the
result of contracting by each of two formulas then it is in the
result of contracting by their conjunction. (K −̇8) says that if
a conjunct is not in the result of contracting by a conjunction,
then contracting by that conjunct is (using (K −̇7)) the same
as contracting by the conjunction.
Several constructions have been proposed to characterise
belief change. The original construction was in terms of remainder sets, where a remainder set of K with respect to φ is
a maximal subset of K that fails to imply φ. Formally:
Definition 1 Let K ⊆ L and let φ ∈ L.
K ↓ φ is the set of sets of formulas s.t. K ′ ∈ K ↓ φ iff
1. K ′ ⊆ K
2. K ′ 6⊢ φ
3. For any K ′′ s.t. K ′ ⊂ K ′′ ⊆ K, it holds that K ′′ ⊢ φ.
X ∈ K ↓ φ is a remainder set of K wrt φ.
From a logical point of view, the remainder sets comprise
equally-good candidates for a contraction function. Selection functions are introduced to reflect the extra-logical factors that need to be taken into account, to obtain the “best” or
most plausible remainder sets. In maxichoice contraction, the
selection function determines a single selected remainder set
as the contraction. In partial meet contraction, the selection
function returns a subset of the remainder sets, the intersection of which constitutes the contraction. Thus if the selection
function is denoted by γ(·), then the contraction of K by formula φ can be expressed by
K −̇φ = ⋂ γ(K ↓ φ).
For arbitrary theory K and function −̇ from 2^L × L to 2^L, it proves to be the case that −̇ is a partial meet contraction function iff it satisfies the basic contraction postulates (K −̇1)–(K −̇6). Last, let ⪯ be a transitive relation on 2^K, and let the selection function be defined by:
γ(K ↓ φ) = {K′ ∈ K ↓ φ | ∀K″ ∈ K ↓ φ, K″ ⪯ K′}.
γ is a transitively relational selection function, and −̇ defined
in terms of such a γ is a transitively relational partial meet
contraction function. Then we have:
Theorem 1 ([Alchourrón et al., 1985]) Let K be a belief set
and let −̇ be a function from 2L × L to 2L . Then
1. −̇ is a partial meet contraction function iff it satisfies the
contraction postulates (K −̇1)–(K −̇6).
2. −̇ is a transitively relational partial meet contraction
function iff it satisfies the contraction postulates (K −̇1)–
(K −̇8).
The second major construction for contraction functions is
called epistemic entrenchment. The general idea is that extralogic factors related to contraction are given by an ordering
on formulas in the agent’s belief set, reflecting how willing
the agent would be to give up a formula. Then a contraction
function can be defined in terms of removing less entrenched
formulas from the belief set. It is shown in [Gärdenfors and
Makinson, 1988] that for logics including classical propositional logic, the two types of constructions, selection functions over remainder sets and epistemic entrenchment orderings, capture the same class of contraction functions; see also
[Gärdenfors, 1988] for details.
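As a small aside, the remainder-set construction of Definition 1 and the partial meet operation can be illustrated computationally. The sketch below is only an illustration under stated assumptions: it works over a finite belief base rather than a deductively closed belief set, formulas are opaque hashable objects, and entails(A, phi) is an assumed propositional entailment test supplied by the caller.

```python
from itertools import combinations

def remainder_sets(K, phi, entails):
    """Definition 1 over a finite base K: maximal subsets of K not entailing phi.
    entails(A, phi) is an assumed propositional entailment test."""
    candidates = []
    for r in range(len(K), -1, -1):                 # try larger subsets first
        for Kp in map(frozenset, combinations(sorted(K), r)):
            if entails(Kp, phi):
                continue
            if any(Kp < other for other in candidates):
                continue                            # strictly contained in a remainder already found
            candidates.append(Kp)
    return candidates

def partial_meet_contraction(K, phi, entails, select):
    """K -. phi as the intersection of the remainder sets chosen by the selection
    function `select` (maxichoice if it always returns a single remainder)."""
    rems = remainder_sets(K, phi, entails)
    chosen = select(rems) if rems else [frozenset(K)]   # convention: contract by a tautology leaves K
    return frozenset.intersection(*map(frozenset, chosen))
```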
3 Horn Theories and Horn Contraction
3.1 Preliminary Considerations
Let P = {a, b, c, . . . } be a finite set of atoms, or propositional
letters, that includes the distinguished atom ⊥. LH is the
language of Horn formulas. That is, LH is given by:
1. Every p ∈ P is a Horn clause.
2. a1 ∧ a2 ∧ · · · ∧ an → a, where n ≥ 0, and a, ai (1 ≤
i ≤ n) are atoms, is a Horn clause.
3. Every Horn clause is a Horn formula.
4. If φ and ψ are Horn formulas then so is φ ∧ ψ.
For a rule r as in 2 above, head(r) is a, and body(r) is the set
{a1 , a2 , . . . , an }. Allowing conjunctions of rules, as given in
4, adds nothing of interest to the expressivity of the language
with respect to reasoning. However, it adds to the expressibility of contraction, as we are able to contract by more than
a single Horn clause. For convenience, we use ⊤ to stand for
some arbitrary tautology.
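The Horn clauses just described are easy to manipulate as data. The following sketch fixes one possible representation (a frozenset body and a head atom, with ⊥ treated as an ordinary atom named 'bottom'); the representation and identifiers are illustrative assumptions, not notation from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet

@dataclass(frozen=True)
class HornClause:
    """A rule a1 ∧ ... ∧ an → a, with body(r) = {a1, ..., an} and head(r) = a.
    An atomic clause p has an empty body; ⊥ is represented as the atom 'bottom'."""
    body: FrozenSet[str]
    head: str

def satisfies(interpretation, clause):
    """An interpretation is the set of atoms assigned true ('bottom' must not be in it).
    The rule is violated only when its whole body is true and its head is false."""
    return not (clause.body <= interpretation and clause.head not in interpretation)

def is_model(interpretation, clauses):
    return all(satisfies(interpretation, c) for c in clauses)

# The theory {p ∧ q → ⊥, r} used later in this section: over {p, q, r} its models
# are exactly {p, r}, {q, r} and {r}.
theory = [HornClause(frozenset({'p', 'q'}), 'bottom'), HornClause(frozenset(), 'r')]
print([m for m in ({'p', 'r'}, {'q', 'r'}, {'r'}, {'p', 'q', 'r'}) if is_model(set(m), theory)])
```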
An interpretation of LH is a function from P to
{true, f alse} such that ⊥ is assigned f alse. Sentences of
LH are true or false in an interpretation according to the standard rules in propositional logic. An interpretation M is a
model of a sentence φ (or set of sentences), written M |= φ,
just if M makes φ true. M od(φ) is the set of models of
formula (or set of formulas) φ; thus M od(⊤) is the set of
interpretations of LH . An interpretation is usually identified with the atoms true in that interpretation. Thus, for
P = {p, q, r, s} the interpretation {p, q} is that in which p
and q are true and r and s are false. For convenience, we
also express interpretations by juxtaposition of atoms. Thus
the interpretations {{p, q}, {p}, {}} will usually be written as
{pq, p, ∅}.
A key point is that Horn theories are characterised semantically by the fact that the models of a Horn theory are
closed under intersections of positive atoms in an interpretation. That is, Horn theories satisfy the constraint:
If M1 , M2 ∈ M od(H) then M1 ∩ M2 ∈ M od(H).
This leads to the notion of the characteristic models
[Khardon, 1995] of a Horn theory: M is a characteristic
model of theory H just if for every M1 , M2 ∈ M od(H),
M1 ∩ M2 = M implies that M = M1 or M = M2 . E.g. the
theory expressed by {p ∧ q → ⊥, r} has models {pr, qr, r}
and characteristic models {pr, qr}. Since pr ∩ qr = r, r isn’t
a characteristic model of H.
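The closure-under-intersection property and the characteristic models just described can be computed directly on finite sets of interpretations. The sketch below is an illustration (interpretations are represented as frozensets of atoms, an assumption); the closure operation is the one written Cl∩(M) in the notation collected at the end of this subsection.

```python
def intersection_closure(models):
    """Least superset of `models` closed under pairwise intersection (Cl∩)."""
    closed = set(map(frozenset, models))
    changed = True
    while changed:
        changed = False
        for m1 in list(closed):
            for m2 in list(closed):
                inter = m1 & m2
                if inter not in closed:
                    closed.add(inter)
                    changed = True
    return closed

def characteristic_models(models):
    """Models that are not the intersection of two other, distinct models."""
    ms = set(map(frozenset, models))
    return {m for m in ms
            if not any(m1 & m2 == m and m1 != m and m2 != m
                       for m1 in ms for m2 in ms)}

# Models of {p ∧ q → ⊥, r} are pr, qr and r; only pr and qr are characteristic.
mods = [{'p', 'r'}, {'q', 'r'}, {'r'}]
print(characteristic_models(mods))   # -> {frozenset({'p', 'r'}), frozenset({'q', 'r'})}
```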
A Horn formula ψ is entailed by a set of Horn formulas
A, A ⊢H ψ, just if any model of A is also a model of ψ.
For simplicity, and because we work exclusively with Horn
formulas, we drop the subscript and write A ⊢ ψ. If A = {φ}
is a singleton set then we just write φ ⊢ ψ. A set of formulas
A is inconsistent just if A ⊢ ⊥. We use φ ↔ ψ to represent
logical equivalence, that is φ ⊢ ψ and ψ ⊢ φ.

Notation: We collect here notation that is used in the paper. Lower-case Greek characters φ, ψ, . . ., possibly subscripted, denote arbitrary formulas of LH. Upper case Roman characters A, B, . . ., possibly subscripted, denote arbitrary sets of formulas. H (H1, H′, etc.) denotes Horn belief sets, so that φ ∈ H iff H ⊢H φ. Cnh(A) is the deductive closure of a Horn formula or set of formulas A under Horn derivability. |φ| is the set of maximal, consistent Horn theories that contain φ. m (and subscripted variants) represents a maximum consistent set of Horn formulas. M (M1, M′, etc.) denote interpretations over some fixed language. Mod(A) is the set of models of A. Arbitrary sets of interpretations will be denoted M (M′ etc.). Cl∩(M) is the intersection closure of a set of interpretations M (recall that an interpretation is represented by the set of atoms true in the interpretation); that is, Cl∩(M) is the least set such that M ⊆ Cl∩(M) and M1, M2 ∈ Cl∩(M) implies that M1 ∩ M2 ∈ Cl∩(M). Note that M denotes an interpretation expressed as a set of atoms, while m denotes a maximum consistent set of Horn formulas. Thus the logical content is the same, in that an interpretation defines a maximum consistent set of Horn formulas, and vice versa. We retain these two interdefinable notations, since each is useful in the subsequent development. Similar comments apply to Mod(φ) vs. |φ|. Since P is finite, a (Horn or propositional logic) belief set may be finitely represented, that is, for X a belief set, there is a formula φ such that Cnh(φ) = X. As well, we make use of the fact that there is a 1-1 correspondence between elements of |φ| and of Mod(φ).
countermodel | induced models | resulting KB                        | r.s.
a            |                | a ∧ (c → b)                         | √
ac           | a              | a                                   |
b            |                | b ∧ (c → a)                         | √
bc           | b              | b                                   |
∅            |                | (a → b) ∧ (b → a) ∧ (c → a ∧ b)     | √
c            | ∅              | (a → b) ∧ (b → a)                   |

Figure 1: Example: Candidates for Horn contraction
3.2 Horn Contraction
The last few years have seen work on Horn contraction. Delgrande [2008] addressed maxichoice Horn belief set contraction based on (Horn) remainder sets, called e-remainder sets.
The definition of e-remainder sets for Horn clause belief sets
is the same as that for a remainder set (Definition 1) but with
respect to Horn clauses and Horn derivability. For H a Horn
belief set and φ ∈ LH , the set of e-remainder sets with respect to H and φ is denoted by H ↓e φ.
Booth, Meyer, and Varzinczak [2009] subsequently investigated this area by considering partial meet contraction, as
well as a generalisation of partial-meet, based on the idea of
infra-remainder sets and package contraction. In [Booth et
al., 2009], infra remainder sets are defined as follows:
Definition 2 For belief sets H and X, X ∈ H ⇓e φ iff there is some X′ ∈ H ↓e φ such that ⋂(H ↓e φ) ⊆ X ⊆ X′. The
elements of H ⇓e φ are the infra e-remainder sets of H with
respect to φ.
All e-remainder sets are infra e-remainder sets, as is the intersection of any set of e-remainder sets. It proved to be the
case that e-remainder sets (and including the infra-remainder
sets of [Booth et al., 2009]) are not sufficiently expressive for
contraction.
The problem arises from the relation between remainder
sets on the one hand, and their counterpart in terms of interpretations on the other. In the classical AGM approach, a
remainder set is characterised semantically by a minimal superset of the models of the agent’s belief set such that this
superset does not entail the formula for contraction. As a result, the models of a remainder set consist of the models of
a belief set H together with a countermodel of the formula
φ for contraction. With Horn clauses, things are not quite so
simple, in that for a countermodel M of φ, there may be no
Horn remainder set that has M as a model.
To see this, consider the following example, adapted from
[Delgrande and Wassermann, 2010].
Example 1 Let P = {a, b, c} and H = Cnh (a ∧ b). Consider candidates for H −̇(a ∧ b). There are three remainder
sets, given by the Horn closures of a ∧ (c → b), b ∧ (c → a),
and (a → b) ∧ (b → a) ∧ (c → a ∧ b). Any infra-remainder
set contains the closure of (c → a) ∧ (c → b).
See Figure 1. In the first line of the table, we have that
a (viz. {a, ¬b, ¬c}) is a countermodel of a ∧ b. Adding this
model to the models of H yields the models of the formula
a ∧ (c → b). This characterises a remainder set, as indicated
in the last column. In the second line, we have that ac (viz.
{a, ¬b, c}) is another countermodel of H. However, since H
has a model ab, the intersection of these models, ab ∩ ac = a
must also be included; this is the item in the second column.
The resulting belief set is characterised by the interpretations
M od(H) ∪ {ac, a} = {abc, ab, ac, a}, which is the set of
models of formula a, as given in the third column. However,
the result isn’t a remainder set, since Cnh (a ∧ (c → b)) is a
logically stronger belief set than Cnh (a), which also fails to
imply a ∧ b.
This result is problematic for both [Delgrande, 2008] and
[Booth et al., 2009]. For example, in none of the approaches
in these papers is it possible to obtain H −̇e (a ∧ b) ↔ a, nor
H −̇e (a ∧ b) ↔ (a ≡ b). But presumably these possibilities
are desirable as potential contractions. Thus, in all of the
approaches developed in the cited papers, it is not possible to
have a contraction wherein a ∧ ¬b ∧ c corresponds to a model
of the contraction.
This issue was addressed in [Delgrande and Wassermann,
2010]. There the characteristic models of maxichoice candidates for H −̇e φ consist of the characteristic models of H
together with a single interpretation from M od(⊤)\M od(φ).
The resulting theories, called weak remainder sets, corresponded to the theories given in the third column in Figure 1.
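The semantic construction just described is mechanical enough to sketch directly on model sets. The function below is an illustration, not the authors' code: all arguments are sets of interpretations (frozensets of atoms), and intersection_closure is the helper sketched earlier in Section 3.1, passed in as a parameter.

```python
def models_of_weak_remainders(mod_H, mod_phi, all_interpretations, intersection_closure):
    """Semantic view of the weak remainders of H with respect to φ: for every
    countermodel M of φ, close Mod(H) ∪ {M} under intersection."""
    remainders = []
    for M in all_interpretations - mod_phi:            # countermodels of φ
        remainders.append(intersection_closure(mod_H | {M}))
    return remainders

# Example 1: H = Cnh(a ∧ b) over P = {a, b, c}, contracting by a ∧ b.
# Picking the countermodel ac yields {abc, ab, ac, a}, i.e. the models of the formula a,
# matching the second line of Figure 1.
```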
Definition 3 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set, and let φ be a Horn formula.
H ↓↓e φ is the set of sets of formulas s.t. H′ ∈ H ↓↓e φ iff H′ = H ∩ m for some m ∈ |⊤| \ |φ|.
H′ ∈ H ↓↓e φ is a weak remainder set of H and φ.

The following characterizations were given for maxichoice and partial meet Horn contraction:

Theorem 2 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set. Then −̇w is an operator of maxichoice Horn contraction based on weak remainders iff −̇w satisfies the following postulates.
(H −̇w 1) H −̇w φ is a belief set. (closure)
(H −̇w 2) If not ⊢ φ, then φ ∉ H −̇w φ. (success)
(H −̇w 3) H −̇w φ ⊆ H. (inclusion)
(H −̇w 4) If φ ∉ H, then H −̇w φ = H. (vacuity)
(H −̇w 5) If ⊢ φ then H −̇w φ = H. (failure)
(H −̇w 6) If φ ↔ ψ, then H −̇w φ = H −̇w ψ. (extensionality)
(H −̇w 7) If H ≠ H −̇w φ then ∃β ∈ LH s.t. {φ, β} is inconsistent, H −̇w φ ⊆ Cnh({β}) and ∀H′ s.t. H −̇w φ ⊂ H′ ⊆ H we have H′ ⊈ Cnh({β}). (maximality)

Theorem 3 ([Delgrande and Wassermann, 2010]) Let H be a Horn belief set. Then −̇w is an operator of partial meet Horn contraction based on weak remainders iff −̇w satisfies the postulates (H −̇w 1) – (H −̇w 6) and:
(H −̇pm 7) If β ∈ H \ (H − α), then there is some H′ such that H − α ⊆ H′, α ∉ Cnh(H′) and α ∈ Cnh(H′ ∪ {β}). (weak relevance)

More recently, [Zhuang and Pagnucco, 2010b] have addressed Horn contraction from the point of view of epistemic entrenchment. They compare AGM contraction via epistemic entrenchment in classical propositional logic with contraction in Horn logics. A postulate set is provided and shown to characterise entrenchment-based Horn contraction. The fact that AGM contraction refers to disjunctions of formulas, which in general will not be Horn, is handled by considering Horn strengthenings in their postulate set, which is to say, logically weakest Horn formulas that subsume the disjunction. In contrast to earlier work, their postulate set includes equivalents to the supplemental postulates, and so goes beyond the set of basic postulates.

For a given clause ϕ, the set of its Horn strengthenings (ϕ)H is the set such that ψ ∈ (ϕ)H if and only if ψ is a Horn clause and there is no Horn clause ψ′ such that ψ ⊂ ψ′ ⊆ ϕ.

Of the set of ten postulates given in [Zhuang and Pagnucco, 2010b], five correspond to postulates characterizing partial meet contraction based on weak remainders as defined in [Delgrande and Wassermann, 2010] and two correspond to the supplementary postulates (K −̇7) and (K −̇8). The three new postulates are:
(H −̇5) If ψ ∈ H −̇ϕ ∧ ψ then ψ ∈ H −̇ϕ ∧ ψ ∧ δ
(H −̇9) If ψ ∈ H \ H −̇ϕ then ∀χ ∈ (ϕ ∨ ψ)H, χ ∉ H −̇ϕ
(H −̇10) If ∀χ ∈ (ϕ ∨ ψ)H, χ ∉ H −̇ϕ ∧ ψ then ψ ∉ H \ H −̇ϕ

While there has been other work on belief change and Horn logic, such work focussed on specific aspects of the problem, rather than a general characterisation of Horn clause belief change. For example, Eiter and Gottlob [1992] address the complexity of specific approaches to revising knowledge bases, including the case where the knowledge base and formula for revision are conjunctions of Horn clauses. Not unexpectedly, results are generally better in the Horn case. Liberatore [2000] considers the problem of compact representation for revision in the Horn case. Basically, given a knowledge base K and formula φ, both Horn, the main problem addressed is whether the knowledge base, revised according to a given operator, can be expressed by a propositional formula whose size is polynomial with respect to the sizes of K and φ. [Langlois et al., 2008] approaches the study of revising Horn formulas by characterising the existence of a complement of a Horn consequence; such a complement corresponds to the result of a contraction operator. This work may be seen as a specific instance of a general framework developed in [Flouris et al., 2004]. In [Flouris et al., 2004], belief change is studied under a broad notion of logic, where a logic is a set closed under a Tarskian consequence operator. In particular, they give a criterion for the existence of a contraction operator satisfying the basic AGM postulates in terms of decomposability.

4 Supplementary postulates

In this section we investigate how the different proposals for Horn contraction operations behave with respect to the supplementary postulates (K −̇7) and (K −̇8). Throughout the section, we consider all selection functions to be transitively relational.

First we consider the operation of Horn Partial Meet e-Contraction as defined in [Delgrande, 2008]. The following example shows that, considering ↓e as defined in [Del-
grande, 2008], Horn Partial Meet e-Contraction does not satisfy (K −̇7):
Proposition 3 PMWR satisfies (H −̇9)
PMWR in general does not satisfy (H −̇10), as the following example shows.
Let H = Cnh ({a, b}). Then
H ↓↓e a = {H1 , H3 } and
H ↓↓e a ∧ b = {H1 , H2 , H3 }, where
H1 = Cnh ({a ∨ ¬b, b ∨ ¬a}),
H2 = Cnh ({a}) and
H3 = Cnh ({b}).
Assuming a selection function based on a transitive relation
such that H1 ≺ H2 and H1 ≺ H3 (with H2 and H3 incomparable), we have
H − a = H3 and H − a ∧ b = H2 ∩ H3
Since (a∨b)H = {a, b}, we have that for any χ ∈ (a∨b)H ,
χ 6∈ H − a ∧ b, but b ∈ H − a.
Example 2 Let H = Cnh ({a → b, b → c, a → d, d → c}).
We then have
H ↓e a → c = {H1 , H2 , H3 , H4 }
H ↓e b → c = {H5 }
where:
H1 = Cnh ({a → b, a → d}),
H2 = Cnh ({a → b, a ∧ c → d, d → c}),
H3 = Cnh ({b → c, a ∧ c → b, a → d}),
H4 = Cnh ({a ∧ c → b, b → c, a ∧ c → d, d → c, a ∧ d →
b, a ∧ b → d}), and
H5 = Cnh ({a → b, a → d, d → c})
Note that the two first elements of H ↓e a → c are subsets
of the single element of H ↓e b → c and hence, cannot belong
to H ↓e a → c ∧ b → c.
In order to finish the comparison between the sets of postulates, it is interesting to note the following:
H ↓e a → c ∧ b → c = {H3 , H4 , H5 }
Observation 1 (H −̇9) implies weak relevance.
If we take a selection function based on a transitive relation between remainder sets that gives priority in the order in
which they appear in this example, i.e., H5 ≺ H4 ≺ H3 ≺
H2 ≺ H1 , we will have:
5 Package Contraction
In this section we consider Horn package contraction. For
belief set H and a set of formulas Φ, the package contraction
H −̇p Φ is a form of contraction in which no member of Φ is
in H −̇p Φ. As [Booth et al., 2009] points out, this operation
is of interest in Horn clause theories given their limited expressivity: in order to contract by φ and ψ simultaneously,
one cannot contract by the disjunction φ ∨ ψ, since the disjunction is generally not a Horn clause. Hence, one expresses
the contraction of both φ and ψ as the package contraction
H −̇p {φ, ψ}.
We define the notion of Horn package contraction, and
show that it is in fact expressible in terms of maxichoice Horn
contraction.
H − a → c = H1
H − b → c = H5
H − a → c ∧ b → c = H3
And we see that H − a → c ∩ H − b → c = H1 ⊈ H3 = H − a → c ∧ b → c.
The same example shows that the operation does not satisfy
(K −̇8):
a → c 6∈ H − a → c ∧ b → c, but H − a → c ∧ b → c 6⊆
H − a → c.
If there are no further restrictions on the selection function, the same example also shows that contraction based on
infra-remainders does not satisfy the supplementary postulates. Note that each remainder set in the example is also an
infra-remainder and that the selection function always selects
a single element. It suffices to assign all the remaining infraremainders lower priority.
Now we can show that the operation of partial meet based
on weak remainders (PMWR) has a better behaviour with respect to the supplementary postulates:
Definition 4 Let H be a Horn belief set, and let Φ =
{φ1 , . . . , φn } be a set of Horn formulas.
H ↓↓p Φ is the set of sets of formulas s.t. H ′ ∈ H ↓↓p Φ iff
∃m1 , . . . , mn such that, for 1 ≤ i ≤ n:
mi ∈ |⊤| \ |φi| if ⊬ φi, otherwise mi = LH;
and H′ = H ∩ ⋂_{i=1}^{n} mi.
Definition 5 Let γ be a selection function on H such that
γ(H ↓↓p Φ) = {H ′ } for some H ′ ∈ H ↓↓p Φ.
The (maxichoice) package Horn contraction based on weak
remainders is given by:
Proposition 1 Partial meet based on weak remainders and
a transitive relational selection function satisfies (K −̇7) and
(K −̇8).
H −̇p Φ = γ(H ↓↓p Φ)
We have seen that Epistemic Entrenchment Horn Contraction (EEHC) is characterized by a set of ten postulates.
In [Zhuang and Pagnucco, 2010a], it is shown that transitively relational PMWR as defined above is more general than
EEHC. This means that any operation satisfying their set of
10 postulates (which include (K −̇7) and (K −̇8)) is a PMWR.
We have seen that PMWR satisfies (K −̇7) and (K −̇8), hence,
in order to compare PMWR and EEHC, we need to know
whether PMWR satisfies (H −̇5), (H −̇9) and (H −̇10).
if ∅ ≠ Φ ∩ H ⊈ Cnh(⊤); and H otherwise.
The following result relates elements of H ↓↓p Φ to weak
remainders.
Proposition 4 Let H be a Horn belief set and let Φ =
{φ1 , . . . , φn } be a set of Horn formulas where for 1 ≤ i ≤ n
we have ⊬ φi.
Then H′ ∈ H ↓↓p Φ iff for 1 ≤ i ≤ n there are Hi ∈ H ↓↓e φi and H′ = ⋂_{i=1}^{n} Hi.
Proposition 2 PMWR satisfies (H −̇5).
It follows immediately from this that any maxichoice Horn
contraction defines a package contraction, and vice versa.
(H −̇p 5) H −̇p Φ = H −̇p (Φ \ Cnh (⊤))
Example 3 Consider the Horn belief set H = Cnh ({a, b})
over P = {a, b, c}. We want to determine elements of
(H −̇p 6) If φ ↔ ψ, then
H −̇p (Φ ∪ {φ}) = H −̇p (Φ ∪ {ψ}) (extensionality)
(H −̇p 5b) H −̇p ∅ = H
H ↓↓p Φ = Cnh ({a, b}) ↓↓p {a, b}.
Φ′ = (Φ \ Cnh (⊤)) ∩ H = {φ1 , . . . , φn }
there is {β1 , . . . , βn } s.t. {φi , βi } ⊢ ⊥ and H −̇p Φ ⊆
Cnh (βi ) for 1 ≤ i ≤ n;
and ∀H ′ s.t H −̇p Φ ⊂ H ′ ⊆ H, ∃βi s.t. H ′ 6⊆ Cnh (βi ).
(maximality)
The following result, which shows that package contraction generalises maxichoice contraction, is not surprising, nor
is the next result, which shows that a maxichoice contraction
defines a package contraction.
Proposition 5 Let −̇p be an operator of maxichoice Horn
package contraction. Then
H −̇φ = H −̇p Φ for Φ = {φ}
is an operator of maxichoice Horn contraction based on weak
remainders.
Proposition 6 Let −̇ be an operator of maxichoice Horn contraction based on weak remainders. Then
H −̇p Φ = ⋂_{φ∈Φ} H −̇φ
is an operator of maxichoice Horn package contraction.
1. There are 4 countermodels of a, given by:
A = {bc, b, c, ∅}.
Thus there are four weak remainders corresponding to
these countermodels, and so four candidates for maxichoice Horn contraction by a.
2. Similarly there are 4 countermodels of b:
B = {ac, a, c, ∅}.
3. Members of H ↓↓p Φ are given by
Cl∩ (M od(H) ∪ {x} ∪ {y})
for x ∈ A and y ∈ B.
For example, for x = bc, y = ∅, we have that Cl∩ (M od(H)∪
{x} ∪ {y}) = {abc, ab, bc, b, ∅}, which is the set of models of
(c → b) ∧ (a → b).
For x = bc, y = ac, we have that Cl∩ (M od(H) ∪ {x} ∪
{y}) = Cnh (⊤); this holds for no other choice of x and y.
As described, a characteristic of maxichoice package contraction is that there are a large number of members of H ↓↓p
Φ, some of which may be quite weak logically. Of course, a
similar point can be made about maxichoice contraction, but
in the case of package contraction we can eliminate some candidates via pragmatic concerns. We have that a package contraction H −̇p Φ is a belief set H ′ ∈ H ↓↓p Φ such that, informally, models of H ′ contain a countermodel for each φi ∈ Φ
along with models of H. In general, some interpretations
will be countermodels of more than one member of Φ, and so
pragmatically, one can select minimal sets of countermodels. Hence in the case that ⋂_i (Mod(⊤) \ Mod(φi)) ≠ ∅, a single countermodel, that is some m ∈ ⋂_i (Mod(⊤) \ Mod(φi)), would be sufficient to yield a package contraction.
Now, it may be that ⋂_i (Mod(⊤) \ Mod(φi)) is empty. A
simple example illustrates this case:
Example 4 Let H = Cnh (a → b, b → a) where P =
{a, b}. Then H −̇p {a → b, b → a} = Cnh (⊤). That is,
the sole countermodel of a → b is {a} while that of b → a
is {b}. The intersection closure of these interpretations with
those of H is {ab, a, b, ∅} = M od(⊤).
Informally then one can select a minimal set of models
such that a countermodel of each member of Φ is in the set.
These considerations yield the following definition:
Definition 6 Let H be a Horn belief set, and let Φ =
{φ1 , . . . , φn } be a set of Horn formulas.
HS(Φ), the set of (minimal) hitting sets of interpretations
with respect to Φ, is defined by:
S ∈ HS(Φ) iff
What this example indicates informally is that there is a
great deal of scope with respect to candidates for package
contraction. To some extent, such a combinatorial explosion
of possibilities is to be expected, given the fact that a formula
will in general have a large number of countermodels, and
that this is compounded by the fact that each formula in a
package contraction does not hold in the result. However, it
can also be noted that some candidate package contractions
appear to be excessively weak; for example it would be quite
drastic to have Cnh (⊤) as the result of such a contraction. As
well, some candidate package contractions appear to contain
redundancies, in that a selected countermodel of a may also
be a countermodel of b, in which case there seems to be no
reason to allow the possible incorporation of a separate countermodel of b. Consequently, we also consider versions of
package contraction that in some sense yield a maximal belief set. However, first we provide results regarding package
contraction.
We have the following result:
Theorem 4 Let H be a Horn belief set. Then if −̇p is an
operator of maxichoice Horn package contraction based on
weak remainders then −̇p satisfies the following postulates.
(H −̇p 2) For φ ∈ Φ, if not ⊢ φ, then φ 6∈ H −̇p Φ
(H −̇p 3) H −̇p Φ ⊆ H
(H −̇p 4) H −̇p Φ = H −̇p (H ∩ Φ)
(triviality)
(H −̇p 7) If H 6= H −̇p Φ then for
It proves to be the case that there are a total of 14 elements in
H ↓↓p Φ and so 14 candidate package contractions. We have
the following.
(H −̇p 1) H −̇p Φ is a belief set.
(failure)
(closure)
(success)
(inclusion)
(vacuity)
(⋀_{φ∈S1} φ) ∨ (⋀_{φ∈S2} φ). Of course, all such sets will be guaranteed to be finite.
We introduce the following notation for this section, where
S is a set of Horn clauses.
1. S ⊆ |⊤|
2. S ∩ (|⊤| \ |φi |) 6= ∅ for 1 ≤ i ≤ n.
3. For S ′ ⊂ S, S ′ ∩ (|⊤| \ |φi |) = ∅ for some 1 ≤ i ≤ n.
Thus we look for sets of sets of interpretations, elements
of such a set S are interpretations represented as maximum
consistent sets of formulas (Condition 1). As well, this set S
contains a countermodel for each member of Φ (2) and moreover S is a subset-minimal set that satisfies these conditions
(3). The notion of a hitting set is not new; see [Garey and
Johnson, 1979] and see [Reiter, 1987] for an early use in AI.
Thus S ∈ HS(Φ) corresponds to a minimal set of countermodels of members of Φ.
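For the small examples used in this paper, the minimal hitting sets of Definition 6 can be found by exhaustive search. The sketch below is an illustration, not the authors' procedure: each element of countermodel_sets is the set |⊤| \ |φi| for one φi, with interpretations represented as frozensets of atoms (a representation assumption).

```python
from itertools import combinations

def minimal_hitting_sets(countermodel_sets):
    """Definition 6, by brute force: sets S of interpretations that meet every
    set of countermodels (condition 2) and are subset-minimal with that property
    (condition 3)."""
    universe = sorted(set().union(*countermodel_sets), key=sorted)
    def hits_all(S):
        return all(S & cm for cm in countermodel_sets)
    hitting = []
    for r in range(1, len(universe) + 1):          # smaller candidates first
        for S in map(frozenset, combinations(universe, r)):
            if hits_all(S) and not any(T < S for T in hitting):
                hitting.append(S)
    return hitting

# Example 4: the sole countermodel of a → b is {a}, that of b → a is {b},
# so the only minimal hitting set needs both interpretations.
cms = [{frozenset({'a'})}, {frozenset({'b'})}]
print(minimal_hitting_sets(cms))   # -> [frozenset({frozenset({'a'}), frozenset({'b'})})]
```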
• S[p/t] is the result of uniformly substituting t ∈ {⊥, ⊤}
for atom p in S.
• S↓p = {φ ∈ S | φ does not mention p}
Assume without loss of generality that for each φ ∈ S, head(φ) ∉ body(φ).
The following definition adapts the standard definition for
forgetting to Horn clauses.
Definition 8 For set of Horn clauses S and atom p, define
forget(S, p) to be S[p/⊥] ∨ S[p/⊤].
Definition 7 H ↓↓p Φ is the set of sets of formulas s.t. H′ ∈ H ↓↓p Φ iff H′ = H ∩ ⋂_{m∈S} m for some S ∈ HS(Φ).
Cnh (c → a, c → b),
This is not immediately useful for us, since a disjunction
is generally not Horn. However, the next result shows that
this definition nonetheless leads to a Horn-definable forget
operator. Recall that for clauses c1 and c2 , expressed as sets
of literals where p ∈ c1 and ¬p ∈ c2 , that the resolvent of c1
and c2 is the clause (c1 \ {p}) ∪ (c2 \ {¬p}). As well, recall
that if c1 and c2 are Horn, then so is their resolvent.
In the following, Res(S, p) is the set of Horn clauses obtained from S by carrying out all possible resolutions with
respect to p.
Cnh (a → b, b → a, c → a, c → b) }.
Definition 9 Let S be a set of Horn clauses and p an atom.
Define
Proposition 7 For H′ ∈ H ↓↓p Φ, H′ is an operator of maxichoice Horn package contraction.
Example 5 Consider where H = Cnh (a, b), P = {a, b, c}.
1. Let Φ = {a, b}. We obtain that
H ↓↓p Φ
=
{ Cnh (⊤), Cnh (c → a), Cnh (c → b),
Cnh (a → b, b → a),
Compare this with Example 3, where we have 14 candidate package contractions.
Res(S, p) = {φ | ∃φ1 , φ2 ∈ S s.t. p ∈ body(φ1 ),
p = head (φ2 ), and
φ = (body(φ1 ) \ {p} ∪ body(φ2 )) → head (φ1 )}
2. Let Φ = {a, a ∧ b}. We obtain that
H ↓↓p Φ
=
{ Cnh (b), Cnh (b ∧ (c → a)),
Theorem 5 forget(S, p) ↔ S↓p ∪ Res(S, p).
Cnh (a → b, b → a),
Corollary 1 Let S be a set of Horn clauses and p an atom.
Then f orget(S, p) is equivalent to a set of Horn clauses.
Cnh (a → b, b → a, c → a, c → b) }.
Corollary 2 Let S1 and S2 be sets of Horn clauses and p
an atom. Then S1 ↔ S2 implies that f orget(S1 , p) ↔
f orget(S2 , p).
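Because Theorem 5 reduces forgetting to dropping the clauses that mention p and adding all resolvents on p, the operation is straightforward to compute. The following is a sketch of that computation, under the assumption that Horn clauses are represented as (body, head) pairs with a frozenset body; the identifiers are illustrative.

```python
def mentions(clause, p):
    body, head = clause
    return p in body or head == p

def resolvents(S, p):
    """Res(S, p): all resolutions on p between a clause with p in its body
    and a clause whose head is p (Definition 9)."""
    out = set()
    for b1, h1 in S:
        if p not in b1:
            continue
        for b2, h2 in S:
            if h2 == p:
                out.add((frozenset((b1 - {p}) | b2), h1))
    return out

def forget(S, p):
    """Theorem 5: forget(S, p) is equivalent to S↓p ∪ Res(S, p),
    i.e. the clauses not mentioning p plus all resolvents on p."""
    S = set(S)
    keep = {c for c in S if not mentions(c, p)}      # S↓p
    return keep | resolvents(S, p)

# Forgetting q in {p → q, q → r, s}: q disappears and the resolvent p → r survives.
S = {(frozenset({'p'}), 'q'), (frozenset({'q'}), 'r'), (frozenset(), 's')}
print(forget(S, 'q'))   # contains (frozenset({'p'}), 'r') and (frozenset(), 's')
```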
Any set of formulas that satisfies Definition 7 clearly also
satisfies Definition 5. One can further restrict the set of candidate package contractions by replacing S ′ ⊂ S by |S ′ | < |S|
in the third part of Definition 7. As well, of course, one could
continue in the obvious fashions to define a notion of partial
meet Horn package contraction.
There are several points of interest about these results.
The theorem is expressed in terms of arbitrary sets of Horn
clauses, and not just deductively-closed Horn belief sets.
Hence the second corollary states a principle of irrelevance
of syntax for the case for forgetting for belief bases. As well,
the expression S↓p ∪ Res(S, p) is readily computable, and so
the theorem in fact provides a means of computing forget.
Further, the approach clearly iterates for more than one atom.
We obtain the additional result:
6 Forgetting in Horn Formulas
This section examines another means of removing beliefs
from an agent’s belief set, that of forgetting [Lin and Reiter,
1994; Lang and Marquis, 2002]. Forgetting is an operation
on belief sets and atoms of the language; the result of forgetting an atom can be regarded as decreasing the language by
that atom.
In general it will be easier to work with a set of Horn
clauses, rather than Horn formulas. Since there is no confusion, we will freely switch between sets of Horn clauses
and the corresponding Horn formula comprising the conjunction of clauses in the set. Thus any time that a set appears
as an element in a formula, it can be understood as standing for the conjunction of members of the set. Thus for sets
of clauses S1 and S2 , S1 ∨ S2 will stand for the formula
Corollary 3 forget(forget(S, p), q) ≡ forget(forget(S, q), p).
(In fact, this is an easy consequence of the definition of
forget.) Given this, we can define for set of atoms A,
forget(S, A) = forget(forget(S, a), A \ {a}) where a ∈
A. On the other hand, forgetting an atom may result in a
quadratic blowup of the knowledge base.
Finally, it might seem that the approach allows for the definition of a revision operator – and a procedure for computing
a revision – by using something akin to the Levi Identity. Let
A(φ) be the set of atoms appearing in (formula or set of formulas) φ. Then:
Preliminary results and applications. In Proceedings of the
10th International Workshop on Non-Monotonic Reasoning (NMR-04), pages 171–179, Whistler BC, Canada, June
2004.
[Gärdenfors and Makinson, 1988] P.
Gärdenfors
and
D. Makinson. Revisions of knowledge systems using
epistemic entrenchment. In Proc. Second Theoretical
Aspects of Reasoning About Knowledge Conference,
pages 83–95, Monterey, Ca., 1988.
[Gärdenfors, 1988] P. Gärdenfors. Knowledge in Flux: Modelling the Dynamics of Epistemic States. The MIT Press,
Cambridge, MA, 1988.
[Garey and Johnson, 1979] M.R. Garey and D.S. Johnson.
Computers and Intractability: A Guide to the Theory of
NP-Completeness. W.H. Freeman and Co., New York,
1979.
[Hansson, 1999] S. O. Hansson. A Textbook of Belief Dynamics. Applied Logic Series. Kluwer Academic Publishers, 1999.
[Khardon, 1995] Roni Khardon. Translating between Horn
representations and their characteristic models. Journal of
Artificial Intelligence Research, 3:349–372, 1995.
[Lang and Marquis, 2002] J. Lang and P. Marquis. Resolving inconsistencies by variable forgetting. In Proceedings
of the Eighth International Conference on the Principles
of Knowledge Representation and Reasoning, pages 239–
250, San Francisco, 2002. Morgan Kaufmann.
[Langlois et al., 2008] M.
Langlois,
R.H.
Sloan,
B. Szörényi, and G. Turán. Horn complements: Towards Horn-to-Horn belief revision. In Proceedings of
the AAAI National Conference on Artificial Intelligence,
Chicago, Il, July 2008.
[Liberatore, 2000] Paolo Liberatore. Compilability and compact representations of revision of Horn knowledge bases.
ACM Transactions on Computational Logic, 1(1):131–
161, 2000.
[Lin and Reiter, 1994] F. Lin and R. Reiter. Forget it!
In AAAI Fall Symposium on Relevance, New Orleans,
November 1994.
[Reiter, 1987] R. Reiter. A theory of diagnosis from first
principles. Artificial Intelligence, 32(1):57–96, 1987.
[Zhuang and Pagnucco, 2010a] Z. Zhuang and Maurice Pagnucco. Two methods for constructing horn contractions.
AI 2010: Advances in Artificial Intelligence, pages 72–81,
2010.
[Zhuang and Pagnucco, 2010b] Zhi Qiang Zhuang and Maurice Pagnucco. Horn contraction via epistemic entrenchment. In Tomi Janhunen and Ilkka Niemelä, editors, Logics in Artificial Intelligence - 12th European Conference
(JELIA 2010), volume 6341 of Lecture Notes in Artificial
Intelligence, pages 339–351. Springer Verlag, 2010.
FRevise(S, φ) =def forget(S, A(S) ∩ A(φ)) + φ.
In fact, this does yield a revision operator, but an operator
that in general is far too drastic to be useful. To see this, consider a taxonomic knowledge base which asserts that whales
are fish, whale → fish. Of course, whales are mammals,
but in using the above definition to repair the knowledge base,
one would first forget all knowledge involving whales. Such
an example doesn’t demonstrate that there are no reasonable
revision operators definable via forget, but it does show that a
naı̈ve approach is problematic.
7 Conclusions
This paper has collected various results concerning Horn belief set contraction. Earlier work has established a general
framework for maxichoice and partial meet Horn contraction.
The present paper then extends this work in various ways.
We examined issues related to supplementary postulates, developed an approach to package contraction, and explored
the related notion of forgetting. For future work, it would
be interesting to investigate relationships between remainderbased and entrenchment-based Horn contraction, as well as
to explore connections to constructions for (Horn) belief revision.
References
[Alchourrón et al., 1985] C.E. Alchourrón, P. Gärdenfors,
and D. Makinson. On the logic of theory change: Partial meet functions for contraction and revision. Journal of
Symbolic Logic, 50(2):510–530, 1985.
[Booth et al., 2009] Richard Booth, Thomas Meyer, and
Ivan José Varzinczak. Next steps in propositional Horn
contraction. In Proceedings of the International Joint Conference on Artificial Intelligence, Pasadena, CA, 2009.
[Delgrande and Wassermann, 2010] James Delgrande and
Renata Wassermann. Horn clause contraction functions:
Belief set and belief base approaches. In Fangzhen Lin
and Uli Sattler, editors, Proceedings of the Twelfth International Conference on the Principles of Knowledge Representation and Reasoning, pages 143–152, Toronto, 2010.
AAAI Press.
[Delgrande, 2008] J.P. Delgrande.
Horn clause belief
change: Contraction functions. In Gerhard Brewka and
Jérôme Lang, editors, Proceedings of the Eleventh International Conference on the Principles of Knowledge Representation and Reasoning, pages 156–165, Sydney, Australia, 2008. AAAI Press.
[Eiter and Gottlob, 1992] T. Eiter and G. Gottlob. On the complexity of propositional knowledge base revision, updates, and counterfactuals. Artificial Intelligence, 57(2–3):227–270, 1992.
[Flouris et al., 2004] Giorgos Flouris, Dimitris Plexousakis,
and Grigoris Antoniou. Generalizing the AGM postulates:
A Selective Semantics for Logic Programs with Preferences
Alfredo Gabaldon
CENTRIA – Center for Artificial Intelligence
Universidade Nova de Lisboa
[email protected]
Abstract
Agents in complex domains need to be able to make decisions even if they lack complete knowledge about the state of their environment. One approach that has been fairly successful is to use logic programming with answer set semantics (ASP) to represent the beliefs of the agent and solve various reasoning problems such as planning. The ASP approach has been extended with preferences and several semantics have been proposed for selecting preferred answer sets of a logic program. Among the available semantics, one proposed by Brewka and Eiter has been shown to be more permissive than others in the sense of allowing the selection of a larger number of answer sets as the preferred ones. Although the semantics is permissive enough to allow multiple preferred answer sets even for some fully prioritized programs, there are on the other hand programs that have answer sets but not preferred ones. We consider a semantics that selects at most one answer set as the preferred one for fully prioritized (propositional) programs, and show that programs in a large class guaranteed to have an answer set (negative-cycle-free, head-consistent) are also guaranteed to have a preferred answer set.

1 Introduction

Reasoning with preferences is widely recognized as an important problem. Many knowledge representation formalisms have been extended to represent preferences as priorities between propositions in a knowledge base. In particular, prioritized versions of logic programming with answer set semantics [Gelfond and Lifschitz, 1991; Gelfond, 2008] have been studied and various semantics have been proposed [Sakama and Inoue, 1996; Gelfond and Son, 1997; Zhang and Foo, 1997; Brewka and Eiter, 1999; Delgrande et al., 2000; Wang et al., 2000]. Some of these approaches [Brewka and Eiter, 1999; Delgrande et al., 2000; Wang et al., 2000] are classified as being selective in the sense that the preferences are effectively used as a selection mechanism for choosing among the answer sets of a logic program. That is, the preferred answer sets are always chosen from the collection of standard answer sets of the logic program. Another characteristic of the selective approaches is that most of them extend logic programs with preferences without increasing computational complexity.

In this work we focus on a selective approach and propose a new semantics for answer set programming with preferences. The main motivation for introducing a new semantics is that all of the existing selective approaches seem to be too strong, in the sense that there are programs that possess answer sets but no preferred answer sets. At the same time, the same approaches seem to be too weak, in the sense that there are programs with multiple answer sets that cannot be distinguished even by a full prioritization. Our proposed semantics yields at most one preferred answer set when a complete set of priorities is specified. Moreover, for a large class of propositional logic programs (called negative-cycle-free and head-consistent) that are guaranteed to have answer sets, we show that a preferred answer set always exists under our proposed semantics. In the case of logic programs without classical negation, this is the most general known class of programs guaranteed to have answer sets [Baral, 2003].

Our starting point of reference is the preferred answer set semantics introduced by Brewka and Eiter [1999] (for brevity, we will refer to this semantics as the BE semantics). Among the selective semantics within the NP complexity class, the BE semantics is the least restrictive in the sense that, for a given logic program with a fixed set of preferences, it selects a collection of preferred answer sets that is a superset of those selected by the other approaches, as shown in [Schaub and Wang, 2001]. In other words, if a program does not have preferred answer sets under the BE semantics, neither does it have any preferred answer sets under the other selective semantics. Since our aim is a semantics that always assigns a preferred answer set to a large class of logic programs, the BE semantics seems to be a good point of reference and comparison.
2 Prioritized Extended Logic Programs
In this section we give an overview of the answer set semantics and of the BE preferred answer set semantics.
2.1 Extended Logic Programs
We start with the syntax of Extended Logic Programs (elps)
[Gelfond and Lifschitz, 1991; Gelfond, 2008]. A literal is an
atom p or its negation ¬p. Literals p and ¬p are called contrary, and l̄ denotes the literal contrary to l. If the language of
an elp is not explicitly defined then it is understood to consist
of all the atoms that appear in the program. Lit denotes the
set of all literals in the language of an elp.
A rule r is an expression of the form
Let us next look at the BE preferred answer set semantics.
We will refer to preferred answers sets under the BE semantics as BE-preferred answer sets. These definitions are simplified versions of those in [Brewka and Eiter, 1999] as we
focus in this work on propositional programs.
Definition 5. Let P = (P, <) be a fully prioritized elp where
P is a set of n prerequisite-free rules and let S be a set of
literals. The sequence of sets S0 , S1 , . . . , Sn is defined as
follows: S0 = ∅ and for 0 < i ≤ n,
l0 ← l1 , . . . , ln , not ln+1 , . . . , not lm        (1)
where l0 , . . . , lm are literals and not denotes negation-as-failure (or default negation). Expressions not l are called extended literals. For a rule r of the form (1), the head, l0 , is denoted by head(r), the set of literals {l1 , . . . , ln } by body + (r)
and the set of literals {ln+1 , . . . , lm } by body − (r). An extended logic program is a finite set of rules.
A set of literals S ⊆ Lit is called a partial interpretation.
A rule r is said to be defeated by a literal l if l ∈ body − (r). A
partial interpretation S defeats a rule r if there is a literal l ∈
S that defeats r. S satisfies the body of a rule r if body + (r) ⊆
S and S does not defeat r. S satisfies r if head(r) ∈ S or S
does not satisfy the body of r.
The answer sets of an elp whose rules do not contain not
are defined as follows.
Definition 1. Let P be an elp without default negation. A
partial interpretation S is an answer set of P if S is minimal
(wrt set inclusion) among the partial interpretations that satisfy the rules of P , and S is logically closed, i.e. if S contains
contrary literals then S is Lit.
For arbitrary programs, the definition is extended by introducing the Gelfond-Lifschitz reduct: let S be a partial interpretation and P be an elp. The reduct, P S , of P relative to S
is the set of rules l0 ← l1 , . . . , ln for all rules (1) in P that
are not defeated by S.
Definition 2. (Answer Set) A partial interpretation S is an
answer set of an elp P if S is an answer set of P S .
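For tiny propositional programs these two definitions can be checked directly by brute force: guess a candidate set of literals, build the reduct, and compare the candidate with the smallest logically closed set. The Python sketch below is our own illustration (the rule encoding and function names are ours, and the enumeration is exponential, so it is only intended for small examples such as those in this paper).

from itertools import combinations

# A rule is (head, body_plus, body_minus); "~p" stands for the classical negation ¬p.
def contrary(l):
    return l[1:] if l.startswith("~") else "~" + l

def reduct(program, S):
    """Gelfond-Lifschitz reduct: drop rules defeated by S, then drop their not-parts."""
    return [(h, bp) for (h, bp, bm) in program if not set(bm) & S]

def closure(positive_program, lit):
    """Smallest set closed under a not-free program; Lit if it contains contrary literals."""
    S, changed = set(), True
    while changed:
        changed = False
        for h, bp in positive_program:
            if set(bp) <= S and h not in S:
                S.add(h)
                changed = True
    return set(lit) if any(contrary(l) in S for l in S) else S

def answer_sets(program):
    symbols = {l for (h, bp, bm) in program for l in [h] + bp + bm}
    lit = symbols | {contrary(l) for l in symbols}
    found = []
    for k in range(len(lit) + 1):
        for cand in combinations(sorted(lit), k):
            S = set(cand)
            if closure(reduct(program, S), lit) == S:
                found.append(S)
    return found

# A small program:  c <- not b.   b <- not a.
P1 = [("c", [], ["b"]), ("b", [], ["a"])]
print(answer_sets(P1))   # [{'b'}]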
Si = Si−1 , if ri is defeated by Si−1 , or head(ri ) ∈ S and ri is defeated by S;
Si = Si−1 ∪ {head(ri )}, otherwise.
The set CP (S) is defined to be the smallest set of literals
such that Sn ⊆ CP (S) and CP (S) is logically closed (consistent or equal to Lit).
Definition 6. (BE-preferred Answer Sets) Let P = (P, <)
be a fully prioritized elp with prerequisite-free P and let A be
an answer set of P . Then A is the BE-preferred answer set of
P iff A = CP (A).
As the definition suggests, a fully prioritized, prerequisite-free elp has at most one BE-preferred answer set.
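The construction in Definition 5 is easy to replay mechanically. The Python sketch below is our own illustration for fully prioritized, prerequisite-free programs whose closure stays consistent; rules are given as (head, body−) pairs with the highest-priority rule first.

def defeated(body_minus, X):
    return any(l in X for l in body_minus)

def cp(rules, S):
    """C_P(S) of Definition 5 for a fully prioritized, prerequisite-free program."""
    X = set()
    for head, bm in rules:                    # rules[0] has the highest priority
        if defeated(bm, X) or (head in S and defeated(bm, S)):
            continue                          # this rule contributes nothing
        X.add(head)
    return X

def be_preferred(rules, A):
    """Definition 6: an answer set A is BE-preferred iff A = C_P(A)."""
    return cp(rules, A) == set(A)

# r1: c <- not b.   r2: b <- not a.   The only answer set is {b}.
P1 = [("c", ["b"]), ("b", ["a"])]
print(cp(P1, {"b"}))             # contains c as well, because r1 is not defeated by the empty set
print(be_preferred(P1, {"b"}))   # False: the single answer set is not BE-preferred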
For non-prerequisite-free prioritized elps, a transformation similar to the Gelfond-Lifschitz reduct is applied that produces rules without prerequisites. The precise definition is as
follows.
Definition 7. Let P = (P, <) be a fully prioritized elp and
let S be a set of literals. Define SP = (SP, S<) to be the fully
prioritized elp such that SP is the set of rules obtained from
P by
1. deleting every rule r ∈ P s.t. body + (r) ⊈ S, and
2. deleting body + (r) from every remaining rule r;
and S< is inherited from < by the mapping f : SP ↦ P
where f (r′ ) is the first rule r in P wrt < such that r′ results from r by step (2) above. In other words, for every r1 , r2 ∈
P , r1′ S< r2′ iff f (r1′ ) < f (r2′ ).
Definition 8. A set of literals A is a BE-preferred answer set
of a fully prioritized elp P = (P, <) iff A is a BE-preferred
answer set of AP.
In the next section we present some examples illustrating
this semantics, including programs without preferred answer
sets and fully prioritized programs with multiple ones.
2.2 Prioritized Extended Logic Programs
We now turn to prioritized elps, adapting the definitions from
[Brewka and Eiter, 1999]. Let us start with the syntax.
An elp rule r of the form (1) is called prerequisite-free if
body + (r) = ∅ and an elp P is prerequisite-free if all its rules
are prerequisite-free.
Definition 3. A prioritized elp is a pair P = (P, <) where P
is an elp and < is a strict partial order on the rules of P .
The answer sets of a prioritized elp P = (P, <) are defined
as the answer sets of P and are denoted by AS(P).
Definition 4. A full prioritization of a prioritized elp P is
any pair P ′ = (P, <′ ) where <′ is a total order on P that
is compatible with <, i.e., r1 < r2 implies r1 <′ r2 for all
r1 , r2 in P .
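Since Definition 4 only asks for total orders compatible with <, the full prioritizations of a small program can simply be enumerated. The helper below is our own illustration; it filters permutations of the rules by the given strict partial order.

from itertools import permutations

def full_prioritizations(rules, strict_order):
    """All total orders on rules compatible with strict_order, a set of pairs (r1, r2)
    meaning r1 < r2; the first element of each result has the highest priority."""
    for perm in permutations(rules):
        position = {r: i for i, r in enumerate(perm)}
        if all(position[a] < position[b] for (a, b) in strict_order):
            yield perm

rules = ["r1", "r2", "r3"]
print(list(full_prioritizations(rules, {("r1", "r2")})))
# [('r1', 'r2', 'r3'), ('r1', 'r3', 'r2'), ('r3', 'r1', 'r2')]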
The total ordering in a fully prioritized elp induces an enumeration r1 , r2 , . . . of its rules with r1 having the highest priority. Throughout the paper, we use such an enumeration in
examples and write
3 Limitations of existing semantics
As we discussed in the introduction, the motivation for
proposing a new semantics for prioritized logic programs is
twofold. First, there are elps that while containing no discernible conflict or contradiction and indeed possessing answer sets, have no BE-preferred answer sets and therefore
no preferred answer sets under the other semantics which are
more restrictive. Second, there are programs for which, even with a full prioritization of their rules, it is not possible to
ri : l ← l1 , . . . , ln , not ln+1 , . . . , not lm
to denote the ith rule in such an enumeration.
select only one of the answer sets as the preferred one. Most
of the following examples are from [Brewka and Eiter, 1999].
Recall that ri denotes the ith rule in the enumeration
of the rules by priority.
Example 1. Consider the program P1 with rules
4 A new semantics for prioritized elps
Our proposed new semantics is intuitively based on the view,
following Brewka and Eiter, that priorities are used to resolve
conflicts. Intuitively, it is also based on the idea of taking conflicts between rules somewhat more literally by appealing to
a notion of “attack” that is to some degree inspired by argument attacks in argumentation. Here, by an attack of a rule on
another we simply mean that if the attacking rule fires, it will
defeat the other one. We then consider rules to be in conflict
when they attack each other, as in the program:
r1 : c ← not b.
r2 : b ← not a.
This program has one answer set, A = {b}. Since c ∉ A and r1 is not defeated by ∅, c ∈ CP1 (A), so A ≠ CP1 (A). Therefore this program has no BE-preferred answer sets.
Brewka and Eiter’s approach to preferences is based on
the notion (which we follow as well) that preferences are introduced in order to “solve potential conflicts...to conclude
more than in standard answer semantics.” [Brewka and Eiter,
1999]. Since the rules in the above program show no apparent
conflict between them and in fact the program has only one
answer set, it seems reasonable that it should have a preferred
answer set.
a ← not b.
b ← not a.
But these attacks can be indirect, through a chain of other
rules, as in the program:
a ← c.
b ← not a.
c ← not b.
The following example shows this shortcoming even more
directly, since one of the rules is a fact, i.e. does not involve
defaults at all.
Example 2. Consider the program P2 with rules
Here, the second rule attacks the first indirectly through the
third rule.
In order to simplify the development of our semantics for
prioritized logic programs, we appeal to a well-known unfolding operation which would transform the above program into
the program:
r1 : a ← not b.
r2 : b.
a ← not b.
b ← not a.
c ← not b.
This program has one answer set, A = {b}. By a similar argument as in the previous example, we have that a ∈ CP2 (A)
and so the program has no BE-preferred answer sets.
The above examples seem to show that the semantics is in
some sense too strong (and so are other proposed selective semantics which have been shown to be even stronger). On the
other hand, as already mentioned, in some cases this semantics assigns multiple preferred answer sets to programs that
are fully prioritized. In other words, under the BE-preferred
answer set semantics there are cases where it is not possible
to “solve potential conflicts” completely even with a full prioritization. Consider the following example.
Example 3. Consider the program P3 with rules
The formal definitions follow.
Definition 9. (Unfolding [Aravindan and Dung, 1995]) Let
Pi be an elp and r be a rule in Pi of the form H ← L, Γ
where L is a literal different from H and Γ is the rest of the
rule’s body. Suppose that r1 , . . . , rk are all the rules in Pi such that each rj is of the form L ← Γj with L ∉ body + (rj ). Then Pi+1 = (Pi \ {r}) ∪ {H ← Γj , Γ :
1 ≤ j ≤ k}. This operation is called unfolding r in Pi (or
unfolding in general), r is called the unfolded rule, and L is
called the selected literal in the unfolding.
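Definition 9 is purely syntactic and is easy to prototype. The Python sketch below is our own encoding (rules as (head, body+, body−) triples); it unfolds the indirect-attack program shown earlier on the positive body literal c and reproduces the unfolded program given above.

def unfold(program, r, L):
    """Unfold rule r on L ∈ body+(r): replace r by one rule per rule defining L."""
    head, bp, bm = r
    assert L in bp and L != head
    defining = [(h, p, m) for (h, p, m) in program if h == L and L not in p]
    new_rules = [(head,
                  sorted((set(bp) - {L}) | set(p)),
                  sorted(set(bm) | set(m)))
                 for (_, p, m) in defining]
    return [rule for rule in program if rule != r] + new_rules

# a <- c.   b <- not a.   c <- not b.
P = [("a", ["c"], []), ("b", [], ["a"]), ("c", [], ["b"])]
print(unfold(P, ("a", ["c"], []), "c"))
# [('b', [], ['a']), ('c', [], ['b']), ('a', [], ['b'])]   i.e. a <- not b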
r1 : b ← not ¬b, a.
r2 : c ← not b.
r3 : a ← not c.
The answer set semantics is one of the logic programming
semantics that satisfies the Generalized Principle of Partial
Evaluation [Aravindan and Dung, 1995; Dix, 1995; Brass and
Dix, 1999], which means that the unfolding transformation
above results in a program that has exactly the same answer
sets as the original program.
This fully prioritized elp has two answer sets: A1 = {c}
and A2 = {a, b}. They are both BE-preferred answer sets.
Consider A1 . Rule r1 does not belong to the reduct of the
program since prerequisite a ∉ A1 . Then rule r2 is defeated neither by ∅ nor by A1 , so we get c which then defeats rule r3
and we have A1 = CP3 (A1 ). Now consider A2 . Prerequisite
a is removed from r1 in the reduct of the program. Then we
have that rule r1 is not defeated by ∅ nor by A2 , so we get b
which then defeats rule r2 allowing r3 to fire. Thus we have
A2 = CP3 (A2 ).
This program is already fully prioritized. It is not possible
to use further priorities to select one of the two answer sets as
the preferred one. Ideally, a fully prioritized elp should have
at most one preferred answer set.
Let us define the unfolding operation for prioritized elps.
For our purposes it suffices to define it for fully prioritized
elps.
An unfolding operation for fully prioritized elps is defined
as follows. Pi+1 = (Pi+1 , <i+1 ) is the result of applying an
unfolding on Pi = (Pi , <i ) if
1. Pi+1 is the result of unfolding r ∈ Pi such that r is
replaced with rules r1′ , . . . , rk′ , and
2. for each rule rj′ obtained in the previous step, if rj′ ∈ Pi ,
i.e. an identical rule was already in the program, then
We will use the notation r → r′ to mean that head(r) ∈ body − (r′ ), and r ↠ r′ to mean that there is a sequence r → r1 , r1 → r2 , . . . , rk → r′ where k is an odd number. We say that r attacks r′ in X if r is active in X and r → r′ .
(a) if rj′ <i r then let rj′ <i+1 r∗ (resp. r∗ <i+1 rj′ )
for every rule r∗ such that rj′ <i r∗ (resp. r∗ <i
rj′ ), i.e. rj′ retains the same priority, since it has
higher priority in Pi than the unfolded rule r.
(b) if r <i rj′ then let rj′ <i+1 r∗ (resp. r∗ <i+1 rj′ )
for every rule r∗ such that r <i r∗ (resp. r∗ <i r),
i.e. rj′ now has the same priority r has in Pi , since
r had higher priority.
Definition 12. Let P = (P, <) be an unfolded, fully prioritized elp. We define the sequence X0 , X1 , . . . satisfying the following conditions: X0 = ∅ and Xi+1 =
Xi ∪ {head(r)} such that r is active in Xi and
3. for each rule rj′ obtained in step one such that rj′ ∉ Pi ,
i.e. it is a new rule, <i+1 extends <i with the priorities
rj′ <i+1 r∗ (resp. r∗ <i+1 rj′ ) if r <i r∗ (resp. r∗ <i
r), i.e. these new rules are assigned the same priority r
has in Pi .
1. body(r) holds in Xi ; or
2. there is no active r s.t. body(r) holds in Xi (the previous case does not apply for any rule) and for all r′ , if r′
attacks r then r ↠ r′ and r < r′ .²
Intuitively, in each iteration we first check if there are any
rules whose body is definitely satisfied by Xi . If so, we add
the heads of those rules and skip to the next iteration. If there
are no rules whose body is satisfied, then the second case adds
the heads of all rules r which are not attacked by any rule, or
if they are attacked by a rule r′ , the rules are in an even cycle
and r has higher priority.
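The Python sketch below is our own illustration of this computation, restricted to unfolded, prerequisite-free, fully prioritized programs and reading r ↠ r′ as the existence of an odd-length chain of attacks, which is how we understand the definition above; rules are (head, body−) pairs, highest priority first.

from collections import deque

def contrary(l):
    return l[1:] if l.startswith("~") else "~" + l

def preferred_fixpoint(rules):
    """The sequence X0, X1, ... of Definition 12, run to its fixpoint."""
    n = len(rules)
    # arrow[i][j] is True when rule i attacks rule j statically, i.e. head(i) ∈ body-(j).
    arrow = [[rules[i][0] in rules[j][1] for j in range(n)] for i in range(n)]

    def odd_chain(i, j):
        # Is there a chain of attacks of odd length from rule i to rule j?
        seen, queue = set(), deque([(i, 0)])           # (rule, number of edges mod 2)
        while queue:
            k, par = queue.popleft()
            for m in range(n):
                if arrow[k][m] and (m, 1 - par) not in seen:
                    if m == j and par == 0:
                        return True                    # one more edge gives odd length
                    seen.add((m, 1 - par))
                    queue.append((m, 1 - par))
        return False

    def defeated(i, X):
        return any(l in X for l in rules[i][1])

    def not_holds(l, X):
        # 'not l' holds if the contrary of l holds, or every rule with head l is defeated.
        return contrary(l) in X or all(defeated(j, X) for j in range(n) if rules[j][0] == l)

    def active(i, X):
        h = rules[i][0]
        return h not in X and contrary(h) not in X and not defeated(i, X)

    def body_holds(i, X):
        return all(not_holds(l, X) for l in rules[i][1])

    X = set()
    while True:
        firing = [i for i in range(n) if active(i, X) and body_holds(i, X)]
        if not firing:                                 # case 2 of Definition 12
            for i in range(n):
                if active(i, X):
                    attackers = [j for j in range(n) if active(j, X) and arrow[j][i]]
                    if all(odd_chain(i, j) and i < j for j in attackers):
                        firing.append(i)
        new = {rules[i][0] for i in firing}
        if new <= X:
            return X
        X |= new

# The unfolding of the program of Example 3:  b <- not ~b, not c.   c <- not b.   a <- not c.
P = [("b", ["~b", "c"]), ("c", ["b"]), ("a", ["c"])]
print(preferred_fixpoint(P))   # the set {a, b}, which is an answer set and hence preferred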
It is easy to see that applying an unfolding operation results
in a fully prioritized elp.
Definition 10. A transformation sequence is a sequence of
fully prioritized elps P0 , . . . , Pn such that each Pi+1 is obtained by applying an unfolding operation on Pi .
Definition 11. The unfolding of a fully prioritized P, denoted P̄, is the fully prioritized elp Pn such that there is a transformation sequence P0 , . . . , Pn where P0 = P and no rule in Pn can be unfolded.
Proposition 1. There exists n such that for all m > n, Xm =
Xn , i.e., the sequence reaches a fixpoint.
Example 4. Consider again the program
Let IP be the fixpoint set Xn as defined above if Xn does
not contain contrary literals, or Lit otherwise.
r1 : b ← not ¬b, a.
r2 : c ← not b.
r3 : a ← not c.
Definition 13. Let P be an unfolded, fully prioritized elp.
The set IP is a preferred answer set of P if IP is an answer
set of P.
The unfolding of this program consists of the following
rules:
It trivially follows from this definition that all preferred answer sets of a prioritized elp P are answer sets of P. It is
also easy to see that according to these definitions, if P has a
preferred answer set at all, it is IP .
r1′ : b ← not ¬b, not c.
r2 : c ← not b.
r3 : a ← not c.
The computation of IP may fail to produce one of the answer sets of the program, hence the test done afterwards to
check whether it is one of them. For programs that are not
negative-cycle-free, the computation may reach a fixpoint
prematurely. The computation may also fail by producing
a set that contains contrary literals. Later we show that for
negative-cycle-free, head-consistent programs, this computation is guaranteed to produce one of the answer sets, making
the check unnecessary.
Here, the unfolding helps reveal more directly that there is
a conflict between the first two rules: in the unfolded program
the head of one rule appears negated in the body of the other
and vice-versa.
Let us now proceed to define our proposed preferred answer set semantics, starting with the semantics for unfolded,
fully prioritized elps. We start with some terminology.
For an arbitrary prioritized elp, preferred answer sets are
defined as follows.
Let P be an elp and X be a set of literals. We say a literal
l holds in X if l ∈ X. An extended literal not l is defeated
by X if l holds in X. Obviously, it is possible for not l not to
hold nor be defeated in X. A rule r is defeated by X if there
is a literal l ∈ body − (r) such that not l is defeated by X. An
extended literal not l holds in X if l holds in X or if every
rule r ∈ P whose head(r) = l is defeated in X. For a rule
r, the body, body(r), holds in X if body + (r) holds in X and
not l holds in X for each l ∈ body − (r). A rule r is active in X if neither head(r) nor its contrary holds in X, body + (r) holds in X, and r is not defeated by X.¹
Definition 14. For an arbitrary prioritized elp P, A is a preferred answer set of P if A is a preferred answer set of the unfolding P̄′ of one of the full prioritizations P ′ of P.
The set of all preferred answer sets of P will be denoted by
P AS(P).
Given the above definition, the most interesting programs
to analyze are the fully prioritized programs. Therefore all
our examples use the latter.
² According to this definition, the rules in the odd path need not be active. But when a rule in the cycle from r to r′ is not active then there is no longer a conflict to be resolved by priorities.
¹ A similar notion of “active rule” is used in [Schaub and Wang, 2001].
Example 5. Consider the program from Example 1:
5 Properties
We start with a result establishing that prioritized elps with
our proposed semantics are a conservative extension of elps.
Given an elp P , a set of literals S is said to be generated
by R if R is the set of all the rules r ∈ P whose bodies are
satisfied by S and head(r) ∈ S.
r1 : c ← not b.
r2 : b ← not a.
Since there is no rule with head a, body(r2 ) holds in X0 =
∅, so X1 = {b} which is the fixpoint. Since {b} is also an
answer set, it is the preferred answer set.
Theorem 1. Let P = (P, <) be a prioritized elp with empty
<, that is, without priorities. Then AS(P) = P AS(P).
Example 6. Consider next the program from Example 3
which has the following unfolding (also shown in Example 4):
Proof. By definition P AS(P) ⊆ AS(P). We show that AS(P) ⊆ P AS(P). Note that, by the equivalence-preserving property of the unfolding operation, AS(P) = AS(P̄). Thus it suffices to show that AS(P̄) ⊆ P AS(P). Note that the preference relation is empty, so P̄ is defined in terms of the original unfolding without priorities. Let A ∈ AS(P̄). Suppose A = Lit; then there are two rules r1 , r2 in the reduct P̄ A s.t. head(r1 ) is the contrary of head(r2 ) and s.t. the smallest set S closed under the rules of P̄ A satisfies the bodies of r1 , r2 . Since r1 , r2 ∈ P̄ A , body − (r1 ) = body − (r2 ) = ∅. Also, since r1 , r2 are unfolded and their bodies are satisfied by S, body + (r1 ) = body + (r2 ) = ∅. Therefore, for any full prioritization of P, IP = Lit. Suppose A ≠ Lit and let R be the generating rules of A. Consider a full prioritization P ′ = (P ′ , <′ ) of P where for every rule r1 ∈ R and r2 ∈ (P ′ \ R), r1 <′ r2 . Since the rules in R are generating rules, there are no rules r1 , r2 ∈ R s.t. r1 → r2 , i.e. rules in R never attack each other. Consider a rule r ∉ R. Since this rule is not generating, either there is a literal l ∈ body + (r) s.t. there is no rule r′ ∈ P ′ with head(r′ ) = l, or there is a rule r′ ∈ R s.t. r → r′ . In the former case, r is never active. In the latter case, since r′ < r, r is first attacked and then defeated. Thus head(r) ∉ IP ′ . On the other hand, every rule r′ ∈ R is attacked only by rules in (P ′ \ R), which are defeated in IP ′ . Therefore, head(r′ ) ∈ IP ′ . We conclude that IP ′ = A and therefore that A is a preferred answer set.
r1′ : b ← not ¬b, not c.
r2 : c ← not b.
r3 : a ← not c.
Extended literal not ¬b holds in X0 = ∅ since there are
no rules with head ¬b. Also, r2 attacks r1′ but r1′ ↠ r2 and
r1′ < r2 . So r1′ fires. On the other hand, both r2 , r3 are
attacked by higher priority rules that are active in X0 . Hence
X1 = {b}. Then r2 is defeated in {b} so it is no longer active.
Hence X2 = {a, b}, which is the fixpoint, an answer set, and
therefore the (only) preferred answer set.
The following example shows that there are programs that
have no preferred answer sets.
Example 7. Consider the program:
r1 : p ← not p, a.
r2 : a ← not b.
r3 : b ← not a.
which unfolds into the program consisting of
r1′ : p ← not p, not b.
and r2 , r3 .
In this case r1 attacks itself so it never fires. The resulting
fixpoint is {a} which is not an answer set and therefore not
a preferred answer set. This program then has an answer set,
{b}, but no preferred answer sets according to our semantics.
As we discussed in the introduction, one of the motivations
for a new semantics is the intention that by adding priorities
it should be possible to select only one of the multiple answer sets of a program. Example 7 shows that the existence
of answer sets does not guarantee the existence of preferred
answer sets, although this seems to occur only in programs
involving implicit constraints. The following theorem establishes the existence of preferred answer sets for a class of
programs where constraints are ruled out. An elp P is said to
be head-consistent if the set of literals {head(r) : r ∈ P } is
consistent, i.e. does not contain contrary literals. It is said to
be negative-cycle-free if the dependency graph of P does not
contain cycles with an odd number of negative edges.
The above program does not have BE-preferred answer
sets either. The fact that it does not have preferred answer
sets is not really surprising. The program essentially has two
parts, one part consisting of rules r2 , r3 which intuitively generates a choice between a and b, and rule r1 which says that
answer sets that contain a must be ruled out. But the priorities
on the bottom two rules say to prefer a which conflicts with
the constraint represented by r1 . Note that if the priorities on
r2 , r3 are relaxed, i.e. the priority r2 < r3 is removed, then
{b} is a preferred answer set.
It is well known in answer set programming that cycles with
an odd number of negative edges in the dependency graph,
such as the cycle involving p and rule r1 above, are used as
constraints that eliminate answer sets. In the following section we show that absent such constraints, a prioritized elp
is guaranteed to have a preferred answer set according to our
semantics.
Theorem 2. Let P = (P, <) be a fully prioritized elp s.t.
P is negative-cycle-free and head-consistent. Then P has a
preferred answer set.
Proof. Consider the program P ′ of the unfolding P̄ of P. Unfolding cannot add negative cycles, so P ′ is negative-cycle-free.
Let X denote the set IP . We show that X is an answer set
of P ′ . Let A be the answer set of the reduct P ′X . For any
rule r ∈ P ′ , by r′X we denote the rule that results from r in
computing the reduct P ′X . If r is deleted from the reduct, we
will write r′X ∉ P ′X .
X ⊆ A: Assume X ⊈ A. Let X0 , X1 , . . . , Xn = X be the sequence of IP . Let i be the smallest index such that Xi ⊈ A.
Let l be any literal such that l ∈ Xi but l ∉ A. There must
be a rule r ∈ P ′ such that l = head(r) and r is active in
Xi−1 . Then either body(r) holds in Xi−1 or for every rule r′
that attacks r in Xi−1 , r < r′ and r ։ r′ . In case body(r)
holds in Xi−1 ⊆ A, then l ∈ A. Contradiction. Consider
otherwise any literal b ∈ body − (r) and any rule r′ ∈ P ′
with b = head(r′ ). Then 1) r′ is not active in Xi−1 or 2)
r < r′ and r ։ r′ . In case (1), we have three possibilities: i)
b ∈ body + (r′ ); ii) ¬b ∈ Xi−1 hence ¬b ∈ X and ¬b ∈ A,
therefore b ∉ A unless A = Lit which is a contradiction; iii)
r′ is defeated in Xi−1 and hence defeated in A. We conclude
that for any rule r′ not active in Xi−1 , if r′X ∈ P ′X then
b ∈ body + (r′ ). In case (2), we have that l ∈ body − (r′ ) hence
r′X ∉ P ′X . We conclude that the only rules in P ′X with head b have b in the positive body. Therefore b ∉ A. Since this
holds for any b ∈ body − (r) we have that body − (rX )∩A = ∅.
Since r is active in Xi−1 , body + (r) holds in Xi−1 and in
A. Since P ′ is head-consistent, X 6= Lit and rX ∈ P ′X .
Therefore l ∈ A. Contradiction.
A ⊆ X: Assume A ⊈ X. Let l be any literal such that l ∈ A but l ∉ X. There must be a rule r ∈ P ′ such that
l = head(r) and rX ∈ P ′X . Since l ∈ A and P ′ is unfolded,
body + (r) = ∅ and body − (r) ∩ X = ∅. This means that r
is active in X. Since l ∉ X, there is at least one rule that
attacks r in X. Let r′ be any rule that attacks r in X. Since
rX ∈ P ′X , head(r′ ) ∉ X, and since r′ attacks r in X, r′ is active in X. Since head(r′ ) ∉ X, r′ must be attacked in
X by some other rule whose head is not in X either. This
implies that there is a set of rules active in X that includes
r, r′ and that are in a cycle of attacks. Consider the largest
such cycle. Since P is negative-cycle-free, the number of
rules in the cycle is even. Let r1 be the rule in the cycle
with the highest priority. For any rule r2 that attacks r1 in
X it must be the case that r1 < r2 and r1 ↠ r2 . But this
implies that head(r1 ) ∈ X and therefore that either l ∈ X or
head(r′ ) ∈ X. Contradiction.
The first principle is a postulate meant as a minimal requirement on the treatment of < as a preference relation.
Principle 1. Let A1 , A2 be two answer sets of a prioritized
elp P = (P, <) generated by the rules R∪{r1 } and R∪{r2 },
respectively, where r1 , r2 ∉ R. If r1 < r2 then A2 is not a
preferred answer set of P.
The second principle is about relevance. It says that a preferred answer set A should not become non-preferred after
adding a rule with a prerequisite not in A while keeping all
preferences intact.
Principle 2. Let A be a preferred answer set of (P, <) and r
be a rule such that at least one prerequisite of r is not in A.
Then A is a preferred answer set of (P ∪ {r}, <′ ) whenever
<′ agrees with < on the rules in P .
Let us show that our proposed semantics satisfies the first
principle.
Theorem 3. The preferred answer set semantics based on the
computation of the set IP satisfies Principle 1.
Proof. Let A1 , A2 be answer sets of a fully prioritized, unfolded P = (P, <) generated by R ∪ {r1 } and R ∪ {r2 },
respectively, where r1 , r2 ∉ R and r1 < r2 . Since r1 , r2
are unfolded and satisfied by A1 , A2 , we have body + (r1 ) =
body + (r2 ) = ∅. Assume that A2 is a preferred answer set of
P. Since r1 is not a generating rule of A2 , r1 is defeated in
A2 . Then there exists b ∈ body − (r1 ) s.t. b ∈ A2 . Hence
there is a rule r′ ∈ P s.t. b = head(r′ ). There are two cases:
a) r′ ∈ R. Then b ∈ A1 and r1 is defeated in A1 . Contradiction. b) r′ ∉ R. Then r′ must be r2 . Since r1 < r2 , it
must be the case that body(r2 ) holds in A2 (by case 1 of Definition 12). But this implies that body(r2 ) holds in A1 and
hence that b ∈ A1 and that r1 is defeated in A1 . Contradiction. Therefore A2 is not a preferred answer set.
Principle 2, which is about relevance, is not satisfied by
our semantics. We use the program from Example 3 to make
some observations.
Example 8. Consider again the program P from Example 3:
r1 : b ← not ¬b, a.
r2 : c ← not b.
r3 : a ← not c.
Corollary 1. If P is a fully prioritized elp with negative-cycle-free and head-consistent P , then the set IP is the preferred answer set.
Consider the program P ′ consisting only of r2 , r3 with r2 <
r3 . This program has one answer set A1 = {c} which is also
preferred according to our semantics. The full program P has
two answer sets, A1 and the now preferred A2 = {a, b}. P ′
has no BE-preferred answer sets while for P both A1 , A2 are
BE-preferred.
According to Principle 2, {c} should remain a preferred
answer set because r1 is not applicable in A1 (not relevant).
But in terms of attacks, r1 seems relevant since it can attack
r2 and has higher priority. Moreover, if we replace r1 with
the unfolded version b ← not ¬b, not c, which results in an
equivalent program, Principle 2 no longer says anything about
it since it defines relevance in terms of prerequisites (literals
in body + ). In other words, applying an unfolding operation
The main point of this corollary is that for a prioritized
elp P with negative-cycle-free, head-consistent program, the
computation of the sequence X0 , . . . , Xn of Definition 12 is
guaranteed to produce an answer set, making the check stipulated in Definition 13 unnecessary.
In [Brewka and Eiter, 1999] one can find multiple examples illustrating how the BE-preferred answer set semantics
overcomes some of the shortcomings of previous approaches.
Based on their study of these shortcomings and the development of their semantics, Brewka and Eiter proposed two
principles that ought to be satisfied by any system based on
prioritized defeasible rules. We reproduced them here in their
particular form for prioritized elps.
allows switching from violating to satisfying Principle 2, even though unfolding has no effect on a program's semantics.

The example above also shows that satisfying Principle 2 necessarily requires some programs to have no preferred answer sets or to have multiple ones. In the above example, P ′ has one answer set, {c}, which is not the same as the intuitively preferred answer set of P. But Principle 2 requires that if {c} is a preferred answer set, it must remain one after adding r1 . For these reasons we believe that Principle 2 as stated is not entirely suitable.

It is worth mentioning that Brewka and Eiter [1999] define another semantics, called the weakly preferred semantics, which assigns preferred answer sets to programs that do not have one under the BE semantics. However, this semantics is based on quantitative measures of relative satisfaction of the preferences, which is very different in style from the semantics we propose here and from the BE semantics. Moreover, the weakly preferred semantics does not satisfy either of the above principles.

6 Conclusions

We have proposed a new (selective) semantics for prioritized extended logic programs. In contrast to previously proposed semantics, ours selects at most one preferred answer set for all programs that are fully prioritized. Furthermore, for a large class of programs guaranteed to have an answer set, the existence of a preferred answer set is also guaranteed. We have also shown that our semantics captures the intended meaning of preferences as postulated by Principle 1 from [Brewka and Eiter, 1999].

Future work includes determining whether the set of preferred answer sets of a program under our semantics is a subset of the BE-preferred answer sets when the latter exist. Another direction is to generalize the results to programs with variables.

References

[Aravindan and Dung, 1995] Chandrabose Aravindan and Phan Minh Dung. On the correctness of unfold/fold transformation of normal and extended logic programs. Journal of Logic Programming, 24(3):201–217, 1995.

[Baral, 2003] Chitta Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, 2003.

[Brass and Dix, 1999] Stefan Brass and Jürgen Dix. Semantics of (disjunctive) logic programs based on partial evaluation. Journal of Logic Programming, 40(1):1–46, 1999.

[Brewka and Eiter, 1999] Gerhard Brewka and Thomas Eiter. Preferred answer sets for extended logic programs. Artificial Intelligence, 109(1–2):297–356, 1999.

[Delgrande et al., 2000] James P. Delgrande, Torsten Schaub, and Hans Tompits. Logic programs with compiled preferences. In Werner Horn, editor, Proceedings of the 14th European Conference on Artificial Intelligence (ECAI'00), pages 464–468, 2000.

[Dix, 1995] Jürgen Dix. A classification theory of semantics of normal logic programs: II. Weak properties. Fundamenta Informaticae, 22(3):257–288, 1995.

[Gelfond and Lifschitz, 1991] Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9(3–4):365–386, 1991.

[Gelfond and Son, 1997] Michael Gelfond and Tran Cao Son. Reasoning with prioritized defaults. In Jürgen Dix, Luis Moniz Pereira, and Teodor C. Przymusinski, editors, Selected Papers of the Third International Workshop on Logic Programming and Knowledge Representation, number 1471 in LNCS, pages 164–223. Springer, 1997.

[Gelfond, 2008] Michael Gelfond. Answer sets. In F. van Harmelen, V. Lifschitz, and B. Porter, editors, Handbook of Knowledge Representation, chapter 7, pages 285–316. Elsevier, 2008.

[Sakama and Inoue, 1996] Chiaki Sakama and Katsumi Inoue. Representing priorities in logic programs. In M. Maher, editor, Proceedings of the Joint International Conference and Symposium on Logic Programming, pages 82–96, 1996.

[Schaub and Wang, 2001] Torsten Schaub and Kewen Wang. A comparative study of logic programs with preference. In Bernhard Nebel, editor, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI'01), pages 597–602, 2001.

[Wang et al., 2000] Kewen Wang, Lizhu Zhou, and Fangzhen Lin. Alternating fixpoint theory for logic programs with priority. In John W. Lloyd, Veronica Dahl, Ulrich Furbach, Manfred Kerber, Kung-Kiu Lau, Catuscia Palamidessi, Luis M. Pereira, Yehoshua Sagiv, and Peter J. Stuckey, editors, Proceedings of the First International Conference on Computational Logic, number 1861 in LNCS, pages 164–178. Springer, 2000.

[Zhang and Foo, 1997] Yan Zhang and Norman Y. Foo. Answer sets for prioritized logic programs. In Jan Maluszynski, editor, Proceedings of the International Symposium on Logic Programming (ILPS'97), pages 69–83, 1997.