Proceedings of the 10th International Workshop on Computational Logic in Multi-Agent Systems 2009
Jürgen Dix, Michael Fisher and Peter Novák (eds.)
IfI Technical Report Series IfI-09-08

Impressum
Publisher: Institut für Informatik, Technische Universität Clausthal, Julius-Albert Str. 4, 38678 Clausthal-Zellerfeld, Germany
Editor of the series: Jürgen Dix
Technical editor: Michael Köster
Contact: [email protected]
URL: http://www.in.tu-clausthal.de/forschung/technical-reports/
ISSN: 1860-8477

The IfI Review Board
Prof. Dr. Jürgen Dix (Theoretical Computer Science/Computational Intelligence)
Prof. Dr. Klaus Ecker (Applied Computer Science)
Prof. Dr. Barbara Hammer (Theoretical Foundations of Computer Science)
Prof. Dr. Sven Hartmann (Databases and Information Systems)
Prof. Dr. Kai Hormann (Computer Graphics)
Prof. Dr. Gerhard R. Joubert (Practical Computer Science)
apl. Prof. Dr. Günter Kemnitz (Hardware and Robotics)
Prof. Dr. Ingbert Kupka (Theoretical Computer Science)
Prof. Dr. Wilfried Lex (Mathematical Foundations of Computer Science)
Prof. Dr. Jörg Müller (Business Information Technology)
Prof. Dr. Niels Pinkwart (Business Information Technology)
Prof. Dr. Andreas Rausch (Software Systems Engineering)
apl. Prof. Dr. Matthias Reuter (Modeling and Simulation)
Prof. Dr. Harald Richter (Technical Computer Science)
Prof. Dr. Gabriel Zachmann (Computer Graphics)
Prof. Dr. Christian Siemers (Hardware and Robotics)

These are the proceedings of the tenth International Workshop on Computational Logic in Multi-Agent Systems (CLIMA-X), held 9-10th September in Hamburg, co-located with MATES and MOCA.

Multi-Agent Systems are communities of problem-solving entities that can perceive and act upon their environment in order to achieve both their individual goals and their joint goals. The work on such systems integrates many technologies and concepts from artificial intelligence and other areas of computing as well as other disciplines. Over recent years, the agent paradigm has gained popularity, due to its applicability to a full spectrum of domains, such as search engines, recommendation systems, educational support, e-procurement, simulation and routing, electronic commerce and trade, etc.

Computational logic provides a well-defined, general, and rigorous framework for studying the syntax, semantics and procedures for the various tasks in individual agents, as well as the interaction between, and integration among, agents in multi-agent systems. It also provides tools, techniques and standards for implementations and environments, for linking specifications to implementations, and for the verification of properties of individual agents, multi-agent systems and their implementations.

This year, we have again organised the Multi-Agent Contest with CLIMA-X (http://www.multiagentcontest.org/). It is now the fifth in a series that started in 2005 with CLIMA-6 in London. The contest is an attempt to stimulate research in the area of multi-agent programming by (1) identifying key problems and (2) collecting suitable benchmarks that can serve as milestones for testing agent-oriented programming languages, platforms and tools. A simulation platform has been developed to test MASs which have to solve a cooperative task in a dynamically changing environment.
Last year we changed our scenario and now consider the problem of herding cows: a truly cooperative task which requires the agents to work together and not on their own. These proceedings feature 11 regular papers (from a total of 18 papers submitted), as well as an abstract of an invited talk by Son Tran and the six contest papers.

We thank all the authors of CLIMA-X and of the Multi-Agent Contest for submitting papers and for revising their contributions to be included in these proceedings. We are very grateful to the members of the CLIMA-X programme committee and the additional reviewers. Their service ensured the high quality of the accepted papers. A special thank you goes to the local organisers in Hamburg for their help and support. We are very grateful to them for handling the registration and a very enjoyable social program.

August 2009
Jürgen Dix, Michael Fisher, Peter Novák

Conference Organization

Steering Committee: Jürgen Dix (Clausthal University of Technology, Germany), Michael Fisher (University of Liverpool, United Kingdom), João Leite (New University of Lisbon, Portugal), Francesca Toni (Imperial College London, United Kingdom), Fariba Sadri (Imperial College London, United Kingdom), Ken Satoh (National Institute of Informatics, Japan), Paolo Torroni (University of Bologna, Italy)

Programme Chairs: Jürgen Dix (Clausthal University of Technology, Germany), Michael Fisher (University of Liverpool, United Kingdom), Peter Novák (Clausthal University of Technology)

Programme Committee: Thomas Ågotnes (Bergen, NO), Natasha Alechina (Nottingham, UK), Jose Julio Alferes (Lisbon, PT), Rafael Bordini (Durham, UK), Gerhard Brewka (Leipzig, DE), Keith Clark (Imperial, UK), Stefania Costantini (L'Aquila, IT), Mehdi Dastani (Utrecht, NL), Louise Dennis (Liverpool, UK), Chiara Ghidini (Trento, IT), James Harland (RMIT, AUS), Hisashi Hayashi (Toshiba, JP), Koen Hindriks (Delft, NL), Wiebe van der Hoek (Liverpool, UK), Katsumi Inoue (NII, JP), Wojtek Jamroga (Clausthal, DE), Viviana Mascardi (Genoa, IT), Paola Mello (Bologna, IT), John-Jules Meyer (Utrecht, NL), Leora Morgenstern (Stanford, USA), Naoyuki Nide (Nara, JP), Mehmet Orgun (Macquarie, AUS), Maurice Pagnucco (NSW, AUS), Jeremy Pitt (Imperial, UK), Enrico Pontelli (New Mexico, USA), Chiaki Sakama (Wakayama, JP), Renate Schmidt (Manchester, UK), Tran Cao Son (New Mexico, USA), Kostas Stathis (RHUL, UK), Michael Thielscher (Dresden, DE), Marina de Vos (Bath, UK), Cees Witteveen (Delft, NL)

External Reviewers: Gauvain Bourgne, Carlos Ivan Chesnevar, Agostino Dovier, Rubén Fuentes-Fernández, Ullrich Hustadt, Wojtek Jamroga, Mehrnoosh Sadrzadeh, Yingqian Zhang

Workshop Programme

Session 1: Formal approaches and model checking. Wednesday 11:30-13:00, session chair: Michael Fisher
Invited talk: Logic Programming and Multiagent Planning. Wednesday 16:30-17:30, conference joint session, session chair: Jürgen Dix
Session 2: Belief-Desire-Intention. Wednesday 14:30-15:30, session chair: Koen V. Hindriks
AgentContest results announcement. Wednesday 15:30-16:00
Session 3: Answer Set Programming and (Multi-)Agent Systems. Thursday 10:30-12:00, session chair: Peter Novák
Session 4: Coordination and Deliberation. Thursday 13:30-15:00, session chair: Ken Satoh

Table of Contents

Invited talk
Logic Programming and Multiagent Planning (invited talk), Tran Cao Son ... 1

Session 1. Formal approaches and model checking
RTL and RTL*: Expressing Abilities of Resource-Bounded Agents, Nils Bulling, Berndt Farwer ... 2
Reasoning about Multi-Agent Domains using Action Language C: A Preliminary Study, Chitta Baral, Tran Cao Son, Enrico Pontelli ... 20
Model Checking Normative Agent Organisations, Louise Dennis, Nick Tinnemeier, John-Jules Meyer ... 38

Session 2. Belief-Desire-Intention
Operational Semantics for BDI Modules in Multi-Agent Programming, Mehdi Dastani, Bas Steunebrink ... 55
BDI logic with probabilistic transition and fixed-point operator, Naoyuki Nide, Shiro Takata, Megumi Fujita ... 71

Session 3. Answer Set Programming and (Multi-)Agent Systems
InstQL: A Query Language for Virtual Institutions using Answer Set Programming, Luke Hopton, Owen Cliffe, Marina De Vos, Julian Padget ... 87
On the Implementation of Speculative Constraint Processing, Jiefei Ma, Alessandra Russo, Krysia Broda, Hiroshi Hosobe, Ken Satoh ... 105
Interacting Answer Sets, Chiaki Sakama, Tran Cao Son ... 121

Session 4. Coordination and deliberation
A Characterization of Mixed-Strategy Nash Equilibria in PCTL Augmented with a Cost Quantifier, Pedro Arturo Gongora, David A. Rosenblueth ... 139
Argumentation-Based Preference Modelling with Incomplete Information, Wietske Visser, Koen Hindriks, Catholijn Jonker ... 156
A Difference Logic Approach to Solve Matching Problems in Multi-Agent Settings, Helena Keinänen, Misa Keinänen ... 172

Special track. Multi-Agent Programming Contest
On the Decentral Coordination of Artificial Cowboys: A Jadex-based Realization, Gregor Balthasar, Jan Sudeikat, Wolfgang Renz ... 188
Developing Artificial Herders Using Jason, Niklas Skamriis Boss, Andreas Schmidt Jensen, Jørgen Villadsen ... 193
Herding Agents - MicroJIAC 2.0 Team in Multi-Agent Programming Contest 2009, Erdene-Ochir Tuguldur ... 198
Using Jason, MOISE+, and CArtAgO to Develop a Team of Cowboys, Jomi Fred Hubner, Rafael H. Bordini, Gustavo Pacianotto Pacianotto, Ricardo Hahn Pereira, Gauthier Picard, Michele Piunti, Jaime Sichman ... 203
AF-ABLE: System Description, David Lillis ... 208
Cows and Fences: JIAC V - AC'09 Team Description, Axel Hessler ... 213

Logic Programming and Multiagent Planning

Tran Cao Son
Department of Computer Science, New Mexico State University, Las Cruces, NM 88003, USA
[email protected]

Abstract. Multiagent planning deals with the problem of generating plans for multiple agents. It requires formalizing ways for the agents to interact and cooperate, in order to achieve their goals. We will discuss two possible ways for agents to interact: the execution of cooperative actions and negotiations.
We begin with the introduction of an action language for specifying multiagent planning problems. We next discuss a model for the integration of negotiation in multiagent planning. Finally, we show how multiagent plans can be computed via answer set programming. The work presented in this talk is joint work with Chiaki Sakama and Enrico Pontelli.

Expressing Properties of Resource-Bounded Systems: The Logics RTL and RTL*

Nils Bulling¹ and Berndt Farwer²
¹ Department of Informatics, Clausthal University of Technology, Germany
² School of Engineering and Computing Sciences, Durham University, UK

Abstract. Computation systems and logics for modelling such systems have been studied to a great extent in the past decades. This paper introduces resources into the models of systems and proposes the Resource-Bounded Tree Logics RTL and RTL*, based on the well-known Computation Tree Logics CTL and CTL*, for reasoning about computations of such systems. We present some preliminary results on the complexity/decidability of model checking.

1 Introduction

The basic idea of rational agents being autonomous entities perceiving changes in their environment and acting according to a set of rules or plans in the pursuit of goals does not take resources into account. However, many actions that an agent would execute in order to achieve a goal can, in real life, only be carried out in the presence of certain resources. Without sufficient resources some actions are not available, leading to plan failure. The analysis of agents and (multi-agent) systems with resources is still in its infancy and has been tackled almost exclusively in a pragmatic and experimental way. This paper takes first steps in modelling resource-bounded systems (which can be considered as the single-agent case of the scenario just described). Well-known computational models are combined with a notion of resource to enable a more systematic and rigorous specification and analysis of such systems. The main motivation of this paper is to propose a fundamental formal setting. In the future we plan to focus on a more practical aspect, i.e. how this setting can be used for the verification of systems.

The proposed logic builds on Computation Tree Logic [4]. Essentially, the existential path quantifier in Eγ (there is a computation that satisfies γ) is replaced by ⟨ρ⟩, where ρ represents a set of available resources. The intuitive reading of the formula ⟨ρ⟩γ is that there is a computation feasible with the given resources ρ that satisfies γ. Finally, we turn to the decidability of model checking the proposed logics. We show that RTL, the less expressive version, has a decidable model-checking problem, as do restricted variants of the full logic RTL* and its models. A closer analysis is left for future research.

The remainder of the paper is structured as follows. In Section 2 we recall the computation tree logic CTL* and define multisets, used as a representation of the resources. Section 3 forms the main part of the paper. Here we introduce resources into the well-known logic CTL* and its models. Subsequently, in Section 4 we show some properties of the logics. Section 5 includes the analysis of the model checking complexity, and finally, we conclude with an outlook on future work in Section 6.

2 Preliminaries

In this section we present the computation tree logics CTL and CTL* as well as multisets, which we will use to represent resources.
2.1 Computation Tree Logic and Transition Systems

An (unlabelled) transition system (or Kripke structure) T = (Q, →) consists of a finite set of states Q and a (serial) binary relation → ⊆ Q × Q between states. We say that a state q′ is reachable from a state q if q → q′. A Kripke model is defined as M = (Q, →, Props, π) where (Q, →) is a transition system, Props a non-empty set of propositions, and π : Q → P(Props) a labelling function that indicates which propositions are true in a given state. Such models represent the temporal behaviour of systems. There are no restrictions on the number of times a transition is used.

A path λ of a transition system is an infinite sequence q₀q₁... of states such that qᵢ → qᵢ₊₁ for all i = 0, 1, 2, .... Given a path λ we use λ[i] and λ[i, j] to refer to the state qᵢ and to the path qᵢqᵢ₊₁...qⱼ (where j = ∞ is permitted), respectively. A path starting in q is called a q-path. The set of all paths in M is denoted by Λ_M and the set of all q-paths by Λ_M(q).

Formulae of CTL* [6] are defined by the following grammar:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | Eγ
γ ::= ϕ | ¬γ | γ ∧ γ | ϕ U ϕ | ◯ϕ

where p ∈ Props. Formulae ϕ (resp. γ) are called state (resp. path) formulae. There are two temporal operators: ◯ (in the next moment in time) and U (until). The temporal operators ♦ (sometime in the future) and □ (always in the future) can be defined as abbreviations. CTL* formulae are interpreted over Kripke structures; truth is given by the satisfaction relation in the usual way. For state formulae we have:

M, q |= p iff p ∈ π(q), for p ∈ Props;
M, q |= ¬ϕ iff M, q ⊭ ϕ;
M, q |= ϕ ∧ ψ iff M, q |= ϕ and M, q |= ψ;
M, q |= Eϕ iff there is a path λ ∈ Λ_M(q) such that M, λ |= ϕ;

and for path formulae:

M, λ |= ϕ iff M, λ[0] |= ϕ;
M, λ |= ¬γ iff M, λ ⊭ γ;
M, λ |= γ ∧ δ iff M, λ |= γ and M, λ |= δ;
M, λ |= ◯γ iff M, λ[1, ∞] |= γ; and
M, λ |= γ U δ iff there is an i ∈ N₀ such that M, λ[i, ∞] |= δ and M, λ[j, ∞] |= γ for all 0 ≤ j < i.

A less expressive fragment of CTL* called CTL [4] has become popular due to its better computational properties. CTL restricts CTL* such that every temporal operator must be directly preceded by a path quantifier. The formula E□♦p, for instance, is a formula of the full language but not of the restricted version.
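To make these definitions concrete, here is a small sketch (the toy model and helper names are our own, not anything from the paper) that encodes a Kripke model as plain Python data and decides a reachability property of the form E♦p by breadth-first search:

```python
from collections import deque

# A Kripke model in the sense above, as plain Python data (our toy example).
Q = {"q0", "q1", "q2"}
TRANS = {("q0", "q1"), ("q1", "q0"), ("q0", "q2"), ("q2", "q2")}  # serial relation
PI = {"q0": set(), "q1": {"r"}, "q2": {"s"}}                       # labelling

def successors(q):
    return [t for (s, t) in TRANS if s == q]

def ef(p, q0):
    """Decide M, q0 |= E<>p (some path eventually reaches a p-state).
    For this existential reachability property a search over reachable
    states suffices; no path needs to be constructed explicitly."""
    seen, frontier = set(), deque([q0])
    while frontier:
        q = frontier.popleft()
        if q in seen:
            continue
        seen.add(q)
        if p in PI[q]:
            return True
        frontier.extend(successors(q))
    return False

print(ef("s", "q0"))  # True: q0 -> q2 reaches a state labelled s
```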
2.2 Multisets

We define some variations of multisets used in the following sections. We assume that N = {0, 1, 2, ...} and Z = {..., −2, −1, 0, 1, 2, ...}.

Definition 1 (Z/Z∞-multiset, X±, X±_{≠∞}, N/N∞-multiset, X⊕, X⊕_{≠∞}). Let X be a non-empty set.
(a) A Z-multiset (or Z-bag) Z : X → Z over the set X is a mapping from the elements of X to the integers. A Z∞-multiset (or Z∞-bag) Z : X → Z ∪ {−∞, ∞} over the set X is a mapping from the elements of X to the integers extended by −∞ and ∞. The set of all Z-multisets (resp. Z∞-multisets) over X is denoted by X±_{≠∞} (resp. X±).
(b) An N-multiset (resp. N∞-multiset) N over X is a Z-multiset (resp. Z∞-multiset) over X such that for each x ∈ X we have N(x) ≥ 0. The set of all N-multisets (resp. N∞-multisets) over X is denoted by X⊕_{≠∞} (resp. X⊕).

Whenever we speak of a 'multiset' without further specification, the argument is supposed to hold for any variant from Def. 1. In general, we overload the standard set notation and use it also for multisets, i.e. ⊆ denotes multiset inclusion, ∅ is the empty multiset, etc. We assume a global set of resource types R. The resources of an individual agent form a multiset over this set.

Z-multiset operations are straightforward extensions of N-multiset operations. Multisets are frequently written as formal sums, i.e., a multiset M : X → N is written as Σ_{x∈X} M(x)·x. Given two multisets M : X → N and M′ : X → N over the same set X, multiset union is denoted by +, and is defined as (M + M′)(x) := M(x) + M′(x) for all x ∈ X. Multiset difference is defined only if M has at least as many copies of each element as M′. Then, (M − M′)(x) := M(x) − M′(x) for all x ∈ X. For Z-multisets, + is defined exactly as for multisets, but the condition is dropped for multiset difference, since for Z-multisets negative multiplicities are possible. Finally, for Z∞-multisets we assume the standard arithmetic rules for −∞ and ∞ (for example, x + ∞ = ∞, x − ∞ = −∞, etc.) with the following exceptional deviation: ∞ − ∞ = 0 = −∞ + ∞.

We define multisets with a bound on the number of elements of each type.

Definition 2 (Bounded multisets). Let k, l ∈ Z. We say that a multiset M over a set X is k-bounded iff ∀x ∈ X (M(x) ≤ k). We use ᵏX± to denote the set of all k-bounded Z∞-multisets over X, and analogously for the other types of multisets. We also introduce lower bounds and say that a multiset M over a set X is ᵏₗ-bounded iff ∀x ∈ X (l ≤ M(x) ≤ k). We use ᵏₗX± to denote the set of all ᵏₗ-bounded Z∞-multisets over X, and analogously for the other types of multisets.

Finally, we define the (positive) restriction of a multiset with respect to another multiset, allowing us to focus on elements with a positive multiplicity.

Definition 3 ((Positive) restriction, M|_N). Let M be a multiset over X and let N be a multiset over Y. The (positive) restriction of M regarding N, M|_N, is the multiset over X ∪ Y defined by M|_N(x) := M(x) if N(x) ≥ 0 and x ∈ Y, and M|_N(x) := 0 otherwise. So, the multiset M|_N is equal to M for all elements contained in N which have a non-negative quantity, and 0 for all other elements.
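These operations translate directly into code. The following sketch (our own dictionary encoding and function names) implements union, difference, the exceptional rule ∞ − ∞ = 0 = −∞ + ∞, boundedness, and the positive restriction of Definition 3:

```python
import math

# Z∞-multisets as dicts mapping elements to quantities in Z ∪ {-inf, +inf};
# missing keys count as quantity 0. This encoding is our own sketch.

def union(m1, m2):
    """Multiset union: (M + M')(x) := M(x) + M'(x), with the exceptional
    deviation inf - inf = 0 = -inf + inf from the text."""
    out = {}
    for x in set(m1) | set(m2):
        a, b = m1.get(x, 0), m2.get(x, 0)
        out[x] = 0 if {a, b} == {math.inf, -math.inf} else a + b
    return out

def difference(m1, m2):
    """Z∞-multiset difference (no side condition, unlike N-multisets)."""
    return union(m1, {x: -v for x, v in m2.items()})

def restrict(m, n):
    """(Positive) restriction M|_N of Definition 3: keep M(x) whenever x is
    in N's domain with N(x) >= 0, and use 0 otherwise."""
    return {x: (m.get(x, 0) if x in n and n[x] >= 0 else 0)
            for x in set(m) | set(n)}

def k_bounded(m, k):
    """Definition 2: every quantity is at most k."""
    return all(v <= k for v in m.values())
```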
3 Modelling Resource-Bounded Systems

In this section we introduce resource-bounded models (RBMs) for modelling systems with limited resources. Then, we propose the logics RTL* and RTL (resource-bounded tree logics) for the verification of such systems. Subsequently, we introduce cover models and graphs and consider several properties and special cases of RBMs.

3.1 Resource-Bounded Systems

A resource-bounded agent has at its disposal a (limited) repository of resources. Performing actions reduces some resources and may produce others; thus, an agent might not always be able to perform all of its available actions. In the single-agent case that we consider here this corresponds to the activation or deactivation of transitions.

Definition 4 (Resources R, resource quantity (set), feasible). An element of the non-empty set R = {r₁, ..., r_ρ} is called a resource. A tuple (r, c) ∈ R × Z∞ is called a resource quantity and we refer to c as the quantity of r. A resource-quantity set is a Z∞-multiset ρ ∈ R±. Note that ρ specifies a resource quantity for each r ∈ R. Finally, a resource-quantity set ρ is called feasible iff ρ ∈ R⊕; that is, if all resources have a non-negative quantity.

We model resource-bounded systems by an extension of standard transition systems, allowing each transition to consume and produce resources. We assign pairs (c, p) of resource-quantity sets to each transition, denoting that a transition labelled (c, p) produces p and consumes c.

Definition 5 (Resource-bounded model). A resource-bounded model (RBM) is given by M = (Q, →, Props, π, R, t) where
– Q, R, and Props are finite sets of states, resources, and propositions, respectively;
– (Q, →, Props, π) is a Kripke model; and
– t : Q × Q → R⊕ × R⊕ is a (partial) resource function, assigning to each transition (i.e. each tuple (q, q′) ∈ →) a tuple of multisets of resources.

Instead of t(q, q′) we sometimes write t_{q,q′}, and for t_{q,q′} = (c, p) we use •t_{q,q′} (resp. t•_{q,q′}) to refer to c (resp. p). Hence, in order to make a transition from q to q′, where q → q′, the resources given in •t_{q,q′} are required; and in turn, the resources given in t•_{q,q′} are produced after executing the transition. A path of an RBM is a path of the underlying Kripke structure. We also use the other notions for paths introduced above.

The consumption and production of resources along a path can now be defined in terms of the consumptions and productions of the transitions it comprises. Intuitively, not every path of an RBM is feasible; consider, for instance, a system consisting of a single state q only, where q → q and t•_{q,q} = •t_{q,q}. It seems that the transition "comes for free" as it produces the resources it consumes; however, this is not the case. The path qqq... is not feasible, as the initial transition is not enabled due to the lack of initial resources. Hence, in order to enable it, at least the resources given in •t_{q,q} are necessary.

Definition 6 (ρ-feasible path). Let M = (Q, →, Props, π, R, t) be an RBM and let ρ ∈ R± be a resource-quantity set. A path λ = q₁q₂q₃⋯ ∈ Λ_M(q) where q = q₁ is called ρ-feasible if for all i ∈ N the resource-quantity set

(ρ + Σ_{j=1}^{i−1} (t•_{q_j q_{j+1}} − •t_{q_j q_{j+1}}))|_{•t_{q_i q_{i+1}}} − •t_{q_i q_{i+1}}

is feasible. Intuitively, a path is said to be ρ-feasible if each transition in the sequence can be executed with the resources available at the time of execution.
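A simplified check of this condition on finite prefixes could look as follows; the restriction operator is folded into the per-transition test that every positively consumed resource is covered by the current balance, and the transition table is an invented example, not taken from the paper:

```python
# Transitions t as a dict from (q, q') to (consume, produce), each a dict
# from resource names to quantities; this toy table is our own example.
t = {
    ("q0", "q1"): ({}, {"r2": 2}),
    ("q1", "q0"): ({"r1": 1, "r2": 1}, {"r2": 2}),
}

def prefix_feasible(path, rho, t):
    """Definition 6 restricted to a finite prefix: every transition must be
    enabled by the resources accumulated so far (each positively consumed
    resource must be covered), updating the balance by -consume +produce."""
    avail = dict(rho)
    for q, q2 in zip(path, path[1:]):
        consume, produce = t[(q, q2)]
        if any(c > 0 and avail.get(r, 0) < c for r, c in consume.items()):
            return False  # transition not enabled at this point
        for r, c in consume.items():
            avail[r] = avail.get(r, 0) - c
        for r, p in produce.items():
            avail[r] = avail.get(r, 0) + p
    return True

print(prefix_feasible(["q0", "q1", "q0"], {"r1": 1}, t))  # True
```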
3.2 Resource-Bounded Tree Logic

We present a logic based on CTL* which can be used to verify systems with limited resources. In the logic we replace the CTL* path quantifier E by ⟨ρ⟩, where ρ is a resource-quantity set. The intuitive reading of a formula ⟨ρ⟩γ is that there is a(n) (infinite) ρ-feasible path λ on which γ holds. Note that E can be defined as ⟨∅⟩. Formally, the language is defined as follows.

Definition 7 ((Full) Resource-Bounded Tree Logic RTL*). Let R be a set of resources and let Props be a set of propositions. Formulae of RTL* are defined by the following grammar:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨ρ⟩γ
γ ::= ϕ | ¬γ | γ ∧ γ | ϕ U ϕ | ◯ϕ

where p ∈ Props and ρ ∈ R±. Formulae ϕ (resp. γ) are called state (resp. path) formulae. Moreover, we define fragments of RTL* in which the domain of ρ is restricted. Let X be any set of multisets over R. Then RTL*_X restricts RTL* in such a way that ρ ∈ X. Finally, we define [ρ], the dual of ⟨ρ⟩, as ¬⟨ρ⟩¬.

Analogously to CTL we define RTL as the fragment of RTL* in which each temporal operator is immediately preceded by a path quantifier.

Definition 8 (Resource-Bounded Tree Logic RTL). Let R be a set of resources and let Props be a set of propositions. Formulae of RTL are defined by the following grammar:

ϕ ::= p | ¬ϕ | ϕ ∧ ϕ | ⟨ρ⟩◯ϕ | ⟨ρ⟩□ϕ | ⟨ρ⟩ϕ U ϕ

where p ∈ Props and ρ ∈ R±. Fragments RTL_X are defined in analogy to Def. 7.

As in CTL we define ♦ϕ as ⊤ U ϕ, and we use the following abbreviations for the universal quantifiers (they are not definable as duals in RTL as, for example, ¬⟨ρ⟩¬□ϕ is not an admissible RTL formula):

[ρ]◯ϕ ≡ ¬⟨ρ⟩◯¬ϕ
[ρ]□ϕ ≡ ¬⟨ρ⟩♦¬ϕ
[ρ]ϕ U ψ ≡ ¬⟨ρ⟩((¬ψ) U (¬ϕ ∧ ¬ψ)) ∧ ¬⟨ρ⟩□¬ψ

Next, we give the semantics for both logics.

Definition 9 (Semantics of RTL*). Let M be an RBM, let q be a state in M, and let λ ∈ Λ_M. The semantics of RTL*-formulae is defined by the satisfaction relation |= as follows. For state formulae:

M, q |= p iff p ∈ π(q), for p ∈ Props;
M, q |= ϕ ∧ ψ iff M, q |= ϕ and M, q |= ψ;
M, q |= ⟨ρ⟩ϕ iff there is a ρ-feasible path λ ∈ Λ(q) such that M, λ |= ϕ;

and for path formulae:

M, λ |= ϕ iff M, λ[0] |= ϕ;
M, λ |= ¬γ iff not M, λ |= γ;
M, λ |= γ ∧ ψ iff M, λ |= γ and M, λ |= ψ;
M, λ |= □ϕ iff for all i ∈ N we have that M, λ[i, ∞] |= ϕ;
M, λ |= ◯ϕ iff M, λ[1, ∞] |= ϕ; and
M, λ |= ϕ U ψ iff there is an i ≥ 0 such that M, λ[i, ∞] |= ψ and M, λ[j, ∞] |= ϕ for all 0 ≤ j < i.

[Fig. 1. A simple RBM and its cover graph: (a) the RBM M; (b) three cover graphs for M.]

Thus the meaning of [ρ]□p is that proposition p holds in every state on any ρ-feasible path. We now discuss some interpretations of the formula ⟨ρ⟩γ considering various resource-quantity sets. For ρ ∈ R⊕_{≠∞} it is assumed that ρ consists of an initial (positive) amount of resources which can be used to achieve γ, where the quantity of each resource is finite. ρ ∈ R⊕ allows us to ignore some resources (i.e. it is assumed that there is an infinite quantity of them). Note that there might be transitions that consume all resources (since ∞ − ∞ is defined to be 0). Initial debts of resources can be modelled by ρ ∈ R±_{≠∞}.

Example 1. Consider the RBM M in Figure 1. Each transition is labelled by (c₁, c₂), (p₁, p₂) with the interpretation: the transition consumes cᵢ and produces pᵢ quantities of resource rᵢ for i = 1, 2. We encode a resource-quantity set by (a₁, a₂) to express that there are aᵢ quantities of resource rᵢ for i = 1, 2.
– If there are infinitely many resources available, proposition t can become true infinitely often: M, q₀ |= ⟨(∞, ∞)⟩□♦t.
– We have M, q₀ ⊭ ⟨(1, 1)⟩□⊤ as there is no (1, 1)-feasible (infinite) path. The formula ⟨(1, ∞)⟩□(p ∨ t) holds in q₀.
– Is there a way that the system runs forever given specific resources? Yes, if we assume, for instance, that there are infinitely many resources of r₁ and at least one resource of r₂: M, q₀ |= ⟨(∞, 1)⟩□⊤.
These simple examples show that it is not always immediate whether a formula is satisfied; sometimes a rather tedious calculation might be required.

3.3 Cover Graphs and Cover Models

In this section we introduce a transformation of RBMs into unlabelled transition systems. This allows us to reduce truth in RTL to truth in CTL. We say that a resource-quantity set covers another if it has at least as many resources of each type, with at least one amount actually exceeding that of the other resource-quantity set. We are interested in cycles of transition systems that produce more resources than they consume, thereby giving rise to unbounded resources of some type(s).
This is captured by a cover graph for RBMs, extending ideas from [8] and requiring an ordering on resource quantities.

Definition 10 (Resource ordering <). Let ρ and ρ′ be resource sets in R±. We say ρ < ρ′ iff (∀r ∈ R (ρ(r) ≤ ρ′(r))) ∧ (∃r ∈ R (ρ(r) < ρ′(r))). We say ρ has strictly less resources than ρ′, or ρ′ covers ρ. The ordering is extended to allow values of ω by defining, for x ∈ N, that ∞ + ω = ∞, ∞ − ω = ∞, ω − ∞ = −∞, ω + x = ω, ω − x = ω, and ω < ∞.

Definition 11 (ρ-feasible transition, →_ρ). We say that a transition q → q′ is ρ-feasible, and write q →_ρ q′, if for all i ∈ {1, ..., |R|} we have that 0 < •t_{q,q′}(r_i) implies •t_{q,q′}(r_i) ≤ ρ(r_i). So, given a specific amount of resources ρ, a transition is said to be ρ-feasible if it can be traversed given ρ.

Definition 12 ((ρ, q)-cover graph of an RBM, path, λ|_Q). Let M = (Q, →, Props, π, R, t), let q be a state in Q, and let ρ ∈ R±. Without loss of generality, assume R = {r₁, ..., r_n} and consider (x_i)_i as an abbreviation for the sequence (x_i)_{i∈{1,...,n}}. The (ρ, q)-cover graph CG(M, ρ, q) for M with initial state q ∈ Q and an initial resource-quantity set ρ is the graph (V, E) defined as the least fixpoint of the following specification:
1. (q, (ρ(r_i))_i) ∈ V (the root vertex).
2. For (q′, (x_i)_i) ∈ V and q′′ ∈ Q with q′ →_{(x_i)_i} q′′, either:
(a) if there is a vertex (q′′, (x̄_i)_i) on the path from the root to (q′, (x_i)_i) in V with (x̄_i)_i < (x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i))_i, then (q′′, (x̃_i)_i) ∈ V and ((q′, (x_i)_i), (q′′, (x̃_i)_i)) ∈ E, where we define x̃_i := max{ω, x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i)} if x̄_i < x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i), and x̃_i := x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i) otherwise;
(b) else (q′′, (x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i))_i) ∈ V and ((q′, (x_i)_i), (q′′, (x_i − •t_{q′,q′′}(r_i) + t•_{q′,q′′}(r_i))_i)) ∈ E.
A path in CG(M, ρ, q) is an infinite sequence of pairwise adjacent states. Given a path λ = (q₁, (x¹_i)_i)(q₂, (x²_i)_i)... we use λ|_Q to denote the path q₁q₂..., i.e. the states of M are extracted from the states in V.

Example 2. We continue Example 1. On the right of Figure 1, some examples of cover graphs for different initial resource sets for M are shown. In the cover graph, ω denotes the reachability of unbounded resources while ∞ is used for an infinite amount of resources.
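A rough sketch of this construction for a single resource type is given below. As a simplification of our own, ω is represented by float("inf") (so the paper's distinction between ω and ∞ is not modelled), and a parent map is used to walk the path back to the root when looking for a covered same-state ancestor; termination on all inputs is exactly what Theorem 1 below establishes.

```python
OMEGA = float("inf")  # stands in for ω in this single-resource sketch

def cover_graph(q0, rho, trans):
    """Sketch of Definition 12 for one resource type. `trans` maps a state
    to a list of (consume, produce, successor) triples; returns the vertex
    and edge sets of the (rho, q0)-cover graph."""
    root = (q0, rho)
    vertices, edges = {root}, set()
    parent = {root: None}          # lets us walk the path back to the root
    stack = [root]
    while stack:
        v = stack.pop()
        q, x = v
        for c, p, q2 in trans.get(q, []):
            if c > 0 and x < c:    # transition not feasible here (Def. 11)
                continue
            y = x - c + p          # OMEGA - c + p stays OMEGA
            u, covered = v, False  # search for a covered same-state ancestor
            while u is not None:
                if u[0] == q2 and u[1] < y:
                    covered = True
                u = parent[u]
            w = (q2, OMEGA if covered else y)
            edges.add((v, w))
            if w not in vertices:
                vertices.add(w)
                parent[w] = v
                stack.append(w)
    return vertices, edges

# The q0-q1 loop produces one unit per round trip, so q0 reappears with ω:
trans = {"q0": [(0, 0, "q1"), (2, 0, "q2")],
         "q1": [(0, 1, "q0")],
         "q2": [(0, 0, "q2")]}
print(cover_graph("q0", 0, trans)[0])
```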
Proposition 1. Let ρ ∈ R±, let M be an RBM, let q be a state in M, and let G denote the (ρ, q)-cover graph of M. Then, for each node (q, (x_i)_i) of G the property x_i ≥ min{ρ(r_i), 0} holds.

Proof. Suppose there is a node (q, (x_i)_i) in G and an index j such that x_j < min{ρ(r_j), 0}. We first consider the case in which the minimum is equal to 0. Then, there must be a transition in G which causes a non-negative quantity of r_j to become negative. But such a transition is not feasible due to the construction of G. The case in which the minimum is equal to ρ(r_j) < 0 yields the same contradiction, as a negative quantity of r_j would be reduced even further, which is not allowed in the construction of G. ∎

The proposition states that non-positive resource quantities cannot decrease further. Theorem 1 is fundamental for the decidability of model checking RTL. Its proof is similar to the corresponding proof for Karp-Miller graphs [8].

Theorem 1 (Finiteness of the cover graph). Let ρ ∈ R±, let M be an RBM, and let q be a state in M. Then the (ρ, q)-cover graph of M is finite.

Proof. Let G denote the (ρ, q)-cover graph of M and let Q be the set of states in M. Assume G is infinite (i.e., G has infinitely many nodes). Then, there is an infinite path l = v₁v₂... in G that contains infinitely many different states. Since Q is finite, there is some state, say q′ ∈ Q, of M and an infinite subsequence of distinct states l′ = v_{i₁}v_{i₂}... of l with v_{i_j} = (q′, (x^j_k)_k) and i_j < i_{j+1} for all j = 1, 2, .... Due to the construction of the cover graph, it cannot be the case that (x^j_k)_k ≤ (x^{j′}_k)_k for any 1 ≤ j < j′; otherwise, an ω-node would have been introduced and the infinite sequence would have collapsed. So, there must be two distinct indices, o and p, with 1 ≤ o, p ≤ |R| such that, without loss of generality, x^j_o < x^{j′}_o and x^j_p > x^{j′}_p. But by Prop. 1 we know that each x^j_k ≥ min{ρ(r_k), 0}; hence, the previous property cannot hold for all indices o, p, j, j′, except possibly for the case in which ρ(r) = −∞ for some resource r. However, this would also yield a contradiction, as any non-negative resource quantity is bounded from below by 0. This proves that such an infinite path cannot exist and that the cover graph therefore has to be finite. ∎

Cover graphs can be viewed as Kripke frames. It is obvious how they can be extended to models given an RBM.

Definition 13 ((ρ, q)-cover model of an RBM). Let G = (V, E) be the (ρ, q)-cover graph of an RBM M = (Q, →, Props, π, R, t). The (ρ, q)-cover model of M, CM(M, ρ, q), is given by (V, E, Props, π′) with π′((q, (x_i)_i)) := π(q) for all (q, (x_i)_i) ∈ V.

In Section 4.1 we show that we can use cover models to reduce truth in RTL to truth in CTL; more precisely, given an RBM M and an RTL formula ϕ, a CTL formula ϕ′ and the cover model C′ are constructed from ϕ and M such that ϕ is true in M if, and only if, ϕ′ is true in C′. This allows us to use existing model-checking techniques for CTL, showing the decidability of our logic.

[Fig. 2. Cover models do not preserve truth for RTL* formulae: (a) the RBM M; (b) the (∅, q₀)-cover model for M.]

The next example, however, shows that cover models are not sufficient for the full language of RTL*.

Example 3. Consider the model M and the (∅, q₀)-cover model C := CM(M, ∅, q₀) of M in Figure 2. Note that the path q₀q₁q₀q₂q₂... in M corresponding to the path (q₀, 0)(q₁, 0)(q₀, ω)(q₂, ω)(q₂, ω)... in C is not ∅-feasible in M. For the formula γ = ♦s ∧ ◯◯□¬r we have that M, q₀ ⊭ ⟨∅⟩γ but C, (q₀, 0) |= Eγ. Note that γ in the example is an RTL* path formula. Theorem 2, however, shows that cover models are sufficient to guarantee invariance of truth of pure RTL formulae.

3.4 Properties of Resource-Bounded Models

In Section 5 we use cover models to show that the model-checking problem is decidable for RTL. Decidability of model checking for (full) RTL* over arbitrary RBMs is still open. However, we identify interesting subclasses in which the problem is decidable. Below we consider some restrictions which may be imposed on RBMs.

Definition 14 (Production-, zero (loop)-, ∞-free, k-, ᵏₗ-bounded). Let M = (Q, →, Props, π, R, t) be an RBM.
(a) We say that M is production free if for all q, q′ ∈ Q we have that t_{q,q′} = (c, ∅). That is, actions cannot produce resources; they only consume them.
(b) We say that M is zero free if there are no states q, q′ ∈ Q with q ≠ q′ and t_{q,q′} = (∅, p). That is, there are no transitions between distinct states which do not consume any resources.
(c) We say that M is zero-loop free if there are no states q, q′ ∈ Q with t_{q,q′} = (∅, p). That is, in addition to zero free models, loops without consumption of resources are also not allowed.
(d) We say that M is structurally k-bounded for ρ ∈ ᵏR± iff the available resources after any finite prefix of a ρ-feasible path are bounded by k, i.e., there is no reachable state in which the agent can have more than k resources of any resource type.
(e) We say that M is structurally ᵏₗ-bounded for ρ ∈ ᵏₗR± iff the available resources after any finite prefix of a ρ-feasible path are bounded by l from below and by k from above, i.e., there is no reachable state in which the agent can have less than l or more than k resources of any resource type.
(f) We say that M is ∞-free if there is no transition that consumes or produces an infinite amount of a resource. That is, there are no states q, q′ ∈ Q with t_{q,q′} = (c, p) such that there is a resource r with c(r) = ∞ or p(r) = ∞.
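Restrictions (a)-(c) and (f) are purely syntactic conditions on the resource function t and can be checked directly; a sketch over the dictionary encoding used earlier (the function names are ours):

```python
import math

# t: dict from (q, q') to (consume, produce) resource dictionaries.

def production_free(t):
    """Def. 14(a): no transition produces any resource."""
    return all(all(v == 0 for v in produce.values())
               for (_, produce) in t.values())

def zero_free(t):
    """Def. 14(b): every transition between distinct states consumes something."""
    return all(any(v > 0 for v in consume.values())
               for (q, q2), (consume, _) in t.items() if q != q2)

def zero_loop_free(t):
    """Def. 14(c): every transition, loops included, consumes something."""
    return all(any(v > 0 for v in consume.values())
               for (consume, _) in t.values())

def infinity_free(t):
    """Def. 14(f): no transition consumes or produces an infinite amount."""
    return all(math.inf not in consume.values()
               and math.inf not in produce.values()
               for (consume, produce) in t.values())
```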
In the following we summarise some results which are important for the model checking results presented in Section 5.

Proposition 2. Let M be an ∞-free RBM and let ρ ∈ R± be a resource-quantity set. Then, there is an ∞-free RBM M′ and a ρ′ ∈ R±_{≠∞}, both effectively constructible from M and ρ, such that the following holds: a path is ρ-feasible in M if, and only if, it is ρ′-feasible in M′.

Proof. Let ρ′ be equal to ρ, except that the quantity of each resource r with ρ(r) ∈ {−∞, ∞} is 0 in ρ′, and let M′ equal M apart from the following exceptions. For each transition (q, q′) with t_{q,q′} = (c, p) in M do the following: set c(r) = 0 in M′ for each r with ρ(r) = ∞; or remove the transition (q, q′) completely in M′ if c(r) > 0 (in M) and ρ(r) = −∞ for some resource r. Obviously, M′ is ∞-free and ρ′ ∈ R±_{≠∞}. Now, the left-to-right direction of the result is straightforward, as only transitions were omitted in M′ which cannot occur on any ρ-feasible path in M. The right-to-left direction is also obvious, as only those resource quantities were set to 0 in M′ of which an infinite amount is available in ρ, and only those transitions were removed which can never occur due to an infinite debt of resources. ∎

The next proposition presents some properties of the special classes of RBMs introduced above. In general there may be infinitely many ρ-feasible paths. We study some restrictions of RBMs that reduce the number of paths.

Proposition 3. Let M = (Q, →, Props, π, R, t) be an RBM.
(a) Let ρ ∈ R±_{≠∞} and let M be production and zero-loop free; then, there are no ρ-feasible paths.
(b) Let ρ ∈ R±_{≠∞} and let M be production and zero free. Then for each ρ-feasible path λ there is a (finite) initial segment λ′ of λ and a state q ∈ Q such that λ = λ′ ∘ qqq....
(c) Let ρ ∈ R±_{≠∞} and let M be production free. Then, each ρ-feasible path λ has the form λ = λ₁ ∘ λ₂ where λ₁ is a finite sequence of states and λ₂ is a path such that no transition in λ₂ consumes any resource.
(d) Let ρ ∈ R±_{≠∞} and let M be k-bounded for ρ. Then there are only finitely many state/resource combinations (i.e. elements of Q × R±_{≠∞}) possible on any ρ-feasible path.
(e) Let ρ ∈ R±_{≠∞}. Every ∞-free RBM M that is k-bounded for ρ is also ᵏₗ-bounded for ρ for some l.

Proof (Sketch). (a) As there are no resources with an infinite amount, and each transition is production free and consumes resources, some required resources must be exhausted after finitely many steps. (b) In contrast to (a), loops may come for free, and this is the only way in which ρ-feasible paths can result. (c) Assume the contrary. Then, in any infinite suffix of a path there is a resource-consuming transition that occurs infinitely often (as there are only finitely many transitions). But then, as the model is production free, the path cannot be ρ-feasible. (d) We show that there cannot be infinitely many state/resource combinations reachable on any ρ-feasible path. Since the condition of ρ-feasibility requires the consumed resources to be present, there is no possibility of infinitely decreasing sequences of resource-quantity sets. This gives a lower bound for the initially available resources ρ. The k-boundedness also gives an upper bound. (e) This also holds because of the latter argument. ∎

We show that k-boundedness and ᵏₗ-boundedness are decidable for RBMs.

Proposition 4 (Decidability of k-boundedness). Given a model M and an initial resource-quantity set ρ, the question whether M is structurally k-bounded (resp. ᵏₗ-bounded) for ρ is decidable.

Proof. First, we check that ρ ∈ ᵏR⊕. If this is not the case, then M is not k-bounded for ρ. Then we construct the cover graph of M and check whether there is a state (q, (x_i)_i) in it such that x_i > k for some i. If this is the case, M is not k-bounded; otherwise it is. The case of ᵏₗ-boundedness is treated similarly; one has to explicitly check the lower bound in addition to the upper bound for every vertex. ∎
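Under the single-resource simplification of the cover-graph sketch from Section 3.3, this procedure is a scan over the cover-graph vertices. The function below assumes that cover_graph sketch; an ω-vertex (float("inf")) violates any finite bound k:

```python
def structurally_k_bounded(q0, rho, trans, k):
    """Sketch of Proposition 4's procedure for one resource type: reject if
    the initial quantity is not in [0, k], then check every cover-graph
    vertex; an ω-vertex witnesses unboundedness (inf <= k is False)."""
    if not (0 <= rho <= k):
        return False
    vertices, _ = cover_graph(q0, rho, trans)
    return all(x <= k for (_, x) in vertices)
```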
We end this section with an easy result showing a sufficient condition for a model to be k-bounded.

Proposition 5. Let ρ ∈ R±_{≠∞}. Each production free and ∞-free RBM is k-bounded for ρ where k := max{i | ∃r ∈ R (ρ(r) = i)}.

4 Properties of Resource-Bounded Tree Logics

Before discussing specific properties of RTL and RTL*, and showing the decidability of the model-checking problem for RTL and for special cases of RTL* and its models, we note that our logics conservatively extend CTL* and CTL. This is easily seen by defining the path quantifier E as ⟨∅⟩ and by setting t_{q,q′} = (∅, ∅) for all states q and q′. Hence, every Kripke model has a canonical representation as an RBM.

Proposition 6 (Expressiveness). CTL* and CTL can be embedded in RTL* and RTL over all Kripke models, respectively.

Proof. Given a CTL* formula ϕ and a Kripke model M, we replace every existential path quantifier in ϕ by ⟨∅⟩ and denote the result by ϕ′. Then, we extend M to the canonical RBM M′ and have that M, q |= ϕ iff M′, q |= ϕ′. ∎

4.1 RTL and Cover Models

Let λ be a finite sequence of states. Then we recursively define λⁿ for n ∈ N as follows: λ⁰ := ε and λⁱ := λⁱ⁻¹λ for i ≥ 1. That is, λⁿ is the path which results from putting λ n times in sequence. The following lemma states that for RTL formulae it does not matter whether a cycle is traversed just once or many times. It can be proved by a simple induction on the path formula γ.

Lemma 1. Let γ be an RTL path formula containing no more path quantifiers, let M be an RBM, and let λ be a path in M. Now, if λ̃ = q₁...q_n is a finite subsequence of λ with q₁ = q_n (note that a single state is permitted as well), then λ can be written as λ₁λ̃λ₂ where λ₁, λ₂ are subsequences of λ. We have that M, λ |= γ if, and only if, M, λ₁λ̃ⁿλ₂ |= γ for all n ∈ {1, 2, ...}.

The next lemma justifies the use of a CTL model checker for RTL formulae. We extend the cover-graph construction in the following way: nodes not including ω are treated as before. For every node with an ω in one of the resource quantities, the construction changes for those transitions that consume from the ω-quantified resource type. Instead of using the rule "ω − k = ω", we expand the nodes for as long as is needed to ensure any other loop's resource requirements can be met. This is important for the case where a loop consumes more resources of some type than it produces, i.e., represents a potential infinite deficit of that resource type, but does produce a surplus of another that might be required for some other transition or loop to be executed. The construction thus leads to a finite unwinding of loops that can only occur a finite number of times due to the unavailability of infinite resources. By unravelling loops to a limit according to the maximum resource requirement of all loops, we ensure that we do not inhibit the execution of any transitions that would lead to a state in which a proposition becomes true, if and only if this would in fact be possible after creating sufficient resources in the original resource-bounded model. This is important to ascertain that there exists an infinite path whenever there is a satisfying (infinite) path in the resource-bounded model. We denote this extended cover model for M with initial state q and resources ρ by C̃M(M, ρ, q).

Lemma 2. Let ρ ∈ R±, let M be an RBM, let q be a state in M, let G := CM(M, ρ, q), and let G̃ := C̃M(M, ρ, q). Then, the following properties hold:
(a) For each ρ-feasible q-path λ = qq₁q₂... in M there is a (q, ρ)-path λ′ in G such that λ = λ′|_Q.
(b) Let γ be an RTL path formula without path quantifiers. If there is a (q, (ρ(r_i))_i)-path λ in G̃ satisfying γ, then there also is a (q, (ρ(r_i))_i)-path λ′ in G̃ satisfying γ such that λ′|_Q is ρ-feasible in M and satisfies γ in M.

Theorem 2 (RTL invariant under cover models). Let M be an RBM and let q be a state of M. Then, for any RTL formula ⟨ρ⟩γ such that γ does not contain any more path quantifiers it holds that M, q |=_RTL ⟨ρ⟩γ if, and only if, CM(M, ρ, q), (q, ρ) |=_CTL Eγ.

Proof. Let G = (V, E, Props, π) := CM(M, ρ, q). "⇒": Let λ be ρ-feasible such that M, λ |= γ. By Lemma 2(a) there is a path λ′ with λ′|_Q = λ in G. Clearly, we have that G, λ′ |= γ and hence G, (q, ρ) |= Eγ. "⇐": Let G, (q, ρ) |= Eγ, i.e., G, λ |= γ for some (q, ρ)-path λ. Then, by Lemma 2(b) there is a path λ′ in G such that M, λ′|_Q |= γ and λ′|_Q is ρ-feasible; thus, M, q |= ⟨ρ⟩γ. ∎

The case for RTL* is more sophisticated, as the language is able to characterise more complex temporal patterns. As a consequence, Lemma 1 does not hold for RTL*. A counterexample is immediate from Example 3: γ is satisfied on q₀q₁q₀q₂q₂... but not on q₀q₁q₀q₁q₀q₂q₂.... Due to this, we consider subclasses of RBMs in the next section.

4.2 RTL* and Bounded Models

In the following, we discuss the effects of various properties of RBMs with respect to RTL*. For a given resource quantity it is possible to transform a structurally k-bounded RBM into a production and ∞-free RBM such that satisfaction of specific path formulae is preserved.

Proposition 7. Let ρ ∈ R±_{≠∞}, let M be a structurally k-bounded RBM for ρ, and let q be a state in M. Then, we can construct a finite, production free and ∞-free RBM M′ such that for every RTL* path formula γ containing no more path quantifiers the following holds: M, q |= ⟨ρ⟩γ if, and only if, M′, q′ |= ⟨∅⟩γ.

Proof (Sketch). Firstly, we remove ∞-transitions from M (i.e. all transitions labelled (c, p) with c(r) = ∞ for some resource r) as they can never be traversed. Then, we essentially take M′ as the reachability graph of M.
This graph is built similarly to the cover graph, but no ω-nodes are introduced. Because there are only finitely many distinct state/resource combinations in M (Prop. 3), the model is finite and obviously also production free and ∞-free. Let M, q |= ⟨ρ⟩γ and let λ be a ρ-feasible path satisfying γ. Then, the path obtained from λ by coupling each state with its available resources is a path in M′ satisfying γ. Conversely, let λ be a path in M′ satisfying γ. Then, λ|_Q is a γ-satisfying ρ-feasible path in M due to the construction of M′. ∎

The following corollary is needed for the model-checking results in Section 5.

Corollary 1. Let ρ ∈ R±_{≠∞}, let M be a structurally k-bounded RBM for ρ, and let q be a state in M. Then, we can construct a finite Kripke model M′ such that for every RTL* path formula γ containing no more path quantifiers the following holds: M, q |= ⟨ρ⟩γ if, and only if, M′, q′ |= Eγ.

Lemma 3 states that loops that do not consume resources can be reduced to a fixed number of recurrences. For a path λ, we use λ[n] to denote the path which is equal to λ, but in which each subsequence of states q₁q₂...q_k q occurring in λ with q₁ = q₂ = ⋯ = q_k ≠ q and k > n, where the transition q₁ → q₁ does not consume any resource, is replaced by q₁q₂...q_n q. That is, the states q_{n+1}q_{n+2}...q_k are left out. Note that λ[n] is also well-defined for pure Kripke models.

Lemma 3. (a) Let M be a Kripke model and γ be a CTL* path formula with only the initial path quantifier (and no further path quantifiers) and length |γ| = n. For every path λ in Λ_M we have that M, λ |= γ if, and only if, M, λ[n] |= γ.
(b) Let M be a production and zero free RBM and γ be an RTL* path formula with only the initial path quantifier (and no further path quantifiers) and length |γ| = n. Then, for each path λ in Λ_M the following holds true: M, λ |= γ if, and only if, M, λ[n] |= γ.

Proof (Sketch). (a) We begin the proof with a claim which is easily proved by structural induction on γ.

Claim: Suppose γ does not include any ◯-modality and let λ be a path in Λ_M. Now, let λ′ be obtained from λ by replacing a single state, say q, in λ by a finite block (or sequence) of repetitions of the state q, and repeating this for any (finite) number of states. Then, M, λ |= γ iff M, λ′ |= γ.

The claim states that a ◯-free path formula cannot distinguish between paths of the following kind: q₁q₂q₃... vs. q₁q₂q₂q₂q₃....

We are ready to prove the lemma. Without loss of generality, assume that λ ≠ λ[n]. Note that the only difference between both paths is that λ contains at least one sequence of state repetitions, say of q, with length greater than n. We proceed by induction on the number of such (maximum-length) sequences of length greater than n. Without loss of generality, assume γ to be an atomic formula, and assume there is only one such sequence, given by λ[l, l + k − 1], with k > n and l ≥ 0. We proceed by a second induction, this time on the number of ◯-modalities in γ. Assume there is just one modality; then γ = γ₁(◯γ₂), i.e. γ contains the single subformula ◯γ₂. Let M, λ |= γ and let I ⊆ N be the smallest set of indices at which ◯γ₂ has to be true in order to satisfy γ. We say I is the witness of ◯γ₂ wrt γ₁ and λ. Moreover, we require that each eventuality subformula (i.e. a formula starting with U) becomes satisfied as soon as possible. For instance, if γ = □◯p then I = {1, 2, 3, ...}, and for γ = ♦◯p we have that I = {min{i ∈ N | M, λ[i, ∞] |= ◯p}}, provided that γ is true on λ.
We define the following set J from I:

J := {i ∈ I | i < l + n − 1} ∪ {l + n − 2 | ∃i ∈ I (l + n − 1 ≤ i < l + k − 1)} ∪ {i − (k − n) | i ∈ I, i ≥ l + k − 1}

Now, it is easy to see from the claim stated above that J is the witness of ◯γ₂ wrt λ[n] and γ₁, which shows that M, λ[n] |= γ. Assume we have proved the claim for all formulae that contain at most m ◯-modalities. Consider γ = γ₁(◯γ₂) where γ₂ contains m ◯-modalities. The proof is done analogously, by constructing an appropriate witness set. It is important to note that the total number of modalities has to be less than n = |γ|. We proceed with proving the induction step for the outer induction: assume there are m occurrences of state sequences which are contracted to sequences of length n = |γ|. Again, this is proved by following the same mechanism used above.

(b) This part follows directly from (a), because the repetitions of states do not consume or produce any resources; thus, the feasibility of the modified path does not change. ∎

Note that we might want to allow loops to be re-entered n times for cases in which the formula has the form ◯◯...◯♦ϕ.

5 Model Checking Resource-Bounded Tree Logic

We are mainly interested in the verification of systems. Model checking refers to the problem whether a formula ϕ is true in an RBM M and a state q in M. For CTL* this problem is PSPACE-complete, and for CTL (the fragment of CTL* in which every temporal operator is directly preceded by a path quantifier) it is P-complete [5]. So, we cannot hope for our problem to be computationally any better than PSPACE in the general setting; actually, it is still open whether it is decidable at all.

Theorem 3 (Model Checking RTL: Decidability). The model-checking problem for RTL over RBMs is decidable and P-hard.

Proof. Let M be an RBM and ϕ an RTL formula. We would like to check whether M, q₀ |= ϕ. Let ⟨ρ⟩γ be a subformula of ϕ such that γ contains no more path quantifiers. Then, we construct CM(M, ρ, q) and label each state q in M for which CM(M, ρ, q), (q, ρ) |= Eγ with a fresh proposition symbol p. All occurrences of the subformula ⟨ρ⟩γ in ϕ are replaced with p. Applying this procedure repeatedly to ϕ and M results in a Boolean formula ϕ′ over the new propositional symbols and a model M′ labelled with these new symbols. Then we have that ϕ′ is true in M′, q₀ iff M, q₀ |= ϕ. P-hardness follows from Proposition 6. ∎
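The labelling loop of this proof can be phrased as in the following sketch. The tuple encoding of formulas and the check_cover callback, which stands for the CTL check "CM(M, ρ, q), (q, ρ) |= Eγ" on the cover model, are our own assumptions rather than anything prescribed by the paper:

```python
import itertools

# State formulas as nested tuples: ("atom", p), ("not", f), ("and", f, g),
# and ("E", rho, gamma) for <rho>gamma, where rho is a dict and gamma is a
# path formula built from the same constructors plus e.g. ("X", f), ("U", f, g).

def has_quantifier(f):
    return f[0] == "E" or any(has_quantifier(s) for s in f[1:]
                              if isinstance(s, tuple))

def innermost(f):
    """Return some <rho>gamma subformula whose gamma is quantifier-free."""
    if f[0] == "E" and not has_quantifier(f[2]):
        return f
    for s in f[1:]:
        if isinstance(s, tuple):
            r = innermost(s)
            if r is not None:
                return r
    return None

def substitute(f, old, new):
    if f == old:
        return new
    return tuple(substitute(s, old, new) if isinstance(s, tuple) else s
                 for s in f)

def model_check(states, labels, phi, check_cover):
    """Theorem 3's loop: decide an innermost <rho>gamma per state via the
    cover-model CTL check, label satisfying states with a fresh proposition,
    substitute it into phi, and repeat until phi is purely Boolean."""
    fresh = itertools.count()
    while (sub := innermost(phi)) is not None:
        p = f"p{next(fresh)}"
        _, rho, gamma = sub
        for q in states:
            if check_cover(q, rho, gamma):
                labels[q].add(p)
        phi = substitute(phi, sub, ("atom", p))
    return phi  # a Boolean formula over the (fresh) atoms
```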
In the following, we consider the decidability of fragments of the full logic over special classes of RBMs (which, of course, implies decidability of the restricted version over the same class of models).

Proposition 8 (Decidability: production, zero free). The model-checking problem for RTL*_{R±_{≠∞}} over production and zero free RBMs is decidable.

Proof (Sketch). According to Prop. 3 and Lemma 3 there are only finitely many ρ-feasible paths of interest for ρ ∈ R±_{≠∞}. This set can be computed step by step. Then, for M, q |= ⟨ρ⟩γ where γ is a path formula, one has to check whether γ holds on one of these finitely many ρ-feasible paths starting in q. The model checking algorithm proceeds bottom-up. ∎

From Corollary 1 we know that we can use a CTL* model checker over k-bounded models.

Proposition 9 (Decidability: k-bounded). The model-checking problem for RTL*_{R±_{≠∞}} over k-bounded RBMs is decidable and PSPACE-hard.

By Prop. 5 and the observation that resources with an infinite quantity can be neglected in a production and ∞-free RBM, we can show the following theorem.

Theorem 4 (Decidability: production, ∞-free). The model-checking problem for RTL* over production and ∞-free RBMs is decidable and PSPACE-hard.

6 Conclusions, Related and Future Work

This paper introduced resources into CTL* [4], which is arguably one of the most important logics for computer science. The paper showed decidability results in the presence of some limiting constraints on the resource allocation for transitions in the Kripke models. While most agent models do not come with an explicit notion of resources, there is some recent work that takes resources into account. [9] considers resources in conjunction with reasoning about an agent's goal-plan tree. Time, memory, and communication bounds are studied as resources in [2]. In [1] the abilities of agents under bounded memory are considered: instead of asking for an arbitrary winning strategy, a winning strategy in their setting has to obey given memory limitations.

A detailed analysis of the model checking complexity and the decidability question for the general case is left for future research. We are particularly interested in finding constraints that would make the extended logic's model-checking problem efficiently decidable for a relevant class of MAS. Moreover, we would like to extend the resource-bounded setting to multiple agents (influenced by ATL [3], a logic for reasoning about strategic abilities of agents), so that abilities of coalitions in multi-agent systems can be expressed and analysed. Another direction is offered by linear logic. Although Girard's linear logic [7] is not directly suitable for model checking, we will be looking into possible combinations of linear logic fragments with our approach. One idea is to formalise resources and their production/consumption by means of linear logic formulae, and we hope to come up with an axiomatisation for our logic.

References

1. Thomas Ågotnes and Dirk Walther. A logic of strategic ability under bounded memory. J. of Logic, Lang. and Inf., 18(1):55-77, 2009.
2. Natasha Alechina, Brian Logan, Nguyen Hoang Nga, and Abdur Rakib. Verifying time, memory and communication bounds in systems of reasoning agents. In AAMAS '08: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems, pages 736-743. International Foundation for Autonomous Agents and Multiagent Systems, 2008.
3. Rajeev Alur, Thomas A. Henzinger, and Orna Kupferman. Alternating-time temporal logic. Journal of the ACM, 49:672-713, 2002.
4. E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. In Proceedings of Logics of Programs Workshop, volume 131 of Lecture Notes in Computer Science, pages 52-71, 1981.
5. E.M. Clarke, E.A. Emerson, and A.P. Sistla. Automatic verification of finite-state concurrent systems using temporal logic specifications. ACM Transactions on Programming Languages and Systems, 8(2):244-263, 1986.
6. E.A. Emerson and J.Y. Halpern. "Sometimes" and "not never" revisited: on branching versus linear time temporal logic. In Proceedings of the Annual ACM Symposium on Principles of Programming Languages, pages 151-178, 1982.
7. J.-Y. Girard. Linear logic. Theoretical Computer Science, 50:1-102, 1987.
8. R. M. Karp and R. E. Miller. Parallel program schemata. Journal of Computer and System Sciences, 3(2):147-195, 1969.
9. P. Shaw, B. Farwer, and R. Bordini. Theoretical and experimental results on the goal-plan tree problem (short paper). In Proceedings of AAMAS'08, pages 1379-1382, 2008.
Reasoning about Multi-Agent Domains using Action Language C: A Preliminary Study

Chitta Baral¹, Tran Cao Son², and Enrico Pontelli²
¹ Dept. Computer Science & Engineering, Arizona State University, [email protected]
² Dept. Computer Science, New Mexico State University, tson|[email protected]

Abstract. This paper investigates the use of action languages, originally developed for representing and reasoning about single-agent domains, in modeling multi-agent domains. We use the action language C and show that minimal extensions are sufficient to capture several multi-agent domains from the literature. The paper also exposes some limitations of action languages in modeling a specific set of features in multi-agent domains.

1 Introduction and Motivation

Representing and reasoning in multi-agent domains are two of the most active research areas in multi-agent system (MAS) research. The literature in this area is extensive, and it provides a plethora of logics for representing and reasoning about various aspects of MAS domains. For example, the authors of [24] combine an action logic and a cooperation logic to represent and reason about the capabilities and the forms of cooperation between agents. The work in [16] generalizes this framework to consider domains where an agent may control only parts of propositions, and to reason about strategies of agents. In [31], an extension of Alternating-time Temporal Logic is developed to facilitate strategic reasoning in multi-agent domains. The work in [30] suggests that decentralized partially observable Markov decision processes could be used to represent multi-agent domains, and discusses the usefulness of agent communication in multi-agent planning. In [18], an extension of Alternating-time Temporal Epistemic Logic is proposed for reasoning about choices. Several other works (e.g., [12, 32]) discuss the problem of reasoning about knowledge in MAS.

Even though a large number of logics have been proposed in the literature for formalizing MAS, several of them have been designed to specifically focus on particular aspects of the problem of modeling MAS, often justified by a specific application scenario. This makes them suitable to address specific subsets of the general features required to model real-world MAS domains. Several of these logics are quite complex, and require modelers that are transitioning from work on single agents to adopt a very different modeling perspective. The task of generalizing some of these existing proposals to create a uniform and comprehensive framework for modeling different aspects of MAS domains is, to the best of our knowledge, still an open problem. Although we do not dispute the possibility of extending these existing proposals in various directions, the task does not seem easy. On the other hand, the need for a general language for MAS domains, with a formal and simple semantics that allows the verification of plan correctness, has been extensively motivated (e.g., [8]).

The state of affairs in formalizing multi-agent systems reflects the same trend that occurred in the early nineties regarding the formalization of single agent domains. Since the discovery of the frame problem [22], several formalisms for representing and reasoning about dynamic domains have been proposed. Often, the new formalisms responded to the need to address shortcomings of the previously proposed formalisms within specific sample domains.
For example, the well-known Yale Shooting problem [17] was invented to show that the earlier solutions to the frame problem were not satisfactory. A simple solution to the Yale Shooting problem, proposed by [2], was then shown not to work well with the Stolen Car example [20], etc. Action languages [15] have been one of the outcomes of this development, and they have proved very useful ever since. Action description languages, first introduced in [14] and further refined in [15], are formal models used to describe dynamic domains by focusing on the representation of the effects of actions. Traditional action languages (e.g., A, B, C) have mostly focused on domains involving a single agent. Despite the several differences between these action languages (e.g., support for concurrent actions, sensing actions, nondeterministic behavior), there is general consensus on what the essential components of an action description language for single-agent domains are. In particular, an action specification focuses on the direct effects of each action on the state of the world; the semantics of the language takes care of all the other aspects concerning the evolution of the world (e.g., the ramification problem).

The analogy between the development of several formalisms for single-agent domains and the development of several logics for formalizing multi-agent systems indicates the need for, and the usefulness of, a formalism capable of dealing with multiple desired features in multi-agent systems. A natural question that arises is whether single-agent action languages can be adapted to describe MAS; this is the main question that we explore in this paper. We attempt to answer it by investigating whether an action language developed for single-agent domains can be used, with minimal modifications, to model interesting MAS domains. Our starting point is a well-studied and well-understood single-agent action language: the language C [15]. We chose this language because it already provides a number of features that are necessary to handle multi-agent domains, such as concurrent interacting actions. The language is used to formalize a number of examples drawn from the multi-agent literature, describing different types of problems that can arise when dealing with multiple agents. Whenever necessary, we identify weaknesses of C and introduce simple extensions that are adequate to model these domains. The resulting action language provides a unifying framework for modeling several features of multi-agent domains. The language can be used as a foundation for different forms of reasoning in multi-agent domains (e.g., projection, validation of plans), which are formalized in the form of a query language. We expect that further development of this language will be needed to capture additional aspects, such as agents' knowledge about other agents' knowledge; these will be discussed in future work.

We would like to note that, in the past, there have been other attempts to use action description languages to formalize multi-agent domains, e.g., [6]. However, the existing proposals address only some of the properties of multi-agent scenarios that we deem relevant (e.g., they focus only on concurrency). Before we continue, let us discuss the desired features and the assumptions that we place on the target multi-agent systems.
In this paper, we consider MAS domains as environments in which multiple agents can execute actions to modify the overall state of the world. We assume that:
• Agents can execute actions concurrently;
• Each agent knows its own capabilities, but it may be unaware of the global effects of its actions;
• Actions executed by different agents can interact;
• Agents can communicate to exchange knowledge; and
• Knowledge can be private to an agent or shared among groups of agents.

The questions that we are interested in answering in a MAS domain involve:
• hypothetical reasoning, e.g., what happens if agent A executes the action a; what happens if agent A executes a1 while B executes b1 at the same time; etc.
• planning/capability, e.g., can a specified group of agents achieve a certain goal from a given state of the world.

Variations of the above types of questions will also be considered, for example: what happens if the agents do not have complete information, if the agents do not cooperate, if the agents have preferences, etc. To the best of our knowledge, this is the first investigation of how to adapt a single-agent action language to meet the needs of MAS domains. It is also important to stress that the goal of this work is to create a framework for modeling MAS domains, with a query language that enables plan validation and various forms of reasoning. In this work, we do not deal with the issues of distributed plan generation, an aspect extensively explored in the literature. This is certainly an important research topic and worth pursuing, but it is outside the scope of this paper. We consider the work presented in this paper a necessary precondition to the exploration of distributed MAS solutions.

The paper is organized as follows. Section 2 reviews the basics of the action language C. Section 3 describes a straightforward adaptation of C for MAS. The following sections (Sects. 4–5) show how minor additions to C can address several necessary features in the representation of, and reasoning about, MAS domains. Sect. 6 presents a query language that can be used with the extended C. Sect. 7 discusses further aspects of MAS that the proposed extension of C cannot easily deal with. Sect. 8 presents the discussion and some conclusions.

2 Action Language C

The starting point of our investigation is the action language C [15], an action description language originally developed to describe single-agent domains, where the agent is capable of performing non-deterministic and concurrent actions. Let us review a slight adaptation of the language C.

A domain description in C builds on a language signature ⟨F, A⟩, where F ∩ A = ∅ and F (resp. A) is a finite collection of fluent (resp. action) names. Both the elements of F and A are viewed as propositional variables, and they can be used in formulae constructed using the traditional propositional operators. A propositional formula over F ∪ A is referred to simply as a formula, while a propositional formula over F is referred to as a state formula. A fluent literal is of the form f or ¬f for any f ∈ F. A domain description D in C is a finite collection of axioms of the following forms:

caused ℓ if F                (static causal law)
caused ℓ if F after G        (dynamic causal law)

where ℓ is a fluent literal, F is a state formula, and G is a formula. The language also provides the ability to declare properties of fluents; in particular, non inertial ℓ declares that the fluent literal ℓ is to be treated as a non-inertial literal, i.e., the frame axiom is not applicable to ℓ.
A problem specification is obtained by adding an initial state description I to a domain D, composed of axioms of the form initially ℓ, where ℓ is a fluent literal.

The semantics of the language can be summarized using the following concepts. An interpretation I is a set of fluent literals such that {f, ¬f} ⊈ I for every f ∈ F. Given an interpretation I and a fluent literal ℓ, we say that I satisfies ℓ, denoted by I |= ℓ, if ℓ ∈ I. The entailment relation |= is extended to define I |= F, where F is a state formula, in the usual way. An interpretation I is complete if, for each f ∈ F, we have that f ∈ I or ¬f ∈ I. An interpretation I is closed w.r.t. a set of static causal laws SC if, for each static causal law caused ℓ if F, if I |= F then ℓ ∈ I. Given an interpretation I and a set of static causal laws SC, we denote with Cl_SC(I) the smallest set of literals that contains I and is closed w.r.t. SC. Given a domain description D, a state s in D is a complete interpretation which is closed w.r.t. the set of static causal laws in D. The notions of interpretation and entailment over the language of F ∪ A are defined in a similar way. Given a state s, a set of actions A ⊆ A, and a collection of dynamic causal laws DC, we define

Eff_DC(s, A) = { ℓ | (caused ℓ if F after G) ∈ DC, s ∪· A |= G, s |= F }

where s ∪· A stands for s ∪ A ∪ {¬a | a ∈ A \ A}. Let D = ⟨SC, DC, IN⟩ be a domain, where SC are the static causal laws, DC the dynamic causal laws, and IN the non-inertial axioms. The semantics of D is given by a transition system (State_D, E_D), where State_D is the set of all states and the transitions in E_D are of the form ⟨s, A, s′⟩, where s, s′ are states, A ⊆ A, and s′ satisfies the property

s′ = Cl_SC(Eff_DC(s, A) ∪ ((s \ IFL) ∩ s′) ∪ (IN ∩ s′))

where IFL = {f, ¬f | f ∈ IN or ¬f ∈ IN}.

The original C language supports a query language (called P in [15]). This language allows queries of the form necessarily F after A1, . . . , Ak, where F is a state formula and A1, . . . , Ak is a sequence of sets of actions (called a plan). Intuitively, the query asks whether each state s reached after executing A1, . . . , Ak from the initial state has the property s |= F. Formally, an initial state s0 w.r.t. an initial state description I and a domain D is an element of State_D such that {ℓ | initially ℓ ∈ I} ⊆ s0. The transition function Φ_D : 2^A × State_D → 2^State_D is defined as Φ_D(A, s) = {s′ | ⟨s, A, s′⟩ ∈ E_D}, where (State_D, E_D) is the transition system describing the semantics of D. This function can be extended to define Φ*_D, which considers plans, where Φ*_D([ ], s) = {s} and

Φ*_D([A1, . . . , An], s) =
  ∅,  if Φ*_D([A1, . . . , An−1], s) = ∅ or there is some s′ ∈ Φ*_D([A1, . . . , An−1], s) with Φ_D(An, s′) = ∅;
  ∪_{s′ ∈ Φ*_D([A1,...,An−1], s)} Φ_D(An, s′),  otherwise.

Let us consider an action domain D and an initial state description I. A query necessarily F after A1, . . . , Ak is entailed by (D, I), denoted by (D, I) |= necessarily F after A1, . . . , Ak, if for every initial state s0 w.r.t. I we have that Φ*_D([A1, . . . , Ak], s0) ≠ ∅, and for each s ∈ Φ*_D([A1, . . . , Ak], s0) we have that s |= F.
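To make these definitions concrete, the following is a minimal Python sketch of this semantics (our own illustration, not part of the original paper; all names are ours). It restricts the formulae F and G to conjunctions of literals, encodes a literal as a string with a leading "-" for negation, and enumerates candidate successor states by brute force, which is exponential and intended only for clarity.

from itertools import product

def neg(l):
    return l[1:] if l.startswith("-") else "-" + l

def closure(lits, static_laws):
    # Cl_SC(I): smallest superset of lits closed under the static causal
    # laws; each law is (l, F) for 'caused l if F', F a set of literals.
    out = set(lits)
    changed = True
    while changed:
        changed = False
        for l, F in static_laws:
            if F <= out and l not in out:
                out.add(l)
                changed = True
    return out

def effects(s, A, dynamic_laws):
    # Eff_DC(s, A): each law is (l, F, G) for 'caused l if F after G',
    # with G the set of actions that must all occur in A.
    return {l for (l, F, G) in dynamic_laws if G <= A and F <= s}

def transitions(s, A, fluents, static_laws, dynamic_laws, IN=frozenset()):
    # Phi_D(A, s): all complete interpretations s2 satisfying the fixpoint
    # condition s2 = Cl_SC(Eff(s, A) + ((s \ IFL) & s2) + (IN & s2)).
    IFL = set(IN) | {neg(l) for l in IN}
    succs = []
    for bits in product(*[(f, neg(f)) for f in fluents]):
        s2 = frozenset(bits)
        base = effects(s, A, dynamic_laws) | ((set(s) - IFL) & s2) | (set(IN) & s2)
        if s2 == frozenset(closure(base, static_laws)):
            succs.append(s2)
    return succs

def phi_star(plan, s, *dom):
    # Phi*_D: failure (an empty result) propagates, as in the definition.
    states = [frozenset(s)]
    for A in plan:
        nxt = []
        for st in states:
            succ = transitions(st, A, *dom)
            if not succ:
                return []
            nxt += succ
        states = nxt
    return states

def necessarily(F, plan, init_states, *dom):
    # (D, I) |= necessarily F after plan, for F a conjunction of literals.
    for s0 in init_states:
        finals = phi_star(plan, s0, *dom)
        if not finals or any(not (F <= st) for st in finals):
            return False
    return True

A practical implementation of C would of course not enumerate interpretations but would rely, for instance, on translations to answer set programming [27]; the sketch only mirrors the definitions above.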
3 C for Multi-agent Domains

In this section, we explore how far one of the most popular action languages developed for single-agent domains, C, can be used and adapted for multi-agent domains. We will discuss a number of incremental, small modifications of C necessary to enable the modeling of MAS domains. We expect that similar modifications can be applied to other single-agent action languages with similar basic characteristics. We will describe each domain from the perspective of someone (the modeler) who has knowledge of everything, including the capabilities and knowledge of each agent. Note that this is only a modeling perspective; it does not mean that we expect individual agents to have knowledge of everything, we only expect the modeler to have such knowledge.

We associate with each agent an element of a set of agent identifiers, AG. We will describe a MAS domain over a set of signatures ⟨Fi, Ai⟩, one for each i ∈ AG, with the assumption that Ai ∩ Aj = ∅ for i ≠ j. Observe that ∩_{i∈S} Fi may be non-empty for some S ⊆ AG; this represents the fact that the fluents in ∩_{i∈S} Fi are relevant to all the agents in S. The result is a C domain over the signature ⟨∪_{i∈AG} Fi, ∪_{i∈AG} Ai⟩. We will require the following condition to be met: if caused ℓ if F after G is a dynamic law and a ∈ Ai appears in G, then the literal ℓ belongs to Fi. This condition summarizes the fact that agents are aware of the direct effects of their actions. Observe that, on the other hand, an agent might not know all the consequences of its own actions. For example, a deaf agent bumping into a wall might not be aware of the fact that its action causes noise observable by other agents. These global effects are captured by the modeler, through the use of static causal laws.

The next two sub-sections illustrate applications of the language in modeling cooperative multi-agent systems. In particular, we demonstrate how the language is already sufficiently expressive to model simple forms of cooperation between agents, even though these application scenarios were not part of the original design of C.

3.1 The Prison Domain

This domain was originally presented in [24]. In this example, we have two prison guards, 1 and 2, who control two gates, the inner gate and the outer gate, by operating four buttons a1, b1, a2, and b2. Agent 1 controls a1 and b1, while agent 2 controls a2 and b2. If either a1 or a2 is pressed, the state of the inner gate is toggled. The outer gate, on the other hand, toggles only if both b1 and b2 are pressed. The problem was introduced to motivate the design of a logic for reasoning about the ability of agents to cooperate. Observe that neither of the agents can individually change the state of the outer gate, while individual agents' actions can affect the state of the inner gate.

In C, this domain can be represented as follows. The set of agents is AG = {1, 2}. For agent 1, we have F1 = {in_open, out_open, pressed(a1), pressed(b1)}. Here, in_open and out_open represent the fact that the inner gate and the outer gate are open, respectively. pressed(X) says that button X is pressed, where X ∈ {a1, b1}. We have A1 = {push(a1), push(b1)}, indicating that guard 1 can push buttons a1 and b1. Similarly, for agent 2, we have

F2 = {in_open, out_open, pressed(a2), pressed(b2)}
A2 = {push(a2), push(b2)}

We assume that the buttons do not stay pressed; thus pressed(X), for X ∈ {a1, b1, a2, b2}, is a non-inertial fluent with the default value false.
The domain specification (Dprison) contains:

non inertial ¬pressed(X)
caused pressed(X) after push(X)
caused in_open if pressed(a1), ¬in_open
caused in_open if pressed(a2), ¬in_open
caused ¬in_open if pressed(a1), in_open
caused ¬in_open if pressed(a2), in_open
caused out_open if pressed(b1), pressed(b2), ¬out_open
caused ¬out_open if pressed(b1), pressed(b2), out_open

where X ∈ {a1, b1, a2, b2}. The first statement declares that pressed(X) is non-inertial and has false as its default value. The second statement describes the effect of the action push(X). The remaining laws are static causal laws describing relationships between properties of the environment.

The dynamic causal laws are "local" to each agent, i.e., they involve fluents that are local to that particular agent; in particular, one can observe that each agent can achieve certain effects (e.g., opening/closing the inner gate) regardless of what the other agent is doing (just as if it were operating as a single agent in the environment). On the other hand, if we focus on a single agent in the domain (e.g., agent 1), then that agent will possibly see exogenous events (e.g., the value of the fluent in_open being changed by the other agent). In contrast, the collective effects of actions performed by different agents are captured through "global" static causal laws. These are laws that the modeler introduces, and they do not "belong" to any specific agent.

Let us now consider the queries that were asked in [24] and see how they can be answered using the domain specification Dprison. In the first situation, both gates are closed, 1 presses a1 and b1, and 2 presses b2. The question is whether the gates are open after the execution of these actions. The initial situation is specified by the initial state description

I1 = { initially ¬in_open, initially ¬out_open }

In this situation, there is only one initial state s0 = {¬ℓ | ℓ ∈ F1 ∪ F2}. We can show that

(Dprison, I1) |= necessarily out_open ∧ in_open after {push(a1), push(b1), push(b2)}

If the outer gate is initially closed, i.e., I2 = { initially ¬out_open }, then the set of actions A = {push(b1), push(b2)} is both necessary and sufficient to open it:

(Dprison, I2) |= necessarily out_open after X
(Dprison, I2) |= necessarily ¬out_open after Y

where A ⊆ X and A \ Y ≠ ∅. Observe that the above entailments correspond to the environment logic entailment in [24].
3.2 The Credit Rating Domain

We next consider an example from [16]; in this example, we have a property of the world that cannot be changed by a single agent. The example was designed to motivate the use of a logic of propositional control to model situations where different agents have different levels of control over fluents. We have two agents, AG = {w, t}, denoting the website and the telephone operator, respectively. Both agents can set/reset the credit rating of a customer. The credit rating can only be set to ok (i.e., the fluent credit_ok set to true) if both agents agree. Whether the customer is a web customer (the is_web fluent) or not can be set only by the website agent w. The signatures of the two agents are as follows:

Fw = {is_web, credit_ok}    Aw = {set_web, reset_web, set_credit(w), reset_credit(w)}
Ft = {credit_ok}            At = {set_credit(t), reset_credit(t)}

The domain specification Dbank consists of:

caused is_web after set_web
caused ¬is_web after reset_web
caused ¬credit_ok after reset_credit(w)
caused ¬credit_ok after reset_credit(t)
caused credit_ok after set_credit(w) ∧ set_credit(t)

We can show that

(Dbank, I3) |= necessarily credit_ok after {set_credit(w), set_credit(t)}

where I3 = { initially ¬ℓ | ℓ ∈ Fw ∪ Ft }. This entailment also holds if I3 = ∅.
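This entailment can be replayed with the semantic sketch given at the end of Section 2 (again our own encoding, with underscored strings standing in for the paper's fluent and action names). The domain has only dynamic laws, so the static component is empty:

dynamic = [
    ("is_web",     set(), {"set_web"}),
    ("-is_web",    set(), {"reset_web"}),
    ("-credit_ok", set(), {"reset_credit_w"}),
    ("-credit_ok", set(), {"reset_credit_t"}),
    ("credit_ok",  set(), {"set_credit_w", "set_credit_t"}),  # joint action
]
fluents = ["is_web", "credit_ok"]
s0 = frozenset({"-is_web", "-credit_ok"})      # the single initial state for I3
plan = [{"set_credit_w", "set_credit_t"}]
print(necessarily({"credit_ok"}, plan, [s0], fluents, [], dynamic))   # True

Note how the last law fires only when both set_credit actions occur in the same action set, which is exactly what makes the joint control of credit_ok expressible without any extension of C.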
4 Adding Priority between Actions

The previous examples show that C is sufficiently expressive to model the basic aspects of agents executing cooperative actions within a MAS, focusing on the capabilities of the agents and on action interactions. This is not a big surprise, as discussed in [6]. We will now present a small extension of C that allows for the encoding of competitive behavior between agents, i.e., situations where the actions of some agents can defeat the effects of the actions of others. To make this possible, for each domain specification D we assume the presence of a function Pr_D : 2^A → 2^A. Intuitively, Pr_D(A) denotes the actions whose effects will be accounted for when A is executed. This function allows us, for example, to prioritize certain sets of actions. The new transition function Φ_{D,P} is defined as follows:

Φ_{D,P}(A, s) = Φ_D(Pr_D(A), s)

where Φ_D is defined as in the previous section. Observe that if there is no competition among agents in D, then Pr_D is simply the identity function.

4.1 The Rocket Domain

This domain was originally proposed in [31]. It was invented to motivate the development of a logic for reasoning about the strategies of agents. This aspect will not be addressed by our formalization of this example, as C lacks this capability. Nevertheless, the encoding is sufficient for determining the state of the world after the execution of actions by the agents. We have a rocket, a cargo, and the agents 1, 2, and 3. The rocket and the cargo are each located in either london or paris. The rocket can be moved by 1 and 2 between the two locations. The cargo can be loaded (unloaded) into the rocket by 1 and 3 (2 and 3). Agent 3 can refill the rocket if the tank is not full. There are some constraints that limit the effects of the actions:
• If 1 or 2 moves the rocket, the cargo cannot be loaded or unloaded;
• If two agents load/unload the cargo at the same time, the effect is the same as if it were loaded/unloaded by one agent;
• If one agent loads the cargo and another one unloads it at the same time, the effect is that the cargo is loaded.

We will use the fluents rocket(london) and rocket(paris) to denote the location of the rocket; likewise, cargo(london) and cargo(paris) denote the location of the cargo. in_rocket says that the cargo is inside the rocket, and tank_full states that the tank is full. The signatures for the agents can be defined as follows:

F1 = {in_rocket, rocket(london), rocket(paris), cargo(london), cargo(paris)}
A1 = {load(1), unload(1), move(1)}
F2 = {in_rocket, rocket(london), rocket(paris), cargo(london), cargo(paris)}
A2 = {unload(2), move(2)}
F3 = {in_rocket, rocket(london), rocket(paris), cargo(london), cargo(paris), tank_full}
A3 = {load(3), refill}

The constraints on the effects of actions induce priorities among the actions. The action load or unload will have no effect if move is executed. The effects of two load actions are the same as those of a single load action; likewise, two unload actions have the same result as one unload action. Finally, load has a higher priority than unload. To account for action priorities and the voting mechanism, we define Pr_Drocket:
• Pr_Drocket(X) = {move(a)} if ∃a. move(a) ∈ X;
• Pr_Drocket(X) = {load(a)} if move(x) ∉ X for every x ∈ {1, 2, 3} and load(a) ∈ X;
• Pr_Drocket(X) = {unload(a)} if move(x) ∉ X and load(x) ∉ X for every x ∈ {1, 2, 3} and unload(a) ∈ X;
• Pr_Drocket(X) = X otherwise.

It is easy to see that Pr_Drocket defines priorities among the actions: if the rocket is moving, then load/unload are ignored; load has higher priority than unload; etc. The domain specification consists of the following laws:

caused in_rocket after load(i)                                   (i ∈ {1, 3})
caused ¬in_rocket after unload(i)                                (i ∈ {1, 2})
caused tank_full if ¬tank_full after refill
caused ¬tank_full if tank_full after move(i)                     (i ∈ {1, 2})
caused rocket(london) if rocket(paris), tank_full after move(i)  (i ∈ {1, 2})
caused rocket(paris) if rocket(london), tank_full after move(i)  (i ∈ {1, 2})
caused cargo(paris) if rocket(paris), in_rocket
caused cargo(london) if rocket(london), in_rocket

Let I4 consist of the following facts: initially tank_full, initially cargo(london), initially rocket(paris), initially ¬in_rocket. We can show the following:

(Drocket, I4) |= necessarily cargo(paris) after {move(1)}, {load(3)}, {refill}, {move(1)}.

Observe that, without the priority function Pr_Drocket, for every state s, Φ_Drocket({load(1), unload(2)}, s) = ∅, i.e., the concurrent execution of the load and unload actions is unsuccessful.
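A minimal sketch of Pr_Drocket and of the prioritised transition function follows (our own encoding, not the paper's; action names are plain strings such as "move(1)"):

def pr_rocket(X):
    # Mirrors the case analysis above: moving pre-empts (un)loading, and
    # load wins over unload; otherwise Pr is the identity. Since duplicate
    # load (or unload) actions have identical effect laws, which
    # representative is kept is immaterial; we sort only for determinism.
    moves   = sorted(a for a in X if a.startswith("move"))
    loads   = sorted(a for a in X if a.startswith("load"))
    unloads = sorted(a for a in X if a.startswith("unload"))
    if moves:
        return {moves[0]}
    if loads:
        return {loads[0]}
    if unloads:
        return {unloads[0]}
    return set(X)

def phi_prioritised(A, s, phi):
    # Phi_{D,P}(A, s) = Phi_D(Pr_D(A), s); phi is any unprioritised
    # transition function, e.g. 'transitions' from the earlier sketch.
    return phi(pr_rocket(A), s)

For instance, pr_rocket({"load(1)", "unload(2)"}) returns {"load(1)"}, so the conflicting pair of effect laws is never triggered together and the transition no longer comes out empty.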
5 Adding Reward Strategies

The next example illustrates the need to handle numbers and optimization in order to represent reward mechanisms. The extension of C is simply the introduction of numerical fluents, i.e., fluents that, instead of being simply true or false, have a numerical value. For this purpose, we introduce a new variant of the necessity query:

necessarily max F for ϕ after A1, . . . , An

where F is a numerical expression involving only numerical fluents, ϕ is a state formula, and A1, . . . , An is a plan. Given a domain specification D and an initial state description I, we can define, for each numerical fluent expression F and plan α:

value(F, α) = max { s(F) | s ∈ Φ*(α, s0), s0 is an initial state w.r.t. I, D }

where s(F) denotes the value of the expression F in state s. This allows us to define the following notion of entailment of a query: (D, I) |= necessarily max F for ϕ after A1, . . . , An if:
◦ (D, I) |= necessarily ϕ after A1, . . . , An;
◦ for every other plan B1, . . . , Bm such that (D, I) |= necessarily ϕ after B1, . . . , Bm, we have that value(F, [A1, . . . , An]) ≥ value(F, [B1, . . . , Bm]).

The following example has been derived from [5], where it is used to illustrate the coordination among agents to obtain the highest possible payoff. There are three agents. Agent 0 is a normative system that can play one of two strategies, either st0 or ¬st0. Agent 1 plays a strategy st1, while agent 2 plays the strategy st2. The reward system is described by the following tables (the first for st0, the second for ¬st0):

st0:          st1       ¬st1
st2          1, 1       0, 0
¬st2         0, 0       −1, −1

¬st0:         st1       ¬st1
st2          1, 1       0, 0
¬st2         0, 0       1, 1

The signatures used by the agents are:

F0 = {st0, reward}     A0 = {play_0, play_not_0}
F1 = {st1, reward1}    A1 = {play_1, play_not_1}
F2 = {st2, reward2}    A2 = {play_2, play_not_2}

The domain specification Dgame consists of:

caused st0 after play_0
caused ¬st0 after play_not_0
caused st1 after play_1
caused ¬st1 after play_not_1
caused st2 after play_2
caused ¬st2 after play_not_2
caused reward1 = 1 if ¬st0 ∧ st1 ∧ st2
caused reward2 = 1 if ¬st0 ∧ st1 ∧ st2
caused reward1 = 0 if ¬st0 ∧ st1 ∧ ¬st2
caused reward2 = 0 if ¬st0 ∧ st1 ∧ ¬st2
. . .
caused reward = a + b if reward1 = a ∧ reward2 = b

Assuming that I = { initially st0 }, we can show that

(Dgame, I) |= necessarily max reward after {play_1, play_2}.

6 Reasoning and Properties

In this section we discuss various types of reasoning that are directly enabled by the semantics of C and that can be useful in reasoning about MAS. Recall that we assume that the action theories are developed from the perspective of a modeler who has a view of the complete MAS.

6.1 Capability Queries

Let us explore another range of queries, aimed at capturing the capabilities of agents. We will use the generic form can X do ϕ, where ϕ is a state formula and X ⊆ AG, where AG is the set of agent identifiers of the domain. The intuition is to validate whether the group of agents X can guarantee that ϕ is achieved. If X = AG, then the semantics of the capability query is simply expressed as: (D, I) |= can X do ϕ iff ∃k. ∃A1, . . . , Ak such that (D, I) |= necessarily ϕ after A1, . . . , Ak. If X ≠ AG, then we can envision different variants of this query.

Capability query with non-interference and complete knowledge: Intuitively, the goal is to verify whether the agents in X can achieve ϕ when operating in an environment that includes all the agents, but where the agents in AG \ X simply provide their knowledge and do not perform actions or interfere. We will denote this type of query by can_nk X do ϕ (n: no interference, k: availability of all knowledge). The semantics of this type of query can be formalized as follows: (D, I) |= can_nk X do ϕ if there is a sequence of sets of actions A1, . . . , Am with the following properties:
◦ for each 1 ≤ i ≤ m, we have that Ai ⊆ ∪_{j∈X} Aj (we perform only actions of agents in X);
◦ (D, I) |= necessarily ϕ after A1, . . . , Am.

Capability query with non-interference and projected knowledge: Intuitively, the query with projected knowledge assumes that the other agents (AG \ X) are not only passive, but also unwilling to provide knowledge to the active agents. We will denote this type of query by can_n¬k X do ϕ. Let us refer to the projection of I w.r.t. X (denoted by proj(I, X)) as the set of all the initially declarations that build on fluents of ∪_{j∈X} Fj. The semantics of can_n¬k queries can be formalized as follows: (D, I) |= can_n¬k X do ϕ if there is a sequence of sets of actions A1, . . . , Am such that:
• for each 1 ≤ i ≤ m, we have that Ai ⊆ ∪_{j∈X} Aj;
• (D, proj(I, X)) |= necessarily ϕ after A1, . . . , Am (i.e., the objective will be reached irrespective of the initial configuration of the other agents).

Capability query with interference: The final version of the capability query takes into account possible interference from the other agents in the system.
Intuitively, the query with interference, denoted by can_i X do ϕ, implies that the agents in X will be able to accomplish ϕ in spite of the actions performed by the other agents. The semantics is as follows: (D, I) |= can_i X do ϕ if there is a sequence of sets of actions A1, . . . , Am such that:
• for each 1 ≤ i ≤ m, we have that Ai ⊆ ∪_{j∈X} Aj;
• for each sequence of sets of actions B1, . . . , Bm, where ∪_{j=1}^{m} Bj ⊆ ∪_{j∉X} Aj, we have that (D, I) |= necessarily ϕ after (A1 ∪ B1), . . . , (Am ∪ Bm).

6.2 Inferring Properties of the Theory

The forms of queries explored above allow us to investigate some basic properties of a multi-agent action domain.

Agent Redundancy: agent redundancy is a property of (D, I) which indicates that an agent can be removed while a goal remains accomplishable. Formally, agent i is redundant w.r.t. a state formula ϕ and an initial state description I if (D, I) |= can AG \ {i} do ϕ. The "level" of redundancy can be refined by adopting different levels of can (e.g., can_n¬k implies that the knowledge of agent i is not required); it is also possible to strengthen it by requiring the condition to be satisfied for any I.

Agent Necessity: agent necessity is symmetrical to redundancy; it denotes the inability to accomplish a property ϕ if an agent is excluded. Agent i is necessary w.r.t. ϕ and (D, I) if, for all sequences of sets of actions A1, . . . , Am such that Aj ∩ Ai = ∅ for all 1 ≤ j ≤ m, it is not the case that (D, I) |= necessarily ϕ after A1, . . . , Am. We can also define different degrees of necessity, depending on whether the knowledge of i is available (or should be removed from I) and on whether i can interfere.

6.3 Compositionality

The formalization of multi-agent systems in C enables exploring the effects of composing domains; this is an important property, which allows us to model dynamic MAS (e.g., systems where new agents can join an existing coalition). Let D1, D2 be two domains, and let us indicate with ⟨Fi^1, Ai^1⟩ (i ∈ AG1) and ⟨Fi^2, Ai^2⟩ (i ∈ AG2) the agent signatures of D1 and D2. We assume that all action sets are disjoint, while we allow (∪_{i∈AG1} Fi^1) ∩ (∪_{i∈AG2} Fi^2) ≠ ∅. We define the two instances (D1, I1) and (D2, I2) to be composable w.r.t. a state formula ϕ if

(D1, I1) |= can AG1 do ϕ or (D2, I2) |= can AG2 do ϕ implies (D1 ∪ D2, I1 ∪ I2) |= can AG1 ∪ AG2 do ϕ

Two instances are composable if they are composable w.r.t. all formulae ϕ. Domains D1, D2 are composable if all the instances (D1, I1) and (D2, I2) are composable.

7 Reasoning with Agent Knowledge

In this section, we consider some examples from [12, 30, 18] which address another aspect of modeling MAS, i.e., the exchange of knowledge between agents and reasoning in the presence of incomplete knowledge. The examples illustrate the limitations of C as a language for multi-agent domains and the inadequacy of modeling MAS from the perspective of an omniscient modeler.

7.1 Heaven and Hell Domain: The Modeler's Perspective

This example has been drawn from [30], where it is used to motivate the introduction of decentralized POMDPs and their use in multi-agent planning. The following formalization does not consider the rewards obtained by the agents after the execution of a particular plan. In this domain, there are two agents 1 and 2, a priest p, and three rooms r1, r2, r3. Each of the two rooms r2 and r3 is either heaven or hell: if r2 is heaven then r3 is hell, and vice versa. The priest knows where heaven/hell is located.
The agents 1 and 2 do not know where heaven/hell is, but, by visiting the priest, they can receive the information that tells them where heaven is. Agents 1 and 2 can also exchange their knowledge about the location of heaven. 1 and 2 want to meet in heaven. The signatures for the three agents are as follows (k, h ∈ {1, 2, 3}):

F1 = {heaven^2_1, heaven^3_1, at^k_1}    A1 = {m_1(k, h), ask_12, ask_1p}
F2 = {heaven^2_2, heaven^3_2, at^k_2}    A2 = {m_2(k, h), ask_21, ask_2p}
Fp = {heaven^2_p, heaven^3_p}            Ap = ∅

Intuitively, heaven^j_i denotes that i knows that heaven is in room j, and at^j_i denotes that i is in room j. ask_ij is an action whose execution allows i to learn where heaven is, provided j knows where heaven is. m_i(k, h) encodes the action of i moving from room k to room h. Observe that the fact that i does not know the location of heaven is encoded by the formula ¬heaven^2_i ∧ ¬heaven^3_i. The domain specification Dhh contains the following laws:

caused heaven^j_1 if heaven^j_x after ask_1x    (j ∈ {2, 3}, x ∈ {2, p})
caused heaven^j_2 if heaven^j_x after ask_2x    (j ∈ {2, 3}, x ∈ {1, p})
caused at^j_i if at^k_i after m_i(k, j)         (i ∈ {1, 2, p}, j, k ∈ {1, 2, 3})
caused ¬at^j_i if at^k_i                        (i ∈ {1, 2, p}, j, k ∈ {1, 2, 3}, j ≠ k)
caused ¬heaven^2_i if heaven^3_i                (i ∈ {1, 2, p})
caused ¬heaven^3_i if heaven^2_i                (i ∈ {1, 2, p})

The first two laws state that if 1 (resp. 2) asks 2 or p (resp. 1 or p) for the location of heaven, then 1 (resp. 2) will know where heaven is, provided 2/p (resp. 1/p) has this information. The third law encodes the effect of moving between rooms. The fourth law is a static law stating that one person can be at only one place at a time. Let us consider an instance with the initial state described by I5 (j ∈ {2, 3}):

initially at^1_1    initially ¬heaven^j_1
initially at^2_2    initially ¬heaven^j_2
initially heaven^2_p

We can show that

(Dhh, I5) |= necessarily at^2_1 ∧ at^2_2 after {ask_1p}, {m_1(1, 2)}

7.2 Heaven and Hell: The Agent's Perspective

The previous encoding of the domain has been developed from the perspective of a domain modeler who has complete knowledge about the world and all the agents. This perspective is reasonable in the domains encountered in the previous sections. Nevertheless, it makes a difference when the behavior of one agent depends on knowledge that is not immediately available, e.g., agent 1 does not know where heaven is and needs to acquire this information through knowledge exchanges with other agents. The model developed in the previous subsection is adequate for certain reasoning tasks (e.g., plan validation), but it is weak when it comes to tasks like planning. An alternative model can be devised by looking at the problem from the perspective of each individual agent (not from a central modeler). This can be captured through an adaptation of the notion of sensing actions discussed in [25, 26]. Intuitively, a sensing action allows an agent to establish the truth value of unknown fluents. A sensing action a can be specified by laws of the form

determines l1, . . . , lk if F after a

where l1, . . . , lk are fluent literals, F is a state formula, and a is a sensing action. Intuitively, a can be executed only when F is true; after its execution, one of l1, . . . , lk is set to true and all the others are set to false. The semantics of C extended with sensing actions can be defined in a fashion similar to [26] and is omitted here for lack of space. It suffices to say that the semantics of the language should now account for the different possible evolutions of the multi-agent system due to the incomplete information of the individual agents.
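Since the full semantics is omitted, the following is only a rough sketch (ours, under the stated reading of determines laws) of the branching this induces: executing the sensing action in a state splits it into one successor per sensed outcome, so a plan has to be validated against every branch. It reuses the neg helper from the earlier sketch; the flat fluent names in the example are our hypothetical encodings of the paper's subscripted fluents.

def sense(s, law):
    # law = (outcomes, F, a) encodes 'determines l1,...,lk if F after a';
    # s is a complete state (a set of literal strings, '-' marks negation).
    outcomes, F, a = law
    if not (F <= s):
        return []                     # the action is not executable: F must hold
    branches = []
    for l in outcomes:
        # Keep every literal not mentioned by the law, set l true and the
        # remaining sensed outcomes false.
        rest = {x for x in s if x not in outcomes and neg(x) not in outcomes}
        branches.append(frozenset(rest | {l} | {neg(o) for o in outcomes if o != l}))
    return branches

# E.g. agent 1 asking the priest splits 1's state into a "heaven is in
# room 2" branch and a "heaven is in room 3" branch:
# sense(s, ({"heaven2_1", "heaven3_1"}, {"ok_1p"}, "ask_1p"))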
The signatures for the three agents are as follows (k, h ∈ {1, 2, 3}):

F1 = {heaven^2_1, heaven^3_1, ok_12, ok_1p, at^k_1}
A1 = {m_1(k, h), ask_12, ask_1p, know?_21, know?_p1}
F2 = {heaven^2_2, heaven^3_2, ok_21, ok_2p, at^k_2}
A2 = {m_2(k, h), ask_21, ask_2p, know?_12, know?_p2}
Fp = {heaven^2_p, heaven^3_p}
Ap = ∅

Intuitively, the fluent ok_yx denotes the fact that agent y knows that agent x knows the location of heaven. The initial state for 1 is given by I5^1 = { initially at^1_1, initially ok_1p }. Similarly, the initial state for 2 is I5^2 = { initially at^2_2, initially ok_2p }, and for p it is I5^p = { initially heaven^2_p }. The domain specification D1 for agent 1 includes the last four statements of Dhh and the following sensing action specifications:

determines heaven^2_1, heaven^3_1 if ok_1x after ask_1x    (x ∈ {2, p})
determines ok_1x, ¬ok_1x after know?_x1                    (x ∈ {2, p})

The domain specification D2 for agent 2 is similar. The domain specification Dp consists of only the last two static laws of Dhh. Let D'hh = D1 ∪ D2 ∪ Dp and I5' = I5^1 ∪ I5^2 ∪ I5^p; we can show that

(D'hh, I5') |= necessarily heaven^2_1 ∧ heaven^2_2 after {ask_1p}, {know?_12}, {ask_21}.

7.3 Beyond C with Sensing Actions

This subsection discusses an aspect of modeling MAS that cannot easily be dealt with in C, even with sensing actions: representing and reasoning about the knowledge of agents. In Section 7.1, we used two different fluents to model the knowledge of an agent about properties of the world, similar to the approach in [26]. This approach is adequate for several situations. Nevertheless, it can become quite cumbersome if complex reasoning about the knowledge of other agents is involved. Let us consider the well-known Muddy Children problem [12]. Two children are playing outside the house. Their father comes and tells them that at least one of them has mud on his/her forehead. He then repeatedly asks "do you know whether your forehead is muddy or not?". The first time, both answer "no"; the second time, both say "yes". It is known that the father and the children can see and hear each other.

The representation of this domain in C is possible, but it would require a large number of fluents (describing the knowledge of each child, the knowledge of each child about the other child, etc.) as well as a formalization of the axioms necessary to express how knowledge should be manipulated, similar to the fluents ok_ij in the previous example. A more effective approach is to introduce explicit knowledge operators (with the manipulation axioms implicit in their semantics, e.g., as operators in an S5 modal logic) and use them to describe agents' states. Let us consider a set of modal operators Ki, one for each agent. A formula such as Ki ϕ denotes that agent i knows property ϕ. Knowledge operators can be nested; in particular, K*_G ψ denotes all formulae with arbitrary nesting of K_G operators (G being a set of agents). In our example, let us denote the children with 1 and 2, and let mi be a fluent denoting whether i is muddy or not. The initial state of the world can then be described as follows:

initially m1 ∧ m2                                                  (1)
initially ¬Ki mi ∧ ¬Ki ¬mi                                         (2)
initially K* (m1 ∨ m2)                                             (3)
initially K*_{1,2}\{i} mi                                          (4)
initially K* (K*_{1,2}\{i} mi ∨ K*_{1,2}\{i} ¬mi)                  (5)

where i ∈ {1, 2}. (1) states that both children are muddy.
(2) says that i does not know whether he/she is muddy. (3) encodes the fact that the children share the common knowledge that at least one of them is muddy. (4) captures the fact that each child can see the other child. Finally, (5) represents the common knowledge that each child knows the muddy status of the other one. The actions used in this domain would enable agents to gain knowledge; e.g., the "no" answer of child 1 allows child 2 to learn K1(¬K1 m1 ∧ ¬K1 ¬m1). This, together with the initial knowledge, would be sufficient for 2 to conclude K2 m2. A discussion of how these inferences occur can be found, for example, in [12].

8 Discussion and Conclusion

In this paper, we presented an investigation of the use of the action language C to model MAS domains. C, like several other action languages, is interesting in that it provides well-studied foundations for knowledge representation and for performing several types of reasoning tasks. Furthermore, the literature provides a rich infrastructure for the implementation of action languages (e.g., through translational techniques [27]). The results presented in this paper identify several interesting features that are necessary for modeling MAS, and they show how many of these features can be encoded in C, either directly or through simple extensions of the action language. We also report domains that remain challenging for C.

There have been many agent programming languages, such as the BDI agent programming language AgentSpeak [23] (as implemented in Jason [4]), JADE [3] (and its extension Jadex [7]), ConGolog [10], IMPACT [1], 3APL [9], and GOAL [19]. A good comparison of many of these languages can be found in [21]. We would like to stress that this paper does not introduce a new agent "programming language" in the style of the languages mentioned above. Rather, we bring an action language perspective, where the concern is to succinctly and naturally specify the transitions between worlds due to actions. Thus, our focus is on how to extend action languages to the multi-agent domain in a way that captures various aspects of multi-agent reasoning. The issues of implementation and integration in a distributed environment are interesting, but outside the scope of this paper. To draw an analogy, what we propose in this paper is analogous to the role of the situation calculus or PDDL in the description of single-agent domains, which describe the domains without providing implementation constructs for composing programs, as in Golog/ConGolog or GOAL. As such, our proposal could provide the underlying representation formalism for the development of an agent programming language; on the other hand, it could be directly used as input to a reasoning system, e.g., a planner [8]. Our emphasis in the representation is exclusively on the description of the effects of actions; this distinguishes our approach from other logic-based formalisms, such as those built on MetateM [13].

Although our proposal is not an agent programming language, it is still interesting to analyze it according to the twelve dimensions discussed in [11] and used in [21]:
1. Purpose of use: the language is designed for the formalization and verification of MAS.
2. Time: the language does not have explicit references to time.
3. Sensing: the language supports sensing actions.
4. Concurrency: our proposed language enables the description of concurrent and interacting actions.
5. Nondeterminism: the language naturally supports nondeterminism.
6. Agent knowledge: our language allows for the description of agents with incomplete knowledge and can be extended to handle uncertainty.
7. Communication: this criterion is not applicable to our language.
8. Team working: the language could be used for describing interaction between agents, including coordination [28] and negotiation [29].
9. Heterogeneity and knowledge sharing: the language does not force the agents to use the same ontology.
10. Programming style: this criterion is not applicable to our language, since it is not an agent programming language.
11. Modularity: our language does not provide any explicit mechanism for modularizing the knowledge bases.
12. Semantics: our proposal has a clearly defined semantics, which is based on the transition system between states.

The natural next steps in this line of work consist of (1) exploring the extensions required for a more natural representation of, and reasoning about, the knowledge of agents in MAS domains (see Sect. 7); (2) adapting the more advanced forms of reasoning and implementation proposed for C to the case of MAS domains; and (3) investigating the use of the proposed extension of C in formalizing distributed systems.

Acknowledgement: The last two authors are partially supported by the NSF grants IIS-0812267, CBET-0754525, CNS-0220590, and CREST-0420407.

References
1. V.S. Subrahmanian, P. Bonatti, J. Dix, T. Eiter, S. Kraus, F. Ozcan, and R. Ross. Heterogeneous Agent Systems: Theory and Implementation. MIT Press, 2000.
2. A. Baker. A simple solution to the Yale Shooting Problem. In KRR, 11–20, 1989.
3. F.L. Bellifemine, G. Caire, and D. Greenwood. Developing Multi-Agent Systems with JADE. J. Wiley & Sons, 2007.
4. R.H. Bordini, J.F. Hübner, and M. Wooldridge. Programming Multi-agent Systems in AgentSpeak using Jason. J. Wiley and Sons, 2007.
5. G. Boella and L. van der Torre. Enforceable social laws. In AAMAS 2005, 682–689. ACM.
6. C. Boutilier and R.I. Brafman. Partial-order planning with concurrent interacting actions. J. Artif. Intell. Res. (JAIR), 14:105–136, 2001.
7. L. Braubach, A. Pokahr, and W. Lamersdorf. Jadex: a BDI-Agent System Combining Middleware and Reasoning. In Software Agent-based Applications, Platforms and Development Kits. Springer Verlag, 2005.
8. M. Brenner. Planning for Multi-agent Environments: From Individual Perceptions to Coordinated Execution. In Workshop on Multi-agent Planning and Scheduling, ICAPS, 80–88, 2005.
9. M. Dastani, F. Dignum, and J.J. Meyer. 3APL: A Programming Language for Cognitive Agents. ERCIM News, European Research Consortium for Informatics and Mathematics, Special issue on Cognitive Systems, No. 53, 2003.
10. G. De Giacomo, Y. Lespérance, and H.J. Levesque. ConGolog, a concurrent programming language based on the situation calculus. Artificial Intelligence, 121(1–2):109–169, 2000.
11. N. Jennings, K. Sycara, and M. Wooldridge. A roadmap of agent research and development. Autonomous Agents and Multi-Agent Systems, 1(1):7–38, 1998.
12. R. Fagin, J. Halpern, Y. Moses, and M. Vardi. Reasoning about Knowledge. MIT Press, 1995.
13. M. Fisher. A survey of Concurrent METATEM – the language and its applications. In Temporal Logic – Proceedings of the First International Conference (LNAI Volume 827), pages 480–505. Springer Verlag, 1994.
14. M. Gelfond and V. Lifschitz. Representing actions and change by logic programs. Journal of Logic Programming, 17(2,3,4):301–323, 1993.
15. M. Gelfond and V. Lifschitz. Action languages. ETAI, 3(6), 1998.
16. J. Gerbrandy. Logics of propositional control. In AAMAS 2006, 193–200. ACM, 2006.
17. S. Hanks and D. McDermott. Nonmonotonic logic and temporal projection. Artificial Intelligence, 33(3):379–412, 1987.
18. A. Herzig and N. Troquard. Knowing how to play: uniform choices in logics of agency. In AAMAS 2006, 209–216, 2006.
19. F.S. de Boer, K.V. Hindriks, W. van der Hoek, and J.-J.Ch. Meyer. A verification framework for agent programming with declarative goals. Journal of Applied Logic, 5:277–302, 2005.
20. H. Kautz. The logic of persistence. In Proceedings of AAAI-86, pages 401–405, 1986.
21. V. Mascardi, M. Martelli, and L. Sterling. Logic-Based Specification Languages for Intelligent Software Agents. Theory and Practice of Logic Programming, 4(4):495–537, 2004.
22. J. McCarthy and P. Hayes. Some philosophical problems from the standpoint of artificial intelligence. Machine Intelligence, vol. 4, pages 463–502. Edinburgh University Press, 1969.
23. A.S. Rao. AgentSpeak(L): BDI Agents Speak Out in a Logical Computable Language. In Proceedings of the Seventh European Workshop on Modelling Autonomous Agents in a Multi-Agent World, MAAMAW, pages 42–55, 1996.
24. L. Sauro, J. Gerbrandy, W. van der Hoek, and M. Wooldridge. Reasoning about action and cooperation. In AAMAS 2006, 185–192, New York, NY, USA, 2006. ACM.
25. R. Scherl and H. Levesque. Knowledge, action, and the frame problem. Artificial Intelligence, 144(1–2), 2003.
26. T.C. Son and C. Baral. Formalizing sensing actions – a transition function based approach. Artificial Intelligence, 125(1–2):19–91, January 2001.
27. T.C. Son, C. Baral, N. Tran, and S. McIlraith. Domain-dependent knowledge in answer set planning. ACM Trans. Comput. Logic, 7(4):613–657, 2006.
28. T.C. Son and C. Sakama. Reasoning and Planning with Cooperative Actions for Multiagents Using Answer Set Programming. In Proceedings of DALT, 2009.
29. T.C. Son, E. Pontelli, and C. Sakama. Logic Programming for Multiagent Planning with Negotiation. In Proceedings of the 25th International Conference on Logic Programming (ICLP), 2009.
30. M. Spaan, G.J. Gordon, and N.A. Vlassis. Decentralized planning under uncertainty for teams of communicating agents. In AAMAS 2006, pages 249–256, 2006.
31. W. van der Hoek, W. Jamroga, and M. Wooldridge. A logic for strategic reasoning. In AAMAS 2005, 157–164. ACM, 2005.
32. H.P. van Ditmarsch, W. van der Hoek, and B.P. Kooi. Concurrent Dynamic Epistemic Logic for MAS. In AAMAS, 2003.

Model Checking Normative Agent Organisations⋆

Louise Dennis1, Nick Tinnemeier2, and John-Jules Meyer2
1 Department of Computer Science, University of Liverpool, Liverpool, U.K. {L.A.Dennis}@csc.liv.ac.uk
2 Department of Information and Computing Sciences, Utrecht University, Utrecht, The Netherlands {nick,jj}@cs.uu.nl

⋆ Work partially supported by EPSRC under grant EP/D052548 and by the CoCoMAS project funded through the Dutch Organization for Scientific Research (NWO).

Abstract. We present the integration of a normative programming language in the MCAPL framework for model checking multi-agent systems. The result is a framework facilitating the implementation and verification of multi-agent systems coordinated via a normative organisation. The organisation can be programmed in the normative language, while the constituent agents may be implemented in a number of (BDI) agent programming languages. We demonstrate how this framework can be used to check properties of the organisation and of the individual agents in an LTL-based property specification language. We show that different properties may be checked depending on the information available to the model checker about the internal state of the agents.
We discuss, in particular, an error we detected in the organisation code of our case study, which was only highlighted by attempting a verification with "white box" agents.

1 Introduction

Since Yoav Shoham coined the term "agent-oriented programming" [18], many dedicated languages, interpreters and platforms to facilitate the construction of multi-agent systems have been proposed. Examples of such agent programming languages are Jason [6], GOAL [13] and 2APL [8]. An interesting feature of the agent paradigm is the possibility of building heterogeneous agent systems, that is to say, systems in which multiple agents, implemented in different agent programming languages and possibly by different parties, interact. Recently, the area of agent programming has been shifting attention from constructs for implementing single agents, such as goals, beliefs and plans, to social constructs for programming multi-agent systems, such as roles and norms. In this view, a multi-agent system is seen as a computational organisation that is constructed separately from the agents that will interact with it. Typically, little can be assumed about the internals of these agents and the behaviour they will exhibit. When little can be assumed about the agents that will interact with the organisation, a norm enforcement mechanism, i.e., a process that is responsible for detecting when norms are violated and for responding to these violations by imposing sanctions, becomes crucial to regulate their behaviour and to achieve and maintain the system's global design objectives [19].

One of the challenges in constructing multi-agent systems is to verify that the system meets its overall design objectives and satisfies some desirable properties; for example, that a set of norms actually enforces the intended behaviour, or that the agents that will reside in the system are able to achieve their goals. In this paper we report on the extension of earlier work [11] of one of the authors on the automatic verification of heterogeneous agent systems to also include organisational (mostly normative) aspects, by incorporating the normative programming language presented in [9]. The resulting framework allows us to use automated verification techniques for multi-agent systems consisting of a heterogeneous set of agents that interact with a norm-governed organisation. The framework in [11] is primarily targeted at the rapid implementation of agent programming languages that are endowed with an operational semantics [16]. The choice of the normative programming language proposed in [9] is motivated by the presence of such an operational semantics.

It should be noted that we are not the first to investigate the automatic verification of multi-agent systems and computational organisations; there are already some notable achievements in this direction. Examples of work on model-checking techniques for multi-agent systems are [4, 5, 15]. In contrast to [11], this work on model checking agent systems is targeted at homogeneous systems, pertaining to the less realistic case in which all agents are built in the same language. Most importantly, these works (including [11]) do not consider the verification of organisational concepts.
Work related to the verification of organisational aspects has appeared, for example, in [14, 7, 20, 1], but in these frameworks the internals of the agents are (intentionally) viewed as unknown. This is explained by the observation that in a deployed system little can be assumed about the agents that will interact with it. Still, we believe that for verification purposes at design time it is useful to also take the agents' architecture into account. This allows us, for example, to assert the correctness of a (prototype) agent implementation, in the sense that it will achieve its goals without violating a norm. Such an implementation might then be published to serve as a guideline for external agent developers. It also gives more insight into the behaviour of the system as a whole.

The rest of the paper is structured as follows: In section 2 we give an overview of the language for programming normative organisations (which we will name ORWELL from now on) and discuss the general properties of the dining philosophers problem we use as a running example throughout the paper. Section 3 describes the MCAPL framework for model checking multi-agent systems programmed in a variety of BDI-style agent programming languages. Section 4 discusses the implementation of ORWELL in the MCAPL framework. Section 5 discusses a case study in which we model checked some properties of a number of different multi-agent systems using the organisation.

2 ORWELL Programming Normative Agent Organisations

This section briefly explains the basic concepts involved in our approach to constructing normative multi-agent organisations and how they can be programmed in ORWELL. A more detailed description of its formal syntax and operational semantics can be found in [9]. A multi-agent system, as we conceive it, consists of a set of heterogeneous agents interacting with a normative organisation (henceforth organisation). Figure 1 depicts a snapshot of such a multi-agent system. As mentioned before, by heterogeneous we mean that the agents are potentially implemented in different agent programming languages by unknown programmers. An organisation encapsulates a domain-specific state and functionality, for instance, a database in which papers and reviews are stored and accompanying functions to upload them. The domain-specific state is modeled by a set of brute facts, a term taken from Searle [17]. The agents perform actions that change the brute state to interact with the organisation and exploit its functionality.

Fig. 1: Agents interacting with a normative organisation.

An important purpose of an organisation is to coordinate the behavior of its interactants and to guide them in interacting with it in a meaningful way. This is achieved through a normative component that is defined by a simple account of counts-as rules as defined by Grossi [12]. Counts-as rules normatively assess the brute facts and label a state with a normative judgment, marking brute states as, for example, good or bad. This normative judgment is stored as institutional facts, a term again taken from Searle [17]. To motivate the agents to abide by the norms, certain normative judgments might lead to sanctions, which are imposed on the brute state. In what follows we explain all these constructs using the agent variant of the famous dining philosophers problem, in which five spaghetti-eating agents sit at a circular table and compete for five chopsticks.
The sticks are placed between the agents, and each agent needs two sticks to eat. Each agent can only pick up the sticks to her immediate left and right. When not eating, the agents are deliberating. It is important to emphasize that in this example the chopsticks are metaphors for shared resources, and that the problem touches upon many interesting problems that commonly arise in the field of concurrent computing, in particular deadlock and starvation. There are many known solutions to the dining philosophers problem, and it is not our intention to come up with a novel one; we merely use the problem to illustrate the ORWELL language.

The ORWELL implementation of the dining agents is listed in code fragment 2.1. The initial brute state of the organisation is specified by the facts component. The agents, named ag1, ..., ag5, are numbered one to five clockwise through facts of the form agent(A,I). Sticks are also identified by a number, such that the right stick of an agent numbered I is numbered I and its left stick is numbered I%5+1 (where % is arithmetic modulus). The fact that an agent I is holding a stick is modeled by hold(I,X) with X ∈ {r,l}, in which r denotes the right and l the left stick. The fact that a stick I is down on the table is denoted by down(I), and a fact food(I) denotes that there is food on the plate of agent I. We assume that initially no agent is holding a stick (all sticks are on the table) and all agents are served with food. The initial situation of the dining agents is shown graphically in figure 2a. The specification of the initial brute state is depicted in lines 1-4 of the code fragment.

Code fragment 2.1 Dining agents implemented in ORWELL.

 1  :Brute Facts:
 2  down(1) down(2) down(3) down(4) down(5)
 3  food(1) food(2) food(3) food(4) food(5)
 4  agent(ag1,1) agent(ag2,2) agent(ag3,3) agent(ag4,4) agent(ag5,5)
 5
 6  :Effect Rules:
 7  {agent(A,I), down(I)} does(A,pur) {-down(I), hold(I,r), return(u)}
 8  {agent(A,I), -down(I)} does(A,pur) {return(d)}
 9  {agent(A,I), hold(I,r)} does(A,pdr) {down(I), -hold(I,r)}
10  {agent(ag1,1), down(2)} does(ag1,pul) {-down(2), hold(1,l), return(u)}
11  {agent(ag1,1), -down(2)} does(ag1,pul) {return(d)}
12  {agent(ag2,2), down(3)} does(ag2,pul) {-down(3), hold(2,l), return(u)}
13  {agent(ag2,2), -down(3)} does(ag2,pul) {return(d)}
14  {agent(ag3,3), down(4)} does(ag3,pul) {-down(4), hold(3,l), return(u)}
15  {agent(ag3,3), -down(4)} does(ag3,pul) {return(d)}
16  {agent(ag4,4), down(5)} does(ag4,pul) {-down(5), hold(4,l), return(u)}
17  {agent(ag4,4), -down(5)} does(ag4,pul) {return(d)}
18  {agent(ag5,5), down(1)} does(ag5,pul) {-down(1), hold(5,l), return(u)}
19  {agent(ag5,5), -down(1)} does(ag5,pul) {return(d)}
20  {agent(A,I), hold(I,l)} does(A,pdl) {down((I % 5) + 1), -hold(I,l)}
21  {agent(A,I), hold(I,r), hold(I,l), food(I)} does(A,eat) {-food(I), return(yes)}
22  {agent(A,I), -food(I)} does(A,eat) {return(no)}
23
24  :CountsAs Rules:
25  {-hold(1,r), hold(1,l), food(1)} {True} => {viol(1)}
26  {hold(2,r), -hold(2,l), food(2)} {True} => {viol(2)}
27  {-hold(3,r), hold(3,l), food(3)} {True} => {viol(3)}
28  {hold(4,r), -hold(4,l), food(4)} {True} => {viol(4)}
29  {-hold(5,r), hold(5,l), food(5)} {True} => {viol(5)}
30  {agent(A,I), -food(I), -hold(I,r), -hold(I,l)} {True} => {reward(I)}
31
32  :Sanction Rules:
33  {viol(A)} => {-food(A), punished(A)}
34  {reward(A)} => {food(A), rewarded(A)}

Fig. 2: The dining agents problem. (a) The initial table arrangement. (b) A deadlock situation.

The brute facts change under the performance of actions by agents. The effect rules describe the effect an action has on the brute state and are used by the organisation to determine the resulting brute state after the performance of an action. They are defined by triples of the form {Pre}a{Post}, intuitively meaning that when action a is executed and the set of facts Pre is derivable from the current brute state, the set of facts denoted by Post is to be accommodated in it. We use the notation φ to indicate that a fact holds in the precondition, or should be added in the postcondition, and −φ to indicate that a fact does not hold (precondition) or should be removed (postcondition). Actions a are modeled by predicates of the form does(A,Act), in which Act is a term denoting the action and A the name of the agent performing it.

The dining agents, for example, can perform actions to pick up and put down their (left and right) sticks and to eat. The effect rules defining these actions are listed in lines 7-22. (The current ORWELL prototype has limited ability to reason about arithmetic in rule preconditions; hence the unnecessary proliferation of similar rules in this example.) An agent can only pick up a stick if the stick is on the table (line 7), can only put down a stick when it is holding it (line 9), and can eat when it has lifted both sticks and has food on its plate (line 21). Actions might have different effects depending on the particular brute state. To inform agents about the effect of an action, we introduce specially designated unary facts starting with the predicate return, used to pass information (terms) back to the agent performing the action; these facts are not asserted to the brute state. Picking up a stick thus returns u (up) in case the stick is successfully lifted (line 7) and d (down) otherwise (line 8). Similarly, the success of performing an eat action is indicated by returning yes (line 21) or no (line 22). Note that we assume that agents will only perform the eat action when they have lifted their sticks. Ways of returning information (and handling failure) were not originally described in [9] and are left for future research.

When every agent has decided to eat, holds a left stick and waits for a right stick, we have a deadlock situation (see figure 2b for a graphical representation).
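Before turning to how the organisation discourages such runs, the effect-rule machinery just described can be made concrete. The following is a minimal Python sketch of applying a single {Pre}a{Post} triple to a brute state, assuming ground atoms are represented as plain strings; the names (EffectRule, holds, apply_effect_rule) are our own illustration, not part of the ORWELL implementation.

    from dataclasses import dataclass

    @dataclass
    class EffectRule:
        pre: list       # literals that must hold; a "-fact" literal requires absence
        action: tuple   # (agent, act) pair the rule is triggered by
        post: list      # plain facts are added, "-" facts retracted;
                        # return(..) is passed back, never asserted

    def holds(lit, facts):
        return lit[1:] not in facts if lit.startswith("-") else lit in facts

    def apply_effect_rule(rule, does, facts):
        """If the action matches and every precondition literal holds in the
        brute state, accommodate the postconditions and collect the return
        value for the acting agent."""
        if does != rule.action or not all(holds(l, facts) for l in rule.pre):
            return facts, None
        new_facts, result = set(facts), None
        for lit in rule.post:
            if lit.startswith("return("):
                result = lit[len("return("):-1]   # passed back, not asserted
            elif lit.startswith("-"):
                new_facts.discard(lit[1:])
            else:
                new_facts.add(lit)
        return new_facts, result

    pur = EffectRule(["agent(ag1,1)", "down(1)"], ("ag1", "pur"),
                     ["-down(1)", "hold(1,r)", "return(u)"])
    facts, result = apply_effect_rule(pur, ("ag1", "pur"),
                                      {"agent(ag1,1)", "down(1)", "food(1)"})
    print(result)   # u
    print(facts)    # {'agent(ag1,1)', 'hold(1,r)', 'food(1)'}

Running the fragment on the pick-up-right rule of line 7 removes down(1), asserts hold(1,r), and passes u back to ag1 without asserting return(u) to the brute state, mirroring the treatment of return facts described above.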
One of many possible solutions to prevent deadlock is to implement a protocol in which the odd-numbered agents are supposed to pick up their right stick first and the even-numbered agents their left. Because we cannot make any assumptions about the internals of the agents, we need to account for the sub-ideal situation in which an agent does not follow the protocol. To motivate the agents to abide by the protocol, we implement norms to detect undesirable behaviour (violations) and desirable behaviour (lines 25-30). The norms in our framework take the form of elementary counts-as rules relating a set of brute facts to a set of institutional facts (the normative judgment). The rules listed in lines 25, 27 and 29 state that a situation in which an odd-numbered agent holds her left stick and not her right while there is food on her plate counts as a violation. The rules in lines 26 and 28 implement the symmetric case for even-numbered agents. The last rule marks a state in which an agent has put down both sticks when there is no food on her plate as good behaviour. It is important to emphasize that, in general, hard-wiring the protocol into the action specification (in this case the effect rules) so that violations become impossible severely limits the agents' autonomy [2]. It should also be noted that the antecedent of a counts-as rule may itself contain institutional facts (in this example these are irrelevant and the institutional precondition is simply True).

Undesirable behaviour is punished and good behaviour is rewarded. This is expressed by the sanction rules (lines 33-34) of code fragment 2.1. Sanction rules are a kind of inverted counts-as rule, relating a set of institutional facts to a set of brute facts to be accommodated in the brute state. Bad behaviour, that is, not abiding by the protocol, is punished by taking away the food of the agent, so that it cannot successfully perform the eat action. Good behaviour, i.e. not unnecessarily keeping hold of sticks, is rewarded with food.

3 The MCAPL Framework for Model Checking Agent Programming Languages

The MCAPL framework is intended to provide uniform access to model-checking facilities for programs written in a wide range of BDI-style agent programming languages. The framework is outlined in [10] and described in more detail in [3]. Fig. 3 shows an agent executing within the framework: a program, originally written in some agent programming language, runs within the MCAPL framework. It uses data structures from the Agent Infrastructure Layer (AIL) to store its internal state, comprising, for instance, an agent's belief base and a rule library. It also uses an interpreter for the agent programming language that is built from AIL classes and methods. The interpreter defines the reasoning cycle for the agent programming language and interacts with a model checker, essentially notifying it when a new state is reached that is relevant for verification.

The Agent Infrastructure Layer (AIL) toolkit was introduced as a uniform framework [11] for easing the integration of new languages into the existing execution and verification engine. It provides an effective, high-level basis for implementing the operational semantics [16] of various agent-oriented programming languages and even allows heterogeneous settings [11].

4 Modified Semantics for ORWELL for Implementation in the AIL

In this work we apply the MCAPL framework to the ORWELL language and experiment with the model checking of organisations.
Although ORWELL is an organisational language rather than an agent programming language, many of its features show a remarkable similarity to concepts used in BDI agent programming languages. The brute and institutional facts, for example, can be viewed as knowledge bases, as can the belief bases that typical BDI agent languages use to store the beliefs of an agent. Further, the constructs used to model effects, counts-as and sanctions are all types of rules that resemble the planning rules used by agents. This made it relatively straightforward to model ORWELL in the AIL.

The framework assumes that agents in an agent programming language possess a reasoning cycle consisting of several (≥ 1) stages. Each stage is a disjunction of rules that define how an agent's state may change during the execution of that stage. Only one stage is active at a time and only rules belonging to that stage are considered. The agent's reasoning cycle defines how the reasoning process moves from one stage to another. The combined rules of the stages of the reasoning cycle define the operational semantics of the language. Constructing an interpreter for a language involves implementing these rules (which in some cases might simply make reference to pre-implemented rules) and a reasoning cycle.

Standard ORWELL [9] has one rule in its reasoning cycle that describes the organisation's response to actions performed by interacting agents. When an action is received, the application of this rule:

1. applies one effect rule,
2. then applies all applicable counts-as rules until no more apply, and
3. then applies all applicable sanction rules.

The application of this rule thus performs a sequence of modifications to the state which the AIL would most naturally present as separate transitions. We therefore needed to reformulate the original rule as a sequence of transition rules in a new form of the operational semantics, and to include a step in which the organisation perceives the actions taken by the agents interacting with it.

Figure 4 shows the reworked reasoning cycle for ORWELL. It starts with a perception stage (A) in which agent actions are perceived. It then moves through two stages that apply an effect rule (B and C), two that apply counts-as rules (D and E), and two that apply sanction rules (F and G). Lastly there is a stage (H) in which the results of actions are returned to the agent that took them. The splitting of each rule phase into two stages was dictated by the default mechanism for applying rules (called plans in AIL terminology) in the AIL, in which a set of applicable rules is first generated and then one is chosen and processed. It would have been possible to combine this process into one rule, but it was simpler, when implementing this prototype, to leave it in this form, although it complicates the semantics.

[Fig. 4: The ORWELL Reasoning Cycle in the AIL. Stages: A (perceive agent actions), B (generate one applicable effect rule), C (process effect-rule changes), D (generate all applicable counts-as rules), E (process counts-as rule changes), F (generate all applicable sanction rules), G (process sanction-rule changes), H (finalise).]

Figures 5 to 9 show the operational semantics of ORWELL, reworked for an AIL interpreter and simplified slightly to ignore the effects of unification.
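The stage structure of figure 4 can be pictured as a small state machine. The sketch below is our own Python simplification (the stage letters follow the figure; the helper name and boolean flags are assumptions) intended only to show how control moves between stages, including the loop from E back to D; the rule numbers in the comments refer to figures 6-8 below.

    from enum import Enum

    class Stage(Enum):
        A = "perceive agent action"
        B = "generate an applicable effect rule"
        C = "process effect-rule changes"
        D = "generate applicable counts-as rules"
        E = "process counts-as rule changes"
        F = "generate applicable sanction rules"
        G = "process sanction-rule changes"
        H = "finalise and return a result"

    def next_stage(stage, applicable=False, pending=False):
        # applicable: some rule of the current "generate" stage applies
        # pending:    unprocessed intentions of the current "process" stage remain
        if stage is Stage.B:
            return Stage.C if applicable else Stage.H   # no effect rule: rule (2)
        if stage is Stage.D:
            return Stage.E if applicable else Stage.F   # rules (8)/(9)
        if stage is Stage.E:
            return Stage.E if pending else Stage.D      # loop back to D: rule (14)
        if stage is Stage.F:
            return Stage.G if applicable else Stage.H   # rules (15)/(16)
        return {Stage.A: Stage.B, Stage.C: Stage.D,
                Stage.G: Stage.H, Stage.H: Stage.A}[stage]

    stage = Stage.A
    stage = next_stage(stage)                     # A -> B
    stage = next_stage(stage, applicable=True)    # B -> C

The loop from stage E back to stage D is what implements the counts-as closure discussed below.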
The state of an organisation is represented by a large tuple consisting of: a "current intention" i; a set of additional intentions I; a set of brute facts BF; a set of institutional facts IF; a set of effect rules ER; a set of counts-as rules CAR; a set of sanction rules SR; a set of applicable rules AP; a list of actions taken by the agents in the organisation, A; and a result store RS for the result of the action. The last element of the tuple indicates the phase of the reasoning cycle from figure 4. To aid readability, we show only those parts of the tuple actually changed or referred to by a transition rule, using the naming conventions just outlined to indicate which parts of the tuple we refer to.

The concept of intention is common to many BDI languages and is used to indicate the intended means for achieving a goal or handling an event. Within the AIL, intentions are data structures which associate events with the plans generated to handle those events (including any instantiations of variables appearing in those plans). As plans are executed, the intention is modified accordingly so that it stores only the part of the plan yet to be processed. Of course, the concept of intention is not used in ORWELL; we slightly abuse this single-agent concept to store the instantiated plans associated with any applicable rules. Its exact meaning depends on which type of rule (effect, counts-as or sanction) is considered. When an effect rule is applicable, an intention stores the (unexecuted) postconditions of the rule associated with the action that triggered the rule. When a counts-as or sanction rule is applicable, an intention stores its (unexecuted) postconditions together with a record of the state that made the rule applicable (essentially the conjunction of its instantiated preconditions).

⟨i, a;A, A⟩ → ⟨(a, ε), A, B⟩   (1)

Fig. 5: The operational semantics for ORWELL as implemented in the AIL (agent actions).

Figure 5 shows the semantics for the initial stage. As agents take actions, these are stored in a queue, A, within the organisation for processing (we use ; to represent list cons). The organisation processes one agent action at a time. The reasoning cycle starts by selecting an action, a, for processing. This is converted into an intention tuple (a, ε), whose first part stores the action (in this case) which created the intention and whose second part stores the effects of any rule triggered by the intention, i.e. the brute facts to be asserted and retracted. Initially the effects are indicated by a distinguished symbol ε, which indicates that no effects have yet been calculated.

  {(a, Post) | {Pre}a{Post} ∈ ER ∧ BF ⊨ Pre} = ∅
  ──────────────────────────────────────────────   (2)
  ⟨BF, (a, ε), AP, B⟩ → ⟨BF, null, ∅, H⟩

  {(a, Post) | {Pre}a{Post} ∈ ER ∧ BF ⊨ Pre} = AP′    AP′ ≠ ∅
  ────────────────────────────────────────────────────────────   (3)
  ⟨BF, (a, ε), AP, B⟩ → ⟨BF, (a, ε), AP′, C⟩

  (a, Post) ∈ AP
  ────────────────────────────────────   (4)
  ⟨(a, ε), AP, C⟩ → ⟨(a, Post), ∅, C⟩

  ⟨BF, (a, +bf;Post), C⟩ → ⟨BF ∪ {bf}, (a, Post), C⟩   (5)

  ⟨BF, (a, −bf;Post), C⟩ → ⟨BF \ {bf}, (a, Post), C⟩   (6)

  ⟨(a, []), C⟩ → ⟨(a, []), D⟩   (7)

Fig. 6: The operational semantics for ORWELL as implemented in the AIL (effect rules).

Figure 6 shows the semantics for processing effect rules. These semantics are very similar to those used for processing counts-as rules and sanction rules and, in many cases, the implementation uses the same code, simply customised to choose from different sets of rules depending upon the stage of the reasoning cycle.
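As a concrete reading of rules (1)-(7), the following Python sketch processes one queued agent action into an intention tuple and then works through a chosen rule's postcondition stack. The data representation (fact strings, (pre, action, post) tuples, a deque of actions) and the function names are our own assumptions, not the AIL data structures.

    from collections import deque

    EPSILON = object()   # distinguished symbol: effects not yet calculated

    def holds(lit, facts):
        return lit[1:] not in facts if lit.startswith("-") else lit in facts

    def perceive(action_queue):
        """Stage A, rule (1): pop one agent action a, form the intention (a, EPSILON)."""
        return (action_queue.popleft(), EPSILON)

    def applicable(intention, effect_rules, BF):
        """Stage B, rules (2)/(3): all (a, Post) pairs whose rule matches the
        action a and whose precondition holds in the brute facts BF."""
        a, _ = intention
        return [(a, post) for (pre, act, post) in effect_rules
                if act == a and all(holds(l, BF) for l in pre)]

    def process(intention, BF):
        """Stage C, rules (5)-(7): work through the postcondition stack,
        adding plain facts and retracting '-' facts, until none remain."""
        a, post = intention
        BF = set(BF)
        for lit in post:
            if lit.startswith("-"):
                BF.discard(lit[1:])
            else:
                BF.add(lit)
        return (a, []), BF

    queue = deque([("ag1", "pur")])
    rules = [(["down(1)"], ("ag1", "pur"), ["-down(1)", "hold(1,r)"])]
    intention = perceive(queue)
    options = applicable(intention, rules, {"down(1)"})
    print(process(options[0], {"down(1)"}))   # (('ag1', 'pur'), []) and {'hold(1,r)'}

An empty applicable set would send the cycle straight to stage H (rule (2)); return(..) facts are picked up later by the finalise stage of figure 9.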
Recall that an effect rule is a triple {Pre}a{Post} consisting of a set of preconditions Pre, an action a taken by an agent, and a set of postconditions Post. If the action matches the current intention and the preconditions hold, written BF ⊨ Pre (where BF are the brute facts of the organisation), then the effect rule is applicable. Rule 2 pertains to the case in which no effect rule can be applied. This can happen when no precondition is satisfied or when the action is simply undefined. The brute state then remains unchanged, so there is no need to normatively assess it; the organisation therefore cycles on to stage H, where an empty result will be returned. Applicable effect rules are stored in the set of applicable rules AP (rule 3), one applicable rule is chosen (rule 4), and its postconditions are processed (rules 5 and 6). The postconditions consist of a stack of changes to be made to the brute facts: +bf indicates that the fact bf should be added and −bf that it should be removed. These are processed by rules 5 and 6 in turn until no postconditions remain (rule 7). The cycle then moves on to the next stage (stage D), in which the resulting brute state is normatively assessed by the counts-as rules.

  {(⋀Pre, Post) | {Pre} ⇒ {Post} ∈ CAR \ G ∧ BF ∪ IF ⊨ Pre} = ∅
  ───────────────────────────────────────────────────────────────   (8)
  ⟨BF, IF, AP, G, D⟩ → ⟨BF, IF, ∅, ∅, F⟩

  {(⋀Pre, Post) | {Pre} ⇒ {Post} ∈ CAR \ G ∧ BF ∪ IF ⊨ Pre} = AP′    AP′ ≠ ∅
  ────────────────────────────────────────────────────────────────────────────   (9)
  ⟨BF, IF, AP, G, D⟩ → ⟨BF, IF, AP′, AP′ ∪ G, E⟩

  AP ≠ ∅
  ────────────────────────────────────────   (10)
  ⟨org, I, AP, E⟩ → ⟨org, AP ∪ I, ∅, E⟩

  ⟨org, (⋀Pre, []), i;I, E⟩ → ⟨org, i, I, E⟩   (11)

  ⟨org, IF, (⋀Pre, +if;Post), E⟩ → ⟨org, IF ∪ {if}, (⋀Pre, Post), E⟩   (12)

  ⟨org, IF, (⋀Pre, −if;Post), E⟩ → ⟨org, IF \ {if}, (⋀Pre, Post), E⟩   (13)

  I = ∅
  ──────────────────────────────────────────────────   (14)
  ⟨org, (⋀Pre, []), I, E⟩ → ⟨org, (⋀Pre, []), I, D⟩

Fig. 7: The operational semantics for ORWELL as implemented in the AIL (counts-as rules).

Figure 7 shows the semantics for handling counts-as rules. These are similar to the semantics for effect rules, except that the closure of all counts-as rules is applied. The set G is used to track the rules that have already been applied. All applicable counts-as rules are made into intentions; these are selected one at a time and their postconditions are processed. As mentioned before, a counts-as rule may contain institutional facts in its precondition, so the application of one counts-as rule might trigger another counts-as rule that was not triggered before. Therefore, when all intentions have been processed, the cycle returns to stage D in order to see whether any new counts-as rules have become applicable.

  {(⋀Pre, Post) | {Pre} ⇒ {Post} ∈ SR ∧ IF ⊨ Pre} = ∅
  ─────────────────────────────────────────────────────   (15)
  ⟨IF, I, AP, F⟩ → ⟨IF, ∅, H⟩

  {(⋀Pre, Post) | {Pre} ⇒ {Post} ∈ SR ∧ IF ⊨ Pre} = AP′    AP′ ≠ ∅
  ──────────────────────────────────────────────────────────────────   (16)
  ⟨IF, AP, F⟩ → ⟨IF, AP′, G⟩

  AP ≠ ∅
  ──────────────────────────────   (17)
  ⟨I, AP, G⟩ → ⟨AP ∪ I, ∅, G⟩

  ⟨(⋀Pre, []), i;I, G⟩ → ⟨i, I, G⟩   (18)

  ⟨BF, (⋀Pre, +bf;Post), G⟩ → ⟨BF ∪ {bf}, (⋀Pre, Post), G⟩   (19)

  ⟨BF, (⋀Pre, −bf;Post), G⟩ → ⟨BF \ {bf}, (⋀Pre, Post), G⟩   (20)

  I = ∅
  ────────────────────────────────────────────   (21)
  ⟨(⋀Pre, []), I, G⟩ → ⟨(⋀Pre, []), I, H⟩

Fig. 8: The operational semantics for ORWELL as implemented in the AIL (sanction rules).

Figure 8 shows the rules governing the application of sanction rules. These are similar to those for counts-as rules; however, since sanction rules consider only institutional facts and alter only brute facts, there is no need to check for more applicable rules once they have all been applied.
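Put procedurally, stages D/E iterate the counts-as rules to a fixed point while stages F/G need only a single pass. The sketch below is our own rendering of this control structure, with rules as (pre, post) pairs over fact strings and a holds helper as in the earlier sketch; it is not the AIL implementation.

    def holds(lit, facts):
        return lit[1:] not in facts if lit.startswith("-") else lit in facts

    def countsas_closure(car, BF, IF):
        """Stages D/E: apply counts-as rules until no unfired rule is
        applicable. `fired` plays the role of G; firing one rule may enable
        another, because preconditions may mention institutional facts."""
        IF, fired = set(IF), set()
        changed = True
        while changed:
            changed = False
            for idx, (pre, post) in enumerate(car):
                if idx in fired or not all(holds(l, BF | IF) for l in pre):
                    continue
                fired.add(idx)
                for lit in post:
                    if lit.startswith("-"):
                        IF.discard(lit[1:])
                    else:
                        IF.add(lit)
                changed = True
        return IF

    def sanctions(sr, BF, IF):
        """Stages F/G: sanction rules read only institutional facts and
        write only brute facts, so one pass suffices."""
        BF = set(BF)
        for pre, post in sr:
            if all(holds(l, IF) for l in pre):
                for lit in post:
                    if lit.startswith("-"):
                        BF.discard(lit[1:])
                    else:
                        BF.add(lit)
        return BF

    IF = countsas_closure([(["hold(1,l)", "-hold(1,r)", "food(1)"], ["viol(1)"])],
                          {"hold(1,l)", "food(1)"}, set())
    BF = sanctions([(["viol(1)"], ["-food(1)", "punished(1)"])],
                   {"hold(1,l)", "food(1)"}, IF)
    print(BF)   # {'hold(1,l)', 'punished(1)'}

The final two calls replay the violation-and-punishment chain of code fragment 2.1 for agent 1.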
  return(X) ∈ BF    RS = []
  ──────────────────────────────────────────────────────   (22)
  ⟨org, BF, RS, H⟩ → ⟨org, BF \ {return(X)}, [X], A⟩

  return(X) ∉ BF    RS = []
  ──────────────────────────────────────────   (23)
  ⟨org, BF, RS, H⟩ → ⟨org, BF, [none], A⟩

Fig. 9: The operational semantics for ORWELL as implemented in the AIL (finalise).

Lastly, figure 9 shows the rules of the final stage. The final stage of the semantics returns any results derived from processing the agent action. It does this by looking for a term of the form return(X) in the brute facts and placing that result, X, in the result store. The result store is implemented as a blocking queue, so, in this implementation, the rules wait until the store is empty and then place the result in it. When individual agents within the organisation take actions, these remove a result from the store, again waiting until a result is available.

Many of these rules are reused versions of customisable rules from the AIL toolkit. For instance, the AIL mechanisms for selecting applicable "plans" were easily customised to select rules and were used in stages B, D and F. Similarly, we were able to use AIL rules for adding and removing beliefs from an agent belief base to handle the addition and removal of brute and institutional facts. We modelled ORWELL's fact sets as belief bases and extended the AIL's belief-handling methods to deal with the presence of multiple belief bases.

It became clear that the ORWELL stages could not simply be presented as a cycle; in some cases we needed to loop back to a previous stage. We ended up introducing rules to control phase changes explicitly (e.g. rule (21)), but these had to be used via an awkward implementational mechanism which involved considering the rule that had last fired. In future we intend to extend the AIL with a generic mechanism for doing this.

It was outside the scope of our exploratory work to verify that the semantics of ORWELL, as implemented in the AIL, conform to the standard language semantics as presented in [9]. However, our aim is to discuss the verification of normative organisational programs, and this implementation is sufficient for that, even if it is not an exact implementation of ORWELL.

5 Model Checking Normative Agent Organisations

We implemented the ORWELL organisation for the dining agents system shown in code fragment 2.1, modified, for time reasons, to consider only three agents rather than five. We integrated this organisation into three multi-agent systems.

The first system (System A) consisted of three agents implemented in the GOAL language. Part of the implementation of one of these agents is shown in code fragment 5.1. This agent has a goal to have eaten (line 4) but initially believes it has not eaten (line 7). It also believes that its left and right sticks are both down on the table (also line 7). The agent has capabilities (lines 9-14) to perform all actions provided by the organisation. The return value of the organisation is accessed through the specially designated variable R, which can be used in the postcondition of a capability specification; the beliefs of the agent are thus updated with the effect of the action. The conditional actions define what the agent should do to achieve its goals and are the key to the protocol implementation. Whenever the agent has a goal to have eaten and believes it has not lifted either stick, it starts by picking up its right stick (line 17). Then it picks up its left (line 18) and starts eating when both are acquired (line 19).
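The conditional actions amount to the following decision function for an odd-numbered (right-stick-first) agent. This Python rendering, including the function name and the boolean encoding of the beliefs, is our own paraphrase of the GOAL rules of code fragment 5.1, not GOAL itself.

    def choose_action(goal_eaten, believes_eaten, left_up, right_up):
        if goal_eaten and not believes_eaten:
            if not left_up and not right_up:
                return "pur"    # line 17: pick up the right stick first
            if not left_up and right_up:
                return "pul"    # line 18: then the left stick
            if left_up and right_up:
                return "eat"    # line 19: eat once both sticks are lifted
        if believes_eaten and left_up:
            return "pdl"        # line 20: put the sticks back down
        if believes_eaten and right_up:
            return "pdr"        # line 21
        return None             # no conditional action applies

    assert choose_action(True, False, False, False) == "pur"
    assert choose_action(True, False, False, True) == "pul"
    assert choose_action(False, True, True, True) == "pdl"

An even-numbered protocol-abiding agent would swap the pur/pul cases; the agents of System B below do not, and the random agents of System C ignore this policy altogether.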
Note that if the eat action is successfully performed, the agent has accomplished its goal. When the agent believes it has eaten and holds its sticks, it puts them down again (lines 20 and 21). The other protocol-abiding agents are programmed in a similar fashion, except that ag2 picks up its left stick first instead of its right. Our expectation was, therefore, that this multi-agent system would never incur any sanctions within the organisation.

Code fragment 5.1 A protocol abiding GOAL agent.

 1  :name: ag1
 2
 3  :Initial Goals:
 4  eaten(yes)
 5
 6  :Initial Beliefs:
 7  eaten(no) left(d) right(d)
 8
 9  :Capabilities:
10  pul pul {True} {-left(d), left(R)}
11  pur pur {True} {-right(d), right(R)}
12  pdl pdl {True} {-left(u), left(d)}
13  pdr pdr {True} {-right(u), right(d)}
14  eat eat {True} {-eaten(no), eaten(R)}
15
16  :Conditional Actions:
17  G eaten(yes), B left(d), B right(d) |> do(pur)
18  G eaten(yes), B left(d), B right(u) |> do(pul)
19  G eaten(yes), B left(u), B right(u) |> do(eat)
20  B eaten(yes), B left(u) |> do(pdl)
21  B eaten(yes), B right(u) |> do(pdr)

System B used a similar set of three GOAL agents, only in this case all three agents were identical (i.e., they would all pick up their right stick first). We anticipated that this group of agents would trigger sanctions. Lastly, for System C, we implemented three entirely black-box agents which simply performed the five possible actions at random. This system did not conform to the assumption that once an agent has picked up a stick it will not put it down until it has eaten.

We investigated the truth of three properties evaluated on these three multi-agent systems. In what follows, □ is the LTL operator "always": □φ means that φ holds in all states contained in every run of the system. ◇ is the LTL operator "eventually" (or "finally"): ◇φ means that φ holds at some point in every run of the system. The modal operator B(ag, φ) stands for "ag believes φ" and is used by AJPF to interrogate the knowledge base of an agent; in the case of ORWELL it interrogates the fact bases.

Property 1 states that it is always the case that if the organisation believes (i.e., stores as a brute fact in its knowledge base) that all agents are holding their right stick, or all agents are holding their left stick (i.e., the system is potentially in a deadlock), then at least one agent believes it has eaten (i.e., one agent is about to put down its sticks, and deadlock has been avoided):

  □((⋀_i B(org, hold(i, r)) ∨ ⋀_i B(org, hold(i, l))) ⇒ ⋁_i B(ag_i, eaten(yes)))   (24)

Property 2 states that it is not possible for any agent which has been punished to be given more food:

  □ ⋀_i ¬(B(org, punished(i)) ∧ B(org, food(i)))   (25)

Property 3 states that after an agent violates the protocol it either always has no food or it is eventually rewarded (for putting its sticks down). This property was expected to hold for all systems, irrespective of whether the agents wait until they have eaten before putting down their sticks or not:

  □ ⋀_i ((B(org, hold(i, l)) ∧ ¬B(org, hold(i, r))) ⇒ (□¬B(org, food(i)) ∨ ◇B(org, rewarded(i))))   (26)

The results of model checking the three properties on the three systems are shown below.
We give the result of model checking together with the time taken in hours (h), minutes (m) or seconds (s) as appropriate, and the number of states (st) generated by the model checker:

              System A                 System B                 System C
  Property 1  True (40m, 8214 st)      False (2m, 432 st)       False (13s, 47 st)
  Property 2  True (40m, 8214 st)      True (30m, 5622 st)      False (34s, 143 st)
  Property 3  True (1h 7m, 9878 st)    True (1h 2m, 10352 st)   Timeout

It should be noted that transitions between states within AJPF generally involve the execution of a considerable amount of Java code in the JPF virtual machine, since the system only branches the search space when absolutely necessary. There is scope within the MCAPL framework for controlling how often properties are checked. In our case we had the properties checked after each full execution of the ORWELL reasoning cycle, a decision made in an attempt to reduce the search space further. So in some cases above, a transition between two states represents the execution of all the rules from stages A to H of the ORWELL reasoning cycle. Furthermore, the JPF virtual machine is slow compared to standard Java virtual machines, partly because of the extra burden it incurs in maintaining the information needed for model checking. This accounts for the comparatively small number of states examined for the time taken, when these results are compared with those of other model-checking systems.

Nevertheless, we were disappointed that we were unable to verify that Property 3 held for System C. When the process timed out, it had examined over 500,000 states. We intend to check these results for unnecessary duplication of states and hope to re-run the experiment in future work. However, it will almost certainly remain the case that verifying an organisation containing agents with known internal states is considerably more computationally tractable than verifying organisations that contain entirely random agents.

In the process of conducting this experiment we discovered errors, even in the small program we had implemented. For instance, we did not initially return a result when an agent attempted to pick up a stick that was held by another agent. This resulted in the agents failing to instantiate the result variable and, in some possible runs, therefore assuming that they had the stick and attempting to pick up their other stick, despite that being a protocol violation. This showed the benefit of model checking an organisation with reference to agents that are assumed to obey its norms. The experiments also show the benefit of allowing access to an agent's state when verifying an organisation, in order to, for instance, check that properties hold under assumptions such as that agents do not put down sticks until after they have eaten. The more that can be assumed about the agents within an organisation, the more can be proved, and so the behaviour of the organisation with respect to different kinds of agent can be determined.

6 Conclusions

In this paper we have explored the verification of multi-agent systems running within a normative organisation. We have implemented a normative organisational language, ORWELL, within the MCAPL framework for model checking multi-agent systems in a fashion that allows us to model check properties of organisations.
We have investigated a simple example of an organisational multi-agent system based on the dining philosophers problem and examined its behaviour both in settings where we make very few assumptions about the behaviour of the agents within the system and in settings where the agents within the system are white box (i.e., the model checker has full access to their internal state). We have been able to use these systems to verify properties of the organisation, in particular properties about the way in which the organisation handles norms and sanctions.

An interesting result of these experiments is that the use of white-box agents allows us to prove a wider range of properties about the way in which the organisation behaves with respect to agents that obey its norms, or agents that, even if they do not obey its norms, respect certain assumptions the organisation embodies about their operation. In particular, the white-box system enabled us to detect a bug in the organisational code which revealed that the organisation did not provide agents which did obey its norms with sufficient information to do so. This bug would have been difficult to detect in a system with no information about the internal states of the constituent agents, since the property that revealed it did not hold in general. In more general terms, the verification of organisations containing white-box agents enables the verification that a given multi-agent system respects the norms of an organisation.

References

1. L. Aştefănoaei, M. Dastani, J.-J. Meyer, and F. S. de Boer. A verification framework for normative multi-agent systems. In PRIMA 2008, pages 54-65, Berlin, Heidelberg, 2008. Springer-Verlag.
2. H. Aldewereld. Autonomy versus Conformity: an Institutional Perspective on Norms and Protocols. PhD thesis, Utrecht University, SIKS, 2007.
3. R. H. Bordini, L. A. Dennis, B. Farwer, and M. Fisher. Automated verification of multi-agent programs. In Proc. 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 69-78, 2008.
4. R. H. Bordini, M. Fisher, W. Visser, and M. Wooldridge. Model checking rational agents. IEEE Intelligent Systems, 19(5):46-52, September/October 2004.
5. R. H. Bordini, M. Fisher, W. Visser, and M. Wooldridge. Verifying multi-agent programs by model checking. Journal of Autonomous Agents and Multi-Agent Systems, 12(2):239-256, March 2006.
6. R. H. Bordini, J. F. Hübner, and M. Wooldridge. Programming Multi-Agent Systems in AgentSpeak Using Jason. Wiley Series in Agent Technology. John Wiley & Sons, 2007.
7. O. Cliffe, M. De Vos, and J. A. Padget. Answer set programming for representing and reasoning about virtual institutions. In K. Inoue, K. Satoh, and F. Toni, editors, CLIMA VII, volume 4371 of Lecture Notes in Computer Science, pages 60-79. Springer, 2006.
8. M. Dastani. 2APL: a practical agent programming language. Autonomous Agents and Multi-Agent Systems, 16(3):214-248, 2008.
9. M. Dastani, N. A. M. Tinnemeier, and J.-J. C. Meyer. A programming language for normative multi-agent systems. In V. Dignum, editor, Multi-Agent Systems: Semantics and Dynamics of Organizational Models, chapter 16. IGI Global, 2008.
10. L. A. Dennis, B. Farwer, R. H. Bordini, and M. Fisher. A flexible framework for verifying agent programs. In Proc. 7th International Conference on Autonomous Agents and Multiagent Systems (AAMAS). ACM Press, 2008. (Short paper).
11. L. A. Dennis and M. Fisher. Programming verifiable heterogeneous agent systems. In K. Hindriks, A. Pokahr, and S. Sardina, editors, ProMAS 2008, pages 27-42, Estoril, Portugal, May 2008.
12. D. Grossi. Designing Invisible Handcuffs: Formal Investigations in Institutions and Organizations for Multi-agent Systems. PhD thesis, Utrecht University, SIKS, 2007.
13. K. V. Hindriks, F. S. de Boer, W. van der Hoek, and J.-J. C. Meyer. Agent programming with declarative goals. In ATAL '00: Proceedings of the 7th International Workshop on Intelligent Agents VII: Agent Theories, Architectures and Languages, pages 228-243, London, UK, 2001. Springer-Verlag.
14. M.-P. Huguet, M. Esteva, S. Phelps, C. Sierra, and M. Wooldridge. Model checking electronic institutions. In MoChArt 2002, pages 51-58, 2002.
15. M. Kacprzak, A. Lomuscio, and W. Penczek. Verification of multiagent systems via unbounded model checking. In Proc. 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 638-645. IEEE Computer Society, 2004.
16. G. D. Plotkin. A structural approach to operational semantics. Technical Report DAIMI FN-19, University of Aarhus, 1981.
17. J. R. Searle. The Construction of Social Reality. Free Press, 1995.
18. Y. Shoham. Agent-oriented programming. Artificial Intelligence, 60(1):51-92, 1993.
19. J. Vázquez-Salceda, H. Aldewereld, D. Grossi, and F. Dignum. From human regulations to regulated software agents' behavior. AI & Law, 16(1):73-87, 2008.
20. F. Viganò. A framework for model checking institutions. In MoChArt 2006, pages 129-145, Berlin, Heidelberg, 2007. Springer-Verlag.
21. W. Visser, K. Havelund, G. P. Brat, S. Park, and F. Lerda. Model checking programs. Automated Software Engineering, 10(2):203-232, 2003.

Operational Semantics for BDI Modules in Multi-Agent Programming

Mehdi Dastani and Bas R. Steunebrink
Utrecht University, The Netherlands
Email: {mehdi,bass}@cs.uu.nl

Abstract. This paper proposes an operational semantics for BDI modules that can be incorporated in multi-agent programming languages. The introduced concept of modules facilitates the implementation of agents, agent roles, and agent profiles. Moreover, it enables common programming techniques such as encapsulation and information hiding for BDI-based multi-agent programs. This vision is applied to a BDI-based multi-agent programming language to which specific programming constructs are added to allow the implementation of modules. The syntax and operational semantics of this programming language are provided, and some properties of the module-related programming constructs are discussed. An example is presented to illustrate how modules can be used to implement BDI-based multi-agent systems.

1 Introduction

Modularity is an essential principle in structured programming in general and in agent programming in particular. This paper focuses on the modularity principle applied to BDI-based agent programming languages. There have been several proposals for supporting modules in BDI-based programming languages, e.g., [2, 3, 5, 8]. In these proposals, modularization is considered as a mechanism to structure an individual agent's program in separate modules, each encapsulating cognitive components such as beliefs, goals, and plans that together model a specific functionality and can be used to handle specific situations or tasks. However, the ways in which modules are used in these programming approaches differ.
For example, in Jack [3] and Jadex [2], modules (also called capabilities) are employed for information hiding and reuse, by encapsulating the different cognitive components that together implement a specific capability or functionality of the agent. In these approaches, the encapsulated components are used during an agent's execution to process the events the agent receives. In other approaches [5, 8], modules are used to realize a specific policy or mechanism for controlling agent execution. More specifically, modules in GOAL [5] are considered as the 'focus of execution', which can be used to disambiguate the application and execution of plans. This is done by assigning a mental-state condition (beliefs and/or goals) to each module. The modules whose conditions are satisfied form the focus of an agent's execution, such that only plans from these modules are applied and executed. Finally, in 3APL [8] a module can be associated with a specific goal, indicating which planning rules can be applied to achieve that goal. In other words, a module implements specific means for achieving specific goals. It should also be noted that the concept of module as used in [6] is different from that in the other approaches: a module in [6] is one specific cognitive component (e.g., an agent's beliefs), not a functionality modeled by different cognitive components.

In these proposals, most module-related decisions, such as when and how modules should be used during an agent's execution, are controlled by the agent's execution strategy, usually implemented in the agent's interpreter (i.e., the agent's deliberation cycle). An agent programmer can control the use of modules during an agent's execution only indirectly and implicitly, either through the predetermined functionality given to the modules or through conditions assigned to them. For example, in Jack [3] and Jadex [2] the agent's interpreter uses modules to process the received events. In [5], belief or goal conditions are assigned to modules such that an agent's interpreter uses the modules when the respective conditions hold. Finally, in [8] a programmer has only limited control over the modules, by indicating which modules (i.e., which planning rules) should be used to achieve a goal.

Like these approaches, we consider a module as an encapsulation of different cognitive components that together implement a specific agent functionality. The added value of our approach, however, is that a programmer can perform a wide range of operations on modules. These module-related operations enable a programmer to control directly and explicitly when and how modules are used. Thus, in contrast to the abovementioned approaches, we propose a set of generic programming constructs that can be used by an agent programmer to perform a variety of operations on modules.

The proposed notion of module can be used to implement a variety of agent concepts such as agent roles and agent profiles. In fact, in our approach a module can be used as a mechanism to specify a role that can be enacted by an agent during its execution. We also explain how the proposed notion of modules can be used to implement agents that can represent and reason about other agents. In section 2, we explain our module-based programming vision, present its syntax, and provide an example. The operational semantics of the programming language are presented in section 3. In section 4, we discuss how the proposed notion of modules can be used to implement agent roles and agent profiles.
Finally, in section 5, we conclude the paper and indicate some future research directions.

2 BDI Programming with Modules

Programming a BDI-based individual agent amounts to specifying its initial (cognitive) state in terms of beliefs (information), goals (objectives), and plans (means). In programming terminology, the beliefs, goals, and plans can be considered as (cognitive) data structures specifying the state of the agent program. The execution of a BDI-based agent program, which is supposed to modify the state of the agent program, is based on a cyclic process called the deliberation cycle (the sense-reason-act cycle). Each iteration of this process starts with sensing the environment (receiving events and messages), then reasoning about the agent's state (updating the state with the received events and messages, and generating plans either to achieve goals or to react to events), and finally performing actions (executing actions of the generated plans). Similar BDI ingredients and deliberation cycles are used in existing BDI-based programming languages such as Jason [1], 2APL [4], Jadex [7], and Jack [9].

Without losing generality or committing to a specific knowledge representation scheme, we assume in the rest of the paper a BDI-based agent programming language with such (cognitive) data structures and a similar deliberation process. Moreover, we consider structuring a BDI-based agent program in separate modules, each an encapsulation of cognitive data that together model a specific functionality (when the deliberation process operates on them). A multi-agent program consists of a set of modules with unique names, each specifying a state in terms of cognitive concepts. Initially, a subset of these modules is identified as the specification of the initial states of the individual agents. The execution of a multi-agent program is then the instantiation of this subset of modules, followed by performing a deliberation process on each module instance. In this way, an instance of a module forms the initial state of an individual agent. It should be emphasized that a module instance specifies the cognitive state of an agent, while the agent itself is the deliberation process working on that cognitive state.

2.1 Syntax

We do not present the complete syntax of a modular BDI-based agent programming language here, as we aim to focus on modules and module-related actions. In fact, we assume that a module is just like an agent program, specifying a cognitive state by means of the programming constructs (for beliefs, goals, and plans) of existing BDI-based programming languages, extended with module-related actions. Moreover, we assume that the proposed module-related actions can be added to any existing BDI-based agent programming language [1, 4, 7, 9]. For the sake of presenting an example, however, we consider an agent's beliefs to be implemented by a set of Horn clauses. An agent's goals are assumed to be implemented by a set of conjunctions of ground atoms, where each conjunction represents a situation the agent wants to realize. An agent is assumed to be capable of performing different types of actions, such as update actions (to modify beliefs and to adopt and drop goals), belief and goal test actions (to query beliefs and goals), and actions to send messages and to change the state of external environments.
Moreover, an agent is assumed to generate plans at runtime by applying rules. These rules can be used to generate plans based either on the agent's beliefs and goals, or on received internal and external events, including messages from other agents. Rules have the form trigger | guard -> plan, where trigger is either a goal query of the form G(ϕ) or an event query of the form E(ϕ), and guard is a belief query of the form B(ϕ). Finally, plan is the plan to be generated and added to the set of plans when both trigger and guard hold. Similar BDI-related programming constructs occur in many existing BDI-based agent programming languages such as Jason [1], Jadex [7], Jack [9], and 2APL [4].

The first module-related action is create(mod-name,ins-ident), which can be used to create an instance of the module specification named mod-name. The name assigned to the created module instance is given by the second argument, ins-ident. The owner of the module instance can use this name to perform further operations on it. A module instance with identifier m can be released by its owner by means of the release(m) action, after which the instance is removed and its state lost.

A module instance m can be executed by its owner through the execute(m,test) action. The execution of a module instance, performed by its owner, has two effects: 1) it suspends the execution of the owner module instance, and 2) it starts the execution of the owned module instance. The execution of the owner module instance is resumed as soon as the execution of the owned module instance terminates. In a sense, an agent that executes an owned module instance stops deliberating on its current cognitive state and starts deliberating on a new cognitive state. The termination of the owned module instance is based on the mandatory test condition (the second argument of the execute action); the owner cannot force the owned module instance's execution to stop, because its own execution has been suspended. As soon as this condition holds, a stop event is sent to the owned module instance. The module instance can then use the received event to start a cleaning operation, after which it should broadcast a return event. For this we introduce an action return that can be executed by an owned module instance, after which its execution is terminated and the execution of the owner module instance is resumed.

The owner of a module instance can access, query, and update the internals of the instance. In particular, the owner can test whether certain beliefs and goals are entailed by the beliefs and goals of its owned module instance m through the action test(m,ϕ,f), where ϕ consists of belief and goal queries of the form B(ϕ) and G(ϕ), and f is a boolean flag indicating whether the test action has been successful. Also, the beliefs and goals of a module instance m can be updated by means of the actions updateB(m,ϕ) and updateG(m,ϕ), respectively. Here ϕ can consist of multiple terms to be added, separated by commas; terms preceded by a minus sign are removed from the beliefs/goals.

A typical life cycle of a module in terms of these operations is as follows (see the sketch after this paragraph). A module instance i can create a new module instance j from a specification file. The module instance i can then modify j's internal state using update actions. The module instance i can transfer execution control to the module instance j by the execute action. The execution of j continues until j performs a return action. The module instance i can specify a stopping condition ϕ, causing j to receive a stop event when ϕ is satisfied, in response to which j can perform clean-up operations before returning execution control back to i. When i is active again, it can query j's internal state with the test action and release (remove) j.
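Written against a hypothetical owner-side API whose method names mirror these module-related actions, one such life-cycle episode looks roughly as follows in Python. The Owner object and its methods are assumptions for illustration only; the concrete syntax of the language appears in the example of section 2.2.

    def play_role(owner, spec, name, stop_cond, seed_beliefs, success_query):
        """One role-playing episode of module instance `owner` (the i above):
        create j, seed its state, hand over control, inspect the outcome,
        and finally delete it."""
        owner.create(spec, name)              # instantiate j from its spec file
        owner.updateB(name, seed_beliefs)     # modify j's internal state
        owner.execute(name, stop_cond)        # suspend i, run j until j returns
        ok = owner.test(name, success_query)  # query j's final state
        owner.release(name)                   # remove the instance
        return ok

    # e.g. the carrier episode of the worker module (cf. Figure 3 below):
    # play_role(worker, "carrier.mod", "mycar", "B(done() or error())",
    #           ["gold(POS)"], "B(done())")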
2.2 An Example Multi-Agent Program

The following example illustrates the module-related constructs and their use in implementing an agent's role. The example is not intended to demonstrate the practical use of the constructs, for which we would need substantially more space. Suppose we need to build a multi-agent system in which one manager and three workers cooperate to collect gold items in an environment called gridworld. The manager coordinates the activities of the three workers by asking them either to explore the gridworld environment to detect gold items, or to carry detected gold items to a depot and store them. For this example, which can be implemented as the program in Figure 1, the module declaration includes a manager module (manager.mod), which specifies the initial state of the manager agent with name m (the implementation of the manager module is presented in Figure 2). Note that only one manager agent will be initialized and created (line 7). Moreover, the worker module (worker.mod; see Figure 3) specifies the initial state of three worker agents. The names of the worker agents in the implemented multi-agent system are indexed with numbers, i.e., there will be three worker agents with names w1, w2, and w3 (line 8). Finally, two additional modules are declared to implement the explorer and carrier functionalities (lines 4 and 5). As we will see, these functionalities are used at runtime by the worker agents. Note that both functionalities can access the gridworld environment.

1  Modules:
2    manager.mod
3    worker.mod
4    explorer.mod @gridworld
5    carrier.mod @gridworld
6  Agents:
7    m manager 1
8    w worker 3

Fig. 1. The multi-agent program of the running example.

The manager module can be implemented as in Figure 2. The goal of the manager m is to have gold items (line 10). Moreover, it has one initial plan, through which it sends a request to worker w3 to explore the gridworld environment (line 11). Here we assume that the manager is aware of the three created workers, i.e., it has their identities; this assumption can be relaxed by making a query to a possibly existing agent management system to get the identifier of a worker. The first rule of the manager agent (lines 13-17) indicates that the goal to have a gold item (i.e., G(haveGold())) can be achieved if the agent believes that there is a gold item at position POS not yet assigned to any (worker) agent, and that there is a worker agent A with no assigned task (i.e., collecting gold items) yet (i.e., B(gold(POS) && -assigned(POS,_) && worker(A) && -assigned(_,A))). The plan to achieve this goal sends a message to the free agent asking it to play the carrier role to collect the gold item. This is followed by the action ModOwnBel(assigned(POS,A)), by means of which the manager agent modifies its own beliefs to record the fact that the free agent is no longer free (i.e., after this action the manager agent believes that agent A has an assigned task). A similar rule should be added to the code of the manager module allowing it to ask a (free) agent to play the explorer role when the manager has no beliefs about gold items. The second rule (lines 18-20) indicates that whenever the manager receives an event (message) containing the position of a gold item (i.e., gold(POS)), it updates its own beliefs with this information (line 19).
The third rule (lines 21-23) indicates that when a worker informs the manager that it has collected and carried its assigned gold items to the depot, the manager updates its own beliefs (atoms preceded by a minus sign are removed) with the fact that the worker is ready to carry new gold items again.

 9  Beliefs = { worker(w1), worker(w2), worker(w3) }
10  Goals = { haveGold() }
11  Plans = { send( w3, play(explorer) ); }
12  Rules = {
13    G( haveGold() ) | B( gold(POS) && -assigned(POS, _) &&
14                         worker(A) && -assigned(_, A) ) ->
15      { send( A, play(carrier, POS) );
16        ModOwnBel( assigned(POS, A) );
17      },
18    E( receive( A, gold(POS) ) ) | B( worker(A) ) -> {
19      ModOwnBel( gold(POS) );
20    },
21    E( receive( A, done(POS) ) ) | B( worker(A) ) -> {
22      ModOwnBel( -assigned(POS, A), -gold(POS) );
23    }
24  }

Fig. 2. The code of the manager module.

The worker agent, as implemented in Figure 3, waits for requests either to explore the gridworld environment or to carry gold items and store them. When it receives a request from the manager to explore the gridworld environment (line 27), it creates an explorer module instance and executes it (lines 28-29). Note that the stopping condition of this module instance is the belief that gold has been found. When the execution of the module instance halts, the worker agent sends the position of the detected gold item to the manager (line 31) and finally releases the explorer module instance (line 32). It is important to note that, for the worker agent, creating an explorer module instance and executing it is the same as playing the explorer role. The worker agent plays this role until the goal of the role (finding gold items) is believed to be achieved.

The second rule of the worker agent (line 34) is responsible for carrying gold items: it creates a carrier module instance (line 35), adds the gold item information to its beliefs (line 36), and executes it until either the gold items have been collected (the done() condition) or an error has occurred (the error() condition); see line 37. The final four lines of this code (38-41) inform the manager agent about the success or failure of carrying the gold item and release the carrier module instance after this communication. In other words, this second rule indicates when the worker agent should play the carrier role. Note that the code of the manager agent has no rule to react to the failure message; for a running implementation such a rule should be added.

25  Beliefs = { manager(m) }
26  Rules = {
27    E( receive( A, play(explorer) ) ) | B( manager(A) ) -> {
28      create( "explorer.mod", myexp );
29      execute( myexp, B( gold(POS) ) );
30      test( myexp, B( gold(POS) ), true );
31      send( A, gold(POS) );
32      release( myexp );
33    },
34    E( receive( A, play(carrier, POS) ) ) | B( manager(A) ) -> {
35      create( "carrier.mod", mycar );
36      updateB( mycar, gold(POS) );
37      execute( mycar, B( done() or error() ) );
38      test( mycar, B( done() ), F );
39      if F=true then send( A, done(POS) )
40      else send( A, error(POS) );
41      release( mycar );
42    }
43  }

Fig. 3. The code of the worker module.

The explorer module (i.e., the implementation of the explorer role), as implemented in Figure 4, has the goal to find gold items (line 45).
In order to achieve this goal, it proceeds to a random location in the gridworld, performs a sense-gold action there and, if successful, adds the position of the detected gold item (i.e., gold(POS)) to its own local beliefs (line 50). Note that this belief information satisfies the stopping condition of the module instance (see line 29), since the goal foundGold() is achieved as soon as gold(POS) is added to the beliefs (line 44). In this example, the final rule (line 52) reacts to the stop event, which is broadcast when the explorer's stopping condition holds. The reception of this event causes the explorer module to perform a return action, which in turn causes execution to be handed back to the worker module.

44  Beliefs = { foundGold() :- gold(_) }
45  Goals = { foundGold() }
46  Rules = {
47    G( foundGold() ) | true -> {
48      @gridworld( goToRandomPosition() );
49      @gridworld( senseGold(), POS );
50      if POS != nil then ModOwnBel( gold(POS) );
51    },
52    E( stop ) | true -> { return; }
53  }

Fig. 4. The code of the explorer module.

Finally, the carrier module (i.e., the implementation of the carrier role), as implemented in Figure 5, has the goal to store a gold item (line 55). This goal can be achieved by fetching the gold item, storing it in the depot, and removing that gold item from its own local beliefs (lines 58-60). Similarly to the explorer module, the carrier module performs a return action when it receives a stop event (line 62). The third rule (line 63) adds error information (i.e., error()) to its own local beliefs when the execution of an action in the gridworld environment fails. Note that error() in the beliefs was one of the stopping conditions for the execution of the carrier module instance (line 37). It is also important to note that it is up to the gridworld programmer to determine when the execution of a gridworld action fails.

54  Beliefs = { goldStored() :- not gold(_) }
55  Goals = { goldStored() }
56  Rules = {
57    G( goldStored() ) | B( gold(POS) ) -> {
58      @gridworld( fetchGold(POS) );
59      @gridworld( storeGold() );
60      ModOwnBel( -gold(POS), done() );
61    },
62    E( stop ) | true -> { return; },
63    E( fail( @gridworld(_) ) ) | true -> { ModOwnBel( error() ); }
64  }

Fig. 5. The code of the carrier module.

3 Semantics

The semantics of the proposed actions are defined in terms of a transition system, which consists of a set of transition rules for deriving transitions. A transition specifies a single computation/execution step by indicating how one configuration can be transformed into another. In this paper, we first present the multi-agent system configuration, which consists of the configurations of module instances/individual agents and the state of the external shared environments. Then we present transition rules from which the possible execution steps of multi-agent programs can be derived. Here we focus only on the semantics of the module-related constructs.

3.1 Multi-Agent System Configuration

The configuration of a multi-agent program is defined in terms of the configurations of active module instances (some module instances are individual agents), the configurations of inactive ones, and the state of their shared external environments. The configuration of a module instance includes 1) the cognitive state of the module instance (beliefs, goals, plans) with a unique name, and 2) a stopping condition for the module instance.
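A module-instance configuration and the active/inactive split can be pictured with a couple of small data structures. The following Python sketch uses our own field names and is only meant to fix intuitions before the formal definitions; it is not part of the proposed semantics.

    from dataclasses import dataclass, field

    @dataclass
    class ModuleInstance:
        name: str                              # dotted name, e.g. "1.4.7"
        beliefs: set = field(default_factory=set)
        goals: set = field(default_factory=set)
        plans: list = field(default_factory=list)

    @dataclass
    class MASConfiguration:
        # A: name -> (instance, stopping condition); only these take steps
        active: dict = field(default_factory=dict)
        # I: likewise, for inactive instances awaiting (re)activation
        inactive: dict = field(default_factory=dict)
        # chi: the shared external environments
        environments: dict = field(default_factory=dict)

Execute and return, defined below, amount to moving entries between the active and inactive pools.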
We denote the configuration of the cognitive state of an agent or module instance with name i as A_i, and write A_i^B and A_i^G for the beliefs and goals of A_i, respectively. Moreover, we assume suitable definitions of ⊨_b, ⊨_g, ⊕_b, and ⊕_g, such that beliefs and goals can be queried and updated, respectively. We then define ⊨_c as a test on a single agent configuration A_i as follows: A_i ⊭_c ⊥; A_i ⊨_c B(ϕ) ⇔ A_i^B ⊨_b ϕ; and A_i ⊨_c G(ϕ) ⇔ A_i^G ⊨_g ϕ.

To simplify keeping track of which module instance owns which, their names are composed using periods. For example, a module instance named 1.4.7 is owned by module instance 1.4, which is owned by the 'top-level' module instance 1. More formally, we define the set Bid of 'basic identifiers' and the set Cid of 'composed identifiers'; the function prefix returns all prefixes of a composed name (e.g., prefix(1.4.7) = {1.4.7, 1.4, 1}):

  Bid = ℕ
  Cid = Bid ∪ { c.b | c ∈ Cid, b ∈ Bid }
  prefix(i) = {i}                  if i ∈ Bid
  prefix(i) = {i} ∪ prefix(j)      if i = j.k for some j ∈ Cid, k ∈ Bid

The configuration of a multi-agent system is a triple ⟨A, I, χ⟩, where A is a set of configurations of active module instances (including the module instances that implement individual agents), I is a set of configurations of inactive module instances, and χ is the state of the shared environments. The initial configuration of each individual agent is determined by the declared module that is assigned to the agent in the multi-agent program. In particular, for each individual agent with initial configuration A, a module instantiation (A, ⊥) is created and added to the set of active module instances A. Thus, module instances created when the multi-agent program is started have ⊥ as stopping condition. Also, all environments from the multi-agent system program are collected in the set χ. The initial state of the shared external environment is set by the programmer; e.g., the programmer may initially place gold or obstacles at certain positions in a gridworld environment. Finally, the initial configuration of the set of inactive module instances I is the empty set.

The idea behind the distinction between A and I is that only module instances contained in A are subject to making transitions. All inactive module instances are kept in I; these may at runtime be (re)activated (i.e., transferred to A) or removed from I.

Given a multi-agent configuration ⟨A, I, χ⟩, two convenience functions are defined for looking up all ancestors and descendants of a module instance by name, as follows:

  anc(i) = { (A_j, ψ) ∈ A ∪ I | j ∈ prefix(i) }
  desc(i) = { (A_j, ψ) ∈ A ∪ I | i ∈ prefix(j) }

Note that the module instance with the given name i is included as its own ancestor and descendant.

The execution of a multi-agent program modifies its initial configuration by means of transitions that are derivable from the transition rules presented in the following subsection. Each transition rule indicates which execution step (i.e., transition) is possible from a given configuration. It should be noted that for a given configuration several transition rules may be applicable. An interpreter is a deterministic choice of applying transition rules in a certain order.
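With dotted strings as names, the prefix, anc and desc functions can be transcribed directly. This sketch is our own; the pool dictionary stands in for A ∪ I, and the assertions check the example from the text.

    def prefixes(name):
        parts = name.split(".")
        return {".".join(parts[:k]) for k in range(1, len(parts) + 1)}

    def anc(pool, i):
        # ancestors of i: instances whose name is a prefix of i (i included)
        return {n: c for n, c in pool.items() if n in prefixes(i)}

    def desc(pool, i):
        # descendants of i: instances of which i is a prefix (i included)
        return {n: c for n, c in pool.items() if i in prefixes(n)}

    pool = {"1": "A1", "1.4": "A1.4", "1.4.7": "A1.4.7", "2": "A2"}
    assert prefixes("1.4.7") == {"1.4.7", "1.4", "1"}
    assert set(anc(pool, "1.4.7")) == {"1", "1.4", "1.4.7"}
    assert set(desc(pool, "1.4")) == {"1.4", "1.4.7"}

The release rule below uses desc in exactly this way, to avoid leaving dangling descendants behind.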
3.2 Transition Rules for Module Actions

We provide the transition rules for deriving a multi-agent system transition based on the execution of a module-related action by one of the involved module instances. We will use A_i --α!--> A'_i to indicate that the module instance A_i can make a transition to module instance A'_i by performing action α and broadcasting event α!. When α? is used instead of α!, A_i receives the event α?.

The first transition indicates the effect of the create(f,j) action performed by the module instance A_i, where f is the identifier of a module specification (typically a file name) and j is the name that will be associated with the created module instance. This transition rule indicates that a module instance can be created by another module instance if the creating module instance is active, i.e., (A_i, φ) ∈ A. The result is that the sets of module instances A and I are modified. In particular, the creating module instance is modified as it has performed the create action, and the newly created module instance is added to the set of inactive module instances I in the multi-agent system configuration.

  (A_i, φ) ∈ A    (A_{i.j}, ⊥) ∉ I    A_i --create(f,j)!--> A'_i
  ---------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I', χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_i, φ)}, A_{i.j} is a new configuration with name i.j created from specification f (when writing A_{i.j}, it is assumed that i ∈ Cid and j ∈ Bid), and I' = I ∪ {(A_{i.j}, ⊥)}. Note that the newly created module's execution stopping condition is set to ⊥ (as an arbitrary initial value). Also note that a module is only allowed to create another module twice (or more) if different names are used to identify it. This will result in two different instances of the module, each with its own name and state. Otherwise the create action blocks.

A module A_i that owns another module named j (i.e., (A_{i.j}, ⊥) ∈ I) can release (delete) it. It can do this by performing the action release(j). As a result, this module configuration is removed from I. If A_{i.j} does not exist, the release action blocks.

  (A_i, φ) ∈ A    (A_{i.j}, ⊥) ∈ I    A_i --release(j)!--> A'_i
  ---------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I', χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_i, φ)} and, as it would seem, I' = I \ {(A_{i.j}, ⊥)}. However, if A_{i.j} owns one or more unreleased (inactive) module instances, these would be kept dangling. To remove all descendants, I' = I \ desc_I^A(i.j). It should be noted that a module instance is always created privately for the creating module instance (or agent). Therefore, a module instance will not retain its state when it is released and created again. Also, the creating module instance (agent) is the only one that can release and thereby delete the module instance.

A module instance that owns another module instance can execute it, meaning that the owned module instance is transferred from I to A so that it can perform actions by itself. In doing so, the owning module instance is transferred from A to I, i.e., its execution is halted. In effect, control is 'handed over' from the owner module instance to the owned module instance. As part of the execute action, a stopping condition ψ is provided with which the owner module instance can specify when it wants control returned, i.e., as soon as the owned module instance satisfies the stopping condition (A_{i.j} ⊨_c ψ; a transition rule for this case is provided next).

  (A_i, φ) ∈ A    (A_{i.j}, ⊥) ∈ I    A_i --execute(j,ψ)!--> A'_i
  -----------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I', χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A_{i.j}, ψ)} and I' = (I \ {(A_{i.j}, ⊥)}) ∪ {(A'_i, φ)}.
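As a sketch of how these three rules manipulate the configuration, the following Python fragment (ours, not from the paper; new_config is a hypothetical stand-in for instantiating a module specification, and blocking is rendered as assertions) treats A and I as dicts from composed names to pairs (configuration, stopping condition), reusing prefix from the sketch above:

```python
BOTTOM = None  # stands for the default stopping condition ⊥

def new_config(f, name):
    # hypothetical stand-in for creating a configuration from specification f
    return {"spec": f, "name": name}

def create(A, I, i, f, j):
    assert i in A and (i + (j,)) not in I      # otherwise create blocks
    I[i + (j,)] = (new_config(f, i + (j,)), BOTTOM)

def release(A, I, i, j):
    assert i in A and (i + (j,)) in I          # otherwise release blocks
    for k in list(I):                          # remove i.j and all descendants
        if (i + (j,)) in prefix(k):
            del I[k]

def execute(A, I, i, j, psi):
    assert i in A and (i + (j,)) in I
    cfg, _ = I.pop(i + (j,))
    I[i] = A.pop(i)                            # the owner is halted
    A[i + (j,)] = (cfg, psi)                   # the owned instance runs until psi
```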
As soon as the stopping condition of an executing module instance holds (A_i ⊨_c φ), it will receive a stop event from the multi-agent level requesting it to stop its execution, possibly after first performing some cleanup operations. Note that it is assumed that a module instance is always able to receive a stop event (A_i --stop?--> A'_i). It is not guaranteed by the system that a module instance will actually ever stop; it must itself perform a return action (see below) in order to be transferred back to I.

  (A_i, φ) ∈ A    A_i ⊨_c φ    A_i --stop?--> A'_i
  --------------------------------------------------
            ⟨A, I, χ⟩ --> ⟨A', I, χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_i, φ)}. Note that, by definition, A_i ⊭_c ⊥. This means that 1) top-level module instances (i.e., those created at initialization of the multi-agent configuration, i.e., those with a non-composed name) never receive a stop event because they have ⊥ as stopping condition, and 2) module instances executed with ⊥ as stopping condition (e.g., execute(j,⊥)) never receive a stop event either; it is up to the programmer to ensure that the executed module instance performs a return action (see below) at some point to return control to its owning module instance.

A module instance can return control to its parent module instance by performing a return action. This will cause them to 'switch places' again with respect to A and I. Only module instances with a parent can return control, which is enforced below by requiring that the module instance performing a return action has a composite name i.j. It is up to the programmer to ensure that a return action is performed by a module instance in response to a stop event. It should be noted that a module's execution has to be finished before it can be released, because the owning module instance must be in A to be able to perform a release action.

  (A_i, φ) ∈ I    (A_{i.j}, ψ) ∈ A    A_{i.j} --return!--> A'_{i.j}
  -------------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I', χ⟩

where A' = (A \ {(A_{i.j}, ψ)}) ∪ {(A_i, φ)} and I' = (I \ {(A_i, φ)}) ∪ {(A'_{i.j}, ⊥)}. This mechanism allows a module instance to respond to a stop event by performing cleanup operations and then returning. Finally, note that the state of A'_{i.j} is saved (in I) with the default ⊥ as stopping condition.

Next we consider several actions that a module instance can perform on a module instance that it owns which do not pertain to control, but to the state of the owned module instance. Specifically, a module instance can query the beliefs and goals of an owned module instance, update the beliefs of an owned module instance, and adopt and drop goals in an owned module instance. First we consider the belief and goal queries. A module instance A_i that owns another module instance named j which is currently inactive (i.e., (A_{i.j}, ⊥) ∈ I) can perform a (belief/goal) query ψ on A_{i.j}. The query succeeds and returns substitution θ if A_{i.j} ⊨_c ψθ, or it fails returning an empty substitution. The following transition rule captures this.

  (A_i, φ) ∈ A    (A_{i.j}, ⊥) ∈ I    A_i --test(j,ψ,f)!--> A'_iθ
  -----------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I, χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_iθ, φ)}, and f = ⊤ if A_{i.j} ⊨_c ψθ or f = ⊥ if A_{i.j} ⊭_c ψ. In this transition rule, we assume A'_iθ to be the same as A_i except that the test action has been processed and the substitution θ is applied. How these operations are performed depends on the corresponding agent transition rules from which the transition A_i --> A'_i can be derived. Note that A_{i.j} is not changed by the test, and that only direct descendants can be tested (and updated; see below).
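Continuing the sketch above (ours; entails is a hypothetical stand-in for the ⊨_c test), the return action and the test query can be rendered as:

```python
def return_control(A, I, i, j):
    """Owned instance i.j hands control back to its owner i."""
    assert (i + (j,)) in A and i in I
    cfg, _ = A.pop(i + (j,))
    A[i] = I.pop(i)                      # the owner resumes
    I[i + (j,)] = (cfg, BOTTOM)          # state saved, condition reset to ⊥

def entails(cfg, psi):
    # hypothetical stand-in for |=c: returns a substitution θ, or None
    return None

def test(A, I, i, j, psi):
    """Owner i queries its inactive child i.j; i.j itself is unchanged."""
    assert i in A and (i + (j,)) in I
    theta = entails(I[i + (j,)][0], psi)
    return (theta, theta is not None)    # (θ, f): f = ⊤ on success, ⊥ on failure
```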
We now consider belief and goal updates. It is assumed that a formula ψ can represent a belief/goal and that A_{i.j} ⊕_{b/g} ψ yields a configuration where the beliefs/goals have been updated with ψ. Note that if ψ contains any negated terms, these will be deleted from A_{i.j}. Similar to the transition rule for queries above, the owned module instance on which the belief or goal update is performed must be contained in the set of inactive module instances I. With a slight abuse of notation (using a slash), the following transition rule captures both the updateB and updateG actions, respectively.

  (A_i, φ) ∈ A    (A_{i.j}, ⊥) ∈ I    A_i --updateB/G(j,ψ)!--> A'_i
  -------------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I', χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_i, φ)} and I' = (I \ {(A_{i.j}, ⊥)}) ∪ {(A_{i.j} ⊕_{b/g} ψ, ⊥)}.

One module instance can send a message to another module instance if both the sender and receiver exist as active module instances (i.e., are elements of A). It is assumed that a receive event is always successful.

  (A_i, φ) ∈ A    (A_j, φ') ∈ A    A_i --send(j,ψ)!--> A'_i    A_j --receive(i,ψ)?--> A'_j
  -------------------------------------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I, χ⟩

where A' = (A \ {(A_i, φ), (A_j, φ')}) ∪ {(A'_i, φ), (A'_j, φ')}. Note that only active module instances can exchange messages, meaning that they can never send messages to ancestors or descendants. If the intended receiver does not exist as an active module instance, the message is 'bounced' back to the sender. Again, it is assumed that an undelivered event is always successful. Note that sending a message to a module instance that was once active but has since stopped or been released will fail.

  (A_i, φ) ∈ A    (A_j, φ') ∉ A    A_i --send(j,ψ)!--> A'_i    A'_i --undelivered(j,ψ)?--> A''_i
  -------------------------------------------------------------------------------------------------
                  ⟨A, I, χ⟩ --> ⟨A', I, χ⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A''_i, φ)}. Thus we assume that the recipient of a message must be fully and correctly specified for it to be delivered. A different choice would be to always address messages to the top-level parent, look up which module instance of the receiving agent is currently active, and deliver the message there. An objection to this would be that each module instance encapsulates a certain functionality, so a message sent to a specific module instance of an agent may make little sense to another module instance of the same agent.

Finally, a general transition rule is needed for all actions α not equal to one of the module-specific actions introduced above (e.g., 'normal' actions such as assignments, function calls, etc.). Note that the execution of action α possibly leads to a change in the environment χ (as expressed by the subscript χ').

  (A_i, φ) ∈ A    A_i --α!-->_{χ'} A'_i
  ---------------------------------------
       ⟨A, I, χ⟩ --> ⟨A', I, χ'⟩

where A' = (A \ {(A_i, φ)}) ∪ {(A'_i, φ)}.
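The two message rules can be sketched in the same style (ours; deliver and bounce are hypothetical stand-ins for raising the receive(i,ψ)? and undelivered(j,ψ)? events):

```python
def deliver(receiver, sender, content):
    print("receive(%s, %s)? at %s" % (sender, content, receiver))

def bounce(sender, addressee, content):
    print("undelivered(%s, %s)? at %s" % (addressee, content, sender))

def send(A, i, j, psi):
    assert i in A                    # only active instances can send
    if j in A:                       # receiver active: message delivered
        deliver(j, i, psi)
    else:                            # receiver missing or inactive: bounced
        bounce(i, j, psi)
```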
3.3 Properties

In this section we describe several properties (P1-P6) of the proposed module system. Proofs are omitted due to space limitations. All properties below assume a given multi-agent configuration ⟨A, I, χ⟩.

P1: If the names of all initial agents (i.e., those module instances with a basic, non-composed name from Bid) are unique, then all module names (i.e., those module instances with a name from Cid) that are generated at runtime are unique as well:

  (∀(A_i, φ) ≠ (A_j, ψ) ∈ A ∪ I : i, j ∈ Bid ⇒ i ≠ j) ⇒ (∀(A_i, φ) ≠ (A_j, ψ) ∈ A ∪ I : i ≠ j)   (1)

This property follows from 1) the fact that the transition rule for the create action does not allow a module instance to create two modules with the same name, and 2) the fact that when different module instances create new module instances using equal names, the new instances are still assigned unique names because their given names are composed with their ancestors' names.

P2: All children of an active module instance have ⊥ (a default value) as stopping condition:

  ∀(A_i, φ) ∈ A : ∀j ∈ Bid : (A_{i.j}, ψ) ∈ I ⇒ ψ = ⊥   (2)

Whenever a new module instance is created or an active one is halted (because it performed a return action), its stopping condition is/becomes irrelevant and is set to ⊥ as a default value.

P3: All proper ancestors and descendants of an active module instance are themselves inactive:

  ∀(A_i, φ) ∈ A : (anc_I^A(i) ∪ desc_I^A(i)) \ {(A_i, φ)} ⊆ I   (3)

When a module instance activates another module instance by performing an execute action, it becomes inactive itself; when a module instance performs a return action, it becomes inactive and its parent becomes active again. Therefore only one module instance can be active at a time in a line of ancestors and descendants.

P4: If an inactive module instance has a stopping condition not equal to ⊥, then all its ancestors must be inactive and it must have one active descendant:

  ∀(A_i, φ) ∈ I : φ ≠ ⊥ ⇒ (anc_I^A(i) ⊆ I & |desc_I^A(i) ∩ A| = 1)   (4)

Recall that whenever a module instance performs a return action, its stopping condition is set to ⊥. So when a module instance has a stopping condition not equal to ⊥ yet it is inactive, it must be the case that it has created and executed another module instance (which again may have done the same thing).

P5: For each initial agent there is always exactly one active descendant (possibly itself):

  ∀(A_i, φ) ∈ A ∪ I : i ∈ Bid ⇒ |desc_I^A(i) ∩ A| = 1   (5)

Each initial agent (i.e., those whose names are non-composed) can only pass control to other module instances by becoming inactive itself, and the same holds for every module instance down the line. This leads to the following corollary.

P6: |A| is constant.

4 Roles, Profiles, and Task Encapsulation

A module specification can be considered as the specification of a role. In this way, a role specifies a set of objectives (goals) to be achieved by the agent that plays the role, the power the agent gets when it plays the role (actions and plans), information that becomes accessible to the role-playing agent (beliefs), and strategies for achieving objectives or reacting to events (rules). The runtime creation and execution of a module instance can then be used to implement the activation and enactment of a role. In particular, the action create(role,name) can be seen as the activation of a role. An agent that has successfully performed the action create(role,name) is the owner of role and may enact/play this role using execute(name, φ), where φ is a stopping condition, i.e., a composition of belief and goal queries. The owner agent is then put on hold until the role satisfies the stopping condition, at which point control is returned to the owner agent. In this way, an agent can only play one role at each moment in time. In principle, a role is allowed to activate and enact a new role, and to repeat this without (theoretical) depth limits. However, this is usually not allowed in the literature on roles. We assume that it is up to the programmer to prevent roles from enacting other roles. A sketch of this pattern, in terms of the operations of Section 3.2, is given below.
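Reusing the operations sketched in Section 3.2 (ours, not the paper's 2APL syntax; the stopping-condition string and the file name carrier.mod are purely illustrative), the activation and enactment of the carrier role of Section 2 by a worker agent could look as follows:

```python
A = {("worker",): ("worker-config", BOTTOM)}       # the worker is active
I = {}

create(A, I, ("worker",), "carrier.mod", "c")      # role activation
execute(A, I, ("worker",), "c",
        "B(done()) or B(error())")                 # enactment until the condition
# ... the carrier instance acts; eventually it hands control back ...
return_control(A, I, ("worker",), "c")
release(A, I, ("worker",), "c")                    # role deactivation
```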
As agents can be specified in terms of beliefs, goals and plans, we can use modules to represent agents. An agent can thus create and maintain profiles of other agents by creating module instances. For example, assume agent mary executes the actions create("profile template.mod", chris) and create("profile template.mod", john), i.e., it uses a single template to initialize profiles of the (hypothetical) agents chris and john. These profiles can be updated by mary using, e.g., updateB(chris, φ) and adoptgoal(john, ψ) when appropriate. mary can even 'wonder' what chris would do in a certain situation by setting up that situation using belief and goal updates on chris and then performing execute(chris, φ) with a suitable stopping condition φ. The resulting state of chris can be queried afterwards to determine what chris 'would have done'.

Modules can also be used for the common programming techniques of encapsulation and information hiding. A module can encapsulate certain tasks, which can be performed by its owning agent if it performs an execute action on that module instance. Such a module can thus hide its internal state and keep it consistent for its task(s). An important difference between creating a module (in the sense proposed here) and including a module (in the sense of [3, 2, 4]) is that the contents of an included module instance are simply added to the including agent, whereas the contents of a created module instance are kept in a separate scope. So when using the create action, there can be no (inadvertent) clashes caused by equal names being used in different files for beliefs, goals, actions, and rules.

5 Conclusions and Future Work

This paper introduced a mechanism to implement modules in BDI-based agent programming languages. The operational semantics for module-related actions such as creating, executing, testing, updating and releasing module instances were provided. It should be noted that these module-related actions have already been added to the implemented 2APL interpreter, such that 2APL multi-agent programs with modules can be developed and executed. We have also explained how modules can be used to facilitate the implementation of notions relevant to agent programming, namely agent roles and agent profiles.

It should be noted that modularity in programming languages is not new. Our proposed notion of modules is inspired by the concepts found in many languages, particularly object-oriented languages. As a consequence some properties are the same, e.g., module instances have an owner, which dictates the life cycle of the module. Also, a module is designed with a particular task in mind, hiding the details from the owner.

For future work, there are several extensions to this work on modularization that can make it more powerful for encapsulation and for the implementation of roles and agent profiles. Firstly, the execute action may not be entirely appropriate for the implementation of profile execution, i.e., when an agent wonders "what would agent X (of which I have a profile) do in such and such a situation?". This is because executing a profile should not have consequences for the environment and other agents, so a module representing an agent profile should not be allowed to execute external actions or send messages. Also, the execute action can be generalized to allow the simultaneous execution of multiple module instances. In this way one may be able to implement agents that can play several roles simultaneously. Secondly, the notion of module can be generalized by introducing the possibility of specifying a minimum and maximum number of instances of a module that can be active at one time.
This can be used to ensure that, e.g., there must always be three to five agents in the role of security guard. Additionally, one may want to be able to pass ownership of a module instance from one agent to another (especially when the module in question models a role) without losing its internal state. Thirdly, additional actions such as updateP and updateR can be introduced that accept as arguments a module instance and a plan or rule, so that all types of contents of module instances can be modified at runtime. In particular, by creating an empty module instance and using update* actions, module instances can be created from scratch with custom components available at runtime. A related issue is access to the internals of module instances by means of test and update actions. In order to manage this access, modules can be specified as private or public, allowing restricted access to the internals of modules.

References

1. Rafael H. Bordini, Michael Wooldridge, and Jomi Fred Hübner. Programming Multi-Agent Systems in AgentSpeak using Jason (Wiley Series in Agent Technology). John Wiley & Sons, 2007.
2. L. Braubach, A. Pokahr, and W. Lamersdorf. Extending the Capability Concept for Flexible BDI Agent Modularization. In Proc. of ProMAS '05, pages 139-155, 2005.
3. P. Busetta, N. Howden, R. Ronnquist, and A. Hodgson. Structuring BDI Agents in Functional Clusters. In N. Jennings and Y. Lesperance, editors, Intelligent Agents VI: Theories, Architectures and Languages, pages 277-289, 2000.
4. M. Dastani. 2APL: a practical agent programming language. International Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), 16(3):214-248, July 2008.
5. K. Hindriks. Modules as policy-based intentions: Modular agent programming in GOAL. In Proc. of ProMAS '07, volume 4908. Springer, 2008.
6. Peter Novák and Jürgen Dix. Modular BDI architecture. In Proceedings of AAMAS'06, 2006.
7. A. Pokahr, L. Braubach, and W. Lamersdorf. Jadex: A BDI reasoning engine. In Multi-Agent Programming: Languages, Platforms and Applications. Kluwer, 2005.
8. M. B. van Riemsdijk, M. Dastani, J.-J. Ch. Meyer, and F. S. de Boer. Goal-Oriented Modularity in Agent Programming. In Proceedings of AAMAS'06, pages 1271-1278, 2006.
9. M. Winikoff. JACK™ intelligent agents: An industrial strength platform. In Multi-Agent Programming: Languages, Platforms and Applications. Kluwer, 2005.

BDI logic with probabilistic transition and fixed-point operator

NIDE, Naoyuki(1), Shiro Takata(2), and Megumi Fujita(1)

(1) Nara Women's University, Kita-Uoya Nishimachi, Nara-shi, Nara, 630-8506 Japan
[email protected], [email protected]
(2) Kinki University, Kowakae 3-4-1, Higashi-Osaka-shi, Osaka, 577-8502 Japan
[email protected]

Abstract. One of the advantages of the BDI (Belief-Desire-Intention) model is that we can formally discuss and prove properties about the mental states (beliefs, desires and intentions) and behaviors of rational agents using a modal logic called BDI logic. However, various extensions, such as probabilistic state transitions in reinforcement learning and cooperative acts in multi-agent environments, have been attempted in the BDI model. Since those notions are difficult to treat precisely in traditional BDI logic, the advantage of formalization in BDI logic is diminished. In this paper, we propose an extension of BDI logic, called TOMATO, which introduces probabilistic state transitions and a fixed-point operator.
We can strictly describe and infer various properties of rational agents with those extended notions by using TOMATO.

1 Introduction

The BDI (Belief-Desire-Intention) model [1] is a model of rational agents based on Bratman's "theory of intention" [2, 3]. There have been many studies of, and applications based on, this model, which have proved its usefulness [4]. In the BDI model, a rational agent has three kinds of mental states, namely belief, desire and intention, and the agent determines its actions to achieve its goals by maintaining and updating these states of mind.

One of the features of the BDI model is that it has a modal logic system called "BDI logic". BDI logic explicitly describes those mental states and their temporal changes, so we can formally prove and discuss rational agents' mental states and their behaviors. For example, the blind commitment strategy [5], a well-known commitment maintenance strategy, stated as 'once an agent intends to achieve φ necessarily in the future, she maintains that intention until she believes that she has achieved φ', can be written as INTEND(AF φ) ⊃ A(INTEND(AF φ) U BEL(φ)). As another example, the property of rational agents that "if an agent intends to achieve p at the next time point, and believes that p and q are mutually excluded forever, then she does not intend to achieve q at that time", one of the consistencies of mental states [2], can be shown by proving INTEND(AX p) ∧ BEL(AG(p ⊃ ¬q)) ⊃ ¬INTEND(AX q). This point is considered to be a major advantage in designing rational agents, and that is why the BDI model has been generally accepted.

However, in the advancement of research on rational agents, various extensions to BDI logic have been proposed. If there are mismatches between the notions appearing in these extensions and the ones in traditional BDI logic, we may have difficulties in formalizing them appropriately. Therefore, one of the advantages of the BDI model, namely that we can strictly discuss properties of rational agents, can be diminished. Examples of such extensions are, as described in Section 2, "probabilistic state transitions", which are used in the reinforcement learning task, and "cooperative actions", which are used in multi-agent systems. In particular, these notions are considered important for the realization of rational agents in the real world.

Based on this standpoint, we propose a logic system called TOMATO (Theory about Observations of Multi-Agents with Tense and Odds), which introduces probabilistic state transitions and a fixed-point operator by extending traditional BDI logic. We have constructed sound and complete deduction systems for traditional BDI logic using sequent calculi [6-8]. Therefore, we also aim to construct one for TOMATO. In this paper, we show the soundness of the deduction system of TOMATO and, in addition, its completeness restricted to propositional logic. Our future work includes studying the completeness of TOMATO in predicate logic. With a deduction system, we can formally discuss properties of rational agents syntactically rather than semantically, and automatic proof checking also becomes possible. We also intend to construct a decision algorithm using the tableau method [9] in the future, though restricted to propositional logic.

One of the advantages of TOMATO is that, using probabilistic state transition operators, we can describe state transitions in MDPs (Markov decision processes), which are a basis of the reinforcement learning task.
In addition, using a fixed-point operator, we can finitely describe notions in multi-agent systems, such as mutual belief and cooperative intentions, which cannot be described in LORA [10] without using infinite conjunctions/disjunctions. Moreover, inferences about these properties using sequent calculus are possible. These points are discussed in detail in Section 4.

In this paper, we first describe the mismatches between the traditional BDI model and the above-mentioned new notions in Section 2, and we introduce TOMATO in Section 3. In Section 4, we show examples of descriptions and proofs in TOMATO concerning probabilistic state transitions and cooperative actions. In Section 5, we present discussions and describe our future work, and we conclude in Section 6.

2 Divergence from the BDI Model

2.1 Treatment of probabilistic state transitions

As described in Section 1, one of the notions that is difficult to treat strictly in traditional BDI logic is that of probabilistic state transitions, which is mandatory for incorporating machine learning techniques into the BDI model. We propose the integration of a BDI agent and reinforcement learning, in which an agent combines deliberation and reflexive actions according to the situation [11]. For example, when we are passing along a familiar road, we can select the route in response to our surroundings without the need for practical reasoning. As another example, a soccer player instantaneously performs an appropriate action according to the skills acquired by intensive training. Our idea is, similarly to these situations, to import reactive actions acquired by learning into a BDI agent to enable more human-like behaviors.

We attempted, within the BDI model, to describe the state transitions used in MDPs [12], which are a basis for the reinforcement learning task [13]. However, MDPs are fundamentally based on probabilistic transitions, and within traditional BDI logic, which does not have probabilistic transition operators, we can only describe agent movement as "moves to one of the accessible states". For instance, if we try to write the situation "if an agent at state s1 executes an action e1, then it transfers to state s2 and receives reward 3 with probability 0.7, or transfers to state s3 and receives reward 5 with probability 0.3" in traditional BDI logic, we have to eliminate the probabilities and only write "transfers to either one".

PCTL [14] is known as a logical system that extends CTL to treat probabilistic transitions. However, since it describes probability per path (a line of time points), as described in Section 5.2, describing the probability for each action (event) may be difficult in this logic.
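For illustration, the example transition above can be tabulated as a small MDP (a sketch of ours, not from the paper), which makes explicit what traditional BDI logic has to discard:

```python
# (state, event) -> list of (probability, next_state, reward)
mdp = {("s1", "e1"): [(0.7, "s2", 3), (0.3, "s3", 5)]}

outcomes = mdp[("s1", "e1")]
assert abs(sum(p for p, _, _ in outcomes) - 1.0) < 1e-9
# Traditional BDI logic keeps only the set {s2, s3} of reachable states;
# the probabilities 0.7/0.3 and the rewards are exactly what is lost.
assert all(r >= 3 for _, _, r in outcomes)   # the reward is surely at least 3
```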
2.2 Treatment of cooperative action

Another example is the difficulty of treating cooperative actions in multi-agent environments. Even though this is an important issue, the original BDI logic can treat only a single agent's mental state. There is a logical system, LORA [10], which is extended to describe the mental states of multiple agents in multi-agent environments. It treats various concepts required for handling agents' cooperative actions, such as mutual belief, recognition of the potential for cooperative action, and the generation and execution of joint intentions. However, LORA is a complicated logical system with various components, including action expressions corresponding to dynamic logic and operators such as Agt for judging whether an agent can execute an action. Nevertheless, it is still necessary to introduce new operators, by using infinite conjunctions/disjunctions of formulas, to describe cooperative actions. If a logical system is complicated, it will be intractable and it will be difficult to construct its deduction system; the advantage of formalization in the logic is then diminished. In fact, no deduction system for LORA has been given.

As an example, for an agent group g to form a joint intention for achieving a mutual goal (φ) of lifting a 1-ton object, it is necessary that the agents in g can achieve this only cooperatively, and that they mutually believe this fact. To describe this situation in LORA, we introduce the formula (J-Can0 g φ) using pre-existing operators, which states that g can first achieve φ in a single step, as an abbreviation of a formula signifying that "g can execute some action α and φ is achieved by this action; also, g mutually believes this fact". Next, a formula (J-Can g φ), which states that an agent group g can achieve the goal φ, is introduced as an abbreviation of the infinite disjunction (J-Can0 g φ) ∨ (J-Can0 g (J-Can0 g φ)) ∨ (J-Can0 g (J-Can0 g (J-Can0 g φ))) ∨ ···. Subsequently, the process of forming a joint intention of achieving φ is described using J-Can. However, to be accurate, we have to introduce J-Can as a new operator rather than as an abbreviation, because the infinite disjunction cannot be written as a proper formula (no finite formula in LORA can be semantically equivalent to (J-Can g φ) without introducing a new operator). Moreover, because infinite conjunctions are used in the definition of mutual belief (see [10] for the need for infiniteness), this part of J-Can cannot be written in LORA either. Consequently, we consider treating infinite conjunctions and disjunctions uniformly by introducing a fixed-point operator, to reduce the complication of the syntax.

3 Extension of BDI logic

In this section, based on the discussions so far, we propose a modal logic system, TOMATO, for easily handling the notions described in Section 2. TOMATO is a branching-time temporal logic with a fixed-point operator and mental-state operators for each agent in multi-agent environments.

3.1 Formulas

Syntax We give the definition of formulas in TOMATO here. Hereinafter, the word 'formula' means a formula of TOMATO unless expressly stated otherwise. Symbols like x and y are used as usual variable symbols in first-order predicate logic, and symbols such as X and Y are variable symbols, each of which expresses a formula. We call the latter 'formula variables' (the name may be slightly inaccurate, because they do not range over formulas; however, their main use is to form fixed points, which can be regarded as new formulas, and in this sense we call them 'formula variables'). Typically, they are used with fixed-point operators.

Suppose that we fix a first-order language L, a set of formula variables V, a set of event constant symbols E, and a set of agent constant symbols A, where E and A are finite and V is infinite. Hereafter, we write {p | p ∈ R, 0 ≤ p ≤ 1} as [0, 1]. Then:

- Any atomic formula in L is a formula (in TOMATO).
- If φ, ψ are formulas, then φ ∨ ψ and ¬φ are also formulas.
- If φ is a formula and x is a variable symbol, then ∀xφ is also a formula.
- If e ∈ E, n is a positive integer, and, for i = 1, 2, ..., n, φ_i is a formula, p_i ∈ [0, 1] and r_i ∈ {≥, >}, then X_e(r_1 p_1 φ_1 | ··· | r_n p_n φ_n) is a formula. In particular, when n = 1, we write X_e^{r_1 p_1} φ_1 instead of X_e(r_1 p_1 φ_1).
- If φ is a formula and a ∈ A, then BEL_a φ, DESIRE_a φ, INTEND_a φ are formulas.
- If X ∈ V, then X is a formula.
- If φ is a formula, X ∈ V, and X does not occur negatively (i.e., does not occur within an odd number of nestings of '¬') in φ, then µX.φ is a formula. However, X may occur only inside the scope of a modal operator (X_e, BEL_a, DESIRE_a, INTEND_a); for example, µX.p ∧ X is not a formula.
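The grammar above can be transcribed directly into an abstract syntax (a sketch of ours in Python; the ∀ case and well-formedness checks such as positivity and guardedness of X in µX.φ are omitted):

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Atom:                 # atomic formula of L
    pred: str
    args: Tuple

@dataclass
class Neg:
    sub: "Formula"

@dataclass
class Or:
    left: "Formula"
    right: "Formula"

@dataclass
class Xe:                   # X_e(r1 p1 φ1 | ... | rn pn φn)
    event: str
    branches: List[Tuple[str, float, "Formula"]]   # rel is ">=" or ">"

@dataclass
class Modal:                # BEL_a, DESIRE_a, INTEND_a
    op: str
    agent: str
    body: "Formula"

@dataclass
class Var:                  # formula variable X ∈ V
    name: str

@dataclass
class Mu:                   # µX.φ
    var: str
    body: "Formula"

Formula = Union[Atom, Neg, Or, Xe, Modal, Var, Mu]
```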
We introduce ∧, ⊃, ⇔, ∃ as abbreviations in the usual manner. In addition, νX.φ is an abbreviation of ¬µX.¬φ[X := ¬X]. Here µ is the so-called least fixed-point operator [15], and ν is the greatest fixed-point operator. We also introduce the notations X_e^{<p} φ, X_e^{≤p} φ, X_e^{=p} φ as abbreviations of ¬X_e^{≥p} φ, ¬X_e^{>p} φ, and (X_e^{≥p} φ) ∧ (¬X_e^{>p} φ), respectively. When needed, we eliminate ambiguities using parentheses. Without parentheses, operators associate in the following order: unary operators (including fixed-point operators), ∧, ∨, ⊃, ⇔. Moreover, ⊃ is right-associative, while the other binary operators are left-associative.

Informal explanation of operators X_e is an extension of the next-time operator AX in CTL with an event e and transition probabilities. For example, X_e(≥.3 φ1 | ≥.5 φ2) intuitively means that if an event e occurs, then at the next time point φ1 holds with probability at least 0.3 and, aside from that case, φ2 holds with probability at least 0.5. Note the difference between that formula and X_e^{≥.3} φ1 ∧ X_e^{≥.5} φ2; the former ensures that the case in which φ1 holds and the one in which φ2 holds do not overlap, but the latter does not (see the left half of Fig. 1, where at each state φ1 and φ2 may or may not hold unless expressly stated).

Fig. 1. Intuitive explanation of the X_e operator: the left half contrasts X_e(≥.3 φ1 | ≥.5 φ2) with X_e^{≥.3} φ1 ∧ X_e^{≥.5} φ2; the right half illustrates the eSRSs Z1, Z2, Z3 of the example in Section 3.4.

BEL_a φ, DESIRE_a φ and INTEND_a φ mean that an agent a has a belief, desire or intention φ, respectively. For simplicity, we currently do not introduce probabilities into these mental-state operators. However, it is thought to be possible to do so in the same way as for the X_e operator. This could be useful for modeling agents that have functions of some sort of statistical estimation, such as pattern recognition.

Expressiveness compared to traditional BDI logics It is known that branching-time temporal logics with AX and the fixed-point operators have strictly stronger expressive power than CTL* [16, 17]. Since TOMATO has an individual next-time operator for each event, we have to write ⋀_{e∈E} AX_e φ (where, and hereafter, AX_e φ is an abbreviation of X_e^{≥1} φ) to express what is equivalent to AX φ in CTL. Formulas using other CTL or CTL* operators can also be written in TOMATO in a similar manner. Moreover, with event-wise next-time operators, we can write formulas such as µX.(ψ ∨ φ ∧ AX_e X), which means that "if an event e continuously occurs, then φ holds until ψ holds" and cannot be handled by CTL*.

Using the fixed-point operator, we can also handle notions which correspond to the action expressions in LORA [10]. In LORA, concatenations, choices, and repetitions of actions, as in dynamic logic, can be written as action expressions. For example,
a formula of LORA, (Nec α φ), means "just after executing an action α, φ holds". Supposing that α = ((α1; α2)*; α3), it means "if an action α3 is executed immediately after executing the action sequence α1, α2 an arbitrary number of times, then φ holds". In TOMATO, the equivalent of this can be written as νX.(AX_{e3} φ ∧ AX_{e1} AX_{e2} X), where e1, e2, e3 are events corresponding to α1, α2, α3, respectively.

Mutual mental states [8, 10] can also be handled by the fixed-point operator. When g ⊂ A, we abbreviate ⋀_{a∈g} BEL_a φ as E-BEL_g φ. Then, we abbreviate E-BEL_g νX.(φ ∧ E-BEL_g X) as M-BEL_g φ, which means that "a group of agents g has a mutual belief φ". Mutual desires and intentions can be written in the same manner.

3.2 Semantics

BDI structure First we fix the following:

- a set of possible worlds W (≠ ∅);
- for each w ∈ W, a set of states St_w (≠ ∅) (these may differ between worlds);
- for each w ∈ W and each t ∈ St_w, an interpretation (including variable assignment) i_{w,t} of L, in other words, a domain U and an interpretation of each constant, predicate, function, and variable symbol of L; all components except the interpretation of predicate symbols must be the same for all states;
- for each a ∈ A and each t ∈ ⋃_{w∈W} St_w, a serial, transitive and Euclidean binary relation B_a^t on the set {w | t ∈ St_w}, and serial binary relations D_a^t, I_a^t on the same set;
- for each w ∈ W and each e ∈ E, a serial binary relation R_w^e on St_w, and a function P_w^e : R_w^e → [0, 1] where Σ_{t' ∈ {t' | t R_w^e t'}} P_w^e(t, t') = 1 for any t ∈ St_w.

We call a tuple of the above-mentioned components a BDI structure. Intuitively, a state corresponds to a time point in temporal logics, and a possible world is a time tree of states. t R_w^e t' and P_w^e(t, t') = p mean that if an event e occurs at state t, then the next time point is t' with probability p. B_a^t, D_a^t, and I_a^t are accessibility relations on possible worlds at time t, which represent the belief, desire and intention of agent a, respectively (an overview is shown in Fig. 2).

Fig. 2. Overview of a BDI structure: possible worlds (w, w', w'') are time trees of states, event transitions (e1, e2, e3) carry probabilities, and the relations B_a^t connect worlds at a given time.

Since each R_w^e is defined to be serial, any event can occur at any state. However, in practice, usually only specific events can occur at a specific state. This property can be expressed by establishing a so-called dead state d, at which a specific atomic formula dead holds, and creating state transitions to d from any state t with any event that is non-executable at t (in particular, the state transition from d by any event goes to d itself). For example, the property that "if an event e can occur, then φ holds after e occurs" can be written as ¬AX_e dead ⊃ AX_e φ. In this paper, for simplicity, we do not consider the mental-state consistencies of the BDI model [7, 18]. Thus there are no special relationships among B_a^t, D_a^t and I_a^t. A brief discussion of this issue appears in Section 5.1.

Interpretation of formulas We write {(w, t) | w ∈ W, t ∈ St_w} as Swt hereafter. Given a BDI structure M and a function f_V : V → 2^Swt, we define the interpretation [[φ]] of a formula φ (writing [[·]] for [[·]]_{⟨M,f_V⟩}; note that [[φ]] ⊆ Swt) as follows:

- If φ is an atomic formula, [[φ]] = {(w, t) | φ is true w.r.t. i_{w,t}}.
- [[φ ∨ ψ]] = [[φ]] ∪ [[ψ]].
- [[¬φ]] = Swt \ [[φ]].
- [[∀xφ]] = ⋂_{u∈U} [[φ]]_{⟨M^u,f_V⟩}, where M^u is the BDI structure obtained by replacing the interpretation of x in M with u.
- [[X_e(r_1 p_1 φ_1 | ··· | r_n p_n φ_n)]] = {(w, t) | there are mutually disjoint subsets T_1, ..., T_n of {t' | t R_w^e t'} s.t. T_i ⊆ {t' | (w, t') ∈ [[φ_i]]} and Σ_{t'∈T_i} P_w^e(t, t') r_i p_i for i = 1, ..., n} (note that each r_1, ..., r_n is ≥ or >).
- [[BEL_a φ]] = {(w, t) | for any w' s.t. w B_a^t w', (w', t) ∈ [[φ]]}; similarly for [[DESIRE_a φ]] and [[INTEND_a φ]].
- [[X]] = f_V(X) for X ∈ V.

Then a formula φ, with (or without) free occurrences of a formula variable X, can be regarded as a function f_φ : 2^Swt → 2^Swt, which receives an interpretation of X as an argument and returns an interpretation of φ. Therefore, we define that

- [[µX.φ]] is the least fixed point of f_φ.

Here, the least fixed point is known to exist since f_φ in this case is monotone by definition [19]. We say that φ holds at a state t of a world w when [[φ]] ∋ (w, t). If [[φ]] = Swt for any M and f_V, we say that φ is valid.
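Since f_φ is monotone, [[µX.φ]] can be computed for a finite state space by iterating f_φ from the empty set until it stabilizes, mirroring the ordinal iteration used in the soundness argument below. A minimal sketch (ours):

```python
def lfp(f_phi):
    """Least fixed point of a monotone f_phi on a finite powerset lattice."""
    current = frozenset()            # f_φ^0(∅) = ∅
    while True:
        nxt = f_phi(current)
        if nxt == current:           # stabilized: this set is [[µX.φ]]
            return current
        current = nxt

# Toy example: states 0..4, f(S) = {0} ∪ {s+1 | s ∈ S, s < 4}.
assert lfp(lambda S: frozenset({0} | {s + 1 for s in S if s < 4})) \
       == frozenset(range(5))
```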
3.3 Deduction system

In this section we give a deduction system for TOMATO using sequent calculus. We identify α-equivalent formulas. We regard the left-hand side of '→' of a sequent as a (finite) multi-set of formulas, and likewise for the right-hand side (thus we do not have the exchange rule). Hereafter, we sometimes enclose a whole sequent in [ ] to clarify the extent of the sequent in the text. We use capital Greek letters (Σ, ∆, etc., including primed letters such as Σ' and ∆') to denote multi-sets of zero or more formulas. As an exception, Θ contains only one or no formula. The interpretation of a sequent [Σ → ∆] is defined as that of the formula ⋀Σ ⊃ ⋁∆. We define ⋀∅ = true and ⋁∅ = false, where true is an abbreviation of a suitable tautology and false is an abbreviation of ¬true.

  Initial:      φ → φ (axiom)
  Weak:         from Σ → ∆ infer Σ, Σ' → ∆, ∆'
  CL:           from Σ, φ, φ → ∆ infer Σ, φ → ∆
  CR:           from Σ → ∆, φ, φ infer Σ → ∆, φ
  ¬L:           from Σ → ∆, φ infer Σ, ¬φ → ∆
  ¬R:           from Σ, φ → ∆ infer Σ → ∆, ¬φ
  ∨L:           from Σ, φ → ∆ and Σ, ψ → ∆ infer Σ, φ ∨ ψ → ∆
  ∨R:           from Σ → ∆, φ, ψ infer Σ → ∆, φ ∨ ψ
  ∀L:           from Σ, φ[x := t] → ∆ infer Σ, ∀xφ → ∆
  ∀R:           from Σ → ∆, φ[x := y] infer Σ → ∆, ∀xφ
  X≥R:          from Γ, X_e^{>1-p} ¬φ → ∆ infer Γ → ∆, X_e^{≥p} φ
  X>R:          from Γ, X_e^{≥1-p} ¬φ → ∆ infer Γ → ∆, X_e^{>p} φ
  µL:           from Γ, φ[X := µX.φ] → ∆ infer Γ, µX.φ → ∆
  µR:           from Γ → ∆, φ[X := µX.φ] infer Γ → ∆, µX.φ
  BEL-KD45:     from Γ, BEL_a Γ → BEL_a ∆, Θ, BEL_a Θ infer BEL_a Γ → BEL_a ∆, BEL_a Θ
  DESIRE-KD:    from Γ → Θ infer DESIRE_a Γ → DESIRE_a Θ
  INTEND-KD:    from Γ → Θ infer INTEND_a Γ → INTEND_a Θ
  Xexcl:        replace a subformula ··· X_e^{r1 p1}(φ1 ∧ ψ1) ∧ ··· ∧ X_e^{rn pn}(φn ∧ ψn) ···
                by ··· X_e(r1 p1 φ1 | ··· | rn pn φn) ···

Fig. 3. Inference rules of TOMATO (excluding a rule described in Section 3.4).

Inference rules We enumerate the inference rules of TOMATO in Fig. 3. Note, however, that there is an additional rule, concerning the X_e operator on the left-hand side of '→' of a sequent, which is not shown in Fig. 3 but is described in Section 3.4. For a multi-set of formulas Γ and a unary operator K, K(Γ) stands for the multi-set of formulas obtained by applying K to each element of Γ. In the ∀L rule, t is an arbitrary term. In the ∀R rule, y is a variable symbol which does not occur freely in the conclusion of the rule. The Xexcl rule means that any subformula of the form shown in the assumption, anywhere in the sequent, can be replaced by the formula shown in the conclusion. In this rule, n ≥ 2 and, for i = 1, ..., n, ψi is ¬X_1 ∧ ··· ∧ ¬X_{i-1} ∧ X_i ∧ ¬X_{i+1} ∧ ··· ∧ ¬X_n, where X_1, ..., X_n are formula variables that do not occur freely in the conclusion of the rule. This rule is provided so that we can decompose formulas of the form X_e(···) into those of the form X_e^{r1 p1} φ1, by applying it in reverse. The BEL-KD45 rule, as in [6, 20], is constructed so that the axiom schemas KD45 for the BEL operator are ensured to hold.
The µL and µR rules are provided to enable proofs by loop (see Section 6 for an example), as in [6, 20, 21].

3.4 Additional inference rule for X_e

In this section, we describe the additional inference rule for the X_e operator, which is not included in Fig. 3. Let Γ = {X_e^{r1 p1} ψ1, ..., X_e^{rn pn} ψn}, where each r1, ..., rn is ≥ or >, and let Q = {ψ1, ..., ψn}. If a function v : 2^Q → [0, 1] satisfies Σ_{Q⊆Q} v(Q) = 1, and (Σ_{Q∈{T | T⊆Q, ψi∈T}} v(Q)) r_i p_i holds for each i = 1, ..., n, then we call v a probability distribution function (PrDF) of Γ. Intuitively, a PrDF determines probabilities of transitions from a state to next-time states, at each of which a subset Q of Q holds, so that for each ψi the probability that ψi holds satisfies r_i p_i. For a PrDF v of Γ, we call {Q ⊆ Q | v(Q) > 0} a satisfaction request set (SRS) of Γ on v, and write it as req_v(Γ). Let Z be an SRS of Γ (on some PrDF v). If all elements of Z are satisfiable, we say that Z is satisfiable. In general, Γ is satisfiable iff there is a satisfiable SRS Z of Γ. If, for Z, Z' ⊆ 2^Q, some Q ∈ Z and Z'' ⊆ 2^Q exist and Z' = (Z ∪ Z'') \ {Q} holds, we write Z ≻ Z'. Let ≻≻ be the non-reflexive transitive closure of ≻. Note that if Z ≻≻ Z' and Z is satisfiable, then Z' is also satisfiable. If Z is an SRS of Γ and there is no SRS Z' of Γ which satisfies Z ≻≻ Z', we say that Z is an essential SRS (eSRS). Since Q is finite, there is no infinite sequence Z1, Z2, ... s.t. Z1 ≻ Z2 ≻ ···. As a result, there exists a satisfiable SRS of Γ iff there exists a satisfiable eSRS of Γ.

Let Z1 = {Q_{1,1}, ..., Q_{1,m1}}, ..., Zk = {Q_{k,1}, ..., Q_{k,mk}} be the enumeration of all eSRSs of Γ. Then for any sequence of positive integers j1, ..., jk, where 1 ≤ j1 ≤ m1, ..., 1 ≤ jk ≤ mk, the following is an inference rule of TOMATO (X-KD):

  from [Q_{1,j1} →], ..., [Q_{k,jk} →] infer [Γ →]

For example, let Γ = {X_e^{≥0.3} ψ1, X_e^{≥0.4} ψ2, X_e^{≥0.6} ψ3} and Q = {ψ1, ψ2, ψ3}. Then the function v : 2^Q → [0, 1] with v({ψ1, ψ2}) = 0.4, v({ψ3}) = 0.6, and v(Q) = 0 for all other Q ⊆ Q is a PrDF of Γ, and req_v(Γ) = {{ψ1, ψ2}, {ψ3}} = Z1 is an SRS of Γ. Z1 is also an eSRS. The function v' with v'({ψ1, ψ2}) = 0.3, v'({ψ2}) = 0.1, v'({ψ2, ψ3}) = 0.6, and v'(Q) = 0 for all other Q ⊆ Q is also a PrDF of Γ, but req_{v'}(Γ) = Z1' is not an eSRS, since Z1' ≻≻ Z1. In this example, Z1, Z2 = {{ψ2, ψ3}, {ψ1}} and Z3 = {{ψ3, ψ1}, {ψ2}} (see the right half of Fig. 1) are all the eSRSs.

In the tableau method, to show that [Γ →] is provable, we have to show that Γ is unsatisfiable. This is equivalent to showing that there is no satisfiable eSRS of Γ, and also equivalent to showing that every eSRS of Γ has at least one unsatisfiable element. The X-KD rule is constructed in this way. Therefore, in this example, we have 8 (= |Z1| · |Z2| · |Z3|) rules, one of which is the following:

  from [ψ1, ψ2 →], [ψ2, ψ3 →], [ψ2 →] infer [X_e^{≥.3} ψ1, X_e^{≥.4} ψ2, X_e^{≥.6} ψ3 →]

However, the leftmost two sequents of the assumption of this rule are redundant. After removing similar redundancies from the other rules, we need only the 4 rules shown in Fig. 4; the remaining 4 can be omitted since the assumption of each of them includes that of another rule. (In addition, the rules in Fig. 4 other than the upper-left-most one can be aggregated into the single rule: from [φ →] infer [X_e^{≥p} φ →], where 0 < p ≤ 1.)

  from [ψ1, ψ2 →], [ψ2, ψ3 →], [ψ3, ψ1 →] infer [X_e^{≥.3} ψ1, X_e^{≥.4} ψ2, X_e^{≥.6} ψ3 →]
  from [ψ1 →] infer [X_e^{≥.3} ψ1, X_e^{≥.4} ψ2, X_e^{≥.6} ψ3 →]
  from [ψ2 →] infer [X_e^{≥.3} ψ1, X_e^{≥.4} ψ2, X_e^{≥.6} ψ3 →]
  from [ψ3 →] infer [X_e^{≥.3} ψ1, X_e^{≥.4} ψ2, X_e^{≥.6} ψ3 →]

Fig. 4. Example of inference rules for the X_e operator.
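The PrDF condition is easy to check mechanically. The following sketch (ours, not from the paper) verifies the definition for the example just given; v maps frozensets of formulas (the next-time alternatives) to probabilities:

```python
import operator

REL = {">=": operator.ge, ">": operator.gt}

def is_prdf(v, constraints):
    """constraints: list of (rel, p, psi) for Γ = {X_e^{r p} ψ, ...}."""
    if abs(sum(v.values()) - 1.0) > 1e-9:        # total mass must be 1
        return False
    return all(
        REL[rel](sum(prob for Q, prob in v.items() if psi in Q), p)
        for rel, p, psi in constraints
    )

# The PrDF from the example: v({ψ1, ψ2}) = 0.4, v({ψ3}) = 0.6.
v = {frozenset({"psi1", "psi2"}): 0.4, frozenset({"psi3"}): 0.6}
constraints = [(">=", 0.3, "psi1"), (">=", 0.4, "psi2"), (">=", 0.6, "psi3")]
assert is_prdf(v, constraints)
# Its SRS {{ψ1, ψ2}, {ψ3}} is the eSRS Z1 of the example.
```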
Definition of provability A sequent S is said to be derivable from a set L of sequents if one of the following conditions holds:

1. S ∈ L.
2. There is an inference rule with assumptions S1, ..., Sn and conclusion S (n ≥ 0), and all of S1, ..., Sn are derivable from L.

We say that a sequent S is provable if one of the following conditions is satisfied. Here φ^n(X) is defined by φ^0(X) = X and φ^n(X) = φ[X := φ^{n-1}(X)].

1. S is derivable from ∅.
2. S = [Σ, µX.φ → ∆], where X does not occur freely in Σ, ∆, and there is a positive integer n s.t. [Σ, φ^n(X) → ∆] is derivable from {[Σ, X → ∆]}.

A formula φ is defined to be provable if [→ φ] is provable.

Soundness and completeness In this section, we first show the soundness of TOMATO, and then we give a proof sketch of the completeness of TOMATO restricted to propositional logic. A study of the completeness of TOMATO in predicate logic is future work.

To show soundness, it is enough to show that every inference rule preserves the validity of sequents, and that S in item 2 of the provability definition is valid. The former is easy; therefore, we show the latter. For any ordinal α and the function f_φ of Section 3.2, we define a function f_φ^α : 2^Swt → 2^Swt as follows:

  f_φ^0(x) = x
  f_φ^{α+1}(x) = f_φ(f_φ^α(x))
  f_φ^λ(x) = ⋃_{α<λ} f_φ^α(x)   when λ is a limit ordinal

Then, if [Σ, φ^n(X) → ∆] is derivable from {[Σ, X → ∆]}, for any BDI structure and any infinite ordinal α, [Σ → ∆] holds at any state in f_φ^α(∅). Also, an infinite ordinal α exists s.t. f_φ^α(∅) = [[µX.φ]]. Thus [Σ, µX.φ → ∆] is valid.

Next we show the proof sketch of the completeness restricted to propositional logic. Without loss of generality, we can assume that no subformulas of the form X_e(r1 p1 φ1 | ··· | rn pn φn) with n ≥ 2 occur anywhere in sequents, since we can remove them by applying the Xexcl rule in reverse (as described in Section 3.3). Let Nps be a set of non-provable sequents that have only atomic formulas (i.e., atomic propositions) and formulas of the form µX.φ, X_e^{rp} φ, BEL_a φ, DESIRE_a φ, and INTEND_a φ on both sides of '→', but do not have formulas of the form X_e^{rp} φ on the right of '→'. For S ∈ Nps, we define dec-µ(S) as a non-provable sequent obtained from S by applying the µL/R, ∨L/R, ¬L/R, X≥R, and X>R rules in reverse as many times as possible. If there is more than one such sequent, choose an arbitrary one as dec-µ(S). Note that we cannot apply µL/R infinitely many times, because in a formula µX.φ we do not have any X outside the scope of modal operators.

Regarding Nps as a set of states, we construct a 'flat' version of a BDI structure (i.e., we do not take the set of worlds W into consideration, and all accessibility relations are binary relations on Nps) by the following procedure, which is based on Wang's algorithm [20, 22] for propositional modal logics. First, we define binary relations B_a, D_a, and I_a on Nps for each a ∈ A as follows.

- S D_a S' iff we can obtain S' from dec-µ(S) by applying the following procedure:
  1. First, apply Weak in reverse to dec-µ(S) to leave only all formulas of the form DESIRE_a φ on the left of '→', and only one (if there is any) of them in that form on the right of '→'.
  2. Then, apply DESIRE-KD in reverse once to remove the outermost DESIRE_a.
  3. Last, apply the rules ∨L/R, ¬L/R, X≥R, X>R in reverse as many times as possible.
- Similarly for I_a.
- To define B_a, we first define B'_a in a similar manner to that for D_a and I_a.
Let BEL_a^+(S) be the set of formulas of the form BEL_a φ on the left of '→' of the sequent dec-µ(S), and let BEL_a^-(S) be the analogous set for the right of '→'. Assume that S = S0 B'_a S1 B'_a S2 B'_a ···. Then BEL_a^+(S0), BEL_a^+(S1), ... and BEL_a^-(S0), BEL_a^-(S1), ... are both monotonically non-decreasing. Therefore, due to the finiteness of formulas and sequents, there is some Sk that satisfies: if Sk B'_a* S' (here B'_a* is the transitive closure of B'_a), then BEL_a^+(Sk) = BEL_a^+(S') and BEL_a^-(Sk) = BEL_a^-(S'). We define that S B_a S' and S' B_a S'' iff Sk B'_a* S' and Sk B'_a* S''.

Next we define binary relations R^e on Nps and a function P^e : R^e → [0, 1] for each e ∈ E as follows. Let a sequent S be given.

1. First, we apply Weak in reverse to dec-µ(S) to leave only the formulas of the form X_e^{rp} φ on both sides of '→'.
2. Then, apply X≥R and X>R in reverse as many times as possible to move all formulas of the form X_e^{rp} φ on the right-hand side of '→' to the left of '→'. At this moment the sequent is of the form [Γ →], where Γ is {X_e^{r1 p1} ψ1, ..., X_e^{rn pn} ψn}.
3. Since [Γ →] is not provable, by the construction of the X-KD rule there is a PrDF v of Γ and an eSRS {Q1, ..., Qm} of Γ on v such that none of the sequents [Q1 →], ..., [Qm →] is provable.
4. Now, we put S R^e S' and P^e(S, S') = v(Qj) iff S' can be obtained from some [Qj →] above by applying the rules ∨L/R, ¬L/R, X≥R, and X>R in reverse as many times as possible.

Subsequently, for each state t in Nps, we choose an interpretation i_t of atomic propositions s.t. i_t(p) is true iff p occurs on the left of '→' of the sequent dec-µ(t). In addition, we also choose a function f_V : V → 2^Swt s.t. f_V(X) ∋ t iff X occurs on the left of '→' of the sequent dec-µ(t). Now we have a 'flat' BDI structure. In addition, B_a satisfies KD45, and all other accessibility relations satisfy KD. We can easily convert it into a normal BDI structure M.

Then we show that, for each state t in M, the formulas on the left of '→' of the sequent t are true at t, and the ones on the right are false at t. We do so only for the formulas of the form µX.φ on either side of '→'. Let F be the set of states (sequents) in M which have µX.φ on the right of '→'. By the construction method of M, for any ordinal α, we can show that (f_φ^α(∅))^c ⊇ F (here A^c denotes the complement of the set A). Therefore, µX.φ is false at any state in F.

Let S be a state (sequent) in M which has µX.φ on the left of '→', and let S(n) be the state obtained from S by replacing µX.φ with φ^n(X). By the construction method of M and the finiteness of formulas and sequents, there is a positive integer n s.t. for each sequence of states S0 A S1 A ···, where S0 = S(n) and A = ⋃_{a,t}(B_a^t ∪ D_a^t ∪ I_a^t) ∪ ⋃_{e,w} R_w^e, one of the following holds (in other words, the process of continuously applying rules in reverse will eventually stop by entering a loop; that is why our system can have a decision algorithm, despite the lack of the subformula property):

i. X does not occur in some Sk.
ii. There are some k, l s.t. Sk = Sl and X occurs in Sk.

If all such sequences satisfy ii, then S is provable using item 2 of the provability definition, contradicting the assumption. Therefore, there is a sequence that satisfies i above. By the construction of M, there is also a sequence S0' A S1' A ···, where S0' = S, which satisfies i, and again by the construction of M, µX.φ is true at S.

A decision algorithm for propositional TOMATO can be directly derived from this proof of completeness (if an algorithm to calculate eSRSs is provided). We plan to address this in the future.
4 Examples of description and proof

4.1 Modeling probabilistic state transitions

We can write the situation in the example of Section 2.1 as

  at(s1) ⊃ X_{e1}(≥.7 at(s2) ∧ reward(3) | ≥.3 at(s3) ∧ reward(5))

using the probabilistic transition operator of TOMATO. Let φ be this formula. We can confirm that, if we are at s1, then after executing e1 we surely receive a reward of 3 or more, by proving φ ∧ at(s1) ⊃ AX_{e1} ∃x(reward(x) ∧ x ≥ 3), provided that we can prove 3 ≥ 3 and 5 ≥ 3. The proof is shown in Fig. 5, where we abbreviate at(s2), reward(3), at(s3), reward(5), and reward(x) ∧ x ≥ 3 as p1, q1, p2, q2, and ψ, respectively. The X-KD rule applied between the 3rd line from the bottom and the line right above it is derived from the fact that all eSRSs of {X_{e1}^{≥.7} ξ1, X_{e1}^{≥.3} ξ2, X_{e1}^{>0} ξ3} are {{ξ1, ξ2}, {ξ3}}, {{ξ2, ξ3}, {ξ1}}, and {{ξ3, ξ1}, {ξ2}} (where ξ1, ξ2, ξ3 are arbitrary formulas).

Machine learning cannot be performed merely by describing it in logic, and requires external tools. However, after learning, we can describe the result as a property of an agent, as above. Also, there is the possibility of implementing a learning system within a framework of logic. In this sense, treating such properties in logic has a positive significance.

Fig. 5. Example of proof (1): the proof of φ ∧ at(s1) ⊃ AX_{e1} ∃xψ.

4.2 Modeling coordinated actions

J-Can, described in Section 2.2, is necessary to describe coordinated actions. However, in LORA, it can only be written using infinite disjunctions and conjunctions. It is expressible in TOMATO using the fixed-point operator. To describe the first half of the description of (J-Can0 g φ) in Section 2.2 (i.e., "g can execute some action α and φ is achieved by this action"), we introduce a predicate Agt s.t. Agt(e, a) holds iff an agent a can execute an event e. We use a list structure in the first-order language to represent a group of agents, and introduce the 'member' predicate using its general definition in Prolog, i.e., the non-logical axiom ∀x(member(x, cons(x, nil)) ∧ ∀y∀z(member(x, z) ⊃ member(x, cons(y, z)))) (in fact, the proof in Fig. 6 does not depend on this definition). Then, we can represent the above-mentioned part as µX.(φ ∨ ⋁_{e∈E, a∈A}(Agt(e, a) ∧ member(a, g) ∧ AX_e X)). (Note: LORA introduces equivalents of Agt and 'member' as primordial components of formulas, and enables the application of ∀ to agents and actions. These reduce the length of formulas, but complicate syntax and semantics.)

Let ψ be this formula, and abbreviate νX.(ξ ∧ ⋀_{a∈A}(member(a, g) ⊃ BEL_a X)) as E-Know_g ξ, which states that "ξ holds and the agent group g mutually believes it". Then E-Know_g ψ is equivalent to (J-Can0 g φ). Further, we can represent (J-Can g φ) by µX.((J-Can0 g φ) ∨ (J-Can0 g X)). By proceeding in this way, we can construct further descriptions of the coordination of agents, as in LORA. Proving various properties of coordination is also possible. Fig. 6 shows a proof of the property (J-Can0 g φ) ⊃ E-Know_g (J-Can0 g φ), whose equivalent is represented in LORA (we assume the A of Section 3.1 to be {a1, ..., an}).
Using the above-mentioned ψ, this formula can be rewritten as E-Know_g ψ ⊃ E-Know_g E-Know_g ψ, so we give the proof of this formula. In that figure, we abbreviate ⋀_{a∈A}(member(a, g) ⊃ BEL_a ξ) as B_g ξ. Hence E-Know_g ξ is an abbreviation of ¬µX.(¬ξ ∨ ¬B_g ¬X). Furthermore, we abbreviate µX.(¬ξ ∨ ¬B_g ¬X) as nEk_g ξ; as a result, E-Know_g ξ is syntactically equivalent to ¬nEk_g ξ. In Fig. 6, the topmost line of the right-hand proof figure is derived from the left-hand proof figure using item 2 of the provability definition.

Fig. 6. Example of proof (2): the proof of E-Know_g ψ ⊃ E-Know_g E-Know_g ψ.

5 Discussions

We have given an extended BDI logic to handle notions required for formalizing realistic rational agents. However, there are more issues to consider, though we do not treat them in this paper. In this section we discuss some of them.

5.1 Treatment of mental-state consistencies

As described in Section 3.2, we have omitted discussion of mental-state consistencies for simplicity. However, mental-state consistencies are significant in Bratman's intention principle and need to be handled to describe rational agents. For example, the property that "an agent will not form an intention if she cannot believe in the possibility of achieving it" is said to be one of the required properties of rational agents. In traditional BDI logic, as in [7, 18], this is written as INTEND(EX φ) ⊃ BEL(EX φ), and that work presents a semantics that makes it valid and a deduction system that can prove it. Currently TOMATO cannot treat such a property; this is future work. When considering this, it is also interesting to consider consistency between the probabilistic mental-state operators mentioned in Section 3.1. For example, when the possibility of achieving φ is believed with probability 0.9, can we intend φ?

5.2 Treatment of probabilistic transitions

The temporal operator in TOMATO is an extension of the next-time operator in CTL with a probability. This is because we introduced this operator so that we can construct a proof system based on the tableau method. However, a disadvantage of this is that descriptions involving probability are restricted to the transition between the current time and the next time. In PCTL [14], one can describe probability over a time sequence (path); in other words, probability is given on path formulas. For example, the property "we can achieve φ with probability not less than 0.9 in the future" can be written as [true U φ]≥0.9. Currently, TOMATO cannot describe such a property. However, as described in Section 2.1, it is difficult to describe event-wise probabilities in PCTL, unlike in TOMATO. Moreover, it is believed to be difficult to create a proof system for PCTL using the tableau method, due to the excessive flexibility of probability descriptions in PCTL; even for qualitative PCTL, in which probabilistic descriptions are restricted to 0 and 1, no deduction system is yet known [23]. Balancing the constructibility of the proof system against flexibility of representation is an important issue.
5.3 Treatment of stability of mental states

We believe there are further issues to be considered in BDI logic, though we do not treat them in this paper. For example, mental states such as belief should generally be kept by default. However, there is inherently no such concept in BDI logic. The mental states in BDI logic are merely modal operators, represented by accessibility relations on possible worlds, which vary at different times. Thus, there is no logical relation between the current belief and the belief at the next time. If we want to keep a belief to some extent, we must explicitly introduce a non-logical axiom such as ψ ⊃ A(BEL(φ) U ξ) (believes φ until ξ holds). In the standard implementation of BDI agents, mental states such as beliefs are restricted to first-order formulas, and an agent adds or deletes the mental states in its database through events such as add-belief and del-belief. The addition and deletion of mental states occurs procedurally, so consistency between this behaviour and the logic is not guaranteed. There are some attempts, such as AgentSpeak(L) [24], to bridge this gap by offering a proof theory about the properties of such procedures. However, they neither resolve the unnaturalness of a logic in which mental states are not maintained by default, nor eliminate the restriction of mental states to first-order formulas in the implementations. Mental states are not always expected to be kept; for example, if there is a belief BEL(AX φ) (believes "φ at the next time"), it would be natural to have BEL(φ) at the next time. [25] is one such study, though it is based on non-branching temporal logic, and a lack of descriptive power is anticipated. It will be interesting to consider how to treat such things in modal logics. Some studies treat the updating of mental states as an update of the model itself instead of a time transition. Though such a method is difficult to apply to MDPs, because the time path is restricted to being unique, it may also be a possible way of handling the stability of mental states naturally.

6 Conclusion

In this paper, we proposed TOMATO, an extended BDI logic with probabilistic transitions and a fixed-point operator, to enable formal descriptions of, and discussions on, rational agents with notions such as probabilistic state transitions in reinforcement learning and cooperative actions in multi-agent environments. We also showed some examples of descriptions and proofs associated with these notions. Our future work includes a study of the completeness of TOMATO for predicate logic, the construction of a proof algorithm for propositional logic, and the introduction of some of the notions described in Section 5, especially the consistency of mental states. We expect TOMATO to be a productive tool for modeling, designing and implementing rational agents.

References

1. Rao, A.S., Georgeff, M.P.: Modeling Rational Agents within a BDI-Architecture. In: Proc. of International Conference on Principles of Knowledge Representation and Reasoning. (1991) 473–484
2. Bratman, M.E.: Intention, Plans, and Practical Reason. Harvard University Press (1987)
3. Bratman, M.E., Israel, D.J., Pollack, M.E.: Plans and resource-bounded practical reasoning. Computational Intelligence 4 (1988) 349–355
4. Wooldridge, M., Rao, A., eds.: Foundations of Rational Agency. Volume 14 of Applied Logic Series. Kluwer Academic Publishers (1999)
5. Rao, A.S., Georgeff, M.P.: Modeling Rational Agents within a BDI-Architecture. In Huhns, M.N., Singh, M.P., eds.: Readings in Agents.
Morgan Kaufmann, San Francisco (1997) 317–328
6. Nide, N., Takata, S., Araragi, T.: Deduction Systems for BDI Logics Using Sequent Calculus. Computer Software 20(1) (2003) 66–83 (In Japanese).
7. Nide, N., Takata, S., Araragi, T.: Reasoning About Mental State Compatibilities of Rational Agents and Its Applications. Transactions of the Institute of Electronics, Information and Communication Engineers J86-D-I(8) (2003) 514–523 (In Japanese).
8. Nide, N., Takata, S., Araragi, T.: A Deduction System of Extended BDI Logic to Handle Mutual Belief. Transactions of Information Processing Society of Japan 46(SIG2 (TOM11)) (2005) 85–99 (In Japanese).
9. Wolper, P.: The tableau method for temporal logic: an overview. Logique et Anal. 28 (1985) 119–136
10. Wooldridge, M.: Reasoning about Rational Agents. The MIT Press (2000)
11. Takata, S., Nide, N., Yamakawa, H., Miyazaki, K., Ohta, M.: An achievement method of BDI agent who practically reasons about the skill acquired using reinforcement learning. In: Proc. of JAWS2004. (2004) 517–524 (In Japanese).
12. Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons (1994)
13. Nide, N., Takata, S., Yamakawa, H., Miyazaki, K., Ohta, M.: Correspondence between BDI model and world model in reinforcement learning. In: Proc. of JAWS2004. (2004) 378–385
14. Hansson, H., Jonsson, B.: A Logic for Reasoning about Time and Reliability. Formal Aspects of Computing 6(5) (1994) 512–535
15. Kozen, D.: Results on the propositional µ-calculus. Theoretical Computer Science 27 (1983) 333–354
16. Dam, M.: Translating CTL into the modal µ-calculus. Technical Report ECS-LFCS-90-123, Laboratory for Foundations of Computer Science, Department of Computer Science, University of Edinburgh (1990)
17. Manolios, P.: Mu-calculus model-checking. In: Computer-Aided Reasoning: ACL2 Case Studies. Kluwer Academic Publishers (2000) 93–111
18. Rao, A.S., Georgeff, M.P.: Decision Procedures for BDI Logics. Journal of Logic and Computation 8(3) (1998) 292–343
19. Tarski, A.: A lattice-theoretical fixpoint theorem and its application. Pacific Journal of Mathematics 5 (1955) 285–309
20. Nide, N., Takata, S.: Deduction Systems for BDI Logics Using Sequent Calculus. In: Proc. of International Joint Conference on Autonomous Agents & Multiagent Systems (AAMAS 2002). (2002) 928–935
21. Stirling, C.: Modal and Temporal Properties of Processes. Springer Verlag (2001)
22. Wang, H.: Towards mechanical mathematics. IBM Journal of Research and Development 4 (1960) 224–268
23. Brázdil, T., Forejt, V., Křetínský, J., Kučera, A.: The satisfiability problem for probabilistic CTL. In: Proc. of 23rd Annual IEEE Symposium on Logic in Computer Science. (2008) 391–402
24. Rao, A.S.: AgentSpeak(L): BDI agents speak out in a logical computable language. In: Proc. of MAAMAW-96. Volume 1038 of LNAI., Springer-Verlag (1996) 42–55
25. Su, K., Sattar, A., Wang, K., Luo, X., Governatori, G., Padmanabhan, V.: Observation-based Model for BDI-Agents. In: Proc. of AAAI 2005. (2005) 190–195

InstQL: A Query Language for Virtual Institutions using Answer Set Programming

Luke Hopton, Owen Cliffe, Marina De Vos, and Julian Padget
Department of Computer Science, University of Bath, BATH BA2 7AY, UK
[email protected],{occ,mdv,jap}@cs.bath.ac.uk

Abstract. Institutions provide a mechanism to capture and reason about "correct" and "incorrect" behaviour within a social context.
While institutions can be studied in their own right, their real potential is as instruments to govern open software architectures such as multi-agent and service-oriented systems. Our domain-specific action language for normative frameworks, InstAL, aims to focus designers' attention on the expression of issues such as permission, violation and power, but it does not help the designer verify or query the model they have specified. In this paper we present the query language InstQL, which includes a number of powerful features, including temporal constraints over events and fluents, and which can be used in conjunction with InstAL to specify those traces that are of interest, in order to investigate and reason over the underlying normative models. The semantics of the query language is provided by translating InstQL queries into AnsProlog, the same computational language as InstAL. The result is a simple, high-level query and constraint language that builds on and uses the reasoning power of ASP.

1 Introduction

Institutions [8, 22, 24, 6], also known in the literature as normative frameworks or organisations, are a specific class of multi-agent systems where agent behaviour is governed by social norms and regulations. Within institutions it is possible to monitor the permissions, empowerment and obligations of participants and to signal violations when norms are not followed. The change of the state over time as a result of these actions provides participants with information about each other's behaviour. The information can also be used by the designer to query and verify normative properties, effects and expected outcomes in an institution. Research on institutions, such as electronic contracts and rules of governance, over the last decade has demonstrated that they are a powerful mechanism for making agent interactions more effective, structured and efficient. As with human regulatory settings, institutions become useful when it is possible to verify that particular properties are satisfied for all possible scenarios. Answer set programming [3, 15], a logic programming paradigm, permits, in contrast to related techniques like the event calculus [20] and C+ [12], the specification of both problem and query as an executable program, thus eliminating the gap between specification and verification language. But perhaps more importantly, the specification language and implementation language are identical, allowing for more straightforward verification and validation. In [6], we introduced a formal model for institutions, which admits reasoning about them by mapping to AnsProlog, logic programs under answer set semantics. To make the reasoning process more accessible to users, in [7] we developed an action language named InstAL that allows a developer to design an institution in a more straightforward manner. InstAL is then translated into AnsProlog, resulting in the same program as the formal description would have provided. While InstAL allows the designer to specify the institution, it provides little to no support for verifying the institution and its design—indeed, as it stands, queries must be written directly in AnsProlog, thereby undoing most of the benefits of specifying in InstAL. In this paper, we present InstQL: a query language designed to complement InstAL. Its semantics is provided by ASP and it is used together with a description of an institution either in InstAL or AnsProlog.
InstQL can be used in two ways: as a tool to select certain transitions in the state space of the institution, or to model-check a certain path. For temporal queries we describe how queries expressed in the widely used temporal logic LTL may be expressed (via simple transformations) in our query language. A brief summary of the InstQL language appears in [18]. In this paper we provide an extended account of the language, illustrations of its capabilities and applications, and situate it firmly in the context of multi-agent systems.

2 Answer Set Programming

In answer set programming [3] a logic program is used to describe the requirements that must be fulfilled by the solutions of a certain problem. Answer set semantics is a model-based semantics for normal logic programs. Following the notation of [3], we refer to the language over which the answer set semantics is defined as AnsProlog. An AnsProlog program consists of a set of rules of the form

a :- B, not C.

with a being an atom and B, C being (possibly empty) sets of atoms. a is called the head of the rule, while B ∪ not C is the body. The rule can be read as: "if we know all atoms in B and we do not know any atom in C, then we must know a". Rules with an empty body are called facts, as the head is then always known. An interpretation is a truth assignment to all atoms in the program. Often only those literals that are considered true are mentioned, as all the others are false by default (negation as failure). The semantics of programs without negation (effectively Horn clauses) is simple and uncontroversial: the T_P (immediate consequence) operator is iterated until a fixed point is reached. The Gelfond-Lifschitz reduct is used to deal with negation as failure: it takes a candidate set and reduces the program by removing any rule that depends on the negation of an atom in the set, and removing all remaining negated atoms. Answer sets are candidate sets that are also models of the corresponding reduced program. The uncertain nature of negation as failure gives rise to several answer sets, which are all solutions to the problem that has been modelled. Algorithms and implementations for obtaining answer sets of logic programs are referred to as answer set solvers. Some of the most popular and widely used solvers are DLV [9], SMODELS [21] and CLASP [14].
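As a small illustration of these definitions (our example, not part of the institutional encoding), consider the two-rule program:

% each rule blocks the other through negation as failure
p :- not q.
q :- not p.

For the candidate set {p}, the reduct is the single fact p. (the second rule is removed because it depends on not p), and {p} is the minimal model of that reduct, so {p} is an answer set; by symmetry, so is {q}. The candidate {p, q} is not an answer set: its reduct is the empty program, whose minimal model is the empty set.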
3 Institutions

In this section, we give an informal description of institutions and their mapping to ASP. A more in-depth description can be found in [6, 7]. The concept of normative systems has long been used in economics, legal theory and political science to refer to systems of regulation which enable or assist human interaction at a high level. The same principles can be applied to multi-agent systems. The model we use is based on the concept of exogenous events, which describe salient events of the physical world—"shoot somebody"—and normative events, which are generated by the normative framework—"murder"—but which only have meaning within a given social context. While exogenous events are clearly observable, normative ones are not, so how do they come into being? Searle [19] describes the creation of a normative state of affairs through conventional generation, whereby an event in one context counts as, or generates, the occurrence of another event in a second context. Taking the physical world as the first context and defining conditions in terms of states, normative events may be created that count as the presence of states or the occurrence of events in the normative world. Thus, we model an institution as a set of normative states that evolve over time subject to the occurrence of events, where a normative state is a set of fluents that may be held to be true at some instant. Furthermore, we may separate such fluents into domain fluents, which depend on the institution being modelled, and normative fluents, which are common to all specifications and may be classified as follows:
– Permission: A permission fluent captures the property that some event may occur without violation. If an event occurs, and that event is not permitted, then a violation event is generated.
– Normative power: This represents the normative capability for an event to be brought about meaningfully, and hence to change some fluents in the normative state. Without normative power, the event may not be brought about and has no effect; for example, a marriage ceremony will only bring about the married state if the person performing the ceremony is empowered to do so.
– Obligation: Obligation fluents are modelled as the dual of permission. They state that a particular event must occur before a given deadline event (such as a time-out) and are associated with a specified violation. If an obligation fluent holds and the necessary event occurs, the obligation is said to be satisfied. If the corresponding deadline event occurs, the obligation is said to be violated and the specified violation event is generated. Such a violation event can then be dealt with, perhaps by a participating agent or by the normative framework itself.
Each event, whether exogenous or normative, can have an impact on the next state when it is generated. For example, the event could trigger a violation, or it could result in permissions being granted or retracted (e.g. once you obtain your driving licence you obtain the permission to drive a car, but if you are convicted of a driving offence you lose that permission). The effects of events are modelled by the consequence relation. Thus we represent the normative framework by these five components: (i) the initial state—the set of fluents which are true when the institution is created, (ii) the set of fluents that capture the essential facts about the normative state, (iii) the set of events (both exogenous and normative) that can occur, (iv) the conventional generation relation, and (v) the consequence relation. All state changes in a system stem from the occurrence of exactly one exogenous event. When such an event occurs, the transitive closure of the conventional generation function computes all empowered normative events that are directly or indirectly caused by the occurrence of the underlying event. This may include violations for unsatisfied obligations or unpermitted events. The consequences of each of these events with respect to the current state are computed using the consequence relation. The combination of added and deleted fluents results in the new normative state. The semantics of this framework are defined over traces of exogenous events. Each trace induces a sequence of normative states, called a model or scenario. In [6], it was shown that the formal model of an institution can be translated to an AnsProlog program such that the answer sets of the program correspond exactly to the traces of the institution.
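To give a flavour of what such programs look like, the following fragment is purely illustrative: it uses the atoms introduced in Section 5 and fluent names that anticipate the auction example of the next section, rather than the exact output of the mapping in [6]:

% part of a normative state at instant i0: bidding is empowered and
% permitted, and the auctioneer is obliged to announce a price before
% the price deadline, on pain of the violation badgov
holdsat(pow(bid(b,a)), i0).
holdsat(perm(bid(b,a)), i0).
holdsat(obl(price(a,b), pricedl, badgov), i0).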
A detailed description of the mapping can be found there. In [5] we developed InstAL, an action language inspired by action languages such as C+ and A [12]. The use of the action language makes generating the AnsProlog code less open to human coding errors and, perhaps more importantly, easier to understand and create, by narrowing the semantic gap without losing either expressiveness or a formal basis for the language. Institution specifications can give rise to a vast number of valid traces and associated histories. Often not all of them are equally useful for the task at hand, and selection criteria have to be applied. Through InstQL, we aim to offer the designer the same sort of abstraction for queries as InstAL provides for the specification.

4 The Dutch Auction: A Motivating Example

As a case study we look at a fragment of the Dutch auction protocol with only one round of bidding. In this protocol a single agent is assigned to the role of auctioneer, and one or more agents play the role of bidders. The purpose of the protocol as a whole is either to determine a winning bidder and a valuation for a particular item on sale, or to establish that no bidders wish to purchase the item. Consequently, conflict—where two bids are received "simultaneously"—is treated as an in-round state which takes the process back to the beginning. The protocol is summarised as follows:
1. Round starts: the auctioneer selects a price for the item and informs each of the bidders present of the starting price. The auctioneer then waits for a given period of time for bidders to respond.
2. Bidding: upon receipt of the starting price, each bidder has the choice whether or not to send a message indicating their desire to bid on the item at that price.
3. Bid processing: at the end of the prescribed period of time, if the auctioneer has received a single bid from a given agent, then the auctioneer is obliged to inform each of the participating agents that this agent has won the auction.
4. No bids: if no bids have been received at the end of the prescribed period of time, the auctioneer must inform each of the participants that the item has not been sold.
5. Multiple bids: if more than one bid was received, then the auctioneer must inform every agent that a conflict has occurred.
6. Termination: the protocol completes when an announcement is made indicating that the item is sold or that no bids have been received.
7. Conflict resolution: in the case where a conflict occurs, the auctioneer must re-open the bidding and re-start the round in order to resolve the conflict.

annsold(A,B) generates sold(A,B);                                     (DAR-1)
annunsold(A,B) generates unsold(A,B);                                 (DAR-2)
annconf(A,B) generates conf(A,B);                                     (DAR-3)
biddl terminates pow(bid(B,A));                                       (DAR-4)
biddl initiates pow(sold(A,B)), pow(unsold(A,B)), pow(conf(A,B)),
    pow(alerted(B)), perm(alerted(B));                                (DAR-5)
biddl initiates perm(annunsold(A,B)), perm(unsold(A,B)),
    obl(unsold(A,B), desdl, badgov) if not havebid;                   (DAR-6)
biddl initiates perm(annsold(A,B)), perm(sold(A,B)),
    obl(sold(A,B), desdl, badgov) if havebid, not conflict;           (DAR-7)
biddl initiates perm(annconf(A,B)), perm(conf(A,B)),
    obl(conf(A,B), desdl, badgov) if havebid, conflict;               (DAR-8)
unsold(A,B) generates alerted(B);                                     (DAR-9)
sold(A,B) generates alerted(B);                                       (DAR-10)
conf(A,B) generates alerted(B);                                       (DAR-11)
alerted(B) terminates pow(unsold(A,B)), perm(unsold(A,B)),
    pow(sold(A,B)), pow(conf(A,B)), pow(alerted(B)), perm(sold(A,B)),
    perm(conf(A,B)), perm(alerted(B)), perm(annconf(A,B)),
    perm(annsold(A,B)), perm(annunsold(A,B));                         (DAR-12)
desdl generates finished if not conflict;                             (DAR-13)
desdl terminates havebid, conflict, perm(annconf(A,B));               (DAR-14)
desdl initiates pow(price(A,B)), perm(price(A,B)), perm(annprice(A,B)),
    perm(pricedl), pow(pricedl), obl(price(A,B), pricedl, badgov)
    if conflict;                                                      (DAR-15)

Fig. 1. A partial InstAL specification for the Dutch Auction Round Institution
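Under the mapping sketched in Section 3, a counts-as rule such as (DAR-1) corresponds roughly to an AnsProlog rule of the following shape. This is our sketch of the idea; the exact clauses are those produced by the translator described in [6] and may differ in detail:

% annsold(A,B) counts as sold(A,B) at instant I, provided the
% institution empowers the normative event sold(A,B) at I
occurred(sold(A,B), I) ← occurred(annsold(A,B), I),
                         holdsat(pow(sold(A,B)), I), instant(I).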
Based on the protocol description above, the following agent actions are defined: the auctioneer announces a price to a given bidder (annprice), a bidder bids on the current item (annbid), the auctioneer announces a conflict to a given bidder (annconf), and the auctioneer announces that the item is sold (annsold) or not sold (annunsold), respectively. In addition to the agent actions, we also include a number of time-outs indicating the three external events—independent of the agents' actions—that affect the protocol. For each time-out we define a corresponding protocol event suffixed by dl, indicating a deadline in the protocol:
priceto, pricedl: a time-out indicating the deadline by which the auctioneer must have announced the initial price of the item on sale to all bidders.
bidto, biddl: a time-out indicating the expiration of the waiting period for the auctioneer to receive bids for the item.
desto, desdl: a time-out indicating the deadline by which the auctioneer must have announced the decision about the auction to all bidders.
When the auctioneer violates the protocol, an event badgov occurs and the auction dissolves. Figure 1 gives the InstAL specification of the third phase of the protocol; the full specification can be found in [5]. Figure 2 shows the state transition diagram for an auctioneer and a single bidder. Every path in the graph is a valid trace.

Fig. 2. States of the auction round for a single bidder [state transition diagram omitted]

To guide the development of our query language InstQL for institutional models written in InstAL, five types of existing queries, which were directly encoded in AnsProlog, were considered.
The first case is a simple constraint involving event occurrence. An example would be a query to obtain those traces in which the auctioneer violates the protocol. This query states that answer sets corresponding to traces in which the event badgov occurs at any point should be excluded. The key part of this condition is that the event can occur at any time.

bad ← occurred(badgov, I), instant(I).
⊥ ← bad.                                                    (Q1)

Similarly, the second query involves a fluent being true at any time during the execution. This time, only those answer sets corresponding to traces that satisfy the condition should be included. As an example, we have a query that selects those traces in which a conflict occurs, i.e. more than one bidder submits a timely bid.

hadconflict ← holdsat(conflict, I), instant(I).
⊥ ← not hadconflict.                                        (Q2)

In the third case, the query condition is for an event to occur at the same time as a fluent holds. Again, only answer sets in which the condition is satisfied should be included. An example of such a query would be to select those traces/models in which, at the occurrence of the desdl event, we also have a conflict between two or more bidders.

restarted ← occurred(desdl, I), holdsat(conflict, I), instant(I).
⊥ ← not restarted.                                          (Q3)

The fourth case declares a parameterised condition. Whilst in the previous queries we considered conditions that are true/false for a whole model, this case declares a condition startstate that is true for a particular fluent. In addition, this query requires that the fluent is true in the state after an event occurs. The use of parameterised conditions is illustrated in the following statement, which enumerates all the fluents that are true when the protocol has just started, as indicated by the occurrence of the event createdar:

startstate(F) ← holdsat(F, I1), occurred(createdar, I0),
                next(I0, I1), ifluent(F).                   (Q4)

The fifth query can be used to verify the protocol. This query features the use of previously declared conditions in subsequent conditions. (Note that one of these, startstate(F), is the condition specified in query (Q4).) The protocol states that if more than one bidder bids for the good, the protocol needs to restart completely. This implies that all the fluents from the beginning of the protocol need to be reinstated and all others have to be terminated. The query checks that this has been done; if we still obtain a trace with this query, we know something has gone wrong.

startstate(F) ← holdsat(F, I1), occurred(createdar, I0),
                next(I0, I1), ifluent(F).
restartstate(F) ← holdsat(F, I1), occurred(desdl, I0), holdsat(conflict, I0),
                next(I0, I1), ifluent(F).
missing(F) ← startstate(F), not restartstate(F), ifluent(F).
added(F) ← restartstate(F), not startstate(F), ifluent(F).
invalid ← missing(F), ifluent(F).
invalid ← added(F), ifluent(F).
⊥ ← not invalid.                                            (Q5)

While it is possible to express these queries directly in AnsProlog, as we have seen, it requires solid knowledge of the formalism and of implementation details to get the ordering of events and fluents correct. InstQL was designed to remove these difficulties and to allow designers to write queries in a language more closely related to natural language.

5 InstQL

In this section we introduce the query language InstQL, which can be used directly with an AnsProlog program representing the institution, regardless of whether the program is derived from the formal description or from InstAL. Space restrictions do not allow us to provide the complete mapping of an institution into AnsProlog; we mention only those atoms on which InstQL relies for its semantics.
The set of events recognised by the institution is denoted E, while the set of available fluents is F. When modelling traces, we need to monitor the domain over a period of time (or a sequence of states). We model time using instant(I) and an ordering on instants established by next(I1, I2), with the final instant defined as final(I). Following convention, we assume that the truth of a fluent F ∈ F at a given state instant I is represented as holdsat(F, I), while the occurrence of an event or action E ∈ E is modelled as occurred(E, I). InstQL has two basic concepts: (i) constraint: an assertion of a property that must be satisfied by a valid trace (for example, a restriction on which traces are considered), and (ii) condition: a specification of properties that may hold for a given trace. Conditions can be declared in relation to other conditions, and constraints can involve declared conditions. Table 1 summarises the syntax of the language, while the remainder of this section discusses the elements of the language and their semantics in detail.

Table 1. InstQL Syntax

<variable>          ::= [A-Z][a-zA-Z0-9_]*
<variable list>     ::= <variable> , <variable list> | <variable>
<name>              ::= [a-z][a-zA-Z0-9_]*
<param list>        ::= ( <variable list> )
<identifier>        ::= <name> <param list> | <name>
<predicate>         ::= happens( <identifier> ) | holds( <identifier> )
<literal>           ::= not <predicate> | <predicate>
<while literal>     ::= <literal> | <condition literal>
<while expr>        ::= <while literal> while <while expr> | <while literal>
<after>             ::= after( <integer> ) | after
<after expr>        ::= <while expr> <after> <after expr> | <while expr>
<condition literal> ::= not <identifier> | <identifier>
<term>              ::= <after expr> | <condition literal>
<conjunction>       ::= <term> and <conjunction> | <term>
<disjunction>       ::= <term> or <disjunction> | <term>
<condition decl>    ::= condition <identifier> : <disjunction> ; | condition <identifier> : <conjunction> ;
<constraint>        ::= constraint <disjunction> ; | constraint <conjunction> ;

5.1 Syntax

InstQL provides two predicates that form the basis of all InstQL queries. The first is happens(Event), meaning that the specified event should occur at some point during the lifetime of the institution. The second is holds(Fluent), which means that the specified fluent is true at some point during the lifetime of the institution. That is:

<predicate> ::= happens( <identifier> ) | holds( <identifier> )

where the identifier corresponds to an event e (in the first case) or a fluent f (in the second case). Negation (as failure) is provided by the unary operator not:

<literal> ::= not <predicate> | <predicate>

To construct complex queries, it is often easier to break them up into sub-queries, or in InstQL terminology, sub-conditions. For example, suppose we have defined a condition called my_cond which specifies some desired property. We can then join this with other criteria, e.g. "my_cond and happens(e)". Sub-conditions may be referenced within rules as condition literals:

<condition_literal> ::= not <identifier> | <identifier>

Note that this allows parameterised conditions to be defined through the definition of an identifier. The building block of query conditions is the term:

<term> ::= <after_expr> | <condition_literal>

The after expression also allows for the simpler constructs of <literal> and <while_expr>. Terms may be grouped and connected by the connectives and and or, which provide logical conjunction and disjunction:
<conjunction> ::= <term> and <conjunction> | <term>
<disjunction> ::= <term> or <disjunction> | <term>

On its own, this does not allow us to create arbitrary combinations of predicates, named conditions and the logical operators and, or, not. To do so we need to be able to declare conditions:

<condition_decl> ::= condition <identifier> : <disjunction> ;
                   | condition <identifier> : <conjunction> ;

This construction defines a condition with the specified name to have a value equal to the specified disjunction or conjunction, and allows the condition name to be used as a condition literal. Constraints specify properties of the trace that must be true:

<constraint> ::= constraint <disjunction> ; | constraint <conjunction> ;

For example, consider the following InstQL query:

constraint happens(e);

This indicates that only traces in which event e occurs at some point should be considered. To illustrate how this language is used to form queries, consider a simple light bulb action domain. The fluent on is true when the bulb is on. The event switch turns the light on or off. We can require that at some point the light is on:

constraint holds(on);

We can require that the light is never on:

condition light_on: holds(on);
constraint not light_on;

There is some subtlety here, in that light_on is true if on is true at any instant. Therefore, if light_on is not true, there cannot be an instant at which on was true. And what if the bulb is broken—the switch is pressed but the light never comes on? This can be expressed as:

constraint not light_on and happens(switch);

Using condition names, we can create arbitrary logical expressions. The statement that event e1 and either event e2 or e3 should occur can be expressed as follows:

condition disj: happens(e2) or happens(e3);
condition conj: happens(e1) and disj;

We may wish to specify queries of the form "X and Y happen at the same time". That is, we may wish to talk about events occurring at the same time as one or more fluents are true, about the simultaneous occurrence of events, or about combinations of fluents being simultaneously true (and/or false). For this situation, InstQL has the keyword while, indicating that literals are true simultaneously. Such while expressions are only defined over literals constructed from predicates (that is, happens and holds) or condition literals involving condition names. A while expression is defined as follows:

<while_literal> ::= <literal> | <condition_literal>
<while_expr> ::= <while_literal> while <while_expr> | <while_literal>

The while-operator has higher precedence than and and or. Returning to the light bulb example, we can now specify that we want only traces where the light was turned off at some point:

constraint happens(switch) while holds(on);

Or that at some point the light was left on:

constraint holds(on) while not happens(switch);

The language also allows for the expression of orderings over events. This is done with the after keyword, which allows statements of the form:

holds(f1) while not holds(f2) after happens(e1) after happens(e2)

This should be read as: (i) at some time instant k the event e2 occurs; (ii) at some other time instant j the event e1 occurs; (iii) at some other time instant i the fluent f1 is true but the fluent f2 is not true; (iv) these time instants are ordered such that i > j > k (that is, k is the earliest time instant).
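Anticipating the translation given in Section 5.2, this statement expands into AnsProlog along the following lines (our sketch; q stands for the internally generated condition name):

% "holds(f1) while not holds(f2) after happens(e1) after happens(e2)"
q ← holdsat(f1, Ti), ifluent(f1),
    not holdsat(f2, Ti), ifluent(f2), instant(Ti),
    occurred(e1, Tj), event(e1), instant(Tj),
    occurred(e2, Tk), event(e2), instant(Tk),
    after(Ti, Tj), after(Tj, Tk).

Each while-group shares a single time instant, and each after contributes an ordering atom between the instants of its two operands.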
However, in some cases we need to say not only that a given literal holds after some other literal, but that this is precisely one time instant later. Rather than just providing the facility to specify that a literal occurs/holds at the next time instant, this is generalised to saying that a literal holds n time instants after another. That is, for a fluent that does (not) hold at time instant t_i, or an event that occurs between t_i and t_{i+1}, we can talk about literals that hold at t_{i+n} or occur between t_{i+n} and t_{i+n+1}. The syntax of an after expression is:

<after> ::= after | after( <integer> )
<after_expr> ::= <while_expr> <after> <after_expr> | <while_expr>

An after expression may contain only the after operator or the after(n) operator, depending on how precisely the gap between the two operands is to be specified. Once again returning to the light bulb example, we can now specify a query which requires the light to be switched twice (or more):

constraint happens(switch) after happens(switch);

Or that once the light is on, it cannot be switched off again:

condition switch_off: happens(switch) after holds(on);
constraint not switch_off;

5.2 Semantics

The semantics of an InstQL query is defined by the translation function T, which translates InstQL into AnsProlog. This function takes a fragment of InstQL and generates a set of (partial) AnsProlog rules. Typically this set is a singleton; only expressions involving disjunctions generate more than one rule. The semantics of predicates are defined as follows:

T(happens(e)) = occurred(e, I), event(e)
T(holds(f)) = holdsat(f, I), ifluent(f)

For a literal of the form not P (where P is a predicate) the semantics is:

T(not P) = not T(P)

while for condition literals it is:

T(conditionName) = conditionName(I)
T(not conditionName) = not conditionName(I)

and for a conjunction of terms:

T(c1 and c2 and ··· and cn) = T(c1), T(c2), ..., T(cn)

A disjunction translates to more than one rule. However, this is defined slightly differently depending on whether it is part of a condition declaration or a constraint:

T(condition conditionName : c1 or c2 or ··· or cn ;) =
    {conditionName ← T(ci). | 1 ≤ i ≤ n}
T(constraint c1 or c2 or ··· or cn ;) =
    {newName ← T(ci). | 1 ≤ i ≤ n} ∪ {⊥ ← not newName.}

Note that the AnsProlog term newName denotes an identifier that is unique within the AnsProlog program formed by combining the query and the action program. In addition, each time instant I generated in the translation of a predicate represents a name for a time instant that is unique within the InstQL query. Recall that a condition name may be parameterised: since an InstQL variable translates to a variable in SMODELS, no additional machinery is required. For example, the condition "condition ever(E): happens(E);" (which just defines an alias for happens) is translated to "ever(E) ← occurred(E, I), instant(I), event(E).". The semantics for while is:

T(L1 while L2 while ··· while Ln) = T(L1), T(L2), ..., T(Ln), instant(I)

We give the semantics for the binary operator after(n); this can easily be generalised for after expressions built from sequences of after(n) operators mixed with after operators:

T(Wi after(n) Wj) = T(Wi), T(Wj), after(ti, tj, n)

where ti and tj are the time instants generated by Wi and Wj respectively. This is defined such that we require n > 0.
We now provide a concrete example of the translation of an after expression to illustrate this process:

T(happens(e) while holds(f) after happens(d) after(3) holds(g)) =
    occurred(e, ti), event(e), holdsat(f, ti), ifluent(f), instant(ti),
    occurred(d, tj), event(d), instant(tj),
    holdsat(g, tk), ifluent(g), instant(tk),
    after(ti, tj), after(tj, tk, 3).

5.3 The Dutch Auction Queries

Having defined the query language InstQL, we return to the example queries for the Dutch auction from Section 4. For (Q1) the following InstQL query is equivalent:

condition bad: happens(badgov);
constraint not bad;

Alternatively, we could look at all the traces in which the protocol is never violated by one of the bidders:

condition bad: happens(viol(E));
constraint not bad;

An InstQL query that is equivalent to (Q2) is:

constraint holds(conflict);

The following query is equivalent to (Q3):

constraint happens(desdl) while holds(conflict);

For (Q4), the following InstQL query is equivalent:

condition startstate(F): holds(F) after(1) happens(createdar);

For (Q5) the following InstQL query is equivalent:

condition startstate(F): holds(F) after(1) happens(createdar);
condition restartstate(F): holds(F) after(1) happens(desdl) while holds(conflict);
condition missing(F): startstate(F) and not restartstate(F);
condition added(F): restartstate(F) and not startstate(F);
constraint missing(F) or added(F);

6 Reasoning

6.1 Common Reasoning Tasks

Following the description of InstQL in the preceding section, we now illustrate how it can be used to perform three common tasks [25] in computational reasoning: prediction, postdiction and planning. Prediction is the problem of ascertaining the resulting state for a given (partial) sequence of actions and an initial state. That is, suppose some transition system is in state S and a sequence A = a1, ..., an of actions occurs. Then the prediction problem (S, A) is to decide the set of states {S′} which may result. Postdiction is the converse problem: if a system is in state S′ and we know that A = a1, ..., an has occurred, then the problem (A, S′) is to decide the set {S} of states that could have held before A. The planning problem (S, S′) is to decide which sequence(s) of actions, {A}, will bring about state S′ from state S.

Identifying States: A state is described by the set of fluents that are true, S = {f1, ..., fn}, where the fi are fluents. States containing or not containing given fluents may be identified in InstQL using the while operator:

holds(f_1) while ... while holds(f_n) while not holds(g_1) while ... while not holds(g_k)

where f_1, ..., f_n are fluents which must hold in the matched state and g_1, ..., g_k are those fluents that must not.
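For example, in the auction domain of Section 4, the in-round conflict state can be pinned down with "constraint holds(havebid) while holds(conflict);", which, following Section 5.2, expands to the following AnsProlog (our sketch, writing conflict_state for the generated name):

% a state where a bid has been received and a conflict has arisen
conflict_state ← holdsat(havebid, I), ifluent(havebid),
                 holdsat(conflict, I), ifluent(conflict), instant(I).
⊥ ← not conflict_state.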
Describing Event Ordering: A sequence of events E = e1, ..., en may be encoded as an after expression. If we have complete information, then we know that e1 occurred, then e2 at the next time instant, and so on up to en, with no other events occurring in between. In this case, we can express E as follows:

happens(e_n) after(1) ... after(1) happens(e_1)

This can be generalised to the case where e_{i+1} occurs after e_i with some known number k ≥ 0 of events happening in between:

happens(e_i+1) after(k+1) happens(e_i)

Alternatively, if we do not know k (that is, we know that e_{i+1} happens later than e_i but zero or more events occur in between) we can express this as:

happens(e_i+1) after happens(e_i)

We can combine these cases throughout the formulation of E to represent the amount of information available.

The Prediction Problem: Given an initial state S and a sequence of events E, the prediction problem (S, E) can be expressed in InstQL as:

constraint E after(1) S;

This query limits traces to those in which at some point S holds, after which the events of E occur in sequence. The answer sets that satisfy this query will then contain the states {S′}.

The Postdiction Problem: Given a sequence of events E and a resulting state S′, the postdiction problem (E, S′) can be expressed as:

constraint S' after(1) E;

This requires S′ to hold at the instant following the final event of E.

The Planning Problem: Given a pair of states S and S′, the planning problem (S, S′) can be expressed in InstQL as:

constraint S' after S;

This allows any non-empty sequence of events to bring about the transition from S to S′. If we want to consider plans of length k (i.e. E = e1, ..., ek) then we express this as:

constraint S' after(k) S;

Reasoning with institutions: There are two distinct types of reasoning about institutions. The first is the verification and exploration of normative properties: after specifying an institution, queries can be used to determine that desired properties of the model are present, or to elicit emergent properties that were perhaps not intended. The second case for reasoning about a normative framework is for the participants/agents within that institution to use the available information in their decision processes. The participants could, using the current state and the specification, apply postdiction to determine previous actions of other participants, prediction to evaluate the possible effects of their own actions, or planning to determine the actions necessary to achieve certain goals. Using AnsProlog as the underlying formalism, designers and institutional participants can use partial information to reason about the institution itself or about other participants.
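As a concrete instance of the planning pattern in the auction domain (our example), the query "constraint holds(obl(sold(a,b),desdl,badgov)) after holds(havebid);" asks for traces in which a state where havebid holds is eventually followed by one where the auctioneer is obliged to announce the sale. Its AnsProlog expansion is, in sketch:

plan ← holdsat(obl(sold(a,b), desdl, badgov), Ti),
       ifluent(obl(sold(a,b), desdl, badgov)), instant(Ti),
       holdsat(havebid, Tj), ifluent(havebid), instant(Tj),
       after(Ti, Tj).
⊥ ← not plan.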
6.2 Modelling Linear Temporal Logic

LTL [23] is a commonly used temporal logic for model checking transition systems. In this section we show that LTL-style reasoning can also be modelled using InstQL. We opted for LTL since it shares the same linear time structure as our model and also allows complex expressions of temporal properties between states. Traditional LTL syntax is often considered difficult to write, and we believe that InstQL is a valuable alternative, especially if one wants to reason about events and fluents at the same time. Linear Temporal Logic (LTL) [23] provides us with a formalism for reasoning about paths of state transition systems. In LTL, we have a set AP of atomic propositions. The syntax of LTL [11] is defined as follows: (i) p ∈ AP is a formula of LTL; (ii) ¬f is a formula if f is a formula; (iii) f ∨ g is a formula if f and g are formulae; (iv) f ∧ g is a formula if f and g are formulae; (v) ◊f is a formula if f is a formula ("sometimes f"); (vi) f U g is a formula if f and g are formulae ("f until g"). We abbreviate ¬◊¬f by □f ("always f"). The semantics of LTL is given with respect to a structure M = (S, X, L) and a path of state transitions. M contains a non-empty set S of states, a non-empty set X of paths, and a labelling function L : S → P(AP) which assigns to each state the set of propositions true in that state. A path is a non-empty sequence of states x = s0 s1 s2 .... We denote by x^k the suffix of path x starting with the k-th state. In addition, we use first(x) to denote the first state in path x. The semantics of LTL is defined inductively in terms of interpretations (paths) over a linear structure (time) by the relation |= [11, 10, 26, 17, 4]. Without loss of generality we use the natural numbers N as our structure. An interpretation is a function π : N → P(AP), which assigns a truth value to each element of AP at every instant i ∈ N. Let M be a structure and x ∈ X; then:

π, i |= p ∈ AP  ⟺  p ∈ π(i)
π, i |= ¬f      ⟺  π, i ⊭ f
π, i |= f ∨ g   ⟺  π, i |= f or π, i |= g
π, i |= f ∧ g   ⟺  π, i |= f and π, i |= g
π, i |= ◊f      ⟺  ∃j ≥ i · π, j |= f
π, i |= f U g   ⟺  ∃j ≥ i · π, j |= g ∧ (∀i ≤ k < j · π, k |= f)

Where the structure is understood, we will omit it from the relation and write x |= f. In principle, LTL (originally) refers only to states and, as a general observation, the merging of actions and fluents inside LTL is non-trivial, as one is merging state-relative and transition-relative concepts. With institutions we want to reason about both fluents and events, so AP = E ∪ F.

Expressing LTL in InstQL: There is an important difference between LTL and InstQL, in the sense that InstQL is designed not for model checking but for model generation. Given a query, it will generate those paths that satisfy the criteria. If π is the path given to LTL for verification, InstQL will return all traces that satisfy the query, which may or may not include the path given for verification. To solve this problem one can provide the path itself as a constraint to the InstQL query. This can easily be done using a combination of while and after, in the same way as we defined event orderings above. This restricts the search space to those traces in which the path is satisfied. If the path itself is invalid (e.g. two observed events at the same time, or fluents that hold in a state when they should not), then the query will automatically not be satisfied. The LTL query itself can then be expressed in InstQL. We briefly describe how the various formulae may be expressed as conditions in InstQL. Each sub-formula S of the formula F that is to be checked is translated as a condition with a unique name cond-S. To make a formula F effective (i.e. to compute only traces for which F is true) we simply add a constraint to the query that specifies that the condition for F must hold: "constraint cond-F;". Atomic elements a of AP and their negation simply become conditions with happens(a) or holds(a), or their negation, depending on the type of a. LTL disjunction can be handled as a disjunction in InstQL. Conjunction in LTL is much more like our InstQL while, as all sub-formulae need to be evaluated over the same time instant. For formulae of the form ◊F we define the condition:

condition diamond-F: cond-F;

Although it might seem similar to the encoding of atomic elements, this encoding guarantees a possibly different time instant.
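The derived operator □f ("always f") can be handled by the same machinery through its definition ¬◊¬f. A minimal sketch for an atomic fluent f follows (this encoding is ours; the paper itself only treats the operators above):

% ◊¬f: f is false at some instant
diamond_not_f ← not holdsat(f, I), ifluent(f), instant(I).
% □f = ¬◊¬f: exclude every trace containing such an instant
⊥ ← diamond_not_f.

This mirrors the light bulb query "the light is never on" from Section 5.1, with the polarity of the fluent reversed.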
Defining until (F U G) is more complex. Naïvely, we could attempt to define "F until G" as follows:

condition false_before(cond-F, cond-G): cond-G after not cond-F;
condition cond-FUG: not false_before(cond-F, cond-G);

This gives us almost what we need. However, translating this into AnsProlog, we see that the condition is too strong. To make the example easier, assume that F is a fluent and G an event, and skip the encoding of the sub-formulae:

false_before(F, E) ← occurred(E, I), event(E), instant(I),
                     not holdsat(F, J), ifluent(F), instant(J), after(I, J).
until(F, E) ← not false_before(F, E).

We can satisfy false_before(f, e) if we can find time instants ti and tj such that tj < ti, e happens at ti, and at tj f is false. That is, f cannot be false before any occurrence of e. The correct semantics of until is that f cannot be false before the first occurrence of e [17]. In order to achieve the correct semantics, we need to introduce new fluents happened(e) to the domain, for each event e ∈ E, to indicate that e has occurred at some time in the past during the current trace. This is done in the background when we translate InstQL to AnsProlog:

holdsat(happened(E), I) ← occurred(E, I), event(E), instant(I).
holdsat(happened(E), I) ← occurred(E, J), after(I, J), event(E),
                          instant(I), instant(J).

To allow for this, for each event E that is part of the query and of the until statement, we replace its condition with:

condition cond-E: holds(happened(E));

This allows us to then specify F U G as follows:

condition fb(cond-F, cond-G): not cond-F while not cond-G;
condition cond-FUG: not fb(cond-F, cond-G) and cond-E and cond-F;

6.3 Institutional Designer and Reasoning Tools: InstSuite

Both InstQL and InstAL were designed and implemented to make representing and reasoning about institutions more intuitive and effective. While they were designed to work together, they can be used independently of each other. InstAL and InstQL specifications can be written in any text editor and then translated into an answer set program, which is passed to an answer set solver that computes the requested traces and models. To provide normative designers with more support, we have developed an integrated development environment, InstEdit, with syntax highlighting. Together these tools are referred to as InstSuite, whose source code, a combination of Java and Perl, can be obtained from http://www.bath.ac.uk/˜mdv/

7 Discussion

Previous work in [2, 1] (using the action language C+ [12]) has shown that action languages are particularly suited to modelling normative domains, where actions in the language are equated with institutional events. In [7] we extended this approach with the language InstAL, which incorporates normative properties directly into the syntax of the language and operates by translating institutional specifications into AnsProlog. In this case we are able to directly leverage the reasoning capabilities inherent in the underlying logic programming platform to query properties of models. By building InstQL upon this model we are able to offer an equivalent level of abstraction to InstAL while at the same time remaining independent of the action language InstAL itself. While InstQL was designed with institutions in mind, it can be used as a general query language for action domains, provided their descriptions can be mapped to AnsProlog. Compared to existing query languages for action domains, InstQL allows for simultaneous actions and for the definition of conditions which can then be used to create more complex queries. In [16], the authors present four query languages: P, Q, Qn and R. Queries expressed in those languages can also be expressed using InstQL.
The action query language P has only two constructs: now L and necessarily F after A1, ..., An, where L refers to a fluent or its negation, F is a fluent, and the Ai are actions. These queries can be encoded in InstQL using the techniques discussed in Section 6: now L can be written as

constraint happens(An) after(1) ... after(1) happens(A1) after(1) holds(L)

while necessarily F after A1, ..., An is expressed as

holds(F) after(1) happens(An) after(1) ... after(1) happens(A1)

Similar techniques can be used for the query languages Q, Qn and R. Given the action-ordering technique used, we can assign specific times to each of the fluents. InstQL can express all the same kinds of queries as the query languages above; in addition, InstQL is capable of modelling simultaneous actions and fluents, which permits the expression of complex queries using disjunctions and conjunctions of conditions and, above all, allows reasoning with incomplete information, thus fully exploiting the reasoning power of answer set programming. The Causal Calculator (CCALC) [13] is a very versatile tool for modelling action domains. While queries are possible in CCALC, InstQL has been designed specifically as a query language, providing constructs that make specifying queries more intuitive and versatile. Relative ordering of actions or states is much more difficult in CCALC than it is in InstQL, and CCALC does not allow for the formulation of composite queries (condition literals). As it stands, InstQL is an intuitive and versatile query and abduction language for action domains. The language is succinct and does not contain any overhead (i.e. no operator can be expressed as a function of other operators). However, from a software engineering point of view, we could make the language more accessible by providing commonly used constructs as part of the language. To this end, we plan to incorporate constructs such as eventually(F), never(F), always(F), before(F), before(E), and an if-construct to express conditions on events or fluents. For the same reasons, we plan to add time-specific happens(E,I) and holds(F,I) predicates and the possibility of constructing general logical expressions without the need for condition statements. At the moment InstQL only supports linear time. For certain domains, other ways of representing time might be more appropriate. While linear time assumes implicit universal quantification over all paths in the transition function, branching time allows for explicit existential and universal quantification over all paths, and alternating time offers selective quantification over those paths that are possible outcomes. While linear and branching time are natural ways of describing time in closed domains, alternating time is more suited to open domains.

References

[1] A. Artikis, M. Sergot, and J. Pitt. Specifying electronic societies with the Causal Calculator. In F. Giunchiglia, J. Odell, and G. Weiss, editors, Proceedings of the Workshop on Agent-Oriented Software Engineering III (AOSE), LNCS 2585. Springer, 2003.
[2] Alexander Artikis, Marek Sergot, and Jeremy Pitt. Specifying norm-governed computational societies. ACM Trans. Comput. Logic, 10(1):1–42, 2009.
[3] Chitta Baral. Knowledge Representation, Reasoning and Declarative Problem Solving. Cambridge University Press, 2003.
[4] Diego Calvanese and Moshe Y. Vardi. Reasoning about actions and planning in LTL action theories. In Proc. KR-02, 2002.
[5] Owen Cliffe. Specifying and Analysing Institutions in Multi-Agent Systems using Answer Set Programming.
PhD thesis, University of Bath, 2007.
[6] Owen Cliffe, Marina De Vos, and Julian A. Padget. Answer set programming for representing and reasoning about virtual institutions. In Katsumi Inoue, Ken Satoh, and Francesca Toni, editors, CLIMA VII, volume 4371 of Lecture Notes in Computer Science, pages 60–79. Springer, 2006.
[7] Owen Cliffe, Marina De Vos, and Julian A. Padget. Specifying and reasoning about multiple institutions. In Javier Vazquez-Salceda and Pablo Noriega, editors, COIN 2006, volume 4386 of Lecture Notes in Computer Science, pages 63–81. Springer, 2007.
[8] Douglass C. North. Institutions, Institutional Change and Economic Performance. Cambridge University Press, 1991.
[9] Thomas Eiter, Nicola Leone, Cristinel Mateis, Gerald Pfeifer, and Francesco Scarcello. The KR system dlv: Progress report, comparisons and benchmarks. In Anthony G. Cohn, Lenhart Schubert, and Stuart C. Shapiro, editors, KR'98: Principles of Knowledge Representation and Reasoning, pages 406–417. Morgan Kaufmann, San Francisco, California, 1998.
[10] E. Allen Emerson. Temporal and modal logic. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, pages 995–1072. Elsevier, 1990.
[11] E. Allen Emerson and Joseph Y. Halpern. "Sometimes" and "not never" revisited: on branching versus linear time temporal logic. Journal of the ACM, 33(1):151–178, 1986.
[12] Enrico Giunchiglia, Joohyung Lee, Vladimir Lifschitz, Norman McCain, and Hudson Turner. Nonmonotonic causal theories. Artificial Intelligence, 153:49–104, 2004.
[13] Enrico Giunchiglia, Joohyung Lee, Vladimir Lifschitz, Norman McCain, and Hudson Turner. Nonmonotonic causal theories. Artificial Intelligence, 153:49–104, 2004.
[14] M. Gebser, B. Kaufmann, A. Neumann, and T. Schaub. Conflict-Driven Answer Set Solving. In Proceedings of IJCAI07, pages 386–392, 2007.
[15] Michael Gelfond and Vladimir Lifschitz. Classical negation in logic programs and disjunctive databases. New Generation Computing, 9(3-4):365–386, 1991.
[16] Michael Gelfond and Vladimir Lifschitz. Action languages. Electron. Trans. Artif. Intell., 2:193–210, 1998.
[17] Keijo Heljanko and Ilkka Niemelä. Bounded LTL model checking with stable models. In Proceedings of the 6th International Conference on Logic Programming and Nonmonotonic Reasoning, pages 200–212. Springer-Verlag, 2003.
[18] Luke Hopton, Owen Cliffe, Marina De Vos, and Julian Padget. AQL, a query language for action domains modelled using answer set programming (short paper). In LPNMR'09, 2009. Accepted for publication.
[19] John R. Searle. The Construction of Social Reality. Allen Lane, The Penguin Press, 1995.
[20] Robert A. Kowalski and Fariba Sadri. Reconciling the event calculus with the situation calculus. Journal of Logic Programming, 31(1–3):39–58, April–June 1997.
[21] I. Niemelä and P. Simons. Smodels: An implementation of the stable model and well-founded semantics for normal LP. In Jürgen Dix, Ulrich Furbach, and Anil Nerode, editors, Proceedings of the 4th International Conference on Logic Programming and Nonmonotonic Reasoning, volume 1265 of LNAI, pages 420–429, Berlin, July 28–31 1997. Springer.
[22] Pablo Noriega. Agent mediated auctions: The Fishmarket Metaphor. PhD thesis, Universitat Autonoma de Barcelona, 1997.
[23] A. Pnueli. The Temporal Logic of Programs. In 19th Annual Symp. on Foundations of Computer Science, 1977.
[24] Joan-Antoní Rodríguez, Pablo Noriega, Carles Sierra, and Julian Padget. FM96.5: A Java-based Electronic Auction House.
In Proceedings of the 2nd Conference on Practical Applications of Intelligent Agents and Multi-Agent Technology (PAAM'97), pages 207–224, London, UK, April 1997. ISBN 0-9525554-6-8.
[25] Marek Sergot. (C+)++: An action language for modelling norms and institutions. Technical Report 8, Department of Computing, Imperial College, London, June 2004.
[26] A. P. Sistla and E. M. Clarke. The complexity of propositional linear temporal logics. Journal of the ACM, 32(3):733–749, 1985.

On the Implementation of Speculative Constraint Processing

Jiefei Ma1, Alessandra Russo1, Krysia Broda1, Hiroshi Hosobe2, Ken Satoh2
1 Imperial College London, United Kingdom {jm103,ar3,kb}@doc.ic.ac.uk
2 National Institute of Informatics, Japan {hosobe,ksatoh}@nii.ac.jp

Abstract. Speculative computation has been proposed for reasoning with incomplete information in multi-agent systems. This paper presents the first practical multi-threaded implementation for speculative constraint processing with iterative revision for disjunctive answers in master-slave multi-agent systems.

1 Introduction

In the context of distributed problem solving with multi-agent systems, communication among agents plays a very important role, as it enables coordination and cooperation between agents. However, in practice communication is not always guaranteed. For example, the physical channel may delay or lose messages, or agents may break down or take an unexpectedly long time to compute answers. Moreover, agents are often unable to distinguish between these situations. All such problems and uncertainties can seriously affect system performance, especially for result-sharing applications.

Speculative computation has been proposed in [1–5] as a solution to this problem. In the proposal, a master agent prepares default answers to the questions that it can ask the slaves. When communication is delayed or fails, the master can use the default answers to continue the computation. If a real answer is returned later (e.g. when the communication channel or the slave agent recovers), the computation already done by the master using the default answers is revised. One of the main advantages of speculative computation thus lies in the fact that the computation process of an agent is never halted while waiting for other agents' responses. Examples of real-life situations where speculative computation is useful can be found in [1–5].

Within the last few years, speculative computation has gone through various stages of development and extension. In [1] an abduction-based algorithm was proposed for speculative computation with yes/no answers in master-slave systems. In [2], the algorithm was generalised to hierarchical multi-agent systems, where agents are assumed to be organised into a hierarchy of masters/slaves. The method proposed in [2] also considers only yes/no answers. This approach was extended in [3] to allow more general queries, whereby an agent can ask for possible values or constraints of given queries, but within the context of master-slave systems. This speculative constraint processing takes into account the possibility that the agent's response may neither entail nor contradict the default answer assumed during the computation. In this case the two alternative computations – the one that uses the default and the one that uses the agent's response – are maintained active.
The approach described in [3] assumes, however, that only the master agent can perform speculative computation, and that the answer of a slave agent is therefore final and cannot be changed during the entire computation. This limitation has been further addressed in [4], where asked agents may provide disjunctive answers to a query at different times, and may also change the answers they have sent previously. In this context, a dynamic iterative belief revision mechanism has been deployed to handle chain reactions of belief revisions among the agents involved in a computational process.

Among the operational models proposed for speculative computation [1–4, 6], the one in [4] is the most complex but also the most powerful. A practical implementation of it is very much desired, not only for proof-of-concept testing and benchmark investigation, but also for discovering further improvements and/or extensions of the model. The contribution of this paper is to provide the first multi-threaded implementation of a multi-agent system for speculative disjunctive constraint processing. The system allows the master agent to perform speculative computation locally (using multi-threaded or-parallelism), and to ask constraint queries of the slave agents. The speculative master agent is associated with one manager thread (MT) and a set of worker threads (WTs). The description of the implementation given in the paper re-organises the operational model proposed in [4] to distinguish the tasks of the MT and the WTs. A concurrency control mechanism has been introduced to maximise the concurrent execution of the MT and WTs. This implementation design is shown to be flexible enough to allow for future extensions of the speculative framework to, for instance, hierarchical multi-agent systems.

The paper is organised as follows. Section 2 briefly reviews the operational model of speculative constraint processing presented in [4]. Section 3 describes the multi-threaded implementation in detail, as well as the solutions to several concurrent computation issues. Section 4 compares the implementation to the pseudo-parallel approach, and suggests a hybrid implementation for situations where computational resources (for multi-threading) are limited. Finally, conclusions and future work are given in Section 5.

2 Speculative Disjunctive Constraint Processing

In this section we review the framework of speculative constraint processing and its operational model, as proposed in [4].

2.1 Speculative Constraint Processing Framework

Definition 1. Let Σ be a finite set of constants. We call an element in Σ a slave agent identifier. An atom is of the form either p(t1, ..., tn) or p(t1, ..., tn)@S, where p is a predicate, ti (1 ≤ i ≤ n) is a term, and S is in Σ. We call an atom with an agent identifier an “askable atom”, and an atom without an identifier a “non-askable atom”.

Definition 2. A framework for speculative constraint computation, in a master-slave system, is a triple ⟨Σ, ∆, P⟩, where:
– Σ is a finite set of constants;
– ∆ is a set of rules of the following form, called default rules w.r.t. Q@S:
Q@S ← C ‖, where Q@S is an askable atom, each of whose arguments is a variable, and C is a set of constraints, called the default constraints for Q@S;
– P is a constraint logic program, that is, a set of rules R of the form: H ← C ‖ B1, B2, ..., Bn, where:
  • H is a non-askable atom; we refer to H as the head of R, denoted head(R);
  • C is a set of constraints, called the constraints of R and denoted const(R);
  • each Bi of B1, ..., Bn is either an askable atom or a non-askable atom, and we refer to B1, ..., Bn as the body of R, denoted body(R).

For the semantics of the above framework, we index the semantics of a constraint logic program by a reply set, which specifies a reply for an askable atom.

Definition 3. A reply set is a set of rules of the form: Q@S ← C ‖, where Q@S is an askable atom, each of whose arguments is a variable, and C is a constraint over these variables. Let ⟨Σ, ∆, P⟩ be a framework for speculative constraint computation, and R be a reply set. A belief state w.r.t. R and ∆ is a reply set defined as:

R ∪ {“Q@S ← C ‖” ∈ ∆ | ¬∃C′ s.t. “Q@S ← C′ ‖” ∈ R}

and denoted BEL(R, ∆). We introduce the above belief state since, if the answer is not returned, we use a default rule for an unreplied askable atom.

Definition 4. A goal is of the form ← C ‖ B1, ..., Bn, where C is a set of constraints and the Bi's are atoms. We call C the constraint of the goal and B1, ..., Bn the body of the goal.

Definition 5. A reduction of a goal ← C ‖ B1, ..., Bn w.r.t. a constraint logic program P, a reply set R, and an atom Bi, is a goal ← C′ ‖ B′ such that:
– there is a rule R in P ∪ R s.t. C ∧ (Bi = head(R)) ∧ const(R) is consistent³;
– C′ = C ∧ (Bi = head(R)) ∧ const(R);
– B′ = {B1, ..., Bi−1, Bi+1, ..., Bn} ∪ body(R).

Definition 6. A derivation of a goal G = ← C ‖ Bs w.r.t. a framework for speculative constraint computation F = ⟨Σ, ∆, P⟩ and a reply set R is a sequence of reductions “← C ‖ Bs”, ..., “← C′ ‖ ∅”⁴ w.r.t. P and BEL(R, ∆), where in each reduction step an atom in the body of the current goal is selected. C′ is called an answer constraint w.r.t. G, F, and R. We call the set of all answer constraints w.r.t. G, F, and R the semantics of G w.r.t. F and R.

³ The notation Bi = head(R) represents a conjunction of constraints equating the arguments of the atoms Bi and head(R).
⁴ ∅ denotes the empty goal.

2.2 The Operational Model

We briefly describe the execution of the speculative framework. The detailed description can be found in [4]. The execution is based on two phases: a process reduction phase and a fact arrival phase. The process reduction phase is the normal execution of a program in the master agent, and the fact arrival phase is an interruption phase entered when an answer arrives from a slave agent. Figures 1–4 explain intuitively how processes are updated according to askable atoms. In the trees, each node represents a process, but we show only the constraints associated with the process. The top node represents the constraint of the original process, and the other nodes represent the constraints added to the reduced processes. Note that we write true for non-top nodes without added constraints, since adding the true constraint does not influence the solutions of the existing constraints. The leaves of the process tree represent the current processes. Processes that are not leaves are deleted processes.
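Before walking through the figures, it may help to fix a tiny concrete instance of Definitions 1–6. The following clpfd-style Prolog sketch is our own illustration, loosely in the spirit of the meeting-scheduling examples of [4]; the agent names, the day range and the constraints are assumptions, not taken from the original paper:

    :- use_module(library(clpfd)).

    % P contains one rule:  meeting(D) <- D in 1..5 || free(D)@alice, free(D)@bob
    % Delta contains the default rules:
    %   free(D)@alice <- D >= 3 ||        free(D)@bob <- D =< 4 ||
    % With no replies yet (R = {}), BEL(R, Delta) reduces the askable
    % atoms by their defaults:
    default_alice(D) :- D #>= 3.
    default_bob(D)   :- D #=< 4.

    % a derivation of the goal  <- true || meeting(D)  w.r.t. BEL({}, Delta):
    meeting(D) :- D in 1..5, default_alice(D), default_bob(D).

    % ?- meeting(D).   yields the answer constraint  D in 3..4.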
Figure 1 shows the situation of the processes, represented as a tree, when an askable atom whose reply has not yet arrived is executed in the process reduction phase. In this case the current process, represented by the processed constraints C, is split into two different kinds of processes: the first is a process using the default information Cd, called a default process⁵; the other is the current process C itself, called the original process, which is suspended at this point.

⁵ In this figure, we assume for brevity that there is only one default.

Fig. 1: When Q@S is processed in the process reduction phase

When, after some reduction of the default processes (represented in Fig. 2 by dashed lines), the first answer comes from a slave agent, expressing the constraint Cf for this askable literal, we update the default processes as well as the original suspended process as follows:
– Default processes are reduced to two different kinds of processes: the first kind is a process adding Cf to the problem to solve, and the other is the current process itself, which is suspended at this point.
– The original process is reduced to two different kinds of processes as well: the first kind is a process adding ¬Cd ∧ Cf, and the other is the original process, suspended at this point.

Fig. 2: When the first answer Cf for Q@S arrives

Let ← C ‖ Bs be a goal containing Q@S. Suppose that it is reduced into ← C ∧ Cd ‖ Bs \ {Q@S} by a default rule “Q@S ← Cd ‖”. To retain the previous computation as much as possible, we process the query by the following execution:
1. We add Cf to the constraint of every goal derived from the default process.
2. In addition to the above computation, we also start computing a new goal ← C ∧ ¬Cd ∧ Cf ‖ Bs \ {Q@S} to guarantee completeness.

When an alternative answer, with the constraint Ca, comes from a slave agent (Fig. 3), we follow the same procedure as when the first answer comes (Fig. 2), except that now the processes handling only default information are suspended. This is done by splitting the suspended default process(es), in order to obtain the answer constraints that are logically equivalent to the answer constraints of ← C ∧ Cd ∧ Ca ‖ Bs \ {Q@S}, as well as by splitting the suspended original process, in order to obtain the answer constraints that are logically equivalent to the answer constraints of ← C ∧ ¬Cd ∧ Ca ‖ Bs \ {Q@S} (Fig. 3). By gathering these answer constraints, we can compute all answer constraints for the alternative reply.

Fig. 3: When the alternative answer Ca for Q@S arrives

On the other hand, when a revised answer with the constraint Cr arrives, all processes using the first (or current) answer are split, in order to obtain the answer constraints that are logically equivalent to the answer constraints of ← C ∧ Cf ∧ Cr ‖ Bs \ {Q@S}, and the suspended original process is split as well, in order to obtain the answer constraints that are logically equivalent to the answer constraints of ← C ∧ ¬Cf ∧ Cr ‖ Bs \ {Q@S} (Fig. 4). By gathering these answer constraints, we can override the previous reply by the revised reply.

Fig. 4: When the revised answer Cr for Q@S arrives
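To close this section, here is a small worked instance of the split in Fig. 2. Suppose C is D ∈ 1..10, the default is Cd: D ≥ 3, and the first returned answer is Cf: D ≤ 4. A clpfd-style Prolog sketch (the concrete constraint values are our own illustration, not from [4]):

    :- use_module(library(clpfd)).

    % the default process simply continues with C ∧ Cd ∧ Cf:
    default_branch(D)  :- D in 1..10, D #>= 3, D #=< 4.  % D in 3..4

    % the suspended original process spawns C ∧ ¬Cd ∧ Cf to keep
    % completeness; here ¬(D ≥ 3) is D < 3:
    original_branch(D) :- D in 1..10, D #< 3, D #=< 4.   % D in 1..2

Together the two branches cover exactly the answers consistent with Cf, which is why no solution is lost even when the default is later contradicted.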
3 A Multi-threaded Implementation

In [4], the detailed operational model is described as a multi-processing computation. There are two types of processes: finished processes, which represent successfully terminated computational branches, and ordinary processes, which represent non-terminated branches. An ordinary process can be either an original process, which is always suspended, or an active process, which searches down an open branch. In practice the operational model can be implemented in two ways:
1. we represent each process as a state, and use a single process/thread to manipulate the states in a pseudo-multi-threaded (serialised) fashion. This is very close to the model description;
2. we execute each process using a real thread, so that different (non-suspended) processes can execute concurrently.

The multi-threaded approach avoids the overheads of state selection and management that the serialised approach has, and allows or-parallelism, which benefits the proof search. However, using one thread for each process may not always be necessary, and may cause extra overheads such as inter-thread communication. For example, an original process is always suspended and can never be resumed, though it may spawn new processes that are not suspended. Preferably such processes should be managed as states instead, for easy update when a relevant answer is returned. The same holds for finished processes. In this section, we describe a practical implementation of the operational model that takes these efficiency considerations into account.

3.1 Overview

The model is implemented as a speculative computation module, and we refer to it as a speculative agent. A set of agents (some of which may not be speculative agents) can be deployed to one or more host machines on a network. Agents interact with each other via messages (containing queries or answers). Since the operational model proposed in [4] is for simple master-slave systems only, in this paper we also assume that there can be only one master, i.e. the only speculative agent, in the set of deployed agents, and that the rest are slaves. The master can send queries to the slaves, but a slave cannot send queries to the master or to other slaves. Hence, only the master performs constraint processing with iterative revision for disjunctive answers. Bear in mind, however, that our implementation is designed in such a way that it can be easily extended to hierarchical multi-agent systems similar to those defined in [2]. As illustrated in Fig. 5, each agent has the following internal components:

Fig. 5: Agent Internal Components

Communication Interface Module (COM): this is the only interface for inter-agent communication. It accepts queries or answers sent by the agent's master or slaves, and forwards the agent's answers or queries to the master or the appropriate slaves. The reception list and the address book are used for keeping track of the queries received and the master/slave addresses⁶.

⁶ Both these features will be essential when the implementation is extended to hierarchical multi-agent systems.

Speculative Computation Unit (SCU): this is the central processing unit of the agent, which performs speculative computations for one or more queries.

Default Store (∆) and Program (P): these are self-explanatory, and form the static knowledge of the agent.

Answer Entry, Choice Point and Finish Point Stores (AES, CPS, FPS): AES stores the answer entries that are created from either ∆ or the answers returned by the slaves (i.e. the reply set R).
CPS stores the computation choice points (CPs), each of which represents the state of a (suspended) original process. FPS stores the finish points (FPs), which contain the results of finished processes. The three stores are used by the SCU and form the dynamic knowledge of the agent.

In the following sections, we describe how these components are implemented.

3.2 Implementing the Communication Interface Module (COM)

Agents communicate asynchronously via messages sent over TCP connections. Each agent on the network is uniquely identified by a socket of the form IP:Port, where IP is the network address of the agent's host and Port is the port number reserved for the agent on the host. Therefore, several agents may run simultaneously on one host. During the design of an agent's program, the sockets of the slaves may not be known, or they may be changed during agent deployment. Therefore, each agent uses aliases to identify its slaves locally. For example, in an askable atom Q@S appearing in P or ∆, S is the alias of a slave. The address book stores the mapping between the slave aliases and the slave sockets, and it can be generated/updated during agent (re-)deployment.

There are two types of messages for inter-agent communication:
– a query message of the form query(From, Q@S, Cmd), where From is the socket of the sender, Q is a query, S is the recipient's alias used by the sender, and Cmd is a command, either start or stop. If the command is start, it indicates a request for the recipient (i.e. the slave) to start a computation for the query; if the command is stop, it asks the recipient to stop the computation for a query previously requested and to free the resources. The “stop” signal is (in this paper) used merely for the execution control of the agent.
– an answer message of the form answer(From, Q@S, ID, Ans), where From, Q and S are as described above, Ans is a set of constraints forming the answer to the query, and ID is the answer identifier assigned by the sender, used to distinguish between a revised answer and an alternative answer.

COM waits for any incoming message and handles it as follows:
– if it is an inter-agent message query(Master, Q@S, start) from the agent's master, COM creates an entry <RID, Q@S, Master> in the reception list, where RID is a new query entry ID, and then sends a message start(RID, Q@S) to the manager thread (MT) in the SCU (described below);
– if it is an inter-agent message query(Master, Q@S, stop), COM removes the entry <RID, Q@S, Master> from the reception list, and then sends a message stop(RID) to MT;
– if it is an inter-agent message answer(Slave, Q@S, ID, Ans), COM simply forwards it as answer(Q@S, ID, Ans) to MT;
– if it is an internal message answer(RID, Q, ID, Ans) from MT or from one of the worker threads (WTs) in the SCU, COM looks up <RID, Q@S, Master> in the reception list, and then sends the inter-agent message answer(Self, Q@S, ID, Ans) to the master, where Self is the current agent's socket;
– if it is an internal message query(Q@S) from a WT, COM looks up the slave's socket in the address book using S, and then sends the inter-agent message query(Self, Q@S, start) to the slave.

3.3 Implementing the Speculative Computation Unit (SCU)

The SCU can be seen as a collection of concurrent threads. Specifically, there is a persistent manager thread (MT) and zero or more worker threads (WTs).
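How the SCU might set up these threads in YAP/SWI-style Prolog can be sketched as follows. thread_create/3 and message_queue_create/1 are the standard threading primitives; mt_loop/0, wt_loop/1 and the state layout are illustrative assumptions, standing in for the loops described in the remainder of this section:

    % create the persistent manager thread and its mailbox
    start_scu :-
        message_queue_create(mq_mt),               % MT's message queue
        thread_create(mt_loop, _Id, [alias(mt)]).

    % spawn a worker for a new top-level query (cf. the start(RID, Q) message)
    spawn_wt(QID, Goal) :-
        % initial worker state: <QID, PIDnew, Goal, {}, {}>
        thread_create(wt_loop(state(QID, Goal, [], [])), _Pid, []).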
MT is responsible for updating/revising the choice points/finish points and for spawning new WT(s) when a new query or answer is received; WTs are responsible for constraint processing.

The three stores AES, CPS and FPS are used and maintained by both MT and the WTs. AES stores three types of answer entries (AEs), all of which have the form <AID, Q@S, Type, Ans>, where AID is the entry ID, Q@S is the query and the slave alias, Type is the entry's type, and Ans is the set of constraints associated with the entry:
– If Type is o, then this is an original answer entry, and Ans is equal to the conjunction of the negations of all the defaults in ∆ for Q@S⁷ if there is any default, and is equal to true otherwise;
– If Type is d, then this is a default answer entry, and Ans is equal to a corresponding default answer for Q@S in ∆;
– otherwise, Type is r(ID) and this is an ordinary answer entry, where ID and Ans are from an answer returned by the slave S for Q.

⁷ I.e. the conjunction ⋀ {¬Cd | (Q@S ← Cd ‖) ∈ ∆}.

CPS stores the states of original processes (also called choice points (CPs)), each of which has the form <QID, PID, G, C, WA, AA>, where QID is the (top-level) query and its ID, PID is the process ID, G and C are the set of remaining sub-goals and the set of constraints collected so far, respectively, and WA and AA are the set of awaiting answer entries and the set of assumed answer entries, respectively. QID is used by a process to “remember” what query its computation is for, and hence has two components (RID-Qtop), where RID is the reception entry ID and Qtop is the initial query for the process. It is necessary to record Qtop so that, when a process finishes successfully (i.e. G becomes empty), the variable bindings between the answer (i.e. the set of constraints) and the initial query can be preserved. Each element in WA and AA has the form (AID, Q@S), where AID is the ID of an answer entry that the process is awaiting or assuming for the sub-goal Q@S. Note that it is also necessary to record Q@S here, despite having already recorded AID, because if an assumed answer later needs to be revised, the correct variable bindings between the query sent (to the slave) and the answer returned (from the slave) can then be obtained.

FPS stores the states of finished processes (also called finish points (FPs)), each of which has the form <QID, PID, C, AA>, where QID, PID and AA are as described above, and C is the final set of constraints collected, i.e. the answer already sent to the master for the query associated with QID.

Each WT represents an active process, and its state can be represented as <QID, PID, G, C, AA>. It is just like a CP except that it does not have the set of awaiting answer entries (i.e. no WA). It is also important to keep track of which AEs are currently assumed/awaited by which WTs, CPs and FPs. Such usage of AEs is recorded as subscriptions in a directory that is part of AES. Each subscription has the form sub(AID, PID), where AID is the answer entry ID and PID is the ID of a WT, CP or FP.

3.4 The Execution of the Manager Thread and the Worker Threads

The multi-threaded operational model is based on the pseudo-parallel (serialised) operational model proposed in [4], but with improved “process management” allowing true or-parallelism during the computation:
– In the serialised model, the computation alternates between the process reduction phase and the fact arrival phase. When it enters the process reduction phase, one active process at a time is selected for resolving a sub-goal.
In the multi-threaded model, each WT can enter the process reduction phase and resolve sub-goals independently of, and concurrently with, the others. No process selection is required.
– In the serialised model, when the computation enters the fact arrival phase, all the relevant processes (active or suspended) are updated, and the necessary new processes from original processes are created at the same time. In the multi-threaded model, the fact arrival phase is split between the MT and the WTs. The MT is responsible for revising the answers of existing finished processes (i.e. the finish points), updating original processes (i.e. the choice points) and creating appropriate new WTs from choice points. The MT also notifies the relevant WTs about a newly returned answer via messaging, but does not change the state of the WTs directly. On the other hand, when a WT receives such a notification from MT, it checks the consistency of the new answer independently of the others, and creates a new choice point if needed (e.g. in the case where it is assuming a default answer and an alternative answer is received). Different WTs can update themselves concurrently.

We now present the detailed execution steps of MT and the WTs.

Fig. 6: Execution of MT

Execution of MT (illustrated in Fig. 6): MT processes each message it receives from COM:
– if the message is start(RID, Q), it spawns a new WT with initial state ⟨QID, PIDnew, Q, ∅⁸, ∅⁹⟩, where QID = (RID, Q) and PIDnew is a new process ID;

⁸ This is the initially empty set of constraints.
⁹ This is the initially empty set of assumed answer entries.

– if the message is stop(RID), then
  1. it removes all the choice points in CPS and all the finish points in FPS that are associated with RID;
  2. it broadcasts a message stop(RID) to all the WTs;
– if the message is answer(Q@S, ID, Cnew):
  • if there exists an answer entry ⟨AID, Q@S, r(ID), Cold⟩ in AES, then the received answer is a revised answer (following Fig. 4):
    1. MT updates the existing answer entry to ⟨AID, Q@S, r(ID), Cnew⟩;
    2. for each WT subscribing to AID, MT sends a message rev(AID, Q@S, Cnew) to the WT (so that the WT can check Cnew for consistency);
    3. for each FP ⟨QID, PID, Cfinal, AA⟩ that subscribes to AID, with QID = (RID, Qtop): if Cfinal ≠ Cfinal ∧ Cnew, then MT sends a message answer(RID, Qtop, PID, Cfinal ∧ Cnew) to COM;
    4. for each CP ⟨QID, PID, G, C, WA, AA⟩ that subscribes to AID: if Call = C ∧ Cnew is consistent, then MT updates it to ⟨QID, PID, G, Call, WA, AA⟩; otherwise, MT removes the CP and the CP's subscriptions;
    5. let ⟨AIDo, Q@S, o, Co⟩ be the original answer entry for Q@S; for each choice point ⟨QID, PID, G, C, WA, AA⟩ that subscribes to AIDo and for which Call = C ∧ ¬Cold ∧ Cnew is consistent:
      ∗ if WA contains only (AIDo, Q@S), then MT creates a new WT with ⟨QID, PIDnew, G, Call, AA ∪ {(AID, Q@S)}⟩, and subscribes all the answer entries in AA plus that with AID for the new WT (i.e. for each (AID′, Q′@S′) ∈ AA ∪ {(AID, Q@S)}, it adds sub(AID′, PIDnew) to the directory in AES);
      ∗ otherwise, MT creates a new CP ⟨QID, PIDnew, G, Call, WA \ {(AIDo, Q@S)}, AA ∪ {(AID, Q@S)}⟩ in CPS, and subscribes all the answer entries in AA and in WA for the new CP;
  • otherwise, it is a first/alternative answer (following Fig. 2 and Fig. 3):
    1. MT creates a new answer entry ⟨AIDnew, Q@S, r(ID), Cnew⟩ in AES;
    2. for each default answer entry ⟨AIDd, Q@S, d, Cd⟩ in AES:
      ∗ for each WT subscribing to AIDd, MT sends a message alt(AIDnew, AIDd, Q@S, Cnew) to it;
      ∗ for each FP ⟨QID, PID, Cfinal, AA⟩ that subscribes to AIDd, with QID = (RID, Qtop): if Cfinal ≠ Cfinal ∧ Cnew, then MT sends a message answer(RID, Qtop, PID, Cfinal ∧ Cnew) to COM;
      ∗ for each CP ⟨QID, PID, G, C, WA, AA⟩ that subscribes to AIDd:
        (a) MT updates the CP to ⟨QID, PID, G, C, WA ∪ {(AIDd, Q@S)}, AA \ {(AIDd, Q@S)}⟩;
        (b) if Call = C ∧ Cnew is consistent, then
          · if WA contains only (AIDd, Q@S), then MT creates a new WT with ⟨QID, PIDnew, G, Call, AA ∪ {(AID, Q@S)}⟩, and subscribes all the answer entries in AA plus that with AID for the new WT;
          · otherwise, MT creates a new CP ⟨QID, PIDnew, G, Call, WA \ {(AIDd, Q@S)}, AA ∪ {(AID, Q@S)} \ {(AIDd, Q@S)}⟩ in CPS, and subscribes all the answer entries in AA ∪ WA ∪ {(AID, Q@S)} \ {(AIDd, Q@S)} for the new CP;
    3. let ⟨AIDo, Q@S, o, Co⟩ be the original answer entry for Q@S; for each choice point ⟨QID, PID, G, C, WA, AA⟩ that subscribes to AIDo and for which Call = C ∧ Co ∧ Cnew is consistent:
      ∗ if WA contains only (AIDo, Q@S), then MT creates a new WT with ⟨QID, PIDnew, G, Call, AA ∪ {(AID, Q@S)}⟩, and subscribes all the answer entries in AA plus that with AID for the new WT;
      ∗ otherwise, MT creates a new CP ⟨QID, PIDnew, G, Call, WA \ {(AIDo, Q@S)}, AA ∪ {(AID, Q@S)}⟩ in CPS, and subscribes all the answer entries in AA ∪ WA ∪ {(AID, Q@S)} \ {(AIDo, Q@S)} for the new CP.

Fig. 7: Execution of WT ((a) Fact Arrival Phase; (b) Process Reduction Phase)

Execution of WT (illustrated in Fig. 7): The execution of a WT can be seen as a loop with the following steps performed at each iteration (let its state at the start of the iteration be ⟨QID, PID, G, C, AA⟩):
– If there is an internal message received by the WT (i.e. from MT), it enters the Fact Arrival Phase:
  • if the message is rev(AID, Q@S, Cr) where (AID, Q@S) ∈ AA (see Fig. 4), let Call = C ∧ Cr: if Call is consistent, then the WT continues with ⟨QID, PID, G, Call, AA⟩; otherwise, the WT removes all of its subscriptions in AES and terminates;
  • if the message is alt(AIDa, AIDd, Q@S, Ca) where AIDd is the ID of a default answer entry (following Fig. 2),
    1. it creates a new CP ⟨QID, PIDnew, G, C, {(AIDd, Q@S)}, AA \ {(AIDd, Q@S)}⟩ in CPS, and subscribes all the answer entries in AA for the new CP;
    2. if Call = C ∧ Ca is consistent, then the WT continues with ⟨QID, PID, G, Call, AA ∪ {(AIDa, Q@S)} \ {(AIDd, Q@S)}⟩; otherwise, it removes all of its subscriptions and terminates;
  • if the message is stop(RID), and RID is equal to the query ID in QID, then the WT removes all of its subscriptions and terminates;
– Otherwise, it enters the Process Reduction Phase and tries to select an atom L from G:
  • if G is empty and thus no L can be selected, the current computation succeeds:
    1. letting QID = (RID, Qtop), the current WT sends a message answer(RID, Qtop, PID, C) to COM;
    2. it creates an FP ⟨QID, PID, C, AA⟩ and then terminates. Note that it needs neither to make answer entry subscriptions for the new FP nor to remove its own subscriptions, because the new FP “inherits” them;
  • if L is not an askable atom, then for every rule R such that Cnew = C ∧ (L = head(R)) ∧ const(R) is consistent, the current WT spawns a new WT with state ⟨QID, PIDnew, G \ {L} ∪ body(R), Cnew, AA⟩ and subscribes all the answer entries in AA for the new WT.
Then the current WT removes all of its subscriptions and terminates¹⁰;
  • if L is an askable atom Q@S (where S must be ground): if there exists (AID, Q′@S) ∈ AA such that Q and Q′ are identical (i.e. not merely variants), then the WT continues with ⟨QID, PID, G \ {L}, C, AA⟩¹¹. Otherwise (following Fig. 1),
    1. it collects (AIDo, AIDS) from AES as follows:
      ∗ if there exist some ordinary answer entries for Q@S, then there must exist an original answer entry for Q@S too. Let AIDo be the original answer entry ID, and AIDS be the set of ordinary answer entry IDs whose associated answer constraints are consistent with C;
      ∗ otherwise,
        (a) if there exists no original answer entry for Q@S, then the WT
          i. creates one, ⟨AIDnew, Q@S, o, Co⟩, in AES, where Co is the conjunction of the negations of all the default constraints for Q@S in ∆ if there is some default constraint, or true if there is none;
          ii. creates a default answer entry ⟨AIDᵢnew, Q@S, d, Cdᵢ⟩ for each default constraint Cdᵢ for Q@S in ∆;
          iii. sends a message query(Q@S) to COM;
        (b) let AIDo be the original answer entry ID, and AIDS be the set of default answer entry IDs whose associated answer constraints are consistent with C;
    2. for each answer entry ⟨AID, Q@S, Type, Ca⟩ such that AID ∈ AIDS, the current WT spawns a new WT with state ⟨QID, PIDnew, G \ {Q@S}, C ∧ Ca, AA ∪ {(AID, Q@S)}⟩ and subscribes all the answer entries in AA ∪ {(AID, Q@S)} for the new WT;
    3. the current WT creates a new CP ⟨QID, PIDnew, G \ {Q@S}, C, {(AIDo, Q@S)}, AA⟩ in CPS, and subscribes all the answer entries in AA plus that with AIDo for the new CP;
    4. the current WT removes all of its subscriptions and terminates¹².

¹⁰ As an optimisation, if there are N > 0 possible new processes (states), then only N − 1 new WTs are spawned, and the current WT continues as the Nth process.
¹¹ This is an optimisation of the original operational model, which prevents unnecessary new processes (threads) from being created.
¹² An optimisation similar to that of footnote 10 can be applied.

3.5 Resolving Concurrency Issues

Inside the SCU, MT and the WTs execute concurrently, and they all require read/write access to the three stores AES, CPS and FPS. Potential conflicts between MT and a WT, or between WTs, may arise. Firstly, it is possible that, after a WT spawns several child WTs and before it has made all the answer entry subscriptions for the children, MT receives an answer and notifies only some of the children (i.e. while the subscription process is not yet complete). Secondly, when two WTs encounter the same askable atom at the same time, and there is no original answer entry for that atom yet, the original answer entry may be created twice and the query may be sent twice, once by each WT. Hence, the three stores are considered “critical regions” and need to be protected.
As long as the children WTs do not start working until their parent WT has made all the correct subscriptions for them, there won’t be any conflict. Also, WTs can only create new choice points in CPS and create new finish points in FPS according to their own states, there is no potential conflict of updating CPS and FPS either. Therefore, the execution of a MT’s message handling step cannot (safely) interleave with that of the process reduction step or the fact arrival step of any WT, but the executions of WTs’ steps can interleave without problems. To impose such control, we have introduced an atomic counter13 called the “busy worker counter” (BC). Whenever a WT starts to perform a fact arrival step or reduction step, it will increment BC; and whenever it finishes one step, it will decrement BC. We also introduce an atomic flag called the “waiting/working manager flag”(W F ). Whenever MT receives an answer, it will set W F to 1; and when MT finishes handling one returned answer, it will clear W F to 0. The safe exclusive execution control between MT and WTs using BC and W F are as follows14 , WT MT Loop: Loop: 1. (atomic step) waits for W F to be 1. waits for a returned answer; cleared and then increments BC; 2. sets W F 2. performs either fact arrival step or re- 3. waits for BC to reach 0; duction step; 4. handles returned answer; 3. decrements BC 5. clears W F Hence, whenever a WT performing a fact arrival step or process reduction step, MT is not allowed to process any received answer; whenever MT has an answer waiting to be processed or being processed, no WT can perform a new step. Let’s now consider the second problem. The potential conflict is between two WTs when they both try to collect/create answer entries for an askable goal. The solution is relatively easy: we have introduced a mutex MAES and control the WT’s execution as follows, When a WT tries to collect answer entries for Q@S: – if an original answer entry for Q@S exists in AES, continues as normal; – otherwise, (1) locks MAES ; (2) if AES still doesn’t contain an original answer entry for Q@S, then creates the original and default answer entries, and then sends out the query; (3) unlocks MAES . 13 14 I.e. its value update is atomic. Pseudo-code in Prolog is provided in Appendix A. 118 The operation of locking a mutex succeeds immediately if the mutex hasn’t been locked by any other thread yet; otherwise it causes the current thread to be suspended. The suspended thread is revived only when the mutex is unlocked, and then the revived thread tries again to lock the mutex. In the above example, it is possible that while a thread is waiting to lock MAES , the thread already locking MAES creates the answer entries. Therefore, in Step 2 checking again whether an original answer entry exists is necessary. 4 Discussions The proposed mutli-threaded implementation is implemented in YAP Prolog [7]. We chose YAP not only because it has the necessary CLP and multi-threading supports, but also because it is considered as the one of the fastest Prolog engines that is free and open source. We have tested the implementation with meeting scheduling examples described in [4] but with increased size. During the testing, we used YAP’s default maximum number of WTs of 100 and were able to compute the correct answers within the order of 1 second. For large problems, e.g. if a query would lead to more than 10 (non-askable) sub-goals, each with more than 10 rules with constraints that are always consistent, the number of WTs would exceed 100. 
Our implementation is able to cope with such problems by setting a higher WT limit, e.g. 1000, at the expense of the initial memory consumed by YAP¹⁵.

¹⁵ 100 maximum threads in YAP require about 2MB of memory, 1000 threads require about 4MB, and 9999 threads require about 109MB.

In practice, to strike a balance between the number of WTs and the memory consumption, our implementation can be adapted to use a hybrid approach, implementing two types of WTs: normal workers and a super worker. A normal worker would execute as an active process, as described in the multi-threaded model. The super worker would behave like the serialised model [4] and manage several processes in a round-robin fashion. In this way, memory consumption would be reduced whilst maintaining the effect of a high number of WTs. For example, let M be the maximum number of WTs that an agent's SCU can have; then there can be at most M − 1 normal workers and one super worker. During the computation, when there are N (N > M − 1) active processes, M − 1 of them are handled by the normal workers, and the rest are handled by the super worker. When an active process terminates (either by failure or by finishing), the normal worker can release it and acquire another active process state from the super worker to continue.

5 Conclusion

In this paper, we have presented a practical multi-threaded implementation for speculative constraint processing with iterative revision for disjunctive answers, and suggested a hybrid implementation for situations where multi-threading support is limited by resource constraints. Although the implementations are based on the operational model described in [4], which is for simple master-slave systems where only the master can perform speculative computation, they are designed to be extendable to hierarchical master-slave systems. As future work, we will prove the correctness of an extended operational model for a hierarchy of master-slave agents and extend the current implementation to support this more general type of multi-agent system.

References

1. Satoh, K., Inoue, K., Iwanuma, K., Sakama, C.: Speculative computation by abduction under incomplete communication environments. In: ICMAS. (2000) 263–270
2. Satoh, K., Yamamoto, K.: Speculative computation with multi-agent belief revision. In: AAMAS. (2002) 897–904
3. Satoh, K., Codognet, P., Hosobe, H.: Speculative constraint processing in multi-agent systems. In: PRIMA. (2003) 133–144
4. Ceberio, M., Hosobe, H., Satoh, K.: Speculative constraint processing with iterative revision for disjunctive answers. In: CLIMA VI. (2005) 340–357
5. Satoh, K.: Speculative computation and abduction for an autonomous agent. IEICE Transactions 88-D(9) (2005) 2031–2038
6. Inoue, K., Kawaguchi, S., Haneda, H.: Controlling speculative computation in multi-agent environments. In: Proc. Second Int. Workshop on Computational Logic in Multi-Agent Systems (CLIMA-01). (2001) 9–18
7. YAP Prolog 5.1.3 manual. http://www.dcc.fc.up.pt/~vsc/Yap/index.html (June 2008)

A Pseudo-code for the Implementation of Exclusive Control between the Manager Thread and Worker Threads

YAP Prolog provides only message queues and mutexes for multi-threading support [7].
% " m _ b c " a n d " m _ w f " a r e t h e m u t e x e s f o r BC a n d WF ; % " v _ b c " is t h e c o u n t e r f o r BC % " m q _ b c " is t h e m e s s a g e q u e u e f o r n o t i f i c a t i o n s % a b o u t BC % f o r WT wt_loop :mutex_lock ( m_wf ) , mutex_lock ( m_bc ) , mutex_unlock ( m_wf ) , increment ( v_bc) , mutex_unlock ( m_bc ) , // process reduction or fact arrival step mutex_lock ( m_bc ) , decrement ( v_bc) , ( v_bc (V ) , V == 0 -> send_notification_to ( mq_bc ) ; true ), mutex_unlock ( m_bc ) , wt_loop . % f o r MT mt_loop :// wait for received answer , mutex_lock ( m_wf ) , wait_for_zero_bc , // handle received answer mutex_unlock ( m_wf ) , mt_loop . wait_for_zero_bc :mutex_lock ( m_bc ) , clear_any_notification_ in ( mq_bc ) , ( v_bc( V) , V > 0 -> mutex_unlock ( m_bc ) , wait_for_notification_in ( mq_bc ) , wait_for_zero_bc ; mutex_unlock ( m_bc ) ). 120     ½       ¾                   ½            ! "#   $   " % & ¾          '    ( ) '  '* +       )     ' '              )     (! !     '* &!   '         )        )'  * & ,     !   - !       .     )  '* /- ()   ) -' -  )  .    ' ' !-  -    '   )  '    ()  '    )  ,   -*  )     ' '      ') ) '  ' ' *    )    -   ' !   ''*                                                                                                                                                    !"# $%℄                                                                                          '                                   (                               )           *        +            ½  ¾           ,              ½   ,        ¾   ,               121                                                              )    '           )                      ½              ¾        '                                                         '             ½ ¾   )    +               ½  ¾     -           ½          ¾                 ,                     . !#  ,                  !#                        !#                        !#                      /   !#                        ,  ,               !# ,                     0+                   ½     ¾             ½               ¾            1       !#    . !#                     !#                           2                    )     $%3℄   ,            +       ,                                                    *                                                                           ,          +                       0+          ,  2              (      2          )               ,     2           1        4             5                   +           6    3                      7         8       122                      (  $8℄  +  2     !'9"#         . ½      ·½           ·½                !%#                    !0#           ,            ,                      ·            ½         ·½         ·½                      :         ·        '9"                                                                             !   
A Characterization of Mixed-Strategy Nash Equilibria in PCTL Augmented with a Cost Quantifier

Pedro Arturo Góngora and David A. Rosenblueth
Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, A.P. 20-726, C.P. 01000, México D.F., México
[email protected], [email protected]

Abstract. The game-theoretic approach to multi-agent systems has been incorporated into the model-checking agenda by using temporal and dynamic logic to characterize notions such as Nash equilibria. Recent efforts concentrate on pure-strategy games, where intelligent agents act deterministically, guided by utility functions. We build upon this tradition by incorporating stochastic actions. First, we present an extension to the Probabilistic Computation-Tree Logic (PCTL) to quantify and compare expected costs.
Next, we give a discrete-time Markov chain codification for mixed-strategy games. Finally, we characterize mixed-strategy Nash equilibria.

1 Introduction

As a decision theory for multi-agent settings, game theory is undoubtedly of interest to Computer Science and Artificial Intelligence. Recent works have incorporated this interest into the model-checking agenda, characterizing various game-theoretic notions in temporal and dynamic logic (cf. [1–3]). These works concentrate on pure-strategy games, where intelligent agents act deterministically guided by utility functions. The focus has been on characterizing notions such as Nash equilibria, Pareto optimality, and dominating/dominated strategies. In this paper, we build upon this tradition by incorporating stochastic actions, focusing on the characterization of mixed-strategy Nash equilibria for finite strategic games.

Previous works include, but are not limited to, characterizations of Nash equilibria. In [1] the author gives a characterization of backward induction predictions (i.e., Nash equilibria for extensive-form games) using a branching-time logic. In [2] the authors proceed in a similar vein, but using the richer language of Propositional Dynamic Logic (PDL). Another similar approach is in [3], where the authors introduce Alternating-Time Temporal Logic (ATL) augmented with a counterfactual operator. This extension to ATL allows us to express properties such as "if player 1 committed to strategy a, then ϕ would follow". Counterfactual reasoning is then used to characterize Nash equilibria for strategic-form games. Further works emphasize other game-theoretic notions, such as automated mechanism design (cf. [4, 5]). None of these previous works handle mixed strategies. The first approach, to our knowledge, including stochastic actions is in [6], where the authors make a quantitative analysis of a bargaining game. They, however, do not provide a characterization of Nash equilibria.

We start from Probabilistic Computation-Tree Logic (PCTL, [7]) augmented with costs as our underlying framework and proceed as follows. First, we present an extension to PCTL for quantifying the values in the expected-cost formulas (e.g., in E⊲⊳x[ϕ], x might be existentially or universally quantified). Next, we give a discrete-time Markov chain codification of a finite strategic game. The codification consists in unfolding the outcomes of a game, under a mixed-strategy profile, into a tree-like structure that models the possibilities of action for each agent. Finally, we give a simple formula of the extended logic characterizing Nash equilibria under our codification.

The rest of the paper is organized as follows. Section 2 is devoted to presenting all the definitions used from game theory. In Sect. 3 we introduce discrete-time Markov chains and PCTL with costs. In Sect. 4 we present the cost-quantifier extension to PCTL and discuss its model checking. In Sect. 5 we present the game codification on Markov chains, a characterization of a Nash equilibrium, and prove its correctness. We finish with some final thoughts and a discussion of future and related work.

2 Strategic Games

Game theory studies the interaction between rational agents. Here, rationality is directly related to the maximization of utility. A game is just a formal description of that interaction. We will deal with games in which the sets of possible actions are those of individual players, sometimes called non-cooperative.
For brevity, we will refer to non-cooperative games simply as games. Of the two formalizations for games, strategic and extensive games, we will use the former, as such a formalization can incorporate probabilistic actions. There exist several concepts of solution for games of which, arguably, the most widely known is that of Nash equilibrium. Broadly speaking, a Nash equilibrium is characterized by the decisions made by all players of a game, such that no player can increase her/his payoff by taking another action, assuming that every other player will stick to her/his decision. This section is based on the first chapters of [8], to which we refer the reader for a more thorough discussion.

Definition 1 (Finite Strategic Game). A finite strategic game is a structure:

G = ⟨N, {Ai}i∈N, {ui}i∈N⟩

where N = {1, ..., n} is a finite set of n agents, Ai is a finite set of the pure strategies of agent i, ui : A → ℝ is the payoff or utility function of agent i, and A = ×i∈N Ai is the set of all pure-strategy profiles of G.

Example 1 (Bach or Stravinsky). Consider the game known as Bach or Stravinsky (BoS) for players 1 and 2. The players wish to decide which concert with music by one of two composers they will go to. Player 1 prefers Bach twice as much, while player 2 prefers Stravinsky twice as much. Both players prefer to go to either concert over disagreement. Each player makes her/his choice independently of the other, but taking into account that preferences are common knowledge among them.

Two-player finite strategic games can be described using payoff matrices. The matrix shown in Fig. 1 defines the utility functions for BoS, e.g., u1(B1, B2) = 2, u2(B1, B2) = 1.

        B2      S2
B1    2, 1    0, 0
S1    0, 0    1, 2

Fig. 1. Payoff matrix for the strategic game BoS

We use the following notational conventions. We use Latin letters a and a′ to range over the set A of strategy profiles. If a is a strategy profile, we use ai to refer to the strategy of agent i specified in a. Also, as a notational abuse, we denote with a−i the strategy profile which specifies the strategies of every agent but i, such that if ai ∈ Ai, then (a−i, ai) ∈ A. We also assume that the sets Ai are pairwise disjoint and, when it is clear, we will identify a strategy profile a ∈ A with another n-tuple a′ iff they contain exactly the same elements regardless of the order.

Definition 2 (Best-Response Strategy and Nash Equilibrium). Given a finite strategic game G = ⟨N, {Ai}i∈N, {ui}i∈N⟩, we say that a strategy ai is a best response to strategy profile a iff ui(a−i, ai) ≥ ui(a−i, a′i) for each a′i ∈ Ai. We say that a strategy profile a is a Nash equilibrium of G iff every strategy ai such that a = (a−i, ai) is a best response to a itself.

Consider the previous definition and the matrix in Fig. 1. We can easily verify that both strategy profiles (B1, B2) and (S1, S2) are Nash equilibria of BoS (Example 1).
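Definition 2 lends itself to a direct computational check. The following minimal Python sketch (not part of the paper's formal apparatus; the encoding of the payoff matrix is illustrative) enumerates the pure-strategy Nash equilibria of BoS:

```python
from itertools import product

# Payoff matrix for BoS (Fig. 1): u[(a1, a2)] = (u1, u2).
u = {
    ("B1", "B2"): (2, 1), ("B1", "S2"): (0, 0),
    ("S1", "B2"): (0, 0), ("S1", "S2"): (1, 2),
}
A1, A2 = ["B1", "S1"], ["B2", "S2"]

def is_pure_nash(a1, a2):
    """Check Definition 2: each strategy is a best response to the profile."""
    best1 = all(u[(a1, a2)][0] >= u[(d1, a2)][0] for d1 in A1)
    best2 = all(u[(a1, a2)][1] >= u[(a1, d2)][1] for d2 in A2)
    return best1 and best2

print([p for p in product(A1, A2) if is_pure_nash(*p)])
# [('B1', 'B2'), ('S1', 'S2')]
```

The output reproduces the two pure equilibria identified above.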
Definition 3 (Mixed Extension of a Game). Let ∆(B) be the set of all probability distributions over the finite set B. For any finite strategic game:

G = ⟨N, {Ai}i∈N, {ui}i∈N⟩

we define its mixed extension as the structure:

Ĝ = ⟨N, {∆(Ai)}i∈N, {Ui}i∈N⟩

where ∆(Ai) is the set of all the mixed strategies of player i, Ui : Â → ℝ is the mathematical expectation of utility with respect to the probability measure induced by a mixed-strategy profile, and Â = ×i∈N ∆(Ai) is the set of all mixed-strategy profiles of Ĝ.

We use Greek letters α and α′ to range over Â. All other notational conventions for pure-strategy games are used as well for their mixed extensions. As αi is a probability distribution over Ai, we use αi(ai) to denote the probability assigned by αi to the event that pure strategy ai is selected. For a mixed strategy αi, the set of elements of Ai to which αi assigns probability greater than 0 is called the support of αi. We denote by supp(αi) the subset of Ai whose elements are in the support of mixed strategy αi. We say that a mixed strategy αi degenerates to a pure strategy ai iff it assigns probability 1 to the event ai, i.e., αi(ai) = 1. Finally, we say that a mixed-strategy profile α is a Nash equilibrium of a game G if it is a Nash equilibrium of its mixed extension Ĝ.

The expected utility under some mixed-strategy profile is the mean value of such a utility. For some mixed-strategy profile α and player i, the utility function is determined by:

Ui(α) = Σ_{a∈A} pα(a)·ui(a),    where pα(a) = Π_{j∈N} αj(aj)

The following theorem provides a useful characterization of Nash equilibria. See Lemma 33.2 in [8, p. 33] for a similar characterization and a proof for the if direction.

Theorem 1. Given any finite strategic game G = ⟨N, {Ai}i∈N, {ui}i∈N⟩, a mixed-strategy profile α is a Nash equilibrium of G iff the following two conditions hold for each player i ∈ N:

1. The equality Ui(α−i, ai) = Ui(α−i, a′i) holds for each (degenerate strategy) ai and a′i in supp(αi).
2. The inequality Ui(α) ≥ Ui(α−i, ai) holds for each (degenerate strategy) ai in Ai − supp(αi).

Proof. For the first part, suppose that the equation Ui(α−i, ai) = Ui(α−i, a′i) does not hold for some i. Then either side must be greater than the other, but that contradicts the hypothesis of α being a Nash equilibrium, as i could increase his/her utility by assigning more probability to the strategy that increases his/her utility. The second part follows from the definition of Nash equilibria. The converse is direct: if both parts hold for each i, then it is impossible to increase some agent's utility by increasing the probability for some strategy (both parts show the worst-case probability of 1 for each strategy and agent), hence the profile is a best response to itself. □

Consider again the matrix in Fig. 1. We can use Theorem 1 to verify that the mixed-strategy profile α = ((2/3, 1/3), (1/3, 2/3)) is a Nash equilibrium for BoS. For example, for player 1, we replace α1 with one of the degenerate mixed strategies that assigns probability 1 to B1 or S1, and compare the expected utility in both cases. For B1 = (1, 0) and S1 = (0, 1) we have U1(B1, (1/3, 2/3)) = U1(S1, (1/3, 2/3)) = 2/3. We can follow the same procedure for player 2 to conclude that α is a Nash equilibrium for BoS.
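The verification just carried out by hand can also be done numerically. The sketch below (a hedged illustration, reusing the BoS encoding from the earlier sketch) computes Ui(α) from the formulas above and checks condition 1 of Theorem 1 for player 1:

```python
from itertools import product

u = {("B1", "B2"): (2, 1), ("B1", "S2"): (0, 0),
     ("S1", "B2"): (0, 0), ("S1", "S2"): (1, 2)}
A = [["B1", "S1"], ["B2", "S2"]]  # pure strategies of players 0 and 1

def U(i, alpha):
    """Expected utility U_i(alpha) = sum over a of p_alpha(a) * u_i(a)."""
    total = 0.0
    for profile in product(*A):
        p = 1.0
        for j, aj in enumerate(profile):
            p *= alpha[j][aj]
        total += p * u[profile][i]
    return total

alpha = [{"B1": 2/3, "S1": 1/3}, {"B2": 1/3, "S2": 2/3}]

# Theorem 1, condition 1: every supported pure strategy of player 1
# yields the same expected utility against alpha_{-1}.
for a0 in A[0]:
    degenerate = [{s: 1.0 if s == a0 else 0.0 for s in A[0]}, alpha[1]]
    print(a0, U(0, degenerate))   # both print 0.666...
```

Both degenerate strategies yield 2/3, matching the computation in the text.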
3 Markov Chains and PCTL

PCTL formulas describe qualitative and quantitative properties of probabilistic systems, sometimes modeled as Markov chains. These formulas address properties such as "the probability of getting p satisfied is at least one half", or "the expected cost (or reward) of getting p satisfied is at most 10". This section has the purpose of introducing Markov chains and PCTL. We first introduce Markov chains, which will serve as the semantic model for PCTL formulas. Next, we introduce PCTL syntax and satisfaction. For details on the material presented in this section, we refer the reader to the original paper [7], and also to the book [9].

Definition 4 (Discrete-Time Markov Chain). A Discrete-Time Markov Chain (DTMC) is a structure:

M = ⟨S, sinit, P, C, AtProp, ℓ⟩

where S is a finite set whose elements are called states, sinit is a distinguished element of S which is called the initial state, P : S × S → [0, 1] is a transition probability function such that for any state s ∈ S, Σ_{s′∈S} P(s, s′) = 1, C : S → [0, ∞) is a cost function, AtProp is a set of countably many atomic propositions, and ℓ : S → 2^AtProp is a labelling function that marks each state in S with a subset of AtProp.

Post_M(s) = {s′ | P(s, s′) > 0} is the set of states which are possible to visit from s in one step. A path of a DTMC M is a possibly infinite sequence of states π = s0 s1 ⋯ such that for any si and si+1, P(si, si+1) > 0. A path is finite if the sequence is finite. We denote by Paths_M the set of all infinite paths of M, and by Paths^fin_M the set of all finite paths of M. Given a path π = s0 s1 ⋯ si ⋯, we use π[i] = si to refer to the i-th element of π, and π[0, i] to refer to the finite prefix s0 ⋯ si of π. The set Paths_M(s) = {π | π ∈ Paths_M and π[0] = s} denotes the set of all infinite paths of M beginning with s. Similarly, the set Paths^fin_M(s) = {π | π ∈ Paths^fin_M and π[0] = s} denotes the set of all finite paths of M beginning with s.

For any finite path π, the cylinder set of π is the set Cyl(π) = {π′ | π′ has the prefix π}. The probability measure Pr_s associated with a DTMC M and state s is that of the smallest σ-algebra Σ_s that contains all the cylinder sets Cyl(π), for π ∈ Paths^fin_M(s). For finite paths π = s0 ⋯ sn, the probability of π is defined as P(π) = Π_{i<n} P(si, si+1). The probability of Cyl(π) under Pr_s is determined by Pr_s(Cyl(π)) = P(π). Let {Ci}i∈I be a collection of pairwise disjoint cylinder sets for some countable index I. The probability of the countable union ⋃_{i∈I} Ci is determined by Pr_s(⋃_{i∈I} Ci) = Σ_{i∈I} Pr_s(Ci).

The application C(s) for some s in DTMC M denotes the cost (or reward, depending on the model in consideration) gained at leaving state s. Then, for any finite π = s0 ⋯ sn in Paths^fin_M the cumulative cost of π is defined by Cost_M(π) = Σ_{0≤i<n} C(si). Note that the cost of leaving the last state of a path is not in the sum, and that for paths consisting of a single state s, Cost_M(s) = 0. For an infinite path π ∈ Paths_M(s) and A ⊆ S, we define the cumulative cost of reaching a state in A as:

Cost_M(π, A) = Cost_M(π[0, n])  if ∃n ≥ 0 : π[n] ∈ A ∧ ∀0 ≤ i < n : π[i] ∉ A
Cost_M(π, A) = ∞               otherwise

For some state s and A ⊆ S, we define the set {s |= FA} of all finite paths π = s0 ⋯ sn such that s0 = s, sn ∈ A and ∀0 ≤ i < n : si ∉ A. Note that the set {s |= FA} is measurable, therefore Pr_s({s |= FA}) is the probability of reaching a state in A from s. We now define the expected cumulative cost of reaching a state in A from s as:

ExpCost_M(s, A) = Σ_{π∈{s|=FA}} P(π)·Cost_M(π)  if Pr_s({s |= FA}) = 1
ExpCost_M(s, A) = ∞                              otherwise

Definition 5 (PCTL Well-formed Formulas). The set of well-formed formulas ϕ of PCTL for some countable set of atomic propositions AtProp is defined as the set generated by the following BNF grammar:

ϕ ::= ⊤ | p | ¬ϕ | (ϕ ∧ ϕ) | P⊲⊳a[τ] | E⊲⊳c[ϕ]
τ ::= X ϕ | ϕ U ϕ

where p ∈ AtProp, a ∈ [0, 1], c ∈ [0, ∞) and ⊲⊳ ∈ {<, >, ≤, ≥}.

PCTL formulas describe properties of the infinite computations of a probabilistic system. We can study two classes of formulas: path or temporal formulas and state formulas. Path formulas inherit their meaning from LTL.
X ϕ is satisfied by paths in which the next state satisfies ϕ. ϕ U ψ is satisfied by paths where there exists a future or present state that satisfies ψ, while all the previous states satisfy ϕ. State formulas inherit their meanings from CTL. The formula ⊤ is satisfied by every DTMC at every state. The formulas ¬ϕ, for negation, and (ϕ ∧ ψ), for conjunction, have their usual meanings. The CTL path quantifiers are replaced with the operator P. A formula P⊲⊳a[τ] means that the probability of the temporal formula τ being satisfied is ⊲⊳ a. E⊲⊳c[ϕ] is satisfied at states where the expected cost of reaching another state where ϕ is satisfied is ⊲⊳ c. The other connectives from propositional logic are defined as usual:

⊥ = ¬⊤
(ϕ ∨ ψ) = ¬(¬ϕ ∧ ¬ψ)
(ϕ → ψ) = (¬ϕ ∨ ψ)
(ϕ ↔ ψ) = ((ϕ → ψ) ∧ (ψ → ϕ))

where ⊥ is not satisfied by any DTMC at any state, (ϕ ∨ ψ) is a disjunction, (ϕ → ψ) is a material implication and (ϕ ↔ ψ) is a biconditional. We also define the following derived formulas:

P⊲⊳a[Fϕ] = P⊲⊳a[⊤ U ϕ]
P⊲⊳a[Gϕ] = P⊲⊳̄(1−a)[F¬ϕ]
P=a[τ] = (P≥a[τ] ∧ P≤a[τ])
E=a[ϕ] = (E≥a[ϕ] ∧ E≤a[ϕ])

where ⊲⊳̄ denotes the complementary comparison: <̄ = >, >̄ = <, ≤̄ = ≥ and ≥̄ = ≤. The derived path formulas also inherit their meanings from LTL. Fϕ is satisfied by paths where there exists a future or present state that satisfies ϕ. Gϕ is satisfied by paths where ϕ is satisfied at every state of the path.

Definition 6 (PCTL Satisfaction). Let M = ⟨S, sinit, P, C, AtProp, ℓ⟩ be a DTMC. The satisfaction relation |= between pairs (M, s) with s ∈ S and well-formed formulas with atomic propositions in AtProp is defined as the smallest relation such that:

(M, s) |= ⊤
(M, s) |= p ⇔ p ∈ ℓ(s)  (p ∈ AtProp)
(M, s) |= ¬ϕ ⇔ (M, s) ⊭ ϕ
(M, s) |= (ϕ ∧ ψ) ⇔ (M, s) |= ϕ and (M, s) |= ψ
(M, s) |= P⊲⊳a[τ] ⇔ ps(τ) ⊲⊳ a
(M, s) |= E⊲⊳c[ϕ] ⇔ es(ϕ) ⊲⊳ c

where the functions ps(τ) and es(ϕ) are the following:

ps(τ) = Pr_s({π ∈ Paths_M(s) | π |= τ})
es(ϕ) = ExpCost_M(s, {s′ | (M, s′) |= ϕ})

Pr_s is the probability measure described before, and the relation |= between paths in Paths_M and temporal formulas is defined as:

π |= X ϕ ⇔ π[1] |= ϕ
π |= ϕ U ψ ⇔ ∃n ≥ 0 : ∀i < n : π[i] |= ϕ ∧ π[n] |= ψ

If there is some ϕ such that (M, sinit) |= ϕ, then we say that ϕ is initially satisfied, and write M |= ϕ.

Note that the set {π ∈ Paths_M(s) | π |= τ} is a measurable set. The case τ = X ϕ is straightforward. When τ = ϕ U ψ, the set coincides with the countable union of cylinder sets Cyl(π′), for finite prefixes π′ of π such that only the last state sn satisfies ψ, and all the previous states si satisfy ϕ.

Given a DTMC M, a state s of M and a PCTL formula ϕ, the problem of deciding whether (M, s) |= ϕ is called the PCTL model-checking problem. The basic algorithm for solving the model-checking problem consists in recursively computing the set Sat(ϕ) = {s ∈ S | (M, s) |= ϕ}. The computation of Sat for atomic formulas is given by the labelling function ℓ. Only basic set operations are needed for computing Sat for formulas with basic logical connectives. The computation of Sat for formulas P⊲⊳a[τ] and E⊲⊳c[ϕ] involves the calculation of reachability probabilities and expected costs for every state. These tasks can be reduced to the problem of finding a solution to a system of linear equations. The explanation of these algorithms is out of the scope of this paper; for detailed explanations we refer the reader to [7, 9].
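To give a feel for the reachability and expected-cost computations just mentioned, here is a minimal sketch. It replaces the exact linear-equation solution with a fixed-point iteration, and the four-state chain, its probabilities and its costs are invented for illustration:

```python
# Sketch of the computations behind Sat for P and E formulas.
P = {  # P[s][t] = transition probability
    "s0": {"s1": 0.5, "s2": 0.5},
    "s1": {"goal": 1.0},
    "s2": {"s0": 0.5, "goal": 0.5},
    "goal": {"goal": 1.0},
}
C = {"s0": 1.0, "s1": 2.0, "s2": 1.0, "goal": 0.0}
target = {"goal"}

# Fixed-point iteration for p(s) = Pr_s({s |= F target}); a model checker
# would instead solve the corresponding linear equation system exactly.
p = {s: (1.0 if s in target else 0.0) for s in P}
e = {s: 0.0 for s in P}  # expected cost; valid here because p(s) = 1
for _ in range(1000):
    p = {s: (1.0 if s in target else sum(pr * p[t] for t, pr in P[s].items()))
         for s in P}
    e = {s: (0.0 if s in target else
             C[s] + sum(pr * e[t] for t, pr in P[s].items()))
         for s in P}
print(p["s0"], e["s0"])  # approx. 1.0 and 10/3
```

The iteration converges to the least solution of the same equations the linear-algebraic method solves directly.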
4 A Cost Quantifier for PCTL

In this section, we present the language of Cost-Quantified PCTL (CQ-PCTL). CQ-PCTL extends its ancestor by adding the possibility to quantify the values of the expected-cost operator. There is, however, a syntactic constraint on the quantified formulas: the occurrence of quantified variables cannot be nested. We first define the syntax of the modified language, followed by the algorithm for model checking.

The syntax of CQ-PCTL is almost the same as that of PCTL. We modify the definition of expected-cost formulas and add an extra clause to the grammar defining the syntax of PCTL formulas.

Definition 7 (CQ-PCTL Well-formed Formulas). For some countable set of atomic propositions AtProp and some set Var of countably many variable names, the set of the well-formed formulas ϕ of CQ-PCTL is defined as the set generated by the following BNF grammar:

ϕ ::= ⊤ | p | ¬ϕ | (ϕ ∧ ϕ) | P⊲⊳a[τ] | E⊲⊳c[ϕ] | ∃x.ϕ
τ ::= X ϕ | ϕ U ϕ

where p ∈ AtProp, a ∈ [0, 1], c ∈ ([0, ∞) ∪ Var), x ∈ Var and ⊲⊳ ∈ {<, >, ≤, ≥}.

From the basic syntax we can derive the universal quantifier: ∀x.ϕ = ¬∃x.¬ϕ. Also, we say that a variable x occurs free in ϕ if it does not occur under the scope of an existential or universal quantifier; otherwise we say that it is bound. For a formula ϕ, we say that it has no nested variables if for any subformula E⊲⊳x[ψ] of ϕ: (i) the set of free variables of ψ contains at most x, and (ii) the set of bound variables of ψ is empty. A formula with no free variables is called a sentence.

Remark 1. In the rest of this paper we will assume that formulas are sentences without nested variables.

Definition 8 (CQ-PCTL Satisfaction). The satisfaction relation is defined as follows for the new formulas:

(M, s) |= ∃x.ϕ ⇔ there exists c ∈ [0, ∞) such that (M, s) |= ϕ[x := c]

where ϕ[x := c] is the syntactic substitution replacing all the free occurrences of the variable x in ϕ by the non-negative real c. The satisfaction for the rest of the formulas is defined as for PCTL.

The model-checking algorithm for CQ-PCTL is essentially the same as for PCTL for their shared formulas. In the rest of this section we will describe the steps for calculating the set Sat(∃x.ϕ) for the new quantified formulas. Before applying the algorithm, it is necessary to transform the subformulas of ∃x.ϕ so as to eliminate negative formulas. This is done by transforming ϕ into its Positive Normal Form (PNF) [9].

Definition 9 (Positive Normal Form). A formula ϕ is non-negative iff there is no ϕ′ such that ϕ = ¬ϕ′. Also, we say that ϕ is in Positive Normal Form if ϕ, and all of its subformulas, excepting atomic propositions, are non-negative.

Note that it is possible to transform every formula into another equivalent formula in PNF. This can be done by (i) introducing the constant ⊥, the disjunction, and the universal quantifier into the base syntax; (ii) applying De Morgan's and double-negation laws; and (iii) applying the following additional equivalences:

¬P⊲⊳a[τ] ⇔ P¬⊲⊳a[τ]
¬E⊲⊳c[ϕ] ⇔ E¬⊲⊳c[ϕ]

where ¬< = ≥, ¬> = ≤, ¬≤ = > and ¬≥ = <. Also, we will use PNF(¬ϕ) to denote a PNF formula equivalent to ¬ϕ.
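A toy version of this PNF transformation can be written in a few lines. The sketch below handles only a fragment (no ⊤/⊥ constants, no quantifiers, and path formulas are left untouched), and the nested-tuple encoding of formulas is entirely hypothetical:

```python
# Push negations inward, dualizing comparison operators on the way
# (the equivalences not-P_{op a} <=> P_{not-op a} from the text).
NEG = {"<": ">=", ">": "<=", "<=": ">", ">=": "<"}

def pnf(f, negate=False):
    kind = f[0] if isinstance(f, tuple) else "atom"
    if kind == "not":
        return pnf(f[1], not negate)
    if kind in ("and", "or"):
        # De Morgan: a negated conjunction becomes a disjunction.
        k = {"and": "or", "or": "and"}[kind] if negate else kind
        return (k, pnf(f[1], negate), pnf(f[2], negate))
    if kind in ("P", "E"):
        op, bound, body = f[1], f[2], f[3]
        return (kind, NEG[op] if negate else op, bound, body)
    # atomic proposition: the negation stays in place, as PNF allows
    return ("not", f) if negate else f

print(pnf(("not", ("and", "p", ("P", ">=", 0.5, ("F", "q"))))))
# ('or', ('not', 'p'), ('P', '<', 0.5, ('F', 'q')))
```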
Given a DTMC M = hS, Sinit , P, C, AtP rop, ℓi and a CQ-PCTL existential formula in PNF ∃x.ϕ, the set I(∃x.ϕ) = i(x, ϕ) is 147 inductively constructed by the following definition: i(x, l) = {[0, ∞)} i(x, ¬p) = {[0, ∞)} (where l ∈ AtP rop ∪ {⊤, ⊥}) (where p ∈ AtP rop) ′ i(x, (ψ ∨ ψ )) = i(x, ψ) ∪ i(x, ψ ′ ) i(x, (ψ ∧ ψ ′ )) = {A ∩ B | A ∈ i(x, ψ), B ∈ i(x, ψ ′ )} [ i(x, E⊲⊳x [ψ]) = {r ∈ [0, ∞) | es (ψ) ⊲⊳ r} s∈S i(x, E⊲⊳a [ψ]) = k(i(x, ψ)) ∪ k(i(x, PNF(¬ψ))) i(x, P⊲⊳a [X ψ]) = k(i(x, ψ)) ∪ k(i(x, PNF(¬ψ))) i(x, P⊲⊳a [ψ U ψ ′ ]) = {A ∩ B | A, B ∈ iU (x, ψ, ψ ′ )} iU (x, ψ, ψ ′ ) = k(i(x, ψ)) ∪ k(i(x, ψ ′ )) ∪ k(i(x, PNF(¬ψ))) ∪ k(i(x, PNF(¬ψ ′ ))) where k(I) = T X | X ∈ 2I for I a set of intervals. The application i(x, ϕ) of Def. 10 builds a set containing intervals of real numbers. The values c in these intervals may satisfy ϕ[x := c]. This set is constructed in such a way that if there is a satisfying c (i.e., ϕ[x := c] is satisfiable at some state), then there is an interval A such that c ∈ A ∈ i(x, ϕ). In such a case, it is also important that the interval contains only satisfying values (Theorem 2), for we have to choose just one of the possibly infinitely many values in the interval. The set i(x, ϕ) is constructed inductively. At the basis of the induction there are the formulas E⊲⊳x ψ, for which it is easy to calculate the needed intervals using the model-checking algorithms of PCTL. For disjunctions, the set interval may be in the union of the sets calculated for both disjuncts. The case of conjunction is more complicated: if there is a satisfying c, then c must be at the same time in one interval calculated for each one of the disjuncts. For the temporal operators, a similar reasoning to that for conjunctions is made: if there is a c such that ϕ[x := c] at each state of some subset of S, then c must be contained in several of the intervals calculated for the subformulas of ϕ. The following theorem states a property necessary for using the set I(∃x.ϕ) in the model-checking algorithm. Also, the proof of the theorem provides some insight into the definition of the set I(∃x.ϕ). Theorem 2. Let M be a DTMC, s a state of M , and ∃x.ϕ a CQ-PCTL formula in PNF. Then, for all c ∈ [0, ∞) the following two conditions hold: 1. If (M, s) |= ϕ[x := c], then there exists A ∈ i(x, ϕ) such that c ∈ A and for all c′ ∈ A, (M, s) |= ϕ[x := c′ ] 2. If (M, s) 6|= ϕ[x := c], then there exists A ∈ i(x, PNF(¬ϕ)) such that c ∈ A and for all c′ ∈ A, (M, s) |= PNF(¬ϕ)[x := c′ ]. Proof. We will show only the case when x occurs in ϕ. The proof is by induction on ϕ. 148 – Case ϕ = ψ ∨ψ ′ . Condition (1): the required interval A is in i(x, ψ)∪i(x, ψ ′ ). Condition (2): by the induction hypothesis we have corresponding intervals A ∈ i(x, PNF(¬ψ)) and B ∈ i(x, PNF(¬ψ ′ )). Therefore the required interval A ∩ B is in i(x, PNF(¬ψ) ∧ PNF(¬ψ ′ )). – Case ψ ∧ ψ ′ . Condition (1): by the induction hypothesis we have corresponding intervals A ∈ i(x, ψ) and B ∈ i(x, ψ ′ ). Therefore the required interval A ∩ B is in i(x, (ψ ∧ ψ ′ )). Condition (2): by the induction hypothesis we have the corresponding intervals A ∈ i(x, PNF(¬ψ)) and B ∈ i(x, PNF(¬ψ ′ )). Therefore the required interval is in i(x, PNF(¬ψ)) ∪ i(x, PNF(¬ψ ′ )). – Case E⊲⊳x [ψ]. Condition (1): direct by definition. Condition (2): also by definition and the equivalence ¬E⊲⊳x ⇔ E¬⊲⊳x . – Case E⊲⊳a [ψ] (a 6= x). Condition (1): there are two subcases: (a) es (ψ) ∈ [0, ∞) and (b) es (ψ) = ∞. (a) There is a path from s to a state in the nonempty set Sat(ψ[x := c]). 
The following theorem states a property necessary for using the set I(∃x.ϕ) in the model-checking algorithm. Also, the proof of the theorem provides some insight into the definition of the set I(∃x.ϕ).

Theorem 2. Let M be a DTMC, s a state of M, and ∃x.ϕ a CQ-PCTL formula in PNF. Then, for all c ∈ [0, ∞) the following two conditions hold:

1. If (M, s) |= ϕ[x := c], then there exists A ∈ i(x, ϕ) such that c ∈ A and for all c′ ∈ A, (M, s) |= ϕ[x := c′].
2. If (M, s) ⊭ ϕ[x := c], then there exists A ∈ i(x, PNF(¬ϕ)) such that c ∈ A and for all c′ ∈ A, (M, s) |= PNF(¬ϕ)[x := c′].

Proof. We will show only the case when x occurs in ϕ. The proof is by induction on ϕ.

– Case ϕ = ψ ∨ ψ′. Condition (1): the required interval A is in i(x, ψ) ∪ i(x, ψ′). Condition (2): by the induction hypothesis we have corresponding intervals A ∈ i(x, PNF(¬ψ)) and B ∈ i(x, PNF(¬ψ′)). Therefore the required interval A ∩ B is in i(x, PNF(¬ψ) ∧ PNF(¬ψ′)).
– Case ϕ = ψ ∧ ψ′. Condition (1): by the induction hypothesis we have corresponding intervals A ∈ i(x, ψ) and B ∈ i(x, ψ′). Therefore the required interval A ∩ B is in i(x, (ψ ∧ ψ′)). Condition (2): by the induction hypothesis we have the corresponding intervals A ∈ i(x, PNF(¬ψ)) and B ∈ i(x, PNF(¬ψ′)). Therefore the required interval is in i(x, PNF(¬ψ)) ∪ i(x, PNF(¬ψ′)).
– Case ϕ = E⊲⊳x[ψ]. Condition (1): direct by definition. Condition (2): also by definition and the equivalence ¬E⊲⊳x ⇔ E¬⊲⊳x.
– Case ϕ = E⊲⊳a[ψ] (a ≠ x). Condition (1): there are two subcases: (a) es(ψ) ∈ [0, ∞) and (b) es(ψ) = ∞. (a) There is a path from s to a state in the nonempty set Sat(ψ[x := c]). By the induction hypothesis, for each state sj in Sat(ψ[x := c]) there is a corresponding interval Aj. Then the required interval for E⊲⊳a[ψ] must be the intersection of some of the Aj intervals (contained in k(i(x, ψ))). (b) The set Sat(¬ψ[x := c]) is nonempty. Again by the induction hypothesis, for each sj ∈ Sat(¬ψ[x := c]) there is a corresponding interval Aj (contained in k(i(x, PNF(¬ψ)))). Condition (2): holds by the equivalence ¬E⊲⊳a ⇔ E¬⊲⊳a.
– Case ϕ = P⊲⊳a[X ψ]. Condition (1): there are two possibilities: (a) ps(X ψ[x := c]) ⊲⊳ a holds when ψ[x := c] is satisfiable at some states reachable from s in one step, and (b) ps(X ψ) ⊲⊳ a holds when ψ[x := c] is not satisfiable at some states reachable from s in one step. For (a) the required interval is in k(i(x, ψ)). For (b) the required interval is in k(i(x, PNF(¬ψ))). Condition (2): holds by the equivalence ¬P⊲⊳a ⇔ P¬⊲⊳a.
– Case ϕ = P⊲⊳a[ψ U ψ′]. Condition (1): once again, ps(ψ U ψ′) ⊲⊳ a may hold whether or not the subformulas are satisfiable. Every possible combination is included in {A ∩ B | A, B ∈ iU(x, ψ, ψ′)}. Condition (2): holds by the equivalence ¬P⊲⊳a ⇔ P¬⊲⊳a. □

Theorem 2 suggests the last step of the algorithm. Given a CQ-PCTL formula in PNF ∃x.ϕ, we build the set Sat(∃x.ϕ) as follows:

Sat(∃x.ϕ) = ⋃_{A∈I(∃x.ϕ)} ⋃_{c∈A} Sat(ϕ[x := c])

Note that Theorem 2 also implies that it suffices to choose a single c from each interval A. The basic algorithm presented here can be easily extended to the case where the values of P⊲⊳a formulas are also quantified. Also, some nesting constraints (Remark 1) can be weakened, as long as no circular dependencies occur between the quantified variables.
5 Model-Checking Games for Nash Equilibria

In this section, we show how to construct a DTMC MG,α for a finite strategic game G and its mixed strategy α. Although the construction is for strategic-form games, it is based on extensive forms. Extensive-form games differ from strategic-form ones in that the sequentiality of the actions is important. An extensive game can be described by a tree structure. In a game tree each node represents the turn of only one player, and for each possible action, such a tree has one arc to another player's turn. In a strategic game it is assumed that each agent executes her/his action independently from and without knowing the other players' actions. To model this in an extensive game, states are grouped in such a manner that they represent the next player's uncertainty about previous actions (see Fig. 2 for an extensive form of BoS; dotted lines group player 1 moves as a single state, as player 2 does not know which action has been taken).

Fig. 2. An extensive form of BoS; utilities are shown under the leaf nodes: (B1, B2): 2, 1; (B1, S2): 0, 0; (S1, B2): 0, 0; (S1, S2): 1, 2

Given the game and the mixed-strategy profile, in our codification we build a structure similar to an extensive-form tree. In the built structure each arc, except the arcs leaving the root, is labelled with the probability that the mixed-strategy profile assigns to that particular action. As we cannot group states in a DTMC, we build one subtree for each player and each pure strategy. Each one of these subtrees models the situation where player i chooses some strategy ai, but the other players follow the mixed strategy. By proceeding in this manner, each leaf node corresponds to one strategy profile of the strategic-form game. Consequently, each leaf node is associated with its utility via the cost function C. As the cost function models the cost of leaving the state, we need to add a fictitious absorbing node below the leaves, representing the ending of the game. Figure 3 illustrates one of the subtrees described above. Note that there is exactly one path from s(i,ai) to the ending state through each strategy profile. The arcs of such a path carry the probabilities assigned by the mixed-strategy profile to the corresponding actions. Hence, the expected cost coincides with the expected utility. We can therefore use a cost-quantified formula to compare expected costs and verify whether Theorem 1 is applicable.

Fig. 3. After player i chooses strategy ai, the other players make their own decisions, thus creating various strategy profiles: the subtree below sinit rooted at s(i,ai) branches to the profile states s(i,ai,a′−i), s(i,ai,a″−i), ..., all of which lead to send

Definition 11 (DTMC Game Model). For any game:

G = ⟨N, {Ai}i∈N, {ui}i∈N⟩

and a mixed-strategy profile α of its mixed extension Ĝ, we define the DTMC MG,α as the structure:

MG,α = ⟨S, sinit, P, C, AtProp, ℓ⟩

where the set of states is:

S = {sinit} ∪ {send} ∪ {sx}x∈Idx

Idx is the following index set:

Idx = ⋃_{i∈N, ai∈Ai} {(i, ai), (i, ai, aj1), ..., (i, ai, aj1, ..., ajm) | jk ∈ N − {i}, jk < jk+1, and (ai, aj1, ..., ajm) ∈ A}

The probability transition function is defined by cases:

P(sinit, s(i,ai)) = 1/n  for i ∈ N, ai ∈ Ai, where n = |⋃_{j∈N} Aj|
P(s(x), s(x,aj)) = αj(aj)  for j ∈ N, x ∈ Idx
P(s(i,a), send) = 1  for i ∈ N, a ∈ A
P(send, send) = 1
P(s, s′) = 0  otherwise

The cost function is defined as follows:

C(s(i,a)) = ui(a)  for a ∈ A
C(s) = 0  otherwise

Finally, the set of atomic propositions and the labelling function are the following:

AtProp = {end} ∪ ⋃_{i∈N} Ai
ℓ(send) = {end}
ℓ(s(i,ai)) = {ai}  for i ∈ N, ai ∈ Ai
ℓ(s) = ∅  otherwise

Remark 2. The cost function of a DTMC requires non-negative values. We thus assume that games' utility functions also assign non-negative values only. If this is not the case, it is possible to add a sufficiently large constant to every value returned by the ui functions, in order to make them non-negative. The addition of such a constant does not affect any result, as we only compare the mean values of utilities.

Example 2 (Model for BoS). The DTMC model M constructed for the game BoS and the mixed-strategy profile α = ((2/3, 1/3), (1/3, 2/3)) is depicted in Fig. 4. We can verify the following facts:

(M, s(1,B1)) |= B1 ∧ E=2/3 end    (M, s(2,B2)) |= B2 ∧ E=2/3 end
(M, s(1,S1)) |= S1 ∧ E=2/3 end    (M, s(2,S2)) |= S2 ∧ E=2/3 end

For every player, all the pure strategies in the support of α yield the same payoff. Then, by Theorem 1, α is a Nash equilibrium. We can characterize this fact with a formula of CQ-PCTL:

(M, sinit) |= ∃x.(P>0[X (B1 ∧ E=x end)] ∧ P>0[X (S1 ∧ E=x end)]) ∧ ∃x.(P>0[X (B2 ∧ E=x end)] ∧ P>0[X (S2 ∧ E=x end)])

Fig. 4. DTMC for the game BoS and its mixed-strategy Nash equilibrium: sinit moves with probability 1/4 to each of s(1,B1), s(1,S1), s(2,B2), s(2,S2); each of these moves to the strategy-profile states according to the other player's mixed strategy (probabilities 1/3 and 2/3), and every profile state reaches the absorbing state send with probability 1

The previous example shows how it is possible to characterize a mixed-strategy Nash equilibrium of a game with CQ-PCTL. Although the issue does not arise in BoS, where the support contains all pure strategies, by Theorem 1 we must also verify that the profile is effectively a best response outside the support. This is achieved by verifying that the expected cost of deviating from the profile does not exceed that of the strategies in the support.
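Anticipating Lemma 1 below, the expected cost from each choice state s(i,ai) can be computed directly. The following hedged sketch collapses the two-player BoS tree into a single step per opponent action (names and encoding are illustrative, not the paper's Haskell implementation):

```python
# Check numerically that ExpCost from s_(i,ai) equals U_i(ai, alpha_-i).
u = {("B1", "B2"): (2, 1), ("B1", "S2"): (0, 0),
     ("S1", "B2"): (0, 0), ("S1", "S2"): (1, 2)}
A = [["B1", "S1"], ["B2", "S2"]]
alpha = [{"B1": 2/3, "S1": 1/3}, {"B2": 1/3, "S2": 2/3}]

def expcost_from_choice(i, ai):
    """One path per strategy profile below s_(i,ai); its probability is
    the opponent's mixed-strategy weight and its cost is u_i(a)."""
    other = 1 - i
    total = 0.0
    for aj in A[other]:
        profile = (ai, aj) if i == 0 else (aj, ai)
        total += alpha[other][aj] * u[profile][i]
    return total

for i in range(2):
    for ai in A[i]:
        print(f"ExpCost(s_({i+1},{ai})) =", expcost_from_choice(i, ai))
# all four print 0.666..., matching E_{=2/3} end in Example 2
```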
The following definition captures this constraint.

Definition 12 (Mixed-Strategy Nash Equilibria Characterization). For a DTMC game model MG,α, the CQ-PCTL characterization of a mixed-strategy Nash equilibrium is the formula NE_G,α defined as follows:

NE_G,α = ⋀_{i∈N} ∃x.(fsupp(αi) ∧ f̄supp(αi))
fsupp(αi) = ⋀_{ai∈supp(αi)} P>0[X (ai ∧ E=x end)]
f̄supp(αi) = ⋀_{ai∈Ai−supp(αi)} P>0[X (ai ∧ E≤x end)]

where the conjunction in f̄supp(αi) ranges over the complement of supp(αi).

Finally, we end this section by proving a lemma and a theorem, both of which assert the correctness of the whole construction.

Lemma 1. Let MG,α be a DTMC game model. For any player i ∈ N and any strategy ai ∈ Ai, the equation Ui(ai, α−i) = ExpCost_MG,α(s(i,ai), {send}) holds.

Proof. Let a = (ai, aj1, ..., ajm) ∈ A be a profile such that its components follow the constraints of the index set Idx. From the definitions of S and P we have that there is a unique path π = s(i,ai) s(i,ai,aj1) ⋯ s(i,ai,aj1,...,ajm) send. For such a path, we have that:

Pr_{s(i,ai)}(π) = P(π) = P(s(i,ai), s(i,ai,aj1)) ⋯ P(s(i,ai,aj1,...,ajm), send) = Π_{j∈N−{i}} αj(aj) = p(α−i,ai)(a)

Cost_MG,α(π) = C(s(i,ai)) + ⋯ + C(s(i,a)) = ui(a)

Moreover, the set of all such paths is equal to P(i,ai) = {s(i,ai) |= F{send}}. Therefore:

ExpCost_MG,α(s(i,ai), {send}) = Σ_{π∈P(i,ai)} P(π)·Cost_MG,α(π) = Σ_{a∈A} p(α−i,ai)(a)·ui(a) = Ui(α−i, ai)  □

Theorem 3. Let MG,α be a DTMC game model. The mixed-strategy profile α is a Nash equilibrium of G if, and only if, MG,α |= NE_G,α holds.

Proof. We show the implication only in one direction (if); the proof for the converse is similar. Suppose, as a contradiction, that the consequent does not hold. Therefore, there must be some player i ∈ N for which ∃x.(fsupp(αi) ∧ f̄supp(αi)) is not initially satisfied. It follows by Lemma 1 that for any ai ∈ Ai, if u = Ui(ai, α−i), then (MG,α, s(i,ai)) |= ai ∧ E=u end holds (the first conjunct by def. of ℓ and the second conjunct by Lemma 1). Let c = Ui(ai, α−i) for some ai ∈ supp(αi). Then, by the previous fact and Theorem 1, the formulas fsupp(αi)[x := c] and f̄supp(αi)[x := c] are both initially satisfied. A contradiction. □

6 Conclusions

In this paper, we have addressed the problem of characterizing a mixed-strategy Nash equilibrium using PCTL enriched with an expected-cost quantifier: CQ-PCTL. Previous works include [1–3], where the authors give a characterization of pure-strategy Nash equilibria and other game-theoretic notions using temporal and dynamic logic. In [6], the authors incorporate stochastic actions. They provide a model for a bargaining game (Rubinstein's alternating-offers negotiation protocol, see [8]). With this model, the authors use PCTL formulas for making a quantitative analysis of several mixed strategies of the game. They, however, do not provide characterizations of Nash equilibria.

There are two general routes for future research: one dealing with CQ-PCTL and the other with its game-theoretic concepts. As for the first route, recall that in Sect. 4 we presented an algorithm for model checking a fragment of CQ-PCTL. The whole language includes formulas with nested variables. The nested variables introduce circular dependencies that our current algorithm cannot deal with. We do not know whether such an algorithm exists. As for the complexity of our algorithm, we do know that in the worst case it is exponential in the size of the formula. It is important to improve on this bound, if possible.
It would also be desirable, in the spirit of this work, to address other game solution concepts, such as evolutionary and correlated equilibria (cf. [8]). Beyond finite strategic games, it would be interesting to deal with other classes of games, like Bayesian and iterated games. Finally, further investigation would be necessary to determine whether model-checking tools can be used to calculate solutions, besides characterizing them.

There is an implementation of the CQ-PCTL model checker and DTMC game construction of this paper written in the programming language Haskell. This implementation can be obtained by request to the authors.

Acknowledgments. We thank IIMAS and UNAM for their facilities. Pedro Arturo Góngora is sponsored by CONACyT. Finally, we also thank the referees for their comments.

References

1. Bonanno, G.: Branching time, perfect information games, and backward induction. Games and Economic Behavior 36 (2001) 57–73
2. Harrenstein, P., van der Hoek, W., Meyer, J.J., Witteveen, C.: On modal logic interpretations of games. In: Proceedings of the Fifteenth European Conference on Artificial Intelligence. (2002) 28–32
3. van der Hoek, W., Jamroga, W., Wooldridge, M.: A logic for strategic reasoning. In: AAMAS '05: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, New York, NY, USA, ACM (2005) 157–164
4. Pauly, M., Wooldridge, M.: Logic for mechanism design — a manifesto. In: Proceedings of the 2003 Workshop on Game Theory and Decision Theory in Agent Systems (GTDT-2003). (2003)
5. van der Hoek, W., Roberts, M., Wooldridge, M.: Social laws in alternating time: Effectiveness, feasibility, and synthesis. Synthese 156(1) (2007) 1–19
6. Ballarini, P., Fisher, M., Wooldridge, M.: Automated game analysis via probabilistic model checking: a case study. In: Proceedings of the Third Workshop on Model Checking and Artificial Intelligence. (2006) 125–137
7. Hansson, H., Jonsson, B.: A logic for reasoning about time and reliability. Formal Aspects of Computing 6 (1994) 102–111
8. Osborne, M., Rubinstein, A.: A Course in Game Theory. The MIT Press, Cambridge, Massachusetts (1994)
9. Baier, C., Katoen, J.P.: Principles of Model Checking. The MIT Press, Cambridge, Massachusetts (2008)

Argumentation-Based Preference Modelling with Incomplete Information

Wietske Visser, Koen V. Hindriks and Catholijn M. Jonker
Man Machine Interaction Group, Delft University of Technology
Mekelweg 4, 2628 CD Delft, The Netherlands
[email protected], [email protected], [email protected]

Abstract. No intelligent decision support system functions even remotely without knowing the preferences of the user. A major problem is that the way average users think about and formulate their preferences does not match the utility-based quantitative frameworks currently used in decision support systems. For the average user, qualitative models are a better fit. This paper presents an argumentation-based framework for the modelling of and automated reasoning about multi-issue preferences of a qualitative nature. The framework presents preferences according to the lexicographic ordering that is well understood by humans. The main contribution of the paper is that it shows how to reason about preferences when only incomplete information is available. An adequate strategy is proposed that allows reasoning with incomplete information, and it is shown how to incorporate this strategy into the argumentation-based framework for modelling preferences.
Key words: Qualitative Preferences, Argumentation, Incomplete Information

1 Introduction

In this paper we introduce an argumentation-based framework for modelling qualitative multi-attribute preferences under incomplete information. This is motivated by our interest in developing a negotiation support system, as part of a larger project. In this context, we are faced with the need to express a user's preferences. A necessary (but not sufficient) condition for an offer to become an agreement is that both parties feel that it satisfies their preferences well enough. Unfortunately, eliciting and representing a user's preferences is not unproblematic. Existing negotiation support systems are based on quantitative models of preferences. These kinds of models are based on utilities; a utility function determines for each outcome a numerical value of utility. However, it is difficult to elicit such models from users, since humans generally express their preferences in a more qualitative way. We say we like something more than something else, but it seems strange to express liking something exactly twice as much as an alternative. In this respect, qualitative preference models have a higher cognitive plausibility, as they provide a better correspondence with the representations used by humans. We also think that qualitative models will allow a human user to interact more naturally with an agent negotiating on his behalf or supporting him in his negotiations, and we will investigate this in future work. There are, however, several challenges that need to
For now, we focus on the situation in which information about objects is not complete, but will address other types of incompleteness, inconsistency and change in future. The approach we take is based on argumentation. In recent years, argumentation has evolved to be a core study within artificial intelligence and has been applied in a range of different topics [2]. We incorporate some of the ideas introduced in existing qualitative approaches but also go beyond these approaches by introducing a framework that is able to reason about preferences also when only incomplete information is available. Because of its non-monotonic nature, argumentation is useful for handling inconsistent and incomplete information. Although a lot of work has been done on argumentationbased negotiation (for a comprehensive review, see [16]), most of this work considers only the bidding phase in which offers are exchanged. For preparation, the preferences of a user have to be made clear (both to the user himself and to the agent supporting him), hence we need to express and reason with them. We focus here on the modelling of a single user’s preferences by means of an argumentation process. The idea is that a user weighs his preferences, which gives him better insight into his own preferences, and so this weighing is part of the preference elicitation process. The weighing of arguments maps nicely onto argumentation. For example, ‘I like to travel by car because it is faster than going by bike’ is countered by ‘But cycling is healthier than driving the car and that is more important to me, so I prefer to take the bike’. This possibility to construct arguments that are attacked by counterarguments is another advantage of argumentation, since it is a very natural way of reasoning for humans and fits in with a user’s own reasoning processes. This is a general feature of argumentation and we 157 will make extensive use of it: arguments like those above form the basis of our system. We believe that this way of reasoning will also be very useful in the preference elicitation process since the user’s insight into his preferences grows piece by piece as he is expressing them. The introduction of an argumentation-based framework for reasoning about preferences even when only incomplete information is available seems particularly suitable for such a step-by-step process. It allows the user to extend and refine the system representation of his preferences gradually and as the user sees fit. Another motivation to use argumentation is the link with multi-agent dialogues [1], which will be very interesting in our further work on negotiation. In this paper we present an argumentation-based framework for reasoning with qualitative multi-attribute preferences. In Section 2, we introduce qualitative multi-attribute preferences, in particular the lexicographic preference ordering. In Section 3 we start by modelling this ordering for reasoning with complete information in an argumentation framework. Then we proceed and extend this framework in such a way that it can also handle incomplete information. Our main contribution, in Section 4, is a strategy (based on the lexicographic ordering) with some desired properties to derive object preferences in the case of incomplete information. In Section 5 this strategy is subsequently incorporated into the argumentation framework. Section 6 concludes the paper. 
2 Qualitative Multi-Attribute Preferences Qualitative multi-attribute preferences over objects are based on a set of relevant attributes or goals, which are ranked according to their importance or priority. Without loss of generality, we only consider binary (Boolean) attributes (cf. [5]). Moreover, it is assumed that the presence of an attribute is preferred over its absence. For example, given that garden is an attribute, a house that has a garden is preferred over one that does not have one. The importance ranking of attributes is defined by a total preorder (a total, reflexive and transitive relation), which we will denote by . This relation is not required to be antisymmetric, so two or more attributes can have the same importance. The relation  yields a stratification of the set of attributes into importance levels. Each importance level consists of attributes that are deemed equally important. Together with factual information about which objects have which attributes, the attribute ranking forms the basis on which various object preference orderings can be defined. One of the most well-known preference orderings is the lexicographic ordering, which we will use here. [5] and [7] define more multi-attribute preference orderings, such as the discrimin and best-out orderings. In this paper we focus on the lexicographic ordering because it seems natural, it defines a total preference relation (contrary to the discrimin ordering) and it is more discriminating than the best-out ordering. Since the other orderings are structurally similar to the lexicographic ordering, a similar argumentation framework could be defined for them if desired. We introduce the lexicographic preference ordering by means of an example. Example 1. Paul wants to buy a house. According to him, the most important attributes are large (minimally 100m2 ), garden and closeToWork, which among themselves are equally important. The next most important attributes are nearShops and quiet. Being detached is the least important. Paul can choose between three options: a villa, 158 large garden closeToWork nearShops quiet detached villa X X apartment X X X cottage X X X Table 1. An example of objects and attributes X X an apartment and a cottage. The attributes of these objects are displayed in Table 1. In this table, the attributes are ordered in decreasing importance from left to right. A dashed line between attributes indicates equal importance, a solid line a transistion to a lower importance level. A checkmark indicates that an object has the attribute, an empty box means that the attribute is absent. Which house should Paul choose? He first considers the highest importance level, which in this case comprises large, garden and closeToWork. The villa and the apartment both satisfy two of these attributes, while the cottage only satisfies one. So at this moment Paul concludes that both the villa and the apartment are preferred to the cottage. For the preference between the villa and the apartment he has to look further. At the next importance level, the apartment satisfies one attribute and the apartment satisfies none. So the apartment is preferred over the villa. Note that although the cottage satisfies the most attributes in total, it is still the least preferred option because of its bad score at the more important attributes. Definition 1. (Lexicographic preference ordering) Let P be a set of attributes or goals, and  a total preorder on P. We write P ≻ Q for P  Q and Q 6 P, and P ≈ Q for P  Q and Q  P. 
We use | · | to denote the cardinality of a set. Object a is strictly preferred over object b according to the lexicographic ordering if there exists an attribute P such that |{P′ | a satisfies P′ and P ≈ P′ }| > |{P′ | b satisfies P′ and P ≈ P′ }| and for all Q ≻ P: |{Q′ | a satisfies Q′ and Q ≈ Q′ }| = |{Q′ | b satisfies Q′ and Q ≈ Q′ }|. Object a is equally preferred as object b according to the lexicographic ordering if for all P: |{P′ | a satisfies P′ and P ≈ P′ }| = |{P′ | b satisfies P′ and P ≈ P′ }|. 3 Argumentation Framework for Complete Information In order to formally model and reason with preferences we define an argumentation framework (AF). We use as our starting point the well-known argumentation theory of Dung [10]. An abstract AF in the sense of Dung consists of a set of arguments and a defeat relation (informally, a counterargument relation) among those arguments. An AF is abstract in the sense that both the set of arguments and the defeat relation are assumed to be given, and the construction and internal structure of arguments is not taken into account. If we want to reason with argumentation, we have to instantiate an abstract AF by specifying the structure of arguments and the defeat relation. Arguments are typically built from a logical language by chaining inferences. Inferences are instantiations of general inference schemes, such as modus ponens. Defeat is based on certain relations between the elements of arguments. Together with a knowledge base, they provide a specific AF for arguing about multi-attribute preferences. 159 3.1 Language The language has to allow us to express everything we want to talk about when reasoning about preferences. To start, we need to be able to state the facts about objects: which attributes they do and do not have. We also have to express the importance ranking of attributes, so we need to be able to say that one attribute is more important than another, or that two attributes are equally important. Of course, we want to say that one object is preferred over another, and that two objects are equally preferred. Finally, we need to be able to express how many attributes of equal importance a certain object has, since the lexicographic preference ordering is based on counting these. To this end, we introduce a special predicate has(a, [P], n) which expresses that object a has n attributes of the importance level of attribute P. Since we have no names for importance levels, we denote them by any attribute of that level, placed between square brackets. It is not necessary that the attribute used is among the attributes that the object has; in our example, has(apartment, [quiet], 1) is true even though the apartment is not quiet. All of the things described can be expressed in the following language. Definition 2. (Language) Let P be a set of attribute names with typical elements P, Q, and O a set of object names with typical elements a, b, and let n be a non-negative integer. The language L is defined as follows. 
3 Argumentation Framework for Complete Information

In order to formally model and reason with preferences we define an argumentation framework (AF). We use as our starting point the well-known argumentation theory of Dung [10]. An abstract AF in the sense of Dung consists of a set of arguments and a defeat relation (informally, a counterargument relation) among those arguments. An AF is abstract in the sense that both the set of arguments and the defeat relation are assumed to be given, and the construction and internal structure of arguments is not taken into account. If we want to reason with argumentation, we have to instantiate an abstract AF by specifying the structure of arguments and the defeat relation. Arguments are typically built from a logical language by chaining inferences. Inferences are instantiations of general inference schemes, such as modus ponens. Defeat is based on certain relations between the elements of arguments. Together with a knowledge base, they provide a specific AF for arguing about multi-attribute preferences.

3.1 Language

The language has to allow us to express everything we want to talk about when reasoning about preferences. To start, we need to be able to state the facts about objects: which attributes they do and do not have. We also have to express the importance ranking of attributes, so we need to be able to say that one attribute is more important than another, or that two attributes are equally important. Of course, we want to say that one object is preferred over another, and that two objects are equally preferred. Finally, we need to be able to express how many attributes of equal importance a certain object has, since the lexicographic preference ordering is based on counting these. To this end, we introduce a special predicate has(a, [P], n) which expresses that object a has n attributes of the importance level of attribute P. Since we have no names for importance levels, we denote them by any attribute of that level, placed between square brackets. It is not necessary that the attribute used is among the attributes that the object has; in our example, has(apartment, [quiet], 1) is true even though the apartment is not quiet. All of the things described can be expressed in the following language.

Definition 2 (Language). Let P be a set of attribute names with typical elements P, Q, and O a set of object names with typical elements a, b, and let n be a non-negative integer. The language L is defined as follows.

ϕ ∈ L ::= P(a) | P ≻ Q | P ≈ Q | pref(a, b) | eqpref(a, b) | has(a, [P], n) | ¬ϕ

Formulas of this language have the following informal meaning:

P(a): object a has attribute P
P ≻ Q: attribute P is more important than attribute Q
P ≈ Q: attribute P is equally important as attribute Q
pref(a, b): object a is strictly preferred over object b
eqpref(a, b): object a is equally preferred as object b
has(a, [P], n): object a has n attributes equally important as attribute P (not necessarily including P itself)
¬ϕ: the negation of ϕ

The idea is that preferences over objects are derived from facts about which objects have which attributes, and the importance order among attributes. These facts are contained in a knowledge base, which is a set of formulas of the types P(a), ¬P(a), P ≻ Q and P ≈ Q. A knowledge base is complete if, given a set of objects to compare and a set of attributes to compare them on, it contains for every object a and for every attribute P either P(a) or ¬P(a), and for all attributes P, Q either P ≻ Q, Q ≻ P or P ≈ Q.

Example 2. The information from Example 1 can be expressed in the form of the following knowledge base that is based on the language L.

large ≈ garden ≈ closeToWork ≻ nearShops ≈ quiet ≻ detached
large(villa)        large(apartment)        ¬large(cottage)
garden(villa)       ¬garden(apartment)      garden(cottage)
¬closeToWork(villa) closeToWork(apartment)  ¬closeToWork(cottage)
¬nearShops(villa)   nearShops(apartment)    nearShops(cottage)
¬quiet(villa)       ¬quiet(apartment)       quiet(cottage)
detached(villa)     ¬detached(apartment)    detached(cottage)

3.2 Inferences

An argument is a derivation of a conclusion from a set of premises. Such a derivation is built from multiple steps called inferences. Every inference step consists of premises and a conclusion. Inferences can be chained by using the conclusion of one inference step as a premise in the following step. Thus a tree of chained inferences is created, which we use as the formal definition of an argument.

Definition 3 (Argument). An argument is a tree, where the nodes are inferences, and an inference can be connected to a parent node if its conclusion is a premise of that node. Leaf nodes only have a conclusion (a formula from the knowledge base), and no premises. A subtree of an argument is also called a subargument. We define inf to be a function that returns the last inference of an argument (the root node), and conc to be a function that returns the conclusion of an argument, which is the same as the conclusion of the last inference.

The inferences that can be made are defined by inference schemes. The inference schemes of our framework are listed in Table 2.

Table 2. Inference schemes
1. count(a, [P], ∅): from no premises, conclude has(a, [P], 0).
2. count(a, [P1], {P1, ..., Pn}): from P1(a), ..., Pn(a) and P1 ≈ ... ≈ Pn, conclude has(a, [P1], n).
3. count(a, [P1], {P1, ..., Pn})uc: from P1(a), ..., Pn(a) and P1 ≈ ... ≈ Pn, conclude that count(a, [P1], S) is inapplicable for any S ⊂ {P1, ..., Pn}.
4. prefinf(a, b, [P]): from has(a, [P], n), has(b, [P′], m), P ≈ P′ and n > m, conclude pref(a, b).
5. prefinf(a, b, [P])uc: from has(a, [Q], n), has(b, [Q′], m), Q ≈ Q′ ≻ P and n ≠ m, conclude that prefinf(a, b, [P]) is inapplicable.
6. eqprefinf(a, b, [P]): from has(a, [P], n), has(b, [P′], m), P ≈ P′ and n = m, conclude eqpref(a, b).
7. eqprefinf(a, b, [P])uc: from has(a, [Q], n), has(b, [Q′], m), Q ≈ Q′ ≉ P and n ≠ m, conclude that eqprefinf(a, b, [P]) is inapplicable.

The first and second inference schemes are used to count the number of attributes of equal importance as some attribute P that object a has.
This type of inference is inspired by accrual [14], which combines multiple arguments with the same conclusion into one accrued argument for that conclusion. Although our application is different, we use a similar mechanism. We want all attributes that are present to be counted. Otherwise we would conclude incorrect preferences (e.g., if the large attribute of the apartment were not counted, we would incorrectly derive that the villa is preferred over the apartment). Inference scheme 1, which counts 0, can always be applied since it has no premises. Inference scheme 2 can be applied on any subset of the set of attributes of some importance level that an object a has. This means that it is possible to construct an argument that does not count all attributes that are present (a so-called non-maximal count). To ensure that only maximal counts are used, we provide an inference scheme to make arguments that defeat non-maximal counts (inference scheme 3). An argument of this type says that any count which is not maximal is not applicable. This type of defeat is called undercut (see below). Inference scheme 4 says that an object a is preferred over an object b if the number of attributes of a certain importance level that a has is higher than the number of attributes on that same level that b has. For the lexicographic ordering, it is also required that a and b have the same number of attributes on any level higher than that of P. We model this by defining an inference scheme 5 that undercuts scheme 4 if there is a more important level than that of P on which a and b do not have the same number of attributes. Finally, inference schemes 6 and 7 do the same as 4 and 5, but for equal preference. We need these because equal preference cannot be expressed in terms of strict preference.

Example 3. We now illustrate the inference schemes with some arguments that can be made from the knowledge base in Example 2. The example arguments are listed in Table 3 (for space reasons, the inference labels are left out).

Table 3. Example arguments
A: from large(apartment), closeToWork(apartment) and large ≈ closeToWork, derive has(apartment, [large], 2); from garden(cottage), derive has(cottage, [garden], 1); since large ≈ garden and 2 > 1, conclude pref(apartment, cottage).
B: from nearShops(apartment), derive has(apartment, [nearShops], 1); by an empty count, derive has(villa, [nearShops], 0); since nearShops ≈ nearShops and 1 > 0, conclude pref(apartment, villa).
C: by empty counts, derive has(villa, [nearShops], 0) and has(apartment, [nearShops], 0) (∗); since nearShops ≈ nearShops and 0 = 0, conclude eqpref(villa, apartment).
D: from nearShops(apartment), conclude that the count (∗) used in C is inapplicable.

Argument A illustrates the general working; a preference for the apartment over the cottage is derived, based on the facts that the apartment has two attributes of some level and the cottage only one. Argument B illustrates a zero count. Here a preference for the apartment over the villa is derived, based on the facts that the apartment has one attribute of some level and the villa zero. In argument C a non-maximal count is used (stating that the apartment has zero attributes of the level of nearShops), which leads to another conclusion, namely that the villa and the apartment are equally preferred. However, there are undercutters to attack such arguments (argument D).

Note that the lexicographic ordering results in a complete transitive order of weak preference on objects. This means that it is not necessary to define inference rules for the property of transitivity, because any preference that follows from transitivity can also be derived directly from the definition of the lexicographic ordering. For example, if pref(a, b) and eqpref(b, c) hold, then pref(a, c) also holds, but this can be derived using the inference schemes of Table 2. The same holds for the asymmetry of strict preference and the symmetry of equal preference.
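The interplay between schemes 2, 4 and 5 can be mimicked procedurally. The sketch below is a hedged illustration (it bakes the maximal-count and undercut conditions into functions rather than constructing explicit argument trees, and reuses the Example 2 facts):

```python
# Sketch of schemes 2 and 4, with the scheme-3 and scheme-5 undercuts
# built in as preconditions.
levels = [{"large", "garden", "closeToWork"}, {"nearShops", "quiet"}, {"detached"}]
facts = {
    "villa":     {"large", "garden", "detached"},
    "apartment": {"large", "closeToWork", "nearShops"},
    "cottage":   {"garden", "nearShops", "quiet", "detached"},
}

def has(a, level):
    """Maximal count (scheme 2 over all of a's attributes at the level;
    any smaller count would be undercut by scheme 3)."""
    return len(facts[a] & levels[level])

def prefinf(a, b, level):
    """Scheme 4, applicable only if no more important level differs
    (otherwise scheme 5 undercuts it)."""
    if any(has(a, k) != has(b, k) for k in range(level)):
        return False  # undercut by scheme 5
    return has(a, level) > has(b, level)

print(prefinf("apartment", "villa", 1))  # True: 1 > 0 at the second level
print(prefinf("cottage", "villa", 2))    # False: undercut at a higher level
```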
3.3 Defeat

With the language and the inference rules defined in the previous sections we can construct arguments. To complete our argumentation framework, we also need to specify a defeat relation. This section provides the formal definition of defeat that we will use. The most common type of defeat is rebuttal: an argument rebuts another argument if its conclusion is the negation of the conclusion of the other argument. Rebuttal is always mutual. Another type of defeat is undercut: an undercutter is an argument for the inapplicability of an inference used in another argument (for the specific undercutters used in our framework, see the next section). Undercut works only one way. Defeat is defined recursively, which means that rebuttal can attack an argument on all its premises and (intermediate) conclusions, and undercut can attack it on all its inferences.

Definition 4. (Defeat) An argument A defeats an argument B if
– conc(A) = ϕ and conc(B) = ¬ϕ (rebuttal), or
– conc(A) = 'inf(B) is inapplicable' (undercut), or
– A defeats a subargument of B.

3.4 Semantics

By specifying the inference schemes and the definition of defeat, together with a knowledge base, we have instantiated an argumentation framework consisting of a set of arguments and a defeat relation among them. Now we define which arguments are justified. For this we use Dung's [10] grounded semantics.¹ Grounded semantics is defined as follows.

¹ For the argumentation system defined in this paper (including the extended version of Section 5), the choice of semantics is not relevant; we could also have used other semantics, such as preferred or stable semantics (also from [10]). There would be a difference if we allowed the use of an inconsistent knowledge base, in which case another semantics may be more suitable. This is something for further investigation.

Definition 5.
– An argument A is acceptable with respect to a set S of arguments iff each argument defeating A is defeated by an argument in S.
– The characteristic function, denoted by FAF, of an argumentation framework AF is defined as follows: FAF(S) = {A | A is acceptable with respect to S}.
– The grounded extension of AF is defined as the least fixed point of FAF.
– An argument is justified with respect to grounded semantics iff it is a member of the grounded extension.
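Grounded semantics lends itself to a direct fixed-point computation. The following sketch (our illustration, over an arbitrary toy framework rather than one taken from the paper) iterates the characteristic function from the empty set:

```python
# Computing the grounded extension as the least fixed point of F_AF
# (Definition 5). Arguments are opaque labels; `defeats` is a set of
# (attacker, target) pairs.
def grounded_extension(args, defeats):
    def acceptable(a, s):
        # a is acceptable w.r.t. s iff every defeater of a is defeated by s
        return all(any((c, b) in defeats for c in s)
                   for (b, t) in defeats if t == a)
    extension = set()
    while True:
        new = {a for a in args if acceptable(a, extension)}
        if new == extension:
            return extension
        extension = new

# A toy framework in the spirit of Table 3: D undercuts the non-maximal
# count in C, while A, B and D themselves are undefeated.
print(grounded_extension({"A", "B", "C", "D"}, {("D", "C")}))
# -> {'A', 'B', 'D'} (in some order): C is not justified
```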
3.5 Validity

The argumentation framework defined in the previous sections indeed models lexicographic preference, assuming a complete and consistent knowledge base.

Lemma 1. Let A(KB) denote all arguments that can be built from a knowledge base KB. Then there is an argument A ∈ A(KB) such that the conclusion of A is pref(a, b) and A is justified under grounded semantics iff a is preferred over b according to the lexicographic preference ordering (Definition 1) given KB.

Proof. Suppose a is preferred over b. This means that there exists an attribute P such that |{P′ | a satisfies P′ and P ≈ P′}| > |{P′ | b satisfies P′ and P ≈ P′}|, and for all Q ≻ P: |{Q′ | a satisfies Q′ and Q ≈ Q′}| = |{Q′ | b satisfies Q′ and Q ≈ Q′}|. Let P1, ..., Pn denote all attributes of equal importance as P such that a has Pi, and let P1′, ..., Pm′ denote all attributes of equal importance as P such that b has Pi′. Note that n > m. The knowledge base then contains P1 ≈ ... ≈ Pn ≈ P1′ ≈ ... ≈ Pm′ as well as P1(a), ..., Pn(a) and P1′(b), ..., Pm′(b). The following argument A can be built (note that this argument can also be built if m equals 0, by using the empty-set count):

  P1(a), ..., Pn(a), P1 ≈ ... ≈ Pn  ⟹  has(a, [P1], n)
  P1′(b), ..., Pm′(b), P1′ ≈ ... ≈ Pm′  ⟹  has(b, [P1′], m)
  has(a, [P1], n), has(b, [P1′], m), P1 ≈ P1′, n > m  ⟹  pref(a, b)

We will now play devil's advocate and try to defeat this argument. We can try rebuttal and undercut of the argument and its subarguments. Rebuttal of premises is not applicable, since the knowledge base is consistent. Rebuttal of (intermediate) conclusions is not possible either, since there is no way to derive a negation. Then there are three inferences we can try to undercut (the last inference of the argument and the last inferences of its two subarguments). For the left-hand count, this can only be done if there is another Pj such that Pj ≈ P, Pj ∉ {P1, ..., Pn} and Pj(a) is the case. However, P1, ..., Pn encompass all such attributes, so a count undercut is not possible. The same argument holds for the other count. At this point it is useful to note that these two counts are the only ones that are undefeated: any lesser count will be undercut by the count undercutter that takes all of P1, ..., Pn (resp. P1′, ..., Pm′) into account. Such an undercutter has no defeaters, so any non-maximal count is not justified. The final thing left to try is an undercut of prefinf(a, b, [P1]). The undercutter of prefinf(a, b, [P1]) is based on two counts. We have seen that any non-maximal count will be undercut. If the maximal counts are used, we have n = m, since for all Q ≻ P: |{Q′ | a satisfies Q′ and Q ≈ Q′}| = |{Q′ | b satisfies Q′ and Q ≈ Q′}|. So the undercutter inference rule cannot be applied, since n ≠ m does not hold. This means that for every possible type of defeat, either the defeat is inapplicable or the defeater of A is itself defeated by undefeated arguments. Hence A is in the grounded extension and thus justified according to grounded semantics. The same line of argument can be followed for eqpref. □

4 Strategies for Handling Incomplete Information

So far, we have defined an argumentation system that can reason about preferences according to the lexicographic preference ordering. Above, we have assumed that the information about the objects that are compared is complete. But, as stated in the introduction, this is often not the case. In this section we investigate how incomplete information can best be handled when reasoning about preferences. Suppose it is not known whether an object has a specific attribute; e.g. we know that P(a), but we do not know whether P(b) or ¬P(b). This might not be a problem: if the preference between a and b can be decided based upon attributes that are more important than P, the knowledge whether P(b) or ¬P(b) is the case is irrelevant. But often this information will be needed to decide a lexicographic preference. In that case, different approaches or strategies for drawing conclusions are possible. However, not all strategies give the desired results. In the following, we discuss some naive strategies and their shortcomings, derive from them some desired properties of strategies, and define and model a strategy that gives intuitive results.

4.1 Naive Strategies

Optimistic resp. Pessimistic Strategy. This strategy always assumes that an object has (resp. does not have) the attribute whose status is not known.
This strategy can always derive some preference between two objects, since it completes the knowledge by making certain assumptions and can then derive a complete preference ordering over objects. But there is no guarantee that the inferences made are correct. In fact, any inferred preference can only be correct if all the assumptions it is based on are either correct or irrelevant. Since we do not know whether the assumptions are correct, and the strategy does not check for relevance, an inference can only be correct by chance. For example, suppose it is not known whether the villa has a garden and whether it is closeToWork. The optimistic strategy would assume that it has both attributes, in which case an incorrect preference of the villa over the apartment would be derived. The pessimistic strategy, on the other hand, would assume the villa has neither of the attributes, and would derive an incorrect preference of the cottage over the villa. Note that using the framework defined above without adaptation would boil down to using a pessimistic strategy: if it is not known whether an object has a certain attribute, the attribute is (implicitly) assumed to be absent. This is due to the fact that only attributes for which it is known that an object has them are counted; attributes that an object does not have and attributes for which this information is unavailable are treated the same way (i.e. not taken into account when counting).

Disregard Attribute Strategy. This strategy does not take into account the attributes for which information about the objects to be compared is incomplete. It can always derive some preference between two objects, since the information regarding the remaining attributes is complete, so a complete preference ordering over objects can be derived. But the inference might not be correct, since the attributes that are disregarded might be relevant for defining a preference order. For example, suppose it is not known whether the cottage is large. In that case, the attribute large will not be taken into account when comparing the cottage to another object. This leaves only the attributes garden and closeToWork on the highest importance level, of which every object satisfies exactly one. Since the cottage has the most attributes on the next importance level, a preference of the cottage over the villa as well as the apartment will be derived, even though in the original example the cottage was the least preferred object.

Cautious Strategy. In order to prevent the derivation of preferences that are only correct by chance, a natural alternative is to use a cautious strategy that prevents such inferences. This strategy infers nothing unless all information about the objects under comparison is available. It never makes incorrect preference inferences, but it lacks decisiveness: even if the unknown information is irrelevant to an inference, nothing is inferred.

Table 4. Examples of objects and attributes with incomplete information (X: the object has the attribute; empty: it does not; ?: unknown)

     a.  P  Q  R        b.  P  Q        c.  P  Q
     a   X  X  ?        a   X  ?        a   X  ?
     b   ?     X        b   ?  X        b      X

4.2 Desired Properties for Strategies

Given the limitations of the strategies discussed above, it is clear that we need a more balanced strategy that takes two main concerns into account, which we call decisiveness and safety.

Decisiveness. We call a strategy decisive if it does not infer too little. As mentioned above, an unknown attribute might be irrelevant for deciding a preference. This is the case if the preference is already determined by more important attributes.
For example, suppose that we do not know whether the apartment has attribute nearShops. Then we can still conclude that the apartment is preferred over the cottage, based on the attributes large, garden, and closeToWork. It is not required that a preference is derived in every case, since the missing information might be essential, but all preferences that are certain (for which no essential information is missing) should be derived. The cautious strategy is not decisive.

Safety. We call a strategy safe if it does not infer too much. Suppose again that we do not know whether the apartment has attribute nearShops. Whereas this is irrelevant for deciding a preference between the apartment and the cottage, we do need this information to decide the preference between the villa and the apartment. A strategy that makes assumptions about the missing information, or that disregards the attribute in question, will make unfounded inferences and hence be unsafe. The optimistic, pessimistic and disregard attribute strategies are not safe.

4.3 A Decisive and Safe Strategy

We have seen above what may go wrong when a naive strategy is used to deal with incomplete information. In this section we define an alternative strategy that does satisfy the properties of decisiveness and safety identified above. For a strategy to be safe, a preference inference should never be based on an unfounded assumption. But to be decisive, a strategy needs to be able to distinguish relevant from irrelevant information. Our approach is based on the following intuition. When comparing two objects under incomplete information, multiple situations are possible: whenever it is not known whether an object has an attribute, there is a possibility that it does and a possibility that it does not. If a preference can be inferred in every possible situation, then apparently the missing information is not relevant, and it is safe to infer that preference. It is not necessary to check every possible situation; it suffices to look at extreme cases. For every object we can construct a best-case and a worst-case scenario, or best and worst possible situation. A possible situation is a completion of an object in the sense that all missing information is filled in.

Definition 6. (Completion) A completion of an object a is an extension of the knowledge base with (previously missing) facts about a such that for every attribute P, either P(a) or ¬P(a) is in the extended knowledge base.

So if a has n unspecified attributes, there are 2^n possible completions of a. Since we assumed that presence of an attribute is preferred over absence, the most preferred completion assumes presence of all unknown attributes, and the least preferred completion assumes absence. If even the least preferred completion of a is preferred over the most preferred completion of b, then a must always be preferred over b, since a could not be worse and b could not be better. For example, consider the objects and attributes in Table 4a. In the worst case for a, a does not have attribute R. In the best case for b, b has attribute P. But even in this situation a will be preferred over b, based on attribute Q. There is no way that this situation can improve for b or deteriorate for a, so it is safe to infer a preference for a over b. The strategy's power to make such inferences makes it decisive.
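In code, the strategy amounts to building the two extreme completions of each object and comparing them lexicographically. The sketch below is ours, not the authors' implementation; in particular, the importance ordering P ≻ Q ≻ R assumed for Table 4a is an illustrative assumption, not something fixed by the table itself. For the data of Table 4a it prints pref(a, b); the last two branches anticipate the weak and equal preference rules formalised in Section 5.

```python
# Safe-and-decisive comparison via extreme completions. `known` maps
# (object, attribute) to True/False; a missing key means the fact is unknown.
LEVELS = [{"P"}, {"Q"}, {"R"}]     # assumed importance order for Table 4a
known = {("a", "P"): True, ("a", "Q"): True,      # R(a) is unknown
         ("b", "Q"): False, ("b", "R"): True}     # P(b) is unknown

def completion(obj, best, attrs):
    # best completion: unknown attributes assumed present; worst: absent
    return {p for p in attrs if known.get((obj, p), best)}

def lex_compare(x, y):
    """Lexicographic ordering on attribute sets (cf. Definition 1)."""
    for level in LEVELS:
        nx, ny = len(level & x), len(level & y)
        if nx != ny:
            return 1 if nx > ny else -1
    return 0

attrs = set().union(*LEVELS)
a_lo, a_hi = completion("a", False, attrs), completion("a", True, attrs)
b_lo, b_hi = completion("b", False, attrs), completion("b", True, attrs)

if lex_compare(a_lo, b_hi) > 0:
    print("pref(a, b)")     # safe: even worst-case a beats best-case b
elif lex_compare(a_hi, b_lo) > 0 and lex_compare(a_lo, b_hi) == 0:
    print("wpref(a, b)")    # strict in the best case, equal in the worst
elif lex_compare(a_hi, b_lo) == 0 and lex_compare(a_lo, b_hi) == 0:
    print("eqpref(a, b)")   # only possible with complete information
```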
The next example illustrates that this approach does not infer a preference when the missing information is relevant. Consider Table 4b. In the situation that is worst for a and best for b, b will be preferred because it has both attributes while a only has P. But in the other extreme situation, which is best for a and worst for b, a is preferred. This means that in reality anything is possible, and it is not safe to infer a preference.

We have seen when a preference for a over b can be inferred, and in which case no preference can be inferred. There are, however, two more possibilities. One is the case in which a preference of the most preferred completion of a over the least preferred completion of b can be derived, but only an equal preference between the least preferred completion of a and the most preferred completion of b. This is illustrated in Table 4c. In this case we would like to derive at least a weak preference of a over b. This is important because in many cases a weak preference is strong enough to base a decision on, even if a strict preference cannot be derived: when having to decide between a and b, choosing a cannot be wrong when a is weakly preferred over b. Failing to derive a weak preference makes a strategy less decisive. The last possibility is equal preference. We only want to derive an equal preference between two objects a and b if all possible completions of a are equally preferred as all possible completions of b. This also means that the most and least preferred completions of a and b have to be equally preferred. This can only be the case if all information about a and b is known, for as soon as some information is missing, there will be multiple possible completions which are not equally preferred.

5 Argumentation Framework for Incomplete Information

This section presents how our framework is extended to incorporate the decisive and safe strategy for incomplete information presented in Section 4.3. We first present the changes to the language and then the changes to the inference rules. The definition of defeat does not have to change.

5.1 Language

To distinguish between the different completions of an object, we introduce a completion label. We use the object name without a label to denote the object in general, that is, the object under any completion. The superscript + is used for the most preferred completion of an object, − for the least preferred completion. For example, consider object a in Table 4a. The most preferred completion of a has attribute R and is denoted a⁺; the least preferred completion of a does not have attribute R and is denoted a⁻.

Reasoning with completions as discussed above can be viewed as a kind of assumption-based reasoning. To support such reasoning, we extend the language with weak negation, denoted by ∼, which is also used in [15]. A formula ∼ϕ can always be assumed, but it is defeated by ϕ (see the next section for the details); the statement ∼ϕ should be interpreted as 'ϕ cannot be derived'. Finally, we add formulas of the type wpref(a, b), which express weak preference, just as pref(a, b) and eqpref(a, b) express strict and equal preference, respectively. We use weak preference in the sense that an object a is weakly preferred over an object b if any completion of a is either preferred over or equally preferred as any completion of b, but no strict or equal preference can be derived with certainty. This leads to the following redefinition of the language.
Definition 7. (Language) Let P be a set of attribute names with typical elements P, Q, let O be a set of object names with typical elements a, b, let n be a non-negative integer, and let x, y ∈ {+, −, {}} be labels for objects (where {} means no label). The language L is defined as follows:

ϕ ∈ L ::= P(a) | P ≻ Q | P ≈ Q | pref(aˣ, bʸ) | eqpref(aˣ, bʸ) | wpref(aˣ, bʸ) | has(aˣ, [P], n) | ¬ϕ | ∼ϕ

5.2 Inferences

The inference rules of the extended framework are listed in Table 5. Two inference rules are added that define the meaning of the weak negation ∼. According to inference rule 8, a formula ∼ϕ can always be inferred, but such an argument will be defeated by an undercutter built with inference rule 9 if ϕ is the case. P is supposed to be among the attributes of the least preferred completion of a (a⁻) only if it is known that a has P; this is modelled by inference rule 2b in Table 5. For the most preferred completion of a, it is only required that it is not known that a does not have P; if this is not known, a⁺ will be assumed to have P. This is modelled by using premises of the form ∼¬P(a) instead of P(a), as can be seen in inference rule 2a. Inference rules 4 through 7 remain unchanged, except that completion labels are added. To infer overall preferences from the preferences over certain completions, three more inference rules are defined. Inference rule 10 states that if (even) a⁻ is preferred over b⁺, then a must be preferred over b, as we saw above. When a⁺ is preferred over b⁻ but a⁻ is only equally preferred as b⁺, this is not strong enough to infer a strict preference of a over b, but we can infer a weak preference of a over b using inference rule 11. Rule 12 states that, in order to infer equal preference between a and b, both the most preferred completion of a and the least preferred completion of b, and the least preferred completion of a and the most preferred completion of b, must be equally preferred.

Table 5. Inference schemes for incomplete information (same notation as Table 2; x and y range over completion labels)

  1  [count(aˣ, [P], ∅)]:  (no premises)  ⟹  has(aˣ, [P], 0)
  2a [count(a⁺, [P1], {P1, ..., Pn})]:  ∼¬P1(a), ..., ∼¬Pn(a), P1 ≈ ... ≈ Pn  ⟹  has(a⁺, [P1], n)
  2b [count(a⁻, [P1], {P1, ..., Pn})]:  P1(a), ..., Pn(a), P1 ≈ ... ≈ Pn  ⟹  has(a⁻, [P1], n)
  3a [count(a⁺, [P1], {P1, ..., Pn})uc]:  ∼¬P1(a), ..., ∼¬Pn(a), P1 ≈ ... ≈ Pn  ⟹  'count(a⁺, [P1], S ⊂ {P1, ..., Pn}) is inapplicable'
  3b [count(a⁻, [P1], {P1, ..., Pn})uc]:  P1(a), ..., Pn(a), P1 ≈ ... ≈ Pn  ⟹  'count(a⁻, [P1], S ⊂ {P1, ..., Pn}) is inapplicable'
  4  [prefinf(aˣ, bʸ, [P])]:  has(aˣ, [P], n), has(bʸ, [P′], m), P ≈ P′, n > m  ⟹  pref(aˣ, bʸ)
  5  [prefinf(aˣ, bʸ, [P])uc]:  has(aˣ, [Q], n), has(bʸ, [Q′], m), Q ≈ Q′ ≻ P, n ≠ m  ⟹  'prefinf(aˣ, bʸ, [P]) is inapplicable'
  6  [eqprefinf(aˣ, bʸ, [P])]:  has(aˣ, [P], n), has(bʸ, [P′], m), P ≈ P′, n = m  ⟹  eqpref(aˣ, bʸ)
  7  [eqprefinf(aˣ, bʸ, [P])uc]:  has(aˣ, [Q], n), has(bʸ, [Q′], m), Q ≈ Q′ ≉ P, n ≠ m  ⟹  'eqprefinf(aˣ, bʸ, [P]) is inapplicable'
  8  [asm(∼ϕ)]:  (no premises)  ⟹  ∼ϕ
  9  [asm(∼ϕ)uc]:  ϕ  ⟹  'asm(∼ϕ) is inapplicable'
  10:  pref(a⁻, b⁺)  ⟹  pref(a, b)
  11:  eqpref(a⁻, b⁺), pref(a⁺, b⁻)  ⟹  wpref(a, b)
  12:  eqpref(a⁺, b⁻), eqpref(a⁻, b⁺)  ⟹  eqpref(a, b)

Example 4. In the case of Table 4a, the following argument can be built:

  Q(a)  ⟹  has(a⁻, [Q], 1)
  (no premises)  ⟹  has(b⁺, [Q], 0)
  has(a⁻, [Q], 1), has(b⁺, [Q], 0), Q ≈ Q, 1 > 0  ⟹  pref(a⁻, b⁺)
  pref(a⁻, b⁺)  ⟹  pref(a, b)
The next argument shows that a weak preference can be inferred in the situation of Table 4c:

  ∼¬P(a), ∼¬Q(a), P ≈ Q  ⟹  has(a⁺, [P], 2)
  Q(b)  ⟹  has(b⁻, [Q], 1)
  has(a⁺, [P], 2), has(b⁻, [Q], 1), P ≈ Q, 2 > 1  ⟹  pref(a⁺, b⁻)
  P(a)  ⟹  has(a⁻, [P], 1)
  ∼¬Q(b)  ⟹  has(b⁺, [Q], 1)
  has(a⁻, [P], 1), has(b⁺, [Q], 1), P ≈ Q, 1 = 1  ⟹  eqpref(a⁻, b⁺)
  eqpref(a⁻, b⁺), pref(a⁺, b⁻)  ⟹  wpref(a, b)

6 Conclusion

In this paper we have made the following contributions. Argumentation-based approaches can be used to model qualitative multi-attribute preferences such as the lexicographic ordering. The advantage of argumentation over other approaches emerges most clearly in the case of incomplete information: our approach allows us to reason about preferences from best- and worst-case perspectives (called completions here), and about the consequences for overall preferences.

In our current approach it is still often the case that no preference can be inferred. What should we do in such a case? One approach is to ask the user for the missing information, but the user might not have this information and might not have the time or resources to look it up. In some situations it might be fruitful to relax the notion of safety, which we have used in a very strict sense here: a conclusion is only called safe if it can be drawn in every possible situation. We might instead want to draw a conclusion if it follows in the most likely situation. Of course, to model this we need information about the likelihood of situations; this could, for example, be modelled by a normality ranking [3] or a possibility ranking [9]. Also, although general default assumptions are often not safe, some domain-specific default assumptions may be safe enough. For example, if nothing to the contrary is known, one may safely assume that a house has electricity. Some default assumptions may be conditional; for example, a detached house usually has a garden. One interesting extension is therefore to add such default reasoning, and more general reasoning about the beliefs of an agent, to the framework. Default rules (e.g. detached(a) ⇒ garden(a)) can be placed in the knowledge base. Next, an inference rule is needed that applies these rules and can infer garden(a) from detached(a) and detached(a) ⇒ garden(a). Finally, a strength mechanism is needed, so that factual information always defeats rebutting default assumptions (e.g. if ¬garden(a) is known for a fact, then this defeats the conclusion garden(a) that was derived using a default rule, but not vice versa).

In our future work we would like to distinguish more explicitly between mental attitudes such as beliefs, goals, desires and preferences. This will also allow us to reason about these attitudes, for example about the fact that a certain preference is based on some specific beliefs. We hope to gain insight from modal preference languages with belief operators such as the one presented in [13]. Other interesting areas for future work include the representation of dependent preferences (e.g. 'I only want a balcony if the house does not have a garden; otherwise I do not care') and the relation with e.g. CP-nets [4] and value-based argumentation [11]. Finally, we believe that the argumentation-based framework for preferences presented here can be usefully applied in the preference elicitation process. It allows the user to extend and refine the system's representation of his preferences gradually and as he sees fit. To facilitate this elicitation process, more research is needed into how our framework can support a user, e.g. by indicating which information is still missing.
Acknowledgements. This research is supported by the Dutch Technology Foundation STW, applied science division of NWO, and the Technology Program of the Ministry of Economic Affairs. It is part of the Pocket Negotiator project with grant number VICI-project 08075.

References
1. L. Amgoud, N. Maudet and S. Parsons. Modelling dialogues using argumentation. Proc. ICMAS, 2000.
2. T.J.M. Bench-Capon and P.E. Dunne. Argumentation in artificial intelligence. Artificial Intelligence, 171:619–641, 2007.
3. C. Boutilier. Toward a logic for qualitative decision theory. Proc. KR, pages 75–86, 1994.
4. C. Boutilier, R.I. Brafman, C. Domshlak, H.H. Hoos, and D. Poole. CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. Journal of Artificial Intelligence Research, 21:135–191, 2004.
5. G. Brewka. A rank based description language for qualitative preferences. Proc. ECAI, 2004.
6. G. Brewka, S. Benferhat, and D. Le Berre. Qualitative choice logic. Artificial Intelligence, 157(1-2):203–237, 2004.
7. S. Coste-Marquis, J. Lang, P. Liberatore, and P. Marquis. Expressive power and succinctness of propositional languages for preference representation. Proc. KR, pages 203–212, 2004.
8. J. Doyle and R.H. Thomason. Background to qualitative decision theory. AI Magazine, 20(2):55–68, 1999.
9. D. Dubois and H. Prade. Possibility theory as a basis for qualitative decision theory. Proc. IJCAI, 1995.
10. P.M. Dung. On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games. Artificial Intelligence, 77:321–357, 1995.
11. S. Kaci and L. van der Torre. Preference-based argumentation: Arguments supporting multiple values. Int. J. of Approximate Reasoning, 48:730–751, 2008.
12. R.L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Cambridge University Press, 1993.
13. F. Liu. Changing for the Better: Preference Dynamics and Agent Diversity. PhD thesis, Universiteit van Amsterdam, 2008.
14. H. Prakken. A study of accrual of arguments, with applications to evidential reasoning. Proc. ICAIL, pages 85–94, 2005.
15. H. Prakken and G. Sartor. Argument-based extended logic programming with defeasible priorities. Journal of Applied Non-Classical Logics, 7:25–75, 1997.
16. I. Rahwan, S.D. Ramchurn, N.R. Jennings, P. McBurney, S. Parsons, and L. Sonenberg. Argumentation-based negotiation. Knowledge Engineering Review, 18(4):343–375, 2004.

A Difference Logic Approach to Solve Matching Problems in Multi-Agent Settings

Helena Keinänen and Misa Keinänen
Helsinki University of Technology, Faculty of Information and Natural Sciences

Abstract. Matching problems are extensively studied combinatorial problems with real-world applications in the domain of multi-agent systems. In this paper, we describe an approach to solve NP-hard matching problems with difference logic satisfiability solvers. We present two novel encodings from matching problems to the satisfiability of difference logic: one encoding is given for two-sided stable matching, and another for a kidney exchange problem. As a consequence of these encodings, we can directly employ fast implementations of satisfiability checking algorithms for difference logic in order to solve matching problems. We have implemented the presented encodings, and we demonstrate via numerical comparisons the usefulness and applicability of our approach.
1 Introduction

Several multi-agent scenarios require coordination of agents to form coalitions, as well as allocation of indivisible items under non-transferable utilities of agents. Stable matching provides a useful mechanism to resolve such coalition formation and allocation problems. Classical examples of matching problems deal with student admissions, stable marriages and housing markets [1,2,3]. More recent examples of matching problems include the allocation of kidney donors to compatible kidney transplant patients [4,5,6]. Many different real-world situations in multi-agent systems (e.g. robotic soccer, public transportation problems and autonomous robotic rescue scenarios) can be seen as instances of the classical matching problems [7]. However, in some important scenarios current stable matching techniques fail to scale up, and there are even some recent variants of matching problems for which no algorithms have been presented to date. By combining the efficient algorithmics of fast satisfiability solvers for difference logic with appropriate problem encodings based on difference logic, we are able to solve many matching problems orders of magnitude faster than the currently best matching solution techniques. In this paper we present translations that provide a basis for computationally efficient, polynomial-time algorithms for converting matching problems into difference logic formulas in an automated way. Our work also provides a new difference logic perspective on representational issues in modeling agents' preferences.

In particular, there are matching problem variants which are difficult, NP-hard combinatorial problems (see e.g. [8,9]). These hard variants typically model realistic features of preferences, such as incompleteness and indifference in the preference lists of agents. There are some attempts to deal with the harder variants of the problem in [10,8,9]. However, these methods turn out to be inefficient in practice if the number of agents grows and the preference lists are incomplete, not strictly ordered and short [11,12]. Motivated by the constraint programming encoding and the SAT encoding of [11,12], we propose an alternative difference logic encoding for hard variants of stable matching problems. We have conducted experiments and report results demonstrating the effectiveness of the suggested approach.

A more recent variant of the stable matching problem, which has great practical importance, is the so-called kidney exchange problem. Kidney transplantation is currently the most preferable treatment for serious kidney failures, but there is an acute world-wide shortage of deceased-donor kidneys. An optimal matching yields transplants for a maximal number of patients on the transplantation queue, while also taking into account the number of surgeries that can be performed simultaneously. The social impact of developing efficient allocation methods for the kidney exchange problem is considerable, since the introduction of live donor exchange programs has remarkably shortened the waiting times for suitable transplants [5,13]. Some countries (e.g. the USA [13] and the Netherlands [14]) have established kidney exchange programs to overcome the difficulties in identifying a compatible donor for a recipient. The patients involved in these programs can exchange their incompatible donors in order to receive a compatible one. These exchanges, which first occurred pairwise, have since been enlarged to cover exchange cycles with multiple recipient-donor pairs.
Although the possibility of finding a suitable donor for a patient increases as the length of the cycle grows and as the number of (compatible/incompatible) pairs willing to join the exchange program increases, there are practical and ethical constraints that keep the size of the cycles short [4]. More recently, an important step has been taken in the kidney exchange problem [15]: the introduction of non-simultaneous, extended, altruistic-donor (NEAD) chains has made it possible to perform exchange chains of several kidney transplantations involving an altruistic donor. Although there exist efficient methods for solving the classical kidney exchange problems [6], these techniques are not directly applicable to the most recent NEAD chain kidney exchange model described in [15]. In this paper, we propose a novel technique to solve the NEAD chain kidney exchange problem by employing the practically efficient algorithmics behind fast difference logic satisfiability solvers. From a multi-agent system point of view, the results in this paper are important because the matching problems and the kidney exchange problems can be seen as directly equivalent to special cases of multi-agent coalition formation, where agents group together to form coalitions according to their preferences.

2 Difference Logic

Difference logic is propositional logic combined with the theory of integer differences over infinite domains. The syntax of difference logic can be defined as follows.

Definition 1. Let P = {p1, p2, ..., pn} be a set of Boolean variables and let X = {x1, x2, ..., xm} be a set of integer variables. The set of atomic formulas consists of the propositions in P and the integer constraints of the forms (xi = c), (xi ≤ c) and (xi < c), with xi ∈ X and c ∈ ℤ. The set F of all difference logic formulas is the smallest set containing the atomic formulas which is closed under negation and conjunction:
– if Φ ∈ F, then ¬Φ ∈ F, and
– if Φ ∈ F and Ψ ∈ F, then (Φ ∧ Ψ) ∈ F.
The remaining Boolean connectives ∨, →, ↔ are defined in the usual way in terms of ¬ and ∧. Our version of difference logic is actually a subset of standard difference logic, which also allows integer constraints of the form (xi + c ≤ xj).

Let us define the semantics. A valuation of (P, X) consists of two overloaded functions v : P → {⊤, ⊥} and v : X → ℤ. The valuation v is extended to all formulas in F by defining v(xi = c) = ⊤ iff v(xi) = c, v(xi ≤ c) = ⊤ iff v(xi) ≤ c, and v(xi < c) = ⊤ iff v(xi) < c. The usual semantics is applied for the Boolean connectives. A formula Φ is satisfied by a valuation v iff v(Φ) = ⊤. A formula Φ is satisfiable if there exists a satisfying valuation. The satisfiability problem for difference logic is to determine whether or not a given formula Φ is satisfiable; this problem is known to be NP-complete [16]. Recently, several practically efficient satisfiability solvers have been developed that can solve very large problem instances. These solvers implement highly optimized, dedicated algorithmics such as [17] and [18]. In what follows, we show how these algorithms can be directly employed to solve hard variants of matching problems.
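As a small illustration of what such a solver does, the following sketch checks a difference logic formula of the fragment in Definition 1 for satisfiability. We use the Z3 SMT solver's Python bindings purely as a convenient stand-in; Z3 is not one of the solvers evaluated in this paper.

```python
# Checking a toy difference logic formula: atoms are Boolean variables and
# integer constraints of the forms (x = c), (x <= c), (x < c).
from z3 import Int, Bool, Solver, And, Or, Not, sat

x1, x2 = Int("x1"), Int("x2")
p1 = Bool("p1")

s = Solver()
# (x1 <= 3) AND (not p1 OR x2 = 5) AND (x1 < 2 OR p1)
s.add(And(x1 <= 3, Or(Not(p1), x2 == 5), Or(x1 < 2, p1)))

if s.check() == sat:
    print(s.model())   # a satisfying valuation, e.g. x1 = 1, p1 = False
```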
Also, one matching set may consist of agents and the other may represent any passive entities such as resources. Each m ∈ M has a preference order over w ∈ W and each w ∈ W has a preference order over m ∈ M . The classical version of the SM requires the preference orders to be strict and complete. There exist efficient polynomial time algorithms for SM [1]. More realistic features of preferences, such as incompleteness and indifference in the preference lists of agents, can be modeled as harder variants of SM. Here, 174 we study an N P -complete extended variant of SM where the preference lists may be incomplete and the indifference in the preference lists take the form of ties. This variant of SM is called stable marriage problem with ties and incomplete lists (SMTI), see e.g. [8,9]. A matching M is defined as a binary relation from M to W , representing an assignment where every man is matched to at most one woman, and each woman is matched to at most one man. In case of SM with complete preference lists, M in I is a bijective function from the set M to the set W , (or from W to M ). However, if the preference lists are incomplete, then an agent can find other agents as impossible mates. When wi (mi ) appears in the preference list of mj (wj ), we say that wi (mi ) is acceptable to mj (wj ), otherwise unacceptable. As usually [11,12,9] we assume that, if wi is acceptable to mj , mj is also acceptable to wi . Also, since the preference lists of the agents may be incomplete, an agent can remain single. If in a matching M each man (woman) is matched to exactly one woman (man), then M is called a complete matching. The requirement of the strict order in the preference list is often relaxed by letting the agents to be indifferent between some agents on their preference lists such that the preference lists involves ties. While there are ties on the preference lists, we will define the stability of a matching as weak stability. According to the weak stability condition, in the case of incomplete preference lists, matching M is stable if there is no unmatched pairs that are acceptable to each other and would strictly prefer each other to their matched partners. Thus, we call a matching M unstable if the instance of the problem involves two men mi , mj ∈ M and two women wk , wl ∈ W such that – mi is matched to wl and mj is matched to wk , – mi strictly prefers wk to wl and wk strictly prefers mi over mj , and – the agents are acceptable to each other. The pair (mi , wk ) above is called a blocking pair. A matching that admits no blocking pair is called stable. The main aim in SM is to form matchings that are stable. In the classical version on SM, each instance I always yields at least one stable matching [1]. However, in the case of SMTI a stable matching does not necessarily exist, or it is not a complete matching. An instance of SMTI can yield stable matchings of different sizes [9]. Example 1. As an example consider an instance of the SM shown in Table 1. The instance involves two sets of agents A1 = {1, 2, . . . 8} and A2 = {1, 2, . . . 8}. The agents have incomplete preference lists with ties. In Table 1, the ties are indicated by the symbols ( and ). The instance of this example has a complete stable matching, namely (1, 2)(2, 8)(3, 5)(4, 6)(5, 3)(6, 4)(7, 1)(8, 7). We now define a difference logic encoding for the SMTI. The encoding is very similar to [12], but in contrast to [12] we use only integrity constraints instead of Boolean constraints. 
We now define a difference logic encoding for the SMTI. The encoding is very similar to [12], but in contrast to [12] we use only integer constraints instead of Boolean constraints. Given an instance I of SMTI with n men and n women together with their preference lists (as usual in the literature, we assume without loss of generality that the matching sets are of equal size), we construct a difference logic formula ΦI which is satisfiable iff there is a complete stable matching M for I.

Let us first define some notation. For 1 ≤ i, j ≤ n, we introduce integer variables mi and wj to represent the men and the women of the instance I. For 1 ≤ i ≤ n, let lmi denote the integer constant equal to the length of the preference list of man mi in the instance I; similarly, for 1 ≤ j ≤ n let lwj denote the integer constant equal to the length of the preference list of woman wj. Let Acc be the set of all pairs (mi, wj) in I acceptable to each other, i.e., for 1 ≤ i, j ≤ n, (mi, wj) ∈ Acc iff mi appears on the preference list of wj and vice versa. For (mi, wj) ∈ Acc, let p be the integer constant equal to the position of wj on mi's preference list, and let q be the integer constant equal to the position of mi on wj's list. In addition, for 1 ≤ i, j ≤ n let p⁺ be the integer constant equal to the position on mi's list of the first woman who is strictly worse than the woman in position p; if there is no such woman, p⁺ = lmi + 1. Finally, we define q⁺ in the same way as p⁺.

The formula ΦI is a conjunction (Φm ∧ Φw ∧ Φc ∧ Φs) with the sub-formulas defined as follows:

  Φm = ⋀_{1 ≤ i ≤ n} (mi ≤ lmi) ∧ (mi ≥ 1),
  Φw = ⋀_{1 ≤ j ≤ n} (wj ≤ lwj) ∧ (wj ≥ 1),
  Φc = ⋀_{1 ≤ i, j ≤ n, (mi, wj) ∈ Acc} (mi = p) ↔ (wj = q),
  Φs = ⋀_{1 ≤ i, j ≤ n, (mi, wj) ∈ Acc} (mi < p⁺) ∨ (wj < q⁺).
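A sketch of how ΦI can be assembled with an off-the-shelf SMT solver is given below, again with Z3's Python bindings as a stand-in for the solvers used in the experiments. The lookup tables pos, pos_w, succ_m and succ_w are hypothetical helpers holding p, q, p⁺ and q⁺ for each acceptable pair; building them from the tie-structured preference lists is routine and omitted here.

```python
from z3 import Int, Solver, And, Or

def encode_smti(n, lm, lw, acc, pos, pos_w, succ_m, succ_w):
    """Build Phi_I = Phi_m & Phi_w & Phi_c & Phi_s (indices 0-based here)."""
    m = [Int(f"m{i}") for i in range(n)]
    w = [Int(f"w{j}") for j in range(n)]
    s = Solver()
    for i in range(n):                                   # Phi_m
        s.add(And(m[i] >= 1, m[i] <= lm[i]))
    for j in range(n):                                   # Phi_w
        s.add(And(w[j] >= 1, w[j] <= lw[j]))
    for (i, j) in acc:
        p, q = pos[(i, j)], pos_w[(j, i)]
        s.add((m[i] == p) == (w[j] == q))                # Phi_c (iff)
        s.add(Or(m[i] < succ_m[(i, p)],                  # Phi_s
                 w[j] < succ_w[(j, q)]))
    return s, m, w
# A sat result of s.check() then yields a complete stable matching via the
# positions assigned to the m- and w-variables in s.model().
```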
We have the following theorem, which states the correctness of the translation.

Theorem 1. Given an instance I of the SMTI problem, there is a complete stable matching M for I if and only if the difference logic formula ΦI is satisfiable.

Proof. First we show that satisfiability of ΦI implies that there is a complete stable matching M for I. Suppose that there does not exist a complete stable matching but ΦI is satisfiable. Since there are no stable matchings, for each matching M there must exist a blocking pair (m, w′) such that m is matched to w, m′ is matched to w′, m strictly prefers w′ to w, and w′ strictly prefers m to m′. This means that m is matched with w, who holds position p on the preference list of m, while w′ holds position p′ < p on the same list. Equally, w′ is matched with m′, who holds position q on the list of w′, while m holds position q′ < q on that list. Given the above blocking pair and the corresponding preferences, there is no integer valuation satisfying both conjuncts Φc and Φs, which contradicts the assumption. Thus, it cannot be the case that ΦI is satisfiable while there is no stable matching M.

What remains to be shown is that the existence of a stable M implies the satisfiability of ΦI. This can be done as follows. Let M be any complete stable matching for the given instance I of the SMTI. We construct an integer valuation from M which satisfies ΦI. For all pairs (m, w) ∈ M, let the integer value v(m) be the position of w on m's preference list and let the integer value v(w) be the position of m on w's preference list. As M is complete, all variables of ΦI are clearly assigned a value. Furthermore, all conjuncts of the formula ΦI evaluate to ⊤ under the valuation v, and consequently v(ΦI) = ⊤. This implies the formula is satisfied by v. Hence, ΦI is satisfiable. □

The size of the difference logic encoding is as follows.

Theorem 2. Given an instance I of the SMTI problem, the difference logic formula ΦI has O(n) variables and the size of the formula is O(n²) (with n the number of men).

Next, we turn to consider another matching problem.

4 Solving an Exchange Market Problem via Difference Logic

In this section, we consider a new variant of the kidney exchange problem based on the kidney matching model in [15], namely the kidney exchange problem with Non-simultaneous, Extended, Altruistic-Donor (NEAD) chains. This problem can be seen as an instance of a matching market in multi-agent systems, where autonomous agents exchange indivisible items. Consider a set of patients needing a kidney transplantation who all have their own donors, but whose donors' kidneys are incompatible with them. Suppose such patient-donor pairs group together to exchange donors, so that all patients obtain a donor with a compatible kidney. The techniques presented in [6] can effectively be used to search for a matching where all patients exchange their incompatible donors for compatible ones, and the operations for the kidney transplantations are performed simultaneously. However, as demonstrated in [15], in many real-world situations the patients have very conflicting preferences for the donors, and thus there does not always exist a suitable matching in terms of the conventional two-way simultaneous exchange [5,13,6]. As a more suitable way to perform the kidney donor exchange and the corresponding transplantations, [15] considers the following approach. An altruistic donor joins the group of patient-donor pairs; this altruist is willing to donate a kidney to a compatible patient without requiring a donor in exchange. From this altruistic donation begins a so-called NEAD chain, which resolves the possible preference conflicts among the patient-donor pairs, such that several patients get compatible donors via a series of non-simultaneously performed transplantations.

More formally, we define the kidney exchange problem with NEAD chains in the following way. As in [13], we represent the possible kidney donor allocations as a directed graph G = (V, E). Let v1 ∈ V be the altruistic donor, and let all other nodes in V represent the patient-donor pairs of the exchange. The set of edges E ⊆ V × V represents all of the possible kidney donor allocations (there are no incoming edges to v1, since v1 does not need a donor). Let a NEAD chain be any simple path π = (v1, vi, ..., vj) appearing in G (i.e., every node occurs at most once along the path) which starts from the altruistic donor v1 and has length at least k, where k ≤ |V|. The NEAD chain represents a feasible allocation of kidney donors to patients, and the path length gives the number of transplantations to be performed via non-simultaneous operations. Now, given a compatibility graph G and a target number k of transplantations, the problem is to find a NEAD chain of length k, if one exists. Finding a NEAD chain for a kidney exchange problem represented as G is clearly NP-complete, because it corresponds to the well-known longest path problem.
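Before turning to the encoding, the object being searched for can be pinned down with a small checker (our illustration, not the authors' code): a NEAD chain is a simple path in G that starts at the altruist and is long enough.

```python
# Verifying a candidate NEAD chain. `edges` is the set of directed
# compatibility edges; node 1 is the altruistic donor.
def is_nead_chain(edges, path, k):
    return (len(path) >= k
            and path[0] == 1                              # starts at altruist
            and len(set(path)) == len(path)               # simple path
            and all((i, j) in edges for i, j in zip(path, path[1:])))

E = {(1, 2), (2, 4), (4, 3), (3, 5)}
print(is_nead_chain(E, [1, 2, 4, 3], 4))   # True: a chain of length 4
```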
We now present an approach to find NEAD chains via difference logic satisfiability. Given a compatibility graph G and k ≤ |V|, we construct a difference logic formula Φ_NEAD which is satisfiable iff there is a NEAD chain of length (at least) k in G. The formula Φ_NEAD contains the following variables. For all vi ∈ V, we introduce a Boolean variable pi which represents a patient-donor pair (p1 represents the altruistic donor). For all (i, j) ∈ E, we introduce a Boolean variable ei,j representing the compatibilities. For all 1 ≤ i ≤ n, we introduce an integer variable vi which represents the position of the corresponding transplantation in the kidney transfer surgery sequence of the NEAD chain. The formula Φ_NEAD is defined as a conjunction (Φad ∧ Φk ∧ Φv ∧ Φc ∧ Φe) with the sub-formulas defined as follows:

  Φad = p1 ∧ (v1 = 1),
  Φk = ⋁_{i ∈ V \ {1}} ((vi = k) ∧ pi),
  Φv = ⋀_{(i, j) ∈ E} (ei,j → (vj = vi + 1)),
  Φc = ⋀_{(i, j) ∈ E} (ei,j → pj),
  Φe = ⋀_{i ∈ V} ((pi ∧ vi ≠ k) → ⋁_{(i, j) ∈ E} ei,j).

One can easily verify the correctness of the translation, which is stated as follows.

Theorem 3. Given a kidney exchange compatibility graph G = (V, E) and k ≤ |V|, there is a NEAD chain of length (at least) k in G if and only if the difference logic formula Φ_NEAD is satisfiable.

Proof. First we show that the existence of a k-length NEAD chain in G implies the satisfiability of Φ_NEAD. Suppose there is a NEAD chain π in G whose length is k. We construct from π a valuation v in the following way. For all (i, j) ∈ E, let the value v(ei,j) of the Boolean variable ei,j be ⊤ if and only if the edge (i, j) appears in the chain π. For all i ∈ V, let the value v(pi) of the Boolean variable pi be ⊤ if and only if node i occurs in the chain π. Let the integer value v(v1) of variable v1 be 1. For all i ∈ V \ {1}, if i appears along π, then let the value v(vi) of the integer variable vi be the number of nodes preceding i along the path π plus one. Given this valuation v, one can verify that v(Φad) = ⊤, v(Φk) = ⊤, v(Φv) = ⊤, v(Φc) = ⊤, and v(Φe) = ⊤. Thus, the formula Φ_NEAD is satisfiable.

We now show that the satisfiability of Φ_NEAD implies the existence of a k-length NEAD chain in G. Whenever Φ_NEAD is satisfiable, there exists a valuation v which satisfies the formula, i.e. v(Φ_NEAD) = ⊤. Let us consider the sub-graph G′ = (V′, E′) of G (V′ ⊆ V and E′ ⊆ E) induced by v in the following way. For all i ∈ V, let i ∈ V′ if and only if v(pi) = ⊤. For all (i, j) ∈ E, let (i, j) ∈ E′ if and only if v(ei,j) = ⊤. Now, by the definition of Φ_NEAD, the induced graph G′ clearly contains a k-length path which is a NEAD chain; otherwise the conjuncts (Φad ∧ Φk ∧ Φv ∧ Φc ∧ Φe) would not all be satisfied. As G′ is a sub-graph of G, G also contains a NEAD chain of length k, which concludes the proof. □

The size of the resulting difference logic formula is as follows.

Theorem 4. Given a kidney exchange compatibility graph G = (V, E) and k ≤ |V|, the difference logic formula Φ_NEAD has length O(|V| × |E|) and O(|V| + |E|) variables.

In Section 5, we present experimental results which demonstrate the efficiency of the approach in practice.
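A sketch of the construction of Φ_NEAD with an SMT solver follows; Z3's Python bindings once more serve as a stand-in for the Yices and BCLT solvers benchmarked in Section 5, and the small test graph is our own.

```python
from z3 import Int, Bool, Solver, And, Or, Implies, BoolVal

def encode_nead(V, E, k):
    """Build Phi_NEAD for node list V (node 1 the altruist) and edge set E."""
    p = {i: Bool(f"p{i}") for i in V}
    e = {(i, j): Bool(f"e{i}_{j}") for (i, j) in E}
    v = {i: Int(f"v{i}") for i in V}
    s = Solver()
    s.add(p[1], v[1] == 1)                                    # Phi_ad
    s.add(Or([And(v[i] == k, p[i]) for i in V if i != 1]))    # Phi_k
    for (i, j) in E:
        s.add(Implies(e[(i, j)], v[j] == v[i] + 1))           # Phi_v
        s.add(Implies(e[(i, j)], p[j]))                       # Phi_c
    for i in V:                                               # Phi_e
        out = [e[(a, b)] for (a, b) in E if a == i]
        s.add(Implies(And(p[i], v[i] != k),
                      Or(out) if out else BoolVal(False)))
    return s

s = encode_nead([1, 2, 3, 4], {(1, 2), (2, 3), (2, 4)}, 3)
print(s.check())   # sat: e.g. the chain 1 -> 2 -> 3
```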
5 Experimental Results

In this section, we describe extensive experimental results on solving matching problems with the presented difference logic approach. In order to evaluate the approach, we have implemented in the C programming language [19] various problem instance generators, the difference logic encodings for matching problems, and the previous state-of-the-art Boolean SAT encoding for the SMTI from [12]. All of the experiments were run on a laptop with a 2.13 GHz Intel Celeron CPU and 1 GB of RAM, running Linux. We demonstrate the applicability of our approach via the following three series of experiments.

5.1 Results on kidney exchange problems

In the first series of experiments, we compare the run-time behaviours of the Yices [18] and Barcelogic for SMT version 1.2 (BCLT) [17] difference logic solvers on real-world kidney exchange benchmark problems borrowed from [13,6].³ Unfortunately, there are no previous algorithms directed at the NEAD chain kidney exchange [15]. Thus, we only compare the two difference logic solvers, both using the same encoding presented in Sect. 4; these distinct solvers are based on different algorithmics. Table 2 shows run-time statistics for the solvers to find 100-length (i.e., k = 100) NEAD chains in 40 kidney exchange markets, 10 instances per market size of 200, 400, 600 and 800. Each instance was run 11 times with both solvers, and we report the minimum, median and maximum run-times in seconds. We observe that the Yices solver easily solves all of the problem instances, which indicates the usefulness of the new difference logic approach to the NEAD chain kidney exchange problem. Barcelogic performs somewhat worse on these examples than Yices, and times out at the 1000-second run-time limit on instances larger than 400.

Table 2. The running times (in seconds) on the kidney exchange problem with an altruistic donor.

  Size (|V|)   Yices min / med / max    BCLT min / med / max
  200          1.7  / 1.8  / 1.8        11.6  / 11.8  / 11.9
  400          10.9 / 11.1 / 11.9       172.7 / 173.2 / 180
  600          20.8 / 30.9 / 31.2       >1000 / >1000 / >1000
  800          68.6 / 69.0 / 69.1       >1000 / >1000 / >1000

³ The problems consist of real-world problem instance distributions based on kidney exchange market data maintained at http://optn.transplant.hrsa.gov/data/annualReport.asp.

5.2 Results on SMTI problems

In the second series of experiments, we compare the new difference logic encoding presented in Section 3 with the state-of-the-art SAT encoding for the SMTI from [11,12]. In particular, we compare the run-time behaviours of the zChaff [20] SAT solver and the Yices [18] difference logic solver on SMTI instances generated uniformly at random. We produced the instances with the random SMTI generation algorithm described in [11,12],⁴ with additional restrictions on the lengths of the preference lists (lengths ranging from 6 to 12) and with n = 100 agents. It is known that SMTI problem instances with such short list lengths are usually intractable to solve in practice. Fig. 1 shows the run-times in seconds for both encodings and the corresponding solvers on 100 instances where the preference list lengths are set to 6; we give the minimum (a), the median (b), and the maximum (c) run-times over 5 runs per instance. Similarly, Fig. 2 and Fig. 3 show the behaviours of the solvers on 100+100 instances where the lengths of the lists are set to 9 and 12, respectively. Here, the run-time limit was set to 1000 seconds, because we noticed that zChaff with the encoding from [11,12] typically cannot solve unsatisfiable instances at all in reasonable time.

⁴ We used a Java implementation of the generator obtained from Chris Unsworth.
Based on these results, one can observe that our approach is orders of magnitude faster than the approach in [11,12] when the preference lists are short. In particular, we observe that (unlike zChaff) Yices can easily solve instances which do not have stable matchings. Notably, in these tests all of the instances on which zChaff times out are cases where there is no stable matching, and all of the cases where zChaff does not time out are such that there is a stable matching. We observe that the approaches in [11,12] perform better than our difference logic approach on SMTI instances with long preference lists; as indicated by the results shown in Fig. 3, the turning point in these benchmark problems is list length 12.

In the third series of experiments we use benchmarks from [11,12]. These are generated with the random SMTI instance generation algorithm of [11,12], where a class of random instances is represented by a triple ⟨n, p1, p2⟩, with n the number of men and women, p1 the probability of incompleteness, and p2 the probability of ties on the preference lists. Fig. 4 shows a run-time comparison of zChaff (with the SAT encoding in [11,12]) and Yices (with the difference logic encoding in Sect. 3) on 1000 random instances for each parameter combination in the sequence ⟨100, 0.8, 0.2⟩, ⟨100, 0.8, 0.3⟩, ..., ⟨100, 0.8, 0.8⟩. For both solvers (and encodings, respectively), each of the 7000 instances was run 5 times. Notably, all 7000 instances have a stable matching, so no unsatisfiable cases appear in this series of experiments. One can see that zChaff performs slightly better than Yices, but the run-times are only a few seconds for both solvers on all of the instances; here, Yices is only a constant factor slower than zChaff.
More recently, a novel model has been introduced to the kidney donor exchange problem, namely a non-simultaneous, extended, altruistic-donor chain model where a single altruistic donor may enable the performance of long chains of kidney transplantations [15]. 7 Conclusion In this paper, we presented difference logic encodings for matching problems arising frequently in different types of multi-agent systems. We demonstrated via extensive practical experiments that, combined with suitable difference logic satisfiability solvers, the approach can be effectively used to solve matching problems which cannot be solved with other state-of-the-art techniques. The encodings presented in this paper give a baseline for several further application scenarios, where difference logic is used to represent multi-agent preferences, and practically efficient satisfiability solvers can be used to solve computationally hard matching and coalition formation tasks. 182 10000 zchaff yices Minimum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 6 (a) 10000 zchaff yices Median Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 6 (b) 10000 zchaff yices Maximum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 6 (c) Fig. 1. Hard SMTI instances, n=100, lengths of preference lists 6. 183 10000 zchaff yices Minimum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 9 (a) 10000 zchaff yices Median Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 9 (b) 10000 zchaff yices Maximum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 9 (c) Fig. 2. Hard SMTI instances, n=100, lengths of preference lists 9. 184 10000 zchaff yices Minimum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 12 (a) 10000 zchaff yices Median Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 12 (b) 10000 zchaff yices Maximum Run Times (sec) 1000 100 10 1 0.1 0.01 10 20 30 40 50 60 70 80 90 100 100 Random SMTI Instances, n=100, lengths of lists 12 (c) Fig. 3. Hard SMTI instances, n=100, lengths of preference lists 12. 185 5 runs on 7000 random instances with n=100, p1=0.80, p2= 0.20, 0.30,...,0.80 10 ’out’ using 2:1 x zChaff run−time (secs) 1 0.1 0.01 0.01 0.1 1 Yices run−time (secs) Fig. 4. Run-time comparisons between zChaff and Yices (in seconds). References 1. Gale, D., Shapley, L.S.: College admissions and the stability of marriage. Amer. Math. Monthly 69 (1962) 9–15 2. Knuth, D.E.: Les marriage stables et leur relations avec d’autres problèmes combinatoires. Les Presses de l’Université de Montréal, Montréal (1976) 3. Shapley, L.S., Scarf, H.: On cores and indivisibility. Journal of Mathematical Economics 1 (1974) 23–28 4. Roth, A.E., Sonmez, T., Unver, M.U.: Pairwise kidney exchange. Journal of Economic Theory 125 (2005) 151–188 5. Roth, A.E., Sonmez, T., Unver, M.U.: A kidney exchange clearinghouse in new england. Amer. Econ. Rev. Papers Proc. 95 (2005) 376–380 6. Abraham, D.J., Blum, A., Sandholm, T.: Clearing algorithms for barter exchange markets: Enabling nationwide kidney exchanges. 
In: EC '07: Proceedings of the 8th ACM Conference on Electronic Commerce, ACM (2007) 295–304
7. Stolzenburg, F., Murray, J., Sturm, K.: Multiagent matching algorithms with and without coach. In: Multiagent System Technologies. Volume 2831 of LNCS, Springer (2004)
8. Iwama, K., Manlove, D., Miyazaki, S., Morita, Y.: Stable marriage with incomplete lists and ties. In: Proceedings of ICALP '99: the 26th International Colloquium on Automata, Languages and Programming, Springer-Verlag (1999) 443–452
9. Manlove, D.F., Irving, R.W., Iwama, K., Miyazaki, S., Morita, Y.: Hard variants of stable marriage. Theor. Comput. Sci. 276(1-2) (2002) 261–279
10. Fleiner, T., Irving, R.W., Manlove, D.F.: Efficient algorithms for generalized stable marriage and roommates problems. Theor. Comput. Sci. 381(1-3) (2007) 162–176
11. Gent, I.P., Prosser, P.: An empirical study of the stable marriage problem with ties and incomplete lists. In: ECAI 2002, IOS Press (2002) 141–145
12. Gent, I.P., Prosser, P., Smith, B., Walsh, T.: SAT encodings of the stable marriage problem with ties and incomplete lists. In: SAT 2002. (2002) 133–140
13. Saidman, S.L., Roth, A.E., Sonmez, T., Unver, M.U., Delmonico, F.L.: Increasing the opportunity of live kidney donation by matching for two- and three-way exchanges. Transplantation 81 (2006) 773–782
14. de Klerk, M., Keizer, K.M., Claas, F.H., Witvliet, M., Haase-Kromwijk, B.J., Weimar, W.: The Dutch national living donor kidney exchange program. American Journal of Transplantation 5 (2005) 2302–2305
15. Rees, M.A., Kopke, J.E., Pelletier, R.P., Segev, D.L., Rutter, M.E., Fabrega, A.J., Rogers, J., Pankewycz, O.G., Hiller, J., Roth, A.E., Sandholm, T., Unver, M.U., Montgomery, R.A.: A nonsimultaneous, extended, altruistic-donor chain. N. Engl. J. Med. 360 (2009) 1096–1101
16. Mahfoudh, M., Niebert, P., Asarin, E., Maler, O.: A satisfiability checker for difference logic. In: Proc. of the 5th International Symposium on the Theory and Applications of Satisfiability Testing (SAT'02). (2002) 222–230
17. Nieuwenhuis, R., Oliveras, A.: DPLL(T) with exhaustive theory propagation and its application to difference logic. In: Proceedings of the 17th International Conference on Computer Aided Verification (CAV'05), Springer (2005) 321–334
18. Dutertre, B., de Moura, L.: The Yices SMT solver. Available at http://yices.csl.sri.com/tool-paper.pdf (2009)
19. Kernighan, B.W., Ritchie, D.M.: The C Programming Language. Prentice Hall (1988)
20. Moskewicz, M.W., Madigan, C.F., Zhao, Y., Zhang, L., Malik, S.: Chaff: Engineering an efficient SAT solver. In: Proceedings of the 38th Design Automation Conference (DAC'01). (2001)
21. Roth, A.E., Sotomayor, M.: The college admissions problem revisited. Econometrica 57(3) (May 1989) 559–570
22. Gusfield, D., Irving, R.W.: The Stable Marriage Problem: Structure and Algorithms. MIT Press, Cambridge, MA, USA (1989)
23. Roth, A.E., Sonmez, T., Unver, M.U.: Kidney exchange. Quarterly Journal of Economics 119 (2004) 457–488
24. Awasthi, P., Sandholm, T.: Online stochastic optimization in the large: Application to kidney exchange. In: Proc. of the 21st International Joint Conference on Artificial Intelligence (forthcoming) (2009)

On the Decentral Coordination of Artificial Cowboys: A Jadex-based Realization

Gregor Balthasar, Jan Sudeikat, Wolfgang Renz
Multimedia Systems Laboratory, Hamburg University of Applied Sciences, Berliner Tor 7, 20099 Hamburg, Germany
Tel. +49-40-42875-8304
{baltha_g|sudeikat|wr}@informatik.haw-hamburg.de

Abstract.
The Multi-Agent Programming Contest 2009 is based on last year's Cows and Herders scenario. It provides additional challenges by introducing fence structures as well as persistent cows. To deal with the new scenario, our second-time participation in the contest is based on a set of BDI agents that share knowledge and coordinate by decentralized planning algorithms. As a result of switching from the centralized planning approach used in last year's contest to decentralized planning, the architecture had to be changed from top to bottom and has been extended to cover the scenario changes. The conceived design is implemented in the Jadex system, which provides language constructs to implement BDI agents on top of a distributed systems middleware.

1 Introduction

Our implementation of the artificial cowboys is based on the Jadex1 framework [1], which provides a run-time environment and tool-set for the construction of agent-based software systems. Agents follow the Belief-Desire-Intention (BDI) architecture. The basic elements for the design of these agents are Beliefs, Goals and Plans. Beliefs represent the knowledge that is available to individual actors about themselves, i.e. their internal state, and their environment. Goals represent the objectives that agents can commit to bring about. These are typically defined as specific states of the agent's beliefs. Finally, Plans are used to equip agents with procedural knowledge, i.e. the ability to execute specific tasks or activities. Agents are realized by prescribing the structure of agents in an XML format and providing plans that are programmed in the Java language. Jadex also provides a modularization concept [2] to structure sets of agent elements into reusable functional clusters. The agent execution is governed by a reasoning mechanism that automates the deliberation, i.e. the selection of goals, as well as means-end reasoning, i.e. the selection of plans for the achievement of currently activated goals [1][3].

1 http://jadex.informatik.uni-hamburg.de/bin/view/About/Overview

In the following we describe our ongoing work on the development of a Jadex-based2 MAS to compete in the 2009 Multi-Agent Programming Contest. The reactive planning abilities of BDI agents are exploited to balance reactivity and strategic team play. Agents are arranged in an architecture that allows the adaptation of the team's strategy to varying game play settings. Since the AgentContest 2009 is still to come, the sections Discussion and Conclusion can only rely on the development process.

2 System Analysis and Design

The system consists of ten homogeneous Teammate agents which are able to play different roles. These roles are subdivided into two main categories:

– Leader: commanding a set of 1-4 sidekicks
  • Explorer: wandering the environment and reporting perceptions
  • Herder: guiding groups of cows
  • Disturber: assaulting the enemy's herding attempts
– Sidekick: navigating to assigned locations and evaluating game theoretical aspects

The decision of Teammates to play a certain role is achieved by the Role Decision goal, which becomes active at the beginning of each simulation step and takes the game situation into account. All perceptions of the Teammates are communicated to each other role-independently, from which it follows that every Teammate has the same view of the game situation.
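This role-independent perception sharing can be pictured with a minimal Java sketch; all class and method names here are illustrative assumptions rather than the team's actual Jadex code:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Minimal sketch of role-independent perception sharing: every Teammate
 * merges its own percepts into its world view and broadcasts them to the
 * whole team, so all agents converge on the same view of the game.
 */
public class SharedWorldView {

    /** Cell contents keyed by "x,y", e.g. "cow", "obstacle", "fence". */
    private final Map<String, String> knownCells = new HashMap<>();

    /** Merge one percept received from any Teammate (including ourselves). */
    public synchronized void merge(String cellKey, String content) {
        knownCells.put(cellKey, content);
    }

    /** Called once per simulation step with the agent's own local percepts. */
    public void shareLocalPercepts(Map<String, String> localPercepts,
                                   List<SharedWorldView> teammates) {
        for (Map.Entry<String, String> p : localPercepts.entrySet()) {
            merge(p.getKey(), p.getValue());          // update own view
            for (SharedWorldView mate : teammates) {  // the direct call stands
                mate.merge(p.getKey(), p.getValue()); // in for Jadex messaging
            }
        }
    }
}

In the real system the broadcast would of course go through the agents' communication layer rather than direct method calls; the sketch only shows why every Teammate ends up with the same world view.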
Due to the fact that all roles of the leader-roles pool need teamwork, at least to get through the fence structures, teams are set up consisting of one teammate playing a leader role (Herder, Explorer or Disturber) and up to four teammates playing the sidekick role. The Evolutionary Prototyping method is utilized throughout the development of the system, while the Tropos modeling notations and tools3 are used to facilitate the refinement of the system's architectural design during development.

3 Software Architecture

The Jadex system has been selected for the realization of our competition team, as the BDI agent architecture allows us to employ goal-oriented programming. The activities of agents, i.e. the behaviors that individual actors can exhibit, are partitioned into hierarchies of goals and sub-goals. According to our experience, this design and development stance facilitates the construction of situated and autonomous agents. The ability to include arbitrary Java objects in Jadex agents is of pragmatic value, e.g. to reuse communicative facilities to interact with the server of the contest environment, or to realize custom visualizations that display both the environment and the agent state to facilitate debugging. In addition, we make use of the tailored modularization concept that allows structuring Jadex agents. Therefore, differing concerns, e.g. the communication with the game environment and the communication among the teammates, are clearly separated. These functionalities provide goal-oriented interfaces and can therefore be used by activating specific goals inside agent modules. Communication is routed via the agent state, i.e. perceptions of sensory information cause modifications of the agent beliefs. The Jadex language allows prescribing reactions to these modifications.

2 Jadex 0.96
3 e.g. TAOM4E: http://sra.itc.it/tools/taom4e/

The MAS is composed of a homogeneous set of Teammates which can play different roles. Figure 1 shows the dependency relationships of the identified roles. Role-independently, every Teammate communicates with the competition server, i.e. getting sensory data and moving in the environment. The Teammates use MAS-internal communication to request assistance from each other and regularly communicate their (local) perceptions to all other Teammate agents. As all Teammates have the same view, a visualization can be produced by requesting the knowledge of a single Teammate. All activities are cooperative efforts due to the scenario requirements. The team leaders coordinate their own activities as well as those of their sidekicks. Furthermore, the team leaders can cooperate with other team leaders through FIPA4-compliant ACL messages.

Fig. 1. The MAS Architecture. Tropos model for the dependencies between roles.

4 Foundation for Intelligent Physical Agents, cf. http://www.fipa.org/

4 Agent team strategy

The agent team strategy is divided into the global strategy followed by the whole team and the particular strategies used by the different roles to execute their tasks.

4.1 Global strategy

The team's global strategy is to gain a global overview of the environment as quickly as possible and then to move crowds of cows to one's own corral. Over consecutive simulation runs, the game play environment and the opponent's strategies are subject to change. In order to adapt to these influences, several global modes are distinguished and decided by background processing within dedicated agents.
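As an illustration of such background mode switching, the following Java sketch distinguishes a few plausible global modes; the mode names and thresholds are invented for illustration, since the paper does not specify them:

/**
 * Illustrative sketch of global-mode selection. The modes and thresholds
 * below are assumptions; the paper only states that several global modes
 * are decided by background processing within dedicated agents.
 */
public class GlobalModeSelector {

    enum Mode { EXPLORE, HERD, DISTURB }

    /** Decide the team's global mode from coarse game statistics. */
    Mode decide(double exploredFraction, int ownScore, int enemyScore) {
        if (exploredFraction < 0.8) {
            return Mode.EXPLORE;   // map knowledge still too sparse
        }
        if (enemyScore > ownScore) {
            return Mode.DISTURB;   // contest the enemy's herding attempts
        }
        return Mode.HERD;          // default: drive cows to the own corral
    }
}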
Environment properties, e.g. the obstacle density, and game theoretical aspects demand the adjustment of the searching and herding as well as the disturbing strategies. The game setting permits members of one's own team to retrieve cows from the enemy's corral and vice versa. Therefore our implementation includes the role of the Disturber, which Teammate agents can switch to depending on the game situation. Since defending one's own corral can hardly be done, no defense strategy is implemented. Additionally, the availability of the Teammate agents during the competition is ensured by a dedicated observer agent which can restart agents if needed [4].

4.2 Particular strategies

As the overall effectiveness of the team still depends on bringing herds of cows to the own corral and, given the scenario extension, on driving them out of the enemy's corral, we will use parts of the efficient herding algorithm from last year's participation [4], extended by the following features:

– Recognizing fences and dealing with them
– Instead of herding cows by standing at fixed calculated points, the Teammates now wander between these points to affect more cows
– Depending on the herd's size, a herding team can now consist of up to five Teammates
– For very large herds, teams can now cooperate to herd them

Exploration is also done by teams and is forced from the beginning of the simulation until the whole map is discovered, to ensure that the A* algorithm used for pathfinding can work properly. Furthermore, the enemy can be disturbed while herding cows by a disturber team consisting of up to five Teammates that try to keep the enemy's corral clear of cows (adjusted to the overall game situation, e.g. when the enemy has already herded too many cows to win conventionally, considering the overall number of cows).

5 Discussion

As a result of the intensification of the scenario (introduction of switches and fences, no more vanishing of cows at the corral fields), the main work again had to be centered on the herding of cows. This, combined with the short development time (in spite of the postponement of the contest), forced us to neglect aspects such as game theory and the treatment of the enemy. The cow algorithm used by the contest server has improved over the one of last year's contest (where large cow herds could not be moved anymore) but still provides no swarm behavior (e.g. the flight behavior of a single cow does not spread to other nearby cows). Furthermore, effective defense of the own corral is nearly impossible, since blocking enemy agents from the fence button demands the allocation of five out of ten Teammates, while it is not ensured that there will be a fence separating the corral cells from the rest of the map. A Tit-for-Tat-like punishment, applied if the enemy prevents our own agents from herding any cows by surrounding the switch of a fence separating the corral from the rest of the map, was contemplated, but will not be implemented due to the above-mentioned time issues.

6 Conclusion

We presented a MAS design that aims at combining BDI-based practical reasoning with game theoretical aspects. MAS adaptivity to environment conditions and varying opponent behaviors is considered, as team strategies are continuously validated and revised by all Teammate agents. The outlined design has gone through several revisions, but since the actual contest is still to come, it cannot be ensured that it reflects the final design that will be used in the contest.
Finally, it can be said that the development of the MAS has fulfilled the aims proposed by the AgentContest 2009: key problems could be identified and eliminated, and the localserver package enables benchmarking and further research.

References

1. Braubach, L., Pokahr, A., Lamersdorf, W.: Jadex: A BDI agent system combining middleware and reasoning. In: Software Agent-Based Applications, Platforms and Development Kits. Whitestein Series in Software Agent Technologies, Birkhäuser (2005)
2. Braubach, L., Pokahr, A., Lamersdorf, W.: Extending the capability concept for flexible BDI agent modularization. In: Proc. of PROMAS-2005. (2005)
3. Wooldridge, M.: An Introduction to MultiAgent Systems. Wiley (2002)
4. Balthasar, G., Sudeikat, J., Renz, W.: On Herding Artificial Cows: Using Jadex to Coordinate Cowboy Agents. In: Programming Multi-Agent Systems, 6th International Workshop, ProMAS 2008, Revised and Selected Papers. Springer (2009) 233–237

Developing Artificial Herders Using Jason

Niklas Skamriis Boss, Andreas Schmidt Jensen, and Jørgen Villadsen⋆
Department of Informatics and Mathematical Modelling
Technical University of Denmark
Richard Petersens Plads, Building 321, DK-2800 Kongens Lyngby, Denmark

Abstract. This paper gives an overview of a proposed strategy for the "Cows and Herders" scenario given in the Multi-Agent Programming Contest 2009. The strategy is to be implemented using the Jason platform, based on the agent-oriented programming language AgentSpeak. The paper describes the agents, their goals and the strategies they should follow. The basis for the paper and for participating in the contest is a new course given in spring 2009, and our main objective is to show that we are able to implement complex multi-agent systems with the knowledge gained in an introductory course on multi-agent systems.

1 Introduction

This paper describes the work on a multi-agent system consisting of artificial herders attempting to catch cows. The agents will compete in the Multi-Agent Programming Contest 2009 (the scenario "Cows and Herders"). One of our main objectives in the contest has been to gain experience with the development of multi-agent systems using Jason. Our basis for participating in the contest is the course "Artificial Intelligence and Multi-Agent Systems" given in spring 2009 at the Technical University of Denmark. The course provides an introduction to multi-agent systems using Jason as the implementation platform. We hope to show that this introduction is sufficient to be able to implement a more complex multi-agent system, such as the "Cows and Herders" scenario given in the contest.

2 System Analysis and Design

Our system consists of three kinds of agents: a herder, a scout and a leader. The leader and the scout are basically herders with extra responsibilities. The scout will initially explore the environment and subsequently act as an ordinary herder. The leader will delegate targets to each of the herders – including himself.

⋆ Contact: [email protected]

Our system was designed using the Prometheus methodology as a guideline. By this we mean that we have adapted relevant concepts from the methodology, while not following it too strictly (as stated in [3]). It has allowed us to quickly identify the goals and what agents are needed to complete them.

Fig. 1. Overview of the system.

Figure 1 gives an overview of the system. The diagram distinguishes between the three types of agents, even though the leader and the scout are actually special cases of the herder.
This has been done to easily see the different roles each agent plays. All agents know their own position and how many steps of the match have elapsed. This is used to revise targets, since we do not want the agents to blindly follow a target. An agent gets a new target by fulfilling the goal get new target. The herders will tell the leader to delegate a target based on the agent's current position, while the scout will autonomously decide where to go.

We distinguish between the following types of targets. While the agents do not really have an understanding of each concept, it is helpful for us to be able to tell the targets apart. Exploration targets are targets in an area which has yet to be explored. Such a target is delegated to the scout, as long as it has not explored the entire environment, or to a herder, whenever it does not fulfill the criteria for receiving another type of target. Formation targets are targets behind cows, but within a certain distance from both cows and other herders, so that the group of cows can be controlled and moved (or herded) towards the corral. A switch target is a target next to a switch; an agent should stand next to a switch in order to trigger it. This target will be delegated whenever an agent is near a closed switch and it is reasonable to open it. This is the case if one or more cows are near the fence, or if another agent is on one side while having a target on the other side (thus needing another agent to open the fence, since one agent cannot pass a fence alone). The scenario is quite dynamic, since cows are continuously moving and fences can be opened and closed, and all of this must be taken into account.

3 Software Architecture

Our strategy and agents are implemented using the Jason platform, which is an implementation of the AgentSpeak language, written in Java. Jason is an effective platform for creating multi-agent systems with a variable number of agents. Combined with internal actions, we have a strong foundation for building a multi-agent system which not only uses the features of logic programming, but allows us to develop imperative extensions as well. The use of custom architectures in Jason allows us to implement a local simulation, as described in [1]. This eases testing, as it can be done much faster. As reference implementation we have used an implementation of the 2008 contest made by the authors of Jason. This has helped us get started, even though the scenario differs in many ways from last year's. Our solution to the contest was developed using Eclipse. The implementation focuses on the advantages of object-oriented programming. This also eases future expansion with more agents etc. Shared memory can also be modelled by using references to shared objects used by multiple agents.

4 Agent Team Strategy

The agents will be moving around in a partially known environment. At the beginning of a match everything is unknown, except for what lies within the agents' field of view, and as the agents move around they gain knowledge of the environment. The entire map is represented by a graph, where each node in the graph represents a cell in the environment. When objects such as obstacles or cows are discovered, the corresponding cell in the graph is assigned a value for that kind of object. When agents move around they follow paths calculated by our navigation algorithm.
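The following minimal Java sketch illustrates such a grid-graph world model; the names are illustrative assumptions, and the cost function anticipates the cow-avoidance weighting described next:

/**
 * Sketch of the grid-graph world model: one node per cell, each holding
 * the kind of object last seen there. All names are illustrative.
 */
public class GridMap {

    enum Cell { UNKNOWN, EMPTY, OBSTACLE, COW, FENCE, CORRAL }

    private final Cell[][] cells;

    public GridMap(int width, int height) {
        cells = new Cell[width][height];
        for (Cell[] column : cells) {
            java.util.Arrays.fill(column, Cell.UNKNOWN);
        }
    }

    /** Record a perceived object in the corresponding graph node. */
    public void update(int x, int y, Cell kind) {
        cells[x][y] = kind;
    }

    /**
     * Movement cost used by the path search. Obstacles are impassable;
     * cells occupied by cows get a higher (assumed) weight, so agents
     * navigate around a cluster instead of through it.
     */
    public int cost(int x, int y) {
        switch (cells[x][y]) {
            case OBSTACLE: return Integer.MAX_VALUE; // not a valid path cell
            case COW:      return 10;                // avoid splitting herds
            default:       return 1;
        }
    }
}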
We have chosen to represent the environment as a graph, since this makes it easy to use graph search algorithms for navigation. The actual paths are calculated using the A* algorithm, which is basically an advanced best-first search, as it uses a heuristic to guide the search for optimal paths. A part of our strategy is to try to keep clustered cows together. This means that the agents will have to move around a group of cows to avoid splitting them up. This is ensured by assigning weights to the different cells in the graph. By assigning higher weights to cells occupied by cows and cells adjacent to cows, agents will navigate around a cluster instead of through it. Obstacles are handled slightly differently. The algorithm is implemented so that it does not consider cells containing obstacles as valid cells for a path. This ensures that agents do not try to move through obstacles. To optimize the movement of our agents, the paths are continuously recalculated. This is done since all agents can add new knowledge of obstacles etc. to the graph as they perceive the environment. This ensures that if one agent discovers that a corridor is blocked, then the other agents will try to move around it to get to their targets.

Experiments have shown that it is more efficient to herd cows in groups. To ensure this, the leading agent makes great use of a clustering algorithm. The algorithm works by examining the surroundings of each cow; adjacent cows are grouped together. The strategy for herding the cows is taken care of by the leading agent. The team leader will coordinate the herding, ensuring that the cows are fleeing the right way and that an agent will open the fence at an appropriate time. Our strategy is mainly aimed at maximizing our own score. This means that our agents will not deliberately try to capture cows already being herded by the opponent, but it might happen if the leading agent estimates that they are the cows closest to the corral.

An agent's beliefs consist of what it perceives and what others tell it to believe. Ideally, we would like every agent to know the same, i.e. for them all to have the same beliefs. Unfortunately, since agents can only see a limited area of the environment, this is not directly possible. To ensure that every agent knows the same, any new belief an agent perceives is sent to every other agent. All beliefs are shared immediately, since this does not create much overhead and it is more efficient to share a belief than to consider whether it should be shared. When an agent discovers a static obstacle, every agent should know this, so that their navigation can be adjusted to this new knowledge.

If an agent fails to achieve a given goal, we use the Jason failure handling feature. This is done by implementing a deletion event -!g, which will be executed if a given plan fails [2]. After recovering from a failed plan, we attempt to reintroduce the goal (+!g) again.

5 Discussion

Our strategy is quite dynamic because of our use of path finding and clustering algorithms, which allow the herders to fulfill their goals in any given scenario. However, some of the choices we have made rest on assumptions which may prove to be mistaken when the competition is held. We have decided on a maximum cluster size (i.e. to limit the number of cows in a single cluster), because we believe that the agents may have a hard time herding larger clusters.
This may not be true, though, since it could be more efficient to herd as many cows as possible as long as they are clustered. For an optimal search it is important to move agents in patterns such that the largest possible area is explored. For example, agents should never move side by side towards the same location, since this would not exploit the full potential of the agents' field of view. Likewise, it could prove useful to move agents in patterns that ensure that no cow can remain undetected in the explored area. However, we need to design our algorithms carefully so that they do not take too long to compute, since the duration of a turn is limited.

At the time of writing this article our implementation is complete. However, the contest has been postponed until after the deadline of the article, so we are unable to discuss the results. We have managed to play a single training match against another team, which we won. This match gave us an opportunity to see how our team plays against others. Generally we are quite satisfied with our system, which is able to fulfill the goals of the scenario. Our strategy with a single leader delegating targets leads to a less autonomous approach, but the Jason framework has allowed us to easily implement agents with certain goals and a way to implement plans for handling these goals.

6 Conclusion

As discussed, our primary strategy is to maximize our own score rather than prohibiting the opposing team from scoring points. This has been done by optimizing the search for cows and guiding the cows into the corral using cooperating agents. Likewise, all agents take the positions of the opponents into account when choosing a target. Throughout the project we have considered problems such as navigation, search for objects using multiple start points, clustering, cooperation between agents and multi-agent planning. All planning was implemented using AgentSpeak, while external algorithms such as A*, our clustering algorithm and target delegation were implemented in Java. Despite our limited experience with AgentSpeak and programming intelligent multi-agent systems, we have managed to implement a fairly reasonable system, with agents which fulfill the goals of the contest. The ability of Jason to support custom architectures was a great help during the work.

References

1. Rafael H. Bordini, Jomi F. Hübner, and Daniel M. Tralamazza. Developing a Team of Gold Miners Using Jason. Springer-Verlag LNAI 4908, pages 241–245, 2008.
2. Rafael H. Bordini, Michael Wooldridge, and Jomi Fred Hübner. Programming Multi-Agent Systems in AgentSpeak Using Jason. John Wiley & Sons, 2007.
3. Lin Padgham and Michael Winikoff. Developing Intelligent Agent Systems: A Practical Guide. John Wiley & Sons, 2007.

Herding Agents - MicroJIAC 2.0 Team in Multi-Agent Programming Contest 2009

Erdene-Ochir Tuguldur, Anand Bayarbileg and Marcel Patzlaff
[email protected] [email protected] [email protected]
DAI-Labor, Technische Universität Berlin, Germany

Abstract. The MicroJIAC team participated in the Multi-Agent Programming Contest 2007 with some success. This year, we are participating again; our contest contribution is implemented by a student of a university course using the current version of our agent framework. Unlike the gold mining scenario of MAPC 2007, this year's cow herding scenario has higher complexity and will surely be a very good testbed to evaluate our new agent framework.
1 Introduction

Like our participation in the MAPC 2007 [1], this year's motivation for participating in the contest was to test the new features and to evaluate the usability of the next version of our agent framework, MicroJIAC 2.0 [2]. Since the first version [3], MicroJIAC has undergone several modifications and extensions to meet further requirements. The current edition, MicroJIAC 2.0, supports real-time, adaptable nodes and agents, "hot deployment" and migration, and offers increased usability. This year, the MicroJIAC 2.0 agent team has been prepared by Anand Bayarbileg, a student of a university course at Technische Universität Berlin supervised by members of the Competence Center Agent Core Technologies of DAI-Labor, TU Berlin.

2 System Analysis and Design

Like our other student team, the JIAC V team, we have chosen a role-based approach where each agent decides for itself which role to take, depending on the current state of its world model. First, the agents should explore the terrain in order to find all cows, obstacles, fences and the enemy corral. Once the agents have explored the entire environment, they are able to make the best decisions regarding which cows they should herd and which paths they should use. This is realised by the scout role. Second, if an agent discovers cows, it decides whether it should herd the cows into the corral or not. If it decides to herd the cows, it computes the center of the cow herd and the path from this center to the corral with the A* algorithm. This path is used to drive the cows into the corral. Those actions are subsumed by the herder role. Third, we need an agent that stands at the entry to the corral and opens the entry fence so that the herder agents can drive cows into the corral. This agent has the guard role; in contrast to the previous roles, this role is permanently assigned to a specific agent at the beginning of the game. We also considered realising destructive roles such as the disquieter, which would be responsible for going into the enemy corral and scaring the cows away from it. But we finally decided not to implement destructive roles, to be fair to the other teams. Finally, to make optimal decisions, each agent should know what the other agents are perceiving and planning. Thus each agent broadcasts its perception, so that every agent has the same view of the surroundings, and its intention, which allows the other team agents to coordinate their actions with this agent's actions.

3 Software Architecture

This year's contribution to the contest is realised using the new version of MicroJIAC. MicroJIAC 2.0 is an agent framework for devices with scarce resources. It needs a Java virtual machine supporting at least the CLDC-1.1 [4] and is thus also executable on usual desktop systems. Currently we are extending and modifying its component structure to support real-time Java (RTSJ) [5]. MicroJIAC 2.0 distinguishes between four element kinds: sensors, actuators and active or passive behaviours. We use two sensor/actuator elements to connect the agents to the server and to their team mates. The planning is done via an active behaviour which computes the path in the background. Changes in the knowledge store are issued by the aforementioned connector elements. They trigger the reactive behaviours which implement the action selection. Thus, all specified roles are realised with the reactive behaviours. The actual contest implementation is based on our contribution to the MAPC 2007.
Thus, this year's contribution implements a multi-agent system whose agents are reactive and autonomous. These agents consist of four main agent elements (see Figure 1).

1. Connector: It maintains the connection to the competition server. Parsing the messages received from, and sending the actions back to, the server are the main tasks of this element. The concrete implementation is a combination of the sensor and actuator interfaces.
2. Perceptor: It updates the world model and fires notification events to trigger the rules. Furthermore, it is responsible for the communication and coordination with the other agents. This element also implements the sensor and actuator interfaces.
3. Monitor: It provides a graphical user interface which displays the world model of the agent. This is used for debugging purposes (see Figure 2) and implements only the actuator interface.
4. Rules: Rules implement the logic and are associated with specific world model states. Depending on the state, the rules create actions for the agent. They implement the reactive behaviour interface.

Fig. 1. Design of the Competition Agents

4 Agent team strategy

In order to improve stability and to minimize the damage of an individual agent crash, we follow a self-organizing approach. There is no master agent and all agents are equal, except for the guard agent. The agents cooperate on a number of levels. First, they share their perceptions, their local view on the environment. This provides them with a global view on the environment. Second, the agents exchange their intentions, i.e. what they are going to do. This prevents the agents from going to the same unknown field or even exploring the same region of the world. Furthermore, it helps them to coordinate the cow driving.

Fig. 2. Monitor GUI of the Competition Agents

At the start of the game, an agent is elected as the guard agent, which stands at the corral entry and opens the fence. The other nine agents are responsible for exploring the terrain and driving cows. If an agent doesn't find any cows, it will start to explore the environment. If the area surrounding the own corral isn't yet explored, the agents will first go to the corral. After reaching the own corral, an agent can compute, for every cell that it perceives, the distance from the corral. This value is saved for every cell and used for choosing cows. If an agent finds a cow and no one is herding it, the agent will start to drive the cow into the corral. If the agent finds many cows, it will compute whether some cows form a cluster, and if there are cow clusters, the agent will choose the biggest cluster and drive it into the corral. If an agent finds a closed fence on the way, the agent will open the fence and wait for the other team agents. If an agent has passed through an open fence, it will check whether a team agent is pushing the button. In that case, the agent goes to the button on the other side and allows that team agent to go through the fence. In order to improve the stability of the contest implementation, a number of failure/crash recovery mechanisms will be deployed:

– Whenever the connection between an agent and the server breaks during the simulation, the agent will try to reconnect.
– If an agent crashes, the agent will be restarted and will request the other agents to share the current view of the global environment.

5 Discussion

We hope that the changes made in this year's scenario will bring more dynamics to the contest.
Last year, there were only two agent roles in our JIAC-TNG team [6]: scout and herder. The new scenario introduces the possibility of more agent roles, such as

– a guard that guards the corral and controls the button, and
– a disquieter that tries to scare the cows away from the inside of the enemy corral, to block the labyrinth entry to the enemy corral, or to prevent the enemy agents from accessing the button.

But we finally decided to implement only the guard role, because implementing the other role, the disquieter, seemed unsportsmanlike. We have also used a fully self-organizing approach where all agents are equal. We are looking forward to seeing how our self-organizing agents play against other teams deploying centralized approaches and destructive strategies.

6 Conclusion

For the first time, our agent framework has been used in teaching, and we received much feedback and many fresh ideas from the students. In the course of the contest implementation, we also found bugs in the MicroJIAC agent framework and fixed them. Thus, the contest was a good testbed to evaluate our agent framework and helped to improve it.

References

1. Tuguldur, E.O., Patzlaff, M.: Collecting Gold: MicroJIAC Agents in MULTIAGENT PROGRAMMING CONTEST. In Dastani, M., Segrouchni, A.E.F., Ricci, A., Winikoff, M., eds.: ProMAS 2007 Post-Proceedings. Volume 4908 of LNCS., Springer Berlin / Heidelberg (2008) 257–261
2. Patzlaff, M., Tuguldur, E.O.: MicroJIAC 2.0 - The Agent Framework for Constrained Devices and Beyond. Technical Report TUB-DAI 07/0901, DAI-Labor, Technische Universität Berlin (2009) http://www.dai-labor.de/fileadmin/files/publications/microjiac_20_2009_07_02.pdf
3. Patzlaff, M.: Development of a Scalable Agent Architecture for Constrained Devices. Master's thesis, Technische Universität Berlin (2007)
4. Sun Microsystems, Inc.: Connected Limited Device Configuration. Specification version 1.1 edn. (2003) Available at http://jcp.org/aboutJava/communityprocess/final/jsr139/
5. Bollela, G., Brosgol, B., Dibble, P., Furr, S., Gosling, J., Hardin, D., Turnbull, M., Belliardi, R.: The Real-Time Specification for Java. Addison-Wesley (2000)
6. Hessler, A., Keiser, J., Küster, T., Patzlaff, M., Thiele, A., Tuguldur, E.O.: Herding Agents - JIAC TNG in Multi-Agent Programming Contest 2008. In Dastani, M., Segrouchni, A.E.F., Ricci, A., Winikoff, M., eds.: ProMAS 2008 Post-Proceedings. Volume 5442/2009 of LNCS., Springer Berlin / Heidelberg (2009) 228–232

Using Jason, Moise+, and CArtAgO to Develop a Team of Cowboys

Jomi F. Hübner1, Rafael H. Bordini2, Gustavo Pacianotto Gouveia3, Ricardo Hahn Pereira3, Gauthier Picard4, Michele Piunti5, and Jaime S. Sichman3

1 Federal University of Santa Catarina, Brazil, [email protected]
2 Federal University of Rio Grande do Sul, Brazil, [email protected]
3 University of São Paulo, Brazil, {jaime.sichman,ricardo.pereira1,gustavo.gouveia}@poli.usp.br
4 École des Mines de Saint-Étienne, France, [email protected]
5 Università di Bologna, Italy, [email protected]

1 Introduction

This paper gives an overview of a multi-agent system simulating a team of cowboys to compete in the Multi-Agent Programming Contest 2009. This edition of the contest uses a "Cows and Herders" scenario, similar to the 2008 contest but now extended with fences that require cooperation and coordination to be opened.
In the previous contests we tested and improved Jason and its integration with other tools, in particular the organisational platform provided by Moise+. Jason [2] is an interpreter for an agent-oriented programming language that extends AgentSpeak(L) [6]. The language is inspired by the BDI architecture [7], and is therefore based on notions such as beliefs, goals, plans, intentions, etc. Moise+ is an organisational framework [5] that includes: (i) a language used to program the organisation of the MAS with concepts such as groups, roles, missions, and global goals; and (ii) a platform that provides the necessary services for the agents to manage and operate within organisations.

The participation in the last contests has contributed to our experience both in programming agents with Jason and in using BDI concepts. In the 2006 contest, the focus was on creating agent plans [1], which resulted in rather reactive agents. In the 2007 contest, the focus was on (declarative) goals [3], leading to more pro-active, goal-directed agents. In the 2008 contest, the focus was on the definition of the organisation of the MAS, leading to more socially aware agents [4]; instead of communication only (as in previous years), roles, groups, and common goals were also considered in the last edition of the multi-agent programming contest. This year, we were motivated to continue to improve and evaluate the integration of Jason with other technologies. Besides agents and organisation, we had hoped to also use artifacts that could help the agents in shared tasks [9]. Artifacts provide mechanisms to externalise functions that are currently implemented as internal actions in Jason. The system would therefore be developed in three dimensions: agents (using declarative goals), organisation (using groups, roles, and shared goals), and artifacts (using external, coordinating operations). Our objective in participating in this contest was originally twofold: (i) to continue to test and improve Jason and its integration with other tools (Moise+ and CArtAgO); and (ii) to evaluate the use of artifacts in the development of the team. Due to lack of time, we had to drop the use of artifacts in the implemented team for this edition of the agent contest, and have left it for future work (hopefully for the next edition).

Fig. 1. The Structural Specification of the Organisation (Moise+ notation: a team group with exploration and herding subgroups; roles explorer, scouter, herder, herdboy, gatekeeper1, gatekeeper2, leader and cowboy, connected by intra-group, inter-group, acquaintance, communication, authority and compatibility links).

2 System Analysis and Design

From the description of the scenario, the importance of the cowboys working as a coordinated team is clear. It would be very difficult for a cowboy alone to herd a group of cows. As in the previous edition, we adopted a strategy strongly tied to the notion of groups of agents, where issues such as spatial formation, membership, and coordination are emphasised. The overall analysis of the team is the same as used in the previous contest, since the scenario is very similar; we refer the reader to [4], as in the space available we can only discuss the main additions to the team developed for the last edition of the agent contest. The organisational structure of the team is specified in Fig. 1 using the Moise+ notation. Compared to the previous edition, the structural specification now has two new roles, called gatekeeper1 and gatekeeper2.
This scenario requires two agents to cooperate to open a fence to allow their team members and cows to pass, and they also need to coordinate their actions, as discussed below. The two new roles were created to handle the new feature of the scenario for this edition of the competition: fences that agents and cows need to pass through. They are the key roles in the new Moise+ scheme called Pass-Fence (see Figure 2), which is used when a group of agents needs to pass through a gate with a closed fence.

Fig. 2. The Functional Specification for the Pass-Fence Scheme (the goal pass_fence(X,Y) is decomposed into keep_switch1(X,Y), keep_switch2(X,Y), goto_switch1(X,Y), goto_switch2(X,Y), cross_gate, wait_gatekeeper2 and wait_for_others_to_pass(L), distributed over the missions first_switch and second_switch).

When an exploring or herding group perceives a fence in its chosen path, the agents playing these two special roles within the group know the goals they have to achieve to ensure the group passes safely through the gate. The agent playing the gatekeeper1 role is sent to the position where the first switch (the one on the side where the agents currently are) can be activated. This allows the agent playing the gatekeeper2 role to go through the gate and position itself where the second switch can be activated (i.e., on the other side of the fence). When all agents of the group have passed through the gate, the scheme is finished. Table 1 briefly presents the goals that agents are obliged to achieve when playing one of the new roles (remember that we are not presenting here the part of our solution that was already described in [4]); the goals are part of the first_switch and second_switch missions as shown in Figure 2.

Table 1. The New Organisational Goals of the Team.

Role gatekeeper1:
– goto_switch1(X,Y): position itself where the switch can be activated
– wait_gatekeeper2: keep on activating the first switch until the other gatekeeper has reached its destination
– pass_fence: once the second gatekeeper is at its position, this agent can go and join the rest of its group

Role gatekeeper2:
– goto_switch2(X,Y): position itself where the switch at the other side of the fence can be activated
– wait_for_others_to_pass: this agent is the one that needs to wait until all team members, in any groups, who wanted to pass that fence at that time, have done so

There is a further complication with the fences in this scenario, namely when two groups of the team need to cross the same gate. To handle this, before creating an instance of the pass_fence scheme, the second gatekeeper always checks with all team members (through communication) whether another group already has an active instance of such a scheme; if so, instead of creating another instance, the second gatekeeper contacts its counterpart in the group that is already in the process of passing that gate. The currently acting gatekeeper then waits for all agents in both groups to pass through the gate and only then terminates the scheme. When the scheme terminates, the acting second gatekeeper joins its group again, which goes on to resume whatever it was doing (either exploring or herding). Although we have some global constraints over the agents' behaviour (based on the roles they are playing), they are autonomous in deciding how to achieve the goals assigned to them.
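The coordination between the two gatekeepers can be pictured as a small state machine. The following Java sketch is only an illustration under assumed names; in the actual system the scheme is executed by the Moise+ platform rather than by hand-coded logic:

/**
 * Rough sketch of the Pass-Fence coordination described above. The phases
 * mirror the goals of Figure 2; class and method names are invented.
 */
public class PassFenceScheme {

    enum Phase { GOTO_SWITCH1, GOTO_SWITCH2, CROSSING, DONE }

    private Phase phase = Phase.GOTO_SWITCH1;

    /** Advance the scheme one step; returns the current phase. */
    Phase step(boolean keeper1AtSwitch, boolean keeper2AtSwitch,
               int membersStillToCross) {
        switch (phase) {
            case GOTO_SWITCH1:
                // gatekeeper1 holds the near-side switch open ...
                if (keeper1AtSwitch) phase = Phase.GOTO_SWITCH2;
                break;
            case GOTO_SWITCH2:
                // ... so gatekeeper2 can cross and man the far-side switch.
                if (keeper2AtSwitch) phase = Phase.CROSSING;
                break;
            case CROSSING:
                // gatekeeper1 rejoins its group; gatekeeper2 keeps the fence
                // open until every member (of any group) has passed.
                if (membersStillToCross == 0) phase = Phase.DONE;
                break;
            case DONE:
                break;
        }
        return phase;
    }
}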
While coordination and teamwork are managed by the Moise+ tools, autonomy and pro-activeness are facilitated by the BDI architecture of our agents implemented in Jason. Regarding communication (required, for example, for the share_seen_cows goal), we use the speech-act based communication available in Jason.

3 Software Architecture

We initially planned to use artifacts to encapsulate two functions in our solution: integration with the contest simulator and maintenance of a shared view of the scenario. These were the two artifacts we wanted to implement. The first artifact would replace the customised Jason agent architecture we used to interface our agents with the simulator. In that new version, each agent would have access to the InterfaceArtifact, which would provide, as observable properties, the current perceptions provided by the simulator to the agent, and, as an operation, the capability to send the actions of the agent back to the contest simulator. This artifact would also be responsible for encapsulating all network issues, like reconnection, login, failure handling, etc. The second artifact would be the PathArtifact. The motivation for this artifact is to have a shared representation of the scenario instead of each agent having its own representation. In the previous contest edition, we used broadcast messages: for each seen cow/obstacle, a message is broadcast so that all other team members can update their representation. This is a quite expensive solution in terms of communication. With this new artifact, for each seen cow/obstacle, an operation is triggered in the PathArtifact to update the scenario state. This operation may be triggered either by the agents or directly by the InterfaceArtifact. Other useful operations such as 'find path' would be implemented in this artifact so that the agents do not need to keep an internal representation of the world. The implementation and deployment of the artifacts was to be done with the CArtAgO platform [8].

4 Agent Team Strategy

1. Navigation algorithms. As in previous teams, we use the A* algorithm to find paths and avoid obstacles.
2. Describe the team coordination strategy (if any). The coordination is based on shared global goals and global plans as defined in Moise+.
3. Does your team strategy use some distributed optimisation technique w.r.t., e.g., minimising distances walked by the agents? In general, no, but in future work negotiation techniques might be used to find good global solutions. At the individual level, A* finds optimal paths.
4. Describe and discuss the information exchanged (and shared) in the agent team. The more information (especially about obstacles and fences) that is available to A*, the better it performs. So when an agent perceives an obstacle or a fence, it communicates that information to all team members.
5. Describe the communication strategy in the agent team. Can you estimate the communication complexity in your approach? We have not yet formally defined the communication protocols.
6. Did your system do some background processing? By background processing we understand some computation which happened while agents of the team were idle. No.
7. Possibly discuss additional technical details of your system such as failure/crash recovery and alike. We associate an "angel" with each agent; the angel checks if the agent is blocked or has crashed and then tries to solve the problem automatically (a sketch of such a watchdog follows below).
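A minimal sketch of such an "angel" watchdog; all names and the patience threshold are assumptions made for illustration:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a per-agent "angel": it tracks the last simulation step at
 * which each agent acted and restarts agents that appear blocked.
 */
public class Angel {

    private final Map<String, Integer> lastActionStep = new ConcurrentHashMap<>();
    private static final int BLOCKED_THRESHOLD = 5; // assumed patience, in steps

    /** Called whenever an agent successfully submits an action. */
    public void reportAction(String agentName, int step) {
        lastActionStep.put(agentName, step);
    }

    /** Called once per step; restarts agents that have been silent too long. */
    public void check(int currentStep) {
        for (Map.Entry<String, Integer> e : lastActionStep.entrySet()) {
            if (currentStep - e.getValue() > BLOCKED_THRESHOLD) {
                restart(e.getKey());
                lastActionStep.put(e.getKey(), currentStep); // avoid restart loops
            }
        }
    }

    private void restart(String agentName) {
        // Placeholder: the real system would respawn the Jason agent here.
        System.out.println("Angel restarting blocked agent " + agentName);
    }
}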
5 Conclusion

Due to lack of time, we have not been able to implement the planned integration with CArtAgO. This would have made some parts of the implementation of our team (e.g., the sharing of spatial information) more elegant. The added feature of "fences" in the latest scenario of the agent competition led to significant extra complexity in the scenario. However, our final solution remains compact and elegant, because the high-level code at the organisational and agent levels remains essentially the same, with the addition of only two extra roles and five new goals that agents playing those roles are required to achieve. It remains future work to implement the use of artifacts and make a thorough evaluation of the overall approach combining three of the most prominent agent development techniques.

References

1. R. H. Bordini, J. F. Hübner, and D. M. Tralamazza. Using Jason to implement a team of gold miners. In CLIMA VII, volume 4371 of LNCS, pages 304–313. Springer, 2007.
2. R. H. Bordini, J. F. Hübner, and M. Wooldridge. Programming Multi-Agent Systems in AgentSpeak using Jason. John Wiley & Sons, 2007.
3. J. F. Hübner and R. H. Bordini. Developing a team of gold miners using Jason. In ProMAS 07, volume 4908 of LNCS, pages 241–245. Springer, 2008.
4. J. F. Hübner, R. H. Bordini, and G. Picard. Using Jason and MOISE+ to develop a team of cowboys. In ProMAS 08, volume 5442 of LNAI, pages 238–242. Springer, 2009.
5. J. F. Hübner, J. S. Sichman, and O. Boissier. Developing organised multi-agent systems using the MOISE+ model: Programming issues at the system and agent levels. International Journal of Agent-Oriented Software Engineering, 1(3/4):370–395, 2007.
6. A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. In MAAMAW'96, number 1038 in LNAI, pages 42–55. Springer, 1996.
7. A. S. Rao and M. P. Georgeff. BDI agents: from theory to practice. In ICMAS'95, pages 312–319. AAAI Press, 1995.
8. A. Ricci, M. Viroli, and A. Omicini. CArtAgO: A framework for prototyping artifact-based environments in MAS. In E4MAS 2006, volume 4389 of LNAI, pages 67–86. Springer, 2007.
9. M. Viroli, T. Holvoet, A. Ricci, K. Schelfthout, and F. Zambonelli. Infrastructures for the environment of multiagent systems. Autonomous Agents and Multi-Agent Systems, 14(1):49–60, July 2007.

AF-ABLE: ProMAS System Description

Howell R. Jordan, Jennifer Treanor, David Lillis, Mauro Dragone, Rem W. Collier, and G. M. P. O'Hare
School of Computer Science and Informatics
University College Dublin
[email protected], {jennifer.treanor, david.lillis, mauro.dragone, rem.collier, gregory.ohare}@ucd.ie

Abstract. This paper describes our entry to the Multi-Agent Programming Contest 2009. Based on last year's entry, we modified our methodology, incorporated new features of the employed agent programming language, and adopted a simplified hierarchical organisation metaphor. This approach, together with a re-design of the task allocation algorithm, should result in increased efficiency and effectiveness.

1 Introduction

Based on our entry in the ProMAS Agent Programming Contest 2008 [5], this paper discusses our submission for 2009 and the alterations to the previous entry. Last year's entry was specified and designed with the SADAAM methodology [1], with a hybrid architecture based on SoSAA [4] and using the Agent Factory Agent Programming Language (AFAPL) [2]. This year's entry uses new features of AFAPL [3] to better organise the herding agents.
We also make use of a modified methodology, which is described in Section ??. With this methodology we use a modified form of the Agent Factory framework [2]. We once again use a hybrid architecture based on the SoSAA architecture [4].

2 Software Architecture

Our two-tiered architecture is based on the SoSAA robotic framework [4]. The upper layer is an intentional multi-agent system. The second layer is a low-level component-based infrastructure. This combination of layers allows for intentional reasoning in the upper layer, along with support for multi-agent organization, separate from the lower-level actuation-based functionality. Our system uses AFAPL, taking advantage of new features introduced since last year's contest. The basic features of AFAPL still include a model of beliefs, the notion of commitment to a course of action, a set of commitment rules, a simple language for specifying plans, and support for specifying ontologies. Additionally, the language now also includes additional practical plan operators, a role programming construct and the notion of a goal.

Fig. 2. Example of RazorEdge scenario

For this year's entry, we therefore opted to eschew the auctions and decided on a commander agent in charge of several herder agents. The commander agent is informed of the herder agents' map-related beliefs and their current commitments. The commander agent builds up an overall view of the system and the environment. Based on the current commitments, it decides on a course of action and provides the herder agents with new or amended beliefs and commitments. Our task allocation algorithm will be based upon several factors. Depending on the role, different considerations are taken into account for a cost/benefit calculation. Herding requires considering such factors as: the number of cows in a herd (i.e. the reward); the distance from the corral (i.e. the time cost); the number of herders used and available; the distance of herders to the herds; and the proximity of known opponents. Exploring also takes the distance to the corral into account. The location and extent of, and the path to, unexplored spaces on the map are of interest. General weighting factors to be considered regardless of the role are the time left until the end of the game and the number of known cows left on the field. No offensive or defensive moves in relation to our opponents were made last year. After observing some successful competitive tactics, we intend to explore the possibility of behaviours to protect the herded cattle and to provide more of an adversarial environment for our opponents.

4 Discussion

As noted in the report on last year's entry [5], our performance suffered due to runtime exceptions. Our efforts this year are centred on improving the quality of our agent-layer code by using better modelling techniques and a simplified architecture, and by taking advantage of advanced language features. Figure 3 depicts a simplified class diagram showing all the dependencies between the most important classes implemented for this year's entry.

Fig. 3. Class diagram

In contrast to the previous version, the new agents share direct access to a global WorldModel object that is updated with any new agent perceptions. The TeamLeader agent periodically examines the WorldModel class to find the best assignments for each agent in the team. First of all, the TaskEvaluator class runs a clustering algorithm on the cows registered in the WorldModel in order to group them into herds to be herded to our corral.
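The paper does not detail the clustering algorithm; the sketch below shows one simple possibility, a greedy single-link grouping of cows that lie within a fixed distance of each other (the threshold and all names are assumptions):

import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a simple cow-clustering pass such as the one the TaskEvaluator
 * might run: cows near an existing herd are merged into it, otherwise they
 * start a new herd. A single greedy pass may miss late merges; a real
 * implementation would iterate or use union-find.
 */
public class CowClustering {

    public record Cow(int x, int y) {}

    static final int NEIGHBOUR_DISTANCE = 3; // assumed adjacency radius

    /** Greedy single-link clustering over the known cow positions. */
    public static List<List<Cow>> cluster(List<Cow> cows) {
        List<List<Cow>> herds = new ArrayList<>();
        for (Cow cow : cows) {
            List<Cow> home = null;
            for (List<Cow> herd : herds) {
                if (nearAny(cow, herd)) { home = herd; break; }
            }
            if (home == null) {
                home = new ArrayList<>();
                herds.add(home);
            }
            home.add(cow);
        }
        return herds;
    }

    private static boolean nearAny(Cow cow, List<Cow> herd) {
        for (Cow other : herd) {
            if (Math.abs(cow.x() - other.x()) <= NEIGHBOUR_DISTANCE
                    && Math.abs(cow.y() - other.y()) <= NEIGHBOUR_DISTANCE) {
                return true;
            }
        }
        return false;
    }
}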
In addition to herding tasks, the TaskEvaluator generates exploring tasks, and open-fence sub-tasks whenever a primary task requires agents to pass through a fence. For each task, a cost-value analysis is carried out by estimating the number of iterations needed to execute the task. Secondly, the AbstractSolver examines each task and tries to assign it to the most suitable team, removing the selected agents from the pool of available agents each time. By altering the criteria we use to order the tasks, we can realise different strategies, from greedy sequential auctioning to more sophisticated assignment schemes.

As an example of the progression of the code, we now compare two functionally similar portions of code from the two submissions. Each represents a herder agent that can be told (using a 'fipaMessage') to explore an area of the board surrounding a pair of x, y co-ordinates. The behaviours 'MoveToViaShortestPath' and 'Explore' are implemented in the underlying Java layer. On receipt of a 'fipaMessage', the first excerpt simply installs the given task as a belief. When the given task is an 'Explore' task, the agent moves to the desired location. Once the agent arrives at its destination, the Java layer installs a 'closeTarget' belief, and this, in turn, triggers the 'Explore' behaviour.

BELIEF(fipaMessage(request, sender(?name, ?addr), doTask(?task, ?params)))
  => BELIEF(chosenTask(?task, ?params));

BELIEF(chosenTask(Explore, params(?x, ?y)))
  => COMMIT(?self, ?now, BELIEF(true),
       activateBehaviour(MoveToViaShortestPath(x, ?x, y, ?y, tolerance, 5)));

BELIEF(closeTarget) &
BELIEF(chosenTask(Explore, params(?x, ?y))) &
BELIEF(active_behaviour(MoveToViaShortestPath))
  => COMMIT(?self, ?now, BELIEF(true), activateBehaviour(Explore));

On receipt of a similar 'fipaMessage', the second excerpt adopts the given task as a goal. The Agent Factory interpreter selects a compatible plan with a postcondition that results in the goal being achieved; our 'Explore' plan is the only plan in the agent's plan library and is therefore chosen. The 'Explore' plan then performs its three component actions (move to the location, explore it, and believe that the task is complete) in parallel.

BELIEF(fipaMessage(request, sender(?name, ?addr), doTask(?task, ?params)))
  => COMMIT(?self, ?now, BELIEF(true),
       ADOPT(GOAL(completedTask(?task, ?params))));

PLAN explore(?x, ?y) {
  PRECONDITION BELIEF(true);
  POSTCONDITION BELIEF(completedTask(Explore, params(?x, ?y)));
  BODY
    PAR (
      activateBehaviour(MoveToViaShortestPath(x, ?x, y, ?y, tolerance, 5)),
      DO_WHEN(BELIEF(closeTarget),
              activateBehaviour(Explore)),
      DO_WHEN(BELIEF(completed(Explore)),
              ADOPT(BELIEF(completedTask(Explore, params(?x, ?y))))));
}

It can be seen that the second excerpt is more cohesive, in that all of the exploring-related agent code is contained in a single plan. We hope that agent code written in the new style will prove a lot easier to test and debug.

5 Conclusion

This paper presents an overview of our submission to the ProMAS Multi-Agent Programming Contest 2009. We aim to improve on last year's result by implementing changes to last year's system, including the utilisation of new features of AFAPL2.

References

1. Neil Clynch and Rem W. Collier. SADAAM: Software Agent Development An Agile Methodology. In Proceedings of the Workshop on Languages, Methodologies and Development Tools for Multi-Agent Systems (LADS'07), 2007.
2. Rem W. Collier. Agent Factory: A Framework for the Engineering of Agent-Oriented Applications.
3. Rem W. Collier. AFAPL2: Development Kit. Online, 2008. http://www.agentfactory.com/index.php, accessed 28th April 2009.
4. Mauro Dragone, David Lillis, Rem W. Collier, and G.M.P. O'Hare. SoSAA: A Framework for Integrating Components and Agents. In Proceedings of the ACM Symposium on Applied Computing (SAC '09), 2009.
5. Mauro Dragone, David Lillis, Conor Muldoon, Richard Tynan, Rem W. Collier, and G.M.P. O'Hare. Dublin Bogtrotters: Agent Herders. In Post-Proceedings of the Sixth International Workshop on Programming Multi-Agent Systems (ProMAS), 2008.

Cows and Fences: JIAC V - AC'09 Team Description

Axel Hessler, Tobias Küster, Oliver Niemann, Aldin Sljivar, and Amir Matallaoui
DAI-Labor, Technische Universität Berlin, Germany

Abstract. The agent contest of 2009 has significantly increased the complexity of last year's scenario. In this paper we present our approach to tackling this challenge. Based on last year's work and methodology, we introduce some refined collaboration strategies. While last year's scenario gave rise to discussion of some destructive strategies, we think that such strategies play an important role this year, and we therefore try to address them. Again, this year's contest is appreciated not only as a testing ground and for evaluation purposes, but also as a contest of applied game strategies.

1 Introduction

As in last year's edition of the contest, the JIAC V agent framework [1] will be used for implementing the multi-agent system. The framework is the successor of the time-honored JIAC IV [2], which was created, along with an accompanying toolkit, in the course of a series of projects at DAI-Labor. Compared to AC'08, few things have changed; still, we believe that these small changes will demand a lot more from the teams with respect to inter-agent communication and coordination. This year, the JIAC V agent team has been prepared by the students of a university course at Technische Universität Berlin, supervised by members of the Competence Center Agent Core Technologies of DAI-Labor, TU Berlin. From this we got some fresh ideas, and we have also gained some more insight into how well our agent framework can be used by developers unfamiliar with it.

2 System Analysis and Design

Intuitively, all students took a role-based approach to analysis and design, a role meaning the aggregation of functionality and interactions regarding a certain aspect of the domain. In the following, we name and explain the roles regardless of whether they were implemented in the end. Obviously, the agents need to explore their environment in order to get to know it: find cows, find all obstacles in order to calculate the best way to our own corral, and find the opponent's corral. This is what is subsumed in the Explorer role. We then need to drive one or more cows to the corral: the Herder role. We also assume that cows may escape from the corral when someone uses the fence switch, so we presumably need a Keeper role. The additional feature of this year's scenario, the switch, leads to another role, the ButtonPusher, although we expect that this is not a full-time job. We also identified implicit roles of which all agents must be capable: connecting to the server, and receiving perceptions from and sending actions to the server, the ServerConnector role. An agent must also be capable of parsing the server messages and updating its world model, the Perception role. The Perception role also notices whether actions have failed.
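As a schematic illustration of what the ServerConnector and Perception roles amount to, consider the following Java sketch. This is not the JIAC V API: the line-based percept format, the WorldModel class and the PerceptionBean class are all invented here for illustration.

import java.util.HashMap;
import java.util.Map;

// Sketch of a perception component: it turns raw server messages into
// world-model updates and records whether the last action failed.
public class PerceptionBean {

    // Minimal world model: what the agent believes occupies each cell.
    public static class WorldModel {
        private final Map<String, String> cells = new HashMap<>();
        private boolean lastActionFailed;

        public void setCell(int x, int y, String content) {
            cells.put(x + "," + y, content);
        }
        public String cellAt(int x, int y) {
            return cells.getOrDefault(x + "," + y, "unknown");
        }
        public void setLastActionFailed(boolean failed) {
            lastActionFailed = failed;
        }
        public boolean lastActionFailed() {
            return lastActionFailed;
        }
    }

    private final WorldModel model;

    public PerceptionBean(WorldModel model) {
        this.model = model;
    }

    // Parses a hypothetical percept message such as "cell 3 4 cow" or
    // "lastAction failed", one item per line, and updates the model.
    public void onServerMessage(String message) {
        for (String line : message.split("\n")) {
            String[] tokens = line.trim().split("\\s+");
            switch (tokens[0]) {
                case "cell" -> model.setCell(Integer.parseInt(tokens[1]),
                                             Integer.parseInt(tokens[2]),
                                             tokens[3]);
                case "lastAction" -> model.setLastActionFailed(
                        "failed".equals(tokens[1]));
                default -> { /* ignore unknown lines in this sketch */ }
            }
        }
    }
}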
Furthermore, each agent should be capable of talking to all other agents of its own team to share its perceptions and intentions: the TeamCommunicator role. A third group of identified roles concerns the opponent. Opponent behaviour is analysed in the OpponentAnalyser role; based on this analysis, the agents can then interfere with the opponent agents' actions, and so the TroubleMaker role is born. If the situation allows, our own agents may try to steal cows from the other corral; these capabilities and interactions are concentrated in the Thief role. We must also take into account that the opponent has the same skill, so we have to extend the Keeper role with the ability to prevent opponent agents from stealing our cows. When discussing the roles, there was no consensus on the Thief and TroubleMaker roles: some argued that these extra roles are not worthwhile, because instead of making trouble and stealing, those agents could be driving cows to our own corral. Last but not least, we identified the need to analyse the behaviour and performance of our own team: the TeamAnalyser role.

3 Software Architecture

Our contribution is realised using the JIAC V agent framework, which we are currently developing and extending. It is aimed at the easy and efficient development of large-scale, high-performance multi-agent systems. It provides a scalable single-agent model and is built on state-of-the-art standard technologies. The main focus rests on usability, meaning that a developer can use the framework easily and is supported by the right set of tools for whatever she is doing. JIAC V is implemented in the Java programming language. The aforementioned roles are implemented with components (agentbeans), which are the behavioural structures of the agent: they access and modify the agent's state, generate knowledge and trigger actions. We also use two sensor/actuator components. One, the standard communication component of the framework, is used for the information exchange between our agents. The other gathers the perception messages from the competition server and delegates the action messages to it. According to the rules of the contest, we have ten agents forming the team, each acting autonomously on the environment and each having the same set of roles. Figure 1 shows the principal structure of each contest agent, drawn in the JIAC V Agent World Editor.

Fig. 1. Design of the competition agents using the JIAC V Agent World Editor (AWE)

4 Agent Team Strategy

Due to the similarity of the scenarios, large parts of last year's strategy can be adopted. Firstly, every agent builds its own world model from what it is told by the server and its team mates. Every agent also plans for itself, taking the intentions of its team mates into account. Further, the agents share both their perceptions and their intentions, preventing redundant actions and allowing an agent to quickly re-enter the game should it need to be restarted. Just like last year, the agents navigate using the A* algorithm, as sketched below, both for calculating their own paths and for calculating the path a cow should take, together with where an agent should position itself to drive the cow in that direction. Last year, our agents were very proficient at driving single cows and smaller flocks of cows, but we found it impossible to separate cows from a flock with a certain critical mass and shape.
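For reference, below is a minimal Java sketch of grid-based A* with 8-directional movement and a Chebyshev heuristic, which is what a grid world of this kind suggests. It is an illustration under stated assumptions, not the team's implementation: their cost model, tie-breaking and handling of moving obstacles are not described in the paper.

import java.util.*;

// Minimal A* on a boolean obstacle grid. Uniform step cost, diagonal
// moves allowed, Chebyshev distance as an admissible heuristic.
public class GridAStar {

    public record Node(int x, int y) {}

    // Returns a start-to-goal path, or an empty list if none exists.
    // blocked[y][x] marks impassable cells (trees, fences, agents).
    public static List<Node> findPath(boolean[][] blocked, Node start, Node goal) {
        int height = blocked.length, width = blocked[0].length;
        Map<Node, Node> cameFrom = new HashMap<>();
        Map<Node, Integer> g = new HashMap<>();
        g.put(start, 0);
        PriorityQueue<Node> open = new PriorityQueue<>(
                Comparator.comparingInt((Node n) -> g.get(n) + chebyshev(n, goal)));
        open.add(start);
        Set<Node> closed = new HashSet<>();

        while (!open.isEmpty()) {
            Node current = open.poll();
            if (current.equals(goal)) return reconstruct(cameFrom, current);
            if (!closed.add(current)) continue;   // skip stale queue entries
            for (int dx = -1; dx <= 1; dx++) {
                for (int dy = -1; dy <= 1; dy++) {
                    if (dx == 0 && dy == 0) continue;
                    int nx = current.x() + dx, ny = current.y() + dy;
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height
                            || blocked[ny][nx]) continue;
                    Node next = new Node(nx, ny);
                    int tentative = g.get(current) + 1;
                    if (tentative < g.getOrDefault(next, Integer.MAX_VALUE)) {
                        g.put(next, tentative);
                        cameFrom.put(next, current);
                        open.remove(next);        // re-queue with updated cost
                        open.add(next);
                    }
                }
            }
        }
        return List.of();                         // goal unreachable
    }

    private static int chebyshev(Node a, Node b) {
        return Math.max(Math.abs(a.x() - b.x()), Math.abs(a.y() - b.y()));
    }

    private static List<Node> reconstruct(Map<Node, Node> cameFrom, Node end) {
        List<Node> path = new ArrayList<>();
        for (Node n = end; n != null; n = cameFrom.get(n)) path.add(n);
        Collections.reverse(path);
        return path;
    }
}

The same routine serves both uses mentioned above: planning an agent's own path, and planning the path a cow should take so that the agent can position itself on the far side of the cow to push it along.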
This year, due to the changed cow algorithm, single cows are harder to drive alone, while it is easier to drive smaller flocks and even possible to drive huge herds. Last year we saw emergent behaviour while driving cows in a team of agents. This was a special feature of our cow-driving algorithm [3], and had not been implemented explicitly. Although we wanted to implement explicit team strategies, we observed, to our astonishment, that our cow-driving algorithm also adapts to the new rule that cows do not disappear in the corral, so an explicit team strategy was not necessary. Even better, when cows try to escape from the corral, or other agents try to drive them out, two or three agents take the Keeper role on their own and drive them back. As we have not implemented such behaviour, this too can be considered emergent. Finally, it was not necessary to implement the Thief role: our agents do not distinguish between free cows and cows in the opponent's corral; they always drive any cows that are not in our own corral.

One feature of the scenario did make changes in the cowboy agents' behaviour necessary: fences. We solve the problem by introducing two new intentions, OpenFence and GoThroughFence. With these two intentions, our agents tell their team mates whether they are opening a fence for others to drive cows through, or whether they just want to pass through the fence themselves in order to explore the world and find new cow herds.

5 Discussion

Again, this year's contest terms challenge the participating teams more than they did last year. But this was not the main reason why we wanted to take part in the contest. We use the contest as a test bed for the JIAC V agent framework, to show how fast and reliable distributed computing can be done with off-the-shelf agents, and to improve the reliability and scalability of the framework. As JIAC V is so new, we have to improve the way people can learn to develop agents with the framework, through documentation, tutorials and small examples that show aspects of JIAC V agents and multi-agent systems in a nutshell. This conclusion comes from watching the students as they studied the cow-herding problem and tried to implement a solution on top of the JIAC V architecture. And not to forget the supporting tools: we are learning more and more which set of tools is essential to support agent-oriented software development. In this paper we have shown a screenshot of the JIAC V Agent World Editor (AWE), a successor of the Toolipse AgentRole Editor [4]. The new AWE allows a multi-agent system to be designed without switching views between the platform, agent and agentrole levels. Together with a code-generation component, an application starter and a source-code editor for the JADL++ agent programming language [5] on top of the Eclipse IDE [6], it forms the necessary toolkit for the agent developer.

6 Conclusion

The JIAC V team solved the problem of capturing as many cows as possible and keeping them in the corral. We used the contest to improve the reliability and scalability of our new agent framework and to test the new AWE tool. The greatest pleasure was, again, to see emergent team behaviour as agents drove cows, kept cows in the corrals and "stole" cows from the other team. It is still not entirely clear to us why sharing perceptions and intentions between agents is such a powerful concept. We also appreciate the higher scenario complexity.

References
1. Hirsch, B., Konnerth, T., Heßler, A.: Merging Agents and Services - the JIAC Agent Platform. In: Multi-Agent Programming: Languages, Tools and Applications. Springer (2009) 159–185
2. Hessler, A., Hirsch, B., Keiser, J.: JIAC IV in Multi-Agent Programming Contest 2007. In Dastani, M., Segrouchni, A.E.F., Ricci, A., Winikoff, M., eds.: ProMAS 2007 Post-Proceedings. Volume 4908 of LNAI, Springer Berlin/Heidelberg (2008) 262–266
3. Hessler, A., Keiser, J., Küster, T., Patzlaff, M., Thiele, A., Tuguldur, E.O.: Herding Agents - JIAC TNG in Multi-Agent Programming Contest 2008. In Dastani, M., Segrouchni, A.E.F., Ricci, A., Winikoff, M., eds.: ProMAS 2008 Post-Proceedings. Volume 5442 of LNCS, Springer Berlin/Heidelberg (2009) 228–232
4. Tuguldur, E.O., Heßler, A., Hirsch, B., Albayrak, S.: Toolipse: An IDE for Development of JIAC Applications. In: Proceedings of ProMAS 2008: Programming Multi-Agent Systems (2008)
5. Hirsch, B., Konnerth, T., Burkhardt, M.: The JADL++ Language - Semantics. Technical report, Technische Universität Berlin, DAI-Labor (2009)
6. The Eclipse Project. http://www.eclipse.org/

Author Index

Balthasar, Gregor 188
Baral, Chitta 20
Bordini, Rafael H. 203
Boss, Niklas Skamriis 193
Broda, Krysia 105
Bulling, Nils 2
Cao Son, Tran 121
Cliffe, Owen 87
Dastani, Mehdi 55
De Vos, Marina 87
Dennis, Louise 38
Farwer, Berndt 2
Fujita, Megumi 71
Gongora, Pedro Arturo 139
Hessler, Axel 213
Hindriks, Koen 156
Hopton, Luke 87
Hosobe, Hiroshi 105
Hubner, Jomi Fred 203
Jensen, Andreas Schmidt 193
Jonker, Catholijn 156
Keinänen, Helena 172
Keinänen, Misa 172
Lillis, David 208
Ma, Jiefei 105
Meyer, John-Jules 38
Nide, Naoyuki 71
Pacianotto, Gustavo Pacianotto 203
Padget, Julian 87
Pereira, Ricardo Hahn 203
Picard, Gauthier 203
Piunti, Michele 203
Pontelli, Enrico 20
Renz, Wolfgang 188
Rosenblueth, David A. 139
Russo, Alessandra 105
Sakama, Chiaki 121
Satoh, Ken 105
Sichman, Jaime 203
Son, Tran Cao 1, 20
Steunebrink, Bas 55
Sudeikat, Jan 188
Takata, Shiro 71
Tinnemeier, Nick 38
Tuguldur, Erdene-Ochir 198
Villadsen, Jørgen 193
Visser, Wietske 156