
Anomalies in ontologies with rules

2010, Web Semantics: Science, Services and Agents on the World Wide Web

For the development of practical semantic applications, ontologies are commonly used with rule extensions. Prominent examples of semantic applications are Semantic Wikis and Semantic Desktops, but also advanced Web Services and agents. The application of rules increases the expressiveness of the underlying knowledge in many ways. At the same time, the integration creates new challenges for the design process of such ontologies, and existing evaluation methods have to cope with the extension of ontologies by rules. Since the verification of OWL ontologies with rule extensions is not tractable in general, we propose to verify ontologies at the symbolic level by using a declarative approach: with the new language DATALOG⋆, known anomalies can be easily specified and tested in a compact manner. We introduce supplements to existing verification techniques to support the design of ontologies with rule enhancements, and we focus on the detection of anomalies that especially occur due to the combined use of rules and ontological definitions.

Joachim Baumeister and Dietmar Seipel
University of Würzburg, Institute of Computer Science, Am Hubland, 97074 Würzburg, Germany

Key words: evaluation, verification, ontology engineering, OWL, RIF-BLD, SWRL

1. Introduction

The use of ontologies has shown its benefits in many applications of intelligent systems in recent years. Recent examples are the development of Semantic Wikis, e.g., [7,28], and Semantic Desktops, e.g., [27]. Most prominently, the Semantic Web initiative [35] coordinates the specification and life cycle of ontology languages in the context of the Semantic Web [1]. The semantic web stack, e.g., see [16], describes the architecture of the Semantic Web at a technical level, including languages for ontologies and rules, but also key technologies such as Unicode and encryption.
Whereas the implementation of the lower parts of the semantic web stack has successfully led to standardization, the upper parts, especially rules and the logic framework, are still heavily discussed in the research community, see for example [16,19,26]. This insight has led to many proposals for rule languages compatible with the semantic web stack, e.g., the definition of RIF-BLD (Basic Logic Dialect of the Rule Interchange Format, [33]), SWRL (Semantic Web Rule Language), which originates from RuleML, and similar approaches [17]. It is generally agreed that the combination of ontologies with rule-based knowledge is essential for many interesting semantic web tasks such as the realization of semantic web agents and services. SWRL allows for the combination of a high-level abstract syntax for Horn-like rules with OWL, and a model-theoretic semantics is given for the combination of OWL with SWRL rules. The XML syntax was derived from RuleML. With RIF-BLD an analogous XML serialization of rules is in the process of standardization. RIF-BLD specifies an interchange format for rule languages and proposes an integration with RIF-BLD/OWL languages.

However, with the increased expressiveness of such ontologies new demands for development and maintenance guidelines arise. Thus, conventional approaches for evaluating and maintaining ontologies need to be extended and revised in the light of rules, and new measures need to be defined to cover the implied aspects of rules and their combination with conceptual knowledge in the ontology.

(Preprint submitted to Web Semantics: Science, Services and Agents on the WWW, 1 February 2010, doi:10.1016/j.websem.2009.12.003.)
Concerning the expressiveness of the ontology language, we focus on the basic elements of OWL DL, which should make the work transferable to ontology languages other than OWL, and we mostly describe methods for the syntactic analysis of the considered ontology. We also focus on the basic features of rule languages such as SWRL and RIF-BLD: they correspond to a rule language of Horn clauses with class or property descriptions as literals, with equality and a standard first-order semantics.

In detail, we investigate the implications and problems that emerge from rule definitions in combination with some of the following ontological descriptions: (i) class relations like subclass, complement, and disjointness, (ii) basic property characteristics like transitivity, symmetry, ranges and domains, and cardinality restrictions. We distinguish between the following categories of anomalies:
– Circularity in taxonomies and rule definitions.
– Redundancy due to duplicate or subsuming knowledge.
– Inconsistency because of contradicting definitions.
– Deficiency comprising subtle issues describing questionable ontology design.

Since we mainly describe syntactic checks of ontologies, the presented work is different from the evaluation of ontologies with respect to the intended semantic meaning: the OntoClean methodology [14] is an example for semantic checks of taxonomic decisions made in ontologies. We also do not consider common errors that are introduced due to an incorrect understanding of the logical implications of OWL descriptions, as described by Rector et al. [25].

1.1. Verification at the Symbolic Level

Due to the combination of OWL and rules, however, the general detection of all anomalies is an undecidable task. Whereas tractable reasoning can be provided for fragments of RIF-BLD or SWRL, such as ELP [19], the identification of redundant and deficient knowledge still requires syntactic methods that investigate the concepts and rules at the symbolic level. Here, the term verification denotes the syntactic analysis of ontologies at the symbolic level for detecting anomalies.

On the one hand, the discussed issues of the presented work originate from the evaluation of taxonomic structures in ontologies introduced by Gómez-Pérez [11]. On the other hand, in the context of rule ontologies, classical work on the verification of rule-based knowledge, see for instance Preece and Shinghal [22,23], has to be reconsidered. In that work, the verification of ontologies (mostly taxonomies) and rules (based on predicate logic), respectively, has been investigated separately. However, the combination of taxonomic and other ontological knowledge with a rule extension leads to new evaluation issues that can cause redundant or even inconsistent behavior. For example, a very obvious redundancy may be due to the coexistence of the taxonomic relation subClassOf(A, B) and the rule A ⇒ B. One contribution of our work is the extension of classic measures by novel anomalies that result from the combination of rule-based and ontological knowledge. Here, the concept of dependency graphs from deductive databases can be used [8].

Of course, the collection of possible anomalies presented in this paper may always be incomplete, and additional elements of the ontology language may also introduce new possibilities of occurring anomalies. For this reason, we propose the declarative specification of anomalies by DATALOG⋆, which allows for flexibly including new and application-relevant anomalies. Here, the axioms of the ontology and the given rules are mapped to corresponding DATALOG⋆ facts and rules, respectively. Thus, the anomaly predicates described in the remainder of the paper can be directly applied.

1.2. Integration of Verification Methods

In general, the verification of ontologies with rules should not be seen as an isolated task, but is understood as a subtask of the evaluation phase that is proposed in almost all methodologies for ontology development [12]. In the past, a variety of methodologies was introduced that structure the development and evolution process into distinct phases, for example the On-To-Knowledge methodology [32], METHONTOLOGY [12], and the extensive CommonKADS methodology [29]. Here, the presented verification methods can be integrated as a sub-task into the evaluation phase, and they are used after every significant modification of the working ontology.

1.3. Structure of the Paper

The paper is organized as follows: The next section gives basic definitions and describes the expressiveness of the underlying knowledge representation; in the context of this work a subset of OWL DL is used. Then, the four main classes of anomalies are discussed in more detail. In Section 3, we introduce anomalies concerning the circularity of definitions. Anomalies uncovering inconsistent knowledge are shown in Section 4. We deal with redundancy in Section 5 and describe deficient knowledge in Section 6. We present some technical details of the evaluation mechanism of DATALOG⋆ in Section 7. The paper is concluded with a discussion.

2. Expressiveness and Basic Notions

For the analysis of ontologies with rules we restrict the considered constructs to a subset of OWL DL; in fact, many anomalies can occur when using the simple profile OWL 2 EL [13]: we investigate the implications of rules that are mixed with subclass relations and/or the property characteristics transitivity, symmetry, cardinality, complement, and disjointness. For example, in a university domain there might exist classes like Person, Student, and Professor, that are connected by relations such as
– subClassOf(Student, Person),
– subClassOf(Professor, Person),
– disjointClasses(Student, Professor).
Figure 1 shows a graphical version of the class definitions.

Fig. 1. A simple ontology example with a disjoint relation. (Classes Person, Student, and Professor; Student and Professor are disjoint.)

Class Atoms and Property Atoms. Given a class C and a property P: when used in rules we call C(x) a class atom and P(x, y) a property atom. For the following, it will be useful to extend the relations on classes and properties to relations on class and property atoms. Given two atoms A, A′, we write ρ(A, A′), if both atoms have the same argument tuple, and their predicate symbols are linked by a relation ρ, i.e., if A and A′ both are
– class atoms, such that A = C(x), A′ = C′(x), and ρ(C, C′), or
– property atoms, such that A = P(x, y), A′ = P′(x, y), and ρ(P, P′).
E.g., the relation ρ can be subClassOf, disjointClasses, objectComplementOf, etc. From a relationship ρ(A, A′) it follows that A and A′ are of the same type (either class or property atoms).

2.1. Specification in DATALOG⋆

The detection of anomalies has been done using the PROLOG meta-interpreter DATALOG⋆, which we have implemented in SWI-PROLOG [36]. Due to their compactness and conciseness, we give the corresponding formal definitions in DATALOG⋆ for the anomalies, which are evaluated using a mixed bottom-up/top-down approach based on DATALOG and PROLOG concepts, respectively. An intuitive understanding of the presented, mixed rule sets is possible without fully understanding the new inference method. For the interested reader, we introduce technical details of the evaluation mechanism of DATALOG⋆ as well as some supporting predicates in Section 7.

Variables such as A, B, C, . . . , A', or Bi can be used for both class atoms and property atoms, whereas As, Bs, . . . , denote sets of class atoms and property atoms. The relationship subClassOf(A, A') describes that A is a subclass of A'. Rules B1 ∧ · · · ∧ Bn ⇒ A are represented as non-ground DATALOG⋆ facts rule(Bs=>A) (with variable symbols), where Bs = [B1, . . . , Bn] is the list of body atoms, A is the head atom, and => is a binary infix functor. Without loss of generality, we can assume that the rule heads are atomic, since rules with conjunctive rule heads can be split into several rules. In the bodies of DATALOG⋆ and PROLOG rules, conjunction (and) is denoted by ",", disjunction (or) is denoted by ";", and negation by "not".

2.1.1. Incompatible Classes: Disjointness and Complements

We use disjointClasses(C1, C2) to define the disjointness between two OWL classes. The construct objectComplementOf denotes the instances that do not belong to a specified class. The disjointness relation between a class C1 and a class C2 is equivalent to the relation subClassOf(C1, objectComplementOf(C2)). Two classes C1 and C2 are incompatible, if there exists a disjointness or a complement relationship between them. This is described by the following PROLOG predicate used in DATALOG⋆:

    incompatible(C1, C2) :-
        ( subClassOf(C1, objectComplementOf(C2))
        ; disjointClasses(C1, C2) ).

For incompatible classes C1 and C2 there cannot exist an instance x with C1(x) ∧ C2(x).

2.1.2. Taxonomic Relationships and Rules

An obvious equivalence exists between a transitive subclass relationship subClassOf(B, A) (where A and B are both class atoms or both property atoms with the same arguments) and the rule B ⇒ A with a single body atom B that has the same argument as A. Thus, we combine them into the single formalism derives in DATALOG⋆:

    derives(C1, C2) :-
        ( subClassOf(C1, C2)
        ; rule([B]=>A),
          B =.. [C1, X1], A =.. [C2, X2],
          var(X1), X1 == X2 ).

The PROLOG call "T =.. Xs" splits a term T into a list Xs = [F,X1,...,Xn] consisting of the functor F and the arguments X1,...,Xn. Above, the functors C1 and C2 of the class atoms B and A, respectively, are class names and both atoms have one argument. The call "var(X1), X1 == X2" tests whether these arguments X1 and X2 are bound to the same variable.

With the existence of equivalence definitions E1 ≡ E2 in an ontology language, e.g., the OWL definitions equivalentClasses and equivalentObjectProperties, we can further extend the definition of derives: an element E1 is derived by an element E2, if the elements are equivalent classes or properties. In DATALOG⋆, we extend derives with the following PROLOG rule:

    derives(E1, E2) :-
        ( equivalentClasses(E1, E2)
        ; equivalentObjectProperties(E1, E2) ).

Since such an equivalence is symmetrical, the predicate derives always creates cyclic derivations of equivalent elements with length 1. We compute the transitive closure tc_derives of derives using the following standard DATALOG⋆ scheme:

    tc_derives(E1, E2) :-
        derives(E1, E2).
    tc_derives(E1, E3) :-
        derives(E1, E2), tc_derives(E2, E3).

The following PROLOG predicates with calls to DATALOG⋆ facts generalize tc_derives and incompatible to atoms:

    tc_derives_atom(A1, A2) :-
        A1 =.. [P1|Xs], A2 =.. [P2|Xs],
        tc_derives(P1, P2).

    incompatible_atoms(A1, A2) :-
        A1 =.. [P1|Xs], A2 =.. [P2|Xs],
        incompatible(P1, P2).

    tc_incompatible_atoms(A1, A2) :-
        A1 =.. [P1|Xs], A2 =.. [P2|Xs],
        tc_derives(P1, P3), incompatible(P3, P4),
        tc_derives(P2, P4).

As described before, the binary built-in predicate =.. of PROLOG splits given atoms Ai into their predicate symbol Pi and their list Xs of arguments; using the same variable Xs for both atoms requires the argument lists to be identical. We cannot evaluate these rules using forward chaining, since =.. cannot be applied if Xs is an unknown list.

For two (class or property) atoms A and B we say that A implies B, if A = B or tc_derives_atom(A, B). The first of the following supporting PROLOG rules turns the transitive closure into a reflexive transitive closure; the second extends it to negated atoms, i.e., literals, where ~ denotes negation:

    implies(A, B) :-
        ( A = B
        ; tc_derives_atom(A, B) ).
    implies(~A, ~B) :-
        implies(B, A).

2.1.3. Remark on Examples

In the following we give examples for most of the described anomalies. Here, we use the benchmark university domain LUBM [15], because of its popularity and intuitive understanding. We use the prefixes a: and b: for classes and properties to indicate that these elements are contained in the ontologies a and b.

3. Circularity

Circular definitions in the ontology have a severe impact on the reasoning capabilities of the underlying knowledge. Here we distinguish circular definitions in the taxonomic structure of the ontology as described by Gómez-Pérez [11], circular dependencies in the rule base as considered, e.g., by Preece and Shinghal [22], but also circular dependencies that can occur due to the mixture of taxonomic and rule-based knowledge.

3.1. Circularity in Taxonomy and Rules

Circular definitions can occur in the taxonomy, in rules, and in property relations.

3.1.1. Exact Circularity in Taxonomy and Rules

The following DATALOG⋆ predicate finds pairs [E, F] of subsequent elements E = E_{i−1} and F = E_i from a cyclic chain

    E_1, E_2, . . . , E_n, E_{n+1} = E_1,

where E_{i−1} derives E_i, for 2 ≤ i ≤ n + 1, such that all elements E_i of the chain are either classes or properties.

    anomaly(exact_circularity, [E, F]) :-
        derives(E, F), E \= F,
        tc_derives(F, E).

Cycles with n = 1 commonly occur due to the inclusion of equivalent classes and properties in the predicate derives. For the subClassOf relation alone (included in derives), the described circular relationships are commonly detected by existing tools.

Example: Consider two ontologies a and b with classes
– subClassOf(a:Professor, a:Person) and
– subClassOf(b:Employee, b:Person).
Then, the following incorrect alignments create an undesired circularity in the taxonomy with n = 4:
– equivalentClasses(a:Professor, b:Person) and
– equivalentClasses(a:Person, b:Employee).
The example is depicted in Figure 2, where the incorrect alignment between the concepts of two ontologies a and b produces the circular dependence.

Fig. 2. An example of a circular alignment of concepts of two different ontologies due to the incorrect use of equivalence relations. (Ontology a: Person, Professor; ontology b: Person, Employee; the two ontologies are linked by two equivalence relations.)

3.1.2. Circularity between Rules and Taxonomy

A rule B1 ∧ · · · ∧ Bn ⇒ A, such that the head atom A implies some body atom B = Bi, leads to a circularity. The rule should be considered as a restricted subClassOf relation between A and B, which may result in the detection of a misapplied taxonomic definition between them. The circularity can be found by the following DATALOG⋆ predicate:

    anomaly(circularity, A-Bs) :-
        rule(Bs=>A), member(B, Bs),
        implies(A, B).

Example: Since subClassOf(Professor, Person), the following rule, which defines a specific restriction on instances of the classes Person and Professor, creates a partially cyclic definition:

    Person(X) ∧ Teacher(X,Y) ∧ University(Y) ⇒ Professor(X).

3.2. Circular Properties

Property descriptions can also be the source of circularity, when a chain of properties P_i connects a class C by a chain

    C = C_1 -(P_1)-> C_2 -(P_2)-> . . . -(P_{n−1})-> C_n = C

of classes C_i, with n ≥ 2, to itself; at least one of the properties should not be symmetric. We say that a property P connects two classes D and E, and denote this by D -(P)-> E, if there exists a property between two classes D′ and E′, such that D transitively derives or is equal to D′ and E′ transitively derives or is equal to E.

Often such a circularity leads to infinite models of the ontology. In pure description logic reasoners, various blocking methods [2,18] ensure termination of the proof procedure in case of existentially quantified cycles. However, the extension of ontologies by rules requires new methods, and decidability is not guaranteed in the general case. Typical sources of circularity are the incorrect use of inverse and symmetrical properties during the matching of two ontologies. In the general case, however, a cyclic property chain may sometimes be an intentional design decision in ontology modeling and should therefore not be treated as an anomaly.

Example: We consider two ontologies a and b with the following classes and alignments: equivalentClasses(a:Lecture, b:Course) and equivalentClasses(a:Professor, b:Professor). The following further properties are defined in the ontologies:
– lectures(a:Professor, a:Lecture) and
– teaches(b:Professor, b:Course).
If lectures and teaches are incorrectly aligned as inverse properties, then a property cycle is created.

We consider common property and range restrictions and further restrictions like the quantifiers someValuesFrom and allValuesFrom. Circular properties are detected in DATALOG⋆ as follows.

    anomaly(circular_property, C, Ps) :-
        tc_connected_classes(C, Ps, C),
        member(P, Ps),
        not(symmetricObjectProperty(P)).

The call anomaly(circular_property, C, Ps) computes classes C that are connected to themselves by a chain Ps of properties; the chain Ps is computed by using the DATALOG⋆ predicate tc_connected_classes, which will be given in Section 7. If at least one of the properties is not symmetric, then we have found a circular chain.

4. Inconsistency

Contradictory knowledge contained in ontological knowledge and rules often yields unintended and unexpected conclusions. In the past, possible inconsistencies were investigated separately for both taxonomic knowledge [11] and rule-based knowledge [22]. In the context of this paper, we focus on inconsistent knowledge that can be detected at the symbolic level. In the common case, the consistency of ontological knowledge with (general) rules cannot be derived in a tractable manner. Typical examples of inconsistencies are contradicting rule consequences for two rules with subsuming rule antecedents. For taxonomic knowledge, the partition error, which is given by a subclass of two or more classes that are contained in a disjoint partition (pairwise disjoint classes), is very common. In the following, we additionally discuss inconsistencies that may occur due to the combined use of rules and ontology definitions.

4.1. Partition Error in Taxonomy

The partition error [10] is commonly created due to the incorrect combination of disjoint and derives relations: There exists a partition error on the class level, when a class C is the subclass of two disjoint classes Ci, Cj. Similarly, a partition error on the instance level occurs, when an instance X was created from two disjoint classes.

Example: Consider the ontology a with a class Person having two disjoint subclasses Teacher and Student:
– subClassOf(a:Teacher, a:Person),
– subClassOf(a:Student, a:Person), and
– disjointClasses(a:Teacher, a:Student).
The alignment of the class b:TA (Teaching Assistant) of the ontology b as a subclass of both a:Teacher and a:Student would introduce a partition error, see for example Figure 3.

Fig. 3. An example of a partition error, where the concept TA (Teaching Assistant) inherits from the disjoint concepts Teacher and Student. (Ontology a: Person with the disjoint subclasses Teacher and Student; ontology b: TA.)

The following DATALOG⋆ predicate detects partition errors, where X is either a subclass or an instance of the disjoint classes C1 and C2. The PROLOG term X-[C1,C2] is used as a syntactic data structure for X and the group of the disjoint classes C1 and C2.

    anomaly(partition_error, X-[C1, C2]) :-
        incompatible(C1, C2),
        ( ( derives(X, C1), derives(X, C2) )
        ; ( classAssertion(C1, X), classAssertion(C2, X) ) ).

Since OWL 2, it is also possible to define disjointness between properties, asserting that a given collection of properties is pairwise exclusive. A partition error for properties can be defined analogously to the DATALOG⋆ predicate given above.
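The partition-error check of Section 4.1 can be tried out on the TA example of Figure 3. The following is a minimal sketch in plain SWI-Prolog rather than in the DATALOG⋆ meta-interpreter. It assumes a simplified derives/2 that only follows direct subClassOf facts (the rule-based and equivalence cases are left out), and the unprefixed class names teacher, student, person, and ta are illustrative stand-ins for a:Teacher, a:Student, a:Person, and b:TA.

```prolog
% Minimal sketch in plain SWI-Prolog (not the DATALOG* meta-interpreter).
% classAssertion/2 has no facts here, so it is declared dynamic
% to make calls to it fail quietly instead of raising an error.
:- dynamic subClassOf/2, disjointClasses/2, classAssertion/2.

% Facts for the example of Figure 3 (class names are illustrative).
subClassOf(teacher, person).
subClassOf(student, person).
subClassOf(ta, teacher).
subClassOf(ta, student).
disjointClasses(teacher, student).

% Incompatibility as defined in Section 2.1.1.
incompatible(C1, C2) :-
    (   subClassOf(C1, objectComplementOf(C2))
    ;   disjointClasses(C1, C2)
    ).

% Assumption: derives/2 is reduced to direct subclass links.
derives(C1, C2) :-
    subClassOf(C1, C2).

% Partition error on the class level or on the instance level.
anomaly(partition_error, X-[C1, C2]) :-
    incompatible(C1, C2),
    (   derives(X, C1), derives(X, C2)
    ;   classAssertion(C1, X), classAssertion(C2, X)
    ).
```

With these facts, the query `?- anomaly(partition_error, A).` answers `A = ta-[teacher, student]`. Because disjointClasses is stored in one direction only in this sketch, the symmetric pair [student, teacher] is not reported.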
4.2. Incompatible Rule Antecedent

A rule B1 ∧ · · · ∧ Bn ⇒ A has an incompatible antecedent, if there exists an incompatibility relationship between two body atoms Bi and Bj, e.g., a disjoint or complement relationship. Note that, according to our definitions, this means that Bi = Ci(x) and Bj = Cj(x) are class atoms with the same argument x, and that Ci and Cj are disjoint or complements.

Example: Consider the ontology a with two disjoint classes Teacher and Student, i.e., disjointClasses(a:Teacher, a:Student) is defined. The following alignment rule would introduce an incompatible rule antecedent:

    a:Student(X) ∧ a:Teacher(X) ⇒ b:TA(X)

The example is similar to the partition error shown in Figure 3, where the contradicting concept inherits from two disjoint concepts. In this example, the contradicting concept is derived by a rule having incompatible concepts in the antecedent.

The following DATALOG⋆ predicate detects incompatible rule antecedents.

    anomaly(incompatible_antecedent, Bs=>A) :-
        rule(Bs=>A),
        sub_sequence([Bi, Bj], Bs),
        tc_incompatible_atoms(Bi, Bj).

The basic PROLOG predicate sub_sequence selects a sub-sequence of (not necessarily consecutive) elements of a given list. Note that the call tc_incompatible_atoms instantiates the rule Bs=>A. An incompatible rule antecedent can also be considered to be a redundancy, since it is responsible for an unused rule that never fires. However, we classify this anomaly as an inconsistency, because the incompatible antecedent may very likely be the result of a defective alignment of classes.

4.3. Self-Contradicting Rule

An anomaly similar to the incompatible rule antecedent is described by the following: A rule is called self-contradicting, if there exists an incompatibility relationship between the head atom A and at least one body atom B, i.e., A and B are disjoint or complements. Note that, according to our definitions, this means that A = C(x) and B = D(x) are class atoms with the same argument x, and that C and D are disjoint or complements.

Example: For two ontologies, the relationship disjointClasses(a:Teacher, b:Student) can be responsible for creating a self-contradicting rule, for example

    b:Student(X) ∧ b:Lecture(Y) ∧ b:teaches(X,Y) ⇒ a:Teacher(X).

The following DATALOG⋆ predicate derives instances of rules that are self-contradicting:

    anomaly(self_contradicting_rule, Bs=>A) :-
        rule(Bs=>A), member(B, Bs),
        tc_incompatible_atoms(A, B).

If a self-contradicting rule is activated, then the derived consequent contradicts its antecedent.

4.4. Contradicting Rules

Consider two instances r and r′ of rules, such that for every body atom B of r there exists a body atom B′ of r′, such that B′ implies B. The rules r and r′ are contradicting, if their head atoms A and A′ are contradicting. If r′ fires, then the more general rule r also fires, and the two rules derive contradicting conclusions.

Example: For two ontologies a and b, the incorrect equivalence relationship between the properties a:lectures and b:inLecture will cause the following rules to be contradicting:
– a:Per(X) ∧ a:Lec(Y) ∧ a:lectures(X,Y) ⇒ a:Teacher(X),
– b:Per(X) ∧ b:Lec(Y) ∧ b:inLecture(X,Y) ⇒ b:Student(X),
where Person and Lecture are abbreviated by Per and Lec, respectively, with the relationship disjointClasses(a:Teacher, b:Student) and the equivalence relationships between the corresponding classes Person and Lecture in the ontologies a and b. The example is depicted in Figure 4.

Fig. 4. An example of the incorrect alignment of the properties lectures and inLecture resulting in contradicting rules. (Ontology a: Lecture, lectures, Person, Teacher; ontology b: Lecture, inLecture, Person, Student. Lecture and Person are aligned as equivalent, lectures and inLecture are incorrectly aligned as equivalent, and Teacher and Student are disjoint.)

Contradicting rule instances can be detected based on a suitable subsumption relation ⊵ for clauses, which we will also use for detecting rule subsumption in a later section. Therefore, we extend the relation implies from atoms to negative literals: ¬A implies ¬B, if B implies A. Moreover, we call the disjunction α = ¬B1 ∨ . . . ∨ ¬Bn the body clause of a rule r = B1 ∧ · · · ∧ Bn ⇒ A.

Definition 1 (Subsumption) Given two disjunctions α = L1 ∨ · · · ∨ Ln and β = K1 ∨ · · · ∨ Km of arbitrary (positive or negative) literals Li and Kj. Then α ⊵ β, if there exists a substitution θ, such that for all Li there exists a Kj, where Liθ implies Kj.

In comparison, the standard subsumption relation would require Liθ = Kj instead of Liθ implies Kj. The following DATALOG⋆ rule derives instances of rules that contradict each other:

    anomaly(contradicting_rules, [R1, R2]) :-
        rule(R1), rule(R2),
        contradicting_rules(R1, R2).

This DATALOG⋆ rule is supported by the following PROLOG rule:

    contradicting_rules(Bs1=>A1, Bs2=>A2) :-
        negate_atoms(Bs1, Cs1),
        negate_atoms(Bs2, Cs2),
        clause_subsumes(Cs1, Cs2),
        tc_incompatible_atoms(A1, A2).

The PROLOG predicate negate_atoms transforms a list [B1,...,Bn] of atoms into a list [~B1,...,~Bn] of negated atoms. The rule is further supported by the following PROLOG predicate clause_subsumes:

    clause_subsumes(Cs1, Cs2) :-
        checklist( implies, Cs1, Cs2 ).

In case of subsumption, the body of the more general rule r1 always fires when the body of r2 fires. Note that, based on the call of the PROLOG predicate clause_subsumes, the predicate above computes instances of r1 and r2, such that the body of the instance of r1 subsumes the body of the instance of r2 (see footnote 1). The consequences A1 = C1(x) and A2 = C2(x) are contradicting, if the corresponding classes C1 and C2 are incompatible, i.e., disjoint or complements.

Footnote 1: If we replaced the call to the PROLOG goal G = contradicting_rules(R1, R2) by not(not(G)) in the body of the anomaly rule above, then we would check for subsumption without creating instances.

The described anomaly can be generalized to two (not necessarily disjoint) sets of rules that derive two semantically contradicting conclusions. However, this generalized type of anomaly cannot be detected in a purely syntactic manner.

4.5. Multiple Functional Properties

Functional properties are not allowed to have more than one value for each individual. Therefore, the functional definition of a property can be canonically translated to a property with a minimum cardinality restriction ≥ 0 and a maximum cardinality restriction ≤ 1. For this reason, we can easily detect a semantic error, if a functional property has a maximum cardinality restriction greater than 1. Please note that a property which transitively derives a functional property is also functional. The detection of an inconsistently defined maximum cardinality restriction can be done in DATALOG⋆ as follows:

    anomaly(multiple_functionality, Q) :-
        functionalObjectProperty(Q),
        ( P = Q
        ; tc_derives(P, Q) ),
        max_cardinality_restriction(C, P, X),
        X > 1.

With the introduction of OWL 2, qualified cardinality restrictions are also allowed; thus, an additional predicate can be introduced to check whether particular instances of a property exceed a corresponding qualified restriction. An inconsistent property restriction may be the result of an incorrectly performed ontology integration, e.g., the wrong alignment of functional and non-functional properties.

5. Redundancy

Redundant knowledge is created by ontological definitions and rules that can be removed from the knowledge base without changing the intended semantics. In most cases, redundancies can be clearly identified. Typical redundancies for ontologies like identical concepts have already been discussed, for example in [11]. Also, a separate discussion of rule-based redundancies like subsuming rules can be found for instance in [23]. In the following, we introduce further redundancies that can occur due to the combination of ontological definitions and rules.

5.1. Identity

We call identical formal definitions of classes, properties or rules, that can be only discriminated by their different names, identity errors. They can occur if some implied knowledge is not explicitly stated in the ontology, thus uncovering an incompleteness error. For example, identical classes may be distinguished by the developer by the introduction of an additional property for one of the identical classes. Also, identity of classes or rules can be created by the integration of overlapping ontologies that share (partially) identical concepts.

5.2. Redundancy by Repetitive Taxonomic Definition

The redundant definition of taxonomic knowledge of classes and properties was already described by Gómez-Pérez [11]. Let X, Y be either two classes or two properties. We distinguish two types of repetition:
– direct repetition, where subClassOf(X, Y) is defined more than once in the ontology;
– indirect repetition, where subClassOf(X, Y) is defined, but this relation can also be derived by a chain subClassOf(X, X1), subClassOf(X1, X2), . . . , subClassOf(Xn, Y) with n ≥ 1.
Direct and indirect repetition corresponding to the instantiation of classes and properties can also be defined on instance-of instead of subclass relations. A repetitive definition can easily occur due to the (correct) alignment of two classes or properties. In such cases, repetitions are not an undesirable redundancy, but an intended behavior.

5.3. Rule Subsumption

A rule r = B1 ∧ · · · ∧ Bn ⇒ A can be mapped to a logically equivalent disjunction clause(r) = ¬B1 ∨ . . . ∨ ¬Bn ∨ A. We say that a rule r subsumes another rule r′, for short r ⊵ r′, if clause(r) ⊵ clause(r′). This means that the head A of r subsumes the head A′ of r′, and the body clause of r subsumes the body clause of r′ with respect to the same substitution θ. A subsumed rule r′ can be removed without changing the semantics of the ontology. Subsuming rules can be detected by the following DATALOG⋆ predicate, where the PROLOG predicate rule_subsumes_check, which we do not list here, is used for checking subsumption:

    anomaly(subsumed_rule, [R1, R2]) :-
        rule(R1), rule(R2),
        rule_subsumes_check(R1, R2),
        not(rule_subsumes_check(R2, R1)).

5.4. Redundant Implication

A rule r (over class or property atoms) has a redundant implication of a parent, if some body atom B implies the head atom A. This can be seen as a special case of rule subsumption, since the implication can be seen as a rule B ⇒ A, which subsumes the rule r.

Example: Given the subclass relation subClassOf(Professor, Teacher), the following rule redundantly derives the parent Teacher:

    Professor(X) ∧ Lecture(Y) ∧ teaches(X,Y) ⇒ Teacher(X).

The example is depicted in Figure 5.

Fig. 5. An example for a rule redundantly deriving an already known parent. (Classes Teacher, Professor, and Lecture with the rule Teacher(X) ⇐ Professor(X), Lecture(Y), teaches(X,Y).)

In DATALOG⋆, such a redundancy can be defined as follows:

    anomaly(implication_of_superclass, Bs=>A) :-
        rule(Bs=>A), member(B, Bs),
        implies(B, A).

Besides the obviously redundant inclusion of B in the antecedent, this anomaly might also point to an incorrectly assigned subsumption relation between A and B. On the one hand, there exists a separate subsumption between A and B. On the other hand, the rule defines an additional restricted subsumption if the rule body not only contains B but further atoms. Therefore, the anomaly may also point to an inconsistent mapping between A and (an ancestor of) B.

If Rule is the DATALOG⋆ representation of an arbitrary rule r with the head predicate R, then the supporting PROLOG predicate rule_transitivity/3, cf. Section 7, tests if R is transitive and constructs the DATALOG⋆ representation Rule_t of the rule rt = P(x, y) ∧ Q(y, z) ⇒ R(x, z), such that P, Q, and R are equivalent. Then, we can check if rt subsumes r; this depends on the arguments of the predicates P, Q, and R in r. E.g., rt subsumes the rule r from above, but it does not subsume the rule r′ = P(x, y) ∧ Q(y, z) ∧ β ⇒ R(z, x).
For B ≡ A, the equivalence may be incorrectly assigned, since the rule condition denotes a restriction on the implication. This error is similar to circularity between rules and taxonomy, but with an inverse subclass relation. With the introduction of Property Chain Inclusion (ObjectPropertyChain) in OWL 2, it similarly becomes possible to redundantly derive a property chain by a rule. For instance, the rule worksFor(Person,Lab) ∧ locatedIn(Lab, Org) ∧ ... ⇒ worksFor(Person,Org) describes a redundant implication, if the following property chain was already defined: ObjectPropertyChain(worksFor locatedIn) worksFor). When the rule contains additional atoms in the rule body, the detection of this anomaly points to an incorrectly defined ObjectPropertyChain, since the additional atoms may define a more restricted constraint on the particular inclusion. Symmetry. An analogous anomaly can occur for symmetrical properties R in rule heads: if R is equivalent to the property P , then the rule r = P (x, y) ∧ β ⇒ R(y, x) is redundant, since the more general rule rs = P (x, y) ⇒ R(y, x) without β can be derived by the OWL reasoner. In DATALOG⋆ we detect such a redundancy as follows: anomaly(redundant_symmetry_hb, Rule) :rule(Rule), head_predicate(Rule, R), rule_symmetry(Rule, R, Rule_s), rule_subsumes_check(Rule_s, Rule). 5.5. Redundant Implication of Transitivity or Symmetry The following two anomalies can be interpreted as special cases of a rule subsumption. If Rule is the DATALOG⋆ representation of an arbitrary rule r with the head predicate R, then the supporting P ROLOG predicate rule_symmetry/3, cf. Section 7, tests if R is transitive and constructs the DATALOG⋆ representation Rule_s of the rule rs = P (x, y) ⇒ R(y, x), such that P and R are equivalent. Then, we can check if rs subsumes r, which again depends on the arguments of the predicates P and R in r. 
Often such redundancies can be explained by an erroneous assumption of the transitivity or symmetry during an ontology matching process. Then, the rules define a more restrictive condition of transitivity and symmetry, respectively, if the conjunctions β are not empty. For this reason, the anomalies may be either classified as inconsistent mappings of the properties, or as incorrect alignments of transitivity and symmetry. Transitivity. A rule of the form r = P (x, y) ∧ Q(y, z) ∧ β ⇒ R(x, z) with a transitive property R in the head is redundant, if the properties P, Q, and R are equivalent. The reason is that in this situation the more general rule rt = P (x, y) ∧ Q(y, z) ⇒ R(x, z) without β can be derived by the OWL reasoner. We always assume a property P to be equivalent to itself. Example: For a transitive property sub, which should abbreviate subOrganizationOf, the following rule redundantly repeats the transitive definition: sub(X,Y) ∧ sub(Y,Z) ∧ ... ⇒ sub(X,Z). A redundant definition of a transitive property can be detected using the following DATALOG⋆ predicate: anomaly(redundant_transitivity_hb, Rule) :rule(Rule), head_predicate(Rule, R), rule_transitivity(Rule, R, Rule_t), rule_subsumes_check(Rule_t, Rule). 5.6. Redundancy in the Antecedent of a Rule Redundancy in the antecedent may occur because of redundant derivations of classes or properties, or because of already defined property relations. 10 5.6.1. Redundant Derivation in the Antecedent A redundancy in the antecedent of a rule occurs in a rule B1 ∧ · · · ∧ Bn ⇒ A, if some body atom Bi implies another body atom Bj . Here, Bj is redundant in the rule body and may be removed. We first construct three atoms Rxz, Pxy, and Qyz for equivalent properties, where R is a transitive property that occurs in the body of a rule Rule together with P and Q. Then, we form a clause from the negations of the three atoms and check if it subsumes the body clause Cs of Rule. 
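The redundant derivation of Section 5.6.1 can be illustrated with a short Python sketch. This is not the authors' DATALOG⋆ implementation: it assumes, for illustration only, that class atoms are modelled as (class, variable) pairs and that "implies" is given by the transitive closure of subClassOf. The helper names superclasses, implies, and redundant_body_atoms and the toy facts are hypothetical.

```python
from itertools import permutations

# Hypothetical toy fact: TeachingAssistant is a subclass of Person.
SUBCLASS_OF = {("TeachingAssistant", "Person")}


def superclasses(cls, subclass_of):
    """Return all classes reachable from cls via subClassOf (transitive closure)."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def implies(atom_i, atom_j, subclass_of):
    """C(x) implies D(x) if both atoms share the variable and C is a subclass of D."""
    (cls_i, var_i), (cls_j, var_j) = atom_i, atom_j
    return var_i == var_j and cls_j in superclasses(cls_i, subclass_of)


def redundant_body_atoms(body, subclass_of):
    """Return the body atoms Bj that are implied by some other body atom Bi."""
    return [b_j for b_i, b_j in permutations(body, 2)
            if implies(b_i, b_j, subclass_of)]


# Body of the rule Person(X) ∧ TeachingAssistant(X) ⇒ Employee(X).
body = [("Person", "X"), ("TeachingAssistant", "X")]
print(redundant_body_atoms(body, SUBCLASS_OF))
```

The sketch only covers unary class atoms; property atoms and equivalences, which the DATALOG⋆ predicate implies also has to handle, are left out deliberately.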
Example: The subclass relationship subClassOf(TeachingAssistant, Person) makes the atom Person(X) redundant in the following rule: Person(X) ∧ TeachingAssistant(X) ∧ ... ⇒ Employee(X).

The DATALOG⋆ implementation for finding the anomaly is as follows:

    anomaly(redundant_derivation, Bs=>A) :-
        rule(Bs=>A),
        sub_sequence([Bi, Bj], Bs),
        ( implies(Bi, Bj) ; implies(Bj, Bi) ).

As a special case, this form of redundancy can occur in the ontology if Bi ≡ Bj, e.g., due to the definition of equivalence relations. The anomaly may alternatively point to an incorrect mapping between the elements Bi and Bj, when these two elements were aligned from different ontologies.

5.6.2. Redundant Use of Transitivity and Symmetry

With the definition of special property characteristics in OWL, further anomalies may occur. For equivalent properties P, Q, R, there may exist the following redundancies:
– A rule P(x, y) ∧ Q(y, z) ∧ R(x, z) ∧ β ⇒ A has a redundant body atom R(x, z) if the properties P, Q, R are transitive.
– A rule P(x, y) ∧ Q(y, x) ∧ β ⇒ A has a redundant body atom Q(y, x) if the properties P and Q are equivalent and symmetric.
In DATALOG⋆, with a supporting PROLOG rule, this can be detected using the PROLOG predicate clause_subsumes_check:

    anomaly(redundant_transitivity_b, Rule) :-
        rule(Rule),
        body_predicate(Rule, R),
        rule_transitivity(Rule, R, [Pxy, Qyz]=>Rxz),
        rule_to_clause(Rule, [_|Cs]),
        clause_subsumes_check([~Pxy, ~Qyz, ~Rxz], Cs).

The body clause Cs is obtained by applying the predicate rule_to_clause and omitting the first element of the result, which is the head of Rule.

    anomaly(redundant_symmetry_b, Rule) :-
        rule(Rule),
        body_predicate(Rule, Q),
        rule_symmetry(Rule, Q, [Pxy]=>Qyx),
        rule_to_clause(Rule, [_|Cs]),
        clause_subsumes_check([~Pxy, ~Qyx], Cs).

We construct two atoms Pxy and Qyx for equivalent properties, where P is a symmetric property that occurs in the body of a rule Rule together with Q. Then, we form a clause from the negations of the two atoms, and we check if it subsumes the body clause Cs of Rule.

Example: For two ontologies a and b, the symmetric properties
– a:worksWith(a:Person, a:Person) and
– b:collaborates(b:Person, b:Employee)
were defined to be equivalent. With the alignment equivalentClasses(a:Person, b:Person) and the relationship subClassOf(b:Employee, b:Person), the rule a:P(X) ∧ a:worksWith(X,Y) ∧ b:collaborates(Y,X) ⇒ b:E(Y), where Person and Employee are abbreviated by P and E, respectively, redundantly includes one of the two symmetric properties; either the use of worksWith or the use of collaborates is redundant. In Figure 6 the concepts and properties together with their alignments are shown.

Fig. 6. The rule redundantly uses a symmetrical property: a:P(X) ∧ a:worksWith(X,Y) ∧ b:collaborates(Y,X) ⇒ b:E(Y). (Figure labels: ontology a with Person and the symmetric property worksWith; ontology b with Person, Employee, and the symmetric property collaborates; Person and the two properties are aligned as equivalent.)

Like the redundant definitions of transitivity and symmetry described in Section 5.5, these anomalies can point to an incorrect mapping of properties during an ontology alignment process.

5.7. Unsupported Rule Condition

A rule r has an unsupported condition if at least one of its body atoms B neither unifies with an input atom (e.g., a given instantiation of the ontological concepts) nor with the consequent of another rule. The corresponding DATALOG⋆ predicate is shown below:

    anomaly(unsupported_condition, Bs=>A) :-
        rule(Bs=>A), member(B, Bs),
        not(call(B)),
        not(rule(_=>B)).

The rule even checks if some call of the atom B is successful.

5.8. Unsatisfiable Rule Condition

An unsatisfiable condition can occur due to the rich semantics of OWL, for instance, if complement or disjoint descriptions are incorrectly aligned. The rule antecedent is unsatisfiable if two body literals Bi = Ci(x) and Bj = Cj(x) are incompatible. The definition of an unsatisfiable condition is given in DATALOG⋆ as follows:

    anomaly(unsatisfiable_condition, Bs=>A) :-
        rule(Bs=>A),
        sub_sequence([Bi, Bj], Bs),
        tc_incompatible_atoms(Bi, Bj).

The anomaly was also described as the inconsistency incompatible rule antecedent in Section 4.2, because the occurrence of such a rule in a (merged) ontology may also point to an incorrect alignment of a disjoint or complement description.

5.9. Redundant Cardinalities

When using properties to define relations between classes, the relation can be further specialized by cardinality restrictions. However, sometimes the cardinalities are redundant due to the semantics of some special properties in OWL. One example is the use of the minimal cardinality ≥ 0, since all instances of a property have a link to zero or more individuals in its domain definition. The detection of a redundant min-cardinality restriction can simply be done using the following DATALOG⋆ predicate:

    anomaly(redundant_mincardinality_0, P) :-
        min_cardinality_restriction(C, P, 0).

Example: The property teaches(Person, Person) defines a redundant cardinality restriction that can be omitted.

    <owl:Restriction>
      <owl:onProperty rdf:resource='#teaches'/>
      <owl:minCardinality rdf:datatype='&xsd;nonNegativeInteger'>
        0
      </owl:minCardinality>
    </owl:Restriction>

Another example for redundant cardinality restrictions is a max-cardinality restriction ≤ 1 for functional properties. If a super-property of the property is functional, then the cardinality restriction is also redundant.

Example: For a property hasID(Organization, &xsd;string), a max-cardinality restriction with ≤ 1 is defined. If the property is also functional, then the restriction can be omitted.

    <owl:Restriction>
      <owl:onProperty rdf:resource='#hasID' />
      <owl:maxCardinality rdf:datatype='&xsd;nonNegativeInteger'>
        1
      </owl:maxCardinality>
    </owl:Restriction>

The restriction is redundant, because the functionality of a property requires its uniqueness for the entire ontology. The following DATALOG⋆ predicate detects redundant max-cardinality restrictions:

    anomaly(subsumed_maxcardinality_1, Q) :-
        functionalObjectProperty(Q),
        ( P = Q ; tc_derives(P, Q) ),
        max_cardinality_restriction(C, P, 1).

Since a functionality definition is intuitively well-defined, this concept should be preferred over a max-cardinality restriction.

6. Deficiency

Deficiency is more subtle than the previously presented categories of anomalies. The following anomalies consider the completeness, understandability, and maintainability of ontologies. Possible sources of such anomalies are imprecision during the manual development of (large) ontologies, effects of the evolution of ontologies, e.g., [31], and erroneous side-effects of the integration of ontologies. Since deficiencies mostly detect areas in an ontology with problematic design, we also call them design anomalies. The identification of such an anomaly is the starting point of a refactoring. Refactoring methods describe procedures to eliminate the corresponding design anomaly without changing the meaning of the remaining knowledge. Originally, design anomalies had been identified and investigated for relational databases. In recent years, software engineering research has coined the term bad smells for parts of the source code that do not produce false behavior, but are badly designed and should be improved for better maintainability, cf. [21,9]. Recently, some approaches were presented to transfer this idea to the conceptual properties of different knowledge representations [3,6] and OWL ontologies [4,5]. In the following, we present a set of possible anomalies that affect the design of an ontology. However, these can only be seen as indicators for an actual anomaly. In any case the user has to decide whether and how to remove the possible issue.
The presented design anomalies mainly focus on the detection of badly designed ontology concepts. For some anomalies their use in rules is taken into account, whereas other anomalies can occur independent of the existence of rule-based knowledge.

6.1. Lazy Class/Property

The usage of a class or property is often a good indicator for its actual utility. We call a class or a property of an ontology lazy if it is never or rarely used in real-world applications. More precisely, an element is possibly lazy when
– the element represents a leaf in the hierarchy, and
– no rules use this element, and
– there exist no instances of the element, and
– no other element uses this element as a property.
There exist a number of reasons for the occurrence of this anomaly: lazy elements may occur due to the integration of ontologies (including terms that are not useful or relevant any more), or due to the evolution of an ontology (previously useful concepts were replaced by specializations or generalizations).

In DATALOG⋆, a possibly lazy element E can be detected as follows:

    anomaly(lazy_element, E) :-
        element(E),
        not(subClassOf(_, E)),
        not(rule_predicate(E)),
        not(instance(_, E)).

The supporting predicate element is defined in PROLOG:

    element(E) :-
        ( class(E)
        ; objectProperty(E)
        ; datatypeProperty(E)
        ; transitiveObjectProperty(E)
        ; symmetricObjectProperty(E)
        ; functionalObjectProperty(E)
        ; inverseObjectProperties(E, _) ).

The constraints stated above can be relaxed by checking for very few rules with the considered element in their head or body. Then, these rules have to be inspected by the user and marked as not useful.

6.2. Chains of Inheritance

The hierarchy of classes and properties defines the backbone of every ontology. Simple subclass relations are used to describe the inheritance of concepts and property relations. During the evolution of (manually built) ontologies or due to the imprecise integration of ontologies, the intended subclass structure of classes and properties can degenerate to subclass cascades in some parts of the hierarchy. A taxonomic chain C1, C2, . . . , Cn of pairwise different classes Ci, where Ci−1 is a subclass of Ci, for 2 ≤ i ≤ n, is called a chain of inheritance if all intermediate classes C2, . . . , Cn−1 are not participating in any other subclass relations except the ones in the chain (isolated subClassOf), see Figure 7. The intermediate elements Ci may not be useful for applications when (i) there exist no or very few individuals for the elements Ci and (ii) the elements Ci are not (extensively) used in ontological definitions, e.g., restrictions, or in rules.

Fig. 7. A chain of inheritance of classes C1, . . . , Cn.

A maximal chain of inheritance can be detected in DATALOG⋆ as follows.

    anomaly(chain_of_inheritance, Cs) :-
        maximal_simple_path(isolated_subClassOf, Cs).

    isolated_subClassOf(C1, C2) :-
        subClassOf(C1, C2),
        not( ( subClassOf(C1, C), C \= C2 ) ),
        not( ( subClassOf(C, C2), C \= C1 ) ).

The PROLOG predicate maximal_simple_path from a graph library computes simple paths, which at both ends cannot be extended by an isolated_subClassOf. If the user has decided to eliminate the useless elements of the chain, then the particular elements have to be removed separately by the refactoring collapse hierarchy [5].

6.3. Lonely Disjoint Class

We call a class a lonely disjoint if this class is not disjoint with any of its siblings, but it is disjoint with classes that are mutual siblings in another branch of the taxonomy. See Figure 8 for an example, where class C is a lonely disjoint, since C is disjoint to the classes C1, . . . , Cn, which are siblings of each other but not siblings of C. It is worth noticing that the class C does not need to be disjoint to all Ci in the disjoint partition, but to a sufficiently large number of classes Ci (in Figure 8, class Cn+1 is not disjoint with C).

Fig. 8. Lonely disjoint: a distant class C disjoint to a collection of siblings C1, . . . , Cn.

The DATALOG⋆ implementation of this anomaly is as follows:

    anomaly(lonely_disjoint, C) :-
        class(C), siblings(Cs),
        checklist( disjointClasses(C), Cs ),
        not((sibling(C, M), disjointClasses(C, M))).

The PROLOG meta-predicate checklist/2 calls disjointClasses(C, D) for all members D of the list Cs to determine whether C and D (the C is always the same) are disjoint. The rule is supported by two PROLOG rules for sibling and siblings, respectively, including aggregation, which we will see in Section 7. A lonely disjoint class is often created by the manual modification of the ontology: a class is moved to another branch of the taxonomy, but the attached disjointness descriptions are not re-aligned appropriately. Furthermore, the anomaly can also occur due to incorrect alignments during an ontology integration task. The existence of a lonely disjoint class can cause unintended reasoning results or even errors. The developer of the ontology has to decide manually about detected lonely disjoints. The elimination of an actual anomaly is quite simple: the disjointness property is just removed from the lonely disjoint class.

6.4. Over-Specific Property Range

Sometimes the developers of an ontology tend to be very specific when defining value ranges for properties. During the practical use of the ontology, it often turns out that the values are too specific and that a coarser range with fewer values would fit the considered domain much better.

Example: The value range Rtemp = {very high, high, normal, low, very low} of a property temp (for temperature) may contain five possible values, but the actual application of the property uncovers that a more general range R′temp = {high′, normal, low′} with three values would work much better.

A typical example for this situation is the alignment of two ontologies, where the value range of a specific concept is shrunk in order to match with a foreign concept. A further example is the planned use of the developed system by human operators: here, a smaller and more comprehensible range of values is less prone to errors caused by manual data entry. If rules are defined containing this property, then the anomaly can be identified by the existence of many analogous rules for the particular values. In our example, rules for the values high and very high could be present. In such cases, the refactoring coarsen value range [5] forms groups of equivalent values, e.g., high′ = {high, very high} and low′ = {low, very low}.

The following DATALOG⋆ predicate detects over-specific property ranges by determining pairs of rules that have variants has_value(P, Vi) in their antecedent (with i = 1, 2). This type of rule pair is found by deleting the variants from the rule bodies and subsequently testing for their equality.

    anomaly(over_specific, [R1, R2, has_value(P,[V1,V2])]) :-
        rule(R1), rule(R2),
        R1 = Bs1=>A, R2 = Bs2=>A, R1 \= R2,
        delete(has_value(P, V1), Bs1, Bs),
        delete(has_value(P, V2), Bs2, Bs).

An analogous DATALOG⋆ predicate can be defined for over-specific property ranges in rule consequences. The anomaly is removed by replacing the original values of the property with the aggregated ones in every ontological definition, e.g., in restrictions or rules. A detailed example of this refactoring is shown in [5]. It is worth noticing that the elimination of an over-specific property range can introduce new redundancies or even inconsistencies. For example, redundant rules are commonly produced when there exist equivalent rules with values that were coarsened to a single value. Inconsistent rules can occur due to a semantically inconsistent mapping function. In consequence, after the elimination of an anomaly, it is reasonable to undergo a subsequent check for redundant or inconsistent definitions.

6.5. Property Clump

In ontologies, properties are commonly used to define relations and attributes between classes and individuals, respectively. The repeated and identical use of a collection of properties in many classes is a deficiency called property clump. A property clump is comparable to the repeated use of code fragments in traditional software, so-called clones. For ontologies, a property clump PC = (C, P) is formed by a set C = {C1, . . . , Cn} of at least two classes that all share the same set P = {P1, . . . , Pm} of properties; these properties can describe data type properties and object properties. Such an unintentionally repeated definition of properties in different classes can occur due to the manual development and evolution of an ontology. The following DATALOG⋆ rules find all maximal sets Cs of classes that have the same set Ps of properties in common:

    anomaly(property_clump, [Cs, Ps]) :-
        setof( C, class_has_properties(C, Ps), Cs ),
        length(Cs, N), N > 1.

    class_has_properties(C, Ps) :-
        setof( P, class_has_property(C, P), Ps ).

We use two nested aggregations based on the powerful PROLOG meta-predicate setof/3. The inner aggregation computes a class and its properties. The outer aggregation computes all classes having the same set of properties. The repeated use of properties of a clump PC = (C, P) can be caught by a new class CP, which gets the properties in P. The original classes C ∈ C are linked to CP instead of linking them to the properties P. For ontologies with rules, we have to change all rules with property atoms P(x, y) for P ∈ P in their antecedent or consequent. The definition of such an abstract property class CP may increase the compactness and the maintainability (with respect to changes, extensions, fixes) of the ontology.
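The two nested aggregations behind the property clump detection of Section 6.5 can be mimicked in Python as a minimal sketch. It is not the authors' DATALOG⋆/PROLOG code: it assumes that the class_has_property facts are given as (class, property) pairs, and the function name property_clumps and the sample facts are hypothetical.

```python
from collections import defaultdict


def property_clumps(class_has_property):
    """Group classes that share exactly the same set of properties."""
    # Inner aggregation: collect the property set of every class.
    props_by_class = defaultdict(set)
    for cls, prop in class_has_property:
        props_by_class[cls].add(prop)

    # Outer aggregation: collect all classes with an identical property set.
    classes_by_props = defaultdict(list)
    for cls, props in props_by_class.items():
        classes_by_props[frozenset(props)].append(cls)

    # A clump needs at least two classes (N > 1 in the DATALOG⋆ rule).
    return sorted(
        (sorted(classes), sorted(props))
        for props, classes in classes_by_props.items()
        if len(classes) > 1
    )


# Hypothetical facts: C1, C2, C3 share P1..P4, whereas C4 only has P1.
FACTS = [(c, p) for c in ("C1", "C2", "C3") for p in ("P1", "P2", "P3", "P4")]
FACTS.append(("C4", "P1"))

print(property_clumps(FACTS))
```

As in the setof/3 formulation, only classes with identical property sets are grouped; classes that merely share a subset of properties (C4 in the sample facts) are not reported.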
An example of a property clump P = {P1, P2, P3, P4} used by three classes C1, C2, and C3 is depicted in Figure 9 (left); the refactored design using an abstract property class CP is shown at the right of the figure. The introduction of a new class that captures related aspects of another class is also discussed in the ontology design pattern n-ary relations [20], where a new class is created in order to link the instances of n individuals to an instance of a single class. With the identification of a property clump, incorrectly modeled n-ary relations may be uncovered. The extraction of such repetitions into a single data structure is a common refactoring, which improves the compactness and maintainability of the implementation.

Fig. 9. Refactoring a property clump to an n-ary relation with the abstract property class P.

7. Implementation in DATALOG⋆

The introduced anomalies have also been defined by an implementation in the new language DATALOG⋆. Using this language, we have developed a new approach that extends the DATALOG paradigm and mixes it with PROLOG. The analysis can be run using the system DISLOG Developers' Kit (DDK) [30]. This toolkit provides a module including the presented implementation of DATALOG⋆ and the anomaly predicates as well as the shown examples. For the interested reader, we introduce some technical details of the evaluation mechanisms of DATALOG⋆ in the following. For the detection of anomalies a number of further DATALOG⋆ and PROLOG predicates was used. We describe their implementation in Section 7.3 and Section 7.4.

7.1. Mixing DATALOG and PROLOG: Forward and Backward Chaining

The detection of anomalies in rule ontologies could not be formulated using PROLOG's backward chaining or DATALOG's forward chaining alone, since we need recursion on cyclic data, function symbols (mainly for representing lists), non-grounded facts, disjunction, negation, and aggregation (using meta-predicates) in rule bodies, and stratification.

DATALOG and PROLOG. We distinguish between DATALOG⋆ rules and PROLOG rules: DATALOG⋆ rules are forward chaining rules that may contain function symbols (in rule heads and bodies) as well as negation, disjunction, and PROLOG predicates in rule bodies. DATALOG⋆ rules are evaluated bottom-up, and all possible conclusions are derived. The supporting PROLOG rules are evaluated top-down and, for efficiency reasons, only on demand, and they can refer to DATALOG⋆ facts. The PROLOG rules are also necessary for expressivity reasons: they are used for some computations on complex terms and, more importantly, for computing very general aggregations of DATALOG⋆ facts.

Forward and Backward Chaining. DATALOG⋆ rules cannot be evaluated in PROLOG or DATALOG alone for the following reasons: Current DATALOG engines cannot handle function symbols and non-grounded facts, and they do not allow for the embedded computations (arbitrary built-in predicates) that we need in this work. Standard PROLOG systems cannot easily handle recursion with cycles, because of non-termination, and are inefficient, because subqueries are posed and answered multiple times. Thus, they have to be extended by some DATALOG⋆ facilities (our approach) or memoing/tabling facilities (the approach of the PROLOG extension XSB [24]). Since the embedding system, the DDK [30], is developed in SWI-PROLOG, we have implemented a new inference machine that can handle mixed, stratified DATALOG⋆/PROLOG rule systems. The evaluation of DATALOG⋆ programs mixes forward-chained evaluation of DATALOG with SLD-resolution of PROLOG, see Figure 10. A DATALOG⋆ rule A ← B1 ∧ · · · ∧ Bn can contain atoms Bi which are evaluated backward in PROLOG.

Fig. 10. Mixing Forward and Backward Chaining.

7.2. Stratified Evaluation of DATALOG⋆

For the ontology evaluation we have implemented two layers (strata) D1 and D2 of DATALOG⋆ rules:
– The upper layer D2 consists of the rules for the predicate anomaly/2 and some DATALOG⋆ rules that are stated together with them.
– The lower layer D1 consists of all other DATALOG⋆ rules. For example, the rules for the predicates derives and tc_derives are in D1.
D1 is applied to the DATALOG⋆ facts for the following basic predicates, which have to be derived from the underlying ontology document: rule, class, subClassOf, objectComplementOf, incompatible, equivalentObjectProperties, equivalentClasses, transitive/symmetricObjectProperty, min/max_cardinality_restriction, property_restriction, class_has_property. The resulting DATALOG⋆ facts are the input for D2. The stratification into two layers is necessary, because D2 refers to D1 through negation and aggregation. Most PROLOG predicates in this paper support the layer D2.

7.3. Further DATALOG⋆ Predicates

The following DATALOG⋆ predicate computes a chain Ps of properties that connect two classes C and D using transitive closure:

    tc_connected_classes(C, [P], D) :-
        connected_classes(C, P, D).
    tc_connected_classes(C, [P|Ps], D) :-
        connected_classes(C, P, E),
        tc_connected_classes(E, Ps, D),
        not(member(P, Ps)).

    connected_classes(C, P, D) :-
        tc_derives(C, C_),
        property_restriction(C_, P, D_),
        tc_derives(D_, D).

7.4. Further Supporting PROLOG Predicates

Head and Body. The head and body predicates of a rule can be determined using the following pure PROLOG predicates:

    head_predicate(_=>A, P) :-
        functor(A, P, _).

    body_predicate(Bs=>_, P) :-
        member(B, Bs),
        functor(B, P, _).

    rule_predicate(E) :-
        rule(Rule),
        ( head_predicate(Rule, E)
        ; body_predicate(Rule, E) ).

Siblings. The following PROLOG rules define siblings and aggregate the siblings Z of a class X to a list Zs using the PROLOG meta-predicate setof/3, respectively:

    sibling(X, Y) :-
        subClassOf(X, Z),
        subClassOf(Y, Z),
        X \= Y.

    siblings(Zs) :-
        setof( Z, sibling(X, Z), Zs ).

These rules could also be evaluated in DATALOG⋆ using forward chaining. But, since we need siblings only for certain lists Zs, this would be far too inefficient. The call to setof/3 above succeeds for every class X having siblings, and it computes the list Zs of all siblings Z of X. On backtracking, the siblings of the other classes X are computed. This means that setof/3 does a grouping on the variable X. Within setof/3, the call sibling(X, Z) computes one class X and its siblings Z.

Transitivity. Given a DATALOG⋆ rule Rule and a predicate R, the following PROLOG rule tests if R is transitive and then constructs three atoms Rxz, Pxy, and Qyz, where P and Q are body predicates of Rule that are equivalent to R. Finally, it forms a DATALOG⋆ rule Rule_t from the three atoms.

    rule_transitivity(Rule, R, Rule_t) :-
        transitiveObjectProperty(R),
        body_predicate(Rule, P),
        equivalentObjectProperties(R, P),
        body_predicate(Rule, Q),
        equivalentObjectProperties(R, Q),
        Pxy =.. [P,X,Y],
        Qyz =.. [Q,Y,Z],
        Rxz =.. [R,X,Z],
        Rule_t = [Pxy, Qyz]=>Rxz.

Symmetry. Given a DATALOG⋆ rule Rule and a predicate R, the following PROLOG rule tests if R is symmetric and then constructs two atoms Pxy and Ryx, where P is a body predicate of Rule that is equivalent to R. Finally, it forms a DATALOG⋆ rule Rule_s from the two atoms.

    rule_symmetry(Rule, R, Rule_s) :-
        symmetricObjectProperty(R),
        body_predicate(Rule, P),
        equivalentObjectProperties(R, P),
        Pxy =.. [P,X,Y],
        Ryx =.. [R,Y,X],
        Rule_s = [Pxy]=>Ryx.

8. Discussion

For the last couple of years, ontologies have played a major role in building intelligent systems. Currently, the standard ontology language OWL is extended by rule-based elements using, e.g., the rule interchange format RIF or the semantic web rule language SWRL. With the introduction of OWL 2 RL, a profile of OWL is defined that is especially useful for the interchange with rule-based knowledge. We have shown that with the increased expressiveness of ontologies, now also including rules, a number of new evaluation issues has to be considered. In this paper, we have presented a collection of typical anomalies that arise during practical ontology development, especially when aligning and integrating existing ontologies.

When reviewing the described anomalies, we see that most issues only depend on OWL axioms with a low expressivity, i.e., many anomalies can occur even when using the simple OWL 2 EL profile [13,34]. Only the following anomalies take advantage of more expressive OWL axioms requiring the profiles OWL 2 QL or OWL 2 RL: Circular Properties (Sec. 3.2), Multiple Functional Properties (Sec. 4.5), Redundant Implication of Transitivity or Symmetry (Sec. 5.5), Redundant Use of Transitivity and Symmetry (Sec. 5.6.2), and Redundant Cardinalities (Sec. 5.9). About half of the issues required the existence of rules in the knowledge base.

Of course, the presented anomalies only gave a brief insight into the collection of possible verification issues. Anomalies were considered concerning the basic elements of the ontology language. When using built-ins, the detection of anomalies becomes a difficult task, since the semantics of built-ins can rarely be evaluated at the symbolic level. Simple problems occurring with built-ins are easily detectable, especially the definition of identical knowledge. For instance, the assessment of the body mass index (BMI) in two medical ontologies a and b is redundant given the rule a:hasBMI(P,W) ∧ op:num-greater-than(W,25) ⇒ a:overweight(P) and the rule b:calBMI(P,W) ∧ op:num-greater-than(W,25) ⇒ b:heavy(P), where a:hasBMI and b:calBMI are defined as equivalent, and a:overweight and b:heavy are equivalent, respectively. Here, easily detectable inconsistencies can be identified; for example, when the numeric threshold is specified differently in the above rules. The analysis of more complex definitions, however, becomes much more difficult when the semantics of the built-ins cannot be mapped to the symbolic level.

For all discussed anomalies, we have introduced a declarative approach using DATALOG⋆ for implementing the anomaly checks for ontology verification. Due to its declarative nature, new methods for anomaly detection can be easily added to the existing work. From our point of view, this is crucial because of the incompleteness of the presented anomalies: in principle, giving a complete overview of possible anomalies is not feasible, since the number of anomalies depends on the expressiveness of the ontology and the rule representation, respectively, that should be verified.

The actual frequency of the introduced anomalies is an interesting issue. However, only a small number of ontologies (mostly toy examples) is available that make use of a rule extension. A sound review of anomaly occurrences would require a reasonable number of practical ontologies with a significant size. Furthermore, larger systems may also include parts of a non-monotonic rule base. Here, some work has been done on the verification of non-monotonic rule bases [37,38], which has to be re-considered in the presence of an ontological layer.

References

[19] M. Krötzsch, S. Rudolph, P. Hitzler, ELP: Tractable rules for OWL 2, in: ISWC'08: Proceedings of the 7th International Semantic Web Conference, Springer, Berlin, 2008.
[20] N. Noy, A. Rector, Defining n-ary relations on the semantic web, Tech. rep., W3C Working Group Note (12 April 2006).
[21] W. F. Opdyke, Refactoring Object-Oriented Frameworks, Ph.D. thesis, University of Illinois, Urbana-Champaign, IL, USA (1992).
[22] A. Preece, R. Shinghal, Foundation and application of knowledge base verification, International Journal of Intelligent Systems 9 (1994) 683–702.
[23] A. Preece, R. Shinghal, A. Batarekh, Principles and practice in verifying rule-based systems, The Knowledge Engineering Review 7 (2) (1992) 115–141.
[24] P.
Rao, K. F. Sagonas, T. Swift, D. S. Warren, J. Freire, XSB: A system for effciently computing well-founded semantics, in: Logic Programming and Non-monotonic Reasoning, 1997. URL citeseer.ist.psu.edu/article/rao97xsb.html [25] A. L. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang, C. Wroe, OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns, in: EKAW’04: Engineering Knowledge in the Age of the Semantic Web: 14th International Conference, LNAI 3257, Springer, 2004. [26] R. Rosati, On the decidability and complexity of integrating ontologies and rules, Web Semantics 3 (1) (2005) 61–73. [27] L. Sauermann, G. A. Grimnes, M. Kiesel, C. Fluit, H. Maus, D. Heim, D. Nadeem, B. Horak, A. Dengel, Semantic desktop 2.0: The Gnowsis experience, in: ISWC’06: Proceedings of the 5th International Semantic Web Conference, LNCS 4273, 2006. [28] S. Schaffert, F. Bry, J. Baumeister, M. Kiesel, Semantic wiki, IEEE Software 25 (4) (2008) 8–11. [29] G. Schreiber, H. Akkermans, A. Anjewierden, R. de Hoog, N. Shadbolt, W. V. de Velde, B. Wielinga, Knowledge Engineering and Management - The CommonKADS Methodology, 2nd ed., MIT Press, 2001. [30] D. Seipel, The D IS L OG Developers’ Kit (DDK): http://www1.informatik.uni-wuerzburg.de/databases/DisLog. [31] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic, User-driven ontology evolution management, in: EKAW’02: Ontologies and the Semantic Web, 13th International Conference, LNAI 2473, Springer, Berlin, 2002. [32] Y. Sure, S. Staab, R. Studer, On-to-knowledge methodology (OTKM), in: S. Staab, R. Studer (eds.), Handbook on Ontologies, International Handbooks on Information Systems, Springer, 2004. [33] W3C, RIF-BLD Specification: http://www.w3.org/tr/rif-bld (July 2008). [34] W3C, OWL2 Profiles: http://www.w3.org/tr/owl2-profiles/ (April 2009). [35] W3C, Semantic Web activity: http://www.w3.org/2001/sw/ (May 2009). [36] J. 
Wielemaker, An overview of the SWI-Prolog programming environment, in: WLPE’03: Proceedings of the 13th International Workshop on Logic Programming Environments, 2003. [37] N. P. Zlatareva, Verification of non-monotonic knowledge bases, Decision Support Systems 21 (4) (1997) 253 – 261. [38] N. P. Zlatareva, Testing the integrity of non-monotonic knowledge bases containing semi-normal defaults, in: FLAIRS’04: Proceedings of the 17th International Florida Artificial Intelligence Research Society Conference, AAAI Press, 2004. [1] G. Antoniou, F. van Harmelen, A Semantic Web Primer, 2nd ed., MIT Press, 2008. [2] F. Baader, U. Sattler, An Overview of Tableau Algorithms for Description Logics, Studia Logica 69 (2001) 5–40. [3] J. Baumeister, Agile Development of Diagnostic Knowledge Systems, IOS Press, AKA, DISKI 284, 2004. [4] J. Baumeister, D. Seipel, Smelly owls – design anomalies in ontologies, in: FLAIRS’05: Proceedings of the 18th International Florida Artificial Intelligence Research Society Conference, AAAI Press, 2005. URL http://ki.informatik.uni-wuerzburg.de/ papers/baumeister/2005/FLAIRS05OntoSmells.pdf [5] J. Baumeister, D. Seipel, Verification and refactoring of ontologies with rules, in: EKAW’06: Proceedings of the 15th International Conference on Knowledge Engineering and Knowledge Management, Springer, Berlin, 2006. URL http://ki.informatik.uni-wuerzburg.de/ papers/baumeister/2006/EKAW06_baumeisterSWRL.pdf [6] J. Baumeister, D. Seipel, F. Puppe, Refactoring methods for knowledge bases, in: EKAW’04: Engineering Knowledge in the Age of the Semantic Web: 14th International Conference, LNAI 3257, Springer, Berlin, 2004. URL http://ki.informatik.uni-wuerzburg.de/ papers/baumeister/2004/Refactoring-EKAW04.pdf [7] M. Buffa, F. Gandon, G. Ereteo, P. Sander, C. Faron, SweetWiki: A semantic wiki, Web Semantics 8 (1) (2008) 84–97. [8] S. Ceri, G. Gottlob, L. Tanca, Logic Programming and Databases, Springer, Berlin, 1990. [9] M. Fowler, Refactoring. 
Improving the Design of Existing Code, Addison-Wesley, 1999. [10] A. Gómez-Pérez, Evaluation of taxonomic knowledge on ontologies and knowledge-based systems, in: KAW’99: Proceedings of the 12th International Workshop on Knowledge Acquisition, Modeling and Management, 1999. [11] A. Gómez-Pérez, Evaluation of ontologies, International Journal of Intelligent Systems 16 (3) (2001) 391–409. [12] A. Gómez-Pérez, M. Fernández-López, O. Corcho, Ontological Engineering, Springer, 2004. [13] B. C. Grau, I. Horrocks, B. Motik, B. Parsia, P. Patel-Schneider, U. Sattler, OWL 2: The next step for OWL, Web Semantics 6 (4) (2008) 309–322. URL http://www.sciencedirect.com/science/ article/B758F-4TP1FC8-1/2/ 9d2f647c7ac874b8f8baa9cf92cf73a3 [14] N. Guarino, C. Welty, Evaluating ontological decisions with OntoClean, Communications of the ACM 45 (2) (2002) 61–65. [15] Y. Guo, Z. Pan, J. Heflin, LUBM: A benchmark for OWL knowledge base systems, Web Semantics 3 (2) (2005) 158–182. [16] I. Horrocks, B. Parsia, P. Patel-Schneider, J. Hendler, Semantic web architecture: Stack or two towers?, in: F. Fages, S. Soliman (eds.), Principles and Practice of Semantic Web Reasoning (PPSWR), No. 3703 in LNCS, Springer, 2005. [17] I. Horrocks, P. F. Patel-Schneider, S. Bechhofer, D. Tsarkov, OWL rules: A proposal and prototype implementation, Web Semantics 3 (1) (2005) 23–40. [18] I. Horrocks, U. Sattler, A tableaux decision procedure for SHOIQ, in: IJCAI’05: Proc. of the 19th International Joint Conference on Artificial Intelligence, 2005. 19