IRIS: Integrated Rule Inference System: API and User Guide
IRIS: Integrated Rule Inference System: API and User Guide
IRIS: Integrated Rule Inference System: API and User Guide
April 3, 2008
1
Contents
1 Introduction 4
1.1 Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Description 4
2.1 Datalog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Evaluation Process . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Architecture 5
3.1 Supported Strategies . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Program optimizations . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Rule Safety Processing . . . . . . . . . . . . . . . . . . . . . . . . 7
3.4 Stratification Algorithms . . . . . . . . . . . . . . . . . . . . . . . 7
3.5 Rule Re-ordering optimizations . . . . . . . . . . . . . . . . . . . 8
3.6 Rule optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.7 Rule Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.8 Rule Evaluators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.9 Miscellaneous Components . . . . . . . . . . . . . . . . . . . . . 10
3.9.1 Storage and Indexing . . . . . . . . . . . . . . . . . . . . 10
3.9.2 Built-in Predicates . . . . . . . . . . . . . . . . . . . . . . 11
3.9.3 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.10 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Stratification 12
4.1 Stratified negation . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.2 Locally stratified negation . . . . . . . . . . . . . . . . . . . . . . 13
5 Safe Rules 14
5.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Unsafe Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2
7 API guide 17
7.1 Creating objects with the Java API . . . . . . . . . . . . . . . . . 17
7.2 Creating objects using the parser . . . . . . . . . . . . . . . . . . 17
7.3 Evaluating a program . . . . . . . . . . . . . . . . . . . . . . . . 18
7.3.1 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 18
8 External Data-sources 19
3
1 Introduction
1.1 Purpose
This guide is intended to give a short introduction to using the Integrated Rule
Inference System (IRIS) and its application programming interface (API).
1.2 Audience
This guide is for software developers who will be integrating IRIS in to their ap-
plication as well as logicians/researchers who wish to understand the capabilities
of the IRIS reasoner.
1.3 Scope
This guide describes the reasoner architecture, evaluation algorithms and vari-
ous optimizations.
However, this document does not attempt to explain the theory of logic
programming and only provides a brief description of the evaluation strategies
employed.
2 Description
2.1 Datalog
One Logic Programming based formalism that has been thoroughly analyzed is
Datalog[1], which was originally developed as a database query and rule lan-
guage. Datalog is based on a simplified version of the Logic Programming
paradigm (it is a syntactic subset of Prolog) with its main focus on the pro-
cessing of large amounts of data from relational databases. Several relevant
complexity results of Datalog in regard to query answering have been derived.
Querying a static knowledge base in general has polynomial time complexity,
but is exponential otherwise[2].
Datalog can be used in a wide variety of applications, including Description
Logic Programming (DLP)[3], rule languages from the WSML family[4] and
RDF[5] reasoning.
2.2 Features
IRIS is an open-source Datalog reasoner that can evaluate safe or unsafe datalog
extended with function symbols, XML schema data types, built-in predicates
and (locally) stratified or well-founded negation as failure.
It is delivered in three java ‘jar’ files. One contains the reasoning engine,
another contains the parser and the last contains some utility programs includ-
ing two applications that provide a user interface to the IRIS engine. These
4
applications are useful for experimenting with Datalog and various evaluation
options.
IRIS is licensed under the GNU lesser GPL and hosted by Sourceforge1 .
More detailed information is available on the IRIS home page2 .
3 Architecture
IRIS has been designed to be as modular as possible and is comprised of a num-
ber of loosely coupled components that implement well-defined Java interfaces.
This approach allows more evaluation strategies to be added over the course of
time. For the initial releases of IRIS, it was decided to concentrate on bottom-
up evaluation techniques. However, future top-down and hybrid techniques are
envisaged and planned for. When the time comes, new implementations can
be easily ‘plugged-in’ and used without any requirement to modify the existing
code-base.
The advantage of using bottom-up techniques is that they are easily un-
derstood and implemented. The disadvantage is that for large or complex
knowledge-bases, the minimal model may be too expensive to calculate in ei-
ther time or storage requirements. However, ‘Magic Sets’[6] is a well-researched
program optimization technique that mitigates the disadvantages of bottom-up
evaluation by re-writing the rules of the knowledge-base to answer a specific
query. The end effect is that a far more efficient evaluation occurs where only
those tuples likely to be involved in answering the query are computed.
Therefore, at present, IRIS uses a combination of bottom-up evaluation for
simplicity, combined with magic sets optimization for efficiency. This particular
combination is well-researched[7][8], easy to implement, fast and efficient.
1 http://sourceforge.net/projects/iris-reasoner
2 http://www.iris-reasoner.org
5
(Locally) Stratified Well−Founded
Rule−Safety
Rule Re−ordering
Processing
Rule−Safety
Rule Re−ordering
Processing
6
3.2 Program optimizations
As mentioned above, the Magic Sets optimization technique re-writes the rule-
set according to the query so that only tuples likely to be involved in satisfying
the query are computed. It can be shown that this approach allows bottom-up
evaluation to rival top-down techniques in efficiency[10]. In essence, the appli-
cation of magic sets allows only a sub-set of the minimal model to be computed,
i.e. that part which contains all tuples that will be used to answer the query.
The disadvantage, is that a new sub-set of the model must be computed for
each new query. Therefore, magic sets allows faster knowledge-base initializa-
tion times at the expense of longer query times. Whether magic sets is used or
not can be configured programmatically to suit the environment in which IRIS
is being used.
Another simpler program optimization technique is rule-filtering. This tech-
nique is usually used in combination with Magic Sets and simply involves build-
ing a dependency graph between all rule predicates and removing those rules
that can not influence the query result, thus reducing the size of the minimal
model computation.
7
unstratified:
p(2, ?X) : −q(?X), ¬p(3, ?X)
because the rule head predicate has a direct negative dependency upon itself.
However, the rule can only produce tuples whose first term value is 2 and can
only use input tuples whose first term value is 3. Therefore no recursive depen-
dency exists at all and this rule can be evaluated normally.
Locally stratified logic programs can be far more complicated than the simple
example shown above. IRIS uses a novel technique to evaluate locally stratified
logic programs that does not require the use of a well-founded semantics (see
section 4.1 on page 12).
Join condition This optimizer attempts to use the same variable for join con-
ditions, e.g.
p(?X) : −q(?X), r(?Y ), ?X =?Y
would be changed to
8
Re-order literals Re-arrange the literals in a rule body so that the most re-
strictive literals appear first. The preferred order is: positive literals with
no variables, built-ins with no variables, positive literals, built-ins and
negated literals. However, negated literals and built-ins can be pushed
earlier in to the rule body as soon as all their variables are bound.
Remove duplicate literals Remove any literal that appears twice within the
rule with the same variables.
q(?X, ?Y ) is a simple view that selects all tuples from the relation for ‘q’.
r(?Y, ?Y ) is a view that selects only those tuples where both terms are equal.
This view appears as a unary relation.
s(1, ?X) is a view that selects values from the second term of the relation for ‘s’
where the first term is equal to 1. This view also appears as a unary relation.
t(g(?Y, ?Z)) is a view that selects the two term parameters of constructed terms
from the relation for ‘t’. This view converts a unary relation in to a binary view.
The next step is to assign join objects and indexes. Since all joins in Datalog
are natural joins, the compiling stage looks for all matching variables between
two adjacent views, calculates the join indices and creates indexes. The indexes
used in the default rule compiler are hash-based and therefore this approach
is equivalent to performing a hash join. The advantages of a hash join over
a sort-merge join are that the underlying views are not required to be sorted
in any way, rather simply grouped according to matching join indices. This
approach appears to scale much better than maintaining sorted relations (as in
previous versions of IRIS) and is much faster overall. When an evaluation is
highly iterative, the cost of maintaining a sorted relation as tuples are added on
each iteration becomes very expensive.
An important optimization that this approach allows is that of caching of
indexes, views and relations. Bottom-up evaluation can be expensive computa-
tionally when the rule set is highly recursive. However, when a rule is compiled
in to an object model just described, the fetching of matching tuples for joins
does not have to re-evaluate an entire view of a relation, because a view need
9
only process the extra tuples added since the last rule evaluation and the index
only need process those matching tuples from the view.
10
When an IRIS knowledge-base is initialized, the complete rule-set and set of
starting ground facts must be passed to the knowledge-base factory. However,
this is not always convenient, especially when the data set is large. It may
be that the data set does not fit in to memory or takes too long to parse
and format. In any case, not all the data may not be required for evaluation.
For these situations, IRIS allows the user to supply external data sources at
initialization time. An external data source is simply a user supplied Java object
that implements the external data source interface. The storage mechanism used
is left entirely to the class implementor. The external data source must simply
answer requests from the reasoner to provide facts for the given predicate and
selection criteria during program evaluation.
3.9.3 Configuration
IRIS can be configured at the point where a knowledge-base is created. All
configuration parameters are collected together in a single configuration class
that is passed to the knowledge-base factory, thus allowing a highly flexible
combination of standard and user-provided components. The configuration class
contains these categories of parameters:
11
• Program optimizers, rule optimizers and a rule re-ordering optimizer
• Rule set stratifiers
• Rule-safety processor for detecting unsafe rules or making unsafe rules safe
3.10 Errors
A number of problems can occur that halt the evaluation of a query. These
problems are indicated by throwing an exception of one of the following types:
4 Stratification
4.1 Stratified negation
The ‘meaning’ of negation in logic programs has been discussed at length in
literature. Here we adopt the relational model and describe the following con-
struct:
12
If the known facts are applied to rule (1) first, the following new facts are
generated:
p(a)
p(b)
Then applying the known facts to rule (2) produces the following:
r(a)
However, the existence of fact r(a) should have precluded the inference of
fact p(a) in rule (1).
In order to ensure that rule evaluation is monotone, rules must be evaluated
in a specific order.
For any general rule:
p : −L1 ...Lm , N1 ...Np
where L1 ...Lm are positive literals and N1 ...Np are negative literals, existing
stratification algorithms require that the rule ‘p’ be allocated to a strata that
is at least as high as any rule that has a head matching each of its positive
literals and at least one higher than any rule that has a head matching each of
its negative literals.
Such a scheme would therefore require that rule (2) above is evaluated before
rule(1).
However, this approach precludes the evaluation of any logic program con-
taining a rule that has a negative dependency upon itself.
In order for IRIS to evaluate a logic program when not using the well-founded
semantics, it must be stratified.
13
q(b,Y) :- p(b,Y)
q(X,Y) :- p(X,Y), X != b
Then we see that the complete set of rules is still stratified.
5 Safe Rules
An unsafe rule is one in which a variable is used, but has no binding. In essence,
the entire universe of possible values must be substituted for this variable, which
is clearly impractical.
When IRIS is configured not to allow unsafe rules, the standard rule-safety
processor (org.deri.iris.rules.RuleValidator) is used. This processor simply ex-
amines each rule and indicates if any rule is unsafe and exactly why it is unsafe.
Inputting a program containing an unsafe rule without unsafe rule support
turned on will result in a specific exception being thrown containing a message
explaining which rule is unsafe and which variables are problematic.
5.1 Algorithm
The algorithm for detecting unsafe rules is taken from [11] page 105. A rule is
considered safe if all variables are limited. A variable is limited if:
• It appears in a positive ordinary predicate
14
5.2 Unsafe Rules
In order to process unsafe rules IRIS can be configured to use a rule augmen-
tation processor. This processor uses a technique suggested by Gelder[13] that
adds a ‘universe’ predicate for each unbound variable. This universe predicate
automatically contains all term values that appear anywhere in the input pro-
gram or that are created during program evaluation.
p(1,2).
p(2,3).
p(4,3).
p(‘a’,4).
?-q(x,y).
3 http://www.wsmo.org/TR/d16/d16.1/v0.21/#sec:wsml-builtin-datatypes
15
This produces the result set:
p(4,3)
However this program:
p(1,2).
p(2,3).
p(4,3).
p(‘a’,4).
?-q(x,y).
Produces this result set:
p(4,3) p(‘a’,4)
As can be seen from this example, ‘not X < Y ’ is not the same as ‘X ≥ Y ’
16
6.6.2 Register the custom built-in for parsing
The Parser has a BuiltinHelper member object that is used to test if a predicate
symbol is a built-in in or not. If it is a built-in, the BuiltinRegister will create
a new object of the correct type. By default, the Parser will contain a Builtin-
Register with all the standard built-ins registered and this can be obtained from
the Parser and modified (either to add new built-ins or remove ones currently
registered). The javadoc for BuiltinRegister has more details. For an example,
see FahrenheitToCelsiusBuiltin.java in test/org.deri.iris.builtins.
7 API guide
7.1 Creating objects with the Java API
Rules, facts, queries and their components are created using factories. The most
important ones are described below:
17
7.3 Evaluating a program
After a the components of a logic program have been created, either step by step
using the API factories or using the parser, a knowledge base can be created
and queries evaluated by following these steps:
Choose a configuration A default configuration object can be obtained from
the KnowledgeBaseFactory class. Modify this object to change the Knowl-
edgeBase behaviour.
Execute queries After initialisation queries can be executed against the Knowl-
edgeBase by calling execute(). Two variations of this method are available.
The first one just accepts a query and the second accepts a query and an
array for variable bindings. This second method can be useful if the query
is complex and the order of variables is not obvious.
7.3.1 Configuration
IRIS can be configured at the point where a knowledge-base is created. All
configuration parameters are collected together in a single configuration class
that is passed to the knowledge-base factory, thus allowing a highly flexible
combination of standard and user-provided components.
The configuration class contains these categories of parameters:
factories for evaluation strategies, rule compilers, rule evaluators, relations and
indexes.
termination parameters for termination conditions (time out, maximum tu-
ples, maximum complexity)
numerical behaviour significant bits of floating point precision for compari-
son, divide by zero behaviour
rule-safety processor for detecting unsafe rules or making unsafe rules safe
18
8 External Data-sources
IRIS allows the user to reason with data stored outside of the reasoner, i.e. not
passed in at the point where the KnowledgeBase is instantiated. To use an exter-
nal data source, simply create a class that implements the org.deri.iris.api.storage.IDataSource
interface and add an instance of this class to the externalDataSources list in
the Configuration object prior to instantiating the KnowledgeBase.
This approach allows the data to be stored in any format. The user-defined
external data source is simply required to answer requests from the reasoner
to provide tuples for predicates as and when they are needed during query
evaluation.
19
A Datalog Grammar Support
Datalog is a database query language that is syntactically a subset of Prolog.
Its origins date back to around 1978 when Herve Gallaire and Jack Minker
organized a workshop on logic and databases. 5
A.1 Datalog
IRIS evaluates logic programs that contain rules and facts (the knowledge base)
and queries to be evaluated against this knowledge base.
All rules, facts and queries must be terminated by a ‘.’.
rules consist of a head and a body. Both, the head and the body, are lists of
literals where the literals are separated by ‘,’ and the head and the body
are separated by ‘:-’. The ‘,’ means ‘and’, e.g.
‘ancestor(?X, ?Y) :- ancestor(?X, ?Z), ancestor(?Z, ?Y).’
IRIS requires that the head contains exactly one literal, whereas the body
can contain zero or more literals.
facts are instances of predicates with constant terms, e.g. ‘ancestor(’john’, ’odin’).’
5 http://en.wikipedia.org/wiki/Datalog
20
datatype syntax
string ’<string>’
_string(’<string>’)
decimal ’-’?<integer>.<fraction>
_decimal(’-’?<integer>.<fraction>)
integer ’-’?<integer>
_integer(’-’?<integer>)
float _float(<integer>.<fraction>)
double _double(<integer>.<fraction>)
iri _’<iri>’
_iri(’<iri>’)
sqname <string>#<string>
_sqname(’<string>#<string>’)
boolean _boolean(<string>)
duration _duration(<year>, <month>, <day>, <hour>, <minute>,
<second>)
_duration(<year>, <month>, <day>, <hour>, <minute>,
<second>, <millisec>)
datetime _datetime(<year>, <month>, <day>, <hour>, <minute>,
<second>)
_datetime(<year>, <month>, <day>, <hour>, <minute>,
<second>, <tzHour>, <tzMinute>)
_datetime(<year>, <month>, <day>, <hour>, <minute>,
<second>, <millisec>, <tzHour>, <tzMinute>)
date _date(<year>, <month>, <day>)
_date(<year>, <month>, <day>, <tzHour>, <tzMinute>)
time _time(<hour>, <minute>, <second>)
_time(<hour>, <minute>, <second>,
<tzHour>, <tzMinute>)
_time(<hour>, <minute>, <second>, <millisec>,
<tzHour>, <tzMinute>)
gyear _gyear(<year>)
gyearmonth _gyearmonth(<year>, <month>)
gmonth _gmonth(<month>)
gmonthday _gmonthday(<month>, <day>)
gday _gday(<day>)
hexbinary _hexbinary(<hexbin>)
base64binary _base64binary(<base64binary>)
21
name syntax supported operations
add ?X + ?Y = ?Z numeric + numeric = numeric
ADD(?X, ?Y, ?Z) date + duration = date
duration + date = date
time + duration = time
duration + time = time
datetime + duration = datetime
duration + datetime = datetime
duration + duration = duration
subtract ?X - ?Y = ?Z numeric - numeric = numeric
SUBTRACT(?X, ?Y, ?Z) date - duration = date
date - date = duration
time - duration = time
time - time = duration
datetime - duration = datetime
datetime - datetime = duration
duration - duration = duration
multiply ?X * ?Y = ?Z numeric x numeric = numeric
MULTIPLY(?X, ?Y, ?Z)
divide ?X / ?Y = ?Z numeric / numeric = numeric
DIVIDE(?X, ?Y, ?Z)
equal ?X = ?Y any type = same type
EQUAL(?X, ?Y) numeric = numeric
any type = different type (always false)
not equal ?X != ?Y any type 6= same type
NOT_EQUAL(?X, ?Y) numeric 6= numeric
any type 6= different type (always true)
less ?X < ?Y any type < same type
LESS(?X, ?Y) numeric type < numeric type
less-equal ?X <= ?Y any type ≤ same type
LESS_EQUAL(?X, ?Y) numeric type ≤ numeric type
greater ?X > ?Y any type > same type
GREATER(?X, ?Y) numeric type > numeric type
greater-equal ?X >= ?Y any type ≥ same type
GREATER_EQUAL(?X, ?Y) numeric type ≥ numeric type
same type SAME_TYPE(?X, ?Y) any type same type as any type
regular expression match REGEX(?X, ?Y) string matches pattern (string term)
any other type is false
22
name syntax supported operations
is base64binary IS_BASE64BINARY(?X) true iff ?X is of type base64binary
is boolean IS_BOOLEAN(?X) true iff ?X is of type boolean
is date IS_DATE(?X) true iff ?X is of type date
is datetime IS_DATETIME(?X) true iff ?X is of type datetime
is decimal IS_DECIMAL(?X) true iff ?X is of type decimal
is double IS_DOUBLE(?X) true iff ?X is of type decimal
is duration IS_DURATION(?X) true iff ?X is of type duration
is float IS_FLOAT(?X) true iff ?X is of type float
is gday IS_GDAY(?X) true iff ?X is of type gday
is gmonth IS_GMONTH(?X) true iff ?X is of type gmonth
is gmonthday IS_GMONTHDAY(?X) true iff ?X is of type gmonthday
is gyear IS_GYEAR(?X) true iff ?X is of type gyear
is gyearmonth IS_GYEARMONTH(?X) true iff ?X is of type gyearmonth
is hexbinary IS_HEXBINARY(?X) true iff ?X is of type hexbinary
is integer IS_INTEGER(?X) true iff ?X is of type integer
is iri IS_IRI(?X) true iff ?X is of type iri
is numeric IS_NUMERIC(?X) true iff ?X is of any numeric type
(integer, float, double, decimal)
is sqname IS_SQNAME(?X) true iff ?X is of type sqname
is string IS_STRING(?X) true iff ?X is of type string
is time IS_TIME(?X) true iff ?X is of type time
23
References
[1] Ullman, J.: Principles of Database Systems. WH Freeman & Co. New
York, NY, USA (1983)
[2] Eiter, T., Gottlob, G., Mannila, H.: Disjunctive datalog. ACM Transac-
tions on Database Systems (TODS) 22(3) (1997) 364–418
[3] Grosof, B., Horrocks, I., Volz, R.: Description logic programs: combining
logic programs with description logic. Proceedings of the 12th international
conference on World Wide Web (2003) 48–57
[4] de Bruijn, J., Lausen, H., Krummenacher, R., Polleres, A., Predoiu, L.,
Kifer, M., Fensel, year=2005, D.: (D16. 1v0. 2 The Web Service Modeling
Language WSML)
[5] Lassila, O., Swick, R., et al.: Resource Description Framework (RDF)
Model and Syntax Specification. (1999)
[6] Beeri, C., Ramakrishnan, R.: On the power of magic. ACM Press New
York, NY, USA (1987)
[7] Bancilhon, F., Maier, D., Sagiv, Y., Ullman, J.: Magic sets and other
strange ways to implement logic programs. Proceedings of the fifth ACM
SIGACT-SIGMOD symposium on Principles of database systems (1986)
1–15
[8] Chen, Y.: Magic Sets and Stratified Databases. J. Logic Programming
295 (1991) 344
[9] Gelder, A.V.: The alternating fixpoint of logic programs with negation. In:
PODS ’89: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART
symposium on Principles of database systems, New York, NY, USA, ACM
(1989) 1–10
[10] Ullman, J.D.: Bottom-up beats top-down for datalog. In: PODS ’89:
Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium
on Principles of database systems, New York, NY, USA, ACM (1989) 140–
149
[11] Ullman, J.: Principles of Database and Knowledgebase Systems Vol.1. WH
Freeman & Co. New York, NY, USA (1989)
[12] Przymusinski, T.: On the declarative semantics of stratified deductive
databases and logic programs. Foundations of Deductive Databases and
Logic Programming (1987) 193–216
[13] VAN GELDER, A., ROSS, K., SCHLIPF, J.: The Well-Founded Semantics
for General Logic Programs. Journal oi the AssocM1on for Computmg
Machinery 38(3) (1991) 620–650
24