A Completion Algorithm for
Lattice Tree Automata
Thomas Genet¹, Tristan Le Gall², Axel Legay¹, and Valérie Murat¹
¹ INRIA/IRISA, Rennes, France
² CEA, LIST, Centre de recherche de Saclay, France
Abstract. When dealing with infinite-state systems, Regular Tree Model
Checking approaches may have difficulty representing infinite sets
of data. We propose Lattice Tree Automata, an extended version of tree
automata to represent complex data domains and their related operations in an efficient manner. Moreover, we introduce a new completion-based algorithm for computing the possibly infinite set of reachable states
in a finite amount of time. This algorithm is independent of the lattice,
making it possible to seamlessly plug abstract domains into a Regular
Tree Model Checking algorithm. As a first instance, we implemented a
completion with an interval abstract domain. We provide experiments showing that this implementation makes it possible to scale up regular tree
model checking of Java programs dealing with integer arithmetic.
1 Introduction
In verification, infinite-state models are often used to avoid assumptions on data
structures and architectures, e.g. an artificial bound on the size of a stack or
on the value of a variable. At the heart of most of the techniques that have
been proposed for exploring infinite state spaces is a symbolic representation
that can finitely represent infinite sets of states. In Regular Tree Model Checking
(RTMC), states are represented by trees, sets of states by tree automata, and
behavior of the system by tree transducers [1, 8] or rewriting rules [11, 16]. Any
RTMC approach is equipped with an acceleration algorithm to compute possibly
infinite sets of states in a finite amount of time. Among such algorithms, completion by equational abstraction [16] computes successive automata obtained by
application of the rewriting rules, and merges intermediary states according to
an equivalence relation to enforce the termination of the process.
In [6], the authors proposed an exact translation of the semantics of the Java
Virtual Machine to tree automata and rewriting rules. This translation makes it possible
to analyze Java programs with Regular Tree Model checkers. One of the major
difficulties of this encoding is to capture and handle the two sources of infinite behavior that can arise in Java programs. Indeed, in such models, infinite behaviors
may be due to an unbounded number of method calls and object creations, or
simply because the program is manipulating unbounded data such as integer
variables. While multiple infinite behaviors can be over-approximated with completion and equational abstraction [16], their combinations may require the use
of artificially large structures.
We address this issue by defining Lattice Tree Automata (LTA). LTA have
special transitions to abstract possibly infinite sets of values by a single element of
a lattice. For example, we may abstract a set of integer values by a single interval
instead of using a unary or binary encoding of those integers and recognizing the
corresponding terms [6]. LTA recognize terms built over such intervals, and the
completion algorithm built on LTA will perform each basic arithmetic operation
in a single completion step, thanks to abstract interpretation techniques [9].
In this paper, we first define the LTA structure, then we propose a completion
algorithm (by equational abstraction) that returns an approximation of the set
of reachable states of an infinite-state system whose behavior is modeled by
rewriting rules. Finally, we provide some experimental results on the verification
of Java programs using an RTMC environment. More details can be found in [15].
Related Work. [20] defined lattice automata to represent sets of words over an infinite alphabet. LTA are an extension of lattice automata to trees. Other models
like modal automata [4] or data trees [12, 13] consider tree structure with infinite
alphabets but do not exploit the lattice structure as we do. Lattice(-valued) automata [19] map words over a finite alphabet to a lattice value, while LTA map
trees over an infinite alphabet to {0, 1}. Similar automata may define fuzzy tree
languages [10]. Verification of particular classes of properties of Java programs
with interpreted terms can be found in [23].
Many techniques aim at the verification of programs with integer arithmetic.
Among them, abstract interpretation [9] computes over-approximations of reachability sets, but requires a complete evaluation of arithmetic expressions. LTA
can handle expressions that are only partially evaluated, thus may be useful in
interprocedural analysis. There are other ways to deal with arithmetic efficiently
in a regular model-checking framework such as [21]. However, we think that LTA
provide a way to abstract many different types of data (integers, strings, etc.)
by simply plugging the adapted abstract domain (and using its best available
implementation) in an RTMC framework. In particular, LTA could be used by
other RTMC techniques like [1, 8] where such an ability does not exist.
2 Background
Rewriting Systems and Tree Automata. Let F be a finite set of functional symbols, where each symbol is associated with an arity, and let X be a countable
set of variables. T (F, X ) denotes the set of terms and T (F) denotes the set of
ground terms (terms without variables). Var(t) denotes the set of variables of a
term t, and F^n the set of functional symbols of arity n. We denote by Pos(t)
the set of positions of a term t, i.e. the set of positions of all its subterms, where
a position is a word over N and ε denotes the top-most position. If p ∈ Pos(t),
then t|p denotes the subterm of t at position p and t[s]p denotes the term obtained by replacement of the subterm t|p at position p by the term s. A Term
Rewriting System (TRS ) R is a set of rewrite rules l → r, where l, r ∈ T (F, X ),
and Var(l) ⊇ Var(r). A rewrite rule l → r is left-linear if each variable of l occurs
only once in l. A TRS R is left-linear if every rewrite rule of R is left-linear.
We now define Tree Automata that are used to recognize possibly infinite
sets of terms. Let Q be a finite set of symbols of arity 0, called states, such that
Q ∩ F = ∅. The set of configurations is T (F ∪ Q). A transition is a rewrite rule
c → q, where c is a configuration and q is a state. A transition is normalized
when c = f(q1, . . . , qn), f ∈ F is of arity n, and q1, . . . , qn ∈ Q. A bottom-up nondeterministic finite tree automaton (tree automaton for short) over the
alphabet F is a tuple A = ⟨F, Q, Q_F, ∆⟩, where Q_F ⊆ Q is the set of final states and
∆ is a set of normalized transitions. The transitive and reflexive rewriting relation
on T(F ∪ Q) induced by ∆ is denoted by →*_A. The tree language recognized
by A in a state q is L(A, q) = {t ∈ T(F) | t →*_A q}. We define L(A) = ⋃_{q∈Q_F} L(A, q).
Lattices, Atomic Lattices, Galois Connections. A partially ordered set (Λ, ⊑) is a
lattice if it admits a smallest element ⊥ and a greatest element ⊤, and if any finite
set of elements X ⊆ Λ admits a greatest lower bound (glb) ⊓X and a least upper
bound (lub) ⊔X. A lattice is complete if the glb and lub operators are defined
for all possibly infinite subsets of Λ. An element x of a lattice (Λ, ⊑) is an atom
if it is minimal, i.e. ⊥ ⊏ x ∧ ∀y ∈ Λ : ⊥ ⊏ y ⊑ x ⇒ y = x. The set of atoms of Λ
is denoted by Atoms(Λ). A lattice (Λ, ⊑) is atomic if any element x ∈ Λ where
x ≠ ⊥ is the least upper bound of atoms, i.e. x = ⊔{a | a ∈ Atoms(Λ) ∧ a ⊑ x}.
Given two lattices (C, ⊑_C) (the concrete domain) and (A, ⊑_A) (the
abstract domain), there is a Galois connection between the two if there are
two monotonic functions α : C → A and γ : A → C such that ∀x ∈ C, y ∈ A,
α(x) ⊑_A y if and only if x ⊑_C γ(y). As an example, sets of integers (2^Z, ⊆) can
be abstracted by the atomic lattice (I, ⊑) of intervals, whose bounds belong to
Z ∪ {−∞, +∞} and whose atoms are of the form [x, x], for each x ∈ Z. Any
operation op defined on a concrete domain C can be lifted to an operation op#
on the corresponding abstract domain A, thanks to the Galois connection.
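The interval lattice described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Interval`, `alpha` and `add_sharp` are hypothetical and not taken from the paper's implementation, ⊥ is represented by `None`, and only addition is lifted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

INF = float("inf")  # stands for +infinity; -INF for -infinity


@dataclass(frozen=True)
class Interval:
    """A non-empty interval [lo, hi]; the bottom element is modelled as None."""
    lo: float
    hi: float

    def leq(self, other: "Interval") -> bool:
        # Lattice order: self is included in other.
        return other.lo <= self.lo and self.hi <= other.hi

    def join(self, other: "Interval") -> "Interval":
        # Least upper bound: smallest interval containing both.
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def meet(self, other: "Interval") -> Optional["Interval"]:
        # Greatest lower bound: intersection, or None (bottom) if empty.
        lo, hi = max(self.lo, other.lo), min(self.hi, other.hi)
        return Interval(lo, hi) if lo <= hi else None


def alpha(values: Iterable[int]) -> Optional[Interval]:
    """Abstraction of a finite set of integers by its enclosing interval."""
    vs = list(values)
    return Interval(min(vs), max(vs)) if vs else None


def add_sharp(a: Interval, b: Interval) -> Interval:
    """Addition lifted to intervals."""
    return Interval(a.lo + b.lo, a.hi + b.hi)
```

For instance, `alpha({1, 3, 7})` yields `Interval(1, 7)`, which also contains integers that were not in the original set: the abstraction is an over-approximation.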
3 Lattice Tree Automata
In this section, we first explain how to add elements of a concrete domain into
terms, which has been defined in [18]. Then we propose a new type of tree
automata recognizing terms with elements of an abstract lattice.
3.1 Interpreted Symbols and Evaluation
In what follows, elements of a possibly infinite concrete domain D will be represented by a set of interpreted symbols F•. The set of symbols is now F = F◦ ∪ F•,
where F◦ is the set of passive (uninterpreted) symbols. The set of interpreted
symbols F• is composed of elements of D (notice that D ⊆ F•^0), and is also composed of some predefined operations op : D^n → D, where op ∈ F•^n and n > 0.
We denote by OP the set of predefined operations, thus we have F• = D ∪ OP.
For example, if D = Z, then F• can be Z ∪ {+, −, ∗}. Passive symbols can
be seen as usual non-interpreted functional operators, and interpreted symbols stand for built-in operations on the domain D. The set T(F•) of terms
built on F• (called interpreted terms) can be evaluated by using an eval function eval : T(F•) → D. The purpose of eval is to simplify a term using
the built-in operations of the domain D. eval naturally extends to T(F): (1)
eval(f(t1, . . . , tn)) = f(eval(t1), . . . , eval(tn)) if f ∈ F◦ or ∃i = 1 . . . n : ti ∉
T(F•), or (2) the evaluation returns an element of D if f(t1, . . . , tn) ∈ T(F•).
We want to define tree automata to recognize sets of interpreted terms. To
recognize {f (1), f (2), f (3), f (4)}, we would like to have tree automata with special transitions to handle sets of integers for instance: {1, . . . , 4} → q, f (q) → qf .
We propose to generalize this encoding and to define tree automata with some
transitions to recognize elements of a lattice (sets of integers are elements of
the lattice (2^Z, ⊆)). By considering generic lattices, we can also improve the efficiency of the approach. Since RTMC only requires an over-approximation of
the set of reachable states, we have special transitions to recognize elements of
a simple, abstract lattice (Λ, ⊑) such as the lattice of intervals. Moreover, we
assume that this abstract lattice is atomic (cf. Section 2).
Each built-in operation op ∈ OP defined on D is also abstracted by op# ∈
OP#. Since we have that F• = D ∪ OP, the set of abstract symbols is F•# = Λ ∪
OP#. The arity of op# is the same as the one of op. Assuming there is a Galois
connection between the concrete domain and the abstract one (cf. Section 2),
then op# = α ◦ op ◦ γ and eval# : T(F•#) ↦ Λ is the best approximation of eval.
Example 1. There is a Galois connection between (2^Z, ⊆) and the lattice of intervals (I, ⊑). For instance, eval#([2, 3] +# [−1, 2]) = [1, 5].
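As a rough illustration of eval# extended to terms that mix passive and interpreted symbols, the following Python sketch evaluates nested tuples. The encoding is an assumption made for this example only: an interval is a pair of numbers, any other term is a tuple `(symbol, arg1, ..., argn)`, and the helper names are hypothetical. Bounds are assumed finite so that multiplication stays well defined.

```python
def is_interval(t):
    """An interval is encoded as a pair of numbers (lo, hi)."""
    return (isinstance(t, tuple) and len(t) == 2
            and all(isinstance(x, (int, float)) for x in t))


def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[1], a[1] - b[0])


def mul(a, b):
    products = [x * y for x in a for y in b]  # finite bounds assumed
    return (min(products), max(products))


OPS = {"+": add, "-": sub, "*": mul}  # abstract counterparts of +, -, *


def eval_sharp(t):
    """Evaluate interpreted subterms; leave passive symbols in place."""
    if is_interval(t):
        return t
    head, *args = t
    args = [eval_sharp(a) for a in args]
    if head in OPS and all(is_interval(a) for a in args):
        return OPS[head](*args)          # case (2): fully interpreted term
    return (head, *args)                 # case (1): rebuild around results
```

With this encoding, `eval_sharp(("+", (2, 3), (-1, 2)))` reproduces Example 1, and `eval_sharp(("f", ("+", (1, 2), (1, 1))))` simplifies only the argument of the passive symbol `f`.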
3.2 Definition and Semantics
Definition 1 (Lattice tree automaton). A bottom-up non-deterministic finite tree automaton with lattice (lattice tree automaton for short, LTA) for a
given lattice Λ is a tuple A = ⟨F = F◦ ∪ F•#, Q, Q_F, ∆⟩, where F◦ is a set
of passive symbols and F•# = Λ ∪ OP# a set of interpreted symbols, Q a set of
states, Q_F ⊆ Q are the final states, and ∆ is a set of normalized transitions.
The set of lambda transitions, which recognize elements of the lattice, is defined
by ∆_Λ = {λ → q | λ → q ∈ ∆ ∧ λ ≠ ⊥ ∧ λ ∈ Λ}. The set of ground transitions
is formally defined by ∆_G = {f(q1, . . . , qn) → q | f ∈ F ∧ f(q1, . . . , qn) → q ∈
∆ ∧ q, q1, . . . , qn ∈ Q}. Epsilon transitions are transitions of the form q → q′
where q, q′ ∈ Q. We extend the partial ordering ⊑ (on Λ) to T(F):
Definition 2. Given s, t ∈ T(F), s ⊑ t iff:
(1) eval(s) ⊑ eval(t) (if both s and t belong to T(F•#)), or (2) s = f(s1, . . . , sn),
t = f(t1, . . . , tn), f ∈ F◦^n and s1 ⊑ t1 ∧ . . . ∧ sn ⊑ tn.
Example 2. f(g(a, [1, 5])) ⊑ f(g(a, [0, 8])), and h([0, 4] + [2, 6]) ⊑ h([1, 3] + [1, 9]).
In what follows we may omit # on abstract operations when it is clear from
the context. We now define the transition relation and recognized language of
an LTA. A term t is recognized by an LTA A if eval(t) can be reduced in A.
Definition 3 (t1 →_A t2 for LTA). Let t1, t2 ∈ T(F ∪ Q). t1 →_A t2 iff, for all
positions p ∈ Pos(t1):
– if t1|p ∈ T(F•#), there is a transition λ → q ∈ ∆ such that eval(t1|p) ⊑ λ
and t2 = t1[q]p
– if t1|p = q where q ∈ Q, there is an epsilon-transition q → q′ ∈ ∆, where
q′ ∈ Q such that t2 = t1[q′]p
– if t1|p = f(s1, . . . , sn) where f ∈ F^n and s1, . . . , sn ∈ T(F ∪ Q), ∃s′_i ∈
T(F ∪ Q) such that s_i →_A s′_i and t2 = t1[f(s1, . . . , s_{i−1}, s′_i, s_{i+1}, . . . , sn)]p
– if t1|p = f(q1, . . . , qn) where f ∈ F^n and q1, . . . , qn ∈ Q, there is a transition
f(q1, . . . , qn) → q ∈ ∆ such that t2 = t1[q]p.
→*_A is the reflexive transitive closure of →_A. There is a run from t1 to t2 if
t1 →*_A t2. If an LTA has a transition [0, 2] → q then [0, 0] →*_A q, [1, 2] →*_A q, . . . ,
i.e. all possible unions of atoms [0, 0], [1, 1], [2, 2]. The language recognized by an
LTA is thus defined over T(F, Atoms(Λ)), where T(F, Atoms(Λ)) is the set of
ground terms built over (F \ Λ) ∪ Atoms(Λ).
Definition 4 (Recognized language). The tree language recognized by A in
a state q is L(A, q) = {t ∈ T(F, Atoms(Λ)) | ∃t′ such that t ⊑ t′ and t′ →*_A q}.
The language recognized by A is L(A) = ⋃_{q∈Q_f} L(A, q).
Example 3 (Run, recognized language). Let A = ⟨F = F◦ ∪ F•#, Q, Q_f, ∆⟩
be an LTA where ∆ = {[0, 4] → q1, f(q1) → q2} and Q_f = {q2}. We have:
f([1, 4]) →*_A q2 and f([0, 1] + [0, 1]) →*_A q2, and the recognized language of A is
given by L(A, q2) = {f([0, 0]), f([1, 1]), . . . , f([4, 4])}.
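The run of Example 3 can be replayed with a small bottom-up recognizer. The sketch below is a simplification written for this example, not the paper's algorithm: terms are nested tuples, intervals are pairs of integers, only `+` is interpreted, epsilon transitions are left out, and the names `states_of`, `LAM` and `GROUND` are hypothetical.

```python
from itertools import product


def is_interval(t):
    return (isinstance(t, tuple) and len(t) == 2
            and all(isinstance(x, int) for x in t))


def evaluate(t):
    """Abstract evaluation restricted to binary interval addition."""
    if is_interval(t):
        return t
    head, *args = t
    args = [evaluate(a) for a in args]
    if head == "+" and all(is_interval(a) for a in args):
        (a, b), (c, d) = args
        return (a + c, b + d)
    return (head, *args)


def states_of(term, lam, ground):
    """Set of states q such that term ->* q (epsilon transitions omitted).

    lam:    list of (interval, state) lambda transitions
    ground: dict mapping (symbol, tuple_of_states) to a set of states
    """
    t = evaluate(term)
    if is_interval(t):
        # A lambda transition applies when the evaluated interval is below it.
        return {q for (lo, hi), q in lam if lo <= t[0] and t[1] <= hi}
    head, *args = t
    reachable = [states_of(a, lam, ground) for a in args]
    found = set()
    for qs in product(*reachable):
        found |= ground.get((head, qs), set())
    return found


# Automaton of Example 3: [0, 4] -> q1 and f(q1) -> q2.
LAM = [((0, 4), "q1")]
GROUND = {("f", ("q1",)): {"q2"}}
```

Here `states_of(("f", (1, 4)), LAM, GROUND)` and `states_of(("f", ("+", (0, 1), (0, 1))), LAM, GROUND)` both return `{"q2"}`, while `f([3, 5])` reaches no state because [3, 5] is not below [0, 4].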
4 Completion Algorithm
We only present here the completion algorithm on LTA; other operations are
detailed in [15]. We are interested in computing the set of reachable states of
an infinite-state system. We propose to represent states by (built-in) terms and
possibly infinite sets of states by an LTA. In this section, we assume that the behavior of the system can be represented by conditional term rewriting systems,
i.e. TRS equipped with conjunctions of conditions used to restrict the applicability of the rules. Our conditional TRS, which extends the classical definition of
[2], rewrites terms defined on the concrete domain. This makes them independent of the abstract lattice. We first start with the definition of predicates
that allow us to express conditions in TRS.
Definition 5 (Predicates). Let P be the set of predicates over D. Let ρ be an
n-ary predicate of P such that ρ : D^n ↦ {true, false}. We extend the domain
of ρ to T(F)^n in the following way:
ρ(t1, . . . , tn) = ρ(u1, . . . , un) if ∀i = 1 . . . n : ti ∈ T(F•) and ui = eval(ti),
ρ(t1, . . . , tn) = false if ∃j = 1 . . . n : tj ∉ T(F•).
Observe that if one of the predicate parameters cannot be evaluated into a built-in term, then the predicate returns false and the rule is not applied.
Definition 6 (Conditional Term Rewriting System (CTRS) on T(F◦ ∪
F•, X)). In our setting, a Conditional Term Rewriting System R is a set of
rewrite rules l → r ⇐ c1 ∧ . . . ∧ cn, where l ∈ T(F◦, X), r ∈ T(F◦ ∪ F•, X),
l ∉ X, Var(l) ⊇ Var(r) and ∀i = 1 . . . n : ci = ρi(t1, . . . , tm) where ρi is an m-ary
predicate of P and ∀j = 1 . . . m : tj ∈ T(F•, X) ∧ Var(tj) ⊆ Var(l).
Example 4. Using conditional rewriting rules, the factorial can be encoded by
the CTRS: fact(x) → 1 ⇐ x ≥ 0 ∧ x ≤ 1, fact(x) → x ∗ fact(x − 1) ⇐ x ≥ 2.
Let X be a set of variables, Q a set of states, and F a set of symbols. A substitution σ is a function σ : X ↦ Q ∪ T(F) that can be extended to T(F, X)
in this way: for all t ∈ T(F, X), we define tσ as: (1) if t = f(t1, . . . , tn) then
tσ = f(t1σ, . . . , tnσ), where t, t1, . . . , tn ∈ T(F, X), f ∈ F^n, (2) if t = x ∈ X
then tσ = σ(x). Recall that F = F◦ ∪ F•. The CTRS R and the eval function induce a rewriting relation →_R on T(F) in the following way: for all s, t ∈ T(F),
we have s →_R t if there exist: (1) a rewrite rule l → r ⇐ c1 ∧ . . . ∧ cn ∈ R,
(2) a position p ∈ Pos(s), and (3) a substitution σ : X ↦ T(F) s.t. s|p = lσ,
t = eval(s[rσ]p) and ∀i = 1 . . . n : ciσ = true. The reflexive transitive closure of
→_R is denoted by →*_R.
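To make the rewriting relation →_R concrete, here is a minimal Python sketch that hard-codes the factorial CTRS of Example 4 on the concrete domain Z. It is an illustration under assumptions: terms are integers or tuples `(symbol, args...)`, the two rules are written directly in `rewrite_once` rather than matched generically, and the function names are hypothetical. As in Definition 5, a condition on an argument that is not an evaluated integer is treated as false.

```python
def eval_term(t):
    """Simplify built-in arithmetic (* and -) wherever both operands are integers."""
    if isinstance(t, int):
        return t
    head, *args = t
    args = [eval_term(a) for a in args]
    if head in ("*", "-") and all(isinstance(a, int) for a in args):
        return args[0] * args[1] if head == "*" else args[0] - args[1]
    return (head, *args)


def rewrite_once(t):
    """Apply one rule of the factorial CTRS at some position, or return None."""
    if isinstance(t, int):
        return None
    head, *args = t
    if head == "fact" and isinstance(args[0], int):
        x = args[0]
        if 0 <= x <= 1:                       # fact(x) -> 1 <= x >= 0 and x <= 1
            return 1
        if x >= 2:                            # fact(x) -> x * fact(x - 1) <= x >= 2
            return ("*", x, ("fact", ("-", x, 1)))
    for i, a in enumerate(args):              # otherwise try a subterm position
        r = rewrite_once(a)
        if r is not None:
            return (head, *args[:i], r, *args[i + 1:])
    return None


def normal_form(t, max_steps=1000):
    """Iterate s ->R eval(s[r sigma]p) until no rule applies."""
    t = eval_term(t)
    for _ in range(max_steps):
        nxt = rewrite_once(t)
        if nxt is None:
            return t
        t = eval_term(nxt)
    raise RuntimeError("no normal form reached within max_steps")
```

Calling `normal_form(("fact", 4))` goes through `4 * fact(3)`, `4 * (3 * fact(2))` and so on, evaluating the arithmetic after each step. The term `fact(-1)` is left unchanged because neither condition holds.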
Let A be an LTA representing the set of initial states, and R be a CTRS . Our
objective is to compute another LTA representing (an over-approximation of) the
set R*(L(A)) = {t | ∃t0 ∈ L(A), t0 →*_R t}. We adopt the completion approach
of [16, 11], which intends to compute a tree automaton A^k_R such that L(A^k_R) ⊇
R*(L(A)) for a left-linear CTRS R. The algorithm proceeds by computing the
sequence of automata A^0_R, A^1_R, A^2_R, . . . that represents successive applications
of R. Computing A^{i+1}_R from A^i_R is called a one-step completion. In general the
sequence of automata may not converge in a finite amount of time. To accelerate
the convergence, we perform an abstraction operation that will be described
in section 4.3. We now give details on the above constructions, which will be
illustrated step by step by a running example.
4.1 Computation of A^{i+1}_R
In our setting, A^{i+1}_R is built from A^i_R by using a completion step that relies
on finding critical pairs. Given a substitution σ : X ↦ Q and a rule l → r ⇐
c1 ∧ . . . ∧ cn ∈ R, a critical pair is a pair (rσ′, q) where q ∈ Q and σ′ is the greatest
substitution w.r.t. ⊑ such that lσ →*_{A^i_R} q, σ ⊒ σ′ and c1σ′ ∧ . . . ∧ cnσ′. Since R,
A^i_R and Q are finite, there is only a finite number of such critical pairs. For each
critical pair such that rσ′ ↛*_{A^i_R} q, the algorithm adds two new transitions rσ′ →
q′ and q′ → q to A^i_R, in order to enrich the language of the previous automaton.
To find all critical pairs, in what follows, we use the standard matching algorithm
introduced and described in [11]. This algorithm Matching(l, A, q) matches a
linear term l with a state q in the automaton A. The solution returned by
Matching is a set of substitutions {σ1, . . . , σn} such that lσi →*_A q. However, as
our TRS relies on conditions, we have to extend this matching algorithm in order
to guarantee that each substitution σi that is a solution of l → r ⇐ c1 ∧ . . . ∧ cn
satisfies c1 ∧ . . . ∧ cn .
Example 5. Let Z be the concrete domain, and intervals on Z be the lattice,
R = {f(x) → cons(x, f(x + 1)) ⇐ x ≥ 1} be the CTRS, A^0 the LTA representing
the set of initial configurations, with transitions: ∆0 = {[0, 2] → q1, f(q1) → q2}.
To build A^1_R from A^0, we have to find all possible substitutions. The matching
algorithm tells that the rewrite rule applies with the substitution {x ↦ q1}. To
satisfy the constraint x ≥ 1, the substitution {x ↦ q1} with [0, 2] → q1 will be
restricted to {x ↦ [1, 2]}.
Restricting substitutions is done by a solver Solve on abstract domains. The
output of Solve(σ, A, c1 ∧ . . . ∧ cn) is either a set of substitutions σ′, each of which is a
restriction of σ satisfying c1 ∧ . . . ∧ cn, or ∅ if such a restriction does not exist.
On the previous example, Solve({x ↦ q1}, A, x ≥ 1) = {{x ↦ [1, 2]}}. See [15]
for details about the properties the solver needs to have. Such properties are
generally fulfilled by usual abstract domains implementations.
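A possible shape for such a solver on integer intervals is sketched below. This is only an assumption-laden illustration, not the solver of [15]: conditions are limited to comparisons of one variable with an integer constant, encoded as triples `(var, op, const)`, the automaton is reduced to its list of lambda transitions, and the names `restrict` and `solve` are hypothetical.

```python
def restrict(iv, op, c):
    """Restrict the integer interval iv = (lo, hi) by 'x op c'; None if empty."""
    lo, hi = iv
    if op == ">=":
        lo = max(lo, c)
    elif op == ">":
        lo = max(lo, c + 1)
    elif op == "<=":
        hi = min(hi, c)
    elif op == "<":
        hi = min(hi, c - 1)
    else:
        raise ValueError(f"unsupported operator: {op}")
    return (lo, hi) if lo <= hi else None


def solve(sigma, lam, conds):
    """Restrict sigma (variable -> state) so that all conditions hold.

    lam:   list of (interval, state) lambda transitions of the automaton
    conds: list of (variable, operator, constant) triples
    Returns a list of substitutions (variable -> interval, or the original
    state when no condition mentions the variable); [] if unsatisfiable.
    """
    results = [{}]
    for var, state in sigma.items():
        if not any(v == var for v, _, _ in conds):
            results = [{**partial, var: state} for partial in results]
            continue
        candidates = []
        for iv, q in lam:
            if q != state:
                continue
            for v, op, c in conds:
                if v == var and iv is not None:
                    iv = restrict(iv, op, c)
            if iv is not None:
                candidates.append(iv)
        results = [{**partial, var: iv} for partial in results for iv in candidates]
    return results
```

On Example 5, `solve({"x": "q1"}, [((0, 2), "q1")], [("x", ">=", 1)])` returns `[{"x": (1, 2)}]`, matching the restriction of {x ↦ q1} to {x ↦ [1, 2]}.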
Definition 7 (Matching solutions of conditional rewrite rules). Let A be
a tree automaton, rl = l → r ⇐ c1 ∧ . . . ∧ cn a rewrite rule and q a state of A. The
set of all possible substitutions for the rewrite rule rl is Ω(A, rl, q) = {σ′ | σ ∈
Matching(l, A, q) ∧ σ′ ∈ Solve(σ, A, c1 ∧ . . . ∧ cn) ∧ ∄σ″ : rσ′ ⊑ rσ″ →*_A q}.
Once the set of all possible restricted substitutions σi has been obtained, we
must add the rules rσi → q′ and q′ → q in the automaton, where q′ is a new
state. However, rσi → q′ is not necessarily a normalized ground transition of the
form f(q1, . . . , qn) → q or a lambda transition of the form λ → q, which means
it must be normalized first in order to be added to the LTA.
Definition 8 (Normalization). Let s ∈ T(F ∪ Q), q ∈ Q, A = ⟨F, Q, Q_f, ∆⟩
an LTA, where F• is the set of concrete interpretable symbols used in the CTRS,
F•# the set of abstract symbols used in A, F = F•# ∪ F◦, and α : F•^0 → F•#^0 the
abstraction function, mapping concrete constants to elements of Λ. A new state
is a state of Q not occurring in ∆. Norm(s → q) returns the set of normalized
transitions deduced from s. Norm(s → q) is defined by:
1. if s ∈ F•^0 then Norm(s → q) = {α(s) → q},
2. if s ∈ F◦^0 ∪ F•#^0 then Norm(s → q) = {s → q},
3. if s = f(t1, . . . , tn) where f ∈ F◦^n ∪ F•^n, then Norm(s → q) = {f(q′1, . . . , q′n) →
q} ∪ Norm(t1 → q′1) ∪ . . . ∪ Norm(tn → q′n) where for i = 1 . . . n, q′i is either:
– the right-hand side of a transition of ∆ such that ti →*_∆ q′i,
– or a new state, otherwise.
Example 6. From Ex. 5, we have to add the normalized form of cons([1, 2], f([1, 2] +
1)) → q′2 and q′2 → q2 (where q′2 is a new state) to the set of transitions: 1 has
to be abstracted by [1, 1] and f([1, 2]) has to be replaced by a state recognizing
this term. So ∆1 = ∆0 ∪ Norm(cons([1, 2], f([1, 2] + 1)) → q′2) ∪ {q′2 → q2} =
∆0 ∪ {[1, 2] → q3, [1, 1] → q[1,1], q3 + q[1,1] → q4, f(q4) → q5, cons(q3, q5) →
q′2, q′2 → q2}, where q[1,1], q3, q4, q5 are new states induced by normalization.
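The normalization of Example 6 can be approximated by the following Python sketch. Several choices here are assumptions made for illustration: transitions are pairs `(lhs, state)`, states are strings, integer constants are abstracted by α(n) = [n, n], and an existing state is reused only when a transition with a syntactically identical left-hand side already exists (a simplification of the reuse condition of Definition 8). Fresh state names come from a generator, so they differ from the names q3, q[1,1], q4, q5 used in the example.

```python
import itertools


def is_interval(t):
    return (isinstance(t, tuple) and len(t) == 2
            and all(isinstance(x, int) for x in t))


def norm(term, target, delta, fresh):
    """Return the normalized transitions needed to recognize term in target.

    delta: set of existing transitions (lhs, state)
    fresh: iterator producing unused state names
    """
    new = set()

    def known(lhs):
        # Reuse a state only for an identical left-hand side (assumption).
        for left, q in delta | new:
            if left == lhs:
                return q
        return None

    def lhs_of(t):
        if isinstance(t, int):
            return (t, t)                      # alpha(n) = [n, n]
        if is_interval(t):
            return t
        head, *args = t
        return (head, *[state_for(a) for a in args])

    def state_for(t):
        if isinstance(t, str):                 # already a state
            return t
        lhs = lhs_of(t)
        q = known(lhs)
        if q is None:
            q = next(fresh)
            new.add((lhs, q))
        return q

    new.add((lhs_of(term), target))
    return new


DELTA0 = {((0, 2), "q1"), (("f", "q1"), "q2")}
TERM = ("cons", (1, 2), ("f", ("+", (1, 2), 1)))
```

With `fresh = (f"n{i}" for i in itertools.count(3))`, the call `norm(TERM, "q2'", DELTA0, fresh)` produces five transitions with the same shape as those of Example 6, up to renaming of the new states.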
Observe that the normalization algorithm always terminates. We conclude with
the formal characterization of the one step completion.
Definition 9 (One step completed automaton C_R(A)). Let A = ⟨F, Q, Q_f, ∆⟩
be a tree automaton, R be a left-linear CTRS. We denote by C_R(A) the one step
completed automaton C_R(A) = ⟨F, Q′, Q_f, ∆′⟩ where:
∆′ = ∆ ∪ ⋃_{l→r∈R, q∈Q, σ∈Ω(A, l→r, q)} (Norm(rσ → q′) ∪ {q′ → q})
where Ω(A, l → r, q) is the set of all possible substitutions defined in Def. 7,
q′ ∉ Q a new state and Q′ contains all the states of ∆′.
4.2 Evaluation of a Lattice Tree Automaton
Any set of concrete terms that contains the term 1 + 2 should also contain the
term 3. While this property can be true on the initial automaton, it may be
broken when performing a completion step.
Example 7. The first completion step described in Ex.6 adds the transition q3 +
q[1,1] → q4 . Since we have that [1, 2] → q3 and [1, 1] → q[1,1] , the language
recognized by q4 should also contain the term [2, 3].
The objective of the propag function is to evaluate the LTA and to add the
transition [2, 3] → q4 in the above example.
Definition 10 (propag). Let ∆ be the set of transitions of an LTA. Let f(q1, . . . ,
qn) → q ∈ ∆, where f ∈ F•^n is an interpreted symbol and q, q1, . . . , qn ∈ Q. If
there exist λ1, . . . , λn ∈ Λ such that λ1 →*_∆ q1, . . . , λn →*_∆ qn, then one step of
evaluation of f(q1, . . . , qn) → q is defined by:
propag(∆, f(q1, . . . , qn) → q) = ∆ if ∃λ → q ∈ ∆ ∧ eval(f(λ1, . . . , λn)) ⊑ λ,
propag(∆, f(q1, . . . , qn) → q) = ∆ ∪ {eval(f(λ1, . . . , λn)) → q} otherwise.
One step of evaluation for ∆ is defined by:
propag(∆) = ⋃_{f(q1,...,qn)→q ∈ ∆ s.t. f ∈ F•^n} propag(∆, f(q1, . . . , qn) → q)
Since propag can add new transitions, it must be applied until a fix-point is
reached. Then using propag, we can extend the eval function to sets of transitions
and to tree automata in the following way.
Definition 11 (eval on transitions and automata). µX.f (X) denotes the
least fix-point of a generic function f . We define: eval(∆) = µX.propag(X) ∪ ∆
and eval(⟨F, Q, Q_f, ∆⟩) = ⟨F, Q, Q_f, eval(∆)⟩.
Example 8. In our example, eval(∆1 ) = ∆1 ∪ {[2, 3] → q4 }.
Theorem 1. L(A) ⊆ L(eval(A)).
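The evaluation step can be sketched as a fix-point iteration in Python. The encoding is again an assumption made for this illustration: transitions are pairs `(lhs, state)`, the only interpreted symbol handled is binary `+`, only direct lambda transitions are used to find the λi (epsilon transitions are ignored), and an explicit iteration bound stands in for the widening that the general case requires.

```python
from itertools import product


def is_interval(t):
    return (isinstance(t, tuple) and len(t) == 2
            and all(isinstance(x, int) for x in t))


def propag(delta):
    """One evaluation step over all transitions of the form q_a + q_b -> q."""
    out = set(delta)
    for lhs, q in delta:
        if not (isinstance(lhs, tuple) and lhs[0] == "+"):
            continue
        # Intervals directly recognized by each argument state.
        choices = [[left for left, s in delta if s == arg and is_interval(left)]
                   for arg in lhs[1:]]
        for a, b in product(*choices):
            res = (a[0] + b[0], a[1] + b[1])
            covered = any(is_interval(left) and s == q
                          and left[0] <= res[0] and res[1] <= left[1]
                          for left, s in out)
            if not covered:
                out.add((res, q))
    return out


def eval_delta(delta, max_steps=100):
    """Iterate propag until nothing changes, giving up after max_steps."""
    for _ in range(max_steps):
        nxt = propag(delta)
        if nxt == delta:
            return delta
        delta = nxt
    raise RuntimeError("no fix-point within max_steps; widening would be needed")


# Transitions of Delta_1 from Example 6 (q11 stands for q[1,1], q2p for q'2).
DELTA1 = {
    ((0, 2), "q1"), (("f", "q1"), "q2"),
    ((1, 2), "q3"), ((1, 1), "q11"), (("+", "q3", "q11"), "q4"),
    (("f", "q4"), "q5"), (("cons", "q3", "q5"), "q2p"), ("q2p", "q2"),
}
```

On `DELTA1`, `eval_delta` adds exactly the transition `((2, 3), "q4")`, as in Example 8.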
4.3 Equational Abstraction
If we perform another completion step on our example, we see that we can apply
the rewrite rule with a new substitution mapping x to q4 .
Example 9. Then Norm(cons(q4, f(q4 + 1)) → q′5) and q′5 → q5 will be added
to eval(∆1) to build A^2_R from A^1_R. We have ∆2 = eval(∆1) ∪ {q4 + q[1,1] →
q6, f(q6) → q7, cons(q4, q7) → q′5, q′5 → q5}. If we perform the evaluation step,
we have eval(∆2) = ∆2 ∪ {[3, 4] → q6}. We can see that this process is infinite,
because it will compute the infinite term cons([1, 2], cons([2, 3], cons([3, 4], . . .))).
Termination of completion can be enforced using a set E of approximation equations as in [22, 16]. Depending on the objective, E can either be defined by hand
(e.g. [22]), by hand and automatically refined [5], or automatically generated
from a static analysis of the TRS (e.g. [7]). In our example, the infinite behavior
is due to transitions of the form qi + q[1,1] → qj . An equation such as x = x + 1
is needed to ensure termination of completion. Equations of E will be of the
form u = v ⇐ c1 ∧ . . . ∧ cn, where u, v ∈ T(F◦ ∪ F•, X). Let σ : X ↦ Q be a
substitution s.t. uσ →_{A^{i+1}_R} q, vσ →_{A^{i+1}_R} q′ and q ≠ q′. An over-approximation
of A^{i+1}_R (denoted by A^{i+1}_{R,E}) can be obtained by merging states q and q′, i.e.
replacing each occurrence of q′ by q in A^{i+1}_R. Contrary to the completion case,
we do not need to restrict the substitutions obtained by the matching algorithm
with respect to the constraints of the equation, but simply guarantee that such
constraints are satisfiable, i.e., Solve(σ, A, c1 ∧ · · · ∧ cn) ≠ ∅.
For instance, E = {x = x + 1 ⇐ x > 2} can be used on Ex. 9. We have
two possible substitutions: σ1 = {x ↦ q3} and σ2 = {x ↦ q4}. σ1 is due to
the transition q3 + q[1,1] → q4. However, since [1, 2] → q3 we have Solve({x ↦
q3}, A2, x > 2) = ∅ and thus σ1 does not satisfy the condition. Substitution σ2,
due to the transition q4 + q[1,1] → q6, satisfies the condition because [2, 3] → q4
and Solve({x ↦ q4}, A2, x > 2) = {{x ↦ [3, 3]}} ≠ ∅. Hence, the equation is
applied for σ2 and results in the merging of q4 and q6 according to E.
Theorem 2. Let A be an LTA and E a set of equations. We denote by ⇝^!_E the
transformation of A by merging all equivalent states according to E. If A ⇝^!_E A′
then L(A) ⊆ L(A′).
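Merging two states amounts to a renaming over the set of transitions. The short Python sketch below illustrates this on a fragment of ∆2, assuming the same hypothetical encoding as a set of `(lhs, state)` pairs in which states are strings and symbol names are disjoint from state names.

```python
def merge(delta, keep, drop):
    """Replace every occurrence of state `drop` by state `keep` in delta."""
    def rename(x):
        if x == drop:
            return keep
        if isinstance(x, tuple) and x and isinstance(x[0], str):
            # Application f(q1, ..., qn): rename the argument states only.
            return (x[0], *[rename(a) for a in x[1:]])
        return x                               # intervals and other states

    return {(rename(lhs), rename(q)) for lhs, q in delta}


# Fragment of Delta_2 after evaluation (q11 stands for q[1,1]).
FRAGMENT = {
    (("+", "q3", "q11"), "q4"), ((2, 3), "q4"),
    (("+", "q4", "q11"), "q6"), ((3, 4), "q6"), (("f", "q6"), "q7"),
}
```

`merge(FRAGMENT, "q4", "q6")` turns `q4 + q11 → q6` into the looping transition `q4 + q11 → q4` and attaches `[3, 4]` to `q4`. Every run of the original automaton remains possible after renaming, which is the intuition behind the inclusion stated in Theorem 2.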
Widening Step. Any set containing the term 1+2 should also contain the term 3.
However, this can be broken by merging. Merging of states changes transitions of
the LTA. So we have to perform an evaluation step after merging by equations.
Example 10. After merging q4 and q6, we have Merge(∆2, q4, q6) = eval(∆1) ∪
{q3 + q[1,1] → q4, f(q4) → q5, cons(q3, q5) → q′2, q′2 → q2, [2, 3] → q4, q4 + q[1,1] →
q4, f(q4) → q7, cons(q4, q7) → q′5, q′5 → q5, [3, 4] → q4}. We have to evaluate the
transition q4 + q[1,1] → q4 . The first iteration will evaluate the term [3, 4] + [1, 1]
which adds the transition [4, 5] → q4 . Since a new element is in the state q4 , the
second iteration will evaluate the term [4, 5] + [1, 1] recognized by the transition
q4 + q[1,1] → q4 . Since there will always be a new element of the lattice that will
be associated to q4 , the computation of the evaluation will not terminate.
Since eval is defined as a fix-point of propag, this computation may not terminate without the application of a widening operator ∇_Λ : Λ × Λ ↦ Λ. It is a
classical way to compute over-approximations of fix-points within the abstract
interpretation framework [9].
Example 11. If we apply such a widening operator on our example after 3 iterations (for instance) of the propag function, then the transitions [2, 3] → q4,
[3, 4] → q4, [4, 5] → q4 will be replaced by [2, +∞[ → q4.
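The paper does not fix a particular widening operator. As an illustration, the standard interval widening from abstract interpretation, which pushes any unstable bound to infinity, reproduces the result of Example 11. The function name `widen` and the tuple encoding are assumptions of this sketch.

```python
from functools import reduce

INF = float("inf")


def widen(a, b):
    """Standard interval widening: keep stable bounds, drop unstable ones."""
    lo = a[0] if a[0] <= b[0] else -INF   # lower bound decreased: go to -inf
    hi = a[1] if a[1] >= b[1] else INF    # upper bound increased: go to +inf
    return (lo, hi)


# Successive intervals attached to q4 by the propag iterations of Example 10.
iterates = [(2, 3), (3, 4), (4, 5)]
widened = reduce(widen, iterates)
```

Here `widened` is `(2, INF)`, i.e. [2, +∞[. Any further iterate such as `(5, 6)` is already included, so the evaluation stabilizes.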
4.4 LTA Completion and its Soundness
Definition 12 (Automaton completion for LTA). Let A be a tree automaton, R a CTRS and E a set of equations.
– A^0_{R,E} = A,
– Repeat A^{n+1}_{R,E} = A′ with eval(C_R(A^n_{R,E})) ⇝^!_E A″ and eval(A″) = A′,
– Until a fixpoint A*_{R,E} = A^k_{R,E} = A^{k+1}_{R,E} (with k ∈ N) is reached.
Theorem 3 (Soundness). Let R be a left-linear CTRS, A be a tree automaton and E be a set of linear equations. If completion terminates on A*_{R,E} then
L(A*_{R,E}) ⊇ R*(L(A)).
Example 12. In our example, thanks to the widening performed at the previous evaluation step, completion adds no more rules to the current automaton
and stops. We have a fixed-point which is an over-approximation of the set of
reachable states.
5 Experiments
LTA completion has been developed and integrated into Timbuk [14]. For those
experiments, we chose to instantiate the generic LTA-completion algorithm with
the lattice of integer intervals: TimbukLTA. Experiments are detailed in [15].
We compare the efficiency of LTA completion w.r.t. standard completion on
TRS produced by Copster [3]. Copster compiles Java .class files into a TRS
modeling exactly the semantics of the Java program³. We extended Copster to
produce either TRSs or conditional TRSs (CTRS) as in Section 4. CTRSs do
not use Peano integers or arithmetic but assume that all integer arithmetic is
built-in. On the Java program examples, we prove the same properties using
either Timbuk or TimbukLTA and compare their efficiency. We made several experiments on three different Java programs that are detailed in [15]. On the first
one, called “Threads”, we prove that whatever the scheduling of Java threads
may be, the access to a critical section is protected using the synchronized Java
mechanism. The second one, “Euclid”, consists of an implementation of integer
division in a recursive way using addition and subtraction. In the third one,
called “FactoList”, there is an unbounded number of integers which are read on
the input channel and their factorial values are stored into a singly linked list.
In the end, the content of the list is printed to the output stream. Depending on
the possible values for integers read on the input stream, we can prove different
properties on the integers printed on the output stream.
³ Copster covers basic types, arithmetic, object creation, heap management, field manipulation, virtual method invocation, threads, as well as a subset of the System and
String library. Details about this compilation can be found in [6].
| Examples | Standard completion: steps | Standard completion: time | LTA completion: steps | LTA completion: time |
|---|---|---|---|---|
| Threads | 306 | 56s | 328 | 280s |
| Euclid | 2019 | 59s | 727 | 14s |
| FactoList, input stream=(3, 1, 2, 0) | 799 | 17s | 538 | 33s |
| FactoList, input stream=(7, 5, 6, 4, 1) | >9465 | >2h | 1251 | 250s |
| FactoList, any input stream of [−∞; +∞] | 467 | 20s | 349 | 40s |
| FactoList, any input stream of [2; +∞] | 468 | 21s | 430 | 14s |
| FactoList, any input stream of [3; +∞] | 953 | 320s | 467 | 15s |
| FactoList, any input stream of [4; +∞] | >1500 | >2h | 641 | 32s |
Table 1. Performance of standard completion against LTA completion
Table 1 shows that integration of LTA in completion may reduce its efficiency
when the TRS to verify does not rely on arithmetic (“Threads” example).
Conversely, unlike standard completion, LTA completion scales up when arithmetic
is used in the analysis (“Euclid” and “FactoList” examples). TimbukLTA and the
adapted Copster can be downloaded from their respective pages [14, 3].
6 Conclusion and Future Work
We have proposed LTA, a new extension of tree automata for tree regular model
checking of infinite-state systems with interpreted terms. One of our main contributions is the development of a new completion algorithm for such automata.
A nice property of this adapted algorithm is that it is independent of the lattice:
it only has to be atomic and equipped with a solver for the predicates of the
CTRS [15]. Any lattice fulfilling those requirements can be seamlessly plugged
into the regular tree model checking algorithm. We developed TimbukLTA, which
is the implementation of completion for LTA. We presented a first instance of
TimbukLTA where we plugged in an integer interval abstract domain. This simple
abstract domain drastically improved the efficiency of completion for
the verification of Java programs dealing with integer arithmetic. The resulting
LTA homogeneously combine abstract domains to approximate numerical values with tree automata to approximate structures: thread states, stacks, heaps
and objects. Future plans are to integrate in TimbukLTA more abstract domains
dealing with other kinds of built-ins: strings, reals, etc. and to define syntactic
constraints on equations to guarantee termination of LTA completion like in [17].
References
1. P. A. Abdulla, B. Jonsson, P. Mahata, and J. d’Orso. Regular tree model checking.
In CAV, volume 2404 of LNCS. Springer, 2002.
2. F. Baader and T. Nipkow. Term Rewriting and All That. Cambridge University
Press, 1998.
3. N. Barré, F. Besson, T. Genet, L. Hubert, and L. Le Roux. Copster homepage,
2009. http://www.irisa.fr/celtique/genet/copster.
4. S. Bauer, U. Fahrenberg, L. Juhl, K.G. Larsen, A. Legay, and C. Thrane. Quantitative refinement for weighted modal transition systems. In MFCS, volume 6907
of LNCS. Springer, 2011.
5. Y. Boichut, B. Boyer, T. Genet, and A. Legay. Equational Abstraction Refinement
for Certified Tree Regular Model Checking. In ICFEM’12, volume 7635 of LNCS.
Springer, 2012.
6. Y. Boichut, T. Genet, T. Jensen, and L. Leroux. Rewriting Approximations for
Fast Prototyping of Static Analyzers. In RTA, LNCS. Springer Verlag, 2007.
7. Y. Boichut, P.-C. Héam, and O. Kouchnarenko. Approximation-based tree regular
model-checking. Nord. J. Comput., 14(3):216–241, 2008.
8. A. Bouajjani and T. Touili. Extrapolating tree transformations. In CAV, volume
2404 of LNCS. Springer, 2002.
9. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static
analysis of programs by construction or approximation of fixpoints. In POPL, pages
238–252, 1977.
10. Z. Ésik and G. Liu. Fuzzy tree automata. Fuzzy Sets Syst., 158:1450–1460, July
2007.
11. G. Feuillade, T. Genet, and V. Viet Triem Tong. Reachability Analysis over Term
Rewriting Systems. Journal of Automated Reasoning, 33(3-4):341–383, 2004.
12. D. Figueira and L. Segoufin. Bottom-up automata on data trees and vertical xpath.
In STACS, 2011.
13. B. Genest, A. Muscholl, and Z. Wu. Verifying recursive active documents with
positive data tree rewriting. In FSTTCS, 2010.
14. T. Genet. Timbuk. http://www.irisa.fr/celtique/genet/timbuk/.
15. T. Genet, T. Le Gall, A. Legay, and V. Murat. Tree regular model checking for lattice-based automata.
Technical Report RT-0424, INRIA, 2012.
http://hal.inria.fr/hal-00687310.
16. T. Genet and V. Rusu. Equational approximations for tree automata completion.
Journal of Symbolic Computation, 45(5):574–597, May 2010.
17. T. Genet and Y. Salmon. Tree Automata Completion for Static Analysis
of Functional Programs. Technical report, INRIA, 2013. http://hal.archives-ouvertes.fr/hal-00780124/PDF/main.pdf.
18. S. Kaplan and C. Choppy. Abstract rewriting with concrete operations. In RTA,
pages 178–186, 1989.
19. O. Kupferman and Y. Lustig. Lattice automata. In VMCAI, 2007.
20. T. Le Gall and B. Jeannet. Lattice Automata: A Representation for Languages on
Infinite Alphabets, and Some Applications to Verification. In SAS, 2007.
21. J. Leroux. Structural Presburger digit vector automata. TCS, 409(3), 2008.
22. J. Meseguer, M. Palomino, and N. Martí-Oliet. Equational Abstractions. In Proc.
19th CADE Conf., Miami Beach (Fl., USA), volume 2741 of LNCS, pages 2–16.
Springer, 2003.
23. C. Otto, M. Brockschmidt, C. von Essen, and J. Giesl. Automated termination
analysis of Java bytecode by term rewriting. In RTA, LIPIcs. Dagstuhl, 2010.