Random Context Tree Grammars and Tree Transducers

Brink Van Der Merwe

2005

Regular tree grammars and top-down tree transducers are extended by random context sensitivity as known from the areas of string and picture generation. First results regarding the generative power of the resulting devices are presented. In particular, we investigate the path languages of random context tree languages.

Random Context Tree Grammars and Tree Transducers∗

Sigrid Ewert
School of Computer Science, University of the Witwatersrand, 2050 Wits, South Africa
[email protected]

Brink van der Merwe, Christine du Toit, Andries van der Walt
Department of Computer Science, University of Stellenbosch, 7602 Stellenbosch, South Africa
{abvdm,cdutoit,apjw}@cs.sun.ac.za

Frank Drewes, Johanna Högberg
Department of Computing Science, Umeå University, S–901 87 Umeå, Sweden
{drewes,johanna}@cs.umu.se

Copyright © 2005, UMINF 05.02, ISSN 0348-0542

∗ The work reported here has been part of the project Random Context in the Generation and Transformation of Trees funded by the National Research Foundation South Africa (NRF) and the Swedish International Development Cooperation Agency (Sida).

1 Introduction

We provide regular tree grammars and top-down tree transducers with a regulation mechanism known as random context sensitivity, and initiate the investigation of the resulting devices. Some results concerning the properties of the generated tree languages and the computed tree transformations are presented. In particular, the path languages of random context tree languages are studied in a first attempt to understand the generative power of random context tree grammars. Moreover, results regarding the domains and ranges of random context tree transducers are proved.

Random context is an old and well-known concept. It was introduced by one of the authors in 1972 as a means to enhance the generative power of context-free string grammars [Wal72], and is one of several equivalent mechanisms of regulated rewriting discussed in [DP89].
Every rule consists of a context-free rule $A \to w$ and two sets $P$, $F$ of nonterminals, called the permitting set and the forbidding set, respectively. The rule is applicable to an occurrence of $A$ in a string if every symbol in $P$ and none of those in $F$ occurs elsewhere in the string. Thus, in contrast to the notion of context sensitivity introduced by Chomsky, the context can be distributed randomly in the string. This has several advantages. First, random context string grammars are less powerful than context-sensitive Chomsky grammars, which is a desirable property because the latter are commonly considered to be too powerful to allow for a nice theory. Second, there are two natural restrictions called random permitting context and random forbidding context, where $F$, respectively $P$, is required to be empty for every rule. Third, the application condition is based on the existence or non-existence of nonterminals alone, rather than relying on a notion of substring or, more generally, subobject. This means that it can easily be applied to any other type of generative device based on the replacement of nonterminals. This has been done for picture grammars [EvdW99b], leading to classes of picture languages with interesting properties (see, e.g., [EvdW99a, EvdW00, EvdW02, EvdW03]). In this paper, we apply the idea to regular tree grammars and top-down tree transducers.

Tree grammars and tree transducers have a long and successful history starting in the late sixties. The first works on regular tree languages focussed on the bottom-up tree automaton as a language recognition device. As an equivalent generative device, the regular tree grammar was introduced explicitly by Brainerd in [Bra69], but was implicitly already present in the seminal paper [MW67] by Mezei and Wright. Not long thereafter, Rounds and Thatcher [Rou68, Rou70, Tha70] invented the top-down tree transducer, motivated by problems and applications in syntax-directed translation.
Nowadays, there is an enormous body of literature on tree grammars, tree transducers, and their applications (see, e.g., [GS84, GS97, FV98, Dre05]). Devices for the generation and transformation of trees are interesting because a tree can be viewed as an expression to be evaluated with respect to an algebra. For instance, the context-free string languages are obtained from the regular tree languages by interpreting all symbols of rank 0 as strings of length one and each symbol of rank $n > 0$ as $n$-ary concatenation. But we may also interpret the symbols as operations on graphs to obtain context-free graph languages [Eng94], or as operations on pictures to obtain context-free picture languages [Dre05]. Similarly, a tree transformation can be considered as a symbolic algorithm that can be interpreted in any domain [Eng80].

The study of random context tree grammars and random context tree transformations initiated in this paper is therefore not only interesting in its own right. Via interpretation, it also makes the random context mechanism available to all other types of domains. For example, it is obvious from the definitions that the random context tree grammars defined in this paper generate exactly the random context string languages if symbols are interpreted as strings and string concatenation in the way mentioned above. On the other hand, if symbols are interpreted as a certain type of operations on pictures (namely those in [Dre96, DEKK03]), one obtains exactly the random context picture languages.

The structure of this paper is as follows. In Section 2 some basic terminology is compiled. Section 3 introduces random context tree grammars (obtained from the regular ones by adding the random context mechanism) and discusses a few basic properties. In Section 4, we study the path languages of random context tree languages, i.e., the sets of all strings obtained by collecting all root-to-leaf paths in the trees of the given language.
It is shown that the path language of a random context tree language $L$ is regular if $L$ is of finite index. Moreover, for the permitting case, we prove a pumping lemma for paths that can be used to show that a given tree language cannot be generated by means of random permitting context. Finally, in Section 5, we introduce random context top-down tree transducers and study some of their properties.

2 Basic notation

The set of natural numbers (including 0) is denoted by $\mathbb{N}$. For $n \in \mathbb{N}$, $[n]$ denotes the set $\{1, \dots, n\}$. Given a function $f\colon A \to B$, we also use $f$ to denote its canonical extension to the power set of $A$, i.e., $f\colon 2^A \to 2^B$ is given by $f(S) = \{f(a) \mid a \in S\}$ for all $S \subseteq A$. The transitive and reflexive closure of a relation ${\to} \subseteq A \times A$ is denoted by $\to^*$.

A signature is a finite set of symbols $\Sigma = \bigcup_{k \in \mathbb{N}} \Sigma^{(k)}$ that is partitioned into pairwise disjoint subsets $\Sigma^{(k)}$, where the symbols in $\Sigma^{(k)}$ are said to have rank $k$. If the rank of a symbol is important, we add it to the symbol as a superscript, e.g., if $f \in \Sigma^{(k)}$, then we write $f^{(k)}$. The set $T_\Sigma$ of all trees over $\Sigma$ is defined inductively, as usual: it is the smallest set of strings over $\Sigma$ such that $f\,t_1 \cdots t_k \in T_\Sigma$ for every $f \in \Sigma^{(k)}$ and all $t_1, \dots, t_k \in T_\Sigma$. Note that $\Sigma^{(0)} \subseteq T_\Sigma$. To improve legibility, we usually write $f[t_1, \dots, t_k]$ instead of $f\,t_1 \cdots t_k$ unless $k \le 1$. If $\Sigma^{(n)} = \emptyset$ for all $n > 1$, we say that $\Sigma$ is a monadic signature; if $t$ is a tree over such a $\Sigma$, then $t$ is a monadic tree. The set of subtrees of a tree $t = f[t_1, \dots, t_k]$ is defined recursively as

$$\mathit{subtrees}(t) = \{f[t_1, \dots, t_k]\} \cup \bigcup_{i=1}^{k} \mathit{subtrees}(t_i).$$

Given a set $T$ of trees, $\Sigma(T)$ denotes the set of all trees of the form $f[t_1, \dots, t_k]$ such that $f \in \Sigma^{(k)}$ for some $k \in \mathbb{N}$ and $t_1, \dots, t_k \in T$. Furthermore, the set of trees over $\Sigma$ with subtrees in $T$, denoted by $T_\Sigma(T)$, is defined inductively: (i) $T \subseteq T_\Sigma(T)$ and (ii) if $f \in \Sigma^{(k)}$ and $t_1, \dots, t_k \in T_\Sigma(T)$, then $f[t_1, \dots, t_k] \in T_\Sigma(T)$.
Note that this generalizes the notation $T_\Sigma$ because $T_\Sigma \subseteq T_\Sigma(T)$ and, in particular, $T_\Sigma(\emptyset) = T_\Sigma$.

We need some standard terminology regarding trees. For this, let $t = f[t_1, \dots, t_k]$ be a tree.
• The set of vertices or nodes of $t$ is $V(t) = \{\lambda\} \cup \{iv \mid 1 \le i \le k,\ v \in V(t_i)\}$.
• The label $t(u)$ of a vertex $u \in V(t)$ is $f$ if $u = \lambda$, and $t_i(v)$ if $u = iv$ for some $i \in [k]$ and $v \in V(t_i)$.
• The subtree $t/u$ rooted at a vertex $u \in V(t)$ is $t$ if $u = \lambda$, and $t_i/v$ if $u = iv$ for some $i \in [k]$ and $v \in V(t_i)$.
• The size of $t$, denoted by $|t|$, is the number of (occurrences of) symbols in $t$, i.e., $|t| = 1 + \sum_{i=1}^{k} |t_i|$.
• The height of $t$, which we denote by $\mathit{height}(t)$, equals 0 if $k = 0$; otherwise, $\mathit{height}(t) = \max(\mathit{height}(t_1), \dots, \mathit{height}(t_k)) + 1$.
• The yield of $t$ is defined recursively by
$$\mathit{yield}(t) = \begin{cases} f & \text{if } k = 0 \\ \mathit{yield}(t_1) \cdots \mathit{yield}(t_k) & \text{otherwise.} \end{cases}$$
Thus, $\mathit{yield}(t)$ is the string of leaves of $t$, read from left to right.

Let $X = \{x_1, x_2, \dots\}$ be a set of special symbols, all of rank zero, that is disjoint from every other signature in this paper. If $t \in T_\Sigma(X)$ for some arbitrary signature $\Sigma$, then we denote by $t[\![t_1, \dots, t_k]\!]$ the tree that results when each occurrence of $x_i$ in $t$ is replaced by $t_i$, $i \in [k]$. If we only wish to talk about a subset $\{x_1, \dots, x_k\}$ of $X$ for some $k \in \mathbb{N}$, we refer to the subset as $X_k$.

A regular tree grammar is a quadruple $G = (N, \Sigma, R, S)$ consisting of
• a signature $N$ of nonterminals of rank 0,
• a signature $\Sigma$ of terminals, with $N \cap \Sigma = \emptyset$,
• a finite set $R$ of productions (also called rules) of the form $A \to r$, where $A \in N$ and $r \in T_{\Sigma \cup N}$, and
• an initial nonterminal $S \in N$.

Let $t, t' \in T_{\Sigma \cup N}$. We say that there is a derivation step from $t$ to $t'$ and write $t \Rightarrow_G t'$ (or simply $t \Rightarrow t'$ if $G$ is understood) if $t = s[\![A]\!]$ and $t' = s[\![r]\!]$, where $s \in T_{\Sigma \cup N}(X_1)$ contains $x_1$ exactly once and there is a production $A \to r$ in $R$. We say that there is a derivation from $t$ to $t'$ if $t \Rightarrow_G^* t'$.
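The tree notions introduced in this section (vertices, subtrees at a vertex, size, height, yield) can be illustrated by a small sketch. The representation of a tree as a pair of a label and a tuple of children, and all function names, are assumptions made for this illustration only; they are not part of the paper.

```python
from typing import List, Tuple

# Assumed representation: a tree f[t1, ..., tk] is the pair (f, (t1, ..., tk)).
Tree = Tuple[str, tuple]
Vertex = Tuple[int, ...]  # the root (lambda) is the empty tuple


def tree(label: str, *children: Tree) -> Tree:
    """Build the tree label[children...]."""
    return (label, tuple(children))


def vertices(t: Tree) -> List[Vertex]:
    """V(t): the root plus i.v for every vertex v of the i-th child."""
    result: List[Vertex] = [()]
    for i, child in enumerate(t[1], start=1):
        result.extend((i,) + v for v in vertices(child))
    return result


def subtree_at(t: Tree, u: Vertex) -> Tree:
    """t/u: follow the child indices in u, starting at the root."""
    for i in u:
        t = t[1][i - 1]
    return t


def size(t: Tree) -> int:
    """|t| = 1 + sum of the sizes of the direct subtrees."""
    return 1 + sum(size(c) for c in t[1])


def height(t: Tree) -> int:
    """0 for a leaf, otherwise 1 + the maximal height of a direct subtree."""
    return 0 if not t[1] else 1 + max(height(c) for c in t[1])


def tree_yield(t: Tree) -> List[str]:
    """The leaves of t, read from left to right."""
    if not t[1]:
        return [t[0]]
    return [symbol for c in t[1] for symbol in tree_yield(c)]
```

For the tree f[g[h], h], built as `tree("f", tree("g", tree("h")), tree("h"))`, the sketch reports four vertices, size 4, height 2, and the yield consisting of the two leaves labelled h.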
The generated language of a regular tree grammar $G = (N, \Sigma, R, S)$ is the set $L(G) = \{t \in T_\Sigma \mid S \Rightarrow^* t\}$.

3 Random context tree grammars

A random context tree grammar (rc tree grammar for short) can be seen as a regular tree grammar with extended productions: to each production we add two sets of nonterminals, a permitting set and a forbidding set. In a derivation, a production may only be applied at a position with surrounding context $s$ if every nonterminal in the permitting set occurs in $s$ at least once, but none of the nonterminals of the forbidding set occurs in $s$ at all.

Definition 3.1 (Random context tree grammar) An rc tree grammar is a quadruple $G = (N, \Sigma, R, S)$ defined in the same way as a regular tree grammar in every respect, except that the productions have the form $A \to r\ (P; F)$ where $A \in N$, $r \in T_\Sigma(N)$ and $P, F \subseteq N$. There is a derivation step from $t = s[\![A]\!]$ to $t' = s[\![r]\!]$ (where $s$ contains $x_1$ exactly once), and we write $t \Rightarrow_R t'$, $t \Rightarrow_G t'$, or simply $t \Rightarrow t'$, if
- there is a production $A \to r\ (P; F)$ in $R$,
- each nonterminal in $P$ occurs in $s$, and
- none of the nonterminals in $F$ occurs in $s$.
The language generated by $G$, called an rc tree language, is given by $L(G) = \{t \in T_\Sigma \mid S \Rightarrow^* t\}$.

Obviously, regular tree grammars are a special case of random context tree grammars if every rule $A \to r$ is identified with $A \to r\ (\emptyset; \emptyset)$. This observation makes it meaningful to abbreviate a random context rule $A \to r\ (\emptyset; \emptyset)$ as $A \to r$. The situation when either all permitting sets or all forbidding sets are empty is also worth investigating.

Definition 3.2 (Special cases of random context) A random permitting context tree grammar (rpc tree grammar) is a random context tree grammar whose forbidding sets are all empty, while a random forbidding context tree grammar (rfc tree grammar) is a random context tree grammar whose permitting sets are all empty.
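The applicability condition of Definition 3.1 can be sketched as follows. This is a minimal illustration under assumed conventions: trees are pairs of a label and a tuple of children, a rule $A \to r\ (P; F)$ is a 4-tuple, and the helper names (`positions`, `replace_at`, `applicable`, `step`) are hypothetical. The context $s$ is modelled by looking at all nonterminal occurrences of $t$ except the one being replaced.

```python
from typing import FrozenSet, Iterator, Optional, Set, Tuple

Tree = Tuple[str, tuple]
Vertex = Tuple[int, ...]
# Assumed encoding of a rule A -> r (P; F) as (A, r, P, F).
Rule = Tuple[str, Tree, FrozenSet[str], FrozenSet[str]]


def tree(label: str, *children: Tree) -> Tree:
    return (label, tuple(children))


def positions(t: Tree, prefix: Vertex = ()) -> Iterator[Tuple[Vertex, str]]:
    """Yield every vertex of t together with its label."""
    yield prefix, t[0]
    for i, child in enumerate(t[1], start=1):
        yield from positions(child, prefix + (i,))


def replace_at(t: Tree, u: Vertex, r: Tree) -> Tree:
    """Return the tree obtained from t by replacing the subtree at u with r."""
    if not u:
        return r
    label, kids = t
    i = u[0] - 1
    return (label, kids[:i] + (replace_at(kids[i], u[1:], r),) + kids[i + 1:])


def applicable(t: Tree, u: Vertex, rule: Rule, nonterminals: Set[str]) -> bool:
    """Check the rule at vertex u: t(u) must be the left-hand side, every
    symbol of P must occur elsewhere in t, and no symbol of F may."""
    lhs, _rhs, permitting, forbidding = rule
    labels = dict(positions(t))
    if labels.get(u) != lhs:
        return False
    context = {lab for pos, lab in labels.items()
               if pos != u and lab in nonterminals}
    return permitting <= context and not (forbidding & context)


def step(t: Tree, u: Vertex, rule: Rule, nonterminals: Set[str]) -> Optional[Tree]:
    """Perform one derivation step at u, or return None if not applicable."""
    if not applicable(t, u, rule, nonterminals):
        return None
    return replace_at(t, u, rule[1])
```

For instance, with the illustrative rules A → A1 ({B}; ∅) and B → B1 ({A1}; ∅), the second rule is not applicable to f[A, B], but becomes applicable once the first rule has produced f[A1, B].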
[Figure 1: A possible derivation of the grammar G defined in Example 3.3.]

We call the families of tree languages generated by rpc and rfc tree grammars rpc and rfc tree languages, respectively.

Example 3.3 To demonstrate that random context does add generative power, we consider the rc grammar $G = (N, \Sigma, R, S)$ whose components are $N = \{S, A, A', B, B'\}$, $\Sigma = \{f^{(2)}, g^{(1)}, h^{(0)}\}$, and

$$R = \{\, S \to f[A, B],\quad A \to A'\ (\{B\}; \emptyset),\quad B \to B'\ (\{A'\}; \emptyset),\quad A' \to g[A]\ (\{B'\}; \emptyset),\quad B' \to g[B]\ (\{A\}; \emptyset),\quad A' \to h,\quad B' \to h \,\}.$$

One possible derivation in $G$ is shown in Figure 1. The reader is encouraged to determine, for each step in the derivation, which productions in $R$ are applicable. It should then be clear that $L(G)$ is the set $\{f[t, t] \mid t \in T_{\{g^{(1)}, h^{(0)}\}}\}$, a tree language that is well known not to be regular (due to pumping arguments).

We would now like to establish some properties of rc tree grammars. We begin by mentioning two easy observations that relate rc tree languages to string languages. The first concerns the relation between rc tree languages and rc string languages mentioned in the introduction. If we turn every rule $A \to r\ (P; F)$ of an rc tree grammar $G$ into $A \to \mathit{yield}(r)\ (P; F)$, we obtain an rc string grammar $G'$ in the sense of [Wal72].¹ From the relevant definitions, it is then obvious that the following implications hold for every tree $t$:
1. For every tree $t'$, if $t \Rightarrow_G t'$ then $\mathit{yield}(t) \Rightarrow_{G'} \mathit{yield}(t')$, and
2. for every string $w$, if $\mathit{yield}(t) \Rightarrow_{G'} w$, then $t \Rightarrow_G^* t'$ for a tree $t'$ such that $w = \mathit{yield}(t')$.
Hence, $L(G') = \mathit{yield}(L(G))$. Conversely, given an rc string grammar $G'$, by turning every rule $A \to a_1 \cdots a_k\ (P; F)$ of $G'$ into $A \to f[a_1, \dots, a_k]\ (P; F)$ for some symbol $f^{(k)}$, we obtain an rc tree grammar $G$ such that, again, $L(G') = \mathit{yield}(L(G))$. Hence, we have the following result.
Observation 3.4 A string language is an rc string language if and only if it is the yield of an rc tree language. This is also true if rc is replaced by rpc or rfc.

In the derivation of a monadic tree there can never be more than one nonterminal at a time. Therefore, only productions with empty permitting contexts can be applied to that nonterminal, and forbidding contexts have no effect at all. This yields the following observation.

Observation 3.5 For every rc tree language $L$ and every monadic signature $\Sigma$, the language $L \cap T_\Sigma$ is regular. In particular, $L$ is regular if it is monadic.

A well-known normal form result for regular tree grammars states that it suffices to consider rules whose right-hand sides are elements of $\Sigma(N)$. For rc tree grammars, this does not seem to be achievable in general since chain rules (i.e., rules with right-hand sides in $N$) are quite important (consider, e.g., the way in which Example 3.3 makes use of such rules). However, we can show that right-hand sides in $N \cup \Sigma(N)$ are sufficient. The construction used in the proof is rather straightforward. Except for some slight modifications, it is similar to the one for regular tree grammars.

Theorem 3.6 Every rc tree grammar $G = (N, \Sigma, R, S)$ can effectively be turned into an rc tree grammar $G' = (N', \Sigma, R', S)$ with $L(G') = L(G)$, such that $r \in N' \cup \Sigma(N')$ for every rule $A \to r\ (P; F)$ in $R'$. This is also true if rc is replaced by rpc or rfc.

Proof We first prove this for the case of rpc tree grammars. Hence, suppose that $G$ is of the permitting type. We let $R' = R_0 \cup R_{\mathit{GEN}}$, where the rules in $R_{\mathit{GEN}}$ help decompose every rule $A \to r\ (P; \emptyset)$ with $r \notin N \cup \Sigma(N)$, as follows. For every subtree $s$ of $r$, let $\mathit{GEN}_s$ be a new nonterminal. Now, $R_0$ contains the rule $A \to \mathit{GEN}_r\ (P; \emptyset)$ and $R_{\mathit{GEN}}$ contains, for every subtree $s = f[s_1, \dots, s_k]$ of $r$, the rule $\mathit{GEN}_s \to f[\mathit{GEN}_{s_1}, \dots, \mathit{GEN}_{s_k}]$.
Clearly, for every tree $t \in T_\Sigma(N)$, it holds that $\mathit{GEN}_r \Rightarrow^*_{R_{\mathit{GEN}}} t$ if and only if $t = r$. Hence, if $G' = (N', \Sigma, R', S)$, where $N'$ is the new set of nonterminals, then $L(G) \subseteq L(G')$. On the other hand, since the new nonterminals do not occur in the permitting context of any rule, the derivation steps of every derivation $S \Rightarrow^*_{R'} t$ with $t \in T_\Sigma$ can be reordered in such a way that the derivation consists of segments of the form

$$s[\![A]\!] \Rightarrow_{R_0} s[\![\mathit{GEN}_r]\!] \Rightarrow^*_{R_{\mathit{GEN}}} s[\![r]\!],$$

each of which can be turned into $s[\![A]\!] \Rightarrow_R s[\![r]\!]$. This shows that, for all $t \in T_\Sigma$, $S \Rightarrow^*_{R'} t$ implies $S \Rightarrow^*_R t$. Hence, $L(G') \subseteq L(G)$.

Now, suppose $G$ is an rc or rfc tree grammar. We use a construction similar to the one above, but now add the set $N_{\mathit{GEN}}$ of new nonterminals to the forbidding context of all rules in $R_0$. More precisely, every rule $A \to r\ (P; F)$ in $R$ is turned into the rule $A \to \mathit{GEN}_r\ (P; F \cup N_{\mathit{GEN}})$ and all rules $\mathit{GEN}_s \to f[\mathit{GEN}_{s_1}, \dots, \mathit{GEN}_{s_k}]$ where $s = f[s_1, \dots, s_k]$ is a subtree of $r$. Since $r$ can be derived from $\mathit{GEN}_r$ in any context, it is clear that $L(G) \subseteq L(G')$. Conversely, once a rule $A \to \mathit{GEN}_r$ has been applied in a derivation in $G'$, the forbidding context ensures that $\mathit{GEN}_r$ is turned into $r$ before a rule can be applied to some nonterminal in $N$. In other words, even without reordering derivation steps, the derivation consists of segments as in the first part of the proof. Thus, by the same argument as above, $L(G') \subseteq L(G)$.

¹ The explicit definition of rc string grammars is omitted here because it should be obvious enough.

4 Path languages

In this section, we study the path languages of rc tree languages. Let us first define some terminology and notation. Given a tree $t$ and a node $v = i_1 \cdots i_n \in V(t)$, we denote by $\mathit{path}_t(v)$ the string of symbols on the path from the root of $t$ to $v$ (including the label of $v$). Thus, $\mathit{path}_t(v) = t(\lambda)\,t(i_1) \cdots t(i_1 \cdots i_n)$. The set $\mathit{paths}(t)$ of all root-to-leaf paths in $t$ is given by

$$\mathit{paths}(t) = \{\mathit{path}_t(v) \mid v \in V(t) \text{ and } t(v) \in \Sigma^{(0)}\}.$$
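As an illustration of the definition of $\mathit{paths}(t)$, the following sketch computes the set of root-to-leaf paths of a tree. It assumes that trees are pairs of a label and a tuple of children and that $t \in T_\Sigma$, so that every leaf carries a symbol of rank 0; the function name is chosen for this illustration only.

```python
from typing import Set, Tuple

Tree = Tuple[str, tuple]  # assumed encoding: (label, children)


def paths(t: Tree) -> Set[Tuple[str, ...]]:
    """All root-to-leaf paths of t, each as a tuple of labels.

    Assumption: t contains terminals only, so every leaf is labelled
    with a rank-0 symbol and therefore ends a path.
    """
    label, children = t
    if not children:
        return {(label,)}
    return {(label,) + p for child in children for p in paths(child)}
```

Applied to f[g[h], h], the sketch returns the two paths fgh and fh, while a monadic tree such as g[g[h]] has exactly one path.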
Clearly, if $t$ is a monadic tree, then $|\mathit{paths}(t)| = 1$. In the following, we are interested in the path language of a tree language $L$, defined as $\mathit{paths}(L) = \bigcup_{t \in L} \mathit{paths}(t)$.

Our first result concerns rc tree languages of finite index. Given an rc tree grammar $G = (N, \Sigma, R, S)$, we say that $G$ is of index $k \in \mathbb{N}$ if, for all $t \in L(G)$, there is a derivation $S = t_1 \Rightarrow \dots \Rightarrow t_n = t$ for some $n \in \mathbb{N}$ such that $|V_N(t_i)| \le k$ for all $i \in [n]$. Here, $V_N(s)$ denotes the set of nodes $v \in V(s)$ with $s(v) \in N$. A tree language is said to be of finite index (with respect to rc tree grammars) if it is generated by an rc tree grammar of finite index.

In the following, let $G = (N, \Sigma, R, S)$ be of index $k \in \mathbb{N}$, where $N = \{A_1, \dots, A_m\}$ for some $m \in \mathbb{N}$. We are going to show that $\mathit{paths}(L(G))$ is regular. For this, we construct a regular string grammar $G'$ and prove that $L(G') = \mathit{paths}(L(G))$. While the formal proof is quite technical, the idea behind it is simple. The grammar $G'$ derives a path $w \in \mathit{paths}(t)$ by simulating the derivation of $t$ while actually producing only the symbols along a nondeterministically chosen path. To be able to choose only applicable rules, the (unique) nonterminal of $G'$ must record how many nonterminals there are in the tree. Assume that, after some steps, $G$ would have derived a tree $s[\![A]\!]$, where $A$ is the nonterminal at the end of the path that has, so far, been produced by $G'$. Then the corresponding nonterminal in $G'$ is $(A, \gamma)$, where $\gamma$ is a vector used to count how many nonterminals of each type occur in $s[\![A]\!]$. For every rule $\rho$ in $G$, those in $G'$ must account for two possibilities. Either $\rho$ is applied at the leaf at the end of the derived path (labelled with $A$), thus extending this path by replacing $A$ with a string $w'b$, or $\rho$ is applied to some other nonterminal occurrence, thus affecting only $\gamma$. This is illustrated in Figure 2 (where the trees are turned counter-clockwise by 90 degrees).
Also, there must be a mechanism that prevents the derivation of the path from terminating prior to the rest of the simulated derivation if $b \in \Sigma^{(0)}$.

[Figure 2: In a derivation of G the sentential form (a) can be followed by either (b) or (c).]

To formalize the construction, let $\Gamma = \{0, \dots, k\}^m$ and denote by $O$ the element of $\Gamma$ consisting entirely of zeroes. Furthermore, for every tree $t \in T_\Sigma(N)$, let $\mathit{cnt}(t) = (i_1, \dots, i_m)$ where $i_j = |V_{\{A_j\}}(t)|$ for all $j \in [m]$. Thus, $\mathit{cnt}(t)$ is the vector representing the numbers of occurrences of $A_1, \dots, A_m$ in $t$. Note that $\mathit{cnt}(t) \in \Gamma$ if $t$ contains at most $k$ nonterminals. Given a subset $N'$ of $N$, we let $\mathit{cnt}(N') = \sum_{A \in N'} \mathit{cnt}(A)$.² By these definitions, a rule $A \to r\ (P; F)$ is applicable to some $A$-labelled leaf of $t$ if and only if $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(P)) = \mathit{cnt}(P)$ and $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(F)) = O$.

Now, let $G'$ be the regular grammar $(N', \Sigma, R', S')$, where $N' = (N \cup \Sigma) \times \Gamma$, $S' = (S, \mathit{cnt}(S))$, and $R'$ is defined as follows. For each rule $\rho = A \to r\ (P; F) \in R$ and every $\gamma \in \Gamma$, we add certain rules to $R'$. Let $r_{\rho,\gamma}$ be the tree obtained by replacing every symbol $a \in N \cup \Sigma^{(0)}$ in $r$ by $(a, \gamma')$, where $\gamma' = \gamma + \mathit{cnt}(r) - \mathit{cnt}(A)$. Now, if $\gamma' \in \Gamma$, add to $R'$ the following productions:
• $(A, \gamma) \to p$ for every $p \in \mathit{paths}(r_{\rho,\gamma})$, provided that $\min(\gamma - \mathit{cnt}(A), \mathit{cnt}(P)) = \mathit{cnt}(P)$ and $\min(\gamma - \mathit{cnt}(A), \mathit{cnt}(F)) = O$;
• $(a, \gamma) \to (a, \gamma')$ for every $a \in N \cup \Sigma^{(0)}$, provided that $\min(\gamma - \mathit{cnt}(A), \mathit{cnt}(P)) = \mathit{cnt}(P)$, $\min(\gamma - \mathit{cnt}(A), \mathit{cnt}(F)) = O$ and, if $a = A$, then $\min(\gamma, 2\,\mathit{cnt}(A)) = 2\,\mathit{cnt}(A)$, i.e., $s[\![A]\!]$ contains not only the $A$ at the end of the generated path.
In addition, $R'$ contains the productions
• $(a, O) \to a$ for every $a \in \Sigma^{(0)}$.

To show that the construction is correct, we prove two lemmas stating that each derivation step in $G$ corresponds to one in $G'$ and vice versa. (Readers who feel that this is obvious may of course skip these easy proofs.)
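The counting vectors and the applicability test expressed through componentwise minima can be sketched as follows. This is an illustration under assumptions: the nonterminals $A_1, \dots, A_m$ are given as a Python list in a fixed order, trees are pairs of a label and a tuple of children, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Tree = Tuple[str, tuple]
Vector = Tuple[int, ...]


def cnt(t: Tree, nonterminals: List[str]) -> Vector:
    """cnt(t): number of occurrences of each nonterminal A_1, ..., A_m in t."""
    counts = [0] * len(nonterminals)
    stack = [t]
    while stack:
        label, children = stack.pop()
        if label in nonterminals:
            counts[nonterminals.index(label)] += 1
        stack.extend(children)
    return tuple(counts)


def cnt_set(subset: Iterable[str], nonterminals: List[str]) -> Vector:
    """cnt(N'): the sum of cnt(A) over A in N', i.e. an indicator vector."""
    chosen = set(subset)
    return tuple(1 if a in chosen else 0 for a in nonterminals)


def vmin(a: Vector, b: Vector) -> Vector:
    """Componentwise minimum."""
    return tuple(min(x, y) for x, y in zip(a, b))


def vsub(a: Vector, b: Vector) -> Vector:
    """Componentwise difference."""
    return tuple(x - y for x, y in zip(a, b))


def applicable(gamma: Vector, lhs: str, permitting: Iterable[str],
               forbidding: Iterable[str], nonterminals: List[str]) -> bool:
    """Test min(gamma - cnt(A), cnt(P)) = cnt(P) and
    min(gamma - cnt(A), cnt(F)) = O for a rule A -> r (P; F).

    If A does not occur at all, gamma - cnt(A) has a negative component,
    so both minima differ from their targets and the test fails.
    """
    rest = vsub(gamma, cnt_set([lhs], nonterminals))
    p = cnt_set(permitting, nonterminals)
    f = cnt_set(forbidding, nonterminals)
    zero = tuple(0 for _ in nonterminals)
    return vmin(rest, p) == p and vmin(rest, f) == zero
```

With the nonterminals ordered as [A, B] and $\gamma = (1, 1)$, a rule for A with permitting set {B} passes the test, whereas one with permitting set {A} fails because the only A is the one being replaced; with $\gamma = (2, 0)$ the latter passes.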
² The sum, as well as every other arithmetic operation used in the following, is defined componentwise on $\Gamma$.

Lemma 4.1 Let $t = s[\![A]\!] \in T_\Sigma(N)$, where $A \in N$, and assume that $t \Rightarrow_G t' = s[\![r]\!]$ by an application of a rule $A \to r\ (P; F)$, where $|V_N(t)|, |V_N(t')| \le k$. Let $v \in V_{N \cup \Sigma^{(0)}}(t')$ and $\mathit{path}_t(v_1) = wa$, where $v_1$ is the longest prefix of $v$ such that $v_1 \in V(s)$.
1. If $s(v_1) = x_1$ (and thus $a = A$), then $w(a, \mathit{cnt}(t)) \Rightarrow_{G'} ww'(b, \mathit{cnt}(t'))$ for all $w'b \in \mathit{paths}(r)$ (see Figure 2(a)).
2. If $s(v_1) \ne x_1$ (and thus $v = v_1$), then $w(a, \mathit{cnt}(t)) \Rightarrow_{G'} w(a, \mathit{cnt}(t'))$ (see Figure 2(b)).

Proof Case 1. By construction, $G'$ contains the rule $(A, \mathit{cnt}(t)) \to w'(b, \gamma)$ for all $w'b \in \mathit{paths}(r)$, where $\gamma = \mathit{cnt}(t) + \mathit{cnt}(r) - \mathit{cnt}(A) = \mathit{cnt}(t')$.
Case 2. Again by construction, $G'$ contains the rule $(a, \mathit{cnt}(t)) \to (a, \gamma)$, where $\gamma = \mathit{cnt}(t) + \mathit{cnt}(r) - \mathit{cnt}(A) = \mathit{cnt}(t')$.

Lemma 4.2 Let $t \in T_\Sigma(N)$ and $v \in V_{N \cup \Sigma^{(0)}}(t)$, where $\mathit{path}_t(v) = wa$. If $w(a, \mathit{cnt}(t)) \Rightarrow_{G'} ww'(b, \gamma)$, then $t \Rightarrow_G t'$ for a tree $t' \in T_\Sigma(N)$ such that $\gamma = \mathit{cnt}(t')$ and there is a node $v' \in V_{N \cup \Sigma^{(0)}}(t')$ with $\mathit{path}_{t'}(v') = ww'b$.

Proof There are two cases, depending on why the rule $(a, \mathit{cnt}(t)) \to w'(b, \gamma)$ was included in $R'$.
Case 1. There is a rule $A \to r\ (P; F) \in R$ and a node $v_1 \in V_{N \cup \Sigma^{(0)}}(r)$ such that $a = A$ and $\mathit{path}_r(v_1) = w'b$, where the following are satisfied:
(1) $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(P)) = \mathit{cnt}(P)$,
(2) $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(F)) = O$, and
(3) $\gamma = \mathit{cnt}(t) + \mathit{cnt}(r) - \mathit{cnt}(A)$.
Now we can write $t$ in the form $t = s[\![A]\!]$, where $s(v) = x_1$, and, due to (1) and (2), apply the rule $A \to r\ (P; F)$ in order to get $t \Rightarrow_G t' = s[\![r]\!]$, with $\mathit{cnt}(t') = \gamma$ (using (3)). Moreover, defining $v' = vv_1$ we get $\mathit{path}_{t'}(v') = w\,\mathit{path}_r(v_1) = ww'b$.
Case 2. We have $w' = \lambda$, $b = a$, and there is a rule $A \to r\ (P; F) \in R$ such that
(1) $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(P)) = \mathit{cnt}(P)$,
(2) $\min(\mathit{cnt}(t) - \mathit{cnt}(A), \mathit{cnt}(F)) = O$,
(3) if $a = A$, then $\min(\mathit{cnt}(t), 2\,\mathit{cnt}(A)) = 2\,\mathit{cnt}(A)$ (i.e., in addition to the $A$ at the end of the generated path, $t$ contains another one), and
(4) $\gamma = \mathit{cnt}(t) + \mathit{cnt}(r) - \mathit{cnt}(A)$.
We can now write $t$ in the form $t = s[\![A]\!]$, where $s(v) \ne x_1$ (using (3) in case $a = A$), and apply the rule $A \to r\ (P; F)$ to obtain $t \Rightarrow t' = s[\![r]\!]$ (using (1) and (2)). Since $s(v) \ne x_1$, it holds that $\mathit{path}_{t'}(v) = \mathit{path}_t(v) = wa$. Thus, the statement of the lemma is true, taking $v' = v$.

By induction on the length of derivations, the two lemmas prove the following:
1. For every derivation $S \Rightarrow^*_G t$, $t \in T_\Sigma(N)$, and every node $v \in V_{N \cup \Sigma^{(0)}}(t)$, we have $S' \Rightarrow^*_{G'} w(a, \mathit{cnt}(t))$, where $wa = \mathit{path}_t(v)$.
2. For every derivation $S' \Rightarrow^*_{G'} w(a, \gamma)$, there is a tree $t \in T_\Sigma(N)$ such that $S \Rightarrow^*_G t$, $\mathit{cnt}(t) = \gamma$ and $\mathit{path}_t(v) = wa$ for some $v \in V_{N \cup \Sigma^{(0)}}(t)$.
Note that, for the induction basis, $\mathit{paths}(S) = \{S\}$ and $S' = (S, \mathit{cnt}(S))$. Taking $t \in T_\Sigma$ and $v \in V_{\Sigma^{(0)}}(t)$, it follows from (1) that $S \Rightarrow^*_G t$ implies

$$S' \Rightarrow^*_{G'} w(a, \mathit{cnt}(t)) = w(a, O) \Rightarrow_{G'} wa,$$

where $wa = \mathit{path}_t(v)$. Conversely, by (2), $S' \Rightarrow^*_{G'} w(a, O) \Rightarrow_{G'} wa$ implies that there is a tree $t \in T_\Sigma$ with $S \Rightarrow^*_G t$ and $wa \in \mathit{paths}(t)$. Hence, $L(G') = \mathit{paths}(L(G))$ and we have proved the desired theorem.

Theorem 4.3 The path language of an rc tree language of finite index is regular.

The next example shows that the theorem does not extend to all rc tree languages.

Example 4.4 Let $\Sigma = \{g^{(2)}, a^{(1)}, b^{(1)}, \bot^{(0)}\}$. We generate the set $L \subseteq T_\Sigma$ of all trees of the form shown in Figure 3 using an rc tree grammar. Clearly, the path language of $L$ is not regular as its restriction to $\{a, b, \bot\}^*$ is $\{a^n b^n \bot \mid n \ge 0\}$.
$L$ is generated by $G = (N, \Sigma, R, S)$, where $N = \{S, S', S_a, S_a', S_b, S_b', A, A', B, B', Z\}$ and $R$ consists of the following productions:

Generate a tree of the form $g[g[\cdots g[\bot, g[A, B]] \cdots, g[A, B]], S']$:
  $S \to g[Z, S']$
  $Z \to g[Z, g[A, B]]$
  $Z \to \bot$

Turn to the a-generating phase when $Z$ has finished:
  $S' \to S_a\ (\emptyset; \{Z\})$

Consume all $A$'s, one at a time. For each $A$, turn $S_a$ into $a\,S_a$:
  $S_a \to a\,S_a'\ (\{A\}; \{A'\})$
  $A \to A'\ (\{S_a'\}; \{A'\})$
  $S_a' \to S_a\ (\{A'\}; \emptyset)$
  $A' \to \bot\ (\{S_a\}; \emptyset)$

Turn to the b-generating phase when no $A$ is left:
  $S_a \to S_b\ (\emptyset; \{A\})$

Consume all $B$'s, one at a time. For each $B$, turn $S_b$ into $b\,S_b$:
  $S_b \to b\,S_b'\ (\{B\}; \{B'\})$
  $B \to B'\ (\{S_b'\}; \{B'\})$
  $S_b' \to S_b\ (\{B'\}; \emptyset)$
  $B' \to \bot\ (\{S_b\}; \emptyset)$

Terminate when no $B$ is left:
  $S_b \to \bot\ (\emptyset; \{B\})$

[Figure 3: The path language of an rc tree language is not necessarily regular.]

Thus, as one may have expected, rc tree grammars of infinite index are substantially more powerful than those of finite index. Whether or not this also holds for the permitting and forbidding cases is still open. For the permitting case, however, we believe that simply adding more nonterminals to a derivation will not significantly increase the generative power of an rpc tree grammar. This is expressed in Conjecture 4.5, which we hope to be able to prove in a future paper.

Conjecture 4.5 The path language of an rpc tree language is regular.

Example 4.4 can easily be extended to obtain an rc tree language whose path language is not context free. One uses the same construction as in Example 4.4, but with an additional unary terminal $c$ and new nonterminals $C$ and $C'$, playing a role similar to that of the pairs $A, A'$ and $B, B'$. Now, intersecting the path language with $\{a, b, c, \bot\}^*$ yields the non-context-free language $\{a^n b^n c^n \bot \mid n \ge 0\}$.

Theorem 4.6 The path language of an rc tree language is not necessarily context free.
The remainder of this section is devoted to the proof of a pumping lemma for paths in rpc tree languages. For this, it is technically useful to extend the derivation relation of rc tree grammars to forests. This is defined in such a way that there is a derivation step $(t_1, \dots, t_n) \Rightarrow_G (t'_1, \dots, t'_n)$ if $f[t_1, \dots, t_n] \Rightarrow_G f[t'_1, \dots, t'_n]$ for an auxiliary symbol $f^{(n)}$.

Definition 4.7 Let $G = (N, \Sigma, R, S)$ be an rc tree grammar and $t_1, \dots, t_n \in T_\Sigma(N)$. There is a derivation step $(t_1, \dots, t_n) \Rightarrow_G (t'_1, \dots, t'_n)$ if there is a rule $A \to r\ (P; F)$ and an index $i \in [n]$ such that the following hold:
1. $t_i = s[\![A]\!]$ and $t'_i = s[\![r]\!]$ for a tree $s \in T_{\Sigma \cup N}(X_1)$ containing $x_1$ exactly once,
2. $t'_j = t_j$ for all $j \in [n] \setminus \{i\}$, and
3. all symbols in $P$ and no symbols in $F$ occur in $t_1, \dots, t_{i-1}, s, t_{i+1}, \dots, t_n$.

Now, suppose we have a derivation $(t_1, \dots, t_n) \Rightarrow^*_G (t'_1, \dots, t'_n)$ in an rpc tree grammar. The following simple lemma states that we may then copy and swap around the trees in these forests in an arbitrary way (but consistently and without deleting trees). The second forest will still be derivable from the first.

Lemma 4.8 Let $G$ be an rpc tree grammar and $t_1, t'_1, \dots, t_n, t'_n \in T_\Sigma(N)$. Let $\sigma\colon [m] \to [n]$ be surjective. If there is a derivation $(t_1, \dots, t_n) \Rightarrow^*_G (t'_1, \dots, t'_n)$, then there is also a derivation $(t_{\sigma(1)}, \dots, t_{\sigma(m)}) \Rightarrow^*_G (t'_{\sigma(1)}, \dots, t'_{\sigma(m)})$.

Proof It suffices to consider a single step $(t_1, \dots, t_n) \Rightarrow_G (t'_1, \dots, t'_n)$. Assume, without loss of generality, that the derivation step replaces a nonterminal of $t_1$ and that $t_{\sigma(1)}, \dots, t_{\sigma(k)}$ are the copies of $t_1$. More precisely, $t'_2 = t_2, \dots, t'_n = t_n$ and $\sigma^{-1}(1) = \{1, \dots, k\}$. Since $\sigma$ is surjective, all nonterminals in $t_1, \dots, t_n$ occur in $t_{\sigma(k)}, \dots, t_{\sigma(m)}$. Hence, as $G$ is permitting,

$$\begin{aligned}
(t_{\sigma(1)}, \dots, t_{\sigma(m)}) &\Rightarrow_G (t'_1, t_{\sigma(2)}, \dots, t_{\sigma(m)}) \\
&\Rightarrow_G \cdots \\
&\Rightarrow_G (t'_1, \dots, t'_1, t_{\sigma(k+1)}, \dots, t_{\sigma(m)}) \\
&= (t'_{\sigma(1)}, \dots, t'_{\sigma(m)}).
\end{aligned}$$

Next, we prove a rather technical lemma that allows us to simultaneously pump parts of a tree derived by an rpc tree grammar. Let us first try to convey the idea behind this lemma at an intuitive level. Suppose we have a derivation $S \Rightarrow^* t_1 \Rightarrow^* t_2 \Rightarrow^* t$ such that $\mathit{cnt}(t_1) \le \mathit{cnt}(t_2)$. Intuitively, the assumption means that the nonterminals $A_1, \dots, A_m$ in $t_1$ are among the nonterminals $B_1, \dots, B_n$ in $t_2$, where some of them may have been copied and entirely new ones may have been added, but none has been removed. By Lemma 4.8, this means that the rules that have been applied to $A_1, \dots, A_m$ in $t_1$ can also be applied to their copies in $t_2$. In other words, the part of the derivation leading from $t_1$ to $t_2$ may be repeated. More precisely, if $B_j$ is a copy of a nonterminal $A_i$ that has been replaced by the subtree $s(j)$ in $t_1 \Rightarrow^* t_2$, we can continue the derivation in such a way that each such $B_j$ is replaced by the corresponding $s(j)$, and this can be repeated as often as we like. Finally, the derivation is terminated by deriving an appropriate number of copies of the subtrees $s'_1, \dots, s'_n$ derived from $B_1, \dots, B_n$ in the original derivation.

Figure 4 illustrates the situation by means of an example. The nonterminals $A, A, B$ are those in $t_1$ and $B, B, A, A, C$ are those in $t_2$ (i.e., $A_1, A_2, A_3$ and $B_1, \dots, B_5$, respectively). The subtrees derived from them in $t_1 \Rightarrow^* t_2$ and $t_2 \Rightarrow^* t$ are $s_1, s_2, s_3$ and $s'_1, \dots, s'_5$, respectively. Intuitively, the fact that $s(3) = s_1$ and $s(4) = s_2$ indicates that $B_3$ is regarded as a copy of $A_1$ and $B_4$ as a copy of $A_2$. One could equally well choose $s(3) = s_2$ and $s(4) = s_1$, but note that $s(3) = s(4) = s_1$ is not possible because both $A_1$ and $A_2$ (i.e., both $A$'s) must be present among $B_1, \dots, B_5$. Note also that $s(5) = x_5$, reflecting the fact that $B_5 = C$ does not occur at all among $A_1, \dots, A_m$.
The lower part of the figure shows in a schematic way the tree obtained by pumping once.

[Figure 4: The pumping situation of Lemma 4.9. In the figure, $s(1) = s(2) = s_3$, $s(3) = s_1$, $s(4) = s_2$ and $s(5) = x_5$.]

Lemma 4.9 Let $G$ be an rpc tree grammar and consider a derivation

$$t_1[\![A_1, \dots, A_m]\!] \Rightarrow^* \underbrace{t_1[\![s_1, \dots, s_m]\!]}_{t_2}[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s'_1, \dots, s'_n]\!]$$

with $\mathit{cnt}(t_1[\![A_1, \dots, A_m]\!]) \le \mathit{cnt}(t_2[\![B_1, \dots, B_n]\!])$. For all $j \in [n]$, choose $s(j) \in \{s_i \mid i \in [m],\ A_i = B_j\} \cup \{x_j\}$ in such a way that $\{s_1, \dots, s_m\} \subseteq \{s(1), \dots, s(n)\}$. Then, for all $k \in \mathbb{N}$, there is a derivation

$$t_2[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s(1), \dots, s(n)]\!]^k[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s(1), \dots, s(n)]\!]^k[\![s'_1, \dots, s'_n]\!]$$

(where the superscript $k$ indicates $k$-fold iteration).

Proof Let $j_1, \dots, j_k \in [n]$ be the indices $j \in [n]$ for which $B_j \in \{A_1, \dots, A_m\}$. Since $\{s_1, \dots, s_m\} \subseteq \{s(1), \dots, s(n)\}$, Lemma 4.8 applies and yields a derivation $(B_{j_1}, \dots, B_{j_k}) \Rightarrow^* (s(j_1)[\![B_1, \dots, B_n]\!], \dots, s(j_k)[\![B_1, \dots, B_n]\!])$. Moreover, for all $j \in [n]$ with $B_j \notin \{A_1, \dots, A_m\}$, we have $s(j)[\![B_1, \dots, B_n]\!] = B_j$ and thus $B_j \Rightarrow^* s(j)[\![B_1, \dots, B_n]\!]$. Hence,

$$(B_1, \dots, B_n) \Rightarrow^* (s(1)[\![B_1, \dots, B_n]\!], \dots, s(n)[\![B_1, \dots, B_n]\!])$$

and thus

$$t_2[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s(1)[\![B_1, \dots, B_n]\!], \dots, s(n)[\![B_1, \dots, B_n]\!]]\!] = t_2[\![s(1), \dots, s(n)]\!][\![B_1, \dots, B_n]\!].$$

Repeating the argument, it follows that

$$t_2[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s(1), \dots, s(n)]\!]^k[\![B_1, \dots, B_n]\!].$$

In a similar way, it follows from Lemma 4.8 that

$$t_2[\![s(1), \dots, s(n)]\!]^k[\![B_1, \dots, B_n]\!] \Rightarrow^* t_2[\![s(1), \dots, s(n)]\!]^k[\![s'_1, \dots, s'_n]\!],$$

which completes the proof.

The next lemma states that Lemma 4.9 can be used to pump a path if, in the situation of that lemma, some $B_j$ is equal to the $A_i$ it has been derived from.
(For the lemma, recall that $t/u$ denotes the subtree rooted at node $u$ in a tree $t$.)

Lemma 4.10 Let $G = (N, \Sigma, R, S)$ be an rpc tree grammar and assume that there is a derivation $S \Rightarrow^* t \Rightarrow^* t' \Rightarrow^* t''$ such that $\mathit{cnt}(t) \le \mathit{cnt}(t')$ and $t'' \in T_\Sigma$. If there are nodes $u \in V(t)$ and $v \in V(t'/u)$ with $\mathit{path}_t(u) = xA$ and $\mathit{path}_{t'/u}(v) = yA$ for some $A \in N$, then $x y^k z \in \mathit{paths}(L(G))$ for all $k \ge 1$ and $z \in \mathit{paths}(t''/uv)$.

Proof By the assumptions, Lemma 4.9 applies with $t_1[\![A_1,\dots,A_m]\!] = t$, $t_2[\![B_1,\dots,B_n]\!] = t' = t_1[\![s_1,\dots,s_m]\!]$, and $t_2[\![s'_1,\dots,s'_n]\!] = t''$. Furthermore, if $t_1(u) = x_i$ (i.e., $A_i$ is the nonterminal $A$ from the statement of the lemma) and $t_2(uv) = x_j$, then we have $A_i = B_j$ and can thus in particular choose $s(j) = s_i$. (For example, in Figure 4 one could have chosen $s(3) = s_2$ instead of $s(3) = s_1$.) Then, each tree $t_2[\![s(1),\dots,s(n)]\!]^k[\![s'_1,\dots,s'_n]\!]$ contains the path $x y^k z$ for every $z \in \mathit{paths}(s'_j) = \mathit{paths}(t''/uv)$, namely the strings on paths passing through the node $u v^k$.

Next, we state the rather obvious fact that the situation described in Lemma 4.10 must necessarily occur if we have a sufficiently large number of trees $t_0,\dots,t_\varphi$ such that $\mathit{cnt}(t_0) \le \dots \le \mathit{cnt}(t_\varphi)$.

Lemma 4.11 Let $G = (N, \Sigma, R, S)$ be an rpc tree grammar such that all right-hand sides of rules in $R$ are in $N \cup \Sigma(N)$.³ For every derivation $t_0 \Rightarrow^* t_1 \Rightarrow^* \dots \Rightarrow^* t_\varphi$ with $\varphi > |N|$ and $\mathit{height}(t_0) < \dots < \mathit{height}(t_\varphi)$, there are $i, j$ ($0 \le i < j \le \varphi$) and nodes $u \in V(t_i)$, $uv \in V(t_j)$ such that $t_i(u) = t_j(uv) \in N$ and $v \ne \lambda$.

Proof A straightforward induction on $\varphi$ shows that there is a node $v_1 \cdots v_\varphi \in V(t_\varphi)$ such that $v_1,\dots,v_\varphi \ne \lambda$ and $v_1 \cdots v_i$ is a node of $t_i$ with $t_i(v_1 \cdots v_i) \in N$, for each $i \in \{0,\dots,\varphi-1\}$. Thus, if $\varphi > |N|$, the nodes as described in the claim must exist among $\{\lambda, v_1, \dots, v_1 \cdots v_{\varphi-1}\}$.
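The counting step in the proof of Lemma 4.11 can be made concrete with a small sketch. The Python function below is illustrative only and not part of the original development: the name `repeated_nonterminal` and the list encoding of the labels $t_i(v_1 \cdots v_i)$ are assumptions made here. Given more labels than there are nonterminals, it returns indices $i < j$ that carry the same nonterminal, which is the pigeonhole argument the proof relies on.

```python
def repeated_nonterminal(labels):
    """Return the first pair (i, j) with i < j and labels[i] == labels[j].

    labels[i] is assumed to be the nonterminal t_i(v_1...v_i) found on the
    chosen path in the i-th tree of the derivation. If the list is longer
    than the number of distinct nonterminals, such a pair must exist.
    Returns None if all labels are distinct.
    """
    first_seen = {}
    for j, label in enumerate(labels):
        if label in first_seen:
            return first_seen[label], j
        first_seen[label] = j
    return None


# Hypothetical example with N = {S, A, B}: four labels force a repetition.
print(repeated_nonterminal(["S", "A", "B", "A"]))
```

In the setting of the lemma, the returned pair plays the role of $i$ and $j$, and the node $v = v_{i+1} \cdots v_j$ is nonempty because every $v_l$ is.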
The proof of the pumping lemma also uses [EvdW00, Lemma 3], stated below in a slightly generalized form that follows easily from the original formulation.

Lemma 4.12 Let $b_1, b_2, \dots$ be an infinite sequence of vectors in $\mathbb{N}^m$. For every $\varphi \in \mathbb{N}$ there exists a number $d$ such that if $c_1, c_2, \dots$ is a sequence with $c_i \le b_i$, then there is a subsequence $c_{i_1} \le c_{i_2} \le \dots \le c_{i_{\varphi+1}}$ with $i_{\varphi+1} \le d$.

We are now ready to prove the promised pumping lemma.

Theorem 4.13 For every rpc tree grammar $G$ there exists $d \in \mathbb{N}$ such that the following holds for every tree $t \in L(G)$ with $\mathit{height}(t) \ge d$: There is some $xyz \in \mathit{paths}(t)$ such that $|xy| \le d$, $y \ne \lambda$, and $x y^k z \in \mathit{paths}(L(G))$ for all $k \ge 1$.

Proof Let $G = (N, \Sigma, R, S)$ and $\varphi = |N| + 1$. By Lemmas 4.9–4.11 it suffices to show that $d$ can be selected in such a way that every derivation yielding a tree $t$ with $\mathit{height}(t) \ge d$ has the form
$$S = t_0 \Rightarrow^* t_1 \Rightarrow^* \dots \Rightarrow^* t_\varphi \Rightarrow^* t,$$
where $\mathit{height}(t_0) < \dots < \mathit{height}(t_\varphi)$ and $\mathit{cnt}(t_0) \le \dots \le \mathit{cnt}(t_\varphi)$. For this, we use Lemma 4.12. By Theorem 3.6, it may be assumed that the right-hand sides of rules in $R$ are elements of $N \cup \Sigma(N)$. To exploit Lemma 4.12, let $b_i = (e^i, \dots, e^i)$, where $e$ is the maximal rank of symbols in $\Sigma$. For the given $\varphi$ choose $d$ as in the lemma. Now, let $\mathit{height}(t) \ge d$. Every derivation of $t$ can obviously be written as
$$S = s_0 \Rightarrow^* s_1 \Rightarrow^* \dots \Rightarrow^* s_d \Rightarrow^* t,$$
where $s_i$ is the first tree $s$ in the derivation satisfying $\mathit{height}(s) = i$. In particular, $\mathit{height}(s_0) < \dots < \mathit{height}(s_d)$ and $\mathit{cnt}(s_i) \le b_i$ for all $i \in [d]$. Thus, by Lemma 4.12, there are $i_0, \dots, i_\varphi$ ($0 \le i_0 < \dots < i_\varphi \le d$) with $\mathit{cnt}(s_{i_0}) \le \dots \le \mathit{cnt}(s_{i_\varphi})$. The proof is completed by choosing $t_j = s_{i_j}$ for $j = 0, \dots, \varphi$.

³ By Theorem 3.6, this assumption is not a serious restriction.

5 Random context tree transducers

In the two previous sections we explored the effects of adding random context regulation to a tree generating device, namely the regular tree grammar.
In this section we focus on the transformation of trees by equipping the top-down tree transducer with rc regulation. Let us first recall the definition of top-down tree transducers. This device turns an input tree into an output tree, starting at the root and working its way towards the leaves.

Definition 5.1 (top-down tree transducer) A top-down tree transducer (td transducer, for short) is a quintuple $\mathit{td} = (\Sigma, \Sigma', Q, R, q_0)$, where $\Sigma$ and $\Sigma'$ are signatures, the input and the output signature, $Q$ is a signature of states of rank 1 with $Q \cap (\Sigma \cup \Sigma') = \emptyset$, $R$ is a finite set of rules, and $q_0 \in Q$ is the initial state. Every rule in $R$ has the form
$$q\,f[x_1, \dots, x_n] \to r,$$
where $q \in Q$, $f \in \Sigma^{(n)}$ for some $n \in \mathbb{N}$, and $r \in T_{\Sigma'}(Q(X_n))$. For trees $t, t'$, there is a transduction step $t \mapsto_{\mathit{td}} t'$ if they can be written in the form $t = s[\![q\,f[s_1, \dots, s_n]]\!]$ and $t' = s[\![r[\![s_1, \dots, s_n]\!]]\!]$ (where $s$ contains $x_1$ exactly once), such that the rule $q\,f[x_1, \dots, x_n] \to r$ is in $R$.

Note that the right-hand side $r$ of a rule in a td transducer can always be written in the form $r'[\![q_1 x_{i_1}, \dots, q_k x_{i_k}]\!]$, where $r' \in T_{\Sigma'}(X_k)$, $q_1, \dots, q_k \in Q$, and $x_{i_1}, \dots, x_{i_k}$ are variables occurring in the left-hand side of the rule. If we denote the right-hand side of a rule in this form, this is always meant to imply that $r' \in T_{\Sigma'}(X_k)$ (for some suitable $k \in \mathbb{N}$) and that it contains each variable in $X_k$ exactly once. Thus, each of the terms $q_1 x_{i_1}, \dots, q_k x_{i_k}$ corresponds to a unique occurrence of the respective subtree in $r$. This convention carries over to rc td transducers as defined below.

We now add random context regulation to the td transducer, and the resulting device will henceforth be referred to as a random context td transducer. Whereas in the last section we defined random context in terms of nonterminals, we now use states for the same purpose.
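To make Definition 5.1 concrete, here is a minimal sketch of a td transducer in Python. The encoding is an assumption made for illustration: trees are tuples `(symbol, child, ...)`, a term $q\,x_i$ in a right-hand side is written `("call", q, i)`, and the example rules (which mirror a binary tree while relabelling `f` to `g`) are hypothetical. Definition 5.1 allows several rules for the same state and symbol; the sketch assumes at most one, so the transducer is deterministic.

```python
def transduce(rules, state, tree):
    """Apply the rule for (state, root symbol) and process subtrees top-down."""
    symbol, *children = tree
    rhs = rules[(state, symbol)]  # KeyError: no rule, the computation is stuck
    return instantiate(rules, rhs, children)


def instantiate(rules, rhs, children):
    """Build the output for a right-hand side, given the input subtrees."""
    if rhs[0] == "call":  # ("call", q, i) encodes the term q x_i
        _, state, index = rhs
        return transduce(rules, state, children[index - 1])
    symbol, *subterms = rhs
    return (symbol, *(instantiate(rules, s, children) for s in subterms))


# Hypothetical rules: q0 f[x1, x2] -> g[q0 x2, q0 x1], q0 a -> a, q0 b -> b.
MIRROR = {
    ("q0", "f"): ("g", ("call", "q0", 2), ("call", "q0", 1)),
    ("q0", "a"): ("a",),
    ("q0", "b"): ("b",),
}

tree = ("f", ("a",), ("f", ("b",), ("a",)))
print(transduce(MIRROR, "q0", tree))
```

The recursion follows the order of the definition: the rule is chosen from the state and the root symbol alone, and each occurrence of $q\,x_i$ continues the computation on the $i$-th input subtree.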
Definition 5.2 (rc td transducer) A random context top-down tree transducer (rc td transducer, for short) is a quintuple $\mathit{rctd} = (\Sigma, \Sigma', Q, R, q_0)$ that is defined like a td transducer in every respect, except that the rules have the extended form
$$q\,f[x_1, \dots, x_n] \to r \quad (P; F),$$
where $P, F \subseteq Q$. For a tree $t$, let $\mathit{states}(t)$ denote the set of all states occurring in $t$. Now, given another tree $t'$, there is a transduction step $t \mapsto_{\mathit{rctd}} t'$ if the two trees can be written in the form $t = s[\![q\,f[s_1, \dots, s_n]]\!]$ and $t' = s[\![r[\![s_1, \dots, s_n]\!]]\!]$ (where $s$ contains $x_1$ exactly once), such that $R$ contains a rule $q\,f[x_1, \dots, x_n] \to r \ (P; F)$ with
- $P \subseteq \mathit{states}(s)$ and
- $F \cap \mathit{states}(s) = \emptyset$.
The rc td transduction computed by $\mathit{rctd}$ is given by $\mathit{rctd}(t) = \{t' \in T_{\Sigma'} \mid q_0\,t \mapsto^*_{\mathit{rctd}} t'\}$ for every tree $t \in T_\Sigma$. For a set $T \subseteq T_\Sigma$, we let $\mathit{rctd}(T) = \bigcup_{t \in T} \mathit{rctd}(t)$.

Obviously, a td transducer can be identified with an rc td transducer in which $P = \emptyset = F$ for all rules. Thus, the rc td transducer generalizes the td transducer just like the rc tree grammar generalizes the regular tree grammar. An rc td transducer whose every forbidding context is empty is an rpc td transducer, and an rc td transducer whose every permitting context is empty is an rfc td transducer. If every rule of an rc td transducer contains each variable at most once in its right-hand side, then it is said to be linear.

In order to get some insight into how powerful the different types of rc td transducers are, we shall in the following study the characteristics of their domains and ranges: Given an rc td transducer $\mathit{rctd} = (\Sigma, \Sigma', Q, R, q_0)$, its domain is the set of all trees $t \in T_\Sigma$ such that $\mathit{rctd}(t) \ne \emptyset$, and its range is the set $\mathit{rctd}(T_\Sigma)$.

Example 5.3 Neither the domain nor the range of a nonlinear rpc td transduction is necessarily an rc tree language. Let us first consider the domain. We construct an rpc td transducer $\mathit{rctd}$ with input signature $\{a^{(1)}, b^{(1)}, \bot^{(0)}\}$ that works as follows.
It starts by consuming, in the first step, the root symbol $a$ of its monadic input tree and making two copies of the remaining input. (If the root symbol is not $a$, then the computation gets stuck already in this step.) In the next phase, it consumes the initial $a$'s of the left copy until it reaches a $b$ or $\bot$. It then continues by matching the $b$'s of the left copy with the $a$'s of the right copy (for simplicity always taking two at a time), using states $q_b$, $q_b'$, $q_a$ and $q_a'$. Only if there is an equal (and even) number of $a$'s and $b$'s will it terminate. It follows that the domain of $\mathit{rctd}$ is $\{a^{2n} b^{2n} \bot \mid n \ge 1\}$, which is not regular, and hence not random context (by Observation 3.5).

Explicitly, $\mathit{rctd} = (\Sigma, \Sigma', Q, R, q_0)$ is given by the following components; the reader should easily be able to check that it works as described above:

$\Sigma = \{a^{(1)}, b^{(1)}, \bot^{(0)}\}$,
$\Sigma' = \{g^{(2)}, \bot^{(0)}\}$,
$Q = \{q_0, q, q_a, q_a', q_b, q_b'\}$,
and $R$ consists of the following rules (a rule without contexts has $P = F = \emptyset$):

Copy input, except for first $a$:
$q_0\,a\,x_1 \to g[q\,x_1,\ q_a\,x_1]$

Delete all $a$'s in the first copy:
$q\,a\,x_1 \to q\,x_1$

Delete the topmost $b$ in the first copy:
$q\,b\,x_1 \to q_b\,x_1$

Match two $a$'s with two $b$'s, so that $g[q_b\,b\,b\,s,\ q_a\,a\,a\,t] \mapsto^4 g[q_b\,s,\ q_a\,t]$:
$q_a\,a\,x_1 \to q_a'\,x_1 \quad (\{q_b\}; \emptyset)$
$q_b\,b\,x_1 \to q_b'\,x_1 \quad (\{q_a'\}; \emptyset)$
$q_a'\,a\,x_1 \to q_a\,x_1 \quad (\{q_b'\}; \emptyset)$
$q_b'\,b\,x_1 \to q_b\,x_1 \quad (\{q_a\}; \emptyset)$

Terminate if $q_a'$ reaches the first $b$ exactly when $q_b'$ reaches $\bot$:
$q_a'\,b\,x_1 \to \bot \quad (\{q_b'\}; \emptyset)$
$q_b'\,\bot \to \bot$

Now, let $\mathit{rctd}'$ be defined similarly to $\mathit{rctd}$, but modified to keep a copy of the inspected monadic subtree, as indicated in Figure 5. Then the range of $\mathit{rctd}'$ cannot be an rc tree language because of Theorem 4.3.

[Figure 5: The range of a nonlinear rc td transduction is not necessarily an rc tree language.]

Clearly, every rc tree language is the range of an rc td transduction having a monadic input signature. Together with the example above, this yields the following theorem.
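The domain claimed in Example 5.3 can be checked experimentally with a small simulation. The Python sketch below is not from the original text and rests on several assumptions: the rule table is our reading of the example's rule list (with the permitting context $\{q_b'\}$ attached to the rule for $q_a'$ on $b$), `#` stands for $\bot$, `qa1` and `qb1` stand for $q_a'$ and $q_b'$, and output trees are not built because only the domain is of interest. A configuration is the tuple of pending state occurrences, each paired with the rest of its monadic input; the context of an occurrence is the set of states of the other occurrences.

```python
from collections import deque

# (state, input symbol) -> (successor states on the remaining input,
#                           permitting context); forbidding contexts are empty.
RULES = {
    ("q0", "a"): (("q", "qa"), frozenset()),     # copy input, drop first a
    ("q", "a"): (("q",), frozenset()),           # delete a's in the left copy
    ("q", "b"): (("qb",), frozenset()),          # delete the topmost b
    ("qa", "a"): (("qa1",), frozenset({"qb"})),
    ("qb", "b"): (("qb1",), frozenset({"qa1"})),
    ("qa1", "a"): (("qa",), frozenset({"qb1"})),
    ("qb1", "b"): (("qb",), frozenset({"qa"})),
    ("qa1", "b"): ((), frozenset({"qb1"})),      # right copy terminates
    ("qb1", "#"): ((), frozenset()),             # left copy terminates
}


def in_domain(word):
    """Return True if some computation on the monadic tree `word` terminates."""
    start = (("q0", word),)
    seen = {start}
    queue = deque([start])
    while queue:
        config = queue.popleft()
        if not config:  # no state left: an output tree has been produced
            return True
        for i, (state, rest) in enumerate(config):
            rule = RULES.get((state, rest[0])) if rest else None
            if rule is None:
                continue
            successors, permitting = rule
            context = {s for j, (s, _) in enumerate(config) if j != i}
            if not permitting <= context:
                continue
            step = tuple((s, rest[1:]) for s in successors)
            nxt = config[:i] + step + config[i + 1:]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


accepted = [(m, n) for m in range(7) for n in range(7)
            if in_domain("a" * m + "b" * n + "#")]
print(accepted)
```

The search explores every interleaving of the two copies, so a word is reported as accepted exactly when at least one order of rule applications leads to a state-free tree under the assumed rule table.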
Theorem 5.4 The class of rc tree languages is contained in the class of ranges of rc td transductions. Furthermore, even for rpc td transductions neither the domain nor the range is necessarily an rc tree language. These statements remain true if restricted to tree transductions with monadic input signatures.

Intuitively, the power of the rc td transducer in Example 5.3 is due to its nonlinearity. It is therefore natural to wonder what happens if linearity is required. As the following results show, the domains and ranges of rc td transducers are indeed rc tree languages in this case.

Lemma 5.5 The domain of every linear rc td transduction is an rc tree language. This is also true if rc is replaced by rpc or rfc.

Proof The domain of a linear rc td transducer $\mathit{rctd} = (\Sigma, \Sigma', Q, R, q_0)$ is generated by $G = (Q \cup \{A\}, \Sigma, R', q_0)$, where $R'$ is given as follows: For every rule $q\,f[x_1, \dots, x_k] \to r[\![q_1 x_{i_1}, \dots, q_l x_{i_l}]\!] \ (P; F)$ in $R$, $R'$ contains the rule
$$q \to f[A_1, \dots, A_k] \quad (P; F),$$
where $A_j$ ($1 \le j \le k$) is the unique $q_m$ such that $i_m = j$, if such an $m \in [l]$ exists, and $A_j = A$ otherwise. In addition, $R'$ contains all rules $A \to f[A, \dots, A]$ such that $f \in \Sigma$ (thus, $A$ generates $T_\Sigma$). Note that this construction preserves the permitting and forbidding properties. Using the fact that $\mathit{rctd}$ is linear, correctness can be proved by induction on the length of derivations. We omit this proof because it is both straightforward and very technical.

Lemma 5.6 The range of every linear rc td transduction is an rc tree language. This is also true if rc is replaced by rpc or rfc.

Proof The range of a linear rc td transducer $\mathit{rctd} = (\Sigma, \Sigma', Q, R, q_0)$ is generated by $G = (Q, \Sigma', R', q_0)$, where $R'$ is given as follows: For every rule $q\,f[x_1, \dots, x_k] \to r[\![q_1 x_{i_1}, \dots, q_l x_{i_l}]\!] \ (P; F)$ in $R$, $R'$ contains the rule $q \to r[\![q_1, \dots, q_l]\!] \ (P; F)$. Again, the construction preserves the permitting and forbidding properties.
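The construction of $R'$ for the range is purely syntactic and can be sketched in a few lines. The Python fragment below is an illustration under assumed encodings, none of which come from the original: a transducer rule is a tuple `(q, f, rhs, P, F)`, output trees are tuples `(symbol, child, ...)`, a term $q_i\,x_j$ is written `("call", q_i, j)`, and a nonterminal in a grammar rule is a bare string. The example rules are hypothetical.

```python
def range_grammar_rules(transducer_rules):
    """Turn each rule q f[...] -> r (P; F) into the grammar rule q -> r' (P; F),
    where r' is r with every term q_i x_j replaced by the nonterminal q_i."""

    def strip(term):
        if term[0] == "call":  # ("call", q_i, j) encodes q_i x_j
            return term[1]     # keep only the state, now used as a nonterminal
        symbol, *subterms = term
        return (symbol, *(strip(s) for s in subterms))

    return [(q, strip(rhs), P, F) for (q, _f, rhs, P, F) in transducer_rules]


# Hypothetical linear rules: q0 f[x1, x2] -> g[q1 x2, c]   (no contexts)
#                            q1 a[x1]     -> b[q1 x1]      (forbidding {q0})
EXAMPLE_RULES = [
    ("q0", "f", ("g", ("call", "q1", 2), ("c",)), frozenset(), frozenset()),
    ("q1", "a", ("b", ("call", "q1", 1)), frozenset(), frozenset({"q0"})),
]

for rule in range_grammar_rules(EXAMPLE_RULES):
    print(rule)
```

The input symbol and the variable indices are simply discarded, and the contexts $(P; F)$ are copied unchanged, which is why the permitting and forbidding properties carry over. The sketch does not check linearity; that is assumed of the input.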
As in the proof of the previous lemma, we omit the technical but straightforward inductive correctness proof.

Lemma 5.5 and Lemma 5.6, when combined, prove the following theorem.

Theorem 5.7 Both the domain and the range of a linear rc td transduction are rc tree languages. This is also true if rc is replaced by rpc or rfc.

[Figure 6: Another rpc tree language]

Corollary 5.8 If $L$ is the domain or range of a linear rc td transducer, then $\mathit{yield}(L)$ is an rc string language. This is also true if rc is replaced by rpc or rfc.

By definition, the range of an rc td transduction $\mathit{rctd}$ is the image of a particular tree language under $\mathit{rctd}$, namely $\mathit{rctd}(T_\Sigma)$. What happens if we look at images of rc tree languages under rc td transductions? As the following example shows, even the image of an rpc tree language of finite index under a linear rpc td transduction $\mathit{rctd}$ is not necessarily rc.

Example 5.9 Let $L$ be the set of all trees such as depicted in Figure 6, which we generate using the rpc tree grammar $G = (N, \Sigma, R, S)$ whose components are

$N = \{S, A, A', A'', B, B', B'', C, C'\}$,
$\Sigma = \{a^{(1)}, b^{(1)}, g^{(3)}, \bot^{(0)}\}$,
and $R$ consists of the following rules:

$S \to g[A, B, C]$

Generate $g[a^m A'', B, a^m C]$:
$A \to a\,A' \quad (\{C\}; \emptyset)$
$C \to a\,C' \quad (\{A'\}; \emptyset)$
$A' \to A \quad (\{C'\}; \emptyset)$
$C' \to C \quad (\{A\}; \emptyset)$
$A \to A'' \quad (\{C\}; \emptyset)$

Generate $g[a^m A'', b^n B, a^m b^n C]$:
$B \to b\,B' \quad (\{C, A''\}; \emptyset)$
$C \to b\,C' \quad (\{B', A''\}; \emptyset)$
$B' \to B \quad (\{C', A''\}; \emptyset)$
$C' \to C \quad (\{B, A''\}; \emptyset)$
$B \to B'' \quad (\{C, A''\}; \emptyset)$

$A'' \to \bot$, $B'' \to \bot$, $C \to \bot$

A linear rpc td transducer $\mathit{rctd}$, using the technique from Example 5.3, repeatedly reads two symbols each from the first and second subtree of the root symbol $g$, and terminates only if $\bot$ is reached by both subcomputations at the same time. The third subtree is simply copied to the output. The rpc td transducer thus turns a tree $g[a^{2m}\bot, b^{2n}\bot, t]$ into $g[\bot, \bot, t]$ if and only if $m = n \ge 1$.
However, the path language of $\mathit{rctd}(L)$ is $\{g\,a^{2n} b^{2n} \bot \mid n \ge 1\}$. If $\mathit{rctd}(L)$ were rc, then $\mathit{paths}(\mathit{rctd}(L))$ would be regular by Theorem 4.3 (as $\mathit{rctd}(L)$ would obviously be of finite index in that case). Hence, $\mathit{rctd}(L)$ is not rc.

References

[Bra69] Walter S. Brainerd. Tree generating regular systems. Information and Control, 14:217–231, 1969.
[DEKK03] Frank Drewes, Sigrid Ewert, Renate Klempien-Hinrichs, and Hans-Jörg Kreowski. Computing raster images from grid picture grammars. Journal of Automata, Languages and Combinatorics, 8:499–519, 2003.
[DP89] Jürgen Dassow and Gheorghe Păun. Regulated Rewriting in Formal Language Theory. Springer-Verlag, Berlin, 1989.
[Dre96] Frank Drewes. Language theoretic and algorithmic properties of d-dimensional collages and patterns in a grid. Journal of Computer and System Sciences, 53:33–60, 1996.
[Dre05] Frank Drewes. Grammatical Picture Generation – A Tree-Based Approach. Texts in Theoretical Computer Science. An EATCS Series. Springer-Verlag, 2005. To appear.
[Eng80] Joost Engelfriet. Some open questions and recent results on tree transducers and tree languages. In R. V. Book, editor, Formal Language Theory: Perspectives and Open Problems, pages 241–286. Academic Press, New York, 1980.
[Eng94] Joost Engelfriet. Graph grammars and tree transducers. In S. Tison, editor, Proc. CAAP 94, volume 787 of Lecture Notes in Computer Science, pages 15–37. Springer-Verlag, 1994.
[EvdW99a] Sigrid Ewert and Andries van der Walt. A hierarchy result for random forbidding context picture grammars. International Journal of Pattern Recognition and Artificial Intelligence, 13:997–1007, 1999.
[EvdW99b] Sigrid Ewert and Andries van der Walt. Random context picture grammars. Publicationes Mathematicae (Debrecen), 54 (Supp):763–786, 1999.
[EvdW00] Sigrid Ewert and Andries van der Walt. A shrinking lemma for random forbidding context languages. Theoretical Computer Science, 237:149–158, 2000.
[EvdW02] Sigrid Ewert and Andries van der Walt. A pumping lemma for random permitting context languages. Theoretical Computer Science, 270:959–967, 2002.
[EvdW03] Sigrid Ewert and Andries van der Walt. A property of random context picture grammars. Theoretical Computer Science, 301:313–320, 2003.
[FV98] Zoltán Fülöp and Heiko Vogler. Syntax-Directed Semantics: Formal Models Based on Tree Transducers. Springer-Verlag, 1998.
[GS84] Ferenc Gécseg and Magnus Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.
[GS97] Ferenc Gécseg and Magnus Steinby. Tree languages. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages. Vol. III: Beyond Words, chapter 1, pages 1–68. Springer-Verlag, 1997.
[MW67] J. Mezei and Jesse B. Wright. Algebraic automata and context-free sets. Information and Control, 11:3–29, 1967.
[Rou68] William C. Rounds. Trees, Transducers and Transformations. PhD thesis, Stanford University, 1968.
[Rou70] William C. Rounds. Mappings and grammars on trees. Mathematical Systems Theory, 4:257–287, 1970.
[Tha70] James W. Thatcher. Generalized² sequential machine maps. Journal of Computer and System Sciences, 4:339–367, 1970.
[Wal72] Andries van der Walt. Random context languages. Information Processing, 71:66–68, 1972.
