Regular tree grammar
In computer science, a regular tree grammar (RTG)[1] is a formal grammar that describes a set of directed trees.
Definition
A regular tree grammar is defined by the tuple
,
where
- is a set of nonterminals,
- is a ranked alphabet (i.e., an alphabet whose symbols have an associated arity) disjoint from ,
- is the starting nonterminal, with , and
- is a set of productions of the form , where , and , where is the associated term algebra, i.e. the set of all trees composed from symbols in according to their arities, where nonterminals are considered nullary.
Derivation of trees
The grammar implicitly defines a set of trees: any tree that can be derived from using the rule set is said to be described by . This set of trees is known as the language of . To express this more formally, we define the relation on the set as follows:
Wwe say that can be derived in a single step into a tree (in short: ), if there is a context and a production such that:
- , and
- .
Here, a context means a tree with exactly one hole in it; if is such a context, we denote by the result of filling the tree into the hole of .
The tree language generated by is the language .
Here, denotes the set of all trees composed from symbols of , while denotes successive applications of .
A language generated by some regular tree grammar is called a regular tree language.
Examples
Let , where
- is our set of nonterminals,
- is our ranked alphabet, arities indicated by dummy arguments (i.e. the symbol has arity 2),
- is our starting nonterminal, and
- the set consists of the following productions:
An example derivation is:
.
The tree language generated by is the set of all finite lists of boolean values, that is, happens to equal . The grammar corresponds to the algebraic data type declarations
datatype Bool
= false
| true
datatype BList
= nil
| cons of Bool * BList
in the Standard ML programming language: every member of corresponds to a Standard-ML value of type BList.
For another example, let , using the nonterminal set and the alphabet from above, but extending the production set by , consisting of the following productions:
The language is the set of all finite lists of boolean values that contain at least once. The set has no datatype counterpart in Standard ML, nor in any other functional language. It is a proper subset of .
Language properties
If both are regular tree languages, then the tree sets , , and are also regular tree languages, and it is decidible whether , and whether .
Alternative characterizations and relation to other formal languages
As shown by Rajeev Alur and Parthasarathy Madhusudan[2][3] the class of regular tree languages coincides with nested words and visibly pushdown languages.
The regular tree languages are also[4] the languages recognized by bottom-up tree automata and nondeterministic top-down tree automata.
Regular tree grammars are a generalization of regular word grammars.
See also
References
- ^ "Regular tree grammars as a formalism for scope underspecification". CiteSeerx: 10.1.1.164.5484.
{{cite journal}}
: Cite journal requires|journal=
(help) - ^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1145/1007352.1007390, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with
|doi=10.1145/1007352.1007390
instead. - ^ Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi:10.1145/1516512.1516518, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with
|doi=10.1145/1516512.1516518
instead. - ^ Comon et al, Tree Automata Techniques and Applications, 1997