Some classifications of context-free languages

Jozef Gruska

1969, Information and Control

INFORMATION AND CONTROL 14, 152-179 (1969)

Some Classifications of Context-Free Languages

J. GRUSKA¹

Mathematical Institute of Slovak Academy of Sciences, Bratislava, Czechoslovakia

1. INTRODUCTION AND SUMMARY

The basic definitions and notations of the theory of context-free grammars and languages (briefly, grammars and languages) used in this paper are as in Ginsburg (1966). The classification of languages $L$ according to the minimal number of variables in grammars for $L$ was studied in Gruska (1967). In this paper some other classifications of grammars and languages are investigated. They are chosen in such a way as to characterize some aspects of our intuitive notion about the complexity (of the description) of grammars and languages and their intrinsic structure. The classifications of languages are induced by those of grammars.

The intrinsic structure of a grammar $G$ is characterized by the number and by the depth of the grammatical levels of $G$. A grammatical level $G_0$ of a grammar $G$ is a maximal set of productions of $G$ the left-side symbols of which are mutually dependent. The basic concepts of grammatical levels and classifications of grammars and languages are given in Sections 2 and 3.

Only such classifications $K$ are considered here, wherein for every grammar $G$ (language $L$) $K(G)$ ($K(L)$) is an integer. In this paper only nonnegative integers will be considered. A classification $K$ is said to be connected in an alphabet $\Sigma$ if for every integer $n$ there is a language $L \subset \Sigma^*$ such that $K(L) = n$.
Sections 4 to 6 provide the proofs that the classifications according to the number of variables, the number of productions, the number of grammatical levels, the number of non-elementary grammatical levels (that is, the grammatical levels with at least two variables) and the maximal depth of grammatical levels (that is, according to the maximal number of variables in grammatical levels) are connected in any alphabet with at least two symbols.

¹ Currently at University of Minnesota, School of Mathematics, Minneapolis, Minn. 55455.

All these classifications of languages are based upon the classifications of grammars, and the integer associated to a language $L$ is the minimum of those associated to all grammars for $L$. If only a restricted class of grammars for $L$ is considered, we speak about bounded classifications. They are studied in Section 7, where especially the case of regular events, one-side linear grammars and non-self-embedding grammars is investigated.

Section 8 is devoted to the relations between classifications of languages. Besides general results this section provides the proof that if only classifications from Sections 4 to 6 are considered, then, with one exception, for any two of them, written $K$ and $K'$, there is a language $L$ such that the class of the simplest grammars for $L$ according to $K$ is disjoint with the class of the simplest grammars for $L$ according to $K'$.

In Section 9 the so-called multiple classifications are considered. The proof is given here that no two of the classifications considered in Sections 4 to 6 are symmetric and each two of them give a new classification which is again connected in any alphabet with at least two symbols.

In the final section some generalizations, open questions and problem areas are discussed.

2. GRAMMATICAL LEVELS

A grammatical level of a grammar $G$ is represented by a set of productions (of $G$) the left-side symbols of which are mutually dependent.
We can say that the left-side symbols of productions in a grammatical level are "equally complicated" or are "of the same level", or define "the equally complicated languages".

2.1. DEFINITION. Let $G = (V, \Sigma, P, \sigma)$ be a grammar. For a set $\alpha \subset V^*$ let $G(\alpha) = \{x;\ z \Rightarrow^* x \in \Sigma^*,\ z \in \alpha\}$. For two variables $A$, $B$ put $A \rhd_G B$ if there are strings $x$ and $y$ such that the production $A \to xBy$ is in $P$. Let $\rhd_G^*$ be the transitive and reflexive closure of the relation $\rhd_G$. For two variables $A$, $B$ put $A \equiv_G B$ if $A \rhd_G^* B$ and $B \rhd_G^* A$. For two productions $p_1 = A \to \alpha$, $p_2 = B \to \beta$ put $p_1 \equiv_1 p_2$ if and only if $A \equiv B$. When there is no danger of misunderstanding, the symbol specifying the grammar is omitted.

2.2. COROLLARY. The relations $\equiv$ on $V - \Sigma$ and $\equiv_1$ on $P$ are obviously equivalence relations.

The relation $\rhd$ is termed the dependence relation and was introduced in Culik (1962) and Kopřiva (1964). As to the relation $\equiv$, see Culik (1962). The relations $\equiv$ and $\equiv_1$ are termed level relations for variables and productions, respectively.

2.3. DEFINITION. Let $G = (V, \Sigma, P, \sigma)$ be a grammar and $\equiv_1$ the level relation on $P$. A set $G_0 \subset P$ is said to be a grammatical level of $G$ if and only if $G_0$ is an equivalence class with respect to the relation $\equiv_1$.

2.4. DEFINITION. A grammar $G = (V, \Sigma, P, \sigma)$ is said to be reduced (see Ginsburg (1966)) if for every variable $A \neq \sigma$ there are terminal strings $x$, $y$ and $z$ such that $\sigma \Rightarrow^* xAy \Rightarrow^* xzy$.

2.5. Remark. A context-free grammar is usually defined as a quadruple $(V, \Sigma, P, \sigma)$. If $G$ is a reduced grammar and every symbol from $\Sigma$ occurs in a string in $L(G)$, then $G$ is uniquely determined by $P$ and $\sigma$. The basic relations $\Rightarrow$, $\Rightarrow^*$, $\rhd$, $\rhd^*$, $\equiv$ do not depend on $\sigma$. That is why we regard grammatical levels as (non-initial) context-free grammars.

2.6. LEMMA.
For every grammar $G' = (V', \Sigma, P', \sigma)$ such that $e$ is not in $L(G')$ there is a grammar $G = (V, \Sigma, P, \sigma)$ such that the following conditions are satisfied:
(1) $V \subset V'$;
(2) if $A$, $B$ are in $V - \Sigma$, then $A \rhd_G^* B$ if and only if $A \rhd_{G'}^* B$;
(3) $L(G') = L(G)$;
(4) $G$ is an $e$-free and reduced grammar;
(5) there are no productions of the form $A \to B$, $B \in V - \Sigma$, in $P$;
(6) $A \rhd_G A$ for every variable $A \neq \sigma$, $A \in V - \Sigma$.

Proof. Let $G'$ be given and $e$ not in $L(G')$. By using the constructions given in Lemmas 1.4.2, 1.4.3, 1.8.2 and Theorems 1.8.1, 1.8.2 of Ginsburg (1966) we can easily construct a grammar $G_1$ for $L(G')$ such that for $G_1$ the conditions (1) to (5) are satisfied. Now, let $G_1 = (V_1, \Sigma, P_1, \sigma)$. Suppose that in $V_1 - \Sigma$ a variable $A \neq \sigma$ exists such that $A \rhd_{G_1} A$ does not hold. Let $G_2 = (V_1 - \{A\}, \Sigma, P_2, \sigma)$ be a grammar with

$P_2 = \{B \to u_1 v_1 u_2 \cdots v_n u_{n+1};\ B \to u_1 A u_2 \cdots A u_{n+1}$ is in $P_1$, $A$ does not occur in $u_1 u_2 \cdots u_{n+1}$ and $A \to v_i$ is in $P_1$ for $1 \le i \le n\}$.

Obviously for $G_2$ the conditions (1) to (5) are also satisfied and $G_2$ has fewer variables than $G_1$. By repeating the application of the last construction we obtain a grammar $G$ satisfying all the conditions (1) to (6).

2.7. DEFINITION. A grammar $G$ is said to be perfectly reduced if for $G$ the conditions (3) to (6) of the previous lemma are satisfied.

2.8. COROLLARY. If $G$ is a perfectly reduced grammar then $G(A)$ is an infinite set for every variable $A \neq \sigma$, $A$ in $G$.

3. CLASSIFICATIONS OF GRAMMARS AND LANGUAGES

3.1. DEFINITION. Denote by $\mathcal{G}$ the class of context-free grammars and by $\mathcal{L}$ the class of context-free languages. Moreover, let $I$ be the set of all nonnegative integers. A mapping $K: \mathcal{G} \to I$ ($K: \mathcal{L} \to I$) is said to be a classification of context-free grammars (languages).

The concept of a classification of grammars (languages) just defined is too general to obtain some more interesting results.
In the sequel only some special classifications will be studied. They will be chosen in such a way as to characterize some aspects of our intuitive notion about the complexity of grammars and languages. However, one can expect that some other classifications will also be found to be interesting and important, or even more important and more interesting. That is why the definitions are formulated rather generally.

There are many ways to associate a classification of languages with a given classification of grammars. Some of them will be investigated in the following. If a classification of grammars is meant to characterize the complexity of grammars, then it seems quite natural to extend $K$ to classify languages in the following manner:

3.2. DEFINITION. Let $K$ be a classification of (context-free) grammars. We extend $K$ to (classify) context-free languages by putting $K(L) = \min \{K(G);\ L(G) = L\}$ for every language $L$, and we shall speak about a natural extension of $K$ from $\mathcal{G}$ to $\mathcal{G} \cup \mathcal{L}$.

Remark. Whenever in the following $K$ is a classification of grammars and $K$ is applied to a language $L$, the natural extension of $K$ is supposed.

3.3. DEFINITION. A classification $K$ of languages is said to be nontrivial (connected) in an alphabet $\Sigma$ if for every integer $n > 0$ there is a language $L \subset \Sigma^*$ such that $K(L) \ge n$ ($K(L) = n$). A classification $K$ of grammars is said to be nontrivial (connected) in an alphabet $\Sigma$ if the natural extension of $K$ has this property.

4. CLASSIFICATION ACCORDING TO THE MAXIMAL DEPTH OF GRAMMATICAL LEVELS

4.1. DEFINITION. Let $G$ be a grammar and $G_0$ a grammatical level of $G$. Denote by $\mathrm{Dep}(G_0)$ the number of distinct variables on the left-hand sides of productions in $G_0$. Moreover, denote $\mathrm{Dep}(G) = \max \{\mathrm{Dep}(G_0);\ G_0$ is a grammatical level of $G\}$. $\mathrm{Dep}(G)$ ($\mathrm{Dep}(L)$) is called the maximal depth of grammatical levels of $G$ (of $L$).

In this section the classification of languages $L$ according to $\mathrm{Dep}(L)$ is studied.
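The level relation of Definition 2.1 and the depth of Definition 4.1 can be computed mechanically for a concrete grammar. The following is a minimal sketch, not taken from the paper: it assumes a grammar is given as a list of `(left_side, right_side)` pairs, treats every symbol that appears as a left side as a variable, and uses a small hypothetical grammar `toy` for illustration. The function names are illustrative only.

```python
from collections import defaultdict


def grammatical_levels(productions):
    """Return (classes, levels): the level classes of variables and the
    corresponding grammatical levels (sets of productions)."""
    # Assumption: every variable has at least one production, so the
    # variables are exactly the left-side symbols.
    variables = {lhs for lhs, _ in productions}

    # Direct dependence: A depends on B if B occurs on the right side
    # of some production with left side A.
    depends = defaultdict(set)
    for lhs, rhs in productions:
        depends[lhs].update(s for s in rhs if s in variables)

    # Reflexive and transitive closure of the dependence relation.
    reach = {A: {A} for A in variables}
    changed = True
    while changed:
        changed = False
        for A in variables:
            step = set().union(*(depends[B] for B in reach[A]))
            if not step <= reach[A]:
                reach[A] |= step
                changed = True

    # Level relation: A and B are equivalent if each depends on the other.
    classes, seen = [], set()
    for A in sorted(variables):
        if A in seen:
            continue
        cls = frozenset(B for B in variables
                        if B in reach[A] and A in reach[B])
        seen |= cls
        classes.append(cls)

    # A grammatical level collects the productions whose left sides
    # lie in one class.
    levels = [[p for p in productions if p[0] in cls] for cls in classes]
    return classes, levels


def dep(productions):
    """Maximal number of variables in a grammatical level."""
    classes, _ = grammatical_levels(productions)
    return max(len(cls) for cls in classes)


# Hypothetical example grammar (not from the paper):
#   S -> a S b | A,   A -> a B | a,   B -> b A | b
toy = [
    ("S", ("a", "S", "b")), ("S", ("A",)),
    ("A", ("a", "B")), ("A", ("a",)),
    ("B", ("b", "A")), ("B", ("b",)),
]
toy_classes, toy_levels = grammatical_levels(toy)
```

For `toy` the variables `A` and `B` depend on each other while `S` depends on both but not conversely, so the sketch finds two levels, one of depth 2 and one of depth 1. Note that this computes the depth of one particular grammar only; the depth of a language is a minimum over all its grammars and is not computed here.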
It is well-known that $\mathrm{Dep}(R) = 1$ for every regular set $R$ (Ginsburg, 1963b). Languages $L$ with $\mathrm{Dep}(L) = 1$ are called sequential. It is known that there is a sequential language which is not regular, for example $\{a^n b^n;\ n \ge 1\}$, and that there is a language which is not sequential (Ginsburg, 1963b). The following theorem asserts that a more detailed classification of (linear) languages is possible.

4.2. THEOREM. For every integer $n \ge 1$ and every alphabet $\Sigma$ with at least two symbols there is a linear language $L$ over $\Sigma$ such that $\mathrm{Dep}(L) = n$.

Proof. The case $n = 1$ is trivial. Let now an $n \ge 2$ be given. Denote by $L_n$ the language generated by the grammar with the productions²

(1)
$\sigma \to a\sigma b \mid aba^2A_2bab$
$A_i \to a^iA_ib \mid ba^{i+1}A_{i+1}ba, \quad 2 \le i \le n - 1$
$A_n \to a^nA_nb \mid b\sigma a \mid b^2a^2$

In order to prove the theorem it is sufficient to prove that $\mathrm{Dep}(L_n) = n$. Since the inequality $\mathrm{Dep}(L_n) \le n$ follows directly from (1), it suffices to show that $\mathrm{Dep}(G) \ge n$ for every grammar $G$ for $L_n$. To do this we first introduce some properties of strings of the language $L_n$.

Let us define the so-called LP-strings $x$ (left-part strings) and their representations $\chi(x)$ in an inductive way:
(i) if $x = (a)^{l_1}b(a^2)^{l_2}b \cdots b(a^n)^{l_n}b$ for some positive integers $l_1, \ldots, l_n$, then $x$ is an LP-string and $\chi(x) = ab^{l_n}a \cdots ab^{l_1}$;
(ii) if $x_1$ and $x_2$ are LP-strings, then $x_1x_2$ is also an LP-string and $\chi(x_1x_2) = \chi(x_2)\chi(x_1)$;
(iii) there are no other LP-strings.

By using LP-strings we can give a more detailed characterization of the strings in $L_n$:
(iv) $z \in L_n$ if and only if $z = xba\chi(x)$ for an LP-string $x$;
(v) if $z \in L_n$, then the decomposition of $z$ as $xba\chi(x)$ is determined uniquely, because there is no occurrence of "aa" in $\chi(x)$ and of "bb" in $x$ if $x$ is an LP-string.

Denote $\nu(z) = x$ and $\mu(z) = \chi(x)$.

² We denote here and in the remainder of this paper grammars in an abbreviated form. We write $A \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_n$ instead of $A \to \alpha_1, A \to \alpha_2, \ldots, A \to \alpha_n$.
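The characterization (i) to (v) can be checked against the grammar on short strings. The sketch below is illustrative and rests on two assumptions that should be read as such: the grammar is taken to be $\sigma \to a\sigma b \mid aba^2A_2bab$, $A_i \to a^iA_ib \mid ba^{i+1}A_{i+1}ba$ for $2 \le i \le n-1$, $A_n \to a^nA_nb \mid b\sigma a \mid b^2a^2$, and the representation of a concatenation is taken to be $\chi(x_1x_2) = \chi(x_2)\chi(x_1)$. Variables are encoded as integers (1 for $\sigma$, $i$ for $A_i$); the function names are hypothetical.

```python
from itertools import product


def productions(n):
    """Linear productions as (left terminals, next variable, right terminals).
    Variable 1 is sigma, variable i is A_i; None marks a terminal production."""
    P = {1: [("a", 1, "b"), ("abaa", 2, "bab")]}
    for i in range(2, n):
        P[i] = [("a" * i, i, "b"), ("b" + "a" * (i + 1), i + 1, "ba")]
    P[n] = [("a" * n, n, "b"), ("b", 1, "a"), ("bbaa", None, "")]
    return P


def generate(n, max_len):
    """All strings of length <= max_len derivable from sigma (n >= 2)."""
    P = productions(n)
    words, frontier = set(), [("", 1, "")]
    while frontier:
        left, var, right = frontier.pop()
        for u, nxt, v in P[var]:
            l, r = left + u, v + right
            if len(l) + len(r) > max_len:
                continue  # every production adds terminals, so prune here
            if nxt is None:
                words.add(l + r)
            else:
                frontier.append((l, nxt, r))
    return words


def in_L(z, n):
    """Membership test following z = x ba chi(x) for an LP-string x."""
    i, j, blocks = 0, len(z), 0
    while z[i:j] != "ba":
        exps = []
        # Read one block a^{l_1} b a^{2 l_2} b ... a^{n l_n} b from the left.
        for level in range(1, n + 1):
            run = 0
            while i + run < j and z[i + run] == "a":
                run += 1
            if run == 0 or run % level or i + run >= j or z[i + run] != "b":
                return False
            exps.append(run // level)
            i += run + 1
        # Match its representation a b^{l_n} ... a b^{l_1} from the right.
        for l in exps:
            if j - l - 1 < i or z[j - l:j] != "b" * l or z[j - l - 1] != "a":
                return False
            j -= l + 1
        blocks += 1
    return blocks >= 1


def brute_force(n, max_len):
    """All strings over {a, b} of length <= max_len accepted by in_L."""
    return {"".join(t) for k in range(max_len + 1)
            for t in product("ab", repeat=k) if in_L("".join(t), n)}
```

Under these assumptions the shortest string for $n = 2$ is `abaabbaabab` (take $l_1 = l_2 = 1$), and comparing `generate(2, 13)` with `brute_force(2, 13)` gives a small sanity check that the grammar and the decomposition describe the same strings up to that length. This is only a finite check, not a proof of (iv).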
For every string $z$ let $l_b(z)$ ($l_a(z)$) be the number of occurrences of $b$ (of $a$) in $z$. According to (i) to (iv) we have
(vi) if $z \in L_n$, then $l_b(z) \le l_a(z) < n\,l_b(z)$ and $l_b(\nu(z)) = l_a(\mu(z))$.

To prove that $\mathrm{Dep}(G) \ge n$ for every grammar $G$ for $L_n$, assume by way of contradiction that there is a grammar $G$ for $L_n$ such that $\mathrm{Dep}(G) < n$. According to Lemma 2.6 we can suppose that $G = (V, \Sigma, P, \sigma)$ is a perfectly reduced grammar.

Grammar $G$ is linear. To prove it, assume by way of contradiction that in $G$ there are variables $A$, $B$ and terminal strings $u$, $w$, $v$ such that $\sigma \Rightarrow^* uAwBv$. From this and from (iv) and (v) we get immediately that both sets $G(A)$ and $G(B)$ are finite, which contradicts Corollary 2.8. Thus $G$ is a linear grammar.

Now suppose that $A \Rightarrow^* uAv$ for a variable $A$ and strings $u$, $v$. We shall investigate the structure of the strings $u$ and $v$. Since $G$ is perfectly reduced, there are terminal strings $p$, $w$ and $q$ such that $\sigma \Rightarrow^* pAq \Rightarrow^* pu^iwv^iq \in L_n$ for every $i \ge 0$. According to (iv) and (v) we get
(vii) the string "bb" does not occur in $pu^2$ and the string "aa" does not occur in $v^2q$.

Denote by $z_i$ the string $pu^iwv^iq$ and let us consider three cases.

(A) $u = e$. Then $z_i = pwv^iq$. By (vii), the string "aa" does not occur in $v^iq$ and therefore $\nu(z_i) = \nu(z_j)$ for all $i$, $j$. Hence $z_i = z_j$ for all $i$ and $j$. Consequently $v = e$, and this contradicts the assumption that $G$ is a perfectly reduced grammar. Hence, the case $u = e$ is impossible.

(B) $u = a^m$ for an integer $m > 0$. By (vi), the symbol $b$ occurs in $v$. By (vii) there is no occurrence of the string "aa" in $v^iq$ and thus $|\nu(z_i)| \le |pu^iw|$ for all $i$. Hence $l_b(\nu(z_i))$ is a constant and, by (vi), $l_a(\mu(z_i))$ is the same constant.
Therefore $v = b^{m_1}$ for an integer $m_1$. From this we deduce that there are strings $p_0$ and $w_1$ and integers $i \le n$, $k$, $k_1$, $k_2$, $k_3$, $s_1 < i$, $s_2 < i$, $s_3 < i$ such that $p = p_0a^{ik_1+s_1}$, $u = a^{ik_2+s_2}$, $w = a^{ik_3+s_3}bw_1$, where the string "aa" occurs in $w_1vq$ and either $p_0 = e$ or $p_0$ has $nk + (i - 1)$ occurrences of $b$ and $p_0$ ends with $b$. Then $s_1 + s_2 + s_3 \in \{0, i\}$. Since also $pu^2wv^2q \in L_n$, $s_1 + s_2 + s_2 + s_3 \in \{0, i, 2i\}$, which is possible only if $s_2 = 0$. Therefore $m = ik_2$. From this and from (i) to (v) we deduce that $m_1 = k_2$. Similarly we get that if $A \Rightarrow^* a^{\bar m}Ab^{\bar m_1}$ for some integers $\bar m$ and $\bar m_1$, then $\bar m = \bar m_1 i$ with the same $i$.

(C) The symbol $b$ occurs in $u$. According to (vii), there are uniquely determined strings $w_0$ and $w_1$ such that $u^2wv^2 = w_0baw_1$ where $|w_0| \ge |u| - 1$ and $|w_1| \ge |v| - 1$. Hence $l_b(\nu(z_i)) = l_b(p) + (i - 1)\,l_b(u) + l_b(w_0) + 1$. On the other hand we deduce from (i) to (v) that for every $z \in L_n$, $l_b(\nu(z))$ is a multiple of $n$. Therefore $l_b(u) = kn$ for a suitable $k$ and we have
(viii) the number of $b$'s in $u$ is a multiple of $n$ and there is no occurrence of "bb" in $u^2$.

This information about the structure of $u$ and $v$ will be sufficient for our purposes. Before approaching the main part of our proof we have to introduce the concept of an $i$th level for strings in $L_n$ and the concept of a variable of the type $i$.

Let $z \in L_n$ and $z = r_0br_1br_2$, $r_1 = a^m$, $|\nu(z)| \ge |r_0br_1b|$. If $l_b(r_0b) = kn + j$, $j < n$, then we say that $r_1$ forms the $(j + 1)$th level of $z$. If $z \in L_n$, $l_b(\nu(z)) = k_0n$, $1 \le j \le n$, then the $j$th level occurs in $z$ $k_0$ times. If $r_0 = a^m$, then we say that $r_0$ forms the first level of $z$.

A variable $A$ in $G$ is said to be an $a$-variable in $G$ if there are strings $u \in \{a\}^*a$ and $v$ such that $A \Rightarrow^* uAv$. By (B), for every $a$-variable $A$ there is a uniquely determined integer $j$, called the type of $A$, such that if $A \Rightarrow^* u_0Av_0$, $u_0 \in \{a\}^*a$, then $u_0 = a^{jk}$, $v_0 = b^k$ for a suitable $k$.
If $A$ is an $a$-variable of the type $j$ and, in a derivation $\delta$ in $G$ from $A$, the string $u_1Av_1$, $u_1 \in \{a\}^*a$, is derived, then $u_1$ always builds up exactly the $j$th level of the derived strings.

A string $z$ is said to be $i$-complete if $z = x_0^iba\chi(x_0^i)$ where $x_0 = a^iba^{2i}b \cdots ba^{ni}b$, $l_b(x_0) = n$.

Now let $d$ be the maximal length of right-hand sides of productions in $G$ and $p$ the number of variables in $G$. Let $z_0$ be an $i$-complete string where $i > pd$. Let $\delta$ be a derivation of $z_0$ from $\sigma$ in $G$. Since $G$ is a linear grammar, $\delta$ determines uniquely the sequence of productions

(2) $A_j \to \alpha_j, \quad j = 0, 1, \ldots, m$

being used, following one another, in the derivation $\delta$. Since $i > d$, we get from (viii) that no production in (2) has the form $A \to uAv$ where $b$ occurs in $u$. Similarly, in (2) there are no productions of the form $A \to uBv$ where $B \neq A$ and $u$ has two occurrences of $b$. Therefore the productions in (2) have one of the following two forms:

(3) $A \to uBv$, $u \in \{a\}^*$,
(4) $A \to uBv$, $B \neq A$, $u \in \{a\}^*b\{a\}^*$, $B$ is a variable.

Let $j_1, j_2, \ldots, j_r$ be the indices of all productions of the type (4) in (2). If we speak in the following about a $k$th group of productions, we shall have in mind the productions with indices $j_k + 1, \ldots, j_{k+1} - 1$. Since $l_b(\nu(z_0)) = ni$, we have $r \ge ni - 1$. Moreover, from the inequality $i > pd$ we get $j_k + p < j_{k+1}$ for $k = 1, 2, \ldots, r - 1$. Whence for any $k = 1, 2, \ldots, r - 1$ a variable $B_k$ exists such that $B_k$ occurs on the left-hand side of at least two productions of the $k$th group of productions. Hence $B_k$ is an $a$-variable. Let $s_k$ be the type of $B_k$. Since $r > in - 1 > p$, at least two of the variables $B_1, \ldots, B_r$ are the same. Let $B_{k_0}$ and $B_{k_1}$, $k_0 < k_1$, be two neighbouring occurrences of the same variable $B$ in $B_1 \cdots B_r$. Let $s$ be the type of $B$. Then the $k_0$th and $k_1$th groups of productions build the $s$th level of the derived string.
If $\mathrm{Dep}(G) < n$, then $k_1 < k_0 + n$ and we get that there are at most $n - 2$ levels between two $s$th levels of the string $z_0 \in L_n$, which is impossible. Hence the assumption $\mathrm{Dep}(G) < n$ yields a contradiction and the theorem is proved.

Remark. By Culik (1962) (see also Kopřiva (1964) for a correction) any context-free language can be represented by a finite number of applications of the following operations:
(i) set union, set product, substitution, and two special $(n + 1)$-ary operations
(ii) $[[\varphi, \psi_1, \ldots, \psi_n]]$ and $[[\varphi, \psi_1, \ldots, \psi_n]]^*$.
As a corollary of the preceding theorem we get that for every $m$ there is a language $L$ such that if we want to represent $L$ by using operations (i) and (ii) in the same way as in Culik (1962), then we have to use some $n$-ary operations of the type (ii) with $n \ge m$.

By Red'ko (1965), any context-free language can be constructed from basic languages by a finite number of the operations composition and weak recursion. By the preceding Theorem, for every $m$ there is a language $L$ such that in order to construct $L$ an $n$-ary operation of weak recursion has to be used with $n \ge m$.

5. CLASSIFICATION ACCORDING TO THE MAXIMAL NUMBER OF (NON-ELEMENTARY) GRAMMATICAL LEVELS

5.1. DEFINITION. Let $G$ be a grammar. Denote by $\mathrm{Lev}(G)$ the number of grammatical levels of $G$. A grammatical level $G_0$ is said to be non-elementary if $\mathrm{Dep}(G_0) > 1$. Denote by $\mathrm{Lev}_{\mathrm{n}}(G)$ the number of non-elementary grammatical levels of $G$. A grammar $G$ is said to be inherently context-free if $\mathrm{Lev}(G) = 1$. A language $L$ is said to be inherently context-free if $\mathrm{Lev}(L) = 1$.

5.2. THEOREM. For every integer $n \ge 1$ and every alphabet $\Sigma$ with at least two symbols there is an inherently context-free language $L$ in $\Sigma$ such that $\mathrm{Dep}(L) = n$.

The Theorem follows directly from the proof of Theorem 4.2.

5.3. THEOREM. For every integer $n \ge 1$ and every alphabet $\Sigma$ with at least two symbols there is a regular event $R$ over $\Sigma$ such that $\mathrm{Lev}(R) = n$.

Proof.
T h e case n -- 1 is trivial. Denote R: = {a}*a (J {b}. Since for Rs there is a grammar with two variables we have Lev (R~) _-< 2. Suppose now t h a t for R2 a grammar G = <V, {a, b}, P, ¢} exists such t h a t Lev (G) = 1. B y L e m m a 2.6 we can suppose t h a t G is perfectly reduced. Since Lev (G) = 1, there are strings u, v such t h a t uv ~ e, ~ ~ u~v. B u t then ubv is in R~, which is a contradiction, whence Lev (R~) -- 2. To prove the Theorem for n > 2. Let R~+I, n__> 2, be the regular event generated b y the grammar: (1) 1 <_i<-n. cr~.--) ~ a~b l a~b Consequently, Lev (R.+l) < n + 1. Now suppose t h a t for R,+, a grammar G = <V, {a, b}, P, a} exists such t h a t Lev (G) < n -{- 1. We can again suppose t h a t G is a perfectly reduced grammar. F r o m the proof of L e m m a 1, Gruska (1967) we get t h a t if A E V -- Z, A ~ ~, then (2) there are uniquely determined integers i, s, if, is such that 1 _-< i __< n, 0 _-< i~ =< n, 0 =< i~ =<_ n, 0 ___<s =< 1, G ( A ) c ai~b'{a~b}*a ~ and either s = 0 = ii or s = 1. (In such case we write i = ~ ( A ) . ) F r o m t h a t it follows t h a t (i) if A1 ~ ~ ~ A2 and both variables Ai and As belong to the same grammatical level, then ~(A1) -- ~(A~), and (if) A ~ - ¢ for no variable A # ~. Thus, if Lev (G) < n + 1, then there is an integer i0, 1 =< i0 =< n such t h a t if A is a variable and A # ~, then (A) # i0. Now let a ~ uBv for a variable B and strings u, v. T h e n G(uBv) c {a*(')b} *. Hence the set G(S) N {a~°b}* is finite; a contradiction with the definition of R~+~. Therefore Lev (R,,+,) -- n + 1. T h e main result of this section is CONTEXT-FREE 161 LANGUAGES 5.4. T~EO~nM. For every integer n>= 0 and every alphabet Z with at least two symbols there is a linear language L~ over Z such that L e v . (L) = n. Proof. T h e case n = 0 is trivial. 
L e t n = 1 be given and L be the language g e n e r a t e d b y t h e g r a m m a r with productions (1) ¢i --~ a~zi b ] a~ba~+~Sibab 1 <- i <- n c'~ ~ Si --+ a n-~i .~o i bzia [ b 2 a 2 Obviously L = Ui~l L~ where L~ are m u t u a l l y disjoint languages and L~ is g e n e r a t e d b y the g r a m m a r G~ with two variables z~ and S~ and with the s a m e productions for z~ and S~ as in (1). I n order to give a m o r e detailed characterization of the languages L~ we define the so called LP~strings, 1 < i _< n, and their characterization in an inductive way. (i) I f x = a~kba(~+~)Zb for some positive integers k, 1 t h e n x is an LP~-string and x (x ) = ab Zab~ (ii) I f xl and x2 are LPi-strings, t h e n so is x~x2 and x(xlx2) = x (x~)x (x~) (iii) T h e r e ~re no other LP~-strings. Consequently (iv) z E L i if and only if z = xbax (x) for a LP~-string x; (v) if z E L i , t h e n t h e decomposition z on xbax (x) is d e t e r m i n e d uniquely because there is no occurrence of "aa" in x (x) and "bb" in x if x is an LP~-string. D e n o t e ~ (z) = x, ~ (z) = x (x); (vi) if z = a~bzoab z E L, t h e n z C L_~ ; l (vii) i f z = powqo ~ L, IPol < Iv(z)[, Iqo] > fPol, t h e n lb(po) la (qo), if z = powqoCL, Iq01 < I~(z)l, ! p 0 1 => 2nlq0l, then z~(po) > lo(q0). Suppose now t h a t there is a g r a m m a r G = (V, {a, b}, p, a} for L such t h a t Lev~ (G) < n. B y L e m m a 2.6 we can suppose t h a t G is a perfectly reduced g r a m m a r . I n a similar w a y as in the proof of T h e o r e m 4.2 we can p r o v e the following assertions: (viii) G is a linear g r a m m a r (ix) if A ~ u A v for a variable A and strings u, v, t h e n (a) u ~ ~ ~ v (b) if u E {a} *, t h e n there are integers k, i such t h a t I ~ i ~ n, v = bk and either u = a ~k or u - a (~+~)~. Moreover, if 162 6ausKa A ~ uiAvl, ul C {a} *, then there is also an integer kl, such that v = bkl and either u = a ~kI or u = a t~+°kl. We say t h a t the variable A is of the type i. 
B y the foregoing the type of a variable is determined uniquely. (c) if u ~ {a}*, then lb(u) = 2k for a suitable k. (d) if ~ ~ pAq, then the string "bb" does not occur in pu 2 and "aa" does not occur in v2q. Now we prove some other auxiliary assertions (x) If A is an a-variable-i.e. A ~ akAv for some integer k- and ~ pAq, then Ib(p ) = la(q). Proof of (x). Let w E L (A). Since A is an a-variable there are integers kl > 0 and k2 > 0 such t h a t A ~ ak'Ab k~. Hence pwq C L and also pa~lwb~2q C L for every integer i. Hence it follows t h a t I Pl =< I v(pa~wb~k~q) l and [ql < I~(Pa'k~wb'~q)l • According to (vii) we get, for ~ sufficiently large i, Ib(p) = la(b~k2q) = l~(q), Ib(p) = lb(pa ~k~) _--__l~ (q) and this completes the proof of (x). (xi) Let A be an a-variable involved in a derivation of a z C L i , 1 -< i -< n. Then A is of the type i, Proof of (xi). Let ~ ~ pAq ~ pwq = z C Li for some strings p, q, w. Since A is an a-variable, there are integers ]c~, k2 such t h a t A ~ a~Ab k~. Since z E L~, we get according to (i) to (v) and (x) t h a t either k~ = ik2 and Ib(p) is even or kl = (i q- n)k2 and lb(p) is odd. B y (ix) this completes the proof of (xi). (xii) If A, B are two a-variables of the same grammatical level, then both variables are of the same type. i Proof of (xii). Since A and B belong to the same grammatical level, there are strings p, u, w, v, q such t h a t ¢ ~ pAq ~ puBvq ~ puwvq E L Now (xii) follows from (xi). P u t N = p d where p is the number of variables in G and d is the maximal length of right-hand sides of productions in G. Denote by z~, 1 _-< i _-< n, the string x~bax (xl) where x~ -- (a~ba('+~);b) ~. Fix an i, l<_i<_n. Obviously z~ ~ L~. Let ~ be a derivation of z~ from a and let (2) A~--~ a~, j = 1, 2, ... , m be all the productions involved, following one another, in the derivation 163 CONTEXT-FREE LANGUAGES q/~. 
Since $N > d$, we have in (2) only productions of the two following types:

(3) $A \to uBv$, $u \in \{a\}^*$, $v \in \{b\}^*$,
(4) $A \to uBv$, $B \neq A$, $B$ is a variable and $u \in \{a\}^*b\{a\}^*$.

Let $j_1, \ldots, j_r$ be, in increasing order, the indices of all productions of the type (4) in (2). If we speak in the following about a $k$th group of productions, we mean the productions with indices $j_k + 1, \ldots, j_{k+1} - 1$. Now let $1 \le k \le r - 1$. Since $N > p$, $j_{k+1} > j_k + p$ and therefore in the $k$th group of productions there is a production having an $a$-variable on the left-hand side. Denote this variable by $B_k$. Since $l_b(\nu(z_i)) = 2N$ and only productions of the type (3) and (4) are in (2), we get immediately $r \ge 2N - 1 > p$. But then there are integers $k_1$, $k_2$ such that $1 \le k_1 < k_2 < r$ and $B_{k_1} = B_{k_2}$. Since the productions of the type (4) have different symbols on the left-hand and right-hand side, a variable $A \neq B_{k_1}$ has to exist such that $B_{k_1} \rhd^* A$ and $A \rhd^* B_{k_1}$. Thus, the variable $B_{k_1}$ belongs to a grammatical level with at least two variables. Moreover, by (xi), $B_{k_1}$ is of the type $i$.

Hence, for every integer $i$, $1 \le i \le n$, there is an $a$-variable $A_i$ of the type $i$ which belongs to a non-elementary grammatical level. By (xii), if $A$ and $B$ are two $a$-variables in the same grammatical level then both are of the same type. Hence $\mathrm{Lev}_{\mathrm{n}}(G) \ge n$, which contradicts our assumption $\mathrm{Lev}_{\mathrm{n}}(G) < n$. From this and from (1) we get $\mathrm{Lev}_{\mathrm{n}}(L) = n$.

Theorems 4.2 and 5.4 do not hold if $\Sigma$ is an alphabet with only one symbol. Indeed, in that case, by Gruska (1967), $\mathrm{Dep}(L) = 1$, $\mathrm{Lev}(L) \le 2$ and $\mathrm{Lev}_{\mathrm{n}}(L) = 0$ for every language $L \subset \Sigma^*$.

Remark. As to the grammar for ALGOL 60, Naur (1963), we have, by Kopřiva (1964), $\mathrm{Lev}_{\mathrm{n}}(L(\text{ALGOL 60})) \le 2$ and $\mathrm{Dep}(L(\text{ALGOL 60})) \le 26$.

6. CLASSIFICATION ACCORDING TO THE NUMBER OF VARIABLES AND PRODUCTIONS

6.1. DEFINITION. Let $G$ be a grammar. Denote by $\mathrm{Var}(G)$ ($\mathrm{Prod}(G)$) the number of variables (of productions) in $G$.
As to the classification of languages according to $\mathrm{Var}(L)$ we have the result of Gruska (1967):

6.2. THEOREM. For every integer $n \ge 1$ and every alphabet $\Sigma$ with at least two symbols there is a regular event $R$ in $\Sigma$ such that $\mathrm{Var}(R) = n$.

Finally we consider the classification according to the minimal number of productions.

6.3. THEOREM. For every integer $n \ge 1$ and every alphabet $\Sigma$ there is a finite set $L \subset \Sigma^*$ such that $\mathrm{Prod}(L) = n$.

Proof. The case $n = 1$ is trivial. Let $n > 1$ be given. Denote $L = \{a^{2^i};\ i = 0, 1, \ldots, n - 1\}$. Obviously $\mathrm{Prod}(L) \le n$. Let $G = (V, \{a\}, P, \sigma)$ be a grammar for $L$ such that $\mathrm{Prod}(G) = \mathrm{Prod}(L)$. $G$ is a reduced grammar. If $A \neq \sigma$ is a variable in $G$ then the set $G(A)$ contains at least two strings. Indeed, in the opposite case there would exist a grammar $G_1$ for $L$ having fewer productions than $G$. The same obviously holds for $G(\sigma) = L$.

We now prove that $G$ is a linear grammar. Indeed, suppose that there are terminal strings $x$, $y$, $z$ and variables $A$, $B$ such that $\sigma \Rightarrow^* xAyBz$. Let $a^i = xyz$, $\{a^{i_1}, a^{i_2}\} \subset G(A)$, $\{a^{j_1}, a^{j_2}\} \subset G(B)$. Since $L(G) = L$, there are integers $s_1$, $s_2$, $s_3$ and $s_4$ such that

$i_1 + j_1 = 2^{s_1} - i, \quad i_1 + j_2 = 2^{s_2} - i, \quad i_2 + j_1 = 2^{s_3} - i, \quad i_2 + j_2 = 2^{s_4} - i.$

Hence $2^{s_1} - 2^{s_2} - 2^{s_3} + 2^{s_4} = 0$. This is possible only if either $s_1 = s_2$ and $s_3 = s_4$ or $s_1 = s_3$ and $s_2 = s_4$. Thus $j_1 = j_2$ in the former case and $i_1 = i_2$ in the latter one, which contradicts the fact that the sets $G(A)$ and $G(B)$ contain at least two different strings. Hence, $G$ is a linear grammar.

Next suppose that there is a variable $A$ in $G$ such that $A$ occurs on the right-hand side of at least two productions $B_1 \to x_1Ay_1$ and $B_2 \to x_2Ay_2$. Investigate two cases:

(i) If $\sigma \Rightarrow^* u_0B_1v_0$ and $\sigma \Rightarrow^* \bar u_0B_2\bar v_0$, then $|u_0x_1y_1v_0| = |\bar u_0x_2y_2\bar v_0|$. But then we can omit the production $B_2 \to x_2Ay_2$ without changing the set generated by the grammar, and thus this case is impossible.
(ii) There are $u_0$, $v_0$, $\bar u_0$, $\bar v_0$ such that $\sigma \Rightarrow^* u_0B_1v_0$, $\sigma \Rightarrow^* \bar u_0B_2\bar v_0$ and $|u_0x_1y_1v_0| \neq |\bar u_0x_2y_2\bar v_0|$. Let $\{a^{i_1}, a^{i_2}\} \subset G(A)$. Denote $j_1 = |u_0x_1y_1v_0|$, $j_2 = |\bar u_0x_2y_2\bar v_0|$. Then there are integers $s_1$, $s_2$, $s_3$ and $s_4$ such that

$j_1 + i_1 = 2^{s_1}, \quad j_1 + i_2 = 2^{s_2}, \quad j_2 + i_1 = 2^{s_3}, \quad j_2 + i_2 = 2^{s_4}$

and we get a similar contradiction as above.

Hence, every variable $A$ occurs only once in the right-hand sides of productions. But this means that if we omit a variable $A \neq \sigma$ from the vocabulary of $G$ and in all right-hand sides of productions we replace $A$ by its right-hand sides, then a grammar for $L$ with a smaller number of productions and with a smaller number of variables is obtained. Consequently the grammar $G$ has only one variable and thus $\mathrm{Prod}(L) = n$.

Remark. If $L$ is a finite language with $n$ strings then obviously $\mathrm{Prod}(L) \le n$. A question arises as to whether it is possible to put a reasonable lower bound on $\mathrm{Prod}(L)$ if $L$ consists of $n$ strings. The following example indicates that the answer is likely negative.

6.4. Example. Let $n \ge 1$ be an integer. Consider the grammar $G$ with three productions

$\sigma \to S^n, \quad S \to a, \quad S \to aa.$

Then $L(G) = \{a^i;\ n \le i \le 2n\}$ and $\mathrm{Prod}(L) \le 3$.

7. BOUNDED CLASSIFICATIONS

According to what has already been said in Section 3, there are many ways to associate a classification of languages with a given classification of grammars. One of them, the so-called natural extension, is considered in Section 3; i.e., if $K$ is a classification of grammars, then, for a particular language $L$, $K(L) = \min \{K(G);\ L(G) = L\}$. In this definition the minimum is taken over all context-free grammars. If only a special class of grammars for $L$ is admitted, then we speak about a bounded classification.

7.1. DEFINITION. Let $\mathcal{X}$ be a class of grammars and $K$ a classification of grammars. Put, for a language $L$, $K_{\mathcal{X}}(L) = \inf \{K(G);\ L(G) = L,\ G \in \mathcal{X}\}$. $K_{\mathcal{X}}$ is said to be the classification $K$ bounded to $\mathcal{X}$.
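The claim of Example 6.4 above, that three productions suffice for a finite language with $n + 1$ strings, is easy to confirm by enumeration. The following is a small sketch with a hypothetical function name: since the alphabet has one symbol, a string is determined by its length, and each of the $n$ occurrences of $S$ in $\sigma \to S^n$ contributes either one or two symbols.

```python
def example_6_4_lengths(n):
    """Lengths of the strings generated by sigma -> S^n, S -> a | aa."""
    lengths = {0}
    for _ in range(n):
        # Each occurrence of S is rewritten either to "a" or to "aa".
        lengths = {k + s for k in lengths for s in (1, 2)}
    return lengths


# The generated language is {a^i; n <= i <= 2n}: n + 1 strings, 3 productions.
for n in range(1, 8):
    assert example_6_4_lengths(n) == set(range(n, 2 * n + 1))
```

The number of strings grows with $n$ while the number of productions of this grammar stays at three, which is why no useful lower bound on the number of productions in terms of the number of strings alone can be expected.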
The case K¢ (L) = ~ is possible for a language L. I t means that L~2~. Throughout this section we shall consider the class 8 of context-free grammars, the class 8, of non-self-embedding grammars and the class 80 of one-side linear grammars and investigate the classifications of regular events with respect to the classifications considered so far. First we have 166 GRUSKA 7.2. THEOREM. Dep~ (R) = Dep~ (R) = 1, Lev~,~ (R) -- Lev~,s~ (R) = 0 for every regular event R. Proof. Directly from Ginzburg & Rose (1963b)--see also Exercise 10, p. 55 Ginsburg (1966). On the other hand the classifications Dep~ o and Lev~,~ o are conuected in the class of regular events. We have 7.3. THEOREm. For every integer n>= 1 there are regular events R and R' such that Dep~ 0 (R) = n = Lev~,~ 0 (R'). Proof. Let n > 1 be given. Denote b y R the regular event generated b y the grammar with productions zl --~ ¢~a~ ] ~i+lai+ib, 2 _< i -< n -- 1 and by R' the regular event generated b y the grammar zl --> o'~a~]Sia~+~b, S i "--> ~ i a 1 <- i <_ n Icqa b [b B y using methods similar to those in proofs of Theorems 4.2 and 5.4 we can prove that Dep~ 0 (R) = n = Lev~.80 (RP). If the classification Var is considered then different results are obtained if different classes of grammars are considered. 7.4. T~EOnEM. There is a regular event R such that Var (R) < Var~ (R) < Vary0 (R) Proof. Denote (1) R = {a} *ba{b} *{a} *ba{b} * In the following we shall consider several grammars. In all the cases we denote b y d the maximal length of productions of the grammar under construction. P u t ~2o = adbab~adbab~ CONTEXT-FREE LANGUAGES 167 For x = xtz2 - -. xl~l, x~ -( [a, hi, let ¢ (z) deaote the number of indices i such that x~ ~ z~+l. TheI~ we have (i) i f x C / ~ , t h e n ~ ( x ) < 7 The proof of the Theorem, will be divided into several steps: (ii) Var (R) > 1. Proof of (ii). Suppose (ii) does not hold, i.e. there is a grammar G for R with one variable ~. B y (i), G is linear. 
Moreover, if $\sigma \to x_1 \sigma y_1$ and $\sigma \to x_2 \sigma y_2$ are two productions of $G$ then, by (1) and (i), $x_1 x_2 \in \{a\}^*$, $y_1 y_2 \in \{b\}^*$. Hence $z_0 \notin L(G)$ and this contradiction proves (ii).

(iii) $\mathrm{Var}(R) = 2$.

Immediately from (ii) and from the fact that $R = L(G)$ where $G$ has productions

$\sigma \to S_1 S_1$, $S_1 \to a S_1 \mid S_1 b \mid ba$.

(iv) $\mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R) > 2$.

Proof of (iv). Suppose that (iv) does not hold. Then, by (iii), there is a non-self-embedding grammar $G$ for $R$ with two variables $\sigma$ and $A$. If $A \equiv \sigma$, then $G$ is one-sided linear. Without loss of generality we can suppose that $G$ is right-linear. According to (i), if $\sigma \to x_1 \sigma$, $\sigma \to x_2 \sigma$, $\sigma \to x_3 A$, $A \to x_4 \sigma$, $A \to x_5 A$, $A \to x_6 A$, then $x_1 x_2 x_3 x_4 x_5 x_6 \in \{a\}^*$ and hence again $z_0 \notin L(G)$. Now suppose that $A \equiv \sigma$ does not hold. If $A \to x_1 A y_1$ and $A \to x_2 A y_2$, then, by (i), either $x_1 x_2 = \epsilon$ and $\varphi(y_1 y_2) = 0$ or $y_1 y_2 = \epsilon$ and $\varphi(x_1 x_2) = 0$. Similarly if $\sigma \to u_1 \sigma v_1$, $\sigma \to u_2 \sigma v_2$, then either $u_1 u_2 = \epsilon$ and $G(v_1 v_2) \subset \{b\}^*$ or $v_1 v_2 = \epsilon$ and $G(u_1 u_2) \subset \{a\}^*$. From that we conclude $z_0 \notin L(G)$ and this completes the proof of (iv).

(v) $\mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R) = 3$.

It follows from (iv) and from the fact that $R$ is generated by the grammar with productions

$\sigma \to S S$, $S \to a S \mid b S_1$, $S_1 \to S_1 b \mid a$.

(vi) $\mathrm{Var}_{\mathcal{S}_0}(R) > 3$.

Proof of (vi). By (v), $\mathrm{Var}_{\mathcal{S}_0}(R) \geq 3$. Now suppose that there is a one-sided linear grammar $G$ for $R$ with three variables. We can suppose without loss of generality that $G$ is right-linear. If $A \equiv B$ for two variables in $G$ and $A \to x_1 A$, $A \to x_2 A$, $A \to x_3 B$, $B \to x_4 A$, $B \to x_5 B$, $B \to x_6 B$, then, according to (i), $\varphi(x_1 x_2 \cdots x_6) = 0$. Similarly if $A \to x_1 A$, $A \to x_2 A$ for a variable $A$ in $G$, then $\varphi(x_1 x_2) = 0$. From that we again conclude $z_0 \notin L(G)$ and (vi) is proved. This completes the proof of the Theorem.

A similar result holds for the classification Lev.

7.5. THEOREM. There is a regular event $R$ such that

$\mathrm{Lev}(R) < \mathrm{Lev}_{\mathcal{S}_{\mathrm{n}}}(R) < \mathrm{Lev}_{\mathcal{S}_0}(R)$.
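The two grammars used in steps (iii) and (v) of the proof of Theorem 7.4 can be compared with the regular event (1) on all short strings. The Python sketch below is an illustration only and not part of the paper: variables are encoded as upper-case letters (an assumption of this sketch), words are enumerated by bounded leftmost derivation, and the check covers strings of length at most 10, so it supports but does not prove the claims.

```python
import re
from itertools import product

# "S" stands for the start symbol sigma.
G_III = {"S": ["AA"], "A": ["aA", "Ab", "ba"]}            # step (iii): A plays S_1
G_V = {"S": ["AA"], "A": ["aA", "bB"], "B": ["Bb", "a"]}  # step (v): A plays S, B plays S_1


def language(grammar, start, max_len):
    """All terminal words of length <= max_len derivable from start.

    Assumes no production has an empty right-hand side, so sentential
    forms never shrink and forms longer than max_len can be discarded.
    """
    seen, stack, words = {start}, [start], set()
    while stack:
        form = stack.pop()
        idx = next((i for i, c in enumerate(form) if c in grammar), None)
        if idx is None:  # no variable left: the form is a terminal word
            words.add(form)
            continue
        for rhs in grammar[form[idx]]:  # expand the leftmost variable
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                stack.append(new)
    return words


def regular_event(max_len):
    """Strings of R = a*bab*a*bab* of length <= max_len."""
    pattern = re.compile(r"a*bab*a*bab*")
    candidates = ("".join(t) for n in range(max_len + 1)
                  for t in product("ab", repeat=n))
    return {w for w in candidates if pattern.fullmatch(w)}


def alternations(x):
    """Number of indices i with x_i != x_{i+1}."""
    return sum(1 for p, q in zip(x, x[1:]) if p != q)


R10 = regular_event(10)
assert language(G_III, "S", 10) == R10
assert language(G_V, "S", 10) == R10
assert max(alternations(w) for w in R10) == 7  # property (i)
```

The first grammar uses two variables and four productions, the second three variables and five productions.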
We do not give the detailed proof here but using the ideas and results of the foregoing proof we can easily show that $\mathrm{Lev}(R) = 2$, $\mathrm{Lev}_{\mathcal{S}_{\mathrm{n}}}(R) = 3$, $\mathrm{Lev}_{\mathcal{S}_0}(R) > 3$ for $R = \{a\}^* ba \{b\}^* ba \{a\}^* ba \{b\}^*$. Finally we have

7.6. THEOREM. There is a regular event $R$ such that

$\mathrm{Prod}(R) < \mathrm{Prod}_{\mathcal{S}_{\mathrm{n}}}(R) < \mathrm{Prod}_{\mathcal{S}_0}(R)$.

The proof is not given in this case either. But it is quite easy to show for the regular event $R = \{a\}^* ba \{b\}^* \{a\}^* ba \{b\}^*$ that $\mathrm{Prod}(R) = 4$, $\mathrm{Prod}_{\mathcal{S}_{\mathrm{n}}}(R) = 5$ and $\mathrm{Prod}_{\mathcal{S}_0}(R) > 5$.

The following example shows that the difference between $\mathrm{Var}(R)$ and $\mathrm{Var}_{\mathcal{S}_0}(R)$ can be arbitrarily large.

7.7. Example. Put $R_i = (\{a\}^* ba \{b\}^*)^i$. By using similar methods as above we can show that $\mathrm{Var}(R_i) = 2$, $\mathrm{Var}_{\mathcal{S}_0}(R_i) = 2i$, $\mathrm{Prod}(R_i) = 4$, $\mathrm{Prod}_{\mathcal{S}_{\mathrm{n}}}(R_i) > 2i$, $\mathrm{Lev}(R_i) = 2$, $\mathrm{Lev}_{\mathcal{S}_0}(R_i) = 2i$.

The following examples show that bounded classifications can yield essentially different classifications of languages than the original classifications.

7.8. Example. Denote

$R_1 = \{b\}^* \{a\}^* ba \{ba\}^* a \{aba\}^*$,
$R_2 = \{b\}^* \{a\}^* ab \{b\}^* \{a\}^* ab \{b\}^* \{a\}^*$.

Then $\mathrm{Var}(R_1) = \mathrm{Var}(R_2) = 2$, $3 = \mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_2) < \mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_1) = 4$, $6 = \mathrm{Var}_{\mathcal{S}_0}(R_2) > \mathrm{Var}_{\mathcal{S}_0}(R_1) = 4$.

7.9. Example. Denote

$R_1 = \{a\}^* b \{a\}^* b \{a\}^* \{b\}^* a \{b\}^* a$,
$R_2 = \{a\}^* b \{a\}^* ba \{b\}^* \{a\}^* ba \{b\}^* a \{b\}^*$.

Then $3 = \mathrm{Var}(R_1) > \mathrm{Var}(R_2) = 2$, $3 = \mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_1) = \mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_2) = 3$, $5 = \mathrm{Var}_{\mathcal{S}_0}(R_1) < \mathrm{Var}_{\mathcal{S}_0}(R_2) = 6$.

Remark. Let $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3$ be any symbols from the set $\{<, >, =\}$. It seems that there are regular events $R_1$ and $R_2$ such that $\mathrm{Var}(R_1)\ \mathcal{X}_1\ \mathrm{Var}(R_2)$, $\mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_1)\ \mathcal{X}_2\ \mathrm{Var}_{\mathcal{S}_{\mathrm{n}}}(R_2)$, $\mathrm{Var}_{\mathcal{S}_0}(R_1)\ \mathcal{X}_3\ \mathrm{Var}_{\mathcal{S}_0}(R_2)$.

8. RELATIONS BETWEEN CLASSIFICATIONS

In this section the basic concepts concerning the relations between classifications of grammars are defined and some relations between classifications defined in preceding sections are investigated. To be more brief, we shall write $K_1, K_2, \ldots, K_5$ instead of Var, Dep, Lev, $\mathrm{Lev_n}$, Prod, respectively. Moreover, we shall write $K_{i,\Phi}$ instead of $(K_i)_\Phi$ if $\Phi$ is a class of grammars.

8.1. DEFINITION.
Let $K$ be a classification of grammars and $\Phi$ a class of grammars. Put

$K_\Phi^{-1}(L) = \{G;\ L(G) = L,\ G \in \Phi,\ K(G) = K_\Phi(L)\}$

for every language $L$. $K_\Phi^{-1}(L)$ is said to be the class of the simplest grammars for $L$ with respect to $K$ and $\Phi$. We write $K^{-1}(L)$ instead of $K_{\mathcal{S}}^{-1}(L)$.

8.2. COROLLARY. If $L_1 \neq L_2$ are two languages and $\Phi$ a class of grammars, then $K_\Phi^{-1}(L_1) \cap K_\Phi^{-1}(L_2) = \emptyset$.

8.3. DEFINITION. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars and $\Phi$ a class of grammars. The classifications $K_{\alpha,\Phi}$ and $K_{\beta,\Phi}$ are said to be bound if $K_{\alpha,\Phi}^{-1}(L) \cap K_{\beta,\Phi}^{-1}(L) = \emptyset$ for no language $L$. $K_{\alpha,\Phi}$ is said to be stronger than $K_{\beta,\Phi}$ if $K_{\alpha,\Phi}^{-1}(L) \subset K_{\beta,\Phi}^{-1}(L)$. Finally $K_{\alpha,\Phi}$ is said to be equivalent to $K_{\beta,\Phi}$ if $K_{\alpha,\Phi}^{-1}(L) = K_{\beta,\Phi}^{-1}(L)$ for every language $L$.

8.4. COROLLARY. Let $K$ be a classification and $f: I \to I$ a monotone increasing function. Define $K_f(G) = f(K(G))$ for every grammar $G$. The classifications $K$ and $K_f$ are obviously equivalent.

8.5. THEOREM. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars and $\Phi_1 \subset \Phi$ for two classes of grammars. Let for every language $L$, $K_{\alpha,\Phi}^{-1}(L) \cap K_{\beta,\Phi}^{-1}(L) \cap \Phi_1 \neq \emptyset$. Then
(i) the classifications $K_{\alpha,\Phi}$ and $K_{\beta,\Phi}$ are bound,
(ii) if $K_{\alpha,\Phi}$ is stronger than $K_{\beta,\Phi}$ then $K_{\alpha,\Phi_1}$ is stronger than $K_{\beta,\Phi_1}$,
(iii) if $K_{\alpha,\Phi}$ is equivalent to $K_{\beta,\Phi}$, then $K_{\alpha,\Phi_1}$ is equivalent to $K_{\beta,\Phi_1}$.

Proof. (i) follows from the condition that $K_{\alpha,\Phi}^{-1}(L) \cap K_{\beta,\Phi}^{-1}(L) \cap \Phi_1 \neq \emptyset$ for every language $L$. Now let $K_{\alpha,\Phi}$ be stronger than $K_{\beta,\Phi}$. Let $L$ be a language. Then

(*) $K_{\alpha,\Phi}^{-1}(L) = \{G;\ L(G) = L,\ K_{\alpha,\Phi}(L) = K_\alpha(G),\ G \in \Phi\} \subset K_{\beta,\Phi}^{-1}(L) = \{G;\ L(G) = L,\ K_{\beta,\Phi}(L) = K_\beta(G),\ G \in \Phi\}$

and $K_{\alpha,\Phi}(L) = \min \{K_\alpha(G);\ L(G) = L,\ G \in \Phi\}$. By the assumption of the Theorem we have $K_{\alpha,\Phi}(L) = K_{\alpha,\Phi_1}(L)$ and similarly $K_{\beta,\Phi}(L) = K_{\beta,\Phi_1}(L)$. Therefore $K_{\alpha,\Phi_1}^{-1}(L) = K_{\alpha,\Phi}^{-1}(L) \cap \Phi_1$. Similarly $K_{\beta,\Phi_1}^{-1}(L) = K_{\beta,\Phi}^{-1}(L) \cap \Phi_1$ and (ii) follows from (*). (iii) follows directly from (ii).

As to the classifications $K_i$, $1 \leq i \leq 5$, we have the following results.

8.6. THEOREM. Let $1 \leq i, j \leq 5$, $ij \neq 8$.
Then the classifications $K_i$ and $K_j$ are bound if and only if $i = j$.

The case $i = j$ is trivial. In order to prove the Theorem it is sufficient to show that if $i \neq j$, $ij \neq 8$ (i.e., the cases $i = 2$, $j = 4$ and $i = 4$, $j = 2$ are omitted), then a language $L$ exists such that $K_i^{-1}(L) \cap K_j^{-1}(L) = \emptyset$. We shall give here such a language for every considered pair $i, j$, but the proof that $K_i^{-1}(L) \cap K_j^{-1}(L) = \emptyset$ will be omitted because it is quite obvious in some cases and in other cases it can be proved by using methods similar to those used in previous sections, although the proofs are rather cumbersome. To be brief denote $\psi_{i,j}(L) = K_i^{-1}(L) \cap K_j^{-1}(L)$ for $1 \leq i, j \leq 5$ and a language $L$. Obviously $\psi_{i,j}(L) = \psi_{j,i}(L)$.

(i) Denote by $R_1$ the regular event generated by the grammar with the productions

z ~ ~a[ zlt~b I(r2a3b a2 ~ a2a~l(Tsa4b]aab ~ ~ ~3a41 (Tab Ib

If $G \in K_1^{-1}(R_1)$, then $\mathrm{Var}(G) = 4 = \mathrm{Dep}(G)$, $\mathrm{Lev_n}(G) = 1$. On the other hand $\mathrm{Dep}(R_1) = 1$ and $\mathrm{Lev_n}(R_1) = 0$ because $R_1$ is a regular event. Therefore $\psi_{1,2}(R_1) = \psi_{1,4}(R_1) = \emptyset$. Moreover, if $\mathrm{Lev}(G) = 1$ and $L(G) = R_1$, then $\mathrm{Dep}(G) > 1$, $\mathrm{Lev_n}(G) > 0$. Thus $\psi_{2,3}(R_1) = \psi_{3,4}(R_1) = \emptyset$.

(ii) Denote by $R_2$ the regular event generated by the grammar

aa ]o'la2b a ~

Hence $\mathrm{Lev}(R_2) = 1$. Moreover, if $G \in K_3^{-1}(R_2)$, then $\mathrm{Var}(G) = 4$. On the other hand $\mathrm{Var}(R_2) < 4$. Indeed, $R_2$ is generated by the grammar

3 2 a~ --->c~z~1ala ]ba ~ a b 8

Hence $\psi_{1,3}(R_2) = \emptyset$.

(iii) Denote by $R_3$ the finite set $\{a^i;\ i = 3, 4, 5, 6\}$. Obviously $\mathrm{Prod}(R_3) = 3$ and if $G \in K_5^{-1}(R_3)$, then $\mathrm{Var}(G) = \mathrm{Lev}(G) = 2$. On the other hand, $R_3$ is a finite set and therefore $\psi_{1,5}(R_3) = \psi_{3,5}(R_3) = \emptyset$.

(iv) Denote by $R_4$ the regular event generated by the grammar

$\sigma \to \sigma a \mid \sigma_1 a^2 b \mid \sigma_2 a^3 b$
$\sigma_1 \to \sigma ab \mid \sigma_1 a^2 \mid \sigma_2 a^3 b$
$\sigma_2 \to \sigma ab \mid \sigma_1 a^2 b \mid \sigma_2 a^3 \mid b$

$\mathrm{Prod}(R_4) = 10$. If $G \in K_5^{-1}(R_4)$, then $\mathrm{Dep}(G) = 3$, $\mathrm{Lev_n}(G) > 0$. Since $R_4$ is a regular event, we have $\psi_{2,5}(R_4) = \psi_{4,5}(R_4) = \emptyset$.
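Step (iii) can be made concrete by computing the five measures for two grammars of $R_3$. The Python sketch below is an illustration, not the paper's method: it reads a grammatical level as a class of mutually dependent variables (with every variable counted as dependent on itself), takes Dep as the size of the largest class and $\mathrm{Lev_n}$ as the number of classes with at least two variables. The encoding of grammars as dictionaries and all names are assumptions of the sketch.

```python
def measures(grammar):
    """Var, Prod, Lev, Lev_n and Dep of a grammar {variable: [right-hand sides]}.

    Right-hand sides are strings; a character is a variable iff it is a key.
    """
    variables = list(grammar)
    # A directly depends on B if B occurs in some right-hand side of A.
    direct = {A: {c for rhs in grammar[A] for c in rhs if c in grammar}
              for A in variables}

    def reachable(A):
        seen, stack = {A}, [A]
        while stack:
            for B in direct[stack.pop()]:
                if B not in seen:
                    seen.add(B)
                    stack.append(B)
        return seen

    reach = {A: reachable(A) for A in variables}
    # A level is a class of mutually dependent variables.
    levels = {frozenset(B for B in variables if B in reach[A] and A in reach[B])
              for A in variables}
    return {
        "Var": len(variables),
        "Prod": sum(len(rules) for rules in grammar.values()),
        "Lev": len(levels),
        "Lev_n": sum(1 for level in levels if len(level) >= 2),
        "Dep": max(len(level) for level in levels),
    }


# Two grammars for R_3 = {a^3, a^4, a^5, a^6}; "S" stands for sigma.
FEW_PRODUCTIONS = {"S": ["AAA"], "A": ["a", "aa"]}
FEW_VARIABLES = {"S": ["aaa", "aaaa", "aaaaa", "aaaaaa"]}

print(measures(FEW_PRODUCTIONS))  # Var 2, Prod 3, Lev 2, Lev_n 0, Dep 1
print(measures(FEW_VARIABLES))    # Var 1, Prod 4, Lev 1, Lev_n 0, Dep 1
```

The grammar with three productions needs two variables and two levels, while the one-variable grammar needs four productions, which is the trade-off behind $\psi_{1,5}(R_3) = \psi_{3,5}(R_3) = \emptyset$.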
We have just shown that if $1 \leq i, j \leq 5$, $i \neq j$, $ij \neq 8$, then a regular event $R$ exists such that $\psi_{i,j}(R) = \emptyset$. On the other hand $K_2^{-1}(R) = K_4^{-1}(R)$ for every regular event $R$ and this fact complicates the investigation for the case $i = 2$, $j = 4$.

Open problem. Are the classifications Dep and $\mathrm{Lev_n}$ bound?

9. MULTIPLE CLASSIFICATIONS

9.1. DEFINITION. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars. Put, for a language $L$,

$K_{(\alpha,\beta)}(L) = \min \{K_\alpha(G);\ G \in K_\beta^{-1}(L)\}$,
$K_{(\alpha,\beta)}^{-1}(L) = \{G;\ G \in K_\beta^{-1}(L),\ K_\alpha(G) = K_{(\alpha,\beta)}(L)\}$.

$K_{(\alpha,\beta)}$ is said to be a multiple classification of languages composed from $K_\alpha$ and $K_\beta$.

9.2. COROLLARY. $K_{(\alpha,\beta)}(L) = K_{\alpha,\,K_\beta^{-1}(L)}(L)$ and $K_{(\alpha,\beta)}^{-1}(L) = K_{\alpha,\,K_\beta^{-1}(L)}^{-1}(L)$ for every language $L$ and classifications $K_\alpha$ and $K_\beta$.

9.3. COROLLARY. $K_{(\alpha,\beta)}(L) \geq K_\alpha(L)$ for every language $L$ and therefore $K_{(\alpha,\beta)}$ is nontrivial if $K_\alpha$ is.

In a natural way an important question arises as to what we can say about the relation between the classifications of languages $K_{(\alpha,\beta)}$ and $K_{(\beta,\alpha)}$. First we have

9.4. THEOREM. For every language $L$ either $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) = \emptyset$ or $K_{(\alpha,\beta)}^{-1}(L) = K_{(\beta,\alpha)}^{-1}(L)$.

Proof. Suppose that $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) \neq \emptyset$ for a language $L$. Let $G \in K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L)$. Then $L(G) = L$, $K_\beta(G) = K_\beta(L) = K_{(\beta,\alpha)}(L)$, $K_{(\alpha,\beta)}(L) = K_\alpha(G) = K_\alpha(L)$. Now, let $G_1$ be any grammar in $K_{(\alpha,\beta)}^{-1}(L)$. Then $L(G_1) = L$, $K_\beta(G_1) = K_\beta(L)$ and, moreover, $K_\alpha(G_1) = K_\alpha(G) = K_\alpha(L)$. Hence $G_1 \in K_{(\beta,\alpha)}^{-1}(L)$ and we have $K_{(\alpha,\beta)}^{-1}(L) \subset K_{(\beta,\alpha)}^{-1}(L)$. The reverse inclusion can be proved similarly.

9.5. THEOREM. For every language $L$, $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) = K_\alpha^{-1}(L) \cap K_\beta^{-1}(L)$.

Proof. $G \in K_\alpha^{-1}(L) \cap K_\beta^{-1}(L)$ if and only if $L(G) = L$, $K_\alpha(G) = K_\alpha(L)$, $K_\beta(L) = K_\beta(G)$, which is true if and only if $G \in K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L)$.

9.6. THEOREM. $K_{(\alpha,\beta)}^{-1} = K_{(\beta,\alpha)}^{-1}$ if and only if the classifications $K_\alpha$ and $K_\beta$ are bound.

Proof. Directly from the preceding theorems.

9.7. THEOREM. $K_{(\alpha,\beta)}^{-1}(L) = K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) = K_{(\beta,\alpha)}^{-1}(L)$ for every language $L$ if and only if the classifications $K_\alpha$ and $K_\beta$ are bound.

Proof. According to Theorems 9.4 to 9.6.

9.8. THEOREM. The classifications $K_\alpha$ and $K_\beta$ are bound if and only if $K_{(\alpha,\beta)} = K_\alpha$, i.e. $K_{(\alpha,\beta)}(L) = K_\alpha(L)$ for every language $L$.

Proof. $K_{(\alpha,\beta)}(L) = \min \{K_\alpha(G);\ G \in K_\beta^{-1}(L)\}$. If $K_\alpha$ and $K_\beta$ are bound, then for every language $L$, $K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) \neq \emptyset$, hence $K_{(\alpha,\beta)}(L) = \min \{K_\alpha(G);\ G \in K_\beta^{-1}(L)\} = K_\alpha(L)$. On the other hand let $K_{(\alpha,\beta)} = K_\alpha$. Then $\min \{K_\alpha(G);\ G \in K_\beta^{-1}(L)\} = K_\alpha(L)$ for every language $L$. Hence $K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) \neq \emptyset$ and the classifications $K_\alpha$ and $K_\beta$ are bound.

9.9. COROLLARY. If $K_\alpha$ and $K_\beta$ are considered as classifications of languages, then $K_{(\alpha,\beta)} = K_\alpha$ if and only if $K_{(\beta,\alpha)} = K_\beta$.

9.10. THEOREM. Let the classifications $K_\alpha$ and $K_\beta$ be bound. Then $K_\alpha(L) = K_\beta(L)$ for every language $L$ if and only if $K_{(\alpha,\beta)} = K_{(\beta,\alpha)}$.

Proof. According to Theorem 9.8.

Remark. The assumption that $K_\alpha$ and $K_\beta$ are bound is necessary.

Consider now the classifications $K_i$, $1 \leq i \leq 5$. We have the following Theorem.

9.11. THEOREM. If $i \neq j$, $ij \neq 8$, then there is a language $L$ such that $K_{(i,j)}^{-1}(L) \cap K_{(j,i)}^{-1}(L) = \emptyset$.

The proof follows directly from Theorems 8.6, 9.4 and 9.6. If $i \neq j$ we have also $K_{(i,j)} \neq K_{(j,i)}$. Indeed, we have $K_{(2,1)}(R_1) = 4 < 5 \leq K_{(1,2)}(R_1)$, $K_{(3,1)}(R_1) = 1 < K_{(1,3)}(R_1) = 4$, $K_{(4,1)}(R_1) = 1 < 5 \leq K_{(1,4)}(R_1)$, $K_{(2,3)}(R_1) = 4 < 5 \leq K_{(3,2)}(R_1)$, where $R_1$ is as in the proof of Theorem 8.6. Moreover, $K_{(1,5)}(R_3) = 2$, $K_{(2,5)}(R_3) = 1$, $K_{(3,5)}(R_3) = 2$, $K_{(4,5)}(R_3) = 0$ and $K_{(5,i)}(R_3) \geq 3$ for $i = 1, 2, 3, 4, 5$, where $R_3$ is as in the proof of Theorem 8.6. Finally, if $R$ is any regular event, then $K_{(2,4)}(R) = 1$ and $K_{(4,2)}(R) = 0$, and if $F$ is a finite set, then $K_{(3,4)}(F) = 1$, $K_{(4,3)}(F) = 0$.

Finally we have the

9.12. THEOREM. Classifications $K_{(i,j)}$, $1 \leq i, j \leq 5$, are connected in any alphabet with at least two symbols.

Proof.
For $i = j$ the assertion of the Theorem follows directly from Theorems 4.2, 5.3, 5.4, 6.2 and 6.3. In order to prove the rest of the Theorem we shall consider several cases.

(i) $i = 1$. Obviously $K_{(1,j)}(\{a\}) = 1$ for $j = 2, 3, 4, 5$. Now let $n > 1$ and $G_n$ be the grammar (1) in the proof of Theorem 4.2. Obviously

(A) $\mathrm{Var}(G_n) = \mathrm{Dep}(G_n) = n$, $\mathrm{Lev}(G_n) = \mathrm{Lev_n}(G_n) = 1$, $\mathrm{Prod}(G_n) = 2n + 1$.

Denote, as in 4.2, $L(G_n) = L_n$. We shall now prove that

(A1) $\mathrm{Var}(L_n) = \mathrm{Dep}(L_n) = n$, $\mathrm{Lev}(L_n) = \mathrm{Lev_n}(L_n) = 1$, $\mathrm{Prod}(L_n) = 2n + 1$.

Indeed, $\mathrm{Dep}(L_n) = n$ is just Theorem 4.2. $\mathrm{Var}(L) \geq \mathrm{Dep}(L)$ for every language $L$ and therefore $n = \mathrm{Dep}(L_n) \leq \mathrm{Var}(L_n) \leq \mathrm{Var}(G_n) = n$, proving $\mathrm{Var}(L_n) = n$. $\mathrm{Lev}(G_n) = 1$ implies $\mathrm{Lev}(L_n) = 1$. Since $\mathrm{Dep}(L_n) > 1$ and $\mathrm{Lev_n}(G_n) = 1$ we have $\mathrm{Lev_n}(L_n) = 1$. In order to prove $\mathrm{Prod}(L_n) = 2n + 1$ we proceed as follows: Let $G = (V, \Sigma, P, \sigma)$ be a grammar for $L_n$ such that $\mathrm{Prod}(G) = \mathrm{Prod}(L_n)$. The grammar $G$ is obviously reduced. Moreover, if $A \neq \sigma$ is a variable in $G$, then there are at least two productions in $P$ with $A$ on the left-hand side. (Indeed, suppose that there is only one production $A \to \alpha$ in $P$ with $A$ on the left-hand side. Since $G$ is reduced, $A$ does not occur in $\alpha$ and therefore we can delete the production $A \to \alpha$ from $P$ and replace $A$ by $\alpha$ in all other productions, which is a contradiction with $\mathrm{Prod}(G) = \mathrm{Prod}(L_n)$.) Thus $\mathrm{Prod}(L_n) \geq 2\,\mathrm{Var}(G) - 1$. Since $\mathrm{Prod}(G_n) = 2n + 1$, we get $\mathrm{Prod}(L_n) = 2n + 1$ if $\mathrm{Var}(G) > n$. Let now $\mathrm{Var}(G) = n$. Since $\mathrm{Dep}(L_n) = n$, we have immediately $\mathrm{Dep}(G) = n$. Hence $A \equiv \sigma$ for any variable $A \neq \sigma$ in $G$ and, moreover, following the proof of Theorem 4.2, $G$ is a linear grammar. Whence it follows that for every variable $A$ in $G$ there are strings $\alpha_1 \neq \alpha_2$ such that $A \to \alpha_1$ and $A \to \alpha_2$ are productions and $\alpha_1, \alpha_2$ are not terminal strings.
However, $L(G) = L_n$ and therefore there is at least one variable $B$ in $G$ such that $B \to \beta$ for a terminal string $\beta$. Thus $\mathrm{Prod}(G) \geq 2n + 1$. Since $\mathrm{Prod}(G_n) = 2n + 1$, we have $\mathrm{Prod}(L_n) = 2n + 1$ and (A1) is proved. Having proved (A) and (A1) we get immediately that $K_{(1,j)}(L_n) = n$ for $j = 2, 3, 4, 5$.

(ii) $i = 2$. Obviously $K_{(2,j)}(\{a\}) = 1$ for $1 \leq j \leq 5$. Now let $n > 1$. By (A) and (A1), $G_n \in K_j^{-1}(L_n)$ if $n > 1$, $j \in \{1, 2, 3, 4, 5\}$. Thus

$n = K_2(L_n) \leq K_{(2,j)}(L_n) = \min \{K_2(G);\ G \in K_j^{-1}(L_n)\} \leq K_2(G_n) = n$

and we have $K_{(2,j)}(L_n) = n$.

(iii) $i = 3$. Clearly $K_{(3,j)}(\{a\}) = 1$ for $1 \leq j \leq 5$. Denote now by $R_2'$ the regular event generated by the grammar $G_2'$ with productions

Obviously

(B) $\mathrm{Var}(G_2') = \mathrm{Lev}(G_2') = 2$, $\mathrm{Dep}(G_2') = 1$, $\mathrm{Lev_n}(G_2') = 0$, $\mathrm{Prod}(G_2') = 4$.

According to the proof of Theorem 5.3, $\mathrm{Lev}(R_2') = 2$. Consequently $2 = \mathrm{Lev}(R_2') \leq \mathrm{Var}(R_2') \leq \mathrm{Var}(G_2') = 2$. Since $R_2'$ is a regular event we have $\mathrm{Dep}(R_2') = 1$, $\mathrm{Lev_n}(R_2') = 0$. It is also easy to show that $\mathrm{Prod}(R_2') = 4$. Hence (B) holds for $R_2'$ as well as $G_2'$, whence we get immediately that $K_{(3,j)}(R_2') = 2$ for $1 \leq j \leq 5$.

Let now $n \geq 2$. Denote by $G_{n+1}$ the grammar (1) from the proof of Theorem 5.3. Clearly

(C) $\mathrm{Var}(G_{n+1}) = n + 1 = \mathrm{Lev}(G_{n+1})$, $\mathrm{Lev_n}(G_{n+1}) = 0$, $\mathrm{Dep}(G_{n+1}) = 1$, $\mathrm{Prod}(G_{n+1}) = 3n$.

Denote $R_{n+1} = L(G_{n+1})$. By the proof of Theorem 5.3, $\mathrm{Lev}(R_{n+1}) = n + 1$. Hence $n + 1 = \mathrm{Lev}(R_{n+1}) \leq \mathrm{Var}(R_{n+1}) \leq \mathrm{Var}(G_{n+1}) = n + 1$ and $\mathrm{Var}(R_{n+1}) = n + 1$. Since $R_{n+1}$ is a regular event we have $\mathrm{Dep}(R_{n+1}) = 1$, $\mathrm{Lev_n}(R_{n+1}) = 0$. Finally we shall prove that $\mathrm{Prod}(R_{n+1}) = 3n$. To do this let $G = (V, \{a, b\}, P, \sigma)$ be a grammar for $R_{n+1}$ such that $\mathrm{Prod}(G) = \mathrm{Prod}(R_{n+1})$. Similarly as in point (i) we can show that if $A$ is a variable and $A \neq \sigma$, then in $G$ there are at least two productions with $A$ on the left-hand side.
Next, by using arguments similar to those in the proof of Lemma 1, Gruska (1967), we can show that if $A$ is a variable in $G$ and there are terminal strings $x, y$, $xy \neq \epsilon$, such that $A \Rightarrow^* xAy$, then the assertion (2) from the proof of Theorem 5.3 holds. Then we say that $A$ is an R-variable of the type $i$ (as to the $i$ see (2) in Theorem 5.3). From the structure of the language $R_{n+1}$ it follows that for every integer $i \leq n$ there is an R-variable of the type $i$ and, moreover, if $A_j$ is an R-variable of the type $i_j$, $j = 1, 2$, $i_1 \neq i_2$, then neither does $A_1$ depend on $A_2$ nor $A_2$ on $A_1$, nor are there strings $x, y, z$ such that $\sigma \Rightarrow^* x A_1 y A_2 z$. Whence it follows that for every R-variable $A$ there are at least two productions with $A$ on the left-hand side and at least one production with $A$ on the right-hand side and such that the variable on the left-hand side is not $A$. Thus $\mathrm{Prod}(G) \geq 3n$. Summarizing the foregoing results we get

(C1) $\mathrm{Var}(R_{n+1}) = n + 1 = \mathrm{Lev}(R_{n+1})$, $\mathrm{Lev_n}(R_{n+1}) = 0$, $\mathrm{Dep}(R_{n+1}) = 1$, $\mathrm{Prod}(R_{n+1}) = 3n$

and hence, by (C) and (C1), $K_{(3,j)}(R_{n+1}) = n + 1$ for $1 \leq j \leq 5$.

(iv) $i = 4$. Let $n \geq 1$. Denote by $G_n$ the grammar (1) in the proof of Theorem 5.4. Clearly

(D) $\mathrm{Var}(G_n) = 2n + 1$, $\mathrm{Lev}(G_n) = n + 1$, $\mathrm{Lev_n}(G_n) = n$, $\mathrm{Dep}(G_n) = 2$, $\mathrm{Prod}(G_n) = 6n$.

Denote $L(G_n) = L_n$. By Theorem 5.4, $\mathrm{Lev_n}(L_n) = n$ and hence $\mathrm{Dep}(L_n) > 1$. Since $\mathrm{Dep}(G_n) = 2$ we have $\mathrm{Dep}(L_n) = 2$. Now let $G = (V, \Sigma, P, \sigma)$ be a reduced grammar for $L_n$. Let $\sigma \Rightarrow^* x \sigma y$ for some strings $x$ and $y$. According to the properties of strings in $L_n$ (see (i) to (v) in the proof of Theorem 5.4) we get $G(x) = G(y) = \{\epsilon\}$ and therefore we can omit in $G$ all productions having $\sigma$ on the right-hand side without changing the language and without increasing the number of variables or grammatical levels. This fact, together with $\mathrm{Lev_n}(L_n) = n$, $\mathrm{Var}(G_n) = 2n + 1$, $\mathrm{Lev}(G_n) = n + 1$, implies $\mathrm{Lev}(L_n) = n + 1$, $\mathrm{Var}(L_n) = 2n + 1$. Finally we have $\mathrm{Prod}(L_n) = 6n$.
We do not give the detailed proof of this assertion here; only the main ideas of such a proof will be sketched. Let $G = (V, \Sigma, P, \sigma)$ be a grammar for $L_n$ with a minimal number of productions. We may suppose that $\sigma$ does not occur on the right-hand sides of productions. Moreover, for $1 \leq i \leq n$, there is a non-elementary grammatical level $G_i$ in $G$ such that if a variable from $G_i$ is used in a derivation of a string $x \in L_n$, then $x \in$ L~ (see the proof of 5.4). Next, if $\sigma \Rightarrow^* x$, then there is in $x$ at most one variable which belongs to a non-elementary grammatical level of $G$. Consequently, the number of productions in $G$ is at least $n$ + (the number of productions in non-elementary grammatical levels). However, every non-elementary grammatical level has at least 5 productions. From that and from $\mathrm{Lev_n}(L_n) = n$ we get $\mathrm{Prod}(G) \geq 6n$. But $\mathrm{Prod}(G_n) = 6n$, whence $\mathrm{Prod}(L_n) = 6n$. Summarizing the foregoing results we get that (D) holds for $L_n$ as well as $G_n$, yielding $K_{(4,j)}(L_n) = n$, $1 \leq j \leq 5$.

(v) If $F$ is a finite set with $n$ strings, then $K_{(5,1)}(F) = n = K_{(5,3)}(F)$. Put $F_0 = \{a^{2^i};\ i = 0, 1, \ldots, n - 1\}$. By Theorem 6.3 we have $K_{(5,2)}(F_0) = K_{(5,4)}(F_0) = n$. This completes the proof of the Theorem.

10. REMARKS AND OPEN QUESTIONS

1. In this paper some basic concepts concerning classifications of context-free grammars and languages were introduced. However, we used in it only the fact that context-free grammars form a class of generative devices and context-free languages are just the objects that are defined by context-free grammars. That is why the basic definitions and concepts given in this paper can be applied whenever we want to study the classifications (with respect to "complexity") of some generative devices and the objects defined by them. (For example context-sensitive grammars and languages.)

2.
If $K$ is one of the classifications $K_1$ to $K_5$ and $K(G_1) < K(G_2)$ for some grammars $G_1$ and $G_2$ (or $K(L_1) < K(L_2)$ for languages $L_1$ and $L_2$), then $G_1$ ($L_1$) is simpler than $G_2$ ($L_2$) either from the point of view of the number of elements needed to describe grammars (languages), in the cases $K_1$ and $K_5$, or from the point of view of the internal structure of grammars (languages), see $K_2$ to $K_4$. Therefore, the classifications $K_1$ to $K_5$ can be viewed as some criteria of complexity of both grammars and languages. Naturally, some other classifications of this kind are possible and it is very difficult to say which of the classifications gives the best picture of the complexity of grammars and languages. Moreover, it is questionable whether "the best classification" exists.

3. Other ways of classifying context-free languages are by time and memory requirements for recognition (Hartmanis and Stearns, 1965, and Hartmanis, Lewis, and Stearns, 1965). These classifications and especially the case of real-time recognition are very important from the practical point of view. That is why a question arises as to the connection between the classifications by time and memory requirements and those considered in this paper or classifications of a similar type. Some results indicate that even very simple languages, with respect to the classifications $K_1$ to $K_5$, may be difficult to recognize. For example, if the language generated by the grammar $G$ with the productions

- , O~j 1~ t s~ IxZs X -~ 0X0[ 1X1 ]sYs Y--~ OY] 1YI sY I fO I Yl l f slc ,

where $\mathrm{Var}(G) = \mathrm{Lev}(G) = 3$, $\mathrm{Dep}(G) = 1$, $\mathrm{Lev_n}(G) = 0$, is $T(n)$-recognizable, in the sense of Hartmanis and Stearns (1965), by an on-line multitape Turing machine, then, by Kasami (1967), there is a constant $C$ such that $T(n) > C (n / \log n)^2$.

4. Some upper bounds for the number of steps to recognize context-free languages are well known.
One of the best is given in Čulík et al. (to appear) in the form

(1) $(\mathrm{Var}(G)\,|x|)^{N(G)+1} (N(G) - 1)$

where $x$ is a string to be recognized, $G$ is a grammar, $N(G)$ is the maximal number of variables on the right-hand sides of productions in $G$ and (1) is the upper bound for the number of steps to recognize whether $x \in L(G)$. This result partly indicates the importance of the classification Var and partly directs the attention to the classification $N$. Obviously $N(L) \leq 2$ for every language $L$. Moreover $N(L) = 0$ if and only if $L$ is a finite set, $N(L) = 1$ if and only if $L$ is an infinite linear language. Although the classification $N$ is not nontrivial, it seems, putting $K_6 = N$, that the classifications $K_{(1,6)}$ or $K_{(6,1)}$ may be interesting.

5. Let here and in the next two points $K$ be some of the classifications $K_1$ to $K_5$ and $n$ an integer. Denote $K^{-1}(n) = \{L;\ K(L) = n\}$. What can be said about classes $K^{-1}(n)$? Some results: By Gruska (1967), for every $n$ there are languages $L_1$ and $L_2$ in $K_1^{-1}(n)$ such that $L_1 \cap L_2$ is not a language. Moreover, if $L \in K_2^{-1}(n)$ and $(L - L_1) \cup (L_1 - L)$ is a finite set, then $L_1$ is also in $K_2^{-1}(n)$. The same is true for $K_4$. Let an integer $n$ be given. Is there an unambiguous language $L$ in K-~l(n) (g~l(n) or g~l (n ) )?

6. What is the relation between $K(L_1)$, $K(L_2)$ and $K(L_1 \cup L_2)$, $K(L_1 L_2)$, $K(L_1^*)$? Some of the quite obvious results:
(i) $K(L_1 \cup L_2) \leq K(L_1) + K(L_2) + 1$, $K(L_1 L_2) \leq K(L_1) + K(L_2) + 1$, $K(L_1^*) \leq K(L_1) + 1$ if $K = \mathrm{Var}$ or $K = \mathrm{Lev}$;
(ii) for every $n$ there is a language $L$ such that $K(L) = n$, $K(L^*) = 1$ and either $K = \mathrm{Var}$ or $K = \mathrm{Lev}$;
(iii) $\mathrm{Dep}(L_1 \cup L_2) \leq \max(\mathrm{Dep}(L_1), \mathrm{Dep}(L_2))$, $\mathrm{Dep}(L_1^*) \leq \mathrm{Dep}(L_1)$, $\mathrm{Dep}(L_1 L_2) \leq \max(\mathrm{Dep}(L_1), \mathrm{Dep}(L_2))$.
For what integers $n$ and $m$ are there languages $L_1$ and $L_2$ in $K^{-1}(n)$ such that
(iv) $K(L_1 \cup L_2) = m$,
(v) $K(L_1 \cap L_2) = m$,
(vi) $K(L_1 L_2) = m$,
(vii) $K(L_1^*) = m$?
Some results follow from (i) to (iii).
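The inequalities in (i) for $K = \mathrm{Var}$ come from the usual constructions that add one new start variable. The Python sketch below illustrates them; it is not taken from the paper, and the grammar encoding (a pair of start symbol and production dictionary, right-hand sides as tuples), the fresh start name "S0" and the prefixes "1." and "2." are all assumptions of the sketch.

```python
def tag(grammar, prefix):
    """Rename every variable by adding a prefix, so two grammars cannot clash."""
    start, prods = grammar
    rename = lambda s: prefix + s if s in prods else s
    tagged = {prefix + A: [tuple(rename(s) for s in rhs) for rhs in rules]
              for A, rules in prods.items()}
    return prefix + start, tagged


def union(g1, g2):
    (s1, p1), (s2, p2) = tag(g1, "1."), tag(g2, "2.")
    return "S0", {"S0": [(s1,), (s2,)], **p1, **p2}


def concat(g1, g2):
    (s1, p1), (s2, p2) = tag(g1, "1."), tag(g2, "2.")
    return "S0", {"S0": [(s1, s2)], **p1, **p2}


def star(g):
    s, p = tag(g, "1.")
    # The empty tuple is the production S0 -> epsilon.
    return "S0", {"S0": [(s, "S0"), ()], **p}


def var(grammar):
    """Number of variables of a grammar."""
    return len(grammar[1])


# Two one-variable grammars, for {a}^+ and {b}^+.
G_A = ("S", {"S": [("a", "S"), ("a",)]})
G_B = ("S", {"S": [("b", "S"), ("b",)]})

assert var(union(G_A, G_B)) == var(G_A) + var(G_B) + 1
assert var(concat(G_A, G_B)) == var(G_A) + var(G_B) + 1
assert var(star(G_A)) == var(G_A) + 1
```

Applying the constructions to grammars with the minimal number of variables gives the bounds for languages; the constructed grammars need not themselves be minimal.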
Moreover, it is known (Gruska, 1967) that if $1 \leq m \leq n$ then there are languages $L_1, L_2$ in $K_1^{-1}(n)$ such that $\mathrm{Var}(L_1 \cap L_2) = m$. Is it also true for any $m > n$? Let $L$ be a language and $R$ a regular event. What can be said about $K(L \cap R)$, $K(L - R)$, $K(LR)$? (If $R$ is a finite set, then $|\mathrm{Var}(L) - \mathrm{Var}(L - R)| \leq 1$, $|\mathrm{Lev}(L) - \mathrm{Lev}(L - R)| \leq 1$, $\mathrm{Dep}(L) = \mathrm{Dep}(L - R)$, $\mathrm{Lev_n}(L) = \mathrm{Lev_n}(L - R)$.) Is it true that $\mathrm{Dep}(L) = \max \{\mathrm{Dep}(L \cap R), \mathrm{Dep}(L \cap \bar{R})\}$ for any language $L$ and any regular event $R$? Let $S$ be a gsm mapping and $L$ a language. What is the relation between $K(L)$ and $K(S(L))$?

7. For what $n$ is it recursively solvable for an arbitrary grammar $G$ whether $K(L(G)) = n$? Let $L$ be a language. What can be said about the sets $K^{-1}(L)$? One of the results: There is an unambiguous language $L$ such that $K_1^{-1}(L)$ contains only ambiguous grammars. Is it true also for the classifications $K_2$ to $K_5$?

8. To every classification $K$ of grammars one can associate the relation, written $\succ$, between languages in the following manner: $L_1 \succ L_2$ if and only if there is a grammar $G \in K^{-1}(L_1)$ and a variable $A$ in $G$ such that $G(A) = L_2$. What can be said about languages $L_1$ and $L_2$ if $L_1 \succ L_2$ or $L_1 \succ^* L_2$, where $\succ^*$ is the transitive and reflexive closure of $\succ$?

9. As a corollary of inequalities (iii) in point 6 we have

THEOREM. Let $\Sigma$ be an alphabet with at least two symbols. Then there is no finite class of context-free languages over $\Sigma$ such that every context-free language in $\Sigma^*$ can be obtained from these languages by using only a finite number of Kleene operations $\cup$, $\cdot$, $*$.

RECEIVED: July 8, 1968; revised January 3, 1969

REFERENCES

1. ČULÍK, K., (Roma 1962) Formal structure of ALGOL and simplification of its description. Symbolic languages in data processing, pp. 75-82.
2. ČULÍK, K., FRIŠ, I., GRUSKA, J., HAVEL, I., KOPŘIVA, J., AND NOVOTNÝ, M., The mathematical theory of grammars and languages. (To be published).
3. GINSBURG, S., (1966) "The mathematical theory of context-free languages." McGraw-Hill, New York.
4. GINSBURG, S., AND ROSE, G. F., (1963a) Some recursively unsolvable problems in ALGOL-like languages. J. ACM 10, 29-47.
5. GINSBURG, S., AND ROSE, G. F., (1963b) Operations which preserve definability in languages. J. ACM 10, 175-195.
6. GRUSKA, J., (1967) On a classification of context-free grammars. Kybernetika 3, 22-29.
7. HARTMANIS, J., AND STEARNS, R. E., (1965) On the computational complexity of algorithms. Trans. Am. Math. Soc. 117, 285-306.
8. HARTMANIS, J., LEWIS, 2nd, P. M., AND STEARNS, R. E., (1965) Classifications of computations by time and memory requirements. Proceedings of IFIP Congress, 31-35.
9. KASAMI, T., (1967) A note on computing time for recognition of languages generated by linear grammars. Inform. and Control 10, 209-214.
10. KOPŘIVA, J., (1964) Some notes on the formal structure of ALGOL 60. Publ. Fac. Sci. Univ. J. E. Purkyně, 409-418.
11. NAUR, P., (1963) Revised report on the algorithmic language ALGOL 60. CACM 6, 1-20.
12. REDKO, V. N., (1965) Some problems of language theory. Kibernetika 1, 12-21.
