INFORMATION AND CONTROL
14, 152-179 (1969)
Some Classifications of Context-Free Languages
J. GRUSKA¹
Mathematical Institute of Slovak Academy of Sciences, Bratislava, Czechoslovakia
1. INTRODUCTION AND SUMMARY
The basic definitions and notations of the theory of context-free
grammars and languages (briefly grammars and languages) used in this
paper are as in Ginsburg (1966).
The classification of languages L according to the minimal number of
variables in grammars for L was studied in Gruska (1967). In this paper
some other classifications of grammars and languages are investigated.
They are chosen in such a way as to characterize some aspects of our
intuitive notion about complexity (of the description) of grammars and
languages and their intrinsic structure. The classifications of languages
are indicated by those of grammars. The intrinsic structure of a grammar
G is characterized by the number and by the depth of the grammatical
levels of G.
A grammatical level Go of a grammar G is a maximal set of productions
of G the left-side symbols of which are mutually dependent. The basic
concepts of grammatical levels and classifications of grammars and
languages are given in Sections 2 and 3. Only such classifications K are
considered here, wherein for every grammar G (language L) K(G)
(K(L)) is an integer. In this paper only nonnegative integers will be
considered. A classification K is said to be connected in an alphabet $\Sigma$
if for every integer n there is a language $L \subset \Sigma^*$ such that K(L) = n.
Sections 4 to 6 provide the proofs that the classifications according to the
number of variables, the number of productions, the number of grammatical levels, the number of non-elementary grammatical levels (that
is, the grammatical levels with at least two variables) and the maximal
depth of grammatical levels (that is, according to the maximal number
of variables in grammatical levels) are connected in any alphabet with
at least two symbols. All these classifications of languages are based
upon the classifications of grammars and the integer associated to a
language L is the minimum of those associated to all grammars for L.
If only a restricted class of grammars for L is considered, we speak
about bounded classifications. They are studied in Section 7 where
especially the case of regular events, one-side linear grammars and
non-self-embedding grammars is investigated.
¹ Currently at University of Minnesota, School of Mathematics, Minneapolis,
Minn. 55955.
Section 8 is devoted to the relations between classifications of languages.
Besides general results this section provides the proof that if
only classifications from Sections 4 to 6 are considered, then, with one
exception, for any two of them, written K and K', there is a language L
such that the class of the simplest grammars for L according to K is
disjoint with the class of the simplest grammars for L according to K'.
In Section 9 the so-called multiple classifications are considered. The
proof is given here that no two of the classifications considered in Sections
4 to 6 are symmetric and each two of them give a new classification
which is again connected in any alphabet with at least two symbols.
In the final section some generalizations, open questions and problem
areas are discussed.
2. GRAMMATICAL LEVELS
A grammatical level of a grammar G is represented by a set of productions
(of G) the left-side symbols of which are mutually dependent.
We can say that the left-side symbols of productions in a grammatical
level are "equally complicated" or are "of the same level", or define "the
equally complicated languages".
2.1. DEFINITION. Let $G = (V, \Sigma, P, \sigma)$ be a grammar. For a set
$\alpha \subset V^*$ let $G(\alpha) = \{x;\ z \Rightarrow^* x \in \Sigma^*,\ z \in \alpha\}$.
For two variables A, B put $A \triangleright_G B$ if there are strings x and y
such that the production $A \to xBy$ is in P. Let $\triangleright_G^*$ be the
transitive and reflexive closure of the relation $\triangleright_G$. For
two variables A, B put $A \equiv_G B$ if $A \triangleright_G^* B$ and
$B \triangleright_G^* A$. For two productions $p_1 = A \to \alpha$,
$p_2 = B \to \beta$ put $p_1 \equiv_1 p_2$ if and only if $A \equiv B$. When there
is no danger of misunderstanding the symbol specifying the grammar is
omitted.
2.2. COROLLARY. The relations $\equiv$ on $V - \Sigma$ and $\equiv_1$ on P are obviously
equivalence relations. The relation $\triangleright$ is termed dependence relation and was
introduced in Culik (1962) and Kopřiva (1964). As to the relation $\equiv$, see
Culik (1962). The relations $\equiv$ and $\equiv_1$ are termed level relations for variables
and productions, respectively.
2.3. DEFINITION. Let $G = (V, \Sigma, P, \sigma)$ be a grammar and $\equiv_1$ the level
relation on P. A set $G_0 \subset P$ is said to be a grammatical level of G if and
only if $G_0$ is an equivalence class with respect to the relation $\equiv_1$.
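Read operationally, a grammatical level is the set of productions whose left-side variables form one class of mutually dependent variables, that is, one strongly connected component of the dependence graph. The following Python sketch illustrates this reading. The function name `grammatical_levels`, the encoding of a production as a pair `(left, right)` with `right` a tuple of symbols, and the sample grammar are illustrative assumptions and not notation of this paper.

```python
def grammatical_levels(productions, variables):
    """Group productions into grammatical levels.

    productions: list of (left, right) pairs, right being a tuple of symbols.
    variables:   set of variable symbols.
    Returns a dict mapping a frozenset of mutually dependent variables
    to the list of productions whose left side lies in that set.
    """
    # reach[A] holds the variables A depends on (reflexive by construction).
    reach = {A: {A} for A in variables}
    for left, right in productions:
        reach[left].update(s for s in right if s in variables)

    # Transitive closure by repeated expansion until nothing changes.
    changed = True
    while changed:
        changed = False
        for A in variables:
            expanded = set().union(*(reach[B] for B in reach[A]))
            if not expanded <= reach[A]:
                reach[A] |= expanded
                changed = True

    levels = {}
    for left, right in productions:
        # Class of `left`: variables it depends on that also depend on it.
        cls = frozenset(B for B in reach[left] if left in reach[B])
        levels.setdefault(cls, []).append((left, right))
    return levels


# Hypothetical sample grammar: sigma -> S S, S -> a S | S b | b a
sample = [
    ("sigma", ("S", "S")),
    ("S", ("a", "S")),
    ("S", ("S", "b")),
    ("S", ("b", "a")),
]
levels = grammatical_levels(sample, {"sigma", "S"})
```

For this sample the result has two levels: one holding the single production for `sigma`, and one holding the three productions for `S`.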
2.4. DEFINITION. A grammar $G = (V, \Sigma, P, \sigma)$ is said to be reduced
(see Ginsburg (1966)) if for every variable $A \neq \sigma$ there are terminal
strings x, y and z such that $\sigma \Rightarrow^* xAy \Rightarrow^* xzy$.
2.5. Remark. A context-free grammar is usually defined as a quadruple
$(V, \Sigma, P, \sigma)$. If G is a reduced grammar and every symbol from $\Sigma$ occurs
in a string in L(G), then G is uniquely determined by P and $\sigma$. The basic
relations $\to$, $\Rightarrow$, $\Rightarrow^*$, $\triangleright$, $\triangleright^*$ do not depend on $\sigma$. That is why we regard
grammatical levels as (non-initial) context-free grammars.
2.6. LEMMA. For every grammar $G' = (V', \Sigma, P', \sigma)$ such that $\varepsilon$ is not in
L(G') there is a grammar $G = (V, \Sigma, P, \sigma)$ such that the following conditions
are satisfied:
(1) $V \subset V'$;
(2) if A, B are in $V - \Sigma$, then $A \triangleright_{G'}^* B$ if and only if $A \triangleright_G^* B$;
(3) L(G') = L(G);
(4) G is an $\varepsilon$-free and reduced grammar;
(5) there are no productions of the form $A \to B$, $B \in V - \Sigma$, in P;
(6) $A \triangleright_G A$ for every variable $A \neq \sigma$, $A \in V - \Sigma$.
Proof. Let G' be given and $\varepsilon$ not in L(G'). By using the constructions
given in Lemmas 1.4.2, 1.4.3, 1.8.2 (Ginsburg, 1966) and Theorems 1.8.1,
1.8.2 (Ginsburg, 1966) we can easily construct a grammar $G_1$ for L(G')
such that for $G_1$ the conditions (1) to (5) are satisfied. Now, let
$G_1 = (V_1, \Sigma, P_1, \sigma)$. Suppose that in $V_1 - \Sigma$ a variable $A \neq \sigma$ exists
such that $A \triangleright_{G_1} A$ does not hold. Let $G_2 = (V_1 - \{A\}, \Sigma, P_2, \sigma)$ be a
grammar with $P_2 = \{B \to u_1v_1u_2 \cdots v_nu_{n+1};\ B \to u_1Au_2 \cdots Au_{n+1}$ is in
$P_1$, A does not occur in $u_1u_2 \cdots u_{n+1}$ and $A \to v_i$ is in $P_1$ for $1 \leq i \leq n\}$.
Obviously for $G_2$ the conditions (1) to (5) are also satisfied and $G_2$ has
fewer variables than $G_1$. By repeating the application of the last construction
we obtain a grammar G satisfying all the conditions (1) to (6).
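The last construction of the proof removes a variable that does not occur in its own right-hand sides by substituting those right-hand sides, in all combinations, for its occurrences elsewhere. A minimal Python sketch of that single step follows. The name `eliminate`, the tuple encoding of productions and the sample grammar are assumptions made for illustration only.

```python
from itertools import product


def eliminate(productions, var):
    """Remove `var` by substituting its right-hand sides everywhere.

    productions: list of (left, right) pairs, right being a tuple of symbols.
    Assumes `var` occurs in none of its own right-hand sides.
    """
    bodies = [right for left, right in productions if left == var]
    assert all(var not in body for body in bodies), "var must not be recursive"

    result = []
    for left, right in productions:
        if left == var:
            continue  # productions of the removed variable disappear
        # For every occurrence of var choose one of its bodies independently.
        slots = [bodies if s == var else [(s,)] for s in right]
        for choice in product(*slots):
            new_right = tuple(sym for part in choice for sym in part)
            if (left, new_right) not in result:
                result.append((left, new_right))
    return result


# Hypothetical sample: S -> a A b | A A, A -> c | d d
sample = [
    ("S", ("a", "A", "b")),
    ("S", ("A", "A")),
    ("A", ("c",)),
    ("A", ("d", "d")),
]
reduced = eliminate(sample, "A")
```

In the sample, the production with two occurrences of `A` yields four new productions, one for each pair of choices, which is why the construction may increase the number of productions while decreasing the number of variables.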
2.7. DEFINITION. A grammar G is said to be perfectly reduced if for G
the conditions (3) to (6) of the previous lemma are satisfied.
2.8. COROLLARY. If G is a perfectly reduced grammar then G(A) is an
infinite set for every variable $A \neq \sigma$, A in G.
3. CLASSIFICATIONS OF GRAMMARS AND LANGUAGES
3.1. DEFINITION. Denote by $\mathcal{G}$ the class of context-free grammars and
by $\mathcal{L}$ the class of context-free languages. Moreover, let I be the set of all
nonnegative integers. A mapping $K: \mathcal{G} \to I$ ($K: \mathcal{L} \to I$) is said to be a
classification of context-free grammars (languages).
The concept of a classification of grammars (languages) just defined is
too general to obtain some more interesting results. In the sequel only
some special classifications will be studied. They will be chosen in such a
way as to characterize some aspects of our intuitive notion about the
complexity of grammars and languages. However, one can expect that
also some other classifications will be found to be interesting and important or even more important and more interesting. That is why the
definitions are formulated rather generally.
There are many ways to associate a classification of languages with a
given classification of grammars. Some of them will be investigated in the
following. If a classification of grammars is meant to characterize the
complexity of grammars, then it seems to be quite natural to extend K to
classify languages in the following manner:
3.2. DEFINITION. Let K be a classification of (context-free) grammars.
We extend K to (classify) context-free languages putting
$K(L) = \min \{K(G);\ L(G) = L\}$
for every language L and we shall speak about a natural extension of K
from $\mathcal{G}$ to $\mathcal{G} \cup \mathcal{L}$.
Remark. Whenever in the following K will be a classification of
grammars and K will be applied to a language L, then the natural extension
of K is supposed.
3.3. DEFINITION. A classification K of languages is said to be nontrivial
(connected) in an alphabet $\Sigma$ if for every integer n > 0 there is a
language $L \subset \Sigma^*$ such that $K(L) \geq n$ ($K(L) = n$). A classification K of
grammars is said to be nontrivial (connected) in an alphabet $\Sigma$ if the
natural extension of K has this property.
4. CLASSIFICATION ACCORDING TO THE MAXIMAL DEPTH
OF GRAMMATICAL LEVELS
4.1. DEFINITION. Let G be a grammar and $G_0$ a grammatical level of G.
Denote by Dep($G_0$) the number of distinct variables on the left-hand
sides of productions in $G_0$. Moreover, denote Dep(G) = max {Dep($G_0$);
$G_0$ is a grammatical level of G}. Dep(G) (Dep(L)) is called the maximal
depth of grammatical levels of G (of L).
In this section the classification of languages L according to Dep(L)
is studied. It is well known that Dep(R) = 1 for every regular set R
(Ginsburg, 1963b). Languages L with Dep(L) = 1 are called sequential.
It is known that there is a sequential language which is not regular, for
example $\{a^nb^n;\ n \geq 1\}$, and that there is a language which is not
sequential (Ginsburg, 1963b). The following theorem asserts that a more
detailed classification of (linear) languages is possible.
4.2. THEOREM. For every integer $n \geq 1$ and every alphabet $\Sigma$ with at least
two symbols there is a linear language L over $\Sigma$ such that Dep(L) = n.
Proof. The case n = 1 is trivial. Let now an $n \geq 2$ be given. Denote by
$L_n$ the language generated by the grammar with the productions:²
(1)  $\sigma \to a\sigma b \mid aba^2A_2bab$
     $A_i \to a^iA_ib \mid ba^{i+1}A_{i+1}ba$,  $2 \leq i \leq n - 1$
     $A_n \to a^nA_nb \mid b\sigma a \mid b^2a^2$
In order to prove the theorem it is sufficient to prove that Dep($L_n$) = n.
Since the inequality Dep($L_n$) $\leq n$ follows directly from (1), it suffices
to show that Dep(G) $\geq n$ for every grammar G for $L_n$. To do this we
first introduce some properties of strings of the language $L_n$.
Let us define the so called LP-strings x (left-part strings) and their
representations $\chi(x)$ in an inductive way:
(i) if $x = (a^1)^{l_1}b(a^2)^{l_2}b \cdots b(a^n)^{l_n}b$ for some positive integers
$l_1, \ldots, l_n$, then x is an LP-string and $\chi(x) = ab^{l_n}a \cdots ab^{l_1}$,
(ii) if $x_1$ and $x_2$ are LP-strings, then $x_1x_2$ is also an LP-string and
$\chi(x_1x_2) = \chi(x_2)\chi(x_1)$,
(iii) there are no other LP-strings.
By using LP-strings we can give a more detailed characterization of the strings in $L_n$:
(iv) $z \in L_n$ if and only if $z = x\,ba\,\chi(x)$ for an LP-string x,
(v) if $z \in L_n$, then the decomposition of z into $x\,ba\,\chi(x)$ is determined
uniquely because there is no occurrence of "aa" in $\chi(x)$ and of
"bb" in x if x is an LP-string.
² We denote here and in the remainder of this paper grammars in an abbreviated
form. We write $A \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_n$ instead of
$A \to \alpha_1$, $A \to \alpha_2$, $\ldots$, $A \to \alpha_n$.
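The characterization of $L_n$ by LP-strings can be checked mechanically on short strings. The Python sketch below is only an illustration under stated assumptions: it takes grammar (1) to read $\sigma \to a\sigma b \mid aba^2A_2bab$, $A_i \to a^iA_ib \mid ba^{i+1}A_{i+1}ba$ for $2 \leq i \leq n-1$, $A_n \to a^nA_nb \mid b\sigma a \mid b^2a^2$, and the representation of a concatenation to be $\chi(x_1x_2) = \chi(x_2)\chi(x_1)$. The function names and the length bounds are hypothetical.

```python
from itertools import product


def grammar_strings(n, max_len):
    """Terminal strings of length <= max_len derived from sigma (coded 'S')."""
    def rules(var):
        # Each rule is (left terminals, next variable or None, right terminals).
        if var == "S":
            return [("a", "S", "b"), ("abaa", 2, "bab")]
        i = var
        out = [("a" * i, i, "b")]
        if i < n:
            out.append(("b" + "a" * (i + 1), i + 1, "ba"))
        else:
            out.append(("b", "S", "a"))
            out.append(("bbaa", None, ""))
        return out

    found, stack = set(), [("", "S", "")]
    while stack:
        left, var, right = stack.pop()
        for u, nxt, v in rules(var):
            new_left, new_right = left + u, v + right
            if len(new_left) + len(new_right) > max_len:
                continue  # every rule adds terminals, so the form cannot shrink
            if nxt is None:
                found.add(new_left + new_right)
            else:
                stack.append((new_left, nxt, new_right))
    return found


def lp_strings(n, max_len):
    """Strings x + 'ba' + chi(x) of length <= max_len, x an LP-string."""
    simple = []
    for ls in product(range(1, max_len), repeat=n):
        x = "".join("a" * (i * l) + "b" for i, l in enumerate(ls, start=1))
        chi = "".join("a" + "b" * l for l in reversed(ls))
        if len(x) + 2 + len(chi) <= max_len:
            simple.append((x, chi))

    found, stack = set(), [("", "")]
    while stack:
        x, chi = stack.pop()
        for sx, schi in simple:
            nx, nchi = x + sx, schi + chi  # chi(x1 x2) = chi(x2) chi(x1)
            if len(nx) + 2 + len(nchi) <= max_len:
                found.add(nx + "ba" + nchi)
                stack.append((nx, nchi))
    return found
```

Comparing `grammar_strings(2, 24)` with `lp_strings(2, 24)` gives a finite sanity check of (iv) for n = 2; it is evidence for the reading of the grammar used here, not a proof.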
Denote $\nu(z) = x$ and $\mu(z) = \chi(x)$.
For every string z let $l_b(z)$ ($l_a(z)$) be the number of occurrences
of b (of a) in z. According to (i) to (iv) we have
(vi) if $z \in L_n$, then $l_b(z) \leq l_a(z) < n\,l_b(z)$ and $l_b(\nu(z)) = l_a(\mu(z))$.
To prove that Dep(G) $\geq n$ for every grammar G for $L_n$ assume by way
of contradiction that there is a grammar G for $L_n$ such that Dep(G) < n.
According to Lemma 2.6 we can suppose that $G = (V, \Sigma, P, \sigma)$ is a perfectly
reduced grammar. Grammar G is linear. To prove it assume by way
of contradiction that in G there are variables A, B and terminal strings
u, w, v such that $\sigma \Rightarrow^* uAwBv$. From this and from (iv) and (v) we get
immediately that both sets G(A) and G(B) are finite, which contradicts
Corollary 2.8. Thus G is a linear grammar.
Now suppose that $A \Rightarrow^* uAv$ for a variable A and strings u, v. We shall
investigate the structure of strings u and v. Since G is perfectly reduced,
there are terminal strings p, w and q such that
$\sigma \Rightarrow^* pAq \Rightarrow^* pu^iwv^iq \in L_n$ for every $i \geq 0$.
According to (iv) and (v) we get
(vii) the string "bb" does not occur in $pu^2$ and the string "aa" does
not occur in $v^2q$.
Denote by $z_i$ the string $pu^iwv^iq$ and let us consider three cases.
(A) $u = \varepsilon$. Then $z_i = pwv^iq$. By (vii), the string "aa" does not occur
in $v^iq$ and therefore $\nu(z_i) = \nu(z_j)$ for all i, j. Hence $z_i = z_j$ for all i and j.
Consequently $v = \varepsilon$ and this contradicts the assumption that G is a
perfectly reduced grammar. Hence, the case $u = \varepsilon$ is impossible.
(B) $u = a^m$ for an integer m > 0. By (vi), the symbol b occurs in v.
By (vii) there is no occurrence of the string "aa" in $v^iq$ and thus
$|\nu(z_i)| \leq |pu^iw|$ for all i. Hence $l_b(\nu(z_i))$ is a constant and, by (vi),
$l_a(\mu(z_i))$ is the same constant. Therefore $v = b^{m_1}$ for an integer $m_1$.
From this we deduce that there are strings $p_0$ and $w_1$ and integers $i \leq n$,
$k, k_1, k_2, k_3$, $s_1 < i$, $s_2 < i$, $s_3 < i$ such that
$p = p_0a^{ik_1}a^{s_1}$,  $u = a^{ik_2+s_2}$,  $w = a^{ik_3+s_3}bw_1$
where the string "aa" occurs in $w_1vq$ and either $p_0 = \varepsilon$ or $p_0$ has
$nk + (i - 1)$ occurrences of b and $p_0$ ends with b. Then $s_1 + s_2 + s_3 \in \{0, i\}$.
Since also $pu^2wv^2q \in L_n$, $s_1 + s_2 + s_2 + s_3 \in \{0, i, 2i\}$, which
is possible only if $s_2 = 0$. Therefore $m = ik_2$. From this and from (i) to
(v) we deduce that $m_1 = k_2$. Similarly we get that if $A \Rightarrow^* a^{m'}Ab^{m'_1}$ for
some integers $m'$ and $m'_1$, then $m' = m'_1 i$ with the same i.
(C) The symbol b occurs in u. According to (vii), there are uniquely
determined strings $w_0$ and $w_1$ such that $u^2wv^2 = w_0\,ba\,w_1$ where
$|w_0| \geq |u| - 1$ and $|w_1| \geq |v| - 1$. Hence
$l_b(\nu(z_i)) = l_b(p) + (i - 1)\,l_b(u) + l_b(w_0) + 1$.
On the other hand we deduce from (i) to (v) that
for every $z \in L_n$, $l_b(\nu(z))$ is a multiple of n. Therefore $l_b(u) = kn$ for a
suitable k and we have
(viii) the number of b's in u is a multiple of n and there is no occurrence
of "bb" in $u^2$.
This information about the structure of u and v will be sufficient for our
purposes. Before approaching the main part of our proof we have to
introduce the concept of an ith level for strings in $L_n$ and the concept of a
variable of the type i.
Let $z \in L_n$ and $z = r_0br_1br_2$, $r_1 = a^m$, $|\nu(z)| \geq |r_0br_1b|$. If $l_b(r_0b) =
kn + j$, j < n, then we say that $r_1$ forms the (j + 1)th level of z. If
$z \in L_n$, $l_b(\nu(z)) = k_0n$, $1 \leq j \leq n$, then the jth level occurs in z $k_0$ times. If
$r_0 = a^m$, then we say that $r_0$ forms the first level of z.
A variable A in G is said to be an a-variable in G if there are strings
$u \in \{a\}^*a$ and v such that $A \Rightarrow^* uAv$. By (B), for every a-variable A
there is a uniquely determined integer j, called the type of A, such that if
$A \Rightarrow^* u_0Av_0$, $u_0 \in \{a\}^*a$, then $u_0 = a^{jk}$, $v_0 = b^k$ for suitable k. If A is an
a-variable of the type j and, in a derivation $\delta$ in G from A, the string
$u_1Av_1$, $u_1 \in \{a\}^*a$, is derived, then $u_1$ always builds up exactly the jth
level of the derived strings.
A string z is said to be i-complete if $z = x_0^i\,ba\,\chi(x_0^i)$ where
$x_0 = a^iba^{2i}b \cdots ba^{ni}b$, $l_b(x_0) = n$.
Now let d be the maximal length of right-hand sides of productions in
G and p the number of variables in G. Let $z_0$ be an i-complete string
where i > pd. Let $\delta$ be a derivation of $z_0$ from $\sigma$ in G. Since G is a linear
grammar, $\delta$ determines uniquely the sequence of productions
(2)  $A_j \to \alpha_j$,  $j = 0, 1, \ldots, m$
being used, following one another, in the derivation $\delta$. Since i > d, we
get from (viii) that no production in (2) has the form $A \to uAv$ where
b occurs in u. Similarly in (2) there are no productions of the form
$A \to uBv$ where $B \neq A$ and u has two occurrences of b. Therefore the
productions in (2) have one of the following two forms:
(3)  $A \to uBv$,  $u \in \{a\}^*$,
(4)  $A \to uBv$,  $B \neq A$,  $u \in \{a\}^*b\{a\}^*$,  B is a variable.
Let
$j_1, j_2, \ldots, j_r$
be the indices of all productions of the type (4) in (2). If we speak in the
following about a kth group of productions, we shall have in mind the
productions with indices $j_k + 1, \ldots, j_{k+1} - 1$. Since $l_b(\nu(z_0)) = ni$, we
have $r \geq ni - 1$. Moreover, from the inequality i > pd we
get $j_k + p < j_{k+1}$ for $k = 1, 2, \ldots, r - 1$. Whence for any
$k = 1, 2, \ldots, r - 1$ a variable $B_k$ exists such that $B_k$ occurs on the
left-hand side of at least two productions of the kth group of productions.
Hence $B_k$ is an a-variable. Let $s_k$ be the type of $B_k$. Since $r \geq in - 1 > p$,
at least two of the variables $B_1, \ldots, B_r$ are the same. Let $B_{k_0}$ and $B_{k_1}$,
$k_0 < k_1$, be two neighbouring occurrences of the same variable B in
$B_1 \cdots B_r$. Let s be the type of B. Then the $k_0$th and $k_1$th groups of
productions build the sth level of the derived string. If Dep(G) < n, then
$k_1 < k_0 + n$ and we get that there are at most n - 2 levels between two
sth levels of the string $z_0 \in L_n$ which is impossible. Hence the assumption
Dep(G) < n yields a contradiction and the theorem is proved.
Remark. By Culik (1962) (see also Kopřiva (1964) for a correction)
any context-free language can be represented by a finite number of
applications of the following operations: (i) set union, set product,
substitution and two special (n + 1)-ary operations
(ii) [[¢, 4~, . - . , 4~]]~ and [[¢, 41, "'" ,*~1]*
As a corollary of the preceding theorem we get that for every m there
is a language L such that if we want to represent L by using operations
(i) and (ii) in the same way as in Culik (1962), then we have to use
some n-ary operations of the type (ii) with $n \geq m$.
By Redko (1965), any context-free language can be constructed from
basic languages by a finite number of the operations composition and
weak recursion. By the preceding theorem, for every m there is a
language L such that in order to construct L an n-ary operation of weak
recursion has to be used with $n \geq m$.
5. CLASSIFICATION ACCORDING TO THE MAXIMAL NUMBER
OF (NON-ELEMENTARY) GRAMMATICAL LEVELS
5.1. DEFINITION. Let G be a grammar. Denote by Lev(G) the number
of grammatical levels of G. A grammatical level $G_0$ is said to be
non-elementary if Dep($G_0$) > 1. Denote by $\mathrm{Lev}_n$(G) the number of
non-elementary grammatical levels of G. A grammar G is said to be inherently
context-free if Lev(G) = 1. A language L is said to be inherently
context-free if Lev(L) = 1.
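For a concrete grammar the three measures defined so far can be computed together. The sketch below is an assumption-laden illustration: the name `level_measures`, the encoding of productions as pairs and the sample grammar are hypothetical, and the closure of the dependence relation is computed with a Warshall-style loop.

```python
def level_measures(productions, variables):
    """Return (Dep, Lev, Lev_n) for a grammar given as (left, right) pairs."""
    # reach[A]: variables A depends on, starting from direct dependence.
    reach = {A: {A} for A in variables}
    for left, right in productions:
        reach[left].update(s for s in right if s in variables)

    # Warshall-style transitive closure over the variable set.
    for mid in variables:
        for A in variables:
            if mid in reach[A]:
                reach[A] |= reach[mid]

    # One class of mutually dependent variables per grammatical level.
    classes = {
        frozenset(B for B in reach[left] if left in reach[B])
        for left, _ in productions
    }
    depths = [len(c) for c in classes]
    non_elementary = sum(1 for d in depths if d > 1)
    return max(depths), len(classes), non_elementary


# Hypothetical sample: sigma -> a A b | c, A -> a B | b, B -> b A
sample = [
    ("sigma", ("a", "A", "b")),
    ("sigma", ("c",)),
    ("A", ("a", "B")),
    ("A", ("b",)),
    ("B", ("b", "A")),
]
measures = level_measures(sample, {"sigma", "A", "B"})
```

In the sample, `A` and `B` depend on each other and form one non-elementary level, while `sigma` forms an elementary level of its own. These are values for one grammar; the values for the generated language are minima over all grammars for it and are not computed by this sketch.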
5.2. THEOREM. For every integer $n \geq 1$ and every alphabet $\Sigma$ with at
least two symbols there is an inherently context-free language L in $\Sigma$ such
that Dep(L) = n.
The Theorem follows directly from the proof of Theorem 4.2.
5.3. THEOREM. For every integer $n \geq 1$ and every alphabet $\Sigma$ with at least
two symbols there is a regular event R over $\Sigma$ such that Lev(R) = n.
Proof. The case n = 1 is trivial. Denote $R_2 = \{a\}^*a \cup \{b\}$. Since for
$R_2$ there is a grammar with two variables we have Lev($R_2$) $\leq 2$. Suppose
now that for $R_2$ a grammar $G = \langle V, \{a, b\}, P, \sigma \rangle$ exists such that
Lev(G) = 1. By Lemma 2.6 we can suppose that G is perfectly reduced.
Since Lev(G) = 1, there are strings u, v such that $uv \neq \varepsilon$, $\sigma \Rightarrow^* u\sigma v$. But
then ubv is in $R_2$, which is a contradiction, whence Lev($R_2$) = 2.
To prove the Theorem for n > 2, let $R_{n+1}$, $n \geq 2$, be the regular event
generated by the grammar:
(1)
1 <_i<-n.
cr~.--) ~ a~b l a~b
Consequently, Lev($R_{n+1}$) $\leq n + 1$. Now suppose that for $R_{n+1}$ a grammar
$G = \langle V, \{a, b\}, P, \sigma \rangle$ exists such that Lev(G) < n + 1. We can
again suppose that G is a perfectly reduced grammar. From the proof of
Lemma 1, Gruska (1967) we get that if $A \in V - \Sigma$, $A \neq \sigma$, then
(2) there are uniquely determined integers $i, s, i_1, i_2$ such that
$1 \leq i \leq n$, $0 \leq i_1 \leq n$, $0 \leq i_2 \leq n$, $0 \leq s \leq 1$,
$G(A) \subset a^{i_1}b^s\{a^ib\}^*a^{i_2}$ and either $s = 0 = i_1$ or s = 1. (In such case we
write $i = \varphi(A)$.)
From that it follows that (i) if $A_1 \neq \sigma \neq A_2$ and both variables $A_1$ and
$A_2$ belong to the same grammatical level, then $\varphi(A_1) = \varphi(A_2)$, and (ii)
$A \triangleright^* \sigma$ for no variable $A \neq \sigma$. Thus, if Lev(G) < n + 1, then there is an
integer $i_0$, $1 \leq i_0 \leq n$, such that if A is a variable and $A \neq \sigma$, then
$\varphi(A) \neq i_0$. Now let $\sigma \to uBv$ for a variable B and strings u, v. Then
$G(uBv) \subset \{a^{\varphi(B)}b\}^*$. Hence the set $G(\sigma) \cap \{a^{i_0}b\}^*$ is finite; a
contradiction with the definition of $R_{n+1}$. Therefore Lev($R_{n+1}$) = n + 1.
The main result of this section is
5.4. THEOREM. For every integer $n \geq 0$ and every alphabet $\Sigma$ with at least
two symbols there is a linear language L over $\Sigma$ such that $\mathrm{Lev}_n$(L) = n.
Proof. The case n = 0 is trivial. Let $n \geq 1$ be given and L be the
language generated by the grammar with productions
(1)  $\sigma \to \sigma_i$,
     $\sigma_i \to a^i\sigma_ib \mid a^iba^{n+i}S_ibab$,
     $S_i \to a^{n+i}S_ib \mid b\sigma_ia \mid b^2a^2$,  $1 \leq i \leq n$
Obviously $L = \bigcup_{i=1}^{n} L_i$ where $L_i$ are mutually disjoint languages and
$L_i$ is generated by the grammar $G_i$ with two variables $\sigma_i$ and $S_i$ and with
the same productions for $\sigma_i$ and $S_i$ as in (1). In order to give a more
detailed characterization of the languages $L_i$ we define the so called
$LP_i$-strings, $1 \leq i \leq n$, and their characterization in an inductive way.
(i) If $x = a^{ik}ba^{(i+n)l}b$ for some positive integers k, l then x is an
$LP_i$-string and $\chi(x) = ab^lab^k$
(ii) If $x_1$ and $x_2$ are $LP_i$-strings, then so is $x_1x_2$ and $\chi(x_1x_2) =
\chi(x_2)\chi(x_1)$
(iii) There are no other $LP_i$-strings.
Consequently
(iv) $z \in L_i$ if and only if $z = x\,ba\,\chi(x)$ for an $LP_i$-string x;
(v) if $z \in L_i$, then the decomposition of z into $x\,ba\,\chi(x)$ is determined
uniquely because there is no occurrence of "aa" in $\chi(x)$ and of
"bb" in x if x is an $LP_i$-string. Denote $\nu(z) = x$, $\mu(z) = \chi(x)$;
(vi) if $z = a^ibz_0ab$, $z \in L$, then $z \in L_i$;
(vii) i f z = powqo ~ L, IPol < Iv(z)[, Iqo] > fPol, t h e n lb(po)
la (qo),
if z =
powqoCL,
Iq01
<
I~(z)l,
! p 0 1 =>
2nlq0l,
then
z~(po) > lo(q0).
Suppose now that there is a grammar $G = \langle V, \{a, b\}, P, \sigma \rangle$ for L such
that $\mathrm{Lev}_n$(G) < n. By Lemma 2.6 we can suppose that G is a perfectly
reduced grammar. In a similar way as in the proof of Theorem 4.2 we
can prove the following assertions:
(viii) G is a linear grammar
(ix) if $A \Rightarrow^* uAv$ for a variable A and strings u, v, then
(a) $u \neq \varepsilon \neq v$
(b) if $u \in \{a\}^*$, then there are integers k, i such that $1 \leq i \leq n$,
$v = b^k$ and either $u = a^{ik}$ or $u = a^{(i+n)k}$. Moreover, if
$A \Rightarrow^* u_1Av_1$, $u_1 \in \{a\}^*$, then there is also an integer $k_1$ such
that $v_1 = b^{k_1}$ and either $u_1 = a^{ik_1}$ or $u_1 = a^{(i+n)k_1}$. We say that
the variable A is of the type i. By the foregoing the type of a
variable is determined uniquely.
(c) if $u \notin \{a\}^*$, then $l_b(u) = 2k$ for a suitable k.
(d) if $\sigma \Rightarrow^* pAq$, then the string "bb" does not occur in $pu^2$ and
"aa" does not occur in $v^2q$.
Now we prove some other auxiliary assertions.
(x) If A is an a-variable (i.e. $A \Rightarrow^* a^kAv$ for some integer k) and
$\sigma \Rightarrow^* pAq$, then $l_b(p) = l_a(q)$.
Proof of (x). Let $w \in G(A)$. Since A is an a-variable there are integers
$k_1 > 0$ and $k_2 > 0$ such that $A \Rightarrow^* a^{k_1}Ab^{k_2}$. Hence $pwq \in L$ and also
$pa^{ik_1}wb^{ik_2}q \in L$ for every integer i. Hence it follows that
$|p| \leq |\nu(pa^{ik_1}wb^{ik_2}q)|$ and $|q| \leq |\mu(pa^{ik_1}wb^{ik_2}q)|$. According to (vii) we get,
for a sufficiently large i, $l_b(p) \leq l_a(b^{ik_2}q) = l_a(q)$, $l_b(p) = l_b(pa^{ik_1})
\geq l_a(q)$ and this completes the proof of (x).
(xi) Let A be an a-variable involved in a derivation of a $z \in L_i$,
$1 \leq i \leq n$. Then A is of the type i.
Proof of (xi). Let $\sigma \Rightarrow^* pAq \Rightarrow^* pwq = z \in L_i$ for some strings p, q, w.
Since A is an a-variable, there are integers $k_1$, $k_2$ such that $A \Rightarrow^* a^{k_1}Ab^{k_2}$.
Since $z \in L_i$, we get according to (i) to (v) and (x) that either $k_1 = ik_2$
and $l_b(p)$ is even or $k_1 = (i + n)k_2$ and $l_b(p)$ is odd. By (ix) this
completes the proof of (xi).
(xii) If A, B are two a-variables of the same grammatical level,
then both variables are of the same type.
Proof of (xii). Since A and B belong to the same grammatical level,
there are strings p, u, w, v, q such that
$\sigma \Rightarrow^* pAq \Rightarrow^* puBvq \Rightarrow^* puwvq \in L$
Now (xii) follows from (xi).
Put N = pd where p is the number of variables in G and d is the
maximal length of right-hand sides of productions in G. Denote by $z_i$,
$1 \leq i \leq n$, the string $x_i\,ba\,\chi(x_i)$ where $x_i = (a^{iN}ba^{(i+n)N}b)^N$. Fix an i,
$1 \leq i \leq n$.
Obviously $z_i \in L_i$. Let $\delta_i$ be a derivation of $z_i$ from $\sigma$ and let
(2)  $A_j \to \alpha_j$,  $j = 1, 2, \ldots, m$
be all the productions involved, following one another, in the derivation
$\delta_i$. Since N > d, we have in (2) only productions of the two following
types
(3)  $A \to uBv$,  $u \in \{a\}^*$,  $v \in \{b\}^*$
(4)  $A \to uBv$,  $B \neq A$,  B is a variable and  $u \in \{a\}^*b\{a\}^*$.
Let
$j_1, \ldots, j_r$
be, in the increasing order, the indices of all productions of the type (4)
in (2). If we speak in the following about a kth group of productions, we
mean the productions with indices $j_k + 1, \ldots, j_{k+1} - 1$.
Now let $1 \leq k \leq r - 1$. Since N > p, $j_{k+1} > j_k + p$ and therefore in
the kth group of productions, there is a production having an a-variable
on the left-hand side. Denote this variable by $B_k$.
Since $l_b(\nu(z_i)) = 2N$ and only productions of the type (3) and (4)
are in (2), we get immediately $r \geq 2N - 1 > p$. But then there are
integers $k_1$, $k_2$ such that $1 \leq k_1 < k_2 < r$ and $B_{k_1} = B_{k_2}$. Since the
productions of the type (4) have different symbols on the left-hand and
right-hand side, a variable $A \neq B_{k_1}$ has to exist such that $B_{k_1} \triangleright^* A$ and
$A \triangleright^* B_{k_1}$. Thus, the variable $B_{k_1}$ belongs to a grammatical level with at
least two variables. Moreover, by (xi), $B_{k_1}$ is of the type i. Hence, for
every integer i, $1 \leq i \leq n$, there is an a-variable $A_i$ of the type i which
belongs to a non-elementary grammatical level. By (xii), if A and B are
two a-variables in the same grammatical level then both are of the
same type. Hence $\mathrm{Lev}_n$(G) $\geq n$, which contradicts our assumption
$\mathrm{Lev}_n$(G) < n. From this and from (1) we get $\mathrm{Lev}_n$(L) = n.
Theorems 4.2 and 5.4 do not hold if $\Sigma$ is an alphabet with only one
symbol. Indeed, in that case, by Gruska (1967), Dep(L) = 1,
Lev(L) $\leq 2$ and $\mathrm{Lev}_n$(L) = 0 for every language $L \subset \Sigma^*$.
Remark. As to the grammar for ALGOL 60, Naur (1963), we have, by
Kopřiva (1964), $\mathrm{Lev}_n$(L(ALGOL 60)) $\leq 2$, Dep(L(ALGOL 60)) $\leq 26$.
6. CLASSIFICATION ACCORDING TO THE NUMBER OF
VARIABLES AND PRODUCTIONS
6.1. DEFINITION. Let G be a grammar. Denote by Var(G) (Prod(G))
the number of variables (of productions) in G.
As to the classification of languages according to Var(L) we have the
result of Gruska (1967):
6.2. THEOREM. For every integer $n \geq 1$ and every alphabet $\Sigma$ with at
least two symbols there is a regular event R in $\Sigma$ such that Var(R) = n.
Finally we consider the classification according to the minimal number
of productions.
6.3. THEOREM. For every integer $n \geq 1$ and every alphabet $\Sigma$ there is a
finite set $L \subset \Sigma^*$ such that Prod(L) = n.
Proof. The case n = 1 is trivial. Let n > 1 be given. Denote
$L = \{a^{2^i};\ i = 0, 1, \ldots, n - 1\}$. Obviously Prod(L) $\leq n$. Let
$G = \langle V, \{a\}, P, \sigma \rangle$ be a grammar for L such that Prod(G) = Prod(L).
G is a reduced grammar. If $A \neq \sigma$ is a variable in G then the set G(A)
contains at least two strings. Indeed, in the opposite case there would
exist a grammar $G_1$ for L having fewer productions than G. The same
obviously holds for $G(\sigma) = L$.
We now prove that G is a linear grammar. Indeed, suppose that there
are terminal strings x, y, z and variables A, B such that $\sigma \Rightarrow^* xAyBz$.
Let $a^i = xyz$, $\{a^{i_1}, a^{i_2}\} \subset G(A)$, $\{a^{j_1}, a^{j_2}\} \subset G(B)$. Since L(G) = L,
there are integers $s_1$, $s_2$, $s_3$ and $s_4$ such that
$i_1 + j_1 = 2^{s_1} - i$,   $i_1 + j_2 = 2^{s_2} - i$,
$i_2 + j_1 = 2^{s_3} - i$,   $i_2 + j_2 = 2^{s_4} - i$.
Hence $2^{s_1} - 2^{s_2} - 2^{s_3} + 2^{s_4} = 0$. This is possible only if either $s_1 = s_2$
and $s_3 = s_4$ or $s_1 = s_3$ and $s_2 = s_4$. Thus, $j_1 = j_2$ in the former case and
$i_1 = i_2$ in the latter one, which contradicts the fact that the sets G(A)
and G(B) contain at least two different strings. Hence, G is a linear
grammar.
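The arithmetic step used here, namely that $2^{s_1} - 2^{s_2} - 2^{s_3} + 2^{s_4} = 0$ forces either ($s_1 = s_2$ and $s_3 = s_4$) or ($s_1 = s_3$ and $s_2 = s_4$), can be confirmed by brute force for small exponents. The sketch below does this; the function name `violations` and the bound on the exponents are arbitrary choices, and the check is a finite illustration rather than a proof.

```python
from itertools import product


def violations(max_exp):
    """Exponent tuples satisfying the equation but neither pairing."""
    bad = []
    for s1, s2, s3, s4 in product(range(max_exp + 1), repeat=4):
        if 2**s1 - 2**s2 - 2**s3 + 2**s4 != 0:
            continue
        first = s1 == s2 and s3 == s4    # pairing giving j1 = j2
        second = s1 == s3 and s2 == s4   # pairing giving i1 = i2
        if not (first or second):
            bad.append((s1, s2, s3, s4))
    return bad


counterexamples = violations(8)
```

The general statement follows from uniqueness of binary representation: the equation says $2^{s_1} + 2^{s_4} = 2^{s_2} + 2^{s_3}$, and a sum of two powers of two determines the two exponents up to order.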
Next suppose that there is a variable A in G such that A occurs on
the right-hand side of at least two productions $B_1 \to x_1Ay_1$ and
$B_2 \to x_2Ay_2$. Investigate two cases:
(i) If $\sigma \Rightarrow^* u_0B_1v_0$ and $\sigma \Rightarrow^* \bar{u}_0B_2\bar{v}_0$, then $|u_0x_1y_1v_0| = |\bar{u}_0x_2y_2\bar{v}_0|$. But
then we can omit the production $B_2 \to x_2Ay_2$ without changing the set
generated by the grammar and thus this case is impossible.
(ii) There are $u_0$, $v_0$, $\bar{u}_0$, $\bar{v}_0$ such that
$\sigma \Rightarrow^* u_0B_1v_0$,  $\sigma \Rightarrow^* \bar{u}_0B_2\bar{v}_0$  and  $|u_0x_1y_1v_0| \neq |\bar{u}_0x_2y_2\bar{v}_0|$.
Let $\{a^{i_1}, a^{i_2}\} \subset G(A)$. Denote $j_1 = |u_0x_1y_1v_0|$, $j_2 = |\bar{u}_0x_2y_2\bar{v}_0|$. Then there
are integers $s_1$, $s_2$, $s_3$ and $s_4$ such that
$j_1 + i_1 = 2^{s_1}$,   $j_1 + i_2 = 2^{s_2}$,
$j_2 + i_1 = 2^{s_3}$,   $j_2 + i_2 = 2^{s_4}$
and we get a similar contradiction as above. Hence, every variable A
occurs only once in the right-hand sides of productions. But this means
that if we omit a variable $A \neq \sigma$ from the vocabulary of G and in all
right-hand sides of productions we replace A by its right-hand sides,
then a grammar for L with a smaller number of productions and with a
smaller number of variables is obtained. Consequently the grammar G
has only one variable and thus, Prod(L) = n.
Remark. If L is a finite language with n strings then obviously
Prod(L) $\leq n$. A question arises as to whether it is possible to put a
reasonable lower bound for Prod(L) if L consists of n strings. The
following example indicates that the answer is likely negative.
6.4. Example. Let $n \geq 1$ be an integer. Consider the grammar G with
three productions
$\sigma \to S^n$,  $S \to a$,  $S \to aa$
Then $L(G) = \{a^i;\ n \leq i \leq 2n\}$ and Prod(L) = 3.
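The language of the three-production grammar $\sigma \to S^n$, $S \to a$, $S \to aa$ can be enumerated directly, since every derivation replaces each of the n copies of S by either a or aa. A short Python sketch follows; the function name `example_language` is a hypothetical choice.

```python
def example_language(n):
    """All strings generated by sigma -> S^n, S -> a | aa."""
    words = {""}
    for _ in range(n):
        # Expand one more copy of S, by `a` or by `aa`.
        words = {w + s for w in words for s in ("a", "aa")}
    return words


generated = example_language(4)
expected = {"a" * i for i in range(4, 2 * 4 + 1)}
```

For n = 4 the two sets coincide: the language has n + 1 = 5 strings while the grammar keeps three productions for every n.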
7. BOUNDED CLASSIFICATIONS
According to what has already been said in Section 3, there are many
ways to associate a classification of languages with a given classification
of grammars. One of them, the so called natural extension, is considered
in Section 3; i.e., if K is a classification of grammars, then, for a
particular language L, $K(L) = \min \{K(G);\ L(G) = L\}$. In this definition
the minimum is taken over all context-free grammars. If
only a special class of grammars for L is admitted, then we speak about
a bounded classification.
7.1. DEFINITION. Let $\mathcal{G}'$ be a class of grammars and K a classification
of grammars. Put, for a language L,
$K_{\mathcal{G}'}(L) = \inf \{K(G);\ L(G) = L,\ G \in \mathcal{G}'\}$.
$K_{\mathcal{G}'}$ is said to be a classification K bounded to $\mathcal{G}'$.
The case $K_{\mathcal{G}'}(L) = \infty$ is possible for a language L. It means that
$L \notin \{L(G);\ G \in \mathcal{G}'\}$.
Throughout this section we shall consider the class $\mathcal{G}$ of context-free
grammars, the class $\mathcal{G}_n$ of non-self-embedding grammars and the class
$\mathcal{G}_0$ of one-side linear grammars and investigate the classifications of
regular events with respect to the classifications considered so far.
First we have
7.2. THEOREM.
$\mathrm{Dep}_{\mathcal{G}}(R) = \mathrm{Dep}_{\mathcal{G}_n}(R) = 1$,
$\mathrm{Lev}_{n,\mathcal{G}}(R) = \mathrm{Lev}_{n,\mathcal{G}_n}(R) = 0$
for every regular event R.
Proof. Directly from Ginsburg & Rose (1963b); see also Exercise
10, p. 55, Ginsburg (1966).
On the other hand the classifications $\mathrm{Dep}_{\mathcal{G}_0}$ and $\mathrm{Lev}_{n,\mathcal{G}_0}$ are connected
in the class of regular events. We have
7.3. THEOREM. For every integer $n \geq 1$ there are regular events R and
R' such that $\mathrm{Dep}_{\mathcal{G}_0}(R) = n = \mathrm{Lev}_{n,\mathcal{G}_0}(R')$.
Proof. Let n > 1 be given. Denote by R the regular event generated
by the grammar with productions
zl --~ ¢~a~ ] ~i+lai+ib,
2 _< i -< n -- 1
and by R' the regular event generated by the grammar
zl --> o'~a~]Sia~+~b,
S i "--> ~ i a
1 <- i <_ n
Icqa b [b
By using methods similar to those in proofs of Theorems 4.2 and 5.4
we can prove that $\mathrm{Dep}_{\mathcal{G}_0}(R) = n = \mathrm{Lev}_{n,\mathcal{G}_0}(R')$.
If the classification Var is considered then different results are obtained
if different classes of grammars are considered.
7.4. THEOREM. There is a regular event R such that
$$\mathrm{Var}(R) < \mathrm{Var}_{\mathcal{S}_s}(R) < \mathrm{Var}_{\mathcal{S}_0}(R).$$
Proof. Denote
(1) $R = \{a\}^* ba \{b\}^* \{a\}^* ba \{b\}^*$
In the following we shall consider several grammars. In all the cases we denote by d the maximal length of productions of the grammar under construction. Put
$$z_0 = a^d\, ba\, b^d\, a^d\, ba\, b^d.$$
For $x = x_1 x_2 \cdots x_{|x|}$, $x_i \in \{a, b\}$, let $\varphi(x)$ denote the number of indices i such that $x_i \ne x_{i+1}$. Then we have
(i) if $x \in R$, then $\varphi(x) \le 7$.
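Assertion (i) can be checked mechanically on short strings. The Python sketch below is only an illustration, not part of the proof: it counts the letter changes in a string and searches all strings over {a, b} up to a chosen length for the largest count attained inside R. The names `R_PATTERN`, `phi` and `max_phi_in_R` are introduced here for illustration.

```python
import re
from itertools import product

# The regular event R of (1), written as a regular expression.
R_PATTERN = re.compile(r"a*bab*a*bab*")


def phi(x):
    """Number of indices i with x[i] != x[i+1]."""
    return sum(1 for left, right in zip(x, x[1:]) if left != right)


def max_phi_in_R(max_len):
    """Largest phi(x) over all x in R with len(x) <= max_len (-1 if there is none)."""
    best = -1
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            x = "".join(letters)
            if R_PATTERN.fullmatch(x):
                best = max(best, phi(x))
    return best


print(max_phi_in_R(10))
print(phi("abababab"))
```

A string of R consists of at most eight maximal blocks of equal letters, which is why the count cannot exceed 7; the string abababab belongs to R and attains this value.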
The proof of the Theorem will be divided into several steps:
(ii) Var(R) > 1.
Proof of (ii). Suppose (ii) does not hold, i.e. there is a grammar G for R with one variable $\sigma$. By (i), G is linear. Moreover, if $\sigma \to x_1 \sigma y_1$ and $\sigma \to x_2 \sigma y_2$ are two productions of G then, by (1) and (i), $x_1 x_2 \in \{a\}^*$, $y_1 y_2 \in \{b\}^*$. Hence $z_0 \notin L(G)$ and this contradiction proves (ii).
(iii) Var(R) = 2.
Immediately from (ii) and from the fact that R = L(G) where G has productions $\sigma \to S_1 S_1$, $S_1 \to a S_1 \mid S_1 b \mid ba$.
(iv) $\mathrm{Var}_{\mathcal{S}_s}(R) > 2$.
Proof of (iv). Suppose that (iv) does not hold. Then, by (iii), there is a grammar G in $\mathcal{S}_s$ for R with two variables $\sigma$ and A. If $A \equiv \sigma$, then G is one-sided linear. Without loss of generality we can suppose that G is right-linear. According to (i), if $\sigma \to x_1 \sigma$, $\sigma \to x_2 \sigma$, $\sigma \to x_3 A$, $A \to x_4 \sigma$, $A \to x_5 A$, $A \to x_6 A$, then $x_1 x_2 x_3 x_4 x_5 x_6 \in \{a\}^*$ and hence again $z_0 \notin L(G)$.
Now suppose that $A \equiv \sigma$ does not hold. If $A \to x_1 A y_1$ and $A \to x_2 A y_2$, then, by (i), either $x_1 x_2 = \varepsilon$ and $\varphi(y_1 y_2) = 0$ or $y_1 y_2 = \varepsilon$ and $\varphi(x_1 x_2) = 0$. Similarly if $\sigma \to u_1 \sigma v_1$, $\sigma \to u_2 \sigma v_2$, then either $u_1 u_2 = \varepsilon$ and $v_1 v_2 \in \{b\}^*$ or $v_1 v_2 = \varepsilon$ and $u_1 u_2 \in \{a\}^*$. From that we conclude $z_0 \notin L(G)$ and this completes the proof of (iv).
(v) $\mathrm{Var}_{\mathcal{S}_s}(R) = 3$.
It follows from (iv) and from the fact that R is generated by the grammar with productions: $\sigma \to SS$, $S \to aS \mid bS_1$, $S_1 \to S_1 b \mid a$.
(vi) $\mathrm{Var}_{\mathcal{S}_0}(R) > 3$.
Proof of (vi). By (v), $\mathrm{Var}_{\mathcal{S}_0}(R) \ge 3$. Now suppose that there is a one-sided linear grammar G for R with three variables. We can suppose without loss of generality that G is right-linear. If $A \equiv B$ for two variables in G and $A \to x_1 A$, $A \to x_2 A$, $A \to x_3 B$, $B \to x_4 A$, $B \to x_5 B$, $B \to x_6 B$, then, according to (i), $\varphi(x_1 x_2 \cdots x_6) = 0$. Similarly if $A \to x_1 A$, $A \to x_2 A$ for a variable A in G, then $\varphi(x_1 x_2) = 0$. From that we again conclude $z_0 \notin L(G)$ and (vi) is proved. This completes the proof of the Theorem.
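Steps (iii) and (v) rest on two concrete grammars for R. The following Python sketch gives a bounded sanity check, not a proof, that both grammars generate exactly the strings of R up to a chosen length. The encoding of grammars as dictionaries, the variable names Z, S, T (standing for the start symbol and the auxiliary variables) and the function `language_up_to` are assumptions of this sketch.

```python
import re
from itertools import product

R_PATTERN = re.compile(r"a*bab*a*bab*")

# Grammar of step (iii): two variables, self-embedding.
G_TWO = {"Z": [("S", "S")], "S": [("a", "S"), ("S", "b"), ("b", "a")]}
# Grammar of step (v): three variables, not self-embedding.
G_THREE = {
    "Z": [("S", "S")],
    "S": [("a", "S"), ("b", "T")],
    "T": [("T", "b"), ("a",)],
}


def language_up_to(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    words = {v: set() for v in grammar}
    changed = True
    while changed:  # least fixed point; terminates because the sets are finite
        changed = False
        for v, rhss in grammar.items():
            for rhs in rhss:
                partial = {""}
                for sym in rhs:
                    choices = words[sym] if sym in grammar else {sym}
                    partial = {
                        p + c
                        for p in partial
                        for c in choices
                        if len(p) + len(c) <= max_len
                    }
                if not partial <= words[v]:
                    words[v] |= partial
                    changed = True
    return words[start]


def regular_event_up_to(max_len):
    """All strings of R of length <= max_len, by exhaustive search."""
    return {
        "".join(t)
        for n in range(max_len + 1)
        for t in product("ab", repeat=n)
        if R_PATTERN.fullmatch("".join(t))
    }


expected = regular_event_up_to(10)
print(language_up_to(G_TWO, "Z", 10) == expected)
print(language_up_to(G_THREE, "Z", 10) == expected)
```

Truncating at `max_len` loses nothing, because every string derived from a variable inside a derivation of a word w is a substring of w and is therefore no longer than w.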
A similar result holds for the classification Lev.
7.5. THEOREM. There is a regular event R such that
$$\mathrm{Lev}(R) < \mathrm{Lev}_{\mathcal{S}_s}(R) < \mathrm{Lev}_{\mathcal{S}_0}(R).$$
We do not give the detailed proof here, but using the ideas and results of the foregoing proof we can easily show that Lev(R) = 2, $\mathrm{Lev}_{\mathcal{S}_s}(R) = 3$, $\mathrm{Lev}_{\mathcal{S}_0}(R) > 3$ for $R = \{a\}^* ba \{b\}^* ba \{a\}^* ba \{b\}^*$.
Finally we have
7.6. THEOREM. There is a regular event R such that
$$\mathrm{Prod}(R) < \mathrm{Prod}_{\mathcal{S}_s}(R) < \mathrm{Prod}_{\mathcal{S}_0}(R).$$
The proof is not given in this case either. But it is quite easy to show, for the regular event $R = \{a\}^* ba \{b\}^* \{a\}^* ba \{b\}^*$, that Prod(R) = 4, $\mathrm{Prod}_{\mathcal{S}_s}(R) = 5$ and $\mathrm{Prod}_{\mathcal{S}_0}(R) > 5$.
The following example shows that the difference between Var(R) and $\mathrm{Var}_{\mathcal{S}_0}(R)$ can be arbitrarily large.
7.7. Example. Put $R_i = \{\{a\}^* ba \{b\}^*\}^i$. By using methods similar to those above we can show that $\mathrm{Var}(R_i) = 2$, $\mathrm{Var}_{\mathcal{S}_0}(R_i) = 2i$, $\mathrm{Prod}(R_i) = 4$, $\mathrm{Prod}_{\mathcal{S}_0}(R_i) > 2i$, $\mathrm{Lev}(R_i) = 2$, $\mathrm{Lev}_{\mathcal{S}_0}(R_i) = 2i$.
The following examples show that bounded classifications can yield essentially different classifications of languages than the original classifications.
7.8. Example. Denote
$R_1 = \{b\}^* \{a\}^* ba \{ba\}^* a \{aba\}^*$
$R_2 = \{b\}^* \{a\}^* ab \{b\}^* \{a\}^* ab \{b\}^* \{a\}^*$
T h e n V a r (R1) = Var (R2) = 2, 3 = Var~ (R~) < Var~ (R~) = 4,
6 = Vary0 (R2) > Vary0 (R1) = 4.
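7.9. Example. Denote
$R_1 = \{a\}^* b \{a\}^* b \{a\}^* \{b\}^* a \{b\}^* a$
$R_2 = \{a\}^* b \{a\}^* ba \{b\}^* \{a\}^* ba \{b\}^* a \{b\}^*$
Then $3 = \mathrm{Var}(R_1) > \mathrm{Var}(R_2) = 2$, $3 = \mathrm{Var}_{\mathcal{S}_s}(R_1) = \mathrm{Var}_{\mathcal{S}_s}(R_2) = 3$, $5 = \mathrm{Var}_{\mathcal{S}_0}(R_1) < \mathrm{Var}_{\mathcal{S}_0}(R_2) = 6$.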
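Remark. Let $\mathcal{X}_1, \mathcal{X}_2, \mathcal{X}_3$ be any symbols from the set {<, >, =}. It seems that there are regular events $R_1$ and $R_2$ such that $\mathrm{Var}(R_1)\ \mathcal{X}_1\ \mathrm{Var}(R_2)$, $\mathrm{Var}_{\mathcal{S}_s}(R_1)\ \mathcal{X}_2\ \mathrm{Var}_{\mathcal{S}_s}(R_2)$, $\mathrm{Var}_{\mathcal{S}_0}(R_1)\ \mathcal{X}_3\ \mathrm{Var}_{\mathcal{S}_0}(R_2)$.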
8. RELATIONS BETWEEN CLASSIFICATIONS
In this section the basic concepts concerning the relations between classifications of grammars are defined and some relations between classifications defined in preceding sections are investigated. To be more brief, we shall write $K_1, K_2, \ldots, K_5$ instead of Var, Dep, Lev, $\mathrm{Lev}_{\mathrm{n}}$, Prod, respectively. Moreover, we shall write $K_{i,\psi}$ instead of $(K_i)_\psi$ if $\psi$ is a class of grammars.
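To make the five classifications concrete, here is a Python sketch that evaluates them on a single grammar. It assumes the reading of the Introduction: a grammatical level is a maximal set of mutually dependent variables (a strongly connected component of the "derives a string containing" relation, with every variable taken to be mutually dependent with itself), Dep is the largest number of variables in a level, and a level is non-elementary when it has at least two variables. The dictionary encoding and all function names are assumptions of this sketch.

```python
def levels(grammar):
    """Grammatical levels, returned as frozensets of mutually dependent variables."""
    # reach[A]: variables occurring in strings derivable from A (A itself included).
    reach = {
        a: {a} | {s for rhs in rhss for s in rhs if s in grammar}
        for a, rhss in grammar.items()
    }
    changed = True
    while changed:  # transitive closure by repeated expansion
        changed = False
        for a in grammar:
            bigger = set().union(*(reach[b] for b in reach[a]))
            if bigger != reach[a]:
                reach[a] = bigger
                changed = True
    found, seen = [], set()
    for a in grammar:
        if a not in seen:
            level = frozenset(b for b in reach[a] if a in reach[b])
            found.append(level)
            seen |= level
    return found


def classify(grammar):
    """K_1..K_5 of a grammar: Var, Dep, Lev, Lev_n, Prod."""
    lv = levels(grammar)
    return {
        "Var": len(grammar),
        "Dep": max(len(level) for level in lv),
        "Lev": len(lv),
        "Lev_n": sum(1 for level in lv if len(level) >= 2),
        "Prod": sum(len(rhss) for rhss in grammar.values()),
    }


# The three-variable grammar used in step (v) of the proof of Theorem 7.4.
G_THREE = {
    "Z": [("S", "S")],
    "S": [("a", "S"), ("b", "T")],
    "T": [("T", "b"), ("a",)],
}
print(classify(G_THREE))
```

These are values of the grammar, not of the language it generates; the classification of a language is the minimum of such values over all of its grammars.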
8.1. DEFINITION. Let K be a classification of grammars and $\psi$ a class of grammars. Put
$$K_\psi^{-1}(L) = \{G;\ L(G) = L,\ G \in \psi,\ K(G) = K_\psi(L)\}$$
for every language L. $K_\psi^{-1}(L)$ is said to be the class of the simplest grammars for L with respect to K and $\psi$. We write $K^{-1}(L)$ instead of $K_{\mathcal{S}}^{-1}(L)$.
8.2. COROLLARY. If $L_1 \ne L_2$ are two different languages and $\psi$ a class of grammars, then $K_\psi^{-1}(L_1) \cap K_\psi^{-1}(L_2) = \emptyset$.
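8.3. DEFINITION. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars and $\psi$ a class of grammars. The classifications $K_{\alpha,\psi}$ and $K_{\beta,\psi}$ are said to be bound if $K_{\alpha,\psi}^{-1}(L) \cap K_{\beta,\psi}^{-1}(L) = \emptyset$ for no language L. $K_{\alpha,\psi}$ is said to be stronger than $K_{\beta,\psi}$ if $K_{\alpha,\psi}^{-1}(L) \subset K_{\beta,\psi}^{-1}(L)$. Finally $K_{\alpha,\psi}$ is said to be equivalent to $K_{\beta,\psi}$ if $K_{\alpha,\psi}^{-1}(L) = K_{\beta,\psi}^{-1}(L)$ for every language L.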
8.4. COROLLARY. Let K be a classification and $f: I \to I$ a monotone increasing function. Define $K_f(G) = f(K(G))$ for every grammar G. The classifications K and $K_f$ are obviously equivalent.
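8.5. THEOREM. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars and $\psi_1 \subset \psi$ for two classes of grammars. Let, for every language L, $K_{\alpha,\psi}^{-1}(L) \cap K_{\beta,\psi}^{-1}(L) \cap \psi_1 \ne \emptyset$. Then (i) the classifications $K_{\alpha,\psi}$ and $K_{\beta,\psi}$ are bound, (ii) if $K_{\alpha,\psi}$ is stronger than $K_{\beta,\psi}$ then $K_{\alpha,\psi_1}$ is stronger than $K_{\beta,\psi_1}$, (iii) if $K_{\alpha,\psi}$ is equivalent to $K_{\beta,\psi}$, then $K_{\alpha,\psi_1}$ is equivalent to $K_{\beta,\psi_1}$.
Proof. (i) follows from the condition that $K_{\alpha,\psi}^{-1}(L) \cap K_{\beta,\psi}^{-1}(L) \cap \psi_1 \ne \emptyset$ for every language L. Now let $K_{\alpha,\psi}$ be stronger than $K_{\beta,\psi}$. Let L be a language. Then
(*) $K_{\alpha,\psi}^{-1}(L) = \{G;\ L(G) = L,\ K_{\alpha,\psi}(L) = K_\alpha(G),\ G \in \psi\} \subset K_{\beta,\psi}^{-1}(L) = \{G;\ L(G) = L,\ K_{\beta,\psi}(L) = K_\beta(G),\ G \in \psi\}$
and $K_{\alpha,\psi}(L) = \min\{K_\alpha(G);\ L(G) = L,\ G \in \psi\}$. Since, by the assumption, $K_{\alpha,\psi}^{-1}(L) \cap \psi_1 \ne \emptyset$, we have $K_{\alpha,\psi}(L) = K_{\alpha,\psi_1}(L)$ and similarly $K_{\beta,\psi}(L) = K_{\beta,\psi_1}(L)$. Therefore $K_{\alpha,\psi_1}^{-1}(L) = K_{\alpha,\psi}^{-1}(L) \cap \psi_1$. Similarly $K_{\beta,\psi_1}^{-1}(L) = K_{\beta,\psi}^{-1}(L) \cap \psi_1$ and (ii) follows from (*). (iii) follows directly from (ii).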
As to the classifications $K_i$, $1 \le i \le 5$, we have the following results.
8.6. THEOREM. Let $1 \le i, j \le 5$, $ij \ne 8$. Then the classifications $K_i$ and $K_j$ are bound if and only if i = j.
The case i = j is trivial. In order to prove the Theorem it is sufficient to show that if $i \ne j$, $ij \ne 8$ (i.e., the cases i = 2, j = 4 and i = 4, j = 2 are omitted), then a language L exists such that $K_i^{-1}(L) \cap K_j^{-1}(L) = \emptyset$. We shall give here such a language for every considered pair i, j, but the proof that $K_i^{-1}(L) \cap K_j^{-1}(L) = \emptyset$ will be omitted because it is quite obvious in some cases and in other cases it can be proved by using methods similar to those used in previous sections, although the proofs are rather cumbersome. To be brief, denote $\psi_{i,j}(L) = K_i^{-1}(L) \cap K_j^{-1}(L)$ for $1 \le i, j \le 5$ and a language L. Obviously $\psi_{i,j}(L) = \psi_{j,i}(L)$.
(i) Denote by R1 the regular event generated by the grammar with
the productions
z ~
~a[
zlt~b I(r2a3b
a2 ~ a2a~l(Tsa4b]aab
~ ~ ~3a41 (Tab Ib
If $G \in K_1^{-1}(R_1)$, then Var(G) = 4 = Dep(G), $\mathrm{Lev}_{\mathrm{n}}(G) = 1$. On the other hand $\mathrm{Dep}(R_1) = 1$ and $\mathrm{Lev}_{\mathrm{n}}(R_1) = 0$ because $R_1$ is a regular event. Therefore $\psi_{1,2}(R_1) = \psi_{1,4}(R_1) = \emptyset$. Moreover, if Lev(G) = 1 and $L(G) = R_1$, then Dep(G) > 1, $\mathrm{Lev}_{\mathrm{n}}(G) > 0$. Thus, $\psi_{2,3}(R_1) = \psi_{3,4}(R_1) = \emptyset$.
(ii) Denote by $R_2$ the regular event generated by the grammar
aa ]o'la2b
a ~
Hence $\mathrm{Lev}(R_2) = 1$. Moreover, if $G \in K_3^{-1}(R_2)$, then Var(G) = 4. On the other hand $\mathrm{Var}(R_2) < 4$. Indeed, $R_2$ is generated by the grammar
3
2
a~ --->c~z~1ala ]ba ~ a b
8
Hence $\psi_{1,3}(R_2) = \emptyset$.
2
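(iii) Denote by $R_3$ the finite set $\{a^i;\ i = 3, 4, 5, 6\}$. Obviously $\mathrm{Prod}(R_3) = 3$ and if $G \in K_5^{-1}(R_3)$, then Var(G) = Lev(G) = 2. On the other hand, $R_3$ is a finite set (so that $\mathrm{Var}(R_3) = \mathrm{Lev}(R_3) = 1$) and therefore $\psi_{1,5}(R_3) = \psi_{3,5}(R_3) = \emptyset$.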
(iv) Denote by R4 the regular event generated by the grammar
cr --~ o'a[ cria2b [a~a~b
~
-~
~abl ~ a ~ [~a~b
~2 "-~ ~abl zla2b I ¢~a3 [b
$\mathrm{Prod}(R_4) = 10$. If $G \in K_5^{-1}(R_4)$, then Dep(G) = 3, $\mathrm{Lev}_{\mathrm{n}}(G) > 0$. Since $R_4$ is a regular event, we have $\psi_{2,5}(R_4) = \psi_{4,5}(R_4) = \emptyset$.
We have just shown that if $1 \le i, j \le 5$, $i \ne j$, $ij \ne 8$, then a regular event R exists such that $\psi_{i,j}(R) = \emptyset$. On the other hand $K_2^{-1}(R) = K_4^{-1}(R)$ for every regular event R and this fact complicates the investigation for the case i = 2, j = 4.
Open problem. Are the classifications Dep and $\mathrm{Lev}_{\mathrm{n}}$ bound?
9. MULTIPLE CLASSIFICATIONS
9.1. DEFINITION. Let $K_\alpha$ and $K_\beta$ be two classifications of grammars. Put, for a language L,
$$K_{(\alpha,\beta)}(L) = \min\{K_\alpha(G);\ G \in K_\beta^{-1}(L)\},$$
$$K_{(\alpha,\beta)}^{-1}(L) = \{G;\ G \in K_\beta^{-1}(L),\ K_\alpha(G) = K_{(\alpha,\beta)}(L)\}.$$
$K_{(\alpha,\beta)}$ is said to be a multiple classification of languages composed from $K_\alpha$ and $K_\beta$.
9.2. COROLLARY.
$$K_{(\alpha,\beta)}(L) = K_{\alpha,\,K_\beta^{-1}(L)}(L)$$
and
$$K_{(\alpha,\beta)}^{-1}(L) = K_{\alpha,\,K_\beta^{-1}(L)}^{-1}(L)$$
for every language L and classifications $K_\alpha$ and $K_\beta$.
9.3. COROLLARY. $K_{(\alpha,\beta)}(L) \ge K_\alpha(L)$ for every language L and therefore $K_{(\alpha,\beta)}$ is nontrivial if $K_\alpha$ is.
In a natural way an important question arises as to what we can say about the relation between the classifications of languages $K_{(\alpha,\beta)}$ and $K_{(\beta,\alpha)}$. First we have
9.4. THEOREM. For every language L either $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) = \emptyset$ or $K_{(\alpha,\beta)}^{-1}(L) = K_{(\beta,\alpha)}^{-1}(L)$.
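Proof. Suppose that $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) \ne \emptyset$ for a language L. Let $G \in K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L)$. Then L(G) = L, $K_\beta(G) = K_\beta(L) = K_{(\beta,\alpha)}(L)$, $K_{(\alpha,\beta)}(L) = K_\alpha(G) = K_\alpha(L)$. Now, let $G_1$ be any grammar in $K_{(\alpha,\beta)}^{-1}(L)$. Then $L(G_1) = L$, $K_\beta(G_1) = K_\beta(L)$ and, moreover, $K_\alpha(G_1) = K_\alpha(G) = K_\alpha(L)$. Hence $G_1 \in K_{(\beta,\alpha)}^{-1}(L)$ and we have $K_{(\alpha,\beta)}^{-1}(L) \subset K_{(\beta,\alpha)}^{-1}(L)$. The reverse inclusion can be proved similarly.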
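9.5. THEOREM. For every language L, $K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L) = K_\alpha^{-1}(L) \cap K_\beta^{-1}(L)$.
Proof. $G \in K_\alpha^{-1}(L) \cap K_\beta^{-1}(L)$ if and only if L(G) = L, $K_\alpha(G) = K_\alpha(L)$, $K_\beta(L) = K_\beta(G)$, which is true if and only if $G \in K_{(\alpha,\beta)}^{-1}(L) \cap K_{(\beta,\alpha)}^{-1}(L)$.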
9.6. THEOREM. $K_{(\alpha,\beta)}^{-1} = K_{(\beta,\alpha)}^{-1}$ if and only if the classifications $K_\alpha$ and $K_\beta$ are bound.
Proof. Directly from the preceding theorems.
9.7. THEOREM. $K_{(\alpha,\beta)}^{-1}(L) = K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) = K_{(\beta,\alpha)}^{-1}(L)$ for every language L if and only if the classifications $K_\alpha$ and $K_\beta$ are bound.
Proof. According to Theorems 9.4 to 9.6.
9.8. THEOREM. The classifications $K_\alpha$ and $K_\beta$ are bound if and only if $K_{(\alpha,\beta)} = K_\alpha$, i.e. $K_{(\alpha,\beta)}(L) = K_\alpha(L)$ for every language L.
Proof. $K_{(\alpha,\beta)}(L) = \min\{K_\alpha(G);\ G \in K_\beta^{-1}(L)\}$. If $K_\alpha$ and $K_\beta$ are bound, then for every language L, $K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) \ne \emptyset$, hence $K_{(\alpha,\beta)}(L) = \min\{K_\alpha(G);\ G \in K_\beta^{-1}(L)\} = K_\alpha(L)$. On the other hand let $K_{(\alpha,\beta)} = K_\alpha$. Then $\min\{K_\alpha(G);\ G \in K_\beta^{-1}(L)\} = K_\alpha(L)$ for every language L. Hence $K_\alpha^{-1}(L) \cap K_\beta^{-1}(L) \ne \emptyset$ and the classifications $K_\alpha$ and $K_\beta$ are bound.
9.9. COROLLARY. If $K_\alpha$ and $K_\beta$ are considered as classifications of languages, then $K_{(\alpha,\beta)} = K_\alpha$ if and only if $K_{(\beta,\alpha)} = K_\beta$.
9.10. THEOREM. Let the classifications $K_\alpha$ and $K_\beta$ be bound. Then $K_\alpha(L) = K_\beta(L)$ for every language L if and only if $K_{(\alpha,\beta)} = K_{(\beta,\alpha)}$.
Proof. According to Theorem 9.8.
Remark. The assumption that $K_\alpha$ and $K_\beta$ are bound is necessary.
Consider now the classifications $K_i$, $1 \le i \le 5$. We have the following Theorem.
9.11. THEOREM. If $i \ne j$, $ij \ne 8$, then there is a language L such that $K_{(i,j)}^{-1}(L) \cap K_{(j,i)}^{-1}(L) = \emptyset$.
The proof follows directly from Theorems 8.6, 9.4 and 9.6.
If $i \ne j$ we have also $K_{(i,j)} \ne K_{(j,i)}$. Indeed, we have $K_{(2,1)}(R_1) = 4 < 5 \le K_{(1,2)}(R_1)$, $K_{(3,1)}(R_1) = 1 < K_{(1,3)}(R_1) = 4$, $K_{(4,1)}(R_1) = 1 < 5 \le K_{(1,4)}(R_1)$, $K_{(2,3)}(R_1) = 4 < 5 \le K_{(3,2)}(R_1)$, where $R_1$ is as in the proof of Theorem 8.6. Moreover, $K_{(1,5)}(R_3) = 2$, $K_{(2,5)}(R_3) = 1$, $K_{(3,5)}(R_3) = 2$, $K_{(4,5)}(R_3) = 0$ and $K_{(5,i)}(R_3) \ge 3$ for i = 1, 2, 3, 4, 5, where $R_3$ is as in the proof of Theorem 8.6. Finally, if R is any regular event, then $K_{(2,4)}(R) = 1$ and $K_{(4,2)}(R) = 0$, and if F is a finite set, then $K_{(3,4)}(F) = 1$, $K_{(4,3)}(F) = 0$.
Finally we have the
9.12. THEOREM. Classifications $K_{(i,j)}$, $1 \le i, j \le 5$, are connected in any alphabet with at least two symbols.
Proof. For i = j the assertion of the Theorem follows directly from Theorems 4.2, 5.3, 5.4, 6.2 and 6.3. In order to prove the rest of the Theorem we shall consider several cases.
(i) i = 1. Obviously $K_{(1,j)}(\{a\}) = 1$ for j = 2, 3, 4, 5. Now let n > 1 and $G_n$ be the grammar (1) in the proof of Theorem 4.2. Obviously
(A) $\mathrm{Var}(G_n) = \mathrm{Dep}(G_n) = n$, $\mathrm{Lev}(G_n) = \mathrm{Lev}_{\mathrm{n}}(G_n) = 1$, $\mathrm{Prod}(G_n) = 2n + 1$.
Denote, as in 4.2, $L(G_n) = L_n$. We shall now prove that
(A1) $\mathrm{Var}(L_n) = \mathrm{Dep}(L_n) = n$, $\mathrm{Lev}(L_n) = \mathrm{Lev}_{\mathrm{n}}(L_n) = 1$, $\mathrm{Prod}(L_n) = 2n + 1$.
Indeed, $\mathrm{Dep}(L_n) = n$ is just Theorem 4.2. $\mathrm{Var}(L) \ge \mathrm{Dep}(L)$ for every language L and therefore $n = \mathrm{Dep}(L_n) \le \mathrm{Var}(L_n) \le \mathrm{Var}(G_n) = n$, proving $\mathrm{Var}(L_n) = n$. $\mathrm{Lev}(G_n) = 1$ implies $\mathrm{Lev}(L_n) = 1$. Since $\mathrm{Dep}(L_n) > 1$ and $\mathrm{Lev}_{\mathrm{n}}(G_n) = 1$ we have $\mathrm{Lev}_{\mathrm{n}}(L_n) = 1$. In order to prove $\mathrm{Prod}(L_n) = 2n + 1$ we proceed as follows:
Let $G = (V, \Sigma, P, \sigma)$ be a grammar for $L_n$ such that $\mathrm{Prod}(G) = \mathrm{Prod}(L_n)$. The grammar G is obviously reduced. Moreover, if $A \ne \sigma$ is a variable in G, then there are at least two productions in P with A on the left-hand side. (Indeed, suppose that there is only one production $A \to \alpha$ in P with A on the left-hand side. Since G is reduced, A does not occur in $\alpha$ and therefore we can delete the production $A \to \alpha$ from P and replace A by $\alpha$ in all other productions, which is a contradiction with $\mathrm{Prod}(G) = \mathrm{Prod}(L_n)$.) Thus, $\mathrm{Prod}(L_n) \ge 2\,\mathrm{Var}(G) - 1$. Since $\mathrm{Prod}(G_n) = 2n + 1$, we get $\mathrm{Prod}(L_n) = 2n + 1$ if Var(G) > n.
Let now Var(G) = n. Since $\mathrm{Dep}(L_n) = n$, we have immediately Dep(G) = n. Hence, $A \equiv \sigma$ for any variable $A \ne \sigma$ in G and, moreover, following the proof of Theorem 4.2, G is a linear grammar. Whence it follows that for every variable A in G there are strings $\alpha_1 \ne \alpha_2$ such that $A \to \alpha_1$ and $A \to \alpha_2$ are productions and $\alpha_1$, $\alpha_2$ are not terminal strings. However, $L(G) = L_n$ and therefore there is at least one variable B in G such that $B \to \beta$ for a terminal string $\beta$. Thus $\mathrm{Prod}(G) \ge 2n + 1$. Since $\mathrm{Prod}(G_n) = 2n + 1$, we have $\mathrm{Prod}(L_n) = 2n + 1$ and (A1) is proved. Having proved (A) and (A1) we get immediately that $K_{(1,j)}(L_n) = n$ for j = 2, 3, 4, 5.
(ii) i = 2. Obviously $K_{(2,j)}(\{a\}) = 1$ for $1 \le j \le 5$. Now let n > 1. By (A) and (A1), $G_n \in K_j^{-1}(L_n)$ if n > 1, $j \in \{1, 2, 3, 4, 5\}$. Thus, $n = K_2(L_n) \le K_{(2,j)}(L_n) = \min\{K_2(G);\ G \in K_j^{-1}(L_n)\} \le K_2(G_n) = n$ and we have $K_{(2,j)}(L_n) = n$.
(iii) i = 3. Clearly $K_{(3,j)}(\{a\}) = 1$ for $1 \le j \le 5$. Denote now by $R_2'$ the regular event generated by the grammar $G_2'$ with productions
Obviously
(B) $\mathrm{Var}(G_2') = \mathrm{Lev}(G_2') = 2$, $\mathrm{Dep}(G_2') = 1$, $\mathrm{Lev}_{\mathrm{n}}(G_2') = 0$, $\mathrm{Prod}(G_2') = 4$.
According to the proof of Theorem 5.3, $\mathrm{Lev}(R_2') = 2$. Consequently $2 = \mathrm{Lev}(R_2') \le \mathrm{Var}(R_2') \le \mathrm{Var}(G_2') = 2$. Since $R_2'$ is a regular event we have $\mathrm{Dep}(R_2') = 1$, $\mathrm{Lev}_{\mathrm{n}}(R_2') = 0$. It is also easy to show that $\mathrm{Prod}(R_2') = 4$. Hence (B) holds for $R_2'$ as well as $G_2'$, whence we get immediately that $K_{(3,j)}(R_2') = 2$ for $1 \le j \le 5$.
Let now $n \ge 2$. Denote by $G_{n+1}$ the grammar (1) from the proof of Theorem 5.3. Clearly
(C) $\mathrm{Var}(G_{n+1}) = n + 1 = \mathrm{Lev}(G_{n+1})$, $\mathrm{Lev}_{\mathrm{n}}(G_{n+1}) = 0$, $\mathrm{Dep}(G_{n+1}) = 1$, $\mathrm{Prod}(G_{n+1}) = 3n$.
Denote $R_{n+1} = L(G_{n+1})$. By the proof of Theorem 5.3, $\mathrm{Lev}(R_{n+1}) = n + 1$. Hence $n + 1 = \mathrm{Lev}(R_{n+1}) \le \mathrm{Var}(R_{n+1}) \le \mathrm{Var}(G_{n+1}) = n + 1$ and $\mathrm{Var}(R_{n+1}) = n + 1$. Since $R_{n+1}$ is a regular event we have $\mathrm{Dep}(R_{n+1}) = 1$, $\mathrm{Lev}_{\mathrm{n}}(R_{n+1}) = 0$. Finally we shall prove that $\mathrm{Prod}(R_{n+1}) = 3n$. To do this let $G = (V, \{a, b\}, P, \sigma)$ be a grammar for $R_{n+1}$ such that $\mathrm{Prod}(G) = \mathrm{Prod}(R_{n+1})$.
As in case (i) we can show that if A is a variable and $A \ne \sigma$, then in G there are at least two productions with A on the left-hand side. Next, by using arguments similar to those in the proof of Lemma 1, Gruska (1967), we can show that if A is a variable in G and there are terminal strings x, y, $xy \ne \varepsilon$, such that $A \Rightarrow^* xAy$, then the assertion (2) from the proof of Theorem 5.3 holds. Then we say that A is an R-variable of the type i (as to the i see (2) in Theorem 5.3). From the structure of the language $R_{n+1}$ it follows that for every integer $i \le n$ there is an R-variable of the type i and, moreover, if $A_j$ is an R-variable of the type $i_j$, j = 1, 2, $i_1 \ne i_2$, then neither does $A_1$ depend on $A_2$ nor $A_2$ on $A_1$, nor are there strings x, y, z such that $\sigma \Rightarrow^* x A_1 y A_2 z$. Whence it follows that for every R-variable A there are at least two productions with A on the left-hand side and at least one production with A on the right-hand side and such that the variable on the left-hand side is not A. Thus $\mathrm{Prod}(G) \ge 3n$.
Summarizing the foregoing results we get
(C1) $\mathrm{Var}(R_{n+1}) = n + 1 = \mathrm{Lev}(R_{n+1})$, $\mathrm{Lev}_{\mathrm{n}}(R_{n+1}) = 0$, $\mathrm{Dep}(R_{n+1}) = 1$, $\mathrm{Prod}(R_{n+1}) = 3n$
and hence, by (C) and (C1), $K_{(3,j)}(R_{n+1}) = n + 1$ for $1 \le j \le 5$.
(iv) i = 4. Let $n \ge 1$. Denote by $G_n$ the grammar (1) in the proof of Theorem 5.4. Clearly
(D) $\mathrm{Var}(G_n) = 2n + 1$, $\mathrm{Lev}(G_n) = n + 1$, $\mathrm{Lev}_{\mathrm{n}}(G_n) = n$, $\mathrm{Dep}(G_n) = 2$, $\mathrm{Prod}(G_n) = 6n$.
Denote $L(G_n) = L_n$. By Theorem 5.4, $\mathrm{Lev}_{\mathrm{n}}(L_n) = n$ and hence $\mathrm{Dep}(L_n) > 1$. Since $\mathrm{Dep}(G_n) = 2$ we have $\mathrm{Dep}(L_n) = 2$. Now let $G = (V, \Sigma, P, \sigma)$ be a reduced grammar for $L_n$. Let $\sigma \Rightarrow^* x \sigma y$ for some strings x and y. According to the properties of strings in $L_n$ (see (i) to (v) in the proof of Theorem 5.4) we get $G(x) = G(y) = \{\varepsilon\}$ and therefore we can omit in G all productions having $\sigma$ on the right-hand side without changing the language and without increasing the number of variables or grammatical levels. This fact, together with $\mathrm{Lev}_{\mathrm{n}}(L_n) = n$, $\mathrm{Var}(G_n) = 2n + 1$, $\mathrm{Lev}(G_n) = n + 1$, implies $\mathrm{Lev}(L_n) = n + 1$, $\mathrm{Var}(L_n) = 2n + 1$. Finally we have $\mathrm{Prod}(L_n) = 6n$.
We do not give the detailed proof of this assertion here; only the main ideas of such a proof will be sketched. Let $G = (V, \Sigma, P, \sigma)$ be a grammar for $L_n$ with a minimal number of productions. We may suppose that $\sigma$ does not occur on the right-hand sides of productions. Moreover, for
1 -< i _< n, there is a non-elementary grammatical level G~ in G such
that if a variable from G~ is used in a derivation of a string x C L~,
then x E L~--see the proof of 5.4. Next, if z ~ x, then there is in x at
most one variable which belongs to a non-elementary grammatical level of G. Consequently, the number of productions in G is at least n + (the number of productions in non-elementary grammatical levels). However, every non-elementary grammatical level has at least 5 productions. From that and from $\mathrm{Lev}_{\mathrm{n}}(L_n) = n$ we get $\mathrm{Prod}(G) \ge 6n$. But $\mathrm{Prod}(G_n) = 6n$, whence $\mathrm{Prod}(L_n) = 6n$. Summarizing the foregoing results we get that (D) holds for $L_n$ as well as $G_n$, yielding $K_{(4,j)}(L_n) = n$, $1 \le j \le 5$.
(v) If F is a finite set with n strings, then $K_{(5,1)}(F) = n = K_{(5,3)}(F)$. Put $F_0 = \{a^{2^i};\ i = 0, 1, \ldots, n - 1\}$. By Theorem 6.3 we have $K_{(5,2)}(F_0) = K_{(5,4)}(F_0) = n$. This completes the proof of the Theorem.
10. REMARKS AND OPEN QUESTIONS
1. In this paper some basic concepts concerning classifications of
context-free grammars and languages were introduced. However, we
used in it only the fact that context-free grammars form a class of
generative devices and context-free languages are just the objects that
are defined by context-free grammars. That is why the basic definitions
and concepts given in this paper can be applied whenever we want to
study the classifications (with respect to "complexity") of some generative devices and the objects defined by them (for example, context-sensitive grammars and languages).
2. If K is one of the classifications $K_1$ to $K_5$ and $K(G_1) < K(G_2)$ for some grammars $G_1$ and $G_2$ (or $K(L_1) < K(L_2)$ for languages $L_1$ and $L_2$), then $G_1$ ($L_1$) is simpler than $G_2$ ($L_2$) either from the point of view of the number of elements needed to describe grammars (languages), in the cases $K_1$ and $K_5$, or from the point of view of the internal structure of grammars (languages), see $K_2$ to $K_4$. Therefore, the classifications $K_1$ to $K_5$ can be viewed as criteria of complexity of both grammars and languages. Naturally, some other classifications of this kind are possible and it is very difficult to say which of the classifications gives the best picture of the complexity of grammars and languages. Moreover, it is questionable whether "the best classification" exists.
3. Other ways of classifying context-free languages are by time and memory requirements for recognition (Hartmanis and Stearns, 1965; Hartmanis, Lewis, and Stearns, 1965). These classifications, and especially the case of real-time recognition, are very important from the practical point of view. That is why a question arises as to the connection between the classifications by time and memory requirements and those considered in this paper or classifications of a similar type. Some results indicate that even very simple languages, with respect to the classifications $K_1$ to $K_5$, may be difficult to recognize. For example, if the language generated by the grammar G with the productions
- , O~j 1~ t s~ IxZs
$X \to 0X0 \mid 1X1 \mid sYs$
$Y \to 0Y \mid 1Y \mid sY \mid Y0 \mid Y1 \mid Ys \mid c$,
where Var(G) = Lev(G) = 3, Dep(G) = 1, $\mathrm{Lev}_{\mathrm{n}}(G) = 0$, is T(n)-recognizable, in the sense of Hartmanis and Stearns (1965), by an on-line multitape Turing machine, then, by Kasami (1967), there is a constant C such that $T(n) > C (n/\log n)^2$.
4. Some upper bounds for the number of steps needed to recognize context-free languages are well known. One of the best is given in Čulík et al. (to appear) in the form
(1)
(Var (¢)[ x [)s(a)+l(N(G) - 1)
where x is a string to be recognized, G is a grammar, N(G) is the maximal number of variables on the right-hand sides of productions in G and (1) is the upper bound for the number of steps needed to recognize whether $x \in L(G)$. This result partly indicates the importance of the classification Var and partly directs attention to the classification N. Obviously $N(L) \le 2$ for every language L. Moreover N(L) = 0 if and only if L is a finite set, N(L) = 1 if and only if L is an infinite linear language. Although the classification N is not nontrivial, it seems, putting $K_6 = N$, that the classifications $K_{(1,6)}$ or $K_{(6,1)}$ may be interesting.
5. Let here and in the next two points K be one of the classifications $K_1$ to $K_5$ and n an integer. Denote $K^{-1}(n) = \{L;\ K(L) = n\}$. What can be said about the classes $K^{-1}(n)$? Some results: By Gruska (1967), for every n there are languages $L_1$ and $L_2$ in $K_1^{-1}(n)$ such that $L_1 \cap L_2$ is not a language. Moreover, if $L \in K_2^{-1}(n)$ and $(L - L_1) \cup (L_1 - L)$ is a finite set, then $L_1$ is also in $K_2^{-1}(n)$. The same is true for $K_4$. Let an integer n be
given. Is there an unambiguous language L in K-~l(n)
(g~l(n)
or
g~l (n ) )?
6. What is the relation between $K(L_1)$, $K(L_2)$ and $K(L_1 \cup L_2)$, $K(L_1 L_2)$, $K(L_1^*)$? Some of the quite obvious results: (i) $K(L_1 \cup L_2) \le K(L_1) + K(L_2) + 1$, $K(L_1 L_2) \le K(L_1) + K(L_2) + 1$, $K(L_1^*) \le K(L_1) + 1$ if K = Var or K = Lev; (ii) For every n there is a language L such that K(L) = n, $K(L^*) = 1$ and either K = Var or K = Lev; (iii) $\mathrm{Dep}(L_1 \cup L_2) \le \max(\mathrm{Dep}(L_1), \mathrm{Dep}(L_2))$, $\mathrm{Dep}(L_1^*) \le \mathrm{Dep}(L_1)$, $\mathrm{Dep}(L_1 L_2) \le \max(\mathrm{Dep}(L_1), \mathrm{Dep}(L_2))$.
For what integers n and m are there languages $L_1$ and $L_2$ in $K^{-1}(n)$ such that (iv) $K(L_1 \cup L_2) = m$, (v) $K(L_1 \cap L_2) = m$, (vii) $K(L_1^*) = m$? Some results follow from (i) to (iii). Moreover, it is known (Gruska, 1967) that if $1 \le m \le n$ then there are languages $L_1$, $L_2$ in $K_1^{-1}(n)$ such that $\mathrm{Var}(L_1 \cap L_2) = m$. Is it also true for any m > n?
Let L be a language and R a regular event. What can be said about $K(L \cap R)$, $K(L - R)$, $K(LR)$? (If R is a finite set, then $|\mathrm{Var}(L) - \mathrm{Var}(L - R)| \le 1$, $|\mathrm{Lev}(L) - \mathrm{Lev}(L - R)| \le 1$, $\mathrm{Dep}(L) = \mathrm{Dep}(L - R)$, $\mathrm{Lev}_{\mathrm{n}}(L) = \mathrm{Lev}_{\mathrm{n}}(L - R)$.) Is it true that $\mathrm{Dep}(L) = \max\{\mathrm{Dep}(L \cap R), \mathrm{Dep}(L \cap \bar{R})\}$ for any language L and any regular event R? Let S be a gsm mapping and L a language. What is the relation between K(L) and K(S(L))?
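The bounds in (i) for K = Var are what one obtains from the standard constructions for union, product and star, each of which adds a single new start variable. The Python sketch below is one way to write these constructions down; it is offered as an illustration under stated assumptions and is not taken from the paper. The dictionary encoding, the renaming scheme and the name `S0` for the new start variable are assumptions; `S0` is assumed not to occur as a terminal or as a renamed variable.

```python
NEW_START = "S0"  # hypothetical fresh start variable


def tagged(grammar, start, tag):
    """Rename every variable by appending a tag, so that two grammars cannot clash."""
    new_name = {v: f"{v}_{tag}" for v in grammar}
    renamed = {
        new_name[v]: [tuple(new_name.get(s, s) for s in rhs) for rhs in rhss]
        for v, rhss in grammar.items()
    }
    return renamed, new_name[start]


def union(g1, s1, g2, s2):
    """Grammar for L1 u L2: S0 -> S1 | S2."""
    a, sa = tagged(g1, s1, 1)
    b, sb = tagged(g2, s2, 2)
    return {NEW_START: [(sa,), (sb,)], **a, **b}


def concat(g1, s1, g2, s2):
    """Grammar for L1 L2: S0 -> S1 S2."""
    a, sa = tagged(g1, s1, 1)
    b, sb = tagged(g2, s2, 2)
    return {NEW_START: [(sa, sb)], **a, **b}


def star(g1, s1):
    """Grammar for L1*: S0 -> S0 S1 | (empty string)."""
    a, sa = tagged(g1, s1, 1)
    return {NEW_START: [(NEW_START, sa), ()], **a}


g1 = {"S": [("a", "S"), ("b",)]}
g2 = {"S": [("S", "c"), ("d",)]}
print(len(union(g1, "S", g2, "S")), len(concat(g1, "S", g2, "S")), len(star(g1, "S")))
```

Applied to grammars with the minimal number of variables, the constructions give grammars with Var(L_1) + Var(L_2) + 1 and Var(L_1) + 1 variables respectively. The new start variable does not occur in any of the old productions, so it forms a grammatical level of its own, which is consistent with the same bounds for Lev.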
7. For what n is it recursively solvable for an arbitrary grammar G whether K(L(G)) = n?
Let L be a language. What can be said about the sets $K^{-1}(L)$? One of the results: There is an unambiguous language L such that $K_1^{-1}(L)$ contains only ambiguous grammars. Is it true also for the classifications $K_2$ to $K_5$?
8. To every classification K of grammars one can associate a relation, written $\succ_K$, between languages in the following manner: $L_1 \succ_K L_2$ if and only if there is a grammar $G \in K^{-1}(L_1)$ and a variable A in G such that $G(A) = L_2$. What can be said about languages $L_1$ and $L_2$ if $L_1 \succ_K L_2$ or $L_1 \succ_K^* L_2$, where $\succ_K^*$ is the transitive and reflexive closure of $\succ_K$?
9. As a corollary of inequalities (iii) in point 6 we have
THEOREM. Let $\Sigma$ be an alphabet with at least two symbols. Then there is no finite class of context-free languages over $\Sigma$ such that every context-free language in $\Sigma^*$ can be obtained from these languages by using only a finite number of Kleene operations $\cup$, $\cdot$, $*$.
RECEIVED: July 8, 1968; revised January 3, 1969
REFERENCES
1. ČULÍK, K., (Roma 1962) Formal structure of ALGOL and simplification of its description. Symbolic languages in data processing, pp. 75-82.
2. ČULÍK, K., FRIŠ, I., GRUSKA, J., HAVEL, I., KOPŘIVA, J., AND NOVOTNÝ, M., The mathematical theory of grammars and languages. (To be published).
3. GINSBURG, S., (1966) "The mathematical theory of context-free languages." McGraw-Hill, New York.
4. GINSBURG, S., AND ROSE, G. F., (1963a) Some recursively unsolvable problems in ALGOL-like languages. J. ACM 10, 29-47.
5. GINSBURG, S., AND ROSE, G. F., (1963b) Operations which preserve definability in languages. J. ACM 10, 175-195.
6. GRUSKA, J., (1967) On a classification of context-free grammars. Kybernetika 3, 22-29.
7. HARTMANIS, J., AND STEARNS, R. E., (1965) On the computational complexity of algorithms. Trans. Am. Math. Soc. 117, 285-306.
8. HARTMANIS, J., LEWIS, 2nd, P. M., AND STEARNS, R. E., (1965) Classifications of computations by time and memory requirements. Proceedings of IFIP Congress, 31-35.
9. KASAMI, T., (1967) A note on computing time for recognition of languages generated by linear grammars. Inform. and Control 10, 209-214.
10. KOPŘIVA, J., (1964) Some notes on the formal structure of ALGOL 60. Publ. Fac. Sci. Univ. J. E. Purkyně, 409-418.
11. NAUR, P., (1963) Revised report on the algorithmic language ALGOL 60. CACM 6, 1-20.
12. REDKO, V. N., (1965) Some problems of language theory. Kibernetika 1, 12-21.