200 Only Problems

Download as pdf or txt
Download as pdf or txt
You are on page 1of 65

s in

b l em
Pro ua g es
0
20 ormal Lan g
F
and
Automata
Theor y

damian niwiński
wojciech rytter
Niwiński and Rytter’s

200 Problems
in Formal Languages
and Automata Theory

Edited by Filip Murlak


200 Problems
in Formal Languages
and Automata Theory
collected by damian niwi ński and wojciech rytter

solutions by mikołaj boja ńczyk, lorenzo clemente,


wojciech czerwi ński, piotr hofman, szczepan hummel,
bartek klin, eryk kopczy ński, sławomir lasota,
filip mazowiecki, henryk michalewski, filip murlak,
joanna ochremiak, paweł parys, michał pilipczuk,
michał skrzypczak, szymon toru ńczyk,
igor walukiewicz, and joost winter

edited by filip murlak

typeset by michał skrzypczak and szymon toru ńczyk

proofread by leszek kołodziejczyk and joost winter

university of warsaw 2017


printed by Zakład Graficzny UW
order n o 640/2017
Preface

T h i s book contains problems collected over more than two decades by


Damian Niwiński and Wojtek Rytter for their course on Automata, Languages,
and Computation at the University of Warsaw. Over the years the collection was
circulated informally, with hardly any hints on how to solve the problems.
Damian and Wojtek always felt that it should be eventually turned into a
proper problem book. On many occasions, trying to decipher tiny scraps of
solutions scribbled in the margins of my own yellowish, dog-eared
printout—inevitably in the very last minutes before the class—I deeply shared
the sentiment, as I am sure everybody ever teaching the course did. But how to
get enough hands on deck to get it done without overworking oneself?

When Mikołaj Bojańczyk first brought up the topic of a gift for Damian’s 60th
birthday, I thought it was the greatest excuse one could hope for. And so it
happened that 19 people, related in various ways to the automata group at the
University of Warsaw, enthusiastically agreed to contribute. For all of us, this
was a wonderful opportunity to thank Damian for creating the automata group
and shaping us as researchers.

Over the course of slightly more than a year, we wrote solutions to all problems
in the original collection. Some were later merged, a few were removed. We
included several exercises used as home assignments in the most recent edition
of the automata course, taught by Wojtek Czerwiński, as well as selected
problems from Damian’s Computational Complexity course. A seemingly
innocent problem was offered at the last moment by Wojtek Rytter, causing
some embarrassing blunders on my side. This problem, along with several
others, is now marked with (⇤) to indicate that it is particularly difficult.
Indeed, working on the problem book we all kept rediscovering how much
more there is to learn about automata and formal languages. It would make us
very happy to see the readers share this experience.

Sto lat, Damian! Here is your birthday gift.

Filip Murlak Warsaw, 10 May 2017


Contents

1 Words, numbers, graphs 3 57

2 Regular languages 7 69

2.1 Regular expressions and finite automata 7 69

2.2 The pumping lemma 10 79

2.3 Closure properties 12 85

2.4 Minimal automata 16 100

2.5 Variants of finite automata 18 113

2.6 Combinatorics of finite automata 21 119

2.7 Algorithms on automata 23 130

2.8 Stringology 25 147


3 Context-free languages 27 155
3.1 Context-free grammars 27 155

3.2 Context-free or not? 31 172


3.3 Pushdown automata 35 195
3.4 Properties of context-free languages 39 221

4 Theory of Computation 43 239


4.1 Turing machines 43 239

4.2 Computability and undecidability 47 265


4.3 Chomsky hierarchy 51 279
4.4 Computational complexity 52 287
Notation

A + B — disjoint union of sets A and B.

f : A * B — f is a partial function from A to B.

#u (w) — the number of (possibly overlapping) occurrences of the word u as a


subword of the word w.

|w| — the length of the word w.

w[i ] — the ith letter of the word w (counted from 1).

w[i..j] — the infix of the word w from position i to position j, inclusive.

wR — the reverse of the word w, or w written backwards.

KL 1 = {u : 9v 2 L. uv 2 K } — the right quotient of K by L.

L 1K = {v : 9u 2 L. uv 2 K } — the left quotient of K by L.

r · s = {( x, z) : 9y. ( x, y) 2 r ^ (y, z) 2 s} — the left composition of the binary


relations r and s.

r ⇤ — the transitive-reflexive closure of the binary relation r.

[w]2 — the numerical value of the binary sequence w; e.g., [011]2 = 3.

bin(n) — the binary representation of n 2 N, without leading zeros.


Problems
1
Words, numbers, graphs

Let us fix a finite set S; we shall refer to it as the alphabet. The elements of S


are called letters or symbols. A word w over S is a finite sequence a1 a2 . . . an of
letters from S. The length of w = a1 a2 . . . an , denoted by |w|, is n. The empty
word, denoted by #, is the empty sequence; it has length 0. We write S⇤ for the
set of all words over S, and S+ for the set of non-empty words over S. The
concatenation of words u = a1 a2 . . . am and v = b1 b2 . . . bn , denoted by u · v or
simply uv, is the word a1 a2 . . . am b1 b2 . . . bn . We write vn for the word |vv {z
. . . v}.
n
For a word w 2 S⇤ and 1  i, j  |w|, we write w[i ] for the ith letter of w and
w[i..j] for the infix starting at the ith letter and ending at the jth letter of w; that
is, if w = a1 a2 . . . an , then w[i..j] = ai ai+1 . . . a j . In particular w[i..i ] = w[i ] and
w[1..|w|] = w. For j < i we let w[i..j] = #.

Problem 1. primitive words. A word w 2 S⇤ is primitive if it cannot be


presented as w = vn for any n > 1.

(1) Prove that for each non-empty word w there is exactly one primitive word
v such that w = vn for some n 1. We call n the exponent of the word w.

(2) For any words w, v, we say that the words wv and vw are conjugate to
each other. Prove that being conjugate is an equivalence relation and all
conjugate words have the same exponent. What is the cardinality of the
equivalence class of a word of length m and exponent n?
4 WORDS, NUMBERS, GRAPHS

Problem 2. parenthesis expressions. Show that the following two ways of


defining the set of balanced sequences of parentheses are equivalent:

(a) The least set L such that the empty sequence # is in L and if w, v 2 L then
(w), wv 2 L.

(b) The set K of words over the alphabet {( , )} in which the number of oc-
currences of ) is the same as the number of occurrences of (, and in each
prefix the number of occurrences of ) is not greater than the number of
occurrences of (.

Problem 3. semi-linear sets. For any fixed a, b 2 N, the set of natural num-
bers { a + bn : n 2 N } is called linear. A semi-linear set is a finite union of linear
sets. (The empty set is obtained as the union of the empty family of linear sets.)

(1) Prove that the set A = { a + b1 n1 + . . . + bk nk : n1 , . . . , nk 2 N } is semi-


linear for all fixed k and a, b1 , . . . , bk 2 N.

hint: Use congruence mod m for suitably chosen m.

(2) Prove that a set A of natural numbers is semi-linear if and only if it is


ultimately periodic; that is, there exist c 2 N and d 2 N {0} such that for
all x > c, x 2 A if and only if x + d 2 A.

(3) Fix a directed graph. Prove that the set of lengths of all directed paths
between any two fixed vertices is semi-linear.

(4) Prove that the family of all semi-linear sets is closed under finite unions,
finite intersections, and complement with respect to N.

Problem 4. game graph (by j. p. jouannaud). Consider the following game


between a barman and a customer. Between the players there is a revolving tray
with 4 glasses forming the vertices of a square. Each glass is either right-side up
or upside down, but the barman is blindfolded and wears gloves, so he has no
way of telling which of the two cases holds. In each round, the barman chooses
one or two glasses and reverses them. Afterwards, the customer turns the tray
5

by a multiple of 90 degrees. The barman wins if at any moment all glasses are
in the same position (he is to be informed about this immediately). Can the
barman win this game, starting from an unknown initial position? If so, how
many moves are sufficient? Would you play this game for money against the
barman? What about an analogous game with 3 or 5 glasses?

Problem 5. codes. A set C ✓ S+ is a code if each word w 2 S⇤ can be decoded;


that is, each w admits at most one factorization with respect to C: there is at
most one way to present w as v1 v2 . . . vn with v1 , v2 , . . . , vn 2 C and n 2 N.

(1) Let S = { a, b}. Prove that the set { aa, baa, ba} is a code and the set
{ a, ab, ba} is not a code.

(2) For a finite set A that is not a code, give an upper bound on the length of
the shortest word that admits two different factorizations.

(3) Show that {u, v} is a code if and only if uv 6= vu.

See also Problem 75.

Problem 6. thue–morse word. Show that the following definitions of the


Thue–Morse word are equivalent:

(1) the infinite sequence of 0’s and 1’s obtained by starting with 0 and succes-
sively appending the sequence obtained so far with all bits flipped;

(2) the infinite word s0 s1 s2 . . . such that sn = 0 if the number of 1’s in the
binary expansion of n is even, and sn = 1 if it is odd;

(3) the infinite word t0 t1 t2 . . . , whose letters satisfy the recurrence relation:
t0 = 0, t2n = tn , and t2n+1 = 1 tn for all n.

Show that the Thue–Morse word is cube-free; that is, it contains no infix of the
form www with w 6= #. In fact, it is strongly cube-free; that is, it contains no infix
of the form bwbwb for b 2 {0, 1}.
hint: First show that it contains neither 000 nor 111 as an infix, but each infix of length
5 contains 00 or 11 as an infix.
6 WORDS, NUMBERS, GRAPHS

Construct an infinite word over a four-letter alphabet that is square-free; that is, it
contains no infix of the form ww with w 6= #. Can it be done with three letters?
And two letters?
2
Regular languages

2.1 Regular expressions and finite automata

A regular expression is used to generate (that is, describe) a set of words. The
most basic regular expressions are ∆, which generates no words, and #, which
generates only the empty word. Each individual letter a can be viewed as a reg-
ular expression, which generates only one word, namely the one-letter word a.
Finally, regular expressions can be combined using the following operators:

ef generates {wv : e generates w and f generates v} ,


e + f generates {w : e generates w or f generates v} ,
e⇤ generates {w1 · · · wn : n 0 and e generates all w1 , . . . , wn } .

The operator e⇤ is called Kleene star. Note the case of n = 0 in the definition of
the Kleene star, which means that the empty word is generated by the Kleene
star of every expression. Here is an example of a regular expression that uses all
the available operations except ∆:

( a + b)( a + b) # + (( a + b) a( a + b)) .

This particular expression generates the set of words over the alphabet { a, b}
which have either even length, or have an odd length of at least 3 and penulti-
mate letter a.
8 REGULAR LANGUAGES

A second formalism for describing sets of words is finite automata, which


can be deterministic or non-deterministic (deterministic is a special case of non-
deterministic). A (non-deterministic) automaton is defined to be a tuple

(S, Q, I, d, F ),

where S is the input alphabet, Q is the set of states, I ✓ Q is the set of initial
states, F ✓ Q is the set of final (or accepting) states, and d ✓ Q ⇥ S ⇥ Q is the
transition relation; elements of d are called transitions or transition rules, and
a
are written as q ! q0 . An automaton is often drawn as follows:
a
a, b transition
a a, b
initial state
a, b
an automaton
accepting state

The automaton in the picture above has only one initial and one final state, but
this is not required. We extend the notation for transition rules to arbitrary
w
words w 2 S⇤ : we write p ! q if there is a run over w that begins in state p and
ends in state q; that is, a path in the automaton which goes from state p to state
q, and such that the labels of the edges on the path (that is, the transitions used
in the run) are a1 , . . . , an where w = a1 a2 . . . an . A word w is accepted if there is
a run over w from some initial state to some accepting state. For example, the
automaton in the picture above accepts exactly the odd-length words generated
by our example regular expression above. The language recognized by A, denoted
by L(A), is the set of words accepted by A. Two automata are equivalent if they
recognize the same languages.
We often implicitly assume that non-deterministic automata have only one
initial state. This can be done without loss of generality, because for each finite
automaton one can construct an equivalent automaton (of the same size) with a
single initial state. For automata with a single initial state q I we use the notation
(S, Q, q I , d, F ).
An automaton is called deterministic if it has one initial state and its transition
relation is a function from Q ⇥ S to Q, which means that for every q 2 Q and
R E G U L A R E X P R E S S I O N S A N D F I N I T E A U T O M ATA 9

a
a 2 S, there is exactly one state p such that q ! p. Determinism guarantees
that for every word there is exactly one run, and thus a word is accepted if and
only if this unique run ends in an accepting state. For each non-deterministic
automaton one can construct an equivalent deterministic automaton, but the
number of states may grow exponentially.

Problem 7. Prove that all languages L and M satisfy

( L⇤ M⇤ )⇤ = ( L [ M )⇤ .

Problem 8. Prove that the regular expression



00 + 11 + (01 + 10)(00 + 11)⇤ (01 + 10)

generates all words over the alphabet {0, 1}⇤ where both 0 and 1 appear an even
number of times.

Problem 9. Construct an automaton over the alphabet {0, 1}, which recognizes
those words, where the number of ones on even-numbered positions is even,
and the number of ones on odd-numbered positions is odd.

Problem 10. addition. Consider the alphabet {0, 1}3 , with letters written as
columns. Give a regular expression defining the language
82 3 2 3 2 3 9
< a1
> a2 an >
=
6 76 7 6 7
4 b1 5 4 b2 5 . . . 4 bn 5 : [ a1 a2 . . . an ]2 + [b1 b2 . . . bn ]2 = [c1 c2 . . . cn ]2 .
>
: c >
;
1 c2 cn

Problem 11. divisibility.

(1) Construct a deterministic automaton over the alphabet {0, 1, . . . , 9} which


recognizes decimal representations of numbers divisible by 7.

(2) Do the same, but with a reverse representation, where the least significant
digit comes first.

(3) Generalize this.


10 REGULAR LANGUAGES

Problem 12. one-letter alphabet.

(1) Prove that a language L ✓ { a}⇤ is regular if and only if the set of natural
numbers {n : an 2 L} is semi-linear in the sense of Problem 3.

(2) Prove that for an arbitrary set X ✓ { a}⇤ , the language X ⇤ is regular.

Problem 13. semi-linear sets.

(1) Prove that for every regular language L, the set {|w| : w 2 L} is semi-
linear. In particular, regular languages over one-letter alphabets corre-
spond to semi-linear sets via the bijection w 7! |w|.

(2) Let M be a semi-linear set. Show that {bin(n) : n 2 M} is a regular lan-


guage.

Problem 14. Prove that for all a, b, k, r 2 N, the following language is regular:

L = {bin( x ) $ bin(y) : ( a · x + b · y) ⌘ r mod k } .

2.2 The pumping lemma

The pumping lemma provides a property of regular languages which is often


used to prove that a given language is not regular. The lemma says that if a
language L is regular, then there exists a constant N 2 N such that for each
word w 2 L of length at least N, there is a decomposition w = xyz such that

| xy|  N, |y| 1, and xyi z 2 L for all i 2 N.

Problem 15. Prove that the following languages are not regular:

(1) { an bn : n 2 N };
n n o
(2) a2 : n 2 N ;

(3) { a p : p is a prime number};

(4) ai b j : gcd(i, j) = 1 ;
THE PUMPING LEMMA 11

(5) { am bn : m 6= n};

(6) {bin( p) : p is a prime number}.

Problem 16. Prove that the language of palindromes over an alphabet with at
least two letters is not regular.

Problem 17. A regular expression over an alphabet S can be seen as a word over
the alphabet S [ ∆, #, +, ⇤, ( , ) . Prove that the set of regular expressions over
an alphabet S is not a regular language.

Problem 18. Show that if in Problem 10 we consider multiplication instead of


addition, then the obtained language is not regular.

Problem 19. Prove a slightly stronger version of the pumping lemma: if a lan-
guage L is regular, then there exists a constant N such that for any words v, w, u
satisfying |w| N and vwu 2 L, there exist words x, y, z such that w = xyz,
0 < |y|  N, and vxyn zu 2 L for all n 2 N. Exhibit a language which satisfies
the claim of this stronger lemma, but is not regular.
hint: Consider the language

L= | cb {z. . . cb} +(b + c) cc(b + c) ,


 b⇤ cb ⇤ ⇤ ⇤ ⇤ ⇤
p2 P p

where P is the set of prime numbers.

Problem 20. Is the following language regular:

{w 2 { a, b}⇤ : #a (u) > 2017 · #b (u) for each non-empty prefix u of w}?

Problem 21. antipalindromes. A binary word w is an antipalindrome if for


some non-empty word z and s 2 {0, 1}

w = z z R or w = z s z R ,

where z is obtained from z by flipping the bits. For instance, 0010011 and 0011
are antipalindromes, and the empty word or a single letter are not. Let L be the
set of binary words which do not contain as a subword any antipalindrome of
length greater than 3 whose first letter is 0. Is L a regular language?
12 REGULAR LANGUAGES

Problem 22. Decide whether the following language is regular:

L = {uv 2 { a, b, c}⇤ : #a (u) + #b (u) = #b (v) + #c (v)} .

2.3 Closure properties

Each function f : S ! G⇤ can be extended to f : S⇤ ! G⇤ by setting

h ( a1 a2 . . . a n ) = h ( a1 ) h ( a2 ) . . . h ( a n );

functions obtained in this way are called homomorphisms.


The class of regular languages is closed under union, intersection, differ-
ence, concatenation, Kleene star, homomorphic images, and homomorphic pre-
images; that is, if L, M ✓ S⇤ are regular, so are L [ M, L \ M, L M, LM, L⇤ ,
f ( L), and g 1 ( L) for all functions f : S ! G⇤ and g : G ! S⇤ .
Closure under homomorphic images is a special case of closure under sub-
stitution: if h : S ! P(G⇤ ) maps each letter a 2 S to a regular language
S
h( a) over G, then w2 L h(w) is also regular; here, the image h(w) of a word
w = a1 a2 . . . an is the concatenation h( a1 )h( a2 ) . . . h( an ) of the languages assigned
to letters a1 , a2 , . . . , an .

Problem 23. Prove that for each regular language L ✓ S⇤ and each set X ✓ S⇤
the following languages, known as left and right quotients of L, are regular:
1 1
X L = {w : 9v 2 X. vw 2 L} , LX = {w : 9u 2 X. wu 2 L} .

Problem 24. A reverse of a word w, denoted by wR , can be defined recursively:


#R = # and (ws )R = swR for w 2 S⇤ and s 2 S. Prove that a set L ✓ S⇤ is
regular if and only if the language LR containing the reverses of words in L is
regular.

Problem 25. For a given automaton A recognizing a language L, construct an


automaton B that recognizes the language

Cycle( L) = {vu : uv 2 L} .

Does the fact that Cycle( L) is regular imply that L is regular as well?
C L O S U R E P R O P E RT I E S 13

Problem 26. Let L be a regular language over the alphabet {0, 1}. Prove regular-
ity of the language Lmin of words w 2 L which are minimal in the lexicographic
order among words of length |w| in L.

Problem 27. Let us assign to each non-empty binary word w 2 {0, 1}+ the
number 0.w in [0, 1) defined as
1 1 1
0.w = w[1] · + w[2] · 2 + . . . + w[|w|] · |w| .
2 2 2
For a real number r 2 [0, 1], let

Lr = {w : 0.w  r } .

Prove that the language Lr is regular if and only if r is rational.

Problem 28. Give an example of an infinite language closed under infixes that
does not contain an infinite regular language as a subset.

Problem 29. Let L be a regular language. Prove that the following languages are
also regular:

1
L = {w : 9u. |u| = |w| ^ wu 2 L} ,
2
p
L = {w : ww 2 L} .

Problem 30. Let L be a regular language. Prove that the following languages are
also regular:

Root( L) = {w : 9n 2 N. wn 2 L} ,
n o
Sqrt( L) = w : 9u. |u| = |w|2 ^ wu 2 L ,
n o
Log( L) = w : 9u. |u| = 2|w| ^ wu 2 L ,
n o
Fibb( L) = w : 9u. |u| = F|w| ^ wu 2 L ,

where Fn is the nth Fibonacci number: F1 = F2 = 1, Fn+2 = Fn + Fn+1 .


hint: n2 = 1 + 2 + . . . + (2n 1) and 2n = 1 + 21 + 22 + . . . + 2n 1 + 1.
14 REGULAR LANGUAGES

Problem 31. nProve that foroevery regular language L, the following language is
also regular w : w|w| 2 L .

Problem 32. Consider a regular language L and arbitrary (possibly non-regular)


languages L1 , . . . , Lm . Construct a finite deterministic automaton recognizing
the following language over the alphabet {1, . . . , m}:

L = i 1 i 2 . . . i k : L i1 L i2 . . . L i k ✓ L .

Problem 33. Let PalS denote the set of palindromes over the alphabet S of length

at least 2. Prove that the language PalS is regular if and only if |S| = 1. Is the

language uuR : u 2 (0 + 1)⇤ regular?

Problem 34. A palindrome is non-trivial if it has length at least 2. Determine


which of the following languages over the alphabet {0, 1} are regular:

(1) words containing a non-trivial palindrome as a prefix;

(2) words containing a non-trivial palindrome of even length as a prefix;

(3) words containing a non-trivial palindrome of odd length as a prefix.

Problem 35. Determine if the following languages are regular:

(1) L1 = { x 2 { a, b}⇤ : #ab ( x ) = #ba ( x ) + 1},

(2) L2 = { x 2 { a, b}⇤ : #aba ( x ) = #bab ( x )}.

If so, provide regular expressions for them.

Problem 36. Let L be a regular language. Prove that the languages

(1) L+ = {w : 9u. |u| = 2 · |w| ^ wu 2 L},

(2) L++ = {w : 9u. 2 · |u| = |w| ^ wu 2 L},

(3) L + = {w : 9u, v. |u| = |v| = |w| ^ uwv 2 L}

are regular, but the following language may be non-regular:


C L O S U R E P R O P E RT I E S 15

(4) L+ + = {uv : 9w. |u| = |v| = |w| ^ uwv 2 L}.


Problem 37. Is it true that for each regular language L over S there exist two
distinct non-empty words u, v over S such that L {uv} 1 = L {vu} 1 ?
Problem 38.
(1) Is there a non-regular language L ✓ { a}⇤ such that L2 is regular?

(2) Let L = {w 2 { a, b}⇤ : #a (w) 6= #b (w)}. Show that L2 is regular.


Problem 39. The Hamming distance between two words of equal length is the
number of positions at which they differ. Prove that for every regular language
L and every constant k, the set of words at a distance at most k from a word in L
is regular.
Problem 40. A shuffle of words u and v is any word that can be split into two
sub-sequences equal to u and v, respectively. The shuffle of languages L and M,
denoted by L k M, is the set of all possible shuffles of a word from L and a word
from M. Prove that if L and M are regular languages then the language L k M
is also regular.
Problem 41. Let the shuffle closure of a language L be defined as
L] = L [ ( L k L) [ ( L k L k L) [ . . . .
Give an example of a regular language over a two-element alphabet whose shuf-
fle closure is not regular.
Problem 42. (⇤) Let Pump(w) be the least set K such that w 2 K and for all
words x, y, z, if xyz 2 K, then xyyz 2 K. Is Pump( ab) regular? What about
Pump( abc)?
Problem 43.
(1) Is it true that for each finite alphabet S, the family of regular languages
over S is the least family containing all finite languages over S and closed
under union, complement, and concatenation?

(2) What if we additionally assume closure under images through homomor-


phisms from S⇤ to S⇤ that preserve length?
16 REGULAR LANGUAGES

2.4 Minimal automata

A deterministic finite automaton that recognizes a language L is minimal if no


automaton with fewer states recognizes L. A minimal automaton for a regular
language is unique up to isomorphism, so we are justified to speak about the
minimal automaton for L.
An automaton is minimal for its language if and only if it is:

• reachable, i.e., every state is reachable from the initial state, and

• observable, i.e., from every state a different language is recognized.

Every language L ✓ S⇤ determines the Myhill-Nerode equivalence relation on


S⇤ which relates words v and v0 if and only if

vw 2 L , v0 w 2 L for all w 2 S⇤ .

A language is regular if and only if its Myhill-Nerode equivalence has finitely


many equivalence classes. A transition relation on these equivalence classes
can be defined so that from the equivalence class of a word w, upon reading
a letter a 2 S, one moves to the equivalence class of the word wa. Putting the
equivalence class of the empty word as the initial state, and marking equivalence
classes of words from L as accepting states, we obtain the minimal automaton
for the language L.
Problems related to minimal automata can also be found in Section 2.8.

Problem 44. Construct the minimal deterministic automaton for the language
L = ai bn a j : n > 0, i + j is even .

Problem 45. Construct the minimal deterministic automaton for the language
L = w 2 {0, 1, 2, 3, 4}+ : maxi,j |w[i ] w[ j]|  2 .

Problem 46. For k 2 N let Lk ✓ {0, 1}⇤ be the language of words where each
infix of length k contains exactly two 1’s and each infix of length at most k
contains at most two 1’s. Describe minimal deterministic automata recognizing
Lk for k 2 {3, 4}.
M I N I M A L A U T O M ATA 17

Problem 47. For n > 0 let NPaln be the set of words over {0, 1, . . . , n 1} that
contain no non-trivial palindrome as an infix. How many states does the mini-
mal deterministic automaton for NPaln have?

Problem 48. Football teams A, B, and C compete against each other according
to the following rule: the winner of the previous match plays against the team
that did not participate in it. Assuming that there are no draws, consider the
language over the alphabet { A, B, C } of possible sequences of winners. Prove
that it is a regular language and describe its minimal deterministic automaton.

Problem 49. Mr. X owns stock of three different companies: A, B, and C. Every
day, he checks the relative values of his stocks and orders them from the most
to the least valuable (we assume that the values of two stocks can never be the
same). Mr. X decided to sell stock of a company if it ever goes down in the order
for two days in a row. For example, if the stock record in three consecutive days
is CBA, ACB, ABC, then Mr. X will sell C. Let S be the set of all permutations
of { A, B, C }. Prove that the language L of words over S that describes those
stock records that do not lead to the sale of any stock owned by Mr. X is regular.
Calculate the number of states in the minimal deterministic automaton for L.

Problem 50. (⇤) Mr. X decided that every day he will work or not, following
the rule that in any seven consecutive days there are at most four working days.
A calendar of n consecutive days can be expressed as a word of length n over
alphabet {0, 1}; where 1 means a working day and 0 means a day off. Prove
that the set of all words describing valid calendars forms a regular language
over the alphabet {0, 1}. Describe the minimal deterministic automaton for this
language.

Problem 51. Consider a vending machine that accepts coins in two currencies,
eur and pln with 1 eur = 4 pln, and works as follows:

• Every drink costs 1 pln.

• In the beginning the machine does not contain any coins.


18 REGULAR LANGUAGES

• The machine accepts 1 pln or 1 eur; in the latter case the machine gives
back 3 pln if they are available. If the machine cannot give back change,
then it signals an error.

• If after the transaction the machine contains an equivalent of at least 8 pln,


then all coins are removed from it, and the machine resumes its operation
normally.

The log of the machine is a sequence of inserted coins. The log is correct if
there was no error signal emitted by the machine while it worked. Construct the
minimal deterministic automaton over the alphabet { eur, pln } recognizing the
language of correct logs.

Problem 52. Let L be a language over the alphabet { a, b} that contains the empty
word and all words starting with a that do not contain as an infix any palindrome
of length strictly greater than 3. Draw the minimal deterministic automaton
recognizing L. How many words are there in L?

Problem 53. Let L be the language of all words over the alphabet {0, 1} that do
not contain an anti-palindrome of length strictly greater than 2 (cf. Problem 21).
Draw the minimal deterministic automaton for L.

2.5 Variants of finite automata

Problem 54. Automata with #-transitions are an extension of ordinary finite au-
#
tomata that allows transition rules of the form p ! q; using this rule in a run
amounts to changing the state from p to q without advancing in the input word.
Show that for each automaton with #-transitions there exists an automaton with-
out #-transitions recognizing the same language.

A Mealy machine is a finite automaton with output. Let S be a finite in-


put alphabet and let D be a finite output alphabet. A Mealy machine can be
presented as a deterministic finite automaton A = (S, Q, q I , d) with the set of ac-
cepting states left unspecified, together with an output function g : Q ⇥ S ! D.
When the machine is in a state q and reads an input letter a, it moves to the
VA R I A N T S O F F I N I T E A U T O M ATA 19

next state d(q, a) while additionally outputting the letter g(q, a). That is, if
q0 , q1 , . . . , qn is the sequence of states constituting the unique run on an input
word w = a1 a2 · · · an 2 S⇤ , then the machine outputs the following n-letter word
over D:

b( w ) = g ( q 0 , a 1 ) g ( q 1 , a 2 ) · · · g ( q n
g 1 , a n ).

We say that the Mealy machine A realizes the function g b : S⇤ ! D⇤ defined


above. A function f : S ! D is a Mealy function, if it is realized by some Mealy
⇤ ⇤

machine.

Problem 55. A function f : S⇤ ! D⇤ reduces a language L ✓ S⇤ to a language


M ✓ D⇤ when w 2 L if and only if f (w) 2 M for all w 2 S⇤ . Construct a Mealy
function that reduces the language, consisting of the empty word and the words
with an odd number of b’s, to the set of words with an even number of b’s.

Problem 56. Show the following closure properties:

(1) if f 1 : S1⇤ ! S2⇤ and f 2 : S2⇤ ! S3⇤ are Mealy functions, then their composi-
tion f 2 f 1 is a Mealy function too;

(2) if f : S⇤ ! D⇤ is a Mealy function and L ✓ S⇤ is a regular language, then


f ( L) is a regular language too;

(3) if f : S⇤ ! D⇤ is a Mealy function and L ✓ D⇤ is a regular language, then


f 1 ( L) is a regular language too.

Moore machines are defined like Mealy machines, except that the output func-
tion has the form g : Q ! D and the word produced along the run q0 , q1 , . . . , qn
over a word w = a1 a2 . . . an is

b( w ) = g ( q 1 ) g ( q 2 ) . . . g ( q n ) .
g

Problem 57. Show that the class of functions realized by Moore machines coin-
cides with the class of Mealy functions.
20 REGULAR LANGUAGES

Two-way automata are similar to ordinary finite automata, except that when
reading the input word they can go either left or right; that is, their transition
relation d is of the form

d ✓ Q ⇥ (S [ {., /}) ⇥ { , !} ⇥ Q.

Note that the input word is decorated with markers . and / indicating the left
and the right endpoint of the word, respectively; that is, we run the automata
on words of the form .w/ for w 2 S⇤ . A configuration of the automaton is a
pair (q, i ), where q is a state and i is a position in the word .w/. The automaton
can move from configuration (q, i ) to configuration (q0 , i0 ) if either i0 = i + 1,
(q, ai , !,q0 ) 2 d, and ai 6= / or i0 = i 1, (q, ai , , q0 ) 2 d, and ai 6= .,
where ai is the ith letter of .w/. A an accepting run on .w/ is a sequence
(q0 , i0 ), (q1 , i1 ), . . . , (qk , ik ) of configurations like above such that

• q0 is the initial state of the automaton and i0 = 1,

• for all j < k the automaton can move from (q j , i j ) to (q j+1 , i j+1 ), and

• qk is accepting.

Thus, the automaton stops and accepts immediately upon reaching an accepting
state, regardless of the current position in the word. The recognized language is
the set of words w such that there is an accepting run on .w/.

Problem 58. Show that for each two-way automaton there exists an ordinary
finite automaton recognizing the same language.

Problem 59. For k 2 N {0}, let Lk ✓ { a, b, c}⇤ be the language


1
(( a + b)⇤ c)k ( a + b)⇤ a( a + b)k 1
c ( a + b + c)⇤ .

(1) Show that each deterministic automaton recognizing Lk has at least 2k


states, and similarly for ( Lk )R .

(2) Construct a deterministic two-way automaton recognizing Lk that has O(k)


states and changes the direction of movement only once.
C O M B I N AT O R I C S O F F I N I T E A U T O M ATA 21

2.6 Combinatorics of finite automata

Let w 2 S⇤ be a word of length n and 1  i, j  n. We use the notation w[i ] for


the ith letter of w and w[i..j] for the infix starting at the ith letter and ending at
the jth letter of w (positions are numbered from 1). In particular w[i..i ] = w[i ]
and w[1..n] = w. For j < i we assume w[i..j] = #.

Problem 60. Let k be a positive integer. Prove that every non-deterministic au-
tomaton recognizing the language

{ xcy : x, y 2 { a, b}⇤ ^ x [1..k] = y[1..k]}

has at least 2k states.

Problem 61. (⇤) Two states of an automaton are distinguished by a word w, if w


is accepted from exactly one of them. Show that if two states of an n-state deter-
ministic automaton are distinguished by some word, then they are distinguished
by a word of length at most n.

Problem 62. Show that for all words u, v such that |u| < |v| = n there exists a
deterministic automaton with O(log n) states that accepts u and rejects v.
hint: Use the fact that lcm(1, 2, . . . , `) 2` for ` 7 (M. Nair, 1982).1

Problem 63. Let An and Bn be the families of non-deterministic finite automata


over the alphabet { a, b} presented below:
a, b
a a, b a, b a, b a, b a, b a, b
An ...
q1 q2 q3 q4 qn 2 qn 1 qn

a b b b b b b
a a a a
Bn ... a a a
q1 q2 q3 q4 qn 2 qn 1 qn

1 M. Nair, On Chebyshev-type inequalities for primes, 1982.


22 REGULAR LANGUAGES

(1) Prove that the minimal deterministic automaton recognizing L(An ) has
2n 1 states.

(2) (⇤) Prove that the minimal deterministic automaton recognizing L(Bn )
has 2n states.

Problem 64. (⇤) Construct a non-deterministic automaton with n states such


that the shortest word it rejects has length 2W(n) .
hint: For each n, construct an automaton over Sn = { a1 , . . . , an } where the shortest
rejected word is wn defined recursively as follows: w1 = a1 and wi+1 = wi ai+1 wi for
i 1.

Problem 65. (⇤) For n 2, let An be the following deterministic automaton:


b, c b b, c b, c b, c b, c b, c
a, c a a a a a a
...
q1 q2 q3 q4 qn 2 qn 1 qn
a

Prove that every deterministic automaton recognizing the reverse of the lan-
guage recognized by An has at least 2n states.
hint: The group of permutations of a finite set X is generated by any single cycle
shifting all elements of X and any transposition of two consecutive elements on the
cycle.

Problem 66. For every n, find languages K1n , K2n , . . . , Knn over an alphabet Sn such
that
• each language Kin can be recognized by a deterministic automaton with at
most C states, for some constant C independent of n;

• each non-deterministic automaton recognizing Kn = K1n \ K2n \ . . . \ Knn has


2W(n) states.

Problem 67.

(1) Prove that for every non-deterministic automaton with n states there exists
a regular expression of length 2O(n) that recognizes the same language.
A L G O R I T H M S O N A U T O M ATA 23

(2) (⇤) Find an example showing that for a deterministic automaton with n
states, the shortest regular expression recognizing its language may have
length as high as 2W(n) .

Problem 68. The density of a language L is the function assigning to each n 2 N


the number of words of length n in L. Is there a regular language whose density
is o (cn ) for all c > 0, but is not O(nc ) for any c?

2.7 Algorithms on automata

In this section we compute the running time of algorithms in the random-access


machine (RAM) model, where any cell of the memory can be accessed in con-
stant time. In this model the running time can be slightly better than in the
Turing machine model, where memory is accessed sequentially by means of a
head moving along the tape, one cell at a time.
We write kAk for the total size of the representation of the automaton A,
when given as input.

Problem 69. Design an algorithm which, for a given non-deterministic automa-


ton A and a word w, decides if w 2 L(A) in time O(kAk · |w|).

Problem 70. Design an algorithm which, for a regular expression b and a word
w 2 S⇤ , decides whether b generates w and works in time

(1) O(|S| · | b|2 · |w|);

(2) (⇤) O(| b| · |w|).

Problem 71. Consider generalized regular expressions, which additionally use op-
erators \ and . Design an algorithm that, given a generalized regular expres-
sion b and a word w, decides in polynomial time whether w 2 L( b).

Problem 72. Let A be a fixed deterministic automaton. Design an algorithm that,


for a given non-negative integer n, computes the number of words of length n
accepted by A using O(log n) arithmetic operations. The constants hidden in the
O-notation may depend on A.
24 REGULAR LANGUAGES

Problem 73. Design an algorithm that, given two deterministic automata over a
fixed alphabet S, of sizes N1 and N2 respectively, decides in time O( N1 · N2 ) if
they recognize the same language.2 The constants hidden in the O-notation may
depend on S.

Problem 74. A word w synchronizes a deterministic automaton if there exists a


w
state q such that for all states q0 it holds that q0 ! q.

(1) Find a synchronizing word for the following automaton:


b b b b b b
a a a a
... a a a

0 1 2 3 n 3 n 2 n 1
a, b

(2) Design an algorithm that given an automaton with n states over a fixed-
size alphabet, decides in polynomial time whether there exists a synchro-
nizing word for it. If so, the algorithm should output such a word of
length O(n3 ).

(3) (⇤) Find a shortest synchronizing word for the automaton above.3

Problem 75. Design a polynomial-time algorithm for testing whether a given


finite set of words C ✓ S⇤ is a code (see Problem 5 for the definition).

Problem 76. Design algorithms solving the following two problems (ignoring
complexity issues).

(1) Given a finite automaton A and a finite alphabet S, verify whether for all
words w accepted by A, for all a, b 2 S, #a (w) = #b (w).

2 An intricate algorithm by Hopcroft (1976) solves this problem in time O( N log N ), where N =

N1 + N2 .
3 The Černý conjecture, a 40-years-old open problem, states that if there is a synchronizing word

for an automaton with n states, then there is one of length at most (n 1)2 .
STRINGOLOGY 25

(2) (⇤) Given a finite automaton A and a finite alphabet S, verify whether
for all but finitely many words w accepted by A, for all different a, b 2 S,
# a ( w ) 6 = #b ( w ) .

Problem 77. For a fixed deterministic automaton A design a dynamic data struc-
ture for a word w that can be build in time O(|w|) and enables the following
operations:

• change letter on position i to a 2 S in time O(log |w|);

• decide whether the current word belongs to L(A) in time O(1).

Problem 78. For a fixed deterministic automaton A design an algorithm, which


for a given word w performs a precomputation in time O(|w|) and then for given
positions i  j decides if w[i..j] 2 L(A) in time O(log |w|).

2.8 Stringology

Problem 79. For a given set of words w1 , w2 , . . . , wn over the alphabet S, con-
struct a deterministic automaton with at most Âin=1 |wi | states, recognizing the
language S⇤ (w1 + w2 + · · · + wn ).

Problem 80. (⇤) Let w 2 S⇤ be a word of length n > 0 and let A be the minimal
deterministic automaton recognizing the language S⇤ w.

(1) Show that A has n + 1 states.

(2) Show that in A all but at most 2n transitions go to the initial state.

(3) Show that A can be computed in time O(n), provided that the description
does not list transitions leading to the initial state.

Problem 81. (⇤) Recognizing subwords. Let w 2 S⇤ be a word of length n > 0.


Let A be the minimal deterministic automaton recognizing the set of suffixes of
w (including the empty word and w itself).

(1) Show that A has at most 2n + 1 states.


26 REGULAR LANGUAGES

(2) Show that in A all except at most 3n transitions go to the sink state.

(3) The set of all infixes of w is recognized by a modification of the automaton


A where all states except the sink state are accepting. Give an example of
a word w for which this automaton is not minimal.

hint: December 4th.

Problem 82. Draw minimal automata recognizing the following languages:

(1) all infixes of abbababa;

(2) all suffixes of abbababa.

For simplicity, omit the sink state. How many states are needed for the analogu-
ous automata for the words ab(ba)n , n 2 N, including the sink state?
3
Context-free languages

3.1 Context-free grammars

A context free grammar is a tuple G = (S, N, S, R), where S is a set of terminal


symbols (or terminals), N is a set of non-terminal symbols (or non-terminals), S 2 N
is an initial non-terminal, and R is a set of rules that are of the form X ! a, where
X 2 N is a non-terminal, and a is a sequence of symbols (terminals and non-
terminals) from S [ N; if the sequence a is empty, we write X ! #. We often
regroup the rules for X writing them as

X ! a1 . . . a n

instead of listing them separately: X ! a1 , . . . , X ! an .


Context free grammars are used to generate (derive) words over the alphabet
S of terminal symbols. For sequences a and b of terminal and non-terminal
symbols, we define a one step derivation relation a ! b whenever a = a1 Xa3 ,
b = a1 a2 a3 , and G has a rule X ! a2 . A word w 2 S⇤ is generated by G if
S !⇤ w, where S is the initial symbol of G, and !⇤ is the reflexive-transitive
closure of !. By L( G ) we denote the set of words generated by G.
A derivation for a word w in L( G ) is a sequence

S ! a1 ! . . . ! an = w.

The number n is called the length of such a derivation.


28 C O N T E X T- F R E E L A N G U A G E S

One can also present derivations as trees. A derivation tree for a word w in
L( G ) is a tree with nodes labelled by terminal or non-terminal symbols, or the
empty word #. It must satisfy the following conditions:

• if a node is labelled by a non-terminal X, then

• either its children, enumerated from left to right, have labels a1 , . . . , an 2


S [ N with n 1 and G has a rule X ! a1 . . . an ,
• or its only child has label #, and G has a rule X ! #;

• terminal symbols and # appear only in leaves; by reading these symbols


from left to right, we obtain the word w.

A context-free grammar is called unambiguous if it allows at most one deriva-


tion tree for every word. We remark that a single derivation tree may be written
in a linear form (that is, as a derivation) in multiple ways, depending on the
order in which rules of the grammar are applied. Thus, even if the grammar is
unambiguous, words may have multiple (linear) derivations.

Problem 83. Write context-free grammars for the following languages:

(1) the set of words over the alphabet { a, b} containing the same number of
occurrences of a and b;

(2) the set of words over the alphabet { a, b} containing twice as many occur-
rences of a as occurrences of b;

(3) the set of words over the alphabet { a, b} of even length where the num-
ber of occurrences of b in even positions is the same as the number of
occurrences of b in odd positions.

Problem 84. Write context-free grammars for the following languages:

(1) the set of fully parenthesized arithmetical expressions over the alphabet
{0, 1, (, ), +, ·} that evaluate to 2 under the standard interpretation of the
symbols in the alphabet;
C O N T E X T- F R E E G R A M M A R S 29

(2) the set of arithmetical expressions in the reverse Polish notation, over the
alphabet {0, 1, +, ·}, that evaluate to 4.

Problem 85. Write context-free grammars for the following languages:

(1) the set of propositional formulas with one variable p, and constants true,
false (the alphabet is { p, true, false, ^, _, ¬, ( , )});

(2) the set of formulas from the previous item that evaluate to true under
every valuation of p.

Problem 86. Write context-free grammars for the following languages:


n o
(1) ai b j ck : i 6= j _ j 6= k ;
n o
(2) ai b j a k : i + k = j ;

(3) ai b j c p d q : i + j = p + q .

Problem 87. Given two context-free grammars generating, respectively, lan-


guages L and K, construct grammars generating the languages L [ K, LK, L⇤ ,
LR .

Problem 88. Prove that the set of palindromes over a fixed alphabet, as well as
the complement thereof, are context-free languages.

Problem 89. Write a context-free grammar generating the language:


n o
L = ai b j : 1  i < 2j 1, 1  j .

Problem 90. Construct an unambiguous context-free grammar generating the


language of balanced sequences of parentheses (see Problem 2).

Problem 91. Give context-free grammars generating those sequences of balanced


parentheses that:

(1) contain even number of opening parentheses;


30 C O N T E X T- F R E E L A N G U A G E S

(2) do not contain (()) as a subword.


Problem 92. Construct an unambiguous context-free grammar generating the
set of words over the alphabet { a, b} containing the same number of occurrences
of a and b (cf. Problem 83 (1)).
Problem 93. Prove that for all L ✓ S⇤ the following conditions are equivalent:
(a) L is regular;

(b) L is generated by a context-free grammar in which every rule is of one of


the forms: X ! #, X ! Y, X ! aY with a 2 S;

(c) L is generated by a right-linear context-free grammar; that is, a grammar


whose rules are the form X ! u or X ! vY for u, v 2 S⇤ ;

(d) L is generated by a left-linear context-free grammar, that is, a grammar


whose rules are of form X ! u or X ! Yv for u, v 2 S⇤ .
Problem 94. Give an example of a context-free grammar with rules of the form
X ! #, X ! Y, X ! wY, X ! Yw with w 2 S⇤ that generates a non-regular
language. Can such grammars generate all context-free languages?
Problem 95. We say that a context-free grammar G has a self-loop if for some non-
terminal symbol X we have X !⇤ aXb where a, b 6= #. Prove that a grammar
without a self-loop generates a regular language.
Problem 96. Let G be a context-free grammar with m non-terminals and rules
whose right sides have length at most l. Show that if # is generated by G, then it
has a derivation of length at most 1 + l + l 2 + · · · + l m 1 . Is this bound optimal?
Problem 97. Show that for every context-free grammar G there is a constant C
such that every non-empty word w generated by G has a derivation of length at
most C · |w|.
Problem 98. Give an algorithm to decide whether a given context-free grammar
generates an infinite language.
Problem 99. Prove that every infinite context-free language can be generated by
a grammar whose all non-terminals generate infinitely many words.
C O N T E X T- F R E E O R N O T ? 31

3.2 Context-free or not?

Similarly to the pumping lemma for regular languages (cf. Section 2.2), there
is a pumping lemma which can be used to prove that a given language is not
context-free. There are several variants of this lemma. The basic pumping lemma
for context-free languages says that for each context-free language L there exists
a constant N with the following property: every word w 2 L of length at least N
can be decomposed as

w = prefix · left · infix · right · suffix,

in such a way that

• at least one of the words left, right is non-empty,

• the word left · infix · right has at most N letters,

• the word wk = prefix · leftk · infix · rightk · suffix belongs to the language L,
for all k 2 N.

A stronger variant is the so-called Ogden’s lemma, which talks about words
with a distinguished set of marked positions. It says that for each context-free
language L, there exists a constant N with the following property: every word
w 2 L with at least N marked positions (in particular, |w| N) can be decom-
posed as

w = prefix · left · infix · right · suffix,

in such a way that

• at least one of the words left, right contains a marked position,

• the word left · infix · right has at most N marked positions,

• the word wk = prefix · leftk · infix · rightk · suffix belongs to the language L,
for all k 2 N.
32 C O N T E X T- F R E E L A N G U A G E S

Notice that marking positions induces a tradeoff: on the one hand, we can guar-
antee that some position in left, right is marked, but, on the other hand, we know
that the length of left · infix · right is bounded by N only with respect to the
number of marked positions. The basic pumping lemma is a special case of Og-
den’s lemma with all positions of w marked. For some languages that cannot be
proved not to be context-free using the pumping lemma, Ogden’s lemma can be
helpful.
In determining whether a given language is context-free it is often useful
that the class of context-free languages is closed under homomorphic images,
finite unions, and intersections with regular languages.

Problem 100. Prove that no infinite subset of L = { an bn cn : n 1} is context-


free, but { a, b, c}⇤ L is context-free.

Problem 101. Prove that the following languages are not context-free:
n o
(1) L1 = ai b j ak : j = max{i, k} ;
n o
(2) L2 = ai bi c k : k 6 = i .
n o
Problem 102. Prove that L = ai b j ck : i 6= j, i 6= k, j 6= k is not context-free. Is
its complement context-free?

Problem 103. Is L = ai b j ai b j : i, j 1 or its complement context-free?

Problem 104. Let L = {ww : w 2 S⇤ }. Prove that L is context-free if, and only
if, S contains at most one letter, and that its complement is always context-free,
regardless of the cardinality of S.

n 105. Prove
Problem o that for every k 2 N, the complement of the language
Lk = wk : w 2 S⇤ is context-free.

Problem 106. Let the alphabet S = { a, b, $} contain three distinct letters. Prove
that the language L = {w$w : w 2 { a, b}⇤ } is not context-free, but its comple-
ment is.
C O N T E X T- F R E E O R N O T ? 33

Problem 107. Prove that L = {w$v : w, v 2 { a, b}⇤ , v is an infix of w} is not


context-free. Is its complement context-free?

Problem 108. Prove that L = w$vR : w, v 2 { a, b}⇤ , v is an infix of w is context-


free. Is its complement context-free?

Problem 109. Is the language


n o
L = w$vR : w, v 2 { a, b}⇤ , v is a prefix and a suffix of w

context-free? Is its complement context-free?

Problem 110. Prove that L = wwR w : w 2 { a, b}⇤ is not context-free. Is its


complement context-free?

Problem 111. Determine if the following languages are context-free:

(1) { x$y : x, y 2 {0, 1}+ , [ x ]2 + 1 = [y]2 },

(2) x$yR : x, y 2 {0, 1}+ , [ x ]2 + 1 = [y]2 .

Problem 112. Determine if the following languages are context-free:

(1) { am bn : m < n < 2m},

(2) { a, b}⇤ {( an bn )n : n 1},


n o
(3) ai b j ck : i, j, k > 0, i · j = k ,
n o
(4) ai b j ck dl : i, j, k, l > 0, i · j = k + l .

Problem 113. Is L = bin(n) $ bin(n2 )R : n 0 context-free?

Problem 114. Is L = {bin(n) bin(2n) : n 1} context-free?

Problem 115. (⇤)

(1) Prove that the set of tautologies over a fixed finite set of propositional vari-
ables V, interpreted as words over the alphabet V [ {false, true, _, ^, ¬, ( , )},
is context-free (cf. Problem 85 (2)).
34 C O N T E X T- F R E E L A N G U A G E S

(2) The set of formulas over a countable set of variables can be represented as
the language over the alphabet

{false, true, x, 1, 0, _, ^, ¬, ( , )}

generated by the following grammar:

F ! true false V ( F _ F ) ( F ^ F ) (¬ F ),
V ! x1 V0 V1.

For example, (( x101 _ (¬ x1)) ^ (¬(false _ x101))) is a formula. Prove that


the set of all tautologies is not a context-free language. (It follows easily
from the p 6= np conjecture, but show it without assuming this conjecture.)

Problem 116. Consider the ordered alphabet {0, 1} with 0 < 1. A word w is
primitive if there is no base u and exponent n > 1 such that w = un . A word w1
is a cyclic permutation of a word w2 if there exist words u, v such that w1 = uv
and w2 = vu. A Lyndon word is a primitive word which is lexicographically
the smallest among all its cyclic permutations. Is the set of all Lyndon words
context-free?

Problem 117. Let L be a context-free language. Is the set of palindromes in L


also context-free?

A context-free grammar is linear if in every rule A ! w the word w contains


at most one non-terminal. A context-free language is linear if it is generated by
a linear grammar.

Problem 118. Show that for each linear context-free language L there exists a
constant N such that each word w 2 L of length at least N can be factorized as

w = prefix · left · infix · right · suffix,

in such a way that left · right 6= #, |prefix · left|  N, |right · suffix|  N, and for all
k 2 N, the word wk = prefix · leftk · infix · rightk · suffix belongs to L.

Problem 119. Show that L = ai bi c j d j : i, j 2 N is not a linear context-free


language.
P U S H D O W N A U T O M ATA 35

Problem 120. Show that the set of those words w over the alphabet { a, b} which
have the same number of a’s and b’s is not a linear context-free language.

Problem 121. Determine if the following languages are context-free:

(1) {bin(n) $ bin(m) : 1  n  m},

(2) bin(n) $ bin(m)R : 1  n  m .

Problem 122. Let D1 , D2 denote the sets of balanced sequences of parentheses of


one type (round) and of two types (round and square), respectively. Determine
if the following languages are context-free:

(1) uvR : uv 2 D1 ,

(2) uvR : uv 2 D2 .

Problem 123. A word is square-free if it has no infixes of the form vv with v 6= #.


Prove that each language containing only square-free words is context-free if and
only if it is finite.

Problem 124. Give an example of a context-free language over a two-letter al-


phabet, whose complement is infinite and cube-free; that is, it contains no words
of the form uv3 w with v 6= #.
hint: Use the Thue-Morse sequence (see Problem 6).

3.3 Pushdown automata

A pushdown automaton can be presented as a tuple

A = (S, G, Q, q I , Z I , d, F ),

where S is an alphabet of input symbols, G is an alphabet of stack symbols, Q is


a set of states, q I 2 Q is an initial state, Z I 2 G is an initial stack symbol,
d ✓ Q ⇥ (S [ {#}) ⇥ G ⇥ Q ⇥ G⇤ is a transition relation, and F ✓ Q is a set of
36 C O N T E X T- F R E E L A N G U A G E S

accepting states. All of the above sets are required to be finite. We write transi-
tion rule (q, a, Z, q0 , g) 2 d as

q, a, Z !A q0 , g.

It tells the automaton to first pop the symbol Z from the stack, and then push
the sequence g.
A configuration of the pushdown automaton is a triple (q, w, g), where q 2 Q
is the current state, w 2 S⇤ is the word that remains to be read, and g 2 G⇤ is the
stack content (where the first letter is the symbol on the top of the stack, etc.).
Initial configurations are of the form (q I , w, Z I ); that is, the state is initial, and
the stack contains only the initial symbol. Final configurations are of the form
(q, #, g); that is, the whole input word is already read.
The following relation `A on configurations reflects a single step of the au-
tomaton: we let

(q, aw, Zb) `A (q0 , w, ab)

whenever A has a transition rule q, a, Z !A q0 , a (including the special case of


aw = w when a = #). Notice that A can reach a configuration (q, w, #) with
empty stack, but it can make no further transitions from this configuration. By
`A⇤ we denote the reflexive-transitive closure of ` .
A
A sequence of configurations (q0 , w0 , g0 ), (q1 , w1 , g1 ), . . . , (qm , wm , gm ) is called
a computation of A on a word w 2 S⇤ if (q0 , w0 , g0 ) is the initial configuration
with w0 = w, and (qi , wi , gi ) `A (qi+1 , wi+1 , gi+1 ) for all i < m. The computation
is accepting if (qm , wm , gm ) is a final configuration (that is, wm = #) and qm 2 F.
The language recognized by A is defined as the set of those words, on which
there exists an accepting computation:

L(A) = {w 2 S⇤ : (q I , w, Z I ) `A

(q F , #, g) for some q F 2 F, g 2 G⇤ } .

Two automata are equivalent if they recognize the same language.

Problem 125. Construct pushdown automata recognizing previously introduced


context-free languages:
P U S H D O W N A U T O M ATA 37

(1) palindromes (Problem 88),

(2) balanced sequences of parentheses (Problem 90),

(3) words containing two times more a’s than b’s (Problem 83 (2)),

(4) words that are not of the form ww (Problem 104).

Problem 126. Construct a pushdown automaton recognizing the language


n o
bin(n) $ bin(n + 1)R : n 2 N .

Problem 127. Construct a pushdown automaton recognizing the language


n o
bin(n) $ bin(3 · n)R : n 2 N .

Generalize this construction.

Problem 128. (⇤) Prove that for every pushdown automaton one can construct
an equivalent pushdown automaton with two states only.

Problem 129. Prove that for each pushdown automaton one can construct an
equivalent automaton (with the same states) that in each transition replaces the
topmost stack symbol with at most two stack symbols.

Problem 130. (⇤) Prove that for each pushdown automaton one can construct
an equivalent pushdown automaton that has only push rules and pop rules; that
is, only rules of the form

q, a, Z ! q0 , YZ and q, a, Z ! q0 , #.

Can one limit the number of states for such automata as well?

Problem 131. Given a pushdown automaton recognizing a language L, construct


pushdown automata recognizing the following languages:

(1) Prefix( L) = {w : 9v. wv 2 L},

(2) Suffix( L) = {w : 9u. uw 2 L},


38 C O N T E X T- F R E E L A N G U A G E S

(3) Infix( L) = {w : 9u, v. uwv 2 L},

(4) LR = wR : w 2 L ,

(5) (⇤) Cycle( L) = {vw : wv 2 L}.

Problem 132.

(1) Give an example of a non-regular context-free language L such that the


set of infixes of words from L is regular.

(2) Give an example of a non-regular context-free language L such that the


set of infixes of words from L is not regular.

Problem 133. Given a pushdown automaton recognizing a language L, and a


finite automaton recognizing a language K, construct pushdown automata rec-
ognizing the following languages:

(1) L \ K ,

(2) LK 1,

(3) K 1L .

Can this be done also when K is only assumed to be context-free?

Problem 134. Let max(w), min(w), med(w) denote, respectively, the maximum,
the minimum, and the median of the numbers #a (w), #b (w), #c (w). Determine
which of the following languages are regular, and which are context-free:

(1) {u 2 { a, b, c}⇤ : max(w) min(w)  2017 for each prefix w of u},

(2) {u 2 { a, b, c}⇤ : max(w) med(w)  2017 for each prefix w of u}.

Problem 135. (⇤) Prove that for every pushdown automaton A there exists a
constant C such that for every word w 2 L(A) there exists an accepting compu-
tation of length at most C |w|.
P R O P E RT I E S O F C O N T E X T- F R E E L A N G U A G E S 39

Problem 136. (⇤) Prove that for each pushdown automaton A the set of words
that are the possible contents of the stack in computations of A is regular. Then,
deduce that the set of words that are the possible contents of the stack in accepting
computations of A is also regular.

A pushdown automaton over the input alphabet S is deterministic if from


every configuration there is at most one possible move; that is, for each state p
and each stack symbol Z,

• for each symbol a 2 S [ {#} there is at most one transition of the form
p, a, Z ! q, g, and

• if there is a transition of the form p, #, Z ! q, g, then there is no transition


of the form p, a, Z ! q0 , g0 for a 6= #.

Problem 137. Prove that the language


n o
{ an bn : n 2 N } [ an b2n : n 2 N

cannot be recognized by a deterministic pushdown automaton.

Problem 138. Prove that the set of palindromes over the alphabet { a, b} cannot
be recognized by a deterministic pushdown automaton.

3.4 Properties of context-free languages

Problem 139. Give an example of a context-free language L such that the lan-
guage 12 L = { x : 9y. | x | = |y| ^ xy 2 L} is not context-free.

Problem 140. Recall the notion of shuffle, defined in Problem 40.

(1) Prove that the shuffle of a context-free language and a regular language is
context-free.

(2) Construct an example of two context-free languages whose shuffle is not


context-free.
40 C O N T E X T- F R E E L A N G U A G E S

Problem 141. Recall the notion of shuffle closure defined in Problem 41. Con-
struct a finite language whose shuffle closure is not context-free.

Problem 142. Prove that if X and Y are regular languages then the language
S n n
n2N X \ Y is context-free, but need not be regular.

Problem 143. Give an example of regular languages X, Y, Z such that the lan-
S
guage n2N X n \ Y n \ Z n is not context-free.

Problem 144. Give an example of a context-free language L such that the lan-
p
guage L = {w : ww 2 L} is not context-free.

Problem 145. Give


n an example of a context-free
o language L such that the lan-
k
guage Root( L) = x : x 2 L for some k is not context-free.

Problem 146. Let L be a regular language. Show that xyR : xy 2 L, x 6= y is a


context-free language.

Problem 147. Let L ✓ { a, b}⇤ be a regular language and let h1 , h2 be morphisms.


Show that h1 (u)(h2 (u))R : u 2 L is a linear context-free language.

Problem 148. Let h1 and h2 be morphisms on { a1 , b1 , a2 , b2 }⇤ defined by


hi ( ai ) = a, hi (bi ) = b, and hi ( a j ) = hi (b j ) = # for i 6= j. Show that the lan-
guage {w : h1 (w) = h2 (w)} is not context-free.

Problem 149. Show that for every pair of morphisms, h1 and h2 , the languages
xyR : h1 ( x ) = h2 (y) and xyR : h1 ( x ) 6= h2 (y) are both linear context-free.

Problem 150. Show that the class of linear context-free languages is closed under
intersections with regular languages.

Problem 151. For a given language L, let min( L) be the language of words from
L that are minimal in the prefix order; that is, u 2 min( L) if and only if u 2 L and
no strict prefix of u belongs to L. Prove that if L is a deterministic context-free
language, then so is min( L).
n o
Problem 152. Show that for L = ai b j ck : k i or k j the language min( L),
defined in Problem 151, is not context-free.
P R O P E RT I E S O F C O N T E X T- F R E E L A N G U A G E S 41

Problem 153. In analogy to Problem 151, we define the language max( L) of


words in L that are maximal in the prefix order; that is, u 2 max( L) if u 2 L
and no word having u as a strict prefix belongs to L. Give an example of a
context-free language L such that max( L) is not context-free.

Problem 154. The Hamming distance between two words of the same length
is the number of positions at which they differ. Prove that for every regular
|v|
language L, the set M of words v at a distance at most 2 from a word of length
|v| in L is context-free. Is it always regular?

Problem 155. (⇤) parikh’s theorem. Fix an alphabet S = { a1 , . . . , ad }. The


Parikh image of a word w 2 S⇤ is (#a1 (w), . . . , #ad (w)) 2 N d . The Parikh image of
a language L ✓ S⇤ is the set of Parikh images of all w 2 L. Prove that for every
context-free language over S there is a regular language over S with the same
Parikh image.

Problem 156. Prove that each context-free language over a one-letter alphabet is
regular.
n o
Problem 157. Prove that if L is a context-free language, then a|w| : w 2 L is a
regular language.

Problem 158. A language has the prefix property if for every two words from that
language, one is a prefix of the other. Show that if a context-free language has
the prefix property then it is a regular language.

Problem 159.

(1) Is there a language L over a finite alphabet such that neither L nor the
complement of L contain an infinite regular language?

(2) What if we additionally require L to be context-free?


4
Theory of Computation

4.1 Turing machines

A Turing machine is essentially a non-deterministic finite automaton enriched


with external memory in the form of an infinite sequence of cells, called the tape.
At every stage of the computation, the machine’s head is placed over one of the
cells. In every step, the machine changes its state, overwrites the contents of the
current cell, and possibly moves its head to a neighbouring cell.
Formally, a Turing machine M over an input alphabet S has a finite set of
states Q with a distinguished initial state q0 2 Q, a subset F ✓ Q of accepting
states, a finite tape alphabet T ◆ S with a distinguished blank symbol b 2 T S,
and a transition relation

d ✓ Q⇥T⇥Q⇥T⇥{ , , !}.

A transition rule (q, a, p, b, d) 2 d applies in state q if the machine’s head sees the
tape letter a in its current cell. The rule allows the machine to overwrite a with
b, change the state to p, and either keep the head over the same cell or move it
left or right, depending on d.
A configuration of the machine specifies the tape contents, the machine’s state,
and the position of the head: for w, v 2 T ⇤ and q 2 Q, the configuration wqv
describes the situation where the tape contains the word wv with all remaining
44 T H E O RY O F C O M P U TAT I O N

cells empty (that is, containing the blank symbol b), the state is q, and the head
is placed over the cell containing the first symbol of the word v.
Given an input word w, the machine starts its computation in its initial state
q0 , with its head over the first (left-most) symbol of w. The initial configuration
is thus q0 w. The machine’s run consists of applications of transition rules, which
yield a finite or infinite sequence of consecutive configurations. A configuration
wqv is accepting if the state q is accepting; a run is accepting if it is finite and
its last configuration is accepting. The language L( M) recognized by a machine
M consists of all those words w 2 S⇤ for which M has an accepting run start-
ing in the initial configuration q0 w. Two Turing machines are equivalent if they
recognize the same language.
Unless stated otherwise, we assume that the tape is infinite in both direc-
tions; however, one could also consider a model with right-infinite tape, where
there are no tape cells to the left of the initial position of the head. The two
models are computationally equivalent (see Problem 163).
Transition rules of a machine with k tapes are in the following format:

d ✓ Q ⇥ Tk ⇥ Q ⇥ Tk ⇥ { , , !}k .

Thus such a machine has k heads, moving independently, but a common state is
used to determine their moves. A machine with any constant number of tapes
can be simulated by a machine with a single tape. Therefore, the 1-tape model
is computationally equivalent to the k-tape one, for all k.
The Turing machines discussed so far are non-deterministic. A machine is
deterministic if its transition relation d satisfies the following condition: for every
state q and tape symbol a, there is at most one state p, tape symbol b and di-
rection t such that (q, a, p, b, t) 2 d (equivalently, there is exactly one p, b and t).
Thus, in every state q and for every tape symbol a, a deterministic machine has
at most one possible transition rule to apply.

Problem 160. Construct Turing machines over the input alphabet {0, 1} recog-
nizing the following languages:

(1) {ww : w 2 {0, 1}⇤ };


TURING MACHINES 45

(2) palindromes;

(3) sequences over {0, 1} representing prime numbers in binary notation.

Problem 161. A directed graph with n vertices {0, . . . , n 1} can be represented


by a word over {0, 1} of length n2 , whose kth letter is 1 if and only if there is an
edge in the graph from vertex i to vertex j, where k = n · i + j + 1.

(1) Construct a non-deterministic Turing machine that recognizes the language


of all those words over {0, 1} that represent a graph with a path from
vertex 0 to vertex n 1.

(2) Construct a deterministic Turing machine recognizing the same language.

Problem 162. We say that a deterministic Turing machine over input alphabet
{ a} computes a function f : N ! N in unary representation if, for each n 0,
the computation of the machine starting in the initial configuration q0 an termi-
nates in the configuration q f a f (n) , where q f is a distinguished final state.

(1) Construct a Turing machine computing the function n 7! 2n .

(2) Construct a Turing machine computing the function n 7! dlog2 ne.

Problem 163. Prove that every Turing machine is equivalent to a machine with a
right-infinite tape; that is, a tape that has no cells to the left of the initial position
of the head.

Problem 164. Given a non-deterministic Turing machine construct an equivalent


deterministic one.

Problem 165. Given two deterministic Turing machines M1 and M2 over the al-
phabet S, construct deterministic machines recognizing the following languages:

(1) L( M1 ) [ L( M2 );

(2) L( M1 ) \ L( M2 );

(3) L( M1 ) L( M2 );
46 T H E O RY O F C O M P U TAT I O N

(4) L( M1 )⇤ .

Problem 166. Prove that every Turing machine M over the input alphabet {0, 1}
is equivalent to a Turing machine M0 with tape alphabet {0, 1, b } which never
writes the blank symbol b.

Problem 167. In a write-once Turing machine, whenever a transition rule over-


writes a symbol a with a symbol b, either a = b 6= b or a = b 6= b holds.

(1) Given a 1-tape Turing machine, construct an equivalent 2-tape write-once


machine.

(2) (⇤) Prove that 1-tape write-once Turing machines only recognize regular
languages.

Problem 168. (⇤) Prove that a deterministic 1-tape Turing machine that makes
O(n) steps on each input of length n recognizes a regular language.

Problem 169. Given a Turing machine, construct an equivalent 1-tape machine


with four states.

Problem 170. The notion of pushdown automaton can be naturally extended to


automata with k stacks, for any k. Show that every Turing machine is equivalent
to an automaton with two stacks. Deduce further that every automaton with k
stacks is equivalent to an automaton with two stacks.

An automaton with a queue is similar to pushdown automata, except that it


performs operations on a queue, not on a stack. Transition rules are of the forms

(q, a, p), (q, get(s), p), (q, put(s), p),

where q, p are states, a is an input letter, and s is an element of a finite queue


alphabet S. The first one reads a from the input; the second one is only enabled
when s is the first symbol in the queue and it removes this symbol from the
queue; the last one adds s to the queue as the last symbol. We assume that the
queue is initially empty.
C O M P U TA B I L I T Y A N D U N D E C I D A B I L I T Y 47

Problem 171. Prove that every Turing machine is equivalent to an automaton


with a queue.

A k-counter automaton, for k 1, is a non-deterministic finite automaton ad-


ditionally equipped with k counters c1 , . . . , ck . Each counter stores a non-negative
integer; initially, all the counters are set to 0, except for a distinguished counter c1
whose initial value is understood as the input of the counter automaton. Thus,
counter automata recognize sets of non-negative integers, rather than sets of
words. Transitions of counter automata do not read input, but manipulate coun-
ters: every transition performs an operation on one of the counters ci . The
allowed operations are:
?
ci = 0 (zero test),
ci ++ (increment),
ci (decrement),

but the transition ci can be executed only if the current value of ci is strictly
positive. That is, counter values are not allowed to drop below 0.

Problem 172. Turing machines over a one-letter input alphabet can be viewed as
acceptors of sets of natural numbers, written in unary notation. Prove that such
Turing machines are equivalent to a 3-counter automata.

4.2 Computability and undecidability

A machine halts on an input word w if it has no infinite run starting from the
initial configuration q0 w. A language L ✓ S⇤ such that L = L( M) for some Tur-
ing machine M that may or may not halt on all input words is called recursively
enumerable or semidecidable. Problem 173 below justifies the common use of both
these rather different names: it implies that a language L is accepted by a Turing
machine if and only if there exists a (different) Turing machine that outputs all
words in L one by one. A machine that halts on all inputs is called total. If
L = L( M) for a total machine M, then L is called decidable. Unsurprisingly, a
language that is not decidable is called undecidable.
48 T H E O RY O F C O M P U TAT I O N

It is standard to identify a language with the computational problem of


checking whether a given word belongs to the language. For example, one may
say that “it is decidable whether a given number is prime”, meaning that the lan-
guage of all (representations of) prime numbers is decidable. In such statements
one typically neglects to specify a concrete representation schema for numbers
(or for automata, grammars, Turing machines or other input objects) as words,
since decidability properties usually do not depend on the chosen method of
representation.
Many natural computational problems are known to be undecidable, the
halting problem for Turing machines being the archetypical example. Consider
a representation of Turing machines as words over some fixed finite alphabet
(for instance, as a list of alphabet letters followed by a list of transition rules).
The halting problem is then the problem of checking, given a representation
[ M] of a machine M and a word w over the input alphabet of M, whether M
halts on input w. Other undecidable problems include checking whether a given
machine accepts a non-empty language, a regular or context-free language, etc.
In fact, the well-known Rice theorem says that every non-trivial question about
the language accepted by a given Turing machine is undecidable.
Turing machines can be seen as devices for computing functions. One way
to do that, for functions on natural numbers, is used in Problem 162. Another,
more general way is to consider functions f : S⇤ ! G⇤ for some alphabets S and
G. If a machine M with input alphabet S, given input w 2 S⇤ as input, halts
in an accepting configuration with a word v 2 G⇤ written on its tape, then we
say that M computes a function f such that f (w) = v. In general the function
computed by a machine is partial, because the machine may reject some inputs
and may not halt on some inputs. Such a function is called a partial computable
function. If the machine accepts every input then the function is total, and is
simply called a computable function.

Problem 173. Prove that the following conditions on a non-empty language L


are equivalent:

(a) L is recursively enumerable;


C O M P U TA B I L I T Y A N D U N D E C I D A B I L I T Y 49

(b) L is the domain of some partial computable function;

(c) L is the image of some partial computable function;

(d) L is the image of some computable function.

Problem 174. Prove that a set L ✓ N, treated as a language L over the alphabet
{0, 1} via the standard binary representation of natural numbers, is decidable if
and only if it is finite or it is the image of some strictly increasing computable
function f : N ! N.

Problem 175. Prove the following Turing–Post theorem: if a language and its
complement are both recursively enumerable, then they are both decidable.

Problem 176. (⇤) Prove that there exists a recursively enumerable set whose
complement is infinite but does not contain any infinite, recursively enumerable
subset.
hint: Construct the set by choosing, for every Turing machine M that accepts an infinite
language, a single word accepted by M. Choose wisely, so that the complement of your
set remains infinite.

Problem 177. Recall the definition of a k-counter automaton from Problem 172.
Prove that there exists a 2-counter automaton A such that it is undecidable
whether A halts on a given input number n.

Problem 178. (⇤) Prove that the following Post problem is undecidable: Given
two lists of words u1 , . . . , un 2 S⇤ and w1 , . . . , wn 2 S⇤ , is there a sequence of
indices i1 , . . . , im 2 {1, . . . , n} such that

u i1 . . . u i m = wi1 . . . wi m ?

If such a sequence exists then the word ui1 . . . uin (equal to wi1 . . . win ) is called a
solution of the instance of the Post problem.
hint: As an intermediate step, use a modification where i1 has to be 1.
50 T H E O RY O F C O M P U TAT I O N

Problem 179. Prove that the following universality problem is undecidable: given
a context-free grammar G over an alphabet S, is it the case that

L( G ) = S⇤ ?

Problem 180. Are the following problems decidable?

(1) Given a context-free grammar G, does L( G ) contain a palindrome?

(2) Given two context-free grammars G1 and G2 , is it the case that

L( G1 ) \ L( G2 ) = ∆ ?

(3) Is a given a context-free grammar G unambiguous?

Hint: Use the Post problem.

Problem 181. Assume that we know that a 1-tape, deterministic Turing machine
M makes at most one “turn”; that is, once the head moves to the left it never
moves to the right again. Is it decidable whether a given machine M with this
property accepts a given word w?

Problem 182. Is the following problem decidable: given words u, w 2 S⇤ and a


number k, is there a word x 2 S⇤ of length at least k such that #u ( x ) = #w ( x )?

Problem 183. Fix an encoding of Turing machines that represents a machine M


as a word [ M ] over {0, 1}. Consider a function C that maps a pair ([ M ], v) to
the minimal length of a word w such that M (w) = v, or to a special symbol • if
there is no such w.

(1) Prove that the function C is not computable.

(2) Prove that the function C can be approximated in the following sense:
there is a Turing machine that, for input ([ M ], v), produces an infinite
sequence of numbers that eventually stabilizes at the value C ([ M ], v).

Note that there is no contradiction between (a) and (b), since an observer of an
infinite sequence can never say whether it has already stabilized.
CHOMSKY HIERARCHY 51

4.3 Chomsky hierarchy

Context-sensitive grammars are defined like context-free grammars, with the ex-
ception that production rules are of the form:

aXb ! agb,

where X 2 V, a, b, g 2 (S [ V )⇤ , g 6= #, for a set V of non-terminal symbols and


a set S of terminal symbols. Languages generated by context-sensitive grammars
are called context-sensitive languages.
Under the above definition, context-sensitive languages cannot contain the
empty word. One can easily extend the definition slightly to allow the empty
word. In this section, however, we prefer to keep this simple definition and
restrict our attention to non-empty words.
The Chomsky hierarchy consists of four classes of languages:

Type 0 : Recursively enumerable languages,


Type 1 : Context-sensitive languages,
Type 2 : Context-free languages,
Type 3 : Regular languages.

Ordered by inclusion, they form a strictly increasing chain:

Type 3 ( Type 2 ( Type 1 ( Type 0.

Problem 184. A monotonic grammar has rules of the form a ! b, where | b| |a|.
Prove that each monotonic grammar can be transformed into a context-sensitive
grammar (possibly using different non-terminal symbols).

Problem 185. Prove that context-sensitive languages are exactly the languages
recognized by non-deterministic linear bounded automata, that is, 1-tape Turing
machines which are only allowed to use the cells of the tape that are initially
occupied by the input word (we assume that the last letter of the input word is
marked).1
1 One can equivalently take Turing machines using O(n) space.
52 T H E O RY O F C O M P U TAT I O N

Problem 186. Prove that the quotient of a context-sensitive language by a regular


language may be an undecidable language.
hint: Use the language of computations of a Turing machine that recognizes a recur-
sively enumerable but undecidable language.

Problem 187. Prove that recursively enumerable languages are closed under
quotients by recursively enumerable languages.

Problem 188. Which of the four classes of the Chomsky Hierarchy are closed
under union, intersection and complement?

4.4 Computational complexity

In this section we assume the standard model of Turing machines defined in


Section 4.1. All machines are multi-tape by default, unless explicitly stated oth-
erwise.
We will commonly use the concept of an off-line Turing machine. In this set-
ting, the input word w is given on a special input tape, which contains w delim-
ited by the start marker . on the left and the end marker / on the right. The
input tape is read-only, which means that the head, initially placed over the start
marker, can only move over the tape reading its contents, but cannot write on it.
The machine, however, also has a constant number of work tapes, initially filled
with blanks, which can be used for storing intermediate results of computations.
The transitions are defined as usual for multi-tape machines; that is, a transition
consists of simultaneous moves of all the heads over all the tapes.
Recall that the class l contains all languages recognized by an off-line Turing
machine working in logarithmic space; that is, the working space used for inputs
of size n is bounded by k · log2 n for some constant k. Similarly, we can define
functions f : S⇤ ! G⇤ computable in logarithmic space, for some fixed alphabets
S and G. We say that such a function f is computable in l if there is an off-line
Turing machine that, given input w of size n, uses at most k · log n working space
and outputs the word f (w) in the following sense. Transitions of the machine
can be enriched with annotations ‘output g’ for some g 2 G. When such a
C O M P U TAT I O N A L C O M P L E X I T Y 53

transition is executed, the symbol g is written to the output. The machine has to
accept and the word composed of consecutive output letters has to be equal to
f ( w ).

Problem 189. Prove that the composition of two functions computable in l is


also computable in l.

A function f is space-constructible if there exists an off-line Turing machine


that on input 1n produces the word 1 f (n) on the first work tape while visiting
only O( f (n)) cells of the work tapes in total.

Problem 190. Which of the following functions are space-constructible: 2n, n2 ,


n
nk for any constant k, 2n , 22 , dlog2 (n + 1)e?2

A function f is time-constructible if there exists an off-line Turing machine


that on input 1n produces the word 1 f (n) on the first work tape while performing
O( f (n)) steps in total.

Problem 191. Which of the following functions are time-constructible: 2n, n2 , nk


n
for any constant k, 2n , 22 , dlog2 (n + 1)e?

Problem 192. Prove that the classes p, np, and pspace are closed under Kleene’s
star in the following sense: if a language L belongs to the class, then so does the
language L⇤ .

Problem 193. Show that if the class l is closed under Kleene’s star, then l = nl.

Problem 194. In the Exact Set Cover problem one is given a finite universe
U and a family F of subsets of U. The task is to determine whether there is a
subfamily of F consisting of pairwise disjoint sets whose union is equal to U.
Prove that this problem is np-complete.

Problem 195. In the Subset Sum problem one is given a set S of non-negative
integers and a target non-negative integer t, all encoded in binary. The task is
to verify whether there exists a subset of S such that the sum of elements of the
subset is equal to t. Prove that this problem is np-complete.
2 Note that dlog2 ne is not well defined for n = 0.
54 T H E O RY O F C O M P U TAT I O N

Problem 196. In the Tiling problem we are given a set of square tiles S ✓
{0, 1, . . . , n}4 and an integer N, represented in unary. Each tile x is represented
as a quadruple of integers x = ( x [ ], x ["], x [!], x [#]) from the range between
0 and n, which we will treat as colours of the respective sides of the tile. A
tiling of an N ⇥ N square is called proper if it consists of N 2 tiles from S aligned
side-to-side, and the following two conditions are satisfied:

• If two tiles share a side, the colours of the corresponding sides of the tiles
match.

• The sides of the N ⇥ N square are coloured with 0.

The tiles cannot be rotated. The question is whether such a proper tiling exists.
Prove that this problem is np-complete.

Problem 197. Prove that every problem in np can be reduced to 3SAT by a


reduction working in logarithmic space.

Problem 198. Prove that it is np-complete to decide if a given regular expres-


sion over a given alphabet generates some word containing all letters from the
alphabet.

Problem 199. Prove that it is pspace-complete to decide if a given non-deterministic


automaton A over an alphabet S rejects some word in S⇤ .

Problem 200. Prove that it is pspace-complete to decide, given a finite set of


automata, if there is a word accepted by all automata from the set.

You might also like