200 Only Problems
200 Only Problems
200 Only Problems
b l em
Pro ua g es
0
20 ormal Lan g
F
and
Automata
Theor y
damian niwiński
wojciech rytter
Niwiński and Rytter’s
200 Problems
in Formal Languages
and Automata Theory
When Mikołaj Bojańczyk first brought up the topic of a gift for Damian’s 60th
birthday, I thought it was the greatest excuse one could hope for. And so it
happened that 19 people, related in various ways to the automata group at the
University of Warsaw, enthusiastically agreed to contribute. For all of us, this
was a wonderful opportunity to thank Damian for creating the automata group
and shaping us as researchers.
Over the course of slightly more than a year, we wrote solutions to all problems
in the original collection. Some were later merged, a few were removed. We
included several exercises used as home assignments in the most recent edition
of the automata course, taught by Wojtek Czerwiński, as well as selected
problems from Damian’s Computational Complexity course. A seemingly
innocent problem was offered at the last moment by Wojtek Rytter, causing
some embarrassing blunders on my side. This problem, along with several
others, is now marked with (⇤) to indicate that it is particularly difficult.
Indeed, working on the problem book we all kept rediscovering how much
more there is to learn about automata and formal languages. It would make us
very happy to see the readers share this experience.
2 Regular languages 7 69
(1) Prove that for each non-empty word w there is exactly one primitive word
v such that w = vn for some n 1. We call n the exponent of the word w.
(2) For any words w, v, we say that the words wv and vw are conjugate to
each other. Prove that being conjugate is an equivalence relation and all
conjugate words have the same exponent. What is the cardinality of the
equivalence class of a word of length m and exponent n?
4 WORDS, NUMBERS, GRAPHS
(a) The least set L such that the empty sequence # is in L and if w, v 2 L then
(w), wv 2 L.
(b) The set K of words over the alphabet {( , )} in which the number of oc-
currences of ) is the same as the number of occurrences of (, and in each
prefix the number of occurrences of ) is not greater than the number of
occurrences of (.
Problem 3. semi-linear sets. For any fixed a, b 2 N, the set of natural num-
bers { a + bn : n 2 N } is called linear. A semi-linear set is a finite union of linear
sets. (The empty set is obtained as the union of the empty family of linear sets.)
(3) Fix a directed graph. Prove that the set of lengths of all directed paths
between any two fixed vertices is semi-linear.
(4) Prove that the family of all semi-linear sets is closed under finite unions,
finite intersections, and complement with respect to N.
by a multiple of 90 degrees. The barman wins if at any moment all glasses are
in the same position (he is to be informed about this immediately). Can the
barman win this game, starting from an unknown initial position? If so, how
many moves are sufficient? Would you play this game for money against the
barman? What about an analogous game with 3 or 5 glasses?
(1) Let S = { a, b}. Prove that the set { aa, baa, ba} is a code and the set
{ a, ab, ba} is not a code.
(2) For a finite set A that is not a code, give an upper bound on the length of
the shortest word that admits two different factorizations.
(1) the infinite sequence of 0’s and 1’s obtained by starting with 0 and succes-
sively appending the sequence obtained so far with all bits flipped;
(2) the infinite word s0 s1 s2 . . . such that sn = 0 if the number of 1’s in the
binary expansion of n is even, and sn = 1 if it is odd;
(3) the infinite word t0 t1 t2 . . . , whose letters satisfy the recurrence relation:
t0 = 0, t2n = tn , and t2n+1 = 1 tn for all n.
Show that the Thue–Morse word is cube-free; that is, it contains no infix of the
form www with w 6= #. In fact, it is strongly cube-free; that is, it contains no infix
of the form bwbwb for b 2 {0, 1}.
hint: First show that it contains neither 000 nor 111 as an infix, but each infix of length
5 contains 00 or 11 as an infix.
6 WORDS, NUMBERS, GRAPHS
Construct an infinite word over a four-letter alphabet that is square-free; that is, it
contains no infix of the form ww with w 6= #. Can it be done with three letters?
And two letters?
2
Regular languages
A regular expression is used to generate (that is, describe) a set of words. The
most basic regular expressions are ∆, which generates no words, and #, which
generates only the empty word. Each individual letter a can be viewed as a reg-
ular expression, which generates only one word, namely the one-letter word a.
Finally, regular expressions can be combined using the following operators:
The operator e⇤ is called Kleene star. Note the case of n = 0 in the definition of
the Kleene star, which means that the empty word is generated by the Kleene
star of every expression. Here is an example of a regular expression that uses all
the available operations except ∆:
⇤
( a + b)( a + b) # + (( a + b) a( a + b)) .
This particular expression generates the set of words over the alphabet { a, b}
which have either even length, or have an odd length of at least 3 and penulti-
mate letter a.
8 REGULAR LANGUAGES
(S, Q, I, d, F ),
where S is the input alphabet, Q is the set of states, I ✓ Q is the set of initial
states, F ✓ Q is the set of final (or accepting) states, and d ✓ Q ⇥ S ⇥ Q is the
transition relation; elements of d are called transitions or transition rules, and
a
are written as q ! q0 . An automaton is often drawn as follows:
a
a, b transition
a a, b
initial state
a, b
an automaton
accepting state
The automaton in the picture above has only one initial and one final state, but
this is not required. We extend the notation for transition rules to arbitrary
w
words w 2 S⇤ : we write p ! q if there is a run over w that begins in state p and
ends in state q; that is, a path in the automaton which goes from state p to state
q, and such that the labels of the edges on the path (that is, the transitions used
in the run) are a1 , . . . , an where w = a1 a2 . . . an . A word w is accepted if there is
a run over w from some initial state to some accepting state. For example, the
automaton in the picture above accepts exactly the odd-length words generated
by our example regular expression above. The language recognized by A, denoted
by L(A), is the set of words accepted by A. Two automata are equivalent if they
recognize the same languages.
We often implicitly assume that non-deterministic automata have only one
initial state. This can be done without loss of generality, because for each finite
automaton one can construct an equivalent automaton (of the same size) with a
single initial state. For automata with a single initial state q I we use the notation
(S, Q, q I , d, F ).
An automaton is called deterministic if it has one initial state and its transition
relation is a function from Q ⇥ S to Q, which means that for every q 2 Q and
R E G U L A R E X P R E S S I O N S A N D F I N I T E A U T O M ATA 9
a
a 2 S, there is exactly one state p such that q ! p. Determinism guarantees
that for every word there is exactly one run, and thus a word is accepted if and
only if this unique run ends in an accepting state. For each non-deterministic
automaton one can construct an equivalent deterministic automaton, but the
number of states may grow exponentially.
( L⇤ M⇤ )⇤ = ( L [ M )⇤ .
generates all words over the alphabet {0, 1}⇤ where both 0 and 1 appear an even
number of times.
Problem 9. Construct an automaton over the alphabet {0, 1}, which recognizes
those words, where the number of ones on even-numbered positions is even,
and the number of ones on odd-numbered positions is odd.
Problem 10. addition. Consider the alphabet {0, 1}3 , with letters written as
columns. Give a regular expression defining the language
82 3 2 3 2 3 9
< a1
> a2 an >
=
6 76 7 6 7
4 b1 5 4 b2 5 . . . 4 bn 5 : [ a1 a2 . . . an ]2 + [b1 b2 . . . bn ]2 = [c1 c2 . . . cn ]2 .
>
: c >
;
1 c2 cn
(2) Do the same, but with a reverse representation, where the least significant
digit comes first.
(1) Prove that a language L ✓ { a}⇤ is regular if and only if the set of natural
numbers {n : an 2 L} is semi-linear in the sense of Problem 3.
(2) Prove that for an arbitrary set X ✓ { a}⇤ , the language X ⇤ is regular.
(1) Prove that for every regular language L, the set {|w| : w 2 L} is semi-
linear. In particular, regular languages over one-letter alphabets corre-
spond to semi-linear sets via the bijection w 7! |w|.
Problem 14. Prove that for all a, b, k, r 2 N, the following language is regular:
Problem 15. Prove that the following languages are not regular:
(1) { an bn : n 2 N };
n n o
(2) a2 : n 2 N ;
(4) ai b j : gcd(i, j) = 1 ;
THE PUMPING LEMMA 11
(5) { am bn : m 6= n};
Problem 16. Prove that the language of palindromes over an alphabet with at
least two letters is not regular.
Problem 17. A regular expression over an alphabet S can be seen as a word over
the alphabet S [ ∆, #, +, ⇤, ( , ) . Prove that the set of regular expressions over
an alphabet S is not a regular language.
Problem 19. Prove a slightly stronger version of the pumping lemma: if a lan-
guage L is regular, then there exists a constant N such that for any words v, w, u
satisfying |w| N and vwu 2 L, there exist words x, y, z such that w = xyz,
0 < |y| N, and vxyn zu 2 L for all n 2 N. Exhibit a language which satisfies
the claim of this stronger lemma, but is not regular.
hint: Consider the language
{w 2 { a, b}⇤ : #a (u) > 2017 · #b (u) for each non-empty prefix u of w}?
w = z z R or w = z s z R ,
where z is obtained from z by flipping the bits. For instance, 0010011 and 0011
are antipalindromes, and the empty word or a single letter are not. Let L be the
set of binary words which do not contain as a subword any antipalindrome of
length greater than 3 whose first letter is 0. Is L a regular language?
12 REGULAR LANGUAGES
h ( a1 a2 . . . a n ) = h ( a1 ) h ( a2 ) . . . h ( a n );
Problem 23. Prove that for each regular language L ✓ S⇤ and each set X ✓ S⇤
the following languages, known as left and right quotients of L, are regular:
1 1
X L = {w : 9v 2 X. vw 2 L} , LX = {w : 9u 2 X. wu 2 L} .
Cycle( L) = {vu : uv 2 L} .
Does the fact that Cycle( L) is regular imply that L is regular as well?
C L O S U R E P R O P E RT I E S 13
Problem 26. Let L be a regular language over the alphabet {0, 1}. Prove regular-
ity of the language Lmin of words w 2 L which are minimal in the lexicographic
order among words of length |w| in L.
Problem 27. Let us assign to each non-empty binary word w 2 {0, 1}+ the
number 0.w in [0, 1) defined as
1 1 1
0.w = w[1] · + w[2] · 2 + . . . + w[|w|] · |w| .
2 2 2
For a real number r 2 [0, 1], let
Lr = {w : 0.w r } .
Problem 28. Give an example of an infinite language closed under infixes that
does not contain an infinite regular language as a subset.
Problem 29. Let L be a regular language. Prove that the following languages are
also regular:
1
L = {w : 9u. |u| = |w| ^ wu 2 L} ,
2
p
L = {w : ww 2 L} .
Problem 30. Let L be a regular language. Prove that the following languages are
also regular:
Root( L) = {w : 9n 2 N. wn 2 L} ,
n o
Sqrt( L) = w : 9u. |u| = |w|2 ^ wu 2 L ,
n o
Log( L) = w : 9u. |u| = 2|w| ^ wu 2 L ,
n o
Fibb( L) = w : 9u. |u| = F|w| ^ wu 2 L ,
Problem 31. nProve that foroevery regular language L, the following language is
also regular w : w|w| 2 L .
L = i 1 i 2 . . . i k : L i1 L i2 . . . L i k ✓ L .
Problem 33. Let PalS denote the set of palindromes over the alphabet S of length
⇤
at least 2. Prove that the language PalS is regular if and only if |S| = 1. Is the
⇤
language uuR : u 2 (0 + 1)⇤ regular?
• reachable, i.e., every state is reachable from the initial state, and
vw 2 L , v0 w 2 L for all w 2 S⇤ .
Problem 44. Construct the minimal deterministic automaton for the language
L = ai bn a j : n > 0, i + j is even .
Problem 45. Construct the minimal deterministic automaton for the language
L = w 2 {0, 1, 2, 3, 4}+ : maxi,j |w[i ] w[ j]| 2 .
Problem 46. For k 2 N let Lk ✓ {0, 1}⇤ be the language of words where each
infix of length k contains exactly two 1’s and each infix of length at most k
contains at most two 1’s. Describe minimal deterministic automata recognizing
Lk for k 2 {3, 4}.
M I N I M A L A U T O M ATA 17
Problem 47. For n > 0 let NPaln be the set of words over {0, 1, . . . , n 1} that
contain no non-trivial palindrome as an infix. How many states does the mini-
mal deterministic automaton for NPaln have?
Problem 48. Football teams A, B, and C compete against each other according
to the following rule: the winner of the previous match plays against the team
that did not participate in it. Assuming that there are no draws, consider the
language over the alphabet { A, B, C } of possible sequences of winners. Prove
that it is a regular language and describe its minimal deterministic automaton.
Problem 49. Mr. X owns stock of three different companies: A, B, and C. Every
day, he checks the relative values of his stocks and orders them from the most
to the least valuable (we assume that the values of two stocks can never be the
same). Mr. X decided to sell stock of a company if it ever goes down in the order
for two days in a row. For example, if the stock record in three consecutive days
is CBA, ACB, ABC, then Mr. X will sell C. Let S be the set of all permutations
of { A, B, C }. Prove that the language L of words over S that describes those
stock records that do not lead to the sale of any stock owned by Mr. X is regular.
Calculate the number of states in the minimal deterministic automaton for L.
Problem 50. (⇤) Mr. X decided that every day he will work or not, following
the rule that in any seven consecutive days there are at most four working days.
A calendar of n consecutive days can be expressed as a word of length n over
alphabet {0, 1}; where 1 means a working day and 0 means a day off. Prove
that the set of all words describing valid calendars forms a regular language
over the alphabet {0, 1}. Describe the minimal deterministic automaton for this
language.
Problem 51. Consider a vending machine that accepts coins in two currencies,
eur and pln with 1 eur = 4 pln, and works as follows:
• The machine accepts 1 pln or 1 eur; in the latter case the machine gives
back 3 pln if they are available. If the machine cannot give back change,
then it signals an error.
The log of the machine is a sequence of inserted coins. The log is correct if
there was no error signal emitted by the machine while it worked. Construct the
minimal deterministic automaton over the alphabet { eur, pln } recognizing the
language of correct logs.
Problem 52. Let L be a language over the alphabet { a, b} that contains the empty
word and all words starting with a that do not contain as an infix any palindrome
of length strictly greater than 3. Draw the minimal deterministic automaton
recognizing L. How many words are there in L?
Problem 53. Let L be the language of all words over the alphabet {0, 1} that do
not contain an anti-palindrome of length strictly greater than 2 (cf. Problem 21).
Draw the minimal deterministic automaton for L.
Problem 54. Automata with #-transitions are an extension of ordinary finite au-
#
tomata that allows transition rules of the form p ! q; using this rule in a run
amounts to changing the state from p to q without advancing in the input word.
Show that for each automaton with #-transitions there exists an automaton with-
out #-transitions recognizing the same language.
next state d(q, a) while additionally outputting the letter g(q, a). That is, if
q0 , q1 , . . . , qn is the sequence of states constituting the unique run on an input
word w = a1 a2 · · · an 2 S⇤ , then the machine outputs the following n-letter word
over D:
b( w ) = g ( q 0 , a 1 ) g ( q 1 , a 2 ) · · · g ( q n
g 1 , a n ).
machine.
(1) if f 1 : S1⇤ ! S2⇤ and f 2 : S2⇤ ! S3⇤ are Mealy functions, then their composi-
tion f 2 f 1 is a Mealy function too;
Moore machines are defined like Mealy machines, except that the output func-
tion has the form g : Q ! D and the word produced along the run q0 , q1 , . . . , qn
over a word w = a1 a2 . . . an is
b( w ) = g ( q 1 ) g ( q 2 ) . . . g ( q n ) .
g
Problem 57. Show that the class of functions realized by Moore machines coin-
cides with the class of Mealy functions.
20 REGULAR LANGUAGES
Two-way automata are similar to ordinary finite automata, except that when
reading the input word they can go either left or right; that is, their transition
relation d is of the form
d ✓ Q ⇥ (S [ {., /}) ⇥ { , !} ⇥ Q.
Note that the input word is decorated with markers . and / indicating the left
and the right endpoint of the word, respectively; that is, we run the automata
on words of the form .w/ for w 2 S⇤ . A configuration of the automaton is a
pair (q, i ), where q is a state and i is a position in the word .w/. The automaton
can move from configuration (q, i ) to configuration (q0 , i0 ) if either i0 = i + 1,
(q, ai , !,q0 ) 2 d, and ai 6= / or i0 = i 1, (q, ai , , q0 ) 2 d, and ai 6= .,
where ai is the ith letter of .w/. A an accepting run on .w/ is a sequence
(q0 , i0 ), (q1 , i1 ), . . . , (qk , ik ) of configurations like above such that
• for all j < k the automaton can move from (q j , i j ) to (q j+1 , i j+1 ), and
• qk is accepting.
Thus, the automaton stops and accepts immediately upon reaching an accepting
state, regardless of the current position in the word. The recognized language is
the set of words w such that there is an accepting run on .w/.
Problem 58. Show that for each two-way automaton there exists an ordinary
finite automaton recognizing the same language.
Problem 60. Let k be a positive integer. Prove that every non-deterministic au-
tomaton recognizing the language
Problem 62. Show that for all words u, v such that |u| < |v| = n there exists a
deterministic automaton with O(log n) states that accepts u and rejects v.
hint: Use the fact that lcm(1, 2, . . . , `) 2` for ` 7 (M. Nair, 1982).1
a b b b b b b
a a a a
Bn ... a a a
q1 q2 q3 q4 qn 2 qn 1 qn
(1) Prove that the minimal deterministic automaton recognizing L(An ) has
2n 1 states.
(2) (⇤) Prove that the minimal deterministic automaton recognizing L(Bn )
has 2n states.
Prove that every deterministic automaton recognizing the reverse of the lan-
guage recognized by An has at least 2n states.
hint: The group of permutations of a finite set X is generated by any single cycle
shifting all elements of X and any transposition of two consecutive elements on the
cycle.
Problem 66. For every n, find languages K1n , K2n , . . . , Knn over an alphabet Sn such
that
• each language Kin can be recognized by a deterministic automaton with at
most C states, for some constant C independent of n;
Problem 67.
(1) Prove that for every non-deterministic automaton with n states there exists
a regular expression of length 2O(n) that recognizes the same language.
A L G O R I T H M S O N A U T O M ATA 23
(2) (⇤) Find an example showing that for a deterministic automaton with n
states, the shortest regular expression recognizing its language may have
length as high as 2W(n) .
Problem 70. Design an algorithm which, for a regular expression b and a word
w 2 S⇤ , decides whether b generates w and works in time
Problem 71. Consider generalized regular expressions, which additionally use op-
erators \ and . Design an algorithm that, given a generalized regular expres-
sion b and a word w, decides in polynomial time whether w 2 L( b).
Problem 73. Design an algorithm that, given two deterministic automata over a
fixed alphabet S, of sizes N1 and N2 respectively, decides in time O( N1 · N2 ) if
they recognize the same language.2 The constants hidden in the O-notation may
depend on S.
0 1 2 3 n 3 n 2 n 1
a, b
(2) Design an algorithm that given an automaton with n states over a fixed-
size alphabet, decides in polynomial time whether there exists a synchro-
nizing word for it. If so, the algorithm should output such a word of
length O(n3 ).
(3) (⇤) Find a shortest synchronizing word for the automaton above.3
Problem 76. Design algorithms solving the following two problems (ignoring
complexity issues).
(1) Given a finite automaton A and a finite alphabet S, verify whether for all
words w accepted by A, for all a, b 2 S, #a (w) = #b (w).
2 An intricate algorithm by Hopcroft (1976) solves this problem in time O( N log N ), where N =
N1 + N2 .
3 The Černý conjecture, a 40-years-old open problem, states that if there is a synchronizing word
for an automaton with n states, then there is one of length at most (n 1)2 .
STRINGOLOGY 25
(2) (⇤) Given a finite automaton A and a finite alphabet S, verify whether
for all but finitely many words w accepted by A, for all different a, b 2 S,
# a ( w ) 6 = #b ( w ) .
Problem 77. For a fixed deterministic automaton A design a dynamic data struc-
ture for a word w that can be build in time O(|w|) and enables the following
operations:
2.8 Stringology
Problem 79. For a given set of words w1 , w2 , . . . , wn over the alphabet S, con-
struct a deterministic automaton with at most Âin=1 |wi | states, recognizing the
language S⇤ (w1 + w2 + · · · + wn ).
Problem 80. (⇤) Let w 2 S⇤ be a word of length n > 0 and let A be the minimal
deterministic automaton recognizing the language S⇤ w.
(2) Show that in A all but at most 2n transitions go to the initial state.
(3) Show that A can be computed in time O(n), provided that the description
does not list transitions leading to the initial state.
(2) Show that in A all except at most 3n transitions go to the sink state.
For simplicity, omit the sink state. How many states are needed for the analogu-
ous automata for the words ab(ba)n , n 2 N, including the sink state?
3
Context-free languages
X ! a1 . . . a n
S ! a1 ! . . . ! an = w.
One can also present derivations as trees. A derivation tree for a word w in
L( G ) is a tree with nodes labelled by terminal or non-terminal symbols, or the
empty word #. It must satisfy the following conditions:
(1) the set of words over the alphabet { a, b} containing the same number of
occurrences of a and b;
(2) the set of words over the alphabet { a, b} containing twice as many occur-
rences of a as occurrences of b;
(3) the set of words over the alphabet { a, b} of even length where the num-
ber of occurrences of b in even positions is the same as the number of
occurrences of b in odd positions.
(1) the set of fully parenthesized arithmetical expressions over the alphabet
{0, 1, (, ), +, ·} that evaluate to 2 under the standard interpretation of the
symbols in the alphabet;
C O N T E X T- F R E E G R A M M A R S 29
(2) the set of arithmetical expressions in the reverse Polish notation, over the
alphabet {0, 1, +, ·}, that evaluate to 4.
(1) the set of propositional formulas with one variable p, and constants true,
false (the alphabet is { p, true, false, ^, _, ¬, ( , )});
(2) the set of formulas from the previous item that evaluate to true under
every valuation of p.
(3) ai b j c p d q : i + j = p + q .
Problem 88. Prove that the set of palindromes over a fixed alphabet, as well as
the complement thereof, are context-free languages.
Similarly to the pumping lemma for regular languages (cf. Section 2.2), there
is a pumping lemma which can be used to prove that a given language is not
context-free. There are several variants of this lemma. The basic pumping lemma
for context-free languages says that for each context-free language L there exists
a constant N with the following property: every word w 2 L of length at least N
can be decomposed as
• the word wk = prefix · leftk · infix · rightk · suffix belongs to the language L,
for all k 2 N.
A stronger variant is the so-called Ogden’s lemma, which talks about words
with a distinguished set of marked positions. It says that for each context-free
language L, there exists a constant N with the following property: every word
w 2 L with at least N marked positions (in particular, |w| N) can be decom-
posed as
• the word wk = prefix · leftk · infix · rightk · suffix belongs to the language L,
for all k 2 N.
32 C O N T E X T- F R E E L A N G U A G E S
Notice that marking positions induces a tradeoff: on the one hand, we can guar-
antee that some position in left, right is marked, but, on the other hand, we know
that the length of left · infix · right is bounded by N only with respect to the
number of marked positions. The basic pumping lemma is a special case of Og-
den’s lemma with all positions of w marked. For some languages that cannot be
proved not to be context-free using the pumping lemma, Ogden’s lemma can be
helpful.
In determining whether a given language is context-free it is often useful
that the class of context-free languages is closed under homomorphic images,
finite unions, and intersections with regular languages.
Problem 101. Prove that the following languages are not context-free:
n o
(1) L1 = ai b j ak : j = max{i, k} ;
n o
(2) L2 = ai bi c k : k 6 = i .
n o
Problem 102. Prove that L = ai b j ck : i 6= j, i 6= k, j 6= k is not context-free. Is
its complement context-free?
Problem 104. Let L = {ww : w 2 S⇤ }. Prove that L is context-free if, and only
if, S contains at most one letter, and that its complement is always context-free,
regardless of the cardinality of S.
n 105. Prove
Problem o that for every k 2 N, the complement of the language
Lk = wk : w 2 S⇤ is context-free.
Problem 106. Let the alphabet S = { a, b, $} contain three distinct letters. Prove
that the language L = {w$w : w 2 { a, b}⇤ } is not context-free, but its comple-
ment is.
C O N T E X T- F R E E O R N O T ? 33
(1) Prove that the set of tautologies over a fixed finite set of propositional vari-
ables V, interpreted as words over the alphabet V [ {false, true, _, ^, ¬, ( , )},
is context-free (cf. Problem 85 (2)).
34 C O N T E X T- F R E E L A N G U A G E S
(2) The set of formulas over a countable set of variables can be represented as
the language over the alphabet
{false, true, x, 1, 0, _, ^, ¬, ( , )}
F ! true false V ( F _ F ) ( F ^ F ) (¬ F ),
V ! x1 V0 V1.
Problem 116. Consider the ordered alphabet {0, 1} with 0 < 1. A word w is
primitive if there is no base u and exponent n > 1 such that w = un . A word w1
is a cyclic permutation of a word w2 if there exist words u, v such that w1 = uv
and w2 = vu. A Lyndon word is a primitive word which is lexicographically
the smallest among all its cyclic permutations. Is the set of all Lyndon words
context-free?
Problem 118. Show that for each linear context-free language L there exists a
constant N such that each word w 2 L of length at least N can be factorized as
in such a way that left · right 6= #, |prefix · left| N, |right · suffix| N, and for all
k 2 N, the word wk = prefix · leftk · infix · rightk · suffix belongs to L.
Problem 120. Show that the set of those words w over the alphabet { a, b} which
have the same number of a’s and b’s is not a linear context-free language.
(1) uvR : uv 2 D1 ,
(2) uvR : uv 2 D2 .
A = (S, G, Q, q I , Z I , d, F ),
accepting states. All of the above sets are required to be finite. We write transi-
tion rule (q, a, Z, q0 , g) 2 d as
q, a, Z !A q0 , g.
It tells the automaton to first pop the symbol Z from the stack, and then push
the sequence g.
A configuration of the pushdown automaton is a triple (q, w, g), where q 2 Q
is the current state, w 2 S⇤ is the word that remains to be read, and g 2 G⇤ is the
stack content (where the first letter is the symbol on the top of the stack, etc.).
Initial configurations are of the form (q I , w, Z I ); that is, the state is initial, and
the stack contains only the initial symbol. Final configurations are of the form
(q, #, g); that is, the whole input word is already read.
The following relation `A on configurations reflects a single step of the au-
tomaton: we let
L(A) = {w 2 S⇤ : (q I , w, Z I ) `A
⇤
(q F , #, g) for some q F 2 F, g 2 G⇤ } .
(3) words containing two times more a’s than b’s (Problem 83 (2)),
Problem 128. (⇤) Prove that for every pushdown automaton one can construct
an equivalent pushdown automaton with two states only.
Problem 129. Prove that for each pushdown automaton one can construct an
equivalent automaton (with the same states) that in each transition replaces the
topmost stack symbol with at most two stack symbols.
Problem 130. (⇤) Prove that for each pushdown automaton one can construct
an equivalent pushdown automaton that has only push rules and pop rules; that
is, only rules of the form
q, a, Z ! q0 , YZ and q, a, Z ! q0 , #.
Can one limit the number of states for such automata as well?
(4) LR = wR : w 2 L ,
Problem 132.
(1) L \ K ,
(2) LK 1,
(3) K 1L .
Problem 134. Let max(w), min(w), med(w) denote, respectively, the maximum,
the minimum, and the median of the numbers #a (w), #b (w), #c (w). Determine
which of the following languages are regular, and which are context-free:
Problem 135. (⇤) Prove that for every pushdown automaton A there exists a
constant C such that for every word w 2 L(A) there exists an accepting compu-
tation of length at most C |w|.
P R O P E RT I E S O F C O N T E X T- F R E E L A N G U A G E S 39
Problem 136. (⇤) Prove that for each pushdown automaton A the set of words
that are the possible contents of the stack in computations of A is regular. Then,
deduce that the set of words that are the possible contents of the stack in accepting
computations of A is also regular.
• for each symbol a 2 S [ {#} there is at most one transition of the form
p, a, Z ! q, g, and
Problem 138. Prove that the set of palindromes over the alphabet { a, b} cannot
be recognized by a deterministic pushdown automaton.
Problem 139. Give an example of a context-free language L such that the lan-
guage 12 L = { x : 9y. | x | = |y| ^ xy 2 L} is not context-free.
(1) Prove that the shuffle of a context-free language and a regular language is
context-free.
Problem 141. Recall the notion of shuffle closure defined in Problem 41. Con-
struct a finite language whose shuffle closure is not context-free.
Problem 142. Prove that if X and Y are regular languages then the language
S n n
n2N X \ Y is context-free, but need not be regular.
Problem 143. Give an example of regular languages X, Y, Z such that the lan-
S
guage n2N X n \ Y n \ Z n is not context-free.
Problem 144. Give an example of a context-free language L such that the lan-
p
guage L = {w : ww 2 L} is not context-free.
Problem 149. Show that for every pair of morphisms, h1 and h2 , the languages
xyR : h1 ( x ) = h2 (y) and xyR : h1 ( x ) 6= h2 (y) are both linear context-free.
Problem 150. Show that the class of linear context-free languages is closed under
intersections with regular languages.
Problem 151. For a given language L, let min( L) be the language of words from
L that are minimal in the prefix order; that is, u 2 min( L) if and only if u 2 L and
no strict prefix of u belongs to L. Prove that if L is a deterministic context-free
language, then so is min( L).
n o
Problem 152. Show that for L = ai b j ck : k i or k j the language min( L),
defined in Problem 151, is not context-free.
P R O P E RT I E S O F C O N T E X T- F R E E L A N G U A G E S 41
Problem 154. The Hamming distance between two words of the same length
is the number of positions at which they differ. Prove that for every regular
|v|
language L, the set M of words v at a distance at most 2 from a word of length
|v| in L is context-free. Is it always regular?
Problem 156. Prove that each context-free language over a one-letter alphabet is
regular.
n o
Problem 157. Prove that if L is a context-free language, then a|w| : w 2 L is a
regular language.
Problem 158. A language has the prefix property if for every two words from that
language, one is a prefix of the other. Show that if a context-free language has
the prefix property then it is a regular language.
Problem 159.
(1) Is there a language L over a finite alphabet such that neither L nor the
complement of L contain an infinite regular language?
d ✓ Q⇥T⇥Q⇥T⇥{ , , !}.
A transition rule (q, a, p, b, d) 2 d applies in state q if the machine’s head sees the
tape letter a in its current cell. The rule allows the machine to overwrite a with
b, change the state to p, and either keep the head over the same cell or move it
left or right, depending on d.
A configuration of the machine specifies the tape contents, the machine’s state,
and the position of the head: for w, v 2 T ⇤ and q 2 Q, the configuration wqv
describes the situation where the tape contains the word wv with all remaining
44 T H E O RY O F C O M P U TAT I O N
cells empty (that is, containing the blank symbol b), the state is q, and the head
is placed over the cell containing the first symbol of the word v.
Given an input word w, the machine starts its computation in its initial state
q0 , with its head over the first (left-most) symbol of w. The initial configuration
is thus q0 w. The machine’s run consists of applications of transition rules, which
yield a finite or infinite sequence of consecutive configurations. A configuration
wqv is accepting if the state q is accepting; a run is accepting if it is finite and
its last configuration is accepting. The language L( M) recognized by a machine
M consists of all those words w 2 S⇤ for which M has an accepting run start-
ing in the initial configuration q0 w. Two Turing machines are equivalent if they
recognize the same language.
Unless stated otherwise, we assume that the tape is infinite in both direc-
tions; however, one could also consider a model with right-infinite tape, where
there are no tape cells to the left of the initial position of the head. The two
models are computationally equivalent (see Problem 163).
Transition rules of a machine with k tapes are in the following format:
d ✓ Q ⇥ Tk ⇥ Q ⇥ Tk ⇥ { , , !}k .
Thus such a machine has k heads, moving independently, but a common state is
used to determine their moves. A machine with any constant number of tapes
can be simulated by a machine with a single tape. Therefore, the 1-tape model
is computationally equivalent to the k-tape one, for all k.
The Turing machines discussed so far are non-deterministic. A machine is
deterministic if its transition relation d satisfies the following condition: for every
state q and tape symbol a, there is at most one state p, tape symbol b and di-
rection t such that (q, a, p, b, t) 2 d (equivalently, there is exactly one p, b and t).
Thus, in every state q and for every tape symbol a, a deterministic machine has
at most one possible transition rule to apply.
Problem 160. Construct Turing machines over the input alphabet {0, 1} recog-
nizing the following languages:
(2) palindromes;
Problem 162. We say that a deterministic Turing machine over input alphabet
{ a} computes a function f : N ! N in unary representation if, for each n 0,
the computation of the machine starting in the initial configuration q0 an termi-
nates in the configuration q f a f (n) , where q f is a distinguished final state.
Problem 163. Prove that every Turing machine is equivalent to a machine with a
right-infinite tape; that is, a tape that has no cells to the left of the initial position
of the head.
Problem 165. Given two deterministic Turing machines M1 and M2 over the al-
phabet S, construct deterministic machines recognizing the following languages:
(1) L( M1 ) [ L( M2 );
(2) L( M1 ) \ L( M2 );
(3) L( M1 ) L( M2 );
46 T H E O RY O F C O M P U TAT I O N
(4) L( M1 )⇤ .
Problem 166. Prove that every Turing machine M over the input alphabet {0, 1}
is equivalent to a Turing machine M0 with tape alphabet {0, 1, b } which never
writes the blank symbol b.
(2) (⇤) Prove that 1-tape write-once Turing machines only recognize regular
languages.
Problem 168. (⇤) Prove that a deterministic 1-tape Turing machine that makes
O(n) steps on each input of length n recognizes a regular language.
but the transition ci can be executed only if the current value of ci is strictly
positive. That is, counter values are not allowed to drop below 0.
Problem 172. Turing machines over a one-letter input alphabet can be viewed as
acceptors of sets of natural numbers, written in unary notation. Prove that such
Turing machines are equivalent to a 3-counter automata.
A machine halts on an input word w if it has no infinite run starting from the
initial configuration q0 w. A language L ✓ S⇤ such that L = L( M) for some Tur-
ing machine M that may or may not halt on all input words is called recursively
enumerable or semidecidable. Problem 173 below justifies the common use of both
these rather different names: it implies that a language L is accepted by a Turing
machine if and only if there exists a (different) Turing machine that outputs all
words in L one by one. A machine that halts on all inputs is called total. If
L = L( M) for a total machine M, then L is called decidable. Unsurprisingly, a
language that is not decidable is called undecidable.
48 T H E O RY O F C O M P U TAT I O N
Problem 174. Prove that a set L ✓ N, treated as a language L over the alphabet
{0, 1} via the standard binary representation of natural numbers, is decidable if
and only if it is finite or it is the image of some strictly increasing computable
function f : N ! N.
Problem 175. Prove the following Turing–Post theorem: if a language and its
complement are both recursively enumerable, then they are both decidable.
Problem 176. (⇤) Prove that there exists a recursively enumerable set whose
complement is infinite but does not contain any infinite, recursively enumerable
subset.
hint: Construct the set by choosing, for every Turing machine M that accepts an infinite
language, a single word accepted by M. Choose wisely, so that the complement of your
set remains infinite.
Problem 177. Recall the definition of a k-counter automaton from Problem 172.
Prove that there exists a 2-counter automaton A such that it is undecidable
whether A halts on a given input number n.
Problem 178. (⇤) Prove that the following Post problem is undecidable: Given
two lists of words u1 , . . . , un 2 S⇤ and w1 , . . . , wn 2 S⇤ , is there a sequence of
indices i1 , . . . , im 2 {1, . . . , n} such that
u i1 . . . u i m = wi1 . . . wi m ?
If such a sequence exists then the word ui1 . . . uin (equal to wi1 . . . win ) is called a
solution of the instance of the Post problem.
hint: As an intermediate step, use a modification where i1 has to be 1.
50 T H E O RY O F C O M P U TAT I O N
Problem 179. Prove that the following universality problem is undecidable: given
a context-free grammar G over an alphabet S, is it the case that
L( G ) = S⇤ ?
L( G1 ) \ L( G2 ) = ∆ ?
Problem 181. Assume that we know that a 1-tape, deterministic Turing machine
M makes at most one “turn”; that is, once the head moves to the left it never
moves to the right again. Is it decidable whether a given machine M with this
property accepts a given word w?
(2) Prove that the function C can be approximated in the following sense:
there is a Turing machine that, for input ([ M ], v), produces an infinite
sequence of numbers that eventually stabilizes at the value C ([ M ], v).
Note that there is no contradiction between (a) and (b), since an observer of an
infinite sequence can never say whether it has already stabilized.
CHOMSKY HIERARCHY 51
Context-sensitive grammars are defined like context-free grammars, with the ex-
ception that production rules are of the form:
aXb ! agb,
Problem 184. A monotonic grammar has rules of the form a ! b, where | b| |a|.
Prove that each monotonic grammar can be transformed into a context-sensitive
grammar (possibly using different non-terminal symbols).
Problem 185. Prove that context-sensitive languages are exactly the languages
recognized by non-deterministic linear bounded automata, that is, 1-tape Turing
machines which are only allowed to use the cells of the tape that are initially
occupied by the input word (we assume that the last letter of the input word is
marked).1
1 One can equivalently take Turing machines using O(n) space.
52 T H E O RY O F C O M P U TAT I O N
Problem 187. Prove that recursively enumerable languages are closed under
quotients by recursively enumerable languages.
Problem 188. Which of the four classes of the Chomsky Hierarchy are closed
under union, intersection and complement?
transition is executed, the symbol g is written to the output. The machine has to
accept and the word composed of consecutive output letters has to be equal to
f ( w ).
Problem 192. Prove that the classes p, np, and pspace are closed under Kleene’s
star in the following sense: if a language L belongs to the class, then so does the
language L⇤ .
Problem 193. Show that if the class l is closed under Kleene’s star, then l = nl.
Problem 194. In the Exact Set Cover problem one is given a finite universe
U and a family F of subsets of U. The task is to determine whether there is a
subfamily of F consisting of pairwise disjoint sets whose union is equal to U.
Prove that this problem is np-complete.
Problem 195. In the Subset Sum problem one is given a set S of non-negative
integers and a target non-negative integer t, all encoded in binary. The task is
to verify whether there exists a subset of S such that the sum of elements of the
subset is equal to t. Prove that this problem is np-complete.
2 Note that dlog2 ne is not well defined for n = 0.
54 T H E O RY O F C O M P U TAT I O N
Problem 196. In the Tiling problem we are given a set of square tiles S ✓
{0, 1, . . . , n}4 and an integer N, represented in unary. Each tile x is represented
as a quadruple of integers x = ( x [ ], x ["], x [!], x [#]) from the range between
0 and n, which we will treat as colours of the respective sides of the tile. A
tiling of an N ⇥ N square is called proper if it consists of N 2 tiles from S aligned
side-to-side, and the following two conditions are satisfied:
• If two tiles share a side, the colours of the corresponding sides of the tiles
match.
The tiles cannot be rotated. The question is whether such a proper tiling exists.
Prove that this problem is np-complete.