TOC Question Bank
TOC Question Bank
TOC Question Bank
Homework
CS 341 Homework 1
Basic Techniques
1. What are these sets? Write them using braces, commas, numerals, (for infinite sets), and only.
(a) ({1, 3, 5} {3, 1}) {3, 5, 7}
(b) {{3}, {3, 5}, {{5, 7}, {7, 9}}}
(c) ({1, 2, 5} - {5, 7, 9}) ({5, 7, 9} - {1, 2, 5})
(d) 2{7, 8, 9} - 2{7, 9}
(e) 2
(f) {x : y N where x = y2}
(g) {x : x is an integer and x2 = 2}
Basic Techniques
(b) LessThanOrEqual defined on ordered pairs of natural numbers. (a, b) (x, y) iff a x or (a = x and
b y). For example, (1,2) (2,1) and (1,2) (1,3).
(c) The relation defined by the following boolean matrix:
1
1
1
1
1
1
1
1
10. Are the following sets closed under the following operations? If not, what are the respective closures?
(a) The odd integers under multiplication.
(b) The positive integers under division.
(c) The negative integers under subtraction.
(d) The negative integers under multiplication.
(e) The odd length strings under concatenation.
11. What is the reflexive transitive closure R* of the relation
R = {(a, b), (a, c), (a, d), (d, c), (d, e)} Draw a directed graph representing R*.
12. For each of the following relations R, over some domain D, compute the reflexive, symmetric, transitive
closure R. Try to think of a simple descriptive name for the new relation R. Since R must be an equivalence
relation, describe the partition that R induces on D.
(a) Let D be the set of 50 states in the US. xy, xRy iff x shares a boundary with y.
(b) Let D be the natural numbers. xy, xRy iff y = x+3.
(c) Let D be the set of strings containing no symbol except a. xy, xRy iff y = xa. (i.e., if y equals x
concatenated with a).
13. Consider an infinite rectangular grid (like an infinite sheet of graph paper). Let S be the set of intersection
points on the grid. Let each point in S be represented as a pair of (x,y) coordinates where adjacent points differ
in one coordinate by exactly 1 and coordinates increase (as is standard) as you move up and to the right.
(a) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff x2= x1+1 and y2=y1+1. Let R be
the reflexive, symmetric, transitive closure of R. Describe in English the partition P that R induces on S. What
is the cardinality of P?
(b) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff (x2= x1+1 and y2=y1+1) or (x2= x11 and y2=y1+1). Let R be the reflexive, symmetric, transitive closure of R. Describe in English the partition P
that R induces on S. What is the cardinality of P?
(c) Let R be the following relation on S: (x1,y1)(x2,y2), (x1,y1)R(x2,y2) iff (x2,y2) is reachable from (x1,y1) by
moving two squares in any one of the four directions and then one square in a perpendicular direction. Let R
be the reflexive, symmetric, transitive closure of R. Describe in English the partition P that R induces on S.
What is the cardinality of P?
14. Is the transitive closure of the symmetric closure of a binary relation necessarily reflexive? Prove it or give
a counterexample.
15. Give an example of a binary relation that is not reflexive but has a transitive closure that is reflexive.
16. For each of the following functions, state whether or not it is (i) one-to-one, (ii) onto, and (iii) idempotent.
Justify your answers.
(a) +: P P P, where P is the set of positive integers, and
+(a, b) = a + b (In other words, simply addition defined on the positive integers)
(b) X : B B B, where B is the set {True, False}
Homework 1
Basic Techniques
{3, 5}
{3, 5, 7}
{1, 2, 7, 9}
{8}, {7, 8}, {8, 9}, {7, 8, 9}
{}
{0, 1, 4, 9, 25, 36} (the perfect squares)
(since the square root of 2 is not an integer)
2. (a)
A (B C)
= (B C) A
= (B A) (C A)
= (A B) (A C)
commutativity
distributivity
commutativity
(b)
A (B C)
= (B C) A
= (B A) (C A)
= (A B) (A C)
commutativity
distributivity
commutativity
(c)
A (A B)
= (A B) A
=A
commutativity
absorption
3. (a)
(b)
(c)
{(,1), (,2), ({1}, 1), ({1}, 2), ({2}, 1), ({2}, 2), ({1,2}, 1), ({1,2}, 2)}
4.
R R = {(a, a), (a, d), (a, b), (b, b), (b, c), (b, a), (a, c)}
R inverse = {(b, a), (c, a), (d, c), (a, a), (a, b)}
None of R, R R or R inverse is a function.
5. (a) S = {0, 1, 5, 6, 7, }. S has the same number of elements as N. Why? Because there is a bijection
between S and N: f: S N, where f(0) = 0, f(1) = 1, x 5, f(x) = x - 3. So |S| = 0.
(b) 2.
(c) S = all subsets of {a, b, c}. So S = {, {a}, {b}, {c}, {a, b}, {a, c}, {b, c}, {a, b, c}}. So |S| = 8. We
could also simply have used the fact that the cardinality of the power set of a finite set of cardinality c
is 2c.
(d) S = {(a, 1), (a, 2), (a, 3), (a, 4), (b, 1), (b, 2), (b, 3), (b, 4), (c, 1), (c, 2), (c, 3), (c, 4)}. So |S| = 12. Or
we could have used the fact that, for finite sets, |A B| = |A| * |B|.
(e) S = {(a, 0), (a, 1), , (b, 0), (b, 1),} Clearly S contains an infinite number of elements. But are there
the same number of elements in S as in N, or are there more (26 times more, to be precise)? The
answer is that there are the same number. |S| = 0. To prove this, we need a bijection from S to N. We
can define this bijection as an enumeration of the elements of S:
(a, 0), (b, 0), (c, 0),
(First enumerate all 26 elements of S that have 0 as their second element)
Homework 1
Basic Techniques
6. Mother-of:
Not reflexive: Eve is not the mother of Eve (in fact, no one is her own mother).
Not symmetric: mother-of(Eve, Cain), but not Mother-of(Cain, Eve).
Not transitive: Each person has only one mother, so if Mother-of(x, y) and Mother-of(y, z),
the only way to have Mother-of(x, z) would be if x and y are the same person, but we know
that that's not possible since Mother-of(x, y) and no one can be the mother of herself).
Would-recognize-picture-of:
Not symmetric: W-r-p-o(Elaine, Bill Clinton), but not W-r-p-o (Bill Clinton, Elaine)
Not transitive: W r-p-o(Elaine, Bill Clinton) and W r-p-o(Bill Clinton, Bill's mom) but not
W-r-p-o(Elaine, Bill's mom)
Has-ever-been-married-to:
Not reflexive: No one is married to him or herself.
Not transitive: H-e-b-m-t(Dave, Sue) and H-e-b-m-t(Sue, Jeff) but not
H-e-b-m-t(Dave, Jeff)
Ancestor-of: Not reflexive: not Ancestor-of(Eve, Eve) (in fact, no one is their own ancestor).
Not symmetric: Ancestor-of(Eve, Cain) but not Ancestor-of(Cain, Eve)
Hangs-out-with: Not transitive: Hangs-out-with(Bill, Monica) and Hangs-out-with(Monica, Linda Tripp),
but not Hangs-out-with(Bill, Linda Tripp).
Less-than-or-equal-to: Not symmetric: 1 2, but not 2 1.
7. Yes, if 2A = 2B, then A must equal B. Suppose it didn't. Then there is some element x that is in one set but
not the other. Call the set x is in A. Then 2A must contain {x}, which must not be in 2B, since x B. This
would mean that 2A 2B, which contradicts our premise.
8. (a)
(b)
(c)
(d)
yes
no, since no element of a partition can be empty.
no, 0 is missing
no, since, each element of the original set S must appear in only one element of a partition of S.
Basic Techniques
contains all the necessary elements. Since it is not possible to derive zero by multiplying two negative
numbers, it must not be in the closure set.
(e) The odd length strings are not closed under concatenation. "a" || "b" = "ab", which is of length 2. The
closure is the set of strings of length 2. Note that strings of length 1 are not included. Why?
11. R* = R {(x, x) : x {a, b, c, d, e}} {(a, e)}
12. (a) The easiest way to start to solve a problem like this is to start writing down the elements of R and see if
a pattern emerges. So we start with the elements of R: {(TX, LA), (LA, TX), (TX, NM), (NM, TX), (LA, Ark),
(Ark, LA), (LA Miss), (Miss, LA) }. To construct R, we first add all elements of the form (x, x), so we add
(TX,TX), and so forth. Then we add the elements required to establish transitivity:
(NM, TX), (TX, LA) (NM, TX)
(TX, LA), (LA, Ark) (TX, Ark)
(NM, TX), (TX, Ark) (NM, Ark), and so forth.
If we continue this process, we will see that the reflexive, symmetric, transitive closure R relates all states
except Alaska and Hawaii to each other and each of them only to themselves. So R can be described as
relating two states if its possible to drive from one to the other without leaving the country. The partition is:
[Alaska]
[Hawaii]
[all other 48 states]
(b) R includes, for example {(0, 3), (3, 6), (6, 9), (9, 12) }. When we compute the transitive closure, we
add, among other things {(0, 6), (0, 9), (0,12)}. Now try this starting with (1,4) and (2, 5). Its clear that x,y,
xRy iff x = y (mod 3). In other words, two numbers are related iff they have the same remainder mod 3. The
partition is:
[0, 3, 6, 9, 12 ]
[1, 4, 7, 10, 13 ]
[2, 5, 8, 11, 14 ]
(c) R relates all strings composed solely of as to each other. So the partition is
[, a, aa, aaa, aaaa, ]
13. (a) Think of two points being related via R if you can get to the second one by starting at the first and
moving up one square and right one square. When we add transitivity, we gain the ability to move diagonally
by two squares, or three, or whatever. So P is an infinite set. Each element of P consists of the set of points
that fall on an infinite diagonal line running from lower left to upper right.
(b) Now we can more upward on either diagonal. And we can move up and right followed by up and left,
and so forth. The one thing we cant do is move directly up or down or right or left exactly one square. So take
any given point. To visualize the points to which it is related under R, imagine a black and white chess board
where the squares correspond to points on our grid. Each point is related to all other points of the same color.
Thus the cardinality of P is 2.
(c) Now every point is related to every other point. The cardinality of P is 1.
14. You might think that for all relations R on some domain D, the transitive closure of the symmetric closure
of R (call it TC(SC(R))) must be reflexive because for any two elements x, y D such that (x, y) R, we'll
have (x, y), (y, z) SC(R) and therefore (x, x), (y, y) TC(SC(R)). This is all true, but does not prove that for
all z D, (z, z) TC(SC(R)). Why not? Suppose there is a z D such that there is no y D for which (y, z)
R or (z, y) R. (If you look at the graph of R, z is an isolated vertex with no edges in or out.) Then (z, z)
TC(SC(R)). So the answer is no, with R = on domain {a} as a simple counterexample: TC(SC(R)) = , yet
it should contain (a, a) if it were reflexive.
15. R = {(a, b), (b, a)} on domain {a, b} does the trick easily.
Homework 1
Basic Techniques
Homework 1
Basic Techniques
CS 341 Homework 2
Strings and Languages
1. Let = {a, b}. Let L1 = {x *: |x| < 4}. Let L2 = {aa, aaa, aaaa}. List the elements in each of the
following languages L:
(a) L3 = L1 L2
(b) L4 = L1 L2
(c) L5 = L1 L4
(d) L6 = L1 - L2
2. Consider the language L = anbncm. Which of the following strings are in L?
(a)
(b) ab
(c) c
(d) aabc
(e) aabbcc
(f) abbcc
3. It probably seems obvious to you that if you reverse a string, the character that was originally first becomes
last. But the definition we've given doesn't say that; it says only that the character that was originally last
becomes first. If we want to be able to use our intuition about what happens to the first character in a proof, we
need to turn it into a theorem. Prove x,a where x is a string and a is a single character, (ax)R = xRa.
4. For each of the following binary functions, state whether or not it is (i) one-to-one, (ii) onto, (iii) idempotent,
(iv) commutative, and (v) associative. Also (vi) state whether or not it has an identity, and, if so, what it is.
Justify your answers.
(a) || : S S S, where S is the set of strings of length 0
||(a, b) = a || b (In other words, simply concatenation defined on strings)
(b) || : L L L where L is a language over some alphabet
||(a, b) = {w *: w = x || y for some x a and y b} In other words, the concatenation of two
languages A and B is the set of strings that can be derived by taking a string from A and then
concatenating onto it a string from B.
5. We can define a unary function F to be self-inverse iff x Domain(F) F(F(x)) = x. The Reverse function
on strings is self-inverse, for example.
(a) Give an example of a self-inverse function on the natural numbers, on sets, and on booleans.
(b) Prove that the Reverse function on strings is self-inverse.
Solutions
1. First we observe that L1 = {, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb}.
(a) L3 = {, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa}
(b) L4 = {aa, aaa}
(c) L5 = every way of selecting one element from L1 followed by one element from L4:
{aa, aaa, baa, aaaa, abaa, baaa, bbaa, aaaaa, aabaa, abaaa, abbaa, baaaa, babaa, bbaaa, bbbaa}
{aaa, aaaa, baaa, aaaaa, abaaa, baaaa, bbaaa, aaaaaa, aabaaa, abaaaa, abbaaa, baaaaa, babaaa,
bbaaaa, bbbaaa}. Note that we've written aa, just to make it clear how this string was derived. It
should actually be written as just aa. Also note that some elements are in both of these sets (i.e.,
there's
more than one way to derive them). Eliminating duplicates (since L is a set and thus does not contain
duplicates), we get:
{aa, aaa, baa, aaaa, abaa, baaa, bbaa, aaaaa, aabaa, abaaa, abbaa, baaaa, babaa, bbaaa, bbbaa, aaaaaa,
aabaaa, abaaaa, abbaaa, baaaaa, babaaa, bbaaaa, bbbaaa}
(d) L6 = every string that is in L1 but not in L2: {, a, b, ab, ba, bb, aab, aba, abb, baa, bab, bba, bbb}.
Homework 2
2. (a)
(b)
(c)
(d)
(e)
(f)
Yes. n = 0 and m = 0.
Yes. n = 1 and m = 0.
Yes. n = 0 and m = 1.
No. There must be equal numbers of a's and b's.
Yes. n = 2 and m = 2.
No. There must be equal numbers of a's and b's.
3. Prove: x,a where x is a string and a is a single character, (ax)R = xRa. We'll use induction on the length of
x. If |x| = 0 (i.e, x = ), then (a)R = a = Ra. Next we show that if this is true for all strings of length n, then it
is true for all strings of length n + 1. Consider any string x of length n + 1. Since |x| > 0, we can rewrite x as yb
for some single character b.
(ax)R = (ayb)R
Rewrite of x as yb
= b(ay)R
Definition of reversal
= b(yRa)
Induction hypothesis (since |x| = n + 1, |y| = n)
= (b yR) a
Associativity of concatenation
= xRa
Definition of reversal: If x = yb then xR = byR
4. (a)
(b)
on
the fact that is an identity for concatenation of strings. Given the way in which we defined
concatenation of languages as the concatenation of strings drawn from the two languages, {} is an
identity for concatenation of languages and thus it enables us to prove that all languages can be derived
from the concatenation operation.
(iii) || is not idempotent. ||({a}, {a}) = {aa}
(iv) || is not commutative. ||({a}, {b}) = {ab}. But ||({b}, {a}) = {ba}.
(v) || is associative.
(vi) || has {} as both a left and right identity.
5. (a)
(b)
Homework 2
CS 341 Homework 3
Languages and Regular Expressions
1. Describe in English, as briefly as possible, each of the following (in other words, describe the language
defined by each regular expression):
(a) L( ((a*a) b) b )
(b) L( (((a*b*)*ab) ((a*b*)*ba))(b a)* )
2. Rewrite each of these regular expressions as a simpler expression representing the same set.
(a) * a* b* (a b)*
(b) ((a*b*)* (b*a*)*)*
(c) (a*b)* (b*a)*
3. Let = {a, b}. Write regular expressions for the following sets:
(a) All strings in * whose number of a's is divisible by three.
(b) All strings in * with no more than three a's.
(c) All strings in * with exactly one occurrence of the substring aaa.
4. Which of the following are true? Prove your answer.
(a) baa a*b*a*b*
(b) b*a* a*b* = a* b*
(c) a*b* c*d* =
(d) abcd (a(cd)*b)*
5. Show that L((a b)*) = L(a* (ba*)*).
6. Consider the following:
(a) ((a b) (ab))*
(b) (a+ anbn)
(c) ((ab)* )
(d) (((ab) c)* (b c*))
(e) (* (bb*))
(i) Which of the above are pure regular expressions?
(ii) For each of the above that is a regular expression, give a simplified equivalent pure regular expression.
(iii) Which of the above represent regular languages?
7. True - False: For all languages L1, L2, and L3
(a) (L1L2)* = L1*L2*
(b) (L1 L2)* = L1* L2*
(c) (L1 L2) L3 = L1 L3 L2 L3
(d) (L1 L2) L3 = (L1 L3) (L2 L3)
(e) L1+)* = L1*
(f) (L1+)+ = L1+
(g) (L1*)+ = (L1+)*
(h) L1* = L1+
(i) (ab)*a = a(ba)*
(j) (a b)* b (a b)* = a* b (a b)*
(k) [(a b)* b (a b)* (a b)* a (a b)*] = (a b)*
(l) [(a b)* b (a b)* (a b)* a (a b)*] = (a b)+
(m) [(a b)* b a (a b)* a*b*] = (a b)*
Homework 3
Any string of a's and/or b's with zero or more a's followed by a single b.
Any string of a's and/or b's with at least one occurrence of ab or ba.
Homework 3
(c) (a*b)* (b*a)* = (a b)* (In other words, all strings over {a, b}.) How do we know that? (a*b)* is the
union of and all strings that end in b. (b*a)* is the union of and all strings that end in a. Clearly any string
over {a, b} must either be empty or it must end in a or b. So we've got them all.
3. (a) The a's must come in groups of three, but of course there can be arbitrary numbers of b's everywhere. So:
(b*ab*ab*a)*b*
Since the first expression has * around it, it can occur 0 or more times, to give us any number of a's
that is divisible by 3.
(b) Another way to think of this is that there are three optional a's and all the b's you want. That gives us:
b* (a ) b* (a ) b* (a ) b*
(c) Another way to think of this is that we need one instance of aaa. All other instances of aa must occur
with
either b or end of string on both sides. The aaa can occur anywhere so we'll plunk it down, then list the
options for everything else twice, once on each side of it:
(ab aab b)*
aaa
(ba baa b)*
4. (a) True. Consider the defining regular expression: a*b*a*b*. To get baa, take no a's, then one b, then two
a's then no b's.
(b) True. We can prove that two sets X and Y are equal by showing that any string in X must also be in Y
and vice versa. First we show that any string in b*a* a*b* (which we'll call X) must also be in a* b*
(which we'll call Y). Any string in X must have two properties: (from b*a*): all b's come before all a's; and
(from a*b*): all a's come before all b's. The only way to have both of these properties simultaneously is to be
composed of only a's or only b's. That's exactly what it takes to be in Y.
Next we must show that every string in Y is in X. Every string in Y is either of the form a* or b*. All strings
of the form a* are in X since we simply take b* to be b0, which gives us a* a* = a*. Similarly for all strings
of the form b*, where we take a* to be a0.
(c) False. Remember that to show that any statements is false it is sufficient to find a single counterexample:
a*b* and c*d*. Thus a*b* c*d* , which is therefore not equal to .
(d) False. There is no way to generate abcd from (a(cd)*b)*. Let's call the language generated by
(a(cd)*b)* L. Notice that every string in L has the property that every instance of (cd)* is immediately
preceded by a. abcd does not possess that property.
5. That the language on the right is included in the language on the left is immediately apparent since every
string in the right-hand language is a string of a's and b's. To show that any string of a's and b's is contained in
the language on the right, we note that any such string begins with zero or more a's. If there are no b's, then the
string is contained in a*. If there is at least one b, we strip off any initial a's as a part of a* and examine the
remainder. If there are no more b's, the remainder is in ba*. If there is at least one more b to the right, then we
strip of the initial b and any following consecutive a's (a string in ba*) and examine the remainder. Repeat the
last two steps until the end of the string is reached. Thus, every string of a's and b's is included in the language
on the right.
6. (i) a, c, e (b contains superscript n; d contains )
(ii) (a) = (a b)*
(c) =
(e) = b*
(iii) a, c, d, e (b is {ambn : m > n}, which is not regular)
7. (a) F, (b) F, (c) T, (d) F, (e) T, (f) T, (g) T, (h) F, (i) T, (j) T, (k) F, (l) T, (m) T, (n) F, (o) F, (p) T (by def. of
+), (q) T, (r) F, (s) T, (t) F, (u) F, (v) T.
Homework 3
Homework 3
CS 341 Homework 4
Deterministic Finite Automata
1. If M is a deterministic finite automaton. Under exactly what circumstances is L(M)?
2. Describe informally the languages accepted by each of the following deterministic FSMs:
(from Elements of the Theory of Computation, H. R. Lewis and C. H. Papdimitriou, Prentice-Hall, 1998.)
Homework 4
a
4
a
b
3
a
a
6
a,b
7. Give a dfa accepting {x {a, b}* : at least one a in x is not immediately followed by b}.
8. Let L = {w {a, b}* : w does not end in ba}.
(a) Construct a dfa accepting L.
(b) Give a regular expression for L.
9. Consider L = {anbn : 0 n 4}
(a) Show that L is regular by giving a dfa that accepts it.
(b) Give a regular expression for L.
10. Construct a deterministic finite state machine to accept strings that correspond to odd integers without
leading zeros.
11. Imagine a traffic light. Let = {a}. In other words, the input consists just of a string of a's. Think of
each a as the output from a timer that signals the light to change. Construct a deterministic finite state
transducer whose outputs are drawn from the set {Y, G, R} (corresponding to the colors yellow, green, and
red). The outputs of the transducer should correspond to the standard traffic light behavior.
12. Recall the finite state machine that we constructed in class to accept $1.00 in change or bills. Modify
the soda machine so that it actually does something (i.e., some soda comes out) by converting our finite state
acceptor to a finite state transducer. Let there be two buttons, one for Coke at $.50 and one for Water at
$.75 (yes, it's strange that water costs more than Coke. The world is a strange place). In any case, there will
now be two new symbols in the input alphabet, C and W. The machine should behave as follows:
Homework 4
The machine should keep track of how much money has been inserted. If it ever gets more than $1.50, it
should spit back enough to get it under $1.00 but keep it above $.75.
If the Coke or Water button is pushed and enough money has been inserted, the product and the change
should be output.
If a button is pushed and there is not enough money, the machine should remember the button push and
wait until there is enough money, at which point it should output the product and the change.
13. Consider the problem of designing an annoying buzzer that goes off whenever you try to drive your car
and you're not wearing a seat belt. (For simplicity, we'll just worry about the driver's possible death wish. If
you want to make this harder, you can worry about the other seats as well.) Design a finite state transducer
whose inputs are drawn from the alphabet {KI, KR, SO, SU, BF, BU}, representing the following events,
respectively: "key just inserted into ignition", "key just removed from ignition", "seat just became
occupied", "seat just became unoccupied", "belt has just been fastened", and "belt has just been unfastened".
The output alphabet is {ON, OFF}. The buzzer should go on when ON is output and stay off until OFF is
output.
14. Is it possible to construct a finite state transducer that can output the following sequence:
1010010001000010000010000001
If it is possible, design one. If it's not possible, why not?
Solutions
1. L(M) iff the initial state is a final state. Proof: M will halt in its initial state given as input. So: (IF)
If the initial state is a final state, then when M halts in the initial state, it will be in a final state and will
accept as an element of L(M). (ONLY IF) If the initial state is not a final state, then when M halts in the
initial state, it will reject its input, namely . So the only way to accept is for the initial state to be a final
state.
2.
Homework 4
3.
Homework 4
Homework 4
4. (a)
(b)
Homework 4
(c)
5.
a
a,b
1
6. (aa)* (bb* bb*a(aa)*) = (aa)*b+( a(aa)*) = all strings of a's and b's consisting of an even number of
a's, followed by at least one b, followed by zero or an odd number of a's.
7.
b
a
a,b
b
(b) a (a b)* (b aa)
8. (a)
a
2
a
a
4
Homework 4
9. (a)
a
b
b
a,b
b
a
b
a
a,b
(b) ( ab aabb aaabbb aaaabbbb)
Homework 4
CS 341 Homework 5
Regular Expressions in UNIX
Regular expressions are all over the place in UNIX, including the programs grep, sed, and vi.
There's a regular expression pattern matcher built into the programming language perl. There's also
one built into the majordomo maillist program, to be used as a way to filter email messages. So it's
easy to see that people have found regular expressions extremely useful. Each of the programs that
uses the basic idea offers its own definition of what a regular expression is. Some of them are more
powerful than others. The definition in perl is shown on the reverse of this page.
1. Write perl regular expressions to do the following things. If you have easy access to a perl
interpreter, you might even want to run them.
(a) match occurrences of your phone number
(b) match occurrences of any phone number
(c) match occurrences of any phone number that occurs more than once in a string
(d) match occurrences of any email address that occurs more than once in a string
(e) match the Subject field of any mail message from yourself
(f) match any email messages where the address of the sender occurs in the body of
the message
2. Examine the constructs in the perl regular expression definition closely. Compare them to the
much more limited definition we are using. Some of them can easily be described in terms of the
primitive capabilities we have. In other words, they don't offer additional power, just additional
convenience. Some of them, though, are genuinely more powerful, in the sense that they enable you
to define languages that aren't regular (i.e., they cannot be recognized with Finite State Machines).
Which of the perl constructs actually add power to the system? What is it about them that makes
them more powerful?
Homework 5
from Programming in Perl, Larry Wall and Randall L. Scwartz, OReilly & Associates, 1990.
Homework 5
CS 341 Homework 6
Nondeterministic Finite Automata
1. (a) Which of the following strings are accepted by the nondeterministic finite automaton shown on the left below?
(i)
a
(ii)
aa
(iii)
aab
(iv)
b
a
b
b
(b) Which of the following strings are accepted by the nondeterministic finite automaton on the right above?
(i)
(ii)
ab
(iii)
abab
(iv)
aba
(v)
abaa
2. Write regular expressions for the languages accepted by the nondeterministic finite automata of problem 1.
3. For any FSM F, let |F| be the number of states in F. Let R be the machine shown on the right in problem 1.
Let L = {w {0, 1}* : M such that M is an FSM, L(M) = L(R), |M| |R|, and w is the binary encoding of |M|}. Write a
regular expression for L.
4. Draw state diagrams for nondeterministic finite automata that accept these languages:
(a) (ab)*(ba)* aa*
(b) ((ab aab)*a*)*
(c) ((a*b*a*)*b)*
(d) (ba b)* (bb a)*
5. Some authors define a nondeterministic finite automaton to be a quintuple (K, , , S, F), where K, , , and F are as we
have defined them and S is a finite set of initial states, in the same way that F is a finite set of final states. The automaton may
nondeterministically begin operating in any of these initial states. Explain why this definition is not more general than ours in
any significant way.
6. (a) Find a simple nondeterministic finite automaton accepting ((a b)*aabab).
(b) Convert the nondeterministic finite automaton of Part (a) into a deterministic finite automaton by the method described
in class and in the notes.
(c) Try to understand how the machine constructed in Part (b) operates. Can you find an equivalent deterministic machine
with fewer states?
7. Construct a NDFSA that accepts the language (ba ((a bb) a*b)).
Homework 6
0
a,b
b
b
4
a
Homework 6
(ab a)*. This is so because aab can be formed by one application a, followed by one of ab. So it
is redundant inside a Kleene star. Now we can write a two state machine:
a
b
If you put the loop on a on the start state, either in place of where we have it, or in addition to it, it's also right.
(c) First we simplify:
((a*b*a*)*b)*
/ (L1*L2*L3*)* = (L1 L2 L3)* /
((a b a)*b)*
/ union is idempotent /
((a b)*b)*
There is a simple 2 state NDFSM accepting this, which is the empty string and all strings ending with b.
(d) This is the set of strings where either:
(1) every a is preceded by a b,
or (2) all b's occur in pairs.
So we can make a five state nondeterministic machine by making separate machines (each with two states) for the two
languages and then introducing transitions from the start state to both of them.
5. To explain that any construct A is not more general or powerful than some other construct B, it suffices to show that any
instance of A can be simulated by a corresponding instance of B. So in this case, we have to show how to take a multiple start
state NDFSA, A, and convert it to a NDFSA, B, with a single start state. We do this by initially making B equal to A. Then
add to B a new state we'll call S0. Make it the only start state in B. Now add transitions from S0 to each of the states that
was a start state in A. So B has a single start state (thus it satisfies our original definition of a NDFSA), but it simulates the
behavior of A since the first thing it does is to move, nondeterministically, to all of A's start states and then it exactly mimics
the behavior of A.
6. If you take the state machine as it is given, add a new start state and make transitions from it to the given start states, you
have an equivalent machine in the form that weve been using.
7. (a)
a
q0
a
q1
b
q2
a
q3
b
q4
q5
a,b
(b)
(1) Compute the E(q)s. Since there are no transitions, E(q), for all states q is just {q}.
(2) S = {q0}
(3) =
{
({q0}, a, {q0, q1}),
({q0}, b, {{q0}),
({q0, q1}, a, {q0, q1, q2}),
({q0, q1}, b, {q0}),
({q0, q1, q2}, a, {q0, q1, q2}),
({q0, q1, q2}, b, {q0, q3}),
({q0, q3}, a, {q0, q1, q4}),
({q0, q3}, b, {q0}),
({q0, q1, q4}, a, {q0, q1, q2}),
({q0, q1, q4}, b, {q0, q5}),
({q0, q5}, a, {q0, q1}),
({q0, q5}, b, {q0}) }
Homework 6
(4) K = {{q0}, {q0, q1}, {q0, q1, q2}, {q0, q3}, {q0, q1, q4}, {q0, q5}}
(5) F = {{q0, q5}}
(c) There isnt a simpler machine since we need a minimum of six states in order to keep track of how many characters
(between 0 and 5) of the required trailing string we have seen so far.
8. We can build the following machine really easily. We make the path from 1 to 2 to 3 for the ba option. The rest is for the
second choice. We get a nondeterministic machine, as we generally do when we use the simple approach.
1
b
a
a
4
In this case, we could simplify our machine if we wanted to and get rid of state 4 by adding a transition on b from 2 to 5.
9. (1) E(q0) = {q0, q1}, E(q1) = {q1}, E(q2) = {q2}, E(q3) = {q3, q4}, E(q4) = {q4}
(2) s = {q0, q1}
(3) =
{ ({q0, q1}, a, {q0, q1}),
({q0, q1}, b, {q0, q1, q2, q4}),
({q0, q1, q2, q4}, a, {q0, q1, q3, q4}),
({q0, q1, q2, q4), b, {q0, q1, q2, q4}) }
({q0, q1, q3, q4}, a, {q0, q1, q3, q4}),
({q0, q1, q3, q4}, b, {q0, q1, q2, q4}),
(4) K = { {q0, q1}, {q0, q1, q3, q4}, {q0, q1, q2, q4} }
(5) F = { {q0, q1, q3, q4}, {q0, q1, q2, q4} }
This machine corresponds to the regular expression a*b(a b)*
10. (a)
a
a
b
S
F
b
Homework 6
CS 341 Homework 7
Review of Equivalence Relations
1. Assume a finite domain that includes just the specific cities mentioned here. Let R = the reflexive,
symmetric, transitive closure of:
(Austin, Dallas), (Dallas, Houston), (Dallas, Amarillo), (Austin, San Marcos),
(Philadelphia, Pittsburgh), (Philadelphia, Paoli), (Paoli, Scranton),
(San Francisco, Los Angeles), (Los Angeles, Long Beach), (Long Beach, Carmel)
(a) Draw R as a graph.
(b) List the elements of the partition defined by R on its domain.
2. Let R be a relation on the set of positive integers. Define R as follows:
{(a, b) : (a mod 2) = (b mod 2)} In other words, R(a, b) iff a and b have the same remainder when
divided by 2.
(a) Consider the following example integers: 1, 2, 3, 4, 5, 6. Draw the subset of R involving just these values as
a graph.
(b) How many elements are there in the partition that R defines on the positive integers?
(c) List the elements of that partition and show some example elements.
3. Consider the language L, over the alphabet = {a, b}, defined by the regular expression
a*(b ) a*
Let R be a relation on *, defined as follows:
R(x, y) iff both x and y are in L or neither x nor y is in L. In other words, R(x,y) if x and y have
identical status with respect to L.
(a) Consider the following example elements of *: , b, aa, bb, aabaaa, bab, bbaabb. Draw the subset of R
involving just these values as a graph.
(b) How many elements are there in the partition that R defines on *?
(c) List the elements of that partition and show some example elements.
Solutions
1.
2.
(b) Two
(c) [even integers] Examples: 2, 4, 6, 106
[odd integers] Examples: 1, 3, 5, 17, 11679
3.
(a) (Hint: L is the language of strings with no more than one b.)
(b) Two
(c) [strings in L] Examples: , aa, b, aabaaa
[strings not in L] Examples: bb, bbaabb, bab
Homework 7
CS 341 Homework 8
Finite Automata, Regular Expressions, and Regular Grammars
1. We showed that the set of finite state machines is closed under complement. To do that, we presented a
technique for converting a deterministic machine M into a machine M' such that L(M') is the complement of
L(M). Why did we insist that M be deterministic? What happens if we interchange the final and nonfinal states
of a nondeterministic finite automaton?
2. Give a direct construction for the closure under intersection of the languages accepted by finite automata.
(Hint: Consider an automaton whose set of states is the Cartesian product of the sets of states of the two
original automata.) Which of the two constructions, the one given in the text or the one suggested in this
problem, is more efficient when the two languages are given in terms of nondeterministic finite automata?
3. Using the either of the construction techniques that we discussed, construct a finite automaton that accepts
the language defined by the regular expression: a*(ab ba )b*.
4. Write a regular expression for the language recognized by the following FSM:
a
b
b
a
a
a,b
b
a
b
1
a
Homework 8
8. Construct a deterministic FSM to accept the intersection of the languages accepted by the following FSMs:
a
2'
1'
a
a
3'
b
1
a,b
Homework 8
Homework 8
4. Without using the algorithm for finding a regular expression from an FSM, we can note in this case that the
lower right state is a dead state, i.e., an absorbing, non-accepting state. We can leave and return to the initial
state, the only accepting state, by reading ab along the upper path or by reading ba along the lower path. These
can be read any number of times, in any order, so the regular expression is (ab ba)*. Note that is included,
as it should be.
5. (a) ((a ba)(ba)*b)
(b)
a
{1}
a
{2, 3}
b
{0}
a
b
{2}
a,b
6. (a)
a
a
a
b
a,b
b
b
a
a,b
b
Homework 8
(b)
a
a
b
b
b
a
a,b
7. (a) Nonterminal S is the starting symbol. We'll use it to generate an odd number of a's. We'll also use the
nonterminal E, and it will always generate an even number of a's. So, whenever we generate an a, we must
either stop then, or we must generate the nonterminal E to reflect the fact that if we generate any more a's, we
must generate an even number of them.
Sa
b
S aE
(b)
a
S bS
Eb
S
a
E
E bE
b
E aS
a
b
#
8.
b
<1, 1'>
<2, 1'>
b
<3, 1'>
a
b
a
b
<1, 2'>
<2,2'>
<3, 2'>
a
b
a
b
<1,3'>
<2, 3'>
<3, 3'>
b
b
a
9. (a) (a bb*aa)* ( bb*(a ))
(b) All strings in {a, b}* that contain no occurrence of bab.
Homework 8
CS 341 Homework 9
Languages That Are and Are Not Regular
1. Show that the following are not regular.
(a) L = {wwR : w {a, b}*}
(b) L = {ww : w {a, b}*}
(c) L = {ww' : w {a, b}*}, where w' stands for w with each occurrence of a replaced by b, and vice versa.
2. Show that each of the following is or is not a regular language. The decimal notation for a number is the
number written in the usual way, as a string over the alphabet {-, 0, 1, , 9}. For example, the decimal
notation for 13 is a string of length 2. In unary notation, only the symbol 1 is used; thus 5 would be represented
as 11111 in unary notation.
(a) L = {w : w is the unary notation for a natural number that is a multiple of 7}
(b) L = {w : w is the decimal notation for a natural number that is a multiple of 7}
(c) L = {w : w is the unary notation for a natural number n such that there exists a pair p and q of twin primes,
both > n.} Two numbers p and q are a pair of twin primes iff q = p + 2 and both p and q are prime. For
example, (3, 5) is a pair of twin primes.
(d) L = {w : w is, for some n 1, the unary notation for 10n}
(e) L = {w : w is, for some n 1, the decimal notation for 10n}
(f) L = {w is of the form x#y, where x, y {1}+ and y = x+1 when x and y are interpreted as unary numbers}
(For example, 11#111 and 1111#11111 L, while 11#11, 1#111, and 1111 L.)
(g) L = {anbj: |n j| = 2}
(h) L = {uwwRv : u, v, w {a, b}+}
(i) L = {w {a, b}* : for each prefix x of w, #a(x) #b(x)}
3. Are the following statements true or false? Explain your answer in each case. (In each case, a fixed alphabet
is assumed.)
(a) Every subset of a regular language is regular.
(b) Let L = L1 L2. If L is regular and L2 is regular, L1 must be regular.
(c) If L is regular, then so is L = {xy : x L and y L}.
(d) {w : w = wR} is regular.
(e) If L is a regular language, then so is L = {w : w L and wR L}.
(f) If C is any set of regular languages, C (the union of all the elements of C) is a regular language.
(g) L = {xyxR : x, y *} is regular.
(h) If L = L1 L2 is a regular language and L1 is a regular language, then L2 is a regular language.
(i) Every regular language has a regular proper subset.
(j) If L1 and L2 are nonregular languages, then L1 L2 is also not regular.
4. Show that the language L = {anbm : n m} is not regular.
5. Prove or disprove the following statement:
If L1 and L2 are not regular languages, then L1 L2 is not regular.
6. Show that the language L = {x {a, b}* : x = anbambamax(m,n)} is not regular.
7. Show that the language L = {x {a, b}* : x contains exactly two more b's than a's} is not regular.
8. Show that the language L = {x {a, b}* : x contains twice as many a's as b's} is not regular.
Homework 9
Homework 9
(b) L = {w : w is the decimal notation for a natural number that is a multiple of 7}. L is regular. We can
build a deterministic FSM M to accept it. M is based on the standard algorithm for long division. The states
represent the remainders we have seen so far (so there are 7 of them, corresponding to 0 6). The start state, of
course, is 0, corresponding to a remainder of 0. So is the final state. The transitions of M are as follows:
si {0 - 6} and cj {0 - 9}, (si, cj) = (10si + cj) mod 7
So, for example, on the input 962, M would first read 9. When you divide 7 into 9 you get 1 (which we dont
care about since we dont actually care about the answer we just care whether the remainder is 0) with a
remainder of 2. So M will enter state 2. Next it reads 6. Since it is in state 2, it must divide 7 into 2*10 +6
(26). It gets a remainder of 5, so it goes to state 5. Next it reads 2. Since it is in state 5, it must divide 7 into
5*10 + 5 (52), producing a remainder of 3. Since 3 is not zero, we know that 862 is not divisible by 7, so M
rejects.
(c) L = {w : w is the unary notation for a natural number such that there exists a pair p and q of twin primes,
both > n.}. L is regular. Unfortunately, this time we dont know how to build a PDA for it. We can, however,
prove that it is regular by considering the following two possibilities:
(1) There is an infinite number of twin primes. In this case, for every n, there exists a pair of twin primes
greater than n. Thus L = 1*, which is clearly regular.
(2) There is not an infinite number of twin primes. In this case, there is some largest pair. There is thus
also a largest n that has a pair greater than it. Thus the set of such ns is finite and so is L (the unary
encodings of those values of n). Since L is finite, it is clearly regular.
It is not known which of these cases is true. But interestingly, from our point of view, it doesnt matter. L is
regular in either case. It may bother you that we can assert that L is regular when we cannot draw either an
FSM or a regular expression for it. It shouldnt bother you. We have just given a nonconstructive proof that L
is regular (and thus, by the way, that some FSM M accepts it). Not all proofs need to be constructive. This
situation isnt really any different from the case of L = {w : w is the unary encoding of the number of siblings
I have}. You know that L is finite and thus regular, even though you do not know how many siblings I have
and thus cannot actually build a machine to accept L.
(d) L = {w : w is, for some n 1, the unary notation for 10n}. So L = {1111111111, 1100, 11000, }. L isnt
regular, since clearly any machine to accept L will have to count the 1s. We can prove this using the pumping
lemma: Let w = 1P, N P and P is some power of 10. y must be some number of 1s. Clearly, it can be of
length at most P. When we pump it in once, we get a string s whose maximum length is therefore 2P. But the
next power of 10 is 10P. Thus s cannot be in L.
(e) L = {w : w is, for some n 1, the decimal notation for 10n}. Often its easier to work with unary
representations, but not in this case. This L is regular, since it is just 100*.
(f) L = {w is of the form x#y, where x, y {1}+ and y = x+1 when x and y are interpreted as unary numbers}
(For example, 11#111 and 1111#11111 L, while 11#11, 1#111, and 1111 L.) L isnt regular. Intuitively, it
isnt regular because any machine to accept it must count the 1s before the # and then compare that number to
the number of 1s after the #. We can prove that this is true using the pumping lemma: Let w = 1N#1N+1. Since
|xy| N, y must occur in the region before the #. Thus when we pump (either in or out) we will change x but
not make the corresponding change to y, so y will no longer equal x +1. The resulting string is thus not in L.
(g) L = {anbj: |n j| = 2}. L isnt regular. L consists of all strings of the form a*b* where either the number
of as is two more than the number of bs or the number of bs is two more than the number of as. We can
show that L is not regular by pumping. Let w = aNbN+2. Since |xy| N, y must equal ap for some p > 0. We can
pump y out once, which will generate the string aN-pbN+2, which is not in L.
Homework 9
(h) L = {uwwRv : u, v, w {a, b}+}. L is regular. This may seem counterintuitive. But any string of length
at least four with two consecutive symbols, not including the first and the last ones, is in L. We simply make
everything up to the first of the two consecutive symbols u. The first of the two consecutive symbols is w. The
second is wR. And the rest of the string is v. And only strings with at least one pair of consecutive symbols
(not including the first and last) are in L because w must end with some symbol s. wR must start with that same
symbol s. Thus the string will contain two consecutive occurrences of s. L is regular because it can be
described the regular expression (a b)+ (aa bb) (a b)+.
(i) L = {w {a, b}* : for each prefix x of w, #a(x) #b(x)}. First we need to understand exactly what L is.
In order to do that, we need to define prefix. A string x is a prefix of a string y iff z * such that y = xz. In
other words, x is a prefix of y iff x is an initial substring of y. For example, the prefixes of abba are , a, ab,
abb, and abba. So L is all strings over {a, b}* such that, at any point in the string (reading left to right), there
have never been more bs than as. The strings , a, ab, aaabbb, and ababa are in L. The strings b, ba, abba, and
ababb are not in L. L is not regular, which we can show by pumping. Let w = aNbN. So y = ap, for some
nonzero p. If we pump out, there will be fewer as than bs in the resulting string s. So s is not in L since every
string is a prefix of itself.
3. (a) Every subset of a regular language is regular. FALSE. Often the easiest way to show that a universally
quantified statement such as this is false by showing a counterexample. So consider L = a*. L is clearly
regular, since we have just shown a regular expression for it. Now consider L = ai: i is prime. L L. But we
showed in class that L is not regular.
(b) Let L = L1 L2. If L is regular and L2 is regular, L1 must be regular. FALSE. We know that the
regular languages are closed under intersection. But it is important to keep in mind that this closure lemma (as
well as all the others we will prove) only says exactly what it says and no more. In particular, it says that:
If L1 is regular and L2 is regular
Then L is regular.
Just like any implication, we cant run this one backward and conclude anything from the fact that L is regular.
Of course, we cant use the closure lemma to say that L1 must not be regular either. So we cant apply the
closure lemma here at all. A rule of thumb: it is almost never true that you can prove the converse of a closure
lemma. So it makes sense to look first for a counterexample. We dont have to look far. Let L = . Let L2 =
. So L and L2 are regular. Now let L1 = {ai: i is prime}. L1 is not regular. Yet L = L1 L2. Notice that
we could have made L2 anything at all and its intersection with would have been . When you are looking
for counterexamples, it usually works to look for very simple ones such as or *, so its a good idea to start
there first. works well in this case because were doing intersection. * is often useful when were doing
union.
(c) If L is regular, then so is L = {xy : x L and y L}. TRUE. Proof: Saying that y L is equivalent to
saying that y L. Since the regular languages are closed under complement, we know that L is also regular. L
is thus the concatenation of two regular languages. The regular languages are closed under concatenation.
Thus L must be regular.
(d) L = {w : w = wR} is regular. FALSE. L is NOT regular. You can prove this easily by using the
pumping lemma and letting w = aNbaN.
(e) If L is a regular language, then so is L = {w : w L and wR L}. TRUE. Proof: Saying that wR L is
equivalent to saying that w LR. If w must be in both L and LR, that is equivalent to saying that L = L LR.
L is regular because the problem statement says so. LR is also regular because the regular languages are closed
Homework 9
under reversal. The regular languages are closed under intersection. So the intersection of L and LR must be
regular.
Proof that the regular languages are closed under reversal (by construction): If L is regular, then there exists
some FSM M that accepts it. From M, we can construct a new FSM M that accepts LR. M will effectively run
M backwards. Start with the states of M equal to states of M. Take the state that corresponds to the start state
of M and make it the final state of M. Next we want to take the final states of M and make them the start states
of M. But M can have only a single start state. So create a new start state in M and create an epsilon
transition from it to each of the states in M that correspond to final states of M. Now just flip the arrows on all
the transitions of M and add these new transitions to M.
(f) If C is any set of regular languages, C is a regular language. FALSE. If C is a finite set of regular
languages, this is true. It follows from the fact that the regular languages are closed under union. But suppose
that C is an infinite set of languages. Then this statement cannot be true. If it were, then every language would
be regular and we have proved that there are languages that are not regular. Why is this? Because every
language is the union of some set of regular languages. Let L be an arbitrary language whose elements are w1,
w2, w3, . Let C be the set of singleton languages {{w1}, {w2}, {w3}, } such that wi L. The number of
elements of C is equal to the cardinality of L. Each individual element of C is a language that contains a single
string, and so it is finite and thus regular. L = C. Thus, since not all languages are regular, it must not be the
case that C is guaranteed to be regular. If youre not sure you follow this argument, you should try to come
up with a specific counterexample. Choose an L such that L is not regular, and show that it can be described as
C for some set of languages C.
(g) L = {xyxR : x, y *} is regular. TRUE. Why? Weve already said that xxR isnt regular. This looks a
lot like that, but it differs in a key way. L is the set of strings that can be described as some string x, followed
by some string y (where x and y can be chosen completely independently), followed by the reverse of x. So, for
example, it is clear that abcccccba L (assuming ={a, b, c}). We let x = ab, y = ccccc, and xR = ba. Now
consider abbcccccaaa. You might think that this string is not in L. But it is. We let x = a, y = bbcccccaa, and
xR = a. What about acccb? This string too is in L. We let x = , y = acccb, and xR = . Note the following
things about our definition of L: (1) There is no restriction on the length of x. Thus we can let x = . (2)There
is no restriction on the relationship of y to x. And (3) R = . Thus L is in fact equal to * because we can take
any string w in * and rewrite it as w , which is of the form xyxR. Since * is regular, L must be regular.
(h) If L = L1 L2 is a regular language and L1 is a regular language, then L2 is a regular language.
FALSE. This is another attempt to use a closure theorem backwards. Let L1 = *. L1 is clearly regular. Since
L1 contains all strings over , the union of L1 with any language is just L1 (i.e., L = *). If the proposition
were true, then all languages L2 would necessarily be regular. But we have already shown that there are
languages that are not regular. Thus the proposition must be false.
(i) Every regular language has a regular proper subset. FALSE. is regular. And it is subset of every set.
Thus it is a subset of every regular language. However, it is not a proper subset of itself. Thus this statement is
false. However the following two similar statements are true:
(1) Every regular language has a regular subset.
(2) Every regular language except has a regular proper subset.
(j) If L1 and L2 are nonregular languages, then L1 L2 is also not regular. False. Let L1 = {anbm, n m}
and L2 = {anbm, n m}. L1 L2 = a*b*, which is regular.
Homework 9
4. If L were regular, then its complement, L1, would also be regular. L1 contains all strings over {a, b} that are
not in L. There are two ways not to be in L: have any a's that occur after any b's (in other words, not have all
the a's followed by all the b's), or have an equal number of a's and b's. So now consider
L2 = L1 a*b*
L2 contains only those elements of L1 in which the a's and b's are in the right order. In other words,
L2 = anbn
But if L were regular, then L1 would be regular. Then L2, since it is the intersection of two regular languages
would also be regular. But we have already shown that it (anbn) is not regular. Thus L cannot be regular.
5. This statement is false. To prove it, we offer a counter example. Let L1 = {anbm : n=m} and let L2 =
{anbm : n m}. We have shown that both L1 and L2 are not regular. However,
L1 L2 = a*b*, which is regular.
There are plenty of other examples as well. Let L1 = {an: n 1 and n is prime}. Let L2 = {an: n 1 and n is not
prime}. Neither L1 nor L2 is regular. But L1 L2 = a+, which is clearly regular.
6. This is easy to prove using the pumping lemma. Let w = aNbaNbaN. We know that xy must be contained
within the first block of a's. So, no matter how y is chosen (as long as it is not empty, as required by the
lemma), for any i > 2, xyiz L, since the first block of a's will be longer than the last block, which is not
allowed. Therefore L is not regular.
7. First, let L' = L a*b*, which must be regular if L is. We observe that L' = anbn+2 : n 0. Now use the
pumping lemma to show that L' is not regular in the same way we used it to show that anbn is not regular.
8. We use the pumping lemma. Let w = a2NbN. xy must be contained within the block of a's, so when we pump
either in or out, it will no longer be true that there will be twice as many a's as b's, since the number of a's
changes but not the number of b's. Thus the pumped string will not be in L. Therefore L is not regular.
9. (a) L is not regular. We can prove this using the pumping lemma. Let w = aNbN. Since y must occur within
the first N characters of w, y = ap for some p > 0. Thus when we pump y in, we will have more as than bs,
which produces strings that are not in L.
(b) L* is also not regular. To prove this, we need first to prove a lemma, which well call EQAB: s, s
L* #a(s) = #b(s). To prove the lemma, we first observe that any string s in L* must be able to be
decomposed into at least one finite sequence of strings, each element of which is in L. Some strings will have
multiple such decompositions. In other words, there may be more than one way to form s by concatenating
together strings in L. For any string s in L*, let SQ be some sequence of elements of L that, when concatenated
together, form s. It doesnt matter which one. Define the function HowMany on the elements of L*.
HowMany(x) returns the length of SQ. Think of HowMany as telling you how many times we went through the
Kleene star loop in deriving x. We will prove EQAB by induction on HowMany(s).
Base case: If HowMany(s) = 0, then s = . #a(s) = #b(s).
Induction hypothesis: If HowMany(s) N, then #a(s) = #b(s).
Show: If HowMany(s) = N+1, then #a(s) = #b(s).
If HowMany(s) = N+1, then w,y such that s = wy, w L*, HowMany(w) = N, and y L. In other words,
we can decompose s into a part that was formed by concatenating together N instances of L plus a second part
that is just one more instance of L. Thus we have:
Homework 9
Definition of L
Induction hypothesis
s = wy
s = wy
4, 2
5, 1
6, 3
Q. E. D.
Now we can show that L* isnt regular using the pumping lemma. Let w = aNbN. Since y must occur within the
first N characters of w, y = ap for some p > 0. Thus when we pump y in, we will have a string with more as
than bs. By EQAB, that string cannot be in L*.
Homework 9
CS 341 Homework 10
State Minimization
1. (a) Give the equivalence classes under L for these languages:
(i) L = (aab ab)*
(ii) L = {x : x contains an occurrence of aababa}
(iii) L = {xxR : x {a, b}*}
(iv) L = {xx : x {a, b}*}
(v) Ln = {a, b}a{a, b}n, where n > 0 is a fixed integer
(vi) The language of balanced parentheses
(b) For those languages in (a) for which the answer is finite, give a deterministic finite automaton with the smallest number
of states that accepts the corresponding language.
2. Let L = {x {a, b}* : x contains at least one a and ends in at least two b's}.
(a) Write a regular expression for L.
(b) Construct a deterministic FSM that accepts L.
(c) Let RL be the equivalence relation of the Myhill-Nerode Theorem. What partition does RL induce on the set
{a, bb, bab, abb, bba, aab, abba, bbaa, baaba}?
(d) How many equivalence classes are there in the partition induced on * by RL?
3. Let L = {x {a, b}* : x begins with a and ends with b}.
(a) What is the nature of the partition induced on * by RL, the equivalence relation of the Myhill-Nerode Theorem? That
is, how many classes are there in the partition and give a description of the strings in each.
(b) Using these equivalence classes, construct the minimum state deterministic FSM that accepts L.
4. Suppose that we are given a language L and a deterministic FSM M that accepts L. Assume L is a subset of {a, b, c}*. Let
RL and RM be the equivalence relations defined in the proof of the Myhill-Nerode Theorem. True or False:
(a) If we know that x and y are two strings in the same equivalence class of RL, we can be sure that they are in the same
equivalence class of RM.
(b) If we know that x and y are two strings in the same equivalence class of RM, we can be sure that they are in the same
equivalence class of RL.
(c) There must be at least one equivalence class of RL that has contains an infinite number of strings.
(d) RM induces a partition on {a, b, c}* that has a finite number of classes.
(e) If L, then [] (the equivalence class containing ) of RL cannot be an infinite set.
5. Use the Myhill-Nerode Theorem to prove that {anbmcmax(m,n)bpdp : m, n, p 0} is not regular.
6. (a) In class we argued that the intersection of two regular languages was regular on the basis of closure properties of
regular languages. We did not show a construction for the FSM that recognizes the intersection of two regular languages.
Such a construction does exist, however, and it is suggested by the fact that L1 L2 = * - ((* - L1) (* - L2)).
Given two deterministic FSMs, M1 and M2, that recognize two regular languages L1 and L2, we can construct an FSM that
recognizes L = L1 L2 (in other words strings that have all the required properties of both L1 and L2), as follows:
1. Construct machines M1' and M2', as deterministic versions of M1 and M2. This step is necessary because complementation
only works on deterministic machines.
2. Construct machines M1'' and M2'', from M1' and M2', using the construction for complementation, that recognize * - L1
and * - L2, respectively.
3. Construct M3, using the construction for union and the machines M1'' and M2'', that recognizes
((* - L1) (* - L2)). This will be a nondeterministic FSM.
4. Construct M4, the deterministic equivalent of M3.
5. Construct ML, using the construction for complementation, that recognizes * - ((* - L1) (* - L2)).
Now consider:
Homework 10
= {a, b}
L1 = {w * : all a's occur in pairs}
State Minimization
Solutions
1. (a)
(i) L = (aab ab)*
1. [, aab, ab, and all other elements of L]
2. [a or wa : w L]
3. [aa or waa : w L]
4. [everything else, i.e., strings that can never become elements of L because they contain illegal
substrings such as bb or aaa]
(ii) L = {x : s contains an occurrence of aababa}
1. [(a b)*aababa(a b)*, i.e., all elements of L]
2. [ or any string not in L and ending in b but not in aab or aabab, i.e., no progress yet on
"aababa"]
3. [wa for any w [2]; alternatively, any string not in L and ending in a but not in aa, aaba, or
aababa]
4. [any string not in L and ending in aa]
5. [any string not in L and ending in aab]
6. [any string not in L and ending in aaba]
7. [any string not in L and ending in aabab]
Note that this time there is no "everything else". Strings never become hopeless in this
language. They simply fail if we get to the end without finding "aababa".
(iii)L = {xxR : x {a, b}*}
1. [a, which is the only string for which the continuations that lead to acceptance are all strings of
the form wa : where w L]
2. [b, which is the only string for which the continuations that lead to acceptance are all strings of
the form wb : where w L]
3. [ab, which is the only string for which the continuations that lead to acceptance are all strings
of the form wba : where w L]
4. [aa, which is the only string for which the continuations that lead to acceptance are all strings
of the form waa : where w L]
And so forth. Every string is in a distinct equivalence class.
(iv) L = {xx : x {a, b}*}
1. [a, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading a]
2. [b, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading b]
3. [ab, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading ab]
Homework 10
State Minimization
4. [aa, which is the only string for which the continuations that lead to acceptance are all strings
that would be in L except that they are missing a leading aa]
And so forth. Every string is in a distinct equivalence class.
(v) Ln = {a, b}a{a, b}n
0. []
1. [a, b]
2. [aa, ba]
3. [aaa, aab, baa, bab]
.
.
.
n+2. [(a b)a(a b)n]
n+3. [strings that can never become elements of Ln]
There is a finite number of strings in any specific language Ln. So there is a finite number of equivalence classes of
L. Every string in Ln must be of length n+2. So there are n+3 equivalence classes (numbered 0 to n+2, to indicate the length
of the strings in the class) of strings that may become elements of Ln, plus one for strings that are already hopeless, either
because they don't start with ab or aa, or because they are already too long.
(vi) L = The language of balanced parentheses
1. [w*(w*: w L]
/* i.e., one extra left parenthesis somewhere in the string
2. [w*((w* : w L]
/*
two
3. [w*(((w* : w L]
4. [w*((((w* : w L]
5. [w*(((((w* : w L]
and so on. There is an infinite number of equivalence classes.
Each of these classes is distinct, since ) is an acceptable continuation for 1, but none of the others; )) is acceptable for
2, but none of the others, ))) is acceptable for 3, but none of the others, and so forth.
1. (b)
(i)
b
a
b
b
a
4
(ii) There's always a very simple nondeterministic FSM to recognize all strings that contain a specific substring. It's just a
chain of states for the desired substring, with loops on all letters of on the start state and the final state. In this case, the
machine is:
a
2
a
3
b
4
a
5
b
6
a
7
a,b
1
a,b
To construct a minimal, deterministic FSM, you have two choices. You can use our algorithm to convert the NDFSM to a
deterministic one and then use the minimization algorithm to get the minimal machine. Or you can construct the minimal
FSM directly. In any case, it is:
Homework 10
State Minimization
a
a
a,b
1
b
a
(v)
a,b
0
a,b
a,b
3
a,b
..
n+2
a,b
n+3
(a b)*a(a b)* bb
a
a
0
1
a
b
3
b
(c) It's easiest to answer (d) first, and then to consider (c) as a special case.
(d)
[0] = all strings that contain no a
[1] = all strings that end with a
[2] = all strings that end with ab
[3] = all strings that contain at least one a and that end with bb, i.e., all strings in L.
It is clear that these classes are pairwise disjoint and that their union is {a, b}*. Thus they represent a partition of {a, b}*. It
is also easy to see that they are the equivalence classes of RL of the Myhill-Nerode Theore, since all the members of one
equivalence class will, when suffixed by any string z, form strings all of which are in L or all of which are not in L. Further,
for any x and y from different equivalence classes, it is easy to find a z such that one of xz, yz is in L and the other is not.
Letting the equivalence relation RL be restricted to the set in part (c), gives the partition
{{bb}, {a, bba, abba, bbaa, baaba}, {bab, aab}, {abb}}.
Homework 10
State Minimization
3. (a)
[1] = {}
[2] = b (a b)*
[3] = a a(a b)*a
[4] = a(a b)*b
(b)
a
1
a
b
a
2
a,b
4. (a) F, (b) T, (c) T (* is infinite and the number of equivalence classes is finite), (d) T, (e) F.
5. Choose any two distinct strings of a's: call them ai and aj (i < j). Then they must be in different equivalence classes of RL
since aibici L but ajbici L. Therefore, there is an infinite number of equivalence classes and L is not regular.
M1, which recognizes L1, is:
6. (a)
a,b
a
a
M2, which recognizes L2, is:
a,b
b
a
a
a,b
a
a
1
a,b
b
a
a
Homework 10
State Minimization
s = {1, 2, 5}
F = K - {2, 8}, i.e., all states except {2, 8} are final states.
You may find it useful at this point to draw this out.
Step (5) ML = M4 except that now there is a single final state, {2, 8}.
ML is deterministic. M4 is deterministic by construction, and Step 5 can not introduce any nondeterminism since it doesn't
introduce any transitions.
(b)
(c)
{1, 2, 5}
{3, 5}
{2, 6}
{2, 5}
{4, 6}
{2, 7}
{4, 5}
{4, 7}
{2, 8}
{4, 8}
{3, 8}
(d)
[]
[strings without bbb but with a single a at the end]
[strings without bbb, with any a's in pairs, and ending in a single b]
[strings without bbb and with at least one pair of a's and any a's in pairs]
[strings that cannot be made legal because they have a single a followed by a b
and where every b is preceded by an a and the last character is b]
[strings without bbb, with any a's in pairs, and ending with just two b's]
[strings that cannot be made legal because they have a single a followed by b
and where there is no bbb and the last character is a]
[strings that cannot be made legal because they have a single a followed by a b
and where there is no bbb but there is at least one bb and the last
character is b]
[all strings in L]
[strings that cannot be made legal because they have a single a followed by a b
and where there is a bbb, but the ab violation came before the first bbb]
[strings with bbb but with a single a at the end]
{1, 2, 5} {2,5} = 1
{3, 5} = 2
{4, 6} {4, 5} {4, 7} {4, 8} = 3
{2, 6} = 4
{2, 7} = 5
{2, 8} = 6
{3, 8} = 7
Homework 10
State Minimization
Homework 10
State Minimization
a
b
G
a,b
a
b
a
E
b
a
C
b
D
a
Homework 10
State Minimization
CS 341 Homework 11
Context-Free Grammars
1. Consider the grammar G = (V, , R, S), where
V = {a, b, S, A},
= {a, b},
R = { S AA,
A AAA,
A a,
A bA,
A Ab }.
(a) Which strings of L(G) can be produced by derivations of four or fewer steps?
(b) Give at least four distinct derivations for the string babbab.
(c) For any m, n, p > 0, describe a derivation in G of the string bmabnabp.
2. Construct context-free grammars that generate each of these languages:
(a) {wcwR : w {a, b}*}
(b) {wwR : w {a, b}*}
(c) {w {a, b}* : w = wR}
3. Consider the alphabet = {a, b, (, ), , *, }. Construct a context-free grammar that generates all strings in
* that are regular expressions over {a, b}.
4. Let G be a context-free grammar and let k > 0. We let Lk(G) L(G) be the set of all strings that have a
derivation in G with k or fewer steps.
(a) What is L5(G), where G = ({S, (, )}, {(, )}, {S , S SS, S (S) })?
(b) Show that, for all context-free grammars G and all k > 0, Lk(G) is finite.
5. Let G = (V, , R, S), where
V = {a, b, S},
= {a, b},
R = { S aSb,
S aSa,
S bSa,
S bSb,
S }.
Show that L(G) is regular.
6. A program in a procedural programming language, such as C or Java, consists of a list of statements, where
each statement is one of several types, such as:
(1) assignment statement, of the form id := E, where E is any arithmetic expression (generated by the grammar
using T and F that we presented in class).
(2) conditional statement, e.g., "if E < E then statement", or while statement , e.g. "while E < E do statement".
(3) goto statement; furthermore each statement could be preceded by a label.
(4) compound statement, i.e., many statements preceded by a begin, followed by an end, and separated by ";".
Give a context-free grammar that generates all possible statements in the simplified programming language
described above.
7. Show that the following languages are context free by exhibiting context-free grammars generating each:
(a) {ambn : m n}
Homework 11
Context-Free Grammars
(b) {ambncpdq : m + n = p + q}
(c) {w {a, b}* : w has twice as many b's as a's}
(d) {uawb : u, w {a, b}*, |u| = |w|}
8. Let = {a, b, c}. Let L be the language of prefix arithmetic defined as follows:
(i) any member of is a well-formed expression (wff).
(ii) if and are any wff's, then so are A, S, M, and D.
(iii) nothing else is a wff.
(One might think of A, S, M, and D as corresponding to the operators +, -, , /, respectively. Thus in L we could
write Aab instead of the usual (a + b), and MSabDbc, instead of ((a - b) (b/c)). Note that parentheses are
unnecessary to resolve ambiguities in L.)
(a) Write a context-free grammar that generates exactly the wff's of L.
(b) Show that L is not regular.
9. Consider the language L = {amb2nc3ndp : p > m, and m, n 1}.
(a) What is the shortest string in L?
(b) Write a context-free grammar to generate L.
Solutions
1. (a) We can do an exhaustive search of all derivations of length no more than 4:
S AA aA aa
S AA aA abA aba
S AA aA aAb aab
S AA bAA baA baa
S AA bAA bAa baa
S AA AbA abA aba
S AA AbA Aba aba
S AA Aa aa
S AA Aa bAa baa
S AA Aa Aba aba
S AA AbA abA aba
S AA AbA Aba aba
S AA AAb aAb aab
S AA AAb Aab aab
Many of these correspond to the same parse trees, just applying the rules in different orders. In any case, the
strings that can be generated are: aa, aab, aba, baa.
(b) Notice that A bA bAb bab, and also that A Ab bAb bab. This suggests 8 distinct
derivations:
S AA AbA AbAb Abab * babbab
S AA AAb AbAb Abab * babbab
S AA bAA bAbA babA * babbab
S AA AbA bAbA babA * babbab
Where each of these four has 2 ways to reach babbab in the last steps. And, of course, one could interleave the
productions rather than doing all of the first A, then all of the second A, or vice versa.
(c) This is a matter of formally describing a sequence of applications of the rules in terms of m, n, p that will
produce the string bmabnabp.
S
/* by rule S AA */
Homework 11
Context-Free Grammars
AA
* /* by m applications of rule A bA */
bmAA
/* by rule A a */
bmaA
* /* by n applications of rule A bA */
bmabnA
* by p applications of rule A Ab */
bmabnAbp
/* by rule A a */
bmabnabp
Clearly this derivation (and some variations on it) produce bmabnabp for each m, n, p.
2. (a) G = (V, , R, S) with V = {S, a, b, c}, = {a, b, c}, R = {
S aSa
S bSb
Sc
}.
(b) Same as (a) except remove c from V and and replace the last rule, S c, by S .
(c) This language very similar to the language of (b). (b) was all even length palindromes; this is all
palindromes. We can use the same grammar as (b) except that we must add two rules:
Sa
Sb
3. This is easy. Recall the inductive definition of regular expressions that was given in class :
1. and each member of is a regular expression.
2. If , are regular expressions, then so is
3. If , are regular expressions, then so is .
4. If is a regular expression, then so is *.
5. If is a regular expression, then so is ().
6. Nothing else is a regular expression.
This definition provides the basis for a grammar for regular expressions:
G = (V, , R, S) with V = {S, a, b, (, ), , *, }, = { a, b, (, ), , *, }, R= {
S
/* part of rule 1, above
Sa
/*
"
Sb
/*
"
S SS
/* rule 2
SSS
/* rule 3
S S*
/* rule 4
S (S)
/* rule 5
}
4. (a) We omit derivations that don't produce strings in L (i.e, they still contain nonterminals).
L1 : S
L2 : S (S) ()
L3 : S SS S
S (S) ((S)) (())
L4 : S SS (S)S ()S ()
S SS S(S) (S) ()
S (S) ((S)) (((S))) ((()))
L5 : S SS (S)S (S)(S) ()(S) ()()
Homework 11
Context-Free Grammars
Context-Free Grammars
T < | > | =
Context-Free Grammars
Another approach you could take is to build a pushdown automaton for L and then derive a grammar from it.
This may be easier simply because PDA's are good at counting. But deriving the grammar isn't trivial either. If
you had a hard time with this one, don't worry.
(d) L = {uawb : u, w {a, b}*, |u| = |w|}. This one fools some people since you might think that the a and b
are correlated somehow. But consider the simpler language L' = {uaw : u, w {a, b}*, |u| = |w|}. This one
seems easier. We just need to generate u and w in parallel, keeping something in the middle that will turn into a.
Now back to L: L is just L' with b tacked on the end. So a grammar for L is:
G = ({S, T, a, b}, {a, b}, R, S), where R = {S Tb, T aTa, T aTb, T bTa, T bTb, T a}.
8. (a) G = ({S, A, M, D, F, a, b, c}. {A, M, D, S, a, b, c}, R, S), where R = {
Fa
F AFF
Fb
F SFF
Fc
F MFF
F DFF }
(b) First, we let L' = L A*a*. L' = {Anan+1 : n 0}. L' can easily be shown to be nonregular using the
Pumping Theorem, so, since the regular languages are closed under intersection, L must not be regular.
9. (a) abbcccdd
(b) G = ({S, X, Y, a, b, c, d}, {a, b, c, d}, R, S), where R is either:
(S aXdd, X Xd, X aXd, X bbYccc, Y bbYccc, Y ), or
(S aSd, S Sd, S aMdd, M bbccc, M bbMccc)
Homework 11
Context-Free Grammars
CS 341 Homework 12
Parse Trees
1. Consider the grammar G = ({+, *, (, ), id, T, F, E}, {+, *, (, ), id}, R, E}, where
R = {E E+T, E T, T T * F, T F, F (E), F id}.
Give two derivations of the string id * id + id, one of which is leftmost and one of which is not leftmost.
2. Draw parse trees for each of the following:
(a) The simple grammar of English we presented in class and the string "big Jim ate green cheese."
(b) The grammar of Problem 1 and the strings id + (id + id) * id and (id * id + id * id).
3. Present a context-free grammar that generates , the empty language.
4. Consider the following grammar (the start symbol is S; the alphabets are implicit in the rules):
S SS | AAA |
A aA | Aa | b
(a) Describe the language generated by this grammar.
(b) Give a left-most derivation for the terminal string abbaba.
(c) Show that the grammar is ambiguous by exhibiting two distinct derivation trees for some terminal string.
(d) If this language is regular, give a regular (right linear) grammar generating it and construct the
corresponding FSM. If the language is not regular, prove that it is not.
5. Consider the following language : L = {wRw" : w {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. Give a context-free grammar G that generates L and a parse tree that shows that
aababb L.
6. (a) Consider the CFG that you constructed in Homework 11, Problem 2 for {wcwR : w {a, b}*}. How many
derivations are there, using that grammar, for the string aabacabaa?
(b) Show parse tree(s) corresponding to your derivation(s). Is the grammar ambiguous?
7. Consider the language L = {w {a, b}* : w contains equal numbers of a's and b's}
(a) Write a context-free grammar G for L.
(b) Show two derivations (if possible) for the string aabbab using G. Show at least one leftmost derivation.
(c) Do all your derivations result in the same parse tree? If so, see if you can find other parse trees or convince
yourself there are none.
(d) If G is ambiguous (i.e., you found multiple parse trees), remove the ambiguity. (Hint: look out for two
recursive occurrences of the same nonterminal in the right side of a rule, e.g, X XX)
(e) See how many parse trees you get for aabbab using the grammar developed in (d).
Solutions
3. G = ({S} , , R, S), where R is any set of rules that can't produce any strings in *. So, for example, R =
{S S} does the trick. So does R = .
Homework 12
Parse Trees
4. (a) (a*ba*ba*ba*)*
(b) S AAA aAAA abAA abAaA abbaA abbaAa abbaba
(c)
S
S
A
A
A
a
b
(d) G = ({S, S1, B1, B2, B3, a, b}, {a, b}, R, S), where R = {
S
B1 aB1
S S1
B1 bB2
S1 aS1
B2 aB2
S1 bB1
B2 bB3
a
S
b
B3 aB3
B3
B3 S1
a
S1
B3
B1
b
B2
a
5. G = ({S, a, b}, {a, b}, R, S), R = { S aSb, S bSa, S }
S
a
S
a
S
b
b
a
6. (a) The grammar is G = (V, , R, S) with V = {S, a, b, c}, = {a, b, c}, R = {S aSa, S bSb, S c}.
There is a single derivation:
S aSA aaSaa aabSbaa aabaSabaa aabacabaa
(b) There is a single parse tree. The grammar is unambiguous.
7. (a) G = (V, , R, S) with V = {S }, = {a, b }, R = {
S aSb
S bSa
S
S SS
}
(b) (i) S SS aSbS aaSbbS aabbaSb aabbab /* This is the leftmost derivation of the most
"sensible" parse.
(ii) S SS SSS aSbSS aaSbbSS aabbSS aabbaSbS aabbabS aabbab /* This is the
leftmost derivation of a parse that introduced an unnecessary S in the first step, which was then eliminated by
rewriting it as in the final step.
Homework 12
Parse Trees
(c) No. The two derivations shown here have different parse trees. They do, however, have the same
bracketing, [a[ab]b][ab].(In other words, they have similar essential structures.) They differ only in how S is
introduced and then eliminated. But there are other derivations that correspond to additional parse trees, and
some of them correspond to a completely different bracketing, [a[ab][ba]b]. One derivation that does this is
(ii) S aSb aSSb aabSb aabbab
(d) This is tricky. Recall that we were able to eliminate ambiguity in the case of the balanced parentheses
language just by getting rid of except at the very top to allow for the empty string. If we do the same thing here,
we get R = { S
ST
T ab
T aTb
T ba
T bTa
T TT
But aabbab still has multiple parses in this grammar. This language is different from balanced parens since we
can go back and forth between being ahead on a's and being ahead on b's (whereas, in the paren language, we
must always either be even or be ahead on open paren). So the two parses correspond to the bracketings
[aabb][ab] and [a [ab] [ba] b]. The trouble is the rule T TT, which can get applied at the very top of the tree
(as in the case of the first bracketing shown here), or anywhere further down (as in the case of the second one).
We clearly need some capability for forming a string by concatenating a perfectly balanced string with another
one, since, without that, we'll get no parse for the string abba. Just nesting won't work. We have to be able to
combine nesting and concatenation, but we have to control it. It's tempting to think that maybe an unambiguous
grammar doesn't exist, but it's pretty easy to see how to build a deterministic pda (with a bottom of stack symbol)
to accept this language, so there must be an unambiguous grammar. What we need is the notion of an A region,
in which we are currently ahead on a's, and a B region, in which we are currently ahead on b's. Then at the top
level, we can allow an A region, followed by a B region, followed by an A region and so forth. Think of
switching between regions as what will happen when the stack is empty and we're completely even on the number
of a's and b's that we've seen so far. For example, [ab][ba] is one A region followed by one B region. Once we
enter an A region, we stay in it, always generating an a followed (possibly after something else embedded in the
middle) by a b. After all, the definition of an A region, is that we're always ahead on a's. Only when we are
even, can we switch to a B region. Until then, if we want to generate a b, we don't need to do a pair starting with
b. We know we're ahead on a's, so make any desired b's go with an a we already have. Once we are even, we
must either quit or move to a B region. If we allow for two A regions to be concatenated at the top, there will be
ambiguity between concatenating two A regions at the top vs. staying in a single one. We must, however, allow
two A regions to be concatenated once we're inside an A region. Consider [a[ab][ab]b] Each [ab] is a perfectly
balanced A region and they are concatenated inside the A region whose boundaries are the first a and the final b.
So we must distinguish between concatenation within a region (which only happens with regions of the same
type, e.g, two A's within an A) and concatenation at the very top level, which only happens between different
types.
Also, we must be careful of any rule of the form X XX for another reason. Suppose we have a string that
corresponds to XXX. Is that the first X being rewritten as two, or the second one being rewritten as two. We
need to force a single associativity.
Homework 12
Parse Trees
Homework 12
Parse Trees
CS 341 Homework 13
Pushdown Automata
1. Consider the pushdown automaton M = (K, , , , s, F), where
K = {s, f},
F = {f},
= {a, b},
= {a},
= {((s, a, ), (s, a)), ((s, b, ), (s, a)), ((s, a, ), (f, )), ((f, a, a), (f, )), ((f, b, a), (f, ))}.
(a) Trace all possible sequences of transitions of M on input aba.
(b) Show that aba, aa, abb L(M), but baa, bab, baaaa L(M).
(c) Describe L(M) in English.
2. Construct pushdown automata that accept each of the following:
(a) L = the language generated by the grammar G = (V, , R, S), where
V = {S, (, ), [, ]},
= {(, ), [, ]},
R = { S ,
S SS,
S [S],
S (S)}.
(b) L = {ambn : m n 2m}.
(c) L = {w {a, b}* : w = wR}.
(d) L = {w {a, b}* : w has equal numbers of a's and b's}.
(e) L = {w {a, b}* : w has twice as many a's as b's}.
(f) L = {ambn : m n }
(g) L = {uawb: u and w {a, b}* and |u| = |w|}
3. Consider the following language : L = {wRw" : w {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. In Homework 12, problem 5, you wrote a context-free grammar for L. Now give
a PDA M that accepts L and trace a computation that shows that aababb L.
4. Construct a context-free grammar for the language of problem 2(b): L = ({ambn: m n 2m}).
Solutions
1. (a) There are three possible computations of M on aba:
(s, aba, ) |- (s, ba, a) |- (s, a, aa) |- (s, , aaa)
(s, aba, ) |- (s, ba, a) |- (s, a, aa) |- (f, , aa)
(s, aba, ) |- (f, ba, )
None of these is an accepting configuration.
(b) This is done by tracing the computation of M on each of the strings, as shown in (a).
(c) L(M) is the set of strings whose middle symbol is a. In other words,
L(M) = {xay {a, b}* : |x| = |y|}.
2. (a) Notice that the square brackets and the parentheses must be properly nested. So the strategy will be to push
the open brackets and parens and pop them against matching close brackets and parens as they are read in. We
only need one state, since all the counting will be done on the stack. Since L, the start state can be final.
Thus we have M = ({s}, {(, ), [, ]}, {(, [}, , s {s}), where (sorry about the confusing use of parentheses both as
part of the notation and as symbols in the language):
Homework 13
Pushdown Automata
Homework 13
Pushdown Automata
To make this work, we need to be able to tell if the stack is empty, since that's the only case where we might
consider pushing either a or b. Recall that we can't do that just by writing as the stack character, since that
always matches, even if the stack is not empty. So we'll start by pushing a special character # onto the bottom of
the stack. We can then check to see if the stack is empty by seeing if # is on top. We can do all the real work in
our PDA in a single state. But, because we're using the bottom of stack symbol #, we need two additional states:
the start state, in which we do nothing except push # and move to the working state, and the final state, which we
get to once we've popped # and can then do nothing else. Considering all these issues, we get M = ({s, q, f}, {a,
b}, {#, a, b}, , s, {f}), where
= { ((s, , ), (q, #)),
/* push # and move to the working state q */
((q, a, #), (q, a#)),
/* the stack is empty and we've got an a, so push it */
((q, a, a), (q, aa)),
/* the stack is counting a's and we've got another one so push it */
((q, b, a), (q, )),
/* the stack is counting a's and we've got b, so cancel a and b */
((q, b, #), (q, b#)),
/* the stack is empty and we've got a b, so push it */
((q, b, b), (q, bb)),
/* the stack is counting b's and we've got another one so push it */
((q, a, b), (q, )),
/* the stack is counting b's and we've got a, so cancel b and a */
((q, , #), (f, ))}.
/* the stack is empty of a's and b's. Pop the # and quit. */
To convince yourself that M does the job, you should show that M does in fact maintain the invariant we stated
above.
The only nondeterminism in this machine involves the last transition in which we guess that we're at the end of
the input. There is an alternative way to solve this problem in which we don't bother with the bottom of stack
symbol #. Instead, we substitute a lot of nondeterminism and we sometimes push a's on top of b's, and so forth.
Most of those paths will end up in dead ends. The machine has fewer states but is harder to analyze. Try to
construct it if you like.
(e) This one is similar to (d) except that there are two a's for every b. Recall the two techniques for matching two
to one that we discussed in our solution to (b). This time, though, we do know that there are always two a's to
every b. We don't need nondeterminism to allow for either one or two. But, because we no longer know that all
the a's come first, we do need to consider what to do in the two cases: (1) We're counting b's on the stack; and (2)
We're counting a's on the stack. If we're counting b's, let's take the approach in which we push two b's every time
we see one. Then, when we go to cancel a's, we can just pop one b for each a. If we see twice as many a's as b's,
we'll end up with an empty stack. Now what if we're counting a's? We'll push one a for every one we see. When
we see b, we pop two a's. The only special case we need to consider arises in strings such as "aba", where we'll
only have seen a single a at the point at which we see the b. What we need to do is to switch from counting a's to
counting b's, since the b counts twice. Thus the invariant that we want to maintain is now
(Number of a's read so far) - 2*(Number of b's read so far)
=
(Number of a's on stack) - (Number of b's on stack)
We can do all this with M = ({s, q, f}, {a, b}, {#, a, b}, , s, {f}), where
= { ((s, , ), (q, #)),
/* push # and move to the working state q */
((q, a, #), (q, a#)),
/* the stack is empty and we've got an a, so push it */
((q, a, a), (q, aa)),
/* the stack is counting a's and we've got another one so push it */
((q, b, aa), (q, )),
/* the stack is counting a's and we've got b, so cancel aa and b */
((q, b, a#), (q, b#)),
/* the stack contains a single a and we've got b, so cancel the a and b
and start counting b's, since we have a shortage of one a */
((q, b, #), (q, bb#)),
/* the stack is empty and we've got a b, so push two b's */
((q, b, b), (q, bbb)),
/* the stack is counting b's and we've got another one so push two */
((q, a, b), (q, )),
/* the stack is counting b's and we've got a, so cancel b and a */
((q, , #), (f, ))}.
/* the stack is empty of a's and b's. Pop the # and quit. */
Homework 13
Pushdown Automata
(g) The idea here is to create a nondeterministic machine. In the start state (1), it reads a's and b's, and for each
character seen, it pushes an x on the stack. Thus it counts the length of u. If it sees an a, it may also guess that
this is the required separator a and go to state 2. In state 2, it reads a's and b's, and for each character seen, pops
an x off the stack. If there's nothing to pop, the machine will fail. If it sees a b, it may also guess that this is the
required final b and go to the final state, state 3. The machine will then accept if both the input and the stack are
empty.
M = ({1, 2, 3}, {a, b}, {x}, s, {2}, =
((1, a, ), (1, x))
((1, b, ), (1, x))
((1, a, ), (2, ))
((2, a, x), (2, )
((2, b, x), (2, )
((2, b, ), (3, )
3.
a//a
a/b/
p
b//b
4.
//
q
b/a/
S
S aSb
S aSbb
Homework 13
Pushdown Automata
CS 341 Homework 14
Pushdown Automata and Context-Free Grammars
1. In class, we described an algorithm for constructing a PDA to accept a language L, given a context free
grammar for L. Let L be the balanced brackets language defined by the grammar G = ({S, [, ]}, {[, ]}, R, S),
where R =
S , S SS, S [S]
Apply the construction algorithm to this grammar to derive a PDA that accepts L. Trace the operation of the
PDA you have constructed on the input string [[][]].
2. Consider the following PDA M:
//
a//a
a//
b/a/
b/a/
//
//
b//b
//
/b/
c/b/
a//a
a//aa
b/a/
a//
b//
a//
b/a/
Homework 14
3. Don't even try to use the grammar construction algorithm. Just observe that L = {anbnbmcp : m p and n and p
0}, or, alternatively {anbmcp : m n + p and n and p 0}. It can be generated by the following rules:
S S1S2
S1 aS1b
/* S1 generates the anbn part. */
S1
S2 bS2
/* S2 generates the bmcp part. */
S2 bS2c
S2
4. (a)
a,b//
b,//
a//a
b//
b//
b//
5
/a/
a/a/
/a/
a//
b//
a,b//
b//
7
a,b//
We use state 2 to skip over an arbitrary number of bai groups that aren't involved in the required mismatch.
We use state 3 to count the first group of a's we care about.
We use state 4 to count the second group and make sure it's not equal to the first.
We use state 5 to skip over an arbitrary number of bai groups in between the two we care about.
We use state 6 to clear the stack in the case that the second group had fewer a's than the first group did.
We use state 7 to skip over any remaining bai groups that aren't involved in the required mismatch.
(b)
S A'bLA'
S A'bRA'
L ab | aL | aLa
R ba | Ra | aRa
A' bAA' |
A aA |
(c)
Homework 14
/* L will take care of two groups where the first group has more a's */
/* R will take care of two groups where the second group has more a's */
CS 341 Homework 15
Parsing
1. Show that the following languages are deterministic context free.
(a) {ambn : m n}
(b) {wcwR : w {a, b}*}
(c) {cambm : m 0} {damb2m : m 0}
(d) (amcbm : m 0} {amdb2m : m 0}
2. Consider the context-free grammar: G = (V, , R, S), where V = {(, ), ., a, S, A}, = {(, ), .}, and R =
{ S (),
S a,
S (A),
A S,
A A.S}
(If you are familiar with the programming language LISP, notice that L(G) contains all
atoms and lists, where the symbol a stands for any non-null atom.)
(a) Apply left factoring and the rule for getting rid of left recursion to G. Let G' be the resulting grammar.
Argue that G' is LL(1). Construct a deterministic pushdown automaton M that accepts L(G)$ by doing a top
down parse. Study the computation of M on the string ((()).a).
(b) Repeat Part (a) for the grammar resulting from G if one replaces the first rule by A .
(c) Repeat Part (a) for the grammar resulting from G if one replaces the last rule by A S.A.
3. Answer each of the following questions True or False. If you choose false, you should be able to state a
counterexample.
(a) If a language L can be described by a regular expression, we can be sure it is a context-free language.
(b) If a language L cannot be described by a regular expression, we can be sure it is not a context-free
language.
(c) If L is generated by a context-free grammar, then L cannot be regular.
(d) If there is no pushdown automaton accepting L, then L cannot be regular.
(e) If L is accepted by a nondeterministic finite automaton, then there is some deterministic PDA accepting L.
(f) If L is accepted by a deterministic PDA, then L' (the complement of L) must be regular.
(g) If L is accepted by a deterministic PDA, then L' must be context free.
(h) If, for a given L in {a, b}*, there exist x, y, z, such that y and xynz L for all n 0, then L must be
regular.
(i) If, for a given L in {a, b}*, there do not exist u, v, x, y, z such that |vy| 1 and uvnxynz L for all n 0,
then L cannot be regular.
(j) If L is regular and L = L1 L2 for some L1 and L2, then at least one of L1 and L2 must be regular.
(k) If L is context free and L = L1L2 for some L1 and L2, then L1 and L2 must both be context free.
(l) If L is context free, then L* must be regular.
(m) If L is an infinite context-free language, then in any context-free grammar generating L there exists at least
one recursive rule.
(n) If L is an infinite context-free language, then there is some context-free grammar generating L that has no
rule of the form A B, where A and B are nonterminal symbols.
(o) Every context-free grammar can be converted into an equivalent regular grammar.
(p) Given a context-free grammar generating L, every string in L has a right-most derivation.
4. Recall problem 4 from Homework 12. It asked you to consider the following grammar for a language L (the
start symbol is S; the alphabets are implicit in the rules):
S SS | AAA |
A aA | Aa | b
(a) It is not possible to convert this grammar to an equivalent one in Chomsky Normal Form. Why not?
Homework 15
Parsing
(b) Modify the grammar as little as possible so that it generates L - . Now convert this new grammar to
Chomsky Normal Form. Is the resulting grammar still ambiguous? Why or why not?
(c) From either the original grammar for L - or the one in Chomsky Normal Form, construct a PDA that
accepts L - .
5. Consider the following language : L = {wRw" : w {a, b}* and w" indicates w with each occurrence of a
replaced by b, and vice versa}. In Homework 12, problem 5, you wrote a context-free grammar for L. Then, in
Homework 13, problem 3, you wrote a PDA M that accepts L and traced one of its computations. Now decide
whether you think L is deterministic context free. Defend your answer.
6. Convert the following grammar for arithmetic expressions to Chomsky Normal Form:
EE+T
ET
TT*F
TF
F (E)
F id
7. Again, consider the grammar for arithmetic expressions given in Problem 6. Walk through the process of
doing a top down parse of the following strings using that grammar. Point out the places where a decision has to
be made about what to do.
(a) id * id + id
(b) id * id * id
Solutions
1. (a) L = {ambn : m n}. To show that a language L is deterministic context free, we need to show a
deterministic PDA that accepts L$. We did that for L = {ambn : m n} in class. (See Lecture Notes 14).
(b) L = {wcwR : w {a, b}*}. In class (again see Lecture Notes 14), we built a deterministic PDA to accept L
= {wcwR : w {a, b}*}. Its easy to turn it into a deterministic PDA that accepts L$.
(c) L = {cambm : m 0} {damb2m : m 0}. Often its hard to build a deterministic PDA for a language that is
formed by taking the union of two other languages. For example, {ambm : m 0} {amb2m : m 0} would be
hard (in fact its impossible) because we have no way of knowing, until we run out of bs, whether were
expecting two bs for each a or just one. However, {cambm : m 0} {damb2m : m 0} is actually quite easy.
Every string starts with a c or a d. If its a c, then we know to look for one b for each a; if its a d, then we know
to look for two. So the first thing we do is to start our machine like this:
1
c
0
d
2
The machine that starts in state 1 is our classic machine for anbn, except of course that it must have a final
transition on $ to its final state.
We have two choices for the machine that starts in state 2. It can either push one a for every a it sees, and then
pop an a for every pair of bs, or it can push two as for every a it sees, and then pop one a for every b.
Homework 15
Parsing
(d) L = (amcbm : m 0} {amdb2m : m 0}. Now weve got another unioned language. But this time we dont
get a clue from the first character which part of the language were dealing with. That turns out to be okay
though, because we do find out before we have to start processing the bs whether weve got two bs for each a or
just one. Recall the two approaches we mentioned for machine 2 in the last problem. What we need here is the
first, the one that pushes a single a for each a it sees. Then, when we see a c or d, we branch and either pop an a
for each b or pop an a for every two bs.
2. (a) We need to apply left factoring to the two rules S () and S (A). We also need to eliminate the left
recursion from A A . S. Applying left factoring, we get the first column shown here. Then getting rid of left
recursion gets us the second column:
S (S
S (S
S )
S )
S A)
S A)
Sa
Sa
AS
A SA
A A.S
A .SA
A
(b) Notice that the point of the first rule, which was S (), was to get a set of parentheses with no A inside.
An alternative way to do that is to dump that rule but to add the rule A . Now we always introduce an A
when we expand S, but we can get rid of it later. If we do this, then theres no left factoring to be done. We still
have to get rid of the left recursion on A, just as we did above, however.
(c) If we change A A.S to S S.A, then theres no left recursion to get rid of and we can leave the rules
unchanged. Notice, though, that well get different parse trees this way, which may or may not be important. To
see this, consider the string (a.a.a) and parse it using both the original grammar and the one we get if we change
the last rule.
3. (a) True, since all regular languages are context-free.
(b) False, there exist languages that are context-free but not regular.
(c) False. All regular languages are also context-free and thus are generated by context-free grammars.
(d) True, since if L were regular, it would also be context free and thus would be accepted by some PDA.
(e) True, since there must also be a deterministic FSM and thus a deterministic PDA.
(f) False. Consider L = anbn. L' = {w {a, b}* : either some b comes before some a or there is an unequal
number of a's and b's.}. Clearly this language is not regular since we can't count the a's and b's.
(g) True, since the deterministic context-free languages are closed under complement.
(h) False. Suppose L = anc*bn, which is clearly not regular. Let x = aa, y = c, and z = bb. xynz L.
(i) False. L could be finite.
(j) False. L1 could be anbn and L2 could be { anbm : n m}. Neither is regular. But L1 L2 = {}, which
is regular.
(k) False. Let L1 = a* and L2 = {anbmcm : n m}. L2 is not context free. But L = L1L2 = a*bmcm, which is
context free.
(l) False. Let L = wwR.
(m) True.
(n) True, since we have a procedure for eliminating such unit productions.
(o) False, since there exist context-free languages that are not regular.
(p) True.
4. (a) No grammar in Chomsky Normal Form can generate , yet L.
Homework 15
Parsing
(b) In the original grammar, we could generate zero copies of AAA (by letting S go to ), one copy of AAA
(by letting S go to AAA), two copies (by letting S go to SS and then each of them to AAA), three copies of AAA
(by letting S go to SS, then one of the Ss goes to SS, then all three go to AAA), and so forth. We want to make
sure that we can still get one copy, two copies, three copies, etc. We just want to eliminate the zero copies
option. Note that the only role of S is to determine how many copies of AAA are produced. Once we have
generated As we can never go back and apply the S rules again. So all we have to do is to eliminate the
production S . The modified grammar to accept L - is thus:
G = ({S, A, B, C, a, b}, {a, b}, R, S), where R = {
S SS | AAA
A aA | Aa | b
If we convert this grammar to Chomsky Normal Form, we get:
G = ({S, A, B, C, a, b}, {a, b}, R, S), where R = {
S SS
A AC
S AB
Ab
B AA
Ca
A CA }
This grammar is still ambiguous.
(c) (from the grammar of part (b)): M = ({p, q}, {a, b}, {S, A, a, b,}, , p, {q})
= { ((p, , ), (q, S))
((q, , A), (q, aA))
((q, , S), (q, SS)
((q, , A), (q, Aa))
((q, , S), (q, AAA)
((q, , A), (q, b))
((q, a, a), (q, ))
((q, b, b), (q, )) }
5. L is not deterministic context free for essentially the same reason that wwR is not.
EE+T
ET
TT*F
TF
F (E)
F id
Step 2. There are no rules. We show steps 3, 4, and 5 next to each other, so it's clear where the rules in steps 4
and 5 came from. In each case, the first rule that is derived from a step 3 rule is next to its source. If more than
one rule is derived from any given step 3 rule, the second and others are shown immediately under the first.
That's why there are some blank lines in the first two columns.
6. The original grammar was:
Homework 15
Parsing
Step 5.
Step 4.
Step 3.
E * T, F
T * F
G" =
E TMF
E LER
E id
T LER
T id
P+
M*
L(
R)
E T T'
E LF'
E id
T L F'
T id
P+
M*
L(
R)
E EPT
TT*F
T TMF
F (E)
F LER
F id
Then we add:
ET*F
E (E)
E id
T (E)
T id
Homework 15
F id
E E E'
E' PT
T T T'
T' M F
F L F'
F' E R
F id
EE+T
Parsing
(since T' M F)
(since F' E R)
(since F' E R)
CS 341 Homework 16
Languages that Are and Are Not Context-Free
1. Show that the following languages are context-free. You can do this by writing a context free grammar or a
PDA, or you can use the closure theorems for context-free languages. For example, you could show that L is the
union of two simpler context-free languages.
(a) L = ancbn
(b) L = {a, b}* - {anbn : n 0}
(c) L = {ambncpdq : n = q, or m p or m + n = p + q}
(d) L = {a, b}* - L1, where L1 is the language {babaabaaabban-1banb : n n 1}.
2. Show that the following languages are not context-free.
2
(a) L = { a n : n 0}
(b) L = {www : w {a, b}*}
(c) L = {w {a, b, c}* : w has equal numbers of a's, b's, and c's}
(d) L = {anbman : n m}
(e) L = {anbmcnd(n+m) : m, n 0}
3. Give an example of a context free language ( *) that contains a subset that is not context free. Describe the
subset.
4. What is wrong with the following "proof" that anb2nan is context free?
(1) Both {anbn : n 0} and {bnan : n 0} are context free.
(2) anb2nan = {anbn}{bnan }
(3) Since the context free languages are closed under concatenation, anb2nan is context free.
5. Consider the following context free grammar: G = ({S, A, a, b}, {a, b}, R, S), where R = {
S aAS
Sa
A SbA
A SS
A ba }
(a) Answer each of the following questions True or False:
(i) From the fact that G is context free, it follows that there is no regular expression for L(G).
(ii) L(G) contains no strings of length 3.
(iii) For any string w L(G), there exists u, v, x, y, z such that w = uvxyz, |vy| 1, and uvnxynz L(G)
for all n 0.
(iv) If there exist languages L1 and L2 such that L(G) = L1 L2, then L1 and L2 must both be context
free.
(v) The language (L(G))R is context free.
(b) Give a leftmost derivation according to G of aaaabaa.
(c) Give the parse tree corresponding to the derivation in (b).
(d) Give a nondeterministic PDA that accepts L(G).
6. Show that the following language is context free:
L = {xxRyyRzzR : x, y, z {a, b}*}.
Homework 16
2. (a) L = { a n : n 0}. Suppose L = { a n : n 0} were context free. Then we could pump. Let n = M2. So w
2
is the string with M 2 , or M4, a's.) Clearly |w| K, since M > K. So uvvxyyz must be in L (whatever v and y
Homework 16
are). But it can't be. Why not? Given our w, the next element of L is the string with (M2+1)2 a's. That's M4 +
2M2 + 1 (expanding it out). But we know that |vxy| M, so we can't pump in more than M a's when we pump
only once. Thus the string we just created can't have more than M4 + M a's. Clearly not enough.
(b) L = {www : w {a, b}*}. The easiest way to do this is not to prove directly that L = {www : w {a, b}*}
is not context free. Instead, let's consider L1 = L a*ba*ba*b. If L is context free, L1 must also be. L1 =
{anbanbanb : n 0}. To show that L1 is not context free, let's choose w = aMbaMbaMb. First we observe that
neither v nor y can contain b, because if either did, then, when we pump, we'd have more than three b's, which is
not allowed. So both must be in one of the three a regions. We consider the cases:
(1, 1) That group of a's will no longer match the other two, so the string is not in L1.
(2, 2)
"
(3, 3)
"
(1, 2) At least one of these two groups will have something pumped into it and will no longer match the
one that is left out.
(2, 3)
"
(1, 3) excluded since |vxy| M, so vxy can't span the middle region of a's.
(c) L = {w {a, b, c}*. Again, the easiest thing to do is first to intersect L = {w {a, b, c}* : w has equal
numbers of a's, b's, and c's} with a regular language. This time we construct L1 = L a*b*c*. L1 must be
context free if L is. But L1 = anbncn, which we've already proven is not context free. So L isn't either.
(d) L = {anbman : n m}. We'll use the pumping lemma to show that L = {anbman : n m} is not context free.
Choose w = aMbMaM. We know that neither v nor y can cross a and b regions, because if one of them did, then,
when we pumped, we'd get a's and b's out of order. So we need only consider the cases where each is in one of
the three regions of w (the first group of a's, the b's, and the second group of a's.)
(1, 1) The first group of a's will no longer match the second group.
(2, 2) If we pump in b's, then at some point there will be more b's than a's, and that's not allowed.
(3, 3) Analogous to (1, 1)
(1, 2) We must either (or both) pump a's into region 1, which means the two a regions won't match, or,
if y is not empty, we'll pump in b's but then eventually there will be more b's than a's.
(2, 3) Analogous to (1, 2)
(1, 3) Ruled out since |vxy| M, so vxy can't span the middle region of b's.
(e) L = {anbmcnd(n+m) : m, n 0}. We can show that L = {anbmcnd(n+m) : m, n 0} is not context free by
pumping. We choose w = aMbMcMd2M. Clearly neither v nor y can cross regions and include more than one letter,
since if that happened we'd get letters out of order when we pumped. So we only consider the cases where v and
y fall within a single region. We'll consider four regions, corresponding to a, b, c, and d.
(1, 1) We'll change the number of a's and they won't match the c's any more.
(1, 2) If v is not empty, we'll change the a's and them won't match the c's. If y is nonempty, we'll change the
number of b's and then we won't have the right number of d's any more.
(1, 3), (1, 4) are ruled out because |vxy| M, so vxy can't span across any whole regions.
(2, 2) We'll change the number of b's but then we won't have the right number of d's.
(2, 3) If v is not empty, we'll change the b's without changing the d's. If y is not empty, we'll change the c's and
they'll no longer match the a's.
(2, 4) is ruled out because |vxy| M, so vxy can't span across any whole regions.
(3, 3) We'll change the number of c's and they won't match the a's.
(3, 4) If v is not empty, we'll change c's and they won't match a's. If y is not empty, we'll change d's without
changing b's.
(4, 4) We'll change d's without changing a's or b's.
Homework 16
3. Let L = { anbmcp : n = m or m = p}. L is clearly context free. We can build a nondeterministic PDA M to
accept it. M has two forks, one of which compares n to m and the other of which compares m to p (skipping over
the a's). L1 = {anbmcp : n = m and m = p} is a subset of L. But L1 = anbncn, which we know is not context free.
4. (1) is fine. (2) is fine if we don't over interpret it. In particular, although both languages are defined in terms
of the variable n, the scope of that variable is a single language. So within each individual language definition,
the two occurrences of n are correctly interpreted to be occurrences of a single variable, and thus the values must
be same both times. However, when we concatenate the two languages, we still have two separate language
definitions with separate variables. So the two n's are different. This is the key. It means that we can't assume
that, given {anbn}{bnan }, we choose the same value of n for the two strings we choose. For example, we could
get a2b2b3a3, which is a2b5a3, which is clearly not in {anb2nan}.
5. (a)
(i) False, since all regular languages are also context free.
(ii) True.
(iii) False. For example a L, but is not long enough to contain pumpable substrings.
(iv) False.
(v) True, since the context-free languages are closed under reversal.
(b) S aAs aSSS aaSS aaaS aaaaAS aaaabaS aaaabaa.
(c)
S
a
A
b
S
a
Homework 16
CS 341 Homework 17
Turing Machines
1. Let M = (K, , , s, {h}), where
K = {q0, q1, h},
= {a, b, , },
s = q0,
and is given by the following table,
q
q0
q0
q0
q0
q1
q1
q1
q1
a
b
a
b
(q,
)
(q1, b)
(q1, a)
(h, )
(q0, )
(q0, )
(q0, )
(q0, )
(q1, )
(a) Trace the computation of M starting from the configuration (q0, aabbba).
(b) Describe informally what M does when started in q0 on any square of a tape.
2. Repeat Problem 1 for the machine M = (K, , , s, {h}), where
K = {q0, q1, q2, h},
= {a, b, , },
s = q0,
and is given by the following table (the transitions on are (q, ) = (q, ), and are omitted).
q
q0
q0
q0
q1
q1
q1
q2
q2
q2
a
b
a
b
a
b
(q,
)
(q1, )
(q0, )
(q0, )
(q1, )
(q2, )
(q1, )
(q2, )
(q2, )
(h, )
Turing Machines
q
q0
q0
q0
q1
q1
q1
q2
q2
q2
(q,
)
(q1, )
(q0, )
(q0, )
(q2, )
(h, )
(q1, )
(q2, a)
(q0, )
(q2, )
4. Design and write out in full a Turing machine that scans to the right until it finds two consecutive a's and then
halts. The alphabet of the Turing machine should be {a, b, , }.
5. Give a Turing machine (in our abbreviated notation) that takes as input a string w {a, b}* and squeezes out
the a's. Assume that the input configuration is (s, w) and the output configuration is (h, w'), where w' = w
with all the a's removed.
6. Give a Turing machine (in our abbreviated notation) that shifts its input two characters to the right.
Input:
w
Output:
w
7. (L & P 5.7.2) Show that if a language is recursively enumerable, then there is a Turing machine that
enumerates it without ever repeating an element of the language.
Solutions
/
1. (a)
/
a,b,/
q0
q1
a/b; b/a
/
h
q0, aabbba
q1, babbba
q0, babbba
q1, bbbbba
q0, bbbbba
q1, bbabba
q0, bbabba
q1, bbaaba
q0, bbaaba
q1, bbaaaa
q0, bbaaaa
q1, bbaaab
q0, bbaaab
h, bbaaab
(b) Converts all a's to b's, and vice versa, starting with the current symbol and moving right.
Homework 17
Turing Machines
2. (a)
b,/
a,/
a/
q0, abbbbaba
q0, abbbbaba
q0, abbbbaba
q0, abbbbaba
q1, abbbbaba
q1, abbbbaba
q2, abbbbaba
h, abbbbaba
a,b,/
b/
q0
q1
q2
/
h
(b) M goes right until if finds an a, then left until it finds a b, then right until it finds a blank.
/
3.
a/a
a/
a/
q0
q1
q2
/
h
q0, a a a a a
q1, a a a a a
q2, a a a a
q0, a a a a
q1, a a a a
q2, a a a
q0, a a a
q1, a a a
h,a a a
/
M changes every other a, moving left from the start, to a blank. If n is odd, it loops. If n is even, it halts.
4.
a/
a/a
q0
q1
b,/
b,/
5. The idea here is that first we'll push all b's to the left, thus squeezing all the a's to the right. Then we'll just
replace the a's with blanks. In more detail: scan the string from left to right. Every time we find an a, if there are
any b's to the right of it we need to shift them left. So scan right, skipping as many a's as there are. When we
find a b, swap it with the last a. That squeezes one a further to the right. Go back to the left end and repeat.
Eventually all the a's will come after all the b's. At that point, when we look for a b following an a, all we'll find
is a blank. At that point, we just clean up by rewriting all the a's as blanks.
>Ra,
Rb,
aLbL
L
a
Homework 17
Turing Machines
6. The idea is to start with the rightmost character of w, rewrite it as a blank, then move two squares to the right
and plunk that character back down. Then scan left for the next leftmost character, do the same thing, and so
forth.
>L
R2aLL
R3
7. Suppose that M is the Turing machine that enumerates L. Then construct M* to enumerate L with no
repetitions: M* will begin by simulating M. But whenever M outputs a string, M* will first check to see if the
string has been output before (see below). If it has, it will just continue looking for strings. If not, it will output
it, and it will also write the string, with # in front of it, at the right end of the tape. To check whether a string has
been output before, M* just scans its tape checking for the string it is about to output.
Homework 17
Turing Machines
CS 341 Homework 18
Computing with Turing Machines
1. Present Turing machines that decide the following languages over {a, b}:
(a)
(b) {}
(c) {a}
(d) {a}*
2. Consider the simple (regular, in fact) language L = {w {a,b}* : |w| is even}
(a) Give a Turing machine that decides L.
(b) Give a Turing machine that semidecides L.
3. Give a Turing machine (in our abbreviated notation) that accepts L = {anbman : m > n}
4. Give a Turing machine (in our abbreviated notation) that accepts L = {ww : w {a, b}*}
5. Give a Turing machine (in our abbreviated notation) that computes the following function from strings in {a,
b}* to strings in {a, b}* : f(w) = wwR.
6. Give a Turing machine that computes the function f: {a,b,c}* N (the integers), where f(w) = the number of
a's (in unary) in w.
7. Let w and x be any two positive integers encoded in unary. Show a Turing machine M that computes
f(w, x) = w + x.
Represent the input to M as
w;x
8. Two's complement form provides a way to represent both positive and negative binary integers. Suppose that
the number of bits allocated to each number is k (generally the word size). Then each positive integer is
represented simply as its binary encoding, with leading zeros. Each negative integer n is represented as the result
of subtracting |n| from 2k, where k is the number of bits to be used in the representation. Given a fixed k, it is
possible to represent any integer n if -2k-1 n 2k-1 -1. The high order digit of each number indicates its sign: it
is zero for positive integers and 1 for negative integers.
Examples, for k = 4:
0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011, 4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111
-1 = 1111, -2 = 1110, -3 = 1101, -4 = 1100, -5 = 1011, -6 = 1010, -7 = 1001, -8 = 1000
Since Turing machines don't have fixed length words, we'd like to be able to represent any integer. We will
represent positive integers with a single leading 0. We will represent each negative integer n as the result of
subtracting n from 2i+1, where i is the smallest value such that 2i |n|. For example, -65 will be represented as
1111111, since 27 (128) 65, so we subtract 65 (01000001 in binary) from 28 (in binary, 100000000). We need
the extra digit (i.e., we subtract from 2i+1 rather than from 2i) because, in order for a positive number to be
interpreted as positive, it must have a leading 0, thus consuming an extra digit.
Let w be any integer encoded in two's complement form. Show a Turing machine that computes f(w) = -w.
Homework 18
Solutions
1 (a) We should reject everything, since no strings are in the language.
>n
(b) Other than the left boundary symbol, the tape should be blank:
>R
y
(c) Just the single string a:
>R
a
n
(d) Any number of a's:
a
>R
(a,)
2. (a)
a,b
>R
a,b
>R
Homework 18
a,b
(b)
a,b
3. The idea is to make a sequence of passes over the input. On each pass, we mark off (with d, e, and f) a
matching a, b, and a. This corresponds to the top row of the machine shown here. When there are no matching
groups left, then we accept if there is nothing left or if there are b's in the middle. If there is anything else, we
reject. It turns out that a great deal of this machine is essentially error checking. We get to the R on the second
row as soon as we find the first "extra" b. We can loop in it as long as we find b's. If we find a's we reject. If we
find a blank, then the string had just b's, which is okay, so we accept. Once we find an f, we have to go to the
separate state R on the third row to skip over the f's and make sure we get to the final blank without either any
more b's or any more a's.
d,e
a,e
R
a d
b,f
R
fL
a,b
R
4. The hard part here is that we don't know where the middle of the string is. So we don't know where the
boundary between the first occurrence of w ends and the second begins. We can break this problem into three
subroutines, which will be executed in order:
(1) Find the middle and mark it. If there's a lone character in the middle (i.e., the length of the input string isn't
even), then reject immediately.
(2) Bounce back and forth between the beginning of the first w and the beginning of the second, marking off
characters if they match and rejecting if they don't.
(3) If we get to the end of the w's and everything has matched, accept.
Let's say a little more about step (1). We need to put a marker in the middle of the string. The easiest thing to do
is to make a double marker. We'll use ##. That way, we can start at both ends (bouncing back and forth), moving
a marker one character toward the middle at each step. For example, if we start with the tape aabbaabb,
after one mark off step we'll have a#abbaab#b, then aa#bbaa#bb, and finally
aabb##aabb. So first we shift the whole input string two squares to the right on the tape to make room
for the two markers. Then we bounce back and forth, moving the two markers toward the center. If they meet,
we've got an even length string and we can continue. If they don't, we reject right away.
Homework 18
5. The idea is to work from the middle. We'll scan right to the rightmost character of w (which we find by
scanning for the first blank, then backing up (left) one square. We'll rewrite it so we know not to deal with it
again (a's will become 1's; b's will become 2's.) Then we move right and copy it. Now if we scan back left past
any 1's or 2's, we'll find the next rightmost character of w. We rewrite it to a 1 or a 2, then scan right to a blank
and copy it. We keep this up until, when we scan back to the left, past any 1's or 2's, we hit a blank. That means
we've copied everything. Now we scan to the right, replacing 1's by a's and 2's by b's. Finally, we scan back to
the left to position the read head to the left of w.
a
>R L
1Ra
L1,2 L1,2
b
2Rb
a
1
a,b
2
b
L
6. The idea here is that we need to write a 1 for every a and we can throw away b's and c's. We want the 1's to
end up at the left end of the string, so then all we have to do to clean up at the end is erase all the b's and c's to the
right of the area with the 1's. So, we'll start at the left edge of the string. We'll skip over b's and c's until we get
to an a. At that point, rewrite it as b so we don't count it again. Then (remembering it in the state) scan left until
we get to the blank (if this is the first 1) or we get to a 1. In either case, move one square to the right and write a
1. We are thus overwriting a b or a c, but we don't care. We're going to throw them away anyway. Now start
again scanning to the right looking for an a. At some point, we'll come to the end of string blank instead. At that
point, just travel leftward, rewriting all the b's and c's to blank, then cross the 1's and land on the blank at the left
of the string.
>R
bL,1R1
b,c
b,c
1
7. All we have to do is to concatenate the two strings. So shift the second one left one square, covering up the
semicolon.
>R; R 1
;L1
LL
Homework 18
8. Do a couple of examples of the conversion to see what's going on. What you'll observe is that we want to scan
from the right. Initially, we may see some zeros, and those will stay as zeros. If we ever see a 1, then we rewrite
the first one as a 1. After that, we're dealing with borrowing, so we swap all digits: every zero becomes a one and
every one becomes a zero, until we hit the blank at the end of the string and halt.
>R L
1
1L
1
0
Homework 18
CS 341 Homework 19
Turing Machine Extensions
1. Consider the language L = {wwR}.
(a) Describe a one tape Turing machine to accept L.
(b) Describe a two tape Turing machine to accept L.
(c) How much more efficient is the two tape machine?
2. Give (in abbreviated notation) a nondeterministic Turing machine that accepts the language
L = {wwRuuR : w, u {a, b}*}
Solutions
(1) (a) The one tape machine needs to bounce back and forth between the beginning of the input string and the
end, marking off matching symbols.
(b) The two tape machine works as follows: If the input is , accept. If not, copy the input to the second tape and
record in the state that you have processed an even number of characters so far. Now, start the first tape at the
left end and the second tape at the right end . Check that the symbols on the two tapes are the same. If not,
reject. If so, move the first tape head to the right and the second tape head to the left. Also record that you have
processed an odd number and continue, each time using the state to keep track of whether youve seen an even or
odd number of characters so far. When you reach the end of the input tape, accept if youve seen an even number
of characters. Reject if youve seen an odd number. (The even/odd counter is necessary to make sure that you
reject strings such as aba.)
(c) The one tape machine takes time proportional to the square of the length of the input, since for an input of
length n it will make n passes over the input, each of which takes on average n/2 steps. The two tape machine
takes time that's linear in n. It takes n steps to copy, then another n steps to compare.
2. The idea is just to use nondeterminism to guess the location of the boundary between the w and u regions.
Each path will choose a spot, shift the u region to the right, and insert a boundary marker #. Once this is done,
the machine simply checks each region for wwR. If we get a string in L, one of the guessed paths will work.
Homework 19
CS 341 Homework 20
Unrestricted Grammars
1. Find grammars that generate the following languages:
(a) L = {ww : w {a, b}*}
n
(b) L = { a 2 : n 0}
(c) L = { anb2nc3n : n 1}
(d) L = {wR : w is the social security number of a living American citizen}
(e) L = {wcmdn : w {a, b}* and m = the number of a's in w and n equals the number of b's in w}
2. Find a grammar that computes the function f(w) = ww, where w {a, b}*.
Solutions
1. (a) L = {ww : w {a, b}*}
There isnt any way to generate the two ws in the correct order. Suppose we try. Then we could get aSa.
Suppose we want b next. Then we need Sa to become bSab, since the new b has to come after the a thats
already there. That could work. Now we have abSab. Lets say we want a next. Now Sab has to become aSaba.
The problem is that, as the length of the string grows, so does the number of rules well need to cope with all the
patterns we could have to replace. In a finite number of rules, we cant deal with replacing S (which we need to
do to get the next character in the first occurrence of w), and adding a new character that is arbitrarily far away
from S.
The other approach we could try would be have a rule S WW, and then let W generate a string of a's and b's.
But this won't work, since we have no way to control the expansion of the two W's so that they produce the same
thing.
So what we need to do is to generate wwR and then, carefully, reverse the order of the characters in wR. What
well do is to start by erecting a wall (#) at the right end of the string. Then well generate wwR. Then, in a
second phase, well take the characters in the second w and, one at a time, starting with the leftmost, move it right
and then move it past the wall. At each step, we move each character up to the wall and then just over it, but we
dont reverse characters once they get over the wall. The first part of the grammar, which will generate wTwR,
looks like this:
S S1 #
S1 aS1a
S1 bS1b
S1 T
T will mark the left edge of the portion that needs to be reversed.
At this point, we can generate strings such as abbbTbbba#. What we need to do now is to reverse the string of
as and bs that is between T and #. To do that, we let T spin off a marker Q, which we can pass rightward
through the string. As it moves to the right, it will take the first a or b it finds with it. It does this by swapping the
character it is carrying (the one just to the right of it) with the next one to the right. It also moves itself one
square to the right. The four rules marked with * accomplish this. When Qs character gets to the # (the rules
marked **), the a or b will swap places with the # (thus hopping the fence) and the Q will go away. We can keep
doing this until all the as and bs are behind the fence and in the right order. Then the final T# will drop out.
Here are the rules for this phase:
Homework 20
Unrestricted Grammars
T TQ
Qaa aQa
*
Qab bQa
*
Qbb bQb
*
Qba aQb
*
Qa# #a
**
Qb# #b
**
T#
So with R as given above, the grammar G = ({S, S1, #, T, Q, a, b}, {a, b}, R, S}
n
(b) L = { a 2 : n 0}
The idea here is first to generate the first string, which is just a. Then think about the next one. You can derive it
by taking the previous one, and, for every a, write two as. So we get aa. Now to get the third one, we do the
same thing. Each of the two as becomes two and we have four, and so forth. So we need a rule to get us started
and to indicate the possibility of duplication. Then we need rules to actually do the duplication. To make
duplication happen, we need a symbol that gets generated by S indicating the option to repeat. Well use P.
Since duplication can happen an arbitrary number of times, we need P to spin off as many individual duplication
commands as we want. Well use R for that. The one other thing we need is to make sure, if we start a
duplication step, that we finish it. In other words, suppose we currently have aaaa. If we start duplicating the as,
we must duplicate all of them. Otherwise, we might end up with, for example, seven as. So well introduce a
left edge marker, #. Once we fire up a duplication (by creating an R), well only stop (i.e., get rid of R) when R
has made it all the way to the other end of the string (namely the left end since it starts at the right). So we get
the following rules:
S #aP
P lets us start up duplication processes as often as we like.
P
When weve done as many as we want, we get rid of P.
P RP
R will actually do a duplication by moving leftward, duplicating every a it sees.
aR Raa
Actually duplicates one a, and moves R one square to the left so it moves on to the next a
#R #
Get rid of R once it's made it all the way to the left
#
Get of # at the end
So with R as given above, the grammar G = ({S, P, R, #, a, b}, {a, b}, R, S}
(c) L = { anb2nc3n : n 1}
This one is very similar to anbncn. The only difference is that we will churn out b's in pairs and c's in triples each
time we expand S. So we get:
S aBSccc
S aBccc
Ba aB
Bc bbc
Bb bbb
So with R as given above, the grammar G = ({S, B, a, b, c}, {a, b, c}, R, S}
(d) L = {wR : w is the social security number of a living American citizen}
This one is regular. There is a finite number of such social security numbers. So we need one rule for each
number. Each rule is of the form S <valid number>. So with that collection of rules as R, the grammar G =
({S, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, R, S}
(e) L = {wcmdn : w {a, b}* and m = the number of a's in w and n equals the number of b's in w}
The idea here is to generate a c every time we generate an a and to generate a d every time we generate a b. We'll
do this by generating the nonterminals C and D, which we will use to generate c's and d's once everything is in
the right place. Once we've finished generating all the a's and b's we want, the next thing we need to do is to get
Homework 20
Unrestricted Grammars
all the D's to the far right of the string, all the C's next, and then have the a's and b's left alone at the left. We
guarantee that everything must line up that way by making sure that C can't become c and D can't become d
unless things are right. To do this, we require that D can only become d if it's all the way to the right (i.e., it's
followed by #) or it's got a d to its right. Similarly with C. We can do this with the following rules;
S S1#
S1 aS1C
S1 bS1D
S1
DC CD
D# d
Dd dd
C# c
Cd cd
Cc cc
#
So with R as given above, the grammar G = ({S, S1, C, D, #, a, b, c, d}, {a, b, c, d}, R, S}
2. We need to find a grammar that computes the function f(w) = ww. So we'll get inputs such as SabaS. Think of
the grammar we'll build as a procedure, which will work as described below. At any given time, the string that
has just been derived will be composed of the following regions:
<the part of w that
has already been
inserted
copied>
when T
is)
Most of the rules come in pairs, one dealing with an a, the other with b.
SS
Sa aS#a
Unrestricted Grammars
%aa a%a
%ab b%a
%ba a%b
%bb b%b
%aW aW
%bW bW
ST
W
Push a to the right through the copied region in exactly the same way we pushed it through w,
except we're using % rather than # as the pusher. This rule pushes a past a.
Pushes a past b.
Same two rules for pushing b.
"
We've pushed an a all the way to the right boundary, so get rid of %, the pusher.
Same for a pushed b.
All the characters from w have been copied, so they're all to the left of S, which causes S to be
adjacent to the middle marker T. We can now get rid of our special walls. Here we get rid of S
and T.
Gid rid of W. Note that if we do this before we should, there's no way to get rid of %, so any
derivation path that does this will fail to produce a string in {a, b}*.
So with R as given above, the grammar G = ({S, T, W, #, %,a, b}, {a, b}, R, S}
Homework 20
Unrestricted Grammars
CS 341 Homework 21
Undecidability
1. Which of the following problems about Turing machines are solvable, and which are undecidable? Explain
your answers carefully.
(a) To determine, given a Turing machine M, a state q, and a string w, whether M ever reaches state q when
started with input w from its initial state.
(b) To determine, given a Turing machine M and a string w, whether M ever moves its head to the left when
started with input w.
(c) To determine, given two Turing machines, whether one semidecides the complement of the language
semidecided by the other.
(d) To determine, given a Turing machine M, whether the language semidecided by M is finite.
2. Show that it is decidable, given a pushdown automaton M with one state, whether L(M) = *. (Hint: Show
that such an automaton accepts all strings if and only if it accepts all strings of length one.)
3. Which of the following problems about context-free grammars are solvable, and which are undecidable?
Explain your answers carefully.
(a) To determine, given a context-free grammar G, is L(G)?
(b) To determine, given a context-free grammar G, is {} = L(G)?
(c) To determine, given two context-free grammars G1 and G2, is L(G1) L(G2)?
4. The nonrecursive languages L that we have discussed in class all have the property that either L or the
complement of L is recursively enumerable.
(a) Show by a counting argument that there is a language L such that neither L nor its complement is recursively
enumerable.
(b) Give an example of such a language.
Solutions
1. (a) To determine, given a Turing machine M, a state q, and a string w, whether M ever reaches state q when
started with input w from its initial state. This is not solvable. We can reduce H to it. Essentially, if we can tell
whether a machine M ever reaches some state q, then let q be M's halt state (and we can massage M so it has only
one halt state). If it ever gets to q, it must have halted. More formally:
L1 = H =
(?M2) L2 =
{s : "M" "w" "q" : M reaches state q when started with input w from its initial state}
Let ' create, from M the machine M* as follows. Initially M* equals M. Next, a new halting state H is created in
M*. Then, from each state that was a halting state in M, we create transitions in M* such that for all possible
values of the current tape square, M* goes to H. We create no other transitions to H. Notice that M* will end up
in H in precisely the same situations in which M halts.
Now let ("M" "w") = '("M") "w" "H"
So, if M2 exists, then M1 exists. It invokes ' to create M*. Then it passes "M*", "w", and "H" to M2 and returns
whatever M2 returns. But M1 doesn't exist. So neither does M2.
Homework 21
Undecidability
(b) To determine, given a Turing machine M and a string w, whether M ever moves its head to the left when
started with input w. This one is solvable. We will assume that M is deterministic. We can build the deciding
machine D as follows. D starts by simulating the operation of M on w. D keeps track on another tape of each
configuration of M that it has seen so far. Eventually, one of the following things must happen:
1. M moves its head to the left. In this case, we say yes.
2. M is stuck on some square s of the tape. In other words, it is in some state p looking at some square s on the
tape and it has been in this configuration before. If this happens and M didn't go left yet, then M simply
hasn't moved off of s. And it won't from now on, since it's just going to do the same thing at this point as it
did the last time it was in this configuration. So we say no.
3. M moves off the right hand edge of the input w. So it is in some state p looking at a blank. Within k steps (if
k is the number of states in M), M must repeat some state p. If it does this without moving left, then again we
know that it never will. In other words, if the last time it was in the configuration in which it was in state p,
looking at a blank, there was nothing to the right except blanks, and it can't move left, and it is again in that
same situation, it will do exactly the same thing again. So we say no.
(c) To determine, given two Turing machines, whether one semidecides the complement of the language
semidecided by the other. This one is not solvable. We can reduce to it the problem, "Given a Turing machine
M, is there any string at all on which M halts?" (Which is equivalent to "Is L(M) = ?") In the book we show
that this problem is not solvable. What we'll do is to build a machine M* that semidecides the language *,
which is the complement of the language . If we could build a machine to tell, given two Turing machines,
whether one semidecides the complement of the language semidecided by the other, then to find out whether any
given machine M accepts anything, we'd pass M and our constructed M* to this new machine. If it says yes, then
M accepts . If it says no, then M must accept something. Formally:
L1 =
(?M2) L2 =
M accepts strings over some input alphabet . Let ' construct a machine M* that semidecides the language *.
Then ("M") = "M" "'(M)".
So, if M2 exists, then M1 exists. It invokes ' to create M*. Then it passes "M" and "M*" to M2 and returns the
opposite of whatever M2 returns (since M2 says yes if L(M) = and M1 wants to say yes if L(M) ). But M1
doesn't exist. So neither does M2.
(d) To determine, given a Turing machine M, whether the language semidecided by M is finite. This one isn't
solvable. We can reduce to it the problem, "Given a Turing machine M, does M halt on ?" We'll construct,
from M, a new machine M*, which erases its input tape and then simulates M. M* halts on all inputs iff M halts
on . If M doesn't halt on , then M* halts on no inputs. So there are two situations: M* halts on all inputs (i.e.,
L(M*) is infinite) or M* halts on no inputs (i.e., L(M*) is finite). So, if we could build a Turing machine M2 to
decide whether L(M*) is finite or infinite, we could build a machine M1 to decide whether M halts on .
Formally:
{s = "M" M halts on }
L1 =
(?M2) L2 =
Homework 21
{s = "M" is finite}
Undecidability
Homework 21
Undecidability
4. (a) If any language L is recursively enumerable, then there is a Turing machine that semidecides it. Every
Turing machine has a description of finite length. Therefore, the number of Turing machines, and thus the
number of recursively enumerable languages, is countably infinite (since the power set of a countable set is
countably infinite). If, for some language L, its complement is re, then it must have a semideciding Turing
machine, so there is a countably infinite number of languages whose complement is recursively enumerable. But
there is an uncountable number of languages. So there must be languages that are not recursively enumerable
and do not have recursively enumerable complements.
(b) L = {"M" : M halts on the input 0 and M doesn't halt on the input 1}.
The complement of L = {"M" : M doesn't halt on the input 0 or M halts on the input 1}. Neither of these
languages is recursively enumerable because of the doesn't halt piece.
Homework 21
Undecidability
CS 341 Homework 22
Review
1. Given the following language categories:
A:
L is finite.
B:
L is not finite but is regular.
C:
L is not regular but is deterministic context free
D:
L is not deterministic context free but is context free
E:
L is not context free but is Turing decidable
F:
L is not Turing decidable but is Turing acceptable
G:
L is not Turing acceptable
Assign the appropriate category to each of the following languages. Make sure you can justify your answer.
a. _____ {anbkn : k = 1 or k = 2, n 0}
b. _____ {anbkn : k = 0 or k = 1, n 0}
c. _____ {anbncn : n 0}
d. _____ {anbncm : n 0, m 0}
e. _____ {anbn : n 0} a*
f. _____ {anbm : n is prime and m is even}
g. _____ {anbmcm+n : n 0, m 0}
h. _____ {anbmcmn : n 0, m 0}
i. _____ {anbm : n 0, m 0}
j. _____ {xy : x a*, y b*, |x| = |y|}
k. _____ {xy : x a*, y a*, |x| = |y|}
l. _____ {x : x {a, b, c}*, and x has 5 or more a's}
m._____ {"M" : M accepts at least 1 string}
n._____ {"M" : M is a Turing machine that halts on input and |"M"| 1000}
o._____ {"M" : M is a Turing machine with 50 states}
p._____ {"M" : M is a Turing machine such that L(M) = a*}
q. _____ {x : x {A, B, C, D, E, F, G}, and x is the answer you write to this question}
Solutions
a. __D__ {anbkn : k = 1 or k = 2, n 0}
We havent discussed many techniques for proving that a context free language isnt deterministic, so we cant
prove that this one isnt. But essentially the reason this one isnt is that we dont know what to do when we see
bs. Clearly, we can build a pda M to accept this language. As M reads each a, it pushes it onto the stack. When
it starts seeing bs, it needs to start popping as. But theres no way to know, until either it runs out of bs or it
gets to the n+1st b, whether to pop an a for each b or hold back and pop an a for every other b. So M is not
deterministic.
b. __C__ {anbkn : k = 0 or k = 1, n 0}
This one is looks very similar to a, but its different in one key way. Remember that the definition of
deterministic context free is that it is possible to build a deterministic pda to accept L$. So now, we can build a
deterministic pda M as follows: Push each a onto the stack. When we run out of as, the next character will
either be $ (in the case where k = 0) or b (in the case where k = 1). So we know right away which case were
dealing with. If M sees a b, it goes to a state where it pops one b for each a and accepts if it comes out even. If it
sees $, it goes to a state where it clears the stack and accepts.
c. __E__ {anbncn : n 0}
We proved that this is recursive by showing a grammar for it in Lecture Notes 24. We used the pumping theorem
to prove that it isnt context free in Lecture Notes 19.
Homework 22
Review
d. __C__ {anbncm : n 0, m 0}
This one is context free. We need to compare the as to the bs, but the cs are independent. So a grammar to
generate this one is:
SAC
AaAb
A
CcC
C
Its deterministic because we can build a pda that always knows what to do: push as, pop an a for each b, then
simply scan the cs.
e. __C__ {anbn : n 0} a*
This one is equivalent to b, since a* = anb0n.
f. __E__ {anbm : n is prime and m is even}
This one is recursive because we can write an algorithm to determine whether a number is prime and another one
to determine whether a number is even. The proof that it is essentially the same as the one we did in class that an:
n is prime is not context free.
g. __C__ {anbmcm+n : n 0, m 0}
This one is context free. A grammar for it is:
SaSc
SbSc
S
Its deterministic because we can build a deterministic pda M for it: M pushes each a onto its stack. It also
pushes an a for each b. Then, when it starts seeing cs, it pops one a for each c. If it runs out of as and cs at the
same time, it accepts.
h. __E__ {anbmcmn : n 0, m 0}
This one is similar to g, but because the number of cs is equal to the product of n and m, rather than the sum,
there is no way to know how many cs to generate until we know both how many as there are and how many bs.
Clearly we can write an algorithm to do it, so its recursive. To prove this , we need to use the pumping theorem.
Let w = aMbMcMM. Call the as, region 1, the bs region 2, and the cs region 3. Clearly neither v nor y can span
regions since, if they did, wed get a string with letters out of order. So we need only consider the following
possibilities:
(1, 1) The number of cs will no longer be the product of n and m.
(1, 2) The number of cs will no longer be the product of n and m.
(1, 3) Ruled out by |vxy| M.
(2, 2) The number of cs will no longer be the product of n and m.
(2, 3) The number of cs will no longer be the product of n and m.
(3, 3) The number of cs will no longer be the product of n and m.
i. __B__ {anbm : n 0, m 0}
This one is regular. It is defined by the regular expression a*b*. It isnt finite, which we know from the presence
of Kleene star in the regular expression.
j. __C__ {xy : x a*, y b*, |x| = |y|}
This one is equivalent to anbn, which weve already shown is context free and not regular. W showed a
deterministic pda to accept it in Lecture Notes 14.
k. __B__ {xy : x a*, y a*, |x| = |y|}
This one is {w = a*: |w| is even}. Weve shown a simple two state FSM for this one.
l. __B__ {x : x {a, b, c}*, and x has 5 or more a's}
This one also has a simple FSM F that accepts it. F has six states. It simply counts as, up to five. If it ever gets
to 5, it accepts.
Homework 22
Review
Homework 22
Review