Exercises of Compiler
Exercises of Compiler
Exercises of Compiler
1
1.1.1
What is the difference between a compiler and an interpreter?
Answer
A compiler is a program that can read a program in one language - the source language
- and translate it into an equivalent program in another language – the target language
and report any errors in the source program that it detects during the translation
process.
Interpreter directly executes the operations specified in the source program on inputs
supplied by the user.
1.1.2
What are the advantages of: (a) a compiler over an interpreter (b) an interpreter over a
compiler?
Answer
a. The machine-language target program produced by a compiler is usually much faster
than an interpreter at mapping inputs to outputs.
b. An interpreter can usually give better error diagnostics than a compiler, because it
executes the source program statement by statement.
1.1.3
What advantages are there to a language-processing system in which the compiler
produces assembly language rather than machine language?
Answer
The compiler may produce an assembly-language program as its output, because
assembly language is easier to produce as output and is easier to debug.
1.1.4
A compiler that translates a high-level language into another high-level language is
called a source-to-source translator. What advantages are there to using C as a target
language for a compiler?
Answer
For the C language there are many compilers available that compile to almost every
hardware.
1.1.5
Describe some of the tasks that an assembler needs to perform.
Answer
It translates from the assembly language to machine code. This machine code is
relocatable.
a. imperative
b. declarative
c. von Neumann
d. object-oriented
e. functional
f. third-generation
g. fourth-generation
h. scripting
1. C
2. C++
3. Cobol
4. Fortran
5. Java
6. Lisp
7. ML
8. Perl
9. Python
10. VB.
Answer
imperative: C, C++
functional: ML
S -> S S + | S S * | a
Answer
1. S -> S S -> S S + S -> a S + S -> a a + S -> a a + a *
2.
3. L = {Postfix expression consisting of digits, plus and multiple signs}
2.2.2
What language is generated by the following grammars? In each case justify your
answer.
1. S -> 0 S 1 | 0 1
2. S -> + S S | - S S | a
3. S -> S ( S ) S | ε
4. S -> a S b S | b S a S | ε
5. S -> a | S + S | S S | S * | ( S )
Answer
1. L = {0n1n | n>=1}
2. L = {Prefix expression consisting of plus and minus signs}
3. L = {Matched brackets of arbitrary arrangement and nesting, includes ε}
4. L = {String has the same amount of a and b, includes ε}
5. L = {Regular expressions used to describe regular languages} refer to wiki
2.2.3
Which of the grammars in Exercise 2.2.2 are ambiguous?
Answer
1. No
2. No
3. Yes
4. Yes
5. Yes
2.2.4
Construct unambiguous context-free grammars for each of the following languages. In
each case show that your grammar is correct.
Answer
1. E -> E E op | num
2.2.5
1. Show that all binary strings generated by the following grammar have values
divisible by 3. Hint. Use induction on the number of nodes in a parse tree.
2. Does the grammar generate all binary strings with values divisible by 3?
Answer
1. Proof
sum
= Σ n 3 2 n + Σm 9 2 m
1. No. Consider the string "10101", which is divisible by 3, but cannot be derived
from the grammar.
Proof:
Every number divisible by 3 can be written in the form 3k. We will consider k >
0 (though it would be valid to consider k to be an arbitrary integer).
Note that every part of num(11, 1001 and 0) is divisible by 3, if the grammar
could generate all the numbers divisible by 3, we can get a production for binary
k from num's production:
3k = num -> 11 | 1001 | num 0 | num num
k = num/3 -> 01 | 0011 | k 0 | k k
k -> 01 | 0011 | k 0 | k k
It is obvious that any value of k that has more
than 2 consecutive bits set to 1 can
never be produced. This can be confirmed by the example given in the
beginning:
10101 is 3*7, hence, k = 7 = 111 in binary. Because 111 has more than 2
consecutive 1's in binary, the grammar will never produce 21.
2.2.6
Construct a context-free grammar for roman numerals.
Note: we just consider a subset of roman numerals which is less than 4k.
Answer
wikipedia: Roman_numerals
via wikipedia, we can categorize the single roman numerals into 4 groups:
Answer
productions:
expr -> expr + term
| expr - term
| term
term -> term * factor
| term / factor
| factor
factor -> digit | (expr)
translation schemes:
expr -> {print("+")} expr + term
| {print("-")} expr - term
| term
term -> {print("*")} term * factor
| {print("/")} term / factor
| factor
factor -> digit {print(digit)}
| (expr)
2.3.2
Construct a syntax-directed translation scheme that translates arithmetic expressions
from postfix notation into infix notation. Give annotated parse trees for the inputs 95-
2 and 952-.
Answer
productions:
expr -> expr expr +
| expr expr -
| expr expr *
| expr expr /
| digit
translation schemes:
expr -> expr {print("+")} expr +
| expr {print("-")} expr -
| {print("(")} expr {print(")*(")} expr {print(")")} *
| {print("(")} expr {print(")/(")} expr {print(")")} /
| digit {print(digit)}
2.3.3
Construct a syntax-directed translation scheme that translates integers into roman
numerals.
Answer
assistant function:
repeat(sign, times) // repeat('a',2) = 'aa'
translation schemes:
num -> thousand hundred ten digit
{ num.roman = thousand.roman || hundred.roman || ten.roman || digit.roman;
print(num.roman)}
thousand -> low {thousand.roman = repeat('M', low.v)}
hundred -> low {hundred.roman = repeat('C', low.v)}
| 4 {hundred.roman = 'CD'}
| high {hundred.roman = 'D' || repeat('X', high.v - 5)}
| 9 {hundred.roman = 'CM'}
ten -> low {ten.roman = repeat('X', low.v)}
| 4 {ten.roman = 'XL'}
| high {ten.roman = 'L' || repeat('X', high.v - 5)}
| 9 {ten.roman = 'XC'}
digit -> low {digit.roman = repeat('I', low.v)}
| 4 {digit.roman = 'IV'}
| high {digit.roman = 'V' || repeat('I', high.v - 5)}
| 9 {digit.roman = 'IX'}
low -> 0 {low.v = 0}
| 1 {low.v = 1}
| 2 {low.v = 2}
| 3 {low.v = 3}
high -> 5 {high.v = 5}
| 6 {high.v = 6}
| 7 {high.v = 7}
| 8 {high.v = 8}
2.3.4
Construct a syntax-directed translation scheme that trans lates roman numerals into
integers.
Answer
productions:
romanNum -> thousand hundred ten digit
thousand -> M | MM | MMM | ε
hundred -> smallHundred | C D | D smallHundred | C M
smallHundred -> C | CC | CCC | ε
ten -> smallTen | X L | L smallTen | X C
smallTen -> X | XX | XXX | ε
digit -> smallDigit | I V | V smallDigit | I X
smallDigit -> I | II | III | ε
translation schemes:
romanNum -> thousand hundred ten digit {romanNum.v = thousand.v || hundred.v || ten.v
|| digit.v; print(romanNun.v)}
thousand -> M {thousand.v = 1}
| MM {thousand.v = 2}
| MMM {thousand.v = 3}
| ε {thousand.v = 0}
hundred -> smallHundred {hundred.v = smallHundred.v}
| C D {hundred.v = smallHundred.v}
| D smallHundred {hundred.v = 5 + smallHundred.v}
| C M {hundred.v = 9}
smallHundred -> C {smallHundred.v = 1}
| CC {smallHundred.v = 2}
| CCC {smallHundred.v = 3}
| ε {hundred.v = 0}
ten -> smallTen {ten.v = smallTen.v}
| X L {ten.v = 4}
| L smallTen {ten.v = 5 + smallTen.v}
| X C {ten.v = 9}
smallTen -> X {smallTen.v = 1}
| XX {smallTen.v = 2}
| XXX {smallTen.v = 3}
| ε {smallTen.v = 0}
digit -> smallDigit {digit.v = smallDigit.v}
| I V {digit.v = 4}
| V smallDigit {digit.v = 5 + smallDigit.v}
| I X {digit.v = 9}
smallDigit -> I {smallDigit.v = 1}
| II {smallDigit.v = 2}
| III {smallDigit.v = 3}
| ε {smallDigit.v = 0}
2.3.5
Construct a syntax-directed translation scheme that translates postfix arithmetic
expressions into equivalent prefix arithmetic expressions.
Answer
production:
expr -> expr expr op | digit
translation scheme:
expr -> {print(op)} expr expr op | digit {print(digit)}
1. S -> + S S | - S S | a
2. S -> S ( S ) S | ε
3. S -> 0 S 1 | 0 1
Answer
See 2.4.1.1.c, 2.4.1.2.c, and 2.4.1.3.c for real implementations in C.
1) S -> + S S | - S S | a
void S(){
switch(lookahead){
case "+":
match("+"); S(); S();
break;
case "-":
match("-"); S(); S();
break;
case "a":
match("a");
break;
default:
throw new SyntaxException();
}
}
void match(Terminal t){
if(lookahead = t){
lookahead = nextTerminal();
}else{
throw new SyntaxException()
}
}
2) S -> S ( S ) S | ε
void S(){
if(lookahead == "("){
match("("); S(); match(")"); S();
}
}
3) S -> 0 S 1 | 0 1
void S(){
switch(lookahead){
case "0":
match("0"); S(); match("1");
break;
case "1":
// match(epsilon);
break;
default:
throw new SyntaxException();
}
}
1. A comment begins with // and includes all characters until the end of that line.
2. A comment begins with / and includes all characters through the next occurrence
of the character sequence /.
2.6.2
Extend the lexical analyzer in Section 2.6.5 to recognize the relational operators <, <=,
==, ! =, >=, >.
2.6.3
Extend the lexical analyzer in Section 2.6.5 to recognize floating point numbers such as
2., 3.14, and . 5.
Answer
Source code: commit 8dd1a9a
Code snippet(src/lexer/Lexer.java):
// handle comment
if(peek == '/'){
peek = (char) stream.read();
if(peek == '/'){
// single line comment
for(;;peek = (char)stream.read()){
if(peek == '\n'){
break;
}
}
}else if(peek == '*'){
// block comment
char prevPeek = ' ';
for(;;prevPeek = peek, peek = (char)stream.read()){
if(prevPeek == '*' && peek == '/'){
break;
}
}
}else{
throw new SyntaxException();
}
}
// handle word
if(Character.isLetter(peek)){
StringBuffer b = new StringBuffer();
do{
b.append(peek);
peek = (char)stream.read();
}while(Character.isLetterOrDigit(peek));
String s = b.toString();
Word w = words.get(s);
if(w == null){
w = new Word(Tag.ID, s);
words.put(s, w);
}
return w;
}
The first expression is executed before the loop; it is typically used for initializing the
loop index. The second expression is a test made before each iteration of the loop; the
loop is exited if the expression becomes 0. The loop itself can be thought of as the
statement {stmt expr3 ; }. The third expression is executed at the end of each iteration; it
is typically used to increment the loop index. The meaning of the for-statement is similar
to
Answer
class For extends Stmt {
Expr E1;
Expr E2;
Expr E3;
Stmt S;
public For(Expr expr1, Expr expr2, Expr expr3, Stmt stmt){
E1 = expr1;
E2 = expr2;
E3 = expr3;
S = stmt;
}
public void gen(){
E1.gen();
Label start = new Label();
Label end = new Label();
emit("ifFalse " + E2.rvalue().toString() + " goto " + end);
S.gen();
E3.gen();
emit("goto " + start);
emit(end + ":")
}
}
2.8.2
The programming language C does not have a boolean type. Show how a C compiler
might translate an if-statement into three-address code.
Answer
Replace
emit("ifFalse " + E.rvalue().toString() + " goto " + after);
with
emit("ifEqual " + E.rvalue().toString() + " 0 goto " + after);
or
emit("ifEqualZero " + E.rvalue().toString() + " goto " + after);
Exercises for Section 3.1
3.1.1
Divide the following C++ program:
float limitedSquare(x){float x;
/* returns x-squared, nut never more than 100 */
return (x <= -10.0 || x >= 10.0) ? 100 : x*x;
}
into appropriate lexemes, using the discussion of Section 3.1.2 as a guide. Which
lexemes should get associated lexical values? What should those values be?
Answer
<float> <id, limitedSquaare> <(> <id, x> <)> <{>
<float> <id, x>
<return> <(> <id, x> <op,"<="> <num, -10.0> <op, "||"> <id, x> <op, ">="> <num,
10.0> <)> <op, "?"> <num, 100> <op, ":"> <id, x> <op, "*"> <id, x>
<}>
3.1.2
Tagged languages like HTML or XML are different from conventional programming
languages in that the punctuation (tags) are either very numerous (as in HTML) or a
user-definable set (as in XML). Further, tags can often have parameters. Suggest how
to divide the following HTML document:
Here is a photo of <b>my house</b>;
<p><img src="house.gif"/><br/>
see <a href="morePix.html">More Picture</a> if you
liked that one.</p>
into appropriate lexemes. Which lexemes should get associated lexical values, and
what should those values be?
Answer
<text, "Here is a photo of"> <nodestart, b> <text, "my house"> <nodeend, b>
<nodestart, p> <selfendnode, img> <selfendnode, br>
<text, "see"> <nodestart, a> <text, "More Picture"> <nodeend, a>
<text, "if you liked that one."> <nodeend, p>
1. the sets of characters that form the input alphabet (excluding those that may only
appear in character strings or comments)
2. the lexical form of numerical constants, and
3. the lexical form of identifiers,
1. C
2. C++
3. C#
4. Fortran
5. Java
6. Lisp
7. SQL
3.3.2
Describe the languages denoted by the following regular expressions:
1. a(a|b)*a
2. ((ε|a)b*)*
3. (a|b)*a(a|b)(a|b)
4. a*ba*ba*ba*
5. !! (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
Answer
1. String of a's and b's that start and end with a.
2. String of a's and b's.
3. String of a's and b's that the character third from the last is a.
4. String of a's and b's that only contains three b.
5. String of a's and b's that has a even number of a and b.
3.3.3
In a string of length n, how many of the following are there?
1. Prefixes.
2. Suffixes.
3. Proper prefixes.
4. ! Substrings.
5. ! Subsequences.
Answer
1. n+1
2. n+1
3. n-1
4. C(n+1,2) + 1 (need to count epsilon in)
5. Σ(i=0,n) C(n, i)
3.3.4
Most languages are case sensitive, so keywords can be written only one way, and the
regular expressions describing their lexeme is very simple. However, some languages,
like SQL, are case insensitive, so a keyword can be written either in lowercase or in
uppercase, or in any mixture of cases. Thus, the SQL keyword SELECT can also be
written select, Select, or sElEcT, for instance. Show how to write a regular expression
for a keyword in a case insensitive language. Illustrate the idea by writing the
expression for "select" in SQL.
Answer
select -> [Ss][Ee][Ll][Ee][Cc][Tt]
3.3.5
!Write regular definitions for the following languages:
1. All strings of lowercase letters that contain the five vowels in order.
2. All strings of lowercase letters in which the letters are in ascending lexicographic
order.
3. Comments, consisting of a string surrounded by / and /, without an intervening */,
unless it is inside double-quotes (")
4. !! All strings of digits with no repeated digits. Hint: Try this problem first with a few
digits, such as {O, 1, 2}.
5. !! All strings of digits with at most one repeated digit.
6. !! All strings of a's and b's with an even number of a's and an odd number of b's.
7. The set of Chess moves,in the informal notation,such as p-k4 or kbp*qn.
8. !! All strings of a's and b's that do not contain the substring abb.
9. All strings of a's and b's that do not contain the subsequence abb.
Answer
1、
\/\*([^*"]*|".*"|\*+[^/])*\*\/
4、
step2. GNFA
5、
b*(a+b?)*
9、
b* | b*a+ | b*a+ba*
3.3.6
Write character classes for the following sets of characters:
1. The first ten letters (up to "j") in either upper or lower case.
2. The lowercase consonants.
3. The "digits" in a hexadecimal number (choose either upper or lower case for the
"digits" above 9).
4. The characters that can appear at the end of alegitimate English sentence (e.g. ,
exclamation point) .
Answer
1. [A-Ja-j]
2. [bcdfghjklmnpqrstvwxzy]
3. [0-9a-f]
4. [.?!]
3.3.7
Note that these regular expressions give all of the following symbols (operator
characters) a special meaning:
\ " . ^ $ [ ] * + ? { } | /
Their special meaning must be turned off if they are needed to represent themselves in
a character string. We can do so by quoting the character within a string of length one
or more; e.g., the regular expression "**" matches the string ** . We can also get the
literal meaning of an operator character by preceding it by a backslash. Thus, the
regular expression \*\* also matches the string **. Write a regular expression that
matches the string "\.
Answer
\"\\
3.3.9 !
The regular expression r{m, n} matches from m to n occurrences of the pattern r. For
example, a [ 1 , 5] matches a string of one to five a's. Show that for every regular
expression containing repetition operators of this form, there is an equivalent regular
expression without repetition operators.
Answer
r{m,n} is equals to r.(m).r | r.(m + 1).r | ... | r.(n).r
3.3.10 !
The operator ^ matches the left end of a line, and $ matches the right end of a line. The
operator ^ is also used to introduce complemented character classes, but the context
always makes it clear which meaning is intended. For example, ^aeiou*$ matches any
complete line that does not contain a lowercase vowel.
Answer
1. if ^ is in a pair of brakets, and it is the first letter, it means complemented classes,
or it means the left end of a line.
3.4 节的练习
3.4.1
给出识别练习 3.3.2 中各个正则表达式所描述的语言状态转换图。
解答
1. a(a|b)*a
NFA:
DFA:
NFA DFA a
{0} A B
{1,2,3,5,8} B C D
{2,3,4,5,7,8,9} C C D
{2,3,5,6,7,8} D C D
最少状态的 DFA(状态转换图):
合并不可区分的状态 B 和 D
![3 4 1-1](assets/3.4.1-1.gif)
1. ((ε|a)b*)*
2. (a|b)*a(a|b)(a|b)
NFA:
DFA:
<table>
<thead>
<tr>
<th>NFA</th>
<th>DFA</th>
<th>a</th>
<th>b</th>
</tr>
</thead>
<tbody>
<tr>
<td>{0,1,2,4,7}</td>
<td>A</td>
<td>B</td>
<td>C</td>
</tr>
<tr>
<td>{1,2,3,4,6,7,8,9,11}</td>
<td>B</td>
<td>D</td>
<td>E</td>
</tr>
<tr>
<td>{1,2,4,5,6,7}</td>
<td>C</td>
<td>B</td>
<td>C</td>
</tr>
<tr>
<td>{1,2,3,4,6,7,8,9,10,11,13,14,16}</td>
<td>D</td>
<td><b>F</b></td>
<td><b>G</b></td>
</tr>
<tr>
<td>{1,2,4,5,6,7,12,13,14,16}</td>
<td>E</td>
<td><b>H</b></td>
<td><b>I</b></td>
</tr>
<tr>
<td>{1,2,3,4,6,7,8,9,10,11,13,14,15,16,<b>18</b>}</td>
<td><b>F</b></td>
<td><b>F</b></td>
<td><b>G</b></td>
</tr>
<tr>
<td>{1,2,4,5,6,7,12,13,14,16,17,<b>18</b>}</td>
<td><b>G</b></td>
<td><b>H</b></td>
<td><b>I</b></td>
</tr>
<tr>
<td>{1,2,3,4,6,7,8,9,11,15,<b>18</b>}</td>
<td><b>H</b></td>
<td>D</td>
<td>E</td>
</tr>
<tr>
<td>{1,2,4,5,6,7,17,<b>18</b>}</td>
<td><b>I</b></td>
<td>B</td>
<td>C</td>
</tr>
</tbody>
</table>
最少状态的 DFA(状态转换图):
合并不可区分的状态 A 和 C
![3 4 1-3](assets/3.4.1-3.gif)
1. a*ba*ba*ba*
3.4.2
给出识别练习 3.3.5 中各个正则表达式所描述语言的状态转换图。
3.4.3
构造下列串的失效函数。
1. abababaab
2. aaaaaa
3. abbaabb
解答
代码详见:src/failure-function.js
1. [ 0, 0, 1, 2, 3, 4, 5, 1, 2 ]
2. [ 0, 1, 2, 3, 4, 5 ]
3. [ 0, 0, 0, 1, 1, 2, 3 ]
3.4.4 !
对 s 进行归纳,证明图 3-19 的算法正确地计算出了失效函数。
图 3-19:计算关键字 b_1b_2…b_n 的失效函数
01 t = 0;
02 f(1) = 0;
03 for (s = 1; s < n; s ++){
04 while( t > 0 && b_s+1 != b_t+1) t = f(t);
05 if(b_s+1 == b_t+1){
06 t = t + 1;
07 f(s + 1) = t;
08 }else{
09 f(s + 1) = 0;
10 }
11 }
证明
1. 已知 f(1) = 0
2. 在第 1 次 for 循环时,计算 f(2) 的值,当第5行代码 b_2 == b_1 成立时,代码进
入到第7行得出 f(2) = 1,不成立时,则代码进入第9行得出 f(2) = 0。显然,这次
循环正确的计算出了 f(2) 。
3. 假设在第 i-1 次进入循环时,也正确的计算出了 f(i),也有 f(i) = t (无论 t 是大于 0
还是等于 0)
4. 那么在第 1 次进入循环时,分两种情况进行考虑:
i. t == 0
3.4.5 !!
说明图 3-19 中的第 4 行的复制语句 t = f(t) 最多被执行 n 次。进而说明整个算法的时间复
杂度是 O(n),其中 n 是关键字长度。
解答
3.4.6
应用 KMP 算法判断关键字 ababaa 是否为下面字符串的子串:
1. abababaab
2. abababbaa
解答
代码详见:src/failure-function.js
1. true
2. false
3.4.7 !!
说明图 3-20 中的算法可以正确的表示输入关键字是否为一个给定字符串的子串。
s = 0;
for(i = 1; i <= m; i ++){
while(s > 0 && a_i != b_s+1) s = f(s);
if(a_i == b_s+1) s = s + 1;
if(s == n) return "yes";
}
return "no";
3.4.8
假设已经计算得到函数 f 且他的值存储在一个以 s 为下标的数字中,说明图 3-20 中算法
的时间复杂度为 O(m + n)。
解答
1. s1 = b
2. s2 = a
3. 当 k > 2 时, sk = sk-1 sk-2
1. sn 的长度是多少?
2. 构造 s6 的失效函数。
3. 构造 s7 的失效函数。
4. !! 说明任何 sn 的失效函数都可以被表示为:f(1) = f(2) = 0,且对于 2 < j <= |sn|,
f(j) = j - |sk-1|,其中 k 是使得 |sk| <= j + 1 的最大整数。
5. !! 在 KMP 算法中,当我们试图确定关键字 sk 是否出现在字符串 sk+1 中,最多
会连续多少次调用失效函数?
解答
1. 见 维基百科
2. s6 = abaababa
failure = [ 0, 0, 1, 1, 2, 3, 2, 3 ]
3. s7 = abaababaabaab
failure = [ 0, 0, 1, 1, 2, 3, 2, 3, 4, 5, 6, 4, 5 ]
source
3.5.2
Write a Lex program that copies a file, replacing each non empty sequence of white
space by a single blank
3.5.3
Write a Lex program that copies a C program, replacing each instance of the keyword f
loat by double.。
3.5.4 !
Write a Lex program that converts a file to "Pig latin." Specifically, assume the file is a
sequence of words (groups . of letters) separated by whitespace. Every time you
encounter a word:
1. If the first letter is a consonant, move it to the end of the word and then add ay!
2. If the first letter is a vowel, just add ay to the end of the word.
source
3.5.5 !
In SQL, keywords and identifiers are case-insensitive. Write a Lex program that
recognizes the keywords SELECT, FROM, and WHERE (in any combination of capital
and lower-case letters) , and token ID, which for the purposes of this exercise you may
take to be any sequence of letters and digits, beginning with a letter. You need not
install identifiers in a symbol table, but tell how the "install" function would differ from
that described for case-sensitive identifiers as in Fig. 3.23.
source
3.6 Exercises for Section 3.6
3.6.1 !
Figure 3.19 in the exercises of Section 3.4 computes the failure function for the KMP
algorithm. Show how, given that failure function, we can construct, from a keyword
b1b2...bn an n + 1-state DFA that recognizes .*b1b2...bn, where the dot stands for "any
character." Moreover, this DFA can be constructed in O(n) time.
Answer
Take the string "abbaabb" in exercise 3.4.3-3 as example, the failure function is:
n : 1, 2, 3, 4, 5, 6, 7
f(n): 0, 0, 0, 1, 1, 2, 3
3.6.2
Design finite automata (deterministic or nondeterministic) for each of the languages of
Exercise 3.3.5.
3.6.3
For the NFA of Fig. 3.29, indicate all the paths labeled aabb. Does the NFA accept
aabb?
Answer
(0) -a-> (1) -a-> (2) -b-> (2) -b-> ((3))
(0) -a-> (0) -a-> (0) -b-> (0) -b-> (0)
(0) -a-> (0) -a-> (1) -b-> (1) -b-> (1)
(0) -a-> (1) -a-> (1) -b-> (1) -b-> (1)
(0) -a-> (1) -a-> (2) -b-> (2) -b-> (2)
(0) -a-> (1) -a-> (2) -b-> (2) -ε-> (0) -b-> (0)
(0) -a-> (1) -a-> (2) -ε-> (0) -b-> (0) -b-> (0)
3.6.4
Repeat Exercise 3.6.3 for the NFA of Fig. 3.30.
3.6.5
Give the transition tables for the NFA of:
1. Exercise 3.6.3.
2. Exercise 3.6.4.
3. Figure 3.26.
Answer
Table 1
state a b ε
0 {0,1} {0} ∅
1 {1,2} {1} ∅
state a b ε
3 ∅ ∅ ∅
Table 2
state a b ε
0 {1} ∅ {3}
1 ∅ {2} {0}
2 ∅ {3} {1}
3 {0} ∅ {2}
Table 3
state a b ε
0 ∅ ∅ {1,2}
1 {2} ∅ ∅
2 {2} ∅ ∅
3 ∅ {4} ∅
4 ∅ {4} ∅
Answer
1、
Transition table
{0,1,3} A B C
{2} B B ∅
{4} C ∅ C
DFA
2、
Transition table
{0} A B A
{0,1} B C B
NFA State DFA State a b
{0,1,2} C C D
{0,2,3} D C D
DFA
3、
Transition table
{0,1,2,3} A A A
DFA
3.7.2
use Algorithm 3.22 to simulate the NFA's:
1. Fig. 3.29.
2. Fig. 3.30.
on input aabb.
Answer
1. -start->{0}-a->{0,1}-a->{0,1,2}-b->{0,2,3}-b->{0,2,3}
2. -start->{0,1,2,3}-a->{0,1,2,3}-a->{0,1,2,3}-b->{0,1,2,3}-b->{0,1,2,3}
3.7.3
Convert the following regular expressions to deterministic finite automata, using
algorithms 3.23 and 3.20:
1. (a|b)*
2. (a*|b*)*
3. ((ε|a)|b*)*
4. (a|b)*abb(a|b)*
Answer
1、
NFA
Transition table
{0,1,2,3,7} A B C
{1,2,3,4,6,7} B B C
NFA State DFA State a b
{1,2,3,5,6,7} C B C
DFA
2、
NFA
Transition table
{0,1,2,3,4,5,8,9,10,11} A B C
NFA State DFA State a
{1,2,3,4,5,6,8,9,10,11} B B C
{1,2,3,4,5,7,8,9,10,11} C B C
DFA
3、
NFA
Transition table
{0,1,2,3,4,6,7,9,10} A B C
NFA State DFA State a
{1,2,3,4,5,6,7,9,10} B B C
{1,2,3,4,6,7,8,9,10} C B C
DFA
4、
NFA
Transition table
{0,1,2,4,7} A B C
{1,2,3,4,6,7,8} B B D
{1,2,4,5,6,7} C B C
{1,2,4,5,6,7,9} D B E
{1,2,4,5,6,7,10,11,12,14,17} E F G
{1,2,3,4,6,7,8,11,12,13,14,16,17} F F H
NFA State DFA State a
{1,2,4,5,6,7,11,12,13,15,16,17} G F G
{1,2,4,5,6,7,9,11,12,14,15,16,17} H F I
{1,2,4,5,6,7,10,11,12,14,15,16,17} I F G
DFA
Answer
1. NFA
NOTE: this NFA has potential conflict, we can decide the matched lexeme by 1.
take the longest 2. take the first listed.
2. DFA
3.8.2
Repeat Exercise 3.8.1 for tokens consisting of (1) the keyword while, (2) the keyword
when, and (3) identifiers consisting of strings of letters and digits, beginning with a letter.
Answer
1. NFA
2. DFA
bother to paint
3.8.3 !
Suppose we were to revise the definition of a DFA to allow zero or one transition out of
each state on each input symbol (rather than exactly one such transition, as in the
standard DFA definition). Some regular expressions would then have smaller "DFA's"
than they do under the standard definition of a DFA. Give an example of one such
regular expression.
Answer
Take the language defined by regular expression "ab" as the example, assume that the
set of input symbols is {a, b}
Standard DFA
Revised DFA
Obviously, the revised DFA is smaller than the standard DFA.
3.8.4 !!
Design an algorithm to recognize Lex-lookahead patterns of the form rl/r2, where rl and
r2 are regular expressions. Show how your algorithm works on the following inputs:
1. (abcd|abc)/d
2. (a|ab)/ba
3. aa*/a*
1. ?
2. +
Answer
3.9.2
Use Algorithm 3.36 to convert the regular expressions of Exercise 3.7.3 directly to
deterministic finite automata.
Answer
1. (a|b)*
o Syntax tree
<table>
<thead>
<tr>
<th>node n</th>
<th>followpos(n)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>{1, 2, 3}</td>
</tr>
<tr>
<td>2</td>
<td>{1, 2, 3}</td>
</tr>
<tr>
<td>3</td>
<td>∅</td>
</tr>
</tbody>
</table>
Steps
The value of firstpos for the root of the tree is {1, 2, 3}, so this set is the start
state of D. Call this set of states A. We compute Dtran[A, a] and Dtran[A, b].
Among the positions of A, 1 correspond to a, while 2 correspond to b. Thus
Dtran[A, a] = followpos(1) = {1, 2, 3}, Dtran[A, b] = followpos(2) = {1, 2, 3}. Both
the results are set A, so dose not have new state, end the computation.
DFA
1. (a*|b*)*
2. ((ε|a)|b*)*
3. (a|b)*abb(a|b)*
3.9.3 !
We can prove that two regular expressions are equivalent by showing that their
minimum-state DFA's are the same up to renaming of states. Show in this way that the
following regular expressions: (a|b)*, (a*|b*)*, and ((ε|a)b*)* are all equivalent. Note: You
may have constructed the DFA's for these expressions in response to Exercise 3.7.3.
Answer
Refer to the answers of 3.7.3 and 3.9.2-1
3.9.4 !
Construct the minimum-state DFA's for the following regular expressions:
1. (a|b)*a(a|b)
2. (a|b)*a(a|b)(a|b)
3. (a|b)*a(a|b)(a|b)(a|b)
Do you see a pattern?
3.9.5 !!
To make formal the informal claim of Example 3.25, show that any deterministic finite
automaton for the regular expression
(a|b)*a(a|b)...(a|b)
where (a|b) appears n - 1 times at the end, must have at least 2n states. Hint: Observe
the pattern in Exercise 3.9.4. What condition regarding the history of inputs does each
state represent?