Compiler Lecture 4
Compiler Lecture 4
Compiler Lecture 4
Today’s lecture:
Algorithms to derive a DFA from a RE.
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 1
Design
An Example (recognise r0 through r31)
Register r ((0|1|2) (Digit|) | (4|5|6|7|8|9) | (3|30|31))
S2 digit S3
0|1|2
S0 r S1 3 S5 0|1 S6
4|5|6|7|8|9 S4
• Same code skeleton
State ‘r’ 0,1 2 3 4,5,…,9
0 1 - - - - (Lecture 3, slide 15)
1 - 2 2 5 4 can be used!
2(final) - 3 3 3 3 • Different (bigger)
3(final) - - - - - transition table.
4(final) - - - - - • Our Deterministic
5(final) - 6 - - - Finite Automaton
6(final) - - - - - (DFA) recognises
only r0 through r31.
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 2
Design
Non-deterministic Finite Automata
What about a RE such as (a | b)*abb?
a|b
S0 S1 a S2 b S3 b S4
b
S1 S2 S2 b S3
S0 S5 S0 S1 S6 S7
c c S
S3 S4 S4 5
Second: NFA for b|c Third: NFA for (b|c)*
Of course, a human
Fourth: NFA would design a simpler
for a(b|c)* S4 b S5 one… But, we can
automate production of
S0 a S1 S2 S3 S8 S9 the complex one...
S6 c S7 b|c
S0 a S1
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 6
Design
NFA to DFA: two key functions
• move(si,a): the (union of the) set of states to which there is a transition
on input symbol a from state si
-closure(si): the (union of the) set of states reachable by from si.
Example (see the diagram below):
-closure(3)={3,4,7}; -closure({3,10})={3,4,7,10};
• move(-closure({3,10}),a)=8;
10
a
4 7 8
3
The Algorithm:
• start with the -closure of s0 from NFA.
• Do for each unmarked state until there are no unmarked states:
– for each symbol take their -closure(move(state,symbol))
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 7
Design
NFA to DFA with subset construction
Initially, -closure is the only state in Dstates and it is unmarked.
while there is an unmarked state T in Dstates
mark T
for each input symbol a
U:=-closure(move(T,a))
if U is not in Dstates then add U as unmarked to Dstates
Dtable[T,a]:=U
• Dstates (set of states for DFA) and Dtable form the DFA.
• Each state of DFA corresponds to a set of NFA states that NFA
could be in after reading some sequences of input symbols.
• This is a fixed-point computation.
• A=-closure(0)={0,1,2,4,7}
• for each input symbol (that is, a and b):
– B=-closure(move(A,a))=-closure({3,8})={1,2,3,4,6,7,8}
– C=-closure(move(A,b))=-closure({5})={1,2,4,5,6,7}
– Dtable[A,a]=B; Dtable[A,b]=C
• B and C are unmarked. Repeating the above we end up with:
– C={1,2,4,5,6,7}; D={1,2,4,5,6,7,9}; E={1,2,4,5,6,7,10}; and
– Dtable[B,a]=B; Dtable[B,b]=D; Dtable[C,a]=B; Dtable[C,b]=C;
Dtable[D,a]=B; Dtable[D,b]=E; Dtable[E,a]=B; Dtable[E,b]=C;
no more unmarked sets at this point!
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 9
Design
Result of applying subset construction
Transition table:
state a b
A B C
B B D
C B C
D B E
E(final) B C
b
C b
b
A a D E
a b b
B a a
a
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 10
Design
Another NFA version of the same RE
a|b
a b b
N0 N1 N2 N3 N4
Note:
• iteration 3 adds nothing new, so the algorithm stops.
• state E contains N4 (final state)
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 11
Design
Enough theory… Let’s conclude!
• We presented algorithms to construct a DFA from a RE.
• The DFA is not necessarily the smallest possible.
• Using an (automatically generated) transition table and
the standard code skeleton (Lecture 3, slide 15) we can
build a lexical analyser from regular expressions
automatically. But, the size of the table can be large...
• Next time:
– DFA minimisation; Practical considerations; Lexical
Analysis wrap-up.
• Reading: Aho2 Sections 3.6-3.7; Aho1 pp. 147-166;
Grune 2.1.6.1-2.1.6.6 (different style); Hunter 3.3 (very
condensed); Cooper1 2.4-2.4.3
CSE 359 - Compiler Masud Ibn Afjal, CSE, HSTU 12
Design