04 Regular Expressions & FAs
04 Regular Expressions & FAs
04 Regular Expressions & FAs
Course 2316410
Lecture No. 4
Quick Review
Last class:
— The scanner is the first stage in the front end
— Specifications can be expressed using regular expressions
— Build tables and code from a DFA
1
Quick Review of Regular Expressions
• All strings of 1s and 0s ending in a 1
( 0 | 1 )* 1
2
Example (from Lab 1)
Consider the problem of recognizing ILOC register names
Register r (0|1|2| … | 9) (0|1|2| … | 9)*
(0|1|2| … 9)
r (0|1|2| … 9)
S0 S1 S2
accepting state
3
Example (continued)
DFA operation
• Start in state S0 & make transitions on each input character
• DFA accepts a word x iff x leaves it in a final state (S2 )
(0|1|2| … 9)
r (0|1|2| … 9)
S0 S1 S2
accepting state
4
Example (continued)
To be useful, the recognizer must be converted into code
0,1,2,3,4, All
r 5,6,7,8,9 others
Char next character
State s0
s0 s1 se se
while (Char EOF)
State (State,Char)
s1 se s2 se
Char next character
if (State is a final state ) s2 se s2 se
then report success
else report failure
se se se se
0,1,2,3,4, All
Char next character r 5,6,7,8,9 others
State s0
while (Char EOF) s0 s1 se se
Next (State,Char) start error error
Act (State,Char)
perform action Act s1 se s2 se
State Next error add error
Char next character
s2 se s2 se
if (State is a final state ) error add error
then report success
se se se se
else report failure
error error error
7
Tighter register specification (continued)
(0|1|2| … 9)
S2 S3
0,1,2
r 3 0,1
S0 S1 S5 S6
4,5,6,7,8,9
S4
8
Tighter register specification (continued)
s5 se s6 se se se se The extra
precision costs
s6 se se se se se se us table space,
not time
se se se se se se se
(0|1|2| … 9)
0,1,2
r 3 0,1
S0 S1 S5 S6
4,5,6,7,8,9
9
S4
Tighter register specification (continued)
State 4,5,6
r 0,1 2 3 other
Action 7,8,9
1
0 e e e e e
start
2 2 5 4
1 e e
add add add add
3 3 3 3 e
2 e
add add add add exit
e
3,4 e e e e e
exit
6 e
5 e e e e
add exit
x
6 e e e e e
exit
e e e e e e e
(0|1|2| … 9)
S2 S3
11
Non-deterministic Finite Automata
What about an RE such as ( a | b )* abb ?
b a
a b b
S0 S1 S2 S3
a
a
a|b
a b b
S0 S1 S2 S3 S4
NFA
NFA becomes an NFA
14
Relationship between NFAs and DFAs
DFA is a special case of an NFA
• DFA has no transitions
• DFA’s transition function is single-valued
• Same rules will work
Scanner generators
• Lex and Flex work along these lines
• Algorithms are well-known and well-understood
• Key issue is interface to parser (define all parts of speech)
• You could build one in a weekend!
16
Where are we? Why are we doing this?
RE NFA (Thompson’s construction)
• Build an NFA for each term
• Combine them with -moves
NFA DFA (Subset construction)
• Build the simulation
The Cycle of Constructions
DFA Minimal DFA
• Hopcroft’s algorithm minimal
RE NFA DFA
DFA
DFA RE
• All pairs, all paths problem
• Union together paths from s0 to a final state
17
RE NFA using Thompson’s Construction
Key idea
• NFA pattern for each symbol & each operator
• Join them with moves in precedence order
a a b
S0 S1 S0 S1 S3 S4
a
S1 S2
S0 S5 S0
S1
a
S3
S4
S3
b
S4
NFA for a*
NFA for a | b
18
Example of Thompson’s Construction
Let’s try a ( b | c )*
1. a, b, & c S0
a
S1 S0
b
S1 S0
c
S1
b
S2
2. b | c S1
S0 S5
c
S3 S4
3. ( b | c )* b
S2 S3
S0
S1
S7
S6
c
S4 S5
19
Example of Thompson’s Construction (con’t)
4. a ( b | c )*
b
S4 S5
S0
a
S1
S2 S3 S8 S9
c
S6 S7
b|c
But, we can automate production of
a
S0 S1 the more complex NFA version ...
20
Where are we? Why are we doing this?
RE NFA (Thompson’s construction)
• Build an NFA for each term
• Combine them with -moves
NFA DFA (subset construction)
• Build the simulation
The Cycle of Constructions
DFA Minimal DFA
• Hopcroft’s algorithm minimal
RE NFA DFA
DFA
DFA RE
• All pairs, all paths problem
• Union together paths from s0 to a final state
21
NFA DFA with Subset Construction
Need to build a simulation of the NFA
25
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
26
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9
q1, q2, q3 , q5, q 8, q9, q7, q8, q 9,
s1 q4, q6, q 9 none q 3 , q4 , q 6 q3, q4, q 6
q5, q8, q9 ,
s2 none s2 s3
q3 , q 4 , q 6
27
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
28
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
29
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
30
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
31
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
32
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
33
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
34
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
35
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
q7 is the core
state of s3
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
36
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
q5 is the core
state of s2
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
37
NFA DFA with Subset Construction
a ( b | c )* : b
q4 q5
a
q0 q1 q2 q3 q8 q9
q6 c
q7
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 q 1 , q2 , q 3 ,
q 4 , q6 , q 9 none none
States -closure(Move(s,*))
DFA NFA a b c
s0 q0 s1 none none
q1, q2, q3 ,
s1 none s2 s3
q4, q6, q 9
q5, q8, q9 ,
s2 none s2 s3
q3 , q 4 , q 6
b
a b c
s2 s0 s1 none none
b
a s1 none s2 s3
s0 s1 b c
c s2 none s2 s3
s3
c s3 none s2 s3
41
Alternative Approach to DFA Minimization
The Intuition
• The subset construction merges prefixes in the NFA
a b c
s1 s2 s3 s4
abc | bc | ad
b c
s0 s5 s6 s7
s6
d
b Subset construction eliminates -
c
s1 s2 s3 transitions and merges the paths for a.
a
It leaves duplicate tails, such as bc.
b c
s0 s4 s5
42
Alternative Approach to DFA Minimization
Idea: use the subset construction twice
• For an NFA N
— Let reverse(N) be the NFA constructed by making initial states
final (& vice-versa) and reversing the edges
— Let subset(N) be the DFA that results from applying the
subset construction to N
— Let reachable(N) be N after removing all states that are not
reachable from the initial state
• Then,
reachable(subset(reverse[reachable(subset(reverse(N))]))
43
Alternative Approach to DFA Minimization
Step 1
• The subset construction on reverse(NFA) merges suffixes in
original NFA
a b c
s1 s2 s3 s4
b c Reversed NFA
s0 s5 s6 s7 s11
a d
s8 s9 s10
a b c
s1 s2 s3
subset(reverse(NFA))
a d
s8 s9 s11
44
Alternative Approach to DFA Minimization
Step 2
• Reverse it again & use subset to merge prefixes …
a b c
s1 s2 s3
s0 Reverse it, again
d s11
a
s8 s9
d
a s2
b
And subset it, again
b c
s0 s3 s11
The Cycle of Constructions
Minimal DFA
minimal
RE NFA DFA
DFA
45
Brzozowski