Patterns, Automata, and Regular Expressions
RE <-> DFA

[Diagram: RE --(Thompson's Construction)--> NFA with ε-moves --(Subset Construction)--> DFA --(State-Elimination Construction)--> RE]

Automated RE->NFA Construction

• Build NFA for each term, connect them with ε-moves
– Concatenate – ab
  [Diagram: s0 --a--> s1 --ε--> s2 --b--> s3]
– Union – a|b
  [Diagram: s0 --ε--> s1 --a--> s2 --ε--> s5 and s0 --ε--> s3 --b--> s4 --ε--> s5]
– Kleene Closure – a*
  [Diagram: s0 --ε--> s1 --a--> s2 --ε--> s3, with ε-moves s2 --> s1 (repeat) and s0 --> s3 (bypass)]
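To make the construction above concrete, here is a minimal Python sketch of Thompson's construction. It is illustrative only: the fragment representation and function names are my own assumptions, not from the slides. Each NFA fragment is a (start, accept, edges) triple, and the three combinators mirror the concatenation, union, and Kleene-closure rules.

# Minimal sketch of Thompson's construction (illustrative; representation and
# names are assumptions, not from the slides).
# An NFA fragment is (start, accept, edges); edges maps state -> list of
# (label, state) pairs, and a label of None stands for an ε-move.
import itertools

_ids = itertools.count()

def new_state():
    return next(_ids)

def symbol(a):
    # NFA for a single character a:  start --a--> accept
    s, t = new_state(), new_state()
    return s, t, {s: [(a, t)], t: []}

def concat(f, g):
    # ab: ε-move from f's accept state to g's start state
    fs, ft, fe = f
    gs, gt, ge = g
    edges = {**fe, **ge}
    edges[ft] = edges[ft] + [(None, gs)]
    return fs, gt, edges

def union(f, g):
    # a|b: new start and accept states with ε-moves into and out of both fragments
    fs, ft, fe = f
    gs, gt, ge = g
    s, t = new_state(), new_state()
    edges = {**fe, **ge, s: [(None, fs), (None, gs)], t: []}
    edges[ft] = edges[ft] + [(None, t)]
    edges[gt] = edges[gt] + [(None, t)]
    return s, t, edges

def star(f):
    # a*: new start/accept states; ε-moves allow repeating or bypassing the fragment
    fs, ft, fe = f
    s, t = new_state(), new_state()
    edges = {**fe, s: [(None, fs), (None, t)], t: []}
    edges[ft] = edges[ft] + [(None, fs), (None, t)]
    return s, t, edges

# Example: build the NFA for (a|b)* from the three rules above.
start, accept, edges = star(union(symbol('a'), symbol('b')))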
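The Subset Construction step named in the RE <-> DFA pipeline is not spelled out on these slides. The following rough Python sketch (my own representation: edges map each state to (label, state) pairs, with None as an ε-move) shows the ε-closure/move computation that turns an NFA into a DFA.

# Sketch of the subset construction (illustrative; not from the slides).
def eps_closure(states, edges):
    # All states reachable from `states` using only ε-moves.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, t in edges.get(s, []):
            if label is None and t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, a, edges):
    # States reachable from `states` on one occurrence of symbol a.
    return {t for s in states for label, t in edges.get(s, []) if label == a}

def subset_construction(start, accept, edges, alphabet):
    # Each DFA state is a frozenset of NFA states.
    dfa_start = eps_closure({start}, edges)
    dfa_trans, worklist, seen = {}, [dfa_start], {dfa_start}
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            T = eps_closure(move(S, a, edges), edges)
            if not T:
                continue
            dfa_trans[(S, a)] = T
            if T not in seen:
                seen.add(T)
                worklist.append(T)
    dfa_accepts = {S for S in seen if accept in S}
    return dfa_start, dfa_trans, dfa_accepts

# Tiny example: NFA for a|b with an ε-move fan-out from state 0.
edges = {0: [(None, 1), (None, 3)], 1: [('a', 2)], 3: [('b', 4)],
         2: [(None, 5)], 4: [(None, 5)], 5: []}
print(subset_construction(0, 5, edges, 'ab'))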
Automated DFA->RE

for i = 1 to N
    for j = 1 to N
        R_ij^0 = { a | δ(s_i, a) = s_j }
        if (i == j)
            R_ij^0 = R_ij^0 ∪ { ε }

for k = 1 to N
    for i = 1 to N
        for j = 1 to N
            R_ij^k = R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1) ∪ R_ij^(k-1)

The RE for the DFA's language is the union of R_1j^N over all final states s_j ∈ F (with s_1 the start state). A code sketch of this construction follows the next slide.

Regular Expression and DFA

• Identifier
    letter → (a|b|c| … |z|A|B|C| … |Z)
    digit → (0|1|2|3|4|5|6|7|8|9)
    id → letter (letter|digit)*

[DFA diagram: S0 --letter--> S1; S1 --letter--> S1; S1 --digit--> S1; S1 --other--> S2 (accept); S0 --digit--> S3 (error); S0 --other--> S3 (error)]

next_state table (rows are character classes, columns are states; S2 = accept, S3 = error):

  class     S0   S1   S2   S3
  letter    S1   S1   --   --
  digit     S3   S1   --   --
  other     S3   S2   --   --

• To change the language, we just change the tables
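Here is the code sketch of the R_ij^k construction promised above. It is illustrative only: the function names and the string-based RE representation are my own assumptions; the loops themselves follow the slide, for a DFA given as a dict from (state, symbol) to state with states numbered 1..N and start state 1.

# Illustrative sketch of the R_ij^k construction (my names, not the slides').
def dfa_to_re(dfa, n, finals, alphabet):
    EPS = 'ε'

    def cat(x, y):                      # concatenation; None represents the empty set
        if x is None or y is None:
            return None
        if x == EPS:
            return y
        if y == EPS:
            return x
        if '|' in x:
            x = '(' + x + ')'
        if '|' in y:
            y = '(' + y + ')'
        return x + y

    def alt(x, y):                      # union
        if x is None:
            return y
        if y is None:
            return x
        return x + '|' + y

    # Base case: R_ij^0 = symbols taking s_i directly to s_j (plus ε when i == j).
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            syms = [a for a in alphabet if dfa.get((i, a)) == j]
            if i == j:
                syms.append(EPS)
            R[i][j] = '|'.join(syms) if syms else None

    # Inductive step: allow intermediate states s_1 .. s_k.
    for k in range(1, n + 1):
        loop = EPS if R[k][k] in (None, EPS) else '(' + R[k][k] + ')*'
        newR = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = cat(cat(R[i][k], loop), R[k][j])
                newR[i][j] = alt(through_k, R[i][j])
        R = newR

    # The language is the union of R_1j^N over all final states s_j.
    result = None
    for j in sorted(finals):
        result = alt(result, R[1][j])
    return result

# Example: DFA over {a, b} accepting strings that end in b (state 2 is final).
dfa = {(1, 'a'): 1, (1, 'b'): 2, (2, 'a'): 1, (2, 'b'): 2}
print(dfa_to_re(dfa, 2, {2}, 'ab'))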
Table-Driven Recognizer/Scanner

char = next_char();
state = S0;                 /* code for S0 */
token_value = "";           /* empty string */
while (not done) {
    class = char_class[char];
    state = next_state[class, state];
    switch (state) {
        case S2:            /* accept state */
            token_type = identifier;
            done = true; break;
        case S3:            /* error */
            token_type = error;
            done = true; break;
        default:            /* building an id */
            token_value = token_value + char;
            char = next_char(); break;
    }
}
return(token_type);

Error Recovery

• E.g., illegal character
• What should the scanner do?
– Report the error
– Try to correct it?
• Error correction techniques
– Minimum distance corrections
– Hard token recovery
– Skip until match
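As a runnable companion to the pseudocode above, here is a hedged Python sketch of the same table-driven recognizer. The table contents follow the slides; the function names and the skip-until-whitespace error recovery (one form of the "skip until match" idea) are my own assumptions.

# Table-driven identifier recognizer (illustrative sketch; tables follow the slides).
def char_class(c):
    if c.isalpha(): return 'letter'
    if c.isdigit(): return 'digit'
    return 'other'

# next_state[class][state]; S2 = accept, S3 = error, None = no transition.
next_state = {
    'letter': {'S0': 'S1', 'S1': 'S1', 'S2': None, 'S3': None},
    'digit':  {'S0': 'S3', 'S1': 'S1', 'S2': None, 'S3': None},
    'other':  {'S0': 'S3', 'S1': 'S2', 'S2': None, 'S3': None},
}

def next_token(text, pos):
    # Return (token_type, token_value, new_pos) for the token starting at pos.
    state, value = 'S0', ''
    while pos < len(text):
        state = next_state[char_class(text[pos])][state]
        if state == 'S2':                    # accept: the 'other' char is not consumed
            return 'identifier', value, pos
        if state == 'S3':                    # error: skip until whitespace ("skip until match")
            while pos < len(text) and not text[pos].isspace():
                pos += 1
            return 'error', value, pos
        value += text[pos]                   # still building an id
        pos += 1
    return ('identifier', value, pos) if state == 'S1' else ('error', value, pos)

print(next_token("count1 = 0", 0))   # ('identifier', 'count1', 6)

Changing the recognized language only means changing char_class and next_state, as the slides note.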
Scanner Summary
• Break up input into tokens
• Catch lexical errors
• Difficulty affected by language design
• Issues
– Input buffering
– Lookahead
– Error recovery
• Scanner generators
– Tokens specified by regular expressions
– Regular expression -> DFA
– Highly efficient in practice
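To illustrate the scanner-generator idea (tokens specified by regular expressions, compiled into one efficient recognizer), here is a minimal sketch using Python's re module. The token names and patterns are illustrative assumptions, not from the slides.

# Minimal regular-expression-based tokenizer in the spirit of a scanner generator.
import re

TOKEN_SPEC = [
    ('NUMBER', r'\d+'),
    ('ID',     r'[A-Za-z][A-Za-z0-9]*'),
    ('OP',     r'[+\-*/=]'),
    ('SKIP',   r'\s+'),
    ('ERROR',  r'.'),          # any other single character is a lexical error
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pat})' for name, pat in TOKEN_SPEC))

def tokenize(text):
    for m in MASTER.finditer(text):
        kind = m.lastgroup
        if kind == 'SKIP':
            continue
        yield kind, m.group()

print(list(tokenize("x1 = y + 42")))
# [('ID', 'x1'), ('OP', '='), ('ID', 'y'), ('OP', '+'), ('NUMBER', '42')]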