Computation Theory: Expressions Languages Grammar

Download as pdf or txt
Download as pdf or txt
You are on page 1of 51

Computation Theory

Chapter 3

Regular expressions , Regular languages and Regular Grammar

1
Chap3: Regular expressions, Regular languages and Regular Grammar
Operations on formal languages (1)
Purpose : We define operators that allow us to combine elementary
languages ​to create more complex languages.

Let L1={abc, cab}, L2={ab, ac}


Union: L=L 1 U L2={abc, cab, ab, ac}
Concatenation: L1={abc, cab}, L2={ab, ac}
L=L1L2={abcab, abcac, cabab, cabac}
Kleene Star: L1={abc, cab}
L1*={abc, cab, abcabc, abccab, cababc, cabcab,
abcabcabc, abcabccab,…}
Other operations: intersection, complement, difference
Chap3: Regular expressions, Regular languages and Regular Grammar 2
Operations on formal languages (2)
Let L1={abc, cab}, L2={ab, ac}
Union: L=L 1 U L2={abc, cab, ab, ac}

Concatenation: L1={abc, cab}, L2={ab, ac}


L=L1L2={abcab, abcac, cabab, cabac}

Kleene Star: L1={abc, cab}


L1*={abc, cab, abcabc, abccab, cababc, cabcab,
abcabcabc, abcabccab,…}
Other operations: intersection, complement, difference

Chap3: Regular expressions, Regular languages and Regular Grammar 3


Elementary Language
• An elementary language has length one

• Exemple : on  ={a, b} the elementary languages are ,


{λ}, {a}, and {b}

• An elementary language is a regular language

• A regular language is a language that can be built from


elementary languages, by using the three operations of
union, concatenation, and Kleene star.

Chap3: Regular expressions, Regular languages and Regular Grammar 4


Regular Languages : Another definition
Let L1= {a}, L2= {b}
▪ the union of two regular languages ({a, b}) is a regular
language
▪ the concatenation of two regular languages ({ab}) is a
regular language
▪ the kleene’s closure of a regular language (a*={a, aa,
aaa, aaaa, …}) is a regular language
• Ø is RL
• elementary language is RL
• L1  L2 is RL
• L1L2 is RL
• L1* is RL Chap3: Regular expressions, Regular languages and Regular Grammar 5
Regular expressions
▪ A useful shorthand for describing regular languages.
▪ Compare to arithmetic expressions, such as (x + 3)/2.
An arithmetic expression is constructed using arithmetic
operators, such as addition and division.
▪A regular expression is constructed using operations on
languages, such as concatenation, union, and Kleene star.
▪ The value of an arithmetic expression is a number.
▪ The value of a regular expression is a language.

Chap3: Regular expressions, Regular languages and Regular Grammar 6


Regular Languages correspond to
Regular Expressions

• RL = Ø RE is Ø (rule 1)
• RL = {} RE =  (rule 2)
• RL = {a} RE = a (rule 3)
• RL = L1  L2 RE = (r1 + r2) (rule 4)
• RL = L1 L2 RE = (r1r2) (rule 5)
• RL = L1* RE = (r1*) (rule 6)

Chap3: Regular expressions, Regular languages and Regular Grammar 7


L=(ab  b)* is it regular?
L={, ab, b, abab, abb, bb, ababab, ababb, abbb,…}

Language Rule Expression


RL is elementary language Rule 3 R1= a R2= b
RL = L1 L2 Rule 5 R3= ab
RL = L1  L2 Rule 4 R4= ab  b
RL = L1* Rule 6 R5= (ab  b)*

From R1 to R5 are all regular languages (regular expressions)

Chap3: Regular expressions, Regular languages and Regular Grammar 8


Hints for writing regular expressions
Assume  = {a, b, c}.

Zero or more a’s: a*


One or more a’s: aa*
Any string at all: (a + b + c)*
Any nonempty string: (a + b + c)(a + b + c)*
Any string that does not contain a: (b + c)*
Any string containing exactly one a: (b + c)*a(b + c)*
Chap3: Regular expressions, Regular languages and Regular Grammar 9
Examples of Regular Expressions
• a*+ b = {λ, a, b, aa, aaa, aaaa, aaaaa, … }
• a*ba* = {w  * : w has exactly one b}
• (a + b)* = any string of a’s and b’s
• (a + b)*aa (a + b)* = {w  * : w contains aa}
• (a + b)*aa (a + b)* + (a + b)*bb (a + b)* =
{w  * : w contains aa or bb}
• (a + λ)b* = {abn : n  0} + {bn : n  0}
• (b + c)*(λ + a)(b + c)*(λ + a)(b + c)*=
all strings containing no more than two a’s:
▪ As with arithmetic expressions, there is an order of precedence for operators The order is:
star closure first, then concatenation, then union.
Chap3: Regular expressions, Regular languages and Regular Grammar 10
Do these strings match the regular
expression?

Regular expression String

(01* + 1) 0101

(a + λ)b b

(ab)*a* λ

(a + b)(ab) bb
Chap3: Regular expressions, Regular languages and Regular Grammar 11
Kleene’s theorem
(1909-1994)

1) For any regular expression r that represents language


L(r), there is a finite automaton that accepts that same
language.

2) For any finite automaton M that accepts language L(M),


there is a regular expression that represents the same
language.

Therefore, the class of languages that can be represented by


regular expressions is equivalent to the class of languages
accepted by finite automata -- the regular languages.
Chap3: Regular expressions, Regular languages and Regular Grammar 12
Kleene’s theorem part 1

NDFA regular
expression

Kleene’s
DFA Theorem part 2

Chap3: Regular expressions, Regular languages and Regular Grammar 13


Kleene’s Theorem part1(1)
Regular expression NDFA

Let r be a regular expression. Then there exists some


nondeterministic regular accepter that accepts L(r).
Consequently, L(r) is a regular language.

Proof strategy: for any regular expression, we show how to


construct an equivalent NDFA.

Chap3: Regular expressions, Regular languages and Regular Grammar 14


Kleene’s Theorem part1(2)
Regular expression NDFA

Base step: Give a NDFA that accepts each of the


elementary languages :

:

{λ} :

{a} for each a   : a

Chap3: Regular expressions, Regular languages and Regular Grammar 15


Kleene’s Theorem part1(3)
Regular expression NDFA
Inductive step: For each of the operations – union,
concatenation and Kleene star – show how to construct an
accepting NDFA.
Closure under union:

M1
λ
λ

λ M2
λ

Chap3: Regular expressions, Regular languages and Regular Grammar 16


Kleene’s Theorem part1(4)
Regular expression NDFA
Closure under concatenation:
M1 M2
λ

Closure under Kleene Star:


λ

λ M1

Chap3: Regular expressions, Regular languages and Regular Grammar 17


Example: how to construct an equivalent
NDFA for R=(abUb)* (1)

1) A NDFA that accepts each of the elementary languages


b
a
R1= a R2 = b
q0 q1 q2 q3

2) Closure under concatenation

a b
λ
R3 = ab q0 q1 q2 q3

a b
R3 = ab q0 q2 q3

Chap3: Regular expressions, Regular languages and Regular Grammar 18


Example: how to construct an equivalent
NDFA for R=(abUb)* (2)

b
3) Closure under union: R3 = ab a
q0 q1 q2

b
R2 = b q3 q4

R4 = R3 U R2 a b
= ab U b λ q0 q1 q2 λ

q5
λ b q6
λ
q3 q4

a b
R4 = R3 U R2 q5 q1
= ab U b
b q6

Chap3: Regular expressions, Regular languages and Regular Grammar 19


Example: how to construct an equivalent
NDFA for R=(abUb)* (3)
4) Closure under Kleene Star:
R5 = (R4)* = (ab U b)* R4 = ab U b
a b
q0 q1
λ - transition elimination :
q3q0 q1 : q3 q1 : a λ
b q2
q3q0 q2: q3 q2 : b
λ a b
q3 q0 q1

b q2

a a
λ q3 q1
λ - transition elimination :
q2q3 q1 : q2 q1 : a b
q2q3 q2 : q2 q2 : b a b q2
b
q3 q1
b
b q2
R5 = (ab U b)*
L5 = {λ, ab, b, abab, abb, bab, bb,
ababab, ababb,…}

Chap3: Regular expressions, Regular languages and Regular Grammar 20


Kleene’s Theorem part2(1)
NDFA Regular expression
(1909-1994)

▪ Let L be a regular language. Then there exists a regular


expression r such that L = L(r).

▪ Any language accepted by a finite automaton can be


represented by a regular expression.

▪ The proof strategy: For any DFA, we show how create


an equivalent regular expression. In other words, we
describe an algorithm for converting any DFA to a regular
expression.
Chap3: Regular expressions, Regular languages and Regular Grammar 21
Kleene’s Theorem part2(2)
NDFA Regular expression

• An Expression diagram is a labeled directed graph


(similar to a finite state diagram) in which
transitions are labeled by regular expressions
• Has a single start state with no incoming transitions
• Has a single accepting state with no outgoing
transitions
• Example: ab

(a+b) a*

Chap3: Regular expressions, Regular languages and Regular Grammar 22


Kleene’s Theorem part2(3)
NDFA Regular Expression
▪ Algorithm for converting a DFA into Regular Expression
• Initial step:
1) Change every transition labeled a, b to (a + b).
2) Add a single start state with an outgoing λ-transition to the current
start state.
3) Add a single final state with incoming λ-transitions from every
previous final state.
• Main step: Until expression diagram has only two states (initial state
and final state), repeat the following:
4) Pick some non-start, non-final state.
5) Remove it from the diagram and re-label transitions with regular
expressions so that the same language is accepted.
Chap3: Regular expressions, Regular languages and Regular Grammar 23
Kleene’s Theorem part2(4)
NDFA Regular expression
▪ The key step is removing states and re-labeling transitions with
regular expressions. Here are some examples of how to do this.
b
a a ab*a

b ab*a
a
b ab*b
a
b
b ab*b
a
a ab*a
Chap3: Regular expressions, Regular languages and Regular Grammar 24
Kleene’s Theorem part2(5)
NDFA Regular expression
▪ The key step is removing states and re-labeling transitions with
regular expressions. Here are some examples of how to do this.
Remove state 2

4 pathes passing by state 2 :

Chap3: Regular expressions, Regular languages and Regular Grammar 25


Example: how to construct an equivalent
R. Expression for A NDFA (1)
a a,b a (a+b)

b λ b λ

(a+b)

a* b λ a*b (a+b)*

Chap3: Regular expressions, Regular languages and Regular Grammar 26


Applications of regular expressions
• Validation
– checking that an input string is in valid format
– example 1: checking format of email address on
WWW entry form
– example 2: UNIX regex command
• Search and selection
– looking for strings that match a certain pattern
– example: UNIX grep command
• Tokenization
– converting sequence of characters (a string) into sequence of
tokens (e.g., keywords, identifiers)
– used in lexical analysis phase of compiler
Chap3: Regular expressions, Regular languages and Regular Grammar 27
3 ways of specifying regular
languages

Regular expressions
describe

accept
DFA NDFA Regular languages

generate
Regular grammars

Chap3: Regular expressions, Regular languages and Regular Grammar 28


Grammar
• A grammar G = (V, T, S, P) consists of the
following quadruple:
– a set V of variables (non-terminal symbols),
including a starting symbol S
– a set T of terminals (same as an alphabet, )
– a start symbol S  V
– a set P of production rules
• Example:
S → aS | A
A→ bA | λ Chap3: Regular expressions, Regular languages and Regular Grammar 29
Derivation
• Strings are “derived” from a grammar
• Example of a derivation
S  aS  aaS  aaA  aabA  aab
• At each step, a nonterminal is replaced by
the sentential form on the right-hand side
of a rule (a sentential form can contain
nonterminals and/or terminals)
• Automata recognize languages; grammars
generate languages
Chap3: Regular expressions, Regular languages and Regular Grammar 30
Context-free grammar
• A grammar is said to be context-free if every
rule has a single non-terminal on the left-hand
side
• This means you can apply the rule in any
context. More complicated languages (such as
English) have context-dependent rules. A
language generated from a context-free
grammar is called a context-free language
Chap3: Regular expressions, Regular languages and Regular Grammar 31
Regular grammar
• A grammar is said to be right-linear if all
productions are of the form A→xB or A→x,
where A and B are variables and x is a string of
terminals
• A grammar is said to be left-linear if all
productions are of the form A→Bx or A→x
• A regular grammar is either right-linear or left-
linear. (λ can appear in RHS)
Chap3: Regular expressions, Regular languages and Regular Grammar 32
Linear grammar
• A grammar can be linear without being right-
or left-linear.
• A linear grammar is a grammar in which at
most one variable can occur on the right side of
any production rule, without any restriction on
the position of the variable.
• Example:
S → aS | A
A→ Ab | λ Chap3: Regular expressions, Regular languages and Regular Grammar 33
Another formalism for regular languages

• Every regular grammar generates a regular


language, and every regular language can be
generated by a regular grammar.
• A regular grammar is a simpler, special-
case of a context-free grammar
• The regular languages are a proper subset of
the context-free languages

Chap3: Regular expressions, Regular languages and Regular Grammar 34


Example 1
• The following grammar generates the
language on  = {a,b} consisting of all
strings with no more than three a’s

S → bS | aA | λ
A → bA | aB | λ
B → bB | aC | λ
C → bC | λ
Chap3: Regular expressions, Regular languages and Regular Grammar 35
Example 2 (1)
• Given a grammar, you should be able to say
what language it generates
• Use set notation to define the language
generated by the following grammar
S → aaSB | λ
B → bB | b

Chap3: Regular expressions, Regular languages and Regular Grammar 36


Example 2 (2)
S → aaSB | λ
B → bB | b
It helps to list some of the strings that can be formed:
S  aaSB  aaB  aab
S  aaSB  aaB  aabB  aabb
S  aaSB  aaB  aabB  aabbB  aabbb
S  aaSB  aaB  aabB  aabbB  aabbbB  aabbbb
S  aaSB  aaaaSBB  aaaaBB  aaaaBb  aaaabb
S  aaSB  aaaaSBB  aaaaBB  aaaaBbB  aaaaBbb 
aaaabbb
What is the pattern?
L = {(aa)nbnb*} Chap3: Regular expressions, Regular languages and Regular Grammar 37
Example 3 (1)
• Given a language, you should be able give a
grammar that generates it.
• For example, give a regular (right-linear)
grammar for the language consisting of all
strings over {a, b, c} that begin with a,
contain exactly two b’s, and end with cc.

Chap3: Regular expressions, Regular languages and Regular Grammar 38


Example 3 (2)
• Give a regular (right-linear) grammar for
the language consisting of all strings over
{a, b, c} that begin with a, contain exactly
two b’s, and end with cc
S → aA
A → bB | aA | cA
B → bC | aB | cB
C → aC | cC | cD
D→c Chap3: Regular expressions, Regular languages and Regular Grammar 39
Theorem 1(1)
G NDFA

▪ Every language generated by a right-


linear grammar is regular.
▪ Proof:
– Specify a procedure for automatically
constructing an NDFA that mimics the
derivations of a right-linear grammar.

Chap3: Regular expressions, Regular languages and Regular Grammar 40


Theorem 1(2)
G NDFA
▪ Justification:
• The sentential forms produced by a right linear grammar
have exactly one variable, which occurs as the rightmost
symbol.

• Assume that our grammar has a production rule


D → dE and that, during the derivation of a string, there is a
step wcD  wcdE

• We can construct an NDFA which has states D and E, and


an arc labeled d from D to E.

• NDFAs can be converted to DFAs.

• All languages accepted by DFAs are regular.


Chap3: Regular expressions, Regular languages and Regular Grammar 41
Theorem 1(3)
G NDFA

▪ Construction:
• For each variable Vi in the grammar there will be a state in
the automaton labeled Vi.

• The initial state of the automaton will be labeled V0 and


will correspond to the S variable in the grammar.

• For each production rule Vi → a1a2…amVj the automaton


will have transitions such that δ*(Vi → a1a2…am) = Vj
• For each production rule Vi → a1a2…am the automaton will
have transitions such that δ*(Vi → a1a2…am) = Vfinal

Chap3: Regular expressions, Regular languages and Regular Grammar 42


Theorem 1(4)
G NDFA

Construct an NDFA that accepts the language


generated by the grammar:
S → aA convert to: V0 → aV1
A →abS | b V1 → abV0 | b

a b
V0 V1 Vf

b a

Chap3: Regular expressions, Regular languages and Regular Grammar 43


Theorem 2(1)
DFA G

• Every regular language can be generated


by a right-linear grammar.
• Proof:
Specify a procedure for automatically
constructing a right-linear grammar from
the DFA.

Chap3: Regular expressions, Regular languages and Regular Grammar 44


Theorem 2(2)
DFA G

• Given a regular language L, let M = (Q, , δ,


q0, F) be a DFA that accepts L. Let Q = {q0,
q1, …, qn} and  = {a1, a2, …, am}.
• Construct the grammar G = (V, T, S, P) with:
V = {q0, q1, …, qn}
T = {a1, a2, …, am}
S = q0.
P = {} initially.
• P, the set of production rules, is constructed
as follows:
Chap3: Regular expressions, Regular languages and Regular Grammar 45
Theorem 2(3)
DFA G

• For each transition


δ(qi, aj) = qk
in the transition table of M, add to P the
production:
qi → ajqk
• If qk is in F, then add to P the production:
qk → λ

Chap3: Regular expressions, Regular languages and Regular Grammar 46


Theorem 2(4)
DFA G

• Example: Construct a right-linear grammar for


the language L = L(aab*a)
• First, build an NDFA for L:

a a a
q0 q1 q2 qf

Chap3: Regular expressions, Regular languages and Regular Grammar 47


Theorem 2(5)
DFA G

q0 a q1 a q2 a qf

P = {} initially.
Add to P a rule for each transition in the NDFA:
q0 → aq1
q1 → aq2
q2 → bq2
q2 → aqf
Since qf is in F, add to P the production:
qf → λ Chap3: Regular expressions, Regular languages and Regular Grammar 48
Theorem 2(6)
DFA G

q0 a q1 a q2 a qf

Now P = You can convert to normal grammar notation:


{q0 → aq1 S → aA
q1 → aq2 A → aB
q2 → bq2 B → bB
q2 → aqf B → aC
qf → λ } C→λ

Chap3: Regular expressions, Regular languages and Regular Grammar 49


Theorem 3
▪ A language L is regular if and only if there exists a
regular grammar G such that L = L(G).

▪ Proof:
Combine our definition of regular grammars,
which includes the statement, “A regular grammar is
right-linear, with theorem 2.

Chap3: Regular expressions, Regular languages and Regular Grammar 50


Questions

Chap3: Regular expressions, Regular 51


languages and Regular Grammar

You might also like