Computation Theory: Expressions Languages Grammar

Computation Theory
Chapter 3
Regular expressions , Regular languages and Regular Grammar
1
Chap3: Regular expressions, Regular languages and Regular Grammar
Operations on formal languages (1)
Purpose : We define operators that allow us to combine elementary
languages to create more complex languages.
Let L1={abc, cab}, L2={ab, ac}

Union: L=L 1 U L2={abc, cab, ab, ac}
Concatenation: L1={abc, cab}, L2={ab, ac}
L=L1L2={abcab, abcac, cabab, cabac}
Kleene Star: L1={abc, cab}
L1*={abc, cab, abcabc, abccab, cababc, cabcab,
abcabcabc, abcabccab,…}
Other operations: intersection, complement, difference
Chap3: Regular expressions, Regular languages and Regular Grammar 2
Operations on formal languages (2)
Let L1={abc, cab}, L2={ab, ac}
Union: L=L 1 U L2={abc, cab, ab, ac}
Concatenation: L1={abc, cab}, L2={ab, ac}

L=L1L2={abcab, abcac, cabab, cabac}
Kleene Star: L1={abc, cab}

L1*={abc, cab, abcabc, abccab, cababc, cabcab,
abcabcabc, abcabccab,…}
Other operations: intersection, complement, difference

Elementary Language
• An elementary language has length one
• Exemple : on  ={a, b} the elementary languages are ,

{λ}, {a}, and {b}
• An elementary language is a regular language
• A regular language is a language that can be built from

elementary languages, by using the three operations of
union, concatenation, and Kleene star.

Regular Languages : Another definition
Let L1= {a}, L2= {b}
▪ the union of two regular languages ({a, b}) is a regular
language
▪ the concatenation of two regular languages ({ab}) is a
regular language
▪ the kleene’s closure of a regular language (a*={a, aa,
aaa, aaaa, …}) is a regular language
• Ø is RL
• elementary language is RL
• L1  L2 is RL
• L1L2 is RL
• L1* is RL Chap3: Regular expressions, Regular languages and Regular Grammar 5
Regular expressions
▪ A useful shorthand for describing regular languages.
▪ Compare to arithmetic expressions, such as (x + 3)/2.
An arithmetic expression is constructed using arithmetic
operators, such as addition and division.
▪A regular expression is constructed using operations on
languages, such as concatenation, union, and Kleene star.
▪ The value of an arithmetic expression is a number.
▪ The value of a regular expression is a language.

Regular Languages correspond to
Regular Expressions
• RL = Ø RE is Ø (rule 1)
• RL = {} RE =  (rule 2)
• RL = {a} RE = a (rule 3)
• RL = L1  L2 RE = (r1 + r2) (rule 4)
• RL = L1 L2 RE = (r1r2) (rule 5)
• RL = L1* RE = (r1*) (rule 6)

L=(ab  b)* is it regular?
L={, ab, b, abab, abb, bb, ababab, ababb, abbb,…}
Language Rule Expression

RL is elementary language Rule 3 R1= a R2= b
RL = L1 L2 Rule 5 R3= ab
RL = L1  L2 Rule 4 R4= ab  b
RL = L1* Rule 6 R5= (ab  b)*
From R1 to R5 are all regular languages (regular expressions)

Hints for writing regular expressions
Assume  = {a, b, c}.
Zero or more a’s: a*

One or more a’s: aa*
Any string at all: (a + b + c)*
Any nonempty string: (a + b + c)(a + b + c)*
Any string that does not contain a: (b + c)*
Any string containing exactly one a: (b + c)*a(b + c)*
Examples of Regular Expressions
• a*+ b = {λ, a, b, aa, aaa, aaaa, aaaaa, … }
• a*ba* = {w  * : w has exactly one b}
• (a + b)* = any string of a’s and b’s
• (a + b)*aa (a + b)* = {w  * : w contains aa}
• (a + b)*aa (a + b)* + (a + b)*bb (a + b)* =
{w  * : w contains aa or bb}
• (a + λ)b* = {abn : n  0} + {bn : n  0}
• (b + c)*(λ + a)(b + c)*(λ + a)(b + c)*=
all strings containing no more than two a’s:
▪ As with arithmetic expressions, there is an order of precedence for operators The order is:
star closure first, then concatenation, then union.
Do these strings match the regular
expression?
Regular expression String
(01* + 1) 0101
(a + λ)b b
(ab)*a* λ
(a + b)(ab) bb
Kleene’s theorem
(1909-1994)
1) For any regular expression r that represents language

L(r), there is a finite automaton that accepts that same
language.
2) For any finite automaton M that accepts language L(M),

there is a regular expression that represents the same
language.
Therefore, the class of languages that can be represented by

regular expressions is equivalent to the class of languages
accepted by finite automata -- the regular languages.
Kleene’s theorem part 1
NDFA regular
expression
Kleene’s
DFA Theorem part 2

Kleene’s Theorem part1(1)
Regular expression NDFA
Let r be a regular expression. Then there exists some

nondeterministic regular accepter that accepts L(r).
Consequently, L(r) is a regular language.
Proof strategy: for any regular expression, we show how to

construct an equivalent NDFA.

Base step: Give a NDFA that accepts each of the

elementary languages :
:
{λ} :
{a} for each a   : a

Inductive step: For each of the operations – union,
concatenation and Kleene star – show how to construct an
accepting NDFA.
Closure under union:
M1
λ
λ
λ M2
λ

Closure under concatenation:
M1 M2
λ
Closure under Kleene Star:

λ
λ M1

Example: how to construct an equivalent
NDFA for R=(abUb)* (1)
1) A NDFA that accepts each of the elementary languages

b
a
R1= a R2 = b
q0 q1 q2 q3
2) Closure under concatenation
a b
λ
R3 = ab q0 q1 q2 q3
a b
R3 = ab q0 q2 q3

b
3) Closure under union: R3 = ab a
q0 q1 q2
b
R2 = b q3 q4
R4 = R3 U R2 a b
= ab U b λ q0 q1 q2 λ
q5
λ b q6
λ
q3 q4
a b
R4 = R3 U R2 q5 q1
= ab U b
b q6

4) Closure under Kleene Star:
R5 = (R4)* = (ab U b)* R4 = ab U b
a b
q0 q1
λ - transition elimination :
q3q0 q1 : q3 q1 : a λ
b q2
q3q0 q2: q3 q2 : b
λ a b
q3 q0 q1
b q2
a a
λ q3 q1
λ - transition elimination :
q2q3 q1 : q2 q1 : a b
q2q3 q2 : q2 q2 : b a b q2
b
q3 q1
b
b q2
R5 = (ab U b)*
L5 = {λ, ab, b, abab, abb, bab, bb,
ababab, ababb,…}

NDFA Regular expression
(1909-1994)
▪ Let L be a regular language. Then there exists a regular

expression r such that L = L(r).
▪ Any language accepted by a finite automaton can be

represented by a regular expression.
▪ The proof strategy: For any DFA, we show how create

an equivalent regular expression. In other words, we
describe an algorithm for converting any DFA to a regular
expression.
• An Expression diagram is a labeled directed graph

(similar to a finite state diagram) in which
transitions are labeled by regular expressions
• Has a single start state with no incoming transitions
• Has a single accepting state with no outgoing
transitions
• Example: ab
(a+b) a*

NDFA Regular Expression
▪ Algorithm for converting a DFA into Regular Expression
• Initial step:
1) Change every transition labeled a, b to (a + b).
2) Add a single start state with an outgoing λ-transition to the current
start state.
3) Add a single final state with incoming λ-transitions from every
previous final state.
• Main step: Until expression diagram has only two states (initial state
and final state), repeat the following:
4) Pick some non-start, non-final state.
5) Remove it from the diagram and re-label transitions with regular
expressions so that the same language is accepted.
▪ The key step is removing states and re-labeling transitions with
regular expressions. Here are some examples of how to do this.
b
a a ab*a
b ab*a
a
b ab*b
a
b
b ab*b
a
a ab*a
▪ The key step is removing states and re-labeling transitions with
regular expressions. Here are some examples of how to do this.
Remove state 2
4 pathes passing by state 2 :

R. Expression for A NDFA (1)
a a,b a (a+b)
b λ b λ
(a+b)
a* b λ a*b (a+b)*

Applications of regular expressions
• Validation
– checking that an input string is in valid format
– example 1: checking format of email address on
WWW entry form
– example 2: UNIX regex command
• Search and selection
– looking for strings that match a certain pattern
– example: UNIX grep command
• Tokenization
– converting sequence of characters (a string) into sequence of
tokens (e.g., keywords, identifiers)
– used in lexical analysis phase of compiler
3 ways of specifying regular
languages
Regular expressions
describe
accept
DFA NDFA Regular languages
generate
Regular grammars

Grammar
• A grammar G = (V, T, S, P) consists of the
following quadruple:
– a set V of variables (non-terminal symbols),
including a starting symbol S
– a set T of terminals (same as an alphabet, )
– a start symbol S  V
– a set P of production rules
• Example:
S → aS | A
A→ bA | λ Chap3: Regular expressions, Regular languages and Regular Grammar 29
Derivation
• Strings are “derived” from a grammar
• Example of a derivation
S  aS  aaS  aaA  aabA  aab
• At each step, a nonterminal is replaced by
the sentential form on the right-hand side
of a rule (a sentential form can contain
nonterminals and/or terminals)
• Automata recognize languages; grammars
generate languages
Context-free grammar
• A grammar is said to be context-free if every
rule has a single non-terminal on the left-hand
side
• This means you can apply the rule in any
context. More complicated languages (such as
English) have context-dependent rules. A
language generated from a context-free
grammar is called a context-free language
Regular grammar
• A grammar is said to be right-linear if all
productions are of the form A→xB or A→x,
where A and B are variables and x is a string of
terminals
• A grammar is said to be left-linear if all
productions are of the form A→Bx or A→x
• A regular grammar is either right-linear or left-
linear. (λ can appear in RHS)
Linear grammar
• A grammar can be linear without being right-
or left-linear.
• A linear grammar is a grammar in which at
most one variable can occur on the right side of
any production rule, without any restriction on
the position of the variable.
• Example:
S → aS | A
A→ Ab | λ Chap3: Regular expressions, Regular languages and Regular Grammar 33
Another formalism for regular languages
• Every regular grammar generates a regular

language, and every regular language can be
generated by a regular grammar.
• A regular grammar is a simpler, special-
case of a context-free grammar
• The regular languages are a proper subset of
the context-free languages

Example 1
• The following grammar generates the
language on  = {a,b} consisting of all
strings with no more than three a’s
S → bS | aA | λ
A → bA | aB | λ
B → bB | aC | λ
C → bC | λ
Example 2 (1)
• Given a grammar, you should be able to say
what language it generates
• Use set notation to define the language
generated by the following grammar
S → aaSB | λ
B → bB | b

Example 2 (2)
S → aaSB | λ
B → bB | b
It helps to list some of the strings that can be formed:
S  aaSB  aaB  aab
S  aaSB  aaB  aabB  aabb
S  aaSB  aaB  aabB  aabbB  aabbb
S  aaSB  aaB  aabB  aabbB  aabbbB  aabbbb
S  aaSB  aaaaSBB  aaaaBB  aaaaBb  aaaabb
S  aaSB  aaaaSBB  aaaaBB  aaaaBbB  aaaaBbb 
aaaabbb
What is the pattern?
L = {(aa)nbnb*} Chap3: Regular expressions, Regular languages and Regular Grammar 37
Example 3 (1)
• Given a language, you should be able give a
grammar that generates it.
• For example, give a regular (right-linear)
grammar for the language consisting of all
strings over {a, b, c} that begin with a,
contain exactly two b’s, and end with cc.

Example 3 (2)
• Give a regular (right-linear) grammar for
the language consisting of all strings over
{a, b, c} that begin with a, contain exactly
two b’s, and end with cc
S → aA
A → bB | aA | cA
B → bC | aB | cB
C → aC | cC | cD
D→c Chap3: Regular expressions, Regular languages and Regular Grammar 39
Theorem 1(1)
G NDFA
▪ Every language generated by a right-

linear grammar is regular.
▪ Proof:
– Specify a procedure for automatically
constructing an NDFA that mimics the
derivations of a right-linear grammar.

Theorem 1(2)
G NDFA
▪ Justification:
• The sentential forms produced by a right linear grammar
have exactly one variable, which occurs as the rightmost
symbol.
• Assume that our grammar has a production rule

D → dE and that, during the derivation of a string, there is a
step wcD  wcdE
• We can construct an NDFA which has states D and E, and

an arc labeled d from D to E.
• NDFAs can be converted to DFAs.
• All languages accepted by DFAs are regular.

Theorem 1(3)
G NDFA
▪ Construction:
• For each variable Vi in the grammar there will be a state in
the automaton labeled Vi.
• The initial state of the automaton will be labeled V0 and

will correspond to the S variable in the grammar.
• For each production rule Vi → a1a2…amVj the automaton

will have transitions such that δ*(Vi → a1a2…am) = Vj
• For each production rule Vi → a1a2…am the automaton will
have transitions such that δ*(Vi → a1a2…am) = Vfinal

Theorem 1(4)
G NDFA
Construct an NDFA that accepts the language

generated by the grammar:
S → aA convert to: V0 → aV1
A →abS | b V1 → abV0 | b
a b
V0 V1 Vf
b a

Theorem 2(1)
DFA G
• Every regular language can be generated

by a right-linear grammar.
• Proof:
Specify a procedure for automatically
constructing a right-linear grammar from
the DFA.

Theorem 2(2)
DFA G
• Given a regular language L, let M = (Q, , δ,

q0, F) be a DFA that accepts L. Let Q = {q0,
q1, …, qn} and  = {a1, a2, …, am}.
• Construct the grammar G = (V, T, S, P) with:
V = {q0, q1, …, qn}
T = {a1, a2, …, am}
S = q0.
P = {} initially.
• P, the set of production rules, is constructed
as follows:
Theorem 2(3)
DFA G
• For each transition

δ(qi, aj) = qk
in the transition table of M, add to P the
production:
qi → ajqk
• If qk is in F, then add to P the production:
qk → λ

Theorem 2(4)
DFA G
• Example: Construct a right-linear grammar for

the language L = L(aab*a)
• First, build an NDFA for L:
a a a
q0 q1 q2 qf

Theorem 2(5)
DFA G
q0 a q1 a q2 a qf
P = {} initially.
Add to P a rule for each transition in the NDFA:
q0 → aq1
q1 → aq2
q2 → bq2
q2 → aqf
Since qf is in F, add to P the production:
qf → λ Chap3: Regular expressions, Regular languages and Regular Grammar 48
Theorem 2(6)
DFA G
q0 a q1 a q2 a qf
Now P = You can convert to normal grammar notation:

{q0 → aq1 S → aA
q1 → aq2 A → aB
q2 → bq2 B → bB
q2 → aqf B → aC
qf → λ } C→λ

Theorem 3
▪ A language L is regular if and only if there exists a
regular grammar G such that L = L(G).
▪ Proof:
Combine our definition of regular grammars,
which includes the statement, “A regular grammar is
right-linear, with theorem 2.

Questions
Chap3: Regular expressions, Regular 51

languages and Regular Grammar

Computation Theory: Expressions Languages Grammar

Uploaded by

Copyright:

Available Formats

Computation Theory: Expressions Languages Grammar

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Computation Theory: Expressions Languages Grammar

Uploaded by

Copyright:

Available Formats

Computation Theory

Regular expressions , Regular languages and Regular Grammar

Let L1={abc, cab}, L2={ab, ac}

Concatenation: L1={abc, cab}, L2={ab, ac}

Kleene Star: L1={abc, cab}

Chap3: Regular expressions, Regular languages and Regular Grammar 3

• Exemple : on  ={a, b} the elementary languages are ,

• An elementary language is a regular language

• A regular language is a language that can be built from

Chap3: Regular expressions, Regular languages and Regular Grammar 4

Chap3: Regular expressions, Regular languages and Regular Grammar 6

Chap3: Regular expressions, Regular languages and Regular Grammar 7

Language Rule Expression

From R1 to R5 are all regular languages (regular expressions)

Chap3: Regular expressions, Regular languages and Regular Grammar 8

Zero or more a’s: a*

Regular expression String

1) For any regular expression r that represents language

2) For any finite automaton M that accepts language L(M),

Therefore, the class of languages that can be represented by

Chap3: Regular expressions, Regular languages and Regular Grammar 13

Let r be a regular expression. Then there exists some

Proof strategy: for any regular expression, we show how to

Chap3: Regular expressions, Regular languages and Regular Grammar 14

Base step: Give a NDFA that accepts each of the

{a} for each a   : a

Chap3: Regular expressions, Regular languages and Regular Grammar 15

Chap3: Regular expressions, Regular languages and Regular Grammar 16

Closure under Kleene Star:

Chap3: Regular expressions, Regular languages and Regular Grammar 17

1) A NDFA that accepts each of the elementary languages

2) Closure under concatenation

Chap3: Regular expressions, Regular languages and Regular Grammar 18

Chap3: Regular expressions, Regular languages and Regular Grammar 19

Chap3: Regular expressions, Regular languages and Regular Grammar 20

▪ Let L be a regular language. Then there exists a regular

▪ Any language accepted by a finite automaton can be

▪ The proof strategy: For any DFA, we show how create

• An Expression diagram is a labeled directed graph

Chap3: Regular expressions, Regular languages and Regular Grammar 22

4 pathes passing by state 2 :

Chap3: Regular expressions, Regular languages and Regular Grammar 25

Chap3: Regular expressions, Regular languages and Regular Grammar 26

Chap3: Regular expressions, Regular languages and Regular Grammar 28

• Every regular grammar generates a regular

Chap3: Regular expressions, Regular languages and Regular Grammar 34

Chap3: Regular expressions, Regular languages and Regular Grammar 36

Chap3: Regular expressions, Regular languages and Regular Grammar 38

▪ Every language generated by a right-

Chap3: Regular expressions, Regular languages and Regular Grammar 40

• Assume that our grammar has a production rule

• We can construct an NDFA which has states D and E, and

• NDFAs can be converted to DFAs.

• All languages accepted by DFAs are regular.

• The initial state of the automaton will be labeled V0 and

• For each production rule Vi → a1a2…amVj the automaton

Chap3: Regular expressions, Regular languages and Regular Grammar 42

Construct an NDFA that accepts the language

Chap3: Regular expressions, Regular languages and Regular Grammar 43