Automata Chapter 1 2 &3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 36

Chapter 1: Introduction

Definition of Automata.
An Automaton is defined as a system where energy, materials and
information’s are transformed, transmitted and used for performing some
functions without direct help of man. Example automatic packing
machine and automatic photo printing machines.

Fig. shows model discrete automaton


Input. At each of the discrete instants of time t1,t2,…., input values
I1,I2…., each of which can take a finite numbers of fixed values from input
alphabets , are applied to input side of the model shown in above fig.
Output. O1,O2,….OQ are the outputs of the model, each of which can
take finite numbers of fixed values from an output.
States. At any instants of time automaton can be in one of the states
Q1,Q2,…Qn.
State relation. The next stage of an automaton at any instant of time is
determined by the present state and the present input.
Output Relation. Output is related to either state only or to both the
input and state.

The term "Automata" is derived from the Greek word "αὐτόματα" which means "self-
acting". An automaton (Automata in plural) is an abstract self-propelled computing
device which follows a predetermined sequence of operations automatically.
An automaton with a finite number of states is called a Finite Automaton (FA) or Finite
State Machine (FSM).
Alphabets and strings

Alphabet
 Definition − An alphabet is any finite set of symbols.
 Example − ∑ = {a, b, c, d} is an alphabet set where ‘a’, ‘b’, ‘c’, and ‘d’ are symbols.
String
 Definition − A string is a finite sequence of symbols taken from ∑.
 Example − ‘cabcad’ is a valid string on the alphabet set ∑ = {a, b, c, d}
Length of a String
1
 Definition − It is the number of symbols present in a string. (Denoted by |S|).
 Examples −
o If S = ‘cabcad’, |S|= 6
o If |S|= 0, it is called an empty string (Denoted by λ or ε)

Languages and Grammars

 Definition − A language is a subset of ∑* for some alphabet ∑. It can be finite or


infinite.
 Example − If the language takes all possible strings of length 2 over ∑ = {a, b}, then
L = { ab, aa, ba, bb }
Automata

Finite Automaton can be classified into two types −


 Deterministic Finite Automaton (DFA)
 Non-deterministic Finite Automaton (NDFA / NFA)

Finite automata, Deterministic and Non-deterministic finite automata


Description of a Deterministic Finite Automaton.

In DFA, for each input symbol, one can determine the state to which the machine will move.
Hence, it is called Deterministic Automaton. As it has a finite number of states, the machine is
called Deterministic Finite Machine or Deterministic Finite Automaton.
A DFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −
 Q is a finite set of states.

 ∑ is a finite set of symbols called the alphabet.


 δ is the transition function where δ: Q × ∑ → Q
 q0 is the initial state from where any input is processed (q 0 ∈ Q).
 F is a set of final state/states of Q (F ⊆ Q).
Graphical Representation of a DFA
A DFA is represented by digraphs called state diagram.

 The vertices represent the states.


 The arcs labeled with an input alphabet show the transitions.
 The initial state is denoted by an empty single incoming arc.
 The final state is indicated by double circles.

2
Example
Let a deterministic finite automaton be →

 Q = {a, b, c},
 ∑ = {0, 1},
 q0 = {a},
 F = {c}, and
Transition function δ as shown by the following table −

Present State Next State for Input 0 Next State for Input 1

a a b

b c a

c b c

Its graphical representation would be as follows −

Transition Systems
A transition graph or a transition system is a finite directed labeled graph in
which each vertex or node represents a state and the directed edges
indicate the transition of a state and the edges are labeled with
input/output. An initial state represented by a circle with an arrow pointing
towards it, the final state by two concentric circles, and the other states are
represented by one circle.
The transitions frorn one internal state to another are governed bv the transition function
δ, for example, for

Then the dfa is in state q0 and the current input symbol is a, the dfa will go into state q1.
Properties of Transition Function.
Property 1. δ(q, ) = q in a finite automaton. This means the state of the
3
system can be changed only by an input symbol.
Property 2. For all strings W ∑* and input symbols a.
δ(q, aw) = δ ( δ(q, a), w)
δ(q, wa) = δ ( δ(q, w), a)
This property gives the state after the automaton reads the first symbol
of a string aw and the state after the automaton reads a prefix of the string
wa.
Example : prove that for any transition function δ and for any two input
string x and y.
δ(q, xy) = δ ( δ(q, x), y)  2.1
Proof: By the method of induction on |y|. i.e. length of y. when
|y|=1,y=a.
= δ(q, xa) = δ ( δ(q, x), a) by property 2.
Consider the finite state machine whose ={0,1}, K={q0,q1,q2, q3},q0 in
K is a initial state, final state is q0. Transition function δ is given in the
table. Give the entire sequence of state for the input string 110101.

δ(q0, 110101) = δ(q1, 10101) = δ(q0, 0101)


= δ(q2, 101) = δ(q3, 01)
= δ(q1, 1) = δ(q0, )
= q0
Hence state sequence is given below
1 1 0 1 0 1
q0  q1  q0  q2  q3  q1 q0.
Initial state is q0. Input is either 0 or 1. Start from q0, read 1 and go to q1, from q1,
read 1 and go to q0, from q0, read 0 and go to q2, from q2, read 1 and go to
q3, from q3, read 0 and go to q1, from q1, read 1 and go to q0, State q0 is
a final state.

Let consider the DFA ={a,b}, K={ q0 },q0 in K is a initial state, final state is q0.

Empty string is also accepted by this machine. If initial


state is a final state the empty string will be accepted
by the machine. Set of string starting with a’s and b’s
accepted by this machine.

4
Languages L(m) = { x/x  * }
Let consider the DFA ={a,b}, K={q0,q1,D}, q0 in K is an initial state, final
state is q0.

Example
Let a deterministic finite automaton be →

 Q = {a, b, c},
 ∑ = {0, 1},
 q0 = {a},
 F = {c}, and
Transition function δ as shown by the following table −

Present State Next State for Input 0 Next State for Input 1

a a b

b c a

c b c

Its graphical representation would be as follows −

Non-Deterministic Finite State Automata.

In NFA, for a particular input symbol, the machine can move to any combination of the states in
the machine. In other words, the exact state to which the machine moves cannot be determined.
Hence, it is called Non-deterministic Automaton. As it has finite number of states, the machine is
called Non-deterministic Finite Machine or Non-deterministic Finite Automaton.

An NFA can be represented by a 5-tuple (Q, ∑, δ, q0, F) where −


 Q is a finite set of states.

5
 ∑ is a finite set of symbols called the alphabets.
 δ is the transition function where δ: Q × ∑ → 2
Q

(Here the power set of Q (2Q) has been taken because in case of NDFA, from
a state, transition can occur to any combination of Q states)
 q0 is the initial state from where any input is processed (q 0 ∈ Q).
 F is a set of final state/states of Q (F ⊆ Q).
Let see the example below, which describes the nondeterministic automaton since
there are two transitions labeled with ‘a’ out of qo.

Let a non-deterministic finite automaton be →

 Q = {a, b, c}
 ∑ = {0, 1}
 q0 = {a}
 F = {c}
The transition function δ as shown below −

Present State Next State for Input 0 Next State for Input 1

a a, b b

b c a, c

c b, c c

Its graphical representation would be as follows −

6
Consider the NFA whose ={a,b}, Q={ q0,q1,q2 },q0 in Q is initial state,
final state is q2, transition function δ is given in the table. To check
whether the given string aaabb is accepted by the machine or not.
Initial state q0. Two choose are there reading
a from q0 either it is go to q0 or q1. Similarly
two choose are there reading b from q1 it is go
either to q1 or q2. Final state is q2. The string
sequence of a’s followed by sequence of b’s
will be accepted by the machine. The string
starting with b and ending a are not accepted
by the machine. The state sequence is q0
q0q0 q1  q1  q2 for the input aaabb.

a b The initial state q0. Reading a from q0 and go to q0, again a


 q0 q0,q1 Ø
q1 q1,q2
is reading from q0 and go to q0, again a is reading from q0
Ø
* q2 Ø Ø and go to q1, again b reading from q1 and go to q1 and
finally b reading from q1 and go to q2,

Example: Ending of Strings

An NFA that accepts all binary strings that end with 101.

Design a NFA for the transition table as given below:

7
The transition diagram can be drawn by using the mapping function as given in the table.

Design an NFA with ∑ = {0, 1} in which double '1' is followed by double '0'.

Now before double 1, there can be any string of 0 and 1. Similarly, after double 0, there can be
any string of 0 and 1.
The Equivalence of DFA and NFA
Let, M = (Q, ∑, δ, q0, F) is an NFA which accepts the language L(M). There should be
equivalent DFA denoted by M' = (Q', ∑', q0', δ', F') such that L(M) = L(M'). Any string is
accepted by NFA can be accepted by a DFA. Method is called as subset.

Example 1:
Convert the given NFA to DFA.

8
For the given transition diagram we will first construct the transition table.

State 0 1

→q0 {q0, q1} {q1}

*q1 ϕ {q0, q1}


Now we will obtain δ' transition for state q0.

δ'([q0], 0) = {q0, q1}


= [q0, q1] (new state generated)
δ'([q0], 1) = {q1} = [q1]

The δ' transition for state q1 is obtained as:

δ'([q1], 0) = ϕ
δ'([q1], 1) = [q0, q1]
Now we will obtain δ' transition on [q0, q1].
δ'([q0, q1], 0) = δ(q0, 0) ∪ δ(q1, 0)
= {q0, q1} ∪ ϕ
= {q0, q1}
= [q0, q1]
Similarly,

δ'([q0, q1], 1) = δ(q0, 1) ∪ δ(q1, 1)


= {q1} ∪ {q0, q1}
= {q0, q1}
= [q0, q1]
As in the given NFA, q1 is a final state, then in DFA wherever, q1 exists that state
becomes a final state. Hence in the DFA, final states are [q1] and [q0, q1]. Therefore set
of final states F = {[q1], [q0, q1]}.

The transition table for the constructed DFA will be:

State 0 1

→[q0] [q0, q1] [q1]

*[q1] ϕ [q0, q1]

*[q0, q1] [q0, q1] [q0, q1]


The Transition diagram will be:

9
Example 2:
Convert the given NFA to DFA.

For the given transition diagram we will first construct the transition table.

State 0 1

→q0 q0 q1

q1 {q1, q2} q1

*q2 q2 {q1, q2}

Now we will obtain δ' transition for state q0.


δ'([q0], 0) = [q0]
δ'([q0], 1) = [q1]
The δ' transition for state q1 is obtained as:
δ'([q1], 0) = [q1, q2] (new state generated)
δ'([q1], 1) = [q1]
The δ' transition for state q2 is obtained as:
δ'([q2], 0) = [q2]
δ'([q2], 1) = [q1, q2]
Now we will obtain δ' transition on [q1, q2].
δ'([q1, q2], 0) = δ(q1, 0) ∪ δ(q2, 0)
= {q1, q2} ∪ {q2}
= [q1, q2]
δ'([q1, q2], 1) = δ(q1, 1) ∪ δ(q2, 1)

10
= {q1} ∪ {q1, q2}
= {q1, q2}
= [q1, q2]
The state [q1, q2] is the final state as well because it contains a final state q2. The transition
table for the constructed DFA will be:

State 0 1

→[q0] [q0] [q1]

[q1] [q1, q2] [q1]

*[q2] [q2] [q1, q2]

*[q1, q2] [q1, q2] [q1, q2]


The Transition diagram will be:

11
Chapter 2: Regular Expression and Regular languages

Regular expressions
Regular Expressions are an algebraic way to describe languages

A Regular Expression can be recursively defined as follows −


 ε is a Regular Expression indicates the language containing an empty string. (L (ε) = {ε})

 φ is a Regular Expression denoting an empty language. (L (φ) = { })

 x is a Regular Expression where L = {x}

 If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression


denoting the language L(Y), then

o X + Y is a Regular Expression corresponding to the language L(X) ∪


L(Y) where L(X+Y) = L(X) ∪ L(Y).

o X . Y is a Regular Expression corresponding to the language L(X) .


L(Y) where L(X.Y) = L(X) . L(Y)

o R* is a Regular Expression corresponding to the language L(R*)where L(R*) =


(L(R))*

 Some RE Examples

Regular Expressions Regular Set

(0 + 10*) L = { 0, 1, 10, 100, 1000, 10000, … }

(0*10*) L = {1, 01, 10, 010, 0010, …}

(0 + ε)(1 + ε) L = {ε, 0, 1, 01}

(a+b)* Set of strings of a’s and b’s of any length including the null string. So
L = { ε, a, b, aa , ab , bb , ba, aaa…….}

(a+b)*abb Set of strings of a’s and b’s ending with the string abb. So L = {abb,
aabb, babb, aaabb, ababb, …………..}

(11)* Set consisting of even number of 1’s including empty string, So L= {ε,
11, 1111, 111111, ……….}

(aa)*(bb)*b Set of strings consisting of even number of a’s followed by odd

12
number of b’s , so L = {b, aab, aabbb, aabbbbb, aaaab, aaaabbb,
…………..}

(aa + ab + ba + bb)* String of a’s and b’s of even length can be obtained by concatenating
any combination of the strings aa, ab, ba and bb including null, so L =
{aa, ab, ba, bb, aaab, aaba, …………..}

Example 1:

Write the regular expression for the language accepting all combinations of a's,
over the set ∑ = {a}

Solution:

All combinations of a's means a may be zero, single, double and so on. If a is
appearing zero times, that means a null string. That is we expect the set of {ε, a, aa,
aaa, ....}. So we give a regular expression for this as

R = a*

Example 2. Write the regular expression for the language accepting all the string
containing any number of a's and b's.

Solution:

The regular expression will be:

R = (a + b)*

This will give the set as L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination
of a and b.

The (a + b)* shows any combination with a and b even a null string.

Example 3. Write the regular expression for the language accepting all the string
which are starting with 1 and ending with 0, over ∑ = {0, 1}.

Solution:

In a regular expression, the first symbol should be 1, and the last symbol should be
0. The r.e. is as follows:

13
R = 1 (0+1)* 0

Example 4. Describe the language denoted by following regular expression

r.e. = (b* (aaa)* b*)*

Solution:

The language can be predicted from the regular expression by finding the meaning of it.
We will first split the regular expression as:

r.e. = (any combination of b's) (aaa)* (any combination of b's)

L = {The language consists of the string in which a's appear triples, there is no restriction
on the number of b's}

Regular Sets
Any set that represents the value of the Regular Expression is called a Regular Set.

Properties of Regular Sets


Property 1. The union of two regular set is regular.
Proof −
Let us take two regular expressions
RE1 = a(aa)* and RE2 = (aa)*
So, L1 = {a, aaa, aaaaa,.....} (Strings of odd length excluding Null)
and L2 ={ ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∪ L2 = { ε, a, aa, aaa, aaaa, aaaaa, aaaaaa,.......}
(Strings of all possible lengths including Null)
RE (L1 ∪ L2) = a* (which is a regular expression itself)
Property 2. The intersection of two regular set is regular.
Proof −
Let us take two regular expressions
RE1 = a(a*) and RE2 = (aa)*
So, L1 = { a,aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)

14
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 ∩ L2 = { aa, aaaa, aaaaaa,.......} (Strings of even length excluding Null)
RE (L1 ∩ L2) = aa(aa)* which is a regular expression itself.
Property 3. The complement of a regular set is regular.
Proof −
Let us take a regular expression −
RE = (aa)*
So, L = {ε, aa, aaaa, aaaaaa, .......} (Strings of even length including Null)
Complement of L is all the strings that is not in L.
So, L’ = {a, aaa, aaaaa, .....} (Strings of odd length excluding Null)
RE (L’) = a(aa)* which is a regular expression itself.
Property 4. The difference of two regular set is regular.
Proof −
Let us take two regular expressions −
RE1 = a (a*) and RE2 = (aa)*
So, L1 = {a, aa, aaa, aaaa, ....} (Strings of all possible lengths excluding Null)
L2 = { ε, aa, aaaa, aaaaaa,.......} (Strings of even length including Null)
L1 – L2 = {a, aaa, aaaaa, aaaaaaa, ....}
(Strings of all odd lengths excluding Null)
RE (L1 – L2) = a (aa)* which is a regular expression.
Property 5. The reversal of a regular set is regular.
Proof −
We have to prove LR is also regular if L is a regular set.
Let, L = {01, 10, 11, 10}
RE (L) = 01 + 10 + 11 + 10
LR = {10, 01, 11, 01}
RE (LR) = 01 + 10 + 11 + 10 which is regular
Property 6. The closure of a regular set is regular.
15
Proof −
If L = {a, aaa, aaaaa, .......} (Strings of odd length excluding Null)
i.e., RE (L) = a (aa)*
L* = {a, aa, aaa, aaaa , aaaaa,……………} (Strings of all lengths excluding Null)
RE (L*) = a (a)*
Property 7. The concatenation of two regular sets is regular.
Proof −
Let RE1 = (0+1)*0 and RE2 = 01(0+1)*
Here, L1 = {0, 00, 10, 000, 010, ......} (Set of strings ending in 0)
and L2 = {01, 010,011,.....} (Set of strings beginning with 01)
Then, L1 L2 = {001,0010,0011,0001,00010,00011,1001,10010,.............}
Set of strings containing 001 as a substring which can be represented by an RE −
(0 + 1)*001(0 + 1)*
Hence, proved.
Connection between regular expression and regular languages

Identities Related to Regular Expressions


Given R, P, L, Q as regular expressions, the following identities hold −
 ∅* = ε
 ε* = ε
 RR* = R*R
 R*R* = R*
 (R*)* = R*
 RR* = R*R
 (PQ)*P =P(QP)*
 (a+b)* = (a*b*)* = (a*+b*)* = (a+b*)* = a*(ba*)*
 R + ∅ = ∅ + R = R (The identity for union)
 R ε = ε R = R (The identity for concatenation)
 ∅ L = L ∅ = ∅ (The annihilator for concatenation)
 R + R = R (Idempotent law)
 L (M + N) = LM + LN (Left distributive law)
16

(M + N) L = ML + NL (Right distributive law)
 ε + RR* = ε + R*R = R*
Arden's Theorem
In order to find out a regular expression of a Finite Automaton, we use Arden’s Theorem along
with the properties of regular expressions.
Let P and Q be two regular expressions.
If P does not contain null string, then R = Q + RP has a unique solution that is R =
QP*
Proof −
R = Q + (Q + RP)P [After putting the value R = Q + RP]
= Q + QP + RPP
When we put the value of R recursively again and again, we get the following
equation −
R = Q + QP + QP2 + QP3…..
R = Q (ε + P + P2 + P3 + …. )
R = QP* [As P* represents (ε + P + P2 + P3 + ….) ]

Assumptions for Applying Arden’s Theorem


 The transition diagram must not have NULL transitions
 It must have only one initial state
Method
Step 1 − Create equations as the following form for all the states of the DFA
having n states with initial state q1.
q1 = q1R11 + q2R21 + … + qnRn1 + ε
q2 = q1R12 + q2R22 + … + qnRn2
...
qn = q1R1n + q2R2n + … + qnRnn
Rij represents the set of labels of edges from qi to qj, if no such edge exists,
then Rij = ∅
Step 2 − Solve these equations to get the equation for the final state in terms of Rij

17
Example 1

Construct a regular expression corresponding to the automata given below −

Here the initial state and final state is q1.

The equations for the three states q1, q2, and q3 are as follows −

q1 = q1a + q3a + ε (ε move is because q1 is the initial state0

q2 = q1b + q2b + q3b

q3 = q2a

Now, we will solve these three equations −

q2 = q1b + q2b + q3b

= q1b + q2b + (q2a)b (Substituting value of q3)

= q1b + q2(b + ab)

= q1b (b + ab)* (Applying Arden’s Theorem)

q1 = q1a + q3a + ε

= q1a + q2aa + ε (Substituting value of q3)

= q1a + q1b(b + ab*)aa + ε (Substituting value of q2)

= q1(a + b(b + ab)*aa) + ε

= ε (a+ b(b + ab)*aa)*

18
= (a + b(b + ab)*aa)*

Hence, the regular expression is (a + b(b + ab)*aa)*.

Example 2
Construct a regular expression corresponding to the automata given below −

Solution −
Here the initial state is q1 and the final state is q2
Now we write down the equations −
q1 = q10 + ε
q2 = q11 + q20
q3 = q21 + q30 + q31
Now, we will solve these three equations −
q1 = ε0* [As, εR = R]
So, q1 = 0*
q2 = 0*1 + q20
So, q2 = 0*1(0)* [By Arden’s theorem]
Hence, the regular expression is 0*10*.
Regular grammar
In the literary sense of the term, grammars denote syntactical rules for conversation in natural
languages. Linguistics have attempted to define grammars since the inception of natural
languages like English, Sanskrit, Mandarin, etc.

The theory of formal languages finds its applicability extensively in the fields of Computer
Science. Noam Chomsky gave a mathematical model of grammar in 1956, which is effective for writing
computer languages.

19
Grammar

A grammar G can be formally written as a 4-tuple (N, T, S, P) where −

 N or VN is a set of variables or non-terminal symbols or (states in FA) and represented by upper


cases

 T or ∑ is a set of Terminal symbols represented in lower cases.

 S is a special variable called the Start symbol, S ∈ N

 P is Production rules for Terminals and Non-terminals. A production rule has the form α → β,
where α and β are strings on VN ∪ ∑ and least one symbol of α belongs to VN.

 P represented as LHS RHS

Single Non-Terminal ε/ string of terminals/ Non-terminals

Example

Grammar G1 −

({S, A, B}, {a, b}, S, {S → AB, A → a, B → b})

Here,

 S, A, and B are Non-terminal symbols;

 a and b are Terminal symbols

 S is the Start symbol, S ∈ N

 Productions, P : S → AB, A → a, B → b

Generating a production rules from FA

20
N= {q1, q2, q3} T= {a, b} S = {q1}
P = {q1 aq1, q1bq2, q2aq3, q2bq2, q3aq1, q3bq2}

Derivations from a Grammar


Strings may be derived from other strings using the productions in a grammar.

Example
Let us consider the grammar −
G2 = ({S, A}, {a, b}, S, {S → aAb, aA → aaAb, A → ε } )
Some of the strings that can be derived are −
S ⇒ aAb using production S → aAb
⇒ aaAbb using production aA → aAb
⇒ aaaAbbb using production aA → aaAb
⇒ aaabbb using production A → ε
We will see another example The peacock is a beautiful bird. Starting
with the sentence symbol S, this can be dividing as Noun parse one NP1
and verb parse VP, then Noun parse one is split into article one and
Noun one, article one is The and Noun one is Peacock, then VP is
dividing into verb V, article two, adjective one and Noun two N2. Verb
is is, Article two is a, Adjective is beautiful and Noun two is bird
<S>  <NP1> <VP>
<NP1>  <Art1> <N1>
<Art1>  The
<N1>  Peacock
<VP>  <V> <Art2> <Adj><N2>
<V>  is
<Art2>  a
<Adj>  Beautiful

21
<N2>  Bird.
So this single arrow can be rewritten as and double
arrow directly derives.

S  <NP1> <VP>
 <NP1><V> <ART2> <ADJ> <N2>
 <NP1><V> a <ADJ><N2>
 <NP1><V> a beautiful <N2>
 <ART1> <N1><V> a beautiful <N2>
 The <N1><V> a beautiful <N2>
 The <N1> is a beautiful <N2>
 The Peacock is a Beautiful <N2>
 The Peacock is a Beautiful Bird.

We will consider one more example Venice is a beautiful city.

<S> <NP1> <VP> <S>  <NP1><VP>


<NP1> <PropN>  <PropN><VP>
<PropN> Venice  Venice <VP>
<VP> <V> <NP2>  Venice <V><NP2>
<V> Is  Venice is <NP2>
<NP2> <Art><Adj><N2  Venice is <Art><Adj><N2>
>
<Art> A  Venice is a <Adj><N2>
<Adj> beautiful  Venice is a beautiful <N2>
<N2> City  Venice is a beautiful city.

<S> rewritten as <NP1><VP>, then <NP1> rewritten as <PropN>, then <PropN> rewritten as
Venice, then <VP> rewritten as <Verb><NP2>, then <Verb> rewritten as is, then <NP2>
rewritten as <Art><Adj><N2>, then <Art> rewritten as a, then <Adj> rewritten as beautiful
and finally <N2> rewritten as city. So Grammar generates the sentence Venice is a beautiful
city.
Language Generated by a Grammar
Example
If there is a grammar
G: N = {S, A, B} T = {a, b} P = {S → AB, A → a, B → b}
Here S produces AB, and we can replace A by a, and B by b. Here, the only accepted
string is ab, i.e.,
L(G) = {ab}
Example
Suppose we have the following grammar −
G: N = {S, A, B} T = {a, b} P = {S → AB, A → aA|a, B → bB|b}
The language generated by this grammar −
L(G) = {ab, a2b, ab2, a2b2, ………}

22
= {am bn | m ≥ 1 and n ≥ 1}

Construction of a Grammar Generating a Language

We’ll consider some languages and convert it into a grammar G which produces those
languages.
Example1
Problem − Suppose, L (G) = {am bn | m ≥ 0 and n > 0}. We have to find out the
grammar G which produces L(G).
Solution
Since L(G) = {am bn | m ≥ 0 and n > 0}
the set of strings accepted can be rewritten as −
L(G) = {b, ab,bb, aab, abb, …….}
Here, the start symbol has to take at least one ‘b’ preceded by any number of ‘a’
including null.
To accept the string set {b, ab, bb, aab, abb, …….}, we have taken the productions −
S → aS , S → B, B → b and B → bB
S → B → b (Accepted)
S → B → bB → bb (Accepted)
S → aS → aB → ab (Accepted)
S → aS → aaS → aaB → aab(Accepted)
S → aS → aB → abB → abb (Accepted)
Thus, we can prove every single string in L(G) is accepted by the language generated
by the production set.
Example2
Problem − Suppose, L (G) = {am bn | m > 0 and n ≥ 0}. We have to find out the grammar
G which produces L(G).
Solution −
Since L(G) = {am bn | m > 0 and n ≥ 0}, the set of strings accepted can be rewritten as −
L(G) = {a, aa, ab, aaa, aab ,abb, …….}
Here, the start symbol has to take at least one ‘a’ followed by any number of ‘b’
including null.
To accept the string set {a, aa, ab, aaa, aab, abb, …….}, we have taken the productions

23
S → aA, A → aA , A → B, B → bB ,B → λ
S → aA → aB → aλ → a (Accepted)
S → aA → aaA → aaB → aaλ → aa (Accepted)
S → aA → aB → abB → abλ → ab (Accepted)
S → aA → aaA → aaaA → aaaB → aaaλ → aaa (Accepted)
S → aA → aaA → aaB → aabB → aabλ → aab (Accepted)
S → aA → aB → abB → abbB → abbλ → abb (Accepted)
Thus, we can prove every single string in L(G) is accepted by the language generated
by the production set.
Hence the grammar −
G: ({S, A, B}, {a, b}, S, {S → aA, A → aA | B, B → λ | bB })

Chomsky Classification of Grammars

According to Chomsky hierarchy, grammar is divided into 4 types as follows:


1. Type 0 is known as unrestricted grammar and recognized by Turing machine
2. Type 1 is known as context-sensitive grammar and recognized by linear bounded
automata
3. Type 2 is known as a context-free grammar and recognized by pushdown automata
4. Type 3 Regular Grammar and recognized by Finite automata (DFA and NFA)

Grammar Grammar Language Accepted Automaton


Type Accepted

Type 0 Unrestricted Recursively enumerable Turing Machine


grammar language

Type 1 Context-sensitive Context-sensitive Linear-bounded


grammar language automaton

Type 2 Context-free Context-free language Pushdown


grammar automaton

Type 3 Regular grammar Regular language Finite state


automaton

24
Type-0 grammars: generate recursively enumerable languages. The productions have no restrictions.
They are any phase structure grammar including all formal grammars.

They generate the languages that are recognized by a Turing machine.

The productions can be in the form of α → β where α is a string of terminals and non-terminals with at
least one non-terminal and α cannot be null. β is a string of terminals and non-terminals.

Example

S → ACaB

Bc → acB

CB → DB

aD → Db
Type-1 grammars: generate context-sensitive languages.

 First of all Type 1 grammar should be Type 0


The productions must be in the form

α A β → α γ β, | α |<= | β |

where A ∈ N (Non-terminal)

and α, β, γ ∈ (T ∪ N)* (Strings of terminals and non-terminals)

The strings α and β may be empty, but γ must be non-empty.

The rule S → ε is allowed if S does not appear on the right side of any rule. The languages generated by
these grammars are recognized by a linear bounded automaton.

Example

AB → AbBc

25
A → bcA

B→b

Type-2 grammars: generate context-free languages.

 First, it should be Type 0 and Type 1.


The productions must be in the form A → γ

where A ∈ N ( single Non-terminal)

and γ ∈ (T ∪ N)* (String of terminals and non-terminals).

These languages generated by these grammars are be recognized by a non-deterministic pushdown


automaton.

Example

S→Xa

X→a

X → aX

X → abc

X→ε
Type-3 grammars: generate regular languages. Type-3 grammars must have a single non-terminal on
the left-hand side and a right-hand side consisting of a single terminal or single terminal followed by a
single non-terminal.

The productions must be in the form X → a or X → aY

where X, Y ∈ N (Non terminal)

and a ∈ T (Terminal)

The rule S → ε is allowed if S does not appear on the right side of any rule.

Example

X→ε

X → a | aY

Y→b

26
Chapter 3: Context free languages

Context free languages


Context-Free Language (CFL) is a language generated by a Context-Free Grammar (CFG). CFG is a set
of rules for deriving (or generating) strings (or sentences) in a language.

A grammar G = (N, T, S, P) is said to be context-free if all productions in P have the form


A → α where A is a Non-terminal [A ∈ N] and α is any string containing Non-terminals and/or
terminals. [α ∈ (N ∪ T)*].

Definition: Let L be a language. Then L is a context-free language if and only if there exists a
context-free grammar G such that L = L(G).

Every regular grammar is context-free, so a regular language is also a context- free one, so we
see that the family of regular languages is a proper subset of the family of context-free
languages. Context- free grammars derive their name from the fact that the substitution of the
variable on the left of a production can be made any time such a variable appears in a sentential
form. It does not depend on the symbols in the rest of the sentential form (the context). This
feature is the consequence of allowing only a single variable on the left side of the production.

Examples of Context-Free languages:


The grammar G = ({S}, {a, b}, S, P) with productions
S → aSa
S → bSb
S→ε
is context-free. A typical derivation in this grammar is
S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
This makes it clear that
L(G) = {wwR : w ∈ {a, b}*}.
The language is context-free

Parsing and ambiguity


Definition: Let G = (V, T, P, S) be a CFG. A tree is a derivation (or parse) tree if:
– Every vertex has a label from V U T U {ε}
– The label of the root is S
– If a vertex with label A has children with labels X1, X2,…, Xn, from left to right,
then
A –> X1, X2,…, Xn
must be a production in P
If a vertex has label ε, then that vertex is a leaf and the only child of its’ parent
More generally, a derivation tree can be defined with any non-terminal as the root.

Example

27
For generating a language that generates equal number of a’s and b’s in te form
anbn . the context free grammar is defined as G= {(S, A), (a, b), (S aAb, A
aAb/ ε)}
SaAb
 aaAbb (AaAb)
 aaaAbbb (A aAb)
 aaabb (A ε)
 aaabbb  a3b3  anbn

28
Parsing is the process of analyzing a string of symbols, either in natural or computer language.

Ambiguity
Grammar can be derived in either leftmost or rightmost derivation. We can draw the derivation
tree called as parse tree or syntax tree.
 Parse tree is a unique one though the derivation is leftmost or rightmost.
 If there exists more than one parse tree for a given grammar that means more than one
rightmost or leftmost is possible: that grammar is said to be ambiguous grammar.

29
For example, consider the grammar
S → SS
S → aSb
S→ε
The sentence aabb has two derivation trees as shown in the figures below:

Hence the above grammar is an Ambiguous grammar.


Ambiguity is a common feature of natural languages, where it is tolerated and dealt with in a
variety of ways. In programming languages, where there should be only one interpretation of

30
each statement, ambiguity must be removed when possible. Often we can achieve this by
rewriting the grammar in an equivalent, unambiguous form.

Sentential forms
Replacing right most non terminal, then replacing right most non terminal, and so on. So always
replacing right most non terminal, such a derivation is called right most derivation. Left most
and right most derivation are very important because later on when you learn about compiler
realizing top most parsing used for left most derivation and bottom most parsing used for right
most derivation.
We will consider one more example Venice is a beautiful city.
<S> <NP1> <VP> <S>  <NP1><VP>
<NP1> <PropN>  <PropN><VP>
<PropN Venice  Venice <VP>
>
<VP> <V> <NP2>  Venice <V><NP2>
<V> Is  Venice is <NP2>
<NP2> <Art><Adj><N  Venice is <Art><Adj><N2>
2>
<Art> A  Venice is a <Adj><N2>
<Adj> beautiful  Venice is a beautiful <N2>
<N2> City  Venice is a beautiful city.

<S> rewritten as <NP1><VP>, then <NP1> rewritten as <PropN>, then <PropN>


rewritten as Venice, then <VP> rewritten as <Verb><NP2>, then <Verb> rewritten
as is, then <NP2> rewritten as <Art><Adj><N2>, then <Art> rewritten as a, then
<Adj> rewritten as beautiful and finally <N2> rewritten as city. So Grammar
generates the sentence Venice is a beautiful city.

Derivation tree or parse tree


Derivation tree is a graphical representation for the derivation of the given production rules for
the given CFG. Its called parse tree.
Properties:
• Root node indicating start symbol
• Derivation is read from left to right
• Leaf nodes are terminals
• The interior nodes are non terminals
• Example: Let G be the grammar
• S  aB | bA
• A  a | aS | bAA

31
• B  b | bS | aBB
• For the string baaabbabba find leftmost, rightmost derivation and Derivation tree
• First write the production rules separately like below
• S  aB rule1
• S  bA rule2
• Aa rule3
• A  aS rule4
• A  bAA rule5
• Bb rule6
• B  bS rule7
• B  aBB rule8

32
Left most and right most derivations
Generation of the string using production rules is called derivation. In derivation of string:
replacing the non -terminal by appropriate rule. If there are more non-terminal available then
which non-terminal has to replace first; which is based on the two methods listed below:
Leftmost Derivation
It’s a derivation in which, leftmost non-terminal is replaced first from the sentential form
Rightmost Derivation
Its derivation in which, rightmost non-terminal is replaced first from the sentential form
Simplification of context free grammar
All Grammars are not always optimized. That means grammars are containing some unnecessary
symbols(non-terminals) and this will increase the length of the grammar.
 Simplification of grammar means reduction of grammar by removing useless symbols.
This is done in three ways:
i. Removal of useless symbols
ii. Elimination of ε production
iii. Removal of unit production

P= { S  11A | 11
A0 }
33
34
35
Chomsky’s hierarchy of grammars

Type-2 Context Free Grammar:


Generate the context-free languages. These are defined by rules of the form with a non-terminal
and a string of terminals and/or non-terminals. These languages are exactly all languages that can
be recognized by a non-deterministic pushdown automaton.
Type-3 Regular Grammar:
Generate the regular languages. Such a grammar restricts its rules to a single non-terminal on the
left-hand side and the right-hand side consisting of a single terminal, possibly followed by a
single non-terminal (right regular).

36

You might also like