Automata Theory

Automata Theory
Theory of Computation 1
Instructor: Hassan Kassim Mohammad

Theory of computation is the theoretical study of capabilities and limitations of Computers (Theoretical
models of computation).
Objectives:
Providing students with:
o an understanding of basic concepts in the theory of computation through simple models of
computational devices.
o apply models in practice to solving problems in diverse areas such as string searching, pattern
matching, cryptography, and language design;
o understand the limitations of computing, the relative power of formal languages and the inherent
complexity of many computational problems.
o be familiar with standard tools and notation for formal reasoning about machines and programs.
REFERENCES:
1. Introduction to Computer Theory 2nd Edition
Daniel I. A. Cohen John Wiley & Sons, Inc 1997. ISBN 0-471-13772-3
2. Introduction to Automata Theory, Languages, and Computation, 2/E,
John E. Hopcroft, Rajeev Motwani, Jeffrey D.Ullman, Addison-Wesley 2001. ISBN 0-201-44124-1.
Units: 6
Grading Policy
Semester Exam Attendance Assignments & Quizzes Total

1st semester 10 2 3 15
2nd semester 10 2 3 15
Final 70 - - 70
Notes
Student must attend at least 80% of total classes to pass the course.
Any kind of cheating/plagiarism may result in a Fail grade in the course.
No labs. But you should write some programs with any language you may know.
There will be about 30 lectures 100 minutes each.
Late homework submissions will be penalized
Office Hours
Sunday, Monday, Tuesday, Wednesday
Contact Information
Office: computer science dept. room no. 67
E-mail: [email protected]
Mustansiriya University – college of sciences – computer science department – second class

Syllabus
Week Date Subject Chapter

Introduction, terminology, definitions 1
Sets and operations 1
languages 2
Regular Expressions RE 4
Finite Automata FA 5
Deterministic Finite Automaton DFA 5
Non Deterministic Finite Automaton NDFA 8
Language Accepted by Finite Automata 5
Convert Regular Expression into NFA
Constructing regular expression from Finite Automata
Finite Automata with Epsilon moves
Moore and Mealy machines 9
Converting between Moore and Mealy machine
Pumping lemma for regular languages
Kleene's Theorem 7
Regular Grammar 10
Myhill-Nerode Theorem Minimization of DFA
EXAM
Context-free Languages 13
Pushdown Automata 17
CFG/CFL to PDA 18
PDA to CFG/CFL
CFG derivation trees Parsing 22
Chomsky normal form 16
Greibach normal form 16
Ambiguous CFL's
EXAM
TURING MACHINES TM 24
COMPUTABILITY and COMPLEXITY
Unsolvable Problems
Time Complexity
CYK algorithm for CFG's
CFL pumping lemma and properties
Church Turing Thesis

As a computer IT, you must study the following:

1- Automata and formal language.
Which answers - What are computers (Or what are models of computers)
2- Compatibility.
Which answers - What can be computed by computers?
3- Complexity.
Which answers - What can be efficiently computed?
In automata we will simulates parts of computers. Or we will make mathematical models of computers
Automata are more powerful than any real computer because we can design any machine on papers that can
do everything we want.
Theory of computation is the theoretical study of capabilities and limitations of Computers

(Theoretical models of computation).
Sets
Let A, B, and C be subsets of the universal set U
Distributive properties
A (B U C) (A B) U (A C
A U (B C) (A U B) (A U C
Idempotent properties
A A A,
AUA A.
Double Complement property

(A ) A.
De Morgan’s laws
(A U B) A B
(A B) A UB
Commutative properties
A B B A,
AUB B U A.
Associative laws
A (B C) (A B) C
A U (B U C) (A U B) U C
Identity properties
AU A,
A U A.
Complement properties
AUA U,
A A .

Language
language is the set of all strings of terminal symbols derivable from alphabet.
alphabet is a finite set of symbols. For example {0, 1} is an alphabet with two symbols, {a, b} is another
alphabet with two symbols and English alphabet is also an alphabet. A string (also called a word) is a finite
sequence of symbols of an alphabet. b, a and aabab are examples of string over alphabet {a, b} and 0, 10
and 001 are examples of string over alphabet {0, 1}, A null string is a string with no symbols, usually
denoted by epsilon or lambda ( ). A language is a set of strings over an alphabet. Thus {a, ab, baa} is a
language (over alphabert {a,b}) and {0, 111} is a language (over alphabet {0,1}). The number of symbols in
a string is called the length of the string. For a string w its length is represented by |w| . It can be The empty
string (also called null string) it has no symbols. The empty string is denoted by Thus | | = 0.
For example |00100| = 5, |aab| = 3, | | = 0
Language = alphabet + string (word) + grammar (rules, syntax) + operations on languages (concatenation,
union, intersection, Kleene star)
Kinds of languages:
1- Talking language: (e.g.: English, Arabic): It has alphabet: ={a,b,c,….z}From these alphabetic we
make sentences that belong to the language.
Now we want to know is this sentence is true or false so We need a grammar.
Ali is a clever student. (It is a sentence English language.)
2- Programming language: (e.g.: c++, Pascal):It has alphabetic: ={a,b,c,.z , A,B,C,..Z , ?, /, - ,\.}
From these alphabetic we make sentences that belong to programming language.
Now we want to know if this sentence is true or false so we need a compiler to make sure that syntax is true.
3- Formal language: (any language we want.) It has strings from these strings we make sentences that
belong to this formal language.
Now we want to know is this sentence is true or false so we need rules.
Example:
Alphabetic: = {0, 1}.
Sentences: 0000001, 1010101.
Rules: Accept any sentence start with zero and refuse sentences that start with one.
So we accept: 0000001 as a sentence satisfies the rules.
And refuse: 1010101 as a sentence doesn't satisfy the rules.
Example:
Alphabetic: = {a, b}.
Sentences: ababaabb, bababbabb
Rules: Accept any sentence start with a and refuse sentences that start with b.
So we accept: aaaaabba as a sentence satisfies the rules.
And refuse: baabbaab as a sentence doesn't satisfy the rules.

Regular Expression
is a set of symbols, Thus if alphabet= {a, b}, then aab, a, baba, bbbbb, and baaaaa would all be strings of
symbols of alphabet.
In addition we include an empty string denoted by which has no symbols in it.
Examples of Kleene star:
1* is the set of strings { , 1, 11, 111, 1111, 11111, etc. }
(1100)* is the set of strings { , 1100, 11001100, 110011001100, etc. }
(00+11)* is the set of strings {epsilon, 00, 11, 0000, 0011, 1100, 1111, 000000, 000011, 001100, etc. }
(0+1)* is all possible strings of zeros and ones, often written as sigma * where sigma = {0, 1}
(0+1)* (00+11) is all strings of zeros and ones that end with either 00 or 11.
(w)+ is a shorthand for (w)(w)* w is any string or expression and the superscript plus, +
1- Concatenation:
Notation to the concatenation: . (The dot.):
if L1 = {x, xxx} and L2 = {xx} So (L1.L2) means L1 concatenated L2 and it is equal = {xxx, xxxxx}
Examples on concatenations:
Ex1:
L1 = {a, b}.
L2 = {c, d}.
L1.L2 = {ac, ad, bC, bd}
Note: ab differ from ba.
Ex2:
= {x}.
L1 = {set of all odd words over with odd length}.
L1 = {set of all even words over with odd length}.
L1= {x, xxx, xxxxx, xxxxxxx……}.
L2= { , xx, xxxx, xxxxxx…}.
L1.L2 = {x, xxx, xxxxx, xxxxxxx…}.
Note:
Ex3:
L1 = {x, xxx}.
L2 = {xx}.
L1.L2 = {xxx, xxxxx}.
Some rules on concatenation:
.x = x
L1.L2 = {set of elements}
Definition of a Regular Expression

A regular expression may be the null string, r=
A regular expression may be an element of the input alphabet, r=a
A regular expression may be the union of two regular expressions, r = r1 + r2
A regular expression may be the concatenation of two regular expressions, r = r1 r2
A regular expression may be the Kleene closure (star) of a regular expression r = r1*
A regular expression may be a regular expression in parenthesis r = (r1)

epsilon is the zero length string

0, 1, a, b, c, are symbols in sigma
x is a variable or regular expression
( ... )( ... ) is concatenation
( ... ) + ( ... ) is union
( ... )* is the Kleene Closure = Kleene Star
( )(x) = (x)( ) =
( )(x) = (x)( ) = x
( ) + (x) = (x) + ( ) = x
x+x =x
( )* = ( )( ) =
(x)* + ( ) = (x)* = x*
(x + )* = x*
x* (a+b) + (a+b) = x* (a+b)
x* y + y = x* y
(x + )x* = x* (x + ) = x*
(x+ )(x+ )* (x+ ) = x*
λ is the null string (there are no symbols in this string)

* is the set of all strings of length greater than or equal to 0
Example:
A = {a,b} // the alphabet is composed of a and b
A* = {λ, a,b,aa,ab,ba,bb,aaa,aab,…}
The symbol * is called the Kleene star.
∅ (empty set)
λ (empty string)
( ) delimiter ,
∪ + union (selection)
concatenation
Given regular expressions x and y, x + y is a regular expression

representing the set of all strings in either x or y (set union)
x = {a b} y = {c d} x + y = {a b c d}
Mark Hills CS421 Lecture 9: Regular Expressions and Finite Automata
Example 1
Let A={0,1}, W1 = 00110011, W2 = 00000
W1W2 = 0011001100000
W2W1 = 0000000110011
W1 λ = W1 = 00110011
λ W2 = W2 = 00000
x = {a b} y = {c d} xy = {ac ad bc bd}
Note:
( a + b )* = ( a*b* )*

Examples of regular expressions

Describe the language = what is the output (words, strings) of the following RE
Regular expression output(set of strings)

λ {λ}
λ* {λ}
a {a}
aa { aa }
a* {λ, a, aa, aaa, ….}
aa* { a, aa, aaa, ... }
a+ { a, aa, aaa, ...}
ba+ { ba, baa, baaa, ...}
(ba)+ { ba, baba, bababa, ...}
(a|b) { a, b }
a|b* { a, λ, b, bb, bbb, ... }
(a|b)* { λ, a, b, aa, ab, ba, bb, ... }
aa(ba)*bb { aabb, aababb, aabababb, ... }
(a + a) {a}
(a + b) {a, b}
(a + b)2 (a + b)(a + b) == {aa, ab, ba, bb}
(a + b + c) {a, b, c}
(a + b)* {λ, a, b, aa, bb, ab, ba, aaa, bbb, aab, bba, ….}
(abc) {abc}
(λ + a) bc {bc, abc}
ab* {a, ab, abb, abbb, …}
(ab)* {λ, ab, abab, ababab, …}
a + b* {a, λ, b, bb, bbb, …}
a (a + b)* {a, aa, ab, aaa, abb, aba, abaa, … }
(a + b)* a (a + b)* {a, aaa, aab, baa, bab, …}
(a + λ)* (a)* = {λ, a, aa, aaa, ….}
x* (a + b) + (a + b) x* (a + b)
x* y + y x* y
(x + λ)x* x* (x + λ) = x*
(x + λ)( x + λ)* (x + λ) x*

start with a a (a + b)*

end with b (a + b)* b
start with a and end with b
start with a or b
not start with b
contains exactly 2 a's (b)* a (b)* a (b)*
contains at least 2 a's (a + b)* a (a + b)* a (a + b)*
contains exactly 2 a's or 2 b's [(b)* a (b)* a (b)*] + [(a)* b (a)* b (a)*)]
contains even no of a [ (b)* a (b)* a (b)* ]*
not start with a and not contain b
with even length of a (aa)+
Strings containing 101
Even number of 0’s and contains 101
Even number of 0’s or contains 101
Every one has at least two zeros that follow it
Second symbol not a one
End with 00 or 01
Exercise
Ex. 1: Find a regular expression over the alphabet { a, b } that contain exactly three a's.
Ex. 2: Find a regular expression over the alphabet { a, b } that end with ab.
Ex. 3: Find a regular expression over the alphabet { a, b } that has length of 3.
Ex. 4: Find a regular expression over the alphabet { a, b } that contain exactly two successive a's.
Ex. 5: Find the output (words) for the following regular expressions.
(λ)*
(x)* + (λ)
aa* b
bba*a
(a + b)* ba
(0+1)* 00 (0+1)*
(11 + 0)* (0+11)*
01* + (00+101)*
(a+b)* abb+
(((01+10)* 11)* 00)*

Finite Automata
! " # $% & ' ( )$* + ,-( . / 0
is a device consisting of a tape and a control circuit
which satisfy the following conditions:
1. The tape start from left end and extends to the right
without an end.
2. The tape is divide into squares in each a symbol.
3. The tape has a read only head.
4. The head moves to the right one square every time it
reads a symbol. It never moves to the left. When it
sees no symbol, it stops and the automata terminates
its operation.
5. There is a control determines the state of the
automaton and also controls the movement of the head.
Input Yes
No
A DFA represents a finite state machine that recognizes a RE.

For example, the following FA: recognize (accept) string ab
A finite automaton consists of a finite set of states, a set of transitions (moves), one start state, and a set of
final states (accepting states). In addition, a DFA has a unique transition for every state combination.
it is a set of states, and its “control” moves from state to state in response to external “inputs” .
A finite automaton, FA, provides the simplest model of a computing device. It has a central processor of
finite capacity and it is based on the concept of state.
finite state machine is a 5 tuple M = (Q, A, T, S ,F), where

o Q --set of states = {q0, q1, q2, ….}
o A -- set of input symbols ={a,b, …, 0, 1, …}
o T --set of transitions or rules
o S -- an initial state
o F -- the final state -- could be more than one final state

Designing (drawing) FA
State Start Final Transition

with numbers or any name - or small arrow + or double circle (only one input or symbol on the edge)
a,b allowed means (a or b)
loop
Example: Q = { 0, 1, 2 }, A= { a, b }, F = { 1 }, the initial state is 0 and T is shown in the following table.
State (q) Input (a) Input (b)

0 1 2
1 2 2
2 2 2
Transition diagram:
TG has many inputs on the edge ab FA has only one input on the edge a
Deterministic Finite Automata DFA and Non Deterministic Finite Automata NFA
DFA: different input from state to different states NFA: one input from state to different states

Language accepted by FA
String is accepted by a FA if and only if the FA starting at the initial state and ends in an accepting state after
reading the string.
Examples of languages accepted by FA
FA RE
aa
a+ = aa*
a*
a+b
(a+b)*
a*b

b(a+b)*
(a+b)*b
a(a+b)*b
(a+b)* b(a+b)*
ab(a+b)*
a*babb*
(aa)*ba
contains 3 a's b*ab*ab*ab*

contains even number of a = (b*ab*ab*)+
a(bba + baa)*bb

Converting Regular Expression into a Finite Automata

RE FA
aa
a+ = aa*
NFA
a*
a+b
(a+b)*
a*b
b(a+b)*
NFA

(a+b)*b
NFA
a(a+b)*b
NFA
(a+b)* b(a+b)*
NFA
ab(a+b)*
NFA
a*babb*
NFA
(aa)*ba
contains 3 a's b*ab*ab*ab*

contains even number of a = (b*ab*ab*)+
dividable by 3
all bit strings that begin with 0 and end with 1
all bit strings whose number of 0's is a multiple

of 5
all bit strings with more 1's than 0's
all bit strings with no consecutive 1's

Converting NFA into DFA

Three steps : 1- find transition table ,- 12 3
2- drawing new design 4 50

3- remove unreachable states $% 6 1 7 )/ 8 9: )$; <6
Example : convert the following NDFA into DFA

state a b
q0 q1q2 q0
q1 q2 q3
q2 q2q3 q2
q3 q3 -
q1q2 q2q3 q2q3
q2q3 q2q3 q2
Note: Any state contains final mark it will be final state
3)) remove unreachable states (marked by dashed circle – state q1 and state q3 ) because we can not reach it.
---------------------------------------------------------------------------------------------------------------------------------
HW convert the following NFA into DFA

Finite State Machines with Output (Mealy and Moore Machines)
Moore Machines
Moore machine M is the 5tuple M = (Q, A, O, T, F, s) where
Q is a finite set of states
A is the finite input alphabet
O is the finite output alphabet
T is the transition function
F is the output function Q A
in addition to the start state or the initial state
A Moore machine is very similar to a Finite Automaton (FA), with a few key differences:
• It has no final states.
• It does not accept or reject input, instead, it generates output from input.
• Moore machines cannot have nondeterministic states.
Every input gives output not if word belongs to the machine or language like FA
In each state we stop we print out what inside that state (it's content)
so the output will be more than input by one because we start with start state and print out it's content before
we trace the input string
This machine might be considered
as a "counting" machine
Input string aaababbaabb

State q0q1q2q2q3q1q0q0q1q2q3q0
Output 000010000010
this machine gives 1 after each aab
aabaabaaababaab
0001001000100001
so we use Moore machine as a string recognizer to give us a mark (1) after each substring so we design a
machine put 0 in all states except the one after the one represent end of substring aba
The output of the machine contains 1 for each occurrence of the substring aab found in the input string.
H.W. Construct a Moore machine that outputs a binary string that contains a 1 for every double letter
substring in an input string composed of a’s and b’s. For example if abba is the input string 0010 is the
corresponding output.
Mealy machines
Moore machine M is a 5 tuple M = (Q, A, O, T, F ,s) where
Q is a finite set of states
A is the finite input alphabet
O is the finite output alphabet
T is the transition function
F is the output function Q A
in addition to the start state or the initial state
output on edge
same input to output
aaabb
01110
Mealy machines are finite-state machines that act as transducers or translators, taking a string on an input
alphabet and producing a string of equal length on an output alphabet.
Mealy machine does not accept or reject an input string,
The machine represented in below, outputs an E if the number of 1s read so far is even and an O if it is odd;
for example, the translation of 11100101 is OEOOOEEO.
A Mealy machine that outputs

E if the number of 1 is even
O if the number of 1 is odd
Binary inverter
The following Mealy machine takes the one's
complement of its binary input. In other words, it flips Input = 0010 Output=11010
each digit from a 0 to a 1 or from a 1 to a 0.
There are no accept states in a Mealy machine because it is not a language recogniser, it is an output
producer. Its output will be the same length as its input.

Binary Incrementer
One thing you will notice is the numbering of the

states. Usually, if there are 3 states, we number them 00, 01, and 10.
- the input bit string is a binary number fed in backward
- The output string will be the binary number that is one greater and that is generated right to left.
- The machine will have 3 states: start, carry and no-carry. The carry state represents the overflow when two
bits of 1’s are added, we print a 0 and we carry a 1.
Let the input string be 1011 (binary representation of 11).
• The string is fed into the machine as 1101 (backwards).
• The output will be 0011, which when reversed is 1100 and is the
binary representation of 12.
• In Mealy machine, output length = input length. Hence, if input were
1111, then output would be 0000 (overflow situation).
Homework:
Construct a Mealy machine that takes a string of a’s and b’s as input and outputs a binary string with a 1 at
the position of every second double letter. For example, for ababbaab the machine produces 00001010
and for the input bbb the output string 011 is produced.

Kleene's Theorem
Any language that can be defined by: Regular expression/ Finite automata/ Transition graph
Can be defined by all three methods.
Proof
There are three parts of our proof :
Part1: every language that can be defined by a FA == can be defined by a TG.
Part2: every language that can be defined by a TG == can be defined by a RE.
Part3: every language that can be defined by a RE == can be defined by a FA.
proof of part1
Every FA is itself a TG. Therefore, any language that has been defined by a FA has already
been defined by a TG.
proof of part2
The proof of this part will be by constructive algorithm. This means that we present a
procedure that starts out with a TG and ends up with a RE that defines the same language.
If we have many start states = become only one
becomes
If we have many final states = become only one
becomes
we are now going to build the RE that defines the same language as TG
reduce the number of edges or states in each time

becomes
becomes
becomes
becomes
becomes
special case :
becomes
our goal: unique start state and unique final state.

Example
Find the RE that defines the same language accepted by the following TG using Kleenes
theorem.
RE=(aa+bb)(a+b)*(aa+bb)

Example
Find the RE that defines the same language accepted by the following TG using Kleenes
theorem.
RE= [(aa+bb)+(ab+ba)(aa+bb)*(ab+ba)]*

proof of part3
Rule1: there is a FA that accepts any particular letter of the alphabet.
There is an FA that accepts only the word .
FA accepts only FA accepts only a FA accepts
a+b
Rule2: if there is a FA called FA1, that accepts the language defined by the regular
expression r1 and there is a FA called FA2, that accepts the language defined by the regular
expression r2, then there is a FA calledFA3 that accepts language defined by the regular
expression (r1+r2).
Example
We have FA1 accepts all words with a double a in them, and FA2 accepts all words ending
in b. we need to build FA3 that accepts all words that have double a or that end in b.
FA1
a b
-x1 X2 X1
X2 X3 X1
+x3 X3 X3
FA2
a b
-y1 Y1 Y2
+y2 Y1 Y2
FA3
Z1 = x1 or y1 a b
Z2 = x2 or y1 Z1 Z2 Z3
Z3 = x1 or y2 Z2 Z4 Z3
Z4 = x3 or y1 Z3 Z2 Z3
Z5 = x3 or y2 Z4 Z4 Z5
Z5 Z4 Z5

Example
a b
-x1 X2 X1
x2 X3 X1
+x3 X3 X3
a b
-+y1 Y3 Y2
Y2 Y4 Y1
Y3 Y1 Y4
Y4 Y2 Y3
a b
z1=x1 or y1 -+z1 z2 z3
z2=x2 or y3 z2 Z4 z5
z3=x1 or y2 z3 Z6 z1
z4=x3 or y1 +z4 Z7 Z8
z5=x1 or y4 Z5 Z9 Z10
z6=x2 or y4 Z6 Z8 Z10
z7=x3 or y3 +Z7 Z4 Z11
z8=x3 or y2 Z8 Z11 Z4
z9=x2 or y2 Z9 Z11 Z1
z10=x1 or y3 Z10 Z12 Z5
z11=x3 or y4 +Z11 Z8 Z7
z12=x2 or y1 +Z12 z7 Z3

HomeWork
Let FA1 accepts all words ending in a, and let FA2 accepts all words with an odd number of
letters (odd length). Build FA3 that accepts all words with odd length or end in a using
Kleene's theorem.
FA1 FA2
HomeWork
Let FA1 accepts all words ending in a, and let FA2 accepts all words end with b.
Build FA3 that accepts FA1+FA2 using Kleene's theorem.
FA1 FA2

Rule3: if there is a FA1 that accepts the language defined by the regular expression r1 and a
FA2 that accepts the language defined by the regular expression r2, then there is a FA3 that
accepts the language defined by the concatenation r1r2.
We can describe the algorithm for forming FA3 as follows:
We make a z state for each none final x state in FA1. And for each final state in FA1 we
establish a z state that expresses the options that we are continuing on FA1 or are beginning
on FA2.
state
We have to connect (merge) the final state of FA1 with the start state of FA2 to produce new
state (wich it is not final)
Example:
Z1=x1
Z2=x2
Z3=x3 or y1
Z4=y2
But it is not simple like that, so we have to take all possablites

Example
We have FA1 accepts all words with a double a in them, and FA2 accepts all words ending
in b. we need to build FA3 that accepts all words that have double a and end with b.
FA1 FA2
a B
a B
-y1 Y1 y2
-x1 X2 X1
+y2 Y1 Y2
X2 X3 X1
+x3 X3 X3
a b
Z1= x1 -z1 Z2 Z1
Z2= x2 z2 Z3 Z1
Z3= x3 or y1 z3 Z3 Z4
Z4= x3 or y2 or y1
+z4 Z3 Z4

HomeWork
Let FA1 accepts all words with a double a in them, and let FA2 accepts all words with an
odd number of letters (odd length). Build FA3 that accepts all words with odd length and
have double a using Kleene's theorem.

Rule4: if r is a regular expression and FA1 accepts exactly the language defined by r, then
there is an FA2 that will accept exactly the language defined by r*.
We can describe the algorithm for forming FA2 as follows:
Each z state corresponds to some collection of x states. We must remember each time we
reach a final state it is possible that we have to start over again at x1.
Remember that the start state must be the final state also.
Example
If r=a find r*
Example
If r=ab find r*

Example
If we have FA1 that accepts the language defined by the regular expression: r=a*+aa*b
We want to build FA2 that accept the language defined by r*.
Note: We will try to connect the final state with start state.
a b
-+x1 X2 X4
X2 X2 X3
X3 X4 X4
X4 X4 X4
z1=x1
z2=x4 a b
z3=x2 or x1 -+z1 Z3 Z2
z4=x1 or x3 or x4 Z2 Z2 Z2
z5=x1 or x2 or x4 +z3 Z3 Z4
+z4 Z5 Z2
+z5 Z5 Z4

Example
Find FA2 that accept the language defined by r1* using Kleene's theorem. r1= aa*bb*
a b
-x1 X2 X3
X2 X2 X4
X3 X3 X3
+X4 X3 X4
z1=x1
z2=x2
z3=x3
z4=x1 or x4
z5=x2 or x3
z6=x1 or x3 or x4

Problems
For the following transition graphs, find regular expression
Consider following FA
Find
r1+r2 r2 + r3 r1r2 r1r3 r2r1 r1r1 (r1)* (r2)* (r1+r2)* (r1r2)*
(r2r1)*
• is r1r2 = r2r1 why ?
• is r1 + r2 = r2 + r1 why ?
• is r1r1 = (r1)* why ?

Grammars
A grammar is a set of rules which are used to construct a language (combine words to generate sentences).
Sentence = noun verb noun

Verb = went, eat, reading
Noun = lesson, boy, school, book
Boy reading book
may be there are sentences have no meaning = book reading boy

%= 3 >$' 2 $ & ? ( , @$ ;A B C
G=(N, T, S, P)
N= set of nonterminal symbols (parts of speech (sentence, noun, verb, …))ex: S D2 * $%#$, 2 9$%A
8 EF
T= set of terminal symbols (words, or symbols in) ex: a 8 G7 D2 * $%#$, )2 9$%A
S= start symbol non-terminal used to start every derivation.
P= set of productions.
Start terminal nonterminal
Example
productions: S→aS
S→
S
The derivation for aaaa is:
S => aS
=> aaS
=> aaaS
=> aaaaS
=> aaaa = aaaa
RE= a+
Example
productions: S → SS
S→a
S→
S → SS / a /
Derivation of aa
S => SS
=> SSS
=> SSa
=> SSSa
=> SaSa
=> aSa
=> a a = aa

Example
S → aS | bS | a | b
Derive abbab
S => aS
=> abS
=> abbS
=> abbaS
=> abbab
Example
S aA / bB
A aS / a
B bS / b
Find bbaaaa
Leftmost and rightmost derivation LMD RMD

The leftmost nonterminal (LMN) in a working string is the first nonterminal that we encounter when
we scan the string from left to right.
S → aS | ab
S => aS
S => aaS
S => aaaS
S => aaaaS
S => aaaaab
Example
Consider the Grammar G = (X, T, R, S) with X = {S, A, B, a, b}, T = {a, b} and productions
S AB
A Aa | a
B bBa | ba
Example
S E
E E+T|T
T T*F | F
F (E) | id
Converting Grammar into Regular Expression

Chomsky Normal Form CNF

Context free grammar (CFG) is the most important type of grammars because it is context
free i.e. the right hand side is contains anything from terminal and nonterminal so that it is
widely using in the programming languages to represent the language rules and grammars.
It will be difficult to deal with this type of grammars because the right hand side contains
everything of terminals/nonterminals, so Chomsky introduced a new formula for constrain
this grammar to be :
The right hand side of a rule consists of: t / NN like the following grammar :
S → AB / BA /
A → AA / a
B → BB / b
Nonterminal → Nonterminal Nonterminal or Nonterminal → terminal
The conversion takes place in four stages.

Introduce a new start variable if the start in the right side.
1. Eliminate lambda
2. Eliminate all unit-rules: rules of the form A B
3. Change the terminals into nonterminals
4. Reduce rules with more than tow nonterminals into tow nonterminals
Example: Convert the following grammar into CNF:

S → ASA | aB
A→ B|S
B→b|
S’ → S
S → ASA | aB
A→ B|S
B→b|
Remove all epsilon productions, except from start variable. B ->e

S’ → S
S → ASA | aB | a
A→ B|S|
B→b|
Remove all epsilon productions, except from start variable. A ->

S’ → S
S → ASA | aB | a | SA | AS | S
A→ B|S|
B→b
Remove unit variable productions of the form S -> S

S’ → S | ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS | S
A→ B|S
B→b
Remove unit variable productions of the form S’ -> S

S’ → S| ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A→ B|S
B→ b
Remove unit variable productions of the form A -> B

A→ B|S|b
B→ b
Remove unit variable productions of the form A -> S

A → S | b | ASA | aB | a | SA | AS
B→ b
A → S | b | ASA | aB | a | SA | AS
B→ b
Add variables and dyadic variable rules to replace any longer productions.
S’ → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS
A1→ SA
U→a
B→ b


S → bA | aB
A → a | aS | bAA
B → b | bS | aBB
Step1: no lambda
Step2: no unit production
Step3: convert small into capital
S → YA / XB
A → a / XS / YAA
B → b / YS / XBB
X→a
Y→b
Step4: convert more than 2 nonterminal into 2 nonterminal
S → YA | XB
A → a | XS | YR1
B → b | YS | XR2
X→a
Y→b
R1 → AA
R2 → BB
S aSaS / SaSb /
S SaS/SaSb/a/Sa/aS/ab/Sab/aSb 1
S SAS/SASB/a/SA/AS/AB/SAB/ASB 3
A a
B b
S SR1/SR2/a/SA/AS/AB/SR4/AR3 4
R1 AS
R2 AR3
R3 SB
R4 AB

Example
E →E+T|E–T|T
T → T*F | T/F | F
F → (E) | id
Id → a | b | c
Then the string (a + b)*c belongs to above grammar and the derivation of this string:
E→T
→T*F
→F*F
→ (E) * F
→ (E + T) * F
→ (T + T) * F
→ (F + T) * F
→ (id + T) * F
→ (a + T) * F
→ (a + F) * F
→ (a + id) * F
→ (a + b) * F
→ (a + b) * id
→ (a + b) * c
Derivation can also be nicely represented in a tree form, as bellow

Derivation Tree for the Expression (a + b)*c

The internal nodes of the derivation, or syntax, tree are nonterminal symbols and the frontier of the tree
consists of terminal symbols. The start symbol is the root and the derived symbols are nodes. The string
(a + b)*c obtained from the concatenation of the leaf nodes together from left to right.
A correct parse of the string a + b*c as a sequence of shift/reduce actions is given bellow.
Parse of the expression a + b*c
Stack Input Action

$ a + b*c$ Shift
id$ + b*c$ Reduce
F$ + b*c$ Reduce
T$ + b*c$ Reduce
E$ + b*c$ Reduce
+ E$ b*c$ Shift
b + E$ *c$ Shift
id + E$ *c$ Reduce
F + E$ *c$ Reduce
T + E$ *c$ Reduce
*T + E$ c$ Shift
c*T + E$ $ Reduce
id*T + E$ $ Reduce
F*T + E$ $ Reduce
T + E$ $ Reduce
E$ $ Accept

Derivations and Parse Trees

For every derivation there is a unique corresponding parse tree.
A derivation is a leftmost derivation if the variable chosen for substitution, at any step, is the
leftmost variable.
E → E + T | E – T | T we can write this grammar as:E →E + E | E – E | E * E | E / E | id

T→ T*F|T/F|F
F → id
Derive id + id * id
E→ E+T
→ T+T
→ F+T
→ id + T
→ id + T * F
→ id + F * F
→ id + id * F
→ id + id * id
The parse tree of an input sequence according to a CFG is the tree of derivations. For
example, the parse tree of id(x) + num(2) * id(y) is:
E
/ | \
E + T
| / | \
T T * F
| | |
F F id
| |
id num
So a parse tree has non-terminals for internal nodes and terminals for leaves.
There are tow types of parsing or derivation:
Top down parsing and Bottom up parsing

Top-down parsing starts from the start symbol of the grammar S and applies derivations until
the entire input string is derived (ie, a sequence of terminals that matches the input tokens).
For example,
E→E+T
→T+T
→F+T
→i+T
→i+T*F
→i+F*F
→i+i*F
→i+i*i
Which matches the input sequence i + i * i
Bottom-up parsing starts from the input string and uses derivations in the opposite directions
(ie, by replacing the right hand side sequence of a production with the nonterminal. It stops
when it derives the start symbol. For example,
i+i*i
→F+i*i
→T+i*i
→E+i*i
→E+ F*i
→E+T*i
→T*T
→E*T
→E

Ambiguity
A CFG is ambiguous if it generates some string with more than one parse tree.
A grammar is ambiguous if it has more than one parse tree for the same input sequence.
For example, the grammar G3 is ambiguous since it has two parse trees.
A string w is derived ambiguously in CFG G if it has two or more leftmost derivations.
Suppose, that the rules of the expression grammar were written E →E + E | E*E | id, then
two different syntax trees are the result. If the first production E →E + E were chosen then
the result would be the tree on the left, On the other hand, choosing the production E →E*E
first results in a syntax tree of an entirely different
Thus this grammar is ambiguous, because it is possible to generate two different syntax trees
for the expression a + b*c.
Exercises: Convert the following grammars into CNF:
1. S →HaSa | bSb | a | b | aa | bb
2. S →bA | aB
A →HbAA | aS | a
B →HaBB | bS | b
3. S→HAba
A →Haab
B →HAC
4. S →H0A0 |1B1 | BB
A →HC
B →HS|A
C →HS|
5. S →HaAa | bBb|
A →HC|a
B →HC | b
C →HCDE |
D →HA | B | ab

6. SIabAB,
AIbAB|A,
BI BAa|A| ,
7. SIAB|aB
AIaab|
BIbbA
1- SIaSb|ab.
2- SIaSaA|A
AIabA|b
.
3- SIabAB,
AIbAB|A,
BI BAa|A| ,
5.S → HaAD
A → aB | bAB
B→b
D→d
6. S → Aa | B
B → A | bb
A → a | bc | B
7.S → ASB |
A → HaAS | a
B → SbS | A | bb
8. Show a derivation tree for the string aabbbb with the grammar
S I AB|x
AI aB
BI Sb
9. with the following grammar Derive 1+(0+(1+0)-1)
N → N – N / N + N / (N) / D
D → 0/1

The Chomsky Hierarchy

Noam Chomsky introduced the Chomsky hierarchy which classifies grammars and languages. This hierarchy
can be amended by different types of machines (or automata) which can recognize the appropriate class of
languages.
In the late fifties Noam Chomsky, a linguist at MIT, was investigating the relationship between the syntactic
structure of languages and the meaning (semantics) of statements in a language.
The Chomsky hierarchy comprises four types of languages and their associated grammars and machines.
Type Language Grammar Machine Example

Finite Automata
Type 3 Regular language Regular grammar RG a*b*
FA
Pushdown automaton
Type 2 Context free language Context-free grammar CFG a b
PDA
Context sensitive Context sensitive grammar Linear bounded automaton
Type 1 a b c
language CSG LBA
any
Recursively Turing machine
Type 0 Unrestricted grammar UG computable
enumerable language TM
function
regular languages context-free languages context-sensitive languages recursive enumerable

languages.
Type 3 type2 type1 type0

Type 3:
regular grammars generate the regular languages. Such aiii grammar restricts its rules to a single
nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed by
a single nonterminal. The rule S I is also here allowed if S does not appear on the right side of any rule.
These languages are exactly all languages that can be decided by a finite state automaton. Additionally, this
family of formal languages can be obtained by regular expressions. Regular languages are commonly used to
define search patterns and the lexical structure of programming languages.
N t/tN
A a/aB
S aS/b
Problem is left recursion A Aa
Type 2:
context-free grammars generate the context-free languages. These are defined by rules of the form
A I J with A a nonterminal and J a string of terminals and nonterminals. These languages are exactly all
languages that can be recognized by a non-deterministic pushdown automaton. Context free languages are
the theoretical basis for the syntax of most programming languages.
S (N U t)*
S SS/aA/bA
S
S abB
Type 1:
context-sensitive grammars generate the context-sensitive languages. These grammars have rules of
the form KAL I KJL with A a nonterminal and K, L and J strings of terminals and nonterminals. The strings K
and L may be empty, but J must be nonempty. The rule S I M is allowed if S does not appear on the right
side of any rule. The languages described by these grammars are exactly all languages that can be recognized
by a non-deterministic Turing machine whose tape is bounded by a constant times the length of the input.
U V U,V (N U t)+
S SS
aA bAa
BB aB
A wrong
Left side <= right side
Type 0:
unrestricted grammars include all formal grammars. They generate exactly all languages that can be
recognized by a Turing machine. The language that is recognized by a Turing machine is defined as all the
strings on which it halts. These languages are also known as the recursively enumerable languages. Note that
this is different from the recursive languages which can be decided by an always halting Turing machine.
U V U,V (N U t)*
S SS
S aAb
aA Aa
BB a
aA bAa
Ba bAb

Push Down Automata PDA

PDA consists of 7 components (7-tuple):
M = (Q, Sigma, Gamma, delta, q0, Z0, F) where
Q = a finite set of states

Sigma = a finite alphabet of input symbols (input tape)
Gamma = a finite set of push down stack symbols
Delta = a group of transitions
q0 = the initial state
Z0 = the initial stack contents - stack symbol N
F = the set of final, accepting, states
It is the machine that accepts CFG languages, the most important form is anbn , n>=1 i.e there is a relation
between exponents of both variables and it is gives strings like aaabbbN, aaaaabbbbbN , aaaaxaaaaN, abbcbbaN
Consist of basic shapes to represent PDA like triangle, diamond, trapezoidal …
START ACCEPT REJECT
The START symbol just points to the start state

The ACCEPT symbol represents a final state or accept
The REJECT symbol represents a reject state or error
Some of states are decision state; it is represent every function performed in a state by a different type of
box. The typical task performed in a state is to "read and branch" which will now be represented by a
diamond shaped box, such as:
PUSH
b a
READ
POP
N
Then we have two main operations either push the input string into the stack or pop it from the stack depending
on the reading string We need a type of memory which is the stack (First In Last Out) contain N simple to
represent the empty stack
a
a a
a a a
N N N N

We have to push the first part of string into the stack and then pop contents of the stack when we start
reading the second part of the string
For example aaabbbN We will push all "aaa" into the stack then we will pop when we start reading "bbb"
We can divide string reading into tow stages
When we get the first part "aaa" we will push them into the stack
and when we read the second part "bbb" we will pop from the stack, we should get "aaa" and the stack will
be empty and the string is reached to the space symbol
so if we read the space symbol we have to pop from the stack, if we got space symbol that is means the
string is accepted
Tracing the input string on the PDA

We will trace and witching 3 variables state, stack and the tape (input string)
For example: trace aaabbbN on the PDA a n b n n>=1
State Stack Tape

START N aaabbbN
READ N aaabbbN
PUSH aN aaabbbN
READ aN aaabbbN
PUSH aaN aaabbbN
READ aaN aaabbbN
PUSH aaaN aaabbbN
READ aaaN aaabbbN
POP aaN aaabbbN
READ aaN aaabbbN
POP aN aaabbbN
READ aN aaabbbN
POP N aaabbbN
READ N aaabbb
POP - aaabbb
ACCEPT

%& '( # $ " %& '( " ! " RE PDA

Example: a
Example a + b
Example (a + b)*
Example a* b
Example a b*
H.W. Design a PDA for a*b*
H.W. Design a PDA for a n b n n>=1
H.W. Design a PDA for ca db a n>=1
H.W. a b c d n>=1
H.W. a b c d n,m>=1

Palindrome
We can design a PDA to check the odd palindrome (string can be read from right or left) RADAR, MADAM
We will use X as a mark to distinguish the middle of the string
We will push all letters of the first part before X
Read x with no action X

Then we will pop the contents of stack
Check it with the tape, if it is same
We continue pop and read,
Until we get N then we will pop
One time if it is N so we reached
ACCEPT state else go to Reject state
Example : aabaXxabaaN
State Stack Tape

START N aaaxbbbN
READ N aaaxbbbN
PUSH aN aaaxbbbN
READ aN aaaxbbbN
PUSH aaN aaaxbbbN
READ aaN aaaxbbbN
PUSH aaaN aaaxbbbN
READ aaaN aaaxbbbN
READ aaaN aaaxbbbN
POP aaN aaaxbbbN
READ aaN aaxabbbN
POP aN aaaxbbbN
READ aN aaaxbbbN
POP N aaaxbbbN
READ N aaaxbbb
POP - aaaxbbb
ACCEPT

Non deterministic palindrome NPDA

Here there is no distinguish mark,
but our palindrome is odd
(there is (a or b) in the middle)
so we will push all letters (a and b)
and then read one (a or b)
then continue reading the tape
and pop from the stack
until we get N
H.W.
Trace the string abbababbaN on the above PDA
State Stack Tape

START N abbababbaN

n
H.W. a b n n>=1 aaaaaabbbN
State Stack Tape

START N aaaaaabbbN
H.W.
Design a PDA for a2nbnamb2m n,m>=1

Turing Machine, TM
A Turing machine is defined by M = (Q, Sigma, Gamma, delta, q0, B, F) where
Q = finite set of states including q0
Sigma = finite set of input symbols not including B
Gamma = finite set of tape symbols including Sigma and B
delta = transitions mapping Q x Gamma to Q x Gamma x {L,R}
q0 = initial state
B = blank tape symbol, initially on all tape not used for input
F = set of final states
M = ( Q, Sigma, Gamma, delta, q0, B, F)
? A ,- & O # 5%- +2 )$* P& $% 0 Q ' A B C 2 $ 2 & R ; Q ' ( $%- 2 1$ $% 9 ,+ : ( /S2

T$ () S U $. 2 @ ) S /A$. 2 1$ ) . 12) G V V $% &
$ & 2 & X A2 S $ F X ,EA 2 EF D * P T G + W ,A2 D ; 8> ,+ W ,A2
It is designed to solve 3 or more of letters with same no of letters abc …. And it can be designed for any
type of grammars
Read write direction
Input output direction
Tape same/different R/L
(a+b)b(a+b)*

n n
a b , n>=1 aaabbbY
State tape
Start aaabbbY
2 AaabbbY
2 AaabbbY
2 AaabbbY
3 AaaBbbY
4 AaaBbbY
Start AaaBbbY
AaaBbbY
AAaBbbY
.
.
.
Palindrome aabaabaaY
$ 2 $%(> # S & F$ A Z A $-S
C S2 ) E# Q /A$. 2 ) Q D * 12 $ + W ,-0 U *
H.W.
n n n
Design a Turing machine for a b c ,n>=1


Automata Theory

Uploaded by

Copyright:

Available Formats

Automata Theory

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Automata Theory

Uploaded by

Copyright:

Available Formats

Automata Theory

Instructor: Hassan Kassim Mohammad

Semester Exam Attendance Assignments & Quizzes Total

Mustansiriya University – college of sciences – computer science department – second class

Week Date Subject Chapter

Mustansiriya University – college of sciences – computer science department – second class

As a computer IT, you must study the following:

Theory of computation is the theoretical study of capabilities and limitations of Computers

Double Complement property

Mustansiriya University – college of sciences – computer science department – second class

Mustansiriya University – college of sciences – computer science department – second class

Definition of a Regular Expression

Mustansiriya University – college of sciences – computer science department – second class

epsilon is the zero length string

λ is the null string (there are no symbols in this string)

Given regular expressions x and y, x + y is a regular expression

Mustansiriya University – college of sciences – computer science department – second class

Examples of regular expressions

Regular expression output(set of strings)

Mustansiriya University – college of sciences – computer science department – second class

start with a a (a + b)*

Mustansiriya University – college of sciences – computer science department – second class

A DFA represents a finite state machine that recognizes a RE.

finite state machine is a 5 tuple M = (Q, A, T, S ,F), where

Mustansiriya University – college of sciences – computer science department – second class

State Start Final Transition

Example: Q = { 0, 1, 2 }, A= { a, b }, F = { 1 }, the initial state is 0 and T is shown in the following table.

State (q) Input (a) Input (b)

Mustansiriya University – college of sciences – computer science department – second class

Examples of languages accepted by FA

Mustansiriya University – college of sciences – computer science department – second class

contains 3 a's b*ab*ab*ab*

Mustansiriya University – college of sciences – computer science department – second class

contains even number of a = (b*ab*ab*)+

Mustansiriya University – college of sciences – computer science department – second class

Converting Regular Expression into a Finite Automata

Mustansiriya University – college of sciences – computer science department – second class

contains 3 a's b*ab*ab*ab*

Mustansiriya University – college of sciences – computer science department – second class

contains even number of a = (b*ab*ab*)+

all bit strings that begin with 0 and end with 1

all bit strings whose number of 0's is a multiple

all bit strings with more 1's than 0's

all bit strings with no consecutive 1's

Mustansiriya University – college of sciences – computer science department – second class

Converting NFA into DFA

2- drawing new design 4 50

Example : convert the following NDFA into DFA

q1q2 q2q3 q2q3

Note: Any state contains final mark it will be final state

Mustansiriya University – college of sciences – computer science department – second class

Finite State Machines with Output (Mealy and Moore Machines)

Input string aaababbaabb

this machine gives 1 after each aab

A Mealy machine that outputs

Mustansiriya University – college of sciences – computer science department – second class

One thing you will notice is the numbering of the

Mustansiriya University – college of sciences – computer science department – second class

If we have many start states = become only one

contains 3 a's bababab

contains even number of a = (babab*)+

contains 3 a's bababab

contains even number of a = (babab*)+

H.W. Design a PDA for ab