Unit 1 Finite Automata and Languages: Structure Page Nos
Unit 1 Finite Automata and Languages: Structure Page Nos
Unit 1 Finite Automata and Languages: Structure Page Nos
LANGUAGES
Structure Page Nos.
1.0 Introduction 7
1.1 Objectives 7
1.2 Regular Expressions 7
1.2.1 Introduction to Defining of Languages
1.2.2 Kleene closure definition
1.2.3 Formal Definition of Regular Expressions
1.2.4 Algebra of Regular Expressions
1.3 Regular Languages 13
1.4 Finite Automata 14
1.4.1 Finite Automata
1.4.2 Another method to describe FA
1.4.3 Finite Automata as Output Devices
1.5 Summary 23
1.6 Solutions/Answers 23
1.0 INTRODUCTION
We shall study different types of theoretical machines that are mathematical models
for actual physical processes. By considering the possible inputs on which these
machines can work, we can analyze their various strengths and weaknesses. We then
arrive at what we may believe to be the most powerful machine possible. When we
do so, we would be surprised to find the computational tasks that this machine cannot
perform. This will be our ultimate result that no matter what machine we build, there
will always be questions that are simple to state but even the most powerful machine
possibly cannot answer. Along the way, we hope you would understand the concept
of computability, which is the foundation of further research in this field.
1.1 OBJECTIVES
After studying this unit, you should be able to:
7
Finite Automata and Alphabet: A finite set of symbols/characters. We generally denote an alphabet by .
Formal Languages
If we start an alphabet having only one letter, say, the letter z, then = {z}
Letter : Each symbol of an alphabet may also be called a letter of the alphabet or
simply a letter.
Example 2: If the word zzz is called c and the word zz is called d, then the word
formed by concatenating c and d is
cd = zzzzz
When two words in our language L1 are concatenated they produce another word in
the language L1. However, this may not be true in all languages.
= {zodd}
then c = zzz and d = zzzzz are both words in L2, but their concatenation cd = zzzzzzzz
is not a word in L2. The reason is simple that member of L2 are of odd length while
after concatenation it is of even length.
Note: The alphabet for L2 is the same as the alphabet for L1.
Example 4: A Language L3 may denote the language having strings of even lengths
include of length 0. In other words, L3 = { , zz, zzzz, …..}
8
In the above description of concatenation we find very commonly, that for a single Finite Automata and
Languages
letter alphabet when we concatenate c with d, we get the same word as when we
concatenate d with c, that is cd = dc But this relationship does not hold for all
languages. For example, in the English language when we concatenate “Ram” and
“goes” we get “Ram goes”. This is, indeed, a word but distinct from “goes Ram”.
Now, let us define the reverse of a language L. If c is a word in L, then reverse (c) is
the same string of letters spelled backward.
The reverse (L) = {reverse (w), w L}
Let us define a new language called PALINDROME over the alphabet = {a,b}.
For a given alphabet , the language L consists of all possible strings, including the
null string.
*
For example, If = {z}, then, = L1 = { , z, zz, zzz …..}
Example 7: If = {0, 1}, then, * = { , 0, 1, 00, 01, 10, 11, 000, 001 …..}
So, we can say that Kleene Star is an operation that makes an infinite language of
strings of letters out of an alphabet, if the alphabet, . However, by the definition
*
alphabet may also be . In that case, is finite. By “infinite language, we mean a
language with infinitely many words.
Now, we can generalise the use of the star operator to languages, i.e., to a set of
words, not just sets of alphabet letters.
Definition: If s is a set of words, then by s* we mean the set of all finite strings
formed by concatenating words from s, where any word may be used as often.
Positive Closure: If we want to modify the concept of closure to refer to only the
concatenation leading to non-null strings from a set s, we use the notation + instead of
*. This plus operation is called positive closure.
9
Finite Automata and
Formal Languages Theorem 1: For any set s of strings prove that s* = (s*)* = s**
Proof: We know that every word in s** is made up of factors from s*.
Also, every factor from s* is made up of factors from s.
Therefore, we can say that every word in s** is made up of factors from s.
For example, now we would build expression from the symbols 0,1 using the
operations of union, concatenation, and Kleene closure.
(i) 01 means a zero followed by a one (concatenation)
(ii) 0+1 means either a zero or a one (union)
(iii) 0* means +0+00+000+….. (Kleen closure) .
With parentheses, we can build larger expressions. And, we can associate meanings
with our expressions. Here’s how
Example 10: The language associated with the regular expression a*b* contains all
the strings of a’s and b’s in which all the a’s (if any) come before all the b’s (if any).
L = { , a, b, aa, ab, bb, aaa, aab, abb, bbb, aaa,…)
Note that ba and aba are not in this language. Notice also that there need not be the
same number of a’s and b’s.
Example 11: Let us consider the language L defined by the regular expression
(a+b)* a(a+b)*. The strings of the language L are obtained by concatenating a string
from the language corresponding to (a+b)* followed by a followed by a string from
the language associated with (a+b)*. We can also say that the language is a set of all
words over the alphabet = {a,b} that have an a in them somewhere.
Definition: If S and T are sets of strings of letters (they may be finite or infinite sets),
we define the product set of strings of letters to be. ST = {all combinations of a string
from S concatenated with a string from T in that order}.
Ex.4) Find a regular expression over the alphabet {0,1,} to describe the set of all
binary numerals without leading zeroes (except 0 itself). So the language is
the set
{0,1,10,11,100,101,110,111,…}.
11
Finite Automata and 1. (R+S)+T = R+(S+T)
Formal Languages
2. R+R = R
3. R+ = +R = R.
4. R+S = S+R
5. R = R=
6. R = R=R
7. (RS)T = R(ST)
8. R(S+T) = RS+RT
9. (S+T)R = SR+TR
* *
10. = =
So R+R = R
R(S+T) = RS+RT
L(R(S+T)) = L(R)L(S+T)
= L(R)(L(S)UL(T))
= (L(R)L(S))U(L(R)L(T))
= L(RS+RT)
Similarly, by using the equalities we can prove the rest. The proofs of the rest of the
equalities are left as exercises.
Example 15: Show that R+RS*S = a*bS*, where R = b+aa*b and S is any regular
expression.
= R( +S*S) (property 8)
= (b+aa*b)S* (definition of R)
= ( +aa*) bS* (properties 6 and 8)
Definition: For a given alphabet , the following rules define the regular language
associated with a regular expression.
Rule 2: For each a in , the set {a} is a regular language denoted by the regular
expression a.
(i) The language = {xy : x L and y M} is a regular expression associated with the
regular expression lm
(ii) The regular expression l+m is associated with the language formed by the union
of the sets L and M.
language (l+m) = L M
(iii) The language associated with the regular expression (l)* is L*, the Kleen Closure
of the set L as a set of words:
Now, we shall derive an important relation that, all finite languages are regular.
Theorem 4: If L is a finite language, then L can be defined by a regular expression.
In other words, all finite languages are regular.
To make one regular expression that defines the language L, turn all the words in L
into bold face type and insert plus signs between them. For example, the regular
13
Finite Automata and expression that defines the language L = {baa, abbba, bababa} is baa + abbba +
Formal Languages bababa
Example1 6: If L = {aa, ab, ba, bb}, then the corresponding regular expression is
aa + ab +ba + bb.
So, a particular regular language can be represented by more than one regular
expressions. Also, by definition, each regular language must have at least one regular
expression corresponding to it.
In our day to day life we oftenly use the word Automatic. Automation is the process
where the output is produced directly from the input without direct involvement of
mankind. The input passes from various states in process for the processing of a
language we use very important finite state machine called finite automata.
Can a machine recognise a language? The answer is yes for some machine and some
an elementary class of machines called finite automata. Regular languages can be
represented by certain kinds of algebraic expressions by Finite automaton and by
certain grammars. For example, suppose we need to compute with numbers that are
represented in scientific notation. Can we write an algorithm to recognise strings of
symbols represented in this way? To do this, we need to discuss some basic
computing machines called finite automaton.
An automata will be a finite automata if it accepts all the words of any regular
language where language means a set of strings. In other words, The class of regular
language is exactly the same as the class of languages accepted by FA’s., a
deterministic finite automata.
14
A finite automata is similar to a finite state machine. A finite automata consists of Finite Automata and
Languages
five parts:
(1) a finite set of states;
(2) a finite set of alphabets;
(3) an initial state;
(4) a subset of set of states (whose elements are called “yes” state or; accepting
state;) and
(5) a next-state function or a transition state function.
A finite automata over a finite alphabet A can be thought of as a finite directed graph
with the property that each node omits one labelled edge for each distinct element of
A. The nodes are called states. There is one special state called the start (or initial)
state, and there is a possible empty set of states called final states.
For example, the labelled graph in fig.1 given below represents a DFA over the
alphabet A = {a,b} with start state 1 and final state 4.
We always indicate the start state by writing the word start with an arrow painting to
it. Final states are indicated by double circle.
The single arrow out of state 4 labelled with a,b is short hand for two arrows from
state 4, going to the same place, one labelled a and one labelled b. It is easy to check
that this digraph represents a DFA over {a,b} because there is a start state, and each
state emits exactly two arrows, one labelled with a and one labelled with b.
2. An alphabet of possible input letters from which are formed strings that are to
be read one letter at a time.
3. A finite set of transitions that tell for each state and for each letter of the input
alphabet which state to go to next.
For example the input alphabet has only two letters a and b. Let us also assume that
there are only three states, x, y and z. Let the following be the rules of transition:
1. from state x and input a go to state y;
15
Finite Automata and 2. from state x and input b go to state z;
Formal Languages
3. from state y and input b go to state x;
Let us examine what happens to various input strings when presented to this FA. Let
us start with the string aaa. We begin, as always, in state x. The first letter of the
string is an a, and it tells us to go state y (by rule 1). The next input (instruction) is
also an a, and this tells us (by rule 3) to go back to state x. The third input is another
a, and (by Rule 1) again we go to the state y. There are no more input letters in the
input string, so our trip has ended. We did not finish in the final state (state z), so we
have an unsuccessful termination of our run.
The string aaa is not in the language of all strings that leave this FA in state z. The set
of all strings that do leave as in a final state is called the language defined by the finite
automaton. The input string aaa is not in the language defined by this FA. We may
say that the string aaa is not accepted by this FA because it does not lead to a final
state. We may also say “aaa is rejected by this FA.” The set of all strings accepted is
the language associated with the FA. So, we say that L is the language accepted by
this FA. FA is also called a language recogniser.
Let us examine a different input string for this same FA. Let the input be abba. As
always, we start in state x. Rule 1 tells us that the first input letter, a, takes us to state
y. Once we are in state y we read the second input letter, which is ab. Rules 4 now
tells us to move to state z. The third input letter is a b, and since we are in state z,
Rule 5 tells us to stay there. The fourth input letter is an a, and again Rule 5 says state
z. Therefore, after we have followed the instruction of each input letter we end up in
state z. State z is designated as a final state. So, the input string abba has taken us
successfully to the final state. The string abba is therefore a word in the language
associated with this FA. The word abba is accepted by this FA.
It is not difficult for us to predict which strings will be accepted by this FA. If an
input string is made up of only the letter a repeated some number of times, then the
action of the FA will be jump back and forth between state x and state y. No such
word can ever be accepted.
To get into state z, it is necessary for the string to have the letter b in it as soon as a b
is encountered in the input string, the FA jumps immediately to state z no matter what
state it was before. Once in state z, it is impossible to leave. When the input strings
run out, the FA will still be in state z, leading to acceptance of the string.
So, the FA above will accept all the strings that have the letter b in them and no other
strings. Therefore, the language associated with this FA is the one defined by the
regular expression (a+b)* b(a+b)*.
The list of transition rules can grow very long. It is much simpler to summarise them
in a table format. Each row of the table is the name of one of the states in FA, and
each column of this table is a letter of the input alphabet. The entries inside the table
are the new states that the FA moves into the transition states. The transition table for
the FA we have described is:
Table 1
Input
State
a b
16
Start x y z Finite Automata and
Languages
y x z
Final z z z
The machine we have already defined by the transition list and the transition table can
be depicted by the state graph in Figure 2.
Note: A single state can be start as well as final state both. There will be only one
start state and none or more than one final states in Finite Automaton.
The finite automata shown in Figure 3 can also be represented in Tabular form as
below:
Table 2
Input
State 0 1 Accept?
Start 1 1 2 No
Final 2 2 3 Yes
3 3 3 No
Before continuing, let’s examine the computation of a finite automaton. Our first
example begins in state one and reads the input symbols in turn changing states as
necessary. Thus, a computation can be characterized by a sequence of states. (Recall
that Turing machine configurations needed the state plus the tape content. Since a
finite automaton never writes, we always know what is on the tape and need only look
at a state as a configuration.) Here is the sequence for the input 0001001.
Input Read : 0 0 0 1 0 0 1
States : 1 1 1 1 2 2 2 3
17
Finite Automata and
Formal Languages Example 17 (An elevator controller): Let’s imagine an elevator that serves two
floors. Inputs are calls to a floor either from inside the elevator or from the floor
itself. This makes three distinct inputs possible, namely:
0 - no calls
1 - call to floor one
2.- call to floor two
The elevator itself can be going up, going down, or halted at a floor. If it is on a floor,
it could be waiting for a call or about to go to the other floor. This provides us with
the six states shown in figure 4 along with the state graph for the elevator controller.
Accepting and rejecting states are not included in the elevator design because
acceptance is not an issue. If we were to design a more sophisticated elevator, it
might have states that indicated:
Finite automata
a) power faukyrem
b) overloading, or
c) breakdown
18
In this case, acceptance and rejection might make sense. Finite Automata and
Languages
Let us make a few small notes about the design. If the elevator is about to move ( i.e.,
in state U1 or D2) and it is called to the floor it is presently on it will stay. (This may
be good Try it next time you are in an elevator.) And, if it is moving (up or down)
and gets called back the other way, it remembers the call by going to the U1 or D2
state upon arrival on the next floor. Of course, the elevator does not do things like
open and close doors (these could be states too) since that would have added
complexity to the design. Speaking of complexity, imagine having 100 floors.
That is our levity for this section. Now that we know what a finite automaton is, we
must (as usual) define it precisely.
We also need some additional notation. The next state function is called the transition
function and the accepting states are often called final states. The entire machine is
usually defined by presenting a transition state table or a transition diagram. In this
way, the states, alphabet, transition function, and final states are constructively
defined. The starting state is usually the lowest numbered state. Our first example of
a finite automaton is:
Where the transition function , is defined explicitly by either a state table or a state
graph.
At this point, we must make a slight detour and examine a very important yet
seemingly insignificant input string called the empty string. It is a string without any
symbols in it and is denoted as . It is not a string of blanks. An example might
make this clear. Look between the brackets in the picture below.
Let us look again at a computation by our first finite automaton. For the input 010,
our machine begins in q1, reads a 0 and goes to (q2,0) = q2 after reading the final 0.
All that can be put together as:
( ( (q1,0),1)0) = q2
*
We call this transition on strings and define it as follows:
Definition : Let M = (Q, , ,qO,F). For any input string x, input symbol a,
and state qi, the transition function on strings * takes the values:
*
(qi,(*e)) = wi
19
Finite Automata and
Formal Languages *
(qi,a) = (qi,,a) a
*
(qi,xa) = ( *(qi,x),a) a ,x *
That certainly was terse. But * is really just what one expects it to be. It merely
applies the transition function to the symbols in the string.
This machine has a set of states = {q0, q1, q2, q3) and operates over the input alphabet
{a,b}. It’s starting state is q0 which can also be shown by an arrow headed toward it
with no start point (as shown in Fig.6) and its set of final or accepting states, F = {q2}
an accepting state can also be shown by two concentric circles as shown in the fig..
The transition function is fully described twice once in figure 6 as a state graph and
once in tasble 4 as a state table.
Table 4
Input
State A b Accept?
0 3 1 No
1 3 2 No
2 2 2 Yes
3 3 3 No
If the machine receives the input bbaa, it goes through the sequence of states:
q0,q1,q2,q2,q2
While when it gets an input such as abab, it goes through the state transition:
q0,q3,q3,q3,q3
Now we shall become a bit more abstract. When a finite automaton receives an input
string such as:
x = x1x2….xn
where the xi are symbols from its input alphabet, it progresses through the sequence:
q k1 , q k 2 ,...q k n 1
where the states in the sequence are defined as:
q k1 = q 0
q k2 (q k1 , x 1 ) (q 0 , x 1 )
*
q k3 (q k 2 , x 2 ) (q 0 , x 1 x 2 )
*
q kn (q k n , x n ) (q 0 , x 1 x 2 ....x n )
20 1
Finite Automata and
Languages
Getting back to a more intuitive reality, the following table provides an assignment of
values to the symbols used above for an input of bbaba to the finite automaton of
figure 3.
i 1 2 3 4 5 6
xi b b a b a
q k1 q0 q1 q2 q2 q2 q2
Definition: The set (of strings) accepted by the finite automaton M = (Q, , ,q0,F) is
T(M) = {x *(q0,x) F}
This set of accepted strings (L(M) to mean for language accepted by M) is merely all
of the strings for which M ended up in a final or accepting state after processing the
string. For our first example (figure 1) this was all strings of 0’s and 1’s that contain
exactly one 1. Our last example (figure 3) accepted the set of strings over the alphabet
{a,b} which began with exactly two b’s.
Indicating that the machine in state i and on input a gives output x and enters state j.
In a Mealy machine, an output always takes place during a transition of the states.
The second model invented by Moore [1956], is called a Moore machine. It
associates an output letter with each state. For example, if the output associated with
state I is x, we will always write i/x inside the state circle. A typical state transition
for a Moore machine can be presented in figure8 as follows:
Example 18: Suppose we want to compute the number of sub strings of the form
21
Finite Automata and bab
Formal Languages that occurs in an arbitrary input string over the alphabet {a,b}. For example, there are
three such sub strings b, a, b in the string bab.
The diagrammatic representation of a Mealy machine for the task is given below in
figure 9:
For example, the output of this Moore machine for the simple string.
Abababaababb is 0000101000010. We can count the number of 1’s in the output
string to obtain the number of occurrence of the sub string bab.
The input symbols for the required Moore machine are 0 (no traffic detected) and 1
(traffic detected). Let G, Y and R mean the colours Green, Yellow and Red,
respectively. The output strings are GR, YR, RG, AND RY, where the first letter of a
string is the colour of the east-west light and the second letter of a string is the colour
of the north-sought light. The Moore machine model for this simple traffic
intersection problem is given below diagrammatically:
Mealy machines appear to be more useful than Moore machines. But problems like
traffic signal control have hic Moore machine solutions because each state is
associated with a new output configuration.
Ex.9) Build an FA that accepts only those words that have even lengths. Also write
the regular expression.
Ex.10) Build an FA that accepts only the word baa, ab and abb and no other words.
Also write the corresponding regular expression.
Ex.11) Build an FA that will accept the language of all words each having twice as
many a’s as the number of b’s. Also write the corresponding regular
expression.
(a) (b)
(c)
Fig. 12
1.5 SUMMARY
In this unit we introduced several formulations for regular languages, regular
expressions are algebraic representations of regular languages. Finite Automata are
machines that recognise regular languages. From regular expressions, we can derive
regular languages. We also made some other observations. Finite automata can be
used as output devices - Mealy and Moore machines.
1.6 SOLUTIONS/ANSWERS
Ex.1) (i) ababbbaa
(ii) baaababb
(iii) ab abb ab abb
(iv) baa baa
(v) ababbababb baa
23
Finite Automata and (ii) {a,ba}*= { , a, ba, aa, baba, aba, baa, … }
Formal Languages
Ex.3) (a) a+b+c
(b) ab*+ba*
(c) +a(bb)*
Ex.4) 0+1(0+1)*
Ex.5) Starting with the left side and using properties of regular expressions, we get
b*(abb* + aabb*+aaabb*)*
= b*((ab+aab+aaab)b*)* (property 9)
= (b + ab + aab + aaab)* (property 7).
Ex.8)
Ex.9)
24
Finite Automata and
Languages
25