Module 1 21CS51
Module 1 21CS51
Module 1 21CS51
-------------MODULE-1-------------
Alphabets: A symbol is an abstract entity. Letters and digits are examples of frequently used symbol. An
alphabet is a finite, non-empty set of symbol and is denoted by ∑.
Operations on string:
Concatenation: of two strings is formed by writing first string followed by second string with no space.Ex.
V=’a’, W=’cat’ then V.W=’acat’
Reverse: of the string W is obtained by writing the symbols of the string in reverse order and is denoted as
WR. ex.
W=’the’ then WR=’eht’.
Length: of a string W, denoted |W| is the number of symbols composing the string. Ex.W=’the’
then |W|=3
1
Empty String: Denoted by Ɛ is the string with zero symbol.
Power of Alphabets:if ∑ is alphabet, the set of all strings of certain length can be expressed from that
alphabet by using exponential notation. Ex. ∑={a,b} then
∑0 = { Ɛ}
∑1={a,b}
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
∑2={aa,ab,bb,ba}
∑3 ={aaa,aab,aba,abb,bbb,baa,bab,bba}
∑*=∑0Ս∑1Ս∑2Ս….∑n
Language: set of strings, all are chosen from ∑*, where ∑ is a particular alphabet is called a language.
L ∑*.
Finite automata are computing devices that accept/recognize regular languages and are used to model
operations of many systems. Their operations can be simulated by a very simple computer program.
Automaton:
A finite automaton (FA, also called a finite-state automaton or a finite-state machine) is a mathematical tool
used to describe processes involving inputs and outputs. An FA can be in one of several states and can
switch between states depending on symbols that it inputs. Once it settles in a state, it reads the next input
2
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
symbol, performs some computational task associated with the new input, outputs a symbol, and switches
to a new state depending on the input. Notice that the new state may be identical to the current state.
DFA: Deterministic Finite Automata
Definition: DFA is a finite automaton in which for each input symbol there is exactly one transition out of
each state
3
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
4
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
5
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
6
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
7
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
8
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
9
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
10
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
11
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
12
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
13
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Definition: Let M = (Q, ∑, δ, q0, A) be a DFA where Q is set of finite states, ∑ is set of input alphabets
(from which a string can be formed), δ is transition function from Q x {∑Uε} to 2Q, q0 is the start state and
A is the final or accepting state. The string (also called language) w accepted by an NFA can be defined in
formal notation as:
Examples
14
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
15
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
16
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
17
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Extended Transition δ*: Describes what happens when we start in any state and follow sequence of inputs.
Definition:
Let M = (Q, ∑, δ, q0, F) where
Q is non-empty, finite set of states.
∑ is non-empty, finite set of input alphabets.
q0 Q is the start state.
F Q is set of accepting or final states.
δ* is extended transition function, which is a mapping from Q X ∑ -> Q. as follows:
i. For any q ∈ Q , δ*(q, ∈)=q ii. For any q∈Q, y ∈ ∑ * ∑ δ*(q, ya)= δ(δ*(q,y
18
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Is an NFA with Epsilon transitions. The NFA which includes transitions on the empty input Ɛ is called
ƐNFA
19
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Let M = (Q, ∑, δ, q0, F) be an NFA with M = (Q, ∑, δ, q0, F) transitions and let S be any subset of Q. The
Ɛ-closure of S denoted as Ɛ(S) is defined by
1. Every element of S is an element of Ɛ(S).
2. For any q Є Ɛ(S) every element of δ(q, Ɛ) is in Ɛ(S)
3. No other element are in Ɛ(S)
0
1 ε
Start q r s
0 ε
1
• ε-closure(q) = { q }
• ε-closure(r) = { r, s}
Examples
20
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
21
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Conversion from NFA-DFA (Subset Construction) With an NFA, at any point in a scanning of the input
string we may be faced with a choice of any number of paths to be followed to reach a final state. With a
DFA there is never a choice of paths. So, when we construct a DFA which accepts the same language as a
particular NFA, the conversion process effectively involves merging all possible states which can be reached
on a particular input character, from a particular state, into a single, composite, state which represents all
those paths.
Let MN = (QN, ∑, δN, q0, FN) be an NFA and accepts the language L(MN). There should be an equivalent
DFA MD = (QD, ∑D, δD, q0, FD) such that L(MD) = L(MN). The procedure to convert an NFA to its
equivalent DFA is shown below
Step1: The start state of NFA MN is the start state of DFA MD. So, add q0(which is the start state of NFA)
to QD and find the transitions from this state.
Step2: For each state [qi, qj,….qk] in QD, the transitions for each input symbol in ∑ can be obtained as
shown below:
a =[ql,qm,….qn]say.
• Add the transition from [qi, qj,….qk] to [ql, qm,….qn] on the input symbol α iff the state [ql, qm,….qn]
is added to QD in the previous step.
Step3: The state [qa, qb,….qc] QD is the final state, if at least one of the state in qa, qb, ….. qc AN i.e., at
least one of the component in [qa, qb,….qc] should be the final state of NFA.
22
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
23
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
24
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
25
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
b. Compute (pi ,a) and call this set {r1, r2, r3 … rm} This set is achieved
i 1
3. Make a state an accepting state if it includes any final states in the ε-NFA.
26
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
1 ε
Start q r s
0 ε
1
Converts to
0,1
q sr
Start
0,1
27
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
28
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
29
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
If MD = (QD, ∑D, δD, q0, FD) is the DFA constructed from NFA MN = (QN, ∑, δN, q0, FN) by the subset
construction, then L(MD) = L(MN).
Proof: Let |w| =0, that is w= ε. By the basis definitions of δ* for DFA’s and NFA’s both δ*({ q0 }, ε ) and
δ*( q0, ε) are {q0}
Let w be of length n+1, and assume the statement for length n. break w as w=xa, where a is the final
symbol of w. by the inductive hypothesis δ*({ q0 }, x )= δ*( q0, x). let both these sets of N’s states be
{p1,p2,…. pk}
Using eqn 2 and the fact that δ*({ q0 }, x )={p1,p2,…. pk}in the inductive part of the definition of δ* for
DFA’s
30
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
There can be zero or one There can be zero, one or There can be zero, one or
transition from a state on an more number of transitions more number of transitions
input symbol; from a state on an input from a state with or without
symbol an input symbol
Difficult to design The NFA are easier to Easy to construct using
design regular expression
More number of transitions Less number of More number of transitions
transitions compared to NFA
Less powerful since at any point More powerful; than DFA More powerful than NFA
of time it will be in only one state since it can be in more since at any point of time it
than one state will be in more than one state
with or without giving any
input.
31
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Preprocessor
A preprocessor produce input to compilers. They may perform the following functions.
1. Macro processing: A preprocessor may allow a user to define macros that are short hands for
longer constructs.
2. File inclusion: A preprocessor may include header files into the program text.
3. Rational preprocessor: these preprocessors augment older languages with more modern
flow-of-control and data structuring facilities.
4. Language Extensions: These preprocessor attempts to add capabilities to the language by
certain amounts to build-in macro
COMPILER
32
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Compiler is a translator program that translates a program written in (HLL) the source
program and translate it into an equivalent program in (MLL) the target program. As
an important part of a compiler is error showing to the programmer.
Source g
m
pgm target pgm
Compiler
Error msg
ASSEMBLER
programmers found it difficult to write or read programs in machine language. They begin to use a
mnemonic (symbols) for each machine instruction, which they would subsequently translate into
machine language. Such a mnemonic machine language is now called an assembly language.
Programs known as assembler were written to automate the translation of assembly language
in to machine language. The input to an assembler program is called source program, the output is
a machine language translation (object program).
33
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Languages such as BASIC, SNOBOL, LISP can be translated using interpreters. JAVA also
uses interpreter. The process of interpretation can be carried out in following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis 4. Direct Execution Advantages:
Modification of user program can be easily made and implemented as execution proceeds.
Type of object that denotes a various may change dynamically.
Debugging a program and finding errors is simplified task for a program used for
Interpretation.
The interpreter for the language makes it machine independent.
Disadvantages:
The execution of the program is slower.
Memory consumption is more
34
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
by leaving the assembler in memory while the user’s program was being executed. Also the
programmer would have to retranslate his program with each execution, thus wasting translation time.
To over come this problem of wasted translation time and memory. System programmers developed
another component called loader
“A loader is a program that places programs into memory and prepares them for
execution.” It would be more efficient if subroutines could be translated into object form the loader
could “relocate” directly behind the user’s program. The task of adjusting programs or they may
be placed in arbitrary core locations is called relocation. Relocation loaders perform four
functions.
TRANSLATOR
A translator is a program that takes as input a program written in one language and produces
as output a program in another language. Beside program translation, the translator performs
another very important role, the error-detection. Any violation of d HLL specification
would be detected and reported to the programmers. Important role of translator are:
1 Translating the hll program input into an equivalent ml program.
2 Providing diagnostic messages wherever the programmer violates specification of the hll.
TYPE OF TRANSLATORS:-
INTERPRETOR
COMPILER
PREPROSSESSOR
35
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Lexical Analysis:- Lexical Analyzer or Scanners reads the source program one character at a time,
On reading character stream of source program, it groups them into meaningful sequences called
“Lexemes”. For each lexeme analyzer produces an output called tokens. <token_name, attribute
value>
36
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
Syntax Analysis:-The second stage of translation is called Syntax analysis or parsing. In this
phase expressions, statements, declarations etc… are identified by using the results of lexical
analysis. Syntax analysis is aided by using techniques based on formal grammar of the
programming language.
A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program. A
syntax analyzer is also called as a parser. A parse tree describes a syntactic structure.
Semantic Analysis: Uses syntax tree and information in symbol table to check source program for
semantic consistency with language definition. It gathers type information and saves it in either syntax
tree or symbol table for use in Intermediate code generation.
Type checking- compiler checks whether each operator has the matching operands. Coercions-
language specification may permit some type of conversion.
Intermediate Code Generations:- An intermediate representation of the final machine language code
is produced. This phase bridges the analysis and synthesis phases of translation.
Code Optimization :-This is optional phase described to improve the intermediate code so that the
output runs faster and takes less space.
Code Generation:-The last phase of translation is code generation. A number of optimizations to
reduce the length of machine language program are carried out during this phase. The
output of the code generator is the machine language program of the specified computer.
37
AUTOMATA THEORY AND COMPILER DESIGN- 21CS51
38