Unit II
INTRODUCTION TO COMPILERS
TRANSLATORS-COMPILATION AND INTERPRETATION
TRANSLATOR
A translator is a program that takes as input a program written in one language
and produces as output a program in another language. Besides program translation,
the translator performs another very important role, error detection. Any
violation of the HLL (High Level Language) specification is detected and
reported to the programmer.
Important roles of a translator are:
Translating the HLL program input into an equivalent machine language program.
Providing diagnostic messages wherever the programmer violates the
specification of the HLL.
A translator or language processor is a program that translates an input
program written in a programming language into an equivalent program in
another language.
Source Code → Translator → Target Code
Fig: Execution in a Translator
Types of Translators:
a. Interpreter
b. Assembler
c. Compiler
1.1.1.a INTERPRETER
An interpreter is a program that appears to execute a source program as if
it were machine language. It is one of the translators that translate high level
language to low level language.
Source Program + Data → Interpreter → Program Output
Fig: Execution in an Interpreter
During execution, the interpreter checks the program line by line for errors. Languages such as BASIC,
SNOBOL and LISP can be translated using interpreters. Java also uses an interpreter. The
process of interpretation can be carried out in the following phases.
1. Lexical analysis
2. Syntax analysis
3. Semantic analysis
4. Direct Execution
Example: BASIC , Lower Version of Pascal, SNOBOL, LISP & JAVA
Advantages:
Modification of the user program can easily be made and implemented as execution
proceeds.
The type of object that a variable denotes may change dynamically.
Debugging a program and finding errors is a simplified task for a
program used for interpretation.
The interpreter for the language makes it machine independent.
Disadvantages:
The execution of the program is slower.
Memory consumption is higher.
1.1.1.b. ASSEMBLER
Programmers found it difficult to write or read programs in machine
language. They began to use a mnemonic (symbol) for each machine instruction,
which they would subsequently translate into machine language. Such a
mnemonic machine language is now called an assembly language. Programs
known as assemblers were written to automate the translation of assembly
language into machine language. The input to an assembler is called the
source program; the output is a machine language translation (object program).
It translates assembly level language to machine code.
Assembly Language → Assembler → Machine Code
Portability: Assembly code is platform-specific, so porting it to a different platform is
difficult.
1.1.1.c. COMPILER
A compiler is a translator program that takes a program written in a high-level
language (HLL), the source program, and translates it into an equivalent program in
machine-level language (MLL), the target program. An important role of a compiler
is to report errors to the programmer.
Source Code → Compiler → Target Code (with error messages reported to the programmer)
Fig 1.1.4: Execution in a Compiler
Executing a program written in an HLL programming language basically consists
of two parts. The source program must first be compiled (translated) into an object
program. Then the resulting object program is loaded into memory and executed.
Ada compilers
ALGOL compilers
BASIC compilers
C# compilers
C compilers
C++ compilers
COBOL compilers
D compilers
Common Lisp compilers
Fortran compilers
Java compilers
Pascal compilers
PL/I compilers
Python compilers
Difference between Compiler and Interpreter

1. A compiler works on the complete program at once; it takes the entire program as input. An interpreter works line by line; it takes one statement at a time as input.
2. A compiler generates intermediate code, called object code or machine code. An interpreter does not generate intermediate object code or machine code.
3. A compiler executes conditional control statements (like if-else and switch-case) and logical constructs faster than an interpreter. An interpreter executes conditional control statements at a much slower speed.
4. Compiled programs take more memory because the entire object code has to reside in memory. Interpreted programs are more memory efficient, since no intermediate object code is generated.
5. Compile once, run anytime: a compiled program does not need to be compiled every time. Interpreted programs are interpreted line by line every time they are run.
6. A compiler does not allow a program to run until it is completely error-free. An interpreter runs the program from the first line and stops execution only when it encounters an error.
7. Compiled languages are more efficient but harder to debug. Interpreted languages are less efficient but easier to debug, which makes them an ideal choice for new students.
8. Examples of compiled languages: C, C++, COBOL. Examples of interpreted languages: BASIC, Visual Basic, Python, Ruby, PHP, Perl, MATLAB, Lisp.
2. Compiler:
It converts the source program (HLL) into a target program (LLL).
3. Assemblers :
It converts an assembly language (LLL) into machine code. Some
compilers produce assembly for further processing. Other compilers
perform the job of the assembler, producing relocatable machine code that
can be passed directly to the loader/link-editor.
Assembly code is a mnemonic version of the machine code, in
which names are used instead of binary codes for operation and names are
also given to memory addresses. A typical sequence of assembly
instructions might be
MOV a, R1
ADD #2, R1
MOV R1, b
4. Loaders and Link-Editors:
Loader:
The process of loading consists of taking relocatable machine code,
altering the relocatable addresses and placing the altered instructions and
data in memory at the proper locations. The Link-editor allows us to make
a single program from several files of relocatable machine code. These
files may have been the result of several different compilations, and one
or more may be library files of routines provided by the system and
available to any program that needs them.
Link Editor :
It allows us to make a single program from several files of
relocatable machine code.
Software Tools
Many software tools that manipulate source program first perform
some kind of analysis. Some examples of such tools include:
1. STRUCTURE EDITORS: A structure editor takes as input a
sequence of commands to build a source program. The structure
editor not only performs the text creation and modification functions
of an ordinary text editor but it also analyses the program text. For
example, it can check that the input is correctly formed, can supply
keywords automatically etc., The output of such an editor is often
similar to the output of the analyses phase of the compiler.
2. PRETTY PRINTERS: A Pretty Printer analyzes a program and
prints it in such a way that the structure of the program becomes
clearly visible. For example, comments may appear in a special
font, and statements may appear with the amount of indentation
proportional to the depth of the nesting.
3. STATIC CHECKERS: A Static Checker reads a program,
analyzes it and attempts to discover potential bugs without running
the program. The analysis portion is often similar to that found in
optimizing compilers. For example, a static checker may detect that
parts of the source program can never be executed or that a certain
variable might be used before being defined. In addition it can catch
logical errors.
4. INTERPRETERS: Instead of producing a target program as a
translation, an interpreter performs the operations implied by the
source program. For an assignment statement, for example, an
interpreter builds a tree and then carries out the operations at the
nodes.
The syntax tree for the expression position := initial + rate * 60 is
shown below.
The first three phases form the analysis portion of a compiler, and the
last three phases form the synthesis portion of a compiler. Two other
activities, symbol-table management and error handling, are shown
interacting with the six phases of lexical analysis, syntax analysis,
semantic analysis, intermediate code generation, code optimization, and code
generation. Informally, we shall also call the symbol-table manager and the error
handler "phases".
Symbol-Table Management
An essential function of a compiler is to record the identifiers used in
the source program and collect information about various attributes of each
identifier. These attributes may provide information about the storage
allocated for an identifier, its type, its scope(where in the program it is valid),
and in the case of procedure names, such things as the number and types
of its arguments, the method of passing each argument, and the type
returned, if any.
Source Program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target Program
Fig: Phases of a Compiler
ANALYSIS OF THE SOURCE PROGRAM:
Analysis consists of three phases.
1. LINEAR ANALYSIS: The stream of characters making up the source
program is read from left-to-right and grouped into tokens that are
sequence of characters having a collective meaning.
2. HIERARCHICAL ANALYSIS: Characters or tokens are grouped
hierarchically into nested collections with collective meaning.
3. SEMANTIC ANALYSIS: Certain checks are performed to ensure
that the components of the program fit together meaningfully.
LEXICAL ANALYSIS:
In a compiler, linear analysis is called lexical analysis or scanning.
For example, in lexical analysis the characters in the assignment statement
position := initial + rate * 60
would be grouped into the following tokens:
1. The identifier position
2. The assignment symbol :=
3. The identifier initial
4. The plus sign
5. The identifier rate
6. The multiplication sign
7. The number 60
The blanks are usually eliminated during lexical analysis.
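The grouping above can be sketched with a small regex-based scanner. This Python sketch is illustrative, not part of any real compiler; the token names and patterns are assumptions chosen to match the example.

```python
import re

# One named group per token kind; blanks get a SKIP group so they can be
# eliminated during lexical analysis, as the text describes.
TOKEN_SPEC = [
    ("NUM",    r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src):
    for m in MASTER.finditer(src):
        if m.lastgroup != "SKIP":        # blanks are eliminated here
            yield (m.lastgroup, m.group())

print(list(tokenize("position := initial + rate * 60")))
```

The output is exactly the seven tokens listed above, with the blanks removed.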
SYNTAX ANALYSIS:
Hierarchical analysis is called parsing or syntax analysis. It involves
grouping the tokens of the source program into grammatical phrases that
are used by the compiler to synthesize output.
The hierarchical structure of a program is usually expressed by
recursive rules. The rules are
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so are
expression1 + expression2
expression1 * expression2
(expression1 )
Rules (1) and (2) are basis rules, while (3) defines expressions in
terms of operators applied to other expressions. Thus by rule (1), initial and
rate are expressions. By rule(2), 60 is an expression, while by rule (3), we
can first infer that rate*60 is an expression and finally that initial + rate
* 60 is an expression.
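The recursive rules above can be turned into a small recognizer. The Python sketch below checks only the flat operand-operator-operand shape of an expression (it ignores precedence and parenthesized subexpressions); splitting the input on whitespace is an assumption made for this example.

```python
import re

def is_expression(tokens):
    pos = 0
    def term():                      # rules (1) and (2): identifier or number
        nonlocal pos
        if pos < len(tokens) and re.fullmatch(r"[A-Za-z_]\w*|\d+", tokens[pos]):
            pos += 1
            return True
        return False
    def expr():                      # rule (3): expr + expr and expr * expr
        nonlocal pos
        if not term():
            return False
        while pos < len(tokens) and tokens[pos] in ("+", "*"):
            pos += 1
            if not term():
                return False
        return True
    return expr() and pos == len(tokens)

print(is_expression("initial + rate * 60".split()))
```

By the same inference as in the text, `initial`, `rate` and `60` are expressions, so `initial + rate * 60` is accepted.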
Similarly, many languages define statements recursively by rules
such as:
1. If identifier1 is an identifier and expression1 is an expression, then
identifier1 := expression1 is a statement.
2. If expression1 is an expression and statement1 is a statement,
then
while (expression1) do statement1
if (expression1) then statement1
are statements.
SEMANTIC ANALYSIS:
The semantic analysis checks the source program for semantic errors
and gathers type information for the subsequent code-generation phase. It
uses the hierarchical structure determined by the syntax-analysis phase to
identify the operators and operands of expressions and statements.
An important component of semantic analysis is type checking.
Here the compiler checks that each operator has operands that are
permitted by the source language specification. For example, many
programming language definitions require a compiler to report an error
every time a real number is used to index an array.
Fig: The later phases applied to position := initial + rate * 60.
The semantic analyzer produces a syntax tree in which the integer 60 is
wrapped in an inttoreal conversion node. The code optimizer produces the
intermediate code
temp1 := id3 * 60.0
id1 := id2 + temp1
and the code generator produces
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Code Optimization
The code optimization phase attempts to improve the intermediate
code, so that faster running machine codes will result. Some optimizations
are trivial. There is a great variation in the amount of code optimization
different compilers perform. In those that do the most, called "optimizing
compilers", a significant fraction of the time of the compiler is spent on this
phase.
temp1 := id3 * 60.0
id1 := id2 + temp1
Code Generation
The final phase of the compiler is the generation of target code,
consisting normally of relocatable machine code or assembly code. Memory
locations are selected for each of the variables used by the program. Then,
intermediate instructions are each translated into a sequence of machine
instructions that perform the same task. A crucial aspect is the
assignment of variables to registers.
For example, using registers 1 and 2, the translation of the code in
code optimizer becomes,
MOVF id3, R2
MULF #60.0,R2
MOVF id2, R1
ADDF R2,R1
MOVF R1, id1
The first and second operands of each instruction specify a source
and destination, respectively. The F in each instruction tells us that the
instruction deals with floating-point numbers.
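As a sketch of how such a translation might be automated, the Python function below maps the two three-address instructions to the five-instruction sequence above. The trivial register-assignment rule (temporaries live in R2, final results in R1) is an assumption made only to reproduce this example, not a general allocation algorithm.

```python
def gen(tac):
    out, loc = [], {}                # loc: the register a temporary lives in
    for dst, a, op, b in tac:
        reg = "R2" if dst.startswith("temp") else "R1"
        out.append(f"MOVF {loc.get(a, a)}, {reg}")       # load first operand
        mnem = "MULF" if op == "*" else "ADDF"
        out.append(f"{mnem} {loc.get(b, b)}, {reg}")     # apply the operator
        if dst.startswith("temp"):
            loc[dst] = reg           # temporary stays in its register
        else:
            out.append(f"MOVF {reg}, {dst}")             # store the result
    return out

code = gen([("temp1", "id3", "*", "#60.0"),
            ("id1",   "id2", "+", "temp1")])
print("\n".join(code))
```

Running this reproduces the MOVF/MULF/ADDF sequence above, with `temp1` resolved to R2 in the ADDF instruction.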
The compiler writer, like any software developer, can profitably use
modern software development environments containing tools such as
language editors, debuggers, version managers, profilers, test harnesses,
and so on. In addition to these general software- development tools,
other more specialized tools have been created to help implement various
phases of a compiler. These tools use specialized languages for
specifying and implementing specific components, and many use quite
sophisticated algorithms. The most successful tools are those that hide the
details of the generation algorithm and produce components that can be
easily integrated into the remainder of the compiler. Some commonly used
compiler-construction tools include parser generators, scanner generators,
syntax-directed translation engines, code-generator generators, and data-flow
analysis engines.
Upon receiving a getNextToken command from the parser, the lexical analyzer reads
input characters until it can identify the next token.
Secondary Tasks:
⚫ It produces a stream of tokens; all the basic elements of the language must be converted into tokens.
⚫ Stripping out comments and whitespace while creating the tokens.
⚫ It generates the symbol table, which stores information about identifiers and
constants encountered in the input.
⚫ It keeps track of line numbers.
⚫ If any error is present, the lexical analyzer correlates the error with the source file
and line number, and reports the error encountered while generating the
tokens.
⚫ If the source language uses a macro preprocessor (e.g., #define pi 3.14), expansion
of macros may be performed by the lexical analyzer.
Some lexical analyzers are divided into a cascade of two phases:
1. Scanning – responsible for the simple task of scanning the source program
to recognize the tokens.
2. Lexical analysis – responsible for the complex tasks; performs all the secondary tasks.
Issues in Lexical analysis: (Lexical Analysis vs. Parsing)
Reasons for separating the analysis phase of compiling into lexical analysis and
parsing are as follows:
Simplicity of design
◦ Separating lexical from syntactic analysis simplifies at least one of the tasks.
◦ e.g., a parser that had to deal with white space would be more complex.
Improved compiler efficiency
◦ Specialized buffering techniques speed up the reading of input characters.
Enhanced compiler portability
◦ Input-device peculiarities are restricted to the lexical analyzer.
Tokens, Patterns, Lexemes:
Token: a sequence of characters having a collective meaning.
Example: keywords, identifiers, operators, special characters, constants, etc.
Pattern: the set of rules by which a set of strings is associated with a single token.
Example:
for a keyword, the pattern is the character sequence forming that keyword;
for identifiers, the pattern is a letter followed by any number of
letters or digits.
Lexeme: a sequence of characters in the source program matching the pattern for a token.
Examples of Tokens:
◦ <mult_op>
◦ <id, pointer to symbol-table entry for G>
◦ <mult_op>
◦ <id, pointer to symbol-table entry for H>
Symbol Table (The Data will be stored in symbol table as follows)
Example 2 : E = M * C ** 2
◦ <id, pointer to symbol table entry for E>
◦ <assign_op>
◦ <id, pointer to symbol-table entry for M>
◦ <mult_op>
◦ <id, pointer to symbol-table entry for C>
◦ <exp_op>
◦ <number, integer value 2>
Symbol Table (The Data will be stored in symbol table as follows)
Lexical Errors
Few errors are visible at the lexical level alone, because the lexical analyzer has a
very localized view of the source program.
For instance, suppose the string whlle occurs in a source program.
Example: whlle ( a <= 7 ). A lexical analyzer cannot tell whether whlle is a
misspelling of the keyword while or an undeclared function identifier.
Since whlle is a valid lexeme for the token id, the lexical analyzer must return that
token to the parser, and some other phase of the compiler handles the error due to the
misspelling of the letters. However, suppose a circumstance arises in which the lexical
analyzer is unable to proceed because none of the patterns for tokens matches any
prefix of the remaining input. The simplest recovery strategy is "panic mode"
recovery.
Errors on which the lexical analyzer cannot act by itself:
o Misspelling of the keyword while
o An undeclared function identifier
Error-recovery actions
1. Delete successive characters from the remaining input until the lexical analyzer can
find a well-formed token at the beginning of what input is left.
2. Delete one character from the remaining input.
3. Insert a missing character into the remaining input.
4. Replace a character by another character.
5. Transpose two adjacent characters.
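Strategy 1 (panic mode) can be sketched in a few lines: keep discarding characters until some token pattern matches at the front of the remaining input. This is an illustrative Python sketch; the token patterns below are assumptions chosen only for the example, not part of any real lexer.

```python
import re

# Illustrative token patterns: keywords/identifiers, numbers, and a few
# operators/punctuation. Multi-character operators come before single ones.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|<=|>=|<>|[<>=()]")

def panic_mode(src):
    tokens, i = [], 0
    while i < len(src):
        m = TOKEN.match(src, i)
        if m:                       # a well-formed token starts here
            tokens.append(m.group())
            i = m.end()
        elif src[i].isspace():      # white space is simply skipped
            i += 1
        else:
            i += 1                  # panic mode: discard one character, retry
    return tokens

print(panic_mode("while ( a @# <= 7 )"))
```

The stray characters `@#` are silently deleted and scanning resumes at the next well-formed token.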
INPUT BUFFERING
We often have to look one or more characters beyond the next lexeme before we can be
sure we have the right lexeme. As characters are read from left to right, each character is
stored in the buffer to form a meaningful token.
We introduce a two-buffer scheme that handles large lookaheads safely. We then
consider an improvement involving "sentinels" that saves time checking for the ends of
buffers.
BUFFER PAIRS
A buffer is divided into two N-character halves, as shown below.
Each half is of the same size N, and N is usually the number of characters on one disk
block, e.g., 1024 or 4096 bytes.
Using one system read command we can read N characters into a buffer.
If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file.
For each character read, we make two tests: one for the end of the buffer, and one to
determine what character is read. We can combine the buffer-end test with the test for the
current character if we extend each buffer to hold a sentinel character at the end.
The sentinel is a special character that cannot be part of the source program, and a natural
choice is the character eof.
The sentinel arrangement is as shown below:
Note that eof retains its use as a marker for the end of the entire input. Any eof that appears
other than at the end of a buffer means that the input is at an end.
Code to advance the forward pointer:

forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else /* eof within a buffer signifying end of input */
        terminate lexical analysis
end
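The buffer-pair idea can be simulated in a few lines of Python. This sketch assumes a small N and uses '\0' as the eof sentinel; a real lexer would keep lexemeBegin/forward pointers into the buffer rather than collect characters into a list.

```python
N = 4          # buffer-half size; real compilers use a disk block, e.g. 4096
EOF = "\0"     # sentinel standing in for eof

def read_all(source):
    pos = N                                  # next source position to load
    buf = list(source[:N]) + [EOF]           # one half plus its sentinel
    i, out = 0, []
    while True:
        c = buf[i]                           # single test per character:
        i += 1                               # just look at buf[i]
        if c == EOF:
            if pos < len(source):            # sentinel hit: reload a half
                buf = list(source[pos:pos + N]) + [EOF]
                pos += N
                i = 0
            else:
                break                        # this eof really ends the input
        else:
            out.append(c)
    return "".join(out)

print(read_all("position:=60"))
```

Only when the sentinel is seen do we pay for the extra test distinguishing "end of buffer" from "end of input", exactly the saving the sentinel arrangement provides.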
The length of a string s, usually written |s|, is the number of occurrences of symbols
in s. e.g., the string "cs6660" is of length six.
• The empty string is denoted by ε; the length of the empty string is zero.
The term language denotes any set of strings over some fixed alphabet.
• e.g., the empty set φ and {ε}, the set containing only the empty string, are languages.
Operations on strings:
1. Concatenation of strings
If x and y are strings, then the concatenation of x and y (written xy) is the
string formed by appending y to x. If x = hello and y = world, then xy = helloworld.
2. Exponentiation. Let s be a string; then
s0 = ε, s1 = s, s2 = ss, s3 = sss, …, sn = ss…s (n times), and so on.
3. Identity element: sε = εs = s.
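These operations map directly onto Python strings, where ε is the empty string; a quick sketch:

```python
# Concatenation xy: append y to x.
x, y = "hello", "world"
assert x + y == "helloworld"

# Exponentiation: s^0 = ε, s^n = s repeated n times.
s = "ab"
assert s * 0 == ""            # s^0 is the empty string ε
assert s * 3 == "ababab"      # s^3 = sss

# Identity element: sε = εs = s.
eps = ""
assert s + eps == s and eps + s == s
print("all string-operation checks passed")
```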
TERM DEFINITION
Prefix of s A string obtained by removing zero or more trailing symbols of string
s;
e.g., cs is a prefix of cs6660.
Suffix of s A string formed by deleting zero or more of the leading symbols of
s;
e.g., 660 is a suffix of cs6660.
Substring of s A string obtained by deleting a prefix and a suffix from s;
e.g., 66 is a substring of cs6660.
Proper prefix, suffix, Any nonempty string x that is a prefix, suffix or substring of s
or substring of s such that s ≠ x.
Subsequence of s Any string formed by deleting zero or more not necessarily
contiguoussymbols from s; e.g., c60 is a subsequence of cs6660.
Operations on Languages :
There are several operations that can be applied to languages:
Definitions of operations on languages L and M:
OPERATION DEFINITION
Union of L and M. written LυM L υ M = { s | s is in L or s is in M }
Concatenation of L and M. written LM LM = { st | s is in L and t is in M }
Kleene closure of L, written L* L* = L0 U L1 U L2 U … (zero or more concatenations of L)
Positive closure of L, written L+ L+ = L1 U L2 U … (one or more concatenations of L)
Example 1: Let L be the set of letters {A, B, …, Z, a, b, …, z} and let D be the set of digits {0, 1, …, 9}.
1. L U D is the set of letters and digits — strictly speaking, the language with 62 strings of
length one, each of which is either one letter or one digit.
2. LD is the set of 520 strings of length two, each consisting of one letter followed by one digit.
3. L5 is the set of all 5-letter strings.
4. L* is the set of all strings of letters, including ε, the empty string.
5. L(L U D)* is the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all strings of one or more digits.
Example 2
Let W be the set of characters {c,o,m,p,i,l,e,r} and let N be the set of digits {1,2,3}
Operations:
1. W U N is the set of characters and digits — a language with 11 strings of length one,
each of which is either one character or one digit.
2. WN is the set of 24 strings of length two, each consisting of one character followed by
one digit.
3. W3 is the set of all 3-character strings.
4. W* is the set of all strings of characters, including ε, the empty string.
5. W(W U N)* is the set of all strings of characters and digits beginning with a character.
6. N+ is the set of all strings of one or more digits.
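Modeling a language as a finite Python set of strings, the counts in Example 2 can be checked directly. This is an illustrative sketch only; languages such as W*, being infinite, cannot be enumerated this way.

```python
from itertools import product

W = set("compiler")                 # {'c','o','m','p','i','l','e','r'}, 8 characters
N = {"1", "2", "3"}                 # 3 digits

union = W | N                       # W U N: union of the two languages
WN = {s + t for s, t in product(W, N)}            # concatenation: 8 * 3 strings
W3 = {"".join(p) for p in product(W, repeat=3)}   # W^3: all 3-character strings

assert len(union) == 11             # 8 characters + 3 digits
assert len(WN) == 24                # each character followed by each digit
assert len(W3) == 8 ** 3            # every length-3 string over W
print(len(union), len(WN), len(W3))
```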
Regular Expressions
Regular expressions allow us to define the sets of strings that form tokens.
For example, a Pascal identifier is formed by a letter followed by zero or
more letters or digits: letter ( letter | digit )*
A regular expression is formed using a set of defining rules.
Each regular expression r denotes a language L(r).
The rules below define the regular expressions over an alphabet ∑. Associated with each
rule is a specification of the language denoted by the regular expression being defined.
BASIS
Rule 1: ε is a regular expression that denotes {ε}, i.e. the set containing
the empty string.
Rule 2: If a is a symbol in ∑, then a is a regular expression that denotes {a},
i.e. the set containing the string a.
INDUCTION (Rule 3). Suppose r and s are regular expressions denoting the
languages L(r) and L(s). Then
(r) | (s) is a regular expression denoting the languages L(r) U L(s).
(r)(s) is a regular expression denoting the languages L(r)L(s).
(r)* is a regular expression denoting the languages (L(r))*.
(r) is a regular expression denoting the languages L(r).
• A language denoted by a regular expression is said to be a regular set.
• The specification of a regular expression is an example of a recursive definition.
Order of evaluation of regular expressions:
As defined, regular expressions often contain unnecessary pairs of parentheses. We may
drop certain pairs of parentheses if we adopt the conventions that:
The unary operator * has highest precedence and is left associative.
Concatenation has second highest precedence and is left associative.
| has lowest precedence and is left associative.
Example 3 we may replace the regular expression (a)|((b)*(c)) by a|b*c.
Both expressions denote the set of strings that are either a single a or are zero or more
b's followed by one c.
Example 4: Let Σ = {a, b}.
1. The regular expression a|b denotes the language {a, b}.
2. (a|b)(a|b) denotes {aa, ab, ba, bb}, the language of all strings of length two over the alphabet
Σ. Another regular expression for the same language is aa|ab|ba|bb.
3. a* denotes the language consisting of all strings of zero or more a's, that is, {ε, a, aa, aaa, ...}.
4. (a|b)* denotes the set of all strings consisting of zero or more instances of a or b, that is, all
strings of a's and b's: { ε,a, b,aa, ab, ba, bb,aaa,...}. Another regular expression for the same
language is (a*b*)*.
5. a|a*b denotes the language {a, b, ab, aab, aaab, ...}, that is, the string a and all strings
consisting of zero or more a's and ending in b.
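Item 5 can be checked with Python's `re` module, whose syntax for |, * and concatenation matches the notation used here:

```python
import re

# a|a*b should accept the string a and all strings of zero or more
# a's ending in b, and nothing else.
pat = re.compile(r"a|a*b")

for s in ["a", "b", "ab", "aab", "aaab"]:
    assert pat.fullmatch(s), s          # in the language
for s in ["", "ba", "abb", "aa"]:
    assert not pat.fullmatch(s), s      # not in the language
print("a|a*b behaves as described")
```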
Regular set
A language that can be defined by a regular expression is called a regular set. If two regular
expressions r and s denote the same regular set, we say they are equivalent and write r = s.
For instance, (a|b) = (b|a).
Several algebraic laws hold for arbitrary regular expressions r, s, and t: for example, r|s = s|r (| is commutative), r(s|t) = rs|rt (concatenation distributes over |), and εr = rε = r (ε is the identity for concatenation).
Regular Definition
For notational convenience, we may wish to give names to regular expressions and to
define regular expressions using these names as if they were symbols.
Identifiers are strings of letters and digits beginning with a letter. The following
regular definition provides a clear specification for such strings.
If ∑ is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the
form
d1 → r1
d2 → r2
…
dn → rn
Where each di is a distinct name, and each ri is a regular expression over the symbols in ∑ U {d1,
d2, … , di-1}, i.e., the basic symbols and the previously defined names.
Example: C identifiers are strings of letters, digits, and underscores. Here is a
regular definition for the language of C identifiers.
letter_ → A | B | … | Z | a | b | … | z | _
digit → 0 | 1 | … | 9
id → letter_ ( letter_ | digit )*
Example: Unsigned numbers (integer or floating point) are strings such as 5280, 0.01234,
6.336E4, or 1.89E-4. The regular definition is
digit → 0 | 1 | … | 9
digits → digit digit*
optionalFraction → . digits | ε
optionalExponent → ( E ( + | - | ε ) digits ) | ε
number → digits optionalFraction optionalExponent
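Transcribed into Python regex syntax (the `(?:...)` groups are ordinary non-capturing parentheses, and `?` plays the role of `| ε`), the definition of number accepts exactly the sample strings:

```python
import re

# digits        -> [0-9]+
# optionalFraction -> (\.[0-9]+)?     (". digits | ε")
# optionalExponent -> (E[+-]?[0-9]+)? ("(E (+|-|ε) digits) | ε")
number = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:E[+-]?[0-9]+)?")

for s in ["5280", "0.01234", "6.336E4", "1.89E-4"]:
    assert number.fullmatch(s), s
assert not number.fullmatch(".5")    # the fraction part needs leading digits
print("number pattern accepts all four sample strings")
```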
Example:
Write a regular definition to represent dates in the following format: JAN 5th 2016
DateFormat → Month Date Year
Month → JAN | FEB | … | DEC
Date → [0-3]?[0-9] ( st | nd | rd | th )
Year → [12] [0-9] [0-9] [0-9]
Extensions of Regular Expressions
• Notational Shorthand:
– This shorthand is used in certain constructs that occur frequently in regular expressions.
1. One or more instances. The unary, postfix operator + represents the positive closure of
a regular expression and its language. That is, if r is a regular expression, then (r)+
denotes the language (L(r))+. The operator + has the same precedence and associativity
as the operator *. Two useful algebraic laws, r* = r+ | ε and r+ = rr* = r*r, relate the
Kleene closure and positive closure.
2. Zero or one instance. The unary postfix operator ? means "zero or one occurrence."
That is, r? is equivalent to r|ε, or put another way, L(r?) = L(r) U {ε}. The ? operator
has the same precedence and associativity as * and +.
3. Character classes. A regular expression a1|a2|…|an, where the ai's are each symbols of
the alphabet, can be replaced by the shorthand [a1a2…an]. When the symbols are
consecutive — e.g., consecutive uppercase letters, lowercase letters, or digits — we can
replace them by [a1-an], that is, just the first and last separated by a hyphen. Thus,
[abc] is shorthand for a|b|c, and [a-z] is shorthand for a|b|…|z.
Example: Using these shorthands, we can rewrite the regular definition for C identifiers as:
letter_ → [A-Za-z_]
digit → [0-9]
id → letter_ ( letter_ | digit )*
and the regular definition for unsigned numbers as:
digit → [0-9]
digits → digit+
optionalFraction → ( . digits )?
optionalExponent → ( E [+-]? digits )?
number → digits ( . digits )? ( E [+-]? digits )?
Recognition of tokens:
We have learned how to express patterns using regular expressions. Now we must study how
to take the patterns for all the needed tokens and build a piece of code that examines the
input string and finds a prefix that is a lexeme matching one of the patterns.
Consider the grammar fragment:
stmt → if expr then stmt
| if expr then stmt else stmt
| ε
expr → term relop term
| term
term → id
| number
For relop, we use the comparison operators of languages like Pascal or SQL, where = is
"equals" and <> is "not equals", because it presents an interesting structure of lexemes.
The terminals of the grammar, which are if, then, else, relop, id and number, are the names of
tokens as far as the lexical analyzer is concerned; the patterns for these tokens are described
using regular definitions.
Patterns for the tokens of the above grammar:
digit → [0-9]
digits → digit+
number → digits ( . digits )? ( E [+-]? digits )?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
In addition, we assign the lexical analyzer the job of stripping out white space, by recognizing
the "token" ws defined by:
ws → ( blank | tab | newline )+
Here, blank, tab and newline are abstract symbols that we use to express the ASCII characters
of the same names. The token ws is different from the other tokens in that, when we recognize it,
we do not return it to the parser, but rather restart the lexical analysis from the character
that follows the white space. It is the following token that gets returned to the parser.
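Putting the pieces together, a scanner for this token set that recognizes ws but never returns it to the parser might be sketched as below. This is an illustrative Python sketch: the keyword-before-identifier ordering is a simplification (it would mis-handle an identifier that merely starts with a keyword), and the token names are assumptions.

```python
import re

# One named group per token; multi-character relops listed first so that
# "<=" is not split into "<" and "=".
SPEC = [("IF", r"if"), ("THEN", r"then"), ("ELSE", r"else"),
        ("RELOP", r"<=|>=|<>|[<>=]"), ("NUMBER", r"[0-9]+"),
        ("ID", r"[A-Za-z][A-Za-z0-9]*"), ("WS", r"[ \t\n]+")]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in SPEC))

def next_tokens(src):
    for m in MASTER.finditer(src):
        if m.lastgroup != "WS":          # ws is recognized but discarded;
            yield m.lastgroup, m.group() # only the next real token reaches
                                         # the parser

print(list(next_tokens("if a <= 7 then b")))
```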
Transition Diagram
A transition diagram consists of a collection of nodes or circles, called states.
Each state represents a condition that may occur during the process of reading the
input while looking for a lexeme that matches one of many patterns.
Edges are directed from one state to another. Each edge is labeled by a symbol
or set of symbols. If we are in a state s, and the next input symbol is a, we
look for an edge out of state s labeled by a. If we find such an edge, we advance
the forward pointer and enter the state of the transition diagram to which that
edge leads.
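The identifier diagram — a start state, a state looped on letters and digits, and acceptance on any "other" character with retraction of the forward pointer — can be simulated directly. This Python sketch is illustrative; state numbers and the use of `isalpha`/`isalnum` in place of explicit edge labels are assumptions.

```python
def run_id_diagram(src):
    # state 0 = start; state 1 = inside an identifier.
    state, forward = 0, 0
    while True:
        c = src[forward] if forward < len(src) else None
        if state == 0:
            if c is not None and (c.isalpha() or c == "_"):
                state, forward = 1, forward + 1   # edge: letter -> state 1
            else:
                return None                       # no edge: not an identifier
        elif state == 1:
            if c is not None and (c.isalnum() or c == "_"):
                forward += 1                      # loop on letter/digit
            else:
                return src[:forward]              # "other": accept, retract

print(run_id_diagram("rate*60"))
```

On input `rate*60` the simulation accepts the lexeme `rate` and leaves the forward pointer at `*`, exactly the retraction the starred accepting state calls for.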
Transition Diagram for an identifier