CD Unit 1
CD Unit 1
CD Unit 1
INTRODUCTION
Divyaprabha K N
Department of Computer Science & Engineering
COMPILER DESIGN
Syllabus
Unit 2 : Bottom-up parsing: Shift-Reduce Parsing, LR (0), SLR, viable prefixes, CLR,
LALR. Syntax-Directed Translation Syntax-directed definitions, Evaluation orders
for SDD’s, Applications of Syntax-Directed Translation, and Syntax-directed
Translation Schemes – Postfix Translation Schemes. – 20 Slots
COMPILER DESIGN
Syllabus
Unit 3 : Parser Stack Implementation: Parser Stack Implementation of Postfix SDT's, SDT's
with actions inside Productions, SDT's for L-Attributed Definitions. Implementing L-
Attributed SDD’s: Bottom-Up Parsing. Intermediate-Code Generation Variants of Syntax
Trees – Directed Acyclic Graphs for Expressions, Three-Address Code – Addresses and
Instructions, Quadruples, Triples, Indirect Triples, SSA Form, Control Flow Graph. – 22 Slots
Desirable Knowledge:
Course Contents
T1 -
“Compilers–Principles, Techniques and
Tools” Alfred V. Aho, Monica S. Lam,
Ravi Sethi, Jeffery D. Ullman; 2nd
Edition
R1 –
“Modern Compiler Design”, Dick Grune,
Kees van Reeuwijk, Henri E. Bal, Ceriel
J.H. Jacobs, Koen Langendoen; 2nd
Edition
1. Use the knowledge of patterns, tokens and regex for solving the problems
in the field of data mining.
•In some of these cases, the initial implementation was not self-
hosted, but rather, written in another language (or even in machine
language); in other cases, the initial implementation was developed
using bootstrapping.
Compiler Design
Bootstrapping (Compilers)
https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/
https://www.geeksforgeeks.org/bootstrapping-in-compiler-design/
Native Compiler
Generates code for the same Platform Converts
on which it runs. high language into
computer’s native language. A cross
Example: Turbo C or GCC compiler compiler is for
Cross compiler cross-platform
A cross compiler is a compiler capable of creating executable code for a software
platform other than the one on which the compiler is running. development
of binary code
For example, a compiler that runs on a Windows 7 PC but generates
code that runs on Android smartphone is a cross compiler. USE :
Embedded Computers (Microwave oven, Washing Machine)
Compiler Design
Other Variants
Application of Compiler
Divyaprabha K N
Assistant Professor,
Department of Computer Science & Engineering
COMPILER DESIGN
Unit 1
Compiler
• Target program is executable Machine language program
Input output
Target Program
COMPILER DESIGN
The Language Processing
System
Interpreter
#define PI 3.14
GCC : The acronym GCC is used to refer to the "GNU Compiler Collection". Over
time GCC has been extended to support many additional languages, including Fortran, A
DA, Java and ObjectiveC.
GCC is written in C with a strong focus on portability, and can compile itself, so it can
be adapted to new systems easily. GCC is a portable compiler-
it runs on most platforms available today, and can produce output for many types of proce
ssors.
GCC is not only a native compilerit can also crosscompile any program, producing
executable files for a different system from the one used by GCC itself. GCC has multiple
language frontends, for parsing different languages. Programs in each language can be co
mpiled, or crosscompiled, for any architecture. For example, an ADA program can be
compiled for a microcontroller, or a C program for a supercomputer.
Compiler Design
The language processing system
Compiled version in
memory is interpreted.
COMPILER DESIGN
Example of Language Processing System(LPS)
The sequence of commands executed by a single invocation of GCC consists of the following stages:
There are two types of external libraries: static library and shared library.
A static library has file extension of ".a" (archive file) in Unixes or ".lib"
(library) in Windows. When your program is linked against a static library, the
machine code of external functions used in your program is copied into the
executable. A static library can be created via the archive program “ar.exe".
Compiler Design
Shared Library
of
The compiler th does its
em in seven different
work
have access to
phases and each
something known as “Symbol
Table” and to an “Error
Handler”.
These two entities are explained
in the upcoming slides
Compiler Design
Grouping of phases into passes
Single pass Compiler – All phases are grouped into one part.
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Two pass Compiler- The phases are grouped into two parts.
Source
Analysis Part
Program Lexical Syntax Semantic
Analysis Analysis Analysis (Front end of Compiler)
(HLL)
Intermediate Code
Target
Synthesis Part Code Code Machine
(Back end of Compiler) Optimizer Generator code
(Assembly level)
Compiler Design
Traditional two-pass compiler
Responsibilities
• Recognize legal programs
• Report errors for the illegal programs in a useful way
• Produce IR and construct the symbol table
• Much of front end construction can be automated
Compiler Design
The back-end
Source code
a = b + c; If we generate code for each statement separately
d = a + e; we will not generate efficient code
Target code
MOV b,R0
If c = 1, then:-
ADD c,R0 Code for first statement
MOV R0,a Redundant MOV b,R0
MOV a,R0 INC R0
ADD e,R0 ADD e.R0
Code for second statement
MOV R0,d MOV R0,d
Compiler Design
The back-end (Instruction Scheduling)
Typical Transformations:
• Analyzes IR and transforms IR
•Discover and
•Primary goal is to reduce running constantpropagate
values
time of the compiled code
•Discover a
•May also improve space, power consumption (mobile redundant
computing) computation and remove it
• Must also preserve “meaning” of the code •Remove unreachable code
THANK
YOU
Divyaprabha K N
Department of Computer Science & Engineering
[email protected]
Compiler Design
Divyaprabha K N
Assistant Professor,
Department of Computer Science & Engineering
COMPILER DESIGN
The Structure of a Compiler
Phases of a Compiler
SYMBOL TABLE
COMPILER DESIGN
The Structure of a Compiler
Phases of a Compiler
Source Lexeme
Program (Sequence of
Toke <token-name, attribute
toke nam aatttributtribu Tokens)
n value>
n e tee vvalue
Lexeme Tokens
Source a = b + c * 60
a <id, 1>
Program
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
COMPILER DESIGN
The Structure of a Compiler (Phases of a Compiler)
a = b + c * 60
<id, 1> <=> <id, 2> <+> <id, 3> <*> <60>
id1 +
id2 *
id3 60
COMPILER DESIGN
The Structure of a Compiler (Phases of a Compiler)
=
id1 +
a
id2 *
b
=
c
id3 60
id1 + Type-
checking
id2 *
id3 inttofloat
60
COMPILER DESIGN
The Structure of a Compiler (Phases of a Compiler)
COMPILER DESIGN
The Structure of a Compiler (Phases of a Compiler)
id3 inttofloat
60
COMPILER DESIGN
The Structure of a Compiler (Phases of a Compiler)
t1 = id3 * 60.0
id1 = id2 + t1
COMPILER DESIGN
The Structure of a Compiler
(Phases of a Compiler)
LDF R2, id3
LD- Load MULF R2, #60.0
LD Register, Memory LDF R1, id2
ST- Store ADDF R1, R2
ST Memory, Register STF id1, R1
ADD- Additon
ADD R1, R2
ADD R1, R2, R3
MUL - Multiplication
MUL R1, R2
Symbol Table
Two pass Compiler- The phases are grouped into two parts.
COMPILER DESIGN
Grouping of Phases into passes
Code Code
Target Machine code
Optimizer Generator
(Assembly language code)
Divyaprabha K N
Department of Computer Science & Engineering
COMPILER DESIGN
Detailed study of Lexical Analysis
It reads the input characters of the source program , group them into Lexemes.
It produces a sequence of tokens for each lexemes in the source program
as output.
The stream of tokens is sent to the parser for syntax analysis.
When it discovers the lexeme representing an identifier, it will enter that
lexeme into the symbol table.
COMPILER DESIGN
Role of Lexical Analyzer
Lexical Analyzer
Lexical Analyzer are divided into two process
1.Scanning :- this consists of the simple processes that do not require tokenization
of the input, such as deletion of comments and compaction of consecutive
whitespace character into one.
2. Lexical Analysis :- It produces the sequence of tokens as output.
COMPILER DESIGN
Role of Lexical Analyzer
distinct terms :
Token :- It is a pair consisting of a token name and an optional attribute
value.
< token name, attribute value>
- Token name is an abstract symbol representing a kind of lexical unit
like keywords, identifiers
- Attribute value is pointer to the symbol table entry
COMPILER DESIGN
Role of Lexical Analyzer
Tokens, Patterns and Lexemes
Pattern :- is a description of the form that the Lexemes of a token may
take.
- keyword- sequence of character
- Identifiers - sequence of character with some complex Structure
fi Spelling Error
fi (a == f(x)) Unmatched string
Appearance of illegal
characters
Exceeding length of identifiers
COMPILER DESIGN
Role of Lexical
Analyzer
Lexical Errors
input
Replace character by another printf(“hi”);
character
Transpose
Delete two adjacent
successive characters.
characters from the remaining input until
Lexical Analyzer recognizes a well-formed token. This is
also called PANIC MODE error recovery.
COMPILER DESIGN
Role of Lexical
Analyzer
Error Recovery Actions
Delete one character from the remaining input.
Example -
Insert missing character into the remaining di{…
…..
input
Replace character by another } while();
character
Transpose two adjacent characters.
Delete successive characters from the remaining input until Lexical Analyzer
recognizes a well-formed token. This is also called PANIC MODE error recovery.
COMPILER DESIGN
Role of Lexical
Analyzer
Error Recovery Actions
Delete one character from the remaining input.
Example -
Insert missing character into the remaining rpintf(“hi”);
input
printf(“hi”);
Transpose
Replace character by another
two adjacent character
characters.
Delete successive characters from the remaining input until Lexical Analyzer
recognizes a well-formed token. This is also called PANIC MODE error recovery.
COMPILER DESIGN
Role of Lexical
Analyzer
Error Recovery Actions
Delete one character from the remaining input.
Example -
Insert missing character into the remaining printf(“hi”)$$;
input
printf(“hi”);
Replace character by another character
Transpose
Delete two adjacent
successive characters.
characters from the remaining input until
Lexical Analyzer recognizes a well-formed token. This is
also called PANIC MODE error recovery.
Compiler Design
Error Message
When writing an error message, make sure it satisfies the following properties:
● It must pinpoint the error correctly
● It must be understandable
● It must be specific
● It must not have any redundancy
THANK
YOU
Divyaprabha K N
Department of Computer Science & Engineering
[email protected]
Compiler
Design
Divyaprabha K N
Assistant Professor,
Department of Computer Science & Engineering
Compiler Design
Unit 1
Symbol Table
Divyaprabha K N
Department of Computer Science & Engineering
Compiler Design
Lecture Overview
Other attributes:
• A scope of a variable can be represented by
– A number (Scope is just one of attributes)
– A different symbol table is constructed for different scope.
e Int 0 4 0 2 7 argument
7. z = fun1(k,e,); TK_ID
https://www.tutorialspoint.com/compiler_design/compiler_design_symbol_table.htm
Compiler Design
Constructing the symbol table (Linked list implementation)
http://codingloverlavi.blogspot.com/2014/02/symbol-table-implementation-using.html
Compiler Design
Symbol table and scope
Divyaprabha K N
Department of Computer Science &
Engineering
[email protected]
Compiler Design
Divyaprabha K N
Assistant Professor,
Department of Computer Science & Engineering
COMPILER DESIGN: UNIT 1
Divyaprabha K N
Department of Computer Science & Engineering
Compiler Design
Lecture Overview
Loop example
DO 5 I = 1,25 Here DO is a keyword representing a loop (similar to FOR in C), I = 1,25 represents that the
iteration variable I ranges from 1..25. The number 5 represents that the loop extends from
the DO statement till the statement with label 5, including all statements that are in between.
Another example:
DO 5 I = 1.25 The only difference in this code from the previous code is the replacement of , with .. This
simply means that the variable DO5I = 1.25, i.e., a variable has been assigned an integer (there is no
loop). The problem is that just by looking at the first three characters, I cannot tell whether DO is a
keyword or a prefix of a variable name. Hence we need a lookahead. In this example, a large lookahead is
required. Ideally, the language design should ensure that the lookahead is small.
Compiler Design
Challenge : Scanning is hard
For example, when we read the first character of the keyword else, we need to lookahead to
decide whether it is an identifier or a keyword. Similarly, we need lookahead to disambiguate
between == and =.
This makes lexical analysis a bit more difficult -- need to decide what is a variable name and
what is a keyword, and so need to look at what's going on in the rest of the expression.
Compiler Design
Challenge : Scanning is hard
Reference:
https://docs.python.org/3/reference/lexical_analysis.html
Compiler Design
Challenge : Scanning is hard
Divyaprabha K N
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 1
The Phases of a Compiler
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
To define the Grammar for C Language, let us first define the following:
P : Program Beginning
S : Statement
Declr : Declaration
Assign : Assignment
Cond : Condition UnaryExpr
: Unary Expression
Type : Data type
ListVar : List of variables
X : (can take any identifier or assignment)
RelOp : Relational Operator
Compiler Design
C Language Grammar
P →S
S → Declr; S | Assign; S | if (Cond) {S} S | while (Cond) {S} S |
if (Cond) {S} else {S} S | for (Assign; Cond; UnaryExpr) {S} S | return E; S | λ
→ Type ListVar
Declr
Type → int | float
ListVar → X | ListVar, X
Recall that
X → id | Assign E -> E + T | E – T | T
Assign → id = E T -> T * F | T / F | F
F -> G ^ F | G
Cond → E RelOp E G -> ( E ) | id | num
RelOp → < | > | <= | >= | == | !=
UnaryExpr → E++ | ++E | E-- | --E
Compiler Design
Another example for all phases of a compiler
We will now run this code through the different phases of a compiler and
see what the output will be at each stage.
Compiler Design
Example 1
(continued)
) }
Number 0
Literal – anything in double quotes
Inc_op
Compiler Design
Example 1
(continued)
Output: tokens
< keyword, while >
< punctuation, ( > or < ( > < ID, 3 > corresponds
< ID, 3 > to the third
identifier in the
< RelOp, > > or < > > symbol table
< Number, 0 >
< punctuation, ) > or < ) >
while (number > 0)
< punctuation, { > or < { >
< ID, 4 > {
< Assign, = > or < = > factorial = factorial *
< ID, 4 > number;
< ArithOp, * > or < * >
--number;
< ID, 3 >
< punctuation, ; > or < ; > }
< Dec_op, -- > or < -- >
< ID, 3 >
Compiler Design
Example 1
(continued)
Phase 2 : Syntax Analysis or Parser
Output: Parse Tree or Syntax Tree
Use the grammar
we have defined
earlier
number = t2 --number;
goto L0 }
L2 : …
Compiler Design
Example 1
(continued)
Phase 5 : Machine Independent Code Optimization
Output: DAG Optimization
LD R1, #0
L1 : LD R4, number L0 : If number > 0 goto L1
SUB R2, R1, R4
goto L2 while (number > 0)
BLTZ R2, L2
LD R3, factorial L1 : factorial = factorial * {
MUL R3, R3, R4
number number = number – 1 factorial = factorial *
ST factorial,
R3 SUB R4, number;
goto
R4, #1 --number;
ST number, L0 L2 :
}
R4 BR L1 …
L2:
Phase 7 : Machine dependent Optimized Code
Output: Optimized Target Code
Compiler Design
Example 2
Run this code through the different phases of a compiler and produce the
output at each stage.
Compiler Design
C Language Grammar
P →S
S → Declr; S | Assign; S | if (Cond) {S} S | while (Cond) {S} S |
if (Cond) {S} else {S} S | for (Assign; Cond; UnaryExpr) {S} S |
return E; S | λ
Declr → Type ListVar
Type → int | float
ListVar → X | ListVar, X Recall that
E -> E + T | E – T | T
X → id | Assign
T -> T * F | T / F | F
Assign → id = E F -> G ^ F | G
Cond → E RelOp E G -> ( E ) | id | num
RelOp → < | > | <= | >= | == | !=
UnaryExpr → E++ | ++E | E-- | --E
Compiler Design
Simple IR
Optimizations
Run this code through the different phases of a compiler and produce the
output at each stage.
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 1
The Phases of a Compiler
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
We will now run this code through the different phases of a compiler and
see what the output will be at each stage.
Compiler Design
Example 2
(continued)
goto L2 {
lhs = lhs – 2 * rhs;
L1 : t1 = 2 * rhs
}
t2 = lhs -
t1 lhs = t2
Compiler Design
Example 2
(continued)
Phase 5 : Machine Independent Code
Optimization Output: Optimized IR
– t2,
lhs = 0
lhs rhs = 2 int rhs, lhs = 0;
We will now run this code through the different phases of a compiler and
see what the output will be at each stage.
Compiler Design
Example 3
(continued)
;
Number 23 0
Compiler Design
Example 3
(continued)
Output: # of tokens : 25 Symbol
Table
< ID, 1 > name type …
<=> 1 n
<(> …
n = 23;
for (i = 0; i < n; i++)
{
sum = sum * i;
}
Compiler Design
Example 3
(continued)
Phase 3 : Semantic Analyzer
Output: Semantically checked Syntax
Tree
n = 23;
for (i = 0; i < n; i++)
{
sum = sum * i;
}
Compiler Design
Example 3
(continued)
Phase 4 : Intermediate Code Generator
Output: Three Address Code (3AC or
TAC)number = 23
i=0
L0 : If i < n goto L1
n = 23;
goto L2
for (i = 0; i < n; i++)
L1 : t1 = sum * i
{
sum = t1
sum = sum * i;
t2 = i + 1
}
i = t2
goto L0
L2 : …
Compiler Design
Example 3
(continued)
Phase 5 : Machine Independent Code
Optimization
Output: Optimized IR
number = 23
i=0
n = 23;
L0 : If i < n goto L1
for (i = 0; i < n; i++)
goto L2
{
L1 : sum = sum * i
sum = sum * i;
i=i+1
}
goto L0
L2 : …
Compiler Design
Example 3
(continued)
Phase 6 : Code
Generator Output:
Assembly Code number = 23
MOV R1, #23 //
i=0
R1 → n MOV R2, #0 //
L0 : If i < n goto L1
R2 → i n = 23;
MOV R4, sum // R4 = goto L2 for (i = 0; i < n; i++)
contents(sum) L0 : SUB R3, R1, R2 L1 : sum = sum * i {
BLTZ R3, L2 i=i+1 sum = sum * i;
MUL R4, R4, R2
goto L0 }
A L2 : …
DD
R2 ,
Compiler Design
Example 3
We will now run this code through the different phases of a compiler and
see what the output will be at each stage.
Compiler Design
Example 3
(continued)
;
Number 23 0 65
Literal – anything in double quotes “%c”
Compiler Design
Example 3
(continued)
Output: tokens
< ID, 1 >
<=>
< Number, 23 >
< Keyword, for
>
<(> <{> n = 23;
< ID, 2 > < ID, 3 >
for (i = 0; i < n; i++)
<=> <(>
< Number, 0 > < Literal, “%c” {
<;> >
< ID, 2 > <,> printf(“%c”,
<<> < Number, 65 > 65+i);
}
< ID, 1 > <+>
<;> < ID, 2 >
< ID, 2 > <)>
< ++ > <;>
<)> <}>
Compiler Design
Example 3
(continued)
Phase 2 : Syntax Analysis or
Parser Assume slightly
modified
grammar (with
functions)
n = 23;
for (i = 0; i < n; i++)
{
printf(“%c”, 65+i);
}
S -> id ( Param ); S
Param -> E, Param | E
Compiler Design
Example 3
(continued)
Phase 3 : Semantic Analyzer
Output: Semantically checked Syntax
Tree
n = 23;
for (i = 0; i < n; i++)
{
printf(“%c”, 65+i);
}
Compiler Design
Example 3
(continued)
Phase 4 : Intermediate Code Generator
Output: Three Address Code (3AC or
TAC)n = 23
i=0
L0 : If i < n goto L1
n = 23;
goto L2
for (i = 0; i < n; i++)
L1 : t1 = 65 + i
{
call (printf,
printf(“%c”,
t1) t2 = i + 1 65+i);
}
i = t2
goto L0
L2 : …
Compiler Design
Example 3
(continued)
Phase 5 : Machine Independent Code
Optimization
Output: DAG
Optimization
n = 23
i=0
L0 : If i < n goto L1 n = 23;
goto L2 for (i = 0; i < n; i++)
L1 : t1 = 65 + i {
call (printf, printf(“%c”,
t1) i = i + 1 65+i);
}
goto L0
L2 : …
Compiler Design
Example 3
(continued)
Phase 6 : Code
Generator Output:
Assembly
MOV R1,Code
#23 n = 23
ST n, R1 i=0
LD R2, #0 L0 : If i < n goto L1 n = 23;
L1 : goto L2 for (i = 0; i < n; i++)
SUB R3, R1,
L1 : t1 = 65 + i {
R2 BLTZ R3,
L2 call (printf, printf(“%c”,
LD R4, #65 t1) i = i + 1 65+i);
}
ADD R5, R4, goto L0
R2 PRINT R5
L2 : …
ADD R2, R2,
#1 BR L1
Compiler Design
Example 4 - Take home
exercise
Exercise Problem : Consider the following piece of
code: int n = 3;
do This problem is for the
{ understanding of the
if( (n/2)*2 == n ) concept by testing you
{ on if-else and do-while
n = n / 2; constructs.
}
Note: The Parse Tree
else
may become large, try to
{ do the if-else part
n = 3 * n + 1; separately
}
} Hint: Consider
while(n != 1 || n != 2 || n != S -> do {S} while (S);
4); return n; S in the grammar
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 1
Lexical
Analysis
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
The Lexical analyzer reads the input characters of the source program and
then performs the following steps:
1. Groups the input characters into Lexemes.
2. Produces a sequence of tokens for each lexeme in the source program as
output.
3. Sends the stream of tokens to the parser for Syntax Analysis.
4. When a lexeme representing an identifier is discovered, it is entered into
the symbol table.
Compiler Design
Role of a Lexer
(2)
Error
Handler
Symbol
Regex Grammar
Table
Compiler Design
Interaction Between the Lexical Analyzer and Parser
(2)
● The lexical analyzer reads characters of the source program one at a time to
discover tokens.
● Thus, speed of lexical analysis is a concern.
● In many languages, there are scenarios when the lexer needs to look ahead
at least one character before a match can be announced.
Solution - The lexical analyzer reads from Input Buffers
● There are many schemes that can be used to bufferinput.
This course discusses two schemes -
○ Buffer Pairs
○ Sentinels
Compiler Design
Input Buffers - Buffer
Pairs
Solution - Combine the buffer-end check with the check for current
character.
This can be done by extending each buffer to hold a Sentinel character (EOF)
at the end.
Compiler Design
Input Buffering : Buffer pairs with
Sentinels
forward = forward + 1;
if forward = eof then begin
if forward at end of first half then begin
reload second half;
forward := forward +1
end
else if forward at end of second half then begin
reload first half;
move forward to beginning of first half
end
else
terminate lexical analysis
/* eof within buffer, i.e, end of input
*/
end
Compiler Design
Lexical Errors
1. Spelling Error
di{…
2. Unmatched string
3. Appearance of illegal …………………}
characters while();
4. Exceeding length of identifiers
Compiler Design
Lexical Errors
1. Spelling Error
2. Unmatched string printf(“hi”)$$;
3. Appearance of illegal
characters
4. Exceeding length of identifiers
Compiler Design
Lexical Errors
1. Spelling Error
More than 32
2. Unmatched string characters in
3. Appearance of illegal an identifier
characters
4. Exceeding length of identifiers
Compiler Design
Error Message
Preet Kanwal
Department of Computer Science & Engineering
Lex
Tutorial
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lex and Yacc
● The Lex specification file is divided into three sections with%% dividing
the sections.
● The structure of a lex specification file is as follows -
User subroutines are auxiliary procedures needed by the actions. The subroutine can be loaded with
the lexical analyzer and compiled separately.
Compiler Design
Lex Specification
file
● Input is copied to output one character at a time.
● The first %% is always required as there must always be a rules section.
● However, if we don’t specify any rules, then the default action is to
match everything and copy it to output.
● Defaults for input and output are stdin and stdout.
● Any source code not intercepted by lex is copied
into the generated program.
○ A line that is not part of a lex rule or action, which begins with a blank
or tab, is copied out as above (useful for global declarations)
○ Anything included between lines containing only %{ and %}
(useful for preprocessor statements that must start in
col.1)
○ Anything after the second %% delimiter is copied
Compiler Design
Lex Specification
file
● Definitions intended for lex are given before the first %%. Anyline
in this section that does not begin with a blank or tab, or is not
enclosed by %{...%}, is assumed to be defining a lex substitution
string of the form
name translation , for e.g, letter [a-zA-Z]
● yytext : a character array that contains the
actual string that matches a pattern. [0-9]+ { yylval =
atoi(yytext);
● yyleng : the no. of characters matched.
return INTEGER;}
● Values associated with the token are returned by lex in a union
variable yylval.
● Local variables can be defined.
● Do not leave extra spaces and/or empty lines at the end of the
lex specification file.
Compiler Design
Lex Specification file - A Simple
Example
https://www.epaperpress.com/lexandyacc/prl.html
digit [0-9]
● This example demonstrates the letter
use of the definitions section. [_A-Za-z]
%{
● The definitions section is #include<stdio.h>
composed of substitutions, code, %}
and start states. %% printf("Identifier :
{letter}({letter}|{digit}) %s",yytext);
● Code in the definitions section is *
simply copied as-is to the top of .;
the generated C file and must be %%
bracketed with %{ and %} markers. int
yywrap(){return(1);}
yyin = fopen("input.txt",
● Substitutions simplify int"r");
main()
yylex();
pattern-matching { fclose(yyin);
rules. return 0;
}
Program file -
Compiler Design
Lex Specification file - Example 4 - Word
count
%{
● This example counts the number of
int nchar, nword, nline;
characters, words, and lines in a %}
file (similar to Unix wc) %%
● Whitespace must separate the \n { nline++; nchar++; }
defining term and the [^ \t\n]+ { nword++, nchar += yyleng; }
associated expression. . { nchar++; }
%%
● References to substitutions in the int
rules section are surrounded by yywrap(){return(1);}
braces ({letter}) to distinguish int main(void) {
them from literals. yyin = fopen("input.txt",
"r"); yylex();
● When we have a match in the printf("%d\t%d\t%d\n", nchar, nword,
rules section the associated C nline); return 0;
code is executed. }
Program file -
Compiler Design
Lex Specification file - Example 5 - Removing
comments
%%
● This example removes single line
\/\/(.*) ;
and double line comments. \/\*(.*\n*)*.*\*\/ ;
%%
int
yywrap(){return(1);}
int main()
{
yyin=fopen("input.txt","r")
; yylex();
return 0;
}
Program file - removing_comments.l
Compiler Design
yyparse() and yylex()
https://www.ibm.com/docs/en/zos/2.4.0?topic=yacc-function-section#yacfun
yylex() returns a value indicating the type of token that has been obtained. If
the token has an actual value, this value (or some representation of the
value, for example, a pointer to a string containing the value) is returned in
an external variable named yylval.
It is up to the user to write a yylex() routine that breaks the input into tokens
and returns the tokens one by one to yyparse(). See Function section for
more information on the lexical analyzer.
Compiler Design
Start conditions
https://support.workiva.com/hc/en-us/articles/4407304269204-Regular-expression-operators
Compiler Design
Commonly used Regular expressions in lex specification
file
Pattern Matches
[a] a but nothing else
[ ]a] ] or a but nothing else
[[a] [ or a but nothing else
[ ][ ] ] or [ but nothing else (why?)
\[ [ (not a character class)
[-az] - or a or z but nothing else
[az-] - or a or z but nothing else
[ ]-az] any character from ] to a, or z
[ ]az-] ] or a or z or - but nothing else
[ab^] a or b or ^ but nothing else
[ ]az^-] ] or a or z or ^ or - but nothing else
Pattern Matches
[^ab] any character EXCEPT a or b
[^^] any character EXCEPT ^
https://www.ics.uci.edu/~alspaugh/cls/shr/regularExpression.html
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 1
Design of a Lexical Analyzer
Generator
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
Given -
abb printf(“1”)
aba printf(“2”)
a printf(“3”)
Questions:
• Provide output for the String : a
• Provide output for the String :
ababa
Compiler Design
How does a Lexer resolve
ambiguities?
Questions:
● Provide output for the String : a
○ Answer - 3 Recall Panic
● Provide output for the String : Mode
ababa Recovery
○ Answer
■ aba --22b3
■ b
■ a-3
Compiler Design
Example 2
Given -
a*b printf(“1”)
(a|b)*b printf(“2”)
c* printf(“3”)
Questions:
• Provide output for the String : cbabc
• Provide output for the String : cbbbbac
• Provide a String such that the Output is
132
Compiler Design
Example 2 -
Solution
a*b printf(“1”)
(a|b)*b printf(“2”)
c* printf(“3”)
● Provide output for the String : cbabc ● Provide a String such that the Output is
○ Answer - 323 132
■ c-3 ○ Answer - abccbb
■ bab - 2 ■ ab - 1
■ c-3 ■ cc - 3
● Provide output for the String : ■ bb - 2
cbbbbac ○ Other answers are possible
○ Answer - 32a3
■ c-3
■ bbbb - 2
■ a-a
■ c-3
Compiler Design
Example 3
Given -
aa printf(“1”)
b?a+b? printf(“2”)
b?a*b? printf(“3”)
Questions:
● Provide output for the String : bbbaabb
● Provide a String such that the Output is
123
● Provide a String such that the Output is
321
Compiler Design
Example 3 -
Solution
aa printf(“1”)
b?a+b? printf(“2”)
b?a*b? printf(“3”)
● Provide output for the String : bbbaabb
○ Answer - 323
■ bb - 3
■ baab - 2
■ b-3
● Provide a String such that the Output is
123
○ Not possible
● Provide a String such that the Output is
321
○ bbbabaa
■ bb - 3
■ bab - 2
Compiler Design
Example 4
Answer -
Suppose the patterns are as follows
:a* print(“1”)
ab print(“2”)
bb print(“3”)
Program Files -
● Example 1 - example1.l
● Example 2 - example2.l
● Example 3 - example3.l
● Example 4 - example4.l
○ In this program, lex shows a warning - “rule cannot be matched”,
because the first rule always matches the lexemes matched by the
other 2 rules.
Compiler Design
Lexical Analyzer with
Lex
Compiler Design
The Structure of the Lex Analyzer
Generator
Compiler Design
Constructing Automata - RE to NFA to
DFA
Compiler Design
Constructing Automata - RE to NFA to
DFA
The generated lexical analyzer consists of components that are created from
the lex specification file.
These components are -
● Transition Table - Created for all patterns defined in the lex program.
● Actions - Fragments of code defined to their corresponding patterns.
○ Actions are invoked by the Automata Simulator at the appropriate
time.
● Functions - Defined in the auxiliary functions section of the lex program,
these are passed directly through lex to the output file.
● Automata Simulator - The program that serves as the lexical analyzer and
uses the components mentioned above.
○ It simulates an automata (NFA or DFA)
Compiler Design
Question 1- Construct an automata for the lex
program
Sequence of
states entered
while processing
input aaba
Compiler Design
Question 1- Construct an automata for the lex
program
● At this point, the longest prefix that matches the NFA can be identified,
● As indicated in the diagram, in state 8, aab matches the pattern a*b.
● Prefix aab is the longest prefix, and so it executes action A3.
● The forward pointer is movies to the end of the lexeme.
● Note - If there are several accepting states in the set, the action
associated with the earliest pattern is executed.
Sequence of
states entered
while
processing
input aaba
Compiler Design
Question 1- Construct an automata for the lex
program
Sequence of
states entered
while
processing
input abba
Compiler Design
Question 2- Construct an automata for the lex
program
Sequence of
states entered
while processing
input ababc
Compiler Design
Question 1- Construct an automata for the lex
program
● At this point, the longest prefix that matches the NFA can be identified,
● As indicated in the diagram, in state 8, abab matches the pattern (a|b)*b.
● Prefix abab is the longest prefix, and so it executes action A2.
● The forward pointer is movies to the end of the lexeme.
● Note - If there are several accepting states in the set, the action
associated with the earliest pattern is executed.
Sequence of
states entered
while
processing
input aabac
Compiler Design
Question 1- Construct an automata for the lex
program
Sequence of
states entered
while
processing
input ababc
Compiler Design
Lookahead Operator in Lex
(/)
digit [0-9]
letter
[A-Za-z]
%{
#include <stdio.h>
The rule
}% .;
%% is used to ignore
if|else|int|char|float {printf("Keyword :\t %s",yytext); all other
characters.
{letter}({letter}|{digit})* printf("Valid Identifier:\t %s",yytext);
.;
%%
int
yywrap(){return(1);}
int main() {
Program file : keywords_and_identifiers.l
yyin = fopen(“input.txt”,
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 1
Specification and Recognition of Tokens
Preet Kanwal
Department of Computer Science & Engineering
Compiler Design
Lecture Overview
Recall that
Token :- It is a pair consisting of a token name and an optional attribute
value.
<token name, attribute value>
● Token name is an abstract symbol representing a kind of lexical unit -
keywords, identifiers etc.
● Attribute value is pointer to the symbol table entry.
Compiler Design
Examples
● When more than one lexeme can match a pattern, lexical analyzer must
provide additional information about each lexeme matched.
● This additional information is stored in the attribute-value field of the token.
● The attribute-value field can include several pieces of information.
● Consider the example : token ID (an identifier)
○ Its attributes are the lexeme, the type and the location at which it is first
found. This is added to the symbol table.
○ The attribute value is the pointer to the symbol table entry for that
identifier.
● Lexical analyzer returns token name and attribute value to parser.
● The token name influences parsing decisions, whereas the attribute value
influences translation of tokens after parsing.
Compiler Design
Implementation of Lexer
Hand-written Implementation
Compiler Design
Specification of Tokens
● Regular expressions are special text strings that describe a search pattern.
● It is mainly used for pattern matching.
● It is a metalanguage - a form of language used for description or analysis of
another language.
● It uses metacharacters - characters that have special meaning. For eg -
○ * matches any number of characters
○ . matches single characters except newline
○ \ drops the special meaning of the next character
● A Regular language is a formal language that can be expressed using a regular
expression.
● Regular expressions are the grammatical notations used for describing
structure of tokens, i.e, patterns.
Compiler Design
Regular Definitions
○ getToken() - examines the symbol table entry for the lexeme found and
returns the corresponding token name.
○ fail() - resets forward ptr to lexeme_begin to try another transition
diagram.
■ What the fail() function does depends on the global error recovery
strategy of the Lexical Analyzer
○ isalpha(c) - returns true iff c is an alphabet
○ isdigit(c) - returns true iff c is a digit
○ isalnum(c) - returns true iff c is an alphabet/digit
○ isdelim(c) - returns true iff c is a delimiter
Compiler Design
Transition Diagram for Relational operators
Regular Expression -
Regular Expression -
letter [a-zA-Z]
digit [0-9]
letter ([letter|digit] | ”_”)*
Compiler Design
Implementation for Identifiers
non-letter/digit
simply means that
the transition
should be over a
non-letter, non-digit
character
Compiler Design
Implementation for Keywords
Regular Expression -
blank ““
tab “\t”
Here, the input is
newline “\n”
ws -> (blank | tab | newline)+
retracted to begin at a
non-whitespace
There is no return to
parser.
Regular Expression -
digit [0-9]+
number digit(.digit)?(E[+-]? digit)?
Compiler Design
Transition Diagram for Single and Multi-Line Comments in C
Regular Expression -
single-line comments: \/\/(.*)
multiline comments: \/\*(.*\n)*.*\*\/
Compiler Design
Keywords vs Identifiers - How does the lexer handle keywords?
There are two ways to handle keywords that look like identifiers -
1. Install the reserved words in the symbol table beforehand.
a. One field of the symbol table entry can indicate that these are reserved
words (e.g. token_name = keyword)
b. Thus, when an identifier is found
i. installID() is called to place the identifier in the symbol table.
ii. Since keywords were preloaded in the symbol table, installID()
returns the pointer to the keyword found.
c. getToken() can be used to return the token_name
Compiler Design
Keywords vs Identifiers - How does the lexer handle keywords?
int a = 20;
Tokens Count
int keyword 1
a Identifier 1
= Operator 1
20 Constant 1
; Special Symbol 1
Total 5
Compiler Design
Question 2: Find the number of tokens
Tokens Count
int main Keyword 2
() Special symbol 2
int main() { Special symbol 1
{ //Compiler Design Comment 0
//Compiler Design
int Keyword 1
int a, b;
a Identifier 1
a = 10;
b = 20; , Separator 1
Total 22
COMPILER DESIGN
Question 3: Write the tokens and associated attribute
E = M * C ** 2
Pattern Lexemes
Keyword if, else
if(a > b) Identifier a, b
{ Relop >
a = a + 1; Assignop =
}
else Arithop +
{ Number 1
b = b + 1; Punctuation ( ) { ; } { ; }
} Number 0
COMPILER DESIGN
Question 4: List all the tokens in the following code snippet
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Parsers and Error Recovery
Strategies
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
• Lexical errors include misspellings of identifiers, keywords, or operators — e.g., the use of
an identifier elipseSize instead of ellipseSize — and missing quotes around text intended as
a string.
• Syntactic errors include misplaced semicolons or extra or missing braces; that is, "{ " or "}.
" As another example, in C or Java, the appearance of a case statement without an enclosing
switch is a syntactic error (however, this situation is usually allowed by the parser and
caught later in the processing, as the compiler attempts to generate code).
• Semantic errors include type mismatches between operators and operands. An example is
a return statement in a Java method with result type void.
• Logical errors can be anything from incorrect reasoning on the part of the programmer to
the use in a C program of the assignment operator = instead of the comparison operator ==.
The program containing = may be well formed; however, it may not reflect the
programmer's intent.
Compiler Design
Error Recovery
Strategies
Error Recovery
Strategies
Compiler Design
Error Recovery in
Parsers
2. Phrase level
recovery
● This involves corrections in the
performing
program, in caselocal
it contains errors. input
● For example,
○ missing semicolon ; - insert it
○ missing closing bracket ) - insert it
● The choice of the correction is left to the compiler designer.
However, it needs to be done carefully.
○ Incorrectly inserting characters could cause the program
to perform erroneously.
Compiler Design
Error Recovery
Strategies
3. Error productions
● The idea here is to specify in the grammar known common
mistakes.
For example,
○ S → if (cond) {S} else {S}
code.
○ S → fi (cond) {S} else {S}
can be written as an error production, with action being
a more informative message about the error to the user.
Compiler Design
Error Recovery
Strategies
4. Global Correction
● Extremely hard to implement
● Takes the grammar and the input program (with errors) as
input, and passes them to the algorithm to find a
program(string) y that is closest correct program to the
original program x.
● This has minimal changes done to x.
● Disadvantages
○ The closest correct program y may not exactly be what
the programmer intended to write.
○ It slows down parsing of correct programs and
expensive.
Compiler Design
Error Recovery
Strategies
Types of
Parsers
Compiler Design
Types of Parsers - An
Overview
Parser
s
Preet Kanwal
Department of Computer Science & Engineering
Unit 2 - Types of
Parsers
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
Parser
s
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Recursive Descent Parsers
with Backtracking
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Recap - Top Down Parsers
X → oa | ea
Z → ai
1. Parser program may not accept the input even if the input
string belonged to the grammar because of the order of
productions.
2. RDP with backtracking cannot work with left recursive
grammars because it would cause the program to enter an
infinite loop.
3. Reversing semantic actions during the parsing of a string
using RDP with backtracking is an overhead.
Compiler Design
Drawbacks of RDP with backtracking
Given -
S → cAd
A → a |
ab
○ Input string - cabd
3. Overhead
● RDP with backtracking involves a series
of erroneous expansions and corresponding semantic
actions.
● Reversing semantic actions during the
parsing of a string causes substantial
overhead.
Compiler Design
Question - Working of RDP
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Recursive Descent Parsing
without Backtracking
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Recap - Top Down
Parsers
X → b
→ eS | λ
Compiler Design
Top-down Parsing without
backtracking
2. Elimination of Left Recursion
A left recursive grammar is typically of the
form A -> Aα, where α ϵ (V U T)*
[ V = variable, T = terminal]
● Left recursion is eliminated to prevent top down parsers from
entering into an infinite loop.
● Left recursion can be
○ Immediate - For example -
○ A → Aa |b
○ Indirect - Recursion occurs in two or more steps
Example
S -→ Aab
A → Sd |
λ
Compiler Design
How to eliminate Left
Recursion?
A → Sd |
λ by Aab in the second
○ Here, S is replaced
production.
S → Aab
A → Aabd |
λ
Compiler Design
Algorithm to eliminate Left
Recursion?
A → β A’ | β A’ |
… | β A’
1 2 s
A’ → α A’ | α A’ |
Compiler Design
Eliminating Left Recursion - Example
1
C → CD
D → A |
c
→ d
Eliminate left
recursion.
Compiler Design
Eliminating Left Recursion - Example
1
Given -
A → Bxy |
x
B → CD
C → A| c
Step
D 1 - Replace
→ dindirect left
recursion
A → Bxy |x
B → CD
C → Bxy | x => C → CDxy |
D | c c | x
→ d
Compiler Design
Eliminating Left Recursion - Example
1
A → Bxy |x
B → CD
C → CDxy |
D c | x
→ d
C’
C’ → D x y C’ |
Compiler Design
Eliminating Left Recursion - Example
2
Eliminate left
recursion.
Compiler Design
Eliminating Left Recursion - Example
2
Answer -
S → aAcB
A → b A’ | b c A’
A’ → b A’ | λ
Compiler Design
Eliminating Left Recursion -
Exercise
SX → → i C teSSX | λ | a
C → c
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Predictive Parsers
(LL(1))
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
F F -> id
F -> num
F -> (
● According to Rule 1, First(F) = { id, num, ( }
● Since F does not have any λ productions, we stop here.
Since First(T) = First(F),
First(T) = { id, num, ( }
Compiler Design
Follow Sets
A -> aBb
B -> c | λ
● When the input symbol is b, the non-terminal to derive
is B. There is no production of B that starts with b, so
the parser is now stuck.
● However, if the lambda production of B, B -> λ, is
applied, B vanishes and the parser parses ab correctly.
● A non-terminal should vanish if the symbol that follows
the non-terminal in the input is the same as the symbol
that follows it in the production rule/in some
Compiler Design
Follow Sets
Follow(A) }
3. If A -> αB, Follow(B) = Follow(A)
Note: Follow set should not contain λ
where α, β ∈ (V U T)*
V = Variables (non
Compiler Design
Follow Sets - Example
1
E → TE’ Follow(E) = $
Follow(T) = {+}⋃
Follow(E) Follow(E’) =
Follow(E)
E’ → + T E’/λ Follow(T) = { + } ⋃ Follow(E’)
Follow(E’) = Follow(E’)
T → F T’ Follow(F) = { * } ⋃ Follow(T')
Follow(T’) = Follow(T)
T’ → * F T’/λ Follow(F) = { * } ⋃ Follow(T’)
Follow(T’) = Follow(T')
F → (E) Follow(E) = {)}
Compiler Design
Steps to construct LL(1) Parsing
Table
id + * num ( ) $
E’
T’
F
Compiler Design
Constructing LL(1) Parsing Table -
Example 1
For each production in the entries to
grammar, add appropriate locations. the
id + * num ( ) $
E → TE’
Add the E→
E E -> TE’ E -> TE’ E -> TE’
TE’ under each
E’ element in First(T)
T First(T) = {id,
num and ( }
T’
F
Compiler Design
Constructing LL(1) Parsing Table -
Example 1
id + * num ( ) $ E’ → +
T E’ Add E’ →
E E -> TE’ E -> TE’ E -> TE’ + T E’ under +
T T -> FT’ T -> FT’ T -> FT’ Add T -> FT’ under
each element.
T’
F
Compiler Design
Constructing LL(1) Parsing Table -
Example 1
For each production in the grammar, add entries
to the appropriate locations.
id + * num ( ) $ T’ → *FT’
Add T’ →
E E -> TE’ E -> TE’ E -> TE’
*FT’ under *
Thus, this is the final LL(1) parsing table for the given
grammar.
id + * num ( ) $
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Predictive Parsers
(LL(1))
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
Given -
S → AB
A. → a|λ
B. → b|λ
No left recursion present, and grammar is left
factored.
Compiler Design
LL(1) Parsing Algorithm - Example
5
a b $
S S → AB S → AB S → AB
A A →a A →λA →λ
B B →bB →λ
Compiler Design
LL(1) Parsing Algorithm - Example
5
S → ABCDE
A. → a|λ
B. → b|λ
C. → c
D. → d|λ
E. → e|λ
Compiler Design
LL(1) Parsing Algorithm - Example
6
S → ABCDE
A. → a|λ
B. → b|λ
C. → c
D. → d|λ
E. → e|λ
Compute First and Follow Sets
- First Table Follow Table
S A B C D E S A B C D E
a a b c d e $ b c d e $
b λ λ λ λ c e $
c $
Compiler Design
LL(1) Parsing Algorithm - Example
6
a b c d e $
Bbby A → x
B→ x
2. S→ Z
Z→ aMa | bMb | aRb |
bRa M→ c
R→ c
Compiler Design
Identify whether a grammar belongs to LL(k) -
Solution
Bbby A → x
B→ x
Answer - Yes. It belongs to LL(4).
2. S→ Z
Z→ aMa | bMb | aRb |
bRa M→ c
R→ c
Answer - Yes. It belongs to LL(3).
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Predictive Parsers
(LL(1))
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
Given -
S → a |
(L)
L → L,S |
First, remove left S
recursion.
● Arrange non-terminals -
S → a | (L
L )
→ L,S | a
● Replacing productions with left recursion
| (L)
- S
→ a |
L (L)
L → a L’ |
( L ) L’
Compiler Design
LL(1) Parsing Algorithm - Example
2
L → a L’ |
( L ) L’
’
→ , S L’ |
FirstλTable Follow Table
S L L’ S L L’
a a , $ ) )
( ( λ ,
)
Compiler Design
LL(1) Parsing Algorithm - Example
2
a ( ) , $
S S->a S->(L)
L L->aL’ L->(L)L’
L’ L’->λ L’->,SL’
Compiler Design
LL(1) Parsing Algorithm - Example
2
Stack Input Buffer Action
S$ ( a , a ) $ M[S,(] = S -> (L)
(L)$ ( a , a ) $ M[(,(] = match
)$ ) $ M[),)] = match
$ $ Accept
Compiler Design
Identify whether a grammar belongs to
LL(1)
S → AaAb |
A BbBa
B → λ
→ λ
Compiler Design
Identify whether a grammar belongs to LL(1) - Example
3
S → AaAb |
A BbBa
B → λ
→ λ
Compute First and Follow Sets
-
S → AaAb |
BbBa
is of the format
S → a |
b
First(AaAb) = First(A) = { a }
First(BbBa) = First(B) = { b }
S → iCtS| iCtSeS
C | a
→ b
Compiler Design
Identify whether a grammar belongs to LL(1) - Example
2
• If we construct the parse table M for the following
grammar,
S → iCtS| iCtSeS
C | a
→ b
• M[S,i] will have 2
productions
S → iCtS
S → iCtSeS
• Therefore, it is not in LL(1).
• To modify, left factor the above grammar
-
S → iCtSX | a
X → eS |λ
C → b
Compiler Design
Identify whether a grammar belongs to LL(1) - Example
4
• Is the modified grammar in LL(1)
?
S → iCtSX | a
X → eS |λ
C → b
• Notice the production
X → eS |
λ
is of the form
A → a |λ
• First(e) = e
• Follow(X) = { e, $ }
• Thus, First(e) ∩ Follow(X) = e ≠Φ
• Therefore, the modified grammar is also not in
Compiler Design
LL(k) grammars,
k>1
As mentioned
previously,
1. S → ab | ac | ad LL(2)
2. S→aaB|aaC LL(3)
B→b
C→c
3. E→T+E|T LL(2)
T→ id | id * T | ( E )
5. S→ Z LL(3)
Z→ aMa | bMb | aRb | bRa
M→ c
R→ c
Compiler Design
Identify whether a grammar belongs to LL(k) -
Examples
2. S → aaB | a
B aC
C → b
→ c
3. E → T+E | T
T → id | id * T
| (E)
THANK
YOU
Preet Kanwal
Department of Computer Science & Engineering
[email protected]
Compiler Design
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Predictive Parsers
(LL(1))
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
a If M[X,a] == sync
a
Compiler Design
Error Recovery in LL(1) Parser - Example
1
Implement the parsing table for the following
grammar with panic mode recovery. Also, parse the
string
“)id*+id”
E → T E’
E’ → + T E’ | λ
T → F T’
T’ → * F T’ | λ
F → num | id |
(E
)
Compiler Design
Error Recovery in LL(1) Parser - Example
1
Given,
E → T E’
E’ → + T E’ | λ
T → F T’
T’ → * F T’ | λ
F → num | id |
(E Follow Table
)
First Table E E’ T T’ F
E E’ T T’ F $ $ + + *
id + id * id ) ) $ $ +
num λ num λ num ) ) $
( ( ( )
Compiler Design
Error Recovery in LL(1) Parser - Example
1
The LL(1) parsing table without panic mode
recovery-
id + * num ( ) $
Now, using the new LL(1) Parsing table M, parse the input
string
) id * + id
Initially, the stack and input buffer contain -
Preet Kanwal
Department of Computer Science & Engineering
Unit 2
Predictive Parsers
(LL(1))
Preet Kanwal
Department of Computer Science &
Engineering
Compiler Design
Lecture
Overview
L → a L’ |
( L ) L’
’
→ , S L’ |
λ
Compiler Design
Error Recovery in LL(1) Parser - Example
2
L → a L’ |
( L ) L’
’
→ , S L’ |
FirstλTable Follow Table
S L L’ S L L’
a a , $ ) )
( ( λ ,
)
Compiler Design
Error Recovery in LL(1) Parser - Example
2
a ( ) , $
S S->a S->(L)
L L->aX L->(L)L’
L’ L’->λ L’->,SL’
Compiler Design
Error Recovery in LL(1) Parser - Example 2
a ( ) , $ Follow Table
S L L’
S S->a S->(L) sync sync sync
$ ) )
L L->aX L->(L)L’ sync
,
L’ L’->λ L’->,SL’
)
Compiler Design
Error Recovery in LL(1) Parser - Example
1
Now, using the new LL(1) Parsing table M, parse the input string
a(a) Initially, the stack and input buffer contain -