Compiler Design CS8602 Part A & Part B Answers
in ENGINEERING COLLEGES
Prepared by
Sl. No. | Name of the Faculty | Designation | Affiliating College
1. | Mrs. A. P. Subapriya | AP/CSE | SCADIT
TEXT BOOKS
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, "Compilers – Principles, Techniques and Tools", 2nd Edition, Pearson Education, 2007.
REFERENCES
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, "Compilers – Principles, Techniques and Tools", 2nd Edition, Pearson Education, 2007.
2. Randy Allen, Ken Kennedy, "Optimizing Compilers for Modern Architectures: A Dependence-based Approach", Morgan Kaufmann Publishers, 2002.
3. Steven S. Muchnick, "Advanced Compiler Design and Implementation", Morgan Kaufmann Publishers – Elsevier Science, India, Indian Reprint 2003.
4. Keith D. Cooper and Linda Torczon, "Engineering a Compiler", Morgan Kaufmann Publishers – Elsevier Science, 2004.
5. Charles N. Fischer, Richard J. LeBlanc, "Crafting a Compiler with C", Pearson Education, 2008.
TABLE OF CONTENTS
Text Book
1. Alfred V. Aho, Monica S. Lam, Ravi Sethi and Jeffrey D. Ullman, "Compilers – Principles, Techniques and Tools", 2nd Edition, Pearson Education, 2007.
References
(In the table below, TB1 refers to Text Book 1 and RB2 to Reference Book 2 listed above.)
Sl. No. | Unit | Topic / Portions to be Covered | Hours Required/Planned | Cumulative Hrs | Books Referred
16 | | Recursive Descent Parser - Predictive Parser | 1 | 18 | RB2
17 | | LL(1) Parser - Shift Reduce Parser | 1 | 19 | RB2
18 | | LR Parser | 1 | 20 | RB2
19 | | LR(0) Item - Construction of SLR Parsing Table | 1 | 21 | RB2
20 | | Introduction to LALR Parser | 1 | 22 | RB2
21 | | Error Handling and Recovery in Syntax Analyzer | 1 | 23 | RB2
22 | | YACC - Design of a Syntax Analyzer for a Sample Language | 1 | 24 | TB1
23 | | Syntax Directed Definitions | 1 | 25 | RB2
24 | | Construction of Syntax Tree | 1 | 26 | RB2
25 | | Bottom-up Evaluation of S-Attribute Definitions | 1 | 27 | RB2
26 | | Design of Predictive Translator | 1 | 28 | RB2
27 | | Type Systems - Specification of a Simple Type Checker | 1 | 29 | RB2
28 | IV | Equivalence of Type Expressions - Type Conversions | 1 | 30 | RB2
29 | | Source Language Issues - Storage Organization | 1 | 31 | RB2
30 | | Storage Allocation | 1 | 32 | RB2
31 | | Parameter Passing | 1 | 33 | RB2
32 | | Symbol Tables | 1 | 34 | RB2
33 | | Dynamic Storage Allocation | 1 | 35 | RB2
34 | | Storage Allocation in FORTRAN | 1 | 36 | RB2
35 | | Principal Sources of Optimization | 1 | 37 | RB2
36 | | DAG | 1 | 38 | RB2
37 | | Optimization of Basic Blocks | 1 | 39 | RB2
38 | V | Global Data Flow Analysis | 2 | 41 | RB2
40 | | Efficient Data Flow Algorithms | 1 | 42 | RB2
41 | | Issues in Design of a Code Generator | 1 | 43 | RB2
42 | | A Simple Code Generator Algorithm | 2 | 45 | RB2
source code → Translator → target code
Types of Translator:
1. Interpreter
2. Compiler
3. Assembler
3. What is an Assembler?
Assembler:
It translates assembly level language to machine code.
assembly language → Assembler → machine code
13. What is front end and back end of a compiler? [Nov/Dec 2013]
Front end of a compiler includes,
Lexical Analysis
Syntactic Analysis
The creation of the symbol Table
Semantic Analysis
Intermediate code generation
These phases depend on the source languages and are independent of target
machine.
Back end of a compiler includes
Code Optimization
Code Generation
These phases depend on the target machine and are independent of the source language.
14. List the various compiler construction tools. [Nov/Dec 2016]
a. Parser Generator
b. Scanner Generator
c. Syntax-directed translation engines
d. Automatic code Generator
e. Data flow engines.
15. What is a language processing system? [May/June 2016]
A language processing system is the set of translators used together to run a program: the preprocessor, the compiler, the assembler, and the linker/loader, which together convert a source program into executable machine code.
16. Write a regular Definition to represent date in the following format :JAN-
5th 2014.(May2015)
A regular definition for dates such as JAN-5th 2014:
month → JAN | FEB | MAR | APR | MAY | JUN | JUL | AUG | SEP | OCT | NOV | DEC
digit → 0 | 1 | … | 9
day → digit digit? (st | nd | rd | th)
year → digit digit digit digit
date → month - day year
PART-B
1. a.LANGUAGE PROCESSOR
Define the following terms:Compiler,Translator,Interpreter and
differentiate between them.(6) [May/June 2014]
Translator:
It is a program that translates one language to another.
Types of Translator:
1. Interpreter
2. Compiler
3. Assembler
1.Interpreter:
It is one of the translators that translate high level language to low
level language.
high level language → Interpreter → low level language
assembly language → Assembler → machine code
1) Structure editor:
Takes as input a sequence of commands to build a source program.
The structure editor not only performs the text-creation and
modification functions of an ordinary text editor, but it also analyzes
the program text, putting an appropriate hierarchical structure on the
source program.
For example, it can supply keywords automatically, such as while …. do and begin ….. end.
2) Pretty printers :
A pretty printer analyzes a program and prints it in such a way that the
structure of the program becomes clearly visible.
For example, comments may appear in a special font.
3) Static checkers :
A static checker reads a program, analyzes it, and attempts to discover
potential bugs without running the program.
For example, a static checker may detect that parts of the source
program can never be executed.
4) Interpreters :
Translates from high level language ( BASIC, FORTRAN, etc..) into
machine language.
An interpreter might build a syntax tree and then carry out the
operations at the nodes as it walks the tree.
Interpreters are frequently used to execute command languages, since each operator executed in a command language is usually an invocation of a complex routine such as an editor or compiler.
PHASES OF COMPILER
A Compiler operates in phases, each of which transforms the source
program from one representation into another. The following are the phases
of the compiler:
Main phases:
1) Lexical analysis
2) Syntax analysis
3) Semantic analysis
4) Intermediate code generation
5) Code optimization
6) Code generation
Sub-Phases:
1) Symbol table management
2) Error handling
Lexical Analysis:
It is the first phase of the compiler. It gets input from the source
program and produces tokens as output.
It reads the characters one by one, starting from left to right and forms
the tokens.
Token : It represents a logically cohesive sequence of characters such
as keywords, operators, identifiers, special symbols etc.
Example: a + b = 20
Here, a, b, +, =, 20 are all separate tokens.
Group of characters forming a token is called the Lexeme.
The lexical analyser not only generates a token but also enters the lexeme
into the symbol table if it is not already there.
Syntax Analysis:
It is the second phase of the compiler. It is also known as parser.
It gets the token stream as input from the lexical analyser and checks whether the token stream conforms to the grammar of the source language.
Semantic Analysis:
It is the third phase of the compiler.
It gets input from the syntax analysis as parse tree and checks
whether the given syntax is correct or not.
It performs type checking and, where required, type conversion (for example, converting an integer operand to real).
Intermediate Code Generation:
It is the fourth phase of the compiler.
It gets input from the semantic analysis and converts the input into
output as intermediate code such as three-address code.
Each three address instruction has at most one operator in addition to
the assignment
The compiler must generate a temporary name to hold the value
computed by each instruction.
The three-address code consists of a sequence of instructions, each of which has at most three operands; some three-address instructions have fewer than three.
Example: t1=t2+t3
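For instance, an assignment such as a := b + c * d (an illustrative statement, not from the notes) would be translated into:
t1 := c * d
t2 := b + t1
a := t2
where t1 and t2 are compiler-generated temporaries and each instruction carries at most one operator.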
Code Optimization:
It is the fifth phase of the compiler.
It gets the intermediate code as input and produces optimized
intermediate code as output.
This phase reduces the redundant code and attempts to improve the
intermediate code so that faster-running machine code will result.
During the code optimization, the result of the program is not affected.
To improve the code generation, the optimization involves
Deduction and removal of dead code (unreachable code).
Calculation of constants in expressions and terms.
Collapsing of repeated expressions into a single temporary.
Loop unrolling.
Moving code outside the loop.
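A small before/after sketch in C may make these transformations concrete (an illustration; the function and variable names are assumed):

void before(int a, int b, int n, int y[]) {
    int x = 4 * 2;              /* constant expression, can be folded */
    for (int i = 0; i < n; i++)
        y[i] = (a + b) + i;     /* a + b is loop-invariant            */
    (void)x;
}

void after(int a, int b, int n, int y[]) {
    int x = 8;                  /* constant folded at compile time    */
    int t = a + b;              /* invariant moved outside the loop   */
    for (int i = 0; i < n; i++)
        y[i] = t + i;
    (void)x;
}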
Symbol table entries produced for pos = initial + rate * 60:
pos → id1
initial → id2
rate → id3
4. COUSINS OF COMPILER
Discuss the cousins of compiler.(4) [Nov/Dec 2013]
COUSINS OF COMPILER
1. Preprocessor
2. Assembler
3. Loader and Link-editor
Preprocessor
A preprocessor is a program that processes its input data to produce output that is
used as input to another program. The output is said to be a preprocessed form of
the input data, which is often used by some subsequent programs like compilers.
They may perform the following functions :
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence should be
mapped to an output sequence according to a defined procedure. The mapping
process that instantiates a macro into a specific output sequence is known as macro
expansion.
2. File Inclusion:
Preprocessor includes header files into the program text. When the preprocessor
finds an #include directive it replaces it by the entire content of the specified file.
3. Rational Preprocessors:
These processors change older languages with more modern flow-of-control and
data-structuring facilities.
4. Language extension :
These processors attempt to add capabilities to the language by what amounts to
built-in macros. For example, the language Equel is a database query language
embedded in C.
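A small C fragment may make the first two functions concrete (an illustrative sketch, not from the notes):

#include <stdio.h>              /* file inclusion: the preprocessor pastes in stdio.h */

#define PI 3.14159              /* object-like macro                         */
#define SQUARE(x) ((x) * (x))   /* function-like macro; each use is expanded */

int main(void) {
    /* After macro expansion this line reads:
       printf("%f\n", 3.14159 * ((2 + 1) * (2 + 1))); */
    printf("%f\n", PI * SQUARE(2 + 1));
    return 0;
}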
ASSEMBLER
Assembler creates object code by translating assembly instruction
mnemonics into machine code. There are two types of assemblers:
One-pass assemblers go through the source code once and assume
that all symbols will be defined before any instruction that references them.
Two-pass assemblers create a table with all symbols and their values in
the first pass, and then use the table in a second pass to generate code.
Two Pass Assembly :
Consider the assembly code:
MOV a, R1
ADD #2, R1
MOV R1, b
Thus it computes b = a + 2.
The simplest form of assembler makes two passes over the input. A pass consists of reading an input file once.
In the first pass all the identifiers that denote storage locations are found and
stored in a symbol table.
For example:
IDENTIFIER | ADDRESS
a          | 0
b          | 4
In the second pass the assembler scans the input again. This time it translates each
operation code into the sequence of bits representing that operation in machine
language and it translates each identifier representing a location into the address
given for that identifier in the symbol table.
The output of the second pass is relocatable machine code: it can be loaded starting at any location L in memory.
The following is machine code into which the assembly instructions might be
translated
0001 01 00 00000000*
0011 01 10 00000010
0010 01 00 00000100*
The first four bits are the instruction code
0001 - Load
0010 - Store 0011 - Add
By load and store we mean moves from memory into a register and vice versa.
The next two bits designate a register and 01 refers to register 1
The two bits after that represent a "tag" indicating the addressing mode:
00 – ordinary addressing mode
10 – immediate addressing mode
The last eight bits refer to a memory address.
Linker and Loader
A linker or link editor is a program that takes one or more objects
generated by a compiler and combines them into a single executable
program.
Three tasks of the linker are :
1. Searches the program to find library routines used by program, e.g.
printf(), math routines.
2. Determines the memory locations that code from each module will
occupy and relocates its instructions by adjusting absolute references
3. Resolves references among files.
A loader is the part of an operating system that is responsible for loading
programs in memory, one of the essential stages in the process of starting a
program.
Suppose that the address space containing the data is to be loaded starting at location L. Then L must be added to the addresses in the instructions.
Thus if L = 00001111 i.e 15 then a and b would be at locations 15 and 19
respectively and the instruction would appear as
0001 01 00 00001111
0011 01 10 00000010
0010 01 00 00010011
COMPILER CONSTRUCTION TOOLS
The commonly used compiler construction tools are:
1. Parser Generators:
These produce syntax analyzers, normally from input that is based on
a context-free grammar.
Syntax analysis consumes a large fraction of the running time of a compiler.
Example-YACC (Yet Another Compiler-Compiler).
2. Scanner Generator:
These generate lexical analyzers, normally from a specification based
on regular expressions.
The basic organization of the resulting lexical analyzers is based on finite automata.
3. Syntax-Directed Translation Engines:
These produce routines that walk the parse tree and as a result
generate intermediate code.
Each translation is defined in terms of translations at its neighbor
nodes in the tree.
4. Automatic Code Generators:
It takes a collection of rules to translate intermediate language into
machine language. The rules must include sufficient details to handle
different possible access methods for data.
5. Data-Flow Engines:
It does code optimization using data-flow analysis, that is, the gathering
of information about how values are transmitted from one part of a
program to each other part.
9. What is Regular Expression? Or List the rules that form the BASIS. [Nov/Dec
2016]
1. ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole
member is the empty string.
2. If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with 'a' in its one position.
3. Suppose r and s are regular expressions denoting the languages L(r) and
L(s). Then,
a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
10. Define Regular Set and Non regular set?
Regular Set:
A language that can be defined by a regular expression is called a
regular set. If two regular expressions r and s denote the same regular set, we
say they are equivalent and write r = s.
Non Regular Set:
A language that cannot be described by regular expression is called a non
regular set.
11. Construct Regular expression for the language
PART-B
1. NEED AND ROLE OF LEXICAL ANALYZER
What are the issues in the design of a lexical Analyzer? (4) [May/June 2014, May/June 2012] [May/June 2016]
Discuss the role of Lexical Analyzer in detail.(8).[Apr / May 2011] [Nov/Dec
2016].
Describe the error recovery schemes in lexical phase of a compiler.[May 15]
source program → Lexical Analyser → token → Parser
(the parser issues "get next token" requests; both the Lexical Analyser and the Parser consult the Symbol Table)
Upon receiving a ―get next token‖ command from the parser, the lexical
analyzer reads input characters until it can identify the next token.
Examples of tokens and lexemes:
=                      assignment operator
<, <=, >, >=, <>, =    relational operators
;                      end of statement
Pattern:
The set of strings described by a rule is called a pattern.
In the case of a keyword as a token, the pattern is just the sequence of
characters that form the keyword. For identifiers and some other tokens, the pattern
is a more complex structure that is matched by many strings.
Attributes for Tokens
Some tokens have attributes that can be passed back to the parser. The
lexical analyzer collects information about tokens into their associated attributes.
The attributes influence the translation of tokens.
i) Constant : value of the constant
ii) Identifiers: pointer to the corresponding symbol table entry.
Error Recovery Strategies In Lexical Analysis:
The following are the error-recovery actions in
lexical analysis:
1)Deleting an extraneous character.
2) Inserting a missing character.
3)Replacing an incorrect character by a correct character
4)Transforming two adjacent characters.
5) Panic mode recovery: Deletion of successive characters from the
token until error is resolved.
2. INPUT BUFFERING
Describe input buffering techniques in detail.[Nov /Dec 2013]
Explain briefly about input buffering in reading the source program for
finding the tokens.[Nov/Dec 2011].
Differentiate Lexeme Pattern and Token. (6)[May/Jun 2014]
BUFFER PAIRS
A buffer is divided into two N-character halves, as shown below:

E : = : M : * : C : * : * : 2 : eof

(the pointer lexeme_beginning marks the start of the current lexeme; the pointer forward scans ahead)
Each buffer is of the same size N, and N is usually the number of characters
on one disk block. E.g., 1024 or 4096 bytes.
Using one system read command we can read N characters into a buffer.
If fewer than N characters remain in the input file, then a special character,
represented by eof, marks the end of the source file.
Two pointers to the input are maintained:
1. Pointer lexeme_beginning, marks the beginning of the
current lexeme, whose extent we are attempting to
determine.
2. Pointer forward scans ahead until a pattern match is found.
Once the next lexeme is determined, forward is set to the
character at its right end.
The string of characters between the two pointers is the current lexeme.
After the lexeme is recorded as an attribute value of a token returned
to the parser, lexeme _beginning is set to the character immediately
after the lexeme just found.
Sentinels
For each character read, we make two tests: one for the end of the buffer,
and one to determine what character is read. We can combine the buffer-end
test with the test for the current character if we extend each buffer to hold a
sentinel character at the end.
The sentinel is a special character that cannot be part of the source program,
and a natural choice is the character eof.
The sentinel arrangement is as shown below:

E : = : M : * : eof : C : * : * : 2 : eof : … : eof

(lexeme_beginning and forward as before; an eof sentinel ends each buffer half, and one marks the true end of input)
Note that eof retains its use as a marker for the end of the entire input.
Any eof that appears other than at the end of a buffer means that the
input is at an end.
Code to advance forward pointer:

forward := forward + 1;
if forward↑ = eof then begin
    if forward at end of first half then begin
        reload second half;
        forward := forward + 1
    end
    else if forward at end of second half then begin
        reload first half;
        move forward to beginning of first half
    end
    else  /* eof within a buffer signifying end of input */
        terminate lexical analysis
end
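The same scheme can be written in C. The sketch below makes some stated assumptions: '\0' serves as the sentinel (i.e. the source contains no NUL bytes), src is an already-opened file, and no lexeme spans more than one reload:

#include <stdio.h>

#define N 4096          /* characters per buffer half, e.g. one disk block */
#define SENTINEL '\0'   /* stands in for the special eof character         */

static char buf[2 * (N + 1)];   /* two halves, each followed by a sentinel */
static char *forward;           /* scans ahead to find the end of a lexeme */
static FILE *src;

/* Fill one half and drop a sentinel right after the last byte read. */
static void reload(char *half) {
    size_t got = fread(half, 1, N, src);
    half[got] = SENTINEL;       /* half boundary, or the true end of input */
}

/* Advance forward one character; reload a half when its sentinel is hit.
   Returns SENTINEL only at the true end of the input. */
static int next_char(void) {
    char c = *forward++;
    if (c != SENTINEL)
        return (unsigned char)c;
    if (forward == buf + N + 1) {              /* sentinel ends first half  */
        reload(buf + N + 1);
        return next_char();
    } else if (forward == buf + 2 * (N + 1)) { /* sentinel ends second half */
        reload(buf);
        forward = buf;
        return next_char();
    }
    return SENTINEL;                           /* eof inside a half         */
}

int main(void) {
    src = stdin;                /* demo: count the characters on stdin */
    reload(buf);                /* prime the first half                */
    forward = buf;
    int count = 0;
    while (next_char() != SENTINEL)
        count++;
    printf("%d characters\n", count);
    return 0;
}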
SPECIFICATION OF TOKENS
There are 3 specifications of tokens:
1) Strings
2) Language
3) Regular expression
Strings and Languages
An alphabet or character class is a finite set of symbols.
A string over an alphabet is a finite sequence of symbols drawn from that
alphabet.
A language is any countable set of strings over some fixed alphabet.
In language theory, the terms "sentence" and "word" are often used as
synonyms for "string." The length of a string s, usually written |s|, is the number of
occurrences of symbols in s. For example, banana is a string of length six. The
empty string, denoted ε, is the string of length zero.
Operations on strings
The following string-related terms are commonly used:
1. A prefix of string s is any string obtained by removing zero or more symbols from the end of s.
For example, ban is a prefix of banana.
2. A suffix of string s is any string obtained by removing zero or more
symbols from the beginning of s.
For example, nana is a suffix of banana.
3. A substring of s is obtained by deleting any prefix and any
suffix from s. For example, nan is a substring of banana.
4. The proper prefixes, suffixes, and substrings of a string s are those
prefixes, suffixes, and substrings, respectively of s that are not ε or not equal to s
itself.
5. A subsequence of s is any string formed by deleting zero or more not
necessarily consecutive positions of s.
For example, baan is a subsequence of banana.
Operations on languages:
The following are the operations that can be applied to languages:
1. Union
2. Concatenation
3. Kleene closure
4. Positive closure
The following example shows the operations on languages:
Let L = {0, 1} and S = {a, b, c}
1. Union: L ∪ S = {0, 1, a, b, c}
2. Concatenation: L.S = {0a, 1a, 0b, 1b, 0c, 1c}
3. Kleene closure: L* = {ε, 0, 1, 00, ….}
4. Positive closure: L+ = {0, 1, 00, ….}
Regular Expressions
Each regular expression r denotes a language L(r).
Here are the rules that define the regular expressions over some alphabet Σ
and the languages that those expressions denote:
1.ε is a regular expression, and L(ε) is { ε }, that is, the language whose sole
member is the empty string.
2. If 'a' is a symbol in Σ, then 'a' is a regular expression, and L(a) = {a}, that is, the language with one string, of length one, with 'a' in its one position.
3.Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
a) (r)|(s) is a regular expression denoting the language L(r) U L(s).
b) (r)(s) is a regular expression denoting the language L(r)L(s).
c) (r)* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
4.The unary operator * has highest precedence and is left associative.
5.Concatenation has second highest precedence and is left associative.
6.| has lowest precedence and is left associative.
Regular set
A language that can be defined by a regular expression is called a regular set.
If two regular expressions r and s denote the same regular set, we say they are
equivalent and write r = s.
There are a number of algebraic laws for regular expressions that can
be used to manipulate into equivalent forms.
For instance, r|s = s|r is commutative; r|(s|t)=(r|s)|t is associative.
Regular Definitions
Giving names to regular expressions is referred to as a Regular definition.
If Σ is an alphabet of basic symbols, then a regular definition is a sequence of
definitions of the form
d1 → r1
d2 → r2
………
dn → rn
where:
1. Each di is a distinct name.
2. Each ri is a regular expression over the alphabet Σ ∪ {d1, d2, …, di−1}.
Example: identifiers are the set of strings of letters and digits beginning with a letter. A regular definition for this set:
letter → A | B | …. | Z | a | b | …. | z
digit → 0 | 1 | …. | 9
id → letter ( letter | digit ) *
Shorthands
Certain constructs occur so frequently in regular expressions that it is convenient to introduce notational shorthands for them: one or more instances (r+), zero or one instance (r?), and character classes such as [a-z].
Non-regular Set
A language which cannot be described by any regular expression is a non-
regular set. Example: The set of all strings of balanced parentheses and repeating
strings cannot be described by a regular expression. This set can be specified by a
context-free grammar.
RECOGNITION OF TOKENS
Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id | num
where the terminals if , then, else, relop, id and num generate sets of
strings given by the following regular definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter ( letter | digit )*
num → digit+ (. digit+)? (E (+ | -)? digit+)?
For this language fragment the lexical analyzer will recognize the keywords if,
then, else, as well as the lexemes denoted by relop, id, and num. To simplify
matters, we assume keywords are reserved; that is, they cannot be used as
identifiers.
Transition diagrams
As an intermediate step in the construction of a lexical Analyzer we first
produce a stylized flowchart called a transition diagram.
These transition diagrams are deterministic.
One state is labeled as start state where control resides when we begin to
recognize a token.
On entering a state we read the next input character.
If there is an edge from the current state whose label matches this character, we then go to the state pointed to by the edge; otherwise, we indicate failure.
Since keywords are sequences of letters, they are exceptions to the rule that a sequence of letters and digits starting with a letter is an identifier.
When the accepting state is reached we execute some code to determine if
the lexeme leading to the accepting state is a keyword or an identifier.
The return statement next to the accepting state uses
gettoken() to obtain the token, and
install_id() to obtain the attribute value to be returned.
The symbol table is examined, and if the lexeme is found there marked as a keyword, install_id() returns 0.
If the lexeme is found and is a variable install-id() returns a pointer to the
symbol table entry.
If the lexeme is not found in the symbol table it is installed as a variable and
a pointer to the newly created entry is returned.
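The keyword-versus-identifier decision at the accepting state can be sketched in C as follows (an illustration: input handling is simplified to string scanning, and all names here are assumed):

#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token codes; a real lexer would share these with the parser. */
enum { TOK_ID, TOK_IF, TOK_THEN, TOK_ELSE };

static const char *keywords[] = { "if", "then", "else" };

/* Recognize a letter(letter|digit)* lexeme at the start of s; at the
   accepting state, decide keyword vs. identifier (the install_id() step). */
static int gettoken(const char *s, int *len) {
    int i = 0;
    if (!isalpha((unsigned char)s[0])) return -1;   /* not this diagram     */
    while (isalnum((unsigned char)s[i])) i++;       /* loop on letter|digit */
    *len = i;                                       /* s[i] is "retracted"  */
    for (int k = 0; k < 3; k++)
        if (strlen(keywords[k]) == (size_t)i && strncmp(s, keywords[k], (size_t)i) == 0)
            return TOK_IF + k;                      /* found in keyword table  */
    return TOK_ID;                                  /* otherwise an identifier */
}

int main(void) {
    int n;
    printf("%d\n", gettoken("if", &n));     /* prints 1 (TOK_IF) */
    printf("%d\n", gettoken("count1", &n)); /* prints 0 (TOK_ID) */
    return 0;
}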
lex.l → Lex compiler → lex.yy.c
(the generated lex.yy.c, compiled with a C compiler, yields a lexical analyzer that turns an input stream into a sequence of tokens)
Lex Specification
A Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }
Definitions include declarations of variables, constants, and regular
definitions
Rules are
statements of the form
p1 {action1}
p2 {action2}
…
pn {actionn}
where pi is regular expression and actioni describes what action the
lexical analyzer should take when pattern pi matches a lexeme. Actions
are written in C code.
User subroutines are auxiliary procedures needed by the actions.
These can be compiled separately and loaded with the lexical analyzer.
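For instance, a complete Lex specification in this three-part layout might look like the sketch below (the token printing is illustrative; a real analyzer would return token codes to the parser):

%{
#include <stdio.h>      /* C declarations copied verbatim into lex.yy.c */
%}
digit   [0-9]
id      [a-zA-Z][a-zA-Z0-9]*
%%
{digit}+    { printf("NUM(%s)\n", yytext); }
{id}        { printf("ID(%s)\n", yytext); }
[ \t\n]     { /* skip whitespace */ }
%%
int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }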
DTrans[E, a] = F
ε-closure(move(E, b)) = ε-closure(move({1, 2, 4, 5, 6, 7, 10, 11, 12, 14, 17}, b))
= ε-closure({5, 15}) = {1, 2, 4, 5, 6, 7, 11, 12, 14, 15, 16, 17} = G
DTrans[E, b] = G
ε-closure(move(F, a)) = ε-closure({3, 8, 13}) = F
DTrans[F, a] = F
ε-closure(move(F, b)) = ε-closure({5, 9, 15})
= {1, 2, 4, 5, 6, 7, 9, 11, 12, 14, 15, 16, 17} = H
DTrans[F, b] = H
ε-closure(move(G, a)) = ε-closure({3, 8, 13}) = F
DTrans[G, a] = F
ε-closure(move(G, b)) = ε-closure({5, 15}) = G
DTrans[G, b] = G
ε-closure(move(H, a)) = ε-closure({3, 8, 13}) = F
DTrans[H, a] = F
ε-closure(move(H, b)) = ε-closure({5, 10, 15})
= {1, 2, 4, 5, 6, 7, 10, 11, 12, 14, 15, 16, 17} = I
DTrans[H, b] = I
ε-closure(move(I, a)) = ε-closure({3, 8, 13}) = F
DTrans[I, a] = F
ε-closure(move(I, b)) = ε-closure({5, 15}) = G
DTrans[I, b] = G
Input: A DFA M with set of states S, input alphabet Σ, transitions defined for all states and inputs, initial state s0, and set of final states F.
Output: A DFA M' accepting the same language as M and having as few states as possible.
One important result on finite automata, both theoretically and practically, is that for any regular language there is a unique DFA having the smallest number of states that accepts it. Let M = (Q, Σ, δ, q0, A) be a DFA that accepts a language L. Then the following algorithm produces the DFA, denote it by M1, that has the smallest number of states among the DFAs that accept L.

Π := { A, Q − A };          /* initial partition: accepting vs. non-accepting states */
Π_new := new_partition(Π);
while ( Π_new ≠ Π ) do begin
    Π := Π_new;
    Π_new := new_partition(Π)
end;
Π_final := Π;

function new_partition(Π)
    for each set S of Π do
        partition S into subsets such that two states p and q of S are in the same subset
        of S if and only if, for each input symbol, p and q make a transition to (states of)
        the same set of Π.
        The subsets thus formed are sets of the output partition in place of S.
        If S is not partitioned in this process, S remains in the output partition.
end
Minimum DFA M1 is constructed from Π_final as follows:
Select one state in each set of the partition Π_final as the representative for the set. These representatives are the states of the minimum DFA M1.
Let p and q be representatives, i.e. states of the minimum DFA M1. Let us also denote by p and q the sets of states of the original DFA M represented by p and q, respectively. Let s be a state in p and t a state in q. If a transition from s to t on symbol a exists in M, then the minimum DFA M1 has a transition from p to q on symbol a.
The start state of M1 is the representative which contains the start state of M.
The accepting states of M1 are the representatives that are in A.
Note that the sets of Π_final are either a subset of A or disjoint from A.
Remove from M1 the dead states and the states not reachable from the start state, if
there are any. Any transitions to a dead state become undefined.
A state is a dead state if it is not an accepting state and has no out-going transitions
except to itself.
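As a worked illustration (using the familiar DFA for (a|b)*abb as an assumed example; it does not appear in these notes): take states A, B, C, D, E with E accepting, and transitions
on a: A→B, B→B, C→B, D→B, E→B;  on b: A→C, B→D, C→C, D→E, E→C.
The initial partition is {A, B, C, D} {E}. On input b, D moves to E (the second set) while A, B, C stay in the first, so D is split off: {A, B, C} {D} {E}. Next, on input b, B moves to D, so B is split off: {A, C} {B} {D} {E}. A and C now agree on both inputs (both go to B on a and to C on b), so the partition is final, and the minimum DFA has the four states {A, C}, {B}, {D}, {E}, with {A, C} containing the start state.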
UNIT-III
SYNTAX ANALYSIS
Need and Role of the Parser-Context free Grammars-Top Down Parsing-General
Strategies-Recursive Descent Parser-Predictive Parser-LL(1) Parser-Shift Reduce
Parser-LR Parser-LR(0) Item-Construction of SLR Parsing Table-Introduction to
LALR Parser-Error Handling and Recovery in Syntax Analyzer.-YACC-Design of a
syntax Analyzer for a sample Language.
PART-A
1.What are the three general types of parsers? Differentiate them.
1. Universal parsing methods
2. Top down parsing
3. Bottom up parsing
Universal parsing methods can parse any grammar, but they are too inefficient to use in production compilers.
Top down parsers build parse trees from the top (root) to the
bottom (leaves).
Bottom up parser build parse trees from the leaves and work up
to the root.
Top down and bottom up parsers can parse only subclasses of grammars, such as LL grammars and LR grammars. These methods are efficient.
2.What are the goals of error handler?
It should report the presence of errors clearly and accurately
It should recover from each error quickly enough to be able to detect
subsequent errors
It should not significantly slow down the processing of correct programs
The output of the parser is a parse tree for the stream of tokens produced by the lexical analyzer.
4. What are the error recovery strategies in Parsing?
Panic mode
Phrase level
Error productions
Global corrections.
5. What is meant by ambiguous grammar? [May/June 2012,2016] [Nov / Dec
2016]
A grammar that produces more than one parse tree, or more than one left most
derivation or more than one right most derivation for some sentence is said to be
ambiguous.
Example:
Grammar G : E → E+E | E*E | ( E ) | - E | id is ambiguous.
The sentence id+id*id has the following two distinct leftmost derivations:
Leftmost derivation 1: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Leftmost derivation 2: E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
The two corresponding parse trees are: one grouping the sentence as id + (id * id), the other as (id + id) * id.
6. What is left recursive grammar? How will you eliminate the left recursive
grammar and eliminate left recursion for the following grammar? [Nov/Dec
2013]
E → E + T | T    T → T * F | F    F → (E) | id
Left Recursive Grammar:
A Grammar is left recursive if it has a non terminal A such that there is
a derivation
A ⇒+ Aα for some string α
Rule to eliminate Left recursive Grammar:
Left recursive pair of productions
A → Aα | β
would be replaced by the non-left-recursive productions
A → βA'
A' → αA' | ε
without changing the set of strings derivable from A.
Non left recursive grammar:
E → TE'    E' → +TE' | ε
T → FT'    T' → *FT' | ε
F → (E) | id
FIRST(α):
If α is any string of grammar symbols, let FIRST(α) be the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in FIRST(α).
FOLLOW(A):
FOLLOW(A), for non-terminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
Rules to compute FIRST(X):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1Y2…Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi−1).
Rules to compute FOLLOW(A):
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
10. Define Handle and handle pruning? [Apr/May 2011, Nov/Dec 2011]
[Nov/Dec 2016].
Handle :
A handle of a string is a sub string that matches the right side of a production and
whose reduction to the non terminal on the left side of the production represents one
step along the reverse of a right most derivation.
Handle Pruning:
A rightmost derivation in reverse can be obtained by "handle pruning". If w is a sentence of the grammar, then w = γn, where γn is the nth right-sentential form of some rightmost derivation
S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn−1 ⇒ γn = w.
To reconstruct this derivation in reverse order, we locate the handle βn in γn and replace βn by the left side of some production An → βn to obtain the (n−1)st right-sentential form γn−1.
13. Define LR(0) item or item and define the two classes of item.
LR(0) Item:
An LR(0) item (item for short) of a grammar G is a production of G with a dot at some position of the right side. Thus, production A → XYZ yields four items:
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Two classes of items:
All the sets of items can be divided into two classes:
1. Kernel items: the initial item S' → . S and all items whose dots are not at the left end.
2. Nonkernel items: items which have their dots at the left end.
14. Define closure operation and Goto operation?
Closure:
If I is a set of items for grammar G, then closure(I) is the set of items constructed from I by two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I if it is not already there. This rule is applied until no more items can be added to closure(I).
Goto:
goto(I, X) is defined to be the closure of the set of all items [A → αX . β] such that [A → α . Xβ] is in I, where I is a set of items and X is a grammar symbol.
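For example (using the standard augmented expression grammar E' → E, E → E+T | T, T → T*F | F, F → (E) | id as an assumed example), closure({E' → . E}) contains
E' → . E, E → . E+T, E → . T, T → . T*F, T → . F, F → . (E), F → . id,
and goto of this set on E is the closure of {E' → E . , E → E . +T}, which is just those two items.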
15. Why SLR and LALR are more economical than Canonical LR Parser.
For a comparison of parser size, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred states for a language like Pascal. The canonical LR table would typically have several thousand states for the same-size language. Thus it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables.
PART-B
Discuss the Role of the Parser in detail. [ Apr/May 2007, Apr/May 2015]
Explain the error recovery strategies in syntax analysis.(8)[Apr/May
2011]
The output of the parser is a parse tree for the stream of tokens produced by the lexical analyzer.
The Tasks of Parser
Collecting information about various tokens into the symbol table.
Performing type checking.
Semantic Analysis.
Issues :
Parser cannot detect errors such as:
1. Variable re-declaration
2. Variable initialization before use.
3. Data type mismatch for an operation.
The above issues are handled by Semantic Analysis phase.
Parsing method in which the construction of parse tree starts at the leaves
and proceeds towards the root is called as Bottom up Parsing.
Bottom Up Parsing method finds the right most derivation in reverse for a
string of tokens.
d. Recursive Descent Parsing:
The general form of top down parsing called recursive descent parsing may
involve backtracking that is making repeated scans of the input. It can be viewed as
an attempt to construct a parse tree for the input string starting from the root and
creating the nodes of the parse tree in preorder.
e. Predictive Parsing:
A special form of recursive descent parsing in which the lookahead symbol unambiguously determines the procedure selected for each non-terminal, so that no backtracking is required. It consists of a stack, an input buffer and a finite control, and tries to find the leftmost derivation.
f. Shift Reduce Parsing:
A general style of bottom-up syntax analysis which attempts to construct a parse tree for an input string beginning at the leaves and working up towards the root.
g. LR(k) Parsing :
The ―L‖ is for left to right scanning of the input ,the ―R‖ for constructing a right
most derivation in reverse and the k for the number of input symbols of look ahead
that are used in making parsing decisions.
h. LL(1) Grammar:
A grammar whose predictive parsing table has no multiply defined entries is said to be an LL(1) grammar.
L - Left most Derivation.
L - Input scanned from left to right.
1- One input symbol used as a look ahead symbol.
CONTEXT-FREE GRAMMARS
A Context-Free Grammar is a quadruple that consists of terminals, non-terminals,
start symbol and productions.
Terminals : These are the basic symbols from which strings are formed.
Non-Terminals : These are the syntactic variables that denote a set of strings.
These help to define the language generated by the grammar.
Start Symbol : One non-terminal in the grammar is denoted as the ―Start-symbol‖
and the set of strings it denotes is the language defined by the grammar.
For the grammar E → E+E | E*E | (E) | -E | id, the sentence -(id+id) has the following leftmost and rightmost derivations:

Leftmost:  E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Rightmost: E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)

Strings that appear in a leftmost derivation are called left sentential forms.
Strings that appear in a rightmost derivation are called right sentential forms.
Sentential forms:
Given a grammar G with start symbol S, if S ⇒* α, where α may contain non-terminals or terminals, then α is called a sentential form of G.
Yield or frontier of tree:
Each interior node of a parse tree is a non-terminal. The children of a node can be terminals or non-terminals of the sentential forms that are read from left to right. The sentential form in the parse tree is called the yield or frontier of the tree. The yield is the sequence of leaf nodes read from left to right.
Ambiguity:
A grammar that produces more than one parse for some sentence is said to be
ambiguous grammar.
Example : Given grammar G : E → E+E | E*E | ( E ) | - E | id
The sentence id+id*id has the following two distinct leftmost derivations:
Leftmost derivation 1: E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
Leftmost derivation 2: E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
WRITING A GRAMMAR
There are four categories in writing a grammar :
1. Regular Expression Vs Context Free Grammar
2. Eliminating ambiguous grammar.
3. Eliminating left-recursion
4. Left-factoring.
Each parsing method can handle grammars only of a certain form; hence, the initial grammar may have to be rewritten to make it parsable.
Reasons for using regular expressions to define the lexical syntax of a
language.
The lexical rules of a language are simple and RE is used to describe them.
Regular expressions provide a more concise and easier to understand
notation for tokens than grammars.
Efficient lexical analyzers can be constructed automatically from RE than
from grammars.
Separating the syntactic structure of a language into lexical and nonlexical
parts provides a convenient way of modularizing the front end into two
manageable-sized components
Regular Expressions vs. Context-Free Grammars:

REGULAR EXPRESSION | CONTEXT-FREE GRAMMAR
It is used to describe the tokens of programming languages. | It consists of a quadruple where S → start symbol, P → productions, T → terminals, V → variables or non-terminals.
It is used to check whether the given input is valid or not using a transition diagram. | It is used to check whether the given input is valid or not using derivation.
The transition diagram has a set of states and edges. | The context-free grammar has a set of productions.
It has no start symbol. | It has a start symbol.
It is useful for describing the structure of lexical constructs such as identifiers, constants, keywords, and so forth. | It is useful in describing nested structures such as balanced parentheses, matching begin-end's, and so on.
Eliminating ambiguity:
Ambiguity of the grammar that produces more than one parse tree for leftmost or
rightmost derivation can be eliminated by re-writing the grammar.
To eliminate ambiguity the general rule is
―Match each else with the closest previous unmatched then.‖
The idea is that a statement appearing between a then and an else must be
matched.
i.e. It must not end with an unmatched then followed by any statement.
To eliminate ambiguity, the following grammar may be used:
stmt → matched_stmt | unmatched_stmt
matched_stmt → if expr then matched_stmt else matched_stmt | other
unmatched_stmt → if expr then stmt | if expr then matched_stmt else
unmatched_stmt
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt |
other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has
the following two parse trees for leftmost derivation:
S → cAd
A → ab |a
and the input string w=cad.
The parse tree can be constructed using the following top-down approach :
Step1:
Initially create a tree with single node labeled S. An input pointer points to ‗c‘, the
first symbol of w. Expand the tree with the production of S.
Step2:
The leftmost leaf ‗c‘ matches the first symbol of w, so advance the input pointer to
the second symbol of w ‗a‘ and consider the next leaf ‗A‘. Expand A using the first
alternative.
Step3:
The second symbol ‗a‘ of w also matches with second leaf of tree. So advance the
input pointer to third symbol of w ‗d‘. But the third leaf of tree is b which does not
match with the input symbol d.
Hence discard the chosen production and reset the pointer to second position. This
is called backtracking.
Step 4:
Now try the second alternative for A, namely A → a.
Step 5:
The leaf 'a' matches the second symbol of w, so advance the input pointer to the third symbol of w, 'd'. The third leaf of the tree, 'd', matches the third input symbol. As the input is now exhausted and every leaf matches, the parser halts and announces successful completion of parsing.
4.a.PREDICTIVE PARSER
Predictive parsing table:

Non-terminal | id      | +         | *         | (       | )      | $
E            | E → TE' |           |           | E → TE' |        |
E'           |         | E' → +TE' |           |         | E' → ε | E' → ε
T            | T → FT' |           |           | T → FT' |        |
T'           |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F            | F → id  |           |           | F → (E) |        |
3. If X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi−1); that is, Y1 … Yi−1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, …, k, then add ε to FIRST(X).
FOLLOW:
FOLLOW(A), for non-terminal A, is the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β.
Rules for computing FOLLOW(A):
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

The resulting predictive parsing table:

Non-terminal | id      | +         | *         | (       | )      | $
E            | E → TE' |           |           | E → TE' |        |
E'           |         | E' → +TE' |           |         | E' → ε | E' → ε
T            | T → FT' |           |           | T → FT' |        |
T'           |         | T' → ε    | T' → *FT' |         | T' → ε | T' → ε
F            | F → id  |           |           | F → (E) |        |
4c. Explain LL(1) grammar for the sentence S → iEtS | iEtSeS | a, E → b. (May/June 2016)
Solution:
The production rule for S needs to be left factored. After left factoring, the grammar is
S → iEtSS' | a
S' → eS | ε
E → b
Compute FIRST and FOLLOW for S, S' and E:
FIRST(S) = {i, a}
FIRST(S') = {e, ε}
FIRST(E) = {b}
FOLLOW(S) = {e, $}
FOLLOW(S') = {e, $}
FOLLOW(E) = {t}
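Completing the answer (a sketch; this table is not in the scanned copy): each production A → α is entered in M[A, a] for every a in FIRST(α), and, when ε is in FIRST(α), in M[A, b] for every b in FOLLOW(A):

Non-terminal | a     | b     | e                | i          | t | $
S            | S → a |       |                  | S → iEtSS' |   |
S'           |       |       | S' → eS, S' → ε  |            |   | S' → ε
E            |       | E → b |                  |            |   |

Since M[S', e] contains two productions, the table has a multiply defined entry and the grammar is not LL(1); this is the usual dangling-else ambiguity.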
5.a.SHIFT-REDUCE PARSING
Describe the conflicts that may occur during shift Reduce
Parsing[May/Jun 2012]
Describe the shift reduce Parser with an example.[Apr/May 2008] [May
/June 2016]
SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→ d
The sentence to be recognized is abbcde.
abbcde   (reduce by A → b)
aAbcde   (reduce by A → Abc)
aAde     (reduce by B → d)
aABe     (reduce by S → aABe)
S
The reductions trace out the right-most derivation in reverse.
Handles:
A handle of a string is a substring that matches the right side of a production, and
whose reduction to the non-terminal on the left side of the production represents one
step along the reverse of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id
And the input string id1+id2*id3
The rightmost derivation is :
E ⇒ E+E
  ⇒ E+E*E
  ⇒ E+E*id3
  ⇒ E+id2*id3
  ⇒ id1+id2*id3
In the above derivation, the substring reduced at each step when the derivation is read in reverse (id1, then id2, then id3, then E*E, then E+E) is the handle.
Handle pruning:
A rightmost derivation in reverse can be obtained by ―handle pruning‖.
(i.e.) if w is a sentence or string of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some rightmost derivation.
Actions in shift -reduce parser:
Shift – The next input symbol is shifted onto the top of the stack.
Reduce – The parser replaces the handle within a stack with a non-terminal.
Accept – The parser announces successful completion of parsing.
Error – The parser discovers that a syntax error has occurred and calls an error
recovery routine
Stack implementation of shift-reduce parsing :
Stack      Input           Action
$          id1+id2*id3 $   shift
$ id1      +id2*id3 $      reduce by E → id
$ E        +id2*id3 $      shift
$ E+       id2*id3 $       shift
$ E+id2    *id3 $          reduce by E → id
$ E+E      *id3 $          shift
$ E+E*     id3 $           shift
$ E+E*id3  $               reduce by E → id
$ E+E*E    $               reduce by E → E*E
$ E+E      $               reduce by E → E+E
$ E        $               accept
Conflicts in shift-reduce parsing:
There are two conflicts that occur in shift-reduce parsing:
Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
Reduce-reduce conflict: The parser cannot decide which of several
reductions to make.
1. Shift-reduce conflict:
Example:
Consider the grammar:
E→E+E |E*E |id and input id+id*id
If the parser chooses to reduce:

Stack     Input    Action
$E+E      *id$     reduce by E → E+E
$E        *id$     shift
$E*       id$      shift
$E*id     $        reduce by E → id
$E*E      $        reduce by E → E*E
$E        $        accept

If the parser chooses to shift:

Stack     Input    Action
$E+E      *id$     shift
$E+E*     id$      shift
$E+E*id   $        reduce by E → id
$E+E*E    $        reduce by E → E*E
$E+E      $        reduce by E → E+E
$E        $        accept

With $E+E on the stack and * as the next input, the parser cannot decide between the two moves: a shift-reduce conflict.
Reduce-reduce conflict:
Consider the grammar:
M → R+R |R+c |R
R→c
and input c+c
Using R → c for both c's:

Stack   Input   Action
$c      +c$     reduce by R → c
$R      +c$     shift
$R+     c$      shift
$R+c    $       reduce by R → c
$R+R    $       reduce by M → R+R
$M      $       accept

Using M → R+c for the second c:

Stack   Input   Action
$c      +c$     reduce by R → c
$R      +c$     shift
$R+     c$      shift
$R+c    $       reduce by M → R+c
$M      $       accept

With $R+c on the stack, the parser cannot decide whether to reduce by R → c or by M → R+c: a reduce-reduce conflict.
5.b.LR PARSER
Explain LR Parser with an example.[Apr/May 2009]
LR PARSER
An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFGs is called LR(k) parsing. The 'L' is for left-to-right scanning of the input, the 'R' for constructing a rightmost derivation in reverse, and the 'k' for the number of input symbols of lookahead. When 'k' is omitted, it is assumed to be 1.
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can
be written.
It is an efficient non-backtracking shift-reduce parsing method.
A grammar that can be parsed using LR method is a proper superset of a
grammar that can be parsed with predictive parser.
It detects a syntactic error as soon as it is possible to do so.
Drawbacks of LR method:
It is too much work to construct an LR parser by hand for a programming language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
Types of LR parsing method:
1. SLR- Simple LR
Easiest to implement, least powerful.
2. CLR- Canonical LR
Most powerful, most expensive.
3. LALR- Look -Ahead LR
Intermediate in size and cost between the other two methods.
The LR parsing algorithm:
It consists of :
An input
An output
A stack
A driver program
A parsing table that has two parts (action and goto).
The schematic form of an LR parser is as follows (figure: an input buffer read by a driver program that uses a stack and an action/goto parsing table):
The parsing program reads characters from an input buffer one at a time.
The program uses a stack to store a string of the form s0X1s1X2s2…Xmsm,
where sm is on top. Each Xi is a grammar symbol and each si is a state.
The parsing table consists of two parts : action and goto functions.
Action : The parsing program determines sm, the state currently on top of stack, and
ai, the current input symbol. It then consults action[sm,ai] in the action table which
can have one of four values :
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.
Goto : The function goto takes a state and grammar symbol as arguments and
produces a state.
A Configuration of an LR parser is a pair whose first component is the stack contents
and whose second component is the unexpended input.
(s0X1s1X2s2…Xmsm , aiai+1......an$)
This configuration represents the right-sentential form
X1 X2 … Xm ai ai+1 … an.
The next move of the parser is determined by reading a i , the current input symbol
and sm, the state on the top of the stack and then consulting the parsing action table
entry action[sm, ai].
The configurations resulting after each of the four types of move are as follows.
1. If action[sm, ai] = shift s, the parser executes a shift move, entering the configuration
(s0 X1 s1 X2 … Xm sm ai s, ai+1 … an $).
Here the parser has shifted both the current input symbol ai and the next state s, which is given in action[sm, ai], onto the stack; ai+1 becomes the current input symbol.
2. If action[sm, ai] = reduce A → β, then the parser executes a reduce move, entering the configuration
(s0 X1 s1 X2 … Xm−r sm−r A s, ai ai+1 … an $)
where s = goto[sm−r, A] and r is the length of β, the right side of the production. Here the parser popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing state sm−r. The parser then pushed both A, the left side of the production, and s, the entry for goto[sm−r, A], onto the stack.
3. If action[sm, ai]=accept, then Parsing is completed.
4. If action[sm, ai]=error , the Parser has discovered an error and calls an error
recovery routine.
LR Parsing algorithm:
Input: An input string w and an LR parsing table with functions action and goto for
grammar G.
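Since the method portion is missing at this page break, here is a runnable C sketch of the driver loop, using the textbook SLR(1) table for the grammar 1: E→E+T, 2: E→T, 3: T→T*F, 4: T→F, 5: F→(E), 6: F→id (the same table the trace below uses); the table encoding chosen here is an assumption of this sketch:

#include <stdio.h>

enum { ID, PLUS, STAR, LP, RP, END };   /* terminal columns         */
enum { NT_E, NT_T, NT_F };              /* nonterminal goto columns */
#define ACC 999                         /* accept marker            */

/* ACTION[state][terminal]: >0 shift to that state, <0 reduce by
   production n, ACC accept, 0 error. */
static const int ACTION[12][6] = {
/*        id    +    *    (    )    $   */
/* 0 */ {  5,   0,   0,   4,   0,   0  },
/* 1 */ {  0,   6,   0,   0,   0,  ACC },
/* 2 */ {  0,  -2,   7,   0,  -2,  -2  },
/* 3 */ {  0,  -4,  -4,   0,  -4,  -4  },
/* 4 */ {  5,   0,   0,   4,   0,   0  },
/* 5 */ {  0,  -6,  -6,   0,  -6,  -6  },
/* 6 */ {  5,   0,   0,   4,   0,   0  },
/* 7 */ {  5,   0,   0,   4,   0,   0  },
/* 8 */ {  0,   6,   0,   0,  11,   0  },
/* 9 */ {  0,  -1,   7,   0,  -1,  -1  },
/*10 */ {  0,  -3,  -3,   0,  -3,  -3  },
/*11 */ {  0,  -5,  -5,   0,  -5,  -5  },
};
static const int GOTO_[12][3] = {       /* goto on E, T, F; 0 = unused */
    {1,2,3},{0,0,0},{0,0,0},{0,0,0},{8,2,3},{0,0,0},
    {0,9,3},{0,0,10},{0,0,0},{0,0,0},{0,0,0},{0,0,0},
};
static const int LHS[7] = { 0, NT_E, NT_E, NT_T, NT_T, NT_F, NT_F };
static const int LEN[7] = { 0, 3, 1, 3, 1, 3, 1 };   /* |right side| */

int main(void) {
    int input[] = { ID, PLUS, ID, STAR, ID, END };   /* id + id * id $ */
    int stack[64], sp = 0, ip = 0;
    stack[0] = 0;                                    /* start in state 0 */
    for (;;) {
        int act = ACTION[stack[sp]][input[ip]];
        if (act == ACC)   { puts("accept"); return 0; }
        else if (act > 0) { stack[++sp] = act; ip++; }       /* shift  */
        else if (act < 0) {                                  /* reduce */
            int p = -act;
            sp -= LEN[p];                    /* pop |right side| states */
            int g = GOTO_[stack[sp]][LHS[p]];
            stack[++sp] = g;
            printf("reduce by production %d\n", p);
        }
        else { puts("error"); return 1; }
    }
}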
Stack | Input | Action
0 | id + id * id $ | GOTO(I0, id) = s5; shift
0 id 5 | + id * id $ | GOTO(I5, +) = r6; reduce by F → id, then GOTO(I0, F) = 3
0 F 3 | + id * id $ | GOTO(I3, +) = r4; reduce by T → F, then GOTO(I0, T) = 2
0 T 2 | + id * id $ | GOTO(I2, +) = r2; reduce by E → T, then GOTO(I0, E) = 1
0 E 1 | + id * id $ | GOTO(I1, +) = s6; shift
0 E 1 + 6 | id * id $ | GOTO(I6, id) = s5; shift
0 E 1 + 6 id 5 | * id $ | GOTO(I5, *) = r6; reduce by F → id, then GOTO(I6, F) = 3
0 E 1 + 6 F 3 | * id $ | GOTO(I3, *) = r4; reduce by T → F, then GOTO(I6, T) = 9
0 E 1 + 6 T 9 | * id $ | GOTO(I9, *) = s7; shift
0 E 1 + 6 T 9 * 7 | id $ | GOTO(I7, id) = s5; shift
0 E 1 + 6 T 9 * 7 id 5 | $ | GOTO(I5, $) = r6; reduce by F → id, then GOTO(I7, F) = 10
0 E 1 + 6 T 9 * 7 F 10 | $ | GOTO(I10, $) = r3; reduce by T → T * F, then GOTO(I6, T) = 9
0 E 1 + 6 T 9 | $ | GOTO(I9, $) = r1; reduce by E → E + T, then GOTO(I0, E) = 1
0 E 1 | $ | accept
6.a.CONSTRUCTION OF SLR PARSING TABLE
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed
from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I, if it is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A→ αX . β] such
that [A→ α . Xβ] is in I.
Steps to construct SLR parsing table for grammar G are:
1. Augment G and produce G‘
2. Construct the canonical collection of set of items C for G‘
3. Construct the parsing action function action and goto using the following
algorithm that requires FOLLOW(A) for each non-terminal of grammar.
E->.id
I1: goto(I0, E)
E'->E.
E->E.+E
E->E.*E
I2: goto(I0, ()
E->(.E)
E->.E+E
E->.E*E
E->.(E)
E->.id
I3: goto(I0, id)
E->id.
I4: goto(I1, +)
E->E+.E
E->.E+E
E->.E*E
E->.(E)
E->.id
I5: goto(I1, *)
E->E*.E
E->.E+E
E->.E*E
E->.(E)
E->.id
I6: goto(I2, E)
E->(E.)
E->E.+E
E->E.*E
I7: goto(I4, E)
E->E+E.
E->E.+E
E->E.*E
I8: goto(I5, E)
E->E*E.
E->E.+E
E->E.*E
goto(I2, ()=I2
goto(I2, id)=I3
goto(I4, ()=I2
goto(I4, id)=I3
goto(I5, ()=I2
goto(I5, id)=I3
I9: goto(I6, ))
E->(E).
goto(I6, +)=I4
goto(I6, *)=I5
goto(I7, +)=I4
goto(I7, *)=I5
goto(I8, +)=I4
goto(I8, *)=I5
First(E) = {(, id}
Follow(E)={+, *, ), $}
Design an LALR Parser for the following Grammar and parse the input
string w=*id
S → L=R, S → R, L → *R, L → id, R → L (16) [Nov/Dec 2013]
Find the LALR Parser for the following Grammar
S → L=R, S → R, L → *R, L → id, R → L (16) [Nov/Dec 2014]
LALR PARSER:
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed
from I by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I, if it is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A → αX . β] such that [A → α . Xβ] is in I.
Example :
S → L=R, S → R, L → *R, L → id, R → L
Augmented grammar:
S' → S
S → L=R
S → R
L → *R
L → id
R → L
Closure(I) :
I0: S' → . S, $
    S → . L=R, $
    S → . R, $
    L → . *R, $
    L → . id, $
    R → . L, $
I2: goto(I0, L)
    S → L . =R, $
    R → L . , $
I9: goto(I6, R)
    S → L=R . , $
State | Action: = | * | id | $ | Goto: S | L | R
0     |           | s4 | s5 |     | 1 | 2 | 3
1     |           |    |    | acc |   |   |
2     | s6        |    |    | r5  |   |   |
3     |           |    |    | r2  |   |   |
4     |           | s4 | s5 |     |   | 8 | 7
5     | r4        |    |    | r4  |   |   |
6     |           | s4 | s5 |     |   | 8 | 9
7     | r3        |    |    | r3  |   |   |
8     | r5        |    |    | r5  |   |   |
9     |           |    |    | r1  |   |   |
YACC:
YACC Stands for Yet Another Compiler Compiler.
YACC is a tool for constructing a syntax analyzer (parser) from a grammar specification.
The input to YACC is a CFG with a specific structure. For each production there is an action implemented.
Definitions:
There are three points to be considered.
i. C Code:
All code between %{ and %} is copied to the beginning of the resulting C file. This is used to define variables, prototypes, routines, etc.
ii. Definition:
The definition of a YACC file is concerned with tokens. These token
definitions are written to a .h file when YACC compiles this file.
iii. Associativity Rules:
These handle associativity and priority of operators.
Rules:
Each rule combines a pattern with an action; if an action consists of more than a single command, it needs to be in braces.
Code:
This can be very elaborate. It includes a call to yylex.
The LEX code parses a file of characters and outputs a stream of tokens. YACC accepts a stream of tokens and parses it, performing appropriate actions.
If the LEX program is supplying the tokenizer, the YACC program will repeatedly call the yylex routine; the LEX rules will typically function by calling return every time they have matched a token.
If LEX returns tokens then YACC will process them and they have to agree on
what tokens there are. This is done as follows.
The YACC file will have token definitions in the definition section.
% token NUMBER
When the YACC file is translated with YACC -d , a header file y.tab.h
is created.
Return Values:
The LEX Parse can return a symbol that is put on top of the stack so that
YACC can access it. This symbol is returned in the variable yylval.
By default this is defined as an int:
extern int yylval;
%%
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
Rules Section:
The rules section contains the grammar of the language we want to parse.
This looks like
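(the sample itself is missing from this copy; a minimal sketch of a YACC file for a desk calculator, assuming a NUMBER token supplied by a LEX tokenizer as described above:)

%{
#include <stdio.h>
#include <stdlib.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token NUMBER
%left '+'
%left '*'
%%
line : expr '\n'      { printf("= %d\n", $1); }
     ;
expr : expr '+' expr  { $$ = $1 + $3; }
     | expr '*' expr  { $$ = $1 * $3; }
     | NUMBER         { $$ = $1; }
     ;
%%

Each rule pairs a grammar production with a C action; $$ is the attribute of the left-hand side and $1, $3 those of the right-hand-side symbols.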
UNIT – IV
SYNTAX DIRECTED TRANSLATION & RUN TIME ENVIRONMENT
Syntax Directed Definitions-Construction of syntax Tree-Bottom up Evaluation of S-
Attribute Definitions-Design of predictive Translator-Type Systems-Specification of a
simple type checker-Equivalence of Type Expressions-Type Conversions. RUN-
TIME ENVIRONMENT : Source Language Issues—Storage organization-Storage
Allocation-Parameter Passing-Symbol Tables. Dynamic Storage Allocation-Storage
Allocation in FORTRAN.
PART-A
1. Define syntax directed definition
A Syntax Directed Definition(SDD) is a context-free grammar together with attributes
and rules. Attributes are associated with grammar symbols and rules are associated
with productions. If X is a symbol and a is one of its attributes, then we write X.a to
denote the value of a at a particular parse-tree node labeled X.
Static checks performed during compilation include:
Type checks
Flow-of-control checks
Uniqueness checks
Name-related checks
10. List the different storage allocation strategies and write down the
limitations of static allocation.
The strategies are:
Static allocation
Stack allocation
Heap allocation
Limitations of static allocation:
The size of a data object and constraints on its position in memory
must be known at compile time
Recursive procedure are restricted, because all activations of a
procedure use the same binding for local name
Data structures cannot be created dynamically since there is no
mechanism for storage allocation at run time
11. Define activation tree and what are the contents of activation record?
Activation tree
Control flows sequentially; the execution of a program consists of a
sequence of steps, with control being at some specific point in the
program at each step.
Each execution of a procedure starts at the beginning of the procedure
body and eventually returns control to the point immediately following
the place where the procedure was called. This means the flow of
control between procedures can be depicted using trees.
The activation record is a block of memory used for managing the information
needed by a single execution of a procedure. Various fields of activation record are:
Temporary variables.
Local variables
Saved machine registers
Control link
Access link
Actual parameters
Return values
12. Write down syntax directed definition of a simple desk calculator. [Nov/Dec
2016]
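The answer is missing from this copy; the standard syntax-directed definition of a simple desk calculator (as in the textbook) is sketched below, where val is a synthesized attribute and n is a newline:

Production | Semantic rule
L → E n    | print(E.val)
E → E1 + T | E.val := E1.val + T.val
E → T      | E.val := T.val
T → T1 * F | T.val := T1.val * F.val
T → F      | T.val := F.val
F → ( E )  | F.val := E.val
F → digit  | F.val := digit.lexval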
PART-B
procedure readarray;
var i : integer;
begin
for i := 1 to 9 do read(a[i])
end;
When a procedure name appears within an executable statement, the procedure is
said to be called at that point.
Activation trees:
An activation tree is used to depict the way control enters and leaves activations. In
an activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from
activation a to b.
4. The node for a is to the left of the node for b if and only if the lifetime of a
occurs before the lifetime of b.
Control stack:
A control stack is used to keep track of live procedure activations. The idea is
to push the node for an activation onto the control stack as the activation
begins and to pop the node when the activation ends.
The contents of the control stack are related to paths to the root of the
activation tree. When node n is at the top of control stack, the stack contains
the nodes along the path from n to the root.
The Scope of a Declaration:
A declaration is a syntactic construct that associates information with a name.
Declarations may be explicit, such as:
var i : integer ;
or they may be implicit. For example, in FORTRAN a variable name beginning with
one of the letters I through N is assumed to denote an integer.
The portion of the program to which a declaration applies is called the scope of that
declaration.
Binding of names:
Even if each name is declared once in a program, the same name may denote
different data objects at run time. ―Data object‖ corresponds to a storage location that
holds values.
The term environment refers to a function that maps a name to a storage location.
The term state refers to a function that maps a storage location to the value held
there.
name --(environment)--> storage location --(state)--> value
STORAGE ORGANISATION
The executing target program runs in its own logical address space in which
each program value has a location.
The management and organization of this logical address space is shared
between the compiler, operating system and target machine. The operating
system maps logical addresses into physical addresses, which are usually
spread throughout memory.
The run-time storage is typically subdivided into areas for code, static data,
heap, free memory, and stack.
When storage is laid out in multiples of some machine-required alignment unit,
space may be left unused; this unused space due to alignment considerations is
referred to as padding.
The size of some program objects may be known at compile time, and these may
be placed in an area called static.
The dynamic areas used to maximize the utilization of space at run time are the
stack and the heap.
Activation records:
Procedure calls and returns are usually managed by a run time stack called
the control stack.
Each live activation has an activation record on the control stack, with the
root of the activation tree at the bottom; the record for the currently executing
activation is at the top of the stack.
The contents of the activation record vary with the language being
implemented. Typical contents are the following.
Temporary values such as those arising from the evaluation of expressions.
Local data belonging to the procedure whose activation record this is.
A saved machine status, with information about the state of the machine just
before the call to procedures.
An access link may be needed to locate data needed by the called procedure
but found elsewhere.
A control link pointing to the activation record of the caller.
Space for the return value of the called functions, if any. Again, not all called
procedures return a value, and if one does, we may prefer to place that value
in a register for efficiency.
The actual parameters used by the calling procedure. These are not placed in
activation record but rather in registers, when possible, for greater efficiency.
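As a rough sketch only (field names, types and sizes here are illustrative
assumptions; real layouts are machine- and language-dependent), these fields can
be pictured as a C struct:

struct ActivationRecord {
    int  return_value;        /* space for the value returned, if any          */
    int  actual_params[4];    /* parameters from the caller (often in registers) */
    struct ActivationRecord *control_link;  /* record of the caller            */
    struct ActivationRecord *access_link;   /* for locating non-local data     */
    long saved_machine_status[8];           /* saved registers, return address */
    int  local_data[8];       /* locals of this procedure                      */
    int  temporaries[4];      /* compiler-generated temporary values           */
};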
2.STORAGE ALLOCATION
Discuss in detail about Storage Allocation Strategies.[Nov/Dec 2013]
[May /June 2016]
Discuss Run Time Storage Management in detail.[Nov/Dec
2013][Apr/May 2011].
What are the different Storage Allocation Strategies? Explain. [Nov/Dec 2011]
STORAGE ALLOCATION STRATEGIES
The different storage allocation strategies are :
1. Static allocation – lays out storage for all data objects at compile time
2. Stack allocation – manages the run-time storage as a stack.
3. Heap allocation – allocates and deallocates storage as needed at run time from a
data area known as heap.
Static Allocation
In static allocation, names are bound to storage as the program is compiled,
so there is no need for a run-time support package.
Stack Allocation
In stack allocation, storage is organized as a stack: activation records are pushed
onto the stack as activations begin and popped as they end. The general layout of
an activation record is as follows.
Values communicated between caller and callee are generally placed at the
beginning of the callee‘s activation record, so they are as close as possible to
the caller‘s activation record.
Fixed length items are generally placed in the middle. Such items typically
include the control link, the access link, and the machine status fields.
Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array,
where the value of one of the callee‘s parameters determines the length of
the array.
The calling sequence and its division between caller and callee are as follows.
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the callee‘s
activation record. The caller then increments top_sp past its own local data and
temporaries and the callee‘s parameter and status fields.
The callee saves the register values and other status information. The callee
initializes its local data and begins execution.
On return, the callee places the return value next to the parameters, restores
top_sp and the saved registers, and branches to the return address; the caller
can then pick up the return value.
Variable-length data on the stack:
Suppose procedure p has three local arrays whose sizes cannot be determined at
compile time. The storage for these arrays is not part of the activation record for p;
only pointers to them appear in the record.
Access to the data is through two pointers, top and top_sp. Here top marks the
actual top of the stack; it points to the position at which the next activation record
will begin.
The second, top_sp, is used to find local, fixed-length fields of the top activation
record.
The code to reposition top and top_sp can be generated at compile time, in
terms of sizes that will become known at run time.
Heap Allocation
Stack allocation strategy cannot be used if either of the following is possible:
1. The values of local names must be retained when activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation
records or other objects.
Pieces may be deallocated in any order, so over the time the heap will consist of
alternate areas that are free and in use.
For example, suppose the record for an activation of a procedure r is retained
when the activation ends. Then the record for a new activation q(1, 9) cannot
follow the record for s physically; and if the retained record for r is later
deallocated, there will be free space in the heap between the activation records
for s and q.
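A minimal C sketch of this behavior (names and sizes are arbitrary assumptions
chosen for illustration):

#include <stdlib.h>

int main(void) {
    char *s = malloc(64);   /* record for an activation of s         */
    char *r = malloc(64);   /* record for r, retained after r ends   */
    char *q = malloc(64);   /* record for q(1, 9) follows r          */
    free(r);                /* deallocating r leaves a free hole     */
                            /* between the records for s and q       */
    free(s);
    free(q);                /* pieces may be freed in any order      */
    return 0;
}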
3.a.TYPE SYSTEMS
Illustrate type checking with necessary diagram . [Nov/Dec 2016].
TYPE CHECKING
A compiler must check that the source program follows both syntactic and semantic
conventions of the source language.
This checking, called static checking, detects and reports programming errors. Some
examples of static checks:
1. Type checks – A compiler should report an error if an operator is applied to an
incompatible operand. Example: If an array variable and function variable are added
together.
A type checker verifies that the type of a construct matches that expected by its
context. For example : arithmetic operator mod in Pascal requires integer operands,
so a type checker verifies that the operands of mod have type integer.
Type information gathered by a type checker may be needed when code is
generated.
A Simple Language
Consider the following grammar:
P→D;E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
Translation scheme:
P→D;E
D→D;D
D → id : T { addtype (id.entry , T.type) }
T → char { T.type : = char }
T → integer { T.type : = integer }
T → ↑ T1 { T.type : = pointer(T1.type) }
T → array [ num ] of T1 { T.type : = array ( 1… num.val , T1.type) }
TYPE SYSTEMS
The design of a type checker for a language is based on information about the
syntactic constructs in the language, the notion of types, and the rules for assigning
types to language constructs.
For example: "if both operands of the arithmetic operators +, - and * are of type
integer, then the result is of type integer."
Type Expressions
The type of a language construct will be denoted by a "type expression".
A type expression is either a basic type or is formed by applying an operator called a
type constructor to other type expressions.
The sets of basic types and constructors depend on the language to be checked.
The following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error, will signal an error during type checking; void,
denoting "the absence of a value", allows statements to be checked.
2. Since type expressions may be named, a type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors
include:
Arrays : If T is a type expression then array (I,T) is a type expression denoting the
type of an array with elements of type T and index set I.
Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2
is a type expression.
Records : The difference between a record and a product is that the fields of a
record have names. The record type constructor will be applied to a tuple formed
from field names and field types.
For example:
type row = record
    address: integer;
    lexeme: array[1..15] of char
end;
var table: array[1..101] of row;
declares the type name row representing the type expression record((address X
integer) X
(lexeme X array(1..15,char))) and the variable table to be an array of records of this
type.
Pointers : If T is a type expression, then pointer(T) is a type expression denoting the
type "pointer to an object of type T".
For example, var p: ↑row declares variable p to have type pointer(row).
Functions: A function in programming languages maps a domain type D to a range
type R. The type of such a function is denoted by the type expression D → R.
4. Type expressions may contain variables whose values are type expressions.
Tree representation for char x char → pointer (integer)
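As an illustration of how such trees might be represented (a sketch; the names and
fields below are assumptions, not from the notes), a C tagged structure could be:

typedef enum { T_BASIC, T_ARRAY, T_PRODUCT, T_RECORD,
               T_POINTER, T_FUNCTION } TypeTag;

typedef struct Type {
    TypeTag tag;
    const char *basic_name;      /* for T_BASIC: "char", "integer", "type_error" */
    int low, high;               /* for T_ARRAY: bounds of the index set         */
    struct Type *left, *right;   /* children: element type, product members,
                                    pointed-to type, or domain/range             */
} Type;

The tree for char x char → pointer(integer) would then be a T_FUNCTION node whose
left child is a T_PRODUCT of two char nodes and whose right child is a T_POINTER
to integer.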
Type systems
A type system is a collection of rules for assigning type expressions to the various
parts of a program.
A type checker implements a type system. It is specified in a syntax-directed
manner.
Different type systems may be used by different compilers or processors of the same
language.
A Simple Language
P→D;E
D→D;D
D → id : T { addtype (id.entry , T.type)}
T → char { T.type : = char }
T → integer { T.type : = integer }
T → ↑ T1 { T.type : = pointer(T1.type) }
T → array [ num ] of T1 { T.type : = array ( 1… num.val , T1.type) }
In the following rules, the attribute type for E gives the type expression assigned to
the expression generated by E.
1. E → literal { E.type : = char }
   E → num { E.type : = integer }
Here, constants represented by the tokens literal and num have type char and
integer, respectively.
2. E → id { E.type : = lookup ( id.entry ) }
lookup ( e ) is used to fetch the type saved in the symbol table entry pointed to
by e.
3. E → E1 mod E2 { E.type : = if E1. type = integer and
E2. type = integer then integer
else type_error }
The expression formed by applying the mod operator to two subexpressions of type
integer has type integer; otherwise, its type is type_error.
4. E → E1 [ E2 ] { E.type : = if E2.type = integer and
E1.type = array(s,t) then t
else type_error }
In an array reference E1 [ E2 ], the index expression E2 must have type integer.
The result is the element type t obtained from the type array(s,t) of E1.
5. E → E1 ↑ { E.type : = if E1.type = pointer (t) then t
else type_error }
The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is
the type t of the object pointed to by the pointer E.
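These rules translate directly into a recursive checking routine. A minimal sketch in
C, using the Type representation sketched earlier (the helper functions are
hypothetical names, assumed for illustration):

/* hypothetical helpers */
int   is_integer(Type *t);
Type *integer_type(void);
Type *type_error_type(void);

/* E -> E1 mod E2 : both operands must be integer */
Type *check_mod(Type *t1, Type *t2) {
    if (is_integer(t1) && is_integer(t2))
        return integer_type();
    return type_error_type();
}

/* E -> E1 [ E2 ] : index must be integer, E1 must be an array */
Type *check_index(Type *arr, Type *index) {
    if (arr->tag == T_ARRAY && is_integer(index))
        return arr->left;           /* element type t of array(s, t) */
    return type_error_type();
}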
1.Syntax-Directed Definitions
A syntax-directed definition (SDD) is a context-free grammar together
with attributes and rules. Attributes are associated with grammar
symbols and rules are associated with productions.
An attribute has a name and an associated value: a string, a number, a
type, a memory location, or an assigned register, among other things. The strings may
even be long sequences of code, say code in the intermediate
language used by a compiler. If X is a symbol and a is one of its
attributes, then we write X.a to denote the value of a at a particular
parse-tree node labeled X. If we implement the nodes of the parse tree
by records or objects, then the attributes of X can be implemented by
data fields in the records that represent the nodes for X. The attributes
are evaluated by the semantic rules attached to the productions.
Example : The SDD in Fig. 5.1 (the desk-calculator SDD given in Part A, question 12)
is based on a grammar for arithmetic expressions with operators + and *. It evaluates
expressions terminated by an endmarker n.
In the SDD, each of the nonterminals has a single synthesized attribute,
called val. We also suppose that the terminal digit has a synthesized
attribute lexval, which is an integer value returned by the lexical
analyzer.
An SDD that involves only synthesized attributes is called S-attributed;
the SDD in Fig. 5.1 has this property. In an S-attributed SDD, each rule
computes an attribute for the nonterminal at the head of a production
from attributes taken from the body of the production.
Inherited attributes are useful when the structure of a parse tree does
not match the abstract syntax of the source code. They can be used to
overcome the mismatch due to a grammar designed for parsing rather
than translation.
In the SDD below, the nonterminal T′ has an inherited attribute inh as
well as a synthesized attribute syn. T′ inherits F.val from its left sibling F
in the production T → F T′.
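The SDD in question is, in all likelihood, the standard one from Aho et al. (Fig. 5.4):
PRODUCTION       SEMANTIC RULES
T → F T′         T′.inh := F.val ; T.val := T′.syn
T′ → * F T′1     T′1.inh := T′.inh * F.val ; T′.syn := T′1.syn
T′ → ε           T′.syn := T′.inh
F → digit        F.val := digit.lexval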
An annotated parse tree for 3*5 using the above SDD shows the inh and syn values
at each node.
An SDD with both inherited and synthesized attributes does not guarantee an
evaluation order; indeed, there may be no valid order at all. For example, consider
nonterminals A and B, with synthesized and inherited attributes A.s and B.i,
respectively, along with a production A → B and the rules A.s := B.i and
B.i := A.s + 1 (Fig. 5.2). These rules are circular: it is impossible to evaluate
either A.s at a node N or B.i at the child of N without first evaluating the other.
Solution (to a problem like question 14(a)(i) in the model paper below: construct an
SDD that counts, in a string of a's, b's and c's, the substrings matching a given
pattern):
1. The CFG G = ({S}, {a,b,c}, S, P) generates all strings over the alphabet {a,b,c},
with P the set of productions:
S → S a | S b | S c | a | b | c
2. Given the grammar above, define three synthesized attributes for the
nonterminal S, namely nA1, nA2, and total. The first attribute, nA1, captures
the number of a's to the left of a given c character; the second, nA2, the
number of a's to the right of that character. When a new c is found, we copy
the count of a's that were to the right of the first c, and which are now to the
left of the second c.
3. As such, a set of rules is as follows, here written as semantic actions, given
that their order of evaluation follows a bottom-up depth-first traversal:
S1 → S2 a { S1.nA1 = S2.nA1 + 1; S1.nA2 = S2.nA2; S1.total = S2.total; }
S1 → S2 b { S1.nA1 = S2.nA1; S1.nA2 = S2.nA2; S1.total = S2.total + S2.nA2; }
S1 → S2 c { S1.nA1 = 0; S1.nA2 = S2.nA1; S1.total = S2.total; }
S1 → a { S1.nA1 = 1; S1.nA2 = 0; S1.total = 0; }
S1 → b { S1.nA1 = 0; S1.nA2 = 0; S1.total = 0; }
S1 → c { S1.nA1 = 0; S1.nA2 = 0; S1.total = 0; }
5.CONSTRUCTION OF SYNTAX TREE
Describe the procedure to construct the syntax tree using syntax
directed translation scheme.
Each node in a syntax tree represents a construct; the children of the node
represent its meaningful components. Nodes are created as follows:
If the node is a leaf, an additional field holds the lexical value for
the leaf. Such a node is created by a function Leaf(op, val).
If the node is an interior node, there are as many fields as the
node has children in the syntax tree. Such a node is created by a function
Node(op, c1, c2, ..., ck).
Example: The S-attributed definition in figure below constructs syntax trees for a
simple expression grammar involving only the binary operators + and -. As usual,
these operators are at the same precedence level and are jointly left associative. All
nonterminals have one synthesized attribute node, which represents a node of the
syntax tree.
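The definition in question (reconstructed here from the standard treatment in
Aho et al., which the description matches) is:
PRODUCTION     SEMANTIC RULES
E → E1 + T     E.node := Node('+', E1.node, T.node)
E → E1 - T     E.node := Node('-', E1.node, T.node)
E → T          E.node := T.node
T → ( E )      T.node := E.node
T → id         T.node := Leaf(id, id.entry)
T → num        T.node := Leaf(num, num.val)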
With a grammar designed for top-down parsing, the same syntax trees are
constructed, using the same sequence of steps, even though the structure of the
parse trees differs significantly from that of syntax trees.
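A minimal C sketch of the two constructor functions (illustrative assumptions:
binary interior nodes only, int-coded operators, no error handling):

#include <stdlib.h>

typedef struct SyntaxNode {
    int op;                       /* operator or token code            */
    int val;                      /* lexical value, used by leaves     */
    struct SyntaxNode *child[2];  /* children, used by interior nodes  */
} SyntaxNode;

SyntaxNode *Leaf(int op, int val) {
    SyntaxNode *n = malloc(sizeof *n);
    n->op = op; n->val = val;
    n->child[0] = n->child[1] = NULL;
    return n;
}

SyntaxNode *Node(int op, SyntaxNode *c1, SyntaxNode *c2) {
    SyntaxNode *n = malloc(sizeof *n);
    n->op = op; n->val = 0;
    n->child[0] = c1; n->child[1] = c2;
    return n;
}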
6.PARAMETER PASSING
When one procedure calls another, the usual method of communication between
them is through nonlocal names and through the parameters of the called procedure.
Call by Value
• Each actual argument is evaluated before call. On entry, the resulting value is
copied and bound to the formal parameter; which behaves just like a local variable.
• Advantages:
– Formal parameters can be used as local variables; updating them doesn't affect
the actuals in the calling procedure:
void square(int a, int b) { a = a * a; b = b * b; }
/* after square(x, y), the caller's x and y are unchanged */
Call by Reference
• If the actual argument doesn't have an l-value (e.g., "2 + 3"), then either:
– forbid it in the language, i.e. treat it as an error that the compiler catches, or
– evaluate it into a temporary location and pass its address.
• Advantages:
– The callee can update the caller's variables through the parameter, and large
objects can be passed without copying them.
Call by Copy-Restore
• Upon exit, the final contents of formals are copied into the actual
• Thus, it behaves like call by reference in most "normal" situations, but may give
different results when concurrency or aliasing is involved, as when the actual r is
also visible to the callee:
end foo;
r.a := 1; foo( r );
print( r.a );
Under copy-restore, the updates foo makes to its formal parameter are copied back
into r only when foo exits, so print may see a different value than it would under
call by reference.
7.DESIGN OF A PREDICTIVE TRANSLATOR
Method:
1.For each nonterminal A, construct a function that has a formal parameter for each
inherited attribute of A and that returns the values of the synthesized attributes of A.
2.The code for nonterminal A decides what production to use based on the current
input symbol.
We consider the tokens, nonterminals, and actions on the right side of the production
from left to right.
i) For a token X with synthesized attribute x, save the value of x in the variable
declared for X.x; then generate a call to match token X and advance the input.
ii) For a nonterminal B, generate an assignment c := B(b1, b2, ..., bk) with a
function call on the right side, where b1, ..., bk are the variables for the inherited
attributes of B and c is the variable for the synthesized attribute of B.
iii) For an action, copy the code into the parser, replacing each reference to an
attribute by the variable for that attribute.
R → addop T { R1.i := mknode(addop.lexeme, R.i, T.nptr) }
      R1 { R.s := R1.s }
R → ε { R.s := R.i }
The code for R is based on the following parsing procedure:
procedure R;
begin
  if lookahead = addop then begin
    match(addop); T; R
  end
  else begin /* do nothing */
  end
end;
Adding attribute evaluation, R becomes a function:
function R(i: node): node;
var nptr, i1, s1, s: node;
    addoplexeme: char;
begin
  if lookahead = addop then begin
    /* production R → addop T R */
    addoplexeme := lexval;
    match(addop);
    nptr := T;
    i1 := mknode(addoplexeme, i, nptr);
    s1 := R(i1);
    s := s1
  end
  else s := i; /* production R → ε */
  return s
end;
The function contains code for evaluating attributes. The lexical value lexval of the
token addop is saved in addoplexeme, addop is matched, T is called, and its result is
saved in nptr. Variable i1 corresponds to the inherited attribute R1.i, and s1 to the
synthesized attribute R1.s. The return statement returns the value of s just before
control leaves the function. The functions for E and T are constructed similarly.
UNIT – 5
CODE OPTIMIZATION AND CODE GENERATION
Principal Sources of Optimization-DAG-Optimization of Basic Blocks-Global Data
Flow Analysis-Efficient Data Flow Algorithms-Issues in Design of a Code Generator-
A simple Code Generator Algorithm.
PART – A
1. What are the properties of optimizing compilers? [May /June 2016]
The compiler should produce the minimum amount of target code for a
given source program.
There should not be any unreachable code.
Dead code should be completely removed.
The optimizing compiler should apply the following code-improving
transformations on the source language:
Common subexpression elimination
Dead code elimination
Code movement
Strength reduction
2. List the terminologies used in basic blocks.
Define and use – the three address statement a:=b+c is said to define a
and to use b and c.
Live and dead – the name in the basic block is said to be live at a given
point if its value is used after that point in the program. And the name in
the basic block is said to be dead at a given point if its value is never used
after that point in the program.
3. What is a DAG? Mention its applications. [May /June 2016]
Directed acyclic graph(DAG) is a useful data structure for implementing
transformations on basic blocks. A DAG is used in:
Determining the common sub-expressions.
Determining which identifiers have their values used in the block.
Determining which statements compute values that could be used outside
the block.
Simplifying the list of quadruples of a basic block.
PART-B
1.PRINCIPAL SOURCES OF OPTIMIZATION
Write in detail about the Function Preserving Transformation.[Apr/May 2011]
Explain loop optimization in detail and apply to an example. [Apr/May
2015]
Explain Principal sources of optimization. [Nov/Dec 2014]
Write in detail about loop optimization. [Nov/Dec 2013]
Describe in detail the principal sources optimization. [Nov/Dec 2011]
Explain the principal sources of optimization with examples.
[May/Jun 2016]
Function-Preserving Transformations
There are a number of ways in which a compiler can improve a program
without changing the function it computes.
The transformations
Common sub expression elimination,
Copy propagation,
Dead-code elimination, and
Constant folding
are common examples of such function-preserving transformations. The other
transformations come up primarily when global optimizations are performed.
Frequently, a program will include several calculations of the same value,
such as an offset in an array. Some of the duplicate calculations cannot be avoided
by the programmer because they lie below the level of detail accessible within the
source language.
Common Sub expressions elimination:
An occurrence of an expression E is called a common sub-expression if E
was previously computed, and the values of variables in E have not changed since
the previous computation. We can avoid recomputing the expression if we can use
the previously computed value.
For example
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t4 := 4*i
t5 := n
t6 := b[t4] + t5
The above code can be optimized using the common sub-expression elimination
as
t1 := 4*i
t2 := a[t1]
t3 := 4*j
t5 := n
t6 := b[t1] + t5
Loop Optimizations:
We now give a brief introduction to a very important place for optimizations,
namely loops, especially the inner loops where programs tend to spend the bulk of
their time. The running time of a program may be improved if we decrease the
number of instructions in an inner loop, even if we increase the amount of code
outside that loop.
Three techniques are important for loop optimization:
1. Code motion, which moves code outside a loop;
2. Induction-variable elimination, which eliminates redundant induction
variables from inner loops;
3. Reduction in strength, which replaces an expensive operation by a
cheaper one, such as a multiplication by an addition.
1.Code Motion:
An important modification that decreases the amount of code in a loop
is code motion. This transformation takes an expression that yields the
same result independent of the number of times a loop is executed ( a
loop-invariant computation) and places the expression before the loop.
Note that the notion "before the loop" assumes the existence of an
entry for the loop. For example, evaluation of limit-2 is a loop-invariant
computation in the following while-statement:
while (i <= limit-2) /* statement does not change limit*/
Code motion will result in the equivalent of
t= limit-2;
while (i<=t) /* statement does not change limit or t */
2.Induction Variables :
Loops are usually processed inside out. For example consider the loop
around B3.
Note that the values of j and t4 remain in lock-step; every time the
value of j decreases by 1, that of t4 decreases by 4 because 4*j is
assigned to t4. Such identifiers are called induction variables.
When there are two or more induction variables in a loop, it may be
possible to get rid of all but one, by the process of induction-variable
elimination. For the inner loop around B3 in Fig. we cannot get rid of
either j or t4 completely; t4 is used in B3 and j in B4.
However, we can illustrate reduction in strength and illustrate a part of the process
of induction-variable elimination. Eventually j will be eliminated when the outer loop
of B2 - B5 is considered.
Example:
As the relationship t4 := 4*j surely holds after such an assignment to t4, and t4
is not changed elsewhere in the inner loop around B3, it follows that just after the
statement j := j-1 the relationship t4 = 4*j-4 must hold. We may therefore replace
the assignment t4 := 4*j by t4 := t4-4. The only problem is that t4 does not have a value
when we enter block B3 for the first time. Since we must maintain the relationship
t4=4*j on entry to the block B3, we place an initializations of t4 at the end of the
block where j itself is initialized, shown by the dashed addition to block B1 in second
Fig.
(Flow graphs before and after the strength-reduction transformation.)
The replacement of a multiplication by a subtraction will speed up the
object code if multiplication takes more time than addition or subtraction,
as is the case on many machines.
3.Reduction In Strength:
Reduction in strength replaces expensive operations by equivalent cheaper
ones on the target machine. Certain machine instructions are considerably
cheaper than others and can often be used as special cases of more
expensive operators.
For example, x² is invariably cheaper to implement as x*x than as a call to an
exponentiation routine. Fixed-point multiplication or division by a power of two
is cheaper to implement as a shift. Floating-point division by a constant can
be implemented as multiplication by a constant, which may be cheaper.
For Example
In block B3
j=j-1
t4=4*j
Can be replaced by
j=j-1
t4=t4-4
2.DAG
What are the advantages of DAG Representation?Give example.
[Apr/May 2015]
Describe the algorithm for constructing DAG with an example.
[Apr/May 2015]
Generate DAG Representation with an example and list out the
applications of DAG Representation.
[Nov/Dec 2014]
Construct DAG and three address code for the following C Code
[Nov/Dec 2013]
prod = 0;
i = 1;
while (i <= 20)
{
    prod = prod + a[i] * b[i];
    i = i + 1;
}
A DAG for a basic block is a directed acyclic graph with the following labels
on nodes:
Leaves are labeled by unique identifiers, either variable names or
constants.
Interior nodes are labeled by an operator symbol.
Nodes are also optionally given a sequence of identifiers for labels to
store the computed values.
DAGs are useful data structures for implementing transformations on basic
blocks.
It gives a picture of how the value computed by a statement is used in
subsequent statements.
It provides a good way of determining common sub – expressions
Application of DAGs:
Method:
Input: a basic block, in which each statement has one of the forms:
(i) x := y OP z, (ii) x := OP y, or (iii) x := y.
Step 1: If node(y) is undefined, create a leaf labeled y and let node(y) be this leaf.
In case (i), if node(z) is undefined, create a leaf labeled z and let node(z) be that
leaf.
Step 2: For case (i), create a node(OP) whose left child is node(y) and right child is
node(z), unless such a node already exists (checking for a common sub-expression).
Let n be this node.
For case (ii), determine whether there is a node(OP) with one child node(y). If not,
create such a node; let n be this node.
For case (iii), node n will be node(y).
Step 3: Delete x from the list of identifiers for node(x). Append x to the list of attached
identifiers for the node n found in step 2 and set node(x) to n.
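A sketch of the common-sub-expression check from step 2 in C (the array-based
lookup and the names are illustrative assumptions; a real implementation would
hash on the triple (OP, left, right)):

#include <stdlib.h>

typedef struct DagNode {
    int op;                        /* operator, or 0 for a leaf     */
    struct DagNode *left, *right;  /* children; NULL for leaves     */
    /* the list of attached identifiers is omitted for brevity      */
} DagNode;

/* Return an existing node(op, l, r) if one exists, else create one. */
DagNode *find_or_create(DagNode *nodes[], int *n, int op,
                        DagNode *l, DagNode *r) {
    for (int i = 0; i < *n; i++)
        if (nodes[i]->op == op && nodes[i]->left == l && nodes[i]->right == r)
            return nodes[i];       /* common sub-expression found   */
    DagNode *d = malloc(sizeof *d);
    d->op = op; d->left = l; d->right = r;
    nodes[(*n)++] = d;
    return d;
}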
1. t1 := 4* i
2. t2 := a[t1]
3. t3 := 4* i
4. t4 := b[t3]
5. t5 := t2*t4
6. t6 := prod+t5
7. prod := t6
8. t7 := i+1
9. i := t7
10. if i<=20 goto (1)
2.Algebraic Transformations:
Algebraic identities represent another important class of optimizations on
basic blocks. This includes simplifying expressions or replacing expensive
operation by cheaper ones i.e. reduction in strength.
Another class of related optimizations is constant folding. Here we evaluate
constant expressions at compile time and replace the constant expressions
by their values. Thus the expression 2*3.14 would be replaced by 6.28.
The relational operators <=, >=, <, >, and = sometimes generate
unexpected common sub expressions.
Associative laws may also be applied to expose common sub expressions.
For example, if the source code has the assignments
a :=b+c
e:=c+d+b
the following intermediate code may be generated:
a :=b+c
t :=c+d
e:=t+b
Example:
x:=x+0 can be removed
x:=y**2 can be replaced by a cheaper statement x:=y*y
4.GLOBAL DATA FLOW ANALYSIS
Within a basic block, we talk of the point between two adjacent statements, as well
as the point before the first statement and after the last.
Now let us take a global view and consider all the points in all the blocks. A
path from p1 to pn is a sequence of points p1, p2,….,pn such that for each i
between 1 and n-1, either
Pi is the point immediately preceding a statement and pi+1 is the point
immediately following that statement in the same block, or
Pi is the end of some block and pi+1 is the beginning of a successor block.
Reaching definitions:
A definition of variable x is a statement that assigns, or may assign, a
value to x. The most common forms of definition are assignments to x and
statements that read a value from an i/o device and store it in x.
These statements certainly define a value for x, and they are referred to as
unambiguous definitions of x. There are certain kinds of statements that
may define a value for x; they are called ambiguous definitions. The most
usual forms of ambiguous definitions of x are:
A call of a procedure with x as a parameter or a procedure that can access x
because x is in the scope of the procedure.
An assignment through a pointer that could refer to x. For example, the
assignment *q: = y is a definition of x if it is possible that q points to x. we
must assume that an assignment through a pointer is a definition of every
variable.
Flow graphs for control flow constructs such as do-while statements have a
useful property: there is a single beginning point at which control enters and a
single end point that control leaves from when execution of the statement is
over. We exploit this property when we talk of the definitions reaching the
beginning and the end of statements with the following syntax.
(The figure here showed the flow graphs for the control constructs generated by
S → S1 ; S2 | if E then S1 else S2 | do S1 while E.)
The data-flow equations for reaching definitions, case by case:
i) S → d : a := b + c (a single assignment d)
gen[S] = {d}
kill[S] = Da - {d}, where Da is the set of all definitions of a in the program
out[S] = gen[S] U (in[S] - kill[S])
ii) S → S1 ; S2
gen[S] = gen[S2] U (gen[S1] - kill[S2])
kill[S] = kill[S2] U (kill[S1] - gen[S2])
in[S1] = in[S]
in[S2] = out[S1]
out[S] = out[S2]
iii) S → if E then S1 else S2
gen[S] = gen[S1] U gen[S2]
kill[S] = kill[S1] ∩ kill[S2]
in[S1] = in[S]
in[S2] = in[S]
out[S] = out[S1] U out[S2]
iv) S → do S1 while E
gen[S] = gen[S1]
kill[S] = kill[S1]
in[S1] = in[S] U gen[S1]
out[S] = out[S1]
Under what circumstances is definition d generated by S = S1; S2? First of all,
if it is generated by S2, then it is surely generated by S. If d is generated by
S1, it will reach the end of S provided it is not killed by S2. Thus, we write
gen[S] = gen[S2] U (gen[S1] - kill[S2])
Similar reasoning applies to the killing of a definition, so we have
kill[S] = kill[S2] U (kill[S1] - gen[S2])
For the conditional statement, a definition reaches the end of S if and only if it
reaches the end of one of the branches:
out[S] = out[S1] U out[S2]
Representation of sets:
Sets of definitions, such as gen[S] and kill[S], can be represented
compactly using bit vectors. We assign a number to each definition of
interest in the flow graph. Then the bit vector representing a set of definitions
will have 1 in position i if and only if the definition numbered i is in the set.
The number of definition statement can be taken as the index of statement in
an array holding pointers to statements. However, not all definitions may be of
interest during global data-flow analysis. Therefore the number of definitions
of interest will typically be recorded in a separate table.
A bit vector representation for sets also allows set operations to be
implemented efficiently. The union and intersection of two sets can be
implemented by logical or and logical and, respectively, basic operations in
most systems-oriented programming
languages. The difference A - B of sets A and B can be implemented
by taking the complement of B and then using logical and to compute
A ∧ ¬B.
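A minimal C sketch of these bit-vector operations, and of the gen/kill equations for
a sequence S = S1 ; S2 (the 64-definition limit and the names are assumptions made
for illustration):

#include <stdint.h>

typedef uint64_t DefSet;  /* bit i set <=> definition number i is in the set */

DefSet set_union(DefSet a, DefSet b) { return a | b;  }  /* logical or  */
DefSet set_inter(DefSet a, DefSet b) { return a & b;  }  /* logical and */
DefSet set_diff (DefSet a, DefSet b) { return a & ~b; }  /* A and not-B */

/* gen[S] = gen[S2] U (gen[S1] - kill[S2]) */
DefSet gen_seq(DefSet gen1, DefSet gen2, DefSet kill2) {
    return set_union(gen2, set_diff(gen1, kill2));
}

/* kill[S] = kill[S2] U (kill[S1] - gen[S2]) */
DefSet kill_seq(DefSet kill1, DefSet kill2, DefSet gen2) {
    return set_union(kill2, set_diff(kill1, gen2));
}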
Local reaching definitions:
Space for data-flow information can be traded for time, by saving information
only at certain points and, as needed, recomputing information at intervening
points. Basic blocks are usually treated as a unit during global flow analysis,
with attention restricted to only those points that are the beginnings of blocks.
Since there are usually many more points than blocks, restricting our effort to
blocks is a significant savings. When needed, the reaching definitions for all
points in a block can be calculated from the reaching definitions for the
beginning of a block.
Use-definition chains:
It is often convenient to store the reaching definition information as "use-
definition chains" or "ud-chains", which are lists, for each use of a variable, of
all the definitions that reach that use. If a use of variable a in block B is
preceded by no unambiguous definition of a, then the ud-chain for that use of a
is the set of definitions in in[B] that are definitions of a. In addition, if there are
ambiguous definitions of a, then all of those for which no unambiguous
definition of a lies between it and the use of a are on the ud-chain for this use
of a.
Evaluation order:
The techniques for conserving space during attribute evaluation, also apply
to the computation of data-flow information using specifications. Specifically,
the only constraint on the evaluation order for the gen, kill, in and out sets for
statements is that imposed by dependencies between these sets. Having
chosen an evaluation order, we are free to release the space for a set after
122
123
124
5. Register allocation
Instructions involving register operands are shorter and faster than those
involving operands in memory.
The use of registers is subdivided into two subproblems :
Register allocation – the set of variables that will reside in registers
at a point in the program is selected.
Register assignment – the specific register that a variable will reside
in is picked.
Certain machines require even-odd register pairs for some operands
and results. For example, consider the division instruction of the
form:
D x, y
where x, the dividend, is the even register of an even/odd register pair, and y
is the divisor. After division, the even register holds the remainder and the odd
register the quotient.
6.A SIMPLE CODE GENERATOR ALGORITHM
A register descriptor keeps track of what is currently in each register; initially,
all registers are empty.
An address descriptor stores the location where the current value of the name
can be found at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three -address statements constituting a
basic block. For each three-address statement of the form x : = y op z, perform the
following actions:
1. Invoke a function getreg to determine the location L where the result of the
computation y op z should be stored.
2. Consult the address descriptor for y to determine y', the current location of
y. Prefer the register for y' if the value of y is currently both in memory and a
register. If the value of y is not already in L, generate the instruction MOV y', L
to place a copy of y in L.
3. Generate the instruction OP z', L where z' is a current location of z. Prefer a
register for z' if the value of z is currently both in memory and a register. Update
the address descriptor of x to indicate that x is in location L; if L is a register,
update the register descriptor as well.
4. If the current values of y and/or z have no next uses, are not live on exit from
the block, and are in registers, alter the register descriptor to indicate that those
registers no longer contain y and/or z.
The table below (statements, code generated, register descriptor, and address
descriptor) traces the algorithm; initially all registers are empty.
For pointer statements, the generated code and its cost are:
Statement    Code generated    Cost
a := *p      MOV *Rp, a        2
*p := a      MOV a, *Rp        2
7. PEEPHOLE OPTIMIZATION
Explain peephole optimization in detail.
Write an algorithm for constructing natural loop of a back edge. Nov : 16
PEEPHOLE OPTIMIZATION
Many simple transformations can significantly improve the running time or
space requirement of the target program.
A simple but effective technique for locally improving the target code is
peephole optimization, which is done by examining a sliding window of target
instructions (called the peephole) and replacing instruction sequences within the
peephole by a shorter or faster sequence, whenever possible. Peephole optimization
can also be applied directly after intermediate code generation to improve the
intermediate representation. The peephole is a small, sliding window on a program.
The characteristics of Peephole Optimization:
• Redundant-instruction elimination
• Flow-of-control optimizations
• Algebraic simplifications
• Use of machine idioms
128
Flow-of-Control Optimizations
Simple intermediate code-generation algorithms frequently produce jumps to jumps,
jumps to conditional jumps, or conditional jumps to jumps. These unnecessary jumps
can be eliminated in either the intermediate code or the target code by the following
types of peephole optimizations. We can replace the sequence
goto L1
L1: goto L2
by the sequence
goto L2
L1: goto L2
If there are now no jumps to L1, then it may be possible to eliminate the statement
L1: goto L2 provided it is preceded by an unconditional jump. Similarly, the sequence
if a < b goto L1
L1: goto L2
can be replaced by the sequence
if a < b goto L2
L1: goto L2
Finally, suppose there is only one jump to L1 and L1 is preceded by an unconditional
goto. Then the sequence
goto L1
L1: if a < b goto L2
L3:
may be replaced by the sequence
if a < b goto L2
goto L3
L3:
While the number of instructions in the two sequences is the same, we sometimes
skip the unconditional jump in the second sequence, but never in the first. Thus, the
second sequence is superior to the first in execution time.
Algebraic Simplification
Algebraic identities can be used by a peephole optimizer to eliminate
three-address statements such as
x=x+0
or
x=x*1
in the peephole. Similarly, reduction-in-strength transformations can be applied in the
peephole to replace expensive operations by equivalent cheaper ones on the target
machine. Certain machine instructions are considerably cheaper than others and can
often be used as special cases of more expensive operators. For example, x² is
invariably cheaper to implement as x * x than as a call to an exponentiation routine.
Fixed-point multiplication or division by a power of two is cheaper to implement as a
shift. Floating-point division by a constant can be approximated as multiplication by a
constant, which may be cheaper.
Use of Machine Idioms
The target machine may have hardware instructions to implement certain specific
operations efficiently. Detecting situations that permit the use of these instructions
can reduce execution time significantly. For example, some machines have auto-
increment and auto-decrement addressing modes. These add or subtract one from
an operand before or after using its value. The use of the modes greatly improves
the quality of code when pushing or popping a stack, as in parameter passing. These
modes can also be used in code for statements like x = x + 1.
MODEL QUESTION PAPERS
10. Write three address code sequence for the assignment statement. Pg. No : 104
11 . a) Describe the various phases of compiler and trace it with the program
segment (position =initial+rate * 60 ). (16) Pg.No : 11
OR
13. a) i) Consider Stack implementation of shift reduce parsing for the grammar:
E → E+E
E → E*E
E → (E)
E → id and the input string id1+id2*id3 (8) Pg.No :57
ii) Explain LL(1) grammar for the sentence S → iEtS | iEtSeS | a, E → b. (8)
Pg.No : 57
OR
b) i)Write an algorithm for Non recursive predictive parsing. (6)Pg.No : 54
ii) Explain context free grammars with examples. (10) Pg.No :44
14.a) i) Construct a syntax directed definition for constructing a syntax tree for
assignment statements. (8) Pg.No:89
S-> id:=E
E-> E1 +E2
E-> E1*E2
E->-E1
E-> (E1)
E->id
15. a) Explain principal sources of optimization with examples. (16) Pg.No : 104
Or
PART – B (5x16=80)
11.(a) (i) Explain the phases of a compiler with a neat diagram. (10) [Pg.No :11]
OR
(b) (i) Explain the need for grouping of phases. (8) [Pg.No :19]
12.(a) (i) Discuss the role of lexical analyzer in detail with necessary examples. (8)
[Pg.No :23]
(ii) Discuss how FA is used to represent tokens and perform lexical analysis
with examples. (8) [Pg.No :27]
OR
(b) (i) Convert the regular expression (a/b)*abb to an NFA. (8) [Pg.No :34]
(ii) Write an algorithm for minimizing the number of states of a DFA. (8)
[Pg.No :35]
13.(a) (i) Construct parse tree for the input w= cad using top down parser.(6)
14. (a) (i) Construct a syntax directed definition scheme that takes strings of a's, b's
and c's as input and produces as output the number of substrings in the input string
that correspond to the pattern a(a/b)*c+(a/b)*b. For example, the translation of the
input string "abbcabcababc" is "3".
4. Write a context free grammar that generates all strings of a's, b's and c's.
5. Give the semantic attributes for the grammar symbols.
For each production of the grammar, give a set of rules for evaluating the semantic
attributes. (8) [Pg.No :94]
(ii) Illustrate type checking with necessary diagram . (8) [Pg.No :83]
OR
(b) Explain the following with respect to the code generation phase. (16)
I. Input to code generator. [Pg.No :123]
II. Target program.
III. Memory management.
IV. Instruction Selection.
V. Register allocation.
VI. Evaluation order.
15. (a) (i) Write an algorithm for constructing natural loop of a back edge. (8)
[Pg.No :128]
(ii) Explain any four issues that crop up when designing a code generator. (8)
[Pg.No :123]
OR
(b) Explain global data flow analysis with necessary equations. (16)
[Pg.No :116]
OR
(b) (i) Explain the code-generation algorithm in detail.
(ii) Construct the DAG for the following basic block:
d := b * c
e := a + b
b := b * c
a := e - d
15. (a) (i) Explain the principal sources of optimization in detail.
(ii)Discuss the various peephole optimization techniques in detail.
OR
(b) (i) How is the data-flow analysis of a structured program traced?
(ii) Explain common sub-expression elimination, copy propagation, and the
transformation for moving loop-invariant computations in detail.
----------------------------------------