UNIT 3 - Chapter 2 in Compiler Design
UNIT 3 - Chapter 2 in Compiler Design
UNIT 3 - Chapter 2 in Compiler Design
o If the compiler directly translates source code into the machine code without generating
intermediate code then a full native compiler is required for each new machine.
o The intermediate code keeps the analysis portions same for all the compilers thats
why it doesn't need a full compiler for every unique machine.
o Intermediate code generator receives input from its predecessor phase and semantic
analyzer phase. It takes input in the form of an annotated syntax tree.
o Using the intermediate code, the second phase of the compiler synthesis phase is
changed according to the target machine.
o Intermediate code can be either language specific (e.g., Byte Code for Java)
or language independent (three-address code).
Statements can have symbolic labels and there are statements for flow of control. A symbolic label
represents the index of three-address statement in the array holding intermediate code.
Here are the common three address statements used :
1. Assignment statements :
x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions :
x : = op y, where op is a unary operation . Essential unary operations include unary minus,
logical negation, shift operators, and conversion operators that for example, convert a fixed-point
number to a floating-point number.
3. Copy statements :
x : = y where the value of y is assigned to x.
4. Unconditional jump :
goto L The three-address statement with label L is the next to be executed
5. Conditional jump :
if x relop y goto L This instruction applies a relational operator ( <, =, =, etc,) to x and y,
and executes the statement with label L next if x stands in relation relop to y. If not, the three-
address statement following if x relop y goto L is executed next, as in the usual sequence.
6. Procedural call and return :
param x and call p, n for procedure calls and return y, where y representing a returned
value is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
……….
param xn
call p,n
generated as part of the call procedure p( xl , x2, . . . , xn ) . The integer n indicating the number of
actual-parameters in ''call p , n" is not redundant because calls can be nested.
7. Indexed Assignments :
Indexed assignments of the form x = y[i] or x[i] = y
8. Address and pointer assignments
o Three address code is a type of intermediate code which is easy to generate and can
be easily converted to machine code. It makes use of at most three addresses and one
operator to represent an expression and the value computed at each instruction is
stored in temporary variable generated by compiler. The compiler decides the order of
operation given by three address code.
o General representation –
x = y op z
Where x, y and z represents operands like names, constants or compiler generated
temporaries and op represents the operator
o Implementation of Three Address Code –There are 3 representations of three address
code namely:
o Quadruple
o Triples
o Indirect Triples
o 1. Quadruple –It is structure with consist of 4 fields namely op, arg1, arg2 and result. op
denotes the operator and arg1 and arg2 denotes the two operands and result is used to
store the result of the expression.
o Advantage –
o Easy to rearrange code for global optimization.
o One can quickly access value of temporary variables using symbol table.
o Disadvantage –
o Contain lot of temporaries.
o Temporary variable creation increases time and space complexity.
o Example – Consider expression a = (b * – c) + (b * – c).
o The three address code is:
t1 = uminus c
t2 = t1 * b
t3 = uminus c
t4 = t3 * b
t5 = t2 + t4
a = t5
2. Triples –This representation doesn’t make use of extra temporary variable to represent a single
operation instead when a reference to another triple’s value is needed, a pointer to that triple is used.
So, it consist of only three fields namely op, arg1 and arg2.
Disadvantage –
o Temporaries are implicit and difficult to rearrange code.
o It is difficult to optimize because optimization involves moving intermediate code. When a triple is
moved, any other triple referring to it must be updated also. With help of pointer one can directly
Triples
o Example : Translate the following expression to quadruple, triple and indirect triple-
a + b x c / e ↑ f + b x a (or) a + b * c / e ↑ f + b * a
o Solution-
o Three Address Code for the given expression is-
T1 = e ↑ f
T2 = b x c
T3 = T2 / T1
T4 = b x a
T5 = a + T3
T6 = T5 + T4
o Now, we write the required representations-
o Question – Write quadruple, triples and indirect triples for following expression :
(x + y) * (y + z) + (x + y + z)
o Postfix notation is the useful form of intermediate code if the given language isexpressions.
o Postfix notation is also called as 'suffix notation' and 'reverse polish'.
o Postfix notation is a linear representation of a syntax tree.
o In the postfix notation, any expression can be written unambiguously without parentheses.
o In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the postfix
notation is: e1e2 +
o No parentheses are needed in postfix notation because the position and number of arguments of
the operators permit only one way to decode a postfix expression.
o The postfix representation of the expressions:
1. (a+b)*c → ab+c*
2. a*(b+c) → abc+*
3. (a+b)*(c+d) → ab+cd+*
4. (a – b) * (c + d) + (a – b) → ab - cd + *ab - +
o A syntax tree is a tree in which each leaf node represents an operand, while each
inside node represents an operator. The Parse Tree is abbreviated as the syntax
tree. The syntax tree is usually used when representing a program in a tree
structure.
o When you create a parse tree then it contains more details than actually needed.
So, itis very difficult to compiler to parse the parse tree.
o In the parse tree, most of the leaf nodes are single child to their parent nodes.
o In the syntax tree, we can eliminate this extra information.
o Syntax tree is a variant of parse tree. In the syntax tree, interior nodes are operators
andleaves are operands.
o Syntax tree is usually used when represent a program in a tree structure.
o A sentence id + id * id would have the following syntax tree:
2. mkleaf(id,entry): creates an identifier node with label id and a field containing entry, a pointer to the
symbol table entry for identifier
3. mkleaf(num,val): creates a number node with label num and a field containing val, the value of the
number.
Such functions return a pointer to a newly created node.
Syntax directed definition
Syntax trees for assignment statements are produced by the syntax-directed definition.
Non terminal S generates an assignment statement.
The two binary operators + and * are examples of the full operator set in a typical
language. Operator associates and precedences are the usual ones, even though they
have not been put into the grammar. This definition constructs the tree from the input
a:=b* -c + b* -c
The token id has an attribute place that points to the symbol-table entry for the
identifier.
A symbol-table entry can be found from an attribute id.name, representing the lexeme
associated with that occurrence of id.
Difference between Syntax Tree and Parse Tree
Direct Acyclic Graph (DAG) :
Dag also gives the hierarchical structure of source program but in a more compact way
because common sub expressions are identified.
SDT Schemes
o Syntax-directed translation schemes are a complementary notation to syntax-directed
definitions. All of the applications of syntax-directed definitions can be implemented
using syntax-directed translation schemes.
o A syntax-directed translation scheme (SDT) is a context-free grammar with program
fragments embedded within production bodies. The program fragments are
called semantic actions and can appear at any position within a production body. By
convention, we place curly braces around actions; if braces are needed as grammar
symbols, then we quote them.
SDT scheme for Assignment Statements
o In the syntax directed translation, assignment statement is mainly deals with
expressions.
o The expression can be of type real, integer, array and records.
o Consider the grammar :
2. Find which value in the list of cases is the same as the value of the expression.
o This case statement is translated into intermediate code that has the following form:
To translate into above form:
When keyword switch is seen, two new labels test and next, and a new temporary t
are generated.
As expression E is parsed, the code to evaluate E into t is generated. After processing E,
the jump goto test is generated.
As each case keyword occurs, a new label Li is created and entered into the symbol
table.
A pointer to this symbol-table entry and the value Vi of case constant are placed on a
stack (used only to store cases).
Each statement case Vi : Si is processed by emitting the newly created label Li, followed
by the code for Si , followed by the jump goto next.
Then when the keyword end terminating the body of the switch is found, the code can
be generated for the n-way branch. Reading the pointer-value pairs on the case stack
from the bottom to the top, we can generate a sequence of three-address statements of
the form
Where t is the name holding the value of the selector expression E, and Ln is the label
for the default statement.
SDT scheme for Boolean Expressions
o Boolean expressions have two primary purposes. They are used for computing the
logical values.
o They are also used as conditional expression using if-then-else or while-do.
o Consider the grammar:
Example 2:
Need of Backpatching
SDT scheme for Flow of control constructs
o We now consider the translation of Boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as those generated by the
following grammar:
Calling Sequences:
The translation for a call includes a calling sequence, a sequence of actions taken on entry
to and exit from each procedure. The falling are the actions that take place in a calling sequence :
* When a procedure call occurs, space must be allocated for the activation record of the
called procedure.
* The arguments of the called procedure must be evaluated and made available to the called
procedure in a known place.
* Environment pointers must be established to enable the called procedure to access data in
enclosing blocks.
* The state of the calling procedure must be saved so it can resume execution after the call.
* Also saved in a known place is the return address, the location to which the called routine
must transfer after it is finished.
* Finally a jump to the beginning of the code for the called procedure must be generated. For
example, consider the following syntax-directed translation :
(1) S-> call id ( Elist )
{ for each item p on queue do emit (‘ param’ p );
emit (‘call’ id.place) }
Here, the code for S is the code for Elist, which evaluates the arguments, followed
by a param p statement for each argument, followed by a call statement.
(2) Elist->Elist , E
{ appendE.place to the end of queue }
(3) Elist-> E
{ initialize queue to contain only E.place }
Here, queue is emptied and then gets a single pointer to the symbol table location for the
name that denotes the value of E.