UNIT 3 - Chapter 2 in Compiler Design

Download as pdf or txt
Download as pdf or txt
You are on page 1of 38

Unit 3 – Chapter 2

Intermediate code Generation


 Intermediate code is used to translate the source code into the machine code.
 Intermediate code lies between the high-level language and the machine language.

o If the compiler directly translates source code into the machine code without generating
intermediate code then a full native compiler is required for each new machine.
o The intermediate code keeps the analysis portions same for all the compilers thats
why it doesn't need a full compiler for every unique machine.
o Intermediate code generator receives input from its predecessor phase and semantic
analyzer phase. It takes input in the form of an annotated syntax tree.
o Using the intermediate code, the second phase of the compiler synthesis phase is
changed according to the target machine.
o Intermediate code can be either language specific (e.g., Byte Code for Java)
or language independent (three-address code).

o There are three 3 kinds of intermediate representations :


1. Three address code
2. Postfix notation
3. Syntax trees
Three address code

Three address code is a sequence of statements of the general form

x =y op z where x, y, and z are names, constants, or compiler-generated


temporaries;
 op stands for any operator such as a fixed- or floating-point
arithmetic operator or a logical operator on Boolean valued data.
Note that no built-up arithmetic expressions are permitted, as there is only
one operator on the right side of a statement. Thus a source language
expression like x+ y * z might be translated intoa sequence
t1= y * z
t2=x + t1 where t1, and t2 are compiler-generated temporary names.

Types of Three Address Statements:

Statements can have symbolic labels and there are statements for flow of control. A symbolic label
represents the index of three-address statement in the array holding intermediate code.
Here are the common three address statements used :
1. Assignment statements :
x := y op z, where op is a binary arithmetic or logical operation.
2. Assignment instructions :
x : = op y, where op is a unary operation . Essential unary operations include unary minus,
logical negation, shift operators, and conversion operators that for example, convert a fixed-point
number to a floating-point number.
3. Copy statements :
x : = y where the value of y is assigned to x.
4. Unconditional jump :
goto L The three-address statement with label L is the next to be executed
5. Conditional jump :
if x relop y goto L This instruction applies a relational operator ( <, =, =, etc,) to x and y,
and executes the statement with label L next if x stands in relation relop to y. If not, the three-
address statement following if x relop y goto L is executed next, as in the usual sequence.
6. Procedural call and return :
param x and call p, n for procedure calls and return y, where y representing a returned
value is optional. Their typical use is as the sequence of three-address statements
param x1
param x2
……….
param xn
call p,n
generated as part of the call procedure p( xl , x2, . . . , xn ) . The integer n indicating the number of
actual-parameters in ''call p , n" is not redundant because calls can be nested.
7. Indexed Assignments :
Indexed assignments of the form x = y[i] or x[i] = y
8. Address and pointer assignments

Address and pointer operator of the form x := &y, x := *y and *x := y


Implementation of Three Address Statements:
A three-address statement is an abstract form of intermediate code. In a
compiler, these statements can be implemented as records with fields for the
operator and the operands.

o Three address code is a type of intermediate code which is easy to generate and can
be easily converted to machine code. It makes use of at most three addresses and one
operator to represent an expression and the value computed at each instruction is
stored in temporary variable generated by compiler. The compiler decides the order of
operation given by three address code.
o General representation –
x = y op z
Where x, y and z represents operands like names, constants or compiler generated
temporaries and op represents the operator
o Implementation of Three Address Code –There are 3 representations of three address
code namely:
o Quadruple
o Triples
o Indirect Triples
o 1. Quadruple –It is structure with consist of 4 fields namely op, arg1, arg2 and result. op
denotes the operator and arg1 and arg2 denotes the two operands and result is used to
store the result of the expression.
o Advantage –
o Easy to rearrange code for global optimization.
o One can quickly access value of temporary variables using symbol table.
o Disadvantage –
o Contain lot of temporaries.
o Temporary variable creation increases time and space complexity.
o Example – Consider expression a = (b * – c) + (b * – c).
o The three address code is:
t1 = uminus c
t2 = t1 * b
t3 = uminus c
t4 = t3 * b
t5 = t2 + t4
a = t5
2. Triples –This representation doesn’t make use of extra temporary variable to represent a single

operation instead when a reference to another triple’s value is needed, a pointer to that triple is used.

So, it consist of only three fields namely op, arg1 and arg2.
Disadvantage –
o Temporaries are implicit and difficult to rearrange code.
o It is difficult to optimize because optimization involves moving intermediate code. When a triple is

moved, any other triple referring to it must be updated also. With help of pointer one can directly

access symbol table entry.


Triples for indexed assignment statements

o Example – Consider expression a = (b * – c) + (b * – c).


3. Indirect Triples –
o This representation makes use of pointer to the listing of all references to computations
which is made separately and stored. Its similar in utility as compared to quadruple
representation but requires less space than it. Temporaries are implicit and easier to
rearrange code.
o Example – Consider expression a = (b * – c) + (b * – c).
Quadruples

Triples
o Example : Translate the following expression to quadruple, triple and indirect triple-

a + b x c / e ↑ f + b x a (or) a + b * c / e ↑ f + b * a

o Solution-
o Three Address Code for the given expression is-
T1 = e ↑ f
T2 = b x c
T3 = T2 / T1
T4 = b x a
T5 = a + T3
T6 = T5 + T4
o Now, we write the required representations-
o Question – Write quadruple, triples and indirect triples for following expression :
(x + y) * (y + z) + (x + y + z)

o Explanation – The three address code is:


Postfix notation

o Postfix notation is the useful form of intermediate code if the given language isexpressions.
o Postfix notation is also called as 'suffix notation' and 'reverse polish'.
o Postfix notation is a linear representation of a syntax tree.
o In the postfix notation, any expression can be written unambiguously without parentheses.

o In postfix notation, the operator follows the operand.


o The ordinary (infix) way of writing the sum of a and b is with operator in the middle : a + b. The
postfix notation for the same expression places the operator at the right end as ab +.

o In general, if e1 and e2 are any postfix expressions, and + is any binary operator, the postfix
notation is: e1e2 +

o No parentheses are needed in postfix notation because the position and number of arguments of
the operators permit only one way to decode a postfix expression.
o The postfix representation of the expressions:
1. (a+b)*c → ab+c*
2. a*(b+c) → abc+*
3. (a+b)*(c+d) → ab+cd+*
4. (a – b) * (c + d) + (a – b) → ab - cd + *ab - +

Syntax Directed Translation for Postfix:


E.Code is a string-valued translation. The value of the translation E.code for the first production is
the concatenation of E(1).code , E(2).code and the symbol op.
Processing the input a+b*c , a syntax-directed translator based on an LR parser, has
the following sequence of moves and generates the postfix abc*+.
1. Shift a
2. Reduce by E→ id and print a
3. Shift +
4. Shift b
5. Reduce by E→ id and print b
6. Shift *
7. Shift c
8. Reduce by E→ id and print c
9. Reduce by E→ E op E and print *
10. Reduce by E→ E op E and print +
Syntax trees

o A syntax tree is a tree in which each leaf node represents an operand, while each
inside node represents an operator. The Parse Tree is abbreviated as the syntax
tree. The syntax tree is usually used when representing a program in a tree
structure.
o When you create a parse tree then it contains more details than actually needed.
So, itis very difficult to compiler to parse the parse tree.

o In the parse tree, most of the leaf nodes are single child to their parent nodes.
o In the syntax tree, we can eliminate this extra information.
o Syntax tree is a variant of parse tree. In the syntax tree, interior nodes are operators
andleaves are operands.
o Syntax tree is usually used when represent a program in a tree structure.
o A sentence id + id * id would have the following syntax tree:

Constructing Syntax Tree For Expression


Each node in a syntax tree can be implemented in a record with several fields.
In the node of an operator, one field contains operator and remaining field contains pointer to the
nodes for the operands.
When used for translation, the nodes in a syntax tree may contain addition of fields to hold the values of
attributes attached to the node.
Following functions are used to create syntax tree :
1. mknode(op,left,right): creates an operator node with label op and two fields containing pointers to
left and right.

2. mkleaf(id,entry): creates an identifier node with label id and a field containing entry, a pointer to the
symbol table entry for identifier

3. mkleaf(num,val): creates a number node with label num and a field containing val, the value of the
number.
Such functions return a pointer to a newly created node.
Syntax directed definition

 Syntax trees for assignment statements are produced by the syntax-directed definition.
 Non terminal S generates an assignment statement.
 The two binary operators + and * are examples of the full operator set in a typical
language. Operator associates and precedences are the usual ones, even though they
have not been put into the grammar. This definition constructs the tree from the input
a:=b* -c + b* -c

 The token id has an attribute place that points to the symbol-table entry for the
identifier.

 A symbol-table entry can be found from an attribute id.name, representing the lexeme
associated with that occurrence of id.
Difference between Syntax Tree and Parse Tree
Direct Acyclic Graph (DAG) :

 Graphical Intermediate Representation

 Dag also gives the hierarchical structure of source program but in a more compact way
because common sub expressions are identified.
SDT Schemes
o Syntax-directed translation schemes are a complementary notation to syntax-directed
definitions. All of the applications of syntax-directed definitions can be implemented
using syntax-directed translation schemes.
o A syntax-directed translation scheme (SDT) is a context-free grammar with program
fragments embedded within production bodies. The program fragments are
called semantic actions and can appear at any position within a production body. By
convention, we place curly braces around actions; if braces are needed as grammar
symbols, then we quote them.
SDT scheme for Assignment Statements
o In the syntax directed translation, assignment statement is mainly deals with
expressions.
o The expression can be of type real, integer, array and records.
o Consider the grammar :

o Translation scheme to produce three address code for assignments.


o From the above:
o Names in an assignment generated by S must have ken declared in either the
procedure that S appears in, or in some enclosing procedure.
o When applied to name, the modified Iookup operation first checks if name appears in
the current symbol table, accessible through top of the stack.
o If not, lookup uses the pointer in the header of a table to find the symbol table for
theenclosing procedure and ids for the name there.
o lf the name cannot be found in any of these scopes, then look returns nil.
o The p returns the entry for id.name in the symbol table.
o The emit function is used for appending the three-address code to the output file. (The
emit function is used to generate the three-address code) Otherwise it will report an
error.
o The newtemp() is a function used to generate new temporary variables.
o E.place holds the value of E.
o Example: Consider the assignment:

X: =a*b + c*d - e*f


o Figure shows the sequence of three-address statements that would be generated by
semantic rules in Fig.if newtemp were modified.
o The figure also contains an indication of the "current" value of c after the generation of
each statement. Note that when we compute $0 - $1, c is decremented to zero, so $0 is
again available to hold the result.
o Let us assume for simplicity that we are dealing only with integers.
o Keep account c, initialized to zero.
o Whenever a temporary name is used as an operand, decrement c by 1.
o Whenever a new temporary name is generated, use $c and increase c by 1.
o Note that the "stack" of temporaries is not pushed or popped at run time, although it
happens that stores and loads of temporary values are made by the compiler to occur at
the "top."
SDT scheme for Switch and Case Statements
o The “switch” or “case” statement is available in a variety of languages.
o The switch-statement syntax is as shown below :

o There is a selector expression, which is to be evaluated, followed by n constant values


that the expression might take, including a default “value” which always matches the
expression if no other value does. The intended translation of a switch is code to:

1. Evaluate the expression.

2. Find which value in the list of cases is the same as the value of the expression.

3. Execute the statement associated with the value found.

o Syntax-Directed Translation scheme of Case Statements:

o This case statement is translated into intermediate code that has the following form:
To translate into above form:
 When keyword switch is seen, two new labels test and next, and a new temporary t
are generated.
 As expression E is parsed, the code to evaluate E into t is generated. After processing E,
the jump goto test is generated.
 As each case keyword occurs, a new label Li is created and entered into the symbol
table.
 A pointer to this symbol-table entry and the value Vi of case constant are placed on a
stack (used only to store cases).
 Each statement case Vi : Si is processed by emitting the newly created label Li, followed
by the code for Si , followed by the jump goto next.
 Then when the keyword end terminating the body of the switch is found, the code can
be generated for the n-way branch. Reading the pointer-value pairs on the case stack
from the bottom to the top, we can generate a sequence of three-address statements of
the form

 Where t is the name holding the value of the selector expression E, and Ln is the label
for the default statement.
SDT scheme for Boolean Expressions

o Boolean expressions have two primary purposes. They are used for computing the
logical values.
o They are also used as conditional expression using if-then-else or while-do.
o Consider the grammar:

o The relop is denoted by <, >, <, >.


o The AND and OR are left associated. NOT has the higher precedence then AND and lastly
OR.
o The EMIT function is used to generate the three address code and the newtemp( )
function is used to generate the temporary variables.
o The E → id relop id2 contains the next_state and it gives the index of next three address
statements in the output sequence.
o Here is the example which generates the three address code using the above translation
scheme:
Example 1:

Example 2:
Need of Backpatching
SDT scheme for Flow of control constructs

o We now consider the translation of Boolean expressions into three-address code in the
context of if-then, if-then-else, and while-do statements such as those generated by the
following grammar:

o In each of these productions, E is the Boolean expression to be translated. In the


translation, we assume that a three-address statement can be symbolically labeled, and
that the function new label returns a new symbolic label each time it is called.
o E.true is the label to which control flows if E is true, and E.false is the label to which
control flows if E is false.
o The semantic rules for translating a flow-of-control statement S allow control to flow
from the translation S.code to the three-address instruction immediately following
S.code. S.next is a label that is attached to the first three-address instruction to be
executed after the code for S.
o Code for if-then , if-then-else, and while-do statements :
o Syntax-directed definition for flow-of-control statements:
SDT scheme for Procedure calls
 Procedure is an important and frequently used programming construct for a compiler.
 It is used to generate good code for procedure calls and returns.
 Calling sequence: The translation for a call includes a sequence of actions taken on
entryand exit from each procedure. Following actions take place in a calling sequence:
o When a procedure call occurs then space is allocated for activation record.
o Evaluate the argument of the called procedure.
o Establish the environment pointers to enable the called procedure to access data in
enclosing blocks.
o Save the state of the calling procedure so that it can resume execution after the call.
o Also save the return address. It is the address of the location to which the called routine
must transfer after it is finished.
o Finally generate a jump to the beginning of the code for the called procedure.
o Let us consider a grammar for a simple procedure call statement

o A suitable transition scheme for procedure call would be:

o Queue is used to store the list of parameters in the procedure call.


Detailed Explanation:

Let us consider a grammar for a simple procedure call statement

(1) S->call id(Elist)


(2) Elist->Elist , E
(3) Elist-> E

Calling Sequences:

The translation for a call includes a calling sequence, a sequence of actions taken on entry
to and exit from each procedure. The falling are the actions that take place in a calling sequence :
* When a procedure call occurs, space must be allocated for the activation record of the
called procedure.
* The arguments of the called procedure must be evaluated and made available to the called
procedure in a known place.
* Environment pointers must be established to enable the called procedure to access data in
enclosing blocks.
* The state of the calling procedure must be saved so it can resume execution after the call.

* Also saved in a known place is the return address, the location to which the called routine
must transfer after it is finished.
* Finally a jump to the beginning of the code for the called procedure must be generated. For
example, consider the following syntax-directed translation :
(1) S-> call id ( Elist )
{ for each item p on queue do emit (‘ param’ p );
emit (‘call’ id.place) }
Here, the code for S is the code for Elist, which evaluates the arguments, followed
by a param p statement for each argument, followed by a call statement.
(2) Elist->Elist , E
{ appendE.place to the end of queue }
(3) Elist-> E
{ initialize queue to contain only E.place }
Here, queue is emptied and then gets a single pointer to the symbol table location for the
name that denotes the value of E.

You might also like