CD Unit-3 (2) (R20)


COMPILER DESIGN

UNIT III
Syntax Directed Translation

Syntax-Directed Definitions, Evaluation Orders for SDD’s, Applications of Syntax


Directed Translation, Syntax-Directed Translation Schemes, Implementing L-
Attributed SDD’s.
Intermediate Code Generation: Variants of Syntax Trees, Three Address Code,
Types and Declarations, Translation of Expressions, Type Checking, Control Flow,
Backpatching, Intermediate Code for Procedures.

Introduction
The parser uses a CFG (context-free grammar) to validate the input string and produce output for the next phase of the compiler. The output can be either a parse tree or an abstract syntax tree. To interleave semantic analysis with the syntax analysis phase of the compiler, we use Syntax
Directed Translation.

Conceptually, with both syntax-directed definition and translation schemes, we parse the input
token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic
rules at the parse tree nodes. Evaluation of the semantic rules may generate code, save
information in a symbol table, issue error messages, or perform any other activities. The
translation of the token stream is the result obtained by evaluating the semantic rules.
Definition
Syntax Directed Translation augments the grammar with rules that facilitate semantic analysis.
SDT passes information up and/or down the parse tree in the form of attributes attached to the
nodes. Syntax-directed translation rules use
1) lexical values of nodes,
2) constants, and
3) attributes associated with the non-terminals in their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or syntax tree and
compute the values of attributes at the nodes of the tree by visiting them in some order. In many
cases, translation can be done during parsing without building an explicit tree.
Example

E -> E+T | T
T -> T*F | F
F -> INTLIT
This grammar syntactically validates expressions containing additions and multiplications. To
carry out semantic analysis, we augment this grammar with SDT rules that pass information up the
parse tree and check for semantic errors, if any. In this example, we

1 Dr.SGIET ,Markapur .CSE Department


will focus on the evaluation of the given expression, as we don’t have any semantic assertions to
check in this very basic example.

E -> E+T { E.val = E.val + T.val } PR#1


E -> T { E.val = T.val } PR#2
T -> T*F { T.val = T.val * F.val } PR#3
T -> F { T.val = F.val } PR#4
F -> INTLIT { F.val = INTLIT.lexval } PR#5

To understand the translation rules further, take the first SDT rule, attached to the production
[ E -> E+T ]. This translation rule uses val as an attribute for both non-terminals, E and T. The
right-hand side of a translation rule refers to the attribute values of the symbols appearing in the
corresponding production. Generalizing, an SDT is a CFG augmented with 1) a set of attributes
for the grammar symbols and 2) a set of translation rules, one per production, written using
attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens – S = 2+3*4. Parse tree corresponding to
S would be

To evaluate the translation rules, we can employ a single depth-first traversal of the parse tree.
This is possible because, for a grammar with only synthesized attributes, the SDT rules impose no
ordering constraint other than that children's attributes be computed before their parents'.
Otherwise, we would have to work out a suitable plan to traverse the parse tree and evaluate all
the attributes in one or more traversals. For clarity, we will move bottom-up, left to right, when
computing the translation rules of our example.
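The bottom-up evaluation just described can be sketched in code. The following Python sketch is illustrative only (the Node class and its field names are assumptions, not part of the text): it builds the parse tree for 2+3*4 by hand and computes the synthesized attribute val in postorder.

```python
# Sketch: postorder evaluation of the synthesized attribute 'val'
# for the grammar E -> E+T | T, T -> T*F | F, F -> INTLIT.
# The Node class and tree construction are illustrative assumptions.

class Node:
    def __init__(self, label, children=None, lexval=None):
        self.label = label              # grammar symbol: 'E', 'T', or 'F'
        self.children = children or []
        self.lexval = lexval            # lexical value for INTLIT leaves
        self.val = None                 # synthesized attribute

def evaluate(node):
    """Postorder traversal: children first, then the parent's rule."""
    for child in node.children:
        evaluate(child)
    if node.label == 'F':                       # F -> INTLIT
        node.val = node.lexval
    elif len(node.children) == 1:               # E -> T  or  T -> F
        node.val = node.children[0].val
    elif node.label == 'T':                     # T -> T * F
        node.val = node.children[0].val * node.children[1].val
    else:                                       # E -> E + T
        node.val = node.children[0].val + node.children[1].val
    return node.val

# Parse tree for 2 + 3 * 4, built by hand for illustration
tree = Node('E', [
    Node('E', [Node('T', [Node('F', lexval=2)])]),
    Node('T', [Node('T', [Node('F', lexval=3)]),
               Node('F', lexval=4)]),
])
print(evaluate(tree))   # 14
```

The same single traversal works for any grammar whose attributes are all synthesized.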


The above diagram shows how semantic analysis could happen. The flow of information happens
bottom-up and all the children’s attributes are computed before parents, as discussed above.
Right-hand side nodes are sometimes annotated with subscript 1 to distinguish between children
and parents.
Additional Information
Synthesized Attributes are such attributes that depend only on the attribute values of children
nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val at node
E. If all the attributes in an augmented grammar are synthesized, a single depth-first (postorder)
traversal is sufficient for the semantic analysis phase.
Inherited Attributes are attributes that depend on the parent's and/or siblings' attributes.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep are the same
grammar symbol, annotated to distinguish parent from child, has an inherited attribute val
at node T.

Syntax Directed Definition (SDD)


A Syntax-Directed Definition is a kind of abstract specification. It is a generalization of a
context-free grammar in which each grammar production X --> a is associated with a set of
semantic rules of the form s = f(b1, b2, …, bk), where s is an attribute computed by the function f.
An attribute can be a string, a number, a type, or a memory location. Semantic rules are fragments
of code, usually embedded at the end of the production and enclosed in curly braces ({ }).

Example:
E --> E1 + T { E.val = E1.val + T.val}
Annotated Parse Tree – The parse tree containing the values of attributes at each node for given
input string is called annotated or decorated parse tree.
Features –

 High level specification
 Hides implementation details
 Explicit order of evaluation is not specified
Types of attributes – There are two types of attributes:
1. Synthesized Attributes – These are those attributes which derive their values from their
children nodes i.e. value of synthesized attribute at node is computed from the values of attributes
at children nodes in parse tree.
Example:
E --> E1 + T { E.val = E1.val + T.val}
In this, E.val derives its value from E1.val and T.val.
Computation of Synthesized Attributes –
 Write the SDD using appropriate semantic rules for each production in given grammar.
 The annotated parse tree is generated and attribute values are computed in bottom up manner.
 The value obtained at root node is the final output.
Example: Consider the following grammar
S --> E
E --> E1 + T
E --> T
T --> T1 * F
T --> F
F --> digit
The SDD for the above grammar can be written as follows:

S --> E        { print(E.val) }
E --> E1 + T   { E.val = E1.val + T.val }
E --> T        { E.val = T.val }
T --> T1 * F   { T.val = T1.val * F.val }
T --> F        { T.val = F.val }
F --> digit    { F.val = digit.lexval }

Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is

For computation of attributes we start from the leftmost bottom node. The rule F --> digit is used
to reduce digit to F, and the value of digit is obtained from the lexical analyzer; it becomes the
value of F via the semantic action F.val = digit.lexval. Hence F.val = 4, and since T is the parent
node of F, we get T.val = 4 from the semantic action T.val = F.val. Then, for the production
T --> T1 * F, the corresponding semantic action is T.val = T1.val * F.val, giving T.val = 4 * 5 = 20.
Similarly, E.val = E1.val + T.val = 20 + 6 = 26. Then the production S --> E is applied, and the
semantic action associated with it prints the result E.val. Hence, the output will be 26.
2. Inherited Attributes – These are the attributes that derive their values from their parent or
sibling nodes, i.e. the value of an inherited attribute is computed from the values of attributes at
the parent or sibling nodes.
Example:
A --> BCD { C.in = A.in, C.type = B.type }
Computation of Inherited Attributes –
 Construct the SDD using semantic actions.
 The annotated parse tree is generated and attribute values are computed in top down manner.
Example: Consider the following grammar
S --> T L
T --> int
T --> float
T --> double
L --> L1, id
L --> id
The SDD for the above grammar can be written as follows:

S --> T L       { L.inh = T.type }
T --> int       { T.type = int }
T --> float     { T.type = float }
T --> double    { T.type = double }
L --> L1, id    { L1.inh = L.inh; Enter_type(id.entry, L.inh) }
L --> id        { Enter_type(id.entry, L.inh) }

Let us assume an input string int a, c for computing inherited attributes. The annotated parse tree
for the input string is


The inherited attribute of each L node is obtained from T.type (a sibling), which is the type
keyword int, float, or double. The L nodes then give the type of the identifiers a and c. The
computation of types proceeds top-down, i.e. in a preorder traversal. Using the function
Enter_type, the type of identifiers a and c is inserted into the symbol table at the corresponding
id.entry.
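The top-down flow of the inherited type can be sketched as follows. The declaration "int a, c" is hard-coded, and enter_type plus the dict-based symbol table are illustrative assumptions standing in for Enter_type and id.entry:

```python
# Sketch: top-down propagation of an inherited attribute for
#   S -> T L     { L.inh = T.type }
#   L -> L1, id  { L1.inh = L.inh; Enter_type(id.entry, L.inh) }
#   L -> id      { Enter_type(id.entry, L.inh) }
# enter_type and the symbol-table dict are illustrative assumptions.

symbol_table = {}

def enter_type(ident, typ):
    symbol_table[ident] = typ          # fill id.entry with the type

def eval_L(ids, inh):
    """Preorder pass: the inherited attribute inh flows from the
    parent (and ultimately from the sibling T.type) into every id."""
    for ident in ids:
        enter_type(ident, inh)

t_type = 'int'                         # T.type, synthesized from the keyword
eval_L(['a', 'c'], t_type)             # L.inh = T.type
print(symbol_table)                    # {'a': 'int', 'c': 'int'}
```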

Evaluation Orders for SDD’s

The evaluation order for an SDD (Syntax Directed Definition) describes how the SDD is evaluated
using attributes, dependency graphs, semantic rules, and S- and L-attributed definitions. Since
SDDs drive semantic analysis in the compiler, it is important to know how they are evaluated and
in what order. This section assumes basic knowledge of grammars, productions, parse trees,
annotated parse trees, and synthesized and inherited attributes.

Dependency Graphs:

A dependency graph shows the order of evaluation of attributes through its edges. An edge from
the attribute of a first node to the attribute of a second node means that the first attribute must be
evaluated before the second, i.e. the second attribute depends on the first. The edges encode the
semantic rules of the corresponding productions.
Dependency Graph Rules: For each attribute at each parse-tree node there is a node in the
dependency graph. An edge from a first node to a second node represents that the attribute of the
first node is evaluated before the attribute of the second node.

Ordering the Evaluation of Attributes:

The dependency graph determines the evaluation order of the attributes at the nodes of the parse
tree. An edge from a first node to a second node means that the attribute of the second node
depends on the attribute of the first node. Any evaluation order consistent with the edges gives a
linear order called a topological order.
If the graph contains a cycle, no topological order exists, and there is no way to evaluate the SDD
on that parse tree.
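A topological order can be computed with Kahn's algorithm, which also detects cycles. This sketch uses the edges of the dependency graph for the 1#2&3 example below, with the node numbering shown there:

```python
# Sketch: topological ordering of a dependency graph (Kahn's algorithm).
# The edges are those of the 1#2&3 dependency-graph example; the node
# numbers follow that example.
from collections import defaultdict, deque

edges = [(1, 4), (2, 5), (3, 6), (4, 7), (5, 8),
         (6, 10), (7, 8), (8, 10), (8, 9)]
nodes = set(n for e in edges for n in e)

indeg = defaultdict(int)
succ = defaultdict(list)
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1

order = []
ready = deque(sorted(n for n in nodes if indeg[n] == 0))
while ready:
    u = ready.popleft()
    order.append(u)                 # attribute u can be evaluated now
    for v in succ[u]:
        indeg[v] -= 1
        if indeg[v] == 0:
            ready.append(v)

if len(order) != len(nodes):
    raise ValueError("cycle: no topological order, SDD cannot be evaluated")
print(order)
```

Any order printed by this sketch respects every edge; evaluating the attributes in that order satisfies all the semantic rules.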

Production Table

S.No.   Production      Semantic Rules

1.      S ⇢ A & B       S.val = A.syn + B.syn

2.      A ⇢ A1 # B      A.syn = A1.syn * B.syn
                        A1.inh = A.syn

3.      A1 ⇢ B          A1.syn = B.syn

4.      B ⇢ digit       B.syn = digit.lexval

Annotated Parse Tree For 1#2&3


Dependency Graph For 1#2&3


Explanation of dependency graph:
Each node number in the graph gives the order of evaluation of the associated attribute. An edge
in the graph indicates that the second value depends on the first value.
Table-1 lists the attribute corresponding to each node.
Table-2 lists the semantic rule corresponding to each edge.

Table-1

Node    Attribute
1       digit.lexval
2       digit.lexval
3       digit.lexval
4       B.syn
5       B.syn
6       B.syn
7       A1.syn
8       A.syn
9       A1.inh
10      S.val

Table-2

Edge            Corresponding Semantic Rule
From    To      (from the production table)
1       4       B.syn = digit.lexval
2       5       B.syn = digit.lexval
3       6       B.syn = digit.lexval
4       7       A1.syn = B.syn
5       8       A.syn = A1.syn * B.syn
6       10      S.val = A.syn + B.syn
7       8       A.syn = A1.syn * B.syn
8       10      S.val = A.syn + B.syn
8       9       A1.inh = A.syn

S-Attributed Definitions:

S-attributed SDDs can have only synthesized attributes. In this type of definition, semantic rules
are placed only at the end of the production, and evaluation is based on bottom-up parsing.
Example: S ⇢ AB { S.x = f(A.x | B.x) }

L-Attributed Definitions:

L-attributed SDDs can have both synthesized and inherited attributes, with the restriction that
inherited attributes may be taken only from the parent or left siblings. In this type of definition,
semantic rules can be placed anywhere in the RHS of the production. Evaluation is based on an
in-order, left-to-right depth-first traversal.
Example: S ⇢ AB {A.x = S.x + 2} or S ⇢ AB { B.x = f(A.x | B.x) } or S ⇢ AB { S.x = f(A.x |
B.x) }
Note:
 Every S-attributed grammar is also L-attributed.
 For L-attributed definitions, an in-order evaluation of the annotated parse tree is used.
 For S-attributed definitions, the reverse of the rightmost derivation is used.

Semantic Rules with Controlled Side Effects


In practice, side effects occur: a desk calculator may print a result; a code generator may enter an
identifier's type into the symbol table. SDDs strike a balance between attribute grammars and
translation schemes. Attribute grammars have no side effects and allow any evaluation order
consistent with the dependency graph, while translation schemes impose left-to-right evaluation
and allow semantic actions to contain any program fragment.
The following methods are used to control side effects in SDDs:
 Permit incidental side effects that do not affect attribute evaluation; that is, allow side effects
when attribute evaluation based on any topological sort of the dependency graph produces
a "correct" translation, where "correct" is determined by the application.
 Constrain the allowable evaluation orders so that every permitted order produces the same
translation. The constraints can be viewed as implicit edges added to the dependency
graph.

Application of Syntax Directed Translation

We use SDT(Syntax Directed Translation) for


 Executing Arithmetic Expressions
 Conversion from infix to postfix expression
 Conversion from infix to prefix expression
 For Binary to decimal conversion
 Counting the number of Reductions
 Creating a Syntax tree
 Generating intermediate code
 Storing information into the symbol table
 Type checking
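As one illustration from the list above, infix-to-postfix conversion can be carried out by semantic actions during recursive-descent parsing, with no explicit tree. The token-list input and function names below are illustrative assumptions:

```python
# Sketch: SDT application, infix -> postfix during recursive descent.
# Actions emit an operand as soon as it is seen and an operator only
# after both of its operands have been emitted. Names are illustrative.

def to_postfix(tokens):
    out = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_E():                 # E -> T { '+' T {emit '+'} }
        nonlocal pos
        parse_T()
        while peek() == '+':
            pos += 1
            parse_T()
            out.append('+')        # action runs after both operands

    def parse_T():                 # T -> F { '*' F {emit '*'} }
        nonlocal pos
        parse_F()
        while peek() == '*':
            pos += 1
            parse_F()
            out.append('*')

    def parse_F():                 # F -> digit {emit digit}
        nonlocal pos
        out.append(tokens[pos])
        pos += 1

    parse_E()
    return ' '.join(out)

print(to_postfix(['2', '+', '3', '*', '4']))   # 2 3 4 * +
```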

Example

Here we cover an example application of SDT (Syntax Directed Translation) for a better
understanding of its uses. Consider an arithmetic expression, and observe how an SDT is
constructed for it.
For Example: 2*(4+5)

Parse Tree of SDT construction


Semantic actions are given as follows:
Production      Semantic Rules

E -> E1 + T     E.trans = E1.trans + T.trans

E -> T          E.trans = T.trans

T -> T1 * F     T.trans = T1.trans * F.trans

T -> F          T.trans = F.trans

F -> int        F.trans = int.value

F -> ( E )      F.trans = E.trans


Abstract Syntax Tree
An AST (Abstract Syntax Tree) is a condensed form of the parse tree. It has operators at interior
nodes, not at leaves, and syntactic details such as parentheses, commas, and semicolons are
omitted. An AST is a better structure for later compiler stages because it omits details specific to
the source language and contains only the essential structure of the program.
For Example: 2*(4+5)

Abstract Syntax Tree Presentation


Syntax-Directed Translation Schemes

An SDT scheme is a context-free grammar with program fragments embedded within production
bodies. The program fragments are called semantic actions and can appear at any position within
a production body.
Any SDT can be implemented by first building a parse tree and then performing the actions in a
left-to-right depth-first order, i.e. during a preorder traversal.
SDTs are used to implement two important classes of SDDs:
1. The grammar is LR-parsable and the SDD is S-attributed.
2. The grammar is LL-parsable and the SDD is L-attributed.

Postfix Translation Schemes

The postfix SDT below implements the desk-calculator SDD with one change: the action for the
first production prints the value. The grammar is LR, and the SDD is S-attributed.
L →E n {print(E.val);}
E → E1 + T { E.val = E1.val + T.val }
E → E1 - T { E.val = E1.val - T.val }
E → T { E.val = T.val }
T → T1 * F { T.val = T1.val * F.val }
T → F { T.val = F.val }
F → ( E ) { F.val = E.val }
F → digit { F.val = digit.lexval }
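The desk-calculator SDD above can be sketched as a stack evaluation in the LR style: each reduction pops the children's val attributes and pushes the parent's, and the action for L → E n prints E.val. Feeding the evaluator postfix tokens directly is a simplification; a real implementation would run these actions during LR reductions.

```python
# Sketch: the desk-calculator SDD as a value-stack evaluation.
# Each branch mirrors one production's semantic action; the postfix
# token-list input is an illustrative simplification.

def desk_calculator(postfix_tokens):
    stack = []
    for tok in postfix_tokens:
        if tok.isdigit():
            stack.append(int(tok))     # F -> digit  { F.val = digit.lexval }
        elif tok == '+':
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)        # E -> E1 + T { E.val = E1.val + T.val }
        elif tok == '-':
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)        # E -> E1 - T { E.val = E1.val - T.val }
        elif tok == '*':
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)        # T -> T1 * F { T.val = T1.val * F.val }
        elif tok == 'n':               # L -> E n    { print(E.val) }
            print(stack[-1])
    return stack.pop()

desk_calculator(['9', '5', '-', '2', '*', 'n'])   # prints 8
```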

Implementing L-Attributed SDD’s


L-attributed definitions form the L-attributed class of Syntax Directed Definitions (SDDs). The
idea behind this class is that dependency-graph edges between attributes associated with a
production body can go from left to right, but not from right to left; hence the name L-attributed.
L-attributed grammars are a subset of attribute grammars. They allow the attributes to be
evaluated in a single left-to-right, depth-first traversal of the parse tree. As a result, attribute
evaluation for L-attributed grammars can easily be incorporated into top-down parsing. One-pass
("narrow") compilers use some form of L-attributed grammar, and S-attributed grammars are a
strict subset of L-attributed grammars.
L-attributed definitions can be used for a variety of translation purposes. The following
techniques perform translation by traversing a parse tree:

1. Build the parse tree and annotate it. This works as long as no cycles are present
(guaranteed by an L- or S-attributed definition).
2. Build the parse tree, add actions, and execute the actions in preorder. This works for any
L-attributed definition; actions are added as extra leaves based on the semantic rules of
the SDD.
3. Translate during recursive-descent parsing.
4. Generate code on the fly, also using recursive descent.
5. Implement an SDT during LL parsing.
6. Implement an SDT during LR parsing of an LL language.
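Technique 3, translating during recursive-descent parsing, can be sketched for an L-attributed definition: inherited attributes become function parameters and synthesized attributes become return values. The grammar below (left-recursion-free multiplication with a T' tail) and all names are illustrative assumptions:

```python
# Sketch: L-attributed translation during recursive descent.
# Grammar (illustrative):
#   T  -> F T'        T'.inh = F.val
#   T' -> * F T1'     T1'.inh = T'.inh * F.val ; T'.syn = T1'.syn
#   T' -> eps         T'.syn = T'.inh
# Inherited attributes are passed down as parameters; synthesized
# attributes come back as return values.

def parse_T(tokens):
    val, rest = parse_F(tokens)
    return parse_Tprime(rest, val)        # pass the inherited attribute down

def parse_Tprime(tokens, inh):
    if tokens and tokens[0] == '*':
        fval, rest = parse_F(tokens[1:])
        return parse_Tprime(rest, inh * fval)
    return inh, tokens                    # T' -> eps : syn = inh

def parse_F(tokens):
    return int(tokens[0]), tokens[1:]     # F -> digit

val, leftover = parse_T(['4', '*', '5', '*', '2'])
print(val)   # 40
```

Note how the running product flows left to right through the parameter `inh`, exactly the restriction that makes the definition L-attributed.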

Intermediate Code Generation:

Intermediate code generation is a phase of the compiler. It gets its input from the semantic
analysis phase and serves its output to the code optimizer phase.

The intermediate code generator generates some intermediate representation. And from this
intermediate representation, the compiler generates the target code.

As we know the task of the compiler is to process the source program and translate it into a target
program. Well, in this process a compiler may generate one or more intermediate representations.
This representation can have various forms.
The intermediate representation must be:

1. Easy to produce.

2. Easy to translate into target code.

The intermediate representation can be of various forms. The two most important kinds of
intermediate representations are:

1) Tree i.e. “parse trees” and “syntax tree”.

2) A Linear representation i.e., “three address code”.

We discussed trees briefly under "Syntax Directed Translation" above. Here, we will focus on the
linear representation, i.e. three-address code.
Logical Structure of Compiler

The logical structure of the compiler has two ends, a front end and a back end. The front end
generates the intermediate representation of the source program, and this intermediate
representation helps the back end generate the target program.

The front end of the compiler includes:

 Parser,
 Static checker
 Intermediate Code Generator

Earlier we discussed parsers and the types of parsing, i.e. top-down parsing and bottom-up parsing.
The parser parses the input string and generates the parse tree or the syntax tree. The nodes of the
syntax tree represent operations, and the leaves represent operands.
As parsing proceeds, information keeps attaching to these nodes, and we refer to the result as
an annotated parse tree.
Static Checker

Static checking confirms that the compiler can compile the program successfully. It identifies
programming errors early, which helps the programmer rectify them before the
program executes. Static checking performs two types of checking:

 Syntactic Checking
This kind of checking identifies the syntactic errors present in the program. The compiler
examines the sequence of elements of the program. And assures that this sequence forms a
valid statement in the source program.
 Type Checking
It checks the operations present in the program and ensures that they respect the type system
of the source language. If they do not, the compiler performs a type conversion.

 Coercion
In coercion, the type of operands converts according to the type of operator. For example,

consider the expression 2 * 3.14. Now 2 is an integer and 3.14 is a floating-point number.
The coercion specified by the language converts integer 2 to floating-point 2.0. Now both the
operands are floating-point, the compiler will perform the floating-point operation. This
operation will provide a floating-point resultant.
 Overloading
We have studied the concept of overloading in Java. For example, the operator '+' applied
to integers performs addition of two integers, and applied to strings performs
concatenation of two strings.
Thus, the meaning of the operator changes according to the types of its operands. In an
expression x + y, if one operand is an integer and the other is a string and no such mixed
operation is defined, a type error occurs.

 Three-Address Code
Three-address code is another form of intermediate representation, alongside the syntax tree.
The parser generates the syntax tree, and the intermediate code generator generates the
three-address code. This intermediate representation is a sequence of assembly-like
instructions.

Note: It is possible for the compiler to construct a syntax tree and three-address code parallelly.
Three-Address Instructions

1. A three-address instruction has at most three operands, and each operand acts like a
register.
2. A three-address assignment instruction has at most one operator on the right side of
the instruction.

3. To hold the value computed by a three-address instruction, the compiler generates a
temporary name.

4. Some three-address instructions have fewer than three operands.

The three-address instruction is of the form:


x = y op z
Here,

 y and z are the operands.


 op is the operator.
 x can be a name or a compiler-generated temporary name (location) that will hold the result.

Three-address instructions execute sequentially by default, but they can also perform
conditional or unconditional jumps.
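Generating instructions of the form x = y op z from an expression can be sketched as follows. The tuple-based expression tree and the t1, t2, … naming scheme for temporaries are illustrative assumptions:

```python
# Sketch: emitting three-address code (x = y op z) from an
# expression tree. Nodes are either a str address or a tuple
# (op, left, right); both choices are illustrative.
import itertools

temp_ids = itertools.count(1)          # fresh temporary names t1, t2, ...

def gen_tac(node, code):
    """Appends 'x = y op z' instructions to code and returns the
    address (name or temporary) holding the node's value."""
    if isinstance(node, str):
        return node                    # a name is already an address
    op, left, right = node
    l = gen_tac(left, code)
    r = gen_tac(right, code)
    t = f"t{next(temp_ids)}"           # compiler-generated temporary
    code.append(f"{t} = {l} {op} {r}")
    return t

code = []
result = gen_tac(('+', 'a', ('*', 'b', 'c')), code)   # a + b * c
code.append(f"x = {result}")                          # copy instruction x = y
print('\n'.join(code))
# t1 = b * c
# t2 = a + t1
# x = t2
```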
Address and Instruction

Three-address code is based on two concepts: addresses and instructions.
Address:

1. Name: A name specifies the identity of a variable in the instructions or even the name of the
source program. In the three-address code, the name in the instruction appears as the address.

2. Constant: The compiler deals with many different types of constants, performing the
necessary type conversions when required.

3. Compiler-generated temporary: a name, such as t, generated by the compiler each time a
temporary variable is needed. This address stores a result or a value.

Instructions:

Assignment Instructions

 x = y op z, where x, y, and z are addresses and op is an arithmetic or logical binary
operator.

 x = op y, where x and y are addresses and op is a unary operator.

Copy Instruction
The copy instruction x = y, where x and y are addresses, sets the value at the address of x to the
value at the address of y.

Jump Instruction

 The conditional jumps if x goto L and ifFalse x goto K: the first transfers control to the
instruction labeled L when x is true, and the second to the instruction labeled K when x is
false.

 The conditional jump if x relop y goto L, where relop is a relational operator. If the
relation x relop y holds, the instruction labeled L is executed next; otherwise, the
instruction following this conditional jump is executed.

 The unconditional jump goto L: the instruction labeled L is always executed next.

Advanced Instructions:

1. Procedure Call
The instruction y = call p, n makes a procedure or function call, where p is the name of the
procedure and n is an integer indicating the number of parameters passed. The instruction
return y indicates that y is the returned value.
2. Indexed Copy Instructions
In the instruction x = y[i], x is set to the value at the location i memory units beyond
location y. In the instruction x[i] = y, the location i memory units beyond x is set to the
value of y.
3. Address and Pointer Assignments
In the instruction x = &y, x is set to the l-value (address) of y. In x = *y, the r-value of y is
a pointer to some location z, and x is set to the r-value of z. The instruction *x = y sets the
r-value of the location pointed to by x to the r-value of y.

Types of Intermediate Code

We can classify the intermediate representation into two types:

1) High-Level Intermediate Representation


This is the first intermediate representation of the source program; the syntax tree generated by
the parser is a high-level representation.
The high-level intermediate representation provides a hierarchical structure of the source
program, which helps with tasks such as static checking.
2) Low-Level Intermediate Representation
After high-level intermediate representation, the next intermediate representation is low-level
intermediate representation. The intermediate code generation phase generates the low-level
intermediate representation.
The three-address code is a low-level representation, suitable for machine-dependent tasks
such as register allocation and instruction selection.

Benefits of Intermediate Code Generation

 Consider a compiler that translates a source program directly into a target program,
generating no intermediate code in between. The target code it produces can run only on
the machine for which that compiler was built.
 In that case, we would need a full native compiler for each new machine.
 Intermediate code removes the need for a full native compiler for each machine: only the
back end must be retargeted, while the front end and the intermediate representation are
reused.

Variants of Syntax Trees

Rules of Constructing a Syntax Tree

A syntax tree's nodes can all be implemented as records with several fields. In an operator node,
one field identifies the operator, while the remaining fields contain pointers to the operand
nodes. The operator is also known as the node's label. The nodes of the syntax tree for
expressions with binary operators are created using the following functions. Each function returns
a pointer to the node that was most recently created.
1. mknode(op, left, right): It creates an operator node with the label op and two fields
containing the pointers left and right.
2. mkleaf(id, entry): It creates an identifier node with the label id and a field entry, which is a
pointer to the identifier's symbol-table entry.
3. mkleaf(num, val): It creates a number node with the label num and a field containing the
number's value, val. For example, to build a syntax tree for the expression a - 4 + c, a sequence
of such calls returns pointers p1, p2, …, p5, where p1 and p4 point to the leaves for the
identifiers a and c.
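A minimal sketch of these constructors, using tuples for nodes and a dict for the symbol table (both illustrative choices), builds the tree for a - 4 + c bottom-up:

```python
# Sketch of mknode/mkleaf; tuples stand in for records and a dict
# for the symbol table. All names are illustrative assumptions.

def mknode(op, left, right):
    return (op, left, right)          # operator node with two children

def mkleaf_id(entry):
    return ('id', entry)              # leaf holding a symbol-table pointer

def mkleaf_num(val):
    return ('num', val)               # leaf holding a numeric value

symtab = {'a': {'type': 'int'}, 'c': {'type': 'int'}}

# Bottom-up construction of the syntax tree for a - 4 + c:
p1 = mkleaf_id(symtab['a'])
p2 = mkleaf_num(4)
p3 = mknode('-', p1, p2)
p4 = mkleaf_id(symtab['c'])
p5 = mknode('+', p3, p4)

print(p5[0])   # label of the root: +
```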
Example 1: Syntax Tree for the string a – b ∗ c + d is:


Syntax tree for example 1

Example 2: Syntax Tree for the string a * (b + c) – d /2 is:

Syntax tree for example 2

Variants of syntax tree:

A syntax tree basically has two variants which are described below:
1. Directed Acyclic Graphs for Expressions (DAG)
2. The Value-Number Method for Constructing DAGs

Directed Acyclic Graphs for Expressions (DAG)

A DAG, like an expression's syntax tree, has leaves that correspond to atomic operands and
interior nodes that correspond to operators. The difference is that a node N in a DAG has more
than one parent if N represents a common subexpression; in a syntax tree, the subtree for the
common subexpression would be duplicated as many times as the subexpression appears in the
original expression. As a result, a DAG not only encodes expressions more concisely but also
gives the compiler important information about how to generate efficient code to evaluate them.
The DAG also shows the structure of basic blocks, lets us examine the flow of values between
them, and supports optimization: it allows simple transformations of basic blocks.
Properties of DAG are:
1. Leaf nodes represent identifiers, names, or constants.

2. Interior nodes represent operators.
3. Interior nodes also represent the results of expressions or the identifiers/name where the
values are to be stored or assigned.
Examples:
T0 = a+b --- Expression 1
T1 = T0 +c --- Expression 2
Expression 1: T0 = a+b

Syntax tree for expression 1

Expression 2: T1 = T0 +c

Syntax tree for expression 2

The Value-Number Method for Constructing DAGs:


An array of records is used to hold the nodes of a syntax tree or DAG. Each row of the array
corresponds to a single record, and hence a single node. The first field in each record is an
operation code, which indicates the node's label. In the figure below, interior nodes have
two additional fields denoting the left and right children, while leaves have one additional field
that stores the lexical value (either a symbol-table pointer or a constant in this instance).

Nodes of a DAG for i = i + 10 allocated in an array

Nodes in this array are referred to by the integer index of their record within the array. This
integer is called the node's value number, or the value number of the expression the node
represents. For example, the node labeled + has value number 3, while its left and right
children have value numbers 1 and 2, respectively. In practice we could use pointers to records or
references to objects instead of integer indexes, but the reference to a node would still be called
its "value number." Stored in an appropriate data structure, value numbers help us construct
expressions without duplicating nodes.
 Algorithm: The value-number method for constructing the nodes of a Directed Acyclic
Graph.
 INPUT: Label op, node l, and node r.
 OUTPUT: The value number of a node in the array with signature (op, l, r).
 METHOD: Search the array for a node M with label op, left child l, and right child r. If there
is such a node, return the value number of M. If not, create in the array a new node N with
label op, left child l, and right child r, and return its value number.
While the algorithm produces the intended result, searching the entire array every time a node is
requested is time-consuming, especially if the array contains the expressions of an entire program.
A hash table, in which the nodes are divided into "buckets," each of which typically contains only
a few nodes, is a more efficient approach. The hash table is one of several data structures that
can efficiently support dictionaries. A dictionary is a data type that allows us to add and
remove elements from a set, and to test whether a particular element is present in the set. A
good dictionary data structure, such as a hash table, performs each of these operations in
constant or near-constant time, regardless of the size of the set.
To build a hash table for the nodes of a DAG, we need a hash function h that computes the
bucket index for a signature (op, l, r) in such a way that the signatures are distributed evenly
across buckets and no one bucket gets more than its fair share of the nodes. The bucket index
h(op, l, r) is computed deterministically from op, l, and r, so we can repeat the calculation and
always arrive at the same bucket index for a given node (op, l, r).
The buckets can be implemented as linked lists, as in the given figure. The bucket headers are
stored in an array indexed by the hash value; each header corresponds to the first cell of a list.
Each cell in a bucket's linked list holds the value number of one of the nodes that hash to
that bucket. That is, node (op, l, r) can be found on the list whose header is at index
h(op, l, r) of the array.

Data structure for searching buckets

Given the input nodes op, l, and r, we compute the bucket index h(op, l, r) and search the list of
cells in this bucket for the given input node. There are usually enough buckets that no list has

more than a few cells. However, we may need to examine all of the cells in a bucket, and for each
value number v discovered in a cell, we must verify that the input node’s signature (op,l,r)
matches the node with value number v in the list of cells (as in fig above). If a match is found, we
return v. We build a new cell, add it to the list of cells for bucket index h(op, l,r), and return the
value number in that new cell if we find no match.
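A bucketed variant of the same method might look like the sketch below; NBUCKETS, the use of Python's built-in hash, and the list-of-lists layout are illustrative assumptions, not from the text:

```python
NBUCKETS = 101
buckets = [[] for _ in range(NBUCKETS)]  # each bucket: a list of value numbers
table = []                               # value number -> signature (op, l, r)

def h(op, l, r):
    # deterministic hash of the signature, so the same node
    # always lands in the same bucket
    return hash((op, l, r)) % NBUCKETS

def value_number(op, l=None, r=None):
    bucket = buckets[h(op, l, r)]
    for v in bucket:                     # search only this bucket's cells
        if table[v] == (op, l, r):
            return v                     # match: return its value number
    table.append((op, l, r))             # no match: create a new node
    v = len(table) - 1
    bucket.append(v)                     # add a new cell to the bucket's list
    return v
```

Only one bucket's cells are examined per lookup, which is the efficiency gain over scanning the whole array.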

Three Address Code

Three Address Code is a simple sequence of statements that is a kind of intermediate code and
simple to convert to machine code. It employs three addresses and one operator to define an
expression. It helps us to determine the sequence of operations performed by the compiler.

General illustration
a = b op c

where a, b, and c stand for operands such as names, constants, or compiler-generated temporaries,
and op stands for the operator.
There can be only one operator on the right side of an instruction in three-address code; no built-up
arithmetic expressions are allowed. As a result, a source-language expression like x + y * z is
translated into a sequence of three-address instructions:
t₁=y*z

t₂=x+t₁

where t₁ and t₂ are temporary names generated by the compiler. The unraveling of multi-operator
arithmetic expressions and of nested flow-of-control statements makes three-address code ideal for
target-code generation and optimization. Using names for the intermediate values computed by a
program makes it simple to rearrange three-address code.

Examples
1. Convert a = (-c * b) + (-c * d) into three address code.
Solution
t₁ = -c

t₂ = b*t₁

t₃ = -c

t₄ = d * t₃

t₅ = t₂ + t₄

a = t₅

In the target program, the temporaries t₁, ..., t₅ may be assigned to registers.


2. For the following code, write the three address codes.

for(i = 1; i<=10; i++) { a[i]=x * 5; }


Solution
i=1

L:t1=x*5

t2=&a

t3=sizeof(int)

t4=t3*i

t5=t2+t4

*t5=t1

i=i+1

if i<=10 goto L

Implementation of Three Address Code

There are three different ways to express three address codes:


 Quadruple
 Triples
 Indirect Triples

Quadruple
It is a structure with four fields: op, arg1, arg2, and result. The op field denotes the operator, arg1
and arg2 denote the operands, and result records the outcome of the expression.

Benefits
 For global optimization, it's simple to restructure code.
 Using a symbol table, one may rapidly get the value of temporary variables.

Drawbacks
 It generates many temporaries.
 The creation of temporary variables adds to the time and space complexity.

Example
Convert a = -b * (c + d) into three address code.
The following is the three-address code:
t₁ = -b

t₂ = c + d

t₃ = t₁ * t₂

a = t₃

Quadruples are used to symbolize these statements:
# Op Arg1 Arg2 Result

(0) uminus b - t₁

(1) + c d t₂

(2) * t₁ t₂ t₃

(3) = t₃ - a
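The quadruple table can be held in a simple list of records; the sketch below assumes (op, arg1, arg2, result) tuples with None for an absent operand, and the render helper is an invented name for illustration:

```python
# Each quadruple: (op, arg1, arg2, result); None marks an absent operand.
quads = [
    ('uminus', 'b',  None, 't1'),
    ('+',      'c',  'd',  't2'),
    ('*',      't1', 't2', 't3'),
    ('=',      't3', None, 'a'),
]

def render(q):
    """Print one quadruple back as a three-address instruction."""
    op, a1, a2, res = q
    if op == '=':                     # plain copy: result = arg1
        return f"{res} = {a1}"
    if a2 is None:                    # unary operator
        return f"{res} = {op} {a1}"
    return f"{res} = {a1} {op} {a2}"  # binary operator
```

Because the result field names the target explicitly, a quadruple can be moved or deleted without renumbering any other instruction.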

Triples
Instead of using an additional temporary variable to represent a single action, a pointer to the triple
is utilized when a reference to another triple's value is required. As a result, it only has three fields:
op, arg1, and arg2.

Drawbacks
 Temporaries are difficult to rearrange since they are implicit.
 It's tough to optimize since it necessitates the relocation of intermediary code. When a triple is
relocated, all triples that relate to it must likewise be changed. The symbol table entry can be
accessed directly using the pointer.

Example
Convert a = -b * (c + d) into three address code.
The following is the three-address code:
t₁ = -b

t₂ = c + d

t₃ = t₁ * t₂

a = t₃

The following triples represent these statements:


# Op Arg1 Arg2

(0) uminus b -

(1) + c d

(2) * (0) (1)

(3) = (2) -

Indirect Triples
This approach uses a separate list of pointers to the triples, created and kept apart from the triples
themselves. Its usefulness is comparable to the quadruple representation. Statements are easy to
rearrange, since only the statement list must be reordered; the triples themselves are untouched.
Example
Convert a = b * -c + b * -c into three address code.
The following is the three-address code:
t1 = uminus c

t2 = b * t1

t3 = uminus c

t4 = b * t3

t5 = t2 + t4

a = t5

# Op Arg1 Arg2

(14) uminus c -

(15) * (14) b

(16) uminus c -

(17) * (16) b

(18) + (15) (17)

(19) = a (18)

List of pointers to the table


# Statement

(0) (14)

(1) (15)

(2) (16)

(3) (17)

(4) (18)

(5) (19)
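The indirection can be sketched as follows, assuming the triples are keyed by their numbers as in the table above and a separate statement list gives the execution order; the variable names are illustrative:

```python
# Triples stored once, keyed by their numbers as in the table above;
# integer arguments refer to other triples.
triples = {
    14: ('uminus', 'c', None),
    15: ('*', 14, 'b'),
    16: ('uminus', 'c', None),
    17: ('*', 16, 'b'),
    18: ('+', 15, 17),
    19: ('=', 'a', 18),
}

# The statement list: reordering the program means permuting this list
# only, without touching the triples themselves.
stmts = [14, 15, 16, 17, 18, 19]
```

An optimizer that moves statement (2) of the program only swaps entries of stmts; the cross-references inside the triples (such as 18 referring to 15 and 17) stay valid.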

Types and Declarations

In this section we discuss types and declarations in compiler design. Typical basic types for a
language include boolean, char, integer, float, and void; here, void denotes "the absence of a
value." A type name is a type expression, and we can form a type expression by applying the array
type constructor to a number and a type expression.
Declaration involves allocating space in memory and entering type and name in the symbol table.
Memory allocation is consecutive, and it assigns names to memory in the sequence they are
declared in the program.

Type Expressions
A primary type is a type of expression. Typical basic types for a language contain boolean, char,
integer, float, and void; the void implies "the absence of a value."
We can form a type name or type expression by applying the array type constructor to a number and
a type expression.
Types have structure, which we can represent using type expressions: a type expression is either a
primary type or is formed by applying a type constructor operator to a type expression. The basic
types and constructors depend on the language to be checked.

For Example,

The array type int [2][3][2] can be read as an "array of 2 arrays each of 3 arrays of 2 integers each"
and written as a type expression array(2, array(3, array(2, integer))).
Basic types
char, int, double, float are type expressions.
Type names
It is convenient to consider that names of types are type expressions.
Arrays
If E is a type expression and n is an integer, then
array(n, E)
is a type expression denoting the type of an array with elements of type E and indices in the
range 0 ... n - 1.
Products
If E1 and E2 are type expressions then
E1 × E2

is a type expression indicating the type of an element of the Cartesian product of E1 and E2.
The products of more than two types are defined similarly, and this product operation is left-
associative. Hence
(E1 × E2) × E3 and E1 × E2 × E3
are equivalent.
Records
The only difference between a record and a product is that the fields of a record have names.
If abc and xyz are two field names and E1 and E2 are type expressions, then
record(abc: E1, xyz: E2)
is a type expression denoting the type of a record with fields abc of type E1 and xyz of type E2.
Pointers
If E is a type expression, then
pointer(E)
is a type expression denoting the type "pointer to an object of type E".
Function types
If E1 and E2 are type expressions then
E1 -> E2
is a type expression denoting the type of functions associating an element from E1 with an
element from E2.
Type Equivalence
When graphs represent type expressions, two types are structurally equivalent if and only if one of
the following conditions is true:
 They are of the same basic type.
 They are formed by applying the identical constructor to structurally equivalent types.
 One is a type name that refers to the other.
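Structural equivalence follows directly from this definition; the sketch below encodes type expressions as nested tuples (an illustrative choice, not from the text) and checks the first two conditions, with the name-to-expression lookup of the third condition omitted for brevity:

```python
def array(n, t):  return ('array', n, t)
def pointer(t):   return ('pointer', t)

# int[2][3][2]  ==  array(2, array(3, array(2, integer)))
t1 = array(2, array(3, array(2, 'integer')))

def equivalent(s, t):
    """Structural equivalence: same basic type, or the same constructor
    applied to structurally equivalent parts."""
    if not (isinstance(s, tuple) and isinstance(t, tuple)):
        return s == t              # basic types, sizes, field names
    return len(s) == len(t) and all(equivalent(a, b) for a, b in zip(s, t))
```

The recursion compares constructors first (the tags 'array', 'pointer') and then each component, mirroring the two conditions above.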

The first two conditions in the above definition lead to name equivalence of type expressions if
type names are interpreted as standing for themselves.
A common type-checking rule has the form: "if two type expressions are equivalent, return a given
type; otherwise, report an error." Ambiguities can arise when type expressions are given names and
those names are then used in later type expressions. The key question is whether a name in a type
expression stands for itself or is an abbreviation for another type expression.
Type names can create implicit cycles by denoting type expressions. The resulting graph can have
cycles owing to recursive types if edges to type names are redirected to the type expressions given
by the names.
Example of a non-cyclic type graph:

Declarations
When we encounter declarations, we need to layout storage for the declared variables.
For every local name in a procedure, we create a Symbol Table(ST) entry including:
1. The type of the name
2. How much storage the name requires

Grammar:
D -> real, id

D -> integer, id

D -> D1, id

We use ENTER to make entries in the symbol table and ATTR to propagate the data type.

Production Rule (PR) Semantic Action

D -> real, id ENTER (id.PLACE, real)


D.ATTR = real

D -> integer, id ENTER (id.PLACE, integer)


D.ATTR = integer

D -> D1, id ENTER (id.PLACE, D1.ATTR)


D.ATTR = D1.ATTR
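The ENTER/ATTR actions above can be mimicked in a few lines; symtab, the declare helper, and its list-of-names interface are illustrative assumptions for this sketch:

```python
symtab = {}   # symbol table: name -> type

def ENTER(place, attr):
    symtab[place] = attr          # record the name with its type

def declare(type_name, names):
    # D -> real, id  /  D -> integer, id  set D.ATTR = type_name;
    # D -> D1, id propagates D1.ATTR to each further id.
    for name in names:
        ENTER(name, type_name)    # ENTER(id.PLACE, D.ATTR)
    return type_name              # D.ATTR

declare('real', ['x', 'y'])
declare('integer', ['count'])
```

Each name in the declaration list receives the type carried down by the ATTR attribute, exactly as the semantic rules for D -> D1, id propagate D1.ATTR.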

Nonterminal D generates a sequence of declarations. Nonterminal T generates basic, array, and
record types. Nonterminal B generates one of the basic types int and float. Nonterminal C, which
stands for "component," generates strings of zero or more integers, each surrounded by brackets.
An array type consists of a basic type given by B, followed by array components given by
nonterminal C. A record type (the second production for T) is a sequence of declarations for the
record fields, all surrounded by curly braces.

Translation of Expressions

An expression with more than one operator, like a + b * c, will translate into instructions with at
most one operator per instruction. An array reference A[i][j] will expand into a sequence of three-
address instructions that calculate an address for the reference.


Operations within Expressions: The syntax-directed definition in Fig. 6.19 builds up the three-
address code for an assignment statement S using attribute code for S and attributes addr and code for
an expression E. Attributes S.code and E.code denote the three-address code for S and E, respectively.
Attribute E.addr denotes the address that will hold the value of E. Recall from Section 6.2.1 that an
address can be a name, a constant, or a compiler-generated temporary.

Consider the last production, E → id, in the syntax-directed definition in Fig. 6.19. When an
expression is a single identifier, say x, then x itself holds the value of the expression. The semantic
rules for this production define E.addr to point to the symbol-table entry for this instance of id. Let
top denote the current symbol table; the function top.get retrieves the entry when it is applied to the
string representation id.lexeme of this instance of id. E.code is set to the empty string.

The operators + and unary - in Fig. 6.19 are representative of the operators in a typical language.
The semantic rules for E → E1 + E2 generate code to compute the value of E from the values of E1
and E2. Values are computed into newly generated temporary names. If E1 is computed into
E1.addr and E2 into E2.addr, then E1 + E2 translates into t = E1.addr + E2.addr, where t is a new
temporary name. E.addr is set to t. A sequence of distinct temporary names t1, t2, ... is created by
successively executing newTemp().

For convenience, we use the notation gen(x '=' y '+' z) to represent the three-address instruction
x = y + z. Expressions appearing in place of variables like x, y, and z are evaluated when passed to
gen, and quoted strings like '=' are taken literally. Other three-address instructions are built up
similarly by applying gen to a combination of expressions and strings.

When we translate the production E → E1 + E2, the semantic rules in Fig. 6.19 build up E.code by
concatenating E1.code, E2.code, and an instruction that adds the values of E1 and E2. The
instruction puts the result of the addition into a new temporary name for E, denoted by E.addr.

The translation of E → - E1 is similar. The rules create a new temporary for E and generate an
instruction to perform the unary minus operation.

Finally, the production S → id = E; generates instructions that assign the value of expression E to
the identifier id. The semantic rule for this production uses the function top.get to determine the
address of the identifier represented by id, as in the rules for E → id. S.code consists of the
instructions to compute the value of E into the address given by E.addr, followed by an assignment
to the address top.get(id.lexeme) for this instance of id.

Example: the syntax-directed definition converts the assignment statement a = b + - c; into the
three-address code sequence

a = b + - c
=> t1 = minus c
   t2 = b + t1
   a = t2
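This translation can be sketched in the incremental style, where a single sequence of instructions grows by successive calls to gen; expressions as nested tuples and the names rvalue and newtemp are illustrative assumptions:

```python
code = []      # the single instruction sequence built by calls to gen
_temp = [0]

def gen(instr):
    code.append(instr)           # emit one three-address instruction

def newtemp():
    _temp[0] += 1
    return f"t{_temp[0]}"        # t1, t2, ...

def rvalue(e):
    """Return the address holding e's value, emitting code as needed."""
    if isinstance(e, str):                       # E -> id
        return e
    if e[0] == 'minus':                          # E -> - E1
        t = newtemp()
        gen(f"{t} = minus {rvalue(e[1])}")
        return t
    op, e1, e2 = e                               # E -> E1 op E2
    a1, a2 = rvalue(e1), rvalue(e2)
    t = newtemp()
    gen(f"{t} = {a1} {op} {a2}")
    return t

# a = b + - c
gen(f"a = {rvalue(('+', 'b', ('minus', 'c')))}")
```

No code attribute is concatenated anywhere: each semantic action just appends its one instruction, which is the point of the incremental technique discussed next.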
Incremental Translation

The syntax-directed definition in Fig. 6.19 generates the same code as the translation scheme in
Fig. 6.20. The code attribute is not needed in the incremental technique, since a single sequence of
instructions is formed by successive calls to gen. The semantic rule in Fig. 6.20 for E → E1 + E2
merely calls gen to construct an add instruction; the instructions to compute E1 into E1.addr and E2
into E2.addr have already been generated.
A syntax tree can also be built using the method of Fig. 6.20: the new semantic action for
E → E1 + E2 calls a constructor to produce a node instead of generating an instruction:

E → E1 + E2    { E.addr = new Node('+', E1.addr, E2.addr); }


Type Checking

Type checking is the process of verifying and enforcing type restrictions on values.
The compiler's lexical analysis and parsing phases filter out many malformed programs, but these
techniques cannot enforce many of the well-formedness requirements of programming languages,
since such requirements are frequently not context-free and cannot be checked by membership in a
context-free grammar.

The type-checking phase of compiler design is interleaved with the syntax analysis phase, so it is
done before a program's execution or translation (static typing), and the information is gathered for
use by subsequent phases, such as the translator, which will naturally combine type calculation with
the actual translation.

A type checker ensures that a construct's type matches the type expected by its context.
For example, in Pascal, the arithmetic operator mod requires integer operands; hence a type checker
ensures that the operands of the mod are of type integer.
When the code is generated, type information acquired by a type checker may be required.

Type Systems

The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.
For example : “ if both operands of the arithmetic operators of +,- and * are of type integer, then the
result is of type integer ”

Type Expressions
The type of a language construct will be denoted by a “type expression.” A type expression is
either a basic type or is formed by applying an operator called a type constructor to other type
expressions. The sets of basic types and constructors depend on the language to be checked. The
following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error, will signal an error during type checking; void, denoting "the
absence of a value," allows statements to be checked.

2. Since type expressions may be named, a type name is a type expression.


3. A type constructor applied to type expressions is a type expression.

Constructors include:

Arrays : If T is a type expression then array (I,T) is a type expression denoting the type of an
array with elements of type T and index set I.

Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type
expression.

Records : The difference between a record and a product is that the fields of a record have names.
The record type constructor will be applied to a tuple formed from field names and field types.

For example:

type row = record
    address: integer;
    lexeme: array[1..15] of char
end;
var table: array[1..101] of row;

declares the type name row representing the type expression record((address X integer) X
(lexeme X array(1..15,char))) and the variable table to be an array of records of this type.

Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
“pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions : A function in programming languages maps a domain type D to a range type R.
The type of such function is denoted by the type expression D → R
4. Type expressions may contain variables whose values are type expressions.

Fig. 5.7 Tree representation for char x char → pointer(integer)

Type systems

A type system is a collection of rules for assigning type expressions to the various parts of a
program. A type checker implements a type system. It is specified in a syntax-directed manner.
Different type systems may be used by different compilers or processors of the same language.

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target program
runs is termed dynamic. Any check can be done dynamically, if the target code carries the type of an
element along with the value of that element.

Sound type system

A sound type system eliminates the need for dynamic checking of type errors because it allows us
to determine statically that these errors cannot occur when the target program runs. That is, if a
sound type system assigns a type other than type_error to a program part, then type errors cannot
occur when the target code for that part is run.

Strongly typed language

A language is strongly typed if its compiler can guarantee that the programs it accepts will
execute without type errors.


Error Recovery

Since type checking has the potential for catching errors in programs, it is desirable for the type
checker to recover from errors, so that it can check the rest of the input. Error handling has to be
designed into the type system right from the start; the type-checking rules must be prepared to cope
with errors.

Control Flow

Translating if-else and while statements involves translating boolean expressions. In
programming, we use boolean expressions both to compute logical values and to alter the flow of
control. The use of a boolean expression is determined by its syntactic context; e.g., an expression
that follows the keyword if alters the flow of control, while an expression on the right side of an
assignment denotes a logical value.
Such syntactic contexts can be specified in several ways: we can use two different non-terminals,
use inherited attributes, or set a flag during parsing.
We can also build a syntax tree and then invoke different procedures for the two different uses of
boolean expressions.

Boolean expressions
Boolean expressions are composed of the boolean operators && (AND), || (OR), and ! (NOT).
Relational expressions have the form E1 rel E2, where E1 and E2 are arithmetic expressions.
We will consider boolean expressions generated using the following grammar.
B → B || B | B && B | !B | (B) | E rel E | true | false
The attribute rel.op is used to indicate the operator represented by rel among the 6 comparison
operators <, <=, =, !=, > or >=.
It is conventional to assume that || and && are left-associative, and that || has the lowest
precedence, then &&, and finally !.
Given an expression B1 || B2, if we determine that B1 is true, then we can conclude that the entire
expression is true without evaluating B2. Similarly, given B1 && B2, if B1 is false, the entire
expression is false.

Short-circuit code
In short-circuit code, the boolean operators &&, ||, and ! translate into jumps. The operators
themselves do not appear in the code; instead, the value of a boolean expression is represented by a
position in the code sequence. An example is shown below.
We have the statement:
if(x < 100 || x > 200 && x != y) x = 0;
The statement is translated into the following code segment;

if x < 100 goto L2


ifFalse x > 200 goto L1
ifFalse x != y goto L1


L2: x = 0
L1:

In the above translation, the boolean expression is true if control reaches label L2. If the
expression is false, control goes to L1, skipping L2 and the assignment x = 0.
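A generator for such jumping code can be sketched as follows; boolean expressions are encoded as nested tuples, and the function and label names are illustrative assumptions, not from the text:

```python
code = []      # the emitted jumping code
_lab = [0]

def newlabel():
    _lab[0] += 1
    return f"L{_lab[0]}"

def jumping(b, t, f):
    """Emit jumping code for b with true exit t and false exit f."""
    if b[0] == 'rel':                   # B -> E1 rel E2
        _, op, e1, e2 = b
        code.append(f"if {e1} {op} {e2} goto {t}")
        code.append(f"goto {f}")
    elif b[0] == '||':                  # B -> B1 || B2
        lab = newlabel()                # B1.false: fall into B2's code
        jumping(b[1], t, lab)
        code.append(f"{lab}:")
        jumping(b[2], t, f)
    elif b[0] == '&&':                  # B -> B1 && B2
        lab = newlabel()                # B1.true: fall into B2's code
        jumping(b[1], lab, f)
        code.append(f"{lab}:")
        jumping(b[2], t, f)

# x < 100 || x > 200 && x != y
jumping(('||', ('rel', '<', 'x', '100'),
               ('&&', ('rel', '>', 'x', '200'),
                      ('rel', '!=', 'x', 'y'))),
        'Ltrue', 'Lfalse')
```

No operator && or || appears in the output: each subexpression only routes control to the proper true and false exits, which is exactly the short-circuit behavior.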
Flow-of-control statements
In this section, we consider the translation of boolean expressions into three-address code in the
context of statements generated by the following grammar;

S → if(B) S1
S → if(B) S1 else S2
S → while(B) S1

In the productions, non-terminal B represents a boolean expression while non-terminal S represents


a statement.
The translation of if(B) S1 consists of B.code followed by S1.code as can be seen in the image
below;
(1) Code layout for if(B) S1.

(2) Code layout for if(B) S1 else S2 (the if-else translation).

(3) Code layout for while(B) S1 (the while translation).

Inside B.code we have jumps based on the value of B. If the value is true, control flows to the first
instruction of S1.code otherwise B is false, therefore control flows to the instruction that
follows S1.code.
Inherited attributes are used to manage labels for B.code and S.code. With a boolean expression B,
two labels are associated: B.true, to which control flows if B is true, and B.false, to which control
flows if B is false.
With a statement S, we associate the inherited attribute S.next, denoting a label for the instruction
immediately following the code for S.
In some cases, the instruction following S.code is itself a jump to some label L; a jump from within
S.code to a jump to L is avoided by using S.next.
We have the following Syntax-directed definition that produces a three-address code for boolean
expressions in the context of if, if-else and while statements.
(4) Syntax-directed definition for if, if-else, and while statements.

(5) Generating three-address code for booleans.

We have a program consisting of a statement generated by P → S. The semantic rules associated
with this production initialize S.next to a new label, and P.code consists of S.code followed by the
new label S.next. The token assign in the production S → assign is a placeholder for assignment
statements.
In translating S → if(B) S1, the semantic rules shown in image (4) create a new label B.true and
attach it to the first three-address instruction generated for S1.
By setting B.false to S.next, we ensure that control skips the code for S1 if B evaluates to false.
Translating if-else statement that is S → if(B) S1 else S2, the code for the boolean expression B has
jumps out of it to the first instruction of the code for S1 if B is true and to the first instruction of the
code for S2, if B is false as can be seen from image (2).
Furthermore, control flows from S1 and S2 to the three-address instruction that immediately follows
code for S. S.next is its label which is also an inherited attribute.
An explicit goto S.next appears after the code for S1 to skip the code for S2.
No goto is needed after S2, since S2.next is the same as S.next.
For translating S → while(B) S1, we form its code from B.code and S1.code. This is seen from
image (3).
The local variable begin holds a new label that is attached to the first instruction for this while
statement, which is also the first instruction for B.
A variable is preferred to an attribute since begin is local to the semantic rules for this production.
Inherited label S.next marks the instruction where control must flow to, that is, when B is false,
therefore, B.false is set to S.next. B.true which is a new label is attached to the first instruction
for S1. Code for B generates a jump to this label, that is, if B is true.
Following the code for S1, we place the instruction goto begin. This causes a jump back to the start
of the code for boolean expressions. Keep in mind that S1.next is currently set to the label begin,
therefore, jumps within S1.code can go directly to begin.
Code for S → S1 S2 consists of code for S1 that is followed by code for S2. Semantic rules manage
labels, the first instruction after S1 is the beginning of the code for S2, and the instruction after
code S2 code, is the instruction after code for S.

Control-flow translation of boolean expressions.
Semantic rules for boolean expressions in image (5) complement the semantic rules in image (4).
As the code layouts in images (1-3) show, a boolean expression B is translated into three-address
instructions that evaluate B using conditional and unconditional jumps to one of two labels: B.true
if B is true, and B.false otherwise.
The fourth production, B → E1 rel E2, is translated into a comparison three-address instruction
with jumps to the appropriate locations.
For example, if B has the form a < b, it translates to:

if a < b goto B.true


goto B.false

For the other productions, we translate them as follows;

1. If B is of the form B1 || B2: if B1 is true, then B is true, so B1.true is the same as B.true. If B1 is
false, B2 must be evaluated, so we make B1.false the label of the first instruction in the code
for B2. The true and false exits of B2 are the same as the true and false exits of B.

2. Translations of B1 && B2 are similar.

3. No code is needed for an expression B of the form !B1: we just interchange the true and false
exits of B to get those for B1.
4. The constants true and false translate into jumps to B.true and B.false.
How to avoid redundant gotos:
In the short-circuit code example above, the comparison x > 200 translates into the following;

if x > 200 goto L4


goto L1
L4: ...

Consider the following instruction;

ifFalse x > 200 goto L1


L4: ...

The above ifFalse instruction takes advantage of the natural flow of instructions from one to the
next in sequence and therefore control simply falls through to L4 if x > 200 and thus a jump is
avoided.
From the code layouts from images 1, 2 and 3, the statement S1 follows code for the boolean
expression B. Using a special label fall which states, 'don't generate any jump', we adapt semantic
rules from images 4 and 5 so as to allow control to fall through from the code for B to the code

for S1.
The new rules for S → if(B) S1 from image 4 set B.true to the label fall;
B.true = fall
B.false = S1.next = S.next
S.code = B.code || S1.code
In a similar way, rules for if-else and while statements set B.true to fall.
Now, we adapt semantic rules for boolean expressions which allows control to fall through when
possible. Given the following rules for B → E1 rel E2;
(6) Semantic rules for B → E1 rel E2 using the label fall.

We generate two instructions, as in image 5, if both B.true and B.false are explicit labels, meaning
neither equals fall. Otherwise, if B.true is explicit, then B.false must be fall, and we generate an if
instruction that lets control fall through when the condition is false.
Conversely, if B.false is explicit, we generate an ifFalse instruction. In the last case, both B.true
and B.false are fall, and no jump is generated.
We have the following semantic rules for B → B1 || B2;
(7) Semantic rules for B → B1 || B2 using the label fall.

Here the meaning of the label fall for B differs from its meaning for B1. If B.true is fall, control
falls through B if B evaluates to true. Although B evaluates to true if B1 does, B1.true must ensure
that control jumps over the code for B2 to reach the instruction following B.
Also, if B1 evaluates to false, B's truth-value is determined by the value of B2 and therefore rules
from image 7 make sure that B1.false corresponds to control falling through from B1 to the code
for B2.
Also note that the semantic rules for B → B1 && B2 will be similar to those of image 7.
Boolean values and jumping code.
Apart from using boolean expressions to change the flow of control in statements, boolean
expressions can also be evaluated for their values.

For example, x = true or x = a > b.

An effective way of handling both these roles of boolean expressions is to construct a syntax tree for
expressions using one of the following approaches.

1. Use two passes. Here we build a complete syntax tree for the input, then traverse the tree depth-
first, computing the translations specified by the semantic rules.
2. Use a single pass for statements and two passes for expressions. Here we translate E in while
(E) S1 before S1 is examined. The translation of S1 would then be done by constructing the tree and
traversing it.
We have the following grammar that has a single nonterminal E for expressions;
S → id = E; | if(E) S | while(E)S | S S
E → E || E | E && E | E rel E | E + E | (E) | id | true | false

The non-terminal E governs the flow of control in S → while(E) S1; it also denotes a value in
S → id = E; and in E → E + E.
We handle the two roles of expressions by using separate code generation functions. We assume
that an attribute E.n denotes the syntax tree node for an expression E and that nodes are objects.
We have the method jump that generates the jumping code at an expression node and the
method rvalue that generates code for computing the value of the node into a temporary.
When E appears in S → while(E) S1, the method jump is called at the node E.n. jump is
implemented based on the rules for boolean expressions as can be seen from image 5.
By calling E.n.jump(t, f), jumping code is generated, where t is a label for the first
instruction of S1.code and f is the label S.next.
When E appears in S → id = E;, the method rvalue is called at node E.n. If E is of the
form E1 + E2, E.n.rvalue generates code in the usual way. If E is of the form E1 && E2, we first generate the
jumping code for E and then assign true or false to a new temporary t at the true and false exits of the
jumping code.
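The second case can be sketched in Python (a mock-up with our own names, not code from the text): rvalue for a boolean E is built on top of jump by placing assignments to a temporary at the two exits of the jumping code.

```python
def rvalue_from_jump(jump, newlabel, emit):
    """Compute the value of a boolean expression into temporary t:
    first emit jumping code via jump(true_label, false_label), then
    assign true/false to t at the two exits of that code."""
    true_lbl, false_lbl, after = newlabel(), newlabel(), newlabel()
    jump(true_lbl, false_lbl)          # jumping code for E (E.n.jump)
    emit(f"{true_lbl}: t = true")      # true exit assigns true
    emit(f"goto {after}")
    emit(f"{false_lbl}: t = false")    # false exit assigns false
    emit(f"{after}:")
    return "t"

code = []
count = [0]
def newlabel():
    count[0] += 1
    return f"L{count[0]}"

# Hypothetical jumping code for the simple expression "a" alone:
def jump_a(t, f):
    code.append(f"if a goto {t}")
    code.append(f"goto {f}")

rvalue_from_jump(jump_a, newlabel, code.append)
```

Running this yields the jumping code for a followed by the two assignments to t, exactly the shape described above for E1 && E2.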

Backpatching

The easiest way to implement the syntax-directed definitions for boolean
expressions is to use two passes. First, construct a syntax tree for the input, and then walk
the tree in depth-first order, computing the translations. The main problem with generating
code for boolean expressions and flow-of-control statements in a single pass is that during
one single pass we may not know the labels that control must go to at the time the jump
statements are generated. Hence, a series of branching statements with the targets of the
jumps left unspecified is generated. Each statement will be put on a list of goto statements
whose labels will be filled in when the proper label can be determined. We call this
subsequent filling in of labels backpatching.

To manipulate lists of labels, we use three functions :

1. makelist(i) creates a new list containing only i, an index into the array of quadruples;
makelist returns a pointer to the list it has made.

2. merge(p1, p2) concatenates the lists pointed to by p1 and p2, and returns a pointer
to the concatenated list.
3. backpatch(p, i) inserts i as the target label for each of the statements on the list
pointed to by p.
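Assuming the quadruples are kept in a growable array and a jump's target field may be left unfilled, the three functions can be sketched in Python (the representation of quadruples as [text, target] pairs is our own choice):

```python
quads = []                 # the quadruple array; each entry: [text, target]

def nextquad():
    """Index of the next quadruple to be generated."""
    return len(quads)

def emit(text, target=None):
    """Append a quadruple whose jump target may still be unknown."""
    quads.append([text, target])

def makelist(i):
    """A new list containing only i, an index into the quadruple array."""
    return [i]

def merge(p1, p2):
    """Concatenate the lists pointed to by p1 and p2."""
    return p1 + p2

def backpatch(p, i):
    """Insert i as the target label for each statement on list p."""
    for q in p:
        quads[q][1] = i

emit("goto")                           # quad 0, target unfilled
lst = merge(makelist(0), [])           # a list holding quad 0
backpatch(lst, 7)                      # later, fill in target 7
```

After the backpatch call, quadruple 0 reads goto 7.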

Boolean Expressions:

We now construct a translation scheme suitable for producing quadruples for
boolean expressions during bottom-up parsing. The grammar we use is the following:

(1) E → E1 or M E2
(2)   | E1 and M E2
(3)   | not E1
(4)   | (E1)
(5)   | id1 relop id2
(6)   | true
(7)   | false
(8) M → ɛ

Synthesized attributes truelist and falselist of nonterminal E are used to generate jumping code
for boolean expressions. Incomplete jumps with unfilled labels are placed on lists pointed to by
E.truelist and E.falselist.

Consider the production E → E1 and M E2. If E1 is false, then E is also false, so the statements on
E1.falselist become part of E.falselist. If E1 is true, then we must next test E2, so the target
for the statements on E1.truelist must be the beginning of the code generated for E2. This
target is obtained using the marker nonterminal M.

Attribute M.quad records the number of the first statement of E2.code. With the production M → ɛ
we associate the semantic action

{ M.quad := nextquad }

The variable nextquad holds the index of the next quadruple to follow. This value will be
backpatched onto E1.truelist when we have seen the remainder of the production
E → E1 and M E2. The translation scheme is as follows:

(1) EE 1 orM E 2 {backpatch(E 1.falselist,M.quad);


E.truelist: =merge(E 1.truelist,
E2.truelist);E.falselist: =E 2.falselist}

(2) EE 1 andM E 2 {backpatch(E 1.truelist, M.quad);


E.truelist: =E 2.truelist;

39 Dr.SGIET ,Markapur .CSE Department


COMPILER DESIGN
E.falselist: =merge(E 1.falselist, E2.falselist) }

(3) EnotE 1 {E.truelist: =E 1.falselist;


E.falselist: =E 1.truelist; }

(4) E(E 1 ) {E.truelist: =E 1.truelist;


E.falselist: =E 1.falselist; }

(5) Eid 1 relop id2 {E.truelist: =makelist(nextquad);


E.falselist: =makelist(nextquad + 1);
emit(‘if’id 1.placerelop.opid 2.place
‘goto_’)emit(‘goto_’) }

(6)Etrue{E.truelist: =makelist(nextquad);
emit(‘goto_’) }

(7)Efalse{E.falselist: =makelist(nextquad);
emit(‘goto_’) }

(8)Mɛ{M.quad: =nextquad}

Flow-of-Control Statements:

A translation scheme is developed for statements generated by the following grammar:

(1) S → if E then S
(2)   | if E then S else S
(3)   | while E do S
(4)   | begin L end
(5)   | A
(6) L → L ; S
(7)   | S

Here S denotes a statement, L a statement list, A an assignment statement, and E a boolean
expression. We make the tacit assumption that the code that follows a given
statement in execution also follows it physically in the quadruple array. Otherwise, an
explicit jump must be provided.

Scheme to implement the Translation:

The nonterminal E has two attributes, E.truelist and E.falselist. L and S also need a list
of unfilled quadruples that must eventually be completed by backpatching. These lists
are pointed to by the attributes L.nextlist and S.nextlist. S.nextlist is a pointer to a list of all
conditional and unconditional jumps to the quadruple following the statement S in
execution order, and L.nextlist is defined similarly.

The semantic rules for the revised grammar are as follows:

(1)SifEthenM 1 S1 NelseM 2 S2

{backpatch(E.truelist,M 1.quad);
backpatch(E.falselist,M 2.quad);
S.nextlist: =merge(S 1.nextlist,merge(N.nextlist,S 2.nextlist)) }

We backpatch the jumps taken when E is true to the quadruple M1.quad, which is the
beginning of the code for S1. Similarly, we backpatch jumps taken when E is false to go to the
beginning of the code for S2. The list S.nextlist includes all jumps out of S1 and S2, as well
as the jump generated by N.

(2)Nɛ{N.nextlist: =makelist(nextquad);
emit(‘goto_’) }

(3)Mɛ{M.quad: =nextquad}

(4) SifEthenM S 1 {backpatch(E.truelist,M.quad);


S.nextlist: =merge(E.falselist,S 1.nextlist) }

(5) SwhileM 1 EdoM 2 S1 {backpatch(S 1.nextlist,M 1.quad);


backpatch(E.truelist,M 2.quad);
S.nextlist:
=E.falselist emit(
‘goto’M 1.quad)
}

(6)SbeginLend{S.nextlist: =L.nextlist}

(7)SA{S.nextlist: =nil}

The assignment S.nextlist := nil initializes S.nextlist to an empty list.

(8) L → L1 ; M S  { backpatch(L1.nextlist, M.quad);
                    L.nextlist := S.nextlist }

The statement following L1 in order of execution is the beginning of S. Thus the list L1.nextlist is
backpatched to the beginning of the code for S, which is given by M.quad.

(9)LS{L.nextlist: =S.nextlist}


Intermediate Code for Procedures

Procedures and their implementation will be discussed at length, along with the run-time management
of storage for names. We use the term function in this section for a procedure that returns a value.
We briefly discuss function declarations and three-address code for function calls. In three-address
code, a function call is unraveled into the evaluation of parameters in preparation for a call, followed
by the call itself. For simplicity, we assume that parameters are passed by value; parameter-passing
methods are discussed later.
Example: Suppose that a is an array of integers, and that f is a function from integers to integers.
Then, the assignment n = f(a[i]); might translate into the following three-address code:

1) t1 = i * 4
2) t2 = a [ t1 ]
3) param t2
4) t3 = call f, 1
5) n = t3

The first two lines compute the value of the expression a[i] into temporary t2, as discussed. Line
3 makes t2 an actual parameter for the call of f with one parameter on line 4, which also assigns the
value returned by the function call to t3. Line 5 assigns the returned value to n.
The productions in Fig. allow function definitions and function calls. (The syntax generates unwanted
commas after the last parameter, but is good enough for illustrating translation.)
Nonterminals D and T generate declarations and types, respectively, as in Section 6.3. A function
definition generated by D consists of the keyword define, a return type, the function name, formal
parameters in parentheses, and a function body consisting of a statement. Nonterminal F generates
zero or more formal parameters, where a formal parameter consists of a type followed by an
identifier. Nonterminals S and E generate statements and expressions, respectively. The production
for S adds a statement that returns the value of an expression. The production for E adds function
calls, with actual parameters generated by A. An actual parameter is an expression.

42 Dr.SGIET ,Markapur .CSE Department


COMPILER DESIGN
Function definitions and function calls can be translated using concepts that have already been
introduced in this chapter.
Function types. The type of a function must encode the return type and the types of the formal
parameters. Let void be a special type that represents no parameter or no return type. The type of a
function pop() that returns an integer is therefore "function from void to integer." Function types can
be represented by using a constructor fun applied to the return type and an ordered list of types for
the parameters.
Symbol tables. Let s be the top symbol table when the function definition is reached. The function
name is entered into s for use in the rest of the program. The formal parameters of a function can be
handled in analogy with field names in a record. In the production for D, after seeing define and the
function name, we push s and set up a new symbol table:

Env.push(top); top = new Env(top);

Call the new symbol table t. Note that top is passed as a parameter in new Env(top), so the new
symbol table t can be linked to the previous one, s. The new table t is used to translate the function
body. We revert to the previous symbol table s after the function body is translated.
• Type checking. Within expressions, a function is treated like any other operator. The discussion
of type checking in Section 6.5.2 therefore carries over, including the rules for coercions. For
example, if f is a function with a parameter of type real, then the integer 2 is coerced to a real
in the call f(2).
Function calls. When generating three-address instructions for a function call id(E, E, ..., E), it is
sufficient to generate the three-address instructions for evaluating or reducing the parameters E to
addresses, followed by a param instruction for each parameter. If we do not want to mix the
parameter-evaluating instructions with the param instructions, the attribute E.addr for each expression
E can be saved in a data structure such as a queue. Once all the expressions are translated, the param
instructions can be generated as the queue is emptied.
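A minimal sketch of this queue discipline (the function and temporary names are our own):

```python
from collections import deque

def translate_call(fname, arg_exprs, emit):
    """Evaluate each actual parameter to an address, saving the addresses
    in a queue; then emit one param instruction per parameter, in order,
    followed by the call itself."""
    addrs = deque()
    for i, e in enumerate(arg_exprs):
        t = f"t{i + 1}"                    # fresh temporary holding E.addr
        emit(f"{t} = {e}")                 # code that evaluates the argument
        addrs.append(t)                    # defer the param instruction
    while addrs:                           # all arguments translated:
        emit(f"param {addrs.popleft()}")   # flush the queue in order
    emit(f"call {fname}, {len(arg_exprs)}")

code = []
translate_call("f", ["a + b", "c"], code.append)
```

This keeps all the argument-evaluating instructions together, ahead of the uninterrupted run of param instructions.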

The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that
handle procedure parameter passing, calls, and returns are part of the run-time support package.
Mechanisms for run-time support are discussed later.
