CD Unit-3 (2) (R20)
UNIT III
Syntax Directed Translation
Introduction
The parser uses a CFG (context-free grammar) to validate the input string and produce output for the
next phase of the compiler. The output can be either a parse tree or an abstract syntax tree. To
interleave semantic analysis with the syntax analysis phase of the compiler, we use Syntax
Directed Translation.
Conceptually, with both syntax-directed definitions and translation schemes, we parse the input
token stream, build the parse tree, and then traverse the tree as needed to evaluate the semantic
rules at the parse tree nodes. Evaluation of the semantic rules may generate code, save
information in a symbol table, issue error messages, or perform any other activities. The
translation of the token stream is the result obtained by evaluating the semantic rules.
Definition
Syntax Directed Translation augments the grammar with rules that facilitate semantic analysis.
SDT involves passing information bottom-up and/or top-down through the parse tree in the form of
attributes attached to the nodes. Syntax-directed translation rules use
1) lexical values of nodes,
2) constants &
3) attributes associated with the non-terminals in their definitions.
The general approach to Syntax-Directed Translation is to construct a parse tree or syntax tree and
compute the values of attributes at the nodes of the tree by visiting them in some order. In many
cases, translation can be done during parsing without building an explicit tree.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This grammar syntactically validates expressions built from additions and multiplications. To
carry out semantic analysis, we augment the grammar with SDT rules so as to pass information up
the parse tree and check for semantic errors, if any. In this example, we attach an attribute val to
each non-terminal and augment the productions as follows:
E -> E+T { E.val = E.val + T.val }
E -> T { E.val = T.val }
T -> T*F { T.val = T.val * F.val }
T -> F { T.val = F.val }
F -> INTLIT { F.val = INTLIT.lexval }
To understand the translation rules further, take the first SDT rule, the one augmenting the [ E -> E+T ]
production rule. The translation rule in question has val as an attribute of both non-terminals,
E and T. The right-hand side of the translation rule computes the parent's attribute from the
attribute values of the nodes on the right side of the production rule. Generalizing, SDTs are rules
augmented to a CFG that associate 1) a set of attributes with every node of the grammar and 2) a
translation rule with every production, using attributes, constants, and lexical values.
Let’s take a string to see how semantic analysis happens – S = 2+3*4. Parse tree corresponding to
S would be
To evaluate the translation rules, we can employ a single depth-first traversal of the parse tree.
This works because, for a grammar in which all attributes are synthesized, the SDT rules impose
no particular evaluation order beyond requiring that children's attributes be computed before their
parents'. Otherwise, we would have to work out a suitable plan for traversing the parse tree and
evaluating all the attributes in one or more traversals. For clarity, we will move bottom-up, left to
right, when computing the translation rules of our example.
The above diagram shows how semantic analysis could happen. The flow of information happens
bottom-up and all the children’s attributes are computed before parents, as discussed above.
Right-hand side nodes are sometimes annotated with subscript 1 to distinguish between children
and parents.
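The bottom-up, left-to-right evaluation described above can be sketched in code. The following is a minimal illustration (not part of the original notes): parse-tree nodes are plain tuples, and a post-order walk computes the synthesized attribute val exactly as the SDT rules specify.

```python
# A minimal sketch: evaluating the synthesized attribute `val` bottom-up
# on a hand-built parse tree for S = 2+3*4, using the rules
# E -> E+T {E.val = E.val + T.val}, T -> T*F {T.val = T.val * F.val}, etc.

def evaluate(node):
    """Post-order walk: children's attributes are computed before the parent's."""
    label, children = node[0], node[1:]
    if label == "INTLIT":                 # F -> INTLIT {F.val = INTLIT.lexval}
        return children[0]
    vals = [evaluate(c) for c in children]
    if label == "+":                      # E -> E+T {E.val = E.val + T.val}
        return vals[0] + vals[1]
    if label == "*":                      # T -> T*F {T.val = T.val * F.val}
        return vals[0] * vals[1]
    return vals[0]                        # unit productions E -> T, T -> F

# Parse tree for 2 + 3*4, as the parser would build it
tree = ("+", ("INTLIT", 2), ("*", ("INTLIT", 3), ("INTLIT", 4)))
print(evaluate(tree))   # 14
```

The tuple encoding of the tree is an assumption made for brevity; a real compiler would use node objects carrying attribute fields.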
Additional Information
Synthesized Attributes are such attributes that depend only on the attribute values of children
nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a synthesized attribute val corresponding to node
E. If all the semantic attributes in an augmented grammar are synthesized, one depth-first search
traversal in any order is sufficient for the semantic analysis phase.
Inherited Attributes are such attributes that depend on parent and/or sibling’s attributes.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep denote the same
grammar symbol, annotated to distinguish parent from child, has an inherited attribute val
corresponding to node T.
Example:
E --> E1 + T { E.val = E1.val + T.val}
Annotated Parse Tree – The parse tree containing the values of attributes at each node for given
input string is called annotated or decorated parse tree.
Let us assume an input string 4 * 5 + 6 for computing synthesized attributes. The annotated parse
tree for the input string is
Let us assume an input string int a, c for computing inherited attributes. The annotated parse tree
for the input string is
The value of the L nodes is obtained from T.type (a sibling), which is the lexical value obtained as
int, float, or double. The L node then gives the type of the identifiers a and c. The type is computed
top-down, i.e., in a preorder traversal. Using the function Enter_type, the type of identifiers a and
c is entered in the symbol table at the corresponding id.entry.
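As a rough sketch of this top-down propagation (the function and variable names below are our own, chosen to mirror Enter_type from the text), the declared type flows from the type token to each identifier:

```python
# Sketch: inherited-attribute flow for the declaration "int a, c".
# T.type is passed down to the L nodes, which enter each id's type
# in the symbol table (the dict below stands in for the real table).

symbol_table = {}

def enter_type(name, ty):
    """Analogue of Enter_type: record the identifier's type at its entry."""
    symbol_table[name] = ty

def decl(type_token, ids):
    # D -> T L : L inherits L.in = T.type, then hands it to every id
    for name in ids:
        enter_type(name, type_token)

decl("int", ["a", "c"])
print(symbol_table)   # {'a': 'int', 'c': 'int'}
```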
The evaluation order for an SDD (Syntax Directed Definition) concerns how the SDD is evaluated with the
help of attributes, dependency graphs, semantic rules, and S- and L-attributed definitions. SDDs support
semantic analysis in the compiler, so it is important to know how SDDs are evaluated
and in what order. This section provides detailed information about SDD evaluation. It
requires some basic knowledge of grammars, productions, parse trees, annotated parse trees, and
synthesized and inherited attributes.
Dependency Graphs:
A dependency graph provides information about the order of evaluation of attributes with the help
of edges, derived from the semantic rules of the productions. An edge from one node's attribute to a
second node's attribute indicates that the first attribute must be evaluated before the second;
edges thus represent the semantic rules of the corresponding production.
Dependency Graph Rules: For each attribute, a node in the dependency graph corresponds to a node of
the parse tree. An edge from a first node to a second node means that the attribute of the first node
is evaluated before the attribute of the second node.
The dependency graph thus provides the evaluation order for the attributes of the parse-tree nodes. An
edge from a first node to a second node indicates that the attribute of the
second node depends on the attribute of the first node. This order of
evaluation gives a linear order called a topological order.
There is no way to evaluate SDD on a parse tree when there is a cycle present in the graph and due
to the cycle, no topological order exists.
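The topological ordering (and the cycle check) can be sketched directly. The following is an illustrative implementation, not from the notes; the attribute names match the B.syn / A1.syn example that follows.

```python
# Sketch: a dependency graph as edge pairs, with Kahn's algorithm giving
# a valid attribute-evaluation order, or an error when a cycle exists.
from collections import deque

def topological_order(edges, nodes):
    """edges: (u, v) means u's attribute must be evaluated before v's."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle: no topological order, SDD cannot be evaluated")
    return order

# digit.lexval feeds B.syn, which feeds A1.syn
nodes = ["digit.lexval", "B.syn", "A1.syn"]
edges = [("digit.lexval", "B.syn"), ("B.syn", "A1.syn")]
print(topological_order(edges, nodes))  # ['digit.lexval', 'B.syn', 'A1.syn']
```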
Production Table
Production: A1 ⇢ B        Semantic rule: A1.syn = B.syn

Table-1 (semantic rules attached to dependency-graph nodes):
Rule   Node   Semantic rule
1      4      B.syn = digit.lexval
2      5      B.syn = digit.lexval
3      6      B.syn = digit.lexval
4      7      A1.syn = B.syn

Table-2 (attribute held at each dependency-graph node):
Node   Attribute
2      digit.lexval
3      digit.lexval
4      B.syn
5      B.syn
S-Attributed Definitions:
An S-attributed SDD uses only synthesized attributes. Its semantic actions are placed at the right
end of the production bodies, and the attributes can be evaluated bottom-up, for example during LR
parsing or by a post-order traversal of the parse tree.
L-Attributed Definitions:
An L-attributed SDD can have both synthesized and inherited attributes (the inherited attributes are
restricted: they may be taken only from the parent or left siblings). In this type of definition,
semantic rules can be placed anywhere in the RHS of the production. Evaluation follows a
left-to-right depth-first order, which is a topological order of the dependency graph.
Example: S ⇢ AB {A.x = S.x + 2} or S ⇢ AB { B.x = f(A.x | B.x) } or S ⇢ AB { S.x = f(A.x |
B.x) }
Note:
Every S-attributed grammar is also L-attributed.
For L-attributed definitions, an in-order (left-to-right depth-first) traversal of the annotated parse tree is used.
For S-attributed definitions, the reverse of the rightmost derivation is used.
Example
An SDT scheme is a context-free grammar with program fragments embedded within production
bodies. The program fragments are called semantic actions and can appear at any position within
a production body.
Any SDT can be implemented by first building a parse tree and then performing the actions in a
left-to-right depth-first order, i.e., during a preorder traversal.
SDTs are used to implement two important classes of SDDs:
1. The grammar is LR-parsable and the SDD is S-attributed.
2. The grammar is LL-parsable and the SDD is L-attributed.
The postfix SDT below implements the desk-calculator SDD with one change: the action for the first
production prints the value. The grammar is LR and the SDD is S-attributed.
L →E n {print(E.val);}
E → E1 + T { E.val = E1.val + T.val }
E → E1 - T { E.val = E1.val - T.val }
E → T { E.val = T.val }
T → T1 * F { T.val = T1.val * F.val }
T → F { T.val = F.val }
F → ( E ) { F.val = E.val }
F → digit { F.val = digit.lexval }
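The desk-calculator SDD above can be sketched as an evaluator. Here the same semantic actions (E.val = E1.val + T.val, and so on) are performed as each production is recognized; we use a recursive-descent form over an equivalent grammar for brevity, so the class and tokenizer below are our own scaffolding, not part of the SDT itself.

```python
# Rough sketch of the desk-calculator SDD: each method performs the
# production's semantic action as it recognizes that production.
import re

def tokens(s):
    return re.findall(r"\d+|[-+*()]", s)

class Calc:
    def __init__(self, toks):
        self.toks, self.i = toks, 0
    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None
    def next(self):
        t = self.peek(); self.i += 1; return t
    def E(self):                       # E -> T { (+|-) T }
        val = self.T()
        while self.peek() in ("+", "-"):
            if self.next() == "+":
                val = val + self.T()   # E.val = E1.val + T.val
            else:
                val = val - self.T()   # E.val = E1.val - T.val
        return val
    def T(self):                       # T -> F { * F }
        val = self.F()
        while self.peek() == "*":
            self.next()
            val = val * self.F()       # T.val = T1.val * F.val
        return val
    def F(self):                       # F -> ( E ) | digit
        if self.peek() == "(":
            self.next()
            val = self.E()
            self.next()                # consume ')'
            return val
        return int(self.next())        # F.val = digit.lexval

print(Calc(tokens("(3+4)*5-2")).E())   # 33
```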
L-Attributed definitions form the L-attributed class of Syntax Directed Definitions (SDDs). The idea
behind this class is that dependency-graph edges between attributes associated with a production
body can go from left to right but not from right to left; hence "L-attributed".
L-attributed grammars are a subset of attribute grammars. They allow the attributes to be
evaluated in a single left-to-right depth-first traversal of the parse tree. As a result, attribute
evaluation in L-attributed grammars can be incorporated straightforwardly into top-down parsing.
Many programming languages are L-attributed. One-pass ("narrow") compilers use some form
of L-attributed grammar and evaluate it while generating code. L-attributed grammars are a strict
superset of S-attributed grammars.
We'll go through how to use L-attributed definitions in greater depth in this article because they may
be used for a variety of translation purposes. The following techniques traverse a parse tree to do
translation:
1. Build the parse tree and annotate. Works as long as no cycles are present (guaranteed by L-
or S-attributed).
2. Build the parse tree, add actions, and execute the actions in preorder. Works for any L-attributed
definition. Actions can be added based on the semantic rules of the SDD (the actions
become leaves of the tree).
3. Translate During Recursive Descent Parsing.
4. Generate Code on the Fly. Also uses recursive descent
5. Implement an SDT during LL-parsing.
6. Implement an SDT during LR-parsing of an LL Language.
Intermediate code generation is a phase in the compiler. In our earlier content ‘Compiler in
Computer’, we have discussed Intermediate code generation. It gets its input from the semantic
analysis phase and serves its output to the code optimizer phase.
The intermediate code generator generates some intermediate representation. And from this
intermediate representation, the compiler generates the target code.
As we know the task of the compiler is to process the source program and translate it into a target
program. Well, in this process a compiler may generate one or more intermediate representations.
This representation can have various forms.
The intermediate representation must be:
1. Easy to produce.
2. Easy to translate into the target program.
The intermediate representation can be of various forms. The two most important kinds of
intermediate representations are:
1. Trees, including parse trees and (abstract) syntax trees.
2. Linear representations, especially three-address code.
We have discussed “trees” briefly in our previous content ‘Syntax Directed Translation’. In this
context, we will focus on the linear representation code i.e. “three address codes”.
Logical Structure of Compiler
The logical structure of the compiler has two ends i.e., the front end and a back end. The front end
generates the intermediate representation of the source program. And this intermediate
representation helps the back end in generating the target program.
Parser,
Static checker
Intermediate Code Generator
Earlier we discussed the parser and the types of parsing, i.e., top-down parsing and bottom-up parsing.
The parser parses the input string and generates the parse tree or the syntax tree. The nodes of the
syntax tree represent the operations. And the leaves represent the operands.
As the parsing proceeds some information keeps on attaching to these nodes. And we refer to it as
an annotated parse tree.
Static Checker
Static checking verifies the program before it runs, identifying programming errors early. This
helps the programmer rectify an error before the program executes. Static checking performs two
types of checking:
Syntactic Checking
This kind of checking identifies the syntactic errors present in the program. The compiler
examines the sequence of elements of the program. And assures that this sequence forms a
valid statement in the source program.
Type Checking
It checks the operations present in the program and assures that they respect the type system of
the source language. If this is not the case, the compiler performs a type conversion.
Coercion
In coercion, the type of an operand is converted according to the type expected by the operator.
For example, in the expression 2 * 3.14, the integer 2 is coerced to the float 2.0 before the
multiplication.
Three-Address Code
The three-address code is another form of intermediate representation, like a syntax tree. The
parser generates a syntax tree, and the intermediate code generator generates the three-address
code. This intermediate representation is a sequence of assembly-like instructions.
Note: It is possible for the compiler to construct a syntax tree and three-address code parallelly.
Three-Address Instructions
1. A three-address instruction has at most three operands (addresses) per instruction, and each
operand acts like a register.
2. A three-address assignment instruction has at most one operator on the right side of the
instruction.
3. To hold the value computed by a three-address instruction, the compiler generates a
temporary name.
Three-address instructions execute sequentially unless a conditional or unconditional jump
changes the flow of control.
Address and Instruction
The three-address code is based on two concepts addresses and the instruction.
Address:
1. Name: A name is the name of a variable in the source program. In three-address code,
source-program names appear as the addresses in the instructions.
Instructions:
Assignment Instructions
x = y op z, where x, y, and z are addresses and op is an arithmetic or logical binary
operator.
Copy Instruction
The copy instruction x = y, where x and y are addresses and the value at the address of y is set at the
address of x.
Jump Instruction
The conditional jumps if x goto L and ifFalse x goto K: the instruction with label L is
executed if x is true, and if x is false, the instruction with label K is executed.
The conditional jump if x relop y goto L, where relop is a relational operator: if x relop y
holds, the instruction with label L is executed; if not, the instruction following this
conditional instruction is executed.
The unconditional jump goto L: the instruction with label L is executed next.
Advanced Instructions:
1. Procedure Call
The instruction y = call p, n makes a procedure or function call, where p is the name
of the procedure and n is an integer indicating the number of parameters passed. The
instruction return y indicates that y is the returned value.
2. Indexed Copy Instruction
The instruction x = y[i] sets x to the value stored i memory units beyond location y. The
instruction x[i] = y stores the value of y i memory units beyond location x.
3. Address and Pointer Assignment
In x = &y, the l-value (address) of y is assigned to x. In x = *y, the r-value of y is the
l-value of some location, say z, and the r-value of z is assigned to x. The instruction
*x = y stores the r-value of y at the location pointed to by x.
Consider that a compiler translates a source program directly into a target program. And it
doesn’t generate an intermediate code between. Here we can execute the generated target code
only on the machine on which the compiler was executed.
In this case, we need a native compiler for each of the new machines.
The intermediate code wipes out the need for the full native compiler for each machine.
So, this is all about intermediate code generation. We have discussed where the
intermediate code generation phase gets its input and what output it generates. Further, we
have discussed the types of intermediate representation and concluded with the benefits of
the intermediate code generator.
A syntax tree's nodes can each be implemented as a record with several fields. In the node
for an operator, one field identifies the operator while the remaining fields contain pointers to the
operand nodes; the operator is also known as the node's label. The nodes of the syntax tree for
expressions with binary operators are created using the following functions. Each function returns
a reference to the node that was most recently created.
1. mknode (op, left, right): It creates an operator node with the name op and two fields,
containing left and right pointers.
2. mkleaf (id, entry): It creates an identifier node with the label id and the entry field, which is a
reference to the identifier’s symbol table entry.
3. mkleaf (num, val): It creates a number node with the label num and a field containing the
number's value, val. For example, to make a syntax tree for the expression a - 4 + c, the functions
are called in sequence, returning pointers p1, p2, ..., p5 to the nodes created; the leaves for the
identifiers 'a' and 'c' carry pointers to their symbol-table entries.
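A minimal sketch of mknode/mkleaf is shown below. We use dicts for nodes and plain strings in place of real symbol-table entries; those simplifications are ours, not the text's.

```python
# Illustrative versions of the node constructors from the text.

def mknode(op, left, right):
    """Operator node: label plus pointers to the two operand nodes."""
    return {"label": op, "left": left, "right": right}

def mkleaf_id(entry):
    """Identifier leaf: label 'id' plus a symbol-table entry reference."""
    return {"label": "id", "entry": entry}

def mkleaf_num(val):
    """Number leaf: label 'num' plus the number's value."""
    return {"label": "num", "val": val}

# Building the syntax tree for a - 4 + c, bottom-up:
p1 = mkleaf_id("entry-a")       # leaf for a
p2 = mkleaf_num(4)              # leaf for 4
p3 = mknode("-", p1, p2)        # a - 4
p4 = mkleaf_id("entry-c")       # leaf for c
p5 = mknode("+", p3, p4)        # (a - 4) + c

print(p5["label"], p5["left"]["label"])   # + -
```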
Example 1: Syntax Tree for the string a – b ∗ c + d is:
A syntax tree basically has two variants which are described below:
1. Directed Acyclic Graphs for Expressions (DAG)
2. The Value-Number Method for Constructing DAGs
A DAG, like an expression's syntax tree, has leaves that correspond to atomic operands and
interior nodes that correspond to operators. If N denotes a common subexpression, a node N in a
DAG has multiple parents; in a syntax tree, the tree for the common subexpression would be
duplicated as many times as the subexpression appears in the original expression. As a result, a
DAG not only encodes expressions more concisely but also gives the compiler essential
information about how to generate efficient code to evaluate the expressions.
The Directed Acyclic Graph (DAG) is a tool that shows the structure of fundamental blocks,
allows you to examine the flow of values between them, and also allows you to optimize them.
DAG allows for simple transformations of fundamental pieces.
Properties of DAG are:
1. Leaf nodes represent identifiers, names, or constants.
2. Interior nodes represent operators.
In the value-number method, given the input fields op, l, and r, we calculate a bucket index
h(op, l, r) and search the list of cells in this bucket for the specified input node. There are
usually enough buckets that no list holds more than a few cells.
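The node-sharing idea behind the value-number method can be sketched with an ordinary dict as the hash table (Python's dict plays the role of the bucket structure h(op, l, r) here; the expression chosen is our own example):

```python
# Sketch of the value-number method: nodes are interned by their
# signature (op, left, right), so a repeated subexpression maps to the
# same node and the structure becomes a DAG rather than a tree.

nodes = []            # the array of created nodes
table = {}            # signature (op, l, r) -> value number (index in nodes)

def get_node(op, l=None, r=None):
    key = (op, l, r)
    if key in table:              # common subexpression: reuse the node
        return table[key]
    nodes.append(key)
    table[key] = len(nodes) - 1   # its value number
    return table[key]

# a + a*(b - c) + (b - c)*d  --  (b - c) is built only once
a, b, c, d = (get_node(x) for x in "abcd")
bc  = get_node("-", b, c)
t1  = get_node("*", a, bc)
bc2 = get_node("-", b, c)         # returns the existing (b - c) node
t2  = get_node("*", bc2, d)
e   = get_node("+", get_node("+", a, t1), t2)
print(bc == bc2, len(nodes))      # True 9
```

Because the second (b - c) reuses the first node, only 9 nodes exist instead of the 10 a plain syntax tree would need.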
Three Address Code is a simple sequence of statements that is a kind of intermediate code and
simple to convert to machine code. It employs three addresses and one operator to define an
expression. It helps us to determine the sequence of operations performed by the compiler.
General illustration
a = b op c
Where a, b, or c stand for operands such as names, constants, compiler-generated temporaries, and
op stands for the operator.
There can only be one operator on the right side of instruction in a three-address code; no built-up
arithmetic expressions are allowed. As a result, a source-language statement like x+y*z might be
translated into a three-address instruction sequence.
t₁=y*z
t₂=x+t₁
Here t₁ and t₂ are temporary names produced by the compiler. This unwinding of multi-operator
arithmetic expressions and of nested flow-of-control statements makes three-address code ideal for
target-code generation and optimization. Using names for the intermediate values computed by a
program makes it simple to rearrange three-address code.
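The translation of an expression like x + y * z into such a sequence can be sketched as a small generator. The helper names (new_temp, gen_tac) are our own; the point is the "one operator per instruction" rule described above.

```python
# Sketch of three-address-code generation for expression trees:
# new_temp() invents t1, t2, ... and each recursive call emits exactly
# one instruction with one operator.

counter = 0
code = []

def new_temp():
    global counter
    counter += 1
    return f"t{counter}"

def gen_tac(node):
    """node is a name (str) or (op, left, right); returns its address."""
    if isinstance(node, str):
        return node
    op, left, right = node
    l_addr = gen_tac(left)
    r_addr = gen_tac(right)
    t = new_temp()
    code.append(f"{t} = {l_addr} {op} {r_addr}")
    return t

# x + y * z  ->  t1 = y * z ; t2 = x + t1
gen_tac(("+", "x", ("*", "y", "z")))
print("\n".join(code))
```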
Examples
1. Convert a = (-c * b) + (-c * d) into three address code.
Solution
t₁ = -c
t₂ = b*t₁
t₃ = -c
t₄ = d * t₃
t₅ = t₂ + t₄
a = t₅
2. Convert the loop for (i = 1; i < 10; i++) { a[i] = x * 5; } into three address code.
Solution
i=1
L:t1=x*5
t2=&a
t3=sizeof(int)
t4=t3*i
t5=t2+t4
*t5=t1
i=i+1
if i<10 goto L
Quadruple
It is a structure with four fields: op, arg1, arg2, and result. op denotes the operator, arg1 and
arg2 denote the operands, and result records the outcome of the expression.
Benefits
For global optimization, it's simple to restructure code.
Using a symbol table, one may rapidly get the value of temporary variables.
Drawbacks
There are many temporaries.
Creating the temporary variables adds to the time and space complexity.
Example
Convert a = -b * c + d into three address code.
The following is the three-address code (unary minus binds tighter than *, which binds tighter
than +):
t₁ = minus b
t₂ = t₁ * c
t₃ = t₂ + d
a = t₃
The corresponding quadruples are:
(0) uminus b - t₁
(1) * t₁ c t₂
(2) + t₂ d t₃
(3) = t₃ - a
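Quadruples are easy to represent directly. The sketch below (our own encoding, with "-" marking an unused field) stores a = -b * c + d, evaluated with the usual operator precedence:

```python
# Sketch: quadruples as (op, arg1, arg2, result) tuples.
quads = [
    ("uminus", "b",  "-", "t1"),
    ("*",      "t1", "c", "t2"),
    ("+",      "t2", "d", "t3"),
    ("=",      "t3", "-", "a"),
]
for i, (op, a1, a2, res) in enumerate(quads):
    print(f"({i}) {op:6} {a1:3} {a2:3} {res}")
```

Because every temporary has an explicit name in the result field, reordering instructions for optimization only requires moving tuples, which is the benefit noted above.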
Triples
Instead of using an additional temporary variable to represent a single action, a pointer to the triple
is utilized when a reference to another triple's value is required. As a result, it only has three fields:
op, arg1, and arg2.
Drawbacks
Temporaries are difficult to rearrange since they are implicit.
It's tough to optimize since it necessitates the relocation of intermediary code. When a triple is
relocated, all triples that relate to it must likewise be changed. The symbol table entry can be
accessed directly using the pointer.
Example
Convert a = -b * c + d into three address code.
The following is the three-address code:
t₁ = minus b
t₂ = t₁ * c
t₃ = t₂ + d
a = t₃
The corresponding triples are:
(0) uminus b -
(1) * (0) c
(2) + (1) d
(3) = a (2)
Indirect Triples
Indirect triples use a separate list of pointers into the triple table; the program is this list, so
instructions can be reordered by permuting the list without touching the triples themselves.
Example: Convert a = b * -c + b * -c into three address code.
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
# Op Arg1 Arg2
(14) uminus c -
(15) * (14) b
(16) uminus c -
(17) * (16) b
(18) + (15) (17)
(19) = a (18)
Instruction list:
(0) (14)
(1) (15)
(2) (16)
(3) (17)
(4) (18)
(5) (19)
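The contrast between triples and indirect triples can be sketched in a few lines (this encoding is illustrative, using list indices in place of the textbook's (14)-(19) numbering):

```python
# Sketch: triples refer to other triples by index; indirect triples add a
# separate instruction list of pointers into the triple table.
triples = [
    ("uminus", "c", None),   # (0)
    ("*",      "b", 0),      # (1) refers to triple (0)
    ("uminus", "c", None),   # (2)
    ("*",      "b", 2),      # (3)
    ("+",      1,   3),      # (4)
    ("=",      "a", 4),      # (5)
]
# Reordering the program means permuting this list only; the triples
# themselves, and the references among them, stay put.
instruction_list = [0, 1, 2, 3, 4, 5]
for n in instruction_list:
    print(n, triples[n])
```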
we will discuss Types and Declarations in compiler design. Typical basic types and declarations for
a language contain boolean, char, integer, float, and void; here, void denotes "the absence of a
value." A type name is a type expression. We can form a type expression by applying the array type
constructor to a number and a type expression.
Declaration involves allocating space in memory and entering type and name in the symbol table.
Memory allocation is consecutive, and it assigns names to memory in the sequence they are
declared in the program.
Type Expressions
A basic type is a type expression. Typical basic types for a language include boolean, char,
integer, float, and void, where void denotes "the absence of a value."
We can form further type expressions by applying type constructors, for example the array type
constructor applied to a number and a type expression.
Types have structure, which we can represent using type expressions: a type expression is either a
primary type or is formed by applying a type constructor operator to a type expression. The basic
types and constructors depend on the language to be checked.
For Example,
The array type int [2][3][2] can be read as an "array of 2 arrays each of 3 arrays of 2 integers each"
and written as a type expression array(2, array(3, array(2, integer))).
Basic types
char, int, double, float are type expressions.
Type names
It is convenient to consider that names of types are type expressions.
Arrays
If E is a type expression and n is an integer, then
array(n, E)
is a type expression denoting the type of an array with elements of type E and indices in the
range 0 ... n - 1.
Products
If E1 and E2 are type expressions, then their Cartesian product
E1 × E2
is a type expression.
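Type expressions nest naturally as tagged tuples. The sketch below (our own encoding) reproduces the earlier reading of int[2][3][2] as array(2, array(3, array(2, integer))):

```python
# Sketch: type expressions as nested tagged tuples.

def array(n, elem):
    """Array type constructor: array(n, E)."""
    return ("array", n, elem)

def product(t1, t2):
    """Cartesian product constructor: T1 x T2."""
    return ("x", t1, t2)

integer = ("basic", "int")

# int [2][3][2]  ->  array(2, array(3, array(2, integer)))
t = array(2, array(3, array(2, integer)))
print(t)  # ('array', 2, ('array', 3, ('array', 2, ('basic', 'int'))))
```

Structural equality of two such tuples corresponds to structural equivalence of the types they denote.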
The first two conditions in the above definition lead to the name equivalence of type expressions if
type names are interpreted as though they stand for themselves.
"If two type expressions are equal, return a specified type; else, error," says a common
type-checking rule. Ambiguities can arise when type expressions are given names that are later
used in other type expressions. The key question is whether a name in a type expression stands
for itself or is an abbreviation for another type expression.
Type names can create implicit cycles by denoting type expressions. The resulting graph can have
cycles owing to recursive types if edges to type names are redirected to the type expressions given
by the names.
Example of a non-cyclic graph:
Grammar:
D -> real, id
D -> integer, id
D -> D1, id
We use ENTER to enter the symbol table and use ATTR to trace the data type.
A sequence of declarations is generated by nonterminal D. Basic, array, and record types are
generated by nonterminal T. Nonterminal B generates one of the basic types int and float.
Nonterminal C, which stands for "component," generates strings of zero or more integers, each
enclosed in brackets. In an array type, a basic type given by B is followed by array components
given by nonterminal C. A record type (T's second production) is a sequence of declarations for
the record fields, all of which are enclosed in curly braces.
Translation of Expressions
An expression with more than one operator, like a + b * c, will translate into instructions with at
most one operator per instruction. An array reference A[i][j] will expand into a sequence of
three-address instructions that calculate an address for the reference.
Operations within Expressions: The syntax-directed definition in Fig. 6.19 builds up the three-
address code for an assignment statement S using attribute code for S and attributes addr and code for
an expression E. Attributes S.code and E.code denote the three-address code for S and E, respectively.
Attribute E.addr denotes the address that will hold the value of E. Recall from Section 6.2.1 that an
address can be a name, a constant, or a compiler-generated temporary.
Consider the last production, E → id, in the syntax-directed definition in Fig. 6.19. When an
expression is a single identifier, say x, then x itself holds the value of the expression. The semantic
rules for this production define E.addr to point to the symbol-table entry for this instance of id. Let
top denote the current symbol table; the function top.get retrieves the entry when applied to the
string representation id.lexeme of this instance of id. E.code is set to the empty string.
The operators + and unary - in Fig. 6.19 are representative of the operators in a typical language. The
semantic rules for E → E1 + E2 generate code to compute the value of E from the values of E1 and
E2. Values are computed into newly generated temporary names. If E1 is computed into E1.addr and
E2 into E2.addr, then E1 + E2 translates into t = E1.addr + E2.addr, where t is a new temporary name.
E.addr is set to t. A sequence of distinct temporary names t1, t2, ... is created by successive calls to
new Temp().
For convenience, we use the notation gen(x '=' y '+' z) to represent the three-address instruction
x = y + z. Expressions appearing in place of variables like x, y, and z are evaluated when passed to
gen, while quoted strings like '=' are taken literally. Other three-address instructions are built up
similarly by applying gen to a combination of expressions and strings.
The translation of E → - E1 is similar: the rules create a new temporary for E and generate an
instruction to perform the unary minus operation.
Finally, the production S → id = E; generates instructions that assign the value of expression E to
the identifier id. The semantic rule for this production uses the function top.get to determine the
address of the identifier represented by id, as in the rules for E → id. S.code consists of the
instructions that compute the value of E into the address E.addr, followed by an assignment to the
address top.get(id.lexeme) for this instance of id.
a = b + - c
=> t₁ = minus c
t₂ = b + t₁
a = t₂
Incremental Translation
The syntax-directed definition in Fig. 6.19 generates the same code as the translation scheme in
Fig. 6.20. The code attribute is not needed in the incremental technique, since a single sequence of
instructions is formed by successive calls to gen. The semantic rule in Fig. 6.20 for E → E1 + E2
merely calls gen to emit an add instruction; the instructions that compute E1 into E1.addr and E2
into E2.addr have already been generated.
A syntax tree can also be built using the approach of Fig. 6.20: instead of emitting an instruction,
the semantic action for E → E1 + E2 uses a constructor to create a node:
E → E1 + E2 { E.addr = new Node('+', E1.addr, E2.addr); }
Type Checking
The technique of verifying and enforcing type restrictions in values is type checking.
Many errors are filtered out by the compiler's lexical-analysis and parsing phases. Still, those
techniques cannot enforce many of the well-formedness requirements of programming languages,
since such requirements are frequently not context-free and cannot be checked by membership in
a context-free grammar.
The type-checking phase of compiler design is interleaved with the syntax analysis phase, so it is
done before a program's execution or translation (static typing), and the information is gathered for
use by subsequent phases, such as the translator, which will naturally combine type calculation with
the actual translation.
Type Systems
The design of a type checker for a language is based on information about the syntactic
constructs in the language, the notion of types, and the rules for assigning types to language
constructs.
For example : “ if both operands of the arithmetic operators of +,- and * are of type integer, then the
result is of type integer ”
Type Expressions
The type of a language construct will be denoted by a “type expression.” A type expression is
either a basic type or is formed by applying an operator called a type constructor to other type
expressions. The sets of basic types and constructors depend on the language to be checked. The
following are the definitions of type expressions:
1. Basic types such as boolean, char, integer, real are type expressions.
A special basic type, type_error , will signal an error during type checking; void denoting “the
absence of a value” allows statements to be checked.
Constructors include:
Arrays : If T is a type expression then array (I,T) is a type expression denoting the type of an
array with elements of type T and index set I.
Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type
expression.
Records : The difference between a record and a product is that a record's fields have names.
The record type constructor is applied to a tuple formed from field names and field types.
For example, the declaration var t: record x: integer; y: real end gives t the type
record((x × integer) × (y × real)).
Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
“pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions : A function in programming languages maps a domain type D to a range type R.
The type of such function is denoted by the type expression D → R
4. Type expressions may contain variables whose values are type expressions.
Type systems
A type system is a collection of rules for assigning type expressions to the various parts of a
program. A type checker implements a type system. It is specified in a syntax-directed manner.
Different type systems may be used by different compilers or processors of the same language.
Checking done by a compiler is said to be static, while checking done when the target program
runs is termed dynamic. Any check can be done dynamically, if the target code carries the type of an
element along with the value of that element.
A sound type system eliminates the need for dynamic checking of type errors because it allows
us to determine statically that these errors cannot occur when the target program runs. That is, if
a sound type system assigns a type other than type_error to a program part, then type errors
cannot occur when the target code for that part is run.
A language is strongly typed if its compiler can guarantee that the programs it accepts will
execute without type errors.
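A syntax-directed type checker in the spirit of the rule quoted above can be sketched in a few lines. The encoding and the coercion rule for mixed int/real operands are our own illustrative choices:

```python
# Toy sketch of a syntax-directed type checker for arithmetic:
# matching integer operands give integer; mixed int/real coerces to real;
# anything else yields type_error, which propagates upward.

def type_of(e):
    """e is a type name ('int', 'real', ...) or a tuple (op, e1, e2)."""
    if isinstance(e, str):
        return e
    op, e1, e2 = e
    t1, t2 = type_of(e1), type_of(e2)
    if t1 == t2 == "int":
        return "int"                       # int op int -> int
    if t1 in ("int", "real") and t2 in ("int", "real"):
        return "real"                      # coercion: int op real -> real
    return "type_error"                    # signal the error, keep checking

print(type_of(("+", "int", ("*", "int", "int"))))   # int
print(type_of(("+", "int", "real")))                # real
print(type_of(("-", "int", ("*", "bool", "int"))))  # type_error
```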
Error Recovery
Since type checking has the potential for catching errors in programs, it is desirable for the type
checker to recover from errors, so it can check the rest of the input. Error handling has to be
designed into the type system right from the start; the type-checking rules must be prepared to
cope with errors.
Control Flow
Boolean expressions
Boolean expressions are composed of the boolean operators && (AND), || (OR), and ! (NOT), applied to boolean variables or relational expressions.
Relational expressions have the form E1 rel E2, where E1 and E2 are arithmetic expressions.
We will consider boolean expressions generated using the following grammar.
B → B || B | B && B | !B | (B) | E rel E | true | false
The attribute rel.op is used to indicate the operator represented by rel among the 6 comparison
operators <, <=, =, !=, > or >=.
It is conventional to assume that || and && are left-associative and || has the lowest precedence,
then && and finally !.
Given an expression B1 || B2, if we determine that B1 is true, then we can conclude that the entire
expression is true without evaluating B2. Similarly, given B1 && B2, if B1 is false, the entire
expression is false.
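Python's own `or` and `and` short-circuit in exactly this way, so the rule can be demonstrated directly. In this small sketch (helper name `operand` is ours), each operand records when it is actually evaluated:

```python
calls = []

def operand(name, value):
    calls.append(name)          # record that this operand was evaluated
    return value

# B1 || B2: once B1 is known to be true, B2 is never evaluated
result = operand("B1", True) or operand("B2", True)
assert result is True and calls == ["B1"]

calls.clear()
# B1 && B2: once B1 is known to be false, B2 is never evaluated
result = operand("B1", False) and operand("B2", True)
assert result is False and calls == ["B1"]
```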
Short-circuit code
In short-circuit code, the boolean operators &&, ||, and ! translate into jumps. The operators
themselves do not appear in the code; instead, the value of a boolean expression is represented by a
position in the code sequence. An example is shown below:
We have the statement:
if(x < 100 || x > 200 && x != y) x = 0;
The statement is translated into the following code segment:
        if x < 100 goto L2
        ifFalse x > 200 goto L1
        ifFalse x != y goto L1
L2:     x = 0
L1:
In the above translation, the boolean expression is true if control reaches label L2. If the
expression is false, control goes to L1, skipping L2 and the assignment x = 0.
Flow-of-control statements
In this section, we consider the translation of boolean expressions into three-address code in the
context of statements generated by the following grammar;
S → if(B) S1
S → if(B) S1 else S2
S → while(B) S1
The if-else translation
Inside B.code are jumps based on the value of B. If B is true, control flows to the first
instruction of S1.code; if B is false, control flows to the instruction that
follows S1.code.
Inherited attributes are used to manage labels for B.code and S.code. With a boolean expression B,
we associate two labels: B.true, the label to which control flows if B is true, and B.false, the label to which control flows if B is false.
With a statement S, we associate an inherited attribute S.next that denotes the label of the instruction
immediately following the code for S.
In some cases, the instruction that follows S.code is itself a jump to some label L;
a jump to a jump to L from within S.code is avoided by using S.next.
We have the following Syntax-directed definition that produces a three-address code for boolean
expressions in the context of if, if-else and while statements.
1. If B is of the form B1 || B2 and B1 is true, then B is true, so B1.true is the same as B.true. If B1 is false, B2 must be evaluated, so we make B1.false the label of the first instruction in the code for B2. The true and false exits of B2 are the same as the true and false exits of B.
2. Similarly, if B is of the form B1 && B2 and B1 is false, then B is false, so B1.false is the same as B.false. If B1 is true, B2 must be evaluated, so B1.true labels the first instruction in the code for B2; the true and false exits of B2 are the same as those of B.
3. No code is needed for an expression B of the form !B1: we just interchange the true and false exits
of B to get those for B1.
4. The constants true and false translate into jumps to B.true and B.false, respectively.
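The rules above can be sketched as a recursive generator of jumping code. The following Python illustration is our own simplification, not the book's implementation: boolean expressions are encoded as nested tuples, and label numbering starts at L3 so that L2 and L1 remain free for the statement's true and false exits.

```python
code = []
label_count = 2        # L1 and L2 are reserved for the statement's exits

def new_label():
    global label_count
    label_count += 1
    return f"L{label_count}"

def emit(instr):
    code.append(instr)

def jump(b, true_lbl, false_lbl):
    op = b[0]
    if op == "or":                      # rule 1: B -> B1 || B2
        b1_false = new_label()          # B1.false labels the start of B2's code
        jump(b[1], true_lbl, b1_false)  # B1.true is the same as B.true
        emit(f"{b1_false}:")
        jump(b[2], true_lbl, false_lbl)
    elif op == "and":                   # rule 2: B -> B1 && B2
        b1_true = new_label()           # B1.true labels the start of B2's code
        jump(b[1], b1_true, false_lbl)  # B1.false is the same as B.false
        emit(f"{b1_true}:")
        jump(b[2], true_lbl, false_lbl)
    elif op == "not":                   # rule 3: swap the true and false exits
        jump(b[1], false_lbl, true_lbl)
    elif op == "rel":                   # B -> E1 rel E2
        emit(f"if {b[1]} {b[2]} {b[3]} goto {true_lbl}")
        emit(f"goto {false_lbl}")
    elif op == "true":                  # rule 4
        emit(f"goto {true_lbl}")
    elif op == "false":
        emit(f"goto {false_lbl}")

# if (x < 100 || x > 200 && x != y) x = 0;
cond = ("or", ("rel", "x", "<", "100"),
              ("and", ("rel", "x", ">", "200"), ("rel", "x", "!=", "y")))
jump(cond, "L2", "L1")
```

Running this yields the translation with redundant gotos (a `goto L3` immediately followed by `L3:`), the very redundancy that the fall-through technique discussed later in this unit removes.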
Avoiding redundant gotos
In the if statement from the short-circuit code section above, the comparison x > 200 translates into:
        if x > 200 goto L4
        goto L1
Consider instead the instruction ifFalse x > 200 goto L1. This ifFalse instruction takes advantage
of the natural flow of instructions from one to the next in sequence: control simply falls through
to L4 if x > 200, and thus a jump is avoided.
In the code layouts shown earlier, the statement S1 immediately follows the code for the boolean
expression B. Using a special label fall, which means 'do not generate any jump', we adapt the semantic
rules so as to allow control to fall through from the code for B to the code for S1.
Two instructions are generated if both B.true and B.false are explicit labels,
meaning neither equals fall. Otherwise, if B.true is explicit, then B.false must be fall, and we
generate an if instruction that lets control fall through if the condition is false.
Conversely, if B.false is explicit, we generate an ifFalse instruction. In the last case,
both B.true and B.false are fall, and no jump is generated.
We have the following semantic rules for B → B1 || B2 using the special label fall:
B1.true  = if B.true ≠ fall then B.true else newlabel()
B1.false = fall
B2.true  = B.true
B2.false = B.false
B.code   = if B.true ≠ fall then B1.code || B2.code
           else B1.code || B2.code || label(B1.true)
Here the meaning of the label fall for B is different from its meaning for B1. Suppose B.true is fall;
that is, control falls through B if B evaluates to true. Although B evaluates to true if B1 does,
B1.true must ensure that control jumps over the code for B2 to reach the next instruction
after B.
On the other hand, if B1 evaluates to false, the truth value of B is determined by the value of B2,
so the rules ensure that B1.false corresponds to control falling through from B1 to the code
for B2.
Note that the semantic rules for B → B1 && B2 are similar.
Boolean values and jumping code.
Apart from being used to alter the flow of control in statements, boolean
expressions can also be evaluated for their values. There are two standard ways to handle both roles:
1. Use two passes: build a complete syntax tree for the input, then traverse the tree in depth-first
order, computing the translations specified by the semantic rules.
2. Use one pass for statements and two passes for expressions: here we translate E in while
(E) S1 before S1 is examined; the translation of E is done by constructing its syntax tree and
traversing it.
We have the following grammar with a single nonterminal E for expressions:
S → id = E ; | if (E) S | while (E) S | S S
E → E || E | E && E | E rel E | E + E | (E) | id | true | false
The nonterminal E governs the flow of control in S → while (E) S1, and it denotes a value in
S → id = E ; and in E → E + E.
We handle the two roles of expressions by using separate code generation functions. We assume
that an attribute E.n denotes the syntax tree node for an expression E and that nodes are objects.
We have the method jump that generates the jumping code at an expression node and the
method rvalue that generates code for computing the value of the node into a temporary.
When E appears in S → while (E) S1, the method jump is called at node E.n; jump is
implemented using the rules for translating boolean expressions given earlier.
The call E.n.jump(t, f) generates jumping code, where t is a label for the first
instruction of S1.code and f is the label S.next.
When E appears in S → id = E ;, the method rvalue is called at node E.n. If E is of the
form E1 + E2, E.n.rvalue() generates ordinary three-address code for the sum. If E is of the form
E1 && E2, we first generate jumping code for E and then assign true or false to a new temporary t
at the true and false exits, respectively, of the jumping code.
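The rvalue treatment of E1 && E2 can be sketched as follows. This is our own simplified Python illustration (helper names `emit` and the label generator are assumptions, and the operand values are taken to be already computed into places x and y): jumping code is generated first, then true/false is assigned to a fresh temporary at the two exits.

```python
code = []
labels = iter(f"L{i}" for i in range(1, 100))   # fresh-label supply

def emit(instr):
    code.append(instr)

def rvalue_and(e1_place, e2_place, t="t"):
    # For E of the form E1 && E2: generate jumping code for E, then assign
    # true/false to temporary t at the true and false exits of that code.
    true_lbl, false_lbl, next_lbl = next(labels), next(labels), next(labels)
    emit(f"ifFalse {e1_place} goto {false_lbl}")   # jumping code for E1 && E2
    emit(f"ifFalse {e2_place} goto {false_lbl}")
    emit(f"{true_lbl}: {t} = true")                # true exit
    emit(f"goto {next_lbl}")
    emit(f"{false_lbl}: {t} = false")              # false exit
    emit(f"{next_lbl}:")
    return t

rvalue_and("x", "y")
```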
Back patching
When generating code for boolean expressions and flow-of-control statements in a single pass, the targets of some jumps are not yet known when the jump instructions are emitted. Backpatching leaves such targets temporarily unfilled: each jump is entered on a list, and the target field of every instruction on a list is filled in (backpatched) once the label becomes known. Lists of jumps are manipulated with three functions:
1. makelist(i) creates a new list containing only i, an index into the array of quadruples, and
returns a pointer to the list it has made.
2. merge(p1, p2) concatenates the lists pointed to by p1 and p2, and returns a pointer to the
concatenated list.
3. backpatch(p, i) inserts i as the target label for each of the statements on the list pointed to by p.
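These list-handling functions are small enough to sketch directly. In this hedged Python illustration (representation choices are ours: a quadruple is a four-element list whose last slot is the jump target, and "pointers to lists" are simply Python lists), an unfilled goto is emitted and later backpatched:

```python
quadruples = []   # each entry: [op, arg1, arg2, target-or-None]

def nextquad():
    # index of the next quadruple to be generated
    return len(quadruples)

def emit(op, arg1=None, arg2=None, target=None):
    quadruples.append([op, arg1, arg2, target])

def makelist(i):
    # a new list containing only i, an index into quadruples
    return [i]

def merge(p1, p2):
    # concatenate the two lists of quadruple indices
    return p1 + p2

def backpatch(p, i):
    # insert i as the target label of every statement on list p
    for q in p:
        quadruples[q][3] = i

# E -> true: E.truelist := makelist(nextquad); emit('goto _')
truelist = makelist(nextquad())
emit("goto")
# ... later, once the target (say quadruple 7) becomes known:
backpatch(truelist, 7)
```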
Boolean Expressions:
Consider the production E → E1 and M E2. If E1 is false, then E is also false, so the statements on
E1.falselist become part of E.falselist. If E1 is true, then we must next test E2, so the target
for the statements on E1.truelist must be the beginning of the code generated for E2. This
target is obtained using the marker nonterminal M.
Attribute M.quad records the number of the first statement of E2.code. With the production M → ɛ
we associate the semantic action
{ M.quad := nextquad }
The variable nextquad holds the index of the next quadruple to follow. This value will be
backpatched onto E1.truelist when we have seen the remainder of the production E →
E1 and M E2. The translation scheme is as follows:
(1) E → E1 and M E2  { backpatch(E1.truelist, M.quad);
                       E.truelist := E2.truelist;
                       E.falselist := merge(E1.falselist, E2.falselist) }
(2) E → E1 or M E2   { backpatch(E1.falselist, M.quad);
                       E.truelist := merge(E1.truelist, E2.truelist);
                       E.falselist := E2.falselist }
(3) E → not E1       { E.truelist := E1.falselist;
                       E.falselist := E1.truelist }
(4) E → ( E1 )       { E.truelist := E1.truelist;
                       E.falselist := E1.falselist }
(5) E → id1 relop id2 { E.truelist := makelist(nextquad);
                        E.falselist := makelist(nextquad + 1);
                        emit('if' id1.place relop.op id2.place 'goto _');
                        emit('goto _') }
(6) E → true         { E.truelist := makelist(nextquad);
                       emit('goto _') }
(7) E → false        { E.falselist := makelist(nextquad);
                       emit('goto _') }
(8) M → ɛ            { M.quad := nextquad }
Flow-of-Control Statements:
(1) S → if E then S
(2)   | if E then S else S
(3)   | while E do S
(4)   | begin L end
(5)   | A
(6) L → L ; S
(7)   | S
(1) S → if E then M1 S1 N else M2 S2
        { backpatch(E.truelist, M1.quad);
          backpatch(E.falselist, M2.quad);
          S.nextlist := merge(S1.nextlist, merge(N.nextlist, S2.nextlist)) }
We backpatch the jumps when E is true to the quadruple M1.quad, which is the
beginning of the code for S1. Similarly, we backpatch the jumps when E is false to go to the
beginning of the code for S2. The list S.nextlist includes all jumps out of S1 and S2, as well
as the jump generated by N.
(2) N → ɛ             { N.nextlist := makelist(nextquad);
                        emit('goto _') }
(3) M → ɛ             { M.quad := nextquad }
(6) S → begin L end   { S.nextlist := L.nextlist }
(7) S → A             { S.nextlist := nil }
(8) L → L1 ; M S      { backpatch(L1.nextlist, M.quad);
                        L.nextlist := S.nextlist }
The statement following L1 in order of execution is the beginning of S. Thus the list L1.nextlist is
backpatched to the beginning of the code for S, which is given by M.quad.
(9) L → S             { L.nextlist := S.nextlist }
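Rule (1) can be walked through concretely. The following self-contained Python sketch is our own illustration (quadruples are simplified to [op, target] pairs, and the test and statements are placeholders): it generates code for `if E then S1 else S2` and backpatches E's jumps to M1.quad and M2.quad, and the jump from N past S2.

```python
quads = []                                # each entry: [op, target-or-None]

def nextquad(): return len(quads)
def emit(op, target=None): quads.append([op, target])
def makelist(i): return [i]
def merge(*ls): return [q for l in ls for q in l]
def backpatch(p, i):
    for q in p: quads[q][1] = i

# E: a single relational test -> "if x<y goto _" plus "goto _"
E_true  = makelist(nextquad())
E_false = makelist(nextquad() + 1)
emit("if x<y goto"); emit("goto")
M1 = nextquad()                           # M1.quad: start of S1's code
emit("x = 1")                             # S1 (placeholder statement)
N = makelist(nextquad()); emit("goto")    # N: jump over S2, target unknown
M2 = nextquad()                           # M2.quad: start of S2's code
emit("x = 0")                             # S2 (placeholder statement)

backpatch(E_true, M1)                     # true jumps enter S1
backpatch(E_false, M2)                    # false jumps enter S2
S_next = merge(N)                         # S.nextlist: jumps out of the statement
backpatch(S_next, nextquad())             # resolved at the following instruction
```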
Procedures
Procedures and their implementation will be discussed at length later, along with the run-time management
of storage for names. We use the term function in this section for a procedure that returns a value.
We briefly discuss function declarations and three-address code for function calls. In three-address
code, a function call is unraveled into the evaluation of parameters in preparation for the call, followed
by the call itself. For simplicity, we assume that parameters are passed by value; other parameter-passing
methods are discussed later.
Example: Suppose that a is an array of integers, and that f is a function from integers to integers.
Then, the assignment n = f(a[i]); might translate into the following three-address code (assuming
integers occupy four bytes):
1) t1 = i * 4
2) t2 = a [ t1 ]
3) param t2
4) t3 = call f, 1
5) n = t3
The first two lines compute the value of the expression a[i] into temporary t2, as discussed. Line
3 makes t2 an actual parameter for the call on line 4 of f with one parameter. Line 4 assigns the value
returned by the function call to t3. Line 5 assigns the returned value to n.
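The unraveling of a call into param instructions followed by the call itself can be sketched as a small emitter. This Python illustration is our own (helper names are assumptions); to line up with the example above, temporaries t1 and t2 are taken as already used, and the argument a[i] as already computed into t2:

```python
code = []
temps = 2                 # t1 and t2 already used by the a[i] computation

def new_temp():
    global temps
    temps += 1
    return f"t{temps}"

def emit(instr):
    code.append(instr)

def gen_call(fname, arg_places):
    # emit one 'param' per actual parameter (whose values are assumed
    # already computed into arg_places), then the call itself
    for place in arg_places:
        emit(f"param {place}")
    t = new_temp()
    emit(f"{t} = call {fname}, {len(arg_places)}")
    return t

# n = f(a[i]), with a[i] already in t2
result = gen_call("f", ["t2"])
emit(f"n = {result}")
```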
The productions in Fig. allow function definitions and function calls. (The syntax generates unwanted
commas after the last parameter, but is good enough for illustrating translation.)
Nonterminals D and T generate declarations and types, respectively, as in Section 6.3. A function
definition generated by D consists of the keyword define, a return type, the function name, formal
parameters in parentheses, and a function body consisting of a statement. Nonterminal F generates
zero or more formal parameters, where a formal parameter consists of a type followed by an
identifier. Nonterminals S and E generate statements and expressions, respectively. The production
for S adds a statement that returns the value of an expression. The production for E adds function
calls, with actual parameters generated by A. An actual parameter is an expression.
The procedure is such an important and frequently used programming construct that it is
imperative for a compiler to generate good code for procedure calls and returns. The run-time routines that
handle procedure parameter passing, calls, and returns are part of the run-time support package.
Mechanisms for run-time support are discussed later.