Abstract Syntax Trees (AST) : Modern Compiler Design by David Galles University of San Francisco

Download as pdf or txt
Download as pdf or txt
You are on page 1of 41

Abstract Syntax Trees (AST)

Modern Compiler Design


by David Galles
University of San Francisco
Abstract Syntax Trees (ASTs)
• Parse trees not only prove that a given string can
be derived from a context-free grammar, they
also give some indication of the meaning of the
string.
• The parse tree for the expression id+id*id also
provides the information about the execution of
the operators
• Whether the execution is id+(id*id) or (id+id)*id
Parse tree
Parse tree
• What about the non-terminals E, T and F – are
they required to convey the meaning of the
string id+id*id?... NO
• The non-terminals are necessary to describe the
grammar but not required when the syntax
analysis (parsing) has been done successfully.
• We need a simplified version of the parse tree
that conveys all of the structure and – meaning.
• This kind of simplified parse tree is known as an
abstract syntax tree or AST.
Abstract Syntax Tree (AST)
• An abstract syntax tree is a simplified parse tree
which contains the structural information about
the shape of the parse tree, without the non-
terminals of the grammars.
• Non-terminals are required for parsing but they
give no information about the meaning of the
string (expression)
• Non-terminals are removed to produce an
abstract syntax tree.
ASTs - Examples
• ASTs for 3+4*5

• 3*(4+5)
Designing ASTs
• When designing an AST for a programming
language, we need to decide what information
from the parse tree is essential and what can be
discarded.
• We need to structure the tree so that it is easy to
work with the next stages of compilation.
• In general, an AST consists of a root and several
children, as follows:
Designing ASTs
• The root of the tree contains information that
describes the type of the tree – whether the tree
represents an operator expression, an integer
literal, an if statement, and so on.
• The children of the root are subtrees that describe
the various components of the tree itself. For
instance, the children of an operator expression
describe the operands and the children of
assignment statement describe the variable being
modified and the value being assigned.
ASTs for Expressions
• ASTs for expressions are fairly straight forward.
• Simple expressions (integers constants, identifiers
etc.) can be represented by a single node.
• Binary operations are represented as root, which
stores the operation, and a left and a right subtree,
which corresponds to the left and right operands of
the expression.
• What about parenthesis? Parenthesis give us
grouping information – but that information is already
contained within the structure of the tree.
• Thus we need not include parenthesis in an AST.
ASTs for Expressions
• ASTs for: (3+4)*(5+6)

• AST for (4)


• Integer_Literal(4)
ASTs for Variables
• The method that we will use to describe
structured variables as trees is as follows:
• Base variable: Base variable are simple variable,
such as x and y, They are represented by a node
that has a single child – identifier for the variable.
• Array variables: Array variables are used for
array accesses, such as A[3] or A[4+5]. They are
represented by a tree with a root of ‘ArrayVar’, a
variable tree for the ‘base’ of the array and an
expression tree for the index
ASTs for Variables
• Class instance variables: Class instance variables
are used for accessing variables inside of classes,
such as x.y. They are represented by a tree with a
root of ‘ClassVar’, a variable tree for the ‘base’ of
the class instance variable and an identifier for
the variable name of the class instance variable
being accessed.
• Same strategy can be used in the ASTs for records
or structs
ASTs for Variables
• AST for variable x

• AST for class variable x.y


ASTs for Variables
• AST for array variable x[4]

• AST for class variable x.w[4]


ASTs for Variables
• AST for class variable y[3].z
ASTs for Variables
• Consider the following program:
Class simpleClass{
int a;
int b; }
Class complexClass{
int u;
simpleClass v; }
void main(){
complexClass x;
x = new complexClass();
x.v = new simpleClass();
x.v.a = 3; }
• ASTs for x.v.a AND w.x.y.z
ASTs for Variables
• AST for v.w[x.y].z
ASTs for Statements
• AST for Assignment Statements
– To represent an assignment statement, we need to
keep track the l-value of the assignment statement
and the r-value of the assignment statement.
– Thus the AST for an assignment statement will have
two children , a left subtree for the l-value of the
assignment statement and a right subtree for the r-
value side.
ASTs for Statements
• AST for Assignment Statements
AST for if Statements
• To represent an if statement, we need three
pieces of information.
• The test for the id statement, the ‘then’ clause of
the if statement and the ‘else’ clause of an if
statement.
• The else subtree may be empty for an if
statement that does not have an else.
AST for while Statement
• While statements are similar to if statement.
They require an expression tree for the loop test
and a statement tree for the body of the loop as
follows:
AST for BlockStatement
• The AST for a block statement has a variable
number of children – one for each statement in
the block.
AST for Variable Declaration
• We can represent any variable declaration as an
AST with four children.
• The root contains a flag that signifies that the
tree represents a variable declaration. The four
children are as follows:
– An identifier for the type of the variable.
– An identifier for the name of the variable.
– An identifier for the dimensionality of the variable.
Non-array variable will have a dimensionality of 0.
– An initial value for the variable (if any).
AST for Variable Declaration
• If a declaration has no initial value, then the
initial value child of null.
• For example: int x = 3 ; AST will be;

• int y[ ][ ];
AST for Variable Declaration
• AST for x = 4 + y
AST for Class Declaration
• class MyClass {
int x;
boolean B[ ];
}
ASTs Examples
• AST for x.y = u[v.w].z
ASTs Examples
• AST for: if (x>0) x=x-1; else x=x+1;
ASTs Examples
• AST for: while(x>0){ x=x/2; y=y+1; }
AST for Function Declaration
• To represent a function declaration we need to
keep track of several pieces of information:
– Return type of the function (Identifier).
– Name of the function (Identifier).
– List of formal parameters (each of which needs the
same information as a variable declaration).
– Function body (Block statement).
AST Summary
• An AST is a simplified parse tree which
represents the structural information without
non-terminals
• ASTs help to perform the task of next phases of
the compiler
• All language structures can be represented by
abstract syntax trees
• AST can be referred as Intermediate
representation of the source program
Three-Address Code
• Three-address code is a linearized representation
of a syntax tree.
• In three-address code, there is at most one operator
on the right side of an instruction; that is, no built-
up arithmetic expressions are permitted.
• Thus a expression like x+y*z might be translated
into the sequence of three-address instructions:
t1 = y * z
t2 = x + t 1
t1 and t2 are compiler-generated temporary names
Three-Address Code
• Three-address code is desirable for code
generation.
• Three-address representation of the given AST is
as follows.
t1 = a + b
t2 = c + d
t3 = t 1 * t 2
Addresses and Instructions
• Three-address code is built from two concepts:
addresses and instructions.
• An address can be one of the following:
• A name. For convenience, we allow source-program
names to appear as addresses in three-address code.
In an implementation, a source name is replaced by a
pointer to its symbol-table entry, where all
information about the name is kept.
• A constant. In practice, a compiler must deal with
many different types of constants and variables. Type
conversions within expressions are considered.
Addresses and Instructions
• A compiler-generated temporary. It is useful,
especially in optimizing compilers, to create a
distinct name each time a temporary is needed.
These temporaries can be combined, if possible,
when registers are allocated to variables.
Addresses and Instructions
• Consider the statement: do i = i+1; while(a[i]<v);

• In both translations, the last instruction is a


conditional jump to the first instruction.
• The multiplication i * 8 is appropriate for an
array of elements that each take 8 units of space.
Quadruples
• In a compiler, these instructions can be
implemented as objects or as records with fields for
the operator and the operands. Three
such representations are called quadruples, triples,
and indirect triples.
• A quadruple (or just "quad') has four fields, which
we call op, arg1, arg2, and result.
• Instructions with unary operators like x = minus y or
x = y do not use arg2.
• Conditional and unconditional jumps put the target
label in result.
Quadruples
• Three-address code for the assignment a = b * - c
+b * - c ;
• Three-address code and quadruples are
represented as:
Triples
• A triple has only three fields, which we call op,
arg1, and arg2.
• Using triples, we refer to the result of an
operation x op y by its position, rather than by
an explicit temporary name.
Indirect triples
• Indirect triples consist of a listing of pointers to
triples, rather than a listing of triples themselves.
• For example, let us use an array instruction to list
pointers to triples in the desired order. Then, the
triples in Fig. 6.11(b) can be represented as:
Summary
• An intermediate representation is typically some
combination of a graphical notation and three-
address code.
• As in syntax trees, a node in a graphical notation
represents a construct; the children of a node
represent its sub-constructs.
• Three address code takes its name from
instructions of the form x = y op z, with at most
one operator per instruction.
• There are additional instructions for control flow.

You might also like