Abstract Syntax Trees (AST) : Modern Compiler Design by David Galles University of San Francisco
Abstract Syntax Trees (AST) : Modern Compiler Design by David Galles University of San Francisco
Abstract Syntax Trees (AST) : Modern Compiler Design by David Galles University of San Francisco
• 3*(4+5)
Designing ASTs
• When designing an AST for a programming
language, we need to decide what information
from the parse tree is essential and what can be
discarded.
• We need to structure the tree so that it is easy to
work with the next stages of compilation.
• In general, an AST consists of a root and several
children, as follows:
Designing ASTs
• The root of the tree contains information that
describes the type of the tree – whether the tree
represents an operator expression, an integer
literal, an if statement, and so on.
• The children of the root are subtrees that describe
the various components of the tree itself. For
instance, the children of an operator expression
describe the operands and the children of
assignment statement describe the variable being
modified and the value being assigned.
ASTs for Expressions
• ASTs for expressions are fairly straight forward.
• Simple expressions (integers constants, identifiers
etc.) can be represented by a single node.
• Binary operations are represented as root, which
stores the operation, and a left and a right subtree,
which corresponds to the left and right operands of
the expression.
• What about parenthesis? Parenthesis give us
grouping information – but that information is already
contained within the structure of the tree.
• Thus we need not include parenthesis in an AST.
ASTs for Expressions
• ASTs for: (3+4)*(5+6)
• int y[ ][ ];
AST for Variable Declaration
• AST for x = 4 + y
AST for Class Declaration
• class MyClass {
int x;
boolean B[ ];
}
ASTs Examples
• AST for x.y = u[v.w].z
ASTs Examples
• AST for: if (x>0) x=x-1; else x=x+1;
ASTs Examples
• AST for: while(x>0){ x=x/2; y=y+1; }
AST for Function Declaration
• To represent a function declaration we need to
keep track of several pieces of information:
– Return type of the function (Identifier).
– Name of the function (Identifier).
– List of formal parameters (each of which needs the
same information as a variable declaration).
– Function body (Block statement).
AST Summary
• An AST is a simplified parse tree which
represents the structural information without
non-terminals
• ASTs help to perform the task of next phases of
the compiler
• All language structures can be represented by
abstract syntax trees
• AST can be referred as Intermediate
representation of the source program
Three-Address Code
• Three-address code is a linearized representation
of a syntax tree.
• In three-address code, there is at most one operator
on the right side of an instruction; that is, no built-
up arithmetic expressions are permitted.
• Thus a expression like x+y*z might be translated
into the sequence of three-address instructions:
t1 = y * z
t2 = x + t 1
t1 and t2 are compiler-generated temporary names
Three-Address Code
• Three-address code is desirable for code
generation.
• Three-address representation of the given AST is
as follows.
t1 = a + b
t2 = c + d
t3 = t 1 * t 2
Addresses and Instructions
• Three-address code is built from two concepts:
addresses and instructions.
• An address can be one of the following:
• A name. For convenience, we allow source-program
names to appear as addresses in three-address code.
In an implementation, a source name is replaced by a
pointer to its symbol-table entry, where all
information about the name is kept.
• A constant. In practice, a compiler must deal with
many different types of constants and variables. Type
conversions within expressions are considered.
Addresses and Instructions
• A compiler-generated temporary. It is useful,
especially in optimizing compilers, to create a
distinct name each time a temporary is needed.
These temporaries can be combined, if possible,
when registers are allocated to variables.
Addresses and Instructions
• Consider the statement: do i = i+1; while(a[i]<v);