LL1 Parsing

Download as pdf or txt
Download as pdf or txt
You are on page 1of 9

Predictive Parsing For a given non-terminal, the lookahead symbol uniquely determines the production to apply Top-down parsing

ng = predictive parsing Driven by a predictive parsing table of non-terminals X input symbols productions
CS331 LL Parsing

LL Parsing
Reads input from left to right and constructs leftmost derivation (forwards) LL Parsing is predictive Features
input parsed from left to right leftmost derivation (forward) one token lookahead

CS331 LL Parsing

LL(1) Grammars
Definition A grammar G is LL(1) if and only if for each set of productions A ::= 1 | 2| | n
1. FIRST( 1), FIRST(2), FIRST( n ) are all pairwise disjoint, and 2. if i * then FIRST( j) FOLLOW(A) = , for all 1 j n, i j.

Table-driven predictive parser - LL(1)


Input: a string w and a parsing table M for G
push eof push Start Symbol token next token() X top-of-stack repeat if X is a terminal then if X = token then pop X token next token() else error() else /* X is a non-terminal */ if M[X,token] = X Y 1 Y 2 Y k then pop X push Y k , Yk -1, , Y1 else error() X top-of-stack until X = eof if token eof then error()

What rule to select for a given non-terminal and input token can be represented in a parse table M. Algorithm for LL(1) parse table construction must not result in multiple entries for any M[A, a] or M[A, eof]
(Aho, Lam, Sethi, and Ullman, Algorithm 4.31) Whether a grammar is LL(1) or not is decidable
CS331 LL Parsing

Aho, Sethi, and Ullman, Algorithm 4.3


CS331 LL Parsing

Example grammar and its table


Expression grammar with precedence
<goal> ::= <expr> <expr> ::= <term> <expr> <expr> ::= + <expr> | - <expr> | <term> ::= <factor> <term> <term> ::= * <term> | / <term> | <factor> ::= num | id

Another example
S
ES (S)S (ES)S (1S)S (1+S)S (1+ES)S (1+2 S)S

S ES S | +S E num | (S)

( ( 1 1 + 2 2 +
num
ES +S num (S)
CS331 LL Parsing

(1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5 (1+2+(3+4))+5


( )

LL(1) parse table

Parse table:
S S E
CS331 LL Parsing

EOF

ES

How to implement?
The table can be easily converted into a recursive descent parser
num S S E
num ES +S (S)

Recursive descent parsing - LL(1)


Recursive descent is one of the simplest parsing techniques used in practical compilers Each non-terminal has an associated parsing procedure that can recognize any sequence of tokens generated by that nonterminal
Within a parsing procedure, both non-terminals and terminals can be matched: non-terminal A
call parsing procedure for A

(
ES

EOF

token t
compare t with current input token if match, consume input otherwise, ERROR

Three procedures: parse_S, parse_S, parse_E

CS331 LL Parsing

Parsing procedures may contain (call upon) code that performs some useful computation" (syntax directed translation)
CS331 LL Parsing

Recursive-Descent Parser
Lookahead token

void parse_S () { switch (token) { case num: parse_E(); parse_S(); return; case (: parse_E(); parse_S(); return; default: throw new ParseError(); } }
num S S E
num ES +S (S)

S ES S | +S E num | (S)

Recursive-Descent Parser
void parse_S() { switch (token) { case +: token = input.read(); parse_S(); return; case ): return; case EOF: return; default: throw new ParseError(); } }
num +
+S num (S)
CS331 LL Parsing

(
ES

EOF

(
ES

EOF

S S E

ES

CS331 LL Parsing

Recursive-Descent Parser
void parse_E() { switch (token) { case number: token = input.read(); return; case (: token = input.read(); parse_S(); if (token != )) throw new ParseError(); token = input.read(); return; default: throw new ParseError(); } }
num S S E
num ES +S (S)
CS331 LL Parsing

Call tree = Parse tree


parse_S parse_E parse_S parse_E parse_S parse_S
E

(1+2+(3+4))+5
S
S + S 5 S E S S E ( S ) E 3 + S S E S ( S ) E 1 + S

parse_S parse_S

2 +

parse_E

parse_S parse_S parse_E parse_S


CS331 LL Parsing

(
ES

EOF

parse_S

Recall: Expression grammar


Expression grammar with precedence
<goal> ::= <expr> <expr> ::= <term> <expr> <expr> ::= + <expr> | - <expr> | <term> ::= <factor> <term> <term> ::= * <term> | / <term> | <factor> ::= num | id

Recursive Descent Parser


For the expression grammar:
goal: token next token(); if (expr() = ERROR | token EOF) then return ERROR; else return OK; expr: if (term() = ERROR) then return ERROR; else return expr_prime(); Expr_prime: if (token = PLUS) then token next_token(); return expr(); else if (token = MINUS) then token next_token(); return expr(); else return OK;
CS331 LL Parsing

LL(1) parse table

CS331 LL Parsing

Recursive Descent Parser (cont.)


term: if (factor() = ERROR) then return ERROR; else return term_prime(); term_prime: if (token = MULT) then token next token(); return term(); else if (token = DIV) then token next token(); return term(); else return OK; factor: if (token = NUM) then token next token(); return OK; else if (token = ID) then token next token(); return OK; else return ERROR;
CS331 LL Parsing

Constructing parse tables


Needed: algorithm for automatically generating a predictive parse table from a grammar

S ES S | +S E num | (S )

num S S E
num ES

+
+S

(
ES

EOF

(S)

CS331 LL Parsing

Constructing Parse Tables


Use FIRST and FOLLOW sets Recall:
FIRST() for arbitrary string of terminals and nonterminals is the set of symbols that might begin the fully expanded version of FOLLOW(X) for a non-terminal X is the set of symbols that might follow the derivation of X in the input stream

Parse table entries


Consider a production X
Add to the X row for each symbol in FIRST()
num S ES S | +S E num | (S ) S S E
num ES +S (S)

(
ES

EOF

If can derive ( is nullable), add for each symbol in FOLLOW(X)


Follow

First
CS331 LL Parsing

Grammar is LL(1) if there are no conicting entries


CS331 LL Parsing

Computing nullable
X is nullable if it can derive the empty string:
Directly: X Indirectly: X has a production X YZ where all rhs symbols (Y, Z) are nullable

Constructing FIRST sets


FIRST(X) FIRST() if X FIRST(a) = {a} FIRST(X) FIRST(X) FIRST(X) FIRST() if X is nullable Algorithm

Algorithm
Assume all non-terminals non-nullable, apply rules repeatedly until no change in status

Assume FIRST() ={} for all , apply rules repeatedly to build FIRST sets

CS331 LL Parsing

CS331 LL Parsing

Constructing FOLLOW sets


FOLLOW(S) { EOF } if X Y
FOLLOW(Y) = FIRST() FIRST(X) FIRST(X)

Example
Nullable
Only S is nullable

if X Y and is nullable (or non-existent)


FOLLOW(Y) FOLLOW(X)

FIRST

S ES S | +S E num | (S)

Algorithm
Assume FOLLOW(X) = { } for all X, apply rules repeatedly to build FOLLOW sets
Common theme: iterative analysis. Start with initial assignment, apply rules until no change
CS331 LL Parsing

FIRST(ES ) = {num, ( } FIRST(+S) = { + } FIRST(num) = {num} FIRST( (S) ) = { ( } FOLLOW FIRST(S) = { + } FOLLOW(S) = { EOF, ) } FOLLOW(S) = = {EOF, )} FOLLOW(E) = { +, ), EOF}
CS331 LL Parsing

Creating the parse table


For each production X
Add to the X row for each symbol in FIRST() If is nullable, add for each symbol in FOLLOW(X) Entry for [S, EOF]

S ES S | +S E num | (S )

Ambiguous grammars
Construction of predictive parse table for ambiguous grammar results in conicts (but converse does not hold) Grammar and FIRST sets S S + S | S * S | num FIRST(S + S) = FIRST(S * S) = FIRST(num) = { num } Parse table
num S S+S S S*S
CS331 LL Parsing

is ACCEPT

FOLLOW(S) = { EOF, ) } FOLLOW(S) = = {EOF, )} FOLLOW(E) = { +, ), EOF} num S S E


num ES +S

FIRST(ES ) = {num, ( } FIRST(+S) = { + } FIRST(num) = {num} FIRST( (S) ) = { ( } FIRST(S) = { + } (


ES (S)

EOF
Accept

CS331 LL Parsing

LL(1) grammars
Provable facts about LL(1) grammars: no left recursive grammar is LL(1) no ambiguous grammar is LL(1) LL(1) parsers operate in linear time an -free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar

LL grammars
LL(1) grammars may need to rewrite grammar (left recursion removal, left factoring) resulting grammar larger, less maintainable LL(k) grammars k-token lookahead more powerful than LL(1) grammars example:
S ::= ac | abc is LL(2)

Not all grammars are LL(1) S ::= aS | a is not LL(1) FIRST(aS) = FIRST(a) = {a}

Not all grammars are LL(k) Example:


Set of productions of form: S ::= aibj for i j

S ::= aS S ::= aS |
accepts the same language and is LL(1)
CS331 LL Parsing

Problem:
must choose production after k tokens of lookahead

Bottom-up parsers avoid some of these problems


CS331 LL Parsing

Completing the parser


One of the key jobs of the parser is to build an intermediate representation of the source code. To build an abstract syntax tree in the recursive descent parser, we can simply insert code at the appropriate points E.g., for expression grammar: factor() can stack nodes id, num term_prime() can stack nodes *, / term() can pop 3, build and push subtree expr_prime() can stack nodes +, expr() can pop 3, build and push subtree goal() can pop and return tree
CS331 LL Parsing

Creating the AST


abstract class Expr { } class Add extends Expr { Expr left, right; Add(Expr L, Expr R) { left = L; right = R; } } class Num extends Expr { int value; Num (int v) { value = v); }

Expr Num Add

CS331 LL Parsing

AST Representation
(1 + 2 + (3 + 4)) + 5 + + 1 2 3 + + 4 5 Add Num(1) Add Add Num(4) Add Num (5)

Creating the AST


Just add code to each parsing routine to create the appropriate nodes!
Works because parse tree and call tree have the same shape parse_S, parse_S, parse_E all return an Expr: void parse_E() void parse_S() void parse_S() Expr parse_E() Expr parse_S() Expr parse_S()

Num(2)

Num(3)

How can we generate this structure during recursivedescent parsing?


CS331 LL Parsing

CS331 LL Parsing

S ES S | +S E num | (S)

AST creation code


Expr parse_E() { switch(token) { case num: // E number Expr result = Num (token.value); token = input.read(); return result; case (: // E ( S ) token = input.read(); Expr result = parse_S(); if (token != )) throw new ParseError(); token = input.read(); return result; default: throw new ParseError(); } }
CS331 LL Parsing

parse_S
Expr parse_S() { S ES S | +S switch (token) { E num | (S) case num: case (: Expr left = parse_E(); Expr right = parse_S(); if (right == null) return left; else return new Add(left, right); default: throw new ParseError(); } }
CS331 LL Parsing

Oran interpreter!
int parse_S() { switch (token) { case number: case (: int left = parse_E(); int right = parse_S(); if (right == 0) return left; else return left + right; default: throw new ParseError(); } }

int parse_E() { switch(token) { case number: int result = token.value; token = input.read(); return result; case (: token = input.read(); int result = parse_S(); if (token != )) throw new ParseError(); token = input.read(); return result; default: throw new ParseError(); }}
CS331 LL Parsing

You might also like