LL1 Parsing
LL1 Parsing
LL1 Parsing
ng = predictive parsing Driven by a predictive parsing table of non-terminals X input symbols productions
CS331 LL Parsing
LL Parsing
Reads input from left to right and constructs leftmost derivation (forwards) LL Parsing is predictive Features
input parsed from left to right leftmost derivation (forward) one token lookahead
CS331 LL Parsing
LL(1) Grammars
Definition A grammar G is LL(1) if and only if for each set of productions A ::= 1 | 2| | n
1. FIRST( 1), FIRST(2), FIRST( n ) are all pairwise disjoint, and 2. if i * then FIRST( j) FOLLOW(A) = , for all 1 j n, i j.
What rule to select for a given non-terminal and input token can be represented in a parse table M. Algorithm for LL(1) parse table construction must not result in multiple entries for any M[A, a] or M[A, eof]
(Aho, Lam, Sethi, and Ullman, Algorithm 4.31) Whether a grammar is LL(1) or not is decidable
CS331 LL Parsing
Another example
S
ES (S)S (ES)S (1S)S (1+S)S (1+ES)S (1+2 S)S
S ES S | +S E num | (S)
( ( 1 1 + 2 2 +
num
ES +S num (S)
CS331 LL Parsing
Parse table:
S S E
CS331 LL Parsing
EOF
ES
How to implement?
The table can be easily converted into a recursive descent parser
num S S E
num ES +S (S)
(
ES
EOF
token t
compare t with current input token if match, consume input otherwise, ERROR
CS331 LL Parsing
Parsing procedures may contain (call upon) code that performs some useful computation" (syntax directed translation)
CS331 LL Parsing
Recursive-Descent Parser
Lookahead token
void parse_S () { switch (token) { case num: parse_E(); parse_S(); return; case (: parse_E(); parse_S(); return; default: throw new ParseError(); } }
num S S E
num ES +S (S)
S ES S | +S E num | (S)
Recursive-Descent Parser
void parse_S() { switch (token) { case +: token = input.read(); parse_S(); return; case ): return; case EOF: return; default: throw new ParseError(); } }
num +
+S num (S)
CS331 LL Parsing
(
ES
EOF
(
ES
EOF
S S E
ES
CS331 LL Parsing
Recursive-Descent Parser
void parse_E() { switch (token) { case number: token = input.read(); return; case (: token = input.read(); parse_S(); if (token != )) throw new ParseError(); token = input.read(); return; default: throw new ParseError(); } }
num S S E
num ES +S (S)
CS331 LL Parsing
(1+2+(3+4))+5
S
S + S 5 S E S S E ( S ) E 3 + S S E S ( S ) E 1 + S
parse_S parse_S
2 +
parse_E
(
ES
EOF
parse_S
CS331 LL Parsing
S ES S | +S E num | (S )
num S S E
num ES
+
+S
(
ES
EOF
(S)
CS331 LL Parsing
(
ES
EOF
First
CS331 LL Parsing
Computing nullable
X is nullable if it can derive the empty string:
Directly: X Indirectly: X has a production X YZ where all rhs symbols (Y, Z) are nullable
Algorithm
Assume all non-terminals non-nullable, apply rules repeatedly until no change in status
Assume FIRST() ={} for all , apply rules repeatedly to build FIRST sets
CS331 LL Parsing
CS331 LL Parsing
Example
Nullable
Only S is nullable
FIRST
S ES S | +S E num | (S)
Algorithm
Assume FOLLOW(X) = { } for all X, apply rules repeatedly to build FOLLOW sets
Common theme: iterative analysis. Start with initial assignment, apply rules until no change
CS331 LL Parsing
FIRST(ES ) = {num, ( } FIRST(+S) = { + } FIRST(num) = {num} FIRST( (S) ) = { ( } FOLLOW FIRST(S) = { + } FOLLOW(S) = { EOF, ) } FOLLOW(S) = = {EOF, )} FOLLOW(E) = { +, ), EOF}
CS331 LL Parsing
S ES S | +S E num | (S )
Ambiguous grammars
Construction of predictive parse table for ambiguous grammar results in conicts (but converse does not hold) Grammar and FIRST sets S S + S | S * S | num FIRST(S + S) = FIRST(S * S) = FIRST(num) = { num } Parse table
num S S+S S S*S
CS331 LL Parsing
is ACCEPT
EOF
Accept
CS331 LL Parsing
LL(1) grammars
Provable facts about LL(1) grammars: no left recursive grammar is LL(1) no ambiguous grammar is LL(1) LL(1) parsers operate in linear time an -free grammar where each alternative expansion for A begins with a distinct terminal is a simple LL(1) grammar
LL grammars
LL(1) grammars may need to rewrite grammar (left recursion removal, left factoring) resulting grammar larger, less maintainable LL(k) grammars k-token lookahead more powerful than LL(1) grammars example:
S ::= ac | abc is LL(2)
Not all grammars are LL(1) S ::= aS | a is not LL(1) FIRST(aS) = FIRST(a) = {a}
S ::= aS S ::= aS |
accepts the same language and is LL(1)
CS331 LL Parsing
Problem:
must choose production after k tokens of lookahead
CS331 LL Parsing
AST Representation
(1 + 2 + (3 + 4)) + 5 + + 1 2 3 + + 4 5 Add Num(1) Add Add Num(4) Add Num (5)
Num(2)
Num(3)
CS331 LL Parsing
S ES S | +S E num | (S)
parse_S
Expr parse_S() { S ES S | +S switch (token) { E num | (S) case num: case (: Expr left = parse_E(); Expr right = parse_S(); if (right == null) return left; else return new Add(left, right); default: throw new ParseError(); } }
CS331 LL Parsing
Oran interpreter!
int parse_S() { switch (token) { case number: case (: int left = parse_E(); int right = parse_S(); if (right == 0) return left; else return left + right; default: throw new ParseError(); } }
int parse_E() { switch(token) { case number: int result = token.value; token = input.read(); return result; case (: token = input.read(); int result = parse_S(); if (token != )) throw new ParseError(); token = input.read(); return result; default: throw new ParseError(); }}
CS331 LL Parsing