Chapter 3-Syntax Analysis-II
Chapter 3-Syntax Analysis-II
Chapter 3-Syntax Analysis-II
Chapter 3
Lecture 2
Top-Down Parsing
1
Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to try other
alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.
21 Q
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.
S aBc
B bc | b
S S
input: abc
a B c a B c
fails, backtrack
b c b
3
Predictive Parser
a grammar a grammar suitable for predictive
eliminate left parsing (a LL(1) grammar)
left recursion factor no %100 guarantee.
current token
4
Left Factoring
A predictive parser (a top-down parser without backtracking) insists that the grammar
must be left-factored.
grammar a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt | if expr then stmt when we see if, we cannot now
which production rule to choose to re-write stmt in the derivation.
In general,
A → βα1 | βα where α is non-empty and the
first symbols of β1 and β2 (if they have one)are different.
when processing α we cannot know whether expand A to βα1 or A to βα2 But, if
we re-write the grammar as follows
A → αA’
A’ → β1 | β2 so, we can immediately expand A to αA’
5
Predictive Parser (example)
stmt if ...... |
while ...... |
begin ...... |
for .....
• When we are trying to write the non-terminal stmt, if the current token
is if we have to choose first production rule.
• When we are trying to write the non-terminal stmt, we can uniquely
choose the production rule by just looking the current token.
• We eliminate the left recursion in the grammar, and left factor it. But it
may not be suitable for predictive parsing (not LL(1) grammar).
6
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.
proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}
7
Recursive Predictive Parsing (cont.)
A aBb | bAB
proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}
8
Recursive Predictive Parsing (cont.)
• When to apply -productions.
A aA | bB |
9
Recursive Predictive Parsing (Example)
A aBe | cBd | C
B bB |
Cf
proc C { match the current token with f,
proc A { and move to the next token; }
case of the current token {
a: - match the current token with a,
and move to the next token; proc B {
- call B; case of the current token {
- match the current token with e, b: - match the current token with b,
and move to the next token; and move to the next token;
c: - match the current token with c, - call B
and move to the next token; e,d: do nothing
- call B; }
- match the current token with d, }
and move to the next token;
f: - call C
}
}
follow set of B
first set of C
10
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) Parser.
input buffer
Parsing Table
11
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.
output
– a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input
buffer.
stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S. $S initial stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is completed.
parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.
12
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string
(say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $ parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
parser pops X from the stack, and moves the next symbol in the input buffer.
3. If X is a non-terminal
parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule
XY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The
parser also outputs the production rule XY1Y2...Yk to represent a step of the
derivation.
4. none of the above error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.
13
LL(1) Parser – Example1
S aBa a b $ LL(1) Parsing
B bB | S S aBa Table
B B B bB
14
LL(1) Parser – Example1 (cont.)
Derivation(left-most): SaBaabBaabbBaabba
S
parse tree
a B a
b B
b B
15
LL(1) Parser – Example2
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
id + * ( ) $
E E TE’ E TE’
E’ E’ +TE’ E’ E’
T T FT’ T FT’
T’ T’ T’ *FT’ T’ T’
F F id F (E)
16
LL(1) Parser – Example2
stack input output
$E id+id$ E TE’
$E’T id+id$ T FT’
$E’ T’F id+id$ F id
$ E’ T’id id+id$
$ E ’ T’ +id$ T’
$ E’ +id$ E’ +TE’
$ E’ T+ +id$
$ E’ T id$ T FT’
$ E ’ T’ F id$ F id
$ E’ T’id id$
$ E ’ T’ $ T’
$ E’ $ E’
$ $ accept
17
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST FOLLOW
18
Compute FIRST for Any String X
• If X is a terminal symbol FIRST(X)={X}
• If X is a non-terminal symbol and X is a production rule
is in FIRST(X).
• If X is a non-terminal symbol and X Y1Y2..Yn is a production rule
if a terminal a in FIRST(Yi) and is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
if is in all FIRST(Yj) for j=1,...,n
then is in FIRST(X).
• If X is FIRST(X)={}
• If X is Y1Y2..Yn
if a terminal a in FIRST(Yi) and is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
if is in all FIRST(Yj) for j=1,...,n
then is in FIRST(X).
19
FIRST Example
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
20
Compute FOLLOW (for non-terminals)
• If S is the start symbol $ is in FOLLOW(S)
• If ( A B is a production rule ) or
( A B is a production rule and is in FIRST() )
everything in FOLLOW(A) is in FOLLOW(B).
We apply these rules until nothing more can be added to any follow set.
21
FOLLOW Example
E TE’
E’ +TE’ |
T FT’
T’ *FT’ |
F (E) | id
FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }
22
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A of a grammar G
– for each terminal a in FIRST()
add A to M[A,a]
– If in FIRST()
for each terminal a in FOLLOW(A) add A to M[A,a]
– If in FIRST() and $ in FOLLOW(A)
add A to M[A,$]
• All other undefined entries of the parsing table are error entries.
23
Constructing LL(1) Parsing Table -- Example
E TE’ FIRST(TE’)={(,id} E TE’ into M[E,(] and M[E,id]
E’ +TE’ FIRST(+TE’ )={+} E’ +TE’ into M[E’,+]
E’ FIRST()={} none
but since in FIRST()
and FOLLOW(E’)={$,)} E’ into M[E’,$] and M[E’,)]
T’ FIRST()={} none
but since in FIRST()
and FOLLOW(T’)={$,),+} T’ into M[T’,$], M[T’,)] and M[T’,+]
24
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is said
to be LL(1) grammar.
• The parsing table of a grammar may contain more than one production
rule. In this case, we say that it is not a LL(1) grammar.
25
A Grammar which is not LL(1)
SiCtSE | a FOLLOW(S) = { $,e }
EeS | FOLLOW(E) = { $,e }
Cb FOLLOW(C) = { t }
FIRST(iCtSE) = {i}
a b e i t $
FIRST(a) = {a}
S Sa S iCtSE
FIRST(eS) = {e}
E EeS E
FIRST() = {}
E
FIRST(b) = {b}
C Cb
Problem ambiguity
26
A Grammar which is not LL(1) (cont.)
• What do we have to do it if the resulting parsing table contains multiply
defined entries?
– If we didn’t eliminate left recursion, eliminate the left recursion in the grammar.
– If the grammar is not left factored, we have to left factor the grammar.
– If its (new grammar’s) parsing table still contains multiply defined entries, that grammar is
ambiguous or it is inherently not a LL(1) grammar.
• A left recursive grammar cannot be a LL(1) grammar.
– A A |
any terminal that appears in FIRST() also appears FIRST(A) because A .
If is , any terminal that appears in FIRST() also appears in FIRST(A) and FOLLOW(A).
28