Chapter 3-Syntax Analysis-II

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 28

Principles of Compiler Design

Chapter 3
Lecture 2
Top-Down Parsing

1
Top-Down Parsing
• The parse tree is created top to bottom.
• Top-down parser
– Recursive-Descent Parsing
• Backtracking is needed (If a choice of a production rule does not work, we backtrack to try other
alternatives.)
• It is a general parsing technique, but not widely used.
• Not efficient
– Predictive Parsing
• no backtracking
• efficient
• needs a special form of grammars (LL(1) grammars).
• Recursive Predictive Parsing is a special form of Recursive Descent parsing without backtracking.
• Non-Recursive (Table Driven) Predictive Parser is also known as LL(1) parser.

21 Q
Recursive-Descent Parsing (uses Backtracking)
• Backtracking is needed.
• It tries to find the left-most derivation.

S  aBc
B  bc | b
S S
input: abc
a B c a B c
fails, backtrack
b c b

3
Predictive Parser
a grammar   a grammar suitable for predictive
eliminate left parsing (a LL(1) grammar)
left recursion factor no %100 guarantee.

• When re-writing a non-terminal in a derivation step, a predictive parser


can uniquely choose a production rule by just looking the current
symbol in the input string.

A  1 | ... | n input: ... a .......

current token

4
Left Factoring

 A predictive parser (a top-down parser without backtracking) insists that the grammar
must be left-factored.
 
 grammar a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt | if expr then stmt when we see if, we cannot now
which production rule to choose to re-write stmt in the derivation.

 In general, 
A → βα1 | βα where α is non-empty and the
first symbols of β1 and β2 (if they have one)are different.
 when processing α we cannot know whether expand A to βα1 or A to βα2 But, if
we re-write the grammar as follows
A → αA’
A’ → β1 | β2 so, we can immediately expand A to αA’
5
Predictive Parser (example)

stmt  if ...... |
while ...... |
begin ...... |
for .....

• When we are trying to write the non-terminal stmt, if the current token
is if we have to choose first production rule.
• When we are trying to write the non-terminal stmt, we can uniquely
choose the production rule by just looking the current token.
• We eliminate the left recursion in the grammar, and left factor it. But it
may not be suitable for predictive parsing (not LL(1) grammar).
6
Recursive Predictive Parsing
• Each non-terminal corresponds to a procedure.

Ex: A  aBb (This is only the production rule for A)

proc A {
- match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
}

7
Recursive Predictive Parsing (cont.)
A  aBb | bAB

proc A {
case of the current token {
‘a’: - match the current token with a, and move to the next token;
- call ‘B’;
- match the current token with b, and move to the next token;
‘b’: - match the current token with b, and move to the next token;
- call ‘A’;
- call ‘B’;
}
}

8
Recursive Predictive Parsing (cont.)
• When to apply -productions.

A  aA | bB | 

• If all other productions fail, we should apply an -production. For


example, if the current token is not a or b, we may apply the
-production.
• Most correct choice: We should apply an -production for a non-
terminal A when the current token is in the follow set of A (which
terminals can follow A in the sentential forms).

9
Recursive Predictive Parsing (Example)
A  aBe | cBd | C
B  bB | 
Cf
proc C { match the current token with f,
proc A { and move to the next token; }
case of the current token {
a: - match the current token with a,
and move to the next token; proc B {
- call B; case of the current token {
- match the current token with e, b: - match the current token with b,
and move to the next token; and move to the next token;
c: - match the current token with c, - call B
and move to the next token; e,d: do nothing
- call B; }
- match the current token with d, }
and move to the next token;
f: - call C
}
}
follow set of B

first set of C
10
Non-Recursive Predictive Parsing -- LL(1) Parser
• Non-Recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) Parser.

input buffer

stack Non-recursive output


Predictive Parser

Parsing Table

11
LL(1) Parser
input buffer
– our string to be parsed. We will assume that its end is marked with a special symbol $.

output
– a production rule representing a step of the derivation sequence (left-most derivation) of the string in the input
buffer.

stack
– contains the grammar symbols
– at the bottom of the stack, there is a special end marker symbol $.
– initially the stack contains only the symbol $ and the starting symbol S. $S  initial stack
– when the stack is emptied (ie. only $ left in the stack), the parsing is completed.

parsing table
– a two-dimensional array M[A,a]
– each row is a non-terminal symbol
– each column is a terminal symbol or the special symbol $
– each entry holds a production rule.

12
LL(1) Parser – Parser Actions
• The symbol at the top of the stack (say X) and the current symbol in the input string
(say a) determine the parser action.
• There are four possible parser actions.
1. If X and a are $  parser halts (successful completion)
2. If X and a are the same terminal symbol (different from $)
 parser pops X from the stack, and moves the next symbol in the input buffer.
3. If X is a non-terminal
 parser looks at the parsing table entry M[X,a]. If M[X,a] holds a production rule
XY1Y2...Yk, it pops X from the stack and pushes Yk,Yk-1,...,Y1 into the stack. The
parser also outputs the production rule XY1Y2...Yk to represent a step of the
derivation.
4. none of the above  error
– all empty entries in the parsing table are errors.
– If X is a terminal symbol different from a, this is also an error case.

13
LL(1) Parser – Example1
S  aBa a b $ LL(1) Parsing
B  bB |  S S  aBa Table
B B B  bB

stack input output


$S abba$ S  aBa
$aBa abba$
$aB bba$ B  bB
$aBb bba$
$aB ba$ B  bB
$aBb ba$
$aB a$ B
$a a$
$ $ accept, successful completion

14
LL(1) Parser – Example1 (cont.)

Outputs: S  aBa B  bB B  bB B

Derivation(left-most): SaBaabBaabbBaabba

S
parse tree
a B a

b B

b B


15
LL(1) Parser – Example2

E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id

id + * ( ) $
E E  TE’ E  TE’
E’ E’  +TE’ E’   E’  
T T  FT’ T  FT’
T’ T’   T’  *FT’ T’   T’  
F F  id F  (E)
16
LL(1) Parser – Example2
stack input output
$E id+id$ E  TE’
$E’T id+id$ T  FT’
$E’ T’F id+id$ F  id
$ E’ T’id id+id$
$ E ’ T’ +id$ T’  
$ E’ +id$ E’  +TE’
$ E’ T+ +id$
$ E’ T id$ T  FT’
$ E ’ T’ F id$ F  id
$ E’ T’id id$
$ E ’ T’ $ T’  
$ E’ $ E’  
$ $ accept

17
Constructing LL(1) Parsing Tables
• Two functions are used in the construction of LL(1) parsing tables:
– FIRST FOLLOW

• FIRST() is a set of the terminal symbols which occur as first symbols


in strings derived from  where  is any string of grammar symbols.
• if  derives to , then  is also in FIRST() .

• FOLLOW(A) is the set of the terminals which occur immediately after


(follow) the non-terminal A in the strings derived from the starting
symbol.
– a terminal a is in FOLLOW(A) if S  * Aa
*
– $ is in FOLLOW(A) if S  A

18
Compute FIRST for Any String X
• If X is a terminal symbol  FIRST(X)={X}
• If X is a non-terminal symbol and X   is a production rule
  is in FIRST(X).
• If X is a non-terminal symbol and X  Y1Y2..Yn is a production rule
 if a terminal a in FIRST(Yi) and  is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
 if  is in all FIRST(Yj) for j=1,...,n
then  is in FIRST(X).
• If X is   FIRST(X)={}
• If X is Y1Y2..Yn
 if a terminal a in FIRST(Yi) and  is in all FIRST(Yj) for j=1,...,i-1
then a is in FIRST(X).
 if  is in all FIRST(Yj) for j=1,...,n
then  is in FIRST(X).
19
FIRST Example
E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id

FIRST(F) = {(,id} FIRST(TE’) = {(,id}


FIRST(T’) = {*, } FIRST(+TE’ ) = {+}
FIRST(T) = {(,id} FIRST() = {}
FIRST(E’) = {+, } FIRST(FT’) = {(,id}
FIRST(E) = {(,id} FIRST(*FT’) = {*}
FIRST() = {}
FIRST((E)) = {(}
FIRST(id) = {id}

20
Compute FOLLOW (for non-terminals)
• If S is the start symbol  $ is in FOLLOW(S)

• if A  B is a production rule


 everything in FIRST() is FOLLOW(B) except 

• If ( A  B is a production rule ) or
( A  B is a production rule and  is in FIRST() )
 everything in FOLLOW(A) is in FOLLOW(B).

We apply these rules until nothing more can be added to any follow set.

21
FOLLOW Example
E  TE’
E’  +TE’ | 
T  FT’
T’  *FT’ | 
F  (E) | id

FOLLOW(E) = { $, ) }
FOLLOW(E’) = { $, ) }
FOLLOW(T) = { +, ), $ }
FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = {+, *, ), $ }

22
Constructing LL(1) Parsing Table -- Algorithm
• for each production rule A   of a grammar G
– for each terminal a in FIRST()
 add A   to M[A,a]
– If  in FIRST()
 for each terminal a in FOLLOW(A) add A   to M[A,a]
– If  in FIRST() and $ in FOLLOW(A)
 add A   to M[A,$]

• All other undefined entries of the parsing table are error entries.

23
Constructing LL(1) Parsing Table -- Example
E  TE’ FIRST(TE’)={(,id}  E  TE’ into M[E,(] and M[E,id]
E’  +TE’ FIRST(+TE’ )={+}  E’  +TE’ into M[E’,+]

E’   FIRST()={}  none
but since  in FIRST()
and FOLLOW(E’)={$,)}  E’   into M[E’,$] and M[E’,)]

T  FT’ FIRST(FT’)={(,id}  T  FT’ into M[T,(] and M[T,id]


T’  *FT’ FIRST(*FT’ )={*}  T’  *FT’ into M[T’,*]

T’   FIRST()={}  none
but since  in FIRST()
and FOLLOW(T’)={$,),+}  T’   into M[T’,$], M[T’,)] and M[T’,+]

F  (E) FIRST((E) )={(}  F  (E) into M[F,(]

F  id FIRST(id)={id}  F  id into M[F,id]

24
LL(1) Grammars
• A grammar whose parsing table has no multiply-defined entries is said
to be LL(1) grammar.

one input symbol used as a look-head symbol do determine parser action

LL(1) left most derivation


input scanned from left to right

• The parsing table of a grammar may contain more than one production
rule. In this case, we say that it is not a LL(1) grammar.

25
A Grammar which is not LL(1)
SiCtSE | a FOLLOW(S) = { $,e }
EeS |  FOLLOW(E) = { $,e }
Cb FOLLOW(C) = { t }

FIRST(iCtSE) = {i}
a b e i t $
FIRST(a) = {a}
S Sa S  iCtSE
FIRST(eS) = {e}
E EeS E
FIRST() = {}
E
FIRST(b) = {b}
C Cb

two production rules for M[E,e]

Problem  ambiguity
26
A Grammar which is not LL(1) (cont.)
• What do we have to do it if the resulting parsing table contains multiply
defined entries?
– If we didn’t eliminate left recursion, eliminate the left recursion in the grammar.
– If the grammar is not left factored, we have to left factor the grammar.
– If its (new grammar’s) parsing table still contains multiply defined entries, that grammar is
ambiguous or it is inherently not a LL(1) grammar.
• A left recursive grammar cannot be a LL(1) grammar.
– A  A | 
 any terminal that appears in FIRST() also appears FIRST(A) because A  .
 If  is , any terminal that appears in FIRST() also appears in FIRST(A) and FOLLOW(A).

• A grammar is not left factored, it cannot be a LL(1) grammar


• A  1 | 2
any terminal that appears in FIRST(1) also appears in FIRST(2).

• An ambiguous grammar cannot be a LL(1) grammar.


27
Properties of LL(1) Grammars
• A grammar G is LL(1) if and only if the following conditions hold for
two distinctive production rules A   and A  

1. Both  and  cannot derive strings starting with same terminals.

2. At most one of  and  can derive to .

3. If  can derive to , then  cannot derive to any string starting


with a terminal in FOLLOW(A).

28

You might also like