Tekkom M4,5

Meetings 4 & 5
Syntax Analysis (Parsing)
Topics
• Role of the parser
• Types of parsers
• Error levels and error recovery
• Context-free grammars (CFG)
• Parse trees
• Converting regular expressions (RE) to CFG
• Ambiguity
• Eliminating ambiguity
• Left factoring and left recursion
• Eliminating left recursion
Syntax Analyzer
• The syntax analyzer creates the syntactic structure of the given source program.
• This syntactic structure is usually a parse tree.
• The syntax analyzer is also known as the parser.
• The syntax of a programming language is described by a context-free grammar (CFG). We will use BNF (Backus-Naur Form) notation to describe CFGs.
• The syntax analyzer (parser) checks whether or not a given source program satisfies the rules implied by a context-free grammar.
  - If it does, the parser creates the parse tree of that program.
  - Otherwise, the parser reports error messages.
• A context-free grammar:
  - gives a precise syntactic specification of a programming language;
  - is designed in an initial phase of compiler design;
  - can be converted directly into a parser by some tools.
Parser
• The parser works on a stream of tokens.
• The smallest item is a token.
[Figure: source program → Lexical Analyzer → token → Parser → parse tree; the parser requests each token from the lexical analyzer with "get next token".]
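The "get next token" arrow in the figure means the parser pulls tokens on demand rather than receiving the whole program at once. A minimal sketch of that relationship, assuming a tiny hypothetical token set (identifiers, `+ - * /` and parentheses) and an end-of-input token `$`: the lexical analyzer is a Python generator, and each `next()` call plays the role of one "get next token" request. `parse_token_stream` is only a stand-in for a real parser.

```python
import re
from typing import Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str  # token class, e.g. "id" or "+"
    text: str  # lexeme taken from the source program


# Hypothetical token classes for a tiny expression language (an assumption).
TOKEN_RE = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/()])")


def lexical_analyzer(source: str) -> Iterator[Token]:
    """Produce tokens lazily; each next() call answers one "get next token" request."""
    pos = 0
    while True:
        while pos < len(source) and source[pos].isspace():  # skip white space
            pos += 1
        if pos == len(source):
            yield Token("$", "")  # end-of-input marker
            return
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind = "id" if m.lastgroup == "id" else m.group()
        yield Token(kind, m.group())
        pos = m.end()


def parse_token_stream(source: str) -> List[str]:
    """Stand-in for the parser: it only pulls tokens until the end marker."""
    tokens = lexical_analyzer(source)
    kinds: List[str] = []
    tok = next(tokens)  # get next token
    while tok.kind != "$":
        kinds.append(tok.kind)
        tok = next(tokens)  # get next token
    return kinds
```

For example, `parse_token_stream("x + (y*z)")` returns `['id', '+', '(', 'id', '*', 'id', ')']`. Because the lexer is lazy, a lexical error is only raised when the parser actually asks for the offending token.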
Parsers (cont.)
• We categorize parsers into two groups:
  1. Top-Down Parser
     - the parse tree is created from top to bottom, starting from the root.
  2. Bottom-Up Parser
     - the parse tree is created from bottom to top, starting from the leaves.
• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars:
  - LL for top-down parsing
  - LR for bottom-up parsing
Context-Free Grammars
• Inherently recursive structures of a programming language are defined by a context-free grammar.
• In a context-free grammar, we have:
  - A finite set of terminals (in our case, the set of tokens)
  - A finite set of non-terminals (syntactic variables)
  - A finite set of production rules of the form
      A → α
    where A is a non-terminal and α is a string of terminals and non-terminals (possibly the empty string)
  - A start symbol (one of the non-terminal symbols)
• Example:
    E → E+E | E-E | E*E | E/E | -E
    E → (E)
    E → id
Derivations
E ⇒ E+E
• E+E derives from E:
  - we can replace E by E+E;
  - to be able to do this, we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
• A sequence of replacements of non-terminal symbols is called a derivation of id+id from E.
• In general, a derivation step is
    αAβ ⇒ αγβ   if there is a production rule A → γ in our grammar,
  where α and β are arbitrary strings of terminal and non-terminal symbols.
    α1 ⇒ α2 ⇒ ... ⇒ αn   (αn derives from α1, or α1 derives αn)
• ⇒  : derives in one step
• ⇒* : derives in zero or more steps
• ⇒+ : derives in one or more steps
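The one-step relation αAβ ⇒ αγβ can be enumerated directly. A minimal sketch, assuming sentential forms are tuples of symbols and the grammar is a dict from each non-terminal to its alternatives (both are illustrative encodings, not part of the slides). `derive_one_step` lists every form reachable in exactly one step, and the hypothetical helper `is_derivation` checks a proposed chain α1 ⇒ α2 ⇒ ... ⇒ αn.

```python
from typing import Dict, List, Tuple

Form = Tuple[str, ...]
# The example grammar: E -> E+E | E-E | E*E | E/E | -E | (E) | id
GRAMMAR: Dict[str, List[Form]] = {
    "E": [("E", "+", "E"), ("E", "-", "E"), ("E", "*", "E"), ("E", "/", "E"),
          ("-", "E"), ("(", "E", ")"), ("id",)],
}


def derive_one_step(form: Form, grammar: Dict[str, List[Form]]) -> List[Form]:
    """All forms alpha+gamma+beta such that form == alpha+(A,)+beta and A -> gamma."""
    results: List[Form] = []
    for i, symbol in enumerate(form):
        if symbol in grammar:                  # symbol is a non-terminal A
            alpha, beta = form[:i], form[i + 1:]
            for gamma in grammar[symbol]:      # each production A -> gamma
                results.append(alpha + gamma + beta)
    return results


def is_derivation(steps: List[Form], grammar: Dict[str, List[Form]]) -> bool:
    """True if each form derives the next one in exactly one step."""
    return all(nxt in derive_one_step(cur, grammar)
               for cur, nxt in zip(steps, steps[1:]))
```

With `steps = [("E",), ("E", "+", "E"), ("id", "+", "E"), ("id", "+", "id")]`, `is_derivation(steps, GRAMMAR)` is `True`, while jumping straight from `("E",)` to `("id", "+", "id")` is not a single step.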
CFG - Terminology
• L(G) is the language of G (the language generated by G), which is a set of sentences.
• A sentence of L(G) is a string of terminal symbols of G.
• If S is the start symbol of G, then ω is a sentence of L(G) iff S ⇒+ ω, where ω is a string of terminals of G.
• If G is a context-free grammar, L(G) is a context-free language.
• Two grammars are equivalent if they produce the same language.
S ⇒* α
- If α contains non-terminals, it is called a sentential form of G.
- If α does not contain non-terminals, it is called a sentence of G.
Derivation Example
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
OR
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
• At each derivation step, we can choose any of the non-terminals in the sentential form of G for the replacement.
• If we always choose the left-most non-terminal in each derivation step, the derivation is called a left-most derivation.
• If we always choose the right-most non-terminal in each derivation step, the derivation is called a right-most derivation.
Left-Most and Right-Most Derivations
Left-Most Derivation
E ⇒lm -E ⇒lm -(E) ⇒lm -(E+E) ⇒lm -(id+E) ⇒lm -(id+id)
Right-Most Derivation
E ⇒rm -E ⇒rm -(E) ⇒rm -(E+E) ⇒rm -(E+id) ⇒rm -(id+id)
• We will see that top-down parsers try to find the left-most derivation of the given source program.
• We will see that bottom-up parsers try to find the right-most derivation of the given source program, in reverse order.
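To make the ⇒lm / ⇒rm distinction concrete, the hypothetical helper `derive` below replays a derivation from a list of chosen productions, always rewriting either the left-most or the right-most non-terminal of the current sentential form. The tuple encoding and the single non-terminal `E` are assumptions for this sketch.

```python
from typing import List, Tuple

Form = Tuple[str, ...]
NONTERMINALS = {"E"}  # the only non-terminal of the example grammar


def derive(start: str, choices: List[Tuple[str, Form]], leftmost: bool) -> List[str]:
    """Replay a derivation, rewriting the left-most (lm) or right-most (rm) non-terminal."""
    form: Form = (start,)
    trace = ["".join(form)]
    for head, body in choices:
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"{head} -> {''.join(body)} cannot rewrite {form[i]}")
        form = form[:i] + body + form[i + 1:]
        trace.append("".join(form))
    return trace


# Productions used for -(id+id): E -> -E, E -> (E), E -> E+E, E -> id, E -> id
CHOICES = [("E", ("-", "E")), ("E", ("(", "E", ")")), ("E", ("E", "+", "E")),
           ("E", ("id",)), ("E", ("id",))]
```

With `leftmost=True` the trace is `E, -E, -(E), -(E+E), -(id+E), -(id+id)`; with `leftmost=False` the fifth form becomes `-(E+id)`. The same list of productions yields both derivations here only because the last two productions are identical.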
Parse Tree
• Inner nodes of a parse tree are non-terminal symbols.
• The leaves of a parse tree are terminal symbols.
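• A parse tree can be seen as a graphical representation of a derivation.
[Figure: the parse tree grows step by step along the derivation E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id). In the final tree, the root E has children - and E; that E has children (, E and ); the innermost E has children E, + and E, and each of these two E nodes has the single child id.]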
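A minimal sketch of a parse tree as a data structure, assuming a hypothetical `Node` class: inner nodes carry non-terminals, leaves carry terminals, and reading the leaves from left to right (the tree's frontier) recovers the sentence that the derivation produced.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    symbol: str                                          # grammar symbol labelling this node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

    def frontier(self) -> str:
        """Leaves read from left to right: the sentence this tree derives."""
        if self.is_leaf():
            return self.symbol
        return "".join(child.frontier() for child in self.children)


def inner_symbols(node: Node) -> Set[str]:
    """Symbols on inner nodes; for a parse tree these are all non-terminals."""
    if node.is_leaf():
        return set()
    result = {node.symbol}
    for child in node.children:
        result |= inner_symbols(child)
    return result


# Final tree of the derivation E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
tree = Node("E", [
    Node("-"),
    Node("E", [
        Node("("),
        Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("id")])]),
        Node(")"),
    ]),
])
```

Here `tree.frontier()` gives `"-(id+id)"` and `inner_symbols(tree)` gives `{"E"}`. Note that the tree does not record the order in which non-terminals were expanded, which is why the left-most and right-most derivations of this sentence share one tree.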
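Ambiguity
• A grammar that produces more than one parse tree for a sentence is called an ambiguous grammar. For example, id+id*id has two parse trees:
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree: root E → E + E; the left E → id; the right E → E * E, with each child E → id.]
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
[Parse tree: root E → E * E; the left E → E + E, with each child E → id; the right E → id.]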
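Ambiguity can be checked mechanically by counting parse trees. A minimal sketch, assuming the reduced grammar E → E+E | E*E | id (a subset of the slide's grammar, chosen for brevity): an E either is a single id or splits at some operator into two smaller E's, so a memoized count over token intervals is enough. Any count above 1 means the sentence is ambiguous.

```python
from functools import lru_cache
from typing import List, Tuple


def count_parse_trees(tokens: List[str]) -> int:
    """Number of parse trees of tokens under E -> E+E | E*E | id."""
    toks: Tuple[str, ...] = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways in which E derives toks[i:j].
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):          # k: position of the top-level operator
            if toks[k] in ("+", "*"):          # E -> E op E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))
```

`count_parse_trees("id + id * id".split())` is 2, matching the two trees above; with four ids and three operators the count grows to 5.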
Ambiguity (cont.)
• For most parsers, the grammar must be unambiguous.
• Unambiguous grammar → unique selection of the parse tree for a sentence.
• We should eliminate the ambiguity in the grammar during the design phase of the compiler.
• An unambiguous grammar should be written to eliminate the ambiguity.
• We have to prefer one of the parse trees of a sentence (generated by an ambiguous grammar) and disambiguate the grammar so that it allows only this choice.
Ambiguity (cont.)
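stmt → if expr then stmt
     | if expr then stmt else stmt
     | otherstmts
if E1 then if E2 then S1 else S2
[Figure: two parse trees for this sentence.
(1) stmt → if expr then stmt else stmt, where expr is E1, the then-part is the statement if E2 then S1, and the else-part is S2; the else belongs to the outer if.
(2) stmt → if expr then stmt, where expr is E1 and the then-part is the statement if E2 then S1 else S2; the else belongs to the inner (closest) if.]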
Ambiguity (cont.)
• We prefer the second parse tree (else matches the closest if).
• So, we have to disambiguate our grammar to reflect this choice.
• The unambiguous grammar will be:
stmt → matchedstmt | unmatchedstmt
matchedstmt → if expr then matchedstmt else matchedstmt | otherstmts
unmatchedstmt → if expr then stmt
              | if expr then matchedstmt else unmatchedstmt
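Hand-written recursive-descent parsers often resolve the dangling else greedily instead of using the matched/unmatched grammar: after parsing a then-branch, an `else` that follows is consumed by the innermost `if` still being parsed. This is a different technique from the grammar rewrite above, but it selects the same tree, the second one. The sketch assumes whitespace-separated tokens, a single-token `expr`, single-token other statements, and a tuple output format, all hypothetical.

```python
from typing import List, Tuple, Union

Stmt = Union[str, Tuple]


def parse_stmt(tokens: List[str], pos: int = 0) -> Tuple[Stmt, int]:
    """stmt -> if expr then stmt [else stmt] | other; returns (tree, next position)."""
    if tokens[pos] == "if":
        cond = tokens[pos + 1]                    # expr is a single token here (assumption)
        if tokens[pos + 2] != "then":
            raise SyntaxError(f"expected 'then' at position {pos + 2}")
        then_branch, pos = parse_stmt(tokens, pos + 3)
        # Greedy choice: an 'else' here belongs to this, the innermost open, if.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_branch, else_branch), pos
        return ("if", cond, then_branch), pos
    return tokens[pos], pos + 1                   # otherstmts: one token such as S1


def parse(text: str) -> Stmt:
    tokens = text.split()
    tree, pos = parse_stmt(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

`parse("if E1 then if E2 then S1 else S2")` returns `("if", "E1", ("if", "E2", "S1", "S2"))`: the else is attached to the inner if, as in the preferred tree.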
Ambiguity – Operator Precedence
• Ambiguous grammars (because of ambiguous operators) can be disambiguated according to the precedence and associativity rules.
E → E+E | E*E | E^E | id | (E)
  ⇓ disambiguate the grammar
precedence (highest first):
  ^ (right to left)
  * (left to right)
  + (left to right)
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
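The layered grammar maps onto a recursive-descent evaluator with one function per non-terminal. Because a top-down parser cannot call E → E+T directly, the left-recursive rules for + and * are written as loops, which group operands left to right; the right-recursive rule F → G^F is written as recursion, which groups ^ right to left. A minimal sketch, assuming integer literals stand in for id and that the input uses only `+ * ^ ( )`; the class and function names are illustrative.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split into integer literals (standing in for id) and the symbols + * ^ ( )."""
    compact = "".join(text.split())
    tokens = re.findall(r"\d+|[+*^()]", compact)
    if "".join(tokens) != compact:
        raise SyntaxError(f"unexpected character in {text!r}")
    return tokens


class Evaluator:
    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "$"

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self) -> int:
        # E -> E+T | T : the left recursion becomes a loop, so + groups left to right
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self) -> int:
        # T -> T*F | F : same pattern, so * groups left to right
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self) -> int:
        # F -> G^F | G : right recursion, so ^ groups right to left
        base = self.primary()
        if self.peek() == "^":
            self.pos += 1
            return base ** self.factor()
        return base

    def primary(self) -> int:
        # G -> id | (E)
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        if tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str) -> int:
    ev = Evaluator(text)
    value = ev.expr()
    if ev.peek() != "$":
        raise SyntaxError(f"unexpected token {ev.peek()!r}")
    return value
```

`evaluate("2+3*4")` is 14 (precedence) and `evaluate("2^3^2")` is 512, that is 2^(3^2); grouping ^ left to right would give 64. Since + and * are associative, their grouping does not change the value, only the tree.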
Left Recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation
    A ⇒+ Aα   for some string α
• Top-down parsing techniques cannot handle left-recursive grammars.
• So, we have to convert our left-recursive grammar into an equivalent grammar which is not left-recursive.
• The left recursion may appear in a single step of the derivation (immediate left recursion), or in more than one step of the derivation.
Immediate Left-Recursion
A → Aα | β   where β does not start with A
  ⇓ eliminate immediate left recursion
A → βA’
A’ → αA’ | ε   an equivalent grammar
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn   where β1 ... βn do not start with A
  ⇓ eliminate immediate left recursion
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε   an equivalent grammar
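The general rewrite can be sketched as a small function, assuming productions are tuples of symbols (with `()` standing for ε), that the caller supplies the name of the new non-terminal A’, and that there is no production A → A. The function name and encoding are illustrative.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]          # a production body; () stands for epsilon
Grammar = Dict[str, List[Body]]


def eliminate_immediate(nt: str, bodies: List[Body], new_nt: str) -> Grammar:
    """A -> A a1 | ... | A am | b1 | ... | bn  becomes
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == nt]      # the alpha_i parts
    betas = [b for b in bodies if not b or b[0] != nt]        # the beta_j parts
    if not alphas:                                            # nothing to eliminate
        return {nt: list(bodies)}
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }
```

For E → E+T | T, `eliminate_immediate("E", [("E", "+", "T"), ("T",)], "E'")` gives E → T E’ and E’ → + T E’ | ε.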
Immediate Left-Recursion -- Example
E → E+T | T
T → T*F | F
F → id | (E)
  ⇓ eliminate immediate left recursion
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
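Once left recursion is gone, each non-terminal can become a function that inspects the next token, and no function calls itself without first consuming input. A minimal recognizer sketch for the rewritten grammar, assuming the input is already a list of token kinds (`id`, `+`, `*`, `(`, `)`) and that `$` marks the end; returning without consuming anything is how the ε alternatives of E’ and T’ are taken.

```python
from typing import List


class Recognizer:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["$"]      # "$" marks the end of input
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def match(self, kind: str) -> None:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self) -> None:           # E -> T E'
        self.term()
        self.expr_tail()

    def expr_tail(self) -> None:      # E' -> + T E' | epsilon
        if self.peek() == "+":
            self.match("+")
            self.term()
            self.expr_tail()
        # otherwise take the epsilon alternative: consume nothing

    def term(self) -> None:           # T -> F T'
        self.factor()
        self.term_tail()

    def term_tail(self) -> None:      # T' -> * F T' | epsilon
        if self.peek() == "*":
            self.match("*")
            self.factor()
            self.term_tail()

    def factor(self) -> None:         # F -> id | ( E )
        if self.peek() == "(":
            self.match("(")
            self.expr()
            self.match(")")
        else:
            self.match("id")


def accepts(tokens: List[str]) -> bool:
    r = Recognizer(tokens)
    try:
        r.expr()
        r.match("$")                  # the whole input must be consumed
    except SyntaxError:
        return False
    return True
```

`accepts(["id", "+", "id", "*", "id"])` is `True`, while `accepts(["id", "+"])` is `False`. Calling the original rule E → E+T this way would recurse forever before reading any token, which is why the rewrite is needed.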
Left-Recursion -- Problem
• A grammar may not be immediately left-recursive, but it can still be left-recursive.
• By just eliminating the immediate left recursion, we may not get a grammar which is not left-recursive.
S → Aa | b
A → Sc | d
This grammar is not immediately left-recursive, but it is still left-recursive:
S ⇒ Aa ⇒ Sca   or
A ⇒ Sc ⇒ Aac   causes a left recursion
• So, we have to eliminate all left recursions from our grammar.
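Indirect left recursion can be detected with a reachability search. A minimal sketch, assuming there are no ε-productions (so only the first symbol of each body can start a derivation): draw an edge A → B whenever some production of A begins with the non-terminal B; A is left-recursive exactly when A can reach itself along these edges. The function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def left_recursive_nonterminals(grammar: Grammar) -> Set[str]:
    """Non-terminals A with A =>+ A alpha, assuming no epsilon-productions."""
    # Edge A -> B whenever some production A -> B ... starts with a non-terminal B.
    edges = {a: {body[0] for body in bodies if body and body[0] in grammar}
             for a, bodies in grammar.items()}
    result: Set[str] = set()
    for start in grammar:
        # Depth-first search: start is left-recursive iff it reaches itself.
        seen: Set[str] = set()
        stack = list(edges[start])
        while stack:
            b = stack.pop()
            if b == start:
                result.add(start)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(edges[b])
    return result


def immediately_left_recursive(grammar: Grammar) -> Set[str]:
    """Non-terminals with a production A -> A alpha."""
    return {a for a, bodies in grammar.items() if any(b and b[0] == a for b in bodies)}
```

For S → Aa | b, A → Sc | d, `immediately_left_recursive` finds nothing, but `left_recursive_nonterminals` reports both S and A, matching the derivations above.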

Eliminate Left-Recursion -- Algorithm
- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
    - for j from 1 to i-1 do {
        replace each production
            Ai → Aj γ
        by
            Ai → α1 γ | ... | αk γ
        where Aj → α1 | ... | αk
      }
    - eliminate immediate left recursions among the Ai productions
  }
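The algorithm above can be sketched in Python, assuming productions are tuples of symbols (`()` is ε), that new non-terminals are named by appending `'` (and that these names are not already in use), and that the input has no cycles A ⇒+ A; textbooks usually state this algorithm for grammars without cycles or ε-productions. A compact copy of the immediate-elimination step is included so the block runs on its own.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]              # () stands for epsilon
Grammar = Dict[str, List[Body]]


def eliminate_immediate(a: str, bodies: List[Body]) -> Grammar:
    """A -> A a1 | ... | b1 | ...  becomes  A -> b1 A' | ...,  A' -> a1 A' | ... | epsilon."""
    alphas = [b[1:] for b in bodies if b and b[0] == a]
    betas = [b for b in bodies if not b or b[0] != a]
    if not alphas:
        return {a: bodies}
    a_new = a + "'"                                 # naming scheme is an assumption
    return {a: [b + (a_new,) for b in betas],
            a_new: [al + (a_new,) for al in alphas] + [()]}


def eliminate_left_recursion(grammar: Grammar, order: List[str]) -> Grammar:
    """For each Ai in order: substitute every earlier Aj at the front of Ai's
    bodies, then remove the immediate left recursion of Ai."""
    g = {a: list(bodies) for a, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new_bodies: List[Body] = []
            for body in g[ai]:
                if body and body[0] == aj:          # Ai -> Aj gamma
                    gamma = body[1:]
                    new_bodies.extend(alpha + gamma for alpha in g[aj])
                else:
                    new_bodies.append(body)
            g[ai] = new_bodies
        g.update(eliminate_immediate(ai, g[ai]))
    return g
```

With S → Aa | b, A → Ac | Sd | f and the order S, A, the result is S → Aa | b, A → bdA’ | fA’, A’ → cA’ | adA’ | ε; with the order A, S it reproduces the grammar of Example2. Different orders give different, but equivalent, grammars.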
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
  So, we will have A → Ac | Aad | bd | f
- Eliminate the immediate left recursion in A
  A → bdA’ | fA’
  A’ → cA’ | adA’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
Eliminate Left-Recursion – Example2
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left recursion in A
  A → SdA’ | fA’
  A’ → cA’ | ε
for S:
- Replace S → Aa with S → SdA’a | fA’a
  So, we will have S → SdA’a | fA’a | b
- Eliminate the immediate left recursion in S
  S → fA’aS’ | bS’
  S’ → dA’aS’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
A → SdA’ | fA’
A’ → cA’ | ε
Left-Factoring
• A predictive parser (a top-down parser without backtracking) requires the grammar to be left-factored.
grammar → a new equivalent grammar suitable for predictive parsing
stmt → if expr then stmt else stmt |
       if expr then stmt
• When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.
Left-Factoring (cont.)
• In general,
    A → αβ1 | αβ2   where α is non-empty and the first symbols of β1 and β2 (if they have one) are different.
• When processing α, we cannot know whether to expand
    A to αβ1 or
    A to αβ2
• But if we rewrite the grammar as follows
    A → αA’
    A’ → β1 | β2
  we can immediately expand A to αA’.
Left-Factoring -- Algorithm
• For each non-terminal A with two or more alternatives (production rules) with a common non-empty prefix, say
    A → αβ1 | ... | αβn | γ1 | ... | γm
  convert it into
    A → αA’ | γ1 | ... | γm
    A’ → β1 | ... | βn
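A minimal sketch of the algorithm, assuming productions are tuples of symbols (`()` is ε) and that new non-terminals are named A’, A’’, ... by appending `'` until the name is unused (a naming assumption). In each round it groups the alternatives of one non-terminal by first symbol, takes the longest prefix α shared by a group with two or more members, and applies the rewrite once; the rewritten rules go back on the work list until nothing is left to factor.

```python
from typing import Dict, List, Tuple

Body = Tuple[str, ...]          # () stands for epsilon
Grammar = Dict[str, List[Body]]


def common_prefix(bodies: List[Body]) -> Body:
    """Longest prefix shared by every body in the list."""
    prefix = bodies[0]
    for body in bodies[1:]:
        n = 0
        while n < min(len(prefix), len(body)) and prefix[n] == body[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def left_factor(grammar: Grammar) -> Grammar:
    used = set(grammar)                          # non-terminal names already taken
    result: Grammar = {}
    work = list(grammar.items())
    while work:
        a, bodies = work.pop(0)
        groups: Dict[str, List[Body]] = {}
        for body in bodies:
            if body:                             # epsilon has no first symbol
                groups.setdefault(body[0], []).append(body)
        target = next((g for g in groups.values() if len(g) > 1), None)
        if target is None:                       # A is already left-factored
            result[a] = bodies
            continue
        alpha = common_prefix(target)            # non-empty: same first symbol
        new_nt = a + "'"
        while new_nt in used:
            new_nt += "'"
        used.add(new_nt)
        first = alpha[0]
        factored: List[Body] = []
        placed = False
        for body in bodies:                      # A -> alpha A' | gamma_1 | ...
            if body and body[0] == first:
                if not placed:
                    factored.append(alpha + (new_nt,))
                    placed = True
            else:
                factored.append(body)
        work.append((a, factored))               # A may need further factoring
        work.append((new_nt, [b[len(alpha):] for b in target]))  # A' -> beta_i
    return result
```

On Example1, A → abB | aB | cdg | cdeB | cdfB becomes A → aA’ | cdA’’, A’ → bB | B, A’’ → g | eB | fB; on Example2 it yields A → aA’ | b, A’ → d | ε | bA’’, A’’ → ε | c, following the same intermediate steps as the slides.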
Left-Factoring – Example1
A → abB | aB | cdg | cdeB | cdfB
  ⇓
A → aA’ | cdg | cdeB | cdfB
A’ → bB | B
  ⇓
A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example2
A → ad | a | ab | abc | b
  ⇓
A → aA’ | b
A’ → d | ε | b | bc
  ⇓
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Non-Context Free Language Constructs
• There are some language constructions in programming languages which are not context-free. This means that we cannot write a context-free grammar for these constructions.
• L1 = { ωcω | ω is in (a|b)* } is not context-free
  - This abstracts declaring an identifier and later checking whether it has been declared. We cannot do this with a context-free language; we need a semantic analyzer (which is not context-free).
• L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 } is not context-free
  - This abstracts declaring two functions (one with n parameters, the other with m parameters) and then calling them with actual parameters.
