MODULE 3 Syntax Analysis
CHAPTER 4
SYNTAX ANALYSIS
VENKATESH
Senior Associate Professor
Department of CSE
AIET, MOODBIDRI
Introduction
■ Syntax analysis is the second phase of compilation. The syntax
analyzer (parser) checks that the program follows the syntax of the language.
■ Parsing, or syntax analysis, is the process that takes an input
string w and either produces a parse tree (the syntactic structure) or
reports syntactic errors.
■ Example: a = b + 10
4.1 Role of Parser
4.1.2 Representative grammars
Ambiguous version of arithmetic expression grammar:
E → E + E | E * E | ( E ) | id
Unambiguous version of arithmetic expression grammar
E → E + T | T
T → T * F | F
F → ( E ) | id
3. Semantic errors: include type mismatches between operators and operands.
E.g., returning a value from a function whose return type is void.
4. Logical errors: can be anything resulting from incorrect reasoning on the part of the
programmer.
E.g., use of the assignment operator = instead of the comparison operator ==.
❑The precision of parsing methods allows syntactic errors to be detected
very efficiently.
❑Several parsing methods, such as LL and LR, detect an error as soon as
possible, that is, when the stream of tokens from the lexical analyzer cannot be
parsed further according to the specified grammar.
■ The error handler in a parser has goals that are simple to state but
challenging to realize:
❑Report the presence of errors clearly and accurately
❑Recover from each error quickly enough to detect the
subsequent errors.
❑Add minimal overhead to the processing of correct programs.
How does an error handler report the presence of an error?
❑At the very least, it must report the place in the source program
where an error is detected, because there is a good chance that
the actual error occurred within the previous few tokens.
❑A common strategy is to print the offending line with a pointer to
the position at which an error is detected.
4.1.4 Error - Recovery strategies
1)Panic-Mode Recovery
■ On discovering an error, the parser discards input symbols one at a time
until one of a designated set of Synchronizing tokens is found.
■ Synchronizing tokens are usually delimiters such as semicolon or }
whose role in the source program is clear and unambiguous.
■ The compiler designer must select the synchronizing tokens
appropriate for the source language.
■ Panic mode correction often skips a considerable amount of input
without checking it for additional errors.
Advantage:
Simplicity
Is guaranteed not to go into an infinite loop
2) Phrase-Level Recovery
■ On discovering an error, A parser may perform local correction on the
remaining input. i.e, it may replace a prefix of the remaining input by some
string that allows the parser to continue.
Ex: Typical local correction is to replace a comma by a semicolon, delete an
extraneous semicolon or insert a missing semicolon
■ Local correction is left to the compiler designer and the replacement has to
be done carefully so that it does not lead to infinite loops.
■ Phrase-level replacement has been used in several error-repairing
compilers, as it can correct any input string.
■ Major drawback is difficulty in coping with the situations in which the actual
error has occurred before the point of detection.
3) Error Productions
■ By anticipating common errors, the compiler designer can augment the
grammar with productions that generate the erroneous constructs. The
parser then detects the anticipated errors when an error production is
used and can issue an appropriate diagnostic.
Derivations
Leftmost derivation (LMD): at each step of the derivation, the leftmost
nonterminal is chosen for replacement.
Rightmost derivation (RMD): at each step of the derivation, the rightmost
nonterminal is chosen for replacement.
i.e., in a derivation S ⇒* α:
- If α contains nonterminals, it is called a sentential form of G.
- If α does not contain nonterminals, it is called a sentence of G.
■ L(G), the language of G (the language generated by G), is its set of sentences
(terminal strings).
Parse tree
■ A parse tree is a graphical representation of a derivation that
filters out the order in which the productions are applied to
replace the non terminals.
■ Each interior node of a parse tree is labeled with the nonterminal at
the head of a production, and the children of the node are labeled,
from left to right, with the symbols in the body of that production.
■ The leaves of the parse tree, which are labeled by nonterminals or
terminals and are read from left to right, constitute a sentential
form called the yield or frontier of the parse tree.
Ambiguity
■ A grammar that produces more than one parse tree for some sentence is
said to be ambiguous.
Activity
Given the grammar S → ( L ) | a
L → L , S | S
■ An unambiguous grammar permits a unique selection of the parse tree
for each sentence.
■ The dangling-else ambiguity: a statement such as
if E1 then if E2 then S1 else S2 has two parse trees, one attaching
the else to each then.
• The disambiguating rule is: match each else with the closest unmatched then.
• So, we have to rewrite our grammar to reflect this choice:
stmt → matched_stmt | open_stmt
matched_stmt → if expr then matched_stmt else matched_stmt
| other
open_stmt → if expr then stmt
| if expr then matched_stmt else open_stmt
Left Recursion
■ A grammar is left recursive if it has a non-terminal A such that there
is a derivation
A ⇒⁺ Aα for some string α.
A non-terminal is immediately left recursive if the symbol on the LHS and
the leftmost symbol in the RHS are the same.
■ Top-down parsing techniques cannot handle left-recursive grammars.
■ So, a transformation is required to eliminate left recursion.
■ The left-recursion may appear in a single step of the derivation
(immediate left-recursion), or may appear in more than one step of
the derivation (indirect left recursion).
Immediate Left-Recursion
Consider a non-terminal A with two productions:
A → Aα | β
where β does not begin with A. The immediate left recursion can be
eliminated by rewriting:
A → β A'
A' → α A' | ε
This technique works for any number of A-productions.
Immediate Left-Recursion -- Example
E → E + T | T
T → T * F | F
F → id | ( E )
Eliminating the immediate left recursion gives:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → id | ( E )
Indirect Left-Recursion -- Example: consider the grammar
S → Aa | b
A → Ac | Sd | ε
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | ε
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left recursion in A:
A → SdA' | A'
A' → cA' | ε
for S:
- Replace S → Aa with S → SdA'a | A'a
So, we will have S → SdA'a | A'a | b
- Eliminate the immediate left recursion in S:
S → A'aS' | bS'
S' → dA'aS' | ε
(i)
S → aB | aC | Sd | Se
B → bBc | f
C → g
(ii)
S → cAd
A → Ac | Sd | a
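The elimination rule above can be sketched in Python. This is an illustrative sketch, not from the slides: the function name and the encoding of production bodies as tuples of symbols (with the empty tuple standing for ε) are my own choices.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | eps
    Bodies are tuples of symbols; () denotes epsilon."""
    recursive = [b[1:] for b in bodies if b and b[0] == head]   # the a_i tails
    others = [b for b in bodies if not (b and b[0] == head)]    # the b_j bodies
    if not recursive:
        return {head: bodies}           # nothing to eliminate
    new = head + "'"
    return {
        head: [b + (new,) for b in others],
        new: [a + (new,) for a in recursive] + [()],
    }

# A -> Ac | Aad | bd | eps, as in the worked example above
rules = eliminate_immediate_left_recursion(
    "A", [("A", "c"), ("A", "a", "d"), ("b", "d"), ()])
print(rules)
```

Running it on the A-productions from the worked example yields A → bdA' | A' and A' → cA' | adA' | ε, matching a by-hand application of the rule.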
Left-Factoring
■ Left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive or top down parsing.
■ A predictive parser (a top-down parser without backtracking) insists
that the grammar must be left-factored.
■ When the choice between the two alternative A –productions is not
clear we may be able to rewrite the productions to defer the decision
until enough of the input has been seen that we can make a right
choice.
stmt → if expr then stmt else stmt
| if expr then stmt
■ when we see the input if, we cannot immediately tell which production
rule to choose to expand stmt in the derivation.
Left-Factoring (cont.)
■ In general,
If A → αβ1 | αβ2 are two A productions and input begins
with non empty string derived from α, we do not know whether
to expand
A to αβ1 or
A to αβ2
■ We may defer the decision by expanding A to αA', and then A' to
β1 or β2, as follows:
A → α A'
A' → β1 | β2
Left-Factoring – Method
■ Method:
For each non-terminal A, find the longest prefix α common to two or
more of its alternatives. If α ≠ ε, i.e., there is a common prefix,
replace all of the A-productions
A → αβ1 | ... | αβn | γ
where γ represents all alternatives that do not begin with α, by
A → α A' | γ
A' → β1 | β2 | ... | βn
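The method above can be sketched in Python. This is an illustrative sketch, not from the slides: it groups alternatives by their first symbol only (so the common prefix pulled out per pass is one symbol long; repeated application factors further, mirroring the two-step solution in Example 2 below), and the function name and tuple encoding of bodies are my own.

```python
from collections import defaultdict

def left_factor_once(head, bodies):
    """One left-factoring pass over A's alternatives (tuples of symbols).
    A group of 2+ alternatives sharing their first symbol is pulled out
    into a fresh primed non-terminal; () denotes epsilon."""
    groups = defaultdict(list)
    for b in bodies:
        groups[b[0] if b else None].append(b)
    rules, counter = {head: []}, 0
    for first, alts in groups.items():
        if first is None or len(alts) == 1:
            rules[head].extend(alts)            # no common prefix to factor
        else:
            counter += 1
            new = head + "'" * counter          # A', A'', ...
            rules[head].append((first, new))
            rules[new] = [b[1:] for b in alts]  # strip the shared first symbol
    return rules

# A -> ad | a | ab | abc | b
print(left_factor_once("A", [("a","d"), ("a",), ("a","b"), ("a","b","c"), ("b",)]))
```

One pass gives A → aA' | b with A' → d | ε | b | bc, the intermediate step of Example 2; applying the function again to A' completes the factoring.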
Q1) Left factor the following grammar:
S → iEtS | iEtSeS | a
E → b
Solution:
S → iEtSS' | a
S' → eS | ε
E → b
Left factor the following grammar:
2) A → ad | a | ab | abc | b
Left-Factoring – Example 1
A → abB | aB | cdg | cdeB | cdfB
⇓
A → aA' | cdA''
A' → bB | B
A'' → g | eB | fB
Left-Factoring – Example 2
A → ad | a | ab | abc | b
⇓
A → aA' | b
A' → d | ε | b | bc
⇓
A → aA' | b
A' → d | ε | bA''
A'' → ε | c
Left-Factoring – Example 3
lexp → atom | list
atom → number | identifier
list → ( lexp_seq )
lexp_seq → lexp lexp_seq'
lexp_seq' → , lexp_seq | ε
Parsing Techniques
4.4 Top down Parsing
■ Top down parsing is viewed as a problem of
constructing parse tree from the root and creating
the nodes of the parse tree in preorder and finding
the left most derivation for an input string.
50
EXAMPLE
■ Write the sequence of top down parse for id+id*id using
the arithmetic expression grammar
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
EXAMPLE
■ Write the sequence of top down parse for id+id using the arithmetic
expression grammar
E → T E'
E' → +T E' | ε
T → F T'
T' → *F T' | ε
F → id | (E)
Leftmost derivation:
E ⇒lm T E'
  ⇒lm F T' E'
  ⇒lm id T' E'
  ⇒lm id E'
  ⇒lm id + T E'
  ⇒lm id + F T' E'
  ⇒lm id + id T' E'
  ⇒lm id + id E'
  ⇒lm id + id
SEQUENCE OF STEPS IN WRITING THE PARSE
TREE FOR THE STRING id + id
E ⇒lm TE' ⇒lm FT'E' ⇒lm id T'E' ⇒lm id E' ⇒lm id + TE' ⇒lm id + FT'E' ⇒lm id + id T'E' ⇒lm id + id E' ⇒lm id + id
[Figure: the parse tree for id + id grows step by step; at each derivation step the leftmost nonterminal leaf is expanded with the production used at that step, and the ε-productions for E' and T' close off the remaining leaves.]
Activity 1
■ Write the sequence of top down parse for id+id*id using the arithmetic
expression grammar
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
Activity 2
■ Write the sequence of top down parse for id*id+id using the arithmetic
expression grammar
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
4.4.1 Recursive – Descent Parsing
Typical procedure for a nonterminal in a top down parser:
void A( )
{
Choose an A-production, A → X1 X2 . . . Xk ;
for ( i = 1 to k )
{
if ( Xi is a non-terminal )
call procedure Xi( ) ;
else if ( Xi equals the current input symbol a )
advance the input to the next symbol ;
else /* an error has occurred */ ;
}
}
■ A parser that uses collection of Recursive procedures for parsing
the given input string is called Recursive descent.
■ A recursive descent parser consists of set of procedures, one for
each nonterminal.
■ Execution begins with the procedure for the start symbol, which
halts and announces success if its procedure body scans the entire
input string.
■ General recursive descent parsing may require backtracking, that
is, it may require repeated scans over the input.
■ However, backtracking is rarely needed to parse programming
language constructs. The procedure shown above does not support
backtracking; to allow backtracking, the code has to be modified.
■ Consider the grammar
S → cAd
A → ab | a
To construct a top down parse tree for the input string w= cad,
begin with a tree consisting of single node labeled S and the
input pointer pointing to c, the first symbol of w.
S has only one production, so we use it to expand S and obtain the
tree of Fig. (a). The leftmost leaf, labeled c, matches the first
symbol of input w, so we advance the input pointer to a, the
second symbol of w, and consider the next leaf, labeled A.
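The cAd example above can be turned into a tiny backtracking recursive-descent parser. This is an illustrative Python sketch, not from the slides; each non-terminal becomes a function, and A saves the input position so it can backtrack from A → ab to A → a, exactly the retry described in the trace.

```python
# Backtracking recursive-descent parser for S -> cAd, A -> ab | a.
def parse(w):
    pos = 0
    def match(c):
        nonlocal pos
        if pos < len(w) and w[pos] == c:
            pos += 1
            return True
        return False
    def A():
        nonlocal pos
        saved = pos
        if match('a') and match('b'):   # try A -> ab first
            return True
        pos = saved                     # backtrack and try A -> a
        return match('a')
    def S():
        return match('c') and A() and match('d')
    return S() and pos == len(w)        # success only if all input consumed

print(parse("cad"))    # True: A -> ab fails on 'd', backtracks to A -> a
print(parse("cabd"))   # True: A -> ab succeeds directly
print(parse("cb"))     # False
```

For w = "cad", the first alternative A → ab consumes a but fails on b, the parser resets the input pointer, and A → a succeeds, after which d matches.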
4.4.2 FIRST and FOLLOW
■ The construction of both top-down and bottom –up parsers
is aided by two functions, FIRST and FOLLOW, associated
with grammar G.
■ During top-down parsing, FIRST and FOLLOW allow us to
choose which production to apply, based on the next input
symbol.
■ During panic mode error recovery, sets of tokens produced
by FOLLOW can be used as synchronizing tokens.
■ Definition: FIRST(α)
FIRST(α), where α is any string of grammar symbols, is defined to be
the set of terminals that begin strings derived from α.
If α ⇒* ε, then ε is also in FIRST(α).
■ Definition: FOLLOW(A)
FOLLOW(A), for a nonterminal A, is defined to be the set of terminals that
can appear immediately to the right of A in some sentential form.
■ That is, the set of terminals a such that there exists a derivation of
the form S ⇒* αAaβ for some α and β.
Rules to compute FIRST
1. If X is a terminal, then FIRST(X) = {X}.
2. If X is a nonterminal and X → Y1 Y2 . . . Yk is a production, add FIRST(Y1)
to FIRST(X); if Y1, . . . , Yi-1 all derive ε, also add FIRST(Yi); if all of
Y1 . . . Yk derive ε, add ε to FIRST(X).
3. If X → ε is a production, then add ε to FIRST(X).
Rules to compute FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
1) Obtain FIRST and FOLLOW for the grammar given below:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → id | ( E )
Solution
FIRST(E) = FIRST(T E') = FIRST(T) ------- (1)
FIRST(T) = FIRST(F T') = FIRST(F) ------- (2)
FIRST(F) = FIRST(id) U FIRST( (E) ) = {id} U { ( } = {id, ( }
So, from equation (2) we get FIRST(T) = {id, ( }
So, from equation (1) we get FIRST(E) = {id, ( }
FIRST(E') = FIRST(+T E') U FIRST(ε) = {+} U {ε} = {+, ε}
FIRST(T') = FIRST(*F T') U FIRST(ε) = {*} U {ε} = {*, ε}
FOLLOW(E) = { ), $ }   (E is the start symbol, so $ is in FOLLOW(E); E is followed by ) in F → (E))
FOLLOW(E') = FOLLOW(E) = { ), $ }
FOLLOW(T) = (FIRST(E') − {ε}) U FOLLOW(E) U FOLLOW(E')   (since ε is in FIRST(E'))
          = {+} U { ), $ } = {+, ), $ }
FOLLOW(T') = FOLLOW(T) = {+, ), $ }
FOLLOW(F) = (FIRST(T') − {ε}) U FOLLOW(T) U FOLLOW(T')   (since ε is in FIRST(T'))
          = {*} U {+, ), $ } = {*, +, ), $ }
Calculate FIRST and FOLLOW For the Following Grammar
2)
S → AaAb | BbBa
A→ε
B→ε
3)
S → L=R | R
L → *R |id
R→L
Question 2) Solution
FIRST(S) = FIRST(AaAb) U FIRST(BbBa) = {a} U {b} = {a, b}   (A and B derive only ε)
FIRST(A) = FIRST(ε) = {ε}
FIRST(B) = FIRST(ε) = {ε}
FOLLOW(S) = {$}
FOLLOW(A) = FIRST(a) U FIRST(b) = {a, b}
FOLLOW(B) = {a, b}
Calculate FIRST and FOLLOW for the following grammars:
4) S → aBa
   B → bB | ε
5) S → ,GH;
   G → aF
   F → bF | ε
   H → KL
   K → m | ε
   L → n | ε
6) S → ABCDE
   A → a | ε
   B → b | ε
   C → c
   D → d | ε
   E → e | ε
LL(1) Grammars
■ The first “L” in LL(1) stands for scanning the input from left to
right, the second “L” for producing a leftmost derivation, and
the “1” for using one input symbol of lookahead at each step
to make parsing action decisions.
Transition diagram for predictive parsers
■ Transition diagrams are useful for visualizing predictive parsers.
■ To construct the transition diagram from a grammar, first eliminate left recursion and then
left factor the grammar. Then:
1. For each nonterminal A, create an initial and a final (return) state.
2. For each production A → X1X2 . . . Xk, create a path from the initial to the final state,
with edges labeled X1, X2, . . . , Xk. If A → ε, the path is an edge labeled ε.
Consider the productions in the arithmetic expression grammar:
E → T E'
E' → + T E'
Definition of LL(1) grammar
A grammar G is LL(1) if and only if, whenever A → α | β are two distinct
productions of G:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning with a terminal
in FOLLOW(A); likewise, if α ⇒* ε, then β does not derive any string
beginning with a terminal in FOLLOW(A).
■ The first two conditions are equivalent to the statement
that FIRST(α) and FIRST(β) are disjoint sets.
Algorithm for construction of predictive parsing table
This algorithm collects the information from the FIRST and FOLLOW sets
into a predictive parsing table M[A, a], a two-dimensional array,
where A is a nonterminal and a is a terminal or the symbol $, the
input endmarker.
For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in
FOLLOW(A); if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to
M[A, $] as well.
All remaining entries of M are error entries.
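The table-construction rules above can be sketched in Python. This is an illustrative sketch, not from the slides; the FIRST and FOLLOW sets are hard-coded from the solution computed earlier in the chapter ("eps" stands for ε, and the empty tuple is an ε body), and the final check implements the statement that the grammar is LL(1) when no table cell holds more than one production.

```python
FIRST = {"E": {"id", "("}, "E'": {"+", "eps"}, "T": {"id", "("},
         "T'": {"*", "eps"}, "F": {"id", "("},
         "id": {"id"}, "(": {"("}, ")": {")"}, "+": {"+"}, "*": {"*"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}
GRAMMAR = [("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
           ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
           ("F", ("id",)), ("F", ("(", "E", ")"))]

def first_of(body):
    out = set()
    for X in body:
        out |= FIRST[X] - {"eps"}
        if "eps" not in FIRST[X]:
            return out
    return out | {"eps"}

def build_table():
    M = {}
    for A, body in GRAMMAR:
        f = first_of(body)
        # rule 1: FIRST(alpha); rule 2: FOLLOW(A) when epsilon in FIRST(alpha)
        targets = (f - {"eps"}) | (FOLLOW[A] if "eps" in f else set())
        for a in targets:
            M.setdefault((A, a), []).append(body)
    ll1 = all(len(v) == 1 for v in M.values())   # at most one entry per cell
    return M, ll1

M, is_ll1 = build_table()
print(is_ll1)   # the expression grammar is LL(1)
```

For example, cell M[E', +] holds only E' → +TE', while the ε-productions for E' and T' land in the cells indexed by their FOLLOW sets.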
■ Construct predictive parsing table ( LL(1) table) for arithmetic
expression grammar. Check whether the grammar is LL(1) by
using predictive parsing table and also without using predictive
parsing table.
Note:
To construct predictive parsing table, the grammar should be
unambiguous, non left recursive and left factored.
We have three grammars for expressions:
1) Ambiguous:
E → E + E | E * E | ( E ) | id
2) Unambiguous, but left recursive:
E → E + T | T
T → T * F | F
F → ( E ) | id
3) Non-left-recursive (suitable for predictive parsing):
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
The grammar is:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → id | ( E )
FIRST(E) = FIRST(T E') = FIRST(T) = {id, ( }
FIRST(T) = FIRST(F T') = FIRST(F) = {id, ( }
FIRST(F) = FIRST(id) U FIRST( (E) ) = {id, ( }
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
FOLLOW(E) = { ), $ }
FOLLOW(E') = FOLLOW(E) = { ), $ }
FOLLOW(T) = (FIRST(E') − {ε}) U FOLLOW(E) = {+, ), $ }
FOLLOW(T') = FOLLOW(T) = {+, ), $ }
FOLLOW(F) = (FIRST(T') − {ε}) U FOLLOW(T) = {*, +, ), $ }
Solution
■ Construct the parsing table by placing all terminals in columns and
nonterminals in rows.
■ Blanks are error entries; non-blank entries indicate a production with which to
expand a nonterminal.
To check whether the grammar is LL(1) or not without
using the predictive parsing table:
Consider only those rules of the form A → α | β, i.e.,
1) E' → +T E' | ε
2) T' → *F T' | ε
3) F → id | (E)
Apply the rules given in the definition to check whether it is LL(1).
Consider the first production: E' → +T E' | ε
In this rule, α = +T E' and β = ε
Rule 1) FIRST(α) and FIRST(β) should be disjoint.
FIRST(+T E') = {+} and FIRST(ε) = {ε}. Since nothing is common,
the two sets are disjoint.
So, the first rule is satisfied for the first production.
Rule 2) At most one of α or β derives ε.
Only β (= ε) derives ε, so the second rule is satisfied.
Rule 3) If FIRST(β) contains ε, then FIRST(α) and
FOLLOW(E') should be disjoint, and vice versa.
FIRST(α) = FIRST(+T E') = {+}
FOLLOW(E') = { ), $ }
The two sets are disjoint, so the third rule is satisfied.
Consider the 2nd production: T' → *F T' | ε, with α = *F T' and β = ε.
Rule 1) FIRST(*F T') = {*} and FIRST(ε) = {ε}; nothing is common, so the sets are disjoint.
Rule 2) At most one of α or β derives ε: true, because only β derives ε.
Rule 3) FIRST(α) = FIRST(*F T') = {*} and FOLLOW(T') = {+, ), $} are disjoint.
Consider the 3rd production: F → id | (E), with α = id and β = (E).
Rule 1) FIRST(id) = {id} and FIRST((E)) = {(}; nothing is common, so the sets are disjoint.
Rule 2) Neither α nor β derives ε, so the rule holds.
Rule 3) Not applicable, since neither FIRST(α) nor FIRST(β) contains ε.
All three pairs of alternatives satisfy the rules, so the grammar is LL(1).
■ Equivalently, a grammar is an LL(1) grammar if the associated LL(1)
parsing table has at most one production in each table entry.
Construct predictive parsing table for the grammar.
Also verify that the grammar is LL(1) or not with and
without using predictive parsing table.
■ The above grammar is not LL(1), since there are two
entries for the nonterminal S on the input symbol e.
Non recursive predictive parsing
■ A non-recursive predictive parser can be built by
maintaining a stack explicitly, rather than implicitly via
recursive calls.
■ The parser mimics a leftmost derivation.
■ If w is the input that has been matched so far, then the
stack holds a sequence of grammar symbols α such that
S ⇒*lm wα.
Model of table driven Predictive Parser
■ The table-driven parser has an input buffer, a stack containing a
sequence of grammar symbols, a parsing table constructed by
predictive parsing algorithm, and an output stream.
■ The parser is controlled by a program that considers X, the symbol
on top of the stack, and a, the current input symbol.
■ If X is a nonterminal, the parser chooses an X-production by
consulting entry M[X, a] of the parsing table M.
■ Otherwise, it checks for a match between the terminal X and
current input symbol a.
Table driven predictive parsing algorithm
LL(1) Parsing Table
■ We already Constructed the LL(1) Parsing Table for the Arithmetic
Expression.
Matched     Stack         Input            Action
            E $           id + id * id $   output E → T E'
            T E' $        id + id * id $   output T → F T'
            F T' E' $     id + id * id $   output F → id
            id T' E' $    id + id * id $   match id
id          T' E' $       + id * id $      output T' → ε
id          E' $          + id * id $      output E' → + T E'
id          + T E' $      + id * id $      match +
id +        T E' $        id * id $        output T → F T'
id +        F T' E' $     id * id $        output F → id
id +        id T' E' $    id * id $        match id
id + id     T' E' $       * id $           output T' → * F T'
Continue like this until both the stack and the input are reduced to $ (accept).
Write the sequence of moves made by the predictive parser on the
input string id+id*id
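The table-driven loop can be sketched in Python. This is an illustrative sketch, not from the slides: the LL(1) table is hard-coded from the table constructed earlier for the expression grammar (the empty tuple is an ε body), and the driver pops the stack, matches terminals, and expands nonterminals via the table, printing the productions used.

```python
# Table-driven (non-recursive) predictive parser for the expression grammar.
TABLE = {("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
         ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
         ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
         ("T'", "*"): ("*", "F", "T'"),
         ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
         ("F", "id"): ("id",), ("F", "("): ("(", "E", ")")}
NONTERMS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]                 # start symbol on top of $
    tokens = tokens + ["$"]
    i = 0
    while stack:
        X, a = stack.pop(), tokens[i]
        if X == a:                     # match a terminal (or $ against $)
            i += 1
        elif X in NONTERMS and (X, a) in TABLE:
            body = TABLE[(X, a)]
            stack.extend(reversed(body))   # push body, leftmost symbol on top
            print(f"output {X} -> {' '.join(body) or 'eps'}")
        else:
            return False               # blank table entry: error
    return i == len(tokens)

print(parse(["id", "+", "id", "*", "id"]))   # True
```

Running it on id + id * id prints the same sequence of productions as the trace above, one output per expansion and one match per terminal.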
Exercise 1 to students
Given the grammar,
A → (A) | a
1. Construct the predictive parsing table
2. Check the grammar is LL(1) or not
3. Show the parser steps for the input ((a))
Exercise 2 to students
Given the grammar,
S → AaAb | BbBa
A → ε
B → ε
1. Compute FIRST and FOLLOW
2. Construct the predictive parsing table
3. Show the parser steps for the input ab
Exercise 3 to students
Consider the grammar given below. Check whether
the grammar is LL(1) or not without using Predictive
Parsing Table.
S→ aSbS | bSaS | ε
Error recovery in predictive parsing
■ An error is detected during predictive parsing when the terminal on
top of the stack does not match the next input symbol or when
nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error (i.e., the parsing-table entry is empty).
■ Panic Mode error recovery
Panic-mode error recovery is based on the idea of skipping over
symbols on the input until a token in a selected set of synchronizing
tokens appears. Its effectiveness depends on the choice of
synchronizing set. The sets should be chosen so that the parser
recovers quickly from errors that are likely to occur in practice.
Some heuristics are as follows
1. As a starting point, place all symbols in FOLLOW(A) into the synchronizing set
for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and
pop A from the stack, it is likely that parsing can continue.
2. It is not enough to use FOLLOW(A) as the synchronizing set for A. For example,
if semicolons terminate statements, as in C, then keywords that begin
statements may not appear in the FOLLOW set of the nonterminal representing
expressions. A missing semicolon after an assignment may therefore result in the
keyword beginning the next statement being skipped. We can add to the
synchronizing set of a lower-level construct the symbols that begin higher-level
constructs.
For example, we might add keywords that begin statements to the
synchronizing sets for the non-terminals generating expressions.
Some heuristics – Contd…
3. If we add symbols in FIRST(A) to the synchronizing set for nonterminal
A, then it may be possible to resume parsing according to A if a symbol
in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production
deriving ε can be used as a default. Doing so may postpone some error
detection, but cannot cause an error to be missed. This approach
reduces the number of non-terminals that have to be considered during
error recovery.
5. If a terminal on top of the stack cannot be matched, a simple idea is to
pop the terminal, issue a message saying that the terminal was inserted,
and continue parsing. In effect, this approach takes the synchronizing set
of a token to consist of all other tokens.
Method for panic mode error recovery in LL(1) parsing
■ Construct the predictive parsing table.
■ Compute FOLLOW. Place all the symbols in FOLLOW(A)
into the synchronizing set for nonterminal A. “synch”
indicates the synchronizing tokens obtained from the
FOLLOW set of the nonterminal.
Method for panic mode error recovery in LL(1) parsing
■ For parsing the erroneous input, the parsing table is used
as follows:
If the parser looks up entry M[A, a] and finds that it is blank,
then the input symbol a is skipped.
If the entry is “synch”, then the nonterminal on top of the stack
is popped in an attempt to resume parsing.
If a token (terminal) on top of the stack does not match the
input symbol, then the token is popped from the stack, with an
error message saying that the terminal was inserted.
Write the parsing and error recovery moves made by the
predictive parser on the erroneous input )id*+id
• Steps: Take the grammar suitable for top down parsing.
• Construct predictive parsing table.
• Add synchronizing tokens to the table by computing FOLLOW.
Adding synchronizing token
Parsing the string )id*+id
Bottom up Parsing
■ A bottom-up parse corresponds to the construction of a
parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).
Write the sequence of bottom up parse for the input id*id
■ Since it is bottom up parse the grammar to be considered
should be unambiguous
Unambiguous arithmetic expression grammar is:
E → E + T | T
T → T * F | F
F → ( E ) | id
■ To write the bottom up parse for the input, First write the
rightmost derivation for id*id
E ═> T ═> T*F ═> T*id ═> F*id ═> id*id
Bottom Up Parse for id*id
Reduction
■ A reduction is the reverse of a step in a derivation.
■ We can think of bottom up parsing as the process of “reducing" a
string w to the start symbol of the grammar.
■ At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of that
production.
■ Key problems in bottom up parsing:
Key decisions during bottom up parsing are about when to reduce and
what production to apply as the parse continues
Sequence of reductions for the given arithmetic expression grammar is:
id*id, F*id, T*id, T*F, T, E
Handle Pruning
Definition of handle
■Informally, a “handle" is a substring that matches the body of a
production, and whose reduction represents one step along the
reverse of a rightmost derivation.
Handle pruning process
The handles during the parse of id1*id2 according to
the arithmetic expression grammar.
Construct bottom up parse for the input string w= aaa*a++
using the grammar: S → SS+ | SS* |a
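The aaa*a++ exercise can be sketched in Python with a simple shift-reduce loop. This is an illustrative sketch, not from the slides: for this postfix-style grammar, greedily reducing whenever a production body appears on top of the stack happens to find every handle, whereas a real shift-reduce parser decides when to reduce by consulting an LR table.

```python
# Greedy shift-reduce sketch for S -> SS+ | SS* | a on w = aaa*a++.
BODIES = [("S", "S", "+"), ("S", "S", "*"), ("a",)]

def parse(w):
    stack, steps = [], []
    for c in w:
        stack.append(c)                          # shift
        steps.append("shift " + c)
        reduced = True
        while reduced:                           # reduce while a handle is on top
            reduced = False
            for body in BODIES:
                if tuple(stack[-len(body):]) == body:
                    del stack[-len(body):]
                    stack.append("S")            # replace body by head S
                    steps.append("reduce S -> " + "".join(body))
                    reduced = True
    return stack == ["S"], steps                 # accept iff only S remains

ok, steps = parse("aaa*a++")
print(ok)   # True: aaa*a++ reduces to the start symbol S
```

The recorded steps show the handle being pruned at each reduction: each a becomes S, then SS* and the two SS+ handles are reduced in turn, leaving S alone on the stack.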
Shift Reduce Parsing
■ Shift-Reduce Parsing is a form of bottom-up parsing in which a
stack holds grammar symbols and an input buffer holds the rest of
the string to be parsed.
■ Shift Reduce Parsing can be built for the largest class of grammar
called LR grammars.
■ Working:
■ We use $ to mark the bottom of the stack and also the right end of
the input. Conventionally, when discussing bottom-up parsing, the
top of the stack is to the right.
■ Initially, the stack is empty, and the string w is on the input,
as follows:
Actions of a Shift Reduce Parser
1. Shift : Shift the next input symbol onto the top of the stack.
2. Reduce: The right end of the string to be reduced must be at the
top of the stack. Locate the left end of the string within the stack
and decide with what nonterminal to replace the string.
3. Accept: Announce successful completion of parsing.
4. Error: Discover a syntax error and call an error recovery routine.
Show the parsing of input id1*id2 using Shift Reduce Parser.
Conflicts during shift reduce parsing
■ There are context-free grammars for which shift-reduce
parsing cannot be used.
■ Two types of conflicts that occur during shift-reduce
parsing are:
a)Shift/Reduce Conflict
b)Reduce/Reduce Conflict
Shift/Reduce Conflict
• Every Shift-Reduce Parser for such a grammar can reach a
configuration in which the parser, knowing the entire stack and also
the next k input symbols, cannot decide whether to shift or to reduce.
■ If we have a shift-reduce parser in configuration
Items and LR(0) automaton
■ An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse. States represent sets of
“items”.
■ An LR(0) item (item for short) of a grammar G is a production of G
with a dot at some position of the body. Thus, production A → XYZ
yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
The production A → ε generates only one item , A → .
■ An item indicates how much of a production we have seen at a
given point in the parsing process. For example, the item A → XY.Z
indicates that we have just seen on the input a string derivable
from XY, and that we hope next to see a string derivable from Z.
■ One collection of sets of LR(0) items, called the canonical LR(0)
collection, provides the basis for constructing a deterministic finite
automaton that is used to make parsing decisions. Such an
automaton is called an LR(0) automaton.
■ Each state of the LR(0) automaton represents a set of items in the
canonical LR(0) collection.
■ To construct the canonical LR(0) collection for a grammar, we define
an augmented grammar and two functions, CLOSURE and GOTO.
■ What is augmented grammar?
If G is a grammar with start symbol S, then G’, the augmented grammar
for G, is G with a new start symbol S’ and production S’ → S
■ The purpose of this new starting production is to indicate to the
parser when it should stop parsing and announce acceptance of the
input. That is, acceptance occurs when and only when the parser is
about to reduce by S’ → S
Closure of item sets
■ If I is a set of items for a grammar G, then CLOSURE(I ) is the set of
items constructed from I by the two rules:
■ Consider the augmented grammar
E’ → E
E→ E+T|T
T→T*F |F
F → ( E ) | id
If I is the set containing the one item { [E' → .E] }, then CLOSURE(I) is:
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
Computation of CLOSURE
■ We divide all the sets of items of interest into two classes:
1) Kernel items: the initial item S' → . S, and all items whose dots
are not at the left end.
2) Nonkernel items: all items with their dots at the left end,
except for S' → . S
Function GOTO
■ The second useful function is GOTO(I,X) where I is a set of
items and X is a grammar symbol.
■ Definition
GOTO(I,X) is defined to be the closure of the set of all items
[A →αX.β] such that [A → α .Xβ] is in I
■ GOTO function is used to define the transitions in the LR(0)
automaton for a grammar. The states of the automaton
correspond to sets of items, and GOTO(I,X) specifies the
transition from the state for I under input X.
■ If I is the set of two items { [E' → E.], [E → E. + T] }, then
GOTO(I, +) contains the items:
E → E +. T
T → .T * F
T → .F
F → .( E )
F → .id
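CLOSURE and GOTO can be sketched in Python. This is an illustrative sketch, not from the slides: an item is encoded as a (head, body, dot-position) tuple, the grammar is the augmented expression grammar, and both functions follow the definitions above directly.

```python
# CLOSURE and GOTO over LR(0) items of the augmented expression grammar.
GRAMMAR = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in GRAMMAR:   # dot before nonterminal B
                for b in GRAMMAR[body[dot]]:
                    item = (body[dot], b, 0)               # add B -> .gamma
                    if item not in items:
                        items.add(item)
                        changed = True
    return items

def goto(items, X):
    # advance the dot over X in every item where it is possible, then close
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == X}
    return closure(moved)

I0 = closure({("E'", ("E",), 0)})
print(len(I0))   # 7 items, the state I0 of the LR(0) automaton above
```

GOTO(I0, E) gives the two-item set { E' → E., E → E. + T }, and applying GOTO on + to that set reproduces the five items listed above.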
Algorithm for Computation of canonical collection of sets of
LR(0) items for augmented grammar G’
LR(0)Automaton
■ The states of this automaton are the sets of items from the
canonical LR(0) collection, and the transitions are given by the
GOTO function.
Construct the LR(0) automaton for the arithmetic
expression grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id
■ First augment the arithmetic expression grammar. The resulting grammar is:
E' → E
E → E + T | T
T → T * F | F
F → ( E ) | id
■ Begin with the item set { [E' → .E] } and compute its CLOSURE.
■ The CLOSURE is:
E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .( E )
F → .id
This set of items forms the state I0.
■ The start state of the LR(0) automaton is CLOSURE( { [E' → .E] } ).
LR(0) Automaton
How can LR(0) automaton help with shift reduce decisions?
■ The LR parsing algorithm uses its stack to keep track of
states as well as grammar symbols.
■ Initially, the stack holds the start state 0 of the automaton and the SYMBOL
column holds the bottom-of-stack marker $.
■ If the current input symbol is id, check whether state 0 has a transition
on id; if so, shift the symbol, else reduce.
■ While shifting, push the new state onto the top of the stack and the
input symbol onto the top of the SYMBOL column.
■ If there is no transition on the current symbol, reduce: pop the body of
the matching production, push the head A of the production onto the
SYMBOL column, and push the state reached by the transition on A from
the state now on top of the stack (i.e., GOTO(state, A)).
Actions of shift reduce parser on input id*id using LR(0) Automaton
Exercise to students
solution
Line  Stack      Symbols    Input            Action
1     0          $          id * id + id $   shift to 5
2     0 5        $ id       * id + id $      reduce by F → id
3     0 3        $ F        * id + id $      reduce by T → F
4     0 2        $ T        * id + id $      shift to 7
5     0 2 7      $ T *      id + id $        shift to 5
6     0 2 7 5    $ T * id   + id $           reduce by F → id
7     0 2 7 10   $ T * F    + id $           reduce by T → T * F
8     0 2        $ T        + id $           reduce by E → T
9     0 1        $ E        + id $           shift to 6
10    0 1 6      $ E +      id $             shift to 5
11    0 1 6 5    $ E + id   $                reduce by F → id
12    0 1 6 3    $ E + F    $                reduce by T → F
13    0 1 6 9    $ E + T    $                reduce by E → E + T
14    0 1        $ E        $                accept
■ Construct the LR(0) automaton for the following grammar:
A → (A) | a
solution
LR Parsing algorithm
■ Model of LR Parser
■ It consists of an input, an output, a stack, a driver program,
and a parsing table that has two parts (ACTION and GOTO).
■ The driver program is the same for all LR parsers; only the
parsing table changes from one parser to another.
■ The parsing program reads characters from an input buffer
one at a time.
■ Where a shift-reduce parser would shift a symbol, an LR
parser shifts a state.
■ Each state summarizes the information contained in the
stack below it
■ The stack holds a sequence of states, s0,s1, …sm, where
sm is on top. In the SLR method, the stack holds states
from the LR(0) automaton.
Structure of LR parsing table
■ The parsing table consists of two parts: a parsing-action function
ACTION and a goto function GOTO.
Configuration of LR parser
■ To describe the behavior of an LR parser, it helps to have a notation representing the
complete state of the parser: its stack and the remaining input. A configuration of an LR
parser is a pair
( s0 s1 … sm , ai ai+1 … an $ )
■ where the first component is the stack contents (top on the right), and the second component
is the remaining input. This configuration represents the right-sentential form
X1 X2 … Xm ai ai+1 … an
in essentially the same way as a shift-reduce parser would; here Xi is the grammar
symbol associated with state si.
LR parsing Algorithm
Construction of SLR parsing table
Algorithm for constructing SLR parsing table
■ The parsing table consisting of the ACTION and GOTO
functions determined by Algorithm above is called the
SLR(1) table for G.
■ An LR parser using the SLR(1) table for G is called the
SLR(1) parser for G, and a grammar having an SLR(1)
parsing table is said to be SLR(1).
■ We usually omit the “(1)” after “SLR”, since we shall
not deal here with parsers having more than one symbol
of lookahead.
Construct SLR parsing table for arithmetic expression grammar
and verify whether the grammar is SLR(1) or not?
■ Augment the grammar
■ Construct C the canonical collection of LR(0) items. (i.e construct
LR(0)automaton. The states will become canonical collection).
■ Number the productions, other than the starting production of the
augmented grammar, as:
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
■The codes for the actions are:
Moves of an LR parser on id*id+id
To verify whether the grammar is SLR(1) or not:
■ There should be at most one entry in each table cell; if so, the
grammar is SLR(1).
Exercise to students
1)Given the grammar
S → CC
C →cC | d
1)Obtain the sets of canonical collection of valid LR(0)
items.
2)Design SLR parsing table
3)Verify whether the grammar is SLR(1) or not
2) Show that the following grammar is not SLR(1)
3) Construct the LR(0) parsing table for the following
grammar:
S → Ac
A → AB | ε
B → aB | ε