MODULE 3 Syntax Analysis


MODULE 3

CHAPTER 4
SYNTAX ANALYSIS
VENKATESH
Senior Associate Professor
Department of CSE
AIET, MOODBIDRI
Introduction
■ Syntax Analysis is the second phase of compilation. The syntax
analyzer (parser) checks the syntax of the language.
■ Parsing, or syntax analysis, is a process that takes the input
string w and produces either a parse tree (syntactic structure) or
the syntactic errors.
■ Example: a = b + 10
4.1 Role of Parser

(Figure: position of the parser in the compiler model)
■ In the process of compilation, the parser and the lexical analyzer work
together: whenever the parser requires the next token, it invokes the
lexical analyzer.
■ The parser obtains a string of tokens from the lexical analyzer and
verifies that the string of token names can be generated by the
grammar for the source program.
■ We expect the parser to report any syntax errors in an intelligible
fashion and to recover from commonly occurring errors to
continue processing the remainder of the program.
■ For the well formed program, the parser constructs a parse tree
and passes it to the rest of the compiler for further processing.
Three general types of parsers for grammars
1. Universal Parser
■ Universal parsing methods include the Cocke-Younger-Kasami
algorithm and Earley's algorithm.
■ These methods are too inefficient to use in production compilers.
2. Top-Down Parser
Builds the parse tree from the top (root) to the bottom (leaves).
3. Bottom-Up Parser
Builds the parse tree from the bottom (leaves) to the top (root).
In both top-down and bottom-up parsers, the input to the parser is
scanned from left to right, one symbol at a time.
■ Efficient top-down and bottom-up parsers can be implemented
only for sub-classes of context-free grammars, particularly, LL and
LR grammars.
❑LL for top-down parsing
❑LR for bottom-up parsing
■ Parsers implemented by hand often use LL grammars.
■ Parsers for the larger class of LR grammars are constructed using
automated tools.

4.1.2 Representative grammars
Ambiguous version of the arithmetic expression grammar:
E → E + E | E * E | ( E ) | id
Unambiguous version of the arithmetic expression grammar:
E → E + T | T
T → T * F | F
F → ( E ) | id

The above grammar belongs to the class of LR grammars, which are suitable
for bottom-up parsing. However, this grammar cannot be used for
top-down parsing because it is left recursive.
Non-left-recursive version of the arithmetic expression grammar that is suitable for top-down parsing:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
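For later reference, such a grammar can be written down directly as a data structure. A minimal sketch in Python (the dictionary encoding and the names EPS and GRAMMAR are illustrative choices, not from the text):

```python
# Non-left-recursive expression grammar, one entry per nonterminal.
# Each alternative is a list of grammar symbols; EPS marks an empty body.
EPS = "eps"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["id"], ["(", "E", ")"]],
}

NONTERMINALS = set(GRAMMAR)
# Any symbol that is not a nonterminal (and not EPS) is a terminal.
TERMINALS = {sym for alts in GRAMMAR.values() for alt in alts
             for sym in alt if sym not in GRAMMAR and sym != EPS}

print(sorted(TERMINALS))  # → ['(', ')', '*', '+', 'id']
```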
4.1.3 Syntax error handling

Common programming errors can occur at many different levels.

1. Lexical errors: include misspellings of identifiers, keywords, or
operators.
Eg: the use of an identifier elipseSize instead of ellipseSize, or
missing quotes around text that is intended as a string.
2. Syntactic errors: include misplaced semicolons, or extra or missing
braces, that is { or }.
Eg in C: appearance of a case statement without an enclosing switch;
printing the value of a variable without declaring it.
3. Semantic errors: include type mismatches between operators and operands.
Eg: return of a value from a function whose return type is void.
4. Logical errors: can be anything from incorrect reasoning on the part of the
programmer.
Eg: use of the assignment operator = instead of the comparison operator ==.
❑The precision of parsing methods allows syntactic errors to be detected
very efficiently.
❑Several parsing methods, such as LL and LR, detect an error as soon as
possible, that is, when the stream of tokens from the lexical analyzer cannot be
parsed further according to the grammar specified.
■ The error handler in a parser has goals that are simple to state but
challenging to realize:
❑Report the presence of errors clearly and accurately
❑Recover from each error quickly enough to detect the
subsequent errors.
❑Add minimal overhead to the processing of correct programs.
How does an error handler report the presence of an error?
❑At the very least, it must report the place in the source program
where an error is detected, because there is a good chance that
the actual error occurred within the previous few tokens.
❑A common strategy is to print the offending line with a pointer to
the position at which an error is detected.
4.1.4 Error - Recovery strategies

■ Once the error is detected, how should the parser recover?


■ The simplest approach is for the parser to quit with an informative error
message when it detects the first error; the drawback is that additional
errors in the rest of the program then go undetected.
Common error-recovery strategies are:
1. Panic-Mode Recovery
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction

1)Panic-Mode Recovery
■ On discovering an error, the parser discards input symbols one at a time
until one of a designated set of Synchronizing tokens is found.
■ Synchronizing tokens are usually delimiters such as semicolon or }
whose role in the source program is clear and unambiguous.
■ The compiler designer must select the synchronizing tokens
appropriate for the source language.
Disadvantage:
 Panic-mode recovery often skips a considerable amount of input
without checking it for additional errors.
Advantages:
 Simplicity
 It is guaranteed not to go into an infinite loop
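The skipping step can be sketched in a few lines. A hypothetical helper in Python, assuming tokens are plain strings and that semicolon and `}` form the designated synchronizing set (both assumptions are illustrative):

```python
SYNC = {";", "}"}  # designated synchronizing tokens (an assumed choice)

def panic_mode_skip(tokens, pos):
    """On an error at tokens[pos], discard input symbols one at a time
    until a synchronizing token is found; return the resume position."""
    while pos < len(tokens) and tokens[pos] not in SYNC:
        pos += 1
    # Resume just after the synchronizing token, if one was found.
    return pos + 1 if pos < len(tokens) else pos

tokens = ["x", "=", "@", "#", ";", "y", "=", "2", ";"]
print(panic_mode_skip(tokens, 2))  # → 5: skips "@", "#", resumes after ";"
```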
2) Phrase-Level Recovery
■ On discovering an error, A parser may perform local correction on the
remaining input. i.e, it may replace a prefix of the remaining input by some
string that allows the parser to continue.
Ex: Typical local correction is to replace a comma by a semicolon, delete an
extraneous semicolon or insert a missing semicolon
■ Local correction is left to the compiler designer and the replacement has to
be done carefully so that it does not lead to infinite loops.
■ Phrase-level replacement has been used in several error-repairing
compilers, as it can correct any input string.
■ Major drawback is difficulty in coping with the situations in which the actual
error has occurred before the point of detection.
3) Error Productions

■ By anticipating the common errors that might be encountered, we
can augment the grammar for the language at hand with
productions that generate the erroneous constructs.
■ A parser constructed from the grammar augmented by these error
productions detects the anticipated errors when an error
production is used during parsing.
■ The parser can then generate appropriate error diagnostics to
indicate the erroneous construct that has been recognized in the
input.
4) Global Correction

■ We would like a compiler to make as few changes as possible in
processing an incorrect input string.
■ There are algorithms that perform minimal sequence of changes to
obtain a globally least- cost correction.
■ Given an incorrect input string x and grammar G, these algorithms
will find a parse tree for a related string y, such that the number of
insertions, deletions and changes of tokens required to transform x
into y is as small as possible.
■ Unfortunately, these methods are too costly to implement in terms
of time and space, so these techniques are currently only of theoretical interest.
4.2 Context free grammars (CFG)
Consider the conditional statement grammar
stmt → if (expr) stmt else stmt
Definition of CFG
■ A context-free grammar is a 4-tuple consisting of terminals, nonterminals,
a start symbol, and productions.
1) Terminals: the basic symbols from which strings are formed. The
term "token name" is a synonym for "terminal", and we often use the
word "token" for a terminal.
Eg: if and else in the conditional statement grammar.
2) Nonterminals: syntactic variables that denote sets of strings.
Eg: stmt and expr in the conditional statement grammar.
3) One nonterminal symbol is the start symbol and the set of
strings it denotes is the language generated by the grammar
4) Productions: specify the manner in which the terminals and
nonterminals can be combined to form strings.
Each production consists of:
■ A nonterminal called the head or left side of the production.
■ The symbol → (sometimes ::= has been used in place of →).
■ A body or right side of the production, consisting of zero or more
terminals and nonterminals.

Derivations
Leftmost derivation (LMD): at each step of the derivation, the leftmost
nonterminal is chosen for replacement.

Eg: consider the ambiguous arithmetic expression grammar

E → E + E | E * E | ( E ) | id

Leftmost derivation for the string id+id*id:

E ⇒lm E+E ⇒lm id+E ⇒lm id+E*E ⇒lm id+id*E ⇒lm id+id*id

Rightmost derivation (RMD): at each step of the derivation, the rightmost
nonterminal is chosen for replacement.

Rightmost derivation for the string id+id*id:

E ⇒rm E*E ⇒rm E*id ⇒rm E+E*id ⇒rm E+id*id ⇒rm id+id*id
■ Sentential form: if S ⇒* α, where S is the start symbol of grammar G,
we say that α is a sentential form of G.

■ A sentence of L(G) is a sentential form consisting only of terminal symbols of G.

i.e., in S ⇒* α:
- If α contains nonterminals, it is called a sentential form of G.
- If α contains no nonterminals, it is called a sentence of G.
■ L(G), the language of G (the language generated by G), is its set of sentences
(terminal strings).
Parse tree
■ A parse tree is a graphical representation of a derivation that
filters out the order in which the productions are applied to
replace the non terminals.
■ Each interior node of parse tree is labeled with a nonterminal in
the head of the production and the children of the node are
labeled with symbols in the body of the production
■ The leaves of the parse tree, which are labeled by nonterminals or
terminals and are read from left to right, constitute a sentential
form called the yield or frontier of the parse tree.

Ambiguity

• A grammar that produces more than one parse tree for some sentence is
called an ambiguous grammar.

Eg: the arithmetic expression grammar E → E + E | E * E | ( E ) | id

Two leftmost derivations, and hence two parse trees, for the string id+id*id:

E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
(parse tree rooted at +, with the * subtree below its right child)

E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
(parse tree rooted at *, with the + subtree below its left child)
Activity
Given the grammar
S → (L) | a
L → L , S | S

Write i) the leftmost derivation
ii) the rightmost derivation
iii) the parse tree

for the strings: i) ((a, a), a, (a))
ii) (a, (a, a))
iii) (((a, a), a))
Lexical vs syntax analysis
Why use regular expressions to define the lexical syntax of a
language?
1.Separating the syntactic structure of a language into lexical and
nonlexical parts provides a convenient way of modularizing the front
end of a compiler into two manageable-sized components.
2.The lexical rules of a language are frequently quite simple, and to
describe them we do not need a notation as powerful as grammars.
3.Regular expressions generally provide a more concise and easier-
to-understand notation for tokens than grammars.
4.More efficient lexical analyzers can be constructed automatically
from regular expressions than from arbitrary grammars.
Eliminating Ambiguity
■ For most parsers, the grammar must be unambiguous.

■ An unambiguous grammar gives a unique parse tree for each sentence.

■ We should eliminate ambiguity in the grammar during the
design phase of the compiler.

■ An ambiguous grammar should be rewritten to eliminate the
ambiguity.
Consider the grammar for conditional statements:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

According to this grammar, a compound conditional statement such as

if E1 then S1 else if E2 then S2 else S3

can be derived.
Eliminating Ambiguity (cont.)
The grammar is ambiguous, since the string

if E1 then if E2 then S1 else S2

has two parse trees.

Two parse trees for the ambiguous statement:

Tree 1 (the else matched with the outer if):
stmt → if expr(E1) then stmt else stmt(S2),
where the inner stmt → if expr(E2) then stmt(S1)

Tree 2 (the else matched with the inner if):
stmt → if expr(E1) then stmt,
where the inner stmt → if expr(E2) then stmt(S1) else stmt(S2)

Eliminating Ambiguity (cont.)
• In programming languages with conditional statements, the second parse tree
is preferred.

• The disambiguating rule is: "match each else with the closest unmatched then".
• So we rewrite our grammar to reflect this choice.

• The unambiguous grammar is:

stmt → matched_stmt | open_stmt

matched_stmt → if expr then matched_stmt else matched_stmt
             | other
open_stmt → if expr then stmt
          | if expr then matched_stmt else open_stmt
Left Recursion
■ A grammar is left recursive if it has a nonterminal A such that there
is a derivation A ⇒+ Aα for some string α.
■ A production is immediately left recursive if the nonterminal on the LHS
is also the leftmost symbol of the RHS.
■ Top-down parsing techniques cannot handle left-recursive grammars.
■ So, a transformation is required to eliminate left recursion.
■ Left recursion may appear in a single step of the derivation
(immediate left recursion), or may appear over two or more steps of
the derivation (indirect left recursion).
Immediate Left-Recursion
Consider a nonterminal A with two productions:

A → Aα | β, where β does not start with A

The left-recursive production can be eliminated by rewriting the
offending productions as:

A → βA’
A’ → αA’ | ε

Immediate left recursion can be eliminated by the following
technique, which works for any number of A-productions.

First group the productions as:

A → Aα1 | ... | Aαm | β1 | ... | βn
where no βi begins with A.
Then replace the A-productions by:
A → β1A’ | β2A’ | ... | βnA’
A’ → α1A’ | α2A’ | ... | αmA’ | ε
■ The nonterminal A generates the same strings as before, but it
is no longer left recursive.

■ This procedure eliminates all left recursion from the A and A’
productions, provided no αi is ε.

■ But it does not eliminate left recursion involving derivations
of two or more steps (indirect left recursion).
Immediate Left-Recursion -- Example

E → E+T | T
T → T*F | F
F → id | (E)

⇓ eliminate immediate left recursion


E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
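The A → βA’, A’ → αA’ | ε transformation is mechanical. A sketch in Python, assuming production bodies are lists of symbols and using "eps" for ε (the function name and encoding are illustrative):

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """Rewrite  A -> A a1 | ... | A am | b1 | ... | bn  as
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | eps."""
    recursive = [alt[1:] for alt in alternatives if alt[0] == nt]
    others = [alt for alt in alternatives if alt[0] != nt]
    if not recursive:
        return {nt: alternatives}      # nothing to eliminate
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [["eps"]],
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | eps
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
```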
Left-Recursion -- Problem
• A grammar with no immediately left-recursive productions can still
be left recursive.

• So, by eliminating only the immediate left recursion, we may not
obtain a grammar that is free of left recursion.
S → Aa | b
A → Ac | Sd | ε
Here the nonterminal S is not immediately left recursive, but it is still left recursive:
S ⇒ Aa ⇒ Sda
• So, we have to eliminate all left recursion from our grammar.
Eliminate Left-Recursion -- Algorithm
Input: Grammar G with no cycles or ε - productions
Output: an equivalent grammar with no left recursion.
arrange non-terminals in some order: A1 ... An
for (each i from 1 to n )
{
for (each j from 1 to i-1 )
{
replace each production of the form Ai → Aj γ by the productions
Ai → δ1 γ | δ2 γ| ... | δk γ, where Aj → δ1 | δ2 |... | δk are current Aj productions
}
eliminate immediate left-recursions among Ai productions
}
Eliminate Left-Recursion -- Example
S → Aa | b
A → Ac | Sd | ε

- Order of nonterminals: S, A

for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.

for A:
- Replace A → Sd with A → Aad | bd
So, we will have A → Ac | Aad | bd | ε

- Eliminate the immediate left recursion in A:

A → bdA’ | A’
A’ → cA’ | adA’ | ε

So, the resulting equivalent grammar, which is not left recursive, is:

S → Aa | b
A → bdA’ | A’
A’ → cA’ | adA’ | ε
Eliminate Left-Recursion – Example 2
S → Aa | b
A → Ac | Sd | ε

- Order of nonterminals: A, S

for A:
- we do not enter the inner loop.
- Eliminate the immediate left recursion in A:
A → SdA’ | A’
A’ → cA’ | ε

for S:
- Replace S → Aa with S → SdA’a | A’a
So, we will have S → SdA’a | A’a | b
- Eliminate the immediate left recursion in S:
S → A’aS’ | bS’
S’ → dA’aS’ | ε

So, the resulting equivalent grammar, which is not left recursive, is:

S → A’aS’ | bS’
S’ → dA’aS’ | ε
A → SdA’ | A’
A’ → cA’ | ε
Activity:
Eliminate left recursion from the following grammars

(i)
S → aB | aC | Sd | Se
B → bBc | f
C → g

(ii)
S → cAd
A → Ac | Sd | a
Left-Factoring
■ Left factoring is a grammar transformation that is useful for producing a
grammar suitable for predictive or top down parsing.
■ A predictive parser (a top-down parser without backtracking) insists
that the grammar must be left-factored.
■ When the choice between the two alternative A –productions is not
clear we may be able to rewrite the productions to defer the decision
until enough of the input has been seen that we can make a right
choice.
stmt → if expr then stmt else stmt
| if expr then stmt
■ when we see the input if, we cannot immediately tell which production
rule to choose to expand stmt in the derivation.
Left-Factoring (cont.)
■ In general,
if A → αβ1 | αβ2 are two A-productions and the input begins
with a nonempty string derived from α, we do not know whether
to expand
A to αβ1 or
A to αβ2.
■ We may defer the decision by expanding A to αA’ and then A’ to
β1 or β2, as follows:
A → αA’
A’ → β1 | β2
Left-Factoring – Method
■ Method:
For each nonterminal A, find the longest prefix α common to two or
more of its alternatives. If α ≠ ε, i.e., there is a common prefix,
replace all of the A-productions
A → αβ1 | ... | αβn | γ
where γ represents all alternatives that do not begin with α, by
A → αA’ | γ
A’ → β1 | β2 | ... | βn
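This method can be sketched in Python for a single nonterminal. The sketch below groups alternatives by their first symbol and factors out the prefix common to each whole group (a simplification of the general method; all names are illustrative):

```python
def common_prefix(alts):
    """Longest common prefix (as a list of symbols) of the alternatives."""
    prefix = []
    for symbols in zip(*alts):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nt, alternatives):
    """One round of left factoring: group alternatives by first symbol
    and factor the common prefix out of each group of two or more."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0], []).append(alt)
    new_prods = {nt: []}
    for group in groups.values():
        if len(group) == 1:                 # nothing to factor here
            new_prods[nt].append(group[0])
            continue
        alpha = common_prefix(group)
        new_nt = nt + "'" * len(new_prods)  # fresh name per factored group
        new_prods[nt].append(alpha + [new_nt])
        new_prods[new_nt] = [alt[len(alpha):] or ["eps"] for alt in group]
    return new_prods

g = left_factor("stmt", [["if", "E", "then", "S", "else", "S"],
                         ["if", "E", "then", "S"], ["a"]])
print(g["stmt'"])  # → [['else', 'S'], ['eps']]
```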
Q1) Left factor the following grammar:

S → iEtS | iEtSeS | a
E → b

Solution:

S → iEtSS’ | a
S’ → eS | ε
E → b
Left factor the following grammars:

1) A → abB | aB | cdg | cdeB | cdfB

2) A → ad | a | ab | abc | b

3) lexp → atom | list
atom → number | identifier
list → ( lexp_seq )
lexp_seq → lexp , lexp_seq | lexp
Left-Factoring – Example 1
A → abB | aB | cdg | cdeB | cdfB

A → aA’ | cdA’’
A’ → bB | B
A’’ → g | eB | fB
Left-Factoring – Example 2
A → ad | a | ab | abc | b

First factor out a:
A → aA’ | b
A’ → d | ε | b | bc

Then factor out b in A’:
A → aA’ | b
A’ → d | ε | bA’’
A’’ → ε | c
Left-Factoring – Example 3
lexp → atom | list
atom → number | identifier
list → ( lexp_seq )
lexp_seq → lexp , lexp_seq | lexp

After left factoring:
lexp_seq → lexp lexp_seq’
lexp_seq’ → , lexp_seq | ε
Parsing Techniques

4.4 Top-down Parsing
■ Top-down parsing can be viewed as the problem of
constructing a parse tree for the input string, starting from
the root and creating the nodes of the parse tree in preorder;
equivalently, it finds a leftmost derivation for the input string.
EXAMPLE
■ Write the sequence of top-down parse steps for id+id using the arithmetic
expression grammar:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)

Leftmost derivation:
E ⇒lm T E’ ⇒lm F T’ E’ ⇒lm id T’ E’ ⇒lm id E’ ⇒lm id + T E’
  ⇒lm id + F T’ E’ ⇒lm id + id T’ E’ ⇒lm id + id E’ ⇒lm id + id
SEQUENCE OF STEPS IN WRITING THE PARSE
TREE FOR THE STRING id + id

E ⇒lm T E’ ⇒lm F T’ E’ ⇒lm id T’ E’ ⇒lm id E’ ⇒lm id + T E’
  ⇒lm id + F T’ E’ ⇒lm id + id T’ E’ ⇒lm id + id E’ ⇒lm id + id

(Figures: the parse tree after each step, grown one leftmost expansion at a
time. The tree starts as the single node E; each step replaces the leftmost
nonterminal leaf with the body of the production used, until the yield of the
completed tree, reading the ε-labeled leaves of E’ and T’ as empty, is id + id.)
Activity 1
■ Write the sequence of top down parse for id+id*id using the arithmetic
expression grammar
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)

Activity 2
■ Write the sequence of top down parse for id*id+id using the arithmetic
expression grammar
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)

4.4.1 Recursive-Descent Parsing
Typical procedure for a nonterminal in a top-down parser:
void A ( )
{
    Choose an A-production, A → X1 X2 ... Xk ;
    for (i = 1 to k)
    {
        if (Xi is a nonterminal)
            call procedure Xi ( );
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */
    }
}
■ A parser that uses a collection of recursive procedures for parsing
the given input string is called a recursive-descent parser.
■ A recursive-descent parser consists of a set of procedures, one for
each nonterminal.
■ Execution begins with the procedure for the start symbol, which
halts and announces success if its procedure body scans the entire
input string.
■ General recursive-descent parsing may require backtracking, that
is, it may require repeated scans over the input.
■ However, backtracking is rarely needed to parse programming
language constructs. The procedure shown above does not support
backtracking; to allow backtracking, the code would have to be modified.
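Because the non-left-recursive expression grammar is also left-factored, one token of lookahead suffices and no backtracking is needed. A sketch of such a recursive-descent parser in Python, one function per nonterminal (the token spelling "id" is an assumed convention):

```python
class ParseError(Exception):
    pass

def parse(tokens):
    """Recursive-descent parser for E -> TE', E' -> +TE'|eps,
    T -> FT', T' -> *FT'|eps, F -> id | (E)."""
    pos = 0

    def look():
        return tokens[pos] if pos < len(tokens) else "$"

    def match(t):
        nonlocal pos
        if look() != t:
            raise ParseError(f"expected {t!r}, got {look()!r}")
        pos += 1

    def E():           # E -> T E'
        T(); Ep()

    def Ep():          # E' -> + T E' | eps
        if look() == "+":
            match("+"); T(); Ep()

    def T():           # T -> F T'
        F(); Tp()

    def Tp():          # T' -> * F T' | eps
        if look() == "*":
            match("*"); F(); Tp()

    def F():           # F -> id | ( E )
        if look() == "id":
            match("id")
        elif look() == "(":
            match("("); E(); match(")")
        else:
            raise ParseError(f"unexpected {look()!r}")

    E()                # start with the procedure for the start symbol
    if look() != "$":
        raise ParseError("trailing input")
    return True

print(parse(["id", "+", "id", "*", "id"]))  # → True
```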
■ Consider the grammar
S → cAd
A → ab | a

To construct a top down parse tree for the input string w= cad,
begin with a tree consisting of single node labeled S and the
input pointer pointing to c, the first symbol of w.

S has only one production, so we use it to expand S and obtain the
tree of Fig. (a). The leftmost leaf, labeled c, matches the first
symbol of input w, so we advance the input pointer to a, the
second symbol of w, and consider the next leaf, labeled A.
4.4.2 FIRST and FOLLOW
■ The construction of both top-down and bottom –up parsers
is aided by two functions, FIRST and FOLLOW, associated
with grammar G.
■ During top-down parsing, FIRST and FOLLOW allows us to
choose which production to apply, based on the next input
symbol.
■ During panic mode error recovery, sets of tokens produced
by FOLLOW can be used as synchronizing tokens.

■ Definition: FIRST(α)
FIRST(α), where α is any string of grammar symbols, is defined to be
the set of terminals that begin strings derived from α.
If α ⇒* ε, then ε is also in FIRST(α).
■ Definition: FOLLOW(A)
FOLLOW(A), for a nonterminal A, is defined to be the set of terminals that
can appear immediately to the right of A in some sentential form.
■ That is, the set of terminals a such that there exists a derivation of
the form S ⇒* αAaβ for some α and β.
Rules to compute FIRST
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a nonterminal and X → Y1Y2...Yk is a production, add to FIRST(X)
everything in FIRST(Y1) except ε; if Y1 ⇒* ε, also add FIRST(Y2) except ε,
and so on; if all of Y1...Yk derive ε, add ε to FIRST(X).
Rules to compute FOLLOW
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε
is in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
1) Obtain FIRST and FOLLOW for the grammar given below:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)
Solution
FIRST(E) = FIRST(T E |) = FIRST(T) ------- (1)
FIRST(T) = FIRST(F T’) = FIRST(F) --- (2)
FIRST(F) = FIRST(id) U FIRST( (E) ) = {id} U { ( } = {id, ( }
So, From Equation (2) We Get, FIRST(T) = {id, (}
So, From Equation (1) We Get, FIRST(E) = {id, (}
FIRST(E’) = FIRST(+T E’) U FIRST(ε) = {+} U{ε} = {+,ε}
FIRST(T’) = FIRST(*F T’ ) U FIRST(ε) = {*}U{ε} = {*,ε}

FOLLOW(E) = FIRST( ) ) ∪ {$} = { ), $ }
FOLLOW(E’) = FOLLOW(E) = { ), $ }
FOLLOW(T) = (FIRST(E’) − {ε}) ∪ FOLLOW(E) ∪ FOLLOW(E’)
          = {+} ∪ { ), $ } = { +, ), $ }
FOLLOW(T’) = FOLLOW(T) = { +, ), $ }
FOLLOW(F) = (FIRST(T’) − {ε}) ∪ FOLLOW(T) ∪ FOLLOW(T’)
          = {*} ∪ { +, ), $ } = { *, +, ), $ }
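The rule-by-rule computation above can be automated with a fixed-point iteration: keep applying the FIRST and FOLLOW rules until no set changes. A sketch in Python, reusing the dictionary grammar encoding with "eps" for ε (the encoding and names are illustrative):

```python
EPS = "eps"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["id"], ["(", "E", ")"]],
}

def first_of(symbols, grammar, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        if sym == EPS:
            out.add(EPS)
            break
        if sym not in grammar:             # terminal
            out.add(sym)
            break
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            break
    else:                                  # every symbol was nullable
        out.add(EPS)
    return out

def compute_first(grammar):
    """FIRST sets for all nonterminals, by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, grammar, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    """FOLLOW sets, applying the three rules until nothing changes."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                 # rule 1
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar: # only nonterminals get FOLLOW
                        continue
                    f = first_of(alt[i + 1:], grammar, first)
                    new = f - {EPS}        # rule 2
                    if EPS in f:           # rule 3: the rest is nullable
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "E", first)
print(sorted(follow["F"]))  # → ['$', ')', '*', '+']
```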

Calculate FIRST and FOLLOW For the Following Grammar

2)
S → AaAb | BbBa
A→ε
B→ε
3)
S → L=R | R
L → *R |id
R→L
Question 2) Solution
FIRST(S) = FIRST(AaAb) ∪ FIRST(BbBa) = {a} ∪ {b} = {a, b}
FIRST(A) = FIRST(ε) = {ε}
FIRST(B) = FIRST(ε) = {ε}

FOLLOW(S) = {$}
FOLLOW(A) = FIRST(a) ∪ FIRST(b) = {a, b}
FOLLOW(B) = {a, b}
Calculate FIRST and FOLLOW for the following grammars:

4) S → aBa
   B → bB | ε

5) S → ,GH;
   G → aF
   F → bF | ε
   H → KL
   K → m | ε
   L → n | ε

6) S → ABCDE
   A → a | ε
   B → b | ε
   C → c
   D → d | ε
   E → e | ε
LL(1) Grammars

■ Predictive parsers, that is, recursive-descent parsers needing


no backtracking, can be constructed for a class of grammars
called LL(1).

■ The first “L" in LL(1) stands for scanning the input from left to
right, the second “L" for producing a leftmost derivation, and
the “1" for using one input symbol of lookahead at each step
to make parsing action decisions.
Transition diagram for predictive parsers
■ Transition diagrams are useful for visualizing predictive parsers.

■ To construct the transition diagram from a grammar, first eliminate left recursion and then
left factor the grammar.

■ Then, for each nonterminal A,

1. Create an initial and final (return) state.

2. For each production A → X1X2 . . . Xk, create a path from the initial to the final state,
with edges labeled X1,X2,. . . , Xk. If A → ε , the path is an edge labeled ε.

Consider the productions in the arithmetic expression grammar:

E → T E’
E’ → +T E’

(Figure: transition diagrams for E and E’)
Definition of LL(1) grammar
A grammar G is LL(1) if and only if, whenever A → α | β are two distinct
productions of G:
1. For no terminal a do both α and β derive strings beginning with a.
2. At most one of α and β can derive the empty string.
3. If β ⇒* ε, then α does not derive any string beginning
with a terminal in FOLLOW(A), and likewise with α and β interchanged.
■ The first two conditions are equivalent to the statement
that FIRST(α) and FIRST(β ) are disjoint sets.

■ The third condition is equivalent to stating that if ε is in


FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint sets,
and likewise if ε is in FIRST(α), then FIRST(β) and
FOLLOW(A) are disjoint sets

Algorithm for construction of predictive parsing table
This algorithm collects the information from the FIRST and FOLLOW sets
into a predictive parsing table M[A, a], a two-dimensional array,
where A is a nonterminal, and a is a terminal or the symbol $, the
input endmarker.
For each production A → α of the grammar:
1. For each terminal a in FIRST(α), add A → α to M[A, a].
2. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in
FOLLOW(A); if ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $] as well.
■ Construct predictive parsing table ( LL(1) table) for arithmetic
expression grammar. Check whether the grammar is LL(1) by
using predictive parsing table and also without using predictive
parsing table.

Note:
To construct predictive parsing table, the grammar should be
unambiguous, non left recursive and left factored.

We have three grammars for expressions:

1) Ambiguous:
E → E+E | E*E | (E) | id

2) Unambiguous, but left recursive:
E → E+T | T
T → T*F | F
F → (E) | id

3) Unambiguous and non-left-recursive (used for the parsing table):
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)

Its FIRST and FOLLOW sets, as computed earlier:
FIRST(E) = FIRST(T) = FIRST(F) = { id, ( }
FIRST(E’) = { +, ε }
FIRST(T’) = { *, ε }
FOLLOW(E) = FOLLOW(E’) = { ), $ }
FOLLOW(T) = FOLLOW(T’) = { +, ), $ }
FOLLOW(F) = { *, +, ), $ }
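Filling the table can be sketched directly from these sets. A Python sketch, hardcoding the FIRST and FOLLOW sets computed above (the dictionary encoding, with "eps" for ε, is an illustrative choice):

```python
EPS = "eps"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["id"], ["(", "E", ")"]],
}
# FIRST and FOLLOW sets as computed in the text.
FIRST = {"E": {"id", "("}, "E'": {"+", EPS}, "T": {"id", "("},
         "T'": {"*", EPS}, "F": {"id", "("}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"*", "+", ")", "$"}}

def first_of(symbols):
    """FIRST of a production body."""
    out = set()
    for sym in symbols:
        f = FIRST[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                    # every symbol could derive eps
    return out

def build_table(grammar):
    """M[(A, a)] -> production. A second production landing in the
    same cell would mean the grammar is not LL(1)."""
    table, ll1 = {}, True
    for nt, alts in grammar.items():
        for alt in alts:
            f = first_of(alt)
            lookaheads = f - {EPS}
            if EPS in f:            # add the production under FOLLOW(A)
                lookaheads |= FOLLOW[nt]
            for a in lookaheads:
                if (nt, a) in table:
                    ll1 = False     # conflict: two entries in one cell
                table[(nt, a)] = (nt, alt)
    return table, ll1

table, ll1 = build_table(GRAMMAR)
print(ll1)  # → True: at most one production per cell
```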
Solution
■ Construct the parsing table by placing all terminals (and $) in columns and
nonterminals in rows.
■ Blanks are error entries; nonblank entries indicate a production with which to
expand a nonterminal.

        id        +          *          (        )        $
E       E→TE’                           E→TE’
E’                E’→+TE’                        E’→ε     E’→ε
T       T→FT’                           T→FT’
T’                T’→ε       T’→*FT’             T’→ε     T’→ε
F       F→id                            F→(E)

Is the grammar LL(1) or not?

The above grammar is LL(1), since there is at most one production in each table entry.
To check whether the grammar is LL(1) or not without
using predictive parsing table:
Consider only those rules of the form A → α|β
i.e 1) E’ → +T E’ | ε
2) T’ → *F T ’| ε
3) F → id | (E)
Apply the rules given in the definition to check whether it is LL(1).
Consider the first production: E’ → +T E’ | ε
In the rule E’ → +T E’ | ε:
α = +T E’ and β = ε
Rule 1) FIRST(α) and FIRST(β) should be disjoint.
FIRST(+T E’) = {+} and FIRST(ε) = {ε}. Since nothing is common,
the two sets are disjoint.
So, rule 1 is satisfied for the first production.

Rule 2) At most one of α or β derives ε.

This is true, because only β derives ε.

Rule 3) If FIRST(β) contains ε, then FIRST(α) and
FOLLOW(E’) should be disjoint, and vice versa.
FIRST(α) = FIRST(+T E’) = {+}
FOLLOW(E’) = { ), $ }
These are disjoint.

Now apply the same three rules to the productions

T’ → *F T’ | ε and F → id | (E).
They are satisfied as well.
Therefore the given grammar is LL(1).
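The three pairwise conditions can also be checked mechanically. A Python sketch that takes the FIRST sets of the two alternatives and FOLLOW(A) as inputs (the function name and "eps" encoding are illustrative):

```python
EPS = "eps"

def ll1_conditions(first_alpha, first_beta, follow_A):
    """Check the LL(1) conditions for a rule A -> alpha | beta:
    1. FIRST(alpha) and FIRST(beta) share no terminal;
    2. at most one of alpha, beta derives eps;
    3. if eps is in FIRST(beta), FIRST(alpha) and FOLLOW(A) must be
       disjoint (and symmetrically)."""
    if (first_alpha - {EPS}) & (first_beta - {EPS}):
        return False                        # rule 1 violated
    if EPS in first_alpha and EPS in first_beta:
        return False                        # rule 2 violated
    if EPS in first_beta and first_alpha & follow_A:
        return False                        # rule 3 violated
    if EPS in first_alpha and first_beta & follow_A:
        return False                        # rule 3, symmetric case
    return True

# E' -> +TE' | eps: FIRST(+TE') = {+}, FIRST(eps) = {eps}, FOLLOW(E') = {), $}
print(ll1_conditions({"+"}, {EPS}, {")", "$"}))        # → True
# A nullable alternative whose FOLLOW overlaps FIRST fails rule 3:
print(ll1_conditions({"a"}, {EPS}, {"a", "b", "$"}))   # → False
```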
To check whether the grammar is LL(1) without using the predictive parsing
table (the previous slides consolidated):

Consider only the rules of the form A → α | β:
1) E’ → +T E’ | ε
2) T’ → *F T’ | ε
3) F → id | (E)

Production 1: E’ → +T E’ | ε (α = +T E’, β = ε)
- Rule 1: FIRST(+T E’) = {+} and FIRST(ε) = {ε}; nothing is common, so the sets are disjoint.
- Rule 2: at most one of α, β derives ε; true, only β derives ε.
- Rule 3: ε is in FIRST(β), and FIRST(α) = {+} is disjoint from FOLLOW(E’) = { ), $ }.

Production 2: T’ → *F T’ | ε (α = *F T’, β = ε)
- Rule 1: FIRST(*F T’) = {*} and FIRST(ε) = {ε}; the sets are disjoint.
- Rule 2: only β derives ε.
- Rule 3: FIRST(α) = {*} is disjoint from FOLLOW(T’) = { +, ), $ }.

Production 3: F → id | (E) (α = id, β = (E))
- Rule 1: FIRST(id) = {id} and FIRST((E)) = {(}; the sets are disjoint.
- Rule 2: neither α nor β derives ε, so the condition holds.
- Rule 3: does not apply, since ε is in neither FIRST set.

All three rules are satisfied for every production, so the grammar is LL(1).
■ A grammar is an LL(1) grammar if the associated LL(1) parsing
table has at most one production in each table entry.

■ But there is no guarantee that a grammar which is
unambiguous, non-left-recursive, and left-factored is
LL(1): some grammars with all three properties are
still not LL(1).
Construct the predictive parsing table for the grammar below (the
left-factored conditional grammar from Q1). Also verify whether the
grammar is LL(1), with and without using the predictive parsing table.

S → iEtSS’ | a
S’ → eS | ε
E → b

■ The above grammar is not LL(1), since the table has two
entries for the nonterminal S’ on the input symbol e.

■ Verifying it without using the predictive parsing table is
left as an exercise.
Non-recursive predictive parsing
■ A non-recursive predictive parser can be built by
maintaining a stack explicitly, rather than implicitly via
recursive calls.
■ The parser mimics a leftmost derivation.
■ If w is the input that has been matched so far, then the
stack holds a sequence of grammar symbols α such that
S ⇒*lm wα.
Model of table driven Predictive Parser

■ The table-driven parser has an input buffer, a stack containing a
sequence of grammar symbols, a parsing table constructed by
predictive parsing algorithm, and an output stream.

■ The input buffer contains the string to be parsed, followed by the


end-marker $.

■ We reuse the symbol $ to mark the bottom of the stack, which


initially contains the start symbol of the grammar on top of $.

■ The parser is controlled by a program that considers X, the symbol
on top of the stack, and a, the current input symbol.
■ If X is a nonterminal, the parser chooses an X-production by
consulting entry M[X, a] of the parsing table M.
■ Otherwise, it checks for a match between the terminal X and
current input symbol a.

Table-driven predictive parsing algorithm
Initially, the stack contains the start symbol on top of $, and the input
pointer points to the first symbol of w$. Let X be the symbol on top of the
stack and a the current input symbol; repeat until X = $:
1. If X = a, pop X off the stack and advance the input pointer.
2. If X is a terminal and X ≠ a, report an error.
3. If X is a nonterminal and M[X, a] = X → Y1Y2...Yk, pop X and push
Yk, ..., Y2, Y1 (Y1 on top), emitting the production as output; if M[X, a]
is blank, report an error.
LL(1) Parsing Table
■ We already Constructed the LL(1) Parsing Table for the Arithmetic
Expression.

Matched     Stack         Input           Action
            E $           id + id * id $  output E → TE’
            T E’ $        id + id * id $  output T → FT’
            F T’ E’ $     id + id * id $  output F → id
            id T’ E’ $    id + id * id $  match id
id          T’ E’ $       + id * id $     output T’ → ε
id          E’ $          + id * id $     output E’ → +TE’
id          + T E’ $      + id * id $     match +
id +        T E’ $        id * id $       output T → FT’
id +        F T’ E’ $     id * id $       output F → id
id +        id T’ E’ $    id * id $       match id
id + id     T’ E’ $       * id $          output T’ → *FT’
(continues similarly until the input is consumed)
Write the sequence of moves made by the predictive parser on the
input string id+id*id

100
Exercise 1 to students
Given the grammar,
A(A) | a
1. Construct the predictive parsing table
2. Check the grammar is LL(1) or not
3. Show the parser steps for the input ((a))

101
Exercise 2 to students
Given the grammar,
S AaAb| BbBa
A ε
B ε
1. Compute FIRST and FOLLOW
2. Construct the predictive parsing table
3. Show the parser steps for the input ab
102
Exercise 3 to students
Consider the grammar given below. Check whether
the grammar is LL(1) or not without using Predictive
Parsing Table.

S→ aSbS | bSaS | ε

103
Error recovery in predictive parsing
■ An error is detected during predictive parsing when the terminal on
top of the stack does not match the next input symbol or when
nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error (i.e., the parsing-table entry is empty).
■ Panic Mode error recovery
Panic-mode error recovery is based on the idea of skipping over
symbols on the input until a token in a selected set of synchronizing
tokens appears. Its effectiveness depends on the choice of
synchronizing set. The sets should be chosen so that the parser
recovers quickly from errors that are likely to occur in practice.
105
Some heuristics are as follows
1. As a starting point, place all symbols in FOLLOW(A) into the synchronizing set
for nonterminal A. If we skip tokens until an element of FOLLOW(A) is seen and
pop A from the stack, it is likely that parsing can continue.
2. It is not enough to use FOLLOW(A) as the synchronizing set for A. For example,
if semicolons terminate statements, as in C, then keywords that begin
statements may not appear in the FOLLOW set of the nonterminal representing
expressions. A missing semicolon after an assignment may therefore result in the
keyword beginning the next statement being skipped. We can add to the
synchronizing set of a lower-level construct the symbols that begin higher-level
constructs.
For example, we might add keywords that begin statements to the
synchronizing sets for the non-terminals generating expressions.
106
Some heuristics – Contd…
3. If we add symbols in FIRST(A) to the synchronizing set for nonterminal
A, then it may be possible to resume parsing according to A if a symbol
in FIRST(A) appears in the input.
4. If a nonterminal can generate the empty string, then the production
deriving ε can be used as a default. Doing so may postpone some error
detection, but cannot cause an error to be missed. This approach
reduces the number of nonterminals that have to be considered during
error recovery.
5. If a terminal on top of the stack cannot be matched, a simple idea is to
pop the terminal, issue a message saying that the terminal was inserted,
and continue parsing. In effect, this approach takes the synchronizing set
of a token to consist of all other tokens.
107
Method for panic mode error recovery in LL(1) parsing
■ Construct the predictive parsing table.
■ Compute FOLLOW. Place all the symbols in FOLLOW(A)
into the synchronizing set for nonterminal A. “synch”
indicates the synchronizing tokens obtained from the
FOLLOW set of the nonterminal.

108
Method for panic mode error recovery in LL(1) parsing
■ For parsing the erroneous input, the parsing table is to be used
as follows:
If the parser looks up entry M[A, a] and finds that it is blank,
then the input symbol a is skipped.
If the entry is “synch”, then the nonterminal on top of the stack
is popped in an attempt to resume parsing; in the initial parsing
configuration, however, the input symbol is skipped with an error
message rather than popping the nonterminal off the
stack.
If a token(terminal) on top of the stack does not match the
input symbol, then we pop the token from the stack.
109
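The three rules above can be folded into the table-driven loop as follows. This is a sketch, with the LL(1) table and FOLLOW sets hardcoded for the expression grammar; the special case for the initial configuration follows the caveat stated above.

```python
# Panic-mode recovery grafted onto the table-driven predictive loop,
# for the grammar E -> T E', E' -> +TE'|eps, T -> F T', T' -> *FT'|eps,
# F -> (E)|id, with FOLLOW-based "synch" sets.
TABLE = {
    ('E', 'id'): ['T', "E'"], ('E', '('): ['T', "E'"],
    ("E'", '+'): ['+', 'T', "E'"], ("E'", ')'): [], ("E'", '$'): [],
    ('T', 'id'): ['F', "T'"], ('T', '('): ['F', "T'"],
    ("T'", '+'): [], ("T'", '*'): ['*', 'F', "T'"],
    ("T'", ')'): [], ("T'", '$'): [],
    ('F', 'id'): ['id'], ('F', '('): ['(', 'E', ')'],
}
NONTERM = {'E', "E'", 'T', "T'", 'F'}
FOLLOW = {'E': {')', '$'}, "E'": {')', '$'},
          'T': {'+', ')', '$'}, "T'": {'+', ')', '$'},
          'F': {'+', '*', ')', '$'}}

def parse_with_recovery(tokens):
    """Return the list of error messages produced while recovering."""
    tokens = tokens + ['$']
    stack, i, errors = ['$', 'E'], 0, []
    while stack[-1] != '$':
        X, a = stack[-1], tokens[i]
        if X in NONTERM:
            if (X, a) in TABLE:
                stack.pop()
                stack.extend(reversed(TABLE[(X, a)]))
            elif a in FOLLOW[X] and stack != ['$', 'E']:
                # "synch" entry: pop the nonterminal and try to resume
                errors.append(f'error: M[{X},{a}] = synch, popped {X}')
                stack.pop()
            else:
                # blank entry (or initial configuration): skip input symbol
                errors.append(f'error: skipping {a}')
                i += 1
        elif X == a:
            stack.pop(); i += 1            # match terminal
        else:
            # terminal on top of stack does not match: report it inserted
            errors.append(f'error: missing {X}, inserted')
            stack.pop()
    return errors
```

On the erroneous input )id*+id this reports two errors: the leading ) is skipped, and later F is popped on the synch entry M[F, +], matching the recovery moves worked out on the next slides.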
Write the parsing and error recovery moves made by the
predictive parser on the erroneous input )id*+id
• Steps: Take the grammar suitable for top down parsing.
• Construct predictive parsing table.
• Add synchronizing tokens to the table by computing FOLLOW.

110
Adding synchronizing token

112
Parsing the string )id*+id

113
Bottom up Parsing
■ A bottom-up parse corresponds to the construction of a
parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).

■ A bottom-up parse constructs the rightmost derivation of the
input string in reverse (i.e., it creates a leftmost reduction).

114
Write the sequence of bottom up parse for the input id*id
■ Since it is bottom up parse the grammar to be considered
should be unambiguous
Unambiguous arithmetic expression grammar is:
E → E + T | T
T → T * F | F
F → ( E ) | id
■ To write the bottom up parse for the input, First write the
rightmost derivation for id*id
E ═> T ═> T*F ═> T*id ═> F*id ═> id*id
118
Bottom Up Parse for id*id

119
Reduction
■ Reduction is the reverse of the step in a derivation.
■ We can think of bottom up parsing as the process of “reducing" a
string w to the start symbol of the grammar.
■ At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of that
production.
■ Key problems in bottom up parsing:
Key decisions during bottom up parsing are about when to reduce and
what production to apply as the parse continues
Sequence of reductions for the given arithmetic expression grammar is:
id*id, F*id, T*id, T*F, T, E
120
Handle Pruning
Definition of handle
■Informally, a “handle" is a substring that matches the body of a
production, and whose reduction represents one step along the
reverse of a rightmost derivation.

■Formally, if S ⇒*rm αAw ⇒rm αβw, then production A → β in the
position following α is a handle of αβw

121
Handle pruning process

122
The handles during the parse of id1*id2 according to
the arithmetic expression grammar.

123
Construct bottom up parse for the input string w= aaa*a++
using the grammar: S → SS+ | SS* |a

Rightmost derivation is:


S ═> SS+
═> SSS++
═> SSa++
═> SSS*a++
═> SSa*a++
═> Saa*a++
═> aaa*a++

125
Shift Reduce Parsing
■ Shift-Reduce Parsing is a form of bottom-up parsing in which a
stack holds grammar symbols and an input buffer holds the rest of
the string to be parsed.
■ Shift Reduce Parsing can be built for the largest class of grammar
called LR grammars.
■ Working:
■ We use $ to mark the bottom of the stack and also the right end of
the input. Conventionally, when discussing bottom-up parsing, the
top of the stack is to the right.

126
■ Initially, the stack is empty, and the string w is on the input,
as follows:

■ During a left-to-right scan of the input string, the parser shifts


zero or more input symbols onto the stack, until it is ready to
reduce a string β of grammar symbols on top of the stack.
■ It then reduces β to the head of the appropriate production.
127
■ The parser repeats this cycle until it has detected an
error or until the stack contains the start symbol and
the input is empty:

■ Upon entering the above configuration, the parser


halts and announces successful completion of parsing.

128
Actions of a Shift Reduce Parser
1. Shift : Shift the next input symbol onto the top of the stack.
2. Reduce: The right end of the string to be reduced must be at the
top of the stack. Locate the left end of the string within the stack
and decide with what nonterminal to replace the string.
3. Accept: Announce successful completion of parsing.
4. Error: Discover a syntax error and call an error recovery routine.

129
Show the parsing of input id1*id2 using Shift Reduce Parser.

130
Conflicts during shift reduce parsing
■ There are context-free grammars for which shift-reduce
parsing cannot be used.
■ Two types of conflicts that occur during shift-reduce
parsing are:
a)Shift/Reduce Conflict
b)Reduce/Reduce Conflict

131
Shift/Reduce Conflict
• Every Shift-Reduce Parser for such a grammar can reach a
configuration in which the parser, knowing the entire stack and also
the next k input symbols, cannot decide whether to shift or to reduce.

Eg: Consider the dangling else grammar

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

132
■ If we have a shift-reduce parser in the configuration

STACK: … if expr then stmt      INPUT: else … $

■ we cannot tell whether if expr then stmt is the handle, no matter


what appears below it on the stack. Here there is a shift/reduce
conflict. Depending on what follows the else on the input, it might
be correct to reduce if expr then stmt to stmt, or it might be
correct to shift else and then to look for another stmt to complete
the alternative if expr then stmt else stmt.
133
Consider the input
if expr then if expr then stmt else stmt
There are two rightmost derivation for the above input
stmt ═> if expr then stmt
     ═> if expr then if expr then stmt else stmt
stmt ═> if expr then stmt else stmt
     ═> if expr then if expr then stmt else stmt
When stack top has if expr then stmt and the input from left
contains else we cannot decide whether to Shift or Reduce
134
Reduce/Reduce Conflict
• It is a conflict in which parser cannot decide which of
several reductions to make.

• Here we know the handle, but the stack contents and


the next input symbol are insufficient to determine
which production should be used in a reduction.
135
■ Consider the grammar which involves productions for procedure
calls and array references.

136
■ If we have a shift-reduce parser in the configuration

STACK: … id ( id      INPUT: , id ) …

■ It is evident that the id on top of the stack must be reduced, but by


which production?
parameter->id or expr->id
■ The correct choice is parameter -> id if p is a procedure, but expr -> id
if p is an array.
■ The stack does not tell which information in the symbol table
obtained from the declaration of p must be used. Here there is
reduce/reduce conflict.
137
■ One solution is to change the token id in the production
stmt → id ( parameter_list )
to procid and to use a more sophisticated lexical analyzer that
returns the token name procid when it recognizes a lexeme that is
the name of a procedure.
■ If we made this modification, then on processing p(i,j) the parser
would be in the configuration

STACK: … procid ( id      INPUT: , id ) …

■ In this configuration we choose reduction by parameter → id


138
Introduction to LR parsing: Simple LR
■ The most prevalent type of bottom-up parser today is based
on a concept called LR(k) parsing; the “L" is for left-to-right
scanning of the input, the “R" for constructing a rightmost
derivation in reverse, and the k for the number of input
symbols of lookahead that are used in making parsing
decisions.
■ This section introduces the basic concepts of LR parsing and
the easiest method for constructing shift-reduce parsers,
called “simple LR" (or SLR, for short).
139
Why LR Parsers?
LR parsing is attractive for a variety of reasons:
■LR parsers can be constructed to recognize virtually all programming
language constructs for which context-free grammars can be written.
Non LR context-free grammars exist, but these can generally be
avoided for typical programming-language constructs.
■The LR-parsing method is the most general non-backtracking
shift-reduce parsing method known, yet it can be implemented as
efficiently as other, more primitive shift-reduce methods.
■An LR parser can detect a syntactic error as soon as it is possible to
do so on a left-to-right scan of the input.
140
■ The class of grammars that can be parsed using LR methods is a
proper superset of the class of grammars that can be parsed with
predictive or LL methods. For a grammar to be LR(k), we must be
able to recognize the occurrence of the right side of a production in
a right-sentential form, with k input symbols of look ahead.

The principal drawback of the LR method is that it is too much work


to construct an LR parser by hand for a typical programming-
language grammar. A specialized tool, an LR parser generator, is
needed.

141
Items and LR(0) automaton
■ An LR parser makes shift-reduce decisions by maintaining states to
keep track of where we are in a parse. States represent sets of
“items”.
■ An LR(0) item (item for short) of a grammar G is a production of G
with a dot at some position of the body. Thus, production A → XYZ
yields the four items
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
The production A → ε generates only one item, namely A → .
142
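Generating the items of a production is mechanical: one dot position per slot, including after the last symbol, while an ε-production gets the single item with the dot at position 0. A small illustrative helper (the representation is an assumption for these sketches, not a standard API):

```python
def items(head, body):
    """All LR(0) items of production head -> body, one per dot position.

    An item is represented as (head, body, dot); body is a tuple of
    grammar symbols, and body == () models an epsilon production.
    """
    return [(head, tuple(body), dot) for dot in range(len(body) + 1)]
```

So items('A', ('X', 'Y', 'Z')) yields the four items of A → XYZ, and items('A', ()) yields the single item of A → ε.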
■ An item indicates how much of a production we have seen at a
given point in the parsing process. For example, the item A → XY.Z
indicates that we have just seen a string derivable from XY and
hope next to see a string derivable from Z on the input.
■ One collection of sets of LR(0) items, called the canonical LR(0)
collection, provides the basis for constructing a deterministic finite
automaton that is used to make parsing decisions. Such an
automaton is called an LR(0) automaton.
■ Each state of the LR(0) automaton represents a set of items in the
canonical LR(0) collection.
143
■ To construct the canonical LR(0) collection for a grammar, we define
an augmented grammar and two functions, CLOSURE and GOTO.
■ What is augmented grammar?
If G is a grammar with start symbol S, then G’, the augmented grammar
for G, is G with a new start symbol S’ and production S’ → S
■ The purpose of this new starting production is to indicate to the
parser when it should stop parsing and announce acceptance of the
input. That is, acceptance occurs when and only when the parser is
about to reduce by S’ → S

144
Closure of item sets
■ If I is a set of items for a grammar G, then CLOSURE(I ) is the set of
items constructed from I by the two rules:
1. Initially, add every item in I to CLOSURE(I).
2. If A → α.Bβ is in CLOSURE(I) and B → γ is a production, then add
the item B → .γ to CLOSURE(I), if it is not already there. Apply this
rule until no more new items can be added to CLOSURE(I).
145
■ Consider the augmented grammar
E’ → E
E→ E+T|T
T→T*F |F
F → ( E ) | id

146
■ Consider the augmented grammar
E’ → E
E→ E+T|T
T→T*F |F
F → ( E ) | id
If I is the set of one item { [E’ → . E] } then CLOSURE(I) is??
E’ → . E
E→ .E+T
E → .T
T → .T * F
T→.F
F→ .(E)
F → . id
147
Computation of CLOSURE

148
■ We divide all the sets of items of interest into two classes:

1)Kernel Items: the initial item, S’ → . S and all items whose


dots are not at the left end.

2) Nonkernel items: all items with their dots at the left end,
except for S’ → . S

149
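Using the same (head, body, dot) item representation as before, the kernel/nonkernel split is a one-line test; the start production's head is passed as a parameter here, an assumption made for illustration:

```python
def is_kernel(item, start_head="S'"):
    """Kernel items: S' -> .S plus every item whose dot is not leftmost."""
    head, body, dot = item
    return dot > 0 or head == start_head
```

All other items (dot at the left end, head other than the start symbol) are nonkernel and can be regenerated by CLOSURE, which is why parser generators store only the kernel items of each state.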
Function GOTO
■ The second useful function is GOTO(I,X) where I is a set of
items and X is a grammar symbol.
■ Definition
GOTO(I,X) is defined to be the closure of the set of all items
[A →αX.β] such that [A → α .Xβ] is in I
■ GOTO function is used to define the transitions in the LR(0)
automaton for a grammar. The states of the automaton
correspond to sets of items, and GOTO(I,X) specifies the
transition from the state for I under input X.
150
■ If I is the set of two items { [E’ → E.], [E → E . + T ] }, then
GOTO(I ,+) contains the items
E → E +. T
T → .T * F
T→.F
F→ .(E)
F → . id

151
Algorithm for Computation of canonical collection of sets of
LR(0) items for augmented grammar G’

152
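The construction above can be sketched directly, specialized here to the augmented expression grammar; items are (head, body, dot) triples, and the canonical collection should come out to the 12 states I0–I11 of the LR(0) automaton shown on the next slides.

```python
# Canonical collection of sets of LR(0) items for the augmented
# expression grammar E' -> E, E -> E+T|T, T -> T*F|F, F -> (E)|id.
GRAMMAR = [
    ("E'", ('E',)),
    ('E', ('E', '+', 'T')), ('E', ('T',)),
    ('T', ('T', '*', 'F')), ('T', ('F',)),
    ('F', ('(', 'E', ')')), ('F', ('id',)),
]
NONTERM = {h for h, _ in GRAMMAR}

def closure(I):
    """CLOSURE(I): add B -> .gamma for every B right after a dot."""
    I = set(I)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(I):
            if dot < len(body) and body[dot] in NONTERM:
                for h, b in GRAMMAR:
                    if h == body[dot] and (h, b, 0) not in I:
                        I.add((h, b, 0))
                        changed = True
    return frozenset(I)

def goto(I, X):
    """GOTO(I, X): closure of all items in I with the dot moved over X."""
    return closure({(h, b, d + 1) for h, b, d in I
                    if d < len(b) and b[d] == X})

def canonical_collection():
    start = closure({("E'", ('E',), 0)})      # state I0
    C, work = {start}, [start]
    symbols = NONTERM | {'+', '*', '(', ')', 'id'}
    while work:
        I = work.pop()
        for X in symbols:
            J = goto(I, X)
            if J and J not in C:
                C.add(J)
                work.append(J)
    return C
```

CLOSURE of { [E′ → .E] } yields the seven items listed earlier, and the full collection has 12 states.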
LR(0)Automaton

■ The states of this automaton are the sets of items from the
canonical LR(0) collection, and the transitions are given by the
GOTO function.

■ The start state of the LR(0) automaton is CLOSURE({ [S’ → .S]}),


where S’ is the start symbol of the augmented grammar.

153
Construct LR(0) automaton for the arithmetic
expression grammar
E→ E+T|T
T→T*F |F
F → ( E ) | id
■First augment the arithmetic expression grammar. Resulting grammar is:
E’ → E
E→ E+T|T
T→T*F |F
F → ( E ) | id
■Begin with [E’ → .E] item set. Compute its CLOSURE.
154
■ CLOSURE is:
E’ → . E
E→ .E+T
E → .T
T → .T * F
T→.F
F→ .(E)
F → . id
This sets of items of CLOSURE forms the state I0
■ Start state of LR(0)automaton is CLOSURE( { [E’ → .E] } )

155
LR(0) Automaton

156
How can LR(0) automaton help with shift reduce decisions?

Shift-Reduce decisions can be made as follows:


■Suppose that the string γ of grammar symbols takes the
LR(0) automaton from the start state 0 to some state j. Then,
❑Shift on next input symbol a if state j has a transition on a.

❑Otherwise, we choose to reduce; the items in state j will


tell us which production to use.

157
■ The LR parsing algorithm uses its stack to keep track of
states as well as grammar symbols.

■ We use a stack to hold states and the grammar symbols


corresponding to the states on the stack appear in column
SYMBOLS.

158
■ Initially, the stack holds the start state 0 of the automaton and
SYMBOLS holds the bottom-of-stack marker $.
■ Suppose the next input symbol is id. If state 0 has a transition on
id, the parser shifts; otherwise it reduces.
■ On a shift, the new state is pushed onto the stack and the input
symbol is appended to the SYMBOLS column.
■ On a reduction, the symbols forming the production body are
replaced in SYMBOLS by the head of the production, the
corresponding states are popped, and the state reached from the new
top-of-stack state on the head nonterminal is pushed.
159
Actions of shift reduce parser on input id*id using LR(0) Automaton

160
Exercise to students

■ Write the parsing moves for the input id*id+id using


LR(0) automaton.

161
solution

162
Line  Stack      Symbols    Input           Action
 1    0          $          id * id + id $  Shift to 5
 2    0 5        $ id       * id + id $     Reduce by F → id
 3    0 3        $ F        * id + id $     Reduce by T → F
 4    0 2        $ T        * id + id $     Shift to 7
 5    0 2 7      $ T *      id + id $       Shift to 5
 6    0 2 7 5    $ T * id   + id $          Reduce by F → id
 7    0 2 7 10   $ T * F    + id $          Reduce by T → T * F
 8    0 2        $ T        + id $          Reduce by E → T
 9    0 1        $ E        + id $          Shift to 6
10    0 1 6      $ E +      id $            Shift to 5
11    0 1 6 5    $ E + id   $               Reduce by F → id
12    0 1 6 3    $ E + F    $               Reduce by T → F
13    0 1 6 9    $ E + T    $               Reduce by E → E + T
14    0 1        $ E        $               Accept
■Construct LR(0) automaton for the
following grammar
A→(A) | a

Also show the shift reduce parsing action


for input ((a))

164
solution

165
LR Parsing algorithm
■ Model of LR Parser

166
■ It consists of an input, an output, a stack, a driver program,
and a parsing table that has two parts (ACTION and GOTO).
■ The driver program is the same for all LR parsers; only the
parsing table changes from one parser to another.
■ The parsing program reads characters from an input buffer
one at a time.
■ Where a shift-reduce parser would shift a symbol, an LR
parser shifts a state.
167
■ Each state summarizes the information contained in the
stack below it
■ The stack holds a sequence of states, s0,s1, …sm, where
sm is on top. In the SLR method, the stack holds states
from the LR(0) automaton.

168
Structure of LR parsing table
■ The parsing table consists of two parts: a parsing-action function
ACTION and a goto function GOTO.

169
Configuration of LR parser
■ To describe the behavior of an LR parser, it helps to have a notation representing the
complete state of the parser: its stack and the remaining input. A configuration of an LR
parser is a pair:

( s0 s1 … sm , ai ai+1 … an $ )

■ where the first component is the stack contents (top on the right), and the second component
is the remaining input. This configuration represents the right-sentential form

X1 X2 … Xm ai ai+1 … an

in essentially the same way as a shift-reduce parser would; here Xi is the grammar symbol
corresponding to state si.
170
LR parsing Algorithm

171
Construction of SLR parsing table

■ An LR parser using an SLR parsing table is called an SLR parser


The SLR method:
■ The SLR method begins with LR(0) items and LR(0) automata.
■ That is, given a grammar, G, we augment G to produce G’, with a
new start symbol S’.
■ From G’ , we construct C, the canonical collection of sets of items
for G’ together with the GOTO function.
■ The ACTION and GOTO entries in the parsing table are then
constructed using the following algorithm.

172
Algorithm for constructing SLR parsing table

173
■ The parsing table consisting of the ACTION and GOTO
functions determined by Algorithm above is called the
SLR(1) table for G.
■ An LR parser using the SLR(1) table for G is called the
SLR(1) parser for G, and a grammar having an SLR(1)
parsing table is said to be SLR(1).
■ We usually omit the “(1)" after the “SLR," since we shall
not deal here with parsers having more than one symbol
of lookahead.
174
Construct SLR parsing table for arithmetic expression grammar
and verify whether the grammar is SLR(1) or not?
■ Augment the grammar
■ Construct C the canonical collection of LR(0) items. (i.e construct
LR(0)automaton. The states will become canonical collection).
■ Number the productions other than the starting production of
augmented grammar as:
(1)E → E + T
(2) E → T
(3)T → T * F
(4) T → F
(5)F → ( E )
(6) F → id
175
■The codes for the actions are:
1. si means shift and stack state i
2. rj means reduce by the production numbered j
3. acc means accept
4. blank means error

176
177
Moves of an LR parser on id*id+id

178
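The moves in the figure above can be reproduced with a small driver. The ACTION and GOTO entries below are the standard SLR(1) table for this grammar, hand-encoded here for illustration rather than generated, with productions numbered as on the earlier slide (1: E → E+T, 2: E → T, 3: T → T*F, 4: T → F, 5: F → (E), 6: F → id).

```python
# SLR(1) driver for the expression grammar.
PROD = {1: ('E', 3), 2: ('E', 1), 3: ('T', 3),
        4: ('T', 1), 5: ('F', 3), 6: ('F', 1)}   # head, body length

ACTION = {
    (0, 'id'): 's5', (0, '('): 's4',
    (1, '+'): 's6', (1, '$'): 'acc',
    (2, '+'): 'r2', (2, '*'): 's7', (2, ')'): 'r2', (2, '$'): 'r2',
    (3, '+'): 'r4', (3, '*'): 'r4', (3, ')'): 'r4', (3, '$'): 'r4',
    (4, 'id'): 's5', (4, '('): 's4',
    (5, '+'): 'r6', (5, '*'): 'r6', (5, ')'): 'r6', (5, '$'): 'r6',
    (6, 'id'): 's5', (6, '('): 's4',
    (7, 'id'): 's5', (7, '('): 's4',
    (8, '+'): 's6', (8, ')'): 's11',
    (9, '+'): 'r1', (9, '*'): 's7', (9, ')'): 'r1', (9, '$'): 'r1',
    (10, '+'): 'r3', (10, '*'): 'r3', (10, ')'): 'r3', (10, '$'): 'r3',
    (11, '+'): 'r5', (11, '*'): 'r5', (11, ')'): 'r5', (11, '$'): 'r5',
}
GOTO = {(0, 'E'): 1, (0, 'T'): 2, (0, 'F'): 3,
        (4, 'E'): 8, (4, 'T'): 2, (4, 'F'): 3,
        (6, 'T'): 9, (6, 'F'): 3, (7, 'F'): 10}

def lr_parse(tokens):
    """Return the sequence of reduction numbers, or raise SyntaxError."""
    tokens = tokens + ['$']
    stack, i, reductions = [0], 0, []
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            raise SyntaxError(f'error in state {stack[-1]} on {tokens[i]}')
        if act == 'acc':
            return reductions
        if act.startswith('s'):                # shift: push the new state
            stack.append(int(act[1:]))
            i += 1
        else:                                  # reduce by production j
            j = int(act[1:])
            head, length = PROD[j]
            del stack[len(stack) - length:]    # pop |body| states
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(j)
```

On id*id+id the driver performs reductions 6, 4, 6, 3, 2, 6, 4, 1 and accepts, matching the move sequence in the figure.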
To verify grammar is SLR(1) or not?
■ If every cell of the table contains at most one entry, then the
grammar is SLR(1).
■ Every SLR(1) grammar is unambiguous, but there are many
unambiguous grammars that are not SLR(1).

179
Exercise to students
1)Given the grammar
S → CC
C →cC | d
1)Obtain the sets of canonical collection of valid LR(0)
items.
2)Design SLR parsing table
3)Verify whether the grammar is SLR(1) or not

180
2) Show that the following grammar is not SLR(1)

S → AaAb | BbBa
A → ε
B → ε

181
3) Construct LR(0) parsing table for the following
grammar
S → Ac
A → AB | ε
B → aB | ε

182
