Module 3
Module 3
Module 3
Tech/S6
MODULE 3
Along the way, a bottom-up parser searches for substrings of the working string that match
the right side of some production.
When it finds such a substring, it reduces it, i.e., substitutes the left side non-terminal for the
matching right side. The goal is to reduce all the way up to the start symbol and report a
successful parse.
This can be considered as the process of “reducing” a string w to the start symbol of
a grammar.
At each reduction step parser searches for substrings of the working string that match
the right side of some production.
When it finds such a substring, it reduces it, i.e., substitutes the left side non-terminal
for the matching right side. The goal is to reduce all the way up to the start symbol
and report a successful parse.
In Shift-reduce parsing a stack holds grammar symbols and an input buffer holds the
rest of the string to be parsed.
As we shall see, the handle always appears at the top of the stack just before it is
identified as the handle.
We use $ to mark the bottom of the stack and also the right end of the input. Initially,
the stack is empty, and the string w is on the input, as follows:
STACK INPUT
$ w$
During a left-to-right scan of the input string, the parser shifts zero or more input
symbols onto the stack, until it is ready to reduce a string β of grammar symbols on
top of the stack.
It then reduces β to the head of the appropriate production. The parser repeats this
cycle until it has detected an error or until the stack contains the start symbol and the
input is empty as follows:
STACK INPUT
$S $
Upon entering this configuration, the parser halts and announces successful
completion of parsing.
There are actually four possible actions a shift-reduce parser can make:
Shift
Reduce
Accept
Error
1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: The parser knows the right end of the string to be reduced must be at the
top of the stack. It must then locate the left end of the string within the stack and
decide with what nonterminal to replace the string.
3. Accept: Announce successful completion of parsing.
4. Error: Discover a syntax error has occurred and calls an error recovery routine.
EXAMPLE
Following figure steps through the actions a shift-reduce parser might take in parsing the
input string id1 *id2 according to the expression grammar.
E→ E + T | T
T→ T * F | F
F→ ( E ) | id
EXAMPLE
Precedence Relations
In operator-precedence parsing, we define three precedence relations between certain
pairs of terminals as follows:
The intention of the precedence relations is to find the handle of a right sentential
form,
In our input string $a1a2...an$, we insert the precedence relation between the pairs of
terminals.
<. is inserted between the leftmost $ and id since <. is the entry in row $ and column
id.
Handle is a substring that matches with the right side of the production and if the
substring matches with the right side of the production, it is reduced with the non-
terminal on the left side of the production
EXAMPLE
E E+E
E E*E
E (E)
E id
EE+E
E+E*E
E + E * id3
E + id2 * id3
Id1 + id2 * id3
HANDLE PRUNING
The process of obtaining rightmost derivation in reverse order is called “handle pruning”.
(i.e.) if w is a sentence or string of the grammar at hand, then w = n where n is the nth right
sentinel form of some rightmost derivation.
3.1.2 LR PARSING
LR parsing is most efficient method of bottom up parsing which can be used to parse large
class of context free grammar.
The technique is called LR(k) parsing; the “L” is for left to right scanning of input symbol, the
“R” for constructing right most derivation in reverse, and the k for the number of input
symbols of lookahead that are used in making parsing decision.
Requirements of LR parser:
Input Buffer
Stack
Parsing Table
LR Parsing Program
Input Buffer
The parsing program reads characters from an input buffer one at a time.
Stack
Parsing program uses a stack to store string of the form s0X1s1X2s2X3 ………Xmsm
where sm is on the top.
Each state symbol summarizes the information contained in the stack below it
Combination of state symbol on stack top and current input symbol are used to index
the parsing table and determine the shift reduce parsing decision
Parsing Table
Consist of 2 parts
1. parsing action function action
2. goto function goto
Action
This function takes as arguments a state i and a terminal a (or $, the input end marker). The
value of ACTION [i, a] can have one of the four forms:
i. Shift j, where j is a state.
ii. Reduce by a grammar production A---> β.
iii. Accept.
iv. Error.
Goto
This function takes a state and grammar symbol as arguments and produces a state. If GOTO
[Ii ,A] = Ij, the GOTO also maps a state i and nonterminal A to state j.
It determines sm, the state currently on top of stack and ai the current input symbol
It consults parsing action table entry for action[sm,ai] which can have 4 values:
The function goto takes a state and grammar symbol as arguments and produces a
state.
The function goto takes a state and grammar symbol as arguments and produces a
state.
( s0 X1 s1 X2 . . .Xmsm, ai ai+1 . . . an $)
The next move of the parser is determined by reading ai, the current input symbol
and sm, state on stack top and then consulting parsing action table entry action[sm,ai].
The configuration resulting after each of four types of moves are as follows:
1. If action [sm, ai] = shift s, the parser executes a shift move, entering the
configuration
( s0 X1 s1 X2 . . .Xmsm, ai ai+1 . . . an $)
Here the parser has shifted both the current input symbol ai and the next state s,
which is given in action [sm, ai], onto the slack; ai+1 becomes the current input
symbol.
2. If action [sm, ai] =reduce A , then the parser executes a reduce move, entering
the configuration
where s = goto [sm-r , A] and r is the length of , the right side of the production.
Here the parser first popped 2r symbols off the stack (r state symbols and r
grammar symbols), exposing state sm-r. The parser then pushed both A, the left
side of the production, and s, the entry for goto[sm-r , A], onto the stack. The
current input symbol is not changed in a reduce move.
4. If action [sm, ai] = error, the parser has discovered an error and calls an error
recovery routine.
LR Parsing Algorithm
INPUT : Input string w, LR-Parsing table with functions ACTION and GOTO for a grammar
G
METHOD : Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the
input buffer. The parser then executes the following program until an accept or error action
is encountered.
end
end
return
else error();
end
SLR(l) - Simple LR
LR( 1) - LR parser
An LR parser can detect the syntax errors as soon as they can occur.
Drawbacks of LR parsers
It is too much work to construct LR parser by hand. It needs an automated parser
generator.
1. SLR
2. CLR
3. LALR
Simple LR (SLR)
Easy to implement
Least powerful
Canonical LR
Most powerful
Most expensive
LookAhead LR(LALR)
For an SLR parser, if we are in a state with the item A -> ., then we reduce by A -> iff the
next input symbol ‘a’ belongs to Follow(A).
This is actually a weak rule and can lead to erroneous results and/or shift/reduce conflicts.
EXAMPLE
Consider A ->
State i: A -> .
State j: A -> .
String 1: a
String 2: β a
A → .aBb
A → a.Bb
A → aB.b
A → aBb.
Intuitively, an item shows how much of a production we have seen till the current
point in the parsing procedure.
A collection of sets of LR(0) items (the canonical LR(0) collection) is the basis for
constructing SLR parsers.
Augmented grammar
Augmented grammar
G’ is the augmented grammar of G with a new production rule S’→S where S’ is the
new starting symbol i.e,
This is done to signal to the parser when the parsing should stop to announce
acceptance of input.
Non-kernel items are the items which have the dot at leftmost end.
Sets of items are formed by taking the closure of a set of kernel items.
We will apply this rule until no more new LR(0) items can be added to closure(I).
Computation of Closure
function closure ( I )
begin
J := I;
repeat
add B→. to J
return J
end
Goto Operation
If I is a set of LR(0) items and X is a grammar symbol (terminal or non-terminal), then
goto(I,X) is defined as follows:
EXAMPLE
I ={ E’ → E., E → E.+T}
goto(I,+) = { E → E+.T
T →.T*F
T →.F
F →.(E)
F →.id }
ALGORITHM
Procedure items( G’ )
begin
C := { closure({S’→.S}) }
add goto(I,X) to C
\end
E → .T T → .F I9:goto(I6, T) E → .T
T → .F F → .id T → T.*F T → .F
F → .(E) F → .(E)
F → id. F → .id
OUTPUT: The SLR parsing table functions action and goto for G' .
METHOD:
1. Construct C ={I0, I1,…..In}, the collection of sets of LR(0) items for G'.
2. State i is constructed from Ii . The parsing actions for state i are determined as follows:
If any conflicting actions are generated by the above rules, we say the grammar is not
SLR (1). The algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all nonterminals A using the rule:
if GOTO(Ii , A)=Ij then GOTO[I, A]=j .
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items containing
[S’ S].
I5: F → id. (5, *), (5, +), (5, )), (5, $) ====> r6
I10: T → T*F. (10, $), (10, +), (10, )), (10, *) ====> r3
If a state does not know whether it will make a reduction operation using the
production rule i or j for a terminal, we say that there is a reduce/reduce conflict.
If the SLR parsing table of a grammar G has a conflict, we say that that grammar is
not SLR grammar.
Stack Implementation
Check whether the given input is valid or not?
EXAMPLE
SLR ( 1 ) GRAMMAR
S→E
E→E+T|T
T→T*F|F
F → id
AUGMENTED GRAMMAR
S’ → S
S→E
E→E+T|T
T→T*F|F
F → id
S → •E
T → •F I5= Go to (I1 , +)
I4= Go to (I5, id)
F → •id E → E +•T
F → id•
T → •T * F
I1= Go to (I0, E) T → •F
I8= Go to (I6, F)
S` → E• F → •id
T → T * F•
E → E• + T
I6= Go to (I2, *)
I4= Go to (I6, id)
I2= Go to (I0, T) T → T * •F
F → id•
E → T•T F → •id
T• → * F
1. S→E
2. E→E+T|T
3. T→T*F|F
4. F → id
SLR(1) TABLE
I4 contains the final item which drives F → id• and follow (F) = {+, *, $}, so
action {I4, +} = R5, action {I4, *} = R5, action {I4, $} = R5
I7 contains the final item which drives E → E + T• and follow (E) = {+, $}, so
action {I7, +} = R1, action {I7, $} = R1
I8 contains the final item which drives T → T * F• and follow (T) = {+, *, $}, so
action {I8, +} = R3, action {I8, *} = R3, action {I8, $} = R3.
EXAMPLE
Every SLR (1) grammar is unambiguous, but there are many unambiguous grammars that
are not SLR(1). Consider the grammar with productions
SL=R
SR
L*R
L id
RL
AUGMENTED GRAMMAR
S’ S
SL=R
SR
L*R
L id
RL
S •L = R
R•L
I4= Go to (I0, *) L •* R
L * •R L •id
R•L
1. SL=R
2. SR
3. L*R
4. L id
5. RL
PARSING TABLE
I1= S’ S •
I2= R L• FOLLOW (S)={$}
I7= L * R•
I8= R L•
I9= S L = R•
(2, =), (2, $) =====> r5
(3, $) =====> r2
(9, $) =====> r1
id * = $ R L S
0 s5 s4 3 2 1
1 ACCEPT
2 s6/ r5 r5
3 r2
4 s5 s4 7 8
5 r4 r4
6 s5 s4 9 8/7
7 r3 r3
8 r5 r5
9 r1
OUTPUT: the canonical-LR parsing table functions ACTION AND GOTO for G’.
METHOD:
1. Construct C’ = { I0, I1, …….., In}, the collection of sets of LR(1) items for
G’.
2. State i of the parser is constructed from Ii. The parsing action for state
i is determined as follows.
(a) If [ A .a,b] is in Ii, and GOTO (Ii, a) = Ij, then set ACTION [i, a]
To “shift j”. Here a must be a terminal.
(b) If [A ., a] is in Ii, A S’, then set ACTION[ i, a ] to “reduce A .”
(c) If [ S’ S., $ ] is in Ii, then set ACTION[ i, $ ] to “accept”.
If any conflicting actions result from the above rules, we say the grammar
is not LR(1). The algorithm falls to produce a parser in this case.
3. The goto transitions for state i are constructed for all non terminals A
using the rule: if GOTO( Ii, A) = Ij, then GOTO[ i, A ]= j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items
containing [S’ .S, $ ].
EXAMPLE
S CC
C cC| d
AUGMENTED GRAMMAR
S’ S
S CC
C cC
C d
I5 = GOTO (I2, C)
S CC•, $
1. S CC
2. C cC
3. C d
PARSING TABLE
I1 = S’ S• , $
I4 = C d• , c | d
I5 = S CC•, $
I7 = C d• , $
I8 = C c C•, c | d
I9 = C c C• , $
LALR parsers are often used in practice because LALR parsing tables are smaller than
LR(1) parsing tables.
The number of states in SLR and LALR parsing tables for a grammar G are equal.
OUTPUT: The LALR parsing – table functions ACTION and GOTO for G’
METHOD:
2. For each core present among the set of LR(1) items, find all sets
having that core, and replace these sets by their union.
3. Let C’ = {J0, J1, ........, Jm}, the collection of sets of LR(1) items. The
parsing action for state i are constructed from Ji in the same manner as
in Algorithm. If there is a parsing action conflict, the algorithm fails
to produce a parser, and the grammar is said not to be LALR(1)
**********