3 Syntax Analysis - Top Down Parsing
3 Syntax Analysis - Top Down Parsing
3 Syntax Analysis - Top Down Parsing
Backtracking: It means, if one derivation of a production fails, the syntax analyser restarts the process using
different rules of same production. This technique may process the input string more than once to determine
the right production.
Example
Consider the following grammar
𝑆 → 𝑎𝑏𝐴
𝐴 → 𝑐𝑑|𝑐|𝑒
For the input stream 𝑎𝑏, the recursive descent parser starts by constructing a parse tree representing
𝑆 → 𝑎𝑏𝐴 as shown below:
S
a b A
a b A
c d
Since it does not match 𝑎𝑏, it backtracks and tries the alternative 𝐴 → 𝑐 as shown below:
a b A
However, the parse tree does not match 𝑎𝑏; the parser backtracks and tries the alternative 𝐴 → 𝑒 as shown
below:
a b A
parseT’() =
if next = ’a’ or next = ’b’ or next = ’$’ then
parseT() ; match(’$’)
else reportError()
parseT() =
if next = ’b’ or next = ’c’ or next = ’$’ then
parseR()
else if next = ’a’ then
match(’a’) ; parseT() ; match(’c’)
parseR() =
if next = ’c’ or next = ’$’ then
(* do nothing *)
else if next = ’b’ then
match(’b’) ; parseR()
else reportError()
b) Predictive Parsing
Predictive parsing is a special form of recursive descent parsing in which the current input token
unambiguously determines the production to be applied at each step i.e. they are backtrack free.
Non-recursive Predictive – 𝐋𝐋(𝐤) Parsing
This is a parsing strategy which involves building tables instead of writing code. It is also possible to
automatically create tables. These parsers scan over the input stream using a prefix of tokens so as to
identify production to be applied. The language accepted by these parsers is called
𝐿𝐿(𝑘),
where 𝑘 is the length of the prefix. 𝐿𝐿 Stands for: Left to right scan; left most derivation.
The length of the prefix to be considered will be 𝑘 = 1. The construction of an 𝐿𝐿(1) parser for the
grammar < 𝑉𝑁 , 𝑉𝑇 , 𝑃, 𝑆 > requires first computing the following properties:
Generally, FIRST (A) is the set of terminals that can start a string derivable from A; FOLLOW (A) is the
set of terminals that can follow an occurrence of A, FIRST (𝛼) is the set of terminals that can start a string
derivable from 𝛼.
To compute these sets of grammar, it is necessary to define another property:
Notice that a non-terminal A is nullable if 𝑛𝑢𝑙𝑙𝑎𝑏𝑙𝑒(𝐴) = 𝑡𝑟𝑢𝑒, and a terminal is never nullable.
The following is the algorithm for computing first and follow sets of a grammar:
We can extend the notation of FIRST set to include 𝑒 and define them for strings as follows:
𝐹𝐼𝑅𝑆𝑇(𝑒) = {𝑒}
𝑋 𝑖𝑓 𝑋 ∈ 𝑇
Example 1
Given the grammar
𝑆 → 𝑎𝑆|𝑎
Example 2
Consider the grammar
𝐸 → 𝑇|𝐸 + 𝑇
𝑇 → 𝐹|𝑇 ∗ 𝐹
𝐹 → 𝑎|(𝐸)
Find whether the grammar is 𝐿𝐿(1).
Solution:
Calculate the FIRST sets:
𝐹𝐼𝑅𝑆𝑇(𝐹) = {𝑎, ( }
𝐹𝐼𝑅𝑆𝑇(𝑇) = {𝑎, ( }
𝐹𝐼𝑅𝑆𝑇(𝐸) = {𝑎, ( }
Calculate the LOOKAHEAD sets:
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐸 → 𝑇) = {𝑎, (}
Example 3
Verify whether or not the following grammar is 𝐿𝐿(1):
𝑆 → 𝐴𝐵𝑎
𝐴 → 𝑏𝐴
𝐴→𝑒
𝐵 → 𝑎𝐵
𝐵→𝑒
Solution
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝑆 → 𝐴𝐵𝑎) = (𝐹𝐼𝑅𝑆𝑇(𝐴) − {𝑒}) ∪ (𝐹𝐼𝑅𝑆𝑇(𝐵) − {𝑒}) ∪ 𝐹𝐼𝑅𝑆𝑇(𝑎)
= {𝑏} ∪ {𝑎} = {𝑎, 𝑏}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐴 → 𝑏𝐴) = 𝐹𝐼𝑅𝑆𝑇(𝑏) = {𝑏}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐴 → 𝑒) = (𝐹𝐼𝑅𝑆𝑇(𝑒) − {𝑒}) ∪ 𝐹𝑂𝐿𝐿𝑂𝑊(𝐴) = { } ∪ {𝑎} = {𝑎}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐵 → 𝑎𝐵) = 𝐹𝐼𝑅𝑆𝑇(𝑎) = {𝑎}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐵 → 𝑒) = (𝐹𝐼𝑅𝑆𝑇(𝑒) − {𝑒}) ∪ 𝐹𝑂𝐿𝐿𝑂𝑊(𝐵) = { } ∪ {𝑎} = {𝑎}
Hence the grammar is not 𝐿𝐿(1) since 𝑎 predicts 𝐵 → 𝑎𝐵 and 𝐵 → 𝑒.
Example 4
Consider the following grammar:
𝑆 → 𝑎𝐴𝑎
𝑆→𝑏
𝐴 → 𝑎𝑏
𝐴→𝑒
Verify whether the grammar is 𝐿𝐿(1) or not.
Solution:
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝑆 → 𝑎𝐴𝑎) = 𝐹𝐼𝑅𝑆𝑇(𝑎) = {𝑎}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝑆 → 𝑏) = 𝐹𝐼𝑅𝑆𝑇(𝑏) = {𝑏}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐴 → 𝑎𝑏) = 𝐹𝐼𝑅𝑆𝑇(𝑎) = {𝑎}
𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷(𝐴 → 𝑒) = 𝐹𝑂𝐿𝐿𝑂𝑊(𝐴) = {𝑎}
Therefore the grammar is not 𝐿𝐿(1) since 𝑎 is in both 𝐴 → 𝑎𝑏 and 𝐴 → 𝑒.
Method - 1:
Once the first and follow states have been computed, it is possible to construct 𝐿𝐿(1) parse table 𝑀 that
maps pairs of non-terminals and terminals to production using the following method:
Input: Grammar 𝐺
Output: Parsing table 𝑀
Method:
1. ∀ 𝑝𝑟𝑜𝑑𝑢𝑐𝑡𝑖𝑜𝑛 𝐴 → 𝛼
∀𝑎 ∈ 𝐿𝑂𝑂𝐾𝐴𝐻𝐸𝐴𝐷 𝐴 → 𝛼
𝑎𝑑𝑑 𝐴 → 𝛼 𝑡𝑜 𝑀 𝐴, 𝑎
2. Set each unidentified entry of 𝑀 to error
Example
Consider the following grammar:
𝑆→𝐸
𝐸 → 𝑇𝐸′
𝐸′ → +𝐸|−𝐸|𝑒
𝑇 → 𝐹𝑇′
𝑇′ →∗ 𝑇|/𝑇|𝑒
𝐹 → 𝑖𝑑|𝑛𝑢𝑚
Production LOOKAHEAD
S→E {num, id}
E→TE’ {num, id}
E’→+E {+}
E’→-E {-}
E’→e {$}
T→FT’ {num, id}
T’→*T {*}
Symbol Id num + - * / $
S S→E S→E
E E→TE’ E→TE’
E’ E’→+E E’→-E E’→e
T T→FT’ T→FT’
T’ T’→e T’→e T’→*T T’→/T T’→e
F F→id F→num
Method - 2:
Alternatively, once the FIRST and FOLLOW sets have been computed, LL(1) parse table M, that maps pairs
of non-terminals and terminals to productions can be constructed by using the following algorithm.
𝐹𝑜𝑟 𝑒𝑎𝑐ℎ 𝐴 → 𝛼 ∈ 𝑃 𝑑𝑜
𝑖𝑓 𝑒 ∈ 𝐹𝑖𝑟𝑠𝑡(𝛼) 𝑡ℎ𝑒𝑛
𝑓𝑜𝑟 𝑒𝑎𝑐ℎ 𝑢 ∈ 𝑓𝑜𝑙𝑙𝑜𝑤(𝐴) 𝑑𝑜
𝑎𝑑𝑑 𝐴 → 𝛼 𝑡𝑜 𝑀 𝐴, 𝑢
𝑓𝑜𝑟 𝑒𝑎𝑐ℎ 𝑢 ∈ 𝐹𝑖𝑟𝑠𝑡 (𝛼) 𝑑𝑜
𝑎𝑑𝑑 𝐴 → 𝛼 𝑡𝑜 𝑀 𝐴, 𝑢
If the resulting table has at most one production per (𝐴, 𝑢) pair then the grammar is 𝐿𝐿(1):
Example
Consider the following grammar:
𝐸 → 𝑀𝐸’
𝐸’ → 𝑒
𝐸’ → +𝑀𝐸’
𝑀 → 𝐴𝑀’
𝑀’ → 𝑒
𝑀’ →∗ 𝐴𝑀’
𝐴 → 𝑛𝑢𝑚
𝐴 → (𝐸)
The following is the table of the first and follow sets:
The table driven parser for 𝐿𝐿(1) grammar works using a parsing table and an auxiliary stack of grammar
symbols. The table driven parser uses the following algorithm:
The following is the trace of learning the algorithm on the string “1 + 2 ∗ 3” using the above parsing table:
Note: Both the stack and the input contains an end symbol $ to denote that the stack is empty and the input is
consumed