Module 2 C D Notes
Module – 2
(Introduction to Syntax Analysis)
Role of the Syntax Analyser – Syntax error handling. Review of Context Free Grammars -
Derivation and Parse Trees, Eliminating Ambiguity. Basic parsing approaches - Eliminating left
recursion, left factoring. Top-Down Parsing - Recursive Descent parsing, Predictive Parsing,
LL(1) Grammars.
SYNTAX ANALYSIS
EXAMPLE:
The grammar with the following productions defines simple arithmetic expressions:
expression → expression + term | expression - term | term
term → term * factor | term / factor | factor
factor → ( expression ) | id
Notational Conventions
To avoid always having to state that "these are the terminals," "these are the
nonterminals," and so on, the following notational conventions for grammars will be
used.
These symbols are terminals:
a. Lowercase letters early in the alphabet, such as a, b, c.
b. Operator symbols such as +, *, and so on.
c. Punctuation symbols such as parentheses, comma, and so on.
d. The digits 0, 1, . . . , 9.
e. Boldface strings such as id or if, each of which represents a single
terminal symbol.
These symbols are nonterminals:
a. Uppercase letters early in the alphabet, such as A, B, C.
b. The letter S, which, when it appears, is usually the start symbol.
c. Lowercase, italic names such as expr or stmt.
d. When discussing programming constructs, uppercase letters may be used
to represent nonterminals for the constructs. For example, nonterminals
for expressions, terms, and factors are often represented by E, T, and F,
respectively.
Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar
symbols; that is, either nonterminals or terminals.
Lowercase letters late in the alphabet, chiefly u, v, . . . , z, represent (possibly
empty) strings of terminals.
Lowercase Greek letters α, β, γ for example, represent (possibly empty) strings of
grammar symbols.
Unless stated otherwise, the head of the first production is the start symbol.
Using these conventions, the grammar for arithmetic expressions can be
rewritten as:
E→ E + T | E - T | T
T→ T * F | T / F | F
F→ ( E ) | id
The construction of a parse tree can be made precise by taking a derivational view, in
which productions are treated as rewriting rules.
Beginning with the start symbol, each rewriting step replaces a nonterminal by the body
of one of its productions.
For example , consider the following grammar , with a single nonterminal E:
E → E + E | E * E | - E | ( E ) | id
The production E → - E signifies that if E denotes an expression, then – E must also
denote an expression. The replacement of a single E by - E will be described by writing
E => -E which is read, "E derives - E."
The production E → ( E ) can be applied to replace any instance of E in any string of
grammar symbols by (E),
e.g., E * E => (E) * E or E * E => E * (E)
We can take a single E and repeatedly apply productions in any order to get a sequence
of replacements. For example,
E => - E => - (E) => - (id)
We call such a sequence of replacements a derivation of - (id) from E. This derivation
provides a proof that the string - (id) is one particular instance of an expression.
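The derivation above can be sketched as string rewriting in code (a toy illustration, not part of the notes; the helper name rewrite is an assumption). Each step replaces the leftmost occurrence of the nonterminal E by the body of one of its productions:

```python
def rewrite(sentential_form, body):
    # One derivation step: replace the leftmost E by the chosen production body.
    return sentential_form.replace("E", body, 1)

form = "E"
steps = [form]
for body in ["-E", "(E)", "id"]:      # apply E -> -E, then E -> (E), then E -> id
    form = rewrite(form, body)
    steps.append(form)

print(" => ".join(steps))             # E => -E => -(E) => -(id)
```

The third argument of str.replace limits the substitution to one occurrence, which is exactly what a single (leftmost) derivation step does.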
Example
Let any set of production rules in a CFG be
X → X + X | X * X | X | a
over the terminal alphabet {a, +, *}.
The leftmost derivation for the string "a+a*a" may be –
X => X + X => a + X => a + X * X => a + a * X => a + a * a
Root node of parse tree has the start symbol of the given grammar from where
the derivation proceeds.
Leaves of parse tree are labeled by non-terminals or terminals.
If A → xyz is a production, then the parse tree will have an interior node labeled A
whose children are x, y and z, from left to right.
The leaves of the parse tree are labeled by nonterminals or terminals and, read
from left to right, they constitute a sentential form, called the yield or frontier of
the tree.
Figure above represents the parse tree for the string id+ id*id. The string id + id *
id, is the yield of parse tree depicted in Figure.
AMBIGUITY
An ambiguous grammar is one that produces more than one leftmost or more
than one rightmost derivation for the same sentence.
For most parsers, it is desirable that the grammar be made unambiguous, for if it is
not, we cannot uniquely determine which parse tree to select for a sentence.
In other cases, it is convenient to use carefully chosen ambiguous grammars,
together with disambiguating rules that "throw away" undesirable parse trees,
leaving only one tree for each sentence.
EXAMPLE
Consider the very simple sentence id + id * id. It has two distinct leftmost
derivations:
E => E + E => id + E => id + E * E => id + id * E => id + id * id
E => E * E => E + E * E => id + E * E => id + id * E => id + id * id
The corresponding parse trees are:
          E                       E
        / | \                   / | \
       E  +  E                 E  *  E
       |   / | \             / | \   |
      id  E  *  E           E  +  E  id
          |     |           |     |
         id    id          id    id
Top Down Parsing
In top down parsing, the parse tree is constructed from the top (root) to the bottom
(leaves).
In bottom up parsing, the parse tree is constructed from the bottom (leaves) to the top
(root).
It can be viewed as an attempt to construct a parse tree for the input starting from the
root and creating the nodes of parse tree in preorder.
Pre-order traversal means: 1. Visit the root 2. Traverse left subtree 3. Traverse right
subtree.
Top down parsing can be viewed as an attempt to find a leftmost derivation for an
input string (that is, expanding the leftmost nonterminal at every step).
For example, consider the grammar
S → c A d
A → a b | a
and the input string w = cad.
The leftmost leaf, labeled c, matches the first symbol of input w, so we advance
the input pointer to a, the second symbol of w, and consider the next leaf, labeled
A.
We have a match for the second input symbol, a, so we advance the input pointer to
d, the third input symbol, and compare d against the next leaf, labeled b.
Since b does not match d, we report failure and go back to A to see whether there
is another alternative for A that has not been tried, but that might produce a
match.
In going back to A, we must reset the input pointer to position 2 , the position it had
when we first came to A, which means that the procedure for A must store the input
pointer in a local variable.
The second alternative for A produces the tree as:
The leaf a matches the second symbol of w and the leaf d matches the third
symbol. Since we have produced a parse tree for w, we halt and announce
successful completion of parsing. (that is the string parsed completely and the
parser stops).
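The backtracking behaviour walked through above can be sketched as a tiny recursive-descent parser (an illustration assuming the classic example grammar S → c A d, A → a b | a and input w = cad; all function names are hypothetical). Trying the alternative A → a b first and falling back to A → a is the backtracking step:

```python
# Backtracking recursive-descent parser for  S -> c A d ,  A -> a b | a .
# Each parse function takes the input string and a position, and returns the
# position after the matched substring, or None on failure.

def parse_S(w, i):
    # S -> c A d
    if i < len(w) and w[i] == "c":
        j = parse_A(w, i + 1)
        if j is not None and j < len(w) and w[j] == "d":
            return j + 1
    return None

def parse_A(w, i):
    # Try A -> a b first; on failure the caller's position i is reused,
    # which amounts to resetting the input pointer, then try A -> a.
    if i + 1 < len(w) and w[i] == "a" and w[i + 1] == "b":
        return i + 2
    if i < len(w) and w[i] == "a":
        return i + 1
    return None

def accepts(w):
    # The string is accepted only if S derives all of w.
    return parse_S(w, 0) == len(w)

print(accepts("cad"))    # True: A -> ab fails on 'd', backtrack, A -> a succeeds
print(accepts("cabd"))   # True, via A -> ab
print(accepts("cbd"))    # False
```

Because the position is passed by value, "going back to A" is automatic: the failed attempt never advanced the caller's copy of the input pointer.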
2.1.1 PREDICTIVE PARSING
Left Recursion
Eliminating the immediate left recursion to the productions for E and then for T, we
obtain
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
[Type here]
[Type here]
No matter how many A-productions there are, we can eliminate immediate left
recursion from them by the following technique. First, we group the A-productions as
A → Aα1 | Aα2 | . . . | Aαm | β1 | β2 | . . . | βn
where no βi begins with an A. Then we replace the A-productions by
A → β1 A' | β2 A' | . . . | βn A'
A' → α1 A' | α2 A' | . . . | αm A' | ε
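The transformation can be sketched as a small helper (an assumed illustration, not from the notes; production bodies are represented as lists of symbols, and the empty list [] denotes ε):

```python
def eliminate_immediate_left_recursion(head, bodies):
    # Split the bodies of head's productions into A -> A alpha_i (recursive)
    # and A -> beta_j (non-recursive).
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}                        # nothing to do
    new = head + "'"                                 # the fresh nonterminal A'
    return {
        head: [b + [new] for b in betas],            # A  -> beta_j A'
        new:  [a + [new] for a in alphas] + [[]],    # A' -> alpha_i A' | eps
    }

# E -> E + T | E - T | T   becomes   E -> T E' ,  E' -> + T E' | - T E' | eps
print(eliminate_immediate_left_recursion(
    "E", [["E", "+", "T"], ["E", "-", "T"], ["T"]]))
```

Note the sketch handles only immediate left recursion, matching the technique described above; indirect left recursion needs the more general algorithm.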
Left Factoring
If A → αβ1 | αβ2 are two A-productions, and the input begins with a non-empty
string derived from α, we do not know whether to expand A to αβ1 or to αβ2.
However, we may defer the decision by expanding A to αA'. Then, after seeing
the input derived from α, we may expand A' to β1 or β2. That is, left-factored,
the original productions become
A → α A'
A' → β1 | β2
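Left factoring can likewise be sketched as a helper (an assumed illustration for the simple case where all alternatives share the common prefix; bodies are lists of symbols and [] denotes ε):

```python
def left_factor(head, bodies):
    # Find the longest prefix shared by all alternatives:
    # A -> a b1 | a b2  becomes  A -> a A' ,  A' -> b1 | b2
    prefix = []
    for symbols in zip(*bodies):          # walk the bodies in lockstep
        if len(set(symbols)) == 1:
            prefix.append(symbols[0])
        else:
            break
    if not prefix:
        return {head: bodies}             # nothing to factor
    new = head + "'"                      # the fresh nonterminal A'
    return {
        head: [prefix + [new]],                    # A  -> alpha A'
        new:  [b[len(prefix):] for b in bodies],   # A' -> beta_i  ([] = eps)
    }

# S -> i E t S e S | i E t S   (dangling-else style productions)
print(left_factor("S", [["i", "E", "t", "S", "e", "S"], ["i", "E", "t", "S"]]))
```

A production-quality version would group alternatives by prefix rather than require one shared prefix, but the two-alternative case shown is the one the text discusses.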
[Type here]
[Type here]
Requirements
1. Stack
2. Parsing Table
3. Input Buffer
4. Parsing Program
The parser considers X, the symbol on top of the stack, and a, the current input
symbol.
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer
to the next input symbol.
3. If X is a nonterminal, the parser consults entry M[X, a] of the parsing table M.
Predictive parsing program:
set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a nonterminal */
        if M[X, a] = X → Y1 Y2 . . . Yk then begin
            pop X from the stack;
            push Yk, Yk-1, . . . , Y1 onto the stack, with Y1 on top
        end
        else error()
until X = $ /* stack is empty */
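The table-driven algorithm can be sketched as runnable code, with the parsing table M hard-coded for the expression grammar E, E', T, T', F (an assumed illustration; in the table an empty list stands for an ε-production body):

```python
# Predictive parsing table M for
#   E -> T E'   E' -> + T E' | eps   T -> F T'   T' -> * F T' | eps
#   F -> ( E ) | id
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    stack = ["$", "E"]                 # start symbol on top of $
    tokens = tokens + ["$"]
    ip = 0
    while stack:
        X = stack.pop()
        a = tokens[ip]
        if X not in NONTERMINALS:      # X is a terminal or $
            if X != a:
                return False           # error()
            if X == "$":
                return True            # X = a = $: successful completion
            ip += 1                    # match: advance the input pointer
        elif (X, a) in M:
            stack.extend(reversed(M[(X, a)]))   # push Y1 ... Yk, Y1 on top
        else:
            return False               # error(): undefined table entry
    return False

print(parse(["id", "+", "id", "*", "id"]))   # True
print(parse(["id", "+", "*"]))               # False
```

Pushing the production body reversed keeps Y1 on top of the stack, mirroring the "push Yk, ..., Y1" step of the algorithm.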
EXAMPLE
Consider Grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
Construction of the parsing table uses two functions: FIRST() and FOLLOW().
FIRST(α), for a string of grammar symbols α, is the set of terminals that begin the
strings derived from α; if α ⇒* ε, then ε is also in FIRST(α).
EXAMPLE
Consider the grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW
FOLLOW is defined only for non-terminals of the grammar G.
It is defined as the set of terminals of grammar G that can immediately follow
the non-terminal in some sentential form derived from the start symbol.
In other words, if A is a nonterminal, then FOLLOW(A) is the set of terminals 'a'
that can appear immediately to the right of A in some sentential form.
Rules to Compute Follow Set
1. If S is the start symbol, then add $ to FOLLOW(S).
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is in
FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β)
contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
EXAMPLE
Consider the grammar:
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
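Both functions can be computed by iterating to a fixed point; the sketch below (an assumed illustration, not the notes' algorithm verbatim) runs on the expression grammar, with the empty string "" standing for ε and a production body given as a list of symbols:

```python
# Grammar: head -> list of production bodies; [] is an epsilon body.
G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of(seq, first, g):
    # FIRST of a sequence of grammar symbols; "" stands for epsilon.
    out = set()
    for X in seq:
        f = first[X] if X in g else {X}   # a terminal X has FIRST(X) = {X}
        out |= f - {""}
        if "" not in f:
            return out                    # X not nullable: stop here
    out.add("")                           # every symbol was nullable
    return out

def compute_first(g):
    first = {A: set() for A in g}
    changed = True
    while changed:                        # iterate to a fixed point
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                new = first_of(body, first, g)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

def compute_follow(g, start, first):
    follow = {A: set() for A in g}
    follow[start].add("$")                # rule 1
    changed = True
    while changed:
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in g:
                        continue          # FOLLOW is defined only for nonterminals
                    rest = first_of(body[i + 1:], first, g)
                    add = (rest - {""})                     # rule 2
                    if "" in rest:
                        add |= follow[A]                    # rule 3
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

first = compute_first(G)
follow = compute_follow(G, "E", first)
print("FIRST(E) =", first["E"])
print("FOLLOW(F) =", follow["F"])
```

The fixed-point loop terminates because the sets only grow and are bounded by the finite set of grammar symbols.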
ALGORITHM: CONSTRUCTION OF THE PREDICTIVE PARSING TABLE
INPUT: Grammar G.
OUTPUT: Parsing table
METHOD
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in
FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add
A → α to M[A, $].
4. Make each undefined entry of M be error.
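The method can be sketched directly in code, with FIRST and FOLLOW hard-coded for the expression grammar (an assumed illustration; FIRST is keyed by the production body as a tuple, "" stands for ε, and FOLLOW already contains $ where it applies):

```python
FIRST = {
    ("T", "E'"): {"(", "id"}, ("+", "T", "E'"): {"+"}, (): {""},
    ("F", "T'"): {"(", "id"}, ("*", "F", "T'"): {"*"},
    ("(", "E", ")"): {"("}, ("id",): {"id"},
}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}
PRODUCTIONS = [
    ("E", ("T", "E'")), ("E'", ("+", "T", "E'")), ("E'", ()),
    ("T", ("F", "T'")), ("T'", ("*", "F", "T'")), ("T'", ()),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]

def build_table():
    M = {}
    for A, alpha in PRODUCTIONS:          # step 1: every production A -> alpha
        for a in FIRST[alpha] - {""}:     # step 2: each terminal a in FIRST(alpha)
            M[(A, a)] = (A, alpha)
        if "" in FIRST[alpha]:            # step 3: alpha derives epsilon
            for b in FOLLOW[A]:           # FOLLOW includes $ where required
                M[(A, b)] = (A, alpha)
    return M                              # step 4: missing entries mean error

M = build_table()
print(M[("E", "id")])     # the production E -> T E'
print(("E", "+") in M)    # False: an undefined (error) entry
```

Because this grammar is LL(1), no table cell is written twice; a second write to the same (A, a) key would signal a multiply-defined entry.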
Parsing Table
2.1.2 LL(1) GRAMMARS
LL(1) grammars are the class of grammars from which predictive parsers can
be constructed automatically.
A context-free grammar G = (VT, VN, P, S) whose parsing table has no multiple
entries is said to be LL(1).
In the name LL(1),
the first L stands for scanning the input from left to right,
the second L stands for producing a leftmost derivation,
and the 1 stands for using one input symbol of lookahead at each step to
make parsing action decision.
A language is said to be LL(1) if it can be generated by a LL(1) grammar. It can
be shown that LL(1) grammars are
not ambiguous and
not left-recursive
EXAMPLE
Consider the following grammar
S → i E t S S' | a
S' → eS | ϵ
E → b
Here FOLLOW(S') = { e, $ }, so the parsing table entry M[S', e] receives both
S' → e S and S' → ε. Since that entry is multiply defined, the grammar is not LL(1).
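A quick LL(1) check for this grammar can be sketched with hard-coded FIRST and FOLLOW sets (assumed values: FOLLOW(S) = FOLLOW(S') = { e, $ }, FOLLOW(E) = { t }); any table entry that collects more than one production body is a multiply-defined entry, so the grammar is not LL(1):

```python
# Grammar:  S -> i E t S S' | a    S' -> e S | eps    E -> b
# "" stands for epsilon; FIRST is keyed by the production body as a tuple.
FIRST = {("i", "E", "t", "S", "S'"): {"i"}, ("a",): {"a"},
         ("e", "S"): {"e"}, (): {""}, ("b",): {"b"}}
FOLLOW = {"S": {"e", "$"}, "S'": {"e", "$"}, "E": {"t"}}
PRODUCTIONS = [("S", ("i", "E", "t", "S", "S'")), ("S", ("a",)),
               ("S'", ("e", "S")), ("S'", ()), ("E", ("b",))]

M = {}
for A, alpha in PRODUCTIONS:
    lookaheads = FIRST[alpha] - {""}
    if "" in FIRST[alpha]:
        lookaheads |= FOLLOW[A]          # epsilon production: use FOLLOW(A)
    for a in lookaheads:
        M.setdefault((A, a), []).append(alpha)   # keep every candidate body

conflicts = {entry: bodies for entry, bodies in M.items() if len(bodies) > 1}
print(conflicts)     # the single conflict sits at M[S', e]
```

The conflict at M[S', e] is exactly the dangling-else ambiguity: on lookahead e, the parser cannot decide between matching the else (S' → e S) and finishing the statement (S' → ε).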