Unit 2 Compiler
SYNTAX ANALYSIS
Syntax analysis is the second phase of the compiler. It gets its input from the tokens and generates a syntax tree or parse tree. The parser verifies that the string of tokens can be generated by the grammar for the source language. It reports any syntax errors and recovers from commonly occurring errors so that it can continue processing the remainder of the program.

Position of parser in compiler model:

source program --> lexical analyzer --(token)--> parser --(parse tree)--> rest of front end --> intermediate representation

Both the lexical analyzer and the parser consult the symbol table.
http://csetube.weebly.com/

Issues:
1. Variable re-declaration
2. Variable initialization before use.
3. Data type mismatch for an operation.
Phrase level recovery:
On discovering an error, the parser performs local correction on the remaining input that allows it to continue. Example: insert a missing semicolon or delete an extraneous semicolon.
Error productions:
The parser is constructed using an augmented grammar with error productions. If an error production is used by the parser, appropriate error diagnostics can be generated to indicate the erroneous construct recognized in the input.
Global correction:
Given an incorrect input string x and grammar G, certain algorithms can be used to find a
parse tree for a string y, such that the number of insertions, deletions and changes of tokens is as
small as possible. However, these methods are in general too costly in terms of time and space.
CONTEXT-FREE GRAMMARS
Terminals : These are the basic symbols from which strings are formed.
Non-Terminals : These are the syntactic variables that denote a set of strings. These help to
define the language generated by the grammar.
Start Symbol : One non-terminal in the grammar is denoted as the “Start-symbol” and the set of
strings it denotes is the language defined by the grammar.
Productions : These specify the manner in which terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal, followed by an arrow, followed by a string of non-terminals and terminals.
Derivations:
Derivation is a process that generates a valid string with the help of grammar by replacing the
non-terminals on the left with the string on the right side of the production.
To generate the valid string – ( id+id ) from the grammar, the steps are :
1. E → - E
2. E → - ( E )
3. E → - ( E+E )
4. E → - ( id+E )
5. E → - ( id+id )
Types of derivations:
In leftmost derivations, the leftmost non-terminal in each sentential form is always chosen first for replacement.
In rightmost derivations, the rightmost non-terminal in each sentential form is always chosen first for replacement.

Example:
Given grammar G : E → E+E | E*E | ( E ) | - E | id
Sentence to be derived : – (id+id)

LEFTMOST DERIVATION      RIGHTMOST DERIVATION
E→-E E→-E
E→-(E) E→-(E)
E → - ( E+E ) E → - (E+E )
E → - ( id+E ) E → - ( E+id )
E → - ( id+id ) E → - ( id+id )
Strings that appear in a leftmost derivation are called left sentential forms.
Strings that appear in a rightmost derivation are called right sentential forms.

Sentential forms:
A sentential form is any string of grammar symbols (terminals and non-terminals) that can be derived from the start symbol.

Yield or frontier of tree:
Each interior node of a parse tree is a non-terminal. The children of a node are the terminals and non-terminals of a sentential form, read from left to right. The sentential form in the parse tree is called the yield or frontier of the tree.
Ambiguity:
A grammar that produces more than one parse tree for some sentence is said to be an ambiguous grammar.

Example : Given grammar G : E → E+E | E*E | ( E ) | - E | id

The sentence id+id*id has the following two distinct leftmost derivations:

E → E+E                  E → E*E
E → id+E                 E → E+E*E
E → id+E*E               E → id+E*E
E → id+id*E              E → id+id*E
E → id+id*id             E → id+id*id

The two corresponding parse trees are :

         E                          E
       / | \                      / | \
      E  +  E                    E  *  E
      |    /|\                  /|\     |
     id   E * E                E + E   id
          |   |                |   |
         id  id               id  id
WRITING A GRAMMAR
Regular Expressions vs. Context-Free Grammars:

REGULAR EXPRESSIONS                              CONTEXT-FREE GRAMMARS
The transition diagram has a set of states       The context-free grammar has a set of
and edges.                                       productions.
It is useful for describing the structure of     It is useful in describing nested structures
lexical constructs such as identifiers,          such as balanced parentheses and
constants and keywords.                          matching begin-end's.

Reasons for using regular expressions to define the lexical syntax of a language:
1. The lexical rules of a language are simple, and regular expressions suffice to describe them.
2. Regular expressions provide a more concise and easier-to-understand notation for tokens than grammars.
3. More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.
4. Separating the syntactic structure of a language into lexical and nonlexical parts provides a convenient way of modularizing the front end into two manageable-sized components.
Eliminating ambiguity:
A grammar that produces more than one parse tree for a leftmost or rightmost derivation is ambiguous; such ambiguity can sometimes be eliminated by rewriting the grammar.
Consider this example, G: stmt → if expr then stmt | if expr then stmt else stmt | other
This grammar is ambiguous since the string if E1 then if E2 then S1 else S2 has the following
two parse trees for leftmost derivation :
1.            stmt
      ________|________
     if  expr  then  stmt
          |           |
          E1    if  expr  then  stmt  else  stmt
                     |          |           |
                     E2         S1          S2

2.            stmt
      ________|_____________________
     if  expr  then  stmt  else  stmt
          |           |            |
          E1    if  expr then stmt S2
                     |         |
                     E2        S1
If there is a production A → Aα | β it can be replaced with a sequence of two productions
A → βA’
A’ → αA’ | ε
Consider the grammar for arithmetic expressions:
E → E+T | T
T → T*F | F
F → (E) | id

First eliminate the left recursion for E as
E → TE'
E' → +TE' | ε

Then eliminate for T as
T → FT'
T' → *FT' | ε

Thus the grammar obtained after eliminating left recursion is
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
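The rewriting above is mechanical enough to automate. The following Python sketch (the function name and the list-of-lists grammar encoding are illustrative choices, not from the text) removes immediate left recursion from the productions of one non-terminal:

```python
def eliminate_left_recursion(head, productions):
    """Rewrite A -> A alpha1 | ... | beta1 | ...  as
    A  -> beta1 A' | ...
    A' -> alpha1 A' | ... | epsilon   ([] stands for epsilon)."""
    recursive = [p[1:] for p in productions if p and p[0] == head]
    others = [p for p in productions if not p or p[0] != head]
    if not recursive:                      # nothing to do
        return {head: productions}
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }

# E -> E+T | T  becomes  E -> T E'  and  E' -> + T E' | epsilon
print(eliminate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# → {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```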
Left factoring:

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. If A → αβ1 | αβ2 are two A-productions with a common prefix α, they can be rewritten as
A → αA'
A' → β1 | β2

Consider the dangling-else grammar
S → iEtS | iEtSeS | a
E → b

Left-factored, this grammar becomes:
S → iEtSS' | a
S' → eS | ε
E → b
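As a sketch, the transformation can be coded as follows, assuming for simplicity that all the alternatives passed in share the common prefix (here only the two i-alternatives of S); the helper names are illustrative:

```python
def common_prefix(seqs):
    """Longest sequence of symbols shared by all alternatives."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(head, productions):
    """A -> alpha beta1 | alpha beta2  =>  A -> alpha A',  A' -> beta1 | beta2."""
    alpha = common_prefix(productions)
    if not alpha:
        return {head: productions}
    new_head = head + "'"
    tails = [p[len(alpha):] for p in productions]   # [] stands for epsilon
    return {head: [alpha + [new_head]], new_head: tails}

# S -> iEtS | iEtSeS  becomes  S -> iEtS S'  and  S' -> epsilon | eS
print(left_factor("S", [["i", "E", "t", "S"], ["i", "E", "t", "S", "e", "S"]]))
```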
PARSING

Parsing is the process of analyzing a continuous stream of input in order to determine its grammatical structure with respect to a given formal grammar.

Parse tree:
The graphical representation of a derivation is called a parse tree. Each interior node of the parse tree is a non-terminal; the children of a node can be terminals or non-terminals.

Types of parsing:
1. Top-down parsing
2. Bottom-up parsing

Top-down parsing : A parser can start with the start symbol and try to transform it to the input string. Example : LL parsers.
Bottom-up parsing : A parser can start with the input and attempt to rewrite it into the start symbol. Example : LR parsers.
TOP-DOWN PARSING

Constructing a parse tree for the input string starting from the root and creating the nodes of the parse tree in preorder is called top-down parsing.

Types of top-down parsing:
1. Recursive descent parsing
2. Predictive parsing

1. RECURSIVE DESCENT PARSING

Recursive descent parsing is one of the top-down parsing techniques that uses a set of recursive procedures to scan its input.
This parsing method may involve backtracking, that is, making repeated scans of the input.

Example for backtracking:

Consider the grammar G : S → cAd
                         A → ab | a
and the input string w = cad.

The parse tree can be constructed using the following top-down approach :

Step 1:
Initially create a tree with a single node labeled S. An input pointer points to ‘c’, the first symbol of w. Expand the tree with the production of S.

         S
       / | \
      c  A  d
Step 2:
The leftmost leaf ‘c’ matches the first symbol of w, so advance the input pointer to the second symbol of w, ‘a’, and consider the next leaf ‘A’. Expand A using the first alternative A → ab.

         S
       / | \
      c  A  d
        / \
       a   b

Step 3:
The second symbol ‘a’ of w also matches the second leaf of the tree. So advance the input pointer to the third symbol of w, ‘d’. But the third leaf of the tree is b, which does not match the input symbol d.
Hence discard the chosen production and reset the pointer to the second input position. This is called backtracking.

Step 4:
Now try the second alternative A → a.

         S
       / | \
      c  A  d
         |
         a

Now the leaf a matches the second symbol of w, and the leaf d matches the third symbol. Since the entire input is matched, the parse is complete.
A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop. Hence, elimination of left recursion must be done before parsing.
Consider the grammar for arithmetic expressions
E → E+T | T
T → T*F | F
F → (E) | id

After eliminating the left recursion, the grammar becomes
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
Recursive procedures:

Procedure E( )
begin
    T( );
    EPRIME( );
end

Procedure EPRIME( )
begin
    if input_symbol = ’+’ then
        ADVANCE( );
        T( );
        EPRIME( );
end

Procedure T( )
begin
    F( );
    TPRIME( );
end

Procedure TPRIME( )
begin
    if input_symbol = ’*’ then
        ADVANCE( );
        F( );
        TPRIME( );
end

Procedure F( )
begin
    if input_symbol = ’id’ then
        ADVANCE( )
    else if input_symbol = ’(’ then
        ADVANCE( );
        E( );
        if input_symbol = ’)’ then
            ADVANCE( )
        else ERROR( )
    else ERROR( );
end
Stack implementation:

PROCEDURE CALL      INPUT (↑ marks the current symbol)
E( )                ↑id + id * id
T( )                ↑id + id * id
F( )                ↑id + id * id
ADVANCE( )          id ↑+ id * id
TPRIME( )           id ↑+ id * id
EPRIME( )           id ↑+ id * id
ADVANCE( )          id + ↑id * id
T( )                id + ↑id * id
F( )                id + ↑id * id
ADVANCE( )          id + id ↑* id
TPRIME( )           id + id ↑* id
ADVANCE( )          id + id * ↑id
F( )                id + id * ↑id
ADVANCE( )          id + id * id ↑
TPRIME( )           id + id * id ↑
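The procedures above can be turned into a runnable sketch. This Python version mirrors E, EPRIME, T, TPRIME and F; the class and the token-list interface are illustrative choices, not from the text:

```python
class RecursiveDescentParser:
    """Recursive-descent parser for the left-recursion-free grammar
    E -> T E',  E' -> + T E' | eps,  T -> F T',
    T' -> * F T' | eps,  F -> ( E ) | id."""
    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # end marker
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        self.pos += 1

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected}, got {self.peek()}")
        self.advance()

    def E(self):
        self.T(); self.EPRIME()

    def EPRIME(self):                  # epsilon if no '+' follows
        if self.peek() == "+":
            self.advance(); self.T(); self.EPRIME()

    def T(self):
        self.F(); self.TPRIME()

    def TPRIME(self):                  # epsilon if no '*' follows
        if self.peek() == "*":
            self.advance(); self.F(); self.TPRIME()

    def F(self):
        if self.peek() == "id":
            self.advance()
        elif self.peek() == "(":
            self.advance(); self.E(); self.match(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()}")

def parses(tokens):
    p = RecursiveDescentParser(tokens)
    try:
        p.E()
        return p.peek() == "$"         # all input consumed?
    except SyntaxError:
        return False

print(parses(["id", "+", "id", "*", "id"]))   # → True
print(parses(["id", "+", "*"]))               # → False
```

Because the grammar has no left recursion and the alternatives of each procedure are chosen by one symbol of lookahead, no backtracking is needed.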
2. PREDICTIVE PARSING

Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
The key problem of predictive parsing is to determine the production to be applied for a non-terminal when there are alternatives.

Non-recursive predictive parser:

                     INPUT :  a + b $
                                 |
        STACK                    v
          X        +-------------------------+
          Y  ----> | Predictive parsing      | ----> OUTPUT
          $        | program                 |
                   +-------------------------+
                                 |
                                 v
                          Parsing Table M
The table-driven predictive parser has an input buffer, stack, a parsing table and an output
stream.
Input buffer:
It consists of strings to be parsed, followed by $ to indicate the end of the input string.
Stack:
It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack.
Initially, the stack contains the start symbol on top of $.
Parsing table:
It is a two-dimensional array M[A, a], where ‘A’ is a non-terminal and ‘a’ is a terminal.
The parser is controlled by a program that considers X, the symbol on top of the stack, and a, the current input symbol. These two symbols determine the parser action. There are three possibilities:
1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will either be an X-production of the grammar or an error entry.
   If M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top).
   If M[X, a] = error, the parser calls an error recovery routine.

Algorithm for nonrecursive predictive parsing:
Input : A string w and a parsing table M for grammar G.
Output : If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method : Initially, the parser has $S on the stack with S, the start symbol of G on top, and w$ in the input buffer. The program that utilizes the predictive parsing table M to produce a parse for the input is as follows:

set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error()
    else /* X is a non-terminal */
        if M[X, a] = X → Y1 Y2 . . . Yk then begin
            pop X from the stack;
            push Yk, Yk-1, … , Y1 onto the stack, with Y1 on top;
            output the production X → Y1 Y2 . . . Yk
        end
        else error()
until X = $ /* stack is empty */
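A minimal Python sketch of this driver follows; the dictionary encoding of M and the function name are illustrative assumptions, and ε is encoded as an empty right-hand side:

```python
def predictive_parse(table, start, tokens):
    """LL(1) driver: `table` maps (non-terminal, lookahead) to the chosen
    right-hand side (a list of symbols; [] encodes epsilon).  Returns the
    list of productions applied, or raises SyntaxError."""
    stack = ["$", start]
    symbols = tokens + ["$"]
    i, output = 0, []
    while True:
        X, a = stack[-1], symbols[i]
        if X == "$" and a == "$":
            return output                    # successful completion
        if X == a:                           # terminal on top: match it
            stack.pop()
            i += 1
        elif (X, a) in table:                # expand the non-terminal
            rhs = table[(X, a)]
            stack.pop()
            stack.extend(reversed(rhs))      # append Yk first, so Y1 ends on top
            output.append((X, rhs))
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")

# Parsing table for E -> TE', E' -> +TE' | eps, T -> FT',
# T' -> *FT' | eps, F -> (E) | id   ([] encodes epsilon)
M = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
print(len(predictive_parse(M, "E", ["id", "+", "id", "*", "id"])))  # → 11
```

The returned list of productions is exactly the leftmost derivation of the input.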
The construction of a predictive parser is aided by two functions associated with a grammar G :
1. FIRST
2. FOLLOW

Rules for computing FIRST(X):
1. If X is a terminal, then FIRST(X) is {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 … Yk is a production, then place a in FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), … , FIRST(Yi-1).

Rules for computing FOLLOW(A):
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input end marker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).

Algorithm for the construction of a predictive parsing table:
Input : Grammar G
Output : Parsing table M
Method :
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M an error.
Example:

Consider the grammar :
E → E+T | T
T → T*F | F
F → (E) | id

After eliminating left recursion, the grammar is :
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST( ) :
FIRST(E) = { ( , id }
FIRST(E') = { + , ε }
FIRST(T) = { ( , id }
FIRST(T') = { * , ε }
FIRST(F) = { ( , id }

FOLLOW( ) :
FOLLOW(E) = { $ , ) }
FOLLOW(E') = { $ , ) }
FOLLOW(T) = { + , $ , ) }
FOLLOW(T') = { + , $ , ) }
FOLLOW(F) = { + , * , $ , ) }
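The FIRST and FOLLOW rules can be computed by iterating to a fixed point. A Python sketch (the grammar encoding and names are illustrative) that reproduces the sets just listed:

```python
def compute_first_follow(grammar, start):
    """Fixed-point computation of FIRST and FOLLOW.  `grammar` maps each
    non-terminal to a list of bodies; symbols that are not keys are
    terminals; [] is an epsilon body; "eps" stands for epsilon."""
    EPS = "eps"
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add("$")

    def first_of(seq):                  # FIRST of a string of symbols
        out = set()
        for sym in seq:
            syms = first[sym] if sym in grammar else {sym}
            out |= syms - {EPS}
            if EPS not in syms:
                return out
        out.add(EPS)                    # every symbol can derive epsilon
        return out

    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                gained = first_of(body) - first[A]
                if gained:
                    first[A] |= gained
                    changed = True
                for i, B in enumerate(body):      # FOLLOW rules 2 and 3
                    if B not in grammar:
                        continue
                    trailer = first_of(body[i + 1:])
                    gained = (trailer - {EPS}) | (follow[A] if EPS in trailer else set())
                    if not gained <= follow[B]:
                        follow[B] |= gained
                        changed = True
    return first, follow

G = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(G, "E")
print(sorted(first["E"]), sorted(follow["F"]))  # → ['(', 'id'] ['$', ')', '*', '+']
```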
Predictive parsing table :

NON-TERMINAL |   id      |     +       |     *       |    (       |    )     |    $
E            | E → TE'   |             |             | E → TE'    |          |
E'           |           | E' → +TE'   |             |            | E' → ε   | E' → ε
T            | T → FT'   |             |             | T → FT'    |          |
T'           |           | T' → ε      | T' → *FT'   |            | T' → ε   | T' → ε
F            | F → id    |             |             | F → (E)    |          |
Stack implementation :

STACK        INPUT          OUTPUT
$E           id+id*id $
$E'T         id+id*id $     E → TE'
$E'T'F       id+id*id $     T → FT'
$E'T'id      id+id*id $     F → id
$E'T'        +id*id $
$E'          +id*id $       T' → ε
$E'T+        +id*id $       E' → +TE'
$E'T         id*id $
$E'T'F       id*id $        T → FT'
$E'T'id      id*id $        F → id
$E'T'        *id $
$E'T'F*      *id $          T' → *FT'
$E'T'F       id $
$E'T'id      id $           F → id
$E'T'        $
$E'          $              T' → ε
$            $              E' → ε

LL(1) grammar:
A grammar whose parsing table has no multiply defined entries is said to be LL(1). Consider the following grammar :
S → iEtS | iEtSeS | a
E → b

After left factoring, this grammar becomes :
S → iEtSS' | a
S' → eS | ε
E → b
To construct a parsing table, we need FIRST() and FOLLOW() for all the non-terminals.
FIRST(S) = { i, a }
FIRST(S’) = {e, ε }
FIRST(E) = { b}
FOLLOW(S) = { $ ,e }
FOLLOW(S’) = { $ ,e }
FOLLOW(E) = {t}
Parsing table:

NON-TERMINAL |   a    |   b    |      e       |      i       |   t   |    $
S            | S → a  |        |              | S → iEtSS'   |       |
S'           |        |        | S' → eS      |              |       | S' → ε
             |        |        | S' → ε       |              |       |
E            |        | E → b  |              |              |       |

Since the entry M[S', e] contains more than one production, the grammar is not an LL(1) grammar.
Implementation of predictive parser:
1. Elimination of left recursion, left factoring and ambiguity from the grammar.
2. Construct FIRST( ) and FOLLOW( ) for all non-terminals.
3. Construct the predictive parsing table.
4. Parse the given input string using the stack and the parsing table.
BOTTOM-UP PARSING

Constructing a parse tree for an input string beginning at the leaves and going towards the root is called bottom-up parsing.
SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a parse tree
for an input string beginning at the leaves (the bottom) and working up towards the root (the
top).
Example:
Consider the grammar:
S → aABe
A → Abc | b
B→d
The sentence to be recognized is abbcde.
REDUCTION (LEFTMOST)          RIGHTMOST DERIVATION
abbcde   (A → b)              S → aABe
aAbcde   (A → Abc)              → aAde
aAde     (B → d)                → aAbcde
aABe     (S → aABe)             → abbcde
S

The reductions trace out the rightmost derivation in reverse.
Handles:
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the non-terminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
Example:
Consider the grammar:
E → E+E
E → E*E
E → (E)
E → id

and the input string id1+id2*id3.

The rightmost derivation is :
E → E+E
  → E+E*E
  → E+E*id3
  → E+id2*id3
  → id1+id2*id3

The substring replaced at each step of this derivation is a handle: id1, id2, id3, then E*E, then E+E in the reverse order.
Handle pruning:
Starting with a string of terminals w to be parsed, handle pruning obtains the rightmost derivation in reverse by repeatedly locating the handle of the current right-sentential form and reducing it to the corresponding non-terminal.
Stack implementation of shift-reduce parsing :

STACK        INPUT            ACTION
$            id1+id2*id3 $    shift
$ id1        +id2*id3 $       reduce by E → id
$ E          +id2*id3 $       shift
$ E+         id2*id3 $        shift
$ E+id2      *id3 $           reduce by E → id
$ E+E        *id3 $           shift
$ E+E*       id3 $            shift
$ E+E*id3    $                reduce by E → id
$ E+E*E      $                reduce by E → E*E
$ E+E        $                reduce by E → E+E
$ E          $                accept
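The reduction sequence for the earlier grammar S → aABe, A → Abc | b, B → d can be reproduced with a deliberately naive Python sketch that always reduces the leftmost substring matching some right-hand side, trying longer bodies first. It is a toy illustration of reduction order, not a real shift-reduce parser:

```python
def reduce_to_start(sentence, productions, start):
    """Repeatedly replace the leftmost substring matching a right-hand
    side (longer bodies tried first) until only the start symbol is
    left.  Returns the list of sentential forms seen along the way."""
    steps = [sentence]
    ordered = sorted(productions, key=lambda p: -len(p[1]))
    while sentence != start:
        for head, body in ordered:
            if body in sentence:
                sentence = sentence.replace(body, head, 1)
                steps.append(sentence)
                break
        else:
            raise SyntaxError(f"stuck at {sentence}")
    return steps

P = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]
print(reduce_to_start("abbcde", P, "S"))
# → ['abbcde', 'aAbcde', 'aAde', 'aABe', 'S']
```

Note that after the first reduction the sketch must prefer the longer body Abc over b; reducing the second b instead would lead the parse into a dead end, which is exactly the handle-selection problem a real parser solves with its tables.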
Actions in shift-reduce parser:

shift – The next input symbol is shifted onto the top of the stack.
reduce – The parser replaces the handle within the stack with a non-terminal.
accept – The parser announces successful completion of parsing.
error – The parser discovers that a syntax error has occurred and calls an error recovery routine.

Conflicts in shift-reduce parsing:

1. Shift-reduce conflict: The parser cannot decide whether to shift or to reduce.
2. Reduce-reduce conflict: The parser cannot decide which of several reductions to make.

1. Shift-reduce conflict:

Example:
Consider the grammar:
E → E+E | E*E | id
and the input id+id*id
Stack     Input     Action                 Stack      Input     Action
$ E+E     *id $     Reduce by E → E+E      $ E+E      *id $     Shift
$ E       *id $     Shift                  $ E+E*     id $      Shift
$ E*      id $      Shift                  $ E+E*id   $         Reduce by E → id
$ E*id    $         Reduce by E → id       $ E+E*E    $         Reduce by E → E*E
$ E*E     $         Reduce by E → E*E      $ E+E      $         Reduce by E → E+E
$ E       $         Accept                 $ E        $         Accept
2. Reduce-reduce conflict:

Consider the grammar:
M → R+R | R+c | R
R → c

and the input c+c

Stack     Input     Action                 Stack      Input     Action
$         c+c $     Shift                  $          c+c $     Shift
$ c       +c $      Reduce by R → c        $ c        +c $      Reduce by R → c
$ R       +c $      Shift                  $ R        +c $      Shift
$ R+      c $       Shift                  $ R+       c $       Shift
$ R+c     $         Reduce by R → c        $ R+c      $         Reduce by M → R+c
$ R+R     $         Reduce by M → R+R      $ M        $         Accept
$ M       $         Accept
Viable prefixes:
α is a viable prefix of the grammar if there is a w such that αw is a right sentential form.
The prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
The set of viable prefixes is a regular language.
OPERATOR-PRECEDENCE PARSING
Operator precedence parser can be constructed from a grammar called Operator-grammar. These
grammars have the property that no production on right side is ɛ or has two adjacent non -
terminals.
Example:
E → EAE | (E) | -E | id
A → + | - | * | / | ↑

Since the right side EAE has three consecutive non-terminals, the grammar is not an operator grammar; substituting for A, it can be rewritten as follows:
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
Operator precedence relations:

There are three disjoint precedence relations, namely
<∙  – less than
=∙  – equal to
∙>  – greater than

The relations give the following meaning:
a <∙ b – a yields precedence to b
a =∙ b – a has the same precedence as b
a ∙> b – a takes precedence over b
Rules for constructing the precedence relations:
1. If operator θ1 has higher precedence than operator θ2, make θ1 ∙> θ2 and θ2 <∙ θ1.
2. If θ1 and θ2 are operators of equal precedence, make θ1 ∙> θ2 and θ2 ∙> θ1 if the operators are left-associative, or θ1 <∙ θ2 and θ2 <∙ θ1 if they are right-associative.
3. Also make
( =∙ ) ,  ( <∙ ( ,  ) ∙> ) ,  ( <∙ id ,  id ∙> ) ,  $ <∙ id ,  id ∙> $ ,  $ <∙ ( ,  ) ∙> $
Example:
The operator precedence relations for the grammar
E → E+E | E-E | E*E | E/E | E↑E | (E) | -E | id
are given in the following table, assuming
1. ↑ is of highest precedence and right-associative,
2. * and / are of next higher precedence and left-associative, and
3. + and - are of lowest precedence and left-associative.
(Blank entries denote error.)

        +    -    *    /    ↑    id   (    )    $
+       ∙>   ∙>   <∙   <∙   <∙   <∙   <∙   ∙>   ∙>
-       ∙>   ∙>   <∙   <∙   <∙   <∙   <∙   ∙>   ∙>
*       ∙>   ∙>   ∙>   ∙>   <∙   <∙   <∙   ∙>   ∙>
/       ∙>   ∙>   ∙>   ∙>   <∙   <∙   <∙   ∙>   ∙>
↑       ∙>   ∙>   ∙>   ∙>   <∙   <∙   <∙   ∙>   ∙>
id      ∙>   ∙>   ∙>   ∙>   ∙>             ∙>   ∙>
(       <∙   <∙   <∙   <∙   <∙   <∙   <∙   =∙
)       ∙>   ∙>   ∙>   ∙>   ∙>             ∙>   ∙>
$       <∙   <∙   <∙   <∙   <∙   <∙   <∙
Operator precedence parsing algorithm:

Input : An input string w and a table of precedence relations.
Output : If w is well formed, a skeletal parse tree, with a placeholder non-terminal E labeling all interior nodes; otherwise, an error indication.
Method : Initially the stack contains $ and the input buffer the string w $. To parse, we execute the following program :
(1) set ip to point to the first symbol of w$;
(2) repeat forever
(3)   if $ is on top of the stack and ip points to $ then
(4)     return
      else begin
(5)     let a be the topmost terminal symbol on the stack
        and let b be the symbol pointed to by ip;
(6)     if a <∙ b or a =∙ b then begin /*shift*/
(7)       push b onto the stack;
(8)       advance ip to the next input symbol
        end;
(9)     else if a ∙> b then /*reduce*/
(10)      repeat
(11)        pop the stack
(12)      until the top stack terminal is related by <∙
            to the terminal most recently popped
(13)    else error( )
      end
Example:
Consider the grammar E → E+E | E-E | E*E | E/E | E↑E | (E) | id and the input string id+id*id. The implementation is as follows:
STACK        INPUT          COMMENT
$            id+id*id $     $ <∙ id : shift id
$ id         +id*id $       id ∙> + : pop id
$            +id*id $       $ <∙ + : shift +
$ +          id*id $        + <∙ id : shift id
$ + id       *id $          id ∙> * : pop id
$ +          *id $          + <∙ * : shift *
$ + *        id $           * <∙ id : shift id
$ + * id     $              id ∙> $ : pop id
$ + *        $              * ∙> $ : pop *
$ +          $              + ∙> $ : pop +
$            $              accept
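The loop above can be sketched in Python. The RELATION table below encodes only a subset of the precedence table (+, *, id, parentheses and $), and the sketch simplifies the real algorithm by keeping only terminals on the stack, which is enough for recognition:

```python
# '<' means yields precedence, '>' takes precedence, '=' equal precedence;
# a missing entry means error.
RELATION = {
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<", ("$", "("): "<",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<",
    ("+", ")"): ">", ("+", "$"): ">", ("+", "("): "<",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">",
    ("*", ")"): ">", ("*", "$"): ">", ("*", "("): "<",
    ("id", "+"): ">", ("id", "*"): ">", ("id", ")"): ">", ("id", "$"): ">",
    ("(", "id"): "<", ("(", "+"): "<", ("(", "*"): "<",
    ("(", "("): "<", ("(", ")"): "=",
    (")", "+"): ">", (")", "*"): ">", (")", ")"): ">", (")", "$"): ">",
}

def operator_precedence_parse(tokens):
    """Shift while the top terminal yields to (or equals) the lookahead;
    pop a handle when it takes precedence.  Returns True if well formed."""
    stack = ["$"]
    symbols = tokens + ["$"]
    i = 0
    while True:
        a, b = stack[-1], symbols[i]
        if a == "$" and b == "$":
            return True                       # accept
        rel = RELATION.get((a, b))
        if rel in ("<", "="):                 # shift
            stack.append(b)
            i += 1
        elif rel == ">":                      # reduce: pop the handle
            popped = stack.pop()
            while RELATION.get((stack[-1], popped)) != "<":
                popped = stack.pop()
        else:
            return False                      # no relation: error

print(operator_precedence_parse(["id", "+", "id", "*", "id"]))  # → True
```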
LR PARSERS
An efficient bottom-up syntax analysis technique that can be used to parse a large class of CFGs is called LR(k) parsing. The ‘L’ is for left-to-right scanning of the input, the ‘R’ for constructing a rightmost derivation in reverse, and the ‘k’ for the number of input symbols of lookahead used in making parsing decisions. When ‘k’ is omitted, it is assumed to be 1.
Advantages of LR parsing:
It recognizes virtually all programming language constructs for which CFG can be
written.
It is an efficient non-backtracking shift-reduce parsing method.
The class of grammars that can be parsed using the LR method is a proper superset of the class of grammars that can be parsed with predictive parsers.
It detects a syntactic error as soon as possible.
Drawbacks of the LR method:
It is too much work to construct an LR parser by hand for a typical programming-language grammar. A specialized tool, called an LR parser generator, is needed. Example: YACC.
Types of LR parsing methods:
1. SLR – Simple LR
   Easiest to implement, least powerful.
2. CLR – Canonical LR
   Most powerful, most expensive.
3. LALR – Look-Ahead LR
   Intermediate in size and cost between the other two methods.
The LR parsing algorithm:

The schematic form of an LR parser is as follows:

                     INPUT :  a1 … ai … an $
                                  |
        STACK                     v
          sm       +-------------------------+
          Xm  ---> | LR parsing program      | ----> OUTPUT
          …        +-------------------------+
          s0                      |
                                  v
                        action | goto  (parsing table)
It consists of : an input, an output, a stack, a driver program, and a parsing table that has two
parts (action and goto).
The parsing program reads characters from an input buffer one at a time.
The program uses a stack to store a string of the form s0 X1 s1 X2 s2 … Xm sm, where sm is on top. Each Xi is a grammar symbol and each si is a state.
The parsing table consists of two parts : action and goto functions.

Action : The parsing program determines sm, the state currently on top of the stack, and ai, the current input symbol. It then consults action[sm, ai] in the action table, which can have one of four values:
1. shift s, where s is a state,
2. reduce by a grammar production A → β,
3. accept, and
4. error.

Goto : The goto function takes a state and a grammar symbol as arguments and produces a state.
LR parsing algorithm:

Input: An input string w and an LR parsing table with functions action and goto for grammar G.
Output: If w is in L(G), a bottom-up parse for w; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer. The parser then executes the following program :
set ip to point to the first input symbol of w$;
repeat forever begin
let s be the state on top of the stack and
a the symbol pointed to by ip;
if action[s, a] = shift s’ then begin
push a then s’ on top of the stack;
advance ip to the next input symbol
end
else if action[s, a] = reduce A→β then begin
pop 2* | β | symbols off the stack;
let s’ be the state now on top of the stack;
push A then goto[s’, A] on top of the stack;
output the production A→ β
end
else if action[s, a] = accept then
return
else error( )
end
CONSTRUCTING SLR(1) PARSING TABLE:
LR(0) items:
An LR(0) item of a grammar G is a production of G with a dot at some position of the
right side. For example, production A → XYZ yields the four items :
A → . XYZ
A → X . YZ
A → XY . Z
A → XYZ .
Closure operation:
If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I
by the two rules:
1. Initially, every item in I is added to closure(I).
2. If A → α . Bβ is in closure(I) and B → γ is a production, then add the item B → . γ to I, if it is not already there. We apply this rule until no more new items can be added to closure(I).
Goto operation:
Goto(I, X) is defined to be the closure of the set of all items [A → αX . β] such that [A → α . Xβ] is in I.
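Both operations are short to code. In this Python sketch an item is a tuple (head, body, dot-position); the function names and the grammar encoding are illustrative choices:

```python
def closure(items, grammar):
    """LR(0) closure: if the dot stands before a non-terminal B,
    add B -> . gamma for every B-production, until no change."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in grammar:
                for prod in grammar[body[dot]]:
                    item = (body[dot], prod, 0)
                    if item not in result:
                        result.add(item)
                        changed = True
    return result

def goto(items, X, grammar):
    """goto(I, X): closure of all items with the dot moved past X."""
    moved = {(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == X}
    return closure(moved, grammar)

G = {
    "E'": [("E",)],
    "E":  [("E", "+", "T"), ("T",)],
    "T":  [("T", "*", "F"), ("F",)],
    "F":  [("(", "E", ")"), ("id",)],
}
I0 = closure({("E'", ("E",), 0)}, G)
print(len(I0))  # → 7, the seven items of I0 in the example below
```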
Steps to construct the SLR parsing table for grammar G are:
1. Augment G and produce G'.
2. Construct the canonical collection of sets of items C for G'.
3. Construct the parsing action and goto functions using the following algorithm, which requires FOLLOW(A) for each non-terminal of the grammar.

Algorithm for the construction of an SLR parsing table:
1. Construct C = { I0, I1, … , In }, the collection of sets of LR(0) items for G'.
2. The parsing actions for state i are determined as follows:
   (a) If [A → α . aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to “shift j”. Here a must be a terminal.
   (b) If [A → α .] is in Ii, then set action[i, a] to “reduce A → α” for all a in FOLLOW(A); here A may not be S'.
   (c) If [S' → S .] is in Ii, then set action[i, $] to “accept”.
   If any conflicting actions are generated by the above rules, we say the grammar is not SLR(1).
3. The goto transitions for state i are constructed for all non-terminals A using the rule: If goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items containing [S' → . S].
I0 : E’ → . E
E →.E+T
E →.T
T →.T*F
T →.F
F → . (E)
F → . id
GOTO ( I0 , E) GOTO ( I4 , id )
I1 : E’ → E . I5 : F → id .
E →E.+T
http://csetube.weebly.com/
GOTO ( I6 , T )
GOTO ( I0 , T) I9 : E → E + T .
I2 : E → T . T→T.*F
T →T.*F
GOTO ( I6 , F )
GOTO ( I0 , F) I3 : T → F .
I3 : T → F .
GOTO ( I6 , ( )
I4 : F → ( . E )
GOTO ( I1 , + ) t u T →.F
F → . (E)
I6 : E → E + . T
se F → . id
T →.T*F
T →.F / c GOTO ( I7 , id )
F → . (E)
F → . id p:/ I5 : F → id .
t t GOTO ( I8 , ) )
GOTO ( I2 , * )
I7 : T → T * . F
h I11 : F → ( E ) .
F → . (E) GOTO ( I8 , + )
F → . id I6 : E → E + . T
T→.T*F
GOTO ( I4 , E ) T→.F
I8 : F → ( E . ) F→.(E)
E→E.+T F → . id
GOTO ( I4 , T) GOTO ( I9 , *)
I2 : E →T . I7 : T → T * . F
T→T.*F F→.(E)
F → . id
GOTO ( I4 , F)
I3 : T → F .
http://csetube.weebly.com/
GOTO ( I4 , ( )
I4 : F → ( . E)
E →.E+T
E →.T
T →.T*F
T →.F
F → . (E)
F → id
FOLLOW (E) = { $ , ) , + }
FOLLOW (T) = { $ , + , ) , * }
FOLLOW (F) = { $ , + , ) , * }
SLR parsing table:

            ACTION                                    GOTO
STATE   id     +      *      (      )      $       E    T    F
I0      s5                   s4                    1    2    3
I1             s6                          acc
I2             r2     s7            r2     r2
I3             r4     r4            r4     r4
I4      s5                   s4                    8    2    3
I5             r6     r6            r6     r6
I6      s5                   s4                         9    3
I7      s5                   s4                              10
I8             s6                   s11
I9             r1     s7            r1     r1
I10            r3     r3            r3     r3
I11            r5     r5            r5     r5

Blank entries are errors. The productions are numbered r1 : E → E+T, r2 : E → T, r3 : T → T*F, r4 : T → F, r5 : F → (E), r6 : F → id.
Stack implementation:

Check whether the input id + id * id is valid or not.

STACK                   INPUT            ACTION
0                       id + id * id $   action[0, id] = s5 ; shift
0 id 5                  + id * id $      action[5, +] = r6 ; reduce by F → id
0 F 3                   + id * id $      goto[0, F] = 3
                                         action[3, +] = r4 ; reduce by T → F
0 T 2                   + id * id $      goto[0, T] = 2
                                         action[2, +] = r2 ; reduce by E → T
0 E 1                   + id * id $      goto[0, E] = 1
                                         action[1, +] = s6 ; shift
0 E 1 + 6               id * id $        action[6, id] = s5 ; shift
0 E 1 + 6 id 5          * id $           action[5, *] = r6 ; reduce by F → id
0 E 1 + 6 F 3           * id $           goto[6, F] = 3
                                         action[3, *] = r4 ; reduce by T → F
0 E 1 + 6 T 9           * id $           goto[6, T] = 9
                                         action[9, *] = s7 ; shift
0 E 1 + 6 T 9 * 7       id $             action[7, id] = s5 ; shift
0 E 1 + 6 T 9 * 7 id 5  $                action[5, $] = r6 ; reduce by F → id
0 E 1 + 6 T 9 * 7 F 10  $                goto[7, F] = 10
                                         action[10, $] = r3 ; reduce by T → T*F
0 E 1 + 6 T 9           $                goto[6, T] = 9
                                         action[9, $] = r1 ; reduce by E → E+T
0 E 1                   $                goto[0, E] = 1
                                         action[1, $] = accept
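The table above can be driven directly by the LR algorithm. The following Python sketch hard-codes the ACTION and GOTO entries from the table (the dictionary encoding is an illustrative choice) and counts the reductions performed:

```python
# ('s', j) = shift to state j, ('r', n) = reduce by production n,
# 'acc' = accept, absent entry = error.
# Productions: r1: E -> E+T  r2: E -> T  r3: T -> T*F
#              r4: T -> F    r5: F -> (E)  r6: F -> id
PRODS = [None, ("E", 3), ("E", 1), ("T", 3), ("T", 1), ("F", 3), ("F", 1)]

ACTION = {
    (0, "id"): ("s", 5), (0, "("): ("s", 4),
    (1, "+"): ("s", 6), (1, "$"): "acc",
    (2, "+"): ("r", 2), (2, "*"): ("s", 7), (2, ")"): ("r", 2), (2, "$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, ")"): ("r", 4), (3, "$"): ("r", 4),
    (4, "id"): ("s", 5), (4, "("): ("s", 4),
    (5, "+"): ("r", 6), (5, "*"): ("r", 6), (5, ")"): ("r", 6), (5, "$"): ("r", 6),
    (6, "id"): ("s", 5), (6, "("): ("s", 4),
    (7, "id"): ("s", 5), (7, "("): ("s", 4),
    (8, "+"): ("s", 6), (8, ")"): ("s", 11),
    (9, "+"): ("r", 1), (9, "*"): ("s", 7), (9, ")"): ("r", 1), (9, "$"): ("r", 1),
    (10, "+"): ("r", 3), (10, "*"): ("r", 3), (10, ")"): ("r", 3), (10, "$"): ("r", 3),
    (11, "+"): ("r", 5), (11, "*"): ("r", 5), (11, ")"): ("r", 5), (11, "$"): ("r", 5),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}

def slr_parse(tokens):
    """LR driver from the algorithm above: a shift pushes the new state;
    a reduce by A -> beta pops |beta| states and pushes goto[s', A].
    Returns the number of reductions performed on acceptance."""
    stack = [0]
    symbols = tokens + ["$"]
    i, reductions = 0, 0
    while True:
        act = ACTION.get((stack[-1], symbols[i]))
        if act == "acc":
            return reductions
        if act is None:
            raise SyntaxError(f"error at {symbols[i]}")
        if act[0] == "s":
            stack.append(act[1])
            i += 1
        else:                          # reduce by production act[1]
            head, length = PRODS[act[1]]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], head)])
            reductions += 1

print(slr_parse(["id", "+", "id", "*", "id"]))  # → 8 (reductions)
```

The eight reductions are exactly the reduce steps in the trace above.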
TYPE CHECKING
A compiler must check that the source program follows both syntactic and semantic conventions
of the source language.
This checking, called static checking, detects and reports programming errors.

Examples of static checks:
1. Type checks – An operator must be applied to compatible operands; otherwise the compiler should report an error.
2. Flow-of-control checks – Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: an error occurs when a break statement has no enclosing construct, such as a switch, to break out of.

A type checker verifies that the type of a construct matches that expected by its context.
For example : the arithmetic operator mod in Pascal requires integer operands, so a type checker verifies that the operands of mod have type integer.
Type information gathered by a type checker may be needed when code is generated.
TYPE SYSTEMS

The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs.
For example : “ if both operands of the arithmetic operators +, - and * are of type integer, then the result is of type integer ”.
Type Expressions

The type of a language construct will be denoted by a “type expression.”
A type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions.
The sets of basic types and constructors depend on the language to be checked.

1. Basic types such as boolean, char, integer and real are type expressions.
   A special basic type, type_error, will signal an error during type checking; void, denoting “the absence of a value,” allows statements to be checked.
2. A type name is a type expression.
3. A type constructor applied to type expressions is a type expression. Constructors include:
   Arrays : If T is a type expression, then array(I, T) is a type expression denoting the type of an array with elements of type T and index set I.
   Products : If T1 and T2 are type expressions, then their Cartesian product T1 X T2 is a type expression.
Records : The difference between a record and a product is that the fields of a record have
names. The record type constructor will be applied to a tuple formed from field names and
field types.
For example:
type row = record
address: integer;
lexeme: array[1..15] of char
end;
var table: array[1...101] of row;
declares the type name row representing the type expression record((address X integer) X
(lexeme X array(1..15,char))) and the variable table to be an array of records of this type.
Pointers : If T is a type expression, then pointer(T) is a type expression denoting the type
“pointer to an object of type T”.
For example, var p: ↑ row declares variable p to have type pointer(row).
Functions : A function in programming languages maps a domain type D to a range type R. The type of such a function is denoted by the type expression D → R.

4. Type expressions may contain variables whose values are type expressions.

Tree representation for char x char → pointer (integer) :

             →
           /   \
          x     pointer
         / \       |
     char   char  integer
Type systems

A type system is a collection of rules for assigning type expressions to the various parts of a program.
Different type systems may be used by different compilers or processors of the same
language.
Checking done by a compiler is said to be static, while checking done when the target
program runs is termed dynamic.
Any check can be done dynamically, if the target code carries the type of an element
along with the value of that element.
Sound type system
A sound type system eliminates the need for dynamic checking for type errors because it
allows us to determine statically that these errors cannot occur when the target program runs.
That is, if a sound type system assigns a type other than type_error to a program part, then type
errors cannot occur when the target code for the program part is run.
Error Recovery
Since type checking has the potential for catching errors in program, it is desirable for
type checker to recover from errors, so it can check the rest of the input.
Error handling has to be designed into the type system right from the start; the type
checking rules must be prepared to cope with errors.
SPECIFICATION OF A SIMPLE TYPE CHECKER

Here we specify a type checker for a simple language in which the type of each identifier must be declared before the identifier is used. The type checker is a translation scheme that synthesizes the type of each expression from the types of its subexpressions. The type checker can handle arrays, pointers, statements and functions.

A Simple Language

Consider the following grammar:
P → D ; E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
Translation scheme:
P→D;E
D→D;D
D → id : T { addtype (id.entry , T.type) }
T → char { T.type : = char }
T → integer { T.type : = integer }
T → ↑ T1 { T.type : = pointer(T1.type) }
T → array [ num ] of T1 { T.type : = array ( 1… num.val , T1.type) }
Type checking of expressions

In the following rules, the attribute type for E gives the type expression assigned to the expression generated by E.

1. E → literal { E.type : = char }
   E → num { E.type : = integer }
   Constants represented by the tokens literal and num have type char and integer.

2. E → id { E.type : = lookup (id.entry) }
   lookup(e) fetches the type saved in the symbol-table entry pointed to by e.

3. E → E1 mod E2 { E.type : = if E1.type = integer and
                                 E2.type = integer then integer
                              else type_error }
   The expression formed by applying mod to two subexpressions of type integer has type integer; otherwise its type is type_error.

4. E → E1 [ E2 ] { E.type : = if E2.type = integer and
                                 E1.type = array(s, t) then t
                              else type_error }
   In an array reference E1 [ E2 ], the index expression E2 must have type integer, in which case the result is the element type t obtained from the type array(s, t) of E1.

5. E → E1 ↑ { E.type : = if E1.type = pointer (t) then t
                         else type_error }
   The postfix operator ↑ yields the object pointed to by its operand. The type of E ↑ is the type t of the object pointed to by the pointer E.
Type checking of statements
Statements do not have values; hence the basic type void can be assigned to them. If an error is
detected within a statement, then type_error is assigned.
1. Assignment statement:
S → id : = E { S.type : = if id.type = E.type then void
else type_error }
2. Conditional statement:
S → if E then S1 { S.type : = if E.type = boolean then S1.type
else type_error }
3. While statement:
S → while E do S1 { S.type : = if E.type = boolean then S1.type
else type_error }
4. Sequence of statements:
S → S1 ; S2 { S.type : = if S1.type = void and
                            S2.type = void then void
                         else type_error }
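The expression and statement rules can be collected into a small checker. In this Python sketch the tuple encoding of expressions, statements and types is an illustrative choice, not part of the translation scheme itself (array sizes are also omitted for brevity):

```python
def check_expr(e, env):
    """Synthesize the type of an expression, following rules 1-5 above.
    Expressions: ('num', n), ('literal', c), ('id', name),
    ('mod', e1, e2), ('index', e1, e2), ('deref', e1).
    Types: 'integer', 'char', ('array', t), ('pointer', t), 'type_error'."""
    tag = e[0]
    if tag == "num":
        return "integer"
    if tag == "literal":
        return "char"
    if tag == "id":                          # lookup in the symbol table
        return env.get(e[1], "type_error")
    if tag == "mod":
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        return "integer" if t1 == t2 == "integer" else "type_error"
    if tag == "index":                       # E1 [ E2 ]
        t1, t2 = check_expr(e[1], env), check_expr(e[2], env)
        if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
            return t1[1]
        return "type_error"
    if tag == "deref":                       # E1 ^
        t1 = check_expr(e[1], env)
        if isinstance(t1, tuple) and t1[0] == "pointer":
            return t1[1]
        return "type_error"
    return "type_error"

def check_stmt(s, env):
    """Statements get type void, or type_error on a mismatch."""
    tag = s[0]
    if tag == "assign":                      # id := E
        return "void" if env.get(s[1]) == check_expr(s[2], env) else "type_error"
    if tag in ("if", "while"):               # condition must be boolean
        cond_ok = check_expr(s[1], env) == "boolean"
        body = check_stmt(s[2], env)
        return body if cond_ok else "type_error"
    if tag == "seq":                         # all parts must be void
        ok = all(check_stmt(x, env) == "void" for x in s[1:])
        return "void" if ok else "type_error"
    return "type_error"

env = {"i": "integer", "a": ("array", "integer"), "p": ("pointer", "integer")}
print(check_expr(("index", ("id", "a"), ("id", "i")), env))      # → integer
print(check_stmt(("assign", "i", ("deref", ("id", "p"))), env))  # → void
```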
Procedures:
A procedure definition is a declaration that associates an identifier with a statement. The
identifier is the procedure name, and the statement is the procedure body.
For example, the following is the definition of the procedure named readarray :

procedure readarray;
  var i : integer;
  begin
    for i : = 1 to 9 do read(a[i])
  end;

When a procedure name appears within an executable statement, the procedure is said to be called at that point.
Activation trees:

An activation tree is used to depict the way control enters and leaves activations. In an activation tree,
1. Each node represents an activation of a procedure.
2. The root represents the activation of the main program.
3. The node for a is the parent of the node for b if and only if control flows from activation a to
b.
4. The node for a is to the left of the node for b if and only if the lifetime of a occurs before the
lifetime of b.
Control stack:
A control stack is used to keep track of live procedure activations. The idea is to push the
node for an activation onto the control stack as the activation begins and to pop the node
when the activation ends.
The contents of the control stack are related to paths to the root of the activation tree.
When node n is at the top of control stack, the stack contains the nodes along the path
from n to the root.
The Scope of a Declaration:
A declaration is a syntactic construct that associates information with a name.
Declarations may be explicit, such as:
var i : integer ;
or they may be implicit. Example, any variable name starting with I is assumed to denote an
integer.
The portion of the program to which a declaration applies is called the scope of that declaration.
Binding of names:
Even if each name is declared once in a program, the same name may denote different
data objects at run time. “Data object” corresponds to a storage location that holds values.
The term environment refers to a function that maps a name to a storage location.
The term state refers to a function that maps a storage location to the value held there.
          environment              state
    name -------------> storage ---------> value

When an environment associates storage location s with a name x, we say that x is bound to s. This association is referred to as a binding of x.
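The two mappings compose: the value denoted by a name is state(environment(name)). A tiny Python sketch, with locations modeled as plain integers:

```python
# environment maps a name to a storage location; state maps a location
# to the value held there.
environment = {"x": 100}          # x is bound to location 100
state = {100: 42}                 # location 100 holds the value 42

def value_of(name):
    return state[environment[name]]

print(value_of("x"))              # → 42
state[environment["x"]] = 7       # an assignment changes only the state
environment["x"] = 200            # a new binding changes the environment
state[200] = 99
print(value_of("x"))              # → 99
```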
STORAGE ORGANISATION

The executing target program runs in its own logical address space in which each program value has a location.
The management and organization of this logical address space is shared between the compiler, operating system and target machine. The operating system maps the logical addresses into physical addresses, which are usually spread throughout memory.

Typical subdivision of run-time memory:

        +----------------+
        |      Code      |
        +----------------+
        |  Static Data   |
        +----------------+
        |     Stack      |
        +----------------+
        |  free memory   |
        +----------------+
        |      Heap      |
        +----------------+
Run-time storage comes in blocks, where a byte is the smallest unit of addressable
memory. Four bytes form a machine word. Multibyte objects are stored in consecutive
bytes and given the address of the first byte.
The storage layout for data objects is strongly influenced by the addressing constraints of
the target machine.
A character array of length 10 needs only enough bytes to hold 10 characters, but a
compiler may allocate 12 bytes to satisfy alignment constraints, leaving 2 bytes unused.
This unused space due to alignment considerations is referred to as padding.
Program objects whose sizes are known at compile time may be placed in an area called
static.
The dynamic areas used to maximize the utilization of space at run time are stack and
heap.
Activation records:
Procedure calls and returns are usually managed by a run-time stack called the control
stack.
Each live activation has an activation record on the control stack, with the record for
the root of the activation tree at the bottom; the record for the most recent activation
is at the top of the stack.
The contents of the activation record vary with the language being implemented. The
diagram below shows the typical contents of an activation record.

        +------------------------+
        |   actual parameters    |
        +------------------------+
        |    returned values     |
        +------------------------+
        |      control link      |
        +------------------------+
        |      access link       |
        +------------------------+
        |  saved machine status  |
        +------------------------+
        |       local data       |
        +------------------------+
        |      temporaries       |
        +------------------------+
Space for the return value of the called function, if any. Again, not all called procedures
return a value, and if one does, we may prefer to place that value in a register for
efficiency.
The actual parameters used by the calling procedure. These are not placed in the
activation record but rather in registers, when possible, for greater efficiency.
STATIC ALLOCATION

In static allocation, names are bound to storage as the program is compiled, so there is
no need for a run-time support package.
Since the bindings do not change at run time, every time a procedure is activated, its
names are bound to the same storage locations.
Therefore the values of local names are retained across activations of a procedure. That
is, when control returns to a procedure, the values of its locals are the same as they
were when control last left.
From the type of a name, the compiler decides the amount of storage for the name and
decides where the activation records go. At compile time, we can fill in the addresses at
which the target code can find the data it operates on.
STACK ALLOCATION OF SPACE

All compilers for languages that use procedures, functions or methods as units of user-
defined actions manage at least part of their run-time memory as a stack.
Each time a procedure is called, space for its local variables is pushed onto a stack, and
when the procedure terminates, that space is popped off the stack.
Calling sequences:
Procedure calls are implemented by what is known as a calling sequence, which consists
of code that allocates an activation record on the stack and enters information into its
fields.
A return sequence is similar code that restores the state of the machine so the calling
procedure can continue its execution after the call.
The code in a calling sequence is often divided between the calling procedure (caller) and
the procedure it calls (callee).
When designing calling sequences and the layout of activation records, the following
principles are helpful:
Values communicated between caller and callee are generally placed at the
beginning of the callee’s activation record, so they are as close as possible to the
caller’s activation record.
Fixed length items are generally placed in the middle. Such items typically include
the control link, the access link, and the machine status fields.
Items whose size may not be known early enough are placed at the end of the
activation record. The most common example is a dynamically sized array, where the
value of one of the callee’s parameters determines the length of the array.
We must locate the top-of-stack pointer judiciously. A common approach is to have
it point to the end of fixed-length fields in the activation record. Fixed-length data
can then be accessed by fixed offsets, known to the intermediate-code generator,
relative to the top-of-stack pointer.
                      +--------------------------------+
                      |              ...               |
                      +--------------------------------+
    caller's          | parameters and returned values |
    activation        +--------------------------------+
    record            |          control link          |
                      +--------------------------------+
                      |     links and saved status     |
                      +--------------------------------+      caller's
                      |   temporaries and local data   |      responsibility
                      +--------------------------------+
    callee's          | parameters and returned values |
    activation        +--------------------------------+
    record            |          control link          |
                      +--------------------------------+
                      |     links and saved status     | <-- top_sp
                      +--------------------------------+      callee's
                      |   temporaries and local data   |      responsibility
                      +--------------------------------+

              Division of tasks between caller and callee

The calling sequence and its division between caller and callee are as follows.
The caller evaluates the actual parameters.
The caller stores a return address and the old value of top_sp into the callee’s
activation record. The caller then increments top_sp, moving it past its own local
data and temporaries and the callee’s parameter and status fields.
The callee saves the register values and other status information.
The callee initializes its local data and begins execution.
A suitable, corresponding return sequence is:
The callee places the return value next to the parameters.
Using information in the machine-status field, the callee restores top_sp and the
other registers, and then branches to the return address that the caller placed in
the status field.
Although top_sp has been decremented, the caller knows where the return value is,
relative to the current value of top_sp, and may therefore use it.
Variable length data on stack:
The run-time memory management system must frequently deal with the allocation of
space for objects whose sizes are not known at compile time, but which are local to a
procedure and thus may be allocated on the stack.
The reason to prefer placing objects on the stack is that we avoid the expense of garbage
collecting their space.
The same scheme works for objects of any type if they are local to the procedure called
and have a size that depends on the parameters of the call.
                        +----------------------+
                        |     control link     |
    activation          +----------------------+
    record for p        |     pointer to A     |
                        |     pointer to B     |
                        |     pointer to C     |
                        +----------------------+
                        |       array A        |
                        +----------------------+
    arrays of p         |       array B        |
                        +----------------------+
                        |       array C        |
                        +----------------------+
    activation record   |     control link     | <-- top_sp
    for procedure q     +----------------------+
    called by p         |         ...          |
                        +----------------------+
    arrays of q         |         ...          |
                        +----------------------+ <-- top
Procedure p has three local arrays, whose sizes cannot be determined at compile time.
The storage for these arrays is not part of the activation record for p; only pointers to
them are.
Access to the data is through two pointers, top and top_sp. Here top marks the actual
top of the stack; it points to the position at which the next activation record will begin.
The second pointer, top_sp, is used to find the local, fixed-length fields of the top
activation record.
The code to reposition top and top_sp can be generated at compile time, in terms of sizes
that will become known at run time.
HEAP ALLOCATION
Stack allocation strategy cannot be used if either of the following is possible :
1. The values of local names must be retained when an activation ends.
2. A called activation outlives the caller.
Heap allocation parcels out pieces of contiguous storage, as needed for activation records
or other objects.
Pieces may be deallocated in any order, so over time the heap will consist of alternating
areas that are free and in use.
    Activation tree:                Heap:

          s                         +----------------------+
         / \                        |  record for s        |
        r   q(1,9)                  |     control link     |
                                    +----------------------+
                                    |  retained record     |
                                    |  for r               |
                                    |     control link     |
                                    +----------------------+
                                    |  record for q(1,9)   |
                                    |     control link     |
                                    +----------------------+
The record for an activation of procedure r is retained when the activation ends.
Therefore, the record for the new activation q(1 , 9) cannot follow that for s physically.
If the retained activation record for r is deallocated, there will be free space in the heap
between the activation records for s and q.