PCD Lab Manual


Introduction to the compiler

What is a Compiler

A compiler is a program that reads a program written in one language – the source language – and translates it into an equivalent program in another language – the target language. As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

[Figure: Source Program → Compiler → Target Program; the compiler also emits Error Messages]

The Analysis-Synthesis Model of Compilation

There are two parts to compilation: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate representation of the source program. The synthesis part constructs the desired target program from the intermediate representation. Of the two parts, synthesis requires the most specialized techniques.

The Phases Of A Compiler

A compiler operates in six phases, each of which transforms the source program from one representation to another. The first three phases form the bulk of the analysis portion of a compiler. Two other activities, symbol table management and error handling, interact with all six phases. The six phases are lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation.


Phases of Compiler

[Figure: Source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate code generator → Code Optimizer → Code Generator → Target program; the Symbol table manager and the Error handler interact with all six phases]

Lexical analysis
In a compiler, lexical analysis is also called linear analysis or scanning. In lexical analysis the stream of characters making up the source program is read from left to right and grouped into tokens, which are sequences of characters having a collective meaning.

Syntax analysis

It is also called hierarchical analysis or parsing. It involves grouping the tokens of the source program into grammatical phrases that are used by the compiler to synthesize output. Usually, a parse tree represents the grammatical phrases of the source program.

Semantic Analysis

The semantic analysis phase checks the source program for semantic errors and gathers type information for the subsequent code generation phase. It uses the hierarchical structure determined by the syntax-analysis phase to identify the operators and operands of expressions and statements.
An important component of semantic analysis is type checking. Here the compiler checks that each operator has operands that are permitted by the source language specification.

Symbol table management

A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data from that record quickly. When the lexical analyzer detects an identifier in the source program, the identifier is entered into the symbol table. The remaining phases enter information about identifiers into the symbol table.

Error detection

Each phase can encounter errors. The syntax and semantic analysis phases usually handle a large fraction of the errors detectable by the compiler. The lexical phase can detect errors where the characters remaining in the input do not form any token of the language. Errors where the token stream violates the structure rules of the language are determined by the syntax analysis phase.
Intermediate code generation

After syntax and semantic analysis, some compilers generate an explicit intermediate representation of the source program. This intermediate representation should have two important properties: it should be easy to produce and easy to translate into the target program. During semantic analysis the compiler tries to detect constructs that have the right syntactic structure but no meaning to the operation involved.

Code optimization

The code optimization phase attempts to improve the intermediate code so that faster-running machine code will result. There are simple optimizations that significantly improve the running time of the target program without slowing down compilation too much.

Code generation

The final phase of compilation is the generation of target code, consisting normally of relocatable machine code or assembly code.


Practical 1. Lexical Analysis

Aim: Program to generate lexical tokens.


Theory:
THE ROLE OF LEXICAL ANALYZER
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
Upon receiving a “get next token” command from the parser, the lexical analyzer reads input
characters until it can identify the next token.

[Figure 1.1: Interaction of the Lexical Analyzer with the Parser – the parser issues a get_next_token request, the lexical analyzer reads the input and returns the next token; both consult the Symbol Table and the Error Handler]

Since the lexical analyzer is the part of the compiler that reads the source text, it may
also perform certain secondary tasks at the user interface. One such task is stripping out from
the source program comments and white spaces in the form of blank, tab, and new line
characters. Another is correlating error messages from the compiler with the source program.
Sometimes lexical analyzers are divided into a cascade of two phases, the first called "scanning" and the second "lexical analysis". The scanner is responsible for doing simple tasks, while the lexical analyzer proper does the more complex operations.
Algorithm:
1. Declare an array of characters as a buffer to store the tokens, say 'lexbuffer'.
2. Get a token from the user and put it into a character variable, say 'c'.
3. If 'c' is a blank, do nothing.
4. If 'c' is the newline character, line = line + 1.
5. If 'c' is a digit, set token_val, the value assigned for a digit, and return 'NUMBER'.
6. If 'c' is a proper token, assign the token value.
7. Print the complete table with
a. Token entered by the user
b. Associated token value.
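A minimal C sketch of such a scanner is given below. The buffer name follows the algorithm above, but the specific token codes are illustrative assumptions rather than fixed values.

#include <stdio.h>
#include <ctype.h>

#define NUMBER 9    /* illustrative token codes; assumptions, not fixed */
#define ID     8

int line = 1;       /* current line number */

/* Reads one token from stdin into lexbuffer and returns its code,
   or EOF at end of input. */
int get_token(char lexbuffer[]) {
    int c = getchar(), i = 0;
    while (c == ' ' || c == '\t')          /* blanks: do nothing */
        c = getchar();
    if (c == '\n') { line = line + 1; return get_token(lexbuffer); }
    if (c == EOF) return EOF;
    if (isdigit(c)) {                       /* digits form a NUMBER */
        while (isdigit(c)) { lexbuffer[i++] = (char)c; c = getchar(); }
        lexbuffer[i] = '\0'; ungetc(c, stdin);
        return NUMBER;
    }
    if (isalpha(c)) {                       /* letters form an identifier */
        while (isalnum(c)) { lexbuffer[i++] = (char)c; c = getchar(); }
        lexbuffer[i] = '\0'; ungetc(c, stdin);
        return ID;
    }
    lexbuffer[0] = (char)c; lexbuffer[1] = '\0';
    return c;                               /* single-character token */
}

int main(void) {
    char lexbuffer[64];
    int tok;
    while ((tok = get_token(lexbuffer)) != EOF)
        printf("%-10s token value %d\n", lexbuffer, tok);
    return 0;
}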

Output:

Enter the Statement


if (a == 1) then b++;

Token Code Value

if 1 -
( 5 1
a 8 Pointer to Symbol table
== 6 1
1 9 Pointer to literal table
) 5 2
then 2 -
b 8 Pointer to Symbol table
++ 6 2
; 7 1

Algorithm to check whether the string is a KEYWORD or not


1. Start
2. Declare a character array storing the keywords, s[6][10] = {"if", "else", "for", "int", "goto", "return"}, and another character array st[] to store the string to be compared; initialize integer variables i, m and flag = 0.
3. Input the string to be compared into st[].
4. Repeat steps 5 to 6 until the counter i becomes equal to the number of keywords stored in the array.
5. Compare the string entered by the user with each string in the array using m = strcmp(st, s[i]); strcmp() returns 0 if the strings are equal, in which case set flag = 1.
6. i = i + 1
7. If flag equals one, it is a keyword.
8. Else it is not a keyword.
9. Stop
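A direct C rendering of this check might look like the following sketch; the keyword table mirrors the algorithm above.

#include <stdio.h>
#include <string.h>

int main(void) {
    /* keyword table from the algorithm above */
    char s[6][10] = {"if", "else", "for", "int", "goto", "return"};
    char st[32];
    int i, flag = 0;

    printf("Enter the string : ");
    if (scanf("%31s", st) != 1) return 1;

    for (i = 0; i < 6; i++)
        if (strcmp(st, s[i]) == 0) {   /* strcmp returns 0 on a match */
            flag = 1;
            break;
        }

    printf(flag ? "It is KEYWORD\n" : "It is not KEYWORD\n");
    return 0;
}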
Output:
Enter the string : return
It is KEYWORD

Enter the string : hello


It is not KEYWORD

Algorithm to find whether the string is CONSTANT or not


1. Start
2. Declare a character array str[] and initialize integer variables len and a = 0.
3. Input the string from the user.
4. Find the length of the string.
5. Repeat steps 6 to 7 while a < len.
6. If str[a] is a numeric character, then a++.
7. Else it is not a constant; break from the loop and go to step 9.
8. If a == len, print that the entered string is a constant.
9. Else it is not a constant.
10. Stop.
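The same check can be sketched in C as follows:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char str[32];
    int len, a = 0;

    printf("Input a string : ");
    if (scanf("%31s", str) != 1) return 1;
    len = (int)strlen(str);

    while (a < len && isdigit((unsigned char)str[a]))
        a++;                      /* advance over numeric characters */

    if (a == len)
        printf("It is a CONSTANT\n");
    else
        printf("It is NOT a CONSTANT\n");
    return 0;
}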

Output:
Input a string : 24
It is a CONSTANT

Input a string : a34


It is NOT a CONSTANT

Conclusion: Thus we have studied the lexical analyzer.


Practical 2. NFA TO DFA

Aim: Program to convert NFA to DFA.


Theory:
FINITE AUTOMATA: Machine Model that Recognizes Regular Languages
A finite automaton (FA) is a machine M defined by the 5-tuple M = (∑, S, s0, δ, F), where the alphabet is ∑ = {0, 1}; the set of states is S = {s0, s1, s2}; the starting state is s0; the set of final states is F = {s2}; and the transitions δ are defined by the table below.

δ  | 0        | 1
s0 | s0       | s1
s1 | {s1, s2} | s1
s2 | s2       | Ø

M can also be represented by the transition graph:


[Figure: transition graph of M over states s0, s1, s2, with s2 drawn as a double circle]

This figure (which corresponds to the transition table above) is a non-deterministic finite automaton (NFA). The big circles represent states and the double circles represent accepting or final states. The state with an unlabeled arrow coming from its left is the starting state.

Constructing NFA from Regular Expression


REGULAR EXPRESSIONS
A regular expression is built up out of simpler regular expressions using a set of defining rules. Each regular expression r denotes a language L(r). The defining rules specify how L(r) is formed by combining in various ways the languages denoted by the subexpressions of r.

Rules that define the regular expressions over an alphabet ∑:
1. ε is a regular expression that denotes {ε}, that is, the set containing the empty string.
2. If a is a symbol in ∑, then a is a regular expression that denotes {a}, the set containing the string a.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then:
a) (r) | (s) is a regular expression denoting L(r) ∪ L(s)
b) (r)(s) is a regular expression denoting L(r)L(s)
c) (r)* is a regular expression denoting (L(r))*
d) (r) is a regular expression denoting L(r)
A language denoted by a regular expression is said to be a regular set.

NFA vs. DFA


Deterministic Finite Automaton (DFA)
• For every state there is exactly one outgoing edge per alphabet symbol (the transition function IS a function).
• For each state and each symbol of the alphabet there is exactly one next state.

Non-Deterministic Finite Automaton (NFA)


• At least one of the states has more than one outgoing edge for the same alphabet symbol (the transition function IS NOT a function; it is a relation).
• There may be ε-transitions (transitions that occur without consuming an input symbol from the alphabet).

ALGORITHM TO SIMULATE A DFA

Algorithm 1
Input: A string x ended by an EOF character and a DFA defined as a 5-tuple with s0 as its
initial state and F as its set of accepting states.
Output: A “YES” if the DFA accepts x or “NO” otherwise.

Method:
Apply the "pseudo code" algorithm below to the input string x. The function move(s, c) gives the state to which there is a transition from state s on input character c. The function nextchar returns the next character of the input string x.

s = s0;
c = nextchar;
while c != EOF
    s = move(s, c);
    c = nextchar;
end
if s is in F then
    return "YES"
else
    return "NO";
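To make the method concrete, the following C sketch simulates a DFA over the alphabet {0, 1} with a transition table; the particular table encodes an assumed example machine, not a specific DFA from the text.

#include <stdio.h>

#define NSTATES 3

/* trans[s][c] = next state from state s on input symbol c ('0' or '1').
   This table is an assumed example machine over {0, 1}. */
int trans[NSTATES][2] = {
    {0, 1},   /* s0 */
    {2, 1},   /* s1 */
    {2, 2},   /* s2 */
};
int accepting[NSTATES] = {0, 0, 1};   /* F = {s2} */

int main(void) {
    int s = 0, c;
    while ((c = getchar()) == '0' || c == '1')
        s = trans[s][c - '0'];        /* s = move(s, c) */
    printf(accepting[s] ? "YES\n" : "NO\n");
    return 0;
}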
Finite Automata Parsing
Accept the input string iff there exists at least one path from the initial state (s0) to a final state (sf) that spells the input string. If no path exists, reject the string.
Example: Use the input string x = ‘11000’

[Figure: successive snapshots of the NFA consuming the input 11000; after the last symbol the set of active states includes the final state s2, so the string is accepted]
Example
Consider the NFA N = ({q0, ..., q4}, {0, 1}, δ, q0, F), whose transition function δ is given by the table below.

[Figure: transition diagram of N]

   | 0        | 1
q0 | {q0, q3} | {q0, q1}
q1 | Ø        | {q2}
q2 | {q2}     | {q2}
q3 | {q4}     | Ø
q4 | {q4}     | {q4}

We show next the proliferation of states of the NFA under the input string 01001.

[Figure: tree of active states while reading 01001, branching from q0 through q3, q1, q4 and q2]
DEFINITIONS RELATED TO THE ε-closure ALGORITHM
Definition
Given a state s of some NFA N, ε-closure(s) is the set of states in N reachable from s on ε-transitions only.
Definition
Given a set of states T of some NFA N, ε-closure(T) is the set of states in N reachable from some state s in T on ε-transitions alone.
Example
X = ε-closure({0}) = {0, 1, 2, 4, 7}
Y = ε-closure({2}) = {2}
Z = ε-closure({6}) = {6, 1, 2, 4, 7} = {1, 2, 4, 6, 7}
T = ε-closure({1, 2, 5}) = ε-closure({1}) ∪ ε-closure({2}) ∪ ε-closure({5})
  = {1, 2, 5, 4, 6, 7} = {1, 2, 4, 5, 6, 7}
[Figure: NFA with ε-transitions (the Thompson construction for (a|b)*) – ε-edges 0→1, 0→7, 1→2, 1→4, 3→6, 5→6, 6→1, 6→7; labeled edges 2 on a to 3, and 4 on b to 5]
THE ε-closure ALGORITHM
The computation of ε-closure(T) is a typical process of searching a graph for nodes reachable from a given set T of nodes by following the ε-labeled edges of the NFA. The (pseudo code) algorithm below, used to compute ε-closure(T), uses a stack to hold the states whose edges have not yet been checked for ε-labeled transitions.

Algorithm (ε-closure)
push all states in T onto stack;
initialize ε-closure(T) to T;
while stack is not empty
    pop t, the top element, off stack;
    for each state u with an edge from t to u labeled ε
        if u is not in ε-closure(T)
            add u to ε-closure(T);
            push u onto stack
    end
end
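A compact C sketch of this ε-closure computation is shown below. It represents state sets as bit masks and uses, as assumed test data, the (a|b)* NFA from the example above.

#include <stdio.h>

#define NSTATES 8
/* eps[s] = bit mask of states reachable from s by one epsilon edge.
   Assumed data: the Thompson NFA for (a|b)* from the example. */
unsigned eps[NSTATES] = {
    (1u<<1)|(1u<<7),  /* 0 -> 1, 7 */
    (1u<<2)|(1u<<4),  /* 1 -> 2, 4 */
    0, (1u<<6),       /* 2; 3 -> 6 */
    0, (1u<<6),       /* 4; 5 -> 6 */
    (1u<<1)|(1u<<7),  /* 6 -> 1, 7 */
    0                 /* 7 */
};

/* Returns the epsilon-closure of the state set T (a bit mask). */
unsigned eps_closure(unsigned T) {
    int stack[NSTATES], top = 0, t, u;
    unsigned closure = T;
    for (t = 0; t < NSTATES; t++)          /* push all states in T */
        if (T & (1u << t)) stack[top++] = t;
    while (top > 0) {
        t = stack[--top];                   /* pop t */
        for (u = 0; u < NSTATES; u++)
            if ((eps[t] & (1u << u)) && !(closure & (1u << u))) {
                closure |= 1u << u;         /* add u to the closure */
                stack[top++] = u;
            }
    }
    return closure;
}

int main(void) {
    unsigned c = eps_closure(1u << 0);      /* eps-closure({0}) */
    for (int s = 0; s < NSTATES; s++)
        if (c & (1u << s)) printf("%d ", s);
    printf("\n");                            /* expected: 0 1 2 4 7 */
    return 0;
}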

ALGORITHM TO BUILD THE DFA FROM AN NFA

Potentially, the total number of states of the DFA is the size of the power set of the set of states in the NFA (2^N).
Input: An NFA N with starting state s0, accepting some regular language.
Output: A DFA D accepting the same language as N.

Method:
The algorithm constructs a transition table DTran for D. We call Dstates the set of states in D.
We will use the definitions given for the e-closure algorithm plus:
move(T, a) = set of states to which there is a transition on input symbol a in Σ from
some NFA state s in T.

Algorithm 2 ("subset construction").

initially, ε-closure(s0) is the only state in Dstates and it is unmarked;
while there is an unmarked state T in Dstates
    mark T;
    for each input symbol a in Σ
        U = ε-closure(move(T, a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        DTran(T, a) = U
    end
end
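As an illustration, Algorithm 2 can be sketched in C with the same bit-mask set representation; the NFA data below (the (a|b)* machine, with move entries for its two labeled edges) is an assumption for testing.

#include <stdio.h>

#define NSTATES 8          /* NFA states 0..7: Thompson NFA for (a|b)* */
#define NSYM    2          /* alphabet {a, b} */
#define MAXD    64         /* more DFA states than this example needs */

unsigned eps[NSTATES] = { (1u<<1)|(1u<<7), (1u<<2)|(1u<<4), 0, (1u<<6),
                          0, (1u<<6), (1u<<1)|(1u<<7), 0 };
/* mv[s][c] = states reachable from s on symbol c (0 = a, 1 = b) */
unsigned mv[NSTATES][NSYM] = { [2] = {1u<<3, 0}, [4] = {0, 1u<<5} };

unsigned eps_closure(unsigned T) {
    int stack[NSTATES], top = 0, t, u;
    unsigned c = T;
    for (t = 0; t < NSTATES; t++) if (T & (1u<<t)) stack[top++] = t;
    while (top > 0) {
        t = stack[--top];
        for (u = 0; u < NSTATES; u++)
            if ((eps[t] & (1u<<u)) && !(c & (1u<<u))) {
                c |= 1u<<u; stack[top++] = u;
            }
    }
    return c;
}

int main(void) {
    unsigned Dstates[MAXD];        /* each DFA state is an NFA state set */
    int DTran[MAXD][NSYM];
    int n = 0, marked = 0;

    Dstates[n++] = eps_closure(1u << 0);   /* start state */
    while (marked < n) {                   /* an unmarked state T remains */
        unsigned T = Dstates[marked];
        for (int a = 0; a < NSYM; a++) {
            unsigned U = 0;
            for (int s = 0; s < NSTATES; s++)
                if (T & (1u<<s)) U |= mv[s][a];   /* move(T, a) */
            U = eps_closure(U);
            int j;
            for (j = 0; j < n && Dstates[j] != U; j++) ;
            if (j == n) Dstates[n++] = U;         /* new DFA state */
            DTran[marked][a] = j;
        }
        marked++;                                 /* mark T */
    }
    for (int i = 0; i < n; i++)
        printf("D%d: a -> D%d, b -> D%d\n", i, DTran[i][0], DTran[i][1]);
    return 0;
}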
Algorithm for Subset Construction.
(taken from Parson’s textbook)

Consider an NFA N and a DFA D accepting the same regular language. [Equivalent to Algorithm 2.]
1) Initially the list of states in D is empty. Create the starting state as ε-closure(s0), named after the initial state {s0} of N. That is, new start state: state(1) = ε-closure(s0).
2) While (there is an uncompleted row in the transition table for D) do:
a) Let x = {s1, s2, ..., sk} be the state for this row.
b) For each input symbol a do:
i) Find N({s1}, a) ∪ N({s2}, a) ∪ ... ∪ N({sk}, a), together with its ε-closure = some set we'll call T.
ii) Create the D-state y = {T}, corresponding to T.
iii) If y is not already in the list of D-states, add it to the list. (This results in a new row in the table.)
iv) Add the rule D(x, a) = y to the list of transition rules for D.
3) Identify the accepting states in D. (They must include the accepting states of N.) That is, make state(j) a final state iff at least one state in state(j) is a final state of the NFA.

Yet another algorithm for the construction of a DFA equivalent to an NFA, where DFAedge(d, c) = ε-closure(move(d, c)):

state[0] = { };
state[1] = ε-closure(s0);
p = 1;
j = 0;
while (j <= p) {
    foreach c in ∑ do
    {
        e = DFAedge(state[j], c);
        if (e == state[i] for some i <= p) {
            trans[j, c] = i;
        }
        else {
            p = p + 1;
            state[p] = e;
            trans[j, c] = p;
        }
    }
    j = j + 1;
}
Example
Find all possible theoretical states for the NFA given below.

[Figure: the three-state NFA with states 0, 1 and 2 used earlier in this practical]

There are in total 2³ = 8 states for the corresponding DFA. This is the power set of the set S = {0, 1, 2}, represented by 2^|S|. That is, the 8 states of the DFA are:

{Ø, {0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}}

and graphically, the transition diagram is:

[Figure: transition diagram over the eight subset states]

Trivially, the states {2} and Ø, plus the states {0, 2} and {0, 1} that have no input, can be eliminated. After the first elimination cycle is complete, {0, 1, 2} has no input to it and can be eliminated. Only A = {0}, B = {1} and C = {1, 2} remain.
Example
The NFA below represents the regular expression letter (letter | digit)*. Find its corresponding DFA. (The theoretical number of states in this case is 2¹⁰ = 1,024.)

[Figure: Thompson NFA for letter (letter | digit)* with states 1 through 10]

Solution:
A = ε-closure({1}) = {1}
move(A, letter) = {2} (function move defined in Alg. 2)
move(A, digit) = Ø
B = ε-closure({2}) = {2, 3, 4, 5, 7, 10}
move(B, letter) = {6}
move(B, digit) = {8}
C = ε-closure({6}) = {6, 9, 10, 4, 5, 7} = {4, 5, 6, 7, 9, 10}
move(C, letter) = {6}
move(C, digit) = {8}
D = ε-closure({8}) = {8, 9, 10, 4, 5, 7} = {4, 5, 7, 8, 9, 10}
move(D, letter) = {6}
move(D, digit) = {8}
State A is the start state of the DFA and all states that include state 10 of the NFA are
accepting states of the DFA. The transition diagram of the DFA is given below.

[Figure: transition diagram of the resulting DFA – start state A goes to B on letter and to Ø on digit; B, C and D each go to C on letter and to D on digit; B, C and D are accepting]

MINIMIZING THE NUMBER OF STATES OF A DFA
This is an important theoretical result: every regular set is recognized by a minimum-state DFA.
Algorithm 4

Input: A DFA M with set of states S, start state s0 and a set of accepting states F.
Output: A DFA M’ accepting the same language as M and having a minimum number of
states.

Method:
1. Construct an initial partition P of the set of states with two groups: the accepting states
F and the non-accepting states S – F.

2. Apply the procedure given below to P and construct a new partition Pnew.
for each group G of P
• partition G into subgroups such that two states s and t of G are in the same
subgroup if and only if for all input symbols a, states s and t have transitions on
a to states in the same group of P;
/* at worst, a state will be in a subgroup by itself */
• partition G in Pnew by the set of all subgroups formed
end
3. If Pnew = P, let Pfinal = P and continue to step 4. Otherwise, repeat step 2 with P = Pnew.

4. Choose one state in each group of the partition Pfinal as the representative for that group. The representatives will be the states of the reduced DFA M'. Let s be a representative state, and suppose on input a there is a transition of M from s to t. Let r be the representative of t's group (r may be t). Then M' has a transition from s to r on a. Let the start state of M' be the representative of the group containing the start state s0 of M, and let the accepting states of M' be the representatives that are in F. Note that each group of Pfinal either consists only of states in F or has no states in F.

5. If M’ has a dead state, that is, a state d that is not accepting and that has transitions to
itself on all input symbols, then remove d from M’. Also, remove any states not
reachable from the start state. Any transitions to d from other states become
undefined.

Example
Consider the DFA given by the following transition table and transition diagram.
  | a | b
A | B | C
B | B | D
C | B | C
D | B | E
E | B | C

[Figure: transition diagram of the DFA, with E the accepting state]

Since E is the only accepting state we have the following partition:


[ABCD] & [E]
Under a, [ABCD] goes to B, which is in [ABCD]; under b, [ABCD] goes to [CDE]. Thus we have the new table:

       | a | b
[ABCD] | B | [CDE]
E      | B | C

We must separate D from the subgroup [ABCD] since D is going to E under b. We now
build the following table.
      | a | b
[ABC] | B | [CD]
D     | B | E
E     | B | C

Now we separate B which is going to D under b. We can build the following table.
     | a | b
[AC] | B | C
B    | B | D
D    | B | E
E    | B | C

The final table and transition diagram is given below.


     | a | b
[AC] | B | [AC]
B    | B | D
D    | B | E
E    | B | [AC]
Thus the minimal DFA has only four states versus five states in the initial DFA.
[Figure: transition diagram of the minimal four-state DFA]

Conclusion: Thus we have studied the conversion of an NFA to a DFA and the minimization of a DFA.

Practical 3. LEX tool


Aim: Study of the Lex tool
Theory:
Lex is officially known as a "Lexical Analyzer". Its main job is to break up an input stream into meaningful units, or tokens. For example, consider breaking a text file up into individual words. More pragmatically, Lex is a tool for automatically generating a lexer (also known as a scanner) starting from a lex specification (a *.l file).
The skeleton of a lex specification file is given in Figure 3. The rules section is composed of tuples of the form <pattern, action>. As can be seen from the following examples, patterns are specified by regular expressions.
Example
<pattern>    <action to take when matched>
[A-Za-z]+    printf("this is a word");
[0-9]+       printf("this is a number");

[Figure: the Lex/Yacc build pipeline – Example.l is turned by lex into lex.yy.c, and Example.y is turned by yacc into y.tab.c and y.tab.h; both C files are compiled by cc into the executable, whose scanner routine is yylex and whose parser routine is yyparse]

Regular Expressions
In the following we denote by c = character, x, y = regular expressions, m, n = integers, i = identifier.

.        matches any single character except newline
*        matches 0 or more instances of the preceding regular expression
+        matches 1 or more instances of the preceding regular expression
?        matches 0 or 1 of the preceding regular expression
|        matches the preceding or following regular expression
[]       defines a character class
()       groups enclosed regular expression into a new regular expression
"..."    matches everything within the " " literally
x|y      x or y
{i}      definition of i
x/y      x, only if followed by y (y not removed from input)
x{m,n}   m to n occurrences of x
^x       x, but only at beginning of line
x$       x, but only at end of line
"s"      exactly what is in the quotes (except for "\" and following character)
Algorithm:
1. Open the file in a text editor.
2. Enter keywords, rules for identifiers and constants, operators and relational operators, in the following format:
a) %{
Definitions of constants / header files
%}
b) Regular Expressions
%%
Translation rules
%%
c) Auxiliary procedures (main() function)
3. Save the file with the .l extension, e.g. Mylex.l
4. Call the lex tool on the terminal, e.g. [root@localhost]# lex Mylex.l. The lex tool will convert the ".l" file into a C-language code file, i.e. lex.yy.c
5. Compile the file lex.yy.c, e.g. gcc lex.yy.c. After compilation, this will create the output file a.out
6. Run the file a.out, e.g. ./a.out
7. Give input on the terminal to a.out; upon processing, the output will be displayed

Example: Program for counting number of vowels and consonant


%{
#include <stdio.h>
#include <ctype.h>

int vowels = 0;
int consonants = 0;
%}

%%

[aeiouAEIOU] vowels++;
[a-zA-Z] consonants++;
[\n] ;
. ;
%%

int main()
{
printf ("This Lex program counts the number of vowels and ");
printf ("consonants in given text.");
printf ("\nEnter the text and terminate it with CTRL-d.\n");

yylex();

printf ("Vowels = %d, consonants = %d.\n", vowels, consonants);

return 0;
}

Output:
#lex alphalex.l
#gcc lex.yy.c
#./a.out
This Lex program counts the number of vowels and consonants in given text.
Enter the text and terminate it with CTRL-d.
Iceream
Vowels =4, consonants =3.

Output:
For lexical analyzer
[root@localhost]# lex Mylex.l
[root@localhost]# gcc lex.yy.c
[root@localhost]# ./a.out
123
NUMBER
bar123
WORD

123ba
NUMBER
WORD

Conclusion: Thus we have studied the Lex tool.

Practical 4. Parsers
Aim: Program to implement predictive parsers
Theory:
BASIC PARSING TECHNIQUES:
ADVANTAGES OFFERED BY GRAMMARS TO BOTH LANGUAGE DESIGNERS AND COMPILER WRITERS
1. A grammar gives a precise, yet easy to understand, syntactic specification of a programming language.
2. From certain classes of grammars, we can automatically construct an efficient parser that determines if a source program is syntactically well formed.
3. A properly designed grammar imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors. Tools are available for converting grammar-based descriptions of translations into working programs.
4. Languages evolve over a period of time, acquiring new constructs and performing additional tasks.
THE ROLE OF THE PARSER

[Figure: the position of the parser in the compiler model – the Lexical Analyzer reads the Source Program and supplies a Token to the Parser on each Get_Next_Token request; the Parser produces a Parse Tree for the Rest of the Front End, which emits the Intermediate Representation; both consult the Symbol Table]
There are three general types of parsers for grammars. Universal methods such as the Cocke-Younger-Kasami algorithm and Earley's algorithm can parse any grammar. These two methods are, however, too inefficient to use in production compilers. The most efficient top-down and bottom-up methods work only on subclasses of grammars, but several of these subclasses, such as the LL and LR grammars, are expressive enough to describe most syntactic constructs in programming languages.

TOP DOWN PARSING: The parse tree is created top to bottom.


 Top-down parser
 Recursive-Descent Parsing
o Backtracking is needed (If a choice of a production rule does not work,
we backtrack to try other alternatives.)
o It is a general parsing technique, but not widely used.
o Not efficient

S  aBc
S
B  bc | b S

input: abc
a c a
B B a

b c b
 Predictive Parsing
o no backtracking
o efficient
o Needs a special form of grammars (LL (1) grammars).
o Recursive Predictive Parsing is a special form of Recursive Descent parsing
without backtracking.
o Non-Recursive (Table Driven) Predictive Parser is also known as LL (1) parser.

[Figure: a grammar becomes a grammar suitable for predictive parsing after eliminating left recursion and left factoring]

When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

A → α1 | ... | αn    input: ... a ......   (a is the current token)

Example stmt  if........|


while........|
begin........|
for .....
When we are trying to write the non-terminal stmt, if the current tokenis if we have to choose
first production rule. When we are trying to write the non-terminal stmt, we can uniquelychoose
the production rule by just looking the current token. We eliminate the left recursion in the
grammar, and left factor it. But itmay not be suitable for predictive parsing (not LL (1)
grammar).
 Each non-terminal corresponds to a procedure.
Ex: A → aBb (this is the only production rule for A)
proc A {
  - match the current token with a, and move to the next token;
  - call 'B';
  - match the current token with b, and move to the next token;
}

A → aBb | bAB
proc A {
  case of the current token {
    'a': - match the current token with a, and move to the next token;
         - call 'B';
         - match the current token with b, and move to the next token;
    'b': - match the current token with b, and move to the next token;
         - call 'A';
         - call 'B';
  }
}
When to apply ε-productions:
A → aA | bB | ε
• If all other productions fail, we should apply an ε-production. For example, if the current token is not a or b, we may apply the ε-production.
• Most correct choice: we should apply an ε-production for a non-terminal A when the current token is in the follow set of A (the set of terminals that can follow A in sentential forms).

Example and Algorithm: Recursive Predictive Parsing

A → aBe | cBd | C
B → bB | ε
C → f

proc A {
  case of the current token {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with e, and move to the next token;
    c: - match the current token with c, and move to the next token;
       - call B;
       - match the current token with d, and move to the next token;
    f: - call C   (f is in the first set of C)
  }
}
proc C { match the current token with f, and move to the next token; }

proc B {
  case of the current token {
    b: - match the current token with b, and move to the next token;
       - call B;
    e, d: do nothing   (e and d are in the follow set of B)
  }
}
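The following C sketch turns these procedures into a runnable recursive-descent parser for the toy grammar above; the error handling is a minimal assumption added so the sketch is self-contained.

#include <stdio.h>
#include <stdlib.h>

/* Recursive-descent parser for:
   A -> aBe | cBd | C     B -> bB | (epsilon)     C -> f  */
int tok;   /* current token (one character) */

void advance(void) { tok = getchar(); }

void error(void) { printf("reject\n"); exit(1); }

void match(int c) { if (tok == c) advance(); else error(); }

void B(void);
void C(void) { match('f'); }

void A(void) {
    switch (tok) {
    case 'a': match('a'); B(); match('e'); break;
    case 'c': match('c'); B(); match('d'); break;
    case 'f': C(); break;              /* f is in FIRST(C) */
    default:  error();
    }
}

void B(void) {
    if (tok == 'b') { match('b'); B(); }
    /* else apply B -> epsilon when tok is in FOLLOW(B) = {e, d} */
    else if (tok != 'e' && tok != 'd') error();
}

int main(void) {
    advance();          /* prime the first token */
    A();
    if (tok == '\n' || tok == EOF) printf("accept\n");
    else error();
    return 0;
}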

Implementation of a Table-Driven Predictive Parser

A table-driven parser can be implemented using an input buffer, a stack, and a parsing table. The input buffer is used to hold the string to be parsed. The string is followed by a "$" symbol that is used as a right-end marker to indicate the end of the input string. The stack is used to hold the sequence of grammar symbols. A "$" indicates the bottom of the stack. Initially, the stack has the start symbol of the grammar above the $. The parsing table is a two-dimensional array TABLE[A, a], where A is a nonterminal and a is a terminal or the $ symbol. The parser is controlled by a program that behaves as follows:

1. The program considers X, the symbol on top of the stack, and the next input symbol a.
2. If X = a = $, the parser announces the successful completion of parsing and halts.
3. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
4. If X is a nonterminal, the program consults the parsing table entry TABLE[X, a]. If TABLE[X, a] = X → UVW, the parser replaces X on top of the stack by UVW in such a manner that U comes out on top. If TABLE[X, a] = error, the parser calls the error-recovery routine.
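A small C sketch of this driver, hard-coded for the grammar S → aABb, A → c | ε, B → d | ε used in the example below, might look like this (storing productions as strings is an illustrative encoding, not a fixed convention):

#include <stdio.h>
#include <string.h>

/* Table-driven predictive parser for:
   S -> aABb    A -> c | (epsilon)    B -> d | (epsilon)
   Productions are stored as strings; "" encodes epsilon, NULL an error. */
const char *table(char X, char a) {
    if (X == 'S' && a == 'a') return "aABb";
    if (X == 'A') { if (a == 'c') return "c";
                    if (a == 'b' || a == 'd') return ""; }
    if (X == 'B') { if (a == 'd') return "d";
                    if (a == 'b') return ""; }
    return NULL;
}

int parse(const char *w) {
    char stack[64] = "$S";          /* $ marks the bottom, S on top */
    int top = 1;                    /* index of the top of the stack */
    int i = 0;                      /* input pointer; w ends with '$' */
    while (1) {
        char X = stack[top], a = w[i];
        if (X == '$' && a == '$') return 1;       /* success */
        if (X == a) { top--; i++; continue; }     /* match terminal */
        if (X >= 'A' && X <= 'Z') {               /* nonterminal: expand */
            const char *rhs = table(X, a);
            if (!rhs) return 0;                   /* error entry */
            top--;                                /* pop X */
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                stack[++top] = rhs[k];            /* push RHS reversed */
        } else return 0;                          /* mismatch */
    }
}

int main(void) {
    const char *tests[] = { "acdb$", "ab$", "aab$" };
    for (int t = 0; t < 3; t++)
        printf("%s : %s\n", tests[t], parse(tests[t]) ? "accept" : "reject");
    return 0;
}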

For example, consider the following grammar:

S → aABb
A → c | ε
B → d | ε

FIRST(S) = FIRST(aABb) = { a }

FIRST(A) = FIRST(c) ∪ FIRST(ε) = { c, ε }

FIRST(B) = FIRST(d) ∪ FIRST(ε) = { d, ε }

Since the right-end marker $ is used to mark the bottom of the stack, $ will initially be immediately below S (the start symbol) on the stack; and hence, $ will be in FOLLOW(S). Therefore:

FOLLOW(S) = { $ }

Using S → aABb, we get:

FOLLOW(A) = (FIRST(B) − {ε}) ∪ FIRST(b) = { d, b } and FOLLOW(B) = { b }

Therefore, the parsing table is as shown in Table 2.

Table 2: Production Selections for Parsing Derivations

   | a        | b     | c     | d     | $
S  | S → aABb |       |       |       |
A  |          | A → ε | A → c | A → ε |
B  |          | B → ε |       | B → d |

(blank entries denote error)

Consider an input string acdb. The various steps in the parsing of this string, in terms of the
contents of the stack and unspent input are shown in Table 3.

Table 3: Steps Involved in Parsing the String acdb
Stack Contents Unspent Input Moves

$S acdb$ Derivation using S → aABb


$bBAa acdb$ Popping a off the stack and advancing one position in the input
$bBA cdb$ Derivation using A → c
$bBc cdb$ Popping c off the stack and advancing one position in the input
$bB db$ Derivation using B → d
$bd db$ Popping d off the stack and advancing one position in the input
$b b$ Popping b off the stack and advancing one position in the input
$ $ Announce successful completion of the parsing

Similarly, for the input string ab, the various steps in the parsing of the string, in terms of the
contents of the stack and unspent input, are shown in Table 4.

Table 4: Production Selections for String ab Parsing Derivations


Stack Contents Unspent Input Moves

$S ab$ Derivation using S → aABb


$bBAa ab$ Popping a off the stack and advancing one position in the input
$bBA b$ Derivation using A → ε
$bB b$ Derivation using B → ε
$b b$ Popping b off the stack and advancing one position in the input
$ $ Announce successful completion of the parsing

For the erroneous string aab, the various steps in the parsing of the string, in terms of the contents of the stack and unspent input, are shown in Table 5.

Table 5: Production Selections for Parsing Derivations for the String aab
Stack Contents Unspent Input Moves

$S aab$ Derivation using S → aABb

$bBAa aab$ Popping a off the stack and advancing one position in the input
$bBA ab$ TABLE[A, a] = error: calling an error-handling routine

EXAMPLE

Test whether the grammar S → AaAb | BbBa, A → ε, B → ε is LL(1) or not, and construct a predictive parsing table for it.

Since the grammar contains the pair of productions S → AaAb | BbBa, for the grammar to be LL(1), it is required that:

FIRST(AaAb) ∩ FIRST(BbBa) = Ø

Here FIRST(AaAb) = { a } and FIRST(BbBa) = { b }, so the intersection is empty. Hence, the grammar is LL(1).

To construct a parsing table, the FIRST and FOLLOW sets are computed, as shown below:

1. Using S → AaAb, we get: a and b in FOLLOW(A)

2. Using S → BbBa, we get: b and a in FOLLOW(B)

Table 6: Production Selections for Example 2 Parsing Derivations

   | a        | b        | $
S  | S → AaAb | S → BbBa |
A  | A → ε    | A → ε    |
B  | B → ε    | B → ε    |

(blank entries denote error)

Conclusion: Thus we have studied the predictive parser.

Practical 5. YACC tool
Aim: Study of YACC tool

Theory:
YACC stands for Yet Another Compiler Compiler. Its GNU version is called Bison. YACC
translates any grammar of a language into a parser for that language. Grammars for YACC are
described using a variant of Backus Naur Form (BNF). A BNF grammar can be used to
express Context-free languages. By convention, a YACC file has the suffix .y.
Structure of a yacc file
A yacc file looks much like a lex file:
...definitions...
%%
...rules...
%%
...code...
Definitions: all code between %{ and %} is copied to the beginning of the resulting C file.
Rules: a number of combinations of pattern and action; if the action is more than a single command it needs to be in braces.
Code: this can be very elaborate, but the main ingredient is the call to yylex, the lexical analyzer. If the code segment is left out, a default main is used which only calls yylex.
YACC file structure

%{
#include <stdio.h>
int yylex(void);
void yyerror(char *);
%}
%token INTEGER
%%
program:
        program expr '\n'    { printf("%d\n", $2); }
        |
        ;
expr:
        INTEGER              { $$ = $1; }
        | expr '+' expr      { $$ = $1 + $3; }
        | expr '-' expr      { $$ = $1 - $3; }
        ;
%%
void yyerror(char *s) { ... }

In this skeleton, the part between %{ and %} is embedded into the generated *.c file; the definition section declares the tokens; the production-rules section defines how to "understand" the input language and what actions to take for each "sentence"; and the C auxiliary subroutines section holds any user code, for example a main function to call the parser function yyparse().
Algorithm:
1. Open the file in a text editor.

2. Specify the grammar rules and associated actions in the following format:

a. %{

Statements (include statements, optional)

%}

b. Lexical tokens, grammar, precedence and associated information.

%%

Grammar rules and actions

%%

c. Auxiliary procedures (main() function)

3. Save the file with the .y extension, e.g. parser.y, and write the companion Lex specification in a .l file, e.g. parser.l

4. Run the tools on the terminal, e.g. [root@localhost]# yacc -d parser.y and lex parser.l; yacc converts the ".y" file into the C files y.tab.c and y.tab.h, and lex converts the ".l" file into lex.yy.c

5. Compile the generated files, e.g. cc lex.yy.c y.tab.c -ll; this will create the output file a.out

6. Run the file a.out, e.g. ./a.out

7. Give input on the terminal to a.out; upon processing, the output will be displayed
Design of an LALR bottom-up parser.
<parser.l>

%{
#include <stdio.h>
#include <stdlib.h>   /* for atof() */
#include "y.tab.h"
%}
%%
[0-9]+   { yylval.dval = atof(yytext);
           return DIGIT;
         }
\n|.     return yytext[0];
%%

<parser.y>

%{
/*This YACC specification file generates the LALR parser for the program considered in
experiment 4.*/
#include<stdio.h>
%}
%union
{
double dval;
}

%token <dval> DIGIT


%type <dval> expr
%type <dval> term
%type <dval> factor

%%

line: expr '\n'

{ printf("%g\n",$1);
}

;
expr: expr '+' term {$$=$1 + $3 ;}
        | term
        ;
term:   term '*' factor   { $$ = $1 * $3; }
        | factor
        ;
factor: '(' expr ')' {$$=$2 ;}
| DIGIT
;

%%
int main()
{
    yyparse();
    return 0;
}

int yyerror(char *s)
{
    printf("%s", s);
    return 0;
}

Output:

#lex parser.l
#yacc -d parser.y
#cc lex.yy.c y.tab.c -ll -lm
#./a.out
2+3
5

Practical 6. Code Generation

Aim: Program to convert Infix expression to Postfix

Theory:

While translating a source program into a functionally equivalent object code representation, a parser may first generate an intermediate representation. This makes retargeting of the code possible and allows some optimizations to be carried out that would otherwise not be possible. The following are commonly used intermediate representations:

1. Postfix notation
2. Syntax tree
3. Three-address code

Postfix Notation

In postfix notation, the operator follows the operands. For example, for the expression (a − b) * (c + d) + (a − b), the postfix representation is:

ab− cd+ * ab− +

Syntax Tree

The syntax tree is nothing more than a condensed form of the parse tree. The operator and keyword nodes of the parse tree (Figure 1) are moved to their parent, and a chain of single productions is replaced by a single link (Figure 2).

Figure 1: Parse tree for the string id+id*id.

Figure 2: Syntax tree for id+id*id.
Three-Address Code

Three-address code is a sequence of statements of the form x = y op z. Since a statement involves no more than three references, it is called a "three-address statement," and a sequence of such statements is referred to as three-address code. For example, the three-address code for the expression a + b * c + d is:

t1 = b * c
t2 = a + t1
t3 = t2 + d

Sometimes a statement might contain fewer than three references; but it is still called a three-address statement. The following are the three-address statements used to represent various programming language constructs:

 Used for representing arithmetic expressions:

 Used for representing Boolean expressions:

 Used for representing array references and dereferencing operations:

 Used for representing a procedure call:

Infix Expression :
Any expression in the standard form like "2*3-4/5" is an Infix(Inorder) expression.

Postfix Expression :
The Postfix(Postorder) form of the above expression is "23*45/-".

Infix to Postfix Conversion :


In normal algebra we use the infix notation like a+b*c. The corresponding postfix notation is
abc*+. The algorithm for the conversion is as follows :

 Scan the Infix string from left to right.

 Initialise an empty stack.
 If the scanned character is an operand, add it to the Postfix string. If the scanned character is an operator and the stack is empty, push the character onto the stack.
 If the scanned character is an operator and the stack is not empty, compare the precedence of the character with the element on top of the stack (topStack). If topStack has higher precedence over the scanned character, pop the stack, else push the scanned character onto the stack. Repeat this step as long as the stack is not empty and topStack has precedence over the character.

Repeat these steps until all the characters are scanned.

 After all characters are scanned, add any characters that the stack may still hold to the Postfix string: if the stack is not empty, add topStack to the Postfix string and pop the stack. Repeat this step as long as the stack is not empty.
 Return the Postfix string.

Example :

Let us see how the above algorithm will be implemented using an example.

Infix String : a+b*c-d

Initially the Stack is empty and our Postfix string has no characters. Now, the first character
scanned is 'a'. 'a' is added to the Postfix string. The next character scanned is '+'. It being an
operator, it is pushed to the stack.

Postfix String: a
Stack: +

Next character scanned is 'b' which will be placed in the Postfix string. Next character is '*'
which is an operator. Now, the top element of the stack is '+' which has lower precedence than
'*', so '*' will be pushed to the stack.

Postfix String: ab
Stack: + *

The next character is 'c' which is placed in the Postfix string. Next character scanned is '-'. The
topmost character in the stack is '*' which has a higher precedence than '-'. Thus '*' will be
popped out from the stack and added to the Postfix string. Even now the stack is not empty.
Now the topmost element of the stack is '+' which has equal priority to '-'. So pop the '+' from the
stack and add it to the Postfix string. The '-' will be pushed to the stack.

Postfix String: abc*+
Stack: -

Next character is 'd' which is added to the Postfix string. Now all characters have been scanned, so we must pop the remaining elements from the stack and add them to the Postfix string. At this stage we have only a '-' in the stack. It is popped out and added to the Postfix string. So, after all characters are scanned, this is how the stack and Postfix string will be:

Postfix String: abc*+d-
Stack: (empty)

End result :

 Infix String : a+b*c-d


 Postfix String : abc*+d-

Algorithm:

1. Take a stack OPSTK and initialize it to be empty.

2. Read the entire string in infix form, e.g. A+B*C.
3. Read the string character by character into the variable symbol.
i) If symbol is an operand, add it to the postfix string.
ii) If the stack OPSTK is not empty and the precedence of the top-of-stack symbol is greater than that of the recently read symbol, then pop OPSTK:
topsymbol = pop(OPSTK)
Add this popped topsymbol to the postfix string.
iii) Repeat step ii as long as OPSTK is not empty and the precedence of the top-of-stack symbol is greater than that of the recently read symbol.
iv) Push symbol onto OPSTK.
4. Output any remaining operators: pop OPSTK until it is empty and add each popped symbol to the postfix string.
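A runnable C sketch of this algorithm is given below; it handles single-letter operands, the operators +, -, * and /, and, as a small extension beyond the algorithm above, parentheses (needed for the sample input in the Output section).

#include <stdio.h>
#include <ctype.h>

/* Precedence of an operator; higher binds tighter. */
int prec(char op) {
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* '(' stays on the stack */
    }
}

int main(void) {
    char infix[64], postfix[64], opstk[64];
    int top = -1, j = 0;

    printf("Enter the Infix Notation : ");
    if (scanf("%63s", infix) != 1) return 1;

    for (int i = 0; infix[i] != '\0'; i++) {
        char symbol = infix[i];
        if (isalnum((unsigned char)symbol)) {
            postfix[j++] = symbol;               /* operand: output it */
        } else if (symbol == '(') {
            opstk[++top] = symbol;
        } else if (symbol == ')') {
            while (top >= 0 && opstk[top] != '(')
                postfix[j++] = opstk[top--];     /* pop until '(' */
            if (top >= 0) top--;                 /* discard '(' */
        } else {
            /* pop operators of higher or equal precedence */
            while (top >= 0 && prec(opstk[top]) >= prec(symbol))
                postfix[j++] = opstk[top--];
            opstk[++top] = symbol;
        }
    }
    while (top >= 0)                              /* flush the stack */
        postfix[j++] = opstk[top--];
    postfix[j] = '\0';

    printf("Postfix Notation is: %s\n", postfix);
    return 0;
}

For the sample input (A+B)*C this sketch prints AB+C*, matching the Output below.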

Output:

Enter the Infix Notation : (A+B)*C


Postfix Notation is: AB+C*

Practical 7. Code optimization

Aim: Implementation of code optimization for common sub-expression elimination, loop


invariant code movement

Theory:
To create an efficient target language program, a programmer needs more than an optimizing compiler. In this section, we review the options available to a programmer and a compiler for creating efficient target programs. We mention code-improving transformations that a programmer and a compiler writer can be expected to use to improve the performance of a program.
Criteria for Code-Improving Transformations
The best program transformations are those that yield the most benefit for the least effort.
The transformations provided by an optimizing compiler should have several properties.
First, a transformation must preserve the meaning of programs. That is, an “optimization”
must not change the output produced by a program for a given input, or cause an error, such as
division by zero, that was not present in the original version of the source program.
Second, a transformation must, on the average, speed up programs by a measurable amount.
Third, a transformation must be worth the effort: it does not make sense for a compiler writer to expend the intellectual effort to implement a code-improving transformation that yields only a negligible improvement.

Getting Better Performance


Dramatic improvements in the running time of a program – such as cutting the running time from a few hours to a few seconds – are usually obtained by improving the program at all levels.

[Fig. 1: Places for improvements by the user and the compiler – Source code → Front end → Intermediate code → Code generator → Target code. The user can profile the program, change the algorithm and transform loops; on the intermediate code the compiler can improve loops, procedure calls and address calculations; on the target code the compiler can use registers, select instructions and do peephole transformations]

Algorithmic transformations occasionally produce spectacular improvements in running time. For example, Bentley [1982] relates that the running time of a program for sorting N elements dropped from 2.02N² microseconds to 12N log₂ N microseconds when a carefully coded insertion sort was replaced by quicksort. For N = 100 the replacement speeds up the program by a factor of 2.5. For N = 100,000 the improvement is far more dramatic: the replacement speeds up the program by a factor of more than a thousand. Unfortunately, no compiler can find the best algorithm for a given program.
In this section, and the next, a sorting program called quicksort will be used to illustrate the effect of various code-improving transformations. The C program below is derived from Sedgewick [1978], where hand-optimization of such a program is discussed.
void quicksort(m, n)
int m, n;
{
    int i, j;
    int v, x;
    if (n <= m) return;
    /* fragment begins here */
    i = m - 1; j = n; v = a[n];
    while (1) {
        do i = i + 1; while (a[i] < v);
        do j = j - 1; while (a[j] > v);
        if (i >= j) break;
        x = a[i]; a[i] = a[j]; a[j] = x;
    }
    x = a[i]; a[i] = a[n]; a[n] = x;
    /* fragment ends here */
    quicksort(m, j); quicksort(i + 1, n);
}
Fig. 2 C code for quicksort

An Organization for an Optimizing Compiler


Advantages of the organization in Fig. 3:

1. The operations needed to implement high-level constructs are made explicit in the intermediate code, so it is possible to optimize them. For example, the address calculations for a[i] are explicit in Fig. 4, so the recomputation of expressions like 4*i can be eliminated, as discussed in the next section.
2. The intermediate code can be relatively independent of the target machine, so the optimizer does not have to change much if a code generator for a different machine replaces the existing one. The intermediate code in Fig. 4 assumes that each element of the array a takes four bytes. Some intermediate codes, e.g. P-code for Pascal, leave it to the code generator to fill in the size of a machine word. We could have done the same in our intermediate code if we replaced 4 by a symbolic constant.

[Fig. 3: Organization of the code optimizer – front end → code optimizer → code generator; the code optimizer performs control-flow analysis, data-flow analysis and transformations]

(1)  i := m - 1
(2)  j := n
(3)  t1 := 4 * n
(4)  v := a[t1]
(5)  i := i + 1
(6)  t2 := 4 * i
(7)  t3 := a[t2]
(8)  if t3 < v goto (5)
(9)  j := j - 1
(10) t4 := 4 * j
(11) t5 := a[t4]
(12) if t5 > v goto (9)
(13) if i >= j goto (23)
(14) t6 := 4 * i
(15) x := a[t6]
(16) t7 := 4 * i
(17) t8 := 4 * j
(18) t9 := a[t8]
(19) a[t7] := t9
(20) t10 := 4 * j
(21) a[t10] := x
(22) goto (5)
(23) t11 := 4 * i
(24) x := a[t11]
(25) t12 := 4 * i
(26) t13 := 4 * n
(27) t14 := a[t13]
(28) a[t12] := t14
(29) t15 := 4 * n
(30) a[t15] := x

Fig. 4 Three-address code for the fragment in Fig. 2

Loop optimization is the most valuable machine-independent optimization because a program's


inner loops are good candidates for improvement. The important loop optimizations are
elimination of loop invariant computations and elimination of induction variables. A loop
invariant computation is one that computes the same value every time a loop is executed.
Therefore, moving such a computation outside the loop leads to a reduction in the execution
time. Induction variables are those variables used in a loop; their values are in lock-step, and
hence, it may be possible to eliminate all except one.

1 Eliminating Loop Invariant Computations

To eliminate loop invariant computations, we first identify the invariant computations and then move them outside the loop if the move does not lead to a change in the program's meaning. Identification of loop invariant computations requires the detection of loops in the program. Whether a loop exists in the program or not depends on the program's control flow, therefore requiring a control-flow analysis. For loop detection, a graphical representation, called a "program flow graph," shows how the control is flowing in the program and how the control is being used. To obtain such a graph, we must partition the intermediate code into basic blocks. This requires identifying leader statements, which are defined as follows:

1. The first statement is a leader statement.


2. The target of a conditional or unconditional goto is a leader.
3. A statement that immediately follows a conditional goto is a leader.

A basic block is a sequence of three-address statements that can be entered only at the
beginning, and control ends after the execution of the last statement, without a halt or any
possibility of branching, except at the end.

2 Algorithm to Partition Three-Address Code into Basic Blocks

To partition three-address code into basic blocks, we must identify the leader statements in the
three-address code and then include all the statements, starting from a leader, and up to, but not
including, the next leader. The basic blocks into which the three-address code is partitioned
constitute the nodes or vertices of the program flow graph. The edges in the flow graph are
decided as follows. If B1 and B2 are the two blocks, then add an edge from B1 to B2 in the
program flow graph, if the block B2 follows B1 in an execution sequence. The block B2 follows
B1 in an execution sequence if and only if:

1. The first statement of block B2 immediately follows the last statement of block B1 in the
three-address code, and the last statement of block B1 is not an unconditional goto
statement.
2. The last statement of block B1 is either a conditional or unconditional goto statement,
and the first statement of block B2 is the target of the last statement of block B1.

For example, consider the following program fragment:

int Fact(int x)
{
    int f = 1, i;
    for (i = 2; i <= x; i++)
        f = f * i;
    return f;
}

The three-address-code representation for the program fragment above is:

1. f = 1
2. i = 2
3. if i > x goto (8)
4. f = f * i
5. t1 = i + 1
6. i = t1
7. goto (3)
8. goto calling program

The leader statements are:

 Statement number 1, because it is the first statement.


 Statement number 3, because it is the target of a goto.
 Statement number 4, because it immediately follows a conditional goto
statement.
 Statement number 8, because it is a target of a conditional goto statement.

Therefore, the basic blocks into which the above code can be partitioned are as
follows, and the program flow graph is shown in Figure 1.

 Block B1: statements 1 and 2

 Block B2: statement 3

 Block B3: statements 4, 5, 6 and 7

 Block B4: statement 8
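To make the leader rules concrete, here is a small C sketch that marks leaders and prints the basic-block boundaries for a quadruple-style encoding of the code above; how statements and targets are encoded is an assumption for illustration.

#include <stdio.h>

#define N 8
/* kind[s]: 0 = ordinary statement, 1 = conditional goto, 2 = unconditional
   goto; target[s] = its jump target.  The data encodes the Fact example:
   (3) if i > x goto (8), (7) goto (3).                                 */
int kind  [N + 1] = {0, 0, 0, 1, 0, 0, 0, 2, 0};
int target[N + 1] = {0, 0, 0, 8, 0, 0, 0, 3, 0};

int main(void) {
    int leader[N + 1] = {0};
    leader[1] = 1;                          /* rule 1: first statement   */
    for (int s = 1; s <= N; s++)
        if (kind[s]) {
            leader[target[s]] = 1;          /* rule 2: target of a goto  */
            if (kind[s] == 1 && s + 1 <= N)
                leader[s + 1] = 1;          /* rule 3: after conditional */
        }
    int b = 0;
    for (int s = 1; s <= N; s++) {
        if (leader[s])
            printf("%sBlock B%d:", b ? "\n" : "", ++b);
        printf(" %d", s);
    }
    printf("\n");   /* prints B1: 1 2, B2: 3, B3: 4 5 6 7, B4: 8 */
    return 0;
}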

Figure 1: Program flow graph.
3 Loop Detection

A loop is a cycle in the flow graph that satisfies two properties:

1. It should have a single entry node or header, so that it will be possible to move all of the
loop invariant computations to a unique place, called a "preheader," which is a
block/node placed outside the loop, just in front of the header.
2. It should be strongly connected; that is, it should be possible to go from any node of the loop to any other node while staying within the loop. This is required so that at least some parts of the loop get executed repeatedly.

If the flow graph contains one or more back edges, then one or more loops/cycles exist in the program. Therefore, we must identify any back edges in the flow graph.

4 Identification of the Back Edges

To identify the back edges in the flow graph, we compute the dominators of every node of the
program flow graph. A node a is a dominator of node b if all the paths starting at the initial node
of the graph that reach to node b go through a. For example, consider the flow graph in Figure 2.
In this flow graph, the dominator of node 3 is only node 1, because all the paths reaching up to
node 3 from node 1 do not go through node 2.

Figure 2: The flow graph back edges are identified by computing the dominators.

Dominator (dom) relationships have the following properties:

1. They are reflexive; that is, every node dominates itself.

2. They are transitive; that is, if a dom b and b dom c, this implies a dom c.

5 Reducible Flow Graphs

Several code-optimization transformations are easy to perform on reducible flow graphs. A flow
graph G is reducible if and only if we can partition the edges into two disjointed groups, forward
edges and back edges, with the following two properties:

1. The forward edges form an acyclic graph in which every node can be reached from the initial node of G.
2. The back edges consist only of edges whose heads dominate their tails.

For example, consider the flow graph shown in Figure 3. This flow graph has no back edges,
because no edge's head dominates the tail of that edge. Hence, it could have been a reducible
graph if the entire graph had been acyclic. But that is not the case. Therefore, it is not a reducible
flow graph.

Figure 3: A flow graph with no back edges.

After identifying the back edges, if any, the natural loop of every back edge must be identified.
The natural loop of a back edge a → b is the set of all those nodes that can reach a without
going through b, including node b itself. Therefore, to find a natural loop of the back edge n → d, we start with node n and add all the predecessors of node n to the loop. Then we add the
predecessors of the nodes that were just added to the loop; and we continue this process until we
reach node d. These nodes plus node d constitute the set of all those nodes that can reach node n
without going through node d. This is the natural loop of the edge n → d. Therefore, the
algorithm for detecting the natural loop of a back edge is:

Input: back edge n → d.

Output: set loop, which is the set of nodes forming the natural loop of the back edge n → d.

main()
{
    loop = { d };    /* initialize by adding node d to the set loop */
    insert(n);       /* call the procedure insert with node n */
}
procedure insert(m)
{
    if m is not in loop then
    {
        loop = loop ∪ { m };
        for every predecessor p of m do
            insert(p);
    }
}
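A C sketch of this natural-loop computation, using bit masks for node sets, is shown below; the predecessor data is an assumption that mirrors the flow graph of the Fact example (Figure 1).

#include <stdio.h>

#define NNODES 4
/* pred[b] = bit mask of predecessors of node b.  Assumed data mirrors
   the Figure 1 flow graph: B1->B2, B2->B3, B3->B2, B2->B4, with the
   back edge B3 -> B2 (nodes numbered 0..3). */
unsigned pred[NNODES] = {
    0,                 /* B1 has no predecessor */
    (1u<<0)|(1u<<2),   /* B2: preds B1 and B3   */
    (1u<<1),           /* B3: pred B2           */
    (1u<<1)            /* B4: pred B2           */
};

unsigned loop;   /* the natural loop being built, as a bit mask */

void insert(int m) {
    if (!(loop & (1u << m))) {
        loop |= 1u << m;                    /* loop = loop U {m} */
        for (int p = 0; p < NNODES; p++)
            if (pred[m] & (1u << p))
                insert(p);                  /* add predecessors of m */
    }
}

int main(void) {
    int n = 2, d = 1;        /* back edge B3 -> B2 */
    loop = 1u << d;          /* initialize loop = {d} */
    insert(n);
    printf("natural loop of B%d -> B%d: ", n + 1, d + 1);
    for (int b = 0; b < NNODES; b++)
        if (loop & (1u << b)) printf("B%d ", b + 1);
    printf("\n");            /* expected: B2 B3 */
    return 0;
}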

For example, in the flow graph shown in Figure 1, the back edge is B3 → B2, and the loop is comprised of blocks B2 and B3.

After the natural loops of the back edges are identified, the next task is to identify the loop invariant computations. A three-address statement x = b op c that exists in a basic block B (a part of the loop) is a loop invariant statement if all possible definitions of b and c that reach up to this statement are outside the loop, or if b and c are constants, because then the calculation b op c will be the same each time the statement is encountered in the loop. Hence, to decide whether the statement x = b op c is loop invariant or not, we must compute the u−d chaining information. The u−d chaining information is computed by doing a global data flow analysis of the flow graph. All of the definitions that are capable of reaching a point immediately before the start of basic block B are computed, and we call the set of all such definitions IN(B). The set of all the definitions capable of reaching a point immediately after the last statement of block B is called OUT(B). We compute both IN(B) and OUT(B) for every block B from GEN(B) and KILL(B), which are defined as:

 GEN(B): The set of all the definitions generated in block B.
 KILL(B): The set of all the definitions outside block B that define the same variables as
are defined in block B.

Consider the flow graph in Figure 4. The GEN and KILL sets for the basic blocks are as shown in Table 1.

Table 1: GEN and KILL sets for Figure 4 Flow Graph


Block GEN KILL
B1 {1,2} {6,10,11}
B2 {3,4} {5,8}
B3 {5} {4,8}
B4 {6,7} {2,9,11}
B5 {8,9} {4,5,7}
B6 {10,11} {1,2,6}

Figure 4: Flow graph with GEN and KILL block sets.

IN(B) and OUT(B) are defined by the following set of equations, which are called "data flow equations":

IN(B) = ∪ OUT(P), the union taken over all predecessors P of B
OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
The next step, therefore, is to solve these equations. If there are n nodes, there will be 2n
equations in 2n unknowns. The solution to these equations is not generally unique. This is
because we may have a situation like that shown in Figure 5, where a block B is a predecessor of
itself.

Figure 5: Nonunique solution to a data flow equation, where B is a predecessor of itself.

If there is a solution to the data flow equations for block B, and if the solution is IN(B) = IN0 and OUT(B) = OUT0, then IN0 ∪ {d} and OUT0 ∪ {d}, where d is any definition not in IN0, OUT0 or KILL(B), also satisfy the equations. This is because, if we take OUT0 ∪ {d} as the value of OUT(B), then, since B is one of the predecessors of itself, according to IN(B) = ∪ OUT(P), d gets added to IN(B); and because d is not in KILL(B), we get IN(B) = IN0 ∪ {d}. And according to OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B), OUT(B) = OUT0 ∪ {d} gets satisfied. Therefore IN0, OUT0 is one solution, whereas IN0 ∪ {d}, OUT0 ∪ {d} is another solution to the equations; there is no unique solution. What we are interested in is finding the smallest solution, that is, the smallest IN(B) and OUT(B) for every block B, consisting of the values that are in all solutions. For example, since IN0 is in IN0 ∪ {d}, and OUT0 is in OUT0 ∪ {d}, IN0, OUT0 is the smallest solution. And this is what we want, because the smallest IN(B) turns out to be the set of all definitions reaching the point just before the beginning of B. The algorithm for computing the smallest IN(B) and OUT(B) is as follows:

1. for each block B do
   {
       IN(B) = φ
       OUT(B) = GEN(B)
   }
2. flag = true
3. while (flag) do
   {
       flag = false
       for each block B do
       {
           INnew(B) = φ
           for each predecessor P of B
               INnew(B) = INnew(B) ∪ OUT(P)
           if INnew(B) ≠ IN(B) then
           {
               flag = true
               IN(B) = INnew(B)
               OUT(B) = (IN(B) − KILL(B)) ∪ GEN(B)
           }
       }
   }
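A bit-mask C sketch of this iterative computation is shown below. The GEN and KILL sets come from Table 1; the predecessor edges are reconstructed from the first iteration in Table 3, since the figure itself is not reproduced here. Because OUT values are updated in place within a sweep, intermediate sweeps may differ from Tables 3–5, but the computation converges to the same final IN and OUT sets as Table 6.

#include <stdio.h>

#define NB 6                 /* blocks B1..B6, indexed 0..5            */
#define D(n) (1u << (n))     /* bit for definition number n (1..11)    */

/* GEN and KILL sets from Table 1. */
unsigned GEN [NB] = { D(1)|D(2), D(3)|D(4), D(5),
                      D(6)|D(7), D(8)|D(9), D(10)|D(11) };
unsigned KILL[NB] = { D(6)|D(10)|D(11), D(5)|D(8), D(4)|D(8),
                      D(2)|D(9)|D(11),  D(4)|D(5)|D(7), D(1)|D(2)|D(6) };
/* pred[b] = bit mask over block indices, reconstructed from Table 3:
   B2 <- B1,B4;  B3 <- B2,B5;  B4 <- B2,B3;  B5 <- B3;  B6 <- B4.     */
unsigned pred[NB] = { 0x00, 0x09, 0x12, 0x06, 0x04, 0x08 };

unsigned IN[NB], OUT[NB];

int main(void) {
    for (int b = 0; b < NB; b++) { IN[b] = 0; OUT[b] = GEN[b]; }
    int flag = 1;
    while (flag) {
        flag = 0;
        for (int b = 0; b < NB; b++) {
            unsigned in_new = 0;
            for (int p = 0; p < NB; p++)
                if (pred[b] & (1u << p))
                    in_new |= OUT[p];        /* union over predecessors */
            if (in_new != IN[b]) {
                flag = 1;
                IN[b] = in_new;
                OUT[b] = (IN[b] & ~KILL[b]) | GEN[b];
            }
        }
    }
    for (int b = 0; b < NB; b++) {
        printf("B%d: IN = {", b + 1);
        for (int d = 1; d <= 11; d++)
            if (IN[b] & D(d)) printf(" %d", d);
        printf(" }  OUT = {");
        for (int d = 1; d <= 11; d++)
            if (OUT[b] & D(d)) printf(" %d", d);
        printf(" }\n");
    }
    return 0;
}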

Initially, we take IN(B) to be the empty set for every block B and OUT(B) to be GEN(B), and we compute INnew(B). If it is different from IN(B), we compute a new OUT(B) and go for the next iteration. This is continued until IN(B) comes out to be the same for every B in a previous or current iteration.

For example, for the flow graph shown in Figure 4, the IN and OUT iterations for the blocks are computed using the above algorithm, as shown in Tables 2–6.

Table 2: IN and OUT Computation for the Figure 4 Flow Graph


Block IN OUT
B1 Φ {1,2}
B2 Φ {3,4}
B3 Φ {5}
B4 Φ {6,7}
B5 Φ {8,9}
B6 Φ {10,11}
Table 3: First Iteration for the IN and OUT Values
Block IN OUT
B1 Φ {1,2}
B2 {1,2,6,7} {1,2,3,4,6,7}
B3 {3,4,8,9} {3,5,9}
B4 {3,4,5} {3,4,5,6,7}
B5 {5} {8,9}
B6 {6,7} {7,10,11}

Table 4: Second Iteration for the IN and OUT Values
Block IN OUT
B1 Φ {1,2}
B2 {1,2,3,4,5,6,7} {1,2,3,4,6,7}
B3 {1,2,3,4,6,7,8,9} {1,2,3,5,6,7,9}
B4 {1,2,3,4,5,6,7,9} {1,3,4,5,6,7}
B5 {3,5,9} {3,8,9}
B6 {3,4,5,6,7} {3,4,5,7,10,11}
Table 5: Third Iteration for the IN and OUT Values
Block IN OUT
B1 Φ {1,2}
B2 {1,2,3,4,5,6,7} {1,2,3,4,6,7}
B3 {1,2,3,4,6,7,8,9} {1,2,3,5,6,7,9}
B4 {1,2,3,4,5,6,7,9} {1,3,4,5,6,7}
B5 {1,2,3,5,6,7,9} {1,2,3,6,8,9}
B6 {1,3,4,5,6,7} {1,3,4,5,7,10,11}
Table 6: Fourth Iteration for the IN and OUT Values
Block IN OUT
B1 Φ {1,2}
B2 {1,2,3,4,5,6,7} {1,2,3,4,6,7}
B3 {1,2,3,4,6,7,8,9} {1,2,3,5,6,7,9}
B4 {1,2,3,4,5,6,7,9} {1,3,4,5,6,7}
B5 {1,2,3,5,6,7,9} {1,2,3,6,8,9}
B6 {1,3,4,5,6,7} {1,3,4,5,7,10,11}

The next step is to compute the u−d chains from the reaching definitions information, as follows.

If the use of A in block B is preceded by its definition, then the u−d chain of A contains only the
last definition prior to this use of A. If the use of A in block B is not preceded by any definition
of A, then the u−d chain for this use consists of all definitions of A in IN(B).

For example, in the flow graph for which IN and OUT were computed in Tables 2–6, the use of
a in definition 4, block B2 is preceded by definition 3, which is the definition of a. Hence, the
u−d chain for this use of a only contains definition 3. But the use of b in B2 is not preceded by
any definition of b in B2. Therefore, the u−d chain for this use of b will be {1}, because this is the only definition of b in IN(B2).

The u−d chain information is used to identify the loop invariant computations. The next step is
to perform the code motion, which moves a loop invariant statement to a newly created node,
called "preheader," whose only successor is a header of the loop. All the predecessors of the
header that lie outside the loop will become predecessors of the preheader.

But sometimes the movement of a loop invariant statement to the preheader is not possible
because such a move would alter the semantics of the program. For example, if a loop invariant
statement exists in a basic block that is not a dominator of all the exits of the loop (where an exit
of the loop is the node whose successor is outside the loop), then moving the loop invariant
statement in the preheader may change the semantics of the program. Therefore, before moving
a loop invariant statement to the preheader, we must check whether the code motion is legal or
not. Consider the flow graph shown in Figure 6.

Figure 6: A flow graph containing a loop invariant statement.

In the flow graph shown in Figure 6, x = 2 is the loop invariant. But since it occurs in B3, which
is not the dominator of the exit of loop, if we move it to the preheader, as shown in Figure 7, a
value of two will always get assigned to y in B5; whereas in the original program, y in B5 may
get value one as well as two.

Figure 7: Moving a loop invariant statement changes the semantics of the program.
After Moving x = 2 to the Preheader

In the flow graph shown in Figure 7, if x is not used outside the loop, then the statement x = 2
can be moved to the preheader. Therefore, for a code motion to be legal, the following
conditions must be met, even if no errors are encountered:

1. The block in which a loop invariant statement occurs should be a dominator of all exits
of the loop, or the name assigned to the block should not be used outside the loop.
2. We cannot move a loop invariant statement assigned to A into preheader if there is
another statement in the loop that assigns to A. For example, consider the flow graph
shown in Figure 8.

Figure 8: Moving the preheader changes the meaning of the program.

Even though the statement x = 3 in B2 satisfies condition (1), moving it to the preheader
will change the meaning of the program. Because if x = 3 is moved to the preheader, then
the value that will be assigned to y in B5 will be two if the execution path is B1–B2–B3–
B4–B2–B4–B5. Whereas for the same execution path, the original program assigns a
three to y in B5.

3. The move is illegal if A is used in the loop, and A is reached by any definition of A other
than the statement to be moved. For example, consider the flow graph shown in Figure 9.

Figure 9: Moving a value to the preheader changes the original meaning of the program.

Even though x is not used outside the loop, the statement x = 2 in the block B2 cannot be moved
to the preheader, because the use of x in B4 is also reached by the definition x = 1 in B1.
Therefore, if we move x = 2 to the preheader, then the value that will get assigned to a in B4 will always be a two, which is not the case in the original program.

Conclusion: Thus we have studied code optimization for common sub-expression elimination,
loop invariant code movement.

Appendix B: ASCII Character Set

American Standard Code for Information Interchange (ASCII) is the character set that was most widely used in the early development of computers; it was proposed by ANSI in 1963. It uses a 7-bit word to represent a character, so that at most 2⁷ = 128 characters can be represented. An ASCII code is the numerical representation of a character such as 'x' or '$', or of an action commonly required in simple data interchange.
Later on, ASCII was extended to cater to the increasing demand to represent additional characters on the computer. This led to the Extended ASCII character set, which consists of 256 characters. Below is the original ASCII character set consisting of 128 characters.

Evaluation and marking system:

Basic honesty in the evaluation and marking system is absolutely essential, and an impartial evaluator is required for the examination system to earn the students' trust. It is a wrong approach to reward students, by way of easy marking they do not deserve, in order to gain cheap popularity among them. It is a primary responsibility of the teacher that the right students, who are really putting in a lot of hard work with the right kind of intelligence, are correctly rewarded.

The marking pattern should be justifiable to the students without any ambiguity, and the teacher should see that students are not faced with unjust circumstances.

The assessment is done according to the directives of the Principal/ Vice-Principal/ Dean Academics.
