Compiler Construction-II Project “C2ASM” (Cross Compiler)
COURSE TITLE: COMPILER CONSTRUCTION-II (BSCS-603)
COURSE SUPERVISORS:
Sir Tafseer Ahmed Khan
Madam Sadaf Alvi
Thanks,
M Owais Khan Afridi,
C2ASM Programmer.
We need to make transition diagrams for identifiers and keywords, as well as for the other tokens.
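FA For Identifiers And Keywords :- [diagram]
FA For ‘=’, ‘==’ :- [diagram]
FA For ‘<’, ‘<=’, ‘<>’ :- [diagram]
FA For ‘>’, ‘>=’ :- [diagram]
FA For Arithmetic Operators ‘+’, ‘-’, ‘*’, ‘/’, ‘%’ :- [diagram]
FA For Punctuations ‘(’, ‘)’, ‘{’, ‘}’, ‘,’, ‘;’ :- [diagram]
FA For Numbers :- [diagram: digits (d), optionally followed by the suffix L or l; any other character ends the token (*)]
FA For Errors :- [diagram]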
Keywords: void, main, int, long, if, else, for, while, do, return
Punctuations: ( ) { } , ;
Numbers: 123, 231, 0, etc.
Identifiers: variable names, function names
OPERATORS
Relational: ==, <>, >, >=, <, <=
Arithmetic: +, -, /, *, %
Assignment: =
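The FA diagrams for ‘=’, ‘<’ and ‘>’ amount to a longest-match rule: after the first character, the scanner peeks at the next one and takes the “Other” edge (marked *, i.e. the extra character is not consumed) if it does not extend the operator. A minimal C++ sketch, assuming the source is held in a std::string; scan_operator and the Token struct are hypothetical names, and the class names follow the Operators section of this document:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical token pair in the document's (class, value) format.
struct Token {
    std::string cls;    // token class, e.g. "relop"
    std::string value;  // lexeme, e.g. "<="
};

// Scans one operator starting at src[pos] using longest match, as the
// transition diagrams for '=', '<' and '>' do. On success pos is advanced
// past the lexeme; on failure an "error" token is returned and pos is unchanged.
Token scan_operator(const std::string& src, std::size_t& pos) {
    assert(pos < src.size());
    char c = src[pos];
    char next = (pos + 1 < src.size()) ? src[pos + 1] : '\0';
    switch (c) {
        case '=':
            if (next == '=') { pos += 2; return {"relop", "=="}; }
            pos += 1; return {"assignop", "="};  // 'Other' edge: next char not consumed
        case '<':
            if (next == '=') { pos += 2; return {"relop", "<="}; }
            if (next == '>') { pos += 2; return {"relop", "<>"}; }
            pos += 1; return {"relop", "<"};
        case '>':
            if (next == '=') { pos += 2; return {"relop", ">="}; }
            pos += 1; return {"relop", ">"};
        case '+': case '-':
            pos += 1; return {"add_sub", std::string(1, c)};
        case '*': case '/': case '%':
            pos += 1; return {"mul_div_mod", std::string(1, c)};
        default:
            return {"error", std::string(1, c)};
    }
}
```

For example, scanning "<=x" from position 0 yields (relop, <=) and leaves pos at 2, so the next scan starts at ‘x’.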
Format:
(class, value), where --- means the token carries no value
Keywords:
(void, ---)
(main, ---)
(dt, int/long)
(if, ---)
(else, ---)
(for, ---)
(while, ---)
(do, ---)
(return, ---)
Identifiers:
(Id, _1/_fld/… etc.)
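Since keywords and identifiers share one transition diagram, a common approach is to accept the whole word first and then look it up in a keyword table to decide its (class, value) pair. A sketch in C++, where classify_word is a hypothetical helper and "---" stands for “no value” as in the list above:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// (class, value) pair as in the token format above.
using TokenPair = std::pair<std::string, std::string>;

// Classifies a word already accepted by the identifier/keyword diagram.
// Keywords map to their own class with value "---"; int and long share
// the class "dt"; anything else is an identifier carrying its name.
TokenPair classify_word(const std::string& lexeme) {
    static const std::map<std::string, std::string> keywords = {
        {"void", "void"}, {"main", "main"}, {"if", "if"}, {"else", "else"},
        {"for", "for"}, {"while", "while"}, {"do", "do"}, {"return", "return"},
        {"int", "dt"}, {"long", "dt"}};
    assert(!lexeme.empty());
    auto it = keywords.find(lexeme);
    if (it == keywords.end()) return {"Id", lexeme};
    if (it->second == "dt") return {"dt", lexeme};  // value is int or long
    return {it->second, "---"};
}
```

A table lookup keeps the FA small: adding a keyword only means adding a map entry, not new states.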
Punctuations:
braces_open={
braces_close=}
paranthesis_open=(
paranthesis_close=)
comma= ,
semicolon= ;
square_open= [
square_close= ]
Operators:
relop = ‘==’, ‘<>’, ‘>=’, ‘<=’, ‘>’, ‘<’
assignop = ‘=’
add_sub = ‘+’, ‘-’
mul_div_mod = ‘*’, ‘/’, ‘%’
Numbers:
int_const = 0,1,2,232,2323,…….etc
long_const = 1L,2L,232l,2312123L,…….etc
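The FA for numbers can be sketched the same way: consume digits, then check for an L or l suffix to decide between int_const and long_const. scan_number and NumToken are hypothetical names, and the sketch does not check for overflow:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

struct NumToken {
    std::string cls;     // "int_const" or "long_const"
    std::string digits;  // lexeme without the L/l suffix
};

// Scans digits starting at src[pos] (which must be a digit), then an
// optional 'L' or 'l' suffix. Advances pos past everything consumed.
NumToken scan_number(const std::string& src, std::size_t& pos) {
    assert(pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])));
    std::size_t start = pos;
    while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos]))) ++pos;
    std::string digits = src.substr(start, pos - start);
    if (pos < src.size() && (src[pos] == 'L' || src[pos] == 'l')) {
        ++pos;  // consume the suffix
        return {"long_const", digits};
    }
    return {"int_const", digits};  // 'Other' edge: the next char is not consumed
}
```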
GRAMMAR:
Features
The grammar for the C-Language subset has the following notable features.
1. Multiple Global Variable Declarations
2. Multiple Global Function Declarations
3. Main program
4. Variable declarations at the start of the program, as in the C language
5. for, while and do-while loops
6. Nested loops
7. Function calls, which differ from original C function calls: a call is enclosed in square brackets, e.g. [f(a, b)]
8. The ‘return’ keyword
9. Function arguments and the right-hand side of an assignment operator can be function calls
10. Recursion is allowed.
Productions
The grammar for the language has been split into three sections for easier understanding: the declaration productions (global data, functions and their arguments), the Statement Productions and the Expression Productions.
3. <data-or-function> → , id <data-or-function>
Selection-Set = { ‘,’ }
4. <data-or-function> → ;
Selection-Set = { ‘;’ }
5. <data-or-function> → ( <argument-list> ) <function-body>
Selection-Set = { ‘(’ }
10. <variable-declarations> → ε
Selection-Set = { id, ‘{’, while, for, do, return, if, ‘[’, ‘}’ }
12. <variable-list> → ε
Selection-Set = { ‘;’ }
13. <argument-list> → void
Selection-Set = { void }
16. <argument> → ε
Selection-Set = { ‘)’ }
Statement Productions:
19. <statements> → ε
Selection-Set = { ‘}’ }
29. <optional-else> → ε
Selection-Set = { id, while, ‘{’, for, do, return, if, ‘}’, ‘[’ }
31. <right-hand-side> → [ <function-call> ]
Selection-Set = { ‘[’ }
32. <function-call> → id ( <optional-expression-list> )
Selection-Set = { id }
33. <optional-expression-list> → ε
Selection-Set = { ‘)’ }
34. <optional-expression-list> → <expression-list-element> <expression-listf>
Selection-Set = { ‘(’, id, int_const, long_const, ‘[’ }
36. <expression-listf> → ε
Selection-Set = { ‘)’ }
37. <expression-list-element> → <expression>
Selection-Set = { ‘(’, id, int_const, long_const }
Expression Productions:
40. <relational> → ε
Selection-Set = { ‘)’, ‘;’, ‘,’ }
43. <Subtract> → ε
Selection-Set = { relop, ‘)’, ‘;’, ‘,’ }
45. <Add> → + <U> <Add>
Selection-Set = { ‘+’ }
46. <Add> → ε
Selection-Set = { ‘-’, relop, ‘)’, ‘;’, ‘,’ }
48. <Multiply> → * <V> <Multiply>
Selection-Set = { ‘*’ }
49. <Multiply> → ε
Selection-Set = { ‘+’, ‘-’, relop, ‘)’, ‘;’, ‘,’ }
51. <divide> → / <W> <divide>
Selection-Set = { ‘/’ }
52. <divide> → ε
Selection-Set = { ‘*’, ‘+’, ‘-’, relop, ‘)’, ‘;’, ‘,’ }
55. <mod> → ε
Selection-Set = { ‘/’, ‘*’, ‘+’, ‘-’, relop, ‘)’, ‘;’, ‘,’ }
56. <X> → ( <expression> )
Selection-Set = { ‘(’ }
57. <X> → id
Selection-Set = { id }
58. <X> → int_const
Selection-Set = { int_const }
59. <X> → long_const
Selection-Set = { long_const }
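Because each selection set names exactly the tokens that select a production, the grammar can be parsed by recursive descent with one token of lookahead. Several productions in this range are not listed in this section, so the sketch below assumes the pattern the selection sets suggest: <arithmetic> → <T><Subtract>, <Subtract> → - <T><Subtract> | ε, <T> → <U><Add>, and likewise down to <W> → <X><mod>. The hypothetical Parser class evaluates integer expressions written without spaces and omits identifiers and relational operators for brevity:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Recursive-descent evaluator for the arithmetic part of the grammar.
// ASSUMED productions (not all listed in the text):
//   <arithmetic> -> <T><Subtract>   <Subtract> -> - <T><Subtract> | e
//   <T> -> <U><Add>                 <Add>      -> + <U><Add>      | e
//   <U> -> <V><Multiply>            <Multiply> -> * <V><Multiply> | e
//   <V> -> <W><divide>              <divide>   -> / <W><divide>   | e
//   <W> -> <X><mod>                 <mod>      -> % <X><mod>      | e
// Each tail function receives the inherited value p and returns the synthesized q.
class Parser {
public:
    explicit Parser(const std::string& s) : src_(s), pos_(0) {}
    long parse() {
        long v = arithmetic();
        if (peek() != '\0') throw std::runtime_error("unexpected input");
        return v;
    }

private:
    std::string src_;
    std::size_t pos_;

    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }
    void expect(char c) {
        if (peek() != c) throw std::runtime_error(std::string("expected ") + c);
        ++pos_;
    }

    long arithmetic() { return subtract(T()); }
    long T() { return add(U()); }
    long U() { return multiply(V()); }
    long V() { return divide(W()); }
    long W() { return mod(X()); }

    // Take the operator production when the lookahead is its operator;
    // otherwise take the e-production (lookahead is in its selection set).
    long subtract(long p) { if (peek() == '-') { ++pos_; return subtract(p - T()); } return p; }
    long add(long p)      { if (peek() == '+') { ++pos_; return add(p + U()); } return p; }
    long multiply(long p) { if (peek() == '*') { ++pos_; return multiply(p * V()); } return p; }
    long divide(long p)   { if (peek() == '/') { ++pos_; return divide(p / W()); } return p; }
    long mod(long p)      { if (peek() == '%') { ++pos_; return mod(p % X()); } return p; }

    // <X> -> ( <arithmetic> ) | int_const   (id and long_const omitted here)
    long X() {
        if (peek() == '(') { ++pos_; long v = arithmetic(); expect(')'); return v; }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected ( or int_const");
        long v = 0;
        while (std::isdigit(static_cast<unsigned char>(peek()))) v = v * 10 + (src_[pos_++] - '0');
        return v;
    }
};
```

Under these assumed productions each operator gets its own precedence level with ‘-’ binding loosest, which is consistent with ‘-’ appearing in the selection set of <Add> → ε; as a result 10-2+3 groups as 10-(2+3) and evaluates to 5.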
Convention:
• Action symbols written in small letters are used for type checking; no INTERMEDIATE CODE is generated for them.
• Action symbols written in CAPITAL LETTERS denote ATOMS, meaning INTERMEDIATE CODE is generated for them.
• Symbols in bold italics belong to the TOKEN SET.
4. <data-or-function>_k → ;
10. <variable-declarations> → ε
12. <variable-list>_t → ε
13. <argument-list> → void {Set the parameter info of this particular function to void}
{Do function binding} {Register the function's start, set func_index}
16. <argument> → ε
19. <statements> → ε
29. <optional-else> → ε
31. <right-hand-side>_r → [ <function-call>_f ]
{Find the function's return value and assign its reference to the right-hand side, i.e. r}
r ← f
33. <optional-expression-list> → ε
36. <expression-listf> → ε
CONVENTIONS:
* <Subtract/Add/Multiply/Divide/Mod>_p,q
p = inherited attribute, q = synthesized attribute
* <arithmetic/T/U/V/W/X>_p
p = synthesized attribute
* <relational>_k,t1
k, t1 = synthesized attributes
56. <X>_p → ( <expression>_p )
58. <X>_p → int_const_i
p ← i
59. <X>_p → long_const_i
p ← i
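The inherited/synthesized convention can be illustrated for <Add>: the left operand computed so far flows down as p, each ‘+’ emits an ADD atom into a fresh temporary, and the final temporary flows back up as q. This sketch assumes simplified productions (<U> reduced to a single name token), renders atoms as text rather than Expr structures, and uses a hypothetical Translator struct whose newtemp is only a stand-in for the compiler's own:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of attribute passing for (simplified, assumed productions):
//   <T>_p        -> <X>_r <Add>_{r,p}
//   <Add>_{p,q}  -> + <X>_r {ADD}_{p,r,s} <Add>_{s,q}
//   <Add>_{p,q}  -> e   {q <- p}
struct Translator {
    std::vector<std::string> tokens;  // e.g. {"a", "+", "b"}
    std::size_t pos = 0;
    int temp_count = 0;
    std::vector<std::string> atoms;   // emitted atoms, rendered as text

    // Hypothetical stand-in for the compiler's newtemp().
    std::string newtemp() { return "t" + std::to_string(++temp_count); }

    // <X>_p -> id : the synthesized attribute is the name itself.
    std::string X() { assert(pos < tokens.size()); return tokens[pos++]; }

    // p is inherited (left operand so far); the return value is q (synthesized).
    std::string Add(const std::string& p) {
        if (pos < tokens.size() && tokens[pos] == "+") {
            ++pos;
            std::string r = X();
            std::string s = newtemp();
            atoms.push_back("(ADD, " + p + ", " + r + ", " + s + ")");
            return Add(s);
        }
        return p;  // e-production: q <- p
    }

    std::string T() { return Add(X()); }
};
```

For a + b + c this emits (ADD, a, b, t1) followed by (ADD, t1, c, t2), and t2 is the synthesized result.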
• Name:
It’s a String Class Object
• Datatype
0=Int
1=Long
22=Void
• Scope
0=Global Scope
1, 2, 3, …, n = ScopeStack( ).Top
where “ScopeStack( ).Top” is a method that gives the CURRENT SCOPE.
• Binding
-2 = identifier bound globally
-1 = identifier bound to main()
Func_indx = index of the variable (which must be a function variable) to which a local/temporary variable is bound
• Function_or_not
Function_or_not=0, If Identifier is not a function
Function_or_not=1, If Identifier is a Function
• Type
0=Global Identifier
1=Main Identifier
2=Local Identifier
3=Parameter Identifier
4=Temporary Identifier
• Offset
-10 = Undefined
2n, where n = 1, 2, 3, 4, …, n
2n must be calculated by a programmer-defined function for the locals and parameters of functions.
• Xtra
-10 = Undefined
2n, where n = 1, 2, 3, 4, …, n
2n must be calculated by a programmer-defined function for the TOTAL SUM of all PARAMETERS.
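These fields map directly onto a C++ record. The struct below is a hypothetical illustration, not the compiler's own declaration: member names mirror the field names above, the defaults describe a global identifier and follow the stated conventions (-10 for undefined, -2 for a global binding), and word_offset shows the 2n rule for the n-th slot.

```cpp
#include <cassert>
#include <string>

constexpr int kUndefined = -10;  // "undefined" marker used by Offset and Xtra

// Values of the Type field.
enum IdentType { GLOBAL_ID = 0, MAIN_ID = 1, LOCAL_ID = 2, PARAM_ID = 3, TEMP_ID = 4 };

// Hypothetical layout of one identifier entry.
struct SymbolEntry {
    std::string name;
    int datatype = 0;          // 0 = int, 1 = long, 22 = void
    int scope = 0;             // 0 = global, otherwise ScopeStack().Top
    long binding = -2;         // -2 = global, -1 = main(), else Func_indx
    int function_or_not = 0;   // 1 if the identifier is a function
    int type = GLOBAL_ID;      // one of IdentType
    int offset = kUndefined;   // 2n for locals and parameters
    int xtra = kUndefined;     // 2n: total sum over all parameters (functions)
};

// Offset of the n-th slot under the 2n rule (n = 1, 2, 3, ...).
int word_offset(int n) {
    assert(n >= 1);
    return 2 * n;
}
```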
2) Symbol table for Int/Long NUMBERS.
3) Symbol table for Labels.
The specific instance in our code is labels, which is an STL (C++) vector of STRINGs.
• op
op is the OP-code, as in ASSEMBLY; these are the ATOMS generated by the Syntax Box. In our case it can take the values LABEL, ASSIGN, CONDJUMP, JUMP, RETURN, PARAM, CALL, JUMPF, CMP, ADD, SUB, MUL, DIV, MOD and PROC_MARKER.
• Type
The internal INTEGER representation of each ATOM; for example, LABEL has type 25, CALL has type 31, etc.
• Expr
A structure designed to keep ATOM sets small and free of REDUNDANT information already present in other tables. It has the following structure:
Field      Index   Datatype   Whichtable
Datatype   int     int        int
- Index
It points to the ORIGINAL position of the entry in another table: an INDEX into syn_identifier or args_identifier (or, for constants and labels, into number_long, number_int or labels).
- Datatype
It takes one of the values:
0=int
1=long
22=void
23=int_const
24=long_const
- Whichtable
As the TABLE formats in our compiler show, there are 5 tables on which the whole Syntax Box and Code Generator run. To facilitate programming, each table is ASSIGNED an INTEGER number, as follows.
0=syn_identifier
1=args_identifier
2=number_long
3=number_int
4=labels
Note: Since Arg1, Arg2 and Result are all Expr-type structures, they are all defined in terms of Expr's fields. Fields marked with an X are NOT USED. We've used “-10” for everything that is UNDEFINED or has no relevant meaning in that context.
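The Expr structure and an atom built from it might be declared as below. This is a sketch under assumptions: the Atom struct, the kUndefined constant and make_label_atom are hypothetical names. The helper fills in a LABEL atom according to its specification, where only Result is used and points into table 4 (labels).

```cpp
#include <cassert>
#include <string>

constexpr int kUndefined = -10;  // "undefined / not used" marker

// Expr: where an operand lives, without duplicating table contents.
struct Expr {
    int index = kUndefined;       // position in the table named by whichtable
    int datatype = kUndefined;    // 0 int, 1 long, 22 void, 23 int_const, 24 long_const
    int whichtable = kUndefined;  // 0 syn_identifier, 1 args_identifier,
                                  // 2 number_long, 3 number_int, 4 labels
};

// One atom: opcode name, its integer type code and three Expr operands.
struct Atom {
    std::string op;  // e.g. "LABEL"
    int type;        // e.g. 25 for LABEL
    Expr arg1, arg2, result;
};

// Hypothetical helper: a LABEL atom uses only Result, pointing into labels.
Atom make_label_atom(int label_index) {
    assert(label_index >= 0);
    Atom a{"LABEL", 25, Expr{}, Expr{}, Expr{}};
    a.result.index = label_index;
    a.result.whichtable = 4;  // labels table
    return a;                 // Result.Datatype and all of Arg1/Arg2 stay -10
}
```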
LABEL:
It outputs a label in the code.
Result.Index = pointer to the particular entry in the “labels” symbol table
Result.Datatype = -10
Result.Whichtable = table number, here 4
ASSIGN:
It performs an assignment such as a=b or a=v+f in the program.
Arg1 = R.H.S.
Arg1.Index = points to the particular entry in the “syn_identifier”, “args_identifier”, “number_long” or “number_int” symbol table
Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Result = L.H.S.
Result.Index = points to the particular entry in the “syn_identifier” or “args_identifier” symbol table
Result.Datatype = 0 for int, 1 for long
Result.Whichtable = 0 (syn_identifier), 1 (args_identifier)
CONDJUMP:
Arg1:
Arg1.Index = pointer to the particular entry in the “syn_identifier” symbol table
Arg1.Datatype = 0 for int, 1 for long
Arg1.Whichtable = 0 (syn_identifier)
Arg2:
Arg2.Index = -10
Arg2.Datatype = 0 or 1; this value is compared with Arg1
Arg2.Whichtable = -10
Result:
Result.Index = pointer to the particular entry in the “labels” symbol table
Result.Datatype = -10
Result.Whichtable = table number, here 4
JUMP:
Result:
Result.Index = pointer to the particular entry in the “labels” symbol table
Result.Datatype = -10
Result.Whichtable = table number, here 4
RETURN:
Arg1:
Arg1.Index = pointer to the particular entry in the “syn_identifier” table
Arg1.Datatype = 0 or 1 or 22
Arg1.Whichtable = table number, here 0
Result:
Result.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_long” or “number_int” symbol table
Result.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Result.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
PARAM:
Result:
Result.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_long” or “number_int” symbol table
Result.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Result.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
CALL:
Arg1:
Arg1.Index = number of arguments expected
Arg1.Datatype = -10
Arg1.Whichtable = -10
Result:
Result.Index = pointer to the particular entry in the “syn_identifier” symbol table
Result.Datatype = 0 for int, 1 for long, 22 for void (the function's return type)
Result.Whichtable = 0 (syn_identifier)
JUMPF:
Arg1:
Arg1.Index = pointer to the particular entry in the “syn_identifier” symbol table
Arg1.Datatype = 0 for int, 1 for long
Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier)
Arg2:
Arg2.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_int” or “number_long” symbol table
Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Result:
Result.Index = pointer to the particular entry in the “labels” symbol table
Result.Datatype = -10
Result.Whichtable = 4 (labels)
CMP:
Arg1:
Arg1.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_int” or “number_long” symbol table
Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Arg2:
Arg2.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_int” or “number_long” symbol table
Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Result:
Result.Index = pointer to the particular entry in the “syn_identifier” symbol table
Result.Datatype = 0 for int, 1 for long
Result.Whichtable = 0 (syn_identifier)
ADD/SUB/MUL/DIV/MOD:
Arg1:
Arg1.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_int” or “number_long” symbol table
Arg1.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg1.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Arg2:
Arg2.Index = pointer to the particular entry in the “syn_identifier”, “args_identifier”, “number_int” or “number_long” symbol table
Arg2.Datatype = 0 for int, 1 for long, 23 for int_const, 24 for long_const
Arg2.Whichtable = 0 (syn_identifier), 1 (args_identifier), 2 (number_long), 3 (number_int)
Result:
Result.Index = pointer to the particular entry in the “syn_identifier” symbol table
Result.Datatype = 0 for int, 1 for long
Result.Whichtable = 0 (syn_identifier)
PROC_MARKER:
Arg1:
Arg1.Index = -1 for main(), or the function's index for other functions
Arg1.Datatype = 1 for start, 0 for end
Arg1.Whichtable = the TOTAL LENGTH of the OFFSETS of the particular function's local variables and temporaries
Action Symbols:
There are many action symbols used in this compiler. Many of them are used for type checking and for preparing other information that the code generator uses. We've implemented them as helper functions with the following declarations; their names indicate their respective functionality.
//Helper Functions
void settype(int index,int type);
void settype_args(int index,int type);
expr newtemp(int type);
string newtempname(void);
long chk_ident(long index);
bool chk_types(expr v1,expr v2);
void setatom(int op,expr arg1,expr arg2,expr result);
int newlabel(void);
long chk_func(long index);
void args_info(long index,long &init_arg,long &fin_arg);
int calc_param_offset();
int calc_local_offset();
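The declarations give only signatures. As an illustration, newlabel and newtempname might be implemented with simple counters, assuming that labels is the STL vector of strings described earlier and that names take the forms L1, L2 and so on for labels and T1, T2 and so on for temporaries; both naming schemes are assumptions, while the return types follow the declarations above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// ASSUMPTION: global state similar to the compiler's "labels" table.
std::vector<std::string> labels;
int temp_counter = 0;

// Registers a fresh label name and returns its index in "labels";
// that index is what an atom's Result.Index would hold.
int newlabel(void) {
    labels.push_back("L" + std::to_string(labels.size() + 1));
    return static_cast<int>(labels.size()) - 1;
}

// Returns a fresh, unique name for a temporary identifier.
std::string newtempname(void) {
    return "T" + std::to_string(++temp_counter);
}
```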
CODE GENERATOR:
The code generator takes the ATOM STREAM as input and produces ASSEMBLY CODE. We've coded a function for every ATOM, so when a particular atom is seen, its corresponding function is called to generate the assembly code for it. The output can be TESTED on an assembler.
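A sketch of this dispatch: one generator function per atom, selected by the atom's op. The target instructions (16-bit x86-style MOV, ADD and JMP through AX), the simplified atom with textual operands, and all function names are assumptions for illustration; the document does not state which assembler the output targets, and only four atom kinds are covered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified atom with textual operands (an assumption; the real atoms
// carry Expr fields that index into the symbol tables).
struct SimpleAtom {
    std::string op, arg1, arg2, result;
};

// One generator function per atom kind; each appends assembly lines.
void gen_label(const SimpleAtom& a, std::vector<std::string>& out) {
    out.push_back(a.result + ":");
}
void gen_assign(const SimpleAtom& a, std::vector<std::string>& out) {
    out.push_back("MOV AX, " + a.arg1);  // memory-to-memory MOV is not allowed,
    out.push_back("MOV " + a.result + ", AX");  // so go through AX
}
void gen_add(const SimpleAtom& a, std::vector<std::string>& out) {
    out.push_back("MOV AX, " + a.arg1);
    out.push_back("ADD AX, " + a.arg2);
    out.push_back("MOV " + a.result + ", AX");
}
void gen_jump(const SimpleAtom& a, std::vector<std::string>& out) {
    out.push_back("JMP " + a.result);
}

// Walks the atom stream and calls the generator matching each op.
std::vector<std::string> generate(const std::vector<SimpleAtom>& atoms) {
    std::vector<std::string> out;
    for (const SimpleAtom& a : atoms) {
        if (a.op == "LABEL") gen_label(a, out);
        else if (a.op == "ASSIGN") gen_assign(a, out);
        else if (a.op == "ADD") gen_add(a, out);
        else if (a.op == "JUMP") gen_jump(a, out);
        else assert(false && "atom kind not covered by this sketch");
    }
    return out;
}
```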
MY CODING CONVENTION:
I've mapped all keywords, punctuations, operators, tokens and atoms (intermediate code) to INTEGER NUMBERS to ease programming, as listed below. I've used these numeric counterparts throughout my implementation, since numbers are easier and more efficient to handle than strings.
"int" = 0
"long" = 1
"{" = 2
"}" = 3
"(" = 4
")" = 5
"," = 6
";" = 7
"==" = 8
"<>" = 9
">=" = 10
"<=" = 11
">" = 12
"<" = 13
"=" = 14
"+" = 15
"-" = 16
"*" = 17
"/" = 18
"%" = 19
"[" = 20
"]" = 21
"void" = 22
"int_const" = 23
"long_const" = 24
"label" = 25
"assign" = 26
"condjump" = 27
"jump" = 28
"return" = 29
"param" = 30
"call" = 31
"jumpf" = 32
"cmp" = 100
"proc_marker" = 34
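In C++ this mapping fits naturally in an enumeration plus a lookup table, so the rest of the compiler compares integers rather than strings. A sketch with hypothetical enumerator names (TOK_ for tokens, ATOM_ for atoms); the values are copied from the list above, including "cmp" = 100:

```cpp
#include <cassert>
#include <map>
#include <string>

// Integer codes from the coding convention (enumerator names are hypothetical).
enum Code {
    TOK_INT = 0, TOK_LONG = 1, TOK_BRACE_OPEN = 2, TOK_BRACE_CLOSE = 3,
    TOK_PAREN_OPEN = 4, TOK_PAREN_CLOSE = 5, TOK_COMMA = 6, TOK_SEMICOLON = 7,
    TOK_EQ = 8, TOK_NE = 9, TOK_GE = 10, TOK_LE = 11, TOK_GT = 12, TOK_LT = 13,
    TOK_ASSIGN = 14, TOK_PLUS = 15, TOK_MINUS = 16, TOK_STAR = 17, TOK_SLASH = 18,
    TOK_PERCENT = 19, TOK_SQUARE_OPEN = 20, TOK_SQUARE_CLOSE = 21, TOK_VOID = 22,
    TOK_INT_CONST = 23, TOK_LONG_CONST = 24,
    ATOM_LABEL = 25, ATOM_ASSIGN = 26, ATOM_CONDJUMP = 27, ATOM_JUMP = 28,
    ATOM_RETURN = 29, ATOM_PARAM = 30, ATOM_CALL = 31, ATOM_JUMPF = 32,
    ATOM_CMP = 100, ATOM_PROC_MARKER = 34
};

// Returns the code for a lexeme or atom name, or -1 if it is unknown.
int code_of(const std::string& s) {
    static const std::map<std::string, int> table = {
        {"int", TOK_INT}, {"long", TOK_LONG}, {"{", TOK_BRACE_OPEN}, {"}", TOK_BRACE_CLOSE},
        {"(", TOK_PAREN_OPEN}, {")", TOK_PAREN_CLOSE}, {",", TOK_COMMA}, {";", TOK_SEMICOLON},
        {"==", TOK_EQ}, {"<>", TOK_NE}, {">=", TOK_GE}, {"<=", TOK_LE}, {">", TOK_GT},
        {"<", TOK_LT}, {"=", TOK_ASSIGN}, {"+", TOK_PLUS}, {"-", TOK_MINUS}, {"*", TOK_STAR},
        {"/", TOK_SLASH}, {"%", TOK_PERCENT}, {"[", TOK_SQUARE_OPEN}, {"]", TOK_SQUARE_CLOSE},
        {"void", TOK_VOID}, {"int_const", TOK_INT_CONST}, {"long_const", TOK_LONG_CONST},
        {"label", ATOM_LABEL}, {"assign", ATOM_ASSIGN}, {"condjump", ATOM_CONDJUMP},
        {"jump", ATOM_JUMP}, {"return", ATOM_RETURN}, {"param", ATOM_PARAM},
        {"call", ATOM_CALL}, {"jumpf", ATOM_JUMPF}, {"cmp", ATOM_CMP},
        {"proc_marker", ATOM_PROC_MARKER}};
    auto it = table.find(s);
    return it == table.end() ? -1 : it->second;
}
```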