Design Assembler Based On Lex and Yacc: Amera Ismail Melhum, Suzan Abdulla Mahmood
Design Assembler Based On Lex and Yacc: Amera Ismail Melhum, Suzan Abdulla Mahmood
Design Assembler Based On Lex and Yacc: Amera Ismail Melhum, Suzan Abdulla Mahmood
Abstract
LEX and YACC are very useful tools for constructing the assemblers; they generate
functions that perform standard operations of a lexical analysis and parsing without
any effecting on the organization processes for semantic analysis, machine code
generation and listing source of compilation. In this work, the cross assembler was
designed using some compiler development tools like LEX and YACC having the
ability to implement the lexical analyzer and parser, for generating the syntax and
parsing modules in a short period of time. These activates are expressed in form of
actions, the bigger number of lexical and grammar rules are used by these actions in
a simpler way.
Keywords: Assembly language, Assembler, Context Free Grammar, LEX, YACC
اﻟﺧﻼﺻﺔ
اذ ﺗوﻟــد وظــﺎﺋف ﻟﺗﻧﻔﯾــذ اﻟﻌﻣﻠﯾــﺎت اﻟﻘﯾﺎﺳــﯾﺔ. ذات ﻓﺎﺋـدة ﻛﺑﯾـرة ﻟﺑﻧــﺎء اﻟﻣﺟﻣﻌــﺎتYACC وLEX ﺗﻌــد اﺳــﻠوﺑﻲ
ﻓـﻲ ﻫـذا اﻟﺑﺣـث ﺗﻣـت ﺗـﺻﻣﯾم.وذﻟك ﻋن طرﯾق ﺗﺣﻠﯾل اﻟﻣﻔردات ﺑدون ﺗﺄﺛﯾر ﻋﻠﻰ ﺗﻧظﯾم ﻋﻣﻠﯾـﺎت اﻟﺗﺣﻠﯾـل اﻟـدﻻﻟﻲ
ذﻟـك ﻟﻘـﺎﺑﻠﯾﺗﻬم ﻷﻧﺟـﺎز ﺗﺣﻠﯾـلYACC وLEX اﻟﻣﺟﻣـﻊ اﻟـﺻﻠب ﺑﺎﺳـﺗﺧدام ﺑﻌـض ادوات اﻟﺗطـوﯾر اﻟﻣﺗـرﺟم ﻣﺛـل
وC++ اﻟﻣﻔــردات و ﺗﻌرﯾـب اﻟﻛﻠﻣــﺔ و اﻧــﺷﺎء ﺗرﻛﯾــب ﻓــﻲ ﻓﺗـرة زﻣﻧﯾــﺔ ﻗﻠﯾﻠــﺔ ﻣﻘﺎرﻧﺗﻬــﺎ ﻣــﻊ اﻟﻠﻐــﺎت اﻟﺑرﻣﺟﯾــﺔ ﻣﺛــل
و اﻟﺦ ﻣن اﻟﻠﻐﺎت اﻟﺑرﻣﺟﯾﺔ واﻟﺟدﯾر ﺑﺎﻻﺷﺎرة ان ﻫـذﻩ اﻟﻔﻌﺎﻟﯾـﺎت ﺗوﺻـف ﺑـﺷﻛل ﺻـﯾﻎ واﺟـراءاتVisual Basic
.ﺑطرﯾﻘﺔ ﻣﺑﺳطﺔ وﻟﻬﺎ اﻟﻘدرة ﻋﻠﻰ ﻣﻌﺎﻟﺟﺔ أﻛﺑر ﻋدد ﻣن اﻟﻘواﻋد اﻟﻧﺣوﯾﺔ وﻣﻔردات اﻟﻣﻌﺎﺟم
179
Melhum and Mahmood Iraqi journal
180 of science, Vol.53, No.1, 2012, pp.179-185
used as an extensive programming project in a written by the user are executed in the order in
software engineering setting. which the corresponding regular expressions
In this research, it shows how to build the occur in the input stream[5, 6], the following is
assembler using some compiler development a sample code for Lex part, in the left hand the
tools like LEX and YACC[3] having the ability description of Token and in the right hand the
to implement the lexical analyzer and parser also associative action for it.
help us to generate the syntax and parsing %{
modules in a short period of time. #include <stdlib.h>
In the next section, the structure of the void yyerror(char *);
assembler is implemented by using two pass #include "y.tab.h"
assembler, the two pass assembler refer to the %}
number of time reading file. %%
/* variables */
1.1 LEX and Yacc [a-z] {yylval = *yytext - 'a';
An assembler, compiler or interpreter for a
return VARIABLE;}
programming language is often decomposed into
two parts:[4]
/* integers */
1-Read the source program and discover its
[0-9]+ { yylval = atoi(yytext);
structure.
return INTEGER;}
2-Process this structure, e.g. to generate the
}
target program.
/* operators */
Lex and Yacc can generate program
[-+()=/*\n] { return *yytext; }
fragments that solve the first task. The task of
/* skip whitespace */
discovering the source structure again is
[ \t] ;
decomposed into subtasks:
/* anything else is an error */
1-Split the source file into tokens (Lex).
. yyerror("invalidcharacter");
2-Find the hierarchical structure of the program
%%
(Yacc).
int yywrap(void) {
Lex and Yacc are tools that help
return 1 ;}
programmers build compilers and interpreters,
but they also have a wider range of applications 1.3 Yacc: Yet Another Compiler –
like converting text data to HTML ,XML or Compiler
Latex. Computer program input generally has some
structure; in fact, every computer program that
1.2 Lex : A lexical Analyzer Generator
does input can be thought of as defining an
Lex helps to write programs whose control
``input language'' which it accepts. An input
flow is directed by instances of regular
language may be as complex as a programming
expressions in the input stream. It is well suited
language, or as simple as a sequence of
for editor-script type transformations and for
numbers. Unfortunately, usual input facilities
segmenting input in preparation for a parsing
are limited, difficult to use, and often are lax
routine. While Lex can be used to generate
about checking their inputs for validity.
sequences of tokens for a parser, it can also be
Yacc provides a general tool for describing
used to perform simple input operations. A
the input to a computer program. The Yacc user
number of programs have been written to
specifies the structures of his input, together
convert simple English text into dialects using
with code to be invoked as each such structure is
only Lex.
recognized. Yacc turns such a specification into
Lex source is a table of regular expressions
a subroutine that handles the input process
and corresponding program fragments. The table
frequently, it is convenient and appropriate to
is translated to a program which reads an input
have most of the flow of control in the user's
stream, copying it to an output stream and
application handled by this subroutine [5, 6].
partitioning the input into strings which match
The following a sample code for Yacc part, in
the given expressions. As each such string is
the left hand the description of Grammar rule
recognized the corresponding program fragment
and in the right hand the associative action for it.
is executed. The recognition of the expressions
%token INTEGER VARIABLE
is performed by a deterministic finite automaton
%left '+' '-'
generated by Lex. The program fragments
180
Melhum and Mahmood Iraqi journal
181 of science, Vol.53, No.1, 2012, pp.179-185
181
Melhum and Mahmood Iraqi journal
182 of science, Vol.53, No.1, 2012, pp.179-185
182
Melhum and Mahmood Iraqi journal
183 of science, Vol.53, No.1, 2012, pp.179-185
2+3*4 in this simple example grammar [3].the - lrl, the list of the required labels, the labels
following is a part of the BNF for the proposal that appears in operand field.
assembler Both lists have the same structure:
struct list { char name[10]; // name of label
prog stmts end int lnum; // line number
stmts stmt newline stmts | epsiod int address; // address contents of the
stmt line | comment | label line | equ arg byte counter
line ld arg , arg struct list *next}
| in arg , arg *ldl // pointer to the current element of the ldl
| out arg , arg * pd // pointer to the last element of the ldl
| brs arg , arg * lrl // pointer to the current element of the lrl
| al1 *pr // pointer to the last element of the LRL
| al2 Applying the list structure is an acute
| al2 arg , arg necessity since number of labels and references
| posh ras to them is dependent on a program and can vary
| ret conr in a big range.
arg a | b | c | d | e | |h | i | r | hl | bc | de | af
| sp |af |ix | iy
| [ bc |de | hl | sp |ixy | label | number]
conr z | c | p | m | nz | nc | po | pe 2.2 Main segment
ixy i xy mp number In contrast to the name this segment performs
xy x | y only initialization of the lists LDL and LRL,
mp - | + calls parser and after terminating its work, prints
posh pop | push the closing message.
al1 cp | sub | and | or | xor 2.3 Lexical Analyzer Routine
al2 add | adc | sbc The lexical analyzer routine identifies
al3 inc | dec terminal symbols in the input text. Each time the
ras rlc | rrc | rl | rr | sla | sra routine tries to find the longest string of
| srl characters matching one of the given patterns of
brs bit | set | res terminal symbols. In case of ambiguity (there
................ are two or more patterns matching the same
.................. string) the first pattern found is taken into
.................. account. So the order of patterns have an
comment ; identifier important role. Each of the patterns correspond
identifier letter identifier | letter letterdigit to the user define actions coded in the C–
| episode language. Recognition of the given pattern in the
label letter Letterdigit input text is associated with execution of the
letterdigit letter letterdigit | digit letterdigit suitable action.
| _ letterdigit | episode This routine was produced by the lexical
number digit number | episode analyzer generator LEX: input data to this
digit 0 | 1 | 2 | 3|… |9 program is just specification of patterns and
letter a | b | c | d |…|z corresponding to actions (terminal symbol
actions). Output file of the LEX is (yy.lex.c) file
2.1 Data ( machine code for the that contains the C –language lexical analyzer
instructions) routine.
The assembler uses 2 types of data: 2.4 Terminal symbol actions
-One or two dimension tables contain machine Main job of a terminal symbol action is
codes of instruction for different operands. The informing the parser routine about the identified
correct association of an instruction and token. Additionally this action organizes of a
operands correspond to a non-zero value of a characters is printed out (by suitable place of the
machine code table element. Two lists of output buffer char linebuf [75]). The number of
symbolic names (labels): lexical rules decides about the size of the lexical
- ldl, the list of the label definitions, the address analyzer routine. It is common view, that an
of actual label that appears in the label field . action continues the process of input string
183
Melhum and Mahmood Iraqi journal
184 of science, Vol.53, No.1, 2012, pp.179-185
184
Melhum and Mahmood Iraqi journal
185 of science, Vol.53, No.1, 2012, pp.179-185
185