ASSembler (2)


Chapter 2: Assemblers:

2.1 Elements of Assembly language programming.

2.2 A simple assembly scheme.

2.3 Pass structure of assemblers.

2.4 Design of a two pass assembler.

2.5 A single pass assembler for IBM PC.

2.1 Elements of assembly language programming:

- An assembly language is a machine-dependent, low-level programming
language which is specific to a certain computer system.

- Compared to the machine language of a computer system, it provides three
basic features which simplify programming.

1) Mnemonic operation codes:

- Also called mnemonic opcodes.

- They eliminate the need to memorize numeric operation codes.

- They also enable the assembler to provide helpful diagnostics,

e.g. indication of misspelt opcodes.

2) Symbolic operands:

- Symbolic names can be associated with data or instructions & can be used
as operands in assembly statements.

- Assembler performs memory binding to ………….. ……………

………………… …………………………….

- Programmers need not know any details of the memory bindings.

- This leads to a very important practical advantage during program
modification.

3) Data declarations:

- Data can be declared in a variety of notations, including decimal notation.

- This avoids manual conversion of constants into their internal machine
representation,

e.g. -5 into (1111 1010) in binary, or

10.5 into (41A8 0000) in hexadecimal.
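
As a rough illustration of the conversions the assembler performs on the programmer's behalf, the sketch below reproduces the two values above. It assumes (the notes do not say so) that the binary form is an 8-bit one's complement and that the hexadecimal form is a 32-bit hexadecimal floating-point word in the style of IBM mainframes. The function names are hypothetical.

```python
def ones_complement_8bit(value: int) -> str:
    """Return the 8-bit one's complement bit string of a small integer."""
    if not -127 <= value <= 127:
        raise ValueError("value does not fit in 8-bit one's complement")
    if value >= 0:
        bits = value
    else:
        bits = (~(-value)) & 0xFF   # invert every bit of the magnitude
    return format(bits, "08b")


def hex_float32(value: float) -> str:
    """Encode a number as a 32-bit hexadecimal floating-point word.

    Assumed layout: 1 sign bit, 7-bit excess-64 exponent (a power of 16),
    24-bit fraction with 1/16 <= fraction < 1.
    """
    if value == 0:
        return "00000000"
    sign = 0x80 if value < 0 else 0x00
    magnitude = abs(value)
    exponent = 0
    while magnitude >= 1:           # scale down until the fraction is below 1
        magnitude /= 16
        exponent += 1
    while magnitude < 1 / 16:       # scale up until the first hex digit is non-zero
        magnitude *= 16
        exponent -= 1
    fraction = int(magnitude * (1 << 24))   # truncate to 24 bits
    return format(((sign | (exponent + 64)) << 24) | fraction, "08X")
```

Under these assumptions, ones_complement_8bit(-5) gives 11111010 and hex_float32(10.5) gives 41A80000, matching the values quoted in the notes.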

• Statement format:
An assembly language statement has the following format:

[Label] <opcode> <operand spec>[,<operand spec>]

where

[…] indicates that the enclosed specification is optional,

Label is a symbolic name, and

<operand spec> has the following syntax:

<symbolic name>[+<displacement>][(<index register>)]

Thus some possible operand forms are:

AREA => refers to the memory word with which the name AREA is associated.

AREA+5 => refers to the memory word 5 words away from the word with the name
AREA.

AREA(4) => the operand address is obtained by adding the contents of index register 4
to the address of AREA.

AREA+5(4) => combination of the previous two.
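
The operand forms above can be sketched as a small parser. This is only an illustration: the regular expression, the function names and the sample address and register contents are assumptions, not part of the notes.

```python
import re

# <symbolic name>[+<displacement>][(<index register>)]
OPERAND_SPEC = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9]*)"
    r"(?:\+(?P<disp>\d+))?"
    r"(?:\((?P<index>\d+)\))?$"
)


def parse_operand_spec(text: str) -> dict:
    """Split an operand specification into name, displacement and index register."""
    match = OPERAND_SPEC.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"invalid operand specification: {text!r}")
    index = match.group("index")
    return {
        "name": match.group("name"),
        "displacement": int(match.group("disp") or 0),
        "index_register": int(index) if index else None,
    }


def effective_address(spec: dict, symtab: dict, registers: dict) -> int:
    """Address of the name, plus displacement, plus index register contents."""
    address = symtab[spec["name"]] + spec["displacement"]
    if spec["index_register"] is not None:
        address += registers[spec["index_register"]]
    return address
```

For example, if AREA is (hypothetically) at address 100 and index register 4 contains 10, AREA+5(4) refers to word 100 + 5 + 10 = 115.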

• A simple assembly language.

- Each statement has two operands: i) the first operand is always a register,
which can be any one of AREG, BREG, CREG & DREG; ii) the second operand
refers to a memory word using a symbolic name and an optional displacement.

Instruction   Assembly    Remarks
opcode        mnemonic

00            STOP        Stop execution
01            ADD         First operand is modified
02            SUB         Condition code is set
03            MULT
04            MOVER       Register <- memory move
05            MOVEM       Memory <- register move
06            COMP        Sets condition code
07            BC          Branch on condition
08            DIV         Analogous to SUB
09            READ        First operand is ……………
10            …………        ………………….

- The MOVE instructions move a value between a memory word and a register.

- In the MOVER instruction the second operand is the source operand & the first
operand is the target operand.
- The converse is true for the MOVEM instruction.
- The condition code can be tested by a BC instruction:

BC <condition code spec>, <memory address>

It transfers control to the memory word with the address <memory address> if the current
value of the condition code matches <condition code spec>.

Fig 2.1.2 shows the machine instruction format:

- The opcode, register operand & memory operand occupy 2, 1 & 3 digits respectively.
- The sign is not part of the instruction.

| sign | opcode | reg operand | memory operand |

Fig 2.1.2 Instruction format.
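
A minimal sketch of this format, assuming decimal digits and ignoring the sign field (which is not part of the instruction). The function names are hypothetical.

```python
def encode_instruction(opcode: int, register: int, address: int) -> str:
    """Pack the three fields into a 6-digit string: 2 + 1 + 3 digits."""
    if not (0 <= opcode <= 99 and 0 <= register <= 9 and 0 <= address <= 999):
        raise ValueError("field does not fit the instruction format")
    return f"{opcode:02d}{register:d}{address:03d}"


def decode_instruction(word: str) -> tuple:
    """Split a 6-digit instruction back into (opcode, register, address)."""
    if len(word) != 6 or not word.isdigit():
        raise ValueError("an instruction is exactly 6 decimal digits")
    return int(word[:2]), int(word[2]), int(word[3:])
```

With opcode 04 (MOVER), register operand 1 and memory operand 211, encode_instruction(4, 1, 211) yields 041211.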

2.1.1 Assembly Language Statements:

An assembly program contains three kinds of statements.

1) Imperative Statements
2) Declarative Statements
3) Assembler Directives

1) Imperative statements:
- An imperative statement indicates an action to be performed during the execution of the
assembled program.
- Each imperative statement typically translates into one machine instruction.
2) Declarative statements:
The syntax is
[Label] DS <constant>
[Label] DC ‘<value>’

The DS (short for declare storage) statement reserves areas of memory &
associates names with them.
Consider the following DS statements.

A DS 1

G DS 200

The first statement reserves one memory word and associates the name A with it.
The second reserves a block of 200 memory words; G is the name of the first word.

The DC (short for declare constant) statement constructs memory words containing
constants, e.g.

ONE DC ‘1’

- It associates the name ONE with a memory word containing the value ‘1’.
- Constants can be written in different forms: decimal, binary, hexadecimal etc.

3) Assembler directives:
- Assembler directives instruct the assembler to perform certain actions during
the assembly of a program.

E.g. START <constant>

indicates that the first word of the target program generated by the assembler should
be placed in the memory word with address <constant>.

END [<operand spec>]

This directive indicates the end of the source program. The optional part indicates the
address of the instruction where execution of the program should begin.

2.2 A Simple Assembly Scheme.

To develop the design specification of an assembler we use a four-step approach.

1) Identify the information necessary to perform a task.
2) Determine a suitable data structure to record the information.
3) Determine the processing necessary to obtain & maintain the information.
4) Determine the processing necessary to perform the task.

- The information requirements arise in the synthesis phase of an assembler.
- Here we consider how to make this information available, i.e. during analysis
or during synthesis.
• Synthesis Phase:

Consider the assembly statement

MOVER BREG, ONE

To synthesize the machine instruction we must have the following information.

1) The address of the memory word with which the name ONE is associated.
2) The machine opcode corresponding to the mnemonic MOVER.

The first item depends on the source program (SP), hence it must be made available
by the analysis phase.
The second item doesn’t depend on the SP, hence the synthesis phase can determine this
information for itself.

For this, we consider two data structures during the synthesis phase:
1) Symbol table
2) Mnemonics table

1) Symbol table:
- Each entry has two primary fields, name & address.
- It is built by the analysis phase.
2) Mnemonics table:
- Each entry has two primary fields, mnemonic & opcode.
- The synthesis phase uses these tables to obtain the machine address with which
a name is associated & the machine opcode corresponding to a mnemonic,
respectively.
- The tables are searched with the symbol name & the mnemonic as keys.
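
The two look-ups can be sketched with dictionaries keyed by symbol name and mnemonic. This is an illustration only: the register codes and the address of ONE are assumptions made for the example.

```python
# Mnemonics table: mnemonic -> machine opcode (values from the opcode table above).
MNEMONICS_TABLE = {
    "STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
    "MOVEM": 5, "COMP": 6, "BC": 7, "DIV": 8, "READ": 9,
}

# Register codes: an assumption made for this sketch.
REGISTER_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, symbol, symbol_table):
    """Return (opcode, register code, memory address) for one statement."""
    opcode = MNEMONICS_TABLE[mnemonic]      # needs no knowledge of the SP
    address = symbol_table[symbol]          # supplied by the analysis phase
    return opcode, REGISTER_CODES[register], address


# Symbol table as the analysis phase might have left it (the address is made up).
symbol_table = {"ONE": 120}
instruction = synthesize("MOVER", "BREG", "ONE", symbol_table)
```

Here instruction is (4, 2, 120): the opcode comes from the fixed mnemonics table, while the address of ONE can only come from the symbol table built during analysis.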

Analysis Phase:

- To determine the address of a name we must have finished with memory allocation.
- To implement memory allocation a data structure called the location counter (LC)
is introduced.
- The LC is always made to contain the address of the next memory word in the
target program (TP).
- The LC is initialized to the constant specified in the START statement.
- For each label, a new entry containing the label & the current LC value is made
in the symbol table.
- The LC is updated after each statement to ensure that it points to the next memory word.
- To update the LC, the analysis phase must know the lengths of the different instructions.
- This information depends on the assembly language, hence a new field called
length is introduced in the mnemonics table.
- The processing involved in maintaining the LC is called LC processing.
- Fig. 2.2 shows the use of the data structures by the analysis & synthesis phases.
- The mnemonics table is fixed; it is merely accessed by the analysis & synthesis phases.
- The symbol table is constructed during the analysis phase & used during the synthesis
phase.
[Figure omitted. Legend: data access; -----> control transfer.]

Fig. 2.2 Data structures of the assembler
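
LC processing as described above can be sketched as follows. The statement tuples, the table contents and the one-word length of DC are assumptions made for the example.

```python
# Mnemonics table with the extra length field: mnemonic -> (opcode, length in words).
MNEMONICS_TABLE = {"STOP": (0, 1), "ADD": (1, 1), "MOVER": (4, 1), "MOVEM": (5, 1)}


def analysis_phase(statements):
    """Build the symbol table from (label, mnemonic, operand text) tuples."""
    symbol_table = {}
    lc = 0
    for label, mnemonic, operands in statements:
        if mnemonic == "START":
            lc = int(operands)              # LC starts at the START constant
            continue
        if mnemonic == "END":
            break
        if label is not None:
            symbol_table[label] = lc        # enter the pair (label, LC)
        if mnemonic == "DS":
            lc += int(operands)             # reserve the requested number of words
        elif mnemonic == "DC":
            lc += 1                         # one word per constant (assumed)
        elif mnemonic in MNEMONICS_TABLE:
            lc += MNEMONICS_TABLE[mnemonic][1]
        else:
            raise ValueError(f"invalid mnemonic: {mnemonic}")
    return symbol_table


program = [
    (None, "START", "100"),
    (None, "MOVER", "AREG, A"),
    ("LOOP", "ADD", "AREG, ONE"),
    (None, "STOP", ""),
    ("A", "DS", "2"),
    ("ONE", "DC", "'1'"),
    (None, "END", ""),
]
symbols = analysis_phase(program)
```

For this program the symbol table ends up as LOOP = 101, A = 103 and ONE = 105.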

• Tasks performed by the analysis & synthesis phases

Analysis Phase:

1) Isolate the label, mnemonic opcode & operand fields of a statement.
2) If a label is present, enter the pair ……………. …………… ……..
……………………………………………………
3) Check the validity of the mnemonic opcode through a look-up in the
Mnemonics table.
4) Perform LC processing.

Synthesis Phase:

1) Obtain the machine opcode corresponding to the mnemonic from the
Mnemonics table.
2) Obtain the address of a memory operand from the symbol table.
3) Synthesize a machine instruction or the machine form of a constant, as the
case may be.

2.3 Pass Structure of Assemblers:

- A pass is one complete scan of a source program.

- Let us discuss the two pass & single pass assembly schemes.

1) Two pass translation:

It can handle forward references easily, as the first pass performs LC processing,
creates the symbol table and also constructs an intermediate representation (IR) of the SP.

Simply put, the first pass performs analysis.

The second pass uses the symbol table & the IR to produce the TP; simply put,
synthesis is performed by the second pass, as shown in fig 2.3.

[Figure 2.3 omitted. Legend: data access; ------> control transfer.]

2) Single pass translation:

- LC processing & construction of the symbol table proceed as in two pass translation,
but the problem of forward references is tackled using a process called backpatching.

- The operand field of an instruction containing a forward reference is left blank initially;
once the definition of the symbol is encountered, its address is put into the blank field.

- The need to insert the second operand's address at a later stage is
indicated by adding an entry to the Table of Incomplete Instructions (TII).
- By the time the END statement is processed, the symbol table contains the
addresses of all symbols & the TII contains information about all forward references.

- The assembler can now process each entry in the TII to complete the concerned
instruction.
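
Backpatching can be sketched as below. This is a simplified illustration, assuming every statement occupies one word and that DS reserves a single word; the tuple layout and names are hypothetical.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}


def single_pass_assemble(statements, start):
    """Assemble (label, mnemonic, register, symbol) tuples in one pass."""
    symtab = {}   # symbol -> address
    tii = []      # table of incomplete instructions: (instruction address, symbol)
    code = {}     # address -> [opcode, register, operand address]
    lc = start
    for label, mnemonic, register, symbol in statements:
        if label is not None:
            symtab[label] = lc
        if mnemonic == "DS":              # reserve one word, generate no code
            lc += 1
            continue
        if symbol is None:
            address = 0
        elif symbol in symtab:
            address = symtab[symbol]      # backward reference: address known
        else:
            address = None                # forward reference: leave blank
            tii.append((lc, symbol))
        code[lc] = [OPCODES[mnemonic], register, address]
        lc += 1
    # END reached: complete every instruction recorded in the TII.
    for instruction_address, symbol in tii:
        code[instruction_address][2] = symtab[symbol]
    return code, symtab, tii


program = [
    (None, "MOVER", 1, "A"),
    (None, "ADD", 1, "A"),
    (None, "STOP", 0, None),
    ("A", "DS", 0, None),
]
code, symtab, tii = single_pass_assemble(program, 100)
```

Both references to A are forward references, so they are recorded in the TII and filled in with address 103 once the whole program has been scanned.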

2.4 Design of a Two Pass Assembler:

Tasks performed by each pass are:

Pass I.

1) Separate the symbol, mnemonic opcode & operand fields.
2) Build the symbol table.
3) Perform LC processing.
4) Construct the intermediate representation.

Pass II.

Synthesize the target program.

Pass I performs analysis of the SP & synthesis of the IR, while Pass II processes
the IR to synthesize the target program.

2.4.1 Advanced Assembler Directives:

ORIGIN

Syntax: ORIGIN <address spec>

where

<address spec> is an <operand spec> or a <constant>.

- This directive indicates that the LC should be set to the address given by <address
spec>.
- It is useful when the TP doesn’t consist of consecutive memory words.
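
A minimal sketch of how an ORIGIN operand might be evaluated, assuming the address spec is either a constant or a symbol with an optional +displacement. The function name and the pattern are assumptions; the sample symbol values are taken from Program 2.4 later in these notes.

```python
import re

ADDRESS_SPEC = re.compile(r"([A-Za-z][A-Za-z0-9]*)(?:\+(\d+))?")


def evaluate_address_spec(spec: str, symtab: dict) -> int:
    """Return the new LC value requested by an ORIGIN statement."""
    spec = spec.replace(" ", "")
    if spec.isdigit():                      # <constant>
        return int(spec)
    match = ADDRESS_SPEC.fullmatch(spec)    # <symbolic name>[+<displacement>]
    if match is None:
        raise ValueError(f"invalid address specification: {spec!r}")
    name, displacement = match.groups()
    return symtab[name] + int(displacement or 0)


symtab = {"LOOP": 202, "LAST": 216}
lc_after_first_origin = evaluate_address_spec("LOOP +2", symtab)
lc_after_second_origin = evaluate_address_spec("LAST +1", symtab)
```

With LOOP at 202 and LAST at 216, ORIGIN LOOP+2 sets the LC to 204 and ORIGIN LAST+1 sets it to 217.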

LTORG

- The LTORG statement permits a programmer to specify where literals should
be placed.
- By default the assembler places the literals after the END statement.
- At every LTORG statement, the assembler allocates memory to the literals of
a literal pool.
- The pool contains all literals used in the program since the start of the program
or since the last LTORG statement.

2.4.2 Pass I of the Assembler:

It uses the following data structures:

OPTAB – A table of mnemonic opcodes & related information.

SYMTAB – Symbol table.

LITTAB – A table of literals used in the program.

POOLTAB – A table recording the starting literal of each literal pool.

Let us take an example to understand these data structures (Program 2.4).

 1        START   200
 2        MOVER   AREG, =‘5’      200)  +04 1 211
 3        MOVEM   AREG, A         201)  +05 1 217
 4  LOOP  MOVER   AREG, A         202)  +04 1 217
 5        MOVEM   CREG, B         203)  +05 3 218
 6        ADD     CREG, =‘1’      204)  +01 3 212
 7        ……
12        BC      ANY, NEXT       210)  +07 6 214
13        LTORG
                  =‘5’            211)  +00 0 005
                  =‘1’            212)  +00 0 001
14        ……
15  NEXT  SUB     AREG, =‘1’      214)  +02 1 219
16        BC      LT, BACK        215)  +07 1 202
17  LAST  STOP                    216)  +00 0 000
18        ORIGIN  LOOP+2
19        MULT    CREG, B         204)  +03 3 218
20        ORIGIN  LAST+1
21  A     DS      1               217)
22  BACK  EQU     LOOP
23  B     DS      1               218)
24        END
25                =‘1’            219)  +00 0 001

Fig. An assembly program (Program 2.4).

OPTAB

Mnemonic   Class   Mnemonic
opcode             info
MOVER      IS      (04,1)
DS         DL      R#7
START      AD      R#11
:

SYMTAB

Symbol   Address   Length
LOOP     202       1
NEXT     214       1
LAST     216       1
A        217       1
BACK     202       1
B        218       1

LITTAB

No.   Literal   Address
1     =‘5’
2     =‘1’
3     =‘1’

POOLTAB

Literal no.

- The above tables show sample contents after processing Program 2.4.
- Processing of an assembly statement begins with the processing of its label field.
If it contains a symbol, the symbol & the value in the LC are copied into a new
entry of SYMTAB.
- Interpretation of the OPTAB entry for the mnemonic is also important in
Pass I. The class field indicates whether the mnemonic belongs to the class of
imperative statements (IS), declaration statements (DL) or assembler directives (AD).
- LITTAB is used to collect all literals used in the program.
- POOLTAB contains the literal number of the starting literal of each literal pool.
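
The interplay of LITTAB and POOLTAB can be sketched as below. The class and method names are hypothetical, each literal is assumed to occupy one word, and the sketch closes every pool by recording where the next one would start, so POOLTAB ends with one extra entry.

```python
class LiteralTables:
    """LITTAB and POOLTAB as maintained during Pass I (simplified)."""

    def __init__(self):
        self.littab = []      # entries: [literal, address or None]
        self.pooltab = [1]    # literal number (1-based) that starts each pool

    def add_literal(self, literal):
        """Record a literal met in an operand field; return its literal number."""
        self.littab.append([literal, None])
        return len(self.littab)

    def allocate_pool(self, lc):
        """At LTORG or END: give addresses to the literals of the current pool."""
        first = self.pooltab[-1] - 1
        for entry in self.littab[first:]:
            entry[1] = lc
            lc += 1
        self.pooltab.append(len(self.littab) + 1)   # the next pool starts here
        return lc                                    # updated location counter


tables = LiteralTables()
tables.add_literal("='5'")                 # statement 2 of Program 2.4
tables.add_literal("='1'")                 # statement 6
lc_after_ltorg = tables.allocate_pool(211) # LTORG met with LC = 211
tables.add_literal("='1'")                 # statement 15
lc_after_end = tables.allocate_pool(219)   # END met with LC = 219
```

The literals receive addresses 211, 212 and 219, in agreement with the listing of Program 2.4, and the two pools start at literal numbers 1 and 3.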

2.4.3 Intermediate code forms (used in two pass assembly)

The intermediate code consists of a set of IC units, each IC unit
consisting of the following three fields.

1) Address.
2) Representation of the mnemonic opcode.
3) Representation of the operands.

| Address | Opcode | Operands |

Fig. An IC unit

Mnemonic field:

The mnemonic field contains a pair of the form

(statement class, code)

where statement class = IS/DL/AD and

code = the instruction opcode for an imperative statement, or the number of the
statement within its class for a declaration statement or an assembler directive,
as listed below.

Declaration statements     Assembler directives

DC   01                    START    01
DS   02                    END      02
                           ORIGIN   03

Thus (AD, 01) stands for START.

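2.4.4 Intermediate code for imperative statements.

Variant I

The first operand is represented by a single digit, which is the code of a register
or of a condition. The second operand is represented by a pair of the form

(operand class, code)

where

operand class is one of C/S/L, standing for constant / symbol / literal. For a constant,
the code field contains the internal representation of the constant itself,
e.g. the operand of

START 200 is (C, 200)

For a symbol or a literal, the code field contains the number of the operand's entry
in SYMTAB or LITTAB.

Fig. 2.4.4 shows the intermediate code in this variant.

START 200              (AD,01) (C,200)
READ A                 (IS,09) (S,01)
LOOP MOVER AREG, A     (IS,04) (1)(S,01)
SUB AREG, =‘1’         (IS,02) (1)(L,01)
BC GT, LOOP            (IS,07) (4)(S,02)
STOP                   (IS,00)
A DS 1                 (DL,02) (C,1)
LTORG                  (AD,05)
………..                  ………….

Fig 2.4.4 IC: Variant I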
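
The construction of Variant I units can be sketched as follows. The tables hold only the codes that appear in these notes (condition codes LT = 1, GT = 4 and ANY = 6 are read off the examples); the register codes, the function names and the rule that symbols are numbered in order of first appearance are assumptions.

```python
CLASS_CODES = {
    "START": ("AD", 1), "END": ("AD", 2), "ORIGIN": ("AD", 3), "LTORG": ("AD", 5),
    "DC": ("DL", 1), "DS": ("DL", 2),
    "STOP": ("IS", 0), "SUB": ("IS", 2), "MOVER": ("IS", 4),
    "BC": ("IS", 7), "READ": ("IS", 9),
}
REGISTER_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed
CONDITION_CODES = {"LT": 1, "GT": 4, "ANY": 6}


def ordinal(table, name):
    """Return the 1-based entry number of name, entering it if it is new."""
    if name not in table:
        table.append(name)
    return table.index(name) + 1


def operand_ic(operand, symtab, littab):
    """Represent one operand in Variant I form."""
    operand = operand.strip()
    if operand in REGISTER_CODES:
        return f"({REGISTER_CODES[operand]})"
    if operand in CONDITION_CODES:
        return f"({CONDITION_CODES[operand]})"
    if operand.startswith("="):
        return f"(L,{ordinal(littab, operand):02d})"
    if operand.isdigit():
        return f"(C,{int(operand)})"
    return f"(S,{ordinal(symtab, operand):02d})"


def intermediate_code(mnemonic, operands, symtab, littab):
    """Build the IC text for one statement (the label is handled elsewhere)."""
    statement_class, code = CLASS_CODES[mnemonic]
    parts = [f"({statement_class},{code:02d})"]
    parts += [operand_ic(op, symtab, littab) for op in operands]
    return " ".join(parts)


symtab, littab = [], []
ic_start = intermediate_code("START", ["200"], symtab, littab)
ic_read = intermediate_code("READ", ["A"], symtab, littab)
ordinal(symtab, "LOOP")          # the label LOOP is entered when it is met
ic_mover = intermediate_code("MOVER", ["AREG", "A"], symtab, littab)
ic_sub = intermediate_code("SUB", ["AREG", "='1'"], symtab, littab)
ic_bc = intermediate_code("BC", ["GT", "LOOP"], symtab, littab)
```

Because A is met before LOOP, A becomes symbol 01 and LOOP symbol 02, which reproduces the operand codes in the figure.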

Variant II

- In this variant, the operand field is processed only to identify literal references.
- Literals are entered in LITTAB & are represented as (L,m) in the IC.
- Symbolic references in the source statement are not processed at all during
Pass I.

START 200              (AD,01) (C,200)
READ A                 (IS,09) A
LOOP MOVER AREG, A     (IS,04) AREG, A
SUB AREG, =‘1’         (IS,02) AREG, (L,01)
BC GT, LOOP            (IS,07) GT, LOOP
STOP                   (IS,00)
A DS 1                 (DL,02) (C,1)
LTORG                  (AD,05)
……..                   ………

Fig. Intermediate code: Variant II

2.4.5 Processing of Declarations & Assembler Directives.

In Pass I of the assembler:

i) It is necessary to represent the address of each source statement
in the IC.
ii) It is necessary to have an explicit representation of DS statements
& assembler directives.

E.g. consider the following source program & its intermediate code:

a) SP                 b) IC

START 200                   (AD,01) (C,200)
AREA1 DS 20           200)  (DL,02) (C,20)
SIZE  DC 5            220)  (DL,01) (C,5)

- If a DC statement defines many constants, e.g. DC ‘5,3,-7’, then a series of (DL,01)
units can be put in the IC.
- LTORG: when it appears in the source program, the assembler assigns memory addresses
to the literals in the current pool & these addresses are entered in their LITTAB
entries.

2.4.6 Pass II of the assembler

Following is the algorithm for Pass II of the assembler.

Algorithm.

1) code_area_address := address of code_area;
   pooltab_ptr := 1;
   loc_cntr := 0;
2) While the next statement is not an END statement
   a) Clear machine_code_buffer;
   b) If an LTORG statement
      i) Process literals in LITTAB[POOLTAB[pooltab_ptr ……..
         ….LITTAB[POOLTAB[pooltab_ptr + 1] – 1 similar to processing
         of constants in a DC statement, i.e. assemble the literals in
         machine_code_buffer.
      ii) size := size of memory area required for the literals;
      iii) pooltab_ptr := pooltab_ptr + 1;
   c) If a START or ORIGIN statement then
      i) loc_cntr := value specified in operand field;
      ii) size := 0;
   d) If a declaration statement
      i) If a DC statement then assemble the constant in
         machine_code_buffer.
      ii) size := size of memory area required by DC/DS;
   e) If an imperative statement
      i) …………………………………………….. ……………… ..
      ii) Assemble instruction in machine_code_buffer.
      iii) size := size of instruction;
   f) If size ≠ 0 then
      i) Move contents of machine_code_buffer to the address
         code_area_address + loc_cntr;
      ii) loc_cntr := loc_cntr + size;
3) Processing of END statement
   a) Perform steps 2(b) & 2(f).
   b) Write code_area into the output file.
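
The algorithm can be sketched in Python as below. This is a simplified reading, not a transcription: it works on Variant I style IC units, assumes every instruction, constant and literal occupies one word, writes directly into a dictionary instead of a machine code buffer, and assumes that the elided step 2(e)(i) obtains the operand address from SYMTAB or LITTAB. The data layouts are hypothetical.

```python
def pass_two(ic_units, symtab, littab, pooltab):
    """Turn IC units into a code area: address -> (opcode, register, operand).

    ic_units: (statement class, code, operands) with operands such as
              ("C", 200), ("R", 1), ("S", 1) or ("L", 1)
    symtab:   addresses indexed by symbol number - 1
    littab:   (value, address) pairs indexed by literal number - 1
    pooltab:  first literal number of each pool, plus one closing entry
    """
    code_area = {}
    loc_cntr = 0
    pool = 0

    def assemble_pool(loc):
        nonlocal pool
        first, past_last = pooltab[pool] - 1, pooltab[pool + 1] - 1
        for value, _address in littab[first:past_last]:
            code_area[loc] = (0, 0, value)      # assembled like a DC constant
            loc += 1
        pool += 1
        return loc

    for statement_class, code, operands in ic_units:
        if (statement_class, code) == ("AD", 2):        # END: last literal pool
            assemble_pool(loc_cntr)
            break
        if (statement_class, code) == ("AD", 5):        # LTORG
            loc_cntr = assemble_pool(loc_cntr)
        elif statement_class == "AD":                   # START or ORIGIN only
            loc_cntr = operands[0][1]
        elif (statement_class, code) == ("DL", 1):      # DC
            code_area[loc_cntr] = (0, 0, operands[0][1])
            loc_cntr += 1
        elif statement_class == "DL":                   # DS: reserve, emit nothing
            loc_cntr += operands[0][1]
        else:                                           # imperative statement
            register, address = 0, 0
            for kind, value in operands:
                if kind == "R":
                    register = value
                elif kind == "S":
                    address = symtab[value - 1]
                elif kind == "L":
                    address = littab[value - 1][1]
            code_area[loc_cntr] = (code, register, address)
            loc_cntr += 1
    return code_area


ic = [
    ("AD", 1, [("C", 200)]),              # START 200
    ("IS", 4, [("R", 1), ("L", 1)]),      # MOVER AREG, ='5'
    ("IS", 5, [("R", 1), ("S", 1)]),      # MOVEM AREG, A
    ("AD", 5, []),                        # LTORG
    ("IS", 0, []),                        # STOP
    ("DL", 2, [("C", 1)]),                # A DS 1
    ("AD", 2, []),                        # END
]
target = pass_two(ic, symtab=[204], littab=[(5, 202)], pooltab=[1, 2, 2])
```

For this small program the target code is +04 1 202 at address 200, +05 1 204 at 201, the literal 5 at 202 and the STOP instruction at 203; the DS statement only advances the location counter.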

2.4.7 Listing and Error Reporting:

- Design of an error indication scheme involves some decisions which
influence the

i) effectiveness of error reporting,

ii) speed of the assembler,

iii) memory requirements of the assembler.

- The basic decision is whether to produce the program listing & error reports in Pass I or
to delay these actions until Pass II.
- The advantage of producing the listing in Pass I is that the source program need not
be preserved till Pass II.
- This conserves memory & avoids duplicate processing.

2.4.8 Some Organizational Issues:

We discuss some organizational issues in assembler design, like the placement of &
access to tables & the IC, with respect to the schematic shown in fig 2.4.8.

Fig. 2.4.8 Data structures & files in a two pass assembler.

TABLES:

1) For efficiency reasons SYMTAB must remain in main memory throughout
Passes I & II.
2) LITTAB is not accessed as frequently, hence if memory is at a premium it is
possible to hold only a part of it in memory.
3) OPTAB should be in memory during Pass I.

SOURCE PROGRAM & IC:

- The SP is read by Pass I on a statement by statement basis.
- After processing, a source statement can be written into a file for subsequent
use in Pass II.
- The IC generated for it is written into another file.
- The TP & the program listing can be written out as separate files by Pass II.
- Since all these files are sequential in nature, it is beneficial to use appropriate
blocking & buffering of records.

2.5 A Single Pass Assembler for IBM PC

- Here the problem of forward references is handled using segment-based
addressing.

Problems of single pass assembly:

- A single pass assembler for the Intel 8088 shares some problems with other single
pass assemblers,
e.g. problems in assembling forward references & in error reporting.
- As seen earlier, whenever there is a forward reference, an entry for it is made in the TII.
- When the symbol’s definition is encountered, this entry is analysed to complete
the instruction. However, the use of a symbolic name as the destination in a branch
instruction gives rise to a peculiar problem.
- Some generic branch opcodes, like JMP in the 8088 assembly language, can give
rise to instructions of different formats & different lengths depending on
whether the jump is near or far.
- However, this would not be known until some time later in the assembly
process. This problem is solved by assembling such instructions with a 16-bit
address field unless the programmer indicates a short displacement,
e.g. JMP SHORT LOOP.

Design of the Assembler:

1) Forward references:
- Information concerning forward references to a symbol is organized in the
form of a linked list.
- Thus, the forward reference table (FRT) contains a set of linked lists.
- The FRT pointer field of a SYMTAB entry points to the head of this list.
- For efficiency reasons, new entries are added at the beginning of the list.
2) Cross references:
- The assembler uses a cross-reference table (CRT) to collect information
concerning references to all symbols in the program.
- Each SYMTAB entry points to the head & tail of its linked list in the CRT.
- Here, new entries are added at the end of the list.
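
The two lists kept per symbol can be sketched as below. Python deques stand in for the linked lists (appendleft adds at the head, append adds at the tail); the class, the function names and the use of statement numbers as CRT entries are assumptions made for the example.

```python
from collections import deque


class SymtabEntry:
    def __init__(self, name):
        self.name = name
        self.address = None     # unknown until the definition is seen
        self.frt = deque()      # forward reference list, newest entry first
        self.crt = deque()      # cross reference list, in source order


def reference(symtab, name, instruction_address, statement_no):
    """Record a use of name; return its address, or None for a forward reference."""
    entry = symtab.setdefault(name, SymtabEntry(name))
    entry.crt.append(statement_no)                  # CRT: add at the end
    if entry.address is None:
        entry.frt.appendleft(instruction_address)   # FRT: add at the beginning
    return entry.address


def define(symtab, name, address, statement_no, code):
    """Record the definition of name and complete all pending instructions."""
    entry = symtab.setdefault(name, SymtabEntry(name))
    entry.address = address
    entry.crt.append(statement_no)
    while entry.frt:
        code[entry.frt.popleft()] = address


symtab, code = {}, {}
code[100] = reference(symtab, "NEXT", 100, statement_no=1)   # forward reference
code[104] = reference(symtab, "NEXT", 104, statement_no=3)   # forward reference
pending = list(symtab["NEXT"].frt)
define(symtab, "NEXT", 110, statement_no=5, code=code)
```

Adding at the head keeps each FRT insertion cheap because the list is never traversed, while the CRT is appended at the tail so that a cross-reference listing comes out in source order.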
