ASSembler (2)

Chapter 2: Assemblers:
2.1 Elements of Assembly language programming.
2.2 A simple assembly scheme.
2.3 Pass structure of assemblers.
2.4 Design of a two pass assembler.
2.5 A single pass assembler for IBM PC.
2.1 Elements of assembly language programming: -
- An assembly language is a machine-dependent, low-level programming language which is specific to a certain computer system.
- Compared to the machine language of a computer system, it provides three basic features which simplify programming.
1) Mnemonic Operation codes:
- Also called mnemonic opcodes.
- They eliminate the need to memorize numeric operation codes.
- They also enable the assembler to provide helpful diagnostics.
Eg. Indication of misspelt opcodes.
2) Symbolic Operands:
- Symbolic names can be associated with data or instructions & can be used as operands in assembly statements.
- The assembler performs memory binding to ………….. ……………
………………… …………………………….
- Programmers need not know any details of the memory bindings.
- This leads to a very important practical advantage during program modification.
3) Data Declarations:
- Data can be declared in a variety of notations, including decimal notation.
- This avoids manual conversion of constants into their internal machine representation.
Eg. -5 into (1111 1010)2 (the 8-bit one's complement form) or
10.5 into (41A8 0000)16 (the IBM hexadecimal floating-point form).
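The two conversions in the example can be reproduced with a short sketch. It assumes that -5 is shown in 8-bit one's complement and that 10.5 is shown in the IBM hexadecimal floating-point format (1 sign bit, a 7-bit excess-64 exponent of 16, and a 24-bit fraction); the notes do not name the representations, so both readings, and the function names, are assumptions.

```python
def ones_complement(value, bits=8):
    """Return the one's complement bit string of a small integer (assumed format)."""
    mask = (1 << bits) - 1
    if value >= 0:
        return format(value & mask, f"0{bits}b")
    # One's complement of a negative number: invert every bit of its magnitude.
    return format(~(-value) & mask, f"0{bits}b")


def ibm_hex_float(x):
    """Encode x as a 32-bit IBM hexadecimal floating-point word (assumed format)."""
    if x == 0:
        return 0
    sign = 0x80 if x < 0 else 0
    x = abs(x)
    exponent = 0
    while x >= 1:            # normalise the fraction into [1/16, 1)
        x /= 16
        exponent += 1
    while x < 1 / 16:
        x *= 16
        exponent -= 1
    fraction = int(x * (1 << 24))          # 24-bit fraction, truncated
    return ((sign | (exponent + 64)) << 24) | fraction


print(ones_complement(-5))                 # bit pattern for -5
print(format(ibm_hex_float(10.5), "08X"))  # hexadecimal word for 10.5
```

This is exactly the manual conversion that a data declaration lets the programmer leave to the assembler.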
• Statement Format:
An assembly language statement has the following format:
[Label] <opcode> <operand spec>[,<operand spec> ..]
Where
[..] indicates that the enclosed specification is optional.
Label is a symbolic name.
<operand spec> has the following syntax:
<symbolic name>[+<displacement>][(<index register>)]
Thus, some possible operand forms are:
AREA => refers to the memory word with which the name AREA is associated.
AREA+5 => refers to the memory word 5 words away from the word with the name AREA.
AREA(4) => the operand address is obtained by adding the contents of index register 4 to the address of AREA.
AREA+5(4) => a combination of the previous two.
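The operand syntax above can be recognised with a small parser. This is only a sketch: the names OperandSpec and parse_operand are hypothetical, and it assumes that displacements and index registers are written as unsigned decimal numbers, as in the four example forms.

```python
import re
from typing import NamedTuple, Optional


class OperandSpec(NamedTuple):
    symbol: str
    displacement: int              # 0 when no +<displacement> is given
    index_register: Optional[int]  # None when no (<index register>) is given


# <symbolic name>[+<displacement>][(<index register>)]
_OPERAND = re.compile(
    r"^\s*([A-Za-z]\w*)\s*(?:\+\s*(\d+))?\s*(?:\(\s*(\d+)\s*\))?\s*$"
)


def parse_operand(text):
    match = _OPERAND.match(text)
    if match is None:
        raise ValueError(f"not a valid operand specification: {text!r}")
    symbol, displacement, index = match.groups()
    return OperandSpec(
        symbol,
        int(displacement) if displacement else 0,
        int(index) if index else None,
    )


for form in ("AREA", "AREA+5", "AREA(4)", "AREA+5(4)"):
    print(form, "->", parse_operand(form))
```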
• A simple assembly language.

- Each statement has two operands: i) the first operand is always a register, which can be any one of AREG, BREG, CREG & DREG; ii) the second operand refers to a memory word using a symbolic name and an optional displacement.
Instruction   Assembly    Remarks
opcode        mnemonic
00            STOP        Stop execution
01            ADD         }
02            SUB         } First operand is modified; condition code is set
03            MULT        }
04            MOVER       Register <- memory move
05            MOVEM       Memory <- register move
06            COMP        Sets condition code
07            BC          Branch on condition
08            DIV         Analogous to SUB
09            READ        First operand is ……………
10            …………        ………………….
- The MOVE instructions move a value between a memory word and a register.
- In the MOVER instruction, the second operand is the source operand & the first operand is the target operand.
- The converse is true for the MOVEM instruction.
- The condition code can be tested by a BC instruction:
BC <condition code spec>,<memory address>
It transfers control to the memory word with the address <memory address> if the current value of the condition code matches <condition code spec>.
Fig 2.1.2 shows the machine instruction format:
- The opcode, register operand & memory operand occupy 2, 1 & 3 digits respectively.
- The sign is not part of the instruction.
sign | opcode (2 digits) | reg operand (1 digit) | memory operand (3 digits)
Fig 2.1.2 Instruction format.
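The instruction format can be illustrated by packing the three fields into the 2 + 1 + 3 digit layout. This is a sketch only: the function name encode is hypothetical, and the register codes AREG=1 to DREG=4 are an assumption (the program listing later in the chapter uses 1 for AREG and 3 for CREG). BC, whose first operand is a condition code rather than a register, is not handled here.

```python
# Opcodes taken from the instruction table of the simple assembly language.
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
           "MOVEM": 5, "COMP": 6, "BC": 7, "DIV": 8, "READ": 9}
# Assumed register codes.
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def encode(mnemonic, register=None, address=0):
    """Format one machine instruction as sign, 2-digit opcode, 1-digit register, 3-digit address."""
    if not 0 <= address <= 999:
        raise ValueError("memory operand must fit in 3 digits")
    register_code = REGISTERS[register] if register else 0
    return f"+{OPCODES[mnemonic]:02d} {register_code} {address:03d}"


print(encode("MOVER", "AREG", 211))
print(encode("STOP"))
```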
2.1.1 Assembly Language Statements:
An assembly program contains three kinds of statements.
1) Imperative Statements
2) Declarative Statements
3) Assembler Directives
1) Imperative statements:
- An imperative statement indicates an action to be performed during the execution of the assembled program.
- Each imperative statement typically translates into one machine instruction.
2) Declarative statements:
Syntax is
[Label] DS <constant>
[Label] DC ‘<value>’
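The DS (short for declare storage) statement reserves areas of memory & associates names with them.
Consider the following DS statements.
A DS 1
G DS 200
The first statement reserves a memory area of 1 word & associates the name A with it; the second reserves a block of 200 words & associates the name G with its first word.
The DC (short for declare constant) statement constructs memory words containing constants.
Eg. ONE DC ‘1’
- It associates the name ONE with a memory word containing the value ‘1’.
- Constants can be written in different forms: decimal, binary, hexadecimal, etc.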
3) Assembler Directives:
- Assembler directives instruct the assembler to perform certain actions during the assembly of a program.
Eg. START <constant>
This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.
END [<operand spec>]
This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where execution of the program should begin.
2.2 A Simple Assembly Scheme
For the design specification of an assembler we use a four step approach.
1) Identify the information necessary to perform a task.
2) Determine a suitable data structure to record the information.
3) Determine the processing necessary to obtain & maintain the information.
4) Determine the processing necessary to perform the task.
- The information requirements arise in the synthesis phase of an assembler.
- Here we consider how to make this information available, i.e. whether it should be collected during analysis or derived during synthesis.
• Synthesis Phase:
Consider the assembly statement
MOVER BREG, ONE
To synthesize the machine instruction we must have the following information:
1) Address of the memory word with which the name ONE is associated.
2) Machine opcode corresponding to the mnemonic MOVER.
The first item depends on the source program (SP), hence it must be made available by the analysis phase.
The second item doesn’t depend on the SP, hence the synthesis phase can determine this information for itself.
For this, we need to consider two data structures during the synthesis phase:
1) Symbol table
2) Mnemonics table
1) Symbol table:
- Each entry has two primary fields: name & address.
- It is built by the analysis phase.
2) Mnemonics table:
- Each entry has two primary fields: mnemonic & opcode.
- The synthesis phase uses these tables to obtain the machine address with which a name is associated & the machine opcode corresponding to a mnemonic, respectively.
- The tables are searched with the symbol name & the mnemonic as keys.
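A minimal sketch of the two look-ups, using Python dictionaries keyed by symbol name and by mnemonic. The address 115 for ONE, the register codes, and the function name synthesize are hypothetical values chosen for illustration only.

```python
# Mnemonics table: fixed, keyed by mnemonic (opcodes from the instruction table).
MNEMONICS_TABLE = {"ADD": 1, "SUB": 2, "MOVER": 4, "MOVEM": 5}
# Assumed register codes.
REGISTER_CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def synthesize(mnemonic, register, name, symbol_table):
    """Return (opcode, register code, address) for one imperative statement."""
    opcode = MNEMONICS_TABLE[mnemonic]   # does not depend on the source program
    address = symbol_table[name]         # supplied by the analysis phase
    return (opcode, REGISTER_CODES[register], address)


# Symbol table as the analysis phase might have built it (address is hypothetical).
symbol_table = {"ONE": 115}
instruction = synthesize("MOVER", "BREG", "ONE", symbol_table)
print(instruction)
```

A dictionary gives the keyed search that the notes describe; a real assembler might use a hash table or a sorted table for the same purpose.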
Analysis Phase:
- To determine the address of a name, memory allocation must already have been performed.
- To implement memory allocation, a data structure called the location counter (LC) is introduced.
- The LC is always made to contain the address of the next memory word in the target program (TP).
- The LC is initialized to the constant specified in the START statement.
- For each label, a new entry is made in the symbol table with the label & the current LC value.
- The LC is updated after each statement to ensure that it points to the next memory word.
- To update the LC, the analysis phase must know the length of each instruction.
- This information depends on the assembly language, hence a new field called length is introduced in the mnemonics table.
- The processing involved in maintaining the LC is called LC processing.
- Fig. 2.2 shows the use of the data structures by the analysis & synthesis phases.
- The mnemonics table is fixed; it is merely accessed by the analysis & synthesis phases.
- The symbol table is constructed during the analysis phase & used during the synthesis phase.
Fig. 2.2 Data structures of the assembler. (Figure legend: data access; -----> control transfer.)
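LC processing can be sketched as a loop over the statements of the program. The sketch below assumes that each statement has already been split into a (label, mnemonic, operand) triple and that every imperative statement and every DC constant occupies one word; the function name analyze and the sample program are hypothetical.

```python
# Length field of the mnemonics table (one word per instruction is assumed).
LENGTHS = {"STOP": 1, "ADD": 1, "SUB": 1, "MULT": 1, "MOVER": 1, "MOVEM": 1}


def analyze(statements):
    """Build the symbol table by LC processing over (label, mnemonic, operand) triples."""
    symbol_table = {}
    lc = 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            lc = int(operand)           # LC is initialised from START
            continue
        if mnemonic == "END":
            break
        if label:
            symbol_table[label] = lc    # enter the pair (label, LC)
        if mnemonic == "DS":
            lc += int(operand)          # reserve the requested number of words
        elif mnemonic == "DC":
            lc += 1
        else:
            lc += LENGTHS[mnemonic]     # length comes from the mnemonics table
    return symbol_table


program = [
    (None, "START", "100"),
    (None, "MOVER", "BREG, ONE"),
    ("LOOP", "ADD", "BREG, ONE"),
    (None, "STOP", ""),
    ("ONE", "DC", "'1'"),
    ("BUF", "DS", "20"),
    ("LAST", "DS", "1"),
    (None, "END", ""),
]
print(analyze(program))
```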
• Tasks performed by the Analysis & Synthesis phases
Analysis Phase:
1) Isolate the label, mnemonic opcode & operand fields of a statement.
2) If a label is present, enter the pair ……………. …………… ……..
……………………………………………………
3) Check the validity of the mnemonic opcode through a look-up in the Mnemonics table.
4) Perform LC processing.
Synthesis Phase:
1) Obtain the machine opcode corresponding to the mnemonic from the Mnemonics table.
2) Obtain the address of a memory operand from the symbol table.
3) Synthesize a machine instruction or the machine form of a constant, as the case may be.
2.3 Pass Structure of Assemblers:
- A pass is one complete scan of the source program.
- Let us discuss the two pass & single pass assembly schemes.
1) Two pass translation:
It can handle forward references easily, as the first pass performs LC processing, creates the symbol table and also constructs an intermediate representation (IR) of the SP.
Simply put, the first pass performs analysis.
The second pass uses the symbol table & the IR to produce the TP; simply put, synthesis is performed by the second pass, as shown in fig 2.3.
Fig. 2.3 (Figure legend: data access; ------> control transfer.)
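The two pass scheme can be sketched as two functions that communicate only through the symbol table and the IR. The statement layout (label, mnemonic, operand list), the one-word-per-statement simplification, the register codes and the function names are all assumptions made for this illustration.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed codes


def pass_one(source):
    """Analysis: LC processing, symbol table construction, IR construction."""
    symtab, ir, lc = {}, [], 0
    for label, mnemonic, operands in source:
        if mnemonic == "START":
            lc = int(operands[0])
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc
        ir.append((lc, mnemonic, operands))
        lc += 1                      # every statement here occupies one word
    return symtab, ir


def pass_two(symtab, ir):
    """Synthesis: every symbol is already defined, so forward references are easy."""
    target = []
    for lc, mnemonic, operands in ir:
        if mnemonic == "DC":
            target.append((lc, 0, 0, int(operands[0])))
        else:
            register = REGISTERS[operands[0]] if operands else 0
            address = symtab[operands[1]] if len(operands) > 1 else 0
            target.append((lc, OPCODES[mnemonic], register, address))
    return target


source = [
    (None, "START", ["200"]),
    (None, "MOVER", ["AREG", "X"]),   # forward reference to X
    (None, "ADD", ["AREG", "X"]),
    (None, "STOP", []),
    ("X", "DC", ["5"]),
    (None, "END", []),
]
symtab, ir = pass_one(source)
print(pass_two(symtab, ir))
```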
2) Single pass translation:
- LC processing & construction of the symbol table proceed as in two pass translation, but the problem of forward references is tackled using a process called backpatching.
- The operand field of an instruction containing a forward reference is left blank initially; once the definition of the symbol is encountered, its address is put into the blank field.
- The need to insert the second operand's address at a later stage is indicated by adding an entry to the Table of Incomplete Instructions (TII).
- By the time the END statement is processed, the symbol table would contain the addresses of all symbols & the TII would contain information about all forward references.
- The assembler can now process each entry in the TII to complete the concerned instruction.
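Backpatching with a TII can be sketched as follows. The statement layout, the one-word-per-statement simplification, the register codes and the function name are assumptions; the TII here is simply a list of (instruction index, symbol) pairs.

```python
OPCODES = {"STOP": 0, "ADD": 1, "MOVER": 4, "MOVEM": 5}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}   # assumed codes


def assemble_single_pass(source, start):
    symtab = {}
    tii = []      # table of incomplete instructions: (index into code, symbol)
    code = []     # each entry is [opcode, register, address]
    lc = start
    for label, mnemonic, operands in source:
        if label:
            symtab[label] = lc
        if mnemonic == "DC":
            code.append([0, 0, int(operands[0])])
        else:
            register = REGISTERS[operands[0]] if operands else 0
            address = 0
            if len(operands) > 1:
                symbol = operands[1]
                if symbol in symtab:
                    address = symtab[symbol]
                else:
                    address = None                   # leave the field blank
                    tii.append((len(code), symbol))  # remember to fill it later
            code.append([OPCODES[mnemonic], register, address])
        lc += 1
    # At END: the symbol table is complete, so every TII entry can be resolved.
    for index, symbol in tii:
        code[index][2] = symtab[symbol]
    return code, tii


source = [
    (None, "MOVER", ["AREG", "X"]),   # forward reference
    (None, "ADD", ["AREG", "X"]),     # forward reference
    (None, "STOP", []),
    ("X", "DC", ["5"]),
]
code, tii = assemble_single_pass(source, start=200)
print(code)
print(tii)
```

A symbol that is never defined raises a KeyError in the final loop, which is where a real assembler would report an undefined-symbol error.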
2.4 Design of a Two Pass Assembler:
Tasks performed by each pass are:
Pass I.
1) Separate the symbol, mnemonic opcode & operand fields.
2) Build the symbol table.
3) Perform LC processing.
4) Construct the intermediate representation.
Pass II.
Synthesize the target program.
Pass I performs analysis of the SP & synthesis of the IR, while Pass II processes the IR to synthesize the target program.
2.4.1 Advanced Assembler Directives:
ORIGIN
Syntax: ORIGIN <address spec>
Where,
<address spec> is an <operand spec> or <constant>.
- This directive indicates that the LC should be set to the address given by <address spec>.
- It is useful when the TP doesn’t consist of consecutive memory words.
LTORG
- The LTORG statement permits a programmer to specify where literals should be placed.
- By default, the assembler places the literals after the END statement.
- At every LTORG statement, the assembler allocates memory to the literals of a literal pool.
- The pool contains all literals used in the program since the start of the program or since the last LTORG statement.
2.4.2 Pass I of the Assembler:-
It uses the following data structures:
OPTAB – A table of mnemonic opcodes & related information.
SYMTAB – Symbol table.
LITTAB – A table of literals used in the program.
POOLTAB – A table of information about literal pools.
Let us take an example to understand all the data structures. (example 2.4)
 1          START   200
 2          MOVER   AREG, =‘5’      200)  +04 1 211
 3          MOVEM   AREG, A         201)  +05 1 217
 4   LOOP   MOVER   AREG, A         202)  +04 1 217
 5          MOVER   CREG, B         203)  +04 3 218
 6          ADD     CREG, =‘1’      204)  +01 3 212
 7          ……
12          BC      ANY, NEXT       210)  +07 6 214
13          LTORG
                    =‘5’            211)  +00 0 005
                    =‘1’            212)  +00 0 001
14          ……
15   NEXT   SUB     AREG, =‘1’      214)  +02 1 219
16          BC      LT, BACK        215)  +07 1 202
17   LAST   STOP                    216)  +00 0 000
18          ORIGIN  LOOP+2
19          MULT    CREG, B         204)  +03 3 218
20          ORIGIN  LAST+1
21   A      DS      1               217)
22   BACK   EQU     LOOP
23   B      DS      1               218)
24          END
25                  =‘1’            219)  +00 0 001
Fig. An assembly program. (Program 2.4)
OPTAB
Mnemonic opcode   Class   Mnemonic info
MOVER             IS      (04,1)
DS                DL      R#7
START             AD      R#11
:

SYMTAB
Symbol   Address   Length
LOOP     202       1
NEXT     214       1
LAST     216       1
A        217       1
BACK     202       1
B        218       1

LITTAB
No.   Literal   Address
1     =‘5’
2     =‘1’
3     =‘1’

POOLTAB
Literal no.
- The above tables show sample contents after processing Program 2.4.
- Processing of an assembly statement begins with the processing of its label field. If it contains a symbol, the symbol & the value in the LC are copied into a new entry of SYMTAB.
- Interpretation of the OPTAB entry for the mnemonic is also important in Pass I. The class field indicates whether the mnemonic belongs to the class of imperative statements (IS), declaration statements (DL) or assembler directives (AD).
- LITTAB is used to collect all literals used in the program.
- POOLTAB contains the literal number of the starting literal of each literal pool.
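The interplay of LITTAB and POOLTAB can be sketched with a small class. The class and method names are hypothetical; the sketch assumes one word per literal, searches only the current pool when a literal is referenced, and appends the number of the next free LITTAB entry to POOLTAB whenever a pool is closed, so the last POOLTAB entry acts as an end marker.

```python
class LiteralTables:
    def __init__(self):
        self.littab = []      # entries are [literal, address]
        self.pooltab = [1]    # literal number (1-based) that starts each pool

    def reference(self, literal):
        """Return the literal number, entering the literal in the current pool if new."""
        start = self.pooltab[-1] - 1
        for index in range(start, len(self.littab)):
            if self.littab[index][0] == literal:
                return index + 1
        self.littab.append([literal, None])
        return len(self.littab)

    def allocate_pool(self, lc):
        """Called at LTORG or END: give addresses to the current pool, return the new LC."""
        start = self.pooltab[-1] - 1
        for entry in self.littab[start:]:
            entry[1] = lc
            lc += 1
        if len(self.littab) > start:
            self.pooltab.append(len(self.littab) + 1)   # a new pool starts here
        return lc


# Literal references of Program 2.4, with the LC values taken from its listing.
tables = LiteralTables()
tables.reference("='5'")                # statement 2
tables.reference("='1'")                # statement 6
lc_after_ltorg = tables.allocate_pool(211)   # LTORG
tables.reference("='1'")                # statement 15, belongs to a new pool
lc_after_end = tables.allocate_pool(219)     # END
print(tables.littab)
print(tables.pooltab)
```

Under these assumptions the literals receive the addresses 211, 212 and 219 shown in the listing, and the two pools start at literal numbers 1 and 3.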
2.4.3 Intermediate code forms (used in two pass assembly)
The intermediate code consists of a set of IC units, each IC unit consisting of the following three fields:
1) Address.
2) Representation of the mnemonic opcode.
3) Representation of the operands.
Address | Opcode | Operands
Fig. An IC unit
Mnemonic Field:-
The mnemonic field contains a pair of the form
(statement class, code)
Where, statement class = IS/DL/AD
Code = instruction opcode for an imperative statement, or the code listed below for a declaration statement or an assembler directive.
Declaration statements    Assembler directives
DC  01                    START   01
DS  02                    END     02
                          ORIGIN  03
Thus (AD, 01) stands for START.
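The mapping from a mnemonic to its (statement class, code) pair is a plain table look-up, sketched below. The function name mnemonic_field is hypothetical; the code 05 for LTORG is taken from the Variant II example later in this section.

```python
# Codes for declaration statements and assembler directives.
CLASS_CODES = {
    "DC": ("DL", 1), "DS": ("DL", 2),
    "START": ("AD", 1), "END": ("AD", 2), "ORIGIN": ("AD", 3), "LTORG": ("AD", 5),
}
# Imperative statements use their instruction opcode as the code.
IMPERATIVE_OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4,
                      "MOVEM": 5, "COMP": 6, "BC": 7, "DIV": 8, "READ": 9}


def mnemonic_field(mnemonic):
    """Return the IC mnemonic field as text, e.g. (AD, 01) for START."""
    if mnemonic in CLASS_CODES:
        statement_class, code = CLASS_CODES[mnemonic]
    else:
        statement_class, code = "IS", IMPERATIVE_OPCODES[mnemonic]
    return f"({statement_class}, {code:02d})"


for name in ("START", "MOVER", "DS"):
    print(name, mnemonic_field(name))
```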
2.4.4 Intermediate code for imperative statements
Variant – I
Syntax: (operand class, code)
Where,
operand class is one of C/S/L, standing for constant / symbol / literal.
For a constant, the code field contains the internal representation of the constant itself; for a symbol or a literal, it contains the entry number of the operand in SYMTAB or LITTAB.
A register or condition code operand is represented by a single number, eg. (1) for AREG.
Eg.
The operand of START 200 is represented as (C, 200).
Fig. 2.4.4 shows the intermediate code in Variant I form.
        START  200         (AD,01)  (C,200)
        READ   A           (IS,09)  (S,01)
LOOP    MOVER  AREG, A     (IS,04)  (1)(S,01)
        SUB    AREG, =‘1’  (IS,02)  (1)(L,01)
        BC     GT, LOOP    (IS,07)  (4)(S,02)
        STOP               (IS,00)
A       DS     1           (DL,02)  (C,1)
        LTORG              (AD,05)
        ………..              ………….
Fig 2.4.4 IC: Variant – I
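The Variant I operand encoding can be sketched as follows. The class name VariantOne and its methods are hypothetical. The codes LT=1, GT=4 and ANY=6 follow the examples in these notes; the codes for LE, EQ and GE and the register codes for BREG and DREG are assumptions. For brevity the sketch keeps all literals in one pool.

```python
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


class VariantOne:
    def __init__(self):
        self.symtab = []    # symbol names; the entry number is index + 1
        self.littab = []    # literals; the entry number is index + 1

    @staticmethod
    def _entry(table, name):
        if name not in table:
            table.append(name)
        return table.index(name) + 1

    def define(self, label):
        """Enter a label in SYMTAB when it appears in the label field."""
        return self._entry(self.symtab, label)

    def operand(self, text):
        """Encode one operand in Variant I form."""
        if text in REGISTERS:
            return f"({REGISTERS[text]})"
        if text in CONDITIONS:
            return f"({CONDITIONS[text]})"
        if text.startswith("="):
            return f"(L,{self._entry(self.littab, text):02d})"
        if text.isdigit():
            return f"(C,{text})"
        return f"(S,{self._entry(self.symtab, text):02d})"


ic = VariantOne()
print(ic.operand("200"))                      # START 200
print(ic.operand("A"))                        # READ A
ic.define("LOOP")                             # label of the MOVER statement
print(ic.operand("AREG"), ic.operand("A"))    # MOVER AREG, A
print(ic.operand("AREG"), ic.operand("='1'")) # SUB AREG, ='1'
print(ic.operand("GT"), ic.operand("LOOP"))   # BC GT, LOOP
```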
Variant – II
- In this variant, the operand field is processed only to identify literal references.
- Literals are entered in LITTAB & are represented as (L,m) in the IC.
- Symbolic references in the source statement are not processed at all during Pass I.
        START  200         (AD,01)  (C,200)
        READ   A           (IS,09)  A
LOOP    MOVER  AREG, A     (IS,04)  AREG, A
        SUB    AREG, =‘1’  (IS,02)  AREG, (L,01)
        BC     GT, LOOP    (IS,07)  GT, LOOP
        STOP               (IS,00)
A       DS     1           (DL,02)  (C,1)
        LTORG              (AD,05)
        ……..               ………
Fig. Intermediate code Variant – II
2.4.5 Processing of Declarations & Assembler Directives
In Pass I of the assembler:
i) It is necessary to represent the address of each source statement in the IC.
ii) It is necessary to have an explicit representation of DS statements & assembler directives in the IC.
Eg. Consider the following source program & its intermediate code.
a) SP                     b) IC
        START  200                (AD,01)  (C,200)
AREA1   DS     20         200)    (DL,02)  (C,20)
SIZE    DC     5          220)    (DL,01)  (C,5)
- If a DC statement defines many constants, eg. DC ‘5,3,-7’, then a series of (DL,01) units can be put in the IC.
- LTORG: when it appears in the source program, the assembler assigns memory addresses to the literals in the current pool. These addresses are entered in their LITTAB entries.
2.4.6 Pass II of the assembler
Following is the algorithm for Pass II of the assembler.
Algorithm.
1) code_area_address := address of code_area;
   pooltab_ptr := 1;
   loc_cntr := 0;
2) While the next statement is not an END statement
   a) Clear machine_code_buffer;
   b) If an LTORG statement
      i) Process the literals in LITTAB[POOLTAB[pooltab_ptr]] … LITTAB[POOLTAB[pooltab_ptr + 1] – 1] similar to the processing of constants in a DC statement, i.e. assemble the literals in machine_code_buffer;
      ii) size := size of the memory area required for the literals;
      iii) pooltab_ptr := pooltab_ptr + 1;
   c) If a START or ORIGIN statement then
      i) loc_cntr := value specified in the operand field;
      ii) size := 0;
   d) If a declaration statement
      i) If a DC statement then assemble the constant in machine_code_buffer;
      ii) size := size of the memory area required by the DC/DS;
   e) If an imperative statement
      i) …………………………………………….. ……………… ..
      ii) Assemble the instruction in machine_code_buffer;
      iii) size := size of the instruction;
   f) If size ≠ 0 then
      i) Move the contents of machine_code_buffer to the address code_area_address + loc_cntr;
      ii) loc_cntr := loc_cntr + size;
3) Processing of the END statement
   a) Perform steps 2(b) & 2(f).
   b) Write code_area into the output file.
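The Pass II algorithm can be sketched in Python under several simplifying assumptions: each IC unit is a hypothetical tuple (class, code, register, operand) in the spirit of Variant I, SYMTAB is a list of addresses indexed by entry number, LITTAB is a list of (value, address) pairs, POOLTAB ends with the number of the next free LITTAB entry, ORIGIN takes a constant, and every instruction, constant and literal occupies one word. The code area is modelled as a dictionary from address to machine word instead of being written to a file.

```python
def pass_two(ic, symtab, littab, pooltab):
    code_area = {}        # address -> (opcode, register, operand)
    loc_cntr = 0
    pooltab_ptr = 0       # 0-based index into pooltab

    def address_of(operand):
        kind, value = operand
        if kind == "S":
            return symtab[value - 1]       # SYMTAB entry number -> address
        if kind == "L":
            return littab[value - 1][1]    # LITTAB entry number -> address
        return value                       # "C": the constant itself

    for statement_class, code, register, operand in ic:
        buffer = []                        # step 2(a)
        size = 0
        is_end = statement_class == "AD" and code == 2
        if statement_class == "AD" and code in (2, 5):      # END or LTORG: step 2(b)
            if pooltab_ptr + 1 < len(pooltab):
                first, last = pooltab[pooltab_ptr], pooltab[pooltab_ptr + 1]
                buffer = [(0, 0, littab[n - 1][0]) for n in range(first, last)]
                size = len(buffer)
                pooltab_ptr += 1
        elif statement_class == "AD":                       # START or ORIGIN: step 2(c)
            loc_cntr = address_of(operand)
        elif statement_class == "DL":                       # step 2(d)
            if code == 1:                                   # DC: assemble the constant
                buffer = [(0, 0, operand[1])]
                size = 1
            else:                                           # DS: only reserve space
                size = operand[1]
        else:                                               # imperative: step 2(e)
            buffer = [(code, register, address_of(operand) if operand else 0)]
            size = 1
        for offset, word in enumerate(buffer):              # step 2(f)
            code_area[loc_cntr + offset] = word
        loc_cntr += size
        if is_end:
            break
    return code_area


# START 200 / MOVER AREG,='5' / ADD AREG,A / STOP / A DS 1 / END
ic = [
    ("AD", 1, None, ("C", 200)),
    ("IS", 4, 1, ("L", 1)),
    ("IS", 1, 1, ("S", 1)),
    ("IS", 0, 0, None),
    ("DL", 2, None, ("C", 1)),
    ("AD", 2, None, None),
]
result = pass_two(ic, symtab=[203], littab=[(5, 204)], pooltab=[1, 2])
print(result)
```

The operand look-up for imperative statements in this sketch is an illustrative choice and is not taken from the algorithm text.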
2.4.7 Listing and Error Reporting:
- The design of an error indication scheme involves some decisions which influence the
i) effectiveness of error reporting,
ii) speed of the assembler,
iii) memory requirements of the assembler.
- The basic decision is whether to produce the program listing & error reports in Pass I or delay these actions until Pass II.
- The advantage of producing the listing in Pass I is that the source program need not be preserved till Pass II.
- This conserves memory & avoids duplicate processing.
2.4.8 Some Organizational Issues:-
We discuss some organizational issues in assembler design, like the placement of & access to tables & the IC, with respect to the schematic shown in fig 2.4.8.
Fig. 2.4.8 Data structures & files in a two pass assembler.
TABLES:
1) For efficiency reasons, SYMTAB must remain in main memory throughout Passes I & II.
2) LITTAB is not accessed as frequently, hence if memory is at a premium it is possible to hold only part of it in memory.
3) OPTAB should be in memory during Pass I.
SOURCE PROGRAM & IC:-
- The SP would be read by Pass I on a statement by statement basis.
- After processing, a source statement can be written into a file for subsequent use in Pass II.
- The IC generated for it would also be written into another file.
- The TP & the program listing can be written out as separate files by Pass II.
- Since all these files are sequential in nature, it is beneficial to use appropriate blocking & buffering of records.
2.5 A Single Pass Assembler for IBM PC
- In this assembler, the problem of forward references is handled using segment based addressing.
Problems of single pass assembly:
- A single pass assembler for the Intel 8088 shares some problems with other single pass assemblers, eg. problems in assembling forward references & in error reporting.
- We know that whenever there is a forward reference, an entry for it is made in the TII.
- When the symbol’s definition is encountered, this entry is analysed to complete the instruction. However, the use of a symbolic name as the destination in a branch instruction gives rise to a peculiar problem.
- Some generic branch opcodes, like JMP in the 8088 assembly language, can give rise to instructions of different formats & different lengths depending on whether the jump is near or far.
- However, this would not be known until sometime later in the assembly process. This problem is solved by assembling such an instruction in its longer form unless the programmer indicates a short displacement.
Eg. JMP SHORT LOOP
Design of the Assembler:
1) Forward references:
- Information concerning forward references to a symbol is organized in the form of a linked list.
- Thus, the forward reference table (FRT) contains a set of linked lists.
- The FRT pointer field of a SYMTAB entry points to the head of this list.
- For efficiency reasons, new entries are added at the beginning of the list.
2) Cross references:
- The assembler uses a cross-reference table (CRT) to collect the information concerning references to all symbols in the program.
- Each SYMTAB entry points to the head & tail of its linked list in the CRT.
- Here, new entries are added at the end of the list.
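The two list disciplines can be sketched with a singly linked list per symbol. The class and field names are hypothetical, and the payload stored in each node (an instruction address for the FRT, a statement number for the CRT) is an assumption. Insertion at the head of the FRT list needs only the head pointer; the tail pointer is what makes appending to the CRT list equally cheap while keeping references in source order.

```python
class Node:
    def __init__(self, info, next_node=None):
        self.info = info
        self.next = next_node


class SymtabEntry:
    def __init__(self, name):
        self.name = name
        self.address = None
        self.frt_head = None     # head of the forward reference list
        self.crt_head = None     # head of the cross-reference list
        self.crt_tail = None     # tail of the cross-reference list

    def add_forward_reference(self, instruction_address):
        # New FRT entries go at the beginning of the list.
        self.frt_head = Node(instruction_address, self.frt_head)

    def add_cross_reference(self, statement_number):
        # New CRT entries go at the end of the list.
        node = Node(statement_number)
        if self.crt_tail is None:
            self.crt_head = node
        else:
            self.crt_tail.next = node
        self.crt_tail = node


def to_list(head):
    items = []
    while head is not None:
        items.append(head.info)
        head = head.next
    return items


entry = SymtabEntry("LOOP")
for address in (10, 25):
    entry.add_forward_reference(address)
for statement in (3, 7, 12):
    entry.add_cross_reference(statement)
print(to_list(entry.frt_head))
print(to_list(entry.crt_head))
```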
