AC CourseWork MihaiTrofim

Ministry of Education of Republic of Moldova

Technical University of Moldova


Faculty of Computers, Informatics and Microelectronics
Anglophone Department

Course Work
Computer Architecture
Topic: Floating point multiplication (algorithm nr.1)

Performed by:

st. gr. FAF-141 (l. eng.)

Trofim Mihai

Verified by:

dr. conf. univ.

Sudacevschi Viorica

Chișinău 2016

Content
Introduction
1. Central Processing Unit
1.1 CPU basics
1.2 The register set
1.3 Instruction cycle
2. I8086 microprocessor architecture
2.1 Execution Unit
2.2 Bus Interface Unit
2.3 Registers set of I8086
3. Instruction set architecture
3.1 Instruction Format
3.2 Instruction Types
4. Floating Point Arithmetic
4.1 Floating point numbers
4.2 Floating point numbers range
4.3 IEEE 754: floating point in modern computers
4.4 Floating point multiplication algorithm nr. 1
5. Code explanation
6. Code explanation
8. Conclusion
Appendix

Introduction
A computer consists of a set of physical components (hardware) and system
programs (system software) that are responsible for data processing according to
an algorithm, specified by the user through an application program (application
software).
Computer systems have conventionally been defined through their interfaces
at a number of abstraction levels, each providing functional support to its
predecessor. Included among the levels are the application programs, the high-level
languages, and the set of machine instructions.
In the past, the term computer architecture often referred only to
instruction set design that represents an interface between hardware and the lowest
level software - machine instructions (binary coded programs).
A different definition of computer architecture is built on four basic
viewpoints:
structure (defines the interconnection of various hardware components),
organization (defines the dynamic interplay and management of the various
components),
implementation (defines the detailed design of hardware components),
performance (specifies the behavior of the computer system).

1. Central Processing Unit


1.1 CPU basics
A typical CPU has three major components:
1. register set,
2. arithmetic logic unit (ALU),
3. control unit (CU).
The register set differs from one computer architecture to another. It is
usually a combination of general-purpose and special-purpose registers.
The ALU provides the circuitry needed to perform the arithmetic, logical
and shift operations required by the instruction set. It also generates information
about carry, overflow and other special cases. It consists of combinational logic
circuits (adders, decoders, encoders, multiplexers) and a set of registers (e.g. the
accumulator), used as a fast memory in arithmetic and logic operations.
The control unit is the entity responsible for fetching the instruction to be
executed from the main memory, decoding it, and then executing it.
[Figure: the main components of the CPU and its interactions with the memory
system and the input/output devices]

1.2 The register set


The register set is usually a combination of general-purpose and special
purpose registers.
General-purpose registers can be used for multiple purposes and assigned to
a variety of functions by the programmer. Special-purpose registers are restricted
to only specific functions.
Two main registers are involved in fetching an instruction for execution:

program counter (PC): the register that contains the address of the next
instruction to be fetched. After a successful instruction fetch, the PC is
updated to point to the next instruction to be executed.
instruction register (IR): the register into which the fetched instruction is loaded.
Two registers are essential in memory write and read operations:
memory data register (MDR)
memory address register (MAR).
The MDR and MAR are used exclusively by the CPU and are not directly
accessible to programmers.
In order to perform a write operation into a specified memory location, the
MDR and MAR are used as follows:
1. The word to be stored into the memory location is first loaded by the CPU
into MDR.
2. The address of the location into which the word is to be stored is loaded by
the CPU into the MAR.
3. A write signal is issued by the CPU.
Similarly, to perform a memory read operation, the MDR and MAR are used
as follows:
1. The address of the location from which the word is to be read is loaded into
the MAR.
2. A read signal is issued by the CPU.
3. The required word will be loaded by the memory into the MDR ready for
use by the CPU.
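The write and read sequences above can be sketched as a small simulation. This is only an illustrative model, not a description of any real CPU: the class name `SimpleCPU`, the 16-word memory and the method names are assumptions introduced here.

```python
class SimpleCPU:
    """Toy model of a CPU that reaches memory only through MAR and MDR."""

    def __init__(self, memory_size=16):
        self.memory = [0] * memory_size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def write(self, address, word):
        self.mdr = word                     # 1. word to be stored goes into MDR
        self.mar = address                  # 2. target address goes into MAR
        self.memory[self.mar] = self.mdr    # 3. write signal

    def read(self, address):
        self.mar = address                  # 1. source address goes into MAR
        self.mdr = self.memory[self.mar]    # 2-3. read signal, memory loads MDR
        return self.mdr


cpu = SimpleCPU()
cpu.write(3, 0xBEEF)
print(hex(cpu.read(3)))
```

In this model the programmer-visible interface is only `read` and `write`; the MAR and MDR are internal state, which mirrors the statement that they are not directly accessible to programmers.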
Some architectures contain a special program status word (PSW) register
or a Flag register. The PSW contains bits that are set by the CPU to indicate the
current status of an executing program. These indicators are typically for arithmetic
operations, interrupts, memory protection information, or processor status.

1.3 Instruction cycle


The basic function performed by a computer is execution of a program,
which consists of a set of instructions stored in memory. The CPU reads (fetches)
instructions from memory one at a time and executes each instruction. Program
execution consists of repeating the process of instruction fetch and execution.
The processing required for a single instruction is called an instruction
cycle. It consists of two steps: the fetch cycle and the execute cycle. The duration
of an instruction cycle is a multiple of the clock period.
The fetched instruction is loaded into the IR. The processor interprets a
binary code of the instruction and executes the required action: reads and writes
data from and to memory, and transfers data from and to input/output devices.
A typical and simple instruction cycle can be summarized as follows:
1. Instruction address calculation: determine the address of the next
instruction to be executed by adding a fixed number to the address of the
previous instruction in PC.
2. Instruction fetch: Read the instruction from its memory location and store
it into IR.
3. Instruction decoding: analyze instruction to determine type of operation to
be performed and operands to be used.
4. Operands address calculation, if needed.
5. Operand fetch: fetch the operand from memory and store it in CPU
registers, if needed.
6. Instruction execution.
7. Results store: results are transferred from CPU registers to memory, if
needed.
The instruction cycle is repeated as long as there are more instructions to
execute.
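The cycle summarized above can be illustrated with a toy accumulator machine. The four-instruction set (`LOAD`, `ADD`, `STORE`, `HALT`) and the function name `run` are hypothetical and chosen only for this sketch; they are not 8086 instructions.

```python
def run(program, memory):
    """Repeat the fetch/decode/execute cycle until HALT or the program ends."""
    pc = 0    # program counter
    acc = 0   # accumulator
    while pc < len(program):
        ir = program[pc]             # instruction fetch into IR
        pc += 1                      # address of the next instruction
        opcode, operand = ir         # instruction decoding
        if opcode == "LOAD":         # operand fetch and execution
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":      # result store
            memory[operand] = acc
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory


data = [2, 3, 0]
run([("LOAD", 0), ("ADD", 1), ("STORE", 2), ("HALT", None)], data)
print(data[2])
```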
A check for pending interrupts is usually included in the cycle. Examples of
interrupts include I/O device request, arithmetic overflow, division by zero, etc.
Interrupts are provided primarily as a way to improve processing efficiency. For
example, most external devices are much slower than a processor. With interrupts,
the processor can be engaged in executing other instructions while an I/O operation
is in progress.
To accommodate interrupts, an interrupt cycle is added to the instruction
cycle. In the interrupt cycle, the processor checks to see if any interrupts have
occurred. If no interrupts are pending, the processor proceeds to the fetch cycle for
the next instruction. If an interrupt is pending, the processor suspends execution of
the current program, saves the address of the next instruction and relevant data.
Then it sets the PC to the starting address of an interrupt handler routine.
The actions of the CPU during an instruction cycle are defined by micro-orders
issued by the control unit. These micro-orders are individual control signals
sent over dedicated control lines.
2. I8086 microprocessor architecture


The I8086 microprocessor architecture consists of two sections:
execution unit (EU)
bus interface unit (BIU)
These two sections work simultaneously. The BIU accesses memory and
peripherals while the EU executes the instructions previously fetched. In this
way Intel implemented a simple form of pipelining, which allows the CPU to
fetch and execute at the same time.
Pipelining only works if the BIU keeps ahead of the EU. For this purpose the BIU
has a 6-byte instruction queue. If the execution of an instruction takes too long,
the queue fills to its maximum capacity and the buses stay idle. The BIU
starts to fetch again whenever there is room for 2 bytes in the queue.
When a jump instruction is executed, the microprocessor must flush the
queue, and the BIU starts to fetch instructions from the new location in
memory. In this situation the EU must wait until the BIU has fetched
the new instruction. This is known as the branch penalty.

2.1 Execution Unit


The Execution Unit executes all instructions, provides data and addresses to
the Bus Interface Unit and manipulates the general registers and the Processor
Status Word (Flags register).
The 16-bit ALU performs arithmetic and logic operations, controls the flags, and
manipulates the general registers and instruction operands.

The Execution Unit does not connect directly to the system bus. It obtains
instructions from a queue maintained by the Bus Interface Unit. When an
instruction requires access to memory or a peripheral device, the Execution Unit
requests the Bus Interface Unit to read and write data.

2.2 Bus Interface Unit


The Bus Interface Unit facilitates communication between the EU and
memory or I/O circuits. It is responsible for transmitting address, data, and control
signals on the buses. This unit consists of the segment registers, the Instruction
Pointer, internal communication registers, a logic circuit that generates a 20-bit
address, bus control logic that multiplexes the data and address lines, and the
instruction code queue (6 bytes of RAM).

2.3 Registers set of I8086


1. General Purpose Registers
The CPU has eight 16-bit general registers. The general registers are
subdivided into two sets of four registers. These sets are the data registers (also
called the H & L group for high and low) and the pointer and index registers (also
called the P & I group).

The data registers can be addressed by their upper or lower halves. Each data
register can be used interchangeably as a 16-bit register or two 8-bit registers. The
pointer and index registers are always accessed as 16-bit values. The microprocessor can use
data registers without constraint in most arithmetic and logic operations.
Arithmetic and logic operations can also use the pointer and index registers. Some
instructions use certain registers implicitly allowing compact encoding.
SP - Stack Pointer: always points to the top item of the stack.
BP - Base Pointer: used to access any item in the stack.
SI - Source Index: contains the address of the current element in the source string.
DI - Destination Index: contains the address of the current element in the
destination string.

2. Segment registers
The 8086 microprocessor has a 20-bit address bus, which addresses 1 Mbyte of
external memory, but the registers inside the CPU have only 16 bits and can
therefore address only 64 Kbytes. For this reason the 8086 memory space is
divided into logical segments of up to 64 Kbytes each. The segment registers
contain the base addresses (starting locations) of these memory segments.
CS (code segment): points at the segment containing the current program.
DS (data segment): generally points at the segment where variables are defined.
ES (extra segment): an extra segment register; its usage is defined by the programmer.
SS (stack segment): points at the segment containing the stack.
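How a 16-bit segment base and a 16-bit offset yield a 20-bit address can be shown with a short sketch. The formula used (segment shifted left by 4 bits, plus offset) is the standard 8086 real-mode rule and is not spelled out in the text above; the function name `physical_address` is an assumption made for illustration.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    # Shifting the segment left by 4 bits multiplies it by 16;
    # the mask keeps only the 20 address lines of the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Because the segment is only shifted by 4 bits, many different segment:offset pairs map to the same physical address, and consecutive segments overlap.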
3. Special purpose registers
IP - the instruction pointer or program counter: always points to the next
instruction to be executed. It contains the offset (displacement) of the next
instruction from the start address of the code segment.
Flags Register - determines the current state of the processor. It is also
called the PSW (processor status word). Only 9 of its 16 bits are used. The Flags
Register is modified automatically by the CPU after arithmetic and logic operations,
which makes it possible to determine the type of the result and to set conditions
for transferring control to other parts of the program. Generally these registers
cannot be accessed directly.
All flags can be divided into condition (status) flags and control (system) flags.
Condition flags:
Bit 0 - Carry Flag (CF) - set to 1 when there is a carry out of (or a borrow
into) the most significant bit of an 8- or 16-bit addition or subtraction. For
example, when you add the bytes 255 + 1 (the result is not in the range 0...255).
When there is no carry or borrow this flag is set to 0. It also receives the
last bit shifted out in shift operations.
Bit 2 - Parity Flag (PF) - set to 1 when there is an even number of
one bits in the result, and to 0 when there is an odd number of one bits. Even if
the result is a word, only the 8 low bits are analyzed.
Bit 4 - Auxiliary Flag (AF) - set to 1 when there is an unsigned overflow
(carry or borrow) of the low nibble (4 bits).
Bit 6 - Zero Flag (ZF) - set to 1 when the result is zero. For a nonzero result
this flag is set to 0.
Bit 7 - Sign Flag (SF) - set to 1 when the result is negative. When the result is
positive it is set to 0. This flag takes the value of the most significant
bit.
Bit 11 - Overflow Flag (OF) - set to 1 when there is a signed overflow. For
example, when you add the bytes 100 + 50 (the result is not in the range -128...127).
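The condition flags can be illustrated by modelling an 8-bit addition. This is a minimal sketch, not the processor's actual circuitry; the function name `add8` and the dictionary layout are assumptions made for the example.

```python
def add8(a, b):
    """Add two bytes and return the 8-bit result with the condition flags."""
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                          # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),      # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),       # carry out of low nibble
        "ZF": int(result == 0),
        "SF": result >> 7,                               # copy of the MSB
        # signed overflow: both operands differ in sign from the result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags


print(add8(255, 1))
print(add8(100, 50))
```

The two calls reproduce the examples from the list: 255 + 1 sets CF (and ZF, since the 8-bit result is zero), while 100 + 50 sets OF and SF because 150 does not fit in the signed range.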
Control flags:
Bit 8 - Trap Flag (TF), system flag - used for on-chip debugging (single-step
mode) when TF=1. In this case an interrupt is generated (int 1) after each
instruction, which calls a special routine to show the state of the internal
registers. There are no instructions that change this flag directly; to change it,
the content of the PSW is copied into a general register through the stack,
modified, and written back.
Bit 9 - Interrupt enable Flag (IF), system flag - when this flag is set to 1 the
CPU reacts to interrupts from external devices on the INTR input of the
microprocessor. When IF=0 these interrupts are not allowed (masked). IF does not
affect NMI (non-maskable) interrupts or internal interrupts raised by the
INT instruction. The instructions CLI (clear interrupt) and STI (set interrupt) are
used to control this flag.
Bit 10 - Direction Flag (DF) - used by the instructions that process data
strings. When this flag is set to 0 the processing is done forward (SI and DI
are incremented); when it is set to 1 the processing is done backward (SI and
DI are decremented). The instructions CLD and STD clear and set this flag.

3. Instruction set architecture


The instruction set architecture (ISA) includes:
instruction set in a binary code (machine language) that is recognized
by a processor;
data types with which instructions can operate;
environment in which instructions operate.

Technically, CPUs come in two main architectures:


CISC (Complex Instruction-Set Computing)
RISC (Reduced Instruction-Set Computing).
CISC chips (Motorola 68k and Intel x86 architectures) provide a large set of
complex built-in instructions, at some cost in per-instruction speed. RISC chips
(PowerPC, ARM, SPARC) provide fewer, simpler instructions, each of which can
typically be executed faster.
A computer program can be represented at different levels of abstraction. A
program could be written in a machine-independent, high-level language such as
Java or C++.
A computer can execute programs only when they are represented in
machine language specific to its architecture.
A machine language program for a given architecture is a collection of
machine instructions represented in binary form that are recognised by the Control
Unit (CU). According to this binary code, the CU selects a certain state-transition
algorithm and generates control signals for the ALU and registers. The algorithm can be
microprogrammed or hardwired.
Programs written at any level higher than the machine language must be
translated to the binary representation before a computer can execute them.
An assembly language program is a symbolic representation of the
machine language program.
Converting the symbolic representation into machine language is performed
by a special program called the assembler.
Although high-level languages and compiler technology have witnessed
great advances over the years, assembly language remains necessary in some cases.
Programming in assembly can result in machine code that is much smaller
and much faster than that generated by a compiler of a high-level language. Small
and fast code could be critical in some embedded and portable applications, where
resources may be very limited. In such cases, small portions of the program that
may be heavily used can be written in assembly language.
Assembly programmers have access to all the hardware features of the target
machine that might not be accessible to high-level language programmers.
Learning assembly languages can be of great help in understanding the low
level details of computer organization and architecture.
Machine language is the native language of a given processor. Since
assembly language is the symbolic form of machine language, each different type
of processor has its own unique assembly language. Before we study the assembly
language of a given processor, we need first to understand the details of that
processor. We need to know the memory size and organization, the processor
registers, the instruction format, and the entire instruction set.

3.1 Instruction Format


Assembly language is the symbolic form of machine language. Assembly
programs are written with short abbreviations, called mnemonics, that represent
the actual machine instructions.
Mnemonics are more meaningful than hex or binary values,
which makes programming at this low level easier and more manageable.
Examples: mov - move, add - addition, sub - subtraction, mul - multiplication.
An assembly program consists of a sequence of assembly statements, where
statements are written one per line. Each line of an assembly program is split into
the following four fields: label, operation code (opcode), operand, and comments.

Labels are used to provide symbolic names for memory addresses. A label is
an identifier that can be used on a program line in order to branch to the labeled
line. It can also be used to access data using symbolic names. The operation code
(opcode) field contains the symbolic abbreviation of a given operation. The
operand field consists of additional information or data that the opcode requires.
The operand field may be used to specify constant, label, immediate data, register,
or a memory address. The comments field provides a space for documentation to
explain what has been done for the purpose of debugging and maintenance. In
the I8086, an instruction consists of one to six bytes.
According to the length of the instructions, there are two types of ISA:
1. With fixed length instructions (commonly used in RISC architectures)
2. With variable length instructions (commonly used in CISC architectures)
The advantage of using variable length instructions is that they reduce the
amount of memory space required for a program. In the I8086, instructions are from
one byte to a maximum of 6 bytes in length.
The advantage of fixed length instructions is that they make the job of
fetching and decoding instructions easier and more efficient, which means that they
can be executed in less time than the corresponding variable length instructions.
Instructions can be classified based on the number of operands as: three-address,
two-address, one-address, and zero-address.
Examples:

Three-address instruction formats are not common, because they require a
relatively long space to hold all addresses.
In a two-address instruction, one address specifies both an operand and the result.
In a one-address instruction, the second address is implicit. Usually it is the
accumulator AX, which is used for one operand and the result.
Zero-address instructions are applicable to stack memory and use as address
the content of SP (top of the stack).
The number of addresses per instruction is a basic design decision. Fewer
addresses per instruction result in more primitive instructions, which require a less
complex CPU. It also results in instructions of shorter length. On the other hand,
programs contain more total instructions and have a longer execution time.
Another problem: with one-address instructions, the programmer has available
only one general-purpose register, the accumulator, whereas with multiple-address
instructions it is common to have multiple general-purpose registers. Because
register references are faster than memory references, this speeds up execution.
Most contemporary machines employ a mixture of two- and three-address
instructions.

3.2 Instruction Types


The X86 family of processors defines a number of instruction types.
I. Data transfer instructions


1. General-purpose data transfer
MOV dst, src: (dst) ← (src) copies the second operand to the first
operand.
XCHG dst, src: (dst) ↔ (src) exchanges bytes or words.
2. Data transfer with stack
PUSH src: copy specified word to top of stack.
POP dst: copy word from top of stack to specified location.
3. Flag transfer
PUSHF: Copy flag register to top of stack.
POPF: Copy word at top of stack to flag register.
LAHF: Load AH with the low byte of the flag register. No operands.
SAHF: Store AH register into low 8 bits of Flags register. No operands.
4. Address transfer
LEA reg, src: Load effective address of operand in specified register.
LDS reg, src: Load DS register and other specified register from
memory.
LES reg, src: Load ES register and other specified register from
memory.
5. I/O port transfer
IN ac, port: Copy a byte or word from specified port to accumulator
IN ac, DX
OUT port, ac: Copy a byte or word from accumulator to specified port.
OUT DX, ac
II. Arithmetic instructions
Arithmetic operations are executed on integer numbers in 4 formats:
unsigned binary (byte or word): 5h - 0000 0101
signed binary (byte or word): -5h, i.e. FBh - 1111 1011
packed decimal (the decimal digits are stored in consecutive 4-bit
groups: 3251 - 0011 0010 0101 0001)
unpacked decimal (each digit is stored in the low 4-bit part of a byte: 3251 - ****0011 ****0010 ****0101 ****0001)
All arithmetic instructions influence flags that can be checked with
conditional transfer instructions.
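The packed and unpacked decimal formats listed above can be illustrated with a short sketch. The helper names are hypothetical, and the high nibble of each unpacked byte (shown as **** in the text) is simply set to zero here, which is an assumption of this example.

```python
def to_packed_bcd(number):
    """Store two decimal digits per byte, one per 4-bit group."""
    digits = str(number)
    if len(digits) % 2:              # pad to an even number of digits
        digits = "0" + digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def to_unpacked_bcd(number):
    """Store one decimal digit per byte, in the low 4 bits."""
    return bytes(int(d) for d in str(number))


print(to_packed_bcd(3251).hex())    # 3251
print(to_unpacked_bcd(3251).hex())  # 03020501
```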
Arithmetic operations can use all addressing modes, but the two operands cannot
both be memory locations.
ADD dst, src: dst ← (dst) + (src). src can also be an immediate value of 8 or 16
bits.
ADC dst, src: dst ← (dst) + (src) + CF. It is used in multiple-precision
operations.
SUB dst, src: dst ← (dst) - (src). Subtracts byte from byte or word from word.
SBB dst, src: dst ← (dst) - (src) - CF.
INC opr: opr ← (opr) + 1. Does not change CF.
DEC opr: opr ← (opr) - 1.
NEG opr: opr ← -(opr). Negates the operand: inverts each bit of the specified byte
or word and adds 1 (forms the 2's complement).
CMP opr1, opr2: compares two specified bytes or two specified words without
keeping the result; only the flags (OF, SF, ZF, AF, PF, CF) are set according to
the result. It is used with conditional jump instructions.
CBW: (no operands, for signed binary) converts a byte to a word. If the high bit of AL is
0 then all AH bits are set to 0; if the high bit of AL is 1 then all AH bits are set to 1.
CWD: (no operands) converts a word to a double word. Works with AX and DX (high word).
MUL src: (AX) ← (AL) * (src) for bytes; CF and OF = 1 if the high byte is not 0.
(DX : AX) ← (AX) * (src) for words; CF and OF = 1 if the high word is not 0.
IMUL src: multiplies signed byte by byte or signed word by word. CF and OF = 1
if the high half of the result is not the sign extension of the low half.
DIV src:
divisor is a byte
(AL) ← quotient of (AX) / (src)
(AH) ← remainder of (AX) / (src)
divisor is a word
(AX) ← quotient of (DX : AX) / (src)
(DX) ← remainder of (DX : AX) / (src)
IDIV src: divides a signed word by a byte or a signed double word by a word.
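The byte forms of CBW, MUL and DIV described above can be modelled as follows. This is a sketch of the register-level effects only; the function names are assumptions, and the real instructions work on the AL, AH and AX registers rather than on function arguments.

```python
def cbw(al):
    """Sign-extend AL into AX: AH becomes all 0s or all 1s."""
    return (0xFF00 | al) if al & 0x80 else al


def mul8(al, src):
    """Unsigned byte multiply: returns (AX, CF). OF takes the same value as CF."""
    ax = (al * src) & 0xFFFF
    carry = int((ax >> 8) != 0)     # set when the high byte is not 0
    return ax, carry


def div8(ax, src):
    """Unsigned divide of AX by a byte: returns (AL, AH) = (quotient, remainder)."""
    quotient, remainder = divmod(ax, src)
    if quotient > 0xFF:
        # on the 8086 a quotient that does not fit raises a divide-error interrupt
        raise OverflowError("quotient does not fit in AL")
    return quotient, remainder


print(hex(cbw(0x80)), mul8(16, 16), div8(100, 7))
```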

4. Floating Point Arithmetic


4.1 Floating point numbers

In computing, floating point is the formulaic representation which approximates a
real number so as to support a trade-off between range and precision. A number is,
in general, represented approximately to a fixed number of significant digits (the
significand) and scaled using an exponent; the base for the scaling is normally two,
ten, or sixteen. A number that can be represented exactly is of the following form:
$\text{significand} \times \text{base}^{\text{exponent}}$
where the significand and the exponent are integers and the base is an integer
greater than or equal to two.
The term floating point refers to the fact that a number's radix point (decimal point,
or, more commonly in computers, binary point) can "float"; that is, it can be placed
anywhere relative to the significant digits of the number. This position is indicated
as the exponent component, and thus the floating-point representation can be
thought of as a kind of scientific notation.

A floating-point system can be used to represent, with a fixed number of digits,
numbers of different orders of magnitude: e.g. the distance between galaxies or the
diameter of an atomic nucleus can be expressed with the same unit of length. The
result of this dynamic range is that the numbers that can be represented are not
uniformly spaced; the difference between two consecutive representable numbers
grows with the chosen scale.

Over the years, a variety of floating-point representations have been used in
computers. However, since the 1990s, the most commonly encountered
representation is that defined by the IEEE 754 Standard.
The speed of floating-point operations, commonly measured in terms of
FLOPS, is an important characteristic of a computer system, especially for
applications that involve intensive mathematical calculations.

4.2 Floating point numbers range


A floating-point number consists of two fixed-point components, whose range
depends exclusively on the number of bits or digits in their representation.
Whereas the components depend linearly on their range, the floating-point range
depends linearly on the significand range and exponentially on the range of the
exponent component, which gives the number an outstandingly wider range.
On a typical computer system, a 'double precision' (64-bit) binary floating-point
number has a coefficient of 53 bits (one of which is implied), an exponent of 11
bits, and one sign bit. Positive floating-point numbers in this format have an
approximate range of $10^{-308}$ to $10^{308}$, because the range of the exponent is [−1022,
1023] and 308 is approximately $\log_{10}(2^{1023})$. The complete range of the format is
from about $-10^{308}$ through $+10^{308}$.
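These figures can be checked on a machine whose Python `float` is an IEEE 754 double, which is an assumption of this sketch (it holds on common platforms, but the language does not guarantee it).

```python
import math
import sys

info = sys.float_info
print(info.mant_dig)               # significand precision in bits, implied bit included
print(info.max)                    # largest finite double, about 1.8e308
print(info.min)                    # smallest positive normalized double, about 2.2e-308
print(math.log10(2.0 ** 1023))     # close to 308
```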
The number of normalized floating-point numbers in a system F(B, P, L, U)
(where B is the base of the system, P is the precision of the system in digits, L
is the smallest exponent representable in the system, and U is the largest exponent
used in the system) is:
$2 (B - 1) B^{P-1} (U - L + 1)$.
There is a smallest positive normalized floating-point number, Underflow level =
UFL = $B^{L}$,
which has a 1 as the leading digit and 0 for the remaining digits of the
significand, and the smallest possible value for the exponent.
There is a largest floating-point number, Overflow level = OFL =
$(1 - B^{-P}) B^{U+1}$,
which has B − 1 as the value for each digit of the significand
and the largest possible value for the exponent.
In addition there are representable values strictly between −UFL and UFL.
Namely, positive and negative zeros, as well as denormalized numbers.
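The count, UFL and OFL can be checked by brute force on a small system. The sketch below assumes the usual closed forms, count = 2(B − 1)B^(P−1)(U − L + 1), UFL = B^L and OFL = (1 − B^(−P))B^(U+1), and enumerates the toy system F(2, 3, −1, 1), which is chosen here only for illustration.

```python
from fractions import Fraction
from itertools import product


def normalized_values(B, P, L, U):
    """Enumerate every normalized value d0.d1...d(P-1) * B**e with d0 != 0."""
    values = set()
    for exponent in range(L, U + 1):
        for digits in product(range(B), repeat=P):
            if digits[0] == 0:          # not normalized
                continue
            significand = sum(Fraction(d, B ** i) for i, d in enumerate(digits))
            value = significand * Fraction(B) ** exponent
            values.add(value)
            values.add(-value)
    return values


B, P, L, U = 2, 3, -1, 1
values = normalized_values(B, P, L, U)
count = 2 * (B - 1) * B ** (P - 1) * (U - L + 1)
ufl = Fraction(B) ** L
ofl = (1 - Fraction(1, B ** P)) * Fraction(B) ** (U + 1)
print(len(values), count)                       # both are 24
print(min(v for v in values if v > 0), ufl)     # both are 1/2
print(max(values), ofl)                         # both are 7/2
```

Exact rational arithmetic (`Fraction`) is used so that the comparison with the closed forms does not depend on the host machine's own floating-point rounding.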

4.3 IEEE 754: floating point in modern computers


The IEEE has standardized the computer representation for binary floating-point
numbers in IEEE 754 (a.k.a. IEC 60559). This standard is followed by almost all
modern machines. IBM mainframes support IBM's own hexadecimal floating point
format and IEEE 754-2008 decimal floating point in addition to the IEEE 754
binary format. The Cray T90 series had an IEEE version, but the SV1 still uses
Cray floating-point format.
The standard provides for many closely related formats, differing in only a few
details. Five of these formats are called basic formats, and others are
termed extended formats; three of these are especially widely used in computer
hardware and languages:

Single precision, usually used to represent the "float" type in the C language
family (though this is not guaranteed). This is a binary format that occupies 32
bits (4 bytes) and its significand has a precision of 24 bits (about 7 decimal
digits).

Double precision, usually used to represent the "double" type in the C
language family (though this is not guaranteed). This is a binary format that
occupies 64 bits (8 bytes) and its significand has a precision of 53 bits (about
16 decimal digits).

Double extended, also called "extended precision" format. This is a binary
format that occupies at least 79 bits (80 if the hidden/implicit bit rule is not
used) and its significand has a precision of at least 64 bits (about 19 decimal
digits). A format satisfying the minimal requirements (64-bit precision, 15-bit
exponent, thus fitting on 80 bits) is provided by the x86 architecture. In general
on such processors, this format can be used with "long double" in the C
language family (the C99 and C11 standards "IEC 60559 floating-point
arithmetic extension - Annex F" recommend the 80-bit extended format to be
provided as "long double" when available). On other processors, "long double"
may be a synonym for "double" if any form of extended precision is not
available, or may stand for a larger format, such as quadruple precision.

Increasing the precision of the floating point representation generally reduces the
amount of accumulated round-off error caused by intermediate calculations.[8]
Less common IEEE formats include:

Quadruple precision (binary128). This is a binary format that occupies 128
bits (16 bytes) and its significand has a precision of 113 bits (about 34 decimal
digits).

Double precision (decimal64) and quadruple precision (decimal128)
decimal floating-point formats. These formats, along with the single
precision (decimal32) format, are intended for performing decimal rounding
correctly.

Half, also called binary16, a 16-bit floating-point value.

Any integer with absolute value less than $2^{24}$ can be exactly represented in the
single precision format, and any integer with absolute value less than $2^{53}$ can be
exactly represented in the double precision format. Furthermore, a wide range of
powers of 2 times such a number can be represented. These properties are
sometimes used for purely integer data, to get 53-bit integers on platforms that
have double precision floats but only 32-bit integers.
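The limits of 2^24 and 2^53 can be observed directly. The sketch assumes that Python's `float` is an IEEE 754 double and uses the standard `struct` module to round a value to single precision; the helper name `to_single` is introduced here for illustration.

```python
import struct


def to_single(x):
    """Round a number to IEEE 754 single precision and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


print(to_single(2**24 - 1) == 2**24 - 1)   # True: still exact
print(to_single(2**24 + 1) == 2**24 + 1)   # False: rounded to 2**24
print(float(2**53 - 1) == 2**53 - 1)       # True: still exact
print(float(2**53 + 1) == 2**53 + 1)       # False: rounded to 2**53
```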
The standard specifies some special values, and their representation:
positive infinity (+∞), negative infinity (−∞), a negative zero (−0) distinct from
ordinary ("positive") zero, and "not a number" values (NaNs).
Comparison of floating-point numbers, as defined by the IEEE standard, is a bit
different from usual integer comparison. Negative and positive zero compare
equal, and every NaN compares unequal to every value, including itself. All values
except NaN are strictly smaller than +∞ and strictly greater than −∞. Finite
floating-point numbers are ordered in the same way as their values (in the set of
real numbers).
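These comparison rules can be observed in any language with IEEE 754 floats. A short check in Python, assuming its `float` follows the standard:

```python
import math

nan = float("nan")
inf = math.inf

print(nan == nan)                   # False: NaN is unequal even to itself
print(0.0 == -0.0)                  # True: the two zeros compare equal
print(math.copysign(1.0, -0.0))     # -1.0: the sign of negative zero is still stored
print(-inf < -1e308 < 1e308 < inf)  # True: finite values lie between the infinities
```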
A project for revising the IEEE 754 standard was started in 2000; it was
completed and approved in June 2008. The revision includes decimal
floating-point formats and a 16-bit floating-point format ("binary16"). binary16 has
the same structure and rules as the older formats, with 1 sign bit, 5 exponent bits
and 10 trailing significand bits. It is used in the NVIDIA Cg graphics
language and in the OpenEXR standard.
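The 1/5/10 bit layout of binary16 can be inspected with a short Python sketch. It assumes Python 3.6 or later, where the struct module supports the "e" format code for half precision; the function name half_fields is hypothetical.

```python
import struct

def half_fields(value):
    """Return (sign, exponent, trailing significand) of value encoded as binary16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", value))
    sign = bits >> 15               # 1 sign bit
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits, stored with a bias of 15
    fraction = bits & 0x3FF         # 10 trailing significand bits
    return sign, exponent, fraction

print(half_fields(1.0))   # (0, 15, 0)
print(half_fields(-2.5))  # (1, 16, 256)
```

For -2.5 = -1.25 x 2^1 the stored exponent is 1 + 15 = 16 and the trailing significand is 0.25 x 1024 = 256.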

4.4 Floating point multiplication algorithm nr. 1


First we must declare our floating point numbers in terms of mantissa and exponent.
The operands are
x = mx · 2^ex
y = my · 2^ey
and the result is stored in
z = mz · 2^ez
Here are the steps of the algorithm:

1. Determination of the sign of the result
sign(mz) = sign(mx) ⊕ sign(my)
As the 8086 processor sees the numbers in two's complement code, the sign
of a number is its MSB (Most Significant Bit).
2. Find the modules (absolute values) of the mantissas
If a number is positive, its module remains unchanged. If it is negative (MSB
= 1), its module is obtained by taking the two's complement (CC): invert all
bits and add 1 to the number.
if( sign(mx) = 0 ) | mx | = mx
if( sign(mx) = 1 ) | mx | = neg( mx )
neg is an assembly instruction which performs the two's complement conversion.

3. Find the result's exponent
ez = ex + ey, but this may not be the final exponent, as the product might be
denormalized; in that case it is adjusted in step 5.
4. Multiplication of modules (1st algorithm)
o Allocate double-width storage for an adder, which will hold the result's
mantissa, and for the multiplicand mx
o Check the LSB of my
if LSB = 0 => shift my to the right, shift mx to the left
if LSB = 1 => add mx to the adder, then shift my to the right and shift mx
to the left
o Check my
If it is 0 (all the initial bits have been shifted out) => quit the
algorithm
If it is not zero => return to the 2nd sub-step (check the LSB of my)

5. Result normalization
Shift mz to the left as many times as its MSB is repeated, decrementing the
exponent on each shift:
if ( mz = 1.1... or mz = 0.0... ) mz = mz << 1 and ez = ez - 1

6. Assign to result the sign calculated at the beginning of the algorithm
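The six steps can be sketched compactly in Python. This is an illustration under stated assumptions, not the program from the appendix: mantissas and exponents are taken as 8-bit two's complement patterns, the product mantissa is 16 bits wide, and the function names (to_signed8, shift_add_multiply, normalize, fp_multiply) are introduced here for the example.

```python
def to_signed8(byte):
    """Interpret an 8-bit pattern as a two's complement integer."""
    return byte - 0x100 if byte & 0x80 else byte

def shift_add_multiply(mx, my):
    """Step 4: multiply two unsigned 8-bit values using shifts and additions."""
    adder = 0                      # double-width accumulator
    while my != 0:
        if my & 1:                 # LSB of the multiplier is 1
            adder += mx            # add the shifted multiplicand
        mx <<= 1                   # shift the multiplicand left
        my >>= 1                   # shift the multiplier right
    return adder & 0xFFFF

def normalize(mz, ez):
    """Step 5: shift left while the two leading bits of the 16-bit mz are equal."""
    while mz != 0 and ((mz >> 15) & 1) == ((mz >> 14) & 1):
        mz = (mz << 1) & 0xFFFF
        ez = (ez - 1) & 0xFF
    return mz, ez

def fp_multiply(mx, ex, my, ey):
    """Multiply (mx, ex) by (my, ey); returns the 16-bit mz and the 8-bit ez."""
    sign = ((mx ^ my) >> 7) & 1                 # step 1: XOR of the sign bits
    abs_mx = abs(to_signed8(mx))                # step 2: modules
    abs_my = abs(to_signed8(my))
    ez = (ex + ey) & 0xFF                       # step 3: exponent sum
    mz = shift_add_multiply(abs_mx, abs_my)     # step 4: shift-and-add
    mz, ez = normalize(mz, ez)                  # step 5: normalization
    if sign:                                    # step 6: restore the sign
        mz = (-mz) & 0xFFFF
    return mz, ez

mz, ez = fp_multiply(0x8D, 0x42, 0x92, 0xD1)
print(format(mz, "016b"), format(ez, "08b"))
```

With mx = 10001101, ex = 01000010, my = 10010010 and ey = 11010001 the sketch gives mz = 0110001011010100 and ez = 00010010.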


Example:
mx = 1.0001101
my = 1.0010010
ex = 0.1000010
ey = 1.1010001
1. Find sign of the result:
sign(mz) = sign(mx) ⊕ sign(my) = 1 ⊕ 1 = 0
2. Find modules of mantissas:
|my| = 0.1101110
|mx| = 0.1110011
3. Calculating the exponent of the result:
ez = 0.1000010 + 1.1010001 = 0.0010011
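4. Multiplication algorithm nr.1:
In this trace 01110011 is used as the multiplier (shifted right) and 01101110 as
the multiplicand (shifted left and added); since multiplication is commutative,
the product does not depend on which mantissa plays which role. Each row shows
the adder after the addition and the multiplier after the shift.

ADDER               MULTIPLIER   COMMENTS
00000000 00000000   01110011     initial state
00000000 01101110   00111001     LSB = 1: + 00000000 01101110, then shift
00000001 01001010   00011100     LSB = 1: + 00000000 11011100, then shift
00000001 01001010   00000111     LSB = 0 twice: no addition, shift by 2
00001000 00101010   00000011     LSB = 1: + 00000110 11100000, then shift
00010101 11101010   00000001     LSB = 1: + 00001101 11000000, then shift
00110001 01101010   00000000     LSB = 1: + 00011011 10000000, then shift; multiplier is 0, stop
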

5. Normalization
mz = 0.0110001 01101010 (the two leading bits are equal, so shift left once)
mz = 0.110001011010100
ez = ez - 1 = 0.0010011 - 1 = 0.0010010
6. The sign of the result is 0 (positive), so mz remains unchanged.
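The intermediate values of the multiplication in this example can be reproduced with a short Python sketch. It assumes the same operand roles as the trace (01101110 added, 01110011 shifted right); the names trace_multiply and bits16 are hypothetical helpers.

```python
def trace_multiply(multiplicand, multiplier):
    """Yield (multiplier, shifted multiplicand, adder) for every step of the loop."""
    adder = 0
    while multiplier != 0:
        if multiplier & 1:
            adder += multiplicand      # add only when the LSB is 1
        yield multiplier, multiplicand, adder
        multiplicand <<= 1             # shift the multiplicand left
        multiplier >>= 1               # shift the multiplier right

def bits16(value):
    """Format a value as two space-separated bytes in binary."""
    text = format(value & 0xFFFF, "016b")
    return text[:8] + " " + text[8:]

rows = list(trace_multiply(0b01101110, 0b01110011))
for multiplier, multiplicand, adder in rows:
    print(format(multiplier, "08b"), bits16(multiplicand), bits16(adder))
```

Each printed line holds the multiplier before the step, the multiplicand as currently shifted, and the adder after the step; the last adder value is the 16-bit product that enters the normalization.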

5. Code explanation

The assembly program performs the floating point multiplication (algorithm nr.1).
It uses the following macros:
FirstAlgMult macro x,y
It performs the fixed-point multiplication (algorithm nr.1) of the passed parameters
x and y, and stores the result in the mz variable.
Normalize macro z,exp
It checks whether the number is denormalized. It uses a mask (C000h) to
isolate the first 2 bits of the number and checks whether they are 00 or 11. If so, it
shifts z to the left and decrements exp, repeating until the two bits differ.
print macro message
Prints a message to the screen.
print8Bits macro x
Prints an 8-bit number in binary.
print16Bits macro x
Prints a 16-bit number in binary.
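The mask test used by Normalize can be mirrored in a few lines of Python. This is a sketch of the idea rather than a translation of the macro: the names MASK, is_denormalized and normalize_with_mask are illustrative, and the early return for a zero mantissa is an assumption added here so that the loop always terminates.

```python
MASK = 0xC000  # 1100 0000 0000 0000b: the two leading bits of a 16-bit word

def is_denormalized(z):
    """True when the two leading bits of z are both 0 or both 1."""
    top = z & MASK
    return top == 0 or top == MASK

def normalize_with_mask(z, exp):
    """Shift z left and decrement exp until the two leading bits differ."""
    if z == 0:                      # assumption: zero cannot be normalized
        return z, exp
    while is_denormalized(z):
        z = (z << 1) & 0xFFFF       # keep the value within 16 bits
        exp = (exp - 1) & 0xFF      # keep the exponent within 8 bits
    return z, exp

print(normalize_with_mask(0x0AA6, 0x13))
```

For z = 0AA6h (the product 3Ah x 2Fh) three shifts are needed, so the exponent drops from 13h to 10h.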

6. Results


7. Conclusion
Assembly language is a low-level programming language. It allows the programmer to
interact directly with the processor and to manage memory.
While performing this course work I learned a lot about assembly
language. In my code I used macros, which are very useful because they accept
parameters and make the program more modular.
Implementing the floating point multiplication algorithm was an interesting
exercise. Arithmetic algorithms are very important because they are the
basis of data processing in a computer. Their implementation and complexity
influence the performance of the computer.

Appendix
Source code in Assembler:
.model small
.stack 100h
.data
mx db 3Ah ; 0011 1010 b
my db 0D1h ; 1101 0001 b
ex db 42h
ey db 0D1h
sign db ?
mz dw ?
ez db ?
mask dw 0C000h ; 1100 0000 0000 0000 b - for checking 1st two bits at normalization

; variables for outputting
mxInput db 10,13,'Input',10,13,'mx = $'
myInput db 10,10,13,'my = $'
exInput db 10,13,'ex = $'
eyInput db 10,13,'ey = $'
mzResult db 10,13,10,13,'Result',10,13,'mz = $'
ezResult db 10,13,'ez = $'

.code
;First Multiplication Fixed-Point Algorithm
;-----------------------------------------
FirstAlgMult macro x, y
; save data from registers
push dx
push ax
push bx
xor dx, dx ; adder
xor ax, ax
xor bx, bx
mov al, x ; AX <-- mx
mov bl, y ; BX <-- my
CheckLSB:
test bl, 1
jz Shift
;if LSB = 1
add dx, ax ; adder = adder + mx
Shift:
shl ax, 1
;shift mx left
shr bl, 1
;shift my right
test bl, 0FFh ;check if my is 0
jnz CheckLSB
mov mz, dx
;restore data from stack
pop bx
pop ax
pop dx
endm

; Check for denormalization
;------------------------------------
Normalize macro z, exp
push ax
push bx
xor bx,bx
xor ax,ax
mov bx, z
test bx, bx ; guard: a zero mantissa would otherwise be shifted forever
jz NumberIsNormalized
CheckTwoBits:
mov ax, bx
and ax, mask
jz DoNormalization ; two consecutive 0s (00... & 11... = 00...)
cmp ax, mask
je DoNormalization ; two consecutive 1s (11... & 11... = 11... = mask)
jmp NumberIsNormalized
DoNormalization:
shl bx, 1
dec exp
jmp CheckTwoBits
NumberIsNormalized:
mov z, bx
pop bx
pop ax
endm

print macro message
push dx
push ax
mov dx, offset message
mov ah, 9
int 21h
pop ax
pop dx
endm

; print an 8 bits number
;------------
print8Bits macro x
LOCAL show8, Zero8, Jump8, Quit8
push bx
push cx
push dx
mov dl, 8
mov cx, 8
xor bx,bx
show8:
mov bl, x
dec dl
mov cl, dl
shr bx, cl
test bx, 1
jz Zero8
mov al, '1'
mov ah, 0Eh
int 10h
jmp Jump8
Zero8:
mov al, '0'
mov ah, 0Eh
int 10h
Jump8:
test dl, dl ; all 8 bits printed? (only DL holds the counter)
jz Quit8
jmp show8
Quit8:
; restore registers in reverse order of the pushes
pop dx
pop cx
pop bx
endm

;print a 16 bits number
;------------
print16Bits macro x
LOCAL show16, Zero16, Jump16, Quit16
push bx
push cx
push dx
mov dl, 16
mov cx, 16
xor bx,bx
show16:
mov bx, x
dec dl
mov cl, dl
shr bx, cl
test bx, 1
jz Zero16
mov al, '1'
mov ah, 0Eh
int 10h
jmp Jump16
Zero16:
mov al, '0'
mov ah, 0Eh
int 10h
Jump16:
test dl, dl ; all 16 bits printed? (only DL holds the counter)
jz Quit16
jmp show16
Quit16:
; restore registers in reverse order of the pushes
pop dx
pop cx
pop bx
endm

;------------------- START --------------------
start:
mov ax, @data
mov ds, ax
;Determine sign of the result
; sign(mx) xor sign(my) = sign(mz)
xor ax, ax
mov al, mx
xor al, my
mov sign, al

; find modules of mx and my
test mx, 80h
jz mxIsPositive
;mx negative case
neg mx
mxIsPositive:
test my, 80h
jz myIsPositive
;my negative case
neg my
myIsPositive:
; find exponent of result ez
mov al, ex
add al, ey
mov ez, al
; perform fixed point multiplication of mantissas
FirstAlgMult mx, my
Normalize mz, ez
; convert the resulting mz in CC
; by checking the sign stored at the beginning
test sign, 80h
jz DoNotConvert
neg mz
DoNotConvert:

print mxInput
print8Bits mx
print exInput
print8Bits ex
print myInput
print8Bits my
print eyInput
print8Bits ey
print mzResult
print16Bits mz
print ezResult
print8Bits ez

; return control to DOS
mov ax, 4C00h
int 21h

end start
