Floating Point Numbers - Representation & Arithmetic: Dr. Arunachalam V Associate Professor, SENSE

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

Floating point numbers –

representation & Arithmetic


Dr. Arunachalam V
Associate Professor, SENSE
Introduction

=± ×
A floating point number has four components:
1. The sign, s (+ by 0/- by 1), usually it represents sign of mantissa.
2. The exponent base b (usually 2 for digital).
3. The significand or Mantissa, S or M. Usually normalized by shifting, so that
the MSB becomes nonzero. With base 2, the fixed leading 1 can be removed
to save one bit; this bit is called as “ hidden 1”.
4. The exponent, e (usually biased representation, = + , =
2 − 1 for k bit exponent)
5. For example, k = 4 bits, =2 − 1 = 8 − 1 = 7 ; e = + 2, e’ = 2+7 =9
and e = - 2, e’ = - 2 + 7 = 5.
Subranges and special values

= × = ×

• Range [-max, max] increases and precision decreases as the base b increase.
• In most practical cases, b = 2.
• Devoting more bits to exponent part widens the range but reduces the
precision.
• In most practical cases, exponent uses lesser number of bits than the
mantissa.
ANSI / IEEE Std 754 - 1985
Half precision
• In computing, half precision is a binary floating-point computer number
format that occupies 16 bits (two bytes in modern computers) in computer
memory.
• In the IEEE 754-2008 standard, the 16-bit base 2 format is referred to as
binary16. It is intended for storage of floating-point values in applications
where higher precision is not essential for performing arithmetic computations.
• Sign bit: 1 bit
• Exponent width: 5 bits
• Significand precision: 11 bits (10 explicitly stored)
Addition algorithm
1. Subtract exponents (d =e1 - e2)
2. Align the mantissas :
a) Shift right d positions the mantissa of the operand with the smallest exponent.
b) Select the largest exponent as exponent of the result.
3. Add (sub) significands and produce sign of result. The effective operation
(EOP):

Floating point operation Sign of the operands EOP


Add Equal Add
Add Different Sub
Sub Equal Sub
Sub Different Add
Addition algorithm
4. Normalize:
a) Effective operation addition: there is no over flow of the significand., i.e.
result is already normalized: no action is needed
b) Effective operation addition: there might be an over flow of the
significand. Normalize:
i. Shift right the significand one position
ii. Increment by one the exponent
c) Effective operation subtraction: the result might have leading zeros.
Normalize:
i. Shift left the significand by a number of positions corresponding to the number of
leading zeros.
ii. Decrement the exponent by the number of leading zeros.
Addition algorithm
5. Round. According to the specified mode. Might require an addition. If
over flow occurs, normalize by a right shift and increment the exponent.
6. Determine exception flags and special values : exponent over flow
(special value ± infinity), exponent under flow (special value gradual
under- flow), inexact, and the special value zero.
Multiplication algorithm
Input: x and y - normalized operands represented by ( : : )&( : : )
1. Multiply significands ( = ∗× ∗ ).

2. Add exponents = + − ,
3. Determine sign ( = ⨁ )
4. Normalize and update exponent
5. Round (to nearest , to zero and to ±∞)
6. Determine exception Flags and special values
Reference
1. Chapters 17 of Behrooz Parhami, “Computer Arithmetic: Algorithms and
Hardware Design”, (2/e) Oxford University Press 2015.
Next Class

FLOATING POINT ARITHMETIC PROBLEM


SOLVING

You might also like