Floating Point Numbers - Representation & Arithmetic: Dr. Arunachalam V Associate Professor, SENSE
$x = \pm S \times b^{e}$
A floating point number has four components:
1. The sign, s (0 for +, 1 for -); it usually represents the sign of the mantissa.
2. The exponent base, b (usually 2 in digital systems).
3. The significand or mantissa, S or M. It is usually normalized by shifting so that
the MSB becomes nonzero. With base 2 the leading bit is always 1, so it can be
left out to save one bit; this bit is called the "hidden 1".
4. The exponent, e, usually stored in biased form: $e' = e + \text{bias}$, with
$\text{bias} = 2^{k-1} - 1$ for a k-bit exponent.
For example, with k = 4 bits, $\text{bias} = 2^{3} - 1 = 8 - 1 = 7$; e = +2 gives e' = 2 + 7 = 9,
and e = -2 gives e' = -2 + 7 = 5.
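The biased-exponent rule above can be checked with a few lines of Python. This is only a minimal sketch; the helper names `bias_for`, `encode_exponent` and `decode_exponent` are made up for illustration and do not come from the slides.

```python
def bias_for(k: int) -> int:
    """Bias for a k-bit exponent field: 2**(k-1) - 1."""
    return 2 ** (k - 1) - 1


def encode_exponent(e: int, k: int) -> int:
    """Stored (biased) exponent e' = e + bias."""
    return e + bias_for(k)


def decode_exponent(e_biased: int, k: int) -> int:
    """Recover the true exponent e = e' - bias."""
    return e_biased - bias_for(k)


print(bias_for(4))             # 7
print(encode_exponent(2, 4))   # 9
print(encode_exponent(-2, 4))  # 5
```

The three printed values reproduce the k = 4 example: bias 7, then e' = 9 and e' = 5.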
Subranges and special values
$\text{max} = S_{\max} \times b^{e_{\max}}$, $\text{min} = S_{\min} \times b^{e_{\min}}$
• The range [-max, max] increases and the precision decreases as the base b increases.
• In most practical cases, b = 2.
• Devoting more bits to the exponent widens the range but reduces the
precision.
• In most practical cases, the exponent uses fewer bits than the
mantissa.
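The range/precision trade-off can be made concrete with a toy 16-bit format whose exponent width k is varied. The sketch below rests on assumptions that are not in the slides: base 2, a hidden 1, bias 2^(k-1) - 1, and the all-ones exponent code reserved for special values as in IEEE 754. The function name `toy_format` is hypothetical.

```python
def toy_format(total_bits: int, k: int):
    """Return (fraction bits, largest normalized value, spacing just above 1.0)."""
    m = total_bits - 1 - k            # fraction bits left after sign and exponent
    bias = 2 ** (k - 1) - 1
    e_max = bias                      # assumption: top exponent code is reserved
    largest = (2 - 2.0 ** -m) * 2.0 ** e_max
    spacing = 2.0 ** -m               # gap between 1.0 and the next larger value
    return m, largest, spacing


for k in (4, 5, 8):
    m, largest, spacing = toy_format(16, k)
    print(f"k={k}  fraction bits={m}  max={largest:.6g}  spacing near 1.0={spacing:.3g}")
```

As k grows, the printed maximum grows rapidly while the spacing near 1.0 becomes coarser, which is the trade-off described in the bullets.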
ANSI/IEEE Std 754-1985
Half precision
• In computing, half precision is a binary floating-point computer number
format that occupies 16 bits (two bytes in modern computers) in computer
memory.
• In the IEEE 754-2008 standard, the 16-bit base 2 format is referred to as
binary16. It is intended for storage of floating-point values in applications
where higher precision is not essential for performing arithmetic computations.
• Sign bit: 1 bit
• Exponent width: 5 bits
• Significand precision: 11 bits (10 explicitly stored)
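The field layout listed above (1 sign bit, 5 exponent bits, 10 stored fraction bits) can be decoded by hand as follows. This is a sketch, not a reference implementation: the bias of 15 follows from the 2^(k-1) - 1 rule with k = 5, while the handling of subnormals, infinities and NaN follows the IEEE 754 conventions, which the slides do not spell out. The function name `decode_binary16` is hypothetical; Python's `struct` format `'e'` is used only as a cross-check.

```python
import struct


def decode_binary16(bits: int) -> float:
    """Decode a 16-bit pattern laid out as sign(1) | exponent(5) | fraction(10)."""
    sign = (bits >> 15) & 0x1
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    bias = 15                                  # 2**(5-1) - 1
    if exp == 0:
        # Zero or subnormal: no hidden 1, exponent fixed at 1 - bias
        value = (frac / 1024) * 2.0 ** (1 - bias)
    elif exp == 0x1F:
        # All-ones exponent: infinity (fraction 0) or NaN
        value = float("inf") if frac == 0 else float("nan")
    else:
        # Normalized: the hidden 1 is restored in front of the fraction
        value = (1 + frac / 1024) * 2.0 ** (exp - bias)
    return -value if sign else value


for pattern in (0x3C00, 0xC000, 0x7BFF, 0x3555):
    reference = struct.unpack("<e", struct.pack("<H", pattern))[0]
    print(hex(pattern), decode_binary16(pattern), reference)
```

For each pattern, the hand-decoded value and the `struct` result should agree; 0x7BFF is the largest finite half-precision value, 65504.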
Addition algorithm
1. Subtract the exponents (d = e1 - e2).
2. Align the mantissas:
a) Shift the mantissa of the operand with the smaller exponent right by |d| positions.
b) Select the larger exponent as the exponent of the result.
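3. Add (or subtract) the significands and produce the sign of the result. The effective
operation (EOP), add or subtract, depends on the requested operation and the signs of the operands.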
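The addition steps can be sketched in Python on small integer significands. Several simplifications here are assumptions rather than part of the slides: operands are tuples (sign, exponent, significand) with value (-1)^sign × significand × 2^exponent, the significand is a p-bit integer with its leading 1 stored explicitly, bits shifted out during alignment are simply discarded (no guard bits or rounding), and special values are ignored. The names `fp_add` and `to_float` are hypothetical.

```python
def to_float(x):
    """Value of (sign, exponent, significand) as a Python float."""
    s, e, m = x
    return (-1) ** s * m * 2.0 ** e


def fp_add(a, b, p=4):
    """Add two normalized operands whose significands are p-bit integers."""
    (s1, e1, m1), (s2, e2, m2) = a, b
    d = e1 - e2                       # step 1: subtract exponents
    if d < 0:                         # make operand 1 the one with the larger exponent
        (s1, e1, m1), (s2, e2, m2) = (s2, e2, m2), (s1, e1, m1)
        d = -d
    m2 >>= d                          # step 2a: align (shifted-out bits are dropped)
    e = e1                            # step 2b: result takes the larger exponent
    if s1 == s2:                      # step 3: effective operation is add
        m, s = m1 + m2, s1
    else:                             # effective operation is subtract
        m, s = m1 - m2, s1
        if m < 0:
            m, s = -m, s2
    if m == 0:
        return (0, 0, 0)
    while m >= (1 << p):              # renormalize after a carry out
        m >>= 1
        e += 1
    while m < (1 << (p - 1)):         # renormalize after cancellation
        m <<= 1
        e -= 1
    return (s, e, m)


a = (0, -3, 0b1100)   # 1.5
b = (0, -4, 0b1100)   # 0.75
print(to_float(fp_add(a, b)))                # 2.25
print(to_float(fp_add(a, (1, -4, 0b1100))))  # 0.75
```

The two renormalization loops are included so the result is again a p-bit significand; hardware implementations would also keep extra bits for rounding, which this sketch leaves out.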
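Multiplication algorithm
1. Multiply the significands.
2. Add the exponents: $e' = e_1' + e_2' - \text{bias}$ (the bias is subtracted once because both biased exponents already include it).
3. Determine the sign: $s = s_1 \oplus s_2$.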
4. Normalize and update the exponent.
5. Round (to nearest, toward zero, or toward ±∞).
6. Determine exception flags and special values.
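The steps above (add biased exponents and subtract the bias once, XOR the signs, normalize, round, flag exceptions) can be sketched for a half-precision-like format. The assumptions are again mine, not the slides': operands are (sign, biased exponent, significand) with an 11-bit significand that includes the hidden 1, rounding is plain truncation toward zero, and only a simplified overflow/underflow flag is produced, with no subnormals, infinities or NaN. The names `fp_mul` and `to_float16ish` are hypothetical.

```python
def to_float16ish(x, p=11, bias=15):
    """Value of (sign, biased exponent, p-bit significand with hidden 1)."""
    s, e, m = x
    return (-1) ** s * (m / 2 ** (p - 1)) * 2.0 ** (e - bias)


def fp_mul(a, b, p=11, bias=15):
    """Multiply two normalized operands; returns ((sign, exponent, significand), flag)."""
    (s1, e1, m1), (s2, e2, m2) = a, b
    m = m1 * m2                       # multiply significands: up to 2p bits
    e = e1 + e2 - bias                # add biased exponents, remove one bias
    s = s1 ^ s2                       # sign is the XOR of the operand signs
    if m >> (2 * p - 1):              # product in [2, 4): shift one extra place
        e += 1
        shift = p
    else:                             # product in [1, 2)
        shift = p - 1
    m >>= shift                       # round by truncation (toward zero)
    if e > 2 * bias:                  # simplified exception flags
        flag = "overflow"
    elif e < 1:
        flag = "underflow"
    else:
        flag = "ok"
    return (s, e, m), flag


x = (0, 15, 1536)   # +1.5
y = (1, 16, 1280)   # -2.5
result, flag = fp_mul(x, y)
print(result, flag, to_float16ish(result))   # value printed is -3.75
```

Truncation was chosen only to keep the sketch short; round-to-nearest would need the discarded bits to be inspected before the shift.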
Reference
1. Chapter 17 of Behrooz Parhami, “Computer Arithmetic: Algorithms and
Hardware Design”, 2nd ed., Oxford University Press, 2015.