IEEE 754 Floating Point Standard
In the lecture slides for CO2103 Chapter 03 (Background), we briefly mentioned how a computer stores floating point numbers. The format a computer uses to represent floating point numbers is based on the IEEE 754 Floating Point Standard. Every floating point number is first normalized, and the normalized form is stored in accordance with the IEEE 754 standard.
Normalized form: $1.xxxxxx \times 2^{yyyy}$
IEEE 754 Floating Point Standard: $(-1)^S \times (1.0 + 0.M) \times 2^{E - \text{bias}}$
The Sign (S) bit indicates whether the number is positive (S=0) or negative (S=1). In normalized form the digit before the radix point is always 1, so only the fractional part of the mantissa needs to be stored. The Mantissa (M) bits are the xxxxxx after the radix point; M is stored in natural binary form. The Exponent (E) bits hold the exponent yyyy in bias-m form, i.e. the stored value is yyyy + m (m = 127 for Single-Precision, 1023 for Double-Precision), which makes exponents easier to compare.
Using normalized scientific notation:
1. simplifies the exchange (and representation) of data that includes floating-point numbers;
2. simplifies the arithmetic algorithms, since the numbers are known always to be in this form;
3. increases the accuracy of the numbers that can be stored in a word, since each unnecessary leading 0 is replaced by another significant digit to the right of the radix point.
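The normalization step can be sketched in Python (chosen only for illustration; the handout itself contains no code). The function name `normalize_binary` is hypothetical, and the sketch assumes its input is a non-zero binary numeral written as a string. It moves the radix point to just after the leading 1 and returns S, the fraction bits M and the true (unbiased) exponent yyyy.

```python
def normalize_binary(numeral: str) -> tuple[int, str, int]:
    """Normalize a binary numeral such as "-10.00111" to (-1)^S x 1.M x 2^exp.

    Returns (S, M, exp): the sign bit, the fraction bits that follow the
    leading 1 (trailing zeros dropped) and the true, unbiased exponent.
    Assumes a non-zero numeral made only of an optional sign, 0s, 1s and '.'.
    """
    sign = 1 if numeral.startswith("-") else 0
    int_part, _, frac_part = numeral.lstrip("+-").partition(".")
    digits = int_part + frac_part            # every bit, radix point removed
    first_one = digits.index("1")            # raises ValueError for zero
    # The radix point sits after len(int_part) digits; moving it to just after
    # the leading 1 changes the exponent by exactly the distance moved.
    exponent = len(int_part) - first_one - 1
    fraction = digits[first_one + 1:].rstrip("0")
    return sign, fraction, exponent


for n in ["-10.00111", "101101.111011", "-0.001111", "0.0000101111"]:
    s, m, e = normalize_binary(n)
    print(f"{n:>15} -> S={s}, 1.{m or '0'} x 2^{e}")
```

The sketch only rewrites the bits; it does not yet apply the bias or limit M to a fixed number of bits.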
Under the IEEE 754 standard, floating point numbers are most commonly represented in one of two precisions: Single-Precision (32-bit) or Double-Precision (64-bit).
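Single-Precision (32-bit):
Bit No        31       23-30       0-22
Size          1 bit    8 bits      23 bits
Field Name    Sign     Exponent    Mantissa

Double-Precision (64-bit):
Bit No        63       52-62       0-51
Size          1 bit    11 bits     52 bits
Field Name    Sign     Exponent    Mantissa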
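The bit positions in these tables translate directly into shifts and masks. The Python sketch below uses the standard `struct` module to obtain a number's raw bits and then extracts S, E and M; the helper names `fields` and `value` and the `LAYOUTS` table are illustrative assumptions, not part of any library. `value` rebuilds the number as (-1)^S × (1 + M / 2^m) × 2^(E - bias), where m is the number of mantissa bits, and assumes the number is normalized. The test value -2.21875 is -10.00111 in binary.

```python
import struct

# (exponent bits, mantissa bits, bias) for the two layouts in the tables
LAYOUTS = {
    "single": (8, 23, 127),
    "double": (11, 52, 1023),
}


def fields(x: float, precision: str = "single") -> tuple[int, int, int]:
    """Return the stored (S, E, M) fields of x in the chosen precision."""
    exp_bits, man_bits, _ = LAYOUTS[precision]
    if precision == "single":
        (raw,) = struct.unpack(">I", struct.pack(">f", x))   # 32-bit pattern
    else:
        (raw,) = struct.unpack(">Q", struct.pack(">d", x))   # 64-bit pattern
    sign = raw >> (exp_bits + man_bits)                      # bit 31 / bit 63
    exponent = (raw >> man_bits) & ((1 << exp_bits) - 1)     # bits 23-30 / 52-62
    mantissa = raw & ((1 << man_bits) - 1)                   # bits 0-22 / 0-51
    return sign, exponent, mantissa


def value(sign: int, exponent: int, mantissa: int, precision: str = "single") -> float:
    """Rebuild a normalized number as (-1)^S x (1 + M / 2^m) x 2^(E - bias)."""
    _, man_bits, bias = LAYOUTS[precision]
    return (-1) ** sign * (1 + mantissa / 2 ** man_bits) * 2.0 ** (exponent - bias)


print(fields(-2.21875))              # (1, 128, 917504): M = 000111 then 17 zeros
print(fields(-2.21875, "double"))    # same sign, E = 1 + 1023, M has 52 bits
print(value(*fields(-2.21875)))      # -2.21875
```

Only the field widths and the bias differ between the two precisions, which is why one pair of helpers covers both layouts.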
Single-Precision floating point numbers occupy 32 bits and give an approximate range (in magnitude) of $10^{-38}$ to $10^{38}$. The Exponent (E) is represented in bias-127. Double-Precision floating point numbers occupy 64 bits and give an approximate range of $10^{-308}$ to $10^{308}$. The Exponent (E) is represented in bias-1023. A few examples for Single-Precision:
Number (binary)  Normalized (binary)    S  E (8-bit, bias-127)    M (23-bit)                 IEEE 754 Single (32-bit, hex)
-10.00111        -1.000111 × 2^1        1  1+127=128=10000000     00011100000000000000000    C00E0000
101101.111011    1.01101111011 × 2^5    0  5+127=132=10000100     01101111011000000000000    4237B000
-0.001111        -1.111 × 2^-3          1  -3+127=124=01111100    11100000000000000000000    BE700000
0.0000101111     1.01111 × 2^-5         0  -5+127=122=01111010    01111000000000000000000    3D3C0000
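The example rows can be checked mechanically. This Python sketch (the function `to_single_hex` is hypothetical) concatenates S, the bias-127 exponent and the zero-padded 23-bit mantissa into one 32-bit word, then compares the result with what the standard `struct` module produces for the same value. The decimal equivalents in the `examples` list were worked out for this check and are not part of the original table. The sketch assumes the fraction already fits in 23 bits; it does not perform the rounding a real conversion would need.

```python
import struct


def to_single_hex(sign: int, fraction: str, exponent: int) -> str:
    """Assemble (-1)^S x 1.fraction x 2^exponent into a 32-bit single, as hex."""
    biased = exponent + 127                    # E field in bias-127
    if not 0 < biased < 255:                   # stored 0 and 255 are reserved
        raise ValueError("exponent out of range for single precision")
    if len(fraction) > 23:
        raise ValueError("fraction needs rounding to fit in 23 bits")
    mantissa = fraction.ljust(23, "0")         # M field, padded with zeros
    bits = f"{sign}{biased:08b}{mantissa}"     # S | E | M, 32 bits in total
    return f"{int(bits, 2):08X}"


examples = [
    # (S, M bits, exponent, the same number in decimal)
    (1, "000111", 1, -2.21875),          # -10.00111
    (0, "01101111011", 5, 45.921875),    # 101101.111011
    (1, "111", -3, -0.234375),           # -0.001111
    (0, "01111", -5, 0.0458984375),      # 0.0000101111
]

for s, m, e, decimal in examples:
    by_hand = to_single_hex(s, m, e)
    by_struct = struct.pack(">f", decimal).hex().upper()
    print(by_hand, by_struct, by_hand == by_struct)
```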
There are two potential errors when representing a floating point number in IEEE 754 format:
Overflow - the exponent is too large to be represented in the Exponent field.
Underflow - the exponent is too negative to be represented in the Exponent field, i.e. the magnitude of the number is too small.
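Both errors can be observed from Python, whose `float` is an IEEE 754 double on common platforms. The sketch below computes the smallest and largest normalized magnitudes from the field widths, then uses the standard `struct` module to store values in single and double precision; `normalized_range` and `round_trip` are illustrative helper names. It relies on CPython's `struct.pack` raising `OverflowError` when a finite value is too large for the `f` format and storing 0.0 when the value is far too small.

```python
import struct
import sys


def normalized_range(exp_bits: int, man_bits: int) -> tuple[float, float]:
    """Smallest and largest normalized magnitudes for a given field layout."""
    bias = 2 ** (exp_bits - 1) - 1          # 127 for single, 1023 for double
    # Stored exponents of all 0s and all 1s are reserved by the standard, so
    # normalized numbers have true exponents from 1 - bias up to bias.
    smallest = 2.0 ** (1 - bias)
    largest = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return smallest, largest


def round_trip(x: float, fmt: str) -> float:
    """Store x with struct format fmt ('>f' single, '>d' double) and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


print("single range:", normalized_range(8, 23))    # roughly 1.2e-38 .. 3.4e38
print("double range:", normalized_range(11, 52))   # roughly 2.2e-308 .. 1.8e308
# Python's own float is a double, so its limits should match the computed ones.
print(normalized_range(11, 52) == (sys.float_info.min, sys.float_info.max))  # True

# Overflow: 1e39 is larger than the biggest single-precision number.
try:
    round_trip(1e39, ">f")
except OverflowError as err:
    print("single precision overflow:", err)

# Underflow: 1e-50 is far below the smallest single-precision magnitude,
# so the value that actually gets stored is 0.0.
print("single precision stores 1e-50 as", round_trip(1e-50, ">f"))

# With the 11-bit exponent field of double precision, both values survive.
print("double precision:", round_trip(1e39, ">d"), round_trip(1e-50, ">d"))
```

Switching the format from `>f` to `>d` is the code-level counterpart of moving to 64-bit Double-Precision arithmetic.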
To reduce the chance of underflow or overflow, 64-bit Double-Precision arithmetic can be used. For further reference: http://babbage.cs.qc.edu/IEEE-754/References.xhtml. The above material was prepared with reference to http://www.doc.ic.ac.uk/~ih.
Exercises:
1. Determine the normalized binary form of the following decimal numbers:
a) 234.625
b) -890.375
c) -0.001007080078125
d) 0.000091552734375
(Ans. a) 11101010.101 = 1.1101010101 × 2^7, b) -1101111010.011 = -1.101111010011 × 2^9, c) -0.000000000100001 = -1.00001 × 2^-10, d) 0.000000000000011 = 1.1 × 2^-14)
2. Represent the above floating point numbers in IEEE 754 Single-Precision format. Write your answers in hex.
(Ans. a) 436AA000, b) C45E9800, c) BA840000, d) 38C00000)
3. value of BFC9400000000000₁₆, which is an IEEE 754