CH 3

Download as pdf or txt
Download as pdf or txt
You are on page 1of 82

Data Representations

Dr. Suresh Balpande


11/7/2023 1
Signed Integer
• 3 major representations:
Sign and magnitude
One’s complement
Two’s complement
• Assumptions:
4-bit machine word
16 different values can be represented
Roughly half are positive, half are negative

11/7/2023 2
Unsigned Binary Integers
0000
1111 0001 Turn x notches
counterclockwise
0
1110 15 1 0010 to add x
14 2
1101 0011
13 3 15 1
14 2
13 0 3
Inside: Natural number
1100 12 Outside: 4-bit encoding 4 0100 12 4
11 5
11 5 10 6
1011 0101 9 8 7
10 6
1010 9 7 0110 Turn y notches
8
clockwise
1001 0111 to subtract y
1000

Figure : Schematic representation of 4-bit code for integers in [0, 15].

11/7/2023 Slide 3
Sign and Magnitude Representation

High order bit is sign: -7 +0


-6 1111 0000 +1
0 = positive (or zero), 1110 0001
1 = negative -5 +2 +
1101 0010
-4 1100 0011 +3 0 100 = + 4
Three low order bits is the
-3 1011
+4 1 100 = - 4
magnitude: 0 (000) thru 7 (111) 0100
1010 0101
-2 +5 -
1001 0110
Number range for n bits -1 1000 0111 +6
= +/-2n-1 -1 -0 +7
Two representations for 0

11/7/2023 4
One’s Complement Representation

• Subtraction implemented -0 +0
+1
-1 1111 0000
by addition & 1's -2
1110 0001
+2 +
1101 0010
complement -3 1100 0011 +3 0 100 = + 4

• Still two representations of -4 1011 0100 +4 1 011 = - 4


1010
0! This causes some -5
1001
0101
0110
+5 -

problems -6 1000 0111 +6


-7 +7
• Some complexities in
addition

11/7/2023 5
Two’s Complement Representation
-1 +0
-2 1111 0000 +1
1110 0001
like 1's comp -3 +2 +
1101 0010
except shifted
one position -4 1100 0011 +3 0 100 = + 4
clockwise
-5 1011 0100 +4 1 100 = - 4
1010 0101
-6 +5 -
1001 0110
-7 1000 0111 +6
-8 +7
• Only one representation for 0
• One more negative number than positive number

11/7/2023 6
Example

• Compute the 32-bit 2’s complement representations for the


following decimal numbers: 5, -5, -6

5: 0000 0000 0000 0000 0000 0000 0000 0101


-5: 1111 1111 1111 1111 1111 1111 1111 1011
-6: 1111 1111 1111 1111 1111 1111 1111 1010

Given -5, verify that negating and adding 1 yields the number 5

11/7/2023 7
Sign Extension

• Occasionally, 16-bit signed numbers must be converted into 32-bit signed numbers
– for example, when doing an add with an immediate operand

• The conversion is simple: take the most significant bit and use it to fill up the
additional bits on the left – known as sign extension

So 210 goes from 0000 0000 0000 0010 to


0000 0000 0000 0000 0000 0000 0000 0010

and -210 goes from 1111 1111 1111 1110 to


1111 1111 1111 1111 1111 1111 1111 1110

11/7/2023 8
Binary, Signed-Integer Representations
B V alues represented

b b b b Sign and magnitude 1' s complement 2' s complement


3 2 1 0

0 1 1 1 + 7 + 7 + 7
0 1 1 0 + 6 + 6 + 6
0 1 0 1 + 5 + 5 + 5
0 1 0 0 + 4 + 4 + 4
0 0 1 1 + 3 + 3 + 3
0 0 1 0 + 2 + 2 + 2
0 0 0 1 + 1 + 1 + 1
0 0 0 0 + 0 + 0 + 0
1 0 0 0 - 0 - 7 - 8
1 0 0 1 - 1 - 6 - 7
1 0 1 0 - 2 - 5 - 6
1 0 1 1 - 3 - 4 - 5
1 1 0 0 - 4 - 3 - 4
1 1 0 1 - 5 - 2 - 3
1 1 1 0 - 6 - 1 - 2
1 1 1 1 - 7 - 0 - 1
Figure 2.1. Binary, signed-integer representations.

11/7/2023 9
Addition and Subtraction – 2’s Complement
4 0100 -4 1100
If carry-in to the high +3 0011
order bit = + (-3) 1101
carry-out then ignore 7 0111 -7 11001
carry

if carry-in differs from


carry-out then overflow
4 0100 -4 1100
-3 1101 +3 0011
1 10001 -1 1111

Simpler addition scheme makes twos complement the most common choice for integer number systems
within digital systems
11/7/2023 10
2’s-Complement Add and Subtract Operations
1101
(a) 0010 ( + 2) (b) 0100 (+ 4) + 0111
+ 0011 ( + 3)
0101 ( + 5)
+ 1010 (- 6 ) 0100 ( + 4)
(c) 1011 (- 5)
(- 2 )
0010
+ 1110 (- 2) 1110 + 1100
1001 (- 7)
(e) 1101 (- 3) (d) 0111 (+ 7) 1110 ( - 2)
- 1001 (- 7) 0110
+ 1101 (- 3)
+ 1101
(f) 0010 ( + 2) 0011 ( + 3)
- 0100 ( + 4) 0100 (+ 4)
1001
(g) 0110 ( + 6) + 0101
- 0011 ( + 3)
1110 ( - 2)
(h) 1001 ( - 7) 1001
- 1011 (- 5) + 1111
1000 ( - 8)
(i) 1001 (- 7)
- 0001 ( + 1) 0010
+ 0011
(j) 0010 ( + 2) 0101 ( + 5)
- 1101 ( - 3)

11/7/2023 11
Big-Endian and Little-Endian Assignments
• Modern computers store one byte of data in each memory
address or location, i.e., byte addressable memory.
• An 32-bit integer is, therefore, stored in 4 memory
addresses.
• The term "Endian" refers to the order of storing bytes in
computer memory.
In "Big Endian" scheme, the most significant byte is
stored first in the lowest memory address (or big in first),
while "Little Endian" stores the least significant bytes in
the lowest memory address.

11/7/2023 12
Big-Endian and Little-Endian Assignments

(a) Big-endian assignment (b) Little-endian assignment


11/7/2023 13
Uses of big-endian and little-endian
• IBM's 370 mainframes (RISC)-based computers and Motorola microprocessors use
the big-endian approach.
• Transmission Control Protocol/Internet Protocol (TCP/IP) also uses the big-endian
approach. For this reason, big-endian is sometimes called network order.
• On the other hand, Intel processors, DEC Alphas and at least some programs that
run on them are little-endian.
• There are also mixed forms of endianness. For example, VAX floating point uses
mixed-endian, also referred to as middle-endian.
• Bi-endian processors can run in both modes little and big endian. ARM processors
were little endian. Current generation ARM processors are bi-endian.
• Language compilers, such as Java or FORTRAN, have to know which way
the object code they develop is going to be stored. Converters can be used to
change one kind of endian to the other when necessary.
11/7/2023 14
ADDER
Simple Adders
Inputs Outputs x y Logical Expression of Full Adder
x y c s COUT = AB + BCin + ACin
c SUM = (A ⊕ B) ⊕ Cin
0 0 0 0 HA
0 1 0 1
1 0 0 1
1 1 1 0 s

Inputs Outputs
x y cin cout s
x y
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0 cout FA cin
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0 s
1 1 1 1 1
11/7/2023 Slide 16
Two’s-Complement Addition and Subtraction

k
x / c in

k
Adder / xy
k c out
y / k
/
A y or
y
AddSub

Add’Sub=‘0’ Addition ‘1’ Subtraction


Figure :Binary adder used as 2’s-complement adder/subtractor.

11/7/2023 Slide 17
Ripple-Carry Adder: Slow But Simple

x31 y31 x1 y1 x0 y0 Because of this, it takes more


time to produce SUM and
CARRY as each section in the
c32 c31 c2 c1 c0 circuit waits for the arrival of
FA . . . FA FA input.
cout cin
Critical path For instance, to deliver output for
the nth block, it needs to receive
s31 s1 s0 input from (n-1)th block. And this
delay is correspondingly termed
as propagation delay.
In the concept of ripple carry adder circuits, the bits that are necessary for addition
are immediately available.
Whereas every adder section needs to hold its time for the arrival of carry from the
previous adder block.
11/7/2023 Slide 18
4 bit Carry Lookahead Adder/Fast Adder
• To overcome the delay in ripple
carries adder, a carry-lookahead adder
was introduced.
• Here, by using complicated hardware,
the propagation delay can be
minimized.
Parallel adders normally incorporate carry
lookahead logic to ensure that carry
propagation between subsequent stages of
addition does not limit addition speed

S=Sum Gi = Xi*Yi.
G=carry generate Pi = Xi XOR Yi
P=carry propagate SUM = Pi XOR Ci
11/7/2023 19
Carry Lookahead Adder/Fast Adder
The carry propagates equation is
Pi = Ai XOR Bi Gi delivers carry only when both the inputs Ai and Bi are 1 without
considering the input carry. Pi is related to the carry propagation from
The carry generate is Ci to Ci+1.
Gi = Ai*Bi.
With these equations, the sum and carry equations can be represented as
SUM = Pi XOR Ci
Ci+1 = Gi + Pi*Ci

11/7/2023 20
Multipliers and Dividers
Modern processors perform many multiplications & divisions:
• Encryption, image compression, graphic rendering
• Hardware vs programmed shift-add/sub algorithms

11/7/2023 Slide 21
Shift- and Add Multiplication

Multiplicand x
Multiplier y
y0 x 20
Partial y x 21
products 1
y2 x 22
bit-matrix
y3 x 23
Product z

Figure : Multiplication of 4-bit numbers in dot notation.


11/7/2023 Slide 22
Binary and Decimal Multiplication

Step-by-step multiplication examples for 4-digit unsigned • Y(Multiplicand)=M X(Multiplier)=Q A.Q=Result


numbers.

11/7/2023 Slide 23
Hardware Multipliers

0 when Qi=0

M when Qi=1
Registers A and Q are concatenated shift registers . Together, they Typical Cell
hold partial product PPi while multiplier bit qi generates the signal
Add/Noadd. This signal causes the multiplexer MUX to select 0 when
qi=0, or to select the multiplicand M when qi=1, to be added to PPi to
generate PP(i+1). The product is computed in n cycles.
11/7/2023 Slide 24
Multiplication of Signed Numbers
• Multiplication of 2’s-complement operands, generating a double-
length product. The general strategy is still to accumulate partial
products by adding versions of the multiplicand as selected by the
multiplier bits. +13=01101
First, consider the case of a positive
multiplier and a negative multiplicand.
When we add a negative multiplicand
to a partial product, we must extend
the sign-bit value of the multiplicand
to the left as far as the product will
extend.
Multiplicand= -13 Multiplier=11
11/7/2023 25
The Booth Algorithm
The algorithm was invented by Andrew Donald Booth in 1950 while doing research
on crystallography at Birkbeck College in Bloomsbury, London. Booth's algorithm is of interest
in the study of computer architecture.
• The Booth algorithm generates a 2n-bit product and treats
both positive and negative 2’s complement n-bit operands
uniformly. To understand the basis of this algorithm, consider a
multiplication operation in which the multiplier is positive and
has a single block of 1s, for example, 0011110.
• To derive the product, we could add four appropriately shifted
versions of the multiplicand, as in the standard procedure.
However, we can reduce the number of required operations by
regarding this multiplier as the difference between two
numbers:

11/7/2023 26
Booth’s Multiplication Algorithm
Booth’s algorithm
→ treats both negative and positive operand uniformly
→ possibly use fewer than n addition or subtractions, thus possibly
fast multiplication
P=X·Y
Two adjacent bits xixi-1 are examined in each step.
If xixi-1= 01, Y is added to the accumulated partial product pi.
If xixi-1= 10, Y is subtracted from pi.
If xixi-1= 00 or 11 neither addition nor subtraction and
subsequent right shift of pi.
11/7/2023 27
The Booth Algorithm
• This suggests that the product can be generated by adding 2^5 times
the multiplicand to the 2’s-complement of 2^1 times the multiplicand.
For convenience, we can describe the sequence of required operations
by recoding the preceding multiplier as 0 +100010.
• In general, in the Booth algorithm,-1 times the shifted multiplicand is
selected when moving from 0 to 1, and +1 times the shifted
multiplicand is selected when moving from 1 to 0, as the multiplier is
scanned from right to left.
• To demonstrate the correctness of the Booth algorithm for negative
multipliers, we use the following property of negative-number
representations in the 2’s-complement system.
11/7/2023 28
Example
Normal
multiplication
schemes

-1 times the shifted multiplicand is selected


when moving from 0 to 1, and +1 times the
shifted multiplicand is selected when moving
from 1 to 0

Booth multiplication schemes


11/7/2023 29
Booth Algorithm Example

The transformation 011 ...110⇒ +1 0 0...0 -1 0 is called skipping over 1s. This term is derived from the
case in which the multiplier has its 1s grouped into a few contiguous blocks. Only a few versions of the
shifted multiplicand need to be added to generate the product, thus speeding up the multiplication
operation.
However, in the worst case—that of alternating 1s and 0s in the multiplier—each bit of the multiplier
selects a summand.
11/7/2023 30
Booth’s Example
• Example : 2 x 6 Q0 Q-1 = 00, then right shift and decrement
Initialize values (CNT stands for Count in the CNT by 1
flowchart)

ultiplicand

Multiplier

11/7/2023 31
Booth’s Example
• Q0 Q-1 = 10, then deduct A by M, and right shift, then decrement CNT by 1

11/7/2023 32
Booth’s Example
• Q0 Q-1 = 11, then right shift and Q0 Q-1 = 01, then add A by M,
decrement CNT by 1

11/7/2023 33
Booth’s Example
• and right shift and decrement CNT by 1

Result

CNT = 0, thus the algorithm terminates, the result is 00001100, i.e. 12.
11/7/2023 34
Working on the Booth Algorithm
1.Set the Multiplicand and Multiplier binary bits as M and Q, respectively.
2.Initially, we set the AC and Qn + 1 registers value to 0.
3.SC (sequence counter) represents the number of Multiplier bits (Q), that is continuously
decremented till equal to the number of bits (n) or reached to 0.
4.A Qn represents the last bit of the Q, and the Qn+1 shows the incremented bit of Qn by 1.
5.On each cycle of the booth algorithm, Qn and Qn + 1 bits will be checked on the following
parameters as follows:
1. When two bits Qn and Qn + 1 are 00 or 11, we simply perform the arithmetic shift right operation
to the partial product AC. And the bits of Qn and Qn + 1 is incremented by 1 bit.
2. If the bits of Qn and Qn + 1 is shows to 01, the multiplicand bits (M) will be added to the AC
(Accumulator register). After that, we perform the right shift operation to the AC and QR bits by
1.
3. If the bits of Qn and Qn + 1 is shows to 10, the multiplicand bits (M) will be subtracted from the
AC (Accumulator register). After that, we perform the right shift operation to the AC and QR bits
by 1.
6.The operation continuously works till we reached n - 1 bit in the booth algorithm.
7.Results of the Multiplication binary bits will be stored in the AC and QR registers.

11/7/2023 35
Example
• Multiply the two numbers 7 and 3 by using the Booth's multiplication algorithm.
Qn Qn + 1 M = (0111) AC Q Qn - 1 SC
M' + 1 = (1001) & Operation
Now set 7 (in binary 0111)
1 0 Initial 0000 0011 0 4
as multiplicand (M) and 3
Subtract (M' + 1) 1001
(in binary 0011) as a
1001
multiplier (Q).
Perform Arithmetic Right Shift 1100 1001 1 3
operations (ashr) And SC (Sequence Count)
1 1 Perform Arithmetic Right Shift 1110 0100 1 2
represents the number of
operations (ashr) bits, and here we have 4 bits,
0 1 Addition (A + M) 0111 so set the SC = 4.
0101 0100 Also, it shows the number of
Perform Arithmetic right shift 0010 1010 0 1 iteration cycles of the
operation booth's algorithms and then
0 0 Perform Arithmetic right shift 0001 0101 0 0 cycles run SC = SC - 1 time.
operation
11/7/2023 36
Homework
• Multiply the two numbers 23 and -9 by using the Booth's multiplication algorithm.
Qn Qn + 1 M=010111 AC Q Qn - 1 SC
M' + 1 = 1 0 1 0 0 1
Initially 000000 110111 0 6
Here, M = 23 = (010111)
1 0 Subtract M 101001 and Q = -9 = (110111)
101001
Perform Arithmetic right shift operation 110100 111011 1 5
1 1 Perform Arithmetic right shift operation 111010 011101 1 4
1 1 Perform Arithmetic right shift operation 111101 001110 1 3
0 1 Addition (A + M) 010111
Qn + 1 = 1, it means the
010100
output is negative.
Perform Arithmetic right shift operation 001010 000111 0 2
Hence, 23 * -9 = 2's
1 0 Subtract M 101001
complement of
110011
111001
111100110001
Perform Arithmetic right shift operation 100011 1 1
1 1 Perform Arithmetic right shift operation 111100 110001 1 0
=> (00001100111)

11/7/2023 37
Booth Algorithm Example
• A 16-bit worst-case multiplier, an ordinary
multiplier, and a good multiplier

11/7/2023 38
Booth-recoded multiplier /Fast Multiplication
• Bit-Pair Recoding of Multipliers:
• A technique called bit-pair recoding of the multiplier results in using at most one
summand for each pair of bits in the multiplier.
• It is derived directly from the Booth algorithm. Group the Booth-recoded multiplier
bits in pairs, and observe the following.
• The pair (+11)is equivalent to the pair (0 +1). That is, instead of adding 1 times the
multiplicand M at shift position i to +1×M at position i+1, the same result is
obtained by adding+1×Mat position i.
• Other examples are: (+1 0) is equivalent to (0+2), (1+1)is equivalent to(01),and so
on.
• Thus, if the Booth-recoded multiplier is examined two bits at a time, starting from
the right, it can be rewritten in a form that requires at most one version of the
multiplicand to be added to the partial product for each pair of multiplier bits.

11/7/2023 39
Booth-recoded multiplier Example
Num: 0 1 1 0 1
2’s Comp( -1): 1 0 0 1 1
2’s Comp( -1): 1 0 0 1 1
Add (-2): 1 0 0 1 1 0

Booth Algo

Booth-recoded Algo

-2 operation: take 2’s complement


i.e. -1 and add the same number to find -2

11/7/2023 40
Division
11/7/2023 41
Division of integers
• Division of integers involves the grouping of items. It includes both
positive numbers and negative numbers. Just like multiplication, the
division of integers also involves the same cases.
To sum it all up and to make everything easy, Types of
Result Example
Integers
the two most important things to remember
when you are multiplying integers or Both Integers
Positive 16 ÷ 8 = 2
dividing integers are: Positive
1.When the signs are different, the answer is
Both Integers
always negative. Negative
Positive –16 ÷ –8 = 2
2.When the signs are the same, the answer is
always positive. 1 Positive and
Negative –16 ÷ 8 = –2
1 Negative

11/7/2023 42
Division
1001ten Quotient 1001ten Quotient
Divisor 1000ten | 1001010ten Dividend Divisor 1000ten | 1001010ten Dividend
-1000
0001001010 0001001010 0000001010 0000001010 10
100000000000 → 0001000000→ 0000100000→0000001000 101
Quo: 0 000001 0000010 000001001
1010
-1000
10ten Remainder

At every step,
• shift divisor right and compare it with current dividend
• if divisor is larger, shift 0 as the next bit of the quotient
• if divisor is smaller, subtract to get new dividend and shift 1 as the next bit of
the quotient
11/7/2023 43
Non-Restoring Division Algorithm for Unsigned Integer
• Instead of the quotient digit set {0, 1}, the set {-1, 1} is used by the non-restoring division.
The non-restoring division algorithm is more complex as compared to the restoring division
algorithm
• This algorithm basically performs simple operations such as addition, subtraction. In this
method, we will use the sign bit of register A. 0 is the starting value/bit of register A.
Algo:
• Step 1: In this step, the corresponding value will be initialized to the registers, i.e., register A will
contain value 0, register M will contain Divisor, register Q will contain Dividend, and N is used to
specify the number of bits in dividend.
• Step 2: In this step, we will check the sign bit of A.
• Step 3: If this bit of register A is 1, then shift the value of AQ through left, and perform A = A +
M. If this bit is 0, then shift the value of AQ into left and perform A = A - M. That means in case
of 0, the 2's complement of M is added into register A, and the result is stored into A.
• Step 4: Now, we will check the sign bit of A again.
11/7/2023 44
Cntd..
• Step 5: If this bit of register A is 1, then Q[0] will become 0.
If this bit is 0, then Q[0] will become 1. Here Q[0] indicates the
least significant bit of Q.
• Step 6: After that, the value of N will be decremented. Here N is used
as a counter.
• Step 7: If the value of N = 0, then we will go to the next step.
Otherwise, we have to again go to step 2.
• Step 8: We will perform A = A + M if the sign bit of register A is 1.
• Step 9: This is the last step. In this step, register A contains the
remainder, and register Q contains the quotient.
11/7/2023 45
Example
N M A Q Action
Dividend = 11 Divisor =3 Divisor Dividend
4 00011 00000 1011 Begin

it contains only one decision 00011 00001 011_ Shift left AQ

and addition/subtraction per 00011 11110 011_ A=A-M


quotient bit. 3 00011 11110 0110 Q[0] = 0

00011 11100 110_ Shift left AQ

00011 11111 110_ A=A+M

2 00011 11111 1100 Q[0] = 0

00011 11111 100_ Shift left AQ

00011 00010 100_ A = A +M

1 00011 00010 1001 Q[0] = 1


So, register A contains the 00011 00101 001_ Shift left AQ
remainder 2, and register Q 00011 00010 001_ A=A-M
contains the quotient 3. 0 00011 00010 0011 Q[0] = 1

11/7/2023 46
Restoring Division Algorithm
Restoring division is usually performed on the fixed point fractional
numbers.
When we perform division operations on two numbers, the division
algorithm will give us two things, i.e., quotient and remainder.
• With the help of digit set {0, 1}, the quotient digit q will
be formed in the restoring division algorithm.
• We are using restoring term because we know that the
value of register A will be restored after each iteration.
• Here, register Q is used to contain the quotient, and
register A is used to contain the remainder. Here, the
divisor will be loaded into the register M

11/7/2023 47
Restoring division algorithm
• tep 1: In this step, the corresponding value will be initialized to the registers, i.e., register A will contain
value 0, register M will contain Divisor, register Q will contain Dividend, and N is used to specify the
number of bits in dividend.
• Step 2: In this step, register A and register Q will be treated as a single unit, and the value of both the
registers will be shifted left.
• Step 3: After that, the value of register M will be subtracted from register A. The result of subtraction
will be stored in register A.
• Step 4: Now, check the most significant bit of register A. If this bit of register A is 0, then the least
significant bit of register Q will be set with a value 1. If the most significant bit of A is 1, then the least
significant bit of register Q will be set to with value 0, and restore the value of A that means it will
restore the value of register A before subtraction with M.
• Step 5: After that, the value of N will be decremented. Here n is used as a counter.
• Step 6: Now, if the value of N is 0, we will break the loop. Otherwise, we have to again go to step 2.
• Step 7: This is the last step. In this step, the quotient is contained in the register Q, and the remainder is
contained in register A.

11/7/2023 48
Example
N M A Q Operation
Dividend = 11 Divisor = 3
4 00011 00000 1011 Initialize
So we should not forget to 00011 00001 011_ Shift left AQ
restore the value of the most 00011 11110 011_ A=A-M
significant bit of A, which is 1. 00011 00001 0110 Q[0] = 0 And
So, register A contains the restore A
remainder 2, and register Q 3 00011 00010 110_ Shift left AQ
contains the quotient 3. 00011 11111 110_ A=A-M
00011 00010 1100 Q[0] = 0
2 00011 00101 100_ Shift left AQ
00011 00010 100_ A=A-M
00011 00010 1001 Q[0] = 1
1 00011 00101 001_ Shift left AQ
00011 00010 001_ A=A-M
00011 00010 0011 Q[0] = 1

11/7/2023 49
Hardware for Division

Efficient Division

Source: P &H textbook

A comparison requires a subtract; the sign of the result is examined; if the result is negative, the divisor
must be added back . Similar to multiply, results are placed in Hi (remainder) and Lo (quotient)

11/7/2023 50
Floating Point Arithmetic
Floating point numbers
• When numbers cannot be represented as integers, we represent them
as floating point numbers.
• We will see how one could do the usual arithmetic and other
comparison operations on floating point numbers.
• Integer :- 7
• Rational :- 5/8
• Real :- √2
• Complex :- 2-3i Need to go beyond integers
Extremely small and large values :-
• Mass of Sun =1.989 × 10^30 kg
• Electron Mass =9.10938356 × 10-31 kg
Floating-Point Numbers

Useful for applications where very large and very small numbers are needed
simultaneously

• Fixed-point representation must sacrifice precision for small values to represent


large values
x = (0000 0000 . 0000 1001)two Small number
y = (1001 0000 . 0000 0000)two Large number

• Floating-point representation is like scientific notation:


(1) -20 000 000 = -2  10 7 (2) +0.000 000 007 = +7  10–9 Also, 7E-9

Sign Exponent
Significand Exponent base

11/7/2023 Slide 53
Interesting Fact

11/7/2023 54
ARM Floating Point architecture (VFP)
VFP Applications
• Automotive control applications
• Powertrain
• ABS, Traction control & active suspension
• 3D Graphics
• Digital consumer products
• Set-top boxes, games consoles
• Imaging
• Laser printers, still digital cameras, digital video cameras
• Industrial control systems
• Many real-time control applications in the industrial and automotive fields benefit from the dynamic
range and precision of floating-point unit.
• Automotive power train, anti-lock braking, traction control, and active suspension systems are all
mission-critical applications where precision and predictability are essential requirements.
Number System
That is, the least-significant digit (right-most digit) is of the order of 10^0 (units or ones),
the second right-most digit is of the order of 10^1 (tens), the third right-most digit is of the
order of 10^2 (hundreds), and so on, where ^ denotes exponent. For example,
735 = 700 + 30 + 5 = 7×10^2 + 3×10^1 + 5×10^0

Binary number system has two symbols: 0 and 1, called bits. It is also a positional notation,
for example,
10110B = 10000B + 0000B + 100B + 10B + 0B = 1×2^4 + 0×2^3 + 1×2^2 + 1×2^1 + 0×2^0
Conversion between Two Number Systems with Fractional Part
• Separate the integral and the fractional parts.
• For the integral part, divide by the target radix repeatedly, and collect the remainder in reverse
order.
• For the fractional part, multiply the fractional part by the target radix repeatedly, and collect the
integral part in the same order.

11/7/2023 56
Example
• Convert 18.6875D to binary Exercise: Convert the following decimal
• Integral Part = 18D numbers into binary equivalent:
• 18/2 => quotient=9 remainder=0 19.25D
• 9/2 => quotient=4 remainder=1 123.456D
• 4/2 => quotient=2 remainder=0
• 2/2 => quotient=1 remainder=0
• 1/2 => quotient=0 remainder=1 (quotient=0 stop) Hence, 18D = 10010B
• Fractional Part = .6875D
• .6875*2=1.375 => whole number is 1
• .375*2=0.75 => whole number is 0
• .75*2=1.5 => whole number is 1
• .5*2=1.0 => whole number is 1 Hence .6875D = .1011B

• Combine, 18.6875D = 10010.1011B


11/7/2023 57
Floating Point

• Normalized scientific notation: single non-zero digit to the


left of the decimal (binary) point – example:

1.010001 x 2-5 = (1 + 0 x 2-1 + 1 x 2-2 + … + 1 x 2-6) x 2-5

• A standard notation enables easy exchange of data between


machines and simplifies hardware algorithms – the IEEE 754
standard defines how floating point numbers are represented

58
Sign and Magnitude Representation (IEEE 754)
Sign Exponent Fraction • More exponent bits ➔
1 bit 8 bits 23 bits wider range of numbers
S E F (not necessarily more
numbers – recall there
• Largest number that can be represented: 2.0 x 2128 = 2.0 x 1038 are infinite real
• Smallest number that can be represented: 1.0 x 2-127 = 2.0 x 10-38 numbers)

•Overflow: when representing a number larger than the one above; • More fraction bits ➔
Underflow: when representing a number smaller than the one above higher precision

•Double precision format: occupies 64-bit registers:


Largest: Smallest: The largest exponent is U =
Sign Exponent Fraction 2046 − 1023 = 1023 The
1 bit 11 bits 52 bits smallest exponent is L = 1 −
S E F 1023 = −1022

59
Floating Point Normalization
• S,E,F representation allows more than one representation
for a particular value, e.g.
1.0 x 105 = 0.1 x 106 = 10.0 x 104
• This makes comparison operations difficult
• Prefer to have a single representation

• Hence, normalize by convention:


• Only one digit to the left of the floating point
• In binary, that digit must be a 1
• Since leading ‘1’ is implicit, no need to store it

6
Exponent Representation
• To simplify sort, sign was placed as the first bit
• For a similar reason, the representation of the exponent is also
modified: in order to use integer compares, it would be
preferable to have the smallest exponent as 00…0 and the
largest exponent as 11…1
• This is the biased notation, where a bias is subtracted from the
exponent field to yield the true exponent

• IEEE 754 single-precision uses a bias of 127 (since the


exponent must have values between -127 and 128)…double
precision uses a bias of 1023
Final representation: (-1)S x (1 + Fraction) x
2(Exponent – Bias)
61
Normalized Encoding Example
Float F = 15213.0;
• 1521310 = 111011011011012 = 1.11011011011012 X 213
• Significand
Exponent coded as biased value
M = 1.11011011011012 e = Exp – Bias ➔ Exp= e+Bias
frac = 110110110110100000000002 Exp : unsigned value denoted by Exp
Bias : Bias value
• Exponent Single precision: 127
e = 13 Double precision: 1023
Bias = 127 The largest exponent is U = 254 − 127 = 127
Exp = 140 =100011002 The smallest exponent is L = 1 − 127 = −126
Sign Exponent Fraction
1 bit 8 bits 23 bits
S=0 Exp=10001100 F=11011011011010000000000
Examples

Example: Represent -0.75ten in single and double-precision formats

Single: (No of bits = 1 + 8 + 23)


1 0111 1110 1000…000

Double: (No of bits = 1 + 11 + 52)


1 0111 1111 110 1000…000


Final representation: (0/1)S (1 + Fraction) x 2(Exponent – Bias)

63
Exercise
• Determine the single-precision representation of the
decimal number x= 37.625
• Determine the single-precision representation of the
decimal number x= -68.215
• Determine the double-precision machine representation of
the decimal number x = −28.12
• Determine the double-precision machine representation of
the decimal number x = 189.0125

11/7/2023 64
IEEE-754 32-bit floating-point to Decimal conversion
• Example 1: Suppose that IEEE-754 32-bit floating-point
representation pattern is
0 10000000 110 0000 0000 0000 0000 0000

• Sign bit S = 0 ⇒ positive number


• E = 1000 0000B = 128D (in normalized form)
Denormalised exponent (128-127)
• Fraction is 1.11B (with an implicit leading 1) = 1 + 1×2^-1 +
1×2^-2 = 1.75D
• The number is +1.75 × 2^(128-127) = +3.5D
11/7/2023 65
Example-2
• Suppose that IEEE-754 32-bit floating-point representation
pattern is
1 01111110 100 0000 0000 0000 0000 0000

Sign bit S = 1 ⇒ negative number


E = 0111 1110B = 126D (in normalized form)
Fraction is 1.1B (with an implicit leading 1) = 1 + 2^-1 =
1.5D
The number is -1.5 × 2^(126-127) = -0.75D

11/7/2023 66
Exercise
• Determine the decimal number corresponding to the
following single-precision machine number:
1 10011001 00000000000000000000001

• Determine the decimal number corresponding to the


following single-precision machine number:
0 11001011 00000000000000011000101

11/7/2023 67
FP arithmetic operations
11/7/2023 68
Precision Considerations
1. Guard Bits
• Although the significands (mantissas) of initial operands and final results are limited to a specified
number of bits (e.g. 24 bits for single format, including the implicit leading 1), it is still often necessary
to have some extra bits to accommodate the results of the execution of intermediate steps of any type of
arithmetic operation.
• Fortunately, the ALU registers that hold the exponent and significand of each operand prior and after a
floating-point operation are always greater in length than the length of the significand plus an implied
bit. The register thus automatically provides additional bits to the operands as well as to the final results,
and these types of extra bits that are retained are often called the guard bits, which help to realize the
maximum accuracy in the final results.
• 2. Truncation
• At the time of generating any result, it is often required to remove the guard bits from the extended
significand by appropriately chopping it off to bring it to a specified length that simply approximates the
longer version. This type of act making no changes in the retained bits is commonly known
as truncation.
• Chopping: Simple cutting the bits.
• Rounding: Rounding is essentially a variant of truncation that also disposes guard bits (extra bits) when represented in a specific
format.

11/7/2023 69
Example

11/7/2023 70
FP arithmetic operations
Ye
Addition X+Y (adjusted Xm + Ym) 2 where Xe ≤ Ye
Ye
Subtraction X-Y (adjusted Xm - Ym) 2 where Xe ≤ Ye
Xe+Ye
Multiplication XxY (adjusted Xm x Ym) 2
Xe-Ye
Division X/Y (adjusted Xm / Ym) 2

11/7/2023 71
Floating-Point Addition & round up operation

Numbers to be added: In this example, there are


x = 25  1.00101101 Operand with three bits to the right of the
y = 21  1.11101101 smaller exponent unit of the last place.
to be preshifted If these three bits are 0xx,
then we round down; if the
Operands after alignment shift: bits are 100 we round to
x = 25  1.00101101 even; if the bits are 1xx and
y = 25  0.000111101101 at least one of the x’s is a
Extra bits to be one, we round up.

Result of addition: rounded off


s = 25  1.010010111101
s = 25  1.01001100 Rounded sum

Figure : Alignment shift and rounding in floating-point addition.

11/7/2023 Slide 72
FP Addition
• Considerthe following decimal example (can maintain only 4
decimal digits and 2 exponent digits)

9.999 x 101 + 1.610 x 10-1


Convert to the larger exponent:
9.999 x 101 + 0.016 x 101
Add
10.015 x 101
Normalize If we had more fraction bits,
these errors would be minimized
1.0015 x 102
Check for overflow/underflow Round
1.002 x 102
Re-normalize

73
Overflows

• For an unsigned number, overflow happens when the last carry (1) cannot be
accommodated

• For a signed number, overflow happens when

▪ when the sum of two positive numbers is a negative result


▪ when the sum of two negative numbers is a positive result
▪ The sum of a positive and negative number will never overflow

▪Not in this range -126 <= Exponent <= 127 i.e. 1 <= Biased
Exponent <= 254

• 11/7/2023 74
Addition and Subtraction
• Addition is similar to decimal arithmetic

• For subtraction, simply add the negative number – hence, subtract A-B involves
negating B’s bits, adding 1 and A

Source: P &H textbook

11/7/2023 75
FP Multiplication Steps

•Steps:
▪ Compute exponent (careful!)
(Remove the bias→ Add→ Restore
the Bias)
▪ Multiply significands (set the
binary point correctly)
▪ Normalize
▪ Round (potentially re-normalize)
▪ Assign sign (Add sign bits, mod 2,
to get sign of resulting
multiplication. Same as XORing the
sign bit)

76
Sample multiplication

11/7/2023 77
FP Multiplication Example
• Suppose you want to multiply following two numbers:

• Now, these are steps according to above algorithm: Biased Exponent value
• Given, A = 1.11 x 2^0 and B = 1.01 x 2^2 for 4 bits=??
• So, exponent c = a + b = 0 + 2 = 2 is the resulting exponent.
• Now, multiply 1.11 by 1.01, so result will be 10.0011
• We need to normalize 10.0011 to 1.00011 and adjust exponent 1 by 3 appropriately.
• Resulting sign bit 0 (XOR) 0 = 0, means positive.
• Now, truncate and normalize it 1.00011 x 2^3 to 1.000 x 2^3.
e=3 Bias = ?? Exp= ??
11/7/2023 78
FP Division Example
• Division of IEEE 754 Floating point numbers (X1 & X2) is done
by dividing the mantissas and subtracting the exponents.
X3 = (X1/X2)
= (-1)S1 (M1 x 2E1) / (-1) S2 (M2 x 2E2)
= (-1) S3(M1/M2) 2 (E1-E2)

1) If divisor X2 = zero then set the result to "infinity", if both X1 and X2 are zero's set it to "NAN"
2) Sign bit S3 = (S1 XOR S2).
3) Find mantissa by dividing M1/M2
4) Exponent E3 = (E1 - E2) + bias
5) Normalize if required, i.e by left shifting the mantissa and decrementing the resultant exponent.
6) Check for overflow/underflow
If E3 > Emax return overflow i.e. "infinity"
If E3 < Emin return underflow i.e. zero

11/7/2023 79
IEEE 754 standard floating point Division Example:
X1=127.03125 Binary Division Examples
X2=16.9375 Result=7.5
Equivalent floating point binary words are

1) S3 = S1 XOR S2 = 0 2) E3 = (E1 - E2) + bias = (10000101) -


(10000011)+ (1111111)
= 133-131+127 => 129 => (10000001)
3) Divide the mantissas M1/M2
Final result

X3 in decimal = 7.5

11/7/2023 80
References:

• Computer Organization and Design 5th Edition


by Patterson and Hennessy.
• Computer Architecture and Organization; J. P. Hayes; Third Edition,
McGraw Hill.
• Digital integrated circuits by Jan Rabaey Prentice Hall 2nd edition
• Computer Organization; Safwat G. Zaky, Zvonko G. Vranesic, Carl
Hammacher ; McGraw Hill.

11/7/2023 81
THANKS
11/7/2023 Slide 82

You might also like