Mulitpliers Dividers Supp4 PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Supplement to

Logic and Computer


Design Fundamentals
4th Edition 1

MULTIPLIERS AND DIVIDERS


elected topics not covered in the fourth edition of Logic and Computer Design

S Fundamentals are provided here for optional coverage and for self-study. This
material fits well with the desired coverage in some programs but not may not fit
within others due to time constraints or local preferences. This supplement introduces
unsigned integer multipliers and dividers. Basic algorithms are introduced and two
fundamental concepts that support high-speed implementations are presented. One
example high-speed implementation is outlined for each operation in order to give a
sense of how these operations are handled using combinational circuits in modern
VLSI chips. References [3] and [4] provide additional information on this topic.

Introduction
Many digital systems. including computers, require implementation of integer
and/or floating point multiplication and division. Further, many of these applica-
tions require that multiplication and division for operands of significant size be
executed at very high speeds. Historically, the recursive algorithms for these two
operations if implemented in a straightforward manner as sequential circuits with
maximum hardware reuse have very limited performance. Our goals here are to
present these recursive algorithms for integer multiplication and division, intro-
duce two fundamental techniques to improved performance, and to provide an
implementation of each algorithm that executes at high speed.

MULTIPLICATION The basic multiplication algorithm recursively applies the follow-


ing multiply step to each successive bit of the multiplier beginning with the its LSB.
AND the multiplier bit with the entire multiplicand, add the result to the
accumulating partial product, and shift the accumulating partial product and multi-
plier one bit to the right.
Repeat this step until the MSB of the multiplier has been processed, and read
the final product.

1© Pearson Education 2008. All rights reserved.

1
• EXAMPLE 1 Fixed Point Multiplication
The operation to be illustrated is fixed point unsigned binary multiplication, which
is quite simple. The multiplier digits are always 1 or 0. Therefore, the partial prod-
ucts are equal either to the multiplicand or to zero. Multiplication is illustrated by
the following example:
Multiplicand: 1011
Multiplier: × 1010

Initialize Partial Product Z to 0 0000

Add (Y0 = 0) x (X = 1011) to Z + 0000


and shift right one position 00000

Add (Y1 = 1) x (X = 1011) to Z + 1011


and shift right one position 010110

Add (Y2 = 0) x (X = 1011) to Z + 0000


and shift right one position 0010110

Add (Y3 = 1) x (X = 1011) to Z + 1011


and shift right one position Product: 01101110

This example shows an initialization step followed by four recursive multiply steps,
one for each bit of the multiplier. The algorithm illustrated differs from the classic
hand multiplication algorithm in two ways. First of all, the left shift of the multipli-
cand is replaced by a right shift of the partial product which permits an adder of
width n, the width of the multiplicand to be employed. Second, the single addition
of partial products at the end is replaced by immediate addition of each partial
product to form the accumulating partial product to avoid the need for a multi-
operand adder. This second change, however, is often reversed in modern imple-
mentations of the algorithm. •

DIVISION Initially, the dividend, divisor, and quotient are assumed to be n-bit
quantities. The initial partial remainder is n 0s followed by the dividend. The basic
division algorithm recursively applies the following divide step to find successive
bits of the quotient beginning with the its MSB.
Shift the partial remainder one bit to the left. Subtract the divisor from the
partial remainder. If the new partial remainder is positive, then set the new quo-
tient bit to 1 and shift the new partial remainder and the quotient one bit to the
left. If the new partial remainder is negative, set the new quotient bit to 0, and
replace the new partial remainder with the original one and shift it and the quo-
tient one bit to the left.
Repeat this divide step until the all bits of the quotient have been generated
to produce the quotient and remainder.

2
• EXAMPLE 2 Fixed Point Division
The operation to be illustrated is fixed point unsigned binary division. The compu-
tation begins with the default initial partial remainder, n 0s followed by the n-bit
dividend. Each divide step must perform subtraction of the divisor from the shifted
partial remainder in order to determine the value of the quotient bit. Based on this
result the quotient bit value is set and if it is 0, the new partial remainder is replace
by the prior partial remainder. Note that the least significant half of the partial
remainder and the quotient can share the same storage space.
Dividend: 00110101
Divisor D: 00000101
Zero/Partial Remainder R: 0000000000110101
Shift R and Q left. Subtract D. Since the new R 000000000110101
is −, replace with prior R, and append 0 to Q. − 00000101Quotient↓
0000000001101010
Shift R and Q left. Subtract D. Since the new R 000000001101010
is −, replace with prior R, and append 0 to Q. − 00000101
0000000011010100
Shift R and Q left. Subtract D. Since the new R 000000011010100
is −, replace with prior R, and append 0 to Q. − 00000101
0000000110101000
Shift R and Q left. Subtract D. Since the new R 000000110101000
is −, replace with prior R, and append 0 to Q. − 00000101
0000001101010000
Shift R and Q left. Subtract D. Since the new R 000001101010000
is +, append 1 to Q. − 00000101
0000000110100001
Shift R and Q left. Subtract D. Since the new R 000000110100001
is −, replace with prior R, and append 0 to Q. − 00000101
0000001101000010
Shift R and Q left. Subtract D. Since the new R 000001101000010
is +, append 1 to Q.
0000000110000101
Shift R and Q left. Subtract D. Since the new R 000000110000101
is −, replace with prior R, and append 0 to Q. − 00000101Quotient↓
Remainder: 0000001100001010

From this example, it is apparent that the step must be applied n times where
n is the number of bits to be generated in the quotient.

Speed-up Fundamentals
The algorithms presented and illustrated for multiplication and division are charac-
terized by two common properties: (1) each of them consists primarily of a recur-
sive step, and (2) each execution of the recursive step processes a single bit of the
multiplier or produces a single bit of the quotient. respectively, for multiplication

3
and division. Both algorithms can be implemented by a sequential circuit with stor-
age registers for the operands, a block of combinational hardware that implements
the recursive step by processing one bit of the multiplier or quotient per clock
cycle.
The number of clock cycles required can be reduced from n to n/m for per-
forming multiplication by using m copies of the recursive combinational block
interconnected to process m bits, 1 < m ≤ n of the multiplier in a single clock cycle.
The approach is illustrated by use of an adder array in the following example in
which m = n.
• EXAMPLE 3 Adder Array
Consider the multiplication of two 4-bit numbers, as shown in Figure 1. The multi-
plicand is X3, X2, X1, X0 , the multiplier is Y3, Y2, Y1, Y0 , and the product is Z7, Z6,
Z5, Z4, Z3 , Z2 , Z1, Z0 . The first partial product is formed by multiplying X3, X2, X1,
X0 by Y0 . The multiplication of two bits such as X0 and Y0 produces a 1 if both bits
are 1; otherwise it produces a 0. This is identical to an AND operation. Therefore,
the partial product can be implemented with AND gates as shown in the diagram.
The second partial product is formed by multiplying X3, X2, X1, X0 by Y1 and is
shifted one position to the left. The sum of the two partial products is produced by
a binary adder. Note that the least significant bit of the product does not have to go
through an adder, since it is completely formed by the output of the first AND
gate. The same approach as used for multiplier bit Y1 is also used for multiplier bits
Y2 and Y3 as shown in the final circuit in Figure 1. Note that the Carry Out can be
a 1, so it enters the MSB of the adder at the next level down. In general, for J mul-
tiplier bits and K multiplicand bits, we need J × K AND gates and (J ⫺ 1) K-bit
adders to produce a product of J ⫹ K bits. For the Example 1 multiplier in Figure
1, K ⫽ 4 and J ⫽ 4, so we need 16 AND gates and three 4-bit adders to produce a
product of 8 bits. •
In Example 1, the entire multiplication is completed for all multiplier bits in a
single clock cycle using only combinational logic. Thus, in effect, four bits of the
multiplier are processed simultaneously. However, because of the cost and the
delay of an adder, an additional goal with respect to processing multiple bits of the
multiplier or quotient can be to process m multiplier bits per recursive step. In this
case, only a single adder of length n + m bits is required. for the step. Such an
approach requires that multiples of the multiplicand be made available to the
adder and additional hardware be made available to select these multiples. This
second speedup approach is illustrated by the next example.
• EXAMPLE 4 Multiplier Radix Greater Than 2
Consider again the multiplication of two 4-bit numbers, as shown in Figure 1. Our
goal is to use only two recursive steps instead of four to perform this multiply,
reducing the number of adders required and the delay through the circuit. The first
partial product is to multiply the multiplicand by Y1, Y0, and the second partial
product is multiply the multiplicand by Y3, Y2. The values taken on by these bit
pairs are 0, 1, 2, 3, the digits of radix 4. To process multiplier digits in radix 4, we
need to provide multiples of X = X3, X2, X1, X0 by 0, 1, 2, and 3 instead of by just 0
and 1 as for radix 2. This requires that pairs of the AND operation structure from

4
Y0
X3 X2 X1 X0

Y1
X3 X2 X1 X0

Addend Augend
4-bit adder
Carry
output Sum

Y2
X3 X2 X1 X0

Addend Augend
4-bit adder
Carry
output Sum

Y3
X3 X2 X1 X0

Addend Augend
4-bit adder
Carry
output Sum

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

FIGURE 1
A 4-Bit by 4-Bit Binary Multiplier

5
Example 3 be replace by multiplexers able to select one of these four multiples of
the multiplicand. The formation of the multiples and their selection is illustrated in
Table 1. From the table, it is apparent that four values must be provided to be

• TABLE 1 2-bit Multiplier

Multiplier Bits Multiple of Multiplicand

Yi+1 Yi Multiples Implementation

0 0 0 0

0 1 1 X

1 0 2 Shift left X by 1

1 1 3 (Shift left X by 1) + X

selected for each of the recursive steps. The first value 0 is a constant. The second
value is simply X. The third value is 2X which is obtained by shifting X one posi-
tion to the left, filling the vacant position with 0. The final value is 3X which is
obtained by using an adder to add X to 2X. The hardware for forming these four
values is shown in the upper part of Figure 2.
Since there are only two radix 4 values in the 4 × 4 multiplier, there are only
two multiples to be selected from the four values in Table 1, Y1Y0 × X and Y3Y2 ×
X. These two operands require only a single adder as shown in Figure 2. In order
to select one of the four values for each adder input, a multiplexer with select
inputs Y1,Y0 and one with select inputs Y3,Y2 are used. Note that bits Z1 and Z0
bypass the adder since they depend only on the value selected by Y1 and Y0, The 0s
inserted are there to show the shifting of operands relative to each other and as
fixed inputs can be used to simplify hardware using contraction as presented in
Chapter 4. The width of the various operands external and internal to the multi-
plier are determined by looking at the number of bits in the incoming operands
and the number of bits, including outgoing carries that are non-zero, in the results.

High-Speed Multipliers
In this section and the next section, the two speedup approaches will be applied to
an example of a contemporary multiplier design and a contemporary divider
design. Before presenting these examples, we need to introduce the concept of the
carry save adder (CSA) which is used in both of the designs.

CARRY SAVE ADDER


The carry save adder consists of full adders as shown in Figure 3. Beginning with a
ripple carry adder as a starting point, the ripple carries are disconnected. This pro-

6
0 X
6 4

1 Shift Left by 1 Bit

0 5
5
6 6

Addend Augend
5-bit Adder
Carry
output Sum
6
3X

2X

1X

0X

D0 D1 D2 D3 D0 D1 D2 D3
Y2 S0 MUX Y0 S0 MUX
Y3 S1 Y1 S1
6
6
4
00 6
Addend Augend
6-bit Adder

Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0

FIGURE 2
A 4-Bit by 4-Bit Binary Multiplier with Radix-4 Multiplication

vides an additional input on each full adder to apply an operand bit, permitting
three inputs to the adder. Further, as well as the sum output, the carry output from
each full adder is provided. Thus, the sum vector S and the carry vector C repre-
sent the actual sum of the three input operands, without the carries fully propa-
gated. Because 2n bits are required to represent an n-bit sum, this is sometimes
referred to as a redundant representation of the sum. These S and C vectors can be
summed by a carry propagate adder to form the overall sum of the three opera-
tions. A single carry save adder is not of a great amount of value. However,
replacement of all but the final carry propagate adders in Figures 1 or 2 with care-
fully constructed arrays of carry save adders can be of value in reducing both delay
and the amount of digital logic required.

7
B3 A3 D3 B2 A2 D2 B1 A1 D1 B0 A0 D0

FA FA FA FA

C4 S3 C3 S2 C2 S1 C1 S0

FIGURE 3
4-bit Carry-Save Adder (CSA)

WALLACE TREE MULTIPLIER There are a number of different designs for high-
speed adder arrays for multiplication. The one we illustrate as an example is the
Wallace tree multiplier. This multiplier consists of an arrangement of CSAs inter-
connected in the form of an inverted tree with the outputs at the bottom and the
partial products at or near the top. The block diagram for an 8 × 8 Wallace tree
multiplier for an 8-bit multiplicand X and an 8-bit multiplier Y is shown in Figure
4. Rather than showing the input and output vectors, we show the 8-bit partial
products consisting of the AND of a multiplier bit Yi with the multiplicand X. The
partial products are appropriately shifted to the left depending on the value of i. In
the tree, full use is made of the three inputs on each CSA. If the tree is built up
from the bottom, the structure is quite apparent. The bottom component is a carry
propagate adder, typically a carry lookahead adder (CLA) for improved perfor-
mance. Above the CLA and the first carry save adder (CSA), pairs of C and S
Y5X Y4X Y3X Y2X Y1X Y0X

CSA2 CSA1
C S C S
Y7X Y6X

CSA4 CSA3
C S C S

CSA5
C S

CSA6
C S

CLA7
S

FIGURE 4
8 × 8 Wallace Tree Multiplier

8
inputs are applied from right to left to the triples of inputs on the row of maximum
possible CSAs below. This can be continued until the number of inputs to the top
row exceeds the number required. For the Wallace tree multiplier shown, the top
row if completed would have 12 inputs which exceeds the eight required, one for
each bit of the multiplier. The four excess inputs can be removed by pruning the
tree which amounts to removing some of the CSAs in the top row and possibly in
lower rows. In this case, to reduce the 12 inputs to eight inputs, two CSAs are
removed from the right top. This removes six inputs and opens up two inputs in the
second row, for a net reduction of four inputs.
The full logic diagram of the Wallace tree multiplier based on the diagram in
Figure 4 is shown in Figure 5. The CSA are made up of full adders (FA) with three
inputs and two outputs also called 3-2 adders. In some cases, along the left and
right edges of the adder, there are unused inputs to the CSAs, so they degenerate
into half adders (HA) with two inputs (2-2 adders) and, in extreme cases, just lines.
The process or generating this logic is not insignificant since preserving the proper
alignment of the C and S vectors while progressing up or down the tree is critical.
In this example, the FAs and HAs are aligned so that the S bits go straight down
and the C bits going down move one position to the left.
The Wallace tree shown in Figure 4 processes eight multiplier bits in five lev-
els. In contrast, a conventional series of adders for eight multiplier bits uses seven
levels. So the delay in terms of levels is reduced by 29 percent. Using the more
detailed diagram in Figure 5, there are nine levels through the CSAs plus the CLA
on the longest path. In the series of adders, there are 12 levels through the CLAs,
for a delay reduction in terms of levels of 25 percent. So the two evaluation meth-
ods yield about the same amount of improvement in delay.

HIGHER RADIX MULTIPLIER PROCESSING As illustrated in Example 2, two or more


bits of a multiplier may be retired per multiply step. Just as in that example, this
approach has potential for speeding up multiplication and for reducing hardware
costs in multipliers using carry save adder trees. In this case, we use a radix-4 multi-
plier, retiring two multiplier bits per multiply step in the 8 × 8 multiplier.
The four partial products needed for the four multiply steps are produced by
adding two multiplexers to the same circuit used in the upper part of Figure 2. The
resulting circuit is shown in the upper part of Figure 6. The four partial products
are fed to a 4-input CSA tree requiring just two CSA levels. In addition to dealing
with more than two multiplier bits, this circuit also reduces the hardware required
by a considerable amount going from a total of seven adders to four adders.

High-Speed Dividers
For each divide step, a single quotient bit is generated. Further, in order to deter-
mine the quotient bit, a trial subtraction must be performed in which the quotient
bit is temporarily set to 1 and the divisor is subtracted from the current partial
remainder to determine the sign of the new partial remainder. If the sign is 0 (non-
negative), then the new partial remainder replaces the old one and the quotient bit

9
10
A B
Cout Cin
S Y1X7 Y1X6 Y2X4 Y0X6 Y1X4 Y2X2 Y0X4 Y1X2 Y2X0 Y0X2 Y0X1
Y2X7 Y2X6 Y2X5 Y0X7 Y1X5 Y2X3 Y0X5 Y 1X3 Y2X1 Y0X3 Y 1X1 Y1X0 Y0X0
(a)
CSA1 HA FA FA FA FA FA FA HA

Y4X7 Y5X5 Y3X7 Y 4X5 Y5X3 Y3X5 Y4X3 Y5X1 Y3X3 Y4X1 Y4X0 Y3X0
Y5X7 Y5X6 Y4X6 Y5X4 Y3X6 Y4X4 Y5X2 Y3X4 Y4X2 Y5X0 Y3X2 Y3X1

CSA2 HA FA FA FA FA FA FA HA

partial remainder retained.


CSA3 FA FA FA FA FA FA FA HA

Y6X7 Y6X6 Y6X5 Y6X4 Y 6X3 Y6X2 Y6X1 Y6X0


Y7X7 Y7X6 Y7X5 Y7X4 Y7X3 Y7X2 Y7X1 Y7X0

CSA4 HA FA FA FA FA FA FA HA

CSA5 FA FA FA FA FA FA FA HA HA HA

CSA6
HA FA FA FA FA FA FA FA HA HA HA

CLA7 CLA

Z15 Z14 Z13 Z12 Z11 Z10 Z9 Z8 Z7 Z6 Z5 Z4 Z3 Z2 Z1 Z0


(b)

divisor from the appropriately shifted partial remainder. Assuming that these are
To implement the above for a divide step, a subtractor is used to subtract the
is 1. If this sign is negative, then the quotient bit is changed to 0 and the current
FIGURE 5
Figure 5Wallace Multiplier Details
0 X

Shift Left by 1 Bit

Addend Augend
5-bit Adder
Carry
output Sum

3X

2X
1X

0X

D0 D1 D2 D3 D0 D1 D2 D3 D0 D1 D2 D3 D0 D1 D2 D3
Y6 S0 MUX Y4 S0 MUX Y2 S0 MUX Y0 S0 MUX
Y7 S1 Y5 S1 Y3 S1 Y1 S1

CSA
C S

CSA
C S

CLA
S

FIGURE 6
Radix-4 Multiplier

unsigned numbers, the subtraction is performed by adding the two’s complement


of the divisor to the partial remainder. If no carry out occurs, then the result is neg-
ative, the quotient bit is set to 0, and the old partial remainder is shifted one posi-
tion to the left. If a carry occurs, then the result is non-negative, the quotient bit is
set to 1, and the new partial remainder shifted one bit to the right replaces the old
partial remainder.
Just as was done for multiplication, the combinational logic used to perform
the above divide step can be cascaded to build a combinational divider that pro-
cesses multiple quotient bits, one at a time. The requires n such circuits for n divide
steps and corresponds to the cascaded adders for multiplication in Figure 1. There
will be a multiplexer between adders in order to select between the new and old
partial remainders. CSA adders can be used to add the inverted divisor and a 1 to
the most significant n bits of the partial remainder. This performs a 2s-complement
unsigned subtraction of the divisor from the partial remainder. This can also be

11
converted to a CSA adder cascade with multiplexers for both C and S inserted
between the adders. However, due to the necessity that the subtractions be per-
formed for each quotient bit sequentially, this cannot be converted to a CSA tree.
In order to achieve greater speed up and potential reduction of hardware, the logi-
cal next step is to attempt multiple bit quotient generation.
To illustrate multiple quotient bit generation, the circuit in Figure 7 is an
alternative implementation of a division method appearing in the patent reference
[4]. This particular method generates the quotient radix 16, i. e., four bits at a time.
Initially, we consider conceptually the problem of generating four bits of quotient
in a divide step. Recall that in the one-bit-at-a-time quotient generation, a trial
subtraction of the divisor from the appropriately shifted dividend is performed to
determine whether the quotient is 0 or 1. Extending this approach, to generate a
divisor four bits at a time could potentially require 15 trial subtractions simulta-
neously, a procedure that is too costly in hardware. By examining a few of the most
significant divisor bits and several of the most significant shifted partial remainder
bits, this can be reduced to a maximum of five trial subtractions. In the upper left
of Figure 7, two divisor bits and four partial remainder bits are examined by using
a lookup table 2 implemented by a combinational circuit or a high-speed memory.
The outputs of the lookup table consist of five hexadecimal values that are know to
contain the representation of the actual four quotient bits.
The five hexadecimal values are used to generate five multiples of the divisor
that are to be subtracted from the shifted partial remainder. The shifting opera-
tions and CSA adder in the upper center of Figure 7, produce multiples of 12 and
of 3 of the divisor. Shifting operators alone at the inputs to the first pair of multi-
plexers produce multiples of 0, 4, and 8 and 0, 1, and 2, respectively of the divisor.
The pair of dual multiplexers that follow combine these multiples with the multi-
ples of 12 and 3, respectively produced by the CSA in C and S form. The multi-
plexers must be dual because they have to past the C and S form. For the multiples
from the first pair of multiplexers, a 0 is applied to each of the second multiplexers
for the C vector. Table 2 shows how the multiples of 0, 4, 8, 12 and 0, 1, 2, 3 are
selected by the four quotient bits to produce multiples of 0 through 15. These mul-
tiples enter a CSA adder tree consisting of two CSA adders to produce the divisor
multiple. In order to not have to deal with taking the twos complement of the divi-
sor multiple in C and S form, we subtract the n most significant bits of the shifted
partial remainder from the divisor multiple using a cascade of a final CSA followed
by a CLA. The operation performed is qD + P + 1 = qD − P. The carry out is that
of qD − P. which is 0 for P − qD negative. The entire circuit, encircled by a dashed

2 This implementation uses a pre-shift of the dividend and divisors so that they are normal-
ized as done for floating point numbers. Each of these two operands are shift to the left so
that the MSB has value 1. Now that the divisor always contains a 1 as its MSB, the two divi-
sor bits used for the lookup table are the two bits following the MSB since the fixed 1 pro-
vides no information. Thus the MSB is implicitly being used giving three divisor bits. The
amount that each of the operands is shifted to the left is saved. For fixed point division, this
shift information controls post-shifts applied to the quotient and the remainder to change
them from floating point to fixed point format.

12
• TABLE 2 Formation of Multiples of 0 through 15

Most Significant Least Significant


Multiple Half Half

0 0 0
1 0 1
2 0 2
3 0 3
4 4 0
5 4 1
6 4 2
7 4 3
8 8 0
9 8 1
10 8 2
11 8 3
12 12 0
13 12 1
14 12 2
15 12 3

line, is called a Trial Divide Step (TDS) circuit and is replicated five times to per-
form the five trial divisions. The carry outs of the five TDSs enter a priority
encoder with TD5’s carry out highest priority D (TD0 through TD4 have the divi-
sors ordered from smallest to largest). The priority encoder selects the largest divi-
sor that results in a carry out of 1, corresponding to the largest quotient that
produces a positive partial remainder, which is the actual quotient. This quotient is
selected from the output of the multiplexer below the lookup table using the prior-
ity encoder outputs. Likewise, the 2s-complement of n bits of the new partial
remainder is selected by the multiplexer driven by the TDSs. This is inverted and
+1 is added to obtain the new partial remainder bits which are loaded into the
most significant n bits of the new partial remainder which is also shifted by four to
the left. For an n-bit divisor, this circuit is used k times sequentially where k is an
integer, k ≥ n/4.

References
1. MANO, M. M. AND C. R. KIME. Logic and Computer Design Fundamentals, 4th
ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2008.
2. MANO, M. M. AND C. R. KIME. Logic and Computer Design Fundamentals, 3rd
ed. Upper Saddle River, NJ: Pearson Prentice Hall, 2004.
3. HENNESSY, J. L. AND D. A. PATTERSON, Computer Architecture, a Quantitative
Approach, 4th ed. Morgan Kaufmann/Elsevier, Inc., 2007.
4. SHEAFFER, G. S., Computer for Performing Non-Restoring Division, United
States Patent 5,818,745, United States Patent and Trademark Office,
http://www.uspto.gov/, October 6, 1998.

13
Divisor Partial Remainder D
Bits Bits
1 Shift Left 1 Bit
2 4

Quotient Addend Augend


Candidate CSA
Lookup S
C
Table

Shift Left 2 Bits 2


2 Shift Left 2 Bits

To TDS 1 through TDS 4 To TDS 1 through TDS 4

0 2 3 0 1

S03,S02 D0 D1 D2 D0 D1 D2
S01,S00
S1,S0 MUX S1,S0 MUX

D0 D1 D2 D3 D4 0 0
S2,S1,S0 MUX
D01 D00 D10 D11 D01 D00 D10 D11
4 Quotient Bits S C MUX S S C MUX S

CSA
C S

CSA
C S

+1
CSA
C S

Carry Out TDS 0 CLA


Trial Divide Step A (TDS 0) TDS 1 TDS 2 TDS 3 TDS 4

Carry Outs from TDS 1 through TDS 4

Priority 3 D0 D1 D2 D3 D4
Encoder S2,S1,S0 MUX

+1

4 Shift Left 4 Bits 4 Shift Left 4 Bits

Partial Remainder
Register R

FIGURE 7
Radix-16 Divider

Problems
1. Perform a fixed point multiplication with multiplier 11001001 and
multiplicand 11100011.

14
2. Perform a fixed point division with divisor 00001011 and dividend 11100011.
3. Construct a Wallace CSA block diagram for an 8 × 8 multiplier.
4. Many multiple-bit processing algorithms for multiplication and division use
both positive and negative coefficients times the multiplicand or divisor,
respectively. These methods avoid the use of adders in forming multiples,
requiring only multiplicand or divisor shifts to generate multiples. For
multiplication, one of these schemes is based on the Booth algorithm, which
uses signed 2s-complement operations and result, and 2s-complement
subtraction. In this algorithm for processing of two multiplier bits at a time,
a bit position m−1, initialized to 0, is added to the right of the LSB of the
multiplier. For each step, the multiplier is shifted right by two positions with
its LSB going into m−1. Using the 3-bit combination (m1, m0, m−1) as input,
the multiple of the multiplicand M to be used for processing bit pair (m1, m0)
is determined using Table 3. After the operation specified by the table, the

• TABLE 3 2-bit Booth Algorithm Recoding

Multiplier Bits Multiple of Multiplicand

m1 m0 m-1 Multiple Implementation

0 0 0 0 Constant 0

0 0 1 +1 Add M

0 1 0 +1 Add M

0 1 1 +2 Add M left-shifted by 1

1 0 0 −2 Subtract M left-shifted by 1

1 0 1 −1 Subtract M

1 1 0 −1 Subtract M

1 1 1 0 Constant 0

product and multiplier are shifted two bit positions to the right.
(a) Show the multiple of M required for each pair of bits in the multiplier
1011100010. Remember to append the 0 to the right of the LSB before
applying Table 3.
(b) Sketch an implementation of the hardware required for a multiplication
step.

15

You might also like