Multiplexer-Based Array Multipliers: Kiamal Z. Pekmestzi
Multiplexer-Based Array Multipliers: Kiamal Z. Pekmestzi
Multiplexer-Based Array Multipliers: Kiamal Z. Pekmestzi
1, JANUARY 1999 15
Multiplexer-Based Array Multipliers
Kiamal Z. Pekmestzi
AbstractA new algorithm for the multiplication of two n-bit numbers based on the synchronous computation of the partial sums of
the two operands is presented. The proposed algorithm permits an efficient realization of the parallel multiplication using iterative
arrays. At the same time, it permits high-speed operation. Multiplier arrays for positive numbers and numbers in twos complement
form based on the proposed technique are implemented. Also, an efficient pipeline form of the proposed multiplication scheme is
introduced. All multipliers obtained have low circuit complexity permitting high-speed operation and the interconnections of the cells
are regular, well-suited for VLSI realization.
Index TermsArray multipliers, multiplication algorithm, twos complement multiplication, pipeline multipliers.
1 INTRODUCTION
ULTIPLICATION is the most critical operation in every
computational system. Innumerable schemes have
been proposed for the realization of this operation. In the
early multiplier schemes proposed by Hoffman et al. [1],
Burton and Noaks [2], De Mori [3], and Guilt [4] for posi-
tive numbers, and by Baugh and Wooley [5] and Hwang [6]
for numbers in twos complement form, the effort was on
implementations using iterative circuits with uniform inter-
connections pattern. Also, on the same base, schemes using
Booths algorithm [7], [8] were presented.
The basic idea behind all these attempts was the fast im-
plementation of the addition of the partial products. For
this purpose, the carry-save addition (CSA) technique has
been extensively used. In this technique, intermediate re-
sults are always in a redundant form of two numbers. Two
types of arrays have been proposed for the addition of the
intermediate results. In the first type, the arrays are iterative
with regular interconnection structure, permitting multipli-
cation in time O(n) [9], [10]. The second type arrays are of
tree form, permitting higher speed in O(log n) time, but the
irregular form of a tree-array does not permit an efficient
VLSI realization [11], [12].
A new type of iterative array for a parallel multiplier,
based on a five-counter addition unit, that yields a regular
interconnection structure and is faster than the previous
iterative-type multipliers, is proposed by Nakamura [13],
[14]. The implementation of this multiplier, based on differ-
ent grouping schemes of products of bits, requires ap-
proximately
n
2
2
cells, half compared to n
2
cells of the previ-
ous iterative array multipliers. However, the circuit com-
plexity of each five-counter cell is double (two gated Full-
Adders) compared to the basic cell of an iterative parallel
multiplier.
The Modified Booths algorithm [8] permits the reduc-
tion of the partial products to the half. At each step of this
algorithm two bits of the multiplier are processed. Its array
implementation requires approximately
n
2
2
cells. Now, the
cell is not a gated Full-Adder but must perform addition-
subtraction-no operation and controlled left-shift of the bits
of the multiplicand.
In this paper, a new multiplication algorithm based on a
different mechanism is proposed. At each step, one bit of the
multiplier and one bit of the multiplicand are processed. So,
the algorithm is symmetric; this means that multiplier and
multiplicand can be interchanged. According to the pro-
posed algorithm, presented in Section 2, the sum of the two
operands, progressively computed, is a useful quantity that
is used in the computation of certain partial products. This
quantity is computed one bit at each step of the algorithm
and is used in the next step, in case it is necessary. The par-
allel implementation of the proposed algorithm, presented
in Section 3, yields an iterative type array with almost equal
number of cells as the five-counter multiplier. However,
each cell of the proposed multiplier adds three bits instead
of five bits of the five-counter multiplier. Consequently, a
smaller circuit and faster addition of the partial products
must be expected. Compared to the implementations based
on the Modified Booths algorithm, this requires the same
amount of circuitry but yields faster multiplication time.
The extension of the proposed multiplication algorithm
to numbers in twos complement form is presented in Sec-
tion 4 and the corresponding array is given. Next, in Section 5,
the implementation of the proposed algorithm in bit-
parallel pipeline form is examined. The symmetry of the
proposed algorithm results in small number of delay ele-
ments for the implementation in pipeline form. In all cases,
the realization of the proposed multiplexer-based multiplier
scheme is compared to existing multiplier designs in terms
of circuit complexity (number of gates and transistors in
CMOS VLSI technology) and operation speed. The conclu-
sions are summarized in Section 6.
0018-9340/99/$10.00 1999 IEEE
1 2 0
0
1
2 K (1)
Y y y y y
n n j
j
j
n
= =
=
1 2 0
0
1
2 K . (2)
We define X
n1
and Y
n1
as the numbers that remain after
truncation of bits x
n1
and y
n1
. So:
X x x x x X X x
n n n j
j
j
n
n
n
n
=
= = = +
1 2 3 0
0
2
1
1
1
2 2 K and (3)
Y y y y y Y Y y
n n n j
j
j
n
n
n
n
=
= = +
1 2 3 0
0
2
1
1
1
2 2 K and . (4)
The product of the numbers X and Y, according to (3)
and (4), can be computed as follows:
P XY x X y Y
n
n n
n
n n
= = + +
2 2
1
1 1
1
1 1
> C> C
= + + +
2 2
2 2
1 1
1
1 1 1 1 1 1
n
n n
n
n n n n n n
x y x Y y X X Y
< A
. (5)
Let us define and generally P X Y P X Y
n n n j j j
= =
1 1 1
, (6)
where X
j
and Y
j
are the numbers formed by the j least sig-
nificant bits of X and Y, respectively. The product P
j
can be
computed by the following recursive equation:
P X Y
x y x Y y X X Y
j j j
j
j j
j
j j j j j j
=
= + + +
2 2
2 2
1 1
1
1 1 1 1 1 1 J L
= + + +
2 2
2 2
1 1
1
1 1 1 1 1
j
j j
j
j j j j j
x y x Y y X P
J L
. (7)
According to the above relations, the product P = XY can be
computed by the equation:
P x y x Y X y
j j
j
j
n
j j j j
j
j
n
= + +
=
2 2
2
0
1
1
1
J L
. (8)
The main computation concerns the addition of the terms
Z x Y X y
j j j j j
= + . (9)
These terms must be added, properly weighted, and the
product is given by the next relation:
P x y Z
j j
j
j
n
j
j
j
n
= +
=
2 2
2
0
1
1
1
. (10)
The above equation describes a multiplication algorithm
that is illustrated in Fig. 1, where the grouping of the partial
product bits are distinguished by connecting them with
solid lines. By folding the array of this figure along the di-
agonal line, the final form of the algorithm, given by (10), is
derived.
The values that quantity Z
j
can take depend on the values
of the logical variables x
j
and y
j
and are shown in Table 1.
The only case where Z
j
requires computation is when the
two bits of the multiplied numbers have value 1. At each
step j, only s
j
and c
j+1
are new. The rest of the bits of S
j
have
been formed in the previous j 1 steps according to the
relation:
S S x y
j j j j
= + +
1 1 1
. (11)
The sum S = X + Y can be computed once. During the jth
iteration of the algorithm, the j least significant bits of S with
c
j
as the most significant bit form the quantity S
j
which, in
turn, is used only if x
j
= y
j
= 1. In this case, however, s
j+1
= c
j
and, in the proposed algorithm, the quantity S
j
can be com-
puted equivalently by both the following relations:
Fig. 1. Schematic illustration of the proposed multiplication algorithm.
PEKMESTZI: MULTIPLEXER-BASED ARRAY MULTIPLIERS 17
S c s s s S s s s s
j j j j j j j j
= =
1 2 0 1 2 0
K K or . (12)
The addition of the two operands can be performed on a
carry-propagate adder shown in Fig. 2. There is no need to
use a faster adder because the computation of each term S
j
must be synchronized with the jth step of the algorithm. In
the implementations that follow, the first form of (12) is
used. Each term of the partial products Z
j
can be selected
according to Table 1. This operation can be realized using a
4 1 multiplexer with bits x
j
and y
j
being the control inputs.
In the next section, we present the parallel realization of the
above described algorithm.
3 PARALLEL REALIZATION OF THE MULTIPLEXER-
BASED MULTIPLIERS
The terms Z
j
2
j
and x
j
y
j
2
2j
of (10) are produced in the jth row
of MUX shown in Fig. 3. These terms can be summed on an
array of carry-save adders. The partial products that corre-
spond to the right-boundary MUX are added in the first row
of cells of Fig. 4. The partial products that are produced by
the neighboring MUX are added in the second row of cells of
the array of Fig. 4 and so on. Generally, the terms Z
j
2
j
and
x
j
y
j
2
2j
are added at the jth diagonal column cells of this ar-
ray. The array is composed of two types of cells. The first
type of cells includes a 4 1 multiplexer and a Full-Adder
and is shown in Fig. 5a. Input bits x
j
and y
j
are broadcast to
n 1 i cells of first type at the ith horizontal row of the ar-
ray. The input bits x
j
and y
j
are broadcast to j + 1 diagonally
placed cells of the array. The jth diagonal row of cells consists
of j first type cells and one of second type cell. In total,
n n ( ) 1
2
first type cells and n second type cells are required.
TABLE 1
THE VALUES OF Z
j
x
j
y
j
Z
j
0 0 0
0 1 X
j
1 0 Y
j
1 1 X
j
+ Y
j
= S
j
Fig. 2. A carry-propagate adder for the generation of S
j
.
Fig. 3. The implementation of the terms Z
j
2
j
and x
j
y
j
2
2j
used in the proposed multiplication algorithm.
18 IEEE TRANSACTIONS ON COMPUTERS, VOL. 48, NO. 1, JANUARY 1999
The second type cell, shown in Fig. 5b, includes two dis-
tinct circuits, each one of which performs a different job.
The first circuit is a Full-Adder that produces at the new
bits s
j
= s
i
and c
j+1
. In all second type cells, indices i and j
can be interchanged because they participate in horizontal
and diagonal row of cells of the same order. Bits s
i
along
with x
j
and y
j
are broadcast to all first type cells of the ith
row of the array. Bits c
j+1
are propagated to the next second
type cells (for the progressive formation of the terms S
j
)
according to (11) and (12). The second circuit of the second
type cells completes the addition of the term Z
j
by sum-
ming the last bit of this term, equal to x
j
y
j
c
j
, while at the
same time it generates the term x
j
y
j
, which is required in
the computation of the product, according to (10).
Finally, small complexity two-bit carry-lookahead adders
permit the addition of the outputs of the right boundary
cells yielding high-speed operation. Such a carry-lookahead
adder is given in Fig. 6.
As can be seen in Fig. 4, the total multiplier circuit con-
sists of
n n
n
( )
+ +
1
2
2 1
> C
Full-Adders,
n n ( ) 1
2
multiplexers, 2n
AND gates and n 2 two-bit carry-lookahead (CLA) add-
ers. For the estimation of the complexity of the proposed
Fig. 4. The proposed multiplexer-based parallel multiplier.
Fig. 5. (a) First type cell (CELL I) used in the proposed array multiplier. (b) Second type cell (CELL II) used in the proposed array multiplier.
PEKMESTZI: MULTIPLEXER-BASED ARRAY MULTIPLIERS 19
multiplier circuit, the number of gates of each one of the
above circuits is given in Table 2. Also, in this table, the
transistor count, using a standard CMOS VLSI realization,
for the same circuits [15] is given.
Taking into account the Table 2, the total number of gates
and transistors (in CMOS VLSI realization) for the iterative
type, the five-counter cell and the proposed multiplexer-
based multipliers have been found and are presented in
Table 2. Notice that the proposed multiplier scheme of Fig.
4 yields almost the same circuit complexity with the arrays
based on the Modified Booths algorithm, while it requires
a considerably smaller number of gates or transistors com-
pared to the other arrays. Seven connection input and out-
put lines between cells are needed, compared to eight of the
Modified Booths algorithm, seven of a five-counter cell and
four of an iterative multiplier array. However, the intercon-
nection lines do not consume more chip area, because they
are realized using the metal interconnection layers of the
VLSI technology.
The multiplication time of the proposed scheme is equal
to T = (n +1)
FA
, where
FA
is the delay time of a Full-Adder. It
has been assumed that the carry propagation delay of a two-
bit CLA adder is also equal to
FA
. In the iterative multiplier
arrays with carry-save addition (CSA) of the partial products
and carry propagation for the final addition, the operation
time is T = (2n 1)
FA
. The use of the carry-lookahead tech-
nique for the final addition permits faster operation, equal to
T = (n + log
2
n)
FA
, which is still slower than the multiplier of
Fig. 4. The proposed scheme is also faster than the imple-
mentation based on the Modified Booths algorithm, except
for the case that it uses final CLA addition.
In each one of the boundary cells of the top row and of
the left diagonal column of the multiplier array of Fig. 4,
only the multiplexer is required. Thus, circuit economy and
faster operation can be achieved by eliminating the Full-
Adder circuits from the above cells. This simplified form of
the proposed multiplier requires 2n 1 fewer Full-Adders.
In this case, the multiplication time is T = n
FA
.
The five-counter cell multiplier has the same operation
time but its cell is more complicated. A CMOS realization of
this cell requires 60 transistors and, because of the higher
complexity, it is expected to have greater propagation delay.
All the above comparisons that concern the transistor count
and the multiplier operation time are summarized in Tables 3
and 4.
Finally, an example for the multiplication of X = 27
10
=
11011 by Y = 29
10
= 11101 using the proposed algorithm is
given in Fig. 7.
4 A TWOS COMPLEMENT MULTIPLEXER-BASED
MULTIPLIER
The proposed multiplication algorithm given in (10) can be
easily extended to numbers in twos complement form. Let
us consider two integer numbers X and Y in this form:
X x x x x x x X
n n
n
n j
j
j
n
n
n n
= = + = +
1 2 0
1
1
0
2
1
1 1
2 2 2 K
(13)
Y y y y y y y Y
n n
n
n j
j
j
n
n
n n
= = + = +
1 2 0
1
1
0
2
1
1 1
2 2 2 K . (14)
X
n1
and Y
n1
are the numbers that remain after trunca-
tion of the bits x
n1
and y
n1
from X and Y, respectively. The
product P of the numbers X and Y, according to (13) and
(14), is equal to:
P XY x X y Y
n
n n
n
n n
= = + +
2 2
1
1 1
1
1 1
> C> C
= + +
2 2
2 2
1 1
1
1 1 1 1 1 1
n
n n
n
n n n n n n
x y x Y y X X Y
< A
. (15)
Fig. 6. A two-bit carry-lookahead adder circuit.
TABLE 2
NUMBER OF GATES AND TRANSISTORS
FOR VARIOUS TYPE OF CIRCUITS
CIRCUIT GATES TRANSISTORS*
FULL-ADDER 12 26
4 TO 1 MULTIPLEXER 5 16
HALF-ADDER 5 10
XOR GATE 4 6
2-BIT CLA ADDER 23 50
2 TO 1 MULTIPLEXER 3 6
*In CMOS VLSI technology
20 IEEE TRANSACTIONS ON COMPUTERS, VOL. 48, NO. 1, JANUARY 1999
The above relation differs from the corresponding (5) for
positive numbers only in the sign of the term Z
n1
=
x
n1
Y
n1
+ X
n1
y
n1
. Now, the above term must be sub-
tracted instead of added. The algorithm for all other terms
remains unchanged and the product P = XY can be com-
puted by the next equation:
P x y x Y X y x Y X y
j j
j
j
n
j j j j
j
n n n n
n
j
n
= + + +
=
2 2 2
2
0
1
1 1 1 1
1
1
2
J L
< A
= +
=
x y Z Z
j j
j
j
n
j
j
n
n
j
n
2 2 2
2
0
1
1
1
1
2
. (16)
Consequently, the implementation of a twos complement
multiplier based on the proposed technique can be done on
an array similar to the array of Fig. 4. Quantity Z
n1
is gen-
erated on the left boundary cells and it is subtracted by in-
verting the S
out
bits of these cells. Also, the two additive
inputs of the next to the leftmost top cell must be set to one.
The final array implementing the multiplication of numbers
in twos complement form is shown in Fig. 8. In Fig. 9, we
give a multiplication example with X = 19
10
and Y = +22
10
using the algorithm introduced above. The proposed mul-
tiplier of signed numbers yields the same performance as
the positive number multiplier of Fig. 4.
5 PIPELINE MULTIPLEXER-BASED MULTIPLIERS
The proposed multiplication schemes of Figs. 4 and 8 can
be easily transformed to an equivalent pipeline form by
inserting delay elements to the path of all signals that
propagate downwards from each row of cells to the next.
The required number of latches in the first type cell is four.
They must be inserted at the input x
j
and y
j
and at the out-
put c
out
and s
out
lines, as is shown in Fig. 10a.
The second type cells require nine latches each. All out-
put lines of these cells are latched. The input x
j
and y
j
lines
are directly connected to the output x
j
and y
j
lines. The
above latching policy, shown in Fig. 10b, guarantees syn-
chronization of the elements of the pipeline array. The up-
per half of each second type cell operates preparing bits x
j
,
y
j
, and s
j
one clock cycle ahead of the first type cells of the
same row, where these bits will be broadcast. The lower
half of a second type cell that processes the term x
j
y
j
c
j
must
be synchronized with the first type cells. The latches
placed at the outputs of the two AND gates contribute to
this synchronization. Each two-bit carry-lookahead adder
TABLE 3
CIRCUIT COMPARISON OF VARIOUS MULTIPLIERS
MULTIPLIER
TYPE
ITERATIVE
ARRAY
5-COUNTER CELL MODIFIED BOOTHS
ALGORITHM
MULTIPLIER
(Fig. 4)
NUMBER OF
BASIC CELLS
n
2
n(n + 1)/2 CELL A : (n + 1)
2
/2
CELL B : (n + 1)/2
CELL I : n(n 1)/2
CELL II : n
CLA-2 : n
CELL
COMPLEXITY
1 GATED F-A 2 GATED F-A CELL A:
2 1 MUX, 5 GATES, 1 F-A
CELL B : 6 GATES, 2 F-A
CELL I : 4x1 MUX, 1 F-A
CELL II : 2 GATES, 2 F-A
CLA-2 : 23 GATES
GATE NUMBER 13n
2
13(n
2
+ 3n 2) 10n
2
+ 23n + 13 8.5n
2
+ 40.5n 34
8.5n
2
+ 16.5n 22 *
TRANSISTOR
NUMBER
30n
2
30(n
2
+ 3n 2) 21n
2
+ 52n + 31 21n
2
+ 89n 73
21n
2
+ 37n 99 *
*Simplified form of the proposed multiplier
n = number of bits of the two operands, F-A = Full-Adder, CLA-2 = two bits carry-lookahead
TABLE 4
OPERATION DELAY OF VARIOUS MULTIPLIERS
MULTIPLIER TYPE ITERATIVE
ARRAY
5-COUNTER
CELL
MODIFIED BOOTHS
ALGORITHM
PROPOSED
MULTIPLIER (Fig. 4)
MULTIPLICATION
TIME
T = (2n 1)
FA
T =
FA
(n + log
2
n)*
T = (n +1)t T = n
FA
/2 + n
FA
T = n
FA
/2 + log
2
n
FA
*
T = (n + 1)
FA
THE SIMPLIFIED
FORM : T = n
FA
*Corresponds to carry-save arrays with final CLA addition
n = number of bits of the operands,
FA
= delay of a full-adder, t = delay of a five-counter cell
Fig. 7. A multiplication example using the proposed technique.
PEKMESTZI: MULTIPLEXER-BASED ARRAY MULTIPLIERS 21
has synchronous inputs. A latch is placed between succes-
sive carry-lookahead adder modules. A number of n 2 of
such latches is required.
The input of the pipeline multiplier does not require any
additional latches and all bits of the two operands must enter
the array at the same time. The output of the multiplier must
be realigned using an array of latches, which is triangularly
shaped with a two-bit step. The total number of delays for
the synchronization of the output is 2n(n + 1)/2 = n(n + 1).
Each cell of the first type includes four delay elements and
each cell of the second type includes nine delay elements.
The proposed scheme has n(n 1)/2 cells of the first type
and n cells of the second. Consequently, the total number of
delay elements for the bit-parallel pipeline form of the ar-
rays of Figs. 4 and 8 is:
L n n n n n n n n = + + + + = + 1 4 1 2 9 2 3 9 2
2
0 5 0 5 0 5 . (17)
Three types of iterative array pipeline multipliers exist.
The first two types are based on the carry-save technique of
McCanny and McWhirter [16], Hatamian and Cash [17],
Fig. 8. The proposed multiplexer-based twos complement parallel multiplier.
Fig. 9. A multiplication example for numbers in twos complement form using the proposed algorithm.
22 IEEE TRANSACTIONS ON COMPUTERS, VOL. 48, NO. 1, JANUARY 1999
and Cappello and Steiglitz [18] and the third type is based
on the carry propagation technique of Roy and Bayoumi
[19] and Jump and Ahuja [20]. All figures that concern the
number of basic cells, the cell complexity and the overall
combinational circuit complexity in terms of number of
gates and the number of latches of the above, and the pro-
posed pipeline multipliers are shown in Table 5. In the
above estimation of the number of latches, realignment de-
lays have been taken into account besides the cell latches.
The pipeline form of the proposed multiplication tech-
nique requires considerably fewer latches and is superior,
in terms of combinational circuit complexity, to existing
realizations.
As a remark, we mention that the proposed algorithm
can be used also in a serial fashion for bit-serial arithmetic.
The performance of the parallel pipeline form leads us to
expect significant improvement compared to existing bit-
serial multiplier implementations.
6 CONCLUSION
In this paper, we have proposed a multiplication algorithm
that permits efficient VLSI realization and compares fa-
vorably to previously reported iterative array multipliers.
The proposed technique is in all cases faster than the above
multipliers. At the same time, a saving in circuitry up to 20
percent is obtained except for the multiplier that is based on
the Modified Booths algorithm, where the circuit complex-
ity is the same. The canonical form of the obtained circuits
for unsigned number and for numbers in twos comple-
ment form makes them suitable for VLSI realization. Fi-
nally, the proposed multiplexer-based multiplier array can
be used in a pipeline form. This pipeline multiplier array is
advantageous in terms of combinational circuit complexity
and requires considerably fewer delay elements compared
to previously reported pipeline multipliers.
Fig. 10. The two types of cells of the proposed pipelined multiplexer-based parallel multiplier with synchronization delays. (a) First type cell. (b)
Second type cell.
TABLE 5
COMPARISON OF THE PROPOSED PIPELINE MULTIPLIERS TO EXISTING PIPELINE MULTIPLIERS
PIPELINE
MULTIPLIER TYPE
1ST TYPE
CARRY-SAVE
2ND TYPE
CARRY-SAVE
CARRY PROPAGATE
ARRAY
352326(' 3,3(/,1(
$55$<
NUMBER OF BASIC
CELLS
3n
2
/2 3n
2
/2 n
2
CELL I : n(n 1)/2
CELL II : n
CLA-2 : n
CELL COMPLEXITY 1 GATED F-A 1 GATED F-A 1 GATED F-A CELL I : 4 1 MUX, 1 F-A
CELL II : 2 GATES, 2 F-A
CLA-2 : 23 GATES
COMBINATIONAL
CIRCUIT
(NUMBER OF GATES)
39n
2
/2 39n
2
/2 13n
2
8.5n
2
+ 16.5n 22
DELAY ELEMENTS
(LATCHES)
7n
2
10.5n
2
+ 0.5n 9n
2
7n 3n
2
+ 9n 2
n = number of bits of the two operands, F-A = Full-Adder, CLA-2 = two bits carry-lookahead
PEKMESTZI: MULTIPLEXER-BASED ARRAY MULTIPLIERS 23
REFERENCES
[1] J. Hoffman, G. Lacaze, and P. Csillag, Iterative Logical Network
for Parallel Multiplication, Electronics Letters, vol. 4, p. 178, 1968.
[2] P. Burton and D.R. Noaks, High-Speed Iterative Multiplier,
Electronics Letters, vol. 4, p. 262, 1968.
[3] R. De Mori, Suggestion for an IC Fast Parallel Multiplier, Elec-
tronics Letters, vol. 5, pp. 50-51, Feb. 1969.
[4] H. Guilt, Fully Iterative Fast Array for Binary Multiplication,
Electronics Letters, vol. 5, p. 263, 1969.
[5] R. Baugh and B.A. Wooley, A Twos Complement Parallel Array
Multiplication Algorithm, IEEE Trans. Computers, vol. 22, no. 12,
pp. 1,045-1,059, Dec. 1973.
[6] K. Hwang, Global and Modular Twos Complement Array Mul-
tipliers, IEEE Trans. Computers, vol. 28, no. 4, pp. 300-306, Apr.
1979.
[7] A. Booth, A Signed Binary Multiplication Technique, Quarterly
J. Mechanics of Applied Math., vol. 4, pp. 236-240, 1951.
[8] L. MacSorley, High Speed Arithmetic in Binary Computers,
Proc. IRE, vol. 49, Jan. 1961.
[9] Y. Oowaki et al., A Sub-10-ns 16x16 Multiplier Using 0.6-mm
CMOS Technology, IEEE J. Solid-State Circuits, vol. 22, no. 5, Oct.
1987.
[10] R. Sharma et al., A 6.75-ns 16x16-bit Multiplier in Single-Level-
Metal CMOS Technology, IEEE J. Solid-State Circuits, vol. 24, no. 4,
Aug. 1989.
[11] C. Wallace, A Suggestion for a Fast Multiplier, IEEE Trans. Elec-
tronic Computers, vol. 13, pp. 114-117, Feb. 1964.
[12] N. Takagi, H. Yasuura, and S. Yajima, High-Speed VLSI Multipli-
cation Algorithm with a Redundant Binary Addition Tree, IEEE
Trans. Computers, vol. 34, no. 9, pp. 789-796, Sept. 1985
[13] S. Nakamura, Algorithm for Iterative Array Multiplication,
IEEE Trans. Computers, vol. 35, no. 8, pp. 713-719, Aug. 1986.
[14] S. Nakamura and K.-Y. Chu, A Single Chip Parallel Multiplier by
MOS Technology, IEEE Trans. Computers, vol. 37, no. 3, pp. 274-
282, Mar. 1988.
[15] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design.
Reading, Mass.: Addison-Wesley, 1985.
[16] J.V. McCanny and J.G. McWhirter, Completely Iterative, Pipe-
lined Multiplier Array Suitable for VLSI, Proc. IEE, pt. G., vol. 129,
no. 2, pp. 40-46, Apr. 1982.
[17] M. Hatamian and G.L. Cash, Parallel Bit-Level Pipelined VLSI
Designs for High-Speed Signal Processing, Proc. IEEE, vol. 75,
no. 9, pp. 1,192-1,202, Sept. 1987.
[18] R. Cappello and K. Steiglitz, A Note on Free Accumulation in VLSI
Filter Architectures, IEEE Trans. Circuits and Systems, vol. 32,
pp. 291-296, Mar. 1985.
[19] R. Roy and M. Bayoumi, An Efficient Twos Complement Sys-
tolic Multiplier for Real-Time Digital Signal Processing, IEEE
Trans. Circuits and Systems, vol. 36, no. 11, pp. 1,488-1,493, Nov.
1989.
[20] J.R. Jump and S.R. Ahuja, Effective Pipelining of Digital Sys-
tems, IEEE Trans. Computers, vol. 27, pp. 855-865, Sept. 1978.
Kiamal Z. Pekmestzi received his diploma in
electrical engineering from the National Techni-
cal University of Athens (1975). From 1975 to
1981, he was a research fellow in the Electronics
Department of the Nuclear Research Center
Demokritos. He received his PhD in electrical
engineering from the University of Patras (1981).
From 1983 to 1985, he was a professor at the
Higher School of Electronics in Athens. Since
1985, he has been with the National Technical
University of Athens, where he is currently an
associate professor. His research interests includes computer arithme-
tic, VLSI digital filters, and VLSI design automation.