A Low-Error Energy-Efficient Fixed-Width Booth Multiplier With Sign-Digit-Based Conditional Probability Estimation
A Low-Error Energy-Efficient Fixed-Width Booth Multiplier With Sign-Digit-Based Conditional Probability Estimation
A Low-Error Energy-Efficient Fixed-Width Booth Multiplier With Sign-Digit-Based Conditional Probability Estimation
2, FEBRUARY 2018
Abstract—Fixed-width multipliers are intensively used in many A partitioning method was proposed in [6] to partition the
DSP applications whose accuracy and energy efficiency affect truncated part in Booth multiplication into two: one major and
the whole digital system to a large extent. To improve the com- one minor depending upon the influence on the induced trun-
putation accuracy, a Booth-encoded sign-digit-based conditional cation error. This method is widely adopted in the subsequent
probability estimation approach is proposed. A symmetric error designs. Later on, a simple compensation circuit is proposed
distribution is obtained by taking the sign bit of the Booth- in [7] to compensate the truncation error with limited accu-
encoded multiplier into consideration when applying the con-
ditional probability. In addition, a more generalized mux-based
racy. To lower the error, a simulation-based method in [8] is
estimation method is formulated for the circuit implementation, formulated by taking the information provided by the Booth
which reduces the delay time and power dissipation. Simulation encoding. However, the exhaustive simulation is time consum-
results show that the proposed multiplier exhibits the best compu- ing. A probabilistic estimation bias method is thus proposed
tation accuracy with the least energy per operation. It performs in [9] to reduce the simulation time with acceptable accu-
even better for those operand lengths that are not multiples of racy. The error performance is further improved in [10] by
4. The maximum reduction on energy-delay-error product can applying a complex multi-level conditional probability model,
reach 14.8% compared with all its contenders among various which gains higher accuracy at the cost of increased area.
operand lengths. A better accuracy-area trade-off solution is presented in [11],
Index Terms—Fixed-width multiplier, Booth encoding, approx- which applies both the conditional probability estimation and
imate computation, mean square error, energy-efficient. the computer simulation with a dynamic error-compensation
method.
In this brief, a Booth-encoded sign-digit-based conditional
probability (BSCP) method is proposed in order to achieve
I. I NTRODUCTION a lower mean square error since it is an important indicator of
IXED-WIDTH multipliers are widely used in modern
F energy-efficient multimedia and digital signal process-
ing systems, which are desirable to utilize a fixed operand
computation accuracy [1]. The compensation value is derived
by applying the sign bit of Booth encoded multiplier to the
conditional and expected probability. A simple mux-based esti-
length [1]–[5]. In order to obtain the same bit width as input, mation circuit is formulated from the proposed method, which
the least significant bits of the output are normally trun- yields a simplified compensation logic design.
cated. Among various fixed-width multipliers, post-truncated The rest of this brief is organized as follows. Section II
multiplier (PTM) rounds off the product after a complete introduces the basics of fixed-width Booth multiplier. The pro-
calculation, which makes its accuracy the best while the posed compensation algorithm and the corresponding circuit
hardware expenses the most. Contrarily, direct-truncated mul- design are illustrated in Section III. Section IV evaluates the
tiplier (DTM) cuts off the least significant half partial products performance of proposed fixed-width multiplier and compares
directly without any compensation, which results in the least it with its rivals. Finally, Section V concludes this brief.
computation accuracy and the hardware complexity as well.
For a low error energy-efficient multiplication, the truncation II. P RELIMINARIES
error has to be well compensated with minimum overhead in
order to maintain the required precision while keep the energy Suppose A(aN−1 aN−2 . . . . . . a1 a0 ) and B(bN−1 bN−2 . . . . . .
dissipation as low as possible. b1 b0 ) are two N-bit two’s complement numbers, which rep-
Recently, modified Booth algorithm has been increas- resent multiplicand and multiplier, respectively. Thus, their
ingly employed in the fixed-width multiplier design so as to product P (p2N−1 p2N−2 . . . . . . p1 p0 ) can be obtained as
contribute further to the speedup and logic reduction [6]–[11]. follows:
N−2
Manuscript received March 26, 2017; accepted April 30, 2017. Date of pub- P = A × B = −aN−1 · 2N−1 + ai · 2i
lication May 29, 2017; date of current version January 29, 2018. This work
i=0
was supported by the National Natural Science Foundation of China under ⎛ ⎞
Grant 61604036 and Grant 61534002. This brief was recommended by
N−2
Associate Editor H.-T. Zhang. (Corresponding author: Yajuan He.) × ⎝−bN−1 · 2N−1 + bj · 2j ⎠. (1)
The authors are with the School of Microelectronics and Solid State
Electronics, University of Electronic Science and Technology of China, j=0
Chengdu 610054, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available In fast multiplication design, modified Booth encoding is an
online at http://ieeexplore.ieee.org. efficient way to reduce the number of partial products by half.
Digital Object Identifier 10.1109/TCSII.2017.2709801 As shown in Table I, three consecutive bits (b2j+1 b2j b2j−1 )
1549-7747 c 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
ZHANG AND HE: LOW-ERROR ENERGY-EFFICIENT FIXED-WIDTH BOOTH MULTIPLIER 237
TABLE I
M ODIFIED B OOTH E NCODING
accumulation, while the latter is used for estimation. Suppose εmse,D = σ 2 εA,D + μ2 εA,D
f (D) is the compensation function, which is only related to
the Booth-encoded multiplier D. As shown in Fig. 2, f (D) = σ 2 TPA,D − f (D)/2 + μ2 TPA,D − f (D)/2 . (10)
represents the estimation result from TPminor . Thus, the N-bit
Since f (D) is only related to D, the variance of εA,D should
product PA,D can be expressed as
be a constant and independent of f (D). Therefore, the Min
PA,D = MP + TPmajor + f (D) /2. (5) value of εmse,D can be obtained as shown in (11), with the
condition being satisfied in (12).
With (4) and (5), the truncation error after compensation
εA,D can be calculated as
εA,D = PFWM − PA,D = TPminor · 2−N − f (D)/2. (6) μ TPA,D − f (D)/2 = 0. (12)
238 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 65, NO. 2, FEBRUARY 2018
TABLE II
PARTIAL P RODUCT B ITS AND THE C ORRESPONDING E XPECTATIONS
BASED ON A LL P OSSIBLE B OOTH -E NCODED D IGITS
−1
2N−1 −1
2N−1
f (D) = 2μ TPA,D = β/2 + 2 −dj · 22j−1−N . (19)
εmse min = εmse,Dmin = σ 2 TPA,D − f (D)/2 . j=0
D=−2N−1 D=−2N−1
(13) As the final product is supposed to be an integer, the com-
pensation value should be rounded into an integer as well.
N/2−1
As the mean of TPA,D could be obtained from the expecta- In addition, the amplitude of j=0 (−dj ) · 22j−1−N is less
tions of each partial product together with the corresponding than a quarter with its sign opposite to the multiplier, i.e.,
correction bit. Thus, the mean value of the j-th partial product when bN −1 = 0, this term is negative, otherwise, it is pos-
in the matrix, μj (TPA,D ), can be calculated as itive. Considering the parity of β, the compensation integer
f (D)int can be derived as:
N−2j−2
μj TPA,D = E ppj,i · 22j+i−N + cj · 22j−N (14) β/2 + 1 bN−1 = 1
f (D)int = (20)
i=0 (β − 1)/2 + 1 bN−1 = 0.
where E(·) indicates the expectation. Noted that cj here uses
the actual bit generated from the encoded multiplier instead B. Compensation Circuits Design
of a traditional estimation.
Suppose ωk−1 ωk−2 . . . . . . ω1 ω0 is a k-bit compensation
Let nj be an indication whether the j-th Booth encoded digit
binary that equals f(D)int , k is determined by
(dj ) is a non-zero digit or not. Based on all possible values of
dj , nj , cj , each partial product bit ppj ,i and its corresponding
k = 1 + log2 (max(β)/2 + 1)
expectation are summarized in Table II. ai is the i-th bit of the
multiplicand (a−1 =0) with a uniform input distribution. Thus, = 1 + log2 (N/4 + 1) . (21)
the probability of ai being one is a half.
Therefore, the compensation value can be calculated based
Observed from Table II, E(ppj,i ) can be expressed as
on β for the operand length N according to (22).
a function of dj and cj according to nj and the integer i as
To obtain ‘β’ and ‘β-1’ at the same time, odd-even
0 nj = 0 merge sorting algorithm is a more efficient way than tra-
E ppj,i = 1/2 nj = 0, i = 0 (15) ditional addition. In this algorithm, the non-zero bit array
1 − dj /2 − cj nj = 0, i = 0 (nN /2−1 nN /2−2 . . . nj . . . n1 n0 ) is sorted in such an ascend-
ing order, (γ N /2−1 γ N /2−2 . . . γ j . . . γ 1 γ 0 ), so that there
From (15), E(ppj,i ) can be further derived as a function of must be a demarcation point, γ β , where γ β =0 and γ β −1 = 1.
dj , cj and nj Therefore, we have
nj /2 i = 0
E ppj,i = (16) 1 (0 ≤ j < β)
nj − dj /2 − cj i = 0. γj = (22)
0 (β ≤ j < N/2).
Therefore, μj (TPA,D ) in (14) can be calculated as
A simple odd-even merge sorting circuit is exemplified in
μj TPA,D Fig. 3 with N = 8 and k = 2. (γ 3 γ 2 γ 1 γ 0 ) is the order gen-
erated by the merge sorting circuits from the original array
N−2j−2
(n3 n2 n1 n0 ).
= E ppj,i · 22j+i−N + cj · 22j−N Once the order is generated, a 2-bit compensation value
i=0
f (D)int (ω1 ω0 ) can be derived as
= nj /4 + −dj · 22j−1−N (j = 0, 1, . . . , N/2 − 1). (17)
ω1 = γ2 b7 + γ1 b7 = γ2 + γ1 b7
Let β be the total number of non-zero digits in D, we have
ω0 = γ2 b7 + γ3 b7 + γ1 b7 = ω1 + γ3 b7 . (23)
N/2−1
N/2−1
μ TPA,D = μj TPA,D = nj /4 + −dj · 22j−1−N As noticed, the merge sorting circuit requires a bal-
j=0 j=0
anced structure. Therefore, extra hardware resources would
be incurred for those operand lengths which are not multiples
N/2−1
of 4. In order to deal with this problem, we propose a sim-
= β/4 + −dj · 22j−1−N (0 ≤ β ≤ N/2). ple method that firstly round down the operand length to be
j=0 a multiple of 4, which excludes nN /2−1 from the original array.
(18) Then, bN −1 and nN /2−1 are used as two control signals to
ZHANG AND HE: LOW-ERROR ENERGY-EFFICIENT FIXED-WIDTH BOOTH MULTIPLIER 239
TABLE IV
C IRCUIT P ERFORMANCE W ITH D IFFERENT F IXED -W IDTH M ULTIPLIER
V. C ONCLUSION
In this brief, a Booth encoded sign-digit-based conditional
probability method for a fixed-width multiplier design is
presented. A mux-based estimation circuit is proposed to sim-
plify the compensation logics, and shorten the critical paths.
The proposed approach can be easily adapted for an accuracy-
adaptive design by putting variable columns of TPmajor into
at a 200MHz data rate with 50% switching activity to each considerations. Simulation results show that our proposed mul-
design. Both area and energy per operation for each operand tiplier is the most energy efficient design with the highest
length are normalized so that DTM with the least area and accuracy. It can reach maximum 14.8% reduction on the
energy consumption is one. energy-delay-error product compared with all its contenders
As indicated in Table IV, the proposed BSCP multiplier among various operand lengths.
has the minimum delay time except DTM. This is because
R EFERENCES
the compensation function of this brief uses a mux-based
estimation circuits instead of modifying the partial products [1] N. Petra, D. De Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo,
before compression as in [8] and [11]. This results in shorter “Design of fixed-width multipliers with linear compensation function,”
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 947–960,
critical paths with a slight overhead on area. As for power May 2011.
dissipation, the proposed BSCP multiplier dissipates the least [2] C.-H. Chang and R. K. Satzoda, “A low error and high performance
for those bit widths which are not multiples of 4, although multiplexer-based truncated multiplier,” IEEE Trans. Very Large Scale
their corresponding area are not the smallest. That is due to Integr. (VLSI) Syst., vol. 18, no. 12, pp. 1767–1771, Dec. 2010.
our logic simplification on traditional merge sorting algorithm [3] S.-N. Tang, J.-W. Tsai, and T.-Y. Chang, “A 2.4-Gs/s FFT processor for
OFDM-based WPAN applications,” IEEE Trans. Circuits Syst. II, Exp.
discussed in Section III-B, which decreases the chances of Briefs, vol. 57, no. 6, pp. 451–455, Jun. 2010.
switching activity for the internal signals. When the operand [4] S. W. Yen et al., “A 5.79-Gb/s energy-efficient multirate LDPC codec
length increases, the power savings becomes more obvious chip for IEEE 802.15.3c applications,” IEEE J. Solid-State Circuits,
for all operand lengths. Benefit from the improved speed, vol. 47, no. 9, pp. 2246–2257, Sep. 2012.
and power dissipation, this brief also exhibits the least energy [5] H.-J. Ko and S.-F. Hsiao, “Design and application of faithfully rounded
and truncated multipliers with combined deletion, reduction, truncation,
consumption among all competitors. and rounding,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 5,
pp. 304–308, May 2011.
C. Energy-Delay-Accuracy Analysis [6] K.-J. Cho, K.-C. Lee, J.-G. Chung, and K. K. Parhi, “Design of low-
For benchmarking the energy efficiency of a circuit, the error fixed-width modified Booth multiplier,” IEEE Trans. Very Large
energy-delay product (EDP) is used to be a better metric Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004.
[7] H.-A. Huang, Y.-C. Liao, and H.-C. Chang, “A self-compensation fixed-
than energy per operation. For lossy applications, to maintain width Booth multiplier and its 128-point FFT applications,” in Proc.
a required accuracy, error performance should be taken into IEEE Int. Symp. Circuits Syst., May 2006, pp. 3538–3541.
account as well as the energy efficiency. Thus, a new design [8] J.-P. Wang, S.-R. Kuang, and S.-C. Liang, “High-accuracy fixed-width
metric energy-delay-error-product (EDEmse ) is adopted to bet- modified Booth multipliers for lossy applications,” IEEE Trans. Very
ter evaluate the design efficiency of the proposed approach, Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 52–60, Jan. 2011.
[9] C.-Y. Li, Y.-H. Chen, T.-Y. Chang, and J.-N. Chen, “A probabilistic
which is given by estimation bias circuit for fixed-width Booth multiplier and its DCT
applications,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 4,
EDEmse = Energy(E) × Delay(D) pp. 215–219, Apr. 2011.
× MSE(Emse ). (25) [10] Y.-H. Chen, “An accuracy-adjustment fixed-width Booth multiplier
based on multilevel conditional probability,” IEEE Trans. Very Large
As there is no compensation algorithm involved in either Scale Integr. (VLSI) Syst., vol. 23, no. 1, pp. 203–207, Jan. 2015.
DTM or PTM, these two multipliers are excluded from this [11] W.-Q. He, Y.-H. Chen, and S.-J. Jou, “High-accuracy fixed-width Booth
comparison. The results are illustrated in Fig. 6. multipliers based on probability and simulation,” IEEE Trans. Circuits
Syst. I, Reg. Papers, vol. 62, no. 8, pp. 2052–2061, Aug. 2015.
As expected, the proposed design exhibits the least EDEmse [12] G. W. Bewick, “Fast multiplication: Algorithm and implementation,”
for all operand lengths. That means BSCP approach in the Ph.D. dissertation, Dept. Elect. Eng., Stanford Univ., Stanford, CA,
design of a fixed-width multiplier is the most energy efficient USA, 1994.