Project Base Paper


7A-1

A Low-Power High-Speed Accuracy-Controllable Approximate Multiplier Design
Tongxin Yang¹, Tomoaki Ukezono², Toshinori Sato³
¹ Graduate School of Information and Control Systems, Fukuoka University, Japan
²,³ Department of Electronics Engineering and Computer Science, Fukuoka University, Japan
¹ Email: [email protected] ² Email: [email protected] ³ Email: [email protected]

978-1-5090-0602-1/18/$31.00 ©2018 IEEE

Abstract: Multiplication is a key fundamental function for many error-tolerant applications. Approximate multiplication is considered to be an efficient technique for trading off energy against performance and accuracy. This paper proposes an accuracy-controllable multiplier whose final product is generated by a carry-maskable adder. The proposed scheme can dynamically select the length of the carry propagation to satisfy the accuracy requirements flexibly. The partial product tree of the multiplier is approximated by the proposed tree compressor. An 8 × 8 multiplier design is implemented by employing the carry-maskable adder and the compressor. Compared with a conventional Wallace tree multiplier, the proposed multiplier reduced power consumption by between 47.3% and 56.2% and critical path delay by between 29.9% and 60.5%, depending on the required accuracy. Its silicon area was also 44.6% smaller. In addition, results from an image processing application demonstrate that the quality of the processed images can be controlled by the proposed multiplier design.

I. INTRODUCTION
Many increasingly popular applications, such as image processing and recognition, are inherently tolerant of small inaccuracies. These applications are computationally demanding and multiplication is their fundamental arithmetic function, which creates an opportunity to trade off computational accuracy for reduced power consumption. Approximate computing is an efficient approach for error-tolerant applications because it can trade off accuracy for power, and it currently plays an important role in such application domains [1].

Different error-tolerant applications have different accuracy requirements, as do different program phases in an application. If multiplication accuracy is fixed, power will be wasted when high accuracy is not required. This means that approximate multipliers should be dynamically reconfigurable to match the different accuracy requirements of different program phases and applications.

This paper focuses on an approximate multiplier design that can control accuracy dynamically. A carry-maskable adder (CMA) is proposed that can be dynamically configured to function as a conventional carry propagation adder (CPA), a set of bit-parallel OR gates, or a combination of the two. This configurability is realized by masking carry propagation: the CPA in the last stage of the multiplier is replaced by the proposed CMA. An approximate tree compressor is utilized to reduce the accumulation layer depth of the partial product tree. Our approach introduces a term representing the power and accuracy requirements which simplifies the partial product reduction (PPR) component as needed. An approximate multiplier is designed using the proposed adder and compressor. This multiplier, together with a conventional multiplier and the previously studied approximate multipliers, was implemented in Verilog HDL using a 45-nm library to evaluate the power consumption, critical path delay, and design area. Compared with the conventional Wallace tree multiplier, the proposed approximate multiplier reduced power consumption by between 47.3% and 56.2% and the critical path delay by between 29.9% and 60.5%, depending on the required computational accuracy. In addition, its design area was 44.6% smaller. Comparisons with the established approximate multipliers, none of which have any dynamic reconfigurability, demonstrate that the proposed multiplier provided the best trade-off of power and delay against accuracy. All the multiplier designs are then evaluated in a real image processing application.

The remainder of this paper is organized as follows. Section II reviews previous works. Section III introduces the accuracy-controllable approximate multiplier after explaining the tree compressor and the CMA. Section IV evaluates the multipliers experimentally and then evaluates the proposed approximate multiplier using an image processing application. Section V presents our conclusions.

II. PREVIOUS WORK
The adder is a basic element of most multipliers. Mahdiani et al. [2] proposed the lower-part-OR adder, which utilizes OR gates for addition of the lower bits and precise adders for addition of the upper bits. It is similar to our proposed CMA in that it uses OR gates to generate the sum approximately, but our CMA is also dynamically reconfigurable.

Liu et al. [3] utilized an approximate adder to reduce carry propagation delay in partial product accumulation. They also proposed a recovery vector to improve accuracy. The bit width of the error recovery vector can be selected by the designer to satisfy accuracy requirements. Hashemi et al. [4] proposed a technique that reduces the size of the multiplier by detecting the leading one bit of the input operands and selecting the following k bits as abridged operands for both inputs, where k is a designer-defined value that specifies the bandwidth used in the core accurate multiplier. Both [3] and [4] allow a static trade-off between power consumption and accuracy. The bit lengths
of the recovery vector [3] and the input operands [4] are determined during the design process and the accuracy is not dynamically controllable, unlike with our proposed multiplier. Moons et al. [5] proposed a system-level technique that disables part of the combinational logic and reconfigures the pipelined registers and combinational logic. It can trade off accuracy for power dynamically by changing the numbers of pipeline stages and voltage-accuracy scaling modes. Our proposed multiplier also disables part of the combinational logic in the CPA to achieve lower power consumption, but ours does not require a pipeline system or control circuits for voltage scaling.

III. ACCURACY-CONTROLLABLE MULTIPLIER
A typical multiplier consists of three parts: (i) partial product generation using an AND gate; (ii) PPR using an adder tree; and (iii) addition to produce the final result using a CPA. Power consumption and circuit complexity are dominated by the PPR [6], and the multiplier's critical path is dominated by the propagated carry chain in the CPA [7].

This section is organized as follows. Section III-A explains how the partial product layer is simplified by the approximate tree compressor. Section III-B introduces the CMA. Finally, Section III-C presents the overall structure of the accuracy-controllable approximate multiplier, which uses the proposed adder and tree compressor.

A. Approximate Tree Compressor
Figure 1(a) shows an accurate half adder, for which the following equation can be obtained:

$$\{c, s\} = a + b = 2c + s = (c + s) + c,$$

where {,} and + denote concatenation and addition, respectively. The value c is generated by a AND b and s is generated by a XOR b, so (c + s) can be generated by a OR b. Based on the above, consider the basic logic cell shown in Fig. 1(b), for which the following equations can be obtained:

$$p = c + s,$$
$$q = c,$$
$$\{c, s\} = a + b = p + q.$$

This is called an incomplete adder cell (iCAC). Table I shows the truth tables for an accurate half adder and an iCAC. Note that the bit position of c and that of s, p, and q are different. As can be seen, q is equal to c. While p is not equal to s, the precise sum can be obtained by adding p and q, so the iCAC is not an approximate adder but an element of a precise adder.

Fig. 1. (a) Accurate half adder and (b) incomplete adder cell.

TABLE I. TRUTH TABLES FOR ACCURATE HALF ADDER AND INCOMPLETE ADDER CELL.

  Inputs  |       Outputs
          | Accurate half adder | iCAC
  a   b   |   c   s             | q   p
  0   0   |   0   0             | 0   0
  0   1   |   0   1             | 0   1
  1   0   |   0   1             | 0   1
  1   1   |   1   0             | 1   1

By extending the above equation to m bits, the following equation can be obtained:

$$S = A + B = P + Q,$$

where A, B, P, and Q are m-bit values, the bits of which correspond to a, b, p, and q, respectively. A row of eight iCACs, used for 8-bit inputs, is shown in Fig. 2.

Fig. 2. A row of incomplete adder cells with two 8-bit inputs. (Inputs: A = {a7, a6, a5, a4, a3, a2, a1, a0} and B = {b7, b6, b5, b4, b3, b2, b1, b0}. Outputs: approximate sum P = {p7, p6, p5, p4, p3, p2, p1, p0} and error recovery vector Q = {q7, q6, q5, q4, q3, q2, q1, q0}.)

Consider the example of an 8-bit adder with the two inputs A = 01011111 and B = 00110110. The accurate sum S is 10010101, while the row of iCACs produces P = 01111111 and Q = 00010110. Again, it is evident that the following holds:

$$S = A + B = P + Q. \qquad (1)$$

While S is obtained from P and Q, P can be used as an approximation for S, and Q can be used as an error recovery vector for the approximate sum P.

By extending the row of iCACs from two to n inputs, n/2 Ps and n/2 Qs are obtained. If the sum of the n/2 Qs is used instead of the n/2 Qs themselves, the number of Qs is reduced to one. Remember that P is always greater than or equal to S, and Q is equal to C. By exploiting these facts, OR gates can be used to generate the approximate sum of the n/2 Qs without significant loss of accuracy. This approximate sum is called the accuracy compensation vector and is denoted by V. This method is named approximate tree compressor (ATC). An ATC with n inputs is called an ATC-n, and the structure of an ATC with eight inputs (ATC-8) is shown in Fig. 3. The rectangles represent rows of iCACs and the number of iCACs in each row (rectangle) is dependent on the bit width of the inputs. For example, if there are eight m-bit inputs (D1, D2, …, D8), four rows of m iCACs are required to build an m-bit ATC-8. This reconstruction generates four approximate sums, P1, P2, P3, and P4, and four error recovery vectors, Q1, Q2, Q3, and Q4. OR gates generate the accuracy compensation vector V. As a result, the eight inputs have been reduced to five.
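The identity in Eq. (1) can be checked numerically. The following is a minimal Python sketch, assuming that a row of iCACs is modeled at the word level as a bitwise OR (for P) and a bitwise AND (for Q); the function name `icac_row` is illustrative and does not come from the paper.

```python
def icac_row(a: int, b: int) -> tuple[int, int]:
    """Word-level model of a row of iCACs on two unsigned operands.

    Returns (P, Q), where P is the bitwise OR (approximate sum) and
    Q is the bitwise AND (error recovery vector).
    """
    return a | b, a & b


# Worked example from the text: A = 01011111, B = 00110110.
A, B = 0b01011111, 0b00110110
P, Q = icac_row(A, B)
S = A + B
print(f"P = {P:08b}, Q = {Q:08b}, S = {S:08b}")
# prints "P = 01111111, Q = 00010110, S = 10010101"
```

Because P and Q sit at the same bit positions, P + Q reproduces the exact sum for every input pair, which is why the iCAC is described as an element of a precise adder rather than an approximate adder.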

Fig. 3. Structure of an approximate tree compressor with eight inputs. (Inputs D1 to D8 feed four rows of iCACs, which output P1 to P4 and Q1 to Q4; the Qs are combined into V.)

Fig. 4. (a) Carry-maskable half adder, (b) Carry-maskable full adder.
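The approximate tree compressor of Section III-A can be sketched at the word level as follows. This is an illustrative Python model, not the authors' Verilog: it assumes that adjacent inputs are paired (D1 with D2, D3 with D4, and so on), and the function name `atc` and the sample input values are hypothetical.

```python
from functools import reduce
from operator import or_


def atc(rows: list[int]) -> tuple[list[int], int]:
    """Functional sketch of an ATC-n on an even number of operand rows.

    Each pair of rows passes through a row of iCACs (P = OR, Q = AND).
    The Q vectors are then merged with OR gates into one accuracy
    compensation vector V, which approximates their sum.
    """
    if len(rows) % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    ps = [rows[i] | rows[i + 1] for i in range(0, len(rows), 2)]
    qs = [rows[i] & rows[i + 1] for i in range(0, len(rows), 2)]
    v = reduce(or_, qs, 0)
    return ps, v


# Eight arbitrary 8-bit inputs D1..D8 (illustrative values only).
D = [0x5F, 0x36, 0xA1, 0x0C, 0x77, 0x18, 0xE0, 0x03]
P, V = atc(D)
approx = sum(P) + V   # value represented by the five output rows
exact = sum(D)        # value represented by the eight input rows
```

In this model the eight inputs become four P rows plus V, that is, five rows. Since an OR of non-negative values never exceeds their sum, `approx` can only underestimate `exact`, and the two coincide whenever the Q vectors have no overlapping set bits.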

B. Carry-maskable Adder
A CMA is proposed to control the accuracy flexibly and dynamically. A k-bit CMA comprises (k - 1) carry-maskable full adders and one carry-maskable half adder, and its structure is similar to that of a k-bit CPA.

The structures of the proposed carry-maskable half and full adders are shown in Fig. 4. In the proposed half adder, when mask_x is 0, S is equal to x OR y and Cout is equal to 0. Otherwise, when mask_x is 1, S is equal to x XOR y and Cout is equal to x AND y. In other words, the operation of the proposed half adder can be controlled by the active-low signal mask_x. When mask_x is disabled (=1), it functions as an accurate half adder, and when mask_x is enabled (=0), Cout is masked to 0 and it functions as an OR gate with output S. The operation of the proposed full adder is similar to the half adder: when mask_x is disabled (=1), it functions as an accurate full adder, and when mask_x is enabled (=0), Cout is equal to Cin and S is the output of an OR gate.
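The behavior of the carry-maskable cells can be captured in a small bit-level model. The sketch below is written in Python for illustration and is not taken from the paper's Verilog. It assumes that the OR gate in the masked full adder combines x and y, and the function names and the list-based `masks` argument are hypothetical.

```python
def cm_half_adder(x: int, y: int, mask_x: int) -> tuple[int, int]:
    """Carry-maskable half adder; returns (s, cout)."""
    if mask_x:                    # mask disabled (=1): accurate half adder
        return x ^ y, x & y
    return x | y, 0               # mask enabled (=0): OR gate, carry masked


def cm_full_adder(x: int, y: int, cin: int, mask_x: int) -> tuple[int, int]:
    """Carry-maskable full adder; returns (s, cout)."""
    if mask_x:                    # mask disabled (=1): accurate full adder
        return x ^ y ^ cin, (x & y) | (cin & (x ^ y))
    return x | y, cin             # mask enabled (=0): OR gate, Cout follows Cin


def cma_add(x: int, y: int, masks: list[int]) -> tuple[int, int]:
    """k-bit CMA: bit 0 uses the half adder, bits 1..k-1 use full adders.

    masks[i] is the active-low mask_x signal of bit i.
    Returns (sum, carry_out).
    """
    s, carry = cm_half_adder(x & 1, y & 1, masks[0])
    total = s
    for i in range(1, len(masks)):
        s, carry = cm_full_adder((x >> i) & 1, (y >> i) & 1, carry, masks[i])
        total |= s << i
    return total, carry
```

With every mask set to 1 the model behaves as an ordinary ripple-carry adder, and with every mask set to 0 it degenerates to a bitwise OR with no carry out. Mixed settings, such as `[0, 0, 0, 1, 1, 1, 1]`, give an OR-based lower part and an accurate upper part.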

C. Overall Structure
An n-bit multiplier consists of n rows, each of which has n partial products (PP), so there are n × n PPs in total. Using the ATC-n introduced in the previous section, the n rows can be replaced by n/2 + 1 rows. Figure 5 shows an example of an 8-bit multiplier with 8 × 8 PPs. The PPR is performed in three stages (Stage 1, Stage 2, and Stage 3) and the CPA is performed in Stage 4. The PP generation step is not shown. Each dot represents a PP. The least significant bit (right side) is bit 0, and the most significant bit (left side) is bit 14. The solid rectangles in Stage 1 represent ATCs and the dashed rectangles represent rows of seven iCACs. Every row of iCACs includes PPs that are not processed: for example, the PP at position 0 in the first row and the one at position 8 in the second row of the first iCAC block in ATC-8 are not processed.

In Stage 1, eight rows of PPs are reduced to four rows (P1, P2, P3, and P4) and one accuracy compensation vector (V1) by an ATC-8. The four rows are further reduced to two rows (P5 and P6) and another accuracy compensation vector (V2) by an ATC-4. A final row of iCACs then processes P5 and P6 and generates P7 and Q7. In summary, Stage 1 uses an ATC-8, an ATC-4, and a row of seven iCACs to compress the 8 × 8 PPs to four rows (P7, V1, V2, and Q7).

Fig. 5. Structure of an 8-bit multiplier with 8 × 8 partial products. (Stage 1: ATC-8, ATC-4, and a row of iCACs. Stage 2: seven OR gates. Stage 3: eleven full adders and two half adders (HA). Stage 4: accurate part, controllable part with a 7-bit CMA, and truncated part.)

In Stage 2, there are four PPs for each of bits 4 to 10. In order to achieve a lower path delay, OR gates are used to sum V1 and V2 approximately. The empty circles for V1 and V2 represent the bits which are summed using OR gates. Seven OR gates are required in total and the four rows are compressed to three.

In Stage 3, full adders and half adders are used to compress the three rows to two. Two half adders are required for bits 1 and 13, and eleven full adders are required for bits 2 to 12.
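Stages 1 to 3 can be mimicked with a word-level model. The following Python sketch is an illustration under stated assumptions, not the authors' implementation: it assumes adjacent rows are paired in each ATC, it names the intermediate vectors `q5` and `q6` (these names do not appear in the paper), and it restricts the Stage 2 OR gates to bits 4 to 10 as described in the text. Stage 3 is exact, so its output equals the plain sum of the rows left after Stage 2. The approximate final adder of Stage 4 is not modeled.

```python
OR_BITS = 0x7F0  # bits 4..10, where Stage 2 merges V1 and V2 with OR gates


def icac_row(a: int, b: int) -> tuple[int, int]:
    """Row of iCACs: P is the bitwise OR, Q is the bitwise AND."""
    return a | b, a & b


def approx_ppr_value(a: int, b: int) -> int:
    """Value produced by Stages 1-3 for unsigned 8-bit operands a and b."""
    # Partial product rows: row i is a gated by bit i of b, shifted by i.
    rows = [(a if (b >> i) & 1 else 0) << i for i in range(8)]

    # Stage 1, ATC-8: four iCAC rows; their Q vectors are OR-ed into V1.
    pq = [icac_row(rows[i], rows[i + 1]) for i in range(0, 8, 2)]
    p1, p2, p3, p4 = (p for p, _ in pq)
    v1 = 0
    for _, q in pq:
        v1 |= q

    # Stage 1, ATC-4: two iCAC rows on P1..P4; their Qs are OR-ed into V2.
    p5, q5 = icac_row(p1, p2)
    p6, q6 = icac_row(p3, p4)
    v2 = q5 | q6

    # Stage 1, final row of iCACs on P5 and P6.
    p7, q7 = icac_row(p5, p6)

    # Stage 2: OR gates merge V1 and V2 only where four PPs overlap.
    merged = (v1 | v2) & OR_BITS
    rest = (v1 & ~OR_BITS) + (v2 & ~OR_BITS)

    # Stage 3 is an exact compression, so the rows are simply summed.
    return p7 + q7 + merged + rest
```

Every OR replaces an exact addition of non-negative terms, so in this model the result never exceeds the exact product `a * b`, and it is exact when only one partial product row is non-zero (for example, `b = 1`).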
Addition using a CPA is required after PPR to produce the final result. For an 8-bit Wallace tree multiplier, the length of the CPA is 11 [7]. In our proposed multiplier, the length of the CPA is 13. In Stage 4, the CPA is divided into three parts in order to reduce the length of the carry propagation. Since the lower bits are not significant for accuracy, bits 0 to 4 are defined as the truncated part and three OR gates are used to generate the values for bits 2, 3, and 4 of the final result. Because there is no carry out from the truncated part, the length of the CPA is reduced to 10. Since the upper bits are the most significant for accuracy, bits 12 to 14 are defined as the accurate part, and three accurate adders are used to generate the values for these bits of the final result.

The accuracy-controllable part lies between the truncated and accurate parts. This part is important for both critical path delay and accuracy. In Stage 4, bits 5 to 11 in the CPA are replaced by a 7-bit CMA. Note that every 1-bit CMA has a mask_x signal. Given a value for u, the u upper bits in the accuracy-controllable part are configured as a u-bit CPA and the lower bits are configured as (7 - u) 2-input OR gates by managing the seven mask_x signals appropriately. When u = 7, it functions as a 7-bit CPA, and when u = 0, it functions as seven 2-input OR gates. For each bit of S that is generated by a 2-input OR gate, power consumption is reduced because the switching activity is reduced in some of the logic gates. Furthermore, the maximum delay of the CMA is reduced.

IV. EXPERIMENTAL RESULTS
A. Experimental Setup
In this section, the proposed multiplier is evaluated in terms of power consumption, critical path delay, design area, and computational accuracy. To clarify the ability of the approximate multiplier to save power, shorten critical path delay, and control the accuracy, a conventional Wallace tree multiplier and the previously-proposed approximate multipliers [3] [8] were implemented for comparison. The approximate multiplier [3] can be configured at design time, and its accuracy is controlled by the length of the recovery vector. Four different approximate multipliers were implemented, using 10-bit, 8-bit, 6-bit, and 4-bit recovery vectors, and are referred to as AMER_10b, AMER_8b, AMER_6b, and AMER_4b, respectively. Note that the accuracy of AMER_XX is not dynamically controllable, unlike that of our proposed multiplier. The ACCI2 approximate multiplier [8] is one of the most accurate approximate multipliers and is referred to as ACCI_M2. The multipliers with eight different accuracy settings (values of u) are referred to as m_7b, m_6b, …, m_0b. Multiplier m_ub utilized an approximate adder for the final results from the PPR consisting of a (3 + u)-bit CPA and {(7 - u) + 3} 2-input OR gates. For example, the approximate adder for m_6b consisted of a 9-bit CPA and four 2-input OR gates.

All the approximate multipliers, as well as the conventional Wallace tree multiplier, were eight bits and coded using Verilog HDL. The Synopsys VCS was used to simulate the designs and generate value change dump (VCD) files to evaluate the power consumption precisely. The Synopsys Design Compiler was used to synthesize the multipliers with the NanGate 45nm Open Cell Library [9]. The power consumption was evaluated at a frequency of 0.5GHz. The operating conditions for synthesis were typical (a 1.00 process factor, 1.1 V power supply, and 25°C operating temperature). All designs were synthesized and optimized using the default compiler options. The Synopsys Power Compiler was used to estimate power consumption from switching activity interchange format files generated from the VCD files. The Synopsys VCS was used to evaluate the numerical outputs of all the multipliers. Because 8-bit multipliers were evaluated, the total number of test patterns was 65,536.

B. Accuracy Results
The error distance (ED) and mean ED (MED) measures have been proposed to evaluate the performance of approximate arithmetic circuits [10]. For multipliers, the ED is defined as the arithmetic difference between the accurate product ($M$) and the approximate product ($M'$): $ED = |M - M'|$. The MED is the average ED for a set of outputs. In [3], the mean relative ED (MRED) and normalized MED (NMED) are proposed to evaluate approximate multipliers. The relative ED (RED) is the ED divided by the accurate output: $RED = |M - M'| / M$, and the MRED is the average RED, which can be obtained similarly to the MED. The NMED is defined as $NMED = MED / M_{max}$, where $M_{max}$ is the maximum output magnitude of an accurate multiplier. The error rate (ER) is the percentage of inaccurate outputs among all outputs generated from all combinations of inputs. These three metrics (NMED, MRED, and ER) were used to evaluate the proposed multiplier.

Table II compares the accuracy results. It can be seen that the accuracy of the proposed multiplier changes widely according to its setting. While the NMED and MRED values of the most accurate configuration of the proposed multiplier are larger than those of the most accurate AMER configuration and ACCI_M2, its controllability is better than that of AMER. Remember that the proposed multiplier is dynamically controllable, unlike AMER.

TABLE II. ACCURACY COMPARISON.

  Multiplier   NMED (%)   MRED (%)   ER (%)
  m_7b         0.25       0.85       36.16
  m_6b         0.26       0.99       43.46
  m_5b         0.29       1.31       52.07
  m_4b         0.35       1.93       61.05
  m_3b         0.49       3.05       69.61
  m_2b         0.71       4.57       74.93
  m_1b         1.05       6.50       78.10
  m_0b         1.64       9.02       80.02
  AMER_10b     0.20       0.62       31.59
  AMER_8b      0.24       1.16       55.44
  AMER_6b      0.46       3.23       71.12
  AMER_4b      1.20       7.53       79.54
  ACCI_M2      0.04       0.62       72.29

C. Power, Critical Path Delay, and Design Area Results
Comparisons of the power consumption and critical path delay for the different multipliers relative to accuracy are shown in Fig. 6 and Fig. 7, respectively, where the x-axis indicates the MRED. The circles, triangles, asterisk, and square represent the proposed accuracy-controllable multiplier with different dynamic configurations (m_7b, m_6b, …, m_0b), the
AMERs [3] with different static configurations (AMER_10b, AMER_8b, AMER_6b, AMER_4b), ACCI_M2 [8], and the Wallace tree multiplier, respectively. The power values used in this evaluation are the sum of the dynamic and static power consumptions.

Fig. 6. Power consumption results relative to the MRED.

Fig. 7. Critical path delay results relative to the MRED.

As can be seen in Fig. 6 and Fig. 7, the proposed multiplier achieves good results, both in terms of power consumption and critical path delay. For all accuracies (MRED), the proposed accuracy-controllable multiplier achieves the smallest power consumption and delay results. For example, if MRED of around 9% is required, m_0b delivers the lowest power consumption and the shortest critical path delay.

A comparison of the design area results is shown in Fig. 8. Note that the accuracy setting does not have any effect on the design area of the proposed multiplier because it is dynamically configurable, and thus only one design area result is shown. In contrast, AMER produces different results for different accuracy settings because it is not dynamically configurable. While our proposed multiplier is larger than AMER_4b, it consumes less power and has a shorter critical path delay than AMER_4b does. In addition, the proposed multiplier has a smaller design area than AMER_6b, while having lower power consumption and critical path delay and a higher accuracy.

Fig. 8. Design area results.

D. Image Processing
An image processing application was also evaluated. An image sharpening algorithm [11] was used, which is popular in the evaluation of approximate multipliers. Eight images collected from the Internet were used, all 512 × 512 8-bit grayscale bitmap images, and these are summarized in Table III. Only the multiplications were approximate; all the other operations (addition, subtraction, and division) were accurate.

TABLE III. INPUT IMAGES.

  Image No.   Description
  1           Lena
  2           Some peppers
  3           A bridge
  4           A domed palace
  5           A truck on grassland
  6           A bird standing in a stream
  7           A view of a small town
  8           A house and a car
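The accuracy metrics of Section IV-B, including the MRED used as the x-axis of Fig. 6 and Fig. 7, can be computed exhaustively over all 65,536 input pairs of an 8-bit multiplier. The Python sketch below is illustrative only. It assumes that input pairs with a zero accurate product are skipped when averaging the RED (the text does not say how they are handled), and `truncating_mul` is a hypothetical stand-in, not one of the multipliers evaluated in the paper.

```python
from typing import Callable


def accuracy_metrics(approx_mul: Callable[[int, int], int],
                     bits: int = 8) -> dict[str, float]:
    """Exhaustively compute NMED, MRED, and ER, all in percent."""
    n = 1 << bits
    m_max = (n - 1) ** 2             # largest accurate product
    ed_sum = 0
    red_sum = 0.0
    errors = 0
    nonzero = 0
    for a in range(n):
        for b in range(n):
            exact = a * b
            ed = abs(exact - approx_mul(a, b))   # error distance
            ed_sum += ed
            errors += ed != 0
            if exact != 0:                       # RED is undefined for M = 0
                red_sum += ed / exact
                nonzero += 1
    total = n * n
    return {
        "NMED": 100.0 * ed_sum / total / m_max,
        "MRED": 100.0 * red_sum / nonzero,
        "ER": 100.0 * errors / total,
    }


def truncating_mul(a: int, b: int) -> int:
    """Hypothetical stand-in: exact product with the four low bits cleared."""
    return (a * b) & ~0xF


metrics = accuracy_metrics(truncating_mul)
```

Passing an exact multiplier, for example `lambda a, b: a * b`, yields zero for all three metrics, which is a convenient sanity check before evaluating an approximate design.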

TABLE IV. PSNR RESULTS OF THE APPROXIMATE MULTIPLIERS, IN DB.

  Image No.  m_7b  m_6b  m_5b  m_4b  m_3b  m_2b  m_1b  m_0b  AMER_10b  AMER_8b  AMER_6b  AMER_4b  ACCI_M2
  1          53.1  50.9  45.8  40.4  34.4  28.6  25.1  15.5  57.0      44.6     31.7     22.8     49.4
  2          54.1  51.6  47.6  42.0  35.8  30.1  27.4  17.9  57.0      46.5     33.8     24.9     51.1
  3          56.0  52.6  48.8  40.2  33.3  31.6  27.0  27.0  56.8      46.8     33.7     25.3     50.9
  4          52.9  49.3  45.9  39.6  32.1  30.3  25.5  24.1  55.3      45.3     32.0     23.5     49.1
  5          50.3  49.1  46.7  41.3  33.9  32.9  23.9  23.9  57.3      49.4     31.4     22.7     51.4
  6          50.9  49.3  46.8  39.8  32.9  30.4  25.1  24.1  52.3      46.6     31.8     23.7     49.4
  7          49.4  48.3  46.2  40.4  33.7  29.7  25.1  13.6  57.0      46.5     31.5     18.8     49.4
  8          53.6  51.1  47.6  42.8  35.4  27.3  24.8  11.5  56.7      46.8     31.7     20.3     51.0
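PSNR figures of the kind reported in Table IV follow from the mean squared error between a reference image and a processed image. The Python sketch below is a minimal illustration, assuming images are given as nested lists of 8-bit pixel values; the function names are hypothetical, and the sharpening algorithm itself is not reproduced here.

```python
import math


def mse(ref: list[list[int]], out: list[list[int]]) -> float:
    """Mean squared error between two images of identical dimensions."""
    m, p = len(ref), len(ref[0])
    total = sum((ref[i][j] - out[i][j]) ** 2
                for i in range(m) for j in range(p))
    return total / (m * p)


def psnr(ref: list[list[int]], out: list[list[int]], max_i: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, out)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_i ** 2 / err)
```

Here `ref` would be the image sharpened with accurate multiplications and `out` the image sharpened with an approximate multiplier; larger PSNR values indicate a result closer to the reference.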

The processed image quality was measured using the peak signal-to-noise ratio (PSNR). This is usually used to measure the quality of reconstructive processes that involve information loss and is defined in terms of the mean squared error (MSE) [6]. The MSE and PSNR were defined in [6] as

$$MSE = \frac{1}{mp} \sum_{i=0}^{m-1} \sum_{j=0}^{p-1} \left[ I(i,j) - K(i,j) \right]^2, \qquad (2)$$

$$PSNR = 10 \log_{10} \left( \frac{MAX_I^2}{MSE} \right), \qquad (3)$$

where $I(i,j)$ and $K(i,j)$ are the correct and obtained values, respectively, of each pixel, $m$ and $p$ are the image dimensions, and $MAX_I$ represents the maximum value of each pixel (255 here, as the images are 8-bit).

Table IV shows the PSNR results of the approximate multipliers, in dB. Larger values represent better quality images. As can be seen, different PSNR values are found for the different images in each column of the table. This confirms that dynamic reconfigurability is necessary for situations where different qualities are required. In addition, the proposed accuracy-controllable multiplier produced a wide range of PSNR values, with its largest values being comparable to those of the other approximate multipliers.

V. CONCLUSION
An accuracy-controllable approximate multiplier has been proposed in this paper that consumes less power and has a shorter critical path delay than the conventional design. Its dynamic controllability is realized by the proposed CMA. The multiplier was evaluated at both the circuit and application levels. The experimental results demonstrate that the proposed multiplier was able to deliver significant power savings and speedups while maintaining a significantly smaller circuit area than that of the conventional Wallace tree multiplier. Furthermore, for the same accuracy, the proposed multiplier delivered greater improvements in both power consumption and critical path delay than other previously studied approximate multipliers. Finally, the ability of our proposed multiplier to control accuracy was confirmed by an application-level evaluation.

ACKNOWLEDGMENT
Thanks are due to Katsuhiko Wakasugi of Logic Research Co., Ltd. for assistance with the experiments. This work was supported by JSPS KAKENHI Grant Number JP17K00088 and by funds (No.175007 and No.177005) from the Central Research Institute of Fukuoka University. This work is supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc.

REFERENCES
[1] S. Venkataramani, V. K. Chippa, S. T. Chakradhar, K. Roy, and A. Raghunathan, “Quality programmable vector processors for approximate computing,” 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1-12, Dec. 2013.
[2] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-Inspired imprecise computational blocks for efficient VLSI implementation of Soft-Computing applications,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, Apr. 2010.
[3] C. Liu, J. Han, and F. Lombardi, “A Low-Power, High-Performance approximate multiplier with configurable partial error recovery,” Design, Automation & Test in Europe Conference & Exhibition (DATE), Mar. 2014.
[4] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A Dynamic Range Unbiased Multiplier for approximate applications,” IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 418-425, Nov. 2015.
[5] B. Moons and M. Verhelst, “DVAS: Dynamic Voltage Accuracy Scaling for increased energy-efficiency in approximate computing,” IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Jul. 2015.
[6] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors for multiplication,” IEEE Transactions on Computers, vol. 64, no. 4, pp. 984-994, Apr. 2015.
[7] K. C. Bickerstaff, E. E. Swartzlander, and M. J. Schulte, “Analysis of column compression multipliers,” 15th IEEE Symposium on Computer Arithmetic, pp. 33-39, Jun. 2001.
[8] Z. Yang, J. Han, and F. Lombardi, “Approximate compressors for Error-Resilient multiplier design,” IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), pp. 183-186, Oct. 2015.
[9] NanGate, Inc., NanGate FreePDK45 Open Cell Library, http://www.nangate.com/?page_id=2325, 2008.
[10] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic adders,” IEEE Transactions on Computers, vol. 62, no. 9, pp. 1760-1771, Sep. 2013.
[11] M. S. Lau, K. V. Ling, and Y. C. Chu, “Energy-Aware probabilistic multiplier: Design and Analysis,” 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, pp. 281-290, Oct. 2009.