ApproximateCompressor FinalforReview
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1
Design and Analysis of Approximate Compressors for Multiplication

A. Momeni, J. Han, Member, P. Montuschi, Senior Member, and F. Lombardi, Fellow
Abstract—Inexact (or approximate) computing is an attractive paradigm for digital processing at nanometric scales. Inexact computing is particularly interesting for computer arithmetic designs. This paper deals with the analysis and design of two new approximate 4-2 compressors for utilization in a multiplier. These designs rely on different features of compression, such that imprecision in computation (as measured by the error rate and the so-called normalized error distance) can meet with respect to circuit-based figures of merit of a design (number of transistors, delay and power consumption). Four different schemes for utilizing the proposed approximate compressors are proposed and analyzed for a Dadda multiplier. Extensive simulation results are provided and an application of the approximate multipliers to image processing is presented. The results show that the proposed designs accomplish significant reductions in power dissipation, delay and transistor count compared to an exact design; moreover, two of the proposed multiplier designs provide excellent capabilities for image multiplication with respect to average normalized error distance and peak signal-to-noise ratio (more than 50 dB for the considered image examples).

Index Terms—Compressor, Dadda Multiplier, Inexact [...]

[...] implemented using digital logic circuits, thus operating with a high degree of reliability and precision. However, many applications such as in multimedia and image processing can tolerate errors and imprecision in computation and still produce meaningful and useful results. Accurate and precise models and algorithms are not always suitable or efficient for use in these applications. The paradigm of inexact computation relies on relaxing fully precise and completely deterministic building modules when, for example, designing energy-efficient systems. This allows imprecise computation to redirect the existing design process of digital circuits and systems by taking advantage of a decrease in complexity and cost with possibly a potential [...]

Addition and multiplication are widely used operations in computer arithmetic; for addition, full-adder cells have been extensively analyzed for approximate computing [2-4]. [1] has compared these adders and proposed several new metrics for evaluating approximate and probabilistic adders with respect to unified figures of merit for design assessment for inexact computing applications. For each input to a circuit, the error distance (ED) is defined as the arithmetic distance between an erroneous output and the correct one [1]. The mean error distance (MED) and normalized error distance (NED) are proposed by considering the averaging effect of multiple inputs and the normalization of multiple-bit adders. The NED is nearly invariant with the size of an implementation and is therefore useful in the reliability assessment of a specific design. The tradeoff between precision and power has also been quantitatively evaluated in [1].

However, the design of approximate multipliers has received less attention. Multiplication can be thought as the repeated sum of partial products; however, the straightforward [...] have been proposed in the literature [4] [5] [6] [7]. Most of these designs use a truncated multiplication method; they estimate the least significant columns of the partial products as a constant. In [4], an imprecise array multiplier is used for neural network applications by omitting some of the least significant bits in the partial products (and thus removing some adders in the array). A truncated multiplier with a correction constant is proposed in [5]. For an n×n multiplier, this design calculates the sum of the n+k most significant columns of the partial products and truncates the other n-k columns. The n+k bit result is then rounded to n bits. The reduction error (i.e. the error generated by truncating the n-k least significant bits) and rounding error (i.e. the error generated by rounding the result to n bits) are found in the [...]
[8-10] to speed up the partial product reduction tree and decrease power dissipation. Optimized designs of exact 4-2 compressors have been proposed in [8, 11-16]. [17] [18] have also considered compression for approximate multiplication. In [17], an approximate signed multiplier has been proposed for use in arithmetic data value speculation (AVDS); multiplication is performed using the Baugh-Wooley algorithm. However, no new design is proposed for the compressors for the inexact computation. Designs of approximate compressors have been proposed in [18]; however, these designs do not target multiplication. It should be noted that the approach of [7] improves over [17] [18] by utilizing a simplified multiplier block that is amenable to approximate multiplication.

Initially in this paper, two novel approximate 4-2 compressors are proposed and analyzed. It is shown that these simplified compressors have better delay and power consumption than the optimized (exact) 4-2 compressor designs found in the technical literature [8]. These approximate compressors are then used in the reduction module of a Dadda multiplier; four different schemes are proposed for inexact multiplication. Extensive simulation results are provided at circuit-level for figures of merit, such as delay, transistor count, power dissipation, error rate and normalized error distance under CMOS feature sizes of 32, 22 [...] output product image that has a very high quality and resemblance to the image generated by an exact multiplier, i.e. excellent values for the average NED and the Peak Signal-to-Noise Ratio (PSNR) are found (for the PSNR more than 50 dB). The analysis and simulation results show that the proposed approximate designs for both the compressor and the multiplier are viable candidates for inexact computing.

This paper is organized as follows. Section 2 is a review of existing schemes for (exact) compressors. The two new [...] multipliers to image processing is presented in Section 6. Section 7 concludes the manuscript.

II. EXACT COMPRESSORS

The main goal of either multi-operand carry-save addition or parallel multiplication is to reduce n numbers to two numbers; therefore, n-2 compressors (or n-2 counters) have been widely used in computer arithmetic. An n-2 compressor (Figure 1) is usually a slice of a circuit that reduces n numbers to two numbers when properly replicated. In slice i of the circuit, the n-2 compressor receives n bits in position i and one or more carry bits from the positions to the right, such as i – 1 or i – 2. It produces two output bits in positions i and i + 1 and one or more carry bits into the higher positions, such as i + 1 or i + 2. For the correct operation of the circuit shown in Figure 1, the following inequality must be satisfied:

\[ n + \psi_1 + \psi_2 + \psi_3 + \dots \le 3 + 2\psi_1 + 4\psi_2 + 8\psi_3 + \dots \qquad (1) \]

where $\psi_j$ denotes the number of carry bits from slice i to slice i + j.

Figure 1. Schematic diagram of n-2 compressors in a multi-operand addition circuit [13]

A widely used structure for compression is the 4-2 compressor; a 4-2 compressor (Figure 2) can be implemented with a carry bit between adjacent slices ($\psi_1 = 1$). The carry bit from the position to the right is denoted as cin while the carry bit into the higher position is denoted as cout. The two output bits in positions i and i + 1 are also referred to as the sum and carry respectively.

Figure 2. 4-2 compressor

The following equations give the outputs of the 4-2 compressor, while Table I shows its truth table. [...]

The common implementation of a 4-2 compressor is accomplished by utilizing two full-adder (FA) cells (Figure 3) [8]. Different designs have been proposed in the literature for the 4-2 compressor [8, 11-16].

Figure 4 shows the optimized design of an exact 4-2 compressor based on the so-called XOR-XNOR gates [8]; an XOR-XNOR gate simultaneously generates the XOR and XNOR output signals. The design of [8] consists of three XOR-XNOR (denoted by XOR*) gates, one XOR and two 2-1 MUXes. The critical path of this design has a delay of 3Δ, where Δ is the unitary delay through any gate in the design.
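The two-full-adder realization of the exact 4-2 compressor mentioned in Section II can be sketched in a few lines of Python. This is only an illustrative behavioral model, not the circuit of [8]: the function names and the wiring (x1, x2, x3 into the first full adder; its sum, x4 and cin into the second) are assumptions chosen because they reproduce the rows of Table I. The check confirms the defining property that sum + 2·(carry + cout) equals the number of ones among the five inputs.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    total = a + b + c
    return total & 1, total >> 1


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Behavioral model of an exact 4-2 compressor (assumed wiring).

    Returns (cout, carry, sum). cout depends only on x1, x2, x3,
    so it never depends on cin of the same slice.
    """
    s1, cout = full_adder(x1, x2, x3)   # first full adder
    s, carry = full_adder(s1, x4, cin)  # second full adder
    return cout, carry, s


def check_all_inputs():
    """Verify the weighted output equals the input count for all 32 patterns."""
    for cin, x4, x3, x2, x1 in product((0, 1), repeat=5):
        cout, carry, s = exact_compressor_4_2(x1, x2, x3, x4, cin)
        assert s + 2 * (carry + cout) == x1 + x2 + x3 + x4 + cin
    return True
```

Because carry and cout both have weight two, other assignments of the inputs to the two adders would satisfy the same arithmetic identity while producing a different split between carry and cout.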
Page 3 of 13 Transactions on Computers
[...] performance improvement compared to an exact compressor with respect to delay, number of transistors and power consumption.
Figure 3. Implementation of 4-2 compressor

TABLE I
TRUTH TABLE OF 4-2 COMPRESSOR

cin  X4  X3  X2  X1   cout  carry  sum
 0    0   0   0   0    0     0      0
 0    0   0   0   1    0     0      1
 0    0   0   1   0    0     0      1
 0    0   0   1   1    1     0      0
 0    0   1   0   0    0     0      1
 0    0   1   0   1    1     0      0
 0    0   1   1   0    1     0      0
 0    0   1   1   1    1     0      1
 ...
 0    1   1   1   0    1     0      1
 0    1   1   1   1    1     1      0
 1    0   0   0   0    0     0      1
 1    0   0   0   1    0     1      0
 1    0   0   1   0    0     1      0
 1    0   0   1   1    1     0      1
 1    0   1   0   0    0     1      0
 1    0   1   0   1    1     0      1
 1    0   1   1   0    1     0      1
 1    0   1   1   1    1     1      0
 1    1   0   0   0    0     1      0
 1    1   0   0   1    0     1      1
 1    1   0   1   0    0     1      1
 1    1   0   1   1    1     1      0
 1    1   1   0   0    0     1      1
 1    1   1   0   1    1     1      0
 1    1   1   1   0    1     1      0
 1    1   1   1   1    1     1      1

Figure 4. Optimized 4-2 compressor of [8]

A. Design 1

As shown in Table I, the carry output in an exact [...] (5)

Since the carry output has the higher weight of a binary bit, an erroneous value of this signal will produce a difference value of two in the output. For example, if the input pattern is "01001" (row 10 of Table II), the correct output is "010" that is equal to 2. By simplifying the carry output to cin, the approximate compressor will generate the "000" pattern at the output (i.e. a value of 0). This substantial difference may not be acceptable; however, it can be compensated or reduced by simplifying the cout and sum signals. In particular, the simplification of sum to a value of 0 (second half of Table II) reduces the difference between the approximate and the exact [...]
Although the above mentioned simplifications of carry and sum increase the error rate in the proposed approximate compressor, its design complexity and therefore the power consumption are considerably decreased. This can be realized by comparing (2)-(4) and (5)-(7). Table II shows the truth table of the first proposed approximate compressor. It also shows the difference between the inexact output of the proposed approximate compressor and the output of the exact compressor. As shown in Table II, the proposed design has 12 incorrect outputs out of 32 outputs (thus yielding an error rate of 37.5%). This is less than the error rate using the best approximate full-adder cell of [2].

B. Design 2

A second design of an approximate compressor is proposed to further increase performance as well as reducing the error rate. Since the carry and cout outputs have the same weight, the proposed equations for the approximate carry and cout in the previous part can be interchanged. In this new design, carry uses the right hand side of (7) and cout is always equal to cin; since cin is zero in the first stage, cout and cin will be zero in all stages. So, cin and cout can be ignored in the hardware design. Figure 7 shows the block diagram of this approximate 4-2 compressor and the expressions below describe its outputs.

\[ sum' = \overline{(x_1 \oplus x_2)} + \overline{(x_3 \oplus x_4)} \qquad (8) \]

\[ carry' = \overline{\overline{(x_1 + x_2)} + \overline{(x_3 + x_4)}} \qquad (9) \]
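As a cross-check of Design 2, the short Python sketch below evaluates the two outputs for all 16 input patterns and lists the patterns whose weighted output (2·carry' + sum') differs from the exact count of ones. The Boolean expressions are an assumption read off Table III: sum' is 0 only when both input pairs (x1, x2) and (x3, x4) differ, and carry' is 1 when each pair contains at least one 1. The function names are hypothetical.

```python
from itertools import product


def design2(x1, x2, x3, x4):
    """Behavioral model of approximate compressor Design 2.

    Returns (carry', sum'); cin and cout are not modeled because
    the text states they can be ignored in this design.
    """
    sum_ = (1 - (x1 ^ x2)) | (1 - (x3 ^ x4))  # NOT(x1 XOR x2) OR NOT(x3 XOR x4)
    carry = (x1 | x2) & (x3 | x4)             # De Morgan form of NOT(NOT(x1+x2) + NOT(x3+x4))
    return carry, sum_


def design2_errors():
    """Return [((x4, x3, x2, x1), difference)] for every erroneous pattern."""
    errors = []
    for x4, x3, x2, x1 in product((0, 1), repeat=4):
        carry, sum_ = design2(x1, x2, x3, x4)
        diff = (2 * carry + sum_) - (x1 + x2 + x3 + x4)
        if diff != 0:
            errors.append(((x4, x3, x2, x1), diff))
    return errors
```

Under these assumptions the model is wrong for four patterns (0000, 0011, 1100 and 1111 in X4 X3 X2 X1 order), which agrees with the nonzero entries in the difference column of Table III.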
TABLE II
TRUTH TABLE OF THE FIRST APPROXIMATE 4-2 COMPRESSOR

cin  X4  X3  X2  X1   cout'  carry'  sum'  Difference
 0    0   0   0   0    0      0       1      1
 0    0   0   0   1    0      0       1      0
 0    0   0   1   0    0      0       1      0
 0    0   0   1   1    0      0       1     -1
 0    0   1   0   0    0      0       1      0
 0    0   1   0   1    1      0       0      0
 0    0   1   1   0    1      0       0      0
 0    0   1   1   1    1      0       1      0
 0    1   0   0   0    0      0       1      0
 0    1   0   0   1    1      0       0      0
 0    1   0   1   0    1      0       0      0
 0    1   0   1   1    1      0       1      0
 0    1   1   0   0    0      0       1     -1
 0    1   1   0   1    1      0       1      0
 0    1   1   1   0    1      0       1      0
 0    1   1   1   1    1      0       1     -1
 1    0   0   0   0    0      1       0      1
 1    0   0   0   1    0      1       0      0
 1    0   0   1   0    0      1       0      0
 1    0   0   1   1    0      1       0     -1
 1    0   1   0   0    0      1       0      0
 1    0   1   0   1    1      1       0      1
 1    0   1   1   0    1      1       0      1
 1    0   1   1   1    1      1       0      0
 ...

Figure 6. Gate level implementation of Design 1
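The error count quoted for Design 1 (12 incorrect outputs out of 32) can be checked with a small behavioral model. Equations (5)-(7) are not legible in this copy, so the expressions below are an assumption inferred from the rows of Table II: carry' follows cin, sum' is forced to 0 when cin is 1, and cout' is 1 when each input pair contains at least one 1. The function names are hypothetical.

```python
from itertools import product


def design1(x1, x2, x3, x4, cin):
    """Behavioral model of approximate compressor Design 1 (inferred).

    Returns (cout', carry', sum').
    """
    carry = cin                                               # carry' simplified to cin
    sum_ = (1 - cin) & ((1 - (x1 ^ x2)) | (1 - (x3 ^ x4)))    # 0 whenever cin = 1
    cout = (x1 | x2) & (x3 | x4)
    return cout, carry, sum_


def design1_error_count():
    """Count input patterns whose weighted output differs from the exact value."""
    wrong = 0
    for cin, x4, x3, x2, x1 in product((0, 1), repeat=5):
        cout, carry, sum_ = design1(x1, x2, x3, x4, cin)
        approx = 2 * (cout + carry) + sum_
        exact = x1 + x2 + x3 + x4 + cin
        if approx != exact:
            wrong += 1
    return wrong
```

With these inferred expressions the model gives 12 erroneous patterns, i.e. 12/32 = 37.5%, matching the error rate stated for Design 1.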
[...] 1, the decimal value of the addition of the inputs is 4. However, the approximate compressor produces a 1 for the carry and sum. The decimal value of the outputs in this case is 3; Table III shows that the difference is -1.

TABLE III
TRUTH TABLE OF SECOND PROPOSED 4-2 COMPRESSOR

X4  X3  X2  X1   carry'  sum'  difference
 0   0   0   0    0       1      1
 0   0   0   1    0       1      0
 0   0   1   0    0       1      0
 0   0   1   1    0       1     -1
 0   1   0   0    0       1      0
 0   1   0   1    1       0      0
 0   1   1   0    1       0      0
 0   1   1   1    1       1      0
 1   0   0   0    0       1      0
 1   0   0   1    1       0      0
 1   0   1   0    1       0      0
 1   0   1   1    1       1      0
 1   1   0   0    0       1     -1
 1   1   0   1    1       1      0
 1   1   1   0    1       1      0
 1   1   1   1    1       1     -1

This design has therefore 4 incorrect outputs out of 16 outputs, so its error rate is now reduced to 25%. This is a very positive feature, because it shows that on a probabilistic basis, [...]

In this section, the impact of using the proposed compressors for multiplication is investigated. A fast (exact) multiplier is usually composed of three parts (or modules) [8].
• Partial product generation.
• A Carry Save Adder (CSA) tree to reduce the partial products' matrix to an addition of only two operands [...]

[...] the partial products into at most four rows. In the second or final stage, 1 half-adder, 1 full-adder and 10 compressors are used to compute the two final rows of partial products. Therefore, two stages of reduction and 3 half-adders, 3 full-adders and 18 compressors are needed in the reduction circuitry of an 8×8 Dadda multiplier.

In this paper, four cases are considered for designing an approximate multiplier.
• In the fourth case (Multiplier 4), Design 2 and exact 4-2 compressors are used in the n-1 least significant columns and the n most significant columns in the reduction circuitry respectively.

The objectives of the first two approximate designs are to reduce the delay and power consumption compared with an exact multiplier; however, a high error distance is expected. The next two approximate multipliers (i.e. Multipliers 3 and 4) are proposed to decrease the error distance. The delay in these designs is determined by the exact compressors that are in the critical path; therefore, there is no improvement in delay for these approximate designs compared with an exact multiplier. However, it is expected that the utilization of approximate compressors in the least significant columns will decrease the power consumption and transistor count (as a measure of circuit complexity). While the first two proposed multipliers have better performance in terms of delay and power consumption, the error distances in the third and fourth designs are expected to be significantly lower.

V. SIMULATION RESULTS

In this section, the designs of the two approximate compressors (Section III) and the four approximate multipliers (Section IV) are simulated using HSPICE. Predictive [...]

The two approximate compressors of this paper and the best low-power exact compressor of [8] (implemented by using XOR-XNOR gates) are simulated at a 1 GHz frequency; a fan-out of 4 is utilized in all simulations. The simulation results of the delay, power consumption and power-delay product (PDP) are given in Table IV by using the PTMs at 32 nm, 22 nm and 16 nm.

TABLE IV
SIMULATION RESULTS (@32 NM)

Design             Delay (ps)   Power (μW)   PDP (aJ)
@32 nm
Exact Design [8]   60.36        2.98         180
Design 1           58.32        1.27         74
Design 2           44.35        1.14         50
...

[...] achieve significant improvement in terms of power consumption; on average at different feature sizes, the power consumption of Design 1 is 57% less than the exact compressor, while Design 2 has a power consumption that is 60% less than the exact design of [8].

Table V compares these designs in terms of number of transistors, as a measure of circuit complexity. The exact compressor [8] uses 10 transistors to implement each XOR* gate, 6 transistors to implement the XOR gate and 8 transistors to implement each MUX gate [8]; therefore, the exact compressor utilizes 52 transistors. A 50% improvement in circuit complexity is accomplished by Design 2, as reflected by the lower number of transistors. This is expected because the second approximate design has no cin and cout with only 4 inputs and 2 outputs (the exact compressor has 5 inputs and 3 outputs).

TABLE V
COMPARISON OF NUMBER OF TRANSISTORS

Design             Number of transistors
Exact Design [8]   52
Design 1           28
Design 2           26

B. Approximate Multipliers

[...] measure of reliability [1]) of the proposed multipliers with other approximate multipliers is also pursued.

• Delay
The delay of the reduction circuitry (second module) of a Dadda multiplier is dependent on the number of reduction stages and the delay of each stage. In Multipliers 1 and 2, the approximate compressors are used in all columns; therefore, the delay of the stages is equal to the delay of the approximate compressors. However, in Multipliers 3 and 4, the delay of the stages is equal to the delay of the exact compressors. So, the use of these approximate compressors in the n/2 LSBs causes no improvement in terms of delay compared to an exact multiplier. The delay improvement in the reduction circuitry [...]
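The compressor figures quoted in Tables IV and V can be re-derived with simple arithmetic. The sketch below is only a consistency check on the reported numbers, using hypothetical helper names: it multiplies delay (ps) by power (μW) to obtain the PDP in aJ, rebuilds the 52-transistor count of the exact design from the gate counts given in the text (three XOR* gates, one XOR, two MUXes), and computes the relative transistor savings of the two approximate designs.

```python
# (delay in ps, power in uW) copied from Table IV at 32 nm
TABLE_IV_32NM = {
    "Exact Design [8]": (60.36, 2.98),
    "Design 1": (58.32, 1.27),
    "Design 2": (44.35, 1.14),
}


def pdp_attojoule(delay_ps, power_uw):
    """Power-delay product: 1 ps * 1 uW = 1e-18 J = 1 aJ."""
    return delay_ps * power_uw


def exact_transistor_count(xor_star=10, xor=6, mux=8):
    """Three XOR* gates, one XOR gate and two 2-1 MUXes, as stated in the text."""
    return 3 * xor_star + 1 * xor + 2 * mux


def improvement_percent(approx, exact):
    """Relative reduction of `approx` with respect to `exact`, in percent."""
    return 100.0 * (exact - approx) / exact
```

The products come out within about 1 aJ of the PDP column in Table IV, and the transistor savings evaluate to roughly 46% for Design 1 and exactly 50% for Design 2.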
[...] the power consumption improvement of each multiplier at 32 nm feature size with respect to an exact adder; this confirms that an approximate multiplier in the reduction circuitry will result in a considerable power saving.

TABLE VII
POWER CONSUMPTION IMPROVEMENT IN REDUCTION CIRCUITRY

Design         Improvement (%)
Multiplier 1   52.49
Multiplier 2   58.58
Multiplier 3   17.50
Multiplier 4   26.15

• Transistor Count
The transistor count is used in this paper as a metric of circuit complexity. The first two approximate multipliers have a lower transistor count compared with Multipliers 3 and 4. Table VIII shows the transistor count improvement of the reduction circuitry of each multiplier compared to an exact adder.

TABLE VIII
TRANSISTOR COUNT IMPROVEMENT IN REDUCTION CIRCUITRY

Design         Improvement (%)
Multiplier 1   42.11
Multiplier 2   48.15
Multiplier 3   14.03
Multiplier 4   22.42

[...] proposed in [7] is simulated for n=8. The truncated multiplier with constant correction [5] (Multiplier 6) and the truncated multiplier with variable correction [6] (Multiplier 7) are also simulated for n=8 and k=1. A further approximate multiplier [...]

[...] each input. Therefore the average NED is equivalent to the NED defined in [1]. The maximum high (low) NED is also defined as the largest absolute value of NED for the case in which the erroneous result is more (less) than the exact result. Table X shows the average NED, the maximum high and low NEDs and the number of correct results (or outputs) of approximate multipliers for n=8. The number of correct outputs out of the total outputs represents the probability of correctness for each design. Based on Table X, the probability of correctness in Multiplier 1 is 0.16% (103 out of 65025) while the probability of correctness in Multiplier 4 is 14.3% (9320 out of 65025). Since the proposed approximate compressors produce erroneous results for all-zero input patterns (row 1 in Tables II and III), the proposed approximate multipliers will generate an erroneous result if at least one of the inputs is zero. However, in these cases (511 cases for n=8) the multiplier can produce the correct result by adding a circuit for detecting the zero-valued inputs. Therefore, the zero-valued input patterns are not considered further in the simulation to investigate the proposed multipliers for a fair comparison.

TABLE X
NED FOR N = 8

Design             Average NED    Max High NED   Max Low NED    Correct outputs (out of 65025)
Multiplier 1       0.6065×10^-1   0.1593         0.1375         103
...
Multiplier 7 [6]   0.1146×10^-2   0.3060×10^-2   0.4045×10^-2   769
Multiplier 8       0.1049         0.2263         0.1207         8

Based on Table X, Multiplier 4 has the lowest average NED [...]
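The input counts used in the correctness discussion for n=8 follow directly from the operand width. The Python sketch below, with hypothetical function names, reproduces them: the number of operand pairs in which both inputs are nonzero, the number of pairs with at least one zero operand, and the probability of correctness obtained from a count of correct outputs.

```python
def nonzero_input_pairs(n):
    """Operand pairs of an n x n multiplier with both inputs nonzero."""
    return (2 ** n - 1) ** 2


def zero_input_pairs(n):
    """Operand pairs in which at least one input is zero."""
    return 2 ** (2 * n) - nonzero_input_pairs(n)


def correctness_percent(correct_outputs, n=8):
    """Share of nonzero operand pairs that give the exact product, in percent."""
    return 100.0 * correct_outputs / nonzero_input_pairs(n)
```

For n=8 this gives 65025 nonzero pairs and 511 zero-valued cases; 103 correct outputs correspond to about 0.16% and 9320 correct outputs to about 14.3%, as quoted in the text.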
[...] multipliers to image processing is illustrated. A multiplier is used to multiply two images on a pixel by pixel basis, thus blending the two images into a single output image.

Figure 11 shows two examples: both input images and the resulting output image are provided. A program has been developed in C# .net and simulated in Microsoft Visual Studio 2010 using the 8 approximate multipliers at n=8. Figures 12 and 13 show the outputs for the two examples.

The average NED and the Peak Signal-to-Noise Ratio (PSNR) that is based on the Mean Squared Error (MSE) are computed to assess the quality of the output image and compare it with the output image generated by an exact multiplier. The equations for the MSE and PSNR are given in (10) and (11); in (10), m and p are the image dimensions and I(i,j) and K(i,j) are the exact and obtained values of each pixel respectively. In (11), MAX_I represents the maximum value of each pixel.

\[ \mathrm{MSE} = \frac{1}{mp} \sum_{i=0}^{m-1} \sum_{j=0}^{p-1} \left[ I(i,j) - K(i,j) \right]^2 \qquad (10) \]

\[ \mathrm{PSNR} = 10 \log_{10} \left( \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \right) \qquad (11) \]
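The two image-quality metrics can be sketched in plain Python as follows. This is a minimal illustration of the standard MSE and PSNR definitions, not the C# program used by the authors; images are assumed to be equally sized lists of rows, and the default peak value of 255 (8-bit pixels) is an assumption that should be replaced by the actual MAX_I of the image format.

```python
import math


def mse(exact, approx):
    """Mean squared error between two m x p images given as lists of rows."""
    m, p = len(exact), len(exact[0])
    total = sum(
        (exact[i][j] - approx[i][j]) ** 2
        for i in range(m)
        for j in range(p)
    )
    return total / (m * p)


def psnr(exact, approx, max_i=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(exact, approx)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_i ** 2 / err)
```

For example, a 2×2 image in which a single pixel differs by 2 has an MSE of 1.0, which corresponds to a PSNR of roughly 48.1 dB for a peak value of 255.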
Figure 12. Image multiplication results for example 1, (a) Multiplier 1, (b) Multiplier 2, (c) Multiplier 3, (d) Multiplier 4, (e) Multiplier 5, (f) Multiplier 6, (g) Multiplier 7, (h) Multiplier 8.

Figure 10. Average NED distribution in 8×8 approximate multipliers. (a) Multiplier 1, (b) Multiplier 2, (c) Multiplier 3, (d) Multiplier 4
Figure 13. Image multiplication results for example 2, (a) Multiplier 1, (b) Multiplier 2, (c) Multiplier 3, (d) Multiplier 4, (e) Multiplier 5, (f) Multiplier 6, (g) Multiplier 7, (h) Multiplier 8

TABLE XI
PSNR AND AVERAGE NED FOR FIRST EXAMPLE

Design             PSNR (dB)   Average NED (×10^-2)
Multiplier 1       25.3        4.4
Multiplier 2       26.3        3.7
Multiplier 3       53.9        0.10
Multiplier 4       53.2        0.12
Multiplier 5 [7]   26.3        2.3
Multiplier 6 [5]   48.3        0.28
Multiplier 7 [6]   52.3        0.15
Multiplier 8       21.2        7.6

TABLE XII
...
Multiplier 4       54.9        0.083
Multiplier 5 [7]   35.7        0.72
Multiplier 6 [5]   52.4        0.14
Multiplier 7 [6]   53.5        0.11
...

[...] images generated by Multipliers 3 and 4, are nearly 50 dB, a value that is acceptable for most applications. Consistently, Multiplier 1 has the worst PSNR among 4 proposed designs.

[...] compressors are utilized in the reduction module of four approximate multipliers. The approximate compressors show a significant reduction in transistor count, power consumption and delay compared with an exact design.
• In terms of transistor count, the first design has a 46% improvement, while the second design has a 50% improvement.
• In terms of power consumption, the first design has a 57% improvement and the second design has a 60% improvement on average for CMOS implementation at feature sizes of 32, 22 and 16 nm.
• In terms of delay, the second design has a 44% improvement compared to the exact compressor and 35% improvement compared to the first design on average at different CMOS feature sizes of 32, 22 and 16 nm.

Four different approximate schemes have been proposed in this paper to investigate the performance of the approximate compressors for the aforementioned metrics for inexact multiplication. The approximate compressors have been utilized in the reduction module of a Dadda multiplier. The following conclusions can be drawn from the simulation results presented in this manuscript.
• The first and second proposed multipliers show a significant improvement in terms of power consumption and transistor count compared to an exact multiplier.
• The first and second multipliers have larger average [...] power consumption, the third and fourth proposed multipliers have very low average NED values, thus presenting the best tradeoff for energy with accuracy.

Moreover, the application of these approximate multipliers [...] generated by multiplying two input images, thus viable for most applications.

Table XIII compares the four proposed approximate design with four other approximate designs found in the technical [...]
[...] figures of merit. Although not discussed and beyond the scope of this manuscript, the proposed designs may also be useful in other arithmetic circuits for applications in which inexact computing can be used. The provision of an error indicator (as required for other applications) is a topic of current investigation.

TABLE XIII
RANKING OF APPROXIMATE MULTIPLIERS

Design             Average NED   Max High NED   Max Low NED   Correct Outputs   PSNR example 1   PSNR example 2
Multiplier 1       7             7              7             6                 7                7
Multiplier 2       6             6              6             5                 5                6
Multiplier 3       2             4              1             3                 1                2
Multiplier 4       1             2              2             2                 2                1
Multiplier 5 [7]   5             1              8             1                 5                5
Multiplier 6 [5]   4             5              4             8                 4                4
Multiplier 7 [6]   3             3              3             4                 3                3
Multiplier 8       8             8              5             7                 8                8
REFERENCES

[1] J. Liang, J. Han, F. Lombardi, "New Metrics for the Reliability of Approximate and Probabilistic Adders," IEEE Transactions on Computers, vol. 63, no. 9, pp. 1760-1771, 2013.
[2] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, K. Roy, "IMPACT: IMPrecise adders for low-power approximate computing," Low Power Electronics and Design (ISLPED) 2011 International Symposium on, 1-3 Aug. 2011.
[3] S. Cheemalavagu, P. Korkmaz, K.V. Palem, B.E.S. Akgul, and L.N. Chakrapani, "A probabilistic CMOS switch and its realization by exploiting noise," in Proc. IFIP-VLSI SoC, Perth, Western Australia, Oct. 2005.
[4] H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, "Bio-Inspired Imprecise Computational Blocks for Efficient VLSI Implementation of Soft-Computing Applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, April 2010.
[5] M. J. Schulte and E. E. Swartzlander, Jr., "Truncated multiplication with correction constant," VLSI Signal Processing VI, pp. 388–396, 1993.
[6] E. J. King and E. E. Swartzlander, Jr., "Data dependent truncated scheme for parallel multiplication," in Proceedings of the Thirty First Asilomar Conference on Signals, Circuits and Systems, pp. 1178–1182, 1998.
[7] P. Kulkarni, P. Gupta, and M. D. Ercegovac, "Trading accuracy for power in a multiplier architecture," Journal of Low Power Electronics, vol. 7, no. 4, pp. 490-501, 2011.
[8] C. Chang, J. Gu, M. Zhang, "Ultra Low-Voltage Low-Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits," IEEE Transactions on Circuits & Systems, vol. 51, no. 10, pp. 1985-1997, Oct. 2004.
[...]
[17] D. Kelly, B. Phillips, S. Al-Sarawi, "Approximate signed binary integer multipliers for arithmetic data value speculation," in Proc. of the conference on design and architectures for signal and image processing, 2009.
[18] J. Ma, K. Man, T. Krilavicius, S. Guan, and T. Jeong, "Implementation of High Performance Multipliers Based on Approximate Compressor Design," in International Conference on Electrical and Control Technologies (ECT), 2011.

Fabrizio Lombardi (M'81–SM'02–F'09) graduated in 1977 from the University of Essex (UK) with a B.Sc. (Hons.) in Electronic Engineering. In 1977 he joined the Microwave Research Unit at University College London, where he received the Master in Microwaves and Modern Optics (1978), the Diploma in Microwave Engineering (1978) and the Ph.D. from the University of London (1982). He is currently the holder of the International Test Conference (ITC) Endowed Chair Professorship at Northeastern University, Boston. During 2007-2010 Dr. Lombardi was the Editor-In-Chief of the IEEE Transactions on Computers. He is also an Associate Editor of the IEEE Transactions on Nanotechnology and the inaugural Editor-in-Chief of the IEEE Transactions on Emerging Topics in Computing. He currently serves as an elected Member of the Board of Governors of the IEEE Computer Society. His research interests are bio-inspired and nano manufacturing/computing, VLSI design, testing, and fault/defect tolerance of digital systems. He has extensively published in these areas and coauthored/edited seven books.
Paolo Montuschi is a Professor of Computer Engineering at Politecnico di Torino and Deputy Chair of the Control and Computer Engineering Department. Previously, he served as Chair of Department from 2003 to 2011, and as Chair or Member of several Boards. He is currently serving as Associate Editor-in-Chief of the IEEE Transactions on Computers, as well as member of the steering committee of the IEEE Transactions on Emerging Topics in Computing and of the Advisory Board of Computing Now. He is also serving in the Board of Governors of the IEEE Computer Society, as Chair of the Magazine Operations Committee of the Computer Society and member of the Publications Board, Audit and Digital Library Operations Committees. Previously, he served as chair of the Electronic Products and Services and the Digital Library Operations Committees, member of Electronic Products and Services Committee, Member-at-Large of the Computer Society's Publications Board, and Member of Conference Publications Operations Committee. He served as guest and associate editor of the IEEE Transactions on Computers from 2000 to 2004 and from 2009 to 2012, and co-chair, program and steering committee member of several conferences. His current main research interests and scientific achievements are in computer arithmetic, architectures, graphics, and new publication frameworks for "augmented reading" and scientific knowledge dissemination. Within the Computer Society he is actively involved in opening the door to new publication frameworks geared towards e-reading and mobile devices. He is a Computer Society Golden Core Member, and an IEEE Senior Member. Montuschi obtained a PhD in computer engineering in 1989, and since 2000 he has been full Professor.

Amir Momeni received the BS degree in computer engineering from Sharif University of Technology, and the MS degree in computer engineering from Shahid Beheshti University, [...]