ApproximateCompressor FinalforReview

Download as pdf or txt
Download as pdf or txt
You are on page 1of 11

Page 1 of 13 Transactions on Computers

> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 1

1
2
3 Design and Analysis of
4
5
6
Approximate Compressors for Multiplication
7 A. Momeni, J. Han, Member, P.Montuschi, Senior Member and F. Lombardi, Fellow
8 Abstract—Inexact (or approximate) computing is an attractive Addition and multiplication are widely used operations in
paradigm for digital processing at nanometric scales. Inexact computer arithmetic; for addition full-adder cells have been
9
computing is particularly interesting for computer arithmetic extensively analyzed for approximate computing [2-4]. [1] has
10 designs. This paper deals with the analysis and design of two new
11 compared these adders and proposed several new metrics for
approximate 4-2 compressors for utilization in a multiplier.
12 These designs rely on different features of compression, such that evaluating approximate and probabilistic adders with respect
13 imprecision in computation (as measured by the error rate and to unified figures of merit for design assessment for inexact
14 the so-called normalized error distance) can meet with respect to computing applications. For each input to a circuit, the error
Fo
15 circuit-based figures of merit of a design (number of transistors, distance (ED) is defined as the arithmetic distance between an
16 delay and power consumption). Four different schemes for erroneous output and the correct one [1]. The mean error
utilizing the proposed approximate compressors are proposed
17 distance (MED) and normalized error distance (NED) are
and analyzed for a Dadda multiplier. Extensive simulation results
18 are provided and an application of the approximate multipliers proposed by considering the averaging effect of multiple
rP
19 to image processing is presented. The results show that the inputs and the normalization of multiple-bit adders. The NED
20 proposed designs accomplish significant reductions in power is nearly invariant with the size of an implementation and is
21 dissipation, delay and transistor count compared to an exact therefore useful in the reliability assessment of a specific
22 design; moreover, two of the proposed multiplier designs provide design. The tradeoff between precision and power has also
excellent capabilities for image multiplication with respect to
ee
23 been quantitatively evaluated in [1].
24 average normalized error distance and peak signal-to-noise ratio
(more than 50dB for the considered image examples). However, the design of approximate multipliers has
25 received less attention. Multiplication can be thought as the
26 Index Terms—Compressor, Dadda Multiplier, Inexact repeated sum of partial products; however, the straightforward
rR

27 Computing, Approximate Circuits application of approximate adders when designing an


28 approximate multiplier is not viable, because it would be very
29 I. INTRODUCTION inefficient in terms of precision, hardware complexity and
30
other performance metrics. Several approximate multipliers

M OST computer arithmetic applications are


ev

31
implemented using digital logic circuits, thus have been proposed in the literature [4] [5] [6] [7]. Most of
32
operating with a high degree of reliability and these designs use a truncated multiplication method; they
33
precision. However, many applications such as in multimedia estimate the least significant columns of the partial products as
34
ie

35 and image processing can tolerate errors and imprecision in a constant. In [4], an imprecise array multiplier is used for
36 computation and still produce meaningful and useful results. neural network applications by omitting some of the least
37 Accurate and precise models and algorithms are not always significant bits in the partial products (and thus removing
w

38 suitable or efficient for use in these applications. The some adders in the array). A truncated multiplier with a
39 paradigm of inexact computation relies on relaxing fully correction constant is proposed in [5]. For an n×n multiplier,
40 precise and completely deterministic building modules when this design calculates the sum of the n+k most significant
On

41 for example, designing energy-efficient systems. This allows columns of the partial products and truncates the other n-k
42 imprecise computation to redirect the existing design process columns. The n+k bit result is then rounded to n bits. The
43 of digital circuits and systems by taking advantage of a reduction error (i.e. the error generated by truncating then-k
44 decrease in complexity and cost with possibly a potential least significant bits) and rounding error (i.e. the error
45 generated by rounding the result to n bits) are found in the
ly

increase in performance and power efficiency. Approximate


46 (or inexact) computing relies on using this property to design next step. The correction constant (n+k bits) is selected to be
47 simplified, yet approximate circuits operating at higher as close as possible to the estimated value of the sum of these
48 performance and/or lower power consumption compared with errors to reduce the error distance.
49 precise (exact) logic circuits [1]. A truncated multiplier with constant correction has the
50 maximum error if the partial products in the n-k least
51 ___________________________________________ significant columns are all ones or all zeros. A variable
52 correction truncated multiplier has been proposed in [6].This
53 A Momeni and F. Lombardi are with the Department of Electrical and method changes the correction term based on column n-k-1. If
54 Computer Engineering, Northeastern University, Boston, MA 02115, USA; all partial products in columnn-k-1 are one, then the correction
55 {[email protected], [email protected]}. J. Han is with the
term is increased. Similarly, if all partial products in this
56 Department of Electrical and Computer Engineering, University of Alberta,
Edmonton, Canada; {[email protected]}, P. Montuschi is withthe column are zero, the correction term is decreased.
57 Department of Control and Computer Engineering, Politecnico di Torino, In [7], a simplified (and thus inaccurate) 2x2 multiplier
58 Turin, Italy;{[email protected]) block is proposed for building larger multiplier arrays. In the
59
design of a fast multiplier, compressors have been widely used
60
Transactions on Computers Page 2 of 13
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 2

1
[8-10] to speed up the partial product reduction tree and or i + 2.For the correct operation of the circuit shown in
2
decrease power dissipation. Optimized designs of 4-2 exact Figure 1, the following inequality must be satisfied
3
compressors have been proposed in [8, 11 - 16]. [17] [18] have  …  3 2 4 8  … (1)
4
also considered compression for approximate multiplication.
5
6 In [17], an approximate signed multiplier has been proposed
7 for use in arithmetic data value speculation (AVDS);
8 multiplication is performed using the Baugh-Wooley
9 algorithm. However, no new design is proposed for the
10 compressors for the inexact computation. Designs of
11 approximate compressors have been proposed in [18];
12 however, these designs do not target multiplication. It should
13 be noted that the approach of [7] improves over [17] [18] by
14 utilizing a simplified multiplier block that is amenable to
Fo
Figure 1.Schematic diagram of n-2 compressors in a multi operand addition
15 approximate multiplication.
circuit [13]
16 Initially in this paper, two novel approximate 4-2
17 compressors are proposed and analyzed. It is shown that these Where denotes the number of carry bits from slice ito
18 simplified compressors have better delay and power slice i+ j.
rP
19 consumption than the optimized (exact) 4-2 compressor A widely used structure for compression is the 4-2
20 designs found in the technical literature [8]. These compressor; a 4-2 compressor (Figure 2) can be implemented
21 approximate compressors are then used in the restoration with a carry bit between adjacent slices ( 1 1). The carry bit
22 module of a Dadda multiplier; four different schemes are from the position to the right is denoted as cin while the carry
ee
23 proposed for inexact multiplication. Extensive simulation bit into the higher position is denoted as cout. The two output
24 results are provided at circuit-level for figures of merit, such bits in positions i and i + 1are also referred to as the sum and
25 as delay, transistor count, power dissipation, error rate and carry respectively.
26 normalized error distance under CMOS feature sizes of 32, 22
rR

27 and 16 nm. The application of these multipliers to image


28 processing is then presented. The results of two examples of
29 multiplication of two images are reported; these results show
30
that the third and fourth approximate multipliers yield an
ev

31
output product image that has a very high quality and
32
resemblance to the image generated by an exact multiplier, i.e.
33
excellent values for the average NED and the Peak Signal-to-
34
ie

35 Noise Ratio (PSNR) are found (for the PSNR more than
36 50db). The analysis and simulation results show that the
proposed approximate designs for both the compressor and the Figure2.4-2 compressor
37
w

38 multiplier are viable candidates for inexact computing. The following equations give the outputs of the 4-2
39 This paper is organized as follows. Section 2 is a review of compressor, while Table 1 shows its truth table.
40 existing schemes for (exact) compressors. The two new
On

41 designs of an approximate 4-2 compressor are presented in 1 2 3 4 (2)


42 Section 3.Multiplication and four different approximate
1 2 3 1 2 1 (3)
43 multipliers are proposed in Section 4. Simulation results for
44 the approximate compressors and multipliers are provided in
1 2 3 4 1 2 3 4 4 (4)
45 Section 5. The application of the proposed approximate
ly

46 multipliers to image processing is presented in Section 6. The common implementation of a 4-2 compressor is
47 Section 7 concludes the manuscript. accomplished by utilizing two full-adder (FA) cells (Figure 3)
48 [8]. Different designs have been proposed in the literature for
49 II. EXACT COMPRESSORS 4-2 compressor [8, 11-16].
50 The main goal of either multi-operand carry-save addition Figure 4 shows the optimized design of an exact4-2
51 or parallel multiplication is to reduce n numbers to two compressor based on the so-called XOR-XNOR gates [8]; a
52 XOR-XNOR gate simultaneously generates the XOR and
numbers; therefore, n-2 compressors (or n-2 counters) have
53 XNOR output signals. The design of [8] consists of three
been widely used in computer arithmetic. An-2 compressor
54 XOR-XNOR (denoted by XOR*) gates, one XOR and two 2-1
(Figure 1) is usually a slice of a circuit that reduces n numbers
55
to two numbers when properly replicated. In slice i of the MUXes. The critical path of this design has a delay of 3Δ,
56
57 circuit, the n-2 compressor receives n bits in position i and one where Δ is the unitary delay through any gate in the design.
58 or more carry bits from the positions to the right, such as i – 1
59 or i – 2. It produces two output bits in positions i and i + 1 and
60 one or more carry bits into the higher positions, such as i + 1
Page 3 of 13 Transactions on Computers
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 3

1
performance improvement compared to an exact compressor
2
with respect to delay, number of transistors and power
3
consumption.
4
5
6
7
8
9
10
11
12
13
14
Fo
15
16 Figure 3. Implementation of 4-2 Compressor
17 TABLE I
18 TRUTH TABLE OF 4-2 COMPRESSOR
rP
19 cin X4 X3 X2 X1 cout carry sum
20 0 0 0 0 0 0 0 0
21 0 0 0 0 1 0 0 1
22 0 0 0 1 0 0 0 1
0 0 0 1 1 1 0 0
ee
23 Figure4. Optimized 4-2 compressor of [8]
0 0 1 0 0 0 0 1
24 0 0 1 0 1 1 0 0 A. Design 1
25 0 0 1 1 0 1 0 0
0 0 1 1 1 1 0 1 As shown in Table I, the carry output in an exact
26
rR

0 1 0 0 0 0 0 1 compressor has the same value of the input cin in 24 out of 32


27 0 1 0 0 1 0 1 0
28 states. Therefore, an approximate design must consider this
0 1 0 1 0 0 1 0
29 0 1 0 1 1 1 0 1 feature. In Design 1, the carry is simplified to cin by changing
30 0 1 1 0 0 0 1 0 the value of the other 8 outputs.
0 1 1 0 1 1 0 1
ev

31 0 1 1 1 0 1 0 1
32 0 1 1 1 1 1 1 0 (5)
33 1 0 0 0 0 0 0 1
1 0 0 0 1 0 1 0 Since the Carry output has the higher weight of a binary bit,
34
ie

1 0 0 1 0 0 1 0
35 1 0 0 1 1 1 0 1
an erroneous value of this signal will produce a difference
36 1 0 1 0 0 0 1 0 value of two in the output. For example, if the input pattern is
37 1 0 1 0 1 1 0 1 “01001” (row 10 of Table II), the correct output is “010” that
w

1 0 1 1 0 1 0 1
38 1 0 1 1 1 1 1 0
is equal to 2. By simplifying the carry output to cin, the
39 1 1 0 0 0 0 1 0 approximate compressor will generate the “000” pattern at the
40 1 1 0 0 1 0 1 1 output (i.e. a value of 0). This substantial difference may not
On

41 1 1 0 1 0 0 1 1
be acceptable; however, it can be compensated or reduced by
1 1 0 1 1 1 1 0
42 1 1 1 0 0 0 1 1 simplifying the cout and sum signals. In particular, the
43 1 1 1 0 1 1 1 0 simplification of sum to a value of 0 (second half of Table II)
44 1 1 1 1 0 1 1 0 reduces the difference between the approximate and the exact
45 1 1 1 1 1 1 1 1
ly

outputs as well as the complexity of its design. Also, the


46 presence of some errors in the sum signal will results in a
47 reductions of the delay of producing the approximate sum and
48 III. PROPOSED APPROXIMATE COMPRESSORS the overall delay of the design (because it is on the critical
49 path).
50 In this section, two designs of an approximate compressor
51 are proposed. Intuitively to design an approximate 4-2
compressor, it is possible to substitute the exact full-adder 1 2 3 4 (6)
52
53 cells in Figure3 by an approximate full-adder cell (such as the
first design proposed in [2]). However, this is not very In the last step, the change of the value of cout in some
54
55 efficient, because it produces at least 17 incorrect results out states, may reduce the error distance provided by approximate
56 of 32 possible outputs, i.e. the error rate of this inexact carry and sum and also more simplification in the proposed
57 compressor is more than 53% (where the error rate is given design.
58 by the ratio of the number of erroneous outputs over the total
59 number of outputs). Two different designs are proposed next 1 2   3 4 (7)
60 to reduce the error rate; these designs offer significant
Transactions on Computers Page 4 of 13
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 4

1
B. Design 2
2
Although the above mentioned simplifications of carry and A second design of an approximate compressor is proposed
3
sum increase the error rate in the proposed approximate to further increase performance as well as reducing the error
4
compressor, its design complexity and therefore the power rate. Since the carry and cout outputs have the same weight,
5
6 consumption are considerably decreased. This can be realized the proposed equations for the approximate carry and cout in
7 by comparing (2)-(4) and (5)-(7).Table II shows the truth table the previous part can be interchanged. In this new design,
8 of the first proposed approximate compressor. It also shows carry uses the right hand side of (7) and cout is always equal to
9 the difference between the inexact output of the proposed cin; since cin is zero in the first stage, cout and cin will be zero in
10 approximate compressor and the output of the exact all stages. So, cin and cout can be ignored in the hardware
11 compressor. As shown in Table II, the proposed design has 12 design. Figure 7shows the block diagram of this approximate
12 incorrect outputs out of 32 outputs (thus yielding an error rate 4-2 compressor and the expressions below describe its outputs.
13 of 37.5%). This is less than the error rate using the best
14 approximate full-adder cell of [2]. 1 2 3 4 (8)
Fo
15 1 2   3 4 (9)
16 TABLE II
TRUTH TABLE OF THE FIRSTAPPROXIMATE 4-2 COMPRESSOR
17
cin X4 X3 X2 X1 cout’ carry’ sum' Difference
18
rP
0 0 0 0 0 0 0 1 1
19 0 0 0 0 1 0 0 1 0
20 0 0 0 1 0 0 0 1 0
21 0 0 0 1 1 0 0 1 -1
22 0 0 1 0 0 0 0 1 0
0 0 1 0 1 1 0 0 0
ee
23 0 0 1 1 0 1 0 0 0
24 0 0 1 1 1 1 0 1 0
25 0 1 0 0 0 0 0 1 0
26 0 1 0 0 1 1 0 0 0
rR

0 1 0 1 0 1 0 0 0
27 0 1 0 1 1 1 0 1 0 Figure 6. Gate level implementation of Design 1
28 0 1 1 0 0 0 0 1 -1
29 0 1 1 0 1 1 0 1 0
0 1 1 1 0 1 0 1 0
30
0 1 1 1 1 1 0 1 -1
ev

31 1 0 0 0 0 0 1 0 1
32 1 0 0 0 1 0 1 0 0
33 1 0 0 1 0 0 1 0 0
1 0 0 1 1 0 1 0 -1
34 1 0 1 0 0 0 1 0 0
ie

35 1 0 1 0 1 1 1 0 1
36 1 0 1 1 0 1 1 0 1
37 1 0 1 1 1 1 1 0 0
w

1 1 0 0 0 0 1 0 0 Figure7. Approximate 4-2 compressor, Design 2


38 1 1 0 0 1 1 1 0 1
39 1 1 0 1 0 1 1 0 1
40 1 1 0 1 1 1 1 0 0 Note that (9) is the same as (7) and (8) is the same as (6) for
On

41 1 1 1 0 0 0 1 0 -1 cin= 0. Figure 8 shows the gate level implementation of the


1 1 1 0 1 1 1 0 0
42 1 1 1 1 0 1 1 0 0
second proposed design. The delay of the critical path of this
43 1 1 1 1 1 1 1 0 -1 approximate design is 2Δ, so it is 1Δ less than the previous
44 designs; moreover, a further reduction in the number of gates
45 (5)-(7) are the logic expressions for the outputs of the first is accomplished.
ly

46 design of the approximate 4-2 compressor proposed in this


47 manuscript.
48 The gate level structure of the first proposed design (Figure
49 6) shows that the critical path of this compressor has still a
50 delay of 3Δ, so it is the same as for the exact compressor of
51 Figure 5. However, the propagation delay through the gates of
52 this design is lower than the one for the exact compressor. For
53 example, the propagation delay in the XOR* gate that Figure 8. Gate level implementation of Design 2
54 generates both the XOR and XNOR signals in [8], is higher
55 than the delay through a XNOR gate of the proposed design. Table III shows the truth table of the second approximate
56 design for a 4-2 compressor; this Table also shows the
Therefore, the critical path delay in the proposed design is
57 difference between the exact decimal value of the addition of
lower than in the exact design and moreover, the total number
58 the inputs and the decimal value of the outputs produced by
of gates in the proposed design is significantly less than that in
59 the approximate compressor. For example when all inputs are
the optimized exact compressor of [8].
60
Page 5 of 13 Transactions on Computers
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 5

1
1, the decimal value of the addition of the inputs is 4. the partial products into at most four rows. In the second or
2
However, the approximate compressor produces a 1 for the final stage, 1 half-adder, 1 full-adder and 10 compressors are
3
carry and sum. The decimal value of the outputs in this case is used to compute the two final rows of partial products.
4
3; Table II shows that the difference is -1. Therefore, two stages of reduction and 3 half-adders, 3 full-
5
6 adders and 18 compressors are needed in the reduction
7 circuitry of an 8×8Dadda multiplier.
TABLE III In this paper, four cases are considered for designing an
8 TRUTH TABLE OF SECOND PROPOSED 4-2 COMPRESSOR
9 approximate multiplier.
X4 X3 X2 X1 carry’ sum' difference
10 0 0 0 0 0 1 1
11 0 0 0 1 0 1 0
12 0 0 1 0 0 1 0
13 0 0 1 1 0 1 -1
0 1 0 0 0 1 0
14 0 1 0 1 1 0 0
Fo
15 0 1 1 0 1 0 0
16 0 1 1 1 1 1 0
1 0 0 0 0 1 0
17 1 0 0 1 1 0 0
18 1 0 1 0 1 0 0
rP
19 1 0 1 1 1 1 0
20 1 1 0 0 0 1 -1
1 1 0 1 1 1 0
21 1 1 1 0 1 1 0
22 1 1 1 1 1 1 -1
ee
23
24 This design has therefore 4 incorrect outputs out of 16
25 outputs, so its error rate is now reduced to 25%. This is a very
26 positive feature, because it shows that on a probabilistic basis,
rR

27 the imprecision of the proposed design is smaller than the


28 other available schemes.
29
30 IV. MULTIPLICATION
ev

31
32 In this section, the impact of using the proposed
33 compressors for multiplication is investigated. A fast (exact)
34 multiplier is usually composed of three parts (or modules) [8].
• Partial product generation.
ie

35
36 • A Carry Save Adder (CSA) tree to reduce the partial
37 products’ matrix to an addition of only two operands
w

38 • A Carry Propagation Adder (CPA) for the final


39 computation of the binary result.
40 In the design of a multiplier, the second module plays a
On

41 pivotal role in terms of delay, power consumption and circuit


42 complexity. Compressors have been widely used [9, 10] to
43 speed up the CSA tree and decrease its power dissipation, so
44 to achieve fast and low-power operation. The use of
45
ly

approximate compressors in the CSA tree of a multiplier


46 results in an approximate multiplier.
47 Figure 9. Reduction circuitry of an 8×8Dadda multiplier, (a) using Design
A 8×8 unsigned Dadda tree multiplier is considered to 1 compressors, (b) using Design 2 compressors
48
assess the impact of using the proposed compressors in
49
50
approximate multipliers. The proposed multiplier uses in the • In the first case (Multiplier 1), Design 1 is used for all 4-2
first part AND gates to generate all partial products. In the compressors in Figure 9(a).
51
52
second part, the approximate compressors proposed in the • In the second case (Multiplier 2), Design 2 is used for the
53 previous section are utilized in the CSA tree to reduce the 4-2 compressors. Since Design 2 does not have cin and
54 partial products. The last part is an exact CPA to compute the cout, the reduction circuitry of this multiplier requires a
55 final binary result. Figure 9(a) shows the reduction circuitry of lower number of compressors (Figure 9(b)). Multiplier 2
56 an exact multiplier for n=8. In this figure, the reduction part uses 6 half-adders, 1 full-adder and 17 compressors.
57 uses half-adders, full-adders and 4-2 compressors; each partial • In the third case (Multiplier 3), Design 1 is used for the
58 product bit is represented by a dot. In the first stage, 2 half- compressors in then-1 least significant columns. The other
59 adders, 2 full-adders and 8 compressors are utilized to reduce n most significant columns in the reduction circuitry use
60 exact 4-2 compressors.
Transactions on Computers Page 6 of 13
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 6

1
• In the fourth case (Multiplier 4), Design 2 and exact 4-2 achieve significant improvement in terms of power
2
compressors are used in then-1 least significant columns consumption; on average at different feature sizes, the power
3
and then most significant columns in the reduction consumption of Design 1 is 57% less than the exact
4
5 circuitry respectively. compressor, while Design 2 has a power consumption that is
6 The objectives of the first two approximate designs are to 60% less than the exact design of [8].
7 reduce the delay and power consumption compared with an Table V compares these designs in terms of number of
8 exact multiplier; however, a high error distance is expected. transistors, as a measure of circuit complexity. The exact
9 The next two approximate multipliers (i.e. Multipliers 3 and 4) compressor [8] uses 10 transistors to implement each XOR*
10 are proposed to decrease the error distance. The delay in these gate, 6 transistors to implement the XOR gate and 8 transistors
11 designs is determined by the exact compressors that are in the to implement each MUX gate [8]; therefore, the exact
12 critical path; therefore, there is no improvement in delay for compressor utilizes 52 transistors. A 50% improvement in
13 these approximate designs compared with an exact multiplier. circuit complexity is accomplished by Design 2, as reflected
14 However, it is expected that the utilization of approximate by the lower number of transistors. This is expected because
Fo
15 compressors in the least significant columns will decrease the the second approximate design has no cin and cout with only 4
16 power consumption and transistor count (as measure of circuit inputs and 2 outputs (the exact compressor has 5 inputs and 3
17 complexity). While the first two proposed multipliers have outputs).
18 better performance in terms of delay and power consumption,
rP
19 the error distances in the third and fourth designs are expected TABLE V
20 COMPARISON OF NUMBER OF TRANSISTORS
to be significantly lower.
21 Design Number of transistors
22 Exact Design [8] 52
V. SIMULATION RESULTS Design 1 28
ee
23 Design 2 26
In this section, he designs of the two approximate
24
compressors (Section III) and the four approximate multipliers
25
(Section IV) are simulated using HSPICE. Predictive
26 B. Approximate Multipliers
rR

Technology Models (PTMs) at different CMOS feature sizes


27 The four proposed approximate multipliers are simulated for
(32 nm, 22 nm and 16 nm) are utilized in the HSPICE
28 n=8. The delay, power consumption and number of transistors
simulation.
29 are investigated for these approximate designs as well as the
30 A. Approximate Compressors exact multiplier. A comparison of the error distance (as
ev

31 The two approximate compressors of this paper and the best measure of reliability [1]) of the proposed multipliers with
32 low-power exact compressor of [8] (implemented by using other approximate multipliers is also pursued.
33 XOR-XNOR gates) are simulated at a 1 GHz frequency; a fan-
34 out of 4 is utilized in all simulations. The simulation results of • Delay
ie

35 the delay, power consumption and power-delay product (PDP) The delay of the reduction circuitry (second module) of a
36 are given in Table IV by using the PTMs at 32 nm, 22 nm and Dadda multiplier is dependent on the number of reduction
37
w

16 nm. stages and the delay of each stage. In Multipliers 1 and 2, the
38 approximate compressors are used in all columns; therefore,
39 TABLE IV the delay of the stages is equal to the delay of the approximate
40 SIMULATION RESULTS (@32 NM) compressors. However, in Multipliers 3 and 4, the delay of the
On

41 Design Delay(ps) Power(μW) PDP(aJ) stages is equal to the delay of the exact compressors. So, the
42 @32 nm use of these approximate compressors in the n/2 LSBs cause
43 Exact Design [8] 60.36 2.98 180 no improvement in terms of delay compared to an exact
44 Design 1 58.32 1.27 74
multiplier. The delay improvement in the reduction circuitry
45 Design 2 44.35 1.14 50
ly

@22 nm of each multiplier (at 32 nm CMOS technology) compared to


46
Exact Design [8] 55.82 1.50 84 an exact adder is shown in Table VI.
47 Design 1 56.79 0.62 35
48 Design 2 41.69 0.58 24 TABLE VI
49 @16 nm DELAY IMPROVEMENT IN REDUCTION CIRCUITRY
50 Exact Design [8] 47.59 0.95 45
Design Improvement (%)
Design 1 37.16 0.39 14
51 Design 2 24.44 0.36 9 Multiplier 1 3.38
52 Multiplier 2 26.52
53 As expected, the second proposed design (Design 2) has the
Multiplier 3 0
Multiplier 4 0
54 best delay, power consumption and PDP; these improvements
55
56
are irrespective of feature size. This approximate design is • Power Consumption
62% faster than the exact compressor at 16 nm CMOS The power consumption of each multiplier is determined by
57
technology and 44% faster on average for the three feature the number and type of compressors used. Multipliers 1 and 2
58
sizes considered. Moreover on average, Design 2 is also 35% use only approximate compressors so they have power
59
faster than Design 1. The two proposed approximate designs consumption lower than Multipliers 3 and 4. Table VII shows
60
Page 7 of 13 Transactions on Computers
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 7

1
the power consumption improvement of each multiplier at 32 each input. Therefore the average NED is equivalent to the
2
nm feature size with respect to an exact adder; this confirms NED defined in [1]. The maximum high (low) NED is also
3
that an approximate multiplier in the reduction circuitry will defined as the largest absolute value of NED for the case in
4 result in a considerable power saving.
5 which the erroneous result is more (less) than the exact result.
6 Table X shows the average NED, the maximum high and low
TABLE VII
7 POWER CONSUMPTION IMPROVEMENT IN REDUCTION CIRCUITRY
NEDs and the number of correct results (or outputs) of
8 Design Improvement (%) approximate multipliers for n=8. The number of correct
9 Multiplier 1 52.49 outputs out of the total outputs represents the probability of
10 Multiplier 2 58.58 correctness for each design. Based on Table X, the probability
Multiplier 3 17.50 of correctness in Multiplier 1 is 0.16% (103 out of 65025)
11 Multiplier 4 26.15
12 while the probability of correctness in Multiplier 4 is 14.3%
13 (9320 out of 65025). Since the proposed approximate
14 • Transistor Count compressors produce erroneous results for all-zero input
The transistor count is used in this paper as metric of circuit
Fo
15 patterns (row 1 in Tables II and III), the proposed approximate
16 complexity. The first two approximate multipliers have a multipliers will generate an erroneous result if at least one of
lower transistor count compared with Multipliers 3 and 4.
17 the inputs is zero. However, in these cases (511 cases for n=8)
Table VIII shows the transistor count improvement of the
18 the multiplier can produce correct result by adding a circuit for
reduction circuitry of each multiplier compared to an exact
rP
19 detecting the zero-valued inputs. Therefore, the zero-valued
adder.
20 input patterns are not considered further in the simulation to
21 TABLE VIII investigate the proposed multipliers for a fair comparison.
22 TRANSISTOR COUNT IMPROVEMENT IN REDUCTION CIRCUITRY
ee
23 Design Improvement (%) TABLEX
NED FOR N = 8
24 Multiplier 1 42.11
Multiplier 2 48.15 Average Max High Max Low correct outputs
25 Design
NED NED NED (out of 65025)
Multiplier 3 14.03
26 Multiplier 4 22.42 Multiplier 1 0.6065×10-1 0.1593 0.1375 103
rR

27 Multiplier 2 0.5352×10-1 0.1278 0.1329 458


28 Multiplier 3 0.9199×10-3 0.3199×10-2 0.2707×10-2 5888
• Error Distance
29 Multiplier 4 0.7827×10-3 0.1845×10-2 0.3076×10-2 9320
Four additional approximate multipliers are simulated to Multiplier 5 [7] 0.1400×10-1 0 0.2222 34400
30 compare the error distance. The multiplier (Multiplier 5) Multiplier 6 [5] 0.1609×10-2 0.3937×10-2 0.9858×10-2 0
ev

31 proposed in [7] is simulated for n=8. The truncated multiplier Multiplier 7 [6] 0.1146×10-2 0.3060×10-2 0.4045×10-2 769
32 with constant correction [5] (Multiplier 6) and the truncated Multiplier 8 0.1049 0.2263 0.1207 8
33
multiplier with variable correction [6] (Multiplier 7) are also
34 Based on Table X, Multiplier 4 has the lowest average NED
simulated for n=8 and k=1. A further approximate multiplier
ie

35 among all approximate multipliers. The average NED of


(Multiplier 8) is simulated to investigate the impact of using
36 Multiplier 4 is 18 times better than Multiplier 5, 2 times better
the proposed approximate compressors compared with other
37 than Multiplier 6 and 1.5 times better than Multiplier 7.
w

38 approximate compressors. This 8×8 Dadda multiplier uses 4-2


Multiplier 5 has the highest number of correct outputs. It has
39 compressors made of two approximate full-adders (Figure 3).
also the lowest maximum high NED. As the approximate
40 The first full-adder design proposed in [2] is used in this
output is always less than the exact output, the maximum high
On

41 approximate multiplier. Table IX summarizes the eight


NED is 0 for this design; however, it has the worst maximum
42 approximate multipliers assessed in this manuscript, i.e. the
low NED among all considered designs.
43 four proposed designs and the other four approximate
A plot of the NED distribution is also generated (Figure 10)
44 multipliers together with their salient features.
to compare the performance of the approximate multipliers.
45
ly

TABLE IX The range of the product in a 8×8 multiplier is between 0 and


46
APPROXIMATE MULTIPLIERS AND THEIR FEATURES 65025 (unsigned values). All possible outputs are categorized
47
Design Feature in 127 intervals; in the first interval the output is between 0
48
Multiplier 1 Design 1 in all columns and 512, in the second interval the output is between 513 and
49 Multiplier 2 Design 2 in all columns
50 1024 and so on. In the last interval the output is between
Multiplier 3 Design 1 in LSBs and exact compressor in MSBs
51 Multiplier 4 Design 2 in LSBs and exact compressor in MSBs 64513 and 65025. The average NED of each interval is then
52 Multiplier 5 [7] Approximate 2x2 multiplier blocks computed for the approximate multiplier. Figures 10a and 10b
Multiplier 6 [5] Truncated multiplier with constant correction show that for Multipliers 1 and 2, the average NED increases
53 Multiplier 7 [6] Truncated multiplier with variable correction
54 Multiplier 8 Compressors made of approximate FAs [2]
only at very large or very small product values, i.e. these
55 approximate multiplier incur on average in a small error in
56 The normalized error distance (NED) is used to compare output compared to the exact calculation.
57 these approximate multipliers. In [1], the NED is defined as
58 VI. APPLICATION: IMAGE PROCESSING
the average error distance over all inputs, normalized by the
59 maximum possible error. In this paper the NED is defined for In this section the application of the proposed approximate
60
Transactions on Computers Page 8 of 13
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 8

1
multipliers to image processing is illustrated. A multiplier is
2
used to multiply two images on a pixel by pixel basis, thus Figure 11 shows two examples: both input images and the
3
blending the two images into a single output image. resulting output image are provided. A program has been
4
developed in C# .net and simulated in Microsoft Visual Studio
5
2010 using the 8 approximate multipliers at n=8. Figures 12
6
and 13 show the outputs for the two examples.
7 The average NED and the Peak Signal-to-Noise Ratio
8
(PSNR) that is based on the Mean Squared Error (MSE) are
9
computed to assess the quality of the output image and
10
compare it with the output image generated by an exact
11
multiplier. The equations for the MSE and PSNR are given in
12
13 (10) and (11); in (10), m and p are the image dimensions and
14 I(i,j) and K(i,j) are the exact and obtained values of each pixel
respectively. In (11), MAXI represents the maximum value of
Fo
15
16 each pixel.
17
18 MSE ∑ ∑ , , (10)
rP
19
20 PSNR 10 (11)
MSE
21
22
ee
23
24
25
26
rR

27
28
29
30
ev

31
32
33
34
ie

35 Figure11. Image multiplication (a) example 1, (b) example 2 (both using an


exact multiplier)
36
37
w

38
39
40
On

41
42
43
44
45
ly

46
47
48
49
50
51 Figure12. Image multiplication results for example 1, (a) Multiplier 1, (b)
52 Multiplier 2, (c) Multiplier 3, (d) Multiplier 4, (e) Multiplier 5, (f) Multiplier
6, (g) Multiplier 7, (h) Multiplier 8.
53
54
55
56
57
58 Figure10.Average NED distribution in 8×8 approximate multipliers. (a)
59 Multiplier 1, (b) Multiplier 2, (c) Multiplier 3, (d) Multiplier 4
60
Page 9 of 13 Transactions on Computers
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 9

1
compressors are utilized in the reduction module of four
2
approximate multipliers. The approximate compressors show a
3
significant reduction in transistor count, power consumption
4 and delay compared with an exact design.
5
• In terms of transistor count, the first design has a 46%
6
improvement, while the second design has a 50%
7
improvement.
8
• In terms of power consumption, the first design has a 57%
9
improvement and the second design has a 60%
10
improvement on average for CMOS implementation at
11
feature sizes of 32, 22 and 16 nm.
12
• In terms of delay, the second design has a 44%
13 Figure13. Image multiplication results for example2, (a) Multiplier 1, (b)
Multiplier 2, (c) Multiplier 3, (d) Multiplier 4, (e) Multiplier 5, (f) Multiplier improvement compared to the exact compressor and 35%
14
6, (g) Multiplier 7, (h) Multiplier 8 improvement compared to the first design on average at
Fo
15
different CMOS feature sizes of 32, 22 and 16 nm.
16 TABLE XI
PSNR AND AVERAGE NED FOR FIRST EXAMPLE
Four different approximate schemes have been proposed in
17 this paper to investigate the performance of the approximate
18 Design PSNR (dB) Average NED(×10-2)
compressors for the aforementioned metrics for inexact
rP
19 Multiplier 1 25.3 4.4
multiplication. The approximate compressors have been
Multiplier 2 26.3 3.7
20 utilized in the reduction module of a Dadda multiplier. The
Multiplier 3 53.9 0.10
21 Multiplier 4 53.2 0.12 following conclusions can be drawn from the simulation
22 Multiplier 5 [7] 26.3 2.3
results presented in this manuscript.
ee
23 Multiplier 6 [5] 48.3 0.28
Multiplier 7 [6] 52.3 0.15 • The first and second proposed multipliers show a
24 Multiplier 8 21.2 7.6 significant improvement in terms of power consumption
25 and transistor count compared to an exact multiplier.
26 TABLE XII • The first and second multipliers have larger average
rR

27 PSNR AND AVERAGE NED FOR SECOND EXAMPLE


NEDs (and thus, larger PSNRs), while the second
28 Design PSNR (dB) Average NED(×10-2)
multiplier that uses the second proposed approximate
29 Multiplier 1 25.1 4.5
compressor for all bits, has the best delay.
30 Multiplier 2 25.8 4.1
Multiplier 3 54.2 0.096 • With relatively modest reductions in transistor count and
ev

31 Multiplier 4 54.9 0.083 power consumption, the third and fourth proposed
32 Multiplier 5 [7] 35.7 0.72 multipliers have very low average NED values, thus
33 Multiplier 6 [5] 52.4 0.14
presenting the best tradeoff for energy with accuracy.
34 Multiplier 7 [6] 53.5 0.11
Moreover, the application of these approximate multipliers
ie

Multiplier 8 18.7 10.4


35 to image processing has confirmed that two of the proposed
36 designs achieve a PSNR of nearly 50dB in the output
37 Tables XI and XII show that the PSNRs of the output
w

images generated by Multipliers 3 and 4, are nearly 50 dB, a generated by multiplying two input images, thus viable for
38 most applications.
39 value that is acceptable for most applications. Consistently,
Table XIII compares the four proposed approximate design
40 Multiplier 1 has the worst PSNR among 4 proposed designs.
with four other approximate designs found in the technical
On

41 As discussed previously, the proposed approximate multipliers


literature by ranking them under various metrics. Multiplier 4
42 have a higher error distance for very large and very small is overall the best design with respect to all figures of merit for
43 input values in the product operands. Therefore the pixels that approximate multiplication as well as the two PSNR
44 have high RGB (Red-Green-Blue) model values (such as of a examples. Multiplier 5 has the best performance in terms of
45 white color) or small RGB model values (such as those of a
ly

Max High NED and number of correct outputs; however, its


46 black color), show a larger inaccuracy than other pixels due to rather poor performance for the other figures of merit causes
47 the approximate nature of the compressors. However, the error its ranking to be in the middle once the PSNR examples are
48 distance of Multipliers3 and 4 still remains very low. considered. Multiplier 3 is the second best design among the
49 schemes considered in this manuscript. It offers overall good
50 VII. CONCLUSION performance in most metrics of Table XIII. Current and future
51 research addresses the tradeoffs of the different figures of
Inexact computing is an emerging paradigm for
52 merit in the proposed designs to establish conditions by which
computation at nanoscale. Computer arithmetic offers
53 significant operational advantages for inexact computing; an combined metrics can be attained. Moreover, physical designs
54 extensive literature exists on approximate adders. However, of the approximate multipliers are being pursued to further
55 this paper has initially focused on compression as used in a confirm the analysis presented in this paper.
56 multiplier; to the best knowledge of the authors, no work has In conclusion, this paper has shown that by an appropriate
57 been reported on this topic. design of an approximate compressor, multipliers can be
58 This paper has presented the novel designs of two designed for inexact computing; these multipliers offer
59 approximate 4-2 compressors. These approximate significant advantages in terms of both circuit-level and error
60
Transactions on Computers Page 10 of 13
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 10

1
figures of merit. Although not discussed and beyond the scope
2
of this manuscript, the proposed designs may also be useful in
3
other arithmetic circuits for applications in which inexact
4 computing can be used. The provision of an error indicator (as
5 required for other applications) is a topic of current
6 investigation.
7
8 TABLE XIII
9 RANKING OF APPROXIMATE MULTIPLIERS
10 Design Average NED Max High NED Max Low NED Correct Outputs PSNR example 1 PSNR example 2
11 Multiplier 1 7 7 7 6 7 7
12 Multiplier 2 6 6 6 5 5 6
Multiplier 3 2 4 1 3 1 2
13 Multiplier 4 1 2 2 2 2 1
14 Multiplier 5 [7] 5 1 8 1 5 5
Fo
15 Multiplier 6 [5] 4 5 4 8 4 4
16 Multiplier 7 [6] 3 3 3 4 3 3
Multiplier 8 8 8 5 7 8 8
17
[17] D. Kelly, B. Phillips, S. Al-Sarawi, "Approximate signed binary integer
18 multipliers for arithmetic data value speculation", in Proc. of the
rP
19 REFERENCES conference on design and architectures for signal and image processing,
20 [1] J. Liang, J. Han, F. Lombardi, “New Metrics for the Reliability of 2009.
21 Approximate and Probabilistic Adders,” IEEE Transactions on [18] J. Ma, K. Man, T. Krilavicius, S. Guan, and T. Jeong, “Implementation
Computers,vol. 63, no. 9, pp. 1760 - 1771, 2013. of High Performance Multipliers Based on Approximate Compressor
22 [2] V. Gupta, D. Mohapatra, S. P. Park, A. Raghunathan, K. Roy, Design” in international Conference on Electrical and Control
ee
23 “IMPACT: IMPrecise adders for low-power approximate computing,” Technologies (ECT), 2011.
24 Low Power Electronics and Design (ISLPED) 2011 International
Symposium on. 1-3 Aug. 2011. Fabrizio Lombardi (M’81–SM’02-F’09)
25 [3] S. Cheemalavagu, P. Korkmaz, K.V. Palem, B.E.S. Akgul, and L.N. graduated in 1977 from the University of
26 Chakrapani, “A probabilistic CMOS switch and its realization by Essex (UK) with a B.Sc. (Hons.) in
rR

27 exploiting noise,” in Proc. IFIP-VLSI SoC, Perth, Western Australia, Electronic Engineering. In 1977 he joined the
28 Oct. 2005. Microwave Research Unit at University
[4] H.R. Mahdiani, A. Ahmadi, S.M. Fakhraie, C. Lucas, “Bio-Inspired College London, where he received the
29 Imprecise Computational Blocks for Efficient VLSI Implementation of Master in Microwaves and Modern Optics
30 Soft-Computing Applications,” IEEE Transactions on Circuits and (1978), the Diploma in Microwave
ev

31 Systems I: Regular Papers, vol. 57, no. 4, pp. 850-862, April 2010. Engineering (1978) and the Ph.D. from the
32 [5] M. J. Schulte and E. E. Swartzlander, Jr., “Truncated multiplication with University of London (1982).He is currently
correction constant,” VLSI Signal Processing VI, pp. 388–396, 1993. the holder of the International Test
33 [6] E. J. King and E. E. Swartzlander, Jr., “Data dependent truncated Conference (ITC) Endowed Chair Professorship at Northeastern
34 scheme for parallel multiplication,” in Proceedings of the Thirty First University, Boston. During 2007-2010 Dr. Lombardi was the Editor-In-
ie

35 Asilomar Conference on Signals, Circuits and Systems, pp. 1178–1182, Chief of the IEEE Transactions on Computers. He is also an Associate
1998. Editor of the IEEE Transactions on Nanotechnology and the inaugural
36 Editor-in-Chief of the IEEE Transactions on Emerging Topics in
[7] P. Kulkarni, P. Gupta, and MD Ercegovac, “Trading accuracy for power
37
w

in a multiplier architecture”, Journal of Low Power Electronics, vol. 7, Computing. He currently serves as an elected Member of the Board of
38 no. 4, pp. 490--501, 2011. Governors of the IEEE Computer Society. His research interests are bio-
[8] C. Chang, J. Gu, M. Zhang, “Ultra Low-Voltage Low- Power CMOS 4- inspired and nano manufacturing/computing, VLSI design, testing, and
39 fault/defect tolerance of digital systems. He has extensively published in
2 and 5-2 Compressors for Fast Arithmetic Circuits,” IEEE Transactions
40 on Circuits & Systems, Vol. 51, No. 10, pp. 1985-1997, Oct. 2004. these areas and coauthored/edited seven books.
On

41 [9] D. Radhakrishnan and A. P. Preethy, “Low-power CMOS pass logic 4-2  


42 compressor for high-speed multiplication,” in Proc. 43rd IEEE Midwest Jie Han (S’02–M’05) received the B.Sc.
43 Symp. Circuits Syst., vol. 3, 2000, pp. 1296–1298. degree in electronic engineering from
[10] Z. Wang, G. A. Jullien, and W. C. Miller, “A new design technique for Tsinghua University, Beijing, China, in 1999
44 column compression multipliers,” IEEE Trans. Comput., vol. 44, pp. and the Ph.D. degree from Delft University
45
ly

962–970, Aug. 1995. of Technology, The Netherlands, in 2004.


46 [11] J. Gu, C. H. Chang, “Ultra Low-voltage, low-power 4-2 compressor for He is currently an assistant professor in the
47 high speed multiplications,” in Proc. 36th IEEE Int. Symp. Circuits Department of Electrical and Computer
Systems, Bangkok, Thailand, May 2003. Engineering at the University of Alberta,
48 [12] M. Margala and N. G. Durdle, “Low-power low-voltage 4-2 Edmonton, AB, Canada. His research
49 compressors for VLSI applications,” in Proc. IEEE Alessandro Volta interests include reliability and fault
50 Memorial Workshop Low-Power Design, 1999, pp. 84–90. tolerance, nanoelectronic circuits and
[13] B. Parhami, “Computer Arithmetic: Algorithms and Hardware Designs,” systems, novel computational models for
51 2nd edition, Oxford University Press, New York, 2010. nanoscale and biological applications.Dr. Han was nominated for the
52 [14] K. Prasad and K. K. Parhi, “Low-power 4-2 and 5-2 compressors,” in 2006 Christiaan Huygens Prize of Science by the Royal Dutch Academy
53 Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, of Science (KoninklijkeNederlandseAkademie van Wetenschappen
54 vol. 1, 2001, pp. 129–133. (KNAW) Christiaan Huygens Wetenschapsprijs). His work was
[15] Ercegovac, Miloš D., and Tomas Lang. Digital arithmetic. Elsevier, recognized by the 125th anniversary issue of Science, for developing
55 2003. theory of fault-tolerant nanocircuits. He served as a Technical Program
56 [16] Baran, Dursun, Mustafa Aktan, and Vojin G. Oklobdzija. "Energy Chair and a General Chair in IEEE International Symposium on Defect
57 efficient implementation of parallel CMOS multipliers with improved and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) 2012
58 compressors."Proc. of the 16th ACM/IEEE international symposium on and 2013, respectively. He has also served as a Technical Program
Low power electronics and design. ACM, 2010. Committee Member in several other international symposia and
59 conferences.
60
Page 11 of 13 Transactions on Computers
> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < 11

1
2 Paolo Montuschi is a Professor of
3 Computer Engineering at Politecnico
4 di Torino and Deputy Chair of the
Control and Computer Engineering
5 Department. Previously, he served as
6 Chair of Department from 2003 to
7 2011, and as Chair or Member of
8 several Boards. He is currently
serving as Associate Editor-in-Chief
9 of the IEEE Transactions on
10 Computers, as well as member of the
11 steering committee of the IEEE Transactions on Emerging Topics in
Computing and of the Advisory Board of Computing Now. He is also
12 serving in the Board of Governors of the IEEE Computer Society, as
13 Chair of the Magazine Operations Committee of the Computer Society
14 and member of the Publications Board, Audit and Digital Library
Fo
Operations Committees. Previously, he served as chair of the Electronic
15 Products and Services and the Digital Library Operations Committees,
16 member of Electronic Products and Services Committee, Member-at-
17 Large of the Computer Society’s Publications Board, and Member of
18 Conference Publications Operations Committee. He served as guest and
associate editor of the IEEE Transactions on Computers from 2000 to
rP
19 2004 and from 2009 to 2012, and co-chair, program and steering
20 committee member of several conferences. His current main research
21 interests and scientific achievements are in computer arithmetic,
architectures, graphics, and new publication frameworks for “augmented
22 reading” and scientific knowledge dissemination. Within the Computer
ee
23 Society he is actively involved in opening the door to new publication
24 frameworks geared towards e-reading and mobile devices. He is a
Computer Society Golden Core Member, and an IEEE Senior Member.
25 Montuschi obtained a PhD in computer engineering in 1989, and since
26 2000 he has been full Professor.
rR

27
28 Amir Momeni received the BS degree in
29 computer engineering from Sharif University of
Technology, and the MS degree in computer
30 engineering from Shahid Beheshti University,
ev

31 Iran. He is currently working toward the PhD


32 degree in computer engineering at Northeastern
33 University. His research interests include VLSI,
EDA, parallel computing, and heterogeneous
34 systems. He is a student member of the IEEE.
ie

35
36
37
w

38
39
40
On

41
42
43
44
45
ly

46
47
48
49
50
51
52
53
54
55
56
57
58
59
60

You might also like