PaperID 74S201921


International Journal of Research in Advent Technology, Vol.7, No.4S, April 2019


E-ISSN: 2321-9637
Available online at www.ijrat.org

Design of MAC Unit for DSP Applications using Verilog HDL

B. Hemalatha (1), Dr. Hari Shanker Srivastava (2), V. Vinay Kumar (3)
(1) Asst. Professor, (2, 3) Assoc. Prof., Dept. of ECE
(1, 2, 3) Anurag Group of Institutions

Abstract: With the recent rapid advances in multimedia and communication systems, real-time signal processing such as audio signal processing, video/image processing, and large-capacity data processing is increasingly in demand. The multiplier and the multiplier-and-accumulator (MAC) are essential elements of digital signal processing operations such as filtering, convolution, transformations and inner products. A designer would like to optimize several quantities when designing a VLSI circuit. These quantities often cannot be optimized simultaneously; one can be improved only at the expense of one or more of the others. Designing an integrated circuit that is efficient in power, area, and speed at the same time has therefore become a very challenging problem. Power dissipation is recognized as a critical parameter in modern VLSI design. The objective of a good multiplier is to provide a physically compact, fast, and low-power chip.
This paper proposes a new multiplier-and-accumulator (MAC) architecture for high speed and low power that adopts the Spurious Power Suppression Technique (SPST). The multiplier is designed by equipping a modified Booth encoder with the SPST, controlled by a detection unit using an AND gate. The modified Booth encoder reduces the number of partial products generated by a factor of 2. The SPST adder avoids unwanted additions and thus minimizes the switching power dissipation. Performance is improved by combining multiplication with accumulation and devising a low-power carry-save adder (CSA).
In this project we used ModelSim for logical verification, synthesized the design with the Xilinx ISE tool for the target technology, and performed place and route for system verification on the targeted FPGA.

1. INTRODUCTION

Power dissipation is recognized as a critical parameter in the modern VLSI design field. To satisfy Moore's law and to produce consumer electronics goods with more backup and less weight, low-power VLSI design is necessary. Fast multipliers are essential parts of digital signal processing systems. The speed of the multiply operation is of great importance in digital signal processing as well as in today's general-purpose processors, especially since media processing took off. In the past, multiplication was generally implemented via a sequence of addition, subtraction, and shift operations. Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product. Each step of addition generates a partial product. In most computers, the operands usually contain the same number of bits. When the operands are interpreted as integers, the product is generally twice the length of the operands in order to preserve the information content. The repeated-addition method suggested by the arithmetic definition is so slow that it is almost always replaced by an algorithm that makes use of positional representation. It is possible to decompose multipliers into two parts. The first part is dedicated to the generation of partial products, and the second one collects and adds them.

The basic multiplication principle is twofold: evaluation of partial products and accumulation of the shifted partial products. It is performed by the successive additions of the columns of the shifted partial product matrix. The multiplier is successively shifted and gates the appropriate bit of the multiplicand. The delayed, gated instances of the multiplicand must all be in the same column of the shifted partial product matrix. They are then added to form the product bit for the particular form. Multiplication is therefore a multi-operand operation. To extend the multiplication to both signed and unsigned numbers, a convenient number system is the two's complement representation.

The MAC (multiplier and accumulator) unit is used for image processing and digital signal processing (DSP) in a DSP processor. The MAC is based on Booth's algorithm in its modified (radix-4) form, and a 17-bit SPST adder improves speed and reduces power.

2. OBJECTIVE

The main objective of this paper is the design and implementation of a multiplier and accumulator. The multiplier combines the modified Booth algorithm with an SPST (Spurious Power Suppression Technique) adder. It is designed to take advantage of the smaller area of the Booth algorithm, which follows from the smaller number of partial products, the faster accumulation of partial products, and the lower power consumption of partial product addition using the SPST adder approach.

3. BASICS OF MULTIPLIER

Multiplication is a mathematical operation that at its simplest is an abbreviated process of adding an integer to itself a specified number of times. A number (the multiplicand) is added to itself a number of times as specified by another number (the multiplier) to form a result
(product). In elementary school, students learn to multiply by placing the multiplicand on top of the multiplier. The multiplicand is then multiplied by each digit of the multiplier, beginning with the rightmost, least significant digit (LSD). Intermediate results (partial products) are placed one atop the other, offset by one digit to align digits of the same weight. The final product is determined by summation of all the partial products. Although most people think of multiplication only in base 10, this technique applies equally to any base, including binary. Figure 1.1 shows the data flow for the basic multiplication technique just described. Each black dot represents a single digit.

Figure 1.1: Basic multiplication

Here, we assume that the MSB represents the sign of the number. The operation of multiplication is rather simple in digital electronics. It has its origin in the classical algorithm for the product of two binary numbers. This algorithm uses addition and shift-left operations to calculate the product of two numbers. Based upon the above procedure, we can deduce an algorithm for any kind of multiplication, which is shown in Figure 1.2. The sign of the product can be determined at the initial stage, or after the whole result is obtained, when the MSB of the result gives the sign of the product.

Figure 1.2: Signed multiplication algorithm

• BINARY MULTIPLICATION

In the binary number system the digits, called bits, are limited to the set {0, 1}. The result of multiplying any binary number by a single binary bit is either 0 or the original number. This makes forming the intermediate partial products simple and efficient. Summing these partial products is the time-consuming task for binary multipliers. One logical approach is to form the partial products one at a time and sum them as they are generated. Often implemented in software on processors that do not have a hardware multiplier, this technique works fine but is slow, because at least one machine cycle is required to sum each additional partial product. For applications where this approach does not provide enough performance, multipliers can be implemented directly in hardware. The two main categories of binary multiplication are signed and unsigned. Digit multiplication is a series of bit shifts and bit additions in which the two numbers, the multiplicand and the multiplier, are combined into the result. Considering the bit representations of the multiplicand x = x_{n-1} ... x_1 x_0 and the multiplier y = y_{n-1} ... y_1 y_0, up to n shifted copies of the multiplicand are added to form the product in unsigned multiplication. The entire process consists of three steps: partial product generation, partial product reduction and final addition.

Fig 1.3: Multiplication calculations by hand

The bold italic digits are the sign extension bits of the partial products. The first operand is called the multiplicand and the second the multiplier. The intermediate products are called partial products and the final result is called the product. The multiplication process, when this method is directly mapped to hardware, is shown in Figure 1.2. As can be seen in the figures, the multiplication operation in hardware consists of partial product (PP) generation, PP reduction and final addition steps. The two rows before the product are called the sum and carry bits. The method takes one multiplier bit at a time from right to left, multiplies the multiplicand by that single bit, and shifts the intermediate product one position to the left of the earlier intermediate products. All the bits of the partial products in each column are added to obtain two bits: sum and carry. Finally, the sum and carry bits in each column have to be summed. Similarly, for the multiplication of an n-bit
multiplicand and an m-bit multiplier, a product n + m bits long and m partial products are generated. The method shown in Figure 1.3 is also called a non-Booth encoding scheme.

Fig 1.4: Multiplication operation in hardware

4. MULTIPLIER-ACCUMULATOR (MAC) UNIT

In the majority of digital signal processing (DSP) applications the critical operations are multiplication and accumulation. Real-time signal processing requires a high-speed, high-throughput multiplier-accumulator (MAC) unit that consumes low power, which is always a key to achieving a high-performance digital signal processing system. The purpose of this work is the design and implementation of a low-power MAC unit with a block enabling technique to save power. Firstly, a 1-bit MAC unit is designed, with appropriate geometries that give optimized power, area and delay. The delay in the pipeline stages of the MAC unit is estimated, and based on it a control unit is designed to control the data flow between the MAC blocks for low power. Similarly, the N-bit MAC unit is designed and controlled for low power using control logic that enables the pipelined stages at the appropriate time. The adder cell designed has the advantages of high operational speed, small gate count and low power.

In general, a multiplier uses Booth's algorithm and an array of full adders (FAs), or a Wallace tree instead of the array of FAs; i.e., the multiplier mainly consists of three parts: a Booth encoder, a tree to compress the partial products such as a Wallace tree, and a final adder. Because the Wallace tree adds the partial products from the encoder as much in parallel as possible, its operation time is proportional to the logarithm of the number of inputs. It uses the fact that counting the number of 1's among the inputs produces a smaller number of outputs. In real implementations, many (3:2) or (7:3) counters are used to reduce the number of outputs in each pipeline step. The most effective way to increase the speed of a multiplier is to reduce the number of partial products, because multiplication is followed by a series of additions for the partial products. To reduce the number of calculation steps for the partial products, the modified Booth algorithm (MBA) has mostly been applied, while the Wallace tree has taken the role of increasing the speed of adding the partial products. To increase the speed of the MBA, many parallel multiplication architectures have been researched. Among them, architectures based on the Baugh–Wooley algorithm (BWA) have been developed and applied to various digital filtering calculations.

The N-bit 2's complement binary number can be expressed as

... (1)

If (1) is expressed in base-4 redundant signed-digit form in order to apply the radix-4 Booth's algorithm, it becomes

... (2)

... (3)

If (2) is used, multiplication can be expressed as

... (4)

If these equations are used, the aforementioned multiplication-accumulation result can be expressed as

... (5)

Each of the two terms on the right-hand side of (5) is calculated independently and the final result is produced by adding the two results. The MAC architecture implemented by (5) is called the standard design [6].

If radix-4 Booth encoding is used, the number of partial products is reduced to half, which shortens the partial product addition step. In addition, signed multiplication based on 2's complement numbers is also possible. For these reasons, most multipliers in current use adopt Booth encoding.
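
For reference, the five numbered relations of the MAC derivation can be written in the standard textbook form for radix-4 modified Booth recoding. This is a sketch under assumptions rather than the authors' exact notation: X and Y are taken to be N-bit two's complement operands with N even, x_{-1} is defined as 0, d_i denotes the recoded digits, and Z is an accumulated value of 2N bits with bits z_j (the symbol Z, its bit width, and its treatment as a non-negative value are assumptions).

```latex
\begin{align}
X &= -2^{N-1} x_{N-1} + \sum_{i=0}^{N-2} x_i 2^{i}, \qquad x_i \in \{0,1\} \tag{1}\\
X &= \sum_{i=0}^{N/2-1} d_i 4^{i} \tag{2}\\
d_i &= -2 x_{2i+1} + x_{2i} + x_{2i-1}, \qquad x_{-1} = 0 \tag{3}\\
X \times Y &= \sum_{i=0}^{N/2-1} d_i 2^{2i} Y \tag{4}\\
P = X \times Y + Z &= \sum_{i=0}^{N/2-1} d_i 2^{2i} Y + \sum_{j=0}^{2N-1} z_j 2^{j} \tag{5}
\end{align}
```

Each digit d_i lies in {-2, -1, 0, +1, +2}, so only N/2 partial products d_i Y are needed, which is the halving of partial products that the text attributes to Booth encoding.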

Fig 1.5: Simple multiplier and accumulator architecture

5. HIGH-SPEED BOOTH ENCODED PARALLEL MULTIPLIER DESIGN

Fast multipliers are essential parts of digital signal processing systems. The speed of the multiply operation is of great importance in digital signal processing as well as in today's general-purpose processors, especially since media processing took off. In the past, multiplication was generally implemented via a sequence of addition, subtraction, and shift operations. Multiplication can be considered as a series of repeated additions. The number to be added is the multiplicand, the number of times that it is added is the multiplier, and the result is the product.

6. MODIFIED BOOTH ENCODER

In order to achieve high-speed multiplication, multiplication algorithms using parallel counters, such as the modified Booth algorithm, have been proposed, and some multipliers based on these algorithms have been implemented for practical use. This type of multiplier operates much faster than an array multiplier for longer operands because its computation time is proportional to the logarithm of the word length of the operands.

Booth multiplication is a technique that allows for smaller, faster multiplication circuits by recoding the numbers that are multiplied. It is possible to reduce the number of partial products by half by using radix-4 Booth recoding. The basic idea is that, instead of shifting and adding for every column of the multiplier term and multiplying by 1 or 0, we take only every second column and multiply by ±1, ±2, or 0 to obtain the same result. The advantage of this method is the halving of the number of partial products. To Booth recode the multiplier term, we consider the bits in blocks of three, such that each block overlaps the previous block by one bit. Grouping starts from the LSB, and the first block uses only two bits of the multiplier. Figure 3 shows the grouping of bits from the multiplier term for use in modified Booth encoding.

Fig. 2.2: Grouping of bits from the multiplier term

Each block is decoded to generate the correct partial product. The encoding of the multiplier Y using the modified Booth algorithm generates the following five signed digits: -2, -1, 0, +1, +2. Each encoded digit in the multiplier performs a certain operation on the multiplicand, X, as illustrated in Table 1.

Figure 4 shows a computing example of Booth multiplication of the two numbers "2AC9" and "006A". The shadow denotes that the numbers in this part of the Booth multiplication are all zero, so this part of the computation can be neglected. Saving those computations can significantly reduce the power consumption caused by transient signals. According to the analysis of the multiplication shown in Figure 4, we propose the SPST-equipped modified Booth encoder, which is controlled by a detection unit. The detection unit has one of the two operands as its input and decides whether the Booth encoder would calculate redundant computations, as shown in Figure 9. The latches can, respectively, freeze the inputs of MUX-4 to MUX-7, or only those of MUX-6 to MUX-7, when PP4 to PP7 or PP6 to PP7 are zero, to reduce the transition power dissipation. Figure 10 shows the Booth partial product generation circuit. It includes AND/OR/EX-OR logic.

Fig. 2.3: Illustration of multiplication using modified Booth encoding

The PP generator generates five candidates for the partial products, i.e., {-2A, -A, 0, A, 2A}. These are then selected according to the Booth encoding results of the operand B. When the operand other than the Booth-encoded one has a small absolute value, there are opportunities to reduce the spurious power dissipated in the compression tree.
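
The block grouping and the five signed digits described above can be illustrated with a short behavioural model. The sketch below is written in Python rather than Verilog and is not the authors' design; the function names `booth_radix4_digits` and `booth_mac` and the `acc` parameter are hypothetical, and the operands are assumed to be two's complement integers with an even bit width.

```python
def booth_radix4_digits(y: int, n_bits: int) -> list[int]:
    """Recode an n_bits-wide two's complement integer into radix-4 Booth digits.

    Digit i is computed from the overlapping block (y[2i+1], y[2i], y[2i-1]),
    with y[-1] taken as 0, and lies in {-2, -1, 0, +1, +2}.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even for radix-4 recoding")
    if not -(1 << (n_bits - 1)) <= y < (1 << (n_bits - 1)):
        raise ValueError("y does not fit in n_bits two's complement")
    bits = y & ((1 << n_bits) - 1)  # two's complement bit pattern of y
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, n_bits, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_mac(x: int, y: int, acc: int = 0, n_bits: int = 16) -> int:
    """Return acc + x * y, adding one partial product per Booth digit of y."""
    total = acc
    for i, d in enumerate(booth_radix4_digits(y, n_bits)):
        # Candidate partial product is one of {-2x, -x, 0, x, 2x},
        # shifted left by two bit positions per digit.
        total += (d * x) << (2 * i)
    return total


if __name__ == "__main__":
    print(booth_radix4_digits(0x006A, 16))
    print(hex(booth_mac(0x2AC9, 0x006A)))
```

For the operand 006A used in the paper's example, the four upper digits come out as zero, which corresponds to the partial products PP4 to PP7 that the detection unit can suppress.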

Fig. 2.4: SPST-equipped modified Booth encoder

6.1 Partial product generator

Fig. 2.5: Booth partial product selector logic

6.2 Partial Product Generator

The first step of multiplication generates from A and X a set of bits whose weighted sum is the product P. For unsigned multiplication, the weight of the most significant bit of P is positive, while in 2's complement it is negative.

The second step of multiplication reduces the partial products from the preceding step to two numbers while preserving the weighted sum. The sought-after product P is the sum of those two numbers. The two numbers are added during the third step. The "Wallace tree" synthesis follows Dadda's algorithm, which ensures the minimum number of counters. If, on top of that, we require reduction to happen as late as (or as soon as) possible, then the solution is unique. The two binary numbers to be added during the third step may also be seen as one number in CSA notation (2 bits per digit).

Fig. 2.7: Booth single partial product selector logic

6.3 Modified Booth Encoder

Multiplication consists of three steps: 1) generating the partial products; 2) adding the generated partial products until only two rows remain; 3) computing the final multiplication result by adding the last two rows. The modified Booth algorithm reduces the number of partial products by half in the first step. We used a previously proposed modified Booth encoding (MBE) scheme. It is known as the most efficient Booth encoding and decoding scheme. Multiplying X by Y using the modified Booth algorithm starts by grouping Y in three-bit blocks and encoding each block into one of {-2, -1, 0, 1, 2}. Table I shows the rules for generating the encoded signals in the MBE scheme and Fig. 1(a) shows the corresponding logic diagram. The Booth decoder generates the partial products using the encoded signals, as shown in Fig. 1.

Fig. 2.8: Booth encoder

Fig. 2.9: Booth decoder
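
The three steps (generate partial products, reduce them to two rows while preserving the weighted sum, then add the two rows) can be sketched with (3:2) carry-save compression. This is an illustrative Python model for non-negative operands only, not the circuit of the paper: the names `carry_save_add`, `reduce_to_two_rows` and `multiply_unsigned` are hypothetical, plain AND-gated partial products are used instead of Booth-encoded ones, and the rows are compressed in a simple loop, whereas a hardware Wallace or Dadda tree would arrange the same compressors in parallel levels.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """(3:2) compression of three rows into a sum row and a carry row.

    The weighted sum is preserved: a + b + c == sum_row + carry_row.
    """
    sum_row = a ^ b ^ c                              # per-column sum bit
    carry_row = ((a & b) | (a & c) | (b & c)) << 1   # per-column carry, one column up
    return sum_row, carry_row


def reduce_to_two_rows(rows: list[int]) -> tuple[int, int]:
    """Step 2: compress the rows until only two remain (CSA notation)."""
    rows = list(rows)
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        rows.extend(carry_save_add(a, b, c))
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply_unsigned(x: int, y: int, n_bits: int = 8) -> int:
    """Multiply two non-negative integers; y must fit in n_bits."""
    # Step 1: one shifted copy of x for every set bit of y.
    partial_products = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    # Step 2: carry-save reduction to a sum row and a carry row.
    sum_row, carry_row = reduce_to_two_rows(partial_products)
    # Step 3: final carry-propagate addition of the last two rows.
    return sum_row + carry_row


if __name__ == "__main__":
    print(carry_save_add(5, 3, 6))
    print(multiply_unsigned(13, 11))
```

No carry propagates along a row during step 2, which is why the reduction is fast; the only carry-propagate addition is the final one in step 3.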

The figure shows the generated partial products and sign extension scheme of the 8-bit modified Booth multiplier. The partial products generated by the modified Booth algorithm are added in parallel using the Wallace tree until only two rows remain. The final multiplication result is generated by adding the last two rows. A carry propagation adder is usually used in this step.

The figure shows a 16-bit adder/subtractor design example adopting the proposed SPST. In this example, the 16-bit adder/subtractor is divided into a most significant part (MSP) and a least significant part (LSP) between the eighth and the ninth bits. Latches implemented by simple AND gates are used to control the input data of the MSP. When the MSP is necessary, the input data of the MSP remain unchanged. However, when the MSP is negligible, the input data of the MSP become zeros to avoid glitching power consumption. The two operands of the MSP enter the detection-logic unit, except the adder/subtractor, so that the detection-logic unit can decide whether to turn off the MSP or not. Based on the derived Boolean equations (1) to (8), the detection-logic unit of the SPST is shown in Fig. 6(a); it determines whether the input data of the MSP should be latched or not. Moreover, we propose a novel glitch-diminishing technique that adds three 1-bit registers to control the assertion of the close, sign, and carr-ctrl signals, further decreasing the transient signals occurring in the cascaded circuits that are usually adopted in VLSI architectures designed for multimedia/DSP applications. The timing diagram is shown in Fig. 6(b). A certain amount of delay is used to assert the close, sign, and carr-ctrl signals after the period of data transition, which is achieved by controlling the three 1-bit registers at the outputs of the detection-logic unit.

Hence, the transients of the detection-logic unit can be filtered out; thus, the data latches shown in the figure can prevent the glitch signals from flowing into the MSP at tiny cost. The data transient time and the earliest required time of all the inputs are also illustrated. The delay should be set within the range shown as the shaded area in the figure, to filter out the glitch signals as well as to keep the computation results correct. Based on Figs. 5 and 6, the timing issue of the SPST is analyzed as follows.

7. RESULTS AND DISCUSSION

8. SYNTHESIS RESULT

The developed MAC design is simulated and its functionality verified. Once functional verification is done, the RTL model is taken to the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. This MAC design can be synthesized on the Spartan 3E family.

In the Spartan 3E family, many different devices are available in the Xilinx ISE tool. To synthesize this design, the device "XC3S500E" was chosen, with package "FG320" and device speed grade "-4". The MAC design was synthesized and its results were analyzed as follows.

Device utilization summary:

In the timing summary, the period and frequency reported at synthesis are approximate. After place and route is complete, we get the exact timing summary. The maximum operating frequency of this synthesized design is 86.987 MHz and the minimum period is 11.496 ns. Here, OFFSET IN is the minimum input arrival time before the clock and OFFSET OUT is the maximum output required time after the clock.

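
As a quick consistency check on the two timing figures, the maximum operating frequency is the reciprocal of the minimum clock period. The symbols f_max and T_min are notation introduced here for illustration and are not taken from the synthesis report.

```latex
f_{\max} = \frac{1}{T_{\min}} = \frac{1}{11.496\ \text{ns}} \approx 86.987\ \text{MHz}
```
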
RTL Schematic

The RTL (register transfer level) schematic can be viewed as a black box after the design is synthesized. It shows the inputs and outputs of the system. By double-clicking on the diagram we can see the gates, flip-flops and MUXes.

Figure 3.6: Schematic with basic inputs and output

9. SUMMARY

• The developed MAC design is modelled and simulated using the ModelSim tool.
• The simulation results are discussed by considering different cases.
• The RTL model is synthesized using the Xilinx tool for Spartan 3E, and the synthesis results are discussed with the help of the generated reports.

10. CONCLUSION

A 16x16 multiplier-accumulator (MAC) is presented in this work. A radix-4 modified Booth multiplier circuit is used for the MAC architecture. Compared to other circuits, the Booth multiplier has the highest operational speed and a lower hardware count. The basic building blocks of the MAC unit are identified and each block is analyzed for its performance. Power and delay are calculated for the blocks. A 1-bit MAC unit is designed with an enable to reduce the total power consumption based on the block enable technique. Using this block, the N-bit MAC unit is constructed and its total power consumption is calculated with the power reduction techniques adopted in this work. The MAC unit designed in this work can be used in filter realizations for high-speed DSP applications. Table 12 summarizes the results obtained.

REFERENCES

[1] T. Stockhammer, M. Hannuksela and T. Wiegand, "H.264/AVC in wireless environments," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 657-673, Jul. 2003.
[2] A. Bellaouar and M. I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems. Norwell, MA: Kluwer, 1995.
[3] A. P. Chandrakasan and R. W. Brodersen, "Minimizing power consumption in digital CMOS circuits," Proc. IEEE, vol. 83, no. 4, pp. 498-523, Apr. 1995.
[4] R. Schafer, T. Wiegand and H. Schwarz, "The emerging H.264/AVC standard," EBU. Available: http://www.ebu.ch/trev_293-schaefer.pdf
[5] K. Choi, R. Soma and M. Pedram, "Dynamic voltage and frequency scaling based on workload decomposition," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2004, pp. 174-179.
[6] K. K. Parhi, "Approaches to low-power implementations of DSP systems," IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 48, no. 10, pp. 1214-1224, Oct. 2001.
[7] O. Chen, R. Sheen and S. Wang, "A low-power adder operating on effective dynamic data ranges," IEEE Trans. VLSI Syst., vol. 10, no. 4, pp. 435-453, Aug. 2002.
[8] J. Choi, J. Jeon and K. Choi, "Power minimization of functional units by partially guarded computation," in Proc. IEEE Int. Symp. Low Power Electron. Des., 2000, pp. 131-136.
[9] L. Benini, G. D. Micheli, A. Macii, E. Macii, M. Poncino and R. Scarsi, "Glitch power minimization by selective gate freezing," IEEE Trans. VLSI Syst., vol. 8, no. 3, pp. 287-298, Jun. 2000.
[10] O. Chen, S. Wang and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Trans. VLSI Syst., vol. 11, no. 3, pp. 418-433, Jun. 2003.
