An Efficient MAC Unit With Low Area Consumption


IEEE INDICON 2015 1570186355

Gitika Bhatia1, Karanbir Singh Bhatia2, Osheen Chauhan3, Soumya Chourasia4, Pradeep Kumar5
1,2,3,4,5 AMITY School of Engineering and Technology, AMITY University Uttar Pradesh, Noida, India
1 [email protected], 2 [email protected], [email protected], [email protected], 5 [email protected]
Abstract- In this paper we propose a new architecture for an efficient MAC (Multiplier Accumulator) unit with low area consumption, which includes Vedic Square as an alternate component in the MAC unit. Vedic Square is based on the Duplex property of the Urdhva Tiryagbhyam sutra. Using the proposed architecture, the number of logic gates is reduced by 50% at the basic level of 2*2 bit and by 12.64% for 16*16 bit square computation. Hence, speed is increased by means of decreased toggling of gates. This also reduces the total area of the MAC unit by 11.23% as compared to the MAC with Vedic Multiplier and Kogge Stone Adder (when run for computations of squares). The overall performance of a MAC unit is determined by three parameters, namely speed, power and area. The proposed high speed, low area MAC unit would contribute immensely to future DSP systems. Modelsim 6.5a PE and Precision synthesis, both by Mentor Graphics, are used for implementation.

Keywords- DSP Processor; Multiplier Accumulator Unit (MAC); Vedic multiplier; Sparse Kogge Stone Adder; Vedic square

I. INTRODUCTION

Vedic Maths was formulated by Swami Bharati Krishna Tirthaji Maharaja from the ancient Indian scriptures (Vedas) after extensive research [1]. Vedic Mathematics is based on sixteen principles or 'sutras'. Integrating multiplication with Vedic Maths can benefit various domains of engineering, such as Digital Signal Processing. Earlier, DSP applications were implemented using bit-slice processors, which were part of a series from TRW including the TDC1008 and TDC1010, providing the requisite multiply–accumulate (MAC) function. Bell Labs introduced the first generation in February, 1980. The second generation of DSPs, for example the Motorola 56000, required about 21 ns for a MAC. Now, the fourth generation is better in terms of clock speeds, and a 3 ns MAC has become possible.

With time, DSP processors are being incorporated into handheld, wireless and mobile devices; therefore, power is a concern of growing importance. A high performance Digital Signal Processing system can be achieved by using a high speed and high throughput MAC. Also, almost 70% of the functions in the processors are multiplication and addition based. Thus, the MAC (Multiplier Accumulator Unit), which is a subpart of the ALU (Arithmetic Logic Unit) and the main component of digital signal processors, comprises a multiplier, an adder and an accumulator. Addition was found to be the most frequently applied operation among real-time digital signal processing benchmarks by Chen et al. [2]. Multiplication is the most frequently used CIAF (Computation Intensive Arithmetic Function) and dominates the delay of the processor. Hence, an overall increase in speed can be obtained by incorporating faster adder and multiplier circuits.

The simplest yet slowest adder is the Ripple Carry Adder (RCA), with O(n) area and O(n) delay [3-4]; here 'n' is the number of bits of the operand. Carry Look-Ahead Adders (CLA) [5-6] suffer from irregular layout, even though they have O(n.log (n)) area and O(log (n)) delay. In contrast, Carry Skip Adders (CSA) [7-8], carry increment adders [9-10] and carry select adders [11-12] have O(n) area and O(n^((l+2)/(l+1))) delay but provide a good compromise between delay and area, along with a simple and regular layout. Carry-tree adders, which are parallel-prefix adders, precompute the generate and propagate signals. The fundamental carry operator (fco) [13] is thereafter used to combine these signals in different ways. When Sparse Kogge Stone, ripple carry and carry save adders are compared, the Sparse Kogge Stone Adder performs addition with the least delay and lowest power consumption, at a reduced cost and higher speed [13].

Apart from adders, high-speed multipliers are also desired, as the performance of DSP systems is limited by how fast they execute multiplication, since multiplication dominates the execution time of a majority of DSP algorithms [14-16]. Designing a MAC unit that addresses both delay and power has therefore become the need of the hour. The entire process of fetching inputs from a memory location, feeding them to the multiplier block, passing the multiplication results to the adder, and finally storing the accumulated results in a memory location is to be achieved in a single clock cycle [17]. When we discuss matters of delay in multipliers, the Urdhva Tiryagbhyam sutra which is used

978-1-4673-6540-6/15/$31.00 ©2015 IEEE


in the Vedic Multiplier is the most efficient algorithm, helping to achieve the minimum delay for multiplication. Thus, the existing architecture, as shown in Fig. 1, uses the combination of Vedic multiplier and Sparse Kogge Stone Adder to implement a high speed and low power MAC unit [17].

Moreover, when we exploit the Duplex Property of this sutra, we also achieve a lower area. Putting this sutra into practice, we can address various area and delay issues in the VLSI industry. The combination of the Vedic Multiplier (based on the Urdhva Tiryagbhyam sutra) with the Kogge Stone adder is one of the best possible combinations for implementing a high speed and power efficient MAC unit [17]. However, the need of the hour is a MAC unit with the best possible statistics for all three aspects, namely area, speed and power. The proposed MAC unit will potentially provide a solution to the problem of area efficiency at high speed processing with reasonable power consumption, for computation of squares.

Fig. 1. MAC Architecture using Vedic Multiplier and Sparse Kogge Stone Adder

The main contribution of this paper is to propose a new architecture for a MAC unit with low area and high speed. We have achieved this by using Vedic Square [18] in the existing architecture. The principle of Vedic Square is based on the 'Duplex D' property of Urdhva Tiryagbhyam, which involves adding twice the product of each outermost pair of bits of an n-bit number. With an odd number of bits, the one bit that is left over contributes its square to the result. Thus, computations decrease with Vedic square in terms of a reduction in multiplication operations. The proposed MAC architecture uses 11.23% less area than the MAC unit using Vedic multiplier and Kogge Stone adder; moreover, with the 11.15% reduction in the total number of gates, it is also bound to be faster. The improvement in power-delay product of MAC units is an important aspect, as they will be used in high speed DSP applications [17]. Therefore, our proposed architecture provides an improvement in area, thereby complementing the speed and power aspects. Also, if the Vedic square is used in cryptography and security applications [19], it will possibly lead to an enormous reduction of used hardware, as it is very effective in matters of silicon area and speed of computation.

This paper is organized as follows: Section I provides an introduction to the paper's objective along with a brief background to the existing architectures of MAC and other components used. The major contributions of the paper are also discussed. Section II describes the proposed architecture. Section III compiles all the results of simulation and synthesis. Section IV concludes our work.

II. PROPOSED ARCHITECTURE

The proposed architecture, which uses the Vedic Square concept to reduce the area and improve the speed of the multiplication process of the MAC unit, is shown in Fig. 2. The Vedic square is an extremely fast way of calculating the square of large numbers [2]. It makes use of the duplex method of the Urdhva Tiryagbhyam sutra. The duplex method is described as follows:

Multiplicand:   X3 X2 X1 X0
Multiplier:     X3 X2 X1 X0
-----------------------------------
                H G F E D C B A
-----------------------------------
Product:        P7 P6 P5 P4 P3 P2 P1 P0
-----------------------------------

Example: Parallel Computation

a. To calculate square of 2345 using Vedic Square

D0 = (5)^2 = 25 = A
D1 = 2*4*5 = 40 = B
D2 = 2*3*5 + (4)^2 = 46 = C
D3 = 2*2*5 + 2*3*4 = 44 = D
D4 = 2*2*4 + (3)^2 = 25 = E
D5 = 2*2*3 = 12 = F
D6 = (2)^2 = 4 = G

Total No. of computations using Vedic square: 19
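The duplex computation in example (a) can be sketched in software as follows. This is only an illustrative Python model operating on decimal digits, not the hardware description used in the paper; the function names `duplex` and `vedic_square` are hypothetical, and weighting each term by a power of ten is assumed as the way the duplex values D0 to D6 are merged, with carries, into the final square.

```python
def duplex(digits):
    """Duplex of a digit group: twice the product of each outer pair,
    plus the square of the middle digit when the group length is odd."""
    n = len(digits)
    total = 0
    for i in range(n // 2):
        total += 2 * digits[i] * digits[n - 1 - i]
    if n % 2 == 1:
        total += digits[n // 2] ** 2
    return total


def vedic_square(number):
    """Return the duplex terms D0..D(2n-2) and the square of `number`."""
    digits = [int(ch) for ch in str(number)]  # most significant digit first
    n = len(digits)
    terms = []
    for k in range(2 * n - 1):
        # D_k uses the digits whose positions (counted from the right) pair up to k.
        lo = max(0, k - (n - 1))
        hi = min(k, n - 1)
        group = digits[n - 1 - hi : n - lo]
        terms.append(duplex(group))
    # Assumed merging step: weight D_k by 10^k so that carries propagate.
    result = sum(d * 10 ** k for k, d in enumerate(terms))
    return terms, result


terms, result = vedic_square(2345)
print(terms, result)  # [25, 40, 46, 44, 25, 12, 4] 5499025
```

The printed terms match D0 to D6 of the example, and their weighted sum equals 2345 * 2345.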

b. To calculate square of 2345 using Vedic Multiplier:

Step 1: (5)^2
Step 2: 4*5 + 5*4
Step 3: 3*5 + 5*3 + (4)^2
Step 4: 2*5 + 5*2 + 3*4 + 4*3
Step 5: 2*4 + 4*2 + (3)^2
Step 6: 2*3 + 3*2
Step 7: (2)^2

Total No. of Operations using Vedic Multiplier: 25

Fig. 3. RTL Schematic for 2*2 Vedic Multiplier

Computations decrease with Vedic square in terms of a reduction in the total number of operations. Thus, the efficiency of our MAC unit increases. On the basis of these calculations, we propose the following architecture for an efficient MAC unit, as shown in Fig. 2. This leads to a reduction of area and gates, and an increase in speed.
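The two totals quoted for the 2345 example (19 for the Vedic square, 25 for the Vedic multiplier) can be reproduced with a small counting sketch. The counting convention below is an assumption inferred from the worked example rather than stated in the paper: every multiplication sign and every squaring counts as one operation, and every addition between terms of a step counts as one operation. The helper names are hypothetical.

```python
def group_sizes(n):
    """Digits taking part in each of the 2n-1 steps for an n-digit operand."""
    return [min(k, 2 * n - 2 - k) + 1 for k in range(2 * n - 1)]


def square_ops(m):
    """Operations for one duplex term over m digits (assumed convention)."""
    pairs, middle = divmod(m, 2)
    terms = pairs + middle            # "2*a*b" terms plus an optional square
    multiplications = 2 * pairs + middle
    additions = terms - 1
    return multiplications + additions


def multiplier_ops(m):
    """Operations for one crosswise step over m digits (assumed convention)."""
    return m + (m - 1)                # m products, m-1 additions


def total_ops(n):
    sizes = group_sizes(n)
    return (sum(square_ops(m) for m in sizes),
            sum(multiplier_ops(m) for m in sizes))


print(total_ops(4))  # (19, 25)
```

Under this convention the saving comes from folding each symmetric pair of products such as 4*5 + 5*4 into a single doubled product 2*4*5, which removes one addition per pair.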

Fig. 4. RTL Schematic for 2*2 Vedic Square

As seen in Fig. 3 and Fig. 4, the Vedic square uses 50% fewer logic gates, which will result in reduced area and increased speed due to decreased toggling of gates. The 16*16 bit Vedic Square is also simulated and synthesized; the RTL schematic for the same is shown in Fig. 5.
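Why a 2*2 squarer needs less logic than a 2*2 multiplier can be illustrated with a bit-level sketch. The gate structures below are assumptions chosen for illustration (a textbook 2*2 Urdhva Tiryagbhyam multiplier built from AND gates and two half adders, and a squarer obtained by tying both inputs together and simplifying); they are not taken from the synthesized schematics in Fig. 3 and Fig. 4, so their gate counts need not match the 50% figure reported there.

```python
def vedic_mult_2x2(a1, a0, b1, b0):
    """Assumed 2x2 Urdhva Tiryagbhyam multiplier: 4 AND gates, 2 half adders."""
    p0 = a0 & b0          # vertical product of the LSBs
    x = a1 & b0           # crosswise partial products
    y = a0 & b1
    p1 = x ^ y            # half adder 1: sum
    c1 = x & y            # half adder 1: carry
    z = a1 & b1           # vertical product of the MSBs
    p2 = z ^ c1           # half adder 2: sum
    p3 = z & c1           # half adder 2: carry
    return p3, p2, p1, p0


def vedic_square_2(a1, a0):
    """Simplified 2-bit squarer: the crosswise terms are equal, so P1 is always 0."""
    p0 = a0
    p1 = 0
    p2 = a1 & (a0 ^ 1)    # a1 AND NOT a0
    p3 = a1 & a0
    return p3, p2, p1, p0


def bits_to_int(bits):
    value = 0
    for b in bits:
        value = value * 2 + b
    return value


for a in range(4):
    a1, a0 = a >> 1, a & 1
    # The squarer agrees with the multiplier when both operands are equal.
    assert vedic_square_2(a1, a0) == vedic_mult_2x2(a1, a0, a1, a0)
    print(a, bits_to_int(vedic_square_2(a1, a0)))
```

In this sketch the two crosswise partial products are identical when squaring, so their sum is simply a carry into the next column and the first half adder disappears.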

Fig. 2. Proposed Architecture

III. SIMULATION RESULTS

Precision synthesis by Mentor Graphics is used to synthesize the code. First, the 2*2 bit Vedic multiplier and the 2*2 bit Vedic Square are synthesized independently; the RTL schematics are shown in Fig. 3 and Fig. 4, respectively.

Fig. 5. RTL Schematic for 16*16 Vedic Square

Simulation of the MAC unit with 16*16 bit Vedic square and Kogge Stone Adder is done using Modelsim 6.5 PE. The simulation result for the same is shown in Fig. 6.

Table 1. Area Report Comparison for 16*16 Vedic Square and 16*16 Vedic Multiplier

Resource                        Vedic Square (Used)    Vedic Multiplier (Used)
IOs                             64                     64
LUTs                            727                    834
No. of gates                    747                    855
No. of accumulated instances    791                    898

Area reports for both the MAC unit using Vedic multiplier and the MAC unit using Vedic square are obtained after synthesis and are presented as follows:

Table 2. Area Report for MAC using Vedic Multiplier

Resource                        Used    Available    Utilization (%)
IOs                             68      240          28.33
LUTs                            946     10944        8.64
No. of gates                    968     -            -
No. of accumulated instances    1078    -            -

Table 3. Area Report for MAC using Vedic Square

Resource                        Used    Available    Utilization (%)
IOs                             68      240          28.33
LUTs                            839     10944        7.67
No. of gates                    860     -            -
No. of accumulated instances    971     -            -
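The percentage reductions quoted in the text can be cross-checked against the three area reports with a short calculation. This is a sketch only: the mapping of the 11.23% figure to the LUT utilization percentages (8.64 versus 7.67) is an inference from the numbers, not something stated in the paper, and `reduction` is a hypothetical helper.

```python
def reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (Vedic multiplier, Vedic square) pairs copied from Tables 1, 2 and 3.
gates_16x16 = (855, 747)
gates_mac = (968, 860)
lut_utilization_mac = (8.64, 7.67)

print(round(reduction(*gates_16x16), 2))          # quoted as 12.64% in the text
print(round(reduction(*gates_mac), 2))            # quoted as 11.15% in the text
print(round(reduction(*lut_utilization_mac), 2))  # quoted as 11.23% in the text
```

The computed values agree with the quoted figures to within rounding in the second decimal place.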

Fig. 6. Simulation Result for MAC unit using 16*16 Vedic Square and 32 bit Kogge Stone Adder

Table 1 shows the comparison of area reports of both the 16*16 Vedic square and the Vedic Multiplier after synthesis.

IV. CONCLUSION

From the RTL schematics shown in Fig. 3 and Fig. 4, we can see that the concept of Vedic square used in the MAC reduces the number of gates and the area by 50% at the initial level of 2*2 bit computation. From Table 1 we can deduce that, when the 16*16 bit Vedic square unit is simulated and synthesized, the number of gates is reduced by 12.64% as compared to the 16*16 bit Vedic Multiplier. From the simulation result and area reports shown in Fig. 6, Table 2 and Table 3 respectively, it is evident that the MAC architecture with Vedic square reduces the area by 11.23% as compared to the MAC using Vedic multiplier. VLSI industry is based upon the balance of area, speed

and power. At this stage, we have successfully improved our area by incorporating the concept of Vedic square, which reduces the number of gates by 11.15% and thereby leads to an increase in speed.

V. REFERENCES

[1] Harpreet Singh Dhillon, Abhijit Mitra, "A Reduced-Bit Multiplication Algorithm for Digital Arithmetic", International Journal of Computational and Mathematical Sciences, 2008, pp. 719-723
[2] D. C. Chen, L. M. Guerra, E. H. Ng, M. Potkonjak, D. P. Schultz, and J. M. Rabaey, "An integrated system for rapid prototyping of high performance algorithm specific data paths", International Conference on Application Specific Array Processors, Aug 1992, pp. 134-148
[3] Aminul Islam, M.W. Akram, S.D. Pable, Mohd. Hasan, "Design and Analysis of Robust Dual Threshold CMOS Full Adder Circuit in 32nm Technology", International Conference on Advances in Recent Technologies in Communication and Computing, 2010, pp. 418-420
[4] Deepa Sinha, Tripti Sharma, K.G.Sharma, Prof B.P Singh, "Design and Analysis of low Power 1-bit Full Adder Cell", International Conference on Electronics Computer Technology (ICECT), Vol. 2, 2011, pp. 303-305
[5] Nabihah Ahmad, Rezaul Hasan, "A new Design of XOR-XNOR gates for Low Power application", International Conference on Electronic Devices, Systems and Applications (ICEDSA), 2011, pp. 45-49
[6] R.Uma, "4-Bit Fast Adder Design: Topology and Layout with Self-Resetting Logic for Low Power VLSI Circuits", International Journal of Advanced Engineering Sciences and Technology, Vol No. 7, Issue No. 2, 2011, pp. 29-37
[7] David J. Willingham and Izzet Kale, "A Ternary Adiabatic Logic (TAL) Implementation of a Four-Trit Full-Adder", NORCHIP, 2011, pp. 1-4
[8] Padma Devi, Ashima Girdher and Balwinder Singh, "Improved Carry Select Adder with Reduced Area and Low Power Consumption", International Journal of Computer Application, Vol 3. Issue.4, June 2010, pp. 14-18
[9] B.Ramkumar, Harish M Kittur, P.Mahesh Kannan, "ASIC Implementation of Modified Faster Carry Save Adder", European Journal of Scientific Research, Vol.42, No.1, 2010, pp. 53-58
[10] Y. Sunil Gavaskar Reddy and V.V.G.S.Rajendra Prasad, "Power Comparison of CMOS and Adiabatic Full Adder Circuits", International Journal of VLSI design & Communication Systems (VLSICS) Vol.2, No.3, September 2011, URL: http://arxiv.org/ftp/arxiv/papers/1110/1110.1549.pdf
[11] Mariano Aguirre-Hernandez and Monico Linares-Aranda, "CMOS Full-Adders for Energy-Efficient Arithmetic Applications", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 4, April 2011, pp. 718-721
[12] Ning Zhu, Wang Ling Goh, Weija Zhang, Kiat Seng Yeo, and Zhi Hui Kong, "Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 8, August 2010, pp. 1225-1229
[13] P.Annapurna Bai, M.Vijaya Laxmi, "Design of 128- bit Kogge-Stone Low Power Parallel Prefix VLSI Adder for High Speed Arithmetic Circuits", International Journal of Engineering and Advanced Technology (IJEAT), Volume-2, Issue-6, August 2013, pp. 415-418
[14] Pravinkumar Parate, "ASIC Implementation of 4 Bit Multipliers", International Conference on Emerging Trends in Engineering and Technology ICETET, 2008, pp. 408-413
[15] C.Senthilpari, Ajay Kumar Singh and K. Diwadkar, "Low power and high speed 8x8 bit Multiplier Using Non-clocked Pass Transistor Logic", International Conference on Intelligent and Advanced Systems, 2007, pp. 1374-1378
[16] Kiat-seng Yeo and Kaushik Roy, "Low-voltage, low power VLSI sub system", McGraw-Hill Publication, USA, 2005
[17] Avisek Sen, Partha Mitra, Debarshi Datta, "Low Power MAC Unit for DSP Processor", International Journal of Recent Technology and Engineering (IJRTE) Vol. 1, Issue-6, January 2013, pp. 93-95
[18] Amandeep Singh, "Design and Hardware Realization of 16 bit Vedic Arithmetic Unit", June 2010, pp. 1-65
[19] Himanshu Thapliyal and M.B Srinivas, "An Efficient Method of Elliptic Curve Encryption Using Ancient Indian Vedic Mathematics", 48th Midwest Symposium on Circuits and Systems, Vol. 1, 2005, pp. 826-828
