An Efficient MAC Unit With Low Area Consumption
Gitika Bhatia1, Karanbir Singh Bhatia2, Osheen Chauhan3, Soumya Chourasia4, Pradeep Kumar5
1,2,3,4,5 AMITY School of Engineering and Technology, AMITY University Uttar Pradesh, Noida, India
1 [email protected], 2 [email protected], [email protected], [email protected], 5 [email protected]
Abstract- In this paper, we propose a new architecture for an efficient MAC (Multiplier Accumulator) unit with low area consumption, which includes Vedic Square as an alternate component in the MAC unit. Vedic Square is based on the Duplex property of Urdhva Tiryagbhyam. Using the proposed architecture, the number of logic gates is reduced by 50% at the basic 2*2-bit level and by 12.64% for 16*16-bit square computation. Hence, speed is increased by means of decreased toggling of gates. This also reduces the total area of the MAC unit by 11.23% as compared to the MAC with Vedic Multiplier and Kogge Stone Adder (when run for computations of square). The overall performance of a MAC unit is determined by three parameters, namely speed, power and area. The proposed high-speed, low-area MAC unit architecture would contribute immensely to future DSP systems. Mentor Graphics ModelSim 6.5a PE and Precision Synthesis by Mentor Graphics are used for implementation.

Keywords- DSP Processor; Multiplier Accumulator Unit (MAC); Vedic multiplier; Sparse Kogge Stone Adder; Vedic square

I. INTRODUCTION

Vedic Maths was formulated by Swami Bharati Krishna Tirthaji Maharaja from the ancient Indian scriptures (Vedas) after extensive research on the Vedas [1]. Vedic Mathematics is based on sixteen principles or 'sutras'. Thus, integration of multiplication with Vedic Maths can lead to wonders in various domains of engineering, such as Digital Signal Processing. Earlier, DSP applications were implemented using bit-slice processors together with parts from a TRW series, including the TDC1008 and TDC1010, which provided the requisite multiply-accumulate (MAC) function. Bell Labs introduced the first generation of DSPs in February 1980. Second-generation DSPs, for example the Motorola 56000, required about 21 ns for a MAC. Now, the fourth generation is better in terms of clock speeds, and a 3 ns MAC has become possible.

With time, DSP processors are being incorporated into handheld, wireless and mobile devices; therefore, power is an area of concern that is gaining importance. A high-performance Digital Signal Processing system can be achieved by using a high-speed and high-throughput MAC. Also, almost 70% of the functions in processors are multiplication and addition based. Thus, the MAC (Multiplier Accumulator Unit), which is a subpart of the ALU (Arithmetic Logic Unit) and the main component of digital signal processors, comprises a multiplier, an adder and an accumulator. Addition was found to be the most frequently applied operation among real-time digital signal processing benchmarks by Chen et al. [2]. Multiplication is the most frequently used CIAF (Computation Intensive Arithmetic Function), and dominates the delay of the processor. Hence, an overall increase in speed can be seen after incorporating faster adder and multiplier circuits.

The simplest yet slowest adder is the Ripple Carry Adder (RCA), with O(n) area and O(n) delay [3-4]; here 'n' is the number of bits of the operand. Carry Look-Ahead Adders (CLA) [5-6] suffer from irregular layout, even though they have O(n log n) area and O(log n) delay. In contrast, the Carry Skip Adder (CSA) [7-8], carry increment adder [9-10] and carry select adders [11-12] have O(n) area and O(n^(l+2/l+1)) delay but provide a good compromise between delay and area, along with a simple and regular layout. Carry-tree adders, which are parallel-prefix adders, precompute the generate and propagate signals. The fundamental carry operator (fco) [13] is thereafter used to combine these signals in different ways. When a comparison is carried out between the Sparse Kogge Stone, ripple carry and carry save adders, the Sparse Kogge Stone Adder performs addition with the least delay and lowest power consumption, at a reduced cost and higher speed [13].

Apart from adders, high-speed multipliers are also desired, as the performance of DSP systems is limited by their multiplication performance, because multiplication dominates the execution time of a majority of DSP algorithms [14-16]. Designing a MAC unit that addresses both delay and power issues has therefore become the need of the hour. The entire process of fetching inputs from a memory location, feeding them to the multiplier block, passing the multiplication results to the adder, and finally storing the accumulated results in a memory location is to be achieved in a single clock cycle [17]. When we discuss matters of delay in multipliers, the Urdhva Tiryagbhyam sutra which is used
Multiplicand:   X3 X2 X1 X0
Multiplier:     X3 X2 X1 X0
-----------------------------------
H G F E D C B A
-----------------------------------
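The layout above, with the same operand X3 X2 X1 X0 in both rows and an 8-bit result H..A, can be illustrated with a minimal Python sketch of column-wise squaring. It assumes the standard duplex formulation (twice the mirrored cross products, plus the square of the middle bit for odd-length groups); the helper names `duplex` and `square_4bit` are hypothetical and do not come from the paper, and the sketch models arithmetic only, not the gate-level circuit.

```python
def duplex(bits):
    """Duplex of a bit group: twice the sum of the mirrored pair products,
    plus the square of the middle bit when the group has odd length."""
    n = len(bits)
    total = 2 * sum(bits[i] * bits[n - 1 - i] for i in range(n // 2))
    if n % 2 == 1:
        total += bits[n // 2] ** 2
    return total


def square_4bit(x):
    """Square a 4-bit value column by column (hypothetical helper)."""
    b = [(x >> i) & 1 for i in range(4)]      # b[0] = X0 ... b[3] = X3
    result = 0
    for col in range(7):                      # column col has weight 2**col
        lo, hi = max(0, col - 3), min(3, col)
        # Integer addition absorbs the carries between columns.
        result += duplex(b[lo:hi + 1]) << col
    return result                             # fits in 8 bits (H..A)


# Exhaustive check over all 4-bit inputs.
assert all(square_4bit(x) == x * x for x in range(16))
```

In this formulation each cross product is formed once and doubled, so a 4-bit square needs 10 distinct bit products (4 squares and 6 cross terms) where a general 4*4 multiplier forms 16.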
b. To calculate the square of 2345 using the Vedic Multiplier:
Step 1: (5)^2
Step 7: (2)^2
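Only the first and last steps of this example survive in the text. As a hedged illustration, the sketch below computes the square of 2345 with the standard duplex method, which produces seven column terms beginning with 5^2 and ending with 2^2. Whether the intermediate terms match the paper's own Steps 2 to 6 is an assumption; the function names `duplex` and `duplex_square` are hypothetical.

```python
def duplex(digits):
    """Duplex of a digit group: twice the sum of the mirrored pair products,
    plus the square of the middle digit when the group has odd length."""
    n = len(digits)
    total = 2 * sum(digits[i] * digits[n - 1 - i] for i in range(n // 2))
    if n % 2 == 1:
        total += digits[n // 2] ** 2
    return total


def duplex_square(number):
    """Return (column duplexes, square) for a non-negative integer."""
    d = [int(ch) for ch in reversed(str(number))]  # least significant digit first
    n = len(d)
    columns = []
    for col in range(2 * n - 1):
        lo, hi = max(0, col - (n - 1)), min(n - 1, col)
        columns.append(duplex(d[lo:hi + 1]))
    # Weight each column by its power of ten; the integer sum propagates carries.
    square = sum(value * 10 ** col for col, value in enumerate(columns))
    return columns, square


columns, square = duplex_square(2345)
print(columns)  # [25, 40, 46, 44, 25, 12, 4]
print(square)   # 5499025
```

The first and last entries, 25 and 4, correspond to (5)^2 and (2)^2; propagating carries from right to left through the seven columns gives 5499025.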
Table 1. Area Report Comparison for 16*16 Vedic Square and 16*16 Vedic Multiplier

Resource                        VEDIC SQUARE (Used)   VEDIC MULTIPLIER (Used)
IOs                             64                    64
LUTs                            727                   834
No. of Gates                    747                   855
No. of accumulated instances    791                   898
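The relative savings implied by Table 1 can be checked with a short calculation. The sketch below is illustrative only: it takes the Vedic Square and Vedic Multiplier counts from the table and reports the percentage reduction relative to the multiplier; the names `area_report` and `reduction_percent` are hypothetical.

```python
# (Vedic Square, Vedic Multiplier) resource counts taken from Table 1.
area_report = {
    "LUTs": (727, 834),
    "No. of Gates": (747, 855),
    "No. of accumulated instances": (791, 898),
}


def reduction_percent(square_count, multiplier_count):
    """Percentage saved by the square unit relative to the multiplier."""
    return 100.0 * (multiplier_count - square_count) / multiplier_count


reductions = {
    name: round(reduction_percent(sq, mul), 2)
    for name, (sq, mul) in area_report.items()
}

for name, value in reductions.items():
    print(f"{name}: {value}% fewer than the Vedic Multiplier")
```

With these figures the reductions come out at about 12.83% for LUTs, 12.63% for gates and 11.92% for accumulated instances; the gate figure is in line with the roughly 12.6% reduction quoted in the abstract for 16*16-bit square computation.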
Resource    Used    Available    Utilization (%)
IV. CONCLUSION

and power. At this stage, we have successfully reduced the area by incorporating the concept of Vedic Square, which reduces the number of gates by 11.15% and thereby leads to an increase in speed.

V. REFERENCES

[1] Harpreet Singh Dhillon, Abhijit Mitra, “A Reduced-Bit Multiplication Algorithm for Digital Arithmetic”, International Journal of Computational and Mathematical Sciences, 2008, pp. 719-723
[2] D. C. Chen, L. M. Guerra, E. H. Ng, M. Potkonjak, D. P. Schultz, and J. M. Rabaey, “An integrated system for rapid prototyping of high performance algorithm specific data paths”, International Conference on Application Specific Array Processors, Aug 1992, pp. 134-148
[3] Aminul Islam, M.W. Akram, S.D. Pable, Mohd. Hasan, “Design and Analysis of Robust Dual Threshold CMOS Full Adder Circuit in 32nm Technology”, International Conference on Advances in Recent Technologies in Communication and Computing, 2010, pp. 418-420
[4] Deepa Sinha, Tripti Sharma, K.G. Sharma, Prof B.P Singh, “Design and Analysis of low Power 1-bit Full Adder Cell”, International Conference on Electronics Computer Technology (ICECT), Vol. 2, 2011, pp. 303-305
[5] Nabihah Ahmad, Rezaul Hasan, “A new Design of XOR-XNOR gates for Low Power application”, International Conference on Electronic Devices, Systems and Applications (ICEDSA), 2011, pp. 45-49
[6] R. Uma, “4-Bit Fast Adder Design: Topology and Layout with Self-Resetting Logic for Low Power VLSI Circuits”, International Journal of Advanced Engineering Sciences and Technology, Vol No. 7, Issue No. 2, 2011, pp. 29-37
[7] David J. Willingham and Izzet Kale, “A Ternary Adiabatic Logic (TAL) Implementation of a Four-Trit Full-Adder”, NORCHIP, 2011, pp. 1-4
[8] Padma Devi, Ashima Girdher and Balwinder Singh, “Improved Carry Select Adder with Reduced Area and Low Power Consumption”, International Journal of Computer Application, Vol 3. Issue.4, June 2010, pp. 14-18
[9] B. Ramkumar, Harish M Kittur, P. Mahesh Kannan, “ASIC Implementation of Modified Faster Carry Save Adder”, European Journal of Scientific Research, Vol. 42, No. 1, 2010, pp. 53-58
[10] Y. Sunil Gavaskar Reddy and V.V.G.S. Rajendra Prasad, “Power Comparison of CMOS and Adiabatic Full Adder Circuits”, International Journal of VLSI design & Communication Systems (VLSICS), Vol. 2, No. 3, September 2011, URL: http://arxiv.org/ftp/arxiv/papers/1110/1110.1549.pdf
[11] Mariano Aguirre-Hernandez and Monico Linares-Aranda, “CMOS Full-Adders for Energy-Efficient Arithmetic Applications”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 19, No. 4, April 2011, pp. 718-721
[12] Ning Zhu, Wang Ling Goh, Weija Zhang, Kiat Seng Yeo, and Zhi Hui Kong, “Design of Low-Power High-Speed Truncation-Error-Tolerant Adder and Its Application in Digital Signal Processing”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 8, August 2010, pp. 1225-1229
[13] P. Annapurna Bai, M. Vijaya Laxmi, “Design of 128-bit Kogge-Stone Low Power Parallel Prefix VLSI Adder for High Speed Arithmetic Circuits”, International Journal of Engineering and Advanced Technology (IJEAT), Volume-2, Issue-6, August 2013, pp. 415-418
[14] Pravinkumar Parate, “ASIC Implementation of 4 Bit Multipliers”, International Conference on Emerging Trends in Engineering and Technology ICETET, 2008, pp. 408-413
[15] C. Senthilpari, Ajay Kumar Singh and K. Diwadkar, “Low power and high speed 8x8 bit Multiplier Using Non-clocked Pass Transistor Logic”, International Conference on Intelligent and Advanced Systems, 2007, pp. 1374-1378
[16] Kiat-seng Yeo and Kaushik Roy, “Low-voltage, low power VLSI sub system”, McGraw-Hill Publication, USA, 2005
[17] Avisek Sen, Partha Mitra, Debarshi Datta, “Low Power MAC Unit for DSP Processor”, International Journal of Recent Technology and Engineering (IJRTE), Vol. 1, Issue-6, January 2013, pp. 93-95
[18] Amandeep Singh, “Design and Hardware Realization of 16 bit Vedic Arithmetic Unit”, June 2010, pp. 1-65
[19] Himanshu Thapliyal and M.B Srinivas, “An Efficient Method of Elliptic Curve Encryption Using Ancient Indian Vedic Mathematics”, 48th Midwest Symposium on Circuits and Systems, Vol. 1, 2005, pp. 826-828