Abstract—Floating-point representation of fractional numbers offers a wide dynamic range, which makes it far more flexible and scalable than fixed-point representation. Since fractional numbers are frequently used in computation, such as in astronomical calculations, graphics processing, and signal processing, floating-point representation is the natural choice for them. Floating-point multipliers perform differently depending on the multiplier design. The purpose of this paper is to review various studies done in the field of floating-point multipliers, covering the Modified Booth, Array, Dadda, Wallace tree, and Vedic multipliers. A floating-point multiplier's performance is evaluated on a number of attributes, including speed, latency, area, and power consumption. Verilog HDL is used in the design phase and Xilinx ISim in the simulation phase; the RTL blocks created with Xilinx ISE 14.7 were implemented on FPGA devices.

Keywords — Floating-point numbers, Single-precision (32-bit), Double-precision (64-bit), IEEE-754, Array, Modified Booth, Wallace tree, Dadda tree, Vedic, Verilog HDL, Xilinx ISE, FPGA.

I. INTRODUCTION

Microprocessors carry out a variety of arithmetic operations, including addition, subtraction, multiplication, division, and logical operations, using their ALU (Arithmetic-Logic Unit) block. Among these operations, binary multiplication is noteworthy: it receives constant attention from researchers because of its resource-intensive and time-consuming execution. According to Moore's law, the chip transistor count doubles every 18 to 24 months, although this pace is much slower than the growth in demand for computing power. Additionally, floating-point calculations consume approximately 75% of the core power and 45% of the total power of high-performance computing applications. Floating-point operations have therefore been rendered less efficient by their large resource and delay overhead. With the increasing demand for computational power in scientific applications such as computational physics and computational geometry [2], which require high precision in the calculation, it is critical to have faster and more accurate floating-point units, especially the multiplier. The two main types of multipliers are serial multipliers and parallel multipliers. A parallel multiplier generates every bit of a partial product in parallel, whereas a serial multiplier uses each bit of the multiplier in turn to form the partial products; as a result, serial multipliers are slower than parallel multipliers [3].

Furthermore, floating-point multipliers have long been popular in a variety of disciplines, including image processing, graphics processing, and digital signal processing (DSP). In comparison to fixed-point numbers, floating-point numbers have a much wider dynamic range, but this increased range comes with the drawback of increased structural complexity, and as accuracy increases they also require more area [2]. It has been observed that single-precision floating-point representation can handle the required range of numbers [5]. To multiply two 32-bit floating-point values, three steps must be performed, namely sign bit determination, exponent addition, and significand multiplication [1]. After examining various single-precision (32-bit) floating-point multipliers and their implementation issues, it was found that of these three components, the significand multiplication unit is the slowest; it also uses the most chip area, so it contributes the most to power dissipation [5]. If the efficiency of significand multiplication could be improved in terms of speed, area, or power, it would refine the architecture and boost overall throughput remarkably. Optimizing the multiplier is therefore a significant step in optimizing floating-point operations.

II. IEEE-754 FLOATING POINT REPRESENTATION

Computers have used a variety of floating-point representations, but IEEE-754, published by the Institute of Electrical and Electronics Engineers (IEEE), is one of the most extensively used standards in the industry and is regarded as the most widely applied standard for floating-point calculations [6]. Three fundamental parts make up the single-precision representation of this standard: a sign bit, an 8-bit exponent, and a 23-bit mantissa [7].

TABLE I. FLOATING-POINT SINGLE AND DOUBLE PRECISION FORMAT

Step 3. The sign bits of both numbers are XORed to get the sign bit for the final product.

Step 4. Normalization of the result is done such that the MSB of the result is logic-1.

III. METHODOLOGIES

Many research studies in recent years have aimed at improving the multiplier's performance by efficiently designing the floating-point arithmetic unit. Some works reduced the power and area consumed by the floating-point units, while others attained high speed and improved accuracy [9]. A variety of multipliers have been designed and successfully implemented for significand multiplication in floating-point multiplication operations. In the following paragraphs, we will discuss a few of them.

A. Array Multiplier

This multiplier multiplies two binary numbers by utilizing an assembly of half and full adders. This add-and-shift-based algorithm is popular due to its regular, predictable structure. The multiplier's bits are multiplied by the multiplicand's bits to create the partial products, which are shifted in accordance with their bit order; the final product is then formed by adding the partial products using a carry-propagate adder. We need N-1 adder stages, where N is the number of multiplier bits, to get the final result [7].

Fig. 2. 4x4 Array Multiplier

B. Modified Booth Multiplier

The number of multiplicand multiples has been decreased in this multiplier by using a higher representation radix, which also lowers the number of partial products. A given range of numbers requires fewer digits when the representation radix is increased: a k-bit binary number can be represented as a radix-4 number with k/2 digits, a radix-8 number with k/3 digits, and so on. High-radix multiplication therefore handles multiple multiplier bits at once. With this method, the partial product tree can be reduced with very few adder blocks, resulting in a shorter signal path and faster partial product reduction, which is carried out using tree reduction algorithms.

Fig. 3. Modified Booth Multiplier Block Diagram

The following table shows how radix-4 recoding can be used to reduce the number of partial products by half [3].
E. Vedic Multiplier

Vedic multipliers are based on Vedic mathematics. Of the sixteen sutras used in Vedic multiplication, the "Urdhva Tiryakbhyam" sutra has been considered the most effective as far as speed is concerned [12]. In 1965, Shri Bharathi Krishna Tirthaji presented the Urdhva Tiryakbhyam multiplication algorithm in a book entitled "Vedic Mathematics" [4]. The algorithm decomposes the multiplication of two numbers into many sums of products of their single digits: bits of the multiplicand and multiplier at various bit positions are multiplied vertically and crosswise. This multiplier has a lower propagation delay than a complex traditional multiplier [4]-[7].

Fig. 10. An analysis of 16-bit multipliers
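The vertical-and-crosswise scheme can be sketched in a few lines of Python; this is an illustrative software model, not one of the surveyed Verilog designs, and the helper names are chosen here for clarity.

```python
# Urdhva Tiryakbhyam ("vertically and crosswise"): every column of the
# product is the sum of all bit pairs a[i] & b[j] whose indices add up to
# the column index; carries then ripple from the least significant column.

def to_bits(x, n):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]

def urdhva_multiply(a, b, n=4):
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    # vertical-and-crosswise column sums
    cols = [0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            cols[i + j] += a_bits[i] & b_bits[j]
    # propagate carries column by column to form the product
    product, carry = 0, 0
    for k, s in enumerate(cols):
        s += carry
        product |= (s & 1) << k
        carry = s >> 1
    return product | (carry << (2 * n - 1))
```

In a hardware realization, all column products are generated concurrently by AND gates and the column sums are compressed by an adder structure, which is where the low propagation delay comes from.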
It has been observed that the array multiplier consumes less area but is also slower, whereas the radix-4 Booth multiplier occupies more area but achieves higher speed.
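Whichever significand multiplier is chosen, it sits inside the same three-step single-precision flow described in the Introduction (sign determination, exponent addition, significand multiplication) plus the normalization of Step 4. The flow can be sketched as follows; this is an illustrative Python model, not a surveyed Verilog design, and it handles only normal, non-overflowing inputs, truncating instead of rounding.

```python
import struct

def fp32_fields(x):
    """Split a float into IEEE-754 single-precision sign/exponent/mantissa."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp32_multiply(a, b):
    sa, ea, ma = fp32_fields(a)
    sb, eb, mb = fp32_fields(b)
    # sign bit of the product: XOR of the operand sign bits
    s = sa ^ sb
    # exponent addition, removing the duplicated bias of 127
    e = ea + eb - 127
    # significand multiplication with the implicit leading 1 restored
    sig = (ma | 1 << 23) * (mb | 1 << 23)      # 48-bit product
    # normalization so the MSB of the significand is logic-1
    if sig & (1 << 47):
        sig >>= 1
        e += 1
    m = (sig >> 23) & 0x7FFFFF                 # keep 23 stored bits (truncated)
    out = (s << 31) | (e << 23) | m
    return struct.unpack('>f', struct.pack('>I', out))[0]
```

The 24x24-bit product in the middle is exactly the significand multiplication unit that the surveyed Array, Booth, Wallace, Dadda, and Vedic designs implement in hardware.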
Fig. 14. Analyzing the delay of multipliers implemented with Xilinx Artix-7 FPGA

It can be seen that the delay of array multipliers increases linearly with the number of bits. The delay of the Wallace and Dadda multipliers does not follow this linear trend and instead grows logarithmically as the data width increases. For a given data width, the Dadda multipliers are faster than the Wallace multipliers, and combining the Booth and Dadda multipliers gives the least possible delay.

V. CONCLUSION

This paper provides a comparative study of different floating-point multipliers based on their performance characteristics. When the data width is low, array multipliers consume the fewest LUTs, but they have the highest latency. Compared to regular array multipliers, modified Booth multipliers consume a larger area but achieve a higher speed. The speed of the Vedic and Wallace tree multipliers is higher than that of array multipliers, but their power consumption is also higher. The Dadda multiplier is even faster than the Wallace multiplier because it uses fewer half and full adders at every level. We can draw the conclusion that the amalgamation of the modified Booth multiplier and the Dadda multiplier results in the shortest delay among the various designs that have been analyzed.

REFERENCES

[1] N. Bai, H. Li, J. Lv, S. Yang, and Y. Xu, "Logic Design and Power Optimization of Floating-Point Multipliers," Computational Intelligence and Neuroscience, vol. 2022, Article ID 6949846, 10 pages, 2022, doi: 10.1155/2022/6949846.
[2] S. Arish and R. K. Sharma, "Run-time reconfigurable multi-precision floating point multiplier design for high speed, low-power applications," 2015 2nd International Conference on Signal Processing and Integrated Networks (SPIN), 2015, pp. 902-907, doi: 10.1109/SPIN.2015.7095315.
[3] R. Shakya and P. Jindal, "Comparative analysis of 8-bit and 16-bit Array multiplier, modified Booth Multiplier - A Study," Proceedings of the 3rd International Conference on Contents, Computing & Communication (ICCCC-2022), Feb. 25, 2022, available at SSRN: https://ssrn.com/abstract=4043967, doi: 10.2139/ssrn.4043967.
[4] V. K. R, A. R. S and N. D. R, "A comparative study on the performance of FPGA implementations of high-speed single-precision binary floating-point multipliers," 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT), 2019, pp. 1041-1045, doi: 10.1109/ICSSIT46314.2019.8987800.
[5] A. Sharma and T. K. Rawat, "Truncated Wallace Based Single Precision Floating Point Multiplier," 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), 2018, pp. 407-411, doi: 10.1109/ICRITO.2018.8748843.
[6] K. V. Gowreesrinivas and P. Samundiswary, "Comparative study on performance of single precision floating point multiplier using vedic multiplier and different types of adders," 2016 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), 2016, pp. 466-471, doi: 10.1109/ICCICCT.2016.7987995.
[7] K. V. Gowreesrinivas and P. Samundiswary, "Comparative performance analysis of multiplexer based single precision floating point multipliers," 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), 2017, pp. 430-435, doi: 10.1109/ICECA.2017.8212851.
[8] B. Jeevan, S. Narender, C. V. K. Reddy and K. Sivani, "A high speed binary floating point multiplier using Dadda algorithm," 2013 International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), 2013, pp. 455-460, doi: 10.1109/iMac4s.2013.6526454.
[9] V. Buddhe, P. Palsodkar and P. Palsodakar, "Design and verification of Dadda algorithm based Binary Floating Point Multiplier," 2014 International Conference on Communication and Signal Processing, 2014, pp. 1073-1077, doi: 10.1109/ICCSP.2014.6950012.
[10] D. Kalaiyarasi and M. Saraswathi, "Design of an Efficient High Speed Radix-4 Booth Multiplier for both Signed and Unsigned Numbers," 2018 Fourth International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), 2018, pp. 1-6, doi: 10.1109/AEEICB.2018.8480959.
[11] C. C. Fun and N. K. Thulasiraman, "Synthesizable Verilog Code Generator for Variable-Width Tree Multipliers," Journal of Physics: Conference Series, vol. 1962, 2021.
[12] S. S. Sinthura, A. Begum, B. Amala, A. Vimala and V. Vidhya Aparna, "Implementation and Analysis of Different 32-Bit Multipliers on Aspects of Power, Speed and Area," 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), 2018, pp. 312-317, doi: 10.1109/ICOEI.2018.8553859.