Fast Modular Multiplication Using Parallel Prefix Adder: Pravin P. Zode Raghavendra B. Deshmukh
Fast Modular Multiplication Using Parallel Prefix Adder: Pravin P. Zode Raghavendra B. Deshmukh
Fast Modular Multiplication Using Parallel Prefix Adder: Pravin P. Zode Raghavendra B. Deshmukh
Abstract- Public key cryptography applications involve use of by incorporating alternative regular adder structures. An
large integer arithmetic operations which are compute intensive Interleaved and Montgomery modular multiplication algorithm
in term of power, delay and area. Modular multiplication, which are considered for implementation as they are more Area-Time
is frequently used most resource hungry block. Generally, last
stage of modular multiplication is implemented by using carry efficient and widely used [9]. These are implemented on Xilinx
propagate adder whose long carry chain takes more time. In this Virtex -6 FPGA platform to test the performance.
paper, FPGA based Modulo multiplication architectures using The rest of the paper is organized as follows- Section-2
Carry Save and Kogge-Stone parallel prefix adder are presented describes reported Interleaved and Montgomery Multiplication
to reduce this problem. Proposed implementations are faster as Algorithms, Section-3 describes proposed architecture of
compared to conventional carry save adder and carry propagate Interleaved and Montgomery Multiplier using Kogge-Stone
adder implementations.
Parallel Prefix adder. The Experimental results are discussed in
Keywords: Interleaved Multiplication, Montgomery Section-4 and finally concluding remarks and future directions
Multiplication, Parallel Prefix Adder, Kogge-Stone Adder, Carry are presented in Section-5.
Save Adder
978-1-4799-6986-9/14/$31.00©2014 IEEE
Algorithm 1: Interleaved Multiplication Algorithm [10] depth and minimum fan out and lowest possible delay [27][28].
Inputs: X, Y, M with 0 < X ,Y < M
Next sub section discusses structure of Kogge-Stone adder.
Output :P = X*Y mod M n: number of bits in X Subsequently, Interleaved multiplication and Montgomery
1. S:=0; C:= 0; A=0; Multiplication implementation is presented.
2. for i in n-1 to 0 loop A. Kogge-Stone Adder
3. C,S := 2S + 2C + Xi*Y
Peter M. Kogge and Harold S. Stone [21] reported this
4. C,S := S + C + A;
parallel prefix form of carry look-ahead adder architecture.
5. A= (2*Sn+1+Sn+2*Cn+1+Cn)*2n mod M; Fig. 1 shows the architecture of Kogge-Stone adder (KSA).
6. end loop ; Three building blocks for this adder are bitwise Propagate-
7. P := C + S; Generate (PG) cell, Group-Propagate-Generate(GPG) cell, and
the sum XORs. Square black cells in architecture are carry
propagate (A+B) and generate structures(A.B). Black cells
B. Montgomery Multiplication
drives upper input of PG Cell. Gray cells are AND-OR
Montgomery multiplication was proposed by P.L. networks used for lower inputs for sum logic and Triangular
Montgomery [12] in 1985 without trial division for speeding blocks are buffers. Kogge-Stone architecture achieves log2N
up the modulus calculation i.e. (XY mod M) for positive stages and fanout of 2 at each stage.
integers. The algorithm implements division operation with
shifts which can be efficiently implemented in hardware. This
approach avoids time consuming trial division that is common
bottleneck in modular multiplication. It works from Least
Significant Bit position.
The above Carry Save based Algorithms are implemented by B. Proposed Architecture
conventional carry propagate adder and Kogge-Stone parallel Application of above structure for Interleaved and
prefix adder. Montgomery Multiplier is presented in this sub-section.
III. PROPOSED ARCHITECTURE Datapath of 163 bit Interleaved Multiplication (Algorithm-1)
and Montgomery Multiplication (Algorithm-2) is shown in
The proposed work explores the performance improvement by fig.2 and fig.3 respectively. Steps 3 and 4 of algorithm- 1 and
improving the adder architecture. The Carry computation in 2 are implemented using CSA . The operation of steps 5 and 6
addition process is speeded up using parallel prefix form of are divide by 2 which are implemented by right shift
carry look-ahead adder. In literature, various parallel prefix operations. Performing additions with CSA significantly saves
adder structures reported are- Ling [20], Kogge-Stone [21], delay and area. The proposed work uses two carry save adders
Brent-Kung [22], Lander-Fischer [23], Han-Carlson [24], for calculation of intermediate results. Instead of carry
Knowles [25] and Sklansky[26]. In parallel prefix adders there propagate adder which is generally used. Final results in the
is always tradeoff between stages of logic, number of logic form of SUM and CARRY (P=S+C) of algorithm are added
gates, maximum fanout of each stage and amount of using Kogge-Stone adder. This reduces the critical path of
interconnection between stages [27]. Kogge-stone adder is operation and increases the speed of operation. FSMs were
widely used because of its regular structure with minimum designed to control the above datapaths.
presents comparison results of implementation of Interleaved
C=0 S=0 and Montgomery Multiplication using RCA and Kogge-Stone
X Y adder. In the proposed architectures dynamic power
consumption is increased by 5.31% and 1.97% , area is
increased by 8.53% and 5.51% due to increase in switching,
fanout and logic levels. However, this structure significantly
Right Shift Right Shift Xi*Y reduces the carry propagation delay. Reduction of delay is
FSM Controller
noticeable in multipliers with Carry Save adder and Kogge-
Carry Save Adder -1 Stone adder. This improvement is attributed because of use of
alternative adder structure. Overall Power-Delay product sees
2*A improvement of 72.83% and 27.42% over RCA. Area-Delay
product also shows improvement of 72.00% and 24.90%. The
Carry Save Adder -2 proposed architectures are best suitable for constraint based
implementation.
A=(A= (2*Sn+1+Sn+2*Cn+1+Cn)*2n mod M) Table 1: Implementation Comparison of Modulo Multipliers
`
Multiplier Interleaved Multiplier Montgomery Multiplier
Kogge-Stone Adder -3 Parameter RCA KSA RCA KSA
Dynamic
44.22 mW 46.57mW 57.98 mW 59.12 mW
Fig. 2 Proposed Datapath of Interleaved Multiplier Power
Delay 6.881ns 1.775ns 1.735ns 1.235ns
C=0 S=0 Xi Y M
Slice LUT 2661 2888 3595 3793
Power- Delay
3.04E-10 8.27E-11 1.006E-10 7.3E-11
Xi*Y Product
Area- Delay
1.83E-05 5.13E-06 6.237E-06 4.68E-06
Product
Carry Save Adder -1
V. CONCLUSIONS
Carry C Sum S S*M We have presented FPGA based architectures for modular
multipliers using parallel prefix Kogge-Stone adder.
FSM Controller