A Fast VLSI Design of SMS4 Cipher Based On Twisted BDD S-Box Architecture
A Fast VLSI Design of SMS4 Cipher Based On Twisted BDD S-Box Architecture
A Fast VLSI Design of SMS4 Cipher Based On Twisted BDD S-Box Architecture
A Fast VLSI Design of SMS4 Cipher Based on Twisted BDD S-Box Architecture
AbstractSMS4 is a 128-bit block cipher used in the WAPI standard for protecting data packets in WLAN. In this paper, various S-box circuit architectures were evaluated firstly and the twisted BDD with m=4 was proved as the fastest one. A fast SMS4 cipher VLSI implementation was completed based on the twisted BDD S-box architecture, and achieved over 200MHz and 100MHz maximal frequency on SMIC 0.18m and Chartered 0.35m CMOS technology respectively. Keywords-SMS4 cipher; S-box; twisted BDD architecture; VLSI design
I.
INTRODUCTION
In 2006, the Office of State Commercial Cipher Administration of China (OSCCA) released the specification of the SMS4 block cipher [1], which was employed in the Wide Authentication and Privacy Infrastructure (WAPI) standard to provide the data confidentiality in wireless networks. To date, several studies have been performed on the SMS4 cipher, such as differential power analysis [2], differential fault analysis [3][4], the algebraic structure [5], and hardware implementations [6]. However, no research has been reported on speed evaluation and optimization of the SMS4 cipher circuit design. In this paper, we evaluated various hardware architectures of the S-box and completed a fast SMS4 VLSI design with the twisted binary decision diagram (BDD) Sbox architecture presented in [7]. The simulation results indicate that, our design achieves a 200MHz clock frequency on SMIC 0.18m technology, and 103MHz on Chartered 0.35m technology. The rest of this paper is organized as follows. Section II describes the SMS4 block cipher. In section III, some SMS4 S-box circuit architectures are introduced briefly and our evaluation results are presented. A fast full SMS4 VLSI implementation and its simulation results are described in section IV. Finally, conclusions are reported in section V. II. SMS4 BLOCK CIPHER SMS4 is a 32-round iterative algorithm, and both the data block and the key size are fixed to 128 bits [1]. The encryption flow of the SMS4 cipher is showed in Fig. 1.
978-0-7695-3610-1/09 $25.00 2009 IEEE DOI 10.1109/NSWCTC.2009.114 345
A. Encryption Algorithm Let X = (X0, X1, X2, X3) (GF(232))4 be the plaintext and Y = (Y0, Y1, Y2, Y3) (GF(232))4 be the ciphertext. Let denoted by rki GF(232) the round keys and by (Xi, Xi+1, X i+2, X i+3) the (i+1)-th round inputs, i{0,1,...,31}. Then the SMS4 scheme can be written as X i + 4 = F ( X i , X i +1 , X i + 2 , X i +3 , rk i ) (1) = X i T ( X i +1 X i + 2 X i + 3 rk i ) and (Y0 , Y1 , Y2 , Y3 ) = ( X 35 , X 34 , X 33 , X 32 ) (2) where i{0,1,...,31}, F is the round function and T is the composite transformation. The transformation T: GF(232) GF(232) is composed of the nonlinear transformation and the linear transformation L: (3) T (.) = L( (.))
The transformation includes four 8-bit nonlinear Sboxes in parallel. Let denoted by A = (a0,a1,a2,a3) (GF(28))4
the input of and by B = (b0,b1,b2,b3) (GF(28))4 the output. Then can be defined as B = (b0 , b1 , b2 , b3 ) = ( A) (4) = ( Sbox(a 0 ), Sbox(a1 ), Sbox(a 2 ), Sbox(a3 )) where Sbox(.) is the S-box byte substitution which will be described in detail later. The output of , B, is also the input of the linear transformation L. Let denoted by C GF(232) the output of L. Then L can be defined as C = L( B ) = B ( B <<< 2) ( B <<< 10) (5) ( B <<< 18) ( B <<< 24) where <<< i denotes a 32-bit cyclic left shift by i positions. SMS4 is a symmetric key cipher, in which both the sender and the receiver use a single key for encryption and decryption. The decryption procedure of SMS4 can be done in the same way as the encryption procedure by reversing the order of the round keys.
TABLE I. 0 1 2 3 4 5
0 1 2 3 4 5 6 7 8 9 A B C D E F
D6 2B 9C E4 47 68 1E D4 EA E0 1D D5 8D 0A 89 18
90 67 42 B3 07 6B 24 00 BF AE F6 DB 1B C1 69 F0
E9 9A 50 1C A7 81 0E 46 8A 5D E2 37 AF 31 97 7D
FE 76 F4 A9 FC B2 5E 57 D2 A4 2E 45 92 88 4A EC
CC 2A 91 C9 F3 71 63 9F 40 9B 82 DE BB A5 0C 3A
E1 BE EF 08 73 64 58 D3 C7 34 66 FD DD CD 96 DC
3D 04 98 E8 17 DA D1 27 38 1A CA 8E BC 7B 77 4D
B7 C3 7A 95 BA 8B A2 52 B5 55 60 2F 7F BD 7E 20
16 AA 33 80 83 F8 25 4C A3 AD C0 03 11 2D 65 79
B6 44 54 DF 59 EB 22 36 F7 93 29 FF D9 74 B9 EE
14 13 0B 94 3C 0F 7C 02 F2 32 23 6A 5C D0 F1 5F
C2 26 43 FA 19 4B 3B E7 CE 30 AB 72 41 12 09 3E
28 49 ED 75 E6 70 01 A0 F9 F5 0D 6D 1F B8 C5 D7
FB 86 CF 8F 85 56 21 C4 61 8C 53 6C 10 E5 6E CB
2C 06 AC 3F 4F 9D 78 C8 15 B1 4E 5B 5A B4 C6 39
05 99 62 A6 A8 35 87 9E A1 E3 6F 51 D8 B0 84 48
B. Key Schedule The key schedule process of SMS4 cipher has the same structure as that in the encryption process except for L function. Let MK = (MK0,MK1,MK2,MK3)(GF(232))4 denote the cipher key, rki GF(232), i{0,1,...,31} denote the round keys, and Ki GF(232), i{0,1,...,35}. Then key schedule algorithm is defined as ( K 0 , K1 , K 2 , K 3 ) = ( MK 0 FK 0 , MK1 FK1 , (6) MK 2 FK 2 , MK 3 FK 3 ) and rk i = K i + 4 = K i T ( K i +1 K i + 2 K i +3 CK i ) (7) where FKi, i{0,1,2,3} are system parameters, CKi, i {0,1,2,3} are key constants, and T' is a transformation similar to T in the encryption process. The only difference between T and T' is the linear transformation. Instead of L, the following transformation L' is used in T': L( B) = B ( B <<< 13) ( B <<< 23) (8) The system parameters FKi are defined in hexadecimal as FK 0 = 0xA3B1BAC6, FK1 = 0x56AA3350 (9) FK 2 = 0x677D9197, FK 3 = 0xB27022DC The key constants CKi = (cki,0, cki,1, cki,2, cki,3) (GF(28))4 can be computed as follows: ck i , j = (4 i + j ) 7(mod 256) (10)
where i{0,1,...,31}, and j{0,1,2,3}. C. S-box The S-box lookup table of the SMS4 cipher is shown as Table I [1], and the algebraic structure of the S-box can be described as [5] Sbox(a) = I (a A 1 + C1 ) A 2 + C 2 (11) 8 where I(.) is the patched inversion over GF(2 ), the matrices A1, A2 GL(8,2), and the vectors C1, C2 GF(2)8. The cyclic matrices and the row vectors in (11) are as follows:
(12)
(13) (14)
As the S-box operation is the most critical part in the SMS4 encryption and key schedule process, we evaluated various S-box circuit architectures firstly.
A. Approaches Because the SMS4 cipher was released not long ago, few studies have been reported on the S-box circuit architectures. But there are several approaches on the AES S-box circuit design [7][8], which is very similar to the SMS4 S-box. In the AES S-box implementations, the most intuitive approach is to use the lookup table method, where the S-box circuit is synthesized from the complete S-box mapping table directly using EDA tools. The S-box circuit can be also obtained from its truth table using some logic architectures, such as sum of products (SOP), a BDD and a twisted BDD. In addition, compact S-box circuits can be designed based on mathematical operations over composite fields. In the various S-box circuit architectures, the twisted BDD was reported as the fastest one [7]. In this method, the
346
TABLE II. COMPARISON OF VARIOUS SMS4 S-BOX ARCHITECTURES ON SMIC 0.18m CMOS TECHNOLOGY Architecture Area (gates) Delay (ns)
Lookup Table Composite Field SOP BDD Twisted BDD (m=0) Twisted BDD (m=3) Twisted BDD (m=4) Twisted BDD (m=5)
TABLE III. COMPARISON OF VARIOUS SMS4 S-BOX ARCHITECTURES ON CHARTERED 0.35m CMOS TECHNOLOGY Architecture
Figure 2. BDD structure in the twisted BDD architecture.
Area (gates) 1595.33 1068.00 1657.67 1781.33 1826.67 1799.67 1788.33 1941.00
Delay (ns) 3.6649 12.9101 3.0669 3.2690 3.1353 2.9788 2.8057 2.8516
m levels on the output side in each BDD was replaced by a 2m:1 selector which was comprise of a select-signal decoder and a data selection part to reduce the delay, as shown in Fig. 2. With different m value, the delay of each BDD is different. When m=0, there is no selector replacement described above. B. Evaluation Results We completed various S-box circuit architecture designs including coding, logic synthesis and physical design with the same constraint on SMIC 0.18m and Chartered 0.35m CMOS technology respectively. The BDDs were constructed using the CUDD package [9]. The area and delay of the S-box designs are shown in Table II and Table III. From the implementation results, we can find that the twisted BDD with m=4 is the fastest architecture and it is more than 26% and 30% faster than the directly synthesized lookup table method on SMIC 0.18 m and Chartered 0.35m technology, respectively. Although the absolute values of the delay and area also depend on the technology, EDA tools and synthesis constraints, the comparisons of the performances between the various architectures are almost the same.
Lookup Table Composite Field SOP BDD Twisted BDD (m=0) Twisted BDD (m=3) Twisted BDD (m=4) Twisted BDD (m=5) TABLE IV. S-box Output Bit No. Minimal BDD Size Maximal BDD Size 0 69 78
fastest twisted BDD architecture will consume much time, because the total number of the possible input orders combinations of the 8 BDDs in the twisted BDD architecture is 8!7!=203,212,800. So, we have not done any optimization about input orders of the BDDs. The input order combination selected in our SMS4 VLSI implementation and the sizes of the BDDs are shown in Table V. Also, the 4 levels at the input side of each BDD, which correspond to the (8m) inputs at the input side in Fig. 2 where m=4, are also shown in Table V.
TABLE V. INPUT ORDERS AND SIZES OF BDDS IN SMS4 VLSI IMPLEMENTATION
IV.
According to the evaluation results in section III, we selected the twisted BDD with m=4 as the S-box architecture in our SMS4 VLSI design to obtain a higher speed.
A. BDDs Construction We calculated out the sizes of the BDDs with all possible input orders using the CUDD package firstly. The size ranges of the BDDs are shown in Table IV. From the results, we can find that the input order of the BDD does not much affect the overall size of the BDD, which is similar to the BDDs in the AES implementation [7]. In addition, an exhaustive search for the smallest or
BDD Size S-box Output Input Order BDD Size (Full) (4 Levels at Input Side) Bit No. (in7...in0) 0 (01234567) 74 59 1 (12345670) 78 63 2 (23456701) 74 59 3 (34567012) 78 63 4 (45670123) 76 61 5 (56701234) 77 62 6 (67012345) 72 57 7 (70123456) 78 63
347
results, our SMS4 circuit can run at speeds over 200MHz on SMIC 0.18m technology and over 100MHz on Chartered 0.35m technology, and achieves over 800Mbps and 400Mbps throughputs in the CBC mode respectively. The design presented in this paper is suitable for the application fields that require a high operation speed and throughput. ACKNOWLEDGMENT This work was supported by National Natural Science Foundation of China (No. 60606005). REFERENCES
[1] Office of State Commercial Cipher Administration of China, SMS4 cipher for WLAN products (in Chinese), 2006. [Online]. Available: http://www.oscca.gov.cn/UpFile/200621016423197990.pdf X. Bai, L. Guo, and T. Li, Differential power analysis attack on SMS4 block cipher, in Proceedings of 4th IEEE International Conference on Circuits and Systems for Communications, ICCSC 2008, Shanghai, China, May 2008, pp. 613617. L. Zhang and W. Wu, Differential fault analysis on SMS4 (in Chinese), Chinese Journal of Computers, vol. 29, no. 9, pp. 1596 1602, 2006. W. Li and D. Gu, An improved method of differential fault analysis on the SMS4 cryptosystem, in Proceedings of 1st International Symposium on Data, Privacy, and E-Commerce, ISDPE 2007, Chengdu, China, Nov. 2007, pp. 175180. F. Liu, W. Ji, L. Hu, J. Ding, S. Lv, A. Pyshkin, and R.-P. Weinmann, Analysis of the SMS4 block cipher, in Information Security and Privacy, 12th Australasian Conference, ACISP 2007, Proceedings, LNCS 4586, Townsville, Australia, Jul. 2007, pp. 158170. Y. Jin, H. Shen, and R. You, Implementation of SMS4 block cipher on FPGA, in Proceedings of 1st International Conference on Communications and Networking in China, ChinaCom06, Beijing, China, Oct. 2006, pp. 14. S. Morioka and A. Satoh, A 10-Gbps full-AES crypto design with a twisted BDD S-box architecture, IEEE Trans. VLSI Syst., vol. 12, no. 7, pp. 686691, Jul. 2004. U. Mayer, C. Oelsner, and T. Khler, Evaluation of different Rijndael implementations for high end servers, in Proceedings of 2002 IEEE International Symposium on Circuits and Systems, ISCAS 2002, vol. 2, Scottsdale, AZ, USA, May 2002, pp. 348351. F. Somenzi, CUDD: CU decision diagram package release 2.4.1, 2005. [Online]. Available: http://vlsi.colorado.edu/~fabio/CUDD
B. Implementation Results
On SMIC 0.18m technology and Chartered 0.35m technology, we completed the RTL coding, logic synthesis and physical design of the full SMS4 cipher VLSI design with the twisted BDD S-box architecture with m=4. The implementation results are shown in Table VI, including the area, the delay, the maximal frequency and the throughput in the cipher block chaining (CBC) mode. In the SMS4 VLSI implementation, we achieved a frequency more than 200MHz and a throughput more than 800Mbps on SMIC 0.18m technology, and a frequency more than 100MHz and a throughput more than 400Mbps on Chartered 0.35m technology.
C. Discussion Based on twisted BDD architecture, we improved the operation speed of the S-box circuit and also the full SMS4 cipher VLSI design. On the other hand, from the implementation results, we can find that much of the critical path delay is used by other operations other than S-box, including XORs, multiplexors and setup time required by the technology. So, future works on improving the SMS4 circuit speed can be focused on the fast circuit architecture design of other parts in SMS4 cipher.
[2]
[3]
[4]
[5]
[6]
[7]
V.
CONCLUSIONS
[8]
In this paper, we evaluated several circuit architectures of the S-box in SMS4 cipher and completed a fast full SMS4 cipher VLSI implementation based on the twisted BDD Sbox architecture with m=4. According to the experiment
[9]
348