Notes On The McEliece Cryptosystem
Notes On The McEliece Cryptosystem
Notes On The McEliece Cryptosystem
Week 1 (6/5/2011): Background Information on the McEliece Cryptosystem -Asymmetric encryption algorithm, introduced by Robert McEliece in 1978 due to error-correcting codes -First cryptosystem that used randomization in the encryption process -Original algorithm uses binary Goppa codes, and has resisted cryptanalysis thus far -The cryptosystem consists of 3 major algorithms: Probabilistic key generation algorithm (produces a public and private key), probabilistic encryption algorithm, and deterministic decryption algorithm -Common security parameters: The process in a key generation algorithm: 1. Select a binary linear code capable of correcting errors. The code must produce a decoding algorithm and create a generator matrix, . Note that decoding a binary linear code is considered an NP-hard problem. This means that 2. Select a random binary non-singular (INVERTIBLE) matrix.
3. Select a random permutation matrix. (Square binary matrix with exactly one 1 in every row and column, and 0s elsewhere) 4. Compute the 5. Public key: matrix, , Private key: . .
In a probabilistic encryption algorithm: -Uses randomness -Encrypting the same message several times will produce different cipher texts -Encryption algorithms must be probabilistic in order to be semantically secure and even hide partial information about the plaintext -For letting B encrypt a message for A, B must do the following: 1. Encode the message as a binary string of length . 2. Obtain the public key 3. Choose a random binary error vector of length with at most 1s.
4. Compute the binary vector 5. Send the cipher text c to A. In a deterministic decryption algorithm:
2. Use the algorithm for the code generated by G to decrypt c to m, the new decoded message. 3. Compute .
Known attacks on the cryptosystem: 1. Brute force. In other words, this is done by simply finding the generator matrix G, preferably using the Patterson algorithm, which determines whether a given variable-length code is uniquely decodable. But this is unlikely to succeed with large values of n and t. -Randomly pick k columns from G. -If denote the restriction of G, c, and z, respectively, to those k columns, . If zk=0 and Gk is invertible, m can be recovered by solving the system of equations in . The probability that zk=0 is only C (n-t, k)/C (n, k), so brute force attacks have a very low probability (in fact, almost negligible) of succeeding. 2. Most effective known attack: Information set decoding. - Find low-weight code words. Try to directly recover the plaintext when given a certain cipher. Recommended Parameter Sizes: -Original suggestions: n=1024, t=50, k524 -Optimum choice for Goppa code: n=1024, t=38, k644 -The McEliece system still suffers from the drawback that it has a high public key, even though it has fast encryption and decryption operations
Week 2 (6/12/2011): Subroutines (Hamming Codes, Reed-Solomon Codes) -Did some implementation with Hamming encoders and decoders on Java, looked a little bit on Reed Solomon codes -Hamming codes: Invented in 1947 by Richard W. Hamming, who found a way for computers to detect and correct their own errors in passing code messages
-Based on parity checking with adding on extra check bits to detect faulty bits and their locations -Each check bit is a function of the message bits -Check bit=mod 2 sum of some of the message bits -Codeword = message bits + check bits -Bits are indexed from left to right, beginning with 0 -SAMPLE CODE VECTOR REPRESENTATION: c= (c0, c1, c2, , c6) Let c0, c1, c2, and c3 be the information symbols Let c4, c5, and c6 be the check symbols Define the check symbols as follows: , , and , all in modulo 2 Message bits 1010 give codeword 1010011 simply as a result of plugging in the values in the mod 2 equations The codeword c is then transmitted across the channel The channel adds an error pattern vector e to the codeword, giving a received pattern r r=c+e Every codeword satisfies the matrix equation , where c is the codeword and H is the
parity check matrix. Each column in the parity check matrix represents every distinct nonzero combination of bits. n=k+r, where n is the total length of the codeword, k is the number of message bits, and r is the number of check bits. k/n is known as the codes information rate.
-A binary code of length n is a subspace of the space of all vectors with n components, with each component being a 0 or a 1 -The subspace has 2k vectors that constitute the code, where k is the number of information symbols. The check symbols are already completely determined by the information symbols. The code itself is a subspace of the vector space containing 2n vectors, each of length n bits. -The codeword forms the matrix product (called a syndrome) which reveals the pattern of parity check failures on the received pattern r. . Positions with errors are indicated by 1s, while the rest are 0s. -THE DECODING PROCESS: 1. Find the syndrome vector .
2. Find which column in H the syndrome matches. (If the syndrome has all 0s, you can safely assume that no errors are present) 3. Column index i indicates that the error pattern e is the vector with a 1 in the ith position and 0s in all the other positions. Thus the decoders estimate of the transmitted codeword is c=r+e -Minimum distance=minimum Hamming weight of a nonzero codeword (Number of positions at which the corresponding symbols between 2 equal-length code strings are different) -THEOREM: If no linear combination of d or fewer columns of H gives the zero column vector, then the code has a minimum distance of at least d+1. -Three Major Characteristics of a Code: n, k, d. Length, number of information symbols, and minimum distance -For each integer m, there is a binary linear code with m parity bits and 2m-m-1 data bits.
Week 3: Reed-Solomon Error Correction -Implemented Reed-Solomon codes on Java, read about how the encoding and decoding processes work -Highland Communication Technologies web source: Reed-Solomon codes
Week 4: Goppa Codes -Goppa codes are often viewed as a generalization of BCH codes -Useful for finding asymptotic bounds on codes