Weight spectrum of quasi-perfect binary codes with distance 4

We consider the weight spectrum of a class of quasi-perfect binary linear codes with code distance 4. For example, the extended Hamming code and the Panchenko code are known members of this class. It is also known that in many cases the Panchenko code has the minimal number of weight 4 codewords. We give exact recursive formulas for the weight spectrum of quasi-perfect codes and their dual codes. As an example of an application of the weight spectrum, we derive a lower estimate for the conditional probability of correction of erasure patterns of high weight (equal to or greater than the code distance).

I. INTRODUCTION

Calculating or estimating the weight spectrum of a linear code is a very old unresolved problem that gives rise to a long list of other unresolved problems in coding theory. Binary quasi-perfect codes have a long history of investigation, but with a "hole" in the area of weight distribution for most of the codes. We were fortunate to find a "simple" solution for the weight spectrum of a whole class of binary quasi-perfect codes. The other, practical motivation for the research was the search for the most effective encoding and decoding schemes for error correction and error detection in computer memory. The physical volume of contemporary memory cells tends to "zero", while the probability of an error or defect in a cell becomes critical for the whole memory device. As a consequence of this trend, we need increasingly effective encoding schemes for the correction of independent errors and of their collections in the form of two-dimensional blots. The binary quasi-perfect extended Hamming code is the traditional choice for memory devices. We suggest the Panchenko code, in its original and product forms (for blot correction), as a better choice. Our main improvement over the traditional solution is the extension of the decoding area through correction of detected errors as erasure patterns of weights equal to or greater than the code distance.

2017 IEEE International Symposium on Information Theory (ISIT). 978-1-5090-4096-4/17/$31.00 ©2017 IEEE

II. QUASI-PERFECT CODES CREATED BY THE DOUBLING CONSTRUCTION

Let an $[n, n-r, d]$ code be a linear binary code of length $n$, redundancy $r$, and minimum distance $d$. For a code with redundancy $r$ we also introduce the following notation: $n_r$ is the length of the code, $H_r$ is its parity check matrix of size $r \times n_r$, and $d_r$ is the code distance.

Definition 1. The doubling construction creates a parity check matrix $H_r$ of an $[n_r, n_r - r, d_r]$ code from a parity check matrix $H_{r-1}$ of an $[n_{r-1}, n_{r-1} - (r-1), d_{r-1}]$ code as follows:
\[
H_r = \left[ \begin{array}{c|c} 0 \ldots 0 & 1 \ldots 1 \\ \hline H_{r-1} & H_{r-1} \end{array} \right]. \tag{1}
\]
By (1) we have $n_r = 2 n_{r-1}$. Let us define matrices $S$ and $M$ as
\[
S = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}, \qquad
M = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}.
\]
Denote by $H_r^{EH}$ a parity check matrix of the extended Hamming $[2^{r-1}, 2^{r-1} - r, 4]$ code, $r \ge 3$. By (1), if $H_{r-1} = M$ (resp. $H_{r-1} = H_{r-1}^{EH}$) then $H_r = H_3^{EH}$ (resp. $H_r = H_r^{EH}$). If in (1) we have $d_{r-1} = 3$ then $d_r = 3$, since the left part of $H_r$ contains 3 linearly dependent columns provided by the structure of $H_{r-1}$. Finally, let $h_i$ (resp. $[0 h_i]^T$ or $[1 h_i]^T$) be a column of $H_{r-1}$ (resp. $H_r$). If in (1) $d_{r-1} \ge 4$ then $d_r = 4$, as the sum of columns $[0 h_i]^T + [0 h_j]^T + [1 h_i]^T + [1 h_j]^T$, $i \ne j$, is equal to zero.

Definition 2. A code correcting $t$ errors is quasi-perfect if its covering radius is equal to $t + 1$. In particular, a quasi-perfect code with distance $d = 4$ has covering radius 2.

The minimum distance of any code correcting $t$ errors is equal to $2t + 1$ or $2t + 2$. A linear quasi-perfect code is "non-extendable" in the sense that the addition of any column to a parity check matrix decreases the code distance. Any linear $[n, n - r, 2t + 2]$ code with $2t + 2 \ge 4$ is either a quasi-perfect code or a shortening of some quasi-perfect code of redundancy $r$ and distance $2t + 2$.

Theorem 3. [1] Let $n_r \ge 2^{r-2} + 2$, $r \ge 3$, and let an $[n_r, n_r - r, 4]$ code be quasi-perfect. Then a parity check matrix $H_r$ of the code can be presented in the form (1), where the matrix $H_{r-1}$ is given in one of the following three variants only:
• $H_{r-1}$ is a parity check matrix of an $[n_{r-1}, n_{r-1} - (r-1), 4]$ quasi-perfect code with $n_{r-1} = \frac{1}{2} n_r$;
• $H_{r-1} = S$;
• $H_{r-1} = M$.

Corollary 4. [1] Let $n_r \ge 2^{r-2} + 2$, $r \ge 5$, and let an $[n_r, n_r - r, 4]$ code be quasi-perfect.
Then the length $n_r$ can take any value from the sequence
\[
n_r = 2^{r-2} + 2^{r-2-g} \quad \text{for } g = 0, 2, 3, 4, 5, \ldots, r - 3. \tag{2}
\]
Moreover, for each $g = 0, 2, 3, 4, 5, \ldots, r - 3$, there exists an $[n_r, n_r - r, 4]$ quasi-perfect code with $n_r = 2^{r-2} + 2^{r-2-g}$. Also, $n_r$ cannot take any value that is not listed in (2).

Now we give a general description of a parity check matrix for the whole class of quasi-perfect codes with distance 4. Let $B_{k,g} = [b_k \ldots b_k]$, $g \in \{0, 2, 3, 4, 5, \ldots, r - 3\}$, be the $(r - g - 2) \times (2^g + 1)$ matrix of identical columns $b_k$, where $r \ge 5$ is the code redundancy and $b_k$ is the binary representation of the integer $k$ (with the most significant bit at the top position).

Corollary 5. [1] Let $n_r = 2^{r-2} + 2^{r-2-g}$, $r \ge 5$, $g \in \{0, 2, 3, 4, 5, \ldots, r - 3\}$, and let an $[n_r, n_r - r, 4]$ code be quasi-perfect. Then a parity check matrix $H_r$ of the code can be presented in the form
\[
H_r = \left[ \begin{array}{c|c|c|c} B_{0,g} & B_{1,g} & \ldots & B_{D,g} \\ \hline H_{g+2} & H_{g+2} & \ldots & H_{g+2} \end{array} \right], \tag{3}
\]
where $D = 2^{r-g-2} - 1$, $H_2 = M$, $H_4 = S$, and $H_{g+2}$ is a parity check matrix of a quasi-perfect $[2^g + 1, 2^g + 1 - (g + 2), 4]$ code if $g \ge 3$.

Remark 6. By Corollary 5, a parity check matrix of any quasi-perfect binary code with length $2^{r-2} + 2^{r-2-g}$ and redundancy $r$ can be created by $(r - g - 2)$-fold application of the doubling construction.

As noted above, an arbitrary $[n, n - r, 4]$ code is either a quasi-perfect code or a shortening of some quasi-perfect code with $d = 4$ and redundancy $r$. Therefore Theorem 3, Corollaries 4 and 5, and Remark 6 in fact describe all binary linear codes with $d = 4$ and length $\ge 2^{r-2} + 2$. This is why the weight spectrum of codes obtained by the doubling construction (1) is an important problem.

The class of codes, say $\mathcal{D}$, obtained by the doubling construction is sufficiently wide. By (1), the $[2^r - 1, 2^r - 1 - r, 3]$ Hamming code and many of its shortenings are included in $\mathcal{D}$. It follows from Theorem 3 that the $[2^{r-1}, 2^{r-1} - r, 4]$ extended Hamming code and the Panchenko code $\Pi_r$ (see below) belong to $\mathcal{D}$.
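
The doubling construction (1) and the length sequence (2) can be illustrated by a minimal Python sketch. The helper names `double`, `columns`, `min_distance`, and `quasi_perfect_lengths` are hypothetical and introduced only for this illustration; the matrices `M` and `S` are entered as read from the definitions in this section (in particular, `M` is assumed to be the 2 × 2 matrix with rows 01 and 11). The distance check is a brute-force search and is only practical for short codes.

```python
from itertools import combinations


def double(H):
    """Doubling construction (1): H is a list of rows, each a list of 0/1."""
    n = len(H[0])
    top = [0] * n + [1] * n          # new top row 0...0 | 1...1
    return [top] + [row + row for row in H]


def columns(H):
    """Pack each column of H into an integer (top row = most significant bit)."""
    r, n = len(H), len(H[0])
    return [sum(H[i][j] << (r - 1 - i) for i in range(r)) for j in range(n)]


def min_distance(H, max_w=6):
    """Smallest number of columns of H with zero sum (brute force)."""
    cols = columns(H)
    for w in range(1, max_w + 1):
        for subset in combinations(cols, w):
            acc = 0
            for c in subset:
                acc ^= c             # column sum over GF(2)
            if acc == 0:
                return w
    return None                      # no dependency found up to max_w


def quasi_perfect_lengths(r):
    """Lengths n_r listed in (2) for redundancy r >= 5."""
    gs = [0] + list(range(2, r - 2))  # g = 0, 2, 3, ..., r - 3
    return sorted(2 ** (r - 2) + 2 ** (r - 2 - g) for g in gs)


M = [[0, 1], [1, 1]]
S = [[1, 0, 0, 0, 1],
     [0, 1, 0, 0, 1],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1]]

H3 = double(M)    # 3 x 4 matrix of the extended Hamming [4, 1, 4] code
P5 = double(S)    # 5 x 10 matrix obtained from S by one doubling step
print(min_distance(H3), min_distance(P5), quasi_perfect_lengths(7))
```

Under these assumptions, one doubling step applied to `S` gives a length 10 code with distance 4, and for $r = 7$ the list of admissible lengths contains 40, the length $5 \cdot 2^{r-4}$ of the Panchenko code.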
Numerous other non-equivalent codes of $\mathcal{D}$ can be obtained by multiple application of the doubling construction to distinct quasi-perfect $[2^g + 1, 2^g + 1 - (g + 2), 4]$ codes $C_0$ with $g \in \{0, 2, 3, 4, 5, \ldots, r - 3\}$, see (3). Examples of codes $C_0$ can be found in [1]–[3] in algebraic and in geometrical forms. For instance, we give a parity check matrix of a quasi-perfect $[9, 9 - 5, 4]$ code:
\[
\left[ \begin{array}{c|c}
0\,0\,0\,0\,0 & 1\,1\,1\,1 \\
1\,0\,0\,0\,1 & 0\,0\,0\,0 \\
0\,1\,0\,0\,1 & 1\,0\,0\,1 \\
0\,0\,1\,0\,1 & 0\,1\,0\,1 \\
0\,0\,0\,1\,1 & 0\,0\,1\,1
\end{array} \right].
\]

The quasi-perfect codes $\Pi_r$ were proposed by V. I. Panchenko in [4]. The $[n, n - r, 4]$ code $\Pi_r$ has length $n = 5 \cdot 2^{r-4}$, redundancy $r \ge 5$, and code distance $d = 4$. (In [5] the code $\Pi_r$ is denoted by $\Pi$.) The $r \times 5 \cdot 2^{r-4}$ parity check matrix $P_r$ of the Panchenko code $\Pi_r$ is the matrix $H_r$ of (3) with $g = 2$, $D = 2^{r-4} - 1$, and $H_{g+2} = S$. So,
\[
P_r = \begin{bmatrix} B_{0,2} & B_{1,2} & B_{2,2} & \ldots & B_{D,2} \\ S & S & S & \ldots & S \end{bmatrix}. \tag{4}
\]

We recall the known [4]–[6] important properties of the Panchenko code and its shortenings:
• For all $r$ and $n$, there exists a shortened Panchenko code in which the number of weight 4 codewords is close to the theoretical lower bound.
• Independently of the shortening algorithm, for all $r$ and $n$, the number of weight 4 codewords in the Panchenko code and its shortenings is smaller than in a shortened extended Hamming code.
• For $r = 7$, $n \in \{32, 33, \ldots, 40\}$, and $r = 8$, $n \in \{72, 73, \ldots, 80\}$, the Panchenko code and its shortenings by a special algorithm have the minimal number of weight 4 codewords among all codes of the same length and redundancy.

As a consequence of these properties, the Panchenko code has a small (often the minimal) probability of undetected error, since this probability is essentially defined by the number of weight 4 codewords. In particular, this is important for error correction in computer memory [5], [6].

III. WEIGHT SPECTRUM OF CODES CREATED BY THE DOUBLING CONSTRUCTION

By Section II, a parity check matrix of any quasi-perfect binary code with $d = 4$ can be created by multiple application of the doubling construction. Therefore, Theorems 7 and 8 allow us to obtain the weight spectrum of such a code (and of its dual) starting from the weight spectrum of a short code.

We use the notation introduced in the previous section. Also, for a code with redundancy $r$ we denote by $A_w^{(r)}$ the number of codewords of weight $w$ and by $A_w^{(r)\perp}$ the number of codewords of weight $w$ in the dual code.

Theorem 7. Let $d_r \le 4$. Assume that an $[n_r, n_r - r, d_r]$ code $C_r$ is created from a $[\frac{1}{2} n_r, \frac{1}{2} n_r - r + 1, d_{r-1}]$ code $C_{r-1}$ by the doubling construction (1). Then the weight spectrum $\{A_w^{(r)},\ d_r \le w \le n_r\}$ of $C_r$ can be obtained from the weight spectrum $\{A_w^{(r-1)},\ d_{r-1} \le w \le \frac{1}{2} n_r\}$ of $C_{r-1}$ as follows:
\[
A_{2v}^{(r)} = \Delta_v^{(r)} + \sum_{j=0}^{v-2} 2^{2v-2j-1} A_{2v-2j}^{(r-1)} \binom{\frac{1}{2} n_r - 2v + 2j}{j}, \tag{5}
\]
where
\[
\Delta_v^{(r)} = \begin{cases} 0 & \text{if } v \text{ is odd}, \\ \binom{\frac{1}{2} n_r}{v} & \text{if } v \text{ is even}; \end{cases}
\]
\[
A_{2v+1}^{(r)} = \sum_{j=0}^{v-2} 2^{2v-2j} A_{2v+1-2j}^{(r-1)} \binom{\frac{1}{2} n_r - 2v - 1 + 2j}{j}. \tag{6}
\]

Proof. We consider the structures of weight $w$ codewords and the structures of the corresponding sets of $w$ columns of a parity check matrix. Let $u \in \{r, r - 1\}$. Let $c_{w,u}$ be a weight $w$ codeword of the code $C_u$. Denote by $H_u(c_{w,u})$ the set of $w$ columns of the matrix $H_u$ corresponding to the codeword $c_{w,u}$. By definition, the sum of all columns of $H_u(c_{w,u})$ is equal to zero. We describe column sets of $H_r$ in (1) with the help of column sets of $H_{r-1}$ placed in the left and right sides of (1).

(i) Let us consider all possible structures of codewords $c_{2v,r}$ of even weight $2v$ and the corresponding column sets $H_r(c_{2v,r})$ in the matrix $H_r$ of (1). Every such column set consists of the following components:
• A column set $H_{r-1}(c_{2v-2j,r-1})$ partitioned into two parts that are placed in the left and right sides of $H_r$.
• Two sets of the same $j$ columns of $H_{r-1}$ placed in the left and right sides of $H_r$. (These column sets are not connected with any codewords of $C_{r-1}$.)

For $j = 0, 1, \ldots, v - 2$ and for every codeword $c_{2v-2j,r-1}$ of even weight, we explain the summands of formula (5).

– The summand $\sum_{j=0}^{v-2} 2^{2v-2j-1} A_{2v-2j}^{(r-1)} \binom{\frac{1}{2} n_r - 2v + 2j}{j}$ of (5). A column set $\Gamma = H_{r-1}(c_{2v-2j,r-1})$ is partitioned into two parts. Every part contains an odd (resp. even) number of columns if $j$ is odd (resp. even). The partition is carried out in all possible ways. The number of such partitions is equal to $2^{2v-2j-1}$. The obtained parts are placed in the left and right sides of $H_r$. Also, in each of the two submatrices $H_{r-1}$ of (1) we take the same set of $j$ columns that do not belong to $\Gamma$. The number of such $j$-sets is equal to $\binom{\frac{1}{2} n_r - 2v + 2j}{j}$. As a result, in the right side of $H_r$ we always take an even number of columns.

– The summand $\Delta_v^{(r)} = \binom{\frac{1}{2} n_r}{v}$ of (5). If $v$ is even then in each of the two submatrices $H_{r-1}$ of (1) we take the same set of $v$ columns. The number of variants is equal to $\binom{\frac{1}{2} n_r}{v}$.

(ii) Let us consider all possible structures of codewords $c_{2v+1,r}$ of odd weight $2v + 1$ and the corresponding column sets $H_r(c_{2v+1,r})$ in the matrix $H_r$ of (1). Every such column set consists of the following components:
• A column set $H_{r-1}(c_{2v+1-2j,r-1})$ partitioned into two parts that are placed in the left and right sides of $H_r$.
• Two sets of the same $j$ columns of $H_{r-1}$ placed in the left and right sides of $H_r$. (These column sets are not connected with any codewords of $C_{r-1}$.)

For $j = 0, 1, \ldots, v - 2$ and for every codeword $c_{2v+1-2j,r-1}$ of odd weight, we explain formula (6). A column set $\Gamma = H_{r-1}(c_{2v+1-2j,r-1})$ is partitioned into two parts. One part, say $A_{odd}$, contains an odd number of columns; the other part, say $B_{even}$, contains an even number of columns. The partition is carried out in all possible ways. The number of such partitions is equal to $2^{2v-2j}$. If $j$ is odd then the part $B_{even}$ (resp. $A_{odd}$) is placed in the left (resp. right) side of $H_r$. If $j$ is even or $j = 0$ then the part $A_{odd}$ (resp. $B_{even}$) is placed in the left (resp. right) side of $H_r$. Also, in each of the two submatrices $H_{r-1}$ of (1) we take the same set of $j$ columns that do not belong to $\Gamma$. The number of such $j$-sets is equal to $\binom{\frac{1}{2} n_r - 2v - 1 + 2j}{j}$. As a result, in the right side of $H_r$ we always take an even number of columns.

Now we give the weight spectrum for duals of quasi-perfect codes.

Theorem 8. Let $d_r \le 4$. Assume that an $[n_r, n_r - r, d_r]$ code $C_r$ is created from a $[\frac{1}{2} n_r, \frac{1}{2} n_r - r + 1, d_{r-1}]$ code $C_{r-1}$ by the doubling construction (1). Then the weight spectrum $\{A_w^{(r)\perp},\ w \le n_r\}$ of the $[n_r, r, d_r^{\perp}]$ code dual to $C_r$ can be obtained from the weight spectrum $\{A_w^{(r-1)\perp},\ w \le \frac{1}{2} n_r\}$ of the $[\frac{1}{2} n_r, r - 1, d_{r-1}^{\perp}]$ code dual to $C_{r-1}$ as follows:
\[
A_{2v}^{(r)\perp} = A_v^{(r-1)\perp} + \begin{cases} 0 & \text{if } 2v \ne \frac{1}{2} n_r, \\ 2^{r-1} & \text{if } 2v = \frac{1}{2} n_r. \end{cases} \tag{7}
\]

Proof. We consider matrix (1) as a generator matrix of the dual code. If a codeword of the dual code is formed without the top row, then its weight is equal to twice the weight of the corresponding word formed from rows of the matrix $H_{r-1}$. If the top row is included in the codeword, its weight is equal to $\frac{1}{2} n_r$.

IV. ON CORRECTION OF ERASURE PATTERNS OF HIGH WEIGHT

Knowledge of the weight spectrum of a code opens a way to calculate very important probabilities for the code, such as the conditional probability of correct decoding of erasure patterns, the probability of undetected error, and so on. In binary codes, the number of parity check bits is larger than the code distance. That is a good reason to investigate the total ability of codes to correct erasure patterns of high weight (equal to or greater than the code distance). The necessary condition for correction of a weight $\rho$ erasure pattern is the full rank of the submatrix consisting of the columns of a code parity check matrix that correspond to the erased positions.
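
The recursions (5)–(7) of Theorems 7 and 8 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `binom`, `double_spectrum`, and `double_dual_spectrum` are hypothetical, spectra are stored as dictionaries mapping weight to count, and the starting code is assumed to have $d_{r-1} \ge 4$, so that the sums up to $j = v - 2$ cover every nonzero codeword. For the dual code, the sketch follows the proof of Theorem 8: weights of the previous dual are doubled, and $2^{r-1}$ words of weight $\frac{1}{2} n_r$ are added.

```python
from math import comb


def binom(n, k):
    """Binomial coefficient, taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def double_spectrum(A_prev, n_prev):
    """Spectrum of C_r from the spectrum of C_{r-1} by (5) and (6).

    A_prev maps weight -> number of codewords of C_{r-1};
    n_prev is the length of C_{r-1}, i.e. n_r / 2.
    Assumption: d_{r-1} >= 4.
    """
    A = {}
    for w in range(2 * n_prev + 1):
        v = w // 2
        total = 0
        if w % 2 == 0:
            if v % 2 == 0:
                total += binom(n_prev, v)            # Delta_v^{(r)}
            for j in range(v - 1):                   # j = 0, ..., v - 2
                total += (2 ** (2 * v - 2 * j - 1)
                          * A_prev.get(2 * v - 2 * j, 0)
                          * binom(n_prev - 2 * v + 2 * j, j))
        else:
            for j in range(v - 1):                   # j = 0, ..., v - 2
                total += (2 ** (2 * v - 2 * j)
                          * A_prev.get(2 * v + 1 - 2 * j, 0)
                          * binom(n_prev - 2 * v - 1 + 2 * j, j))
        if total:
            A[w] = total
    return A


def double_dual_spectrum(B_prev, n_prev, r):
    """Dual spectrum after one doubling step, following (7).

    B_prev is the spectrum of the dual of C_{r-1}; r is the redundancy of C_r.
    """
    B = {2 * w: count for w, count in B_prev.items()}  # top row not used
    B[n_prev] = B.get(n_prev, 0) + 2 ** (r - 1)        # top row used
    return B


# Extended Hamming codes: [4, 1, 4] -> [8, 4, 4] -> [16, 11, 4].
A8 = double_spectrum({0: 1, 4: 1}, 4)
A16 = double_spectrum(A8, 8)
# One doubling step from the code with parity check matrix S (length 5).
A10 = double_spectrum({0: 1, 5: 1}, 5)
print(A8, A16, A10)
```

As a quick consistency check, the counts in each computed spectrum should sum to $2^{n_r - r}$, the number of codewords of $C_r$.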
Let $S_\rho$ be the number of erasure patterns of weight $\rho$ that can be corrected by a code (equivalently, for a code parity check matrix, $S_\rho$ is the number of distinct sets of $\rho$ linearly independent columns, or the number of distinct $r \times \rho$ submatrices of full rank). For a code of length $n$, let $\delta_\rho = S_\rho / \binom{n}{\rho}$ be the conditional probability of correct decoding of erasure patterns of weight $\rho$. Further, for an $[n, n - r, d]$ code with weight spectrum $A_0, A_1, \ldots, A_n$ we introduce the function
\[
\Psi(n, d, \rho) = \binom{n}{\rho} - \sum_{w=d}^{\rho} A_w \binom{n-w}{\rho-w}, \quad d \le \rho \le r. \tag{8}
\]
This function gives a lower estimate of $S_\rho$, see [7], [8].

Theorem 9. For an $[n, n - r, d]$ code, the conditional probability $\delta_\rho$ and the value $S_\rho$ satisfy the following lower estimates:
\[
\delta_\rho \ge \frac{\Psi(n, d, \rho)}{\binom{n}{\rho}}, \quad S_\rho \ge \Psi(n, d, \rho), \quad d \le \rho \le r. \tag{9}
\]
In particular, the equalities
\[
\delta_\rho = \frac{\Psi(n, d, \rho)}{\binom{n}{\rho}}, \quad S_\rho = \Psi(n, d, \rho)
\]
hold under the condition
\[
\rho \le d + \frac{d-1}{2}.
\]

The proof of Theorem 9 is based on the fact that the value $S_\rho$ is equal to the difference between the total number of sets of $\rho$ columns of a parity check matrix and the number of sets of $\rho$ linearly dependent columns.

Now we use the known binomial approximation of the weight spectrum of a binary linear code [9]–[11], $A_w \approx 2^{-z} \binom{n}{w}$, $r - 1 < z \le r$, $w \ge d$, where $z$ is a real value that takes into account (in principle) correction terms in the mentioned approximations and the weight region $w \ge d$. We obtain the following approximation of the function $S_\rho$ for the region $d \le \rho \le r$:
\[
\begin{aligned}
S_\rho &\ge \binom{n}{\rho} - \sum_{w=d}^{\rho} A_w \binom{n-w}{\rho-w} \\
&\approx \binom{n}{\rho} - 2^{-z} \sum_{w=d}^{\rho} \binom{n}{w} \binom{n-w}{\rho-w} \\
&= \binom{n}{\rho} - 2^{-z} \binom{n}{\rho} \sum_{w=d}^{\rho} \binom{\rho}{w}.
\end{aligned}
\]
From here, using [11, Lemma 10.8], we obtain an estimate of the conditional probability $\delta_\rho$ of correct decoding of erasure patterns of high weight $\rho$:
\[
\delta_\rho \ge \frac{S_\rho}{\binom{n}{\rho}} \approx 1 - 2^{-z} \sum_{w=d}^{\rho} \binom{\rho}{w} \approx 1 - 2^{-z} \cdot 2^{\rho H(d/\rho)} \ge 1 - 2^{\rho - z}, \quad d \le \rho < z,
\]
where $H(d/\rho)$ is the binary entropy. The proposed estimate shows that, for a fixed $r$, the probability $\delta_\rho$ decreases exponentially as $\rho$ grows. Therefore the reasonable extended region of correctable erasure patterns is $\rho < 2d$.

The following lemma allows us to improve the estimates of Theorem 9 using a recursive approach.

Lemma 10. Any set of $\rho$ linearly dependent columns of a parity check matrix is a union of $w$ columns with zero sum (corresponding to a weight $w$ codeword) and a set of $\rho - w$ linearly independent columns, where $d \le w \le \rho$.

We give a recursive form of a function of type (8):
\[
\tilde{\Psi}(n, d, \rho) = \binom{n}{\rho} - \sum_{w=d}^{\rho} A_w(n)\, \tilde{\Psi}(n - w, d, \rho - w),
\]
where $A_w(n)$ is the number of weight $w$ words in a (shortened) code of length $n$. A recursive estimate of the conditional probability of correct decoding of erasure patterns of weight $\rho$, at the first and second steps of the recursion, has the form, respectively,
\[
\tilde{\delta}(n, d, \rho) = \frac{\tilde{\Psi}(n, d, \rho)}{\binom{n}{\rho}} = 1 - \sum_{w=d}^{\rho} A_w(n)\, \tilde{\delta}(n - w, d, \rho - w)\, \frac{\binom{n-w}{\rho-w}}{\binom{n}{\rho}};
\]
\[
\tilde{\delta}_2(n, d, \rho) = 1 - \sum_{w_1=d}^{\rho} A_{w_1}(n)\, \frac{\binom{n-w_1}{\rho-w_1}}{\binom{n}{\rho}} \times \left[ 1 - \sum_{w_2=d}^{\rho-w_1} A_{w_2}(n - w_1)\, \frac{\binom{n-w_1-w_2}{\rho-w_1-w_2}}{\binom{n-w_1}{\rho-w_1}} \right].
\]

V. APPLICATION TO MEMORY

An important area of application of quasi-perfect codes is computer memory (Flash or SSD). Their ability to correct a large number of erasures instead of one error, together with a very low probability of undetected error, gives us a strong incentive to investigate the conditional probability of correct decoding of erasure patterns of high weight. As an example useful for applications, we give two tables: the first for the conditional probability of correct decoding of erasure patterns of weights not less than the code distance, and the second for the (unconditional) probability of decoding failure in a memory channel with different error probabilities for the product of Panchenko codes.

The decoding algorithm for the product of Panchenko codes consists of the following steps.
1) Error detection in rows and columns of the received word (in parallel).
2) Check (in parallel) of the detected row (column) list for correctability as an erasure pattern.
3) Correction of the chosen erasure pattern (row or column) and output.

The check for correctability is executed in the extended area of up to $d^{+}$ erasures.

Table I gives a comparison between Hamming and Panchenko codes with 7 and 8 parity symbols. We can see from the table that, under extended decoding with correction of 4, 5, 6, 7 erasures, the conditional probability of correct decoding decreases from nearly 1 to approximately 1/2. Table II shows, for a fixed number of parity bits, how fast the probability of decoding failure for the product of two Panchenko codes decreases as the decoding area is extended from 3 up to 6 erasures.

TABLE I
CONDITIONAL PROBABILITY δρ OF CORRECT DECODING OF ERASURE PATTERNS OF WEIGHT ρ FOR HAMMING AND PANCHENKO CODES

code        r   ρ = d = 4   ρ = 5    ρ = 6    ρ = 7
Hamming     7   0.9836      0.9180   0.7469   0.4121
Panchenko   7   0.9870      0.9287   0.7656   0.4306
Hamming     8   0.9920      0.9600   0.8741   0.6879
Panchenko   8   0.9934      0.9647   0.8830   0.6996

TABLE II
FAILURE PROBABILITY FOR PRODUCT OF PANCHENKO CODES [72, 64, 4]

p           d+ = 3    d+ = 4    d+ = 5      d+ = 6
10^-1       1         1         1           1
10^-2       0.996     0.988     0.967       0.926
5 · 10^-3   0.250     0.092     0.027       0.008
10^-3       1.1e-09   1.6e-12   7.0e-14     5.8e-14
5 · 10^-4   2.3e-14   5.1e-18   1.045e-18   1.029e-18

ACKNOWLEDGMENT

The research was carried out at the IITP RAS at the expense of the Russian Foundation for Sciences (project 14-50-00150).

REFERENCES

[1] A. Davydov and L. 