
Multi-Kernel Polar Codes: Concept and Design Principles

2020, IEEE Transactions on Communications

In this paper, we propose a new polar code construction that employs kernels of different sizes in the Kronecker product of the transformation matrix, thus generalizing the original construction by Arikan. The proposed multi-kernel polar codes allow for more flexibility in terms of code length and open up various new design principles. We describe in detail encoding as well as successive cancellation (SC) decoding and SC list (SCL) decoding, and we provide a novel design method for the frozen set that optimises the performance under list decoding, as opposed to the original reliability-based code design. Finally, we numerically demonstrate the advantage of multi-kernel polar codes under the new design principles compared to punctured and shortened polar codes.

arXiv:2001.04670v1 [cs.IT] 14 Jan 2020

Valerio Bioglio, Frédéric Gabry, Ingmar Land, Jean-Claude Belfiore
Mathematical and Algorithmic Sciences Lab, France Research Center, Huawei Technologies France SASU
Email: {valerio.bioglio,frederic.gabry,ingmar.land,jean.claude.belfiore}@huawei.com

I. INTRODUCTION

Polar codes are a family of error-correcting codes recently introduced by Arikan in [1] as the first codes able to provably achieve channel capacity for a large number of channels. The construction of a polar code of length N = 2^n is based on the recursive concatenation of the binary matrix T2 = [1 0; 1 1], referred to as the kernel of the transformation. This operation results in a transformation matrix TN = T2^⊗n, given by the n-fold Kronecker power of the kernel matrix T2, converting the physical channel into N virtual synthetic channels characterized by either very high or very low reliability. This channel polarization effect leads to a portion of fully reliable channels that tends to the channel capacity for symmetric binary-input memoryless channels as the code length tends to infinity.
In the asymptotic regime, successive cancellation (SC) decoding is sufficient to achieve the channel capacity [1]. In the finite-length regime, successive cancellation list (SCL) decoding [2] leads to performance competitive with many other classes of channel codes, like LDPC codes, particularly when an outer CRC code or generalizations thereof are applied [3]. Due to their excellent performance, polar codes were recently adopted as moderate-length codes for 5G [4].

As conjectured in [1], the polarization phenomenon obtained by the Kronecker powers of T2 can be extended to other kernels. In [5], the authors show necessary and sufficient conditions for kernels to polarize, allowing researchers to propose larger binary kernels as bases of novel polar codes [6]. Non-binary kernels have been proposed to improve the asymptotic error probability [7], [8], while mixing binary and non-binary kernels showed additional improvement over homogeneous kernel constructions [9]. As a result, current constructions restrict the length of polar codes to be of the form N = l^n.

Fig. 1: Tanner graph of the multi-kernel polar code of length N = 12 for the transformation matrix T12 = T2 ⊗ T2 ⊗ T3.

This code length constraint can be a huge limitation to the practical use of polar codes, since only a few block lengths can be expressed as a power of an integer. Punctured [10] and shortened [11] polar codes have been proposed to increase the number of achievable block lengths. Even if these techniques offer a practical way to construct codes of arbitrary lengths, they exhibit several disadvantages. In fact, punctured and shortened codes are decoded by means of their mother polar codes, increasing the decoding latency with respect to the actual code length.
Moreover, the location of the dummy bits, which alters the polarization of the code, has to be carefully chosen to avoid catastrophic error-rate performance [12]. Finally, the lack of structure connecting the frozen sets and the puncturing or shortening patterns complicates the code design [13].

In this paper, we present multi-kernel polar codes, which generalize polar codes by mixing kernels of different sizes over the same binary alphabet. These codes were theorized in [14], where their polarization rate is calculated algebraically, and explicitly constructed in [15]; they conceptually make it possible to construct polar codes of any block length while keeping the polarization effect [16]. The encoding follows the general structure of polar codes, and the decoding can be performed by successive cancellation as well. Building on our previous contributions in [15] and [17], in this paper we present a thorough description and analysis of the construction of multi-kernel polar codes, discussing the issues related to the choice of component kernels. Finally, similarly to [18], we combine the aforementioned designs into a novel hybrid design exhibiting good error correction performance at moderate length.

This paper is organized as follows. In Section II, we present the general construction, including the encoding and decoding, of multi-kernel polar codes. In Section III we provide recommendations for the selection of kernels to be used in the multi-kernel construction. In Section IV we describe explicitly the different designs for multi-kernel polar codes, namely by reliability, by minimum distance, and by a hybrid criterion. In Section V we discuss the performance of the proposed codes, and Section VI concludes this paper.

II. CODE AND DECODER

In this section, we introduce the structure, encoding and decoding of multi-kernel polar codes. As an example, the Tanner graph of a code of length N = 12 is depicted in Fig. 1, comprising two kernels of size 2 and one kernel of size 3.

A. Code Structure and Encoding

Multi-kernel polar codes are a generalization of Arikan's polar codes obtained by using binary kernels of different sizes in the construction of the transformation matrix of the code. An (N, K) multi-kernel polar code of length N and dimension K is defined by an N × N transformation matrix

TN = Tp1 ⊗ Tp2 ⊗ ... ⊗ Tps,   (1)

with N = p1 · p2 · ... · ps, and a frozen set F ⊂ [N], where [N] = {0, 1, 2, ..., N − 1}, such that |F| = N − K. The information set is defined as I = F^C. The building blocks of the code are the pi × pi matrices Tpi with binary entries, which define kernels of dimension pi [19]. A list of binary polarizing kernels with maximum exponents, i.e. maximum polarization, can be found in [7]. However, other kernels may be advantageous for the design of multi-kernel polar codes, as described in Sec. III. The frozen set collects the indices of the input vector to be frozen; its design will be discussed in Section IV. Codewords x ∈ F2^N are generated from the input vector u ∈ F2^N by x = u · TN, where uj = 0 for j ∈ F and the remaining K entries ui for i ∈ I store the information to be transmitted. Note that outer CRCs or similar parity checks may be inserted as for original polar codes. In the following, we will refer to polar codes when the transformation matrix is generated using a single kernel according to the original formulation by Arikan, while we will refer to multi-kernel polar codes if more than one binary kernel is used in the generation of the transformation matrix.

The order of the kernels in (1) is important for the design of the code, as this operation is not commutative. Changing the order of the kernels in TN is equivalent to permuting its rows and columns, since for any Kronecker product there exist two permutation matrices P, Q such that A ⊗ B = P · (B ⊗ A) · Q [20].
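As a concrete sketch of the construction in (1) and of the encoding x = u · TN described above, the transformation matrix of the N = 12 code of Fig. 1 can be assembled with Kronecker products over F2 (a minimal example; the chosen information positions are purely illustrative, not a designed frozen set):

```python
import numpy as np

T2 = np.array([[1, 0], [1, 1]], dtype=int)
T3 = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=int)

def transformation_matrix(kernels):
    """T_N = T_p1 (x) ... (x) T_ps over F2, as in Eq. (1)."""
    T = np.array([[1]], dtype=int)
    for K in kernels:
        T = np.kron(T, K) % 2
    return T

# T12 = T2 (x) T2 (x) T3, as in Fig. 1
T12 = transformation_matrix([T2, T2, T3])
assert T12.shape == (12, 12)

# Encode: frozen positions carry zeros, information bits go to the set I
u = np.zeros(12, dtype=int)
u[[9, 10, 11]] = [1, 0, 1]     # illustrative information positions
x = u @ T12 % 2                # codeword x = u . T_N over F2
print(x.tolist())              # -> [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
```

The all-zero input maps to the all-zero codeword, and any choice of frozen/information positions reuses the same matrix, which is what the designs of Section IV exploit.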
In practice, changing the order of the kernels leads to a transformation matrix whose rows and columns are permuted compared to the original transformation matrix. Every frozen set imposed on the original matrix can hence be mapped to a frozen set of the permuted matrix: all kernel orders lead to equivalent codes. However, the order may have an effect on the polarization of the virtual channels, as discussed in Section IV-A.

B. Tanner Graph

The structure of multi-kernel polar codes can be illustrated by the Tanner graph as depicted in Figure 1. This graph describes the transformation matrix TN of the code, and consists of various pi × pi boxes, each corresponding to a kernel Tpi which defines the relation between the input vector and the output vector. A pi × pi box has pi inputs and pi outputs. A stage of the graph corresponds to a factor in the Kronecker product of the transformation matrix, and is depicted by N/pi boxes vertically distributed, for a total of s stages, counted from the codeword x (right) to the input vector u (left). The connections between stages are implicitly defined by the Kronecker product. Stage i has N/pi boxes, each one representing a kernel Tpi, that are connected to the N/pi−1 boxes of stage i − 1 through an edge permutation πi. These permutations operate in blocks, where two boxes in different blocks are not connected. Denoting by Ni = ∏_{j=1}^{i−1} pj the partial product of the kernel sizes up to stage i, with N1 = 1, we can divide the boxes forming stages i and i − 1 into N/Ni+1 blocks. Inside a block, the boxes of stage i − 1 are further divided into Ni sub-blocks, so that the j-th box of stage i is connected to the j-th output of each sub-block of stage i − 1. This canonical permutation ρi, depicted in Equation (2),

ρi = ( 1   2        ...   Ni              Ni + 1   Ni + 2   ...   ...   Ni·pi
       1   pi + 1   ...   (Ni − 1)pi + 1  2        pi + 2   ...   ...   Ni·pi ),   (2)

where the second row lists the image of each position in the first row (a stride-pi reordering of the Ni+1 = Ni·pi edges), will be used as a basis to create the general permutations between stages. In fact, given the canonical permutation ρi, the permutation πi is given by

πi = (ρi | ρi + Ni+1 | ρi + 2Ni+1 | ... | ρi + (N/Ni+1 − 1)Ni+1)

for i = 2, ..., s. Note that for the last stage, we have πs = ρs. The first permutation π1, acting like the bit-reversal permutation of polar codes, is an exception, obtained by inverting the product of the other permutations as π1 = (π2 · ... · πs)^−1.

C. Decoding

Decoding of multi-kernel polar codes is performed by successive cancellation (SC) [1] on the Tanner graph of the code. Similar to polar codes, enhanced SC-based decoding methods, like simplified SC (SSC) [21], SC list (SCL) [2] or SC stack decoding [22], may be employed. In SC, bits are decoded sequentially using the log-likelihood ratios (LLRs) of the received symbols along with the (estimated) previously decoded bits. If λi corresponds to the LLR of input bit ui, an SC decoder sequentially evaluates

λi = fi^N(l0, l1, ..., lN−1, û0, û1, ..., ûi−1)   (3)

for every i from 0 to N − 1, where li corresponds to the LLR of the code bit xi. In the following, we assume BPSK transmission over an AWGN channel, referred to as the BI-AWGN, denoting by Es the energy of the transmitted symbols and by N0 the single-sided noise power density. With constellation points ±1, the SNR may be given in Es/N0 or in Eb/N0 = 1/R · Es/N0, where R denotes the code rate and σ² = N0/(2Es) is the variance of the AWGN. With y = [y0, ..., yN−1] denoting the output of the BI-AWGN channel, the channel LLRs are computed as li = 2yi/σ².

The recursive structure of the transformation matrix of multi-kernel polar codes, as for polar codes, makes it possible to drastically reduce the decoding complexity by performing the LLR computation on a per-kernel basis. In fact, LLRs can be calculated in the kernel boxes and passed along the Tanner graph of the code from kernel box to kernel box, from right to left, with hard decisions on decoded bits flowing from left to right.
These hard decisions, representing the estimates of the intermediate bits, are used by the kernel boxes to calculate intermediate LLRs.

Fig. 2: A p × p box corresponding to kernel Tp, with inputs (ûi, λi) on the left and outputs (xi, Li) on the right.

The p × p box corresponding to a Tp kernel is depicted in Fig. 2, having u = [u0, u1, ..., up−1] as input vector and x = [x0, x1, ..., xp−1] as output vector. Hard decisions on the output vector are calculated via multiplication by the kernel matrix, as x̂ = û · Tp. The LLR calculation is more involved; denoting by Li the input LLRs and by λi the output LLRs (seen right to left), the SC equation (3) can be simplified as

λi = fi^p(L0, L1, ..., Lp−1, û0, û1, ..., ûi−1),   (4)

taking into account only the LLRs and bit estimates belonging to the present box. The formulation of the LLR update functions fi^p for specific kernels will be discussed in Sec. III.

III. KERNEL ANALYSIS

The polarization phenomenon, originally proved for the matrix T2, has been extended to binary matrices in [5] and to arbitrary finite fields in [19], where sufficient and necessary conditions are provided for a square matrix to polarize. A polarizing matrix is called a kernel, and can be used in the construction of polar codes. Multi-kernel polar codes are based on the generalized polarization effect obtained by mixing kernels of different sizes [16]. Compared to polar codes, our construction makes it possible to exploit the distance properties of the kernels to improve the error correction performance, in particular for small block lengths. Here we analyze the structure and the decoding complexity of large kernels, providing recommendations for the design of kernels for multi-kernel polar codes.

A. Minimum-Distance Spectrum

The speed of polarization of a kernel is evaluated via the polarization exponent [7], calculated through the partial distances of the kernel matrix.
These partial distances are defined as the minimum weights of the sums of a row with any linear combination of the following rows. This notion is based on the nature of SC decoding, and is used to drive the kernel design. Under SCL decoding, however, polar codes show better performance than predicted by the polarization exponent. Moreover, the polarization effect is less important than the distance properties for short codes, and kernels should be designed taking this aspect into account. We conjecture the notion of minimum-distance spectrum [17] to be more effective in these scenarios.

The minimum-distance spectrum S_Tp of a kernel Tp is defined as the mapping from dimension k, k = 1, 2, ..., p, to the largest minimum distance achievable by any (p, k) subcode of Tp, i.e. by any code having k rows of Tp as generator matrix. More formally, if Tp^(R) is the matrix formed by the rows of Tp indexed by R ⊂ [p], and d(A) is the minimum distance of the code generated by the rows of matrix A, then the minimum-distance spectrum of Tp is defined as

S_Tp(k) = max_{|R|=k} d(Tp^(R)),   (5)

for k = 1, ..., p. The optimal row set Rp^k ⊂ [p] collects the indices of the k rows of Tp forming the generator matrix of the optimal (p, k, S_Tp(k)) code extracted from the kernel.

The minimum-distance spectrum can be seen as a generalization of the partial distances. In fact, the partial distances are obtained as the distances determined by selecting the rows bottom up. As a result, the row sets are nested, every row set being a subset of all larger ones. The minimum-distance spectrum relaxes this constraint, allowing for non-nested information sets, and thus providing a new degree of freedom for the overall code design. While the partial distances are conceived for characterizing SC decoding, the minimum-distance spectrum seems to be more effective in portraying SCL decoding.

B. Kernel Decoding

The formulation of the decoding equations (4) for the LLR calculation under SC decoding is of capital importance during kernel design. With reference to Fig. 2, the (input) LLRs Li derive from previous decoding steps or, if the kernel is in the first stage, are the LLRs of the bits xj, Lj = L(xj), calculated from the received symbols; the (output) LLRs λi represent the LLRs of the bits ui, λi = L(ui). They can be expressed using only the input LLRs and the hard decisions on the previously decoded bits u0, ..., ui−1, as

λi = ln( Σ_{x∈X0^(i)} exp( Σ_{t=0}^{p−1} (1 − xt) Lt ) / Σ_{x∈X1^(i)} exp( Σ_{t=0}^{p−1} (1 − xt) Lt ) ),   (6)

where Xa^(i) = {v · G : v = [û0, ..., ûi−1, a, vi+1, ..., vp−1], vj ∈ F2}, a = 0, 1, which corresponds to the marginalization over the unknown bits vi+1, ..., vp−1 [23]. Since |Xa^(i)| = 2^(p−i), computing this expression is in general exponential in the kernel size p, even if it can be simplified by human inspection [24] or by trellis-based decoding of block codes [25]. However, this formulation complicates the calculation of the input-bit reliabilities, restricting the design of the code to Monte-Carlo methods.

In practice, the analysis of the tree of recurrence relations of the graph induced by the kernel matrix may make it possible to discriminate among the different channel observations, rewriting expression (6) using only the basic operations of the LLR algebra [26]: if bits x1, ..., xj are repetitions of bit x0 and the corresponding LLRs Li are based on independent observations, then

L(x0 | L0, ..., Lj) = L0 + ... + Lj.   (7)

On the other hand, if x0, ..., xj are independent bits and the LLRs Li are based on independent observations, then

L(x0 ⊕ ... ⊕ xj) = L0 ⊞ ... ⊞ Lj,   (8)

where the ⊞ operation is defined as

⊞_{t=0}^{j} at ≜ 2 tanh^−1( ∏_{t=0}^{j} tanh(at/2) ) ≈ min_{0≤t≤j} |at| · ∏_{t=0}^{j} sgn(at).

If it is possible to formulate (6) with an expression involving only + and ⊞ operations on the LLRs L0, ..., Lp−1, we say that the decoding equation is expressed in reduced form. In such a case, the complexity becomes linear in p, and the analysis of the kernel polarization can be simplified as explained in the next sections. The reducibility of general decoding equations being an open problem, we conjecture that (6) cannot be expressed in reduced form for all bit positions and all kernels; we suggest approximating irreducible expressions by similar expressions in reduced form.

C. Kernel Examples

The minimum-distance spectrum of the original kernel of size 2,

T2 = [ 1 0
       1 1 ],   (9)

is straightforward: for dimension 1, the second row is used, achieving distance 2, while for dimension 2, both rows are used, achieving distance 1. Thus we have S_T2 = (2, 1), with optimal row sets R2^1 = {1} and R2^2 = {0, 1}; they are nested since R2^1 ⊂ R2^2. As a proof of concept for the different designs presented in Section IV, we introduce the kernels

T3 = [ 1 1 1          T5 = [ 1 1 1 1 1
       1 0 1                 1 0 0 0 0
       0 1 1 ],              1 0 0 1 0
                             1 1 1 0 0
                             0 0 1 1 1 ]   (10)

for the construction of multi-kernel polar codes. We calculate their minimum-distance spectra and their decoding equations, showing their flexibility compared to the kernels presented in [7].

The spectrum of T3 can be calculated as follows. We select the first row (1 1 1) to maximize the minimum distance, giving minimum distance 3 and R3^1 = {0}; any other row selection would result in a smaller minimum distance, namely 2. For a code of dimension K = 2, the last two rows, (1 0 1) and (0 1 1), are selected, generating a code of minimum distance 2 with R3^2 = {1, 2}; any other row selection would result in a smaller minimum distance. Finally, the code of dimension K = 3 requires selecting all rows, giving R3^3 = {0, 1, 2} and resulting in a code of minimum distance 1. The use of non-nested row sets like R3^1 ⊄ R3^2 allows for an improved minimum-distance spectrum S_T3 = (3, 2, 1) compared to [7], while keeping the same polarization rate E = 0.42.
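The minimum-distance spectra of these small kernels can be verified by exhaustive search over the row sets, directly following definition (5); a brute-force sketch (feasible only for small p):

```python
from itertools import combinations

T3 = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]
T5 = [[1, 1, 1, 1, 1], [1, 0, 0, 0, 0], [1, 0, 0, 1, 0],
      [1, 1, 1, 0, 0], [0, 0, 1, 1, 1]]

def min_distance(rows):
    """Minimum weight over all nonzero F2 linear combinations of the rows."""
    k, n = len(rows), len(rows[0])
    best = n
    for mask in range(1, 2 ** k):
        cw = [0] * n
        for i in range(k):
            if (mask >> i) & 1:
                cw = [a ^ b for a, b in zip(cw, rows[i])]
        best = min(best, sum(cw))
    return best

def spectrum(T):
    """S_T(k) = max over |R| = k of d(T^(R)), as in Eq. (5)."""
    p = len(T)
    return tuple(max(min_distance([T[i] for i in R])
                     for R in combinations(range(p), k))
                 for k in range(1, p + 1))

print(spectrum(T3))  # -> (3, 2, 1)
print(spectrum(T5))  # -> (5, 3, 2, 1, 1)
```

The search visits all (p choose k) row subsets and all 2^k combinations per subset, which is exactly the cost that Algorithm 2 in Section IV-B is designed to avoid for larger kernels.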
A similar analysis for T5 leads to the minimum-distance spectrum S_T5 = (5, 3, 2, 1, 1) with optimal row sets R5^1 = {0}, R5^2 = {3, 4}, R5^3 = {2, 3, 4}, R5^4 = {1, 2, 3, 4} and R5^5 = {0, 1, 2, 3, 4}. On the other hand, its polarization rate is E = 0.359, which is worse than the rate of 0.431 achieved by the optimal kernel in [7]. However, the optimal kernel has spectrum (4, 2, 2, 2, 1), limiting the achievable minimum distances of the full code to powers of 2; it is hard to state whether the proposed T5 spectrum is better, since the minimum distance of the full code depends on its dimension and length; however, the proposed spectrum permits a finer quantization of the achievable minimum distances.

Decoding equations in reduced form can be calculated for the presented kernels; the ones for T2 are the well-known

f0^2: λ0 = L0 ⊞ L1,
f1^2: λ1 = (−1)^û0 · L0 + L1.

The ones for T3 given in (10) are

f0^3: λ0 = L0 ⊞ L1 ⊞ L2,
f1^3: λ1 = (−1)^û0 · L0 + (L1 ⊞ L2),
f2^3: λ2 = (−1)^û0 · L1 + (−1)^(û0⊕û1) · L2.

Finally, the decoding equations for T5 given in (10) are

f0^5: λ0 = L1 ⊞ L2 ⊞ L4,
f1^5: λ1 = (−1)^û0 · (L0 ⊞ L3 ⊞ (L2 + (L1 ⊞ L4))),
f2^5: λ2 = (−1)^û1 · (L0 ⊞ L1) + (L3 ⊞ L4),
f3^5: λ3 = (−1)^(û0⊕û1⊕û2) · L0 + (−1)^û0 · L1 + (L2 ⊞ (L3 + L4)),
f4^5: λ4 = (−1)^(û0⊕û3) · L2 + (−1)^(û0⊕û2) · L3 + (−1)^û0 · L4.

All expressions are optimal apart from f2^5. Conjecturing the original expression to be irreducible, we approximate it by a reducible one. This partially affects the decoding performance, but it allows for the analytical determination of the reliabilities to be used for code design, as shown in Section IV-A.

IV. CODE DESIGN

Multi-kernel polar codes introduce new options in the code design, which for the original polar codes is limited to the selection of the information set according to reliabilities.
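The reduced-form equations for T3 listed above can be exercised directly; the sketch below uses the min-sum approximation of ⊞ and recovers u = [1, 0, 1] from the LLRs of its noiseless codeword x = u · T3 = [1, 0, 0] (the LLR magnitude 4.0 is an arbitrary choice):

```python
import math

def bp(a, b):
    """Min-sum approximation of the box-plus operation."""
    return math.copysign(min(abs(a), abs(b)), a * b)

def sc_decode_T3(L):
    """SC decoding of one T3 kernel box via f_0^3, f_1^3, f_2^3."""
    u = []
    lam0 = bp(bp(L[0], L[1]), L[2])
    u.append(0 if lam0 >= 0 else 1)
    lam1 = (-1) ** u[0] * L[0] + bp(L[1], L[2])
    u.append(0 if lam1 >= 0 else 1)
    lam2 = (-1) ** u[0] * L[1] + (-1) ** (u[0] ^ u[1]) * L[2]
    u.append(0 if lam2 >= 0 else 1)
    return u

# x = [1, 0, 0] is the T3 codeword of u = [1, 0, 1]; BPSK LLR sign = 1 - 2x
L = [-4.0, 4.0, 4.0]
print(sc_decode_T3(L))  # -> [1, 0, 1]
```

Each λi uses only the three box LLRs plus the previously decided bits, which is the property that keeps the per-kernel complexity linear once the equations are in reduced form.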
In this section, we describe three design principles for multi-kernel polar codes, called the reliability design, the distance design and the hybrid design, theoretically motivating them and providing a practical design algorithm for each.

A. Reliability Design

The reliability design is based on the polarization phenomenon and aims at minimizing the probability of error under SC decoding. This design is conceived for long codes, where channel polarization is strong enough to discriminate the channels properly. After a brief review of the concept, we will show how to determine the reliabilities, concluding the section with a discussion on the optimal kernel order.

We upper-bound the error probability under SC decoding of an (N, K) multi-kernel polar code with information set I by

Pe^SC ≤ Σ_{i∈I} Pe(ui),   (11)

where Pe(ui) = P(ûi ≠ ui | ûj = uj, j < i) denotes the probability of making a wrong decision for bit ûi assuming that all previously decoded bits are correct. The reliability design aims to minimize this upper bound, so the information set I^R is chosen to contain the K most reliable positions. This is equivalent to finding the K positions minimizing the maximum error probability within these positions; I^R can hence be found as the solution of the optimization problem

min max_{i∈I} Pe(ui)   (12)
s.t. I ⊂ [N], |I| = K.

The simplest way to calculate the reliabilities is to use Monte-Carlo simulation, e.g. to run a genie-aided SC decoder to estimate the error rate of each input bit. In more detail, the all-zero codeword is transmitted over a channel with a target design SNR, and is decoded with a modified SC decoder that counts bit errors based on hard decisions of the LLRs but feeds back the correct decisions. As this method requires a large number of simulations to get stable results, we suggest computing approximate reliabilities of the input bits by density evolution under the Gaussian approximation (DE/GA) [27] instead.
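Once per-bit reliabilities are available (from Monte-Carlo or DE/GA), the selection rule of (12) and the union bound (11) are straightforward to apply; a sketch with made-up LLR means (illustrative values, not the output of a real DE/GA run):

```python
import math

def Q(x):
    """Tail probability of the standard Gaussian distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def reliability_design(mu, K):
    """Pick the K positions with the largest LLR means and evaluate the
    union bound (11) with Pe(u_i) = Q(sqrt(mu_i / 2))."""
    order = sorted(range(len(mu)), key=lambda i: mu[i], reverse=True)
    info = sorted(order[:K])
    bound = sum(Q(math.sqrt(mu[i] / 2.0)) for i in info)
    return info, bound

# Illustrative mean values for N = 8 virtual channels:
mu = [0.3, 5.2, 1.1, 9.8, 0.9, 7.4, 2.6, 14.0]
info, bound = reliability_design(mu, K=4)
print(info)  # -> [1, 3, 5, 7]
```

Selecting by largest mean is equivalent to selecting by smallest error probability because Q(·) is monotonically decreasing in its argument.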
If we assume the LLR distributions to be Gaussian, their variance is twice their mean value, i.e., Li ∼ N(mi, 2mi), which makes it possible to follow their evolution by tracking only their means. For DE/GA, the means are passed through the Tanner graph from right to left, similarly to the LLRs. Looking at the Tanner graph block depicted in Fig. 2, we denote the mean of λi by µi, and the mean of Lj by mj. For a BI-AWGN transmission system, the initial channel LLRs are distributed as li ∼ N(2/σ², 4/σ²), the initial mean value being mi = 2/σ² [28]. Under the Gaussian assumption, the error probability Pe(ui) is in direct correspondence with the LLR mean value µi as Pe(ui) = Q(√(µi/2)), where Q(·) denotes the tail probability of the standard Gaussian distribution; hence the SC error probability can be lower bounded by

Pe^SC ≥ max_{i∈I} Q(√(µi/2)).   (13)

To reduce the complexity of (12), the error probabilities may be replaced by the LLR mean values; I^R can then be determined by solving

max min_{i∈I} µi   (14)
s.t. I ⊂ [N], |I| = K,

where µi denotes the mean value of the LLR of ui.

If the kernel decoding equations (4) are expressed in reduced form, then the equations tracking the LLR means can be written directly: if the LLRs L0, ..., Lj−1 are independent, then

µ(L0 + ... + Lj−1) = m0 + ... + mj−1,   (15)
µ(L0 ⊞ ... ⊞ Lj−1) = ϕj(m0, ..., mj−1),   (16)

where

ϕj(m0, ..., mj−1) = φ^−1( 1 − ∏_{t=0}^{j−1} (1 − φ(mt)) ),   (17)

φ(m) = 1 − 1/√(4πm) · ∫_{−∞}^{+∞} tanh(u/2) · e^(−(u−m)²/(4m)) du.   (18)

We recall that the functions φ and φ^−1 can be approximated as

φ(m) ≈ { e^(am² − bm)     if 0 ≤ m < c,
       { e^(−αm^γ + β)    if m ≥ c,   (19)

φ^−1(m) ≈ { (b − √(b² + 4a·ln m)) / (2a)   if 0 ≤ m < c,
          { ((β − ln m)/α)^(1/γ)           if m ≥ c.   (20)

The parameters α = 0.4527, β = 0.0218, γ = 0.86, a = 0.0564, b = 0.48560, c = 0.867861 are obtained by curve fitting [29]. The decoding equations for the kernels presented above lead to the following evolution of the mean values.
DE/GA for kernel T2:
µ0 = ϕ2(m0, m1),
µ1 = m0 + m1.

DE/GA for kernel T3:
µ0 = ϕ3(m0, m1, m2),
µ1 = m0 + ϕ2(m1, m2),
µ2 = m1 + m2.

DE/GA for kernel T5:
µ0 = ϕ3(m1, m2, m4),
µ1 = ϕ3(m0, m3, m2 + ϕ2(m1, m4)),
µ2 = ϕ2(m0, m1) + ϕ2(m3, m4),
µ3 = m0 + m1 + ϕ2(m2, m3 + m4),
µ4 = m2 + m3 + m4.

The last point to be addressed is the selection of the order of the kernels. The kernel order has two aspects. First, transformation matrices obtained by permuting the same kernels lead to equivalent codes, by conveniently mapping the information sets according to the permutation property of the Kronecker product. However, it is hard to predict the impact of the kernel order on the polarization of the input bits, due to the non-linearity of the function φ. Therefore, codes of the same length and dimension but different kernel orders may differ if the reliability design is performed; as a result, for specific code dimensions, one kernel order may be preferable to the others. We propose to perform an exhaustive search among all possible kernel orders to find the best one. For a code of dimension K, the metric used for the order selection is the sum of the reliabilities of the best K bits, and the kernel order giving the largest sum is retained.

B. Distance Design

In this section, we describe a design for multi-kernel polar codes maximizing the minimum distance of the resulting code [17]. This design is envisaged for short codes, where the polarization effect is not strong enough to prevail over the minimum-distance properties. For an (N, K) multi-kernel polar code, the probability of error under maximum-likelihood (ML) decoding for the AWGN channel is lower bounded as

Pe^ML ≥ Q(√(dµ/2)),   (21)

where d denotes the minimum distance of the code and µ = 2/σ² the mean of the channel LLRs. If the information set I^D is selected to maximize dµ, then (21) is minimized; since µ is fixed, this corresponds to solving the optimization problem

max dN(I)   (22)
s.t. I ⊂ [N], |I| = K,

where dN(I) denotes the minimum distance of the code defined by the rows of TN indexed by I. We solve this problem in two steps: first, we calculate the optimal minimum distance through the minimum-distance spectrum of TN, then we find an information set achieving that distance.

Algorithm 1 Information set for minimum distance
1: Initialize the sets I = ∅ and Rp^0 = ∅
2: Load the vector sN = (2, 1)^⊗n ⊗ S_Tp
3: Load the optimal row sets Rp^1, ..., Rp^p
4: for k = 1 ... K do
5:   l = argmax(sN)
6:   c = l mod p
7:   q = ⌊(N − l − 1)/p⌋
8:   I = (I \ (Rp^c + qp)) ∪ (Rp^(c+1) + qp)
9:   sN(l) = 0
10: end for
11: return I

Though finding the minimum-distance spectrum of a code is in general a complex task, for polar codes it can be easily computed as S_(T2^⊗n) = sort([2 1]^⊗n), where the vector is sorted in descending order [30]. This property can be generalized to multi-kernel polar codes as follows.

Proposition 1. If TN = T2^⊗n ⊗ Tp, then

S_TN = sort(S_(T2^⊗n) ⊗ S_Tp) = sort([2 1]^⊗n ⊗ S_Tp).   (23)

Proof: The property obviously holds for n = 0. By the inductive hypothesis we now suppose that it holds for n − 1, i.e., that S_(TN/2) = sort(S_(T2^⊗(n−1)) ⊗ S_Tp). Defining a^U = S_(TN/2) and a^L = 2·S_(TN/2), such that a = sort([a^U, a^L]), the proposition is proved if a = S_TN, since

sort([S_(TN/2), 2·S_(TN/2)]) = sort([2 1] ⊗ S_(TN/2))
= sort([2 1] ⊗ sort([2 1] ⊗ S_(TN/4)))
= sort([2 1] ⊗ [2 1] ⊗ S_(TN/4)) = ... = sort([2 1]^⊗n ⊗ S_Tp).

Consider a subcode of

TN = [ TN/2   0
       TN/2   TN/2 ]

defined by K rows, with K^U rows from T^U = [TN/2  0] and K^L rows from T^L = [TN/2  TN/2], where K^U + K^L = K; denote the corresponding submatrices of T^U and T^L by T_A^U and T_B^L, respectively.
If the row indices are selected such that d(T_A^U) = a^U(K^U) and d(T_B^L) = a^L(K^L), which is possible by the induction hypothesis, then the minimum distance of this subcode is

d = d([ T_A^U ; T_B^L ]) =(a) min( d(T_A^U), d(T_B^L) ) =(b) min( a^U(K^U), a^L(K^L) ) = a(K),

where (a) follows from the distance property of the (u|u + v) construction [31] and (b) from the sorting of two sorted lists. ∎

Proposition 1 requires the transformation matrix of the code to be of the form TN = T2^⊗n ⊗ Tp. Note that this is not a limitation for the minimum-distance construction, since the minimum-distance spectrum does not depend on the order of the kernels in the Kronecker product. However, this structure makes it possible to divide TN into 2^n sub-matrices of p rows, termed sectors in the following, each one consisting of a vector of Tp kernels and all-zero matrices. Analogously, an information set I of size K can be split into 2^n smaller information sets Iq ⊆ [p], with |Iq| = Kq ≤ p and Σ_q Kq = K, where each Iq collects the rows of the q-th sector included in I. Since the minimum-distance spectrum of the q-th sector is given by Sq = 2^wt(q) · S_Tp, where wt(q) denotes the number of ones in the binary representation of q, this division makes it possible to identify the contribution of each sector to the minimum distance of the code. Due to the distance property of the (u|u + v) construction, dN(I) = min(dp(I1), ..., dp(I2^n)); if Iq is formed by the optimal row set Rp^Kq, then dp(Iq) = Sq(Kq). This concept is exploited in the greedy Algorithm 1 to design multi-kernel polar codes with optimal minimum distance.

The algorithm sequentially adds row indices to the information set I, modifying the information set of dimension K − 1 to obtain the one for dimension K. The vector sN = [S1 | ... | S2^n], formed by the minimum-distance spectra of the individual sectors, is initially calculated as sN = (2, 1)^⊗n ⊗ S_Tp; note that the vector is not sorted.
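Proposition 1 can be checked numerically on a small case, e.g. T6 = T2 ⊗ T3: the sorted vector [2 1] ⊗ S_T3 must coincide with the brute-force minimum-distance spectrum of T6 (a sketch; the brute force simply re-implements definition (5)):

```python
import numpy as np
from itertools import combinations

T2 = np.array([[1, 0], [1, 1]], dtype=int)
T3 = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1]], dtype=int)
T6 = np.kron(T2, T3) % 2

def min_distance(rows):
    """Minimum weight over all nonzero F2 combinations of the rows."""
    k, n = len(rows), len(rows[0])
    best = n
    for mask in range(1, 2 ** k):
        cw = np.zeros(n, dtype=int)
        for i in range(k):
            if (mask >> i) & 1:
                cw = (cw + rows[i]) % 2
        best = min(best, int(cw.sum()))
    return best

def spectrum(T):
    """Exhaustive minimum-distance spectrum, Eq. (5)."""
    p = len(T)
    return [max(min_distance([T[i] for i in R])
                for R in combinations(range(p), k))
            for k in range(1, p + 1)]

S_T3 = [3, 2, 1]                                         # spectrum of T3
predicted = sorted(np.kron([2, 1], S_T3), reverse=True)  # Proposition 1
assert predicted == [6, 4, 3, 2, 2, 1]
assert spectrum(T6) == predicted                         # matches brute force
```

The same comparison can be repeated for any T2^⊗n ⊗ Tp small enough for the exhaustive search.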
At step k, the position l of the largest entry in sN is extracted as b = sN(l), and sN(l) is set to zero. The value b represents the best minimum distance achievable by the code for dimension K; the index q defined in line 7 identifies the sector to be updated to reach that distance, while the index c = l mod p gives the number of rows of sector q already included in I. To increase c by one, the algorithm substitutes the previous optimal row set R_p^c with the next one, R_p^{c+1}, in line 8. The optimal row sets of the individual sectors need to be shifted by qp to be properly included in I. The algorithm stops when I comprises K elements.

Algorithm 1 requires the minimum-distance spectrum S_Tp and the optimal row sets of the kernel Tp. A brute-force calculation may be prohibitive for large kernels, since it requires checking the distances generated by all (p choose k) possible k-row sub-matrices of Tp. However, if Tp = Tp1 ⊗ Tp2, the kernel Tp can be divided into p1 sectors of p2 rows, each one formed by the juxtaposition of Tp2 kernels. Then the set of k row indices given by R_p^k is partitioned according to the sectors, indexed by the set {i1, ..., it}, and within each sector there are kj rows, j = 1, ..., t, with k = Σ_{j=1}^t kj. Given this structure, we propose to limit the search space for the sector indices, and for the index sets within the sectors, to optimal row sets of the component kernels. In more detail, an optimal row set R_{p1}^t = {i1, ..., it} identifies the indices of the sectors that will contribute to R_p^k; for every retained sector ij, an optimal row set R_{p2}^{kj} is included in R_p^k. For each k, all possible combinations of t and kj have to be checked. Algorithm 2 performs this task, comparing the minimum distances of the row sets generated by all integer partitions of k of maximum length p1, i.e., all possible ways of writing k as a sum of up to p1 positive integers not larger than p2. The integer partition ⟨k1, ..., kt⟩, where k = Σ_{i=1}^t ki, unambiguously identifies the row set

R_{⟨k1,...,kt⟩} = ∪_{j=1}^{t} ( R_{p2}^{kj} + ij · p2 ),    (24)

where R_{p1}^t = {i1, ..., it} with i1 < ... < it. In practice, t sectors are included in the row set, whose indices are listed in R_{p1}^t; the j-th retained sector contributes kj rows, chosen according to the optimal row set of kernel Tp2.

Algorithm 2 Spectrum of Kronecker product of kernels
1: Load optimal row sets R_{p1}^1, ..., R_{p1}^{p1}, R_{p2}^1, ..., R_{p2}^{p2}
2: for k = 1 ... p do
3:   S_Tp(k) = 0
4:   κ = ListPartition(k, p1, p2)
5:   for ℓ = 1 ... length(κ) do
6:     ⟨k1, ..., kt⟩ = κ(ℓ)
7:     R = ∪_{j=1}^{t} ( R_{p2}^{kj} + R_{p1}^{t}(j) · p2 )
8:     m = MinDist(Tp(R, :))
9:     if m > S_Tp(k) then
10:      S_Tp(k) = m
11:      R_p^k = R
12:    end if
13:   end for
14: end for
15: return R_p^1, ..., R_p^p, S_Tp

As an example, we show the steps performed by Algorithm 2 to compute the optimal row set R_9^4 for the kernel

T9 = T3 ⊗ T3 =
  1 1 1 1 1 1 1 1 1
  1 0 1 1 0 1 1 0 1
  0 1 1 0 1 1 0 1 1
  1 1 1 0 0 0 1 1 1
  1 0 1 0 0 0 1 0 1
  0 1 1 0 0 0 0 1 1
  0 0 0 1 1 1 1 1 1
  0 0 0 1 0 1 1 0 1
  0 0 0 0 1 1 0 1 1

In this case, only 3 integer partitions of k = 4 respect the required properties, namely ⟨1,3⟩, ⟨2,2⟩ and ⟨1,1,2⟩. The optimal row set associated to ⟨1,3⟩ is given by (24) as

R_{⟨1,3⟩} = ( R_{p2}^1 + 1 · 3 ) ∪ ( R_{p2}^3 + 2 · 3 ) = {3, 6, 7, 8},    (25)

with minimum distance MinDist(T9(R_{⟨1,3⟩}, :)) = 2; similarly, Algorithm 2 calculates R_{⟨2,2⟩} = {4, 5, 7, 8} with minimum distance 4 and R_{⟨1,1,2⟩} = {0, 3, 7, 8} with minimum distance 3. Algorithm 2 hence selects R_9^4 = R_{⟨2,2⟩} = {4, 5, 7, 8}, with S_T9(4) = 4. The complete minimum-distance spectrum of T9, obtained by running Algorithm 2 for k = 1, ..., 9, is S_T9 = (9, 6, 4, 4, 3, 2, 2, 2, 1). Note that this spectrum is optimal, as we verified by an exhaustive search; in general, however, the spectrum achieved by Algorithm 2 may be suboptimal.
C. Hybrid Design

The reliability design is conceived for SC decoding, and is therefore suited for very long codes, where SC decoding becomes asymptotically optimal. The distance design, on the other hand, assumes ML decoding, and is thus suited for short codes under SCL decoding, where moderate list lengths approximate ML decoding very well. The hybrid design combines reliability and distance as design criteria, and is particularly effective for constructing multi-kernel polar codes of medium length under SCL decoding.

To introduce the hybrid design principle, we partition the transformation matrix (1) of a multi-kernel polar code as

TN = TNr ⊗ TNd,    (26)

with TNr = Tp1 ⊗ ... ⊗ Tpψ and TNd = Tpψ+1 ⊗ ... ⊗ Tps. The two matrices TNr and TNd can be treated as transformation matrices of smaller multi-kernel polar codes, of lengths Nr = p1 · ... · pψ and Nd = pψ+1 · ... · ps respectively, with N = Nr · Nd; the corresponding Tanner graph is depicted in Fig. 3. The idea is to apply the distance design to the left part of the graph, consisting of the TNd blocks, and the reliability design to the right part, consisting of the TNr blocks; the indices 'd' and 'r' stand for distance and reliability, respectively. The hybrid design thus comprises a parameter ψ that allows distance to be traded against reliability; the reliability and distance designs are recovered as the extreme cases ψ = s and ψ = 0, respectively.

Consider now the following decoding principle. The normal SC decoder proceeds until all right-messages are available at the input of the first TNd block. This block makes a local ML decision, i.e., it decides for the most likely codeword, and this decision is fed back into the SC decoding process. SC decoding proceeds until all right-messages are available at the input of the second TNd block, which makes a local ML decision and feeds the result back into the decoding process. This continues until the last TNd block has made its decision.
Note that this decoding principle can be approximated by a plain SCL decoder, where larger values of ψ may require larger list sizes to reach ML decoding of the left blocks.

Fig. 3: Tanner graph of TN = TNr ⊗ TNd for the hybrid design.

Consider now the probability of error of the local ML decision at the i-th TNd block, denoted by Pe(ui) for i = 0, 1, ..., Nr − 1, assuming all previous decisions are error-free. Under SC decoding, all incoming messages have the same reliability; imposing a Gaussian approximation on the message densities, we denote the mean of the incoming message density by µi, and the minimum distance of the local code, as imposed by the information set on the TNd block, by di. The probability of error can thus be lower-bounded by

Pe(ui) ≥ Q( sqrt(di µi / 2) ),    (27)

while the error probability of the overall decoder is lower-bounded as

Pe^{ML-SC} ≥ max_{i∈[Nr]} Pe(ui) ≥ max_{i∈[Nr]} Q( sqrt(di µi / 2) ).    (28)

The information set I^H of the hybrid design is selected to minimize this lower bound, namely by solving the optimization problem maximizing the minimum of the terms di µi:

max   min_{i∈[Nr]} di(I) µi
s.t.  I ⊂ [N], |I| = K,    (29)

where di(I) denotes the minimum distance of the code induced by I on the i-th TNd block. This hybrid design minimizes the error rate of the mixed ML-SC decoder described above. Note that both the bound (28) and the information set (29) are computed for a fixed value of ψ ∈ [0, s], which may be adapted to the available list length of the SCL decoder.

The optimization in (29) can be solved by slightly modifying Algorithm 1, introduced for the distance design in the previous section. Initially, the reliabilities of the Nr input bits of the partial transformation matrix TNr are determined using DE/GA, as described in Section IV-A. These reliabilities are stored in an intermediate vector µ = (µNr−1, ..., µ0), where µi represents the reliability of the i-th input bit of the code generated by TNr. At the same time, the minimum-distance spectrum of the partial transformation matrix TNd is computed, along with the optimal row sets, using the methods for the distance design described in Section IV-B. Finally, the vector dN = µ ⊗ S_TNd, representing the "hybrid" spectrum of TN, is calculated and given to Algorithm 1, along with the previously calculated optimal row sets, to design the information set of the code.

D. Construction Example

We illustrate the proposed designs through a multi-kernel polar code of length N = 12 and dimension K = 4 with transformation matrix T12 = T2^⊗2 ⊗ T3, depicted in Fig. 1, i.e.,

T12 =
  1 1 1 0 0 0 0 0 0 0 0 0
  1 0 1 0 0 0 0 0 0 0 0 0
  0 1 1 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 1 0 0 0 0 0 0
  1 0 1 1 0 1 0 0 0 0 0 0
  0 1 1 0 1 1 0 0 0 0 0 0
  1 1 1 0 0 0 1 1 1 0 0 0
  1 0 1 0 0 0 1 0 1 0 0 0
  0 1 1 0 0 0 0 1 1 0 0 0
  1 1 1 1 1 1 1 1 1 1 1 1
  1 0 1 1 0 1 1 0 1 1 0 1
  0 1 1 0 1 1 0 1 1 0 1 1

for a BI-AWGN channel with inputs ±1 and σ² = 0.5.

1) Reliability design: Using DE/GA, the reliabilities are calculated as (0.09, 1.28, 2, 1.85, 7.3, 9.12, 2.75, 9.57, 11.56, 11.94, 29.42, 32). The K = 4 most reliable positions form the information set I^R = {8, 9, 10, 11}.

2) Distance design: The minimum-distance spectrum and the optimal row sets of the kernel T3 depicted in (10) are S_T3 = (3, 2, 1) with R_3^1 = {0}, R_3^2 = {1, 2} and R_3^3 = {0, 1, 2}; the auxiliary vector is calculated as d12 = (2, 1)^⊗2 ⊗ (3, 2, 1) = (12, 8, 4, 6, 4, 2, 6, 4, 2, 3, 2, 1). Algorithm 1 then gives I^D = {3, 6, 10, 11} for K = 4.

3) Hybrid design: The transformation matrix T12 includes s = 3 kernels, thus ψ ∈ {0, 1, 2, 3}. The hybrid design reduces to the reliability design for ψ = 3 and to the distance design for ψ = 0. For ψ = 2, we have TNr = T2^⊗2 and TNd = T3.
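The ψ = 2 selection can be sketched in code. This minimal sketch reuses the greedy principle of Algorithm 1 with the DE/GA means of the T2^⊗2 inputs replacing the sector weights of the pure distance design; the values below (listed here from µ0 to µ3) and the T3 spectrum are the ones used in this example, and the tie-breaking rule is an assumption of the sketch.

```python
# Sketch of the hybrid selection for the (12, 4) example with psi = 2:
# sectors correspond to the Nr = 4 input bits of T2^(x)2, each feeding a
# T3 block. The DE/GA means (natural bit order mu_0..mu_3) and the T3
# spectrum/row sets are the values used in the paper's example.
mu = [1.0, 4.56, 5.78, 16.0]
S_T3 = [3, 2, 1]
R_T3 = {1: [0], 2: [1, 2], 3: [0, 1, 2]}

def hybrid_design(mu, S, R, K):
    """Greedy hybrid design: grow the information set by always extending
    the sector q maximizing the next achievable metric mu_q * S(c_q + 1)."""
    p = len(S)
    taken = [0] * len(mu)                 # rows used per sector
    for _ in range(K):
        q = max((q for q in range(len(mu)) if taken[q] < p),
                key=lambda q: mu[q] * S[taken[q]])
        taken[q] += 1
    return sorted(r + q * p for q, c in enumerate(taken) if c for r in R[c])

print(hybrid_design(mu, S_T3, R_T3, 4))   # [6, 9, 10, 11]
```

With these inputs the sketch returns I^H = {6, 9, 10, 11}, the information set obtained for ψ = 2 in this example.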
The reliabilities of TNr are determined by DE/GA, resulting in µ = (16, 5.78, 4.56, 1), while S_T3 has been determined above. Thus we obtain the mixed spectrum s12 = µ ⊗ S_T3 = (48, 32, 16, 17.34, 11.56, 5.78, 13.68, 9.12, 4.56, 3, 2, 1) for the transformation matrix T12. This vector, along with the optimal row sets for T3, is used as input to Algorithm 1, giving the information set I^H = {6, 9, 10, 11} for K = 4. For ψ = 1 instead, we have TNr = T2 and TNd = T2 ⊗ T3, with vector µ = (8, 2.28). The minimum-distance spectrum of TNd can be calculated by Algorithm 2 as S_{T2⊗T3} = (6, 4, 3, 2, 2, 1), with R_6^1 = {3}, R_6^2 = {4, 5}, R_6^3 = {0, 4, 5}, R_6^4 = {0, 3, 4, 5}, R_6^5 = {1, 2, 3, 4, 5} and R_6^6 = {0, 1, 2, 3, 4, 5}. Algorithm 1 then takes as input d12 = µ ⊗ S_{T2⊗T3} = (48, 32, 16, 24, 16, 8, 13.68, 9.12, 4.56, 6.84, 4.56, 2.28) and provides the information set I^H = {6, 9, 10, 11}.

The four designs are summarized in Table I. As expected, the distance design leads to the best minimum distance of 6, while the other designs yield a minimum distance of 4. In this case, ψ = 1 and ψ = 2 result in the same information set, which is however different from the one of the reliability design.

ψ          I                 min. dist.
3 (rel.)   {8, 9, 10, 11}    4
2          {6, 9, 10, 11}    4
1          {6, 9, 10, 11}    4
0 (dist.)  {3, 6, 10, 11}    6

TABLE I: Information sets and minimum distances for the (12, 4) codes.

V. NUMERICAL EXAMPLES

In this section we evaluate the performance of the proposed multi-kernel polar codes under the different designs. All simulations determine the block error rate (BLER) of the codes for BI-AWGN channels under SCL decoding, usually with list size L = 8. To begin with, we evaluate the impact of the kernel order in the multi-kernel construction. Next, the impact of the parameter ψ of the hybrid design is studied. Then we compare multi-kernel polar codes to punctured and shortened polar codes of the same length and dimension.
Finally, we compare multi-kernel polar codes with standard codes of the same length and dimension, namely LDPC codes from 802.11n [32] and the polar codes of 5G NR [4].

Fig. 4: BLER performance of (192, 96) multi-kernel polar codes under reliability design for different positions of the unique T3 kernel and list size L = 8.

Figure 4 shows the BLER performance of multi-kernel polar codes of length N = 192 = 2^6 · 3 and rate R = 1/2, designed according to reliability, under SCL decoding with list size L = 8. The transformation matrix of this code is constructed by mixing six T2 kernels and a single T3 kernel. There are hence 7 possible configurations of (1), depending on the position of the T3 kernel in the Kronecker product. According to Figure 4, the performance gap between the best and the worst design for this code is about 0.5 dB. In particular, the best performance is obtained for p2 = 3, i.e., when T3 is placed in second position, while the worst performance is attained by switching the first two kernels of the best design, i.e., p1 = 3. Note also that the slope of the curves differs, which is due to differences in their distance properties. The metric proposed in Section IV-A for the selection of the kernel order suggests setting p3 = 3, resulting in average performance at low SNR but quickly approaching the performance of p2 = 3 at higher SNR. Overall, simulations confirm that the kernel order has some impact on the performance of multi-kernel polar codes under reliability design; no systematic behavior could be identified at this stage.

Fig. 5: BLER performance of (384, 192) multi-kernel polar codes under hybrid design for different values of ψ and L = 8.

Figure 5 shows the BLER performance of multi-kernel polar codes of length N = 384 = 2^7 · 3 and rate R = 1/2 under hybrid design for different values of the parameter ψ introduced in Section IV-C. All simulations are performed under SCL decoding with list size L = 8. In this case, the number of kernels composing the transformation matrix is s = 8, and the single T3 kernel is placed in the last position, i.e., p8 = 3. The parameter ψ can hence span from 0 to 8, where the extreme cases ψ = 0 and ψ = 8, corresponding to the distance and reliability designs respectively, are highlighted with different colors. As expected for mid-length codes, the distance design performs worse than the reliability design; however, the code is not long enough to exhibit strong polarization, and in this case the hybrid design offers an advantage over the other two designs due to its higher flexibility. Simulations show that the performance strongly depends on the choice of ψ; a clear pattern is not recognizable and needs further study. In the following, we set ψ = ⌈(s − 1)/2⌉ as a rule of thumb for the hybrid design.

Figure 6 shows a comparison between the presented multi-kernel polar codes under various designs and rate-matched polar codes of rate R = 1/2 under SCL decoding with L = 8. Polar codes of length N are generated from a mother polar code of length M = 2^⌈log2(N)⌉ through puncturing according to [10] and shortening according to [11]. The information sets of the mother polar codes are calculated by either puncturing the first M − N bits or shortening the last M − N bits and computing bit reliabilities through DE/GA [13]. Code lengths are selected to cover a wide range of possibilities. In Figure 6a the code length is set to N = 144 = 3^2 · 2^4. This length can be reached by multi-kernel polar codes with two T3 kernels and four T2 kernels, while it demands strong puncturing/shortening by M − N = 112 bits to apply rate-matching from a mother polar code of length M = 256.
Figure 6b shows the performance of codes of length N = 200 = 5^2 · 2^3, reached by multi-kernel polar codes using two T5 kernels in conjunction with T2 kernels. Finally, Figure 6c mixes all the presented kernels, showing the performance of codes of length N = 90 = 5 · 3^2 · 2.

Fig. 6: BLER performance of multi-kernel polar codes compared with punctured [10] and shortened [11] polar codes of rate R = 1/2 under SCL decoding with L = 8. (a) N = 144; (b) N = 200; (c) N = 90.

Overall, we can see that the reliability design of multi-kernel polar codes behaves similarly to the best rate-matching strategy among puncturing and shortening. The sub-optimality of the T5 LLR equation f25 has an impact on the performance of the reliability design for short codes, as shown in Figure 6b; the other designs are not impacted excessively by this issue and exhibit good performance in this case. Moreover, the distance design outperforms the other designs for short codes, where the minimum distance still has more impact than the polarization effect; the hybrid design permits a trade-off between polarization and distance, always outperforming rate-matched polar codes.

Fig. 7: BLER performance comparison among multi-kernel polar codes with 8 CRC bits, LDPC codes in [32] and 5G polar codes [4] for length N = 1944.

Figure 7 shows a comparison among multi-kernel polar codes, LDPC codes of the 802.11n standard [32] and the polar codes standardized in 5G [4]. The 802.11n standard specifies the three code lengths 1944, 1296 and 648, and the four code rates 1/2, 2/3, 3/4 and 5/6; we show simulation results for N = 1944 and all four admissible rates. Multi-kernel polar codes are designed according to reliability, with the addition of 10 CRC bits to help SCL decoding [2]. The list size is set to L = 8 for both multi-kernel polar codes and 5G polar codes; LDPC codes are decoded using an offset min-sum decoder with 10 iterations. Results show that the proposed multi-kernel polar codes are comparable to state-of-the-art channel codes.

VI. CONCLUSIONS

In this paper, we proposed a generalized polar code construction based on multiple kernels, termed multi-kernel polar codes. Though encoding and decoding resemble those of the original polar codes, as proposed by Arikan, multi-kernel polar codes provide various new design options. We presented new code design principles based on reliability, distance, and a mix of the two as design criteria, coined the hybrid design, which allows the design to be adapted to a given list length of the SCL decoder. The error-rate performance of multi-kernel polar codes was evaluated by simulations, showing it to be superior to state-of-the-art polar-code constructions using puncturing or shortening methods, and to state-of-the-art LDPC codes.

This paper focused on the information set design of multi-kernel polar codes. The design of optimal kernels and the optimization of the hybrid design are not addressed here and are left for future research; the presented tools for analysis and design are believed to be useful for these purposes.

REFERENCES

[1] E. Arikan, "Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] I. Tal and A.
Vardy, "List decoding of polar codes," in IEEE International Symposium on Information Theory (ISIT), St. Petersburg, Russia, July 2011.
[3] K. Niu and K. Chen, "CRC-aided decoding of polar codes," IEEE Communications Letters, vol. 16, no. 10, pp. 1668–1671, 2012.
[4] V. Bioglio, C. Condo, and I. Land, "Design of polar codes in 5G New Radio," arXiv preprint arXiv:1804.04389, April 2018.
[5] S. B. Korada, E. Sasoglu, and R. Urbanke, "Polar codes: Characterization of exponent, bounds, and constructions," IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 6253–6264, Dec. 2010.
[6] N. Presman, O. Shapira, S. Litsyn, T. Etzion, and A. Vardy, "Binary polarization kernels from code decompositions," IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2227–2239, May 2015.
[7] H.-P. Lin, S. Lin, and K. Abdel-Ghaffar, "Linear and nonlinear binary kernels of polar codes of small dimensions with maximum exponents," IEEE Transactions on Information Theory, vol. 61, no. 10, pp. 5253–5270, Oct. 2015.
[8] R. Mori and T. Tanaka, "Non-binary polar codes using Reed-Solomon codes and algebraic geometry codes," in IEEE Information Theory Workshop (ITW), Dublin, Ireland, September 2010.
[9] N. Presman, O. Shapira, and S. Litsyn, "Mixed-kernels constructions of polar codes," IEEE Journal on Selected Areas in Communications, vol. 34, no. 2, pp. 239–253, 2016.
[10] K. Niu, K. Chen, and J.-R. Lin, "Beyond turbo codes: Rate-compatible punctured polar codes," in IEEE International Conference on Communications (ICC), Budapest, Hungary, June 2013.
[11] R. Wang and R. Liu, "A novel puncturing scheme for polar codes," IEEE Communications Letters, vol. 18, no. 12, pp. 2081–2084, Dec. 2014.
[12] L. Zhang, Z. Zhang, X. Wang, Q. Yu, and Y. Chen, "On the puncturing patterns for punctured polar codes," in IEEE International Symposium on Information Theory (ISIT), Hawaii, U.S.A., July 2014.
[13] V. Bioglio, F. Gabry, and I.
Land, "Low-complexity puncturing and shortening of polar codes," in IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, USA, March 2017.
[14] M. K. Lee and K. Yang, "The exponent of a polarizing matrix constructed from the Kronecker product," Designs, Codes and Cryptography, vol. 70, no. 3, pp. 313–322, March 2014.
[15] F. Gabry, V. Bioglio, I. Land, and J.-C. Belfiore, "Multi-kernel construction of polar codes," in IEEE International Conference on Communications (ICC), Paris, France, May 2017.
[16] M. Benammar, V. Bioglio, F. Gabry, and I. Land, "Multi-kernel polar codes: Proof of polarization and error exponents," in IEEE Information Theory Workshop (ITW), Kaohsiung, Taiwan, Nov. 2017.
[17] V. Bioglio, F. Gabry, I. Land, and J.-C. Belfiore, "Minimum-distance based construction of multi-kernel polar codes," in IEEE Global Communications Conference (GLOBECOM), Singapore, Dec. 2017.
[18] M. Mondelli, S. H. Hassani, and R. L. Urbanke, "From polar to Reed-Muller codes: a technique to improve the finite-length performance," IEEE Transactions on Communications, vol. 62, no. 9, pp. 3084–3091, 2014.
[19] R. Mori and T. Tanaka, "Channel polarization on q-ary discrete memoryless channels by arbitrary kernels," in IEEE International Symposium on Information Theory (ISIT), Austin, Texas, USA, June 2010.
[20] D. S. Bernstein, Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory, Princeton University Press, 2005.
[21] A. Alamdar-Yazdi and F. R. Kschischang, "A simplified successive-cancellation decoder for polar codes," IEEE Communications Letters, vol. 15, no. 12, pp. 1378–1380, 2011.
[22] K. Niu and K. Chen, "Stack decoding of polar codes," Electronics Letters, vol. 48, no. 12, pp. 695–697, 2012.
[23] G. Bonik, S. Goreinov, and N. Zamarashkin, "Construction and analysis of polar and concatenated polar codes: practical approach," arXiv preprint arXiv:1207.4343, July 2012.
[24] Z. Huang, S. Zhang, F. Zhang, C.
Duanmu, and M. Chen, "On the successive cancellation decoding of polar codes with arbitrary binary linear kernels," arXiv preprint arXiv:1701.03264, Jan. 2017.
[25] H. Griesser and V. R. Sidorenko, "A posteriori probability decoding of nonsystematically encoded block codes," Problems of Information Transmission, vol. 38, no. 3, pp. 182–193, Mar. 2002.
[26] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Transactions on Information Theory, vol. 42, no. 2, pp. 429–445, 1996.
[27] R. Mori and T. Tanaka, "Performance of polar codes with the construction using density evolution," IEEE Communications Letters, vol. 13, no. 7, pp. 519–521, July 2009.
[28] S. Y. Chung, T. J. Richardson, and R. L. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.
[29] J. Ha, J. Kim, and S. W. McLaughlin, "Rate-compatible puncturing of low-density parity-check codes," IEEE Transactions on Information Theory, vol. 50, no. 11, pp. 2824–2836, 2004.
[30] N. Hussami, S. B. Korada, and R. Urbanke, "Performance of polar codes for channel and source coding," in IEEE International Symposium on Information Theory (ISIT), Seoul, Korea, June 2009.
[31] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, vol. 16, Elsevier, 1977.
[32] "IEEE standard for information technology - local and metropolitan area networks - specific requirements - part 11: Wireless LAN medium access control (MAC) and physical layer (PHY) specifications," Mar 2012.