Refined Rate of Channel Polarization

Toshiyuki Tanaka and Ryuhei Mori
Graduate School of Informatics, Kyoto University, Kyoto-shi, Kyoto, 606-8501 Japan.
e-mail: [email protected], [email protected]

arXiv:1001.2067v1 [cs.IT] 13 Jan 2010

Abstract—A rate-dependent upper bound of the best achievable block error probability of polar codes with successive-cancellation decoding is derived.

I. INTRODUCTION

Channel polarization [1] is a method for constructing a family of error-correcting codes, called polar codes. Polar codes have attracted theoretical interest because they achieve the capacity of binary-input symmetric memoryless channels (B-SMCs) and, more generally, the symmetric capacity of binary-input memoryless channels (B-MCs), while the computational complexity of encoding and decoding is polynomial in the block length. Soon after the first proposal [1], a number of contributions regarding channel polarization and polar codes appeared in the literature [2]–[10].

Of particular theoretical interest is the analysis of how fast the best achievable block error probability $P_e$ of polar codes decays toward zero as the block length $N$ tends to infinity. Arıkan [1] has shown that $P_e$ tends to zero as $N \to \infty$ whenever the code rate $R$ is less than the symmetric capacity of the underlying channel. The upper bound he obtained is proportional to a negative power of $N$, so the guaranteed speed of convergence to zero is very slow. His result was subsequently improved by Arıkan and Telatar [4], who obtained a much tighter upper bound, scaling as $2^{-N^\beta}$ for any $\beta < 1/2$. Neither of these bounds, however, depends on the code rate $R$. A rate-dependent bound is more desirable, since one naturally expects a smaller error probability at a smaller code rate, which in turn suggests that the rate-independent bounds are not tight.

In this paper, we present an analysis of the rate of channel polarization. The argument basically follows that of Arıkan and Telatar [4], but extends it to obtain rate-dependent bounds on the best achievable error probability.

II. PROBLEM

Let $W : \mathcal{X} \to \mathcal{Y}$ be an arbitrary binary-input memoryless channel (B-MC) with input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y}$, and channel transition probabilities $\{W(y|x) : x \in \mathcal{X},\, y \in \mathcal{Y}\}$. Let $I(W)$ be the symmetric capacity of $W$, defined as the mutual information between the input and output of $W$ when the input is uniformly distributed over $\mathcal{X}$. It is an upper bound on the rates achievable over $W$ by codes that use the two input symbols with equal frequency. Let the Bhattacharyya parameter $Z(W)$ of the channel $W$ be defined as

  $Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)}$.

It is an upper bound on the maximum-likelihood estimation error for a single use of the channel.

Polar codes are constructed via recursive application of a channel combining and splitting operation. In this operation, two independent copies of a channel $W$ are combined and then split to generate two different channels $W^- : \mathcal{X} \to \mathcal{Y}^2$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}$. The operation, in its most basic form, is defined as

  $W^-(y_1, y_2 \,|\, x_1) = \sum_{x_2 \in \mathcal{X}} \frac{1}{2} W(y_1 \,|\, (xF)_1)\, W(y_2 \,|\, (xF)_2)$,  (1)
  $W^+(y_1, y_2, x_1 \,|\, x_2) = \frac{1}{2} W(y_1 \,|\, (xF)_1)\, W(y_2 \,|\, (xF)_2)$,  (2)

with

  $F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$, $\quad x = (x_1, x_2)$.

It has been shown [1] that

  $Z(W^+) = Z(W)^2$, $\qquad Z(W) \le Z(W^-) \le 2 Z(W) - Z(W)^2$.  (3)
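As a concrete illustration (an addition, not part of the original paper), the transform (1)–(2) and the Bhattacharyya relations (3) can be checked numerically. The minimal Python sketch below represents a channel as a map from output symbols to the pair $(W(y|0), W(y|1))$; the channel model (a BEC with erasure probability 0.3) and all names are illustrative assumptions. For a BEC both relations in (3) hold with equality.

```python
import itertools
import math

def bhattacharyya(W):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1)); W maps y -> (W(y|0), W(y|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in W.values())

def minus(W):
    """W^-(y1, y2 | x1) = (1/2) sum_{x2} W(y1 | x1^x2) W(y2 | x2)."""
    return {
        (y1, y2): tuple(
            0.5 * sum(W[y1][x1 ^ x2] * W[y2][x2] for x2 in (0, 1))
            for x1 in (0, 1)
        )
        for y1, y2 in itertools.product(W, W)
    }

def plus(W):
    """W^+(y1, y2, x1 | x2) = (1/2) W(y1 | x1^x2) W(y2 | x2)."""
    return {
        (y1, y2, x1): tuple(0.5 * W[y1][x1 ^ x2] * W[y2][x2] for x2 in (0, 1))
        for y1, y2 in itertools.product(W, W)
        for x1 in (0, 1)
    }

# Binary erasure channel with erasure probability 0.3: outputs 0, 1, 'e'.
eps = 0.3
bec = {0: (1 - eps, 0.0), 1: (0.0, 1 - eps), 'e': (eps, eps)}

z = bhattacharyya(bec)                           # Z(W) = eps for a BEC
print(bhattacharyya(plus(bec)), z * z)           # Z(W^+) = Z(W)^2
print(bhattacharyya(minus(bec)), 2 * z - z * z)  # Z(W^-) = 2Z(W) - Z(W)^2
```

For channels other than the BEC, the printed $Z(W^-)$ falls strictly between the two bounds in (3).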
In constructing polar codes, we recursively generate channels with the channel combining and splitting operation, starting from the given channel $W$, as

  $W \to \{W^-, W^+\} \to \{W^{--}, W^{-+}, W^{+-}, W^{++}\} \to \{W^{---}, W^{--+}, W^{-+-}, W^{-++}, W^{+--}, W^{+-+}, W^{++-}, W^{+++}\} \to \cdots$,  (4)

where we have adopted the shorthand notation $W^{--} = (W^-)^-$, etc. Following Arıkan [1], this process of recursive generation of channels can be dealt with by introducing a channel-valued stochastic process, defined as follows. Let $\{B_1, B_2, \ldots\}$ be a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with $P(B_1 = 0) = P(B_1 = 1) = 1/2$. Given a channel $W$, we define a sequence of channel-valued random variables $\{W_0, W_1, \ldots\}$ as

  $W_0 = W$, $\quad W_{n+1} = W_n^+$ if $B_{n+1} = 1$, $\quad W_{n+1} = W_n^-$ if $B_{n+1} = 0$.  (5)

(The convention that $B_{n+1} = 1$ corresponds to the plus transform is used consistently with (13), (14), and (25) below.) We also define a real-valued random process $\{Z_0, Z_1, \ldots\}$ via $Z_n = Z(W_n)$.

Conceptually, a polar code is constructed by picking the channels of good quality among the $N = 2^n$ realizations of $W_n$. The selected channels are used for transmitting data, while predetermined values are transmitted over the remaining, unselected channels. Thus, the rate of the resulting polar code is $R$ if we pick $NR$ channels. We are interested in the performance of polar codes under successive cancellation (SC) decoding, which is defined in [1]. Let $P_e(N, R)$ be the best achievable block error probability of polar codes of block length $N$ and rate $R$ under SC decoding. Since the Bhattacharyya parameter $Z(W)$ serves as an upper bound on the bit error probability in each step of SC decoding, an inequality of the form

  $P(Z_n \le \gamma) \ge R$  (6)

implies $P_e(N, R) \le N R \gamma$ via the union bound. It has been proved [1] that, for any $R < I(W)$, there exists a polar code of block length $N = 2^n$ whose block error probability $P_e(N, R)$ is arbitrarily close to 0. The proof is based on showing that condition (6) holds with $\gamma \in o(N^{-1})$.

III. MAIN RESULT

The main contribution of this paper is the following theorem, which improves the results of [1], [4] by giving a rate-dependent upper bound of the block error probability.

Theorem 1: Let $W$ be any B-MC with $I(W) > 0$, and let $R \in (0, I(W))$ be fixed. Then, for $N = 2^n$, $n \in \mathbb{N}$, the best achievable block error probability $P_e(N, R)$ satisfies

  $P_e(N = 2^n, R) = o\bigl( 2^{-2^{(n + t\sqrt{n})/2}} \bigr)$  (7)

for any $t$ satisfying $t < Q^{-1}(R / I(W))$, where $Q(x) = \int_x^{\infty} e^{-u^2/2}\, \mathrm{d}u / \sqrt{2\pi}$.
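To get a feel for the bound (7), the short numerical aside below (not in the paper) evaluates the admissible $t$ and the resulting double exponent for example values of $I(W)$, $R$, and $n$. All numbers are arbitrary assumptions; $Q^{-1}$ is obtained from the standard normal quantile in the Python standard library.

```python
import math
from statistics import NormalDist

# Theorem 1: P_e = o(2^{-2^{(n + t sqrt(n))/2}}) for any t < Q^{-1}(R/I(W)).
def Q_inv(p: float) -> float:
    return NormalDist().inv_cdf(1.0 - p)    # Q(x) = 1 - Phi(x)

I_W, R = 0.5, 0.15                          # hypothetical channel and rate
t_max = Q_inv(R / I_W)                      # any t < t_max is admissible
t = t_max - 0.05

for n in (8, 12, 16, 20):
    e_refined = (n + t * math.sqrt(n)) / 2  # refined double exponent
    print(n, round(t_max, 3), round(e_refined, 2), n / 2)
# For R < I(W)/2 one has t_max > 0, so the guaranteed double exponent
# exceeds the rate-independent beta*n (beta < 1/2) of the earlier bound
# 2^{-2^{beta n}} of [4]; for R > I(W)/2, t_max is negative.
```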
IV. PROOF

A. Outline

The proof basically follows that of Arıkan and Telatar [4], but extends it in several respects. It consists of three stages, which we call polarization, concentration, and bootstrapping. In the first stage, it is argued that realizations of $Z_n$ lie in $(0, \zeta]$ for some $\zeta > 0$ with probability arbitrarily close to $I(W)$ as $n$ becomes large; this corresponds to the fundamental result of channel polarization [1]. In the second stage, concentration is argued: again with probability arbitrarily close to $I(W)$ as $n$ gets large, realizations of $Z_n$ lie in $(0, f_n]$ for some $f_n$ approaching zero exponentially in $n$. In the last stage, we argue that, once $Z_m$ for some $m$ enters the interval $(0, f_n]$, the sequence $Z_{m+1}, \ldots, Z_n$ decreases rapidly with overwhelming probability; this is a refinement of the "bootstrapping argument" of [4]. The last stage is further divided into two substages, the rate-independent and the rate-dependent bootstrapping stages, the latter of which is crucial for capturing the dependence on the code rate.

B. Preliminaries

For $m, n \in \mathbb{N}$ with $m < n$, define

  $S_{m,n} = \sum_{i=m+1}^{n} B_i$,  (8)

which follows a binomial distribution, since it is a sum of i.i.d. Bernoulli random variables.

Definition 1: For a fixed $\gamma \in [0, 1]$, let $G_{m,n}(\gamma)$ be the event $G_{m,n}(\gamma) = \{ S_{m,n} \ge \gamma (n - m) \}$.

From the law of large numbers,

  $\lim_{n - m \to \infty} P(G_{m,n}(\gamma)) = 1$  (9)

holds if $\gamma < 1/2$.

C. Random Process

We now consider a random process $X_n \in [0, 1]$ satisfying the following properties.

1) $X_n$ converges to a random variable $X_\infty$ almost surely.
2) Conditional on $X_n$, if $X_n \ne 0, 1$, then

  $X_{n+1} = X_n^2$ if $B_{n+1} = 1$, $\qquad X_{n+1} \in [X_n, q X_n]$ if $B_{n+1} = 0$,

for a constant $q \ge 1$; if $X_n = 0$ or $1$, then $X_{n+1} = X_n$ with probability 1.

Equation (3) implies that the random process $Z_n$ satisfies the above properties with $q = 2$. It should be noted that properties 1 and 2 imply $P(X_\infty \in \{0, 1\}) = 1$.

Definition 2: For $\zeta \in (0, 1)$ and $n \in \mathbb{N}$, define the event $T_n(\zeta) = \{ X_i \le \zeta \ \text{for all } i \ge n \}$.

The following lemma is an immediate consequence of the above definition.

Lemma 1: For any fixed $\zeta \in (0, 1)$,

  $\lim_{n \to \infty} P(T_n(\zeta)) = P(X_\infty = 0)$.

D. Concentration

For large enough $n$, one can expect that $X_n$ is exponentially small in $n$ with probability arbitrarily close to $P(X_\infty = 0)$; in other words, a $P(X_\infty = 0)$-fraction of the realizations of $X_n$ "concentrates" toward zero. To formalize this statement, we introduce the following definition.

Definition 3: Let $\rho \in (0, 1)$ and $\beta \in (0, 1/2)$. The events $C_n(\rho)$ and $D_n(\beta)$ are defined as

  $C_n(\rho) = \{ X_n \le \rho^n \}$,  (10)
  $D_n(\beta) = \{ X_n \le 2^{-2^{\beta n}} \}$,  (11)

respectively. We will first prove that the event $C_n$ has probability arbitrarily close to $P(X_\infty = 0)$ as $n$ tends to infinity, on the basis of which we will then prove the same for the event $D_n(\beta)$. The result for the event $C_n$ is stated in the following proposition; the result for $D_n$ is proved in the bootstrapping stage.

Proposition 1: For an arbitrary fixed $\rho \in (0, 1)$, let $C_n(\rho)$ be the event defined in (10). Then

  $\lim_{n \to \infty} P(C_n(\rho)) = P(X_\infty = 0)$.

The proof is essentially the same as that of Theorem 2 in [1], and is omitted due to space limitations.
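The concentration statement can be observed in simulation (an added illustration, not from the paper). For a BEC the recursion (3) holds with equality, so the process $\{Z_n\}$ can be sampled exactly; the sketch below tracks $y_n = -\log_2 Z_n$ to sidestep floating-point underflow, and all parameters are arbitrary example values.

```python
import math
import random

# Monte Carlo sketch of the process {Z_n} for a BEC(eps), for which (3)
# is exact: Z -> Z^2 when B = 1 and Z -> 2Z - Z^2 when B = 0. We
# estimate P(D_n(beta)) for the concentration event (11); the estimate
# should approach I(W) = 1 - eps as n grows (convergence is slow).
random.seed(0)
eps, n, beta, trials = 0.3, 40, 0.3, 50_000

hits = 0
for _ in range(trials):
    y = -math.log2(eps)                 # y = -log2 Z
    for _ in range(n):
        if random.random() < 0.5:       # plus transform: Z -> Z^2
            y *= 2.0
        else:                           # minus transform: Z -> Z(2 - Z)
            z = 2.0 ** (-y)             # underflows harmlessly to 0.0
            y -= math.log2(2.0 - z)
    if y >= 2.0 ** (beta * n):          # i.e. Z_n <= 2^{-2^{beta n}}
        hits += 1

print(hits / trials, "vs I(W) =", 1 - eps)
```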
E. Bootstrapping: Rate-Independent Stage

Once a realization of $X_m$ becomes small enough for some $m \ll n$, one can ensure, with probability very close to 1, that samples generated conditionally on that realization of $X_m$ converge to zero exponentially fast. This is the basic idea behind the "bootstrapping argument" of [4], and we follow the same idea. The proof of the bootstrapping stage is based on properties of a process $\{L_i\}$, defined in terms of $\{X_i\}$ as

  $L_i = \log_2 X_i$, $\quad i = 0, \ldots, m$,  (12)
  $L_{i+1} = 2 L_i$ if $B_{i+1} = 1$, $\quad L_{i+1} = L_i + \log_2 q$ if $B_{i+1} = 0$, $\quad i \ge m$,  (13)

for a fixed $m$. The inequality $X_i \le 2^{L_i}$ holds on a sample-path basis for all $i \ge 0$. If we fix $L_m$ and $S_{m,n}$, the largest value of $L_n$ is achieved by the sequence $\{B_{m+1}, \ldots, B_n\}$ consisting of $(n - m - S_{m,n})$ consecutive 0s followed by $S_{m,n}$ consecutive 1s. One therefore obtains

  $L_n \le 2^{S_{m,n}} \bigl[ L_m + (n - m - S_{m,n}) \log_2 q \bigr]$.  (14)

Lemma 2: Fix $\gamma \in [0, 1]$ and $\varepsilon > 0$, and let $\rho = \rho(\gamma)$ be such that $\log_2 \rho = -(1 - \gamma)(n - m) \log_2 q / m - \varepsilon$ holds. Then, conditional on $C_m(\rho(\gamma)) \cap G_{m,n}(\gamma)$, the inequality $L_n \le -2^{\gamma(n-m)} \varepsilon m$ holds.

Proof: Conditional on $C_m(\rho) \cap G_{m,n}(\gamma)$, one has, from (14), the inequality

  $L_n \le 2^{S_{m,n}} \bigl[ m \log_2 \rho + (1 - \gamma)(n - m) \log_2 q \bigr]$.

Letting $\rho = \rho(\gamma)$ completes the proof.

Proposition 2: For an arbitrary fixed $\beta \in (0, 1/2)$, let $D_n(\beta)$ be the event defined in (11). Then

  $\lim_{n \to \infty} P(D_n(\beta)) = P(X_\infty = 0)$.

Proof: Since $\beta \in (0, 1/2)$, there exists $(\gamma, \alpha) \in (0, 1/2) \times (0, 1)$ satisfying $\gamma(1 - \alpha) = \beta$ (e.g., $\gamma = (1 + 2\beta)/4$ and $\alpha = (1 - 2\beta)/(1 + 2\beta)$ satisfy the condition). We take $m = \alpha n$ in Lemma 2, and let $\{L_i^{(1)}\}$ denote the process defined by (12) and (13) with $m = \alpha n$. Then, for any $\varepsilon > 0$, applying Lemma 2 shows that, conditional on the event $C_{\alpha n}(\rho(\gamma)) \cap G_{\alpha n, n}(\gamma)$ with $\rho(\gamma)$ defined in Lemma 2, the inequality

  $L_n^{(1)} \le -2^{\gamma(1-\alpha)n} \varepsilon \alpha n$

holds, which in turn implies

  $\bigl\{ X_n \le 2^{-2^{\gamma(1-\alpha)n} \varepsilon \alpha n} \bigr\} \supset C_{\alpha n}(\rho(\gamma)) \cap G_{\alpha n, n}(\gamma)$.  (15)

For any $n \ge (\varepsilon \alpha)^{-1}$, $\beta n \le \gamma(1 - \alpha) n + \log_2 \varepsilon \alpha n$ holds, so that one obtains

  $D_n(\beta) \supset \bigl\{ X_n \le 2^{-2^{\gamma(1-\alpha)n} \varepsilon \alpha n} \bigr\}$.  (16)

From (15) and (16), together with the independence of $C_{\alpha n}(\rho(\gamma))$ and $G_{\alpha n, n}(\gamma)$, one consequently has

  $P(D_n(\beta)) \ge P(G_{\alpha n, n}(\gamma))\, P(C_{\alpha n}(\rho(\gamma)))$.  (17)

Hence, using (9) and Proposition 1,

  $\lim_{n \to \infty} P(D_n(\beta)) \ge \lim_{n \to \infty} P(G_{\alpha n, n}(\gamma))\, P(C_{\alpha n}(\rho(\gamma))) = P(X_\infty = 0)$.  (18)

F. Bootstrapping: Rate-Dependent Stage

So far, our treatment of the random variable $S_{m,n}$ has been restricted to the regime of the law of large numbers. In order to obtain a rate-dependent bound, we have to go further and treat $S_{m,n}$ in the regime of the central limit theorem.

Definition 4: For $t \in \mathbb{R}$ and a function $f(n) = o(\sqrt{n})$, the event $H_{m,n}(t)$ is defined as

  $H_{m,n}(t) = \bigl\{ S_{m,n} \ge \tfrac{1}{2}(n - m + t\sqrt{n - m}) + f(n - m) \bigr\}$.  (19)

Noting that the random variable $S_{m,n}$ is a sum of $(n - m)$ i.i.d. Bernoulli random variables, with mean $(n - m)/2$ and variance $(n - m)/4$, the following lemma is a direct consequence of the central limit theorem.

Lemma 3: Let $m < n$. Then, for any $t \in \mathbb{R}$,

  $\lim_{n - m \to \infty} P(H_{m,n}(t)) = Q(t)$.

Proposition 3: For an arbitrary function $f(n) = o(\sqrt{n})$,

  $\liminf_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr) \ge Q(t)\, P(X_\infty = 0)$.

Proof: For a fixed $\beta \in (0, 1/2)$, we take $m = \frac{1}{\beta} \log_2 n$ in Lemma 2, and let $\{L_i^{(2)}\}$ denote the process defined by (12) and (13) with this choice of $m$. Conditional on the event $D_m(\beta)$, one obtains from (14) the inequality

  $L_n^{(2)} \le 2^{S_{m,n}} \bigl[ -2^{\beta m} + (n - m - S_{m,n}) \log_2 q \bigr]$.  (20)

Let $H_{m,n}(t)$ be the event of Definition 4 for a fixed $t \in \mathbb{R}$ and an arbitrarily chosen function $f(n) = o(\sqrt{n})$. Conditional on $D_m(\beta) \cap H_{m,n}(t)$, $L_n^{(2)}$ is bounded from above as

  $L_n^{(2)} \le 2^{\frac{1}{2}(n-m) + \frac{t}{2}\sqrt{n-m} + f(n-m)} \times \Bigl[ -2^{\beta m} + \Bigl( \tfrac{n - m - t\sqrt{n-m}}{2} - f(n - m) \Bigr) \log_2 q \Bigr]$,

which implies that there exists $n_0$ such that, for all $n \ge n_0$,

  $\bigl\{ X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{t}{2}\sqrt{n-m} + f(n-m)}} \bigr\} \supset D_m(\beta) \cap H_{m,n}(t)$  (21)

is satisfied. From this observation, together with the independence of $D_m(\beta)$ and $H_{m,n}(t)$, one has

  $P\bigl( X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{t}{2}\sqrt{n-m} + f(n-m)}} \bigr) \ge P(D_m(\beta))\, P(H_{m,n}(t))$.  (22)

Thus,

  $\liminf_{n \to \infty} P\bigl( X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{t}{2}\sqrt{n-m} + f(n-m)}} \bigr) \ge \lim_{n \to \infty} P(D_m(\beta))\, P(H_{m,n}(t)) = P(X_\infty = 0)\, Q(t)$.  (23)

Since $m = o(\sqrt{n})$, one can safely absorb the effects of $m$ into the function $f$. This completes the proof.
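Lemma 3 is easy to check numerically (again an added illustration, not from the paper): the binomial tail probability in Definition 4 can be computed exactly and compared against $Q(t)$. Agreement improves as $k$ grows, with the usual $O(1/\sqrt{k})$ discrepancy of the central limit approximation.

```python
import math
from statistics import NormalDist

# Check of Lemma 3: for S ~ Binomial(k, 1/2),
# P(S >= k/2 + t*sqrt(k)/2) -> Q(t) as k -> infinity.
def Q(x: float) -> float:
    return 1.0 - NormalDist().cdf(x)

def binom_tail(k: int, d: int) -> float:
    """P(S >= d) for S ~ Binomial(k, 1/2), computed exactly."""
    return sum(math.comb(k, i) for i in range(d, k + 1)) / 2 ** k

t = 1.0
for k in (64, 256, 1024):
    d = math.ceil(k / 2 + t * math.sqrt(k) / 2)
    print(k, round(binom_tail(k, d), 4), round(Q(t), 4))
```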
G. Converse

In this subsection we discuss the converse, in which the probability that $X_n$ takes small values is bounded from above.

Proposition 4: For an arbitrary function $f(n) = o(\sqrt{n})$,

  $\limsup_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr) \le Q(t)\, P(X_\infty = 0)$.

Proof: Fix a process $\{X_n\}$. Let $\{\check{X}_n\}$ be the random process defined as

  $\check{X}_i = X_i$, $\quad$ for $i = 0, \ldots, m$,  (24)
  $\check{X}_i = \check{X}_{i-1}^2$ if $B_i = 1$, $\quad \check{X}_i = \check{X}_{i-1}$ if $B_i = 0$, $\quad$ for $i > m$.  (25)

The inequality $X_i \ge \check{X}_i$ holds on a sample-path basis for all $i \ge 0$, which implies $P(X_n \le a) \le P(\check{X}_n \le a)$ for any $a$. One also has

  $\log_2 \log_2 (1/\check{X}_{m+k}) = S_{m, m+k} + \log_2 \log_2 (1/X_m)$.

The central limit theorem dictates that $\frac{2}{\sqrt{k}}(S_{m, m+k} - k/2)$ asymptotically follows the standard Gaussian distribution, so that, for any fixed $m$ and an arbitrary function $f(k) = o(\sqrt{k})$, one has

  $P\bigl( \check{X}_{m+k} \le 2^{-2^{(k + t\sqrt{k})/2 + f(k)}} \bigm| X_m \bigr) = P\bigl( \log_2 \log_2 (1/\check{X}_{m+k}) \ge \tfrac{k}{2} + \tfrac{t\sqrt{k}}{2} + f(k) \bigm| X_m \bigr) = Q(t) + o(1)$  (26)

as $k \to \infty$. For any fixed $\delta \in (0, 1)$ and $m \ge 0$,

  $\limsup_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr)$
  $\quad \le \limsup_{k \to \infty} P\bigl( X_{m+k} \le 2^{-2^{(m+k + t\sqrt{m+k})/2 + f(m+k)}} \bigr)$
  $\quad \le \limsup_{k \to \infty} \Bigl[ P\bigl( \check{X}_{m+k} \le 2^{-2^{(m+k + t\sqrt{m+k})/2 + f(m+k)}} \bigm| X_m \le \delta \bigr)\, P(X_m \le \delta) + P\bigl( X_{m+k} \le \tfrac{\delta}{2},\ X_m > \delta \bigr) \Bigr]$.  (27)

From Fatou's lemma,

  $\limsup_{k \to \infty} P\bigl( X_{m+k} \le \tfrac{\delta}{2},\ X_m > \delta \bigr) \le P\bigl( X_\infty \le \tfrac{\delta}{2},\ X_m > \delta \bigr)$.  (28)

On the basis of (26), (27), and (28), one obtains

  $\limsup_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr) \le Q(t)\, P(X_m \le \delta) + P\bigl( X_\infty \le \tfrac{\delta}{2},\ X_m > \delta \bigr)$.  (29)

Since this holds for all $m$, we conclude that

  $\limsup_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr) \le \lim_{m \to \infty} \Bigl[ Q(t)\, P(X_m \le \delta) + P\bigl( X_\infty \le \tfrac{\delta}{2},\ X_m > \delta \bigr) \Bigr] = Q(t)\, P(X_\infty = 0)$,  (30)

where we have used the almost-sure convergence of $X_m$ to $X_\infty$ (property 1 in Sect. IV-C).

Putting Propositions 3 and 4 together, we arrive at the following theorem.

Theorem 2: For an arbitrary function $f(n) = o(\sqrt{n})$,

  $\lim_{n \to \infty} P\bigl( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \bigr) = Q(t)\, P(X_\infty = 0)$.

In applying Theorem 2 to $\{Z_n\}$, it should be noted that $P(Z_\infty = 0) = I(W)$ holds. Theorem 1 is then proved straightforwardly on the basis of Theorem 2 via the argument at the end of Sect. II.

V. DISCUSSION

A. Extension to Construction with a Larger Matrix

Polar codes can be constructed on the basis of a matrix larger than the $2 \times 2$ matrix $F$ in (2). Korada, Şaşoğlu, and Urbanke [6] have provided a full characterization of whether a matrix induces channel polarization. They have shown that if an $\ell \times \ell$ matrix $G$ is polarizing, then, given a symmetric B-MC $W$,

  $\lim_{n \to \infty} P\bigl( Z_n \le 2^{-\ell^{\beta n}} \bigr) = I(W)$

holds for any $\beta < E(G)$, where $E(G)$ is the exponent of the matrix $G$ defined in [6]. For a non-polarizing matrix, the exponent $E(G)$ is zero.

Our analysis can be extended to obtain a rate-dependent result for channel polarization with a larger matrix. The extension involves introducing a sequence $\{B_i\}$ of i.i.d. random variables with $P(B_1 = k) = 1/\ell$ for $k = 1, 2, \ldots, \ell$. Let $\{D_1, D_2, \ldots, D_\ell\}$ be the "partial distances" of the matrix $G$ defined in [6]. The exponent $E(G)$ is the mean of the random variable $\log_\ell D_{B_i}$; let $V(G)$ be its variance. Our result in this direction is the following:

  $\lim_{n \to \infty} P\bigl( Z_n \le 2^{-\ell^{n E(G) + t\sqrt{n V(G)}}} \bigr) = Q(t)\, I(W)$.  (31)

The worst case among polarizing partial distances is the one in which exactly one of $\{D_1, D_2, \ldots, D_\ell\}$ equals 2 and the rest equal 1. Since $E(G) = \frac{\log_\ell 2}{\ell}$ and $V(G) = \bigl( \frac{\log_\ell 2}{\ell} \bigr)^2 (\ell - 1)$ in this worst case, a universal bound is obtained as

  $\lim_{n \to \infty} P\Bigl( Z_n \le 2^{-\ell^{(n + t\sqrt{(\ell-1)n}) \frac{\log_\ell 2}{\ell}}} \Bigr) \ge Q(t)\, I(W)$,  (32)

which can be regarded as a refinement of Theorem 8 in [6].
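As an added sanity check (not in the paper), the worst-case values of $E(G)$ and $V(G)$ quoted above follow directly from their definition as the mean and variance of $\log_\ell D_B$ with $B$ uniform on $\{1, \ldots, \ell\}$; the sketch below computes both for the worst-case partial-distance list, with $\ell = 4$ as an arbitrary example.

```python
import math

# E(G) and V(G) from a list of partial distances D_1, ..., D_l:
# mean and variance of log_l(D_B) with B uniform on {1, ..., l}.
def exponent_stats(D):
    l = len(D)
    logs = [math.log(d, l) for d in D]
    E = sum(logs) / l
    V = sum(x * x for x in logs) / l - E * E
    return E, V

l = 4
worst = [2] + [1] * (l - 1)                        # one D = 2, rest = 1
E, V = exponent_stats(worst)
print(E, math.log(2, l) / l)                       # E(G) = log_l(2)/l
print(V, (math.log(2, l) / l) ** 2 * (l - 1))      # V(G) = E(G)^2 (l-1)
```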
B. Minimum Distance and ML Decoding

We return to the original construction of polar codes based on the $2 \times 2$ matrix $F$. Polar codes are linear codes, and their generator matrices are obtained from matrices of the form $F^{\otimes n}$ by removing some rows (corresponding to "shortening") and reordering the remaining rows. Hussami, Korada, and Urbanke [7] studied the class of linear codes constructed from $F^{\otimes n}$ via shortening, and showed, using a minimum distance analysis, that the error probability of such codes is $\omega(2^{-2^{\beta n}})$ (in the standard Landau notation) for $\beta > \frac{1}{2}$. This means that polar codes with SC decoding achieve the best possible performance as $n \to \infty$ up to the dominant term in the double exponent of the error probability. In this subsection, we show that the minimum distance analysis does not give the second dominant term in the double exponent for polar codes with SC decoding, which implies that SC decoding is not necessarily optimal in the second dominant term.

Proposition 5: For any code whose generator matrix consists of $2^n R$ distinct rows of $F^{\otimes n}$ and any fixed $t > Q^{-1}(R)$, the error probability of ML decoding is $\omega\bigl( 2^{-2^{(n + t\sqrt{n})/2}} \bigr)$.

Proof: Let $\mathcal{I} \subseteq \{0, 1, \ldots, 2^n - 1\}$ denote the set of indices of the rows of $F^{\otimes n}$ chosen to form the generator matrix. The minimum distance of the code is given by $\min_{i \in \mathcal{I}} 2^{w(i)}$, where $w(i)$ denotes the Hamming weight of the binary expansion of $i$. Let the minimum distance of a code be $2^d$. Since the number of rows of $F^{\otimes n}$ with weight $2^i$ is $\binom{n}{i}$, one obtains the inequality

  $\sum_{i=d}^{n} \binom{n}{i} \ge 2^n R$,  (33)

or equivalently,

  $P(S_n \ge d) \ge R$,  (34)

where $S_n$ is a sum of $n$ i.i.d. Bernoulli random variables with probability one half. Let $d = n/2 + t\sqrt{n}/2$ for any fixed $t \in \mathbb{R}$. Then

  $P\biggl( \frac{S_n - \frac{n}{2}}{\frac{\sqrt{n}}{2}} \ge t \biggr) \ge R$.  (35)

From the central limit theorem, the left-hand side converges to $Q(t)$ as $n \to \infty$. Hence, the condition $t \le Q^{-1}(R)$ is necessary for the asymptotic existence of codes satisfying the conditions stated in the proposition, which completes the proof.

It should be noted that Proposition 5 also means that the minimum distance of the codes considered is asymptotically at most $2^{(n + Q^{-1}(R)\sqrt{n})/2}$.

The prefactor of the second dominant term $\sqrt{n}$ in the double exponent is $Q^{-1}(R)/2$ in Proposition 5, which is strictly larger than the prefactor $Q^{-1}(R/I(W))/2$ in Theorem 1 whenever $I(W) < 1$. One can argue that this may be due to the channel-independent nature of the analysis leading to Proposition 5, which is reflected in the absence of the channel $W$ from the result. In any case, whether polar codes with SC decoding are optimal in terms of the double exponent up to the second dominant term is an open problem and needs further investigation.
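The counting argument (33)–(35) can also be illustrated numerically (an addition, not from the paper): for given $n$ and $R$ one can find the largest $d$ with $\sum_{i \ge d} \binom{n}{i} \ge 2^n R$ and observe the normalized $d$ approaching $Q^{-1}(R)$.

```python
import math
from statistics import NormalDist

# Among codes built from 2^n R rows of F^{tensor n}, the best possible
# minimum distance is 2^d with d the largest integer satisfying (33).
def best_d(n: int, R: float) -> int:
    total = 0
    for d in range(n, -1, -1):              # accumulate the binomial tail
        total += math.comb(n, d)
        if total / 2 ** n >= R:             # big-int ratio stays accurate
            return d
    return 0

R = 0.4
for n in (64, 256, 1024):
    d = best_d(n, R)
    t_hat = (2 * d - n) / math.sqrt(n)      # solves d = n/2 + t sqrt(n)/2
    print(n, d, round(t_hat, 3), round(NormalDist().inv_cdf(1 - R), 3))
```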
VI. CONCLUSION

We have derived a rate-dependent upper bound of the best achievable block error probability of polar codes with successive cancellation decoding. The derivation builds on the previous rate-independent results [1], [4], which treat channel polarization in the regime of the law of large numbers, and extends them to the regime of the central limit theorem. We would like to mention that the argument given in this paper can also be applied to the problem of lossy source coding discussed in [7].

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] E. Arıkan, "A performance comparison of polar codes and Reed-Muller codes," IEEE Commun. Lett., vol. 12, no. 6, pp. 447–449, June 2008.
[3] R. Mori and T. Tanaka, "Performance of polar codes with the construction using density evolution," IEEE Commun. Lett., vol. 13, no. 7, pp. 519–521, July 2009.
[4] E. Arıkan and E. Telatar, "On the rate of channel polarization," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, June/July 2009, pp. 1493–1495; [online] arXiv:0807.3806v3 [cs.IT], 2008.
[5] S. B. Korada and E. Şaşoğlu, "A class of transformations that polarize binary-input memoryless channels," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, June/July 2009, pp. 1478–1482.
[6] S. B. Korada, E. Şaşoğlu, and R. Urbanke, "Polar codes: Characterization of exponent, bounds, and constructions," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, June/July 2009, pp. 1483–1487; [online] arXiv:0901.0536v2 [cs.IT], 2009.
[7] N. Hussami, S. B. Korada, and R. Urbanke, "Performance of polar codes for channel and source coding," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, June/July 2009, pp. 1488–1492; [online] arXiv:0901.2370v2 [cs.IT], 2009.
[8] R. Mori and T. Tanaka, "Performance and construction of polar codes on symmetric binary-input memoryless channels," in Proc. 2009 IEEE Int. Symp. Inf. Theory, Seoul, Korea, June/July 2009, pp. 1496–1500; [online] arXiv:0901.2207v2 [cs.IT], 2009.
[9] S. B. Korada and R. Urbanke, "Polar codes are optimal for lossy source coding," [online] arXiv:0903.0307v1 [cs.IT], 2009.
[10] S. H. Hassani, S. B. Korada, and R. Urbanke, "The compound capacity of polar codes," [online] arXiv:0907.3291v1 [cs.IT], 2009.