Parallel bit interleaved coded modulation

2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton)


Parallel Bit Interleaved Coded Modulation

Amir Ingber† and Meir Feder
Department of EE-Systems, Tel Aviv University, Tel Aviv 69978, Israel
{ingber, meir}@eng.tau.ac.il

arXiv:1007.1407v2 [cs.IT] 17 Aug 2010

† A. Ingber is supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities.

Abstract

A new variant of bit interleaved coded modulation (BICM) is proposed. In the new scheme, called parallel BICM, L identical binary codes are used in parallel using a mapper, a newly proposed finite-length interleaver and a binary dither signal. As opposed to previous approaches, the scheme does not rely on any assumptions of an ideal, infinite-length interleaver. Over a memoryless channel, the new scheme is proven to be equivalent to a binary memoryless channel. Therefore the scheme enables one to easily design coded modulation schemes using a simple binary code that was designed for that binary channel. The overall performance of the coded modulation scheme is analytically evaluated based on the performance of the binary code over the binary channel. The new scheme is analyzed from an information-theoretic viewpoint, where the capacity, error exponent and channel dispersion are considered. The capacity of the scheme is identical to the BICM capacity. The error exponent of the scheme is numerically compared to a recently proposed mismatched-decoding exponent analysis of BICM.

I. INTRODUCTION

Bit interleaved coded modulation (BICM) is a pragmatic approach for coded modulation [1]. It enables the construction of nonbinary communication schemes from binary codes by using a long bit interleaver that separates the coding and the modulation. BICM has drawn much attention in recent years because of its efficiency for wireless and fading channels.

The information-theoretic properties of BICM were first studied by Caire et al. in [2]. BICM was modeled as a binary channel with a random state that is known at the receiver. The state determines how the input bit is mapped to the channel, along with the other bits, which are assumed to be random. Under the assumption of an infinite-length, ideal interleaver, the BICM scheme is modeled by parallel uses of independent instances of this binary channel. This model is referred to as the independent parallel channel model. Using this model, the capacity of the BICM scheme can be calculated. It was further shown that BICM suffers from a gap to the full channel capacity, and that when Gray mapping is used this gap is generally small. In [2], methods for evaluating the error probability of BICM were proposed, which rely on the properties of the specific binary codes that were used (e.g. the Hamming weight of error events).

A basic information-theoretic quantity other than the channel capacity is the error exponent [3], which quantifies the speed at which the error probability decreases to zero with the block length n. Another tool for evaluating the performance at finite block length is the channel dispersion, which was presented in 1962 [4] and was given more attention only in recent years [5], [6]. It would therefore be interesting to analyze BICM at finite block length from the information-theoretic viewpoint.

Several attempts have been made to provide error exponent results for BICM. In their work on multilevel codes, Wachsmann et al. [7] considered the random coding error exponent of BICM, relying on the independent parallel channel model. However, there were several flaws in the derivation:
• The independent parallel channel model is justified by an infinite-length interleaver. Therefore it might be problematic to use its properties for evaluating the finite-length performance of the BICM scheme. In the current paper we address this point and propose a scheme with a finite-length interleaver for that purpose.
• There was a technical flaw in the derivation, which resulted in an inaccurate expression for the random coding error exponent. We discuss this point in detail in Theorem 4.
• As noticed in [8], the error exponent result obtained in [7] may sometimes even exceed that of unconstrained coding over the channel (called in [8] the "coded modulation exponent").

We therefore agree with [8] in the claim that "the independent parallel channel model fails to capture the statistics of the channel". However, by properly designing the communication scheme, the model can become valid in a rigorous way, as we show in Theorem 1.

In [8] (see also [9]), Martinez et al. considered the BICM decoder as a mismatched decoder, which has access only to the log-likelihood ratio (LLR) values of each bit, where the LLR calculation assumes that the other bits are random, independent and equiprobable (as in the classical BICM scheme [2]). Using results from mismatched decoding, they presented the generalized error exponent and the generalized mutual information, and pinpointed the loss of BICM that is incurred by using the mismatched LLRs. Note that when a binary code of length n is used, this scheme requires only n/L channel uses. While this result is valid for any block size and any interleaver length, achieving this error exponent in practice requires complex code design. For example, one cannot design a good binary code for a binary memoryless channel and have any guarantee that the BICM scheme will perform well with that code. In fact, the code design for this scheme requires taking into account the memory within the levels, or equivalently, nonbinary codes, which is what we wish to avoid when choosing BICM.

On the theoretical side, another drawback of existing approaches is the lack of converse results (for either capacity or error exponent). The initial discussion of BICM information theory in [2] assumes the model of independent channels, and any converse result based on this model must assume an infinite, ideal interleaver. Therefore converse results (such as an upper bound on the achievable rate with BICM) do not hold for finite-length interleavers. The authors in [8] provide no converse results for their model.

In this paper we propose the parallel BICM (PBICM) scheme, which has the following properties. First, the scheme includes an explicit, finite-length interleaver. Second, in order to attain good performance on any memoryless channel, PBICM allows one to design a binary code for a binary memoryless channel, and guarantees good performance on the nonbinary channel. Third, because the scheme does not rely on the use of an infinite-length interleaver, the error exponent and the dispersion of the scheme can be calculated (both achievability and converse results) as means to evaluate the PBICM performance at finite block length.

The comparison between PBICM and the mismatched decoding approach [8] should be done with care. With PBICM, when the binary codeword length is n, the scheme requires n channel uses. Therefore, when the latency is kept equal for both schemes, PBICM uses a codeword length that is L times shorter than the codeword used in the mismatched decoder.
A fair comparison would be to fix the binary codeword length n for both schemes, resulting in different latency but equal decoder complexity.

The results presented in the paper are summarized as follows:
• The PBICM communication framework is presented. Over a memoryless channel, it is shown to be equivalent to a binary memoryless channel (Theorem 1).
• In Theorem 2, the capacity of PBICM is shown to be equal to the BICM capacity, as calculated in [2].
• PBICM is analyzed at finite block length. The error exponent of PBICM is defined and bounded by error exponent bounds of the underlying binary channel (Theorems 3 and 4).
• The PBICM dispersion is defined as an alternative measure for finite-length performance. It is calculated from the dispersion of the underlying binary channel (Theorems 5 and 6).
• The error exponent of PBICM is numerically compared to the mismatched-decoding error exponent of BICM [8]. The additive white Gaussian noise (AWGN) channel and the Rayleigh fading channel are considered. When the latency of both schemes is equal, the mismatched decoding is generally better. However, when the complexity is equal (or when the codeword length of the underlying binary code is equal), the PBICM exponent is better in many cases.

The paper is organized as follows. In Section II we review the classical BICM model and its properties, under the assumption of an infinite-length, ideal interleaver. In Section III the parallel BICM scheme is presented and the equivalence to a memoryless binary channel is established. In Section IV parallel BICM is studied from an information-theoretic viewpoint. Numerical examples and a summary follow in Sections V and VI, respectively.

II. THE BICM COMMUNICATION MODEL

Notation: letters in bold (x, y, ...) denote row vectors, capital letters (X, Y, ...) denote random variables, and a tilde denotes interleaved signals (b̃, z̃). P_X(x) denotes the probability that the random variable (RV) X takes the value x, and similarly P_{Y|X}(y|x) denotes the probability that Y takes the value y given that the RV X is equal to x. E[·] denotes statistical expectation. log means log_2.

A. Channel model

Let W denote a memoryless channel with input and output alphabets X and Y, respectively. The transition probabilities are defined by W(y|x) for y ∈ Y and x ∈ X. We assume that |X| = 2^L. We consider equiprobable signaling only over the channel W. An (n, R) code C ⊆ X^n is a set of M = 2^{nR} codewords c ∈ X^n. The encoder wishes to convey one of M equiprobable messages. The error probability of interest shall be the codeword error probability. An (n, R) code with codeword error probability p_e will sometimes be called an (n, R, p_e) code.

B. Classical BICM encoding and decoding

In BICM, a binary code is used to encode information messages [m_1, m_2, ...] into binary codewords [b_1, b_2, ...]. The binary codewords are then interleaved using a long interleaver π(·), which applies a permutation on the coded bits. The interleaved bit stream b̃ is partitioned into groups of L consecutive bits and inserted into a mapper μ : {0,1}^L → X. The mapper output, denoted x, is fed into the channel. The encoding process is described in Figure 1.

Fig. 1. BICM encoding process.

Fig. 2. BICM decoding process.

The decoding process of BICM proceeds as follows.
The channel output y is fed into a bit metric calculator, which calculates the log-likelihood ratio (LLR) of each input bit b given the corresponding output sample y (L LLR values for each output sample). These LLR values (or bit metrics), denoted z̃, are de-interleaved and partitioned into bit metrics [z_1, z_2, ...] that correspond to the binary input codewords. Finally, the binary decoder decodes the messages [m̂_1, m̂_2, ...] from [z_1, z_2, ...]. The decoding process is described in Figure 2. The LLR of the j-th bit in a symbol given the output value y is calculated as follows:

  LLR_j(y) ≜ log [ P_{Y|B_j}(y|0) / P_{Y|B_j}(y|1) ],   (1)

where P_{Y|B_j}(y|b) is the conditional probability of the channel output taking the value y given that the j-th bit at the mapper input was b, and the other (L−1) bits are equiprobable independent binary random variables (RVs).

C. Classical BICM analysis: ideal interleaving

In classical BICM (e.g. [2]) the LLR calculation is motivated by the assumption of a very long (ideal) interleaver π, so the coded bits go through essentially independent channels. These binary channels are defined as follows:

Definition 1: Let W_i be a binary channel with transition probability

  W_i(y|b) ≜ E[ W(y | X = μ(B_1, ..., B_L)) | B_i = b ]   (2)
           = (1/2^{L−1}) Σ_{b_j, j≠i; b_i=b} W(y | μ(b_1, ..., b_L)).   (3)

The channel W_i(y|b_i) can be thought of as the original channel W where the input is x = μ(b_1 ... b_L), and the bits {b_j}_{j≠i} are equiprobable independent RVs (see Fig. 3).

Fig. 3. The binary channel W_i. The bits {B_j}_{j≠i} are equiprobable independent RVs.

In [2], Caire et al. proposed the following channel model for BICM, called the independent parallel channel model. In this model the channel has a binary input b. A channel state s is selected at random from S ≜ {1, ..., L} with equal probability (and independently of b). Given a state s, the input bit b is fed into the channel W_s. The channel outputs are the state s and the output y of the channel W_s. The channel, denoted by W̃, is depicted in Figure 4. The transition probability function of W̃ is given by

  W̃(y, s|b) = P_{Y,S|B}(y, s|b) = P_{Y|S,B}(y|s, b) P_S(s) = (1/L) W_s(y|b).   (4)

Fig. 4. The binary channel W̃. The random state S is known at the receiver.

Note that both outputs can be combined into a single output, the LLR, which is a sufficient statistic for optimal decoding over any binary-input channel. The LLR calculation for the channel W̃ is given by

  LLR_{W̃}(y, s) = LLR_s(y),   (5)

where LLR_s is given in (1). Therefore the independent parallel channel model transforms the original nonbinary channel W into a simple, memoryless binary channel W̃. Using an infinite-length interleaver and a binary code that was designed for the simple binary channel W̃, reliable communication over the original channel W can be attained.

Let C(·) denote the Shannon capacity of a channel (with equiprobable input).

Lemma 1 (following [2]): Let C^BICM(W) denote the capacity of the channel W with BICM, a given mapping μ(·) and an infinite-length interleaver (according to the independent parallel channel model). Denote by C(W_s) the capacity of the channel W_s. Then

  C^BICM(W) = Σ_{s=1}^{L} C(W_s).   (6)

Proof: Since the independent parallel channel model assumes L independent uses of the channel W̃, we get that C^BICM(W) = L · C(W̃). The capacity of W̃ is given by

  C(W̃) = I(B; Y, S) = I(B; Y|S) = E_S[ I(B; Y | S = s) ] = E_S[ C(W_S) ] = (1/L) Σ_{s=1}^{L} C(W_s).   (7)
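As a concrete illustration of Definition 1 and Lemma 1, the following minimal sketch computes the sub-channels W_s and the BICM capacity (6) for a small discrete memoryless channel. The channel matrix, output alphabet size, Gray map and helper names (sub_channel, capacity_equiprob) are illustrative assumptions, not taken from the paper; the paper's AWGN and fading examples would require numerical integration instead.

import numpy as np

# Illustrative toy setup (not from the paper): L = 2 label bits,
# X = {0,1,2,3} via a Gray map, and a hand-picked DMC W[y, x] = W(y|x)
# with 5 output letters. Inputs are equiprobable, as assumed in the paper.
L = 2
gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}  # mu: {0,1}^L -> X
W = np.array([[.70, .15, .05, .10],
              [.15, .55, .15, .05],
              [.05, .15, .55, .15],
              [.05, .10, .15, .55],
              [.05, .05, .10, .15]])
assert np.allclose(W.sum(axis=0), 1.0)  # each column is a distribution over y

def sub_channel(i):
    """W_i(y|b) of Eq. (3): average W(y|mu(b_1..b_L)) over the other
    (equiprobable) label bits, with label bit i fixed to b."""
    Wi = np.zeros((W.shape[0], 2))
    for bits, x in gray.items():
        Wi[:, bits[i]] += W[:, x] / 2 ** (L - 1)
    return Wi

def capacity_equiprob(P):
    """I(B;Y) in bits for a channel matrix P[y,b] with equiprobable input."""
    Pb = np.full(P.shape[1], 1.0 / P.shape[1])
    Py = P @ Pb
    ratio = np.where(P > 0, P, Py[:, None]) / Py[:, None]  # avoids log(0)
    return float(Pb @ (P * np.log2(ratio)).sum(axis=0))

# Eq. (6): C_BICM(W) = sum_s C(W_s); by Eq. (7), C(W-tilde) = C_BICM / L.
C_bicm = sum(capacity_equiprob(sub_channel(i)) for i in range(L))
print("C_BICM =", C_bicm, "bits per channel use")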
It is known that C^BICM(W) is generally smaller than the full channel capacity C(W), as opposed to other schemes, most notably multilevel coding with multistage decoding (MLC-MSD) [7], in which C(W) can be achieved. However, for Gray mapping the gap is small and can sometimes be tolerated. For example, for 8-PSK signaling over the AWGN channel with SNR = 5 dB, C(W) = 1.86 bit while C^BICM(W) = 1.84 bit.

III. THE PARALLEL BICM SCHEME

In this section we propose an explicit BICM-type communication scheme, which we call parallel BICM (PBICM), that allows the usage of binary codes on nonbinary channels at finite block length. The main features of the scheme include the following:
• Binary codewords are used in parallel to construct a codeword that enters the channel.
• A new finite-length interleaver.
• A random binary signal (binary dither) that is added to the binary codewords.

With the proposed scheme, we rigorously show how the original channel W relates to the channel W̃, thus allowing exact analysis and design of codes at finite block lengths.

A. Interleaver Design

We wish to design a finite-length interleaver, where:
• The length of the interleaver is minimal,
• The interleaver is as simple as possible,
• The binary codewords go through a binary memoryless channel.

Fig. 5. Interleaving scheme viewed as parallel encoders.

Fig. 6. De-interleaving scheme viewed as parallel decoders.

In order for the binary codewords to experience a memoryless channel, each binary codeword must be spread over n channel uses of W, so the interleaver output length cannot be less than n channel uses. The newly proposed interleaver has an output length of exactly n, which satisfies the above requirements.

Let ENC and DEC be an encoder-decoder pair for a binary code. Let b_1, ..., b_L be L consecutive codewords from the output of ENC, bunched together into a matrix B:

  B = [b_1; ...; b_L] = [b_11 ... b_1n; ...; b_L1 ... b_Ln],   (8)

where rows are codewords, columns index channel uses, and entry b_lk is bit k of codeword l. Let s be a vector of n i.i.d. random states drawn from S = {1, ..., L}; s shall be the interleaving signal. Each column k of B shall be shifted cyclically by the corresponding element s_k, so the interleaved signal B̃ is defined as

  B̃ = [b_{(1+s_1)_L,1} ... b_{(1+s_n)_L,n}; ...; b_{(L+s_1)_L,1} ... b_{(L+s_n)_L,n}],

where (ξ)_L ≜ (ξ modulo L) + 1; that is, entry (l, k) of B̃ is b_{(l+s_k)_L, k}. Each column vector of the interleaved signal B̃ is mapped to a single channel symbol:

  x_k = μ( b_{(1+s_k)_L, k}, ..., b_{(L+s_k)_L, k} ),   (9)

and we call x = [x_1, ..., x_n] the channel codeword.

At the decoder, an LLR value is calculated for every bit b in B̃ from y. The LLR values are denoted by Z̃. We assume that s is known at the decoder (utilizing common randomness), therefore the de-interleaving operation is simply sorting back the columns of Z̃ according to s by reversing the modulo operation. The de-interleaver output is a vector of LLR values z for each transmitted codeword b, according to (1). Each codeword is decoded independently by DEC.
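The cyclic-shift interleaver of Eqs. (8)-(9) is short to state in code. The sketch below uses illustrative sizes and 0-based row indices in place of the paper's 1-based (ξ)_L ≜ (ξ mod L) + 1 convention; it interleaves the bit columns and then inverts the operation, as the receiver would once s is known through common randomness.

import numpy as np

rng = np.random.default_rng(0)
L, n = 4, 8                           # illustrative sizes
B = rng.integers(0, 2, (L, n))        # L binary codewords stacked as in Eq. (8)
s = rng.integers(1, L + 1, size=n)    # i.i.d. interleaving states, s_k in {1..L}

# Interleave: row l of column k picks bit (l + s_k)_L of that column.
# With 0-based rows this is a cyclic shift of the column by (s_k + 1).
B_til = np.empty_like(B)
for k in range(n):
    for l in range(L):
        B_til[l, k] = B[(l + 1 + s[k]) % L, k]
# Each column of B_til would now feed the mapper: x_k = mu(B_til[:, k]), Eq. (9).

# De-interleave (receiver side): reverse the cyclic shift per column.
B_hat = np.empty_like(B)
for k in range(n):
    for l in range(L):
        B_hat[(l + 1 + s[k]) % L, k] = B_til[l, k]
assert np.array_equal(B_hat, B)       # the interleaver is perfectly inverted

In the actual scheme the de-interleaver acts on the LLR matrix Z̃ rather than on bits, but the index arithmetic is identical.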
B. Binary dither

Since the decoder decodes each binary codeword independently, the communication scheme employing the above interleaver can be viewed as a set of parallel encoder-decoder pairs, which we denote by ENC_1, ..., ENC_L and DEC_1, ..., DEC_L (see Figures 5 and 6). We do not assume any independence between the effective channels between each encoder-decoder pair.

Consider the first encoder-decoder pair, ENC_1 and DEC_1. Since the input of DEC_1 depends on the codewords transmitted by ENC_2, ..., ENC_L, the channel between ENC_1 and DEC_1 is not strictly memoryless. If, somehow, the encoders ENC_2, ..., ENC_L were forced to send i.i.d. equiprobable binary codewords, then the channel between ENC_1 and DEC_1 would be exactly the channel W̃ (which is a binary memoryless channel) with the accurate LLR calculation (1).

In order to achieve the goal of L binary memoryless channels between each encoder-decoder pair simultaneously, we add a binary dither (an i.i.d. equiprobable binary signal) to each encoder-decoder pair as follows. Let the dither signals d_l = [d_{l1}, ..., d_{ln}], l ∈ {1, ..., L}, be L random vectors, each of length n, that are drawn independently from a memoryless equiprobable binary source. The output of each encoder ENC_l, b_l, goes through a component-wise XOR operation with the dither vector d_l. The output of the XOR operation, denoted b'_l, is fed into the interleaver π. The full PBICM encoding scheme is shown in Fig. 7.

Fig. 7. PBICM encoding scheme. '+' denotes modulo-2 addition (XOR).

Fig. 8. PBICM decoding scheme. δ_l ≜ 1 − 2·d_l; '×' denotes element-wise multiplication.

We let each decoder DEC_l know the value of the dither d_l used by its corresponding encoder ENC_l (in practice the dither signals are generated using a pseudo-random generator, which allows the common randomness). In order to compensate for the dither at the decoder, the LLR values are modified by flipping their sign wherever the dither value is 1 (and maintaining the sign where the dither is 0). Formally, denote the LLR values at the de-interleaver output by z'_l = [z'_{l1} ... z'_{ln}]. The LLR values at the decoder input shall be denoted by z_l = [z_{l1} ... z_{ln}] and calculated as follows:

  z_{lj} = z'_{lj} (1 − 2 d_{lj}),  j = 1, ..., n.   (10)

The PBICM decoding scheme is shown in Fig. 8.

C. Model equivalence

Before we analyze the channel between each encoder-decoder pair in PBICM, let us define a binary memoryless channel related to W̃ that will prove useful in the analysis of PBICM.

Definition 2: Let W̄ be a memoryless binary channel with input B and output ⟨Y, S, D⟩: S is drawn at random from {1, ..., L}, D is drawn at random from {0, 1} (S and D are independent, and both do not depend on the input B). Y is the output of the channel W_S with input B ⊕ D (⊕ is the XOR operation). Note that the channel W̄ is the channel W̃ where the input is XORed with a binary RV D (see Fig. 9).

Fig. 9. The binary channel W̄. The random state S and the dither D are known at the receiver.

Note that the LLR calculation for the channel W̄ is given by

  LLR_{W̄}(y, s, d) = (−1)^d LLR_{W̃}(y, s) = (−1)^d LLR_s(y),   (11)

where LLR_{W̃} and LLR_s are given in (5) and (1), respectively.

Theorem 1: In parallel BICM, the channel between every encoder-decoder pair is exactly the binary memoryless channel W̄, with its exact LLR output.

Proof: Consider the pair ENC_1 and DEC_1. Let b_1 be the codeword sent from ENC_1. After adding the dither d_1, the dithered codeword b'_1 enters the interleaver. The other codewords b_2, ..., b_L are dithered using d_2, ..., d_L. Since the dither of these codewords is unknown at DEC_1, the dithered codewords b'_2, ..., b'_L are truly random i.i.d. signals. The interleaving signal s interleaves the dithered codewords according to (8). The interleaved signal enters the mapper μ and the channel W, resulting in an output y. Since the dithered codewords b'_2, ..., b'_L are i.i.d., the equivalent channel from b'_1 to ⟨y, s⟩ is exactly the channel W̃. The LLR calculation at the PBICM receiver along with the de-interleaver produces z'_1, which is exactly the LLR calculation that fits the channel W̃ (cf. (5)). Recalling that the channel W̄ is nothing but the channel W̃ with its input XORed with a binary RV, and that the LLR of the channel W̃ can easily be modified by the dither according to Eq. (11) to produce the LLR of the channel W̄, we conclude that the channel between b_1 and z_1 is exactly the channel W̄ with its LLR calculation. Since by symmetry the above holds for any encoder-decoder pair ENC_l-DEC_l, the proof is concluded.
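The following end-to-end sketch ties Figs. 7 and 8 together: dither is XORed in before interleaving, and compensated at the decoder by the sign flip of Eq. (10). The toy channel, Gray map, sizes and helper names are the same illustrative assumptions as in the earlier sketches, not the paper's constellations.

import numpy as np

rng = np.random.default_rng(1)
L, n = 2, 10000
gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}      # mu (illustrative)
W = np.array([[.70, .15, .05, .10], [.15, .55, .15, .05],
              [.05, .15, .55, .15], [.05, .10, .15, .55],
              [.05, .05, .10, .15]])                      # toy DMC W[y, x]

def sub_channel(j):                                       # W_j(y|b), Eq. (3)
    Wj = np.zeros((W.shape[0], 2))
    for bits, x in gray.items():
        Wj[:, bits[j]] += W[:, x] / 2 ** (L - 1)
    return Wj

subs = [sub_channel(j) for j in range(L)]

# --- Encoder side (Fig. 7) ---
B = rng.integers(0, 2, (L, n))        # codewords from ENC_1..ENC_L
D = rng.integers(0, 2, (L, n))        # binary dither, shared with each DEC_l
Bp = B ^ D                            # b'_l = b_l XOR d_l
s = rng.integers(0, L, size=n)        # states (0-based for brevity)
x = np.empty(n, dtype=int)
B_til = np.empty_like(Bp)
for k in range(n):
    for l in range(L):
        B_til[l, k] = Bp[(l + 1 + s[k]) % L, k]           # interleave
    x[k] = gray[tuple(int(b) for b in B_til[:, k])]       # map, Eq. (9)
y = np.array([rng.choice(W.shape[0], p=W[:, x[k]]) for k in range(n)])

# --- Decoder side (Fig. 8) ---
# Bit LLRs per Eq. (1); natural log here, the base only scales the LLR.
Z_til = np.array([[np.log(subs[l][y[k], 0] / subs[l][y[k], 1])
                   for k in range(n)] for l in range(L)])
Zp = np.empty_like(Z_til)
for k in range(n):
    for l in range(L):
        Zp[(l + 1 + s[k]) % L, k] = Z_til[l, k]           # de-interleave
Z = Zp * (1 - 2 * D)                                      # Eq. (10): dither flip

# Hard decisions on Z now estimate B through the memoryless channel W-bar:
print("bit agreement:", np.mean((Z < 0).astype(int) == B))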
An important note should be made: parallel BICM allows the decomposition of the nonbinary channel W into L binary channels of the type W̄. These L channels are not independent. For example, if W is an additive noise channel and at some point the noise instance is very strong, this will affect all the decoders and they will fail in decoding together. However, since in the PBICM scheme the channels are used independently, the operation of each decoder depends only on the marginal distribution of the relevant channel outputs. The outputs of these decoders will inevitably be statistically dependent, and we take this into consideration when analyzing the performance of coding using PBICM in the following.

D. Error Probability Analysis

We wish to analyze the performance of PBICM, and specifically, we are interested in the overall codeword error probability. Let C be a binary (n, R) code, used in the PBICM scheme. To assure a fair comparison, we regard each L consecutive information messages (m_1, ..., m_L) as a single message m, and regard the scheme as a code of length n on the channel input alphabet X. We define the following error events: let E_l be the event of a codeword error in DEC_l, and let E be the event of an error in any of the messages {m_1, ..., m_L}, i.e. E = ∪_l E_l. Denote the corresponding error probabilities by p_{el} and p_e, respectively.

Corollary 1: Let p_e(W̄) be the codeword error probability of the code C over the channel W̄. Then the overall error probability p_e of the code C used with PBICM can be bounded by

  p_e(W̄) ≤ p_e ≤ L · p_e(W̄).   (12)

Proof: Since the error events E_l in codewords that are mapped together to the same channel codeword are dependent, we can only bound the overall error probability p_e using the union bound. p_e can also be lower bounded by the minimum of the error probabilities in any of the channels:

  min{p_{e1}, ..., p_{eL}} ≤ p_e ≤ Σ_l p_{el}.   (13)

Since by Theorem 1 the channel between each of the encoder-decoder pairs is W̄, the error probabilities must all be equal to the error probability of the code C over the channel W̄. Setting p_{e1} = p_{e2} = ... = p_{eL} = p_e(W̄) in (13) completes the proof.

In many cases the bit error rate (BER) is of interest. Suppose that each of the messages (m_1, ..., m_L) represents k information bits, so the entire message m represents L·k information bits. Let E^b_{lk'} denote an error in the k'-th bit of the information message m_l. The average BER for the encoder-decoder pair ENC_l-DEC_l is defined by

  p^b_{el} ≜ (1/k) Σ_{k'=1}^{k} Pr{E^b_{lk'}}.   (14)

Similarly, define the overall average BER as

  p^b_e ≜ (1/(L·k)) Σ_{l=1}^{L} Σ_{k'=1}^{k} Pr{E^b_{lk'}} = (1/L) Σ_{l=1}^{L} p^b_{el}.   (15)
Corollary 2: Let p^b_e(W̄) be the average BER of a binary code C over the channel W̄. Then the average BER p^b_e of the code C used with PBICM is equal to p^b_e(W̄).

Proof: Follows directly from Theorem 1 and from the definition of the average BER in (15).

IV. PARALLEL BICM: INFORMATION-THEORETIC ANALYSIS

In the previous section we defined the PBICM scheme and analyzed its basic error probability properties. The equivalence of the channel between each encoder-decoder pair that was established in Theorem 1 enables a full information-theoretic analysis of the scheme. We show that the highest rate achievable by PBICM (the PBICM capacity) is equal to the BICM capacity as in Equation (7), which should not be a surprise. In the finite-length regime, we derive error exponent and channel dispersion results as information-theoretic measures for optimal PBICM performance at finite length.

A. Capacity

Let the PBICM capacity of W, C^PBICM(W), be the highest achievable rate for reliable communication over the channel W with PBICM and a given mapping μ. (As usual, reliable communication means a vanishing codeword error probability as the code length n goes to infinity.)

Theorem 2: The PBICM capacity is given by

  C^PBICM(W) = L · C(W̄) = Σ_{s=1}^{L} C(W_s) = C^BICM(W).   (16)

Proof:
Achievability: Let C^(n) be a series of (binary) capacity-achieving codes for the channel W̄, and let p_e^(n)(W̄) be the corresponding (vanishing) codeword error probabilities. By Corollary 1, the overall error probability of PBICM with a binary code is upper bounded by L times the error probability of the same code over the channel W̄; therefore when the codes C^(n) are used with PBICM, the overall error probability is bounded by L · p_e^(n)(W̄) and also vanishes with n. Since there are L instances of the channel W̄, we get that the rate L · C(W̄) is achievable by PBICM.

Converse: Let C^(n) be a series of binary codes that are used with PBICM and achieve a vanishing overall error probability p_e^(n), and suppose that the overall PBICM rate is given by L · R (a rate of R at each encoder-decoder pair). By Corollary 1, the codeword error probability of a code over W̄ is upper bounded by the overall error probability of the same code used in PBICM. Therefore, if p_e^(n) vanishes as n → ∞, then the error probability over W̄ must also vanish, and therefore the communication rate between each encoder-decoder pair must be upper bounded by C(W̄), and the overall rate cannot surpass L · C(W̄).

All that remains is to calculate the capacity of W̄:

  C(W̄) = I(B; Y, S, D) = I(B; Y, S | D) = ½ ( I(B; Y, S | D = 0) + I(B; Y, S | D = 1) ).   (17)

When D = 0, we get the channel W̃ exactly, and when D = 1 we get the channel W̃ with its input symbols always switched. Either way, the expression I(B; Y, S | D = d) is equal to the capacity of W̃. Using Lemma 1, we get that

  C(W̄) = C(W̃) = (1/L) Σ_{s=1}^{L} C(W_s).   (18)
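Equations (17)-(18) can be checked numerically by building the composite channel W̄ of Definition 2 explicitly, with the triple ⟨y, s, d⟩ flattened into a single output alphabet. The sketch below reuses the illustrative toy channel and helpers of the earlier snippets (again assumptions, not the paper's setup).

import numpy as np

L = 2
gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
W = np.array([[.70, .15, .05, .10], [.15, .55, .15, .05],
              [.05, .15, .55, .15], [.05, .10, .15, .55],
              [.05, .05, .10, .15]])

def sub_channel(i):
    Wi = np.zeros((W.shape[0], 2))
    for bits, x in gray.items():
        Wi[:, bits[i]] += W[:, x] / 2 ** (L - 1)
    return Wi

def capacity_equiprob(P):
    Pb = np.full(P.shape[1], 1.0 / P.shape[1])
    Py = P @ Pb
    ratio = np.where(P > 0, P, Py[:, None]) / Py[:, None]
    return float(Pb @ (P * np.log2(ratio)).sum(axis=0))

subs = [sub_channel(i) for i in range(L)]

# W-bar(y,s,d | b) = (1/(2L)) * W_s(y | b xor d): one row per output triple.
rows = [[subs[s][y, b ^ d] / (2 * L) for b in (0, 1)]
        for s in range(L) for d in (0, 1) for y in range(W.shape[0])]
W_bar = np.array(rows)
assert np.allclose(W_bar.sum(axis=0), 1.0)

print(capacity_equiprob(W_bar))                       # C(W-bar), Eq. (17)
print(np.mean([capacity_equiprob(P) for P in subs]))  # (1/L) sum_s C(W_s), Eq. (18)
# The two printed values agree, as Theorem 2 asserts.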
A note regarding the capacity proof: one might be tempted to try and prove the capacity theorem for PBICM without dither, since with random coding, the code C is merely an i.i.d. binary random vector. This approach fails for the following reason. In the decoding of each codeword, the correctness of the model W̄ relies on the fact that the other codewords are i.i.d. signals. Since PBICM requires a single code for all the L levels, such a condition can never be met. It is possible to prove the achievability without dither when using a different random code at each level, but such an approach will not guarantee the existence of a single code, as required by PBICM.

B. Error Exponent

The error exponent of a channel W is defined by

  E(R) ≜ lim_{n→∞} −(1/n) log(p_e(n)),   (19)

where p_e(n) is the average codeword error probability for the best code of length n. A lower bound on the error exponent for memoryless channels is the random coding error exponent [3], which is given by

  E_r(R) = max_{ρ∈[0,1]} max_{P_X(·)} { E_0(ρ, P_X) − ρR },   (20)

where

  E_0(ρ, P_X) ≜ −log Σ_{y∈Y} [ Σ_{x∈X} P_X(x) W(y|x)^{1/(1+ρ)} ]^{1+ρ}.   (21)

Since we consider equiprobable inputs only, we omit the dependence of E_0(ρ) on P_X, and omit the maximization w.r.t. P_X in (20). Other known bounds on the error exponent include the expurgated exponent (another lower bound), the sphere packing exponent (an upper bound), and others [3]. The random coding and sphere packing exponents coincide for rates above the critical rate, and therefore the error exponent is known precisely at these rates.

1) PBICM error exponent: Similarly to (19), we define the PBICM error exponent:

Definition 3: For a given channel W and a mapping μ, let E^PBICM(R) be defined as

  E^PBICM(R) ≜ lim_{n→∞} −(1/n) log(p_e(n)),   (22)

where p_e(n) is the average codeword error probability for the best PBICM scheme with block length n.

Using Corollary 1, we can calculate the PBICM exponent using the error exponent of W̄:

Theorem 3: The PBICM error exponent of a channel W is given by

  E^PBICM(R) = E(R/L),   (23)

where E(·) is the error exponent function of the binary channel W̄.

Proof: Let C^(n) be a series of binary codes. Denote their codeword error probabilities over the channel W̄ by p_e^(n)(W̄). Let p_e^(n) be the error probabilities of the corresponding PBICM schemes with C^(n) used as the underlying codes. It follows from (12) that

  −(1/n) log( L · p_e^(n)(W̄) ) ≤ −(1/n) log p_e^(n) ≤ −(1/n) log p_e^(n)(W̄).   (24)

By taking n → ∞ the factor of L vanishes and we get that for any series of codes,

  lim_{n→∞} −(1/n) log p_e^(n) = lim_{n→∞} −(1/n) log p_e^(n)(W̄).   (25)

The above equation holds for the series of best codes for the channel W̄, as well as for the series of best codes for PBICM; therefore the equality holds for the sequence of best codes on either side. Since the rate for PBICM is L times the rate for coding on W̄, the proof is concluded.
2) The error exponent of W̄: The channel W̄ has a special structure, and is related to the binary sub-channels W_s. We now calculate two basic bounds for the error exponent of W̄ in terms of the sub-channels W_s. By Theorem 3, the PBICM error exponent of the channel W can be bounded accordingly.

Theorem 4: Let E(R) be the error exponent of the channel W̄. It can be bounded as follows.

Random coding:

  E(R) ≥ E_r(R) = max_{ρ∈[0,1]} { E_0(ρ) − ρR },   (26)

where

  E_0(ρ) = −log E[ 2^{−E_0^{(S)}(ρ)} ],   (27)

E_0^{(s)}(ρ) is the E_0 function for the channel W_s, and the expectation is w.r.t. the state S, which is drawn uniformly from {1, ..., L}.

Sphere packing:

  E(R) ≤ E_sp(R) = max_{ρ>0} { E_0(ρ) − ρR },   (28)

where E_0(ρ) is given in (27).

Proof: The bounds in the theorem are the original random coding and sphere packing exponents [3]. The proof, therefore, boils down to the simplification of the E_0 function to the form of (27). Consider the channel W̄ (Definition 2) with binary input B and outputs ⟨Y, S, D⟩. Since W̄ is equivalent to the channel W̃ with input B ⊕ D, where D is an equiprobable binary RV (known at the receiver), we get that

  W̄(y, s, d|b) = ½ W̃(y, s | b ⊕ d).   (29)

The channel W̃, in turn, is nothing more than the channel W_s with the additional output S. This yields

  ½ W̃(y, s | b ⊕ d) = (1/(2L)) W_s(y | b ⊕ d).   (30)

Combining the above, the function E_0 of W̄ is therefore given by

  E_0(ρ) = −log Σ_{y∈Y} Σ_{s∈{1..L}} Σ_{d∈{0,1}} [ Σ_{b∈{0,1}} P_B(b) W̄(y, s, d|b)^{1/(1+ρ)} ]^{1+ρ}
         = −log Σ_{y,s,d} [ Σ_{b} ½ ( (1/(2L)) W_s(y | b ⊕ d) )^{1/(1+ρ)} ]^{1+ρ}
    (a)  = −log Σ_{s∈{1..L}} (1/L) Σ_{y∈Y} [ Σ_{b'∈{0,1}} ½ W_s(y|b')^{1/(1+ρ)} ]^{1+ρ}   (31)
    (b)  = −log Σ_{s∈{1..L}} (1/L) 2^{−E_0^{(s)}(ρ)} = −log E[ 2^{−E_0^{(S)}(ρ)} ].   (32)

Here (a) follows by setting b' ≜ b ⊕ d and noting that the inner summation is independent of the value of d, so the sum over d and the factor (1/(2L))^{(1+ρ)/(1+ρ)} combine into the 1/L weight; (b) follows from the definition of E_0^{(s)}(ρ) (the E_0 function for the channel W_s).

Several notes can be made:
• It is well known that the random coding and sphere packing exponents coincide at rates above the critical rate. Therefore the exact error exponent of W̄ is known at rates above the critical rate of W̄, R_cr^W̄. It follows that the exact PBICM error exponent is known at rates above R_cr^PBICM ≜ L · R_cr^W̄, which we define to be the PBICM critical rate.
• In Theorem 4 we have shown that the random coding and sphere packing bounds have a compact form because of the special structure of the channel W̄. Clearly, following Theorem 3, every bound on E(R) of W̄ serves as a bound on the PBICM error exponent. However, for other bounds (such as the expurgated exponent [3]), no compact form could be found. Such bounds, of course, can still be applied to bound E^PBICM(R).
• The E_0 function of the channel W̃ is equal to the E_0 function of the channel W̄. This can easily be seen from the proof above: E_0 for W̃ is given in (31) by definition.
• In [7], the authors offered the model of W̃ for calculating the error exponent of BICM. It is claimed that E_0 of the channel W̃ is given by [7, Eq. (37)]:

  E_0(ρ) = E[ E_0^{(S)}(ρ) ] = (1/L) Σ_{s=1}^{L} E_0^{(s)}(ρ).   (33)

As we have just shown in Theorem 4, this is not the exact expression. In fact, it can be shown that E_0(ρ) ≤ E[E_0^{(S)}(ρ)]. This follows directly from the convexity of the function 2^{−(·)} and the Jensen inequality. Therefore the incorrect expression in [7, Eq. (37)] always overestimates the value of E_0(ρ), and the resulting E_r(R) expression also overestimates the true random coding exponent.
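The compact form (27) is straightforward to evaluate once the E_0 functions of the sub-channels are available. The sketch below computes E_0(ρ) via (27), the overestimate (33) of [7], and the resulting random coding bound (26) over a grid of ρ; the toy sub-channels and helper names reuse the illustrative setup of the earlier snippets. By Theorem 3, the PBICM exponent at total rate L·R is then the computed E_r(R) of W̄ (as a lower bound).

import numpy as np

L = 2
gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
W = np.array([[.70, .15, .05, .10], [.15, .55, .15, .05],
              [.05, .15, .55, .15], [.05, .10, .15, .55],
              [.05, .05, .10, .15]])

def sub_channel(i):
    Wi = np.zeros((W.shape[0], 2))
    for bits, x in gray.items():
        Wi[:, bits[i]] += W[:, x] / 2 ** (L - 1)
    return Wi

subs = [sub_channel(i) for i in range(L)]

def E0_sub(P, rho):
    """Gallager's E0 (in bits) for a binary-input channel P[y,b] with
    equiprobable input, cf. Eq. (21)."""
    inner = (0.5 * P ** (1.0 / (1 + rho))).sum(axis=1)
    return -np.log2((inner ** (1 + rho)).sum())

def E0_bar(rho):                      # Eq. (27): exact E0 of W-bar
    return -np.log2(np.mean([2.0 ** (-E0_sub(P, rho)) for P in subs]))

def E0_wrong(rho):                    # Eq. (33), from [7]: always >= E0_bar
    return np.mean([E0_sub(P, rho) for P in subs])

def Er(R, E0fun, rhos=np.linspace(0, 1, 201)):
    return max(E0fun(r) - r * R for r in rhos)   # Eq. (26)

R = 0.3                               # rate of the binary code, in bits
print("E_r exact:", Er(R, E0_bar), " E_r overestimated:", Er(R, E0_wrong))
# PBICM exponent at total rate L*R equals E(R) of W-bar (Theorem 3).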
C. Channel Dispersion

An alternative information-theoretic measure for quantifying coding performance at finite block lengths is the channel dispersion. Suppose that a fixed codeword error probability p_e and a codeword length n are given. We can then seek the maximal achievable rate R given p_e and n. It appears that for fixed p_e and n, the gap to the channel capacity is approximately proportional to Q^{−1}(p_e)/√n (where Q(·) is the complementary Gaussian cumulative distribution function). The proportionality constant (squared) is called the channel dispersion. Formally, define the (operational) channel dispersion as follows [6]:

Definition 4: The dispersion V(W) of a channel W with capacity C is defined as

  V(W) = lim_{p_e→0} limsup_{n→∞} n · ( (C − R(n, p_e)) / Q^{−1}(p_e) )²,   (34)

where R(n, p_e) is the highest achievable rate for codeword error probability p_e and codeword length n.

In 1962, Strassen [4] used the Gaussian approximation to derive the following result for DMCs (see Appendix B for the big-O notation):

  R(n, p_e) = C − √(V/n) Q^{−1}(p_e) + O(log n / n),   (35)

where C is the channel capacity, and the new quantity V is the (information-theoretic) dispersion, given by

  V ≜ VAR( i(X; Y) ),   (36)

where i(x; y) is the information spectrum, given by

  i(x; y) ≜ log [ P_{XY}(x, y) / (P_X(x) P_Y(y)) ],   (37)

and the distribution of X is the capacity-achieving distribution that minimizes V. Strassen's result proves that the dispersion of DMCs is equal to VAR(i(X;Y)). This result was recently tightened (and extended to the power-constrained AWGN channel) in [6]. It is also known that the channel dispersion and the error exponent are related as follows: for a channel with capacity C and dispersion V, the error exponent can be approximated by E(R) ≅ (C−R)²/(2V ln 2). See [6] for details on the early origins of this approximation by Shannon.

1) PBICM dispersion: In order to estimate the finite-block performance of PBICM schemes we extend the dispersion definition as follows:

Definition 5: The PBICM dispersion V^PBICM(W) of a channel W with a given mapping μ and PBICM capacity C^PBICM(W) is defined as

  V^PBICM(W) = lim_{p_e→0} limsup_{n→∞} n · ( (C^PBICM(W) − R(n, p_e)) / Q^{−1}(p_e) )²,   (38)

where R(n, p_e) is the highest achievable rate for any PBICM scheme with a given n and p_e.

Relying on the relationship between the PBICM scheme and the binary channel W̄, we can show the following:

Theorem 5: Let n be a given block length and let p_e be a given codeword error probability. The highest achievable rate attained using PBICM, R^PBICM(n, p_e), is bounded from below and above by:

  R^PBICM(n, p_e) ≥ C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e/L) + O(1/n),   (39)
  R^PBICM(n, p_e) ≤ C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e) + O(log n / n).   (40)

As a result, the PBICM dispersion is given by

  V^PBICM(W) = L² V(W̄).   (41)

Proof:
Direct: From the achievability proof of (35) [6, Theorem 45], there must exist an (n, R', p'_e = p_e/L) binary code for W̄ that satisfies

  R' ≥ C(W̄) − √( V(W̄)/n ) Q^{−1}(p_e/L) + O(1/n).   (42)

By Theorem 1 and Corollary 1, it follows that the overall error probability of the PBICM scheme based on this code is not greater than L·p'_e = p_e. The rate of the PBICM scheme satisfies

  R = L · R' ≥ L [ C(W̄) − √( V(W̄)/n ) Q^{−1}(p'_e) + O(1/n) ]   (43)
    = C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e/L) + O(1/n).   (44)

Converse: Suppose we have an (n, R, p_e) PBICM scheme. According to Corollary 1, the codeword error probability p'_e of the underlying binary code is not greater than p_e. By Equation (35), the rate R' of the underlying binary code is bounded by

  R' ≤ C(W̄) − √( V(W̄)/n ) Q^{−1}(p'_e) + O(log n / n).   (45)

Since Q^{−1}(·) is a decreasing function, the bound loosens by replacing p'_e with the higher p_e. Therefore the overall rate R is bounded by

  R = L · R' ≤ L [ C(W̄) − √( V(W̄)/n ) Q^{−1}(p'_e) + O(log n / n) ]   (46)
    ≤ C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e) + O(log n / n).   (47)

PBICM dispersion: Rewriting Equations (39) and (40), we get the following:

  C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e/L) + O(1/n) ≤ R ≤ C^PBICM(W) − √( L² V(W̄)/n ) Q^{−1}(p_e) + O(log n / n),   (48)

and therefore

  √(L² V(W̄)) + O(log n / √n) ≤ √n · (C^PBICM(W) − R) / Q^{−1}(p_e) ≤ √(L² V(W̄)) · Q^{−1}(p_e/L)/Q^{−1}(p_e) + O(1/√n).   (49)

Taking the limit w.r.t. n yields

  √(L² V(W̄)) ≤ limsup_{n→∞} √n · (C^PBICM(W) − R) / Q^{−1}(p_e) ≤ √(L² V(W̄)) · Q^{−1}(p_e/L)/Q^{−1}(p_e),   (50)

or

  L² V(W̄) ≤ limsup_{n→∞} n · ( (C^PBICM(W) − R) / Q^{−1}(p_e) )² ≤ L² V(W̄) · ( Q^{−1}(p_e/L)/Q^{−1}(p_e) )².   (51)

By noting that lim_{ε→0+} Q^{−1}(ε)²/(2 ln(1/ε)) = 1 (see Appendix A), we get that

  lim_{p_e→0} ( Q^{−1}(p_e/L)/Q^{−1}(p_e) )² = lim_{p_e→0} ln(L/p_e)/ln(1/p_e) = lim_{p_e→0} [ln(1/p_e) + ln L]/ln(1/p_e) = 1,   (52)

which leads to the desired result:

  V^PBICM(W) = lim_{p_e→0} limsup_{n→∞} n · ( (C^PBICM(W) − R(n, p_e)) / Q^{−1}(p_e) )² = L² V(W̄).   (53)
Note that the PBICM dispersion result is not as tight as the bound for general coding schemes in (35). The reason is the unavoidable use of the union bound when estimating the overall error probability of PBICM in Corollary 1. In the dispersion proof for DMCs, the value of the dispersion is obtained even without taking the limit w.r.t. p_e. However, the gap between Q^{−1}(p_e) and Q^{−1}(p_e/L) for values of interest is not very large.

2) The dispersion of W̄: As in the error exponent case, the PBICM dispersion of a channel is related to the dispersion of the binary channel W̄. We now calculate it explicitly from the dispersions of the sub-channels W_s.

Theorem 6: The dispersion of the channel W̄ is given by

  V(W̄) = V(W̃) = E[V(W_S)] + VAR[C(W_S)] = (1/L) Σ_{s=1}^{L} V(W_s) + VAR(C(W_S)),   (54)

where VAR(C(W_S)) is the statistical variance of the capacity of W_S, i.e.

  VAR(C(W_S)) ≜ E[C²(W_S)] − E²[C(W_S)].   (55)

Proof: Consider the channel W̄ (Definition 2) with binary input B and outputs ⟨Y, S, D⟩, and recall that

  P_{YSD|B}(y, s, d|b) = W̄(y, s, d|b) = ½ W̃(y, s | b ⊕ d) = (1/(2L)) W_s(y | b ⊕ d).   (56)

We first calculate the dispersion of W̃. Since S and the channel input B are independent, the information spectrum is given by

  i(b; y, s) ≜ log [ P_{YSB}(y, s, b) / (P_{YS}(y, s) P_B(b)) ] = log [ P_{Y|SB}(y|s, b) P_S(s) P_B(b) / (P_{YS}(y, s) P_B(b)) ]   (57)
            = log [ P_{Y|SB}(y|s, b) / P_{Y|S}(y|s) ] ≜ i(b; y|s).   (58)

Using this notation, the dispersion of the channel W_s is given by

  V(W_s) = VAR( i(B; Y|s) | S = s ) = E[ i²(B; Y|s) | S = s ] − C(W_s)².

Next, the dispersion of the channel W̃ is given as follows:

  V(W̃) = VAR( i(B; Y, S) ) = VAR( i(B; Y|S) )
     (a) = E[ VAR( i(B; Y|s) | S = s ) ] + VAR( E[ i(B; Y|s) | S = s ] )
         = E[V(W_S)] + VAR[C(W_S)] = (1/L) Σ_{s=1}^{L} V(W_s) + VAR(C(W_S)),   (59)

where (a) follows from the law of total variance.

Finally, the dispersion of the channel W̄ is calculated as follows. Let us combine the outputs of the channel W̃ into a single output Z = ⟨Y, S⟩. We therefore end up with a channel with input B and outputs Z and D (see Fig. 9). Similarly to (57), we get that the information spectrum is given by

  i(b; z, d) ≜ log [ P_{ZDB}(z, d, b) / (P_{ZD}(z, d) P_B(b)) ] = i(b; z|d).   (60)

Following (59), we get that

  V(W̄) = E[V(W̃_D)] + VAR( C(W̃_D) ) = ½ Σ_{d∈{0,1}} V(W̃_d) + VAR( C(W̃_D) ),   (61)

where W̃_d is the channel W̃ with its input XORed with the value d. Since only equiprobable inputs are considered, it follows that C(W̃_0) = C(W̃_1) = C(W̃), and that V(W̃_0) = V(W̃_1) = V(W̃). It therefore follows that VAR(C(W̃_D)) = 0, and consequently, V(W̄) = V(W̃), as required.

Note that since a larger dispersion means a higher backoff from capacity (see (35)), the term VAR(C(W_S)) can be thought of as a penalty factor for the dispersion, over the expected dispersion of the channels W_s, E[V(W_S)]. This factor grows as the capacities of the sub-channels W_s are more spread out.
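Theorem 6 reduces to a few lines of code: compute C(W_s) and V(W_s) from the information spectrum of each sub-channel, then combine via (54) and (41). The toy sub-channels and helper names again reuse the illustrative setup of the earlier snippets, not the paper's constellations.

import numpy as np

L = 2
gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
W = np.array([[.70, .15, .05, .10], [.15, .55, .15, .05],
              [.05, .15, .55, .15], [.05, .10, .15, .55],
              [.05, .05, .10, .15]])

def sub_channel(i):
    Wi = np.zeros((W.shape[0], 2))
    for bits, x in gray.items():
        Wi[:, bits[i]] += W[:, x] / 2 ** (L - 1)
    return Wi

def C_and_V(P):
    """Capacity (bits) and dispersion (bits^2) of a binary-input channel
    P[y,b] with equiprobable input: V = E[i^2] - C^2, cf. Eq. (59)."""
    Pb = np.array([0.5, 0.5])
    Py = P @ Pb
    i_spec = np.log2(np.where(P > 0, P, Py[:, None]) / Py[:, None])
    C = float(Pb @ (P * i_spec).sum(axis=0))
    second = float(Pb @ (P * i_spec ** 2).sum(axis=0))
    return C, second - C ** 2

Cs, Vs = zip(*(C_and_V(sub_channel(i)) for i in range(L)))
V_bar = np.mean(Vs) + np.var(Cs)   # Eq. (54): E[V(W_S)] + VAR(C(W_S))
print("V(W-bar) =", V_bar, " V_PBICM =", L ** 2 * V_bar)   # Eq. (41)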
V. NUMERICAL RESULTS

In this section we numerically evaluate the information-theoretic quantities for PBICM. In particular, we calculate the PBICM random coding error exponent (see Theorems 3 and 4) in order to compare with the mismatched decoding approach [8]. We consider the AWGN channel and the Rayleigh fading channel (with perfect channel state information at the receiver) over a wide range of SNR values and constellations. Gray mapping was used throughout all the examples.

A. Normalization: latency vs. complexity

One way to compare the PBICM error exponent with the mismatched decoding exponent is to compare the error probability when the block length n is fixed, which gives a simple comparison between the exponent values. Such an approach makes sense, since both schemes have the same latency of n channel uses. As will be seen in the coming examples, for fixed n the PBICM error exponent is inferior to that of the mismatched decoding. This can also be seen by observing that the PBICM random coding exponent has a slope of −1/L (in its straight-line region), where the mismatched decoding exponent has a slope of −1.

However, it should be taken into consideration that when the block length is n, the mismatched decoder is working with a binary code of length n·L. The complexity of the maximum-metric decoder is proportional to the number of codewords, 2^{n·L·R} [8], where R is the rate of the binary code. On the other hand, the number of codewords in the PBICM scheme is only L · 2^{n·R}. In order to assure a fair comparison from the complexity point of view, one has to allow the PBICM scheme to use a block length that is L times the block length of the mismatched decoding scheme. Comparing the error probabilities of both schemes gives n·L·E_r^PBICM = n·E_r^Mismatched. We therefore define the normalized PBICM error exponent as L times the PBICM error exponent. We conclude that when the complexity is more important (and the latency is less important), the normalized PBICM exponent is the quantity of interest.

It could be claimed, of course, that practical codes used today (such as low-density parity check (LDPC) codes) will be used, and these do not have exponential decoding complexity. On the other hand, such codes do not guarantee an exponentially decaying error probability.

B. Comparison with the Mismatched Decoding Exponent

In the following figures we show the comparison between the PBICM error exponent and the mismatched decoding error exponent [8]. The figures show the (unconstrained) random coding error exponent of the channel, along with the mismatched error exponent and the PBICM random coding error exponent (both normalized and un-normalized).

Figure 10 compares the exponents of 16-QAM signaling over the Rayleigh fading channel at SNR = 5 dB. Figure 11 shows the same graph, zoomed in on the capacity region. It can be seen that throughout the entire range of rates between zero and the BICM capacity, the normalized PBICM random coding exponent is higher (better) than the mismatched decoding exponent. Both BICM exponents are above zero for rates below the BICM capacity, and the unconstrained random coding exponent reaches zero at the full channel capacity, as expected.

A fact that might be somewhat surprising at first glance is that the normalized PBICM exponent is better than the unconstrained random coding exponent at some rates. While this may seem contradictory, recall that we consider coding schemes with the same maximum-likelihood (or maximum-metric) complexity. When normalizing the schemes' complexity, PBICM operates with a block length that is L times the block length of the unconstrained scheme, and therefore there is no contradiction.
The mismatched decoder never attains higher values than the unconstrained exponent, a fact that is known as the data processing inequality for exponents (see e.g. [8, Proposition 3.2]).

Figure 12 shows a similar picture (zoomed on the capacity in Figure 13). Again, the normalized PBICM exponent outperforms the mismatched decoding exponent for all rates. In this case, the BICM capacity is very close to the full channel capacity, which enables the normalized PBICM exponent to outperform the unconstrained exponent for essentially all rates.

On the Rayleigh fading channel, the same behavior was observed over all practical ranges of SNR for 8-PSK, 16-QAM and 64-QAM signaling: the normalized PBICM exponent outperformed the mismatched decoding exponent. On the AWGN channel it cannot be claimed that the normalized PBICM exponent outperforms the mismatched exponent, and the other way around is also not true: for 16-QAM signaling and an SNR of 0 dB (Fig. 14) the normalized PBICM exponent was better, while for an SNR of 5 dB the mismatched exponent was better (Fig. 15).

VI. DISCUSSION

In this paper we have presented parallel bit-interleaved coded modulation (PBICM). The scheme is based on a finite-length interleaver and on adding binary dither to the binary codewords. The scheme is shown to be equivalent to a binary memoryless channel; therefore the scheme allows easy code design and exact analysis. The scheme was analyzed from an information-theoretic viewpoint, and the capacity, error exponent and dispersion of the PBICM scheme were calculated.

Another approach for analyzing BICM at finite block length was proposed in [8], where BICM is thought of as a mismatched decoder. Since this BICM setting uses finite length, the random coding error exponent of the scheme can be calculated. In the previous section we have compared the error exponents of PBICM and of the mismatched decoding approach. When the two schemes have the same latency (same block length), the PBICM exponent is inferior to that of the mismatched decoding approach. However, when the complexity of the scheme is considered (or equivalently, when the codeword length of the underlying code is the same), PBICM becomes comparable, and is generally better over the Rayleigh fading channel.

An important merit of the PBICM scheme is that it allows easy code design. In PBICM, one has to design a binary code for a memoryless binary channel. In recent years, methods have been developed to design very efficient binary codes, such as LDPC codes [10]. When designing LDPC codes, a desired property of a binary channel is that its output be symmetric. It appears that no matter what channel W we have at hand, the resulting binary channel W̄ is always output-symmetric (when the output is the LLR). Because of its simplicity and easy code design, we conclude that PBICM is an attractive practical communication scheme, which also allows exact theoretical analysis.

Fig. 10. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5 dB.
Fig. 11. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5 dB (zoomed on the capacity).

Fig. 12. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20 dB.

Fig. 13. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20 dB (zoomed on the capacity).

Fig. 14. Random coding exponents over the AWGN channel with 16-QAM signaling and SNR of 0 dB.

Fig. 15. Random coding exponents over the AWGN channel with 16-QAM signaling and SNR of 5 dB.

Several additional notes can be made:
• The analysis holds for any mapping μ. Finding the mapping that yields the optimal performance at finite lengths is an open question (although Gray mapping is expected to perform well).
• The PBICM scheme is composed of, among other things, binary dither. Such a tool is used in some cases as a theoretical device for proving achievability. In PBICM, it is an essential part of the scheme itself, and even the random coding capacity proof becomes impossible without it. The main role of the dither is to validate the equivalence of the PBICM scheme to a binary memoryless channel. In addition, the binary dither is the element that symmetrizes the binary channel, which makes the code design easier. This symmetrization property was also noticed in [11], where a similar dither is used with BICM (and termed 'channel adapters'). The code design proposed in [11] relies on the assumption of an ideal interleaver.
• The channel is assumed to be memoryless. This captures many interesting channels, including the AWGN channel, and the memoryless fading channel with and without state known at the receiver (ergodic fading). For slow-fading channels, another interleaver (a symbol interleaver) is required in order to transform the slowly fading channel into a fast-fading channel (cf. [2]).

APPENDIX A
APPROXIMATION OF THE INVERSE Q-FUNCTION

The following is a useful approximation for the inverse Q-function.

Lemma 2:

  lim_{ε→0} (Q^{−1}(ε))² / (2 ln(1/ε)) = 1.   (62)

Proof: We start with the well-known bounds on the Q-function:

  (1/(√(2π) x)) (1 + 1/x²)^{−1} e^{−x²/2} ≤ Q(x) ≤ (1/(√(2π) x)) e^{−x²/2}.   (63)

Dividing by the upper bound yields

  (1 + 1/x²)^{−1} ≤ Q(x) / [ (1/(√(2π) x)) e^{−x²/2} ] ≤ 1.   (64)

Taking the limit x → ∞ gives

  lim_{x→∞} Q(x) / [ (1/(√(2π) x)) e^{−x²/2} ] = 1.   (65)

Since the limit exists, we may take the natural logarithm:

  lim_{x→∞} ln { Q(x) / [ (1/(√(2π) x)) e^{−x²/2} ] } = 0,   (66)

i.e.,

  lim_{x→∞} [ ln Q(x) − ln(1/(√(2π) x)) − ln e^{−x²/2} ] = 0.   (67)

Since lim_{x→∞} ln Q(x) = −∞, we get

  lim_{x→∞} [ ln Q(x) − ln(1/(√(2π) x)) − ln e^{−x²/2} ] / ln Q(x) = 0,   (68)

which leads to

  lim_{x→∞} ln e^{−x²/2} / ln Q(x) = lim_{x→∞} −x² / (2 ln Q(x)) = 1.   (69)

Since lim_{ε→0} Q^{−1}(ε) = ∞, we may substitute x with Q^{−1}(ε) and write

  lim_{ε→0} −(Q^{−1}(ε))² / (2 ln ε) = 1,   (70)

which leads to (62).
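Lemma 2, together with the ratio (52) used in the dispersion proof, can be checked numerically. The sketch below inverts Q by bisection (Qinv is a hypothetical helper of ours, not from the paper; math.erfc is the standard library function) and shows the slow convergence of both quantities toward 1.

import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Qinv(eps, lo=0.0, hi=40.0):
    """Invert the (strictly decreasing) Q-function by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

L = 4  # example level count for the ratio in Eq. (52)
for eps in (1e-2, 1e-4, 1e-8, 1e-12):
    lemma2 = Qinv(eps) ** 2 / (2.0 * math.log(1.0 / eps))   # Eq. (62) -> 1
    ratio = (Qinv(eps / L) / Qinv(eps)) ** 2                # Eq. (52) -> 1
    print(f"eps={eps:.0e}  Qinv^2/(2 ln(1/eps))={lemma2:.3f}  ratio={ratio:.3f}")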
for all n > n0 , |f (n)| ≤ εn or equivalently, that − cεn ≤ f (n) ≤ cεn . (71) fn = gn + O(εn ) will mean that fn − gn = O(εn ), which means that fn can be approximated by gn , up to a factor that is not greater in absolute value than c · εn for some constant c. Sometimes we will be interested in only one of the sides in (71). For that purpose, f (n) ≤ O(εn ) means that there exist c > 0 and n0 > 0 s.t. for all n > n0 , f (n) ≤ c · εn , and f (n) ≥ O(εn ) will mean that there exist c > 0 and n0 > 0 s.t. for all n > n0 , −f (n) ≤ c · εn . The different combinations of usages of the O notation are listed in the table below. Notation fn = O(εn ) fn = gn + O(εn ) fn ≤ O(εn ) fn ≤ gn + O(εn ) fn ≥ O(εn ) fn ≥ gn + O(εn ) Meaning ∃c>0,n0 >0 ∀n>n0 |fn | ≤ c · εn fn − gn = O(εn ) ∃c>0,n0 >0 ∀n>n0 fn ≤ c · εn fn − gn ≤ O(εn ) −fn ≤ O(εn ), or ∃c>0,n0 >0 ∀n>n0 −fn ≤ c · εn fn − gn ≥ O(εn ) Note that fn ≤ O(εn ) with fn ≥ O(εn ) is equivalent to fn = O(εn ), as expected. ACKNOWLEDGMENT Interesting discussions with A. G. i Fàbregas are acknowledged. R EFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. on Communications, vol. 40, no. 5, pp. 873–884, May 1992. G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 927–946, 1998. R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, Inc., 1968. V. Strassen, “Asymptotische abschätzungen in shannons informationstheorie,” Trans. Third Prague Conf. Information Theory, 1962, Czechoslovak Academy of Sciences, pp. 689–723. Y. Polyanskiy, V. Poor, and S. Verdú, “Dispersion of Gaussian channels,” in Proc. IEEE International Symposium on Information Theory, 2009, pp. 2204–2208. Y. Polyanskiy, H. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2307 –2359, May 2010. U. Wachsmann, R. F. H. Fischer, and J. B. Huber, “Multilevel codes: Theoretical concepts and practical design rules,” IEEE Trans. on Information Theory, vol. 45, no. 5, pp. 1361–1391, 1999. A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, “Bit-interleaved coded modulation revisited: A mismatched decoding perspective,” IEEE Trans. on Information Theory, vol. 55, no. 6, pp. 2756–2765, June 2009. A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Bit-interleaved coded modulation,” Foundations and Trends in Communications and Information Theory, vol. 5, no. 1-2, pp. 1–153, 2008. T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 619–637, 2001. J. Hou, P. H. Siegel, L. B. Milstein, and H. D. Pfister, “Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes,” IEEE Trans. on Information Theory, vol. 49, no. 9, pp. 2141–2155, 2003.