Parallel Bit Interleaved Coded Modulation
Amir Ingber† and Meir Feder
Department of EE-Systems, Tel Aviv University
Tel Aviv 69978, ISRAEL
{ingber, meir}@eng.tau.ac.il
Abstract
A new variant of bit interleaved coded modulation (BICM) is proposed. In the new scheme, called parallel BICM, L identical binary codes are used in parallel, combined by a mapper, a newly proposed finite-length interleaver, and a binary dither signal. As opposed
to previous approaches, the scheme does not rely on any assumptions of an ideal, infinite-length interleaver. Over a memoryless
channel, the new scheme is proven to be equivalent to a binary memoryless channel. Therefore the scheme enables one to easily
design coded modulation schemes using a simple binary code that was designed for that binary channel. The overall performance
of the coded modulation scheme is analytically evaluated based on the performance of the binary code over the binary channel.
The new scheme is analyzed from an information theoretic viewpoint, where the capacity, error exponent and channel dispersion
are considered. The capacity of the scheme is identical to the BICM capacity. The error exponent of the scheme is numerically
compared to a recently proposed mismatched-decoding exponent analysis of BICM.
I. INTRODUCTION
Bit interleaved coded modulation (BICM) is a pragmatic approach for coded modulation [1]. It enables the construction
of nonbinary communication schemes from binary codes by using a long bit interleaver that separates the coding and the
modulation. BICM has drawn much attention in recent years, because of its efficiency for wireless and fading channels.
The information-theoretic properties of BICM were first studied by Caire et al. in [2]. BICM was modeled as a binary
channel with a random state that is known at the receiver. The state determines how the input bit is mapped to the channel,
along with the other bits that are assumed to be random. Under the assumption of an infinite-length, ideal interleaver, the
BICM scheme is modeled by parallel uses of independent instances of this binary channel. This model is referred to as the
independent parallel channel model.
Using this model, the capacity of the BICM scheme can be calculated. It was further shown that BICM suffers a gap to the full channel capacity, and that when Gray mapping is used this gap is generally small. In [2], methods for evaluating the error probability of BICM were proposed, which rely on the properties of the specific binary codes that are used (e.g. the Hamming weight of error events).
A basic information-theoretic quantity other than the channel capacity is the error exponent [3], which quantifies the speed
at which the error probability decreases to zero with the block length n. Another tool for evaluating the performance at finite
block length is the channel dispersion, which was presented in 1962 [4] and was given more attention only in recent years
[5], [6]. It would therefore be interesting to analyze BICM at finite block length from the information-theoretic viewpoint.
Several attempts have been made to provide error exponent results for BICM.
In their work on multilevel codes, Wachsmann et al. [7] considered the random coding error exponent of BICM, relying on the independent parallel channel model. However, there were several flaws in the derivation:
• The independent parallel channel model is justified by an infinite-length interleaver. Therefore it might be problematic to use its properties for evaluating the finite-length performance of the BICM scheme. In the current paper we address this point and propose a scheme with a finite-length interleaver for that purpose.
• There was a technical flaw in the derivation, which resulted in an inaccurate expression for the random coding error
exponent. We discuss this point in detail in Theorem 4.
• As noticed in [8], the error exponent result obtained in [7] may sometimes even exceed that of unconstrained coding over the channel (called in [8] the “coded modulation exponent”). We therefore agree with [8] in the claim that “the independent parallel channel model fails to capture the statistics of the channel”. However, by properly designing the communication scheme, the model can be made valid in a rigorous way, as we show in Theorem 1.
In [8] (see also [9]), Martinez et al. considered the BICM decoder as a mismatched decoder that has access only to the log-likelihood ratio (LLR) values of each bit, where the LLR calculation assumes that the other bits are random, independent and equiprobable (as in the classical BICM scheme [2]). Using results from mismatched decoding, they presented the generalized error exponent and the generalized mutual information, and pinpointed the loss of BICM incurred by using the mismatched LLRs. Note that when a binary code of length n is used, this scheme requires only n/L channel uses. While this result is valid for any block size and any interleaver length, achieving this error exponent in practice requires complex code design.
† A. Ingber is supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities.
For example, one cannot design a good binary code for a binary memoryless channel and have any guarantee that the BICM
scheme will perform well with that code. In fact, the code design for this scheme requires taking into account the memory
within the levels, or equivalently, nonbinary codes, which is what we wish to avoid when choosing BICM.
On the theoretical side, another drawback of existing approaches is the lack of converse results (for either capacity or error exponent). The initial discussion of BICM information theory in [2] assumes the model of independent channels, and any converse result based on this model must assume an infinite, ideal interleaver. Therefore converse results (such as an upper bound on the rate achievable with BICM) do not hold for finite-length interleavers. The authors in [8] provide no converse results for their model.
In this paper we propose the parallel BICM (PBICM) scheme, which has the following properties. First, the scheme includes an explicit, finite-length interleaver. Second, in order to attain good performance on any memoryless channel, PBICM allows one to design a binary code for a binary memoryless channel, and guarantees good performance on the nonbinary channel. Third, because the scheme does not rely on an infinite-length interleaver, the error exponent and the dispersion of the scheme can be calculated (both achievability and converse results) as a means to evaluate the PBICM performance at finite block length.
The comparison between PBICM and the mismatched decoding approach [8] should be done with care. With PBICM, when the binary codeword length is n, the scheme requires n channel uses. Therefore, when the latency is kept equal for both schemes, PBICM uses a codeword that is L times shorter than the codeword used by the mismatched decoder. A fair comparison would be to fix the binary codeword length n for both schemes, resulting in different latencies but equal decoder complexity.
The results presented in the paper are summarized as follows:
• The PBICM communication framework is presented. Over a memoryless channel, it is shown to be equivalent to a binary
memoryless channel (Theorem 1).
• In Theorem 2, the capacity of PBICM is shown to be equal to the BICM capacity, as calculated in [2].
• PBICM is analyzed at finite block length. The error exponent of PBICM is defined and bounded by error exponent bounds
of the underlying binary channel (Theorems 3 and 4).
• The PBICM dispersion is defined as an alternative measure of finite-length performance, and is calculated from the dispersion of the underlying binary channel (Theorems 5 and 6).
• The error exponent of PBICM is numerically compared to the mismatched-decoding error exponent of BICM [8]. The additive white Gaussian noise (AWGN) channel and the Rayleigh fading channel are considered. When the latency of both schemes is equal, the mismatched decoding exponent is generally better. However, when the complexity is equal (or equivalently, when the codeword length of the underlying binary code is equal), the PBICM exponent is better in many cases.
The paper is organized as follows.
In Section II we review the classical BICM model and its properties, under the assumption of an infinite-length, ideal
interleaver. In Section III the parallel BICM scheme is presented and the equivalence to a memoryless binary channel is
established. In Section IV parallel BICM is studied from an information-theoretical viewpoint. Numerical examples and
summary follow in Sections V and VI respectively.
II. THE BICM COMMUNICATION MODEL
Notation: letters in bold (x, y, ...) denote row vectors, capital letters (X, Y, ...) denote random variables, and a tilde denotes interleaved signals (b̃, z̃). $P_X(x)$ denotes the probability that the random variable (RV) $X$ takes the value $x$, and similarly $P_{Y|X}(y|x)$ denotes the probability that $Y$ takes the value $y$ given that the RV $X$ equals $x$. $E[\cdot]$ denotes statistical expectation. $\log$ denotes $\log_2$.
A. Channel model
Let $W$ denote a memoryless channel with input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The transition probabilities are defined by $W(y|x)$ for $y\in\mathcal{Y}$ and $x\in\mathcal{X}$. We assume that $|\mathcal{X}| = 2^L$. We consider equiprobable signaling only over the channel $W$.
An $(n,R)$ code $\mathcal{C}\subseteq\mathcal{X}^n$ is a set of $M = 2^{nR}$ codewords $\mathbf{c}\in\mathcal{X}^n$. The encoder wishes to convey one of $M$ equiprobable messages. The error probability of interest shall be the codeword error probability. An $(n,R)$ code with codeword error probability $p_e$ will sometimes be called an $(n,R,p_e)$ code.
B. Classical BICM encoding and decoding
In BICM, a binary code is used to encode information messages [m1 , m2 , ...] into binary codewords [b1 , b2 , ...]. The binary
codewords are then interleaved using a long interleaver π(·), which applies a permutation on the coded bits. The interleaved
bit stream b̃ is partitioned into groups of $L$ consecutive bits and inserted into a mapper $\mu: \{0,1\}^L \to \mathcal{X}$. The mapper output,
denoted x, is fed into the channel. The encoding process is described in Figure 1.
Fig. 1. BICM encoding process: the binary encoder output [b1, b2, ...] passes through the interleaver π and the mapper µ to produce the channel input x.

Fig. 2. BICM decoding process: the channel output y passes through the LLR calculator, the de-interleaver π⁻¹ and the binary decoder to produce [m̂1, m̂2, ...].
The decoding process of BICM proceeds as follows. The channel output y is fed into a bit metric calculator, which calculates
the log-likelihood ratio (LLR) of each input bit b given the corresponding output sample y (L LLR values for each output
sample). These LLR values (or bit metrics) denoted z̃ are de-interleaved and partitioned into bit metrics [z1 , z2 , ...] that
correspond to the binary input codewords. Finally, the binary decoder decodes the messages [m̂1 , m̂2 , ...] from [z1 , z2 , ...]. The
decoding process is described in Figure 2.
The LLR of the $j$-th bit in a symbol given the output value $y$ is calculated as follows:
$$\mathrm{LLR}_j(y) \triangleq \log \frac{P_{Y|B_j}(y|0)}{P_{Y|B_j}(y|1)}, \qquad (1)$$
where $P_{Y|B_j}(y|b)$ is the conditional probability of the channel output taking the value $y$ given that the $j$-th bit at the mapper input was $b$, and the other $(L-1)$ bits are equiprobable independent binary random variables (RVs).
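As a concrete illustration (not part of the original development), the following sketch evaluates (1) for a channel given as a transition matrix; the names `W` and `labels` are our own illustrative assumptions.

```python
import numpy as np

def bitwise_llr(W, labels, j, y):
    """Eq. (1): LLR of the j-th label bit given output y, where the other
    L-1 bits are treated as i.i.d. equiprobable.  W[x, y] = W(y|x) is the
    channel matrix; labels[x] is the L-bit label mu assigns to symbol x."""
    # P_{Y|B_j}(y|b): average of W(y|x) over the 2^(L-1) symbols with bit j = b
    p0 = W[labels[:, j] == 0, y].mean()
    p1 = W[labels[:, j] == 1, y].mean()
    return np.log2(p0 / p1)
```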
C. Classical BICM analysis: ideal interleaving
In classical BICM (e.g. [2]) the LLR calculation is motivated by the assumption of a very long (ideal) interleaver π, so the
coded bits go through essentially independent channels. These binary channels are defined as follows:
Definition 1: Let $W_i$ be a binary channel with transition probability
$$W_i(y|b) \triangleq E\left[W(y|X=\mu(B_1,\ldots,B_L)) \mid B_i=b\right] \qquad (2)$$
$$= \frac{1}{2^{L-1}} \sum_{\substack{b_j:\ j\neq i \\ b_i=b}} W(y|\mu(b_1,\ldots,b_L)). \qquad (3)$$
The channel $W_i(y|b_i)$ can be thought of as the original channel $W$ where the input is $x = \mu(b_1,\ldots,b_L)$ and the bits $\{B_j\}_{j\neq i}$ are equiprobable independent RVs (see Fig. 3).
In [2], Caire et al. proposed the following channel model for BICM, called the independent parallel channel model. In this model the channel has a binary input $b$. A channel state $s$ is selected at random from $\mathcal{S} \triangleq \{1,\ldots,L\}$ with equal probability (and independently of $b$). Given a state $s$, the input bit $b$ is fed into the channel $W_s$. The channel outputs are the state $s$ and the output $y$ of the channel $W_s$. The channel, denoted by $\widetilde{W}$, is depicted in Figure 4. The transition probability function of $\widetilde{W}$ is given by
$$\widetilde{W}(y,s|b) = P_{Y,S|B}(y,s|b) = P_{Y|S,B}(y|s,b)\,P_S(s) = \frac{1}{L}\,W_s(y|b). \qquad (4)$$

Fig. 3. The binary channel $W_i$. The bits $\{B_j\}_{j\neq i}$ are equiprobable independent RVs.
Fig. 4. The binary channel $\widetilde{W}$. The random state $S$ is known at the receiver.
Note that both outputs can be combined into a single output, the LLR, which is a sufficient statistic for optimal decoding over any binary-input channel. The LLR calculation for the channel $\widetilde{W}$ is given by
$$\mathrm{LLR}_{\widetilde{W}}(y,s) = \mathrm{LLR}_s(y), \qquad (5)$$
where $\mathrm{LLR}_s$ is given in (1).
Therefore the independent parallel channel model transforms the original nonbinary channel $W$ into a simple, memoryless channel. Using an infinite-length interleaver and a binary code designed for the simple binary channel $\widetilde{W}$, reliable communication over the original channel $W$ can be attained.
Let C(·) denote the Shannon capacity of a channel (with equiprobable input).
Lemma 1 (following [2]): Let $C^{\mathrm{BICM}}(W)$ denote the capacity of the channel $W$ with BICM, a given mapping $\mu(\cdot)$ and an infinite-length interleaver (according to the independent parallel channel model). Denote by $C(W_s)$ the capacity of the channel $W_s$. Then
$$C^{\mathrm{BICM}}(W) = \sum_{s=1}^{L} C(W_s). \qquad (6)$$
Proof: Since the independent parallel channel model assumes $L$ independent uses of the channel $\widetilde{W}$, we get that $C^{\mathrm{BICM}}(W) = L \cdot C(\widetilde{W})$. The capacity of $\widetilde{W}$ is given by
$$C(\widetilde{W}) = I(B;Y,S) = I(B;Y|S) = E_S\left[I(B;Y|S=s)\right] = E_S\left[C(W_S)\right] = \frac{1}{L}\sum_{s=1}^{L} C(W_s). \qquad (7)$$
It is known that $C^{\mathrm{BICM}}(W)$ is generally smaller than the full channel capacity $C(W)$, as opposed to other schemes, most notably multilevel coding with multistage decoding (MLC-MSD) [7], in which $C(W)$ can be achieved. However, for Gray mapping the gap is small and can sometimes be tolerated. For example, for 8-PSK signaling over the AWGN channel with SNR = 5dB, $C(W) = 1.86$ bits while $C^{\mathrm{BICM}}(W) = 1.84$ bits.
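To make Lemma 1 concrete, the following sketch computes the BICM capacity (6) for a discrete channel given as a transition matrix, under equiprobable inputs; `W` and `labels` are illustrative assumptions of ours, not objects defined in the paper.

```python
import numpy as np

def mutual_information(P):
    """I(B;Y) in bits for a binary-input channel P[b, y], B equiprobable."""
    Py = P.mean(axis=0)                              # output distribution
    safe = (P > 0)
    ratio = np.where(safe, P, 1.0) / np.where(Py > 0, Py, 1.0)
    return 0.5 * np.where(safe, P * np.log2(ratio), 0.0).sum()

def bicm_capacity(W, labels):
    """Eq. (6): sum over bit levels of C(W_s), with W_s built as in (3)."""
    L = labels.shape[1]
    total = 0.0
    for s in range(L):
        Ws = np.stack([W[labels[:, s] == b].mean(axis=0) for b in (0, 1)])
        total += mutual_information(Ws)              # C(W_s), equiprobable input
    return total
```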
III. THE PARALLEL BICM SCHEME
In this section we propose an explicit BICM-type communication scheme, which we call parallel BICM (PBICM), that allows the use of binary codes over nonbinary channels at finite block length. The main features of the scheme include the following:
• Binary codewords are used in parallel to construct a codeword that enters the channel.
• A new finite-length interleaver.
• A random binary signal (binary dither) that is added to the binary codewords.
With the proposed scheme, we rigorously show how the original channel $W$ relates to the channel $\widetilde{W}$, thus allowing exact analysis and design of codes at finite block lengths.
A. Interleaver Design
We wish to design a finite-length interleaver such that:
• the length of the interleaver is minimal;
• the interleaver is as simple as possible;
• the binary codewords go through a binary memoryless channel.
Fig. 5. Interleaving scheme viewed as parallel encoders: the outputs of $ENC_1,\ldots,ENC_L$ are interleaved (according to $\mathbf{s}$) and mapped by $\mu$ to the channel input $\mathbf{x}$.

Fig. 6. De-interleaving scheme viewed as parallel decoders: the LLR values are de-interleaved (according to $\mathbf{s}$) and fed to $DEC_1,\ldots,DEC_L$.
In order for the binary codewords to experience a memoryless channel, each binary codeword must be spread over n channel uses of W, so the interleaver output length cannot be less than n channel uses. The newly proposed interleaver has an output length of exactly n, and thus satisfies the above requirements.
Let $ENC$ and $DEC$ be an encoder-decoder pair for a binary code. Let $\mathbf{b}_1,\ldots,\mathbf{b}_L$ be $L$ consecutive codewords from the output of $ENC$, bunched together into a matrix $B$:
$$B = \begin{bmatrix} \mathbf{b}_1 \\ \vdots \\ \mathbf{b}_L \end{bmatrix} = \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & \ddots & \vdots \\ b_{L1} & \cdots & b_{Ln} \end{bmatrix}. \qquad (8)$$
Let $\mathbf{s}$ be a vector of $n$ i.i.d. random states drawn from $\mathcal{S} = \{1,\ldots,L\}$; $\mathbf{s}$ shall be the interleaving signal. Each column of $B$ is shifted cyclically by the corresponding element $s_k$, so the interleaved signal $\widetilde{B}$ is defined as
$$\widetilde{B} = \begin{bmatrix} b_{(1+s_1)_L 1} & \cdots & b_{(1+s_n)_L n} \\ \vdots & \ddots & \vdots \\ b_{(L+s_1)_L 1} & \cdots & b_{(L+s_n)_L n} \end{bmatrix},$$
where $(\xi)_L \triangleq (\xi \bmod L) + 1$. Each column vector of the interleaved signal $\widetilde{B}$ is mapped to a single channel symbol:
$$x_k = \mu\left(b_{(1+s_k)_L k}, \ldots, b_{(L+s_k)_L k}\right), \qquad (9)$$
and we call $\mathbf{x} = [x_1,\ldots,x_n]$ the channel codeword.
At the decoder, an LLR value is calculated from $\mathbf{y}$ for every bit $b$ in $\widetilde{B}$. The LLR values are denoted by $\widetilde{Z}$. We assume that $\mathbf{s}$ is known at the decoder (utilizing common randomness), so the de-interleaving operation simply sorts the columns of $\widetilde{Z}$ back according to $\mathbf{s}$ by reversing the cyclic shifts. The de-interleaver output is a vector $\mathbf{z}$ of LLR values for each transmitted codeword $\mathbf{b}$, according to (1). Each codeword is decoded independently by $DEC$.
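A minimal sketch of this interleaver and de-interleaver, using per-column cyclic shifts as in (8)-(9) (the direction of the shift is an arbitrary choice here):

```python
import numpy as np

def interleave(B, s):
    """Cyclically shift column k of the L x n matrix B by s[k]."""
    return np.stack([np.roll(B[:, k], -s[k]) for k in range(B.shape[1])], axis=1)

def deinterleave(Z, s):
    """Undo the per-column cyclic shifts on the LLR matrix Z using the shared s."""
    return np.stack([np.roll(Z[:, k], s[k]) for k in range(Z.shape[1])], axis=1)

rng = np.random.default_rng(0)
L, n = 4, 8
B = rng.integers(0, 2, size=(L, n))      # L stacked binary codewords
s = rng.integers(0, L, size=n)           # i.i.d. interleaving states
assert np.array_equal(deinterleave(interleave(B, s), s), B)
```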
B. Binary dither
Since the decoder decodes each binary codeword independently, the communication scheme employing the above interleaver can be viewed as a set of parallel encoder-decoder pairs, which we denote by $ENC_1,\ldots,ENC_L$ and $DEC_1,\ldots,DEC_L$ (see Figures 5 and 6). We do not assume any independence between the effective channels of the different encoder-decoder pairs.
Consider the first encoder-decoder pair, $ENC_1$ and $DEC_1$. Since the input of $DEC_1$ depends on the codewords transmitted by $ENC_2,\ldots,ENC_L$, the channel between $ENC_1$ and $DEC_1$ is not strictly memoryless. If, somehow, the encoders $ENC_2,\ldots,ENC_L$ were forced to send i.i.d. equiprobable binary codewords, then the channel between $ENC_1$ and $DEC_1$ would be exactly the channel $\widetilde{W}$ (which is a binary memoryless channel) with the accurate LLR calculation (1).
In order to achieve the goal of L binary memoryless channels between each encoder-decoder pair simultaneously, we add
a binary dither - an i.i.d. equiprobable binary signal - to each encoder-decoder pair as follows.
Let the dither signals $\mathbf{d}_l = [d_{l1},\ldots,d_{ln}]$, $l\in\{1,\ldots,L\}$, be $L$ random vectors, each of length $n$, drawn independently from a memoryless equiprobable binary source. The output of each encoder $ENC_l$, $\mathbf{b}_l$, goes through a component-wise XOR operation with the dither vector $\mathbf{d}_l$. The output of the XOR operation, denoted $\mathbf{b}'_l$, is fed into the interleaver $\pi$. The full PBICM encoding scheme is shown in Fig. 7.

Fig. 7. PBICM encoding scheme. ‘+’ denotes modulo-2 addition (XOR).

Fig. 8. PBICM decoding scheme. $\boldsymbol{\delta}_l \triangleq 1 - 2\,\mathbf{d}_l$; ‘×’ denotes element-wise multiplication.
We let each decoder $DEC_l$ know the value of the dither $\mathbf{d}_l$ used by its corresponding encoder $ENC_l$ (in practice the dither signals are generated using a pseudo-random generator, which allows the common randomness). In order to compensate for the dither at the decoder, the LLR values are modified by flipping their sign wherever the dither value is 1 (and maintaining the sign where the dither is 0). Formally, denote the LLR values at the de-interleaver output by $\mathbf{z}'_l = [z'_{l1} \ldots z'_{ln}]$. The LLR values at the decoder input shall be denoted by $\mathbf{z}_l = [z_{l1} \ldots z_{ln}]$ and calculated as follows:
$$z_{lj} = z'_{lj}\,(1 - 2d_{lj}), \qquad j = 1,\ldots,n. \qquad (10)$$
The PBICM decoding scheme is shown in Fig. 8.
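The following toy sketch illustrates the dither and its compensation (10); the BPSK/AWGN stand-in for the mapper, channel and LLR blocks is our own simplification, not the PBICM chain of Figs. 7-8.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
b = rng.integers(0, 2, size=n)        # one binary codeword b_l
d = rng.integers(0, 2, size=n)        # dither d_l, shared with the decoder
b_dithered = b ^ d                    # component-wise XOR at the encoder

# Stand-in channel: BPSK (bit 0 -> +1, bit 1 -> -1) over AWGN,
# for which the exact LLR of Eq. (1) reduces to 2y/sigma^2.
sigma = 0.8
y = (1 - 2 * b_dithered) + sigma * rng.standard_normal(n)
z_prime = 2 * y / sigma**2            # LLRs of the dithered bits

z = z_prime * (1 - 2 * d)             # Eq. (10): flip signs to undo the dither
b_hat = (z < 0).astype(int)           # hard decisions on the original bits
print("bit errors:", int(np.sum(b_hat != b)))
```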
C. Model equivalence
Before we analyze the channel between each encoder-decoder pair in PBICM, let us define a binary memoryless channel related to $\widetilde{W}$ that will prove useful in the analysis of PBICM.
Definition 2: Let $\overline{W}$ be a memoryless binary channel with input $B$ and output $\langle Y, S, D\rangle$: $S$ is drawn at random from $\{1,\ldots,L\}$, $D$ is drawn at random from $\{0,1\}$ ($S$ and $D$ are independent, and both do not depend on the input $B$). $Y$ is the output of the channel $W_S$ with input $B \oplus D$ ($\oplus$ is the XOR operation). Note that the channel $\overline{W}$ is the channel $\widetilde{W}$ where the input is XORed with a binary RV $D$ (see Fig. 9).
Note that the LLR calculation for the channel $\overline{W}$ is given by
$$\mathrm{LLR}_{\overline{W}}(y,s,d) = (-1)^d\,\mathrm{LLR}_{\widetilde{W}}(y,s) = (-1)^d\,\mathrm{LLR}_s(y), \qquad (11)$$
where $\mathrm{LLR}_{\widetilde{W}}$ and $\mathrm{LLR}_s$ are given in (5) and (1), respectively.
Theorem 1: In parallel BICM, the channel between every encoder-decoder pair is exactly the binary memoryless channel $\overline{W}$, with its exact LLR output.
Proof: Consider the pair $ENC_1$ and $DEC_1$. Let $\mathbf{b}_1$ be the codeword sent by $ENC_1$. After adding the dither $\mathbf{d}_1$, the dithered codeword $\mathbf{b}'_1$ enters the interleaver. The other codewords $\mathbf{b}_2,\ldots,\mathbf{b}_L$ are dithered using $\mathbf{d}_2,\ldots,\mathbf{d}_L$. Since the dither of these codewords is unknown at $DEC_1$, the dithered codewords $\mathbf{b}'_2,\ldots,\mathbf{b}'_L$ are truly random i.i.d. signals. The interleaving signal $\mathbf{s}$ interleaves the dithered codewords according to (8). The interleaved signal enters the mapper $\mu$ and the channel $W$, resulting in an output $\mathbf{y}$. Since the dithered codewords $\mathbf{b}'_2,\ldots,\mathbf{b}'_L$ are i.i.d., the equivalent channel from $\mathbf{b}'_1$ to $\langle\mathbf{y},\mathbf{s}\rangle$ is exactly the channel $\widetilde{W}$. The LLR calculation at the PBICM receiver along with the de-interleaver produces $\mathbf{z}'_1$, which is exactly the LLR calculation that fits the channel $\widetilde{W}$ (cf. (5)).

Fig. 9. The binary channel $\overline{W}$. The random state $S$ and the dither $D$ are known at the receiver.

Recalling that the channel $\overline{W}$ is nothing but the channel $\widetilde{W}$ with its input XORed with a binary RV, and that the LLR of the channel $\widetilde{W}$ can easily be modified by the dither according to Eq. (11) to produce the LLR of the channel $\overline{W}$, we conclude that the channel from $\mathbf{b}_1$ to $\mathbf{z}_1$ is exactly the channel $\overline{W}$ with LLR calculation.
Since by symmetry the above holds for any encoder-decoder pair $ENC_l$-$DEC_l$, the proof is concluded.
An important note should be made: parallel BICM decomposes the nonbinary channel $W$ into $L$ binary channels of the type $\overline{W}$. These $L$ channels are not independent. For example, if $W$ is an additive noise channel and at some point the noise instance is very strong, this will affect all the decoders, and they will fail together. However, since in the PBICM scheme the channels are used independently, the operation of each decoder depends only on the marginal distribution of the relevant channel outputs. The outputs of these decoders will inevitably be statistically dependent, and we take this into consideration when analyzing the performance of coding using PBICM in the following.
D. Error Probability Analysis
We wish to analyze the performance of PBICM, and specifically, we are interested in the overall codeword error probability.
Let $\mathcal{C}$ be a binary $(n,R)$ code used in the PBICM scheme. To ensure a fair comparison, we regard each $L$ consecutive information messages $(m_1,\ldots,m_L)$ as a single message $m$, and regard the scheme as a code of length $n$ over the channel input alphabet $\mathcal{X}$. We define the following error events: let $E_l$ be the event of a codeword error in $DEC_l$, and let $E$ be the event of an error in any of the messages $\{m_1,\ldots,m_L\}$, i.e. $E = \bigcup_l E_l$. Denote the corresponding error probabilities by $p_{e_l}$ and $p_e$, respectively.
Corollary 1: Let $p_e(\overline{W})$ be the codeword error probability of the code $\mathcal{C}$ over the channel $\overline{W}$. Then the overall error probability $p_e$ of the code $\mathcal{C}$ used with PBICM can be bounded by
$$p_e(\overline{W}) \le p_e \le L\cdot p_e(\overline{W}). \qquad (12)$$
Proof: Since the error events $E_l$ of the codewords that are mapped together to the same channel codeword are dependent, we can only upper bound the overall error probability $p_e$ using the union bound; $p_e$ can also be lower bounded by the minimum of the error probabilities over the channels:
$$\min\{p_{e_1},\ldots,p_{e_L}\} \le p_e \le \sum_l p_{e_l}. \qquad (13)$$
Since by Theorem 1 the channel between each of the encoder-decoder pairs is $\overline{W}$, the error probabilities are all equal to the error probability of the code $\mathcal{C}$ over the channel $\overline{W}$. Setting $p_{e_1} = p_{e_2} = \cdots = p_{e_L} = p_e(\overline{W})$ in (13) completes the proof.
In many cases the bit error rate (BER) is of interest. Suppose that each of the messages $(m_1,\ldots,m_L)$ represents $k$ information bits, so the entire message $m$ represents $L\cdot k$ information bits. Let $E^b_{lk'}$ denote the error event in the $k'$-th bit of the information message $m_l$. The average BER for the encoder-decoder pair $ENC_l$-$DEC_l$ is defined by
$$p^b_{e_l} \triangleq \frac{1}{k}\sum_{k'=1}^{k} \Pr\{E^b_{lk'}\}. \qquad (14)$$
Similarly, define the overall average BER as
$$p^b_e \triangleq \frac{1}{L\cdot k}\sum_{l=1}^{L}\sum_{k'=1}^{k} \Pr\{E^b_{lk'}\} = \frac{1}{L}\sum_{l=1}^{L} p^b_{e_l}. \qquad (15)$$
Corollary 2: Let $p^b_e(\overline{W})$ be the average BER of a binary code $\mathcal{C}$ over the channel $\overline{W}$. Then the average BER $p^b_e$ of the code $\mathcal{C}$ used with PBICM is equal to $p^b_e(\overline{W})$.
Proof: Follows directly from Theorem 1 and from the definition of the average BER in (15).
IV. PARALLEL BICM: INFORMATION-THEORETICAL ANALYSIS
In the previous section we defined the PBICM scheme and analyzed its basic error probability properties. The equivalence established in Theorem 1 for the channel between each encoder-decoder pair enables a full information-theoretical analysis of the scheme. We show that the highest rate achievable by PBICM (the PBICM capacity) is equal to the BICM capacity as in Equation (7), which should not come as a surprise. In the finite-length regime, we derive error exponent and channel dispersion results as information-theoretical measures of optimal PBICM performance at finite length.
A. Capacity
Let the PBICM capacity of $W$, $C^{\mathrm{PBICM}}(W)$, be the highest rate achievable for reliable communication over the channel $W$ with PBICM and a given mapping $\mu$. (As usual, reliable communication means a vanishing codeword error probability as the code length $n$ goes to infinity.)
Theorem 2: The PBICM capacity is given by
$$C^{\mathrm{PBICM}}(W) = L \cdot C(\overline{W}) = \sum_{s=1}^{L} C(W_s) = C^{\mathrm{BICM}}(W). \qquad (16)$$
Proof:
Achievability: Let $\mathcal{C}^{(n)}$ be a series of (binary) capacity-achieving codes for the channel $\overline{W}$, and let $p_e^{(n)}(\overline{W})$ be the corresponding (vanishing) codeword error probabilities. By Corollary 1, the overall error probability of PBICM with a binary code is upper bounded by $L$ times the error probability of the same code over the channel $\overline{W}$; therefore when the codes $\mathcal{C}^{(n)}$ are used with PBICM, the overall error probability is bounded by $L\cdot p_e^{(n)}(\overline{W})$ and also vanishes with $n$. Since there are $L$ instances of the channel $\overline{W}$, we get that the rate $L\cdot C(\overline{W})$ is achievable by PBICM.
Converse: Let $\mathcal{C}^{(n)}$ be a series of binary codes that are used with PBICM and achieve a vanishing overall error probability $p_e^{(n)}$, and suppose that the overall PBICM rate is given by $L\cdot R$ (a rate of $R$ at each encoder-decoder pair). By Corollary 1, the codeword error probability of a code over $\overline{W}$ is upper bounded by the overall error probability of the same code used in PBICM. Therefore, if $p_e^{(n)}$ vanishes as $n\to\infty$, then the error probability over $\overline{W}$ must also vanish; hence the communication rate between each encoder-decoder pair must be upper bounded by $C(\overline{W})$, and the overall rate cannot surpass $L\cdot C(\overline{W})$.
All that remains is to calculate the capacity of $\overline{W}$:
$$C(\overline{W}) = I(B;Y,S,D) = I(B;Y,S|D) = \frac{1}{2}\left(I(B;Y,S|D=0) + I(B;Y,S|D=1)\right). \qquad (17)$$
When $D=0$, we get the channel $\widetilde{W}$ exactly, and when $D=1$ we get the channel $\widetilde{W}$ with its input symbols always switched. Either way, the expression $I(B;Y,S|D=d)$ is equal to the capacity of $\widetilde{W}$. Using Lemma 1, we get that
$$C(\overline{W}) = C(\widetilde{W}) = \frac{1}{L}\sum_{s=1}^{L} C(W_s). \qquad (18)$$
A note regarding the capacity proof: one might be tempted to try to prove the capacity theorem for PBICM without dither, since with random coding, the code $\mathcal{C}$ is merely an i.i.d. binary random vector. This approach fails for the following reason: in the decoding of each codeword, the correctness of the model $\widetilde{W}$ relies on the fact that the other codewords are i.i.d. signals. Since PBICM requires a single code for all the $L$ levels, such a condition can never be met. It is possible to prove the achievability without dither when using a different random code at each level, but such an approach will not guarantee the existence of a single code, as required by PBICM.
B. Error Exponent
The error exponent of a channel $W$ is defined by
$$E(R) \triangleq \lim_{n\to\infty} -\frac{1}{n}\log\left(p_e(n)\right), \qquad (19)$$
where $p_e(n)$ is the average codeword error probability of the best code of length $n$. A lower bound on the error exponent for memoryless channels is the random coding error exponent [3], which is given by
$$E_r(R) = \max_{\rho\in[0,1]}\ \max_{P_X(\cdot)}\ \{E_0(\rho, P_X) - \rho R\}, \qquad (20)$$
where
$$E_0(\rho, P_X) \triangleq -\log \sum_{y\in\mathcal{Y}} \left(\sum_{x\in\mathcal{X}} P_X(x)\,W(y|x)^{1/(1+\rho)}\right)^{1+\rho}. \qquad (21)$$
Since we consider equiprobable inputs only, we omit the dependence of $E_0(\rho)$ on $P_X$ and omit the maximization w.r.t. $P_X$ in (20).
Other known bounds on the error exponent include the expurgated error exponent (a lower bound), the sphere packing error exponent (an upper bound), and more [3]. The random coding and sphere packing exponents coincide for rates above the critical rate, and therefore the error exponent is known precisely at those rates.
1) PBICM error exponent: Similarly to (19), we define the PBICM error exponent:
Definition 3: For a given channel $W$ and a mapping $\mu$, let $E^{\mathrm{PBICM}}(R)$ be defined as
$$E^{\mathrm{PBICM}}(R) \triangleq \lim_{n\to\infty} -\frac{1}{n}\log\left(p_e(n)\right), \qquad (22)$$
where $p_e(n)$ is the average codeword error probability of the best PBICM scheme with block length $n$.
Using Corollary 1, we can calculate the PBICM exponent from the error exponent of $\overline{W}$:
Theorem 3: The PBICM error exponent of a channel $W$ is given by
$$E^{\mathrm{PBICM}}(R) = E(R/L), \qquad (23)$$
where $E(\cdot)$ is the error exponent function of the binary channel $\overline{W}$.
Proof: Let $\mathcal{C}^{(n)}$ be a series of binary codes. Denote their codeword error probabilities over the channel $\overline{W}$ by $p_e^{(n)}(\overline{W})$, and let $p_e^{(n)}$ be the error probabilities of the corresponding PBICM schemes with $\mathcal{C}^{(n)}$ as the underlying codes. It follows from (12) that
$$-\frac{1}{n}\log\left(L\cdot p_e^{(n)}(\overline{W})\right) \le -\frac{1}{n}\log p_e^{(n)} \le -\frac{1}{n}\log p_e^{(n)}(\overline{W}). \qquad (24)$$
By taking $n\to\infty$, the factor of $L$ vanishes, and we get that for any series of codes,
$$\lim_{n\to\infty} -\frac{1}{n}\log p_e^{(n)} = \lim_{n\to\infty} -\frac{1}{n}\log p_e^{(n)}(\overline{W}). \qquad (25)$$
The above equation holds for the series of best codes for the channel $\overline{W}$, as well as for the series of best codes for PBICM; therefore the equality holds for the sequence of best codes on either side. Since the rate for PBICM is $L$ times the rate for coding over $\overline{W}$, the proof is concluded.
2) The error exponent of $\overline{W}$: The channel $\overline{W}$ has a special structure and is related to the binary sub-channels $W_i$. We now calculate two basic bounds on the error exponent of $\overline{W}$ in terms of the sub-channels $W_i$. By Theorem 3, the PBICM error exponent of the channel $W$ can be bounded accordingly.
Theorem 4: Let $E(R)$ be the error exponent of the channel $\overline{W}$. It can be bounded as follows.
Random coding:
$$E(R) \ge E_r(R) = \max_{\rho\in[0,1]}\{E_0(\rho) - \rho R\}, \qquad (26)$$
where
$$E_0(\rho) = -\log E\left[2^{-E_0^{(S)}(\rho)}\right], \qquad (27)$$
$E_0^{(s)}(\rho)$ is the $E_0$ function of the channel $W_s$, and the expectation is w.r.t. the state $S$, which is drawn uniformly from $\{1,\ldots,L\}$.
Sphere packing:
$$E(R) \le E_{sp}(R) = \max_{\rho>0}\{E_0(\rho) - \rho R\}, \qquad (28)$$
where $E_0(\rho)$ is given in (27).
Proof: The bounds in the theorem are the original random coding and sphere packing exponents [3]. The proof therefore boils down to the simplification of the $E_0$ function to the form of (27).
Consider the channel $\overline{W}$ (Definition 2) with binary input $B$ and outputs $\langle Y,S,D\rangle$. Since $\overline{W}$ is equivalent to the channel $\widetilde{W}$ with input $B\oplus D$, where $D$ is an equiprobable binary RV (known at the receiver), we get that
$$\overline{W}(y,s,d|b) = \frac{1}{2}\,\widetilde{W}(y,s|b\oplus d). \qquad (29)$$
The channel $\widetilde{W}$, in turn, is nothing more than the channel $W_s$ with the additional output $S$. This yields
$$\frac{1}{2}\,\widetilde{W}(y,s|b\oplus d) = \frac{1}{2L}\,W_s(y|b\oplus d). \qquad (30)$$
Combining the above, the $E_0$ function of $\overline{W}$ is given by
$$E_0(\rho) = -\log\sum_{y,\,s,\,d}\left(\sum_{b\in\{0,1\}} P_B(b)\,\overline{W}(y,s,d|b)^{\frac{1}{1+\rho}}\right)^{1+\rho} = -\log\sum_{y,\,s,\,d}\left(\sum_{b\in\{0,1\}} \frac{1}{2}\left(\frac{1}{2L}\,W_s(y|b\oplus d)\right)^{\frac{1}{1+\rho}}\right)^{1+\rho}$$
$$\stackrel{(a)}{=} -\log\sum_{y\in\mathcal{Y}}\sum_{s=1}^{L}\frac{1}{L}\left(\sum_{b'\in\{0,1\}}\frac{1}{2}\,W_s(y|b')^{\frac{1}{1+\rho}}\right)^{1+\rho} \qquad (31)$$
$$\stackrel{(b)}{=} -\log\sum_{s=1}^{L}\frac{1}{L}\,2^{-E_0^{(s)}(\rho)} = -\log E\left[2^{-E_0^{(S)}(\rho)}\right]. \qquad (32)$$
(a) follows by setting $b'\triangleq b\oplus d$ and noting that the summation is independent of the value of $d$; the $d$-sum then contributes a factor of 2 that combines with the $\frac{1}{2L}$ prefactor into $\frac{1}{L}$. (b) follows from the definition of $E_0^{(s)}(\rho)$ (the $E_0$ function of the channel $W_s$).
Several notes can be made:
• It is well known that the random coding and sphere packing exponents coincide at rates above the critical rate. Therefore the exact error exponent of $\overline{W}$ is known at rates above the critical rate of $\overline{W}$, denoted $R_{cr}^{\overline{W}}$. It follows that the exact PBICM error exponent is known at rates above $R_{cr}^{\mathrm{PBICM}} \triangleq L\cdot R_{cr}^{\overline{W}}$, which we define to be the PBICM critical rate.
• In Theorem 4 we have shown that the random coding and sphere packing bounds take a compact form because of the special structure of the channel $\overline{W}$. Clearly, following Theorem 3, every bound on the error exponent of $\overline{W}$ serves as a bound on the PBICM error exponent. However, for other bounds (such as the expurgated error exponent [3]), no compact form could be found. Such bounds, of course, can still be applied to bound $E^{\mathrm{PBICM}}(R)$.
• The $E_0$ function of the channel $\widetilde{W}$ is equal to the $E_0$ function of the channel $\overline{W}$. This can easily be seen from the proof above: $E_0$ for $\widetilde{W}$ is given in (31) by definition.
• In [7], the authors offered the model of $\widetilde{W}$ for calculating the error exponent of BICM. It is claimed there that the $E_0$ of the channel $\widetilde{W}$ is given by [7, Eq. (37)]:
$$E\left[E_0^{(S)}(\rho)\right] = \frac{1}{L}\sum_{s=1}^{L} E_0^{(s)}(\rho). \qquad (33)$$
As we have just shown in Theorem 4, this is not the exact expression. In fact, it can be shown that $E_0(\rho) \le E\left[E_0^{(S)}(\rho)\right]$; this follows directly from the convexity of the function $2^{-(\cdot)}$ and the Jensen inequality. Therefore the incorrect expression in [7, Eq. (37)] always overestimates the value of $E_0(\rho)$, and the resulting $E_r(R)$ expression overestimates the true random coding exponent.
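Under the stated assumptions (equiprobable inputs, subchannels given as transition matrices), the sketch below evaluates (27), the random coding bound (26), the PBICM exponent via Theorem 3, and checks numerically that the averaged expression of [7, Eq. (37)] never falls below (27); all helper names are ours.

```python
import numpy as np

def E0_binary(P, rho):
    """Gallager's E0 (in bits) for a binary-input channel P[b, y], equiprobable B."""
    inner = (0.5 * P ** (1.0 / (1.0 + rho))).sum(axis=0)
    return -np.log2((inner ** (1.0 + rho)).sum())

def E0_bar(Ws, rho):
    """Eq. (27): E0 of W-bar = -log E[2^{-E0^(S)}], S uniform over the levels."""
    return -np.log2(np.mean([2.0 ** (-E0_binary(P, rho)) for P in Ws]))

def Er_pbicm(Ws, R, grid=np.linspace(0.0, 1.0, 201)):
    """Theorem 3 with Eq. (26): PBICM exponent at overall rate R (bits/use)."""
    r = R / len(Ws)                               # per-level rate on W-bar
    return max(E0_bar(Ws, rho) - rho * r for rho in grid)

# Jensen check on two random toy subchannels (L = 2):
rng = np.random.default_rng(3)
Ws = [rng.dirichlet(np.ones(4), size=2) for _ in range(2)]
for rho in (0.25, 0.5, 1.0):
    assert E0_bar(Ws, rho) <= np.mean([E0_binary(P, rho) for P in Ws]) + 1e-12
```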
C. Channel Dispersion
An alternative information-theoretical measure for quantifying coding performance at finite block lengths is the channel dispersion. Suppose that a fixed codeword error probability $p_e$ and a codeword length $n$ are given. We can then seek the maximal achievable rate $R$ given $p_e$ and $n$.
It turns out that for fixed $p_e$ and $n$, the gap to the channel capacity is approximately proportional to $Q^{-1}(p_e)/\sqrt{n}$ (where $Q(\cdot)$ is the complementary Gaussian cumulative distribution function). The proportionality constant (squared) is called the channel dispersion. Formally, define the (operational) channel dispersion as follows [6]:
Definition 4: The dispersion $V(W)$ of a channel $W$ with capacity $C$ is defined as
$$V(W) = \lim_{p_e\to 0}\limsup_{n\to\infty}\; n\cdot\left(\frac{C - R(n,p_e)}{Q^{-1}(p_e)}\right)^2, \qquad (34)$$
where $R(n,p_e)$ is the highest achievable rate for codeword error probability $p_e$ and codeword length $n$.
In 1962, Strassen [4] used the Gaussian approximation to derive the following result for DMCs (see Appendix B for the big-O notation):
$$R(n,p_e) = C - \sqrt{V/n}\;Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right), \qquad (35)$$
where $C$ is the channel capacity, and the new quantity $V$ is the (information-theoretic) dispersion, given by
$$V \triangleq \mathrm{VAR}(i(X;Y)), \qquad (36)$$
where $i(x;y)$ is the information spectrum, given by
$$i(x;y) \triangleq \log\frac{P_{XY}(x,y)}{P_X(x)P_Y(y)}, \qquad (37)$$
and the distribution of $X$ is the capacity-achieving distribution that minimizes $V$. Strassen's result shows that the dispersion of DMCs is equal to $\mathrm{VAR}(i(X;Y))$. This result was recently tightened (and extended to the power-constrained AWGN channel) in [6]. It is also known that the channel dispersion and the error exponent are related as follows: for a channel with capacity $C$ and dispersion $V$, the error exponent can be approximated by $E(R) \cong \frac{(C-R)^2}{2V\ln 2}$. See [6] for details on the early origins of this approximation by Shannon.
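For a DMC given as a transition matrix, (36)-(37) can be computed directly; the sketch below fixes the input distribution (equiprobable by default, matching the setting of this paper), whereas in general the capacity-achieving input minimizing V should be used.

```python
import numpy as np

def capacity_and_dispersion(W, Px=None):
    """C = E[i(X;Y)] and V = VAR(i(X;Y)) for W[x, y] = W(y|x), input Px."""
    nx = W.shape[0]
    Px = np.full(nx, 1.0 / nx) if Px is None else Px
    Py = Px @ W                                   # output distribution
    safe = (W > 0)
    Py_safe = np.where(Py > 0, Py, 1.0)
    i = np.where(safe, np.log2(np.where(safe, W, 1.0) / Py_safe), 0.0)
    Pxy = Px[:, None] * W                         # joint distribution
    C = np.sum(Pxy * i)                           # mutual information, bits
    V = np.sum(Pxy * (i - C) ** 2)                # dispersion, bits^2
    return C, V
```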
1) PBICM dispersion: In order to estimate the finite-block performance of PBICM schemes, we extend the dispersion definition as follows:
Definition 5: The PBICM dispersion $V^{\mathrm{PBICM}}(W)$ of a channel $W$ with a given mapping $\mu$ and PBICM capacity $C^{\mathrm{PBICM}}(W)$ is defined as
$$V^{\mathrm{PBICM}}(W) = \lim_{p_e\to 0}\limsup_{n\to\infty}\; n\cdot\left(\frac{C^{\mathrm{PBICM}}(W) - R(n,p_e)}{Q^{-1}(p_e)}\right)^2, \qquad (38)$$
where $R(n,p_e)$ is the highest achievable rate for any PBICM scheme with a given $n$ and $p_e$.
Relying on the relationship between the PBICM scheme and the binary channel $\overline{W}$, we can show the following:
Theorem 5: Let $n$ be a given block length and let $p_e$ be a given codeword error probability. The highest achievable rate attained using PBICM, $R^{\mathrm{PBICM}}(n,p_e)$, is bounded from above and below by
$$R^{\mathrm{PBICM}}(n,p_e) \ge C^{\mathrm{PBICM}}(W) - \sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}\!\left(\frac{p_e}{L}\right) + O\!\left(\frac{1}{n}\right), \qquad (39)$$
$$R^{\mathrm{PBICM}}(n,p_e) \le C^{\mathrm{PBICM}}(W) - \sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right). \qquad (40)$$
As a result, the PBICM dispersion is given by
$$V^{\mathrm{PBICM}}(W) = L^2\,V(\overline{W}). \qquad (41)$$
Proof:
Direct: From the achievability proof of (35) [6, Theorem 45], there must exist an $(n, R', p'_e = p_e/L)$ binary code for $\overline{W}$ that satisfies
$$R' \ge C(\overline{W}) - \sqrt{\frac{V(\overline{W})}{n}}\;Q^{-1}(p_e/L) + O\!\left(\frac{1}{n}\right). \qquad (42)$$
By Theorem 1 and Corollary 1, it follows that the overall error probability of the PBICM scheme based on this code is not greater than $L\,p'_e = p_e$. The rate of the PBICM scheme satisfies
$$R = L\cdot R' \ge L\left(C(\overline{W}) - \sqrt{\frac{V(\overline{W})}{n}}\;Q^{-1}(p'_e) + O\!\left(\frac{1}{n}\right)\right) \qquad (43)$$
$$= C^{\mathrm{PBICM}}(W) - \sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}\!\left(\frac{p_e}{L}\right) + O\!\left(\frac{1}{n}\right). \qquad (44)$$
Converse: Suppose we have an $(n, R, p_e)$ PBICM scheme. According to Corollary 1, the codeword error probability $p'_e$ of the underlying binary code is not greater than $p_e$. By Equation (35), the rate $R'$ of the underlying binary code is bounded by
$$R' \le C(\overline{W}) - \sqrt{\frac{V(\overline{W})}{n}}\;Q^{-1}(p'_e) + O\!\left(\frac{\log n}{n}\right). \qquad (45)$$
Since $Q^{-1}(\cdot)$ is a decreasing function, the bound loosens when $p'_e$ is replaced by the larger $p_e$. Therefore the overall rate $R$ is bounded by
$$R = L\cdot R' \le L\left(C(\overline{W}) - \sqrt{\frac{V(\overline{W})}{n}}\;Q^{-1}(p'_e) + O\!\left(\frac{\log n}{n}\right)\right) \qquad (46)$$
$$\le C^{\mathrm{PBICM}}(W) - \sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right). \qquad (47)$$
PBICM dispersion: Rewriting Equations (39) and (40), we get
$$\sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}(p_e) + O\!\left(\frac{\log n}{n}\right) \le C^{\mathrm{PBICM}}(W) - R \le \sqrt{\frac{L^2 V(\overline{W})}{n}}\;Q^{-1}\!\left(\frac{p_e}{L}\right) + O\!\left(\frac{1}{n}\right), \qquad (48)$$
or
$$\sqrt{L^2 V(\overline{W})} + O\!\left(\frac{\log n}{\sqrt{n}}\right) \le \sqrt{n}\;\frac{C^{\mathrm{PBICM}}(W) - R}{Q^{-1}(p_e)} \le \sqrt{L^2 V(\overline{W})}\cdot\frac{Q^{-1}\!\left(\frac{p_e}{L}\right)}{Q^{-1}(p_e)} + O\!\left(\frac{1}{\sqrt{n}}\right). \qquad (49)$$
Taking the limit w.r.t. $n$ yields
$$\sqrt{L^2 V(\overline{W})} \le \limsup_{n\to\infty}\,\sqrt{n}\;\frac{C^{\mathrm{PBICM}}(W) - R}{Q^{-1}(p_e)} \le \sqrt{L^2 V(\overline{W})}\cdot\frac{Q^{-1}\!\left(\frac{p_e}{L}\right)}{Q^{-1}(p_e)}, \qquad (50)$$
or
$$L^2 V(\overline{W}) \le \limsup_{n\to\infty}\; n\left(\frac{C^{\mathrm{PBICM}}(W) - R}{Q^{-1}(p_e)}\right)^2 \le L^2 V(\overline{W})\cdot\left(\frac{Q^{-1}\!\left(\frac{p_e}{L}\right)}{Q^{-1}(p_e)}\right)^2. \qquad (51)$$
By noting that $\lim_{\varepsilon\to 0^+}\frac{(Q^{-1}(\varepsilon))^2}{2\ln\frac{1}{\varepsilon}} = 1$ (see Appendix A), we get that
$$\lim_{p_e\to 0}\left(\frac{Q^{-1}\!\left(\frac{p_e}{L}\right)}{Q^{-1}(p_e)}\right)^2 = \lim_{p_e\to 0}\frac{\ln(L/p_e)}{\ln(1/p_e)} = \lim_{p_e\to 0}\frac{\ln(1/p_e)+\ln L}{\ln(1/p_e)} = 1, \qquad (52)$$
which leads to the desired result:
$$V^{\mathrm{PBICM}}(W) = \lim_{p_e\to 0}\limsup_{n\to\infty}\; n\cdot\left(\frac{C^{\mathrm{PBICM}}(W) - R(n,p_e)}{Q^{-1}(p_e)}\right)^2 = L^2\,V(\overline{W}). \qquad (53)$$
Note that the PBICM dispersion result is not as tight as the bound for general coding schemes in (35). The reason is the unavoidable use of the union bound when estimating the overall error probability of PBICM in Corollary 1. In the dispersion proof for DMCs, the value of the dispersion is obtained even without taking the limit w.r.t. $p_e$. However, the gap between $Q^{-1}(p_e)$ and $Q^{-1}(p_e/L)$ for values of interest is not very large.
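Ignoring the $O(\cdot)$ terms, the bounds (39)-(40) are easy to evaluate; the sketch below does so using scipy's inverse survival function as $Q^{-1}$, with placeholder numbers rather than values from the paper.

```python
import numpy as np
from scipy.stats import norm

def pbicm_rate_bounds(C_bar, V_bar, L, n, pe):
    """Theorem 5 bounds (without the O(.) terms) on R^PBICM(n, pe)."""
    Qinv = norm.isf                    # Q^{-1}: standard normal upper tail
    backoff = np.sqrt(L**2 * V_bar / n)
    lower = L * C_bar - backoff * Qinv(pe / L)    # Eq. (39)
    upper = L * C_bar - backoff * Qinv(pe)        # Eq. (40)
    return lower, upper

print(pbicm_rate_bounds(C_bar=0.45, V_bar=0.3, L=4, n=1000, pe=1e-3))
```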
2) The dispersion of $\overline{W}$: As in the error exponent case, the PBICM dispersion of a channel is related to the dispersion of the binary channel $\overline{W}$. We now calculate it explicitly from the dispersions of the sub-channels $W_i$.
Theorem 6: The dispersion of the channel $\overline{W}$ is given by
$$V(\overline{W}) = V(\widetilde{W}) = E[V(W_S)] + \mathrm{VAR}\left[C(W_S)\right] = \frac{1}{L}\sum_{s=1}^{L} V(W_s) + \mathrm{VAR}(C(W_S)), \qquad (54)$$
where $\mathrm{VAR}(C(W_S))$ is the statistical variance of the capacity of $W_S$, i.e.
$$\mathrm{VAR}(C(W_S)) \triangleq E[C^2(W_S)] - E^2[C(W_S)]. \qquad (55)$$
Proof: Consider the channel $\overline{W}$ (Definition 2) with binary input $B$ and outputs $\langle Y,S,D\rangle$, and recall that
$$P_{YSD|B}(y,s,d|b) = \overline{W}(y,s,d|b) = \frac{1}{2}\,\widetilde{W}(y,s|b\oplus d) = \frac{1}{2L}\,W_s(y|b\oplus d). \qquad (56)$$
We first calculate the dispersion of $\widetilde{W}$. Since $S$ and the channel input $B$ are independent, the information spectrum is given by
$$i(b;y,s) \triangleq \log\frac{P_{YSB}(y,s,b)}{P_{YS}(y,s)P_B(b)} = \log\frac{P_{Y|SB}(y|s,b)\,P_S(s)\,P_B(b)}{P_{YS}(y,s)\,P_B(b)} \qquad (57)$$
$$= \log\frac{P_{Y|SB}(y|s,b)}{P_{Y|S}(y|s)} \triangleq i(b;y|s). \qquad (58)$$
Using this notation, the dispersion of the channel $W_s$ is given by
$$V(W_s) = \mathrm{VAR}(i(B;Y|s)\,|\,S=s) = E\left[i^2(B;Y|s)\,|\,S=s\right] - C(W_s)^2.$$
Next, the dispersion of the channel $\widetilde{W}$ is given as follows:
$$V(\widetilde{W}) = \mathrm{VAR}(i(B;Y,S)) = \mathrm{VAR}(i(B;Y|S)) \stackrel{(a)}{=} E\left[\mathrm{VAR}[i(B;Y|s)\,|\,S=s]\right] + \mathrm{VAR}\left[E[i(B;Y|S)\,|\,S=s]\right]$$
$$= E[V(W_S)] + \mathrm{VAR}\left[C(W_S)\right] = \frac{1}{L}\sum_{s=1}^{L} V(W_s) + \mathrm{VAR}(C(W_S)). \qquad (59)$$
(a) follows from the law of total variance.
Finally, the dispersion of the channel $\overline{W}$ is calculated as follows. Let us combine the outputs of the channel $\widetilde{W}$ into a single output $Z = \langle Y,S\rangle$. We therefore end up with a channel with input $B$ and outputs $Z$ and $D$ (see Fig. 9). Similarly to (57), we get that the information spectrum is given by
$$i(b;z,d) \triangleq \log\frac{P_{ZDB}(z,d,b)}{P_{ZD}(z,d)\,P_B(b)} = i(b;z|d). \qquad (60)$$
Following (59), we get that
$$V(\overline{W}) = E[V(\widetilde{W}_D)] + \mathrm{VAR}\left[C(\widetilde{W}_D)\right] = \frac{1}{2}\sum_{d\in\{0,1\}} V(\widetilde{W}_d) + \mathrm{VAR}(C(\widetilde{W}_D)), \qquad (61)$$
where $\widetilde{W}_d$ is the channel $\widetilde{W}$ with its input XORed with the value $d$.
Since only equiprobable inputs are considered, it follows that $C(\widetilde{W}_0) = C(\widetilde{W}_1) = C(\widetilde{W})$ and that $V(\widetilde{W}_0) = V(\widetilde{W}_1) = V(\widetilde{W})$. It therefore follows that $\mathrm{VAR}(C(\widetilde{W}_D)) = 0$, and consequently $V(\overline{W}) = V(\widetilde{W})$, as required.
Note that since a larger dispersion means a larger backoff from the capacity (see (35)), the term $\mathrm{VAR}(C(W_S))$ can be thought of as a penalty factor for the dispersion, on top of the expected dispersion over the channels $W_s$, $E[V(W_S)]$. This factor grows as the capacities of the sub-channels $W_i$ become more spread out.
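Assuming the subchannels are given as transition matrices, Theorem 6 can be evaluated by reusing the `capacity_and_dispersion` sketch given after Eq. (37) above:

```python
import numpy as np

def dispersion_w_bar(Ws):
    """Eq. (54): V(W-bar) = E[V(W_S)] + VAR(C(W_S)), S uniform on the levels."""
    CV = [capacity_and_dispersion(P) for P in Ws]
    C = np.array([c for c, _ in CV])
    V = np.array([v for _, v in CV])
    return V.mean() + C.var()          # penalty term C.var() per the note above
```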
V. NUMERICAL RESULTS
In this section we numerically evaluate the information-theoretical quantities for PBICM. In particular, we calculate the PBICM random coding error exponent (see Theorems 3 and 4) in order to compare with the mismatched decoding approach [8]. We consider the AWGN channel and the Rayleigh fading channel (with perfect channel state information at the receiver) over a wide range of SNR values and constellations. Gray mapping is used in all the examples.
A. Normalization: latency vs. complexity
One way to compare the PBICM error exponent with the mismatched decoding exponent is to compare the error probabilities when the block length n is fixed, which gives a simple comparison between the exponent values. Such an approach makes sense, since both schemes then have the same latency of n channel uses. As will be seen in the coming examples, for fixed n the PBICM error exponent is inferior to that of mismatched decoding. This can also be seen by observing that the PBICM random coding exponent has a slope of −1/L (in its straight-line region), whereas the mismatched decoding exponent has a slope of −1.
However, it should be taken into consideration that when the block length is $n$, the mismatched decoder works with a binary code of length $n\cdot L$. The complexity of the maximum-metric decoder is proportional to the number of codewords, $2^{nLR}$ [8], where $R$ is the rate of the binary code. On the other hand, the number of codewords in the PBICM scheme is only $L\cdot 2^{nR}$. In order to ensure a fair comparison from the complexity point of view, one has to allow the PBICM scheme to use a block length that is $L$ times the block length of the mismatched decoding scheme. Comparing the error probabilities of both schemes gives $nL\,E_r^{\mathrm{PBICM}} = n\,E_r^{\mathrm{Mismatched}}$. We therefore define the normalized PBICM error exponent as $L$ times the PBICM error exponent. We conclude that when the complexity is more important (and the latency is less important), the normalized PBICM exponent is the quantity of interest.
It could be claimed, of course, that in practice one would use modern codes (such as low-density parity-check (LDPC) codes) that do not have exponential decoding complexity. On the other hand, such codes do not guarantee an exponentially decaying error probability.
B. Comparison with the Mismatched Decoding Exponent
In the following figures we show the comparison between the PBICM error exponent and the mismatched decoding error
exponent [8]. The figures show the (unconstrained) random coding error exponent of the channel, along with the mismatched
error exponent and the PBICM random coding error exponent (both normalized and un-normalized).
Figure 10 compares the exponents of 16QAM signaling over the Rayleigh fading channel at SNR = 5dB. Figure 11 shows
the same graph, zoomed-in on the capacity region. It can be seen that throughout the entire range of rates between zero and the
BICM capacity, the normalized PBICM random coding exponent is higher (better) than the mismatched decoding exponent.
Both BICM exponents are above zero for rates below the BICM capacity, and the unconstrained random coding exponent
reaches zero at the full channel capacity, as expected. A fact that might be somewhat surprising at first glance is that the normalized PBICM exponent is better than the unconstrained random coding exponent at some rates. While this may seem contradictory, recall that we compare coding schemes with the same maximum-likelihood (or maximum-metric) complexity. When the schemes' complexities are normalized, PBICM operates with a block length that is L times the block length of the unconstrained scheme, and therefore there is no contradiction. The mismatched decoder never attains higher values than the unconstrained exponent, a fact known as the data processing inequality for exponents (see e.g. [8, Proposition 3.2]).
Figure 12 shows a similar picture (zoomed in on the capacity in Figure 13). Again, the normalized PBICM exponent outperforms the mismatched decoding exponent at all rates. In this case, the BICM capacity is very close to the full channel capacity, which enables the normalized PBICM exponent to outperform the unconstrained exponent at essentially all rates. On the Rayleigh fading channel, the same behavior was observed over all practical ranges of SNR for 8PSK, 16QAM and 64QAM signaling: the normalized PBICM exponent outperformed the mismatched decoding exponent.
On the AWGN channel it cannot be claimed that the normalized PBICM exponent outperforms the mismatched exponent, nor the other way around: for 16QAM signaling at an SNR of 0dB (Fig. 14) the normalized PBICM exponent was better, while at an SNR of 5dB the mismatched exponent was better (Fig. 15).
VI. DISCUSSION
In this paper we have presented parallel bit-interleaved coded modulation (PBICM). The scheme is based on a finite-length interleaver and on adding a binary dither to the binary codewords. The scheme is shown to be equivalent to a binary memoryless channel, and therefore allows easy code design and exact analysis. The scheme was analyzed from an information-theoretical viewpoint, and the capacity, error exponent and dispersion of the PBICM scheme were calculated.
Another approach for analyzing BICM at finite block length was proposed in [8], where BICM is viewed as mismatched decoding. Since this setting is valid at finite block length, the random coding error exponent of the scheme can be calculated. In the previous section we compared the error exponents of PBICM and of the mismatched decoding approach. When the two schemes have the same latency (same block length), the PBICM exponent is inferior to that of the mismatched decoding approach. However, when the complexity of the scheme is considered (or equivalently, when the codeword length of the underlying code is the same), PBICM becomes comparable, and is generally better over the Rayleigh fading channel.
An important merit of the PBICM scheme is that it allows easy code design. In PBICM, one has to design a binary code for a memoryless binary channel. In recent years, methods have been developed to design very efficient binary codes, such as LDPC codes [10]. When designing LDPC codes, a desired property of a binary channel is that its output be symmetric. It turns out that no matter what channel $W$ we have at hand, the resulting binary channel $\overline{W}$ is always output-symmetric (when the output is the LLR).
Because of its simplicity and easy code design, we conclude that PBICM is an attractive practical communication scheme, which also allows exact theoretical analysis.
Fig. 10. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5dB.

Fig. 11. Random coding exponents over the Rayleigh fading channel with 16-QAM signaling and SNR of 5dB (zoomed on the capacity).

Fig. 12. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20dB.

Fig. 13. Random coding exponents over the Rayleigh fading channel with 64-QAM signaling and SNR of 20dB (zoomed on the capacity).

Fig. 14. Random coding exponents over the AWGN channel with 16QAM signaling and SNR of 0dB.

Fig. 15. Random coding exponents over the AWGN channel with 16QAM signaling and SNR of 5dB.

(In each figure, the curves shown are the channel capacity, the BICM capacity, the unconstrained $E_r(R)$, the PBICM $E_r(R)$, the normalized PBICM $E_r(R)$, and the mismatched decoding $E_r(R)$, plotted against the rate $R$ in bits.)
Several additional notes can be made:
• The analysis holds for any mapping µ. Finding the mapping that yields the optimal performance at finite lengths is an
open question (although Gray mapping is expected to perform well).
• The PBICM scheme includes, among other things, a binary dither. Such a tool is used in some cases as a theoretical device for proving achievability. In PBICM, it is an essential part of the scheme itself, and even the random coding capacity proof becomes impossible without it. The main role of the dither is to validate the equivalence of the PBICM scheme to a binary memoryless channel. In addition, the binary dither is the element that symmetrizes the binary channel, which makes the code design easier. This symmetrization property was also noticed in [11], where a similar dither is used with BICM (and termed 'channel adapters'). The code design proposed in [11] relies on the assumption of an ideal interleaver.
• The channel is assumed to be memoryless. This captures many interesting channels, including the AWGN channel, and
the memoryless fading channel with and without state known at the receiver (ergodic fading). For slow-fading channels,
another interleaver (symbol interleaver) is required in order to transform the slowly fading channel into a fast-fading
channel (cf. [2]).
APPENDIX A
APPROXIMATION OF THE INVERSE Q-FUNCTION
The following is a useful approximation for the inverse Q-function.
Lemma 2:
$$\lim_{\varepsilon\to 0}\frac{(Q^{-1}(\varepsilon))^2}{2\ln\frac{1}{\varepsilon}} = 1. \qquad (62)$$
Proof: We start with the well-known bounds on the Q-function:
$$\frac{1}{\sqrt{2\pi}\,x}\left(1-\frac{1}{x^2}\right)e^{-\frac{x^2}{2}} \le Q(x) \le \frac{1}{\sqrt{2\pi}\,x}\,e^{-\frac{x^2}{2}}. \qquad (63)$$
Dividing by the upper bound yields
$$1-\frac{1}{x^2} \le \frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-\frac{x^2}{2}}} \le 1. \qquad (64)$$
Taking the limit $x\to\infty$ gives
$$\lim_{x\to\infty}\frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-\frac{x^2}{2}}} = 1. \qquad (65)$$
Since the limit exists, we may take the natural logarithm:
$$\lim_{x\to\infty}\ln\frac{Q(x)}{\frac{1}{\sqrt{2\pi}\,x}e^{-\frac{x^2}{2}}} = 0, \qquad (66)$$
i.e.
$$\lim_{x\to\infty}\left[\ln Q(x) - \ln\frac{1}{\sqrt{2\pi}\,x} - \ln e^{-\frac{x^2}{2}}\right] = 0. \qquad (67)$$
Since $\lim_{x\to\infty}\ln Q(x) = -\infty$, we get
$$\lim_{x\to\infty}\frac{\ln Q(x) - \ln\frac{1}{\sqrt{2\pi}\,x} - \ln e^{-\frac{x^2}{2}}}{\ln Q(x)} = 0, \qquad (68)$$
which leads to
$$\lim_{x\to\infty}\frac{\ln e^{-\frac{x^2}{2}}}{\ln Q(x)} = \lim_{x\to\infty}\frac{-x^2}{2\ln Q(x)} = 1. \qquad (69)$$
Since $\lim_{\varepsilon\to 0}Q^{-1}(\varepsilon) = \infty$, we may substitute $x$ with $Q^{-1}(\varepsilon)$ and write
$$\lim_{\varepsilon\to 0}\frac{-(Q^{-1}(\varepsilon))^2}{2\ln\varepsilon} = 1, \qquad (70)$$
which leads to (62).
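A quick numerical check of Lemma 2 (using scipy's `norm.isf` as $Q^{-1}$) shows the ratio creeping toward 1, slowly, due to the lower-order $\ln x$ terms:

```python
import numpy as np
from scipy.stats import norm

for eps in (1e-2, 1e-4, 1e-8, 1e-16):
    ratio = norm.isf(eps) ** 2 / (2 * np.log(1 / eps))
    print(f"eps = {eps:.0e}:  (Q^-1(eps))^2 / (2 ln(1/eps)) = {ratio:.4f}")
```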
APPENDIX B
BIG-O NOTATION
As usual, $f(n) = O(\varepsilon_n)$ means that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $|f(n)| \le c\,\varepsilon_n$, or equivalently, that
$$-c\,\varepsilon_n \le f(n) \le c\,\varepsilon_n. \qquad (71)$$
$f_n = g_n + O(\varepsilon_n)$ means that $f_n - g_n = O(\varepsilon_n)$, i.e. that $f_n$ can be approximated by $g_n$ up to a term that is not greater in absolute value than $c\cdot\varepsilon_n$ for some constant $c$.
Sometimes we will be interested in only one of the sides of (71). For that purpose, $f(n) \le O(\varepsilon_n)$ means that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $f(n) \le c\cdot\varepsilon_n$; and $f(n) \ge O(\varepsilon_n)$ means that there exist $c>0$ and $n_0>0$ s.t. for all $n>n_0$, $-f(n) \le c\cdot\varepsilon_n$.
The different usages of the $O$ notation are listed below.
$f_n = O(\varepsilon_n)$: $\exists c>0, n_0>0$ s.t. $\forall n>n_0$, $|f_n| \le c\cdot\varepsilon_n$.
$f_n = g_n + O(\varepsilon_n)$: $f_n - g_n = O(\varepsilon_n)$.
$f_n \le O(\varepsilon_n)$: $\exists c>0, n_0>0$ s.t. $\forall n>n_0$, $f_n \le c\cdot\varepsilon_n$.
$f_n \le g_n + O(\varepsilon_n)$: $f_n - g_n \le O(\varepsilon_n)$.
$f_n \ge O(\varepsilon_n)$: $-f_n \le O(\varepsilon_n)$, i.e. $\exists c>0, n_0>0$ s.t. $\forall n>n_0$, $-f_n \le c\cdot\varepsilon_n$.
$f_n \ge g_n + O(\varepsilon_n)$: $f_n - g_n \ge O(\varepsilon_n)$.
Note that $f_n \le O(\varepsilon_n)$ together with $f_n \ge O(\varepsilon_n)$ is equivalent to $f_n = O(\varepsilon_n)$, as expected.
ACKNOWLEDGMENT
Interesting discussions with A. G. i Fàbregas are acknowledged.
REFERENCES
[1] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. on Communications, vol. 40, no. 5, pp. 873–884, May 1992.
[2] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. on Information Theory, vol. 44, no. 3, pp. 927–946, 1998.
[3] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, Inc., 1968.
[4] V. Strassen, “Asymptotische Abschätzungen in Shannons Informationstheorie,” Trans. Third Prague Conf. Information Theory, Czechoslovak Academy of Sciences, 1962, pp. 689–723.
[5] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Dispersion of Gaussian channels,” in Proc. IEEE International Symposium on Information Theory, 2009, pp. 2204–2208.
[6] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. on Information Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[7] U. Wachsmann, R. F. H. Fischer, and J. B. Huber, “Multilevel codes: Theoretical concepts and practical design rules,” IEEE Trans. on Information Theory, vol. 45, no. 5, pp. 1361–1391, 1999.
[8] A. Martinez, A. Guillén i Fàbregas, G. Caire, and F. Willems, “Bit-interleaved coded modulation revisited: A mismatched decoding perspective,” IEEE Trans. on Information Theory, vol. 55, no. 6, pp. 2756–2765, June 2009.
[9] A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Bit-interleaved coded modulation,” Foundations and Trends in Communications and Information Theory, vol. 5, no. 1-2, pp. 1–153, 2008.
[10] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[11] J. Hou, P. H. Siegel, L. B. Milstein, and H. D. Pfister, “Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes,” IEEE Trans. on Information Theory, vol. 49, no. 9, pp. 2141–2155, 2003.