Low Density Lattice Codes
Naftali Sommer, Senior Member, IEEE, Meir Feder, Fellow, IEEE, and Ofir Shalvi, Member, IEEE
Abstract— Low density lattice codes (LDLC) are novel lattice
codes that can be decoded efficiently and approach the capacity of
the additive white Gaussian noise (AWGN) channel. In LDLC a
codeword x is generated directly in the n-dimensional Euclidean
space as a linear transformation of a corresponding integer
message vector b, i.e., x = Gb, where H = G−1 is restricted to
be sparse. The fact that H is sparse is utilized to develop a linear-time iterative decoding scheme which attains, as demonstrated by simulations, good error performance within ∼ 0.5dB from capacity at block length of n = 100,000 symbols. The paper also
discusses convergence results and implementation considerations.
Index Terms— Lattices, lattice codes, iterative decoding, LDPC.
I. INTRODUCTION
If we take a look at the evolution of codes for binary or
finite alphabet channels, it was first shown [1] that channel
capacity can be achieved with long random codewords. Then,
it was found [2] that capacity can be achieved via a simpler structure of linear codes. Then, specific families of linear
codes were found that are practical and have good minimum
Hamming distance (e.g. convolutional codes, cyclic block
codes, specific cyclic codes such as BCH and Reed-Solomon
codes [4]). Later, capacity achieving schemes were found,
which have special structures that allow efficient iterative
decoding, such as low-density parity-check (LDPC) codes [5]
or turbo codes [6].
If we now take a similar look at continuous alphabet codes
for the additive white Gaussian noise (AWGN) channel, it was
first shown [3] that codes with long random Gaussian codewords can achieve capacity. Later, it was shown that lattice
codes can also achieve capacity ([7] – [12]). Lattice codes are
clearly the Euclidean space analogue of linear codes. Similarly
to binary codes, we could expect that specific practical lattice
codes will then be developed. However, there was almost
no further progress from that point. Specific lattice codes
that were found were based on fixed dimensional classical
lattices [19] or based on algebraic error correcting codes
[13][14], but no significant effort was made in designing lattice
codes directly in the Euclidean space or in finding specific
capacity achieving lattice codes. Practical coding schemes for
the AWGN channel were based on finite alphabet codes.
In [15], “signal codes” were introduced. These are lattice
codes, designed directly in the Euclidean space, where the
information sequence of integers $i_n$, $n = 1, 2, \ldots$ is encoded by convolving it with a fixed signal pattern $g_n$, $n = 1, 2, \ldots, d$.
Signal codes are clearly analogous to convolutional codes, and
in particular can work at the AWGN channel cutoff rate with
simple sequential decoders. In [16] it is also demonstrated that
signal codes can work near the AWGN channel capacity with
more elaborated bi-directional decoders. Thus, signal codes
provided the first step toward finding effective lattice codes
with practical decoders.
Inspired by LDPC codes and in the quest of finding practical
capacity achieving lattice codes, we propose in this work
“Low Density Lattice Codes” (LDLC). We show that these
codes can approach the AWGN channel capacity with iterative
decoders whose complexity is linear in block length. In recent
years several schemes were proposed for using LDPC over
continuous valued channels by either multilevel coding [18] or
by non-binary alphabet (e.g. [17]). Unlike these LDPC based
schemes, in LDLC both the encoder and the channel use the
same real algebra which is natural for the continuous-valued
AWGN channel. This feature also simplifies the convergence
analysis of the iterative decoder.
The outline of this paper is as follows. Low density lattice
codes are first defined in Section II. The iterative decoder is
then presented in Section III, followed by convergence analysis
of the decoder in Section IV. Then, Section V describes how to
choose the LDLC code parameters, and Section VI discusses
implementation considerations. The computational complexity
of the decoder is then discussed in Section VII, followed by
a brief description of encoding and shaping in Section VIII.
Simulation results are finally presented in Section IX.
II. BASIC DEFINITIONS AND PROPERTIES
A. Lattices and Lattice Codes
An n dimensional lattice in Rm is defined as the set of all
linear combinations of a given basis of n linearly independent
vectors in Rm with integer coefficients. The matrix G, whose
columns are the basis vectors, is called a generator matrix of
the lattice. Every lattice point is therefore of the form x = Gb,
where b is an n-dimensional vector of integers. The Voronoi
cell of a lattice point is defined as the set of all points that are
closer to this point than to any other lattice point. The Voronoi
cells of all lattice points are congruent, and for square G the volume of the Voronoi cell is equal to |det(G)|. In the sequel
G will be used to denote both the lattice and its generator
matrix.
A lattice code of dimension n is defined by a (possibly
shifted) lattice G in Rm and a shaping region B ⊂ Rm ,
where the codewords are all the lattice points that lie within
the shaping region B. Denote the number of these codewords
by N . The average transmitted power (per channel use, or per
symbol) is the average energy of all codewords, divided by
the codeword length m. The information rate (in bits/symbol)
is log2 (N )/m.
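As a concrete illustration of these definitions, the following short Python sketch (ours, not part of the paper) enumerates the codewords of a toy 2-dimensional lattice code with an arbitrary generator matrix and a cubic shaping region, and evaluates the resulting power and rate:

```python
import numpy as np

# Minimal illustration of a lattice code (not from the paper): an arbitrary
# 2-D generator matrix and a cubic shaping region are used as an example.
G = np.array([[1.0, 0.3],
              [0.2, 1.1]])          # generator matrix, columns are basis vectors

B = 4.0                              # shaping region: the cube |x_i| <= B
codewords = []
for b1 in range(-8, 9):
    for b2 in range(-8, 9):
        x = G @ np.array([b1, b2])   # lattice point x = G b
        if np.all(np.abs(x) <= B):   # keep only points inside the shaping region
            codewords.append(x)

codewords = np.array(codewords)
N, m = codewords.shape
avg_power = np.mean(np.sum(codewords**2, axis=1)) / m   # power per symbol
rate = np.log2(N) / m                                    # bits per symbol
print(f"N={N}, average power={avg_power:.3f}, rate={rate:.3f} bits/symbol")
```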
When using a lattice code for the AWGN channel with power limit P and noise variance $\sigma^2$, the maximal information rate is limited by the capacity $\frac{1}{2}\log_2(1 + \frac{P}{\sigma^2})$. Poltyrev [20] considered the AWGN channel without restrictions. If there is no power restriction, code rate is a meaningless measure, since it can be increased without limit. Instead, it was suggested in [20] to use the measure of constellation density, leading to a generalized definition of the capacity as the maximal possible codeword density that can be recovered reliably. When applied to lattices, the generalized capacity implies that there exists a lattice G of high enough dimension n that enables transmission with arbitrarily small error probability, if and only if $\sigma^2 < \frac{\sqrt[n]{|\det(G)|^2}}{2\pi e}$. A lattice that achieves the generalized capacity of the AWGN channel without restrictions also achieves the channel capacity of the power constrained AWGN channel, with a properly chosen spherical shaping region (see also [12]).
In the rest of this work we shall concentrate on the lattice design and the lattice decoding algorithm, and not on the shaping region or shaping algorithms. We shall use lattices with det(G) = 1, where analysis and simulations will be carried out for the AWGN channel without restrictions. A capacity achieving lattice will have small error probability for noise variance $\sigma^2$ which is close to the theoretical limit $\frac{1}{2\pi e}$.
B. Syndrome and Parity Check Matrix for Lattice Codes
A binary (n, k) error correcting code is defined by its n × k
binary generator matrix G. A binary information vector b with
dimension k is encoded by x = Gb, where calculations are
performed in the finite field GF(2). The parity check matrix H
is an (n − k) × n matrix such that x is a codeword if and only
if Hx = 0. The input to the decoder is the noisy codeword
y = x + e, where e is the error sequence and addition is done
in the finite field. The decoder typically starts by calculating
the syndrome s = Hy = H(x+e) = He which depends only
on the noise sequence and not on the transmitted codeword.
We would now like to extend the definitions of the parity check matrix and the syndrome to lattice codes. An n-dimensional lattice code is defined by its n × n lattice generator matrix G (throughout this paper we assume that G is square, but the results are easily extended to the non-square case). Every codeword is of the form x = Gb, where b is a vector of integers. Therefore, $G^{-1}x$ is a vector of integers for every codeword x. We define the parity check matrix for the lattice code as $H \triangleq G^{-1}$. Given a noisy codeword y = x + w (where w is the additive noise vector, e.g. AWGN, added by real arithmetic), we can then define the syndrome as $s \triangleq \mathrm{frac}\{Hy\}$, where $\mathrm{frac}\{x\}$ is the fractional part of x, defined as $\mathrm{frac}\{x\} = x - \lfloor x \rceil$, where $\lfloor x \rceil$ denotes the nearest integer to x. The syndrome s will be zero if and only if y is a lattice point, since Hy will then be a vector of integers with zero fractional part. For a noisy codeword, the syndrome will equal $s = \mathrm{frac}\{Hy\} = \mathrm{frac}\{H(x + w)\} = \mathrm{frac}\{Hw\}$ and therefore will depend only on the noise sequence and not on the transmitted codeword, as desired.
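The syndrome computation is straightforward to express in code. The sketch below (ours; the matrix and noise level are arbitrary example values) illustrates that $\mathrm{frac}\{Hy\}$ depends only on the noise:

```python
import numpy as np

# Sketch of the lattice syndrome s = frac{H y} (assumed example values).
G = np.array([[1.0, 0.3],
              [0.2, 1.1]])
H = np.linalg.inv(G)                 # parity check matrix H = G^{-1}

b = np.array([2, -1])                # integer message
x = G @ b                            # codeword (lattice point)
y = x + 0.01 * np.random.randn(2)    # noisy codeword

frac = lambda v: v - np.round(v)     # fractional part: v minus nearest integer
s = frac(H @ y)                      # syndrome; zero iff y is a lattice point
print("syndrome:", s)                # small values: depends only on the noise
```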
Note that the above definitions of the syndrome and parity
check matrix for lattice codes are consistent with the definitions of the dual lattice and the dual code[19]: the dual lattice
of a lattice G is defined as the lattice with generator matrix
H = G−1 , where for binary codes, the dual code of G is
defined as the code whose generator matrix is H, the parity
check matrix of G.
C. Low Density Lattice Codes
We shall now turn to the definition of the codes proposed
in this paper - low density lattice codes (LDLC).
Definition 1 (LDLC): An n-dimensional LDLC is an n-dimensional lattice code with a non-singular lattice generator
matrix G satisfying |det(G)| = 1, for which the parity
check matrix H = G−1 is sparse. The i’th row degree ri ,
i = 1, 2, ...n is defined as the number of nonzero elements in
row i of H, and the i’th column degree ci , i = 1, 2, ...n is
defined as the number of nonzero elements in column i of H.
Note that in binary LDPC codes, the code is completely
defined by the locations of the nonzero elements of H. In
LDLC there is another degree of freedom since we also have
to choose the values of the nonzero elements of H.
Definition 2 (regular LDLC): An n dimensional LDLC is
regular if all the row degrees and column degrees of the parity
check matrix are equal to a common degree d.
Definition 3 (magic square LDLC): An n dimensional regular LDLC with degree d is called “magic square LDLC”
if every row and column of the parity check matrix H has
the same d nonzero values, except for a possible change of
order and random signs. The sorted sequence of these d values
h1 ≥ h2 ≥ ... ≥ hd > 0 will be referred to as the generating
sequence of the magic square LDLC.
For example, the matrix
$$H = \begin{pmatrix}
0 & -0.8 & 0 & -0.5 & 1 & 0 \\
0.8 & 0 & 0 & 1 & 0 & -0.5 \\
0 & 0.5 & 1 & 0 & 0.8 & 0 \\
0 & 0 & -0.5 & -0.8 & 0 & 1 \\
1 & 0 & 0 & 0 & 0.5 & 0.8 \\
0.5 & -1 & -0.8 & 0 & 0 & 0
\end{pmatrix}$$
is a parity check matrix of a magic square LDLC with lattice dimension n = 6, degree d = 3 and generating sequence {1, 0.8, 0.5}. This H should be further normalized by the constant $\sqrt[n]{|\det(H)|}$ in order to have |det(H)| = |det(G)| = 1, as required by Definition 1.
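The following sketch (ours) verifies the magic square property of this example matrix numerically and applies the normalization just described:

```python
import numpy as np

# The example parity check matrix from the text (n=6, d=3, generating
# sequence {1, 0.8, 0.5}).
H = np.array([
    [0,    -0.8,  0,   -0.5,  1,    0  ],
    [0.8,   0,    0,    1,    0,   -0.5],
    [0,     0.5,  1,    0,    0.8,  0  ],
    [0,     0,   -0.5, -0.8,  0,    1  ],
    [1,     0,    0,    0,    0.5,  0.8],
    [0.5,  -1,   -0.8,  0,    0,    0  ]])

gen_seq = np.array([1.0, 0.8, 0.5])
for row in np.abs(H):                 # every row holds the generating sequence
    assert np.allclose(np.sort(row[row > 0])[::-1], gen_seq)
for col in np.abs(H).T:               # and so does every column
    assert np.allclose(np.sort(col[col > 0])[::-1], gen_seq)

n = H.shape[0]
H_norm = H / np.abs(np.linalg.det(H)) ** (1.0 / n)   # force |det(H)| = 1
print("det(H_norm) =", np.linalg.det(H_norm))        # = +/-1 up to rounding
```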
The bipartite graph of an LDLC is defined similarly to
LDPC codes: it is a graph with variable nodes at one side and
check nodes at the other side. Each variable node corresponds
to a single element of the codeword x = Gb. Each check node corresponds to a check equation (a row of H). A check equation is of the form $\sum_k h_k x_{i_k} = \text{integer}$, where $i_k$ denotes the locations of the nonzero elements at the appropriate row of H, $h_k$ are the values of H at these locations and the integer at the right hand side is unknown. An edge connects check node i and variable node j if and only if $H_{i,j} \neq 0$. This edge is assigned the value $H_{i,j}$. Figure 1 illustrates the bi-partite graph of a magic square LDLC with degree 3. In the figure, every variable node $x_k$ is also associated with its noisy channel observation $y_k$.
Fig. 1. The bi-partite graph of an LDLC

Finally, a k-loop is defined as a loop in the bipartite graph that consists of k edges. A bipartite graph, in general, can only
contain loops with even length. Also, a 2-loop, which consists
of two parallel edges that originate from the same variable
node to the same check node, is not possible by the definition
of the graph. However, longer loops are certainly possible. For
example, a 4-loop exists when two variable nodes are both
connected to the same pair of check nodes.
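Since the absence of 4-loops is assumed later in the convergence analysis (Section IV), a simple way to count them is sketched below (our code): a 4-loop exists exactly when two rows of H share nonzero entries in two or more columns.

```python
import numpy as np

def count_4_loops(H, tol=1e-12):
    """Count 4-loops in the bipartite graph of H: pairs of check nodes (rows)
    that share at least two variable nodes (columns) with nonzero entries."""
    mask = np.abs(H) > tol
    count = 0
    n_rows = mask.shape[0]
    for r1 in range(n_rows):
        for r2 in range(r1 + 1, n_rows):
            shared = np.sum(mask[r1] & mask[r2])
            if shared >= 2:
                # each pair of shared columns closes one 4-loop
                count += shared * (shared - 1) // 2
    return count
```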
III. ITERATIVE DECODING FOR THE AWGN CHANNEL
Assume that the codeword x = Gb was transmitted, where
b is a vector of integers. We observe the noisy codeword y =
x + w, where w is a vector of i.i.d Gaussian noise samples
with common variance σ 2 , and we need to estimate the integer
valued vector b. The maximum likelihood (ML) estimator is then $\hat{b} = \arg\min_{b} \|y - Gb\|^2$.
Our decoder will not estimate directly the integer vector
b. Instead, it will estimate the probability density function
(PDF) of the codeword vector x. Furthermore, instead of
calculating the n-dimensional PDF of the whole vector x,
we shall calculate the n one-dimensional PDF’s for each
of the components xk of this vector (conditioned on the
whole observation vector y). In Appendix I it is shown that $f_{x_k|y}(x_k|y)$ is a weighted sum of Dirac delta functions:
$$f_{x_k|y}(x_k|y) = C \cdot \sum_{l \in G \cap B} \delta(x_k - l_k) \cdot e^{-d^2(l,y)/2\sigma^2} \qquad (1)$$
where l is a lattice point (vector), $l_k$ is its k-th component, C is a constant independent of $x_k$ and d(l, y) is the Euclidean distance between l and y. Direct evaluation of (1) is not practical, so our decoder will try to estimate $f_{x_k|y}(x_k|y)$ (or at least approximate it) in an iterative manner.

Fig. 2. Tier diagram (tier 1: 6 nodes, tier 2: 24 nodes)
Our decoder will decode to the infinite lattice, thus ignoring
the shaping region boundaries. This approximate decoding
method is no longer exact maximum likelihood decoding, and
is usually denoted “lattice decoding” [12].
The calculation of $f_{x_k|y}(x_k|y)$ is involved since the components $x_k$ are not independent random variables (RV's), because x is restricted to be a lattice point. Following [5] we use a “trick”: we assume that the $x_k$'s are independent, but add a condition that assures that x is a lattice point. Specifically, define $s \triangleq H \cdot x$. Restricting x to be a lattice point is equivalent to restricting $s \in \mathbb{Z}^n$. Therefore, instead of calculating $f_{x_k|y}(x_k|y)$ under the assumption that x is a lattice point, we can calculate $f_{x_k|y}(x_k|y, s \in \mathbb{Z}^n)$ and assume that the $x_k$ are independent and identically distributed (i.i.d) with a continuous PDF (that does not include Dirac delta functions). It still remains to set $f_{x_k}(x_k)$, the PDF of $x_k$. Under the i.i.d assumption, the PDF of the codeword x is $f_x(x) = \prod_{k=1}^{n} f_{x_k}(x_k)$. As shown in Appendix II, the value
of fx (x) is not important at values of x which are not lattice
points, but at a lattice point it should be proportional to the
probability of using this lattice point. Since we assume that all
lattice points are used equally likely, fx (x) must have the same
value at all lattice points. A reasonable choice for fxk (xk ) is
then to use a uniform distribution such that x will be uniformly
distributed in an n-dimensional cube. For an exact ML decoder
(that takes into account the boundaries of the shaping region),
it is enough to choose the range of fxk (xk ) such that this
cube will contain the shaping region. For our decoder, that
performs lattice decoding, we should set the range of fxk (xk )
large enough such that the resulting cube will include all the
lattice points which are likely to be decoded. The derivation of
the iterative decoder shows that this range can be set as large
as needed without affecting the complexity of the decoder.
The derivation in [5] further imposed the tree assumption. In
order to understand the tree assumption, it is useful to define
the tier diagram, which is shown in Figure 2 for a regular
LDLC with degree 3. Each vertical line corresponds to a check
equation. The tier 1 nodes of x1 are all the elements xk that
take place in a check equation with x1 . The tier 2 nodes of
x1 are all the elements that take place in check equations with
the tier 1 elements of x1 , and so on. The tree assumption
assumes that all the tree elements are distinct (i.e. no element
appears in different tiers or twice in the same tier). This
assumption simplifies the derivation, but in general, does not
hold in practice, so our iterative algorithm is not guaranteed
to converge to the exact solution (1) (see Section IV).
The detailed derivation of the iterative decoder (using the
above “trick” and the tree assumption) is given in Appendix
III. In Section III-A below we present the final resulting
algorithm. This iterative algorithm can also be explained by
intuitive arguments, described after the algorithm specification.
A. The Iterative Decoding Algorithm
The iterative algorithm is most conveniently represented by
using a message passing scheme over the bipartite graph of
the code, similarly to LDPC codes. The basic difference is
that in LDPC codes the messages are scalar values (e.g. the log likelihood ratio of a bit), whereas for LDLC the messages
are real functions over the interval (−∞, ∞). As in LDPC, in
each iteration the check nodes send messages to the variable
nodes along the edges of the bipartite graph and vice versa.
The messages sent by the check nodes are periodic extensions
of PDF’s. The messages sent by the variable nodes are PDF’s.
LDLC iterative decoding algorithm:
Denote the variable nodes by $x_1, x_2, \ldots, x_n$ and the check nodes by $c_1, c_2, \ldots, c_n$.
• Initialization: each variable node $x_k$ sends to all its check nodes the message $f_k^{(0)}(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y_k - x)^2}{2\sigma^2}}$.
• Basic iteration - check node message: Each check node sends a (different) message to each of the variable nodes that are connected to it. For a specific check node denote (without loss of generality) the appropriate check equation by $\sum_{l=1}^{r} h_l x_{m_l} = \text{integer}$, where $x_{m_l}$, $l = 1, 2, \ldots, r$ are the variable nodes that are connected to this check node (and r is the appropriate row degree of H). Denote by $f_l(x)$, $l = 1, 2, \ldots, r$, the message that was sent to this check node by variable node $x_{m_l}$ in the previous half-iteration. The message that the check node transmits back to variable node $x_{m_j}$ is calculated in three basic steps.
1) The convolution step - all messages, except $f_j(x)$, are convolved (after expanding each $f_l(x)$ by $h_l$):
$$\tilde{p}_j(x) = f_1\!\left(\frac{x}{h_1}\right) \circledast \cdots \circledast f_{j-1}\!\left(\frac{x}{h_{j-1}}\right) \circledast f_{j+1}\!\left(\frac{x}{h_{j+1}}\right) \circledast \cdots \circledast f_r\!\left(\frac{x}{h_r}\right) \qquad (2)$$
2) The stretching step - The result is stretched by $(-h_j)$ to $p_j(x) = \tilde{p}_j(-h_j x)$.
3) The periodic extension step - The result is extended to a periodic function with period $1/|h_j|$:
$$Q_j(x) = \sum_{i=-\infty}^{\infty} p_j\!\left(x - \frac{i}{h_j}\right) \qquad (3)$$
The function $Q_j(x)$ is the message that is finally sent to variable node $x_{m_j}$.
• Basic iteration - variable node message: Each variable node sends a (different) message to each of the check nodes that are connected to it. For a specific variable node $x_k$, assume that it is connected to check nodes $c_{m_1}, c_{m_2}, \ldots, c_{m_e}$, where e is the appropriate column degree of H. Denote by $Q_l(x)$, $l = 1, 2, \ldots, e$, the message that was sent from check node $c_{m_l}$ to this variable node in the previous half-iteration. The message that is sent back to check node $c_{m_j}$ is calculated in two basic steps:
1) The product step: $\tilde{f}_j(x) = e^{-\frac{(y_k - x)^2}{2\sigma^2}} \prod_{\substack{l=1 \\ l \neq j}}^{e} Q_l(x)$
2) The normalization step: $f_j(x) = \frac{\tilde{f}_j(x)}{\int_{-\infty}^{\infty} \tilde{f}_j(x)\,dx}$
• This basic iteration is repeated for the desired number of iterations.
• Final decision: After finishing the iterations, we want to estimate the integer information vector b. First, we estimate the final PDF's of the codeword elements $x_k$, $k = 1, 2, \ldots, n$, by calculating the variable node messages at the last iteration without omitting any check node message in the product step: $\tilde{f}_{final}^{(k)}(x) = e^{-\frac{(y_k - x)^2}{2\sigma^2}} \prod_{l=1}^{e} Q_l(x)$. Then, we estimate each $x_k$ by finding the peak of its PDF: $\hat{x}_k = \arg\max_x \tilde{f}_{final}^{(k)}(x)$. Finally, we estimate b as $\hat{b} = \lfloor H\hat{x} \rceil$.

Fig. 3. Signals at a variable node: the channel PDF, the four check node messages, and the final variable node message.
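To make the message-passing operations concrete, the following Python sketch (ours, not the paper's implementation) runs the algorithm above on the small n = 6 example of Section II-C, representing each message as a PDF sampled on a uniform grid; grid size, spacing, noise level and the number of iterations are arbitrary illustrative choices (Section VI discusses implementation considerations).

```python
import numpy as np

# Sketch of the LDLC iterative decoder (ours, not the paper's implementation).
# Messages are PDFs sampled on a uniform grid; all sizes are illustrative.
rng = np.random.default_rng(0)

H = np.array([                                  # example H from Section II-C
    [0,    -0.8,  0,   -0.5,  1,    0  ],
    [0.8,   0,    0,    1,    0,   -0.5],
    [0,     0.5,  1,    0,    0.8,  0  ],
    [0,     0,   -0.5, -0.8,  0,    1  ],
    [1,     0,    0,    0,    0.5,  0.8],
    [0.5,  -1,   -0.8,  0,    0,    0  ]])
H = H / np.abs(np.linalg.det(H)) ** (1.0 / H.shape[0])   # force |det(H)| = 1
G = np.linalg.inv(H)
n = H.shape[0]
sigma = 0.05                                     # channel noise std (example)

b_true = rng.integers(-1, 2, size=n)             # integer message
y = G @ b_true + sigma * rng.standard_normal(n)  # noisy codeword

L, dx = 10.0, 0.01
grid = np.arange(-L, L + dx, dx)                 # common sampling grid

def resample(f, pts):
    """Evaluate the sampled function f (given on 'grid') at the points pts."""
    return np.interp(pts, grid, f, left=0.0, right=0.0)

def channel_pdf(k):
    return np.exp(-(y[k] - grid) ** 2 / (2 * sigma ** 2))

# variable-to-check messages f, keyed by (check r, variable k); iteration 0
f_msg = {(r, k): channel_pdf(k) for r in range(n) for k in range(n) if H[r, k] != 0}

for _ in range(10):
    # check node step: convolution, stretching, periodic extension
    Q_msg = {}
    for r in range(n):
        cols = np.nonzero(H[r])[0]
        for j in cols:
            conv = None
            for l in cols:
                if l == j:
                    continue
                expanded = resample(f_msg[(r, l)], grid / H[r, l])      # f_l(x/h_l)
                conv = expanded if conv is None else np.convolve(conv, expanded, 'same') * dx
            p = resample(conv, -H[r, j] * grid)                         # stretch by -h_j
            shifts = int(L * abs(H[r, j])) + 2
            Q = np.zeros_like(grid)
            for i in range(-shifts, shifts + 1):                        # periodic extension
                Q += resample(p, grid - i / H[r, j])
            Q_msg[(r, j)] = Q
    # variable node step: product with the channel PDF, then normalization
    for k in range(n):
        rows = np.nonzero(H[:, k])[0]
        for r in rows:
            prod = channel_pdf(k)
            for r2 in rows:
                if r2 != r:
                    prod = prod * Q_msg[(r2, k)]
            f_msg[(r, k)] = prod / (prod.sum() * dx)

# final decision: peak of the full product, then b_hat = round(H x_hat)
x_hat = np.zeros(n)
for k in range(n):
    prod = channel_pdf(k)
    for r in np.nonzero(H[:, k])[0]:
        prod = prod * Q_msg[(r, k)]
    x_hat[k] = grid[np.argmax(prod)]
b_hat = np.round(H @ x_hat).astype(int)
print("decoded:", b_hat, "transmitted:", b_true)
```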
The operation of the iterative algorithm can be intuitively
explained as follows. The check node operation is equivalent
to calculating the PDF of $x_{m_j}$ from the PDF's of $x_{m_i}$, $i = 1, 2, \ldots, j-1, j+1, \ldots, r$, given that $\sum_{l=1}^{r} h_l x_{m_l} = \text{integer}$, and assuming that the $x_{m_i}$ are independent. Extracting $x_{m_j}$ from the check equation, we get $x_{m_j} = \frac{1}{h_j}\left(\text{integer} - \sum_{l=1, l\neq j}^{r} h_l x_{m_l}\right)$.
Since the PDF of a sum of independent random variables is the
convolution of the corresponding PDF’s, equation (2) and the
stretching step that follows it simply calculate the PDF of xmj ,
assuming that the integer at the right hand side of the check
equation is zero. The result is then periodically extended such
that a properly shifted copy exists for every possible value of
this (unknown) integer. The variable node gets such a message
from all the check equations that involve the corresponding
variable. The check node messages and the channel PDF are
treated as independent sources of information on the variable,
so they are multiplied all together.
Note that the periodic extension step at the check nodes
is equivalent to a convolution with an infinite impulse train.
With this observation, the operation of the variable nodes is
completely analogous to that of the check nodes: the variable
nodes multiply the incoming messages by the channel PDF,
where the check nodes convolve the incoming messages with
an impulse train, which can be regarded as a generalized
“integer PDF”.
In the above formulation, the integer information vector b
is recovered from the PDF’s of the codeword elements xk . An
alternative approach is to calculate the PDF of each integer
element bm directly as the PDF of the left hand side of the
appropriate check equation. Using the tree assumption, this can
be done by simply calculating the convolution p̃(x) as in (2),
but this time without omitting any PDF, i.e. all the received
variable node messages are convolved. Then, the integer bm
is determined by b̂m = arg maxj∈Z p̃(j).
Figure 3 shows an example for a regular LDLC with degree
d = 5. The figure shows all the signals that are involved in
generating a variable node message for a certain variable node.
The top signal is the channel Gaussian, centered around the
noisy observation of the variable. The next 4 signals are the
periodically extended PDF’s that arrived from the check nodes,
and the bottom signal is the product of all the 5 signals. It
can be seen that each periodic signal has a different period,
according to the relevant coefficient of H. Also, the signals
with larger period have larger variance. This diversity resolves
all the ambiguities such that the multiplication result (bottom
plot) remains with a single peak. We expect the iterative
algorithm to converge to a solution where a single peak will
remain at each PDF, located at the desired value and narrow
enough to estimate the information.
IV. CONVERGENCE
A. The Gaussian Mixture Model
Interestingly, for LDLC we can come up with a convergence
analysis that in many respects is more specific than the similar
analysis for LDPC.
We start by introducing basic claims about Gaussian PDF's. Denote $G_{m,V}(x) = \frac{1}{\sqrt{2\pi V}} e^{-\frac{(x-m)^2}{2V}}$.
Claim 1 (convolution of Gaussians): The convolution of n
Gaussians with mean values m1 , m2 , ..., mn and variances
V1 , V2 , ..., Vn , respectively, is a Gaussian with mean m1 +
m2 + ... + mn and variance V1 + V2 + ... + Vn .
Proof: See [21].
Claim 2 (product of n Gaussians): Let $G_{m_1,V_1}(x), G_{m_2,V_2}(x), \ldots, G_{m_n,V_n}(x)$ be n Gaussians with mean values $m_1, m_2, \ldots, m_n$ and variances $V_1, V_2, \ldots, V_n$ respectively. Then, the product of these Gaussians is a scaled Gaussian: $\prod_{i=1}^{n} G_{m_i,V_i}(x) = \hat{A} \cdot G_{\hat{m},\hat{V}}(x)$, where
$$\hat{V} = \left(\sum_{i=1}^{n} \frac{1}{V_i}\right)^{-1}, \qquad \hat{m} = \frac{\sum_{i=1}^{n} m_i V_i^{-1}}{\sum_{i=1}^{n} V_i^{-1}},$$
and
$$\hat{A} = \frac{1}{\sqrt{(2\pi)^{n-1} \hat{V}^{-1} \prod_{k=1}^{n} V_k}} \cdot e^{-\frac{\hat{V}}{2} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \frac{(m_i - m_j)^2}{V_i \cdot V_j}}.$$
Proof: By straightforward mathematical manipulations.
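Claim 2 can also be checked numerically; the sketch below (ours, with arbitrary example parameters) compares the closed-form mean, variance and scaling factor against a brute-force product of three Gaussians on a grid:

```python
import numpy as np

def gauss(x, m, V):
    return np.exp(-(x - m) ** 2 / (2 * V)) / np.sqrt(2 * np.pi * V)

# Numerical check of Claim 2 for three arbitrary Gaussians (example values).
m = np.array([0.3, -1.2, 0.7])
V = np.array([0.5, 1.5, 0.8])

V_hat = 1.0 / np.sum(1.0 / V)                     # product variance
m_hat = V_hat * np.sum(m / V)                      # product mean
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
A_hat = np.exp(-0.5 * V_hat * sum((m[i] - m[j]) ** 2 / (V[i] * V[j]) for i, j in pairs))
A_hat /= np.sqrt((2 * np.pi) ** 2 / V_hat * np.prod(V))   # (2pi)^(n-1) V_hat^-1 prod(V_k)

x = np.linspace(-6, 6, 20001)
brute = gauss(x, m[0], V[0]) * gauss(x, m[1], V[1]) * gauss(x, m[2], V[2])
assert np.allclose(brute, A_hat * gauss(x, m_hat, V_hat))
print("Claim 2 verified:", m_hat, V_hat, A_hat)
```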
The reason that we are interested in the properties of
Gaussian PDF’s lies in the following lemma.
Lemma 1: Each message that is exchanged between the check nodes and variable nodes in the LDLC decoding algorithm (i.e. $Q_j(x)$ and $f_j(x)$), at every iteration, can be expressed as a Gaussian mixture of the form $M(x) = \sum_{j=1}^{\infty} A_j G_{m_j,V_j}(x)$.
Proof: By induction: The initial messages are Gaussians,
and the basic operations of the iterative decoder preserve the
Gaussian mixture nature of Gaussian mixture inputs (convolution and multiplication preserve the Gaussian nature according
to claims 1 and 2, stretching, expanding and shifting preserve
it by the definition of a Gaussian, and periodic extension
transforms a single Gaussian to a mixture and a mixture to
a mixture).
Convergence analysis should therefore analyze the convergence of the variances, mean values and amplitudes of the
Gaussians in each mixture.
B. Convergence of the Variances
We shall now analyze the behavior of the variances, and
start with the following lemma.
Lemma 2: For both variable node messages and check node
messages, all the Gaussians that take place in the same mixture
have the same variance.
Proof: By induction. The initial variable node messages
are single element mixtures so the claim obviously holds. Assume now that all the variable node messages at iteration t are
mixtures where all the Gaussians that take place in the same
mixture have the same variance. In the convolution step (2),
each variable node message is first expanded. All Gaussians
in the expanded mixture will still have the same variance,
since the whole mixture is expanded together. Then, d − 1
expanded Gaussian mixtures are convolved. In the resulting
mixture, each Gaussian will be the result of convolving d − 1
single Gaussians, one from each mixture. According to claim
1, all the Gaussians in the convolution result will have the same
variance, which will equal the sum of the d−1 variances of the
expanded messages. The stretching and periodic extension (3)
do not change the equal variance property, so it holds for the
final check node messages. The variable nodes multiply d − 1
check node messages. Each Gaussian in the resulting mixture
is a product of d − 1 single Gaussians, one from each mixture,
and the channel noise Gaussian. According to claim 2, they
will all have the same variance. The final normalization step
does not change the variances so the equal variance property
is kept for the final variable node messages at iteration t + 1.
Until this point we did not impose any restrictions on the
LDLC. From now on, we shall restrict ourselves to magic
square regular LDLC (see Definition 3). The basic iterative
equations that relate the variances at iteration t + 1 to the
variances at iteration t are summarized in the following two
lemmas.
Lemma 3: For magic square LDLC, variable node messages that are sent at the same iteration along edges with the
same absolute value have the same variance.
Proof: See Appendix IV.
Lemma 4: For magic square LDLC with degree d, denote the variance of the messages that are sent at iteration t along edges with weight $\pm h_l$ by $V_l^{(t)}$. The variance values $V_1^{(t)}, V_2^{(t)}, \ldots, V_d^{(t)}$ obey the following recursion:
$$\frac{1}{V_i^{(t+1)}} = \frac{1}{\sigma^2} + \sum_{\substack{m=1 \\ m \neq i}}^{d} \frac{h_m^2}{\sum_{\substack{j=1 \\ j \neq m}}^{d} h_j^2 V_j^{(t)}} \qquad (4)$$
for $i = 1, 2, \ldots, d$, with initial conditions $V_1^{(0)} = V_2^{(0)} = \ldots = V_d^{(0)} = \sigma^2$.
Proof: See Appendix IV.
For illustration, the recursion for the case d = 3 is:
$$\frac{1}{V_1^{(t+1)}} = \frac{h_2^2}{h_1^2 V_1^{(t)} + h_3^2 V_3^{(t)}} + \frac{h_3^2}{h_1^2 V_1^{(t)} + h_2^2 V_2^{(t)}} + \frac{1}{\sigma^2}$$
$$\frac{1}{V_2^{(t+1)}} = \frac{h_1^2}{h_2^2 V_2^{(t)} + h_3^2 V_3^{(t)}} + \frac{h_3^2}{h_1^2 V_1^{(t)} + h_2^2 V_2^{(t)}} + \frac{1}{\sigma^2}$$
$$\frac{1}{V_3^{(t+1)}} = \frac{h_1^2}{h_2^2 V_2^{(t)} + h_3^2 V_3^{(t)}} + \frac{h_2^2}{h_1^2 V_1^{(t)} + h_3^2 V_3^{(t)}} + \frac{1}{\sigma^2} \qquad (5)$$
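The recursion (4) is easy to iterate numerically. The following sketch (ours, with an arbitrary example generating sequence and noise level) does so and exhibits the behavior established in Theorem 1 below: one variance settles near $\sigma^2(1-\alpha)$ while the others decay to zero.

```python
import numpy as np

# Iterate the variance recursion (4) for an example magic square LDLC (ours).
h = np.array([1.0, 0.8, 0.5])            # generating sequence h_1 >= ... >= h_d
d = len(h)
sigma2 = 0.1                              # channel noise variance (example)
alpha = np.sum(h[1:] ** 2) / h[0] ** 2    # here alpha = 0.89 < 1

V = np.full(d, sigma2)                    # initial conditions V_i^(0) = sigma^2
for t in range(200):
    V_new = np.empty(d)
    for i in range(d):
        acc = 1.0 / sigma2
        for m in range(d):
            if m != i:
                denom = sum(h[j] ** 2 * V[j] for j in range(d) if j != m)
                acc += h[m] ** 2 / denom
        V_new[i] = 1.0 / acc
    V = V_new

print("V_1 =", V[0], " vs  sigma^2*(1-alpha) =", sigma2 * (1 - alpha))
print("V_2, ..., V_d =", V[1:])           # decay to zero (roughly like alpha^t)
```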
The lemmas above are used to prove the following theorem regarding the convergence of the variances.
Theorem 1: For a magic square LDLC with degree d and generating sequence $h_1 \geq h_2 \geq \ldots \geq h_d > 0$, define $\alpha \triangleq \frac{\sum_{i=2}^{d} h_i^2}{h_1^2}$. Assume that α < 1. Then:
1) The first variance approaches a constant value of $\sigma^2(1-\alpha)$, where $\sigma^2$ is the channel noise variance: $V_1^{(\infty)} \triangleq \lim_{t\to\infty} V_1^{(t)} = \sigma^2(1-\alpha)$.
2) The other variances approach zero: $V_i^{(\infty)} \triangleq \lim_{t\to\infty} V_i^{(t)} = 0$ for $i = 2, 3, \ldots, d$.
3) The asymptotic convergence rate of all variances is exponential: $0 < \lim_{t\to\infty} \frac{V_i^{(t)} - V_i^{(\infty)}}{\alpha^t} < \infty$ for $i = 1, 2, \ldots, d$.
4) The zero approaching variances are upper bounded by the decaying exponential $\sigma^2 \alpha^t$: $V_i^{(t)} \leq \sigma^2 \alpha^t$ for $i = 2, 3, \ldots, d$ and $t \geq 0$.
Proof: See Appendix IV.
If α ≥ 1, the variances may still converge, but convergence rate may be as slow as o(1/t), as illustrated in Appendix IV.
Convergence of the variances to zero implies that the
Gaussians approach impulses. This is a desired property of
the decoder, since the exact PDF that we want to calculate is
indeed a weighted sum of impulses (see (1)). It can be seen that by designing a code with α < 1, i.e. $h_1^2 > \sum_{i=2}^{d} h_i^2$, one variance approaches a constant (and not zero). However, all the other variances approach zero, where all variances converge in an exponential rate. This will be the preferred mode because the information can be recovered even if a single variance does not decay to zero, where exponential convergence is certainly preferred over slow 1/t convergence. Therefore, from now on we shall restrict our analysis to magic square LDLC with α < 1.
Theorem 1 shows that every iteration, each variable node
will generate d−1 messages with variances that approach zero,
and a single message with variance that approaches a constant.
The message with nonzero variance will be transmitted along
the edge with largest weight (i.e. h1 ). However, from the
derivation of Appendix IV it can be seen that the opposite
happens for the check nodes: each check node will generate
d − 1 messages with variances that approach a constant, and
a single message with variance that approaches zero. The
check node message with zero approaching variance will be
transmitted along the edge with largest weight.
C. Convergence of the Mean Values
The reason that the messages are mixtures and not single
Gaussians lies in the periodic extension step (3) at the check
nodes, and every Gaussian at the output of this step can be
related to a single index of the infinite sum. Therefore, we can
label each Gaussian at iteration t with a list of all the indices
that were used in (3) during its creation process in iterations
1, 2, ...t.
Definition 4 (label of a Gaussian): The label of a Gaussian
consists of a sequence of triplets of the form {t, c, i}, where
t is an iteration index, c is a check node index and i is an
integer. The labels are initialized to the empty sequence. Then,
the labels are updated along each iteration according to the
following update rules:
1) In the periodic extension step (3), each Gaussian in
the output periodic mixture is assigned the label of the
specific Gaussian of pj (x) that generated it, concatenated with a single triplet {t, c, i}, where t is the current
iteration index, c is the check node index and i is the
index in the infinite sum of (3) that corresponds to this
Gaussian.
2) In the convolution step and the product step, each
Gaussian in the output mixture is assigned a label
that equals the concatenation of all the labels of the
specific Gaussians in the input messages that formed
this Gaussian.
3) The stretching and normalization steps do not alter
the label of each Gaussian: Each Gaussian in the
stretched/normalized mixture inherits the label of the
appropriate Gaussian in the original mixture.
Definition 5 (a consistent Gaussian): A Gaussian in a mixture is called “[ta , tb ] consistent” if its label contains no
contradictions for iterations ta to tb , i.e. for every pair of
triplets {t1 , c1 , i1 }, {t2 , c2 , i2 } such that ta ≤ t1 , t2 ≤ tb ,
if c1 = c2 then i1 = i2 . A [0, ∞] consistent Gaussian will be
simply called a consistent Gaussian.
We can relate every consistent Gaussian to a unique integer
vector b ∈ Zn , which holds the n integers used in the n check
nodes. Since in the periodic extension step (3) the sum is taken
over all integers, a consistent Gaussian exists in each variable
node message for every possible integer valued vector b ∈ Zn .
We shall see later that this consistent Gaussian corresponds to
the lattice point Gb.
According to Theorem 1, if we choose the nonzero values
of H such that α < 1, every variable node generates
d − 1 messages with variances approaching zero and a single
message with variance that approaches a constant. We shall
refer to these messages as “narrow” messages and “wide”
messages, respectively. For a given integer valued vector b, we
shall concentrate on the consistent Gaussians that relate to b in
all the nd variable node messages that are generated in each
iteration (a single Gaussian in each message). The following
lemmas summarize the asymptotic behavior of the mean values
of these consistent Gaussians for the narrow messages.
Lemma 5: For a magic square LDLC with degree d and
α < 1, consider the d − 1 narrow messages that are sent from
a specific variable node. Consider further a single Gaussian in
each message, which is the consistent Gaussian that relates to
a given integer vector b. Asymptotically, the mean values of
these d − 1 Gaussians become equal.
Proof: See Appendix V.
Lemma 6: For a magic square LDLC with dimension n, degree d and α < 1, consider only consistent Gaussians that relate to a given integer vector b and belong to narrow messages. Denote the common mean value of the d − 1 such Gaussians that are sent from variable node i at iteration t by $m_i^{(t)}$, and arrange all these mean values in a column vector $m^{(t)}$ of dimension n. Define the error vector $e^{(t)} \triangleq m^{(t)} - x$, where x = Gb is the lattice point that corresponds to b. Then, for large t, $e^{(t)}$ satisfies:
$$e^{(t+1)} \approx -\tilde{H} \cdot e^{(t)} \qquad (6)$$
where $\tilde{H}$ is derived from H by permuting the rows such that the $\pm h_1$ elements will be placed on the diagonal, dividing each row by the appropriate diagonal element ($h_1$ or $-h_1$), and then nullifying the diagonal.
Proof: See Appendix V.
We can now state the following theorem, which describes
the conditions for convergence and the steady state value of
the mean values of the consistent Gaussians of the narrow
variable node messages.
Theorem 2: For a magic square LDLC with α < 1, the
mean values of the consistent Gaussians of the narrow variable
node messages that relate to a given integer vector b are
assured to converge if and only if all the eigenvalues of H̃
have magnitude less than 1, where H̃ is defined in Lemma 6.
When this condition is fulfilled, the mean values converge to
the coordinates of the appropriate lattice point: m(∞) = G · b.
Proof: Immediate from Lemma 6.
Note that without adding random signs to the LDLC nonzero values, the all-ones vector will be an eigenvector of $\tilde{H}$ with eigenvalue $\sum_{i=2}^{d} \frac{h_i}{h_1}$, which may exceed 1.
h1
Interestingly, recursion (6) is also obeyed by the error of the
Jacobi method for solving systems of sparse linear equations
[22] (see also Section VIII-A), when it is used to solve Hm =
b (with solution m = Gb). Therefore, the LDLC decoder can
be viewed as a superposition of Jacobi solvers, one for each
possible value of the integer valued vector b.
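To make Lemma 6, Theorem 2 and the Jacobi analogy concrete, the sketch below (ours) constructs $\tilde{H}$ from the example H of Section II-C, evaluates the eigenvalue condition of Theorem 2, and runs the error recursion (6) from an arbitrary starting error:

```python
import numpy as np

# Build H~ of Lemma 6 from the example H of Section II-C and examine the
# convergence condition of Theorem 2 (our sketch).
H = np.array([
    [0,    -0.8,  0,   -0.5,  1,    0  ],
    [0.8,   0,    0,    1,    0,   -0.5],
    [0,     0.5,  1,    0,    0.8,  0  ],
    [0,     0,   -0.5, -0.8,  0,    1  ],
    [1,     0,    0,    0,    0.5,  0.8],
    [0.5,  -1,   -0.8,  0,    0,    0  ]])
n = H.shape[0]
h1 = np.max(np.abs(H))

# permute rows so that the +/-h1 element of every row lands on the diagonal,
# divide each row by its diagonal element, then nullify the diagonal
perm = [int(np.nonzero(np.abs(H[:, k]) == h1)[0][0]) for k in range(n)]
H_tilde = H[perm, :] / np.diag(H[perm, :])[:, None]
np.fill_diagonal(H_tilde, 0.0)

rho = np.max(np.abs(np.linalg.eigvals(H_tilde)))
print("spectral radius of H~:", rho)   # Theorem 2 requires this to be < 1

# Recursion (6): the mean-value error of the narrow messages; it contracts to
# zero whenever the spectral radius above is below 1 (Jacobi-like behavior).
e = np.ones(n)
for _ in range(50):
    e = -H_tilde @ e
print("error norm after 50 iterations:", np.linalg.norm(e))
```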
We shall now turn to the convergence of the mean values
of the wide messages. The asymptotic behavior is summarized
in the following lemma.
Lemma 7: For a magic square LDLC with dimension n and α < 1, consider only consistent Gaussians that relate to a given integer vector b and belong to wide messages. Denote the mean value of such a Gaussian that is sent from variable node i at iteration t by $m_i^{(t)}$, and arrange all these mean values in a column vector $m^{(t)}$ of dimension n. Define the error vector $e^{(t)} \triangleq m^{(t)} - Gb$. Then, for large t, $e^{(t)}$ satisfies:
$$e^{(t+1)} \approx -F \cdot e^{(t)} + (1-\alpha)(y - Gb) \qquad (7)$$
where y is the noisy codeword and F is an n × n matrix defined by:
$$F_{k,l} = \begin{cases} \frac{H_{r,k}}{H_{r,l}} & \text{if } k \neq l \text{ and there exists a row } r \text{ of } H \text{ for which } |H_{r,l}| = h_1 \text{ and } H_{r,k} \neq 0 \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$
Proof: See Appendix V, where an alternative way to
construct F from H is also presented.
The conditions for convergence and steady state solution for
the wide messages are described in the following theorem.
Theorem 3: For a magic square LDLC with α < 1, the
mean values of the consistent Gaussians of the wide variable
node messages that relate to a given integer vector b are
assured to converge if and only if all the eigenvalues of F
have magnitude less than 1, where F is defined in Lemma
7. When this condition is fulfilled, the steady state solution is
m(∞) = G · b + (1 − α)(I + F )−1 (y − G · b).
Proof: Immediate from Lemma 7.
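The matrix F of (8) can likewise be built directly from H; the following sketch (ours) does so for the example matrix and evaluates the convergence condition of Theorem 3:

```python
import numpy as np

def build_F(H):
    """Construct F per equation (8): F[k,l] = H[r,k] / H[r,l], where r is the
    row in which column l holds a +/-h1 element (our sketch)."""
    n = H.shape[0]
    h1 = np.max(np.abs(H))
    F = np.zeros((n, n))
    for l in range(n):
        r = int(np.nonzero(np.isclose(np.abs(H[:, l]), h1))[0][0])
        for k in range(n):
            if k != l and H[r, k] != 0:
                F[k, l] = H[r, k] / H[r, l]
    return F

H = np.array([                     # example H from Section II-C
    [0,    -0.8,  0,   -0.5,  1,    0  ],
    [0.8,   0,    0,    1,    0,   -0.5],
    [0,     0.5,  1,    0,    0.8,  0  ],
    [0,     0,   -0.5, -0.8,  0,    1  ],
    [1,     0,    0,    0,    0.5,  0.8],
    [0.5,  -1,   -0.8,  0,    0,    0  ]])
F = build_F(H)
print("spectral radius of F:", np.max(np.abs(np.linalg.eigvals(F))))
# Theorem 3: the wide-message means converge iff this radius is below 1, with
# steady state m = G b + (1 - alpha) * inv(I + F) @ (y - G b).
```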
Unlike the narrow messages, the mean values of the wide
messages do not converge to the appropriate lattice point
coordinates. The steady state error depends on the difference
between the noisy observation and the lattice point, as well
as on α, and it decreases to zero as α → 1. Note that
the final PDF of a variable is generated by multiplying all
the d check node messages that arrive to the appropriate
variable node. d − 1 of these messages are wide, and therefore
their mean values have a steady state error. One message
is narrow, so it converges to an impulse at the lattice point
coordinate. Therefore, the final product will be an impulse at
the correct location, where the wide messages will only affect
the magnitude of this impulse. As long as the mean values
errors are not too large (relative to the width of the wide
messages), this should not cause an impulse that corresponds
to a wrong lattice point to have larger amplitude than the
correct one. However, for large noise, these steady state errors
may cause the decoder to deviate from the ML solution (As
explained in Section IV-D).
To summarize the results for the mean values, we considered
the mean values of all the consistent Gaussians that correspond
to a given integer vector b. A single Gaussian of this form
exists in each of the nd variable node messages that are
generated in each iteration. For each variable node, d − 1
messages are narrow (have variance that approaches zero) and
a single message is wide (variance approaches a constant).
Under certain conditions on H, the mean values of all the
narrow messages converge to the appropriate coordinate of
the lattice point Gb. Under additional conditions on H, the
mean values of the wide messages converge, but the steady
state values contain an error term.
We analyzed the behavior of consistent Gaussians. It should be noted that there are many more non-consistent Gaussians. Furthermore, non-consistent Gaussians are generated in each iteration for any existing consistent Gaussian. We conjecture that unless a Gaussian is consistent, or becomes consistent along the iterations, it fades out, at least at noise conditions where the algorithm converges. The reason is that non-consistency in the integer values leads to mismatch in the corresponding PDF's, and so the amplitude of that Gaussian is attenuated.
We considered consistent Gaussians which correspond to a
specific integer vector b, but such a set of Gaussians exists
for every possible choice of b, i.e. for every lattice point.
Therefore, the narrow messages will converge to a solution that
has an impulse at the appropriate coordinate of every lattice
point. This resembles the exact solution (1), so the key for
proper convergence lies in the amplitudes: we would like the
consistent Gaussians of the ML lattice point to have the largest
amplitude for each message.
D. Convergence of the Amplitudes
We shall now analyze the behavior of the amplitudes of
consistent Gaussians (as discussed later, this is not enough for
complete convergence analysis, but it certainly gives insight
to the nature of the convergence process and its properties).
The behavior of the amplitudes of consistent Gaussians is
summarized in the following lemma.
Lemma 8: For a magic square LDLC with dimension n,
degree d and α < 1, consider the nd consistent Gaussians
that relate to a given integer vector b in the variable node
messages that are sent at iteration t (one consistent Gaussian
per message). Denote the amplitudes of these Gaussians by $p_i^{(t)}$, $i = 1, 2, \ldots, nd$, and define the log-amplitudes as $l_i^{(t)} = \log p_i^{(t)}$. Arrange these nd log-amplitudes in a column vector $l^{(t)}$, such that element $(k-1)d + i$ corresponds to the message that is sent from variable node k along an edge with weight $\pm h_i$. Assume further that the bipartite graph of the LDLC contains no 4-loops. Then, the log-amplitudes satisfy the following recursion:
$$l^{(t+1)} = A \cdot l^{(t)} - a^{(t)} - c^{(t)} \qquad (9)$$
with initialization $l^{(0)} = 0$. A is an $nd \times nd$ matrix which is all zeros except exactly $(d-1)^2$ '1's in each row and each column. The element of the excitation vector $a^{(t)}$ at location $(k-1)d + i$ (where $k = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, d$) equals:
$$a_{(k-1)d+i}^{(t)} = \frac{\hat{V}_{k,i}^{(t)}}{2} \left[ \sum_{\substack{l=1 \\ l \neq i}}^{d} \sum_{\substack{j=l+1 \\ j \neq i}}^{d} \frac{\left(\tilde{m}_{k,l}^{(t)} - \tilde{m}_{k,j}^{(t)}\right)^2}{\tilde{V}_{k,l}^{(t)} \cdot \tilde{V}_{k,j}^{(t)}} + \sum_{\substack{l=1 \\ l \neq i}}^{d} \frac{\left(\tilde{m}_{k,l}^{(t)} - y_k\right)^2}{\sigma^2 \cdot \tilde{V}_{k,l}^{(t)}} \right] \qquad (10)$$
where $\tilde{m}_{k,l}^{(t)}$ and $\tilde{V}_{k,l}^{(t)}$ denote the mean value and variance of the consistent Gaussian that relates to the integer vector b in the check node message that arrives to variable node k at iteration t along an edge with weight $\pm h_l$, $y_k$ is the noisy channel observation of variable node k, and $\hat{V}_{k,i}^{(t)} \triangleq \left(\frac{1}{\sigma^2} + \sum_{\substack{l=1 \\ l \neq i}}^{d} \frac{1}{\tilde{V}_{k,l}^{(t)}}\right)^{-1}$. Finally, $c^{(t)}$ is a constant excitation term that is independent of the integer vector b (i.e. is the same for all consistent Gaussians). Note that an iteration is defined as sending variable node messages, followed by sending check node messages. The first iteration (where the variable nodes send the initialization PDF) is regarded as iteration 0.
Proof: At the check node, the amplitude of a Gaussian
at the convolution output is the product of the amplitudes of
the corresponding Gaussians in the appropriate variable node
messages. At the variable node, the amplitude of a Gaussian
at the product output is the product of the amplitudes of
the corresponding Gaussians in the appropriate check node
messages, multiplied by the Gaussian scaling term, according
to claim 2. Since we assume that the bipartite graph of the
LDLC contains no 4-loops, an amplitude of a variable node
message at iteration t will therefore equal the product of
(d − 1)2 amplitudes of Gaussians of variable node messages
from iteration t − 1, multiplied by the Gaussian scaling term.
This proves (9) and shows that A has (d−1)2 ’1’s in every row.
However, since each variable node message affects exactly
(d − 1)2 variable node messages of the next iteration, A must
also have (d − 1)2 ’1’s in every column. The total excitation
term −a(t) −c(t) corresponds to the logarithm of the Gaussian
scaling term. Each element of this scaling term results from
the product of d − 1 check node Gaussians and the channel
Gaussian, according to claim 2. This scaling term sums over
all the pairs of Gaussians, and in (10) the sum is separated to
pairs that include the channel Gaussian and pairs that do not.
The total excitation is divided between (10), which depends
on the choice of the integer vector b, and c(t) , which includes
all the constant terms that are independent on b (including the
normalization operation which is performed at the variable
node).
Since there are exactly (d − 1)2 ’1’s in each column of
the matrix A, it is easy to see that the all-ones vector is an
eigenvector of A, with eigenvalue (d − 1)2 . If d > 2, this
eigenvalue is larger than 1, meaning that the recursion (9) is
non-stable.
It can be seen that the excitation term a(t) has two components. The first term sums the squared differences between the
mean values of all the possible pairs of received check node
messages (weighted by the inverse product of the appropriate
variances). It therefore measures the mismatch between the
incoming messages. This mismatch will be small if the mean
values of the consistent Gaussians converge to the coordinates
of a lattice point (any lattice point). The second term sums the
squared differences between the mean values of the incoming
messages and the noisy channel output yk . This term measures
the mismatch between the incoming messages and the channel
measurement. It will be smallest if the mean values of the
consistent Gaussians converge to the coordinates of the ML
lattice point.
The following lemma summarizes some properties of the
excitation term a(t) .
Lemma 9: For a magic square LDLC with dimension n,
degree d, α < 1 and no 4-loops, consider the consistent
Gaussians that correspond to a given integer vector b. According to Lemma 8, their amplitudes satisfy recursion (9). The
excitation term a(t) of (9), which is defined by (10), satisfies
the following properties:
1) $a_i^{(t)}$, the i'th element of $a^{(t)}$, is non-negative, finite and bounded for every i and every t. Moreover, $a_i^{(t)}$ converges to a finite non-negative steady state value as $t \to \infty$.
2) $\lim_{t\to\infty} \sum_{i=1}^{nd} a_i^{(t)} = \frac{1}{2\sigma^2}(Gb - y)^T W (Gb - y)$, where y is the noisy received codeword and W is a positive definite matrix defined by:
$$W \triangleq (d + 1 - \alpha)I - 2(1 - \alpha)(I + F)^{-1} + (1 - \alpha)\left[(I + F)^{-1}\right]^T \left[(d-1)^2 I - F^T F\right](I + F)^{-1} \qquad (11)$$
where F is defined in Lemma 7.
3) For an LDLC with degree d > 2, the weighted infinite sum $\sum_{j=0}^{\infty} \frac{\sum_{i=1}^{nd} a_i^{(j)}}{(d-1)^{2j+2}}$ converges to a finite value.
Proof: See Appendix VI.
The following theorem addresses the question of which consistent Gaussian will have the maximal asymptotic amplitude.
We shall first consider the case of an LDLC with degree d > 2,
and then consider the special case of d = 2 in a separate
theorem.
Theorem 4: For a magic square LDLC with dimension n, degree d > 2, α < 1 and no 4-loops, consider the nd consistent Gaussians that relate to a given integer vector b in the variable node messages that are sent at iteration t (one consistent Gaussian per message). Denote the amplitudes of these Gaussians by $p_i^{(t)}$, $i = 1, 2, \ldots, nd$, and define the product-of-amplitudes as $P^{(t)} \triangleq \prod_{i=1}^{nd} p_i^{(t)}$. Define further $S \triangleq \sum_{j=0}^{\infty} \frac{\sum_{i=1}^{nd} a_i^{(j)}}{(d-1)^{2j+2}}$, where $a_i^{(j)}$ is defined by (10) (S is well defined according to Lemma 9). Then:
1) The integer vector b for which the consistent Gaussians will have the largest asymptotic product-of-amplitudes $\lim_{t\to\infty} P^{(t)}$ is the one for which S is minimized.
2) The product-of-amplitudes for the consistent Gaussians that correspond to all other integer vectors will decay to zero in a super-exponential rate.
Proof: As in Lemma 8, define the log-amplitudes $l_i^{(t)} \triangleq \log p_i^{(t)}$. Define further $s^{(t)} \triangleq \sum_{i=1}^{nd} l_i^{(t)}$. Taking the element-wise sum of (9), we get:
$$s^{(t+1)} = (d-1)^2 s^{(t)} - \sum_{i=1}^{nd} a_i^{(t)} \qquad (12)$$
with initialization $s^{(0)} = 0$. Note that we ignored the term $\sum_{i=1}^{nd} c_i^{(t)}$. As shown below, we are looking for the vector b that maximizes $s^{(t)}$. Since (12) is a linear difference equation, and the term $\sum_{i=1}^{nd} c_i^{(t)}$ is independent of b, its effect on $s^{(t)}$ is common to all b and is therefore not interesting.
Define now $\tilde{s}^{(t)} \triangleq \frac{s^{(t)}}{(d-1)^{2t}}$. Substituting in (12), we get:
$$\tilde{s}^{(t+1)} = \tilde{s}^{(t)} - \frac{1}{(d-1)^{2t+2}} \sum_{i=1}^{nd} a_i^{(t)} \qquad (13)$$
with initialization $\tilde{s}^{(0)} = 0$, which can be solved to get:
$$\tilde{s}^{(t)} = -\sum_{j=0}^{t-1} \frac{\sum_{i=1}^{nd} a_i^{(j)}}{(d-1)^{2j+2}} \qquad (14)$$
We would now like to compare the amplitudes of consistent Gaussians with various values of the corresponding integer vector b in order to find the lattice point whose consistent Gaussians will have largest product-of-amplitudes. From the definitions of $s^{(t)}$ and $\tilde{s}^{(t)}$ we then have:
$$P^{(t)} = e^{s^{(t)}} = e^{(d-1)^{2t} \cdot \tilde{s}^{(t)}} \qquad (15)$$
Consider two integer vectors b that relate to two lattice points. Denote the corresponding product-of-amplitudes by $P_0^{(t)}$ and $P_1^{(t)}$, respectively, and assume that for these two vectors S converges to the values $S_0$ and $S_1$, respectively. Then, taking into account that $\lim_{t\to\infty} \tilde{s}^{(t)} = -S$, the asymptotic ratio of the product-of-amplitudes for these lattice points will be:
$$\lim_{t\to\infty} \frac{P_1^{(t)}}{P_0^{(t)}} = \frac{e^{-(d-1)^{2t} \cdot S_1}}{e^{-(d-1)^{2t} \cdot S_0}} = e^{(d-1)^{2t} \cdot (S_0 - S_1)} \qquad (16)$$
It can be seen that if $S_0 < S_1$, the ratio decreases to zero in a super-exponential rate. This shows that the lattice point for which S is minimized will have the largest product-of-amplitudes, where for all other lattice points, the product-of-amplitudes will decay to zero in a super-exponential rate (recall that the normalization operation at the variable node keeps the sum of all amplitudes in a message to be 1). This completes the proof of the theorem.
We now have to find which integer valued vector b minimizes S. The analysis is difficult because the weighting factor
inside the sum of (14) performs exponential weighting of
the excitation terms, where the dominant terms are those of
the first iterations. Therefore, we can not use the asymptotic
results of Lemma 9, but have to analyze the transient behavior.
However, the analysis is simpler for the case of an LDLC with
row and column degree of d = 2, so we shall first turn to this
simple case (note that for this case, both the convolution in
the check nodes and the product at the variable nodes involve
only a single message).
Theorem 5: For a magic square LDLC with dimension n, degree d = 2, α < 1 and no 4-loops, consider the 2n consistent Gaussians that relate to a given integer vector b in the variable node messages that are sent at iteration t (one consistent Gaussian per message). Denote the amplitudes of these Gaussians by $p_i^{(t)}$, $i = 1, 2, \ldots, 2n$, and define the product-of-amplitudes as $P^{(t)} \triangleq \prod_{i=1}^{2n} p_i^{(t)}$. Then:
1) The integer vector b for which the consistent Gaussians will have the largest asymptotic product-of-amplitudes $\lim_{t\to\infty} P^{(t)}$ is the one for which $(Gb - y)^T W (Gb - y)$ is minimized, where W is defined by (11) and y is the noisy received codeword.
2) The product-of-amplitudes for the consistent Gaussians
that correspond to all other integer vectors will decay to
zero in an exponential rate.
Proof: For d = 2, (12) becomes:
$$s^{(t+1)} = s^{(t)} - \sum_{i=1}^{2n} a_i^{(t)} \qquad (17)$$
with solution:
$$s^{(t)} = -\sum_{j=0}^{t-1} \sum_{i=1}^{2n} a_i^{(j)} \qquad (18)$$
Denote $S_a = \lim_{j\to\infty} \sum_{i=1}^{2n} a_i^{(j)}$. $S_a$ is well defined according to Lemma 9. For large t, we then have $s^{(t)} \approx -t \cdot S_a$. Therefore, for two lattice points with excitation sum terms which approach $S_{a_0}$, $S_{a_1}$, respectively, the ratio of the corresponding product-of-amplitudes will approach
$$\lim_{t\to\infty} \frac{P_1^{(t)}}{P_0^{(t)}} = \frac{e^{-S_{a_1} \cdot t}}{e^{-S_{a_0} \cdot t}} = e^{(S_{a_0} - S_{a_1}) \cdot t} \qquad (19)$$
If Sa0 < Sa1 , the ratio decreases to zero exponentially (unlike
the case of d > 2 where the rate was super-exponential,
as in (16)). This shows that the lattice point for which
Sa is minimized will have the largest product-of-amplitudes,
where for all other lattice points, the product-of-amplitudes
will decay to zero in an exponential rate (recall that the
normalization operation at the variable node keeps the sum
of all amplitudes in a message to be 1). This completes the
proof of the second part of the theorem.
We still have to find the vector b that minimizes Sa . The
basic difference between the case of d = 2 and the case of
d > 2 is that for d > 2 we need to analyze the transient
behavior of the excitation terms, where for d = 2 we only
need to analyze the asymptotic behavior, which is much easier
to handle.
According to Lemma 9, we have:
$$S_a \triangleq \lim_{j\to\infty} \sum_{i=1}^{2n} a_i^{(j)} = \frac{1}{2\sigma^2}(Gb - y)^T W (Gb - y) \qquad (20)$$
where W is defined by (11) and y is the noisy received
codeword. Therefore, for d = 2, the lattice points whose
consistent Gaussians will have largest product-of-amplitudes
is the point for which (Gb − y)T W (Gb − y) is minimized.
This completes the proof of the theorem.
For d = 2 we could find an explicit expression for the
“winning” lattice point. As discussed above, we could not find
an explicit expression for d > 2, since the result depends on
the transient behavior of the excitation sum term, and not only
on the steady state value. However, a reasonable conjecture is
to assume that b that maximizes the steady state excitation
will also maximize the term that depends on the transient
behavior. This means that a reasonable conjecture is to assume
that the “winning” lattice point for d > 2 will also minimize
an expression of the form (20).
Note that for d > 2 we can still show that for “weak” noise, the ML point will have the minimal S. To see that, it comes out from (10) that for zero noise, the ML lattice point will have $a_i^{(t)} = 0$ for every t and i, where all other lattice points will have $a_i^{(t)} > 0$ for at least some i and t. Therefore, the ML point will have a minimal excitation term along the transient behavior so it will surely have the minimal S and the best product-of-amplitudes. As the noise increases, it is difficult to analyze the transient behavior of $a_i^{(t)}$, as discussed above.
Note that the ML solution minimizes (Gb − y)T (Gb −
y), where the above analysis yields minimization of (Gb −
y)T W (Gb − y). Obviously, for zero noise (i.e. y = G · b)
both minimizations will give the correct solution with zero
score. As the noise increases, the solutions may deviate from
one another. Therefore, both minimizations will give the same
solution for “weak” noise but may give different solutions for
“strong” noise.
An example of another decoder that performs this form of minimization is the linear detector, which calculates $\hat{b} = \lfloor H \cdot y \rceil$ (where $\lfloor x \rceil$ denotes the nearest integer to x). This is equivalent to minimizing $(Gb - y)^T W (Gb - y)$ with $W = H^T H = (G^{-1})^T G^{-1}$. The linear detector fails to yield the ML solution if the noise is too strong, due to its inherent noise amplification.
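For reference, the linear detector mentioned above amounts to one line of code (a sketch with arbitrary example values):

```python
import numpy as np

# The linear detector b_hat = round(H*y) discussed above (our sketch).
def linear_detect(H, y):
    return np.round(H @ y).astype(int)

H = np.array([[1.0, 0.3],      # arbitrary small example parity check matrix
              [0.2, 1.1]])
G = np.linalg.inv(H)
b = np.array([3, -2])
y = G @ b + 0.02 * np.random.randn(2)   # weak noise
print(linear_detect(H, y))              # recovers b when the noise is weak enough
```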
For the LDLC iterative decoder, we would like that the
deviation from the ML decoder due to the W matrix would
be negligible in the expected range of noise variance. Experimental results (see Section IX) show that the iterative decoder
indeed converges to the ML solution for noise variance values
that approach channel capacity. However, for quantization or
shaping applications (see Section VIII-B), where the effective
noise is uniformly distributed along the Voronoi cell of the
lattice (and is much stronger than the noise variance at channel
capacity) the iterative decoder fails, and this can be explained
by the influence of the W matrix on the minimization, as
described above. Note from (11) that as α → 1, W approaches
a scaled identity matrix, which means that the minimization
criterion approaches the ML criterion. However, the variances
converge as αt , so as α → 1 convergence time approaches
infinity.
Until this point, we concentrated only on consistent Gaussians, and checked what lattice point maximizes the product-of-amplitudes of all the corresponding consistent Gaussians. However, this approach does not necessarily lead to the lattice point that will be finally chosen by the decoder, due to 3 main reasons:
1) It comes out experimentally that the strongest Gaussian
in each message is not necessarily a consistent Gaussian,
but a Gaussian that started as non-consistent and became
consistent at a certain iteration. Such a Gaussian will
finally converge to the appropriate lattice point, since
the convergence of the mean values is independent
of initial conditions. The non-consistency at the first
several iterations, where the mean values are still very
noisy, allows these Gaussians to accumulate stronger
amplitudes than the consistent Gaussians (recall that the
exponential weighting in (14) for d > 2 results in strong
dependency on the behavior at the first iterations).
2) There is an exponential number of Gaussians that start
as non-consistent and become consistent (with the same
integer vector b) at a certain iteration, and the final amplitude of the Gaussians at the lattice point coordinates
will be determined by the sum of all these Gaussians.
3) We ignored non-consistent Gaussians that endlessly remain non-consistent. We have not shown it analytically,
but it is reasonable to assume that the excitation terms
for such Gaussians will be weaker than for Gaussians
that become consistent at some point, so their amplitude
will fade away to zero. However, non-consistent Gaussians are born every iteration, even at steady state. The
“newly-born” non-consistent Gaussians may appear as
sidelobes to the main impulse, since it may take several
iterations until they are attenuated. Proper choice of the
coefficients of H may minimize this effect, as discussed
in Sections III-A and V-A. However, these Gaussians
may be a problem for small d (e.g. d = 2) where
the product step at the variable node does not include
enough messages to suppress them.
Note that the first two issues are not a problem for d = 2,
where the winning lattice point depends only on the asymptotic
behavior. The amplitude of a sum of Gaussians that converged
to the same coordinates will still be governed by (18) and the
winning lattice point will still minimize (20). The third issue
is a problem for small d, but less problematic for large d, as
described above.
As a result, we cannot regard the convergence analysis of the consistent Gaussians' amplitudes as a complete convergence analysis. However, it can certainly be used as a qualitative analysis that gives insight into the convergence process. Two main observations are:
1) The narrow variable node messages tend to converge
to single impulses at the coordinates of a single lattice
point. This results from (16), (19), which show that
the “non-winning” consistent Gaussians will have amplitudes that decrease to zero relative to the amplitude of
the “winning” consistent Gaussian. This result remains
valid for the sum of non-consistent Gaussians that became consistent at a certain point, because it results from
the non-stable nature of the recursion (9), which makes
strong Gaussians stronger in an exponential manner.
The single impulse might be accompanied by weak
“sidelobes” due to newly-born non-consistent Gaussians.
Interestingly, this form of solution is different from
the exact solution (1), where every lattice point is
represented by an impulse at the appropriate coordinate,
with amplitude that depends on the Euclidean distance
of the lattice point from the observation. The iterative
decoder’s solution has a single impulse that corresponds
to a single lattice point, where all other impulses have
amplitudes that decay to zero. This should not be a
problem, as long as the ML point is the remaining point
(see discussion above).
2) We have shown that for d = 2 the strongest consistent
Gaussians relate to b that minimizes an expression of the
form (Gb − y)T W (Gb − y). We proposed a conjecture
that this is also true for d > 2. We can further widen the
conjecture to say that the finally decoded b (and not only
the b that relates to strongest consistent Gaussians) minimizes such an expression. Such a conjecture can explain
why the iterative decoder works well for decoding near
channel capacity, but fails for quantization or shaping,
where the effective noise variance is much larger.
E. Summary of Convergence Results
To summarize the convergence analysis, it was first shown
that the variable node messages are Gaussian mixtures. Therefore, it is sufficient to analyze the sequences of variances,
mean values and relative amplitudes of the Gaussians in each
mixture. Starting with the variances, it was shown that with
proper choice of the magic square LDLC generating sequence,
each variable node generates d − 1 “narrow” messages, whose
variance decreases exponentially to zero, and a single “wide”
message, whose variance reaches a finite value. Consistent
Gaussians were then defined as Gaussians whose generation
process always involved the same integer at the same check
node. Consistent Gaussians can then be related to an integer
vector b or equivalently to the lattice point Gb. It was then
shown that under appropriate conditions on H, the mean
values of consistent Gaussians that belong to narrow messages
converge to the coordinates of the appropriate lattice point.
The mean values of wide messages also converge to these
coordinates, but with a steady state error. Then, the amplitudes
of consistent Gaussians were analyzed. For d = 2 it was
shown that the consistent Gaussians with maximal productof-amplitudes (over all messages) are those that correspond
to an integer vector b that minimizes (Gb − y)^T W (Gb − y),
where W is a positive definite matrix that depends only on H.
The product-of-amplitudes for all other consistent Gaussians
decays to zero. For d > 2 the analysis is complex and
depends on the transient behavior of the mean values and
variances (and not only on their steady state values), but
a reasonable conjecture is to assume that a same form of
criterion is also minimized for d > 2. The result is different
from the ML lattice point, which minimizes ‖G · b − y‖²,
where both criteria give the same point for weak noise but
may give different solutions for strong noise. This may explain
the experiments where the iterative decoder is successful in
decoding the ML point for the AWGN channel near channel
capacity, but fails in quantization or shaping applications
where the effective noise is much stronger. These results also
show that the iterative decoder converges to impulses at the
coordinates of a single lattice point. It was then explained
that analyzing the amplitudes of consistent Gaussians is not
sufficient, so these results can not be regarded as a complete
convergence analysis. However, the analysis gave a set of
necessary conditions on H, and also led to useful insights
to the convergence process.
V. C ODE D ESIGN
A. Choosing the Generating Sequence
We shall concentrate on magic square LDLC, since they
have inherent diversity of the nonzero elements in each
row and column, which was shown above to be beneficial.
It still remains to choose the LDLC generating sequence
h1 , h2 , ...hd . Assume that the algorithm converged, and
each PDF has a peak at the desired value. When the
periodic functions are multiplied at a variable node, the
correct peaks will then align. We would like all the other peaks to be strongly attenuated, i.e. there should be no other point where the peaks align. This resembles the
definition of the least common multiple (LCM) of integers:
if the periods were integers, we would like to have their
LCM as large as possible. This argument suggests the
sequence {1/2, 1/3, 1/5, 1/7, 1/11, 1/13, 1/17, ...}, i.e. the
reciprocals of the smallest d prime numbers. Since the periods
are 1/h1 , 1/h2 , ...1/hd , we will get the desired property.
Simulations have shown that increasing d beyond 7 with this
choice gave negligible improvement. Also, performance was
improved by adding some “dither” to the sequence, resulting in
{1/2.31, 1/3.17, 1/5.11, 1/7.33, 1/11.71, 1/13.11, 1/17.55}.
For d < 7, the first d elements are used.
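The following small Python helper (an illustration, not code from the paper) builds this dithered prime-reciprocal sequence and evaluates α = Σ_{i≥2} h_i²/h_1², reproducing the values α ≈ 0.92 and 0.87 quoted in Section V-B for d = 7 and d = 5:

def generating_sequence(d):
    # dither values are the ones quoted in the text (up to d = 7)
    dithered = [2.31, 3.17, 5.11, 7.33, 11.71, 13.11, 17.55]
    assert d <= len(dithered), "the text only specifies dither up to d = 7"
    return [1.0 / p for p in dithered[:d]]

def alpha(h):
    # alpha = sum_{i>=2} h_i^2 / h_1^2 (h is sorted in decreasing order)
    return sum(x * x for x in h[1:]) / (h[0] * h[0])

for d in (5, 7):
    h = generating_sequence(d)
    print(d, [round(x, 4) for x in h], round(alpha(h), 3))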
An alternative approach is a sequence of the form {1, ε, ε, ..., ε}, where ε ≪ 1. For this case, every variable node will receive a single message with period 1 and d − 1 messages with period 1/ε. For small ε, the period of these d − 1 messages will be large and multiplication by the channel Gaussian will attenuate all the unwanted replicas. The single remaining replica will attenuate all the unwanted replicas of the message with period 1. A convenient choice is ε = 1/√d, which ensures that α = (d − 1)/d < 1, as required by Theorem 1. As an example, for d = 7 the sequence will be {1, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7}.
B. Necessary Conditions on H

The magic square LDLC definition and convergence analysis imply four necessary conditions on H:
1) |det(H)| = 1. This condition is part of the LDLC definition, which ensures proper density of the lattice points in R^n. If |det(H)| ≠ 1, it can be easily normalized by dividing H by ⁿ√|det(H)|. Note that practically we can allow |det(H)| ≠ 1 as long as ⁿ√|det(H)| ≈ 1, since ⁿ√|det(H)| is the gain factor of the transmitted codeword. For example, if n = 1000, having |det(H)| = 0.01 is acceptable, since we have ⁿ√|det(H)| = 0.995, which means that the codeword has to be further amplified by 20 · log₁₀(0.995) = 0.04 dB, which is negligible.
Note that normalizing H is applicable only if H is non-singular. If H is singular, a row and a column should be sequentially omitted until H becomes full rank. This process may result in slightly reducing n and in slightly different row and column degrees than originally planned.
2) α < 1, where α ≜ (Σ_{i=2}^{d} h_i²) / h_1². This guarantees exponential convergence rate for the variances (Theorem 1). Choosing a smaller α results in faster convergence, but we should not take α too small since the steady state variance of the wide variable node messages, as well as the steady state error of the mean values of these messages, increases when α decreases, as discussed in Section IV-C.
This may result in deviation of the decoded codeword from the ML codeword, as discussed in Section IV-D. For the first LDLC generating sequence of the previous subsection, we have α = 0.92 and 0.87 for d = 7 and 5, respectively, which is a reasonable trade off. For the second sequence type we have α = (d − 1)/d.
3) All the eigenvalues of H̃ must have magnitude less than 1, where H̃ is defined in Theorem 2. This is a necessary condition for convergence of the mean values of the narrow messages. Note that adding random signs to the nonzero H elements is essential to fulfill this necessary condition, as explained in Section IV-C.
4) All the eigenvalues of F must have magnitude less than 1, where F is defined in Theorem 3. This is a necessary condition for convergence of the mean values of the wide messages.
Interestingly, it comes out experimentally that for large codeword length n and relatively small degree d (e.g. n ≥ 1000 and d ≤ 10), a magic square LDLC with generating sequence that satisfies h1 = 1 and α < 1 results in H that satisfies all these four conditions: H is nonsingular without any need to omit rows and columns, ⁿ√|det(H)| ≈ 1 without any need for normalization, and all eigenvalues of H̃ and F have magnitude less than 1 (typically, the largest eigenvalue of H̃ or F has magnitude of 0.94 − 0.97, almost independently of n and the choice of nonzero H locations). Therefore, by simply dividing the first generating sequence of the previous subsection by its first element, the constructed H meets all the necessary conditions, where the second type of sequence meets the conditions without any need for modifications.
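As an illustrative aid (not part of the paper), the Python sketch below checks conditions 1)–3) for a given H; condition 4) can be verified in the same way once the matrix F of Theorem 3 is formed. The construction of H̃ follows its definition in Section IV-C, under the assumption that the ±h1 entry is the strictly largest magnitude in every column:

import numpy as np

def check_H(H, gen_seq):
    """Sketch: report the determinant gain, alpha, and the spectral radius of H~."""
    n = H.shape[0]
    det_gain = abs(np.linalg.det(H)) ** (1.0 / n)       # condition 1: should be ~1
    h = sorted((abs(x) for x in gen_seq), reverse=True)
    a = sum(x * x for x in h[1:]) / (h[0] * h[0])        # condition 2: alpha < 1
    # H~: permute rows so the +-h1 entries sit on the diagonal,
    # divide each row by its diagonal entry, then zero the diagonal.
    perm = np.argmax(np.abs(H), axis=0)                  # row holding +-h1 of each column
    Ht = H[perm, :] / np.diag(H[perm, :])[:, None]
    np.fill_diagonal(Ht, 0.0)
    rho = max(abs(np.linalg.eigvals(Ht)))                # condition 3: should be < 1
    return det_gain, a, rho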
C. Construction of H for Magic Square LDLC
We shall now present a simple algorithm for constructing a
parity check matrix for a magic square LDLC. If we look at
the bipartite graph, each variable node and each check node
has d edges connected to it, one with every possible weight
h1 , h2 , ...hd . All the edges that have the same weight hj form
a permutation from the variable nodes to the check nodes
(or vice versa). The proposed algorithm generates d random
permutations and then searches sequentially and cyclically for
2-loops (two parallel edges from a variable node to a check
node) and 4-loops (two variable nodes that both are connected
to a pair of check nodes). When such a loop is found, a pair
is swapped in one of the permutations such that the loop is
removed. A detailed pseudo-code for this algorithm is given
in Appendix VII.
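The following Python sketch is a simplified variant of this construction (the authoritative pseudo-code is the one in Appendix VII): it draws d random permutations, repeatedly swaps permutation entries to break the 2-loops and 4-loops it finds (using a plain quadratic search for illustration), and finally fills H with the weights h1, ..., hd and random signs:

import numpy as np

def make_H(n, gen_seq, passes=20, seed=0):
    rng = np.random.default_rng(seed)
    d = len(gen_seq)
    # perms[j][k] = check node reached by the weight-h_{j+1} edge of variable node k
    perms = [rng.permutation(n) for _ in range(d)]

    def shared_checks(k1, k2):
        a = {perms[j][k1] for j in range(d)}
        b = {perms[j][k2] for j in range(d)}
        return len(a & b)

    for _ in range(passes):
        clean = True
        for k in range(n):                                # 2-loops: repeated check node
            checks = [perms[j][k] for j in range(d)]
            for j in range(1, d):
                if checks[j] in checks[:j]:
                    k2 = int(rng.integers(n))
                    perms[j][k], perms[j][k2] = perms[j][k2], perms[j][k]
                    clean = False
        for k1 in range(n):                               # 4-loops: two shared check nodes
            for k2 in range(k1 + 1, n):
                if shared_checks(k1, k2) >= 2:
                    j = int(rng.integers(1, d))
                    k3 = int(rng.integers(n))
                    perms[j][k1], perms[j][k3] = perms[j][k3], perms[j][k1]
                    clean = False
        if clean:
            break

    H = np.zeros((n, n))
    for j, h in enumerate(gen_seq):
        signs = rng.choice([-1.0, 1.0], size=n)           # random signs (see Section V-B)
        H[perms[j], np.arange(n)] += signs * h
    return H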
VI. D ECODER I MPLEMENTATION
Each PDF should be approximated with a discrete vector
with resolution ∆ and finite range. According to the Gaussian
Q-function, choosing a range of, say, 6σ to both sides of the
noisy channel observation will ensure that the error probability
due to PDF truncation will be ≈ 10⁻⁹. Near capacity, σ² ≈ 1/(2πe), so 12σ ≈ 3. Simulation showed that resolution errors became negligible for ∆ = 1/64. Each PDF was then stored in an L = 256 element vector, corresponding to a range of size 4.
At the check node, the PDF fj (x) that arrives from variable
node j is first expanded by hj (the appropriate coefficient
of H) to get fj (x/hj ). In a discrete implementation with
resolution ∆ the PDF is a vector of values fj (k∆), k ∈ Z.
As described in Section V, we shall usually use hj ≤ 1 so
the expanded PDF will be shorter than the original PDF. If the
expand factor 1/|hj | was an integer, we could simply decimate
fj (k∆) by 1/|hj |. However, in general it is not an integer so
we should use some kind of interpolation. The PDF fj (x)
is certainly not band limited, and as the iterations go on it
approaches an impulse, so simple interpolation methods (e.g.
linear) are not suitable. Suppose that we need to calculate
f_j((k + ε)∆), where −0.5 ≤ ε ≤ 0.5. A simple interpolation method that proved to be effective is to average f_j(x) around the desired point, where the averaging window length l_w is chosen to ensure that every sample of f_j(x) is used in the interpolation of at least one output point. This ensures that an impulse cannot be missed. The interpolation result is then (1/(2l_w + 1)) Σ_{i=−l_w}^{l_w} f_j((k − i)∆), where l_w = ⌊⌈1/|h_j|⌉ / 2⌋.
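A minimal Python sketch of this expand-and-interpolate step is given below (an illustration only; boundary handling is simplified and the output length is just the shrunken support):

import numpy as np

def expand_pdf(f, h, delta):
    """Given samples f[k] ~ f(k*delta), return samples of f(x/h) on the same grid,
    using the averaging-window interpolation described above (|h| <= 1 assumed)."""
    L = len(f)
    lw = int(np.ceil(1.0 / abs(h))) // 2
    out_len = int(np.floor(L * abs(h))) + 1      # expanded PDF is shorter for |h| <= 1
    g = np.zeros(out_len)
    for k in range(out_len):
        center = int(round(k / abs(h)))          # nearest input sample to k/|h|
        lo, hi = max(0, center - lw), min(L, center + lw + 1)
        g[k] = f[lo:hi].mean() if hi > lo else 0.0
    return g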
The most computationally intensive step at the check nodes is the calculation of the convolution of d − 1 expanded PDF's.
An efficient method is to calculate the fast Fourier transforms
(FFTs) of all the PDF’s, multiply the results and then perform
inverse FFT (IFFT). The resolution of the FFT should be larger than the expected convolution length, which is roughly L_out ≈ L · Σ_{i=1}^{d} h_i, where L denotes the original PDF length. Appendix VIII shows a way to use FFTs of size 1/∆, where ∆ is the resolution of the PDF. Usually 1/∆ ≪ L_out so
FFT complexity is significantly reduced. Practical values are
L = 256 and ∆ = 1/64, which give an improvement factor
of at least 4 in complexity.
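The following Python fragment illustrates the FFT-based convolution of the expanded PDFs at a check node; it performs a plain linear convolution and does not reproduce the size-1/∆ cyclic trick of Appendix VIII:

import numpy as np

def convolve_pdfs_fft(pdfs):
    """Convolve a list of (expanded) PDF vectors with FFTs, as a check node does
    for each outgoing message (linear-convolution sketch)."""
    out_len = sum(len(p) for p in pdfs) - len(pdfs) + 1
    nfft = int(2 ** np.ceil(np.log2(out_len)))
    spec = np.ones(nfft, dtype=complex)
    for p in pdfs:
        spec *= np.fft.fft(p, nfft)              # multiply the spectra
    out = np.fft.ifft(spec).real[:out_len]
    s = out.sum()
    return out / s if s > 0 else out             # renormalize to a proper PDF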
Each variable node receives d check node messages. The
output variable node message is calculated by generating the
product of d − 1 input messages and the channel Gaussian. As
the iterations go on, the messages get narrow and may become
impulses, with only a single nonzero sample. Quantization
effects may cause impulses in two messages to be shifted
by one sample. This will result in a zero output (instead of
an impulse). Therefore, it was found useful to widen each
check node message Q(k) prior to multiplication, such that Q_w(k) = Σ_{i=−1}^{1} Q(k + i), i.e. the message is added to its
right shifted and left shifted (by one sample) versions.
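A sketch of the resulting variable node update (widening each message by one sample to either side, then taking the product with the channel Gaussian) is shown below; all message vectors are assumed to share a common grid, and np.roll is used for brevity despite its circular edge behavior:

import numpy as np

def variable_node_product(check_msgs, channel_pdf):
    """Widen each incoming check node message (guards against one-sample
    impulse misalignment) and multiply all of them with the channel PDF."""
    out = channel_pdf.copy()
    for q in check_msgs:
        qw = q + np.roll(q, 1) + np.roll(q, -1)  # Q_w(k) = Q(k-1) + Q(k) + Q(k+1)
        out *= qw
    s = out.sum()
    return out / s if s > 0 else out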
VII. C OMPUTATIONAL C OMPLEXITY AND S TORAGE
R EQUIREMENTS
Most of the computational effort is invested in the d FFT’s
and d IFFT’s (of length 1/∆) that each check node performs
each iteration. The total number of multiplications for t
1
1
· log2 ( ∆
) . As in binary LDPC
iterations is o n · d · t · ∆
codes, the computational complexity has the attractive property
of being linear with block length. However, the constant that
precedes the linear term is significantly higher, mainly due to
the FFT operations.
The memory requirements are governed by the storage of
the nd check node and variable node messages, with total
memory of o(n · d · L). Compared to binary LDPC, the factor
of L significantly increases the required memory. For example,
for n = 10, 000, d = 7 and L = 256, the number of storage
elements is of the order of 107 .
VIII. E NCODING AND S HAPING
A. Encoding
The LDLC encoder has to calculate x = G·b, where b is an
integer message vector. Note that unlike H, G = H −1 is not
sparse, in general, so the calculation requires computational
complexity and storage of o(n2 ). This is not a desirable
property because the decoder’s computational complexity is
only o(n). A possible solution is to use the Jacobi method
[22] to solve H · x = b, which is a system of sparse linear
equations. Using this method, a magic square LDLC encoder
calculates several iterations of the form:
x^{(t)} = b̃ − H̃ · x^{(t−1)}    (21)
with initialization x^{(0)} = 0. The matrix H̃ is defined in
Lemma 6 of Section IV-C. The vector b̃ is a permuted and
scaled version of the integer vector b, such that the i’th element
of b̃ equals the element of b for which the appropriate row of
H has its largest magnitude value at the i’th location. This
element is further divided by this largest magnitude element.
A necessary and sufficient condition for convergence to x =
G · b is that all the eigenvalues of H̃ have magnitude less than
1 [22]. However, it was shown that this is also a necessary
condition for convergence of the LDLC iterative decoder (see
Sections IV-C, V-B), so it is guaranteed to be fulfilled for a
properly designed magic square LDLC. Since H̃ is sparse,
this is an o(n) algorithm, both in complexity and storage.
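A compact (dense, hence o(n²)) Python illustration of this encoder is given below; a practical implementation would store H̃ as a sparse matrix to obtain the o(n) behavior discussed above:

import numpy as np

def ldlc_encode(H, b, iterations=200):
    """Jacobi-style encoder sketch for eq. (21): iterate x(t) = b~ - H~ x(t-1).
    Rows are permuted so that the largest-magnitude entry of each row sits on
    the diagonal; convergence requires all eigenvalues of H~ inside the unit circle."""
    rows = np.argmax(np.abs(H), axis=0)      # row whose +-h1 entry sits in column i
    P = H[rows, :]                           # permuted H, +-h1 on the diagonal
    dgn = np.diag(P).copy()
    b_tilde = b[rows] / dgn
    H_tilde = P / dgn[:, None]
    np.fill_diagonal(H_tilde, 0.0)
    x = np.zeros(H.shape[0])
    for _ in range(iterations):
        x = b_tilde - H_tilde @ x
    return x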
B. Shaping
For practical use with the power constrained AWGN channel, the encoding operation must be accompanied by shaping,
in order to prevent the transmitted codeword’s power from
being too large. Therefore, instead of mapping the information
vector b to the lattice point x = G · b, it should be mapped
to some other lattice point x0 = G · b0 , such that the lattice
points that are used as codewords belong to a shaping region
(e.g. an n-dimensional sphere). The shaping operation is the
mapping of the integer vector b to the integer vector b0 .
As explained in Section II-A, this work concentrates on
the lattice design and the lattice decoding algorithm, and not
on the shaping region or shaping algorithms. Therefore, this
section will only highlight some basic shaping principles and
ideas.
A natural shaping scheme for lattice codes is nested lattice
coding [12]. In this scheme, shaping is done by quantizing
the lattice point G · b onto a coarse lattice G′, where the
transmitted codeword is the quantization error, which is uniformly distributed along the Voronoi cell of the coarse lattice.
If the second moment of this Voronoi cell is close to that
of an n-dimensional sphere, the scheme will attain close-to-optimal shaping gain. Specifically, assume that the information
vector b assumes integer values in the range 0, 1, ...M − 1 for
some constant integer M . Then, we can choose the coarse
lattice to be G′ = M G. The volume of the Voronoi cell for this lattice is M^n, since we assume det(G) = 1 (see Section II-A).

[Fig. 4. Simulation results: symbol error rate (SER) vs. distance of the noise variance from capacity (dB), for block lengths n = 100, 1000, 10,000 and 100,000.]

If the shape of the Voronoi cell resembles an n-dimensional sphere (as expected from a capacity approaching
lattice code), it will attain optimal shaping gain (compared to
uncoded transmission of the original integer sequence b).
The shaping operation will find the coarse lattice point
M Gk, k ∈ Zn , which is closest to the fine lattice point
x = G · b. The transmitted codeword will be:
x′ = x − M Gk = G(b − M k) = Gb′
where b′ ≜ b − M k (note that the “inverse shaping” at the decoder, i.e. transforming from b′ to b, is a simple modulo calculation: b = b′ mod M). Finding the closest coarse lattice point M Gk to x is equivalent to finding the closest fine lattice point G · k to the vector x/M. This is exactly the operation of the iterative LDLC decoder, so we could expect that it could be used for shaping. However, simulations show that the iterative decoding finds a vector k with poor shaping gain. The reason is that for shaping, the effective noise is much stronger than for decoding, and the iterative decoder fails to find the nearest lattice point if the noise is too large (see Section IV-D).
Therefore, an alternative algorithm has to be used for finding the nearest coarse lattice point. The complexity of finding the nearest lattice point grows exponentially with the lattice dimension n and is not feasible for large dimensions [23]. However, unlike decoding, for shaping applications it is not critical to find the exact nearest lattice point, and approximate algorithms may be considered (see [15]). A possible method [24] is to perform QR decomposition on G in order to transform to a lattice with upper triangular generator matrix, and then use sequential decoding algorithms (such as the Fano algorithm) to search the resulting tree. The main disadvantage of this approach is computational complexity and storage of at least o(n²). Finding an efficient shaping scheme for LDLC is certainly a topic for further research.
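The following Python fragment only illustrates the algebra of this nested-lattice scheme; the function nearest_fine_point is a placeholder for whatever exact or approximate closest-point search is used, which the paper leaves open:

import numpy as np

def shape(G, b, M, nearest_fine_point):
    """Nested-lattice shaping sketch: quantize x = G b onto the coarse lattice
    M*G and transmit the quantization error x' = G b'."""
    x = G @ b
    k = nearest_fine_point(G, x / M)     # closest fine lattice point G k to x/M (placeholder)
    b_prime = b - M * k
    return G @ b_prime, b_prime

def inverse_shape(b_prime, M):
    # at the decoder: recover the information integers by a modulo operation
    return np.mod(b_prime, M)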
IX. SIMULATION RESULTS

Magic square LDLC with the first generating sequence of Section V-A (i.e. {1/2.31, 1/3.17, 1/5.11, 1/7.33, 1/11.71, 1/13.11, 1/17.55}) were simulated for the AWGN channel at various block lengths. The degree was d = 5 for n = 100 and d = 7 for all other n. For n = 100 the matrix H was further normalized to get ⁿ√det(H) = 1. For all other n, normalizing the generating sequence such that the largest element has magnitude 1 also gave the desired determinant normalization (see Section V-B). The H matrices were generated using the algorithm of Section V-C. PDF resolution was set to ∆ = 1/256 with a total range of 4, i.e. each PDF was represented by a vector of L = 1024 elements. High resolution was used since our main target is to prove the LDLC concept and eliminate degradation due to implementation considerations. For this reason, the decoder was used with 200 iterations (though most of the time, a much smaller number was sufficient).
In all simulations the all-zero codeword was used. Approaching channel capacity is equivalent to σ² → 1/(2πe) (see Section II-A), so performance is measured in symbol error rate (SER), vs. the distance of the noise variance σ² from capacity (in dB). The results are shown in Figure 4. At SER of 10⁻⁵, for n = 100000, 10000, 1000, 100 we can work as close as 0.6dB, 0.8dB, 1.5dB and 3.7dB from capacity, respectively.
Similar results were obtained for d = 7 with the second type of generating sequence of Section V-A, i.e. {1, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7}. Results were slightly worse than for the first generating sequence (by less than 0.1 dB). Increasing d did not give any visible improvement.
X. C ONCLUSION
Low density lattice codes (LDLC) were introduced. LDLC
are novel lattice codes that can approach capacity and be
decoded efficiently. Good error performance within ∼ 0.5dB
from capacity at block length of 100,000 symbols was demonstrated. A convergence analysis was presented for the iterative decoder; the analysis is not complete, but it yields necessary conditions on H and significant insight into the convergence process. Code parameters were chosen from intuitive arguments, so it is reasonable to assume that when the code structure is better understood, better parameters can be found and channel capacity can be approached even closer.
Multi-input, multi-output (MIMO) communication systems
have become popular in recent years. Lattice codes have
been proposed in this context as space-time codes (LAST)
[25]. The concatenation of the lattice encoder and the MIMO
channel generates a lattice. If LDLC are used as lattice codes
and the MIMO configuration is small, the inverse generator
matrix of this concatenated lattice can be assumed to be
sparse. Therefore, the MIMO channel and the LDLC can
be jointly decoded using an LDLC-like decoder. However,
even if a magic square LDLC is used as the lattice code,
the concatenated lattice is not guaranteed to be equivalent
to a magic square LDLC, and the necessary conditions for
convergence are not guaranteed to be fulfilled. Therefore, the
usage of LDLC for MIMO systems is a topic for further
research.
A PPENDIX I
E XACT PDF C ALCULATIONS
Given the n-dimensional noisy observation y = x + w of
the transmitted codeword x = Gb, we would like to calculate
the probability density function (PDF) fxk |y (xk |y). We shall
start by calculating f_{x|y}(x|y) = f_x(x) f_{y|x}(y|x) / f_y(y). Denote the
shaping region by B (G will be used to denote both the
lattice and its generator matrix). fx (x) is a sum of |G ∩ B|
n-dimensional Dirac delta functions, since x has nonzero
probability only for the lattice points that lie inside the shaping
region. Assuming further that all codewords are used with
equal probability, all these delta functions have equal weight
of 1/|G ∩ B|. The expression for f_{y|x}(y|x) is simply the PDF of the i.i.d. Gaussian noise vector. We therefore get:
f_{x|y}(x|y) = f_x(x) f_{y|x}(y|x) / f_y(y) =
= [ (1/|G ∩ B|) Σ_{l∈G∩B} δ(x − l) · (2πσ²)^{−n/2} e^{−Σ_{i=1}^{n}(y_i − x_i)²/2σ²} ] / f_y(y) =
= C · Σ_{l∈G∩B} δ(x − l) · e^{−d²(l,y)/2σ²}    (22)
where C is not a function of x and d²(l, y) is the squared
Euclidean distance between the vectors l and y in Rn . It can be
seen that the conditional PDF of x has a delta function for each
lattice point, located at this lattice point with weight that is
proportional to the exponent of the negated squared Euclidean
distance of this lattice point from the noisy observation. The
ML point corresponds to the delta function with largest weight.
As the next step, instead of calculating the n-dimensional
PDF of the whole vector x, we shall calculate the n one-dimensional PDF's for each of the components x_k of the vector x (conditioned on the whole observation vector y):

f_{x_k|y}(x_k|y) = ∫∫···∫_{x_i, i≠k} f_{x|y}(x|y) dx_1 dx_2 ··· dx_{k−1} dx_{k+1} ··· dx_n =
= C · Σ_{l∈G∩B} δ(x_k − l_k) · e^{−d²(l,y)/2σ²}    (23)
This finishes the proof of (1). It can be seen that the
conditional PDF of xk has a delta function for each lattice
point, located at the projection of this lattice point on the
coordinate xk , with weight that is proportional to the exponent
of the negated squared Euclidean distance of this lattice point
from the noisy observation. The ML point will therefore
correspond to the delta function with largest weight in each
coordinate. Note, however, that if several lattice points have
the same projection on a specific coordinate, the weights of
the corresponding delta functions will add and may exceed the
weight of the ML point.
A PPENDIX II
E XTENDING G ALLAGER ’ S T ECHNIQUE TO THE
C ONTINUOUS C ASE
In [5], the derivation of the LDPC iterative decoder was
simplified using the following technique: the codeword elements xk were assumed i.i.d. and a condition was added to
all the probability calculations, such that only valid codewords
were actually considered. The question is then how to choose
the marginal PDF of the codeword elements. In [5], binary
codewords were considered, and the i.i.d distribution assumed
the values ’0’ and ’1’ with equal probability. Since we extend
the technique to the continuous case, we have to set the
continuous marginal distribution fxk (xk ). It should be set such
that fx (x), assuming that x is a lattice point, is the same as
f(x | s ∈ Z^n), assuming that x_k are i.i.d. with marginal PDF f_{x_k}(x_k), where s ≜ H · x. This f_x(x) equals a weighted sum
of Dirac delta functions at all lattice points, where the weight
at each lattice point equals the probability to use this point as
a codeword.
Before proceeding, we need the following property of
conditional probabilities. For any two continuous valued RV’s
u, v we have:
f(u | v ∈ {v_1, v_2, ..., v_N}) = Σ_{k=1}^{N} f_{u,v}(u, v_k) / Σ_{k=1}^{N} f_v(v_k)    (24)
(This property can be easily proved by following the lines of
[21], pp. 159-160, and can also be extended to the infinite sum
case).
Using (24), we now have:
f(x | s ∈ Z^n) = Σ_{i∈Z^n} f_{x,s}(x, s = i) / Σ_{i∈Z^n} f_s(i) = C Σ_{i∈Z^n} f(x) f(s = i | x) = C′ Σ_{i∈Z^n} f(x) δ(x − Gi)    (25)

where C, C′ are independent of x.
The result is a weighted sum of Dirac delta functions at all
lattice points, as desired. Now, the weight at each lattice point
should equal the probability to use this point as a codeword.
Therefore, f_{x_k}(x_k) should be chosen such that at each lattice point, the resulting vector distribution f_x(x) = Π_{k=1}^{n} f_{x_k}(x_k)
will have a value that is proportional to the probability to use
this lattice point. At x which is not a lattice point, the value
of fx (x) is not important.
A PPENDIX III
D ERIVATION OF THE I TERATIVE D ECODER
In this appendix we shall derive the LDLC iterative decoder
for a code with dimension n, using the tree assumption and
Gallager’s trick.
Referring to figure 2, assume that there are only 2 tiers.
Using Gallager’s trick we assume that the xk ’s are i.i.d. We
would like to calculate f(x_1 | y, s ∈ Z^n), where s ≜ H · x.
Due to the tree assumption, we can do it in two steps:
1. calculate the conditional PDF of the tier 1 variables of
x1 , conditioned only on the check equations that relate the tier
1 and tier 2 variables.
2. calculate the conditional PDF of x1 itself, conditioned
only on the check equations that relate x1 and its first tier
variables, but using the results of step 1 as the PDF’s for
the tier 1 variables. Hence, the results will be equivalent to
conditioning on all the check equations.
There is a basic difference between the calculation in step
1 and step 2: the condition in step 2 involves all the check
equations that are related to x1 , where in step 1 a single check
equation is always omitted (the one that relates the relevant
tier 1 element with x1 itself).
Assume now that there are many tiers, where each tier
contains distinct elements of x (i.e. each element appears only
once in the resulting tree). We can then start at the farthest tier
and start moving toward x1 . We do it by repeatedly calculating
step 1. After reaching tier 1, we use step 2 to finally calculate
the desired conditional PDF for x1 .
This approach suggests an iterative algorithm for the calculation of f(x_k | y, s ∈ Z^n) for k = 1, 2, ...n. In this approach
we assume that the resulting tier diagram for each xk contains
distinct elements for several tiers (larger or equal to the number
of required iterations). We then repeat step 1 several times,
where the results of the previous iteration are used as initial
PDF’s for the next iteration. Finally, we perform step 2 to
calculate the final results.
Note that by conditioning only on part of the check equations in each iteration, we can not restrict the result to the
shaping region. This is the reason that the decoder performs
lattice decoding and not exact ML decoding, as described in
Section III.
We shall now turn to derive the basic iteration of the
algorithm. For simplicity, we shall start with the final step
of the algorithm (denoted step 2 above). We would like to
perform t iterations, so assume that for each xk there are
t tiers with a total of Nc check equations. For every xk
we need to calculate f(x_k | s ∈ Z^{N_c}, y) = f(x_k | s^{(tier_1)} ∈ Z^{c_k}, s^{(tier_2:tier_t)} ∈ Z^{N_c − c_k}, y), where c_k is the number of check equations that involve x_k. s^{(tier_1)} ≜ H^{(tier_1)} · x denotes the value of the left hand side of these check equations when x is substituted (H^{(tier_1)} is a submatrix of H that contains only the rows that relate to these check equations), and s^{(tier_2:tier_t)} relates in the same manner to all the other check equations. For simplicity of notations, denote the event s^{(tier_2:tier_t)} ∈ Z^{N_c − c_k} by A. As explained above, in all the calculations we assume that all the x_k's are independent. Using (24), we get:

f(x_k | s^{(tier_1)} ∈ Z^{c_k}, A, y) = Σ_{i∈Z^{c_k}} f(x_k, s^{(tier_1)} = i | A, y) / Σ_{i∈Z^{c_k}} f(s^{(tier_1)} = i | A, y)    (26)

Evaluating the term inside the sum of the numerator, we get:

f(x_k, s^{(tier_1)} = i | A, y) = f(x_k | A, y) · f(s^{(tier_1)} = i | x_k, A, y)    (27)

Evaluating the left term, we get:

f(x_k | A, y) = f(x_k | y_k) = f(x_k) f(y_k | x_k) / f(y_k) = (f(x_k)/f(y_k)) · (1/√(2πσ²)) e^{−(y_k − x_k)²/2σ²}    (28)

where f(x_k | y) = f(x_k | y_k) due to the i.i.d assumption. Evaluating now the right term of (27), we get:

f(s^{(tier_1)} = i | x_k, A, y) = Π_{m=1}^{c_k} f(s_m^{(tier_1)} = i_m | x_k, A, y)    (29)

where s_m^{(tier_1)} denotes the m'th component of s^{(tier_1)} and i_m denotes the m'th component of i. Note that each element of s^{(tier_1)} is a linear combination of several elements of x. Due to the tree assumption, two such linear combinations have no common elements, except for x_k itself, which appears in all linear combinations. However, x_k is given, so the i.i.d assumption implies that all these linear combinations are independent, so (29) is justified. The condition A (i.e. s^{(tier_2:tier_t)} ∈ Z^{N_c − c_k}) does not impact the independence due to the tree assumption.
Substituting (27), (28), (29) back in (26), we get:

f(x_k | s^{(tier_1)} ∈ Z^{c_k}, A, y) =
= C · f(x_k) · e^{−(y_k − x_k)²/2σ²} Σ_{i∈Z^{c_k}} Π_{m=1}^{c_k} f(s_m^{(tier_1)} = i_m | x_k, A, y) =
= C · f(x_k) · e^{−(y_k − x_k)²/2σ²} Σ_{i_1∈Z} Σ_{i_2∈Z} ··· Σ_{i_{c_k}∈Z} Π_{m=1}^{c_k} f(s_m^{(tier_1)} = i_m | x_k, A, y) =
= C · f(x_k) · e^{−(y_k − x_k)²/2σ²} Π_{m=1}^{c_k} Σ_{i_m∈Z} f(s_m^{(tier_1)} = i_m | x_k, A, y)    (30)

where C is independent of x_k.
We shall now examine the term inside the sum: f(s_m^{(tier_1)} = i_m | x_k, A, y). Denote the linear combination that s_m^{(tier_1)} represents by:

s_m^{(tier_1)} = h_{m,1} x_k + Σ_{l=2}^{r_m} h_{m,l} x_{j_l}    (31)

where {h_{m,l}}, l = 1, 2...r_m is the set of nonzero coefficients of the appropriate parity check equation, and j_l is the set of indices of the appropriate x elements (note that the set j_l depends on m but we omit the “m” index for clarity of notations). Without loss of generality, h_{m,1} is assumed to be the coefficient of x_k. Define z_m ≜ Σ_{l=2}^{r_m} h_{m,l} x_{j_l}, such that s_m^{(tier_1)} = h_{m,1} x_k + z_m. We then have:

f(s_m^{(tier_1)} = i_m | x_k, A, y) = f_{z_m|x_k,A,y}(z_m = i_m − h_{m,1} x_k | x_k, A, y)    (32)
Now, since we assume that the elements of x are independent,
the PDF of the linear combination zm equals the convolution
of the PDF’s of its components:
f_{z_m|x_k,A,y}(z_m | x_k, A, y) =
= (1/|h_{m,2}|) f_{x_{j_2}|A,y}(z_m/h_{m,2} | A, y) ⊛ (1/|h_{m,3}|) f_{x_{j_3}|A,y}(z_m/h_{m,3} | A, y) ⊛ ··· ⊛ (1/|h_{m,r_m}|) f_{x_{j_{r_m}}|A,y}(z_m/h_{m,r_m} | A, y)    (33)

Note that the functions f_{x_{j_i}|A,y}(x_{j_i} | A, y) are simply the output PDF's of the previous iteration.
Define now

p_m(x_k) ≜ f_{z_m|x_k,A,y}(z_m = −h_{m,1} x_k | x_k, A, y)    (34)

Substituting (32), (34) in (30), we finally get:

f(x_k | s^{(tier_1)} ∈ Z^{c_k}, A, y) = C · f(x_k) · e^{−(y_k − x_k)²/2σ²} Π_{m=1}^{c_k} Σ_{i_m∈Z} p_m(x_k − i_m/h_{m,1})    (35)
This result can be summarized as follows. For each of
the ck check equations that involve xk , the PDF’s (previous
iteration results) of the active equation elements, except for
xk itself, are expanded and convolved, according to (33).
The convolution result is scaled by (−hm,1 ), the negated
coefficient of xk in this check equation, according to (34), to
yield pm (xk ). Then, a periodic function with period 1/|hm,1 |
is generated by adding an infinite number of shifted versions
of the scaled convolution result, according to the sum term
in (35). After repeating this process for all the ck check
equations that involve xk , we get ck periodic functions,
with possibly different periods. We then multiply all these
functions. The multiplication result is further multiplied by the channel Gaussian PDF term e^{−(y_k − x_k)²/2σ²} and finally by
f (xk ), the marginal PDF of xk under the i.i.d assumption. As
discussed in Section III, we assume that f (xk ) is a uniform
distribution with large enough range. This means that f (xk )
is constant over the valid range of xk , and can therefore be
omitted from (35) and absorbed in the constant C.
As noted above, this result is for the final step (equivalent to
step 2 above), where we determine the PDF of xk according
to the PDF’s of all its tier 1 elements. However, the repeated
iteration step is equivalent to step 1 above. In this step ,we
assume that xk is a tier 1 element of another element, say
xl , and derive the PDF of xk that should be used as input
to step 2 of xl (see figure 2). It can be seen that the only
difference between step 2 and step 1 is that in step 2 all the
check equations that involve xk are used, where in step 1 the
check equation that involves both xk and xl is ignored (there
must be such an equation since xk is one of the tier 1 elements
of xl ). Therefore, the step1 iteration is identical to (35), except
that the product does not contain the term that corresponds to
the check equation that combines both xk and xl . Denote
f_{kl}(x_k) ≜ f(x_k | s^{(tier_1 except l)} ∈ Z^{c_k − 1}, A, y)    (36)

We then get:

f_{kl}(x_k) = C · e^{−(y_k − x_k)²/2σ²} Π_{m=1, m≠m_l}^{c_k} Σ_{i_m∈Z} p_m(x_k − i_m/h_{m,1})    (37)
where ml is the index of the check equation that combines both
xk and xl . In principle, a different fkl (xk ) should be calculated
for each xl for which xk is a tier 1 element. However, the
calculation is the same for all xl that share the same check
equation. Therefore, we should calculate fkl (xk ) once for each
check equation that involves xk . l can be regarded as the index
of the check equation within the set of check equations that
involve xk .
We can now formulate the iterative decoder. The decoder
state variables are PDF's of the form f_{kl}^{(t)}(x_k), where k =
1, 2, ...n. For each k, l assumes the values 1, 2, ...ck , where ck
is the number of check equations that involve xk . t denotes the
iteration index. For a regular LDLC with degree d there will
be nd PDF’s. The PDF’s are initialized by assuming that xk is
a leaf of the tier diagram. Such a leaf has no tier 1 elements, so
fkl (xk ) = f (xk ) · f (yk |xk ). As explained above for equation
(35), we shall omit the term f (xk ), resulting in initialization
with the channel noise Gaussian around the noisy observation
yk . Then, the PDF’s are updated in each iteration according to
(37). The variable node messages should be further normalized in order to get actual PDF's, such that ∫_{−∞}^{∞} f_{kl}(x_k) dx_k = 1
(this will compensate for the constant C). The final PDF’s for
xk , k = 1, 2, ...n are then calculated according to (35).
Finally, we have to estimate the integer valued information vector b. This can be done by first estimating the codeword vector x from the peaks of the PDF's: x̂_k = arg max_{x_k} f(x_k | s^{(tier_1)} ∈ Z^{c_k}, A, y). Then, b can be estimated as b̂ = ⌊H x̂⌉.
We have finished developing the iterative algorithm. It can
be easily seen that the message passing formulation of Section
III-A actually implements this algorithm.
A PPENDIX IV
A SYMPTOTIC B EHAVIOR OF THE VARIANCES R ECURSION
A. Proof of Lemma 3 and Lemma 4
We shall now derive the basic iterative equations that relate
the variances at iteration t + 1 to the variances at iteration t
for a magic square LDLC with dimension n, degree d and
generating sequence h1 ≥ h2 ≥ ... ≥ hd > 0.
Each iteration, every check node generates d output messages, one for each variable node that is connected to it, where
the weights of these d connections are ±h1 , ±h2 , ..., ±hd . For
each such output message, the check node convolves d − 1
expanded variable node PDF messages, and then stretches
and periodically extends the result. For a specific check node,
denote the variance of the variable node message that arrives
along an edge with weight ±h_j by V_j^{(t)}, j = 1, 2, ...d. Denote the variance of the message that is sent back to a variable node along an edge with weight ±h_j by Ṽ_j^{(t)}. From (2), (3), we get:

Ṽ_j^{(t)} = (1/h_j²) Σ_{i=1, i≠j}^{d} h_i² V_i^{(t)}    (38)

Then, each variable node generates d messages, one for each check node that is connected to it, where the weights of these d connections are ±h_1, ±h_2, ..., ±h_d. For each such output message, the variable node generates the product of d − 1 check node messages and the channel noise PDF. For a specific variable node, denote the variance of the message that is sent back to a check node along an edge with weight ±h_j by V_j^{(t+1)} (this is the final variance of the iteration). From claim 2, we then get:

1/V_j^{(t+1)} = 1/σ² + Σ_{i=1, i≠j}^{d} 1/Ṽ_i^{(t)}    (39)

From symmetry considerations, it can be seen that all messages that are sent along edges with the same absolute value of their weight will have the same variance, since the same variance update occurs for all these messages (both for check node messages and variable node messages). Therefore, the d variance values V_1^{(t)}, V_2^{(t)}, ..., V_d^{(t)} are the same for all variable nodes, where V_l^{(t)} is the variance of the message that is sent along an edge with weight ±h_l. This completes the proof of Lemma 3.
Using this symmetry, we can now derive the recursive update of the variance values V_1^{(t)}, V_2^{(t)}, ..., V_d^{(t)}. Substituting (38) in (39), we get:

1/V_i^{(t+1)} = 1/σ² + Σ_{m=1, m≠i}^{d} h_m² / ( Σ_{j=1, j≠m}^{d} h_j² V_j^{(t)} )    (40)

for i = 1, 2, ...d, which completes the proof of Lemma 4.
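As a numerical illustration (not part of the proof), recursion (40) can be iterated directly; with α < 1 the first variance settles near σ²(1 − α) while the others decay essentially as σ²αᵗ, in agreement with Theorem 1 below:

import numpy as np

def variance_recursion(h, sigma2, iters=30):
    """Iterate (40) for the d variance values, starting from V_i = sigma^2."""
    d = len(h)
    h2 = np.asarray(h, dtype=float) ** 2
    V = np.full(d, sigma2)
    for _ in range(iters):
        newV = np.empty(d)
        for i in range(d):
            s = 1.0 / sigma2
            for m in range(d):
                if m == i:
                    continue
                s += h2[m] / sum(h2[j] * V[j] for j in range(d) if j != m)
            newV[i] = 1.0 / s
        V = newV
    return V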
B. Proof of Theorem 1
We would like to analyze the convergence of the nonlinear recursion (4) for the variances V_1^{(t)}, V_2^{(t)}, ..., V_d^{(t)}. This recursion is illustrated in (5) for the case d = 3. It is assumed that α < 1, where α = Σ_{i=2}^{d} h_i² / h_1². Define another set of variables U_1^{(t)}, U_2^{(t)}, ..., U_d^{(t)}, which obey the following recursion. The recursion for the first variable is:

1/U_1^{(t+1)} = 1/σ² + Σ_{m=2}^{d} h_m² / ( h_1² U_1^{(t)} )    (41)

where for i = 2, 3, ...d the recursion is:

1/U_i^{(t+1)} = h_1² / ( Σ_{j=2}^{d} h_j² U_j^{(t)} )    (42)

with initial conditions U_1^{(0)} = U_2^{(0)} = ... = U_d^{(0)} = σ². For illustration, the new recursion for the case d = 3 is:

1/U_1^{(t+1)} = 1/σ² + h_2²/(h_1² U_1^{(t)}) + h_3²/(h_1² U_1^{(t)})
1/U_2^{(t+1)} = h_1² / ( h_2² U_2^{(t)} + h_3² U_3^{(t)} )
1/U_3^{(t+1)} = h_1² / ( h_2² U_2^{(t)} + h_3² U_3^{(t)} )

It can be seen that (41) can be regarded as the approximation of (4) under the assumptions that V_i^{(t)} ≪ V_1^{(t)} and V_i^{(t)} ≪ σ² for i = 2, 3, ...d.
It can be seen that in the new recursion, U_1^{(t)} obeys a recursion that is independent of the other variables. From (41), this recursion can be written as 1/U_1^{(t+1)} = 1/σ² + α/U_1^{(t)}, with initial condition U_1^{(0)} = σ². Since α < 1, this is a stable linear recursion for 1/U_1^{(t)}, which can be solved to get U_1^{(t)} = σ²(1 − α) · 1/(1 − α^{t+1}).
For the other variables, it can be seen that all have the same right hand side in the recursion (41). Since all are initialized with the same value, it follows that U_2^{(t)} = U_3^{(t)} = ... = U_d^{(t)} for all t ≥ 0. Substituting back in (41), we get the recursion U_2^{(t+1)} = αU_2^{(t)}, with initial condition U_2^{(0)} = σ². Since α < 1, this is a stable linear recursion for U_2^{(t)}, which can be solved to get U_2^{(t)} = σ²α^t.
We found an analytic solution for the variables U_i^{(t)}. However, we are interested in the variances V_i^{(t)}. The following claim relates the two sets of variables.
Claim 3: For every t ≥ 0, the first variables of the two sets are related by V_1^{(t)} ≥ U_1^{(t)}, where for i = 2, 3, ...d we have V_i^{(t)} ≤ U_i^{(t)}.
Proof: By induction: the initialization of the two sets of variables obviously satisfies the required relations. Assume now that the relations are satisfied for iteration t, i.e. V_1^{(t)} ≥ U_1^{(t)} and for i = 2, 3, ...d, V_i^{(t)} ≤ U_i^{(t)}. If we now compare the right hand side of the update recursion for 1/V_1^{(t+1)} to that of 1/U_1^{(t+1)} (i.e. (4) to (41)), then the right hand side for 1/V_1^{(t+1)} is smaller, because it has additional positive terms in the denominators, where the common terms in the denominators are larger according to the induction assumption. Therefore, V_1^{(t+1)} ≥ U_1^{(t+1)}, as required. In the same manner, if we compare the right hand side of the update recursion for 1/V_i^{(t+1)} to that of 1/U_i^{(t+1)} for i ≥ 2, then the right hand side for 1/V_i^{(t+1)} is larger, because it has additional positive terms, where the common terms are also larger since their denominators are smaller due to the induction assumption. Therefore, V_i^{(t+1)} ≤ U_i^{(t+1)} for i = 2, 3, ...d, as required.
for i = 2, 3, ...d, as required.
(t)
Using claim 3 and the analytic results for Ui , we now
have:
1
(t)
(t)
V1 ≥ U1 = σ 2 (1 − α)
≥ σ 2 (1 − α)
(43)
1 − αt+1
where for i = 2, 3, ...d we have:
(t)
Vi
(t)
≤ Ui
= σ 2 αt
(44)
We have shown that the first variance is lower bounded by a positive nonzero constant where the other variances are upper bounded by a term that decays exponentially to zero. Therefore, for large t we have V_i^{(t)} ≪ V_1^{(t)} and V_i^{(t)} ≪ σ² for i = 2, 3, ...d. It then follows that for large t the variances approximately obey the recursion (41), which was built from the actual variance recursion (4) under these assumptions. Therefore, for i = 2, 3, ...d the variances are not only upper bounded by an exponentially decaying term, but actually approach such a term, where the first variance actually approaches the constant σ²(1 − α) at an exponential rate. This completes the proof of Theorem 1.
Note that the above analysis only applies if α < 1. To illustrate the behavior for α ≥ 1, consider the simple case of h_1 = h_2 = ... = h_d. From (4), (5) it can be seen that for this case, if V_i^{(0)} is independent of i, then V_i^{(t)} is independent of i for every t > 0, since all the elements will follow the same recursive equations. Substituting this result in the first equation, we get the single variable recursion 1/V_i^{(t+1)} = 1/V_i^{(t)} + 1/σ² with initialization V_i^{(0)} = σ². This recursion is easily solved to get 1/V_i^{(t)} = (t + 1)/σ², or V_i^{(t)} = σ²/(t + 1). It can be seen that all the variances converge to zero, but with slow convergence rate of o(1/t).
A PPENDIX V
A SYMPTOTIC B EHAVIOR OF THE M EAN VALUES
R ECURSION
A. Proof of Lemma 5 and Lemma 6 (Mean of Narrow Messages)
Assume a magic square LDLC with dimension n and degree
d. We shall now examine the effect of the calculations in
the check nodes and variable nodes on the mean values and
derive the resulting recursion. Every iteration, each check node
generates d output messages, one for each variable node that
connects to it, where the weights of these d connections are
±h1 , ±h2 , ..., ±hd . For each such output message, the check
node convolves d − 1 expanded variable node PDF messages,
and then stretches and periodically extends the result. We shall
concentrate on the nd consistent Gaussians that relate to the
same integer vector b (one Gaussian in each message), and
analyze them jointly. For convenience, we shall refer to the
mean value of the relevant consistent Gaussian as the mean of
the message.
Consider now a specific check node. Denote the mean value
of the variable node message that arrives at iteration t along
the edge with weight ±h_j by m_j^{(t)}, j = 1, 2, ...d. Denote the mean value of the message that is sent back to a variable node along an edge with weight ±h_j by m̃_j^{(t)}. From (2), (3) and claim 1, we get:

m̃_j^{(t)} = (1/h_j) ( b_k − Σ_{i=1, i≠j}^{d} h_i m_i^{(t)} )    (45)
where bk is the appropriate element of b that is related to this
specific check equation, which is the only relevant index in
the infinite sum of the periodic extension step (3). Note that
the check node operation is equivalent to extracting the value of m_j from the check equation Σ_{i=1}^{d} h_i m_i = b_k, assuming
all the other mi are known. Note also that the coefficients
hj should have a random sign. To keep notations simple, we
assume that hj already includes the random sign. Later, when
several equations will be combined together, we should take
it into account.
Then, each variable node generates d messages, one for
each check node that is connected to it, where the weights
of these d connections are ±h1 , ±h2 , ..., ±hd . For each such
output message, the variable node generates the product of
d − 1 check node messages and the channel noise PDF. For a
specific variable node, denote the mean value of the message
that arrives from a check node along an edge with weight
±h_j by m̃_j^{(t)}, and the appropriate variance by Ṽ_j^{(t)}. The mean value of the message that is sent back to a check node along an edge with weight ±h_j is m_j^{(t+1)}, the final mean value of the iteration. From claim 2, we then get:

m_j^{(t+1)} = ( y_k/σ² + Σ_{i=1, i≠j}^{d} m̃_i^{(t)}/Ṽ_i^{(t)} ) / ( 1/σ² + Σ_{i=1, i≠j}^{d} 1/Ṽ_i^{(t)} )    (46)
where yk is the channel observation for the variable node and
σ² is the noise variance. Note that m̃_i^{(t)}, i = 1, 2, ..., d in (46)
are the mean values of check node messages that arrive to
the same variable node from different check nodes, where in
(45) they define the mean values of check node messages that
leave the same check node. However, it is beneficial to keep
the notations simple, and we shall take special care when (46)
and (45) are combined.
It can be seen that the convergence of the mean values
is coupled to the convergence of the variances (unlike the
recursion of the variances which was autonomous). However,
as the iterations go on, this coupling disappears. To see that,
recall from Theorem 1 that for each check node, the variance
of the variable node message that comes along an edge with
weight ±h1 approaches a finite value, where the variance of all
the other messages approaches zero exponentially. According
to (38), the variance of the check node message is a weighted
sum of the variances of the incoming variable node messages.
Therefore, the variance of the check node message that goes
along an edge with weight ±h1 will approach zero, since the
weighted sum involves only zero-approaching variances. All
the other messages will have finite variance, since the weighted
sum involves the non zero-approaching variance. To summarize, each variable node sends (and each check node receives)
d − 1 “narrow” messages and a single “wide” message. Each
check node sends (and each variable node receives) d − 1
“wide” messages and a single “narrow” message, where the
narrow message is sent along the edge from which the wide
message was received (the edge with weight ±h1 ).
We shall now concentrate on the case where the variable
node generates a narrow message. Then, the sum in the
numerator of (46) has a single term for which Ṽ_i^{(t)} → 0, which corresponds to i = 1. The same is true for the sum in the denominator. Therefore, for large t, all the other terms will become negligible and we get:

m_j^{(t+1)} ≈ m̃_1^{(t)}    (47)

where m̃_1^{(t)} is the mean of the message that comes from the
edge with weight h1 , i.e. the narrow check node message. As
discussed above, d − 1 of the d variable node messages that
leave the same variable node are narrow. From (47) it comes
out that for large t, all these d − 1 narrow messages will have
the same mean value. This completes the proof of Lemma 5.
Now, combining (45) and (47) (where the indices are
arranged again, as discussed above), we get:
m_{l_1}^{(t+1)} ≈ (1/h_1) ( b_k − Σ_{i=2}^{d} h_i m_{l_i}^{(t)} )    (48)

where l_i, i = 1, 2..., d are the variable nodes that take place in the check equation for which variable node l_1 appears with coefficient ±h_1. b_k is the element of b that is related to this check equation. m_{l_1}^{(t+1)} denotes the mean value of the d − 1 narrow messages that leave variable node l_1 at iteration t + 1. m_{l_i}^{(t)} is the mean value of the narrow messages that
were generated at variable node li at iteration t. Only narrow
messages are involved in (48), because the right hand side of
(47) is the mean value of the narrow check node message that
arrived to variable node l1 , which results from the convolution
of d−1 narrow variable node messages. Therefore, for large t,
the mean values of the narrow messages are decoupled from
the mean values of the wide messages (and also from the
variances), and they obey an autonomous recursion.
The mean values of the narrow messages at iteration t can
be arranged in an n-element column vector m(t) (one mean
value for each variable node). We would like to show that the
mean values converge to the coordinates of the lattice point
x = Gb. Therefore, it is useful to define the error vector
e^{(t)} ≜ m^{(t)} − x. Since Hx = b, we can write (using the same notations as (48)):

x_{l_1} = (1/h_1) ( b_k − Σ_{i=2}^{d} h_i x_{l_i} )    (49)
Subtracting (49) from (48), we get:

e_{l_1}^{(t+1)} ≈ −(1/h_1) Σ_{i=2}^{d} h_i e_{l_i}^{(t)}    (50)

Or, in vector notation:

e^{(t+1)} ≈ −H̃ · e^{(t)}    (51)

where H̃ is derived from H by permuting the rows such that the ±h_1 elements will be placed on the diagonal, dividing each row by the appropriate diagonal element (h_1 or −h_1), and then nullifying the diagonal. Note that in order to simplify the notations, we embedded the sign of ±h_j in h_j and did not write it implicitly. However, the definition of H̃ solves this ambiguity. This completes the proof of Lemma 6.

B. Proof of Lemma 7 (Mean of Wide Messages)

Recall that each check node receives d − 1 narrow messages and a single wide message. The wide message comes along the edge with weight ±h_1. Denote the appropriate lattice point by x = Gb, and assume that the Gaussians of the narrow variable node messages have already converged to impulses at the corresponding lattice point coordinates (Theorem 2). We can then substitute in (45) m_i^{(t)} = x_i for i ≥ 2. The mean value of the (wide) message that is returned along the edge with weight ±h_j (j ≠ 1) is:

m̃_j^{(t)} = (1/h_j) ( b_k − Σ_{i=2, i≠j}^{d} h_i x_i − h_1 m_1^{(t)} ) = (1/h_j) ( h_1 x_1 + h_j x_j − h_1 m_1^{(t)} ) = x_j + (h_1/h_j) ( x_1 − m_1^{(t)} )    (52)

As in the previous section, for convenience of notations we embed the sign of ±h_j in h_j itself. The sign ambiguity will be resolved later.
The meaning of (52) is that the returned mean value is the desired lattice coordinate plus an error term that is proportional to the error in the incoming wide message. From (38), assuming that the variance of the incoming wide message has already converged to its steady state value σ²(1 − α) and the variance of the incoming narrow messages has already converged to zero, the variance of this check node message will be:

Ṽ_j^{(t)} = (h_1²/h_j²) σ²(1 − α)    (53)

where α = Σ_{i=2}^{d} h_i² / h_1². Now, each variable node receives d − 1 wide messages and a single narrow message. The mean values of the wide messages are according to (52) and the variances are according to (53). The single wide message that this variable node generates results from the d − 1 input wide messages and it is sent along the edge with weight ±h_1. From (46), the wide mean value generated at variable node k will then be:

m_k^{(t+1)} = ( y_k/σ² + Σ_{j=2}^{d} ( x_k + (h_1/h_j)(x_{p(k,j)} − m_{p(k,j)}^{(t)}) ) · h_j²/(h_1² σ²(1 − α)) ) / ( 1/σ² + Σ_{j=2}^{d} h_j²/(h_1² σ²(1 − α)) )    (54)

Note that the x_1 and m_1 terms of (52) were replaced by x_{p(k,j)} and m_{p(k,j)}, respectively, since for convenience of notations we denoted by m_1 the mean of the message that came to a check node along the edge with weight ±h_1. For substitution in (46) we need to know the exact variable node index that this edge came from. Therefore, p(k, j) denotes the index of the variable node that takes place with coefficient ±h_1 in the check equation where x_k takes place with coefficient ±h_j. Rearranging terms, we then get:

m_k^{(t+1)} = ( y_k(1 − α) + x_k · α + Σ_{j=2}^{d} (h_j/h_1)(x_{p(k,j)} − m_{p(k,j)}^{(t)}) ) / ( (1 − α) + α ) =
= y_k + α(x_k − y_k) + (1/h_1) Σ_{j=2}^{d} h_j (x_{p(k,j)} − m_{p(k,j)}^{(t)})    (55)
Denote now the wide message mean value error by e_k^{(t)} ≜ m_k^{(t)} − x_k (where x = Gb is the lattice point that corresponds to b). Denote by q the difference vector between x and the noisy observation y, i.e. q ≜ y − x. Note that if b corresponds to the correct lattice point that was transmitted, q equals the channel noise vector w. Subtracting x_k from both sides of (55), we finally get:

e_k^{(t+1)} = q_k(1 − α) − (1/h_1) Σ_{j=2}^{d} h_j e_{p(k,j)}^{(t)}    (56)

If we now arrange all the errors in a single column vector e, we can write:

e^{(t+1)} = −F · e^{(t)} + (1 − α)q    (57)

where F is an n × n matrix defined by:

F_{k,l} = H_{r,k}/H_{r,l}  if k ≠ l and there exists a row r of H for which |H_{r,l}| = h_1 and H_{r,k} ≠ 0;  F_{k,l} = 0 otherwise.    (58)

F is well defined, since for a given l there can be at most a single row of H for which |H_{r,l}| = h_1 (note that α < 1 implies that h_1 is different from all the other elements of the generating sequence).
As discussed above, we embedded the sign in h_i for convenience of notations, but when several equations are combined the correct signs should be used. It can be seen that using the notations of (57) resolves the correct signs of the h_i elements. This completes the proof of Lemma 7.
An alternative way to construct F from H is as follows. To construct the k'th row of F, denote by r_i, i = 1, 2, ...d, the index of the element in the k'th column of H with value h_i (i.e. |H_{r_i,k}| = h_i). Denote by l_i, i = 1, 2, ...d, the index of the element in the r_i'th row of H with value h_1 (i.e. |H_{r_i,l_i}| = h_1). The k'th row of F will be all zeros except for the d − 1 elements l_i, i = 2, 3...d, where F_{k,l_i} = H_{r_i,k}/H_{r_i,l_i}.
= lim
(t)
Ṽk,1 →0
(t)
Ṽk,1
σ2
+1+
(t)
Ṽk,1
(t)
j=2 Ṽk,j
j6=i
d
X
=1
(59)
(t)
Therefore, ai converges to a finite steady state value, and
has a finite value for every i and t. This completes the first
part of the proof.
Pnd (t)
We would now like to show that limt→∞ i=1 ai can
be expressed in the form 2σ1 2 (Gb − y)T W (Gb − y). Every
variable node sends d − 1 narrow messages and a single wide
(t)
message. We shall start by calculating ai that corresponds to
a narrow message. For this case, d − 1 check node messages
take place in the sums of (10), from which a single message
is narrow and d − 2 are wide. The narrow message arrives
(t)
along the edge with weight ±h1 , and has variance Ṽk,1 → 0.
Substituting in (10), and using (59), we get:
(t)
a(k−1)d+i →
d
X
1
2 j=2
(t)
m̃k,1
−
(t)
m̃k,j
(t)
Ṽk,j
2
+
2
− yk
σ2
(t)
m̃k,1
j6=i
(60)
i
A PPENDIX VI
A SYMPTOTIC B EHAVIOR OF THE A MPLITUDES R ECURSION
A. Proof of Lemma 9
(t)
=
From (10), ai is clearly non-negative. From Sections IVB, IV-C (and the appropriate appendices) it comes out that for
consistent Gaussians, the mean values and variances of the
messages have a finite bounded value and converge to a finite
(t)
steady state value. The excitation term ai depends on these
mean values and variances according to (10), so it is also finite
and bounded, and it converges to a steady state value, where
caution should be taken for the case of a zero approaching
variance. Note that at most a single variance in (10) may
approach zero (as explained in Section IV-B, a single narrow
check node message is used for the generation of narrow
variable node messages, and only wide check node messages
Denote x = Gb. The mean values of the narrow check node
messages converge to the appropriate lattice point coordinates,
(t)
i.e. m̃k,1 → xk . From Theorem 3, the mean value of the wide
variable node message that originates from variable node k
converges to xk +ek , where e denotes the vector of error terms.
The mean value of a wide check node message that arrives to
node k along an edge with weight ±hj can be seen to approach
(t)
m̃k,j = xk − hh1j ep(k,j) , where p(k, j) denotes the index of
the variable node that takes place with coefficient ±h1 in the
check equation where xk takes place with coefficient ±hj .
For convenience of notations, we shall assume that hj already
includes the sign (this sign ambiguity will be resolved later).
The variance of the wide variable node messages converges to
σ 2 (1 − α), so the variance of the wide check node message
that arrives to node k along an edge with weight ±hj can be
(t)
h2
seen to approach Ṽk,j → h21 σ 2 (1 − α). Substituting in (60),
j
and denoting q = y − x, we get:

  a_{(k−1)d+i}^{(t)} → (1/2) [ Σ_{j=2, j≠i}^{d} ((h_1/h_j) e_{p(k,j)})² / ((h_1²/h_j²) σ²(1 − α)) + (x_k − y_k)²/σ² ]
    = (1/(2σ²)) [ (1/(1 − α)) Σ_{j=2, j≠i}^{d} e_{p(k,j)}² + q_k² ]      (61)

Summing over all the narrow messages that leave variable node k, we get:

  Σ_{i=2}^{d} a_{(k−1)d+i}^{(t)} → (1/(2σ²)) [ ((d − 2)/(1 − α)) Σ_{j=2}^{d} e_{p(k,j)}² + (d − 1) q_k² ]      (62)

To complete the calculation of the contribution of node k to the excitation term, we still have to calculate the a_i^{(t)} that corresponds to a wide message. Substituting m̃_{k,j}^{(t)} → x_k − (h_1/h_j) e_{p(k,j)}, Ṽ_{k,j}^{(t)} → (h_1²/h_j²) σ²(1 − α), V̂_{k,1}^{(t)} → σ²(1 − α) in (10), we get:

  a_{(k−1)d+1}^{(t)} → (σ²(1 − α)/2) [ Σ_{l=2}^{d} Σ_{j=l+1}^{d} ((h_1/h_l) e_{p(k,l)} − (h_1/h_j) e_{p(k,j)})² / ((h_1²/h_l²) σ²(1 − α) · (h_1²/h_j²) σ²(1 − α))
    + Σ_{l=2}^{d} (x_k − (h_1/h_l) e_{p(k,l)} − y_k)² / ((h_1²/h_l²) σ²(1 − α) · σ²) ]      (63)

Starting with the first term, we have:

  (σ²(1 − α)/2) Σ_{l=2}^{d} Σ_{j=l+1}^{d} ((h_1/h_l) e_{p(k,l)} − (h_1/h_j) e_{p(k,j)})² / ((h_1²/h_l²) σ²(1 − α) · (h_1²/h_j²) σ²(1 − α))
    = (1/(2σ²(1 − α))) Σ_{l=2}^{d} Σ_{j=l+1}^{d} ((h_j/h_1) e_{p(k,l)} − (h_l/h_1) e_{p(k,j)})²
    = (1/(2σ²(1 − α))) · (1/2) Σ_{j=2}^{d} Σ_{l=2}^{d} [ (h_j²/h_1²) e_{p(k,l)}² + (h_l²/h_1²) e_{p(k,j)}² − 2 (h_j/h_1)(h_l/h_1) e_{p(k,l)} e_{p(k,j)} ]
    = (1/(2σ²(1 − α))) [ α Σ_{j=2}^{d} e_{p(k,j)}² − ( Σ_{j=2}^{d} (h_j/h_1) e_{p(k,j)} )² ]
    = (1/(2σ²(1 − α))) [ α Σ_{j=2}^{d} e_{p(k,j)}² − (F · e)_k² ]      (64)

where F is defined in Theorem 3 and (F · e)_k denotes the k'th element of the vector (F · e). Note that using F solves the sign ambiguity that results from embedding the sign of ±h_j in h_j for convenience of notations, as discussed above.

Turning now to the second term of (63):

  (σ²(1 − α)/2) Σ_{l=2}^{d} (x_k − (h_1/h_l) e_{p(k,l)} − y_k)² / ((h_1²/h_l²) σ²(1 − α) · σ²)
    = (1/(2σ²)) Σ_{l=2}^{d} [ (h_l²/h_1²) q_k² + 2 q_k (h_l/h_1) e_{p(k,l)} + e_{p(k,l)}² ]
    = (1/(2σ²)) [ Σ_{l=2}^{d} e_{p(k,l)}² + α q_k² + 2 q_k (F · e)_k ]
    = (1/(2σ²)) [ Σ_{l=2}^{d} e_{p(k,l)}² + α q_k² + 2 q_k ((1 − α) q_k − e_k) ]
    = (1/(2σ²)) [ Σ_{l=2}^{d} e_{p(k,l)}² + (2 − α) q_k² − 2 q_k e_k ]      (65)

where we have substituted F e → (1 − α) q − e, as comes out from Lemma 7. Again, using F resolves the sign ambiguity of h_j, as discussed above.

Substituting (64) and (65) back in (63), summing the result with (62), and rearranging terms, the total contribution of variable node k to the asymptotic excitation sum term is:

  Σ_{i=1}^{d} a_{(k−1)d+i}^{(t)} → ((d − 1)/(2σ²(1 − α))) Σ_{j=2}^{d} e_{p(k,j)}² + ((d + 1 − α)/(2σ²)) q_k² − (1/(2σ²(1 − α))) (F e)_k² − (1/σ²) q_k e_k      (66)

Summing over all the variable nodes, the total asymptotic excitation sum term is:

  Σ_{i=1}^{nd} a_i^{(t)} = Σ_{k=1}^{n} Σ_{i=1}^{d} a_{(k−1)d+i}^{(t)} → ((d − 1)²/(2σ²(1 − α))) ‖e‖² + ((d + 1 − α)/(2σ²)) ‖q‖² − (1/(2σ²(1 − α))) ‖F e‖² − (1/σ²) q^T e      (67)

where we have used the fact that in a magic square H each variable node index appears as p(k, j) for exactly d − 1 pairs (k, j), so that Σ_{k=1}^{n} Σ_{j=2}^{d} e_{p(k,j)}² = (d − 1)‖e‖².

Substituting e = (1 − α)(I + F)^{−1} q (see Theorem 3), we finally get:

  Σ_{i=1}^{nd} a_i^{(t)} → (1/(2σ²)) q^T W q      (68)

where:

  W ≜ (1 − α) ((I + F)^{−1})^T [ (d − 1)² I − F^T F ] (I + F)^{−1} + (d + 1 − α) I − 2 (1 − α) (I + F)^{−1}      (69)

From (10) it can be seen that Σ_{i=1}^{nd} a_i^{(t)} is positive for every nonzero q. Therefore, W is positive definite. This completes the second part of the proof.
Since a_i^{(t)} is finite and bounded, there exists m_a such that |a_i^{(t)}| ≤ m_a for all 1 ≤ i ≤ nd and t > 0. We then have:

  Σ_{j=0}^{∞} ( Σ_{i=1}^{nd} a_i^{(j)} ) / (d − 1)^{2j+2} ≤ Σ_{j=0}^{∞} nd · m_a / (d − 1)^{2j+2} = n · m_a / (d − 2)

Therefore, for d > 2 the infinite sum will have a finite steady state value. This completes the proof of Lemma 9.
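To make the matrices involved in Lemmas 7 and 9 concrete, the following short numerical sketch (Python/NumPy; it is not part of the original text, and the block length, the cyclic placement of the nonzeros and the generating sequence {1, 0.8, 0.5} are arbitrary illustrative choices) builds a small magic square H, constructs F both by definition (58) and by the alternative row-wise recipe above, and evaluates W of (69). Lemma 9 predicts that q^T W q > 0 for every nonzero q, i.e. that the symmetric part of W has only positive eigenvalues.

# Illustrative sketch (not from the paper): small magic-square H, the matrix F
# of (58) built two ways, and the matrix W of (69).
import numpy as np

rng = np.random.default_rng(0)
n = 8
h = np.array([1.0, 0.8, 0.5])                 # assumed generating sequence {h_1, ..., h_d}
d = len(h)
alpha = np.sum(h[1:] ** 2) / h[0] ** 2        # alpha = sum_{i>=2} h_i^2 / h_1^2 < 1

# H: column i receives h_j (with a random sign) at row (i + j) mod n,
# so every row and every column holds each |h_j| exactly once (a magic square).
H = np.zeros((n, n))
for i in range(n):
    for j in range(d):
        H[(i + j) % n, i] = h[j] * rng.choice([-1.0, 1.0])

# F per (58): for the row r where |H[r, l]| = h_1, set F[k, l] = H[r, k] / H[r, l].
F = np.zeros((n, n))
for r in range(n):
    l = int(np.argmax(np.isclose(np.abs(H[r]), h[0])))
    for k in range(n):
        if k != l and H[r, k] != 0:
            F[k, l] = H[r, k] / H[r, l]

# Alternative construction: build the k'th row of F from the k'th column of H.
F_alt = np.zeros((n, n))
for k in range(n):
    for i in range(1, d):                                  # i = 2, ..., d in the paper
        r_i = int(np.argmax(np.isclose(np.abs(H[:, k]), h[i])))
        l_i = int(np.argmax(np.isclose(np.abs(H[r_i]), h[0])))
        F_alt[k, l_i] = H[r_i, k] / H[r_i, l_i]
assert np.allclose(F, F_alt)                               # both recipes agree

# W of (69); Lemma 9 predicts q^T W q > 0 for q != 0, so the symmetric part
# of W should have a positive smallest eigenvalue.
IF_inv = np.linalg.inv(np.eye(n) + F)
W = ((1 - alpha) * IF_inv.T @ ((d - 1) ** 2 * np.eye(n) - F.T @ F) @ IF_inv
     + (d + 1 - alpha) * np.eye(n) - 2 * (1 - alpha) * IF_inv)
print("min eigenvalue of (W + W^T)/2:", np.linalg.eigvalsh((W + W.T) / 2).min())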
APPENDIX VII
GENERATION OF A PARITY CHECK MATRIX FOR LDLC
In the following pseudo-code description, the i, j element of a matrix P is denoted by P_{i,j} and the k'th column of a matrix P is denoted by P_{:,k}.
# Input: block length n, degree d,
nonzero elements {h_1, h_2, ..., h_d}.
# Output: a magic square LDLC parity check matrix H
with generating sequence {h_1, h_2, ..., h_d}.
# Initialization:
choose d random permutations on {1, 2, ..., n}.
Arrange the permutations in a d × n matrix P
such that each row holds a permutation.
c = 1; # column index
loopless columns = 0; # number of consecutive
# columns without loops
# loop removal:
while loopless columns < n
changed permutation = 0;
if there exist i ≠ j such that P_{i,c} = P_{j,c}
# a 2-loop was found at column c
changed permutation = i;
else
# if there is no 2-loop, look for a 4-loop
if there exists c′ ≠ c such that P_{:,c} and P_{:,c′} have
two or more common elements
# a 4-loop was found at column c
changed permutation = the row of P in which
the first common element appears in column c;
end
end
if changed permutation ≠ 0
# a permutation should be modified to
# remove loop
choose a random integer 1 ≤ i ≤ n;
swap locations c and i in
permutation changed permutation;
loopless columns = 0;
else
# no loop was found at column c
loopless columns = loopless columns + 1;
end
# increase column index
c = c + 1;
if c > n
c = 1;
end
end
# Finally, build H from the permutations
initialize H as an n × n zero matrix;
for i = 1 : n
for j = 1 : d
H_{P_{j,i}, i} = h_j · random sign;
end
end
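For readers who prefer running code, the following is a rough Python/NumPy transcription of the pseudo-code above. It is an illustrative sketch rather than the authors' implementation: indices are 0-based, ties in the loop search are broken arbitrarily, and the block length and generating sequence in the example call are arbitrary choices.

# Sketch of the magic-square LDLC parity-check generation described above.
import numpy as np

def make_ldlc_parity_check(n, h, rng=np.random.default_rng()):
    d = len(h)
    # Initialization: d random permutations of {0, ..., n-1}, one per row of P.
    P = np.array([rng.permutation(n) for _ in range(d)])

    c = 0                       # column index
    loopless_columns = 0        # number of consecutive columns without loops
    while loopless_columns < n:
        changed = None
        # 2-loop: two permutations agree at column c.
        vals, counts = np.unique(P[:, c], return_counts=True)
        if np.any(counts > 1):
            changed = int(np.argmax(P[:, c] == vals[np.argmax(counts > 1)]))
        else:
            # 4-loop: another column shares two or more elements with column c.
            for c2 in range(n):
                if c2 == c:
                    continue
                common = np.intersect1d(P[:, c], P[:, c2])
                if len(common) >= 2:
                    changed = int(np.argmax(P[:, c] == common[0]))
                    break
        if changed is not None:
            # A loop was found: modify the offending permutation.
            i = rng.integers(n)
            P[changed, c], P[changed, i] = P[changed, i], P[changed, c]
            loopless_columns = 0
        else:
            loopless_columns += 1
        c = (c + 1) % n

    # Build H from the permutations: column i gets h_j (random sign) at row P[j, i].
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(d):
            H[P[j, i], i] = h[j] * rng.choice([-1.0, 1.0])
    return H

H = make_ldlc_parity_check(16, [1.0, 0.8, 0.5])   # example sizes only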
APPENDIX VIII
REDUCING THE COMPLEXITY OF THE FFT CALCULATIONS
FFT calculation can be made simpler by using the fact that the convolution is followed by the following steps: the convolution result p̃_j(x) is stretched to p_j(x) = p̃_j(−h_j x) and then periodically extended to Q_j(x) = Σ_{i=−∞}^{∞} p_j(x − i/h_j) (see (3)). It can be seen that the stretching and periodic extension steps can be exchanged, and the convolution result p̃_j(x) can be first periodically extended with period 1 to Q̃_j(x) = Σ_{i=−∞}^{∞} p̃_j(x + i) and then stretched to Q_j(x) = Q̃_j(−h_j x). Now, the infinite sum can be written as a convolution with a sequence of Dirac impulses:

  Q̃_j(x) = Σ_{i=−∞}^{∞} p̃_j(x + i) = p̃_j(x) ∗ Σ_{i=−∞}^{∞} δ(x + i)      (70)
Therefore, the Fourier transform of Q̃j (x) will equal the
Fourier transform of p̃j (x) multiplied by the Fourier transform
of the impulse sequence, which is itself an impulse sequence.
The FFT of Q̃j (x) will therefore have several nonzero values,
separated by sequences of zeros. These nonzero values will
equal the FFT of p̃j (x) after decimation. To ensure an integer
decimation rate, we should choose the PDF resolution ∆ such
that an interval with range 1 (the period of Q̃j (x)) will contain
an integer number of samples, i.e. 1/∆ should be an integer.
Also, we should choose L (the number of samples in Q̃j (x)) to
correspond to a range which equals an integer, i.e. D = L · ∆
should be an integer. Then, we can calculate the (size L) FFT
of p̃j (x) and then decimate by D. The result will give 1/∆
samples which correspond to a single period (with range 1)
of Q̃j (x).
However, instead of calculating an FFT of length L and immediately decimating, we can directly calculate the decimated FFT. Denote the expanded PDF at the convolution input by f̃_k, k = 1, 2, ..., L (where the expanded PDF is zero padded to length L). To generate directly the decimated result, we can first calculate the (size D) FFT of each group of D samples which are generated by decimating f̃_k by L/D = 1/∆. Then, the desired decimated result is the FFT (of size 1/∆) of the sequence of first samples of each FFT of size D. However, the first sample of an FFT is simply the sum of its inputs. Therefore, we should only calculate the sequence (of length 1/∆) g_i = Σ_{k=0}^{D−1} f̃_{i+k/∆}, i = 1, 2, ..., 1/∆, and then calculate the FFT (of length 1/∆) of the result. This is done for all the
expanded PDF’s. Then, d − 1 such results are multiplied, and
an IFFT (of length 1/∆) gives a single period of Q̃j (x).
With this method, instead of calculating d FFT’s and d
IFFT’s of size larger than L, we calculate d FFT’s and d IFFT’s
of size L/D = 1/∆.
In order to generate the final check node message, we
should stretch Q̃j (x) to Qj (x) = Q̃j (−hj x). This can be done
by interpolating a single period of Q̃j (x) using interpolation
methods similar to those that were used in Section VI for
expanding the variable node PDF’s.
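The aliasing identity that underlies this shortcut is easy to check numerically. The following short NumPy sketch (not taken from the paper; the sizes are arbitrary) verifies that keeping every D'th bin of a length-L FFT equals the length-(1/∆) FFT of the folded sequence g_i = Σ_{k=0}^{D−1} f̃_{i+k/∆}, where 1/∆ = L/D samples.

# Numerical check of the folding trick (illustrative sketch, arbitrary sizes).
import numpy as np

L, D = 1024, 8                             # example sizes; M = L // D plays the role of 1/Delta
M = L // D
f = np.random.default_rng(1).random(L)     # stands in for the expanded, zero-padded PDF

# Direct route: full length-L FFT, then keep every D'th bin.
direct = np.fft.fft(f)[::D]

# Cheap route: fold the samples into M bins (sum samples spaced M apart), then
# take a single length-M FFT.
g = f.reshape(D, M).sum(axis=0)
cheap = np.fft.fft(g)

print(np.allclose(direct, cheap))          # True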
ACKNOWLEDGMENT
Support and interesting discussions with Ehud Weinstein are
gratefully acknowledged.
REFERENCES
[1] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, pp. 379-423 and pp. 623-656, July and Oct. 1948.
[2] P. Elias, “Coding for noisy channels,” in IRE Conv. Rec., Mar. 1955, vol.
3, pt. 4, pp. 37-46.
[3] C. E. Shannon, “Probability of error for optimal codes in a Gaussian
channel,” Bell Syst. Tech. J., vol. 38, pp. 611-656, 1959.
[4] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, 1983.
[5] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT
Press, 1963.
[6] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” Proc. IEEE Int. Conf. Communications, pp. 1064–1070, 1993.
[7] R. de Buda, “The upper error bound of a new near-optimal code,” IEEE
Trans. Inform. Theory, vol. IT-21, pp. 441-445, July 1975.
[8] R. de Buda, “Some optimal codes have structure,” IEEE J. Select. Areas
Commun., vol. 7, pp. 893-899, Aug. 1989.
[9] T. Linder, Ch. Schlegel, and K. Zeger, “Corrected proof of de Buda's Theorem,” IEEE Trans. Inform. Theory, pp. 1735-1737, Sept. 1993.
[10] H. A. Loeliger, “Averaging bounds for lattices and linear codes,” IEEE
Trans. Inform. Theory, vol. 43, pp. 1767-1773, Nov. 1997.
[11] R. Urbanke and B. Rimoldi, “Lattice codes can achieve capacity on the
AWGN channel,” IEEE Trans. Inform. Theory, pp. 273-278, Jan. 1998.
[12] U. Erez and R. Zamir, “Achieving 1/2 log(1 + SNR) on the AWGN
channel with lattice encoding and decoding,” IEEE Trans. Inf. Theory,
vol. 50, pp. 2293-2314, Oct. 2004.
[13] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on lattices and cosets,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 177-195, Mar. 1987.
[14] G. D. Forney, Jr., “Coset codes-Part I: Introduction and geometrical
classification,” IEEE Trans. Inform. Theory, pp. 1123-1151, Sept. 1988.
[15] O. Shalvi, N. Sommer and M. Feder, “Signal Codes,” Proceedings of the Information Theory Workshop, 2003, pp. 332–336.
[16] O. Shalvi, N. Sommer and M. Feder, “Signal Codes,” in preparation.
[17] A. Bennatan and D. Burshtein, “Design and analysis of nonbinary LDPC
codes for arbitrary discrete-memoryless channels,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 549–583, Feb. 2006.
[18] J. Hou, P. H. Siegel, L. B. Milstein, and H. D. Pfister, “Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 49, pp. 2141–2155, Sept. 2003.
[19] J. H. Conway and N. J. Sloane, Sphere Packings, Lattices and Groups.
New York: Springer, 1988.
[20] G. Poltyrev, “On coding without restrictions for the AWGN channel,”
IEEE Trans. Inform. Theory, vol. 40, pp. 409-417, Mar. 1994.
[21] A. Papoulis, Probability, Random Variables and Stochastic Processes. McGraw-Hill, second edition, 1984.
[22] Y. Saad, Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics (SIAM), 2nd edition, 2003.
[23] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in
lattices,” IEEE Trans. Inf. Theory, vol. 48, pp. 2201-2214, Aug. 2002.
[24] N. Sommer, M. Feder and O. Shalvi, “Closest point search in lattices using sequential decoding,” Proceedings of the International Symposium on Information Theory (ISIT), 2005, pp. 1053–1057.
[25] H. El Gamal, G. Caire and M. Damen, “Lattice coding and decoding
achieve the optimal diversity-multiplexing tradeoff of MIMO channels,”
IEEE Trans. Inf. Theory, vol. 50, pp. 968–985, Jun. 2004.