1 Low Density Lattice Codes arXiv:0704.1317v1 [cs.IT] 11 Apr 2007 Naftali Sommer, Senior Member, IEEE, Meir Feder, Fellow, IEEE, and Ofir Shalvi, Member, IEEE Abstract— Low density lattice codes (LDLC) are novel lattice codes that can be decoded efficiently and approach the capacity of the additive white Gaussian noise (AWGN) channel. In LDLC a codeword x is generated directly at the n-dimensional Euclidean space as a linear transformation of a corresponding integer message vector b, i.e., x = Gb, where H = G−1 is restricted to be sparse. The fact that H is sparse is utilized to develop a lineartime iterative decoding scheme which attains, as demonstrated by simulations, good error performance within ∼ 0.5dB from capacity at block length of n = 100, 000 symbols. The paper also discusses convergence results and implementation considerations. Index Terms— Lattices, lattice codes, iterative decoding, LDPC. I. I NTRODUCTION If we take a look at the evolution of codes for binary or finite alphabet channels, it was first shown [1] that channel capacity can be achieved with long random codewords. Then, it was found out [2] that capacity can be achieved via a simpler structure of linear codes. Then, specific families of linear codes were found that are practical and have good minimum Hamming distance (e.g. convolutional codes, cyclic block codes, specific cyclic codes such as BCH and Reed-Solomon codes [4]). Later, capacity achieving schemes were found, which have special structures that allow efficient iterative decoding, such as low-density parity-check (LDPC) codes [5] or turbo codes [6]. If we now take a similar look at continuous alphabet codes for the additive white Gaussian noise (AWGN) channel, it was first shown [3] that codes with long random Gaussian codewords can achieve capacity. Later, it was shown that lattice codes can also achieve capacity ([7] – [12]). Lattice codes are clearly the Euclidean space analogue of linear codes. Similarly to binary codes, we could expect that specific practical lattice codes will then be developed. However, there was almost no further progress from that point. Specific lattice codes that were found were based on fixed dimensional classical lattices [19] or based on algebraic error correcting codes [13][14], but no significant effort was made in designing lattice codes directly in the Euclidean space or in finding specific capacity achieving lattice codes. Practical coding schemes for the AWGN channel were based on finite alphabet codes. In [15], “signal codes” were introduced. These are lattice codes, designed directly in the Euclidean space, where the information sequence of integers in , n = 1, 2, ... is encoded by convolving it with a fixed signal pattern gn , n = 1, 2, ...d. Signal codes are clearly analogous to convolutional codes, and The material in this paper was presented in part in the IEEE International Symposium on Information Theory, Seattle, July 2006, and in part in the Inauguration of the UCSD Information Theory and Applications Center, San Diego, Feb. 2006. in particular can work at the AWGN channel cutoff rate with simple sequential decoders. In [16] it is also demonstrated that signal codes can work near the AWGN channel capacity with more elaborated bi-directional decoders. Thus, signal codes provided the first step toward finding effective lattice codes with practical decoders. Inspired by LDPC codes and in the quest of finding practical capacity achieving lattice codes, we propose in this work “Low Density Lattice Codes” (LDLC). 
We show that these codes can approach the AWGN channel capacity with iterative decoders whose complexity is linear in block length. In recent years several schemes were proposed for using LDPC over continuous valued channels by either multilevel coding [18] or by non-binary alphabet (e.g. [17]). Unlike these LDPC based schemes, in LDLC both the encoder and the channel use the same real algebra which is natural for the continuous-valued AWGN channel. This feature also simplifies the convergence analysis of the iterative decoder. The outline of this paper is as follows. Low density lattice codes are first defined in Section II. The iterative decoder is then presented in Section III, followed by convergence analysis of the decoder in Section IV. Then, Section V describes how to choose the LDLC code parameters, and Section VI discusses implementation considerations. The computational complexity of the decoder is then discussed in Section VII, followed by a brief description of encoding and shaping in Section VIII. Simulation results are finally presented in Section IX. II. BASIC D EFINITIONS AND P ROPERTIES A. Lattices and Lattice Codes An n dimensional lattice in Rm is defined as the set of all linear combinations of a given basis of n linearly independent vectors in Rm with integer coefficients. The matrix G, whose columns are the basis vectors, is called a generator matrix of the lattice. Every lattice point is therefore of the form x = Gb, where b is an n-dimensional vector of integers. The Voronoi cell of a lattice point is defined as the set of all points that are closer to this point than to any other lattice point. The Voronoi cells of all lattice points are congruent, and for square G the volume of the Voronoi cell is equal to det(G). In the sequel G will be used to denote both the lattice and its generator matrix. A lattice code of dimension n is defined by a (possibly shifted) lattice G in Rm and a shaping region B ⊂ Rm , where the codewords are all the lattice points that lie within the shaping region B. Denote the number of these codewords by N . The average transmitted power (per channel use, or per symbol) is the average energy of all codewords, divided by the codeword length m. The information rate (in bits/symbol) is log2 (N )/m. 2 When using a lattice code for the AWGN channel with power limit P and noise variance σ 2 , the maximal information rate is limited by the capacity 12 log2 (1 + σP2 ). Poltyrev [20] considered the AWGN channel without restrictions. If there is no power restriction, code rate is a meaningless measure, since it can be increased without limit. Instead, it was suggested in [20] to use the measure of constellation density, leading to a generalized definition of the capacity as the maximal possible codeword density that can be recovered reliably. When applied to lattices, the generalized capacity implies that there exists a lattice G of high enough dimension n that enables transmis2 sion √ with arbitrary small error probability, if and only if σ < n |det(G)|2 . 2πe A lattice that achieves the generalized capacity of the AWGN channel without restrictions, also achieves the channel capacity of the power constrained AWGN channel, with a properly chosen spherical shaping region (see also [12]). In the rest of this work we shall concentrate on the lattice design and the lattice decoding algorithm, and not on the shaping region or shaping algorithms. 
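Since the remainder of the paper assumes |det(G)| = 1, the generalized capacity condition quoted above reduces to a single noise threshold. The short Python sketch below evaluates it and the gap of an operating point from it; the operating noise variance and the dB gap measure are illustrative choices of this sketch, not results from the paper.

import numpy as np

# Poltyrev threshold: with |det(G)| = 1, reliable decoding is possible
# (for a good enough lattice) if and only if sigma^2 < 1 / (2*pi*e).
sigma2_threshold = 1.0 / (2 * np.pi * np.e)
print(sigma2_threshold)                        # ~0.0585

# Gap from the unconstrained capacity, in dB, of a hypothetical operating point.
sigma2_operating = 0.05
print(10 * np.log10(sigma2_threshold / sigma2_operating))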
We shall use lattices with det(G) = 1, where analysis and simulations will be carried for the AWGN channel without restrictions. A capacity achieving lattice will have small error probability for noise 1 . variance σ 2 which is close to the theoretical limit 2πe B. Syndrome and Parity Check Matrix for Lattice Codes A binary (n, k) error correcting code is defined by its n × k binary generator matrix G. A binary information vector b with dimension k is encoded by x = Gb, where calculations are performed in the finite field GF(2). The parity check matrix H is an (n − k) × n matrix such that x is a codeword if and only if Hx = 0. The input to the decoder is the noisy codeword y = x + e, where e is the error sequence and addition is done in the finite field. The decoder typically starts by calculating the syndrome s = Hy = H(x+e) = He which depends only on the noise sequence and not on the transmitted codeword. We would now like to extend the definitions of the parity check matrix and the syndrome to lattice codes. An ndimensional lattice code is defined by its n×n lattice generator matrix G (throughout this paper we assume that G is square, but the results are easily extended to the non-square case). Every codeword is of the form x = Gb, where b is a vector of integers. Therefore, G−1 x is a vector of integers for every codeword x. We define the parity check matrix ∆ for the lattice code as H = G−1 . Given a noisy codeword y = x + w (where w is the additive noise vector, e.g. AWGN, added by real arithmetic), we can then define the syndrome as ∆ s = f rac{Hy}, where f rac{x} is the fractional part of x, defined as f rac{x} = x − bxe, where bxe denotes the nearest integer to x. The syndrome s will be zero if and only if y is a lattice point, since Hy will then be a vector of integers with zero fractional part. For a noisy codeword, the syndrome will equal s = f rac{Hy} = f rac{H(x + w)} = f rac{Hw} and therefore will depend only on the noise sequence and not on the transmitted codeword, as desired. Note that the above definitions of the syndrome and parity check matrix for lattice codes are consistent with the definitions of the dual lattice and the dual code[19]: the dual lattice of a lattice G is defined as the lattice with generator matrix H = G−1 , where for binary codes, the dual code of G is defined as the code whose generator matrix is H, the parity check matrix of G. C. Low Density Lattice Codes We shall now turn to the definition of the codes proposed in this paper - low density lattice codes (LDLC). Definition 1 (LDLC): An n dimensional LDLC is an ndimensional lattice code with a non-singular lattice generator matrix G satisfying |det(G)| = 1, for which the parity check matrix H = G−1 is sparse. The i’th row degree ri , i = 1, 2, ...n is defined as the number of nonzero elements in row i of H, and the i’th column degree ci , i = 1, 2, ...n is defined as the number of nonzero elements in column i of H. Note that in binary LDPC codes, the code is completely defined by the locations of the nonzero elements of H. In LDLC there is another degree of freedom since we also have to choose the values of the nonzero elements of H. Definition 2 (regular LDLC): An n dimensional LDLC is regular if all the row degrees and column degrees of the parity check matrix are equal to a common degree d. 
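The definitions above translate directly into checks that can be run on a candidate parity check matrix. The NumPy sketch below uses a toy 4x4 circulant matrix (an assumption of this sketch, not a construction from the paper) to verify the row and column degrees of a regular degree-3 code, normalize H so that |det(H)| = |det(G)| = 1, and confirm that the syndrome frac{Hy} depends only on the noise.

import numpy as np

# Toy regular parity check matrix: every row and column holds {1, 0.8, 0.5}.
H = np.array([[1.0, 0.0, 0.8, 0.5],
              [0.5, 1.0, 0.0, 0.8],
              [0.8, 0.5, 1.0, 0.0],
              [0.0, 0.8, 0.5, 1.0]])

print((H != 0).sum(axis=1), (H != 0).sum(axis=0))   # row degrees r_i, column degrees c_i

H = H / np.abs(np.linalg.det(H)) ** (1 / 4)         # enforce |det(H)| = |det(G)| = 1
G = np.linalg.inv(H)

frac = lambda v: v - np.round(v)                    # frac{x} = x minus nearest integer
b = np.array([2, -1, 0, 3])                         # integer message vector
w = np.array([0.04, -0.02, 0.05, 0.01])             # small additive noise
y = G @ b + w                                       # noisy codeword
print(np.allclose(frac(H @ y), frac(H @ w)))        # True: syndrome ignores the codeword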
Definition 3 (magic square LDLC): An n dimensional regular LDLC with degree d is called "magic square LDLC" if every row and column of the parity check matrix H has the same d nonzero values, except for a possible change of order and random signs. The sorted sequence of these d values h_1 ≥ h_2 ≥ ... ≥ h_d > 0 will be referred to as the generating sequence of the magic square LDLC. For example, the matrix

         [  0    -0.8   0    -0.5   1     0   ]
         [  0.8   0     0     1     0    -0.5 ]
    H =  [  0     0.5   1     0     0.8   0   ]
         [  0     0    -0.5  -0.8   0     1   ]
         [  1     0     0     0     0.5   0.8 ]
         [  0.5  -1    -0.8   0     0     0   ]

is a parity check matrix of a magic square LDLC with lattice dimension n = 6, degree d = 3 and generating sequence {1, 0.8, 0.5}. This H should be further normalized by the constant |det(H)|^(1/n) in order to have |det(H)| = |det(G)| = 1, as required by Definition 1.

The bipartite graph of an LDLC is defined similarly to LDPC codes: it is a graph with variable nodes at one side and check nodes at the other side. Each variable node corresponds to a single element of the codeword x = Gb. Each check node corresponds to a check equation (a row of H). A check equation is of the form Σ_k h_k x_{i_k} = integer, where i_k denotes the locations of the nonzero elements at the appropriate row of H, h_k are the values of H at these locations and the integer at the right hand side is unknown. An edge connects check node i and variable node j if and only if H_{i,j} ≠ 0. This edge is assigned the value H_{i,j}. Figure 1 illustrates the bi-partite graph of a magic square LDLC with degree 3. In the figure, every variable node x_k is also associated with its noisy channel observation y_k.

Fig. 1. The bi-partite graph of an LDLC (variable nodes x_1, ..., x_n with channel observations y_1, ..., y_n; edges labeled h_1, h_2, h_3).

Finally, a k-loop is defined as a loop in the bipartite graph that consists of k edges. A bipartite graph, in general, can only contain loops with even length. Also, a 2-loop, which consists of two parallel edges that originate from the same variable node to the same check node, is not possible by the definition of the graph. However, longer loops are certainly possible. For example, a 4-loop exists when two variable nodes are both connected to the same pair of check nodes.

III. ITERATIVE DECODING FOR THE AWGN CHANNEL

Assume that the codeword x = Gb was transmitted, where b is a vector of integers. We observe the noisy codeword y = x + w, where w is a vector of i.i.d Gaussian noise samples with common variance σ², and we need to estimate the integer valued vector b. The maximum likelihood (ML) estimator is then b̂ = arg min_b ||y − Gb||².

Our decoder will not estimate directly the integer vector b. Instead, it will estimate the probability density function (PDF) of the codeword vector x. Furthermore, instead of calculating the n-dimensional PDF of the whole vector x, we shall calculate the n one-dimensional PDF's for each of the components x_k of this vector (conditioned on the whole observation vector y). In Appendix I it is shown that f_{x_k|y}(x_k|y) is a weighted sum of Dirac delta functions:

f_{x_k|y}(x_k|y) = C · Σ_{l ∈ G∩B} δ(x_k − l_k) · e^(−d²(l,y)/2σ²)    (1)

where l is a lattice point (vector), l_k is its k-th component, C is a constant independent of x_k and d(l, y) is the Euclidean distance between l and y.

Fig. 2. Tier diagram (tier 1: 6 nodes, tier 2: 24 nodes).

Direct evaluation of (1) is not practical, so our decoder will try to estimate f_{x_k|y}(x_k|y) (or at least approximate it) in an iterative manner.
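To make (1) concrete: for a tiny lattice one can enumerate integer vectors in a box and weight each lattice point l = Gb by e^(−d²(l,y)/2σ²); the heaviest point is the ML decision. The sketch below is purely illustrative (a 3-dimensional toy lattice, arbitrary noise level and search box, all assumptions of this sketch); the point of the iterative decoder is precisely to avoid this exponential enumeration.

import numpy as np
from itertools import product

H = np.array([[1.0, 0.0, 0.8],
              [0.5, 1.0, 0.0],
              [0.0, 0.5, 1.0]])               # toy parity check matrix
G = np.linalg.inv(H)
sigma = 0.1
rng = np.random.default_rng(7)

b_true = np.array([1, -2, 0])
y = G @ b_true + sigma * rng.standard_normal(3)

# Weight every candidate lattice point by exp(-d^2(l, y) / 2 sigma^2), as in (1).
candidates = [np.array(b) for b in product(range(-4, 5), repeat=3)]
weights = [np.exp(-np.sum((y - G @ b) ** 2) / (2 * sigma ** 2)) for b in candidates]
b_ml = candidates[int(np.argmax(weights))]
print(b_ml, b_true)                            # typically identical at this noise level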
Our decoder will decode to the infinite lattice, thus ignoring the shaping region boundaries. This approximate decoding method is no longer exact maximum likelihood decoding, and is usually denoted "lattice decoding" [12].

The calculation of f_{x_k|y}(x_k|y) is involved since the components x_k are not independent random variables (RV's), because x is restricted to be a lattice point. Following [5] we use a "trick" - we assume that the x_k's are independent, but add a condition that assures that x is a lattice point. Specifically, define s ≜ H · x. Restricting x to be a lattice point is equivalent to restricting s ∈ Z^n. Therefore, instead of calculating f_{x_k|y}(x_k|y) under the assumption that x is a lattice point, we can calculate f_{x_k|y}(x_k|y, s ∈ Z^n) and assume that the x_k are independent and identically distributed (i.i.d) with a continuous PDF (that does not include Dirac delta functions).

It still remains to set f_{x_k}(x_k), the PDF of x_k. Under the i.i.d assumption, the PDF of the codeword x is f_x(x) = Π_{k=1}^n f_{x_k}(x_k). As shown in Appendix II, the value of f_x(x) is not important at values of x which are not lattice points, but at a lattice point it should be proportional to the probability of using this lattice point. Since we assume that all lattice points are used equally likely, f_x(x) must have the same value at all lattice points. A reasonable choice for f_{x_k}(x_k) is then to use a uniform distribution such that x will be uniformly distributed in an n-dimensional cube. For an exact ML decoder (that takes into account the boundaries of the shaping region), it is enough to choose the range of f_{x_k}(x_k) such that this cube will contain the shaping region. For our decoder, that performs lattice decoding, we should set the range of f_{x_k}(x_k) large enough such that the resulting cube will include all the lattice points which are likely to be decoded. The derivation of the iterative decoder shows that this range can be set as large as needed without affecting the complexity of the decoder.

The derivation in [5] further imposed the tree assumption. In order to understand the tree assumption, it is useful to define the tier diagram, which is shown in Figure 2 for a regular LDLC with degree 3. Each vertical line corresponds to a check equation. The tier 1 nodes of x_1 are all the elements x_k that take place in a check equation with x_1. The tier 2 nodes of x_1 are all the elements that take place in check equations with the tier 1 elements of x_1, and so on. The tree assumption assumes that all the tree elements are distinct (i.e. no element appears in different tiers or twice in the same tier). This assumption simplifies the derivation, but in general, does not hold in practice, so our iterative algorithm is not guaranteed to converge to the exact solution (1) (see Section IV).

The detailed derivation of the iterative decoder (using the above "trick" and the tree assumption) is given in Appendix III. In Section III-A below we present the final resulting algorithm. This iterative algorithm can also be explained by intuitive arguments, described after the algorithm specification.

A. The Iterative Decoding Algorithm

The iterative algorithm is most conveniently represented by using a message passing scheme over the bipartite graph of the code, similarly to LDPC codes. The basic difference is that in LDPC codes the messages are scalar values (e.g. the log likelihood ratio of a bit), where for LDLC the messages are real functions over the interval (−∞, ∞). As in LDPC, in each iteration the check nodes send messages to the variable nodes along the edges of the bipartite graph and vice versa. The messages sent by the check nodes are periodic extensions of PDF's. The messages sent by the variable nodes are PDF's.

LDLC iterative decoding algorithm:

Denote the variable nodes by x_1, x_2, ..., x_n and the check nodes by c_1, c_2, ..., c_n.

• Initialization: each variable node x_k sends to all its check nodes the message f_k^(0)(x) = (1/√(2πσ²)) · e^(−(y_k−x)²/2σ²).

• Basic iteration - check node message: Each check node sends a (different) message to each of the variable nodes that are connected to it. For a specific check node denote (without loss of generality) the appropriate check equation by Σ_{l=1}^r h_l x_{m_l} = integer, where x_{m_l}, l = 1, 2, ..., r are the variable nodes that are connected to this check node (and r is the appropriate row degree of H). Denote by f_l(x), l = 1, 2, ..., r, the message that was sent to this check node by variable node x_{m_l} in the previous half-iteration. The message that the check node transmits back to variable node x_{m_j} is calculated in three basic steps.

1) The convolution step - all messages, except f_j(x), are convolved (after expanding each f_l(x) by h_l):

p̃_j(x) = f_1(x/h_1) ⊛ ··· ⊛ f_{j−1}(x/h_{j−1}) ⊛ f_{j+1}(x/h_{j+1}) ⊛ ··· ⊛ f_r(x/h_r)    (2)

2) The stretching step - the result is stretched by (−h_j) to p_j(x) = p̃_j(−h_j x).

3) The periodic extension step - the result is extended to a periodic function with period 1/|h_j|:

Q_j(x) = Σ_{i=−∞}^{∞} p_j(x − i/h_j)    (3)

The function Q_j(x) is the message that is finally sent to variable node x_{m_j}.

• Basic iteration - variable node message: Each variable node sends a (different) message to each of the check nodes that are connected to it. For a specific variable node x_k, assume that it is connected to check nodes c_{m_1}, c_{m_2}, ..., c_{m_e}, where e is the appropriate column degree of H. Denote by Q_l(x), l = 1, 2, ..., e, the message that was sent from check node c_{m_l} to this variable node in the previous half-iteration. The message that is sent back to check node c_{m_j} is calculated in two basic steps:

1) The product step: f̃_j(x) = e^(−(y_k−x)²/2σ²) · Π_{l=1, l≠j}^{e} Q_l(x)

2) The normalization step: f_j(x) = f̃_j(x) / ∫_{−∞}^{∞} f̃_j(x) dx

• This basic iteration is repeated for the desired number of iterations.

• Final decision: After finishing the iterations, we want to estimate the integer information vector b. First, we estimate the final PDF's of the codeword elements x_k, k = 1, 2, ..., n, by calculating the variable node messages at the last iteration without omitting any check node message in the product step: f̃_final^(k)(x) = e^(−(y_k−x)²/2σ²) · Π_{l=1}^{e} Q_l(x). Then, we estimate each x_k by finding the peak of its PDF: x̂_k = arg max_x f̃_final^(k)(x). Finally, we estimate b as b̂ = ⌊H x̂⌉, i.e. by rounding each element of H x̂ to the nearest integer.

The operation of the iterative algorithm can be intuitively explained as follows. The check node operation is equivalent to calculating the PDF of x_{m_j} from the PDF's of x_{m_i}, i = 1, 2, ..., j−1, j+1, ..., r, given that Σ_{l=1}^r h_l x_{m_l} = integer, and assuming that the x_{m_i} are independent. Extracting x_{m_j} from the check equation, we get x_{m_j} = (1/h_j)(integer − Σ_{l=1, l≠j}^r h_l x_{m_l}). Since the PDF of a sum of independent random variables is the convolution of the corresponding PDF's, equation (2) and the stretching step that follows it simply calculate the PDF of x_{m_j}, assuming that the integer at the right hand side of the check equation is zero. The result is then periodically extended such that a properly shifted copy exists for every possible value of this (unknown) integer. The variable node gets such a message from all the check equations that involve the corresponding variable. The check node messages and the channel PDF are treated as independent sources of information on the variable, so they are multiplied all together.

Note that the periodic extension step at the check nodes is equivalent to a convolution with an infinite impulse train. With this observation, the operation of the variable nodes is completely analogous to that of the check nodes: the variable nodes multiply the incoming messages by the channel PDF, where the check nodes convolve the incoming messages with an impulse train, which can be regarded as a generalized "integer PDF".

In the above formulation, the integer information vector b is recovered from the PDF's of the codeword elements x_k. An alternative approach is to calculate the PDF of each integer element b_m directly as the PDF of the left hand side of the appropriate check equation. Using the tree assumption, this can be done by simply calculating the convolution p̃(x) as in (2), but this time without omitting any PDF, i.e. all the received variable node messages are convolved. Then, the integer b_m is determined by b̂_m = arg max_{j∈Z} p̃(j).

Fig. 3. Signals at variable node: the channel PDF, check node messages #1-#4, and the final variable node message.

Figure 3 shows an example for a regular LDLC with degree d = 5. The figure shows all the signals that are involved in generating a variable node message for a certain variable node. The top signal is the channel Gaussian, centered around the noisy observation of the variable. The next 4 signals are the periodically extended PDF's that arrived from the check nodes, and the bottom signal is the product of all the 5 signals. It can be seen that each periodic signal has a different period, according to the relevant coefficient of H. Also, the signals with larger period have larger variance. This diversity resolves all the ambiguities such that the multiplication result (bottom plot) remains with a single peak. We expect the iterative algorithm to converge to a solution where a single peak will remain at each PDF, located at the desired value and narrow enough to estimate the information.

IV. CONVERGENCE

A. The Gaussian Mixture Model

Interestingly, for LDLC we can come up with a convergence analysis that in many respects is more specific than the similar analysis for LDPC. We start by introducing basic claims about Gaussian PDF's. Denote G_{m,V}(x) = (1/√(2πV)) · e^(−(x−m)²/2V).

Claim 1 (convolution of Gaussians): The convolution of n Gaussians with mean values m_1, m_2, ..., m_n and variances V_1, V_2, ..., V_n, respectively, is a Gaussian with mean m_1 + m_2 + ... + m_n and variance V_1 + V_2 + ... + V_n.
Proof: See [21].

Claim 2 (product of n Gaussians): Let G_{m_1,V_1}(x), G_{m_2,V_2}(x), ..., G_{m_n,V_n}(x) be n Gaussians with mean values m_1, m_2, ..., m_n and variances V_1, V_2, ..., V_n, respectively. Then, the product of these Gaussians is a scaled Gaussian: Π_{i=1}^n G_{m_i,V_i}(x) = Â · G_{m̂,V̂}(x), where V̂ = (Σ_{i=1}^n 1/V_i)^(−1), m̂ = V̂ · Σ_{i=1}^n m_i/V_i, and Â = (1/√((2π)^(n−1) V̂^(−1) Π_{k=1}^n V_k)) · e^(−(V̂/2) Σ_{i=1}^n Σ_{j=i+1}^n (m_i−m_j)²/(V_i·V_j)).
Proof: By straightforward mathematical manipulations.
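Claims 1 and 2 reduce the decoder's continuous operations to simple parameter updates whenever a message is a single Gaussian: convolution adds means and variances, a product is a precision-weighted combination with the scaling factor Â, and the periodic extension of step (3) just replicates a Gaussian at spacing 1/|h_j|. A minimal NumPy sketch of these three primitives follows; truncating the replica train is a choice of this sketch, not of the paper.

import numpy as np

def convolve_gaussians(means, variances):
    # Claim 1: the convolution of Gaussians is a Gaussian; means and variances add.
    return sum(means), sum(variances)

def product_of_gaussians(means, variances):
    # Claim 2: the product of Gaussians is A_hat * G_{m_hat, V_hat}(x) with
    # V_hat = (sum_i 1/V_i)^-1 and m_hat = V_hat * sum_i m_i / V_i.
    m, V = np.asarray(means, float), np.asarray(variances, float)
    V_hat = 1.0 / np.sum(1.0 / V)
    m_hat = V_hat * np.sum(m / V)
    pair_term = sum((m[i] - m[j]) ** 2 / (V[i] * V[j])
                    for i in range(len(m)) for j in range(i + 1, len(m)))
    log_A_hat = (-0.5 * V_hat * pair_term
                 - 0.5 * np.log((2 * np.pi) ** (len(m) - 1) / V_hat * np.prod(V)))
    return m_hat, V_hat, log_A_hat

def periodic_extension(mean, var, h, n_replicas=10):
    # Step (3): replicate the Gaussian at spacing 1/|h| (truncated replica train).
    return [(mean + i / h, var) for i in range(-n_replicas, n_replicas + 1)]

print(convolve_gaussians([0.2, -0.7, 1.1], [0.05, 0.05, 0.02]))
print(product_of_gaussians([0.21, 0.18, 0.25], [0.05, 0.10, 0.20]))
print(periodic_extension(0.3, 0.01, h=0.8)[:3])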
The reason that we are interested in the properties of Gaussian PDF’s lies in the following lemma. Lemma 1: Each message that is exchanged between the check nodes and variable nodes in the LDLC decoding algorithm (i.e. Qj (x) and fj (x)), at every iteration, can be expressed as a Gaussian mixture of the form M (x) = P ∞ j=1 Aj Gmj ,Vj (x). Proof: By induction: The initial messages are Gaussians, and the basic operations of the iterative decoder preserve the Gaussian mixture nature of Gaussian mixture inputs (convolution and multiplication preserve the Gaussian nature according to claims 1 and 2, stretching, expanding and shifting preserve it by the definition of a Gaussian, and periodic extension transforms a single Gaussian to a mixture and a mixture to a mixture). Convergence analysis should therefore analyze the convergence of the variances, mean values and amplitudes of the Gaussians in each mixture. B. Convergence of the Variances We shall now analyze the behavior of the variances, and start with the following lemma. Lemma 2: For both variable node messages and check node messages, all the Gaussians that take place in the same mixture have the same variance. Proof: By induction. The initial variable node messages are single element mixtures so the claim obviously holds. Assume now that all the variable node messages at iteration t are mixtures where all the Gaussians that take place in the same mixture have the same variance. In the convolution step (2), each variable node message is first expanded. All Gaussians in the expanded mixture will still have the same variance, since the whole mixture is expanded together. Then, d − 1 expanded Gaussian mixtures are convolved. In the resulting mixture, each Gaussian will be the result of convolving d − 1 single Gaussians, one from each mixture. According to claim 1, all the Gaussians in the convolution result will have the same variance, which will equal the sum of the d−1 variances of the expanded messages. The stretching and periodic extension (3) do not change the equal variance property, so it holds for the final check node messages. The variable nodes multiply d − 1 check node messages. Each Gaussian in the resulting mixture is a product of d − 1 single Gaussians, one from each mixture, and the channel noise Gaussian. According to claim 2, they will all have the same variance. The final normalization step does not change the variances so the equal variance property is kept for the final variable node messages at iteration t + 1. Until this point we did not impose any restrictions on the LDLC. From now on, we shall restrict ourselves to magic square regular LDLC (see Definition 3). The basic iterative equations that relate the variances at iteration t + 1 to the variances at iteration t are summarized in the following two lemmas. Lemma 3: For magic square LDLC, variable node messages that are sent at the same iteration along edges with the same absolute value have the same variance. Proof: See Appendix IV. Lemma 4: For magic square LDLC with degree d, denote the variance of the messages that are sent at iteration t 6 (t) along edges with weight ±hl by Vl . The variance values (t) (t) (t) V1 , V2 , ..., Vd obey the following recursion: 1 (t+1) = Vi d X 1 h2m + P σ 2 m=1 dj=1 h2 V (t) m6=i j j6=m (4) j (0) (0) for i = 1, 2, ...d, with initial conditions V1 = V2 = ... = (0) Vd = σ 2 . Proof: See Appendix IV. 
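The recursion of Lemma 4, written out as 1/V_i^(t+1) = Σ_{m≠i} h_m²/(Σ_{j≠m} h_j² V_j^(t)) + 1/σ², is easy to iterate numerically. The sketch below runs it for the generating sequence {1, 0.8, 0.5}; the value of σ² is an arbitrary choice for illustration. It anticipates the behavior established in the theorem that follows: V_1 settles at σ²(1 − α) while the remaining variances decay to zero roughly like α^t.

import numpy as np

h = np.array([1.0, 0.8, 0.5])                 # generating sequence, h_1 >= ... > 0
sigma2 = 0.06                                 # channel noise variance (arbitrary here)
d = len(h)
alpha = np.sum(h[1:] ** 2) / h[0] ** 2        # alpha = (h_2^2 + ... + h_d^2) / h_1^2

V = np.full(d, sigma2)                        # initial conditions: V_i^(0) = sigma^2
for _ in range(100):
    V_new = np.empty(d)
    for i in range(d):
        inv_v = 1.0 / sigma2
        for m in range(d):
            if m != i:
                inv_v += h[m] ** 2 / sum(h[j] ** 2 * V[j] for j in range(d) if j != m)
        V_new[i] = 1.0 / inv_v
    V = V_new

print(alpha, V, sigma2 * (1 - alpha))         # V[0] -> sigma^2 (1 - alpha); others -> 0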
For illustration, the recursion for the case d = 3 is:

1/V_1^(t+1) = h_2²/(h_1² V_1^(t) + h_3² V_3^(t)) + h_3²/(h_1² V_1^(t) + h_2² V_2^(t)) + 1/σ²
1/V_2^(t+1) = h_1²/(h_2² V_2^(t) + h_3² V_3^(t)) + h_3²/(h_1² V_1^(t) + h_2² V_2^(t)) + 1/σ²
1/V_3^(t+1) = h_1²/(h_2² V_2^(t) + h_3² V_3^(t)) + h_2²/(h_1² V_1^(t) + h_3² V_3^(t)) + 1/σ²    (5)

The lemmas above are used to prove the following theorem regarding the convergence of the variances.

Theorem 1: For a magic square LDLC with degree d and generating sequence h_1 ≥ h_2 ≥ ... ≥ h_d > 0, define α ≜ (Σ_{i=2}^d h_i²)/h_1². Assume that α < 1. Then:
1) The first variance approaches a constant value of σ²(1 − α), where σ² is the channel noise variance: V_1^(∞) ≜ lim_{t→∞} V_1^(t) = σ²(1 − α).
2) The other variances approach zero: V_i^(∞) ≜ lim_{t→∞} V_i^(t) = 0 for i = 2, 3, ..., d.
3) The asymptotic convergence rate of all variances is exponential: 0 < lim_{t→∞} (V_i^(t) − V_i^(∞))/α^t < ∞ for i = 1, 2, ..., d.
4) The zero approaching variances are upper bounded by the decaying exponential σ²α^t: V_i^(t) ≤ σ²α^t for i = 2, 3, ..., d and t ≥ 0.
Proof: See Appendix IV.

If α ≥ 1, the variances may still converge, but the convergence rate may be as slow as o(1/t), as illustrated in Appendix IV.

Convergence of the variances to zero implies that the Gaussians approach impulses. This is a desired property of the decoder, since the exact PDF that we want to calculate is indeed a weighted sum of impulses (see (1)). It can be seen that by designing a code with α < 1, i.e. h_1² > Σ_{i=2}^d h_i², one variance approaches a constant (and not zero). However, all the other variances approach zero, where all variances converge in an exponential rate. This will be the preferred mode because the information can be recovered even if a single variance does not decay to zero, where exponential convergence is certainly preferred over slow 1/t convergence. Therefore, from now on we shall restrict our analysis to magic square LDLC with α < 1.

Theorem 1 shows that every iteration, each variable node will generate d − 1 messages with variances that approach zero, and a single message with variance that approaches a constant. The message with nonzero variance will be transmitted along the edge with largest weight (i.e. h_1). However, from the derivation of Appendix IV it can be seen that the opposite happens for the check nodes: each check node will generate d − 1 messages with variances that approach a constant, and a single message with variance that approaches zero. The check node message with zero approaching variance will be transmitted along the edge with largest weight.

C. Convergence of the Mean Values

The reason that the messages are mixtures and not single Gaussians lies in the periodic extension step (3) at the check nodes, and every Gaussian at the output of this step can be related to a single index of the infinite sum. Therefore, we can label each Gaussian at iteration t with a list of all the indices that were used in (3) during its creation process in iterations 1, 2, ..., t.

Definition 4 (label of a Gaussian): The label of a Gaussian consists of a sequence of triplets of the form {t, c, i}, where t is an iteration index, c is a check node index and i is an integer. The labels are initialized to the empty sequence.
Then, the labels are updated along each iteration according to the following update rules: 1) In the periodic extension step (3), each Gaussian in the output periodic mixture is assigned the label of the specific Gaussian of pj (x) that generated it, concatenated with a single triplet {t, c, i}, where t is the current iteration index, c is the check node index and i is the index in the infinite sum of (3) that corresponds to this Gaussian. 2) In the convolution step and the product step, each Gaussian in the output mixture is assigned a label that equals the concatenation of all the labels of the specific Gaussians in the input messages that formed this Gaussian. 3) The stretching and normalization steps do not alter the label of each Gaussian: Each Gaussian in the stretched/normalized mixture inherits the label of the appropriate Gaussian in the original mixture. Definition 5 (a consistent Gaussian): A Gaussian in a mixture is called “[ta , tb ] consistent” if its label contains no contradictions for iterations ta to tb , i.e. for every pair of triplets {t1 , c1 , i1 }, {t2 , c2 , i2 } such that ta ≤ t1 , t2 ≤ tb , if c1 = c2 then i1 = i2 . A [0, ∞] consistent Gaussian will be simply called a consistent Gaussian. We can relate every consistent Gaussian to a unique integer vector b ∈ Zn , which holds the n integers used in the n check nodes. Since in the periodic extension step (3) the sum is taken over all integers, a consistent Gaussian exists in each variable node message for every possible integer valued vector b ∈ Zn . We shall see later that this consistent Gaussian corresponds to the lattice point Gb. 7 According to Theorem 1, if we choose the nonzero values of H such that α < 1, every variable node generates d − 1 messages with variances approaching zero and a single message with variance that approaches a constant. We shall refer to these messages as “narrow” messages and “wide” messages, respectively. For a given integer valued vector b, we shall concentrate on the consistent Gaussians that relate to b in all the nd variable node messages that are generated in each iteration (a single Gaussian in each message). The following lemmas summarize the asymptotic behavior of the mean values of these consistent Gaussians for the narrow messages. Lemma 5: For a magic square LDLC with degree d and α < 1, consider the d − 1 narrow messages that are sent from a specific variable node. Consider further a single Gaussian in each message, which is the consistent Gaussian that relates to a given integer vector b. Asymptotically, the mean values of these d − 1 Gaussians become equal. Proof: See Appendix V. Lemma 6: For a magic square LDLC with dimension n, degree d and α < 1, consider only consistent Gaussians that relate to a given integer vector b and belong to narrow messages. Denote the common mean value of the d − 1 such Gaussians that are sent from variable node i at iteration t by (t) mi , and arrange all these mean values in a column vector ∆ m(t) of dimension n. Define the error vector e(t) = m(t) − x, where x = Gb is the lattice point that corresponds to b. Then, for large t, e(t) satisfies: e(t+1) ≈ −H̃ · e(t) (6) where H̃ is derived from H by permuting the rows such that the ±h1 elements will be placed on the diagonal, dividing each row by the appropriate diagonal element (h1 or −h1 ), and then nullifying the diagonal. Proof: See Appendix V. 
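The matrix H̃ of Lemma 6 is mechanical to build, and whether recursion (6) contracts is a plain spectral-radius check. The sketch below applies the construction to the (unnormalized) 6x6 example of Section II-C; the printed value only shows whether the condition happens to hold for this particular toy matrix, no claim is made here about its size.

import numpy as np

# The (unnormalized) 6x6 example parity check matrix from Section II-C; h_1 = 1.
H = np.array([[ 0.0, -0.8,  0.0, -0.5,  1.0,  0.0],
              [ 0.8,  0.0,  0.0,  1.0,  0.0, -0.5],
              [ 0.0,  0.5,  1.0,  0.0,  0.8,  0.0],
              [ 0.0,  0.0, -0.5, -0.8,  0.0,  1.0],
              [ 1.0,  0.0,  0.0,  0.0,  0.5,  0.8],
              [ 0.5, -1.0, -0.8,  0.0,  0.0,  0.0]])

def build_H_tilde(H, h1=1.0):
    # Lemma 6: permute the rows so that the +-h_1 entries sit on the diagonal,
    # divide each row by its diagonal element, then nullify the diagonal.
    n = H.shape[0]
    Ht = np.zeros_like(H)
    for r in range(n):
        c = int(np.flatnonzero(np.isclose(np.abs(H[r]), h1))[0])
        Ht[c] = H[r] / H[r, c]
    np.fill_diagonal(Ht, 0.0)
    return Ht

H_tilde = build_H_tilde(H)
# Recursion (6), e(t+1) ~ -H_tilde e(t), contracts iff every eigenvalue has magnitude < 1.
print(np.max(np.abs(np.linalg.eigvals(H_tilde))))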
We can now state the following theorem, which describes the conditions for convergence and the steady state value of the mean values of the consistent Gaussians of the narrow variable node messages. Theorem 2: For a magic square LDLC with α < 1, the mean values of the consistent Gaussians of the narrow variable node messages that relate to a given integer vector b are assured to converge if and only if all the eigenvalues of H̃ have magnitude less than 1, where H̃ is defined in Lemma 6. When this condition is fulfilled, the mean values converge to the coordinates of the appropriate lattice point: m(∞) = G · b. Proof: Immediate from Lemma 6. Note that without adding random signs to the LDLC nonzero values, the P all-ones vector will be an eigenvector of d hi , which may exceed 1. H̃ with eigenvalue i=2 h1 Interestingly, recursion (6) is also obeyed by the error of the Jacobi method for solving systems of sparse linear equations [22] (see also Section VIII-A), when it is used to solve Hm = b (with solution m = Gb). Therefore, the LDLC decoder can be viewed as a superposition of Jacobi solvers, one for each possible value of the integer valued vector b. We shall now turn to the convergence of the mean values of the wide messages. The asymptotic behavior is summarized in the following lemma. Lemma 7: For a magic square LDLC with dimension n and α < 1, consider only consistent Gaussians that relate to a given integer vector b and belong to wide messages. Denote the mean value of such a Gaussian that is sent from variable node i at (t) iteration t by mi , and arrange all these mean values in a column vector m(t) of dimension n. Define the error vector ∆ e(t) = m(t) − Gb. Then, for large t, e(t) satisfies: e(t+1) ≈ −F · e(t) + (1 − α)(y − Gb) (7) where y is the noisy codeword and F is an n × n matrix defined by:  Hr,k  Hr,l if k 6= l and there exist a row r of H Fk,l = for which |Hr,l | = h1 and Hr,k 6= 0  0 otherwise (8) Proof: See Appendix V, where an alternative way to construct F from H is also presented. The conditions for convergence and steady state solution for the wide messages are described in the following theorem. Theorem 3: For a magic square LDLC with α < 1, the mean values of the consistent Gaussians of the wide variable node messages that relate to a given integer vector b are assured to converge if and only if all the eigenvalues of F have magnitude less than 1, where F is defined in Lemma 7. When this condition is fulfilled, the steady state solution is m(∞) = G · b + (1 − α)(I + F )−1 (y − G · b). Proof: Immediate from Lemma 7. Unlike the narrow messages, the mean values of the wide messages do not converge to the appropriate lattice point coordinates. The steady state error depends on the difference between the noisy observation and the lattice point, as well as on α, and it decreases to zero as α → 1. Note that the final PDF of a variable is generated by multiplying all the d check node messages that arrive to the appropriate variable node. d − 1 of these messages are wide, and therefore their mean values have a steady state error. One message is narrow, so it converges to an impulse at the lattice point coordinate. Therefore, the final product will be an impulse at the correct location, where the wide messages will only affect the magnitude of this impulse. 
As long as the mean values errors are not too large (relative to the width of the wide messages), this should not cause an impulse that corresponds to a wrong lattice point to have larger amplitude than the correct one. However, for large noise, these steady state errors may cause the decoder to deviate from the ML solution (As explained in Section IV-D). To summarize the results for the mean values, we considered the mean values of all the consistent Gaussians that correspond to a given integer vector b. A single Gaussian of this form exists in each of the nd variable node messages that are generated in each iteration. For each variable node, d − 1 messages are narrow (have variance that approaches zero) and a single message is wide (variance approaches a constant). Under certain conditions on H, the mean values of all the narrow messages converge to the appropriate coordinate of the lattice point Gb. Under additional conditions on H, the 8 mean values of the wide messages converge, but the steady state values contain an error term. We analyzed the behavior of consistent Gaussian. It should be noted that there are many more non-consistent Gaussians. Furthermore non-consistent Gaussians are generated in each iteration for any existing consistent Gaussian. We conjecture that unless a Gaussian is consistent, or becomes consistent along the iterations, it fades out, at least at noise conditions where the algorithm converges. The reason is that nonconsistency in the integer values leads to mismatch in the corresponding PDF’s, and so the amplitude of that Gaussian is attenuated. We considered consistent Gaussians which correspond to a specific integer vector b, but such a set of Gaussians exists for every possible choice of b, i.e. for every lattice point. Therefore, the narrow messages will converge to a solution that has an impulse at the appropriate coordinate of every lattice point. This resembles the exact solution (1), so the key for proper convergence lies in the amplitudes: we would like the consistent Gaussians of the ML lattice point to have the largest amplitude for each message. D. Convergence of the Amplitudes We shall now analyze the behavior of the amplitudes of consistent Gaussians (as discussed later, this is not enough for complete convergence analysis, but it certainly gives insight to the nature of the convergence process and its properties). The behavior of the amplitudes of consistent Gaussians is summarized in the following lemma. Lemma 8: For a magic square LDLC with dimension n, degree d and α < 1, consider the nd consistent Gaussians that relate to a given integer vector b in the variable node messages that are sent at iteration t (one consistent Gaussian per message). Denote the amplitudes of these Gaussians by (t) (t) pi , i = 1, 2, ...nd, and define the log-amplitude as li = (t) log pi . Arrange these nd log-amplitudes in a column vector (t) l , such that element (k − 1)d + i corresponds to the message that is sent from variable node k along an edge with weight ±hi . Assume further that the bipartite graph of the LDLC contains no 4-loops. Then, the log-amplitudes satisfy the following recursion: l(t+1) = A · l(t) − a(t) − c(t) (9) with initialization l(0) = 0. A is an nd × nd matrix which is all zeros except exactly (d − 1)2 ’1’s in each row and each column. 
The element of the excitation vector a(t) at location (k − 1)d + i (where k = 1, 2, ...n and i = 1, 2, ...d) equals: (t) a(k−1)d+i =  = (t) V̂k,i d X   2   d X l=1 j=l+1 l6=i j6=i (t) (t) (t) m̃k,l − (t) Ṽk,l (t) m̃k,j (t) · Ṽk,j (10) 2 + d X l=1 l6=i  2  − yk    (t) σ 2 · Ṽk,l (t) m̃k,l where m̃k,l and Ṽk,l denote the mean value and variance of the consistent Gaussian that relates to the integer vector b in the check node message that arrives to variable node k at iteration t along an edge with weight ±hl . yk is the (t) ∆ noisy channel observation of variable node k, and V̂k,i = −1  Pd 1 1 . Finally, c(t) is a constant excitation l=1 (t) σ2 + l6=i Ṽk,l term that is independent of the integer vector b (i.e. is the same for all consistent Gaussians). Note that an iteration is defined as sending variable node messages, followed by sending check node messages. The first iteration (where the variable nodes send the initialization PDF) is regarded as iteration 0. Proof: At the check node, the amplitude of a Gaussian at the convolution output is the product of the amplitudes of the corresponding Gaussians in the appropriate variable node messages. At the variable node, the amplitude of a Gaussian at the product output is the product of the amplitudes of the corresponding Gaussians in the appropriate check node messages, multiplied by the Gaussian scaling term, according to claim 2. Since we assume that the bipartite graph of the LDLC contains no 4-loops, an amplitude of a variable node message at iteration t will therefore equal the product of (d − 1)2 amplitudes of Gaussians of variable node messages from iteration t − 1, multiplied by the Gaussian scaling term. This proves (9) and shows that A has (d−1)2 ’1’s in every row. However, since each variable node message affects exactly (d − 1)2 variable node messages of the next iteration, A must also have (d − 1)2 ’1’s in every column. The total excitation term −a(t) −c(t) corresponds to the logarithm of the Gaussian scaling term. Each element of this scaling term results from the product of d − 1 check node Gaussians and the channel Gaussian, according to claim 2. This scaling term sums over all the pairs of Gaussians, and in (10) the sum is separated to pairs that include the channel Gaussian and pairs that do not. The total excitation is divided between (10), which depends on the choice of the integer vector b, and c(t) , which includes all the constant terms that are independent on b (including the normalization operation which is performed at the variable node). Since there are exactly (d − 1)2 ’1’s in each column of the matrix A, it is easy to see that the all-ones vector is an eigenvector of A, with eigenvalue (d − 1)2 . If d > 2, this eigenvalue is larger than 1, meaning that the recursion (9) is non-stable. It can be seen that the excitation term a(t) has two components. The first term sums the squared differences between the mean values of all the possible pairs of received check node messages (weighted by the inverse product of the appropriate variances). It therefore measures the mismatch between the incoming messages. This mismatch will be small if the mean values of the consistent Gaussians converge to the coordinates of a lattice point (any lattice point). The second term sums the squared differences between the mean values of the incoming messages and the noisy channel output yk . This term measures the mismatch between the incoming messages and the channel measurement. 
It will be smallest if the mean values of the consistent Gaussians converge to the coordinates of the ML lattice point. 9 The following lemma summarizes some properties of the excitation term a(t) . Lemma 9: For a magic square LDLC with dimension n, degree d, α < 1 and no 4-loops, consider the consistent Gaussians that correspond to a given integer vector b. According to Lemma 8, their amplitudes satisfy recursion (9). The excitation term a(t) of (9), which is defined by (10), satisfies the following properties: (t) 1) ai , the i’th element of a(t) , is non-negative, finite (t) and bounded for every i and every t. Moreover, ai converges to a finite non-negative steady state value as t → ∞. P (t) nd 2) limt→∞ i=1 ai = 2σ1 2 (Gb − y)T W (Gb − y), where y is the noisy received codeword and W is a positive definite matrix defined by: ∆ −1 W = (d + 1 − α)I − 2(1 − α)(I + F ) T +(1 − α)(I + F )−1  + (11)  where F is defined in Lemma 7. 3) For an LDLC with degree d > 2, the weighted infinite (j) P∞ Pnd i=1 ai sum j=0 (d−1)2j+2 converges to a finite value. Proof: See Appendix VI. The following theorem addresses the question of which consistent Gaussian will have the maximal asymptotic amplitude. We shall first consider the case of an LDLC with degree d > 2, and then consider the special case of d = 2 in a separate theorem. Theorem 4: For a magic square LDLC with dimension n, degree d > 2, α < 1 and no 4-loops, consider the nd consistent Gaussians that relate to a given integer vector b in the variable node messages that are sent at iteration t (one consistent Gaussian per message). Denote the amplitudes (t) of these Gaussians by pi , i = 1, 2, ...nd, and define the ∆ Qnd (t) product-of-amplitudes as P (t) = i=1 pi . Define further P (j) nd P∞ a (j) i=1 i S = is defined by (10) (S is j=0 (d−1)2j+2 , where ai well defined according to Lemma 9). Then: 1) The integer vector b for which the consistent Gaussians will have the largest asymptotic product-of-amplitudes limt→∞ P (t) is the one for which S is minimized. 2) The product-of-amplitudes for the consistent Gaussians that correspond to all other integer vectors will decay to zero in a super-exponential rate. (t) ∆ Proof: As in Lemma 8, define the log-amplitudes li = P ∆ (t) nd (t) log pi . Define further s(t) = i=1 li . Taking the elementwise sum of (9), we get: nd X (t) ai (t+1) s̃ s(t) (d−1)2t . (t) = s̃ (12) i=1 with initialization s(0) = 0. Note that we ignored the term Pnd (t) i=1 ci . As shown below, we are looking for the vector b that maximizes s(t) . Since (12) is a linear difference equation, Pnd (t) and the term i=1 ci is independent of b, its effect on s(t) is common to all b and is therefore not interesting. Substituting in (12), we get: nd X 1 (t) − a (d − 1)2t+2 i=1 i (13) with initialization s̃(0) = 0, which can be solved to get: t−1 Pnd (j) X (t) i=1 ai s̃ = − (14) (d − 1)2j+2 j=0 We would now like to compare the amplitudes of consistent Gaussians with various values of the corresponding integer vector b in order to find the lattice point whose consistent Gaussians will have largest product-of-amplitudes. From the definitions of s(t) and s̃(t) we then have: (t) P (t) = es (d − 1)2 I − F T F (I + F )−1 s(t+1) = (d − 1)2 s(t) − ∆ Define now s̃(t) = 2t = e(d−1) ·s̃(t) (15) Consider two integer vectors b that relate to two lattice points. (t) Denote the corresponding product-of-amplitudes by P0 and (t) P1 , respectively, and assume that for these two vectors S converges to the values S0 and S1 , respectively. 
Then, taking into account that limt→∞ s̃(t) = −S, the asymptotic ratio of the product-of-amplitudes for these lattice points will be: (t) lim t→∞ P1 (t) P0 2t 2t e−(d−1) ·S1 = −(d−1)2t ·S = e(d−1) ·(S0 −S1 ) 0 e (16) It can be seen that if S0 < S1 , the ratio decreases to zero in a super exponential rate. This shows that the lattice point for which S is minimized will have the largest product-ofamplitudes, where for all other lattice points, the productof-amplitudes will decay to zero in a super-exponential rate (recall that the normalization operation at the variable node keeps the sum of all amplitudes in a message to be 1). This completes the proof of the theorem. We now have to find which integer valued vector b minimizes S. The analysis is difficult because the weighting factor inside the sum of (14) performs exponential weighting of the excitation terms, where the dominant terms are those of the first iterations. Therefore, we can not use the asymptotic results of Lemma 9, but have to analyze the transient behavior. However, the analysis is simpler for the case of an LDLC with row and column degree of d = 2, so we shall first turn to this simple case (note that for this case, both the convolution in the check nodes and the product at the variable nodes involve only a single message). Theorem 5: For a magic square LDLC with dimension n, degree d = 2, α < 1 and no 4-loops, consider the 2n consistent Gaussians that relate to a given integer vector b in the variable node messages that are sent at iteration t (one consistent Gaussian per message). Denote the amplitudes of (t) these Gaussians by pi , i = 1, 2, ...2n, and define the product∆ Q2n (t) of-amplitudes as P (t) = i=1 pi . Then: 1) The integer vector b for which the consistent Gaussians will have the largest asymptotic product-of-amplitudes limt→∞ P (t) is the one for which (Gb−y)T W (Gb−y) is minimized, where W is defined by (11) and y is the noisy received codeword. 10 2) The product-of-amplitudes for the consistent Gaussians that correspond to all other integer vectors will decay to zero in an exponential rate. Proof: For d = 2 (12) becomes: s(t+1) = s(t) − 2n X (t) ai (17) (j) (18) i=1 With solution: s(t) = − t−1 X 2n X ai j=0 i=1 P2n (j) Denote Sa = limj→∞ i=1 ai . Sa is well defined according to Lemma 9. For large t, we then have s(t) ≈ −t · Sa . Therefore, for two lattice points with excitation sum terms which approach Sa0 , Sa1 , respectively, the ratio of the corresponding product-of-amplitudes will approach (t) lim P1 t→∞ = (t) P0 e−Sa1 ·t = e(Sa0 −Sa1 )·t e−Sa0 ·t (19) If Sa0 < Sa1 , the ratio decreases to zero exponentially (unlike the case of d > 2 where the rate was super-exponential, as in (16)). This shows that the lattice point for which Sa is minimized will have the largest product-of-amplitudes, where for all other lattice points, the product-of-amplitudes will decay to zero in an exponential rate (recall that the normalization operation at the variable node keeps the sum of all amplitudes in a message to be 1). This completes the proof of the second part of the theorem. We still have to find the vector b that minimizes Sa . The basic difference between the case of d = 2 and the case of d > 2 is that for d > 2 we need to analyze the transient behavior of the excitation terms, where for d = 2 we only need to analyze the asymptotic behavior, which is much easier to handle. 
According to Lemma 9, we have: ∆ Sa = lim j→∞ 2n X i=1 (j) ai = 1 (Gb − y)T W (Gb − y) 2σ 2 (20) where W is defined by (11) and y is the noisy received codeword. Therefore, for d = 2, the lattice points whose consistent Gaussians will have largest product-of-amplitudes is the point for which (Gb − y)T W (Gb − y) is minimized. This completes the proof of the theorem. For d = 2 we could find an explicit expression for the “winning” lattice point. As discussed above, we could not find an explicit expression for d > 2, since the result depends on the transient behavior of the excitation sum term, and not only on the steady state value. However, a reasonable conjecture is to assume that b that maximizes the steady state excitation will also maximize the term that depends on the transient behavior. This means that a reasonable conjecture is to assume that the “winning” lattice point for d > 2 will also minimize an expression of the form (20). Note that for d > 2 we can still show that for “weak” noise, the ML point will have the minimal S. To see that, it comes out from (10) that for zero noise, the ML lattice point will (t) have ai = 0 for every t and i, where all other lattice points (t) will have ai > 0 for at least some i and t. Therefore, the ML point will have a minimal excitation term along the transient behavior so it will surely have the minimal S and the best product-of-amplitudes. As the noise increases, it is difficult to (t) analyze the transient behavior of ai , as discussed above. Note that the ML solution minimizes (Gb − y)T (Gb − y), where the above analysis yields minimization of (Gb − y)T W (Gb − y). Obviously, for zero noise (i.e. y = G · b) both minimizations will give the correct solution with zero score. As the noise increases, the solutions may deviate from one another. Therefore, both minimizations will give the same solution for “weak” noise but may give different solutions for “strong” noise. An example for another decoder that performs this form of is the linear detector, which calculates b̂ =   minimization H · y (where bxe denotes the nearest integer to x). This is equivalent to minimizing (Gb − y)T W (Gb − y) with W = T H T H = G−1 G−1 . The linear detector fails to yield the ML solution if the noise is too strong, due to its inherent noise amplification. For the LDLC iterative decoder, we would like that the deviation from the ML decoder due to the W matrix would be negligible in the expected range of noise variance. Experimental results (see Section IX) show that the iterative decoder indeed converges to the ML solution for noise variance values that approach channel capacity. However, for quantization or shaping applications (see Section VIII-B), where the effective noise is uniformly distributed along the Voronoi cell of the lattice (and is much stronger than the noise variance at channel capacity) the iterative decoder fails, and this can be explained by the influence of the W matrix on the minimization, as described above. Note from (11) that as α → 1, W approaches a scaled identity matrix, which means that the minimization criterion approaches the ML criterion. However, the variances converge as αt , so as α → 1 convergence time approaches infinity. Until this point, we concentrated only on consistent Gaussians, and checked what lattice point maximizes the productof-amplitudes of all the corresponding consistent Gaussians. 
However, this approach does not necessarily lead to the lattice point that will be finally chosen by the decoder, due to 3 main reasons: 1) It comes out experimentally that the strongest Gaussian in each message is not necessarily a consistent Gaussian, but a Gaussian that started as non-consistent and became consistent at a certain iteration. Such a Gaussian will finally converge to the appropriate lattice point, since the convergence of the mean values is independent of initial conditions. The non-consistency at the first several iterations, where the mean values are still very noisy, allows these Gaussians to accumulate stronger amplitudes than the consistent Gaussians (recall that the exponential weighting in (14) for d > 2 results in strong dependency on the behavior at the first iterations). 2) There is an exponential number of Gaussians that start as non-consistent and become consistent (with the same 11 integer vector b) at a certain iteration, and the final amplitude of the Gaussians at the lattice point coordinates will be determined by the sum of all these Gaussians. 3) We ignored non-consistent Gaussians that endlessly remain non-consistent. We have not shown it analytically, but it is reasonable to assume that the excitation terms for such Gaussians will be weaker than for Gaussians that become consistent at some point, so their amplitude will fade away to zero. However, non-consistent Gaussians are born every iteration, even at steady state. The “newly-born” non-consistent Gaussians may appear as sidelobes to the main impulse, since it may take several iterations until they are attenuated. Proper choice of the coefficients of H may minimize this effect, as discussed in Sections III-A and V-A. However, these Gaussians may be a problem for small d (e.g. d = 2) where the product step at the variable node does not include enough messages to suppress them. Note that the first two issues are not a problem for d = 2, where the winning lattice point depends only on the asymptotic behavior. The amplitude of a sum of Gaussians that converged to the same coordinates will still be governed by (18) and the winning lattice point will still minimize (20). The third issue is a problem for small d, but less problematic for large d, as described above. As a result, we can not regard the convergence analysis of the consistent Gaussians’ amplitudes as a complete convergence analysis. However, it can certainly be used as a qualitative analysis that gives certain insights to the convergence process. Two main observations are: 1) The narrow variable node messages tend to converge to single impulses at the coordinates of a single lattice point. This results from (16), (19), which show that the “non-winning” consistent Gaussians will have amplitudes that decrease to zero relative to the amplitude of the “winning” consistent Gaussian. This result remains valid for the sum of non-consistent Gaussians that became consistent at a certain point, because it results from the non-stable nature of the recursion (9), which makes strong Gaussians stronger in an exponential manner. The single impulse might be accompanied by weak “sidelobes” due to newly-born non-consistent Gaussians. Interestingly, this form of solution is different from the exact solution (1), where every lattice point is represented by an impulse at the appropriate coordinate, with amplitude that depends on the Euclidean distance of the lattice point from the observation. 
By contrast, the iterative decoder's solution has a single impulse that corresponds to a single lattice point, where all other impulses have amplitudes that decay to zero. This should not be a problem, as long as the ML point is the remaining point (see discussion above). 2) We have shown that for d = 2 the strongest consistent Gaussians relate to the b that minimizes an expression of the form (Gb − y)^T W (Gb − y). We proposed a conjecture that this is also true for d > 2. We can further widen the conjecture to say that the finally decoded b (and not only the b that relates to the strongest consistent Gaussians) minimizes such an expression. Such a conjecture can explain why the iterative decoder works well for decoding near channel capacity, but fails for quantization or shaping, where the effective noise variance is much larger.

E. Summary of Convergence Results

To summarize the convergence analysis, it was first shown that the variable node messages are Gaussian mixtures. Therefore, it is sufficient to analyze the sequences of variances, mean values and relative amplitudes of the Gaussians in each mixture. Starting with the variances, it was shown that with a proper choice of the magic square LDLC generating sequence, each variable node generates d − 1 "narrow" messages, whose variance decreases exponentially to zero, and a single "wide" message, whose variance reaches a finite value. Consistent Gaussians were then defined as Gaussians whose generation process always involved the same integer at the same check node. Consistent Gaussians can then be related to an integer vector b, or equivalently to the lattice point Gb. It was then shown that under appropriate conditions on H, the mean values of consistent Gaussians that belong to narrow messages converge to the coordinates of the appropriate lattice point. The mean values of wide messages also converge to these coordinates, but with a steady state error. Then, the amplitudes of consistent Gaussians were analyzed. For d = 2 it was shown that the consistent Gaussians with maximal product-of-amplitudes (over all messages) are those that correspond to an integer vector b that minimizes (Gb − y)^T W (Gb − y), where W is a positive definite matrix that depends only on H. The product-of-amplitudes of all other consistent Gaussians decays to zero. For d > 2 the analysis is complex and depends on the transient behavior of the mean values and variances (and not only on their steady state values), but a reasonable conjecture is that the same form of criterion is also minimized for d > 2. The result is different from the ML lattice point, which minimizes ‖G·b − y‖²; both criteria give the same point for weak noise but may give different solutions for strong noise. This may explain the experiments where the iterative decoder successfully decodes the ML point for the AWGN channel near channel capacity, but fails in quantization or shaping applications where the effective noise is much stronger. These results also show that the iterative decoder converges to impulses at the coordinates of a single lattice point. It was then explained that analyzing the amplitudes of consistent Gaussians is not sufficient, so these results can not be regarded as a complete convergence analysis. However, the analysis gave a set of necessary conditions on H, and also led to useful insights into the convergence process.

V. CODE DESIGN

A.
Choosing the Generating Sequence We shall concentrate on magic square LDLC, since they have inherent diversity of the nonzero elements in each row and column, which was shown above to be beneficial. It still remains to choose the LDLC generating sequence 12 h1 , h2 , ...hd . Assume that the algorithm converged, and each PDF has a peak at the desired value. When the periodic functions are multiplied at a variable node, the correct peaks will then align. We would like that all the other peaks will be strongly attenuated, i.e. there will be no other point where the peaks align. This resembles the definition of the least common multiple (LCM) of integers: if the periods were integers, we would like to have their LCM as large as possible. This argument suggests the sequence {1/2, 1/3, 1/5, 1/7, 1/11, 1/13, 1/17, ...}, i.e. the reciprocals of the smallest d prime numbers. Since the periods are 1/h1 , 1/h2 , ...1/hd , we will get the desired property. Simulations have shown that increasing d beyond 7 with this choice gave negligible improvement. Also, performance was improved by adding some “dither” to the sequence, resulting in {1/2.31, 1/3.17, 1/5.11, 1/7.33, 1/11.71, 1/13.11, 1/17.55}. For d < 7, the first d elements are used. An alternative approach is a sequence of the form {1, , , ..., }, where  << 1. For this case, every variable node will receive a single message with period 1 and d − 1 messages with period 1/. For small , the period of these d − 1 messages will be large and multiplication by the channel Gaussian will attenuate all the unwanted replicas. The single remaining replica will attenuate all the unwanted replicas of the message with period 1. A convenient choice is  = √1d , which ensures that α = d−1 d < 1, as required by Theorem 1. As an example, for d = 7 the sequence will be {1, √17 , √17 , √17 , √17 , √17 , √17 }. decoded codeword from the ML codeword, as discussed in Section IV-D. For the first LDLC generating sequence of the previous subsection, we have α = 0.92 and 0.87 for d = 7 and 5, respectively, which is a reasonable trade off. For the second sequence type we have α = d−1 d . 3) All the eigenvalues of H̃ must have magnitude less than 1, where H̃ is defined in Theorem 2. This is a necessary condition for convergence of the mean values of the narrow messages. Note that adding random signs to the nonzero H elements is essential to fulfill this necessary condition, as explained in Section IV-C. 4) All the eigenvalues of F must have magnitude less than 1, where F is defined in Theorem 3. This is a necessary condition for convergence of the mean values of the wide messages. Interestingly, it comes out experimentally that for large codeword length n and relatively small degree d (e.g. n ≥ 1000 and d ≤ 10), a magic square LDLC with generating sequence that satisfies h1 = 1 and α < 1 results in H that satisfies all these four conditions: H ispnonsingular without any need to omit rows and columns, n |det(H)| ≈ 1 without any need for normalization, and all eigenvalues of H̃ and F have magnitude less than 1 (typically, the largest eigenvalue of H̃ or F has magnitude of 0.94 − 0.97, almost independently of n and the choice of nonzero H locations). Therefore, by simply dividing the first generating sequence of the previous subsection by its first element, the constructed H meets all the necessary conditions, where the second type of sequence meets the conditions without any need for modifications. B. 
Necessary Conditions on H The magic square LDLC definition and convergence analysis imply four necessary conditions on H: 1) |det(H)| = 1. This condition is part of the LDLC definition, which ensures proper density of the lattice points in Rm . If |det(H)| 6=p1, it can be easily normalized by dividing H by n |det(H)|. Note that practically we can allowp|det(H)| = 6 1 as long as p n |det(H)| ≈ 1, since n |det(H)| is the gain factor of the transmitted codeword. For example, if n = 1000, having |det(H)| = 0.01 is acceptable, since we have p n |det(H)| = 0.995, which means that the codeword has to be further amplified by 20 · log10 (0.995) = 0.04 dB, which is negligible. Note that normalizing H is applicable only if H is non-singular. If H is singular, a row and a column should be sequentially omitted until H becomes full rank. This process may result in slightly reducing n and a slightly different row and column degrees than originally planned. Pd 2 ∆ i=2 hi . This guarantees expo2) α < 1, where α = h21 nential convergence rate for the variances (Theorem 1). Choosing a smaller α results in faster convergence, but we should not take α too small since the steady state variance of the wide variable node messages, as well as the steady state error of the mean values of these messages, increases when α decreases, as discussed in Section IV-C. This may result in deviation of the C. Construction of H for Magic Square LDLC We shall now present a simple algorithm for constructing a parity check matrix for a magic square LDLC. If we look at the bipartite graph, each variable node and each check node has d edges connected to it, one with every possible weight h1 , h2 , ...hd . All the edges that have the same weight hj form a permutation from the variable nodes to the check nodes (or vice versa). The proposed algorithm generates d random permutations and then searches sequentially and cyclically for 2-loops (two parallel edges from a variable node to a check node) and 4-loops (two variable nodes that both are connected to a pair of check nodes). When such a loop is found, a pair is swapped in one of the permutations such that the loop is removed. A detailed pseudo-code for this algorithm is given in Appendix VII. VI. D ECODER I MPLEMENTATION Each PDF should be approximated with a discrete vector with resolution ∆ and finite range. According to the Gaussian Q-function, choosing a range of, say, 6σ to both sides of the noisy channel observation will ensure that the error probability due to PDF truncation will be ≈ 10−9 . Near capacity, σ 2 ≈ 1 2πe , so 12σ ≈ 3. Simulation showed that resolution errors became negligible for ∆ = 1/64. Each PDF was then stored in a L = 256 elements vector, corresponding to a range of size 4. 13 At the check node, the PDF fj (x) that arrives from variable node j is first expanded by hj (the appropriate coefficient of H) to get fj (x/hj ). In a discrete implementation with resolution ∆ the PDF is a vector of values fj (k∆), k ∈ Z. As described in Section V, we shall usually use hj ≤ 1 so the expanded PDF will be shorter than the original PDF. If the expand factor 1/|hj | was an integer, we could simply decimate fj (k∆) by 1/|hj |. However, in general it is not an integer so we should use some kind of interpolation. The PDF fj (x) is certainly not band limited, and as the iterations go on it approaches an impulse, so simple interpolation methods (e.g. linear) are not suitable. Suppose that we need to calculate fj ((k + )∆), where −0.5 ≤  ≤ 0.5. 
A simple interpolation method which proved to be effective is to average fj(x) around the desired point, where the averaging window length lw is chosen to ensure that every sample of fj(x) is used in the interpolation of at least one output point. This ensures that an impulse cannot be missed. The interpolation result is then (1/(2lw + 1)) Σ_{i=−lw}^{lw} fj((k − i)∆), where lw = ⌊⌈1/|hj|⌉/2⌋.

The most computationally intensive step at the check nodes is the calculation of the convolution of d − 1 expanded PDF's. An efficient method is to calculate the fast Fourier transforms (FFT's) of all the PDF's, multiply the results and then perform an inverse FFT (IFFT). The resolution of the FFT should be larger than the expected convolution length, which is roughly Lout ≈ L · Σ_{i=1}^{d} hi, where L denotes the original PDF length. Appendix VIII shows a way to use FFT's of size 1/∆, where ∆ is the resolution of the PDF. Usually 1/∆ << Lout, so FFT complexity is significantly reduced. Practical values are L = 256 and ∆ = 1/64, which give an improvement factor of at least 4 in complexity.

Each variable node receives d check node messages. The output variable node message is calculated by generating the product of d − 1 input messages and the channel Gaussian. As the iterations go on, the messages get narrower and may become impulses, with only a single nonzero sample. Quantization effects may cause impulses in two messages to be shifted by one sample. This would result in a zero output (instead of an impulse). Therefore, it was found useful to widen each check node message Q(k) prior to multiplication, such that Qw(k) = Σ_{i=−1}^{1} Q(k + i), i.e. the message is added to its right shifted and left shifted (by one sample) versions.

VII. COMPUTATIONAL COMPLEXITY AND STORAGE REQUIREMENTS

Most of the computational effort is invested in the d FFT's and d IFFT's (of length 1/∆) that each check node performs each iteration. The total number of multiplications for t iterations is therefore o(n · d · t · (1/∆) · log2(1/∆)). As in binary LDPC codes, the computational complexity has the attractive property of being linear in the block length. However, the constant that precedes the linear term is significantly higher, mainly due to the FFT operations. The memory requirements are governed by the storage of the nd check node and variable node messages, with total memory of o(n · d · L). Compared to binary LDPC, the factor of L significantly increases the required memory. For example, for n = 10,000, d = 7 and L = 256, the number of storage elements is of the order of 10^7.

VIII. ENCODING AND SHAPING

A. Encoding

The LDLC encoder has to calculate x = G·b, where b is an integer message vector. Note that unlike H, G = H^{−1} is not sparse, in general, so the calculation requires computational complexity and storage of o(n²). This is not a desirable property, because the decoder's computational complexity is only o(n). A possible solution is to use the Jacobi method [22] to solve H·x = b, which is a system of sparse linear equations. Using this method, a magic square LDLC encoder calculates several iterations of the form:

x(t) = b̃ − H̃ · x(t−1)   (21)

with initialization x(0) = 0. The matrix H̃ is defined in Lemma 6 of Section IV-C. The vector b̃ is a permuted and scaled version of the integer vector b, such that the i'th element of b̃ equals the element of b for which the appropriate row of H has its largest magnitude value at the i'th location. This element is further divided by this largest magnitude element. A small sketch of this encoding iteration is given below.
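The following minimal sketch of the Jacobi-style encoding iteration (21) is illustrative only: it uses a tiny diagonally dominant matrix rather than a real magic square LDLC parity check matrix, and it assumes each row of H has a unique largest-magnitude entry (the ±h1 entries) and that the resulting H̃ has all eigenvalue magnitudes below 1:

```python
import numpy as np

def jacobi_encode(H, b, n_iter=50):
    """Sketch of the Jacobi-style LDLC encoder: solve H x = b iteratively via
    x^(t) = b_tilde - H_tilde x^(t-1), x^(0) = 0. Assumes each row of H has a
    unique largest-magnitude entry and that H_tilde has spectral radius < 1."""
    n = H.shape[0]
    perm = np.argmax(np.abs(H), axis=1)       # perm[r] = column of the +-h1 entry in row r
    Hp = np.zeros_like(H)
    b_tilde = np.zeros(n)
    for r in range(n):
        i = perm[r]
        Hp[i, :] = H[r, :] / H[r, i]          # scale row by its largest-magnitude element
        b_tilde[i] = b[r] / H[r, i]
    H_tilde = Hp.copy()
    np.fill_diagonal(H_tilde, 0.0)            # nullify the diagonal
    x = np.zeros(n)
    for _ in range(n_iter):
        x = b_tilde - H_tilde @ x
    return x

# Toy check with a small (non-LDLC) diagonally dominant H, for illustration only
H = np.array([[ 1.0,  0.2,  0.1],
              [ 0.1, -1.0,  0.2],
              [ 0.2,  0.1,  1.0]])
b = np.array([2.0, -1.0, 3.0])
x = jacobi_encode(H, b)
print(np.allclose(H @ x, b, atol=1e-6))       # True when the iteration converged
```

For an actual LDLC, H̃ is sparse, so each loop iteration would cost o(n) operations instead of the dense matrix-vector product used here for brevity.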
A necessary and sufficient condition for convergence to x = G·b is that all the eigenvalues of H̃ have magnitude less than 1 [22]. However, it was shown that this is also a necessary condition for convergence of the LDLC iterative decoder (see Sections IV-C, V-B), so it is guaranteed to be fulfilled for a properly designed magic square LDLC. Since H̃ is sparse, this is an o(n) algorithm, both in complexity and storage.

B. Shaping

For practical use with the power constrained AWGN channel, the encoding operation must be accompanied by shaping, in order to prevent the transmitted codeword's power from being too large. Therefore, instead of mapping the information vector b to the lattice point x = G·b, it should be mapped to some other lattice point x′ = G·b′, such that the lattice points that are used as codewords belong to a shaping region (e.g. an n-dimensional sphere). The shaping operation is the mapping of the integer vector b to the integer vector b′. As explained in Section II-A, this work concentrates on the lattice design and the lattice decoding algorithm, and not on the shaping region or shaping algorithms. Therefore, this section will only highlight some basic shaping principles and ideas.

A natural shaping scheme for lattice codes is nested lattice coding [12]. In this scheme, shaping is done by quantizing the lattice point G·b onto a coarse lattice G′, where the transmitted codeword is the quantization error, which is uniformly distributed along the Voronoi cell of the coarse lattice. If the second moment of this Voronoi cell is close to that of an n-dimensional sphere, the scheme will attain close-to-optimal shaping gain. Specifically, assume that the information vector b assumes integer values in the range 0, 1, ..., M − 1 for some constant integer M. Then, we can choose the coarse lattice to be G′ = M·G. The volume of the Voronoi cell of this lattice is M^n, since we assume det(G) = 1 (see Section II-A). If the shape of the Voronoi cell resembles an n-dimensional sphere (as expected from a capacity approaching lattice code), it will attain optimal shaping gain (compared to uncoded transmission of the original integer sequence b). The shaping operation will find the coarse lattice point M·G·k, k ∈ Z^n, which is closest to the fine lattice point x = G·b. The transmitted codeword will be:

x′ = x − M·G·k = G(b − M·k) = G·b′

where b′ ≜ b − M·k (note that the "inverse shaping" at the decoder, i.e. transforming from b′ to b, is a simple modulo calculation: b = b′ mod M).

Finding the closest coarse lattice point M·G·k to x is equivalent to finding the closest fine lattice point G·k to the vector x/M. This is exactly the operation of the iterative LDLC decoder, so we could expect that it could be used for shaping. However, simulations show that the iterative decoding finds a vector k with poor shaping gain. The reason is that for shaping, the effective noise is much stronger than for decoding, and the iterative decoder fails to find the nearest lattice point if the noise is too large (see Section IV-D). Therefore, an alternative algorithm has to be used for finding the nearest coarse lattice point. The complexity of finding the exact nearest lattice point grows exponentially with the lattice dimension n and is not feasible for large dimensions [23]. However, unlike decoding, for shaping applications it is not critical to find the exact nearest lattice point, and approximate algorithms may be considered (see [15]). A possible method [24] is to perform QR decomposition on G in order to transform to a lattice with an upper triangular generator matrix, and then use sequential decoding algorithms (such as the Fano algorithm) to search the resulting tree. The main disadvantage of this approach is computational complexity and storage of at least o(n²). Finding an efficient shaping scheme for LDLC is certainly a topic for further research. A toy sketch of the b → b′ mapping and its modulo-M inverse is given below.
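The sketch below is not the shaping algorithm of this section: it replaces the search for the nearest coarse lattice point by crude element-wise rounding of b/M, so it gives essentially no shaping gain. It only illustrates the algebra of the nested-lattice mapping b → b′ = b − M·k and the modulo-M inverse shaping at the decoder, using an arbitrary toy H:

```python
import numpy as np

# Illustrative sketch (not the paper's shaping algorithm): the nested-lattice
# mapping b -> b' = b - M k and its inverse b = b' mod M. Here k is obtained by
# simple element-wise rounding of b / M, a crude stand-in for the search for the
# nearest coarse lattice point M G k (which in practice needs an approximate
# nearest-point algorithm to give real shaping gain).
rng = np.random.default_rng(1)
n, M = 6, 4
H = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # toy nonsingular H, not an LDLC
G = np.linalg.inv(H)

b = rng.integers(0, M, size=n)          # information integers in 0..M-1
x = G @ b                               # unshaped lattice point

k = np.rint(b / M).astype(int)          # crude choice of coarse point index
b_prime = b - M * k                     # shaped integer vector
x_prime = G @ b_prime                   # transmitted lattice point x - M G k

b_recovered = np.mod(b_prime, M)        # "inverse shaping" at the decoder
print("b recovered correctly:", np.array_equal(b_recovered, b))
print("power before/after   :", float(x @ x), float(x_prime @ x_prime))
```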
IX. SIMULATION RESULTS

Magic square LDLC codes with the first generating sequence of Section V-A (i.e. {1/2.31, 1/3.17, 1/5.11, 1/7.33, 1/11.71, 1/13.11, 1/17.55}) were simulated for the AWGN channel at various block lengths. The degree was d = 5 for n = 100 and d = 7 for all other n. For n = 100 the matrix H was further normalized to get |det(H)|^{1/n} = 1. For all other n, normalizing the generating sequence such that the largest element has magnitude 1 also gave the desired determinant normalization (see Section V-B). The H matrices were generated using the algorithm of Section V-C. PDF resolution was set to ∆ = 1/256 with a total range of 4, i.e. each PDF was represented by a vector of L = 1024 elements. High resolution was used since our main target is to prove the LDLC concept and eliminate degradation due to implementation considerations. For this reason, the decoder was used with 200 iterations (though most of the time, a much smaller number was sufficient). In all simulations the all-zero codeword was used.

Approaching channel capacity is equivalent to σ² → 1/(2πe) (see Section II-A), so performance is measured in symbol error rate (SER) vs. the distance of the noise variance σ² from capacity (in dB). The results are shown in Figure 4.

[Fig. 4. Simulation results: symbol error rate (SER) vs. distance from capacity [dB] for block lengths n = 100, 1000, 10,000 and 100,000.]

At SER of 10^{−5}, for n = 100,000, 10,000, 1000 and 100 we can work as close as 0.6dB, 0.8dB, 1.5dB and 3.7dB from capacity, respectively. Similar results were obtained for d = 7 with the second type of generating sequence of Section V-A, i.e. {1, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7, 1/√7}. Results were slightly worse than for the first generating sequence (by less than 0.1 dB). Increasing d did not give any visible improvement.

X. CONCLUSION

Low density lattice codes (LDLC) were introduced. LDLC are novel lattice codes that can approach capacity and be decoded efficiently. Good error performance within ∼ 0.5dB from capacity at block length of 100,000 symbols was demonstrated. A convergence analysis was presented for the iterative decoder; the analysis is not complete, but it yields necessary conditions on H and significant insight into the convergence process. Code parameters were chosen from intuitive arguments, so it is reasonable to assume that when the code structure is better understood, better parameters could be found, and channel capacity could be approached even closer.

Multi-input, multi-output (MIMO) communication systems have become popular in recent years. Lattice codes have been proposed in this context as space-time codes (LAST) [25]. The concatenation of the lattice encoder and the MIMO channel generates a lattice. If LDLC are used as lattice codes and the MIMO configuration is small, the inverse generator matrix of this concatenated lattice can be assumed to be sparse. Therefore, the MIMO channel and the LDLC can be jointly decoded using an LDLC-like decoder.
However, even if a magic square LDLC is used as the lattice code, the concatenated lattice is not guaranteed to be equivalent to a magic square LDLC, and the necessary conditions for convergence are not guaranteed to be fulfilled. Therefore, the usage of LDLC for MIMO systems is a topic for further research. 15 A PPENDIX I E XACT PDF C ALCULATIONS Given the n-dimensional noisy observation y = x + w of the transmitted codeword x = Gb, we would like to calculate the probability density function (PDF) fxk |y (xk |y). We shall fx (x)fy|x (y|x) start by calculating fx|y (x|y) = . Denote the fy (y) shaping region by B (G will be used to denote both the lattice and its generator matrix). fx (x) is a sum of |G ∩ B| n-dimensional Dirac delta functions, since x has nonzero probability only for the lattice points that lie inside the shaping region. Assuming further that all codewords are used with equal probability, all these delta functions have equal weight 1 . The expression for fy|x (y|x) is simply the PDF of of |G∩B| the i.i.d Gaussian noise vector. We therefore get: fx|y (x|y) = 1 |G∩B| P l∈G∩B fx (x)fy|x (y|x) fy (y) δ(x − l) · (2πσ 2 )−n/2 e− = Pn 2 2 i=1 (yi −xi ) /2σ = fy (y) =C· 2 X (22) δ(x − l) · e−d (l,y)/2σ 2 l∈G∩B Where C is not a function of x and d2 (l, y) is the squared Euclidean distance between the vectors l and y in Rn . It can be seen that the conditional PDF of x has a delta function for each lattice point, located at this lattice point with weight that is proportional to the exponent of the negated squared Euclidean distance of this lattice point from the noisy observation. The ML point corresponds to the delta function with largest weight. As the next step, instead of calculating the n-dimensional PDF of the whole vector x, we shall calculate the n onedimensional PDF’s for each of the components xk of the vector x (conditioned on the whole observation vector y): fxk |y (xk |y) = Z Z (23) Z ··· fx|y (x|y)dx1 dx2 · · · dxk−1 dxk+1 · · · dxn = xi ,i6=k =C· X δ(xk − lk ) · e−d 2 (l,y)/2σ 2 l∈G∩B This finishes the proof of (1). It can be seen that the conditional PDF of xk has a delta function for each lattice point, located at the projection of this lattice point on the coordinate xk , with weight that is proportional to the exponent of the negated squared Euclidean distance of this lattice point from the noisy observation. The ML point will therefore correspond to the delta function with largest weight in each coordinate. Note, however, that if several lattice points have the same projection on a specific coordinate, the weights of the corresponding delta functions will add and may exceed the weight of the ML point. A PPENDIX II E XTENDING G ALLAGER ’ S T ECHNIQUE TO THE C ONTINUOUS C ASE In [5], the derivation of the LDPC iterative decoder was simplified using the following technique: the codeword elements xk were assumed i.i.d. and a condition was added to all the probability calculations, such that only valid codewords were actually considered. The question is then how to choose the marginal PDF of the codeword elements. In [5], binary codewords were considered, and the i.i.d distribution assumed the values ’0’ and ’1’ with equal probability. Since we extend the technique to the continuous case, we have to set the continuous marginal distribution fxk (xk ). It should be set such that fx (x), assuming that x is a lattice point, is the same as f (x|s ∈ Zn ), assuming that xk are i.i.d with marginal PDF ∆ fxk (xk ), where s = H · x. 
This fx (x) equals a weighted sum of Dirac delta functions at all lattice points, where the weight at each lattice point equals the probability to use this point as a codeword. Before proceeding, we need the following property of conditional probabilities. For any two continuous valued RV’s u, v we have: PN k=1 fu,v (u, vk ) (24) f (u|v ∈ {v1 , v2 , ..., vN }) = P N k=1 fv (vk ) (This property can be easily proved by following the lines of [21], pp. 159-160, and can also be extended to the infinite sum case). Using (24), we now have: P i∈Zn fx,s (x, s = i) n P f (x|s ∈ Z ) = = i∈Zn fs (i) =C X i∈Zn f (x)f (s = i|x) = C 0 X f (x)δ(x − Gi) (25) i∈Zn where C, C 0 are independent of x. The result is a weighted sum of Dirac delta functions at all lattice points, as desired. Now, the weight at each lattice point should equal the probability to use this point as a codeword. Therefore, fxk (xk ) should be chosen such that Q at each lattice n point, the resulting vector distribution fx (x) = k=1 fxk (xk ) will have a value that is proportional to the probability to use this lattice point. At x which is not a lattice point, the value of fx (x) is not important. A PPENDIX III D ERIVATION OF THE I TERATIVE D ECODER In this appendix we shall derive the LDLC iterative decoder for a code with dimension n, using the tree assumption and Gallager’s trick. Referring to figure 2, assume that there are only 2 tiers. Using Gallager’s trick we assume that the xk ’s are i.i.d. We ∆ would like to calculate f (x1 |(y, s ∈ Zn ), where s = H · x. Due to the tree assumption, we can do it in two steps: 1. calculate the conditional PDF of the tier 1 variables of x1 , conditioned only on the check equations that relate the tier 1 and tier 2 variables. 16 2. calculate the conditional PDF of x1 itself, conditioned only on the check equations that relate x1 and its first tier variables, but using the results of step 1 as the PDF’s for the tier 1 variables. Hence, the results will be equivalent to conditioning on all the check equations. There is a basic difference between the calculation in step 1 and step 2: the condition in step 2 involves all the check equations that are related to x1 , where in step 1 a single check equation is always omitted (the one that relates the relevant tier 1 element with x1 itself). Assume now that there are many tiers, where each tier contains distinct elements of x (i.e. each element appears only once in the resulting tree). We can then start at the farthest tier and start moving toward x1 . We do it by repeatedly calculating step 1. After reaching tier 1, we use step 2 to finally calculate the desired conditional PDF for x1 . This approach suggests an iterative algorithm for the calculation of f (xk |(y, s ∈ Zn ) for k=1, 2..n. In this approach we assume that the resulting tier diagram for each xk contains distinct elements for several tiers (larger or equal to the number of required iterations). We then repeat step 1 several times, where the results of the previous iteration are used as initial PDF’s for the next iteration. Finally, we perform step 2 to calculate the final results. Note that by conditioning only on part of the check equations in each iteration, we can not restrict the result to the shaping region. This is the reason that the decoder performs lattice decoding and not exact ML decoding, as described in Section III. We shall now turn to derive the basic iteration of the algorithm. For simplicity, we shall start with the final step of the algorithm (denoted step 2 above). 
We would like to perform t iterations, so assume that for each xk there are t tiers with a total of Nc check equations. For every xk we need to calculate f (xk |s ∈ ZNc , y) = f (xk |s(tier1 ) ∈ Zck , s(tier2 :tiert ) ∈ ZNc −ck , y), where ck is the number of (tier1 ) ∆ f (xk |s P = i∈Zck P (28) where f (xk |y) = f (xk |yk ) due to the i.i.d assumption. Evaluating now the right term of (27), we get: f (s(tier1 ) = i|xk , A, y) = ck Y (tier1 ) = f (sm = im |xk , A, y) (29) m=1 (tier ) where sm 1 denotes the m’th component of s(tier1 ) and im denotes the m’th component of i. Note that each element of s(tier1 ) is a linear combination of several elements of x. Due to the tree assumption, two such linear combinations have no common elements, except for xk itself, which appears in all linear combinations. However, xk is given, so the i.i.d assumption implies that all these linear combinations are independent, so (29) is justified. The condition A (i.e. s(tier2 :tiert ) ∈ ZNc −ck ) does not impact the independence due to the tree assumption. Substituting (27), (28), (29) back in (26), we get: f (xk |s(tier1 ) ∈ Zck , A, y) = = C · f (xk ) · e − (yk −xk )2 2σ 2 ck X Y ck i∈Z = C · f (xk ) · e− (30) 1) f (s(tier = im |xk , A, y) = m m=1 (yk −xk )2 2σ 2 X X ··· i1 ∈Z i2 ∈Z ··· ck X Y (tier1 ) f (sm = im |xk , A, y) = ick ∈Z m=1 ∈ Z , A, y) = f (s(tier1 ) = i|A, y) (26) Evaluating the term inside the sum of the nominator, we get: f (xk , s(tier1 ) = i|A, y) = = f (xk |A, y) · f (s(tier1 ) = i|xk , A, y) = C · f (xk ) · e− (yk −xk )2 2σ 2 ck X Y 1) f (s(tier = im |xk , A, y) m m=1 im ∈Z where C is independent of xk . (tier ) We shall now examine the term inside the sum: f (sm 1 = (tier1 ) im |xk , A, y). Denote the linear combination that sm represents by: (tier1 ) sm = hm,1 xk + rm X hm,l xjl (31) l=2 ck f (xk , s(tier1 ) = i|A, y) i∈Zck f (xk )f (yk |xk ) = f (yk ) (yk −xk )2 f (xk ) 1 = ·√ e− 2σ2 f (yk ) 2πσ 2 f (xk |A, y) = f (xk |yk ) = (tier1 ) check equations that involve xk . s =H ·x denotes the value of the left hand side of these check equations when x is substituted (H (tier1 ) is a submatrix of H that contains only the rows that relate to these check equations), and s(tier2 :tiert ) relates in the same manner to all the other check equations. For simplicity of notations, denote the event s(tier2 :tiert ) ∈ ZNc −ck by A. As explained above, in all the calculations we assume that all the xk ’s are independent. Using (24), we get: (tier1 ) Evaluating the left term, we get: (27) where {hm,l }, l = 1, 2...rm is the set of nonzero coefficients of the appropriate parity check equation, and jl is the set of indices of the appropriate x elements (note that the set jl depends on m but we omit the “m” index for clarity of notations). Without loss of generality, hm,1 is assumed to be ∆ P rm the coefficient of xk . Define zm = l=2 hm,l xjl , such that (tier ) sm 1 = hm,1 xk + zm . We then have: (tier1 ) f (sm = im |xk , A, y) = (32) 17 = fzm |xk ,A,y (zm = im − hm,1 xk |xk , A, y) Now, since we assume that the elements of x are independent, the PDF of the linear combination zm equals the convolution of the PDF’s of its components: fzm |xk ,A,y (zm |xk , A, y) =   1 zm = fxj2 |A,y |A, y ~ |hm,2 | h  m,2  zm 1 ~ fxj3 |A,y |A, y ~ |hm,3 | h  m,3  1 zm |A, y ··· ~ fx |A,y (33) |hm,rm | jrm hm,rm  Note that the functions fxji |y xji |A, y are simply the output PDF’s of the previous iteration. 
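As an aside, the convolution of expanded PDF's in (33) can be sketched numerically as follows. The grids, the simple linear interpolation used for the expansion, and the Gaussian inputs are illustrative choices only; the actual decoder implementation of Section VI uses an averaging interpolation and FFT-based convolutions:

```python
import numpy as np

# Sketch of the check node convolution step: each incoming PDF f(x) is expanded
# to f(x / h) / |h| and the expanded PDFs are convolved. Illustrative only; not
# the paper's implementation.
delta = 1.0 / 64                                   # PDF resolution
x = np.arange(-2.0, 2.0, delta)                    # common support grid

def gaussian_pdf(mean, var):
    p = np.exp(-(x - mean) ** 2 / (2 * var))
    return p / (p.sum() * delta)                   # normalize on the grid

def expand(pdf, h):
    """Approximate f(x / h) / |h| on the same grid (simple linear interpolation)."""
    return np.interp(x / h, x, pdf, left=0.0, right=0.0) / abs(h)

# two incoming variable node messages and their check equation coefficients
f2, h2 = gaussian_pdf(0.3, 0.05), 0.8
f3, h3 = gaussian_pdf(-0.5, 0.02), 0.5

conv = np.convolve(expand(f2, h2), expand(f3, h3)) * delta
grid = np.arange(len(conv)) * delta + 2 * x[0]     # support of the convolution
print("mean of convolution   :", float((grid * conv).sum() * delta))
print("expected h2*m2 + h3*m3:", h2 * 0.3 + h3 * (-0.5))
```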
Define now ∆ pm (xk ) = fzm |xk ,A,y (zm = −hm,1 xk |xk , A, y) (34) Substituting (32), (34) in (30), we finally get: f (xk |s(tier1 ) ∈ Zck , A, y) = = C · f (xk ) · e− (yk −xk )2 2σ 2 ck X Y pm (xk − m=1 im ∈Z (35) im ) hm,1 This result can be summarized as follows. For each of the ck check equations that involve xk , the PDF’s (previous iteration results) of the active equation elements, except for xk itself, are expanded and convolved, according to (33). The convolution result is scaled by (−hm,1 ), the negated coefficient of xk in this check equation, according to (34), to yield pm (xk ). Then, a periodic function with period 1/|hm,1 | is generated by adding an infinite number of shifted versions of the scaled convolution result, according to the sum term in (35). After repeating this process for all the ck check equations that involve xk , we get ck periodic functions, with possibly different periods. We then multiply all these functions. The multiplication result is further multiplied by (yk −xk )2 the channel Gaussian PDF term e− 2σ2 and finally by f (xk ), the marginal PDF of xk under the i.i.d assumption. As discussed in Section III, we assume that f (xk ) is a uniform distribution with large enough range. This means that f (xk ) is constant over the valid range of xk , and can therefore be omitted from (35) and absorbed in the constant C. As noted above, this result is for the final step (equivalent to step 2 above), where we determine the PDF of xk according to the PDF’s of all its tier 1 elements. However, the repeated iteration step is equivalent to step 1 above. In this step ,we assume that xk is a tier 1 element of another element, say xl , and derive the PDF of xk that should be used as input to step 2 of xl (see figure 2). It can be seen that the only difference between step 2 and step 1 is that in step 2 all the check equations that involve xk are used, where in step 1 the check equation that involves both xk and xl is ignored (there must be such an equation since xk is one of the tier 1 elements of xl ). Therefore, the step1 iteration is identical to (35), except that the product does not contain the term that corresponds to the check equation that combines both xk and xl . Denote ∆ fkl (xk ) = f (xk |s(tier1 except l) ∈ Zck −1 , A, y) (36) We then get: fkl (xk ) = C · e− (yk −xk )2 2σ 2 ck Y X m=1 im ∈Z m6=ml pm (xk − im ) hm,1 (37) where ml is the index of the check equation that combines both xk and xl . In principle, a different fkl (xk ) should be calculated for each xl for which xk is a tier 1 element. However, the calculation is the same for all xl that share the same check equation. Therefore, we should calculate fkl (xk ) once for each check equation that involves xk . l can be regarded as the index of the check equation within the set of check equations that involve xk . We can now formulate the iterative decoder. The decoder (t) state variables are PDF’s of the form fkl (xk ), where k = 1, 2, ...n. For each k, l assumes the values 1, 2, ...ck , where ck is the number of check equations that involve xk . t denotes the iteration index. For a regular LDLC with degree d there will be nd PDF’s. The PDF’s are initialized by assuming that xk is a leaf of the tier diagram. Such a leaf has no tier 1 elements, so fkl (xk ) = f (xk ) · f (yk |xk ). As explained above for equation (35), we shall omit the term f (xk ), resulting in initialization with the channel noise Gaussian around the noisy observation yk . 
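A small sketch of the message initialization just described is given below. Each of the d messages leaving a variable node starts as the channel Gaussian centered on the noisy observation y_k, discretized to a finite grid. The resolution and range follow the illustrative values of Section VI; everything else (n, d, y, σ²) is an arbitrary stand-in, not the paper's implementation:

```python
import numpy as np

# Sketch: initialize every variable node message as the channel Gaussian around
# the noisy observation y_k, discretized with resolution delta and normalized so
# that its integral is 1. Parameters below are illustrative only.
delta, half_range = 1.0 / 64, 2.0
n, d, sigma2 = 4, 3, 0.06
y = np.array([0.1, -1.3, 0.7, 2.2])              # toy noisy observations

def init_message(y_k):
    grid = y_k + np.arange(-half_range, half_range, delta)
    pdf = np.exp(-(grid - y_k) ** 2 / (2 * sigma2))
    return grid, pdf / (pdf.sum() * delta)        # normalize: integral = 1

# one message per (variable node, check equation it participates in)
messages = {(k, l): init_message(y[k]) for k in range(n) for l in range(d)}
print(len(messages), "messages initialized")      # n * d = 12
```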
Then, the PDF’s are updated in each iteration according to (37). The variable node messages should Rbe further normalized ∞ in order to get actual PDF’s, such that −∞ fkl (xk )dxk = 1 (this will compensate for the constant C). The final PDF’s for xk , k = 1, 2, ...n are then calculated according to (35). Finally, we have to estimate the integer valued information vector b. This can be done by first estimating the codeword vector x from the peaks of the PDF’s: xˆk = arg maxxk f (xk |s(tier1 ) ∈ Zck , A, y). Finally, we can estimate b as b̂ = bH x̂e. We have finished developing the iterative algorithm. It can be easily seen that the message passing formulation of Section III-A actually implements this algorithm. A PPENDIX IV A SYMPTOTIC B EHAVIOR OF THE VARIANCES R ECURSION A. Proof of Lemma 3 and Lemma 4 We shall now derive the basic iterative equations that relate the variances at iteration t + 1 to the variances at iteration t for a magic square LDLC with dimension n, degree d and generating sequence h1 ≥ h2 ≥ ... ≥ hd > 0. Each iteration, every check node generates d output messages, one for each variable node that is connected to it, where the weights of these d connections are ±h1 , ±h2 , ..., ±hd . For each such output message, the check node convolves d − 1 expanded variable node PDF messages, and then stretches and periodically extends the result. For a specific check node, denote the variance of the variable node message that arrives (t) along an edge with weight ±hj by Vj , j = 1, 2, ...d. Denote the variance of the message that is sent back to a variable 18 (t) node along an edge with weight ±hj by Ṽj . From (2), (3), we get: (t) Ṽj d 1 X 2 (t) h V = 2 hj i=1 i i For illustration, the new recursion for the case d = 3 is: 1 (t+1) U1 (38) (t+1) = d X 1 Ṽi (39) From symmetry considerations, it can be seen that all messages that are sent along edges with the same absolute value of their weight will have the same variance, since the same variance update occurs for all these messages (both for check node messages and variable node messages). Therefore, the d (t) (t) (t) variance values V1 , V2 , ..., Vd are the same for all variable (t) nodes, where Vl is the variance of the message that is sent along an edge with weight ±hl . This completes the proof of Lemma 3. Using this symmetry, we can now derive the recursive (t) (t) (t) update of the variance values V1 , V2 , ..., Vd . Substituting (38) in (39), we get: 1 (t+1) = Vi d X 1 h2m + Pd 2 2 (t) σ j=1 h V m=1 m6=i j j6=m (40) j for i = 1, 2, ...d, which completes the proof of Lemma 4. B. Proof of Theorem 1 We would like to analyze the convergence of the nonlinear (t) (t) (t) recursion (4) for the variances V1 , V2 , ..., Vd . This recursion is illustrated in (5) Pd for 2the case d = 3. It is assumed that hi α < 1, where α = i=2 . Define another set of variables h2 (t) (t) 1 (t) U1 , U2 , ..., Ud , which obey the following recursion. The recursion for the first variable is: 1 (t+1) U1 d X 1 h2m = 2+ 2 (t) σ m=2 h U 1 (41) 1 where for i = 2, 3, ...d the recursion is: 1 (t+1) Ui = Pd (0) h21 j=2 (0) = U2 1 (t+1) = U3 (t) h2j Uj (0) with initial conditions U1 = U2 = ... = Ud = σ 2 . It can be seen that (41) can be regarded as the approximation (t) (t) (t) of (4) under the assumptions that Vi << V1 and Vi << 2 σ for i = 2, 3, ...d. + h23 (t) h21 U1 1 σ2 + (42) h21 (t) (t) h22 U2 + h23 U3 h21 (t) (t) h22 U2 + h23 U3 (t) It can be seen that in the new recursion, U1 obeys a recursion that is independent of the other variables. 
From 1 (41), this recursion can be written as (t+1) = σ12 + α(t) , (0) with initial condition U1 stable linear recursion for (t) 1 + 2 σ (t) i=1 i6=j (t) h21 U1 (t+1) Then, each variable node generates d messages, one for each check node that is connected to it, where the weights of these d connections are ±h1 , ±h2 , ..., ±hd . For each such output message, the variable node generates the product of d − 1 check node messages and the channel noise PDF. For a specific variable node, denote the variance of the message that is sent back to a check node along an edge with weight ±hj by (t+1) (this is the final variance of the iteration). From claim Vj 2, we then get: Vj h22 1 i6=j 1 = U1 U1 = σ 2 . Since α < 1, this is a 1 (t) , which can be solved to get U1 U1 = σ 2 (1 − α) 1−α1t+1 . For the other variables, it can be seen that all have the same right hand side in the recursion (41). Since all are initialized (t) (t) (t) with the same value, it follows that U2 = U3 = ... = Ud for all t ≥ 0. Substituting back in (41), we get the recursion (t+1) (t) (0) U2 = αU2 , with initial condition U2 = σ 2 . Since α < (t) 1, this is a stable linear recursion for U2 , which can be solved (t) to get U2 = σ 2 αt . (t) We found an analytic solution for the variables Ui . How(t) ever, we are interested in the variances Vi . The following claim relates the two sets of variables. Claim 3: For every t ≥ 0, the first variables of the two sets (t) (t) are related by V1 ≥ U1 , where for i = 2, 3, ...d we have (t) (t) V i ≤ Ui . Proof: By induction: the initialization of the two sets of variables obviously satisfies the required relations. Assume (t) now that the relations are satisfied for iteration t, i.e. V1 ≥ (t) (t) (t) U1 and for i = 2, 3, ...d, Vi ≤ Ui . If we now compare the 1 right hand side of the update recursion for (t+1) to that of V1 1 1 (t+1) (i.e. (4) to (41)), then the right hand side for (t+1) U1 V1 is smaller, because it has additional positive terms in the denominators, where the common terms in the denominators are larger according to the induction assumption. Therefore, (t+1) (t+1) , as required. In the same manner, if we V1 ≥ U1 1 compare the right hand side of the update recursion for (t+1) Vi 1 1 to that of (t+1) for i ≥ 2, then the right hand side for (t+1) Ui Vi is larger, because it has additional positive terms, where the common terms are also larger since their denominators are (t+1) smaller due to the induction assumption. Therefore, Vi ≤ (t+1) Ui for i = 2, 3, ...d, as required. (t) Using claim 3 and the analytic results for Ui , we now have: 1 (t) (t) V1 ≥ U1 = σ 2 (1 − α) ≥ σ 2 (1 − α) (43) 1 − αt+1 where for i = 2, 3, ...d we have: (t) Vi (t) ≤ Ui = σ 2 αt (44) We have shown that the first variance is lower bounded by a positive nonzero constant where the other variances 19 are upper bounded by a term that decays exponentially to (t) (t) zero. Therefore, for large t we have Vi << V1 and (t) 2 Vi << σ for i = 2, 3, ...d. It then follows that for large t the variances approximately obey the recursion (41), which was built from the actual variance recursion (4) under these assumptions. Therefore, for i = 2, 3, ...d the variances are not only upper bounded by an exponentially decaying term, but actually approach such a term, where the first variance actually approaches the constant σ 2 (1 − α) in an exponential rate. This completes the proof of Theorem 1. Note that the above analysis only applies if α < 1. To illustrate the behavior for α ≥ 1, consider the simple case of h1 = h2 = ... 
= hd . From (4), (5) it can be seen (0) (t) that for this case, if Vi is independent of i, then Vi is independent of i for every t > 0, since all the elements will follow the same recursive equations. Substituting this result in the first equation, we get the single variable recursion (0) 1 = 1(t) + σ12 with initialization Vi = σ 2 . This (t+1) Vi Vi (t) 2 σ recursion is easily solved to get 1(t) = t+1 = t+1 . It σ 2 or Vi Vi can be seen that all the variances converge to zero, but with slow convergence rate of o(1/t). A PPENDIX V A SYMPTOTIC B EHAVIOR OF THE M EAN VALUES R ECURSION A. Proof of Lemma 5 and Lemma 6 (Mean of Narrow Messages) Assume a magic square LDLC with dimension n and degree d. We shall now examine the effect of the calculations in the check nodes and variable nodes on the mean values and derive the resulting recursion. Every iteration, each check node generates d output messages, one for each variable node that connects to it, where the weights of these d connections are ±h1 , ±h2 , ..., ±hd . For each such output message, the check node convolves d − 1 expanded variable node PDF messages, and then stretches and periodically extends the result. We shall concentrate on the nd consistent Gaussians that relate to the same integer vector b (one Gaussian in each message), and analyze them jointly. For convenience, we shall refer to the mean value of the relevant consistent Gaussian as the mean of the message. Consider now a specific check node. Denote the mean value of the variable node message that arrives at iteration t along (t) the edge with weight ±hj by mj , j = 1, 2, ...d. Denote the mean value of the message that is sent back to a variable node (t) along an edge with weight ±hj by m̃j . From (2), (3) and claim 1, we get:   d X 1  (t)  (t) m̃j = hi mi  (45) bk − hj i=1 i6=j where bk is the appropriate element of b that is related to this specific check equation, which is the only relevant index in the infinite sum of the periodic extension step (3). Note that the check node operation is equivalent Pd to extracting the value of mj from the check equation i=1 hi mi = bk , assuming all the other mi are known. Note also that the coefficients hj should have a random sign. To keep notations simple, we assume that hj already includes the random sign. Later, when several equations will be combined together, we should take it into account. Then, each variable node generates d messages, one for each check node that is connected to it, where the weights of these d connections are ±h1 , ±h2 , ..., ±hd . For each such output message, the variable node generates the product of d − 1 check node messages and the channel noise PDF. For a specific variable node, denote the mean value of the message that arrives from a check node along an edge with weight (t) (t) ±hj by m̃j , and the appropriate variance by Ṽj . The mean value of the message that is sent back to a check node along (t+1) an edge with weight ±hj is mj , the final mean value of the iteration. From claim 2, we then get: Pd (t) (t) yk /σ 2 + i=1 m̃i /Ṽi i6=j (t+1) (46) mj = Pd (t) 1/σ 2 + i=1 1/Ṽi i6=j where yk is the channel observation for the variable node and (t) σ 2 is the noise variance. Note that m̃i , i = 1, 2, ..., d in (46) are the mean values of check node messages that arrive to the same variable node from different check nodes, where in (45) they define the mean values of check node messages that leave the same check node. 
However, it is beneficial to keep the notations simple, and we shall take special care when (46) and (45) are combined. It can be seen that the convergence of the mean values is coupled to the convergence of the variances (unlike the recursion of the variances which was autonomous). However, as the iterations go on, this coupling disappears. To see that, recall from Theorem 1 that for each check node, the variance of the variable node message that comes along an edge with weight ±h1 approaches a finite value, where the variance of all the other messages approaches zero exponentially. According to (38), the variance of the check node message is a weighted sum of the variances of the incoming variable node messages. Therefore, the variance of the check node message that goes along an edge with weight ±h1 will approach zero, since the weighted sum involves only zero-approaching variances. All the other messages will have finite variance, since the weighted sum involves the non zero-approaching variance. To summarize, each variable node sends (and each check node receives) d − 1 “narrow” messages and a single “wide” message. Each check node sends (and each variable node receives) d − 1 “wide” messages and a single “narrow” message, where the narrow message is sent along the edge from which the wide message was received (the edge with weight ±h1 ). We shall now concentrate on the case where the variable node generates a narrow message. Then, the sum in the (t) nominator of (46) has a single term for which Ṽi → 0, which corresponds to i = 1. The same is true for the sum in the denominator. Therefore, for large t, all the other terms will become negligible and we get: (t+1) mj (t) ≈ m̃1 (47) 20 (t) where m̃1 is the mean of the message that comes from the edge with weight h1 , i.e. the narrow check node message. As discussed above, d − 1 of the d variable node messages that leave the same variable node are narrow. From (47) it comes out that for large t, all these d − 1 narrow messages will have the same mean value. This completes the proof of Lemma 5. Now, combining (45) and (47) (where the indices are arranged again, as discussed above), we get: ! d X 1 (t) (t+1) bk − hi mli (48) ml1 ≈ h1 i=2 where li , i = 1, 2..., d are the variable nodes that take place in the check equation for which variable node l1 appears with coefficient ±h1 . bk is the element of b that is related (t+1) to this check equation. ml1 denotes the mean value of the d − 1 narrow messages that leave variable node l1 at iteration (t) t + 1. mli is the mean value of the narrow messages that were generated at variable node li at iteration t. Only narrow messages are involved in (48), because the right hand side of (47) is the mean value of the narrow check node message that arrived to variable node l1 , which results from the convolution of d−1 narrow variable node messages. Therefore, for large t, the mean values of the narrow messages are decoupled from the mean values of the wide messages (and also from the variances), and they obey an autonomous recursion. The mean values of the narrow messages at iteration t can be arranged in an n-element column vector m(t) (one mean value for each variable node). We would like to show that the mean values converge to the coordinates of the lattice point x = Gb. Therefore, it is useful to define the error vector ∆ e(t) = m(t) − x. Since Hx = b, we can write (using the same notations as (48)): ! 
d X 1 xl1 = bk − hi xli (49) h1 i=2 variable node messages have already converged to impulses at the corresponding lattice point coordinates (Theorem 2). We (t) can then substitute in (45) mi = xi for i ≥ 2. The mean value of the (wide) message that is returned along the edge with weight ±hj (j 6= 1) is:   d X 1  (t) (t)  m̃j = hi xi − h1 m1  = (52) bk − hj i=2 i6=j =   h1  1  (t) (t) h1 x1 + hj xj − h1 m1 = xj + x1 − m1 hj hj As in the previous section, for convenience of notations we embed the sign of ±hj in hj itself. The sign ambiguity will be resolved later. The meaning of (52) is that the returned mean value is the desired lattice coordinate plus an error term that is proportional to the error in the incoming wide message. From (38), assuming that the variance of the incoming wide message has already converged to its steady state value σ 2 (1 − α) and the variance of the incoming narrow messages has already converged to zero, the variance of this check node message will be: (t) Ṽj Pd = (t+1) el1 1 ≈− h1 h2 (t+1) mk (t) hi eli (50) yk /σ 2 + i=2 = Or, in vector notation: e(t+1) ≈ −H̃ · e(t) Pd j=2  xk + 1/σ 2 (51) where H̃ is derived from H by permuting the rows such that the ±h1 elements will be placed on the diagonal, dividing each row by the appropriate diagonal element (h1 or −h1 ), and then nullifying the diagonal. Note that in order to simplify the notations, we embedded the sign of ±hj in hj and did not write it implicitly. However, the definition of H̃ solves this ambiguity. This completes the proof of Lemma 6. = (54) h1 hj (xp(k,j) +  h2j (t) − mp(k,j) ) h2 σ2 (1−α) 1 h2j j=2 h21 σ 2 (1−α) Pd Note that the x1 and m1 terms of (52) were replaced by xp(k,j) and mp(k,j) , respectively, since for convenience of notations we denoted by m1 the mean of the message that came to a check node along the edge with weight ±h1 . For substitution in (46) we need to know the exact variable node index that this edge came from. Therefore, p(k, j) denotes the index of the variable node that takes place with coefficient ±h1 in the check equation where xk takes place with coefficient ±hj . Rearranging terms, we then get: (t+1) mk B. Proof of Lemma 7 (Mean of Wide Messages) Recall that each check node receives d−1 narrow messages and a single wide message. The wide message comes along the edge with weight ±h1 . Denote the appropriate lattice point by x = Gb, and assume that the Gaussians of the narrow (53) i . Now, each variable node receives d − 1 where α = i=2 h21 wide messages and a single narrow message. The mean values of the wide messages are according to (52) and the variances are according to (53). The single wide message that this variable node generates results from the d − 1 input wide messages and it is sent along the edge with weight ±h1 . From (46), the wide mean value generated at variable node k will then be: Subtracting (49) from (48), we get: d X h21 2 σ (1 − α) h2j yk (1 − α) + xk · α + = = hj j=2 h1 Pd (55)  (1 − α) + α (t) xp(k,j) − mp(k,j)  = 21 = yk + α(xk − yk ) + d 1 X (t) hj (xp(k,j) − mp(k,j) ) h1 j=2 (t) ∆ Denote now the wide message mean value error by ek = (t) mk − xk (where x = Gb is the lattice point that corresponds to b). Denote by q the difference vector between x and the are used for the generation of wide variable node messages). The zero approaching variance corresponds to the message that (t) arrives along an edge with weight ±h1 , so assume that Ṽk,1 approaches zero and all other variances approach a non-zero (t) value. 
Then, V̂k,i also approaches zero and we have to show (t) V̂k,i that the term (t) Ṽk,1 ∆ noisy observation y, i.e. q = y − x. Note that if b corresponds to the correct lattice point that was transmitted, q equals the channel noise vector w. Subtracting xk from both sides of (55), we finally get: (t+1) ek d 1 X (t) = qk (1 − α) − hj ep(k,j) h1 j=2 (56) , which is a quotient of zero approaching (t) terms, approaches a finite value. Substituting for V̂k,i , we get: −1  (t) V̂k,i lim (t) (t) Ṽk,1 →0 Ṽk,1 d X 1  1  1 +   2 (t) (t)  (t) σ Ṽk,1 →0 Ṽk,1 j=1 Ṽk,j = lim j6=i If we now arrange all the errors in a single column vector e, we can write: −1  e (t+1) (t) = −F · e + (1 − α)q (57) where F is an n × n matrix defined by:  Hr,k  Hr,l if k 6= l and there exist a row r of H Fk,l = for which |Hr,l | = h1 and Hr,k 6= 0  0 otherwise (58) F is well defined, since for a given l there can be at most a single row of H for which |Hr,l | = h1 (note that α < 1 implies that h1 is different from all the other elements of the generating sequence). As discussed above, we embedded the sign in hi for convenience of notations, but when several equations are combined the correct signs should be used. It can be seen that using the notations of (57) resolves the correct signs of the hi elements. This completes the proof of Lemma 7. An alternative way to construct F from H is as follows. To construct the k’th row of F , denote by ri , i = 1, 2, ...d, the index of the element in the k’th column of H with value hi (i.e. |Hri ,k | = hi ). Denote by li , i = 1, 2, ...d, the index of the element in the ri ’th row of H with value h1 (i.e. |Hri ,li | = h1 ). The k’th row of F will be all zeros except for the d − 1 H elements li , i = 2, 3...d, where Fk,li = Hrri,l,k . i = lim (t) Ṽk,1 →0 (t)  Ṽk,1   σ2 +1+ (t) Ṽk,1   (t)  j=2 Ṽk,j j6=i d X =1 (59) (t) Therefore, ai converges to a finite steady state value, and has a finite value for every i and t. This completes the first part of the proof. Pnd (t) We would now like to show that limt→∞ i=1 ai can be expressed in the form 2σ1 2 (Gb − y)T W (Gb − y). Every variable node sends d − 1 narrow messages and a single wide (t) message. We shall start by calculating ai that corresponds to a narrow message. For this case, d − 1 check node messages take place in the sums of (10), from which a single message is narrow and d − 2 are wide. The narrow message arrives (t) along the edge with weight ±h1 , and has variance Ṽk,1 → 0. Substituting in (10), and using (59), we get:  (t) a(k−1)d+i → d X 1  2  j=2  (t) m̃k,1 − (t) m̃k,j (t) Ṽk,j 2  + 2  − yk    σ2 (t) m̃k,1 j6=i (60) i A PPENDIX VI A SYMPTOTIC B EHAVIOR OF THE A MPLITUDES R ECURSION A. Proof of Lemma 9 (t) = From (10), ai is clearly non-negative. From Sections IVB, IV-C (and the appropriate appendices) it comes out that for consistent Gaussians, the mean values and variances of the messages have a finite bounded value and converge to a finite (t) steady state value. The excitation term ai depends on these mean values and variances according to (10), so it is also finite and bounded, and it converges to a steady state value, where caution should be taken for the case of a zero approaching variance. Note that at most a single variance in (10) may approach zero (as explained in Section IV-B, a single narrow check node message is used for the generation of narrow variable node messages, and only wide check node messages Denote x = Gb. 
The mean values of the narrow check node messages converge to the appropriate lattice point coordinates, (t) i.e. m̃k,1 → xk . From Theorem 3, the mean value of the wide variable node message that originates from variable node k converges to xk +ek , where e denotes the vector of error terms. The mean value of a wide check node message that arrives to node k along an edge with weight ±hj can be seen to approach (t) m̃k,j = xk − hh1j ep(k,j) , where p(k, j) denotes the index of the variable node that takes place with coefficient ±h1 in the check equation where xk takes place with coefficient ±hj . For convenience of notations, we shall assume that hj already includes the sign (this sign ambiguity will be resolved later). The variance of the wide variable node messages converges to σ 2 (1 − α), so the variance of the wide check node message that arrives to node k along an edge with weight ±hj can be (t) h2 seen to approach Ṽk,j → h21 σ 2 (1 − α). Substituting in (60), j 22 and denoting q = y − x, we get:   2  h 1 d 2 (xk − yk )  1 X hj ep(k,j) (t) = + a(k−1)d+i →  2  2  j=2 h12 σ 2 (1 − α) σ2 h ±hj in hj for convenience of notations, as discussed above. Turning now to the second term of (63):  2 d xk − hh1l ep(k,l) − yk X = (65) h2 j j6=i   1   1 =  2 2σ  1 − α d X j=2 j6=i   e2p(k,j)   qk2    +  = (61) → (62) d X =  d = To complete the calculation of the contribution of node k to the (t) excitation term, we still have to calculate ai that corresponds (t) to a wide message. Substituting m̃k,j → xk − hh1j ep(k,j) , (t) a(k−1)d+1 (t) − α), V̂k,1 → σ 2 (1 − α) in (10), we get: d d 1X X → 2  h1 hl ep(k,l) h21 h2l l=2 j=l+1  + d 1 X xk − 2 l=2 · h1 hl ep(k,l) h21 h2l − h1 hj ep(k,j) h21 2 σ (1 h2j d d h21 h2l l=2 d d 1 XX = 2 j=2 l=2  + αqk2 + 2qk (F · e)k = − yk · 2 + d X 2 (63) + hj hl ep(k,l) − ep(k,j) h1 h1 e2p(k,l) + αqk2 + 2qk [(1 − α)qk − ek ] = d X ! e2p(k,l) d = (64) (66) d+1−α 2 1 1 qk − 2 (F e)2k − 2 qk ek 2σ 2 2σ (1 − α) σ nd X (t) ai = n X d X (t) a(k−1)d+i → k=1 i=1 (d − 1)2 2 kek + 2σ 2 (1 − α) (67) 2 = + !  2 d d X X h j 2 =α ep(k,j) −  ep(k,j)  = h j=2 j=2 1 d+1−α q 2σ 2 2 1 1 2 kF ek − 2 q T e 2σ 2 (1 − α) σ − Substituting e = (1 − α)(I + F )−1 q (see Theorem 3), we = finally get: nd X (t) ai → i=1 1 T q Wq 2σ 2 (68) where: ∆ T W = (1 − α)(I + F )−1 d X X d−1 e2 + 2 2σ (1 − α) j=2 p(k,j) (t) a(k−1)d+i → Summing over all the variable nodes, the total asymptotic excitation sum term is: h2j 2 h2 hj hl ep(k,l) + l2 e2p(k,j) − 2 ep(k,l) ep(k,j) 2 h1 h1 h1 h1 =α + (2 − α)qk2 − 2qk ek i=1 σ2 h21 h2j ! where we have substituted F e → (1 − α)q − e, as comes out from Lemma 7. Again, using F resolves the sign ambiguity of hj , as discussed above. Substituting (64) and (65) back in (63), summing the result with (62), and rearranging terms, the total contribution of variable node k to the asymptotic excitation sum term is: i=1 1 XX = 2 j=2 ! e2p(k,l) l=2 − α) Starting with the first term, we have:  2 h1 h1 d d X X hl ep(k,l) − hj ep(k,j) l=2 j=l+1 = l=2  1  d − 2 X 2  + (d − 1)qk2  e → 2σ 2 1 − α j=2 p(k,j) h21 2 σ (1 h2j  l=2 (t) a(k−1)d+i  (t) hl h2l 2 q + 2qk ep(k,l) h21 k h1 e2p(k,l) + d X = i=2 Ṽk,j → d  X h2l l=2 Summing over all the narrow messages that leave variable node k, we get: d X 1 l=2   (d − 1)2 I − F T F (I + F )−1 + 2 e2p(k,j) − (F · e)k j=2 where F is defined in Theorem 3 and (F · e)k denotes the k’th element of the vector (F · e). 
APPENDIX VII
GENERATION OF A PARITY CHECK MATRIX FOR LDLC

In the following pseudo-code description, the i, j element of a matrix P is denoted by P_{i,j} and the k'th column of a matrix P is denoted by P_{:,k}.

# Input: block length n, degree d, nonzero elements {h_1, h_2, ..., h_d}.
# Output: a magic square LDLC parity check matrix H with generating sequence {h_1, h_2, ..., h_d}.
# Initialization: choose d random permutations on {1, 2, ..., n}. Arrange the permutations in a d x n matrix P such that each row holds a permutation.
c = 1;                       # column index
loopless_columns = 0;        # number of consecutive columns without loops
# loop removal:
while loopless_columns < n
    changed_permutation = 0;
    if exists i != j such that P_{i,c} = P_{j,c}
        # a 2-loop was found at column c
        changed_permutation = i;
    else
        # if there is no 2-loop, look for a 4-loop
        if exists c' != c such that P_{:,c} and P_{:,c'} have two or more common elements
            # a 4-loop was found at column c
            changed_permutation = the row of P for which the first common element appears in column c;
        end
    end
    if changed_permutation != 0
        # a permutation should be modified to remove the loop
        choose a random integer 1 <= i <= n;
        swap locations c and i in permutation changed_permutation;
        loopless_columns = 0;
    else
        # no loop was found at column c
        loopless_columns = loopless_columns + 1;
    end
    # increase column index
    c = c + 1;
    if c > n
        c = 1;
    end
end
# Finally, build H from the permutations
initialize H as an n x n zero matrix;
for i = 1 : n
    for j = 1 : d
        H_{P_{j,i}, i} = h_j * random sign;
    end
end
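For reference, the procedure above translates almost line by line into Python. The sketch below assumes numpy, uses 0-based indexing, and adds a max_sweeps safeguard against non-termination; these details, the function name, and the interpretation of "the row for which the first common element appears in column c" are illustrative choices rather than part of the original description.

import numpy as np

def generate_ldlc_H(n, h, seed=0, max_sweeps=100):
    # Sketch of the Appendix VII construction of a magic square LDLC
    # parity check matrix (Python rendering of the pseudo-code above).
    rng = np.random.default_rng(seed)
    d = len(h)
    # d random permutations of {0, ..., n-1}, one per row of P
    P = np.array([rng.permutation(n) for _ in range(d)])

    c, loopless, steps = 0, 0, 0
    while loopless < n and steps < max_sweeps * n:
        steps += 1
        changed = -1
        # 2-loop: two permutations place the same row index in column c
        for i in range(d):
            for j in range(i + 1, d):
                if P[i, c] == P[j, c]:
                    changed = i
                    break
            if changed >= 0:
                break
        if changed < 0:
            # 4-loop: another column shares two or more entries with column c
            for c2 in range(n):
                if c2 == c:
                    continue
                common = set(P[:, c]) & set(P[:, c2])
                if len(common) >= 2:
                    changed = next(i for i in range(d) if P[i, c] in common)
                    break
        if changed >= 0:
            # swap positions c and a random position i in the offending permutation
            i = rng.integers(n)
            P[changed, c], P[changed, i] = P[changed, i], P[changed, c]
            loopless = 0
        else:
            loopless += 1
        c = (c + 1) % n

    # build H from the permutations: column i gets h_j (with a random sign) at row P[j, i]
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(d):
            H[P[j, i], i] = h[j] * rng.choice([-1.0, 1.0])
    return H

For example, generate_ldlc_H(100, [1.0, 0.8, 0.5]) returns a 100 x 100 matrix with row and column degree 3 (the generating sequence here is only an example, not the one used in the paper's simulations).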
APPENDIX VIII
REDUCING THE COMPLEXITY OF THE FFT CALCULATIONS

The FFT calculation can be made simpler by using the fact that the convolution is followed by two further steps: the convolution result p̃_j(x) is stretched to p_j(x) = p̃_j(−h_j x) and then periodically extended to Q_j(x) = Σ_{i=−∞}^{∞} p_j(x − i/h_j) (see (3)). It can be seen that the stretching and periodic extension steps can be exchanged: the convolution result p̃_j(x) can first be periodically extended with period 1 to Q̃_j(x) = Σ_{i=−∞}^{∞} p̃_j(x + i) and then stretched to Q_j(x) = Q̃_j(−h_j x). Now, the infinite sum can be written as a convolution with a sequence of Dirac impulses:

$$\tilde Q_j(x) = \sum_{i=-\infty}^{\infty} \tilde p_j(x+i) = \tilde p_j(x) \ast \sum_{i=-\infty}^{\infty} \delta(x+i) \qquad (70)$$

Therefore, the Fourier transform of Q̃_j(x) equals the Fourier transform of p̃_j(x) multiplied by the Fourier transform of the impulse sequence, which is itself an impulse sequence. The FFT of Q̃_j(x) will therefore have several nonzero values, separated by runs of zeros, and these nonzero values equal the FFT of p̃_j(x) after decimation. To ensure an integer decimation rate, we should choose the PDF resolution ∆ such that an interval of range 1 (the period of Q̃_j(x)) contains an integer number of samples, i.e. 1/∆ should be an integer. Also, we should choose L (the number of samples of Q̃_j(x)) to correspond to a range which equals an integer, i.e. D = L · ∆ should be an integer.

Then, we can calculate the (size L) FFT of p̃_j(x) and decimate the result by D, which gives 1/∆ samples that correspond to a single period (of range 1) of Q̃_j(x). However, instead of calculating an FFT of length L and immediately decimating, we can calculate the decimated FFT directly. Denote the expanded PDF at the convolution input by f̃_k, k = 1, 2, ..., L (where the expanded PDF is zero padded to length L). To generate the decimated result directly, we could first calculate the (size D) FFT of each group of D samples that is obtained by decimating f̃_k by L/D = 1/∆; the desired decimated result is then the FFT (of size 1/∆) of the sequence of first samples of these size-D FFT's. However, the first sample of an FFT is simply the sum of its inputs. Therefore, we only have to calculate the sequence (of length 1/∆) g_i = Σ_{k=0}^{D−1} f̃_{i+k/∆}, i = 1, 2, ..., 1/∆, and then calculate the FFT (of length 1/∆) of the result. This is done for all the expanded PDF's. Then, d − 1 such results are multiplied, and an IFFT (of length 1/∆) gives a single period of Q̃_j(x). With this method, instead of calculating d FFT's and d IFFT's of size larger than L, we calculate d FFT's and d IFFT's of size L/D = 1/∆.

In order to generate the final check node message, we should stretch Q̃_j(x) to Q_j(x) = Q̃_j(−h_j x). This can be done by interpolating a single period of Q̃_j(x), using interpolation methods similar to those that were used in Section VI for expanding the variable node PDF's.
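As a quick illustration of this aliasing shortcut, the following numpy snippet checks that summing the zero-padded input in groups spaced 1/∆ apart and taking a short FFT reproduces every D'th sample of the full-length FFT. The sizes and names below are arbitrary examples, not values taken from the decoder implementation.

import numpy as np

rng = np.random.default_rng(1)
inv_delta = 64                 # 1/Delta: samples per unit interval
D = 4                          # decimation rate (an integer, D = L*Delta)
L = D * inv_delta              # FFT size of the zero-padded convolution result
f = rng.normal(size=L)         # stands in for the expanded, zero-padded PDF

# direct method: length-L FFT, then keep every D'th sample
direct = np.fft.fft(f)[::D]

# shortcut: alias the signal to length 1/Delta, then take a short FFT
g = f.reshape(D, inv_delta).sum(axis=0)   # g_i = sum_k f_{i + k/Delta}
shortcut = np.fft.fft(g)

print(np.allclose(direct, shortcut))      # True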
ACKNOWLEDGMENT
Support and interesting discussions with Ehud Weinstein are gratefully acknowledged.

REFERENCES
[1] C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379-423 and pp. 623-656, July and Oct. 1948.
[2] P. Elias, “Coding for noisy channels,” in IRE Conv. Rec., Mar. 1955, vol. 3, pt. 4, pp. 37-46.
[3] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” Bell Syst. Tech. J., vol. 38, pp. 611-656, 1959.
[4] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, 1983.
[5] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[6] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo codes,” Proc. IEEE Int. Conf. Communications, pp. 1064-1070, 1993.
[7] R. de Buda, “The upper error bound of a new near-optimal code,” IEEE Trans. Inform. Theory, vol. IT-21, pp. 441-445, July 1975.
[8] R. de Buda, “Some optimal codes have structure,” IEEE J. Select. Areas Commun., vol. 7, pp. 893-899, Aug. 1989.
[9] T. Linder, Ch. Schlegel, and K. Zeger, “Corrected proof of de Buda's theorem,” IEEE Trans. Inform. Theory, pp. 1735-1737, Sept. 1993.
[10] H. A. Loeliger, “Averaging bounds for lattices and linear codes,” IEEE Trans. Inform. Theory, vol. 43, pp. 1767-1773, Nov. 1997.
[11] R. Urbanke and B. Rimoldi, “Lattice codes can achieve capacity on the AWGN channel,” IEEE Trans. Inform. Theory, pp. 273-278, Jan. 1998.
[12] U. Erez and R. Zamir, “Achieving 1/2 log(1 + SNR) on the AWGN channel with lattice encoding and decoding,” IEEE Trans. Inform. Theory, vol. 50, pp. 2293-2314, Oct. 2004.
[13] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on lattices and cosets,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 177-195, Mar. 1987.
[14] G. D. Forney, Jr., “Coset codes - Part I: Introduction and geometrical classification,” IEEE Trans. Inform. Theory, pp. 1123-1151, Sept. 1988.
[15] O. Shalvi, N. Sommer and M. Feder, “Signal codes,” Proc. IEEE Information Theory Workshop, 2003, pp. 332-336.
[16] O. Shalvi, N. Sommer and M. Feder, “Signal codes,” in preparation.
[17] A. Bennatan and D. Burshtein, “Design and analysis of nonbinary LDPC codes for arbitrary discrete-memoryless channels,” IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 549-583, Feb. 2006.
[18] J. Hou, P. H. Siegel, L. B. Milstein and H. D. Pfister, “Capacity-approaching bandwidth-efficient coded modulation schemes based on low-density parity-check codes,” IEEE Trans. Inform. Theory, vol. 49, pp. 2141-2155, Sept. 2003.
[19] J. H. Conway and N. J. A. Sloane, Sphere Packings, Lattices and Groups. New York: Springer, 1988.
[20] G. Poltyrev, “On coding without restrictions for the AWGN channel,” IEEE Trans. Inform. Theory, vol. 40, pp. 409-417, Mar. 1994.
[21] A. Papoulis, Probability, Random Variables and Stochastic Processes. McGraw-Hill, second edition, 1984.
[22] Y. Saad, Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics (SIAM), 2nd edition, 2003.
[23] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Trans. Inform. Theory, vol. 48, pp. 2201-2214, Aug. 2002.
[24] N. Sommer, M. Feder and O. Shalvi, “Closest point search in lattices using sequential decoding,” Proc. IEEE International Symposium on Information Theory (ISIT), 2005, pp. 1053-1057.
[25] H. El Gamal, G. Caire and M. Damen, “Lattice coding and decoding achieve the optimal diversity-multiplexing tradeoff of MIMO channels,” IEEE Trans. Inform. Theory, vol. 50, pp. 968-985, Jun. 2004.