FPGA Implementation of LDPC Decoder Architecture For Wireless Communication Standards
FPGA Implementation of LDPC Decoder Architecture For Wireless Communication Standards
FPGA Implementation of LDPC Decoder Architecture For Wireless Communication Standards
Abstract—This paper presents a decoder design for Quasi- In this paper we provide the parallel decoder implemen-
Cyclic (QC) Low-Density Parity-Check (LDPC) codes. The design tation for QC-LDPC codes, which meets the requirements of
is parameterized and can be easily rebuilt to support various modern specifications for throughput, and has the flexibility to
LDPC Parity-Check matrices taken from the WiMAX (IEEE
802.16e) and the WiFi (IEEE 802.11n) standards. New techniques work with various code rates and lengths. The proposed LDPC
such as parallelization of the decoding architecture cores are decoder can operate with matrices similar to IEEE 802.11 and
proposed. These cores calculate variable-to-check (VTC) and IEEE 802.16e. Most of the previously proposed architectures
new check-to-variable (CTV) messages and also update posterior do not pay much attention to the way the sparse parity check
probabilities (APPs). The parallel multicore decoding architec- matrix is stored. So, the new way of storing the parity check
ture implies a prior shift of values based on the LDPC matrix and
simultaneous calculation of values for the core. Our decoder is matrix is proposed. It allows to reduce the amount of memory
implemented on FPGAs of the Zynq-7000 Mini-ITX Evaluation needed to store the code matrix.
Board (XC7Z100-2FFG900). The throughput of up to 1,2 GBit/s Moreover, the details and results of implementing the above
and the operation frequency of up to 240 MHz have been mentioned architecture on the FPGA part of the Zynq-7000
achieved. Mini-ITX Evaluation Board (XC7Z100-2FFG900) are pro-
Index Terms—Communication systems, Quasi-Cyclic codes,
Low-Density Parity-Check codes, error correction codes, parallel vided.
decoder architecture, iterative decoding, min-sum (MS) algo- The paper is organized as follows: the next section presents
rithm, FPGA implementation. the overview of the decoding algorithm used herein. The pro-
posed FPGA architecture of the LDPC decoder is described in
I. I NTRODUCTION section III. The hardware implementation results are presented
in section IV. Finally, Section V concludes this paper.
The Low-density parity-check (LDPC) codes concept was
developed by Gallager in 1963 [1]. LDPC codes had been for- II. M IN -S UM D ECODING A LGORITHM
gotten until the Gallager’s work was rediscovered by MacKay Low-Density Parity-Check (LDPC) codes are a class of
and Neal [2] in late 1990’s. Currently, there is a high interest in linear block codes. As the name implies, they exhibit the
LDPC codes due to their excellent error correcting capability linearity property and are applied to source data blocks.
and their performance able to achieve up to 0.0045 dB of Any LDPC code can be represented as a check matrix which
the Shannon’s limit at a BER of 10−6 [3]. These codes have contains a small number of ones and all other elements are
become part of important communication standards such as filled with zeros. A regular (N, K) LDPC code is represented
IEEE 802.11n/ad/ay/ax (WiFi) [4], IEEE 802.16e (WiMAX) as a matrix H which includes M ≥ N − K rows and N
[5], and ETSI 5G [6]. They are mentioned in digital video columns. The M rows of H are defined as check nodes, and
broadcasting standards such as DVB-T2, DVB-S2, and DVB- all the elements of a LDPC codeword as variable nodes.
C2. Nowadays LDPC codes also find place in other fields, The min-sum algorithm (MSA) is a simplified Belief prop-
including error correction techniques for solid state drives agation algorithm (BP) built by reducing complexity of the
(SSD). unit responsible for check node updates (CNU). The following
A number of works devoted to parallel LDPC decoding definitions are given for the MSA algorithm: cn indicates
architectures have been published in the last decade [7] - [10]. the n-th bit of a codeword and yn is a value received from
Only paper [8] presents a universal parametrizable decoder. the channel. The check-to-variable (CTV) message rmn [k]
The decoders proposed in [7], [9], [10] are based only on one is a message from the check node m to the variable node
communication standard or one codeword length. The authors n during the k-th iteration. The variable-to-check message
of the papers widely use BRAMs to store the decoder internal (VTC) qmn [k] is a message from the variable node n to the
values. check node m. The set of variables involved in checking m is
This work was supported by Russian Science Foundation under Grant 17- defined as N (m) and the set of variable n checks is defined
79-20302. as M (n). The set N (m) without the variable n is denoted as
Figure 1. FPGA LDPC decoder structure
N (m)/n and the set M (n) without the check m is denoted III. P ROPOSED D ECODER A RCHITECTURE
as M (n)/m [11]. A. Decoder architecture definitions
The algorithm implementation includes the following steps:
1) Initialization: The general structure of the proposed LDPC decoder is
At this step, the channel probability pn (intrinsic informa- shown in Figure 1.
tion) of the variable node n is calculated based on a priori The soft decisions from the demodulator are used as input
probability: to the decoder. The input serial bitstream is converted into a
parallel and is written to the block memory. For ease of parallel
P (yn |cn = 0) APP reading and writing, the block memory is divided into
pn = log (1)
P (yn |cn = 1) K = N/z blocks. The number of these blocks is determined
The value rmn is initially set to zero. by the length of the codeword (N ) and the size of the circulant
2) Iterative Decoding: (z). The size of each block depends on the number of symbols
The variable-to-check message qmn [k] is calculated at each in the circulant and the width of the input data.
iteration as follows: The Iterations counter block makes a decision on comple-
X tion of the decoding process and output of the result. When
qmn [k] = pn + rm0 n [k − 1] (2) a given number of iterations is reached, its output is set to a
m0 ∈{M (n)/m} high level.
The LDPC decoder makes a hard decision by calculating The Operating mode selection block is used to switch
the a-posterior probability (APP) as follows: the operating mode to ”decoding” or ”result output” and to
organize output of the decoding result. When the high level
is received at the input from the Iterations counter block, the
X
Λn [k] = pn + rm0 n [k − 1] (3)
m0 ∈M (n)
“result output” mode is switched on: a high level is set at
the READY output, and the decoded word is obtained at the
The value of the n-th result bit of the codeword xn is equal
output. Otherwise, the next iteration is to be run.
to 0 when Λn > 0 and xn = 1 otherwise. The decoding
The VTC calculation block is used to calculate the variable-
process is completed when the entire codeword x = [x1 , x2 , · ·
to-check messages based on APPs from the BRAM APP
· · · · xN ] satisfies the parity check equations: Hx = 0, or the
block. A simplified expression for calculating variable-to-
preset maximum number of iterations is reached.
check messages is shown in expression (5).
If the decoding process does not stop, then the check-to-
variable message rmn [k] for the check node m is calculated V T Cn = AP Pn − CT Vn , n = 1..dr , (5)
as follows:
where dr is the matrix row weight.
A check matrix, which is stored in the distributed memory,
Y is needed to retrieve characters from an APP correctly and to
rmn [k] = sign(qmn0 [k])
calculate bit-to-variable messages.
n0 ∈{N (m)/n}
The Min/submin calculation block receives dr signed VTC
× α× min {|qmn0 [k]|} (4) values including the minimum and subminimal values at the
n0 ∈{N (m)/n} input. The dr value is determined by the check matrix.
Compared to the BP algorithm, usage of the min-sum The diagram shown in Figure 2 has eight signed data inputs.
algorithm reduces consumption of hardware resources and the This diagram is a modified version of the algorithm from [12].
computational complexity of CNUs [11]. The maximum matrix row weight is limited by the number of
Figure 2. Min/submin calculation block
inputs, it equals to eight in this case. When the line weight is in Figure 3.
less than eight, the remaining inputs are fed with the maximum To calculate the updated CTV values, the input of the CTV
positive values for the current input bitness. The ABS block is and APP calculation block requires the minimum VTC value,
used to calculate modulo values: the input signed data (1−8) is the product of the VTC signs and the sign for the current
converted to the unsigned data (10 −80 ). Also, the ABS module value. The calculation of the updated CTV is presented by
extracts signs from all inputs and sends them to the signs bus. formula (6).
The collection of all signs is passed to the Signs XORing
block, where modulo 2 addition is performed. The search N
algorithm is a cascade of 4 consecutive stages of comparators Y
CT Vnewn = sign(V T Ck ) × sign(V T Cn )
(Compare), connected as shown in Figure 2. In this case the
k=1
maximum row weight is 7. ×min(|V T C|), (6)
The subminimal value is obtained at the output corre-
sponding to the input with the minimum modulo value. The where n is from 1 to dr .
minimum value is obtained at the other outputs. All outputs The updated APPs are obtained by summing the VTCs and
are signed and have the sign obtained from the Signs XORing CTVs, as shown by the formula (7).
block. The VTC input is also passed to the output.
where n is from 1 to dr .