APCCAS2008 MGomes
APCCAS2008 MGomes
APCCAS2008 MGomes
Abstract— The recent Digital Video Satellite Broadcast DVB-S2 LDPC codes. Encoder architecture explores theses
Standard (DVB-S2) has adopted a powerful FEC scheme based properties by a compacted memory mapping of the parity
on the serial concatenation of Bose-Chaudhuri-Hocquenghen check equations and by performing partial-parallel
(BCH) and Low-Density Parity-Check (LDPC) codes. The computation of the parity check bits, providing a huge encoder
high-speed requirements, long block lengths and adaptive throughput [4]. Also, the partial parallel computation of the
encoding defined in the DVB-S2 standard, present complex parity bits is performed on the fly according to the incoming
challenges in the design of an efficient codec hardware flux of information bits, guaranteeing a negligible delay.
architecture. In this paper, synthesizable, high throughput,
Decoding is accomplished by using a flexible parallel
scalable and parallel HDL models supporting the 21 different
BCH+LDPC DVB-S2 code configurations are presented. For
architecture whose number of functional units (FU) can be any
BCH decoding, an efficient Chien search circuit for shortened factor of M = 360 . Authors [5] were able to generalize a well
BCH codes is proposed. The LDPC codec architecture explores known M-kernel parallel architecture [6], by reducing the
the periodicity M = 360 of the special LDPC-IRA codes adopted number of computation kernels to any integer factor of M ,
by the standard. Synthesis results for an FPGA device from without memory addressing overhead and keep unchanged the
Xilinx show a throughput above the minimal 90Mbps. efficient message mapping scheme [6]. Our strategy also
reduces the length of the barrel shifter by the same factor and,
considerably, simplifies the routing problem.
I. INTRODUCTION
DVB-S2 has adopted shortened BCH codes. Encoding
Channel coding systems based on Forward Error architecture based on a linear feedback shift register machine,
Correction (FEC) systems, play a crucial role in today’s digital has no secrets and can be found on any good classic book on
transmission and storage systems. The presented work is part coding [7]. In order to accomplish high decoding throughputs
of an effort to design a codec for the second-generation Digital a parallel decoder architecture, shared by all 21 codes defined
Video Broadcast Standard (DVB-S2) FEC system based on on the standard, is presented. Several enhancements were
the serial concatenation of Low-Density Parity-Check (LDPC) introduced, namely, in the Chien search circuit for shortened
and Bose-Chaudhuri-Hocquenghen (BCH) codes, allowing to BCH codes.
operate near the Shannon limit [1][2].
The paper is organized as follows. Section II presents the
An important issue in the design of the DVB-S2 FEC outline of the DVB-S2 FEC system. Section III describes the
encoder and decoder architectures is the fact that the standard LDPC codec and section IV the BCH decoder. Synthesis
supports two different frame lengths (16200 bits for low delay results are reported in section V and conclusions are drawn in
applications and 64800 bits otherwise) and a set of different Section VI.
code rates ( 1 4 , 1 3 , 2 5 , 1 2 , 3 5 , 2 3 , 3 4 , 4 5 , 5 6 ,
8 9 and 9 10 ) for both frame lengths and different
II. DVB-S2 FEC SYSTEM
modulation schemes [1]. For each mode of operation it is
defined a different LDPC and BCH code and, although they The new DVB-S2 [1] [2] standard adopted a powerful
share a similar structure and properties, this still poses an FEC scheme based on the serial concatenation of shortened
enormous challenge on the development of an encoder and a BCH codes and LDPC-IRA codes [7]. LDPC codes are linear
decoder fully compliant with all operating modes. block codes defined by sparse binary parity-check matrices
[8][9], H and, usually, represented by Tanner graphs [5]. A
LDPC codec design poses the biggest challenge, due to Tanner graph is a bi-partite graph formed by two types of
their huge dimension, even though DVB-S2 standard has nodes. Check nodes ( ν C ), one per each code constraint, and
adopted a special class of LDPC codes, with linear encoding bit nodes one per each codeword bit (information and parity,
complexity, known by Irregular Repeat-Accumulate (IRA) [2] respectively, ν I and ν P ), with the connection edges between
[3]. In this paper we propose a low complexity partial-parallel them being given by H. An LDPC-IRA code is characterized
and vectorized hardware architectures capable of encoding by a binary parity check matrix, H, of the form,
and decoding all LDPC codes defined in the DVB-S2
standard, that explores the M = 360 periodicity nature of
360 bits
DATA IN 360 bits 360 bits
vector of 360 IN BARREL SHIFTER Rotated Entry
messages
360 bits
1507
ν I of the group C( c ) to which ν cI belongs. architecture is shared by all operating modes. RAM containing
messages are design for worst case scenario, i.e., q = 135 and
The decoder state-of-the-art [6] was based on a flexible a simpler control system (similar to the one used on encoder
partial parallel architecture, with M FU’s, that explores the Fig.1) selects the appropriate control ROM’s of Shift and
M periodicity nature of the codes. Although it was able to
Addresses, according to the operation mode.
provide a throughput far above from the minimum mandatory
rate of 90 Mbps , the high number of FU’s and the long width
of the barrel shifter require a huge silicon area. Authors have IV. BCH DECODER
shown [4] to be possible reducing the number of FU’s to any BCH decoding consists in three major steps [12] depicted
factor L of M (with M = L × N and L, N ∈ ), keeping in Fig. 3, where r represents the received vector, S the
unchanged the simplicity of the shuffling mechanism and the syndromes, σ is the error location polynomial, and D the
efficient memory mapping scheme. Fig. 2 presents the LDPC decoded codeword.
decoder architecture for DVB-S2. For a N -FU architecture
[4], C( c ) and R ( r ) are decomposed by subsampling in L r S σ
subgroups,
Cγ( ) = {c + γ , c + γ + L, c + γ + 2 L, , c + γ + ( N − 1) × L} ,
c
(8) D
R β = {r + β × q, r + ( L + β ) × q,
(r )
, r + (( N − 1) × L + β ) × q} ,
with γ , β = 0, , L − 1 . As mentioned before, a single FU unit Figure 3. Block diagram for the BCH decoding mechanism.
[11] is shared by a constant number of ν I , ν C and ν P nodes
(the last two are processed jointly following a horizontal The first step in the decoding process is the syndrome
schedule approach [6][11]), depending on the code length and computation. For a BCH code with t error correction capacity,
rate. More precisely, for a (n, k ) DVB-S2 LDPC-IRA code, 2t-1 syndromes are calculated from the received polynomial
the FUi, with 0 ≤ i ≤ N − 1 , updates sequentially in bit mode the r(x), as follows:
ν {Ii+γ , i+γ + M , i+γ +2×M , , i+γ +(α −1)×M } (and γ = 0, , L − 1 ) nodes, with n -1
α = k M . In check mode, the same FU updates the S j = r (α j ) = ∑ ri (α j )i , for 1 ≤ j ≤ 2t − 1 , (9)
ν {Cj , j +1, , j + L×q−1} and ν {Pj , j +1, , j +q−1} nodes, with j = i × L × q . This i =0
guarantees that when processing simultaneously the group where r ( x ) = rn −1 x n −1 + mk − 2 x k − 2 + … + r0 is the polynomial
BCH
BCH
where each one of them will be processed by a different FU. is the j-th root of the generator polynomial defining the BCH
Considering (2), the new computed messages only need to be code. If the process is carried out in parallel, processing p bits
right rotated to be handled by the correct ν C nodes [4]. The per clock cycle the necessary time to generate syndromes is
same happens when processing each R ( r ) set, where according nBCH p cycles.
to (2), the right rotation must be reversed in order to the new After syndrome generation, it is computed the error
computed messages have as destination the exact ν I nodes. location polynomial σ ( x) = σ 0 + σ 0 x + ... + σ 0 xt , whose roots
The shuffling network (barrel shifter) is responsible for the are associated with the error locations, by solving the key
correct message exchange between ν C and ν I nodes, equations:
emulating the code Tanner graph. The shift values stored in t
ROM (Fig. 2) can be easily obtained from the annexes B and
C of DVB-S2 standard tables [1]. Once again the decoder ∑S
j =0
t + i -1σ j = 0 , 1 ≤ j ≤ 2t − 1 . (10)
t
positions
SHIFTS
d r = ∑ σ i S2 r - j +1 . (11)
q x wCN
j =0
1508
used, testing p error locations on each cycle, reducing the recent DVB-S2 standard. An efficient Chien search is
search time to k BCH p clock cycles [12]. proposed for the parallel decoding of shortened BCH codes.
The LDPC codec exploits the periodic properties of
In Chien search, the 2m − 1 positions of the finite field are LDPC-IRA codes through structured vectorized hardware
tested. For the shortened BCH it is necessary to modify the
architectures. The developed solution for the challenging
circuits [12] in order to advance the search, β = 2m − n − 1 LDPC decoder proves to be flexible and easily reconfigurable
positions, corresponding to a shortened codeword. This can be according to the coder/decoder constraints.
performed by multiplying the initial values of σ by
α β ,α 2 β ,..., α t β . These constant values are stored in ROM
TABLE II. THROUGHPUT CONSIDERING FREQUENCY OF OPERATION f op
memory and the multiplication is performed using a bit
parallel multiplier as depicted in Fig. 4. Once again the
architecture can handle with the different BCH modes of BCH Encoder f op × nBCH ( k BCH + 1)
operation defined on the standard. A control system adjusts BCH Decoder f op × p × k BCH nBCH
decoding parameters according to the frame size and code
rate. LDPC Encoder f op × nLDPC ⎣⎡(W + q ) × 2 ⎦⎤
N-FU LDPC
σ Decoder ≥ f op × nLDPC ⎡⎣( 2 × W + w j − 3) × max _ iter × L ⎤⎦
0 σ (α j )
(M=NxL)
W - number of elements of matrix H in compacted form given by annexes B and C of DVB-S2
standard
max_iter - maximum number of LDPC decoding iterations
w j - weight of ν I with weight ≠ 3
REFERENCES
αβ α α 2β α 2 α tβ α t [1] ETSI, Digital video broadcasting (DVB); Second generation framing
structure, channel coding and modulation systems for broadcasting,
σ σ σ
interactive services, news gathering and other broad-band satellite
1 2 t applications: EN 302 307 V1. 1.1, 2005.
Figure 4. Modified Chien search circuit for decoding shortened BCH codes. [2] A. Morello and V. Mignone, "DVB-S2: The second generation
standard for satellite broad-band services," Proceedings of the IEEE,
V. SYNTHESIS RESULTS vol. 94, pp. 210-227, 2006.
[3] H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate
The proposed hardware modules for the DVB-S2 FEC codes," Proc. 2nd International Symposium on Turbo Codes & Related
system, presented in the previous sections were validated and Topics, Brest, France, Sept. 2000.
synthesized on the FPGA Virtex-II PRO family. The [4] Marco Gomes, Gabriel Falcão, Vitor Silva, Alexandre Sengo, Vitor
well-known BCH encoder [7] was also implemented for Ferreira and Miguel Falcão, “High Throughput Encoder Architecture
comparison purposes. Table I presents the synthesis results on for DVB-S2 LDPC-IRA Codes”, IEEE ICM2007, pp. 285-288, Cairo,
Egypt, Dec. 2007.
the XC2VP30 FPGA.
[5] Marco Gomes, Gabriel Falcão, Vitor Silva, Vitor Ferreira, Alexandre
TABLE I. SYNTHESYS RESULTS ON XC2VP30 VIRTEX2P XILINX FPGA Sengo and Miguel Falcão, “Flexible Parallel Architecture for DVB-S2
LDPC Decoders”, IEEE Globelcom2007, Washington DC – USA, Nov.
BCH BCH LDPC 45-FU's LDPC 2007.
Encoder Decoder Encoder Decoder [6] F. Kienle, T. Brack, and N. Wehn, "A Synthesizable IP Core for DVB-
Slices 4% 96% 32% 87% S2 LDPC Code Decoding," Design, Automation and Test in Europe
Flip-Flops 2% 9% 3% 19% (DATE'05), Munich, Germany, 2005.
LUT’s 4% 92% 81% [7] S. Lin, D. J. Costello, Error Control Coding, 2nd ed., Prentice-Hall,
BRAM’s -- 2% 20% 132%* N.J., 2004.
Max. Freq. [8] R. G. Gallager, "Low-Density Parity-Check Codes," IRE Transactions
182.7 MHZ 114.8 MHZ 131.7 MHZ 70.8 MHZ
of Oper. on Information Theory, vol. 8, pp. 21-&, 1962.
[9] D. J. C. MacKay, "Good error-correcting codes based on very sparse
Attainable throughputs for a frequency of operation, f op matrices," IEEE Transactions on Information Theory, vol. 45, pp. 399-
are presented in Table II. The minimum 90 Mbps throughput 431, 1999.
is guaranteed in all cases, even though for the 45-FU’s LDPC [10] J. H. Chen and M. P. C. Fossorier, "Near optimum universal belief
decoder only a maximum number of 5 iterations can be propagation based decoding of low-density parity check codes," IEEE
performed. A 90-FU’s architecture (that guarantees 90 Mbps Transactions on Communications, vol. 50, pp. 406-414, 2002.
performing 15 iterations) is suitable to this FPGA family [11] M. Gomes, G. Falcão, J. Gonçalves, V. Silva, M. Falcão, and P. Faia,
which requires at least the use of a XC2VP50 that would also "HDL Library of Processing Units for Generic and DVB-S2 LDPC
Decoding," International Conference on Signal Processing and
satisfy the minimum memory requirements of the 21 Multimedia Applications (SIGMAP2006), Setúbal, Portugal, 2006.
BCH+LDPC operating modes. All modules can be jointly [12] Y.Chen, K.K. Parhi, "Area efficient parallel decoder architecture for
integrated into a single XC2VP100 FPGA. long BCH codes", Proc. IEEE Int. Conf. Acoustics, Speech, and Signal
Processing, vol.5, pp. V - 73-76, May 2004.
VI. CONCLUSIONS [13] E. Jamro, "The Design of a VHDL Based Synthesis Tool for BCH
Codecs", The University of Huddersfield, Sep. 1997.
This paper proposes a high throughput encoder/decoder [14] A. Reyhani-Masoleh, M. Hasan, "Low complexity bit parallel
hardware synthesizable solution for the FEC system of the architectures for polynomial basis multiplication over GF(2m)", IEEE
Trans. Computers, vol. 53, no. 8, pp. 945-959, Aug. 2004.
1509