APCCAS2008 MGomes

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Scalable and Parallel Codec Architectures

for the DVB-S2 FEC System

M. Gomes, G. Falcão, V. Silva V. Ferreira, A. Sengo, L. Silva, N. Marques and M. Falcão


Instituto de Telecomunicações Chipidea – MIPS Technologies Analog Business Group
Dep. of Electrical and Computer Engineering Rua Frederico Ulrich, n. 2650
University of Coimbra, Portugal 4470-605 Moreira da Maia, Portugal
{marco, gff, vitor}@co.it.pt {vferreir, asengo, lsilva, nmarques, mfalcao}@chipidea.mips.com

Abstract— The recent Digital Video Satellite Broadcast DVB-S2 LDPC codes. Encoder architecture explores theses
Standard (DVB-S2) has adopted a powerful FEC scheme based properties by a compacted memory mapping of the parity
on the serial concatenation of Bose-Chaudhuri-Hocquenghen check equations and by performing partial-parallel
(BCH) and Low-Density Parity-Check (LDPC) codes. The computation of the parity check bits, providing a huge encoder
high-speed requirements, long block lengths and adaptive throughput [4]. Also, the partial parallel computation of the
encoding defined in the DVB-S2 standard, present complex parity bits is performed on the fly according to the incoming
challenges in the design of an efficient codec hardware flux of information bits, guaranteeing a negligible delay.
architecture. In this paper, synthesizable, high throughput,
Decoding is accomplished by using a flexible parallel
scalable and parallel HDL models supporting the 21 different
BCH+LDPC DVB-S2 code configurations are presented. For
architecture whose number of functional units (FU) can be any
BCH decoding, an efficient Chien search circuit for shortened factor of M = 360 . Authors [5] were able to generalize a well
BCH codes is proposed. The LDPC codec architecture explores known M-kernel parallel architecture [6], by reducing the
the periodicity M = 360 of the special LDPC-IRA codes adopted number of computation kernels to any integer factor of M ,
by the standard. Synthesis results for an FPGA device from without memory addressing overhead and keep unchanged the
Xilinx show a throughput above the minimal 90Mbps. efficient message mapping scheme [6]. Our strategy also
reduces the length of the barrel shifter by the same factor and,
considerably, simplifies the routing problem.
I. INTRODUCTION
DVB-S2 has adopted shortened BCH codes. Encoding
Channel coding systems based on Forward Error architecture based on a linear feedback shift register machine,
Correction (FEC) systems, play a crucial role in today’s digital has no secrets and can be found on any good classic book on
transmission and storage systems. The presented work is part coding [7]. In order to accomplish high decoding throughputs
of an effort to design a codec for the second-generation Digital a parallel decoder architecture, shared by all 21 codes defined
Video Broadcast Standard (DVB-S2) FEC system based on on the standard, is presented. Several enhancements were
the serial concatenation of Low-Density Parity-Check (LDPC) introduced, namely, in the Chien search circuit for shortened
and Bose-Chaudhuri-Hocquenghen (BCH) codes, allowing to BCH codes.
operate near the Shannon limit [1][2].
The paper is organized as follows. Section II presents the
An important issue in the design of the DVB-S2 FEC outline of the DVB-S2 FEC system. Section III describes the
encoder and decoder architectures is the fact that the standard LDPC codec and section IV the BCH decoder. Synthesis
supports two different frame lengths (16200 bits for low delay results are reported in section V and conclusions are drawn in
applications and 64800 bits otherwise) and a set of different Section VI.
code rates ( 1 4 , 1 3 , 2 5 , 1 2 , 3 5 , 2 3 , 3 4 , 4 5 , 5 6 ,
8 9 and 9 10 ) for both frame lengths and different
II. DVB-S2 FEC SYSTEM
modulation schemes [1]. For each mode of operation it is
defined a different LDPC and BCH code and, although they The new DVB-S2 [1] [2] standard adopted a powerful
share a similar structure and properties, this still poses an FEC scheme based on the serial concatenation of shortened
enormous challenge on the development of an encoder and a BCH codes and LDPC-IRA codes [7]. LDPC codes are linear
decoder fully compliant with all operating modes. block codes defined by sparse binary parity-check matrices
[8][9], H and, usually, represented by Tanner graphs [5]. A
LDPC codec design poses the biggest challenge, due to Tanner graph is a bi-partite graph formed by two types of
their huge dimension, even though DVB-S2 standard has nodes. Check nodes ( ν C ), one per each code constraint, and
adopted a special class of LDPC codes, with linear encoding bit nodes one per each codeword bit (information and parity,
complexity, known by Irregular Repeat-Accumulate (IRA) [2] respectively, ν I and ν P ), with the connection edges between
[3]. In this paper we propose a low complexity partial-parallel them being given by H. An LDPC-IRA code is characterized
and vectorized hardware architectures capable of encoding by a binary parity check matrix, H, of the form,
and decoding all LDPC codes defined in the DVB-S2
standard, that explores the M = 360 periodicity nature of

978-1-4244-2342-2/08/$25.00 ©2008 IEEE. 1506


H ( n−k )×n = ⎡ A ( n−k )×k B( n−k )×( n−k ) ⎤ Authors have shown [4] that is possible to parallelize (3),
⎣ ⎦ based on the M = 360 periodicity nature of DVB-S2 LDPC
⎡ a a 00
a 1 0 0⎤ 01 0 , k −1 codes, being able to process simultaneously M ν I nodes.
⎢ a a a 1 1 0
⎥ Fig. 1 presents the decoder architecture composed by three
⎢ 10
⎥ 11 1, k −1

⎢ 0 1 1 ⎥, (1) major modules which are responsible for computing the S r


=⎢ ⎥ values (Task 1), performing the checksum of all the columns
⎢ 0 ⎥
⎢a a a 1 1 0⎥
of the S memory where the Sr values are stored (Task 2) and,
⎢ n − k − 2 ,0

n − k − 2 ,1 n − k − 2 , k −1
finally, computing the parity check bits (Task 3). The
⎢⎣ a a a
n − k −1,0
0 … … 0 1 1⎥

n − k −1,1 n − k −1, k −1
proposed architecture is independent of the chosen DVB-S2
where B is a staircase lower triangular matrix and A is sparse. LDPC-IRA code. In fact, we only need to design the S RAM
Some periodicity constraints were put on the for the worst case scenario, i.e., q = 135 . Based on the code
pseudo-random design of the A matrices for DVB-S2. The rate and frame length, it is selected the appropriate row of the
control ROM, where the encoder control parameters are stored
matrix A is divided in disjoint groups of M consecutives ν I
nodes. Denoting by, r1 , r2 , … , rw , the indices of the ν C nodes (the number of IN’s groups of weight w ) for each DVB-S2
that connect to the first ν I of group l , the indices of the
l
LDPC code. The corresponding ROM containing shift and
address values is read sequentially in order to perform Task 1,
ν C nodes that connect to ν iI , with 0 ≤ i ≤ M − 1 , of group l can
be obtained by, S(r mod q,:) = S(r mod q,:) ⊕ rotr div q (i ) , (4)
{(r1 + i × q) mod(n − k ), (r2 + i × q) mod(n − k ),… , with all the lines r ∈ IN ( ic ) being updated, for a information
(2) vector input, i = [ic ic +1 ic + M −1 ] ( rota (⋅) is a right circular shift
(rwl + i × q) mod(n − k )},
of a bits).
with q = (nLDPC − k LDPC ) M and M = 360 (a common factor for
all DVB-S2 supported codes). This allows a significant Task 2 is performed simultaneously with Task 1[4]. After
reduction on the storage requirement without code Step 1, formed by tasks 1 and 2, the first 359 values of the
performance loss. Total GF(2) Sum register are processed by a simple
combinatorial module in order to obtain
BCH codes defined in DVB-S2 are shortened codes, since q −1 2 q −1 359 q −1
the block lengths nLDPC are shorter than 2m − 1 , with ⎡ 0 ⊕ S ⊕ S ,…, ⊕ S ⎤ = ⎡0 p
⎢⎣ i = 0 i i = 0 i ⎦ ⎣
i⎥ ⎦ . (5)
q −1 p2 q −1 ,… , p359 q −1 ⎤
m = {14, 16} [1]. i =0

And finally, for j = 0, , q − 1 we calculate M parity bits at


III. LDPC CODEC a time, by using the following procedure,
[ p j pq+ j p2 q+ j ,… , p359 q + j ] = [ p j −1 pq+ j −1 p2 q + j −1 ,… , p359 q + j −1 ] ⊕ S( j ,:) . (6)
A. Encoder
Codes defined by (1) are systematic, i.e., the codewords B. Decoder
have the following format, c = [i p] , with i = [i0 i1 ik −1 ] , LDPC
LDPC codes are decoded using low complexity iterative
the information bits, and p = [ p0 p1 pn − k −1 ] , the parity LDPC LDPC
belief propagation algorithms operating over the Tanner
check bits, being associated to the Tanner graph ν I and ν P graph description [10]. Periodicity properties (2) of DVB-S2
nodes, respectively. Denote by IN ( r ) , the indices of the LDPC-IRA codes turns possible the simultaneous processing
information bits that participate in the parity check node of ν I and ν C node sets, whose indices are given by,
restriction r , i.e. ν rC , and denote by CN ( c ) the indices of C( ) = {c, c + 1,
c
, c + M − 1} , with, c mod M = 0 , and
CN’s that are connected to ν cI (the letters ‘r’ and ‘c’ mean row (r )
R = {r , r + q, r + 2q, , r + ( M − 1)q} , with, 0 ≤ r ≤ q − 1 , (7)
and column of H ). Encoding can be performed recursively
according to respectively, which significantly simplifies the decoder
r control. In fact, according to (2), if ν cI is connected to ν rC ,
pr = S r + pr −1 = ⊕ Si , with S r = ⊕ iz , (3)
i =0 z∈IN ( r ) then, ν rC+i×q , with 0 ≤ i ≤ M − 1 , will be connected to,
for r = 0, , (n − k − 1) , where ⊕ is the GF(2) addition. ν cI+( c −c+i ) mod M , where, c = M × (c div M ) is the index of the first

STEP 1 - Task 1: Sr’s Computation


Control:
‘0’ - step 1 S RAM
Code Memmories of Shifts and Addresses ‘1’ - step 2 M = 360
Frame Size
and Rate 0
a en S0 Sq S359*q
en en en
d 1 8 S1 Sq+1 360 bits
s
d
q Control:
h S2 Sq+2
r ‘0’ - step 1
i first 359 bits ‘1’ - step 2
f e
t s 8 bits
s s 359 bits
Sq-1 S2q-1 S360*q-1 1
e 359 bits 0 360 bits
s 360 bits Total GF(2) Sum 359 bits
360 bits
Control: GF(2) Sum Processed GF(2) Sum DATA OUT
17 bits sequential 1 0
ROM of control counter 0 1
‘0’ - step 1
Control:
Processor
‘1’ - step 2 360 bits
parameters ‘0’ - step 1
‘1’ - step 2
address 360 bits
Control STEP 2 – Task 3: Parity
shift 8 bits STEP 1 – Task 2: Total Sum Computation Bits Computation
9 bits Sr values read from memory

360 bits
DATA IN 360 bits 360 bits
vector of 360 IN BARREL SHIFTER Rotated Entry
messages
360 bits

Figure 1. Architecture for the DVB-S2 LDPC-IRA encoder.

1507
ν I of the group C( c ) to which ν cI belongs. architecture is shared by all operating modes. RAM containing
messages are design for worst case scenario, i.e., q = 135 and
The decoder state-of-the-art [6] was based on a flexible a simpler control system (similar to the one used on encoder
partial parallel architecture, with M FU’s, that explores the Fig.1) selects the appropriate control ROM’s of Shift and
M periodicity nature of the codes. Although it was able to
Addresses, according to the operation mode.
provide a throughput far above from the minimum mandatory
rate of 90 Mbps , the high number of FU’s and the long width
of the barrel shifter require a huge silicon area. Authors have IV. BCH DECODER
shown [4] to be possible reducing the number of FU’s to any BCH decoding consists in three major steps [12] depicted
factor L of M (with M = L × N and L, N ∈ ), keeping in Fig. 3, where r represents the received vector, S the
unchanged the simplicity of the shuffling mechanism and the syndromes, σ is the error location polynomial, and D the
efficient memory mapping scheme. Fig. 2 presents the LDPC decoded codeword.
decoder architecture for DVB-S2. For a N -FU architecture
[4], C( c ) and R ( r ) are decomposed by subsampling in L r S σ
subgroups,
Cγ( ) = {c + γ , c + γ + L, c + γ + 2 L, , c + γ + ( N − 1) × L} ,
c

(8) D
R β = {r + β × q, r + ( L + β ) × q,
(r )
, r + (( N − 1) × L + β ) × q} ,
with γ , β = 0, , L − 1 . As mentioned before, a single FU unit Figure 3. Block diagram for the BCH decoding mechanism.
[11] is shared by a constant number of ν I , ν C and ν P nodes
(the last two are processed jointly following a horizontal The first step in the decoding process is the syndrome
schedule approach [6][11]), depending on the code length and computation. For a BCH code with t error correction capacity,
rate. More precisely, for a (n, k ) DVB-S2 LDPC-IRA code, 2t-1 syndromes are calculated from the received polynomial
the FUi, with 0 ≤ i ≤ N − 1 , updates sequentially in bit mode the r(x), as follows:
ν {Ii+γ , i+γ + M , i+γ +2×M , , i+γ +(α −1)×M } (and γ = 0, , L − 1 ) nodes, with n -1
α = k M . In check mode, the same FU updates the S j = r (α j ) = ∑ ri (α j )i , for 1 ≤ j ≤ 2t − 1 , (9)
ν {Cj , j +1, , j + L×q−1} and ν {Pj , j +1, , j +q−1} nodes, with j = i × L × q . This i =0

guarantees that when processing simultaneously the group where r ( x ) = rn −1 x n −1 + mk − 2 x k − 2 + … + r0 is the polynomial
BCH
BCH

Cγ( ) , the computed messages have as destination a set R (β ) ,


c r corresponding to the received word r = [r0 , …, rn −1 ] , and α j
BCH

where each one of them will be processed by a different FU. is the j-th root of the generator polynomial defining the BCH
Considering (2), the new computed messages only need to be code. If the process is carried out in parallel, processing p bits
right rotated to be handled by the correct ν C nodes [4]. The per clock cycle the necessary time to generate syndromes is
same happens when processing each R ( r ) set, where according nBCH p cycles.
to (2), the right rotation must be reversed in order to the new After syndrome generation, it is computed the error
computed messages have as destination the exact ν I nodes. location polynomial σ ( x) = σ 0 + σ 0 x + ... + σ 0 xt , whose roots
The shuffling network (barrel shifter) is responsible for the are associated with the error locations, by solving the key
correct message exchange between ν C and ν I nodes, equations:
emulating the code Tanner graph. The shift values stored in t
ROM (Fig. 2) can be easily obtained from the annexes B and
C of DVB-S2 standard tables [1]. Once again the decoder ∑S
j =0
t + i -1σ j = 0 , 1 ≤ j ≤ 2t − 1 . (10)

We used inversionless Berlekamp-Massey Algorithm (BMA)


[13], which takes r iterations to compute the discrepancy d r :
ADDRESSES
q x wCN mem

t
positions

SHIFTS

d r = ∑ σ i S2 r - j +1 . (11)
q x wCN

j =0

If d r ≠ 0 , it is used to modify the degree and coefficients


of the polynomial σ ( x) . For a number of iterations greater or
equal to the number of occurred errors d r = 0 and, therefore,
q x wCN

t − 1 iterations are required to find σ ( x) . A low complexity


architecture for the bit-parallel multiplier [14], was used in the
sums of products of the BMA.
q x wCN

The last step in BCH decoding is the Chien search. In this


method, the sum,
t
σ (α j )= ∑ σ iα ij , j = 0,1,..., k − 1 , (12)
i =0
is evaluated every clock cycle. If σ (α i )=0 , the received bit
rn -1- j is corrupted and must be corrected [12]-[14]. To speed
Figure 2. DVB-S2 LDPC decoding architecture using N parallel processing up this process a parallel architecture by a factor p can be
units with M = L × N .

1508
used, testing p error locations on each cycle, reducing the recent DVB-S2 standard. An efficient Chien search is
search time to k BCH p clock cycles [12]. proposed for the parallel decoding of shortened BCH codes.
The LDPC codec exploits the periodic properties of
In Chien search, the 2m − 1 positions of the finite field are LDPC-IRA codes through structured vectorized hardware
tested. For the shortened BCH it is necessary to modify the
architectures. The developed solution for the challenging
circuits [12] in order to advance the search, β = 2m − n − 1 LDPC decoder proves to be flexible and easily reconfigurable
positions, corresponding to a shortened codeword. This can be according to the coder/decoder constraints.
performed by multiplying the initial values of σ by
α β ,α 2 β ,..., α t β . These constant values are stored in ROM
TABLE II. THROUGHPUT CONSIDERING FREQUENCY OF OPERATION f op
memory and the multiplication is performed using a bit
parallel multiplier as depicted in Fig. 4. Once again the
architecture can handle with the different BCH modes of BCH Encoder f op × nBCH ( k BCH + 1)
operation defined on the standard. A control system adjusts BCH Decoder f op × p × k BCH nBCH
decoding parameters according to the frame size and code
rate. LDPC Encoder f op × nLDPC ⎣⎡(W + q ) × 2 ⎦⎤
N-FU LDPC
σ Decoder ≥ f op × nLDPC ⎡⎣( 2 × W + w j − 3) × max _ iter × L ⎤⎦
0 σ (α j )
(M=NxL)
W - number of elements of matrix H in compacted form given by annexes B and C of DVB-S2
standard
max_iter - maximum number of LDPC decoding iterations
w j - weight of ν I with weight ≠ 3

REFERENCES
αβ α α 2β α 2 α tβ α t [1] ETSI, Digital video broadcasting (DVB); Second generation framing
structure, channel coding and modulation systems for broadcasting,
σ σ σ
interactive services, news gathering and other broad-band satellite
1 2 t applications: EN 302 307 V1. 1.1, 2005.
Figure 4. Modified Chien search circuit for decoding shortened BCH codes. [2] A. Morello and V. Mignone, "DVB-S2: The second generation
standard for satellite broad-band services," Proceedings of the IEEE,
V. SYNTHESIS RESULTS vol. 94, pp. 210-227, 2006.
[3] H. Jin, A. Khandekar, and R. McEliece, "Irregular repeat-accumulate
The proposed hardware modules for the DVB-S2 FEC codes," Proc. 2nd International Symposium on Turbo Codes & Related
system, presented in the previous sections were validated and Topics, Brest, France, Sept. 2000.
synthesized on the FPGA Virtex-II PRO family. The [4] Marco Gomes, Gabriel Falcão, Vitor Silva, Alexandre Sengo, Vitor
well-known BCH encoder [7] was also implemented for Ferreira and Miguel Falcão, “High Throughput Encoder Architecture
comparison purposes. Table I presents the synthesis results on for DVB-S2 LDPC-IRA Codes”, IEEE ICM2007, pp. 285-288, Cairo,
Egypt, Dec. 2007.
the XC2VP30 FPGA.
[5] Marco Gomes, Gabriel Falcão, Vitor Silva, Vitor Ferreira, Alexandre
TABLE I. SYNTHESYS RESULTS ON XC2VP30 VIRTEX2P XILINX FPGA Sengo and Miguel Falcão, “Flexible Parallel Architecture for DVB-S2
LDPC Decoders”, IEEE Globelcom2007, Washington DC – USA, Nov.
BCH BCH LDPC 45-FU's LDPC 2007.
Encoder Decoder Encoder Decoder [6] F. Kienle, T. Brack, and N. Wehn, "A Synthesizable IP Core for DVB-
Slices 4% 96% 32% 87% S2 LDPC Code Decoding," Design, Automation and Test in Europe
Flip-Flops 2% 9% 3% 19% (DATE'05), Munich, Germany, 2005.
LUT’s 4% 92% 81% [7] S. Lin, D. J. Costello, Error Control Coding, 2nd ed., Prentice-Hall,
BRAM’s -- 2% 20% 132%* N.J., 2004.
Max. Freq. [8] R. G. Gallager, "Low-Density Parity-Check Codes," IRE Transactions
182.7 MHZ 114.8 MHZ 131.7 MHZ 70.8 MHZ
of Oper. on Information Theory, vol. 8, pp. 21-&, 1962.
[9] D. J. C. MacKay, "Good error-correcting codes based on very sparse
Attainable throughputs for a frequency of operation, f op matrices," IEEE Transactions on Information Theory, vol. 45, pp. 399-
are presented in Table II. The minimum 90 Mbps throughput 431, 1999.
is guaranteed in all cases, even though for the 45-FU’s LDPC [10] J. H. Chen and M. P. C. Fossorier, "Near optimum universal belief
decoder only a maximum number of 5 iterations can be propagation based decoding of low-density parity check codes," IEEE
performed. A 90-FU’s architecture (that guarantees 90 Mbps Transactions on Communications, vol. 50, pp. 406-414, 2002.
performing 15 iterations) is suitable to this FPGA family [11] M. Gomes, G. Falcão, J. Gonçalves, V. Silva, M. Falcão, and P. Faia,
which requires at least the use of a XC2VP50 that would also "HDL Library of Processing Units for Generic and DVB-S2 LDPC
Decoding," International Conference on Signal Processing and
satisfy the minimum memory requirements of the 21 Multimedia Applications (SIGMAP2006), Setúbal, Portugal, 2006.
BCH+LDPC operating modes. All modules can be jointly [12] Y.Chen, K.K. Parhi, "Area efficient parallel decoder architecture for
integrated into a single XC2VP100 FPGA. long BCH codes", Proc. IEEE Int. Conf. Acoustics, Speech, and Signal
Processing, vol.5, pp. V - 73-76, May 2004.
VI. CONCLUSIONS [13] E. Jamro, "The Design of a VHDL Based Synthesis Tool for BCH
Codecs", The University of Huddersfield, Sep. 1997.
This paper proposes a high throughput encoder/decoder [14] A. Reyhani-Masoleh, M. Hasan, "Low complexity bit parallel
hardware synthesizable solution for the FEC system of the architectures for polynomial basis multiplication over GF(2m)", IEEE
Trans. Computers, vol. 53, no. 8, pp. 945-959, Aug. 2004.

1509

You might also like