A Self Checking Reed Solomon Encoder Design and Analysis
A Self Checking Reed Solomon Encoder Design and Analysis
A Self Checking Reed Solomon Encoder Design and Analysis
Abstract
Reed Solomon codes are widely used to identify and correct data errors in transmission and storage systems.
Due to the vital importance of these blocks, a very important research topic is the study of the effects of faults on
their behavior. The presented architecture exploits some properties of the arithmetic operations on GF(2n ) Galois
Field, related to the parity of the binary representation of the elements of the field. The encoder has been mapped on
an SRAM based FPGA, the self-checking property has been analyzed using a SEU fault model and the performances
in terms of area and delay overhead are presented.
I. I NTRODUCTION
High reliable data transmission and storage systems frequently use Error Correction Codes (ECC) to protect their
data. So they are able to detect errors in binary configuration allowing, under suitable assumptions, the possibility
to correct the coded words. The performance of a code is measured in terms of the maximum number of wrong bits
it is able to detect in a coded word and the maximum numbers of bits it is able to correct. Other key element is the
circuit complexity required for code realization. In fact, the actual implementation of a coding procedure on a real
system requires the development of two basic blocks, the encoder and the decoder. The first one, starting from the
non coded word computes the code bits realizing the codeword (composed by the data and code bits) that is the
protected data to be stored or transmitted. On the contrary, the decoder receives in input the codeword and checks
its correctness eventually correcting the wrong bits. In order to meet the system constraints, frequently these blocks
must have high performance in terms of speed and error correction capabilities in order to process a large amount
of data preserving their integrity. Large efforts are devoted to develop codes with increased capabilities of error
detection and correction , but, normally, larger performances correspond to greater efforts for developing encoders
and decoders. Usually system designers focus their attention on the data encoding and decoding performed at the
system input and output, supposing that the primary goal is to protect data during their flow inside the system. In
this approach very critical points are the input and output circuits, since any error in these circuits may introduce
catastrophic effects on the overall system. A fault in the encoder can produce a non correct codeword, while a
fault in the decoder can give a wrong data word even if no errors occurs during the transmission of the codeword.
Moreover these errors will be present in each data flowing in the systems. Therefore great attention must be paid
in order to detect and recover faults in encoding and decoding circuitry. These faults are caused either by the
conventional sources as, for example, the fail of the technological process, the aging of the electronic devices or
by new phenomena related to the new technologies since the shrinking of the elementary devices in the electronic
systems causes a greater susceptibility of the components to the external environment. One of the main concerns
in this field is related to the effects of radiations on silicon devices. A high energy ion during its traveling trough
the device can inject charges into the substrate and these charges, due to the electrical biasing, can reach the
active elements (transistors) changing the content of storage elements. These effects known as Single Event Upsets
(SEU) have been widely studied for applications related to space environment [1], where they are very frequent
and predominant with respect to other possible failures . Moreover, SEU’s have been recently observed and studied
at sea level [2] and in aircraft electronics [3] and therefore they are expected to be an important issue in the
next years. ECC are widely used in space applications for the design of space-borne mass memories [4] and for
the transmission of the collected data to the earth stations. These applications require high reliability, and related
systems must be tolerant to the effects induced by mechanical stresses, thermal stresses and, especially, radiation
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
2
related SEU phenomena. Nowadays, the most used error correcting codes are the Reed-Solomon codes, based on
the properties of the finite field arithmetics. In particular, the finite fields with a number of elements of 2n are
suitable for a digital implementation due to the isomorphism between the addition operation, that is performed
modulo 2, and the xor operation between the bits representing the elements of the field. We propose to exploit this
relationship to detect faults that can occurs in the encoders, achieving the self-checking property for the arithmetic
structures used in the design of the circuitry, and therefore in the overall encoder. The encoder is then implemented
on a FPGA and the proposed solution is evaluated. The paper is organized as follows: Section II describe the fault
model used for the FPGA, Section III illustrates the used background of the Reed Solomon codes, and Section IV
describes the properties of the arithmetic structures used in the Reed Solomon encoders with respect to the parity
of the arithmetic operands. In Section V the architecture of the proposed self-checking Reed Solomon encoder is
presented while in Section VI some evaluation in term of area and delay overhead are provided. Finally, conclusions
are drawn in Section VII.
It must be noticed that only few routing resources of the FPGA are used for each configuration of the device,
while the other interconnecting resources remain unused. In the same way, although SEUs affecting the routing
configuration memory may have many different effects, only a few of them are meaningful and correspond to
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
3
modification of the netlist realized by the original configuration. In fig. 2a we show how a SEU can modify one
net segment, interrupting the signal propagation from the CLB. This kind of defect can be modeled as an open
defect. Instead, in fig. 2b the SEU affecting the configuration memory cause the activation of a routing net that
connect the two original nets of the configuration, causing a defect modeled as a short defect.
In [5] the open defects are supposed equivalent to stuck-at 0 or stuck-at 1 defect, while the short defect can be
characterized as either a wired-AND or a wired-OR model. Two special cases of short defects are stuck-at-0 and
stuck-at-1 defects, where a line is shortened with the ground or power line respectively. In [6] the same defect (open
and short) are described, while the effect of these defects at logic level are assumed as unknown logic values for
the open defects and indefinite logic values for the short defects. In our work we model the SEU in the registers,
the SEU in the LUTs and the open defects as faults affecting only one resource of the logic netlist implemented
in the FPGA. In other words we model these kinds of defects at a higher abstraction level and consider all these
defects as a erroneous value in an input or an output of one of the LUT or memory elements composing the circuit
realizing the RS encoder. Analogous considerations can be done for the fault model of short defects: let us suppose
that the short defect affecting two nets named A and B is modeled as a wired-AND (see fig.3). The two nets A and
B corresponds to a block with two inputs Ain and Bin and two outputs Aout and Bouts. When the inputs of this
block are the identical, Ain,Bin=(1,1) or (0,0), the output of the block is the same of the fault-free configuration.
When the two inputs differs the two outputs provides a value of 0, and therefore only the output of one net differs
from the fault-free configuration. This behavior can be considered as a fault that affect only one component of the
logic netlist and therefore we can work at the abstraction level of a logic netlist composed of LUT and flip-flop in
order to design the self-checking Reed Solomon encoder.
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
4
The finite fields used in digital implementations are in the form GF(2s ), where s is used avoid confusion in
the notation and represents the number of bits of a symbol. An element a(x) ∈ GF (2s ) is a polynomial with
coefficients ai ∈ 0, 1 and can be seen as a symbol of s bits a = as . . . a1 a0 . The addition of two elements a(x) and
b(x) ∈ GF (2s ) is the sum modulo 2 of the coefficients ai and bi , therefore can be seen as the bitwise xor of the
two symbols a and b. The multiplication of two elements a(x) and b(x) ∈ GF (2s ) requires the multiplication of
the two polynomial followed by the reduction modulo i(x). These operation can be implemented as an AND-XOR
network, as explained in [7],[8].
Now we can introduce the RS(n,k ) code, where the symbols composing the data word are represented as elements
of the field GF(2s ) and the overall data word is treated as a polynomial d(x) of degree k with coefficient in GF(2s ).
In other word d(x) is an element of GF(2s )[x], the ring of polynomial with coefficient in GF(2s ).
A Reed-Solomon codeword is generated using a special polynomial g(x). All valid codewords are exactly divisible
by the generator polynomial. The general form of the generator polynomial is:
g(x) = (x + αi )(x + αi+1 ) . . . (x + αi+2t ) (1)
where 2t = n − k and α is a root of the irreducible polynomial i(x) used to construct the field.
The codeword of a separable RS(n,k) code can be seen as c(x), an element of the ring GF(2s )[x] with degree n
that can be constructed in the following way:
c(x) = d(x) · xn−k + p(x) (2)
n−k
p(x) = d(x) · x mod g(x) (3)
where p(x) is an element of the ring GF(2s )[x] with degree less than n − k that represents the parity symbols.
This means that the encoder takes k data symbols and adds 2t parity symbols to make an n symbol codeword.
The 2t parity symbols allows to correct up to t symbols that contain errors in a codeword. Given a symbol size s,
the maximum codeword length n for a Reed-Solomon code is n = 2s − 1. For example, the maximum length of a
code with 8-bit symbols (s=8) is 255 bytes.
of the irreducible generator polynomial g(x). In this case the multiplier can be implemented using an suitable
network of xor gates. As an example we shown in fig. 4 the and-xor network implementing the two LSB of a
multiplication over GF(24 ).
If we suppose the constant g equal to 1010 the network is reduced to the one in fig. 5.
Therefore, each bit of the result of the constant multiplier can be computed xoring some input bits depending
from the constant g and from the chosen i(x) polynomial described in section II.
It must be noticed that, if an input ai is evaluated for an odd number of output bits, the parity P(c(x)) of the
result depend from the parity of the inputs bits. In other words, the parity of the result can be evaluated as:
P (c) = ai (5)
i∈A
where A is the set of inputs that are evaluated an odd number of times.
We propose to modify the constant multiplier block reporting as additional outputs the inputs bits that are
evaluated an even number of times. The proposed modification is can be explained using the concept of ”‘odd
observability”’ proposed in [9]. In this way, the parity of the output word o is:
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
6
Fig. 6. RS encoder
The RS encoder can be constructed using the slice block composed by a constant multiplier, an adder and
a register (the shaded block in fig. 6). The number of slices for a RS(n,k ) code is n − k . The self-checking
implementation requires the insertion of some parity prediction blocks and of a parity checker block. We propose
to check the correctness of each slice using the structure presented in fig. 7.
FPGA with a 4-inputs LUT, if the number of inputs is less or equal 4 the implementation requires only a LUT, if
the number of inputs is up to 7 requires 2 LUT’s, while the worst case of a network of 8 XOR’s requires 3 LUT’s.
As an example of this consideration in table I are reported the overhead introduced for different constant gi .
The predicted parity bit and the output of each slice are evaluated by the parity checker block as shown in fig.
8, and an error indicator signal informs if a difference between the predicted parity bit and the parity of the slices
outputs is detected. The parity checker block signals if the parity of the inputs is even or odd. An even (odd)
parity checker corresponds to a even (odd) number n − k of inputs. The self checking implementation of the parity
checker can be realized with a two rail circuit with two outputs, each equal to the parity of one of the two disjoint
subsets of the inputs [12]. In this way, the fault free behaviour of the checker, with a correct set of inputs (i.e. no
faults occur in the slices), is to provide the output codes 01 or 01 for an odd parity checker or the output codes
00 or 11 for an even parity checker. If the checker receive as input an erroneous codeword (i.e. a fault occurs in a
slice) the checker provide the output codes 11 or 00 for a odd parity checker or the output codes 01 or 10 for a
even parity checker. Also if a fault occurs in the checker, when the fault is activated, the outputs provided are 11 or
00 for an odd parity checker or the output codes 01 or 10 for an even parity checker. Therefore, the self-checking
property of the checker can be obtained simply using two networks of XOR gates for computing the two disjoint
sets of inputs.
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
8
*mean value
an overhead due to the Parity checker of 3 LUTs. The equation estimating the overhead between the self-checking
implementation and the one without self-checking capabilities is:
(n − k) ∗ (6 + 3)
= 50%
(n − k) ∗ 18
The maximum frequency of the Reed Solomon encoders depends on the critical path from the output of the last
slice to the register of another slice. The characterization of the critical time is different for each slice, depending on
the complexity of the constant multiplier gi . We can estimate this time evaluating the numbers of LUTs composing
the path from the outputs of the last slice to the registers in the worst case. For the implementation of the constant
multiplier gi realized with a network of 8 xor the number of LUT’s is 3, therefore the worst path can be estimated
as the path going through 5 LUT’s. In the self-checking implementation of the encoder two consideration can be
done to estimate the delay overhead introduced. First of all the path of the parity prediction block, that provide the
parity of the vector composed by the copies of the even computed bits and of the feed-back inputs, is comparable
with the path of the worst-case constant multiplier. Moreover, an another path can be estimated, which is the
path from the Aout and Pout registers storing the result and its predicted parity, to the register latching the error
indicator signals provided by the error indicator block. This block is realized as a two-rail parity checker, therefore
the number of the LUTs in the critical path can be estimated from the number of bits provided as input to the
checker. In fact, the number of LUTs is the number of levels of the network composed by four inputs xor realized,
that is log4 (n − k)(s + 1). Therefore, the worst path to be used to the delay overhead estimation is the maximum
between the worst path of the standard RS encoder and the path of the two-rail parity checker. It must be noticed
that the number of levels of the two-rail parity checker increases very slowly with the grow of the number of
check symbols, and therefore actually do not represents a problem for the maximum frequency obtainable for the
self-checking implementation of the encoder.
VII. C ONCLUSIONS
In this paper an innovative self-checking Reed Solomon encoder architecture is described. The parity properties
of the binary representation of the elements of the fields GF(2n ) are studied and a method for a self checking
implementation of the arithmetic structures used in the Reed Solomon encoder has been proposed. The method has
been applied to obtain an FPGA implementation of a self-checking encoder, the self-checking property has been
analyzed using a SEU fault model and both area and delay overhead of the proposed solution has been evaluated.
R EFERENCES
[1] A.H. Johnston, Radiation effects in advanced microelectronics technologies, IEEE Trans. Nucl. Sci., Vol. 43, no. 3, pp. 1339-1354, June
1998.
[2] E. Normand, Single Event Upset at Ground Level, IEEE Trans. Nucl. Sci., 43, 2742, (1996)
[3] E. Normand, Single Event Effects in Avionics, IEEE Trans. Nucl. Sci., 43, 461, (1996)
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE
9
[4] G.C. Cardarilli, A. Leandri, P. Marinucci, M. Ottavi, S. Pontarelli, M. Re, A. Salsano,Design of a fault tolerant solid state mass memory,
Reliability, IEEE Transactions on Volume 52, Issue 4, Dec. 2003 Page(s):476 - 491
[5] M. Violante, M. Ceschia, M. Sonza Reorda, A. Paccagnella, P. Bernardi, M. Rebaudengo, D. Bortolato, M. Bellato, P. Zambolin, A.
Candelori, Analyzing SEU Effects in SRAM-based FPGAs, On-Line Testing Symposium, 2003. IOLTS 2003, 7-9 July 2003 pp. 119 -
123
[6] Reddy, E.S.S.; Chandrasekhar, V.; Sashikanth, M.; Kamakoti, V.; Vijaykrishnan, N.; Detecting SEU-caused routing errors in SRAM-based
FPGAs, VLSI Design, 2005. 18th International Conference on 3-7 Jan. 2005 pp. 736 - 741
[7] H. Wu, Bit-parallel finite field multiplier and squarer using polynomial basis, Computers, IEEE Transactions on , Vol. 51 , no. 7 , pp.
750 - 758, July 2002
[8] A. Reyhani-Masoleh, M.A. Hasan, Low complexity bit parallel architectures for polynomial basis multiplication over GF(2m), Computers,
IEEE Transactions on , Vol. 53 , no 8, pp. 945 - 959, Aug. 2004
[9] C. Bolchini, F. Salice, D. Sciuto, A novel methodology for designing TSC networks based on the parity bit code, European Design and
Test Conference, 1997. ED&TC 97. Proceedings , pp. 440 - 444, 17-20 March 1997
[10] R.E. Blahut, Theory and Practice of Error Control Codes, Addison-Wesley Publishing Company, 1983
[11] N. A. Touba, E. J. McCluskey, Logic Synthesis Techniques for Reduced Area Implementation of Multilevel Circuits with Concurrent
Error Detection, Proc. of Int. Conf. on Computer Aided Design (ICCAD), pp. 65l-654, 1994.
[12] D. Nikolos, Design Techniques for testable embedded error checkers, Computers, Vol. 23, Issue 7, pp. 84 - 88, July 1990
Proceedings of the 2005 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT’05)
0-7695-2464-8/05 $20.00 © 2005 IEEE