Anum Imran: Software Implementation and Performance of Umts Turbo Code
R < C = W log2(1 + S/N)        (2.1)

where W is the bandwidth, S is the signal power, N is the received noise power and R is the data transmission rate. In practical transmission, no such thing as an ideal error-free channel exists. Instead, the bit error rate is brought to an arbitrarily small constant, often chosen at 10^-5 or 10^-6. Turbo codes can reach a BER of 10^-5 at a signal-to-noise ratio within 0.7 dB of the Shannon limit [9]. The BER curves for different iterations of a rate r = 1/2 turbo code are shown in Figure 2.1 [10].
Turbo codes consist of two or more component encoders separated by interleavers, so that each encoder uses an interleaved version of the same information sequence. Contrary to conventional decoders, which use hard decisions to decode the bits, concatenated codes like turbo codes do not limit themselves to passing hard decisions among the decoders. They rather pass soft decisions from the output of one decoder to the input of the other. The decoding is done in an iterative fashion to increase the reliability of the decision.
3. TURBO CODES
The generic form of a turbo encoder consists of two encoders separated by the
interleaver. The two encoders used are normally identical and the code is systematic,
i.e., the output contains the input bits as well. Turbo codes are linear codes. Linear
codes are codes for which the modulo sum of two valid code words is also a valid
codeword. A good linear code is one that has mostly high-weight code words. The
weight or Hamming weight of a codeword is the number of ones that it contains, e.g.,
the Hamming weight of codeword 000 and 101 is 0 and 2 respectively. High-weight
code words are desirable because it means that they are more distinct, and thus the
decoder will have an easier time distinguishing among them. While a few low-weight
code words can be tolerated, the relative frequency of their occurrence should be
minimized.
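As a small illustration (my own sketch, not part of the thesis software), the Hamming weight of a codeword given as a bit string is simply the count of its ones:

```python
def hamming_weight(codeword: str) -> int:
    """Number of ones in a binary codeword, e.g. '101' -> 2."""
    return codeword.count("1")

print(hamming_weight("000"), hamming_weight("101"))  # 0 2
```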
The choice of the interleaver is crucial in the code design. The interleaver is used to scramble the bits before they are input to the second encoder. This makes the output of one encoder different from that of the other. Thus, even if one of the encoders occasionally produces a low-weight output, the probability of both encoders producing a low-weight output is extremely small. This improvement is known as interleaver gain. Another
purpose of interleaving is to make the outputs of the two encoders uncorrelated from
each other. Thus, the exchange of information between the two decoders while decoding
yields more reliability. There are different types of interleavers, e.g., row column
interleaver, helical interleaver, odd-even interleaver, etc.
To summarize, it can be said that turbo codes make use of three simple ideas:
- Parallel concatenation of codes to allow simpler decoding,
- Interleaving to provide better weight distribution,
- Soft decoding to enhance decoder decisions and maximize the gain from decoder interaction.
3.1 Turbo code encoder
The fundamental turbo code encoder is built using two identical recursive systematic
convolutional (RSC) encoders concatenated in parallel [9]. Convolutional codes are
usually described using two parameters: the code rate r and the constraint length K. The code rate r = k/n is the ratio of the number of bits input to the convolutional encoder, k, to the number of channel symbols output by the convolutional encoder, n, in a given encoder cycle. The constraint length K denotes the length of the convolutional encoder, i.e., the maximum number of input bits that either output can depend on. The constraint length K is given as

K = m + 1        (3.1)

where m is the maximum number of stages (memory size) in any shift register. The shift registers store the state information of the convolutional encoder, and the constraint length relates to the number of bits upon which the output depends.
In the turbo code encoder, both RSC encoders have a short constraint length in order to
avoid excessive decoding complexity. An RSC encoder is typically of rate r = 1/2 and
is termed a component encoder. The two component encoders are separated by an
interleaver. The output of the turbo encoder consists of the systematic input data and the
parity outputs from two constituent RSC encoders. The systematic outputs from the two
RSC encoders are not needed because they are identical to each other (although ordered
differently) and to the turbo code input. Thus the overall code rate becomes r = 1/3.
Figure 3.1 shows the fundamental turbo code encoder [9]. The first RSC encoder outputs the systematic sequence c1 and the recursive convolutional encoded sequence c2, whereas the second RSC encoder discards its systematic sequence and only outputs the recursive convolutional encoded sequence c3.
3.1.1 Recursive Systematic Convolutional (RSC) encoder
The RSC encoder is obtained from the conventional non-recursive non-systematic
convolutional encoder by feeding back one of its encoded outputs to its input. Figure
3.2 shows a conventional rate r = 1/2 convolutional encoder with constraint length K = 3.
Figure 3.1 Fundamental turbo code encoder.
Figure 3.2 Conventional convolutional encoder with r =1/2 and K =3.
For each adder in the convolutional encoder, a generator polynomial is defined. It shows
the hardware connections of the shift register taps to the modulo-2 adders. A 1
represents a connection and a 0 represents no connection. The generator polynomials
for the above conventional convolutional encoder are given as g1 = [111] and g2 = [101], where the subscripts 1 and 2 denote the corresponding output terminals. The generator matrix of the convolutional encoder is a k-by-n matrix. The element in the i-th row and j-th column indicates how the i-th input contributes to the j-th output. The generator matrix of the above convolutional encoder is given in equation (3.2).
G = [g1, g2] = [111, 101]        (3.2)
The conventional encoder can be transformed into an RSC encoder by feeding back the first output to the input. The generator matrix of the encoder then becomes

G = [1, g2/g1]        (3.3)

where 1 denotes the systematic output, g2 denotes the feed-forward output, and g1 is the feedback to the input of the RSC encoder. Figure 3.3 shows the resulting RSC encoder.
Figure 3.3 RSC encoder obtained from the conventional convolution encoder with
r =1/2 and K =3.
It was suggested in [11] that good codes can be obtained by setting the feedback of the
RSC encoder to a primitive polynomial, because a primitive polynomial generates maximum-length sequences, which add randomness to the turbo code.
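As an illustrative sketch (my own Python, not the thesis software), the conventional r = 1/2, K = 3 encoder of Figure 3.2 can be clocked bit by bit. The tap masks 0b111 and 0b101 encode the generator polynomials g1 = [111] and g2 = [101], and an impulse input reproduces the taps on the two outputs:

```python
def conv_encode(bits, g1=0b111, g2=0b101):
    """Non-recursive convolutional encoder, rate 1/2, K = 3 (Figure 3.2).
    The generator taps apply to [current input, s1, s2]."""
    s1 = s2 = 0
    out = []
    for u in bits:
        window = (u << 2) | (s1 << 1) | s2
        out.append((bin(window & g1).count("1") % 2,   # modulo-2 adder 1
                    bin(window & g2).count("1") % 2))  # modulo-2 adder 2
        s2, s1 = s1, u                                 # shift the register
    return out

# Impulse response reproduces the generator taps: out1 = 111, out2 = 101
print(conv_encode([1, 0, 0, 0]))  # [(1, 1), (1, 0), (1, 1), (0, 0)]
```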
3.1.2 Representation of turbo codes
Turbo codes are often viewed as finite state machines and are represented using state
diagrams and trellis diagrams. The contents in the memory elements of a coder
represent its state. For the rate r =1/2 convolutional encoder of Figure 3.2 with
constraint length K = 3, the number of memory elements is L = K - 1 = 2. The input bit can be either one or zero, so the size M of the input alphabet is 2. The number of possible states of the coder is then M^L = 2^2 = 4 [12].
The operation of a convolutional coder can be represented by a state diagram which
consists of a set of nodes S_j representing the possible states of the encoder, where j ∈ {0, ..., M^L - 1}. The nodes are connected by branches and are labelled by the input symbol and the corresponding output symbol. Figure 3.4 shows the state diagram of the rate r = 1/2 convolutional encoder of Figure 3.2. The state diagram has four states S = {00, 01, 10, 11}. The arrows represent the transitions from one state to another, and the labelling on an arrow gives the input bit (1 or 0) and the output of the encoder.
Figure 3.4 State diagram of the rate r = 1/2, constraint length K = 3 convolutional encoder.
The state diagram can be expanded into the trellis diagram. All state transitions at each
time step are explicitly shown in the diagram to retain the time dimension. The trellis
diagram is very convenient for describing the behaviour of the corresponding decoder.
Figure 3.5 shows the trellis diagram for the encoder in Figure 3.2 [13].
The four possible states of the encoder are depicted as four rows of horizontal dots.
There is one column of four dots for the initial state of the encoder and one for each
time instant during the message. The solid lines connecting dots in the diagram
represent state transitions when the input bit is a one whereas the dotted lines represent
state transitions when the input bit is a zero. The labels on the branch represent the
output bits.
Figure 3.5 Trellis diagram for the encoder in Figure 3.2 [13].
Figure 3.6 Trellis termination for the encoder in Figure 3.2 [14].
Consider the trellis diagram for encoding 15 input bits as shown in Figure 3.6 [14]. The trellis is in state 00 at the beginning, i.e., t=0. The trellis shows the next two states at time t=1 with input bit 1 and 0. At the end of the 15-bit input, the encoder can be in any of the states. However, for better performance of the decoder, both the initial and final states of the encoder should be known. Thus, it is desirable to bring the encoders to a known state after the entire input has been encoded. For this purpose, m = K - 1 tail bits are passed to the constituent encoders, bringing them to the all-zero state. This returns the trellis to the initial all-zero state as well, and is known as trellis termination.
3.1.3 Trellis termination
Unlike conventional convolutional codes, which always use a stream of zeros as tail bits, the tail bits of an RSC encoder depend on the state of the encoder when all the data bits have been encoded [15]. Also, because of the presence of the interleaver between the two encoders, the final states of the two component encoders will be different. Thus, the trellis termination bits for the two encoders will also be different, and an RSC encoder cannot be brought to the all-zero state simply by passing a stream of zeros through it. However, this can be done by using the feedback bit as the encoder input, with a switch at the input as shown in Figure 3.7.
Figure 3.7 The trellis termination strategy for RSC encoder.
The switch is in position A while encoding the input sequence and is switched to position B at the end of the input bit sequence for termination of the trellis. The XOR of the feedback bit with itself is zero (output of the leftmost XOR), and thus the encoder returns to the all-zero state after m = K - 1 clock cycles by inputting the m = K - 1 feedback bits generated immediately after all the data bits have been encoded. Thus, the feedback is used for terminating the trellis, bringing the encoder to the all-zero state.
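The termination rule can be sketched in Python (my own sketch, assuming the g1 = [111] / g2 = [101] RSC encoder of Figure 3.3; names are mine). In position B the input equals the feedback bit s1 ^ s2, so the register input d becomes zero and the state is flushed in m = K - 1 = 2 cycles:

```python
def rsc_encode_terminate(bits):
    """RSC encoder of Figure 3.3 (feedback g1 = [111], feed-forward g2 = [101])
    with trellis termination. Returns (systematic, parity, final_state)."""
    s1 = s2 = 0
    sys_out, par_out = [], []
    def clock(u):
        nonlocal s1, s2
        d = u ^ s1 ^ s2          # register input (recursion via g1 taps)
        sys_out.append(u)
        par_out.append(d ^ s2)   # g2 = [101]: taps on d and s2
        s2, s1 = s1, d
    for u in bits:
        clock(u)                 # switch in position A
    for _ in range(2):           # switch in position B: feed the feedback bit
        clock(s1 ^ s2)
    return sys_out, par_out, (s1, s2)

print(rsc_encode_terminate([1, 1, 0, 1])[2])  # (0, 0): trellis terminated
```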
3.1.4 Recursive convolutional encoders vs. non-recursive encoders
The recursive convolutional encoders are better suited for turbo codes as compared to
non-recursive encoders because they tend to produce higher weight code words.
Consider a rate r = 1/2, constraint length K = 2 non-recursive convolutional encoder with generator polynomials g1 = [11] and g2 = [10], as shown in Figure 3.8. The corresponding RSC encoder with generator matrix G = [1, g2/g1] is shown in Figure 3.9.
Figure 3.8 Non-recursive r=1/2 and K=2 convolutional encoder with input and output
sequences.
Figure 3.9 Recursive r =1/2 and K =2 convolutional encoder of Figure 3.8 with
input and output sequences.
Table 3.1 Input and output sequences for the convolutional encoders of Figure 3.8 and Figure 3.9.

Encoder Type | Input | Output 1 c1 | Output 2 c2 | Weight
Non-RSC      | 1000  | 1100        | 1000        | 3
RSC          | 1000  | 1000        | 1111        | 5
Table 3.1 shows the output sequences corresponding to the same input sequence given to the two encoders. The non-recursive convolutional encoder produces outputs c1 = [1100] and c2 = [1000], and thus has a weight of 3. On the other hand, the recursive convolutional encoder outputs c1 = [1000] and c2 = [1111], which has a weight of 5. Thus, a recursive convolutional encoder tends to produce higher-weight code words than a non-recursive encoder, resulting in better error performance. For turbo codes, the main purpose of implementing RSC encoders as component encoders is to utilize the recursive nature of the encoders, and not the fact that the encoders are systematic [16].
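The weight comparison can be reproduced with a few lines of Python (a sketch of the two K = 2 encoders of Figures 3.8 and 3.9; function names are mine):

```python
def nonrec_k2(bits):
    """Non-recursive K = 2 encoder (Figure 3.8): g1 = [11], g2 = [10]."""
    s, c1, c2 = 0, [], []
    for u in bits:
        c1.append(u ^ s)   # g1 = [11]: taps on input and register
        c2.append(u)       # g2 = [10]: tap on input only
        s = u
    return c1, c2

def rsc_k2(bits):
    """Corresponding RSC encoder (Figure 3.9): G = [1, g2/g1]."""
    s, c1, c2 = 0, [], []
    for u in bits:
        d = u ^ s          # feedback through g1 = [11]
        c1.append(u)       # systematic output
        c2.append(d)       # feed-forward g2 = [10]
        s = d
    return c1, c2

weight = lambda *seqs: sum(map(sum, seqs))
print(weight(*nonrec_k2([1, 0, 0, 0])))  # 3
print(weight(*rsc_k2([1, 0, 0, 0])))     # 5
```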
Figure 3.10 shows the state diagram of the non-recursive and recursive encoders.
Clearly, the state diagrams of the encoders are very similar. Also, the two encoders have
the same minimum free distance and can be described by the same trellis structure [9].
However, these two codes have different BERs as the BER depends on the input-output
correspondence of the encoders [17]. It has been shown that the BER for a recursive
convolutional code is lower than that of the corresponding non-recursive convolutional
code at low signal-to-noise ratios Eb/No [9][17].
(a) State diagram of the non-recursive encoder in Figure 3.8.
(b) State diagram of recursive encoder in Figure 3.9.
Figure 3.10 The state diagrams of the recursive and non-recursive encoders.
3.1.5 Concatenation of codes
In coding theory, concatenated codes form a class of error correcting codes obtained by
combining an inner code with an outer code. The main purpose of concatenated codes is
to have larger block length codes with exponentially decreasing error probability [18].
There are two types of concatenation, namely serial and parallel concatenations. In
serial concatenation, the blocks are connected serially such that the output of one block
is input to the next block and so on.
An interleaver is often used between the encoders in both serial and parallel
concatenated codes to improve burst error correction capacity and increase the
randomness of the code. In case of serial concatenation, the output of encoder 1 is
interleaved before being input to encoder 2 whereas in parallel concatenation, the input
data is interleaved before being input to the second encoder. The serial concatenated
coding scheme is shown in Figure 3.11.
The total code rate for serial concatenation is the product of the code rates of the constituent encoders and is given as [18]

r_s = r1 * r2 = (k1/n1)(k2/n2)        (3.4)
For parallel concatenation, the blocks are connected in parallel. The parallel concatenation scheme is shown in Figure 3.12. The total code rate for parallel concatenation is calculated as

r_p = (k1 + k2) / (n1 + n2)        (3.5)
Figure 3.11 Serial concatenated code.
Figure 3.12 Parallel concatenated code.
Turbo codes use the parallel concatenated encoding scheme in which the two RSC
encoders are connected in parallel separated by an interleaver. However, the turbo code
decoder is based on the serial concatenated decoding scheme. The performance of serial
concatenated decoders is better than the parallel concatenated decoding scheme as the
serial concatenation scheme has the ability to share information between the
concatenated decoders. On the other hand, the decoders for the parallel concatenation
scheme decode primarily independently of each other.
3.1.6 Interleaver design
Turbo codes use an interleaver between two component encoders. The purpose of using
the interleaver is to provide randomness to the input sequences and increase the weight
of the code words.
Consider the constraint length K = 2, rate r = 1/2 convolutional encoder shown earlier in Figure 3.9. The input sequence x_i produces the output sequences c_1i and c_2i respectively. The input sequences x_1 and x_2 are different permuted sequences of x_0. Table 3.2 shows the resulting code words and weights.
Table 3.2 Input and output sequences for the encoder in Figure 3.9.

Input Sequence | Output Sequence c_1i | Output Sequence c_2i | Codeword Weight
x_0 = 1100     | 1100                 | 1000                 | 3
x_1 = 1010     | 1010                 | 1100                 | 4
x_2 = 1001     | 1001                 | 1110                 | 5
The values in Table 3.2 show that the codeword weight can be increased by permuting
the input sequence using an interleaver. The interleaver affects the performance of turbo
codes by directly affecting the distance properties of the code [19]. The BER of a turbo
code is improved significantly by avoiding low-weight code words. The choice of the
interleaver is important in the turbo code design. The optimal interleaver is the one that
produces the fewest low-weight coded sequences. The following algorithm shows the basic interleaver design concept:
- Generate a random interleaver.
- Generate all possible input information sequences.
- Determine the resulting code words for all possible input information sequences, and determine the weight of the code words to find the weight distribution of the code.
- From the collected data, determine the minimum codeword weight and the number of code words with that weight.
The algorithm is repeated a reasonable number of times. By comparing the collected data, the interleaver with the largest minimum codeword weight and the lowest number of code words with that weight is selected.
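For short blocks this search is easy to sketch in Python (a toy brute-force version using the K = 2 RSC encoder of Figure 3.9; all names are mine, and a real design would use much longer blocks and smarter search):

```python
import itertools
import random

def rsc(bits):
    """K = 2 RSC component encoder (Figure 3.9): (systematic, parity)."""
    s, c1, c2 = 0, [], []
    for u in bits:
        d = u ^ s
        c1.append(u)
        c2.append(d)
        s = d
    return c1, c2

def min_codeword_weight(perm):
    """Minimum non-zero turbo codeword weight over all inputs of len(perm)."""
    best = 3 * len(perm)
    for bits in itertools.product((0, 1), repeat=len(perm)):
        if any(bits):
            c1, p1 = rsc(bits)                             # encoder 1: natural order
            _, p2 = rsc([bits[i] for i in perm])           # encoder 2: interleaved
            best = min(best, sum(c1) + sum(p1) + sum(p2))  # systematic + two parities
    return best

random.seed(0)
trials = [random.sample(range(8), 8) for _ in range(20)]   # 20 random interleavers
best = max(trials, key=min_codeword_weight)
print(best, min_codeword_weight(best))
```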
A number of interleavers are used in turbo codes [20]. A few of them are discussed
below in detail.
3.1.6.1 Matrix interleaver
The matrix interleaver is the most commonly used interleaver in communication
systems. It writes data into a matrix row-wise, from left to right and top to bottom, without repeating or omitting any of the data bits. The data is then read out column-wise, from top to bottom and left to right. For example, if the interleaver uses a 2-by-3 matrix to do its internal computations, then for an input of [A B C D E F] the interleaver matrix is

A B C
D E F
The interleaved output is [A D B E C F]. At the deinterleaver the data is written in
columnwise fashion and read rowwise to obtain the original data sequence.
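A one-function sketch (my own Python) reproduces the [A B C D E F] → [A D B E C F] example:

```python
def matrix_interleave(data, rows, cols):
    """Write data row-wise into a rows x cols matrix, read out column-wise."""
    assert len(data) == rows * cols
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]

print(matrix_interleave(list("ABCDEF"), 2, 3))  # ['A', 'D', 'B', 'E', 'C', 'F']
```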
3.1.6.2 Random (Pseudo-Random) interleaver
The random interleaver uses a fixed random permutation and maps the input sequence according to the permutation order. Consider an input sequence of length L=8, and let the random permutation pattern be [2 3 5 6 7 4 8 1]. If the input sequence is [A B C D E F G H], the interleaved sequence will be [B C E F G D H A]. At the deinterleaver, the reverse permutation pattern is used to deinterleave the data.
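The example above can be sketched as follows (my own Python; the pattern entry for output position i is the 1-based source index, which matches the example given):

```python
def permute(data, pattern):
    """Apply a fixed permutation; pattern[i] is the 1-based source index."""
    return [data[p - 1] for p in pattern]

pattern = [2, 3, 5, 6, 7, 4, 8, 1]
print(permute(list("ABCDEFGH"), pattern))  # ['B','C','E','F','G','D','H','A']

# The deinterleaver applies the inverse permutation to restore the order.
inverse = [0] * len(pattern)
for i, p in enumerate(pattern):
    inverse[p - 1] = i + 1
print(permute(permute(list("ABCDEFGH"), pattern), inverse))  # back to A..H
```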
Figure 3.13 The interleaving using random interleaver.
3.1.6.3 Circular-shifting interleaver
The permutation p of the circular-shifting interleaver is defined by

p(i) = (i * b + s) mod L        (3.7)

where i is the index, b is the step size, s is the offset and L is the size of the interleaver. The step size b should be less than L and relatively prime to L [21]. Figure 3.14 illustrates a circular-shifting interleaver with L=8, b=3, and s=0.
Figure 3.14 The interleaving using circular shifting interleaver.
It can be seen that the adjacent bit separation is either 3 or 5. This type of interleaver is
suited for permuting weight-2 input sequences with low codeword weights into weight-
2 input sequences with high codeword weights. However, because of the regularity of
adjacent bit separation (3 or 5 in the above case), it is difficult to permute input
sequences with weight higher than 2 using this interleaver [21].
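The permutation of equation (3.7) is a one-liner (my own sketch); for L=8, b=3, s=0 it yields the adjacent-bit separations of 3 or 5 noted above:

```python
import math

def circular_shift_perm(L, b, s):
    """Circular-shifting interleaver pattern: p(i) = (i*b + s) mod L."""
    assert b < L and math.gcd(b, L) == 1  # b relatively prime to L
    return [(i * b + s) % L for i in range(L)]

print(circular_shift_perm(8, 3, 0))  # [0, 3, 6, 1, 4, 7, 2, 5]
```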
3.1.6.4 Odd-Even interleaver design
The code rate of the turbo code can be changed by puncturing the output sequence.
Puncturing is the process of removing some of the parity bits after encoding. This has
the same effect as encoding with an error-correction code with a higher rate, or less
redundancy. However, with puncturing the same decoder can be used regardless of how
many bits have been punctured, thus puncturing considerably increases the flexibility of
the system without significantly increasing its complexity. A pre-defined pattern of
puncturing is used at the encoder end. The inverse operation, known as depuncturing, is
implemented by the decoder. By puncturing the two coded output sequences of a rate
r = 1/3 turbo code, a rate r = 1/2 turbo code can be obtained. However, by simply puncturing the two coded output sequences, there is a possibility that both the coded bits corresponding to a given information bit are punctured. Such an information bit then has none of its coded bits in the transmitted sequence, while another bit may retain both. As a result, the performance of the turbo decoder degrades in case of an error on an unprotected information bit. This can be avoided by using an odd-even interleaver design. First, the
bits are left uninterleaved and encoded, but only the odd-positioned coded bits are
stored. Then, the bits are scrambled and encoded, but now only the even-positioned
coded bits are stored. As a result, each information bit will now have exactly one of its
coded bits [22]. This guarantees an even protection of each bit of information, a uniform
distribution of the error correction capability and better performance of the code.
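The effect of odd-even puncturing on the transmitted stream can be sketched as follows (my own Python; the 0-based even/odd convention is an assumption for illustration, not taken from the thesis):

```python
def puncture_to_half_rate(x, z1, z2):
    """Odd-even puncturing sketch: rate 1/3 -> 1/2. Every systematic bit x[k]
    is sent, plus z1[k] for even k and z2[k] for odd k (0-based), so each
    information bit retains exactly one of its parity bits."""
    out = []
    for k in range(len(x)):
        out.append(x[k])
        out.append(z1[k] if k % 2 == 0 else z2[k])
    return out

# 4 data bits -> 8 transmitted bits: rate 1/2
print(puncture_to_half_rate([1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0]))
# [1, 1, 0, 1, 1, 0, 1, 0]
```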
3.2 Turbo code decoder
In the traditional decoding approach, demodulation is based on a hard decision of the received symbol. This discrete value is then passed on to the error control decoder. This approach has a disadvantage: the decoder cannot make use of the reliability information that was available to it before the hard decision [9].
A similar but better rule is to take into account the a priori probabilities of the input. If the +1 symbol has a probability of 0.9 and the received symbol falls in the negative decision range, the Maximum Likelihood Decoder (MLD) will decide it as -1; it does not take into account the 0.9 probability of the symbol being +1. A detection method that does take this conditional probability into account is the maximum a posteriori probability (MAP) algorithm, also known as the minimum error rule [23].
Another algorithm used for turbo decoding is the soft output Viterbi algorithm (SOVA).
It uses Viterbi algorithm but with soft outputs instead of hard.
3.2.1 MAP decoder
The process of MAP decoding includes the formation of a posteriori probabilities
(APP) of each information bit followed by choosing the data bit value corresponding to
MAP probability for that data bit. While decoding, the decoder receives as input a soft
(i.e. real) value of the signal. The decoder then outputs for each data bit an estimate
expressing the probability that the transmitted data bit was equal to one indicating the
reliability of the decision. In the case of turbo codes, there are two decoders for outputs
from both encoders. Both decoders provide estimates of the same set of data bits, but in
a different order due to the presence of interleaver. Information exchange is iterated a
number of times to enhance performance. During each iteration, the estimates are re-
evaluated by the decoders using information from the other decoder. This allows the
decoder to find the likelihood as to what information bit was transmitted at each time
instant. In the final stage, hard decisions will be made, i.e., each bit is assigned the value
1 or 0 [24].
The APPs are used to calculate the likelihood ratio by taking their ratio. The logarithmic form of the likelihood ratio is the log likelihood ratio (LLR):

L(d_k) = log [ P(d_k = 1 | R_1^N) / P(d_k = 0 | R_1^N) ]        (3.8)

which can be written in terms of joint probabilities as

L(d_k) = log [ Σ_m λ_k^1(m) / Σ_m λ_k^0(m) ]        (3.9)

where λ_k^i(m) is described as the joint probability that the data bit d_k = i, the encoder state S_k = m and the received sequence is R_1^N:

λ_k^i(m) = P(d_k = i, S_k = m, R_1^N)        (3.10)

The sequence R_1^N can be written as

R_1^N = { R_1^(k-1), R_k, R_(k+1)^N }        (3.11)

Substituting equation (3.11) in equation (3.10), we get

λ_k^i(m) = P(d_k = i, S_k = m, R_1^(k-1), R_k, R_(k+1)^N)        (3.12)

Bayes' theorem gives the conditional probabilities as

P(A, B) = P(A | B) P(B)        (3.13)

Applying the result of equation (3.13) to equation (3.12) yields

λ_k^i(m) = P(S_k = m, R_1^(k-1)) P(R_(k+1)^N | S_(k+1) = f(i, m)) P(d_k = i, R_k | S_k = m)        (3.14)

where f(i, m) denotes the next state given input i and state m.
Equation (3.14) will now be explained in terms of the forward state metric, reverse state
metric and the branch metric.
3.2.1.1 The state metrics and the branch metric
FORWARD STATE METRIC

Consider the first term on the right side of equation (3.14):

P(S_k = m, R_1^(k-1))        (3.15)

For i = 0, 1, time k and state m, the forward state metric is defined as

α_k(m) = P(S_k = m, R_1^(k-1))        (3.16)

The equation represents the forward state metric at time k as the joint probability of the past sequence R_1^(k-1) and the current state S_k = m.

REVERSE STATE METRIC

The second term of equation (3.14) represents the reverse state metric at time k+1 and state f(i, m):

β_(k+1)(f(i, m)) = P(R_(k+1)^N | S_(k+1) = f(i, m))        (3.17)

where f(i, m) is the next state given an input i and state m, and β_(k+1)(f(i, m)) is the reverse state metric at time k+1 and state f(i, m). The third term of equation (3.14) is the branch metric for the transition from state m under input i:

δ_k^i(m) = P(d_k = i, R_k | S_k = m)        (3.18)

Substituting equations (3.16), (3.17) and (3.18) in equation (3.14) yields

λ_k^i(m) = α_k(m) δ_k^i(m) β_(k+1)(f(i, m))        (3.19)

Equations (3.8) and (3.9) can be expressed in terms of equation (3.19) as

Λ(d_k) = [ Σ_m α_k(m) δ_k^1(m) β_(k+1)(f(1, m)) ] / [ Σ_m α_k(m) δ_k^0(m) β_(k+1)(f(0, m)) ]        (3.20)

L(d_k) = log Λ(d_k)        (3.21)

Equations (3.20) and (3.21) represent the likelihood ratio and the log likelihood ratio of the k-th data bit, respectively.
3.2.1.2 Calculating the forward state metric
The forward state metric in equation (3.16) can be expressed as a summation of all possible transition probabilities from time k-1, as follows [24]:

α_k(m) = Σ_(m') Σ_(j=0,1) P(S_(k-1) = m', d_(k-1) = j, S_k = m, R_1^(k-1))        (3.22)

Applying Bayes' theorem, and writing R_1^(k-1) = {R_1^(k-2), R_(k-1)}, yields

α_k(m) = Σ_(m') Σ_(j=0,1) P(S_(k-1) = m', R_1^(k-2)) P(d_(k-1) = j, R_(k-1), S_k = m | S_(k-1) = m')        (3.23)

As the knowledge of the state m' and the input j at time k-1 completely defines the path resulting in state S_k = m, the sum over m' reduces to the single state m' = b(j, m):

α_k(m) = Σ_(j=0,1) P(S_(k-1) = b(j, m), R_1^(k-2)) P(d_(k-1) = j, R_(k-1) | S_(k-1) = b(j, m))        (3.24)

where b(j, m) is the state going backward in time from state m, via the previous branch corresponding to input j. Equation (3.24) can be simplified using equations (3.16) and (3.18):

α_k(m) = Σ_(j=0,1) α_(k-1)(b(j, m)) δ_(k-1)^j(b(j, m))        (3.25)

Figure 3.15 represents equation (3.25) in terms of the transitions from time k-1 to time k.
3.2.1.3 Calculating the reverse state metric
The reverse state metric was given in equation (3.17) as β_(k+1)(f(i, m)) = P(R_(k+1)^N | S_(k+1) = f(i, m)). Using this equation, the reverse state metric for state m at time k is given as

β_k(m) = P(R_k^N | S_k = m)        (3.26)

Equation (3.26) can be expressed as a summation of all possible transition probabilities to time k+1, as follows [24]:

β_k(m) = Σ_(j=0,1) P(d_k = j, S_(k+1) = f(j, m), R_k, R_(k+1)^N | S_k = m)        (3.27)

Using Bayes' theorem,

β_k(m) = Σ_(j=0,1) P(R_(k+1)^N | d_k = j, S_k = m, R_k) P(d_k = j, R_k | S_k = m)        (3.28)

In the first term of equation (3.28), the conditioning terms d_k = j, S_k = m and R_k can be replaced with S_(k+1) = f(j, m), since the input j and the state m at time k completely define the state at time k+1:

β_k(m) = Σ_(j=0,1) P(R_(k+1)^N | S_(k+1) = f(j, m)) P(d_k = j, R_k | S_k = m)        (3.29)

Using equations (3.17) and (3.18), equation (3.29) becomes

β_k(m) = Σ_(j=0,1) β_(k+1)(f(j, m)) δ_k^j(m)        (3.30)

The equation represents the reverse state metric at time k as the weighted sum of state metrics from time k+1, where the weighting consists of the branch metrics associated with the transitions corresponding to data bits 0 and 1. Figure 3.16 shows the graphical representation of calculating the reverse state metric.
3.2.1.4 Calculating the branch metric
Figure 3.16 The graphical representation for calculating the reverse state metric.

The branch metric was given in equation (3.18) as δ_k^i(m) = P(d_k = i, R_k | S_k = m). The equation can be solved using Bayes' rule as

δ_k^i(m) = P(R_k | d_k = i, S_k = m) P(d_k = i | S_k = m)        (3.31)

where R_k = (x_k, y_k) is the received noisy signal, consisting of both the noisy data bit x_k and the noisy parity bit y_k. The noise affects data and parity bits independently. The current state is independent of the current input and can be any of the 2^ν states, where ν is the number of memory elements in the convolutional code system. Thus the second probability is given as

P(d_k = i | S_k = m) = P(d_k = i) = π_k^i        (3.32)

Solving equation (3.31) using equation (3.32) yields

δ_k^i(m) = π_k^i P(x_k, y_k | d_k = i, S_k = m)        (3.33)

The term P(x_k, y_k | d_k = i, S_k = m) factors because the noise on the data and parity bits is independent, and is given as p(x_k | d_k = i) p(y_k | d_k = i, S_k = m); thus equation (3.33) becomes

δ_k^i(m) = π_k^i p(x_k | d_k = i) p(y_k | d_k = i, S_k = m)        (3.34)
Now, the probability density function (pdf) p_Xk(x_k) of a random variable X_k is related to the probability of that random variable X_k taking on the value x_k as

P(X_k = x_k) = p_Xk(x_k) dx        (3.35)

For an AWGN channel with zero mean and variance σ², the pdfs are

p(x_k | d_k = i) = (1 / (σ √(2π))) exp( -(x_k - u_k)² / (2σ²) )
p(y_k | d_k = i, S_k = m) = (1 / (σ √(2π))) exp( -(y_k - v_k)² / (2σ²) )        (3.36)

where u_k and v_k represent the transmitted data bit and parity bit respectively. The parameter v_k represents the parity, which depends on the state m since the code has memory. Simplifying equation (3.36) gives [24]:

δ_k^i(m) = A_k π_k^i exp( u_k x_k / σ² ) exp( v_k y_k / σ² )        (3.37)

where A_k collects the factors that do not depend on the data bit value i. Substituting equation (3.37) in equation (3.8), we obtain:

L(d_k) = (2/σ²) x_k + log( π_k^1 / π_k^0 ) + L_e(d_k)        (3.38a)

L(d_k) = L_c x_k + L(d_k^a) + L_e(d_k)        (3.38b)

where L_c = 2/σ² is the channel reliability at time k, L(d_k^a) = log(π_k^1 / π_k^0) is the a priori LLR and L_e(d_k) is the output extrinsic likelihood. The term L_e(d_k) is given by

L_e(d_k) = log [ Σ_m α_k(m) exp(v_k(1, m) y_k / σ²) β_(k+1)(f(1, m)) / Σ_m α_k(m) exp(v_k(0, m) y_k / σ²) β_(k+1)(f(0, m)) ]        (3.39)

The final soft number L(d_k) is thus the sum of the channel measurement LLR L_c x_k, the a priori LLR and the extrinsic LLR.
Implementing the MAP algorithm in terms of the likelihood ratios is very complex
because of the multiplication operations that are required. However, by implementing it
in the logarithmic domain, the complexity is greatly reduced.
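In the log domain, products of metrics become sums, and the summations are handled with the Jacobian logarithm (max-star) operation, which Log-MAP decoders use; the Max-Log-MAP variant keeps only the max term. A minimal sketch (my own Python):

```python
import math

def max_star(a, b):
    """Jacobian logarithm: ln(e^a + e^b), computed stably as
    max(a, b) + ln(1 + e^(-|a - b|)). Used in Log-MAP decoding."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

# Agrees exactly with the direct computation of ln(e^a + e^b)
a, b = 1.7, -0.3
assert abs(max_star(a, b) - math.log(math.exp(a) + math.exp(b))) < 1e-12

# Max-Log-MAP approximation: drop the correction term, keep only max(a, b)
```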
4. THE UMTS TURBO CODE
Turbo codes were discussed in detail in Chapter 3. This chapter focuses on the implementation of the turbo encoder and decoder for UMTS systems. The architecture of the encoder is discussed in Section 4.1, which explains the constituent convolutional encoders and the interleaver used in the UMTS turbo code in detail. After coding, the systematic bits and
code bits are arranged in a sequence, modulated using BPSK modulation and
transmitted over the channel. The channel models and the decoder architecture are
discussed in detail in section 4.2 and section 4.3 respectively.
4.1 UMTS turbo encoder
The UMTS turbo coding scheme is a Parallel Concatenated Convolutional Code (PCCC). It comprises two constraint length K = 4 (8-state) RSC encoders concatenated in parallel. The overall code rate is approximately r = 1/3. Figure 4.1 shows the UMTS
turbo encoder [26].
The two convolutional encoders used in the turbo code are identical, with generator polynomials [25]

g0(D) = 1 + D² + D³        (4.1)

g1(D) = 1 + D + D³        (4.2)

Figure 4.1 The UMTS turbo encoder [26].

where g0 and g1 are the feedback and feed-forward generator polynomials respectively. The transfer function of each constituent convolutional encoder is:

G(D) = [ 1, g1(D) / g0(D) ]        (4.3)
The data bits are transmitted together with the parity bits generated by the two constituent convolutional encoders. Prior to encoding, both convolutional encoders are set to the all-zero state, i.e., each shift register is filled with zeros. The turbo encoder contains an internal interleaver which interleaves the input data bits X_1, X_2, ..., X_K to X'_1, X'_2, ..., X'_K, which are then input to the second constituent encoder. Thus, the data is encoded by the first encoder in the natural order and by the second encoder after being interleaved. The systematic output of the second encoder is not used, and thus the output of the turbo coder is the serialized combination of the systematic bits X_k, the parity bits Z_k from the first (upper) encoder and the parity bits Z'_k from the second (lower) encoder for k = 1, 2, ..., K:

X_1, Z_1, Z'_1, X_2, Z_2, Z'_2, ..., X_K, Z_K, Z'_K        (4.4)
The size of the input data word may range from as few as 40 to as many as 5114 bits. If the interleaver size is equal to the input data size K, the data is scrambled according to the interleaving algorithm; otherwise, dummy bits are added before scrambling. After all K data bits have been encoded, trellis termination is performed by passing tail bits through the constituent encoders, bringing them to the all-zero state. To achieve this, the switches in Figure 4.1 are moved to the down position. The input in this case is shown by dashed lines (input = feedback bit). Because of the interleaver, the states of the two constituent encoders will usually be different, so the tail bits will also be different and need to be dealt with separately.
As constraint length K = 4 constituent convolutional encoders are used, the transmitted bit stream includes not only the tail bits {X_(K+1), X_(K+2), X_(K+3)} corresponding to the upper encoder but also the tail bits {X'_(K+1), X'_(K+2), X'_(K+3)} corresponding to the lower encoder. In addition to these six tail bits, the six corresponding parity bits {Z_(K+1), Z_(K+2), Z_(K+3)} and {Z'_(K+1), Z'_(K+2), Z'_(K+3)} for the upper and lower encoder respectively are also transmitted. First, the switch in the upper (first) encoder is brought to the lower (flushing) position, and then the switch in the lower (second) encoder. The tail bits are then transmitted at the end of the encoded data frame. The tail bit sequence is:

X_(K+1), Z_(K+1), X_(K+2), Z_(K+2), X_(K+3), Z_(K+3), X'_(K+1), Z'_(K+1), X'_(K+2), Z'_(K+2), X'_(K+3), Z'_(K+3)        (4.5)
The total length of the encoded bit sequence thus becomes 3K+12, 3K being the coded data bits and 12 being the tail bits. The code rate of the encoder is thus r = K / (3K+12). However, for a large input size K, the fractional loss in code rate due to the tail bits is negligible, and thus the code rate is approximated as 1/3.
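One UMTS constituent encoder can be sketched directly from the generator polynomials above (my own Python, not the thesis software; the function name is mine). The feedback taps come from g0 = 1 + D² + D³ and the parity taps from g1 = 1 + D + D³, and termination feeds back s2 ^ s3 for three clock cycles:

```python
def umts_rsc(bits, terminate=True):
    """One UMTS constituent RSC encoder (K = 4): feedback g0 = 1 + D^2 + D^3,
    feed-forward g1 = 1 + D + D^3. Returns (systematic, parity, final_state)."""
    s1 = s2 = s3 = 0
    x, z = [], []
    def clock(u):
        nonlocal s1, s2, s3
        d = u ^ s2 ^ s3        # feedback taps D^2, D^3
        x.append(u)
        z.append(d ^ s1 ^ s3)  # feed-forward taps 1, D, D^3
        s3, s2, s1 = s2, s1, d
    for u in bits:
        clock(u)
    if terminate:              # switch down: input = feedback bit, 3 cycles
        for _ in range(3):
            clock(s2 ^ s3)
    return x, z, (s1, s2, s3)

x, z, state = umts_rsc([1, 0, 1, 1, 0])
print(state, len(x))  # (0, 0, 0) 8  -- terminated, K data bits + 3 tail bits
```

Both constituent encoders are terminated this way, giving the twelve tail bits of equation (4.5).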
4.1.1 Interleaver
The UMTS turbo code uses a matrix interleaver [25][26]. Bits are input to a rectangular matrix in a row-wise fashion, inter-row and intra-row permutations are performed, and the bits are then read out column-wise. The interleaver matrix can have 5, 10 or 20 rows and between 8 and 256 columns (inclusive), depending upon the input data size. Zeros are padded if the number of data bits is less than the interleaver size, and they are later pruned after interleaving.
The input to the interleaver is the input data bit sequence X_1, X_2, ..., X_K, where K is the number of data bits. Interleaving comprises a number of steps, which are described below.
4.1.1.1 Bits input to the rectangular matrix
1. Determine the number of rows R:
The number of rows can be 5, 10 or 20, depending on the size of the input data. If the input data size K lies between 40 and 159, the number of rows R is five. If K is either between 160 and 200 or between 481 and 530, the number of rows R is ten. The interleaver matrix has twenty rows for any input data size K outside the ranges specified.
R = 5 if 40 <= K <= 159; R = 10 if 160 <= K <= 200 or 481 <= K <= 530; R = 20 otherwise (4.6)
The rows are numbered from 0 to R-1 from top to bottom.
2. Determine the number of columns C:
To determine C, first a prime root p is calculated, which is later used in the intra-row permutations. If K lies in the range 481 to 530 (inclusive), then both C and p are assigned the value 53. Otherwise, p is selected from Table 4.1 as the smallest prime such that the product of R and (p+1) is greater than or equal to the number of input bits K. In this case, the number of columns C can take any of the three values p-1, p or p+1, depending on the relation between the number of data bits K, the number of rows R and the prime root p. Like the rows, the columns are numbered from 0 to C-1.
C = p-1 if K <= R(p-1); C = p if R(p-1) < K <= Rp; C = p+1 if Rp < K (4.7)
Table 4.1 The list of prime numbers p and their associated primitive roots v.
p v p v p v p v p v
7 3 47 5 101 2 157 5 223 3
11 2 53 2 103 5 163 2 227 2
13 2 59 2 107 2 167 5 229 6
17 3 61 2 109 6 173 2 233 3
19 2 67 2 113 3 179 2 239 7
23 5 71 7 127 3 181 2 241 7
29 2 73 5 131 2 191 19 251 6
31 3 79 3 137 3 193 5 257 3
37 2 83 2 139 2 197 2
41 6 89 3 149 2 199 3
43 3 97 5 151 6 211 2
3. Writing data into the rectangular matrix:
The data is written into the interleaver matrix row-wise. The size of the interleaver matrix is R × C. If the size of the input stream is smaller than the interleaver size, dummy bits are padded, which are later pruned away after interleaving. The interleaver matrix is
where K is the length of the input bit sequence and k = 0, 1, 2, ..., K.
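The row and column selection in steps 1 and 2 above can be sketched as follows. This is a hedged sketch: the function name is illustrative, and the prime list reproduces the p column of Table 4.1.

```python
# Sketch of the interleaver row/column selection described above.
# PRIMES lists the prime roots p from Table 4.1.

PRIMES = [7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67,
          71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137,
          139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199,
          211, 223, 227, 229, 233, 239, 241, 251, 257]

def rows_cols(K):
    """Return (R, C, p) for an input block of K bits (40 <= K <= 5114)."""
    if 40 <= K <= 159:
        R = 5
    elif 160 <= K <= 200 or 481 <= K <= 530:
        R = 10
    else:
        R = 20
    if 481 <= K <= 530:
        return R, 53, 53                  # special case: C = p = 53
    p = next(q for q in PRIMES if R * (q + 1) >= K)   # smallest suitable prime
    if K <= R * (p - 1):
        C = p - 1
    elif K <= R * p:
        C = p
    else:
        C = p + 1
    return R, C, p
```

For example, the smallest block K = 40 yields a 5 × 8 matrix, and the largest, K = 5114, a 20 × 256 matrix, which matches the stated 8-to-256 column range.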
4.1.1.2 Intra-row and inter-row permutations
The permutation process consists of seven steps, which are discussed in detail below:
1. For the calculated value of the prime root p, the corresponding primitive root v is selected from Table 4.1.
2. The base sequence s for the intra-row permutation is constructed using the primitive root v, the prime root p and the previous value of the base sequence. The base sequence is initialized to one.
s(j) = (v × s(j-1)) mod p, j = 1, 2, ..., p-2; s(0) = 1 (4.8)
3. The prime number sequence q is found using the value of the prime root p and is later used in the calculation of the intra-row permutation variables. The size of the sequence equals the number of rows of the interleaver matrix. The values are found using the greatest common divisor (gcd): each prime integer q_i (i = 1, 2, ..., R-1) in the sequence is the least prime such that the gcd of q_i and (p-1) equals one, q_i is greater than 6, and q_i is greater than the previous value q_{i-1}. The first value is initialized as q_0 = 1.
q_0 = 1; q_i = min{q : q prime, gcd(q, p-1) = 1, q > 6, q > q_{i-1}}, i = 1, ..., R-1 (4.9)
4. The inter-row permutation pattern is selected depending on the number of input bits K and the number of rows R of the interleaver matrix. The inter-row permutation simply changes the ordering of the rows without changing the ordering of elements within a row. If the number of rows is either five or ten, the inter-row permutation is simply a flipping of the rows, in other words a reflection around the centre row; e.g., if the number of rows is five, the rows {0, 1, 2, 3, 4} are permuted to rows {4, 3, 2, 1, 0} respectively. The same holds when the number of rows is ten: the rows {0, 1, ..., 9} become rows {9, 8, 7, 6, 5, 4, 3, 2, 1, 0} respectively. When the number of rows is twenty, there are two different cases based on the number of input bits. The rows {0, 1, ..., 19} become rows {19, 9, 14, 4, 0, 2, 5, 7, 12, 18, 16, 13, 17, 15, 3, 1, 6, 11, 10, 8} respectively when the number of input bits K satisfies either 2281 <= K <= 2480 or 3161 <= K <= 3210. For any other value of K, the permutation pattern is {19, 9, 14, 4, 0, 2, 5, 7, 12, 18, 10, 8, 13, 17, 3, 1, 16, 6, 15, 11}.
5. The prime integer sequence q_i is permuted to obtain r_i (for i = 0, 1, ..., R-1), which is used in calculating the intra-row permutation patterns:
r_{T(i)} = q_i, i = 0, 1, ..., R-1 (4.10)
where T(i) refers to the i-th value of the inter-row permutation pattern calculated in the previous step.
6. Next, the intra-row permutation is performed, depending on the relation between the number of columns C and the prime root p. The intra-row permutation changes the ordering of elements within a row without changing the ordering of the rows. The i-th (i = 0, 1, ..., R-1) intra-row permutation is performed as:
(4.11)
where the exact form depends on whether C equals p-1, p or p+1.
7. The output is pruned by deleting the dummy bits that were added before interleaving; i.e., the bits Y_k that correspond to input positions k > K are removed from the output. The bit sequence obtained after interleaving and pruning is X'_k for 1 <= k <= K, where X'_1 corresponds to the bit Y_k with the smallest index k after pruning. The number of bits output from the turbo code interleaver equals the number of input bits K, and the number of pruned bits is (R × C) - K.
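The permutation steps above can be sketched as follows. This is a partial sketch: the intra-row permutation of step 6 is replaced by an identity stand-in, and the inter-row patterns are written with 0-based row indices; all function names are illustrative.

```python
from math import gcd

# base_sequence and q_sequence sketch steps 2-3; inter_row_pattern
# sketches step 4 (0-based rows); matrix_interleave shows the overall
# pad -> write row-wise -> permute rows -> read column-wise -> prune flow,
# with the intra-row permutation omitted for brevity.

def base_sequence(v, p):
    s = [1]                                   # s(0) = 1
    for j in range(1, p - 1):
        s.append((v * s[j - 1]) % p)          # s(j) = v * s(j-1) mod p
    return s

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def q_sequence(p, R):
    q = [1]                                   # q_0 = 1
    for _ in range(1, R):
        c = max(q[-1], 6) + 1                 # q_i > 6 and q_i > q_{i-1}
        while not (is_prime(c) and gcd(c, p - 1) == 1):
            c += 1
        q.append(c)
    return q

def inter_row_pattern(K, R):
    if R == 5:
        return [4, 3, 2, 1, 0]                # reflection around centre row
    if R == 10:
        return list(range(9, -1, -1))
    if 2281 <= K <= 2480 or 3161 <= K <= 3210:
        return [19, 9, 14, 4, 0, 2, 5, 7, 12, 18,
                16, 13, 17, 15, 3, 1, 6, 11, 10, 8]
    return [19, 9, 14, 4, 0, 2, 5, 7, 12, 18,
            10, 8, 13, 17, 3, 1, 16, 6, 15, 11]

def matrix_interleave(bits, R, C, T):
    """Pad, write row-wise, permute rows by T, read column-wise, prune."""
    padded = bits + [None] * (R * C - len(bits))          # dummy bits
    rows = [padded[r * C:(r + 1) * C] for r in range(R)]  # row-wise write
    rows = [rows[T[r]] for r in range(R)]                 # inter-row perm
    out = [rows[r][c] for c in range(C) for r in range(R)]
    return [b for b in out if b is not None]              # prune dummies
```

Using None as the dummy-bit marker makes the pruning step trivial: whatever survives is exactly the original K bits, in permuted order.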
4.2 Channel
After encoding, the entire n-bit turbo codeword is assembled into a frame, modulated, transmitted over the channel, and then decoded. The input to the modulator is U_k, where U_k can be either a systematic bit or a parity bit and takes the value 0 or 1. The signal received after passing through the channel and demodulation is Y_k, which can take on any value. Thus, U_k is a hard value and Y_k is a soft value.
The modulation scheme assumed for the proposed system is BPSK, and the channel model can be either AWGN or a flat fading channel. For a channel with channel gain a_k and Gaussian noise n_k, the output of the receiver's matched filter is Y_k = a_k S_k + n_k, where S_k is the transmitted BPSK symbol. For systematic data bits, S_k = 2X_k - 1. For the upper and lower encoder parity bits, S_k = 2Z_k - 1 and S_k = 2Z'_k - 1 respectively. For an AWGN channel, the channel gain a_k equals one, whereas for a Rayleigh flat fading channel it is a Rayleigh random variable. The Gaussian noise n_k has a variance σ² = 1/(2E_s/N_0), where E_s is the energy per code bit and N_0 is the one-sided noise power spectral density. For a code rate r = K/(3K+12), the noise variance in terms of the energy per data bit E_b becomes
σ² = 1/(2rE_b/N_0) = (3K+12)/(2K E_b/N_0) (4.12)
The input to the decoder is in log-likelihood ratio (LLR) form to ensure that the effects of the channel, i.e., the noise variance σ² and the channel gain a_k, are properly taken into account. The input to the decoder is of the form
L(S_k | Y_k) = ln( P[S_k = +1 | Y_k] / P[S_k = -1 | Y_k] ) (4.13)
By applying Bayes's rule and assuming that P[S_k = +1] = P[S_k = -1],
L(S_k | Y_k) = ln( f_Y(Y_k | S_k = +1) / f_Y(Y_k | S_k = -1) ) (4.14)
where f_Y(Y_k | S_k) is the conditional pdf of receiving Y_k given code bit S_k, which is Gaussian with mean a_k S_k and variance σ² [27]. Thus, decoding a bit requires both the received code bit and knowledge of the statistics of the channel.
Substituting the expression for the Gaussian pdf and simplifying yields:
R_k = (2a_k/σ²) Y_k (4.15)
The expression shows that the matched filter outputs need to be scaled by a factor 2a_k/σ² in order to meet the input requirements of the decoder, where R_k denotes the received LLR for both data and parity bits. The notation R(X_k) denotes the received LLR corresponding to the systematic data bit X_k, R(Z_k) is the LLR corresponding to the upper encoder parity bit Z_k, and R(Z'_k) corresponds to the lower encoder parity bit Z'_k.
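The channel model and LLR scaling described above can be put together in a short simulation sketch. This assumes the model as stated (BPSK, unit mean-square Rayleigh gain when fading is enabled, perfect channel knowledge at the receiver); channel_llrs and its parameters are illustrative names.

```python
import math, random

# Sketch: BPSK over AWGN (a_k = 1) or Rayleigh flat fading, followed by
# the 2*a_k/sigma^2 scaling that turns matched-filter outputs into LLRs.

def channel_llrs(code_bits, ebno_db, K, rayleigh=False, rng=random):
    r = K / (3 * K + 12)                          # code rate
    es_no = r * 10 ** (ebno_db / 10)              # Es/N0 from Eb/N0
    sigma2 = 1 / (2 * es_no)                      # noise variance, eq. (4.12)
    llrs = []
    for u in code_bits:
        s = 2 * u - 1                             # BPSK symbol S_k = 2*bit - 1
        a = math.sqrt(rng.gauss(0, math.sqrt(0.5)) ** 2 +
                      rng.gauss(0, math.sqrt(0.5)) ** 2) if rayleigh else 1.0
        y = a * s + rng.gauss(0, math.sqrt(sigma2))   # Y_k = a_k S_k + n_k
        llrs.append(2 * a * y / sigma2)           # R_k = 2 a_k Y_k / sigma^2
    return llrs
```

At high Eb/N0 the sign of each scaled LLR matches the transmitted bit with overwhelming probability, which is a convenient sanity check on the scaling.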
4.3 Decoder architecture
The architecture of the UMTS turbo decoder is shown in Figure 4.2. It operates in an iterative manner, as indicated by the presence of the feedback path [26], [27].
The decoder uses the received codeword along with knowledge of the code structure to compute the LLRs. Because of the interleaver at the encoder, the structure of the code is very complicated, and it is not feasible to compute the LLRs with a single probabilistic processor. Instead, the job of obtaining a global LLR estimate is broken into two smaller tasks. As a result, each iteration of the turbo decoder consists of two half iterations, one for each constituent RSC decoder. As indicated by the sequence of arrows, the timing of the decoder is such that RSC decoder 1 operates during the first half iteration and RSC decoder 2 operates during the second half iteration.
Both decoders compute LLRs using a soft-input soft-output (SISO) processor. Although the data bits used by the second decoder are interleaved, the two SISO processors produce LLR estimates of the same set of data bits, only in a different order.
Figure 4.2 The UMTS turbo decoder architecture [26].
The performance of the decoders can be greatly improved by sharing their LLR estimates with each other. The first SISO decoder calculates its LLRs and passes them to the second decoder; the second decoder uses these values to calculate its own LLRs, which are fed back to the first decoder. This back-and-forth exchange of information between the processors makes the turbo decoder an iterative decoder. It also results in improved performance, as after every iteration the decoder is better able to estimate the data. However, as the number of iterations increases, the rate of performance improvement decreases [27].
The value w(X_k), for 1 <= k <= K, is the extrinsic information produced by RSC decoder 2 and fed back to the input of RSC decoder 1. Prior to the first iteration, w(X_k) is initialized to all zeros, as decoder 2 has not yet decoded the data. After each iteration, w(X_k) is updated to new non-zero values reflecting the beliefs regarding the data, which are propagated from decoder 2 to decoder 1; decoder 1 uses these values as extrinsic information. As the two RSC encoders at the transmitter use independent tail bits, only information regarding the actual data bits is exchanged between the two decoders. For k in the range K+1 <= k <= K+3, w(X_k) is undefined, i.e., zero after every iteration.
Decoder 1 must use the extrinsic information while decoding. Figure 4.2 gives a detailed view of the steps involved. The following sequence of steps is followed:
1. The extrinsic information w(X_k) is added to the received systematic LLR R(X_k), forming a new variable V_1(X_k):
V_1(X_k) = R(X_k) + w(X_k) (4.16)
As the extrinsic information w(X_k) is non-zero for 1 <= k <= K, the input to RSC decoder 1 is the received parity bits in LLR form R(Z_k) together with the combination V_1(X_k) of the systematic data R(X_k) and the extrinsic information w(X_k). For K+1 <= k <= K+3 (tail bits) no extrinsic information is available, i.e., w(X_k) equals zero. In this case, the input to RSC decoder 1 is the received and scaled upper encoder tail bits V_1(X_k) = R(X_k) + 0 = R(X_k) and the corresponding received and scaled parity bits R(Z_k).
2. RSC decoder 1 uses this information to decode the data and outputs the LLR Λ_1(X_k) for 1 <= k <= K, as only the LLRs of the data bits are shared with the second decoder.
(4.17)
3. The extrinsic information w(X_k) is subtracted from the first decoder's LLR Λ_1(X_k) to obtain a new variable V_2(X_k), which is used by the second decoder. Similarly to V_1(X_k),
V_2(X_k) = Λ_1(X_k) - w(X_k) (4.18)
4. The sequence V_2(X_k) is interleaved to obtain
(4.19)
This is done so that the sequence of bits matches the interleaved data order used by RSC decoder 2.
5. RSC decoder 2 uses the interleaved sequence together with the received parity LLRs R(Z'_k) to compute its LLR
(4.20)
6. The LLR produced by decoder 2 is deinterleaved to form
(4.21)
7. The new extrinsic information w(X_k) to be fed back to decoder 1 is obtained by subtracting
(4.22)
8. When the iterations are completed, a hard bit decision is taken using the deinterleaved LLR, such that the estimated bit is one if the LLR is greater than zero and zero otherwise, for 1 <= k <= K.
(4.23)
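The eight steps above can be collected into a loop skeleton. This is a sketch of the iteration structure only: siso stands in for an RSC SISO decoder (a hypothetical function taking data and parity LLRs and returning output LLRs), and pi/pi_inv are the interleaver permutation and its inverse.

```python
# Skeleton of the turbo decoder iteration loop (steps 1-8 above).
# `siso`, `pi`, `pi_inv` are placeholders supplied by the caller.

def turbo_decode(r_x, r_z1, r_z2, siso, pi, pi_inv, n_iters=8):
    K = len(r_x)
    w = [0.0] * K                                  # extrinsic info, zeroed
    for _ in range(n_iters):
        v1 = [r_x[k] + w[k] for k in range(K)]     # step 1: V1 = R(X) + w
        lam1 = siso(v1, r_z1)                      # step 2: decoder 1 LLRs
        v2 = [lam1[k] - w[k] for k in range(K)]    # step 3: V2 = L1 - w
        v2_i = [v2[pi[k]] for k in range(K)]       # step 4: interleave
        lam2_i = siso(v2_i, r_z2)                  # step 5: decoder 2 LLRs
        lam2 = [lam2_i[pi_inv[k]] for k in range(K)]   # step 6: deinterleave
        w = [lam2[k] - v2[k] for k in range(K)]    # step 7: new extrinsic
    return [1 if lam2[k] > 0 else 0 for k in range(K)]  # step 8: hard bits
```

Because the extrinsic term is subtracted before each hand-off (steps 3 and 7), each decoder only ever receives information it did not itself generate, which is what makes the iteration converge.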
4.4 RSC decoder
The heart of the turbo decoder is the algorithm used to implement the RSC decoders. The RSC decoders use a trellis diagram to represent all possible transitions between the states along with their respective outputs. As the RSC encoder used by UMTS has three shift registers, the number of distinct possible states is 2³ = 8.
max*(x, y) = max(x, y) + f_c(|y - x|) (4.24)
Thus, the max* operator is calculated by taking the maximum of the two input arguments x and y and then adding a correction function f_c. The correction function is simply a function of the absolute difference between the two arguments of the max* operator. The correction function f_c can be computed using log and exponential functions, or it can be pre-computed and stored in a look-up table to decrease complexity. The log-MAP algorithm is the most complex of the four algorithms, but it offers the best BER performance. The correction function f_c used by the log-MAP algorithm is illustrated in Figure 4.3, along with the correction functions used by the constant-log-MAP and linear-log-MAP algorithms.
Figure 4.3 The correction function f_c used by the log-MAP, constant-log-MAP and linear-log-MAP algorithms.
4.4.1.2 Max-log-MAP algorithm
In the max-log-MAP algorithm, the log-MAP algorithm is loosely approximated by setting the correction function f_c in equation (4.24) to zero, i.e., it is not used at all [27]:
max*(x, y) ≈ max(x, y) (4.25)
The max-log-MAP algorithm is the least complex of the four algorithms, but it still has twice the complexity of the Viterbi algorithm. It can be implemented using a pair of Viterbi algorithms, one that sweeps through the trellis in the forward direction and one in the reverse direction [31]. It offers the worst BER performance of the four variants of the MAP algorithm. It has, however, the additional benefit of being tolerant of imperfect noise variance estimates when operating in an AWGN channel.
4.4.1.3 Constant-log-MAP algorithm
The constant-log-MAP algorithm was first introduced by W. J. Gross and P. G. Gulak in 1998 [33]. It uses a lookup table with only two entries for the correction function, thus decreasing the complexity:
f_c(|y - x|) = C if |y - x| <= T; f_c(|y - x|) = 0 if |y - x| > T (4.26)
The values of T and C used in the implementation are 1.5 and 0.5 respectively [26], [34]. The performance and complexity of the constant-log-MAP algorithm lie between those of log-MAP and max-log-MAP; however, it is more susceptible to noise variance estimation errors than log-MAP.
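The correction functions discussed so far can be written out directly. This is a minimal sketch; the function names are illustrative, and the log-MAP variant is the exact Jacobi logarithm against which the others approximate.

```python
import math

# The three correction functions discussed so far, and the resulting
# max* operator: log-MAP is exact, max-log drops the correction entirely,
# constant-log uses the two-entry table with T = 1.5, C = 0.5.

def fc_log_map(d):
    return math.log(1 + math.exp(-d))      # exact correction ln(1 + e^-d)

def fc_max_log(d):
    return 0.0                             # no correction at all

def fc_constant_log(d, T=1.5, C=0.5):
    return C if d <= T else 0.0            # two-entry lookup table

def max_star(x, y, fc=fc_log_map):
    return max(x, y) + fc(abs(y - x))      # max*(x,y) = max(x,y) + fc(|y-x|)
```

With the exact correction, max_star(x, y) equals ln(e^x + e^y); the other two trade accuracy of that identity for lower complexity.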
4.4.1.4 Linear-log-MAP algorithm
In the constant-log-MAP algorithm, the correction function is approximated by a lookup table, which implies the need for a high-speed memory. To avoid the need for high-speed memory, a linear approximation can be used. The linear-log-MAP algorithm is based on a linear approximation of the Jacobi logarithm [35]:
f_c(|y - x|) = a(|y - x| - T) if |y - x| <= T; f_c(|y - x|) = 0 if |y - x| > T (4.27)
For a floating-point processor, the parameters a and T used by the linear approximation are chosen to minimize the total squared error between the exact correction function and its linear approximation [26].
(4.28)
To minimize the error function, its partial derivatives with respect to a and T are set to zero. The partial derivative of equation (4.28) with respect to a is
(4.29)
where K_1 and K_2 are constants. Taking the partial derivative of equation (4.28) with respect to T,
(4.30)
Adding 3/2 times equation (4.29) and 2T times equation (4.30) yields
(4.31)
Solving g(T) = 0, where g is a monotonically increasing function of T, gives the optimal value of T. An iterative approach is used: first, T_1 and T_2 are chosen such that g(T_1) < 0 and g(T_2) > 0. The midpoint of T_1 and T_2 is T_0. If g(T_0) < 0, T_1 is set equal to T_0; otherwise, T_2 is set equal to T_0. The iterations are repeated until T_1 and T_2 become very close. In the implementation, the upper limit of the summation is set to 30, as with this value an error of 10^-10 can be achieved. By iteratively solving for g(T) = 0 with the upper limit of the summation set to 30, the values of T and a come out to be 2.50681640022001 and -0.24904181891710 respectively. For the simulation, T and a are approximated as 2.5068 and -0.24904. The complexity and performance of the linear-log-MAP algorithm lie between those of the log-MAP and constant-log-MAP algorithms; however, it converges much faster than the constant-log-MAP algorithm.
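As a numeric cross-check (not the series derivation used above), the same parameters can be fitted by brute force: for each candidate T, the least-squares-optimal slope a has a closed form, and a coarse grid over T stands in for the bisection on g(T). All names here are illustrative.

```python
import math

# Numeric least-squares fit of the linear-log-MAP parameters: minimize
# the squared error between ln(1 + e^-x) and a*(x - T) on [0, T], plus
# the uncorrected tail error beyond T.

def fit_linear_log_map(dx=0.01, x_max=30.0):
    def fc(x):
        return math.log(1 + math.exp(-x))
    best = None
    t = 1.0
    while t <= 4.0:
        # closed-form optimal slope a for this T (least squares on [0, T])
        num = den = 0.0
        x = 0.0
        while x < t:
            num += fc(x) * (x - t) * dx
            den += (x - t) ** 2 * dx
            x += dx
        a = num / den
        # total squared error: linear part on [0, T], bare fc beyond T
        err = 0.0
        x = 0.0
        while x < x_max:
            approx = a * (x - t) if x < t else 0.0
            err += (fc(x) - approx) ** 2 * dx
            x += dx
        if best is None or err < best[0]:
            best = (err, a, t)
        t += 0.01
    return best[1], best[2]          # (a, T); the thesis reports values
                                     # near -0.24904 and 2.5068
```

The grid and step sizes are coarse on purpose; tightening them should move the fitted pair toward the high-precision values quoted above.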
4.4.2 RSC decoder operation
As discussed earlier, each of the two RSC decoders sweeps through the trellis twice, once in the forward and once in the reverse direction. It does not matter which sweep is performed first. Also, the partial path metrics need to be stored in memory only for the entire first sweep; the partial path metrics of the second sweep need not all be stored, as the LLR values can be computed during the second sweep. Thus, during the second sweep, only the partial path metrics for the current and the previous stage must be stored. It is recommended to perform the reverse sweep first and save the partial path metrics for each node in memory; the forward sweep is then performed and the LLR estimates of the data are produced. If the forward sweep were performed first, the LLR estimates would be produced during the reverse sweep and would thus be in reverse order [26].
Figure 4.4 A trellis section for the RSC code used by the UMTS turbo code [26].
The trellis structure used by the RSC decoder is shown in Figure 4.4. Each state has two branches leaving it, one corresponding to an input of one and one to an input of zero. Solid lines indicate a data one and dotted lines a data zero. The branches indicate which next state can be reached from a particular state and are labelled with branch metrics. Every distinct codeword follows a particular path in the trellis [27], [26].
The branch metric connecting state S_i (previous state, on the left) and state S_j (present state, on the right) is denoted
(4.32)
As the RSC encoder is rate r = 1/2, only four distinct branch metrics are possible:
(4.33)
where the two terms are the inputs to the RSC decoder. In Figure 4.2, for the first (upper) RSC decoder, these inputs are V_1(X_k) and R(Z_k). For the lower decoder, the second input is the parity LLR corresponding to the second encoder, i.e., R(Z'_k) instead of R(Z_k).
4.4.2.1 Backward recursion
The simulated decoder sweeps through the trellis in the backward direction first. The normalized partial path metrics at all nodes of the trellis are saved in memory; these normalized partial path metrics are later used to calculate the LLRs during the forward recursion [26]. The partial path metric at trellis stage k, with 2 <= k <= K+3, and at state S_i (0 <= i <= 7) is denoted
(4.34)
The partial path metrics for the current stage k are found using the (previously calculated) partial path metrics for stage k+1 and the associated branch metrics:
(4.35)
where the two states at stage k+1 that are connected to state S_i at stage k are used. The partial path metrics are then normalized:
(4.36)
The purpose of normalization is to reduce the memory requirement. As the metric of state S_0 equals zero after normalization, only the other seven normalized partial path metrics, for 1 <= i <= 7, need to be stored. As a result, there is a 12.5% saving in memory compared to when no normalization is used.
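The normalization step can be sketched in a few lines. This is a minimal sketch of the idea only; the function names are illustrative.

```python
# Sketch of the normalization in (4.36): subtracting the metric of state
# S_0 from every state's metric makes the S_0 entry exactly zero, so only
# the remaining seven values per trellis stage need to be stored.

def normalize(metrics):
    ref = metrics[0]                       # partial path metric of S_0
    return [m - ref for m in metrics]      # S_0 entry becomes 0

def stored_values(metrics):
    return normalize(metrics)[1:]          # 7 of 8 values: 12.5% saving
```

Since only differences between path metrics matter for the LLR computation, subtracting a common reference changes nothing in the decoding result.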
4.4.2.2 Forward recursion and LLR calculation
Once the backward recursion is complete, the trellis is swept in the forward direction. Unlike the backward recursion, where the partial path metrics for all states at all stages need to be stored in memory, the forward recursion only requires storing the partial path metrics for two stages of the trellis, i.e., the current stage k and the previous stage k-1. The forward partial path metric for state S_i at trellis stage k is denoted
with the stages k ranging from 0 to K-1 and all possible states 0 <= i <= 7. The forward recursion is initialized by setting the forward partial path metric for stage zero to zero for state S_0 and to negative infinity for the other seven states.
(4.37)
The forward recursion begins with stage k = 1, and the trellis is swept in the forward direction until stage K is reached. The un-normalized partial path metrics are calculated as
(4.38)
where the two states at stage k-1 that are connected to state S_i at stage k are used. The partial path metrics are then normalized:
(4.39)
Using these values of normalized partial path metrics, the likelihood of each branch in the trellis section is computed:
(4.40)
The likelihood of a data one or zero is then the Jacobi logarithm of the likelihoods of all the branches corresponding to a data one or zero, respectively.
(4.41)
The max* operator is computed recursively over the likelihoods of all the data-one branches. Once the LLR is calculated,