
INTERNATIONAL JOURNAL OF ADAPTIVE CONTROL AND SIGNAL PROCESSING

Int. J. Adapt. Control Signal Process. 2002; 16: 557–576 (DOI: 10.1002/acs.725)

Natural gradient algorithm for neural networks applied to non-linear high power amplifiers

H. Abdulkader, F. Langlet, D. Roviras and F. Castanie


ENSEEIHT-TéSA, 2 rue C. CAMICHEL, 31071 Toulouse Cedex 7, France

SUMMARY

This paper investigates processing techniques for non-linear high power amplifiers (HPA) using neural networks (NNs). Several applications are presented: identification and predistortion of the HPA. Various NN structures are proposed to identify and predistort the HPA.
For a few decades, NNs have shown excellent performance in solving complex problems (such as classification, recognition, etc.), but they usually suffer from slow convergence speed. Here, we propose to use the natural gradient instead of the classical ordinary gradient in order to enhance the convergence properties. Results are presented concerning identification and predistortion using the classical and the natural gradient. Practical implementation issues are given at the end of the paper. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: high power amplifier; identification; predistortion; equalization; natural gradient algorithm;
neural networks

1. INTRODUCTION

The worldwide demand for wireless communications increases the need for wideband channels.
The third generation system 'universal mobile telecommunication system' (UMTS) is a
promising solution, providing a global coverage area and high transmission rates when using
satellite communication channels (S-UMTS channel). On board the satellite, a high power
amplifier (HPA: travelling wave tube (TWT) or solid state power amplifier (SSPA)) is used.
The HPA is generally designed to work near saturation in order to get the maximum
power efficiency from the power sources on board the satellite. Working in this region, the HPA
has a non-linear (NL) behaviour: non-linearity in phase (AM/PM conversion) and non-linearity
in amplitude (AM/AM conversion). These non-linearities have various effects: NL inter-symbol

Invited Paper.
Correspondence to: H. Abdulkader, ENSEEIHT-TéSA, 2 rue C. CAMICHEL, 31071 Toulouse Cedex 7, France.
E-mail: [email protected]


interference (ISI) and constellation degradation when using non-constant envelope constellations
(16-QAM, for example).
Because bandwidth efficiency is a very important topic in digital communications, the use of
non-constant envelope constellations is very attractive for satellite communications. Until now,
non-constant envelope constellations have not been used because of NL problems related to the HPA.
Multicarrier communications (OFDM for example) are also ruled out in satellite communications
because of the peak-to-mean power ratio [1].
To enable the use of such non-constant envelope modulations it is necessary to combat the
HPA non-linear effects. For this purpose, two approaches are possible. The first one is
equalization at the terminal side. Conventional equalizers can be used to fight the
ISI introduced by the propagation channel [2], while NL equalizers are best suited for equalizing
the non-linear effects of the HPA. Many researchers have investigated this topic. NL
equalizers can be based on Volterra series or on neural network (NN) structures. Among
Volterra series equalizers we can point out, for example, References [3,4]. A complete
presentation of neural networks can be found in Reference [5]. Concerning NN structures,
the literature contains equalizers based on the multilayer perceptron (MLP) [6–8],
radial basis functions (RBF) [9,10], and self organizing maps (SOM) [11]. A description and
comparison of MLP, RBF and SOM equalizers can be found in References [12,13]. NN
structures can be updated using training sequences or blindly [14,15]. NN equalization at the
terminal side has the advantage of equalizing the downlink propagation channel (possibly a time-varying
fading channel) together with fighting NL distortions (amplitude, phase and
ISI). The main drawback of this technique is the additional cost and computation load of the
NN equalizer in each terminal.
A second approach is power amplifier linearization or predistortion. The advantage of this
technique lies in the fact that only a single system is needed to fight the HPA non-linearity
(compared to a NL equalizer in each terminal). Linearization techniques are generally
non-adaptive systems [16]. In this paper we present predistortion techniques using NNs with
adaptive behaviour. This kind of predistortion is well adapted to satellites with
regenerative payloads (in-phase I and quadrature Q baseband signals are available on board). It
is interesting to note that there exist analogue NNs [17] working in baseband at symbol rates
up to 50 Mbaud.
Finally, a third technique is often necessary when dealing with NL links: identification. In
Reference [18] Ibnkahla has shown that a NN model of the NL satellite channel may be used for
failure detection. Identification is also necessary in order to build adaptive predistortion
schemes, because the NL mathematical model is generally unknown (the HPA is time varying with
ageing, temperature, etc.). As for predistortion, Volterra series or NNs can be used for the
identification [19]. As with all adaptive systems, the three previous techniques need to minimize some
cost criterion associated with an updating algorithm. In this paper we present two minimization
algorithms: the classical gradient descent algorithm, called 'ordinary gradient descent' (OGD), and the
natural gradient descent (Nat-GD).
Section 2 of the paper presents NN training algorithms using the back propagation (BP)
algorithm based on the OGD and the Nat-GD. Section 3 is devoted to the identification of the
NL HPA using NNs. After presenting the characteristics of the HPA we propose two structures
to model the HPA: mimic structure with a NL system to model the amplitude conversion and a
second one to model the phase distortion, and a classical structure (a simple multilayer NN).
Section 4 of the paper presents two NN predistortion methods using a mimic NN structure.


Identification and predistortion structures will be trained with the Nat-GD and OGD algorithms.
Owing to the extensive literature on the subject, NN equalization is omitted. Section 5 of the paper is related
to implementation issues: we present the implementation of NN equalizers and predistorters.
Finally, Section 6 concludes the paper.

2. NEURAL NETWORKS

Recently, NNs have been widely used as powerful adaptive signal processing tools. In particular,
wireless and satellite digital communications take advantage of NN properties in several
applications such as identification, equalization, etc. (see Reference [20] for a survey). Ibnkahla [21]
has studied the problem of HPA predistortion in order to linearize the HPA characteristics
using a NN. In [22,23] Ibnkahla et al. have applied NNs to identifying and modelling the HPA.
They used the classical ordinary gradient-based backpropagation algorithm. In the
following we propose to use the natural gradient algorithm for training NNs applied to
identification and predistortion purposes.

2.1. Neural networks structures


Many NN structures have been investigated in the literature (see for example References [5,20]). It is
beyond the scope of this paper to present all kinds of NNs: multilayer perceptron (MLP), radial
basis function (RBF), self organizing maps (SOM), etc. Interested readers can consult [5] for more
details. In this paper we use MLP-based NN structures and consequently we will only describe
this type of NN. An MLP is generally composed of an input layer, one or more hidden layers
and an output layer. Hidden and output layers include several neurons. Each neuron $j$ in a layer
$l$ is connected to a neuron $i$ in the layer $l-1$ via a weight $w_{ij}$ and has a bias $b_j$. The output of a
hidden neuron $j$ is

$$z_j = f\Big(\sum_{i \in l-1} w_{ij} z_i + b_j\Big) \qquad (1)$$

where $f$ is the activation function of the neuron $j$. In our applications, the activation function of
a hidden neuron is a hyperbolic tangent function while the activation function of an output
neuron is linear.
Amari [24] has demonstrated an interesting property of NNs. This property, called the
'universal approximation theorem', states that an MLP with an infinite number of neurons is
able to model any NL function. In digital communication applications of NNs, experience
has shown that a few tens of neurons in the hidden layer can be sufficient for most
problems. In the two following subsections we present two algorithms for training NNs (OGD
and Nat-GD). The algorithms are used for modelling a system with input vector $x = [x_I, x_Q]^T$
and output vector $y = [y_I, y_Q]^T$, where the superscript $T$ denotes transposition.
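As an illustration of Equation (1), a minimal NumPy sketch of the forward pass of such an MLP (tanh hidden neurons, linear output neurons) is given below; the layer size, initialisation and variable names are our own illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the MLP forward pass of Equation (1): tanh hidden layer,
# linear output layer. Layer sizes and names are illustrative assumptions.
def mlp_forward(x, W, b, U):
    """x: input [x_I, x_Q]; W: (n_hidden, 2) weights; b: (n_hidden,) biases;
    U: (2, n_hidden) output weights."""
    z = np.tanh(W @ x + b)      # hidden activations, Equation (1)
    y_hat = U @ z               # linear output neurons [y_I_hat, y_Q_hat]
    return z, y_hat

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((15, 2))   # 15 hidden neurons, as in Section 3.4.1
b = np.zeros(15)
U = 0.1 * rng.standard_normal((2, 15))
z, y_hat = mlp_forward(np.array([0.3, -0.5]), W, b, U)
```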

2.2. Ordinary gradient-based backpropagation algorithm


The feedforward NN (see Figure 2) is a special structure within the large family of NN structures,
but it is the most widely used in practice since it is easily designed and implemented. In the

following we will use only this kind of NN, and the term feedforward will be omitted
henceforth. NNs are trained by the well-known backpropagation (BP)
algorithm, which can be summarized as follows [5]:
1. Initialization of the coefficients (weights w and bias b).
2. Presentation of an input vector x(n). Then the algorithm contains two phases:
3. Feedforward phase: the network input is forward propagated through the hidden layers
towards the output layer. An error signal is calculated at the output of the NN according
to a certain cost function L. Depending on the cost function L, the algorithm can be either
a supervised learning algorithm (for each input vector x(n) there is a known desired output
vector y(n) called the teacher output), or an unsupervised one (if no teacher output is
available). In our paper we use supervised learning algorithms together with a cost function
equal to the square error function.
$$L = \frac{1}{2}\|e\|^2 = \frac{1}{2}\big(e_I^2 + e_Q^2\big) = \frac{1}{2}\Big[(y_I - \hat{y}_I)^2 + (y_Q - \hat{y}_Q)^2\Big] \qquad (2)$$

4. Feedback phase: the error signal is then backpropagated from the network output towards
the input. At the output of each neuron an error signal is computed, representing its
contribution to the whole network error.
$$\delta_j^l(n) = \sum_{k \in l+1} \delta_k^{l+1}(n)\, w_{jk}\, f'_j \qquad (3)$$

where $\delta_j^l(n)$ is the error contribution of the neuron $j$ in the layer $l$ to the whole error at time
$n$. The coefficients are then updated according to the stochastic LMS rule

$$\Delta w_{ij} = -\mu\, z_i\, \delta_j^l = -\mu\, \frac{\partial L}{\partial w_{ij}} \qquad (4)$$

where $\mu$ is a learning rate, $0 < \mu < 1$. This very classical algorithm will be called the
ordinary gradient descent (OGD) algorithm. The OGD vector can be written as follows:

$$\nabla L = \Big[\ldots, \frac{\partial L}{\partial w_{ij}}, \ldots, \frac{\partial L}{\partial b_i}, \ldots\Big]^T \qquad (5)$$

A compact one-step sketch of this procedure is given below.
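The following Python sketch performs one OGD/BP training step (Equations (2)-(5)) for a one-hidden-layer MLP; the shapes, names and learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch of one OGD backpropagation step for a tanh/linear MLP.
def ogd_step(x, y, W, b, U, mu=1e-3):
    z = np.tanh(W @ x + b)                 # forward pass, Equation (1)
    y_hat = U @ z
    e = y - y_hat                          # cost L = 0.5*||e||^2, Equation (2)
    delta_out = e                          # output deltas (linear activation)
    delta_hid = (U.T @ delta_out) * (1.0 - z**2)   # Eq. (3), tanh' = 1 - z^2
    U += mu * np.outer(delta_out, z)       # Eq. (4): Delta w = -mu dL/dw = +mu z delta
    W += mu * np.outer(delta_hid, x)
    b += mu * delta_hid
    return 0.5 * float(e @ e)

rng = np.random.default_rng(0)
W, b = 0.1 * rng.standard_normal((15, 2)), np.zeros(15)
U = 0.1 * rng.standard_normal((2, 15))
loss = ogd_step(np.array([0.3, -0.5]), np.array([0.4, -0.6]), W, b, U)
```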

2.3. Natural gradient


Gradient methods are widely used in signal processing (filtering, prediction, estimation, etc.).
Besides the ordinary gradient there are many other gradient-based algorithms. All these
algorithms try to enhance the convergence properties while keeping a low computation load.
The compromise between complexity and good convergence properties has given rise to various classes of
gradient algorithms [5].
One drawback of the ordinary gradient is the difficulty of choosing the learning rate. Several
researchers have tried to define the optimal learning rate in order to accelerate the convergence
behaviour while preserving algorithm stability. In Reference [25] the learning rate $\mu$ follows a fixed NL
function (schedule). Others have chosen a dynamic learning rate. In Reference [26] the


authors have proposed several algorithms, under certain assumptions, for training batch
feedforward NNs. The learning rate is adjusted dynamically according to an estimate of the
Lipschitz constant (the optimal learning rate is equal to half the inverse of the Lipschitz constant,
which reflects some topological properties of the error surface). Another approach was used in
Reference [27]: it takes advantage of the eigenvalues of the Hessian matrix in order to choose a
global learning rate for the whole NN or an individual one for each coefficient of the NN. Other
methods consist in adding a supplementary term to the ordinary gradient rule in order to
accelerate the convergence and to enhance the stability [28]. Orr et al. [29] have used the Hessian
matrix in the momentum term. This algorithm uses information about the NN manifold curvature
related to the Hessian matrix.
In the 1980s, Amari proposed his theory called 'information geometry', from which he derived
a novel algorithm: the natural gradient descent algorithm (Nat-GD). When
applying this algorithm for training NNs, the coefficients evolve in the direction of
steepest descent. Amari et al. (see for example References [30–32]) have studied the properties of this
algorithm and shown its ability to avoid the 'plateau phenomenon' frequently encountered with
NNs [32]. Besides its fast convergence speed, the natural gradient algorithm reaches the Cramér–Rao
lower bound when the cost function is the maximum likelihood criterion [30]. The Nat-GD
can be explained as follows.
Over time, the output of the NN $\hat{y} = \Psi(x, \theta)$ moves in a manifold $T$ whose dimension equals
that of the parameter vector $\theta = [\ldots, w_{ij}, \ldots, b_i, \ldots]^T$. This manifold is not a
Euclidean space, since its axes along $\theta$ are not orthonormal; such a manifold is called a
Riemannian manifold. The vector field $\partial L/\partial \theta$ is a covariant vector field, and to obtain the
contravariant vector field (the steepest descent direction in a Riemannian manifold) a
contravariant metric tensor must be used.
Amari [30] has shown that the Fisher information matrix (FIM) is the only invariant metric
matrix in a Riemannian manifold. So, the natural gradient vector in such a manifold is

$$\tilde{\nabla} L = A^{-1}\, \nabla L \qquad (6)$$

where $\nabla L$ is given by (5) and $A^{-1}$ is the inverse of the Fisher information matrix:

$$A = E\left[\frac{\partial L}{\partial \theta}\left(\frac{\partial L}{\partial \theta}\right)^T\right] \qquad (7)$$

where $E$ stands for mathematical expectation. Finally, the Nat-GD update is

$$\Delta\theta = -\mu\, \tilde{\nabla} L \qquad (8)$$

2.4. Comparison between Nat-GD and OGD


From Equation (6) it is obvious that the difference between the two algorithms is the use of the
inverse of the FIM in the Nat-GD. Implementation of the Nat-GD requires computing the inverse
of $A$ at each iteration and multiplying it by the OGD vector $\nabla L$. In order to reduce the
computation load of the Nat-GD, the inverse of the FIM can be estimated on-line and
iteratively; see, for example, the Sherman–Morrison–Woodbury formula [33].
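As a hedged illustration of this idea, the sketch below performs one Nat-GD step (Equations (6)-(8)) while maintaining a running estimate of the FIM inverse with the Sherman-Morrison identity; the forgetting factor eps, the learning rate and the identity initialisation are our own assumptions, not values from the paper.

```python
import numpy as np

# One Nat-GD step with a recursive FIM-inverse estimate. The FIM estimate
# A_t = (1-eps)*A_{t-1} + eps*g g^T (Eq. 7) is inverted with Sherman-Morrison,
# avoiding a full matrix inversion per iteration. eps and mu are illustrative.
def natgrad_step(theta, grad_L, A_inv, mu=1e-3, eps=1e-2):
    g = grad_L                                   # ordinary gradient vector (Eq. 5)
    Ag = A_inv @ g
    # Sherman-Morrison update of the inverse of (1-eps)*A + eps*g g^T
    A_inv = (A_inv - (eps * np.outer(Ag, Ag)) /
             ((1.0 - eps) + eps * (g @ Ag))) / (1.0 - eps)
    theta = theta - mu * (A_inv @ g)             # natural gradient step (Eqs. 6, 8)
    return theta, A_inv

n = 4                                            # toy parameter dimension
theta, A_inv = np.zeros(n), np.eye(n)            # start from A^-1 = identity
theta, A_inv = natgrad_step(theta, np.ones(n), A_inv)
```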



In the following we will use, for each application (NN identification or predistortion), the two
algorithms: OGD and Nat-GD.

3. IDENTIFICATION OF THE HPA

In this section we model the HPA using two NN structures:


1. Global model: in this case we do not use any knowledge about the HPA in the design of the
NN. A simple global MLP will be used for modelling and identifying the HPA.
2. Mimic model: here we take advantage of the knowledge that the HPA has a non-linearity
in amplitude (AM/AM) and in phase (AM/PM). In the mimic structure we use
two sub-NNs: the first one models the amplitude conversion while the second
one models the phase distortion.

3.1. HPA behaviour


The HPA used in this paper is a TWT described by Saleh's model [34]:

$$A(r) = \frac{\alpha_a r}{1 + \beta_a r^2}, \qquad \phi(r) = \frac{\alpha_p r^2}{1 + \beta_p r^2} \qquad (9)$$

where $r$ is the modulus of the TWT input, $\alpha_a = 2$, $\beta_a = 1$, $\alpha_p = 4.0033$ and $\beta_p = 9.104$.
Figure 1 illustrates the effects of the HPA non-linearity when amplifying a 16-QAM signal
with a TWT working at the saturation point (IBO = 0 dB; the input back-off, IBO, is the ratio of
the average power of the input signal to the saturation input power). It is obvious that the decision
boundaries at the HPA output are highly non-linear.
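For reference, Saleh's model of Equation (9) with these parameter values can be coded directly; the complex baseband wrapper below is our own illustrative addition.

```python
import numpy as np

# Saleh's memoryless TWT model, Equation (9), with the paper's parameters.
ALPHA_A, BETA_A = 2.0, 1.0          # AM/AM parameters
ALPHA_P, BETA_P = 4.0033, 9.104     # AM/PM parameters

def saleh_twt(x):
    """Apply the memoryless TWT model to a complex baseband sample x."""
    r = np.abs(x)
    A = ALPHA_A * r / (1.0 + BETA_A * r**2)          # AM/AM conversion A(r)
    phi = ALPHA_P * r**2 / (1.0 + BETA_P * r**2)     # AM/PM conversion phi(r)
    return A * np.exp(1j * (np.angle(x) + phi))

y = saleh_twt(0.5 * np.exp(1j * np.pi / 4))          # sample below saturation
```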

3.2. Global model structure for TWT modeling


In this subsection we use a general purpose MLP (Figure 2) to model the TWT. We present
hereafter the updating rules. Applying an input signal $x = [x_I, x_Q]^T$, the output of the hidden
neuron $k$ is

$$z_k = f\Big(\sum_j w_{jk} x_j + b_k\Big) \qquad (10)$$

where $w_{jk}$ is the weight connecting $x_j$ and the neuron $k$ ($j = I, Q$). The function $f$ is a
non-linear activation function (hyperbolic tangent). The output of the NN can be
expressed as

$$\hat{y}_j = \sum_k u_{kj} z_k \qquad (11)$$

where $u_{kj}$ is the weight connecting the neuron $k$ of the hidden layer to the neuron $j$ of the output
layer ($j = I, Q$). The cost function is the square error defined in (2). Updating rules for the weights
and biases of the network can be computed following the OGD-based backpropagation algorithm
explained earlier.



Figure 1. Top: the AM/AM and AM/PM characteristics of the TWT (Saleh's model). Bottom: the input
16-QAM constellation (left) and TWT output constellation (right, TWT at 0 dB IBO).

Figure 2. NN structure for global modelling of a TWT.

In order to implement the Nat-GD algorithm (Equation (6)), we construct the vector $\theta$
containing all the NN coefficients and calculate the ordinary gradient vector of the cost function
$L$. The matrix $A^{-1}$ must be updated using the current ordinary gradient. Finally, Equation (8) is
applied in order to compute the update of each coefficient.

3.3. Mimic model structure for TWT modeling


It is well known that providing information about the system to be modelled can be useful for
enhancing the modelling procedure and reducing the residual error. As we know, the HPA
distorts the transmitted signal in two ways: amplitude (AM/AM conversion) and phase (AM/PM
distortion). Figure 3(a) presents the functional diagram of a TWT. In this subsection we use
polar notation to compute the updating rules. The TWT input signal is $x = r e^{j\varphi}$ and its output
is $y = rG(r)e^{j(\phi(r)+\varphi)}$. The phase distortion $\phi(r)$ and the amplitude gain $G(r)$ depend only on the
modulus $r$ of the input signal. The AM/AM conversion is given by $rG(r)$. According to Figure
3(a), the behaviour of the HPA can be modelled using a NN structure containing two sub-NNs,
$\phi_m(r)$ and $G_m(r)$, in order to model separately $\phi(r)$ and $G(r)$ (see Figure 3(b)). The two sub-NNs
have the structure presented in Figure 3(c).


Figure 3. (a) Functional diagram of the TWT. (b) NN structure containing two sub-NNs for modelling the
TWT behaviour. (c) Sub-NN structure.

Equations

$$\hat{y}_I = rG_m(r)\cos(\phi_m(r) + \varphi)$$
$$\hat{y}_Q = rG_m(r)\sin(\phi_m(r) + \varphi) \qquad (12)$$

give the output of the NN as a function of the two sub-NN outputs. The cost function is given
by (2). To derive the updating rules for this structure, let us introduce the following quantities:

$$\delta_g = \frac{\partial L}{\partial G_m(r)} = -r\big(e_I\cos(\phi_m(r)+\varphi) + e_Q\sin(\phi_m(r)+\varphi)\big)$$
$$\delta_\phi = \frac{\partial L}{\partial \phi_m(r)} = rG_m(r)\big(e_I\sin(\phi_m(r)+\varphi) - e_Q\cos(\phi_m(r)+\varphi)\big) \qquad (13)$$

Since the two sub-NNs have the same structure, we shall derive the updating rules for one
sub-NN only ($G_m(r)$ for example). The updating rules for the weights and biases of this network are

$$u_i(n+1) = u_i(n) - \mu\, \delta_g\, z_i$$
$$w_{ki}(n+1) = w_{ki}(n) - \mu\, r\, \delta_k$$
$$b_k(n+1) = b_k(n) - \mu\, \delta_k$$


with $\delta_k = \delta_g\, u_k\, f'$. The updating rules of the $\phi_m(r)$ sub-NN can be deduced from the previous
equations by replacing $\delta_g$ by $\delta_\phi$.
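A minimal sketch of one training step of the mimic model, implementing Equations (12) and (13) and the update rules above, is given below; the sub-NN sizes, initialisation and variable names are illustrative assumptions.

```python
import numpy as np

# One OGD step of the mimic model: each sub-NN is a single-input/single-output
# MLP with tanh hidden neurons, driven by the input modulus only.
def subnn(r, w, b, u):
    z = np.tanh(w * r + b)
    return z, float(u @ z)

def mimic_step(x, y, gain_net, phase_net, mu=1e-3):
    r, phi = np.abs(x), np.angle(x)
    zg, Gm = subnn(r, *gain_net)
    zp, Pm = subnn(r, *phase_net)
    y_hat = r * Gm * np.exp(1j * (Pm + phi))        # Equation (12)
    e_I, e_Q = (y - y_hat).real, (y - y_hat).imag
    # Equation (13): gradients of L w.r.t. the two sub-NN outputs
    d_g = -r * (e_I * np.cos(Pm + phi) + e_Q * np.sin(Pm + phi))
    d_p = r * Gm * (e_I * np.sin(Pm + phi) - e_Q * np.cos(Pm + phi))
    for (w, b, u), z, d in ((gain_net, zg, d_g), (phase_net, zp, d_p)):
        d_k = d * u * (1.0 - z**2)   # delta_k = delta * u_k * f'
        u -= mu * d * z              # u_i(n+1) = u_i(n) - mu*delta*z_i
        w -= mu * r * d_k            # w_ki(n+1) = w_ki(n) - mu*r*delta_k
        b -= mu * d_k

rng = np.random.default_rng(1)
gain_net = tuple(0.1 * rng.standard_normal(9) for _ in range(3))    # 9 hidden neurons
phase_net = tuple(0.1 * rng.standard_normal(15) for _ in range(3))  # 15 hidden neurons
mimic_step(0.5 * np.exp(1j * 0.3), 0.7 * np.exp(1j * 0.5), gain_net, phase_net)
```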

3.4. Simulations
Hereafter we present simulation results for the TWT identification with the structures presented
above.

3.4.1. Global model


The NN (Figure 2) has 15 neurons in the hidden layer and 2 neurons in the output layer. The
learning rate for the two algorithms is taken equal to $\mu = 10^{-3}$.
Figure 4 shows the MSE evolution. It is clear that the NN trained by the Nat-GD converges
faster than the NN trained by the OGD. Furthermore, the residual MSE of the Nat-GD is
smaller. Figures 5 and 6 compare the conversions (AM/AM and AM/PM) of the two NN
models with those of the TWT. It is obvious that the NN trained by the Nat-GD models the TWT more
accurately than the one trained by the OGD. The cloud of points around the TWT curves can be

Figure 4. MSE versus Iteration number (global model).

Figure 5. AM/AM conversion of the Nat-GD (left) and OGD (right) (global model).


Figure 6. AM/PM distortion of the Nat-GD (left) and OGD (right) (global model).

Figure 7. MSE versus Iteration number (mimic model).

explained as follows: the AM/AM and AM/PM conversions of the TWT depend only on the
input signal modulus. For a given modulus $r_i$, there are many input patterns $[x_I, x_Q]^T$ applied to
the NN input with modulus $r_i$. So, for the same input modulus there are different NN responses
(as many as there are patterns with the same modulus in the input signal space).

3.4.2. Mimic model


The NN structure used to model the TWT is composed of two sub-NNs as explained above.
Empirically (following experimental results) we have concluded that modelling $G(r)$ is easier
than modelling the AM/PM conversion $\phi(r)$. So the number of neurons used to model the AM/AM
conversion is smaller than the number used in the AM/PM sub-NN.
$G_m(r)$ contains 9 neurons in the hidden layer and one neuron in the output layer, while $\phi_m(r)$
contains 15 neurons in the hidden layer and one neuron in the output layer. The learning rate still
equals $10^{-3}$.
Figure 7 illustrates the evolution of the MSE. Here also, the Nat-GD converges faster than
the OGD and reaches a smaller MSE. Figures 8 and 9 compare the behaviour of the two NNs with
that of the TWT. It is clear that the NN trained by the natural gradient matches the TWT
conversions better.


Figure 8. AM/PM distortion of the Nat-GD (left) and OGD (right) (mimic model).

Figure 9. AM/AM conversion of the Nat-GD (left) and OGD (right) (mimic model).

Table I. Comparison between the ordinary and the natural gradient algorithms for the two
models (identification case).

              Global model          Mimic model
              OGD      Nat-GD       OGD      Nat-GD
SER (dB)      25       54           37       74

By comparing Figures 5–6 with Figures 8–9, we can see that the clouds of Figures 5–6 are replaced by
thin curves in Figures 8–9. This is because, in the mimic model, the sub-NNs are trained on the modulus
of the input patterns instead of the patterns themselves.
As a measure of the modelling quality we have measured the signal-to-modelling-error ratio (SER),
given by the following expression:

$$\mathrm{SER} = 10\log_{10}\Big(\frac{p_o}{\mathrm{MSE}}\Big) \qquad (14)$$

where $p_o$ is the power at the TWT output.
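A direct implementation of Equation (14) over a validation run might look as follows; the variable names are our own.

```python
import numpy as np

# Signal-to-modelling-error ratio of Equation (14): y is the TWT output
# sequence, y_hat the model output over a validation run.
def ser_db(y, y_hat):
    p_o = np.mean(np.abs(y) ** 2)                # power at the TWT output
    mse = np.mean(np.abs(y - y_hat) ** 2)        # residual modelling error
    return 10.0 * np.log10(p_o / mse)
```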


Table I compares the SER obtained with the two algorithms and
the two structures. The SER values validate the previous remarks, i.e. the mimic structure performs
better than the global one.

4. TWT PREDISTORTION

The mimic model outperforms the global model for TWT modelling. The results of the above section
confirmed that using a priori knowledge about the system to be identified enhances the
modelling procedure. In this section we shall also use mimic structures in order to predistort
the TWT. The goal of predistortion techniques is to linearize the AM/AM conversion and to
cancel the AM/PM distortion of the TWT. Ibnkahla [21] has compared the performance of a
NN-based predistorter trained by the OGD with a Volterra series-based predistorter. Simulations
gave the advantage to the NN. In this paper we propose a new NN implementation scheme which
enhances the cancellation of the AM/PM distortion.

4.1. Predistortion architecture


Here, we study on-board predistortion of the TWT. This method is possible with regenerative
payloads, where in-phase and quadrature baseband signals are available. Section 5 of this paper
will develop some elements concerning implementation issues. The novel method has the
following features:
1. Generality: this method can predistort any HPA, a solid state power amplifier (SSPA) or a
TWT,
2. Ability to follow the HPA variations and ageing, and
3. Phase canceling.
Figure 10 presents the general architecture of the HPA predistortion system on board the
satellite. Obviously this application needs three extra pieces of equipment:
1. A demodulator to acquire the in-phase I and quadrature Q components at the TWT
output,
2. The NN predistorter at the HPA input, and
3. A learning algorithm for updating the predistorter.
To apply the above architecture, we need a mathematical model of the HPA. Since HPAs
do not all have the same model, and because an HPA may be time varying (ageing, temperature
drifts, etc.), we propose two adaptive predistortion methods suitable when no knowledge about
the HPA is available.

Figure 10. Predistortion architecture for regenerative satellite payload.


Figure 11. Predistorter implementation.

Figure 12. Principle of predistortion based on HPA identification.

Figure 13. Equivalent scheme for updating the gain sub-NN of the predistorter.

4.2. NN structures for TWT predistortion


The NN-based predistorter is composed of two sub-NNs (Figure 11):
1. A sub-NN to invert the AM/AM conversion of the TWT (the $G_p(r)$ sub-NN),
2. Another sub-NN to cancel the TWT phase shift (the $\phi_p(r)$ sub-NN).
This section presents two methods for predistorting the TWT. The first one uses the identified
model of the HPA (given by identification using a mimic model) for updating the NN
predistorter. The second one builds the predistorter parts (the $G_p(r)$ and $\phi_p(r)$ sub-NNs)
directly from the set of input–output patterns of the TWT.

4.2.1. Predistortion based on TWT identification


To predistort the TWT, we need its NL conversion characteristics. In order to design a
general predistortion scheme, able to predistort any HPA and to follow its variations, we
propose in this subsection to identify the TWT using a mimic model. The TWT is replaced
by this identified model when designing the predistorter (Figure 12). In Figure 11, if we replace
$\phi_p(r)$ by the identified function $\phi_m(r)$, we correct the phase shift completely. The update of $G_p(r)$
then becomes as simple as depicted in Figure 13. The cost function is the square error between the
moduli of the input and output patterns:

$$e^2 = (r - \hat{r})^2 = \Big(r - \bar{r}\sum_i u_i^m f(w_i^m \bar{r})\Big)^2 \qquad (15)$$


Figure 14. Updating schemes of the gain predistortion and the phase canceling sub-NNs.

where $\bar{r}$ is the modulus of the predistortion NN output:

$$\bar{r} = r\sum_i u_i^p f(w_i^p r) \qquad (16)$$

The superscripts $p$ and $m$ denote the coefficients of the predistortion NN and of the identifying NN
model, respectively.
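A hedged sketch of this first method (Figure 13, Equations (15) and (16)) is given below: the predistorter gain sub-NN $G_p$ is updated by backpropagating the modulus error through the frozen identified gain model $G_m$. The chain-rule derivation, initialisation and all names are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

# Gain sub-NN: single input r, tanh hidden layer, scalar output G(r).
def gain_forward(r, w, b, u):
    z = np.tanh(w * r + b)
    return z, float(u @ z)

def predistort_gain_step(r, p_net, m_net, mu=5e-3):
    zp, Gp = gain_forward(r, *p_net)
    r_bar = r * Gp                            # Eq. (16): predistorter output modulus
    zm, Gm = gain_forward(r_bar, *m_net)
    r_hat = r_bar * Gm                        # modulus after the identified model
    e = r - r_hat                             # Eq. (15): cost is e^2
    wm, bm, um = m_net
    dGm = float(um @ (wm * (1.0 - zm**2)))    # G_m'(r_bar), frozen model
    d_p = -e * (Gm + r_bar * dGm) * r         # dL/dG_p by the chain rule
    wp, bp, up = p_net
    d_k = d_p * up * (1.0 - zp**2)
    up -= mu * d_p * zp
    wp -= mu * d_k * r
    bp -= mu * d_k

rng = np.random.default_rng(2)
p_net = tuple(0.1 * rng.standard_normal(9) for _ in range(3))
m_net = tuple(0.1 * rng.standard_normal(9) for _ in range(3))   # frozen identified G_m
predistort_gain_step(0.6, p_net, m_net)
```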

4.2.2. Predistortion of the TWT (second method)


The previous method is easy to implement, since the $\phi_p(r)$ sub-NN is frozen and set identical to $\phi_m(r)$,
but its accuracy depends intrinsically on the accuracy of the identified model. Hereafter we describe
another method for deriving $G_p(r)$ and $\phi_p(r)$ from the input–output patterns of the TWT.
Figures 14(a) and 14(b) illustrate the updating schemes of $G_p(r)$ and $\phi_p(r)$, respectively. After
convergence, the two sub-NNs are placed in the predistortion scheme of Figure 11.
(a) Amplitude predistortion sub-NN.
Here we design the $G_p(r)$ sub-NN to model the inverse of the AM/AM conversion. The
input of this sub-NN is the modulus of the TWT output. The cost function is the
square error between the output of the sub-NN multiplied by its input and the TWT input
modulus (Figure 14(a)).
(b) Phase canceling sub-NN.
As said above, the AM/PM distortion does not depend on the input phase but is a
function of the input modulus only. So, the input of this sub-NN is the modulus of the input
patterns, while the cost function is

$$e^2 = \Big(\varphi_{out} - \varphi_{in} - \sum_i u_i f(w_i r_{in})\Big)^2 \qquad (17)$$
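The following sketch illustrates one training step of this second method (Figure 14, Equation (17)), assuming the same single-input sub-NN structure as before; names, initialisation and learning rate are illustrative. Note that the phase sub-NN learns the TWT phase shift, which the predistorter of Figure 11 then subtracts.

```python
import numpy as np

# Second method: both sub-NNs are trained directly from input-output pairs
# (x, y) of the TWT.
def second_method_step(x, y, g_net, p_net, mu=5e-3):
    r_in, r_out = np.abs(x), np.abs(y)
    phi_in, phi_out = np.angle(x), np.angle(y)
    # (a) amplitude sub-NN: learn the inverse AM/AM, fed with the TWT OUTPUT modulus
    wg, bg, ug = g_net
    zg = np.tanh(wg * r_out + bg)
    e_g = r_in - r_out * float(ug @ zg)       # cost (r_in - r_out*Gp(r_out))^2
    d_g = -e_g * r_out                        # dL/dGp
    d_k = d_g * ug * (1.0 - zg**2)
    ug -= mu * d_g * zg
    wg -= mu * d_k * r_out
    bg -= mu * d_k
    # (b) phase sub-NN: learn the AM/PM as a function of the INPUT modulus
    wp, bp, up = p_net
    zp = np.tanh(wp * r_in + bp)
    e_p = (phi_out - phi_in) - float(up @ zp) # Equation (17)
    d_p = -e_p
    d_k = d_p * up * (1.0 - zp**2)
    up -= mu * d_p * zp
    wp -= mu * d_k * r_in
    bp -= mu * d_k

rng = np.random.default_rng(3)
g_net = tuple(0.1 * rng.standard_normal(9) for _ in range(3))
p_net = tuple(0.1 * rng.standard_normal(15) for _ in range(3))
second_method_step(0.5 + 0.2j, 0.6 * np.exp(1j * 0.9), g_net, p_net)
```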

4.3. Simulation results


In this subsection we illustrate some results concerning the two predistortion methods. The
number of neurons is equal to 9 in $G_p(r)$ and 15 in $\phi_p(r)$ (the same numbers of neurons as for the
mimic model). The learning rates of the ordinary and the natural GD are, respectively, $5 \times 10^{-3}$ and
$5 \times 10^{-4}$.


Table II. SER for the two gradient algorithms at the end of the training phase (predistortion
based on TWT identification).

               OGD      Nat-GD
SER_I (dB)     31       61
SER_Q (dB)     31       61

Figure 15. Predistortion results concerning the mimic model of TWT.

4.3.1. Predistortion of the mimic model


The neural mimic model obtained by the Nat-GD is used to design the predistorter sub-NNs.
The sub-NNs are then trained by the Nat-GD and the OGD, respectively. The algorithms are
run for $10^6$ iterations (learning curves are not shown for space reasons). As expected, the
Nat-GD converges faster than the OGD and its residual mean square error is smaller. After the
training phase, we ran a validation sequence and measured the SER. These results are presented
in Table II.
Figures 15 and 16 correspond to the validation phase. The curves in Figure 15 present the TWT
output modulus versus the modulus of the predistorter input (dots, linearized HPA), the modulus of the
predistorter output versus the modulus of the predistorter input (+), and the modulus of the
TWT output versus its own input modulus (◊). It is clear that the OGD does not perfectly linearize
the AM/AM conversion. Figure 16 presents the phase distortion of the overall system
(predistorter + TWT) for the two algorithms. This curve is related to the accuracy of the AM/PM
modelling by $\phi_m$.

4.3.2. Predistortion of the TWT (second method)


Table III presents a comparison between the two gradient algorithms concerning the AM/PM
distortion and the AM/AM inversion. This table shows that the mean square error of the Nat-GD
is smaller than that of the OGD algorithm. Here also, the Nat-GD converges faster than the OGD.
After convergence, the sub-NNs $G_p(r)$ and $\phi_p(r)$ are copied into the predistortion scheme
(Figure 11).


Figure 16. Phase distortion results concerning the predistorter + TWT (predistortion based on TWT identification).

Table III. SER (in dB) at the end of the training phase (predistortion: second method).

              AM/PM model      (AM/AM)^{-1} model
OGD           32               21
Nat-GD        83               51

To validate this method we presented a validation sequence to the input. Validation results
are depicted in Figures 17 and 18, and in Table IV. Table IV compares the SER of the two
algorithms for the I and Q signals.
Figures 17 and 18 illustrate some simulation results concerning the validation of the
predistortion scheme. The curves in Figure 17 show the TWT output modulus versus the
predistorter input modulus (dots, linearized HPA), the predistorter output modulus versus the
predistorter input modulus (+), and the TWT output modulus versus its input modulus (◊).
Figure 18 presents the residual phase shift of the overall system. It is evident that the NN trained
by the Nat-GD is more accurate than the other one. Moreover, it converges faster.
Comparing Tables II and IV, we remark that the first method is more accurate than
the second one. In fact, the second method is not really adequate for predistorting the TWT but rather for
post-linearizing it. It can be seen, from Figures 13 and 14(a), that the equivalence between the two
methods holds only when the following equation is true:

$$rG_p(r)\, G_m\big(rG_p(r)\big) = rG(r)\, G_p\big(rG(r)\big) \qquad (18)$$

where $G(r)$, $G_m(r)$ and $G_p(r)$ are the gains of the TWT, of the mimic identified model and of the
predistortion NN, respectively. Admitting that $G(r) = G_m(r)$, i.e. neglecting the modelling error, we see that
Equation (18) is true when $G(r) = G_p(r)$ or when the two functions are linear, which does not hold
in the TWT case.


Figure 17. Output modulus versus input modulus for OGD (left) and Nat-GD (right) (predistortion: second method).

Figure 18. Overall system (predistorter + HPA) phase distortion using OGD (left) and Nat-GD (right) (predistortion: second method).

Table IV. SER for the two gradient algorithms during the validation phase (predistortion: second method).

               OGD      Nat-GD
SER_I (dB)     19       54
SER_Q (dB)     19       54

5. PRACTICAL ISSUES

Practical implementations of neural networks have already been realized. The first implementation
was achieved during the NEWTEST European project and dealt with NN
equalization. Another one, concerning predistortion, is under construction.


5.1. NEWTEST project


The European project NEWTEST [35], within the framework of the ACTS programme, has studied the
possible use of NN equalizers for satellite non-linear communications. NN equalizers based on
MLP, RBF and SOM structures have been studied [12]. Within this project, the implementation
of NN equalizers has been carried out. An MLP equalizer has been implemented on a digital
signal processor and speeds up to 64 kb/s have been demonstrated. These speeds are sufficient
for S-UMTS applications. The implementation of the same MLP equalizer on an FPGA
structure has reached 400 kb/s, demonstrating the ability of NN equalizers to be used in
terminal equipment.

5.2. Implementation of NN predistorter on silicon


Another ongoing project is to use a neural network structure for predistortion of a high power
amplifier. This HPA can be an SSPA or a TWT; TWTs are generally memoryless devices while
SSPAs have memory. The predistortion system will be located on board the satellite, which means
the satellite payload is regenerative. Baseband signals will then be available on board the
satellite and the predistortion will be applied to them. The targeted data rate is 100 Mb/s with
QPSK or 16-QAM modulations, yielding a necessary bandwidth of 25–50 MHz. Available processor
speeds are not sufficient for implementing the MLP-based predistorter digitally. A
hybrid solution has been adopted: the MLP core will be implemented in an analog manner on a
CMOS chip while the training algorithm will be run on a digital processor [17,36]. The training
algorithm will be run on earth, in the laboratory, with the real HPA. In order to cope with
the ageing and temperature drifts of the HPA, an update of the predistorter will be performed during
the satellite lifetime. This update has to track slow variations of the HPA. Thus, it is not
necessary to update the predistorter at the symbol rate: an update every 100 ms will be sufficient to
correct these variations. Computation of the new coefficients of the predistorter is then
possible on a classical digital processor. The targeted technology is 0.35 μm CMOS for the
MLP core implementation [36].

6. CONCLUSION

This paper treated the problems arising from non-linearities on board satellites. Identification and
predistortion of the TWT have been discussed extensively. We proposed to identify the TWT
in two ways: by a general purpose MLP and by a mimic NN structure composed of two
sub-NNs. The mimic model takes into account a priori knowledge of the HPA behaviour. The
advantage of the mimic model over the general purpose MLP model is shown by simulations.
For identification purposes, we have used two gradient algorithms: the classical ordinary gradient
and the natural gradient. The Nat-GD has shown better convergence speed together with better MSE at
the end of convergence. The performance of the mimic NN structure motivated its use for
predistorting the TWT. We have also proposed two methods to
predistort the TWT. The first one is based on identification of the HPA followed by
predistortion of the identified model. The second method uses the input and output patterns of
the TWT to compute the predistortion device. Simulations have shown that the method
based on identification is the better one. Furthermore, this method can be generalized to

identify the HPA and to predistort it adaptively and simultaneously. So, it is possible to follow
the variations of the TWT (ageing, temperature drift, etc.).
Throughout the paper, we used two on-line stochastic algorithms to train NNs: ordinary
gradient descent and natural gradient descent. Simulations showed the advantage of using
natural gradient descent. It converges faster than the ordinary gradient descent and reaches
smaller mean square error in all our applications.
Practical implementations of NN structures for equalization have already been demonstrated on
devices like digital signal processors or FPGAs. The implementation of NN predistortion for
regenerative satellite payloads will be possible using a hybrid technology: analog implementation
for the NN predistorter core and digital implementation of the updating algorithm on a
classical processor.
REFERENCES
1. Davis JA, Jedwab J. Peak-to-mean power control in OFDM, Golay complementary sequences, and Reed–Muller codes. IEEE Transactions on Information Theory 1999; 45(7).
2. Nordberg J, Abbas M, Sven N, Ingvar C. Fractionally spaced adaptive equalization of S-UMTS mobile terminals. International Journal of Adaptive Control and Signal Processing, this issue.
3. Gutierrez A, Ryan WE. Performance of adaptive Volterra equalizers on nonlinear satellite channels. Proceedings of ICC'96, Seattle, USA.
4. Benedetto S, Biglieri E, Daffara R. Modeling and performance evaluation of nonlinear satellite links: a Volterra series approach. IEEE Transactions on Aerospace and Electronic Systems 1979; AES-15(4).
5. Haykin S. Neural Networks: A Comprehensive Foundation (2nd edn). Prentice-Hall: Upper Saddle River, NJ, 1999.
6. Balay, Palicot J. Equalization of non-linear perturbations by a multilayer perceptron in satellite channel transmission. Proceedings of IEEE Globecom'94.
7. Chang P, Wang B. Adaptive decision feedback equalization for digital satellite channels using multilayer neural networks. IEEE Journal on Selected Areas in Communications 1995; 13(2).
8. Chen S, et al. Adaptive equalization of finite non-linear channels using multilayer perceptrons. Signal Processing 1990; 20:107–119.
9. Cha I, Kassam SA. Channel equalization using adaptive complex radial basis function networks. IEEE Journal on Selected Areas in Communications 1995; 13(1):122–131.
10. Chen S, McLaughlin S, Mulgrew B. Complex-valued radial basis function network, Part I: network architecture and learning algorithms. Signal Processing 1994; 35:19–31.
11. Kohonen T. Self-Organizing Maps. Springer: Berlin, 1995.
12. Bouchired S, Roviras D, Castanie F. Equalization of satellite mobile channels with neural network techniques. Space Communications, vol. 15. IOS Press: Boston, New York, 1998/1999.
13. Bouchired S, Ibnkahla M, Roviras D, Castanie F. Equalization of satellite UMTS channels using neural network devices. Proceedings of IEEE ICASSP'99, Phoenix, USA.
14. Benvenuto N, Piazza F. On the complex backpropagation algorithm. IEEE Transactions on Signal Processing 1992; 40(4):967–969.
15. Murphy CD, Kassam SA. A novel linear/RBF blind equalizer for nonlinear channels. Proceedings of CISS'99, Baltimore, March 1999.
16. Bernardini A, Fina SD. Analysis of different optimization criteria for IF predistortion in digital radio links with nonlinear amplifiers. IEEE Transactions on Communications 1997; 45(4).
17. Langlet F, Ibnkahla M, Castanie F. Neural network hardware implementation: overview and applications to satellite communications. Proceedings of DSP'98, ESA, Noordwijk, The Netherlands, September 1998.
18. Ibnkahla M, Sombrin J, Castanie F. Channel identification and failure detection in digital satellite communications. Globecom'96, London, UK, November 1996.
19. Pearson RK, Ogunnaike BA, Doyle FJ. Identification of structurally constrained second-order Volterra models. IEEE Transactions on Signal Processing 1996; 44(11).
20. Ibnkahla M. Applications of neural networks to digital communications: a survey. Signal Processing 2000; 80(7).
21. Ibnkahla M. Neural network predistortion technique for digital satellite communications. Proceedings of ICASSP'2000, Istanbul, Turkey.
22. Ibnkahla M, Bershad NJ, Sombrin J, Castanie F. Neural networks for modeling nonlinear channels. IEEE Transactions on Signal Processing 1997; 45(7).
23. Ibnkahla M, Bershad NJ, Sombrin J, Castanie F. Neural network modeling and identification of non-linear channels with memory: algorithms, applications, and analytic models. IEEE Transactions on Signal Processing 1998; 46(5).
24. Amari S. A universal theorem for learning curves. Neural Networks 1993; 6:161–166.
25. Darken C, Moody J. Towards faster stochastic gradient search. In Moody JE, Hanson SJ, Lippmann RP (eds). Advances in Neural Information Processing Systems, vol. 4. Morgan Kaufmann: San Mateo, CA, 1992.
26. Magoulas GD, Vrahatis MN, Androulakis GS. Improving the convergence of the backpropagation algorithm using learning rate adaptation methods. Neural Computation 1999; 11.
27. LeCun Y, Simard P, Pearlmutter B. Automatic learning rate maximization by on-line estimation of the Hessian's eigenvectors. In Hanson SJ, Cowan JD, Giles CL (eds). Advances in Neural Information Processing Systems, vol. 5. Morgan Kaufmann: San Mateo, CA, 1993; 156–163.
28. Haykin S. Adaptive digital communication receivers. IEEE Communications Magazine December 2000.
29. Orr GB, Leen TK. Using curvature information for fast stochastic search. Advances in Neural Information Processing Systems, vol. 9. MIT Press: Cambridge, MA.
30. Amari S. Natural gradient works efficiently in learning. Neural Computation 1998; 10:251–276.
31. Yang HH, Amari S. The efficiency and the robustness of the natural gradient descent learning rule. Advances in Neural Information Processing Systems, vol. 10. MIT Press: Cambridge, MA.
32. Yang HH, Amari S. Training multi-layer perceptrons by natural gradient descent. Proceedings of ICONIP'97, New Zealand.
33. Golub GH, Van Loan CF. Matrix Computations (2nd edn). The Johns Hopkins University Press, 1989.
34. Saleh A. Frequency-independent and frequency-dependent nonlinear models of TWT amplifiers. IEEE Transactions on Communications 1981; COM-29(11).
35. Guntsh A, Ibnkahla M, Losquadro G, Mazella M, Roviras D, Timm A. EU's R&D activities on the third generation mobile satellite systems (S-UMTS). IEEE Communications Magazine 1998; 36(2):104–110.
36. Langlet F, Roviras D, Mallet A, Castanie F. Mixed analog/digital implementation of MLP NN for predistortion. International Joint Conference on Neural Networks, Hawaii, USA, 2002.
