Distributed Recursive Least-Squares: Stability and Performance Analysis
Abstract
The recursive least-squares (RLS) algorithm has well-documented merits for reducing complexity and
storage requirements, when it comes to online estimation of stationary signals as well as for tracking slowly-
varying nonstationary processes. In this paper, a distributed recursive least-squares (D-RLS) algorithm is
developed for cooperative estimation using ad hoc wireless sensor networks. Distributed iterations are
obtained by minimizing a separable reformulation of the exponentially-weighted least-squares cost, using
the alternating-minimization algorithm. Sensors carry out reduced-complexity tasks locally, and exchange
messages with one-hop neighbors to consent on the network-wide estimates adaptively. A steady-state
mean-square error (MSE) performance analysis of D-RLS is conducted, by studying a stochastically-driven
‘averaged’ system that approximates the D-RLS dynamics asymptotically in time. For sensor observations
that are linearly related to the time-invariant parameter vector sought, the simplifying independence setting
assumptions facilitate deriving accurate closed-form expressions for the MSE steady-state values. The
problems of mean- and MSE-sense stability of D-RLS are also investigated, and easily-checkable sufficient
conditions are derived under which a steady state is attained. Without resorting to diminishing step-sizes,
which compromise the tracking ability of D-RLS, stability ensures that per-sensor estimates hover inside
a ball of finite radius centered at the true parameter vector, with high probability, even when inter-sensor
communication links are noisy. Interestingly, computer simulations demonstrate that the theoretical findings
are accurate also in the pragmatic settings whereby sensors acquire temporally-correlated data.
Index Terms
Wireless sensor networks (WSNs), distributed estimation, RLS algorithm, performance analysis.
† Work in this paper was supported by the NSF grants CCF-0830480 and ECCS-0824007.
∗ The authors are with the Dept. of Electrical and Computer Engineering, University of Minnesota, 200 Union Street SE,
Minneapolis, MN 55455. Tel/fax: (612)626-7781/625-2002; Emails: {mate0058,georgios}@umn.edu
I. INTRODUCTION
Wireless sensor networks (WSNs), whereby large numbers of inexpensive sensors with constrained
resources cooperate to achieve a common goal, constitute a promising technology for applications as
diverse and crucial as environmental monitoring, process control and fault diagnosis for the industry, and
protection of critical infrastructure including the smart grid, just to name a few. Emergent WSNs have
created renewed interest also in the field of distributed computing, calling for collaborative solutions that
enable low-cost estimation of stationary signals as well as reduced-complexity tracking of nonstationary
processes; see e.g., [22], [33].
In this paper, a distributed recursive least-squares (D-RLS) algorithm is developed for estimation and
tracking using ad hoc WSNs with noisy links, and analyzed in terms of its stability and mean-square error
(MSE) steady-state performance. Ad hoc WSNs lack a central processing unit, and accordingly D-RLS
performs in-network processing of the (spatially) distributed sensor observations. In words, a two-step
iterative process takes place towards consenting on the desired global exponentially-weighted least-squares
estimator (EWLSE): sensors perform simple local tasks to refine their current estimates, and exchange
messages with one-hop neighbors over noisy communication channels. New sensor data acquired in real
time enrich the estimation process, enabling the unknown statistics to be learned 'on-the-fly'. In addition, the exponential
weighting effected through a forgetting factor endows D-RLS with tracking capabilities. This is desirable
in a constantly changing environment, within which WSNs are envisioned to operate.
Unique challenges arising with WSNs dictate that often times sensors need to perform estimation in a
constantly changing environment without having available a (statistical) model for the underlying processes
of interest. This has motivated the development of distributed adaptive estimation schemes, generalizing
the notion of adaptive filtering to a setup involving networked sensing/processing devices [3, Section I-B].
The incremental (I-) RLS algorithm in [24] is one of the first such approaches, which sequentially
incorporates new sensor data while performing least-squares estimation. If one can afford maintaining
a so-termed Hamiltonian cyclic path across sensors, then I-RLS yields the centralized EWLS benchmark
estimate. Reducing the communication cost at a modest price in terms of estimation performance, an I-RLS
variant was also put forth in [24]; but the NP-hard challenge of determining a Hamiltonian cycle in large-
size WSNs remains [18]. Without topological constraints and increasing the degree of collaboration among
sensors, a diffusion RLS algorithm was proposed in [3]. In addition to local estimates, sensors continuously
diffuse raw sensor observations and regression vectors per neighborhood. This facilitates percolating new
data across the WSN, but estimation performance is degraded in the presence of communication noise. When
both the sensor measurements and regression vectors are corrupted by additive (colored) noise, the diffusion-
based RLS algorithm of [1] exploits sensor cooperation to reduce bias in the EWLSE. All [3], [1] and [24]
include steady-state MSE performance analysis under the independence setting assumptions [23, p. 448].
Distributed least mean-squares (LMS) counterparts are also available, trading off computational complexity
for estimation performance; for noteworthy representatives see [7], [14], [26], and references therein. Recent
studies have also considered more elaborate sensor processing strategies including projections [8], [12],
adaptive combination weights [30], or even sensor hierarchies [4], [26], and mobility [32].
Several distributed (adaptive) estimation algorithms are rooted on iterative optimization methods, which
capitalize upon the separable structure of the cost defining the desired estimator. The sample mean estimator
was formulated in [20] as an optimization problem, and was solved in a distributed fashion using a
primal dual approach; see, e.g., [2]. Similarly, the incremental schemes in e.g., [7], [19], [21], [24] are all
based on incremental (sub)gradient methods [17]. Even the diffusion LMS algorithm of [14] has been
recently shown related to incremental strategies, when these are adopted to optimize an approximate
reformulation of the LMS cost [5]. Building on the framework introduced by [27], the D-LMS and D-RLS
algorithms in [15], [16], [26] are obtained upon recasting the respective decentralized estimation problems
as multiple equivalent constrained subproblems. The resulting minimization subtasks are shown to be highly
parallelizable across sensors, when carried out using the alternating-direction method of multipliers (AD-
MoM) [2]. Much related to the AD-MoM is the alternating minimization algorithm (AMA) [31], used here
to develop a novel D-RLS algorithm offering reduced complexity when compared to its counterpart of [15].
The present paper develops a fully distributed (D-) RLS type of algorithm, which performs in-network,
adaptive LS estimation. D-RLS is applicable to general ad hoc WSNs that are challenged by additive
communication noise, and may lack a Hamiltonian cycle altogether. Different from the distributed Kalman
trackers of e.g., [6], [22], the universality of the LS principle broadens the applicability of D-RLS to a wide
class of distributed adaptive estimation tasks, since it requires no knowledge of the underlying state space
model. The algorithm is developed by reformulating the EWLSE into an equivalent constrained form [27],
which can be minimized in a distributed fashion by capitalizing on the separable structure of the augmented
Lagrangian using the AMA solver in [31] (Section II). From an algorithmic standpoint, the novel distributed
iterations here offer two extra features relative to the AD-MoM-based D-RLS variants in [15], [25]. First,
as discussed in Section II-B the per sensor computational complexity is markedly reduced, since there is no
need to explicitly carry out a matrix inversion per iteration as in [15]. Second, the approach here bypasses
the need of the so-termed bridge sensors [25]. As a result, a fully distributed algorithm is obtained whereby
all sensors perform the same tasks in a more efficient manner, without introducing hierarchies that may
require intricate recovery protocols to cope with sensor failures.
Another contribution of the present paper pertains to a detailed stability and MSE steady-state per-
formance analysis for D-RLS (Section IV). These theoretical results were lacking in the algorithmic pa-
pers [15], [25], where claims were only supported via computer simulations. Evaluating the performance of
(centralized) adaptive filters is a challenging problem in its own right; prior art is surveyed in e.g., [28], [29,
pg. 120], [23, pg. 357], and the extensive list of references therein. On top of that, a WSN setting introduces
unique challenges in the analysis such as space-time sensor data and multiple sources of additive noise,
a consequence of imperfect sensors and communication links. The approach pursued here capitalizes on
an ‘averaged’ error-form representation of the local recursions comprising D-RLS, as a global dynamical
system described by a stochastic difference-equation derived in Section III-B. The covariance matrix of
the resulting state is then shown to encompass all the information needed to evaluate the relevant global
and sensor-level performance metrics (Section III-C). For sensor observations that are linearly related
to the time-invariant parameter vector sought, the simplifying independence setting assumptions [29, pg.
110], [23, pg. 448] are key enablers towards deriving accurate closed-form expressions for the mean-square
deviation and excess-MSE steady-state values (Section IV-B). Stability in the mean- and MSE-sense are
also investigated, revealing easily-checkable sufficient conditions under which a steady-state is attained.
Numerical tests corroborating the theoretical findings are presented in Section V, while concluding
remarks and possible directions for future work are given in Section VI.
Notation: Operators ⊗, (·)^T, (·)^†, λ_max(·), tr(·), diag(·), bdiag(·), E[·], and vec[·] will denote Kronecker
product, transposition, matrix pseudo-inverse, spectral radius, matrix trace, diagonal matrix, block diagonal
matrix, expectation, and matrix vectorization, respectively. For both vectors and matrices, ‖·‖ will stand
for the 2-norm, and |·| for the cardinality of a set or the magnitude of a scalar. Positive definite matrix M
will be denoted by M ≻ 0. The n × n identity matrix will be represented by I_n, while 1_n will denote the
n × 1 vector of all ones and 1_{n×m} := 1_n 1_m^T. Similar notation will be adopted for vectors (matrices) of all
zeros. For matrix M ∈ R^{m×n}, nullspace(M) := {x ∈ R^n : Mx = 0_m}. The i-th vector in the canonical
basis for R^n will be denoted by b_{n,i}, i = 1, . . . , n.
Consider a WSN with sensors {1, . . . , J} := J . Only single-hop communications are allowed, i.e., sensor
j can communicate only with the sensors in its neighborhood Nj ⊆ J , having cardinality |Nj |. Assuming
that inter-sensor links are symmetric, the WSN is modeled as an undirected connected graph with associated
graph Laplacian matrix L. Different from [1], [3] and [24], the present network model accounts explicitly
for non-ideal sensor-to-sensor links. Specifically, signals received at sensor j from sensor i at discrete-
time instant t are corrupted by a zero-mean additive noise vector ηij (t), assumed temporally and spatially
uncorrelated. The communication noise covariance matrices are denoted by Rηj := E[η ij (t)(η ij (t))T ],
j ∈ J.
The WSN is deployed to estimate a real signal vector s0 ∈ Rp×1 in a distributed fashion and subject to
the single-hop communication constraints, by resorting to the LS criterion [23, p. 658]. Per time instant
t = 0, 1, . . . , each sensor acquires a regression vector hj (t) ∈ Rp×1 and a scalar observation xj (t),
both assumed zero-mean without loss of generality. A similar setting comprising complex-valued data
was considered in [3] and [24]. Here, the exposition focuses on real-valued quantities for simplicity, but
extensions to the complex case are straightforward. Given new data sequentially acquired, a pertinent
approach is to consider the EWLSE [3], [23], [24]
\hat{s}_{ewls}(t) := \arg\min_{s} \sum_{\tau=0}^{t} \sum_{j=1}^{J} \lambda^{t-\tau} \left[ x_j(\tau) - h_j^T(\tau) s \right]^2 + \lambda^t s^T \Phi_0 s    (1)
where λ ∈ (0, 1] is a forgetting factor, while Φ0 ≻ 0p×p is included for regularization. Note that in
forming the EWLSE at time t, the entire history of data {xj (τ ), hj (τ )}tτ =0 , ∀ j ∈ J is incorporated in the
online estimation process. Whenever λ < 1, past data are exponentially discarded thus enabling tracking
of nonstationary processes. Regarding applications, a distributed power spectrum estimation task matching
the aforementioned problem statement, can be found in [15].
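To make (1) concrete, the following minimal numerical sketch (an illustration with assumed synthetic data, not taken from the paper) verifies that updating the data statistics recursively, as done per sensor in (10)-(11) below, reproduces the batch EWLSE of (1) when all sensor data are pooled centrally.

```python
# Sketch: batch EWLSE (1) vs. its recursive computation; data and names are
# illustrative assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
p, J, T, lam = 4, 5, 100, 0.95
Phi0 = 1e-2 * np.eye(p)                          # regularization matrix Phi_0 > 0
s0 = np.ones(p)                                  # true parameter vector s_0

H = rng.standard_normal((T, J, p))               # regressors h_j(t)
x = H @ s0 + 1e-2 * rng.standard_normal((T, J))  # x_j(t) = h_j(t)^T s_0 + noise

# recursive statistics; initialization chosen so Phi(t) carries lam^t * Phi0
Phi, psi = Phi0 / lam, np.zeros(p)
for t in range(T):
    Phi = lam * Phi + sum(np.outer(H[t, j], H[t, j]) for j in range(J))
    psi = lam * psi + sum(H[t, j] * x[t, j] for j in range(J))
s_rec = np.linalg.solve(Phi, psi)

# batch EWLSE at time T-1, built directly from the cost in (1)
w = lam ** (T - 1 - np.arange(T))                # weights lambda^{t - tau}
Phi_b = sum(w[t] * np.outer(H[t, j], H[t, j]) for t in range(T) for j in range(J))
Phi_b += lam ** (T - 1) * Phi0
psi_b = sum(w[t] * H[t, j] * x[t, j] for t in range(T) for j in range(J))

assert np.allclose(s_rec, np.linalg.solve(Phi_b, psi_b))   # both yield the EWLSE
```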
To decompose the cost function in (1), in which summands are coupled through the global variable s,
introduce auxiliary variables {sj }Jj=1 representing local estimates of s0 per sensor j . These local estimates
are utilized to form the separable convex constrained minimization problem
\{\hat{s}_j(t)\}_{j=1}^{J} := \arg\min_{\{s_j\}_{j=1}^{J}} \sum_{\tau=0}^{t} \sum_{j=1}^{J} \lambda^{t-\tau} \left[ x_j(\tau) - h_j^T(\tau) s_j \right]^2 + J^{-1} \lambda^t \sum_{j=1}^{J} s_j^T \Phi_0 s_j,
\quad \text{s.t. } s_j = s_{j'}, \; j \in J, \; j' \in N_j.    (2)
From the connectivity of the WSN, (1) and (2) are equivalent in the sense that ŝj (t) = ŝewls (t), ∀ j ∈ J
and t ≥ 0; see also [27]. To arrive at the D-RLS recursions, it is convenient to reparametrize the constraints in (2) as

s_j = \bar{z}_j^{j'}, \quad s_{j'} = \tilde{z}_j^{j'}, \quad \bar{z}_j^{j'} = \tilde{z}_j^{j'}, \quad j \in J, \; j' \in N_j    (3)

where \{\bar{z}_j^{j'}, \tilde{z}_j^{j'}\}_{j' \in N_j}, j ∈ J, are auxiliary optimization variables that will be eventually eliminated.
To tackle the constrained minimization problem (2) at time instant t, associate Lagrange multipliers v_j^{j'} and u_j^{j'} with the first pair of consensus constraints in (3). Introduce the ordinary Lagrangian function

L[s, z, v, u] = \sum_{j=1}^{J} \sum_{\tau=0}^{t} \lambda^{t-\tau} \left[ x_j(\tau) - h_j^T(\tau) s_j \right]^2 + J^{-1} \lambda^t \sum_{j=1}^{J} s_j^T \Phi_0 s_j + \sum_{j=1}^{J} \sum_{j' \in N_j} \left[ (v_j^{j'})^T (s_j - \bar{z}_j^{j'}) + (u_j^{j'})^T (s_{j'} - \tilde{z}_j^{j'}) \right]    (4)

and the corresponding augmented Lagrangian

L_c[s, z, v, u] = L[s, z, v, u] + \frac{c}{2} \sum_{j=1}^{J} \sum_{j' \in N_j} \left[ \|s_j - \bar{z}_j^{j'}\|^2 + \|s_{j'} - \tilde{z}_j^{j'}\|^2 \right]    (5)

where c > 0 is a penalty coefficient. Observe that the remaining constraints in (3), namely z ∈ C_z := \{z : \bar{z}_j^{j'} = \tilde{z}_j^{j'}, \; j \in J, \; j' \in N_j, \; j \neq j'\}, have not been dualized.
Towards deriving the D-RLS recursions, the alternating minimization algorithm (AMA) of [31] will be
adopted here to tackle the separable EWLSE reformulation (2) in a distributed fashion. Much related to
AMA is the alternating-direction method of multipliers (AD-MoM), an iterative augmented Lagrangian
method especially well-suited for parallel processing [2], [15], [27]. While the AD-MoM has proven
successful in tackling the optimization tasks stemming from general distributed estimators of deterministic and
(non-)stationary random signals, it is somewhat curious that the AMA has remained largely underutilized.
To minimize (2) at time instant t, the AMA solver entails an iterative procedure comprising three steps
per iteration k = 0, 1, 2, . . .
[S1] Multiplier updates:

v_j^{j'}(t, k) = v_j^{j'}(t, k-1) + c\left[ s_j(t, k) - \bar{z}_j^{j'}(t, k) \right], \quad u_j^{j'}(t, k) = u_j^{j'}(t, k-1) + c\left[ s_{j'}(t, k) - \tilde{z}_j^{j'}(t, k) \right].

[S2] Local estimate updates:

s(t, k+1) = \arg\min_{s} L[s, z(t, k), v(t, k), u(t, k)].    (6)

[S3] Auxiliary variable updates:

z(t, k+1) = \arg\min_{z \in C_z} L_c[s(t, k+1), z, v(t, k), u(t, k)].    (7)
Steps [S1] and [S3] are identical to those in AD-MoM [2]. In words, these steps correspond to dual ascent
iterations to update the Lagrange multipliers, and a block coordinate-descent minimization of the augmented
Lagrangian with respect to z ∈ C_z, respectively. The only difference is with regard to the local estimate
updates in [S2], where in AMA the new iterates are obtained by minimizing the ordinary Lagrangian with
respect to s. For the sake of this minimization, all other variables are considered fixed,
taking their most up to date values {z (t, k), v (t, k), u (t, k)}. For the AD-MoM instead, the minimized
quantity is the augmented Lagrangian both in [S2] and in [S3].
The AMA was motivated in [31] for separable problems that are strictly convex in s , but (possibly) only
convex with respect to z . Under this assumption, [S2] still yields a unique minimizer per iteration, and the
AMA is useful for those cases in which the Lagrangian is much simpler to optimize than the augmented
Lagrangian. Because of the regularization matrix Φ0 ≻ 0p×p , the EWLS cost in (2) is indeed strictly
convex for all t > 0, and the AMA is applicable. Section II-B delves into the benefits of minimizing the
ordinary Lagrangian instead of its augmented counterpart (5), in the context of distributed RLS estimation.
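Before specializing to D-RLS, the following toy sketch illustrates the AMA template on an assumed single-constraint consensus least-squares problem (not the paper's edge-based formulation): each s_j-update minimizes the ordinary Lagrangian, the shared variable minimizes the augmented Lagrangian, and the multipliers are updated by dual ascent.

```python
# Toy AMA for min sum_j 0.5*||x_j - H_j s_j||^2  s.t.  s_j = z; illustrative names.
import numpy as np

rng = np.random.default_rng(1)
p, J, c = 3, 4, 0.3                      # step-size c assumed inside (0, c_u)
H = [rng.standard_normal((20, p)) for _ in range(J)]
x = [Hj @ np.ones(p) + 0.01 * rng.standard_normal(20) for Hj in H]

s = [np.zeros(p) for _ in range(J)]
v = [np.zeros(p) for _ in range(J)]      # multipliers for s_j = z
z = np.zeros(p)
for k in range(500):
    # minimize the ORDINARY Lagrangian w.r.t. each s_j (strictly convex here)
    s = [np.linalg.solve(H[j].T @ H[j], H[j].T @ x[j] - v[j]) for j in range(J)]
    # minimize the AUGMENTED Lagrangian w.r.t. the auxiliary variable z
    z = np.mean(s, axis=0) + np.mean(v, axis=0) / c
    # dual ascent on the multipliers
    v = [v[j] + c * (s[j] - z) for j in range(J)]

s_star = np.linalg.solve(sum(Hj.T @ Hj for Hj in H),
                         sum(H[j].T @ x[j] for j in range(J)))
print(np.linalg.norm(z - s_star))        # small for a suitably chosen c
```

For a suitably small c the iterates approach the centralized solution, mirroring the c ∈ (0, c_u) requirement of Proposition 1 below.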
Carrying out the minimization in [S3] first, one finds

\bar{z}_j^{j'}(t, k+1) = \tilde{z}_j^{j'}(t, k+1) = \frac{1}{2}\left[ s_j(t, k+1) + s_{j'}(t, k+1) \right], \quad j \in J, \; j' \in N_j

so that v_j^{j'}(t; k) = -u_j^{j'}(t; k) for all k ≥ 0 [15]. As a result, v_j^{j'}(t; k) is given by

v_j^{j'}(t; k) = v_j^{j'}(t; k-1) + \frac{c}{2}\left[ s_j(t; k) - s_{j'}(t; k) \right], \quad j \in J, \; j' \in N_j.    (8)
Moving on to [S2], from the separable structure of (4) the minimization (6) can be split into J subproblems

s_j(t, k+1) = \arg\min_{s_j} \left\{ \sum_{\tau=0}^{t} \lambda^{t-\tau} \left[ x_j(\tau) - h_j^T(\tau) s_j \right]^2 + J^{-1} \lambda^t s_j^T \Phi_0 s_j + \sum_{j' \in N_j} \left[ v_j^{j'}(t, k) - v_{j'}^{j}(t, k) \right]^T s_j \right\}.

Since each of the local subproblems corresponds to an unconstrained quadratic minimization, they all admit the closed-form solutions

s_j(t, k+1) = \Phi_j^{-1}(t) \psi_j(t) - \frac{1}{2} \Phi_j^{-1}(t) \sum_{j' \in N_j} \left[ v_j^{j'}(t, k) - v_{j'}^{j}(t, k) \right]    (9)
where

\Phi_j(t) := \sum_{\tau=0}^{t} \lambda^{t-\tau} h_j(\tau) h_j^T(\tau) + J^{-1} \lambda^t \Phi_0 = \lambda \Phi_j(t-1) + h_j(t) h_j^T(t)    (10)

\psi_j(t) := \sum_{\tau=0}^{t} \lambda^{t-\tau} h_j(\tau) x_j(\tau) = \lambda \psi_j(t-1) + h_j(t) x_j(t).    (11)
Recursions (8) and (9) constitute the AMA-based D-RLS algorithm, whereby all sensors j ∈ J keep track
of their local estimate s_j(t; k+1) and their multipliers \{v_j^{j'}(t; k)\}_{j' \in N_j}, which can be arbitrarily initialized.
From the rank-one update in (10) and capitalizing on the matrix inversion lemma, matrix \Phi_j^{-1}(t) can be
updated recursively with complexity O(p^2) [cf. (14)]. It is recommended to initialize the matrix recursion with \Phi_j^{-1}(0) = J\Phi_0^{-1} := \delta I_p,
where δ > 0 is chosen sufficiently large [23]. Not surprisingly, by direct application of the convergence
results in [31, Proposition 3], it follows that:
Proposition 1: For arbitrarily initialized \{v_j^{j'}(t; -1)\}_{j \in J}^{j' \in N_j}, s_j(t; 0), and c ∈ (0, c_u), the local estimates generated by (8)-(9) satisfy

\lim_{k \to \infty} s_j(t, k) = \hat{s}_{ewls}(t), \quad \forall j \in J \text{ and } t \ge 0.    (12)
The upper bound cu is proportional to the modulus of the strictly convex cost function in (2), and
inversely proportional to the norm of a matrix suitably chosen to express the linear constraints in (3);
further details are in [31, Section 4]. Proposition 1 asserts that per time instant t, the AMA-based D-RLS
algorithm yields a sequence of local estimates that converge to the global EWLSE sought, as k → ∞, or,
pragmatically for large enough k. In principle, one could argue that running many consensus iterations may
not be a problem in a stationary environment. However, when the WSN is deployed to track a time-varying
parameter vector s0 (t), one cannot afford significant delays in-between consecutive sensing instants.
One possible way to overcome this hurdle is to run a single consensus iteration per acquired observation
xj (t). Specifically, letting k = t in recursions (8)-(9), one arrives at a single time scale D-RLS algorithm
which is suitable for operation in nonstationary WSN environments. Accounting also for additive communi-
cation noise that corrupts the exchanges of multipliers and local estimates, the per sensor tasks comprising
the novel AMA-based single time scale D-RLS algorithm are given by

v_j^{j'}(t) = v_j^{j'}(t-1) + \frac{c}{2} \left[ s_j(t) - (s_{j'}(t) + \eta_j^{j'}(t)) \right], \quad j' \in N_j    (13)

\Phi_j^{-1}(t+1) = \lambda^{-1} \Phi_j^{-1}(t) - \frac{\lambda^{-1} \Phi_j^{-1}(t) h_j(t+1) h_j^T(t+1) \Phi_j^{-1}(t)}{\lambda + h_j^T(t+1) \Phi_j^{-1}(t) h_j(t+1)}    (14)

\psi_j(t+1) = \lambda \psi_j(t) + h_j(t+1) x_j(t+1)    (15)

s_j(t+1) = \Phi_j^{-1}(t+1) \psi_j(t+1) - \frac{1}{2} \Phi_j^{-1}(t+1) \sum_{j' \in N_j} \left[ v_j^{j'}(t) - (v_{j'}^{j}(t) + \bar{\eta}_j^{j'}(t)) \right].    (16)
Recursions (13)-(16) are tabulated as Algorithm 1, which also details the inter-sensor communications
of multipliers and local estimates taking place within neighborhoods. When powerful error control codes
render inter-sensor links virtually ideal, direct application of the results in [15], [16] show that D-RLS can
be further simplified to reduce the communication overhead and memory storage requirements.
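For concreteness, a per-sensor sketch of one D-RLS time step is given next. It follows (13)-(16) under assumed data structures; the dictionary layout and argument names below are illustrative, not the paper's Algorithm 1.

```python
# One D-RLS time step at sensor j [cf. (13)-(16)]; container names illustrative.
import numpy as np

def drls_step(state, nbr_s, nbr_v, h_new, x_new, lam, c, rng, noise_std=0.0):
    """state: dict with 'Pinv' (Phi_j^{-1}), 'psi', 's', and 'v' (one multiplier
    per neighbor); nbr_s[i], nbr_v[i]: neighbor i's broadcast estimate and
    multiplier, received over (possibly) noisy links."""
    p = h_new.size
    # (13): multiplier updates from the noisy local-estimate exchanges
    for i, s_rx in enumerate(nbr_s):
        noisy = s_rx + noise_std * rng.standard_normal(p)
        state['v'][i] = state['v'][i] + 0.5 * c * (state['s'] - noisy)
    # (14): rank-one update of Phi_j^{-1} via the matrix inversion lemma, O(p^2)
    Ph = state['Pinv'] @ h_new
    state['Pinv'] = (state['Pinv'] - np.outer(Ph, Ph) / (lam + h_new @ Ph)) / lam
    # (15): update the exponentially weighted cross-correlation vector psi_j
    state['psi'] = lam * state['psi'] + h_new * x_new
    # (16): local estimate from psi_j and the noisy multiplier exchanges
    corr = sum(state['v'][i] - (nbr_v[i] + noise_std * rng.standard_normal(p))
               for i in range(len(nbr_v)))
    state['s'] = state['Pinv'] @ (state['psi'] - 0.5 * corr)
    return state
```

Initializing state['Pinv'] to δI_p with δ large, as recommended for the matrix recursion, and the remaining entries to zero completes the per-sensor setup.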
A related D-RLS algorithm was put forth in [15], whereby the decomposable exponentially-weighted LS
cost (2) is minimized using the AD-MoM, rather than the AMA as in Section II-A. Recall that the AD-MoM
solver yields sj (t + 1) as the optimizer of the augmented Lagrangian, while its AMA counterpart minimizes
the ordinary Lagrangian instead. Consequently, different from (16) local estimates in the AD-MoM-based
D-RLS algorithm of [15] are updated via
s_j(t+1) = \bar{\Phi}_j^{-1}(t+1) \psi_j(t+1) + \frac{c}{2} \bar{\Phi}_j^{-1}(t+1) \sum_{j' \in N_j} \left[ s_j(t) + (s_{j'}(t) + \eta_j^{j'}(t)) \right] - \frac{1}{2} \bar{\Phi}_j^{-1}(t+1) \sum_{j' \in N_j} \left[ v_j^{j'}(t) - (v_{j'}^{j}(t) + \bar{\eta}_j^{j'}(t)) \right]    (17)

where

\bar{\Phi}_j(t) := \Phi_j(t) + c|N_j| I_p.    (18)
Unless λ = 1, it is impossible to derive a rank-one update for \bar{\Phi}_j(t) as in (10). The reason is the
regularization term c|N_j|I_p in (18), a direct consequence of the quadratic penalty in the augmented
Lagrangian (5). This prevents one from efficiently updating \bar{\Phi}_j^{-1}(t+1) in (17) using the matrix inversion
lemma [cf. (14)]. Direct inversion of \bar{\Phi}_j(t+1) per iteration dominates the computational complexity of
the AD-MoM-based D-RLS algorithm, which is roughly O(p^3) [15].
Unfortunately, the penalty coefficient c cannot be set to zero, since then the D-RLS algorithm breaks down.
For instance, when the initial Lagrange multipliers are null and c = 0, D-RLS boils down to a purely
local (L-) RLS algorithm where sensors do not cooperate, hence consensus cannot be attained. All in
all, the novel AMA-based D-RLS algorithm of this paper offers an improved alternative with an order
of magnitude reduction in terms of computational complexity per sensor. With regards to communication
cost, the AD-MoM-based D-RLS and Algorithm 1 here incur identical overheads; see [15, Sec. III-B] for
a detailed analysis of the associated cost, as well as comparisons with the I-RLS [24] and diffusion RLS
algorithms [3].
While the AMA-based D-RLS algorithm is less complex computationally than its AD-MoM counterpart
in [15], Proposition 1 asserts that when many consensus iterations can be afforded, convergence to the
centralized EWLSE is guaranteed provided c ∈ (0, cu ). On the other hand, the AD-MoM-based D-RLS
algorithm will attain the EWLSE for any c > 0 (cf. [15, Prop. 1]). In addition, it does not require tuning
the extra parameter δ, since it is applicable when Φ0 = 0p×p because the augmented Lagrangian provides
the needed regularization.
Performance evaluation of the D-RLS algorithm is much more involved than that of e.g., D-LMS [16],
[26]. The challenges are well documented for the classical (centralized) LMS and RLS filters [23], [29],
and results for the latter are less common and typically involve simplifying approximations. What is more,
the distributed setting introduces unique challenges in the analysis. These include space-time sensor data
and multiple sources of additive noise, a consequence of imperfect sensors and communication links.
In order to proceed, a few typical modeling assumptions are introduced to delineate the scope of the
ensuing stability and performance results. For all j ∈ J , it is assumed that:
(a1) Sensor observations adhere to the linear model x_j(t) = h_j^T(t) s_0 + \epsilon_j(t), where the zero-mean white noise \{\epsilon_j(t)\} has variance \sigma_{\epsilon_j}^2;
(a2) Vectors \{h_j(t)\} are spatio-temporally white with covariance matrix R_{h_j} \succ 0_{p \times p}; and
(a3) Vectors \{h_j(t)\}, \{\epsilon_j(t)\}, \{\eta_j^{j'}(t)\}_{j' \in N_j}, and \{\bar{\eta}_j^{j'}(t)\}_{j' \in N_j} are independent.
Assumptions (a1)-(a3) comprise the widely adopted independence setting, for sensor observations that are
linearly related to the time-invariant parameter of interest; see e.g., [29, pg. 110], [23, pg. 448]. Clearly,
(a2) can be violated in, e.g., FIR filtering of signals (regressors) with a shift structure as in the distributed
power spectrum estimation problem described in [26] and [15]. Nevertheless, the steady-state performance
results extend accurately to the pragmatic setup that involves time-correlated sensor data; see also the
numerical tests in Section V. In line with a distributed setting such as a WSN, the statistical profiles of
both regressors and the noise quantities vary across sensors (space), yet they are assumed to remain time
invariant. For a related analysis of a distributed LMS algorithm operating in a nonstationary environment,
the reader is referred to [16].
In the particular case of the D-RLS algorithm, a unique challenge stems from the stochastic matrices
\Phi_j^{-1}(t) present in the local estimate updates (16). Recalling (10), it is apparent that \Phi_j^{-1}(t) depends
upon the whole history of local regression vectors \{h_j(\tau)\}_{\tau=0}^{t}. Even obtaining \Phi_j^{-1}(t)'s distribution or
computing its expected value is a formidable task in general, due to the matrix inversion operation. It is for
these reasons that some simplifying approximations will be adopted in the sequel, to carry out the analysis
that otherwise becomes intractable.
Neglecting the regularization term in (10) that vanishes exponentially as t → ∞, the matrix \Phi_j(t) is
obtained as an exponentially weighted moving average (EWMA). The EWMA can be seen as an average
modulated by a sliding window of equivalent length 1/(1 − λ), which clearly grows as λ → 1. This
observation, in conjunction with (a2) and the strong law of large numbers, justifies the approximation

\Phi_j(t) \approx E[\Phi_j(t)] = \frac{R_{h_j}}{1 - \lambda}, \quad 0 \ll \lambda < 1 \text{ and } t \to \infty.    (19)
The expectation of \Phi_j^{-1}(t), on the other hand, is considerably harder to evaluate. To overcome this hurdle, the approximation

E[\Phi_j^{-1}(t)] \approx E^{-1}[\Phi_j(t)] \approx (1 - \lambda) R_{h_j}^{-1}, \quad 0 \ll \lambda < 1 \text{ and } t \to \infty    (20)

will be adopted in the sequel.
It is admittedly a crude approximation at first sight, because E[X^{-1}] ≠ (E[X])^{-1} in general for a random
variable X. However, experimental evidence suggests that the approximation is sufficiently accurate for all
practical purposes, when the forgetting factor approaches unity [23, p. 319].
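The approximation (20) is easy to probe numerically. The following Monte Carlo sketch (with assumed Gaussian regressors and illustrative parameter values) compares the sample mean of \Phi_j^{-1}(t) with (1 − λ)R_{h_j}^{-1} for λ close to one.

```python
# Monte Carlo check of (20): E[Phi^{-1}(t)] vs (1 - lam) * Rh^{-1}.
import numpy as np

rng = np.random.default_rng(2)
p, lam, T, runs = 3, 0.99, 2000, 200
A = rng.standard_normal((p, p))
Rh = A @ A.T + p * np.eye(p)                # an assumed regressor covariance
L = np.linalg.cholesky(Rh)

acc = np.zeros((p, p))
for _ in range(runs):
    Phi = 1e-2 * np.eye(p)                  # regularization, vanishes as t grows
    for t in range(T):
        h = L @ rng.standard_normal(p)      # h(t) ~ N(0, Rh), white in time
        Phi = lam * Phi + np.outer(h, h)    # EWMA update of Phi, cf. (10)
    acc += np.linalg.inv(Phi) / runs

target = (1 - lam) * np.linalg.inv(Rh)
print(np.linalg.norm(acc - target) / np.linalg.norm(target))  # small as lam -> 1
```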
B. Error-form D-RLS
The approach here to steady-state performance analysis relies on an 'averaged' error-form system
representation of D-RLS in (13)-(16), where \Phi_j^{-1}(t) in (16) is replaced by the approximation (1-\lambda)R_{h_j}^{-1}
for sufficiently large t. Somewhat related approaches were adopted in [3] and [1]. Other noteworthy analysis
techniques include the energy-conservation methodology in [35], [23, p. 287], and stochastic averaging [29,
p. 229]. For performance analysis of distributed adaptive algorithms seeking time-invariant parameters, the
former has been applied in e.g., [13], [14], while the latter can be found in [26].
Towards obtaining such an error-form representation, introduce the local estimation errors \{y_{1,j}(t) := s_j(t) - s_0\}_{j=1}^{J} and the multiplier-based quantities \{y_{2,j}(t) := \frac{1}{2}\sum_{j' \in N_j}(v_j^{j'}(t-1) - v_{j'}^{j}(t-1))\}_{j=1}^{J}. It turns out that
a convenient global state to describe the spatio-temporal dynamics of D-RLS in (13)-(16) is y(t) :=
[y_1^T(t) \; y_2^T(t)]^T = [y_{1,1}^T(t) \ldots y_{1,J}^T(t) \; y_{2,1}^T(t) \ldots y_{2,J}^T(t)]^T \in R^{2Jp}. In addition, to concisely capture the
effects of both observation and communication noise on the estimation errors across the WSN, define the
Jp × 1 noise supervectors \epsilon(t) := \sum_{\tau=0}^{t} \lambda^{t-\tau} [h_1^T(\tau)\epsilon_1(\tau) \ldots h_J^T(\tau)\epsilon_J(\tau)]^T and \bar{\eta}(t) := [\bar{\eta}_1^T(t) \ldots \bar{\eta}_J^T(t)]^T.
Vectors \{\bar{\eta}_j(t)\}_{j=1}^{J} represent the aggregate noise corrupting the multipliers received by sensor j at time
instant t, and are given by

\bar{\eta}_j(t) := \frac{1}{2} \sum_{j' \in N_j} \bar{\eta}_j^{j'}(t).    (21)
Their respective covariance matrices are easily computable under (a2)-(a3). For instance,

R_\epsilon(t) := E[\epsilon(t)\epsilon^T(t)] = \frac{1 - \lambda^{2(t+1)}}{1 - \lambda^2} \, bdiag(R_{h_1}\sigma_{\epsilon_1}^2, \ldots, R_{h_J}\sigma_{\epsilon_J}^2)    (22)

while the structure of R_{\bar{\eta}} := E[\bar{\eta}(t)\bar{\eta}^T(t)] is given in Appendix E. Two additional Jp × 1 communication noise supervectors are needed, namely \eta^\alpha(t) := [(\eta_1^\alpha(t))^T \ldots (\eta_J^\alpha(t))^T]^T and \eta^\beta(t) := [(\eta_1^\beta(t))^T \ldots (\eta_J^\beta(t))^T]^T, where for j ∈ J

\eta_j^\alpha(t) := \frac{c}{4} \sum_{j' \in N_j} \eta_j^{j'}(t), \quad \eta_j^\beta(t) := \frac{c}{4} \sum_{j' \in N_j} \eta_{j'}^{j}(t).    (23)
Finally, let L_c := (c/2)L ⊗ I_p ∈ R^{Jp×Jp} be a matrix capturing the WSN connectivity pattern through the (scaled)
graph Laplacian matrix L, and define R_h^{-1} := bdiag(R_{h_1}^{-1}, \ldots, R_{h_J}^{-1}). Based on these definitions, it is
possible to state the following lemma (see Appendix A for a proof).
Lemma 1: Let (a1) and (a2) hold. Then for t ≥ t_0 with t_0 sufficiently large, while 0 ≪ λ < 1, the global
state y(t) approximately evolves according to

y(t+1) = bdiag\left((1-\lambda)R_h^{-1}, I_{Jp}\right) \left\{ \Upsilon y(t) + \begin{bmatrix} I_{Jp} \\ 0_{Jp \times Jp} \end{bmatrix} \epsilon(t+1) + \begin{bmatrix} I_{Jp} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t) + \begin{bmatrix} I_{Jp} \\ -I_{Jp} \end{bmatrix} \eta^\alpha(t) - \begin{bmatrix} I_{Jp} \\ -I_{Jp} \end{bmatrix} \eta^\beta(t) \right\}    (24)

where the 2Jp × 2Jp matrix Υ consists of the Jp × Jp blocks [Υ]_{11} = -[Υ]_{21} = -L_c and [Υ]_{12} =
-[Υ]_{22} = -I_{Jp}. The initial condition y(t_0) should be selected as y(t_0) = bdiag(I_{Jp}, L_c) y'(t_0), where
y'(t_0) is any vector in R^{2Jp}.
The convenience of representing y(t) as in Lemma 1 will become apparent in the sequel, especially
when investigating sufficient conditions under which the D-RLS algorithm is stable in the mean sense
(Section IV-A). In addition, the covariance matrix of the state vector y(t) can be shown to encompass all
the information needed to evaluate the relevant per sensor and networkwide performance figures of merit,
the subject dealt with next.
C. Performance Metrics

When assessing the performance of adaptive filters, it is customary to consider the mean-square error (MSE), the excess mean-square error (EMSE), and the mean-square deviation (MSD) as figures of merit. In the present distributed setting, both local (per sensor) and global (network-wide) metrics are of interest. With the local a priori error defined as e_j(t) := x_j(t) - h_j^T(t) s_j(t-1), the per sensor metrics are MSE_j(t) := E[e_j^2(t)], EMSE_j(t) := E[(h_j^T(t) y_{1,j}(t-1))^2], and MSD_j(t) := E[\|y_{1,j}(t)\|^2], whereas their global counterparts are defined as the respective averages across sensors, e.g., MSE(t) := J^{-1} \sum_{j=1}^{J} E[e_j^2(t)], and so on.
Next, it is shown that it suffices to evaluate the state covariance matrix Ry (t) := E[y(t)yT (t)] in order
to assess the aforementioned performance metrics. To this end, note that by virtue of (a1) it is possible to
write ej (t) = −hTj (t)y1,j (t − 1) + ǫj (t). Because y1,j (t − 1) is independent of the zero-mean {hj (t), ǫj (t)}
under (a1)-(a3), from the previous relationship between the a priori and estimation errors one finds that
MSE_j(t) = EMSE_j(t) + \sigma_{\epsilon_j}^2. Hence, it suffices to focus on the evaluation of EMSE_j(t), through which
MSE_j(t) can also be determined under the assumption that the observation noise variances are known, or
can be estimated for that matter. If R_{y_{1,j}}(t) := E[y_{1,j}(t) y_{1,j}^T(t)] denotes the j-th local error covariance
matrix, then MSD_j(t) = tr(R_{y_{1,j}}(t)); and under (a1)-(a3), a simple manipulation yields EMSE_j(t) = tr(R_{h_j} R_{y_{1,j}}(t-1)).
To derive corresponding formulas for the global performance figures of merit, let Ry1 (t) := E[y1 (t)y1T (t)]
denote the global error covariance matrix, and define Rh := E[Rh (t)] = bdiag(Rh1 , . . . , RhJ ). It follows
that MSD(t) = J −1 tr(Ry1 (t)), and EMSE(t) = J −1 tr(Rh Ry1 (t − 1)).
TABLE I
EVALUATION OF LOCAL AND GLOBAL FIGURES OF MERIT FROM R_y(t)

          MSD(t)                       EMSE(t)                             MSE(t)
Local     tr([R_y(t)]_{11,j})          tr(R_{h_j}[R_y(t-1)]_{11,j})        tr(R_{h_j}[R_y(t-1)]_{11,j}) + \sigma_{\epsilon_j}^2
Global    J^{-1} tr([R_y(t)]_{11})     J^{-1} tr(R_h [R_y(t-1)]_{11})      J^{-1} tr(R_h [R_y(t-1)]_{11}) + J^{-1} \sum_{j=1}^{J} \sigma_{\epsilon_j}^2
It is now straightforward to recognize that Ry (t) indeed provides all the information needed to evaluate
the performance of the D-RLS algorithm. For instance, observe that the global error covariance matrix
Ry1 (t) corresponds to the Jp × Jp upper left submatrix of Ry (t), which is denoted by [Ry (t)]11 . Further,
the j -th p × p diagonal submatrix (j = 1, . . . , J ) of [Ry (t)]11 is exactly Ry1,j (t), and is likewise denoted
by [Ry (t)]11,j . For clarity, the aforementioned notational conventions regarding submatrices within Ry (t)
are illustrated in Fig. 1. In a nutshell, deriving a closed-form expression for Ry (t) enables the evaluation
of all performance metrics of interest, as summarized in Table I. This task will be considered in Section
IV-B.
Remark 1 Since the ‘average’ system representation of y(t) in (24) relies on an approximation that
becomes increasingly accurate as λ → 1 and t → ∞, so does the covariance recursion for Ry (t) derived
in Section IV-B. For this reason, the scope of the MSE performance analysis of this paper pertains to the
steady-state behavior of the D-RLS algorithm.
IV. STABILITY AND STEADY-STATE PERFORMANCE ANALYSIS

In this section, stability and steady-state performance analyses are conducted for the D-RLS algorithm
developed in Section II-A. Because recursions (13)-(16) are stochastic in nature, stability will be assessed
both in the mean- and in the MSE-sense. The techniques presented here can be utilized with minimal
modifications to derive analogous results for the AD-MoM-based D-RLS algorithm in [15].
A. Mean Stability
Based on Lemma 1, it follows that D-RLS achieves consensus in the mean sense on the parameter s0 .
Proposition 2: Under (a1)-(a3) and for 0 ≪ λ < 1, D-RLS achieves consensus in the mean, i.e., \lim_{t \to \infty} E[s_j(t)] = s_0 for all j ∈ J, provided c > 0 is chosen to satisfy

c < \frac{4}{(1-\lambda)\,\lambda_{max}\left(R_h^{-1}(L \otimes I_p)\right)}.    (25)
Proof: Based on (a1)-(a3) and since the data is zero-mean, one obtains after taking expectations in
(24) that E[y(t)] = bdiag((1-\lambda)R_h^{-1}, I_{Jp}) \Upsilon E[y(t-1)]. The following lemma characterizes the spectrum
of the matrix Ω := bdiag((1-\lambda)R_h^{-1}, I_{Jp})\Upsilon; its proof is given in Appendix B.

Lemma 2: The 2Jp × 2Jp matrix Ω := bdiag((1-\lambda)R_h^{-1}, I_{Jp})\Upsilon has p eigenvalues equal to one. Further, the left eigenvectors associated with the unity eigenvalue have the
structure v_i^T = [0_{1 \times Jp} \; q_i^T], where q_i ∈ nullspace(L_c) and i = 1, \ldots, p. The remaining eigenvalues are
equal to zero, or else have modulus strictly smaller than one provided c satisfies the bound (25).
Back to establishing the mean stability result, let \{u_i\} and \{v_i^T\} respectively denote the collections of p
right and left eigenvectors of Ω associated with the eigenvalue one. By virtue of Lemma 2 and provided
c satisfies the bound (25), one has that \lim_{t \to \infty} \Omega^t = \sum_{i=1}^{p} u_i v_i^T; hence,

\lim_{t \to \infty} E[y(t)] = \left( \sum_{i=1}^{p} u_i v_i^T \right) y(t_0) = \left( \sum_{i=1}^{p} u_i v_i^T \right) bdiag(I_{Jp}, L_c) y'(t_0) = \sum_{i=1}^{p} u_i \left[ 0_{1 \times Jp} \; q_i^T L_c \right] y'(t_0) = 0_{2Jp}.

In obtaining the second equality, the structure of y(t_0) given in Lemma 1 was used. The last
equality follows from the fact that q_i ∈ nullspace(L_c) as per Lemma 2, thus completing the proof.
Before wrapping up this section, a comment is due on the sufficient condition (25). When performing
distributed estimation under 0 ≪ λ < 1, the condition is actually not restrictive at all since a 1 − λ
factor is present in the denominator. When λ is close to one, any practical choice of c > 0 will result in
asymptotically unbiased sensor estimates. Also note that the bound depends on the WSN topology, through
the scaled graph Laplacian matrix Lc .
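The condition is also straightforward to check numerically for a given WSN. The sketch below (an assumed ring topology with R_{h_j} = I_p for simplicity) forms Ω = bdiag((1−λ)R_h^{-1}, I_{Jp})Υ and inspects its spectrum, confirming the p unit eigenvalues of Lemma 2 and that the remaining moduli stay below one when c satisfies (25).

```python
# Spectrum check for the mean recursion E[y(t)] = Omega E[y(t-1)].
import numpy as np

J, p, lam, c = 6, 2, 0.95, 0.5
A = np.zeros((J, J))                         # ring graph: always connected
for j in range(J):
    A[j, (j + 1) % J] = A[(j + 1) % J, j] = 1.0
Lap = np.diag(A.sum(axis=1)) - A             # graph Laplacian L
Lc = 0.5 * c * np.kron(Lap, np.eye(p))       # L_c = (c/2) L kron I_p

Rh_inv = np.eye(J * p)                       # assume R_hj = I_p for simplicity
I = np.eye(J * p)
Upsilon = np.block([[-Lc, -I], [Lc, I]])
Omega = np.block([[(1 - lam) * Rh_inv, np.zeros((J * p, J * p))],
                  [np.zeros((J * p, J * p)), I]]) @ Upsilon

mods = np.sort(np.abs(np.linalg.eigvals(Omega)))[::-1]
print(mods[:p])    # p eigenvalues at one (consensus subspace)
print(mods[p])     # largest remaining modulus: < 1 when (25) holds
```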
B. MSE Stability and Steady-State Performance

In order to assess the steady-state MSE performance of the D-RLS algorithm, we will evaluate the figures
of merit introduced in Section III-C. The limiting values of both the local (per sensor) and global (network-
wide) MSE, excess mean-square error (EMSE), and mean-square deviation (MSD), will be assessed. To
this end, it suffices to derive a closed-form expression for the global estimation error covariance matrix
Ry1 (t) := E[y1 (t)y1T (t)], as already argued in Section III-C.
The next result provides an equivalent representation of the approximate D-RLS global recursion (24),
that is more suitable for the recursive evaluation of R_{y_1}(t). First, introduce the p(\sum_{j=1}^{J} |N_j|) × 1 vector

\eta(t) := \left[ \{(\eta_1^{j'}(t))^T\}_{j' \in N_1} \ldots \{(\eta_J^{j'}(t))^T\}_{j' \in N_J} \right]^T    (26)

which comprises the receiver noise terms corrupting transmissions of local estimates across the whole
network at time instant t, and define R_\eta := E[\eta(t)\eta^T(t)]. For notational convenience, let R_{h,\lambda}^{-1} := (1-\lambda) R_h^{-1}.
Lemma 3: Under the assumptions of Lemma 1, the global state y(t) in (24) can be equivalently written as

y(t+1) = bdiag(I_{Jp}, L_c)\, z(t+1) + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t) + \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ P_\beta - P_\alpha \end{bmatrix} \eta(t).    (27)

The inner state z(t) := [z_1^T(t) \; z_2^T(t)]^T is arbitrarily initialized at time t_0, and updated according to

z(t+1) = \Psi z(t) + \Psi \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix} \eta(t-1) + \Psi \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t-1) + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \epsilon(t+1)    (28)

where the 2Jp × 2Jp transition matrix Ψ consists of the blocks [Ψ]_{11} = [Ψ]_{12} = -R_{h,\lambda}^{-1} L_c and [Ψ]_{21} =
[Ψ]_{22} = L_c L_c^\dagger. Matrix C is chosen such that L_c C = P_\beta - P_\alpha, where the structure of the time-invariant
matrices P_\alpha and P_\beta is given in Appendix E.
Proof: See Appendix C.
The desired state y(t) is obtained as a rank-deficient linear transformation of the inner state z(t), plus a
stochastic offset due to the presence of communication noise. A linear, time-invariant, first-order difference
equation describes the dynamics of z(t), and hence of y(t), via the algebraic transformation in (27).
The time-invariant nature of the transition matrix Ψ is due to the approximations Φ−1 −1
j (t) ≈ Rh,λ , j ∈ J ,
particularly accurate for large enough t > t0 . Examination of (28) reveals that the evolution of z(t) is driven
by three stochastic input processes: i) communication noise η(t − 1) affecting the transmission of local
estimates; ii) communication noise η̄(t − 1) contaminating the Lagrange multipliers; and iii) observation
noise within ǫ(t + 1).
Focusing now on the calculation of R_{y_1}(t) = [R_y(t)]_{11} based on Lemma 3, observe from the upper
Jp × 1 block of y(t+1) in (27) that y_1(t+1) = z_1(t+1) + R_{h,\lambda}^{-1}[\bar{\eta}(t) + (P_\alpha - P_\beta)\eta(t)]. Under (a3),
z_1(t+1) is uncorrelated with the zero-mean noise vectors \bar{\eta}(t) and \eta(t), so that

R_{y_1}(t+1) = R_{z_1}(t+1) + R_{h,\lambda}^{-1} \left[ R_{\bar{\eta}} + (P_\alpha - P_\beta) R_\eta (P_\alpha - P_\beta)^T \right] R_{h,\lambda}^{-1}    (29)

which prompts one to obtain R_z(t) := E[z(t)z^T(t)]. Specifically, the goal is to extract its upper-left Jp × Jp
matrix block [R_z(t)]_{11} = R_{z_1}(t). To this end, define the vectors
\bar{\eta}_\lambda(t) := \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t), \quad \eta_\lambda(t) := \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix} \eta(t)    (30)
whose respective covariance matrices Rη̄ λ := E[η̄ λ (t)η̄ Tλ (t)] and Rη λ := E[η λ (t)η Tλ (t)] have a structure
detailed in Appendix E. Also recall that ǫ(t) depends on the entire history of regressors up to time instant t.
Starting from (28) and capitalizing on (a2)-(a3), it is straightforward to obtain a first-order matrix recursion
to update R_z(t) as

R_z(t) = \Psi R_z(t-1) \Psi^T + \Psi R_{\bar{\eta}_\lambda} \Psi^T + \Psi R_{\eta_\lambda} \Psi^T + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} R_\epsilon(t) \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix}^T + \Psi R_{z\epsilon}(t) \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix}^T + \left( \Psi R_{z\epsilon}(t) \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix}^T \right)^T    (31)

or, compactly,

R_z(t) = \Psi R_z(t-1) \Psi^T + R_\nu(t)    (32)
where the cross-correlation matrix R_{z\epsilon}(t) := E[z(t-1)\epsilon^T(t)] is recursively updated as (cf. Appendix D)

R_{z\epsilon}(t) = \lambda \Psi R_{z\epsilon}(t-1) + \lambda \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} R_\epsilon(t-1).    (33)
Here, R_\nu(t) in (32) collects all the covariance forcing terms on the
right-hand side of (31). The main result of this section pertains to MSE stability of the D-RLS algorithm,
and provides a checkable sufficient condition under which the global error covariance matrix Ry1 (t) has
bounded entries as t → ∞. Recall that a matrix is termed stable, when all its eigenvalues lie strictly inside
the unit circle.
Proposition 3: Under (a1)-(a3) and for 0 ≪ λ < 1, D-RLS is MSE stable, i.e., limt→∞ Ry1 (t) has bounded
entries, provided that c > 0 is chosen so that Ψ is a stable matrix.
Proof: First observe that because λ ∈ (0, 1), it holds that

\lim_{t \to \infty} R_\epsilon(t) = \lim_{t \to \infty} \frac{1 - \lambda^{2(t+1)}}{1 - \lambda^2} \, bdiag(R_{h_1}\sigma_{\epsilon_1}^2, \ldots, R_{h_J}\sigma_{\epsilon_J}^2) = \frac{1}{1 - \lambda^2} \, bdiag(R_{h_1}\sigma_{\epsilon_1}^2, \ldots, R_{h_J}\sigma_{\epsilon_J}^2) =: R_\epsilon(\infty).    (34)

If c > 0 is selected such that Ψ is a stable matrix, then clearly λΨ is also stable, and hence the matrix
recursion (33) converges to the bounded limit

R_{z\epsilon}(\infty) = \lambda (I_{2Jp} - \lambda\Psi)^{-1} \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} R_\epsilon(\infty).    (35)
Based on the previous arguments, it follows that the forcing matrix Rν (t) in (31) will also attain a bounded
limit as t → ∞, denoted as Rν (∞). Next, we show that limt→∞ Rz (t) has bounded entries by studying
its equivalent vectorized dynamical system. Upon vectorizing (32), it follows that

vec[R_z(t)] = vec[\Psi R_z(t-1) \Psi^T] + vec[R_\nu(t)] = (\Psi \otimes \Psi)\, vec[R_z(t-1)] + vec[R_\nu(t)]

where in obtaining the last equality we used the property vec[RST] = (T^T \otimes R)\, vec[S]. Because the
eigenvalues of Ψ ⊗ Ψ are the pairwise products of those of Ψ, stability of Ψ implies stability of the
Kronecker product. As a result, the vectorized recursion will converge to the limit

vec[R_z(\infty)] = \left( I_{(2Jp)^2} - \Psi \otimes \Psi \right)^{-1} vec[R_\nu(\infty)]    (36)
which of course implies that limt→∞ Rz (t) = Rz (∞) has bounded entries. From (29), the same holds true
for Ry1 (t), and the proof is completed.
Proposition 3 asserts that the AMA-based D-RLS algorithm is stable in the MSE-sense, even when the
WSN links are challenged by additive noise. While most distributed adaptive estimation works have only
looked at ideal inter-sensor links, others have adopted diminishing step-sizes to mitigate the undesirable
effects of communication noise [10], [11]. This approach however, limits their applicability to stationary
environments. Remarkably, the AMA-based D-RLS algorithm exhibits robustness to noise when using a
constant step-size c, a feature that has also been observed for AD-MoM related distributed iterations in
e.g., [26], [27], and [15].
As a byproduct, the proof of Proposition 3 also provides part of the recipe towards evaluating the
steady-state MSE performance of the D-RLS algorithm. Indeed, by plugging (34) and (35) into (31) one
obtains the steady-state covariance matrix Rν (∞). It is then possible to evaluate Rz (∞), by reshaping
the vectorized identity (36). Matrix Rz1 (∞) can be extracted from the upper-left Jp × Jp matrix block of
Rz (∞), and the desired global error covariance matrix Ry1 (∞) = [Ry (∞)]11 becomes available via (29).
Closed-form evaluation of the MSE(∞), EMSE(∞) and MSD(∞) for every sensor j ∈ J is now possible
given Ry1 (∞), by resorting to the formulae in Table I.
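This recipe translates directly into a few lines of linear algebra. The sketch below (with assumed inputs; it omits the communication-noise correction of (29) for brevity) solves the vectorized fixed point (36) and reads off the Table I metrics from the upper-left block of R_z(∞); it is practical for small networks, since the Kronecker system has dimension (2Jp)^2.

```python
# Steady-state metrics from the fixed point (36); inputs are assumed given.
import numpy as np

def steady_state_metrics(Psi, Rnu_inf, Rh_blocks, sigma_eps2, J, p):
    """Psi: 2Jp x 2Jp transition matrix; Rnu_inf: limiting forcing covariance
    from (31); Rh_blocks[j] = R_hj; sigma_eps2[j] = observation-noise variance.
    Returns per-sensor (MSD, EMSE, MSE); the correction of (29) that accounts
    for communication noise is omitted in this sketch."""
    n = Psi.shape[0]                                      # n = 2*J*p
    vec_Rz = np.linalg.solve(np.eye(n * n) - np.kron(Psi, Psi),
                             Rnu_inf.reshape(-1, order='F'))   # cf. (36)
    Rz = vec_Rz.reshape(n, n, order='F')
    Ry1 = Rz[:J * p, :J * p]                              # [Rz(inf)]_11 ~ Ry1
    metrics = []
    for j in range(J):
        blk = Ry1[j * p:(j + 1) * p, j * p:(j + 1) * p]   # [Ry]_{11,j}
        msd = np.trace(blk)
        emse = np.trace(Rh_blocks[j] @ blk)
        metrics.append((msd, emse, emse + sigma_eps2[j])) # Table I rows
    return metrics
```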
Before closing this section, an alternative notion of stochastic stability that readily follows from Proposition 3 is established here. Specifically, it is possible to show that under the independence setting assumptions
(a1)-(a3) considered so far, the global error norm ‖y_1(t)‖ remains most of the time within a finite interval,
i.e., errors are weakly stochastic bounded (WSB) [28], [29, pg. 110]. This WSB stability guarantees that
for any θ > 0, there exists a ζ > 0 such that Pr[‖y_1(t)‖ < ζ] ≥ 1 − θ uniformly in time.
Corollary 1: Under (a1)-(a3) and for 0 ≪ λ < 1, if c > 0 is chosen so that Ψ is a stable matrix, then the
D-RLS algorithm yields estimation errors which are WSB, i.e., \lim_{\zeta \to \infty} \sup_{t \ge t_0} Pr[\|y_1(t)\| \ge \zeta] = 0.
Proof: Chebyshev's inequality implies that

Pr[\|y_1(t)\| \ge \zeta] \le \frac{E[\|y_1(t)\|^2]}{\zeta^2} = \frac{tr([R_y(t)]_{11})}{\zeta^2}.    (37)
From Proposition 3, limt→∞ [Ry (t)]11 has bounded entries, implying that supt≥t0 tr([Ry (t)]11 ) < ∞. Taking
the limit as ζ → ∞, while relying on the bound in (37) which holds for all values of t ≥ t0 , yields the
desired result.
In words, Corollary 1 ensures that with overwhelming probability, local sensor estimates remain inside a
ball with finite radius, centered at s0 . It is certainly a weak notion of stability, many times the only one that
can be asserted when the presence of, e.g., time-correlated data, renders variance calculations impossible;
see also [26], [28]. In this case where stronger assumptions are invoked, WSB follows immediately once
MSE-sense stability is established. Nevertheless, it is an important practical notion as it ensures – on a
per-realization basis – that estimation errors have no probability mass escaping to infinity. In particular,
D-RLS estimation errors are shown WSB in the presence of communication noise; a property not enjoyed
by other distributed iterations for e.g., consenting on averages [34].
V. NUMERICAL TESTS
Computer simulations are carried out here to corroborate the analytical results of Section IV-B. Even
though based on simplifying assumptions and approximations, the usefulness of the analysis is justified
since the predicted steady-state MSE figures of merit accurately match the empirical D-RLS limiting
values. In accordance with the adaptive filtering folklore, when λ → 1 the upshot of the analysis under the
independence setting assumptions is shown to extend accurately to the pragmatic scenario whereby sensors
acquire time-correlated data. For J = 15 sensors, a connected ad hoc WSN is generated as a realization of
the random geometric graph model on the unit-square, with communication range r = 0.3 [9]. To model
non-ideal inter-sensor links, additive white Gaussian noise (AWGN) with variance ση2 = 10−1 is added at
the receiving end. The WSN used for the experiments is depicted in Fig. 2.
With p = 4 and s_0 = 1_p, observations obey a linear model [cf. (a1)] with sensing WGN of spatial
variance profile \sigma_{\epsilon_j}^2 = 10^{-3}\alpha_j, where the \alpha_j \sim U[0, 1] (uniform distribution) are i.i.d.. The regression vectors
h_j(t) := [h_j(t) \ldots h_j(t-p+1)]^T have a shift structure, and entries which evolve according to the first-order
stable autoregressive processes h_j(t) = (1-\rho)\beta_j h_j(t-1) + \sqrt{\rho}\,\omega_j(t) for all j ∈ J. We choose
ρ = 5 × 10^{-1}, the \beta_j \sim U[0, 1] i.i.d. in space, and the driving white noise \omega_j(t) \sim U[-\sqrt{3}\sigma_{\omega_j}, \sqrt{3}\sigma_{\omega_j}]
with spatial variance profile given by \sigma_{\omega_j}^2 = 2\gamma_j, with the \gamma_j \sim U[0, 1] i.i.d.. Observe that the data is
temporally correlated, implying that (a2) does not hold here.
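For reproducibility, the data generation just described can be sketched as follows (a best-effort reading of the setup; unspecified details such as initial conditions and the horizon T are assumptions).

```python
# Synthetic data per Section V: AR(1)-driven, shift-structured regressors.
import numpy as np

rng = np.random.default_rng(5)
J, p, T, rho = 15, 4, 3000, 0.5
s0 = np.ones(p)
alpha, beta, gamma = rng.random(J), rng.random(J), rng.random(J)
sig_eps = np.sqrt(1e-3 * alpha)              # sensing-noise std per sensor
sig_w = np.sqrt(2.0 * gamma)                 # driving-noise std per sensor

h = np.zeros((J, T))                         # scalar AR(1) entries h_j(t)
for t in range(1, T):
    w = rng.uniform(-np.sqrt(3.0) * sig_w, np.sqrt(3.0) * sig_w)
    h[:, t] = (1 - rho) * beta * h[:, t - 1] + np.sqrt(rho) * w

# shift-structured regressors h_j(t) = [h_j(t), ..., h_j(t-p+1)]^T
H = np.stack([[h[j, t - p + 1:t + 1][::-1] for t in range(p - 1, T)]
              for j in range(J)])            # shape (J, T-p+1, p)
x = np.einsum('jtq,q->jt', H, s0)            # noiseless observations
x += sig_eps[:, None] * rng.standard_normal(x.shape)
```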
For all experimental performance curves obtained by running the algorithms, the ensemble averages are
approximated by sample averaging 200 runs of the experiment.
First, with λ = 0.95, c = 0.1 and δ = 100 for the AMA-based D-RLS algorithm, Fig. 3 depicts the
network performance through the evolution of the EMSE(t) and MSD(t) figures of merit. Both noisy
and ideal links are considered. The steady-state limiting values found in Section IV-B are extremely
accurate, even though the simulated data does not adhere to (a2), and the results are based on simplifying
approximations. As intuitively expected and analytically corroborated via the noise-related additive terms
in (29) and (31), the performance penalty due to non-ideal links is also apparent.
We also utilize the analytical results developed throughout this paper to contrast the per sensor perfor-
mance of D-RLS and the D-LMS algorithm in [16]. In particular, the parameters chosen for D-LMS are
µ = 5 × 10−3 and c = 1. Fig. 4 shows the values of the EMSEj (∞) and MSDj (∞) for all j ∈ J . As
expected, the second-order D-RLS scheme attains improved steady-state performance uniformly across all
sensors in the simulated WSN. In this particular simulated test, gains as high as 5dB in estimation error can
be achieved at the price of increasing computational burden per sensor, from O(p) to O(p2 ) per iteration.
VI. CONCLUDING REMARKS

A distributed RLS-like algorithm is developed in this paper, which is capable of performing adaptive
estimation and tracking using WSNs in which sensors cooperate with single-hop neighbors. The WSNs
considered here are quite general since they do not necessarily possess a Hamiltonian cycle, while the inter-
sensor links are challenged by communication noise. Distributed iterations are derived after: i) reformulating
in a separable way the exponentially weighed least-squares (EWLS) cost involved in the classical RLS
algorithm; and ii) applying the AMA to minimize this separable cost in a distributed fashion. The AMA
is especially well-suited to capitalize on the strict convexity of the EWLS cost, and thus offer significant
reductions in computational complexity per sensor, when compared to existing alternatives. This way, salient
features of the classical RLS algorithm are shown to carry over to a distributed WSN setting, namely
reduced-complexity estimation when a state and/or data model is not available and fast convergence rates
are at a premium.
An additional contribution of this paper pertains to a detailed steady-state MSE performance analysis,
that relies on an ‘averaged’ error-form system representation of D-RLS. The theory is developed under
some simplifying approximations, and resorting to the independence setting assumptions. This way, it is
possible to obtain accurate closed-form expressions for both the per sensor and network-wide relevant
performance metrics as t → ∞. Sufficient conditions under which the D-RLS algorithm is stable in the
mean- and MSE-sense are provided as well. As a corollary, the D-RLS estimation errors are also shown
to remain within a finite interval with high probability, even when the inter-sensor links are challenged
by additive noise. Numerical simulations demonstrated that the analytical findings of this paper extend
accurately to a more realistic WSN setting, whereby sensors acquire temporally correlated sensor data.
Regarding the performance of the D-RLS algorithm, there are still several interesting directions to pursue
as future work. First, it would be nice to establish a stochastic trajectory locking result which formally
shows that as λ → 1, the D-RLS estimation error trajectories closely follow the ones of its time-invariant
‘averaged’ system companion. Second, the steady-state MSE performance analysis was carried out when
0 ≪ λ < 1. For the infinite memory case in which λ = 1, numerical simulations indicate that D-RLS
provides mean-square sense-consistent estimates, even in the presence of communication noise. By formally
establishing this property, D-RLS becomes an even more appealing alternative for distributed parameter
estimation in stationary environments. While the approximations used in this paper are no longer valid
when λ = 1, for Gaussian i.i.d. regressors the matrix Φ^{-1}(t) is inverse Wishart distributed with known moments.
Under these assumptions, consistency analysis is a subject of ongoing investigation.
APPENDIX

A. Proof of Lemma 1: After summing (v_j^{j'}(t) - v_{j'}^{j}(t))/2 over j' ∈ N_j, it follows from (38) that for all j ∈ J

y_{2,j}(t+1) := \frac{1}{2} \sum_{j' \in N_j} (v_j^{j'}(t) - v_{j'}^{j}(t)) = y_{2,j}(t) + \frac{c}{2} \sum_{j' \in N_j} (s_j(t) - s_{j'}(t)) - \frac{c}{4} \sum_{j' \in N_j} (\eta_j^{j'}(t) - \eta_{j'}^{j}(t))    (40)

= y_{2,j}(t) + \frac{c}{2} \sum_{j' \in N_j} (y_{1,j}(t) - y_{1,j'}(t)) - \eta_j^\alpha(t) + \eta_j^\beta(t),    (41)
where the last equality was obtained after adding and subtracting c|N_j|s_0 from the right-hand side of (40),
and relying on the definitions in (23). Next, starting from (39) and upon: i) using (a1) and the approximation (19) to eliminate x_j(τ) via

\psi_j(t+1) = \sum_{\tau=0}^{t+1} \lambda^{t+1-\tau} h_j(\tau) h_j^T(\tau) s_0 + \sum_{\tau=0}^{t+1} \lambda^{t+1-\tau} h_j(\tau) \epsilon_j(\tau) \approx \frac{R_{h_j}}{1-\lambda} s_0 + \sum_{\tau=0}^{t+1} \lambda^{t+1-\tau} h_j(\tau) \epsilon_j(\tau)

from (39); ii) recognizing y_{2,j}(t+1) in the right-hand side of (39) and substituting it with (41); and iii)
replacing the sums of noise vectors with the quantities defined in (21) and (23); one arrives at

y_{1,j}(t+1) = (1-\lambda) R_{h_j}^{-1} \left[ -\frac{c}{2} \sum_{j' \in N_j} (y_{1,j}(t) - y_{1,j'}(t)) - y_{2,j}(t) \right] + (1-\lambda) R_{h_j}^{-1} \left[ \sum_{\tau=0}^{t+1} \lambda^{t+1-\tau} h_j(\tau) \epsilon_j(\tau) + \eta_j^\alpha(t) - \eta_j^\beta(t) + \bar{\eta}_j(t) \right].    (42)
What remains to be shown is that after stacking the recursions (42) and (41) for j = 1, \ldots, J to form
the one for y(t+1), we can obtain the compact representation in (24). Examining (41) and (42), it is
apparent that a common matrix factor bdiag((1-\lambda)R_h^{-1}, I_{Jp}) can be pulled out to simplify the expression
for y(t+1). Consider first the forcing terms in (24). Stacking the channel noise terms from (42) and
(41) readily yields the last three terms inside the curly brackets in (24). Likewise, stacking the terms
\sum_{\tau=0}^{t+1} \lambda^{t+1-\tau} h_j(\tau)\epsilon_j(\tau) for j = 1, \ldots, J yields the second term due to the observation noise; recall the
definition of \epsilon(t+1). This term as well as the vectors \bar{\eta}_j(t) are not present in (41), which explains the
zero vector at the lower part of the second and third terms inside the curly brackets of (24).
To specify the structure of the transition matrix Υ, note that the first term on the right-hand side of (41)
explains why [Υ]_{22} = I_{Jp}. Similarly, the second term inside the first square brackets in (42) explains why
[Υ]_{12} = -I_{Jp}. Next, it follows readily that upon stacking the terms (c/2)\sum_{j' \in N_j}(y_{1,j}(t) - y_{1,j'}(t)), which
correspond to a scaled Laplacian-based combination of p × 1 vectors, one obtains [(c/2)L \otimes I_p]\, y_1(t) =
L_c y_1(t). This justifies why [Υ]_{11} = -[Υ]_{21} = -L_c.
A comment is due regarding the initialization for t = t0 . Although the vectors {y1,j (t0 )}Jj=1 are
decoupled so that y1 (t0 ) can be chosen arbitrarily, this is not the case for {y2,j (t0 )}Jj=1 which are coupled
and satisfy
\sum_{j=1}^{J} y_{2,j}(t) = \frac{1}{2} \sum_{j=1}^{J} \sum_{j' \in N_j} (v_j^{j'}(t-1) - v_{j'}^{j}(t-1)) = 0_p, \quad \forall t \ge 0.    (43)
The coupling across {y2,j (t)}Jj=1 dictates y2 (t0 ) to be chosen in compliance with (43), so that the system
(24) is equivalent to (38) and (39) for all t ≥ t0 . Let y2 (t0 ) = Lc y2′ (t0 ), where y2′ (t0 ) is any vector in RJp .
Then, it is not difficult to see that y_2(t_0) satisfies the conservation law (43). In conclusion, for arbitrary
y'(t_0) ∈ R^{2Jp} the recursion (24) should be initialized as y(t_0) = bdiag(I_{Jp}, L_c) y'(t_0), and the proof of
Lemma 1 is completed.
B. Proof of Lemma 2: Recall the structure of matrix Υ given in Lemma 1. A vector v_i^T := [v_{1,i}^T \; v_{2,i}^T]
with \{v_{j,i}\}_{j=1}^{2} ∈ R^{Jp \times 1} is a left eigenvector of Ω associated with the eigenvalue one, if and only if it solves
the following linear system of equations

-v_{1,i}^T (1-\lambda) R_h^{-1} L_c + v_{2,i}^T L_c = v_{1,i}^T
-v_{1,i}^T (1-\lambda) R_h^{-1} + v_{2,i}^T = v_{2,i}^T.

The second equation can only be satisfied for v_{1,i} = 0_{Jp}, and upon substituting this value in the first
equation one obtains that v_{2,i} ∈ nullspace(L_c) = nullspace(L ⊗ I_p) for all values of c > 0. Under the
assumption of a connected ad hoc WSN, nullspace(L) = span(1_J), and hence nullspace(L ⊗ I_p) is a
p-dimensional subspace.
Following steps similar to those in [27, Appendix H], it is possible to express the eigenvalues of Ω
that are different from one as the roots of a second-order polynomial. Such a polynomial does not have
an independent term, so that some eigenvalues are zero. With respect to the rest of the eigenvalues, it is
possible to show that their magnitude is upper bounded by \lambda_{max}(I_{Jp} - (1-\lambda)R_h^{-1}L_c). Hence, it is possible
to ensure that all of them have modulus strictly smaller than one, provided c is chosen to satisfy the bound (25).

C. Proof of Lemma 3: The proof proceeds by induction. Suppose that (by convention) there is no communication noise for t < t_0. Upon substituting z(t_0 + 1) into (27), we find

y(t_0+1) = bdiag(I_{Jp}, L_c)\, \Psi y'(t_0) + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} (\epsilon(t_0+1) + \bar{\eta}(t_0)) + \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ P_\beta - P_\alpha \end{bmatrix} \eta(t_0).    (44)
Note that: i) bdiag(I_{Jp}, L_c)\Psi = bdiag(R_{h,\lambda}^{-1}, I_{Jp})\,\Upsilon\, bdiag(I_{Jp}, L_c); ii) y(t_0) = bdiag(I_{Jp}, L_c) y'(t_0) for the system in
Lemma 1; and iii) \eta^\alpha(t) = P_\alpha \eta(t), while \eta^\beta(t) = P_\beta \eta(t) [cf. Appendix E]. Thus, the right-hand side of
(44) is equal to the right-hand side of (24) for t = t_0.
Suppose next that (27) and (28) hold true for y(t) and z(t), with t ≥ t_0. The same will be shown for
y(t+1) and z(t+1). To this end, replace y(t) with the right-hand side of (27) evaluated at time t, into
(24) to obtain

y(t+1) = bdiag(R_{h,\lambda}^{-1}, I_{Jp}) \left\{ \Upsilon\, bdiag(I_{Jp}, L_c) z(t) + \Upsilon \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t-1) + \Upsilon \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ P_\beta - P_\alpha \end{bmatrix} \eta(t-1) + \begin{bmatrix} I_{Jp} \\ 0_{Jp \times Jp} \end{bmatrix} \epsilon(t+1) + \begin{bmatrix} I_{Jp} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t) + \begin{bmatrix} I_{Jp} \\ -I_{Jp} \end{bmatrix} \eta^\alpha(t) - \begin{bmatrix} I_{Jp} \\ -I_{Jp} \end{bmatrix} \eta^\beta(t) \right\}

= bdiag(I_{Jp}, L_c) \left\{ \Psi z(t) + \Psi \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t-1) + \Psi \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix} \eta(t-1) \right\} + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \epsilon(t+1) + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} \bar{\eta}(t) + \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ P_\beta - P_\alpha \end{bmatrix} \eta(t)    (45)

where in obtaining the last equality in (45), the following were used: i) bdiag(I_{Jp}, L_c)\Psi = bdiag(R_{h,\lambda}^{-1}, I_{Jp})\,\Upsilon\, bdiag(I_{Jp}, L_c);
ii) the relationship between \eta^\alpha(t), \eta^\beta(t), and \eta(t) given in Appendix E; and iii) the existence of a matrix
C such that L_c C = P_\beta - P_\alpha. This made it possible to extract the common factor bdiag(I_{Jp}, L_c) and deduce
from (45) that y(t+1) is given by (27), while z(t+1) is provided by (28).
In order to complete the proof, one must show the existence of matrix C. To this end, via a simple
evaluation one can check that nullspace(Lc ) ⊆ nullspace(PTβ − PTα ), and since Lc is symmetric, one has
nullspace(Lc )⊥range(Lc ). As nullspace(PTβ − PTα )⊥range(Pβ − Pα ), it follows that range(Pβ − Pα ) ⊆
range(Lc ), which further implies that there exists C such that Lc C = Pβ − Pα .
D. Derivation of (33): First observe that the noise supervector \epsilon(t) obeys the first-order recursion

\epsilon(t) := \sum_{\tau=0}^{t} \lambda^{t-\tau} [h_1^T(\tau)\epsilon_1(\tau) \ldots h_J^T(\tau)\epsilon_J(\tau)]^T = \lambda \epsilon(t-1) + [h_1^T(t)\epsilon_1(t) \ldots h_J^T(t)\epsilon_J(t)]^T.    (46)

Because under (a3) the zero-mean \{\epsilon_j(t)\}_{j \in J} are independent of z(t-1) [cf. (28)], it follows readily that
R_{z\epsilon}(t) := E[z(t-1)\epsilon^T(t)] = \lambda E[z(t-1)\epsilon^T(t-1)]. Plugging in the expression for z(t-1) and carrying
out the expectation yields

E[z(t-1)\epsilon^T(t-1)] = \Psi E[z(t-2)\epsilon^T(t-1)] + \Psi \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix} E[\eta(t-3)\epsilon^T(t-1)] + \Psi \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} E[\bar{\eta}(t-3)\epsilon^T(t-1)] + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} E[\epsilon(t-1)\epsilon^T(t-1)]

= \Psi R_{z\epsilon}(t-1) + \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} R_\epsilon(t-1).    (47)

The second equality follows from the fact that the zero-mean communication noise vectors are independent
of \epsilon(t-1). Scaling (47) by λ yields the desired result.
E. Structure of matrices P_\alpha, P_\beta, R_{\bar{\eta}}, R_\eta, R_{\bar{\eta}_\lambda}, and R_{\eta_\lambda}: In order to relate the noise supervectors
\eta^\alpha(t) and \eta^\beta(t) with \eta(t) in (26), introduce two Jp × (\sum_{j=1}^{J} |N_j|)p matrices P_\alpha := [p_1 \ldots p_J]^T and
P_\beta := [p'_1 \ldots p'_J]^T. The (\sum_{j=1}^{J} |N_j|)p × p submatrices p_j, p'_j are given by p_j := [(p_{j,1})^T \ldots (p_{j,J})^T]^T.
Note also that the blocks R_{\eta_{j,j}} = 0_{p \times p} for all j ∈ J, since a sensor does not communicate with itself. In
both cases, the block diagonal structure of the covariance matrices is due to the spatial uncorrelatedness
of the noise vectors.

What is left to determine is the structure of R_{\bar{\eta}_\lambda} and R_{\eta_\lambda}. From (30) one readily obtains

R_{\bar{\eta}_\lambda} = \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix} R_{\bar{\eta}} \begin{bmatrix} R_{h,\lambda}^{-1} \\ 0_{Jp \times Jp} \end{bmatrix}^T, \quad R_{\eta_\lambda} = \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix} R_\eta \begin{bmatrix} R_{h,\lambda}^{-1}(P_\alpha - P_\beta) \\ C \end{bmatrix}^T.    (48)
R EFERENCES
[1] A. Bertrand, M. Moonen, and A. H. Sayed, “Diffusion bias-compensated RLS estimation over adaptive networks,” IEEE Trans.
Signal Process., vol. 59, 2011. [Online]. Available: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5975252
[2] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, 2nd ed. Athena-Scientific,
1999.
[3] F. S. Cattivelli, C. G. Lopes, and A. H. Sayed, “Diffusion recursive least-squares for distributed estimation over adaptive
networks,” IEEE Trans. Signal Process., vol. 56, pp. 1865–1877, May 2008.
[4] F. S. Cattivelli and A. H. Sayed, “Hierarchical diffusion algorithms for distributed estimation,” in Proc. of the Wrkshp. on
Statistical Signal Proc., Cardiff, Wales, Aug. 2009, pp. 537–540.
[5] ——, “Diffusion LMS strategies for distributed estimation,” IEEE Trans. Signal Process., vol. 58, pp. 1035–1048, Mar. 2010.
[6] ——, “Diffusion strategies for distributed Kalman filtering and smoothing,” IEEE Trans. Autom. Contr., vol. 55, pp. 2069–
2084, 2010.
[7] ——, “Analysis of spatial and incremental LMS processing for distributed estimation,” IEEE Trans. Signal Process., vol. 59,
pp. 1465–1480, Apr. 2011.
[8] S. Chouvardas, K. Slavakis, and S. Theodoridis, “Adaptive robust distributed learning in diffusion sensor networks,” IEEE
Trans. Signal Process., vol. 59, pp. 4692–4707, Oct. 2011.
[9] P. Gupta and P. R. Kumar, “The capacity of wireless networks,” IEEE Trans. on Information Theory, vol. 46, pp. 388–404,
Mar. 2000.
[10] Y. Hatano, A. K. Das, and M. Mesbahi, “Agreement in presence of noise: pseudogradients on random geometric networks,”
in Proc. of the 44th Conf. on Dec. and Contr., Seville, Spain, Dec. 2005, pp. 6382–6387.
[11] S. Kar and J. M. F. Moura, “Distributed consensus algorithms in sensor networks with imperfect communication: link failures
and channel noise,” IEEE Trans. Signal Process., vol. 57, pp. 355–369, Jan. 2009.
[12] L. Li, C. G. Lopes, J. Chambers, and A. H. Sayed, “Distributed estimation over an adaptive incremental network based on
the affine projection algorithm,” IEEE Trans. Signal Process., vol. 58, pp. 151–164, Jan. 2010.
[13] C. G. Lopes and A. H. Sayed, “Incremental adaptive strategies over distributed networks,” IEEE Trans. Signal Process.,
vol. 55, pp. 4064–4077, Aug. 2007.
[14] ——, “Diffusion least-mean squares over adaptive networks: Formulation and performance analysis,” IEEE Trans. Signal
Process., vol. 56, pp. 3122–3136, Jul. 2008.
[15] G. Mateos, I. D. Schizas, and G. B. Giannakis, “Distributed recursive least-squares for consensus-based in-network adaptive
estimation,” IEEE Trans. Signal Process., vol. 57, pp. 4583–4588, Nov. 2009.
[16] ——, “Performance analysis of the consensus-based distributed LMS algorithm,” EURASIP Journal on Advances in Signal
Processing, Dec. 2009, article ID 981030.
[17] A. Nedic and D. P. Bertsekas, “Incremental subgradient methods for nondifferentiable optimization,” SIAM Jounral on
Optimization, vol. 12, pp. 109–138, Jan. 2001.
[18] C. H. Papadimitriou, Computational Complexity. Addison-Wesley, 1993.
[19] M. G. Rabbat and R. D. Nowak, “Quantized incremental algorithms for distributed optimization,” IEEE Journal on Sel. Areas
In Comm., vol. 23, pp. 798–808, 2005.
[20] M. G. Rabbat, R. D. Nowak, and J. A. Bucklew, “Generalized consensus computation in networked systems with erasure
links,” in Proc. of the Wrkshp. on Signal Proc. Adv. in Wireless Communications, New York, NY, Jun. 2005, pp. 1088– 1092.
[21] S. S. Ram, A. Nedic, and V. V. Veeravalli, “Stochastic incremental gradient descent for estimation in sensor networks,” in
Proc. of 41st Asilomar Conf. on Signals, Systems, and Computers, Pacific Grove, CA, 2007, pp. 582–586.
[22] A. Ribeiro, I. D. Schizas, S. I. Roumeliotis, and G. B. Giannakis, “Kalman filtering in wireless sensor networks: Incorporating
communication cost in state estimation problems,” IEEE Control Syst. Mag., vol. 30, pp. 66–86, Apr. 2010.
[23] A. H. Sayed, Fundamentals of Adaptive Filtering. John Wiley & Sons, 2003.
[24] A. H. Sayed and C. G. Lopes, “Distributed recursive least-squares over adaptive networks,” in Proc. of 40th Asilomar Conf.
On Signals, Systems and Computers, Pacific Grove, CA, Oct./Nov. 2006, pp. 233–237.
[25] I. D. Schizas, G. Mateos, and G. B. Giannakis, “Consensus-based distributed recursive least-squares estimation using ad hoc
wireless sensor networks,” in Proc. of 41st Asilomar Conf. On Signals, Systems and Computers, Pacific Grove, CA, Nov.
2007, pp. 386–390.
[26] ——, “Distributed LMS for consensus-based in-network adaptive processing,” IEEE Trans. Signal Process., vol. 57, pp.
2365–2381, Jun. 2009.
[27] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, “Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation
of deterministic signals,” IEEE Trans. Signal Process., vol. 56, pp. 350–364, Jan. 2008.
[28] V. Solo, “The stability of LMS,” IEEE Trans. Signal Process., vol. 45, pp. 3017–3026, Dec. 1997.
[29] V. Solo and X. Kong, Adaptive Signal Processing Algorithms: Stability and Performance. Prentice Hall, 1995.
[30] N. Takahashi, I. Yamada, and A. H. Sayed, “Diffusion least-mean squares with adaptive combiners: Formulation and
performance analysis,” IEEE Trans. Signal Process., vol. 58, pp. 4795–4810, Sep. 2011.
[31] P. Tseng, “Applications of a splitting algorithm to decomposition in convex programming and variational inequalities,” SIAM
Journal on Control and Optimization, vol. 29, pp. 119–138, Jan. 1991.
[32] S.-Y. Tu and A. H. Sayed, “Mobile adaptive networks,” IEEE Sel. Topics Signal Process., vol. 5, pp. 649–664, Aug. 2011.
[33] J.-J. Xiao, A. Ribeiro, T. Luo, and G. B. Giannakis, “Distributed compression-estimation using wireless sensor networks,”
IEEE Signal Process. Mag., vol. 23, pp. 27–41, Jul. 2006.
[34] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Systems and Control Letters, vol. 53, pp. 65–78, Sep.
2004.
[35] N. R. Yousef and A. H. Sayed, “A unified approach to the steady-state and tracking analysis of adaptive filters,” IEEE Trans.
Signal Process., vol. 49, pp. 314–324, Feb. 2001.
Fig. 1. The covariance matrix Ry (t) and some of its inner submatrices that are relevant to the performance evaluation of the
D-RLS algorithm.
Fig. 2. An ad hoc WSN with J = 15 sensors, generated as a realization of the random geometric graph model on the unit
square, with communication range r = 0.3.
[Plot: evolution of the global EMSE.]
Fig. 3. Global steady-state performance evaluation. D-RLS is run with ideal links and when communication noise with variance
σ_η^2 = 10^{-1} is present.
[Plots: steady-state local EMSE (dB) and local MSD (dB) versus sensor index, for theoretical D-RLS and theoretical D-LMS [16].]
Fig. 4. Local steady-state performance evaluation. D-RLS is compared to the D-LMS algorithm in [16].