Linear Algebra and its Applications 320 (2000) 193–198
Symmetric centrosymmetric matrix–vector
multiplication
A. Melman 1
Department of Mathematics, University of San Francisco, San Francisco, CA 94117-1080, USA
E-mail address: [email protected] (A. Melman).
1 On leave from Ben-Gurion University, Beer-Sheva, Israel.
Received 15 April 1999; accepted 15 July 2000
Submitted by R.E. Hartwig
Abstract
We present a method for the multiplication of an arbitrary vector by a symmetric centrosymmetric matrix, requiring $\frac{5}{4}n^2 + O(n)$ floating-point operations rather than the $2n^2$ operations needed in the case of an arbitrary matrix. Combining this method with Trench's algorithm for Toeplitz matrix inversion yields a method for solving Toeplitz systems with the same complexity as Levinson's algorithm. © 2000 Elsevier Science Inc. All rights reserved.
AMS classification: 65F05
Keywords: Centrosymmetric matrix; Matrix–vector multiplication; Toeplitz system; Trench’s algorithm;
Levinson’s algorithm
1. Introduction
The multiplication of an arbitrary vector by a matrix on a sequential machine requires, in general, $2n^2$ floating-point operations (following [5], we define a floating-point operation, or flop, as either an addition/subtraction or a multiplication/division). This remains true even for a symmetric matrix. However, if, in addition to being symmetric, the matrix is also persymmetric (symmetric with respect to the southwest–northeast diagonal), then the complexity can be reduced significantly. Matrices that are both symmetric and persymmetric are called symmetric centrosymmetric. In this paper, we present a method for multiplying an arbitrary vector
by such a matrix with $\frac{5}{4}n^2 + O(n)$ flops. Additional multiplications by the same matrix cost only $n^2 + O(n)$ flops each. We also briefly consider an application of this method: combining it with Trench's algorithm [5,7] for Toeplitz matrix inversion produces a method for solving positive-definite symmetric Toeplitz systems that has the same complexity as Levinson's algorithm [5,6]. The paper is organized as follows: Section 2 is devoted to a few basic definitions and results; in Section 3, we present our method and discuss some implications; in Section 4, we briefly show an application to Toeplitz systems.
2. Preliminaries
A matrix $M \in \mathbb{R}^{n \times n}$ is said to be persymmetric if it is symmetric about its southwest–northeast diagonal. For such a matrix $M$, this is the same as requiring that $JM^TJ = M$, where $J$ is the matrix with ones on its southwest–northeast diagonal and zeros everywhere else (the exchange matrix). It is easy to see that the inverse of a persymmetric matrix is also persymmetric. A matrix $Q \in \mathbb{R}^{n \times n}$ is said to be centrosymmetric if it satisfies $JQJ = Q$; its inverse is also centrosymmetric. A matrix that is both symmetric and persymmetric satisfies $JMJ = JM^TJ = M$ and is therefore called symmetric centrosymmetric, or doubly symmetric.
An even vector $v$ is defined as a vector satisfying $Jv = v$, and an odd vector $w$ as one that satisfies $Jw = -w$.
A symmetric matrix $T \in \mathbb{R}^{n \times n}$ is said to be Toeplitz if its elements $T_{ij}$ satisfy $T_{ij} = \rho_{|j-i|}$, where $\{\rho_j\}_{j=0}^{n-1}$ are the components of a vector $(\rho_0, r)^T \in \mathbb{R}^n$, with $r = (\rho_1, \dots, \rho_{n-1})^T \in \mathbb{R}^{n-1}$, so that
$$T = \begin{pmatrix}
\rho_0 & \rho_1 & \rho_2 & \cdots & \rho_{n-1} \\
\rho_1 & \rho_0 & \rho_1 & \cdots & \rho_{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n-2} & \rho_{n-3} & \rho_{n-4} & \cdots & \rho_1 \\
\rho_{n-1} & \rho_{n-2} & \rho_{n-3} & \cdots & \rho_0
\end{pmatrix}.$$
The matrix T is symmetric centrosymmetric and so is its inverse. Such matrices
appear in many applications and we refer to [2] for a good overview.
We denote the exchange matrix by J without specifically indicating its dimension,
which is assumed to be clear from the context.
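To make the definitions of this section concrete, here is a short NumPy sketch (ours, not part of the original paper; all variable names are illustrative) that builds the exchange matrix $J$ and a symmetric Toeplitz matrix and checks the properties above.

import numpy as np

n = 6
J = np.fliplr(np.eye(n))                    # exchange matrix: ones on the SW-NE diagonal

rho = np.arange(1.0, n + 1)                 # (rho_0, ..., rho_{n-1})
T = rho[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]  # T_ij = rho_|i-j|

assert np.allclose(T, T.T)                  # symmetric
assert np.allclose(J @ T.T @ J, T)          # persymmetric:     J T^T J = T
assert np.allclose(J @ T @ J, T)            # centrosymmetric:  J T J = T

b = np.random.default_rng(0).standard_normal(n)
b_even = 0.5 * (b + J @ b)                  # even part: J b_e = b_e
b_odd = 0.5 * (b - J @ b)                   # odd part:  J b_o = -b_o
assert np.allclose(J @ b_even, b_even) and np.allclose(J @ b_odd, -b_odd)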
3. Symmetric centrosymmetric matrix–vector multiplication
Consider the multiplication $M_n b^{(n)}$, where $M_n$ is a real $n \times n$ symmetric centrosymmetric matrix and $b^{(n)}$ is a real $n$-dimensional vector. The matrix $M_n$ can be partitioned as follows:
$$M_n = \begin{pmatrix}
\alpha & t^T & \beta \\
t & M_{n-2} & Jt \\
\beta & t^TJ & \alpha
\end{pmatrix},$$
where $M_{n-2} \in \mathbb{R}^{(n-2) \times (n-2)}$, $t \in \mathbb{R}^{n-2}$ and $\alpha, \beta \in \mathbb{R}$. We note that $M_{n-2}$ is also symmetric centrosymmetric. The method we are about to construct is based on this partition of $M_n$ and on the fact that $b^{(n)} = b_e^{(n)} + b_o^{(n)}$, where
$$b_e^{(n)} = \tfrac{1}{2}\bigl(b^{(n)} + Jb^{(n)}\bigr) \quad\text{and}\quad b_o^{(n)} = \tfrac{1}{2}\bigl(b^{(n)} - Jb^{(n)}\bigr).$$
The even part of $b^{(n)}$ satisfies $Jb_e^{(n)} = b_e^{(n)}$, whereas for the odd part we have $Jb_o^{(n)} = -b_o^{(n)}$. We compute $M_n b^{(n)}$ as $M_n b_e^{(n)} + M_n b_o^{(n)}$. Writing
$$b_e^{(n)} = \bigl(\sigma_e, b_e^{(n-2)}, \sigma_e\bigr)^T \quad\text{and}\quad b_o^{(n)} = \bigl(\sigma_o, b_o^{(n-2)}, -\sigma_o\bigr)^T,$$
we have
$$M_n b_e^{(n)} = \begin{pmatrix}
\alpha & t^T & \beta \\
t & M_{n-2} & Jt \\
\beta & t^TJ & \alpha
\end{pmatrix} \begin{pmatrix}
\sigma_e \\ b_e^{(n-2)} \\ \sigma_e
\end{pmatrix} = \begin{pmatrix}
(\alpha + \beta)\sigma_e + t^T b_e^{(n-2)} \\
\sigma_e(t + Jt) + M_{n-2} b_e^{(n-2)} \\
(\alpha + \beta)\sigma_e + t^T b_e^{(n-2)}
\end{pmatrix},$$
which computes the first and last components of $M_n b_e^{(n)}$, and
$$M_n b_o^{(n)} = \begin{pmatrix}
\alpha & t^T & \beta \\
t & M_{n-2} & Jt \\
\beta & t^TJ & \alpha
\end{pmatrix} \begin{pmatrix}
\sigma_o \\ b_o^{(n-2)} \\ -\sigma_o
\end{pmatrix} = \begin{pmatrix}
(\alpha - \beta)\sigma_o + t^T b_o^{(n-2)} \\
\sigma_o(t - Jt) + M_{n-2} b_o^{(n-2)} \\
-(\alpha - \beta)\sigma_o - t^T b_o^{(n-2)}
\end{pmatrix},$$
which computes the first and last components of $M_n b_o^{(n)}$. We then repeat this procedure for $M_{n-2} b_e^{(n-2)}$ and $M_{n-2} b_o^{(n-2)}$, add the result to that of the previous step, and continue in this manner until the whole product is computed. This leads to the algorithm, labeled "Algorithm 1", for computing $y = M_n b^{(n)}$. We have used notation similar to that of [5], with the subscripts "e" and "o" referring to even and odd quantities, respectively. For positive numbers (as is the case here), the notation $\lfloor\,\cdot\,\rfloor$ means "rounded down to the nearest integer". For the sake of clarity, some redundant quantities were introduced in the algorithm; these do not affect the complexity.
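As a quick numerical sanity check of the two folded products above before turning to the algorithm itself, the following NumPy fragment (illustrative, not from the paper) compares the formulas for the first and last components against a direct product.

import numpy as np

rng = np.random.default_rng(1)
n = 8
A = rng.standard_normal((n, n))
J = np.fliplr(np.eye(n))
M = 0.5 * ((A + A.T) + J @ (A + A.T) @ J)    # random symmetric centrosymmetric matrix

b = rng.standard_normal(n)
be, bo = 0.5 * (b + J @ b), 0.5 * (b - J @ b)

alpha, beta, t = M[0, 0], M[0, -1], M[0, 1:-1]
ye0 = (alpha + beta) * be[0] + t @ be[1:-1]  # first (= last) component of M b_e
yo0 = (alpha - beta) * bo[0] + t @ bo[1:-1]  # first (= minus last) component of M b_o

y = M @ b
assert np.isclose(y[0], ye0 + yo0) and np.isclose(y[-1], ye0 - yo0)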
Algorithm 1.
m = ⌊(n + 1)/2⌋
be(1 : m) = (1/2)(b(1 : m) + b(n : −1 : n − m + 1))
bo(1 : m) = (1/2)(b(1 : m) − b(n : −1 : n − m + 1))
ze(1 : m) = 0
zo(1 : m) = 0
if n is even
    for j = 1 : m − 1
        α = q(j, j)
        β = q(j, n − j + 1)
        s = q(j, j + 1 : n − j)
        se(1 : m − j) = s(1 : m − j) + s(n − 2j : −1 : m − j + 1)
        so(1 : m − j) = s(1 : m − j) − s(n − 2j : −1 : m − j + 1)
        ze(j) = (α + β)be(j) + se(1 : m − j)^T be(j + 1 : m)
        ze(j + 1 : m) = ze(j + 1 : m) + be(j)se(1 : m − j)
        zo(j) = (α − β)bo(j) + so(1 : m − j)^T bo(j + 1 : m)
        zo(j + 1 : m) = zo(j + 1 : m) + bo(j)so(1 : m − j)
    end
    ze(m) = ze(m) + (q(m, m) + q(m, m + 1))be(m)
    zo(m) = zo(m) + (q(m, m) − q(m, m + 1))bo(m)
else (n is odd)
    for j = 1 : m − 1
        α = q(j, j)
        β = q(j, n − j + 1)
        s = q(j, j + 1 : n − j)
        se(1 : m − j) = s(1 : m − j) + s(n − 2j : −1 : m − j)
        so(1 : m − j) = s(1 : m − j) − s(n − 2j : −1 : m − j)
        ze(j) = (α + β)be(j) + se(1 : m − j)^T be(j + 1 : m) − (1/2)se(m − j)be(m)
        ze(j + 1 : m) = ze(j + 1 : m) + be(j)se(1 : m − j)
        zo(j) = (α − β)bo(j) + so(1 : m − j)^T bo(j + 1 : m)
        zo(j + 1 : m) = zo(j + 1 : m) + bo(j)so(1 : m − j)
    end
    ze(m) = ze(m) + q(m, m)be(m)
end
y(1 : m) = ze(1 : m) + zo(1 : m)
if n is even
    y(m + 1 : n) = ze(m : −1 : 1) − zo(m : −1 : 1)
else (n is odd)
    y(m + 1 : n) = ze(m − 1 : −1 : 1) − zo(m − 1 : −1 : 1)
end
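A runnable NumPy transcription of Algorithm 1 might look as follows. This is a sketch, not the paper's code: the zero initialization of ze and zo, the central-block updates after each loop, and the factor 1/2 in the odd-case correction are reconstructions that make the recursion close correctly (they are easily verified for small $n$); all names are illustrative.

import numpy as np

def centrosym_matvec(q, b):
    """y = q @ b for a symmetric centrosymmetric matrix q (sketch of Algorithm 1)."""
    n = len(b)
    m = (n + 1) // 2
    be = 0.5 * (b[:m] + b[n - m:][::-1])    # even part, components 1..m
    bo = 0.5 * (b[:m] - b[n - m:][::-1])    # odd part, components 1..m
    ze, zo = np.zeros(m), np.zeros(m)
    for j in range(m - 1):                  # pseudocode j = 1, ..., m - 1
        alpha, beta = q[j, j], q[j, n - 1 - j]
        s = q[j, j + 1 : n - 1 - j]         # row segment strictly between alpha and beta
        k = m - 1 - j                       # length of the folded segments
        rev = s[k:][::-1] if n % 2 == 0 else s[k - 1:][::-1]
        se, so = s[:k] + rev, s[:k] - rev
        ze[j] = (alpha + beta) * be[j] + se @ be[j + 1 :]
        zo[j] = (alpha - beta) * bo[j] + so @ bo[j + 1 :]
        if n % 2 == 1:                      # middle column was folded onto itself
            ze[j] -= 0.5 * se[-1] * be[-1]
        ze[j + 1 :] += be[j] * se
        zo[j + 1 :] += bo[j] * so
    if n % 2 == 0:                          # base case: central 2 x 2 block
        ze[-1] += (q[m - 1, m - 1] + q[m - 1, m]) * be[-1]
        zo[-1] += (q[m - 1, m - 1] - q[m - 1, m]) * bo[-1]
    else:                                   # base case: central 1 x 1 block
        ze[-1] += q[m - 1, m - 1] * be[-1]
    y = np.empty(n)
    y[:m] = ze + zo                         # reassemble y from its even ...
    y[n - m:] = (ze - zo)[::-1]             # ... and odd parts
    return y

# check against a dense product
rng = np.random.default_rng(0)
for n in (7, 8):
    A = rng.standard_normal((n, n))
    J = np.fliplr(np.eye(n))
    M = 0.5 * ((A + A.T) + J @ (A + A.T) @ J)
    b = rng.standard_normal(n)
    assert np.allclose(centrosym_matvec(M, b), M @ b)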
We stress that this is a conceptual algorithm only, as the final implementation depends on the architecture of the machine on which it is to run. The algorithm could also be executed backwards, starting with $M_1 b^{(1)}$ for $n$ odd or $M_2 b^{(2)}$ for $n$ even. Furthermore, the computations of the even and odd parts of $y$ are independent of each other and could be processed in parallel. At each step, Algorithm 1 computes two components of each of $M_n b_e^{(n)}$ and $M_n b_o^{(n)}$, working from the outer components inward to the middle ones as it proceeds from step to step.
Let us now have a look at the number of flops executed by Algorithm 1. We first consider $M_n b_e^{(n)}$ when $n$ is even. At the stage where $M_j b_e^{(j)}$ ($j < n$) is carried out, we have
$$M_j b_e^{(j)} = \begin{pmatrix}
\gamma & s^T & \delta \\
s & M_{j-2} & Js \\
\delta & s^TJ & \gamma
\end{pmatrix} \begin{pmatrix}
\rho_e \\ b_e^{(j-2)} \\ \rho_e
\end{pmatrix} = \begin{pmatrix}
(\gamma + \delta)\rho_e + s^T b_e^{(j-2)} \\
\rho_e(s + Js) + M_{j-2} b_e^{(j-2)} \\
(\gamma + \delta)\rho_e + s^T b_e^{(j-2)}
\end{pmatrix},$$
where $\gamma, \delta \in \mathbb{R}$ and $s \in \mathbb{R}^{j-2}$. Taking into account the symmetry of $b_e^{(j)}$ and of $(s + Js)$, we need to execute, at this stage, $(j-1)$ additions and $(j-1)$ multiplications. Except for $j = n$, this must then be added to the solution vector found thus far, which requires an additional $j/2$ additions. Summing over all steps, this gives
$$\bigl(n + (n-2) + \cdots + 2\bigr) - \frac{n}{2} + \left(\frac{n}{2} - 1\right) + \left(\frac{n}{2} - 2\right) + \cdots + 2 + 1$$
additions and
$$\bigl(n + (n-2) + \cdots + 2\bigr) - \frac{n}{2}$$
multiplications.
Since $n + (n-2) + \cdots + 2 = \frac{1}{4}n^2 + \frac{1}{2}n$, this yields $\bigl(\frac{3}{8}n^2 - \frac{1}{4}n\bigr)$ additions and $\frac{1}{4}n^2$ multiplications. Analogously, the same complexity is obtained for $M_n b_o^{(n)}$, so that if we apply the matrix to an arbitrary vector $b^{(n)}$ and also take into account the cost of computing the even and odd parts of $b^{(n)}$ and of adding the even and odd parts of the solution, we obtain for the computation of $M_n b^{(n)}$ a complexity of $\bigl(\frac{3}{4}n^2 + n\bigr)$ additions and $\bigl(\frac{1}{2}n^2 + n\bigr)$ multiplications. This means that the total complexity of the method is $\frac{5}{4}n^2 + O(n)$ flops. The same complexity is obtained when $n$ is odd. If the vectors
$$\bigl\{v_j + Jv_j\bigr\}_{j=1}^{\lfloor (n+1)/2 \rfloor} \quad\text{and}\quad \bigl\{v_j - Jv_j\bigr\}_{j=1}^{\lfloor (n+1)/2 \rfloor},$$
where the $v_j$'s are the columns of $M_n$, are stored (at a cost of roughly $n^2/4$), then each subsequent multiplication by an arbitrary vector costs only $n^2 + O(n)$ additional flops. No extra storage is required if the matrix $M_n$, which requires the same amount of storage, is overwritten by these data.
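The remark about storing the vectors $v_j \pm Jv_j$ can be illustrated as follows. Since $M_n$ is centrosymmetric, its columns satisfy $v_{n+1-j} = Jv_j$, so $M_n b_e^{(n)}$ and $M_n b_o^{(n)}$ are linear combinations of the stored vectors, of which only the top halves are needed. The sketch below (ours, not from the paper; names are illustrative) precomputes these halves once and then multiplies with roughly $n^2$ flops per vector.

import numpy as np

def fold_columns(M):
    """Top halves of the folded columns v_j + J v_j and v_j - J v_j."""
    n = M.shape[0]
    m = (n + 1) // 2
    V, JV = M[:, :m], M[::-1, :m]           # v_j and J v_j, j = 1, ..., m
    We, Wo = (V + JV)[:m].copy(), (V - JV)[:m].copy()
    if n % 2 == 1:
        We[:, -1] *= 0.5                    # middle column must be counted once
    return We, Wo

def matvec_folded(We, Wo, b):
    """y = M b using the precomputed folded columns."""
    n = len(b)
    m = (n + 1) // 2
    be = 0.5 * (b[:m] + b[n - m:][::-1])
    bo = 0.5 * (b[:m] - b[n - m:][::-1])
    ye, yo = We @ be, Wo @ bo               # top halves of M b_e and M b_o
    y = np.empty(n)
    y[:m] = ye + yo
    y[n - m:] = (ye - yo)[::-1]
    return y

rng = np.random.default_rng(2)
for n in (9, 10):
    A = rng.standard_normal((n, n))
    J = np.fliplr(np.eye(n))
    M = 0.5 * ((A + A.T) + J @ (A + A.T) @ J)
    We, Wo = fold_columns(M)
    b = rng.standard_normal(n)
    assert np.allclose(matvec_folded(We, Wo, b), M @ b)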
4. Positive-definite Toeplitz systems of equations
In this section we briefly illustrate an application of symmetric centrosymmetric matrix–vector multiplication in the context of Toeplitz systems by showing that it leads to yet another $O(n^2)$ algorithm for such problems.
The special system of equations $Tx = -r$, with $T$ and $r$ as in Section 2 and $T$ positive definite, is called the Yule–Walker (YW) system of equations. It frequently appears in digital signal processing problems and can be solved by Durbin's method in $2n^2 + O(n)$ flops (see [4]). Its solution is used in an efficient algorithm for the inversion of a positive-definite Toeplitz matrix, due to Trench [7]. If Durbin's algorithm is used for solving the YW equations, then Trench's algorithm requires a total of $\frac{13}{4}n^2 + O(n)$ flops. However, in [3], an algorithm (the "split Levinson method") is derived that requires only $\frac{3}{2}n^2 + O(n)$ flops to solve the YW equations. This reduces the complexity of Trench's algorithm to $\frac{11}{4}n^2 + O(n)$ flops. We refer, once again, to [5] for an overview of algorithms and references related to Toeplitz matrices. There are other algorithms for these same problems with better complexity for large matrices; we refer to, e.g., [1] and the references therein.
A classical method for solving positive-definite real symmetric Toeplitz systems of linear equations with an arbitrary right-hand side (of the form $Ty = b$), and which does not involve matrix inversion, is Levinson's algorithm [5,6]. Its complexity is $4n^2 + O(n)$ flops. An alternative to Levinson's algorithm can be obtained by first inverting the coefficient matrix using Trench's method and then applying the resulting matrix to the right-hand side, i.e., $y = T^{-1}b$. This would not be advantageous in terms of the number of operations if regular matrix–vector multiplication were used. However, the inverse of a symmetric Toeplitz matrix is symmetric centrosymmetric, and we can therefore, given $T^{-1}$, apply our algorithm to compute $T^{-1}b$. Let us have a look at the complexity of such a procedure. Trench's algorithm with the split Levinson algorithm requires $\frac{11}{4}n^2 + O(n)$ flops. Multiplying the inverse matrix and the right-hand side using Algorithm 1 then adds $\frac{5}{4}n^2 + O(n)$ flops, yielding a total of $4n^2 + O(n)$ flops, which is precisely the same as for Levinson's algorithm. In fact, it may be more desirable to use our method if the system needs to be solved for more than one right-hand side, especially since each subsequent solve requires only $n^2 + O(n)$ flops once the necessary quantities have been stored. Such an algorithm is, in general, likely to be less accurate than Levinson's algorithm, but iterative refinement of the solution could always be used if necessary. However, these are practical considerations (as is the size of the matrix, which may give an advantage to the so-called superfast solvers in [1]), and the choice of algorithm will depend on the circumstances.
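To close, here is a hedged end-to-end sketch of the procedure of this section (ours, not the paper's). It reuses fold_columns and matvec_folded from the sketch at the end of Section 3; np.linalg.inv is only a stand-in for Trench's $O(n^2)$ inversion, which is not implemented here.

import numpy as np

n = 8
rho = 0.5 ** np.arange(n)                   # rho_k = 2^{-k}: a positive-definite example
T = rho[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]

T_inv = np.linalg.inv(T)                    # stand-in for Trench's algorithm [7]
We, Wo = fold_columns(T_inv)                # one-time folding of the columns of T^{-1}

rng = np.random.default_rng(3)
for _ in range(3):                          # each extra right-hand side: ~n^2 flops
    b = rng.standard_normal(n)
    y = matvec_folded(We, Wo, b)            # y = T^{-1} b
    assert np.allclose(T @ y, b)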
References
[1] G.S. Ammar, W.B. Gragg, The generalized Schur algorithm for the superfast solution of Toeplitz systems, in: J. Gilewicz, M. Pindor, W. Siemaszko (Eds.), Rational Approximation and its Applications in Mathematics and Physics, Lecture Notes in Mathematics, vol. 1237, Springer, Berlin, 1987, pp. 315–330.
[2] J.R. Bunch, Stability of methods for solving Toeplitz systems of equations, SIAM J. Sci. Stat. Comput. 6 (1985) 349–364.
[3] P. Delsarte, Y. Genin, The split Levinson algorithm, IEEE Trans. Acoust. Speech Signal Process. ASSP-34 (1986) 470–478.
[4] J. Durbin, The fitting of time-series models, Rev. Inst. Int. Stat. 28 (1960) 233–243.
[5] G. Golub, C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, MD, 1989.
[6] N. Levinson, The Wiener RMS (root mean square) error criterion in filter design and prediction, J. Math. Phys. 25 (1947) 261–278.
[7] W.F. Trench, An algorithm for the inversion of finite Toeplitz matrices, J. SIAM 12 (1964) 515–522.