Strassen Algorithm

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Strassen algorithm

Not to be confused with the SchnhageStrassen algo- We partition A, B and C into equally sized block matrices
rithm for multiplication of polynomials.
[
]
[
A1,1 A1,2
B
In linear algebra, the Strassen algorithm, named after A =
, B = 1,1
A2,1 A2,2
B2,1
Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication with
algorithm and is useful in practice for large matrices, but
would be slower than the fastest known algorithms for exn1
n1
tremely large matrices.
Ai,j , Bi,j , Ci,j R2 2
Strassens algorithm works for any ring, such as
plus/multiply, but not all semirings, such as min/plus or then
boolean algebra, where the naive algorithm still works,
and so called combinatorial matrix multiplication.
C1,1 = A1,1 B1,1 + A1,2 B2,1

]
[
B1,2
C1,1
,C=
B2,2
C2,1

C1,2
C2,2

C1,2 = A1,1 B1,2 + A1,2 B2,2

History

C2,1 = A2,1 B1,1 + A2,2 B2,1

Volker Strassen rst published this algorithm in 1969 and


proved that the n3 general matrix multiplication algorithm
wasn't optimal. The Strassen algorithm is only slightly
better, but its publication resulted in much more research
about matrix multiplication that led to faster approaches,
such as the Coppersmith-Winograd algorithm.

C2,2 = A2,1 B1,2 + A2,2 B2,2


With this construction we have not reduced the number of
multiplications. We still need 8 multiplications to calculate the Ci,j matrices, the same number of multiplications
we need when using standard matrix multiplication.
Now comes the important part. We dene new matrices

Algorithm
M1
A11 A12 A21 A22

B11
B21
C11B12

1
1

B22

C12

C21

M2

M3

M4

1
1

1
1

1
1

M2 := (A2,1 + A2,2 )B1,1

M7
1
1

-1

M3 := A1,1 (B1,2 B2,2 )

-1

M4 := A2,2 (B2,1 B1,1 )

1 1

1 1

M6

-1 -1

1
-1

1
1

M5

-1
1

1
1

1
C22

M1 := (A1,1 + A2,2 )(B1,1 + B2,2 )

-1
1

-1 -1

-1
1
-1

-1

M5 := (A1,1 + A1,2 )B2,2

1
1

M6 := (A2,1 A1,1 )(B1,1 + B1,2 )


M7 := (A1,2 A2,2 )(B2,1 + B2,2 )

The left column represents 2x2 matrix multiplication. Nave matrix multiplication requires one multiplication for each 1 of the
left column. Each of the other columns represents a single one of
the 7 multiplications in the algorithm, and the sum of the columns
gives the full matrix multiplication on the left.

only using 7 multiplications (one for each M ) instead of


8. We may now express the C, in terms of M , like this:

Let A, B be two square matrices over a ring R. We want C1,1 = M1 + M4 M5 + M7


to calculate the matrix product C as
C1,2 = M3 + M5
C = AB

A, B, C R2

C2,1 = M2 + M4

2n

C2,2 = M1 M2 + M3 + M6
n

If the matrices A, B are not of type 2 2 we ll the We iterate this division process n times (recursively) until the submatrices degenerate into numbers (elements of
missing rows and columns with zeros.
1

4 IMPLEMENTATION CONSIDERATIONS

the ring R). The resulting product will be padded with


{
zeroes just like A and B, and should be stripped of the


corresponding rows and columns.
R(/F) = min r fi A , gi B , wi C, a A, b B, (a, b) =

Practical implementations of Strassens algorithm switch
to standard methods of matrix multiplication for small
In other words, the rank of a bilinear map is the length
enough submatrices, for which those algorithms are
of its shortest bilinear computation.[4] The existence of
more ecient. The particular crossover point for which
Strassens algorithm shows that the rank of 22 matrix
Strassens algorithm is more ecient depends on the
multiplication is no more than seven. To see this, let us
specic implementation and hardware. Earlier authors
express this algorithm (alongside the standard algorithm)
had estimated that Strassens algorithm is faster for
as such a bilinear computation. In the case of matrices,
matrices with widths from 32 to 128 for optimized
the dual spaces A* and B* consist of maps into the eld
[1]
implementations. However, it has been observed that
F induced by a scalar double-dot product, (i.e. in this
this crossover point has been increasing in recent years,
case the sum of all the entries of a Hadamard product.)
and a 2010 study found that even a single step of
Strassens algorithm is often not benecial on current ar- It can be shown that the total number of elementary mulchitectures, compared to a highly optimized traditional tiplications L required for matrix multiplication is tightly
multiplication, until matrix sizes exceed 1000 or more, asymptotically bound to the rank R, i.e. L = (R)
and even for matrix sizes of several thousand the benet , or more specically, since the constants are known,
1
is typically marginal at best (around 10% or less).[2]
2 R L R. One useful property of the rank is that
it is submultiplicative for tensor products, and this enables one to show that 2n 2n 2n matrix multiplication
can be accomplished with no more than 7n elementary
3 Asymptotic complexity
multiplications for any n. (This n-fold tensor product of
the 222 matrix multiplication map with itselfan nth
The standard matrix multiplication takes approximately tensor poweris realized by the recursive step in the al2N 3 (where N = 2n ) arithmetic operations (additions and gorithm shown.)
multiplications); the asymptotic complexity is (N 3 ).
The number of additions and multiplications required in
the Strassen algorithm can be calculated as follows: let
f(n) be the number of operations for a 2n 2n matrix.
Then by recursive application of the Strassen algorithm,
we see that f(n) = 7f(n1) + 4n , for some constant that
depends on the number of additions performed at each
application of the algorithm. Hence f(n) = (7 + o(1))n ,
i.e., the asymptotic complexity for multiplying matrices
of size N = 2n using the Strassen algorithm is

O([7 + o(1)]n ) = O(N log2 7+o(1) ) O(N 2.8074 )


The reduction in the number of arithmetic operations
however comes at the price of a somewhat reduced
numerical stability,[3] and the algorithm also requires signicantly more memory compared to the naive algorithm. Both initial matrices must have their dimensions
expanded to the next power of 2, which results in storing
up to four times as many elements, and the seven auxiliary matrices each contain a quarter of the elements in the
expanded ones.

3.1

Rank or bilinear complexity

3.2 Cache behavior


Strassens algorithm is cache oblivious. Analysis of its
cache behavior algorithm has shown it to incur
(
)
n2
nlog2 7
1+
+
b
b M
cache misses during its execution, assuming an idealized
cache of M lines, each of b bytes.[5]:13

4 Implementation considerations
The description above states that the matrices are square,
and the size is a power of two, and that padding should
be used if needed. This restriction allows the matrices
to be split in half, recursively, until limit of scalar multiplication is reached. The restriction simplies the explanation, and analysis of complexity, but is not actually
necessary;[6] and in fact, padding the matrix as described
will increase the computation time and can easily eliminate the fairly narrow time savings obtained by using the
method in the rst place.

The bilinear complexity or rank of a bilinear map is an


important concept in the asymptotic complexity of matrix A good implementation will observe the following:
multiplication. The rank of a bilinear map : A B
It is not necessary or desirable to use the Strassen
C over a eld F is dened as (somewhat of an abuse of
notation)
algorithm down to the limit of scalars. Com-

3
pared to conventional matrix multiplication, the algorithm adds a considerable O(n2 ) workload in addition/subtractions; so below a certain size, it will
be better to use conventional multiplication. Thus,
for instance, if you start with matrices that are
1600x1600, there is no need to pad to 2048x2048,
since you could subdivide down to 25x25 and then
use conventional multiplication at that level.
The method can indeed be applied to square matrices of any dimension.[2] If the dimension is even,
they are split in half as described. If the dimension
is odd, zero padding by one row and one column is
applied rst. Such padding can be applied on-the-y
and lazily, and the extra rows and columns discarded
as the result is formed. For instance, suppose the
matrices are 199x199. They can be split so that the
upper-left portion is 100x100 and the lower-right is
99x99. Wherever the operations require it, dimensions of 99 are zero padded to 100 rst. Note, for instance, that the product M2 is only used in the lower
row of the output, so is only required to be 99 rows
high; and thus the left factor (A2,1 + A2,2 ) used to
generate it need only be 99 rows high; accordingly,
there is no need to pad that sum to 100 rows; it is
only necessary to pad A2,2 to 100 columns to match
A2,1 .
Furthermore, there is no need for the matrices to be
square. Non-square matrices can be split in half using the
same methods, yielding smaller non-square matrices. If
the matrices are suciently non-square it will be worthwhile reducing the initial operation to more square products, using simple methods which are essentially O(n2 ) ,
for instance:
A product of size [2N x N] * [N x 10N] can be
done as 20 separate [N x N] * [N x N] operations,
arranged to form the result;
A product of size [N x 10N] * [10N x N] can be
done as 10 separate [N x N] * [N x N] operations,
summed to form the result.
These techniques will make the implementation more
complicated, compared to simply padding to a powerof-two square; however, it is a reasonable assumption
that anyone undertaking an implementation of Strassen,
rather than conventional, multiplication, will place a
higher priority on computational eciency than on simplicity of the implementation.

CoppersmithWinograd algorithm
Z-order matrix representation
Karatsuba algorithm, for multiplying n-digit integers
in O(nlog2 3 ) instead of in O(n2 ) time
Gausss complex multiplication algorithm multiplies
two complex numbers using 3 real multiplications
instead of 4

6 References
[1] Skiena, Steven S. (1998), "8.2.3 Matrix multiplication, The Algorithm Design Manual, Berlin, New York:
Springer-Verlag, ISBN 978-0-387-94860-7.
[2] D'Alberto, Paolo; Nicolau, Alexandru (2005). Using Recursion to Boost ATLASs Performance (PDF). Sixth Int'l
Symp. on High Performance Computing.
[3] Webb, Miller (1975). Computational complexity and numerical stability. SIAM J. Comput: 97107.
[4] Burgisser, Clausen, and Shokrollahi. Algebraic Complexity Theory. Springer-Verlag 1997.
[5] Frigo, M.; Leiserson, C. E.; Prokop, H.; Ramachandran,
S. (1999). Cache-oblivious algorithms (PDF). Proc. IEEE
Symp. on Foundations of Computer Science (FOCS). pp.
285297.
[6] Higham, Nicholas J. (1990). Exploiting fast matrix multiplication within the level 3 BLAS (PDF). ACM Transactions on Mathematical Software. 16 (4): 352368.
doi:10.1145/98267.98290.

Strassen, Volker, Gaussian Elimination is not Optimal, Numer. Math. 13, p. 354-356, 1969
Thomas H. Cormen, Charles E. Leiserson, Ronald
L. Rivest, and Cliord Stein. Introduction to Algorithms, Second Edition. MIT Press and McGrawHill, 2001. ISBN 0-262-03293-7. Chapter 28: Section 28.2: Strassens algorithm for matrix multiplication, pp. 735741.

7 External links
Weisstein, Eric
MathWorld.

W.

Strassens

Formulas.

(also includes formulas for fast matrix inversion)

See also
Computational complexity of mathematical operations
GaussJordan elimination

Tyler J. Earnest, Strassens Algorithm on the Cell


Broadband Engine
Simple Strassens algorithms implementation in C
(easy for education purposes)

8 TEXT AND IMAGE SOURCES, CONTRIBUTORS, AND LICENSES

Text and image sources, contributors, and licenses

8.1

Text

Strassen algorithm Source: https://en.wikipedia.org/wiki/Strassen_algorithm?oldid=732485107 Contributors: Michael Hardy, Dominus,


Cyp, Stevenj, Dcoetzee, Jitse Niesen, Fredrik, Altenmann, Chris Roy, MathMartin, (:Julien:), Bkell, Giftlite, Macrakis, Pmanderson,
Andreas Kaufmann, Elwikipedista~enwiki, Gauge, MisterSheik, Phansen, Benbread, Burn, Suruena,
, Forderud, Oleg Alexandrov, GregorB, Qwertyus, Rjwilmsi, Who, Fresheneesz, Parerga, Kri, Chobot, Pburka, Iamfscked, WILLY-MART, Paul D. Anderson,
SmackBot, Hftf, Timotheus Canens, Silly rabbit, Mhym, Rhebus, Andrei.lopatenko, CRGreathouse, Gremagor, Thijs!bot, Headbomb,
Xypron, J.delanoy, Thomasda, K.menin, Facuq, Adam Zivner, VolkovBot, Goelvivek, Jesin, Darkieboy236, Mimihitam, OboeCrack,
Alexbot, PixelBot, Aprock, Addbot, HerculeBot, Greg.smith.tor.ca, Yobot, Ptbotgourou, AnomieBOT, Galoubet, Bhaveshnande, LilHelpa,
Xqbot, Miracle Pen, Xnn, Felipe.barreta, EmausBot, John of Reading, ZroBot, ClueBot NG, Michael P. Barnett, Remococco, Coralium,
12, E8xE8, Anonymous Random Person, Opencooper, Kkpengboy and Anonymous: 73

8.2

Images

File:Question_book-new.svg Source: https://upload.wikimedia.org/wikipedia/en/9/99/Question_book-new.svg License: Cc-by-sa-3.0


Contributors: ? Original artist: ?
File:Strassen_algorithm.svg Source: https://upload.wikimedia.org/wikipedia/commons/2/2e/Strassen_algorithm.svg License: CC BYSA 3.0 Contributors: Own work Original artist: Cyp

8.3

Content license

Creative Commons Attribution-Share Alike 3.0

You might also like