Strassen algorithm
Not to be confused with the Schönhage–Strassen algorithm for multiplication of polynomials.

In linear algebra, the Strassen algorithm, named after Volker Strassen, is an algorithm for matrix multiplication. It is faster than the standard matrix multiplication algorithm and is useful in practice for large matrices, but it would be slower than the fastest known algorithms for extremely large matrices.

Strassen's algorithm works for any ring, such as plus/multiply, but not for all semirings, such as min/plus or Boolean algebra, where the naive algorithm and so-called combinatorial matrix multiplication still work.
1 History
2 Algorithm
[Figure: The left column represents 2×2 matrix multiplication. Naïve matrix multiplication requires one multiplication for each "1" of the left column. Each of the other columns represents a single one of the 7 multiplications in the algorithm, and the sum of the columns gives the full matrix multiplication on the left.]

Let A, B be two square matrices over a ring R. We want to calculate the matrix product C as

C = AB,    A, B, C ∈ R^(2^n × 2^n).

If the matrices A, B are not of type 2^n × 2^n, we fill the missing rows and columns with zeros.

We partition A, B and C into equally sized block matrices

A = [A1,1  A1,2]    B = [B1,1  B1,2]    C = [C1,1  C1,2]
    [A2,1  A2,2]        [B2,1  B2,2]        [C2,1  C2,2]

with Ai,j, Bi,j, Ci,j ∈ R^(2^(n-1) × 2^(n-1)). Then

C1,1 = A1,1 B1,1 + A1,2 B2,1
C1,2 = A1,1 B1,2 + A1,2 B2,2
C2,1 = A2,1 B1,1 + A2,2 B2,1
C2,2 = A2,1 B1,2 + A2,2 B2,2

With this construction we have not reduced the number of multiplications: we still need eight multiplications of submatrices to calculate the Ci,j, the same number required by standard matrix multiplication. The Strassen algorithm instead defines the seven matrices

M1 = (A1,1 + A2,2)(B1,1 + B2,2)
M2 = (A2,1 + A2,2) B1,1
M3 = A1,1 (B1,2 - B2,2)
M4 = A2,2 (B2,1 - B1,1)
M5 = (A1,1 + A1,2) B2,2
M6 = (A2,1 - A1,1)(B1,1 + B1,2)
M7 = (A1,2 - A2,2)(B2,1 + B2,2)

which require only seven multiplications (one for each Mk), and expresses the Ci,j in terms of them:

C1,1 = M1 + M4 - M5 + M7
C1,2 = M3 + M5
C2,1 = M2 + M4
C2,2 = M1 - M2 + M3 + M6

We iterate this division process n times (recursively) until the submatrices degenerate into numbers (elements of the ring R).
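The recursion above can be written down directly. The following is a minimal sketch in Python with NumPy, assuming square matrices whose size is a power of two; the function name strassen is an illustrative choice, not a library routine.

```python
import numpy as np

def strassen(A, B):
    """Multiply two square matrices whose size is a power of two."""
    n = A.shape[0]
    if n == 1:
        # Submatrices have degenerated into ring elements.
        return A * B

    k = n // 2  # split A and B into four k x k blocks each
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]

    # The seven recursive products M1..M7 from the text above.
    M1 = strassen(A11 + A22, B11 + B22)
    M2 = strassen(A21 + A22, B11)
    M3 = strassen(A11, B12 - B22)
    M4 = strassen(A22, B21 - B11)
    M5 = strassen(A11 + A12, B22)
    M6 = strassen(A21 - A11, B11 + B12)
    M7 = strassen(A12 - A22, B21 + B22)

    # Reassemble the four blocks of C from the Mk.
    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7   # C1,1
    C[:k, k:] = M3 + M5             # C1,2
    C[k:, :k] = M2 + M4             # C2,1
    C[k:, k:] = M1 - M2 + M3 + M6   # C2,2
    return C

# Usage: for 2^n x 2^n inputs this agrees with the conventional product.
# A = np.random.rand(8, 8); B = np.random.rand(8, 8)
# assert np.allclose(strassen(A, B), A @ B)
```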
4 Implementation considerations
The description above states that the matrices are square, that the size is a power of two, and that padding should be used if needed. This restriction allows the matrices to be split in half, recursively, until the limit of scalar multiplication is reached. The restriction simplifies the explanation and the analysis of complexity, but is not actually necessary;[6] and in fact, padding the matrix as described will increase the computation time and can easily eliminate the fairly narrow time savings obtained by using the method in the first place.
A good implementation will observe the following:

It is not necessary or desirable to use the Strassen algorithm down to the limit of scalars. Compared to conventional matrix multiplication, the algorithm adds a considerable O(n²) workload in additions/subtractions; so below a certain size, it will be better to use conventional multiplication. Thus, for instance, if you start with matrices that are 1600×1600, there is no need to pad to 2048×2048, since you could subdivide down to 25×25 and then use conventional multiplication at that level. (A sketch combining such a cutoff with the padding described below appears at the end of this section.)
The method can indeed be applied to square matrices of any dimension.[2] If the dimension is even, they are split in half as described. If the dimension is odd, zero padding by one row and one column is applied first. Such padding can be applied on the fly and lazily, and the extra rows and columns discarded as the result is formed. For instance, suppose the matrices are 199×199. They can be split so that the upper-left portion is 100×100 and the lower-right is 99×99. Wherever the operations require it, dimensions of 99 are zero padded to 100 first. Note, for instance, that the product M2 is only used in the lower row of the output, so it only needs to be 99 rows high; thus the left factor (A2,1 + A2,2) used to generate it need only be 99 rows high; accordingly, there is no need to pad that sum to 100 rows; it is only necessary to pad A2,2 to 100 columns to match A2,1.
Furthermore, there is no need for the matrices to be square. Non-square matrices can be split in half using the same methods, yielding smaller non-square matrices. If the matrices are sufficiently non-square, it will be worthwhile to reduce the initial operation to more nearly square products, using simple methods which are essentially O(n²), for instance:
A product of size [2N × N] × [N × 10N] can be done as 20 separate [N × N] × [N × N] operations, arranged to form the result;

A product of size [N × 10N] × [10N × N] can be done as 10 separate [N × N] × [N × N] operations, summed to form the result.
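As a hedged illustration of the first decomposition, the sketch below computes a (2N × N) by (N × 10N) product as 20 separate N × N products arranged to form the result. The helper name square_multiply and the function name are illustrative; square_multiply stands for any N × N routine, such as np.matmul or, when N is a power of two, the strassen sketch above.

```python
import numpy as np

def multiply_2N_by_10N(A, B, square_multiply, N):
    """Compute (2N x N) @ (N x 10N) as 20 square N x N products."""
    assert A.shape == (2 * N, N) and B.shape == (N, 10 * N)
    C = np.empty((2 * N, 10 * N))
    for i in range(2):       # row blocks of A (and of C)
        for j in range(10):  # column blocks of B (and of C)
            # Each output block is one square product, since the
            # inner dimension consists of a single N-wide block.
            C[i*N:(i+1)*N, j*N:(j+1)*N] = square_multiply(
                A[i*N:(i+1)*N, :], B[:, j*N:(j+1)*N])
    return C

# Usage: multiply_2N_by_10N(A, B, np.matmul, N)
```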
These techniques will make the implementation more complicated, compared to simply padding to a power-of-two square; however, it is a reasonable assumption that anyone undertaking an implementation of Strassen, rather than conventional, multiplication will place a higher priority on computational efficiency than on simplicity of the implementation.
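Putting these observations together, here is a sketch, under the same assumptions as the earlier code, of a hybrid scheme: conventional multiplication below a cutoff, and zero padding of odd dimensions by one row and one column, with the extras discarded as the result is formed. The cutoff value 64 is an illustrative guess to be tuned empirically, and unlike the fully lazy scheme described above, this simple version pads eagerly at each odd-sized level.

```python
import numpy as np

CUTOFF = 64  # illustrative; below this size, conventional multiplication wins

def hybrid_strassen(A, B):
    """Multiply square matrices of any size."""
    n = A.shape[0]
    if n <= CUTOFF:
        return A @ B  # conventional multiplication at small sizes
    if n % 2:
        # Odd dimension: zero pad by one row and one column,
        # multiply, then discard the extra row and column.
        Ap = np.pad(A, ((0, 1), (0, 1)))
        Bp = np.pad(B, ((0, 1), (0, 1)))
        return hybrid_strassen(Ap, Bp)[:n, :n]
    k = n // 2
    A11, A12, A21, A22 = A[:k, :k], A[:k, k:], A[k:, :k], A[k:, k:]
    B11, B12, B21, B22 = B[:k, :k], B[:k, k:], B[k:, :k], B[k:, k:]
    M1 = hybrid_strassen(A11 + A22, B11 + B22)
    M2 = hybrid_strassen(A21 + A22, B11)
    M3 = hybrid_strassen(A11, B12 - B22)
    M4 = hybrid_strassen(A22, B21 - B11)
    M5 = hybrid_strassen(A11 + A12, B22)
    M6 = hybrid_strassen(A21 - A11, B11 + B12)
    M7 = hybrid_strassen(A12 - A22, B21 + B22)
    C = np.empty_like(A)
    C[:k, :k] = M1 + M4 - M5 + M7
    C[:k, k:] = M3 + M5
    C[k:, :k] = M2 + M4
    C[k:, k:] = M1 - M2 + M3 + M6
    return C
```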
5 See also

Computational complexity of mathematical operations

Gauss–Jordan elimination

Coppersmith–Winograd algorithm

Z-order matrix representation

Karatsuba algorithm, for multiplying n-digit integers in O(n^(log2 3)) instead of in O(n²) time

Gauss's complex multiplication algorithm, which multiplies two complex numbers using 3 real multiplications instead of 4
6 References
[1] Skiena, Steven S. (1998), "8.2.3 Matrix multiplication", The Algorithm Design Manual, Berlin, New York: Springer-Verlag, ISBN 978-0-387-94860-7.
[2] D'Alberto, Paolo; Nicolau, Alexandru (2005). "Using Recursion to Boost ATLAS's Performance" (PDF). Sixth Int'l Symp. on High Performance Computing.
[3] Miller, Webb (1975). "Computational complexity and numerical stability". SIAM J. Comput.: 97–107.
[4] Bürgisser, Clausen, and Shokrollahi. Algebraic Complexity Theory. Springer-Verlag, 1997.
[5] Frigo, M.; Leiserson, C. E.; Prokop, H.; Ramachandran, S. (1999). "Cache-oblivious algorithms" (PDF). Proc. IEEE Symp. on Foundations of Computer Science (FOCS). pp. 285–297.
[6] Higham, Nicholas J. (1990). "Exploiting fast matrix multiplication within the level 3 BLAS" (PDF). ACM Transactions on Mathematical Software. 16 (4): 352–368. doi:10.1145/98267.98290.
Strassen, Volker, "Gaussian Elimination is not Optimal", Numer. Math. 13, pp. 354–356, 1969.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Chapter 28: Section 28.2: Strassen's algorithm for matrix multiplication, pp. 735–741.
7 External links
Weisstein, Eric W. "Strassen's Formulas". MathWorld.