Modern Computer Arithmetic: Richard P. Brent and Paul Zimmermann Version 0.5.9 of 7 October 2010
This electronic version is distributed under the terms and conditions of the
Creative Commons license “Attribution-Noncommercial-No Derivative Works
3.0”. You are free to copy, distribute and transmit this book under the following
conditions:
• Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse you or
your use of the work).
• Noncommercial. You may not use this work for commercial purposes.
• No Derivative Works. You may not alter, transform, or build upon this
work.
For any reuse or distribution, you must make clear to others the license terms
of this work. The best way to do this is with a link to the web page below. Any
of the above conditions can be waived if you get permission from the copyright
holder. Nothing in this license impairs or restricts the author’s moral rights.
Contents
Preface
Acknowledgements
Notation
1 Integer arithmetic
1.1 Representation and notations
1.2 Addition and subtraction
1.3 Multiplication
1.3.1 Naive multiplication
1.3.2 Karatsuba’s algorithm
1.3.3 Toom–Cook multiplication
1.3.4 Use of the fast Fourier transform (FFT)
1.3.5 Unbalanced multiplication
1.3.6 Squaring
1.3.7 Multiplication by a constant
1.4 Division
1.4.1 Naive division
1.4.2 Divisor preconditioning
1.4.3 Divide and conquer division
1.4.4 Newton’s method
1.4.5 Exact division
1.4.6 Only quotient or remainder wanted
1.4.7 Division by a single word
1.4.8 Hensel’s division
1.5 Roots
1.5.1 Square root
1.5.2 kth root
This is a book about algorithms for performing arithmetic, and their implementation on modern computers. We are concerned with software more than hardware – we do not cover computer architecture or the design of computer hardware, since good books are already available on these topics. Instead, we focus on algorithms for efficiently performing arithmetic operations such as addition, multiplication, and division, and their connections to topics such as modular arithmetic, greatest common divisors, the fast Fourier transform (FFT), and the computation of special functions.
The algorithms that we present are mainly intended for arbitrary-precision
arithmetic. That is, they are not limited by the computer wordsize of 32 or 64
bits, only by the memory and time available for the computation. We consider
both integer and real (floating-point) computations.
The book is divided into four main chapters, plus one short chapter (essentially an appendix). Chapter 1 covers integer arithmetic. This has, of course, been considered in many other books and papers. However, there has been much recent progress, inspired in part by the application to public key cryptography, so most of the published books are now partly out of date or incomplete. Our aim is to present the latest developments in a concise manner. At the same time, we provide a self-contained introduction for the reader who is not an expert in the field.
Chapter 2 is concerned with modular arithmetic and the FFT, and their appli-
cations to computer arithmetic. We consider different number representations,
fast algorithms for multiplication, division and exponentiation, and the use of
the Chinese remainder theorem (CRT).
Chapter 3 covers floating-point arithmetic. Our concern is with high-precision floating-point arithmetic, implemented in software if the precision provided by the hardware (typically IEEE standard 53-bit significand) is inadequate.
We thank the French National Institute for Research in Computer Science and Control (INRIA), the Australian National University (ANU), and the Australian Research Council (ARC), for their support. The book could not have been written without the contributions of many friends and colleagues, too numerous to mention here, but acknowledged in the text and in the Notes and references sections at the end of each chapter.
We also thank those who have sent us comments on and corrections to earlier versions of this book: Jörg Arndt, Marco Bodrato, Wolfgang Ehrhardt (with special thanks), Steven Galbraith, Torbjörn Granlund, Guillaume Hanrot, Marc Mezzarobba, Jean-Michel Muller, Denis Roegel, Wolfgang Schmid, Arnold Schönhage, Sidi Mohamed Sedjelmaci, Emmanuel Thomé, and Mark Wezelenburg. Two anonymous reviewers provided very helpful suggestions. Jérémie Detrey and Anne Rix helped us in the copy-editing phase.
The Mathematics Genealogy Project (http://www.genealogy.ams.org/) and Don Knuth’s The Art of Computer Programming [142] were useful resources for details of entries in the index.
We also thank the authors of the LaTeX program, which allowed us to produce this book, the authors of the gnuplot program, and the authors of the GNU MP library, which helped us to illustrate several algorithms with concrete figures.
Finally, we acknowledge the contribution of Erin Brent, who first suggested
writing the book; and thank our wives, Judy-anne and Marie, for their patience
and encouragement.
Notation
[a, b] or [a, b]^t — column vector (a, b)^t
[a, b; c, d] — 2 × 2 matrix with first row (a, b) and second row (c, d)
â_j — element of the (forward) Fourier transform of vector a
ǎ_j — element of the backward Fourier transform of vector a
123 456 789 — 123456789 (for large integers, we may use a space after every third digit)
|A| — determinant of a matrix A, e.g. |a, b; c, d| = ad − bc
PV ∫_a^b f(x) dx — Cauchy principal value integral, defined by a limit if f has a singularity in (a, b)
are respectively 10^9 and 10^19 for a decimal representation, or 2^53 when using double-precision floating-point registers. Most algorithms given in this chapter work in any base; the exceptions are explicitly mentioned.
We assume that the sign is stored separately from the absolute value. This
is known as the “sign-magnitude” representation. Zero is an important special
case; to simplify the algorithms we assume that n = 0 if A = 0, and we usually
assume that this case is treated separately.
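As a concrete illustration of this convention, here is a minimal Python sketch (our own, not the book's code; the function names are ours) of the sign-magnitude representation with digits in base β = 10^9, where n = 0 is reserved for A = 0 as assumed above:

```python
# Sketch: sign-magnitude representation, an integer A stored as a sign in
# {-1, 0, 1} plus a little-endian list of n base-beta digits, with n = 0
# exactly when A = 0.
BETA = 10**9

def to_sign_magnitude(A):
    sign = (A > 0) - (A < 0)
    mag, A = [], abs(A)
    while A:
        mag.append(A % BETA)   # least significant base-beta digit first
        A //= BETA
    return sign, mag           # A = 0 yields (0, []), i.e. n = 0

def from_sign_magnitude(sign, mag):
    return sign * sum(d * BETA**i for i, d in enumerate(mag))
```

For example, `to_sign_magnitude(-10**9)` returns `(-1, [0, 1])`: the sign is stored separately, and the magnitude 10^9 occupies two base-β digits.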
Except when explicitly mentioned, we assume that all operations are off-line,
i.e. all inputs (resp. outputs) are completely known at the beginning (resp. end)
of the algorithm. Different models include lazy and relaxed algorithms, and
are discussed in the Notes and references (§1.9).
Let T be the number of different values taken by the data type representing the coefficients a_i, b_i. (Clearly, β ≤ T, but equality does not necessarily hold, for example β = 10^9 and T = 2^32.) At step 3, the value of s can be as large as 2β − 1, which is not representable if β = T. Several workarounds are possible: either use a machine instruction that gives the possible carry of a_i + b_i, or use the fact that, if a carry occurs in a_i + b_i, then the computed sum, reduced modulo T, is less than a_i.
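The second workaround can be sketched in Python as follows (our own rendering, with assumed names; we take T = 2^64, mimicking a 64-bit machine without access to a carry flag):

```python
# Sketch: word-by-word addition where each word sum is computed modulo
# T = 2**64. A carry occurred in a + b exactly when the reduced sum is
# less than a (the incoming carry bit is handled by a second comparison).
T = 2**64

def add_words(A, B):
    """Add little-endian word lists A and B (each word < T)."""
    n = max(len(A), len(B))
    A = A + [0] * (n - len(A))
    B = B + [0] * (n - len(B))
    C, carry = [], 0
    for a, b in zip(A, B):
        t = (a + b) % T        # what the hardware computes for a + b
        c1 = t < a             # carry out of a + b
        s = (t + carry) % T    # then absorb the incoming carry bit
        c2 = s < t             # carry out of t + carry (c1, c2 never both 1)
        C.append(s)
        carry = int(c1 or c2)
    if carry:
        C.append(1)
    return C
```

For example, `add_words([2**64 - 1], [1])` returns `[0, 1]`, i.e. the two-word value T.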
1.3 Multiplication
A nice application of large integer multiplication is the Kronecker–Schönhage trick, also called segmentation or substitution by some authors. Assume we want to multiply two polynomials, A(x) and B(x), with non-negative integer coefficients (see Exercise 1.1 for negative coefficients). Assume both polynomials have degree less than n, and the coefficients are bounded by ρ. Now take a power X = β^k > nρ^2 of the base β, and multiply the integers a = A(X) and b = B(X) obtained by evaluating A and B at x = X. If C(x) = A(x)B(x) = ∑ c_i x^i, we clearly have C(X) = ∑ c_i X^i. Now since the c_i are bounded by nρ^2 < X, the coefficients c_i can be retrieved by simply “reading” blocks of k base-β digits in the integer product ab = C(X).
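The trick can be sketched in a few lines of Python (the function name and the small base β = 10 are our own choices for illustration):

```python
# Sketch of the Kronecker-Schönhage trick: pack the coefficients into big
# integers, perform one integer product, and read the c_i back off in
# blocks of k base-beta digits.

def poly_mul_kronecker(A, B, beta=10):
    """A, B: lists of non-negative integer coefficients, low degree first."""
    n = max(len(A), len(B))                  # both degrees are < n
    rho = max(max(A), max(B)) + 1            # all coefficients are < rho
    k = 1                                    # pick X = beta**k > n * rho**2
    while beta**k <= n * rho * rho:
        k += 1
    X = beta**k
    a = sum(c * X**i for i, c in enumerate(A))   # a = A(X)
    b = sum(c * X**i for i, c in enumerate(B))   # b = B(X)
    c = a * b                                    # one big-integer product
    C = []                                       # c_i = (c // X**i) mod X
    while c:
        C.append(c % X)
        c //= X
    return C
```

For example, `poly_mul_kronecker([2, 1], [4, 3])` returns `[8, 10, 3]`, the coefficients of (x + 2)(3x + 4) = 3x^2 + 10x + 8.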
• split the two operands into an equal number of pieces of unequal sizes;
• or split the two operands into different numbers of pieces.
Each strategy has advantages and disadvantages. We discuss each in turn.
[Figure: a 5-word operand A = a_4 a_3 a_2 a_1 a_0 aligned against a 3-word operand B = b_2 b_1 b_0 in three ways, computing A × B, A × (βB), and A × (β^2 B) respectively.]
The left variant leads to two products of size 3, i.e. 2K(3, 3), the middle one to
K(2, 1)+K(3, 2)+K(3, 3), and the right one to K(2, 2)+K(3, 1)+K(3, 3),
which give respectively 14, 15, 13 word-products.
However, whenever m/2 ≤ n ≤ m, any such “padding variant” will require K(⌈m/2⌉, ⌈m/2⌉) for the product of the differences (or sums) of the low and high parts from the operands, due to a “wrap-around” effect when subtracting the parts from the smaller operand; this will ultimately lead to a cost similar to that of an m × m product. The “odd–even scheme” of Algorithm OddEvenKaratsuba (see also Exercise 1.13) avoids this wrap-around. Here is an example of this algorithm for m = 3 and n = 2. Take A = a_2 x^2 + a_1 x + a_0 and B = b_1 x + b_0. This yields A_0 = a_2 x + a_0, A_1 = a_1, B_0 = b_0, B_1 = b_1; thus, C_0 = (a_2 x + a_0)b_0, C_1 = (a_2 x + a_0 + a_1)(b_0 + b_1), C_2 = a_1 b_1.
C_0 ← OddEvenKaratsuba(A_0, B_0)
C_1 ← OddEvenKaratsuba(A_0 + A_1, B_0 + B_1)
C_2 ← OddEvenKaratsuba(A_1, B_1)
return C_0(x^2) + x(C_1 − C_0 − C_2)(x^2) + x^2 C_2(x^2).
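The steps above can be rendered in Python as follows (a sketch in our own notation, with helper names that are ours; polynomials are coefficient lists, low degree first, and each operand is split into its even- and odd-indexed coefficients, so no padding or wrap-around is needed):

```python
# Sketch of the odd-even Karatsuba scheme: A(x) = A0(x^2) + x*A1(x^2),
# B(x) = B0(x^2) + x*B1(x^2), and the middle term A0*B1 + A1*B0 is
# obtained as (A0 + A1)(B0 + B1) - A0*B0 - A1*B1.

def padd(P, Q):
    if len(P) < len(Q):
        P, Q = Q, P
    return [p + (Q[i] if i < len(Q) else 0) for i, p in enumerate(P)]

def psub(P, Q):
    n = max(len(P), len(Q))
    P = P + [0] * (n - len(P))
    Q = Q + [0] * (n - len(Q))
    return [p - q for p, q in zip(P, Q)]

def odd_even_karatsuba(A, B):
    if len(A) < len(B):
        A, B = B, A
    if len(B) == 1:                    # base case: multiply by a constant
        return [a * B[0] for a in A]
    A0, A1 = A[::2], A[1::2]           # even- / odd-indexed coefficients
    B0, B1 = B[::2], B[1::2]
    C0 = odd_even_karatsuba(A0, B0)
    C1 = odd_even_karatsuba(padd(A0, A1), padd(B0, B1))
    C2 = odd_even_karatsuba(A1, B1)
    mid = psub(psub(C1, C0), C2)       # A0*B1 + A1*B0
    out = [0] * (len(A) + len(B))      # C0(x^2) + x*mid(x^2) + x^2*C2(x^2)
    for i, c in enumerate(C0):
        out[2 * i] += c
    for i, c in enumerate(mid):
        out[2 * i + 1] += c
    for i, c in enumerate(C2):
        out[2 * i + 2] += c
    while len(out) > 1 and out[-1] == 0:   # drop spare zero top terms
        out.pop()
    return out
```

For the m = 3, n = 2 example in the text, `odd_even_karatsuba([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of (3x^2 + 2x + 1)(5x + 4), using three recursive products as claimed.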