Baby Step Giant Step Algorithm

1 Introduction
Pollard rho algorithm. Theoretically, one can achieve a speedup by a factor of √2 when solving the ECDLP; for details see [1, 21]. This idea can be applied in any algebraic group for which inversion is efficient (for example in torus-based cryptography). For the DLP in an interval of size N, Galbraith and Ruprai [7] showed a variant of the Gaudry-Schost algorithm with heuristic average-case running time of (1.36 + o(1))√N group operations.
The most expensive operation in the baby-step giant-step algorithm is point
addition. To speed up the algorithm we can aim to either reduce the total number
of group operations performed, or else to reduce the cost of each individual group
operation. The main observation in this paper allows us to take the second
approach. Precisely, we exploit the fact that, given any two points X and Y, we can compute X + Y and X − Y (in affine coordinates) in less than twice the cost of a single elliptic curve addition, by recycling the field inversion. Hence,
we propose improved algorithms to compute a list of consecutive elliptic curve
points. We also discuss improved variants of the “grumpy giants” algorithm
of Bernstein and Lange. We give a new approach to determine the average-
case behaviour of the grumpy-giants algorithm, and also describe and analyse a
variant of this algorithm for groups with efficient inversion. All our results are
summarised in Table 3.
The paper is organized as follows. We recall the baby-step giant-step algo-
rithm and its minor modifications in Section 2. Section 2.1 gives our new heuristic
method for analysing interleaved BSGS algorithms. In Section 3, we describe a
negation map variant of the baby-step giant-step algorithm and grumpy-giants
algorithm. Section 4 discusses methods for efficiently computing lists of con-
secutive points on elliptic curve. Section 5 brings the ideas together to present
faster variants of the BSGS algorithm. Details of our experiments are given in
the Appendices.
2 Baby-Step Giant-Step Algorithms

In this section, we recall the baby-step giant-step algorithm for DLP computations and present some standard variants of it. The baby-step giant-step algorithm makes use of a time-space tradeoff to solve the discrete logarithm problem in arbitrary groups. We use the notation of the ECDLP as defined in the introduction: given P and Q, find n such that Q = nP and 0 ≤ n < N.
The “textbook” baby-step giant-step algorithm is as follows: Let M = ⌈√N⌉. Then n = n0 + M n1, where 0 ≤ n0 < M and 0 ≤ n1 < M. Precompute P′ = MP. Now compute the baby steps n0 P and store the pairs (n0 P, n0) in an easily searched structure (searchable on the first component) such as a sorted list or binary tree. Then compute the giant steps Q − n1 P′ and check whether each value lies in the list of baby steps. When a match n0 P = Q − n1 P′ is found then the ECDLP is solved. The baby-step giant-step algorithm is deterministic. The algorithm requires 2√N group operations in the worst case, and (3/2)√N group operations on average over uniformly random choices for Q.
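As a concrete illustration, the following is a minimal Python sketch of the textbook algorithm for a generic additively written group. The functions add, neg and mul (scalar multiplication) are assumed to be supplied by the caller and points are assumed to be hashable; these names are illustrative and are not taken from the paper.

from math import isqrt

def bsgs(P, Q, N, add, neg, mul):
    """Find n with Q = n*P and 0 <= n < N, or return None."""
    M = isqrt(N - 1) + 1               # M = ceil(sqrt(N))
    baby = {}
    R = mul(0, P)                      # identity element
    for n0 in range(M):
        baby.setdefault(R, n0)         # store (n0*P, n0), searchable on the point
        R = add(R, P)
    neg_Pp = neg(mul(M, P))            # -P' = -M*P
    S = Q                              # giant steps: Q - n1*P'
    for n1 in range(M):
        if S in baby:                  # match n0*P = Q - n1*P'
            return (baby[S] + M * n1) % N
        S = add(S, neg_Pp)
    return None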
To improve the average-case performance one can instead choose M = ⌈√(N/2)⌉. Then n = n0 + M n1, where 0 ≤ n0 < M and 0 ≤ n1 < 2M. We precompute P′ = MP and compute baby steps n0 P for 0 ≤ n0 < M and store them appropriately. Then we compute the giant steps Q − n1 P′. On average, the algorithm finds a match after half the giant steps have been performed. Hence, this variant solves the DLP in √(2N) group operations on average. The worst-case complexity is (M + 2M) = (1/√2 + √2)√N = (3/√2)√N group operations.
A variant of the baby-step giant-step algorithm due to Pollard [18] is to
compute the baby steps and giant steps in parallel, storing the points in two
sorted lists/binary trees. One can show that the average-case running time of this variant of the baby-step giant-step algorithm is (4/3)√N group operations (Pollard justifies this with an argument that uses the fact that the expected value of max{x, y}, over uniformly distributed x, y ∈ [0, 1], is 2/3; at the end of Section 2.1 we give a new derivation of this result). We call this the “interleaving”
variant of BSGS. The downside is that the storage requirement slightly increases
(in both the average and worst case). We remark that the average-case running
time of Pollard’s method does not seem to be very sensitive in practice to changes
in M ; however to minimise both the average-case and worst-case running time
one should choose the smallest value for M with the property that when both
lists contain exactly M elements then the algorithm must terminate.
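A sketch of this interleaved variant, under the same assumed caller-supplied interface as above (add, neg, mul), might look as follows: both lists grow by one element per step and each new point is checked against the other list.

from math import isqrt

def bsgs_interleaved(P, Q, N, add, neg, mul):
    """Interleaved BSGS: stop at the first match between the two lists."""
    M = isqrt(N - 1) + 1
    neg_Pp = neg(mul(M, P))            # -P' = -M*P
    baby, giant = {}, {}               # point -> step index for each walk
    B, G = mul(0, P), Q                # current baby step n0*P and giant step Q - n1*P'
    for i in range(M):
        baby[B], giant[G] = i, i
        if B in giant:                 # n0*P = Q - n1*P' with n1 <= i
            return (i + M * giant[B]) % N
        if G in baby:                  # Q - n1*P' = n0*P with n0 <= i
            return (baby[G] + M * i) % N
        B, G = add(B, P), add(G, neg_Pp)
    return None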
All the above analysis applies both for the discrete logarithm problem in a
group of size N and the discrete logarithm problem in an interval of length N .
Bernstein and Lange [2] suggested a new variant of the BSGS algorithm, called
the “two grumpy giants and a baby” algorithm. The baby steps are of the form
n0 P for small values n0. One grumpy giant starts at Q and takes steps of size P′ = MP for M ≈ 0.5√N. The other grumpy giant starts at 2Q and takes steps of size −P′′ = −(M + 1)P. The algorithm is an interleaving algorithm in the
sense that all three walks are done in parallel and stored in lists. At each step
one checks for a match among the lists (a match between any two lists allows one to solve the DLP; the case of a match 2Q − j(M + 1)P = iP implies the DLP
satisfies 2n ≡ i + j(M + 1) (mod N ), and this equation has a unique solution
when N > 2 is prime). The exact performance of the grumpy giants method is
not known, but Bernstein and Lange conjecture that their method can be faster
than Pollard rho. However [2] does not contain very much evidence to support
their claim.
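The following Python sketch shows the three interleaved walks and the match checks just described. It assumes an odd prime group order N (so that 2 is invertible modulo N) and a caller-supplied group interface add, neg, mul; all names here are illustrative.

from math import isqrt

def grumpy_giants(P, Q, N, add, neg, mul):
    """Two grumpy giants and a baby (sketch); N an odd prime."""
    M = isqrt(N // 4) + 1                        # M ~ 0.5*sqrt(N)
    inv2 = pow(2, -1, N)
    giant1_step = mul(M, P)                      # +M*P
    giant2_step = neg(mul(M + 1, P))             # -(M+1)*P
    B, G1, G2 = mul(0, P), Q, add(Q, Q)          # walks start at 0, Q and 2Q
    baby, g1, g2 = {}, {}, {}                    # point -> step index
    for L in range(N):                           # halts after O(sqrt(N)) steps in practice
        baby[B], g1[G1], g2[G2] = L, L, L
        if B in g1: return (L - g1[B] * M) % N                       # iP = Q + jMP
        if B in g2: return (L + g2[B] * (M + 1)) * inv2 % N          # iP = 2Q - k(M+1)P
        if G1 in baby: return (baby[G1] - L * M) % N                 # Q + jMP = iP
        if G2 in baby: return (baby[G2] + L * (M + 1)) * inv2 % N    # 2Q - k(M+1)P = iP
        if G1 in g2: return (L * M + g2[G1] * (M + 1)) % N           # Q + jMP = 2Q - k(M+1)P
        if G2 in g1: return (g1[G2] * M + L * (M + 1)) % N           # 2Q - k(M+1)P = Q + jMP
        B, G1, G2 = add(B, P), add(G1, giant1_step), add(G2, giant2_step)
    return None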
The crux of the analysis in [2], building on the work of Chateauneuf, Ling
and Stinson [5], is to count “slopes”. We re-phrase this idea as follows: After L
steps we have computed three lists {iP : 0 ≤ i < L}, {Q + jM P : 0 ≤ j < L}
and {2Q − k(M + 1)P : 0 ≤ k < L} and the DLP is solved as long as either
Q = (i − jM )P or 2Q = (i + k(M + 1))P or Q = (jM + k(M + 1))P . Hence,
the number of points Q whose DLP is solved after L steps is exactly the size of the union

  L_L = {i − jM (mod N) : 0 ≤ i, j < L}
      ∪ {2^{−1}(i + k(M + 1)) (mod N) : 0 ≤ i, k < L}
      ∪ {jM + k(M + 1) (mod N) : 0 ≤ j, k < L}.
The algorithm succeeds after L steps with probability #L_L/N, over random choices for Q. The term “slopes” is just another way to express the number of elements in L_L.
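For small N the set L_L can be enumerated directly; a sketch of such a slope-counting simulation (N assumed to be an odd prime so that 2^{−1} (mod N) exists) is:

def slope_count(N, M, L):
    """Size of L_L: number of n (mod N) solvable within L grumpy-giants steps."""
    inv2 = pow(2, -1, N)
    solved = set()
    for a in range(L):
        for b in range(L):
            solved.add((a - b * M) % N)                  # baby walk vs first giant
            solved.add((a + b * (M + 1)) * inv2 % N)     # baby walk vs second giant
            solved.add((a * M + b * (M + 1)) % N)        # first giant vs second giant
    return len(solved)

Each call costs O(L^2) set insertions, which is why such enumerations are only feasible for relatively small N.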
Note that the grumpy giants algorithm is designed for solving the DLP in a
group of size N . The algorithm is not appropriate for the DLP in an interval,
as the lists {iP } and {Q + jM P } might not collide at all for 0 ≤ i, j ≤ M , and
the point 2Q may be outside the interval. Hence, for the rest of this section we
assume P has order equal to N .
We have developed a new approach for computing approximations of the
average-case runtimes of interleaved BSGS algorithms. We use this approach to
give an approximation to the average-case running time of the grumpy giants
algorithm. Such an analysis does not appear in [2].
We let α be such that the algorithm halts after at most α√N steps. When M = √N the DLP is solved using the first two sets {iP}, {Q + jMP} after √N steps. Hence it seems safe to always assume α ≤ 1. Note that α seems to be the main quantity that is influenced by the choice of M.
Now, for 1 ≤ L ≤ α√N one can consider the size of L_L on average (the precise size depends slightly on the value of N). Initially we expect #L_L to grow like 3L², but as L becomes larger the ratio #L_L/L² decreases, until finally at L ≈ √N we have #L_L = N ≈ L². Bernstein and Lange suggest that #L_L ≈ (23/8)L² when L becomes close to √N. The performance of the algorithm depends on the value #L_L/L², so we need to study this in detail. It will be more convenient to rescale the problem to be independent of N. So for integers 0 ≤ L ≤ α√N we write c(L/√N) = #L_L/L². We then extend this function to c(t) for a real variable 0 ≤ t ≤ α by linearly interpolating those points. Note that #L_L/N = c(L/√N)L²/N = c(t)t² when t = L/√N.

We have performed simulations, for relatively small values of N, that enumerate the set L_L for a range of values of L. From these simulations we can get approximate values for α and some data points for c(t). A typical example (this is when N ≈ 2^28 and M = √(N/2)) is 0.94 < α < 0.97, and some data points for c(t) are listed in Table 1.
Denote by Pr(L) the probability that the algorithm has solved the DLP after L steps, and Pr(> L) = 1 − Pr(L) the probability that the algorithm has not succeeded after L steps. As noted, after L = t√N steps, for 0 ≤ t ≤ α, the algorithm succeeds with probability Pr(L) = #L_L/N = c(L/√N)L²/N = c(t)t². Hence, the probability that the algorithm has not succeeded after L = t√N steps is Pr(> L) = 1 − c(t)t². Now, the expected number of steps before the algorithm halts is (using the fact that Pr(⌊α√N⌋) = 1)

  Σ_{L=1}^{⌊α√N⌋} L (Pr(L) − Pr(L − 1)) = ⌊α√N⌋ − Σ_{L=0}^{⌊α√N⌋} Pr(L)
                                        ≈ ∫_0^{α√N} (1 − c(L/√N) L²/N) dL.

Substituting t = L/√N and noting dL = √N dt allows us to approximate the sum as

  (α − ∫_0^α c(t) t² dt) √N.

To determine the running time it remains to estimate ∫_0^α c(t) t² dt and to multiply by 3 (since there are three group operations performed for each step of the algorithm). For example, using numerical integration (we simply use the trapezoidal rule) based on the data in Table 1, we estimate α ≈ 0.97 and ∫_0^α c(t) t² dt ≈ 0.55. This estimate would give running time (number of group operations) roughly 3(0.97 − 0.55)√N = 1.26√N.
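A small sketch of this trapezoidal estimate, given sampled pairs (t_i, c(t_i)) such as those in Table 1 (the arrays below are arguments supplied by the caller, not the paper's data):

def runtime_constant(ts, cs, alpha):
    """Estimate 3*(alpha - integral_0^alpha c(t)*t^2 dt) by the trapezoidal rule.

    ts: increasing sample points t_i in [0, alpha]; cs: corresponding values c(t_i)."""
    ys = [c * t * t for t, c in zip(ts, cs)]
    integral = sum((ts[i + 1] - ts[i]) * (ys[i] + ys[i + 1]) / 2
                   for i in range(len(ts) - 1))
    return 3 * (alpha - integral)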
We now give a slightly more careful, but still experimental, analysis. Figure 1 reports the results of some of our simulations, for three different choices of M. The corresponding values for α are 0.95 < α < 1 for the first set of simulations (M = √(N/2)), and 0.96 < α < 0.99 (respectively 0.98 < α < 1.02) for M = √N (respectively M = (1/2)√N). For the three cases, we compute areas under the graphs in Figure 1 using very simple numerical methods (summing the areas of rectangles coming from the data points). The areas lie in the intervals, respectively, [0.577, 0.581], [0.552, 0.569] and [0.595, 0.617]; the lower limit is the sum of rectangles under the curve and the upper limit is the sum of rectangles above the curve. For these values, the corresponding running times of the algorithm are c√N where the constant c lies in, respectively, [1.11, 1.27], [1.17, 1.31] and [1.09, 1.28]. These results suggest M = √(N/2) is a good choice, and that the constant should be close to 1.2. It is an open problem to give a complete and rigorous analysis of the grumpy-giants algorithm.
The above analysis does not allow us to give a very precise conjecture on
the running time of the grumpy-giants algorithm. Instead we performed some
computational experiments to get an estimate of its average-case performance.
Details of the experiments are given in Appendix A. The results are listed in Table 2 (these results are for M = √(N/2)). In Table 3 we write the value 1.25, which seems to be a reasonable conjecture. We write 3α for the worst-case cost, which seems a safe upper bound. Overall, our analysis and experiments support the claim by Bernstein and Lange [2] that the grumpy-giants algorithm has better complexity than Pollard rho. Our work also confirms that their suggested value M = (1/2)√N is a good choice.

Fig. 1. Graph of c(t)t² for t ∈ [0, 1], obtained by simulations for the three values M = √(N/2), M = √N and M = (1/2)√N. The multiple lines denote the results for different experiments coming from different choices of N.
Table 2. Results for the original grumpy-giants algorithm with M = √(N/2).

Bits  #Elliptic curves  #DLPs per curve  Average value for c  Standard deviation
28    100               10000            1.2579               0.0083
29    100               10000            1.2533               0.0064
30    100               10000            1.2484               0.0062
31    100               10000            1.2517               0.0067
32    100               10000            1.2736               0.0054
2.2 Summary
guished points. We also include entries for versions of the Gaudry-Schost algo-
rithm (which performs better than the Pollard kangaroo method). The +o(1)
term accounts for many issues (e.g., initial set-up costs, “un-needed” group ele-
ments computed in the final block, effects from the number of partitions used to
define pseudorandom walks, expected time to get a distinguished point, dealing
with cycles in random walks on equivalence classes, etc).
Table 3. The table lists constants c such that the named algorithm requires (c + o(1))√N group operations for large enough groups of size N. The first block lists
algorithms for general groups, and all these results are known (see Section 2). The
values for the grumpy-giant algorithm (marked by an asterisk) are conjectural and
the values for the rho and Gaudry-Schost algorithm are heuristic. The second block
lists algorithms for groups having an efficiently computable inversion (see Section 3).
Some of these results are new (the first one appears as an exercise in the first author’s
textbook). The third block lists algorithms that exploit efficient inversion as well as
our main observation, and these results are all new (see Section 5).
One can also speed up the baby-step giant-step algorithm. We present the details in terms of elliptic curves. Let M = ⌈√N⌉. Then n = ±n0 + M n1, where −M/2 ≤ n0 < M/2 and 0 ≤ n1 < M. Compute M/2 baby steps n0 P for 0 ≤ n0 ≤ M/2. Store the values (x(n0 P), n0) in a sorted structure. Next, compute P′ = MP and the giant steps Q − n1 P′ for n1 = 0, 1, . . . . For each point computed we check if its x-coordinate lies in the sorted structure. If we have a match then

  Q − n1 P′ = ±n0 P

and so Q = (±n0 + M n1)P and the ECDLP is solved.
Lemma 1. The average-case running time of the baby-step giant-step algorithm using efficient inversion is √N group operations. The worst-case running time is 1.5√N group operations.

Proof. Computing the baby steps requires M/2 group operations and computing the giant steps requires, on average, M/2 group operations. Hence the total number of group operations is, on average, M = √N. In the worst case one performs all M giant steps.
Compared with the original “textbook” BSGS algorithm optimised for the average case, we have reduced the running time by a factor of √2. This is exactly
what one would expect.
Readers may be confused about whether we are fully exploiting the ± sign. To eliminate confusion, note that a match x(Q − n1 P′) = x(n0 P) is the same as ±(Q − n1 P′) = ±n0 P, and this reduces to the equation Q − n1 P′ = ±n0 P.
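A sketch of this negation-map variant, using the same assumed interface as before plus a caller-supplied function x returning the x-coordinate of an affine point, might look as follows; the ± ambiguity from matching x-coordinates is resolved at the end by one extra scalar multiplication.

from math import isqrt

def bsgs_negation(P, Q, N, add, neg, mul, x):
    """BSGS with the negation map: each stored x(n0*P) covers both n0*P and -n0*P."""
    M = isqrt(N - 1) + 1
    identity = mul(0, P)
    baby, R = {}, P
    for n0 in range(1, M // 2 + 1):        # baby steps P, 2P, ..., (M/2)P
        baby.setdefault(x(R), n0)
        R = add(R, P)
    neg_Pp = neg(mul(M, P))                # -P' = -M*P
    S = Q                                  # giant steps Q - n1*P'
    for n1 in range(M + 1):
        if S == identity:                  # covers the n0 = 0 case
            return (M * n1) % N
        if x(S) in baby:
            n0 = baby[x(S)]
            for n in ((n0 + M * n1) % N, (-n0 + M * n1) % N):
                if mul(n, P) == Q:         # resolve the +- sign
                    return n
        S = add(S, neg_Pp)
    return None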
One can of course combine this trick with Pollard's interleaving idea. Take now M = √(2N) so that n = ±n0 + M n1 with 0 ≤ n0, n1 ≤ M/2. The algorithm computes the lists of baby steps {x(n0 P)} and giant steps {x(Q − n1 P′)} in parallel until the first match.
Lemma 2. The average-case running time of the interleaved baby-step giant-step algorithm using efficient inversion is (2√2/3)√N group operations. The worst-case running time is √(2N) group operations.

Proof. The worst-case number of steps (this is an interleaved algorithm so a “step” now means computing one baby step and one giant step) is M/2, giving a total cost of 2(M/2) = M = √(2N) group operations. By Pollard's analysis [18], generating both walks in parallel leads to a match in 4(M/2)/3 group operations on average. The leading constant in the running time is therefore 2√2/3 ≈ 0.9428.
We can also prove this result using the method from Section 2.1. The number of points Q whose DLP can be solved after L steps is 2L² (since we have Q = (±i + jM)P) and so c(t) = 2 for 0 ≤ t < α. When M = √(2N) then α = √(1/2), and so the constant in the running time is

  2(α − ∫_0^α 2t² dt) = 2(α − 2α³/3) = 0.9428.
One might wonder if there is a better way to organise the interleaving. Since
one gets two baby-steps for each group operation it would be tempting to take
more giant steps on average than baby steps. However, the goal is to maximise
the number 2L1 L2 of points Q solved after L1 + L2 steps (where L1 and L2
denote the number of group operations spent computing baby-steps and giant-
steps respectively). This boils down to maximising f (x, y) = 2xy subject to
x + y = 1, which is easily seen to have the solution x = y = 1/2. Hence, the
optimal way to organise the interleaving is to use the same number of group
operations for baby-steps and giant-steps for each time L.
One can consider the grumpy giants method where matches are detected using
x(iP ), x(Q + jM P ), x(2Q − k(M + 1)P ). This algorithm is not mentioned in [2].
Hence, one of the contributions of our paper is to develop the algorithm in this
case and analyse it.
The first task is to count “slopes”. After L steps we have computed three
lists {x(iP ) : 0 ≤ i < L}, {x(Q + jM P ) : 0 ≤ j < L} and {x(2Q − k(M + 1)P ) :
0 ≤ k < L}. A collision between the first two lists implies Q + jM P = ±iP
and so Q = (±i − jM )P . A collision between the first and third lists implies
2Q − k(M + 1)P = ±iP and so Q = 2^{−1}(±i + k(M + 1))P. A collision between the second and third lists implies either Q + jMP = 2Q − k(M + 1)P or Q + jMP = −2Q + k(M + 1)P. Hence we have either Q = (jM + k(M + 1))P or Q = 3^{−1}(k(M + 1) − jM)P. The relevant quantity to consider is the size of the union

  L_L = {±i − jM (mod N) : 0 ≤ i, j < L}
      ∪ {2^{−1}(±i + k(M + 1)) (mod N) : 0 ≤ i, k < L}
      ∪ {jM + k(M + 1) (mod N) : 0 ≤ j, k < L}
      ∪ {3^{−1}(k(M + 1) − jM) (mod N) : 0 ≤ j, k < L}.
We follow the heuristic analysis given in Section 2.1. Let α be such that the algorithm halts after at most α√N steps. We also define c(t) to be such that the number of elements in L_L is equal to c(L/√N)L². We have conducted simulations, for various values of N (again, the values of α and c(t) depend on N and on the choice of M). A typical example (this is for N ≈ 2^28 and M = √(N/2)) has 0.87 < α < 0.92 and the values for c(t) as in Table 4.
Figure 2 reports on our simulations, again by computing L_L exactly for various choices of N. Our experiments suggest that M = √(N/2) is the best choice, for which 0.9 ≤ α ≤ 0.96. The integral under the curve, computed using numerical methods, is in the interval [0.58, 0.60], giving a running time c√N for 3(0.9 − 0.6) = 0.9 ≤ c ≤ 1.14 = 3(0.96 − 0.58).
Table 5 gives our experimental results. Details about the experiment are given in Appendix A. The average running time seems to be around 0.9√N, so that is the value we put in Table 3. Note that 1.25/0.9 ≈ 1.39 < √2, so we have not obtained a speed-up by a factor of approximately √2. Hence, this algorithm seems to be no faster than Pollard rho on equivalence classes [1, 21]. However, it does still seem to be an improvement over Pollard's interleaved version of the standard BSGS algorithm.
Table 5. Results for the grumpy-giants algorithm using efficient inversion with M = √(N/2).

Bits  #Elliptic curves  #DLPs per curve  Average value for c  Standard deviation
28    100               10000            0.8926               0.0077
29    100               10000            0.9053               0.0061
30    100               10000            0.8961               0.0073
31    100               10000            0.9048               0.0068
32    100               10000            0.9207               0.0065
The BSGS and Pollard rho algorithms require computing elliptic curve opera-
tions in affine coordinates rather than projective coordinates (otherwise collisions
are not detected). Affine arithmetic requires inversion in the field, and this op-
eration is considerably more expensive than multiplications in the field or other
operations. A standard technique is to use the Montgomery trick [15] to offset
the cost of inversions, but this technique in the context of baby-step giant-step
algorithms has not previously been fully explored.
Let

  E : y² + a1 xy + a3 y = x³ + a2 x² + a4 x + a6

be a Weierstrass equation for an elliptic curve and let X = (x1, y1) and Y = (x2, y2) be two general points on E (in other words, we assume Y ≠ ±X). Recall that −X = (x1, −y1 − a1 x1 − a3). Let (x3, y3) = X + Y. (We are interested in affine coordinates rather than projective coordinates as the BSGS algorithm cannot be used with projective coordinates.) Then

  λ = (y2 − y1)/(x2 − x1),
  x3 = λ² + a1 λ − x1 − x2 − a2,
  y3 = −λ(x3 − x1) − y1 − a1 x3 − a3.
We see that we can re-use the inversion (x2 − x1)^{−1} and hence compute X − Y with merely the additional cost 2M + S.

Taking I = 8M and S = 0.8M, the cost of a “combined ± addition” is I + 4M + 2S = 13.6M. Compared with the cost 2(I + 2M + S) of two additions, we have a speedup by a factor of

  (I + 4M + 2S) / (2(I + 2M + S)) ≈ (8 + 4 + 1.6)M / (2(8 + 2 + 0.8)M) ≈ 0.63.
If the cost of inversion is much higher (e.g., I ≈ 20M) then the speedup is by a
factor close to 2.
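The following sketch computes X + Y and X − Y together over a prime field. It is written for the short Weierstrass form y² = x³ + ax + b (so the general formulas above simplify and each addition costs I + 2M + S); the single inversion of x2 − x1 is shared, and the extra cost of X − Y is visibly 2M + S.

def add_and_sub(X, Y, p):
    """Return (X+Y, X-Y) for affine points X, Y with Y != +-X on a short
    Weierstrass curve over F_p, using one shared field inversion."""
    (x1, y1), (x2, y2) = X, Y
    inv = pow(x2 - x1, -1, p)         # the single inversion (cost I)
    lam = (y2 - y1) * inv % p         # 1M
    x3 = (lam * lam - x1 - x2) % p    # 1S
    y3 = (lam * (x1 - x3) - y1) % p   # 1M
    mu = (-y2 - y1) * inv % p         # slope towards -Y = (x2, -y2): 1M
    x4 = (mu * mu - x1 - x2) % p      # 1S
    y4 = (mu * (x1 - x4) - y1) % p    # 1M, so X-Y costs only 2M + S extra
    return (x3, y3), (x4, y4)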
Elliptic curves over non-binary finite fields can be transformed to Edwards form x² + y² = c²(1 + x²y²), with (0, c) as identity element. Edwards curves can be efficient when using projective coordinates, but we need to use affine coordinates for BSGS algorithms. Let X = (x1, y1) and Y = (x2, y2); the addition law is

  X + Y = ( (x1 y2 + y1 x2) / (c(1 + x1 x2 y1 y2)), (y1 y2 − x1 x2) / (c(1 − x1 x2 y1 y2)) ).

The cost of addition is therefore I + 6M. Since −(x, y) = (−x, y), we have

  X − Y = ( (x1 y2 − y1 x2) / (c(1 − x1 x2 y1 y2)), (y1 y2 + x1 x2) / (c(1 + x1 x2 y1 y2)) ).

We can re-use the inversions and compute X − Y with merely the additional cost 2M. So the speedup is

  (I + 8M) / (2(I + 6M)) ≈ (8 + 8) / (2(8 + 6)) ≈ 0.57.

The speedup looks good, but note that the cost of combined ± addition is I + 8M, which is slower than the I + 4M + 2S when using Weierstrass curves.
The observation that one can compute X + Y and X − Y efficiently simul-
taneously was first introduced by Wang and Zhang [22], and they tried to apply
it to the Pollard rho algorithm. This seems not to be effective, since the Pollard
rho algorithm uses random walks that go in a “single direction”, whereas the
combined operation X +Y and X −Y gives us steps in “two different directions”.
Algorithm 1 Compute list of points using parallel computation
Input: Points S and T, integer M
Output: L = {S + [i]T : 0 ≤ i < M}
1: Choose k
2: Set M′ = ⌈M/k⌉
3: Compute T′ = [M′]T
4: Compute (serial) T0 = S, Tj = Tj−1 + T′ with 1 ≤ j < k
5: L = {T0, T1, . . . , Tk−1}
6: for i = 0 to M′ − 1 do
7:   Compute in parallel using Montgomery trick Tj = Tj + T with 0 ≤ j < k
8:   L = L ∪ {T0, T1, . . . , Tk−1}
9: end for
Proof. Step 3 takes up to 2 log2 (M/k) group operations (including both doubles
and adds, so we use the cost I + 2M + 2S), while step 4 is k group operations
(in serial). The loop contains k additions performed in parallel, and so benefits
from the simultaneous modular inversion trick. The cost of each iteration is
I + 3(k − 1)M for the modular inversion, followed by k(2M + S) for the rest of
the elliptic curve addition operations.
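A sketch of the batched step in line 7 of Algorithm 1, for short Weierstrass curves over F_p: all k points are advanced by T using a single field inversion via Montgomery's trick, matching the I + 3(k − 1)M plus k(2M + S) cost counted in the proof. The interface and names are illustrative, and every point is assumed distinct from ±T.

def batch_add(points, T, p):
    """Return [R + T for R in points] using one inversion (Montgomery's trick)."""
    xT, yT = T
    diffs = [xR - xT for xR, _ in points]
    prefix, acc = [], 1                       # prefix[i] = diffs[0]*...*diffs[i]
    for d in diffs:
        acc = acc * d % p
        prefix.append(acc)
    inv_acc = pow(acc, -1, p)                 # the single inversion
    out = [None] * len(points)
    for i in range(len(points) - 1, -1, -1):
        inv_d = inv_acc * (prefix[i - 1] if i > 0 else 1) % p   # 1/diffs[i]
        inv_acc = inv_acc * diffs[i] % p                         # strip off factor i
        xR, yR = points[i]
        lam = (yR - yT) * inv_d % p
        x3 = (lam * lam - xR - xT) % p
        y3 = (lam * (xR - x3) - yR) % p
        out[i] = (x3, y3)
    return out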
The idea is to also precompute T′′ = [3]T. For 0 ≤ i < M′/3 we compute the sums Tj ± T and then Tj + T′′. In other words, we compute three consecutive points Tj + (3i − 1)T, Tj + (3i)T, Tj + (3i + 1)T using fewer than three full group operations.

To get a greater speedup we use more of the combined operations. Hence instead of using T′′ = [3]T we can take T′′ = [5]T. Then in the i-th iteration we compute Tj = Tj + T′′ = Tj + 5T and then also Tj ± T and Tj ± 2T (where 2T is precomputed). In other words, we compute a block of 5 consecutive points Tj + (5i − 2)T, Tj + (5i − 1)T, Tj + (5i)T, Tj + (5i + 1)T and Tj + (5i + 2)T with one addition and two “combined ± additions”. Extending this idea one can use T′′ = [2ℓ + 1]T, and then precompute 2T, . . . , ℓT. We give the details as Algorithm 2.
  (ℓ + k + 2 + 2 log₂(M/k))(I + 2M + S)
  + (M/(k(2ℓ + 1)))((ℓ + 1)I + (7kℓ + 5k − 3(ℓ + 1))M + (2kℓ + k)S).
The result follows.
The important point is that we have reduced the number of inversions further
and reduced the number of multiplications. Again, a naive argument suggests a
speedup by up to (3.5 + 0.8)/(10.8) = 0.39 compared with the standard serial
approach, or 4.3/5.8 = 0.74 compared with using Algorithm 1.
(M/(k(2ℓ + 1)))I + (7/2)M M + M S. Taking I = 8M, S = 0.8M and k = ℓ = 8 this is a speedup by a factor of roughly 0.4.
Hence, all the results in Table 3 can be multiplied by 0.4, and that is what we
have done to get the first two rows in the final block (e.g., 0.9 · 0.4 = 0.36 is the
claimed constant in the running time of the grumpy-giants algorithm).
For comparison, we consider the Pollard method implemented so that k walks
are performed in parallel using Montgomery’s simultaneous modular inversion
method. As we explained, this method gives an asymptotic speed-up of 0.53,
meaning the constant in this parallel Pollard rho algorithm is 0.53 · 0.886(1 +
o(1)) = 0.47(1 + o(1)). This value is the final entry in Table 3. Our conclusion is
that, in this optimised context, both the interleaved BSGS and grumpy-giants
algorithms are superior to Pollard rho.
6 Other Settings
There are other groups that have efficient inversion. For example, consider the group G_{q,2} of order q + 1 in F*_{q^2}. This can be viewed as the group of elements of F*_{q^2} of norm 1. A standard fact is that one can compute a product h·g of two elements of F_{q^2} using three multiplications in F_q (and some additions). When g lies in the group G_{q,2} then computing g^{−1} is easy: since g·g^q = Norm(g) = 1, the inverse g^{−1} = g^q is obtained by the action of Frobenius, so if g = u + vθ then g^{−1} = u − vθ. It follows that one can compute hg and hg^{−1} in four multiplications, only one multiplication more than computing hg alone. So our BSGS improvement can be applied in this setting too. The speedup is by a factor of 4/6 = 2/3, since we need 4 multiplications to do the work previously done by 6 multiplications. This is less total benefit than in the elliptic curve case, but it still might be of practical use.
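A sketch of this combined product in the norm-one subgroup, writing elements of F_{q^2} as u + vθ with θ² = d for a fixed non-residue d (this parametrization and the Karatsuba-style three-multiplication product are standard choices assumed here, not prescribed by the paper; multiplications by the constant d are not counted, as usual):

def mul_and_mul_inv(h, g, d, q):
    """Given h = u1 + v1*theta and norm-one g = u2 + v2*theta in F_{q^2}
    (theta^2 = d), return (h*g, h*g^{-1}) using four F_q multiplications."""
    u1, v1 = h
    u2, v2 = g
    A = u1 * u2 % q
    B = v1 * v2 % q
    C = (u1 + v1) * (u2 + v2) % q          # equals A + B + (u1*v2 + u2*v1)
    D = u1 * v2 % q                        # the one extra multiplication
    hg = ((A + d * B) % q, (C - A - B) % q)
    hg_inv = ((A - d * B) % q, (C - A - B - 2 * D) % q)   # uses g^{-1} = u2 - v2*theta
    return hg, hg_inv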
The group G_{q,2} is relevant for XTR. One way to represent XTR is using elements of G_{q^3,2}. We refer to Section 3 of [11] for discussion. The cost of an “affine” group operation in the torus representation of G_{q,2} is I + 2M, which is worse than the cost 3M mentioned already. So when implementing a BSGS algorithm it is probably better to use standard finite field representations.
7 Conclusion
Our work is inspired by the idea that the negation map can be used to speed up
the computation of elliptic curve discrete logarithms. We explain how to compute
lists of consecutive elliptic curve points efficiently, by exploiting Montgomery’s
trick and also the fact that for any two points X and Y we can efficiently get
X − Y when we compute X + Y by sharing a field inversion. We use these
ideas to speed up the baby-step giant-step algorithm. Compared to the previous
approaches we achieve a significant speedup for computing elliptic curve discrete
logarithms.
We also give a new method to analyse the grumpy-giants algorithm, and de-
scribe and analyse a variant of this algorithm for groups with efficient inversion.
The new algorithms, like the original, have low overhead but high memory.
This means that our algorithms are useful only for discrete logarithm problems
small enough for the lists to fit into fast memory. For applications such as [4]
that require solving DLP instances in a short interval, the best method by far
is Pollard’s interleaved BSGS algorithm together with our idea of computing
points in blocks; the grumpy-giants algorithm is not suitable for this problem
and the Pollard kangaroo and Gaudry-Schost methods will take about twice the
time.
Acknowledgements
The authors thank Siouxsie Wiles for assistance with the graphs, and an anony-
mous referee for helpful comments.
References
1. Daniel J. Bernstein, Tanja Lange and Peter Schwabe, “On the Correct Use of the
Negation Map in the Pollard rho Method”, in D. Catalano et al (eds.), PKC 2011,
Springer LNCS 6571 (2011) 128–146.
2. D. J. Bernstein and T. Lange, “Two grumpy giants and a baby”, in E. W. Howe
and K. S. Kedlaya (eds.), Proceedings of the Tenth Algorithmic Number Theory
Symposium, MSP, Vol. 1 (2013) 87–111.
3. D. J. Bernstein and T. Lange, “Computing small discrete logarithms faster”, in S.
D. Galbraith and M. Nandi (eds.), INDOCRYPT 2012, Springer Lecture Notes in
Computer Science 7668 (2012) 317–338.
4. D. Boneh, E. Goh and K. Nissim, “Evaluating 2-DNF formulas on ciphertexts”, in
J. Kilian (ed.), Theory of Cryptography-TCC 2005, Springer LNCS 3378 (2005)
325–341.
5. M. Chateauneuf, A. C. H. Ling and D. R. Stinson, “Slope packings and coverings,
and generic algorithms for the discrete logarithm problem”, Journal of Combina-
torial Designs, Vol. 11, No. 1 (2003) 36–50.
6. K. Fong, D. Hankerson, J. Lopez, A. Menezes, “Field Inversion and Point Halving
Revisited”, IEEE Trans. Computers 53(8) (2004) 1047–1059.
7. S. D. Galbraith and R. S. Ruprai, “Using equivalence classes to speed up the
discrete logarithm problem in a short interval”, in P. Nguyen and D. Pointcheval
(eds.), PKC 2010, Springer LNCS 6056 (2010) 368–383.
8. S. D. Galbraith, J. M. Pollard and R. S. Ruprai, “Computing discrete logarithms
in an interval”, Math. Comp., 82, No. 282 (2013) 1181–1195.
9. R. Gallant, R. Lambert and S. Vanstone, “Improving the parallelized Pollard
lambda search on binary anomalous curves”, Mathematics of Computation, Vol-
ume 69 (1999) 1699–1705.
10. P. Gaudry and E. Schost, “A low-memory parallel version of Matsuo, Chao and Tsujii's algorithm”, in D. A. Buell (ed.), ANTS VI, Springer LNCS 3076 (2004) 208–222.
11. R. Granger, D. Page and M. Stam, “On Small Characteristic Algebraic Tori in
Pairing-Based Cryptography”, LMS Journal of Computation and Mathematics, 9
(2006) 64–85.
12. R. Henry, K. Henry and I. Goldberg, “Making a nymbler Nymble using VERBS”,
in M. J. Atallah and N. J. Hopper (eds.), PETS 2010, Springer LNCS 6205 (2010)
111–129.
13. N. Koblitz, “Elliptic curve cryptosystems”, Mathematics of Computation, 48
(1987) 203–209.
14. V. Miller, “Use of elliptic curves in cryptography”, in H. C. Williams (ed.), Crypto
’85, Springer LNCS 218 (1986) 417–426.
15. P. L. Montgomery, “Speeding the Pollard and elliptic curve methods of factoriza-
tion”, Mathematics of Computation 48 (1987) 243–264.
16. V. I. Nechaev, “Complexity of a determinate algorithm for the discrete logarithm”,
Mathematical Notes 55, no. 2 (1994) 165–172.
17. J. M. Pollard, “Monte Carlo methods for index computation (mod p)”, Math.
Comp. 32, no. 143 (1978) 918–924.
18. J. Pollard, “Kangaroos, Monopoly and discrete logarithms”, Journal of Cryptology
13 (2000) 437–447.
19. D. Shanks, “Five number-theoretic algorithms”, Proceedings of the Second Man-
itoba Conference on Numerical Mathematics, Congressus Numerantium, No. VII,
Utilitas Math., Winnipeg, Man. (1973) 51–70.
20. P. van Oorschot and M. Wiener, “Parallel collision search with cryptanalytic applications”, Journal of Cryptology, Vol. 12, No. 1 (1999) 1–28.
21. P. Wang and F. Zhang, “Computing Elliptic Curve Discrete Logarithms with the
Negation Map”, Information Sciences, Vol. 195 (2012) 277–286.
22. P. Wang and F. Zhang, “Improving the Parallelized Pollard Rho Method for Com-
puting Elliptic Curve Discrete Logarithms”, in The 4th International Conference
on Emerging Intelligent Data and Web Technologies (EIDWT-2013), 2013.
23. M. Wiener and R. Zuccherato, “Faster attacks on elliptic curve cryptosystems”, in S. E. Tavares and H. Meijer (eds.), Selected Areas in Cryptography '98, Springer LNCS 1556 (1998) 190–200.
algorithm (or its variant) on the DLP instance (P, Q) and counted the number of group operations performed until each DLP was solved. The average number of group operations over those 10000 trials was computed and represented in the form c√N. Repeating this for each of the groups G gives a list of 100 values c, each of which is an estimate of the average-case running time of the algorithm in one of the groups G. Then we computed the mean and standard deviation of the 100 estimates c, giving us a good estimate of the average-case complexity c_i√N of the algorithm for (i + 1)-bit group orders. This is the value reported. Table 2 gives the mean values c_i obtained over such experiments with the original grumpy-giants algorithm with M = √(N/2). Table 5 gives the mean values obtained for the grumpy-giants algorithm using efficient inversion with M = √(N/2).
B 4-giants Algorithm
In [2] a 4-giants algorithm is mentioned (two giants in one direction, two in the other, with computer-optimized shifts of the initial positions) as a variant of the 2-giants algorithm. The paper did not describe the 4-giants algorithm in detail. However,
we implemented a possible version of the 4-giants algorithm, and the experimen-
tal results do not seem to be as good as one may have expected. It seems that
the 4-giants algorithm is not as efficient as the 2-giants algorithm.
More precisely, the baby steps are of the form n0 P for small values n0. The first grumpy giant starts at Q and takes steps of size P′ = MP for M ≈ 0.5√N. The second grumpy giant starts at 2Q and takes steps of size P′′ = −(M + 1)P. The third one starts at 3Q and takes steps of size P′′′ = (M + 2)P. The fourth one starts at 4Q and takes steps of size P′′′′ = −(M + 3)P. The algorithm is an
interleaving algorithm in the sense that all five walks are done in parallel and
stored in lists. At each step one checks for a match among the lists (a match between any two lists allows one to solve the DLP). We performed experiments in
the same framework as above and summarize the results in the following table.
One sees that the constants are larger than the 1.25 of the 2-giants method.